Files
hub/docs/reviews/storage-architecture-review-2026-04-21.md
glm-5.1 2b63cda1c7 Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts
Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00

41 KiB

status, last_updated, review_date, reviewer, scope, resolution
status last_updated review_date reviewer scope resolution
active 2026-04-21 2026-04-21 architect (with 5 subagent reviewers) docs/architecture/storage/* + docs/decisions/ADR-001 through ADR-012 pending

Storage Architecture Review: 2026-04-21

Comprehensive review of the storage specification documents (docs/architecture/storage/) and related ADRs. Five parallel subagent reviews were conducted, each focused on a domain area. Their findings are consolidated here with deduplication, prioritization, and cross-references.

Review Sessions (open-memory)

# Domain Session ID
1 Identity & Auth ses_24f76141effegdhw2bxX2sOvYb
2 Sessions & Messages ses_24f751efbffeyWo9wb6hAnnj0y
3 Services, Spokes, Call Graph ses_24f746ebbffeG4jqN3MbK5i9yt
4 Tasks & Coordination ses_24f7431baffeElbZ3qVHCYQOSv
5 Cross-Cutting Concerns ses_24f735dbcffea1pN0JCgtPdbt2

Documents Reviewed

  • docs/architecture/storage/README.md — common pattern, package structure, open questions
  • docs/architecture/storage/table-reference.md — cross-cutting reference (cascades, indexes, enums, relations)
  • docs/architecture/storage/identity.md — accounts, organizations, organization_members, api_keys, audit_logs
  • docs/architecture/storage/projects.md — projects, workspaces
  • docs/architecture/storage/sessions.md — sessions, messages, parts
  • docs/architecture/storage/roles.md — roles
  • docs/architecture/storage/services.md — clients, client_secrets
  • docs/architecture/storage/spokes.md — spokes, operation_specs
  • docs/architecture/storage/call-graph.md — call_graph_nodes, call_graph_edges
  • docs/architecture/storage/coordination.md — mappings, detections
  • docs/architecture/storage/tasks.md — tasks, task_dependencies
  • docs/decisions/ADR-001 through ADR-012

Summary Statistics

Severity Count
🔴 Critical 14
🟡 Warning 22
💡 Suggestion 17

🔴 Critical Issues

Issues that must be resolved before the storage spec is stabilized. Each represents a concrete inconsistency, data integrity risk, or ambiguity that would cause implementation divergence.


C01. NOT NULL + onDelete: SET NULL — Contradictory Constraints

Sessions: 1, 2, 5 Files: sessions.md:17, identity.md:112, table-reference.md:80-71 Open-memory: ses_24f76141effegdhw2bxX2sOvYb, ses_24f751efbffeyWo9wb6hAnnj0y, ses_24f735dbcffea1pN0JCgtPdbt2

Two FK columns are declared NOT NULL but have onDelete: SET NULL. PostgreSQL will reject the DELETE because it cannot nullify a NOT NULL column:

  1. sessions.accountIdtext NOT NULL (sessions.md:17) with onDelete: SET NULL (table-reference.md:80). Deleting an account that owns sessions fails.
  2. audit_logs.ownerIdtext NOT NULL (identity.md:112) with onDelete: SET NULL (table-reference.md:71). Deleting an account that has audit entries fails.

Recommendation: For each, choose one:

  • Make the column nullable (if detaching on delete is desired)
  • Change cascade to RESTRICT (if the FK must always be populated — blocks account deletion)
  • Change cascade to CASCADE (if deleting dependent records is acceptable)
  • Add application-level logic that reassigns/destroys dependents before account deletion

For audit_logs.ownerId, RESTRICT may be correct — audit trails should prevent account deletion. For sessions.accountId, nullable is likely correct — orphaned sessions (account deleted) are still valuable data.


C02. ADR-003 vs sessions.md on Message IDs

Sessions: 2, 5 Files: ADR-003, sessions.md:42-46, table-reference.md:48 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y, ses_24f735dbcffea1pN0JCgtPdbt2

ADR-003 explicitly states: "Parts and messages tables use sortable timestamp-based IDs instead of commonCols.id." However, sessions.md defines the messages table using commonCols (which provides UUIDv4 via crypto.randomUUID()). Only parts explicitly uses sortable IDs. table-reference.md only mentions parts for sortable IDs.

This is a three-way inconsistency: ADR says both tables, sessions.md does one, table-reference says one. Message ordering is semantically important (the composite index idx_messages_session_id_created_at_id on (session_id, created_at, id) relies on created_at for ordering, making UUIDv4 sortable IDs unnecessary — but this contradicts ADR-003's stated rationale).

Recommendation: Either:

  • (A) Update messages table to use sortable IDs (consistent with ADR-003, eliminates dependency on created_at for ordering), or
  • (B) Amend ADR-003 to state that only parts uses sortable IDs, and messages relies on the (session_id, created_at, id) composite index

C03. Operation Specs: Delete vs. Soft-Deactivation Unresolved

Sessions: 3, 5 Files: spokes.md:66, table-reference.md:67, README.md Open Question #2 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt, ses_24f735dbcffea1pN0JCgtPdbt2

The spoke disconnect lifecycle has three conflicting positions:

  • spokes.md:66: "Removes the spoke's operation_specs rows (or marks inactive)" — ambiguous
  • table-reference.md:67: operation_specs.spokeId → spokes.id with CASCADE delete
  • README.md Open Question #2: "DELETE aligns with the ephemeral spoke model" for now

The operation_specs table has no active or status column to support soft-deactivation. Crucially, spoke rows are never deleted — they're only marked status: "disconnected". This means the CASCADE FK never fires, and there's no mechanism to clean up operation_specs on disconnect. The operation_specs rows remain pointing to a disconnected spoke with no way to deprecate them.

Recommendation: Resolve decisively:

  • (A) Hard delete on disconnect: Add explicit cleanup in the disconnect handler. Remove "or marks inactive" from spokes.md. CASCADE only applies to rare admin spoke-row deletion.
  • (B) Add active/status column to operation_specs: Support soft-deactivation. Update cascade rationale. This preserves the operation registry for audit/reconnection but adds schema complexity.

Option A aligns with the ephemeral spoke model. Option B supports spoke reconnection. Choose one and update all documents.


C04. parts.sessionId Denormalization: No Enforcement Mechanism

Sessions: 2 Files: sessions.md:96, sessions.md:105 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

The stated invariant: "when inserting a part, always set sessionId to the message's sessionId. Never update messages.sessionId without updating all child parts." However:

  • No DB trigger enforces this
  • No application-level transaction pattern is documented
  • No CHECK constraint exists
  • If messages.sessionId could change, there's a race condition window

Recommendation: Document that sessionId on both messages and parts is immutable after creation (which eliminates the update problem). Define the application-level contract for part insertion: read the message's sessionId and set it on the part within the same transaction. Add an explicit "IMMUTABLE" note to the sessionId column in sessions.md.


C05. sessions.roleName — No FK, No Validation Strategy Documented

Sessions: 2 Files: sessions.md:26, table-reference.md:100-101, roles.md Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

sessions.roleName is bare text with no FK to roles.name and no documented reason why. Is this intentional (to support file-based roles in Phase 1)? What happens if the role name has a typo? What about sessions referencing a role that was deleted?

Recommendation: Either:

  • (A) Add FK → roles.name with onDelete: SET NULL (role deletions detach sessions), or
  • (B) Document why the FK is intentionally omitted: "role definitions may come from .opencode/agents/*.md files before DB sync; application-level validation checks against known role names at session creation time."

C06. mappings.task Denormalized Column: No Sync Strategy

Sessions: 4 Files: coordination.md:22, tasks.md:209 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The mappings table has both taskId (FK → tasks.id) and task (denormalized display name). No mechanism keeps them in sync. If taskId points to a task whose slug or name changes, mappings.task becomes stale. When taskId is SET NULL (task deleted), what happens to task?

Recommendation: Document the invariant: "mappings.task is set to tasks.slug at insert time and is not automatically updated when the task's slug changes. When taskId is SET NULL (task deleted), task should also be SET NULL. This is a cache, not a source of truth." Alternatively, remove the denormalized column and use a VIEW that joins.


C07. Sync vs. Runtime Field Conflict in Tasks

Sessions: 4 Files: tasks.md:296-325 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The task sync does a full upsert, but the Authority Model says runtime status mutations go through hub.task.updateStatus. If sync blindly writes frontmatter status, it can clobber runtime state. Example:

  1. Agent sets task.status = 'in-progress' via hub.task.updateStatus
  2. Decomposer edits the task file (still has status: pending)
  3. Sync runs and upserts the task — overwrites in-progress back to pending

Recommendation: Define the sync field split explicitly: "Sync upserts authored fields (slug, name, path, scope, risk, impact, level, priority, tags, assignee, due, body, fileCreatedAt, fileModifiedAt, depends_on) and must not overwrite runtime-managed fields (status, startedAt, completedAt). Runtime fields are only mutated via hub.task.* operations." Update the sync flow specification in tasks.md.


C08. Concurrent task.body Appends: No Collision Handling

Sessions: 4 Files: tasks.md:249-266 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

hub.task.addNote appends a timestamped note section to body. In a multi-agent system, read-modify-write is a race condition: Agent A reads body, Agent B reads body, both append, Agent A writes, Agent B overwrites A's addition. The spec says "This is simple" — it is not simple under concurrency.

Recommendation: Specify the concurrency model: hub.task.addNote must use DB-level concatenation (UPDATE tasks SET body = body || $note WHERE id = $taskId), not a read-modify-write cycle. Or use optimistic locking with updatedAt. Document this explicitly in the addNote specification.


C09. Cross-Project Dependency Constraint: No DB Enforcement

Sessions: 4 Files: tasks.md:217, tasks.md:357 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

"Tasks can only depend on tasks within the same project" is declared but only "enforced at the application level." task_dependencies has FK columns with no projectId column or check constraint. Application-level enforcement is vulnerable to race conditions, direct SQL access, or bugs.

Recommendation: At minimum, add a DB-level guard. Options:

  • (A) Add a trigger that checks dependsOnTaskId and taskId belong to the same project
  • (B) Add a denormalized projectId column to task_dependencies with a composite FK
  • (C) Document the risk explicitly and specify that the sync operation validates project scope within a transaction (SELECT FOR SHARE)

C10. Call Graph Edges: Missing Indexes and Cascade Documentation

Sessions: 3, 5 Files: call-graph.md:32-41, table-reference.md (missing) Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt, ses_24f735dbcffea1pN0JCgtPdbt2

call_graph_edges has no indexes and no cascade entries in table-reference.md. Both sourceId and targetId reference call_graph_nodes.id with CASCADE (implied by domain doc), but this is undocumented. Without indexes, graph traversal queries (find children, find parents) will require sequential scans.

Additionally, the relationship between call_graph_nodes.parentRequestId and call_graph_edges is ambiguous: do they store the same parent-child relationship redundantly, or serve different purposes?

Recommendation:

  • Add indexes: idx_call_graph_edges_source_id on (sourceId), idx_call_graph_edges_target_id on (targetId). Consider unique on (sourceId, targetId, edgeType) to prevent duplicates.
  • Add cascade entries to table-reference.md for both FKs (CASCADE).
  • Clarify parentRequestId vs call_graph_edges: document whether parentRequestId is a convenience shortcut or redundant with edges.

C11. Secret Key Rotation: Underspecified

Sessions: 3 Files: services.md:94-97, ADR-008 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Lazy re-encryption is insufficiently specified for a security-critical operation:

  1. Multi-key storage: HUB_ENCRYPTION_KEY (singular env var) — how are old and new keys stored simultaneously during rotation?
  2. Re-encryption transaction: If the process crashes between decrypt and re-encrypt-update, is the secret left in the old key version?
  3. Old key unavailability: What happens if a secret with keyVersion=1 is accessed after the old key is removed? Permanent data loss with no documented handling.
  4. No background sweep: Old-key-version secrets persist indefinitely until accessed. If the old key is compromised, those secrets remain vulnerable.

Recommendation:

  • Specify multi-key storage: e.g., HUB_ENCRYPTION_KEYS=v1:base64key,v2:base64key or a key file
  • Document the re-encryption transaction: decrypt → encrypt → UPDATE in a single DB transaction, with crash-safety note
  • Add a warning about the vulnerability window (old-key secrets not yet re-encrypted)
  • Specify whether a background re-encryption sweep is needed or deferred

C12. Client Config Schema Validation: Timing and Evolution Ambiguous

Sessions: 3 Files: services.md:19, ADR-007, README.md Open Question #10 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

"Validated against the TypeBox schema for this type on write" is ambiguous:

  1. Who validates? Drizzle insert schema? API handler? DB trigger? Direct SQL bypasses application validation.
  2. Schema evolution: when code deployment changes a client type's TypeBox schema, existing DB rows may become invalid under the new schema.
  3. No re-validation on read is documented.

Recommendation:

  • Specify: "validate on write (API handler layer) + warn on read (start-up validation pass with logging, not blocking)"
  • Document the schema evolution contract: new fields MUST be Type.Optional(); breaking changes MUST use a new client type string (e.g., llm-provider-v2)
  • Consider a configSchemaVersion in metadata tracking which schema version validated the config

C13. Dual Ownership Model for Organizations: Undefined

Sessions: 1 Files: identity.md:44 (ownerId), identity.md:58 (membershipLevel: "owner") Open-memory: ses_24f76141effegdhw2bxX2sOvYb

Two competing ownership concepts with no documented relationship:

  1. organizations.ownerId — a single FK to one account
  2. organization_members.membershipLevel: "owner" — can exist on multiple rows

Can ownerId point to an account with membershipLevel: "member" (not "owner")? Can an org have zero members with membershipLevel: "owner" but a non-null ownerId? An implementer cannot determine which field is authoritative for ownership queries.

Recommendation: Document the invariant. E.g.: "ownerId is always a member with membershipLevel: 'owner' (enforced by app logic). If all owner-level members are removed, ownerId must be transferred first." Or: "ownerId is the creator; membershipLevel: 'owner' is a separate authorization concept."


C14. Missing FK Cascade Entries in table-reference.md

Sessions: 5 Files: table-reference.md:53-83 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

The following FK relationships are documented in per-domain docs but absent from the cascade reference table:

Missing Relationship Source Doc
mappings.workspaceId → workspaces.id coordination.md:19
detections.sessionId → sessions.id coordination.md:36
call_graph_edges.sourceId → call_graph_nodes.id call-graph.md:39
call_graph_edges.targetId → call_graph_nodes.id call-graph.md:41
api_keys.rotatedToId → api_keys.id identity.md:80

Without documented cascade behavior, PostgreSQL defaults to RESTRICT, which may not be the intended behavior for all of these.

Recommendation: Add all missing FK entries to the cascade table with explicit onDelete behavior. For the rotatedToId FK specifically: SET NULL (old key keeps its data but rotation link is broken if new key is deleted).


🟡 Warnings

Issues that should be resolved if possible. They represent gaps in documentation, suboptimal designs, or inconsistencies that could cause confusion.


W01. Dual JSONB Overlap: commonCols.metadata vs Per-Table data

Sessions: 1 Files: identity.md:85-88, identity.md:23, README.md:73 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

Two overlapping JSONB columns exist on some tables with no documented boundary:

  • commonCols.metadata — present on every table, Record<string, unknown>
  • Per-table data columns — domain-specific data (e.g., accounts.data, organizations.data)

For api_keys, keypal stores scopes, resources, and tags inside commonCols.metadata. For accounts, both data ("preferences, avatar URL") and metadata (arbitrary) exist with overlapping purposes and no split documentation.

Recommendation: Document the boundary: "data holds structured domain-specific data with known TypeScript types. metadata holds opaque key-value pairs for subsystem use, with a namespacing convention (e.g., metadata._keypal.scopes). Never mix domain data into metadata."


W02. No Account Deactivation Mechanism

Sessions: 1 Files: identity.md (accounts table) Open-memory: ses_24f76141effegdhw2bxX2sOvYb

The accounts table has no enabled/suspended column. Combined with organizations.ownerId → RESTRICT, an org owner's account cannot be deleted. But there's also no way to deactivate it when an employee leaves.

Recommendation: Add an enabled boolean (consistent with api_keys.enabled and clients.enabled), or a status column (active/suspended/deactivated). Document the interaction with cascade constraints.


W03. Missing Indexes Across Many Tables

Sessions: 1, 2, 3, 5 Files: table-reference.md:87-145, per-domain docs Open-memory: All sessions (consensus finding)

Multiple tables have FK columns or common query patterns without supporting indexes:

Table Missing Index Purpose
sessions unq_sessions_slug in index ref UNIQUE constraint not listed (unlike other UNIQUEs)
sessions idx_sessions_parent_id on (parentId) Find child sessions of coordinator
projects idx_projects_org_id on (orgId) Find projects for an org
workspaces idx_workspaces_project_id on (projectId) Find workspaces for a project
spokes idx_spokes_name on (name) Look up spoke by name
detections idx_detections_session_id on (sessionId) Find detections for a session (no indexes at all)
call_graph_nodes idx_call_graph_nodes_created_at on (createdAt) Time-range queries
call_graph_nodes idx_call_graph_nodes_operation_created on (operationId, createdAt) Operation + time queries
call_graph_edges idx_call_graph_edges_source_id on (sourceId) Graph traversal (children)
call_graph_edges idx_call_graph_edges_target_id on (targetId) Graph traversal (parents)
mappings idx_mappings_workspace_id on (workspaceId) Workspace-scoped mapping queries

Also: idx_api_keys_key_hash (B-tree) is redundant with unq_api_keys_key_hash (UNIQUE). Postgres automatically creates an index for UNIQUE constraints.

Recommendation: Add all missing indexes to table-reference.md and relevant per-domain docs. Remove the redundant idx_api_keys_key_hash.


W04. operation_specs Pre-Remap vs. Post-Remap Namespace Ambiguity

Sessions: 3 Files: spokes.md:51-55, spoke-runner.md:62 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Do operation_specs.namespace and operation_specs.name store the original spoke identifiers (pre-remap, e.g., dev.fs.read) or the remapped hub identifiers (post-remap, e.g., dev.{spokeId}.fs.read)? The spoke-runner.md says the hub remaps spoke operations into a hub namespace, but the operation_specs storage format is never specified.

If pre-remap: two spokes registering dev.fs.read creates ambiguity without joining on spokeId. If post-remap: the partial unique indexes may be over-constraining since the spoke-specific namespace prefix makes spokeId redundant for uniqueness.

Recommendation: Explicitly document which identifiers are stored. If pre-remap, document how callers resolve ambiguity. If post-remap, adjust the uniqueness rationale.


W05. call_graph_edges.edgeType Semantics Undefined

Sessions: 3 Files: call-graph.md:41 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Three edge types are listed (triggered, depends_on, requested_by) but none are explained. The call-graph architecture doc only discusses parent-child relationships (triggered). depends_on and requested_by are novel and undocumented. Are these exhaustive or extensible?

Recommendation: Document each edge type's semantics in call-graph.md, or state that edgeType is an extensible text field with these three initial values and define what each means.


W06. spokes.status Missing reconnecting State

Sessions: 3 Files: spokes.md:18, spoke-runner.md:130-136 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The spoke status enum is connected, disconnected. The spoke-runner.md describes a reconnection flow, but there's no intermediate state for "reconnecting." When a spoke's WebSocket drops, it shows disconnected — indistinguishable from a permanently offline spoke.

Recommendation: Add reconnecting to the spoke status enum, or document that reconnection is handled at the application layer (WebSocket reconnect timer) without a DB state change.


W07. client_secrets.keyVersion Redundancy

Sessions: 3 Files: services.md:71, services.md:82-86 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

client_secrets has both a standalone keyVersion column (integer NOT NULL DEFAULT 1) AND keyVersion embedded in the value JSONB (EncryptedData.keyVersion). These can diverge with no documented invariant.

Recommendation: Either remove the standalone column (read from value.keyVersion), or document that the standalone column is authoritative and they must be kept in sync.


W08. Call Graph Payload Security

Sessions: 3 Files: call-graph.md:22-23 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The input and output JSONB columns store full call payloads. Operations like hub.register (which receives auth tokens) would store API keys and secrets in cleartext. The truncation strategy (10KB) addresses size, not sensitive data. No redaction is mentioned.

Recommendation: Add a section on sensitive data handling. Options:

  • Operation handlers mark certain fields as redacted
  • The call graph writer applies field-level redaction by convention (fields named password, token, secret, key)
  • The truncation strategy is extended with a redaction pass

W09. No Call Graph Retention Policy

Sessions: 3, 4 Files: call-graph.md (absent), README.md Open Question #5 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Call graph data grows unboundedly. Every operation invocation creates a node and edges. CASCADE handles cleanup when a node is deleted, but nothing deletes old nodes. README.md acknowledges this as Open Question #5.

Recommendation: Specify the intended approach: TTL-based deletion, archival to cold storage, or aggregation + deletion. Even a "v1: manual cleanup, v2: automatic TTL" notation helps.


W10. sessions.version Column: Unspecified

Sessions: 2 Files: sessions.md:24, README.md Open Question #1 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

The version column is text NOT NULL with description "Schema version (opencode compat)" but:

  • No valid values listed
  • No default documented for hub-direct sessions vs opencode imports
  • No versioning scheme defined
  • README.md Open Question #1 asks whether to version data columns — this is unresolved

Recommendation: Define initial version value (e.g., "1"), document what version governs (the data JSONB shape? the message/parts schema? opencode compatibility only?), and specify the default for hub-direct sessions.


W11. Overlapping Status Enums Without Cross-Table Disambiguation

Sessions: 4, 5 Files: table-reference.md:147-164, coordination.md:23, tasks.md:84-86 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv, ses_24f735dbcffea1pN0JCgtPdbt2

Three tables have status with overlapping values:

Table Shared Values Unique Values
mappings completed, failed, aborted active
call_graph_nodes completed, failed, aborted pending, running
tasks completed, failed pending, in-progress, blocked

table-reference.md:164 only contrasts mappings.active vs call_graph_nodes.pending/running. It does NOT contrast tasks statuses with the others. mappings.completed and tasks.completed mean different things (mapping workflow completion vs task completion).

Recommendation: Add cross-table state mapping documentation. When a task goes in-progress, there should be an active mapping; when a task is completed, the mapping becomes completed. Document valid combinations.


W12. Audit Logs Missing Session and Org Context

Sessions: 1 Files: identity.md:103-117 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

audit_logs has ownerId and keyId but no sessionId or orgId. For LLM accounts that fill roles in sessions, the session correlation is a significant traceability gap. Multi-tenant auditing requires org filtering.

Recommendation: Add sessionId (nullable FK → sessions.id, SET NULL) and orgId (nullable FK → organizations.id, SET NULL). Expand action types to cover account, membership, and organization lifecycle events — or document the action enum as extensible.


W13. API Key Hashing (SHA-256) Trade-Off Undocumented

Sessions: 1 Files: identity.md:74, ADR-010 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

API keys are bearer tokens stored as SHA-256 hashes. SHA-256 is a fast hash, not a deliberately slow KDF (bcrypt/Argon2). If the database is compromised, SHA-256 hashes can be brute-forced orders of magnitude faster than slow hashes. However, API keys are high-entropy machine-generated strings (128-bit+), making brute-force infeasible even with a fast hash. No ADR documents this trade-off.

Recommendation: Add documentation: "API keys are high-entropy random strings (128-bit+), making brute-force infeasible even with a fast hash. SHA-256 was chosen for O(1) verification latency at high throughput. This is acceptable because API keys are machine-generated, unlike human-chosen passwords."


W14. ADR Terminology Inconsistencies

Sessions: 1 Files: ADR-009:13, ADR-012:55, agent-roles.md Open-memory: ses_24f76141effegdhw2bxX2sOvYb

  • ADR-009 says "organization_members (membership with roles)" — contradicts ADR-012's rename to membershipLevel
  • ADR-012 itself uses accounts.role: "service" in its rationale, despite mandating the rename to accessLevel
  • agent-roles.md also uses accounts.role: "service"

Recommendation: Update ADR-009 to say "membership with levels." Update ADR-012:55 and agent-roles.md to use accounts.accessLevel: "service".


W15. Resolved Open Questions Still Listed as Open in README

Sessions: 5 Files: README.md:197-225 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

Several open questions are resolved by per-domain docs or ADRs but remain listed as open:

  • Q2 (operation spec cleanup): Resolved — DELETE aligns with ephemeral spoke model (spokes.md, table-reference.md CASCADE)
  • Q4 (workspaces vs. directories): Marked as "Resolved" in the list but still present
  • Q14 (accounts.roleaccessLevel): Renamed in identity.md, referenced in ADR-012

Recommendation: Move resolved items to a "Resolved Decisions" section with cross-references to the resolving documents.


W16. organizations.ownerId RESTRICT: No Deletion/Transfer Workflow

Sessions: 1, 5 Files: identity.md:44, table-reference.md:56 Open-memory: ses_24f76141effegdhw2bxX2sOvYb, ses_24f735dbcffea1pN0JCgtPdbt2

RESTRICT prevents deletion of accounts that own organizations, but no ownership transfer mechanism is documented.

Recommendation: Add a note: "Before deleting an account, transfer all owned organizations via org.transferOwnership operation." Document the transfer pattern in identity.md or coordination.md.


W17. Path LIKE Queries May Not Use B-Tree Indexes in PostgreSQL

Sessions: 4 Files: tasks.md:83, tasks.md:101 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

WHERE path LIKE 'implementation/%' can use a B-tree index only with the C locale or text_pattern_ops. With the default locale, LIKE pattern matching may not use the index.

Recommendation: Specify that the path index should use text_pattern_ops (CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)) or document the locale dependency.


W18. Call Graph Payload Truncation Lacks Precision

Sessions: 3 Files: call-graph.md:30 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The truncation strategy says "truncate payloads larger than 10KB" but doesn't specify: when truncation happens (on write? after call completes?), what preview contains (first N bytes? N characters?), whether 10KB is configurable, or how object storage reference URLs are structured.

Recommendation: Specify: (a) truncation happens on write to DB (in-flight calls have full payloads); (b) preview is the first 1024 bytes of the JSON-serialized payload; (c) make the threshold configurable per operation type or via hub config; (d) defer object storage details but add a placeholder section.


W19. call_graph_nodes.identity Has No FK or Account Linkage

Sessions: 3 Files: call-graph.md:20 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The identity JSONB column stores { id, scopes, resources } as a snapshot, but there's no FK to accounts.id. Querying "all calls made by account X" requires JSONB containment, which is slow without a GIN index.

Recommendation: Add a callerAccountId text column with FK → accounts.id (SET NULL) for efficient querying, or add a GIN index on identity if JSONB queries are the intended access pattern.


W20. mappings Table Overloaded — Three Distinct Relationship Types

Sessions: 4 Files: coordination.md:10-27 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The mappings table stores three conceptually different relationships in one table:

  1. Session → Spoke (where is the session running?)
  2. Session → Parent session (coordination hierarchy)
  3. Session → Task (what work is the session doing?)

All nullable FKs allow any combination, including potentially invalid ones. The table name mappings doesn't convey what's being mapped.

Recommendation: Document the valid column combinations (e.g., sessionId always NOT NULL, taskId only for task-scoped mappings, parentSessionId only for coordinator children). This makes it a polymorphic association table with documented shapes.


W21. detections Table Is Minimal — No Resolution or Deduplication

Sessions: 4 Files: coordination.md:29-39 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

  • No resolution tracking (resolved, acknowledged, false-positive)
  • No deduplication (persistent MODEL_DEGRADATION creates a new row every check interval)
  • No session end correlation
  • anomalyType value set is unclear (open or closed enum?)

Recommendation: Add resolvedAt timestamp column. Add a UNIQUE constraint on (sessionId, anomalyType) with upsert semantics, or document that deduplication is handled at the application level. Specify whether anomalyType is extensible.


💡 Suggestions

Quality-of-life improvements that should be considered but won't block stabilization.


S01. Document accessLevel Change Authorization

Who can change an accounts.accessLevel? Can a user self-promote? Document the assumed invariants even for application-level concerns.


S02. Add Partial Indexes for Common Access Patterns

Several partial indexes would improve performance: active API keys (WHERE revoked_at IS NULL AND enabled = true), connected spokes (WHERE status = 'connected'), non-archived sessions, active tasks (WHERE status IN ('pending', 'in-progress', 'blocked')).


S03. Reserve @alk.dev Email Domain for System Accounts

LLM accounts use fallback addresses like glm-5.1@alk.dev. Document that all *@alk.dev emails are reserved for system-generated accounts and humans must use other domains.


accounts.displayName is not indexed. For UIs with user search/autocomplete, this would require full table scans.


S05. Document API Key Expiration Behavior

Does an expired key return "key expired" or a generic "authentication failed"? Recommend generic response to avoid leaking key state to attackers.


S06. Cross-Reference sessions.accountId in Identity Docs

identity.md:12 lists FK targets but omits sessions.accountId. Add it for completeness.


S07. Define FilePartData Type

sessions.md:132 references FilePartData[] in ToolState but never defines it. Clarify whether it's the same as the file part type's data shape.


S08. Complete AI SDK UIMessage Part Type Mapping

sessions.md:145-152 maps 6 part types but omits step-finish, patch, snapshot, compaction, agent. Document that these are excluded from the UIMessage view, or add mappings.


S09. Document sessions.slug Generation Strategy

Is it human-provided? Auto-generated? Random? This matters for API design and uniqueness enforcement.


S10. Add parts.type Index for Part-Type Queries

A composite index (session_id, type) would support queries like "all tool-call parts in session X" without a full scan. At minimum, document that type queries rely on existing indexes + sequential scan.


S11. Document Whether Parts Are Flat or Nested

The agent part type implies sub-agent delegation, which might need nesting. The current schema has no parentId on parts. Document whether parts are flat or whether nesting might be needed.


W22. parts Table: Missing $onUpdate and NOT NULL on Timestamp Columns

Sessions: 5 Files: sessions.md:99-107, README.md:69-82 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

The parts table defines its own id, metadata, createdAt, and updatedAt instead of using commonCols, but the spec only says "defaults to now()" without specifying NOT NULL or $onUpdate. If the Drizzle implementation omits $onUpdate, parts rows never have updatedAt updated on modification, silently breaking any optimistic concurrency or caching logic. If createdAt/updatedAt are not NOT NULL, they can become NULL.

Recommendation: The parts table spec must explicitly state that createdAt and updatedAt are NOT NULL and that updatedAt includes $onUpdate(() => new Date()). Either replicate these details from commonCols with an explicit override note for id, or reference commonCols with the id exception documented.


S13. Add projectId to mappings for Direct Project-Scoped Queries

Finding all active mappings for a project's tasks requires a JOIN through sessions.projectId or tasks.projectId. A denormalized projectId would simplify this, or document that the JOIN pattern is acceptable.


S14. Document mappings.status Lifecycle Transitions

Unlike tasks.status which has an explicit lifecycle diagram, mappings.status transitions are unspecified. Add a lifecycle diagram or state machine.


S15. Specify Task Enum Values as Drizzle pgEnum

The categorical enum values (scope, risk, impact, level, priority, status) are documented as text strings but not referenced as Drizzle pgEnum types. Specify that these should be pgEnum for type safety, with the decomposer template consuming the same definitions.


S16. Rename taskId to dependentTaskId in task_dependencies

The column name taskId is generic and could be confused as "this task" rather than "the dependent task." Renaming to dependentTaskId makes the direction unmistakable.


S17. Add call_graph_nodes.startedAt Index for Latency Analysis

startedAt is crucial for p99 latency analysis. Consider an index alongside or instead of the suggested createdAt index.


S18. Consider Unique Constraint on call_graph_edges(sourceId, targetId, edgeType)

Nothing prevents duplicate edges between the same two nodes with the same type. A unique constraint prevents silently duplicated edge events from retries/reconnections.


What's Working Well

Strengths identified across all five reviews:

  1. Drizzle-TypeBox pattern — Well-documented and consistently applied. The createSelectSchema/createInsertSchema workflow with JSONB overrides is clear and implementable.

  2. Cross-cutting reference patterntable-reference.md as a single source for cascades, indexes, enums, and relations is an excellent organizational pattern that prevents "hunt through every domain doc" problems.

  3. Nullable categorical fields (ADR-011) — The "not yet assessed" signal via NULL (instead of defaults) is well-reasoned and matches taskgraph's own Option<T> model.

  4. Dual task representation — DB as source of truth, files as authoring surface. The authority model table is excellent and provides clear guidance.

  5. ADR-012 terminology clarification — The account/role/session distinction is clearly motivated and the rename guidance is actionable.

  6. Cascade behavior documentation — Having all FK behaviors in one place with rationale per relationship prevents implementation errors.

  7. Operation specs as capabilities (ADR-006) — Elegant decision. Avoids opaque JSONB blobs, makes capabilities fully typed and queryable. Nullable spoke_id allows hub-native operations to coexist.

  8. Config/secrets separation (ADR-007) — Four-layer model (config schema, config instance, auth schema, auth instance) with different storage strategies is well-structured.

  9. Path semantics for tasks — Replacing parentId with path column for group scoping is clean. Prefix-based queries are intuitive and well-explained.

  10. Partial unique index design — The operation_specs partial indexes correctly handle PostgreSQL's NULL-in-unique-index behavior. The explanation prevents a common pitfall.


Action Plan

Before Stabilization (Must Fix)

Priority Issue Action
🔴1 C01 Resolve NOT NULL + SET NULL contradictions for sessions.accountId and audit_logs.ownerId
🔴2 C02 Resolve ADR-003 vs sessions.md on message IDs — update one or the other
🔴3 C03 Resolve operation_specs delete vs soft-deactivation — choose one, update all docs
🔴4 C04 Document sessionId immutability invariant on messages/parts
🔴5 C05 Document roleName validation strategy (FK or intentional omission)
🔴6 C14 Add missing FK cascade entries to table-reference.md

Before Implementation (Should Fix)

Priority Issue Action
🟡1 C06 Document mappings.task denormalization invariant
🟡2 C07 Define sync field split (authored vs. runtime fields)
🟡3 C08 Specify DB-level concatenation for task.body appends
🟡4 C09 Add DB-level guard for cross-project dependencies
🟡5 C10 Add call_graph_edges indexes and cascade docs
🟡6 C11 Specify key rotation multi-key format and transaction safety
🟡7 C14 Add remaining missing cascade entries
🟡8 W03 Add missing indexes across tables
🟡9 W11 Add cross-table status disambiguation
🟡10 W14 Fix ADR terminology inconsistencies

Before Production (Consider)

Priority Issue Action
💡1 W02 Add account deactivation mechanism
💡2 W08 Add call graph payload redaction
💡3 W09 Define call graph retention policy
💡4 W12 Add sessionId and orgId to audit_logs
💡5 W21 Add resolution tracking to detections
💡6 All S01-S18 Quality-of-life improvements