--- status: active last_updated: 2026-04-21 review_date: 2026-04-21 reviewer: architect (with 5 subagent reviewers) scope: docs/architecture/storage/* + docs/decisions/ADR-001 through ADR-012 resolution: pending --- # Storage Architecture Review: 2026-04-21 Comprehensive review of the storage specification documents (`docs/architecture/storage/`) and related ADRs. Five parallel subagent reviews were conducted, each focused on a domain area. Their findings are consolidated here with deduplication, prioritization, and cross-references. ## Review Sessions (open-memory) | # | Domain | Session ID | |---|--------|------------| | 1 | Identity & Auth | `ses_24f76141effegdhw2bxX2sOvYb` | | 2 | Sessions & Messages | `ses_24f751efbffeyWo9wb6hAnnj0y` | | 3 | Services, Spokes, Call Graph | `ses_24f746ebbffeG4jqN3MbK5i9yt` | | 4 | Tasks & Coordination | `ses_24f7431baffeElbZ3qVHCYQOSv` | | 5 | Cross-Cutting Concerns | `ses_24f735dbcffea1pN0JCgtPdbt2` | ## Documents Reviewed - `docs/architecture/storage/README.md` — common pattern, package structure, open questions - `docs/architecture/storage/table-reference.md` — cross-cutting reference (cascades, indexes, enums, relations) - `docs/architecture/storage/identity.md` — accounts, organizations, organization_members, api_keys, audit_logs - `docs/architecture/storage/projects.md` — projects, workspaces - `docs/architecture/storage/sessions.md` — sessions, messages, parts - `docs/architecture/storage/roles.md` — roles - `docs/architecture/storage/services.md` — clients, client_secrets - `docs/architecture/storage/spokes.md` — spokes, operation_specs - `docs/architecture/storage/call-graph.md` — call_graph_nodes, call_graph_edges - `docs/architecture/storage/coordination.md` — mappings, detections - `docs/architecture/storage/tasks.md` — tasks, task_dependencies - `docs/decisions/ADR-001` through `ADR-012` ## Summary Statistics | Severity | Count | |----------|-------| | 🔴 Critical | 14 | | 🟡 Warning | 22 | | 💡 Suggestion | 17 | --- ## 🔴 Critical Issues Issues that must be resolved before the storage spec is stabilized. Each represents a concrete inconsistency, data integrity risk, or ambiguity that would cause implementation divergence. --- ### C01. `NOT NULL` + `onDelete: SET NULL` — Contradictory Constraints **Sessions**: 1, 2, 5 **Files**: `sessions.md:17`, `identity.md:112`, `table-reference.md:80-71` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`, `ses_24f751efbffeyWo9wb6hAnnj0y`, `ses_24f735dbcffea1pN0JCgtPdbt2` Two FK columns are declared `NOT NULL` but have `onDelete: SET NULL`. PostgreSQL will reject the DELETE because it cannot nullify a NOT NULL column: 1. **`sessions.accountId`** — `text NOT NULL` (`sessions.md:17`) with `onDelete: SET NULL` (`table-reference.md:80`). Deleting an account that owns sessions fails. 2. **`audit_logs.ownerId`** — `text NOT NULL` (`identity.md:112`) with `onDelete: SET NULL` (`table-reference.md:71`). Deleting an account that has audit entries fails. **Recommendation**: For each, choose one: - Make the column **nullable** (if detaching on delete is desired) - Change cascade to **RESTRICT** (if the FK must always be populated — blocks account deletion) - Change cascade to **CASCADE** (if deleting dependent records is acceptable) - Add application-level logic that reassigns/destroys dependents before account deletion For `audit_logs.ownerId`, RESTRICT may be correct — audit trails should prevent account deletion. For `sessions.accountId`, nullable is likely correct — orphaned sessions (account deleted) are still valuable data. --- ### C02. ADR-003 vs `sessions.md` on Message IDs **Sessions**: 2, 5 **Files**: `ADR-003`, `sessions.md:42-46`, `table-reference.md:48` **Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y`, `ses_24f735dbcffea1pN0JCgtPdbt2` ADR-003 explicitly states: *"Parts and messages tables use sortable timestamp-based IDs instead of commonCols.id."* However, `sessions.md` defines the `messages` table using `commonCols` (which provides UUIDv4 via `crypto.randomUUID()`). Only `parts` explicitly uses sortable IDs. `table-reference.md` only mentions parts for sortable IDs. This is a three-way inconsistency: ADR says both tables, sessions.md does one, table-reference says one. Message ordering is semantically important (the composite index `idx_messages_session_id_created_at_id` on `(session_id, created_at, id)` relies on `created_at` for ordering, making UUIDv4 sortable IDs unnecessary — but this contradicts ADR-003's stated rationale). **Recommendation**: Either: - (A) Update `messages` table to use sortable IDs (consistent with ADR-003, eliminates dependency on `created_at` for ordering), **or** - (B) Amend ADR-003 to state that only `parts` uses sortable IDs, and `messages` relies on the `(session_id, created_at, id)` composite index --- ### C03. Operation Specs: Delete vs. Soft-Deactivation Unresolved **Sessions**: 3, 5 **Files**: `spokes.md:66`, `table-reference.md:67`, `README.md` Open Question #2 **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`, `ses_24f735dbcffea1pN0JCgtPdbt2` The spoke disconnect lifecycle has three conflicting positions: - `spokes.md:66`: "Removes the spoke's `operation_specs` rows **(or marks inactive)**" — ambiguous - `table-reference.md:67`: `operation_specs.spokeId → spokes.id` with **CASCADE** delete - `README.md` Open Question #2: "DELETE aligns with the ephemeral spoke model" for now The `operation_specs` table has **no** `active` or `status` column to support soft-deactivation. Crucially, spoke rows are **never deleted** — they're only marked `status: "disconnected"`. This means the CASCADE FK never fires, and there's no mechanism to clean up operation_specs on disconnect. The operation_specs rows remain pointing to a disconnected spoke with no way to deprecate them. **Recommendation**: Resolve decisively: - **(A) Hard delete on disconnect**: Add explicit cleanup in the disconnect handler. Remove "or marks inactive" from spokes.md. CASCADE only applies to rare admin spoke-row deletion. - **(B) Add active/status column to operation_specs**: Support soft-deactivation. Update cascade rationale. This preserves the operation registry for audit/reconnection but adds schema complexity. Option A aligns with the ephemeral spoke model. Option B supports spoke reconnection. Choose one and update all documents. --- ### C04. `parts.sessionId` Denormalization: No Enforcement Mechanism **Sessions**: 2 **Files**: `sessions.md:96`, `sessions.md:105` **Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y` The stated invariant: *"when inserting a part, always set `sessionId` to the message's `sessionId`. Never update `messages.sessionId` without updating all child parts."* However: - No DB trigger enforces this - No application-level transaction pattern is documented - No CHECK constraint exists - If `messages.sessionId` could change, there's a race condition window **Recommendation**: Document that `sessionId` on both `messages` and `parts` is **immutable after creation** (which eliminates the update problem). Define the application-level contract for part insertion: read the message's `sessionId` and set it on the part within the same transaction. Add an explicit "IMMUTABLE" note to the `sessionId` column in `sessions.md`. --- ### C05. `sessions.roleName` — No FK, No Validation Strategy Documented **Sessions**: 2 **Files**: `sessions.md:26`, `table-reference.md:100-101`, `roles.md` **Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y` `sessions.roleName` is bare `text` with no FK to `roles.name` and no documented reason why. Is this intentional (to support file-based roles in Phase 1)? What happens if the role name has a typo? What about sessions referencing a role that was deleted? **Recommendation**: Either: - (A) Add `FK → roles.name` with `onDelete: SET NULL` (role deletions detach sessions), **or** - (B) Document why the FK is intentionally omitted: "role definitions may come from `.opencode/agents/*.md` files before DB sync; application-level validation checks against known role names at session creation time." --- ### C06. `mappings.task` Denormalized Column: No Sync Strategy **Sessions**: 4 **Files**: `coordination.md:22`, `tasks.md:209` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` The `mappings` table has both `taskId` (FK → tasks.id) and `task` (denormalized display name). No mechanism keeps them in sync. If `taskId` points to a task whose `slug` or `name` changes, `mappings.task` becomes stale. When `taskId` is SET NULL (task deleted), what happens to `task`? **Recommendation**: Document the invariant: "`mappings.task` is set to `tasks.slug` at insert time and is **not** automatically updated when the task's slug changes. When `taskId` is SET NULL (task deleted), `task` should also be SET NULL. This is a cache, not a source of truth." Alternatively, remove the denormalized column and use a VIEW that joins. --- ### C07. Sync vs. Runtime Field Conflict in Tasks **Sessions**: 4 **Files**: `tasks.md:296-325` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` The task sync does a full upsert, but the Authority Model says runtime status mutations go through `hub.task.updateStatus`. If sync blindly writes frontmatter `status`, it can clobber runtime state. Example: 1. Agent sets `task.status = 'in-progress'` via `hub.task.updateStatus` 2. Decomposer edits the task file (still has `status: pending`) 3. Sync runs and upserts the task — overwrites `in-progress` back to `pending` **Recommendation**: Define the sync field split explicitly: "Sync upserts **authored fields** (slug, name, path, scope, risk, impact, level, priority, tags, assignee, due, body, fileCreatedAt, fileModifiedAt, depends_on) and must **not overwrite runtime-managed fields** (status, startedAt, completedAt). Runtime fields are only mutated via `hub.task.*` operations." Update the sync flow specification in tasks.md. --- ### C08. Concurrent `task.body` Appends: No Collision Handling **Sessions**: 4 **Files**: `tasks.md:249-266` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` `hub.task.addNote` appends a timestamped note section to `body`. In a multi-agent system, read-modify-write is a race condition: Agent A reads body, Agent B reads body, both append, Agent A writes, Agent B overwrites A's addition. The spec says "This is simple" — it is not simple under concurrency. **Recommendation**: Specify the concurrency model: `hub.task.addNote` must use DB-level concatenation (`UPDATE tasks SET body = body || $note WHERE id = $taskId`), not a read-modify-write cycle. Or use optimistic locking with `updatedAt`. Document this explicitly in the `addNote` specification. --- ### C09. Cross-Project Dependency Constraint: No DB Enforcement **Sessions**: 4 **Files**: `tasks.md:217`, `tasks.md:357` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` "Tasks can only depend on tasks within the same project" is declared but only "enforced at the application level." `task_dependencies` has FK columns with no `projectId` column or check constraint. Application-level enforcement is vulnerable to race conditions, direct SQL access, or bugs. **Recommendation**: At minimum, add a DB-level guard. Options: - (A) Add a trigger that checks `dependsOnTaskId` and `taskId` belong to the same project - (B) Add a denormalized `projectId` column to `task_dependencies` with a composite FK - (C) Document the risk explicitly and specify that the sync operation validates project scope within a transaction (SELECT FOR SHARE) --- ### C10. Call Graph Edges: Missing Indexes and Cascade Documentation **Sessions**: 3, 5 **Files**: `call-graph.md:32-41`, `table-reference.md` (missing) **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`, `ses_24f735dbcffea1pN0JCgtPdbt2` `call_graph_edges` has **no indexes** and **no cascade entries** in `table-reference.md`. Both `sourceId` and `targetId` reference `call_graph_nodes.id` with CASCADE (implied by domain doc), but this is undocumented. Without indexes, graph traversal queries (find children, find parents) will require sequential scans. Additionally, the relationship between `call_graph_nodes.parentRequestId` and `call_graph_edges` is ambiguous: do they store the same parent-child relationship redundantly, or serve different purposes? **Recommendation**: - Add indexes: `idx_call_graph_edges_source_id` on `(sourceId)`, `idx_call_graph_edges_target_id` on `(targetId)`. Consider unique on `(sourceId, targetId, edgeType)` to prevent duplicates. - Add cascade entries to `table-reference.md` for both FKs (CASCADE). - Clarify `parentRequestId` vs `call_graph_edges`: document whether `parentRequestId` is a convenience shortcut or redundant with edges. --- ### C11. Secret Key Rotation: Underspecified **Sessions**: 3 **Files**: `services.md:94-97`, `ADR-008` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` Lazy re-encryption is insufficiently specified for a security-critical operation: 1. **Multi-key storage**: `HUB_ENCRYPTION_KEY` (singular env var) — how are old and new keys stored simultaneously during rotation? 2. **Re-encryption transaction**: If the process crashes between decrypt and re-encrypt-update, is the secret left in the old key version? 3. **Old key unavailability**: What happens if a secret with `keyVersion=1` is accessed after the old key is removed? Permanent data loss with no documented handling. 4. **No background sweep**: Old-key-version secrets persist indefinitely until accessed. If the old key is compromised, those secrets remain vulnerable. **Recommendation**: - Specify multi-key storage: e.g., `HUB_ENCRYPTION_KEYS=v1:base64key,v2:base64key` or a key file - Document the re-encryption transaction: decrypt → encrypt → UPDATE in a single DB transaction, with crash-safety note - Add a warning about the vulnerability window (old-key secrets not yet re-encrypted) - Specify whether a background re-encryption sweep is needed or deferred --- ### C12. Client Config Schema Validation: Timing and Evolution Ambiguous **Sessions**: 3 **Files**: `services.md:19`, `ADR-007`, `README.md` Open Question #10 **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` "Validated against the TypeBox schema for this type **on write**" is ambiguous: 1. Who validates? Drizzle insert schema? API handler? DB trigger? Direct SQL bypasses application validation. 2. Schema evolution: when code deployment changes a client type's TypeBox schema, existing DB rows may become invalid under the new schema. 3. No re-validation on read is documented. **Recommendation**: - Specify: "validate on write (API handler layer) + warn on read (start-up validation pass with logging, not blocking)" - Document the schema evolution contract: new fields MUST be `Type.Optional()`; breaking changes MUST use a new client `type` string (e.g., `llm-provider-v2`) - Consider a `configSchemaVersion` in `metadata` tracking which schema version validated the config --- ### C13. Dual Ownership Model for Organizations: Undefined **Sessions**: 1 **Files**: `identity.md:44` (ownerId), `identity.md:58` (membershipLevel: "owner") **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` Two competing ownership concepts with no documented relationship: 1. `organizations.ownerId` — a single FK to one account 2. `organization_members.membershipLevel: "owner"` — can exist on multiple rows Can `ownerId` point to an account with `membershipLevel: "member"` (not "owner")? Can an org have zero members with `membershipLevel: "owner"` but a non-null `ownerId`? An implementer cannot determine which field is authoritative for ownership queries. **Recommendation**: Document the invariant. E.g.: "`ownerId` is always a member with `membershipLevel: 'owner'` (enforced by app logic). If all owner-level members are removed, `ownerId` must be transferred first." Or: "`ownerId` is the creator; `membershipLevel: 'owner'` is a separate authorization concept." --- ### C14. Missing FK Cascade Entries in `table-reference.md` **Sessions**: 5 **Files**: `table-reference.md:53-83` **Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2` The following FK relationships are documented in per-domain docs but **absent** from the cascade reference table: | Missing Relationship | Source Doc | |---|---| | `mappings.workspaceId → workspaces.id` | coordination.md:19 | | `detections.sessionId → sessions.id` | coordination.md:36 | | `call_graph_edges.sourceId → call_graph_nodes.id` | call-graph.md:39 | | `call_graph_edges.targetId → call_graph_nodes.id` | call-graph.md:41 | | `api_keys.rotatedToId → api_keys.id` | identity.md:80 | Without documented cascade behavior, PostgreSQL defaults to `RESTRICT`, which may not be the intended behavior for all of these. **Recommendation**: Add all missing FK entries to the cascade table with explicit `onDelete` behavior. For the `rotatedToId` FK specifically: SET NULL (old key keeps its data but rotation link is broken if new key is deleted). --- ## 🟡 Warnings Issues that should be resolved if possible. They represent gaps in documentation, suboptimal designs, or inconsistencies that could cause confusion. --- ### W01. Dual JSONB Overlap: `commonCols.metadata` vs Per-Table `data` **Sessions**: 1 **Files**: `identity.md:85-88`, `identity.md:23`, `README.md:73` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` Two overlapping JSONB columns exist on some tables with no documented boundary: - `commonCols.metadata` — present on every table, `Record` - Per-table `data` columns — domain-specific data (e.g., `accounts.data`, `organizations.data`) For `api_keys`, keypal stores `scopes`, `resources`, and `tags` **inside `commonCols.metadata`**. For `accounts`, both `data` ("preferences, avatar URL") and `metadata` (arbitrary) exist with overlapping purposes and no split documentation. **Recommendation**: Document the boundary: "`data` holds structured domain-specific data with known TypeScript types. `metadata` holds opaque key-value pairs for subsystem use, with a namespacing convention (e.g., `metadata._keypal.scopes`). Never mix domain data into `metadata`." --- ### W02. No Account Deactivation Mechanism **Sessions**: 1 **Files**: `identity.md` (accounts table) **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` The `accounts` table has no `enabled`/`suspended` column. Combined with `organizations.ownerId → RESTRICT`, an org owner's account cannot be deleted. But there's also no way to deactivate it when an employee leaves. **Recommendation**: Add an `enabled` boolean (consistent with `api_keys.enabled` and `clients.enabled`), or a `status` column (`active`/`suspended`/`deactivated`). Document the interaction with cascade constraints. --- ### W03. Missing Indexes Across Many Tables **Sessions**: 1, 2, 3, 5 **Files**: `table-reference.md:87-145`, per-domain docs **Open-memory**: All sessions (consensus finding) Multiple tables have FK columns or common query patterns without supporting indexes: | Table | Missing Index | Purpose | |---|---|---| | `sessions` | `unq_sessions_slug` in index ref | UNIQUE constraint not listed (unlike other UNIQUEs) | | `sessions` | `idx_sessions_parent_id` on `(parentId)` | Find child sessions of coordinator | | `projects` | `idx_projects_org_id` on `(orgId)` | Find projects for an org | | `workspaces` | `idx_workspaces_project_id` on `(projectId)` | Find workspaces for a project | | `spokes` | `idx_spokes_name` on `(name)` | Look up spoke by name | | `detections` | `idx_detections_session_id` on `(sessionId)` | Find detections for a session (no indexes at all) | | `call_graph_nodes` | `idx_call_graph_nodes_created_at` on `(createdAt)` | Time-range queries | | `call_graph_nodes` | `idx_call_graph_nodes_operation_created` on `(operationId, createdAt)` | Operation + time queries | | `call_graph_edges` | `idx_call_graph_edges_source_id` on `(sourceId)` | Graph traversal (children) | | `call_graph_edges` | `idx_call_graph_edges_target_id` on `(targetId)` | Graph traversal (parents) | | `mappings` | `idx_mappings_workspace_id` on `(workspaceId)` | Workspace-scoped mapping queries | Also: `idx_api_keys_key_hash` (B-tree) is redundant with `unq_api_keys_key_hash` (UNIQUE). Postgres automatically creates an index for UNIQUE constraints. **Recommendation**: Add all missing indexes to `table-reference.md` and relevant per-domain docs. Remove the redundant `idx_api_keys_key_hash`. --- ### W04. `operation_specs` Pre-Remap vs. Post-Remap Namespace Ambiguity **Sessions**: 3 **Files**: `spokes.md:51-55`, `spoke-runner.md:62` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` Do `operation_specs.namespace` and `operation_specs.name` store the original spoke identifiers (pre-remap, e.g., `dev.fs.read`) or the remapped hub identifiers (post-remap, e.g., `dev.{spokeId}.fs.read`)? The spoke-runner.md says the hub remaps spoke operations into a hub namespace, but the operation_specs storage format is never specified. If pre-remap: two spokes registering `dev.fs.read` creates ambiguity without joining on `spokeId`. If post-remap: the partial unique indexes may be over-constraining since the spoke-specific namespace prefix makes `spokeId` redundant for uniqueness. **Recommendation**: Explicitly document which identifiers are stored. If pre-remap, document how callers resolve ambiguity. If post-remap, adjust the uniqueness rationale. --- ### W05. `call_graph_edges.edgeType` Semantics Undefined **Sessions**: 3 **Files**: `call-graph.md:41` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` Three edge types are listed (`triggered`, `depends_on`, `requested_by`) but none are explained. The call-graph architecture doc only discusses parent-child relationships (triggered). `depends_on` and `requested_by` are novel and undocumented. Are these exhaustive or extensible? **Recommendation**: Document each edge type's semantics in `call-graph.md`, or state that `edgeType` is an extensible text field with these three initial values and define what each means. --- ### W06. `spokes.status` Missing `reconnecting` State **Sessions**: 3 **Files**: `spokes.md:18`, `spoke-runner.md:130-136` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` The spoke status enum is `connected`, `disconnected`. The spoke-runner.md describes a reconnection flow, but there's no intermediate state for "reconnecting." When a spoke's WebSocket drops, it shows `disconnected` — indistinguishable from a permanently offline spoke. **Recommendation**: Add `reconnecting` to the spoke status enum, or document that reconnection is handled at the application layer (WebSocket reconnect timer) without a DB state change. --- ### W07. `client_secrets.keyVersion` Redundancy **Sessions**: 3 **Files**: `services.md:71`, `services.md:82-86` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` `client_secrets` has both a standalone `keyVersion` column (integer NOT NULL DEFAULT 1) AND `keyVersion` embedded in the `value` JSONB (`EncryptedData.keyVersion`). These can diverge with no documented invariant. **Recommendation**: Either remove the standalone column (read from `value.keyVersion`), or document that the standalone column is authoritative and they must be kept in sync. --- ### W08. Call Graph Payload Security **Sessions**: 3 **Files**: `call-graph.md:22-23` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` The `input` and `output` JSONB columns store full call payloads. Operations like `hub.register` (which receives auth tokens) would store API keys and secrets in cleartext. The truncation strategy (10KB) addresses size, not sensitive data. No redaction is mentioned. **Recommendation**: Add a section on sensitive data handling. Options: - Operation handlers mark certain fields as redacted - The call graph writer applies field-level redaction by convention (fields named `password`, `token`, `secret`, `key`) - The truncation strategy is extended with a redaction pass --- ### W09. No Call Graph Retention Policy **Sessions**: 3, 4 **Files**: `call-graph.md` (absent), `README.md` Open Question #5 **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` Call graph data grows unboundedly. Every operation invocation creates a node and edges. CASCADE handles cleanup when a node is deleted, but nothing deletes old nodes. README.md acknowledges this as Open Question #5. **Recommendation**: Specify the intended approach: TTL-based deletion, archival to cold storage, or aggregation + deletion. Even a "v1: manual cleanup, v2: automatic TTL" notation helps. --- ### W10. `sessions.version` Column: Unspecified **Sessions**: 2 **Files**: `sessions.md:24`, `README.md` Open Question #1 **Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y` The `version` column is `text NOT NULL` with description "Schema version (opencode compat)" but: - No valid values listed - No default documented for hub-direct sessions vs opencode imports - No versioning scheme defined - README.md Open Question #1 asks whether to version `data` columns — this is unresolved **Recommendation**: Define initial version value (e.g., `"1"`), document what `version` governs (the `data` JSONB shape? the message/parts schema? opencode compatibility only?), and specify the default for hub-direct sessions. --- ### W11. Overlapping Status Enums Without Cross-Table Disambiguation **Sessions**: 4, 5 **Files**: `table-reference.md:147-164`, `coordination.md:23`, `tasks.md:84-86` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`, `ses_24f735dbcffea1pN0JCgtPdbt2` Three tables have `status` with overlapping values: | Table | Shared Values | Unique Values | |---|---|---| | `mappings` | `completed`, `failed`, `aborted` | `active` | | `call_graph_nodes` | `completed`, `failed`, `aborted` | `pending`, `running` | | `tasks` | `completed`, `failed` | `pending`, `in-progress`, `blocked` | `table-reference.md:164` only contrasts `mappings.active` vs `call_graph_nodes.pending/running`. It does NOT contrast `tasks` statuses with the others. `mappings.completed` and `tasks.completed` mean different things (mapping workflow completion vs task completion). **Recommendation**: Add cross-table state mapping documentation. When a task goes `in-progress`, there should be an active mapping; when a task is `completed`, the mapping becomes `completed`. Document valid combinations. --- ### W12. Audit Logs Missing Session and Org Context **Sessions**: 1 **Files**: `identity.md:103-117` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` `audit_logs` has `ownerId` and `keyId` but no `sessionId` or `orgId`. For LLM accounts that fill roles in sessions, the session correlation is a significant traceability gap. Multi-tenant auditing requires org filtering. **Recommendation**: Add `sessionId` (nullable FK → sessions.id, SET NULL) and `orgId` (nullable FK → organizations.id, SET NULL). Expand `action` types to cover account, membership, and organization lifecycle events — or document the `action` enum as extensible. --- ### W13. API Key Hashing (SHA-256) Trade-Off Undocumented **Sessions**: 1 **Files**: `identity.md:74`, `ADR-010` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` API keys are bearer tokens stored as SHA-256 hashes. SHA-256 is a fast hash, not a deliberately slow KDF (bcrypt/Argon2). If the database is compromised, SHA-256 hashes can be brute-forced orders of magnitude faster than slow hashes. However, API keys are high-entropy machine-generated strings (128-bit+), making brute-force infeasible even with a fast hash. No ADR documents this trade-off. **Recommendation**: Add documentation: "API keys are high-entropy random strings (128-bit+), making brute-force infeasible even with a fast hash. SHA-256 was chosen for O(1) verification latency at high throughput. This is acceptable because API keys are machine-generated, unlike human-chosen passwords." --- ### W14. ADR Terminology Inconsistencies **Sessions**: 1 **Files**: `ADR-009:13`, `ADR-012:55`, `agent-roles.md` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb` - ADR-009 says "organization_members (membership with **roles**)" — contradicts ADR-012's rename to `membershipLevel` - ADR-012 itself uses `accounts.role: "service"` in its rationale, despite mandating the rename to `accessLevel` - `agent-roles.md` also uses `accounts.role: "service"` **Recommendation**: Update ADR-009 to say "membership with levels." Update ADR-012:55 and agent-roles.md to use `accounts.accessLevel: "service"`. --- ### W15. Resolved Open Questions Still Listed as Open in README **Sessions**: 5 **Files**: `README.md:197-225` **Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2` Several open questions are resolved by per-domain docs or ADRs but remain listed as open: - **Q2** (operation spec cleanup): Resolved — DELETE aligns with ephemeral spoke model (spokes.md, table-reference.md CASCADE) - **Q4** (workspaces vs. directories): Marked as "Resolved" in the list but still present - **Q14** (`accounts.role` → `accessLevel`): Renamed in identity.md, referenced in ADR-012 **Recommendation**: Move resolved items to a "Resolved Decisions" section with cross-references to the resolving documents. --- ### W16. `organizations.ownerId` RESTRICT: No Deletion/Transfer Workflow **Sessions**: 1, 5 **Files**: `identity.md:44`, `table-reference.md:56` **Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`, `ses_24f735dbcffea1pN0JCgtPdbt2` RESTRICT prevents deletion of accounts that own organizations, but no ownership transfer mechanism is documented. **Recommendation**: Add a note: "Before deleting an account, transfer all owned organizations via `org.transferOwnership` operation." Document the transfer pattern in identity.md or coordination.md. --- ### W17. Path LIKE Queries May Not Use B-Tree Indexes in PostgreSQL **Sessions**: 4 **Files**: `tasks.md:83`, `tasks.md:101` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` `WHERE path LIKE 'implementation/%'` can use a B-tree index **only with the `C` locale or `text_pattern_ops`**. With the default locale, LIKE pattern matching may not use the index. **Recommendation**: Specify that the `path` index should use `text_pattern_ops` (`CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)`) or document the locale dependency. --- ### W18. Call Graph Payload Truncation Lacks Precision **Sessions**: 3 **Files**: `call-graph.md:30` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` The truncation strategy says "truncate payloads larger than 10KB" but doesn't specify: when truncation happens (on write? after call completes?), what `preview` contains (first N bytes? N characters?), whether 10KB is configurable, or how object storage reference URLs are structured. **Recommendation**: Specify: (a) truncation happens on write to DB (in-flight calls have full payloads); (b) `preview` is the first 1024 bytes of the JSON-serialized payload; (c) make the threshold configurable per operation type or via hub config; (d) defer object storage details but add a placeholder section. --- ### W19. `call_graph_nodes.identity` Has No FK or Account Linkage **Sessions**: 3 **Files**: `call-graph.md:20` **Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt` The `identity` JSONB column stores `{ id, scopes, resources }` as a snapshot, but there's no FK to `accounts.id`. Querying "all calls made by account X" requires JSONB containment, which is slow without a GIN index. **Recommendation**: Add a `callerAccountId` text column with FK → accounts.id (SET NULL) for efficient querying, or add a GIN index on `identity` if JSONB queries are the intended access pattern. --- ### W20. `mappings` Table Overloaded — Three Distinct Relationship Types **Sessions**: 4 **Files**: `coordination.md:10-27` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` The `mappings` table stores three conceptually different relationships in one table: 1. Session → Spoke (where is the session running?) 2. Session → Parent session (coordination hierarchy) 3. Session → Task (what work is the session doing?) All nullable FKs allow any combination, including potentially invalid ones. The table name `mappings` doesn't convey what's being mapped. **Recommendation**: Document the valid column combinations (e.g., `sessionId` always NOT NULL, `taskId` only for task-scoped mappings, `parentSessionId` only for coordinator children). This makes it a polymorphic association table with documented shapes. --- ### W21. `detections` Table Is Minimal — No Resolution or Deduplication **Sessions**: 4 **Files**: `coordination.md:29-39` **Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv` - No resolution tracking (resolved, acknowledged, false-positive) - No deduplication (persistent `MODEL_DEGRADATION` creates a new row every check interval) - No session end correlation - `anomalyType` value set is unclear (open or closed enum?) **Recommendation**: Add `resolvedAt` timestamp column. Add a UNIQUE constraint on `(sessionId, anomalyType)` with upsert semantics, or document that deduplication is handled at the application level. Specify whether `anomalyType` is extensible. --- ## 💡 Suggestions Quality-of-life improvements that should be considered but won't block stabilization. --- ### S01. Document `accessLevel` Change Authorization Who can change an `accounts.accessLevel`? Can a `user` self-promote? Document the assumed invariants even for application-level concerns. --- ### S02. Add Partial Indexes for Common Access Patterns Several partial indexes would improve performance: active API keys (`WHERE revoked_at IS NULL AND enabled = true`), connected spokes (`WHERE status = 'connected'`), non-archived sessions, active tasks (`WHERE status IN ('pending', 'in-progress', 'blocked')`). --- ### S03. Reserve `@alk.dev` Email Domain for System Accounts LLM accounts use fallback addresses like `glm-5.1@alk.dev`. Document that all `*@alk.dev` emails are reserved for system-generated accounts and humans must use other domains. --- ### S04. Consider `displayName` Index for User Search `accounts.displayName` is not indexed. For UIs with user search/autocomplete, this would require full table scans. --- ### S05. Document API Key Expiration Behavior Does an expired key return "key expired" or a generic "authentication failed"? Recommend generic response to avoid leaking key state to attackers. --- ### S06. Cross-Reference `sessions.accountId` in Identity Docs `identity.md:12` lists FK targets but omits `sessions.accountId`. Add it for completeness. --- ### S07. Define `FilePartData` Type `sessions.md:132` references `FilePartData[]` in ToolState but never defines it. Clarify whether it's the same as the `file` part type's data shape. --- ### S08. Complete AI SDK UIMessage Part Type Mapping `sessions.md:145-152` maps 6 part types but omits `step-finish`, `patch`, `snapshot`, `compaction`, `agent`. Document that these are excluded from the UIMessage view, or add mappings. --- ### S09. Document `sessions.slug` Generation Strategy Is it human-provided? Auto-generated? Random? This matters for API design and uniqueness enforcement. --- ### S10. Add `parts.type` Index for Part-Type Queries A composite index `(session_id, type)` would support queries like "all tool-call parts in session X" without a full scan. At minimum, document that `type` queries rely on existing indexes + sequential scan. --- ### S11. Document Whether Parts Are Flat or Nested The `agent` part type implies sub-agent delegation, which might need nesting. The current schema has no `parentId` on parts. Document whether parts are flat or whether nesting might be needed. --- ### W22. `parts` Table: Missing `$onUpdate` and `NOT NULL` on Timestamp Columns **Sessions**: 5 **Files**: `sessions.md:99-107`, `README.md:69-82` **Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2` The `parts` table defines its own `id`, `metadata`, `createdAt`, and `updatedAt` instead of using `commonCols`, but the spec only says "defaults to `now()`" without specifying `NOT NULL` or `$onUpdate`. If the Drizzle implementation omits `$onUpdate`, parts rows never have `updatedAt` updated on modification, silently breaking any optimistic concurrency or caching logic. If `createdAt`/`updatedAt` are not `NOT NULL`, they can become NULL. **Recommendation**: The `parts` table spec must explicitly state that `createdAt` and `updatedAt` are `NOT NULL` and that `updatedAt` includes `$onUpdate(() => new Date())`. Either replicate these details from `commonCols` with an explicit override note for `id`, or reference `commonCols` with the `id` exception documented. --- ### S13. Add `projectId` to `mappings` for Direct Project-Scoped Queries Finding all active mappings for a project's tasks requires a JOIN through `sessions.projectId` or `tasks.projectId`. A denormalized `projectId` would simplify this, or document that the JOIN pattern is acceptable. --- ### S14. Document `mappings.status` Lifecycle Transitions Unlike `tasks.status` which has an explicit lifecycle diagram, `mappings.status` transitions are unspecified. Add a lifecycle diagram or state machine. --- ### S15. Specify Task Enum Values as Drizzle `pgEnum` The categorical enum values (`scope`, `risk`, `impact`, `level`, `priority`, `status`) are documented as text strings but not referenced as Drizzle `pgEnum` types. Specify that these should be `pgEnum` for type safety, with the decomposer template consuming the same definitions. --- ### S16. Rename `taskId` to `dependentTaskId` in `task_dependencies` The column name `taskId` is generic and could be confused as "this task" rather than "the dependent task." Renaming to `dependentTaskId` makes the direction unmistakable. --- ### S17. Add `call_graph_nodes.startedAt` Index for Latency Analysis `startedAt` is crucial for p99 latency analysis. Consider an index alongside or instead of the suggested `createdAt` index. --- ### S18. Consider Unique Constraint on `call_graph_edges(sourceId, targetId, edgeType)` Nothing prevents duplicate edges between the same two nodes with the same type. A unique constraint prevents silently duplicated edge events from retries/reconnections. --- ## ✅ What's Working Well Strengths identified across all five reviews: 1. **Drizzle-TypeBox pattern** — Well-documented and consistently applied. The `createSelectSchema`/`createInsertSchema` workflow with JSONB overrides is clear and implementable. 2. **Cross-cutting reference pattern** — `table-reference.md` as a single source for cascades, indexes, enums, and relations is an excellent organizational pattern that prevents "hunt through every domain doc" problems. 3. **Nullable categorical fields (ADR-011)** — The "not yet assessed" signal via NULL (instead of defaults) is well-reasoned and matches taskgraph's own `Option` model. 4. **Dual task representation** — DB as source of truth, files as authoring surface. The authority model table is excellent and provides clear guidance. 5. **ADR-012 terminology clarification** — The account/role/session distinction is clearly motivated and the rename guidance is actionable. 6. **Cascade behavior documentation** — Having all FK behaviors in one place with rationale per relationship prevents implementation errors. 7. **Operation specs as capabilities (ADR-006)** — Elegant decision. Avoids opaque JSONB blobs, makes capabilities fully typed and queryable. Nullable `spoke_id` allows hub-native operations to coexist. 8. **Config/secrets separation (ADR-007)** — Four-layer model (config schema, config instance, auth schema, auth instance) with different storage strategies is well-structured. 9. **Path semantics for tasks** — Replacing `parentId` with `path` column for group scoping is clean. Prefix-based queries are intuitive and well-explained. 10. **Partial unique index design** — The `operation_specs` partial indexes correctly handle PostgreSQL's NULL-in-unique-index behavior. The explanation prevents a common pitfall. --- ## Action Plan ### Before Stabilization (Must Fix) | Priority | Issue | Action | |---|---|---| | 🔴1 | C01 | Resolve NOT NULL + SET NULL contradictions for `sessions.accountId` and `audit_logs.ownerId` | | 🔴2 | C02 | Resolve ADR-003 vs sessions.md on message IDs — update one or the other | | 🔴3 | C03 | Resolve operation_specs delete vs soft-deactivation — choose one, update all docs | | 🔴4 | C04 | Document sessionId immutability invariant on messages/parts | | 🔴5 | C05 | Document roleName validation strategy (FK or intentional omission) | | 🔴6 | C14 | Add missing FK cascade entries to table-reference.md | ### Before Implementation (Should Fix) | Priority | Issue | Action | |---|---|---| | 🟡1 | C06 | Document mappings.task denormalization invariant | | 🟡2 | C07 | Define sync field split (authored vs. runtime fields) | | 🟡3 | C08 | Specify DB-level concatenation for task.body appends | | 🟡4 | C09 | Add DB-level guard for cross-project dependencies | | 🟡5 | C10 | Add call_graph_edges indexes and cascade docs | | 🟡6 | C11 | Specify key rotation multi-key format and transaction safety | | 🟡7 | C14 | Add remaining missing cascade entries | | 🟡8 | W03 | Add missing indexes across tables | | 🟡9 | W11 | Add cross-table status disambiguation | | 🟡10 | W14 | Fix ADR terminology inconsistencies | ### Before Production (Consider) | Priority | Issue | Action | |---|---|---| | 💡1 | W02 | Add account deactivation mechanism | | 💡2 | W08 | Add call graph payload redaction | | 💡3 | W09 | Define call graph retention policy | | 💡4 | W12 | Add sessionId and orgId to audit_logs | | 💡5 | W21 | Add resolution tracking to detections | | 💡6 | All S01-S18 | Quality-of-life improvements |