alkdev/hub

Files

glm-5.1 2b63cda1c7 Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts

Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.

2026-05-25 10:56:32 +00:00

41 KiB

Raw Permalink Blame History

status, last_updated, review_date, reviewer, scope, resolution

status	last_updated	review_date	reviewer	scope	resolution
active	2026-04-21	2026-04-21	architect (with 5 subagent reviewers)	docs/architecture/storage/* + docs/decisions/ADR-001 through ADR-012	pending

Storage Architecture Review: 2026-04-21

Comprehensive review of the storage specification documents (docs/architecture/storage/) and related ADRs. Five parallel subagent reviews were conducted, each focused on a domain area. Their findings are consolidated here with deduplication, prioritization, and cross-references.

Review Sessions (open-memory)

#	Domain	Session ID
1	Identity & Auth	`ses_24f76141effegdhw2bxX2sOvYb`
2	Sessions & Messages	`ses_24f751efbffeyWo9wb6hAnnj0y`
3	Services, Spokes, Call Graph	`ses_24f746ebbffeG4jqN3MbK5i9yt`
4	Tasks & Coordination	`ses_24f7431baffeElbZ3qVHCYQOSv`
5	Cross-Cutting Concerns	`ses_24f735dbcffea1pN0JCgtPdbt2`

Documents Reviewed

docs/architecture/storage/README.md — common pattern, package structure, open questions
docs/architecture/storage/table-reference.md — cross-cutting reference (cascades, indexes, enums, relations)
docs/architecture/storage/identity.md — accounts, organizations, organization_members, api_keys, audit_logs
docs/architecture/storage/projects.md — projects, workspaces
docs/architecture/storage/sessions.md — sessions, messages, parts
docs/architecture/storage/roles.md — roles
docs/architecture/storage/services.md — clients, client_secrets
docs/architecture/storage/spokes.md — spokes, operation_specs
docs/architecture/storage/call-graph.md — call_graph_nodes, call_graph_edges
docs/architecture/storage/coordination.md — mappings, detections
docs/architecture/storage/tasks.md — tasks, task_dependencies
docs/decisions/ADR-001 through ADR-012

Summary Statistics

Severity	Count
🔴 Critical	14
🟡 Warning	22
💡 Suggestion	17

🔴 Critical Issues

Issues that must be resolved before the storage spec is stabilized. Each represents a concrete inconsistency, data integrity risk, or ambiguity that would cause implementation divergence.

C01. `NOT NULL` + `onDelete: SET NULL` — Contradictory Constraints

Sessions: 1, 2, 5 Files: sessions.md:17, identity.md:112, table-reference.md:80-71 Open-memory: ses_24f76141effegdhw2bxX2sOvYb, ses_24f751efbffeyWo9wb6hAnnj0y, ses_24f735dbcffea1pN0JCgtPdbt2

Two FK columns are declared NOT NULL but have onDelete: SET NULL. PostgreSQL will reject the DELETE because it cannot nullify a NOT NULL column:

sessions.accountId — text NOT NULL (sessions.md:17) with onDelete: SET NULL (table-reference.md:80). Deleting an account that owns sessions fails.
audit_logs.ownerId — text NOT NULL (identity.md:112) with onDelete: SET NULL (table-reference.md:71). Deleting an account that has audit entries fails.

Recommendation: For each, choose one:

Make the column nullable (if detaching on delete is desired)
Change cascade to RESTRICT (if the FK must always be populated — blocks account deletion)
Change cascade to CASCADE (if deleting dependent records is acceptable)
Add application-level logic that reassigns/destroys dependents before account deletion

For audit_logs.ownerId, RESTRICT may be correct — audit trails should prevent account deletion. For sessions.accountId, nullable is likely correct — orphaned sessions (account deleted) are still valuable data.

C02. ADR-003 vs `sessions.md` on Message IDs

Sessions: 2, 5 Files: ADR-003, sessions.md:42-46, table-reference.md:48 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y, ses_24f735dbcffea1pN0JCgtPdbt2

ADR-003 explicitly states: "Parts and messages tables use sortable timestamp-based IDs instead of commonCols.id." However, sessions.md defines the messages table using commonCols (which provides UUIDv4 via crypto.randomUUID()). Only parts explicitly uses sortable IDs. table-reference.md only mentions parts for sortable IDs.

This is a three-way inconsistency: ADR says both tables, sessions.md does one, table-reference says one. Message ordering is semantically important (the composite index idx_messages_session_id_created_at_id on (session_id, created_at, id) relies on created_at for ordering, making UUIDv4 sortable IDs unnecessary — but this contradicts ADR-003's stated rationale).

Recommendation: Either:

(A) Update messages table to use sortable IDs (consistent with ADR-003, eliminates dependency on created_at for ordering), or
(B) Amend ADR-003 to state that only parts uses sortable IDs, and messages relies on the (session_id, created_at, id) composite index

C03. Operation Specs: Delete vs. Soft-Deactivation Unresolved

Sessions: 3, 5 Files: spokes.md:66, table-reference.md:67, README.md Open Question #2 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt, ses_24f735dbcffea1pN0JCgtPdbt2

The spoke disconnect lifecycle has three conflicting positions:

spokes.md:66: "Removes the spoke's operation_specs rows (or marks inactive)" — ambiguous
table-reference.md:67: operation_specs.spokeId → spokes.id with CASCADE delete
README.md Open Question #2: "DELETE aligns with the ephemeral spoke model" for now

The operation_specs table has no active or status column to support soft-deactivation. Crucially, spoke rows are never deleted — they're only marked status: "disconnected". This means the CASCADE FK never fires, and there's no mechanism to clean up operation_specs on disconnect. The operation_specs rows remain pointing to a disconnected spoke with no way to deprecate them.

Recommendation: Resolve decisively:

(A) Hard delete on disconnect: Add explicit cleanup in the disconnect handler. Remove "or marks inactive" from spokes.md. CASCADE only applies to rare admin spoke-row deletion.
(B) Add active/status column to operation_specs: Support soft-deactivation. Update cascade rationale. This preserves the operation registry for audit/reconnection but adds schema complexity.

Option A aligns with the ephemeral spoke model. Option B supports spoke reconnection. Choose one and update all documents.

C04. `parts.sessionId` Denormalization: No Enforcement Mechanism

Sessions: 2 Files: sessions.md:96, sessions.md:105 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

The stated invariant: "when inserting a part, always set sessionId to the message's sessionId. Never update messages.sessionId without updating all child parts." However:

No DB trigger enforces this
No application-level transaction pattern is documented
No CHECK constraint exists
If messages.sessionId could change, there's a race condition window

Recommendation: Document that sessionId on both messages and parts is immutable after creation (which eliminates the update problem). Define the application-level contract for part insertion: read the message's sessionId and set it on the part within the same transaction. Add an explicit "IMMUTABLE" note to the sessionId column in sessions.md.

C05. `sessions.roleName` — No FK, No Validation Strategy Documented

Sessions: 2 Files: sessions.md:26, table-reference.md:100-101, roles.md Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

sessions.roleName is bare text with no FK to roles.name and no documented reason why. Is this intentional (to support file-based roles in Phase 1)? What happens if the role name has a typo? What about sessions referencing a role that was deleted?

Recommendation: Either:

(A) Add FK → roles.name with onDelete: SET NULL (role deletions detach sessions), or
(B) Document why the FK is intentionally omitted: "role definitions may come from .opencode/agents/*.md files before DB sync; application-level validation checks against known role names at session creation time."

C06. `mappings.task` Denormalized Column: No Sync Strategy

Sessions: 4 Files: coordination.md:22, tasks.md:209 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The mappings table has both taskId (FK → tasks.id) and task (denormalized display name). No mechanism keeps them in sync. If taskId points to a task whose slug or name changes, mappings.task becomes stale. When taskId is SET NULL (task deleted), what happens to task?

Recommendation: Document the invariant: "mappings.task is set to tasks.slug at insert time and is not automatically updated when the task's slug changes. When taskId is SET NULL (task deleted), task should also be SET NULL. This is a cache, not a source of truth." Alternatively, remove the denormalized column and use a VIEW that joins.

C07. Sync vs. Runtime Field Conflict in Tasks

Sessions: 4 Files: tasks.md:296-325 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The task sync does a full upsert, but the Authority Model says runtime status mutations go through hub.task.updateStatus. If sync blindly writes frontmatter status, it can clobber runtime state. Example:

Agent sets task.status = 'in-progress' via hub.task.updateStatus
Decomposer edits the task file (still has status: pending)
Sync runs and upserts the task — overwrites in-progress back to pending

Recommendation: Define the sync field split explicitly: "Sync upserts authored fields (slug, name, path, scope, risk, impact, level, priority, tags, assignee, due, body, fileCreatedAt, fileModifiedAt, depends_on) and must not overwrite runtime-managed fields (status, startedAt, completedAt). Runtime fields are only mutated via hub.task.* operations." Update the sync flow specification in tasks.md.

C08. Concurrent `task.body` Appends: No Collision Handling

Sessions: 4 Files: tasks.md:249-266 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

hub.task.addNote appends a timestamped note section to body. In a multi-agent system, read-modify-write is a race condition: Agent A reads body, Agent B reads body, both append, Agent A writes, Agent B overwrites A's addition. The spec says "This is simple" — it is not simple under concurrency.

Recommendation: Specify the concurrency model: hub.task.addNote must use DB-level concatenation (UPDATE tasks SET body = body || $note WHERE id = $taskId), not a read-modify-write cycle. Or use optimistic locking with updatedAt. Document this explicitly in the addNote specification.

C09. Cross-Project Dependency Constraint: No DB Enforcement

Sessions: 4 Files: tasks.md:217, tasks.md:357 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

"Tasks can only depend on tasks within the same project" is declared but only "enforced at the application level." task_dependencies has FK columns with no projectId column or check constraint. Application-level enforcement is vulnerable to race conditions, direct SQL access, or bugs.

Recommendation: At minimum, add a DB-level guard. Options:

(A) Add a trigger that checks dependsOnTaskId and taskId belong to the same project
(B) Add a denormalized projectId column to task_dependencies with a composite FK
(C) Document the risk explicitly and specify that the sync operation validates project scope within a transaction (SELECT FOR SHARE)

C10. Call Graph Edges: Missing Indexes and Cascade Documentation

Sessions: 3, 5 Files: call-graph.md:32-41, table-reference.md (missing) Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt, ses_24f735dbcffea1pN0JCgtPdbt2

call_graph_edges has no indexes and no cascade entries in table-reference.md. Both sourceId and targetId reference call_graph_nodes.id with CASCADE (implied by domain doc), but this is undocumented. Without indexes, graph traversal queries (find children, find parents) will require sequential scans.

Additionally, the relationship between call_graph_nodes.parentRequestId and call_graph_edges is ambiguous: do they store the same parent-child relationship redundantly, or serve different purposes?

Recommendation:

Add indexes: idx_call_graph_edges_source_id on (sourceId), idx_call_graph_edges_target_id on (targetId). Consider unique on (sourceId, targetId, edgeType) to prevent duplicates.
Add cascade entries to table-reference.md for both FKs (CASCADE).
Clarify parentRequestId vs call_graph_edges: document whether parentRequestId is a convenience shortcut or redundant with edges.

C11. Secret Key Rotation: Underspecified

Sessions: 3 Files: services.md:94-97, ADR-008 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Lazy re-encryption is insufficiently specified for a security-critical operation:

Multi-key storage: HUB_ENCRYPTION_KEY (singular env var) — how are old and new keys stored simultaneously during rotation?
Re-encryption transaction: If the process crashes between decrypt and re-encrypt-update, is the secret left in the old key version?
Old key unavailability: What happens if a secret with keyVersion=1 is accessed after the old key is removed? Permanent data loss with no documented handling.
No background sweep: Old-key-version secrets persist indefinitely until accessed. If the old key is compromised, those secrets remain vulnerable.

Recommendation:

Specify multi-key storage: e.g., HUB_ENCRYPTION_KEYS=v1:base64key,v2:base64key or a key file
Document the re-encryption transaction: decrypt → encrypt → UPDATE in a single DB transaction, with crash-safety note
Add a warning about the vulnerability window (old-key secrets not yet re-encrypted)
Specify whether a background re-encryption sweep is needed or deferred

C12. Client Config Schema Validation: Timing and Evolution Ambiguous

Sessions: 3 Files: services.md:19, ADR-007, README.md Open Question #10 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

"Validated against the TypeBox schema for this type on write" is ambiguous:

Who validates? Drizzle insert schema? API handler? DB trigger? Direct SQL bypasses application validation.
Schema evolution: when code deployment changes a client type's TypeBox schema, existing DB rows may become invalid under the new schema.
No re-validation on read is documented.

Recommendation:

Specify: "validate on write (API handler layer) + warn on read (start-up validation pass with logging, not blocking)"
Document the schema evolution contract: new fields MUST be Type.Optional(); breaking changes MUST use a new client type string (e.g., llm-provider-v2)
Consider a configSchemaVersion in metadata tracking which schema version validated the config

C13. Dual Ownership Model for Organizations: Undefined

Sessions: 1 Files: identity.md:44 (ownerId), identity.md:58 (membershipLevel: "owner") Open-memory: ses_24f76141effegdhw2bxX2sOvYb

Two competing ownership concepts with no documented relationship:

organizations.ownerId — a single FK to one account
organization_members.membershipLevel: "owner" — can exist on multiple rows

Can ownerId point to an account with membershipLevel: "member" (not "owner")? Can an org have zero members with membershipLevel: "owner" but a non-null ownerId? An implementer cannot determine which field is authoritative for ownership queries.

Recommendation: Document the invariant. E.g.: "ownerId is always a member with membershipLevel: 'owner' (enforced by app logic). If all owner-level members are removed, ownerId must be transferred first." Or: "ownerId is the creator; membershipLevel: 'owner' is a separate authorization concept."

C14. Missing FK Cascade Entries in `table-reference.md`

Sessions: 5 Files: table-reference.md:53-83 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

The following FK relationships are documented in per-domain docs but absent from the cascade reference table:

Missing Relationship	Source Doc
`mappings.workspaceId → workspaces.id`	coordination.md:19
`detections.sessionId → sessions.id`	coordination.md:36
`call_graph_edges.sourceId → call_graph_nodes.id`	call-graph.md:39
`call_graph_edges.targetId → call_graph_nodes.id`	call-graph.md:41
`api_keys.rotatedToId → api_keys.id`	identity.md:80

Without documented cascade behavior, PostgreSQL defaults to RESTRICT, which may not be the intended behavior for all of these.

Recommendation: Add all missing FK entries to the cascade table with explicit onDelete behavior. For the rotatedToId FK specifically: SET NULL (old key keeps its data but rotation link is broken if new key is deleted).

🟡 Warnings

Issues that should be resolved if possible. They represent gaps in documentation, suboptimal designs, or inconsistencies that could cause confusion.

W01. Dual JSONB Overlap: `commonCols.metadata` vs Per-Table `data`

Sessions: 1 Files: identity.md:85-88, identity.md:23, README.md:73 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

Two overlapping JSONB columns exist on some tables with no documented boundary:

commonCols.metadata — present on every table, Record<string, unknown>
Per-table data columns — domain-specific data (e.g., accounts.data, organizations.data)

For api_keys, keypal stores scopes, resources, and tags inside commonCols.metadata. For accounts, both data ("preferences, avatar URL") and metadata (arbitrary) exist with overlapping purposes and no split documentation.

Recommendation: Document the boundary: "data holds structured domain-specific data with known TypeScript types. metadata holds opaque key-value pairs for subsystem use, with a namespacing convention (e.g., metadata._keypal.scopes). Never mix domain data into metadata."

W02. No Account Deactivation Mechanism

Sessions: 1 Files: identity.md (accounts table) Open-memory: ses_24f76141effegdhw2bxX2sOvYb

The accounts table has no enabled/suspended column. Combined with organizations.ownerId → RESTRICT, an org owner's account cannot be deleted. But there's also no way to deactivate it when an employee leaves.

Recommendation: Add an enabled boolean (consistent with api_keys.enabled and clients.enabled), or a status column (active/suspended/deactivated). Document the interaction with cascade constraints.

W03. Missing Indexes Across Many Tables

Sessions: 1, 2, 3, 5 Files: table-reference.md:87-145, per-domain docs Open-memory: All sessions (consensus finding)

Multiple tables have FK columns or common query patterns without supporting indexes:

Table	Missing Index	Purpose
`sessions`	`unq_sessions_slug` in index ref	UNIQUE constraint not listed (unlike other UNIQUEs)
`sessions`	`idx_sessions_parent_id` on `(parentId)`	Find child sessions of coordinator
`projects`	`idx_projects_org_id` on `(orgId)`	Find projects for an org
`workspaces`	`idx_workspaces_project_id` on `(projectId)`	Find workspaces for a project
`spokes`	`idx_spokes_name` on `(name)`	Look up spoke by name
`detections`	`idx_detections_session_id` on `(sessionId)`	Find detections for a session (no indexes at all)
`call_graph_nodes`	`idx_call_graph_nodes_created_at` on `(createdAt)`	Time-range queries
`call_graph_nodes`	`idx_call_graph_nodes_operation_created` on `(operationId, createdAt)`	Operation + time queries
`call_graph_edges`	`idx_call_graph_edges_source_id` on `(sourceId)`	Graph traversal (children)
`call_graph_edges`	`idx_call_graph_edges_target_id` on `(targetId)`	Graph traversal (parents)
`mappings`	`idx_mappings_workspace_id` on `(workspaceId)`	Workspace-scoped mapping queries

Also: idx_api_keys_key_hash (B-tree) is redundant with unq_api_keys_key_hash (UNIQUE). Postgres automatically creates an index for UNIQUE constraints.

Recommendation: Add all missing indexes to table-reference.md and relevant per-domain docs. Remove the redundant idx_api_keys_key_hash.

W04. `operation_specs` Pre-Remap vs. Post-Remap Namespace Ambiguity

Sessions: 3 Files: spokes.md:51-55, spoke-runner.md:62 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Do operation_specs.namespace and operation_specs.name store the original spoke identifiers (pre-remap, e.g., dev.fs.read) or the remapped hub identifiers (post-remap, e.g., dev.{spokeId}.fs.read)? The spoke-runner.md says the hub remaps spoke operations into a hub namespace, but the operation_specs storage format is never specified.

If pre-remap: two spokes registering dev.fs.read creates ambiguity without joining on spokeId. If post-remap: the partial unique indexes may be over-constraining since the spoke-specific namespace prefix makes spokeId redundant for uniqueness.

Recommendation: Explicitly document which identifiers are stored. If pre-remap, document how callers resolve ambiguity. If post-remap, adjust the uniqueness rationale.

W05. `call_graph_edges.edgeType` Semantics Undefined

Sessions: 3 Files: call-graph.md:41 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Three edge types are listed (triggered, depends_on, requested_by) but none are explained. The call-graph architecture doc only discusses parent-child relationships (triggered). depends_on and requested_by are novel and undocumented. Are these exhaustive or extensible?

Recommendation: Document each edge type's semantics in call-graph.md, or state that edgeType is an extensible text field with these three initial values and define what each means.

W06. `spokes.status` Missing `reconnecting` State

Sessions: 3 Files: spokes.md:18, spoke-runner.md:130-136 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The spoke status enum is connected, disconnected. The spoke-runner.md describes a reconnection flow, but there's no intermediate state for "reconnecting." When a spoke's WebSocket drops, it shows disconnected — indistinguishable from a permanently offline spoke.

Recommendation: Add reconnecting to the spoke status enum, or document that reconnection is handled at the application layer (WebSocket reconnect timer) without a DB state change.

W07. `client_secrets.keyVersion` Redundancy

Sessions: 3 Files: services.md:71, services.md:82-86 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

client_secrets has both a standalone keyVersion column (integer NOT NULL DEFAULT 1) AND keyVersion embedded in the value JSONB (EncryptedData.keyVersion). These can diverge with no documented invariant.

Recommendation: Either remove the standalone column (read from value.keyVersion), or document that the standalone column is authoritative and they must be kept in sync.

W08. Call Graph Payload Security

Sessions: 3 Files: call-graph.md:22-23 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The input and output JSONB columns store full call payloads. Operations like hub.register (which receives auth tokens) would store API keys and secrets in cleartext. The truncation strategy (10KB) addresses size, not sensitive data. No redaction is mentioned.

Recommendation: Add a section on sensitive data handling. Options:

Operation handlers mark certain fields as redacted
The call graph writer applies field-level redaction by convention (fields named password, token, secret, key)
The truncation strategy is extended with a redaction pass

W09. No Call Graph Retention Policy

Sessions: 3, 4 Files: call-graph.md (absent), README.md Open Question #5 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

Call graph data grows unboundedly. Every operation invocation creates a node and edges. CASCADE handles cleanup when a node is deleted, but nothing deletes old nodes. README.md acknowledges this as Open Question #5.

Recommendation: Specify the intended approach: TTL-based deletion, archival to cold storage, or aggregation + deletion. Even a "v1: manual cleanup, v2: automatic TTL" notation helps.

W10. `sessions.version` Column: Unspecified

Sessions: 2 Files: sessions.md:24, README.md Open Question #1 Open-memory: ses_24f751efbffeyWo9wb6hAnnj0y

The version column is text NOT NULL with description "Schema version (opencode compat)" but:

No valid values listed
No default documented for hub-direct sessions vs opencode imports
No versioning scheme defined
README.md Open Question #1 asks whether to version data columns — this is unresolved

Recommendation: Define initial version value (e.g., "1"), document what version governs (the data JSONB shape? the message/parts schema? opencode compatibility only?), and specify the default for hub-direct sessions.

W11. Overlapping Status Enums Without Cross-Table Disambiguation

Sessions: 4, 5 Files: table-reference.md:147-164, coordination.md:23, tasks.md:84-86 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv, ses_24f735dbcffea1pN0JCgtPdbt2

Three tables have status with overlapping values:

Table	Shared Values	Unique Values
`mappings`	`completed`, `failed`, `aborted`	`active`
`call_graph_nodes`	`completed`, `failed`, `aborted`	`pending`, `running`
`tasks`	`completed`, `failed`	`pending`, `in-progress`, `blocked`

table-reference.md:164 only contrasts mappings.active vs call_graph_nodes.pending/running. It does NOT contrast tasks statuses with the others. mappings.completed and tasks.completed mean different things (mapping workflow completion vs task completion).

Recommendation: Add cross-table state mapping documentation. When a task goes in-progress, there should be an active mapping; when a task is completed, the mapping becomes completed. Document valid combinations.

W12. Audit Logs Missing Session and Org Context

Sessions: 1 Files: identity.md:103-117 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

audit_logs has ownerId and keyId but no sessionId or orgId. For LLM accounts that fill roles in sessions, the session correlation is a significant traceability gap. Multi-tenant auditing requires org filtering.

Recommendation: Add sessionId (nullable FK → sessions.id, SET NULL) and orgId (nullable FK → organizations.id, SET NULL). Expand action types to cover account, membership, and organization lifecycle events — or document the action enum as extensible.

W13. API Key Hashing (SHA-256) Trade-Off Undocumented

Sessions: 1 Files: identity.md:74, ADR-010 Open-memory: ses_24f76141effegdhw2bxX2sOvYb

API keys are bearer tokens stored as SHA-256 hashes. SHA-256 is a fast hash, not a deliberately slow KDF (bcrypt/Argon2). If the database is compromised, SHA-256 hashes can be brute-forced orders of magnitude faster than slow hashes. However, API keys are high-entropy machine-generated strings (128-bit+), making brute-force infeasible even with a fast hash. No ADR documents this trade-off.

Recommendation: Add documentation: "API keys are high-entropy random strings (128-bit+), making brute-force infeasible even with a fast hash. SHA-256 was chosen for O(1) verification latency at high throughput. This is acceptable because API keys are machine-generated, unlike human-chosen passwords."

W14. ADR Terminology Inconsistencies

Sessions: 1 Files: ADR-009:13, ADR-012:55, agent-roles.md Open-memory: ses_24f76141effegdhw2bxX2sOvYb

ADR-009 says "organization_members (membership with roles)" — contradicts ADR-012's rename to membershipLevel
ADR-012 itself uses accounts.role: "service" in its rationale, despite mandating the rename to accessLevel
agent-roles.md also uses accounts.role: "service"

Recommendation: Update ADR-009 to say "membership with levels." Update ADR-012:55 and agent-roles.md to use accounts.accessLevel: "service".

W15. Resolved Open Questions Still Listed as Open in README

Sessions: 5 Files: README.md:197-225 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

Several open questions are resolved by per-domain docs or ADRs but remain listed as open:

Q2 (operation spec cleanup): Resolved — DELETE aligns with ephemeral spoke model (spokes.md, table-reference.md CASCADE)
Q4 (workspaces vs. directories): Marked as "Resolved" in the list but still present
Q14 (accounts.role → accessLevel): Renamed in identity.md, referenced in ADR-012

Recommendation: Move resolved items to a "Resolved Decisions" section with cross-references to the resolving documents.

W16. `organizations.ownerId` RESTRICT: No Deletion/Transfer Workflow

Sessions: 1, 5 Files: identity.md:44, table-reference.md:56 Open-memory: ses_24f76141effegdhw2bxX2sOvYb, ses_24f735dbcffea1pN0JCgtPdbt2

RESTRICT prevents deletion of accounts that own organizations, but no ownership transfer mechanism is documented.

Recommendation: Add a note: "Before deleting an account, transfer all owned organizations via org.transferOwnership operation." Document the transfer pattern in identity.md or coordination.md.

W17. Path LIKE Queries May Not Use B-Tree Indexes in PostgreSQL

Sessions: 4 Files: tasks.md:83, tasks.md:101 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

WHERE path LIKE 'implementation/%' can use a B-tree index only with the C locale or text_pattern_ops. With the default locale, LIKE pattern matching may not use the index.

Recommendation: Specify that the path index should use text_pattern_ops (CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)) or document the locale dependency.

W18. Call Graph Payload Truncation Lacks Precision

Sessions: 3 Files: call-graph.md:30 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The truncation strategy says "truncate payloads larger than 10KB" but doesn't specify: when truncation happens (on write? after call completes?), what preview contains (first N bytes? N characters?), whether 10KB is configurable, or how object storage reference URLs are structured.

Recommendation: Specify: (a) truncation happens on write to DB (in-flight calls have full payloads); (b) preview is the first 1024 bytes of the JSON-serialized payload; (c) make the threshold configurable per operation type or via hub config; (d) defer object storage details but add a placeholder section.

W19. `call_graph_nodes.identity` Has No FK or Account Linkage

Sessions: 3 Files: call-graph.md:20 Open-memory: ses_24f746ebbffeG4jqN3MbK5i9yt

The identity JSONB column stores { id, scopes, resources } as a snapshot, but there's no FK to accounts.id. Querying "all calls made by account X" requires JSONB containment, which is slow without a GIN index.

Recommendation: Add a callerAccountId text column with FK → accounts.id (SET NULL) for efficient querying, or add a GIN index on identity if JSONB queries are the intended access pattern.

W20. `mappings` Table Overloaded — Three Distinct Relationship Types

Sessions: 4 Files: coordination.md:10-27 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

The mappings table stores three conceptually different relationships in one table:

Session → Spoke (where is the session running?)
Session → Parent session (coordination hierarchy)
Session → Task (what work is the session doing?)

All nullable FKs allow any combination, including potentially invalid ones. The table name mappings doesn't convey what's being mapped.

Recommendation: Document the valid column combinations (e.g., sessionId always NOT NULL, taskId only for task-scoped mappings, parentSessionId only for coordinator children). This makes it a polymorphic association table with documented shapes.

W21. `detections` Table Is Minimal — No Resolution or Deduplication

Sessions: 4 Files: coordination.md:29-39 Open-memory: ses_24f7431baffeElbZ3qVHCYQOSv

No resolution tracking (resolved, acknowledged, false-positive)
No deduplication (persistent MODEL_DEGRADATION creates a new row every check interval)
No session end correlation
anomalyType value set is unclear (open or closed enum?)

Recommendation: Add resolvedAt timestamp column. Add a UNIQUE constraint on (sessionId, anomalyType) with upsert semantics, or document that deduplication is handled at the application level. Specify whether anomalyType is extensible.

💡 Suggestions

Quality-of-life improvements that should be considered but won't block stabilization.

S01. Document `accessLevel` Change Authorization

Who can change an accounts.accessLevel? Can a user self-promote? Document the assumed invariants even for application-level concerns.

S02. Add Partial Indexes for Common Access Patterns

Several partial indexes would improve performance: active API keys (WHERE revoked_at IS NULL AND enabled = true), connected spokes (WHERE status = 'connected'), non-archived sessions, active tasks (WHERE status IN ('pending', 'in-progress', 'blocked')).

S03. Reserve `@alk.dev` Email Domain for System Accounts

LLM accounts use fallback addresses like glm-5.1@alk.dev. Document that all *@alk.dev emails are reserved for system-generated accounts and humans must use other domains.

S04. Consider `displayName` Index for User Search

accounts.displayName is not indexed. For UIs with user search/autocomplete, this would require full table scans.

S05. Document API Key Expiration Behavior

Does an expired key return "key expired" or a generic "authentication failed"? Recommend generic response to avoid leaking key state to attackers.

S06. Cross-Reference `sessions.accountId` in Identity Docs

identity.md:12 lists FK targets but omits sessions.accountId. Add it for completeness.

S07. Define `FilePartData` Type

sessions.md:132 references FilePartData[] in ToolState but never defines it. Clarify whether it's the same as the file part type's data shape.

S08. Complete AI SDK UIMessage Part Type Mapping

sessions.md:145-152 maps 6 part types but omits step-finish, patch, snapshot, compaction, agent. Document that these are excluded from the UIMessage view, or add mappings.

S09. Document `sessions.slug` Generation Strategy

Is it human-provided? Auto-generated? Random? This matters for API design and uniqueness enforcement.

S10. Add `parts.type` Index for Part-Type Queries

A composite index (session_id, type) would support queries like "all tool-call parts in session X" without a full scan. At minimum, document that type queries rely on existing indexes + sequential scan.

S11. Document Whether Parts Are Flat or Nested

The agent part type implies sub-agent delegation, which might need nesting. The current schema has no parentId on parts. Document whether parts are flat or whether nesting might be needed.

W22. `parts` Table: Missing `$onUpdate` and `NOT NULL` on Timestamp Columns

Sessions: 5 Files: sessions.md:99-107, README.md:69-82 Open-memory: ses_24f735dbcffea1pN0JCgtPdbt2

The parts table defines its own id, metadata, createdAt, and updatedAt instead of using commonCols, but the spec only says "defaults to now()" without specifying NOT NULL or $onUpdate. If the Drizzle implementation omits $onUpdate, parts rows never have updatedAt updated on modification, silently breaking any optimistic concurrency or caching logic. If createdAt/updatedAt are not NOT NULL, they can become NULL.

Recommendation: The parts table spec must explicitly state that createdAt and updatedAt are NOT NULL and that updatedAt includes $onUpdate(() => new Date()). Either replicate these details from commonCols with an explicit override note for id, or reference commonCols with the id exception documented.

S13. Add `projectId` to `mappings` for Direct Project-Scoped Queries

Finding all active mappings for a project's tasks requires a JOIN through sessions.projectId or tasks.projectId. A denormalized projectId would simplify this, or document that the JOIN pattern is acceptable.

S14. Document `mappings.status` Lifecycle Transitions

Unlike tasks.status which has an explicit lifecycle diagram, mappings.status transitions are unspecified. Add a lifecycle diagram or state machine.

S15. Specify Task Enum Values as Drizzle `pgEnum`

The categorical enum values (scope, risk, impact, level, priority, status) are documented as text strings but not referenced as Drizzle pgEnum types. Specify that these should be pgEnum for type safety, with the decomposer template consuming the same definitions.

S16. Rename `taskId` to `dependentTaskId` in `task_dependencies`

The column name taskId is generic and could be confused as "this task" rather than "the dependent task." Renaming to dependentTaskId makes the direction unmistakable.

S17. Add `call_graph_nodes.startedAt` Index for Latency Analysis

startedAt is crucial for p99 latency analysis. Consider an index alongside or instead of the suggested createdAt index.

S18. Consider Unique Constraint on `call_graph_edges(sourceId, targetId, edgeType)`

Nothing prevents duplicate edges between the same two nodes with the same type. A unique constraint prevents silently duplicated edge events from retries/reconnections.

✅ What's Working Well

Strengths identified across all five reviews:

Drizzle-TypeBox pattern — Well-documented and consistently applied. The createSelectSchema/createInsertSchema workflow with JSONB overrides is clear and implementable.
Cross-cutting reference pattern — table-reference.md as a single source for cascades, indexes, enums, and relations is an excellent organizational pattern that prevents "hunt through every domain doc" problems.
Nullable categorical fields (ADR-011) — The "not yet assessed" signal via NULL (instead of defaults) is well-reasoned and matches taskgraph's own Option<T> model.
Dual task representation — DB as source of truth, files as authoring surface. The authority model table is excellent and provides clear guidance.
ADR-012 terminology clarification — The account/role/session distinction is clearly motivated and the rename guidance is actionable.
Cascade behavior documentation — Having all FK behaviors in one place with rationale per relationship prevents implementation errors.
Operation specs as capabilities (ADR-006) — Elegant decision. Avoids opaque JSONB blobs, makes capabilities fully typed and queryable. Nullable spoke_id allows hub-native operations to coexist.
Config/secrets separation (ADR-007) — Four-layer model (config schema, config instance, auth schema, auth instance) with different storage strategies is well-structured.
Path semantics for tasks — Replacing parentId with path column for group scoping is clean. Prefix-based queries are intuitive and well-explained.
Partial unique index design — The operation_specs partial indexes correctly handle PostgreSQL's NULL-in-unique-index behavior. The explanation prevents a common pitfall.

Action Plan

Before Stabilization (Must Fix)

Priority	Issue	Action
🔴1	C01	Resolve NOT NULL + SET NULL contradictions for `sessions.accountId` and `audit_logs.ownerId`
🔴2	C02	Resolve ADR-003 vs sessions.md on message IDs — update one or the other
🔴3	C03	Resolve operation_specs delete vs soft-deactivation — choose one, update all docs
🔴4	C04	Document sessionId immutability invariant on messages/parts
🔴5	C05	Document roleName validation strategy (FK or intentional omission)
🔴6	C14	Add missing FK cascade entries to table-reference.md

Before Implementation (Should Fix)

Priority	Issue	Action
🟡1	C06	Document mappings.task denormalization invariant
🟡2	C07	Define sync field split (authored vs. runtime fields)
🟡3	C08	Specify DB-level concatenation for task.body appends
🟡4	C09	Add DB-level guard for cross-project dependencies
🟡5	C10	Add call_graph_edges indexes and cascade docs
🟡6	C11	Specify key rotation multi-key format and transaction safety
🟡7	C14	Add remaining missing cascade entries
🟡8	W03	Add missing indexes across tables
🟡9	W11	Add cross-table status disambiguation
🟡10	W14	Fix ADR terminology inconsistencies

Before Production (Consider)

Priority	Issue	Action
💡1	W02	Add account deactivation mechanism
💡2	W08	Add call graph payload redaction
💡3	W09	Define call graph retention policy
💡4	W12	Add sessionId and orgId to audit_logs
💡5	W21	Add resolution tracking to detections
💡6	All S01-S18	Quality-of-life improvements

41 KiB Raw Permalink Blame History

Storage Architecture Review: 2026-04-21

Review Sessions (open-memory)

Documents Reviewed

Summary Statistics

🔴 Critical Issues

C01. NOT NULL + onDelete: SET NULL — Contradictory Constraints

C02. ADR-003 vs sessions.md on Message IDs

C03. Operation Specs: Delete vs. Soft-Deactivation Unresolved

C04. parts.sessionId Denormalization: No Enforcement Mechanism

C05. sessions.roleName — No FK, No Validation Strategy Documented

C06. mappings.task Denormalized Column: No Sync Strategy

C07. Sync vs. Runtime Field Conflict in Tasks

C08. Concurrent task.body Appends: No Collision Handling

C09. Cross-Project Dependency Constraint: No DB Enforcement

C10. Call Graph Edges: Missing Indexes and Cascade Documentation

C11. Secret Key Rotation: Underspecified

C12. Client Config Schema Validation: Timing and Evolution Ambiguous

C13. Dual Ownership Model for Organizations: Undefined

C14. Missing FK Cascade Entries in table-reference.md

🟡 Warnings

W01. Dual JSONB Overlap: commonCols.metadata vs Per-Table data

W02. No Account Deactivation Mechanism

W03. Missing Indexes Across Many Tables

W04. operation_specs Pre-Remap vs. Post-Remap Namespace Ambiguity

W05. call_graph_edges.edgeType Semantics Undefined

W06. spokes.status Missing reconnecting State

W07. client_secrets.keyVersion Redundancy

W08. Call Graph Payload Security

W09. No Call Graph Retention Policy

W10. sessions.version Column: Unspecified

W11. Overlapping Status Enums Without Cross-Table Disambiguation

W12. Audit Logs Missing Session and Org Context

W13. API Key Hashing (SHA-256) Trade-Off Undocumented

W14. ADR Terminology Inconsistencies

W15. Resolved Open Questions Still Listed as Open in README

W16. organizations.ownerId RESTRICT: No Deletion/Transfer Workflow

W17. Path LIKE Queries May Not Use B-Tree Indexes in PostgreSQL

W18. Call Graph Payload Truncation Lacks Precision

W19. call_graph_nodes.identity Has No FK or Account Linkage

W20. mappings Table Overloaded — Three Distinct Relationship Types

W21. detections Table Is Minimal — No Resolution or Deduplication

💡 Suggestions

S01. Document accessLevel Change Authorization

S02. Add Partial Indexes for Common Access Patterns

S03. Reserve @alk.dev Email Domain for System Accounts

S04. Consider displayName Index for User Search

S05. Document API Key Expiration Behavior

S06. Cross-Reference sessions.accountId in Identity Docs

S07. Define FilePartData Type

S08. Complete AI SDK UIMessage Part Type Mapping

S09. Document sessions.slug Generation Strategy

S10. Add parts.type Index for Part-Type Queries

S11. Document Whether Parts Are Flat or Nested

W22. parts Table: Missing $onUpdate and NOT NULL on Timestamp Columns

S13. Add projectId to mappings for Direct Project-Scoped Queries

S14. Document mappings.status Lifecycle Transitions

S15. Specify Task Enum Values as Drizzle pgEnum

S16. Rename taskId to dependentTaskId in task_dependencies

S17. Add call_graph_nodes.startedAt Index for Latency Analysis

S18. Consider Unique Constraint on call_graph_edges(sourceId, targetId, edgeType)

✅ What's Working Well

Action Plan

Before Stabilization (Must Fix)

Before Implementation (Should Fix)

Before Production (Consider)

41 KiB

Raw Permalink Blame History

C01. `NOT NULL` + `onDelete: SET NULL` — Contradictory Constraints

C02. ADR-003 vs `sessions.md` on Message IDs

C04. `parts.sessionId` Denormalization: No Enforcement Mechanism

C05. `sessions.roleName` — No FK, No Validation Strategy Documented

C06. `mappings.task` Denormalized Column: No Sync Strategy

C08. Concurrent `task.body` Appends: No Collision Handling

C14. Missing FK Cascade Entries in `table-reference.md`

W01. Dual JSONB Overlap: `commonCols.metadata` vs Per-Table `data`

W04. `operation_specs` Pre-Remap vs. Post-Remap Namespace Ambiguity

W05. `call_graph_edges.edgeType` Semantics Undefined

W06. `spokes.status` Missing `reconnecting` State

W07. `client_secrets.keyVersion` Redundancy

W10. `sessions.version` Column: Unspecified

W16. `organizations.ownerId` RESTRICT: No Deletion/Transfer Workflow

W19. `call_graph_nodes.identity` Has No FK or Account Linkage

W20. `mappings` Table Overloaded — Three Distinct Relationship Types

W21. `detections` Table Is Minimal — No Resolution or Deduplication

S01. Document `accessLevel` Change Authorization

S03. Reserve `@alk.dev` Email Domain for System Accounts

S04. Consider `displayName` Index for User Search

S06. Cross-Reference `sessions.accountId` in Identity Docs

S07. Define `FilePartData` Type

S09. Document `sessions.slug` Generation Strategy

S10. Add `parts.type` Index for Part-Type Queries

W22. `parts` Table: Missing `$onUpdate` and `NOT NULL` on Timestamp Columns

S13. Add `projectId` to `mappings` for Direct Project-Scoped Queries

S14. Document `mappings.status` Lifecycle Transitions

S15. Specify Task Enum Values as Drizzle `pgEnum`

S16. Rename `taskId` to `dependentTaskId` in `task_dependencies`

S17. Add `call_graph_nodes.startedAt` Index for Latency Analysis

S18. Consider Unique Constraint on `call_graph_edges(sourceId, targetId, edgeType)`