Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts

Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00
parent 3e3f12d2d5
commit 2b63cda1c7
120 changed files with 11714 additions and 2 deletions


@@ -0,0 +1,521 @@
---
status: open
created: 2026-05-18
last_updated: 2026-05-18
---
# Core Library Extraction Sync Review
Review of the impact of extracting three core libraries (`@alkdev/operations`, `@alkdev/pubsub`, and `@alkdev/taskgraph`) on the alkhub_ts codebase and architecture documentation. These packages are now published on npm. They replace in-repo code and also implement functionality previously marked "not started".
---
## Summary
Three packages were extracted from (or designed for) this codebase and are now platform-agnostic npm packages:
| Package | Version | Replaces in `packages/core/` | New Capabilities |
|---------|---------|-------------------------------|------------------|
| `@alkdev/operations` | 0.1.0 | `operations/` (7 files) + `mcp/` (3 files) | Call protocol (PendingRequestMap), ResponseEnvelope, access control enforcement, CallError, SchemaAdapter, subscribe helper, SSE subscription handling |
| `@alkdev/pubsub` | 0.1.0 | `pubsub/` (5 files) | EventEnvelope, WebSocket client+server+worker event targets, 13 operators (was 3), inlined Repeater, `prefix`/`close()` on Redis ET |
| `@alkdev/taskgraph` | 0.0.2 | Nothing (new) | TaskGraph class, analysis (critical path, parallel groups, bottlenecks, risk, cost-benefit), frontmatter parsing |
The decision has been made to **remove `packages/core/` as a package entirely**. Its remaining modules (config, logger, crypto) will be relocated, most likely directly into hub. Spokes that need config can either import config types from `@alkdev/operations` or use a minimal `@alkhub/config` package, if we create one. The first spokes won't need provider key storage; eventual "hub-like spokes" will be addressed later as a federation concern.
---
## 1. Code Changes
### 1.1 Delete from `packages/core/`
All of these are replaced by npm packages:
**`core/pubsub/`** — replaced by `@alkdev/pubsub`:
- `create_pubsub.ts`
- `typed_event_target.ts`
- `redis_event_target.ts`
- `operators.ts`
- `mod.ts`
**`core/operations/`** — replaced by `@alkdev/operations`:
- `types.ts`
- `registry.ts`
- `env.ts`
- `scanner.ts`
- `validation.ts`
- `from_schema.ts`
- `from_openapi.ts`
- `mod.ts`
**`core/mcp/`** — replaced by `@alkdev/operations/from-mcp`:
- `wrapper.ts`
- `loader.ts`
- `mod.ts`
**Tests and fixtures** — for deleted modules:
- `tests/operations/registry.test.ts`
- `tests/operations/scanner.test.ts`
- `tests/pubsub/redis_event_target.test.ts`
- `tests/mcp/loader.test.ts`
- `tests/fixtures/registry.ts`
- `tests/fixtures/operations/demo/greet.ts`
- `tests/fixtures/operations/other/calculate.ts`
### 1.2 Relocate from `packages/core/`
These have no external replacement and need to be relocated:
| Module | Lines | Destination |
|--------|-------|-------------|
| `core/config/types.ts` | 169 | Hub package (or a thin `@alkhub/config` if spokes need shared config types) |
| `core/logger/mod.ts` | 27 | Hub package (logtape config is hub-specific anyway) |
| `core/utils/crypto.ts` | 119 | Hub package (encryption key management is hub-only) |
### 1.3 Delete `packages/core/` as a package
Once modules are relocated, remove:
- `packages/core/deno.json`
- `packages/core/mod.ts`
- The `"packages/core"` entry from the root `deno.json` workspace array
### 1.4 Update dependency declarations
**Root `deno.json`**:
- Remove `"packages/core"` from workspace array
- Add `@alkdev/operations`, `@alkdev/pubsub`, `@alkdev/taskgraph` to imports (if needed at root level)
**New `packages/hub/deno.json`** (when created):
- Add: `@alkdev/operations`, `@alkdev/pubsub`, `@alkdev/taskgraph`, `@alkdev/typebox`, `@alkdev/drizzlebox`, `hono`, `drizzle-orm`, `ioredis`, `logtape`, `@hono/mcp`, `ai`, `keypal`
- Remove (no longer direct): `@repeaterjs/repeater` (inlined in @alkdev/pubsub), `@modelcontextprotocol/sdk` (optional peer in @alkdev/operations)
**New `packages/spoke/deno.json`** (when created):
- Add: `@alkdev/operations`, `@alkdev/pubsub` (client event target only), `@alkdev/typebox`, `logtape`
### 1.5 Breaking API Changes
| Change | Impact | Migration |
|--------|--------|-----------|
| `registry.execute()` returns `ResponseEnvelope<T>` not `T` | All callers must `unwrap()` or access `.data` | `import { unwrap } from "@alkdev/operations"` |
| `OperationEnv` functions return `Promise<ResponseEnvelope>` not `Promise<unknown>` | All nested call sites | Same |
| `OperationContext` drops `stream`/`pubsub` fields | Handlers using these (none exist yet) | Use `PendingRequestMap.subscribe()` for subscriptions |
| `createPubSub` uses `PubSubEventMap` not `PubSubPublishArgsByKey` | Any pubsub usage | `createPubSub<{ eventType: PayloadType }>()` — publishes with `publish(type, id, payload)` |
| `createRedisEventTarget` takes `prefix` and has `close()` | Redis setup code | Add `prefix: "alk:events:"`, call `close()` on shutdown |
| Scanner uses `ScannerFS` interface, not `Deno.readDir` directly | Spoke scanner | Provide Deno adapter: `{ readdir: (p) => Deno.readDir(p), cwd: () => Deno.cwd() }` |
| `AccessControl` drops `customAuth` field | No code uses it yet | N/A |
| MCP adapter wraps results in `mcpEnvelope()` | MCP consumers | Use `unwrap()` or `isResponseEnvelope()` |
| `assertIsSchema` throws `Error` instead of `AssertionError` | Test code | Already the correct behavior per @alkdev/operations |
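
The first two rows change every call site, so a small sketch may help. It shows the before/after shape of consuming a wrapped result. The `Envelope` type, `unwrapEnvelope` helper, and `execute` function below are local stand-ins written for this example (and synchronous for brevity); the real `ResponseEnvelope` and `unwrap()` in `@alkdev/operations` may differ in shape and behavior.

```ts
// Hypothetical stand-in for ResponseEnvelope<T>. Only `source` and `data`
// are taken from this review; the real type may carry more fields.
type Envelope<T> = {
  source: "local" | "http" | "mcp";
  data: T;
};

// Hypothetical equivalent of unwrap(): return the payload, drop the wrapper.
function unwrapEnvelope<T>(envelope: Envelope<T>): T {
  return envelope.data;
}

// Stand-in for registry.execute(): it now yields an envelope, not a bare T.
function execute(name: string): Envelope<{ greeting: string }> {
  return { source: "local", data: { greeting: `hello ${name}` } };
}

// Before the extraction: const result = execute("hub") was already the payload.
// After the extraction: callers unwrap (or read `.data`) first.
const wrapped = execute("hub");
const result = unwrapEnvelope(wrapped);
console.log(wrapped.source, result.greeting);
```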
---
## 2. Architecture Spec Updates
### 2.1 AGENTS.md — Major Update
**Provenance table** — Replace all "Copied from predecessor project" and "Forked from graphql-yoga" entries:
| Module | Current Status | New Status |
|--------|---------------|------------|
| Operations system | "Working, 7 tests passing" | **Extracted to `@alkdev/operations` v0.1.0** |
| PubSub (createPubSub) | "Working" | **Extracted to `@alkdev/pubsub` v0.1.0** |
| PubSub (operators) | "Working" | **Extracted to `@alkdev/pubsub` v0.1.0** |
| TypedEventTarget | "Forked from graphql-yoga" | **Extracted to `@alkdev/pubsub` v0.1.0** |
| Redis EventTarget | "Working, 5 tests passing" | **Extracted to `@alkdev/pubsub` v0.1.0** |
| WebSocket EventTarget | "Not started" | **Implemented in `@alkdev/pubsub` v0.1.0** (client + server + worker) |
| MCP client | "Working, 1 test passing" | **Extracted to `@alkdev/operations/from-mcp` v0.1.0** |
| Call protocol | "Not started" | **Implemented in `@alkdev/operations` v0.1.0** |
| Config types | "Needs hub config" | Remains (to relocate) |
| Logger | "Needs proper config" | Remains (to relocate) |
| Storage | "Not started" | Not started (unchanged) |
**Key Patterns section** — Update:
- Operations: Reference `@alkdev/operations` package, add ResponseEnvelope and call protocol
- PubSub: Reference `@alkdev/pubsub` package, update from "graphql-yoga (MIT)" to standalone package with EventEnvelope pattern
- New: Task graph operations via `@alkdev/taskgraph`
**Reference Dependencies table** — Add:
| `@alkdev/operations` | `npm:@alkdev/operations@^0.1.0` | Operations, call protocol, MCP adapter, ResponseEnvelope |
| `@alkdev/pubsub` | `npm:@alkdev/pubsub@^0.1.0` | PubSub, EventEnvelope, event targets (Redis/WS/Worker) |
| `@alkdev/taskgraph` | `npm:@alkdev/taskgraph@^0.0.2` | Task graph, analysis, frontmatter |
Remove:
- `graphql-yoga` row (source now in `@alkdev/pubsub`)
Update:
- `graphology` row: note it's now a transitive dep of `@alkdev/taskgraph`, no longer a direct dep of this project
**Workspace Structure** — Remove `core/` package:
```
packages/
hub/ — Hono API server, storage (Drizzle+Postgres), auth, coordination, Redis events
spoke/ — Self-registering runner: websocket connection, dispatch, operation provider
```
Add note about external dependencies:
```
External @alkdev packages (npm):
@alkdev/operations — Operations registry, call protocol, MCP adapter, ResponseEnvelope
@alkdev/pubsub — PubSub, event targets (Redis/WS/Worker), operators
@alkdev/taskgraph — Task graph construction, analysis, frontmatter
```
**Constraints section** — Add:
- `@alkdev/pubsub`, `@alkdev/operations`, `@alkdev/taskgraph` are the canonical implementations — do not duplicate their code in-repo
### 2.2 overview.md — Major Update
**"What Exists" section** — Replace entirely:
| Module | Location | Status |
|--------|----------|--------|
| Operations system | `@alkdev/operations` | Published v0.1.0 |
| PubSub (createPubSub + operators) | `@alkdev/pubsub` | Published v0.1.0 |
| TypedEventTarget | `@alkdev/pubsub` | Published v0.1.0 |
| Redis EventTarget | `@alkdev/pubsub` | Published v0.1.0 |
| WebSocket EventTarget (client+server) | `@alkdev/pubsub` | Published v0.1.0 |
| Worker EventTarget | `@alkdev/pubsub` | Published v0.1.0 |
| MCP client adapter | `@alkdev/operations/from-mcp` | Published v0.1.0 |
| Call protocol (PendingRequestMap, CallHandler) | `@alkdev/operations` | Published v0.1.0 |
| Access control (enforceAccess) | `@alkdev/operations` | Published v0.1.0 |
| ResponseEnvelope | `@alkdev/operations` | Published v0.1.0 |
| SchemaAdapter (Zod/Valibot) | `@alkdev/operations/from-typemap` | Published v0.1.0 |
| SSE subscription handling | `@alkdev/operations/from-openapi` | Published v0.1.0 |
| Task graph + analysis | `@alkdev/taskgraph` | Published v0.0.2 |
| Config types | `packages/core/` | Stub — needs relocation |
| Logger | `packages/core/` | Stub — needs relocation |
**"What Needs Implementation"** — Remove completed items, keep remaining:
| Component | Spec | Priority |
|-----------|------|----------|
| ~~WebSocket EventTarget~~ | ~~spoke-runner.md~~ | ~~High~~**Done: `@alkdev/pubsub`** |
| ~~Call protocol (PendingRequestMap)~~ | ~~call-graph.md~~ | ~~High~~**Done: `@alkdev/operations`** |
| Storage (Drizzle+Postgres tables, migrations) | storage/ | High |
| Hub HTTP server (Hono) | hub-architecture.md | High |
| OpenAI proxy (Hono) | agent-sessions.md | High |
| Logger configuration | — | Medium |
| Hub config system | hub-config.md | Medium |
| MCP server (@hono/mcp) | mcp-server.md | Medium |
| Agent sessions (AI SDK) | agent-sessions.md | Medium |
| Coordination operations | coordination.md | Medium |
| Call graph storage | call-graph.md, storage/ | Medium |
| Operation graph | call-graph.md | Low |
| Call templates | call-graph.md | Low |
### 2.3 packages.md — Major Rewrite
**Remove `@alkhub/core` section entirely.** Add a new section for external `@alkdev/*` packages:
```
### `@alkdev/operations` (npm package)
Operations registry, call protocol, MCP adapter, ResponseEnvelope. Platform-agnostic.
Exports:
. — types, registry, call protocol (PendingRequestMap, buildCallHandler), subscribe, access control, error, env, scanner, validation, from_schema, response-envelope
./from-mcp — MCP tool adapter (@modelcontextprotocol/sdk optional peer)
./from-typemap — Zod/Valibot schema adapters (@alkdev/typemap optional peer)
./from-openapi — OpenAPI/SSE/HTTP service adapter
### `@alkdev/pubsub` (npm package)
PubSub, event targets, operators. Platform-agnostic.
Exports:
. — createPubSub, types, operators, repeater
./event-target-redis — Redis adapter (ioredis optional peer)
./event-target-websocket-client — Spoke-side WebSocket adapter
./event-target-websocket-server — Hub-side WebSocket adapter
./event-target-worker — Web Worker adapter (host + thread)
### `@alkdev/taskgraph` (npm package)
Task graph construction, analysis, frontmatter. Platform-agnostic.
Exports:
. — TaskGraph, analysis functions, schema, error types, frontmatter
```
**`@alkhub/hub` dependencies**: Add `@alkdev/operations`, `@alkdev/pubsub`, `@alkdev/taskgraph`. Remove `@repeaterjs/repeater` (inlined). Update: `ioredis` becomes optional, needed only if the Redis event target is used (it is an optional peer of `@alkdev/pubsub`).
**`@alkhub/spoke` dependencies**: Add `@alkdev/operations`, `@alkdev/pubsub`.
**Rules section** — Update rule 1: "core is transport-agnostic" becomes "packages should be transport-agnostic". Remove rule about core being persistence-agnostic (hub still is). Update dependency direction:
```
spoke → @alkdev/operations, @alkdev/pubsub
hub → @alkdev/operations, @alkdev/pubsub, @alkdev/taskgraph
hub ←/→ spoke (communicate via call protocol over WebSocket)
```
### 2.4 call-graph.md — Significant Update
**PendingRequestMap section** — Replace the schematic with actual `@alkdev/operations` API:
```ts
// From @alkdev/operations
import { PendingRequestMap } from "@alkdev/operations"
const prm = new PendingRequestMap({ eventTarget })
await prm.call(operationId, input, { deadline, identity })
const stream = prm.subscribe(operationId, input, { idleTimeout, identity })
prm.respond(requestId, output) // output must be ResponseEnvelope
prm.emitError(requestId, code, message, details) // details is optional
prm.complete(requestId)
prm.abort(requestId)
```
Key API differences from the doc:
- `call()` returns `Promise<ResponseEnvelope>` (not `Promise<unknown>`)
- `subscribe()` returns `AsyncIterable<ResponseEnvelope>`
- `respond()` requires output to be a `ResponseEnvelope`
- Deadline and idle timeout are built in
- Constructor takes optional `EventTarget` for pluggable transport
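
To make the correlation idea behind these points concrete, here is a minimal sketch of a request map with a built-in deadline. It is not the `@alkdev/operations` implementation: `MiniPendingMap`, its method signatures, and the `Envelope` type are hypothetical, and unlike the real `call()` it returns the request id alongside the promise so the example can respond to itself.

```ts
// Illustrative only: a tiny request/response correlation map with a deadline.
type Envelope<T> = { source: "local" | "http" | "mcp"; data: T };

type Pending = {
  resolve: (value: Envelope<unknown>) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

class MiniPendingMap {
  private pending = new Map<string, Pending>();
  private nextId = 0;

  // Registers a request and returns a promise that settles on respond(),
  // abort(), or when the deadline (in milliseconds) elapses.
  call(deadlineMs: number): { requestId: string; result: Promise<Envelope<unknown>> } {
    const requestId = `req-${this.nextId++}`;
    const result = new Promise<Envelope<unknown>>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`deadline exceeded for ${requestId}`));
      }, deadlineMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
    return { requestId, result };
  }

  // Resolves the matching request; returns false if it is unknown or settled.
  respond(requestId: string, output: Envelope<unknown>): boolean {
    const entry = this.take(requestId);
    if (!entry) return false;
    entry.resolve(output);
    return true;
  }

  // Rejects the matching request; returns false if it is unknown or settled.
  abort(requestId: string): boolean {
    const entry = this.take(requestId);
    if (!entry) return false;
    entry.reject(new Error(`aborted ${requestId}`));
    return true;
  }

  get size(): number {
    return this.pending.size;
  }

  // Removes an entry and cancels its deadline timer.
  private take(requestId: string): Pending | undefined {
    const entry = this.pending.get(requestId);
    if (entry) {
      clearTimeout(entry.timer);
      this.pending.delete(requestId);
    }
    return entry;
  }
}

const pendingMap = new MiniPendingMap();
const first = pendingMap.call(1000);
const delivered = pendingMap.respond(first.requestId, { source: "local", data: 42 });
const duplicate = pendingMap.respond(first.requestId, { source: "local", data: 43 });
```

A second `respond()` for the same id is ignored because the entry is removed as soon as the request settles, which is also what keeps the deadline timer from firing later.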
**CallHandler section** — Reference `buildCallHandler` from `@alkdev/operations`:
```ts
import { buildCallHandler } from "@alkdev/operations"
const handler = buildCallHandler({ registry, eventTarget })
```
**buildEnv section** — Remove `callMap` parameter. In `@alkdev/operations`, `buildEnv`:
- No longer takes `callMap` — uses `PendingRequestMap` internally
- Sets `trusted: true` on nested context
- Returns env functions that return `Promise<ResponseEnvelope>`
**Dependencies section** — Replace graphology direct deps. Graphology is now a transitive dependency through `@alkdev/taskgraph`. Call graph storage still uses graphology for runtime operations but should prefer `@alkdev/taskgraph`'s `TaskGraph` class when applicable.
### 2.5 operations.md — Major Rewrite
This doc needs significant restructuring since most of what it describes is now in `@alkdev/operations`.
**Key changes**:
- Remove "In-repo location: `packages/core/operations/`" — now external package
- Component descriptions should reference `@alkdev/operations` exports
- Schema Adapters section: Replace raw `@alkdev/typemap` dynamic import description with `SchemaAdapter` pattern
- Remove SSE Subscription Handler Fix from open issues — fixed in `@alkdev/operations/from-openapi`
- Update Call Protocol Integration section to reference `@alkdev/operations` API
- Add ResponseEnvelope concept (universal result wrapper: local/http/mcp)
- Add CallError/InfrastructureErrorCode concept
- Update access control: `enforceAccess` is now in the package, with `trusted` bypass
**New concepts to document**:
- `ResponseEnvelope<T>` with source discriminant (`"local"` | `"http"` | `"mcp"`)
- `subscribe()` helper for subscription operations
- `ScannerFS` interface (Deno runtime agnostic)
- `OpenAPIServiceRegistry` class for managing HTTP services
- `parseSSEFrames()` for SSE subscription handling
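
For readers unfamiliar with SSE framing, the sketch below parses a complete `text/event-stream` chunk into frames. It is a simplified illustration of the wire format, not the package's `parseSSEFrames()`; the `SseFrame` type and `parseFrames` function are hypothetical, and the sketch assumes the chunk ends on a frame boundary (a real handler must buffer partial frames).

```ts
// Illustrative SSE frame parser following the text/event-stream format.
type SseFrame = { event: string; data: string; id?: string };

function parseFrames(chunk: string): SseFrame[] {
  const parsed: SseFrame[] = [];
  // Frames are separated by a blank line; normalize CRLF first.
  for (const block of chunk.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // default event type when no `event:` field is set
    let id: string | undefined;
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
      else if (field === "id") id = value;
    }
    // A block without any data line is not dispatched as a frame.
    if (dataLines.length > 0) {
      parsed.push({ event, data: dataLines.join("\n"), id });
    }
  }
  return parsed;
}

const parsedFrames = parseFrames(
  'event: update\ndata: {"n":1}\n\n: keep-alive\n\ndata: a\ndata: b\n\n',
);
```

Multiple `data:` lines within one frame are joined with newlines, and comment-only blocks such as keep-alives produce no frame.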
### 2.6 pubsub-redis.md — Major Rewrite
This doc describes code that's now in `@alkdev/pubsub`. Key changes:
- **Source location**: `@alkdev/pubsub` npm package, not `packages/core/pubsub/`
- **createPubSub API**: Uses `PubSubEventMap` (simple `{ [eventType: string]: payload }`) not `PubSubPublishArgsByKey`
- **EventEnvelope**: New concept — `{ type, id, payload }` is the cross-process message format. Reserved `__` prefix for control messages.
- **Redis EventTarget**: Now accepts `prefix` option (e.g., `"alk:events:"`) and has `close()` method. No need for serializer workaround to add prefix.
- **WebSocket EventTarget**: No longer "Not started" / "Deferred". Document both client and server adapters.
- **Worker EventTarget**: New adapter for Web Workers (host + thread).
- **Operators**: 13 operators, not 3. New: `take`, `reduce`, `toArray`, `batch`, `dedupe`, `window`, `flat`, `groupBy`, `chain`, `join`.
- **Repeater**: Inlined, no longer depends on `@repeaterjs/repeater` externally.
- **Prior Art section**: Update to reflect `@alkdev/pubsub` is a standalone package, not forked code in-repo.
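
As a rough illustration of the EventEnvelope and prefix points above, the sketch below round-trips an envelope through JSON and derives a channel name. Only the `{ type, id, payload }` fields, the reserved `__` prefix, and the `"alk:events:"` example come from this review; the helper names and the channel layout (prefix plus event type) are assumptions, not the `@alkdev/pubsub` API.

```ts
// Hypothetical sketch of the cross-process envelope described above.
type EventEnvelope<T = unknown> = { type: string; id: string; payload: T };

const CONTROL_PREFIX = "__";

// Control messages use the reserved "__" prefix on the event type.
function isControlEnvelope(envelope: EventEnvelope): boolean {
  return envelope.type.startsWith(CONTROL_PREFIX);
}

// Assumed channel naming: configured prefix followed by the event type.
function channelFor(prefix: string, envelope: EventEnvelope): string {
  return `${prefix}${envelope.type}`;
}

// Envelopes cross process boundaries as JSON text.
function encode(envelope: EventEnvelope): string {
  return JSON.stringify(envelope);
}

// Validates the minimal shape before trusting a message from another process.
function decode(raw: string): EventEnvelope {
  const value: unknown = JSON.parse(raw);
  if (
    typeof value !== "object" || value === null ||
    typeof (value as EventEnvelope).type !== "string" ||
    typeof (value as EventEnvelope).id !== "string" ||
    !("payload" in value)
  ) {
    throw new Error("not an EventEnvelope");
  }
  return value as EventEnvelope;
}

const sent: EventEnvelope<{ status: string }> = {
  type: "task.updated",
  id: "t-1",
  payload: { status: "done" },
};
const received = decode(encode(sent));
const channel = channelFor("alk:events:", received);
```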
### 2.7 storage/tasks.md — Update Graphology Section
**"Graphology Integration" section** — Replace direct graphology usage with `@alkdev/taskgraph`:
Instead of:
```
1. Load all tasks + task_dependencies rows for a project from the DB
2. Build a graphology DirectedGraph in memory
3. Run graph algorithms as needed
```
Use:
```
1. Load all tasks + task_dependencies rows for a project from the DB
2. Build a TaskGraph via TaskGraph.fromRecords(tasks, edges)
3. Run analysis functions as needed (criticalPath, parallelGroups, bottlenecks, riskPath, etc.)
```
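
To show what "build a graph from rows, then run analysis" amounts to, here is a self-contained sketch that computes a critical path length with a topological pass. It deliberately does not use `@alkdev/taskgraph`; the `TaskRow` and `DependencyRow` shapes and the `criticalPathLength` function are hypothetical stand-ins for the DB rows and the package's analysis functions.

```ts
// Hypothetical row shapes standing in for tasks and task_dependencies.
type TaskRow = { id: string; estimate: number };
type DependencyRow = { from: string; to: string }; // `to` depends on `from`

function criticalPathLength(tasks: TaskRow[], edges: DependencyRow[]): number {
  const estimate = new Map<string, number>();
  const successors = new Map<string, string[]>();
  const indegree = new Map<string, number>();
  for (const task of tasks) {
    estimate.set(task.id, task.estimate);
    successors.set(task.id, []);
    indegree.set(task.id, 0);
  }
  for (const edge of edges) {
    const next = successors.get(edge.from);
    if (next === undefined || !indegree.has(edge.to)) {
      throw new Error(`edge references unknown task: ${edge.from} -> ${edge.to}`);
    }
    next.push(edge.to);
    indegree.set(edge.to, (indegree.get(edge.to) ?? 0) + 1);
  }

  // Kahn's algorithm: process tasks whose dependencies are all finished,
  // tracking the earliest possible start time of each task.
  const ready = tasks.filter((t) => indegree.get(t.id) === 0).map((t) => t.id);
  const start = new Map<string, number>();
  let visited = 0;
  let longest = 0;
  while (ready.length > 0) {
    const id = ready.shift() as string;
    visited++;
    const done = (start.get(id) ?? 0) + (estimate.get(id) ?? 0);
    longest = Math.max(longest, done);
    for (const succ of successors.get(id) ?? []) {
      start.set(succ, Math.max(start.get(succ) ?? 0, done));
      const remaining = (indegree.get(succ) ?? 0) - 1;
      indegree.set(succ, remaining);
      if (remaining === 0) ready.push(succ);
    }
  }
  // Any task never reached sits on a dependency cycle.
  if (visited !== tasks.length) throw new Error("dependency cycle detected");
  return longest;
}

const taskRows: TaskRow[] = [
  { id: "schema", estimate: 2 },
  { id: "migrations", estimate: 3 },
  { id: "api", estimate: 5 },
  { id: "docs", estimate: 1 },
];
const dependencyRows: DependencyRow[] = [
  { from: "schema", to: "migrations" },
  { from: "migrations", to: "api" },
  { from: "schema", to: "docs" },
];
const pathLength = criticalPathLength(taskRows, dependencyRows); // schema, migrations, api: 2 + 3 + 5
```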
**Frontmatter parsing** — Reference `@alkdev/taskgraph`'s `parseFrontmatter` and `serializeFrontmatter` functions instead of custom parsers. Note: `parseTaskFile` and `parseTaskDirectory` are Node.js only (use `node:fs/promises`).
**References section** — Update graphology reference to point to `@alkdev/taskgraph` package.
**NAPI note** — The doc says "Why not taskgraph NAPI for v1". This is now resolved: `@alkdev/taskgraph` is pure TypeScript (graphology-based), and the Rust CLI (`taskgraph`) is for offline analysis. The TS package handles runtime graph ops.
### 2.8 hub-architecture.md — Update Component Table
- Operations row: `@alkdev/operations` not `core/operations/`
- PubSub row: `@alkdev/pubsub` not `core/pubsub/`
- Call protocol row: `@alkdev/operations` not `core/` (see call-graph.md)
- WebSocket adapter: "pending" → "available in `@alkdev/pubsub`"
### 2.9 hub-config.md — Update Redis EventTarget Example
Update `createRedisEventTarget` example to include `prefix`:
```ts
createRedisEventTarget({
  publishClient,
  subscribeClient,
  prefix: "alk:events:",
})
```
### 2.10 hub-startup.md — Update References
- PendingRequestMap + CallHandler: note these come from `@alkdev/operations`
- PubSub setup: reference `@alkdev/pubsub` with `prefix` option
### 2.11 spoke-runner.md — Update References
- WebSocketEventTarget: `@alkdev/pubsub/event-target-websocket-client`
- PendingRequestMap: `@alkdev/operations`
- Scanner: `@alkdev/operations` with `ScannerFS` Deno adapter
- SchemaAdapters: `@alkdev/operations/from-typemap`
- `FromSchema()` / `FromOpenAPI()`: `@alkdev/operations/from-schema` / `@alkdev/operations/from-openapi`
### 2.12 ADR-013 — Update Paths
- Update `packages/core/operations/scanner.ts` references to `@alkdev/operations/scanner`
- Update `packages/core/operations/from_schema.ts` references to `@alkdev/operations/from_schema`
- Update `packages/core/operations/from_openapi.ts` references to `@alkdev/operations/from-openapi`
- Update scanner enhancement task to reference `SchemaAdapter` pattern from `@alkdev/operations/from-typemap`
### 2.13 docs/research/migration/ — Update or Archive
Both `operations.md` and `pubsub.md` in this directory describe planned extractions that are now **complete**. Options:
- **Archive**: Move to `docs/research/migration/completed/` with a status note
- **Update**: Rewrite as "completed migration" docs showing before/after
Recommend: Archive both. They served their purpose and the current API surface is documented in the `@alkdev/*` package READMEs and this review.
### 2.14 docs/reviews/docs-consistency-review-2026-04-17.md — Superseded Entries
Several findings from the previous review are now resolved by the extractions:
| Finding | Original Issue | Resolution |
|---------|---------------|------------|
| C5 | PendingRequestMap is in core, not hub | **Resolved**: Now in `@alkdev/operations` |
| I2 | `env.ts` has PendingRequestMap interface only | **Resolved**: Full implementation in `@alkdev/operations` |
| I5 | `OperationContext.pubsub` typed as unknown | **Resolved**: `pubsub` field removed from context in `@alkdev/operations` |
| I6 | `OperationContext.stream` never populated | **Resolved**: `stream` field removed from context in `@alkdev/operations` |
| I7 | `@repeaterjs/repeater` version mismatch risk | **Resolved**: Inlined in `@alkdev/pubsub`, no external dep |
---
## 3. What's Now Unblocked
| Component | Previous Status | Now Available In |
|-----------|-----------------|------------------|
| Call protocol (PendingRequestMap, CallHandler) | Not started | `@alkdev/operations` |
| WebSocket transport (client + server) | Not started | `@alkdev/pubsub` |
| WebSocket connection management (backpressure, SpokeEventTarget) | Not started | `@alkdev/pubsub` |
| Access control enforcement (checkAccess, enforceAccess) | Not started | `@alkdev/operations` |
| Task graph operations (topo sort, cycles, critical path, risk) | Not started | `@alkdev/taskgraph` |
| ResponseEnvelope (source tracking) | Not started | `@alkdev/operations` |
| Schema conversion (Zod/Valibot) | Not started | `@alkdev/operations/from-typemap` |
| SSE subscription handling | Broken | `@alkdev/operations/from-openapi` |
| Error model (CallError, InfrastructureErrorCode) | Not started | `@alkdev/operations` |
| EventEnvelope (structured cross-process messages) | Not started | `@alkdev/pubsub` |
## 4. What Still Needs Implementation
All of these are hub or spoke level concerns that can now be built on top of the extracted packages:
| Component | Depends On | Spec |
|-----------|------------|------|
| Storage (Drizzle+Postgres tables, migrations) | `@alkdev/typebox`, `@alkdev/drizzlebox`, `drizzle-orm` | storage/ |
| Hub HTTP server (Hono) | `@alkdev/operations`, `@alkdev/pubsub`, `hono` | hub-architecture.md |
| Spoke WebSocket client | `@alkdev/operations`, `@alkdev/pubsub/event-target-websocket-client` | spoke-runner.md |
| Hub WebSocket server (spoke management) | `@alkdev/operations`, `@alkdev/pubsub/event-target-websocket-server` | spoke-runner.md |
| OpenAI proxy | `hono`, AI SDK | agent-sessions.md |
| Auth (keypal) | Hono middleware | — |
| MCP server (@hono/mcp) | `@alkdev/operations`, `@hono/mcp` | mcp-server.md |
| Agent sessions (AI SDK) | `@alkdev/operations`, AI SDK, storage | agent-sessions.md |
| Coordination operations | `@alkdev/operations`, storage | coordination.md |
| Call graph storage | `@alkdev/operations`, storage | storage/call-graph.md |
| Hub config loader | `@alkdev/operations` (config types) | hub-config.md |
| Logger configuration | logtape | — |
---
## 5. Package Dependency Graph (New)
```
@alkdev/operations → @alkdev/typebox, @alkdev/pubsub, @logtape/logtape
→ (optional peers): @alkdev/typemap, @modelcontextprotocol/sdk
@alkdev/pubsub → (no runtime deps)
→ (optional peer): ioredis (for ./event-target-redis)
@alkdev/taskgraph → @alkdev/typebox, graphology (+plugins), yaml
@alkhub/hub → @alkdev/operations, @alkdev/pubsub, @alkdev/taskgraph,
@alkdev/typebox, @alkdev/drizzlebox, hono, drizzle-orm,
ioredis, ai, keypal, logtape, @hono/mcp
@alkhub/spoke → @alkdev/operations, @alkdev/pubsub, @alkdev/typebox, logtape
```
No `@alkhub/core` package. Config types, logger, and crypto utils live in `@alkhub/hub` (or a thin shared package if spokes need config types — this can be decided when implementing the spoke).
---
## 6. Open Decisions
### 6.1 Where do config types go?
`core/config/types.ts` has `HubConfig`, `SpokeConfig`, `BaseConfig`, `PostgresConfig`, `RedisConfig`, `HttpConfig`, `AuthConfig`. These are used by both hub and spoke.
Options:
- **A**: Move to `@alkhub/hub`. Spokes that need config types import them from their own copy or a minimal `@alkhub/config` package.
- **B**: Create `@alkdev/config` npm package. Platform-agnostic like the other `@alkdev/*` packages.
- **C**: Put config types in `@alkdev/operations`. They're already TypeBox schemas and operations already depend on `@alkdev/typebox`.
**Recommendation**: A for now. First spokes won't need hub config. Re-evaluate when a spoke actually needs shared config types. The spoke config types are already minimal (`SpokeConfig` has `hub.url` and `hub.auth.tokenFile`).
### 6.2 Logger and crypto?
`core/logger/mod.ts` (27 lines) and `core/utils/crypto.ts` (119 lines) are hub-specific concerns. Move them into `@alkhub/hub` directly.
### 6.3 How to handle `ScannerFS` for Deno?
`@alkdev/operations` uses an abstract `ScannerFS` interface. The spoke needs a Deno adapter:
```ts
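import { scanOperations, type ScannerFS } from "@alkdev/operations"
// Deno-backed adapter for the runtime-agnostic ScannerFS interface
const DenoFS: ScannerFS = {
  readdir: async (path) => Deno.readDir(path),
  cwd: () => Deno.cwd(),
}
// Scan the spoke's ./operations directory through the Deno adapter
const operations = await scanOperations("./operations", DenoFS)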
```
This is minimal (~3 lines) and can live in the spoke package.
### 6.4 Research migration docs?
`docs/research/migration/operations.md` and `docs/research/migration/pubsub.md` describe extraction plans that are now complete. They should be archived or removed — they're historical context, not current documentation.
### 6.5 Previous consistency review findings?
The `docs-consistency-review-2026-04-17.md` has several findings that are now resolved by the extractions (C5, I2, I5, I6, I7 at minimum). These should be marked resolved in that document or superseded by this review.
---
## 7. Suggested Execution Order
1. **Delete replaced code** from `packages/core/` (operations, pubsub, mcp dirs + their tests)
2. **Update `packages/core/deno.json`** — remove deleted exports and dependencies
3. **Relocate remaining core modules** (config, logger, crypto) into `packages/hub/`
4. **Remove `packages/core/`** from workspace
5. **Update architecture docs** (overview, packages, call-graph, operations, pubsub-redis as priority)
6. **Update AGENTS.md** — provenance, key patterns, reference deps, workspace structure
7. **Update storage/tasks.md** — taskgraph references
8. **Update secondary docs** (hub-architecture, hub-config, hub-startup, spoke-runner, ADR-013)
9. **Archive research/migration docs** or mark as completed
10. **Update docs-consistency-review-2026-04-17.md** — mark superseded findings as resolved


@@ -0,0 +1,260 @@
---
status: resolved
created: 2026-04-17
last_updated: 2026-04-17
---
# Documentation Consistency Review
Review of AGENTS.md and all 12 architecture docs for conflicting, confusing, and inconsistent content. Findings are organized by severity: Conflicts (actively misleading), Inconsistencies (confusing), and Gaps (missing info).
Each finding has a resolution status: **open** (needs decision), **resolved** (fixed), or **wontfix** (explicitly justified with rationale).
---
## 🔴 Conflicts — Actively Misleading
### C1. Runner/Spoke writes directly to Postgres vs. "No Postgres Connection" — ✅ resolved
**Files**: `agent-sessions.md`, `spoke-runner.md`, `packages.md`
**Problem**: `agent-sessions.md` diagram showed direct Postgres access from runner, contradicting spoke-runner.md ("No Postgres connection") and packages.md.
**Resolution**: Fixed diagram — session writes now go through hub operations (call protocol), not direct Postgres. Runner is stateless.
---
### C2. Hub "inherits from spoke" — ✅ resolved
**Files**: `hub-architecture.md`, `packages.md`, `AGENTS.md`
**Problem**: "Hub = Spoke + Orchestration — *inherits* the spoke's operation registry..." implied hub depends on spoke. Actual model: both → core independently.
**Resolution**: Rewrote to "Hub shares core with spoke, adds orchestration." Updated table section from "Kept from ade_spoke (wholesale)" to "From core (shared with spoke)."
---
### C3. Call protocol: conflicting signals on whether to build it now — ✅ resolved
**Files**: `call-graph.md`, `operations.md`, `overview.md`
**Problem**: Three docs gave different signals — call-graph.md said initial implementation, operations.md said stopgap without it, overview.md said needs implementation.
**Resolution**: Call protocol is in initial implementation. Removed stopgap language from operations.md. Updated overview.md to clarify it's the implementation that's needed, not the design decision. The stopgap reference was from a session that conflated the open-coordinator dev plugin with the project's native call protocol.
---
### C4. Coordination operations use `registry.execute()` — ✅ resolved
**Files**: `coordination.md`, `call-graph.md`
**Problem**: All `coord.*` operations showed `registry.execute()` calls, bypassing the call protocol designed to solve exactly the abort cascading problem that coordination needs.
**Resolution**: Updated coordination.md to use `env.*` (call protocol via buildEnv) instead of `registry.execute()`. The previous form was from the initial POC; the real implementation should use the call protocol.
---
### C5. PendingRequestMap package location: core vs. hub — ✅ resolved
**Files**: `call-graph.md`, `operations.md`, `packages.md`
**Problem**: `buildEnv()` in `core/operations/env.ts` takes `callMap: PendingRequestMap`. `packages.md` listed PendingRequestMap in hub. Circular dependency risk.
**Resolution**: PendingRequestMap belongs in core because both hub and spoke need it. Updated `packages.md` to list `call/` module in core with PendingRequestMap, CallHandler, and call event types. Hub module changed from "Call protocol" to "Call graph" (runtime tracking/observability using core's PendingRequestMap).
> **Resolution (2026-05-18)**: PendingRequestMap is now in `@alkdev/operations` package with full implementation (not just an interface). The complete class includes `call()`, `subscribe()`, `respond()`, `emitError()`, `complete()`, and `abort()` methods. Resolved by core library extraction to `@alkdev/operations`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`.
---
## 🟡 Inconsistencies — Confusing
### I1. Redis EventTarget status duplicated in AGENTS.md provenance — ✅ resolved
**Problem**: Same work described in both "PubSub" row and "Redis EventTarget" row.
**Resolution**: Merged. Provenance table now has separate rows for PubSub (createPubSub + operators), TypedEventTarget, Redis EventTarget — each with single source of truth.
---
### I2. "Do not reference paths outside this repo" vs. provenance external refs — ✅ resolved
**Problem**: Rule prohibited external paths but provenance table was full of them with no exemption.
**Resolution**: Rewrote provenance section with explanation: "ade_spoke was a predecessor project — references are for historical traceability only." Sources now say "Copied from predecessor project" instead of `ade_spoke/operations/`. Made the rule clearer: `/workspace/` checkouts of public packages are fine; private project paths are not.
---
### I3. "Not for copying code from" vs. "Copied to core/" — ✅ resolved
**Problem**: Reference deps say read-only; provenance shows code copied from those same sources.
**Resolution**: Rules now clarify: provenance code was copied during initial setup; going forward reference deps are read-only for source-level understanding only. The distinction is: (1) use local clones as references when you have questions — source and tests beat docs, (2) don't pull in references to in-house private projects that outsiders won't have access to.
---
### I4. graphql-yoga "should fork in" (future) vs. already forked (past) — ✅ resolved
**Problem**: Line 97 said "we should fork in" while line 76 said "Done ✅."
**Resolution**: Updated AGENTS.md graphql-yoga row to past tense: "Source of createPubSub + event-target code (already forked into core/pubsub/). Kept for reference only."
---
### I5. AI SDK version column had three different versions — ✅ resolved
**Problem**: npm Version `6.0.138`, parenthetical "latest 6.x stable", git checkout `6.0.165`.
**Resolution**: Updated to: npm "Will use latest 6.x stable (currently 6.0.168)", git checkout `6.0.165` (slightly behind). Removed the stale `6.0.138` reference.
---
### I6. Four operations vs. Three MCP tools — ✅ resolved
**Problem**: Spoke protocol has `list`; MCP server didn't expose it.
**Resolution**: Added `list` as a fourth MCP tool. Updated mcp-server.md throughout (3→4 tools). Updated overview.md and AGENTS.md to match.
---
### I7. `mappings` table schema conflicts — ✅ resolved
**Resolution**: Renamed `storage-pattern.md` → `storage.md`. All table schemas now canonical in storage.md. Removed inline schemas from coordination.md and call-graph.md — they now link to storage.md. Added `detections` table, `status` column on `mappings`, and full column lists for `call_graph_nodes`/`call_graph_edges`.
---
### I8. Status enum mismatch: call graph vs. mappings — ✅ resolved
**Resolution**: Added a "Status Enum Reference" section to storage.md documenting all status enums and explaining that `mappings.active` and `call_graph_nodes.pending`/`running` are different concepts — "active" = workflow in progress, "pending"/"running" = call execution state.
---
### I9. `call_graph_nodes` columns missing from storage-pattern.md summary — ✅ resolved
**Resolution**: Full column lists for all tables now in storage.md. Removed the abbreviated summary table format in favor of per-table detailed specs.
---
### I10. Identity model — ✅ resolved
**Problem**: Call protocol `Identity` had `roles: string[]` and `AccessControl` had `requiredRoles`. These came from a prior project's dual auth system (token/keys + iroh identities). With keypal as the single auth mechanism, "roles" are just scope bundles — a configuration convention, not a separate type.
**Resolution**:
- Removed `roles` from `Identity` interface and TypeBox schema. Now `{ id, scopes, resources }` — matches keypal's `ApiKeyMetadata` exactly.
- Renamed `AccessControl.requiredRoles` → `requiredScopesAny` (OR semantics for "any of these scopes").
- Added Access Control Model section to operations.md explaining how keypal scopes/resources map to AccessControl checks.
- Updated call-graph.md `CallEventMap` and error model to match.
- All 16 core tests pass.
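
A minimal sketch of the OR semantics behind `requiredScopesAny`, assuming the `{ id, scopes, resources }` identity shape described above. The type of `resources`, the trimmed `AccessControl` interface, and the `hasAnyRequiredScope` helper are hypothetical illustrations, not the `@alkdev/operations` API.

```typescript
// Identity shape from this resolution: { id, scopes, resources }.
// The concrete type of `resources` is an assumption for illustration.
interface Identity {
  id: string;
  scopes: string[];
  resources: Record<string, string[]>;
}

// Only the field discussed here; the real AccessControl type may carry more.
interface AccessControl {
  requiredScopesAny?: string[];
}

// Hypothetical helper: passes when the identity holds at least one listed scope.
function hasAnyRequiredScope(identity: Identity, ac: AccessControl): boolean {
  const required = ac.requiredScopesAny ?? [];
  if (required.length === 0) return true; // nothing required, nothing to check
  return required.some((scope) => identity.scopes.includes(scope));
}

// Example identity with a single scope (scope names are made up).
const agent: Identity = { id: "key_1", scopes: ["task:read"], resources: {} };
const canReadOrWrite = hasAnyRequiredScope(agent, {
  requiredScopesAny: ["task:read", "task:write"],
});
```

Under this reading a "role" is just a named list of scopes handed to `requiredScopesAny`, which is why no separate `roles` type is needed.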
---
### I11. "Kept from ade_spoke" section includes new designs — ✅ resolved (with C2)
**Resolution**: Section renamed to "From core (shared with spoke)" and new designs moved or reclassified.
---
### I12. SSE vs WebSocket clarification — ✅ resolved
**Resolution**: Added clarification to call-graph.md: WebSocket is primary bidirectional transport for hub↔spoke and hub↔client-spoke. SSE exists for compatibility (OpenAI proxy, legacy clients) but is not preferred. A client connecting as a spoke gets full bidirectional communication over a single WebSocket. Updated AGENTS.md constraint to match. Updated hub-architecture.md hub responsibilities.
---
### I13. WebSocketEventTarget: hub-side spec — ✅ resolved (architectural task noted)
**Resolution**: Added "Hub-Side WebSocket Handling (Architectural Task)" section to spoke-runner.md outlining the needed components: Hono WebSocket upgrade, per-connection WebSocketEventTarget + PendingRequestMap, spoke lifecycle management, identity/authentication at upgrade. Flagged as architectural task needing deeper design before implementation.
---
### I14. Container Manager → Container Spoke (deferred) — ✅ resolved
**Resolution**: Renamed "Container Manager" → "Container Spoke (deferred)" in hub-architecture.md. Added "Container Spoke (deferred)" spoke type to spoke-runner.md explaining it extends base spoke with Docker + opencode lifecycle. Prerequisite: working hub + minimal base spoke first. Also added a vast.ai variant note.
---
### I15. OpenAI Proxy needs a doc home — ✅ resolved
**Resolution**: Added "OpenAI proxy — LLM provider proxy, key management, rate limiting (blocks all LLM usage)" to hub modules in packages.md. Added "Proxy LLM calls" to hub responsibilities in hub-architecture.md.
---
### I16. `ade_spoke` / `ade-v0` / `open-coordinator` unexplained external references — ✅ resolved (with I2)
**Resolution**: AGENTS.md provenance now explains predecessor project context. Sources say "Copied from predecessor project" instead of cryptic paths. open-coordinator references removed from architecture docs (it's a dev tool, not project code).
---
### I17. Open questions not cross-referenced between docs — ✅ resolved
**Resolution**: Added cross-references between hub-architecture.md (API auth question) and spoke-runner.md (WebSocket auth question). Updated container lifecycle question in spoke-runner.md to reference the deferred container spoke. These cross-references should help reduce future drift since it's obvious when a related doc needs updating.
---
### I18. AGENTS.md: "call ≡ subscribe at protocol level" ambiguous — ✅ resolved
**Resolution**: Expanded in AGENTS.md to: "see call-graph.md: a call resolves after one event, a subscription stays open and yields events until stopped. Same message format, different consumption pattern."
---
## 🔵 Gaps — Missing Info (Not Contradictory)
| # | Gap | Where | Status | Suggested Fix |
|---|-----|-------|--------|---------------|
| G1 | `detections` table not in storage docs | coordination.md, storage.md | ✅ resolved | Added to storage.md table list |
| G2 | MCP client vs MCP server not distinguished | packages.md | ✅ resolved | Added clarification: MCP client in core (spokes need it), MCP server hub-only |
| G3 | No Deno version specified | AGENTS.md | ✅ resolved | Added: "latest stable, currently 2.6.9" |
| G4 | Do `hub/` and `spoke/` dirs exist? | AGENTS.md workspace structure | ✅ resolved | All three package dirs exist |
| G5 | Keypal version "close enough" | AGENTS.md | ✅ resolved | Updated to note "behind npm — needs tag update" |
| G6 | `DbType.Table` not explained | AGENTS.md | ✅ resolved | Added explanation: "from our prior project's storage layer — use drizzle-typebox pattern instead" |
| G7 | Graphology "not installed yet" may be stale | AGENTS.md | ✅ resolved | Verified: not in deno.json yet, updated phrasing |
| G8 | Provenance statuses undated | AGENTS.md | ✅ resolved | Rewrote provenance for clarity; historical context noted |
| G9 | `scripts/analyze_lint.ts` not explained | AGENTS.md | ✅ resolved | Verified exists; added description: in-house dev tool (filtering, stats for large lint output) |
---
## Resolution Log
| ID | Decision | Date | Rationale |
|----|----------|------|-----------|
| C1 | Fixed diagram: session writes go through hub, not direct Postgres | 2026-04-17 | Spokes have no Postgres connection; writes must go through hub operations |
| C2 | Rewrote "inherits spoke" to "shares core, adds orchestration" | 2026-04-17 | Actual dependency model is hub→core, spoke→core, not hub→spoke |
| C3 | Call protocol is initial implementation; removed stopgap language | 2026-04-17 | Stopgap/open-coordinator references were from a session that conflated dev plugin with project code. Call protocol is project code |
| C4 | Coordination ops use call protocol (env.*) not registry.execute() | 2026-04-17 | registry.execute() was POC pattern; call protocol provides abort cascading and observability that coordination needs |
| C5 | PendingRequestMap is in core, not hub | 2026-04-17 | Both hub and spoke need it; core's buildEnv() references it |
| I1-I6 | AGENTS.md provenance and reference deps rewritten for clarity | 2026-04-17 | Eliminated duplicated rows, clarified rules about external refs vs reference deps, fixed version info, added list to MCP tools |
| I7/I8/I9 | Storage doc centralized all table schemas; removed inline duplications | 2026-04-17 | Renamed storage-pattern.md → storage.md; coordination.md and call-graph.md now link to it; added detections table, status column on mappings, full column lists |
| I10 | Removed roles from Identity; renamed requiredRoles → requiredScopesAny | 2026-04-17 | With keypal as single auth, "roles" are scope bundles (convention), not a type. Identity now { id, scopes, resources } matching keypal's ApiKeyMetadata. AccessControl.requiredRoles → requiredScopesAny |
| I12 | SSE/WebSocket transport distinction clarified | 2026-04-17 | WebSocket primary for all bidirectional communication; SSE for compatibility only. Updated call-graph.md, AGENTS.md, hub-architecture.md |
| I13 | Hub-side WebSocket handling flagged as architectural task | 2026-04-17 | Added spec outline to spoke-runner.md; needs deeper design |
| I14 | Renamed Container Manager → Container Spoke (deferred) | 2026-04-17 | Extends base spoke with Docker/opencode lifecycle. Prerequisite: working hub + minimal spoke first |
| I15 | OpenAI proxy added to hub module list and responsibilities | 2026-04-17 | Added to packages.md and hub-architecture.md |
| I16 | open-coordinator references removed from architecture docs | 2026-04-17 | It's a dev tool for local agent coordination, not a project dependency |
| I17 | Cross-references added between hub and spoke open questions | 2026-04-17 | Auth and container questions now link between docs |
| I18 | "call ≡ subscribe" expanded with explanation and link | 2026-04-17 | AGENTS.md now explains: call resolves after one event, subscribe streams until stopped |
---
## Superseding Resolutions (2026-05-18 Core Library Extraction)
The following findings from this review have been further resolved by the extraction of `@alkdev/operations` v0.1.0 and `@alkdev/pubsub` v0.1.0 to npm. The original resolution in each case was correct at the time; these notes record the additional progress.
| Finding | Original Issue | Additional Resolution |
|---------|---------------|----------------------|
| C5 | PendingRequestMap is in core, not hub | **Further resolved**: PendingRequestMap is now in `@alkdev/operations` package with full implementation (not just an interface). Resolved by core library extraction to `@alkdev/operations`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`. |
| I2 | `env.ts` has PendingRequestMap interface only | **Further resolved**: Full PendingRequestMap class is now in `@alkdev/operations` with `call()`, `subscribe()`, `respond()`, `emitError()`, `complete()`, and `abort()`. Resolved by core library extraction to `@alkdev/operations`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`. |
| I5 | `OperationContext.pubsub` typed as unknown | **Further resolved**: `pubsub` field has been removed from OperationContext in `@alkdev/operations`. Subscriptions use `PendingRequestMap.subscribe()` instead. Resolved by core library extraction to `@alkdev/operations`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`. |
| I6 | `OperationContext.stream` never populated | **Further resolved**: `stream` field has been removed from OperationContext in `@alkdev/operations`. Resolved by core library extraction to `@alkdev/operations`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`. |
| I7 | `@repeaterjs/repeater` version mismatch risk | **Further resolved**: Repeater is now inlined in `@alkdev/pubsub`, eliminating the external dependency and version mismatch risk. Resolved by core library extraction to `@alkdev/pubsub`. See `docs/reviews/core-library-extraction-sync-2026-05-18.md`. |
---
## Remaining Open Items
All items from this review have been resolved. Future architecture work that was identified:
1. **Hub-side WebSocket handling** (I13) — spec outline added, needs deeper design before implementation
2. **Container spoke** (I14) — deferred until hub + minimal spoke are working
3. **Instruction firewall** — future project for safe bash/filesystem access from untrusted agent roles
4. **Message/part schema iteration** — storage.md has structure, detailed data shapes need more work
7. **I17** — Cross-reference open questions between docs
8. **I18** — "call ≡ subscribe" needs clarification
9. **G1/G2/G3/G9** — Small gaps (detections table, MCP client/server, Deno version, lint script)

@@ -0,0 +1,782 @@
---
status: active
last_updated: 2026-04-21
review_date: 2026-04-21
reviewer: architect (with 5 subagent reviewers)
scope: docs/architecture/storage/* + docs/decisions/ADR-001 through ADR-012
resolution: pending
---
# Storage Architecture Review: 2026-04-21
Comprehensive review of the storage specification documents (`docs/architecture/storage/`) and related ADRs. Five parallel subagent reviews were conducted, each focused on a domain area. Their findings are consolidated here with deduplication, prioritization, and cross-references.
## Review Sessions (open-memory)
| # | Domain | Session ID |
|---|--------|------------|
| 1 | Identity & Auth | `ses_24f76141effegdhw2bxX2sOvYb` |
| 2 | Sessions & Messages | `ses_24f751efbffeyWo9wb6hAnnj0y` |
| 3 | Services, Spokes, Call Graph | `ses_24f746ebbffeG4jqN3MbK5i9yt` |
| 4 | Tasks & Coordination | `ses_24f7431baffeElbZ3qVHCYQOSv` |
| 5 | Cross-Cutting Concerns | `ses_24f735dbcffea1pN0JCgtPdbt2` |
## Documents Reviewed
- `docs/architecture/storage/README.md` — common pattern, package structure, open questions
- `docs/architecture/storage/table-reference.md` — cross-cutting reference (cascades, indexes, enums, relations)
- `docs/architecture/storage/identity.md` — accounts, organizations, organization_members, api_keys, audit_logs
- `docs/architecture/storage/projects.md` — projects, workspaces
- `docs/architecture/storage/sessions.md` — sessions, messages, parts
- `docs/architecture/storage/roles.md` — roles
- `docs/architecture/storage/services.md` — clients, client_secrets
- `docs/architecture/storage/spokes.md` — spokes, operation_specs
- `docs/architecture/storage/call-graph.md` — call_graph_nodes, call_graph_edges
- `docs/architecture/storage/coordination.md` — mappings, detections
- `docs/architecture/storage/tasks.md` — tasks, task_dependencies
- `docs/decisions/ADR-001` through `ADR-012`
## Summary Statistics
| Severity | Count |
|----------|-------|
| 🔴 Critical | 14 |
| 🟡 Warning | 22 |
| 💡 Suggestion | 17 |
---
## 🔴 Critical Issues
Issues that must be resolved before the storage spec is stabilized. Each represents a concrete inconsistency, data integrity risk, or ambiguity that would cause implementation divergence.
---
### C01. `NOT NULL` + `onDelete: SET NULL` — Contradictory Constraints
**Sessions**: 1, 2, 5
**Files**: `sessions.md:17`, `identity.md:112`, `table-reference.md:71,80`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`, `ses_24f751efbffeyWo9wb6hAnnj0y`, `ses_24f735dbcffea1pN0JCgtPdbt2`
Two FK columns are declared `NOT NULL` but have `onDelete: SET NULL`. PostgreSQL will reject the DELETE because it cannot nullify a NOT NULL column:
1. **`sessions.accountId`** — `text NOT NULL` (`sessions.md:17`) with `onDelete: SET NULL` (`table-reference.md:80`). Deleting an account that owns sessions fails.
2. **`audit_logs.ownerId`** — `text NOT NULL` (`identity.md:112`) with `onDelete: SET NULL` (`table-reference.md:71`). Deleting an account that has audit entries fails.
**Recommendation**: For each, choose one:
- Make the column **nullable** (if detaching on delete is desired)
- Change cascade to **RESTRICT** (if the FK must always be populated — blocks account deletion)
- Change cascade to **CASCADE** (if deleting dependent records is acceptable)
- Add application-level logic that reassigns/destroys dependents before account deletion
For `audit_logs.ownerId`, RESTRICT may be correct — audit trails should prevent account deletion. For `sessions.accountId`, nullable is likely correct — orphaned sessions (account deleted) are still valuable data.
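
The contradiction is mechanical enough to lint for. The following is a sketch under stated assumptions: `FkColumn` and `findContradictions` are hypothetical names, the schema is a hand-written description rather than anything read from Drizzle, and the `notNull` value on `operation_specs.spokeId` is assumed for contrast.

```typescript
type OnDelete = "CASCADE" | "RESTRICT" | "SET NULL" | "NO ACTION";

// Hand-written description of one FK column (hypothetical shape).
interface FkColumn {
  table: string;
  column: string;
  notNull: boolean;
  onDelete: OnDelete;
}

// SET NULL needs a nullable column; a NOT NULL column with SET NULL makes the
// parent DELETE fail whenever referencing rows exist.
function findContradictions(columns: FkColumn[]): FkColumn[] {
  return columns.filter((c) => c.notNull && c.onDelete === "SET NULL");
}

// The two columns flagged in C01, plus one non-contradictory FK for contrast.
const fks: FkColumn[] = [
  { table: "sessions", column: "accountId", notNull: true, onDelete: "SET NULL" },
  { table: "audit_logs", column: "ownerId", notNull: true, onDelete: "SET NULL" },
  { table: "operation_specs", column: "spokeId", notNull: true, onDelete: "CASCADE" },
];

const contradictions = findContradictions(fks).map((c) => `${c.table}.${c.column}`);
```

A check like this could run against the cascade table in `table-reference.md` whenever the spec changes, so the same class of mismatch does not reappear.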
---
### C02. ADR-003 vs `sessions.md` on Message IDs
**Sessions**: 2, 5
**Files**: `ADR-003`, `sessions.md:42-46`, `table-reference.md:48`
**Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y`, `ses_24f735dbcffea1pN0JCgtPdbt2`
ADR-003 explicitly states: *"Parts and messages tables use sortable timestamp-based IDs instead of commonCols.id."* However, `sessions.md` defines the `messages` table using `commonCols` (which provides UUIDv4 via `crypto.randomUUID()`). Only `parts` explicitly uses sortable IDs. `table-reference.md` only mentions parts for sortable IDs.
This is a three-way inconsistency: ADR says both tables, sessions.md does one, table-reference says one. Message ordering is semantically important. The composite index `idx_messages_session_id_created_at_id` on `(session_id, created_at, id)` relies on `created_at` for ordering, which makes sortable IDs unnecessary for `messages` even though UUIDv4 `id` values carry no order, but this contradicts ADR-003's stated rationale.
**Recommendation**: Either:
- (A) Update `messages` table to use sortable IDs (consistent with ADR-003, eliminates dependency on `created_at` for ordering), **or**
- (B) Amend ADR-003 to state that only `parts` uses sortable IDs, and `messages` relies on the `(session_id, created_at, id)` composite index
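
For readers unfamiliar with the term, here is a minimal sketch of what a "sortable timestamp-based ID" can look like. The layout (timestamp, counter, random suffix) and the `sortableId` function are assumptions for illustration; this review does not quote the format ADR-003 actually specifies.

```typescript
// Hypothetical layout: 12 hex chars of millisecond timestamp, 4 hex chars of
// per-process counter, 8 hex chars of randomness. Fixed widths keep
// lexicographic order aligned with creation order.
let counter = 0;

function sortableId(nowMs: number = Date.now()): string {
  const time = nowMs.toString(16).padStart(12, "0");
  counter = (counter + 1) % 0x10000; // wraps after 65536 ids
  const seq = counter.toString(16).padStart(4, "0");
  const rand = Math.floor(Math.random() * 0x100000000)
    .toString(16)
    .padStart(8, "0");
  return `${time}${seq}${rand}`;
}

// Two ids in the same millisecond, then one a second later.
const first = sortableId(1_000);
const second = sortableId(1_000);
const third = sortableId(2_000);
```

With IDs of this kind, `ORDER BY id` alone reproduces insertion order within a process, which is the property option (A) relies on. With UUIDv4, ordering has to come from `created_at`, as in option (B).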
---
### C03. Operation Specs: Delete vs. Soft-Deactivation Unresolved
**Sessions**: 3, 5
**Files**: `spokes.md:66`, `table-reference.md:67`, `README.md` Open Question #2
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`, `ses_24f735dbcffea1pN0JCgtPdbt2`
The spoke disconnect lifecycle has three conflicting positions:
- `spokes.md:66`: "Removes the spoke's `operation_specs` rows **(or marks inactive)**" — ambiguous
- `table-reference.md:67`: `operation_specs.spokeId → spokes.id` with **CASCADE** delete
- `README.md` Open Question #2: "DELETE aligns with the ephemeral spoke model" for now
The `operation_specs` table has **no** `active` or `status` column to support soft-deactivation. Crucially, spoke rows are **never deleted** — they're only marked `status: "disconnected"`. This means the CASCADE FK never fires, and there's no mechanism to clean up operation_specs on disconnect. The operation_specs rows remain pointing to a disconnected spoke with no way to deprecate them.
**Recommendation**: Resolve decisively:
- **(A) Hard delete on disconnect**: Add explicit cleanup in the disconnect handler. Remove "or marks inactive" from spokes.md. CASCADE only applies to rare admin spoke-row deletion.
- **(B) Add active/status column to operation_specs**: Support soft-deactivation. Update cascade rationale. This preserves the operation registry for audit/reconnection but adds schema complexity.
Option A aligns with the ephemeral spoke model. Option B supports spoke reconnection. Choose one and update all documents.
---
### C04. `parts.sessionId` Denormalization: No Enforcement Mechanism
**Sessions**: 2
**Files**: `sessions.md:96`, `sessions.md:105`
**Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y`
The stated invariant: *"when inserting a part, always set `sessionId` to the message's `sessionId`. Never update `messages.sessionId` without updating all child parts."* However:
- No DB trigger enforces this
- No application-level transaction pattern is documented
- No CHECK constraint exists
- If `messages.sessionId` could change, there's a race condition window
**Recommendation**: Document that `sessionId` on both `messages` and `parts` is **immutable after creation** (which eliminates the update problem). Define the application-level contract for part insertion: read the message's `sessionId` and set it on the part within the same transaction. Add an explicit "IMMUTABLE" note to the `sessionId` column in `sessions.md`.
---
### C05. `sessions.roleName` — No FK, No Validation Strategy Documented
**Sessions**: 2
**Files**: `sessions.md:26`, `table-reference.md:100-101`, `roles.md`
**Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y`
`sessions.roleName` is bare `text` with no FK to `roles.name` and no documented reason why. Is this intentional (to support file-based roles in Phase 1)? What happens if the role name has a typo? What about sessions referencing a role that was deleted?
**Recommendation**: Either:
- (A) Add `FK → roles.name` with `onDelete: SET NULL` (role deletions detach sessions), **or**
- (B) Document why the FK is intentionally omitted: "role definitions may come from `.opencode/agents/*.md` files before DB sync; application-level validation checks against known role names at session creation time."
---
### C06. `mappings.task` Denormalized Column: No Sync Strategy
**Sessions**: 4
**Files**: `coordination.md:22`, `tasks.md:209`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
The `mappings` table has both `taskId` (FK → tasks.id) and `task` (denormalized display name). No mechanism keeps them in sync. If `taskId` points to a task whose `slug` or `name` changes, `mappings.task` becomes stale. When `taskId` is SET NULL (task deleted), what happens to `task`?
**Recommendation**: Document the invariant: "`mappings.task` is set to `tasks.slug` at insert time and is **not** automatically updated when the task's slug changes. When `taskId` is SET NULL (task deleted), `task` should also be SET NULL. This is a cache, not a source of truth." Alternatively, remove the denormalized column and use a VIEW that joins.
---
### C07. Sync vs. Runtime Field Conflict in Tasks
**Sessions**: 4
**Files**: `tasks.md:296-325`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
The task sync does a full upsert, but the Authority Model says runtime status mutations go through `hub.task.updateStatus`. If sync blindly writes frontmatter `status`, it can clobber runtime state. Example:
1. Agent sets `task.status = 'in-progress'` via `hub.task.updateStatus`
2. Decomposer edits the task file (still has `status: pending`)
3. Sync runs and upserts the task — overwrites `in-progress` back to `pending`
**Recommendation**: Define the sync field split explicitly: "Sync upserts **authored fields** (slug, name, path, scope, risk, impact, level, priority, tags, assignee, due, body, fileCreatedAt, fileModifiedAt, depends_on) and must **not overwrite runtime-managed fields** (status, startedAt, completedAt). Runtime fields are only mutated via `hub.task.*` operations." Update the sync flow specification in tasks.md.
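
The field split can be sketched as a merge step that runs before the upsert. This is an illustration under assumptions: `TaskRow` is trimmed to a few columns with guessed types, `mergeForSync` is a hypothetical helper, and letting the file seed runtime fields on the very first sync is a design choice the spec would still need to confirm.

```typescript
// A few columns from each group; the real table has more.
interface TaskRow {
  slug: string;               // authored
  name: string;               // authored
  priority: string;           // authored
  status: string;             // runtime-managed
  startedAt: string | null;   // runtime-managed
  completedAt: string | null; // runtime-managed
}

// Authored fields come from the file; runtime fields are kept from the
// existing row when there is one.
function mergeForSync(fromFile: TaskRow, existing: TaskRow | undefined): TaskRow {
  if (!existing) return { ...fromFile }; // first sync: the file seeds every field
  return {
    ...fromFile,
    status: existing.status,
    startedAt: existing.startedAt,
    completedAt: existing.completedAt,
  };
}

// The scenario from the numbered example: the row is in progress, the file is stale.
const inDb: TaskRow = {
  slug: "add-index", name: "Add index", priority: "high",
  status: "in-progress", startedAt: "2026-04-21T10:00:00Z", completedAt: null,
};
const fromFile: TaskRow = {
  slug: "add-index", name: "Add composite index", priority: "high",
  status: "pending", startedAt: null, completedAt: null,
};
const merged = mergeForSync(fromFile, inDb);
```

Here the renamed task picks up its new `name` while `status` stays `in-progress`, so the decomposer's edit no longer clobbers runtime state.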
---
### C08. Concurrent `task.body` Appends: No Collision Handling
**Sessions**: 4
**Files**: `tasks.md:249-266`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
`hub.task.addNote` appends a timestamped note section to `body`. In a multi-agent system, read-modify-write is a race condition: Agent A reads body, Agent B reads body, both append, Agent A writes, Agent B overwrites A's addition. The spec says "This is simple" — it is not simple under concurrency.
**Recommendation**: Specify the concurrency model: `hub.task.addNote` must use DB-level concatenation (`UPDATE tasks SET body = body || $note WHERE id = $taskId`), not a read-modify-write cycle. Or use optimistic locking with `updatedAt`. Document this explicitly in the `addNote` specification.
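
The lost update is easy to reproduce in miniature. The sketch below uses a hypothetical in-memory `BodyStore` in place of the `tasks.body` column and interleaves the two agents by hand; it illustrates the race only and makes no claim about how `hub.task.addNote` is implemented.

```typescript
// In-memory stand-in for the tasks.body column.
class BodyStore {
  private body = "";
  read(): string {
    return this.body;
  }
  write(next: string): void {
    this.body = next;
  }
  // Mirrors `SET body = body || $note`: read and write happen as one step.
  append(note: string): void {
    this.body = this.body + note;
  }
}

// Read-modify-write: both agents read before either writes.
const racy = new BodyStore();
const seenByA = racy.read();
const seenByB = racy.read();
racy.write(seenByA + "[note A]");
racy.write(seenByB + "[note B]"); // built from a stale read, drops note A

// Atomic append: each note is applied to the current value.
const atomic = new BodyStore();
atomic.append("[note A]");
atomic.append("[note B]");
```

One caveat on the SQL form: in PostgreSQL, `body || $note` evaluates to NULL when `body` is NULL, so if the column is nullable the update likely needs `COALESCE(body, '') || $note`.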
---
### C09. Cross-Project Dependency Constraint: No DB Enforcement
**Sessions**: 4
**Files**: `tasks.md:217`, `tasks.md:357`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
"Tasks can only depend on tasks within the same project" is declared but only "enforced at the application level." `task_dependencies` has FK columns with no `projectId` column or check constraint. Application-level enforcement is vulnerable to race conditions, direct SQL access, or bugs.
**Recommendation**: At minimum, add a DB-level guard. Options:
- (A) Add a trigger that checks `dependsOnTaskId` and `taskId` belong to the same project
- (B) Add a denormalized `projectId` column to `task_dependencies` with a composite FK
- (C) Document the risk explicitly and specify that the sync operation validates project scope within a transaction (`SELECT ... FOR SHARE`)
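
Whichever option is chosen, the rule being enforced is small. As a sketch of the application-level check that option (C) would run inside its transaction, assuming tasks carry a `projectId` column (the column name and the `assertSameProject` helper are hypothetical):

```typescript
interface Task {
  id: string;
  projectId: string; // assumed column name
}

// Rejects a dependency whose two tasks are unknown or live in different projects.
function assertSameProject(
  tasks: Map<string, Task>,
  taskId: string,
  dependsOnTaskId: string,
): void {
  const task = tasks.get(taskId);
  const dep = tasks.get(dependsOnTaskId);
  if (!task || !dep) throw new Error("unknown task id");
  if (task.projectId !== dep.projectId) {
    throw new Error(`cross-project dependency: ${taskId} -> ${dependsOnTaskId}`);
  }
}

// Two tasks in project p1, one in p2.
const tasks = new Map<string, Task>([
  ["t1", { id: "t1", projectId: "p1" }],
  ["t2", { id: "t2", projectId: "p1" }],
  ["t3", { id: "t3", projectId: "p2" }],
]);

assertSameProject(tasks, "t1", "t2"); // same project, passes silently
```

On its own this check has the weakness the finding describes: without row locks or a DB-level constraint, a concurrent write can invalidate the result between the check and the insert.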
---
### C10. Call Graph Edges: Missing Indexes and Cascade Documentation
**Sessions**: 3, 5
**Files**: `call-graph.md:32-41`, `table-reference.md` (missing)
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`, `ses_24f735dbcffea1pN0JCgtPdbt2`
`call_graph_edges` has **no indexes** and **no cascade entries** in `table-reference.md`. Both `sourceId` and `targetId` reference `call_graph_nodes.id` with CASCADE (implied by domain doc), but this is undocumented. Without indexes, graph traversal queries (find children, find parents) will require sequential scans.
Additionally, the relationship between `call_graph_nodes.parentRequestId` and `call_graph_edges` is ambiguous: do they store the same parent-child relationship redundantly, or serve different purposes?
**Recommendation**:
- Add indexes: `idx_call_graph_edges_source_id` on `(sourceId)`, `idx_call_graph_edges_target_id` on `(targetId)`. Consider unique on `(sourceId, targetId, edgeType)` to prevent duplicates.
- Add cascade entries to `table-reference.md` for both FKs (CASCADE).
- Clarify `parentRequestId` vs `call_graph_edges`: document whether `parentRequestId` is a convenience shortcut or redundant with edges.
---
### C11. Secret Key Rotation: Underspecified
**Sessions**: 3
**Files**: `services.md:94-97`, `ADR-008`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
Lazy re-encryption is insufficiently specified for a security-critical operation:
1. **Multi-key storage**: `HUB_ENCRYPTION_KEY` (singular env var) — how are old and new keys stored simultaneously during rotation?
2. **Re-encryption transaction**: If the process crashes between decrypt and re-encrypt-update, is the secret left in the old key version?
3. **Old key unavailability**: What happens if a secret with `keyVersion=1` is accessed after the old key is removed? Permanent data loss with no documented handling.
4. **No background sweep**: Old-key-version secrets persist indefinitely until accessed. If the old key is compromised, those secrets remain vulnerable.
**Recommendation**:
- Specify multi-key storage: e.g., `HUB_ENCRYPTION_KEYS=v1:base64key,v2:base64key` or a key file
- Document the re-encryption transaction: decrypt → encrypt → UPDATE in a single DB transaction, with crash-safety note
- Add a warning about the vulnerability window (old-key secrets not yet re-encrypted)
- Specify whether a background re-encryption sweep is needed or deferred
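
To make the multi-key suggestion concrete, here is a sketch of parsing the `v1:base64key,v2:base64key` form proposed above. `KeyRing`, `parseKeyRing`, and `keyFor` are hypothetical names, and treating the highest version as the write key is an assumption, not something ADR-008 is known to specify.

```typescript
interface KeyRing {
  keys: Map<number, string>; // version -> base64 key material
  current: number;           // highest version, assumed to be used for new writes
}

// Parses "v1:<base64>,v2:<base64>". Error messages never include key material.
function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  for (const entry of raw.split(",")) {
    const sep = entry.indexOf(":");
    const label = sep < 0 ? "" : entry.slice(0, sep).trim();
    const material = sep < 0 ? "" : entry.slice(sep + 1).trim();
    const match = /^v(\d+)$/.exec(label);
    if (!match || material.length === 0) throw new Error("malformed key entry");
    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`duplicate key version v${version}`);
    keys.set(version, material);
  }
  const current = Math.max(...Array.from(keys.keys()));
  return { keys, current };
}

// Fails loudly when a stored secret references a key that is no longer loaded.
function keyFor(ring: KeyRing, keyVersion: number): string {
  const key = ring.keys.get(keyVersion);
  if (key === undefined) throw new Error(`no key loaded for keyVersion=${keyVersion}`);
  return key;
}

const ring = parseKeyRing("v1:AAAA,v2:BBBB"); // placeholder values, not real keys
```

The explicit failure in `keyFor` corresponds to point 3 in the list: removing an old key before all of its secrets are re-encrypted surfaces as a clear error instead of a silent decryption failure.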
---
### C12. Client Config Schema Validation: Timing and Evolution Ambiguous
**Sessions**: 3
**Files**: `services.md:19`, `ADR-007`, `README.md` Open Question #10
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
"Validated against the TypeBox schema for this type **on write**" is ambiguous:
1. Who validates? Drizzle insert schema? API handler? DB trigger? Direct SQL bypasses application validation.
2. Schema evolution: when code deployment changes a client type's TypeBox schema, existing DB rows may become invalid under the new schema.
3. No re-validation on read is documented.
**Recommendation**:
- Specify: "validate on write (API handler layer) + warn on read (start-up validation pass with logging, not blocking)"
- Document the schema evolution contract: new fields MUST be `Type.Optional()`; breaking changes MUST use a new client `type` string (e.g., `llm-provider-v2`)
- Consider a `configSchemaVersion` in `metadata` tracking which schema version validated the config
---
### C13. Dual Ownership Model for Organizations: Undefined
**Sessions**: 1
**Files**: `identity.md:44` (ownerId), `identity.md:58` (membershipLevel: "owner")
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
Two competing ownership concepts with no documented relationship:
1. `organizations.ownerId` — a single FK to one account
2. `organization_members.membershipLevel: "owner"` — can exist on multiple rows
Can `ownerId` point to an account with `membershipLevel: "member"` (not "owner")? Can an org have zero members with `membershipLevel: "owner"` but a non-null `ownerId`? An implementer cannot determine which field is authoritative for ownership queries.
**Recommendation**: Document the invariant. E.g.: "`ownerId` is always a member with `membershipLevel: 'owner'` (enforced by app logic). If all owner-level members are removed, `ownerId` must be transferred first." Or: "`ownerId` is the creator; `membershipLevel: 'owner'` is a separate authorization concept."
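
If the first reading is adopted (`ownerId` must always be an owner-level member), the invariant reduces to a one-line predicate. A sketch with hypothetical types that flatten `organizations` and `organization_members` into one object:

```typescript
interface Member {
  accountId: string;
  membershipLevel: "owner" | "member";
}

// Flattened view of an organization and its membership rows (illustrative only).
interface Org {
  id: string;
  ownerId: string;
  members: Member[];
}

// True when ownerId points at a member whose membershipLevel is "owner".
function ownerInvariantHolds(org: Org): boolean {
  return org.members.some(
    (m) => m.accountId === org.ownerId && m.membershipLevel === "owner",
  );
}

const consistent: Org = {
  id: "org_1",
  ownerId: "acc_1",
  members: [
    { accountId: "acc_1", membershipLevel: "owner" },
    { accountId: "acc_2", membershipLevel: "member" },
  ],
};

// ownerId points at a plain member: the ambiguous case raised above.
const ambiguous: Org = {
  id: "org_2",
  ownerId: "acc_2",
  members: [
    { accountId: "acc_1", membershipLevel: "owner" },
    { accountId: "acc_2", membershipLevel: "member" },
  ],
};
```

Under the second reading (creator versus authorization level) the `ambiguous` case would be legal, which is why the spec has to pick one before ownership queries can be written.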
---
### C14. Missing FK Cascade Entries in `table-reference.md`
**Sessions**: 5
**Files**: `table-reference.md:53-83`
**Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2`
The following FK relationships are documented in per-domain docs but **absent** from the cascade reference table:
| Missing Relationship | Source Doc |
|---|---|
| `mappings.workspaceId → workspaces.id` | coordination.md:19 |
| `detections.sessionId → sessions.id` | coordination.md:36 |
| `call_graph_edges.sourceId → call_graph_nodes.id` | call-graph.md:39 |
| `call_graph_edges.targetId → call_graph_nodes.id` | call-graph.md:41 |
| `api_keys.rotatedToId → api_keys.id` | identity.md:80 |
Without an explicit `onDelete`, PostgreSQL defaults to `NO ACTION`, which (like `RESTRICT`) rejects deleting a row that is still referenced. That may not be the intended behavior for all of these.
**Recommendation**: Add all missing FK entries to the cascade table with explicit `onDelete` behavior. For the `rotatedToId` FK specifically: SET NULL (old key keeps its data but rotation link is broken if new key is deleted).
---
## 🟡 Warnings
Issues that should be resolved if possible. They represent gaps in documentation, suboptimal designs, or inconsistencies that could cause confusion.
---
### W01. Dual JSONB Overlap: `commonCols.metadata` vs Per-Table `data`
**Sessions**: 1
**Files**: `identity.md:85-88`, `identity.md:23`, `README.md:73`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
Two overlapping JSONB columns exist on some tables with no documented boundary:
- `commonCols.metadata` — present on every table, `Record<string, unknown>`
- Per-table `data` columns — domain-specific data (e.g., `accounts.data`, `organizations.data`)
For `api_keys`, keypal stores `scopes`, `resources`, and `tags` **inside `commonCols.metadata`**. For `accounts`, both `data` ("preferences, avatar URL") and `metadata` (arbitrary) exist with overlapping purposes and no split documentation.
**Recommendation**: Document the boundary: "`data` holds structured domain-specific data with known TypeScript types. `metadata` holds opaque key-value pairs for subsystem use, with a namespacing convention (e.g., `metadata._keypal.scopes`). Never mix domain data into `metadata`."
---
### W02. No Account Deactivation Mechanism
**Sessions**: 1
**Files**: `identity.md` (accounts table)
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
The `accounts` table has no `enabled`/`suspended` column. Combined with `organizations.ownerId → RESTRICT`, an org owner's account cannot be deleted. But there's also no way to deactivate it when an employee leaves.
**Recommendation**: Add an `enabled` boolean (consistent with `api_keys.enabled` and `clients.enabled`), or a `status` column (`active`/`suspended`/`deactivated`). Document the interaction with cascade constraints.
---
### W03. Missing Indexes Across Many Tables
**Sessions**: 1, 2, 3, 5
**Files**: `table-reference.md:87-145`, per-domain docs
**Open-memory**: All sessions (consensus finding)
Multiple tables have FK columns or common query patterns without supporting indexes:
| Table | Missing Index | Purpose |
|---|---|---|
| `sessions` | `unq_sessions_slug` in index ref | UNIQUE constraint not listed (unlike other UNIQUEs) |
| `sessions` | `idx_sessions_parent_id` on `(parentId)` | Find child sessions of coordinator |
| `projects` | `idx_projects_org_id` on `(orgId)` | Find projects for an org |
| `workspaces` | `idx_workspaces_project_id` on `(projectId)` | Find workspaces for a project |
| `spokes` | `idx_spokes_name` on `(name)` | Look up spoke by name |
| `detections` | `idx_detections_session_id` on `(sessionId)` | Find detections for a session (no indexes at all) |
| `call_graph_nodes` | `idx_call_graph_nodes_created_at` on `(createdAt)` | Time-range queries |
| `call_graph_nodes` | `idx_call_graph_nodes_operation_created` on `(operationId, createdAt)` | Operation + time queries |
| `call_graph_edges` | `idx_call_graph_edges_source_id` on `(sourceId)` | Graph traversal (children) |
| `call_graph_edges` | `idx_call_graph_edges_target_id` on `(targetId)` | Graph traversal (parents) |
| `mappings` | `idx_mappings_workspace_id` on `(workspaceId)` | Workspace-scoped mapping queries |
Also: `idx_api_keys_key_hash` (B-tree) is redundant with `unq_api_keys_key_hash` (UNIQUE). Postgres automatically creates an index for UNIQUE constraints.
**Recommendation**: Add all missing indexes to `table-reference.md` and relevant per-domain docs. Remove the redundant `idx_api_keys_key_hash`.
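
As a rough illustration of the recommendation, the missing indexes could be kept as data and rendered into DDL. This is only a sketch: the `IndexSpec` shape and `toCreateIndexSql` helper are hypothetical, only a subset of the rows above is included, and the snake_case column names are an assumption about the physical schema (the table above uses camelCase property names).

```typescript
// Hypothetical spec shape; snake_case column names are an assumption.
interface IndexSpec {
  table: string;
  name: string;
  columns: string[];
}

// A subset of the missing indexes from the table above.
const missingIndexes: IndexSpec[] = [
  { table: "sessions", name: "idx_sessions_parent_id", columns: ["parent_id"] },
  { table: "projects", name: "idx_projects_org_id", columns: ["org_id"] },
  {
    table: "call_graph_nodes",
    name: "idx_call_graph_nodes_operation_created",
    columns: ["operation_id", "created_at"],
  },
  { table: "call_graph_edges", name: "idx_call_graph_edges_source_id", columns: ["source_id"] },
  { table: "call_graph_edges", name: "idx_call_graph_edges_target_id", columns: ["target_id"] },
];

// Render one plain B-tree CREATE INDEX statement per spec.
function toCreateIndexSql(spec: IndexSpec): string {
  return `CREATE INDEX IF NOT EXISTS ${spec.name} ON ${spec.table} (${spec.columns.join(", ")});`;
}

const ddl: string[] = missingIndexes.map(toCreateIndexSql);
```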
---
### W04. `operation_specs` Pre-Remap vs. Post-Remap Namespace Ambiguity
**Sessions**: 3
**Files**: `spokes.md:51-55`, `spoke-runner.md:62`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
Do `operation_specs.namespace` and `operation_specs.name` store the original spoke identifiers (pre-remap, e.g., `dev.fs.read`) or the remapped hub identifiers (post-remap, e.g., `dev.{spokeId}.fs.read`)? The spoke-runner.md says the hub remaps spoke operations into a hub namespace, but the operation_specs storage format is never specified.
If pre-remap: two spokes registering `dev.fs.read` creates ambiguity without joining on `spokeId`.
If post-remap: the partial unique indexes may be over-constraining since the spoke-specific namespace prefix makes `spokeId` redundant for uniqueness.
**Recommendation**: Explicitly document which identifiers are stored. If pre-remap, document how callers resolve ambiguity. If post-remap, adjust the uniqueness rationale.
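
To make the two storage options concrete, here is a minimal sketch of the remap step, assuming the rule implied by the example above (`dev.fs.read` becomes `dev.{spokeId}.fs.read`). The function names are hypothetical and the real hub remapping may differ.

```typescript
// Assumed rule: insert the spoke id after the first namespace segment.
function remapOperationId(spokeId: string, original: string): string {
  const [root, ...rest] = original.split(".");
  if (rest.length === 0) {
    throw new Error(`operation id "${original}" has no namespace segment`);
  }
  return [root, spokeId, ...rest].join(".");
}

// Inverse of the assumed rule: strip the spoke id to recover the pre-remap id.
function unmapOperationId(spokeId: string, remapped: string): string {
  const parts = remapped.split(".");
  if (parts.length < 3 || parts[1] !== spokeId) {
    throw new Error(`"${remapped}" is not remapped for spoke "${spokeId}"`);
  }
  return [parts[0], ...parts.slice(2)].join(".");
}

// Two spokes registering the same pre-remap id only differ after remapping.
const fromA = remapOperationId("spokeA", "dev.fs.read");
const fromB = remapOperationId("spokeB", "dev.fs.read");
```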
---
### W05. `call_graph_edges.edgeType` Semantics Undefined
**Sessions**: 3
**Files**: `call-graph.md:41`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
Three edge types are listed (`triggered`, `depends_on`, `requested_by`) but none are explained. The call-graph architecture doc only discusses parent-child relationships (triggered). `depends_on` and `requested_by` are novel and undocumented. Are these exhaustive or extensible?
**Recommendation**: Document each edge type's semantics in `call-graph.md`, or state that `edgeType` is an extensible text field with these three initial values and define what each means.
---
### W06. `spokes.status` Missing `reconnecting` State
**Sessions**: 3
**Files**: `spokes.md:18`, `spoke-runner.md:130-136`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
The spoke status enum is `connected`, `disconnected`. The spoke-runner.md describes a reconnection flow, but there's no intermediate state for "reconnecting." When a spoke's WebSocket drops, it shows `disconnected` — indistinguishable from a permanently offline spoke.
**Recommendation**: Add `reconnecting` to the spoke status enum, or document that reconnection is handled at the application layer (WebSocket reconnect timer) without a DB state change.
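
If the first option is taken, the status lifecycle could look roughly like the sketch below. The transition table is an assumption made for illustration; neither spokes.md nor spoke-runner.md defines it.

```typescript
type SpokeStatus = "connected" | "reconnecting" | "disconnected";

// Hypothetical transition table: a dropped WebSocket moves the spoke to
// "reconnecting"; it becomes "disconnected" only when reconnection gives up.
const allowedTransitions: Record<SpokeStatus, SpokeStatus[]> = {
  connected: ["reconnecting", "disconnected"],
  reconnecting: ["connected", "disconnected"],
  disconnected: ["connected"],
};

function canTransition(from: SpokeStatus, to: SpokeStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```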
---
### W07. `client_secrets.keyVersion` Redundancy
**Sessions**: 3
**Files**: `services.md:71`, `services.md:82-86`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
`client_secrets` has both a standalone `keyVersion` column (integer NOT NULL DEFAULT 1) AND `keyVersion` embedded in the `value` JSONB (`EncryptedData.keyVersion`). These can diverge with no documented invariant.
**Recommendation**: Either remove the standalone column (read from `value.keyVersion`), or document that the standalone column is authoritative and they must be kept in sync.
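
If both copies are kept, the invariant could be enforced at the write boundary. A minimal sketch follows, assuming a row shape with only the fields relevant here; `ciphertext` and the function name are placeholders, not taken from services.md.

```typescript
// Only keyVersion is taken from the spec; the other field is a placeholder.
interface EncryptedData {
  keyVersion: number;
  ciphertext: string;
}

interface ClientSecretRow {
  keyVersion: number; // standalone column
  value: EncryptedData; // JSONB column
}

// Reject rows where the column and the embedded version have diverged.
function assertKeyVersionInSync(row: ClientSecretRow): void {
  if (row.keyVersion !== row.value.keyVersion) {
    throw new Error(
      `keyVersion mismatch: column=${row.keyVersion}, value.keyVersion=${row.value.keyVersion}`,
    );
  }
}
```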
---
### W08. Call Graph Payload Security
**Sessions**: 3
**Files**: `call-graph.md:22-23`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
The `input` and `output` JSONB columns store full call payloads. Operations like `hub.register` (which receives auth tokens) would store API keys and secrets in cleartext. The truncation strategy (10KB) addresses size, not sensitive data. No redaction is mentioned.
**Recommendation**: Add a section on sensitive data handling. Options:
- Operation handlers mark certain fields as redacted
- The call graph writer applies field-level redaction by convention (fields named `password`, `token`, `secret`, `key`)
- The truncation strategy is extended with a redaction pass
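
The convention-based option might be sketched as below. The `[REDACTED]` marker and the helper name are assumptions, and the substring match is deliberately broad: it also hits names such as `keyId`, so a real implementation would likely need an allow-list.

```typescript
// Field names from the convention above; the match is case-insensitive and by substring.
const SENSITIVE_FIELD = /password|token|secret|key/i;

type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Return a deep copy of the payload with sensitive fields replaced by a marker.
function redact(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [k: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE_FIELD.test(k) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}
```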
---
### W09. No Call Graph Retention Policy
**Sessions**: 3, 4
**Files**: `call-graph.md` (absent), `README.md` Open Question #5
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
Call graph data grows unboundedly. Every operation invocation creates a node and edges. CASCADE handles cleanup when a node is deleted, but nothing deletes old nodes. README.md acknowledges this as Open Question #5.
**Recommendation**: Specify the intended approach: TTL-based deletion, archival to cold storage, or aggregation + deletion. Even a "v1: manual cleanup, v2: automatic TTL" notation helps.
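
For the TTL-based option, the selection step could be as simple as the sketch below. The row shape, function name, and day-based TTL are assumptions; deleting the returned nodes would rely on the existing CASCADE to remove their edges.

```typescript
interface CallGraphNodeRow {
  id: string;
  createdAt: Date;
}

// Return ids of nodes created before (now - ttlDays).
function expiredNodeIds(nodes: CallGraphNodeRow[], ttlDays: number, now: Date): string[] {
  const cutoff = now.getTime() - ttlDays * 24 * 60 * 60 * 1000;
  return nodes.filter((n) => n.createdAt.getTime() < cutoff).map((n) => n.id);
}
```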
---
### W10. `sessions.version` Column: Unspecified
**Sessions**: 2
**Files**: `sessions.md:24`, `README.md` Open Question #1
**Open-memory**: `ses_24f751efbffeyWo9wb6hAnnj0y`
The `version` column is `text NOT NULL` with description "Schema version (opencode compat)" but:
- No valid values listed
- No default documented for hub-direct sessions vs opencode imports
- No versioning scheme defined
- README.md Open Question #1 asks whether to version `data` columns — this is unresolved
**Recommendation**: Define initial version value (e.g., `"1"`), document what `version` governs (the `data` JSONB shape? the message/parts schema? opencode compatibility only?), and specify the default for hub-direct sessions.
---
### W11. Overlapping Status Enums Without Cross-Table Disambiguation
**Sessions**: 4, 5
**Files**: `table-reference.md:147-164`, `coordination.md:23`, `tasks.md:84-86`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`, `ses_24f735dbcffea1pN0JCgtPdbt2`
Three tables have `status` with overlapping values:
| Table | Shared Values | Unique Values |
|---|---|---|
| `mappings` | `completed`, `failed`, `aborted` | `active` |
| `call_graph_nodes` | `completed`, `failed`, `aborted` | `pending`, `running` |
| `tasks` | `completed`, `failed` | `pending`, `in-progress`, `blocked` |
`table-reference.md:164` only contrasts `mappings.active` vs `call_graph_nodes.pending/running`. It does NOT contrast `tasks` statuses with the others. `mappings.completed` and `tasks.completed` mean different things (mapping workflow completion vs task completion).
**Recommendation**: Add cross-table state mapping documentation. When a task goes `in-progress`, there should be an active mapping; when a task is `completed`, the mapping becomes `completed`. Document valid combinations.
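
The valid combinations could be captured as a lookup table. The sketch below encodes only the two pairings stated in this recommendation and treats every other task status as unspecified; the names are hypothetical.

```typescript
type TaskStatus = "pending" | "in-progress" | "blocked" | "completed" | "failed";
type MappingStatus = "active" | "completed" | "failed" | "aborted";

// Only the pairings named in the recommendation; the rest are still undocumented.
const expectedMappingStatus: Partial<Record<TaskStatus, MappingStatus>> = {
  "in-progress": "active",
  completed: "completed",
};

// A combination is consistent if no expectation exists or the expectation matches.
function isConsistent(task: TaskStatus, mapping: MappingStatus): boolean {
  const expected = expectedMappingStatus[task];
  return expected === undefined || expected === mapping;
}
```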
---
### W12. Audit Logs Missing Session and Org Context
**Sessions**: 1
**Files**: `identity.md:103-117`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
`audit_logs` has `ownerId` and `keyId` but no `sessionId` or `orgId`. For LLM accounts that fill roles in sessions, the missing session correlation is a significant traceability gap. Multi-tenant auditing requires org filtering, which an `orgId` column would provide directly.
**Recommendation**: Add `sessionId` (nullable FK → sessions.id, SET NULL) and `orgId` (nullable FK → organizations.id, SET NULL). Expand `action` types to cover account, membership, and organization lifecycle events — or document the `action` enum as extensible.
---
### W13. API Key Hashing (SHA-256) Trade-Off Undocumented
**Sessions**: 1
**Files**: `identity.md:74`, `ADR-010`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
API keys are bearer tokens stored as SHA-256 hashes. SHA-256 is a fast hash, not a deliberately slow KDF (bcrypt/Argon2). If the database is compromised, SHA-256 hashes can be brute-forced orders of magnitude faster than slow hashes. However, API keys are high-entropy machine-generated strings (128-bit+), making brute-force infeasible even with a fast hash. No ADR documents this trade-off.
**Recommendation**: Add documentation: "API keys are high-entropy random strings (128-bit+), making brute-force infeasible even with a fast hash. SHA-256 was chosen for O(1) verification latency at high throughput. This is acceptable because API keys are machine-generated, unlike human-chosen passwords."
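
The trade-off can be illustrated with a short sketch using Node's built-in `node:crypto` module (also available in Deno). The function names, the 32-byte key size, and the hex/base64url encodings are assumptions; identity.md may specify different ones.

```typescript
import { Buffer } from "node:buffer";
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// 32 random bytes (256 bits), well above the 128-bit floor mentioned above.
function generateApiKey(): string {
  return randomBytes(32).toString("base64url");
}

// A fast hash is acceptable here because the input is high-entropy.
function hashApiKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Compare digests in constant time to avoid leaking how many bytes matched.
function verifyApiKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashApiKey(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```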
---
### W14. ADR Terminology Inconsistencies
**Sessions**: 1
**Files**: `ADR-009:13`, `ADR-012:55`, `agent-roles.md`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`
- ADR-009 says "organization_members (membership with **roles**)" — contradicts ADR-012's rename to `membershipLevel`
- ADR-012 itself uses `accounts.role: "service"` in its rationale, despite mandating the rename to `accessLevel`
- `agent-roles.md` also uses `accounts.role: "service"`
**Recommendation**: Update ADR-009 to say "membership with levels." Update ADR-012:55 and agent-roles.md to use `accounts.accessLevel: "service"`.
---
### W15. Resolved Open Questions Still Listed as Open in README
**Sessions**: 5
**Files**: `README.md:197-225`
**Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2`
Several open questions are resolved by per-domain docs or ADRs but remain listed as open:
- **Q2** (operation spec cleanup): Resolved — DELETE aligns with ephemeral spoke model (spokes.md, table-reference.md CASCADE)
- **Q4** (workspaces vs. directories): Marked as "Resolved" in the list but still present
- **Q14** (`accounts.role` → `accessLevel`): Renamed in identity.md, referenced in ADR-012
**Recommendation**: Move resolved items to a "Resolved Decisions" section with cross-references to the resolving documents.
---
### W16. `organizations.ownerId` RESTRICT: No Deletion/Transfer Workflow
**Sessions**: 1, 5
**Files**: `identity.md:44`, `table-reference.md:56`
**Open-memory**: `ses_24f76141effegdhw2bxX2sOvYb`, `ses_24f735dbcffea1pN0JCgtPdbt2`
RESTRICT prevents deletion of accounts that own organizations, but no ownership transfer mechanism is documented.
**Recommendation**: Add a note: "Before deleting an account, transfer all owned organizations via `org.transferOwnership` operation." Document the transfer pattern in identity.md or coordination.md.
---
### W17. Path LIKE Queries May Not Use B-Tree Indexes in PostgreSQL
**Sessions**: 4
**Files**: `tasks.md:83`, `tasks.md:101`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
`WHERE path LIKE 'implementation/%'` can use a B-tree index **only with the `C` locale or `text_pattern_ops`**. Under any other collation, the planner cannot use a B-tree index built with the default operator class for `LIKE` prefix matching.
**Recommendation**: Specify that the `path` index should use `text_pattern_ops` (`CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)`) or document the locale dependency.
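
A related detail on the query side: the index only helps when the pattern is an anchored literal prefix, so any `%`, `_`, or backslash inside a path segment should be escaped before the trailing wildcard is appended. A minimal sketch, assuming PostgreSQL's default backslash escape character; the helper name is hypothetical.

```typescript
// Escape LIKE metacharacters in the prefix, then append the trailing wildcard.
function likePrefixPattern(prefix: string): string {
  const escaped = prefix.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `${escaped}%`;
}

// Intended use with a parameterized query, for example:
//   SELECT * FROM tasks WHERE path LIKE $1
const pattern = likePrefixPattern("implementation/");
```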
---
### W18. Call Graph Payload Truncation Lacks Precision
**Sessions**: 3
**Files**: `call-graph.md:30`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
The truncation strategy says "truncate payloads larger than 10KB" but doesn't specify: when truncation happens (on write? after call completes?), what `preview` contains (first N bytes? N characters?), whether 10KB is configurable, or how object storage reference URLs are structured.
**Recommendation**: Specify: (a) truncation happens on write to DB (in-flight calls have full payloads); (b) `preview` is the first 1024 bytes of the JSON-serialized payload; (c) make the threshold configurable per operation type or via hub config; (d) defer object storage details but add a placeholder section.
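
Points (a) through (c) could be sketched as a single write-time helper. The `StoredPayload` shape and names are hypothetical, and the preview is decoded leniently, so a multi-byte character cut at the 1024-byte boundary becomes a replacement character.

```typescript
const MAX_PAYLOAD_BYTES = 10 * 1024; // threshold from the spec; configurable via parameter
const PREVIEW_BYTES = 1024;

type StoredPayload =
  | { truncated: false; value: unknown }
  | { truncated: true; preview: string; originalBytes: number };

// Measure the JSON-serialized size in bytes and truncate on write if too large.
function prepareForStorage(payload: unknown, maxBytes: number = MAX_PAYLOAD_BYTES): StoredPayload {
  const bytes = new TextEncoder().encode(JSON.stringify(payload));
  if (bytes.length <= maxBytes) {
    return { truncated: false, value: payload };
  }
  const preview = new TextDecoder().decode(bytes.slice(0, PREVIEW_BYTES));
  return { truncated: true, preview, originalBytes: bytes.length };
}
```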
---
### W19. `call_graph_nodes.identity` Has No FK or Account Linkage
**Sessions**: 3
**Files**: `call-graph.md:20`
**Open-memory**: `ses_24f746ebbffeG4jqN3MbK5i9yt`
The `identity` JSONB column stores `{ id, scopes, resources }` as a snapshot, but there's no FK to `accounts.id`. Querying "all calls made by account X" requires JSONB containment, which is slow without a GIN index.
**Recommendation**: Add a `callerAccountId` text column with FK → accounts.id (SET NULL) for efficient querying, or add a GIN index on `identity` if JSONB queries are the intended access pattern.
---
### W20. `mappings` Table Overloaded — Three Distinct Relationship Types
**Sessions**: 4
**Files**: `coordination.md:10-27`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
The `mappings` table stores three conceptually different relationships in one table:
1. Session → Spoke (where is the session running?)
2. Session → Parent session (coordination hierarchy)
3. Session → Task (what work is the session doing?)
All nullable FKs allow any combination, including potentially invalid ones. The table name `mappings` doesn't convey what's being mapped.
**Recommendation**: Document the valid column combinations (e.g., `sessionId` always NOT NULL, `taskId` only for task-scoped mappings, `parentSessionId` only for coordinator children). This makes it a polymorphic association table with documented shapes.
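
One way to document the shapes is to classify each row by the relationships it expresses. The sketch below assumes a `spokeId` column for the session-to-spoke case and two illustrative validity rules; neither is taken from coordination.md.

```typescript
interface MappingRow {
  sessionId: string;
  spokeId: string | null; // assumed column name
  parentSessionId: string | null;
  taskId: string | null;
}

type MappingRelation = "session-spoke" | "session-parent" | "session-task";

// List which of the three relationships a row expresses.
function relationsOf(row: MappingRow): MappingRelation[] {
  const relations: MappingRelation[] = [];
  if (row.spokeId !== null) relations.push("session-spoke");
  if (row.parentSessionId !== null) relations.push("session-parent");
  if (row.taskId !== null) relations.push("session-task");
  return relations;
}

// Illustrative rules only: a row needs a session and at least one target.
function validateMapping(row: MappingRow): string[] {
  const errors: string[] = [];
  if (row.sessionId.length === 0) errors.push("sessionId is required");
  if (relationsOf(row).length === 0) errors.push("mapping has no target");
  if (row.parentSessionId === row.sessionId) errors.push("session cannot be its own parent");
  return errors;
}
```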
---
### W21. `detections` Table Is Minimal — No Resolution or Deduplication
**Sessions**: 4
**Files**: `coordination.md:29-39`
**Open-memory**: `ses_24f7431baffeElbZ3qVHCYQOSv`
- No resolution tracking (resolved, acknowledged, false-positive)
- No deduplication (persistent `MODEL_DEGRADATION` creates a new row every check interval)
- No session end correlation
- `anomalyType` value set is unclear (open or closed enum?)
**Recommendation**: Add `resolvedAt` timestamp column. Add a UNIQUE constraint on `(sessionId, anomalyType)` with upsert semantics, or document that deduplication is handled at the application level. Specify whether `anomalyType` is extensible.
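
The upsert semantics can be sketched with an in-memory store keyed on `(sessionId, anomalyType)`. The `lastSeenAt` and `count` fields are hypothetical additions for illustration; only `resolvedAt` is part of the recommendation.

```typescript
interface Detection {
  sessionId: string;
  anomalyType: string;
  firstSeenAt: Date;
  lastSeenAt: Date; // hypothetical column
  count: number; // hypothetical column
  resolvedAt: Date | null;
}

const detections = new Map<string, Detection>();

// Insert on first sight, otherwise update the existing row (upsert).
function recordDetection(sessionId: string, anomalyType: string, at: Date): Detection {
  const key = `${sessionId}\u0000${anomalyType}`;
  const existing = detections.get(key);
  if (existing !== undefined) {
    existing.lastSeenAt = at;
    existing.count += 1;
    return existing;
  }
  const row: Detection = {
    sessionId,
    anomalyType,
    firstSeenAt: at,
    lastSeenAt: at,
    count: 1,
    resolvedAt: null,
  };
  detections.set(key, row);
  return row;
}
```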
---
## 💡 Suggestions
Quality-of-life improvements that should be considered but won't block stabilization.
---
### S01. Document `accessLevel` Change Authorization
Who can change an `accounts.accessLevel`? Can a `user` self-promote? Document the assumed invariants even for application-level concerns.
---
### S02. Add Partial Indexes for Common Access Patterns
Several partial indexes would improve performance: active API keys (`WHERE revoked_at IS NULL AND enabled = true`), connected spokes (`WHERE status = 'connected'`), non-archived sessions, active tasks (`WHERE status IN ('pending', 'in-progress', 'blocked')`).
---
### S03. Reserve `@alk.dev` Email Domain for System Accounts
LLM accounts use fallback addresses like `glm-5.1@alk.dev`. Document that all `*@alk.dev` emails are reserved for system-generated accounts and humans must use other domains.
---
### S04. Consider `displayName` Index for User Search
`accounts.displayName` is not indexed. For UIs with user search/autocomplete, this would require full table scans.
---
### S05. Document API Key Expiration Behavior
Does an expired key return "key expired" or a generic "authentication failed"? Recommend generic response to avoid leaking key state to attackers.
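
A minimal sketch of the generic-response approach: the specific failure reason stays available for server-side logging while the client sees a single message. The `KeyCheck` values, status codes, and message text are assumptions.

```typescript
type KeyCheck = "ok" | "unknown" | "revoked" | "expired" | "disabled";

interface AuthResponse {
  status: number;
  message: string;
}

// Collapse every failure reason into one client-facing response.
function toClientResponse(check: KeyCheck): AuthResponse {
  if (check === "ok") {
    return { status: 200, message: "ok" };
  }
  return { status: 401, message: "authentication failed" };
}
```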
---
### S06. Cross-Reference `sessions.accountId` in Identity Docs
`identity.md:12` lists FK targets but omits `sessions.accountId`. Add it for completeness.
---
### S07. Define `FilePartData` Type
`sessions.md:132` references `FilePartData[]` in ToolState but never defines it. Clarify whether it's the same as the `file` part type's data shape.
---
### S08. Complete AI SDK UIMessage Part Type Mapping
`sessions.md:145-152` maps 6 part types but omits `step-finish`, `patch`, `snapshot`, `compaction`, `agent`. Document that these are excluded from the UIMessage view, or add mappings.
---
### S09. Document `sessions.slug` Generation Strategy
Is it human-provided? Auto-generated? Random? This matters for API design and uniqueness enforcement.
---
### S10. Add `parts.type` Index for Part-Type Queries
A composite index `(session_id, type)` would support queries like "all tool-call parts in session X" without a full scan. At minimum, document that `type` queries rely on existing indexes + sequential scan.
---
### S11. Document Whether Parts Are Flat or Nested
The `agent` part type implies sub-agent delegation, which might need nesting. The current schema has no `parentId` on parts. Document whether parts are flat or whether nesting might be needed.
---
### W22. `parts` Table: Missing `$onUpdate` and `NOT NULL` on Timestamp Columns
**Sessions**: 5
**Files**: `sessions.md:99-107`, `README.md:69-82`
**Open-memory**: `ses_24f735dbcffea1pN0JCgtPdbt2`
The `parts` table defines its own `id`, `metadata`, `createdAt`, and `updatedAt` instead of using `commonCols`, but the spec only says "defaults to `now()`" without specifying `NOT NULL` or `$onUpdate`. If the Drizzle implementation omits `$onUpdate`, `updatedAt` is never refreshed when a parts row is modified, silently breaking any optimistic concurrency or caching logic. If `createdAt`/`updatedAt` are not declared `NOT NULL`, they can be set to NULL.
**Recommendation**: The `parts` table spec must explicitly state that `createdAt` and `updatedAt` are `NOT NULL` and that `updatedAt` includes `$onUpdate(() => new Date())`. Either replicate these details from `commonCols` with an explicit override note for `id`, or reference `commonCols` with the `id` exception documented.
---
### S13. Add `projectId` to `mappings` for Direct Project-Scoped Queries
Finding all active mappings for a project's tasks requires a JOIN through `sessions.projectId` or `tasks.projectId`. A denormalized `projectId` would simplify this, or document that the JOIN pattern is acceptable.
---
### S14. Document `mappings.status` Lifecycle Transitions
Unlike `tasks.status` which has an explicit lifecycle diagram, `mappings.status` transitions are unspecified. Add a lifecycle diagram or state machine.
---
### S15. Specify Task Enum Values as Drizzle `pgEnum`
The categorical enum values (`scope`, `risk`, `impact`, `level`, `priority`, `status`) are documented as text strings but not referenced as Drizzle `pgEnum` types. Specify that these should be `pgEnum` for type safety, with the decomposer template consuming the same definitions.
---
### S16. Rename `taskId` to `dependentTaskId` in `task_dependencies`
The column name `taskId` is generic and could be confused as "this task" rather than "the dependent task." Renaming to `dependentTaskId` makes the direction unmistakable.
---
### S17. Add `call_graph_nodes.startedAt` Index for Latency Analysis
`startedAt` is crucial for p99 latency analysis. Consider an index alongside or instead of the suggested `createdAt` index.
---
### S18. Consider Unique Constraint on `call_graph_edges(sourceId, targetId, edgeType)`
Nothing prevents duplicate edges between the same two nodes with the same type. A unique constraint prevents silently duplicated edge events from retries/reconnections.
---
## ✅ What's Working Well
Strengths identified across all five reviews:
1. **Drizzle-TypeBox pattern** — Well-documented and consistently applied. The `createSelectSchema`/`createInsertSchema` workflow with JSONB overrides is clear and implementable.
2. **Cross-cutting reference pattern** — `table-reference.md` as a single source for cascades, indexes, enums, and relations is an excellent organizational pattern that prevents "hunt through every domain doc" problems.
3. **Nullable categorical fields (ADR-011)** — The "not yet assessed" signal via NULL (instead of defaults) is well-reasoned and matches taskgraph's own `Option<T>` model.
4. **Dual task representation** — DB as source of truth, files as authoring surface. The authority model table is excellent and provides clear guidance.
5. **ADR-012 terminology clarification** — The account/role/session distinction is clearly motivated and the rename guidance is actionable.
6. **Cascade behavior documentation** — Having all FK behaviors in one place with rationale per relationship prevents implementation errors.
7. **Operation specs as capabilities (ADR-006)** — Elegant decision. Avoids opaque JSONB blobs, makes capabilities fully typed and queryable. Nullable `spoke_id` allows hub-native operations to coexist.
8. **Config/secrets separation (ADR-007)** — Four-layer model (config schema, config instance, auth schema, auth instance) with different storage strategies is well-structured.
9. **Path semantics for tasks** — Replacing `parentId` with `path` column for group scoping is clean. Prefix-based queries are intuitive and well-explained.
10. **Partial unique index design** — The `operation_specs` partial indexes correctly handle PostgreSQL's NULL-in-unique-index behavior. The explanation prevents a common pitfall.
---
## Action Plan
### Before Stabilization (Must Fix)
| Priority | Issue | Action |
|---|---|---|
| 🔴1 | C01 | Resolve NOT NULL + SET NULL contradictions for `sessions.accountId` and `audit_logs.ownerId` |
| 🔴2 | C02 | Resolve ADR-003 vs sessions.md on message IDs — update one or the other |
| 🔴3 | C03 | Resolve operation_specs delete vs soft-deactivation — choose one, update all docs |
| 🔴4 | C04 | Document sessionId immutability invariant on messages/parts |
| 🔴5 | C05 | Document roleName validation strategy (FK or intentional omission) |
| 🔴6 | C14 | Add missing FK cascade entries to table-reference.md |
### Before Implementation (Should Fix)
| Priority | Issue | Action |
|---|---|---|
| 🟡1 | C06 | Document mappings.task denormalization invariant |
| 🟡2 | C07 | Define sync field split (authored vs. runtime fields) |
| 🟡3 | C08 | Specify DB-level concatenation for task.body appends |
| 🟡4 | C09 | Add DB-level guard for cross-project dependencies |
| 🟡5 | C10 | Add call_graph_edges indexes and cascade docs |
| 🟡6 | C11 | Specify key rotation multi-key format and transaction safety |
| 🟡7 | C14 | Add remaining missing cascade entries |
| 🟡8 | W03 | Add missing indexes across tables |
| 🟡9 | W11 | Add cross-table status disambiguation |
| 🟡10 | W14 | Fix ADR terminology inconsistencies |
### Before Production (Consider)
| Priority | Issue | Action |
|---|---|---|
| 💡1 | W02 | Add account deactivation mechanism |
| 💡2 | W08 | Add call graph payload redaction |
| 💡3 | W09 | Define call graph retention policy |
| 💡4 | W12 | Add sessionId and orgId to audit_logs |
| 💡5 | W21 | Add resolution tracking to detections |
| 💡6 | All S01-S18 | Quality-of-life improvements |