Set up repo: migrate architecture specs, code stubs, and tasks from alkhub_ts

Copy architecture docs, ADRs, storage domain specs, research, reviews, and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore, AGENTS.md, entry point. Migrate existing code stubs (crypto, config types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00
parent 3e3f12d2d5
commit 2b63cda1c7
120 changed files with 11714 additions and 2 deletions
--- a/docs/decisions/ADR-001-jsonb-data-columns-vs-individual-columns.md
+++ b/docs/decisions/ADR-001-jsonb-data-columns-vs-individual-columns.md
@@ -0,0 +1,17 @@
+# ADR-001: JSONB data columns vs individual columns
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+Opencode stores message and part content as JSON blobs in a `data` column. AI SDK `UIMessage` uses inline parts. We need a format that supports both flexible querying and streaming.
+
+## Decision
+
+Use separate structured columns for high-query, high-filter fields (role, status, type) and JSONB `data` columns for rich, type-discriminated content. Follows opencode pattern.
+
+## Consequences
+
+JSONB content is opaque to SQL queries on individual fields. If we need to query inside `data`, add generated columns or GIN indexes. Flexibility outweighs the query limitation for now. Positive: clean separation between queryable and flexible data, consistent with proven opencode pattern.
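
Reviewer sketch for ADR-001 (illustrative, not part of the committed file): the split between structured columns and a JSONB `data` payload might look roughly like this in application code. The `PartRow` shape, the two payload variants, and the helpers `partsOfType` and `textOf` are hypothetical names chosen for this example; the actual columns are defined in the storage specs.

```typescript
// Hypothetical payload variants; the real part types live in the storage specs.
type PartData =
  | { type: "text"; text: string }
  | { type: "tool"; tool: string; input: unknown };

interface PartRow {
  id: string;
  messageId: string;
  type: PartData["type"]; // structured column: cheap to filter and index
  data: PartData; // JSONB column: rich content, discriminated in application code
}

// Mirrors `WHERE type = $1`: only the structured column is inspected.
function partsOfType(rows: PartRow[], type: PartRow["type"]): PartRow[] {
  return rows.filter((row) => row.type === type);
}

// Reading content narrows on the discriminant inside `data`.
function textOf(row: PartRow): string | undefined {
  return row.data.type === "text" ? row.data.text : undefined;
}

const exampleParts: PartRow[] = [
  { id: "p1", messageId: "m1", type: "text", data: { type: "text", text: "hello" } },
  { id: "p2", messageId: "m1", type: "tool", data: { type: "tool", tool: "grep", input: { q: "x" } } },
];
```

Filtering never looks inside `data`, which is the property the ADR relies on: queries stay on indexed columns, while the payload is free to vary per `type`.
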
--- a/docs/decisions/ADR-002-jsonb-nullability-rationale.md
+++ b/docs/decisions/ADR-002-jsonb-nullability-rationale.md
@@ -0,0 +1,17 @@
+# ADR-002: JSONB nullability rationale
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+Some JSONB columns are NOT NULL (messages.data, parts.data, operations.inputSchema) while others are nullable (sessions.data, spokes.hostInfo, operations.outputSchema). Need a consistent rationale for when JSONB should be nullable.
+
+## Decision
+
+JSONB columns are NOT NULL when data is required for the record to be meaningful — a message without role-specific metadata or a part without type-specific content is incomplete. Nullable JSONB columns are for optional, evolving, or context-dependent data.
+
+## Consequences
+
+Minimal — this is a convention that matches the semantic meaning of each column. Positive: consistent mental model for schema design. Negative: none significant.
--- a/docs/decisions/ADR-003-sortable-ids-for-parts.md
+++ b/docs/decisions/ADR-003-sortable-ids-for-parts.md
@@ -0,0 +1,29 @@
+# ADR-003: Sortable IDs for parts
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+Parts must be ordered chronologically within a message. UUIDv4 from crypto.randomUUID() is not sortable. Opencode uses prefix-based sortable IDs (prt_{timestamp_hex}{random}).
+
+## Decision
+
+Parts use sortable timestamp-based IDs instead of commonCols.id. Enables ORDER BY id ASC for chronological ordering without a separate position column. Use a monotonic ID generator (e.g., @std/ulid or custom prefix+sortable scheme).
+
+Messages continue to use UUIDv4 (via `commonCols.id`) and rely on the composite index `idx_messages_session_id_created_at_id` on `(session_id, created_at, id)` for ordering. This avoids changing the message ID scheme when messages already have a reliable ordering mechanism via the composite index.
+
+## Amendment (2026-04-22)
+
+Sortable IDs apply to the `parts` table only. Messages retain UUIDv4 from `commonCols.id` because:
+
+1. Messages already have a composite index `(session_id, created_at, id)` that provides efficient chronological ordering without sortable IDs.
+2. UUIDv4 is sufficient for messages since ordering is driven by `created_at`, not by ID sortability.
+3. Changing message IDs would cascade into opencode/AI SDK compatibility layers for no ordering benefit.
+
+Parts are the primary beneficiary of sortable IDs because they are ordered `BY id ASC` within a message, and a separate `position` column would otherwise be required.
+
+## Consequences
+
+Sortable IDs reveal creation timestamps; the random suffix keeps IDs unguessable but does not hide the timestamp. Slightly larger than UUIDv4. Ordering benefit outweighs both concerns. Positive: eliminates need for separate position/sort columns, natural chronological ordering. Negative: timestamp leakage and larger ID size.
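
Reviewer sketch for ADR-003 (illustrative, not part of the committed file): one way a monotonic, lexicographically sortable part ID could be generated. The layout (`prt_` + 12 hex digits of millisecond timestamp + 4 hex digits of counter + 8 hex digits of randomness) and the name `createPartId` are assumptions made for this example; the ADR leaves the exact scheme open (`@std/ulid` or a custom prefix scheme). `Math.random` is used only to keep the sketch dependency-free and is not cryptographically strong.

```typescript
// Module-level state that keeps IDs monotonic within one process.
let lastMs = 0;
let counter = 0;

// Fixed-width lowercase hex so that string order matches numeric order.
function hex(value: number, width: number): string {
  return value.toString(16).padStart(width, "0");
}

function createPartId(nowMs: number = Date.now()): string {
  if (nowMs > lastMs) {
    lastMs = nowMs;
    counter = 0;
  } else {
    // Same millisecond, or the clock moved backwards: reuse the last
    // timestamp and bump the counter so the new ID still sorts later.
    counter += 1;
    if (counter > 0xffff) {
      lastMs += 1;
      counter = 0;
    }
  }
  // Assumption: a non-cryptographic random suffix is enough for a sketch.
  const random = Math.floor(Math.random() * 0x100000000);
  return `prt_${hex(lastMs, 12)}${hex(counter, 4)}${hex(random, 8)}`;
}
```

Because every field has a fixed width, `ORDER BY id ASC` on these strings yields creation order within a single generator; IDs produced by different processes in the same millisecond are ordered only by their counter and random suffix.
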
--- a/docs/decisions/ADR-004-keypal-integration-strategy.md
+++ b/docs/decisions/ADR-004-keypal-integration-strategy.md
@@ -0,0 +1,17 @@
+# ADR-004: Keypal integration strategy
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+keypal (v0.1.11, MIT) provides API key management with hashing, scopes, caching, and a Drizzle storage adapter. Need API key management for hub authentication.
+
+## Decision
+
+Use keypal as a dependency (not fork). Import core utilities (createKeys, hashKey, validateKey, scope checking) directly. Define our own api_keys table following the commonCols pattern with proper columns for high-query fields (owner_id, key_hash, enabled, expires_at, revoked_at). Implement keypal's Storage interface as a thin adapter (HubKeyStorage) over our Drizzle tables.
+
+## Consequences
+
+Custom Storage adapter is more work than using keypal's DrizzleStore directly, but our commonCols pattern and column structure are important for consistency. The adapter is ~100 lines and straightforward. Positive: clean integration that respects our schema conventions. Negative: maintenance burden on adapter if keypal's Storage interface changes.
--- a/docs/decisions/ADR-005-spoke-naming-not-runner.md
+++ b/docs/decisions/ADR-005-spoke-naming-not-runner.md
@@ -0,0 +1,19 @@
+# ADR-005: Spoke naming, not runner
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+The concept of a process connecting to the hub via WebSocket is a "spoke." Previous drafts used "runner" (influenced by GitHub Actions runner naming), but spokes are more general: dev environments, client applications, or compute instances.
+
+## Decision
+
+Use "spoke" consistently in table names, column names, and throughout the codebase. Table is `spokes` (not `runners`). FK columns are `spoke_id` (not `runner_id`). Registered spoke record is a "spoke registration."
+
+Rationale: Hub-spoke metaphor is consistent throughout architecture docs. "Runner" is a specific kind of spoke, not the general concept.
+
+## Consequences
+
+Positive: naming consistency with hub-spoke architecture metaphor, more general and accurate terminology. Negative: none — purely a naming convention decision that improves clarity.
--- a/docs/decisions/ADR-006-operation-specs-as-capabilities.md
+++ b/docs/decisions/ADR-006-operation-specs-as-capabilities.md
@@ -0,0 +1,28 @@
+# ADR-006: Operation specs as capabilities
+
+- **Status**: Superseded (see update below)
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+- **Superseded by**: D3 in storage-spec-phase1-resolutions.md (2026-04-22)
+
+## Context
+
+A spoke's capabilities were previously modeled as an opaque JSONB blob. Operations are the universal abstraction; they have names, namespaces, types, and typed schemas.
+
+## Original Decision
+
+A spoke's capabilities are its registered operation specs. The spokes table stores minimal metadata. The operation_specs table stores full definitions. The relationship: spoke registers → hub creates operation_specs rows linked to that spoke. Queries for "what can spoke X do?" go through operation_specs filtered by spoke_id, not through a capabilities blob. The spokes table has no capabilities column. Instead, operation_specs has a spoke_id FK (nullable — hub-native operations have spoke_id = null).
+
+## Revised Decision (D3, 2026-04-22)
+
+The original unified `operation_specs` table conflated two concepts: "what an operation IS" (a definition) and "who provides it right now" (a registration). These are now split into two tables:
+
+1. **`operations`** (definitions): Stores the operation's identity — namespace, name, type, input/output schemas, access control, description, tags. Unique by `(namespace, name)`. No spoke FK — definitions are provider-independent. These persist even when all providers disconnect.
+
+2. **`operation_registrations`** (provider bindings): Links a provider (spoke or client) to an operation definition. Has `operationId → operations.id` (CASCADE), `providerType` (spoke|client), `providerId`, `status` (active|inactive), and pre-remap identifiers. On spoke disconnect, registrations are set to `inactive`. On admin spoke-row deletion, registrations CASCADE.
+
+This supersedes the original unified model. The core principle from the original decision — that a spoke's capabilities are its registered operations, not a capabilities blob — remains unchanged. The query pattern shifts from `operation_specs filtered by spoke_id` to `operation_registrations filtered by providerId and status = 'active'`.
+
+## Consequences
+
+Positive: capabilities are fully typed and queryable, consistent with the operations system, no duplicated capability data. Negative: requires a join to get spoke capabilities (acceptable since operation_registrations are indexed by providerId). The split adds a second table but cleanly separates definition persistence from runtime provider state, enabling multi-instance providers and operation survival across disconnects.
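
Reviewer sketch for ADR-006 (illustrative, not part of the committed file): the revised query pattern, shown over in-memory arrays instead of SQL. The field names follow the ADR (`operationId`, `providerType`, `providerId`, `status`); the helper names `capabilitiesOf` and `markDisconnected` are hypothetical.

```typescript
type ProviderType = "spoke" | "client";
type RegistrationStatus = "active" | "inactive";

// Definition: what an operation IS. No provider reference.
interface Operation {
  id: string;
  namespace: string;
  name: string;
}

// Registration: who provides the operation right now.
interface OperationRegistration {
  operationId: string;
  providerType: ProviderType;
  providerId: string;
  status: RegistrationStatus;
}

// "What can provider X do?" = active registrations joined to definitions.
function capabilitiesOf(
  providerId: string,
  operations: Operation[],
  registrations: OperationRegistration[],
): string[] {
  const byId = new Map<string, Operation>();
  for (const op of operations) byId.set(op.id, op);

  const names: string[] = [];
  for (const reg of registrations) {
    if (reg.providerId !== providerId || reg.status !== "active") continue;
    const op = byId.get(reg.operationId);
    if (op !== undefined) names.push(`${op.namespace}.${op.name}`);
  }
  return names;
}

// On disconnect, registrations become inactive; definitions are untouched.
function markDisconnected(
  providerId: string,
  registrations: OperationRegistration[],
): OperationRegistration[] {
  return registrations.map((reg) =>
    reg.providerId === providerId ? { ...reg, status: "inactive" as const } : reg
  );
}
```

The definitions array is never modified by `markDisconnected`, which mirrors the ADR's point that operations persist even when all providers disconnect.
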
--- a/docs/decisions/ADR-007-client-config-as-schema-validated-jsonb.md
+++ b/docs/decisions/ADR-007-client-config-as-schema-validated-jsonb.md
@@ -0,0 +1,17 @@
+# ADR-007: Client config as schema-validated JSONB
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+The hub connects to external services — LLM providers, VCS, compute, MCP servers, future integrations (JMAP, etc.). Each has a different configuration shape. TypeBox schemas already exist for some (MCPServerConfig in core).
+
+## Decision
+
+Each client type has a known TypeBox schema that validates the config column on write. Schemas live in code (not in the DB). The type column determines which schema validates config. This supports arbitrary client types without schema migrations. The four-layer model: (1) Client config schema (TypeBox, in code), (2) Client config instance (JSONB, clients.config), (3) Auth config schema (TypeBox, in code — implicit in secretKey wiring), (4) Auth config instance (encrypted, client_secrets.value). Config instances are plain JSONB. Auth instances are encrypted with AES-256-GCM.
+
+## Consequences
+
+Config column is opaque to SQL queries. Acceptable because clients are looked up by name (unique) or type, not by config field values. Positive: no schema migrations for new client types, TypeBox validation ensures data integrity, clean separation of config and secrets. Negative: cannot query config fields directly in SQL.
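
Reviewer sketch for ADR-007 (illustrative, not part of the committed file): dispatching validation on the `type` column. The ADR specifies TypeBox schemas; to stay self-contained, this sketch substitutes hand-written validator functions, and the client types (`llm`, `mcp`) and field names (`baseUrl`, `model`, `command`) are hypothetical.

```typescript
// A validator returns a list of human-readable errors; empty means valid.
type ConfigValidator = (config: unknown) => string[];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Stand-in for a TypeBox object schema with required string properties.
function requireStrings(config: unknown, fields: string[]): string[] {
  if (!isRecord(config)) return ["config must be an object"];
  const record = config;
  return fields
    .filter((field) => typeof record[field] !== "string")
    .map((field) => `${field} must be a string`);
}

// Schemas live in code, keyed by client type. Adding a client type means
// adding an entry here, not running a schema migration.
const validatorsByType = new Map<string, ConfigValidator>([
  ["llm", (config) => requireStrings(config, ["baseUrl", "model"])],
  ["mcp", (config) => requireStrings(config, ["command"])],
]);

// Called on write, before the config is stored in the JSONB column.
function validateClientConfig(type: string, config: unknown): string[] {
  const validate = validatorsByType.get(type);
  if (validate === undefined) return [`unknown client type: ${type}`];
  return validate(config);
}
```
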
--- a/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md
+++ b/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md
@@ -0,0 +1,39 @@
+# ADR-008: Secrets encrypted at rest with key versioning
+
+- **Status**: Accepted (revised 2026-04-23)
+- **Date**: 2026-04-19
+- **Revised**: 2026-04-23
+- **Deciders**: alkdev
+
+## Context
+
+API keys, passwords, OAuth tokens, and SSH keys for external services must be stored securely. The crypto.ts utility from ade-v0 (AES-256-GCM + PBKDF2 with key version support) is battle-tested.
+
+The original decision specified reading the encryption key from an environment variable (`HUB_ENCRYPTION_KEY`). This is a security concern: environment variables are readable via `/proc/PID/environ` by any process with the same UID on the host, and are visible in `docker inspect`. In a multi-container Docker environment, this is a real attack surface.
+
+## Decision
+
+Copy crypto.ts to packages/core/utils/crypto.ts. Store encrypted secrets in client_secrets.value as EncryptedData { keyVersion, salt, iv, data }.
+
+**Two-layer key model** (revised from original):
+
+1. **Master key**: Provisioned via Docker secret (`/run/secrets/hub_master_key`). tmpfs-backed, never on container filesystem, not visible in `/proc/PID/environ`. Used only to decrypt the config file's encrypted fields. Rarely rotated (requires redeploying the Docker secret).
+
+2. **Data encryption keys**: Stored in the config file's `encryptionKeys` field (itself encrypted with the master key). Multi-key format: `v1:base64,v2:base64`. The first key is "current" (used for new encryptions); all keys are available for decryption (enables rotation). Generated via `crypto.generateEncryptionKey()`. Rotated by updating the config file and re-encrypting `client_secrets` rows, with no Docker secret change needed.
+
+Key versioning supports rotation — bump keyVersion, re-encrypt on next access. The rotation protocol is defined in storage/services.md.
+
+**No environment variables for secrets or important configuration.** This is a hard rule. Non-sensitive convenience vars (e.g., `ALKHUB_CONFIG_PATH`) are acceptable. Nothing that would be damaging if exposed via `/proc` may be in an env var.
+
+Full config system specification: [docs/architecture/hub-config.md](../architecture/hub-config.md).
+Startup sequence: [docs/architecture/hub-startup.md](../architecture/hub-startup.md).
+
+## Consequences
+
+Encryption keys must be available at runtime. If lost, all secrets unrecoverable. Standard for symmetric encryption.
+
+**Positive**: Key versioning enables rotation without downtime. Proven crypto implementation. Docker secrets eliminate the `/proc/PID/environ` leak vector. Two-layer keys allow independent rotation schedules (master key rarely, data keys as needed). Config file with encrypted fields is safe to version-control (ciphertext only).
+
+**Negative**: Encryption key loss means total data loss (same as before). Two keys to manage instead of one. Slightly more complex deployment (mount config file + secret, rather than just setting env vars). Config file must be prepared with the `alkhub-config` CLI tool before deployment.
+
+**Mitigated by**: Storing master key in Docker secrets (not DB, not env), supporting key rotation so compromised keys can be cycled, `alkhub-config` tool automating config file preparation, infrastructure.md documenting the Docker deployment pattern.
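
Reviewer sketch for ADR-008 (illustrative, not part of the committed file): parsing the multi-key `encryptionKeys` format after it has been decrypted with the master key. The names `parseEncryptionKeys`, `DataKey`, and `KeyRing` are hypothetical, and the strictness of the entry pattern is an assumption; only the `v<N>:<base64>` layout and the "first key is current" rule come from the ADR.

```typescript
interface DataKey {
  version: number;
  keyBase64: string;
}

interface KeyRing {
  current: DataKey; // used for new encryptions
  byVersion: Map<number, DataKey>; // all keys, used for decryption
}

function parseEncryptionKeys(raw: string): KeyRing {
  const byVersion = new Map<number, DataKey>();
  let current: DataKey | undefined;

  for (const entry of raw.split(",")) {
    const match = /^v(\d+):([A-Za-z0-9+/]+={0,2})$/.exec(entry.trim());
    // The offending entry is deliberately left out of the message so that
    // key material cannot end up in logs.
    if (match === null) throw new Error('malformed key entry, expected "v<N>:<base64>"');

    const key: DataKey = { version: Number(match[1]), keyBase64: match[2] };
    if (byVersion.has(key.version)) throw new Error(`duplicate key version v${key.version}`);
    byVersion.set(key.version, key);

    // The first entry in the list is the current key.
    if (current === undefined) current = key;
  }

  if (current === undefined) throw new Error("no encryption keys configured");
  return { current, byVersion };
}
```

For decryption, a caller would presumably look up `byVersion.get(encrypted.keyVersion)` using the `keyVersion` stored in each `EncryptedData` record, while new encryptions always use `current`.
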
--- a/docs/decisions/ADR-009-multi-tenancy-via-accounts-and-organizations.md
+++ b/docs/decisions/ADR-009-multi-tenancy-via-accounts-and-organizations.md
@@ -0,0 +1,17 @@
+# ADR-009: Multi-tenancy via accounts and organizations
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+Initial schema was implicitly single-tenant. Multiple users, projects, and organizations need to coexist. But we don't replicate Gitea's user/team/repo model — Gitea handles VCS access control via operations. The hub handles session, key, and client ownership.
+
+## Decision
+
+Add three small tables — accounts (hub-local identity), organizations (top-level grouping), and organization_members (membership with levels). Link existing tables via FKs (api_keys.ownerId, clients.ownerId → accounts.id; projects.orgId → organizations.id). Bridge to Gitea via accounts.giteaUsername and organizations.giteaOrgName.
+
+## Consequences
+
+Minimal multi-tenancy layer. Doesn't handle fine-grained permissions (that's Gitea's job). Provides ownership tracking and grouping, enough for single-to-few-tenant case. Positive: lightweight, delegates VCS permissions to Gitea, easy to understand. Negative: if we need RBAC beyond owner/admin/member, must extend or add a permissions layer later.
--- a/docs/decisions/ADR-010-api-keys-vs-client-secrets-direction-matters.md
+++ b/docs/decisions/ADR-010-api-keys-vs-client-secrets-direction-matters.md
@@ -0,0 +1,27 @@
+# ADR-010: API keys vs client secrets — direction matters
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+
+## Context
+
+Both api_keys and client_secrets store authentication credentials, but they serve opposite directions.
+
+## Decision
+
+Keep as separate tables with different security models. api_keys: keys WE issue so others can call US (hub auth). Managed by keypal. Stored as SHA-256 hashes. client_secrets: keys OTHERS issue so we can call THEM (outbound auth). Managed by us. Stored as AES-256-GCM encrypted values. Never mix — a hashed client secret is useless (we can't send it), an encryptable API key defeats the purpose of hashing.
+
+## SHA-256 vs KDF trade-off
+
+API keys are hashed with SHA-256, not a deliberately slow KDF (bcrypt, Argon2). This is acceptable because:
+
+1. API keys are high-entropy machine-generated strings (128-bit+). With a key space of at least 2^128, brute force is infeasible regardless of hash speed: there are far too many candidate keys to enumerate, and random keys do not appear in any dictionary.
+2. SHA-256 is cheap to compute, so verification adds negligible latency even at high throughput, which matters because it runs on every API request.
+3. Slow KDFs exist to protect low-entropy human passwords (where rate-limiting cannot compensate for small key space). Machine-generated keys do not have this weakness.
+
+If the database is compromised, the attacker has the SHA-256 hashes but cannot reverse them without enumerating the key space — which is computationally infeasible for 128-bit+ random keys.
+
+## Consequences
+
+Positive: clear security model per direction, appropriate crypto per use case, no confusion about how credentials are stored. Negative: two tables instead of one, but the security models are fundamentally incompatible so merging would be wrong.
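
Reviewer sketch for ADR-010 (illustrative, not part of the committed file): the inbound half of the model, where only a SHA-256 hash of an issued key is stored and verification is a hash plus an equality lookup. This is not keypal's API; `generateApiKey`, `hashApiKey`, `issueKey`, `verifyKey`, and the in-memory `keysByHash` map are hypothetical stand-ins for keypal and the `api_keys` table. It uses Node's built-in `node:crypto` module, which Deno also provides.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes = 256 bits of entropy, above the ADR's 128-bit floor.
function generateApiKey(): string {
  return randomBytes(32).toString("hex");
}

// Plain SHA-256, no salt and no slow KDF, as argued in the ADR.
function hashApiKey(key: string): string {
  return createHash("sha256").update(key, "utf8").digest("hex");
}

interface ApiKeyRecord {
  ownerId: string;
  enabled: boolean;
}

// Stand-in for the api_keys table, indexed by key_hash.
const keysByHash = new Map<string, ApiKeyRecord>();

// The plaintext key is returned once to the caller and never stored.
function issueKey(ownerId: string): string {
  const key = generateApiKey();
  keysByHash.set(hashApiKey(key), { ownerId, enabled: true });
  return key;
}

// Verification: hash the presented key and look it up by equality.
function verifyKey(presented: string): ApiKeyRecord | undefined {
  const record = keysByHash.get(hashApiKey(presented));
  return record !== undefined && record.enabled ? record : undefined;
}
```

The outbound half (`client_secrets`) cannot use this pattern, because the hub must recover the plaintext to send it; that is why those values are encrypted rather than hashed.
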
--- a/docs/decisions/ADR-011-dual-task-representation.md
+++ b/docs/decisions/ADR-011-dual-task-representation.md
@@ -0,0 +1,76 @@
+# ADR-011: Database as source of truth for tasks
+
+- **Status**: Accepted
+- **Date**: 2026-04-19
+- **Deciders**: alkdev
+- **Supersedes**: Previous "dual representation" design where files were source of truth for content and DB for state
+
+## Context
+
+The SDD process uses tasks as markdown files (compatible with the `taskgraph` CLI). The hub coordinator needs to query and mutate task state at runtime across multiple parallel worktrees. We need a storage model that serves both authoring and runtime coordination.
+
+Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:
+
+- **Parallel worktrees**: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
+- **Merge conflicts**: Two agents editing the same task file in different worktrees creates git conflicts on merge.
+- **Reliable coordination**: The coordinator needs to query "which tasks are pending?" without scanning filesystems across worktrees.
+- **Atomic mutations**: Status changes must be immediately visible to all agents, not delayed until file merges.
+
+Three options were considered:
+
+1. **Files only** — The coordinator runs `taskgraph` CLI commands via bash to query status. Agents edit files directly.
+2. **Database only** — Tasks are stored exclusively in Postgres. No markdown files.
+3. **Database as source of truth, files as authoring surface** — The DB is the authoritative runtime representation. Markdown files serve as the Decomposer's authoring format, ingested to DB via sync. Taskgraph CLI used for offline analysis via DB export.
+
+## Decision
+
+We choose **Option 3: Database as source of truth, files as authoring surface**.
+
+### Authority Model
+
+| Aspect | Authority | Why |
+|--------|-----------|-----|
+| All task fields (structure, categorical estimates, metadata) | **DB** | Every taskgraph frontmatter field maps to a dedicated DB column. Queryable, concurrent-safe, consistent. |
+| Task specification (body) | **DB** (`body` column) | Stored as markdown text. Agents append notes during execution. |
+| Task creation/authoring | **Files** → sync → DB | Decomposer edits markdown files; sync ingests them into DB. |
+| Runtime status mutations | **DB** (hub operations) | `hub.task.*` operations ensure all agents see consistent state. |
+| Offline graph analysis | **Files** (taskgraph CLI) | Export from DB when needed for `taskgraph risk-path` etc. |
+
+### Key Design Principles
+
+1. **Every taskgraph frontmatter field is a proper DB column** — no fields relegated to JSONB `metadata`. `priority`, `assignee`, `dueAt`, `tags` get dedicated columns because they're queryable and filterable in coordinator workflows.
+
+2. **Categorical fields are nullable, not NOT NULL with defaults** — `scope`, `risk`, `impact`, `level` are nullable (NULL = not yet assessed). This preserves the distinction between "deliberately assessed as low" and "nobody filled this in." Taskgraph itself uses `Option<TaskScope>` etc.
+
+3. **No `parentId`** — Grouping is handled by `path` (a nullable text column for scoped queries like `WHERE path LIKE 'implementation/%'`). Dependencies are in `task_dependencies`. These are separate concepts.
+
+4. **No `removedAt` soft delete** — When a task file is removed, the sync DELETEs the DB row. Git history preserves file-level history. No DB duplication needed.
+
+5. **`fileCreatedAt`/`fileModifiedAt`** — Dedicated columns for frontmatter timestamps, separate from DB `createdAt`/`updatedAt` (row lifecycle times).
+
+## Consequences
+
+**Positive**:
+- Coordinator gets a reliable, consistent view of all task state across parallel worktrees.
+- No merge conflicts from agents editing the same file in different worktrees.
+- Status changes are atomic and immediately visible to all agents via hub operations.
+- All taskgraph fields are queryable with proper SQL types and indexes.
+- Taskgraph CLI still works for offline analysis via DB → file export.
+- Nullable categorical fields provide the "not yet assessed" signal that defaults hide.
+
+**Negative**:
+- Two representations exist (files and DB), requiring a sync operation.
+- Files are no longer the source of truth — they're the authoring surface. This is a conceptual shift from taskgraph's default model.
+- DB → file export is needed for offline analysis (not automatic).
+
+**Mitigation for negatives**:
+- Sync is idempotent and can be run at any time after authoring.
+- The DB is the authority; files are just one input method. Tasks can also be created via hub API.
+- Export for offline analysis is a manual step (run when needed), not a continuous sync.
+
+## Related
+
+- ADR-001: JSONB data columns vs individual columns (same principle — proper columns for queryable fields)
+- Cost-benefit framework: taskgraph framework docs
+- Task storage: `docs/architecture/storage/tasks.md`
+- taskgraph TaskFrontmatter: taskgraph source
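
Reviewer sketch for ADR-011 (illustrative, not part of the committed file): an idempotent file-to-DB sync over an in-memory map. The `TaskRecord` fields are a reduced, hypothetical subset of the task columns, and `syncTasks` assumes every row in the map originated from a file; how tasks created via the hub API are excluded from deletion is not specified in the ADR and is not modeled here.

```typescript
interface TaskRecord {
  id: string;
  name: string;
  scope: string | null; // null means "not yet assessed"
  body: string;
}

interface SyncResult {
  written: number; // rows inserted or updated
  deleted: number; // rows removed because their file no longer exists
}

function sameTask(a: TaskRecord, b: TaskRecord): boolean {
  return a.name === b.name && a.scope === b.scope && a.body === b.body;
}

function syncTasks(files: TaskRecord[], db: Map<string, TaskRecord>): SyncResult {
  let written = 0;
  let deleted = 0;
  const seen = new Set<string>();

  // Upsert: write only when the row is missing or differs from the file.
  for (const file of files) {
    seen.add(file.id);
    const existing = db.get(file.id);
    if (existing === undefined || !sameTask(existing, file)) {
      db.set(file.id, { ...file });
      written += 1;
    }
  }

  // Hard delete (no soft-delete column): drop rows whose file was removed.
  for (const id of Array.from(db.keys())) {
    if (!seen.has(id)) {
      db.delete(id);
      deleted += 1;
    }
  }

  return { written, deleted };
}
```

Running the sync twice with the same files reports zero writes and zero deletes on the second pass, which is the idempotence property the mitigation section relies on.
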
--- a/docs/decisions/ADR-012-agent-vs-role-vs-account.md
+++ b/docs/decisions/ADR-012-agent-vs-role-vs-account.md
@@ -0,0 +1,84 @@
+# ADR-012: Agent vs Role vs Account Terminology
+
+## Status
+
+Proposed
+
+## Context
+
+The codebase and documentation use "agent" in multiple overlapping senses:
+1. **OpenCode "agent"**: A behavioral specification defining what tools, permissions, model, and prompt an LLM session uses. OpenCode's `.opencode/agents/*.md` files define these.
+2. **Philosophical "agency"**: An ill-defined notion of autonomy or self-direction.
+3. **Principal-agent "agent"**: In the legal sense, an entity that acts on behalf of a principal.
+4. **MCP/LLM "agent"**: A general term for an LLM-powered system that takes actions.
+
+Meanwhile, our `accounts` table has a `role` column with values `admin`, `user`, `service` — which is a _different_ "role" concept (access level, not behavioral specification).
+
+This creates confusion:
+- When we say "agent permissions," do we mean the behavioral spec (OpenCode sense) or the access level (account sense)?
+- When an LLM creates a Gitea commit, who is the "agent"? The LLM? The human who delegated? The account the LLM uses?
+- When we import OpenCode sessions, their `agent` field maps to... what in our model?
+
+## Decision
+
+We adopt the following terminology:
+
+| Term | Definition | Storage |
+|------|-----------|---------|
+| **Account** | An identity in the system (human, service, or LLM). Owns resources, authenticates. | `accounts` table |
+| **Role** | A behavioral specification that any account can fill. Defines permissions, tools, model params. | `roles` table (future), currently `.opencode/agents/*.md` |
+| **Session** | A unit of work where an account fills a role. Binds account + role for a duration. | `sessions` table |
+
+### Specific naming changes:
+
+1. **`sessions.agentName`** → **`sessions.roleName`**
+   - The field stores which behavioral role is active, not which account
+   - OpenCode's `agent` field on messages maps to our `roleName`
+
+2. **`accounts.role`** → **`accounts.accessLevel`**
+   - Renamed to avoid confusion with behavioral roles
+   - Values remain: `admin`, `user`, `service`
+   - This is a different concept from the behavioral role
+
+3. **`organization_members.role`** → **`organization_members.membershipLevel`**
+   - Yet another "role" concept — org membership level
+   - Values remain: `owner`, `admin`, `member`
+   - Renamed for the same reason: avoid collision with behavioral roles
+
+4. **New term**: When we need to say "an LLM acting autonomously", we say **"LLM in a role"** or **"session with an LLM account"**, not "agent"
+
+5. **OpenCode import mapping**: OpenCode's `session.agent` → our `sessions.roleName`
+
+### Rationale
+
+- **"Role" is what you fill, not what you are**. A human can fill the implementer role. An LLM can fill the implementer role. The role defines behavior, not identity.
+- **"Account" provides accountability**. Every session, API call, and audit entry traces back to an account. Whether that account is human or LLM is indicated by `accounts.accessLevel: "service"`.
+- **"Agent" is ambiguous**. The philosophical and legal senses conflict. The OpenCode sense conflates behavior with identity. Avoiding it removes confusion.
+- **The principal-agent framework maps naturally**. When a coordinator (principal) delegates to an implementer (agent), both have accounts. The accountability flows through the accounts, not through some notion of "agency."
+- **Permission intersection makes sense**. `Session permissions = Role.permissions ∩ Account.scopes ∩ SpokeType.trustLevel` reads clearly. `Agent.permissions ∩ ...` would be unclear.
+
+## Consequences
+
+### Positive
+- Clear separation between identity (account) and behavior (role)
+- Unambiguous accountability trail (every action → account)
+- Natural mapping of OpenCode's `agent` field → `roleName`
+- No philosophical confusion about "agency"
+
+### Negative
+- Three columns renamed: `sessions.agentName` → `sessions.roleName`, `accounts.role` → `accounts.accessLevel`, `organization_members.role` → `organization_members.membershipLevel`
+- Need to be consistent about this in all new documentation and code
+- OpenCode's `.opencode/agents/` directory name stays (it's their convention), but we refer to the contents as "role specs" not "agent specs"
+- Migration needed for existing code/docs that use the old column names
+
+### Terminology Summary
+
+| Old/Ambiguous Term | Canonical Term | Storage Location | Values |
+|---|---|---|---|
+| `accounts.role` | `accounts.accessLevel` | `accounts.accessLevel` | admin, user, service |
+| `sessions.agentName` | `sessions.roleName` | `sessions.roleName` | architect, implementation-specialist, ... |
+| `organization_members.role` | `organization_members.membershipLevel` | `organization_members.membershipLevel` | owner, admin, member |
+| behavioral "agent" (OpenCode) | role | `roles` table (planned) | architect, implementation-specialist, ... |
+
+### Neutral
+- OpenCode import just maps `agent` → `roleName` — this is a data mapping, not a semantic conflict
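
Reviewer sketch for ADR-012 (illustrative, not part of the committed file): the permission intersection `Role.permissions ∩ Account.scopes ∩ SpokeType.trustLevel` expressed over sets of permission strings. Modeling the spoke trust level as "the set of permissions that trust level allows" is an assumption of this example, as are the names `SessionContext`, `intersectAll`, and `sessionPermissions`.

```typescript
type Permission = string;

// Intersection of any number of sets; an empty input yields an empty set.
function intersectAll(sets: ReadonlySet<Permission>[]): Set<Permission> {
  const result = new Set<Permission>();
  const first = sets[0];
  if (first === undefined) return result;
  const rest = sets.slice(1);
  first.forEach((permission) => {
    if (rest.every((set) => set.has(permission))) result.add(permission);
  });
  return result;
}

interface SessionContext {
  rolePermissions: ReadonlySet<Permission>; // from the behavioral role
  accountScopes: ReadonlySet<Permission>; // from the account (identity)
  spokeTrustAllows: ReadonlySet<Permission>; // assumed: allowed at this trust level
}

// A session may only do what the role, the account, and the spoke all permit.
function sessionPermissions(ctx: SessionContext): Set<Permission> {
  return intersectAll([ctx.rolePermissions, ctx.accountScopes, ctx.spokeTrustAllows]);
}
```

Keeping role and account as separate inputs is what makes the formula readable: the same role can be filled by a human or an LLM account, and the result narrows according to whichever account is bound to the session.
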
--- a/docs/decisions/ADR-013-schema-system-integration.md
+++ b/docs/decisions/ADR-013-schema-system-integration.md
@@ -0,0 +1,161 @@
+# ADR-013: Schema system integration — TypeBox as canonical, typemap as scanner adapter
+
+- **Status**: Accepted (implemented in `@alkdev/operations`)
+- **Date**: 2026-04-25 (updated 2026-05-18)
+- **Deciders**: alkdev
+
+## Context
+
+The operations system requires typed `inputSchema` and `outputSchema` on every `IOperationDefinition`. Internally, the system uses `@alkdev/typebox` (our fork of `@sinclair/typebox` 0.x LTS) exclusively — `KindGuard.IsSchema()` gates registration, `Value.Check()`/`Value.Errors()` performs validation, and `Static<>` derives TypeScript types from schemas. This is a hard dependency; the runtime requires genuine TypeBox `TSchema` objects with `[Kind]` symbols.
+
+External systems send schemas over the wire as JSON Schema. The hub-spoke protocol is JSON over WebSocket. MCP tools and OpenAPI specs are JSON Schema. Non-TypeScript spokes (Python, Rust, etc.) send JSON Schema. This means:
+
+1. **TypeBox is the internal runtime format**: the hub and TypeScript spokes use it for validation, type derivation, and schema checking.
+2. **JSON Schema is the wire format**: TypeBox schemas serialize to JSON Schema (they are JSON Schema objects plus symbol-keyed `[Kind]` metadata, which `JSON.stringify` omits). The hub deserializes via `FromSchema()`. Any language with a JSON Schema library and a WebSocket client can implement a spoke.
+3. **Spoke authors may prefer different schema DSLs**: Zod, Valibot, or TypeScript syntax strings are more ergonomic for some developers than TypeBox's builder API. `@alkdev/typemap` (a fork of the archived `@sinclair/typemap`) provides bidirectional conversion between TypeBox, Zod, Valibot, and Syntax, with TypeBox as the canonical intermediate representation.
+
+The question is how to integrate typemap without forcing Zod/Valibot into every install, and without changing the internal TypeBox contract.
+
+## Decision
+
+### TypeBox is canonical — no multi-schema internals
+
+`IOperationDefinition.inputSchema` and `outputSchema` remain `TSchema`. The registry, validation, call protocol, and storage all use TypeBox natively. No `TSchema | ZodTypeAny | ValibotSchema` union types anywhere in core.
+
+### JSON Schema is the wire format
+
+The spoke registration protocol (`hub.register`) carries operation specs with their schemas serialized as JSON Schema. On deserialization, the hub converts back to TypeBox `TSchema` via `FromSchema()`. This is the same pattern already used for MCP tools and OpenAPI specs.
+
+The call protocol events (`call.requested`, `call.responded`, etc.) carry `input` as `Type.Unknown()` — the payload is validated against the operation's `inputSchema` by the receiver, not by the transport. The schema itself isn't in every event; only the `operationId` is, and the receiver looks up the schema from its registry.
+
+Any language with a JSON Schema library and a WebSocket client can implement a spoke. No TypeBox dependency required on the spoke side.
+
+### FromSchema() coverage is a subset of JSON Schema
+
+`FromSchema()` (in `@alkdev/operations/from-schema`) handles the JSON Schema features most commonly encountered in operation schemas. The current implementation covers:
+
+| Feature | Support |
+|---------|---------|
+| `type: "string"`, `"number"`, `"integer"`, `"boolean"`, `"null"` | ✅ Full |
+| `type: "object"` with `properties` / `required` | ✅ Full |
+| `type: "array"` with `items` (single schema or tuple) | ✅ Full |
+| `allOf`, `anyOf`, `oneOf` | ✅ Full |
+| `enum` (value arrays) | ✅ Full |
+| `const` (literal values) | ✅ Full |
+| `$ref` (schema references) | ⚠️ Partial — produces `Type.Ref()` but requires definitions registered in TypeBox's schema registry for resolution at validation time |
+| Schema annotations (`description`, `default`, `format`, etc.) | ✅ Passed through to TypeBox as options |
+| `$defs` / `definitions` | ❌ Not handled — schemas using shared definitions must inline them before sending over the wire |
+| `patternProperties`, `additionalProperties` | ❌ Not handled — falls through to `Type.Unknown()` |
+| `if/then/else` | ❌ Not handled |
+| `not` | ❌ Not handled |
+| `contentEncoding`, `contentMediaType` | ❌ Not handled |
+
+**Wire format constraint**: Spoke schemas sent over the wire must be **self-contained** (no `$ref`s, no `$defs`/`definitions`) and use only the supported JSON Schema subset. Unsupported features currently produce `Type.Unknown()`, which accepts any value: valid input is never falsely rejected, but that part of the payload is not validated at all. The hardened `FromSchema()` (see security constraints below) must warn on unsupported features rather than silently degrading.
+
+### Inbound schema processing has security constraints
+
+When a spoke sends JSON Schema over the wire, the hub runs `FromSchema()` on it. This is processing untrusted input and must be hardened:
+
+- **Schema depth limit**: `FromSchema()` is recursive. Schemas with deeply nested `allOf`/`anyOf` can cause stack overflows. The hub must reject schemas exceeding 10 levels of nesting.
+- **Schema size limit**: The `hub.register` handler must reject any operation spec in which a single serialized schema exceeds 64 KB.
+- **`$ref` policy**: Wire schemas must be self-contained. Circular `$ref`s are a DoS vector. The hub must reject any schema containing `$ref` or `$defs`/`definitions` at registration time.
+- **No silent degradation**: `FromSchema()` must warn on unsupported JSON Schema features rather than silently producing `Type.Unknown()`. The hub logs which features fell through so spoke authors can fix their schemas.
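+
+The constraints above could be enforced by a pre-check that runs before `FromSchema()`. The following is a minimal sketch under stated assumptions, not the hardened implementation: `checkWireSchema` and `SchemaCheck` are hypothetical names, depth is counted in JSON container levels (this ADR does not say how levels are counted), and keyword matching is deliberately conservative.
+
+```ts
+// Limits come from the constraints above; everything else is a hypothetical sketch.
+const MAX_DEPTH = 10;
+const MAX_BYTES = 64 * 1024;
+const FORBIDDEN = ["$ref", "$defs", "definitions"];
+const UNSUPPORTED = [
+  "patternProperties", "additionalProperties", "if", "then", "else",
+  "not", "contentEncoding", "contentMediaType",
+];
+
+interface SchemaCheck {
+  ok: boolean;
+  errors: string[];   // reasons to reject the registration
+  warnings: string[]; // keywords that would fall through to Type.Unknown()
+}
+
+function checkWireSchema(schema: unknown): SchemaCheck {
+  const errors: string[] = [];
+  const warnings: string[] = [];
+
+  // Size of the UTF-8 encoded serialized form. A real handler would more
+  // likely measure the raw message before parsing it.
+  const bytes = new TextEncoder().encode(JSON.stringify(schema)).length;
+  if (bytes > MAX_BYTES) errors.push(`schema is ${bytes} bytes (limit ${MAX_BYTES})`);
+
+  // Iterative walk with an explicit stack, so a hostile schema cannot
+  // overflow the call stack of the checker itself.
+  const stack: Array<{ node: unknown; depth: number; path: string }> = [
+    { node: schema, depth: 1, path: "#" },
+  ];
+  while (stack.length > 0) {
+    const { node, depth, path } = stack.pop()!;
+    if (typeof node !== "object" || node === null) continue;
+    if (depth > MAX_DEPTH) {
+      errors.push(`nesting exceeds ${MAX_DEPTH} levels at ${path}`);
+      continue; // do not descend further
+    }
+    // Conservative: a property that is merely *named* "$ref" or "not"
+    // is flagged too. A stricter walker would track keyword position.
+    for (const [key, value] of Object.entries(node)) {
+      if (FORBIDDEN.includes(key)) errors.push(`${key} is not allowed at ${path}`);
+      if (UNSUPPORTED.includes(key)) warnings.push(`${key} is unsupported at ${path}`);
+      stack.push({ node: value, depth: depth + 1, path: `${path}/${key}` });
+    }
+  }
+  return { ok: errors.length === 0, errors, warnings };
+}
+```
+
+In this sketch errors reject the registration, while warnings are only logged so spoke authors can see which keywords would degrade to `Type.Unknown()`.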
+
+### Scanner is the conversion point — typemap converts at scan time
+
+The scanner (`@alkdev/operations/scanner`, using `ScannerFS` Deno adapter for filesystem access) walks the filesystem, imports `.ts` operation files, and registers their default exports. This is where typemap integrates: the scanner detects the schema type and converts non-TypeBox schemas before registration, using the `SchemaAdapter` pattern from `@alkdev/operations/from-typemap`.
+
+```ts
+// Scanner conversion logic (schematic)
+if (KindGuard.IsSchema(schema)) {
+  // TypeBox — register directly (current path)
+} else if (IsZod(schema)) {
+  // Zod → TypeBoxFromZod → TSchema → register
+} else if (IsValibot(schema)) {
+  // Valibot → TypeBoxFromValibot → TSchema → register
+} else {
+  throw new Error("Not a valid schema type...");
+}
+```
+
+The spoke author writes their operation definition using whatever schema DSL they prefer. The scanner converts it to TypeBox transparently at registration time. No manual `fromZod()` call needed — the author just writes Zod schemas in their operation file and the scanner handles the rest.
+
+The conversion is one-way and happens once at scan time. After registration, only the TypeBox `TSchema` exists in the registry. The original Zod/Valibot schema is not kept — the TypeBox conversion is the authoritative schema for validation, serialization, and type derivation.
+
+### typemap is an optional dependency with dynamic import
+
+`@alkdev/typemap` is an optional peer dependency of the spoke package, not a dependency of core. The scanner uses the `SchemaAdapter` from `@alkdev/operations/from-typemap`, which loads typemap's conversion functions through dynamic imports only when they are needed:
+
+```ts
+// If a Zod schema is detected and typemap isn't installed,
+// the error message directs the user to install it.
+async function convertFromZod(schema: unknown): Promise<TSchema> {
+  let typemap: typeof import("@alkdev/typemap");
+  try {
+    typemap = await import("@alkdev/typemap");
+  } catch (cause) {
+    throw new Error(
+      "Zod schema detected but @alkdev/typemap is not installed. " +
+      "Add it as a peer dependency to use Zod schemas in operation definitions.",
+      { cause },
+    );
+  }
+  // Convert outside the try block so a genuine conversion failure is not
+  // misreported as a missing install.
+  return typemap.TypeBoxFromZod(schema);
+}
+```
+
+This keeps typemap, Zod, and Valibot out of the dependency tree entirely for spoke authors who use TypeBox directly. The `import()` is conditional — if no Zod schemas are encountered, the dynamic import is never executed and the modules are never loaded.
+
+The type detection guards (`IsZod`, `IsValibot`) use the [Standard Schema](https://github.com/standard-schema/standard-schema) `~standard` property with the `vendor` field (`"zod"` or `"valibot"`). This is a community spec implemented by Zod 3.24+ and Valibot 1.0+. The checks are small inline predicates that don't require importing Zod or Valibot themselves.
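+
+A vendor check of this kind might look like the sketch below. `IsZod` and `IsValibot` are the guard names used in this ADR; the `standardVendor` helper and the `StandardProps` type are hypothetical, and the actual guards in `@alkdev/operations/from-typemap` may differ.
+
+```ts
+// Minimal structural view of the Standard Schema `~standard` property.
+interface StandardProps {
+  version: number;
+  vendor: string;
+}
+
+// Hypothetical helper: returns the vendor string, or undefined when the
+// value does not look like a Standard Schema object.
+function standardVendor(schema: unknown): string | undefined {
+  if ((typeof schema !== "object" && typeof schema !== "function") || schema === null) {
+    return undefined;
+  }
+  const props = (schema as { "~standard"?: Partial<StandardProps> })["~standard"];
+  return typeof props?.vendor === "string" ? props.vendor : undefined;
+}
+
+// Neither guard imports Zod or Valibot; both only inspect a property.
+const IsZod = (schema: unknown): boolean => standardVendor(schema) === "zod";
+const IsValibot = (schema: unknown): boolean => standardVendor(schema) === "valibot";
+```
+
+Because the guards only read a property, they can run on every scanned export without pulling either library into the module graph.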
+
+### Hub-side registration stays unchanged
+
+When a spoke sends its operation list over the wire in `hub.register`, the schemas arrive as plain JSON (no `[Kind]` symbols). The hub's registration handler converts them via `FromSchema()` (from `@alkdev/operations/from-schema`):
+
+```ts
+// In hub.register handler
+for (const spec of wireSpecs) {
+  const inputSchema = FromSchema(spec.inputSchema);   // JSON Schema → TSchema
+  const outputSchema = FromSchema(spec.outputSchema);  // JSON Schema → TSchema
+  registry.register({ ...spec, inputSchema, outputSchema });
+}
+```
+
+This is already the pattern used for MCP tools and OpenAPI specs. Spoke registration is the same, whether the original author wrote in TypeBox, Zod, or Valibot — by the time it crosses the wire, it's JSON Schema.
+
+## Consequences
+
+**Positive:**
+- Zero bloat for core or for spoke authors using TypeBox directly
+- Spoke authors get ergonomic schema definition in Zod, Valibot, or Syntax transparently — the scanner converts at registration time
+- Non-TypeScript spokes use JSON Schema natively — no adapter needed at the protocol level
+- Wire format is language-agnostic (JSON Schema)
+- TypeBox remains the single canonical runtime format — no multi-schema validation paths
+- Dynamic imports mean Zod and Valibot are only loaded when schemas in those formats are actually encountered
+
+**Negative:**
+- Zod refinements that have no JSON Schema equivalent (e.g., `.refine()`, `.superRefine()`, `.transform()`) will be lost in conversion. The `TypeBoxFromZod` conversion handles declarative constraints (`.min()`, `.max()`, `.email()`, etc.) but not arbitrary validation functions. Spoke authors using Zod refinements need to understand that only the JSON Schema-representable subset survives the TypeBox conversion.
+- **Type precision loss at the wire boundary**: `FromSchema()` returns a generic `TSchema`, so `Static<typeof schema>` resolves to `unknown` for wire-registered schemas (unlike in-process TypeBox schemas where `Static<>` gives precise types). Runtime validation is preserved, but compile-time type narrowing is lost for hub-side TypeScript code consuming spoke-registered operations. This is an inherent trade-off with wire-mediated schema exchange: the hub can't reconstruct the precise TypeScript type from JSON Schema alone.
+- **Error message fidelity**: When a Zod-derived schema fails validation after TypeBox conversion, error messages reference TypeBox paths and type names, not the original Zod field names. Adding `description` fields to Zod schemas helps, since those survive conversion.
+- The scanner needs a fallback error path for when typemap isn't installed but a Zod/Valibot schema is encountered.
+- typemap is a community-maintained fork of an archived project — carries some maintenance risk, mitigated by it being a thin conversion layer with no runtime presence in the hub.
+
+**Implementation status:** The scanner enhancement is now implemented in `@alkdev/operations`. The `SchemaAdapter` pattern in `@alkdev/operations/from-typemap` handles schema type detection (using Standard Schema `~standard` vendor checks) and dynamic import conversion paths. `@alkdev/typemap` is an optional peer dependency of the spoke package. `FromSchema()` in `@alkdev/operations/from-schema` is hardened with depth limits, size limits, and cycle detection.
+
+## Out of Scope
+
+- Bidirectional Zod ↔ TypeBox sync (conversion is one-way and one-time at scan/registration)
+- Runtime schema migration or schema versioning across re-registrations
+- Auto-generation of TypeScript types from wire schemas (code generation approach, deferred)
+- Converting Zod `.transform()` / `.pipe()` output types (these are runtime-only, not representable in JSON Schema)
+
+## References
+
+- `@alkdev/typemap` npm: `@alkdev/typemap@0.10.1` — fork of `@sinclair/typemap` 0.x
+- [Standard Schema spec](https://github.com/standard-schema/standard-schema) — community interface for type checking libraries
+- Scanner: `@alkdev/operations/scanner` (with `ScannerFS` Deno adapter)
+- `FromSchema()`: `@alkdev/operations/from-schema` — JSON Schema → TypeBox converter
+- `FromOpenAPI()`: `@alkdev/operations/from-openapi` — OpenAPI → operation definitions
+- `SchemaAdapter`: `@alkdev/operations/from-typemap` — Zod/Valibot → TypeBox conversion at registration time
+- Spoke architecture: `docs/architecture/spoke-runner.md`
+- Call protocol: `docs/architecture/call-graph.md`
+- Operations system: `docs/architecture/operations.md`
+- ADR-006: Operation specs as capabilities (definitions vs. registrations)
--- a/docs/decisions/storage-spec-phase1-resolutions.md
+++ b/docs/decisions/storage-spec-phase1-resolutions.md
@@ -0,0 +1,84 @@
+---
+status: stable
+last_updated: 2026-04-22
+---
+
+# Storage Spec Phase 1 Resolutions
+
+Architectural decisions made during the storage spec stabilization planning session on 2026-04-22. These resolutions inform all downstream task execution.
+
+## Decisions
+
+### D1. Cascade Policy Defaults
+
+| Data Category | Default Cascade | Rationale |
+|---|---|---|
+| Audit/traceability data | RESTRICT on NOT NULL FKs; SET NULL on nullable FKs | NOT NULL FKs (ownerId) prevent account deletion. Nullable FKs (keyId, sessionId, orgId) preserve the row while clearing the reference. Both patterns prevent data loss. |
+| Live session data | Nullable FK + SET NULL | Orphaned sessions preserve conversation history for audit/debugging |
+| Ephemeral config (spoke ops, etc.) | CASCADE | Delete with parent — these are runtime artifacts |
+| Transferable ownership | RESTRICT + transfer workflow | Cannot delete account that owns an org; must transfer first |
+
+### D2. Message IDs — Composite Index Approach
+
+**Decision**: Messages table keeps UUIDv4 (`commonCols.id`). Ordering is handled by composite index `(session_id, created_at, id)`.
+
+**Rationale**:
+- ADR-003's sortable IDs remain in effect for `parts` only
+- Composite index provides efficient ordering for messages without requiring sortable IDs
+- Simpler opencode conversation import — opencode uses UUIDv4 message IDs natively
+- ADR-003 is amended to scope sortable IDs to `parts`, not `messages`
+
+**Action**: Amend ADR-003, update sessions.md, update table-reference.md
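+
+The ordering that the composite index supports can be mirrored in memory. This is an illustrative sketch only: `MessageRow`, `compareMessages`, and `messagesInOrder` are hypothetical names, and timestamps are simplified to epoch milliseconds.
+
+```ts
+interface MessageRow {
+  id: string;        // random UUIDv4, carries no ordering information
+  sessionId: string;
+  createdAt: number; // epoch milliseconds, simplified for the sketch
+}
+
+// Mirrors ORDER BY created_at, id within one session: created_at gives the
+// chronological order, and id only breaks ties between equal timestamps.
+function compareMessages(a: MessageRow, b: MessageRow): number {
+  if (a.createdAt !== b.createdAt) return a.createdAt - b.createdAt;
+  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
+}
+
+function messagesInOrder(rows: MessageRow[], sessionId: string): MessageRow[] {
+  return rows.filter((r) => r.sessionId === sessionId).sort(compareMessages);
+}
+```
+
+Note that the `id` tiebreak makes the order deterministic but not meaningful: two messages with an identical `created_at` come back in a stable order that need not match insertion order.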
+
+### D3. Operations Schema — Definitions + Registrations Split (Option A)
+
+**Decision**: Split `operation_specs` into two tables:
+
+1. **`operations`** (definitions): `id`, `namespace`, `name`, `type` (query/mutation/subscription), `inputSchema`, `outputSchema`, `accessControl`, `description`
+2. **`operation_registrations`**: `id`, `operationId → operations.id`, `providerType` (spoke|client), `providerId`, `status` (active|inactive), `registeredAt`, `metadata` (pre-remap identifiers; see the namespace convention below)
+
+**Rationale**:
+- Separation of "what an operation is" from "who provides it right now"
+- Multiple instances of the same client (e.g., 5 opencode instances) share definitions but have separate registrations
+- OpenAPI/MCP spec imports create definitions; spoke/client connection creates registrations
+- On spoke disconnect: registration rows are deactivated (not deleted). Definitions survive.
+- On admin spoke-row deletion: registrations CASCADE (ephemeral config pattern from D1)
+- Call routing: resolve from definition → active registrations → provider
+- More upfront schema work, but avoids a confusing refactor later when multi-instance clients arrive
+
+**Namespace convention**: `operations.namespace`/`name` store **post-remap** identifiers (e.g., `dev.{spokeId}.fs.read`). This ensures uniqueness across multiple providers of the same logical operation. Pre-remap identifiers are stored in `operation_registrations.metadata` for traceability.
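+
+The definition/registration split and the routing path can be sketched in memory as follows. The interfaces model only a subset of the columns listed above, and `resolveProviders` and `deactivateProvider` are hypothetical helper names rather than hub APIs.
+
+```ts
+type ProviderType = "spoke" | "client";
+type RegistrationStatus = "active" | "inactive";
+
+interface OperationDefinition {
+  id: string;
+  namespace: string; // post-remap identifier
+  name: string;
+}
+
+interface OperationRegistration {
+  id: string;
+  operationId: string; // references OperationDefinition.id
+  providerType: ProviderType;
+  providerId: string;
+  status: RegistrationStatus;
+}
+
+// Call routing: definition -> active registrations -> provider ids.
+function resolveProviders(
+  operation: OperationDefinition,
+  registrations: OperationRegistration[],
+): string[] {
+  return registrations
+    .filter((r) => r.operationId === operation.id && r.status === "active")
+    .map((r) => r.providerId);
+}
+
+// On disconnect the provider's registrations are deactivated, not deleted,
+// so the definition and the registration history both survive.
+function deactivateProvider(
+  providerId: string,
+  registrations: OperationRegistration[],
+): OperationRegistration[] {
+  return registrations.map((r): OperationRegistration =>
+    r.providerId === providerId ? { ...r, status: "inactive" } : r
+  );
+}
+```
+
+With two client instances registered against one definition, deactivating one leaves the other routable while both registration rows remain in place.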
+
+**Actions**:
+- Rename `operation_specs` → `operations` across all docs
+- Add `operation_registrations` table spec to spokes.md
+- Update table-reference.md with new FK relationships and cascade policies
+- Update spokes.md disconnect lifecycle to deactivate registrations, not delete
+- Update ADR-006 to reflect the split
+
+### D4. Key Rotation
+
+- **API key rotation**: Handled by keypal library (ADR-004)
+- **Client secret encryption**: Needs a multi-key format specification. The original single `HUB_ENCRYPTION_KEY` env var is insufficient for rotation and is superseded by the two-layer key model in [hub-config.md](../architecture/hub-config.md) and ADR-008 (revised). Task `specify-key-rotation-protocol` addresses this.
+
+### D5. Account Deactivation
+
+**Decision**: Add `status` enum column (`active` | `suspended` | `deactivated`) to accounts table, not a boolean.
+
+**Rationale**: More extensible — allows distinguishing "admin suspended" from "user deactivated" in the future. Consistent with having meaningful status semantics rather than overloaded booleans.
+
+**Action**: Update identity.md accounts table spec, update table-reference.md
+
+### D6. System Account Email Convention
+
+**Decision**: Email reservation for system/LLM accounts is **deployment-configurable**, not hardcoded to any domain.
+
+**Convention**: Deployments MAY reserve an email domain or pattern (e.g., `{model}@llm.example.com` or `{model}@system.example.com`) for non-human accounts. This prevents collision between human and system-generated accounts and enables attribution in git and audit logs.
+
+**Anti-pattern**: Do NOT hardcode any specific domain (e.g., `alk.dev`) in architecture documentation. The convention is generic; the specific domain is a deployment concern.
+
+**Action**: Update identity.md to document the configurable pattern convention, not a specific domain.
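+
+One way a deployment could express the reservation is a configured list of domains checked at account creation. This is a sketch under assumptions: `SystemEmailConfig` and `isSystemEmail` are hypothetical names, and matching on the whole domain is only one possible interpretation of the pattern convention.
+
+```ts
+// Hypothetical deployment config: which email domains are reserved for
+// non-human accounts. Nothing is hardcoded to a specific domain.
+interface SystemEmailConfig {
+  reservedDomains: string[];
+}
+
+function isSystemEmail(email: string, config: SystemEmailConfig): boolean {
+  const at = email.lastIndexOf("@");
+  if (at < 1) return false; // missing "@" or empty local part
+  const domain = email.slice(at + 1).toLowerCase();
+  return config.reservedDomains.some((d) => d.toLowerCase() === domain);
+}
+```
+
+A deployment configured with `reservedDomains: ["llm.example.com"]` would treat `some-model@llm.example.com` as a system account and could refuse human sign-ups on that domain.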
+
+## References
+
+- docs/reviews/storage-architecture-review-2026-04-21.md — source review
+- tasks/architecture/storage/* — downstream implementation tasks