Resolve 22 open questions via 4 ADRs; add dev spoke questions (OQ-61, OQ-62)

ADR-014 (Docker-first deployment): resolves OQ-21, OQ-22, OQ-23, OQ-34, OQ-35, OQ-36, OQ-37. Docker is the primary deployment model. Redis/Postgres in same network. Config via mounted volumes. Single-container restart for v1. Migrations block startup.

ADR-015 (Dev spoke, not opencode): resolves OQ-16, OQ-17, OQ-26, OQ-28, OQ-51, OQ-55. Replaces opencode integration with a compiled dev spoke binary. Hub owns session format. Opencode compat is an import tool, not an architectural constraint. Adds OQ-61 (dev spoke operations) and OQ-62 (dev spoke distribution).

ADR-016 (Hub-own schema): resolves OQ-18, OQ-19. Hub defines its own canonical message/part format. JSONB is implicitly versioned. Flat parts for v1. Compaction is a hub concern (pruning), not opencode's.

ADR-017 (Hub-first roles): resolves OQ-26, OQ-28, OQ-51 (overlapping with ADR-015). Hub is database-first for roles. Seeded by migrations. No file sync needed. hub.createRole for custom roles.

Also narrowed: OQ-04 (service accounts), OQ-05 (git SSO out of scope), OQ-08 (spoke-side concern), OQ-09 (v1: reconnect only), OQ-11 (dev spoke replaces container spoke), OQ-29 (hub-only concern), OQ-41 (gitea ops are optional spoke concern).

Deferred: OQ-52 (memory), OQ-55 (anthropic import).

Net result: 15 resolved, 7 narrowed, 2 deferred out of 62 total. 39 remain open, down from 60 in the original tracker.
2026-05-26 05:40:54 +00:00
parent 3d7f90dec9
commit 2d7f9c11cb
5 changed files with 410 additions and 122 deletions
--- a/docs/decisions/ADR-014-docker-first-deployment.md
+++ b/docs/decisions/ADR-014-docker-first-deployment.md
@@ -0,0 +1,52 @@
+# ADR-014: Docker-first deployment model
+
+- **Status**: Accepted
+- **Date**: 2026-05-26
+- **Deciders**: alkdev
+
+## Context
+
+The hub needs a deployment model. Several architecture questions were left open about how the hub relates to its infrastructure dependencies (Postgres, Redis), how configuration is managed across environments, and what assumptions the architecture can make about the runtime environment.
+
+The previous iteration (alkhub_ts) was designed for a specific production infrastructure with specific host IPs and manual setup. The hub (`@alkdev/hub`) is designed to be a generalized, OSS-first project that others can deploy.
+
+## Decision
+
+The hub assumes Docker as its primary deployment model. This resolves several open questions:
+
+1. **Postgres and Redis run in the same Docker network** — No cross-network TLS needed between hub and its data stores. `PostgresConfig.ssl` can remain `boolean` for v1 (same-network communication doesn't require TLS between containers). TLS termination happens at the reverse proxy (nginx/caddy) for external traffic.
+
+2. **Configuration is encrypted files + Docker secrets** — The `alkhub-config` CLI encrypts config values. CI/CD doesn't need the master key because config files are pre-encrypted and mounted at runtime. The master key is a Docker secret provisioned by the operator. This eliminates the "how does CI/CD get the key?" problem (OQ-21).
+
+3. **Docker Compose handles environment variation** — Dev vs. prod differences are handled via different Docker Compose files and different mounted config files. No overlay config system needed (OQ-22).
+
+4. **Single-container restart is sufficient for v1** — No hot spare or zero-downtime restart needed. Docker's restart policy handles failures. Connection draining and session transfer are Phase 2 concerns (OQ-35).
+
+5. **Startup observability via `/health` + Docker logs** — The `/health` endpoint with step-level progress and structured JSON logging to stdout is sufficient. Docker's `HEALTHCHECK` directive and log aggregation handle the rest. No pub/sub startup events needed (OQ-36).
+
+6. **Migrations block startup** — Single-container deployment means the hub must have a consistent schema before serving. Background migrations require schema version negotiation that adds complexity without benefit in a single-container model (OQ-34).
+
+7. **Redis is hub-internal** — Redis runs in the same Docker network as the hub. Spokes connect via WebSocket (not Redis), so Redis topology is a deployment detail, not an architectural concern (OQ-37).
+
+## Consequences
+
+**Positive**: Deployment model is explicit and consistent. Config, networking, and observability all assume Docker conventions. Open questions about SSL between hub and Postgres, CI/CD key management, config overlays, and zero-downtime restarts are resolved by the Docker model.
+
+**Negative**: Docker is a hard dependency for production deployment. Developers who want to run the hub outside Docker must provide their own Postgres and Redis, and handle TLS and config management themselves. This is acceptable — the Docker model is the documented production path.
+
+### Open questions resolved by this decision
+
+| OQ | Resolution |
+|----|-----------|
+| OQ-21 | CI/CD doesn't need the master key — config files are pre-encrypted and mounted at runtime; master key is a Docker secret |
+| OQ-22 | No overlay config needed — Docker Compose handles environment variation via mounted volumes |
+| OQ-34 | Migrations block startup; the single-container model requires a consistent schema before serving |
+| OQ-35 | Single-container restart is v1 — Docker restart policy handles failures; zero-downtime is Phase 2 |
+| OQ-36 | `/health` + Docker logs is sufficient — no pub/sub startup events needed |
+| OQ-37 | Redis is hub-internal in Docker — same network, not an architectural concern |
+
+### Open questions narrowed by this decision
+
+| OQ | Narrowing |
+|----|-----------|
+| OQ-23 | `PostgresConfig.ssl: boolean` is sufficient for same-network Docker deployment. TLS between containers is a deployment concern, not an app config concern. |
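
Decisions 5 and 6 of ADR-014 above (step-level `/health` progress, migrations blocking startup) can be illustrated with a minimal sketch. Everything named here is hypothetical: `runStartup`, `healthReport`, the step names, and the report shape are assumptions for illustration, not the hub's actual startup code, and the steps are synchronous only to keep the example short.

```typescript
// Hypothetical sketch: names and report shape are illustrative assumptions.
type StepState = "pending" | "running" | "done" | "failed";

interface StepStatus {
  name: string;
  state: StepState;
  error?: string;
}

interface HealthReport {
  status: "starting" | "ready" | "failed";
  steps: StepStatus[];
}

interface StartupStep {
  name: string;
  run: () => void; // synchronous for brevity; real steps would likely be async
}

// Derive a /health-style payload from the current step states.
function healthReport(steps: StepStatus[]): HealthReport {
  const failed = steps.some((s) => s.state === "failed");
  const allDone = steps.every((s) => s.state === "done");
  const status = failed ? "failed" : allDone ? "ready" : "starting";
  return { status, steps: steps.map((s) => ({ ...s })) };
}

// Run steps strictly in order. After the first failure, remaining steps are
// left "pending", so a failed migration means the serve step never runs.
function runStartup(steps: StartupStep[], log: (line: string) => void): HealthReport {
  const statuses: StepStatus[] = [];
  let blocked = false;
  for (const step of steps) {
    if (blocked) {
      statuses.push({ name: step.name, state: "pending" });
      continue;
    }
    const status: StepStatus = { name: step.name, state: "running" };
    try {
      step.run();
      status.state = "done";
    } catch (err) {
      status.state = "failed";
      status.error = err instanceof Error ? err.message : String(err);
      blocked = true;
    }
    statuses.push(status);
    // One structured JSON line per executed step, suitable for stdout.
    log(JSON.stringify({ event: "startup.step", ...status }));
  }
  return healthReport(statuses);
}
```

In this sketch the report is `ready` only when every step is `done`, which is how a Docker `HEALTHCHECK` probe could distinguish a hub that is still migrating from one that is serving.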
--- a/docs/decisions/ADR-015-dev-spoke-not-opencode.md
+++ b/docs/decisions/ADR-015-dev-spoke-not-opencode.md
@@ -0,0 +1,87 @@
+# ADR-015: Dev spoke instead of opencode integration
+
+- **Status**: Accepted
+- **Date**: 2026-05-26
+- **Deciders**: alkdev
+
+## Context
+
+The original hub architecture included tight integration with opencode (an external tool) for several capabilities:
+
+- Agent sessions were modeled on opencode's session/message format
+- Coordination operations called `opencode.sessionCreate`, `opencode.sessionPromptAsync`, etc.
+- Agent roles were defined in `.opencode/agents/*.md` files
+- The `ai-sdk-provider-opencode-sdk` package wrapped opencode as an AI SDK model
+- Opencode's SQLite database was the reference format for message storage
+
+This created a mismatch: the hub's architecture was shaped by opencode's data model and conventions, yet the hub is a generalized platform whose other users won't be running opencode. The integration surface was large (sessions, messages, roles, tools, git operations) and the conceptual overhead was significant (one had to understand opencode's model to understand the hub).
+
+Meanwhile, the hub already has the spoke model for extending capabilities. A "dev env spoke" that exposes bash, file operations, git, and other development tools would provide the same functionality as opencode's tool suite, but connected to the hub via the standard call protocol over WebSocket — just like any other spoke.
+
+## Decision
+
+Replace the opencode integration with a purpose-built **dev spoke**. The dev spoke is a compiled Deno binary that connects to the hub via WebSocket and exposes development operations (bash, file ops, git, web search) as hub operations. This sits alongside the existing spoke types (client spoke, GPU compute spoke) as just another spoke.
+
+The hub owns sessions, messages, and roles in its own format. Opencode is no longer a core dependency or a shaping force on the hub's architecture. If opencode compatibility is needed in the future, it comes through an **opencode-spoke** — an optional spoke that wraps an opencode instance and exposes its operations through the standard call protocol, just like any other spoke.
+
+### What changes
+
+- **Session model**: Hub defines its own canonical message/part format (based on AI SDK `UIMessage` + parts), not opencode's format. The hub's format stays close to opencode's for import compatibility, but this is a compat concern, not an architectural constraint.
+
+- **Agent roles**: Hub defines roles in the database (see ADR-017). No `.opencode/agents/*.md` file import is needed. The hub is database-first from day one.
+
+- **Coordination operations**: `coord.spawn`, `coord.message`, etc. no longer call `opencode.*` operations. They call hub operations that create sessions, send messages, and manage worktrees — implemented by the dev spoke or hub-native logic.
+
+- **Dev tools**: bash, file operations, git commands come from the dev spoke, not from opencode. The dev spoke is a small, focused binary that does one thing well.
+
+- **Session hosting**: Agent sessions run via the AI SDK directly in the hub (for architect, decomposer, etc.) or via the dev spoke (for implementation specialist). No opencode container required.
+
+### What doesn't change
+
+- The call protocol, call graph, and operation graph are unchanged
+- WebSocket spoke transport is unchanged (dev spoke is just another spoke)
+- Session/message storage in Postgres is unchanged (just simpler without opencode import compat shaping the schema)
+- Auth model, config system, startup sequence — all unchanged
+- `hub.list`/`hub.search`/`hub.schema`/`hub.call` — unchanged
+- The hub never "spoke opencode" — it spoke the call protocol
+
+### Future: opencode-spoke
+
+An optional opencode-spoke can be built later that:
+- Connects to the hub via WebSocket (standard spoke protocol)
+- Wraps an opencode instance
+- Exposes opencode's operations through FromOpenAPI (opencode's REST spec → typed operations)
+- Imports opencode's SQLite session data into the hub's Postgres
+
+This is a spoke-level concern, not a hub architecture concern.
+
+## Consequences
+
+**Positive**: The hub is self-contained and doesn't depend on opencode's data model, conventions, or implementation. The dev spoke is a bounded, implementable component. The hub's session format, role model, and coordination operations are designed for the hub's needs, not adapted from another project. Opencode users can still integrate via an optional spoke.
+
+**Negative**: The dev spoke needs to be built. It replaces the "install opencode and connect it" model with "download a binary and connect it." The dev spoke needs its own implementation of bash, file ops, git, and web search — these are well-understood operations but still need implementation. The `ai-sdk-provider-opencode-sdk` package is no longer a dependency.
+
+### Open questions resolved by this decision
+
+| OQ | Resolution |
+|----|-----------|
+| OQ-16 | Hub defines its own canonical message/part format. Opencode's format is an import concern, not an architectural constraint. The format stays close for compat but the hub owns it. |
+| OQ-17 | Compaction is a hub concern (pruning/summarization for long sessions), not an opencode "compaction agent" concern. For v1, full message history is served. |
+| OQ-26 | No `roles.sync` from `.opencode/agents/*.md` needed. Hub is database-first for roles (see ADR-017). |
+| OQ-28 | No `Agent.generate()` support needed. The hub creates sessions with DB-defined roles. |
+| OQ-29 | Per-session role switching is a hub-only concern. No opencode agent model to reconcile. |
+| OQ-51 | No file→DB role migration needed. Hub is database-first from day one. |
+| OQ-55 | Anthropic conversation import is deferred. Not shipped with the codebase. |
+
+### Open questions narrowed by this decision
+
+| OQ | Narrowing |
+|----|-----------|
+| OQ-11 | Container spoke is now "dev spoke" — a compiled binary, not an opencode container. Simpler scope, no opencode integration needed. |
+
+### New open questions
+
+| ID | Question |
+|----|----------|
+| OQ-61 | What operations does the dev spoke expose? |
+| OQ-62 | How is the dev spoke distributed and configured? |
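
The idea in ADR-015 above, that a dev spoke is "just another spoke" exposing named operations over the call protocol, can be sketched as a small operation registry. The ADR does not specify the wire format or the operation set (OQ-61 is still open), so `CallRequest`, `CallResponse`, `SpokeRegistry`, and the `dev.readFile` operation are all hypothetical, and an in-memory map stands in for the file system.

```typescript
// Hypothetical wire shapes: the call protocol's real message format is not
// given in the ADR, so these are assumptions for illustration only.
interface CallRequest {
  id: string;
  operation: string;
  input: unknown;
}

type CallResponse =
  | { id: string; ok: true; output: unknown }
  | { id: string; ok: false; error: string };

type Handler = (input: unknown) => unknown;

class SpokeRegistry {
  private handlers = new Map<string, Handler>();

  register(name: string, handler: Handler): void {
    if (this.handlers.has(name)) throw new Error(`duplicate operation: ${name}`);
    this.handlers.set(name, handler);
  }

  // Sorted operation names, e.g. for a spoke to report what it offers.
  list(): string[] {
    return Array.from(this.handlers.keys()).sort();
  }

  // Route one request to its handler and wrap the result or the error.
  dispatch(req: CallRequest): CallResponse {
    const handler = this.handlers.get(req.operation);
    if (!handler) {
      return { id: req.id, ok: false, error: `unknown operation: ${req.operation}` };
    }
    try {
      return { id: req.id, ok: true, output: handler(req.input) };
    } catch (err) {
      return { id: req.id, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

// In-memory stand-in for file operations (hypothetical operation name).
const files = new Map<string, string>([["README.md", "# demo"]]);
const spoke = new SpokeRegistry();
spoke.register("dev.readFile", (input) => {
  const path = (input as { path?: unknown }).path;
  if (typeof path !== "string") throw new Error("path must be a string");
  const content = files.get(path);
  if (content === undefined) throw new Error(`not found: ${path}`);
  return { content };
});
```

A real spoke would sit behind a WebSocket and call `dispatch` for each incoming message; keeping dispatch free of transport concerns is a design choice of this sketch, not something the ADR prescribes.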
--- a/docs/decisions/ADR-016-hub-own-schema.md
+++ b/docs/decisions/ADR-016-hub-own-schema.md
@@ -0,0 +1,45 @@
+# ADR-016: Hub-own schema, opencode compat via import
+
+- **Status**: Accepted
+- **Date**: 2026-05-26
+- **Deciders**: alkdev
+
+## Context
+
+The hub's session/message/part storage was originally designed to closely mirror opencode's drizzle+sqlite schema, so that importing sessions from opencode would be straightforward. This created a tension: the hub's data model was being shaped by an external project's conventions (opencode uses a message tree format with parent/child parts, a `version` column, a `compaction` agent type, etc.) rather than by the hub's own needs (AI SDK `UIMessage` format, hub-specific session metadata, role-based permissions).
+
+With ADR-015 (dev spoke instead of opencode integration), opencode is no longer a core dependency. The hub needs its own canonical data model designed for the hub's needs, with opencode import as a compatibility tool, not an architectural driver.
+
+## Decision
+
+The hub owns its own session, message, and part schema. The schema is designed for the hub's needs first:
+
+1. **AI SDK `UIMessage` compatibility** is the primary design constraint — direct agents (architect, decomposer, etc.) produce `UIMessage` format, and the hub's API assembles `messages` + `parts` into `UIMessage` for consumption.
+
+2. **The hub's `data` JSONB columns** are implicitly versioned by their TypeBox schema evolution. No separate `version` column is needed — each schema change is documented in the migration history, and the TypeBox schemas in the codebase are the source of truth for what each `data` shape contains at any point in time.
+
+3. **Compaction and pruning** are hub concerns, not opencode concerns. The hub may need message pruning for API response size, but this is different from opencode's LLM-driven compaction (summarizing old messages to compress context). Hub pruning is server-side truncation of old messages; opencode compaction is an LLM feature that doesn't belong in the hub's core architecture.
+
+4. **The hub's part types are a subset** of what opencode defines. The hub adds types as it implements features, not because opencode has them. `text`, `tool`, `reasoning` are core; `step-start`, `step-finish`, `snapshot`, `patch` may be added as needed.
+
+5. **Opencode import** remains possible through an import tool that maps opencode's sqlite schema to the hub's postgres schema. This is a compat tool, not a core feature. The import tool maps opencode's `agent` field to the hub's `roleName`, opencode's message tree to the hub's flat message+parts model, etc.
+
+## Consequences
+
+**Positive**: The hub's schema is self-determined and evolves based on the hub's needs, not another project's conventions. AI SDK compatibility is a natural fit. Schema versioning is implicit (TypeBox schemas + migrations). Import from opencode is a tool, not an architectural constraint.
+
+**Negative**: An import tool for opencode's sqlite format will need to be built separately. It's not a priority for v1 and may not be shipped with the codebase. The hub's part type set starts minimal and grows organically, which means early consumers may not find all the part types they expect.
+
+### Open questions resolved by this decision
+
+| OQ | Resolution |
+|----|-----------|
+| OQ-16 | Hub defines its own canonical format. AI SDK `UIMessage` + parts is the primary design constraint. Opencode import is a compat tool. |
+| OQ-17 | Compaction is an opencode concept. The hub needs pruning (server-side truncation) at most. For v1, full history is served. |
+| OQ-18 | JSONB `data` columns are implicitly versioned by TypeBox schema evolution. No `version` column needed. |
+
+### Open questions resolved by previous decisions (confirmed)
+
+| OQ | Resolution |
+|----|-----------|
+| OQ-19 | Flat parts with `messageId` FK for v1. No `parentId` needed unless nesting becomes necessary in the hub's own use cases. |
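
ADR-016 above describes the hub's API assembling flat `messages` and `parts` rows (linked by a `messageId` FK) into `UIMessage` objects. A minimal sketch of that assembly follows. `UIMessageLike` is a simplified stand-in rather than the AI SDK's actual type, and every column name other than `messageId` (`createdAt`, `position`, `data`, and so on) is an assumption made for the example.

```typescript
// Simplified stand-ins; only `messageId` and the three core part types
// (`text`, `tool`, `reasoning`) come from the ADR. The rest is assumed.
type PartType = "text" | "tool" | "reasoning";
type Role = "user" | "assistant" | "system";

interface MessageRow {
  id: string;
  role: Role;
  createdAt: number; // assumed ordering column
}

interface PartRow {
  id: string;
  messageId: string; // FK to MessageRow.id (flat parts, no parentId)
  position: number; // assumed ordering column within a message
  type: PartType;
  data: Record<string, unknown>; // JSONB payload
}

interface UIMessageLike {
  id: string;
  role: Role;
  parts: Array<{ type: PartType; [key: string]: unknown }>;
}

function assembleMessages(messages: MessageRow[], parts: PartRow[]): UIMessageLike[] {
  // Group parts by their owning message in a single pass.
  const byMessage = new Map<string, PartRow[]>();
  for (const part of parts) {
    const bucket = byMessage.get(part.messageId) ?? [];
    bucket.push(part);
    byMessage.set(part.messageId, bucket);
  }
  // Copy before sorting so the caller's array is not mutated.
  return [...messages]
    .sort((a, b) => a.createdAt - b.createdAt)
    .map((m) => ({
      id: m.id,
      role: m.role,
      parts: (byMessage.get(m.id) ?? [])
        .sort((a, b) => a.position - b.position)
        // `type` is written last so the column wins over any `type` key in data.
        .map((p) => ({ ...p.data, type: p.type })),
    }));
}
```

Because parts are flat, assembly is one grouping pass plus two sorts; no tree walk is needed unless nesting is introduced later.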
--- a/docs/decisions/ADR-017-hub-first-roles.md
+++ b/docs/decisions/ADR-017-hub-first-roles.md
@@ -0,0 +1,79 @@
+# ADR-017: Hub-first role definitions (database, not files)
+
+- **Status**: Accepted
+- **Date**: 2026-05-26
+- **Deciders**: alkdev
+
+## Context
+
+The original architecture (from agent-roles.md) defined a three-phase role system:
+
+1. **Phase 1 (current)**: Roles defined in `.opencode/agents/*.md` markdown files
+2. **Phase 2**: A `roles.sync` operation that ingests `.opencode/agents/*.md` files into a `roles` table
+3. **Phase 3**: Database-authoritative roles, with markdown files only for version control editing
+
+This phased approach was designed around opencode's convention of file-based agent definitions. With ADR-015 (dev spoke instead of opencode integration), opencode is no longer a core dependency. The hub manages its own roles.
+
+## Decision
+
+The hub is database-first for roles from day one. There is no Phase 1 or Phase 2 transition from files to database. Roles are defined in the `roles` table in Postgres, seeded by migrations for the built-in roles (architect, decomposer, coordinator, implementation-specialist, code-reviewer, architecture-reviewer, research-specialist, poc-specialist).
+
+### Built-in roles
+
+The hub's migration files seed the standard SDD roles:
+
+| Role | Mode | Key Permission Pattern |
+|------|------|----------------------|
+| architect | primary | read, write, webSearch — no bash |
+| architecture-reviewer | subagent | read, grep — read-only |
+| code-reviewer | subagent | read, grep, bash (read-only) |
+| coordinator | primary | read, worktree_*, bash (limited) — no implementation |
+| decomposer | primary | read, taskgraph — no bash |
+| implementation-specialist | primary | read, write, edit, bash, webSearch — scoped to worktree |
+| poc-specialist | primary | read, write, edit, bash, webSearch — scoped to research worktree |
+| research-specialist | subagent | webSearch, read, write — no bash |
+
+### Role API
+
+Roles are managed via hub operations:
+
+- `hub.listRoles` — list available roles
+- `hub.getRole` — get role definition by name
+- `hub.createRole` — create a custom role (requires admin scope)
+- `hub.updateRole` — update role definition (requires admin scope)
+
+Custom roles can be created at runtime via `hub.createRole`. No file sync is needed.
+
+### Opencode agent mapping
+
+When importing opencode sessions (if an opencode-spoke is built later), the mapping from opencode's `agent` field to the hub's `roleName` is:
+
+| Opencode `agent` | Hub `roleName` |
+|-------------------|----------------|
+| `"build"` | `"implementation-specialist"` |
+| `"plan"` | `"decomposer"` |
+| `"general"` | `"coordinator"` |
+| `"explore"` | `"research-specialist"` |
+
+This mapping is a compat concern in the import tool, not a core architecture concern.
+
+## Consequences
+
+**Positive**: No file-based role sync system to build. No `.opencode/agents/` directory dependency. Roles are queryable, type-safe, and managed through the hub's operation interface. Custom roles can be created programmatically. The hub doesn't need a `roles.sync` operation.
+
+**Negative**: Role definitions can't be easily version-controlled in markdown files alongside the code. Role creation requires the hub API (or seeding via migrations). If role editing in a text editor is desired later, a `roles.export`/`roles.import` operation can be built, but this is not a v1 concern.
+
+### Open questions resolved by this decision
+
+| OQ | Resolution |
+|----|-----------|
+| OQ-26 | No `roles.sync` from `.opencode/agents/*.md` needed. Hub is database-first. Role definitions are seeded by migrations. |
+| OQ-28 | No `Agent.generate()` equivalent needed for v1. Custom roles are created via `hub.createRole`. |
+| OQ-51 | No file→DB migration needed. Hub starts in the database-first state. |
+
+### Open questions narrowed by this decision
+
+| OQ | Narrowing |
+|----|-----------|
+| OQ-04 | Service account provisioning is now a generalized question: `hub.createAccount` operation for programmatic creation. For v1, manual creation with keypal CLI is sufficient. LLM-specific email conventions (like `glm-5.1@alk.dev`) are deployment-specific, not core architecture. |
+| OQ-05 | SSO with Gitea is out of scope for a generalized hub. Git provider integration (Gitea, GitHub, etc.) is a spoke concern via operations, not through SSO. For v1, Gitea is accessed via the dev spoke's git operations, not via shared auth. |
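
The role API and the opencode agent mapping from ADR-017 above can be sketched with an in-memory store. The mapping values and the two seeded roles are copied from the ADR's tables; the `RoleStore` class, the `Role` shape, the `"admin"` scope string, and the choice to return `undefined` for unmapped agents are assumptions, since the ADR does not define them.

```typescript
type RoleMode = "primary" | "subagent";

interface Role {
  name: string;
  mode: RoleMode;
  permissions: string[];
  builtIn: boolean; // assumed flag separating seeded roles from custom ones
}

// Mapping copied from the ADR-017 table (opencode `agent` to hub `roleName`).
const OPENCODE_AGENT_TO_ROLE: Record<string, string> = {
  build: "implementation-specialist",
  plan: "decomposer",
  general: "coordinator",
  explore: "research-specialist",
};

// Returns undefined for unmapped agents; the ADR does not say what an
// import tool should do in that case, so this is an assumption.
function mapOpencodeAgent(agent: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(OPENCODE_AGENT_TO_ROLE, agent)
    ? OPENCODE_AGENT_TO_ROLE[agent]
    : undefined;
}

// In-memory stand-in for the `roles` table and the hub.*Role operations.
class RoleStore {
  private roles = new Map<string, Role>();

  constructor(seed: Role[]) {
    for (const role of seed) this.roles.set(role.name, role);
  }

  listRoles(): string[] {
    return Array.from(this.roles.keys()).sort();
  }

  getRole(name: string): Role | undefined {
    return this.roles.get(name);
  }

  // Custom role creation requires admin scope, as stated in the ADR.
  createRole(callerScopes: string[], role: Omit<Role, "builtIn">): Role {
    if (callerScopes.indexOf("admin") === -1) throw new Error("admin scope required");
    if (this.roles.has(role.name)) throw new Error(`role exists: ${role.name}`);
    const created: Role = { ...role, builtIn: false };
    this.roles.set(created.name, created);
    return created;
  }
}

// Two of the built-in roles, with permissions taken from the ADR table.
const store = new RoleStore([
  { name: "architect", mode: "primary", permissions: ["read", "write", "webSearch"], builtIn: true },
  { name: "decomposer", mode: "primary", permissions: ["read", "taskgraph"], builtIn: true },
]);
```

In the hub itself the seed would come from migrations rather than a constructor argument; the constructor is only a stand-in for that seeding step.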