---
status: draft
last_updated: 2026-04-19
---

# Coordination Operations

## Overview

Coordination operations manage multi-agent workflows: spawning sessions, inter-session messaging, status tracking, and anomaly detection. They are registered as hub operations and backed by Postgres and Redis.

## Architecture

### State: Postgres Tables

Coordination operations use the following tables in the hub's storage layer. See `storage/coordination.md` for the full schema definitions:

- **`mappings`** — Worktree/session/coordinator relationships. Links each spawned session to its parent coordinator, spoke, git branch, and assigned task. Status: `active`, `completed`, `aborted`, `failed`.
- **`detections`** — Anomaly detection records. Links detection events to sessions, with severity and details.
- **`tasks`** + **`task_dependencies`** — SDD task definitions and their dependency edges. The coordinator queries task status to determine the next work item. See `storage/tasks.md` for the full task storage design.

### Operations

#### `coord.spawn` — Create Worktree + Session

1. `env.git.worktreeCreate({ name, branch })` — create worktree (via call protocol)
2. `env.opencode.sessionCreate({ directory, title })` — create session (via call protocol)
3. Insert into `mappings` table (with `taskId` referencing the assigned task)
4. `env.opencode.sessionPromptAsync({ sessionId, prompt, agent })` — send initial prompt (via call protocol)
5. Publish `coord.spawned` event to Redis

#### `coord.status` — Query Spawned Session Status

1. Query `mappings` table for children of the parent session
2. For each mapping, call `env.opencode.sessionStatus({ sessionId })` (via call protocol)
3. Return aggregated status

#### `coord.message` — Send Message to Spawned Session

1. `env.opencode.sessionPromptAsync({ sessionId, message, agent })` (via call protocol)
2. Publish `coord.messaged` event to Redis

#### `coord.notify` — Notify Coordinator

1. Look up the mapping to find `parentSessionId`
2. `env.opencode.sessionPromptAsync({ sessionId: parentSessionId, message: formattedNotification })` (via call protocol)
3. Publish `coord.notified` event to Redis with its level (info/warning/blocking)

#### `coord.abort` — Abort Spawned Session

1. `env.opencode.sessionAbort({ sessionId })` (via call protocol)
2. Update mapping status to `aborted`
3. Publish `coord.aborted` event to Redis

### opencode REST Operations via FromOpenAPI

Each coordination operation that interacts with an opencode container calls the operations that `FromOpenAPI` generates from opencode's server spec:

```
opencode.sessionCreate       → POST /session
opencode.sessionPromptAsync  → POST /session/{id}/prompt_async
opencode.sessionStatus       → GET  /session/{id}/status
opencode.sessionAbort        → POST /session/{id}/abort
opencode.sessionMessages     → GET  /session/{id}/messages
```

These operations are auto-generated and type-safe, so no manual HTTP client code is needed. The SSE fix in `from_openapi.ts` (an async generator for SUBSCRIPTION endpoints) makes the streaming endpoints work through the call protocol.

### How Agents Call Coordination Operations

Agents in opencode containers call hub operations via MCP, not through a plugin:

```
Agent in opencode container
│
├── MCP search({ q: "coord" })          → finds coord.*, hub.list, hub.call, etc.
├── MCP call({ tool: "coord.notify" })  → reports that a task finished or is blocked, or messages the coordinator
├── MCP call({ tool: "coord.status" })  → checks on sibling sessions
└── MCP call({ tool: "coord.abort" })   → aborts a stuck session
```

The hub's MCP endpoint is configured when the opencode container is set up (as an entry in the MCP servers section of `opencode.json`). The agent discovers and calls coordination tools the same way it discovers any other tool, via the MCP `search`/`schema`/`call` pattern, so no plugin is needed.

## Anomaly Detection

The hub monitors sessions via Redis events and runs detection heuristics:

1. The hub pattern-subscribes (Redis `PSUBSCRIBE`) to the `alk:events:message.part.updated:*` and `alk:events:session.status:*` channels.
2. It maintains in-memory metrics per monitored session (tool errors, malformed tool calls, last activity, status).
3. A periodic check (every 30s) looks for stalls.
4. When a threshold is crossed, it stores a detection in the `detections` table and publishes a `coord.detection` event.

Detections are queryable via `coord.detect`:

```
coord.detect({ sessionIDs?: string[] }) → Array<{ sessionId, issues, severity }>
```

### Detection Heuristics

These heuristics are validated patterns for catching common agent session failures:

| Anomaly Type | Trigger | Default Threshold | Severity |
|--------------|---------|-------------------|----------|
| MODEL_DEGRADATION | Malformed tool calls detected | ≥1 malformed tool call | High |
| HIGH_ERROR_COUNT | Tool errors accumulating | ≥5 tool errors | Medium |
| SESSION_STALL | No activity while busy | >60s without activity | Medium |

The heuristics are simple per-session counters and timers maintained from the Redis event stream. Because stalls are evaluated by the 30s periodic check, a SESSION_STALL may be recorded up to roughly 30s after a session crosses the 60s threshold. Detection follows a pull model: the coordinator calls `coord.detect` on demand rather than being interrupted by push notifications.

## Provenance

The coordination operations design (spawn/message/notify/abort/detect) and the detection heuristics (model degradation, high error count, session stall) are validated patterns from prior work. The alkhub_ts implementation uses the call protocol and Postgres persistence rather than single-process, file-based state.