| status | last_updated |
|---|---|
| draft | 2026-04-19 |
Coordination Operations
Overview
Coordination operations manage multi-agent workflows: spawning sessions, inter-session messaging, status tracking, and anomaly detection. These are hub operations in the registry, backed by Postgres and Redis.
Architecture
State: Postgres Tables
Coordination operations use three tables in the hub's storage layer. See storage/coordination.md for the full schema definitions:
- mappings: Worktree/session/coordinator relationships. Links each spawned session to its parent coordinator, spoke, git branch, and assigned task. Status: active, completed, aborted, failed.
- detections: Anomaly detection records. Links detection events to sessions with severity and details.
- tasks + task_dependencies: SDD task definitions and their dependency edges. The coordinator queries task status to determine next work. See storage/tasks.md for the full task storage design.
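As a rough illustration of the mappings relationships described above, a row could be typed as follows. This is only a sketch: the four status values come from this document, while every field name and the isTerminal helper are assumptions. The authoritative schema is in storage/coordination.md.

```typescript
// Hypothetical row shape; field names are assumptions, not the real schema.
type MappingStatus = "active" | "completed" | "aborted" | "failed";

interface MappingRow {
  sessionId: string;        // the spawned session
  parentSessionId: string;  // the coordinator session that spawned it
  spokeId: string;          // assumed name for the spoke reference
  branch: string;           // git branch of the worktree
  taskId: string;           // the assigned task
  status: MappingStatus;
}

// Assumption: every status other than "active" is terminal.
function isTerminal(status: MappingStatus): boolean {
  return status !== "active";
}
```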
Operations
coord.spawn — Create Worktree + Session
- env.git.worktreeCreate({ name, branch }): create worktree (via call protocol)
- env.opencode.sessionCreate({ directory, title }): create session (via call protocol)
- Insert into mappings table (with taskId referencing the assigned task)
- env.opencode.sessionPromptAsync({ sessionId, prompt, agent }): send initial prompt (via call protocol)
- Publish coord.spawned event to Redis
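The spawn sequence above can be sketched as a single function over injected dependencies. This is a minimal sketch, not the hub's implementation: the SpawnDeps interface, the result shapes ({ directory }, { id }), and the extra input fields are assumptions made so the example is self-contained. In the hub these calls would go through the call protocol.

```typescript
// Hypothetical dependency surface standing in for the call protocol,
// Postgres, and Redis. Result shapes are assumptions.
interface SpawnDeps {
  worktreeCreate(input: { name: string; branch: string }): Promise<{ directory: string }>;
  sessionCreate(input: { directory: string; title: string }): Promise<{ id: string }>;
  sessionPromptAsync(input: { sessionId: string; prompt: string; agent: string }): Promise<void>;
  insertMapping(row: { sessionId: string; parentSessionId: string; branch: string; taskId: string; status: "active" }): Promise<void>;
  publish(event: string, payload: Record<string, unknown>): Promise<void>;
}

interface SpawnInput {
  name: string;
  branch: string;
  title: string;
  prompt: string;
  agent: string;
  taskId: string;
  parentSessionId: string;
}

async function coordSpawn(deps: SpawnDeps, input: SpawnInput): Promise<{ sessionId: string }> {
  // Create the worktree, then a session rooted in it.
  const worktree = await deps.worktreeCreate({ name: input.name, branch: input.branch });
  const session = await deps.sessionCreate({ directory: worktree.directory, title: input.title });

  // Persist the mapping before prompting, so the session is tracked from the start.
  await deps.insertMapping({
    sessionId: session.id,
    parentSessionId: input.parentSessionId,
    branch: input.branch,
    taskId: input.taskId,
    status: "active",
  });

  // Send the initial prompt, then announce the spawn.
  await deps.sessionPromptAsync({ sessionId: session.id, prompt: input.prompt, agent: input.agent });
  await deps.publish("coord.spawned", { sessionId: session.id, taskId: input.taskId });
  return { sessionId: session.id };
}
```

Injecting the dependencies keeps the ordering testable without a running hub.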
coord.status — Query Spawned Session Status
- Query mappings table for children of parent session
- For each mapping, env.opencode.sessionStatus({ sessionId }) (via call protocol)
- Return aggregated status
coord.message — Send Message to Spawned Session
- env.opencode.sessionPromptAsync({ sessionId, message, agent }) (via call protocol)
- Publish coord.messaged event to Redis
coord.notify — Notify Coordinator
- Look up mapping to find parentSessionId
- env.opencode.sessionPromptAsync({ sessionId: parentSessionId, message: formattedNotification }) (via call protocol)
- Publish coord.notified event to Redis with level (info/warning/blocking)
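The document does not define how formattedNotification is built. As a purely illustrative sketch, a formatter keyed on the three levels might look like this. The output format and the function name are assumptions.

```typescript
// The three levels come from the coord.notified event described above.
type NotifyLevel = "info" | "warning" | "blocking";

// Hypothetical formatter: prefixes the message with its level and sender so
// the coordinator can tell at a glance which child needs attention.
function formatNotification(level: NotifyLevel, fromSessionId: string, text: string): string {
  return `[${level.toUpperCase()}] session ${fromSessionId}: ${text}`;
}
```

For example, formatNotification("blocking", "s1", "needs review") yields "[BLOCKING] session s1: needs review".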
coord.abort — Abort Spawned Session
- env.opencode.sessionAbort({ sessionId }) (via call protocol)
- Update mapping status to "aborted"
- Publish coord.aborted event to Redis
opencode REST Operations via FromOpenAPI
Each coordination operation that interacts with an opencode container calls through the operations generated by FromOpenAPI from opencode's server spec:
opencode.sessionCreate → POST /session
opencode.sessionPromptAsync → POST /session/{id}/prompt_async
opencode.sessionStatus → GET /session/{id}/status
opencode.sessionAbort → POST /session/{id}/abort
opencode.sessionMessages → GET /session/{id}/messages
These operations are auto-generated and type-safe. No manual HTTP client code. The SSE fix in from_openapi.ts (async generator for SUBSCRIPTION endpoints) makes the streaming endpoints work through our call protocol.
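To make the operation-to-route mapping concrete, the sketch below resolves an operation name and path parameters into a method and URL. It is hand-written for illustration and does not represent what FromOpenAPI actually generates. The ROUTES constant and resolveRoute helper are hypothetical names.

```typescript
// Route table copied from the mapping listed above.
const ROUTES = {
  "opencode.sessionCreate": { method: "POST", path: "/session" },
  "opencode.sessionPromptAsync": { method: "POST", path: "/session/{id}/prompt_async" },
  "opencode.sessionStatus": { method: "GET", path: "/session/{id}/status" },
  "opencode.sessionAbort": { method: "POST", path: "/session/{id}/abort" },
  "opencode.sessionMessages": { method: "GET", path: "/session/{id}/messages" },
} as const;

type OperationName = keyof typeof ROUTES;

// Substitute {name} placeholders with URL-encoded parameter values.
function resolveRoute(op: OperationName, params: Record<string, string> = {}) {
  const { method, path } = ROUTES[op];
  const url = path.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) throw new Error(`missing path parameter: ${key}`);
    return encodeURIComponent(value);
  });
  return { method, url };
}
```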
How Agents Call Coordination Operations
Agents in opencode containers call hub operations via MCP — not through a plugin:
Agent in opencode container
│
├── MCP search({ q: "coord" }) → finds coord.*, hub.list, hub.call, etc.
├── MCP call({ tool: "coord.notify" }) → reports task finished, blocked, or messages coordinator
├── MCP call({ tool: "coord.status" }) → checks on sibling sessions
└── MCP call({ tool: "coord.abort" }) → aborts a stuck session
The hub's MCP endpoint is configured when the opencode container is set up (in opencode.json MCP servers). The agent discovers and calls coordination tools the same way it discovers any other tool — via the MCP search/schema/call pattern. No plugin needed.
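On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with method tools/call and params containing name and arguments. The sketch below builds such a request for the hub's call tool. The tool name "call" and the tool argument follow the diagram above, while the input field and the buildHubCall helper are assumptions.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface McpToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a hub operation name and its input in a call to
// the hub's "call" tool. The "input" argument name is an assumption.
function buildHubCall(id: number, tool: string, input: Record<string, unknown>): McpToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "call", arguments: { tool, input } },
  };
}
```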
Anomaly Detection
The hub monitors sessions via Redis events and runs detection heuristics:
- The hub subscribes to Redis alk:events:message.part.updated:* and alk:events:session.status:* channels
- Maintains in-memory metrics per monitored session (tool errors, malformed tools, last activity, status)
- Periodic check (every 30s) for stalls
- When thresholds exceeded, stores detection in detections table and publishes coord.detection event
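The in-memory metrics described above could be kept in a map keyed by session id, as in the sketch below. The HubEvent shape is a hypothetical normalized form. The real payloads on the alk:events:* channels are not specified here and may differ.

```typescript
interface SessionMetrics {
  toolErrors: number;
  malformedTools: number;
  lastActivity: number; // epoch milliseconds of the most recent event
  status: string;
}

// Hypothetical normalized event derived from the Redis channels.
type HubEvent =
  | { kind: "tool_error"; sessionId: string; at: number }
  | { kind: "malformed_tool"; sessionId: string; at: number }
  | { kind: "status"; sessionId: string; at: number; status: string };

const metricsBySession = new Map<string, SessionMetrics>();

// Fold one event into the per-session counters and refresh lastActivity.
function recordEvent(event: HubEvent): SessionMetrics {
  const current = metricsBySession.get(event.sessionId) ??
    { toolErrors: 0, malformedTools: 0, lastActivity: event.at, status: "unknown" };
  if (event.kind === "tool_error") current.toolErrors += 1;
  else if (event.kind === "malformed_tool") current.malformedTools += 1;
  else current.status = event.status;
  current.lastActivity = event.at;
  metricsBySession.set(event.sessionId, current);
  return current;
}
```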
Detections are queryable via coord.detect:
coord.detect({ sessionIDs?: string[] }) → Array<{ sessionId, issues, severity }>
Detection Heuristics
These heuristics are validated patterns for catching common agent session failures:
| Anomaly Type | Trigger | Default Threshold | Severity |
|---|---|---|---|
| MODEL_DEGRADATION | Malformed tool calls detected | ≥1 malformed tool | High |
| HIGH_ERROR_COUNT | Tool errors accumulating | ≥5 tool errors | Medium |
| SESSION_STALL | No activity while busy | >60s no activity | Medium |
Each heuristic is a simple per-session counter or timer maintained from the Redis event stream. Detection follows a pull model: the coordinator calls coord.detect on demand rather than being interrupted by push notifications.
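The thresholds in the table can be sketched as a pure function over a metrics snapshot. The anomaly types, default thresholds, and severities come from the table. The MetricsSnapshot shape, the "busy" status string, and the function name are assumptions.

```typescript
interface MetricsSnapshot {
  toolErrors: number;
  malformedTools: number;
  lastActivity: number; // epoch milliseconds
  status: string;
}

interface Issue {
  type: "MODEL_DEGRADATION" | "HIGH_ERROR_COUNT" | "SESSION_STALL";
  severity: "high" | "medium";
}

// Default thresholds from the heuristics table.
const THRESHOLDS = { malformedTools: 1, toolErrors: 5, stallMs: 60_000 };

function detectIssues(m: MetricsSnapshot, now: number): Issue[] {
  const issues: Issue[] = [];
  if (m.malformedTools >= THRESHOLDS.malformedTools) {
    issues.push({ type: "MODEL_DEGRADATION", severity: "high" });
  }
  if (m.toolErrors >= THRESHOLDS.toolErrors) {
    issues.push({ type: "HIGH_ERROR_COUNT", severity: "medium" });
  }
  // A stall only counts while the session reports itself as busy.
  if (m.status === "busy" && now - m.lastActivity > THRESHOLDS.stallMs) {
    issues.push({ type: "SESSION_STALL", severity: "medium" });
  }
  return issues;
}
```

Because the function takes the current time as an argument, the 30-second periodic check and an on-demand coord.detect call can share the same logic.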
Provenance
The coordination operations design (spawn/message/notify/abort/detect) and detection heuristics (model degradation, high error count, session stall) are validated patterns from prior work. The alkhub_ts implementation uses the call protocol and Postgres persistence rather than single-process file-based state.