hub/docs/architecture/coordination.md
glm-5.1 2b63cda1c7 Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts
2026-05-25 10:56:32 +00:00

status: draft
last_updated: 2026-04-19

Coordination Operations

Overview

Coordination operations manage multi-agent workflows: spawning sessions, inter-session messaging, status tracking, and anomaly detection. These are hub operations in the registry, backed by Postgres and Redis.

Architecture

State: Postgres Tables

Coordination operations use the following tables in the hub's storage layer (see storage/coordination.md for the full schema definitions):

  • mappings: Worktree/session/coordinator relationships. Links each spawned session to its parent coordinator, spoke, git branch, and assigned task. Status: active, completed, aborted, failed.
  • detections: Anomaly detection records. Links detection events to sessions, with severity and details.
  • tasks + task_dependencies: SDD task definitions and their dependency edges. The coordinator queries task status to determine the next piece of work. See storage/tasks.md for the full task storage design.
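As a rough illustration of the mappings table described above, the row shape might look like the sketch below. The field names (`sessionId`, `parentSessionId`, `spokeId`, `branch`, `taskId`) and the `activeChildren` helper are assumptions made for this example; only the four status values come from this document, and the authoritative schema is in storage/coordination.md.

```typescript
// The four status values listed for the mappings table.
type MappingStatus = "active" | "completed" | "aborted" | "failed";

// Hypothetical row shape: field names are illustrative, not the real columns.
interface Mapping {
  sessionId: string;
  parentSessionId: string; // the coordinator session that spawned this one
  spokeId: string;
  branch: string;
  taskId: string | null; // assigned task, if any
  status: MappingStatus;
}

// Select the still-running children of one coordinator session.
function activeChildren(mappings: Mapping[], parentSessionId: string): Mapping[] {
  return mappings.filter(
    (m) => m.parentSessionId === parentSessionId && m.status === "active",
  );
}
```

In the hub this filter would be a Postgres query rather than an in-memory scan; the sketch only shows which relationships the table is meant to answer.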

Operations

coord.spawn — Create Worktree + Session

  1. env.git.worktreeCreate({ name, branch }) — create worktree (via call protocol)
  2. env.opencode.sessionCreate({ directory, title }) — create session (via call protocol)
  3. Insert into mappings table (with taskId referencing the assigned task)
  4. env.opencode.sessionPromptAsync({ sessionId, prompt, agent }) — send initial prompt (via call protocol)
  5. Publish coord.spawned event to Redis
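The five steps above can be sketched as a single async function. This is a minimal sketch under stated assumptions: `SpawnDeps`, `SpawnInput`, `MappingRow`, and the return shapes (`{ directory }`, `{ id }`) are hypothetical stand-ins for the call-protocol operations, the storage layer, and the Redis publisher, which are not specified here.

```typescript
// Hypothetical dependency surface. In the hub these would be call-protocol
// operations plus storage and Redis helpers; all return shapes are assumptions.
interface SpawnDeps {
  worktreeCreate(args: { name: string; branch: string }): Promise<{ directory: string }>;
  sessionCreate(args: { directory: string; title: string }): Promise<{ id: string }>;
  insertMapping(row: MappingRow): Promise<void>;
  sessionPromptAsync(args: { sessionId: string; prompt: string; agent: string }): Promise<void>;
  publish(event: string, payload: Record<string, unknown>): Promise<void>;
}

// Illustrative subset of a mappings row.
interface MappingRow {
  sessionId: string;
  parentSessionId: string;
  branch: string;
  taskId: string;
  status: "active";
}

interface SpawnInput {
  parentSessionId: string;
  name: string;
  branch: string;
  title: string;
  taskId: string;
  prompt: string;
  agent: string;
}

async function coordSpawn(deps: SpawnDeps, input: SpawnInput): Promise<{ sessionId: string }> {
  // 1. Create the worktree.
  const worktree = await deps.worktreeCreate({ name: input.name, branch: input.branch });
  // 2. Create a session rooted in that worktree.
  const session = await deps.sessionCreate({ directory: worktree.directory, title: input.title });
  // 3. Record the mapping, including the assigned task.
  await deps.insertMapping({
    sessionId: session.id,
    parentSessionId: input.parentSessionId,
    branch: input.branch,
    taskId: input.taskId,
    status: "active",
  });
  // 4. Send the initial prompt.
  await deps.sessionPromptAsync({ sessionId: session.id, prompt: input.prompt, agent: input.agent });
  // 5. Announce the spawn.
  await deps.publish("coord.spawned", { sessionId: session.id, taskId: input.taskId });
  return { sessionId: session.id };
}
```

Passing the dependencies in as an object keeps the sketch testable with fakes; error handling and rollback of a half-finished spawn are deliberately left out.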

coord.status — Query Spawned Session Status

  1. Query mappings table for children of parent session
  2. For each mapping, env.opencode.sessionStatus({ sessionId }) (via call protocol)
  3. Return aggregated status
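A possible shape for the aggregation is sketched below. The `StatusDeps` interface, the `state` field on the live status, and the `ChildStatus` result shape are assumptions for illustration; this document does not define what the aggregated status contains.

```typescript
// Hypothetical dependencies: a mappings query and the sessionStatus operation.
interface StatusDeps {
  childrenOf(parentSessionId: string): Promise<Array<{ sessionId: string; status: string }>>;
  sessionStatus(args: { sessionId: string }): Promise<{ state: string }>;
}

// Assumed result shape: stored mapping status next to the live session state.
interface ChildStatus {
  sessionId: string;
  mappingStatus: string;
  sessionState: string;
}

async function coordStatus(deps: StatusDeps, parentSessionId: string): Promise<ChildStatus[]> {
  // 1. Find the children of the parent session.
  const children = await deps.childrenOf(parentSessionId);
  // 2. Ask each session for its live status; the lookups run concurrently.
  // 3. Return one combined record per child, in mapping order.
  return Promise.all(
    children.map(async (child) => {
      const live = await deps.sessionStatus({ sessionId: child.sessionId });
      return {
        sessionId: child.sessionId,
        mappingStatus: child.status,
        sessionState: live.state,
      };
    }),
  );
}
```

`Promise.all` rejects as soon as any single lookup fails; a real implementation might prefer `Promise.allSettled` so that one unreachable container does not hide the status of the others.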

coord.message — Send Message to Spawned Session

  1. env.opencode.sessionPromptAsync({ sessionId, message, agent }) (via call protocol)
  2. Publish coord.messaged event to Redis

coord.notify — Notify Coordinator

  1. Look up mapping to find parentSessionId
  2. env.opencode.sessionPromptAsync({ sessionId: parentSessionId, message: formattedNotification }) (via call protocol)
  3. Publish coord.notified event to Redis with level (info/warning/blocking)
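The notify flow above might be sketched as follows. The three levels come from this document; `NotifyDeps`, `parentOf`, and the exact text produced by `formatNotification` are hypothetical, since the notification format is not specified here.

```typescript
// The three notification levels named in step 3.
type NotifyLevel = "info" | "warning" | "blocking";

// Hypothetical dependencies: mapping lookup, prompt operation, Redis publisher.
interface NotifyDeps {
  parentOf(sessionId: string): Promise<string | undefined>;
  sessionPromptAsync(args: { sessionId: string; message: string }): Promise<void>;
  publish(event: string, payload: Record<string, unknown>): Promise<void>;
}

// Illustrative format only; the real formatted notification may differ.
function formatNotification(fromSessionId: string, level: NotifyLevel, text: string): string {
  return `[${level.toUpperCase()}] from session ${fromSessionId}: ${text}`;
}

async function coordNotify(
  deps: NotifyDeps,
  fromSessionId: string,
  level: NotifyLevel,
  text: string,
): Promise<void> {
  // 1. Look up the mapping to find the coordinator session.
  const parentSessionId = await deps.parentOf(fromSessionId);
  if (parentSessionId === undefined) {
    throw new Error(`no mapping found for session ${fromSessionId}`);
  }
  // 2. Deliver the formatted notification as a prompt to the coordinator.
  await deps.sessionPromptAsync({
    sessionId: parentSessionId,
    message: formatNotification(fromSessionId, level, text),
  });
  // 3. Publish the event with its level.
  await deps.publish("coord.notified", { sessionId: fromSessionId, parentSessionId, level });
}
```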

coord.abort — Abort Spawned Session

  1. env.opencode.sessionAbort({ sessionId }) (via call protocol)
  2. Update mapping status to "aborted"
  3. Publish coord.aborted event to Redis

opencode REST Operations via FromOpenAPI

Each coordination operation that interacts with an opencode container calls through the operations generated by FromOpenAPI from opencode's server spec:

opencode.sessionCreate       → POST /session
opencode.sessionPromptAsync  → POST /session/{id}/prompt_async
opencode.sessionStatus       → GET  /session/{id}/status
opencode.sessionAbort        → POST /session/{id}/abort
opencode.sessionMessages     → GET  /session/{id}/messages

These operations are auto-generated and type-safe, so no hand-written HTTP client code is needed. The SSE fix in from_openapi.ts (an async generator for SUBSCRIPTION endpoints) makes the streaming endpoints work through the call protocol.
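To make the operation-to-route mapping concrete, the sketch below expands a path template into a request description. It is not how FromOpenAPI generates operations (that code is not shown here); the `routes` table restates the mapping above, and `buildRequest` is a hypothetical helper written for this example.

```typescript
// The operation-to-route mapping listed above, as a lookup table.
const routes = {
  "opencode.sessionCreate": { method: "POST", path: "/session" },
  "opencode.sessionPromptAsync": { method: "POST", path: "/session/{id}/prompt_async" },
  "opencode.sessionStatus": { method: "GET", path: "/session/{id}/status" },
  "opencode.sessionAbort": { method: "POST", path: "/session/{id}/abort" },
  "opencode.sessionMessages": { method: "GET", path: "/session/{id}/messages" },
} as const;

type OperationName = keyof typeof routes;

// Hypothetical helper: fill {placeholders} in the path from the given params.
function buildRequest(
  op: OperationName,
  params: Record<string, string> = {},
): { method: string; path: string } {
  const route = routes[op];
  const path = route.path.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) {
      throw new Error(`missing path parameter: ${key}`);
    }
    // Encode so that ids containing "/" or spaces cannot alter the route.
    return encodeURIComponent(value);
  });
  return { method: route.method, path };
}
```

For example, `buildRequest("opencode.sessionStatus", { id: "abc" })` yields a GET request for `/session/abc/status`.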

How Agents Call Coordination Operations

Agents in opencode containers call hub operations via MCP — not through a plugin:

Agent in opencode container
  │
  ├── MCP search({ q: "coord" })           → finds coord.*, hub.list, hub.call, etc.
  ├── MCP call({ tool: "coord.notify" })    → reports task finished, blocked, or messages coordinator
  ├── MCP call({ tool: "coord.status" })    → checks on sibling sessions
  └── MCP call({ tool: "coord.abort" })     → aborts a stuck session

The hub's MCP endpoint is configured when the opencode container is set up (in opencode.json MCP servers). The agent discovers and calls coordination tools the same way it discovers any other tool — via the MCP search/schema/call pattern. No plugin needed.

Anomaly Detection

The hub monitors sessions via Redis events and runs detection heuristics:

  1. Subscribes to the Redis channels alk:events:message.part.updated:* and alk:events:session.status:*
  2. Maintains in-memory metrics per monitored session (tool errors, malformed tool calls, last activity, status)
  3. Checks for stalls periodically (every 30s)
  4. When a threshold is exceeded, stores a detection in the detections table and publishes a coord.detection event

Detections are queryable via coord.detect:

coord.detect({ sessionIDs?: string[] }) → Array<{ sessionId, issues, severity }>

Detection Heuristics

These heuristics are validated patterns for catching common agent session failures:

| Anomaly Type | Trigger | Default Threshold | Severity |
|---|---|---|---|
| MODEL_DEGRADATION | Malformed tool calls detected | ≥1 malformed tool | High |
| HIGH_ERROR_COUNT | Tool errors accumulating | ≥5 tool errors | Medium |
| SESSION_STALL | No activity while busy | >60s no activity | Medium |

The heuristics are simple per-session counters and timers, maintained from the Redis event stream. Detection follows a pull model: the coordinator calls coord.detect on demand rather than being interrupted by push notifications.
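The heuristics and the coord.detect result shape can be sketched with plain counters and timestamps. The thresholds and severities are the defaults listed above; the `SessionMetrics` field names, the choice to report only sessions with issues, and the rule that overall severity is the highest individual severity are assumptions for this example.

```typescript
type Severity = "high" | "medium";
type AnomalyType = "MODEL_DEGRADATION" | "HIGH_ERROR_COUNT" | "SESSION_STALL";

// Hypothetical per-session metrics, updated from the Redis event stream.
interface SessionMetrics {
  toolErrors: number;
  malformedTools: number;
  lastActivityMs: number; // epoch milliseconds of the last observed event
  status: string; // "busy" while the session is working
}

// Default thresholds listed for the three heuristics.
const THRESHOLDS = { malformedTools: 1, toolErrors: 5, stallMs: 60_000 };

function detectIssues(m: SessionMetrics, nowMs: number): Array<{ type: AnomalyType; severity: Severity }> {
  const issues: Array<{ type: AnomalyType; severity: Severity }> = [];
  if (m.malformedTools >= THRESHOLDS.malformedTools) {
    issues.push({ type: "MODEL_DEGRADATION", severity: "high" });
  }
  if (m.toolErrors >= THRESHOLDS.toolErrors) {
    issues.push({ type: "HIGH_ERROR_COUNT", severity: "medium" });
  }
  // A stall only counts while the session is busy, and only past 60s (strictly).
  if (m.status === "busy" && nowMs - m.lastActivityMs > THRESHOLDS.stallMs) {
    issues.push({ type: "SESSION_STALL", severity: "medium" });
  }
  return issues;
}

// Pull-style query: evaluate on demand, optionally restricted to some sessions.
function coordDetect(
  all: Map<string, SessionMetrics>,
  nowMs: number,
  sessionIds?: string[],
): Array<{ sessionId: string; issues: AnomalyType[]; severity: Severity }> {
  const ids = sessionIds ?? Array.from(all.keys());
  const out: Array<{ sessionId: string; issues: AnomalyType[]; severity: Severity }> = [];
  for (const sessionId of ids) {
    const metrics = all.get(sessionId);
    if (metrics === undefined) continue; // unknown session: nothing to report
    const issues = detectIssues(metrics, nowMs);
    if (issues.length === 0) continue;
    const severity: Severity = issues.some((i) => i.severity === "high") ? "high" : "medium";
    out.push({ sessionId, issues: issues.map((i) => i.type), severity });
  }
  return out;
}
```

Passing `nowMs` in explicitly, rather than reading the clock inside the function, keeps the stall check deterministic and easy to test.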

Provenance

The coordination operations design (spawn/message/notify/abort/detect) and detection heuristics (model degradation, high error count, session stall) are validated patterns from prior work. The alkhub_ts implementation uses the call protocol and Postgres persistence rather than single-process file-based state.