Set up repo: migrate architecture specs, code stubs, and tasks from alkhub_ts

Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00
parent 3e3f12d2d5
commit 2b63cda1c7
120 changed files with 11714 additions and 2 deletions


@@ -0,0 +1,113 @@
---
status: draft
last_updated: 2026-04-19
---
# Coordination Operations
## Overview
Coordination operations manage multi-agent workflows: spawning sessions, inter-session messaging, status tracking, and anomaly detection. These are hub operations in the registry, backed by Postgres and Redis.
## Architecture
### State: Postgres Tables
Coordination operations use the following tables in the hub's storage layer (see `storage/coordination.md` for the full schema definitions):
- **`mappings`**: Worktree/session/coordinator relationships. Links spawned sessions to their parent coordinator, spoke, git branch, and assigned task. Status: `active`, `completed`, `aborted`, `failed`.
- **`detections`**: Anomaly detection records. Links detection events to sessions with severity and details.
- **`tasks`** + **`task_dependencies`**: SDD task definitions and their dependency edges. The coordinator queries task status to determine next work. See `storage/tasks.md` for the full task storage design.
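To make the `mappings` description concrete, here is a minimal TypeScript sketch of what a row might look like. Only the four status values come from the text above; every field name and the `isTerminal` helper are hypothetical, and the authoritative schema remains `storage/coordination.md`.

```typescript
// Status values as listed for the `mappings` table.
type MappingStatus = "active" | "completed" | "aborted" | "failed";

// Hypothetical row shape: field names are illustrative assumptions,
// not the real schema from storage/coordination.md.
interface MappingRow {
  sessionId: string;        // the spawned session
  parentSessionId: string;  // the coordinator that spawned it
  spokeId: string;          // assumed identifier for the spoke
  branch: string;           // git branch of the worktree
  taskId: string | null;    // assigned task, if any
  status: MappingStatus;
}

// Assumption: every status other than "active" is final.
function isTerminal(status: MappingStatus): boolean {
  return status !== "active";
}
```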
### Operations
#### `coord.spawn` — Create Worktree + Session
1. `env.git.worktreeCreate({ name, branch })` — create worktree (via call protocol)
2. `env.opencode.sessionCreate({ directory, title })` — create session (via call protocol)
3. Insert into `mappings` table (with `taskId` referencing the assigned task)
4. `env.opencode.sessionPromptAsync({ sessionId, prompt, agent })` — send initial prompt (via call protocol)
5. Publish `coord.spawned` event to Redis
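The five steps above can be sketched as a single async function with injected dependencies. This is an illustration under stated assumptions, not the hub's implementation: the `SpawnDeps` interface, the return shapes (`directory`, `id`), and the mapping row fields are all hypothetical. Only the operation names, their argument objects, and the step order come from the list.

```typescript
// Hypothetical dependency shapes; return types are assumptions.
interface SpawnDeps {
  worktreeCreate(args: { name: string; branch: string }): Promise<{ directory: string }>;
  sessionCreate(args: { directory: string; title: string }): Promise<{ id: string }>;
  sessionPromptAsync(args: { sessionId: string; prompt: string; agent: string }): Promise<void>;
  insertMapping(row: {
    sessionId: string;
    parentSessionId: string;
    branch: string;
    taskId: string;
    status: "active";
  }): Promise<void>;
  publish(event: string, payload: Record<string, string>): Promise<void>;
}

interface SpawnInput {
  name: string;
  branch: string;
  title: string;
  prompt: string;
  agent: string;
  parentSessionId: string;
  taskId: string;
}

async function coordSpawn(deps: SpawnDeps, input: SpawnInput): Promise<{ sessionId: string }> {
  // 1. Create the worktree.
  const worktree = await deps.worktreeCreate({ name: input.name, branch: input.branch });
  // 2. Create the session inside that worktree.
  const session = await deps.sessionCreate({ directory: worktree.directory, title: input.title });
  // 3. Record the mapping, including the assigned task.
  await deps.insertMapping({
    sessionId: session.id,
    parentSessionId: input.parentSessionId,
    branch: input.branch,
    taskId: input.taskId,
    status: "active",
  });
  // 4. Send the initial prompt.
  await deps.sessionPromptAsync({ sessionId: session.id, prompt: input.prompt, agent: input.agent });
  // 5. Announce the spawn.
  await deps.publish("coord.spawned", { sessionId: session.id, taskId: input.taskId });
  return { sessionId: session.id };
}
```

Injecting the dependencies keeps the step order testable with in-memory fakes, without a running opencode container, Postgres, or Redis.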
#### `coord.status` — Query Spawned Session Status
1. Query `mappings` table for children of parent session
2. For each mapping, `env.opencode.sessionStatus({ sessionId })` (via call protocol)
3. Return aggregated status
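A rough sketch of the aggregation step, assuming hypothetical `listChildren` and `sessionStatus` dependencies and an assumed `{ status }` response shape. Reporting an unreachable session as `"unreachable"` rather than failing the whole query is a design choice made for this example, not something the spec states.

```typescript
// Hypothetical shapes; only the three-step flow comes from the list above.
interface ChildMapping {
  sessionId: string;
  taskId: string | null;
  status: string; // mapping status stored in Postgres
}

interface ChildStatus extends ChildMapping {
  live: string; // status reported by the opencode container
}

interface StatusDeps {
  listChildren(parentSessionId: string): Promise<ChildMapping[]>;
  sessionStatus(args: { sessionId: string }): Promise<{ status: string }>;
}

async function coordStatus(deps: StatusDeps, parentSessionId: string): Promise<ChildStatus[]> {
  // 1. Query mappings for children of the parent session.
  const children = await deps.listChildren(parentSessionId);
  // 2. Ask each session for its live status, in parallel.
  return Promise.all(
    children.map(async (mapping): Promise<ChildStatus> => {
      try {
        const live = await deps.sessionStatus({ sessionId: mapping.sessionId });
        return { ...mapping, live: live.status };
      } catch {
        // Assumption: one failing session should not fail the whole query.
        return { ...mapping, live: "unreachable" };
      }
    }),
  );
}
```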
#### `coord.message` — Send Message to Spawned Session
1. `env.opencode.sessionPromptAsync({ sessionId, message, agent })` (via call protocol)
2. Publish `coord.messaged` event to Redis
#### `coord.notify` — Notify Coordinator
1. Look up mapping to find `parentSessionId`
2. `env.opencode.sessionPromptAsync({ sessionId: parentSessionId, message: formattedNotification })` (via call protocol)
3. Publish `coord.notified` event to Redis with level (info/warning/blocking)
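The notify flow could look roughly like the following. The three levels come from the step list; the `NotifyDeps` interface, the `formatNotification` layout, and the event payload fields are assumptions made for illustration.

```typescript
type NotifyLevel = "info" | "warning" | "blocking";

// Hypothetical dependencies for the three steps above.
interface NotifyDeps {
  findParentSessionId(sessionId: string): Promise<string | null>;
  sessionPromptAsync(args: { sessionId: string; message: string }): Promise<void>;
  publish(event: string, payload: Record<string, string>): Promise<void>;
}

// Assumed message layout; the real format is not specified in this document.
function formatNotification(fromSessionId: string, level: NotifyLevel, text: string): string {
  return `[${level.toUpperCase()}] session ${fromSessionId}: ${text}`;
}

async function coordNotify(
  deps: NotifyDeps,
  input: { sessionId: string; level: NotifyLevel; text: string },
): Promise<void> {
  // 1. Look up the mapping to find the coordinator.
  const parentSessionId = await deps.findParentSessionId(input.sessionId);
  if (parentSessionId === null) {
    throw new Error(`no mapping found for session ${input.sessionId}`);
  }
  // 2. Deliver the formatted notification to the coordinator's session.
  const message = formatNotification(input.sessionId, input.level, input.text);
  await deps.sessionPromptAsync({ sessionId: parentSessionId, message });
  // 3. Publish the event with its level.
  await deps.publish("coord.notified", {
    sessionId: input.sessionId,
    parentSessionId,
    level: input.level,
  });
}
```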
#### `coord.abort` — Abort Spawned Session
1. `env.opencode.sessionAbort({ sessionId })` (via call protocol)
2. Update mapping status to `aborted`
3. Publish `coord.aborted` event to Redis
### opencode REST Operations via FromOpenAPI
Each coordination operation that interacts with an opencode container calls through the operations generated by `FromOpenAPI` from opencode's server spec:
```
opencode.sessionCreate → POST /session
opencode.sessionPromptAsync → POST /session/{id}/prompt_async
opencode.sessionStatus → GET /session/{id}/status
opencode.sessionAbort → POST /session/{id}/abort
opencode.sessionMessages → GET /session/{id}/messages
```
These operations are auto-generated and type-safe, so no manual HTTP client code is needed. The SSE fix in `from_openapi.ts` (an async generator for SUBSCRIPTION endpoints) makes the streaming endpoints work through the call protocol.
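To illustrate the mapping, the sketch below resolves an operation name to a concrete method and path by filling in the `{id}` template. It is a hand-written stand-in, not the code `FromOpenAPI` generates: the `OPENCODE_ROUTES` table and `buildRequest` helper are hypothetical names, and only the operation-to-route pairs are taken from the listing above.

```typescript
type HttpMethod = "GET" | "POST";

interface Route {
  method: HttpMethod;
  path: string;
}

// Operation-to-route pairs copied from the listing above.
const OPENCODE_ROUTES: Record<string, Route> = {
  "opencode.sessionCreate": { method: "POST", path: "/session" },
  "opencode.sessionPromptAsync": { method: "POST", path: "/session/{id}/prompt_async" },
  "opencode.sessionStatus": { method: "GET", path: "/session/{id}/status" },
  "opencode.sessionAbort": { method: "POST", path: "/session/{id}/abort" },
  "opencode.sessionMessages": { method: "GET", path: "/session/{id}/messages" },
};

// Hypothetical helper: substitute {name} placeholders with encoded values.
function buildRequest(operation: string, params: Record<string, string> = {}): Route {
  const route = OPENCODE_ROUTES[operation];
  if (!route) throw new Error(`unknown operation: ${operation}`);
  const path = route.path.replace(/\{(\w+)\}/g, (_match, key: string) => {
    const value = params[key];
    if (value === undefined) throw new Error(`missing path parameter: ${key}`);
    return encodeURIComponent(value);
  });
  return { method: route.method, path };
}
```

For example, `buildRequest("opencode.sessionStatus", { id: "abc" })` yields a `GET` against `/session/abc/status`.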
### How Agents Call Coordination Operations
Agents in opencode containers call hub operations via MCP — not through a plugin:
```
Agent in opencode container
├── MCP search({ q: "coord" }) → finds coord.*, hub.list, hub.call, etc.
├── MCP call({ tool: "coord.notify" }) → reports task finished, blocked, or messages coordinator
├── MCP call({ tool: "coord.status" }) → checks on sibling sessions
└── MCP call({ tool: "coord.abort" }) → aborts a stuck session
```
The hub's MCP endpoint is configured when the opencode container is set up (in `opencode.json` MCP servers). The agent discovers and calls coordination tools the same way it discovers any other tool — via the MCP `search`/`schema`/`call` pattern. No plugin needed.
## Anomaly Detection
The hub monitors sessions via Redis events and runs detection heuristics:
1. Subscribe to the Redis `alk:events:message.part.updated:*` and `alk:events:session.status:*` channels
2. Maintain in-memory metrics per monitored session (tool errors, malformed tools, last activity, status)
3. Check for stalls periodically (every 30s)
4. When a threshold is exceeded, store a detection in the `detections` table and publish a `coord.detection` event
Detections are queryable via `coord.detect`:
```
coord.detect({ sessionIDs?: string[] }) → Array<{ sessionId, issues, severity }>
```
### Detection Heuristics
These heuristics are validated patterns for catching common agent session failures:
| Anomaly Type | Trigger | Default Threshold | Severity |
|-------------|---------|-------------------|----------|
| MODEL_DEGRADATION | Malformed tool calls detected | ≥1 malformed tool | High |
| HIGH_ERROR_COUNT | Tool errors accumulating | ≥5 tool errors | Medium |
| SESSION_STALL | No activity while busy | >60s no activity | Medium |
Each heuristic needs only simple per-session counters and timers, maintained from the Redis event stream. Detection follows a pull model: the coordinator calls `coord.detect` on demand rather than being interrupted by push notifications.
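The table can be read as three threshold checks over per-session metrics. The sketch below applies the default thresholds as listed. The `SessionMetrics` fields, the `"busy"`/`"idle"` status values, the shape of each issue, and the rule that a session's overall severity is its worst issue are all assumptions for illustration.

```typescript
type Severity = "High" | "Medium";
type AnomalyType = "MODEL_DEGRADATION" | "HIGH_ERROR_COUNT" | "SESSION_STALL";

// Hypothetical in-memory metrics kept per monitored session.
interface SessionMetrics {
  toolErrors: number;
  malformedTools: number;
  lastActivityMs: number; // timestamp of the last observed event
  status: "busy" | "idle"; // assumed status values
}

interface Issue {
  type: AnomalyType;
  severity: Severity;
}

// Default thresholds from the heuristics table.
const THRESHOLDS = { malformedTools: 1, toolErrors: 5, stallMs: 60_000 };

function detectIssues(m: SessionMetrics, nowMs: number): Issue[] {
  const issues: Issue[] = [];
  if (m.malformedTools >= THRESHOLDS.malformedTools) {
    issues.push({ type: "MODEL_DEGRADATION", severity: "High" });
  }
  if (m.toolErrors >= THRESHOLDS.toolErrors) {
    issues.push({ type: "HIGH_ERROR_COUNT", severity: "Medium" });
  }
  // Strictly more than 60s without activity, and only while busy.
  if (m.status === "busy" && nowMs - m.lastActivityMs > THRESHOLDS.stallMs) {
    issues.push({ type: "SESSION_STALL", severity: "Medium" });
  }
  return issues;
}

// Assumption: a session's overall severity is that of its worst issue.
function coordDetect(
  metrics: Record<string, SessionMetrics>,
  nowMs: number,
): Array<{ sessionId: string; issues: Issue[]; severity: Severity }> {
  const results: Array<{ sessionId: string; issues: Issue[]; severity: Severity }> = [];
  for (const sessionId of Object.keys(metrics)) {
    const issues = detectIssues(metrics[sessionId], nowMs);
    if (issues.length === 0) continue;
    const severity: Severity = issues.some((i) => i.severity === "High") ? "High" : "Medium";
    results.push({ sessionId, issues, severity });
  }
  return results;
}
```

Passing `nowMs` in explicitly, rather than reading the clock inside the function, keeps the stall check deterministic under test.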
## Provenance
The coordination operations design (spawn/message/notify/abort/detect) and detection heuristics (model degradation, high error count, session stall) are validated patterns from prior work. The alkhub_ts implementation uses the call protocol and Postgres persistence rather than single-process file-based state.