| status | last_updated |
|---|---|
| draft | 2026-05-22 |
Spoke: WebSocket-Connected Operation Provider
Overview
A "spoke" is any process connected to the hub via a persistent websocket that provides and/or consumes operations. The hub-spoke protocol is the same four operations that MCP agents use: list, search, schema, call. There is one contract — the spoke is just another client of the hub's operation interface, except it also provides operations to the hub's registry.
A spoke can be many things:
- Dev env spoke — exposes local dev tools (bash, file ops, fs.read, fs.write) to the hub
- Client spoke — a user's local machine, where the hub can call operations like notifications or local integrations back to the user
- GPU compute spoke — a vast.ai instance exposing CUDA operations
- Any future spoke — anything that connects, lists its ops, and responds to calls
Design Principles
- One contract: the hub-spoke protocol is `list`/`search`/`schema`/`call`. Same operations, same event shapes, whether the consumer is an MCP agent, a browser client, or another spoke. No separate "runner management" protocol.
- WebSocket is the transport: a persistent bidirectional connection. The hub pushes `call.requested`; the spoke pushes `call.responded`/`call.error`. It is the same call protocol, with `WebSocketEventTarget` (`@alkdev/pubsub/event-target-websocket-client` on the spoke, `@alkdev/pubsub/event-target-websocket-server` on the hub) as the `TypedEventTarget` implementation.
- Bidirectional: the hub calls operations on the spoke (dispatch), and the spoke calls operations on the hub (e.g., publishing events, or calling other spokes' operations through the hub). Same protocol in both directions.
- Registration = list: when a spoke connects, it calls `hub.register` and includes its operation list. The hub now knows what that spoke can do. No separate registration protocol.
- Filtered by identity: `list` and `search` return operations scoped to the caller's identity. An admin sees everything; a dev env spoke sees only the operations it is allowed to call. This prevents context bloat and enforces access control at the discovery layer.
- Op remapping: a dev env spoke exposes `fs.read`, `fs.write`, `bash.exec`, etc. The hub maps these to its own `dev.fs.read`, `dev.fs.write`, `dev.bash.exec` (or a similar namespaced form) so they don't collide with hub-native operations. When an LLM calls `dev.fs.read`, the hub routes to the right spoke. From the LLM's perspective it's just a `call`; it doesn't know or care which spoke executes it.
- No persistent state: the spoke is ephemeral. All state lives in the hub's Postgres. `PendingRequestMap` and `CallHandler` are from `@alkdev/operations`.
- Stateless on reconnect: if the websocket drops, the spoke reconnects. The hub aborts in-flight calls via call protocol cascading. On reconnect, `hub.register` re-establishes what the spoke can do.
Why WebSocket, Not Redis or HTTP
| Redis Pub/Sub | HTTP Long-Poll | WebSocket |
|---|---|---|
| Spoke needs Redis access | Spoke is always a client | Spoke is always a client |
| Separate channels for dispatch vs results | Polling latency | Bidirectional, push-based |
| `spoke:{id}:dispatch` + `spoke:{id}:results` | POST result back after poll | Same connection, same protocol |
| Requires Redis on spoke's network | Works anywhere but slow | Works anywhere, fast |
| Hub mediates via Redis, not call protocol | Hub mediates via HTTP, not call protocol | Call protocol flows end-to-end |
External compute (vast.ai, ubicloud) won't have Redis access. A user's laptop running a client spoke won't have Redis. WebSocket works from anywhere with just an internet connection, and gives us bidirectional push. The call protocol's TypedEventTarget abstraction means the hub's PendingRequestMap (from @alkdev/operations) doesn't care whether the event traverses Redis, in-process EventTarget, or a websocket.
The hub uses Redis internally for its own cross-process event routing (see pubsub-redis.md). Spokes don't need to know about Redis.
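The transport independence described above can be shown with a minimal sketch. The `TypedEventTarget` interface here, its `dispatch`/`on` method names, and the event field names are hypothetical stand-ins, not the actual `@alkdev/pubsub` or `@alkdev/operations` API. The point is only that code written against the interface never learns which transport sits underneath.

```typescript
// Hypothetical call-protocol events; field names are illustrative only.
interface CallEventMap {
  "call.requested": { requestId: string; operationId: string; input: unknown };
  "call.responded": { requestId: string; output: unknown };
}

// Hypothetical transport-agnostic interface (not the real @alkdev/pubsub type).
interface TypedEventTarget<M> {
  dispatch<K extends keyof M>(type: K, detail: M[K]): void;
  on<K extends keyof M>(type: K, listener: (detail: M[K]) => void): () => void;
}

// In-process implementation: listeners run synchronously, in registration order.
class InProcessEventTarget<M> implements TypedEventTarget<M> {
  private listeners = new Map<keyof M, Set<(detail: unknown) => void>>();

  dispatch<K extends keyof M>(type: K, detail: M[K]): void {
    for (const listener of this.listeners.get(type) ?? []) listener(detail);
  }

  on<K extends keyof M>(type: K, listener: (detail: M[K]) => void): () => void {
    const set = this.listeners.get(type) ?? new Set<(detail: unknown) => void>();
    const fn = listener as (detail: unknown) => void;
    set.add(fn);
    this.listeners.set(type, set);
    return () => {
      set.delete(fn); // unsubscribe
    };
  }
}

// Written only against the interface: a Redis- or websocket-backed target would work unchanged.
function serveEcho(target: TypedEventTarget<CallEventMap>): () => void {
  return target.on("call.requested", ({ requestId, input }) => {
    target.dispatch("call.responded", { requestId, output: input });
  });
}

const bus = new InProcessEventTarget<CallEventMap>();
const stopEcho = serveEcho(bus);
const seen: unknown[] = [];
bus.on("call.responded", (event) => {
  seen.push(event.output);
});
bus.dispatch("call.requested", { requestId: "r1", operationId: "echo", input: 42 });
```

Swapping `InProcessEventTarget` for another implementation of the same interface leaves `serveEcho` untouched. That is the property the hub relies on when the same call protocol runs over Redis or a websocket.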
Spoke Types
Dev Env Spoke
Wraps local development tools. The spoke scans its local operation definitions (bash, filesystem, git) and registers them with the hub on connect. The hub remaps these into a namespace (e.g., dev.*) so an LLM agent working with this spoke gets dev.fs.read, dev.bash.exec, etc. in its list results.
This is what replaces the per-opencode-container MCP server model. Instead of each container running its own MCP server with open-websearch etc., the container runs a dev env spoke. The hub provides shared infrastructure operations (websearch, coordination); the spoke provides local dev tools.
Client Spoke
A user's local machine or browser. The hub can call operations on the client spoke — for example, sending a notification, triggering a local action, or providing a callback for a long-running agent task. The client spoke might expose only a few operations (client.notify, client.openUrl, client.confirm), but the bidirectional nature means the hub can push to the user proactively.
From the LLM's perspective, calling client.notify is just another call. It doesn't know the operation routes to the user's laptop.
GPU Compute Spoke
```sh
# On vast.ai instance
curl -fsSL https://alk.dev/install-spoke | sh
alk-spoke start --hub <hub-url> --token <token> --capability cuda
```
Same websocket, same hub.register with its operation list. The hub routes compute.train or compute.infer to it.
Container Spoke (deferred)
Extends the base spoke with Docker container lifecycle management + opencode integration: a dev server spoke that manages opencode containers on a compute server, wrapping container start/stop/restart as operations. A separate variant (without Docker) will target cloud compute instances. Both are just spokes with extra operations: they register like any other spoke, and the hub dispatches to them.
Prerequisite: Working hub + minimal base spoke first. The open-coordinator plugin's container/worktree patterns inform the design but are not a runtime dependency.
Identity-Filtered Discovery
The list and search operations return different results based on the caller's identity. This is access control at the discovery layer:
| Identity | What list/search returns |
|---|---|
| Admin | All operations across all connected spokes + hub-native |
| Dev env spoke (authenticated) | Hub operations it's allowed to call + its own operations |
| Dev env spoke's LLM agent | Operations the LLM is allowed to call (dev tools, coordination, search) |
| Client spoke | Hub operations scoped to that user + any client-callable ops |
| Unauthenticated | Nothing (auth required) |
This is why list/search/schema/call are operations, not just passive endpoints — they go through CallHandler which checks the operation's AccessControl (requiredScopes, resource permissions) against the caller's Identity. The hub can also filter based on the spoke type (dev env vs client vs compute) and the spoke's declared capabilities.
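Here is a minimal sketch of discovery-layer filtering. It assumes an operation's `AccessControl` carries a `requiredScopes` array and an `Identity` carries a `scopes` array plus an `admin` flag. These shapes and the scope names are simplified assumptions, and the real `@alkdev/operations` types may differ. The sketch omits resource permissions and spoke-type filtering.

```typescript
// Simplified, hypothetical shapes; the real @alkdev/operations types may differ.
interface AccessControl {
  requiredScopes: string[];
}
interface OperationSpec {
  id: string;
  description: string;
  access: AccessControl;
}
interface Identity {
  subject: string;
  scopes: string[];
  admin?: boolean;
}

// Visible (and callable) only if the caller holds every required scope.
function canCall(identity: Identity | undefined, op: OperationSpec): boolean {
  if (!identity) return false; // unauthenticated: nothing
  if (identity.admin) return true; // admin: everything
  const held = new Set(identity.scopes);
  return op.access.requiredScopes.every((scope) => held.has(scope));
}

// hub.list reuses the same predicate that would be applied at call time.
function listFor(identity: Identity | undefined, ops: OperationSpec[]): OperationSpec[] {
  return ops.filter((op) => canCall(identity, op));
}

// Example registry; scope names are made up for illustration.
const ops: OperationSpec[] = [
  { id: "hub.search", description: "Search operations", access: { requiredScopes: ["ops:read"] } },
  { id: "dev.fs.read", description: "Read a file", access: { requiredScopes: ["dev:fs"] } },
  { id: "hub.unregister", description: "Disconnect a spoke", access: { requiredScopes: ["spoke:manage"] } },
];

const agent: Identity = { subject: "llm-agent", scopes: ["ops:read", "dev:fs"] };
const visibleToAgent = listFor(agent, ops).map((op) => op.id);
```

Using one predicate for both discovery and execution keeps the two consistent: an agent cannot call an operation it was never shown, and it is never shown one it cannot call.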
Op remapping in practice: when a dev env spoke registers with fs.read, fs.write, bash.exec, the hub stores these as dev.{spokeId}.fs.read, dev.{spokeId}.fs.write, dev.{spokeId}.bash.exec. For LLM agents using this spoke, list can collapse the prefix to just dev.fs.read if only one dev env spoke is active for that session. If multiple dev env spokes are connected, the full dev.{spokeId}.* form disambiguates.
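The naming rule above can be sketched as pure functions. One stores a spoke's local IDs under `dev.{spokeId}.*`, one decides which name an LLM session sees, and one maps a called name back to a spoke. The function names are hypothetical. The collapse rule (drop `{spokeId}` only when exactly one dev env spoke is active) is the one described above.

```typescript
// Hub-side storage name for a spoke-local operation, e.g. fs.read -> dev.abc123.fs.read.
function remapId(namespace: string, spokeId: string, localId: string): string {
  return `${namespace}.${spokeId}.${localId}`;
}

// Name shown to an LLM session: drop the spoke ID only when it is the sole active dev env spoke.
function displayId(namespace: string, spokeId: string, localId: string, activeSpokes: string[]): string {
  const unambiguous = activeSpokes.length === 1 && activeSpokes[0] === spokeId;
  return unambiguous ? `${namespace}.${localId}` : remapId(namespace, spokeId, localId);
}

// Reverse mapping for an incoming call. Assumes spoke IDs never equal a local
// operation prefix such as "fs" (e.g. they are generated IDs like "abc123").
function resolveId(
  namespace: string,
  calledId: string,
  activeSpokes: string[],
): { spokeId: string; localId: string } | undefined {
  if (!calledId.startsWith(`${namespace}.`)) return undefined;
  const rest = calledId.slice(namespace.length + 1);
  for (const spokeId of activeSpokes) {
    // Full form: dev.{spokeId}.{localId}
    if (rest.startsWith(`${spokeId}.`)) return { spokeId, localId: rest.slice(spokeId.length + 1) };
  }
  // Collapsed form is only valid with a single active spoke.
  if (activeSpokes.length === 1) return { spokeId: activeSpokes[0], localId: rest };
  return undefined; // ambiguous: the caller must use dev.{spokeId}.*
}
```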
Registration Flow
Registration is a spoke calling hub.register — a regular operation call over the websocket:
```
Spoke connects (WS)
  │
  ├── Auth (token in first message or WS handshake)
  │
  ├── Spoke calls: hub.register { runnerId, operations[], spokeType, project, hardware }
  │     └── Hub's hub.register handler:
  │           ├── Stores spoke's websocket reference
  │           ├── Remaps spoke's operations into hub namespace
  │           ├── Adds to RunnerPool
  │           └── Returns { runnerId, status: "connected" }
  │
  └── Spoke is now registered. Hub can dispatch to it; it can call hub ops.
```
On reconnect: the spoke calls hub.register again. The hub refreshes. Any in-flight calls from the previous connection were already aborted by the call protocol on disconnect.
On disconnect: the hub detects the closed websocket, aborts in-flight calls via call protocol cascading, and marks the spoke disconnected. The spoke's remapped operations are removed from the hub's registry so list/search no longer return them.
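The register/disconnect bookkeeping above can be sketched as a small in-memory `RunnerPool`. `RunnerPool` is named in this document, but the method names, the `SpokeRecord` shape, the `abortInFlight` callback, and the `{spokeType}.{runnerId}.{localId}` namespace rule are assumptions for illustration. A real pool would also hold each spoke's `WebSocketEventTarget` and `PendingRequestMap`.

```typescript
// Hypothetical registration payload (a subset of hub.register's input).
interface RegisterInput {
  runnerId: string;
  operations: string[]; // spoke-local IDs such as "fs.read"
  spokeType: "dev" | "client" | "compute";
}

interface SpokeRecord {
  remapped: string[]; // hub-namespaced IDs owned by this spoke
  abortInFlight: () => void; // hypothetical hook into call-protocol cascading
}

class RunnerPool {
  private spokes = new Map<string, SpokeRecord>();
  private owners = new Map<string, string>(); // hub operation ID -> runnerId

  // hub.register; on reconnect it replaces the previous entry. In-flight calls
  // were already aborted at disconnect, so nothing is aborted here.
  register(input: RegisterInput, abortInFlight: () => void): { runnerId: string; status: "connected" } {
    this.remove(input.runnerId);
    // Assumed namespace rule: {spokeType}.{runnerId}.{localId}, e.g. dev.abc123.fs.read.
    const remapped = input.operations.map((op) => `${input.spokeType}.${input.runnerId}.${op}`);
    for (const id of remapped) this.owners.set(id, input.runnerId);
    this.spokes.set(input.runnerId, { remapped, abortInFlight });
    return { runnerId: input.runnerId, status: "connected" };
  }

  // Closed socket, missed heartbeats, or hub.unregister.
  disconnect(runnerId: string): void {
    this.spokes.get(runnerId)?.abortInFlight();
    this.remove(runnerId);
  }

  // Used for routing and for list/search; removed operations have no owner.
  ownerOf(operationId: string): string | undefined {
    return this.owners.get(operationId);
  }

  private remove(runnerId: string): void {
    const record = this.spokes.get(runnerId);
    if (!record) return;
    for (const id of record.remapped) this.owners.delete(id);
    this.spokes.delete(runnerId);
  }
}
```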
Spoke Lifecycle
```
1. Start
   ├── Load config (hub WS URL, auth token)
   ├── Scan local operations (OperationRegistry.scan via `@alkdev/operations` with `ScannerFS` Deno adapter)
   ├── Open websocket to hub (wss://api.alk.dev/ws)
   ├── Call hub.register with runnerId + operation list + spokeType + hardware
   │     └── Hub stores spoke in RunnerPool, remaps operations
   └── Heartbeat via WS ping/pong

2. Running
   ├── Receive call.requested over WS (hub dispatching an operation to this spoke)
   │     ├── Execute via local OperationRegistry
   │     ├── Send call.responded (or call.error) back over WS
   │     └── Call graph tracked on hub side via parentRequestId
   ├── Receive call.aborted over WS
   │     └── Abort local execution (AbortController cascade)
   └── Send call.requested over WS to hub (spoke calling a hub operation)
         └── Hub responds with call.responded

3. Disconnect / Reconnect
   ├── WebSocket drops
   ├── Hub detects missed heartbeats
   │     └── Abort in-flight calls dispatched to spoke (call protocol cascading)
   ├── Spoke reconnects
   │     └── Call hub.register again → hub refreshes
   └── Or spoke shuts down gracefully
         └── Call hub.unregister before closing WS
```
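The "hub detects missed heartbeats" step can be sketched as a counter that every pong resets. The `HeartbeatMonitor` name and the default threshold of three unanswered pings are assumptions, not values from this design. The sketch injects the clock through `tick()` instead of owning a timer, so the logic can run without a real socket.

```typescript
// Hypothetical hub-side liveness tracker for one spoke connection.
class HeartbeatMonitor {
  private missed = 0;
  private dead = false;
  private readonly sendPing: () => void;
  private readonly onDead: () => void;
  private readonly maxMissed: number;

  constructor(sendPing: () => void, onDead: () => void, maxMissed = 3) {
    this.sendPing = sendPing;
    this.onDead = onDead; // e.g. abort in-flight calls and remove the spoke from the pool
    this.maxMissed = maxMissed; // assumed threshold
  }

  // Call once per heartbeat interval (for example from setInterval).
  tick(): void {
    if (this.dead) return;
    if (this.missed >= this.maxMissed) {
      // maxMissed consecutive pings went unanswered: treat the spoke as gone.
      this.dead = true;
      this.onDead();
      return;
    }
    this.missed++;
    this.sendPing();
  }

  // Call when a pong (or any other frame) arrives from the spoke.
  pong(): void {
    this.missed = 0;
  }

  get isDead(): boolean {
    return this.dead;
  }
}
```

In a running hub, a timer would drive `tick()` and the websocket's pong handler would call `pong()`. Keeping both outside the class makes the threshold behavior easy to test.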
Dispatch Flow
```
Hub                                    Spoke
│                                        │
│──── call.requested ───────────────────→│  (hub → spoke: "execute this")
│                                        ├── CallHandler validates
│                                        ├── registry.execute(operationId, input)
│←─── call.responded ────────────────────│  (spoke → hub: "here's the result")
│                                        │
│──── call.aborted ─────────────────────→│  (hub → spoke: "cancel this")
│                                        ├── AbortController.abort()
│←─── call.aborted ──────────────────────│  (spoke → hub: "confirmed")
│                                        │
│←─── call.requested ────────────────────│  (spoke → hub: "call a hub op")
│──── call.responded ───────────────────→│  (hub → spoke: "result")
```
The call protocol is fully bidirectional over the websocket. The hub dispatches operations to the spoke; the spoke calls hub operations. Same CallEventMap, same requestId correlation, same error model.
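The spoke's half of this exchange can be sketched as a handler that keys an `AbortController` by `requestId`. A later `call.aborted` with the same ID then cancels the matching execution and is confirmed back to the hub. The message shapes, the `createSpokeDispatcher` name, and the `Map` of local operations are hypothetical simplifications of the call protocol and `OperationRegistry`. The sketch omits `CallHandler` validation.

```typescript
type HubMessage =
  | { type: "call.requested"; requestId: string; operationId: string; input: unknown }
  | { type: "call.aborted"; requestId: string };

type SpokeMessage =
  | { type: "call.responded"; requestId: string; output: unknown }
  | { type: "call.error"; requestId: string; message: string }
  | { type: "call.aborted"; requestId: string };

// Hypothetical local operation: receives its input plus a signal tied to call.aborted.
type LocalOperation = (input: unknown, signal: AbortSignal) => Promise<unknown>;

function createSpokeDispatcher(
  operations: Map<string, LocalOperation>,
  send: (msg: SpokeMessage) => void,
): (msg: HubMessage) => Promise<void> {
  const inFlight = new Map<string, AbortController>();

  return async (msg) => {
    if (msg.type === "call.aborted") {
      const controller = inFlight.get(msg.requestId);
      if (!controller) return; // already finished: nothing to cancel
      inFlight.delete(msg.requestId);
      controller.abort();
      send({ type: "call.aborted", requestId: msg.requestId }); // confirm to the hub
      return;
    }
    const op = operations.get(msg.operationId);
    if (!op) {
      send({ type: "call.error", requestId: msg.requestId, message: "OPERATION_NOT_FOUND" });
      return;
    }
    const controller = new AbortController();
    inFlight.set(msg.requestId, controller);
    try {
      const output = await op(msg.input, controller.signal);
      // delete() is false if call.aborted already removed the entry: stay silent then.
      if (inFlight.delete(msg.requestId)) send({ type: "call.responded", requestId: msg.requestId, output });
    } catch (err) {
      if (inFlight.delete(msg.requestId)) {
        const message = err instanceof Error ? err.message : String(err);
        send({ type: "call.error", requestId: msg.requestId, message });
      }
    }
  };
}

// Example wiring with two toy operations; bash.exec honors the abort signal.
const sent: SpokeMessage[] = [];
const dispatch = createSpokeDispatcher(
  new Map<string, LocalOperation>([
    ["fs.read", async (input) => `contents of ${String(input)}`],
    ["bash.exec", (_input, signal) =>
      new Promise<unknown>((resolve, reject) => {
        const timer = setTimeout(() => resolve("done"), 1000);
        signal.addEventListener("abort", () => {
          clearTimeout(timer);
          reject(new Error("aborted"));
        });
      })],
  ]),
  (msg) => {
    sent.push(msg);
  },
);
```

The socket handler does not await `dispatch`, so several `call.requested` events can run concurrently. Whether that should be allowed is still listed under Open Questions.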
WebSocketEventTarget
Available in @alkdev/pubsub:
- Spoke side: `@alkdev/pubsub/event-target-websocket-client`, where `createWebSocketEventTarget(ws)` wraps a `WebSocket` instance as a `TypedEventTarget`
- Hub side: `@alkdev/pubsub/event-target-websocket-server`, which creates a `WebSocketEventTarget` for each incoming spoke connection
Both implement the same TypedEventTarget interface as RedisEventTarget, using EventEnvelope for structured cross-process messaging.
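To make the shared-interface point concrete, here is a sketch of a websocket-backed target. It wraps outgoing events in a JSON envelope and sends them as text frames. It also unwraps incoming frames and delivers them to local listeners. The envelope fields (`type`, `detail`), the `SocketLike` interface, and the class name are assumptions, because this document does not specify the actual `EventEnvelope` format in `@alkdev/pubsub`.

```typescript
// Stand-in for the parts of a WebSocket this sketch uses (send + onmessage).
interface SocketLike {
  send(data: string): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Hypothetical envelope; the real EventEnvelope may carry more fields.
interface Envelope {
  type: string;
  detail: unknown;
}

class WebSocketEventTargetSketch<M> {
  private listeners = new Map<string, Set<(detail: unknown) => void>>();
  private socket: SocketLike;

  constructor(socket: SocketLike) {
    this.socket = socket;
    // Incoming text frames are unwrapped and delivered to local listeners.
    socket.onmessage = (event) => {
      if (typeof event.data !== "string") return;
      const envelope = JSON.parse(event.data) as Envelope;
      for (const listener of this.listeners.get(envelope.type) ?? []) listener(envelope.detail);
    };
  }

  // Outgoing events go to the remote peer, not to local listeners.
  dispatch<K extends keyof M & string>(type: K, detail: M[K]): void {
    const envelope: Envelope = { type, detail };
    this.socket.send(JSON.stringify(envelope));
  }

  on<K extends keyof M & string>(type: K, listener: (detail: M[K]) => void): () => void {
    const set = this.listeners.get(type) ?? new Set<(detail: unknown) => void>();
    const fn = listener as (detail: unknown) => void;
    set.add(fn);
    this.listeners.set(type, set);
    return () => {
      set.delete(fn);
    };
  }
}

// In-memory socket pair for exercising the sketch without a network.
function socketPair(): [SocketLike, SocketLike] {
  const a: SocketLike = { onmessage: null, send: (data) => b.onmessage?.({ data }) };
  const b: SocketLike = { onmessage: null, send: (data) => a.onmessage?.({ data }) };
  return [a, b];
}
```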
On the hub side, each spoke's websocket connection gets a WebSocketEventTarget. The hub creates a PendingRequestMap (from @alkdev/operations) scoped to that spoke. When the hub needs to call an operation on a specific spoke, it uses that spoke's PendingRequestMap.call() — the event traverses the websocket, the spoke handles it, the response comes back, the Promise resolves.
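The request/response correlation that `PendingRequestMap.call()` performs can be sketched as a map from `requestId` to a promise's `resolve`/`reject` pair. This is not the `@alkdev/operations` implementation. The class name, the `send` callback, the `req-N` ID scheme, and `abortAll()` are assumptions chosen to show the mechanism, including how a dropped socket fails every outstanding call at once.

```typescript
// Hypothetical wire shapes for this sketch.
interface CallRequested {
  type: "call.requested";
  requestId: string;
  operationId: string;
  input: unknown;
}

type CallReply =
  | { type: "call.responded"; requestId: string; output: unknown }
  | { type: "call.error"; requestId: string; message: string };

interface Pending {
  resolve: (output: unknown) => void;
  reject: (error: Error) => void;
}

class PendingRequestMapSketch {
  private pending = new Map<string, Pending>();
  private nextId = 0;
  private send: (msg: CallRequested) => void;

  constructor(send: (msg: CallRequested) => void) {
    this.send = send; // e.g. writes to this spoke's websocket
  }

  // Sends call.requested and returns a promise that the matching reply settles.
  call(operationId: string, input: unknown): Promise<unknown> {
    const requestId = `req-${++this.nextId}`; // assumed ID scheme
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(requestId, { resolve, reject });
      this.send({ type: "call.requested", requestId, operationId, input });
    });
  }

  // Feed every reply frame from this spoke's connection here.
  handle(reply: CallReply): void {
    const entry = this.pending.get(reply.requestId);
    if (!entry) return; // late or unknown reply: ignore it
    this.pending.delete(reply.requestId);
    if (reply.type === "call.responded") entry.resolve(reply.output);
    else entry.reject(new Error(reply.message));
  }

  // On disconnect, fail every call still waiting on this spoke.
  abortAll(reason: string): void {
    for (const entry of this.pending.values()) entry.reject(new Error(reason));
    this.pending.clear();
  }

  get size(): number {
    return this.pending.size;
  }
}
```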
Hub-Side WebSocket Handling (Architectural Task)
The hub needs a WebSocket server component that handles the other side of spoke connections. This is an architectural task that needs deeper design:
- Hono WebSocket upgrade: an `app.get("/ws", upgradeWebSocket(...))` handler
- Per-connection `WebSocketEventTarget`: create a `WebSocketEventTarget` for each incoming spoke connection
- Per-connection `PendingRequestMap`: a scoped `callMap` for dispatching to this specific spoke
- Spoke lifecycle: on connect, `hub.register` → create event target + call map → add to `RunnerPool`; on disconnect, abort in-flight calls → remove from pool
- Identity/authentication: verify the token at upgrade or in the first message, and attach the resulting identity to `OperationContext.identity`
This connects the pubsub system's WebSocketEventTarget (@alkdev/pubsub/event-target-websocket-client for spokes, @alkdev/pubsub/event-target-websocket-server for the hub) with the hub's PendingRequestMap and CallHandler (from @alkdev/operations). The full design needs to account for reconnection, heartbeat, and the interaction with the existing RedisEventTarget (@alkdev/pubsub) for cross-process event routing.
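The order of authentication and registration on a fresh connection is still open. Here is a minimal sketch that assumes the token-in-first-message option (see Open Questions) and requires `hub.register` as the first call. It models each connection as a small state machine: the first frame must authenticate, nothing but `hub.register` is accepted until registration, and closing always runs cleanup once. The state names, frame shapes, and hook names are hypothetical. In a Hono handler, `receive` and `close` would be wired to the socket's message and close callbacks.

```typescript
type ConnState = "awaiting-auth" | "awaiting-register" | "registered" | "closed";

type CallFrame = { type: "call.requested"; requestId: string; operationId: string; input: unknown };
type Frame = { type: "auth"; token: string } | CallFrame;

// Hypothetical hooks into the rest of the hub.
interface ConnectionHooks {
  verifyToken: (token: string) => string | undefined; // identity subject, or undefined
  onRegistered: (identity: string, input: unknown) => void; // create event target + call map, add to pool
  onCall: (identity: string, frame: CallFrame) => void; // hand off to CallHandler
  onClosed: () => void; // abort in-flight calls, remove from pool
  reject: (reason: string) => void; // e.g. close the socket with an error code
}

class SpokeConnection {
  state: ConnState = "awaiting-auth";
  private identity = "";
  private readonly hooks: ConnectionHooks;

  constructor(hooks: ConnectionHooks) {
    this.hooks = hooks;
  }

  receive(frame: Frame): void {
    if (this.state === "awaiting-auth") {
      const identity = frame.type === "auth" ? this.hooks.verifyToken(frame.token) : undefined;
      if (identity === undefined) return this.fail("authentication required");
      this.identity = identity;
      this.state = "awaiting-register";
    } else if (this.state === "awaiting-register") {
      if (frame.type !== "call.requested" || frame.operationId !== "hub.register") {
        return this.fail("hub.register must be the first call");
      }
      this.hooks.onRegistered(this.identity, frame.input);
      this.state = "registered";
    } else if (this.state === "registered" && frame.type === "call.requested") {
      this.hooks.onCall(this.identity, frame);
    }
    // Frames after close, and stray auth frames once registered, are ignored.
  }

  close(): void {
    if (this.state === "closed") return; // cleanup runs exactly once
    this.state = "closed";
    this.hooks.onClosed();
  }

  private fail(reason: string): void {
    this.hooks.reject(reason);
    this.close();
  }
}
```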
Hub-Side Operations
Spoke management and discovery are just operations in the hub's registry — the same ones the MCP interface exposes:
| Operation | Input | Output | Description |
|---|---|---|---|
| `hub.register` | `{ runnerId, operations[], spokeType, project, hardware }` | `{ status: "connected" }` | Register spoke, remap its operations |
| `hub.unregister` | `{ runnerId }` | `{ status: "disconnected" }` | Graceful disconnect, abort in-flight calls |
| `hub.list` | `{ namespace?, q? }` | `OperationSpec[]` | List available ops (filtered by caller identity) |
| `hub.search` | `{ q, namespace? }` | `{ tool, description }[]` | Search ops (filtered by caller identity) |
| `hub.schema` | `{ tool }` | `{ inputSchema, outputSchema }` | Get schemas for an operation |
| `hub.call` | `{ calls: [{ tool, input }] }` | `{ success, result/error }[]` | Execute operations (routes to correct spoke) |
When an MCP agent calls search, it's calling hub.search. When a spoke calls hub.register, it's using the same interface. One contract.
Routing in hub.call:
- Operation starts with `hub.*` → execute locally in the hub's registry
- Operation matches a spoke's remapped namespace → dispatch via that spoke's `WebSocketEventTarget`
- Operation not found → `OPERATION_NOT_FOUND` error via the call protocol
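The three routing rules can be sketched together with the batch shape of `hub.call` (`{ calls: [{ tool, input }] }` in, `{ success, result/error }[]` out). `Router`, `CallError`, and the `INTERNAL_ERROR` code are hypothetical. A real implementation would check access through `CallHandler` first and dispatch through the owning spoke's `PendingRequestMap`. Each call in the batch settles independently, so one failure does not fail the others.

```typescript
type CallResult =
  | { success: true; result: unknown }
  | { success: false; error: { code: string; message: string } };

class CallError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Hypothetical hooks into the hub's registry, RunnerPool, and per-spoke call maps.
interface Router {
  runLocal(tool: string, input: unknown): Promise<unknown>;
  spokeFor(tool: string): string | undefined;
  dispatchToSpoke(spokeId: string, tool: string, input: unknown): Promise<unknown>;
}

function route(router: Router, tool: string, input: unknown): Promise<unknown> {
  if (tool.startsWith("hub.")) return router.runLocal(tool, input); // hub-native
  const spokeId = router.spokeFor(tool);
  if (spokeId !== undefined) return router.dispatchToSpoke(spokeId, tool, input); // remapped spoke op
  return Promise.reject(new CallError("OPERATION_NOT_FOUND", `Unknown operation: ${tool}`));
}

function toError(reason: unknown): { code: string; message: string } {
  if (reason instanceof CallError) return { code: reason.code, message: reason.message };
  return { code: "INTERNAL_ERROR", message: reason instanceof Error ? reason.message : String(reason) };
}

// hub.call: run the batch concurrently; each entry succeeds or fails on its own.
async function hubCall(router: Router, calls: { tool: string; input: unknown }[]): Promise<CallResult[]> {
  const settled = await Promise.allSettled(calls.map(({ tool, input }) => route(router, tool, input)));
  return settled.map((s): CallResult =>
    s.status === "fulfilled" ? { success: true, result: s.value } : { success: false, error: toError(s.reason) },
  );
}
```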
What a Spoke Does NOT Have
- No Postgres connection
- No Redis connection
- No HTTP API server (it's a websocket client, not a server)
- No UI of any kind
- No session storage
- No task graph
- No call graph (the hub tracks the graph; the spoke just executes and responds)
- No separate "spoke protocol" — same operation interface as everyone else
It is an operation provider/consumer connected to the hub by a single websocket.
Composability Note
MCP as an RPC protocol has a fundamental limitation: you can't get return types from MCP servers, so MCP tools aren't composable. This is fine for LLMs calling tools interactively, but it breaks programmatic composition — you can't chain MCP tools together or build higher-level operations from MCP tool outputs. That's what started the toolEnv POC research in the first place.
Our operations avoid this because every operation has typed inputSchema and outputSchema (TypeBox/JSON Schema). You can compose: the output of dev.fs.read can feed into the input of hub.search because schemas are known and type-checkable. MCP tools can't do this.
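Here is a minimal TypeScript sketch of why declared output types make composition checkable. The `TypedOperation` shape, the `pipe` helper, and the three stub operations are hypothetical. Static generics stand in for the runtime `inputSchema`/`outputSchema` pair; in the hub, the equivalent check would compare schemas rather than TypeScript types. Because `hub.search` takes `{ q }`, a small adapter sits between it and `dev.fs.read`.

```typescript
// Hypothetical typed operation: both input and output types are declared.
interface TypedOperation<I, O> {
  id: string;
  run: (input: I) => Promise<O>;
}

// pipe() only type-checks when the first operation's output matches the second's input.
function pipe<A, B, C>(first: TypedOperation<A, B>, second: TypedOperation<B, C>): TypedOperation<A, C> {
  return {
    id: `${first.id} | ${second.id}`,
    run: async (input) => second.run(await first.run(input)),
  };
}

// Stub stand-ins for dev.fs.read and hub.search.
const readFile: TypedOperation<{ path: string }, { content: string }> = {
  id: "dev.fs.read",
  run: async ({ path }) => ({ content: `search term from ${path}` }),
};

const toQuery: TypedOperation<{ content: string }, { q: string }> = {
  id: "adapter.toQuery",
  run: async ({ content }) => ({ q: content.split(" ")[0] }),
};

const search: TypedOperation<{ q: string }, { tool: string; description: string }[]> = {
  id: "hub.search",
  run: async ({ q }) => [{ tool: `match.${q}`, description: "stub result" }],
};

const readThenSearch = pipe(pipe(readFile, toQuery), search);
// pipe(readFile, search) is rejected by the type checker: { content: string } is not { q: string }.
```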
Schema Wire Format
Schemas travel over the wire as JSON Schema, not as TypeBox objects. TypeBox schemas are a superset of JSON Schema (they add [Kind] symbols for runtime type checking), so JSON.parse(JSON.stringify(typeboxSchema)) produces valid JSON Schema. On the receiving end, FromSchema() decorates plain JSON Schema with [Kind] symbols to create TypeBox TSchema objects suitable for Value.Check() validation.
This means:
- TypeScript spokes using TypeBox: schemas serialize naturally. TypeBox schemas are already valid JSON Schema apart from the `[Kind]` symbols, which are dropped on serialization.
- TypeScript spokes using Zod or Valibot: the scanner converts these schemas to TypeBox at registration time via `@alkdev/operations/from-typemap` (see ADR-013), and they are then serialized as JSON Schema.
- Non-TypeScript spokes (Python, Rust, etc.): send JSON Schema directly. Any language with a JSON Schema library and a WebSocket client can implement a spoke. No TypeBox dependency is required.
- The hub deserializes incoming JSON Schema via `FromSchema()` (from `@alkdev/operations/from-schema`), the same path used for MCP tools and OpenAPI specs (from `@alkdev/operations/from-openapi`).
This makes the hub-spoke protocol language-agnostic at the schema level. The hub's internal use of TypeBox for validation is an implementation detail, not a protocol requirement.
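The serialization behavior above rests on a plain JavaScript fact: `JSON.stringify` skips symbol-keyed properties. The sketch below imitates a TypeBox-style schema with a hand-made `Kind` symbol, an assumption standing in for TypeBox's own. A toy `decorate()` then re-attaches the tag from the `type` keyword. This is a simplified illustration of what `FromSchema()` does, not its implementation. It handles only a few types and maps everything else to `"Unknown"`, mirroring the `Type.Unknown()` fallback.

```typescript
const Kind = Symbol("Kind"); // stand-in for TypeBox's Kind symbol

interface SchemaNode {
  [Kind]?: string;
  type?: string;
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode;
  [keyword: string]: unknown;
}

// Sender side: a TypeBox-like schema, i.e. JSON Schema keywords plus a symbol tag.
const userSchema: SchemaNode = {
  [Kind]: "Object",
  type: "object",
  properties: {
    name: { [Kind]: "String", type: "string" },
    tags: { [Kind]: "Array", type: "array", items: { [Kind]: "String", type: "string" } },
  },
  required: ["name"],
};

// What travels over the websocket: the symbols are gone, plain JSON Schema remains.
const wire: string = JSON.stringify(userSchema);

const kindByType: Record<string, string> = {
  object: "Object", array: "Array", string: "String", number: "Number",
  integer: "Integer", boolean: "Boolean", null: "Null",
};

// Receiver side (toy): walk plain JSON Schema and re-attach the Kind tag.
function decorate(node: SchemaNode): SchemaNode {
  const out: SchemaNode = { ...node, [Kind]: kindByType[node.type ?? ""] ?? "Unknown" };
  if (node.properties) {
    out.properties = Object.fromEntries(Object.entries(node.properties).map(([k, v]) => [k, decorate(v)]));
  }
  if (node.items) out.items = decorate(node.items);
  return out;
}

const received = decorate(JSON.parse(wire) as SchemaNode);
```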
Wire Schema Constraints
Schemas sent over the wire must be self-contained JSON Schema: no external `$ref`s and no `$defs`/`definitions`. The hub's `FromSchema()` converter handles the commonly used JSON Schema subset (objects, arrays, primitives, `allOf`/`anyOf`/`oneOf`, `enum`, `const`, format annotations) but not features like `patternProperties`, `if`/`then`/`else`, or `not` (see ADR-013 for the full coverage table).
The hub enforces security constraints on inbound schemas:
- Depth limit (suggested: 10 levels of nesting): prevents stack overflow from deeply nested `allOf`/`anyOf`
- Size limit (suggested: 64KB per schema): prevents oversized payloads
- No circular `$ref`s: the hub rejects schemas with `$ref` or `$defs`/`definitions`, or pre-processes them by inlining with cycle detection
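Here is a sketch of the inbound checks listed above, using the suggested limits (10 levels, 64KB). The function name and error strings are hypothetical. The sketch takes the "reject" option for references rather than inlining them. It counts depth over every nested JSON object or array, which is a stricter reading of "levels of nesting" than counting schema nodes only. The walk is iterative, so a hostile schema cannot overflow the stack while the hub checks it.

```typescript
const MAX_DEPTH = 10;
const MAX_BYTES = 64 * 1024;
const FORBIDDEN_KEYS = new Set(["$ref", "$defs", "definitions"]);

// Returns a list of violations; an empty list means the schema may be accepted.
function checkWireSchema(raw: string): string[] {
  const bytes = new TextEncoder().encode(raw).length;
  if (bytes > MAX_BYTES) return [`schema is ${bytes} bytes; limit is ${MAX_BYTES}`]; // don't parse it
  let root: unknown;
  try {
    root = JSON.parse(raw);
  } catch {
    return ["schema is not valid JSON"];
  }

  const errors: string[] = [];
  const stack: { value: unknown; depth: number }[] = [{ value: root, depth: 1 }];
  while (stack.length > 0) {
    const { value, depth } = stack.pop()!;
    if (typeof value !== "object" || value === null) continue; // primitives end the walk
    if (depth > MAX_DEPTH) {
      errors.push(`nesting exceeds ${MAX_DEPTH} levels`);
      break;
    }
    const children = Array.isArray(value)
      ? value
      : Object.entries(value).map(([key, child]) => {
          if (FORBIDDEN_KEYS.has(key)) errors.push(`"${key}" is not allowed; inline the referenced schema`);
          return child;
        });
    for (const child of children) stack.push({ value: child, depth: depth + 1 });
  }
  return errors;
}
```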
Unsupported JSON Schema features silently degrade to Type.Unknown() (accepts any value — safe but unvalidated). The hub should log degradation warnings to help spoke authors fix their schemas.
For "legacy" systems like opencode that only speak MCP, we expose an MCP endpoint as a thin adapter over the same hub.list/hub.search/hub.schema/hub.call operations. The MCP endpoint is a compatibility layer, not the primary interface.
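Here is a sketch of the adapter direction. An `OperationSpec` from `hub.list` becomes an MCP tool descriptor, and a `hub.call` result becomes MCP tool-call content. The MCP fields used (`name`, `description`, `inputSchema` on a tool; `content` and `isError` on a call result) come from the MCP specification. The `OperationSpec` field names and the choice to return results as a single JSON text block are assumptions.

```typescript
// Simplified hub-side shape (hypothetical field names).
interface OperationSpec {
  id: string;
  description: string;
  inputSchema: Record<string, unknown>;
  outputSchema: Record<string, unknown>;
}

// Subsets of the MCP Tool and CallToolResult shapes.
interface McpTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}
interface McpCallResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type HubCallResult =
  | { success: true; result: unknown }
  | { success: false; error: { code: string; message: string } };

// tools/list: forward what hub.list returned for this caller; only McpTool fields are copied.
function toMcpTools(specs: OperationSpec[]): McpTool[] {
  return specs.map((s) => ({ name: s.id, description: s.description, inputSchema: s.inputSchema }));
}

// tools/call: wrap one hub.call result as MCP text content.
function toMcpResult(r: HubCallResult): McpCallResult {
  return r.success
    ? { content: [{ type: "text", text: JSON.stringify(r.result) }] }
    : { content: [{ type: "text", text: `${r.error.code}: ${r.error.message}` }], isError: true };
}
```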
Open Questions
- How does a spoke receive its project context? Does the hub tell it which git repo to clone, or does it come pre-configured?
- Container lifecycle: see "Container Spoke (deferred)" above. Container lifecycle management will be handled by a container spoke that extends the base spoke.
- Source sync for external compute: does a GPU spoke clone from Gitea automatically, or does the hub push source?
- WebSocket auth: token in the first message after connect, or token in the query string / subprotocol header? (Related: hub-architecture.md API auth model)
- Concurrent operations per spoke: can a spoke handle multiple `call.requested` events concurrently? Concurrent handling is better for SUBSCRIPTION operations.
- Operation list freshness: does the spoke re-register on reconnect only, or does it push updates when its registry changes?