glm-5.1 e7941da04a docs: clarify phase boundaries — Phase 1 vs downstream concerns

The architecture specs were implying that StorageIdentityProvider, irpc
service implementations, and application services (agent, Docker, etc.)
already exist. This commit makes the phasing explicit:

- services.md: deployment topology now clearly labels 'Current (Phase 1)'
  vs 'Future (Phase 2+)', notes that application services are downstream
- identity.md: StorageIdentityProvider labeled 'Future — Phase 2+',
  clarifying alknet-storage doesn't exist yet
- storage.md: adds phase note that the crate hasn't been built yet,
  StorageIdentityProvider is a future impl
- ADR-028: ConfigAuthService is Phase 1 path, StorageAuthService is
  Phase 2+ contract
- call-protocol.md: Agent Service Pattern section explicitly framed as
  a downstream application concern, not a core requirement

2026-06-07 10:29:52 +00:00

17 KiB

status	last_updated
draft	2026-06-07

Call Protocol

What

A bidirectional, transport-agnostic call and event protocol that runs over authenticated pipes. It supports request/response calls, streaming subscriptions, and unidirectional events — all using the same wire format. The protocol is defined as a spec + handler + registry; downstream consumers (NAPI, Python, head/worker) register their own operations without modifying core.

Why

The current control channel (ADR-018) is unidirectional (client → server) and provides fire-and-forget event dispatch without request/response semantics. The call protocol generalizes it to support bidirectional calls (ADR-024) and downstream service registration (ADR-025), enabling the head/worker model where workers expose operations the head invokes.

Architecture

Operation Paths

Operation names use slash-based paths aligned with URL routing conventions:

/{node}/{service}/{op}

node — identity prefix of the node that exposes the operation. The head uses this segment to route calls to the correct connected node.
service — the logical service namespace. Groups related operations under one handler prefix.
op — the specific operation within that service.

Examples:

Path	Meaning
`/dev1/fs/readFile`	Node `dev1`, service `fs`, operation `readFile`
`/dev1/bash/exec`	Node `dev1`, service `bash`, operation `exec`
`/head/agent/chat`	Head's own `agent` service, operation `chat`
`/head/sessions/list`	Head's own `sessions` service, operation `list`
`/browser-1/notify/alert`	Worker `browser-1`, `notify` service

This three-level routing mirrors iroh's ALPN dispatch: the first segment routes to a connected node (like ALPN routes to a protocol handler), the remaining path dispatches within that node's registry. See ADR-025 for the handler/spec separation decision.

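The namespace field on OperationSpec is derived from the path. It is the service segment: the second segment of a fully qualified path (/dev1/fs/readFile → fs) or the first segment of a node-local name (/fs/readFile → fs). It's a convenience accessor for ACL matching and service grouping.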
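The path rules above can be sketched as a small parser. This is an illustration, not core's API: `OperationPath`, `parse_qualified`, and `namespace_of` are hypothetical names, and the sketch assumes a fully qualified path has exactly three non-empty segments while a node-local name has two.

```rust
/// The three segments of a fully qualified path `/{node}/{service}/{op}`.
#[derive(Debug, PartialEq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

/// Splits a fully qualified path. Returns `None` unless the path starts with
/// '/' and has exactly three non-empty segments.
pub fn parse_qualified(path: &str) -> Option<OperationPath<'_>> {
    let mut parts = path.strip_prefix('/')?.split('/');
    let node = parts.next()?;
    let service = parts.next()?;
    let op = parts.next()?;
    if parts.next().is_some() || node.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { node, service, op })
}

/// Derives the namespace (the service segment) from a fully qualified path
/// such as "/dev1/fs/readFile" or a node-local name such as "/fs/readFile".
pub fn namespace_of(path: &str) -> Option<&str> {
    let segments: Vec<&str> = path.strip_prefix('/')?.split('/').collect();
    if segments.iter().any(|s| s.is_empty()) {
        return None;
    }
    match segments.len() {
        3 => Some(segments[1]), // /{node}/{service}/{op}
        2 => Some(segments[0]), // /{service}/{op}
        _ => None,
    }
}
```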

Wire Format: EventEnvelope

Every message on the wire is a length-prefixed JSON EventEnvelope:

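```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

// serde drops the `r#` prefix, so the JSON key is "type".
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EventEnvelope {
    pub r#type: String, // Event type (e.g., "call.requested", "call.responded")
    pub id: String,     // Correlation key (requestId, topic, or "" for broadcasts)
    pub payload: Value, // JSON payload; schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```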
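A minimal sketch of the framing, assuming the JSON body has already been serialized (for example with serde_json). `encode_frame` and `try_decode_frame` are hypothetical helpers; the decoder returns `None` until a whole frame has arrived, which lets a stream reader keep buffering partial reads.

```rust
/// Prefixes a UTF-8 JSON body with its length as a 4-byte big-endian u32.
pub fn encode_frame(json_body: &[u8]) -> Vec<u8> {
    assert!(json_body.len() <= u32::MAX as usize, "frame body too large");
    let mut frame = Vec::with_capacity(4 + json_body.len());
    frame.extend_from_slice(&(json_body.len() as u32).to_be_bytes());
    frame.extend_from_slice(json_body);
    frame
}

/// Takes one complete frame off the front of `buf`, returning the body and the
/// number of bytes consumed, or `None` if more bytes are still needed.
pub fn try_decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None; // length prefix not complete yet
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let body = buf.get(4..4 + len)?; // None until the whole body has arrived
    Some((body, 4 + len))
}
```

After a successful decode, the caller drains the consumed byte count from its buffer and tries again, since one read may contain several frames.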

This is the same format used by @alkdev/pubsub adapters. It is JSON because it must be consumable from JavaScript, Python, and any other language. The envelope is transport-agnostic: it runs over SSH channels, WebTransport streams, iroh bidirectional streams, WebSocket, or Worker postMessage.

Binary payloads (postcard, protobuf, etc.) are base64-encoded in the payload field. The envelope itself stays JSON for cross-language compatibility.

Call Protocol Events

Five event types carry request/response and subscription semantics:

Event	Direction	Purpose
`call.requested`	Caller → Handler	Initiate a call or subscription
`call.responded`	Handler → Caller	Deliver a result (one for calls, many for subscriptions)
`call.completed`	Handler → Caller	Signal end of subscription stream
`call.aborted`	Either side	Cancel the call/subscription
`call.error`	Handler → Caller	Signal an error

call.error payload:

```json
{
  "code": "string",
  "message": "string",
  "retryable": false
}
```

A call is just a subscribe that resolves after one event. Both call() and subscribe() send the same call.requested event. The difference is consumption pattern:

call(): Sends call.requested, resolves Promise on first call.responded
subscribe(): Sends call.requested, yields each call.responded until call.completed or call.aborted

The id field carries the requestId for correlation.
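The two consumption patterns can be sketched over a channel of events for one requestId. `CallEvent`, `CallFailure`, `call`, and `subscribe` are hypothetical names, payloads are kept as JSON text, and std `mpsc` stands in for whatever async channel the real implementation uses. For simplicity, a `call.error` just ends a subscription here instead of being surfaced to the consumer.

```rust
use std::sync::mpsc::Receiver;

/// Events for one requestId, as seen by the caller (payloads as JSON text).
#[derive(Debug, Clone, PartialEq)]
pub enum CallEvent {
    Responded(String), // call.responded
    Completed,         // call.completed
    Aborted,           // call.aborted
    Error { code: String, message: String, retryable: bool }, // call.error
}

#[derive(Debug, PartialEq)]
pub enum CallFailure {
    Aborted,
    Error { code: String, message: String, retryable: bool },
    /// The stream ended (or the sender went away) before any response.
    Closed,
}

/// call(): resolves on the first call.responded.
pub fn call(events: &Receiver<CallEvent>) -> Result<String, CallFailure> {
    match events.recv() {
        Ok(CallEvent::Responded(payload)) => Ok(payload),
        Ok(CallEvent::Aborted) => Err(CallFailure::Aborted),
        Ok(CallEvent::Error { code, message, retryable }) => {
            Err(CallFailure::Error { code, message, retryable })
        }
        Ok(CallEvent::Completed) | Err(_) => Err(CallFailure::Closed),
    }
}

/// subscribe(): yields each call.responded until call.completed, call.aborted,
/// or call.error ends the stream.
pub fn subscribe(events: &Receiver<CallEvent>) -> impl Iterator<Item = String> + '_ {
    events.iter().map_while(|event| match event {
        CallEvent::Responded(payload) => Some(payload),
        _ => None,
    })
}
```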

Bidirectional Calls and Routing

Both sides of a connection can initiate calls. The head routes calls to workers using the first path segment:

```
Head (server)                              Worker: "dev1" (client)
     │                                           │
     │  call.requested                           │
     │  name: "/fs/readFile"                     │
     │  (routed from /dev1/fs/readFile)          │
     │  payload: { path: "/src/main.rs" }        │
     │──────────────────────────────────────────▶│
     │                                           │
     │  call.responded                           │
     │  id: <requestId>                          │
     │  payload: { content: "fn main()..." }     │
     │◀──────────────────────────────────────────│
     │                                           │
     │          Worker exposes /dev1/fs/*,       │
     │          /dev1/bash/* to head             │
     │                                           │
     │◀─ call.requested ─────────────────────────│
     │  name: "/head/agent/chat"                 │
     │  payload: { provider: "anthropic", ... }  │
     │                                           │
     │── call.responded ────────────────────────▶│
     │  id: <requestId>                          │
     │  payload: { completion: "..." }           │
```

The head's registry includes:

Head-local operations (/head/*) — handled directly
Remote operations (/{node}/*) — forwarded to the worker connection

When the head routes /dev1/fs/readFile to worker dev1, it strips the node prefix and delivers the call to the worker's local registry as /fs/readFile. The worker doesn't need to know its own alias.
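The head's routing decision can be sketched as a pure function of the path. This sketch assumes the head's own alias is `head`, that head-local operations keep their full `/head/...` path in the head's registry, and that two-segment core paths such as `/services/list` stay on the connected node. `Route` and `route` are hypothetical names.

```rust
/// Where the head sends a call.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the local registry (head-local or core short paths).
    Local { path: &'a str },
    /// Forwarded to worker `node`; `path` has the node prefix stripped.
    Forward { node: &'a str, path: &'a str },
}

/// Routes on the first path segment; `head_alias` is the head's own prefix.
pub fn route<'a>(path: &'a str, head_alias: &str) -> Option<Route<'a>> {
    let rest = path.strip_prefix('/')?;
    if rest.split('/').any(|s| s.is_empty()) {
        return None; // malformed: empty segment
    }
    match rest.split('/').count() {
        // Core short paths such as /services/list stay on this node.
        2 => Some(Route::Local { path }),
        3 => {
            let (node, _) = rest.split_once('/')?;
            if node == head_alias {
                Some(Route::Local { path })
            } else {
                // Drop "/{node}" but keep the leading '/' of the remainder.
                Some(Route::Forward { node, path: &path[1 + node.len()..] })
            }
        }
        _ => None,
    }
}
```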

Head/Worker Architecture

```
         ┌─────────────────────────────────┐
         │           Head Node             │
         │                                 │
         │  Head-local services:           │
         │  /head/agent/chat  (LLM coord)  │
         │  /head/agent/complete           │
         │  /head/sessions/list            │
         │  /head/sessions/history         │
         │                                 │
         │  Worker registry (discovered):  │
         │  /dev1/fs/* → dev1 connection   │
         │  /dev1/bash/* → dev1 connection │
         │  /dev2/fs/* → dev2 connection   │
         │  /browser-1/notify/* → WT conn  │
         └───┬───────────┬──────────────┬──┘
             │           │              │
      ┌──────▼────┐ ┌────▼─────┐ ┌──────▼───────────┐
      │  Worker   │ │  Worker  │ │  Browser Worker  │
      │  "dev1"   │ │  "dev2"  │ │  "browser-1"     │
      │  /fs/*    │ │  /fs/*   │ │  /notify/*       │
      │  /bash/*  │ │  /bash/* │ │                  │
      │  /search/*│ │          │ │                  │
      └───────────┘ └──────────┘ └──────────────────┘
```

When a worker connects, it registers its operations with the head:

```
worker → head:  call.requested { name: "/head/services/register", payload: {
  node: "dev1",
  operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```

The head adds these to its routing table with the node prefix. Other workers and browser clients can then call /dev1/fs/readFile without knowing how the head routes it internally.
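A routing table that consumes this registration payload might look like the following. `RoutingTable`, `ConnId`, and the method names are hypothetical; `remove_connection` illustrates the cleanup-on-disconnect option from OQ-20, which the spec has not yet settled.

```rust
use std::collections::HashMap;

/// Hypothetical connection handle; a real head would hold the transport here.
pub type ConnId = u64;

#[derive(Default)]
pub struct RoutingTable {
    /// Fully qualified path (e.g. "/dev1/fs/readFile") -> worker connection.
    routes: HashMap<String, ConnId>,
}

impl RoutingTable {
    /// Handles a /head/services/register payload by prefixing each
    /// node-local operation with the worker's node name.
    pub fn register(&mut self, node: &str, operations: &[&str], conn: ConnId) {
        for op in operations {
            self.routes.insert(format!("/{node}{op}"), conn);
        }
    }

    /// Which connection serves a fully qualified path, if any.
    pub fn lookup(&self, path: &str) -> Option<ConnId> {
        self.routes.get(path).copied()
    }

    /// Drops every route owned by a connection, e.g. when it closes.
    pub fn remove_connection(&mut self, conn: ConnId) {
        self.routes.retain(|_, c| *c != conn);
    }
}
```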

Operation Registry

The operation registry maps paths to specs and handlers. Specs and handlers are separate — downstream consumers register both (ADR-025).

```rust
use serde_json::Value;

pub struct OperationSpec {
    pub name: String,                  // e.g., "/fs/readFile", "/agent/chat"
    pub namespace: String,             // e.g., "fs", "agent" (derived from name)
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl, // Required scopes/resources
}

pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "/fs/readFile", "/search/query")
    Mutation,     // Side effects (e.g., "/bash/exec", "/sessions/create")
    Subscription, // Streaming (e.g., "/events/subscribe")
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,             // Caller needs all of these (AND)
    pub required_scopes_any: Option<Vec<String>>, // Caller needs at least one (OR)
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```
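One reading of the scope fields is sketched below: every entry in `required_scopes` must be held, and if `required_scopes_any` is present, at least one of its entries must be held. This mirrors the AND/OR comments above but is an assumption, as is treating an empty `required_scopes_any` list as unsatisfiable. `resource_type` and `resource_action` are not evaluated, `scopes_satisfied` is a hypothetical name, and the scope strings in the test are made up.

```rust
/// Standalone copy of the AccessControl fields for this sketch.
pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,   // not evaluated in this sketch
    pub resource_action: Option<String>, // not evaluated in this sketch
}

/// True when the caller holds every `required_scopes` entry (AND) and, if
/// `required_scopes_any` is set, at least one of its entries (OR).
/// An empty `required_scopes_any` list therefore denies every caller.
pub fn scopes_satisfied(ac: &AccessControl, caller_scopes: &[String]) -> bool {
    let has = |scope: &String| caller_scopes.contains(scope);
    let all_ok = ac.required_scopes.iter().all(has);
    let any_ok = match &ac.required_scopes_any {
        Some(any_of) => any_of.iter().any(has),
        None => true,
    };
    all_ok && any_ok
}
```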

Registration is separated from implementation:

```rust
// Core registers discovery operations
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);

// A dev env worker registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// A browser client registers notification UDFs
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```

Core-provided operations use short paths without a node prefix (/services/list, /services/schema). They live on whatever node the caller is connected to. Worker-prefixed operations (/dev1/fs/readFile) are routed by the head.
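The spec/handler split can be illustrated with a toy registry keyed by path. This is a sketch, not core's registry: `SpecLite`, `Handler`, and `Registry` are hypothetical, handlers take and return JSON text instead of `serde_json::Value`, and dispatch is synchronous.

```rust
use std::collections::HashMap;

/// Trimmed-down spec; the real OperationSpec also carries schemas and ACL.
#[derive(Debug, Clone)]
pub struct SpecLite {
    pub name: String,
    pub namespace: String,
}

/// Handler taking and returning JSON text in this sketch.
pub type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

#[derive(Default)]
pub struct Registry {
    ops: HashMap<String, (SpecLite, Handler)>,
}

impl Registry {
    /// Downstream code registers a spec and its handler together, without
    /// touching the registry's own code.
    pub fn register(&mut self, spec: SpecLite, handler: Handler) {
        self.ops.insert(spec.name.clone(), (spec, handler));
    }

    /// Looks up the handler for `name` and invokes it with the call input.
    pub fn dispatch(&self, name: &str, input: &str) -> Result<String, String> {
        match self.ops.get(name) {
            Some((_spec, handler)) => handler(input),
            None => Err(format!("unknown operation: {name}")),
        }
    }
}
```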

ACL Per Operation Path

Access control maps to path prefixes using standard URL-like matching:

Pattern	Matches	Purpose
`/dev1/*`	All operations on node `dev1`	Full access to a worker
`/*/fs/*`	`fs` service on any node	Read file access across dev envs
`/*/bash/*`	`bash` service on any node	Shell access (higher risk)
`/head/agent/*`	Head LLM agent	LLM calls
`/head/sessions/*`	Head session management	Session history
`/browser-1/notify/alert`	Specific operation on specific node	One UI notification

Higher-risk operations (shell, filesystem write) can require tighter scopes than read-only operations. The ACL evaluates against the caller's Identity.scopes and Identity.resources from the auth layer (see auth.md).
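Pattern matching over path segments could work as follows, assuming any-node patterns are written with a `*` segment (for example `/*/fs/*`). The matching rules are assumptions, not taken from the spec: a `*` before the last position matches exactly one segment, a trailing `*` matches one or more remaining segments, and every other segment must match literally. `pattern_matches` is a hypothetical name.

```rust
/// Matches an ACL pattern against a fully qualified operation path.
pub fn pattern_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let is_last = i == pat.len() - 1;
        if *p == "*" && is_last {
            // Trailing wildcard: at least one segment must remain.
            return segs.len() > i;
        }
        match segs.get(i) {
            // Inner wildcard matches any single segment; otherwise literal.
            Some(s) if *p == "*" || p == s => continue,
            _ => return false,
        }
    }
    // No trailing wildcard: lengths must agree exactly.
    pat.len() == segs.len()
}
```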

Service Discovery

The /services/list and /services/schema operations expose what a node offers. They are read-only; no admin operations are exposed:

Operation	Type	Description
`/services/list`	Query	List registered operation paths + metadata
`/services/schema`	Query	Get `OperationSpec` for a specific operation

These tell the caller: "here's what you can call." They are not a control panel. Access control is enforced at the operation level.
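A read-only discovery view can be sketched as a map from path to spec that offers only query methods, so nothing reachable through it can change the registry. `Discovery` and `SpecSummary` are hypothetical, and the schema is held as JSON text for brevity.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OperationType { Query, Mutation, Subscription }

#[derive(Debug, Clone, PartialEq)]
pub struct SpecSummary {
    pub name: String,
    pub op_type: OperationType,
    pub input_schema: String, // JSON Schema text in this sketch
}

/// Read-only view over registered specs: no insert or remove methods.
pub struct Discovery {
    specs: BTreeMap<String, SpecSummary>,
}

impl Discovery {
    pub fn new(specs: Vec<SpecSummary>) -> Self {
        Self { specs: specs.into_iter().map(|s| (s.name.clone(), s)).collect() }
    }

    /// /services/list: registered paths plus metadata, in path order.
    pub fn list(&self) -> Vec<(&str, OperationType)> {
        self.specs.values().map(|s| (s.name.as_str(), s.op_type)).collect()
    }

    /// /services/schema: the spec for one operation, if registered.
    pub fn schema(&self, name: &str) -> Option<&SpecSummary> {
        self.specs.get(name)
    }
}
```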

PendingRequestMap

Manages in-flight calls and subscriptions. Correlates call.responded events back to the original call.requested:

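```rust
use std::collections::HashMap;
use std::time::Instant;

use serde_json::Value;
use tokio::sync::{mpsc, oneshot}; // assumed async runtime: tokio

// `Result<Value>` assumes a crate-level `Result<T>` alias.
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>, // keyed by requestId
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value>>, // resolved by the first call.responded
        timeout: Instant,                   // deadline used by the sweep task
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value>>, // one message per call.responded
        timeout: Option<Instant>,        // None: no deadline
    },
}
```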

When a call.responded event arrives:

If PendingEntry::Call → resolve the oneshot, delete entry
If PendingEntry::Subscribe → push to the mpsc channel, keep entry alive

When call.completed arrives on a subscription → close the mpsc channel, delete entry. When call.aborted arrives, each side tears down its half of the call: the caller deletes the pending entry and drops its channel, and the handler cancels any in-flight work. A call.aborted for an unknown requestId is silently discarded; no error response is generated.

Timeouts prevent dangling entries. A background task sweeps expired entries periodically.
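The resolution rules above can be sketched with std channels. This is a simplified, synchronous stand-in for the `oneshot`/`mpsc` version: both entry kinds use `std::sync::mpsc`, payloads are JSON text, and `finish` is a hypothetical method that handles both `call.completed` and `call.aborted` by dropping the entry, which closes the receiver's stream.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::time::{Duration, Instant};

/// JSON payload text, or an error message.
pub type Reply = Result<String, String>;

enum PendingEntry {
    Call { tx: Sender<Reply>, deadline: Instant },
    Subscribe { tx: Sender<Reply>, deadline: Option<Instant> },
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn register_call(&mut self, id: &str, timeout: Duration) -> Receiver<Reply> {
        let (tx, rx) = mpsc::channel();
        let deadline = Instant::now() + timeout;
        self.pending.insert(id.to_string(), PendingEntry::Call { tx, deadline });
        rx
    }

    pub fn register_subscribe(&mut self, id: &str, timeout: Option<Duration>) -> Receiver<Reply> {
        let (tx, rx) = mpsc::channel();
        let deadline = timeout.map(|t| Instant::now() + t);
        self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx, deadline });
        rx
    }

    /// call.responded: a call resolves once and is removed; a subscription stays.
    pub fn on_responded(&mut self, id: &str, payload: String) {
        let is_call = matches!(self.pending.get(id), Some(PendingEntry::Call { .. }));
        if is_call {
            if let Some(PendingEntry::Call { tx, .. }) = self.pending.remove(id) {
                let _ = tx.send(Ok(payload)); // receiver may already be gone
            }
        } else if let Some(PendingEntry::Subscribe { tx, .. }) = self.pending.get(id) {
            let _ = tx.send(Ok(payload));
        }
        // Unknown ids fall through and are ignored.
    }

    /// call.completed / call.aborted: dropping the sender closes the stream.
    /// Unknown ids are a no-op, matching the silent-discard rule.
    pub fn finish(&mut self, id: &str) {
        self.pending.remove(id);
    }

    /// Periodic sweep: drop entries whose deadline has passed.
    pub fn sweep_expired(&mut self, now: Instant) {
        self.pending.retain(|_, entry| match entry {
            PendingEntry::Call { deadline, .. } => *deadline > now,
            PendingEntry::Subscribe { deadline, .. } => deadline.map_or(true, |d| d > now),
        });
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```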

Protocol Adapter Layer

The call protocol is transport-agnostic by design. It maps to any transport that carries EventEnvelope frames:

Transport	Channel mechanism	Direction
SSH	Reserved `direct_tcpip` destination (ADR-018)	Bidirectional over SSH channel
WebTransport	Bidirectional stream after CONNECT	Bidirectional over WT stream
iroh QUIC	Bidirectional `open_bi()` / `accept_bi()`	Bidirectional over QUIC stream
WebSocket	Single WS connection	Bidirectional over WS frames
Worker	`postMessage`	Bidirectional over structured clone

The framing is always: 4-byte BE length prefix + JSON. The envelope shape is the same regardless of transport.
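Because the framing is identical on every stream transport, it can be written once over `std::io::Read` and `std::io::Write`. This is a sketch with hypothetical `write_frame` / `read_frame` helpers; the `max_len` guard is an assumption not stated in the spec, added so a corrupt length prefix cannot trigger a huge allocation.

```rust
use std::io::{self, Read, Write};

/// Writes one frame: 4-byte big-endian length prefix, then the JSON body.
pub fn write_frame<W: Write>(w: &mut W, json_body: &[u8]) -> io::Result<()> {
    if json_body.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame body too large"));
    }
    w.write_all(&(json_body.len() as u32).to_be_bytes())?;
    w.write_all(json_body)?;
    w.flush()
}

/// Reads exactly one frame, rejecting bodies longer than `max_len` bytes.
pub fn read_frame<R: Read>(r: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds max_len"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(body)
}
```

Any blocking byte stream that implements these traits can carry EventEnvelope frames with this code; async transports would need the equivalent over their own read/write traits.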

Relationship to @alkdev/pubsub and @alkdev/operations

The call protocol in core is a Rust reimplementation of the same protocol defined in @alkdev/operations. The TypeScript implementation provides:

PendingRequestMap — request/response correlation
CallHandler — bridges pubsub events to operation registry
OperationSpec, AccessControl, Identity — type definitions

The Rust implementation mirrors these types and behaviors. TypeScript consumers continue using @alkdev/operations over @alkdev/pubsub adapters (including the event-target-alknet adapter). Rust consumers use core's registry directly. Both speak the same wire protocol and can interoperate.

The key principle: the same EventEnvelope can flow from a Rust handler through core, out over SSH channel, into a JavaScript pubsub adapter, and be dispatched through @alkdev/operations's call handler — with zero translation at the wire level.

Agent Service Pattern (Future)

An agent service — coordinating between LLM providers and tool calls — is a primary use case for the call protocol. It would be just another set of registered operations with no special treatment:

/head/agent/chat — send a message, get a completion. Routes to the appropriate LLM provider based on available workers and configuration.
/head/agent/complete — streaming completion. Yields tokens as they arrive.
/head/sessions/list — list session histories (backed by Honker or other durable storage).
/head/sessions/history — retrieve a specific session's message history.

The agent service would use the same call protocol to invoke tools on workers (e.g., /dev1/fs/readFile for file access, /dev1/bash/exec for shell commands). This is a downstream application concern, not a core requirement. The call protocol enables it by providing the universal composition mechanism (OperationEnv, ADR-033), but the agent service itself is built on top, not into the core.

Constraints

The call protocol does not depend on Honker, SQLite, or any database. The PendingRequestMap is in-memory. Durable session storage is a consumer concern.
Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf) can be carried as base64-encoded blobs in the payload, but the envelope itself is always JSON.
Service discovery (/services/list, /services/schema) is read-only. No admin operations are exposed through the call protocol itself.
Batch is not a protocol primitive. Sending several call.requested events, each with its own requestId, provides equivalent semantics.
The node prefix in the operation path is a routing mechanism, not a security boundary. ACL is enforced at the AccessControl level, not by path prefix alone. A worker that exposes /dev1/bash/exec can restrict access via required_scopes — not every authenticated identity should have shell access.

Open Questions

OQ-20: How does the head track which workers expose which operations when workers connect and disconnect? Registration on connect and cleanup on disconnect, or heartbeat-based discovery? See open-questions.md.
OQ-22: Should the call protocol support streaming inputs (client streaming in gRPC terms), or is client→server always a single request payload with streaming only server→client? See open-questions.md.

Design Decisions

ADR	Decision	Summary
018	Control channel for pubsub	Reserved destination for event bus
024	Bidirectional call protocol	Generalizes ADR-018, both sides can call
025	Handler/spec separation	Downstream registers operations without modifying core

References

auth.md — Identity and IdentityProvider trait
napi-and-pubsub.md — NAPI wrapper and pubsub adapter
server.md — Channel handling and control channel routing
transport.md — Transport abstraction
configuration.md — ForwardingPolicy, service metadata
@alkdev/pubsub — TypeScript event target adapters and EventEnvelope
@alkdev/operations — TypeScript call protocol, OperationSpec, registry
@alkdev/storage — peer_credentials table, ACL graph, Identity
irpc — iroh streaming RPC (postcard-only, Rust-to-Rust)
iroh — P2P QUIC transport
