alknet/docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md
glm-5.2 c62a6adc7b docs(architecture): resolve review #002 Tiers 1-3 — mechanical and consistency fixes
Governance (Tier 2):
- Advance ADR-022 and ADR-023 from Proposed to Accepted (specs already
  depend on their types as source of truth)
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
  update handler_identity type to CompositionAuthority
- Amend ADR-002: note handle() signature revised by ADR-007 (BiStream → Connection)
- Amend ADR-004: note 'enrich/replace' AuthContext language superseded by
  ADR-011's immutability model; update to describe set_identity on Connection
- Update main README ADR table to show ADR-022/023 as Accepted

Spec-ADR consistency (Tier 3):
- Add abort_policy: AbortPolicy field to OperationContext struct (ADR-016
  Decision 6 mandated this but the spec omitted it)
- Define AbortPolicy enum (AbortDependents | ContinueRunning) with Default impl
- Add abort_policy to build_root_context and LocalOperationEnv::invoke()
- Define the OperationEnv trait explicitly with invoke() and
  invoke_with_policy() methods (was referenced as 'must remain a trait'
  but never defined)
- Specify From<StreamError> for HandlerError impl with exact variant mapping
- Add Connection::from_quinn() / from_iroh() constructors (was referenced
  as Connection::new() but never defined)
- Remove undefined CertAuthorityEntry placeholder from AuthPolicy v1 (will
  be added additively when alknet-ssh lands)
- Fix config.md key-differences table: rate limits are in DynamicConfig,
  not StaticConfig

Mechanical fixes (Tier 1):
- overview.md: 'closes the QUIC stream' → 'closes the connection' (stale
  from pre-ADR-007 model)
- overview.md: OQ-04 entry updated from stale 'defer to implementation'
  to 'resolved: static at startup'
- mnemonic-derivation.md: remove duplicate helper functions block (incomplete
  first copy, complete second copy)
- ADR-003: add iroh (feature-gated) to alknet-core dependency list, added
  by ADR-010
- ADR-021: fix ambiguous 'W1 drift issue from the vault review' cross-reference
- ADR-022: rephrase FromCall 'leaf locally' to 'leaf in the local registry'
- ADR-017: add error_schemas to from_call mirror list and services/schema
  step (inconsistency with ADR-023)
- ADR-016: fix self-referential citation ('ADR-016 Assumption 5' → 'Assumption 5')
- Add ScopedOperationEnv::empty(), allows(), new() and
  CompositionAuthority::none(), new() impl blocks (referenced but undefined)
- Add call.completed clarification for non-subscription calls
- Add services/schema leading-slash normalization note
- Crate README ADR tables: add missing ADR-013 (call), ADR-015 (core),
  ADR-006 + ADR-010 (vault)
- Vault README: add consolidated 'Known Source Drift' table tracking all
  four drift items (OsRng, unwrap, CURRENT_KEY_VERSION, spawn bug) in one
  place, including the two previously missing from README
2026-06-22 05:46:37 +00:00


ADR-017: Call Protocol Client and Adapter Contract

Status

Accepted

Context

The call protocol spec (ADR-012) defined the stream model as bidirectional — "both sides can initiate calls." But the spec only described the server side: CallAdapter implements ProtocolHandler, accepts incoming QUIC connections, and dispatches to the operation registry. The client side — who opens the connection, how calls are sent, how remote operations are discovered and imported — was left as OQ-15.

The need for the client side is concrete and immediate:

  • Head/worker dispatch: a head node manages worker nodes (Vast.ai, RunPod, local Docker). The head needs to call operations on workers (exec, sync, status) and workers need to call back (report status, request work). The POC at /workspace/@alkdev/dispatch demonstrated this over SSH+axum; under the call protocol, it's cross-node composition.
  • NAPI/Python adapters: Node.js and Python clients need to call operations on an alknet node. They speak the EventEnvelope wire format over a QUIC connection.
  • Agent tool dispatch: an agent handler needs to call operations on remote nodes (tools, services) the same way it calls local operations — through OperationEnv::invoke(). The from_call adapter makes remote operations appear in the local registry.
  • Cross-protocol interop: external systems (HTTP APIs, MCP servers) are imported via from_openapi and from_mcp. The reverse direction — exposing local operations to external systems — needs to_openapi and to_mcp.

The @alkdev/operations TypeScript package demonstrated the adapter patterns (from_openapi, from_mcp) and the buildEnv composition mechanism. The Rust implementation defines the canonical traits (ADR-013).

OQ-15 was constrained by ADR-014 (adapters take credential sources, not static tokens) and ADR-015 (adapter-registered operations are Internal by default). This ADR locks the remaining one-way door: the client/adapter contract architecture.

Decision

1. CallClient opens connections and shares the dispatch loop

CallClient opens a QUIC connection to a remote node with ALPN alknet/call. Once connected, the connection is symmetric — both sides can send and receive call.requested. The CallClient is not just a caller; it is also a callee. It has its own operation registry to dispatch incoming calls from the remote side.

pub struct CallClient {
    registry: Arc<OperationRegistry>,
    identity_provider: Arc<dyn IdentityProvider>,
}

impl CallClient {
    pub async fn connect(&self, addr: SocketAddr, credentials: CallCredentials) -> Result<CallConnection>;
}

The dispatch loop is shared between CallAdapter and CallClient. Once a connection is established (whether accepted by the adapter or opened by the client), the same logic applies: read EventEnvelope frames, dispatch to the operation registry, write responses, and send outgoing call.requested events for calls initiated on this side. The only difference is who opened the connection.
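The shared dispatch step can be sketched synchronously. This sketch assumes a simplified frame enum in place of EventEnvelope. `Frame`, `Handler`, `Registry` and `dispatch` are hypothetical names, not the alknet-call API, and the real loop is async over QUIC. What it illustrates is that nothing in the function describes who opened the connection.

```rust
use std::collections::HashMap;

// Hypothetical, simplified frame type; the real EventEnvelope carries more fields.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Requested { id: u64, operation_id: String, input: String },
    Responded { id: u64, output: String },
    Error { id: u64, message: String },
}

// Hypothetical handler shape: input string in, output or error string out.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

// Stand-in for the operation registry: operation id -> handler.
struct Registry {
    handlers: HashMap<String, Handler>,
}

// Process one inbound frame. The same function serves a connection accepted by
// CallAdapter or opened by CallClient; nothing here depends on who connected.
fn dispatch(registry: &Registry, frame: Frame, outbox: &mut Vec<Frame>, replies: &mut Vec<Frame>) {
    match frame {
        // The peer is calling us: run the local handler and queue the reply.
        Frame::Requested { id, operation_id, input } => {
            let reply = match registry.handlers.get(&operation_id) {
                Some(handler) => match handler(input.as_str()) {
                    Ok(output) => Frame::Responded { id, output },
                    Err(message) => Frame::Error { id, message },
                },
                None => Frame::Error { id, message: format!("unknown operation {operation_id}") },
            };
            outbox.push(reply);
        }
        // A reply to a call this side initiated: hand it to the waiting caller.
        other => replies.push(other),
    }
}
```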

CallConnection provides:

  • call(operation_id, input) -> ResponseEnvelope — send call.requested, await call.responded (one result)
  • subscribe(operation_id, input) -> Stream<ResponseEnvelope> — send call.requested, yield each call.responded until call.completed or call.aborted
  • abort(request_id) — send call.aborted, cascade to descendants (ADR-016)
  • services_list() -> Vec<OperationSpec> — call services/list
  • services_schema(name) -> OperationSpec — call services/schema
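The request/response correlation behind CallConnection can be sketched with std channels. This sketch assumes hypothetical `Event` and `PendingCalls` types and numeric request ids. The `outgoing` vector stands in for frames written to the connection. A call() would take the first payload from the receiver. A subscribe() would iterate the receiver until call.completed or call.aborted drops the sender and ends the stream.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical wire events (a simplified view of EventEnvelope types).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Requested { id: u64, operation_id: String },
    Responded { id: u64, payload: String },
    Completed { id: u64 },
    Aborted { id: u64 },
}

// Caller-side bookkeeping: each outgoing request id maps to the channel that
// receives the call.responded payloads for that request.
struct PendingCalls {
    next_id: u64,
    waiters: HashMap<u64, Sender<String>>,
    outgoing: Vec<Event>, // stands in for frames written to the connection
}

impl PendingCalls {
    fn new() -> Self {
        Self { next_id: 1, waiters: HashMap::new(), outgoing: Vec::new() }
    }

    // Send call.requested and return the request id plus a receiver for its responses.
    fn request(&mut self, operation_id: &str) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.waiters.insert(id, tx);
        self.outgoing.push(Event::Requested { id, operation_id: operation_id.to_string() });
        (id, rx)
    }

    // Route an inbound event. Removing the sender ends the receiver's iteration.
    fn on_event(&mut self, event: Event) {
        match event {
            Event::Responded { id, payload } => {
                if let Some(tx) = self.waiters.get(&id) {
                    let _ = tx.send(payload); // ignore callers that stopped listening
                }
            }
            Event::Completed { id } | Event::Aborted { id } => {
                self.waiters.remove(&id);
            }
            Event::Requested { .. } => {} // incoming calls go to the dispatch loop instead
        }
    }

    // abort(request_id): stop waiting locally and send call.aborted to the peer.
    fn abort(&mut self, id: u64) {
        self.waiters.remove(&id);
        self.outgoing.push(Event::Aborted { id });
    }
}
```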

2. Connection direction is independent of call direction

Who opens the QUIC connection (who has the public IP, who uses a relay, who connects out reverse-runner style) is a connection-layer concern, not a protocol-layer concern. Once connected, both sides can call each other.

| Topology | Who advertises | Who opens connection | Who can call whom |
|---|---|---|---|
| Public service | Server (public IP/domain) | Client | Both directions |
| P2P (iroh relay) | Both (relay-assisted) | Either | Both directions |
| Reverse (runner pattern) | Head (public IP) | Worker connects out | Both directions |
| Reverse (dispatch pattern) | Worker (public SSH port) | Head connects out | Both directions |

The protocol does not distinguish "server" and "client" after connection establishment. The CallAdapter accepts connections; the CallClient opens connections. Both dispatch incoming and outgoing calls through the same mechanism.

3. from_call adapter imports remote operations

from_call does for call protocol endpoints what from_openapi does for HTTP APIs: discovers operations and registers them in the local registry with forwarding handlers.

pub async fn from_call(
    connection: &CallConnection,
    config: FromCallConfig,
) -> Vec<(OperationSpec, Handler)>;

The adapter:

  1. Calls services/list on the remote node → gets the list of External operations
  2. Calls services/schema for each → gets the input/output JSON Schemas and declared error_schemas (ADR-023)
  3. For each discovered operation, constructs an (OperationSpec, Handler) pair:
    • The spec mirrors the remote operation's name, namespace, type, schemas (input, output, and error_schemas — ADR-023), and access control
    • The handler sends call.requested through the CallConnection and awaits call.responded (or streams for subscriptions)
  4. The caller registers these pairs in their local registry

from_call-registered operations are Internal by default (ADR-015) — they are composition material, not directly callable from the wire. The handler that composes them is External.
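Steps 3 and 4 of from_call, plus the Internal-by-default rule, can be sketched as follows. The trimmed `OperationSpec`, the `Visibility` enum, the `Remote` trait (standing in for CallConnection), `EchoRemote` and `build_pairs` are all hypothetical. The handler is synchronous for brevity.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
enum Visibility {
    External,
    Internal,
}

// Hypothetical, trimmed OperationSpec; schemas are kept as raw JSON strings.
#[derive(Debug, Clone)]
struct OperationSpec {
    namespace: String,
    name: String,
    input_schema: String,
    output_schema: String,
    error_schemas: Vec<String>,
    visibility: Visibility,
}

type Handler = Box<dyn Fn(String) -> Result<String, String> + Send + Sync>;

// Stand-in for CallConnection: send call.requested, await call.responded.
trait Remote: Send + Sync {
    fn call(&self, operation_id: &str, input: String) -> Result<String, String>;
}

// Test double that echoes which remote operation was called.
struct EchoRemote;

impl Remote for EchoRemote {
    fn call(&self, operation_id: &str, input: String) -> Result<String, String> {
        Ok(format!("{operation_id} <- {input}"))
    }
}

// Mirror each discovered remote spec and pair it with a forwarding handler.
fn build_pairs(remote: Arc<dyn Remote>, discovered: Vec<OperationSpec>) -> Vec<(OperationSpec, Handler)> {
    discovered
        .into_iter()
        .map(|remote_spec| {
            let operation_id = format!("{}/{}", remote_spec.namespace, remote_spec.name);
            let conn = Arc::clone(&remote);
            // The handler only forwards; it knows nothing about QUIC.
            let handler: Handler = Box::new(move |input: String| -> Result<String, String> {
                conn.call(&operation_id, input)
            });
            // Schemas are mirrored as-is; visibility is forced to Internal (ADR-015).
            let local_spec = OperationSpec { visibility: Visibility::Internal, ..remote_spec };
            (local_spec, handler)
        })
        .collect()
}
```

The forwarding handler captures only a connection handle and the remote operation id. That is all it needs to stay transparent to the composing handler.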

The FromCallConfig includes:

  • The credential source for the outbound connection (ADR-014) — TLS identity, auth token, or capability-provided credentials
  • An optional namespace prefix (to avoid collisions when importing from multiple remote nodes)
  • An optional operation filter (to import only specific operations)
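The namespace prefix and the operation filter might combine as shown below when naming imported operations. `FromCallConfig`'s field names, the `CredentialSource` variants and `local_id` are assumptions for illustration. The prefix is joined with `/` here only because operation ids in this ADR are written as `namespace/name`.

```rust
// Hypothetical credential source shapes, mirroring the three options in the text.
#[derive(Debug, Clone)]
enum CredentialSource {
    TlsIdentity,
    AuthToken(String),
    Capability(String), // hypothetical capability name
}

struct FromCallConfig {
    credentials: CredentialSource,
    namespace_prefix: Option<String>,
    filter: Option<Vec<String>>, // remote operation ids to import; None imports all
}

impl FromCallConfig {
    // Local id for a remote operation, or None if the filter excludes it.
    fn local_id(&self, remote_id: &str) -> Option<String> {
        if let Some(allowed) = &self.filter {
            if !allowed.iter().any(|a| a.as_str() == remote_id) {
                return None;
            }
        }
        Some(match &self.namespace_prefix {
            Some(prefix) => format!("{prefix}/{remote_id}"),
            None => remote_id.to_string(),
        })
    }
}
```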

4. to_openapi and to_mcp adapters export local operations

The reverse direction — exposing local operations to external systems:

  • to_openapi: generates an OpenAPI spec from the local registry's External operations. External systems (HTTP clients, API gateways) can discover and call alknet operations through a standard HTTP interface.
  • to_mcp: exposes local operations as MCP tools. MCP clients (editors, AI tools) can discover and call alknet operations through the MCP protocol.

These adapters are outbound bridges — they translate the call protocol's operation model into external protocol formats. They do not modify the local registry; they project it.
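to_openapi can be sketched as a read-only projection. This sketch assumes hypothetical `OpType` and `Visibility` enums and builds the JSON string by hand instead of using a real OpenAPI library. Mapping every operation to `POST /namespace/name` and omitting subscriptions are illustrative choices, not decisions made by this ADR.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Visibility {
    External,
    Internal,
}

// Hypothetical operation types.
#[derive(Debug, Clone, PartialEq)]
enum OpType {
    Query,
    Mutation,
    Subscription,
}

struct OperationSpec {
    namespace: String,
    name: String,
    op_type: OpType,
    visibility: Visibility,
}

// Project the registry into a minimal OpenAPI 3.1 document. The registry is
// only borrowed: the projection never modifies it.
fn to_openapi_paths(registry: &[OperationSpec]) -> String {
    let entries: Vec<String> = registry
        .iter()
        .filter(|s| s.visibility == Visibility::External)
        // Subscriptions have no clean request/response mapping; this sketch omits them.
        .filter(|s| s.op_type != OpType::Subscription)
        .map(|s| {
            format!(
                r#""/{}/{}":{{"post":{{"operationId":"{}_{}"}}}}"#,
                s.namespace, s.name, s.namespace, s.name
            )
        })
        .collect();
    format!(
        r#"{{"openapi":"3.1.0","info":{{"title":"alknet","version":"0.1.0"}},"paths":{{{}}}}}"#,
        entries.join(",")
    )
}
```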

5. The adapter contract trait

The adapter patterns share a common shape: they produce (OperationSpec, Handler) pairs that register in the local registry. The trait:

pub trait OperationAdapter: Send + Sync {
    fn import(&self) -> Vec<(OperationSpec, Handler)>;
}

Implementations:

  • FromOpenAPI — imports from an OpenAPI spec (HTTP-backed handlers)
  • FromMCP — imports from an MCP server (MCP-backed handlers)
  • FromCall — imports from a remote call protocol endpoint (call-protocol-backed handlers)
  • FromJsonSchema — imports from a JSON Schema definition (schema-only, no handler — used for validation or client generation)

The to_* adapters are outbound projections, not OperationAdapter implementations — they consume the registry, they don't produce entries for it.

The specific trait signatures (async vs sync, error types, configuration parameters) are two-way doors for implementation. The one-way door is the architectural commitment that adapters produce (OperationSpec, Handler) pairs and live in alknet-call.
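A toy adapter shows the contract in use. `StaticAdapter`, the one-field `OperationSpec`, the string-returning `Handler` and `register_from` are hypothetical. The trait is written in the sync, infallible form quoted above, a detail this ADR leaves open.

```rust
use std::collections::HashMap;

// Hypothetical minimal stand-ins for the real OperationSpec and Handler.
#[derive(Debug, Clone)]
struct OperationSpec {
    id: String,
}
type Handler = Box<dyn Fn(&str) -> String + Send + Sync>;

// The adapter contract in the form quoted in this ADR.
trait OperationAdapter: Send + Sync {
    fn import(&self) -> Vec<(OperationSpec, Handler)>;
}

#[derive(Default)]
struct OperationRegistry {
    ops: HashMap<String, (OperationSpec, Handler)>,
}

impl OperationRegistry {
    // Every adapter registers the same way, whatever backs its handlers.
    fn register_from(&mut self, adapter: &dyn OperationAdapter) -> usize {
        let pairs = adapter.import();
        let count = pairs.len();
        for (spec, handler) in pairs {
            self.ops.insert(spec.id.clone(), (spec, handler));
        }
        count
    }
}

// Toy adapter standing in for FromOpenAPI, FromMCP or FromCall.
struct StaticAdapter {
    ids: Vec<String>,
}

impl OperationAdapter for StaticAdapter {
    fn import(&self) -> Vec<(OperationSpec, Handler)> {
        self.ids
            .iter()
            .map(|id| {
                let tag = id.clone();
                let handler: Handler = Box::new(move |input: &str| format!("{tag}({input})"));
                (OperationSpec { id: id.clone() }, handler)
            })
            .collect()
    }
}
```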

6. Cross-node call tree and abort cascade

When a from_call handler sends call.requested to a remote node, the call participates in the local call tree via parent_request_id. If the parent is aborted, the cascade (ADR-016) reaches the from_call handler, which sends call.aborted to the remote node. The remote node cascades to its own descendants. The abort crosses the node boundary transparently.

Head node                                    Worker node
  r1: /dispatch/run_training
    r1-a: worker/exec (from_call handler)
      → call.requested { id: r1-a } ────────→ receives, dispatches to exec
                                                r1-a-1: exec spawns child
  user aborts r1
    cascade to r1-a
      from_call handler sends:
        call.aborted { id: r1-a } ───────────→ receives, cascades to r1-a-1
                                                  aborts exec and children
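The local half of the cascade in the diagram can be sketched as a tree walk over parent_request_id links. `CallNode` and `CallTree` are hypothetical. So is the `forwarded` flag, which marks from_call handlers whose request was sent to a remote node. The returned ids are the call.aborted frames to send across the connection. The remote node then performs the same walk on its own tree.

```rust
use std::collections::HashMap;

// Hypothetical call-tree entry keyed by request id.
struct CallNode {
    parent: Option<String>, // parent_request_id
    forwarded: bool,        // true for from_call handlers that sent call.requested remotely
}

struct CallTree {
    nodes: HashMap<String, CallNode>,
}

impl CallTree {
    // Abort `id` and all descendants. Returns the ids that need a call.aborted
    // sent to a remote node, sorted for determinism.
    fn abort(&mut self, id: &str) -> Vec<String> {
        let mut to_visit = vec![id.to_string()];
        let mut remote_aborts = Vec::new();
        while let Some(current) = to_visit.pop() {
            // Queue children before removing the current node.
            let children: Vec<String> = self
                .nodes
                .iter()
                .filter(|(_, node)| node.parent.as_deref() == Some(current.as_str()))
                .map(|(child_id, _)| child_id.clone())
                .collect();
            to_visit.extend(children);
            if let Some(node) = self.nodes.remove(&current) {
                if node.forwarded {
                    remote_aborts.push(current);
                }
            }
        }
        remote_aborts.sort();
        remote_aborts
    }
}
```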

7. Credential sources for connections

The CallClient needs credentials to authenticate to the remote node. These come from capabilities (ADR-014), not environment variables. The credential types:

  • TLS identity: the local node's Ed25519 key (RFC 7250 raw key) or X.509 cert, derived from the vault at startup
  • Auth token: an opaque token for call-protocol-level authentication, decrypted from the vault or derived from a shared secret
  • Remote identity verification: the expected fingerprint or cert of the remote node, stored as a capability (not an env var or config file)

The from_call adapter receives these credentials at registration time, same as from_openapi receives HTTP credentials.
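Remote identity can be checked against a pinned capability value, as sketched below. The `CallCredentials` shape here is a guess at the parameter of CallClient::connect, not its definition. `RemoteIdentity` and `verify_peer` are hypothetical. A real implementation would enforce the pin inside the TLS handshake rather than compare values afterwards.

```rust
// Expected identity of the remote node, loaded from a capability (ADR-014).
#[derive(Debug, Clone, PartialEq)]
enum RemoteIdentity {
    RawKey([u8; 32]), // Ed25519 public key presented as an RFC 7250 raw key
    CertDer(Vec<u8>), // X.509 certificate in DER form
}

// Hypothetical credential bundle handed to from_call at registration time.
struct CallCredentials {
    local_key: [u8; 32],         // local Ed25519 secret, derived from the vault at startup
    auth_token: Option<String>,  // opaque call-protocol token, if the remote requires one
    expected_remote: RemoteIdentity,
}

// Accept the peer only if it presents exactly the pinned identity.
fn verify_peer(creds: &CallCredentials, presented: &RemoteIdentity) -> Result<(), String> {
    if &creds.expected_remote == presented {
        Ok(())
    } else {
        Err("remote identity does not match the pinned capability".to_string())
    }
}
```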

Consequences

Positive:

  • Cross-node composition works the same as local composition. A handler calls env.invoke("worker", "exec", ...) and doesn't know (or care) whether worker/exec is a local operation or a from_call-imported remote operation. The composition is transparent.
  • The head/worker pattern (dispatch, runners) is a connection topology, not a protocol feature. Workers can connect to heads (runner pattern) or heads can connect to workers (dispatch pattern) — the protocol handles both.
  • from_call is the same pattern as from_openapi and from_mcp: discover, register, forward. The adapter contract is unified.
  • to_openapi and to_mcp enable interop with non-alknet systems without those systems needing to speak EventEnvelope.
  • The abort cascade (ADR-016) crosses node boundaries transparently. No consumer needs to implement cross-node abort propagation.
  • The NAPI and Python adapters can use CallClient directly to call remote operations — they don't need a separate client implementation.

Negative:

  • CallClient has its own operation registry (for dispatching incoming calls from the remote side). That registry defines what this node exposes to one specific remote peer, so it is not necessarily the node's global registry and must be populated with the operations intended for that peer. The specific mechanism (sharing the global registry, a peer-scoped subset, or a separate registry) is a two-way door.
  • from_call-registered operations have a latency cost: each invocation sends a call.requested over QUIC and awaits a call.responded. This is inherent to remote calls and not specific to the adapter pattern. Caching or batching strategies are consumer concerns.
  • The to_* adapters need to translate the call protocol's operation model (JSON Schema, EventEnvelope, subscribe/stream) into external formats (OpenAPI paths, MCP tools). Some semantics don't map cleanly (e.g., subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle these with best-effort mappings and document the gaps.
  • The CallConnection abstraction adds a layer between the handler and the raw QUIC stream. This is necessary for the from_call handler to be transparent — it shouldn't know about QUIC streams, only about call/request semantics.

Assumptions

  1. The connection is symmetric after establishment. Both sides can send and receive call.requested. If a future use case requires one-directional connections (e.g., a fire-and-forget notification where the receiver can't call back), the model needs extension. The assumption is that bidirectional is the correct default.

  2. services/list and services/schema are the discovery mechanism for from_call. The remote node exposes its External operations through these built-in operations. If a remote node doesn't support service discovery (e.g., a minimal worker that only accepts specific calls), from_call needs an alternative discovery mechanism (static config, manual spec). The assumption is that nodes participating in cross-node composition support service discovery.

  3. The from_call handler is transparent to composition. A handler that calls env.invoke("worker", "exec", ...) doesn't know it's a remote call. If the remote node is unreachable or the connection drops, the handler gets a call.error (same as a local handler error). The assumption is that remote call failures are handled the same as local handler failures.

  4. from_call-registered operations mirror the remote spec. The imported OperationSpec has the same name, namespace, type, schemas (input, output, and error_schemas per ADR-023), and access control as the remote operation. If the remote operation changes (new schema, renamed), the imported spec is stale until re-import. The assumption is that re-import happens on reconnection or is triggered explicitly. Hot-swapping imported specs is a two-way door.

  5. The to_* adapters are projections, not live bridges. to_openapi generates a spec; it doesn't proxy HTTP requests. An external HTTP client calling the generated OpenAPI endpoints needs an HTTP handler (alknet-http) that translates HTTP requests into call protocol operations. The assumption is that to_* generates specs/tools, and a separate HTTP/MCP handler bridges the actual traffic.

References

  • ADR-005: irpc as call protocol foundation
  • ADR-012: Call protocol stream model (bidirectional streams)
  • ADR-013: Rust as canonical implementation language (adapter traits in Rust)
  • ADR-014: Secret material flow (credential sources, not static tokens)
  • ADR-015: Privilege model (adapter ops are Internal by default)
  • ADR-016: Abort cascade (cross-node abort propagation)
  • OQ-15: Call protocol client and adapter contract (resolved by this ADR)
  • call-protocol.md
  • operation-registry.md
  • TypeScript @alkdev/operations: from_openapi, from_mcp, buildEnv prior art
  • POC at /workspace/@alkdev/dispatch — head/worker dispatch over SSH+axum