docs(architecture): add ADR-017 call protocol client and adapter contract, resolve OQ-15
ADR-017 locks the client/adapter architecture:

- CallClient opens QUIC connections, shares dispatch loop with CallAdapter
- Connection direction independent of call direction (both sides can call)
- from_call adapter: discovers remote ops via services/list + services/schema,
  registers with forwarding handlers (same pattern as from_openapi/from_mcp)
- to_openapi/to_mcp: project local ops to external protocols
- OperationAdapter trait: produces (OperationSpec, Handler) pairs
- Cross-node call tree: abort cascade propagates through from_call handlers
- Credentials from capabilities (ADR-014), adapter ops Internal by default
  (ADR-015)

The dispatch POC at /workspace/@alkdev/dispatch demonstrated head/worker over
SSH+axum; under the call protocol it's cross-node composition via from_call.
Connection topology (who advertises, who opens) is independent of call
direction: runner pattern, dispatch pattern, and P2P all work.
@@ -0,0 +1,301 @@
# ADR-017: Call Protocol Client and Adapter Contract

## Status

Accepted

## Context

The call protocol spec (ADR-012) defined the stream model as bidirectional:
"both sides can initiate calls." But the spec described only the server side:
`CallAdapter` implements `ProtocolHandler`, accepts incoming QUIC connections,
and dispatches to the operation registry. The client side (who opens the
connection, how calls are sent, how remote operations are discovered and
imported) was left as OQ-15.

The need for the client side is concrete and immediate:

- **Head/worker dispatch**: a head node manages worker nodes (Vast.ai, RunPod,
  local Docker). The head needs to call operations on workers (exec, sync,
  status), and workers need to call back (report status, request work). The
  POC at `/workspace/@alkdev/dispatch` demonstrated this over SSH+axum; under
  the call protocol, it is cross-node composition.
- **NAPI/Python adapters**: Node.js and Python clients need to call operations
  on an alknet node. They speak the EventEnvelope wire format over a QUIC
  connection.
- **Agent tool dispatch**: an agent handler needs to call operations on remote
  nodes (tools, services) the same way it calls local operations: through
  `OperationEnv::invoke()`. The `from_call` adapter makes remote operations
  appear in the local registry.
- **Cross-protocol interop**: external systems (HTTP APIs, MCP servers) are
  imported via `from_openapi` and `from_mcp`. The reverse direction, exposing
  local operations to external systems, needs `to_openapi` and `to_mcp`.

The `@alkdev/operations` TypeScript package demonstrated the adapter patterns
(`from_openapi`, `from_mcp`) and the `buildEnv` composition mechanism. The Rust
implementation defines the canonical traits (ADR-013).

OQ-15 was constrained by ADR-014 (adapters take credential sources, not static
tokens) and ADR-015 (adapter-registered operations are `Internal` by default).
This ADR locks the remaining one-way door: the client/adapter contract
architecture.

## Decision

### 1. `CallClient` opens connections and shares the dispatch loop

`CallClient` opens a QUIC connection to a remote node with ALPN `alknet/call`.
Once connected, the connection is symmetric: both sides can send and receive
`call.requested`. The `CallClient` is not just a caller; it is also a callee.
It has its own operation registry to dispatch incoming calls from the remote
side.
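
```rust
use std::net::SocketAddr;
use std::sync::Arc;

pub struct CallClient {
    /// Operations this side exposes to the remote peer (incoming calls).
    registry: Arc<OperationRegistry>,
    /// Source of the local node's identity when authenticating to the peer.
    identity_provider: Arc<dyn IdentityProvider>,
}

impl CallClient {
    /// Opens a QUIC connection with ALPN `alknet/call` and runs the shared dispatch loop.
    pub async fn connect(&self, addr: SocketAddr, credentials: CallCredentials) -> Result<CallConnection> {
        todo!("body elided; this ADR fixes the shape only")
    }
}
```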

The dispatch loop is shared between `CallAdapter` and `CallClient`. Once a
connection is established (whether accepted by the adapter or opened by the
client), the same logic applies: read `EventEnvelope` frames, dispatch to the
operation registry, write responses, and send outgoing `call.requested` events
for calls initiated on this side. The only difference is who opened the
connection.
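
A minimal sketch of that shared step follows. It assumes simplified types: `Frame` stands in for `EventEnvelope`, and a string-keyed map of closures stands in for `OperationRegistry`. `Dispatcher`, `Opener`, `start_call`, and `handle_frame` are hypothetical names. The sketch also runs synchronously over in-memory frames rather than reading from a QUIC stream.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an `EventEnvelope` frame (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    Requested { id: String, operation: String, input: String }, // call.requested
    Responded { id: String, output: String },                   // call.responded
    Error { id: String, message: String },                      // call.error
}

/// Who opened the QUIC connection. Stored, but never consulted by dispatch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Opener {
    Accepted, // accepted by CallAdapter
    Opened,   // opened by CallClient
}

type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// Per-connection state; identical for CallAdapter and CallClient.
pub struct Dispatcher {
    pub opener: Opener,
    registry: HashMap<String, Handler>,
    /// Calls initiated on this side, keyed by request id; `None` until answered.
    pub pending: HashMap<String, Option<Result<String, String>>>,
}

impl Dispatcher {
    pub fn new(opener: Opener) -> Self {
        Dispatcher { opener, registry: HashMap::new(), pending: HashMap::new() }
    }

    pub fn register(&mut self, operation: &str, handler: impl Fn(&str) -> Result<String, String> + 'static) {
        self.registry.insert(operation.to_string(), Box::new(handler));
    }

    /// Outgoing call: remember it as pending and build the frame to write.
    pub fn start_call(&mut self, id: &str, operation: &str, input: &str) -> Frame {
        self.pending.insert(id.to_string(), None);
        Frame::Requested { id: id.into(), operation: operation.into(), input: input.into() }
    }

    /// One turn of the loop: handle a frame read from the peer and return the
    /// frame to write back, if any.
    pub fn handle_frame(&mut self, frame: Frame) -> Option<Frame> {
        match frame {
            // Incoming call: dispatch to the local registry and reply.
            Frame::Requested { id, operation, input } => Some(match self.registry.get(&operation) {
                Some(handler) => match handler(input.as_str()) {
                    Ok(output) => Frame::Responded { id, output },
                    Err(message) => Frame::Error { id, message },
                },
                None => Frame::Error { id, message: format!("unknown operation: {operation}") },
            }),
            // Answer to a call this side initiated: fill its pending slot.
            Frame::Responded { id, output } => {
                self.resolve(&id, Ok(output));
                None
            }
            Frame::Error { id, message } => {
                self.resolve(&id, Err(message));
                None
            }
        }
    }

    fn resolve(&mut self, id: &str, result: Result<String, String>) {
        if let Some(slot) = self.pending.get_mut(id) {
            *slot = Some(result);
        }
    }
}
```

`handle_frame` stores `opener` but never reads it. That is the point of the shared loop: the same code answers a head's call on a connection the worker accepted, and it also resolves the worker's callback on that same connection.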

`CallConnection` provides:

- `call(operation_id, input) -> ResponseEnvelope`: send `call.requested`,
  await `call.responded` (one result)
- `subscribe(operation_id, input) -> Stream<ResponseEnvelope>`: send
  `call.requested`, yield each `call.responded` until `call.completed` or
  `call.aborted`
- `abort(request_id)`: send `call.aborted`, cascade to descendants (ADR-016)
- `services_list() -> Vec<OperationSpec>`: call `services/list`
- `services_schema(name) -> OperationSpec`: call `services/schema`
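
The termination rules for `call` and `subscribe` can be sketched over an already-decoded sequence of events for one request. `Event`, `End`, `await_call`, and `drain_subscription` are hypothetical. Two behaviors here are assumptions, not something this ADR specifies: treating an early `call.completed` as an error, and treating a closed stream as an error.

```rust
/// Events for one request id, as seen by the caller (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Responded(String), // call.responded
    Completed,         // call.completed
    Aborted,           // call.aborted
    Error(String),     // call.error
}

/// How a subscription ended.
#[derive(Debug, PartialEq)]
pub enum End {
    Completed,
    Aborted,
    Error(String),
    Disconnected,
}

/// `call`: the first `call.responded` is the single result.
pub fn await_call(events: impl IntoIterator<Item = Event>) -> Result<String, String> {
    match events.into_iter().next() {
        Some(Event::Responded(output)) => Ok(output),
        Some(Event::Error(message)) => Err(message),
        Some(Event::Aborted) => Err("aborted".to_string()),
        Some(Event::Completed) => Err("completed without a result".to_string()),
        None => Err("connection closed".to_string()),
    }
}

/// `subscribe`: collect each `call.responded` until a terminal event.
pub fn drain_subscription(events: impl IntoIterator<Item = Event>) -> (Vec<String>, End) {
    let mut items = Vec::new();
    for event in events {
        match event {
            Event::Responded(output) => items.push(output),
            Event::Completed => return (items, End::Completed),
            Event::Aborted => return (items, End::Aborted),
            Event::Error(message) => return (items, End::Error(message)),
        }
    }
    (items, End::Disconnected)
}
```

A real `subscribe` would yield items as they arrive, as a `Stream`, rather than collecting them. The collection here only makes the termination rule easy to check.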

### 2. Connection direction is independent of call direction

Who opens the QUIC connection (who has the public IP, who uses a relay, who
connects out reverse-runner style) is a connection-layer concern, not a
protocol-layer concern. Once connected, both sides can call each other.

| Topology | Who advertises | Who opens connection | Who can call whom |
|----------|----------------|----------------------|-------------------|
| Public service | Server (public IP/domain) | Client | Both directions |
| P2P (iroh relay) | Both (relay-assisted) | Either | Both directions |
| Reverse (runner pattern) | Head (public IP) | Worker connects out | Both directions |
| Reverse (dispatch pattern) | Worker (public SSH port) | Head connects out | Both directions |

The protocol does not distinguish between "server" and "client" after
connection establishment. The `CallAdapter` accepts connections; the
`CallClient` opens connections. Both dispatch incoming and outgoing calls
through the same mechanism.

### 3. `from_call` adapter imports remote operations

`from_call` does for call protocol endpoints what `from_openapi` does for HTTP
APIs: it discovers operations and registers them in the local registry with
forwarding handlers.
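
```rust
/// Discovers the remote node's `External` operations and returns one
/// forwarding `(OperationSpec, Handler)` pair per operation.
pub async fn from_call(
    connection: &CallConnection,
    config: FromCallConfig,
) -> Vec<(OperationSpec, Handler)> {
    todo!()
}
```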

The adapter:

1. Calls `services/list` on the remote node → gets the list of `External`
   operations
2. Calls `services/schema` for each → gets the input/output JSON Schemas
3. For each discovered operation, constructs an `(OperationSpec, Handler)` pair:
   - The spec mirrors the remote operation's name, namespace, type, schemas,
     and access control
   - The handler sends `call.requested` through the `CallConnection` and awaits
     `call.responded` (or streams for subscriptions)
4. The caller registers these pairs in their local registry
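
A synchronous sketch of the four steps follows. It uses a hypothetical `RemoteConnection` trait in place of `CallConnection`, a trimmed-down `OperationSpec`, and string inputs/outputs in place of JSON values. `StaticRemote` is an in-memory stand-in used only to exercise the code. Subscriptions are not modelled.

```rust
use std::sync::Arc;

/// Whether an operation is callable from the wire (ADR-015).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility {
    External,
    Internal,
}

/// Hypothetical, trimmed-down `OperationSpec`.
#[derive(Debug, Clone)]
pub struct OperationSpec {
    pub namespace: String,
    pub name: String,
    pub input_schema: String, // JSON Schema, kept as text here
    pub output_schema: String,
    pub visibility: Visibility,
}

impl OperationSpec {
    pub fn new(namespace: &str, name: &str, visibility: Visibility) -> Self {
        let any = "{}".to_string();
        OperationSpec { namespace: namespace.into(), name: name.into(), input_schema: any.clone(), output_schema: any, visibility }
    }
    pub fn id(&self) -> String {
        format!("{}/{}", self.namespace, self.name)
    }
}

pub type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Hypothetical synchronous view of a `CallConnection`.
pub trait RemoteConnection: Send + Sync {
    fn services_list(&self) -> Result<Vec<String>, String>;
    fn services_schema(&self, id: &str) -> Result<OperationSpec, String>;
    fn call(&self, id: &str, input: &str) -> Result<String, String>;
}

pub fn from_call(conn: Arc<dyn RemoteConnection>) -> Result<Vec<(OperationSpec, Handler)>, String> {
    let mut pairs = Vec::new();
    // Step 1: discover the remote's External operations.
    for id in conn.services_list()? {
        // Step 2: fetch the full spec (schemas) for each one.
        let mut spec = conn.services_schema(&id)?;
        // Imported operations are composition material (ADR-015).
        spec.visibility = Visibility::Internal;
        // Step 3: a forwarding handler that sends `call.requested` to the remote.
        let remote = Arc::clone(&conn);
        let handler: Handler = Box::new(move |input: &str| remote.call(&id, input));
        pairs.push((spec, handler));
    }
    // Step 4 is left to the caller: register `pairs` in the local registry.
    Ok(pairs)
}

/// In-memory remote used only to exercise the sketch.
pub struct StaticRemote {
    pub ops: Vec<OperationSpec>,
}

impl RemoteConnection for StaticRemote {
    fn services_list(&self) -> Result<Vec<String>, String> {
        Ok(self.ops.iter().filter(|op| op.visibility == Visibility::External).map(OperationSpec::id).collect())
    }
    fn services_schema(&self, id: &str) -> Result<OperationSpec, String> {
        self.ops.iter().find(|op| op.id() == id).cloned().ok_or_else(|| format!("no such operation: {id}"))
    }
    fn call(&self, id: &str, input: &str) -> Result<String, String> {
        Ok(format!("{id} <- {input}"))
    }
}
```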

`from_call`-registered operations are `Internal` by default (ADR-015): they
are composition material, not directly callable from the wire. The local
handler that composes them is the one registered as `External`.

The `FromCallConfig` includes:

- The credential source for the outbound connection (ADR-014): TLS identity,
  auth token, or capability-provided credentials
- An optional namespace prefix (to avoid collisions when importing from
  multiple remote nodes)
- An optional operation filter (to import only specific operations)
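
This is a hedged sketch of how the optional prefix and filter might combine. Several details are assumptions for illustration: the field names, the use of `namespace/name` ids in the filter, and the `.` separator for the prefixed namespace. The credential source is omitted.

```rust
/// Hypothetical shape of `FromCallConfig`; the credential source (ADR-014) is omitted.
#[derive(Debug, Default)]
pub struct FromCallConfig {
    /// Prepended to the remote namespace, e.g. "gpu1" + "worker" -> "gpu1.worker".
    pub namespace_prefix: Option<String>,
    /// Remote "namespace/name" ids to import; `None` imports everything listed.
    pub filter: Option<Vec<String>>,
}

impl FromCallConfig {
    /// Maps a remote (namespace, name) to the local (namespace, name) to register
    /// under, or `None` if the filter excludes it.
    pub fn local_name(&self, namespace: &str, name: &str) -> Option<(String, String)> {
        let remote_id = format!("{namespace}/{name}");
        if let Some(allowed) = &self.filter {
            if !allowed.contains(&remote_id) {
                return None;
            }
        }
        let local_namespace = match &self.namespace_prefix {
            Some(prefix) => format!("{prefix}.{namespace}"),
            None => namespace.to_string(),
        };
        Some((local_namespace, name.to_string()))
    }
}
```

Prefixing the namespace rather than the name keeps `env.invoke(namespace, name, ...)` call sites readable when two workers export the same `worker/exec`.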

### 4. `to_openapi` and `to_mcp` adapters export local operations

The reverse direction exposes local operations to external systems:

- **`to_openapi`**: generates an OpenAPI spec from the local registry's
  `External` operations. External systems (HTTP clients, API gateways) can
  discover and call alknet operations through a standard HTTP interface.
- **`to_mcp`**: exposes local operations as MCP tools. MCP clients (editors,
  AI tools) can discover and call alknet operations through the MCP protocol.

These adapters are outbound projections: they translate the call protocol's
operation model into external protocol formats. They do not modify the local
registry; they project it.
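
This sketch shows what projection means in practice: a read-only pass over the registry that emits one entry per `External` operation. The path layout (`/namespace/name`), `POST` for every operation, and `text/event-stream` for subscriptions are assumptions chosen to illustrate a best-effort mapping. They are not what `to_openapi` is specified to emit. A real generator would also include the input/output JSON Schemas.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility {
    External,
    Internal,
}

/// Hypothetical operation kinds: one result vs. a stream of results.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Call,
    Subscription,
}

pub struct OperationSpec {
    pub namespace: String,
    pub name: String,
    pub kind: Kind,
    pub visibility: Visibility,
}

impl OperationSpec {
    pub fn new(namespace: &str, name: &str, kind: Kind, visibility: Visibility) -> Self {
        OperationSpec { namespace: namespace.into(), name: name.into(), kind, visibility }
    }
}

/// Projects `External` operations to path -> (HTTP method, response media type).
/// The registry is only read, never modified.
pub fn openapi_paths(registry: &[OperationSpec]) -> BTreeMap<String, (&'static str, &'static str)> {
    registry
        .iter()
        .filter(|op| op.visibility == Visibility::External)
        .map(|op| {
            let media_type = match op.kind {
                Kind::Call => "application/json",
                // Best-effort: subscriptions have no native OpenAPI shape.
                Kind::Subscription => "text/event-stream",
            };
            (format!("/{}/{}", op.namespace, op.name), ("post", media_type))
        })
        .collect()
}
```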

### 5. The adapter contract trait

The adapter patterns share a common shape: they produce `(OperationSpec,
Handler)` pairs that register in the local registry. The trait:

```rust
/// Produces `(OperationSpec, Handler)` pairs for registration in the local
/// registry.
pub trait OperationAdapter: Send + Sync {
    fn import(&self) -> Vec<(OperationSpec, Handler)>;
}
```

Implementations:

- `FromOpenAPI`: imports from an OpenAPI spec (HTTP-backed handlers)
- `FromMCP`: imports from an MCP server (MCP-backed handlers)
- `FromCall`: imports from a remote call protocol endpoint
  (call-protocol-backed handlers)
- `FromJsonSchema`: imports from a JSON Schema definition (schema-only, with no
  backing handler; used for validation or client generation)

The `to_*` adapters are outbound projections, not `OperationAdapter`
implementations: they consume the registry rather than produce entries for it.

The specific trait signatures (async vs sync, error types, configuration
parameters) are two-way doors for implementation. The one-way door is the
architectural commitment that adapters produce `(OperationSpec, Handler)`
pairs and live in `alknet-call`.
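
A minimal sketch of the contract follows. It assumes a synchronous `import` (the ADR leaves sync vs async open) and simplified `OperationSpec`/`Handler` types. `EchoAdapter` and `Registry` are hypothetical. The point is that the registry depends only on the trait, not on which adapter produced the pairs.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down spec and handler types.
pub struct OperationSpec {
    pub namespace: String,
    pub name: String,
}
pub type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

pub trait OperationAdapter: Send + Sync {
    fn import(&self) -> Vec<(OperationSpec, Handler)>;
}

/// Stand-in adapter: each named operation echoes its input.
pub struct EchoAdapter {
    pub namespace: String,
    pub names: Vec<String>,
}

impl OperationAdapter for EchoAdapter {
    fn import(&self) -> Vec<(OperationSpec, Handler)> {
        self.names
            .iter()
            .map(|name| {
                let label = format!("{}/{}", self.namespace, name);
                let handler: Handler =
                    Box::new(move |input: &str| -> Result<String, String> { Ok(format!("{label}: {input}")) });
                (OperationSpec { namespace: self.namespace.clone(), name: name.clone() }, handler)
            })
            .collect()
    }
}

/// Minimal registry that accepts pairs from any adapter.
#[derive(Default)]
pub struct Registry {
    entries: HashMap<String, (OperationSpec, Handler)>,
}

impl Registry {
    /// Registers every pair the adapter produces; returns how many were added.
    pub fn register_all(&mut self, adapter: &dyn OperationAdapter) -> usize {
        let pairs = adapter.import();
        let count = pairs.len();
        for (spec, handler) in pairs {
            let id = format!("{}/{}", spec.namespace, spec.name);
            self.entries.insert(id, (spec, handler));
        }
        count
    }

    pub fn invoke(&self, id: &str, input: &str) -> Result<String, String> {
        let (_spec, handler) = self.entries.get(id).ok_or_else(|| format!("unknown operation: {id}"))?;
        handler(input)
    }
}
```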

### 6. Cross-node call tree and abort cascade

When a `from_call` handler sends `call.requested` to a remote node, the call
participates in the local call tree via `parent_request_id`. If the parent is
aborted, the cascade (ADR-016) reaches the `from_call` handler, which sends
`call.aborted` to the remote node. The remote node cascades to its own
descendants. The abort crosses the node boundary transparently.

```
Head node                                   Worker node
r1: /dispatch/run_training
  r1-a: worker/exec (from_call handler)
    → call.requested { id: r1-a } ────────→ receives, dispatches to exec
                                              r1-a-1: exec spawns child
user aborts r1
  cascade to r1-a
    from_call handler sends:
    call.aborted { id: r1-a } ────────────→ receives, cascades to r1-a-1
                                              aborts exec and children
```
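
The cascade can be sketched as a tree walk over `parent_request_id` links. `CallTree`, `Target`, and the `(peer, request_id)` return shape are hypothetical. Following the diagram above, a `from_call` handler's share of the cascade is to emit `call.aborted` to its peer, and the worker runs the same walk when that frame arrives.

```rust
use std::collections::HashMap;

/// Where a request runs: a local handler, or a `from_call` handler that
/// forwarded it to `peer`.
#[derive(Debug, Clone, PartialEq)]
pub enum Target {
    Local,
    Remote { peer: String },
}

#[derive(Default)]
pub struct CallTree {
    parent: HashMap<String, Option<String>>, // request_id -> parent_request_id
    target: HashMap<String, Target>,
}

impl CallTree {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn start(&mut self, id: &str, parent: Option<&str>, target: Target) {
        self.parent.insert(id.to_string(), parent.map(str::to_string));
        self.target.insert(id.to_string(), target);
    }

    /// Aborts `id` and every descendant. Returns the local requests aborted and
    /// the `call.aborted` frames to send, as (peer, request_id); both sorted.
    pub fn abort(&mut self, id: &str) -> (Vec<String>, Vec<(String, String)>) {
        let mut local = Vec::new();
        let mut outbound = Vec::new();
        let mut stack = vec![id.to_string()];
        while let Some(current) = stack.pop() {
            // Queue children: requests whose parent_request_id is `current`.
            stack.extend(
                self.parent
                    .iter()
                    .filter(|(_, p)| p.as_deref() == Some(current.as_str()))
                    .map(|(child, _)| child.clone()),
            );
            self.parent.remove(&current);
            match self.target.remove(&current) {
                // The remote node continues the cascade for its own descendants.
                Some(Target::Remote { peer }) => outbound.push((peer, current)),
                Some(Target::Local) => local.push(current),
                None => {}
            }
        }
        local.sort();
        outbound.sort();
        (local, outbound)
    }
}
```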

### 7. Credential sources for connections

The `CallClient` needs credentials to authenticate to the remote node. These
come from capabilities (ADR-014), not environment variables. The credential
types are:

- **TLS identity**: the local node's Ed25519 key (RFC 7250 raw public key) or
  X.509 certificate, derived from the vault at startup
- **Auth token**: an opaque token for call-protocol-level authentication,
  decrypted from the vault or derived from a shared secret
- **Remote identity verification**: the expected fingerprint or certificate of
  the remote node, stored as a capability (not an env var or config file)

The `from_call` adapter receives these credentials at registration time, just
as `from_openapi` receives HTTP credentials.
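
This is a hedged sketch of resolving credentials from capabilities and checking the remote identity against a pinned fingerprint. Several pieces are assumptions: `CapabilitySource`, the `peer/<id>/fingerprint` and `peer/<id>/token` keys, and the fingerprint normalization (dropping `:` and lowercasing). The TLS identity itself is omitted. A production check would compare against a fingerprint derived from the handshake, not a string handed in by the caller.

```rust
use std::collections::HashMap;

/// Hypothetical read-only view of the node's capabilities.
pub trait CapabilitySource {
    fn lookup(&self, key: &str) -> Option<String>;
}

impl CapabilitySource for HashMap<String, String> {
    fn lookup(&self, key: &str) -> Option<String> {
        self.get(key).cloned()
    }
}

#[derive(Debug)]
pub struct CallCredentials {
    pub auth_token: Option<String>,
    /// Pinned identity of the remote node, stored as a capability.
    pub expected_fingerprint: String,
}

/// Resolves credentials for `peer` from capabilities, never from env vars.
pub fn resolve_credentials(caps: &dyn CapabilitySource, peer: &str) -> Result<CallCredentials, String> {
    let expected_fingerprint = caps
        .lookup(&format!("peer/{peer}/fingerprint"))
        .ok_or_else(|| format!("no pinned identity for peer {peer}"))?;
    let auth_token = caps.lookup(&format!("peer/{peer}/token"));
    Ok(CallCredentials { auth_token, expected_fingerprint })
}

fn normalize(fingerprint: &str) -> String {
    fingerprint.chars().filter(|c| *c != ':').flat_map(char::to_lowercase).collect()
}

/// Rejects the connection unless the presented fingerprint matches the pinned one.
pub fn verify_remote(creds: &CallCredentials, presented: &str) -> Result<(), String> {
    if normalize(&creds.expected_fingerprint) == normalize(presented) {
        Ok(())
    } else {
        Err("remote identity mismatch".to_string())
    }
}
```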

## Consequences

**Positive:**

- Cross-node composition works the same as local composition. A handler calls
  `env.invoke("worker", "exec", ...)` and doesn't know (or care) whether
  `worker/exec` is a local operation or a `from_call`-imported remote
  operation. The composition is transparent.
- The head/worker pattern (dispatch, runners) is a connection topology, not a
  protocol feature. Workers can connect to heads (runner pattern) or heads can
  connect to workers (dispatch pattern); the protocol handles both.
- `from_call` is the same pattern as `from_openapi` and `from_mcp`: discover,
  register, forward. The adapter contract is unified.
- `to_openapi` and `to_mcp` enable interop with non-alknet systems without
  those systems needing to speak EventEnvelope.
- The abort cascade (ADR-016) crosses node boundaries transparently. No
  consumer needs to implement cross-node abort propagation.
- The NAPI and Python adapters can use `CallClient` directly to call remote
  operations; they don't need a separate client implementation.
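
The transparency claim can be illustrated with a minimal stand-in for `OperationEnv`. `Env`, `build_env`, `local_exec`, and `forwarded_exec` are hypothetical. `forwarded_exec` only simulates a `from_call` handler with an in-process string frame. The point is that `run_training` is the same code in both registries.

```rust
use std::collections::HashMap;

/// Hypothetical handler type: handlers receive the env so they can compose.
pub type Handler = Box<dyn Fn(&Env, &str) -> Result<String, String>>;

/// Minimal stand-in for `OperationEnv`: (namespace, name) -> handler.
pub struct Env {
    handlers: HashMap<(String, String), Handler>,
}

impl Env {
    pub fn invoke(&self, namespace: &str, name: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), name.to_string());
        let handler = self.handlers.get(&key).ok_or_else(|| format!("unknown operation: {namespace}/{name}"))?;
        handler(self, input)
    }
}

/// The composing handler: identical whichever `worker/exec` is registered.
pub fn run_training(env: &Env, input: &str) -> Result<String, String> {
    let output = env.invoke("worker", "exec", input)?;
    Ok(format!("trained with {output}"))
}

/// A local `worker/exec`.
pub fn local_exec(_env: &Env, input: &str) -> Result<String, String> {
    Ok(format!("exec({input})"))
}

/// Stand-in for a `from_call` handler: encodes a request frame and hands it to a
/// simulated remote. A real one would send it through a `CallConnection`.
pub fn forwarded_exec(_env: &Env, input: &str) -> Result<String, String> {
    let frame = format!("call.requested worker/exec {input}");
    simulated_remote(&frame)
}

fn simulated_remote(frame: &str) -> Result<String, String> {
    let input = frame.strip_prefix("call.requested worker/exec ").ok_or("malformed frame")?;
    Ok(format!("exec({input})"))
}

pub fn build_env(worker_exec: Handler) -> Env {
    let mut handlers: HashMap<(String, String), Handler> = HashMap::new();
    handlers.insert(("worker".into(), "exec".into()), worker_exec);
    handlers.insert(("dispatch".into(), "run_training".into()), Box::new(run_training));
    Env { handlers }
}
```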

**Negative:**

- `CallClient` has its own operation registry (for dispatching incoming calls
  from the remote side). It must contain the operations this node wants to
  expose to that specific remote peer, which is not necessarily the same set
  as the global registry. The mechanism (sharing the global registry, a
  peer-scoped subset, or a separate registry) is a two-way door.
- `from_call`-registered operations have a latency cost: each invocation sends
  a `call.requested` over QUIC and awaits a `call.responded`. This is
  inherent to remote calls and not specific to the adapter pattern. Caching
  or batching strategies are consumer concerns.
- The `to_*` adapters need to translate the call protocol's operation model
  (JSON Schema, EventEnvelope, subscribe/stream) into external formats
  (OpenAPI paths, MCP tools). Some semantics don't map cleanly (e.g.,
  subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle
  these with best-effort mappings and document the gaps.
- The `CallConnection` abstraction adds a layer between the handler and the
  raw QUIC stream. This is necessary for the `from_call` handler to be
  transparent: it shouldn't know about QUIC streams, only about call
  semantics (request, response, abort).
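
One of the options above, a peer-scoped subset of the global registry, could look like the following. `GlobalRegistry`, `PeerView`, and the per-peer grant map are hypothetical. This sketches one possible two-way-door choice, not the chosen mechanism.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical global registry: just the set of registered "namespace/name" ids.
pub struct GlobalRegistry {
    pub ops: BTreeSet<String>,
}

/// Peer-scoped view: incoming calls from one peer resolve only to the
/// operations granted to that peer.
pub struct PeerView<'a> {
    global: &'a GlobalRegistry,
    granted: BTreeSet<String>,
}

impl<'a> PeerView<'a> {
    /// `grants` maps a peer id to the operation ids it may call.
    pub fn new(global: &'a GlobalRegistry, grants: &HashMap<String, Vec<String>>, peer: &str) -> Self {
        let granted: BTreeSet<String> = grants
            .get(peer)
            .map(|ids| ids.iter().cloned().collect())
            .unwrap_or_default();
        PeerView { global, granted }
    }

    /// True if an incoming `call.requested` for `id` should be dispatched.
    pub fn resolves(&self, id: &str) -> bool {
        self.granted.contains(id) && self.global.ops.contains(id)
    }
}
```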

## Assumptions

1. **The connection is symmetric after establishment.** Both sides can send
   and receive `call.requested`. If a future use case requires one-directional
   connections (e.g., a fire-and-forget notification where the receiver can't
   call back), the model needs extension. The assumption is that bidirectional
   is the correct default.

2. **`services/list` and `services/schema` are the discovery mechanism for
   `from_call`.** The remote node exposes its `External` operations through
   these built-in operations. If a remote node doesn't support service
   discovery (e.g., a minimal worker that only accepts specific calls),
   `from_call` needs an alternative discovery mechanism (static config, manual
   spec). The assumption is that nodes participating in cross-node composition
   support service discovery.

3. **The `from_call` handler is transparent to composition.** A handler that
   calls `env.invoke("worker", "exec", ...)` doesn't know it's a remote call.
   If the remote node is unreachable or the connection drops, the handler gets
   a `call.error` (same as a local handler error). The assumption is that
   remote call failures are handled the same as local handler failures.

4. **`from_call`-registered operations mirror the remote spec.** The imported
   `OperationSpec` has the same name, namespace (apart from any configured
   prefix), type, schemas, and access control as the remote operation. If the
   remote operation changes (a new schema, a rename), the imported spec is
   stale until re-import. The assumption is that re-import happens on
   reconnection or is triggered explicitly. Hot-swapping imported specs is a
   two-way door.

5. **The `to_*` adapters are projections, not live bridges.** `to_openapi`
   generates a spec; it doesn't proxy HTTP requests. An external HTTP client
   calling the generated OpenAPI endpoints needs an HTTP handler (alknet-http)
   that translates HTTP requests into call protocol operations. The assumption
   is that `to_*` generates specs/tools, and a separate HTTP/MCP handler
   bridges the actual traffic.

## References

- ADR-005: irpc as call protocol foundation
- ADR-012: Call protocol stream model (bidirectional streams)
- ADR-013: Rust as canonical implementation language (adapter traits in Rust)
- ADR-014: Secret material flow (credential sources, not static tokens)
- ADR-015: Privilege model (adapter ops are Internal by default)
- ADR-016: Abort cascade (cross-node abort propagation)
- OQ-15: Call protocol client and adapter contract (resolved by this ADR)
- [call-protocol.md](../crates/call/call-protocol.md)
- [operation-registry.md](../crates/call/operation-registry.md)
- TypeScript `@alkdev/operations`: `from_openapi`, `from_mcp`, `buildEnv`
  prior art
- POC at `/workspace/@alkdev/dispatch`: head/worker dispatch over SSH+axum