docs(architecture): add alknet-call crate spec, ADR-012, resolve OQ-07

Add architecture specs for the alknet-call crate:

- call-protocol.md: CallAdapter, EventEnvelope wire format, bidirectional
  stream model with ID-based correlation, PendingRequestMap, protocol
  operations (call/subscribe/batch/schema), per-request identity resolution,
  connection/stream lifecycle, error codes

- operation-registry.md: OperationSpec, async Handler type, OperationRegistry,
  AccessControl with trusted call bypass, OperationEnv with context
  propagation (parent_request_id, identity inheritance), service discovery,
  irpc integration layering, naming convention (no leading slash in names)

- ADR-012: Call protocol uses bidirectional QUIC streams with EventEnvelope
  framing and ID-based correlation. Protocol is stream-agnostic and symmetric.
  Resolves OQ-07.

Key design decisions:
- Handler type is async (Fn returning Pin<Box<dyn Future>>)
- OperationEnv::invoke propagates parent context (identity, metadata,
  parent_request_id)
- Identity resolution is per-request, not per-connection
- Operation names without leading slash (fs/readFile, not /fs/readFile)
- Batch is a client-side pattern, not a protocol primitive (OQ-14)
- Phase 1 uses service/op paths, node prefix added later (OQ-13)

Also: promote ADR-010 and ADR-011 from Proposed to Accepted, add OQ-13
and OQ-14 to open-questions.md.
This commit is contained in:
2026-06-16 14:22:20 +00:00
parent bd4055ff70
commit a596f0d188
8 changed files with 686 additions and 17 deletions

View File

@@ -0,0 +1,278 @@
---
status: draft
last_updated: 2026-06-17
---
# Call Protocol
The wire protocol, stream model, framing, and adapter that alknet-call implements on ALPN `alknet/call`.
## What
The call protocol is a bidirectional, stream-agnostic RPC protocol that runs over QUIC bidirectional streams within a single `alknet/call` connection. It supports request/response calls, streaming subscriptions, batch operations, and service discovery — all using the same EventEnvelope wire format.
The `CallAdapter` implements `ProtocolHandler` for ALPN `alknet/call`. It receives a `Connection` from the endpoint, accepts bidirectional streams, and dispatches incoming `EventEnvelope` messages to the operation registry.
## Why
The call protocol is the primary programmatic interface to an alknet node. While SSH provides interactive shell access and HTTP provides REST APIs, the call protocol provides structured, discoverable RPC — the same interface that NAPI clients, MCP tools, and other automation consumers use.
The protocol must be:
- **Cross-language**: JSON wire format consumable from TypeScript, Python, any language
- **Bidirectional**: Both sides can initiate calls (server-to-client is as natural as client-to-server)
- **Stream-agnostic**: QUIC provides stream multiplexing; the protocol shouldn't impose additional constraints
- **Discoverable**: Clients can query what operations exist and their schemas
See ADR-005 for the decision to use irpc as the call protocol's foundation and ADR-012 for the stream model decision.
## Architecture
### CallAdapter
The `CallAdapter` implements `ProtocolHandler`:
```rust
pub struct CallAdapter {
registry: Arc<OperationRegistry>,
identity_provider: Arc<dyn IdentityProvider>,
}
#[async_trait]
impl ProtocolHandler for CallAdapter {
fn alpn(&self) -> &'static [u8] { b"alknet/call" }
async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
// Accept bidirectional streams, read EventEnvelopes,
// dispatch to registry, write responses
}
}
```
The adapter:
1. Accepts bidirectional streams on the connection
2. Reads length-prefixed JSON `EventEnvelope` frames from each stream
3. Resolves the peer's identity using `AuthContext` and `IdentityProvider`
4. Dispatches `call.requested` events to the operation registry
5. Writes response `EventEnvelope` frames back to the appropriate stream
6. Manages the `PendingRequestMap` for outgoing calls
### Stream Model
See ADR-012 for the full rationale.
The call protocol uses bidirectional QUIC streams with EventEnvelope framing. Key properties:
- **Either side can open streams**: The client opens a stream to call a server operation. The server opens a stream to call a client operation. Both use `open_bi()` and `accept_bi()`.
- **Correlation by request ID**: The `id` field in `EventEnvelope` correlates requests with responses. A response arriving on stream N can fulfill a request sent on stream M. The `PendingRequestMap` is keyed by ID, not by stream.
- **Stream usage is the client's choice**: A client may open one stream per operation, one stream for all operations, or any mix. The server processes EventEnvelopes regardless of stream origin.
- **One connection, full access**: A single `alknet/call` connection provides access to all operations (call, subscribe, batch, schema). No need for multiple connections or multiple ALPNs.
### Wire Format: EventEnvelope
Every message on the wire is a length-prefixed JSON `EventEnvelope`:
```rust
pub struct EventEnvelope {
pub r#type: String, // Event type
pub id: String, // Correlation key (request ID, subscription ID)
pub payload: Value, // serde_json::Value — schema depends on event type
}
// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
The `Value` type is `serde_json::Value`. The envelope is JSON because it must be consumable from JavaScript, Python, and any language. The envelope itself stays JSON for cross-language compatibility.
Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within the `payload` field. The convention is: if an operation's output schema specifies a binary field, the handler encodes it as a base64 string and the client decodes it. The `EventEnvelope` structure is not aware of this convention — it carries a `serde_json::Value` and does not interpret the payload. This is a handler-level concern, not a protocol-level concern.
This is the same framing used by irpc and by the `@alkdev/pubsub` TypeScript adapters. The wire format is identical — an `EventEnvelope` flowing from a Rust handler through core, out over a QUIC stream, can be consumed by a JavaScript `@alkdev/operations` call handler with zero translation at the wire level.
### Event Types
Five event types carry request/response and subscription semantics:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
**A call is a subscribe that resolves after one event.** Both `call()` and `subscribe()` send the same `call.requested` event. The difference is consumption pattern:
- **call()**: Sends `call.requested`, resolves on first `call.responded`
- **subscribe()**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`
The `id` field carries the `requestId` for correlation.
### `call.error` Payload
```json
{
"code": "NOT_FOUND",
"message": "operation not found: /fs/readFile",
"retryable": false
}
```
Error codes use an extensible string enum. Phase 1 defines the following codes:
- `NOT_FOUND` — operation not in registry
- `FORBIDDEN` — access denied (insufficient scopes or unauthenticated)
- `INVALID_INPUT` — input doesn't match the operation's JSON Schema
- `INTERNAL` — handler error
- `TIMEOUT` — request timed out (retryable: true)
New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`.
### Protocol Operations
The call protocol defines four top-level operations, expressed through event types and operation names:
| Operation | Event Pattern | Description |
|-----------|--------------|-------------|
| **call** | `call.requested``call.responded` or `call.error` | Request/response — one result |
| **subscribe** | `call.requested` → many `call.responded``call.completed` or `call.aborted` | Streaming — zero or more results |
| **batch** | multiple `call.requested` (different IDs) → multiple `call.responded` | Multiple operations in one round |
| **schema** | `call.requested` name `services/list` or `services/schema``call.responded` | Discover available operations |
Batch is not a separate event type — it's multiple `call.requested` events with different request IDs. The client sends them (on one or many streams) and correlates the responses by ID. See OQ-14.
### Bidirectional Calls
Both sides of the connection can initiate calls. The server can call operations on the client just as the client calls operations on the server.
```
Client Server
│ │
│── open_bi() → stream ─────────────────────────▶│
│── call.requested { id: "c1", ... } ────────────▶│ (client calls server)
│◀─ call.responded { id: "c1", ... } ───────────│
│ │
│◀─ open_bi() ← stream ──────────────────────────│
│◀─ call.requested { id: "s1", ... } ────────────│ (server calls client)
│── call.responded { id: "s1", ... } ───────────▶│
│ │
```
The server calls client operations using the same `PendingRequestMap` and the same `EventEnvelope` format. The operation registry on the client side dispatches `call.requested` events just like the server side.
This enables patterns where the server pushes notifications, requests configuration from the client, or orchestrates workflows that require the client to perform operations.
### PendingRequestMap
Manages in-flight calls and subscriptions. Correlates `call.responded` events back to the original `call.requested`:
```rust
pub struct PendingRequestMap {
pending: HashMap<String, PendingEntry>,
}
enum PendingEntry {
Call {
tx: oneshot::Sender<Result<Value, CallError>>,
timeout: Instant,
},
Subscribe {
tx: mpsc::Sender<Result<Value, CallError>>,
timeout: Option<Instant>,
},
}
```
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
When `call.aborted` arrives → cancel/drop whichever side initiated it.
A `call.aborted` for an unknown `requestId` is silently discarded.
Timeouts prevent dangling entries. A background task sweeps expired entries periodically.
### CallAdapter Stream Handling
The `CallAdapter::handle()` method:
1. Spawns a task that continuously calls `connection.accept_bi()` to receive incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using `FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server
For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID
### AuthContext and Identity Resolution
The `CallAdapter` receives an `AuthContext` from the endpoint. The call protocol resolves identity per-request, not per-connection:
**Resolution flow**:
1. The endpoint provides `AuthContext` with whatever identity it resolved at the TLS layer (e.g., client certificate fingerprint). This may be `None` — the `AuthContext.identity` field is `Option<Identity>`.
2. When a `call.requested` event arrives, the `CallAdapter` constructs an `OperationContext` with the connection-level `AuthContext.identity`.
3. If the `call.requested` payload includes an `auth_token` field, the `CallAdapter` resolves it using `IdentityProvider::resolve_from_token()`. If resolution succeeds, the resulting `Identity` replaces the connection-level identity in the `OperationContext`. If resolution fails, the request proceeds with the connection-level identity (which may be `None`).
4. The `OperationContext.identity` is passed to the `OperationRegistry` for ACL checking.
5. If `identity` is `None` and the operation's `AccessControl` has restrictions, the registry returns `FORBIDDEN` with message `"authentication required"`.
**Key point**: Identity is resolved per-request, not per-connection. This allows a single connection to upgrade authentication mid-session (e.g., after an `auth/login` operation returns a token), and allows different operations on the same connection to have different identity levels.
### ResponseEnvelope
The universal return type from all operation invocations:
```rust
pub struct ResponseEnvelope {
pub request_id: String,
pub result: Result<Value, CallError>,
}
pub struct CallError {
pub code: String,
pub message: String,
pub retryable: bool,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire.
### Connection and Stream Lifecycle
**Connection drop**: When the QUIC connection closes, all pending requests in the `PendingRequestMap` are failed with `call.error` code `INTERNAL` and message `"connection closed"`. All subscription channels are closed. The `CallAdapter::handle()` method returns `Ok(())` (clean shutdown) or `Err(HandlerError::ConnectionClosed)` (unexpected).
**Stream reset**: When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an error. If the stream was carrying a subscription, the `PendingRequestMap` entry is removed and the mpsc channel is closed. If the stream was carrying a call, the oneshot is resolved with an error. No `call.aborted` is sent — the stream is gone.
**Timeouts**: Default timeout for calls is 30 seconds. Default timeout for subscriptions is optional (the client can specify a timeout in the `call.requested` payload, or leave it open-ended). The `PendingRequestMap` sweeper runs every 10 seconds and removes expired entries. Timeouts are configurable at the `CallAdapter` level, not per-operation.
**Error handling in `CallAdapter::handle()`**: If a handler panics, the stream is closed and the `PendingRequestMap` entry (if any) is cleaned up by the next sweeper pass. Other streams and the connection are unaffected.
## Constraints
- The call protocol does not depend on any database. `PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. The envelope is always JSON. Binary payloads may be base64-encoded in the `payload` field.
- Batch is not a protocol primitive — multiple `call.requested` events with correlated IDs provide equivalent semantics. See OQ-14.
- The call protocol is transport-agnostic at the envelope level. The `EventEnvelope` framing can run over QUIC streams, WebSocket frames, or Worker `postMessage`. The `CallAdapter` is the QUIC-specific implementation.
- Phase 1 is local dispatch only. The operation registry dispatches to handlers in the same process. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built. See ADR-005 and OQ-13.
## Design Decisions
| Decision | ADR | Summary |
|----------|-----|---------|
| irpc as call protocol foundation | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc provides framing and service dispatch |
| Call protocol stream model | [ADR-012](../../decisions/012-call-protocol-stream-model.md) | Bidirectional streams, EventEnvelope, ID-based correlation |
| ALPN per connection | [ADR-006](../../decisions/006-alpn-convention-and-connection-model.md) | `alknet/call` is a distinct ALPN, one connection per ALPN |
| ProtocolHandler receives Connection | [ADR-007](../../decisions/007-bistream-type-definition.md) | CallAdapter gets Connection, can accept/open multiple streams |
## Open Questions
- **OQ-13**: What is the operation path format for the alknet-call crate? The reference implementation used `/{node}/{service}/{op}` for head/worker routing. Phase 1 is single-node, so `/{service}/{op}` may be sufficient. The node prefix can be added later when remote dispatch is implemented.
- **OQ-14**: Should batch be a distinct protocol primitive with its own event types, or is the "multiple call.requested with correlated IDs" pattern sufficient? The reference implementation treats batch as a client-side pattern. This is a two-way door — batch-specific event types can be added later without breaking existing clients.
## References
- [operation-registry.md](operation-registry.md) — OperationSpec, Handler, AccessControl, service discovery
- ADR-005: irpc as call protocol foundation
- ADR-012: Call protocol stream model
- Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/`