docs(architecture): add alknet-call crate spec, ADR-012, resolve OQ-07
Add architecture specs for the alknet-call crate:

- call-protocol.md: CallAdapter, EventEnvelope wire format, bidirectional stream model with ID-based correlation, PendingRequestMap, protocol operations (call/subscribe/batch/schema), per-request identity resolution, connection/stream lifecycle, error codes
- operation-registry.md: OperationSpec, async Handler type, OperationRegistry, AccessControl with trusted call bypass, OperationEnv with context propagation (parent_request_id, identity inheritance), service discovery, irpc integration layering, naming convention (no leading slash in names)
- ADR-012: Call protocol uses bidirectional QUIC streams with EventEnvelope framing and ID-based correlation. Protocol is stream-agnostic and symmetric. Resolves OQ-07.

Key design decisions:

- Handler type is async (Fn returning Pin<Box<dyn Future>>)
- OperationEnv::invoke propagates parent context (identity, metadata, parent_request_id)
- Identity resolution is per-request, not per-connection
- Operation names without leading slash (fs/readFile, not /fs/readFile)
- Batch is a client-side pattern, not a protocol primitive (OQ-14)
- Phase 1 uses service/op paths, node prefix added later (OQ-13)

Also: promote ADR-010 and ADR-011 from Proposed to Accepted, add OQ-13 and OQ-14 to open-questions.md.

docs/architecture/crates/call/call-protocol.md
@@ -0,0 +1,278 @@
---
status: draft
last_updated: 2026-06-17
---

# Call Protocol

The wire protocol, stream model, framing, and adapter that alknet-call implements on ALPN `alknet/call`.

## What

The call protocol is a bidirectional, stream-agnostic RPC protocol that runs over QUIC bidirectional streams within a single `alknet/call` connection. It supports request/response calls, streaming subscriptions, batch operations, and service discovery, all using the same `EventEnvelope` wire format.

The `CallAdapter` implements `ProtocolHandler` for ALPN `alknet/call`. It receives a `Connection` from the endpoint, accepts bidirectional streams, and dispatches incoming `EventEnvelope` messages to the operation registry.

## Why

The call protocol is the primary programmatic interface to an alknet node. While SSH provides interactive shell access and HTTP provides REST APIs, the call protocol provides structured, discoverable RPC: the same interface that NAPI clients, MCP tools, and other automation consumers use.

The protocol must be:
- **Cross-language**: JSON wire format consumable from TypeScript, Python, or any other language
- **Bidirectional**: Both sides can initiate calls (server-to-client is as natural as client-to-server)
- **Stream-agnostic**: QUIC provides stream multiplexing; the protocol shouldn't impose additional constraints
- **Discoverable**: Clients can query what operations exist and their schemas

See ADR-005 for the decision to use irpc as the call protocol's foundation and ADR-012 for the stream model decision.

## Architecture

### CallAdapter

The `CallAdapter` implements `ProtocolHandler`:

```rust
pub struct CallAdapter {
    registry: Arc<OperationRegistry>,
    identity_provider: Arc<dyn IdentityProvider>,
}

#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // Accept bidirectional streams, read EventEnvelopes,
        // dispatch to registry, write responses
    }
}
```

The adapter:
1. Accepts bidirectional streams on the connection
2. Reads length-prefixed JSON `EventEnvelope` frames from each stream
3. Resolves the peer's identity using `AuthContext` and `IdentityProvider`
4. Dispatches `call.requested` events to the operation registry
5. Writes response `EventEnvelope` frames back to the appropriate stream
6. Manages the `PendingRequestMap` for outgoing calls

### Stream Model

See ADR-012 for the full rationale.

The call protocol uses bidirectional QUIC streams with `EventEnvelope` framing. Key properties:

- **Either side can open streams**: The client opens a stream to call a server operation. The server opens a stream to call a client operation. Both use `open_bi()` and `accept_bi()`.
- **Correlation by request ID**: The `id` field in `EventEnvelope` correlates requests with responses. A response arriving on stream N can fulfill a request sent on stream M. The `PendingRequestMap` is keyed by ID, not by stream.
- **Stream usage is the client's choice**: A client may open one stream per operation, one stream for all operations, or any mix. The server processes `EventEnvelope` messages regardless of stream origin.
- **One connection, full access**: A single `alknet/call` connection provides access to all operations (call, subscribe, batch, schema). There is no need for multiple connections or multiple ALPNs.

### Wire Format: EventEnvelope

Every message on the wire is a length-prefixed JSON `EventEnvelope`:

```rust
pub struct EventEnvelope {
    pub r#type: String,  // Event type
    pub id: String,      // Correlation key (request ID, subscription ID)
    pub payload: Value,  // serde_json::Value; schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```

The `Value` type is `serde_json::Value`. The envelope is always JSON so that it can be consumed from JavaScript, Python, and any other language.

Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within the `payload` field. The convention is: if an operation's output schema specifies a binary field, the handler encodes it as a base64 string and the client decodes it. The `EventEnvelope` structure is not aware of this convention; it carries a `serde_json::Value` and does not interpret the payload. This is a handler-level concern, not a protocol-level concern.

This is the same framing used by irpc and by the `@alkdev/pubsub` TypeScript adapters. The wire format is identical: an `EventEnvelope` flowing from a Rust handler through core, out over a QUIC stream, can be consumed by a JavaScript `@alkdev/operations` call handler with zero translation at the wire level.

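The frame layout described above (a 4-byte big-endian length prefix followed by a UTF-8 JSON body) can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's implementation: `write_frame`, `read_frame`, and the `MAX_FRAME_LEN` cap are hypothetical names and values, and the JSON body is treated as an opaque string rather than parsed.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on a frame body; the spec does not define one.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json.len() as u32; // safe: MAX_FRAME_LEN fits in u32
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame. Returns `Ok(None)` on a clean end of stream, meaning
/// EOF arrived before any byte of the next length prefix.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<String>> {
    let mut prefix = [0u8; 4];
    let mut filled = 0;
    while filled < prefix.len() {
        let n = r.read(&mut prefix[filled..])?;
        if n == 0 {
            if filled == 0 {
                return Ok(None);
            }
            // EOF in the middle of a prefix is a truncated frame.
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated prefix"));
        }
        filled += n;
    }
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    match String::from_utf8(body) {
        Ok(text) => Ok(Some(text)),
        Err(e) => Err(io::Error::new(io::ErrorKind::InvalidData, e)),
    }
}
```

Capping the length before allocating keeps a corrupt or hostile prefix from forcing a multi-gigabyte allocation; the actual limit, if any, is a decision for the implementation.
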
### Event Types

Five event types carry request/response and subscription semantics:

| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |

**A call is a subscribe that resolves after one event.** Both `call()` and `subscribe()` send the same `call.requested` event. The difference is consumption pattern:
- **call()**: Sends `call.requested`, resolves on first `call.responded`
- **subscribe()**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`

The `id` field carries the `requestId` for correlation.

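The two consumption patterns can be sketched over a plain channel of incoming events. Everything here is an assumption made for illustration: the `Incoming` enum, the string payloads, and the use of `std::sync::mpsc` in place of the real stream reader. The `subscribe` function collects into a `Vec` only to keep the sketch synchronous; a real implementation would yield items as they arrive.

```rust
use std::sync::mpsc::Receiver;

/// Stand-in for the events a caller can receive for one request ID.
/// Payloads are plain strings here; the real payload is a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub enum Incoming {
    Responded(String),
    Completed,
    Aborted,
    Error(String),
}

/// call(): resolve on the first `call.responded`; anything else is a failure.
pub fn call(events: &Receiver<Incoming>) -> Result<String, String> {
    match events.recv() {
        Ok(Incoming::Responded(value)) => Ok(value),
        Ok(Incoming::Error(message)) => Err(message),
        Ok(other) => Err(format!("unexpected event: {:?}", other)),
        Err(_) => Err("channel closed".to_string()),
    }
}

/// subscribe(): take every `call.responded` until `call.completed` or
/// `call.aborted`; a `call.error` ends the subscription with an error.
pub fn subscribe(events: &Receiver<Incoming>) -> Result<Vec<String>, String> {
    let mut items = Vec::new();
    loop {
        match events.recv() {
            Ok(Incoming::Responded(value)) => items.push(value),
            Ok(Incoming::Completed) | Ok(Incoming::Aborted) => return Ok(items),
            Ok(Incoming::Error(message)) => return Err(message),
            Err(_) => return Err("channel closed".to_string()),
        }
    }
}
```

Both functions read from the same event stream and differ only in when they stop, which mirrors the statement that a call is a subscription that resolves after one event.
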
### `call.error` Payload

```json
{
  "code": "NOT_FOUND",
  "message": "operation not found: fs/readFile",
  "retryable": false
}
```

Error codes use an extensible string enum. Phase 1 defines the following codes:
- `NOT_FOUND`: operation not in registry
- `FORBIDDEN`: access denied (insufficient scopes or unauthenticated)
- `INVALID_INPUT`: input doesn't match the operation's JSON Schema
- `INTERNAL`: handler error
- `TIMEOUT`: request timed out (retryable: true)

New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`.

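A client can apply the unknown-code rule when it decodes a `call.error` payload. The sketch below assumes the three payload fields have already been extracted from JSON; `ErrorCode`, `ClientError`, and `normalize_error` are hypothetical names, not part of the spec.

```rust
/// The Phase 1 error codes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    NotFound,
    Forbidden,
    InvalidInput,
    Internal,
    Timeout,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClientError {
    pub code: ErrorCode,
    pub message: String,
    pub retryable: bool,
}

/// Normalize a received `call.error` payload on the client side.
pub fn normalize_error(code: &str, message: &str, retryable: bool) -> ClientError {
    let known = match code {
        "NOT_FOUND" => Some(ErrorCode::NotFound),
        "FORBIDDEN" => Some(ErrorCode::Forbidden),
        "INVALID_INPUT" => Some(ErrorCode::InvalidInput),
        "INTERNAL" => Some(ErrorCode::Internal),
        "TIMEOUT" => Some(ErrorCode::Timeout),
        _ => None,
    };
    match known {
        Some(code) => ClientError { code, message: message.to_string(), retryable },
        // Unknown codes degrade to INTERNAL and are never retried,
        // whatever the sender put in `retryable`.
        None => ClientError {
            code: ErrorCode::Internal,
            message: message.to_string(),
            retryable: false,
        },
    }
}
```

Forcing `retryable` to `false` for unknown codes keeps an older client from retrying an error whose semantics it cannot know.
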
### Protocol Operations

The call protocol defines four top-level operations, expressed through event types and operation names:

| Operation | Event Pattern | Description |
|-----------|---------------|-------------|
| **call** | `call.requested` → `call.responded` or `call.error` | Request/response: one result |
| **subscribe** | `call.requested` → many `call.responded` → `call.completed` or `call.aborted` | Streaming: zero or more results |
| **batch** | multiple `call.requested` (different IDs) → multiple `call.responded` | Multiple operations in one round |
| **schema** | `call.requested` name `services/list` or `services/schema` → `call.responded` | Discover available operations |

Batch is not a separate event type. It is multiple `call.requested` events with different request IDs. The client sends them (on one or many streams) and correlates the responses by ID. See OQ-14.

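Because batch is a client-side pattern, the client's only job is to match responses back to the requests it issued, whatever order they arrive in. A minimal sketch, assuming responses have been reduced to `(request ID, payload)` pairs; the `Response` alias and `correlate_batch` are hypothetical names:

```rust
use std::collections::HashMap;

/// Minimal stand-in for a response envelope: (request ID, payload).
pub type Response = (String, String);

/// Pair each request ID with its response, regardless of arrival order.
/// Results come back in the order the requests were issued; `None` marks
/// a request that has not been answered. Responses whose ID was never
/// requested are ignored.
pub fn correlate_batch(request_ids: &[&str], responses: Vec<Response>) -> Vec<Option<String>> {
    let mut by_id: HashMap<String, String> = responses.into_iter().collect();
    request_ids.iter().map(|id| by_id.remove(*id)).collect()
}
```

For example, requests `b1`, `b2`, `b3` answered only by `b3` and then `b1` produce `[Some(..), None, Some(..)]`, with the `b2` slot left open for a later response or a timeout.
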
### Bidirectional Calls

Both sides of the connection can initiate calls. The server can call operations on the client just as the client calls operations on the server.

```
Client                                     Server
│                                               │
│── open_bi() → stream ────────────────────────▶│
│── call.requested { id: "c1", ... } ──────────▶│  (client calls server)
│◀─ call.responded { id: "c1", ... } ───────────│
│                                               │
│◀─ open_bi() ← stream ─────────────────────────│
│◀─ call.requested { id: "s1", ... } ───────────│  (server calls client)
│── call.responded { id: "s1", ... } ──────────▶│
│                                               │
```

The server calls client operations using the same `PendingRequestMap` and the same `EventEnvelope` format. The operation registry on the client side dispatches `call.requested` events just like the server side.

This enables patterns where the server pushes notifications, requests configuration from the client, or orchestrates workflows that require the client to perform operations.

### PendingRequestMap

Manages in-flight calls and subscriptions. Correlates `call.responded` events back to the original `call.requested`:

```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
    },
}
```

When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive

Terminal events are handled as follows:
- When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
- When `call.aborted` arrives → the receiver cancels the in-flight handler if it is serving the request, or drops the pending entry if it initiated the request.
- A `call.aborted` for an unknown `requestId` is silently discarded.

Timeouts prevent dangling entries. A background task sweeps expired entries periodically.

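The dispatch rules and the sweeper can be sketched synchronously with standard-library channels. This is an illustration, not the crate's code: `std::sync::mpsc::Sender` stands in for both the oneshot and mpsc senders, payloads and errors are plain strings, and the method names (`on_responded`, `on_finished`, `sweep`) are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::time::Instant;

/// Payloads and errors are plain strings in this sketch.
pub type CallResult = Result<String, String>;

enum PendingEntry {
    Call { tx: Sender<CallResult>, deadline: Instant },
    Subscribe { tx: Sender<CallResult>, deadline: Option<Instant> },
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn insert_call(&mut self, id: &str, tx: Sender<CallResult>, deadline: Instant) {
        self.pending.insert(id.to_string(), PendingEntry::Call { tx, deadline });
    }

    pub fn insert_subscribe(&mut self, id: &str, tx: Sender<CallResult>, deadline: Option<Instant>) {
        self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx, deadline });
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    /// `call.responded`: a subscription receives the value and stays
    /// registered; a call is resolved and its entry removed.
    pub fn on_responded(&mut self, id: &str, value: String) {
        if let Some(PendingEntry::Subscribe { tx, .. }) = self.pending.get(id) {
            let _ = tx.send(Ok(value));
            return;
        }
        if let Some(PendingEntry::Call { tx, .. }) = self.pending.remove(id) {
            let _ = tx.send(Ok(value));
        }
    }

    /// `call.completed` or `call.aborted`: removing the entry drops its
    /// sender, which closes the channel. Unknown IDs are ignored.
    pub fn on_finished(&mut self, id: &str) {
        self.pending.remove(id);
    }

    /// Remove entries whose deadline has passed and fail them with a
    /// timeout error. Returns the number of entries removed.
    pub fn sweep(&mut self, now: Instant) -> usize {
        let mut expired = Vec::new();
        for (id, entry) in &self.pending {
            let deadline = match entry {
                PendingEntry::Call { deadline, .. } => Some(*deadline),
                PendingEntry::Subscribe { deadline, .. } => *deadline,
            };
            if deadline.map_or(false, |d| d <= now) {
                expired.push(id.clone());
            }
        }
        for id in &expired {
            if let Some(entry) = self.pending.remove(id) {
                let tx = match entry {
                    PendingEntry::Call { tx, .. } | PendingEntry::Subscribe { tx, .. } => tx,
                };
                let _ = tx.send(Err("TIMEOUT".to_string()));
            }
        }
        expired.len()
    }
}
```

Passing `now` into `sweep` instead of reading the clock inside it keeps the expiry logic deterministic and easy to test; a background task would call it on a timer.
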
### CallAdapter Stream Handling

The `CallAdapter::handle()` method:

1. Spawns a task that continuously calls `connection.accept_bi()` to receive incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using `FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server

For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID

### AuthContext and Identity Resolution

The `CallAdapter` receives an `AuthContext` from the endpoint. The call protocol resolves identity per-request, not per-connection:

**Resolution flow**:

1. The endpoint provides `AuthContext` with whatever identity it resolved at the TLS layer (e.g., client certificate fingerprint). This may be `None`: the `AuthContext.identity` field is `Option<Identity>`.
2. When a `call.requested` event arrives, the `CallAdapter` constructs an `OperationContext` with the connection-level `AuthContext.identity`.
3. If the `call.requested` payload includes an `auth_token` field, the `CallAdapter` resolves it using `IdentityProvider::resolve_from_token()`. If resolution succeeds, the resulting `Identity` replaces the connection-level identity in the `OperationContext`. If resolution fails, the request proceeds with the connection-level identity (which may be `None`).
4. The `OperationContext.identity` is passed to the `OperationRegistry` for ACL checking.
5. If `identity` is `None` and the operation's `AccessControl` has restrictions, the registry returns `FORBIDDEN` with message `"authentication required"`.

**Key point**: Identity is resolved per-request, not per-connection. This allows a single connection to upgrade authentication mid-session (e.g., after an `auth/login` operation returns a token), and allows different operations on the same connection to have different identity levels.

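The resolution flow reduces to a small fallback chain. The sketch below is synchronous and makes several assumptions: the shape of `Identity`, the signature of `resolve_from_token` (the source names the method but not its signature), and the `SingleTokenProvider` test double are all hypothetical. Scope checks are omitted; `check_access` only models the "no identity on a restricted operation" case from step 5.

```rust
/// Stand-in for the crate's identity type (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub subject: String,
}

/// The method name comes from the spec; the signature is an assumption.
pub trait IdentityProvider {
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Hypothetical provider that accepts exactly one token.
pub struct SingleTokenProvider {
    pub token: String,
    pub identity: Identity,
}

impl IdentityProvider for SingleTokenProvider {
    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        if token == self.token {
            Some(self.identity.clone())
        } else {
            None
        }
    }
}

/// Per-request identity: a token that resolves replaces the
/// connection-level identity; a missing or unresolvable token falls
/// back to it (which may itself be `None`).
pub fn resolve_request_identity(
    connection_identity: Option<&Identity>,
    auth_token: Option<&str>,
    provider: &dyn IdentityProvider,
) -> Option<Identity> {
    auth_token
        .and_then(|token| provider.resolve_from_token(token))
        .or_else(|| connection_identity.cloned())
}

/// Step 5 only: a restricted operation with no identity is rejected
/// with FORBIDDEN / "authentication required". Returns (code, message).
pub fn check_access(
    identity: Option<&Identity>,
    restricted: bool,
) -> Result<(), (&'static str, &'static str)> {
    if restricted && identity.is_none() {
        return Err(("FORBIDDEN", "authentication required"));
    }
    Ok(())
}
```

Because the function takes the connection-level identity as an input and returns a fresh value, two requests on the same connection can resolve to different identities without any shared mutable state.
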
### ResponseEnvelope

The universal return type from all operation invocations:

```rust
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}
```

Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire.

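One plausible shape for that conversion is a `From` implementation that maps `Ok` to `call.responded` and `Err` to `call.error`. The mapping itself follows from the event table; the rest is assumed for the sketch: the `Payload` enum stands in for `serde_json::Value`, the `Ok` payload is an already-serialized JSON string, and the real adapter may structure this differently.

```rust
/// Stand-in for `serde_json::Value`, reduced to the two shapes needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    /// An already-serialized JSON result.
    Json(String),
    /// The fields of a `call.error` payload.
    Error { code: String, message: String, retryable: bool },
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<String, CallError>,
}

pub struct EventEnvelope {
    pub r#type: String,
    pub id: String,
    pub payload: Payload,
}

impl From<ResponseEnvelope> for EventEnvelope {
    fn from(resp: ResponseEnvelope) -> Self {
        // Success becomes `call.responded`; failure becomes `call.error`.
        let (event_type, payload) = match resp.result {
            Ok(value) => ("call.responded", Payload::Json(value)),
            Err(e) => (
                "call.error",
                Payload::Error { code: e.code, message: e.message, retryable: e.retryable },
            ),
        };
        EventEnvelope {
            r#type: event_type.to_string(),
            // The request ID becomes the envelope's correlation key.
            id: resp.request_id,
            payload,
        }
    }
}
```

Keeping the conversion at the adapter boundary means local callers never pay for it: they consume `ResponseEnvelope` directly.
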
### Connection and Stream Lifecycle

**Connection drop**: When the QUIC connection closes, all pending requests in the `PendingRequestMap` are failed with `call.error` code `INTERNAL` and message `"connection closed"`. All subscription channels are closed. The `CallAdapter::handle()` method returns `Ok(())` (clean shutdown) or `Err(HandlerError::ConnectionClosed)` (unexpected).

**Stream reset**: When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an error. If the stream was carrying a subscription, the `PendingRequestMap` entry is removed and the mpsc channel is closed. If the stream was carrying a call, the oneshot is resolved with an error. No `call.aborted` is sent because the stream is gone.

**Timeouts**: Default timeout for calls is 30 seconds. Default timeout for subscriptions is optional (the client can specify a timeout in the `call.requested` payload, or leave it open-ended). The `PendingRequestMap` sweeper runs every 10 seconds and removes expired entries. Timeouts are configurable at the `CallAdapter` level, not per-operation.

**Error handling in `CallAdapter::handle()`**: If a handler panics, the stream is closed and the `PendingRequestMap` entry (if any) is cleaned up by the next sweeper pass. Other streams and the connection are unaffected.

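The connection-drop rule amounts to draining the pending map and failing every entry. A minimal sketch under assumptions: the map is reduced to request ID → sender, `std::sync::mpsc::Sender` stands in for the real channel types, `fail_all_on_close` is a hypothetical name, and `retryable: false` is a guess, since the spec does not state a value for this case.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Simplified pending map: request ID to the caller's channel.
pub type Pending = HashMap<String, Sender<Result<String, CallError>>>;

/// Fail every pending request when the connection closes. Each sender
/// is dropped after its final message, which closes the channel and
/// ends any subscription. Returns the number of entries failed.
pub fn fail_all_on_close(pending: &mut Pending) -> usize {
    let mut failed = 0;
    for (_id, tx) in pending.drain() {
        let err = CallError {
            code: "INTERNAL".to_string(),
            message: "connection closed".to_string(),
            // Assumption: the spec does not say whether this is retryable.
            retryable: false,
        };
        // A send error only means the caller already stopped listening.
        let _ = tx.send(Err(err));
        failed += 1;
    }
    failed
}
```

Using `drain()` empties the map in the same pass, so a later sweeper run finds nothing left to expire.
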
## Constraints

- The call protocol does not depend on any database. `PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. The envelope is always JSON. Binary payloads may be base64-encoded in the `payload` field.
- Batch is not a protocol primitive; multiple `call.requested` events with correlated IDs provide equivalent semantics. See OQ-14.
- The call protocol is transport-agnostic at the envelope level. The `EventEnvelope` framing can run over QUIC streams, WebSocket frames, or Worker `postMessage`. The `CallAdapter` is the QUIC-specific implementation.
- Phase 1 is local dispatch only. The operation registry dispatches to handlers in the same process. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built. See ADR-005 and OQ-13.

## Design Decisions

| Decision | ADR | Summary |
|----------|-----|---------|
| irpc as call protocol foundation | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc provides framing and service dispatch |
| Call protocol stream model | [ADR-012](../../decisions/012-call-protocol-stream-model.md) | Bidirectional streams, EventEnvelope, ID-based correlation |
| ALPN per connection | [ADR-006](../../decisions/006-alpn-convention-and-connection-model.md) | `alknet/call` is a distinct ALPN, one connection per ALPN |
| ProtocolHandler receives Connection | [ADR-007](../../decisions/007-bistream-type-definition.md) | CallAdapter gets Connection, can accept/open multiple streams |

## Open Questions

- **OQ-13**: What is the operation path format for the alknet-call crate? The reference implementation used `/{node}/{service}/{op}` for head/worker routing. Phase 1 is single-node, so `{service}/{op}` (no leading slash, matching the naming convention in operation-registry.md) may be sufficient. The node prefix can be added later when remote dispatch is implemented.
- **OQ-14**: Should batch be a distinct protocol primitive with its own event types, or is the "multiple `call.requested` with correlated IDs" pattern sufficient? The reference implementation treats batch as a client-side pattern. This is a two-way door: batch-specific event types can be added later without breaking existing clients.

## References

- [operation-registry.md](operation-registry.md): OperationSpec, Handler, AccessControl, service discovery
- ADR-005: irpc as call protocol foundation
- ADR-012: Call protocol stream model
- Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/`