docs(architecture): add alknet-call crate spec, ADR-012, resolve OQ-07

Add architecture specs for the alknet-call crate:

- call-protocol.md: CallAdapter, EventEnvelope wire format, bidirectional
  stream model with ID-based correlation, PendingRequestMap, protocol
  operations (call/subscribe/batch/schema), per-request identity resolution,
  connection/stream lifecycle, error codes

- operation-registry.md: OperationSpec, async Handler type, OperationRegistry,
  AccessControl with trusted call bypass, OperationEnv with context
  propagation (parent_request_id, identity inheritance), service discovery,
  irpc integration layering, naming convention (no leading slash in names)

- ADR-012: Call protocol uses bidirectional QUIC streams with EventEnvelope
  framing and ID-based correlation. Protocol is stream-agnostic and symmetric.
  Resolves OQ-07.

Key design decisions:
- Handler type is async (Fn returning Pin<Box<dyn Future>>)
- OperationEnv::invoke propagates parent context (identity, metadata,
  parent_request_id)
- Identity resolution is per-request, not per-connection
- Operation names without leading slash (fs/readFile, not /fs/readFile)
- Batch is a client-side pattern, not a protocol primitive (OQ-14)
- Phase 1 uses service/op paths, node prefix added later (OQ-13)

Also: promote ADR-010 and ADR-011 from Proposed to Accepted, add OQ-13
and OQ-14 to open-questions.md.
2026-06-16 14:22:20 +00:00
parent bd4055ff70
commit a596f0d188
8 changed files with 686 additions and 17 deletions

@@ -1,15 +1,15 @@
--- ---
status: draft status: draft
last_updated: 2026-06-17 last_updated: 2026-06-16
--- ---
# Alknet Architecture # Alknet Architecture
## Current State ## Current State
**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable) and research/reference material. Foundational ADRs (001–011) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), and AuthContext structure (ADR-011). The alknet-core crate spec is in draft. **Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable) and research/reference material. Foundational ADRs (001–012) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), and call protocol stream model (ADR-012). The alknet-core and alknet-call crate specs are in draft.
**Next step**: Review alknet-core spec documents, then begin implementation. Two-way-door questions (OQ-05, OQ-07, OQ-11, OQ-12) will be resolved during implementation. **Next step**: Review alknet-call spec documents, then begin implementation. Two-way-door questions (OQ-11, OQ-13, OQ-14) will be resolved during implementation.
## Architecture Documents ## Architecture Documents
@@ -22,8 +22,9 @@ last_updated: 2026-06-17
| [crates/core/endpoint.md](crates/core/endpoint.md) | draft | ALPN router, HandlerRegistry, accept loop, shutdown | | [crates/core/endpoint.md](crates/core/endpoint.md) | draft | ALPN router, HandlerRegistry, accept loop, shutdown |
| [crates/core/auth.md](crates/core/auth.md) | draft | AuthContext, Identity, IdentityProvider, AuthToken, resolution flow | | [crates/core/auth.md](crates/core/auth.md) | draft | AuthContext, Identity, IdentityProvider, AuthToken, resolution flow |
| [crates/core/config.md](crates/core/config.md) | draft | StaticConfig, DynamicConfig, ArcSwap, ConfigReloadHandle | | [crates/core/config.md](crates/core/config.md) | draft | StaticConfig, DynamicConfig, ArcSwap, ConfigReloadHandle |
| [crates/call/README.md](crates/call/README.md) | draft | alknet-call crate index |
Crate-specific specs for alknet-call, alknet-ssh, etc. will be created when each crate is ready for Phase 1 architecture work. | [crates/call/call-protocol.md](crates/call/call-protocol.md) | draft | CallAdapter, EventEnvelope framing, stream model, PendingRequestMap, bidirectional calls |
| [crates/call/operation-registry.md](crates/call/operation-registry.md) | draft | OperationSpec, Handler, OperationRegistry, AccessControl, service discovery, irpc integration |
## ADR Table ## ADR Table
@@ -38,8 +39,9 @@ Crate-specific specs for alknet-call, alknet-ssh, etc. will be created when each
| [007](decisions/007-bistream-type-definition.md) | BiStream Type Definition | Accepted | | [007](decisions/007-bistream-type-definition.md) | BiStream Type Definition | Accepted |
| [008](decisions/008-secret-service-integration.md) | Vault Integration Point | Accepted | | [008](decisions/008-secret-service-integration.md) | Vault Integration Point | Accepted |
| [009](decisions/009-one-way-door-decision-framework.md) | One-Way Door Decision Framework | Accepted | | [009](decisions/009-one-way-door-decision-framework.md) | One-Way Door Decision Framework | Accepted |
| [010](decisions/010-alpn-router-and-endpoint.md) | ALPN Router and Endpoint | Proposed | | [010](decisions/010-alpn-router-and-endpoint.md) | ALPN Router and Endpoint | Accepted |
| [011](decisions/011-authcontext-structure.md) | AuthContext Structure and Resolution Flow | Proposed | | [011](decisions/011-authcontext-structure.md) | AuthContext Structure and Resolution Flow | Accepted |
| [012](decisions/012-call-protocol-stream-model.md) | Call Protocol Stream Model | Accepted |
## Open Questions ## Open Questions
@@ -55,11 +57,13 @@ See [open-questions.md](open-questions.md) for the full tracker.
**Resolved two-way doors:** **Resolved two-way doors:**
- **OQ-04**: Dynamic handler registration — static at startup (ADR-010) - **OQ-04**: Dynamic handler registration — static at startup (ADR-010)
- **OQ-07**: Call protocol scope — bidirectional streams, EventEnvelope, ID-based correlation (ADR-012)
- **OQ-12**: TLS certificate provisioning — file paths in StaticConfig, ACME later - **OQ-12**: TLS certificate provisioning — file paths in StaticConfig, ACME later
**Two-way doors (resolved or deferred to implementation):** **Open two-way doors (resolved during implementation):**
- **OQ-07**: Call protocol scope — start with one stream per operation
- **OQ-11**: Handler-level auth resolution observability — decide during implementation - **OQ-11**: Handler-level auth resolution observability — decide during implementation
- **OQ-13**: Operation path format — `/{service}/{op}` for Phase 1, `/{node}/{service}/{op}` later
- **OQ-14**: Batch operation semantics — client-side pattern for Phase 1, batch event types later
**Deferred (not active):** **Deferred (not active):**
- **OQ-09**: WASM target boundaries — design constraint, not deliverable - **OQ-09**: WASM target boundaries — design constraint, not deliverable

@@ -0,0 +1,46 @@
---
status: draft
last_updated: 2026-06-17
---
# alknet-call
Structured RPC over QUIC: operations, request/response, streaming subscriptions, and service discovery. Implements `ProtocolHandler` on ALPN `alknet/call`.
## Documents
| Document | Status | Description |
|----------|--------|-------------|
| [call-protocol.md](call-protocol.md) | draft | CallAdapter, EventEnvelope framing, stream model, PendingRequestMap, bidirectional calls |
| [operation-registry.md](operation-registry.md) | draft | OperationSpec, Handler, OperationRegistry, AccessControl, service discovery, irpc integration |
## Applicable ADRs
| ADR | Title | Relevance |
|-----|-------|-----------|
| [001](../../decisions/001-alpn-protocol-dispatch.md) | ALPN-Based Protocol Dispatch | CallAdapter registers on ALPN `alknet/call` |
| [002](../../decisions/002-protocol-handler-trait.md) | ProtocolHandler Trait | CallAdapter implements ProtocolHandler |
| [003](../../decisions/003-crate-decomposition.md) | Crate Decomposition | alknet-call depends on alknet-core and irpc |
| [004](../../decisions/004-auth-as-shared-core.md) | Auth as Shared Core | AuthContext passed to call handlers |
| [005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc as Call Protocol Foundation | irpc provides framing and service dispatch |
| [006](../../decisions/006-alpn-convention-and-connection-model.md) | ALPN String Convention | `alknet/call` ALPN, one ALPN per connection |
| [007](../../decisions/007-bistream-type-definition.md) | BiStream Type Definition | CallAdapter receives Connection, not BiStream |
| [008](../../decisions/008-secret-service-integration.md) | Vault Integration Point | Vault operations exposed via call protocol |
| [012](../../decisions/012-call-protocol-stream-model.md) | Call Protocol Stream Model | Bidirectional streams, EventEnvelope, ID-based correlation |
## Relevant Open Questions
| OQ | Title | Status | Relevance |
|----|-------|--------|-----------|
| OQ-07 | Call protocol scope within a connection | resolved (ADR-012) | Stream model, multiplexing, scope |
| OQ-13 | Operation path format and routing scope | open | Namespace paths: `/{service}/{op}` vs `/{node}/{service}/{op}` |
| OQ-14 | Batch operation semantics | open | Whether batch is a protocol primitive or client-side pattern |
## Key Design Principles
1. **One connection, full access**: An `alknet/call` connection gives access to the entire operation registry — calls, subscriptions, batch, schema.
2. **Protocol is symmetric**: Both sides can initiate calls. The server calling a client uses the same EventEnvelope format and correlation.
3. **Stream-agnostic correlation**: PendingRequestMap correlates by request ID, not by stream. The protocol works with any stream arrangement.
4. **Operation registry is fixed at startup**: Operations are registered at startup by the CLI binary and cannot change at runtime. The registry supports JSON Schema discovery.
5. **irpc is one dispatch backend**: Local operations dispatch directly. irpc service calls (vault, auth) are internal. The call protocol is the external interface.
6. **Phase 1 is local dispatch only**: The operation registry dispatches to local handlers. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built yet.

@@ -0,0 +1,278 @@
---
status: draft
last_updated: 2026-06-17
---
# Call Protocol
The wire protocol, stream model, framing, and adapter that alknet-call implements on ALPN `alknet/call`.
## What
The call protocol is a bidirectional, stream-agnostic RPC protocol that runs over QUIC bidirectional streams within a single `alknet/call` connection. It supports request/response calls, streaming subscriptions, batch operations, and service discovery — all using the same EventEnvelope wire format.
The `CallAdapter` implements `ProtocolHandler` for ALPN `alknet/call`. It receives a `Connection` from the endpoint, accepts bidirectional streams, and dispatches incoming `EventEnvelope` messages to the operation registry.
## Why
The call protocol is the primary programmatic interface to an alknet node. While SSH provides interactive shell access and HTTP provides REST APIs, the call protocol provides structured, discoverable RPC — the same interface that NAPI clients, MCP tools, and other automation consumers use.
The protocol must be:
- **Cross-language**: JSON wire format consumable from TypeScript, Python, any language
- **Bidirectional**: Both sides can initiate calls (server-to-client is as natural as client-to-server)
- **Stream-agnostic**: QUIC provides stream multiplexing; the protocol shouldn't impose additional constraints
- **Discoverable**: Clients can query what operations exist and their schemas
See ADR-005 for the decision to use irpc as the call protocol's foundation and ADR-012 for the stream model decision.
## Architecture
### CallAdapter
The `CallAdapter` implements `ProtocolHandler`:
```rust
pub struct CallAdapter {
    registry: Arc<OperationRegistry>,
    identity_provider: Arc<dyn IdentityProvider>,
}

#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // Accept bidirectional streams, read EventEnvelopes,
        // dispatch to registry, write responses
    }
}
```
The adapter:
1. Accepts bidirectional streams on the connection
2. Reads length-prefixed JSON `EventEnvelope` frames from each stream
3. Resolves the peer's identity using `AuthContext` and `IdentityProvider`
4. Dispatches `call.requested` events to the operation registry
5. Writes response `EventEnvelope` frames back to the appropriate stream
6. Manages the `PendingRequestMap` for outgoing calls
### Stream Model
See ADR-012 for the full rationale.
The call protocol uses bidirectional QUIC streams with EventEnvelope framing. Key properties:
- **Either side can open streams**: The client opens a stream to call a server operation. The server opens a stream to call a client operation. Both use `open_bi()` and `accept_bi()`.
- **Correlation by request ID**: The `id` field in `EventEnvelope` correlates requests with responses. A response arriving on stream N can fulfill a request sent on stream M. The `PendingRequestMap` is keyed by ID, not by stream.
- **Stream usage is the client's choice**: A client may open one stream per operation, one stream for all operations, or any mix. The server processes EventEnvelopes regardless of stream origin.
- **One connection, full access**: A single `alknet/call` connection provides access to all operations (call, subscribe, batch, schema). No need for multiple connections or multiple ALPNs.
### Wire Format: EventEnvelope
Every message on the wire is a length-prefixed JSON `EventEnvelope`:
```rust
pub struct EventEnvelope {
    pub r#type: String, // Event type
    pub id: String,     // Correlation key (request ID, subscription ID)
    pub payload: Value, // serde_json::Value; schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
The `Value` type is `serde_json::Value`. The envelope is always JSON so that it can be consumed from JavaScript, Python, or any other language.
Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within the `payload` field. The convention is: if an operation's output schema specifies a binary field, the handler encodes it as a base64 string and the client decodes it. The `EventEnvelope` structure is not aware of this convention — it carries a `serde_json::Value` and does not interpret the payload. This is a handler-level concern, not a protocol-level concern.
This is the same framing used by irpc and by the `@alkdev/pubsub` TypeScript adapters. The wire format is identical — an `EventEnvelope` flowing from a Rust handler through core, out over a QUIC stream, can be consumed by a JavaScript `@alkdev/operations` call handler with zero translation at the wire level.
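The frame layout described above (4-byte big-endian length prefix followed by a UTF-8 JSON body) can be sketched with the standard library alone. The function names `encode_frame` and `decode_frame` are illustrative assumptions, not the API of `FrameFramedReader` or `FrameFramedWriter`, and the JSON body is treated as an opaque string.

```rust
/// Encode one frame: 4-byte big-endian length prefix followed by the JSON body.
/// (Hypothetical helper; bodies larger than u32::MAX bytes are not handled.)
fn encode_frame(json_body: &str) -> Vec<u8> {
    let body = json_body.as_bytes();
    let mut frame = Vec::with_capacity(4 + body.len());
    frame.extend_from_slice(&(body.len() as u32).to_be_bytes());
    frame.extend_from_slice(body);
    frame
}

/// Try to split one complete frame off the front of `buf`.
/// Returns the UTF-8 body and the remaining bytes, or `None` when the buffer
/// does not yet hold a complete frame or the body is not valid UTF-8.
fn decode_frame(buf: &[u8]) -> Option<(&str, &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let body = buf.get(4..4 + len)?;
    let text = std::str::from_utf8(body).ok()?;
    Some((text, &buf[4 + len..]))
}
```

A reader that receives partial data would keep appending to its buffer and retry `decode_frame` until it returns `Some`.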
### Event Types
Five event types carry request/response and subscription semantics:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
**A call is a subscribe that resolves after one event.** Both `call()` and `subscribe()` send the same `call.requested` event. The difference is consumption pattern:
- **call()**: Sends `call.requested`, resolves on first `call.responded`
- **subscribe()**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`
The `id` field carries the `requestId` for correlation.
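The difference in consumption pattern can be sketched as two functions over the same event sequence. This is a simplified, synchronous illustration: the `Event` enum, the function names, and the use of plain strings for payloads and error messages are assumptions made for the example, not types from the spec.

```rust
/// Events received for one request ID (payloads kept as raw JSON strings here).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Responded(String),
    Completed,
    Aborted,
    Error(String),
}

/// call(): resolve on the first `call.responded`, or fail on the first terminal event.
fn consume_call(events: impl IntoIterator<Item = Event>) -> Result<String, String> {
    for event in events {
        match event {
            Event::Responded(payload) => return Ok(payload),
            Event::Error(message) => return Err(message),
            Event::Aborted => return Err("aborted".to_string()),
            Event::Completed => return Err("completed without a response".to_string()),
        }
    }
    Err("stream ended without a response".to_string())
}

/// subscribe(): collect every `call.responded` until a terminal event arrives.
fn consume_subscribe(events: impl IntoIterator<Item = Event>) -> Vec<String> {
    let mut items = Vec::new();
    for event in events {
        match event {
            Event::Responded(payload) => items.push(payload),
            Event::Completed | Event::Aborted | Event::Error(_) => break,
        }
    }
    items
}
```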
### `call.error` Payload
```json
{
  "code": "NOT_FOUND",
  "message": "operation not found: /fs/readFile",
  "retryable": false
}
```
Error codes use an extensible string enum. Phase 1 defines the following codes:
- `NOT_FOUND` — operation not in registry
- `FORBIDDEN` — access denied (insufficient scopes or unauthenticated)
- `INVALID_INPUT` — input doesn't match the operation's JSON Schema
- `INTERNAL` — handler error
- `TIMEOUT` — request timed out (retryable: true)
New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`.
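A client could apply the unknown-code rule with a small normalization step. The sketch below assumes a client-side `CallError` mirror and a hypothetical `normalize_error` helper; `RATE_LIMITED` in the usage note is an invented code used only to exercise the fallback.

```rust
/// The Phase 1 error codes listed above.
const KNOWN_CODES: [&str; 5] = ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

#[derive(Debug, Clone, PartialEq)]
struct CallError {
    code: String,
    message: String,
    retryable: bool,
}

/// Map unknown codes to `INTERNAL` with `retryable: false`, keeping the message.
/// Known codes pass through unchanged.
fn normalize_error(error: CallError) -> CallError {
    if KNOWN_CODES.contains(&error.code.as_str()) {
        error
    } else {
        CallError {
            code: "INTERNAL".to_string(),
            retryable: false,
            ..error
        }
    }
}
```

With this helper, an error carrying a code such as `RATE_LIMITED` from a newer server is handled as a non-retryable `INTERNAL` error by an older client.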
### Protocol Operations
The call protocol defines four top-level operations, expressed through event types and operation names:
| Operation | Event Pattern | Description |
|-----------|--------------|-------------|
| **call** | `call.requested` → `call.responded` or `call.error` | Request/response: one result |
| **subscribe** | `call.requested` → many `call.responded` → `call.completed` or `call.aborted` | Streaming: zero or more results |
| **batch** | multiple `call.requested` (different IDs) → multiple `call.responded` | Multiple operations in one round |
| **schema** | `call.requested` with name `services/list` or `services/schema` → `call.responded` | Discover available operations |
Batch is not a separate event type — it's multiple `call.requested` events with different request IDs. The client sends them (on one or many streams) and correlates the responses by ID. See OQ-14.
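As a rough illustration of batch as a client-side pattern, the sketch below builds several `call.requested` envelopes with distinct IDs and matches responses back by ID regardless of arrival order. The `Envelope` struct, the `build_batch` and `correlate` helpers, and the `prefix-index` ID scheme are assumptions for the example; payloads are opaque JSON strings.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Envelope {
    kind: String, // event type, e.g. "call.requested"
    id: String,
    payload: String,
}

/// Build one `call.requested` envelope per payload, each with its own request ID.
fn build_batch(prefix: &str, payloads: &[&str]) -> Vec<Envelope> {
    payloads
        .iter()
        .enumerate()
        .map(|(i, payload)| Envelope {
            kind: "call.requested".to_string(),
            id: format!("{prefix}-{i}"),
            payload: payload.to_string(),
        })
        .collect()
}

/// Match `call.responded` envelopes to requests by ID.
/// Results come back in request order; `None` marks a request with no response yet.
fn correlate(requests: &[Envelope], responses: Vec<Envelope>) -> Vec<Option<String>> {
    let mut by_id: HashMap<String, String> = responses
        .into_iter()
        .filter(|r| r.kind == "call.responded")
        .map(|r| (r.id, r.payload))
        .collect();
    requests.iter().map(|req| by_id.remove(&req.id)).collect()
}
```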
### Bidirectional Calls
Both sides of the connection can initiate calls. The server can call operations on the client just as the client calls operations on the server.
```
Client Server
│ │
│── open_bi() → stream ─────────────────────────▶│
│── call.requested { id: "c1", ... } ────────────▶│ (client calls server)
│◀─ call.responded { id: "c1", ... } ───────────│
│ │
│◀─ open_bi() ← stream ──────────────────────────│
│◀─ call.requested { id: "s1", ... } ────────────│ (server calls client)
│── call.responded { id: "s1", ... } ───────────▶│
│ │
```
The server calls client operations using the same `PendingRequestMap` and the same `EventEnvelope` format. The operation registry on the client side dispatches `call.requested` events just like the server side.
This enables patterns where the server pushes notifications, requests configuration from the client, or orchestrates workflows that require the client to perform operations.
### PendingRequestMap
Manages in-flight calls and subscriptions. Correlates `call.responded` events back to the original `call.requested`:
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
    },
}
```
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
When `call.aborted` arrives → cancel/drop whichever side initiated it.
A `call.aborted` for an unknown `requestId` is silently discarded.
Timeouts prevent dangling entries. A background task sweeps expired entries periodically.
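The bookkeeping rules above can be sketched in a synchronous form. This is not the spec's implementation: `std::sync::mpsc` channels stand in for the async `oneshot` and `mpsc` channels, payloads and errors are plain strings, and the method names are assumptions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::time::{Duration, Instant};

type CallResult = Result<String, String>; // simplified: payload or error message

enum PendingEntry {
    Call { tx: Sender<CallResult>, deadline: Instant },
    Subscribe { tx: Sender<CallResult>, deadline: Option<Instant> },
}

#[derive(Default)]
struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    fn register_call(&mut self, id: &str, timeout: Duration) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        let deadline = Instant::now() + timeout;
        self.pending.insert(id.to_string(), PendingEntry::Call { tx, deadline });
        rx
    }

    fn register_subscription(&mut self, id: &str) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx, deadline: None });
        rx
    }

    /// `call.responded`: a call resolves once and is removed; a subscription stays.
    fn on_responded(&mut self, id: &str, payload: String) {
        match self.pending.remove(id) {
            Some(PendingEntry::Call { tx, .. }) => {
                let _ = tx.send(Ok(payload));
            }
            Some(PendingEntry::Subscribe { tx, deadline }) => {
                let _ = tx.send(Ok(payload));
                self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx, deadline });
            }
            None => {} // unknown ID: discard
        }
    }

    /// `call.completed` or `call.aborted`: removing the entry drops the sender,
    /// which closes the channel. Unknown IDs are silently ignored.
    fn on_finished(&mut self, id: &str) {
        self.pending.remove(id);
    }

    /// Sweeper pass: drop every entry whose deadline has passed.
    fn sweep(&mut self, now: Instant) {
        self.pending.retain(|_, entry| match entry {
            PendingEntry::Call { deadline, .. } => *deadline > now,
            PendingEntry::Subscribe { deadline, .. } => (*deadline).map_or(true, |d| d > now),
        });
    }
}
```

In this sketch a swept or aborted entry surfaces to the waiting caller as a closed channel; a fuller implementation would likely send an explicit `TIMEOUT` error before dropping the sender.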
### CallAdapter Stream Handling
The `CallAdapter::handle()` method:
1. Spawns a task that continuously calls `connection.accept_bi()` to receive incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using `FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server
For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID
### AuthContext and Identity Resolution
The `CallAdapter` receives an `AuthContext` from the endpoint. The call protocol resolves identity per-request, not per-connection:
**Resolution flow**:
1. The endpoint provides `AuthContext` with whatever identity it resolved at the TLS layer (e.g., client certificate fingerprint). This may be `None` — the `AuthContext.identity` field is `Option<Identity>`.
2. When a `call.requested` event arrives, the `CallAdapter` constructs an `OperationContext` with the connection-level `AuthContext.identity`.
3. If the `call.requested` payload includes an `auth_token` field, the `CallAdapter` resolves it using `IdentityProvider::resolve_from_token()`. If resolution succeeds, the resulting `Identity` replaces the connection-level identity in the `OperationContext`. If resolution fails, the request proceeds with the connection-level identity (which may be `None`).
4. The `OperationContext.identity` is passed to the `OperationRegistry` for ACL checking.
5. If `identity` is `None` and the operation's `AccessControl` has restrictions, the registry returns `FORBIDDEN` with message `"authentication required"`.
**Key point**: Identity is resolved per-request, not per-connection. This allows a single connection to upgrade authentication mid-session (e.g., after an `auth/login` operation returns a token), and allows different operations on the same connection to have different identity levels.
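The resolution flow reduces to a short fallback chain. In the sketch below, the `Identity` shape, the `resolve_request_identity` name, and the closure standing in for `IdentityProvider::resolve_from_token()` are assumptions made for illustration.

```rust
/// Hypothetical identity shape; the real `Identity` type lives in alknet-core.
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    subject: String,
}

/// Per-request identity: a token that resolves replaces the connection-level
/// identity; a missing or unresolvable token falls back to it (possibly `None`).
fn resolve_request_identity(
    connection_identity: Option<&Identity>,
    auth_token: Option<&str>,
    resolve_from_token: impl Fn(&str) -> Option<Identity>,
) -> Option<Identity> {
    auth_token
        .and_then(|token| resolve_from_token(token))
        .or_else(|| connection_identity.cloned())
}
```

Because the function is evaluated for each `call.requested`, a token obtained mid-session takes effect on the next request without reconnecting.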
### ResponseEnvelope
The universal return type from all operation invocations:
```rust
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire.
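The conversion for the wire might look roughly like the sketch below. To stay dependency-free it represents payloads as pre-serialized JSON strings instead of `serde_json::Value`, and the `to_event` and `json_string` helpers are assumptions; a real implementation would presumably serialize with `serde_json`.

```rust
struct CallError {
    code: String,
    message: String,
    retryable: bool,
}

struct ResponseEnvelope {
    request_id: String,
    result: Result<String, CallError>, // String stands in for serde_json::Value
}

struct EventEnvelope {
    r#type: String,
    id: String,
    payload: String,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Ok results become `call.responded`; errors become `call.error`.
/// The request ID is carried over as the envelope's correlation `id`.
fn to_event(response: ResponseEnvelope) -> EventEnvelope {
    match response.result {
        Ok(payload) => EventEnvelope {
            r#type: "call.responded".to_string(),
            id: response.request_id,
            payload,
        },
        Err(e) => EventEnvelope {
            r#type: "call.error".to_string(),
            id: response.request_id,
            payload: format!(
                "{{\"code\":{},\"message\":{},\"retryable\":{}}}",
                json_string(&e.code),
                json_string(&e.message),
                e.retryable
            ),
        },
    }
}
```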
### Connection and Stream Lifecycle
**Connection drop**: When the QUIC connection closes, all pending requests in the `PendingRequestMap` are failed with `call.error` code `INTERNAL` and message `"connection closed"`. All subscription channels are closed. The `CallAdapter::handle()` method returns `Ok(())` (clean shutdown) or `Err(HandlerError::ConnectionClosed)` (unexpected).
**Stream reset**: When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an error. If the stream was carrying a subscription, the `PendingRequestMap` entry is removed and the mpsc channel is closed. If the stream was carrying a call, the oneshot is resolved with an error. No `call.aborted` is sent — the stream is gone.
**Timeouts**: Default timeout for calls is 30 seconds. Default timeout for subscriptions is optional (the client can specify a timeout in the `call.requested` payload, or leave it open-ended). The `PendingRequestMap` sweeper runs every 10 seconds and removes expired entries. Timeouts are configurable at the `CallAdapter` level, not per-operation.
**Error handling in `CallAdapter::handle()`**: If a handler panics, the stream is closed and the `PendingRequestMap` entry (if any) is cleaned up by the next sweeper pass. Other streams and the connection are unaffected.
## Constraints
- The call protocol does not depend on any database. `PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. The envelope is always JSON. Binary payloads may be base64-encoded in the `payload` field.
- Batch is not a protocol primitive — multiple `call.requested` events with correlated IDs provide equivalent semantics. See OQ-14.
- The call protocol is transport-agnostic at the envelope level. The `EventEnvelope` framing can run over QUIC streams, WebSocket frames, or Worker `postMessage`. The `CallAdapter` is the QUIC-specific implementation.
- Phase 1 is local dispatch only. The operation registry dispatches to handlers in the same process. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built. See ADR-005 and OQ-13.
## Design Decisions
| Decision | ADR | Summary |
|----------|-----|---------|
| irpc as call protocol foundation | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc provides framing and service dispatch |
| Call protocol stream model | [ADR-012](../../decisions/012-call-protocol-stream-model.md) | Bidirectional streams, EventEnvelope, ID-based correlation |
| ALPN per connection | [ADR-006](../../decisions/006-alpn-convention-and-connection-model.md) | `alknet/call` is a distinct ALPN, one connection per ALPN |
| ProtocolHandler receives Connection | [ADR-007](../../decisions/007-bistream-type-definition.md) | CallAdapter gets Connection, can accept/open multiple streams |
## Open Questions
- **OQ-13**: What is the operation path format for the alknet-call crate? The reference implementation used `/{node}/{service}/{op}` for head/worker routing. Phase 1 is single-node, so `/{service}/{op}` may be sufficient. The node prefix can be added later when remote dispatch is implemented.
- **OQ-14**: Should batch be a distinct protocol primitive with its own event types, or is the "multiple call.requested with correlated IDs" pattern sufficient? The reference implementation treats batch as a client-side pattern. This is a two-way door — batch-specific event types can be added later without breaking existing clients.
## References
- [operation-registry.md](operation-registry.md) — OperationSpec, Handler, AccessControl, service discovery
- ADR-005: irpc as call protocol foundation
- ADR-012: Call protocol stream model
- Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/`

@@ -0,0 +1,267 @@
---
status: draft
last_updated: 2026-06-17
---
# Operation Registry
OperationSpec, Handler, OperationRegistry, AccessControl, service discovery, and irpc integration.
## What
The operation registry maps operation names to specs and handlers. It is the dispatch core of the call protocol — when a `call.requested` event arrives, the registry looks up the operation by name, checks access control, invokes the handler, and returns the result.
The registry is populated at startup by the CLI binary (or by the assembly layer in embedded contexts). Operations cannot be added or removed at runtime. This is consistent with OQ-04 (static registration at startup) and the `HandlerRegistry` model in alknet-core.
## Why
The operation registry provides:
- **Discoverability**: Clients can query `/services/list` and `/services/schema` to learn what operations exist before calling them
- **Access control**: Each operation declares its required scopes and resources; the registry enforces ACL before invoking the handler
- **Type safety**: JSON Schema for input and output enables validation and client code generation
- **Composability**: Handlers can invoke other operations through `OperationEnv` (local dispatch in Phase 1)
The registry design is derived from the `@alkdev/operations` TypeScript package, which provides the same capabilities in JavaScript runtimes. The Rust implementation preserves the behavioral contract: namespace + operation name → invoke with input, return output.
## Architecture
### OperationSpec
Every registered operation has a spec that declares its name, type, schemas, and access control:
```rust
pub struct OperationSpec {
    pub name: String,                  // e.g., "fs/readFile", "vault/derive" (no leading slash)
    pub namespace: String,             // e.g., "fs", "vault"
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl,
}

pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "fs/readFile", "services/list")
    Mutation,     // Side effects (e.g., "bash/exec", "vault/unlock")
    Subscription, // Streaming (e.g., "events/subscribe")
}
```
Operation names use slash-based paths without a leading slash, aligned with URL path conventions: `fs/readFile`, `vault/derive`, `services/list`. The leading slash is added when needed for display (`spec.path()` returns `/fs/readFile`) and for wire format (the `call.requested` payload uses `/fs/readFile`). See OQ-13 for the path format decision (single-node `service/op` vs head/worker `node/service/op`).
The `namespace` field is derived from the name: for `fs/readFile` it's `fs`, for `vault/derive` it's `vault`. It's a convenience accessor for ACL matching and service grouping.
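The naming rules can be illustrated with three small helpers. Only `path()` is named in the text; `namespace_of`, `display_path`, and `from_wire_path` are hypothetical free functions used here to show the conversions between the registered name, the display path, and the wire path.

```rust
/// Namespace is the first path segment: "fs/readFile" -> "fs".
fn namespace_of(name: &str) -> &str {
    name.split('/').next().unwrap_or(name)
}

/// Display and wire form add the leading slash: "fs/readFile" -> "/fs/readFile".
fn display_path(name: &str) -> String {
    format!("/{name}")
}

/// Converting a wire path back to a registry name strips one leading slash, if present.
fn from_wire_path(path: &str) -> &str {
    path.strip_prefix('/').unwrap_or(path)
}
```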
### AccessControl
```rust
pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked: caller must have ALL
    pub required_scopes_any: Option<Vec<String>>, // OR-checked: caller must have at LEAST ONE
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```
When a `call.requested` event arrives:
1. The `CallAdapter` resolves the caller's `Identity` from `AuthContext` (and possibly an `AuthToken` in the payload)
2. The registry checks `access_control.check(identity)` before invoking the handler
3. If access is denied, the adapter returns `call.error` with code `FORBIDDEN`
4. If the identity is `None` and the operation has restrictions, the adapter returns `call.error` with code `FORBIDDEN` and message `"authentication required"`
Operations with empty `AccessControl` (no required scopes, no resource checks) are accessible to all callers, including unauthenticated ones.
**Trusted calls skip ACL**: When a handler invokes another operation through `OperationEnv`, the nested call is marked `trusted: true` and skips access control checks. This prevents double-checking: if `/agent/chat` is allowed and it internally calls `/auth/verify`, the auth check is trusted.
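A minimal sketch of the scope check, under several assumptions: `Identity` is reduced to a list of scopes, the resource type/action checks are omitted, the `Decision` enum and the `"insufficient scopes"` message are invented for the example, and the scope names in the usage are hypothetical.

```rust
struct AccessControl {
    required_scopes: Vec<String>,             // caller must hold ALL of these
    required_scopes_any: Option<Vec<String>>, // caller must hold at LEAST ONE of these
}

/// Hypothetical identity shape carrying only scopes.
struct Identity {
    scopes: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Forbidden(&'static str),
}

fn check(acl: &AccessControl, identity: Option<&Identity>, trusted: bool) -> Decision {
    let unrestricted = acl.required_scopes.is_empty()
        && acl.required_scopes_any.as_ref().map_or(true, |any| any.is_empty());
    // Trusted (nested) calls and unrestricted operations skip the check entirely.
    if trusted || unrestricted {
        return Decision::Allow;
    }
    let identity = match identity {
        Some(identity) => identity,
        None => return Decision::Forbidden("authentication required"),
    };
    let has_all = acl.required_scopes.iter().all(|s| identity.scopes.contains(s));
    let has_any = acl.required_scopes_any.as_ref().map_or(true, |any| {
        any.is_empty() || any.iter().any(|s| identity.scopes.contains(s))
    });
    if has_all && has_any {
        Decision::Allow
    } else {
        Decision::Forbidden("insufficient scopes")
    }
}
```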
### Handler
```rust
pub type Handler = Arc<dyn Fn(Value, OperationContext) -> Pin<Box<dyn Future<Output = ResponseEnvelope> + Send>> + Send + Sync>;
```
Handlers are async because many operations (vault key derivation, file I/O, irpc service calls) are inherently asynchronous. A handler is called with the input and an `OperationContext` and returns a boxed `Future` that resolves to a `ResponseEnvelope`.
A handler receives:
- `input: Value` — the deserialized `payload` from the `call.requested` event (always `serde_json::Value`)
- `context: OperationContext` — request ID, identity, metadata, env
And returns a `ResponseEnvelope` containing the result or an error.
### OperationContext
```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool,
}
```
- `request_id`: Correlates with the `call.requested` event's `id` field
- `parent_request_id`: Set when this call was initiated by another operation (via `OperationEnv`)
- `identity`: The authenticated identity making the call (from `IdentityProvider`)
- `metadata`: Additional context (connection info, tracing IDs)
- `env`: The operation environment for composing calls to other operations
- `trusted`: When `true`, ACL checks are skipped (set by `OperationEnv`, not by callers). The `trusted` field uses module-private construction — handlers construct `OperationContext` through `OperationEnv::invoke()` which sets `trusted: true`, or through the `CallAdapter` dispatch path which sets `trusted: false`. The field is not `pub` for writes; only `pub fn is_trusted(&self) -> bool` is exposed for reads.
### OperationRegistry
```rust
pub struct OperationRegistry {
    operations: HashMap<String, (OperationSpec, Handler)>,
}
```
The registry maps operation names to `(OperationSpec, Handler)` pairs. Key methods:
- `register(spec, handler)`: Add an operation at startup
- `lookup(name)`: Find an operation by name, returning spec and handler
- `invoke(name, input, context)`: Look up, check ACL, invoke handler, return result
- `list_operations()`: Return all registered specs (for `/services/list`)
The `OperationRegistryBuilder` provides a fluent API for constructing the registry at startup:
```rust
let registry = OperationRegistryBuilder::new()
    .with(services_list_spec(), Arc::new(services_list_handler))
    .with(services_schema_spec(), Arc::new(schema_handler))
    .with(vault_derive_spec(), Arc::new(vault_derive_handler))
    .with(vault_unlock_spec(), Arc::new(vault_unlock_handler))
    .build();
```
The CLI binary (or assembly layer) constructs the registry and passes it to the `CallAdapter`. Once built, the registry is immutable.
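The build-then-freeze pattern and the lookup, ACL check, invoke sequence can be sketched with the standard library alone. This is a simplified illustration, not the crate's API: handlers here are synchronous and exchange strings (the spec's handlers are async and exchange JSON values), and `requires_identity` is a hypothetical stand-in for the real `AccessControl` rules.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified handler type (assumption): sync, string in, string out.
type Handler = Arc<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

pub struct OperationSpec {
    pub name: String,
    // Hypothetical ACL rule: does the operation need an authenticated caller?
    pub requires_identity: bool,
}

pub struct OperationRegistry {
    operations: HashMap<String, (OperationSpec, Handler)>,
}

#[derive(Default)]
pub struct OperationRegistryBuilder {
    operations: HashMap<String, (OperationSpec, Handler)>,
}

impl OperationRegistryBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with(mut self, spec: OperationSpec, handler: Handler) -> Self {
        self.operations.insert(spec.name.clone(), (spec, handler));
        self
    }

    // Consumes the builder; the registry exposes no mutating methods.
    pub fn build(self) -> OperationRegistry {
        OperationRegistry { operations: self.operations }
    }
}

impl OperationRegistry {
    /// Look up the operation, check the ACL unless the call is trusted,
    /// then run the handler.
    pub fn invoke(&self, name: &str, input: &str, identity: Option<&str>, trusted: bool) -> Result<String, String> {
        let (spec, handler) = self
            .operations
            .get(name)
            .ok_or_else(|| format!("unknown operation: {name}"))?;
        if !trusted && spec.requires_identity && identity.is_none() {
            return Err(format!("access denied: {name}"));
        }
        handler(input)
    }

    pub fn list_operations(&self) -> Vec<&OperationSpec> {
        self.operations.values().map(|(spec, _)| spec).collect()
    }
}

/// Example wiring with two illustrative operations.
pub fn demo_registry() -> OperationRegistry {
    OperationRegistryBuilder::new()
        .with(
            OperationSpec { name: "services/list".to_string(), requires_identity: false },
            Arc::new(|_input: &str| -> Result<String, String> { Ok("[]".to_string()) }),
        )
        .with(
            OperationSpec { name: "vault/derive".to_string(), requires_identity: true },
            Arc::new(|input: &str| -> Result<String, String> { Ok(format!("derived:{input}")) }),
        )
        .build()
}
```

Because `build()` takes the builder by value and the registry has only `&self` methods, the type system enforces the immutability described above without any locking.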
### OperationEnv
```rust
#[async_trait]
pub trait OperationEnv: Send + Sync {
    async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope;
}
```
`OperationEnv` is the universal composition mechanism. A handler calls `context.env.invoke("vault", "derive", input, &context)` and gets a `ResponseEnvelope` back — regardless of whether the operation runs locally, via an irpc service, or on a remote node.
The `parent` parameter propagates the calling context: the nested call gets `parent_request_id: Some(parent.request_id)`, inherits `parent.identity` and `parent.metadata`, and is marked `trusted: true`.
**Phase 1: Local dispatch only.** The initial `OperationEnv` implementation dispatches directly through the local `OperationRegistry`:
```rust
#[derive(Clone)]
pub struct LocalOperationEnv {
    registry: Arc<OperationRegistry>,
}

#[async_trait]
impl OperationEnv for LocalOperationEnv {
    async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope {
        // Operation names carry no leading slash (`vault/derive`, not `/vault/derive`).
        let name = format!("{namespace}/{operation}");
        let context = OperationContext {
            request_id: format!("env-{name}"),
            parent_request_id: Some(parent.request_id.clone()),
            identity: parent.identity.clone(), // Inherit caller's identity
            metadata: parent.metadata.clone(), // Inherit caller's metadata
            // Assumes `env` is held as `Arc<dyn OperationEnv>`; the clone only copies the inner `Arc`.
            env: Arc::new(self.clone()),
            trusted: true, // Nested calls skip ACL
        };
        self.registry.invoke(&name, input, context).await
    }
}
```
Future phases add irpc service dispatch and remote call protocol dispatch as additional backends. The handler-facing API stays the same.
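The context propagation performed by `LocalOperationEnv::invoke` can be isolated as a small standalone sketch. `Ctx` and `nested_context` are hypothetical names; identity is reduced to a string and metadata values to strings so the example needs only the standard library.

```rust
use std::collections::HashMap;

// Simplified stand-in for `OperationContext` (assumption: string identity,
// string metadata values, no `env` field).
#[derive(Clone, Debug)]
pub struct Ctx {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<String>,
    pub metadata: HashMap<String, String>,
    pub trusted: bool,
}

/// Build the context for a nested call from the caller's context.
pub fn nested_context(parent: &Ctx, namespace: &str, operation: &str) -> Ctx {
    // Operation names carry no leading slash.
    let name = format!("{namespace}/{operation}");
    Ctx {
        request_id: format!("env-{name}"),
        parent_request_id: Some(parent.request_id.clone()),
        identity: parent.identity.clone(), // inherited from the caller
        metadata: parent.metadata.clone(), // inherited from the caller
        trusted: true,                     // nested calls skip ACL
    }
}
```

Chaining `nested_context` twice yields a grandchild whose `parent_request_id` is the child's `request_id`, so a call tree can be reconstructed from the IDs alone.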
### Service Discovery
Two built-in operations expose what the node offers:
| Operation name | Display path | Type | Description |
|---------------|-------------|------|-------------|
| `services/list` | `/services/list` | Query | List registered operation names and metadata |
| `services/schema` | `/services/schema` | Query | Get the `OperationSpec` for a specific operation |
These are read-only — no admin operations are exposed through the call protocol itself.
`services/list` returns:
```json
{
  "operations": [
    { "name": "fs/readFile", "namespace": "fs", "op_type": "query" },
    { "name": "vault/derive", "namespace": "vault", "op_type": "mutation" },
    { "name": "events/subscribe", "namespace": "events", "op_type": "subscription" }
  ]
}
```
`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas.
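The relationship between an operation name, its namespace, and its display path can be sketched as follows. `OperationSummary`, `summarize`, `display_path`, and `to_json` are hypothetical helpers, and the hand-written JSON assumes names that need no escaping; a real implementation would more likely serialize with `serde_json`.

```rust
#[derive(Debug, PartialEq)]
pub struct OperationSummary {
    pub name: String,
    pub namespace: String,
    pub op_type: &'static str,
}

/// Split `namespace/op` at the first slash. Names with a leading slash or
/// without a namespace are rejected.
pub fn summarize(name: &str, op_type: &'static str) -> Option<OperationSummary> {
    let (namespace, op) = name.split_once('/')?;
    if namespace.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationSummary {
        name: name.to_string(),
        namespace: namespace.to_string(),
        op_type,
    })
}

/// The display path is the operation name with a leading slash.
pub fn display_path(name: &str) -> String {
    format!("/{name}")
}

/// Render one `services/list` entry (assumes no characters that need escaping).
pub fn to_json(s: &OperationSummary) -> String {
    format!(
        r#"{{ "name": "{}", "namespace": "{}", "op_type": "{}" }}"#,
        s.name, s.namespace, s.op_type
    )
}
```

Deriving the namespace from the name keeps the two fields from drifting apart, at the cost of fixing `/` as the separator.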
### irpc Integration
irpc and the operation registry serve different scopes:
| Layer | Mechanism | Serialization | Scope |
|-------|-----------|---------------|-------|
| Call protocol (external) | `EventEnvelope` over QUIC streams | JSON | Cross-language, cross-node |
| irpc services (internal) | `VaultProtocol` derive macro, `Service` trait | postcard (binary) | Rust-to-Rust, in-process or in-cluster |
| Local dispatch (in-process) | Direct function call through `OperationRegistry` | None | Same process |
The call protocol can wrap irpc services. When `/vault/derive` receives a `call.requested` event, the handler:
1. Deserializes the JSON payload
2. Calls `VaultProtocol::DeriveEd25519` via irpc (in-process, type-safe, postcard)
3. Serializes the result back to JSON
4. Returns `call.responded` on the stream
This layering preserves irpc's type safety for internal calls while keeping the external interface cross-language.
### Operation Registration at Startup
The CLI binary (or assembly layer) registers operations before starting the endpoint:
```rust
let registry = OperationRegistryBuilder::new()
    // Built-in service discovery
    .with(services_list_spec(), Arc::new(services_list_handler))
    .with(services_schema_spec(), Arc::new(services_schema_handler))
    // Vault operations (exposed via call protocol, backed by irpc)
    .with(vault_derive_spec(), Arc::new(vault_derive_handler))
    .with(vault_unlock_spec(), Arc::new(vault_unlock_handler))
    .with(vault_lock_spec(), Arc::new(vault_lock_handler))
    .build();

let call_adapter = CallAdapter::new(Arc::new(registry), identity_provider);
```
The registry is immutable after construction. Adding operations requires restarting the process. This is consistent with OQ-04 and the `HandlerRegistry` model in alknet-core.
## Constraints
- The registry is immutable after construction. No runtime registration or deregistration. Two-way door — `ArcSwap<OperationRegistry>` can be added later.
- Operation specs use JSON Schema. The call protocol's external interface is always JSON. irpc's postcard serialization is internal only.
- Phase 1 is local dispatch only. `OperationEnv::invoke()` goes through the local registry. irpc service dispatch and remote call protocol dispatch are covered by the `OperationEnv` contract but not yet implemented.
- The call protocol does not depend on any database. Operation specs are in-memory, populated at startup.
- `OperationContext.trusted` is set by `OperationEnv`, not by callers. A handler cannot mark its own call as trusted.
## Design Decisions
| Decision | ADR | Summary |
|----------|-----|---------|
| irpc as call protocol foundation | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc provides framing and service dispatch |
| Call protocol stream model | [ADR-012](../../decisions/012-call-protocol-stream-model.md) | Bidirectional streams, EventEnvelope, ID-based correlation |
| Static handler registration | [ADR-010](../../decisions/010-alpn-router-and-endpoint.md) | Registry is immutable after construction |
| Vault integration via call protocol | [ADR-008](../../decisions/008-secret-service-integration.md) | Vault ops exposed as call protocol operations |
## Open Questions
- **OQ-13**: Operation path format — `/{service}/{op}` for Phase 1 (single-node), with the node prefix `/{node}/{service}/{op}` added when remote dispatch is implemented. Two-way door — the prefix can be added later without breaking existing operations.
- **OQ-14**: Batch operation semantics — whether to add batch-specific event types or rely on the "multiple call.requested with correlated IDs" pattern. Two-way door — can be added later.
## References
- [call-protocol.md](call-protocol.md) — CallAdapter, EventEnvelope, stream model, PendingRequestMap
- ADR-005: irpc as call protocol foundation
- ADR-008: Vault integration point
- ADR-010: ALPN router and endpoint (static registration)
- ADR-012: Call protocol stream model
- Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/`

@@ -2,7 +2,7 @@
## Status
-Proposed
+Accepted
## Context

@@ -2,7 +2,7 @@
## Status
-Proposed
+Accepted
## Context

@@ -0,0 +1,56 @@
# ADR-012: Call Protocol Stream Model
## Status
Accepted
## Context
The call protocol (alknet-call) operates on a QUIC connection with ALPN `alknet/call`. Within that connection, QUIC provides bidirectional streams. The question is how the call protocol uses those streams and how it correlates requests with responses — especially when both sides can initiate calls.
The reference implementation used `EventEnvelope` framing with a `PendingRequestMap` that correlates `call.requested` events to `call.responded` events by request ID, regardless of which stream carries them. This works well but the relationship between streams and operations was underspecified.
OQ-07 asked: "What is the scope of the call protocol within a connection? Should operations be multiplexed within a single stream, or should each operation get its own stream?"
## Decision
The call protocol uses **bidirectional QUIC streams with EventEnvelope framing and ID-based correlation**. The protocol does not prescribe a stream usage pattern — it works with any arrangement:
1. **EventEnvelope on every stream** — every bidirectional stream opened on the `alknet/call` connection carries length-prefixed JSON `EventEnvelope` messages. The five event types (`call.requested`, `call.responded`, `call.completed`, `call.aborted`, `call.error`) are the protocol primitives.
2. **PendingRequestMap correlates by ID, not by stream** — the `id` field in `EventEnvelope` correlates requests with responses. A response on stream 5 can fulfill a request sent on stream 3. The PendingRequestMap is keyed by request ID.
3. **Protocol is symmetric** — both sides of the connection can `open_bi()` to initiate calls and `accept_bi()` to receive them. The server calling a client operation uses the same EventEnvelope format and the same correlation mechanism.
4. **Top-level protocol operations** — the call protocol defines four operations that map to EventEnvelope event patterns:
- **call**: `call.requested` → `call.responded` (one response) or `call.error`
- **subscribe**: `call.requested` → one or more `call.responded` → `call.completed` or `call.aborted`
- **batch**: multiple `call.requested` events (with correlated IDs) → multiple `call.responded` events
- **schema**: `call.requested` (name `services/list` or `services/schema`) → `call.responded`
5. **Stream usage is the client's choice** — a client may open one stream per operation, one stream for all operations, or any mix. The protocol is stream-agnostic. The server accepts streams and processes EventEnvelopes regardless of which stream they arrive on.
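Length-prefixed framing, as named in item 1, can be sketched with the standard library. The prefix width and byte order are not specified here, so the 4-byte big-endian prefix and the `MAX_FRAME` limit below are assumptions, and the JSON body is treated as an opaque string.

```rust
use std::io::{self, Read, Write};

// Hypothetical upper bound so a corrupt prefix cannot trigger a huge allocation.
const MAX_FRAME: usize = 1 << 20;

/// Write one frame: 4-byte big-endian length (assumed), then the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame. Returns `Ok(None)` when the stream ends while the
/// length prefix is being read.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<String>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    String::from_utf8(body)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because each frame is self-delimiting, any number of envelopes can share one stream, which is what makes the stream-agnostic model workable.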
This resolves OQ-07: the call protocol's scope within a connection is the full operation registry. One `alknet/call` connection gives access to all operations (call, subscribe, batch, schema). QUIC's built-in stream multiplexing handles concurrency — the protocol doesn't need to impose additional multiplexing.
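ID-based correlation can be illustrated with a small map from request ID to a response channel. This is a sketch for the single-response `call` pattern only, not the reference implementation: `Envelope` is a simplified stand-in for `EventEnvelope`, and `register` and `resolve` are hypothetical method names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Simplified stand-in for `EventEnvelope`; only `id` is taken from the text above.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub id: String,
    pub event_type: String,
    pub payload: String,
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, Sender<Envelope>>,
}

impl PendingRequestMap {
    /// Register an outgoing request and return the receiver for its response.
    pub fn register(&mut self, id: &str) -> Receiver<Envelope> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), tx);
        rx
    }

    /// Deliver a response read from any stream. `_stream_id` is accepted only
    /// to show that it plays no part in correlation.
    pub fn resolve(&mut self, _stream_id: u64, response: Envelope) -> bool {
        match self.pending.remove(&response.id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}
```

A request registered while writing to one stream is fulfilled by a response read from another, because the lookup key is the envelope `id` and nothing else.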
## Consequences
**Positive:**
- Simple mental model: one connection, full access, stream-agnostic correlation
- The protocol works the same way regardless of stream usage — no "right" way to use streams
- Bidirectional calls are natural — either side can open a stream and send `call.requested`
- PendingRequestMap from the reference implementation carries forward without modification
- QUIC's stream multiplexing provides natural flow control and head-of-line blocking avoidance
- The top-level operations (call, subscribe, batch, schema) are protocol primitives, not separate ALPNs
**Negative:**
- Clients that multiplex many operations on one stream must manage request IDs carefully — but this is standard RPC practice
- The PendingRequestMap requires timeout-based cleanup to prevent memory leaks from abandoned requests — but this is already implemented and tested in the reference
- No built-in stream-level backpressure per operation when multiple operations share a stream — but QUIC provides connection-level and stream-level flow control
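The timeout-based cleanup mentioned above can be sketched as a deadline sweep. This is an illustration under assumptions, not the reference implementation: `PendingDeadlines` is a hypothetical type, and the current time is passed in explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct PendingDeadlines {
    deadlines: HashMap<String, Instant>,
    timeout: Duration,
}

impl PendingDeadlines {
    pub fn new(timeout: Duration) -> Self {
        Self { deadlines: HashMap::new(), timeout }
    }

    /// Record a request and the instant after which it counts as abandoned.
    pub fn register(&mut self, id: &str, now: Instant) {
        self.deadlines.insert(id.to_string(), now + self.timeout);
    }

    /// Remove every entry whose deadline has passed and return the expired
    /// IDs in sorted order.
    pub fn sweep(&mut self, now: Instant) -> Vec<String> {
        let mut expired: Vec<String> = self
            .deadlines
            .iter()
            .filter(|(_, deadline)| **deadline <= now)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &expired {
            self.deadlines.remove(id);
        }
        expired.sort();
        expired
    }

    pub fn len(&self) -> usize {
        self.deadlines.len()
    }
}
```

A periodic task calling `sweep` bounds the map's size by the request rate times the timeout, at the cost of failing requests that would have completed late.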
## References
- ADR-005: irpc as call protocol foundation
- ADR-006: ALPN string convention and connection model
- ADR-007: BiStream type definition
- OQ-07: Call protocol scope within a connection (resolved by this ADR)
- Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/`

@@ -1,6 +1,6 @@
---
status: draft
-last_updated: 2026-06-17
+last_updated: 2026-06-16
---
# Open Questions
@@ -76,11 +76,11 @@ Door type classifications follow ADR-009:
### OQ-07: Call Protocol Scope Within a Connection
- **Origin**: ADR-005
-- **Status**: open
+- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
-- **Resolution**: (deferred to implementation) Start with one stream per operation/request in the call protocol. Multiplexing within a stream can be added later if needed without breaking existing clients. Resolve when speccing alknet-call.
+- **Resolution**: The call protocol uses bidirectional QUIC streams with EventEnvelope framing and ID-based correlation via PendingRequestMap. The protocol is stream-agnostic: the client can open one stream per operation, multiplex on one stream, or any mix. Correlation is by request ID, not by stream. Both sides can initiate calls. One `alknet/call` connection gives access to the full operation registry (call, subscribe, batch, schema). No multiplexing layer is needed inside the connection. See ADR-012.
-- **Cross-references**: ADR-005
+- **Cross-references**: ADR-005, ADR-012
## Theme: Security
@@ -132,5 +132,23 @@ These questions are acknowledged but not active. They will be promoted to open w
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
-- **Resolution**: Start with file paths in StaticConfig (option A). The CLI binary provides `tls_cert` and `tls_key` paths at startup. ACME auto-provisioning (option B) and external cert managers (option C) are additive: they can be added as features without changing the core StaticConfig or endpoint lifecycle. `StaticConfig` does NOT include `acme_domain` in v1; ACME will be a separate feature when implemented.
+- **Resolution**: Start with file paths in StaticConfig (option a). The CLI binary provides `tls_cert` and `tls_key` paths at startup. ACME auto-provisioning (option b) and external cert managers (option c) are additive: they can be added as features without changing the core StaticConfig or endpoint lifecycle. `StaticConfig` does NOT include `acme_domain` in v1; ACME will be a separate feature when implemented.
- **Cross-references**: ADR-010, [config.md](crates/core/config.md)
### OQ-13: Operation Path Format and Routing Scope
- **Origin**: [operation-registry.md](crates/call/operation-registry.md)
- **Status**: open
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: Phase 1 uses `/{service}/{op}` (e.g., `/vault/derive`, `/services/list`). The head/worker `/{node}/{service}/{op}` routing from the reference implementation is a Phase 2+ concern that can be added when remote dispatch is implemented — the node prefix is additive and doesn't break existing operations. Two-way door.
- **Cross-references**: ADR-005, ADR-012
### OQ-14: Batch Operation Semantics
- **Origin**: [call-protocol.md](crates/call/call-protocol.md)
- **Status**: open
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: Phase 1 treats batch as a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. Batch-specific event types can be added later if needed (e.g., `batch.requested`, `batch.responded`) without breaking the existing protocol. Two-way door.
- **Cross-references**: ADR-012