tasks: decompose vault, core, call crates into 28 atomic implementation tasks

Break down the three initial crates (alknet-vault, alknet-core, alknet-call)
into dependency-ordered task files for implementation agents.

Structure:
- tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review,
  spec sync. Vault is independent and can run fully in parallel with core/call.
- tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint,
  review. Core is foundational; call depends on it.
- tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs
  reflecting the two subsystems. CallAdapter is the merge point.

Key decisions:
- Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete
  ADR-021 rotation feature that doesn't compile in pieces
- Reviews injected at end of each crate phase (vault, core, call)
- Vault spec-sync task removes the drift table and bumps doc status to stable
- ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now)
- OperationEnv kept as a trait (load-bearing for ADR-024 layering)

Validated: 28 tasks, no cycles, 11 generations of parallel work.
Critical path runs through call (11 tasks). Vault completes by generation 4.
6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context,
operation-env, call-adapter, abort-cascade.
2026-06-23 12:41:47 +00:00
parent 2e34590522
commit 098fd8b9b9
28 changed files with 4271 additions and 0 deletions


@@ -0,0 +1,193 @@
---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: pending
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---
## Description
Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler
composes other operations via `OperationEnv::invoke()`, it creates a call tree:
a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own
children. When `call.aborted` arrives for a parent, the protocol cascades the
abort to all non-terminal descendants.
**Read ADR-016 before starting this task.**
### Call tree
The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The
root request has `parent_request_id: None`. Each composed call has
`parent_request_id: Some(parent.request_id)`.
```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│   ├── r1-a-1 (composed by r1-a's handler)
│   └── r1-a-2
└── r1-b
    └── r1-b-1
```
### Abort cascade
When `call.aborted` arrives for a parent request:
1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (Drop releases resources)
The CallAdapter walks the tree indexed by `parent_request_id` in
`PendingRequestMap` and sends `call.aborted` for each descendant.
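The tree walk in step 1 can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `ParentIndex` is a hypothetical stand-in for whatever `PendingRequestMap` exposes, reduced to a request ID to parent ID mapping, and terminal-state filtering is left out.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified stand-in for the map's tree index:
/// request ID -> parent request ID (`None` for a root wire call).
type ParentIndex = HashMap<String, Option<String>>;

/// Collect every descendant of `root_id` breadth-first.
/// The root itself is not included in the result.
fn find_descendants(index: &ParentIndex, root_id: &str) -> Vec<String> {
    // Invert the child -> parent index once so each level is a cheap lookup.
    let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
    for (id, parent) in index {
        if let Some(p) = parent {
            children.entry(p.as_str()).or_default().push(id.as_str());
        }
    }

    let mut found = Vec::new();
    let mut queue = VecDeque::from([root_id]);
    while let Some(current) = queue.pop_front() {
        if let Some(kids) = children.get(current) {
            for &kid in kids {
                found.push(kid.to_string());
                queue.push_back(kid);
            }
        }
    }
    found
}

/// Build the example tree from the "Call tree" section.
fn example_tree() -> ParentIndex {
    [
        ("r1", None),
        ("r1-a", Some("r1")),
        ("r1-a-1", Some("r1-a")),
        ("r1-a-2", Some("r1-a")),
        ("r1-b", Some("r1")),
        ("r1-b-1", Some("r1-b")),
    ]
    .into_iter()
    .map(|(id, parent)| (id.to_string(), parent.map(str::to_string)))
    .collect()
}
```

Aborting `r1` in this tree yields all five descendants; aborting `r1-a` yields only `r1-a-1` and `r1-a-2`. Sibling order within a level is unspecified because the index is a `HashMap`.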
### AbortPolicy
The abort policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides the child's policy,
not the wire caller.
**`AbortDependents` (default)**: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results — continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring).
**`ContinueRunning` (opt-in)**: descendants that have already started continue
to completion; descendants that haven't started yet are aborted; no new
descendants start. Use for long-running work that should survive a parent's
abort (e.g., a subscription that should keep streaming).
### Wire visibility
Composed child `request_id`s are **internal** — they appear in
`PendingRequestMap` for abort-cascade indexing but are not sent as
`call.requested` to any peer. The client only sees `call.aborted` for the root
ID it sent; the server cascades internally to descendants.
The exception is `from_call` ops, which generate their own wire ID when
forwarding to the remote node (the remote node's `PendingRequestMap` indexes
it).
### Implementation
The abort cascade needs access to the `PendingRequestMap` to walk the tree.
The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The
cascade logic:
```rust
pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking
    // The map indexes entries by request_id, and each entry knows its parent_request_id
    // (from OperationContext, stored when the entry was registered)
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String>;

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String>;
}
```
### Storing parent_request_id in PendingRequestMap
The `PendingRequestMap` needs to know the `parent_request_id` for each entry to
walk the tree. This means `PendingEntry` needs to store the parent ID (or the
full `OperationContext`):
```rust
enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```
Update the `PendingRequestMap` (from the pending-request-map task) to store
`parent_request_id` when registering entries. The `register_call` and
`register_subscribe` methods take an optional `parent_request_id` parameter.
### AbortPolicy propagation
The abort policy is propagated through `OperationEnv::invoke()`:
- `invoke()` uses the default impl, which delegates to `invoke_with_policy()`
with `parent.abort_policy.clone()`
- `invoke_with_policy()` takes an explicit policy — use
`AbortPolicy::ContinueRunning` for long-running work
When cascading:
- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to
completion; no new descendants start
Determining "started" vs "unstarted" is tricky. A practical approach:
- A descendant is "started" if its handler has begun executing (the future has
been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched
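This may require tracking dispatch state in `PendingEntry`. A simpler
approximation: under `ContinueRunning`, abort every descendant that has not yet
sent a `call.responded` (it is still pending). This is conservative but safe: a
started call that has not yet produced a response is aborted too, so only
descendants that have already responded at least once (for example, active
subscriptions) keep running.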
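The dispatch-state variant of this rule can be sketched as a pure selection function. The `Descendant` struct and its `started` flag are hypothetical, and the `AbortPolicy` enum here only mirrors the two variants named in this task; the real definitions belong to the registry tasks.

```rust
/// Mirrors the two policies named in this task (assumed shape; the real
/// enum lives in the operation-registry task).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

/// Hypothetical per-descendant state tracked alongside each pending entry.
#[derive(Debug, Clone)]
struct Descendant {
    request_id: String,
    /// True once the handler future has been polled at least once.
    started: bool,
}

/// Return the IDs that the cascade should abort under `policy`.
fn select_for_abort(descendants: &[Descendant], policy: AbortPolicy) -> Vec<String> {
    descendants
        .iter()
        .filter(|d| match policy {
            // Everything downstream is aborted, started or not.
            AbortPolicy::AbortDependents => true,
            // Started descendants keep running; queued ones are aborted.
            AbortPolicy::ContinueRunning => !d.started,
        })
        .map(|d| d.request_id.clone())
        .collect()
}
```

Keeping the selection separate from the side effects (sending `call.aborted`, dropping futures) makes both policies easy to unit test without a live connection.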
### Handler cleanup
Handlers clean up resources when their call is cancelled. In Rust, the future
is dropped and `Drop` guards release resources (HTTP streams, file handles,
locks). This is a handler-level concern; the protocol's job is to cascade the
abort. See ADR-016.
## Acceptance Criteria
- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for unknown request_id is silently discarded
- [ ] Composed child request_ids are internal (not sent as call.requested to peer)
- [ ] Client only sees call.aborted for the root ID it sent
- [ ] AbortPolicy propagated through OperationEnv::invoke()
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md — Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md — AbortPolicy, OperationContext.abort_policy
## Notes
> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream — this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal — the client only
> sees call.aborted for the root ID. The PendingRequestMap needs to store
> parent_request_id for tree walking — update the pending-request-map task's
> output if needed.
## Summary
> To be filled on completion


@@ -0,0 +1,260 @@
---
id: call/protocol/call-adapter
name: Implement CallAdapter (ProtocolHandler for alknet/call) with stream handling, identity resolution, and root context construction
status: pending
depends_on: [call/protocol/call-connection, call/registry/operation-env, call/registry/service-discovery, core/endpoint]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement `CallAdapter` in `src/protocol/adapter.rs`. This is the
`ProtocolHandler` implementation for ALPN `alknet/call` — the merge point of the
registry and protocol strands. It ties everything together: stream handling,
identity resolution, root context construction, env composition, dispatch.
### CallAdapter struct
```rust
pub struct CallAdapter {
    registry: Arc<OperationRegistry>, // Layer 0 — curated, immutable
    identity_provider: Arc<dyn IdentityProvider>,
    session_source: Option<Arc<dyn SessionOverlaySource + Send + Sync>>, // Layer 1
    default_timeout: Duration, // 30s default
}

impl CallAdapter {
    pub fn new(registry: Arc<OperationRegistry>, identity_provider: Arc<dyn IdentityProvider>) -> Self {
        Self {
            registry,
            identity_provider,
            session_source: None,
            default_timeout: Duration::from_secs(30),
        }
    }

    pub fn with_session_source(mut self, source: Arc<dyn SessionOverlaySource + Send + Sync>) -> Self {
        self.session_source = Some(source);
        self
    }

    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.default_timeout = timeout;
        self
    }
}
```
### SessionOverlaySource trait
```rust
pub trait SessionOverlaySource: Send + Sync {
    fn overlay_for(&self, context: &OperationContext) -> Option<Arc<dyn OperationEnv + Send + Sync>>;
}
```
Defined in alknet-call because CallAdapter must name the type: alknet-call
cannot depend on alknet-agent (agent depends on call, not the reverse). The
agent crate implements this trait; alknet-call defines it. This is the same
pattern as IdentityProvider (ADR-004).
### ProtocolHandler impl
```rust
#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // 1. Create CallConnection from the Connection
        // 2. Spawn a task that continuously calls connection.accept_bi()
        // 3. For each accepted stream, read EventEnvelope frames (FrameFramedReader)
        // 4. Dispatch call.requested events to the operation registry
        // 5. Write response EventEnvelope frames (FrameFramedWriter)
        // 6. Manage PendingRequestMap for outgoing calls
        // 7. On connection close: fail all pending, return Ok or Err(ConnectionClosed)
    }
}
```
### Stream handling
The adapter:
1. Spawns a task that continuously calls `connection.accept_bi()` to receive
incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using
`FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server
For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID
### Identity resolution (per-request)
The CallAdapter resolves identity per-request, not per-connection:
1. The endpoint provides `AuthContext` with whatever identity it resolved at
the TLS layer (may be `None`)
2. When a `call.requested` event arrives, the CallAdapter constructs an
`OperationContext` with the connection-level `AuthContext.identity`
3. If the `call.requested` payload includes an `auth_token` field, the
CallAdapter resolves it using `IdentityProvider::resolve_from_token()`. If
resolution succeeds, the resulting `Identity` replaces the connection-level
identity in the `OperationContext`. If resolution fails, the request
proceeds with the connection-level identity (which may be `None`)
4. The `OperationContext.identity` is passed to the `OperationRegistry` for
ACL checking
5. If `identity` is `None` and the operation's `AccessControl` has
restrictions, the registry returns `FORBIDDEN` with message
`"authentication required"`
**Key point**: Identity is resolved per-request. This allows a single
connection to upgrade authentication mid-session and allows different operations
on the same connection to have different identity levels.
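The override-with-fallback rule in steps 2 and 3 can be sketched as a single function. Everything here is a simplified assumption for illustration: `Identity` is a placeholder newtype, the trait models only `resolve_from_token()` with a guessed synchronous signature, and `StaticProvider` is a toy implementation.

```rust
/// Hypothetical minimal identity type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Identity(String);

/// Simplified stand-in for the `IdentityProvider` trait; only the
/// token-resolution method named in this task is modelled, and its
/// signature here is an assumption.
trait IdentityProvider {
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Pick the identity for one request: a successfully resolved `auth_token`
/// wins, otherwise fall back to the connection-level identity (possibly `None`).
fn resolve_request_identity(
    provider: &dyn IdentityProvider,
    connection_identity: Option<&Identity>,
    auth_token: Option<&str>,
) -> Option<Identity> {
    auth_token
        .and_then(|token| provider.resolve_from_token(token))
        .or_else(|| connection_identity.cloned())
}

/// Toy provider that accepts exactly one token.
struct StaticProvider;

impl IdentityProvider for StaticProvider {
    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        (token == "good-token").then(|| Identity("alice".to_string()))
    }
}
```

A failed token resolution is not an error in this sketch; the request simply proceeds with the connection-level identity, and the registry's ACL check decides whether that is sufficient.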
### Root OperationContext construction
When a `call.requested` arrives from the wire, the CallAdapter constructs the
root `OperationContext` — the entry point of the call tree. This sets
`internal: false`, meaning ACL runs against the caller's `identity`, not a
handler's composition authority (ADR-015, ADR-022).
```rust
fn build_root_context(
    &self,
    request_id: String,
    operation_name: &str,
    identity: Option<Identity>,
    /* connection, session */
) -> OperationContext {
    let registration = self.registry.registration(operation_name);
    OperationContext {
        request_id,
        parent_request_id: None, // wire request — top of call tree
        identity: identity.clone(), // caller's identity (inbound)
        handler_identity: registration.composition_authority.clone(),
        capabilities: registration.capabilities.clone(),
        metadata: HashMap::new(),
        deadline: Some(Instant::now() + self.default_timeout),
        scoped_env: registration.scoped_env.clone()
            .unwrap_or_else(ScopedOperationEnv::empty),
        env: self.compose_root_env(/* connection, session */),
        abort_policy: AbortPolicy::default(), // abort-dependents
        internal: false, // external call — ACL against caller identity
    }
}
```
### compose_root_env
The per-call `env` composition (ADR-024) builds a `CompositeOperationEnv` from:
- Layer 0: `LocalOperationEnv` (curated registry)
- Layer 1: session overlay (if active, from `session_source.overlay_for()`)
- Layer 2: connection overlay (from `CallConnection.overlay_env()`)
```rust
fn compose_root_env(
    &self,
    connection: &CallConnection,
    context: &OperationContext,
) -> Arc<dyn OperationEnv + Send + Sync> {
    let base = Arc::new(LocalOperationEnv { registry: self.registry.clone() });
    let session = self.session_source.as_ref()
        .and_then(|s| s.overlay_for(context));
    let connection_overlay = connection.overlay_env();
    Arc::new(CompositeOperationEnv { session, connection: Some(connection_overlay), base })
}
```
### operationId normalization
The `call.requested` payload's `operationId` has a leading slash (`/fs/readFile`).
The CallAdapter strips it before registry lookup (`fs/readFile`). This is a
single rule applied consistently — the registry stores names without leading
slash, the wire format adds it.
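A minimal sketch of the normalization rule, assuming a free function named `normalize_operation_id` (hypothetical). Passing through names that arrive without a leading slash is also an assumption; the task does not say whether that case should be rejected.

```rust
/// Strip the single leading slash that the wire format adds.
/// Names that arrive without a slash are passed through unchanged
/// (an assumption; the task does not say whether that case is an error).
fn normalize_operation_id(wire_id: &str) -> &str {
    wire_id.strip_prefix('/').unwrap_or(wire_id)
}
```

Only one slash is removed, so the result borrows from the input and no allocation is needed on the dispatch path.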
### ResponseEnvelope → EventEnvelope
The CallAdapter converts `ResponseEnvelope` (from local dispatch) to
`EventEnvelope` for the wire:
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
### Timeout handling
- Default timeout for wire calls is 30 seconds (`default_timeout`)
- `build_root_context` sets `OperationContext.deadline` to `now + default_timeout`
- Composed calls inherit the parent's deadline (children do NOT get a fresh 30s)
- A composed call that exceeds the deadline is cancelled and returns
`CallError { code: "TIMEOUT", retryable: true }`
- Subscriptions default to no deadline (`deadline: None` — unbounded); the
client can specify a timeout in the `call.requested` payload
- The `PendingRequestMap` sweeper runs every 10 seconds and removes expired
wire entries
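The deadline rules above can be sketched with `std::time` alone. The function names are hypothetical helpers, and `now` is passed in explicitly only to keep the sketch deterministic under test.

```rust
use std::time::{Duration, Instant};

/// Deadline for a root wire request: `now + default_timeout` for calls,
/// `None` (unbounded) for subscriptions unless the client supplies one.
fn root_deadline(now: Instant, default_timeout: Duration, is_subscription: bool) -> Option<Instant> {
    if is_subscription {
        None
    } else {
        Some(now + default_timeout)
    }
}

/// A composed call inherits the parent's deadline unchanged; it does not
/// get a fresh timeout window.
fn child_deadline(parent: Option<Instant>) -> Option<Instant> {
    parent
}

/// True once `now` has reached the deadline. No deadline never expires.
fn is_expired(deadline: Option<Instant>, now: Instant) -> bool {
    matches!(deadline, Some(d) if now >= d)
}
```

Because the child reuses the parent's absolute `Instant`, a handler that composes a call 25 seconds into a 30-second budget leaves the child only 5 seconds.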
### Error handling in handle()
- If a handler panics, the stream is closed and the PendingRequestMap entry is
cleaned up by the next sweeper pass. Other streams and the connection are
unaffected.
- Connection drop: all pending requests are failed with `call.error` code
`INTERNAL` and message `"connection closed"`, and all subscription channels
are closed. `handle()` returns `Ok(())` (clean close) or `Err(ConnectionClosed)`.
- Stream reset: `FrameFramedReader` returns an error. For a subscription, remove
the PendingRequestMap entry and close the mpsc channel. For a call, resolve the
oneshot with an error. No `call.aborted` is sent because the stream is gone.
## Acceptance Criteria
- [ ] `CallAdapter` struct with registry, identity_provider, session_source, default_timeout
- [ ] `CallAdapter::new()`, `with_session_source()`, `with_timeout()` constructors
- [ ] `SessionOverlaySource` trait defined with `overlay_for()` method
- [ ] `ProtocolHandler::alpn()` returns `b"alknet/call"`
- [ ] `handle()` accepts streams, reads EventEnvelope frames, dispatches
- [ ] `handle()` spawns task for continuous `accept_bi()`
- [ ] Outgoing calls: open_bi, send call.requested, add to PendingRequestMap
- [ ] Identity resolution: AuthContext.identity used, auth_token overrides per-request
- [ ] auth_token resolution failure → proceed with connection-level identity
- [ ] `build_root_context` sets internal: false, deadline, capabilities from registration
- [ ] `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- [ ] operationId leading slash stripped before registry lookup
- [ ] ResponseEnvelope → EventEnvelope conversion (Ok → responded, Err → error)
- [ ] Subscriptions: multiple call.responded with same id, then call.completed
- [ ] Timeout: 30s default, composed calls inherit parent deadline
- [ ] Handler panic: stream closed, PendingRequestMap cleaned up, others unaffected
- [ ] Connection drop: fail all pending with INTERNAL, return Ok or Err
- [ ] Unit test: CallAdapter alpn returns b"alknet/call"
- [ ] Integration test: call.requested → dispatch → call.responded round-trip
- [ ] Integration test: auth_token overrides connection-level identity
- [ ] Integration test: Internal op called from wire → NOT_FOUND
- [ ] Integration test: ACL denied → FORBIDDEN
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallAdapter, stream handling, root context
- docs/architecture/crates/call/operation-registry.md — OperationContext construction
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal: false for wire)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env composition)
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
## Notes
> This is the merge point of the registry and protocol strands, and the
> highest-risk task in the call crate. It ties together stream handling, identity
> resolution, root context construction, env composition, and dispatch. The
> per-request identity resolution (auth_token overrides connection-level) is
> important — a single connection can upgrade auth mid-session. The
> compose_root_env builds the CompositeOperationEnv per call from the active
> layers. operationId on the wire has a leading slash; strip it before lookup.
## Summary
> To be filled on completion


@@ -0,0 +1,158 @@
---
id: call/protocol/call-connection
name: Implement CallConnection with imported-ops overlay (Layer 2) and call/subscribe/abort methods
status: pending
depends_on: [call/protocol/pending-request-map, call/registry/operation-env]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `CallConnection` in `src/protocol/connection.rs`. This represents an
established `alknet/call` connection, regardless of which side opened it
(ADR-017). It holds the connection's imported-ops overlay (Layer 2, ADR-024).
### CallConnection
```rust
pub struct CallConnection {
    connection: Connection,
    imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>,
}
```
An established alknet/call connection (either direction — accepted or opened).
Holds the Layer 2 overlay (imported ops from `from_call` discovery).
### Layer 2 registration API
```rust
impl CallConnection {
    /// Register an imported operation into this connection's overlay (Layer 2, ADR-024).
    /// Called by from_call after discovery.
    pub fn register_imported(&self, registration: HandlerRegistration) {
        let name = registration.spec.name.clone();
        self.imported_operations.write().insert(name, registration);
    }

    /// Register multiple imported operations (bulk variant for from_call).
    pub fn register_imported_all(&self, registrations: Vec<HandlerRegistration>) {
        let mut overlay = self.imported_operations.write();
        for reg in registrations {
            overlay.insert(reg.spec.name.clone(), reg);
        }
    }
}
```
Layer 0 (curated) is built via `OperationRegistryBuilder` at startup. Layer 2
(per-connection) registration uses `CallConnection::register_imported()` at
runtime. When the connection drops, the overlay (and all imported ops) is
dropped — no explicit deregistration needed.
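A minimal sketch of the overlay's registration and lookup, using only the standard library. `Registration` and `Overlay` are hypothetical stand-ins for `HandlerRegistration` and the connection's `imported_operations` field. The snippets in this task call `.write()` without unwrapping, which matches a `parking_lot`-style lock; with `std::sync::RwLock`, as assumed here, the guard comes wrapped in a `LockResult`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `HandlerRegistration`; only the name is modelled.
#[derive(Debug, Clone)]
struct Registration {
    name: String,
}

/// Layer 2 overlay owned by one connection.
#[derive(Default, Clone)]
struct Overlay {
    imported: Arc<RwLock<HashMap<String, Registration>>>,
}

impl Overlay {
    /// Insert (or replace) an imported operation under its own name.
    fn register_imported(&self, registration: Registration) {
        // std's RwLock returns a LockResult; a poisoned lock is treated as fatal here.
        let mut map = self.imported.write().expect("overlay lock poisoned");
        map.insert(registration.name.clone(), registration);
    }

    /// True only for operations present in this overlay.
    fn contains(&self, name: &str) -> bool {
        self.imported
            .read()
            .expect("overlay lock poisoned")
            .contains_key(name)
    }
}
```

Since the map lives behind an `Arc`, it is freed when the last holder (the connection and any env built from it) is dropped, which is why no explicit deregistration step is needed.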
### Overlay env
```rust
impl CallConnection {
    /// Build an OperationEnv impl for this connection's overlay.
    /// Used by the CallAdapter when composing the root OperationContext.env.
    /// Returns an OperationEnv that dispatches to this connection's imported ops
    /// (and reports contains only for ops in the overlay).
    pub fn overlay_env(&self) -> Arc<dyn OperationEnv + Send + Sync>;
}
```
This is an `OperationEnv` impl that dispatches to the connection's imported ops.
The `contains()` method returns true only for ops in the overlay. The
`invoke_with_policy()` method looks up the op in the overlay and dispatches to
its handler.
This env is composed into the `CompositeOperationEnv` by the CallAdapter as the
`connection` layer (Layer 2).
### Call methods (outgoing)
```rust
impl CallConnection {
    /// Call an operation on the remote peer (sends call.requested).
    pub async fn call(&self, operation_id: &str, input: Value) -> ResponseEnvelope;

    /// Subscribe to a streaming operation on the remote peer.
    pub async fn subscribe(&self, operation_id: &str, input: Value) -> impl Stream<Item = ResponseEnvelope>;

    /// Abort an in-flight request (sends call.aborted, cascades per ADR-016).
    pub async fn abort(&self, request_id: &str);
}
```
`call()` and `subscribe()` each:
1. Open a bidirectional stream with `connection.open_bi()`
2. Send `call.requested` on that stream (via FrameFramedWriter)
3. Add the request ID to the PendingRequestMap
4. Read responses from any stream, correlate by ID (via PendingRequestMap)
`call()` resolves on the first `call.responded`. `subscribe()` yields each
`call.responded` until `call.completed` or `call.aborted`.
`abort()` sends `call.aborted` for the given request ID. The abort cascade
(ADR-016) is handled by the abort-cascade task.
### Connection direction independence
Per ADR-017, connection direction is independent of call direction. Both
sides can call each other once connected. The `CallConnection` type is the same
whether the connection was accepted (server side) or opened (client side via
`CallClient`). The `call`/`subscribe`/`abort` methods work the same way.
### from_call integration
The `from_call` adapter (ADR-017) discovers operations on a remote call
protocol endpoint via `services/list` and `services/schema`, then registers
them with `register_imported()` / `register_imported_all()`. This makes
cross-node composition transparent — a handler calling
`env.invoke("worker", "exec", ...)` doesn't know whether the operation is
local or remote.
The `from_call` adapter itself is not implemented in this task — it's a future
task. This task implements the `CallConnection` infrastructure that `from_call`
will use.
## Acceptance Criteria
- [ ] `CallConnection` struct with connection and imported_operations fields
- [ ] `register_imported()` adds to the Layer 2 overlay
- [ ] `register_imported_all()` bulk adds to the overlay
- [ ] `overlay_env()` returns an OperationEnv dispatching to imported ops
- [ ] `overlay_env().contains()` returns true only for ops in the overlay
- [ ] `call()` sends call.requested, resolves on first call.responded
- [ ] `subscribe()` sends call.requested, yields call.responded until completed/aborted
- [ ] `abort()` sends call.aborted for the request ID
- [ ] Outgoing calls open a stream, send request, add to PendingRequestMap
- [ ] Connection drop drops the overlay (no explicit deregistration)
- [ ] Unit test: register_imported adds to overlay, contains returns true
- [ ] Unit test: overlay_env dispatches to imported op
- [ ] Unit test: overlay_env contains returns false for non-imported op
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallConnection section
- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2)
## Notes
> Connection direction is independent of call direction (ADR-017) — both sides
> can call each other. The Layer 2 overlay is per-connection: when the
> connection drops, the overlay drops (no deregistration needed). The
> overlay_env() is composed into CompositeOperationEnv by the CallAdapter as
> the connection layer. The from_call adapter itself is a future task — this
> implements the infrastructure it will use.
## Summary
> To be filled on completion


@@ -0,0 +1,164 @@
---
id: call/protocol/pending-request-map
name: Implement PendingRequestMap for correlating call.requested and call.responded events
status: pending
depends_on: [call/protocol/wire-types]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `PendingRequestMap` in `src/protocol/pending.rs`. This manages
in-flight calls and subscriptions, correlating `call.responded` events back to
the original `call.requested` by request ID.
### PendingRequestMap
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
    },
}
```
### Behavior
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
When `call.aborted` arrives → cancel the pending request and delete the entry,
whichever side initiated the abort. A `call.aborted` for an unknown `requestId`
is silently discarded.
When `call.error` arrives → resolve the oneshot (Call) or push to channel
(Subscribe) with the error, delete entry.
### Timeouts
Timeouts prevent dangling entries. A background task sweeps expired entries
periodically (every 10 seconds per call-protocol.md).
- `Call` entries have a timeout (default 30s from CallAdapter.default_timeout)
- `Subscribe` entries may have `timeout: None` (unbounded — long-running
subscriptions)
When the sweeper finds an expired entry:
- `Call`: resolve oneshot with `CallError { code: "TIMEOUT", retryable: true }`, delete
- `Subscribe`: close mpsc channel with a timeout error, delete
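The sweep for `Call` entries can be sketched with standard-library types only. This is an illustration under stated assumptions: a std `mpsc::Sender` stands in for the oneshot sender, a `String` stands in for the JSON value, `CallError` is reduced to two fields, and `evict_expired` takes `now` as a parameter (unlike the signature in this task) so the sketch is deterministic.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Simplified error type; the real `CallError` is defined in the wire-types task.
#[derive(Debug, Clone, PartialEq)]
struct CallError {
    code: String,
    retryable: bool,
}

/// Stand-in for `PendingEntry::Call`. A std `mpsc::Sender` plays the role of
/// the oneshot sender, and a `String` the role of the JSON value.
struct PendingCall {
    tx: mpsc::Sender<Result<String, CallError>>,
    timeout: Instant,
}

#[derive(Default)]
struct PendingCalls {
    pending: HashMap<String, PendingCall>,
}

impl PendingCalls {
    /// Register a pending call and hand back the receiving end.
    fn register_call(
        &mut self,
        request_id: String,
        timeout: Instant,
    ) -> mpsc::Receiver<Result<String, CallError>> {
        let (tx, rx) = mpsc::channel();
        self.pending.insert(request_id, PendingCall { tx, timeout });
        rx
    }

    /// Remove every entry whose timeout is at or before `now`, resolve it with
    /// a TIMEOUT error, and return the evicted IDs.
    fn evict_expired(&mut self, now: Instant) -> Vec<String> {
        let expired: Vec<String> = self
            .pending
            .iter()
            .filter(|(_, entry)| entry.timeout <= now)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &expired {
            if let Some(entry) = self.pending.remove(id) {
                // The receiver may already be gone; that is not an error.
                let _ = entry.tx.send(Err(CallError {
                    code: "TIMEOUT".to_string(),
                    retryable: true,
                }));
            }
        }
        expired
    }
}
```

Collecting the expired IDs before removing them avoids mutating the map while iterating over it. A background task would call this every 10 seconds with the current time.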
### Methods
```rust
impl PendingRequestMap {
    pub fn new() -> Self;

    /// Register a pending call. Returns a oneshot receiver for the result.
    pub fn register_call(&mut self, request_id: String, timeout: Instant) -> oneshot::Receiver<Result<Value, CallError>>;

    /// Register a pending subscription. Returns an mpsc receiver for the stream.
    pub fn register_subscribe(&mut self, request_id: String, timeout: Option<Instant>) -> mpsc::Receiver<Result<Value, CallError>>;

    /// Handle an incoming call.responded event.
    /// Returns true if the entry was found and handled.
    pub fn handle_responded(&mut self, request_id: &str, output: Value) -> bool;

    /// Handle an incoming call.completed event (subscriptions only).
    /// Closes the mpsc channel, deletes entry.
    pub fn handle_completed(&mut self, request_id: &str) -> bool;

    /// Handle an incoming call.aborted event.
    /// Cancels the pending request, deletes entry.
    pub fn handle_aborted(&mut self, request_id: &str) -> bool;

    /// Handle an incoming call.error event.
    /// Resolves with the error, deletes entry.
    pub fn handle_error(&mut self, request_id: &str, error: CallError) -> bool;

    /// Sweep expired entries. Called periodically by a background task.
    pub fn evict_expired(&mut self) -> Vec<String>; // returns evicted request IDs

    /// Fail all pending requests (connection closed). Returns the request IDs that were failed.
    pub fn fail_all(&mut self, error: CallError) -> Vec<String>;

    /// Check if a request ID is pending.
    pub fn contains(&self, request_id: &str) -> bool;

    /// Number of pending entries.
    pub fn len(&self) -> usize;
}
```
### Connection drop handling
When the QUIC connection closes, all pending requests are failed with
`call.error` code `INTERNAL` and message `"connection closed"`. All
subscription channels are closed. This is `fail_all()`.
### Stream reset handling
When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an
error. If the stream was carrying a subscription, the PendingRequestMap entry
is removed and the mpsc channel is closed. If the stream was carrying a call,
the oneshot is resolved with an error. No `call.aborted` is sent — the stream
is gone.
### Correlation is by ID, not by stream
A response arriving on stream N can fulfill a request sent on stream M. The
`PendingRequestMap` is keyed by ID, not by stream. This is the stream-agnostic
correlation property from ADR-012.
## Acceptance Criteria
- [ ] `PendingRequestMap` struct with pending HashMap
- [ ] `PendingEntry::Call` with oneshot::Sender and timeout
- [ ] `PendingEntry::Subscribe` with mpsc::Sender and optional timeout
- [ ] `register_call` returns oneshot::Receiver
- [ ] `register_subscribe` returns mpsc::Receiver
- [ ] `handle_responded` resolves Call oneshot, pushes to Subscribe channel
- [ ] `handle_completed` closes Subscribe mpsc, deletes entry
- [ ] `handle_aborted` cancels pending, deletes entry
- [ ] `handle_error` resolves with error, deletes entry
- [ ] Unknown request_id in handle_* is silently discarded (returns false)
- [ ] `evict_expired` removes timed-out entries, resolves with TIMEOUT error
- [ ] `fail_all` fails all pending with given error (connection close)
- [ ] Correlation is by request ID, not by stream
- [ ] Unit test: register call, handle_responded → oneshot resolves
- [ ] Unit test: register subscribe, handle multiple responded, handle_completed → stream ends
- [ ] Unit test: expired call → evict_expired resolves with TIMEOUT
- [ ] Unit test: fail_all resolves all pending with INTERNAL error
- [ ] Unit test: unknown request_id handle_responded → false (silently discarded)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — PendingRequestMap section
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012 (ID-based correlation)
## Notes
> Correlation is by request ID, not by stream — a response on stream N can
> fulfill a request sent on stream M. This is the stream-agnostic property from
> ADR-012. The sweeper runs every 10 seconds to evict expired entries. Unknown
> request IDs in handle_* are silently discarded (not an error — the entry may
> have already been resolved/cleaned up).
## Summary
> To be filled on completion

@@ -0,0 +1,219 @@
---
id: call/protocol/wire-types
name: Implement EventEnvelope, ResponseEnvelope, CallError, and length-prefixed JSON framing
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the wire protocol types and framing in `src/protocol/wire.rs`. Every
message on the wire is a length-prefixed JSON `EventEnvelope`.
### EventEnvelope
```rust
pub struct EventEnvelope {
    pub r#type: String,  // Event type, one of the five `call.*` events
    pub id: String,      // Correlation key (request ID, subscription ID)
    pub payload: Value,  // serde_json::Value; schema depends on event type
}
// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
The envelope is JSON because it must be consumable from JavaScript, Python, and
any other language. The `Value` type is `serde_json::Value`.
Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within
the `payload` field. The envelope itself does not interpret the payload — this
is a handler-level concern, not a protocol-level concern.
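As a rough illustration of the handler-side encoding, the sketch below turns raw bytes into a JSON string literal. It assumes the standard base64 alphabet with `=` padding (RFC 4648), which the source does not specify, and the function names are hypothetical. Where the string sits inside `payload` remains the handler's decision.

```rust
/// Standard base64 alphabet (RFC 4648). Which variant handlers actually
/// use is an assumption; the task text does not say.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded base64.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit groups, padding where input bytes were missing.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Wrap raw bytes as a JSON string literal. Base64 output contains no
/// characters that need JSON escaping, so plain quoting is enough.
pub fn binary_payload(bytes: &[u8]) -> String {
    format!("\"{}\"", base64_encode(bytes))
}
```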
### Event Types
Five event types:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
### Wire Payload Schemas
| Event | `payload` shape |
|-------|----------------|
| `call.requested` | `{ "operationId": "/fs/readFile", "input": {...}, "auth_token": "alk_..." (optional) }` |
| `call.responded` | `{ "output": <Value> }` |
| `call.completed` | `{}` — empty object |
| `call.aborted` | `{}` — empty object |
| `call.error` | `{ "code": "...", "message": "...", "retryable": bool, "details": {...} (optional) }` |
### call.requested payload
```json
{
  "operationId": "/fs/readFile",
  "input": { ... },
  "auth_token": "alk_..."  // optional
}
```
- `operationId` — the operation to invoke, **with a leading slash** on the wire.
The registry stores names without the leading slash; the wire format adds it.
The CallAdapter strips the leading slash before registry lookup.
- `input` — the operation input, matching the operation's `input_schema`.
- `auth_token` — optional. If present, CallAdapter resolves via
`IdentityProvider::resolve_from_token()`. Resulting Identity takes precedence
over connection-level identity for this request.
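The `operationId` translation between wire and registry can be sketched as two small helpers. The function names are hypothetical, and returning `None` for an ID without the leading slash is an assumption: the source does not say how the CallAdapter treats such input.

```rust
/// Map a wire `operationId` to the registry name by dropping the leading
/// slash. Rejecting IDs without the slash (None) is an assumption.
pub fn registry_name(operation_id: &str) -> Option<&str> {
    operation_id.strip_prefix('/')
}

/// Inverse direction: the wire format adds the slash to the registry name.
pub fn wire_operation_id(registry_name: &str) -> String {
    format!("/{}", registry_name)
}
```

Only the first slash is removed, so `/fs/readFile` maps to `fs/readFile` and interior separators are preserved.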
The `call.requested` payload does **not** carry an abort policy field. The abort
policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides, not the wire caller.
### call.error payload
```json
{
  "code": "FILE_NOT_FOUND",
  "message": "file not found: /etc/nonexistent",
  "retryable": false,
  "details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
Protocol-level codes (emitted by dispatch machinery):
- `NOT_FOUND` — operation not in registry (or an Internal operation called from the wire)
- `FORBIDDEN` — access denied
- `INVALID_INPUT` — input doesn't match JSON Schema
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)
Operation-level domain codes (emitted by handlers, ADR-023): e.g.,
`FILE_NOT_FOUND`, `RATE_LIMITED`. These carry a `details` payload conforming to
the declared `ErrorDefinition.schema`.
New error codes may be added in the future. Clients should treat unknown codes as
`INTERNAL` with `retryable: false`.
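On the client side, the fallback rule might look like the sketch below. `effective_error` and the `domain_codes` parameter (the operation-level codes a given client understands) are hypothetical; only the five protocol-level codes and the fallback to `INTERNAL` with `retryable: false` come from the text above.

```rust
/// Protocol-level codes listed in this task.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Return the (code, retryable) pair a client should act on.
/// `domain_codes` is the set of operation-level codes this client knows
/// (a hypothetical parameter, not part of the documented API).
pub fn effective_error<'a>(
    code: &'a str,
    retryable: bool,
    domain_codes: &[&str],
) -> (&'a str, bool) {
    if PROTOCOL_CODES.contains(&code) || domain_codes.contains(&code) {
        // Known code: trust the flag carried on the wire.
        (code, retryable)
    } else {
        // Unknown code: degrade to INTERNAL and never retry.
        ("INTERNAL", false)
    }
}
```

Note that the wire-supplied `retryable` flag is ignored for unknown codes, so a newer server cannot cause an older client to retry an error it does not understand.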
### ResponseEnvelope
```rust
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    pub details: Option<Value>,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The
CallAdapter converts it to `EventEnvelope` for the wire.
### ResponseEnvelope → EventEnvelope conversion
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
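The mapping in the table can be sketched with std only. Several simplifications are assumptions made to keep the example self-contained: `Json` is a `String` of already-serialized JSON standing in for `serde_json::Value`, `details` is left out of `CallError`, and `to_event`, `completed_event`, and `json_string` are hypothetical names. A real implementation would presumably build the payload with `serde_json`.

```rust
/// Stand-in for serde_json::Value: a string of already-serialized JSON.
pub type Json = String;

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    // `details` is omitted to keep the sketch short.
}

pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Json, CallError>,
}

#[derive(Debug, PartialEq)]
pub struct EventEnvelope {
    pub r#type: String,
    pub id: String,
    pub payload: Json,
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Ok becomes call.responded, Err becomes call.error; the request ID is
/// carried over as the envelope `id` in both cases.
pub fn to_event(resp: ResponseEnvelope) -> EventEnvelope {
    let (event_type, payload) = match resp.result {
        Ok(output) => ("call.responded", format!("{{\"output\":{}}}", output)),
        Err(e) => (
            "call.error",
            format!(
                "{{\"code\":{},\"message\":{},\"retryable\":{}}}",
                json_string(&e.code),
                json_string(&e.message),
                e.retryable
            ),
        ),
    };
    EventEnvelope { r#type: event_type.to_string(), id: resp.request_id, payload }
}

/// End-of-subscription marker: same id, empty object payload.
pub fn completed_event(id: &str) -> EventEnvelope {
    EventEnvelope {
        r#type: "call.completed".to_string(),
        id: id.to_string(),
        payload: "{}".to_string(),
    }
}
```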
### Framing
Length-prefixed JSON: 4-byte big-endian length prefix + UTF-8 JSON body.
Implement:
- `FrameFramedReader` — reads length-prefixed frames from an async reader
(RecvStream)
- `FrameFramedWriter` — writes length-prefixed frames to an async writer
(SendStream)
```rust
pub struct FrameFramedReader<R: AsyncRead + Unpin> { /* ... */ }

impl<R: AsyncRead + Unpin> FrameFramedReader<R> {
    pub fn new(reader: R) -> Self;
    pub async fn read_frame(&mut self) -> Result<EventEnvelope, FrameError>;
}

pub struct FrameFramedWriter<W: AsyncWrite + Unpin> { /* ... */ }

impl<W: AsyncWrite + Unpin> FrameFramedWriter<W> {
    pub fn new(writer: W) -> Self;
    pub async fn write_frame(&mut self, envelope: &EventEnvelope) -> Result<(), FrameError>;
}
```
This is the same framing used by irpc. The Rust implementation in alknet-call is
canonical (ADR-005, ADR-013).
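The byte layout of a frame can be shown with a synchronous sketch. It is deliberately not the `FrameFramedReader`/`FrameFramedWriter` API: blocking `std::io::Read`/`Write` stand in for the async traits, the body is handled as a UTF-8 string without JSON parsing (so the `Json` variant is absent), and mapping an unexpected end of stream to `ConnectionClosed` is an assumption based on the truncated-frame test in the acceptance criteria.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Mirrors the FrameError variants of this task, minus Json, because the
/// sketch never parses the body.
#[derive(Debug)]
pub enum FrameError {
    Io(io::Error),
    ConnectionClosed,
    InvalidFrame,
}

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(writer: &mut W, json_body: &str) -> Result<(), FrameError> {
    // The prefix is a u32, so bodies longer than u32::MAX cannot be framed.
    if json_body.len() > u32::MAX as usize {
        return Err(FrameError::InvalidFrame);
    }
    let len = json_body.len() as u32;
    writer.write_all(&len.to_be_bytes()).map_err(FrameError::Io)?;
    writer.write_all(json_body.as_bytes()).map_err(FrameError::Io)?;
    Ok(())
}

/// Read one frame. A stream that ends inside the prefix or the body is
/// reported as ConnectionClosed (assumed mapping for truncated frames).
pub fn read_frame<R: Read>(reader: &mut R) -> Result<String, FrameError> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix).map_err(map_eof)?;
    let len = u32::from_be_bytes(prefix) as usize;
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body).map_err(map_eof)?;
    String::from_utf8(body).map_err(|_| FrameError::InvalidFrame)
}

/// Translate an unexpected end of stream into ConnectionClosed.
fn map_eof(err: io::Error) -> FrameError {
    if err.kind() == ErrorKind::UnexpectedEof {
        FrameError::ConnectionClosed
    } else {
        FrameError::Io(err)
    }
}
```

The sketch allocates whatever length the prefix announces. A production reader would likely cap the frame size before allocating, though no limit is stated in this task.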
### ResponseEnvelope helper methods
```rust
impl ResponseEnvelope {
    pub fn ok(request_id: String, output: Value) -> Self;
    pub fn error(request_id: String, error: CallError) -> Self;
    pub fn not_found(request_id: String, op_name: &str) -> Self;
    pub fn forbidden(request_id: String, message: &str) -> Self;
}
```
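One possible shape for these helpers is sketched below. The signatures follow the listing above, but several details are assumptions: `Json` is a `String` stand-in for `serde_json::Value`, the `NOT_FOUND` message wording is invented for illustration, and `retryable: false` for both `NOT_FOUND` and `FORBIDDEN` is a guess (the source only states that `TIMEOUT` is retryable).

```rust
/// Stand-in for serde_json::Value: a string of already-serialized JSON.
pub type Json = String;

#[derive(Debug, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    pub details: Option<Json>,
}

#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Json, CallError>,
}

impl ResponseEnvelope {
    /// Successful result for `request_id`.
    pub fn ok(request_id: String, output: Json) -> Self {
        Self { request_id, result: Ok(output) }
    }

    /// Failed result carrying an arbitrary CallError.
    pub fn error(request_id: String, error: CallError) -> Self {
        Self { request_id, result: Err(error) }
    }

    /// NOT_FOUND shortcut; message wording and retryable=false are assumed.
    pub fn not_found(request_id: String, op_name: &str) -> Self {
        Self::error(request_id, CallError {
            code: "NOT_FOUND".to_string(),
            message: format!("operation not found: {}", op_name),
            retryable: false,
            details: None,
        })
    }

    /// FORBIDDEN shortcut; retryable=false is assumed.
    pub fn forbidden(request_id: String, message: &str) -> Self {
        Self::error(request_id, CallError {
            code: "FORBIDDEN".to_string(),
            message: message.to_string(),
            retryable: false,
            details: None,
        })
    }
}
```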
### FrameError
```rust
pub enum FrameError {
    Io(io::Error),
    Json(serde_json::Error),
    ConnectionClosed,
    InvalidFrame,
}
```
## Acceptance Criteria
- [ ] `EventEnvelope` struct with type, id, payload fields
- [ ] `ResponseEnvelope` struct with request_id, result fields
- [ ] `CallError` struct with code, message, retryable, details fields
- [ ] `FrameError` enum with Io, Json, ConnectionClosed, InvalidFrame
- [ ] `FrameFramedReader` reads length-prefixed JSON frames
- [ ] `FrameFramedWriter` writes length-prefixed JSON frames
- [ ] 4-byte big-endian length prefix + UTF-8 JSON body
- [ ] `ResponseEnvelope::ok()`, `error()`, `not_found()`, `forbidden()` helpers
- [ ] `ResponseEnvelope` → `EventEnvelope` conversion (Ok → call.responded, Err → call.error)
- [ ] Unit test: write frame, read frame, round-trip EventEnvelope
- [ ] Unit test: ResponseEnvelope::ok produces correct EventEnvelope
- [ ] Unit test: ResponseEnvelope::error produces correct call.error EventEnvelope
- [ ] Unit test: framing handles large payloads
- [ ] Unit test: framing detects truncated frames (ConnectionClosed error)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — EventEnvelope, wire format, event types
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (CallError, details)
## Notes
> The envelope is always JSON for cross-language compatibility. Binary
> payloads are base64-encoded within the payload field (handler concern, not
> protocol concern). The 4-byte big-endian length prefix is the same framing
> irpc uses. operationId on the wire has a leading slash; the registry stores
> names without it — the CallAdapter strips it before lookup.
## Summary
> To be filled on completion