tasks: decompose vault, core, call crates into 28 atomic implementation tasks

Break down the three initial crates (alknet-vault, alknet-core, alknet-call)
into dependency-ordered task files for implementation agents.

Structure:
- tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review,
  spec sync. Vault is independent and can run fully in parallel with core/call.
- tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint,
  review. Core is foundational; call depends on it.
- tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs
  reflecting the two subsystems. CallAdapter is the merge point.

Key decisions:
- Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete
  ADR-021 rotation feature that doesn't compile in pieces
- Reviews injected at end of each crate phase (vault, core, call)
- Vault spec-sync task removes the drift table and bumps doc status to stable
- ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now)
- OperationEnv kept as a trait (load-bearing for ADR-024 layering)

Validated: 28 tasks, no cycles, 11 generations of parallel work.
Critical path runs through call (11 tasks). Vault completes by generation 4.
6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context,
operation-env, call-adapter, abort-cascade.
2026-06-23 12:41:47 +00:00
parent 2e34590522
commit 098fd8b9b9
28 changed files with 4271 additions and 0 deletions

tasks/call/crate-init.md Normal file

@@ -0,0 +1,103 @@
---
id: call/crate-init
name: Initialize alknet-call crate with Cargo.toml, dependencies, and module skeleton
status: pending
depends_on: [core/core-types]
scope: moderate
risk: low
impact: project
level: implementation
---
## Description
Initialize the `alknet-call` crate from scratch. This crate implements the call
protocol (structured RPC over QUIC) on ALPN `alknet/call`. It depends on
alknet-core (for ProtocolHandler, Connection, AuthContext, Capabilities,
IdentityProvider) and irpc (for framing).
### Crate setup
Create `crates/alknet-call/` with:
- `Cargo.toml`: package metadata, dependencies
- `src/lib.rs`: crate root with module declarations and re-exports
- Module skeleton files for:
  - `src/registry/mod.rs`: registry module root
  - `src/registry/spec.rs`: OperationSpec, OperationType, Visibility, ErrorDefinition, AccessControl
  - `src/registry/context.rs`: OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
  - `src/registry/registration.rs`: Handler, HandlerRegistration, OperationProvenance, OperationRegistry, OperationRegistryBuilder
  - `src/registry/env.rs`: OperationEnv trait, LocalOperationEnv, CompositeOperationEnv
  - `src/registry/discovery.rs`: services/list, services/schema handlers
  - `src/protocol/mod.rs`: protocol module root
  - `src/protocol/wire.rs`: EventEnvelope, ResponseEnvelope, CallError, framing
  - `src/protocol/pending.rs`: PendingRequestMap, PendingEntry
  - `src/protocol/connection.rs`: CallConnection
  - `src/protocol/adapter.rs`: CallAdapter (ProtocolHandler impl)
  - `src/protocol/abort.rs`: abort cascade logic
### Dependencies
| Crate | Purpose |
|-------|---------|
| `alknet-core` | ProtocolHandler, Connection, AuthContext, Capabilities, IdentityProvider, Identity, HandlerError (workspace path) |
| `irpc` | Framing, service dispatch (workspace dep) |
| `tokio` 1 (full) | Async runtime, sync primitives (oneshot, mpsc, watch) |
| `serde` 1 | Serialization for wire types |
| `serde_json` 1 | JSON wire format, JSON Schema values |
| `async-trait` 0.1 | OperationEnv trait (async fn in trait) |
| `tracing` 0.1 | Structured logging |
| `thiserror` 2 | Error enums |
| `uuid` 1 | Request ID generation (UUID v4) |
| `futures` | Stream trait for subscribe |
### Workspace Cargo.toml
Add `crates/alknet-call` to the workspace `members` list in the root
`Cargo.toml`.
### Module skeleton
```rust
// src/lib.rs
//! alknet-call: Structured RPC over QUIC — operations, streaming, service discovery.
//! Implements ProtocolHandler on ALPN `alknet/call`.
pub mod registry;
pub mod protocol;
// Re-exports (filled in by subsequent tasks)
```
Each module file gets a doc comment and `// TODO: implement` marker.
## Acceptance Criteria
- [ ] `crates/alknet-call/Cargo.toml` exists with all dependencies
- [ ] `crates/alknet-call/src/lib.rs` exists with module declarations
- [ ] All module skeleton files exist (registry/*, protocol/*)
- [ ] Root `Cargo.toml` `members` list includes `crates/alknet-call`
- [ ] `cargo check -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
- [ ] Dual licensing: `MIT OR Apache-2.0` (workspace-inherited)
- [ ] alknet-core dependency uses workspace path (`path = "../alknet-core"`)
## References
- docs/architecture/crates/call/README.md — crate index
- docs/architecture/crates/call/call-protocol.md — CallAdapter, wire format
- docs/architecture/crates/call/operation-registry.md — registry, OperationEnv
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
## Notes
> alknet-call depends on alknet-core (for ProtocolHandler, Connection,
> AuthContext, Capabilities, IdentityProvider) and irpc (for framing). The
> crate has two subsystems: registry (operation specs, context, dispatch) and
> protocol (wire format, streams, adapter). The module structure reflects
> this split.
## Summary
> To be filled on completion


@@ -0,0 +1,193 @@
---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: pending
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---
## Description
Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler
composes other operations via `OperationEnv::invoke()`, it creates a call tree:
a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own
children. When `call.aborted` arrives for a parent, the protocol cascades the
abort to all non-terminal descendants.
**Read ADR-016 before starting this task.**
### Call tree
The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The
root request has `parent_request_id: None`. Each composed call has
`parent_request_id: Some(parent.request_id)`.
```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│   ├── r1-a-1 (composed by r1-a's handler)
│   └── r1-a-2
└── r1-b
    └── r1-b-1
```
### Abort cascade
When `call.aborted` arrives for a parent request:
1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (Drop releases resources)
The CallAdapter walks the tree indexed by `parent_request_id` in
`PendingRequestMap` and sends `call.aborted` for each descendant.
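The tree walk in step 1 can be sketched as follows. This is a minimal illustration, not the crate's implementation: `ParentIndex` is a hypothetical, simplified stand-in for `PendingRequestMap` that keeps only the `parent_request_id` of each entry, and the free function `find_descendants` is an assumed shape for the lookup. It also assumes request IDs form a tree (no cycles).

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified view of the pending map: request ID -> parent request ID.
/// The real `PendingRequestMap` stores richer `PendingEntry` values.
pub type ParentIndex = HashMap<String, Option<String>>;

/// Collect every descendant of `root_id` by walking `parent_request_id` links
/// breadth-first. The root itself is not included in the result.
pub fn find_descendants(index: &ParentIndex, root_id: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from([root_id.to_string()]);
    while let Some(current) = queue.pop_front() {
        // Linear scan for the children of `current`; a real implementation
        // might keep a reverse (parent -> children) index instead.
        let mut children: Vec<String> = index
            .iter()
            .filter(|(_, parent)| parent.as_deref() == Some(current.as_str()))
            .map(|(id, _)| id.clone())
            .collect();
        children.sort(); // deterministic order for logging and tests
        for child in children {
            found.push(child.clone());
            queue.push_back(child);
        }
    }
    found
}
```

An unknown root ID yields an empty list, which matches the rule that `call.aborted` for an unknown request is a no-op.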
### AbortPolicy
The abort policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides the child's policy,
not the wire caller.
**`AbortDependents` (default)**: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results — continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring).
**`ContinueRunning` (opt-in)**: descendants that have already started continue
to completion; descendants that haven't started yet are aborted; no new
descendants start. Use for long-running work that should survive a parent's
abort (e.g., a subscription that should keep streaming).
### Wire visibility
Composed child `request_id`s are **internal** — they appear in
`PendingRequestMap` for abort-cascade indexing but are not sent as
`call.requested` to any peer. The client only sees `call.aborted` for the root
ID it sent; the server cascades internally to descendants.
The exception is `from_call` ops, which generate their own wire ID when
forwarding to the remote node (the remote node's `PendingRequestMap` indexes
it).
### Implementation
The abort cascade needs access to the `PendingRequestMap` to walk the tree.
The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The
cascade logic:
```rust
pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking.
    // The map indexes entries by request_id, and each entry knows its parent_request_id
    // (from OperationContext, stored when the entry was registered).
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String>;

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String>;
}
```
### Storing parent_request_id in PendingRequestMap
The `PendingRequestMap` needs to know the `parent_request_id` for each entry to
walk the tree. This means `PendingEntry` needs to store the parent ID (or the
full `OperationContext`):
```rust
enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```
Update the `PendingRequestMap` (from the pending-request-map task) to store
`parent_request_id` when registering entries. The `register_call` and
`register_subscribe` methods take an optional `parent_request_id` parameter.
### AbortPolicy propagation
The abort policy is propagated through `OperationEnv::invoke()`:
- `invoke()` uses the default impl, which delegates to `invoke_with_policy()`
with `parent.abort_policy.clone()`
- `invoke_with_policy()` takes an explicit policy — use
`AbortPolicy::ContinueRunning` for long-running work
When cascading:
- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to
completion; no new descendants start
Determining "started" vs "unstarted" is tricky. A practical approach:
- A descendant is "started" if its handler has begun executing (the future has
been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched
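This may require tracking dispatch state in `PendingEntry`. A simpler
approximation: under `ContinueRunning`, abort all descendants that haven't sent
a `call.responded` yet (they're still pending). This is conservative but safe,
though it also aborts started descendants that have not yet responded, so it is
stricter than the policy definition above.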
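If dispatch state is tracked, the per-policy selection could look like the sketch below. `DispatchState` and `select_for_abort` are hypothetical names introduced for illustration; `AbortPolicy` here is a local stand-in whose variant names follow this task, not the real type from the registry module.

```rust
/// Stand-in for the document's `AbortPolicy`; variant names follow the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Hypothetical dispatch state that a `PendingEntry` could track.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DispatchState {
    Queued,  // registered, handler future not yet polled
    Started, // handler future polled at least once
}

/// Given the descendants of an aborted request, return the IDs to abort.
pub fn select_for_abort(
    descendants: &[(String, DispatchState)],
    policy: AbortPolicy,
) -> Vec<String> {
    descendants
        .iter()
        .filter(|(_, state)| match policy {
            // Everything downstream is aborted, started or not.
            AbortPolicy::AbortDependents => true,
            // Only work that has not begun is aborted.
            AbortPolicy::ContinueRunning => *state == DispatchState::Queued,
        })
        .map(|(id, _)| id.clone())
        .collect()
}
```

Blocking new descendants from starting under `ContinueRunning` is a separate concern that this sketch does not cover.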
### Handler cleanup
Handlers clean up resources when their call is cancelled. In Rust, the future
is dropped and `Drop` guards release resources (HTTP streams, file handles,
locks). This is a handler-level concern; the protocol's job is to cascade the
abort. See ADR-016.
## Acceptance Criteria
- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for unknown request_id is silently discarded
- [ ] Composed child request_ids are internal (not sent as call.requested to peer)
- [ ] Client only sees call.aborted for the root ID it sent
- [ ] AbortPolicy propagated through OperationEnv::invoke()
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md — Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md — AbortPolicy, OperationContext.abort_policy
## Notes
> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream — this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal — the client only
> sees call.aborted for the root ID. The PendingRequestMap needs to store
> parent_request_id for tree walking — update the pending-request-map task's
> output if needed.
## Summary
> To be filled on completion


@@ -0,0 +1,260 @@
---
id: call/protocol/call-adapter
name: Implement CallAdapter (ProtocolHandler for alknet/call) with stream handling, identity resolution, and root context construction
status: pending
depends_on: [call/protocol/call-connection, call/registry/operation-env, call/registry/service-discovery, core/endpoint]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement `CallAdapter` in `src/protocol/adapter.rs`. This is the
`ProtocolHandler` implementation for ALPN `alknet/call` — the merge point of the
registry and protocol strands. It ties everything together: stream handling,
identity resolution, root context construction, env composition, dispatch.
### CallAdapter struct
```rust
pub struct CallAdapter {
    registry: Arc<OperationRegistry>, // Layer 0: curated, immutable
    identity_provider: Arc<dyn IdentityProvider>,
    session_source: Option<Arc<dyn SessionOverlaySource + Send + Sync>>, // Layer 1
    default_timeout: Duration, // 30s default
}

impl CallAdapter {
    pub fn new(registry: Arc<OperationRegistry>, identity_provider: Arc<dyn IdentityProvider>) -> Self {
        Self {
            registry,
            identity_provider,
            session_source: None,
            default_timeout: Duration::from_secs(30),
        }
    }

    pub fn with_session_source(mut self, source: Arc<dyn SessionOverlaySource + Send + Sync>) -> Self {
        self.session_source = Some(source);
        self
    }

    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.default_timeout = timeout;
        self
    }
}
```
### SessionOverlaySource trait
```rust
pub trait SessionOverlaySource: Send + Sync {
    fn overlay_for(&self, context: &OperationContext) -> Option<Arc<dyn OperationEnv + Send + Sync>>;
}
```
Defined in alknet-call because CallAdapter must name the type — alknet-call
cannot depend on alknet-agent (agent depends on call, not reverse). The agent
crate implements this trait; alknet-call defines it. Same pattern as
IdentityProvider (ADR-004).
### ProtocolHandler impl
```rust
#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // 1. Create CallConnection from the Connection
        // 2. Spawn a task that continuously calls connection.accept_bi()
        // 3. For each accepted stream, read EventEnvelope frames (FrameFramedReader)
        // 4. Dispatch call.requested events to the operation registry
        // 5. Write response EventEnvelope frames (FrameFramedWriter)
        // 6. Manage PendingRequestMap for outgoing calls
        // 7. On connection close: fail all pending, return Ok or Err(ConnectionClosed)
    }
}
```
### Stream handling
The adapter:
1. Spawns a task that continuously calls `connection.accept_bi()` to receive
incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using
`FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server
For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID
### Identity resolution (per-request)
The CallAdapter resolves identity per-request, not per-connection:
1. The endpoint provides `AuthContext` with whatever identity it resolved at
the TLS layer (may be `None`)
2. When a `call.requested` event arrives, the CallAdapter constructs an
`OperationContext` with the connection-level `AuthContext.identity`
3. If the `call.requested` payload includes an `auth_token` field, the
CallAdapter resolves it using `IdentityProvider::resolve_from_token()`. If
resolution succeeds, the resulting `Identity` replaces the connection-level
identity in the `OperationContext`. If resolution fails, the request
proceeds with the connection-level identity (which may be `None`)
4. The `OperationContext.identity` is passed to the `OperationRegistry` for
ACL checking
5. If `identity` is `None` and the operation's `AccessControl` has
restrictions, the registry returns `FORBIDDEN` with message
`"authentication required"`
**Key point**: Identity is resolved per-request. This allows a single
connection to upgrade authentication mid-session and allows different operations
on the same connection to have different identity levels.
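The override-with-fallback rule in steps 2 and 3 can be sketched as a small pure function. This is an illustration under assumptions: `Identity` is a local stand-in with a hypothetical `subject` field, and the `resolve` closure stands in for `IdentityProvider::resolve_from_token()`, whose real signature is not shown in this task.

```rust
/// Stand-in for the `Identity` type from alknet-core (the field is hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Identity {
    pub subject: String,
}

/// Pick the identity for one request: the token identity if the token resolves,
/// otherwise the connection-level identity (which may be `None`).
/// `resolve` stands in for `IdentityProvider::resolve_from_token()`.
pub fn effective_identity<F>(
    connection_identity: Option<Identity>,
    auth_token: Option<&str>,
    resolve: F,
) -> Option<Identity>
where
    F: Fn(&str) -> Option<Identity>,
{
    auth_token
        .and_then(|token| resolve(token))
        .or(connection_identity)
}
```

A failed token resolution is not an error here; the request simply proceeds with whatever the TLS layer established, and the ACL check decides the outcome.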
### Root OperationContext construction
When a `call.requested` arrives from the wire, the CallAdapter constructs the
root `OperationContext` — the entry point of the call tree. This sets
`internal: false`, meaning ACL runs against the caller's `identity`, not a
handler's composition authority (ADR-015, ADR-022).
```rust
fn build_root_context(
    &self,
    request_id: String,
    operation_name: &str,
    identity: Option<Identity>,
    /* connection, session */
) -> OperationContext {
    let registration = self.registry.registration(operation_name);
    OperationContext {
        request_id,
        parent_request_id: None,    // wire request: top of call tree
        identity: identity.clone(), // caller's identity (inbound)
        handler_identity: registration.composition_authority.clone(),
        capabilities: registration.capabilities.clone(),
        metadata: HashMap::new(),
        deadline: Some(Instant::now() + self.default_timeout),
        scoped_env: registration.scoped_env.clone()
            .unwrap_or_else(ScopedOperationEnv::empty),
        env: self.compose_root_env(/* connection, session */),
        abort_policy: AbortPolicy::default(), // abort-dependents
        internal: false, // external call: ACL against caller identity
    }
}
```
### compose_root_env
The per-call `env` composition (ADR-024) builds a `CompositeOperationEnv` from:
- Layer 0: `LocalOperationEnv` (curated registry)
- Layer 1: session overlay (if active, from `session_source.overlay_for()`)
- Layer 2: connection overlay (from `CallConnection.overlay_env()`)
```rust
fn compose_root_env(
    &self,
    connection: &CallConnection,
    context: &OperationContext,
) -> Arc<dyn OperationEnv + Send + Sync> {
    let base = Arc::new(LocalOperationEnv { registry: self.registry.clone() });
    let session = self.session_source.as_ref()
        .and_then(|s| s.overlay_for(context));
    let connection_overlay = connection.overlay_env();
    Arc::new(CompositeOperationEnv { session, connection: Some(connection_overlay), base })
}
```
### operationId normalization
The `call.requested` payload's `operationId` has a leading slash (`/fs/readFile`).
The CallAdapter strips it before registry lookup (`fs/readFile`). This is a
single rule applied consistently — the registry stores names without leading
slash, the wire format adds it.
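A minimal sketch of the normalization rule, using a hypothetical helper name `normalize_operation_id`. How to treat an ID that arrives without a leading slash is not specified here; this sketch assumes it is passed through unchanged.

```rust
/// Strip a single leading slash from a wire `operationId` before registry lookup.
/// IDs without a leading slash are returned unchanged (an assumption of this sketch).
pub fn normalize_operation_id(wire_id: &str) -> &str {
    wire_id.strip_prefix('/').unwrap_or(wire_id)
}
```

Only one slash is removed, so a malformed ID such as `//fs/readFile` still fails the registry lookup instead of silently matching.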
### ResponseEnvelope → EventEnvelope
The CallAdapter converts `ResponseEnvelope` (from local dispatch) to
`EventEnvelope` for the wire:
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
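The mapping in the table can be sketched with dependency-free stand-in types. Everything below is illustrative: the real envelope carries JSON (`serde_json::Value`), whereas this sketch uses a typed `Payload` enum and `String` values, and the `message` field on `CallError` is assumed from the connection-drop description in this task.

```rust
/// Stand-in for the wire error type; the fields follow those named in this task.
#[derive(Clone, Debug, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Typed payload used instead of JSON to keep the sketch dependency-free.
#[derive(Clone, Debug, PartialEq)]
pub enum Payload {
    Output(String),   // `{ output: value }` on the wire
    Error(CallError), // the serialized CallError on the wire
    Empty,            // `{}` for call.completed
}

#[derive(Clone, Debug, PartialEq)]
pub struct EventEnvelope {
    pub event_type: &'static str,
    pub id: String,
    pub payload: Payload,
}

/// Convert a local dispatch result into the event sent back to the peer.
pub fn to_event(request_id: &str, response: Result<String, CallError>) -> EventEnvelope {
    let (event_type, payload) = match response {
        Ok(value) => ("call.responded", Payload::Output(value)),
        Err(error) => ("call.error", Payload::Error(error)),
    };
    EventEnvelope { event_type, id: request_id.to_string(), payload }
}

/// Terminal event for a subscription.
pub fn completed_event(request_id: &str) -> EventEnvelope {
    EventEnvelope {
        event_type: "call.completed",
        id: request_id.to_string(),
        payload: Payload::Empty,
    }
}
```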
### Timeout handling
- Default timeout for wire calls is 30 seconds (`default_timeout`)
- `build_root_context` sets `OperationContext.deadline` to `now + default_timeout`
- Composed calls inherit the parent's deadline (children do NOT get a fresh 30s)
- A composed call that exceeds the deadline is cancelled and returns
`CallError { code: "TIMEOUT", retryable: true }`
- Subscriptions default to no deadline (`deadline: None` — unbounded); the
client can specify a timeout in the `call.requested` payload
- The `PendingRequestMap` sweeper runs every 10 seconds and removes expired
wire entries
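The deadline rules above can be sketched with `std::time` alone. `RequestKind`, `root_deadline`, `child_deadline`, and `is_expired` are hypothetical names for illustration; the real logic lives in `build_root_context` and `OperationEnv::invoke()`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical classification of a wire request for deadline purposes.
pub enum RequestKind {
    Call,
    /// Subscriptions are unbounded unless the client supplies a timeout.
    Subscribe { client_timeout: Option<Duration> },
}

/// Deadline for a root (wire) request.
pub fn root_deadline(now: Instant, default_timeout: Duration, kind: RequestKind) -> Option<Instant> {
    match kind {
        RequestKind::Call => Some(now + default_timeout),
        RequestKind::Subscribe { client_timeout } => client_timeout.map(|t| now + t),
    }
}

/// Composed calls inherit the parent's deadline unchanged: no fresh 30s budget.
pub fn child_deadline(parent: Option<Instant>) -> Option<Instant> {
    parent
}

/// A request with no deadline never expires.
pub fn is_expired(deadline: Option<Instant>, now: Instant) -> bool {
    matches!(deadline, Some(d) if now >= d)
}
```

Because children copy the parent's absolute `Instant`, a deep call tree shares one time budget, which is the behavior the third bullet describes.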
### Error handling in handle()
- If a handler panics, the stream is closed and the PendingRequestMap entry is
cleaned up by the next sweeper pass. Other streams and the connection are
unaffected.
- Connection drop: all pending requests are failed with `call.error` code
`INTERNAL` and message `"connection closed"`, and all subscription channels
are closed. `handle()` returns `Ok(())` (clean) or `Err(ConnectionClosed)`.
- Stream reset: `FrameFramedReader` returns an error. For a subscription, remove
the PendingRequestMap entry and close the mpsc channel. For a call, resolve
the oneshot with an error. No `call.aborted` is sent because the stream is gone.
## Acceptance Criteria
- [ ] `CallAdapter` struct with registry, identity_provider, session_source, default_timeout
- [ ] `CallAdapter::new()`, `with_session_source()`, `with_timeout()` constructors
- [ ] `SessionOverlaySource` trait defined with `overlay_for()` method
- [ ] `ProtocolHandler::alpn()` returns `b"alknet/call"`
- [ ] `handle()` accepts streams, reads EventEnvelope frames, dispatches
- [ ] `handle()` spawns task for continuous `accept_bi()`
- [ ] Outgoing calls: open_bi, send call.requested, add to PendingRequestMap
- [ ] Identity resolution: AuthContext.identity used, auth_token overrides per-request
- [ ] auth_token resolution failure → proceed with connection-level identity
- [ ] `build_root_context` sets internal: false, deadline, capabilities from registration
- [ ] `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- [ ] operationId leading slash stripped before registry lookup
- [ ] ResponseEnvelope → EventEnvelope conversion (Ok → responded, Err → error)
- [ ] Subscriptions: multiple call.responded with same id, then call.completed
- [ ] Timeout: 30s default, composed calls inherit parent deadline
- [ ] Handler panic: stream closed, PendingRequestMap cleaned up, others unaffected
- [ ] Connection drop: fail all pending with INTERNAL, return Ok or Err
- [ ] Unit test: CallAdapter alpn returns b"alknet/call"
- [ ] Integration test: call.requested → dispatch → call.responded round-trip
- [ ] Integration test: auth_token overrides connection-level identity
- [ ] Integration test: Internal op called from wire → NOT_FOUND
- [ ] Integration test: ACL denied → FORBIDDEN
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallAdapter, stream handling, root context
- docs/architecture/crates/call/operation-registry.md — OperationContext construction
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal: false for wire)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env composition)
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
## Notes
> This is the merge point of the registry and protocol strands — the highest-
> risk task in the call crate. It ties together stream handling, identity
> resolution, root context construction, env composition, and dispatch. The
> per-request identity resolution (auth_token overrides connection-level) is
> important — a single connection can upgrade auth mid-session. The
> compose_root_env builds the CompositeOperationEnv per call from the active
> layers. operationId on the wire has a leading slash; strip it before lookup.
## Summary
> To be filled on completion


@@ -0,0 +1,158 @@
---
id: call/protocol/call-connection
name: Implement CallConnection with imported-ops overlay (Layer 2) and call/subscribe/abort methods
status: pending
depends_on: [call/protocol/pending-request-map, call/registry/operation-env]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `CallConnection` in `src/protocol/connection.rs`. This represents an
established `alknet/call` connection, regardless of which side opened it
(ADR-017). It holds the connection's imported-ops overlay (Layer 2, ADR-024).
### CallConnection
```rust
pub struct CallConnection {
    connection: Connection,
    imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>,
}
```
An established alknet/call connection (either direction — accepted or opened).
Holds the Layer 2 overlay (imported ops from `from_call` discovery).
### Layer 2 registration API
```rust
impl CallConnection {
    // Note: `.write()` is used as if it returns the guard directly (as with
    // parking_lot::RwLock, which is not in the dependency table); with
    // std::sync::RwLock the poisoning Result must be handled first.

    /// Register an imported operation into this connection's overlay (Layer 2, ADR-024).
    /// Called by from_call after discovery.
    pub fn register_imported(&self, registration: HandlerRegistration) {
        let name = registration.spec.name.clone();
        self.imported_operations.write().insert(name, registration);
    }

    /// Register multiple imported operations (bulk variant for from_call).
    pub fn register_imported_all(&self, registrations: Vec<HandlerRegistration>) {
        let mut overlay = self.imported_operations.write();
        for reg in registrations {
            overlay.insert(reg.spec.name.clone(), reg);
        }
    }
}
```
Layer 0 (curated) is built via `OperationRegistryBuilder` at startup. Layer 2
(per-connection) registration uses `CallConnection::register_imported()` at
runtime. When the connection drops, the overlay (and all imported ops) is
dropped — no explicit deregistration needed.
### Overlay env
```rust
impl CallConnection {
    /// Build an OperationEnv impl for this connection's overlay.
    /// Used by the CallAdapter when composing the root OperationContext.env.
    /// Returns an OperationEnv that dispatches to this connection's imported ops
    /// (`contains()` reports true only for ops in the overlay).
    pub fn overlay_env(&self) -> Arc<dyn OperationEnv + Send + Sync>;
}
```
This is an `OperationEnv` impl that dispatches to the connection's imported ops.
The `contains()` method returns true only for ops in the overlay. The
`invoke_with_policy()` method looks up the op in the overlay and dispatches to
its handler.
This env is composed into the `CompositeOperationEnv` by the CallAdapter as the
`connection` layer (Layer 2).
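The overlay's registration and `contains()` behavior can be sketched with standard-library types. This is a simplified model, not the real `CallConnection`: `Registration` is a stand-in for `HandlerRegistration` that keeps only the operation name, `Overlay` is a hypothetical wrapper, and it uses `std::sync::RwLock`, so lock poisoning is handled explicitly.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for `HandlerRegistration`; only the operation name is modelled.
#[derive(Clone, Debug)]
pub struct Registration {
    pub name: String,
}

/// Hypothetical model of the per-connection Layer 2 overlay.
#[derive(Clone, Default)]
pub struct Overlay {
    ops: Arc<RwLock<HashMap<String, Registration>>>,
}

impl Overlay {
    /// Add one imported operation, replacing any earlier entry with the same name.
    pub fn register_imported(&self, reg: Registration) {
        // std's RwLock returns a Result; a poisoned lock is treated as fatal here.
        self.ops
            .write()
            .expect("overlay lock poisoned")
            .insert(reg.name.clone(), reg);
    }

    /// True only for operations registered in this overlay.
    pub fn contains(&self, name: &str) -> bool {
        self.ops
            .read()
            .expect("overlay lock poisoned")
            .contains_key(name)
    }
}
```

Because the map sits behind an `Arc`, the env handed to the adapter and the connection share one overlay, and the entries disappear when the last holder is dropped.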
### Call methods (outgoing)
```rust
impl CallConnection {
    /// Call an operation on the remote peer (sends call.requested).
    pub async fn call(&self, operation_id: &str, input: Value) -> ResponseEnvelope;

    /// Subscribe to a streaming operation on the remote peer.
    pub async fn subscribe(&self, operation_id: &str, input: Value) -> impl Stream<Item = ResponseEnvelope>;

    /// Abort an in-flight request (sends call.aborted, cascades per ADR-016).
    pub async fn abort(&self, request_id: &str);
}
```
These methods:
1. Open a bidirectional stream with `connection.open_bi()`
2. Send `call.requested` on that stream (via FrameFramedWriter)
3. Add the request ID to the PendingRequestMap
4. Read responses from any stream, correlate by ID (via PendingRequestMap)
`call()` resolves on the first `call.responded`. `subscribe()` yields each
`call.responded` until `call.completed` or `call.aborted`.
`abort()` sends `call.aborted` for the given request ID. The abort cascade
(ADR-016) is handled by the abort-cascade task.
### Connection direction independence
Per ADR-017, connection direction is independent of call direction. Both
sides can call each other once connected. The `CallConnection` type is the same
whether the connection was accepted (server side) or opened (client side via
`CallClient`). The `call`/`subscribe`/`abort` methods work the same way.
### from_call integration
The `from_call` adapter (ADR-017) discovers operations on a remote call
protocol endpoint via `services/list` and `services/schema`, then registers
them with `register_imported()` / `register_imported_all()`. This makes
cross-node composition transparent — a handler calling
`env.invoke("worker", "exec", ...)` doesn't know whether the operation is
local or remote.
The `from_call` adapter itself is not implemented in this task — it's a future
task. This task implements the `CallConnection` infrastructure that `from_call`
will use.
## Acceptance Criteria
- [ ] `CallConnection` struct with connection and imported_operations fields
- [ ] `register_imported()` adds to the Layer 2 overlay
- [ ] `register_imported_all()` bulk adds to the overlay
- [ ] `overlay_env()` returns an OperationEnv dispatching to imported ops
- [ ] `overlay_env().contains()` returns true only for ops in the overlay
- [ ] `call()` sends call.requested, resolves on first call.responded
- [ ] `subscribe()` sends call.requested, yields call.responded until completed/aborted
- [ ] `abort()` sends call.aborted for the request ID
- [ ] Outgoing calls open a stream, send request, add to PendingRequestMap
- [ ] Connection drop drops the overlay (no explicit deregistration)
- [ ] Unit test: register_imported adds to overlay, contains returns true
- [ ] Unit test: overlay_env dispatches to imported op
- [ ] Unit test: overlay_env contains returns false for non-imported op
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallConnection section
- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2)
## Notes
> Connection direction is independent of call direction (ADR-017) — both sides
> can call each other. The Layer 2 overlay is per-connection: when the
> connection drops, the overlay drops (no deregistration needed). The
> overlay_env() is composed into CompositeOperationEnv by the CallAdapter as
> the connection layer. The from_call adapter itself is a future task — this
> implements the infrastructure it will use.
## Summary
> To be filled on completion


@@ -0,0 +1,164 @@
---
id: call/protocol/pending-request-map
name: Implement PendingRequestMap for correlating call.requested and call.responded events
status: pending
depends_on: [call/protocol/wire-types]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `PendingRequestMap` in `src/protocol/pending.rs`. This manages
in-flight calls and subscriptions, correlating `call.responded` events back to
the original `call.requested` by request ID.
### PendingRequestMap
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
    },
}
```
### Behavior
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
When `call.aborted` arrives → cancel the pending request and delete the entry,
regardless of which side initiated the abort. A `call.aborted` for an unknown
request ID is silently discarded.
When `call.error` arrives → resolve the oneshot (Call) or push to channel
(Subscribe) with the error, delete entry.
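The correlation rules above can be sketched with standard-library channels. This is a simplified model under stated assumptions: `std::sync::mpsc` stands in for tokio's `oneshot` and `mpsc`, `CallResult` uses `String` in place of `Value` and `CallError`, and timeouts are omitted.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Stand-in for `Result<Value, CallError>`.
pub type CallResult = Result<String, String>;

enum PendingEntry {
    Call { tx: Sender<CallResult> },
    Subscribe { tx: Sender<CallResult> },
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn register_call(&mut self, id: &str) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Call { tx });
        rx
    }

    pub fn register_subscribe(&mut self, id: &str) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx });
        rx
    }

    /// call.responded: a Call resolves once and is removed; a Subscribe stays alive.
    pub fn handle_responded(&mut self, id: &str, output: String) -> bool {
        match self.pending.remove(id) {
            Some(PendingEntry::Call { tx }) => {
                let _ = tx.send(Ok(output)); // receiver may already be gone
                true
            }
            Some(PendingEntry::Subscribe { tx }) => {
                let _ = tx.send(Ok(output));
                // Keep the subscription registered for further events.
                self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx });
                true
            }
            None => false,
        }
    }

    /// call.completed: dropping the sender closes the channel.
    pub fn handle_completed(&mut self, id: &str) -> bool {
        self.pending.remove(id).is_some()
    }

    /// call.aborted: unknown IDs are silently ignored (returns false).
    pub fn handle_aborted(&mut self, id: &str) -> bool {
        self.pending.remove(id).is_some()
    }

    pub fn contains(&self, id: &str) -> bool {
        self.pending.contains_key(id)
    }
}
```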
### Timeouts
Timeouts prevent dangling entries. A background task sweeps expired entries
periodically (every 10 seconds per call-protocol.md).
- `Call` entries have a timeout (default 30s from CallAdapter.default_timeout)
- `Subscribe` entries may have `timeout: None` (unbounded — long-running
subscriptions)
When the sweeper finds an expired entry:
- `Call`: resolve oneshot with `CallError { code: "TIMEOUT", retryable: true }`, delete
- `Subscribe`: close mpsc channel with a timeout error, delete
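The sweep itself reduces to a filter over deadlines. The sketch below is illustrative only: `Deadlines` and `evict_expired_at` are hypothetical names, the entry is reduced to its deadline, and `now` is passed in as a parameter (an assumption made here so the sweep can be tested without sleeping).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical reduced map: request ID -> deadline.
/// `None` models an unbounded subscription.
#[derive(Default)]
pub struct Deadlines {
    entries: HashMap<String, Option<Instant>>,
}

impl Deadlines {
    pub fn insert(&mut self, request_id: &str, deadline: Option<Instant>) {
        self.entries.insert(request_id.to_string(), deadline);
    }

    /// Remove every entry whose deadline is at or before `now`.
    /// Returns the evicted request IDs, sorted for deterministic output.
    pub fn evict_expired_at(&mut self, now: Instant) -> Vec<String> {
        let mut expired = Vec::new();
        self.entries.retain(|id, deadline| match deadline {
            Some(d) if *d <= now => {
                expired.push(id.clone());
                false // drop the entry
            }
            _ => true, // not yet expired, or unbounded
        });
        expired.sort();
        expired
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In the real map, the eviction branch would also resolve the oneshot or close the mpsc channel with the `TIMEOUT` error before dropping the entry.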
### Methods
```rust
impl PendingRequestMap {
pub fn new() -> Self;
/// Register a pending call. Returns a oneshot receiver for the result.
pub fn register_call(&mut self, request_id: String, timeout: Instant) -> oneshot::Receiver<Result<Value, CallError>>;
/// Register a pending subscription. Returns an mpsc receiver for the stream.
pub fn register_subscribe(&mut self, request_id: String, timeout: Option<Instant>) -> mpsc::Receiver<Result<Value, CallError>>;
/// Handle an incoming call.responded event.
/// Returns true if the entry was found and handled.
pub fn handle_responded(&mut self, request_id: &str, output: Value) -> bool;
/// Handle an incoming call.completed event (subscriptions only).
/// Closes the mpsc channel, deletes entry.
pub fn handle_completed(&mut self, request_id: &str) -> bool;
/// Handle an incoming call.aborted event.
/// Cancels the pending request, deletes entry.
pub fn handle_aborted(&mut self, request_id: &str) -> bool;
/// Handle an incoming call.error event.
/// Resolves with the error, deletes entry.
pub fn handle_error(&mut self, request_id: &str, error: CallError) -> bool;
/// Sweep expired entries. Called periodically by a background task.
pub fn evict_expired(&mut self) -> Vec<String>; // returns evicted request IDs
/// Fail all pending requests (connection closed). Returns the request IDs that were failed.
pub fn fail_all(&mut self, error: CallError) -> Vec<String>;
/// Check if a request ID is pending.
pub fn contains(&self, request_id: &str) -> bool;
/// Number of pending entries.
pub fn len(&self) -> usize;
}
```
### Connection drop handling
When the QUIC connection closes, all pending requests are failed with
`call.error` code `INTERNAL` and message `"connection closed"`. All
subscription channels are closed. This is `fail_all()`.
### Stream reset handling
When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an
error. If the stream was carrying a subscription, the PendingRequestMap entry
is removed and the mpsc channel is closed. If the stream was carrying a call,
the oneshot is resolved with an error. No `call.aborted` is sent — the stream
is gone.
### Correlation is by ID, not by stream
A response arriving on stream N can fulfill a request sent on stream M. The
`PendingRequestMap` is keyed by ID, not by stream. This is the stream-agnostic
correlation property from ADR-012.
## Acceptance Criteria
- [ ] `PendingRequestMap` struct with pending HashMap
- [ ] `PendingEntry::Call` with oneshot::Sender and timeout
- [ ] `PendingEntry::Subscribe` with mpsc::Sender and optional timeout
- [ ] `register_call` returns oneshot::Receiver
- [ ] `register_subscribe` returns mpsc::Receiver
- [ ] `handle_responded` resolves Call oneshot, pushes to Subscribe channel
- [ ] `handle_completed` closes Subscribe mpsc, deletes entry
- [ ] `handle_aborted` cancels pending, deletes entry
- [ ] `handle_error` resolves with error, deletes entry
- [ ] Unknown request_id in handle_* is silently discarded (returns false)
- [ ] `evict_expired` removes timed-out entries, resolves with TIMEOUT error
- [ ] `fail_all` fails all pending with given error (connection close)
- [ ] Correlation is by request ID, not by stream
- [ ] Unit test: register call, handle_responded → oneshot resolves
- [ ] Unit test: register subscribe, handle multiple responded, handle_completed → stream ends
- [ ] Unit test: expired call → evict_expired resolves with TIMEOUT
- [ ] Unit test: fail_all resolves all pending with INTERNAL error
- [ ] Unit test: unknown request_id handle_responded → false (silently discarded)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — PendingRequestMap section
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012 (ID-based correlation)
## Notes
> Correlation is by request ID, not by stream — a response on stream N can
> fulfill a request sent on stream M. This is the stream-agnostic property from
> ADR-012. The sweeper runs every 10 seconds to evict expired entries. Unknown
> request IDs in handle_* are silently discarded (not an error — the entry may
> have already been resolved/cleaned up).
## Summary
> To be filled on completion


@@ -0,0 +1,219 @@
---
id: call/protocol/wire-types
name: Implement EventEnvelope, ResponseEnvelope, CallError, and length-prefixed JSON framing
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the wire protocol types and framing in `src/protocol/wire.rs`. Every
message on the wire is a length-prefixed JSON `EventEnvelope`.
### EventEnvelope
```rust
pub struct EventEnvelope {
pub r#type: String, // Event type
pub id: String, // Correlation key (request ID, subscription ID)
pub payload: Value, // serde_json::Value — schema depends on event type
}
// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
The envelope is JSON because it must be consumable from JavaScript, Python, and
any language. The `Value` type is `serde_json::Value`.
Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within
the `payload` field. The envelope itself does not interpret the payload — this
is a handler-level concern, not a protocol-level concern.
### Event Types
Five event types:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
### Wire Payload Schemas
| Event | `payload` shape |
|-------|----------------|
| `call.requested` | `{ "operationId": "/fs/readFile", "input": {...}, "auth_token": "alk_..." (optional) }` |
| `call.responded` | `{ "output": <Value> }` |
| `call.completed` | `{}` — empty object |
| `call.aborted` | `{}` — empty object |
| `call.error` | `{ "code": "...", "message": "...", "retryable": bool, "details": {...} (optional) }` |
### call.requested payload
```json
{
"operationId": "/fs/readFile",
"input": { ... },
"auth_token": "alk_..." // optional
}
```
- `operationId` — the operation to invoke, **with a leading slash** on the wire.
The registry stores names without the leading slash; the wire format adds it.
The CallAdapter strips the leading slash before registry lookup.
- `input` — the operation input, matching the operation's `input_schema`.
- `auth_token` — optional. If present, CallAdapter resolves via
`IdentityProvider::resolve_from_token()`. Resulting Identity takes precedence
over connection-level identity for this request.
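The leading-slash convention can be sketched as a pair of helpers. Both function names are hypothetical, and rejecting an `operationId` without the leading slash is an assumption; the source only states that the CallAdapter strips it.

```rust
/// Hypothetical helper: map a wire `operationId` to the registry key.
/// Returns `None` when the leading slash is missing or the name is empty
/// (an assumed policy, not one stated by the protocol doc).
pub fn registry_name(operation_id: &str) -> Option<&str> {
    operation_id
        .strip_prefix('/')
        .filter(|name| !name.is_empty())
}

/// Inverse direction: registry key to wire `operationId`.
pub fn wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```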
The `call.requested` payload does **not** carry an abort policy field. The abort
policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides, not the wire caller.
### call.error payload
```json
{
"code": "FILE_NOT_FOUND",
"message": "file not found: /etc/nonexistent",
"retryable": false,
"details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
Protocol-level codes (emitted by dispatch machinery):
- `NOT_FOUND` — operation not in registry (or Internal op called from wire)
- `FORBIDDEN` — access denied
- `INVALID_INPUT` — input doesn't match JSON Schema
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)
Operation-level domain codes (emitted by handlers, ADR-023): e.g.,
`FILE_NOT_FOUND`, `RATE_LIMITED`. These carry a `details` payload conforming to
the declared `ErrorDefinition.schema`.
New error codes may be added in future. Clients should treat unknown codes as
`INTERNAL` with `retryable: false`.
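On the client side, the unknown-code rule might look like the following. This is a sketch under assumptions: `normalize_error` is a hypothetical helper, and `declared` stands for the domain codes the client learned from an operation's error definitions.

```rust
/// Protocol-level codes listed above.
const PROTOCOL_CODES: [&str; 5] = [
    "NOT_FOUND",
    "FORBIDDEN",
    "INVALID_INPUT",
    "INTERNAL",
    "TIMEOUT",
];

/// Keep known codes as-is; fold anything else into INTERNAL, non-retryable.
pub fn normalize_error(code: &str, retryable: bool, declared: &[&str]) -> (String, bool) {
    if PROTOCOL_CODES.contains(&code) || declared.contains(&code) {
        (code.to_string(), retryable)
    } else {
        ("INTERNAL".to_string(), false)
    }
}
```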
### ResponseEnvelope
```rust
pub struct ResponseEnvelope {
pub request_id: String,
pub result: Result<Value, CallError>,
}
pub struct CallError {
pub code: String,
pub message: String,
pub retryable: bool,
pub details: Option<Value>,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The
CallAdapter converts it to `EventEnvelope` for the wire.
### ResponseEnvelope → EventEnvelope conversion
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
### Framing
Length-prefixed JSON: 4-byte big-endian length prefix + UTF-8 JSON body.
Implement:
- `FrameFramedReader` — reads length-prefixed frames from an async reader
(RecvStream)
- `FrameFramedWriter` — writes length-prefixed frames to an async writer
(SendStream)
```rust
pub struct FrameFramedReader<R: AsyncRead + Unpin> { /* ... */ }
impl<R: AsyncRead + Unpin> FrameFramedReader<R> {
pub fn new(reader: R) -> Self;
pub async fn read_frame(&mut self) -> Result<EventEnvelope, FrameError>;
}
pub struct FrameFramedWriter<W: AsyncWrite + Unpin> { /* ... */ }
impl<W: AsyncWrite + Unpin> FrameFramedWriter<W> {
pub fn new(writer: W) -> Self;
pub async fn write_frame(&mut self, envelope: &EventEnvelope) -> Result<(), FrameError>;
}
```
This is the same framing used by irpc. The Rust implementation in alknet-call is
canonical (ADR-005, ADR-013).
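The frame layout can be illustrated with a synchronous, std-only sketch. It is not the async `FrameFramedReader`/`FrameFramedWriter` API: it works over `std::io::Read`/`Write`, carries the JSON body as a plain string, and `MAX_FRAME_LEN` is an assumed limit that the source does not specify.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a frame body; the source does not define one.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(writer: &mut W, json_body: &str) -> io::Result<()> {
    if json_body.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json_body.len() as u32; // safe: checked against the limit above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json_body.as_bytes())
}

/// Read one frame. A short read surfaces as `UnexpectedEof`; the real reader
/// would presumably map that to `FrameError::ConnectionClosed`.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut body = vec![0u8; len as usize];
    reader.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Checking the declared length before allocating the body buffer keeps a malformed prefix from triggering a multi-gigabyte allocation.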
### ResponseEnvelope helper methods
```rust
impl ResponseEnvelope {
pub fn ok(request_id: String, output: Value) -> Self;
pub fn error(request_id: String, error: CallError) -> Self;
pub fn not_found(request_id: String, op_name: &str) -> Self;
pub fn forbidden(request_id: String, message: &str) -> Self;
}
```
### FrameError
```rust
pub enum FrameError {
Io(io::Error),
Json(serde_json::Error),
ConnectionClosed,
InvalidFrame,
}
```
## Acceptance Criteria
- [ ] `EventEnvelope` struct with type, id, payload fields
- [ ] `ResponseEnvelope` struct with request_id, result fields
- [ ] `CallError` struct with code, message, retryable, details fields
- [ ] `FrameError` enum with Io, Json, ConnectionClosed, InvalidFrame
- [ ] `FrameFramedReader` reads length-prefixed JSON frames
- [ ] `FrameFramedWriter` writes length-prefixed JSON frames
- [ ] 4-byte big-endian length prefix + UTF-8 JSON body
- [ ] `ResponseEnvelope::ok()`, `error()`, `not_found()`, `forbidden()` helpers
- [ ] `ResponseEnvelope` → `EventEnvelope` conversion (Ok → call.responded, Err → call.error)
- [ ] Unit test: write frame, read frame, round-trip EventEnvelope
- [ ] Unit test: ResponseEnvelope::ok produces correct EventEnvelope
- [ ] Unit test: ResponseEnvelope::error produces correct call.error EventEnvelope
- [ ] Unit test: framing handles large payloads
- [ ] Unit test: framing detects truncated frames (ConnectionClosed error)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — EventEnvelope, wire format, event types
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (CallError, details)
## Notes
> The envelope is always JSON for cross-language compatibility. Binary
> payloads are base64-encoded within the payload field (handler concern, not
> protocol concern). The 4-byte big-endian length prefix is the same framing
> irpc uses. operationId on the wire has a leading slash; the registry stores
> names without it — the CallAdapter strips it before lookup.
## Summary
> To be filled on completion


@@ -0,0 +1,202 @@
---
id: call/registry/handler-registration
name: Implement Handler, HandlerRegistration, OperationProvenance, OperationRegistry, and OperationRegistryBuilder
status: pending
depends_on: [call/registry/operation-context]
scope: broad
risk: medium
impact: component
level: implementation
---
## Description
Implement the handler registration types and the operation registry in
`src/registry/registration.rs`. The registry maps operation names to
registration bundles and provides the dispatch entry point.
### Handler
```rust
pub type Handler = Arc<
dyn Fn(Value, OperationContext) -> Pin<Box<dyn Future<Output = ResponseEnvelope> + Send>>
+ Send + Sync
>;
```
Handlers are async. They receive:
- `input: Value` — deserialized payload from `call.requested` (always `serde_json::Value`)
- `context: OperationContext` — request ID, identity, metadata, env
They return `ResponseEnvelope`, which is defined in the protocol/wire task: use a
forward reference or define a minimal version here, with the full impl in the wire task.
### HandlerRegistration
```rust
pub struct HandlerRegistration {
pub spec: OperationSpec,
pub handler: Handler,
pub provenance: OperationProvenance,
pub composition_authority: Option<CompositionAuthority>, // None for leaves
pub scoped_env: Option<ScopedOperationEnv>, // None for leaves
pub capabilities: Capabilities,
}
```
The registration bundle carries everything the dispatch path needs to
construct an `OperationContext`. See ADR-022.
### OperationProvenance
```rust
pub enum OperationProvenance {
Local, // Assembly-written, trusted, can compose
FromOpenAPI, // HTTP forwarding stub, leaf
FromMCP, // MCP forwarding stub, leaf
FromCall, // QUIC forwarding stub, leaf locally
FromJsonSchema, // JSON Schema definition, no handler — schema only
Session, // Agent-written, sandboxed, can compose within sandbox
}
```
| Provenance | Can compose? | Has composition authority? | Default visibility |
|-----------|-------------|---------------------------|-------------------|
| `Local` | Yes | Yes | External or Internal (assembly declares) |
| `FromOpenAPI` | No (leaf) | No | Internal |
| `FromMCP` | No (leaf) | No | Internal |
| `FromCall` | No (leaf in local registry) | No | Internal |
| `FromJsonSchema` | N/A (no handler) | No | N/A |
| `Session` | Yes (within sandbox) | Yes | Internal always |
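The first two columns of the table can be encoded as methods on the enum. `can_compose` and `has_handler` are hypothetical helpers added for illustration; the source defines only the variants.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationProvenance {
    Local,
    FromOpenAPI,
    FromMCP,
    FromCall,
    FromJsonSchema,
    Session,
}

impl OperationProvenance {
    /// Only Local and Session ops carry composition authority.
    pub fn can_compose(self) -> bool {
        matches!(self, Self::Local | Self::Session)
    }

    /// FromJsonSchema is schema-only: there is nothing to dispatch to.
    pub fn has_handler(self) -> bool {
        !matches!(self, Self::FromJsonSchema)
    }
}
```

A builder could use `can_compose()` to reject a registration that pairs a leaf provenance with `Some(CompositionAuthority)`.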
### OperationRegistry
```rust
pub struct OperationRegistry {
operations: HashMap<String, HandlerRegistration>,
}
```
The curated layer (Layer 0) is a `HashMap<String, HandlerRegistration>`. Session
and connection overlays (Layers 1 and 2) are separate maps composed into the
per-call `OperationContext.env` by the CallAdapter (ADR-024).
Methods:
- `register(registration)`: add to curated layer at startup
- `registration(name)`: find by operation name (checks active overlays first,
then curated base — ADR-024). Returns spec, handler, provenance, composition
authority, scoped env, capabilities.
- `invoke(name, input, context)`: look up, check ACL, invoke handler, return result
- `list_operations()`: return all registered specs. For `/services/list`, this
returns curated + active overlay ops, External only.
### OperationRegistryBuilder
Fluent API with convenience methods:
```rust
pub struct OperationRegistryBuilder {
operations: HashMap<String, HandlerRegistration>,
}
impl OperationRegistryBuilder {
pub fn new() -> Self;
// with_local: Local provenance, full bundle — all 5 args required
pub fn with_local(
mut self,
spec: OperationSpec,
handler: Handler,
composition_authority: Option<CompositionAuthority>,
scoped_env: Option<ScopedOperationEnv>,
capabilities: Capabilities,
) -> Self;
// with_leaf: leaf provenance (FromOpenAPI/FromMCP/FromCall), no authority, no scoped env
pub fn with_leaf(
mut self,
spec: OperationSpec,
handler: Handler,
capabilities: Capabilities,
) -> Self;
// with: full manual registration (any provenance)
pub fn with(mut self, registration: HandlerRegistration) -> Self;
pub fn build(self) -> OperationRegistry;
}
```
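`with_local` sets `provenance: Local`. `with_leaf` sets a leaf provenance
(`FromOpenAPI` as sketched; the provenance could instead be passed as a parameter
so that `FromMCP` and `FromCall` stubs are tagged correctly),
`composition_authority: None`, and `scoped_env: None`. `with` takes the full
bundle for any provenance.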
### Registry invoke flow
```rust
impl OperationRegistry {
pub async fn invoke(&self, name: &str, input: Value, context: OperationContext) -> ResponseEnvelope {
// 1. Look up registration by name
// 2. Check visibility: if Internal and context is external (internal: false), return NOT_FOUND
// 3. Check ACL: access_control.check(identity or handler_identity depending on internal flag)
// 4. If denied: return FORBIDDEN
// 5. Invoke handler: (handler)(input, context).await
// 6. Return ResponseEnvelope
}
}
```
The ACL authority depends on `context.internal`:
- `internal: false` (wire call): check against `context.identity` (caller)
- `internal: true` (composition): check against the synthetic identity derived from
`context.handler_identity` via `CompositionAuthority::as_identity()`
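The authority switch can be sketched as a single selection function. `Identity` and `CompositionAuthority` below are reduced stand-ins for the real types, and `acl_subject` is a hypothetical helper; the real `invoke()` would pass the result to `AccessControl`.

```rust
/// Reduced stand-in for alknet-core's Identity (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Reduced stand-in for CompositionAuthority (resources omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct CompositionAuthority {
    pub label: String,
    pub scopes: Vec<String>,
}

impl CompositionAuthority {
    /// Synthetic Identity: the label becomes the id, scopes carry over.
    pub fn as_identity(&self) -> Option<Identity> {
        Some(Identity {
            id: self.label.clone(),
            scopes: self.scopes.clone(),
        })
    }
}

/// Pick the identity the ACL check runs against.
pub fn acl_subject(
    internal: bool,
    identity: Option<&Identity>,
    handler_identity: Option<&CompositionAuthority>,
) -> Option<Identity> {
    if internal {
        // Composition: the handler's declared authority is the subject.
        handler_identity.and_then(CompositionAuthority::as_identity)
    } else {
        // Wire call: the authenticated caller is the subject.
        identity.cloned()
    }
}
```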
### Layer 0 immutability
The curated layer (Layer 0 — `Local` provenance ops) is immutable after
construction. Adding a `Local` op requires restarting the process. Session and
imported overlays are dynamic at their respective scopes (ADR-024). The
`OperationRegistryBuilder` is Layer-0-only; runtime overlay registration uses
`CallConnection::register_imported()` (in the protocol/connection task).
## Acceptance Criteria
- [ ] `Handler` type alias (async closure returning ResponseEnvelope)
- [ ] `HandlerRegistration` struct with all 6 fields
- [ ] `OperationProvenance` enum with all 6 variants
- [ ] `OperationRegistry` struct with operations HashMap
- [ ] `OperationRegistry::register()` adds to curated layer
- [ ] `OperationRegistry::registration()` looks up by name
- [ ] `OperationRegistry::invoke()` checks visibility, ACL, invokes handler
- [ ] `OperationRegistry::list_operations()` returns External specs only
- [ ] `OperationRegistryBuilder` with `new()`, `with_local()`, `with_leaf()`, `with()`, `build()`
- [ ] `with_local` sets provenance Local, requires all 5 args
- [ ] `with_leaf` sets provenance leaf, composition_authority None, scoped_env None
- [ ] invoke: Internal op called externally → NOT_FOUND (not FORBIDDEN)
- [ ] invoke: ACL denied → FORBIDDEN
- [ ] invoke: internal: true → ACL against handler_identity, not identity
- [ ] invoke: internal: false → ACL against identity
- [ ] Unit test: register and invoke a simple operation
- [ ] Unit test: Internal op returns NOT_FOUND from external call
- [ ] Unit test: ACL check with sufficient scopes → Allowed
- [ ] Unit test: ACL check with insufficient scopes → Forbidden
- [ ] Unit test: builder with_local and with_leaf produce correct provenance
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — Handler, HandlerRegistration, OperationRegistry, builder
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, immutability)
## Notes
> The registry is the dispatch core. The ACL authority switch (internal: true
> → handler_identity, internal: false → identity) is the ADR-015 privilege
> model — get this right. Internal ops return NOT_FOUND from the wire (don't
> leak existence), not FORBIDDEN. The builder is Layer-0-only; runtime overlay
> registration is via CallConnection (protocol task).
## Summary
> To be filled on completion


@@ -0,0 +1,204 @@
---
id: call/registry/operation-context
name: Implement OperationContext, AbortPolicy, CompositionAuthority, and ScopedOperationEnv
status: pending
depends_on: [call/registry/operation-spec, core/core-types]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement the operation context types in `src/registry/context.rs`. This is
the highest-density task in the call crate — `OperationContext` has 11 fields,
each tied to an ADR. The authority-switch semantics (`internal: true` → ACL
against `handler_identity`, not `identity`) is where ADR-015, ADR-022, and
ADR-024 converge.
**Read ADR-015, ADR-022, and ADR-024 before starting this task.**
### OperationContext
```rust
pub struct OperationContext {
pub request_id: String,
pub parent_request_id: Option<String>,
pub identity: Option<Identity>, // Caller's identity (inbound)
pub handler_identity: Option<CompositionAuthority>, // Handler's composition authority (ADR-022)
pub capabilities: Capabilities,
pub metadata: HashMap<String, Value>,
pub scoped_env: ScopedOperationEnv, // Reachability set (data, ADR-022)
pub env: Arc<dyn OperationEnv + Send + Sync>, // Composition dispatch trait (ADR-024)
pub abort_policy: AbortPolicy, // ADR-016 Decision 6
pub deadline: Option<Instant>,
pub(crate) internal: bool, // Crate-private for writes (ADR-015)
}
```
Field-by-field:
- `request_id`: correlates with `call.requested` event's `id` field. For wire
calls, this is the client-generated ID. For composed calls, generated by
`OperationEnv::invoke()` via `generate_request_id()` (UUID v4 or
`parent_id + "-" + counter`). **Deterministic IDs must not be used** — they
collide across concurrent invocations, corrupting PendingRequestMap and the
abort-cascade tree.
- `parent_request_id`: set when this call was initiated by another operation
(via OperationEnv). Records the agency chain — the call tree is the
principal→agent chain (ADR-015).
- `identity`: the authenticated caller (from IdentityProvider) — inbound auth
(who is calling me). For external calls, who sent `call.requested`. For
internal calls, the parent handler's `handler_identity` (propagated through
`OperationEnv::invoke()`).
- `handler_identity`: the composition authority of the handler processing this
call. `None` for leaves (FromOpenAPI/FromMCP/FromCall) — they don't compose.
`Some(...)` for Local/Session ops. For internal calls (`internal: true`), ACL
checks against this authority (ADR-015, ADR-022). This is NOT a peer Identity
— it's a declared authority bundle set at registration.
- `capabilities`: outbound credentials the handler may use (decrypted API keys,
scoped vault access). From the registration bundle (ADR-022).
- `metadata`: request-scoped context (tracing IDs, connection info). **Must not
hold secret material** (ADR-014). **Does not propagate through
`OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing
link is `parent_request_id`, not metadata propagation.
- `scoped_env`: the reachability set — operations this handler may compose.
Populated from the registration bundle (ADR-022). This is *data* (a struct),
not a dispatch trait. Empty for leaves (their registration has `scoped_env: None`).
- `env`: the composition dispatch trait (`Arc<dyn OperationEnv + Send + Sync>`).
A handler calls `context.env.invoke(...)` to compose children. This is a
trait object, not a concrete struct — enables registry layering (ADR-024).
- `abort_policy`: for this call's descendants (ADR-016 Decision 6). Default
`AbortDependents`. `ContinueRunning` is opt-in for long-running work. Set by
the composing handler via `invoke()`, not by the wire caller.
- `deadline`: for this call and all descendants. Set by `build_root_context`
to `now + CallAdapter.default_timeout` (default 30s). Composed calls inherit
the parent's deadline (children do NOT get a fresh 30s). `None` = unbounded
(long-running subscriptions).
- `internal`: when `true`, this call originated from composition (a handler
calling another operation via OperationEnv), not from a wire request. This
switches the authority context: ACL runs against `handler_identity`, not
`identity`. Crate-private for writes (`pub(crate)`); read via `is_internal()`. Only set by
`OperationEnv::invoke()` (true) or `CallAdapter` dispatch path (false).
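The deadline rule (set once at the root, inherited unchanged by children) can be sketched with three small functions. All three names are hypothetical; `remaining` is an illustrative helper showing how a handler might compute its time budget.

```rust
use std::time::{Duration, Instant};

/// Root context: deadline is `now + default_timeout`.
pub fn root_deadline(now: Instant, default_timeout: Duration) -> Instant {
    now + default_timeout
}

/// Composed call: inherit the parent's deadline as-is, never a fresh timeout.
pub fn child_deadline(parent: Option<Instant>) -> Option<Instant> {
    parent
}

/// Time left before the deadline; `None` means unbounded.
pub fn remaining(deadline: Option<Instant>, now: Instant) -> Option<Duration> {
    deadline.map(|d| d.saturating_duration_since(now))
}
```

With a 30s root timeout, a child started 20s in has 10s left, which is the property that bounds the whole call tree.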
### AbortPolicy
```rust
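// Clone/Copy: composition passes the policy by value (e.g. `parent.abort_policy.clone()`)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AbortPolicy {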
AbortDependents, // default — abort cascades to all non-terminal descendants
ContinueRunning, // opt-in — started descendants continue, unstarted aborted
}
impl Default for AbortPolicy {
fn default() -> Self { Self::AbortDependents }
}
```
### CompositionAuthority
```rust
pub struct CompositionAuthority {
pub label: String, // e.g., "agent-chat" — not a peer id
pub scopes: Vec<String>, // e.g., ["llm:call", "fs:read"]
pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["vastai"]}
}
impl CompositionAuthority {
pub fn none() -> Option<Self> { None } // Convenience for leaves
pub fn new(label: &str, scopes: impl IntoIterator<Item = String>) -> Self { ... }
pub fn as_identity(&self) -> Option<Identity> { ... } // Synthetic Identity for ACL
}
```
The declared authority the handler operates under when composing children.
`None` for leaves. This replaces ADR-015's `handler_identity: Identity` — it's
not a peer identity, it's a declared authority bundle. See ADR-022.
`as_identity()` produces a synthetic `Identity` from the authority (label as
id, scopes, resources) for ACL checking against `AccessControl`.
### ScopedOperationEnv
```rust
pub struct ScopedOperationEnv {
allowed: HashSet<String>, // operation names this handler may reach
}
impl ScopedOperationEnv {
pub fn empty() -> Self;
pub fn new(ops: impl IntoIterator<Item = impl Into<String>>) -> Self;
pub fn allows(&self, name: &str) -> bool; // is this op in the reachability set?
}
```
The reachability set — the operations this handler may reach via `env.invoke()`.
Populated from the registration bundle (ADR-022). This is *data*, not a dispatch
trait. The reachability check in `OperationEnv::invoke()` consults
`scoped_env.allows(&name)`. Empty for leaves (their registration has `scoped_env: None`).
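A minimal sketch of the three methods, assuming a plain `HashSet<String>` with exact-name matching (no wildcard or namespace-prefix rules, which the source does not mention):

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Default)]
pub struct ScopedOperationEnv {
    allowed: HashSet<String>, // operation names this handler may reach
}

impl ScopedOperationEnv {
    /// Reachability set for leaves: nothing is reachable.
    pub fn empty() -> Self {
        Self::default()
    }

    pub fn new(ops: impl IntoIterator<Item = impl Into<String>>) -> Self {
        Self {
            allowed: ops.into_iter().map(Into::into).collect(),
        }
    }

    /// Exact-match lookup by operation name.
    pub fn allows(&self, name: &str) -> bool {
        self.allowed.contains(name)
    }
}
```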
### OperationContext methods
```rust
impl OperationContext {
pub fn is_internal(&self) -> bool { self.internal }
}
```
The `internal` field is `pub(crate)` — only `OperationEnv::invoke()` and the
`CallAdapter` dispatch path can set it. Handlers read via `is_internal()`.
### generate_request_id
```rust
pub(crate) fn generate_request_id() -> String {
// UUID v4 — must be unique across concurrent invocations
// Deterministic IDs (e.g., format!("env-{name}")) MUST NOT be used
}
```
Use the `uuid` crate (already a dependency). This is crate-internal (`pub(crate)`) and is
called by `OperationEnv::invoke()` for composed calls.
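For comparison, the `parent_id + "-" + counter` alternative mentioned in the field notes can be sketched with a process-wide atomic counter. `child_request_id` and `CHILD_COUNTER` are hypothetical names, and this is not the UUID v4 path the task prescribes: the IDs are unique only within one process.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; every call gets a distinct value, even when
/// invocations for the same parent run concurrently.
static CHILD_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Build a child request ID from the parent's ID plus a unique counter value.
pub fn child_request_id(parent_id: &str) -> String {
    let n = CHILD_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{parent_id}-{n}")
}
```

The contrast with a name-derived ID such as `format!("env-{name}")` is that two concurrent invocations of the same operation never share a counter value.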
## Acceptance Criteria
- [ ] `OperationContext` struct with all 11 fields
- [ ] `internal` field is `pub(crate)` (crate-private for writes)
- [ ] `is_internal()` method exposes read access
- [ ] `AbortPolicy` enum with AbortDependents, ContinueRunning
- [ ] `Default for AbortPolicy` returns `AbortDependents`
- [ ] `CompositionAuthority` struct with label, scopes, resources
- [ ] `CompositionAuthority::none()` returns `None`
- [ ] `CompositionAuthority::new(label, scopes)` constructor
- [ ] `CompositionAuthority::as_identity()` produces synthetic Identity for ACL
- [ ] `ScopedOperationEnv` struct with allowed set
- [ ] `ScopedOperationEnv::empty()`, `new()`, `allows()` methods
- [ ] `generate_request_id()` produces UUID v4 (unique, non-deterministic)
- [ ] Unit test: ScopedOperationEnv::allows (in set → true, not in set → false)
- [ ] Unit test: CompositionAuthority::as_identity produces correct Identity
- [ ] Unit test: AbortPolicy default is AbortDependents
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal flag, authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (AbortPolicy)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022 (CompositionAuthority, ScopedOperationEnv)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env as trait object)
## Notes
> **Read ADR-015, ADR-022, and ADR-024 before starting.** This is the
> highest-density task in the call crate. OperationContext has 11 fields, each
> tied to an ADR. The authority-switch semantics (internal: true → ACL against
> handler_identity, not identity) is where three ADRs converge. The `internal`
> field is crate-private for writes (`pub(crate)`) — only OperationEnv::invoke() and the
> CallAdapter dispatch path set it. Metadata does NOT propagate through
> composition (security constraint, ADR-014). Request IDs must be unique
> (UUID v4) — deterministic IDs corrupt PendingRequestMap and abort-cascade tree.
## Summary
> To be filled on completion


@@ -0,0 +1,225 @@
---
id: call/registry/operation-env
name: Implement OperationEnv trait, LocalOperationEnv, and CompositeOperationEnv
status: pending
depends_on: [call/registry/handler-registration]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement the `OperationEnv` trait and its implementations in
`src/registry/env.rs`. This is the universal composition mechanism — a handler
calls `context.env.invoke(...)` to compose child operations. The trait-object
design is what enables registry layering (ADR-024).
**Read ADR-024 before starting this task.** The trait-object pattern is
load-bearing — making `OperationEnv` concrete would close the session-overlay
and connection-overlay patterns.
### OperationEnv trait
```rust
#[async_trait]
pub trait OperationEnv: Send + Sync {
/// Compose a child operation. The child's OperationContext is constructed
/// with internal: true, inheriting the parent's composition authority as
/// the child's caller identity. Abort policy defaults to parent's.
async fn invoke(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
) -> ResponseEnvelope {
self.invoke_with_policy(namespace, operation, input, parent, parent.abort_policy.clone()).await
}
/// Compose with explicit abort policy (ADR-016 Decision 6).
/// This is the required method — invoke() delegates to it.
async fn invoke_with_policy(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
policy: AbortPolicy,
) -> ResponseEnvelope;
/// Does this env contain the named operation? Used by CompositeOperationEnv
/// to probe overlays before dispatching (ADR-024).
fn contains(&self, _name: &str) -> bool { true } // `_name`: unused in the default impl
}
```
`invoke()` has a default impl that delegates to `invoke_with_policy()` with
the parent's abort policy. Implementations only need to implement
`invoke_with_policy()`.
### LocalOperationEnv (Layer 0)
```rust
pub struct LocalOperationEnv {
registry: Arc<OperationRegistry>,
}
#[async_trait]
impl OperationEnv for LocalOperationEnv {
async fn invoke_with_policy(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
let name = format!("{namespace}/{operation}");
// 1. Reachability check (ADR-015, ADR-022): is this op in parent's scoped env?
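if !parent.scoped_env.allows(&name) {
// not_found(request_id, op_name): the child has no ID yet, so the parent's is used (assumption)
return ResponseEnvelope::not_found(parent.request_id.clone(), &name);
}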
// 2. Look up registration
let registration = self.registry.registration(&name);
// 3. Construct child OperationContext
let context = OperationContext {
request_id: generate_request_id(), // UUID v4 — NOT deterministic
parent_request_id: Some(parent.request_id.clone()),
identity: parent.handler_identity.as_ref().and_then(|a| a.as_identity()), // authority switch
handler_identity: registration.composition_authority.clone(),
capabilities: parent.capabilities.clone(), // inherit
metadata: HashMap::new(), // fresh — does NOT propagate parent metadata (ADR-014)
abort_policy: policy,
deadline: parent.deadline, // inherit — children don't get fresh 30s
scoped_env: registration.scoped_env.clone().unwrap_or_else(ScopedOperationEnv::empty),
env: parent.env.clone(), // inherit the same composite env
internal: true, // nested calls use handler authority
};
// 4. Dispatch
self.registry.invoke(&name, input, context).await
}
// contains() uses default (returns true — curated registry contains everything it can dispatch)
}
```
Key points:
- **Reachability check first**: if op not in parent's scoped_env, NOT_FOUND.
This bounds the parameterized-dispatch attack surface.
- **Authority propagation**: child's `identity` = parent's `handler_identity`
(the parent's composition authority becomes the caller). This is the
authority switch from ADR-015.
- **Fresh metadata**: `HashMap::new()`, NOT parent's metadata. Security
constraint (ADR-014) — prevents secret leakage through composition.
- **Inherited deadline**: children don't get a fresh 30s — the root call's
deadline bounds the entire call tree.
- **Inherited env**: child gets `parent.env.clone()` (the same composite of
curated base + active overlays).
- **internal: true**: this is the flag that switches ACL authority.
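The key points above can be condensed into one child-context derivation. The sketch below is illustrative only: `Ctx` is a reduced stand-in for `OperationContext`, `child_of` is a hypothetical helper, the `identity` field is assumed to be an `Option<Identity>`, and the mapping of `label` to `id` inside `as_identity()` is an assumption rather than the documented behaviour.

```rust
use std::collections::HashMap;
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct CompositionAuthority {
    pub label: String,
    pub scopes: Vec<String>,
}

impl CompositionAuthority {
    // Assumption: the authority's label becomes the identity id.
    pub fn as_identity(&self) -> Identity {
        Identity { id: self.label.clone(), scopes: self.scopes.clone() }
    }
}

// Reduced stand-in for OperationContext (only the fields discussed above).
pub struct Ctx {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub handler_identity: CompositionAuthority,
    pub metadata: HashMap<String, String>,
    pub deadline: Instant,
    pub internal: bool,
}

// Hypothetical helper: derive the child context for a nested call.
pub fn child_of(parent: &Ctx, callee_authority: &CompositionAuthority, request_id: String) -> Ctx {
    Ctx {
        request_id,
        parent_request_id: Some(parent.request_id.clone()),
        // Authority switch: the parent's handler authority becomes the caller.
        identity: Some(parent.handler_identity.as_identity()),
        handler_identity: callee_authority.clone(),
        // Fresh metadata: nothing from the parent leaks into the child.
        metadata: HashMap::new(),
        // Inherited deadline: the root deadline bounds the whole call tree.
        deadline: parent.deadline,
        // Nested calls are always internal.
        internal: true,
    }
}
```

A unit test for the acceptance criteria can assert each of these properties on the returned context directly.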
### CompositeOperationEnv (per-call, ADR-024)
```rust
pub struct CompositeOperationEnv {
session: Option<Arc<dyn OperationEnv + Send + Sync>>, // Layer 1
connection: Option<Arc<dyn OperationEnv + Send + Sync>>, // Layer 2
base: Arc<dyn OperationEnv + Send + Sync>, // Layer 0 (LocalOperationEnv)
}
#[async_trait]
impl OperationEnv for CompositeOperationEnv {
async fn invoke_with_policy(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
let name = format!("{namespace}/{operation}");
// Reachability check (same as LocalOperationEnv)
if !parent.scoped_env.allows(&name) {
return ResponseEnvelope::not_found(name);
}
// Dispatch in overlay order: session → connection → curated base
// First overlay that *contains* the op wins
if let Some(session) = &self.session {
if session.contains(&name) {
return session.invoke_with_policy(namespace, operation, input, parent, policy).await;
}
}
if let Some(connection) = &self.connection {
if connection.contains(&name) {
return connection.invoke_with_policy(namespace, operation, input, parent, policy).await;
}
}
self.base.invoke_with_policy(namespace, operation, input, parent, policy).await
}
fn contains(&self, name: &str) -> bool {
self.session.as_ref().is_some_and(|s| s.contains(name))
|| self.connection.as_ref().is_some_and(|c| c.contains(name))
|| self.base.contains(name)
}
}
```
The `contains()` method (review #003 C9) is the overlay-dispatch contract. It
replaces the previous ambiguous "sentinel or contains check" framing. The
structural decision (composite trait object, overlay order, Arc::clone
inheritance) is locked by ADR-024; the dispatch contract (contains probe before
invoke_with_policy) is locked too.
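To make the probe-then-dispatch contract concrete, here is a small synchronous model of it. Everything in it is a stand-in: `Env` is a simplified version of `OperationEnv`, `Overlay` and `Base` are hypothetical layers, and `invoke` returns a string naming the layer that handled the call instead of a `ResponseEnvelope`.

```rust
use std::sync::Arc;

// Simplified stand-in for OperationEnv.
pub trait Env {
    // Default mirrors the real trait: a layer claims every op unless it says otherwise.
    fn contains(&self, _name: &str) -> bool {
        true
    }
    fn invoke(&self, name: &str) -> String;
}

// Hypothetical overlay that knows a fixed set of operations.
pub struct Overlay {
    pub label: &'static str,
    pub ops: Vec<&'static str>,
}

impl Env for Overlay {
    fn contains(&self, name: &str) -> bool {
        self.ops.iter().any(|op| *op == name)
    }
    fn invoke(&self, name: &str) -> String {
        format!("{}:{}", self.label, name)
    }
}

// Hypothetical curated base; uses the default contains().
pub struct Base;

impl Env for Base {
    fn invoke(&self, name: &str) -> String {
        format!("base:{name}")
    }
}

pub struct Composite {
    pub session: Option<Arc<dyn Env>>,
    pub connection: Option<Arc<dyn Env>>,
    pub base: Arc<dyn Env>,
}

impl Env for Composite {
    fn contains(&self, name: &str) -> bool {
        self.session.as_ref().is_some_and(|s| s.contains(name))
            || self.connection.as_ref().is_some_and(|c| c.contains(name))
            || self.base.contains(name)
    }

    fn invoke(&self, name: &str) -> String {
        // Probe overlays in order; the first one that contains the op wins.
        if let Some(session) = &self.session {
            if session.contains(name) {
                return session.invoke(name);
            }
        }
        if let Some(connection) = &self.connection {
            if connection.contains(name) {
                return connection.invoke(name);
            }
        }
        // No overlay claimed it: fall through to the curated base.
        self.base.invoke(name)
    }
}
```

Note that with a base that keeps the default `contains()`, the composite's own `contains()` is always `true`; the probe only discriminates between overlays.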
### Why OperationEnv must remain a trait
The trait-based design enables registry layering (ADR-024):
- The CallAdapter composes the root env per call from curated base + active
connection/session overlays
- Overlays wrap the base via trait layering
- Session-scoped registries (OQ-19) and connection-scoped remote imports
(ADR-017 `from_call`) are both overlays on the same base
Making `OperationEnv` concrete or hardcoding the global registry into the
dispatch path would close both patterns. This is the same integration-point
pattern as `IdentityProvider` (ADR-004).
## Acceptance Criteria
- [ ] `OperationEnv` trait with `invoke()`, `invoke_with_policy()`, `contains()`
- [ ] `invoke()` has default impl delegating to `invoke_with_policy()` with parent's policy
- [ ] `contains()` has default impl returning `true`
- [ ] `LocalOperationEnv` struct holding `Arc<OperationRegistry>`
- [ ] `LocalOperationEnv::invoke_with_policy` checks reachability (scoped_env.allows)
- [ ] `LocalOperationEnv` constructs child context with internal: true, authority switch
- [ ] `LocalOperationEnv` fresh metadata (HashMap::new(), not parent's)
- [ ] `LocalOperationEnv` inherited deadline (parent.deadline, not fresh 30s)
- [ ] `LocalOperationEnv` inherited env (parent.env.clone())
- [ ] `CompositeOperationEnv` with session, connection, base fields
- [ ] `CompositeOperationEnv::invoke_with_policy` dispatches in overlay order (session → connection → base)
- [ ] `CompositeOperationEnv` uses `contains()` probe before dispatching to overlay
- [ ] `CompositeOperationEnv::contains` returns true if any layer contains the op
- [ ] Reachability check returns NOT_FOUND if op not in scoped_env
- [ ] Unit test: LocalOperationEnv invoke with allowed op → dispatches
- [ ] Unit test: LocalOperationEnv invoke with disallowed op → NOT_FOUND
- [ ] Unit test: child context has internal: true
- [ ] Unit test: child context identity = parent's handler_identity
- [ ] Unit test: child metadata is fresh (empty), not parent's
- [ ] Unit test: CompositeOperationEnv dispatches to session overlay if contains
- [ ] Unit test: CompositeOperationEnv falls through to base if no overlay contains
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationEnv, LocalOperationEnv, CompositeOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (abort policy propagation)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, contains contract)
## Notes
> **Read ADR-024 before starting.** The trait-object design is load-bearing —
> OperationEnv MUST remain a trait, not a concrete type. The authority switch
> (child identity = parent handler_identity) is the ADR-015 privilege model.
> Metadata does NOT propagate (ADR-014 security constraint). Deadline
> inherits (children don't get fresh 30s). The `contains()` probe is the
> overlay-dispatch contract from review #003 C9 — any OperationEnv impl that
> correctly reports contains works with the composite.
## Summary
> To be filled on completion

@@ -0,0 +1,168 @@
---
id: call/registry/operation-spec
name: Implement OperationSpec, OperationType, Visibility, ErrorDefinition, and AccessControl
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the operation specification types in `src/registry/spec.rs`. These
types declare what an operation is, its schemas, and its access control policy.
### OperationSpec
```rust
pub struct OperationSpec {
pub name: String, // e.g., "fs/readFile", "agent/chat" (no leading slash)
pub namespace: String, // e.g., "fs", "agent"
pub op_type: OperationType, // Query, Mutation, Subscription
pub visibility: Visibility, // External (wire-callable) or Internal (composition-only)
pub input_schema: Value, // JSON Schema for input
pub output_schema: Value, // JSON Schema for output
pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023)
pub access_control: AccessControl,
}
```
Operation names use slash-based paths **without a leading slash**, aligned with
URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading
slash is added for display (`spec.path()` returns `/fs/readFile`) and wire
format. The registry stores names without the leading slash.
The `namespace` field is derived from the name: for `fs/readFile` it's `fs`,
for `agent/chat` it's `agent`. It's a convenience accessor for ACL matching and
service grouping.
Implement `OperationSpec::path(&self) -> String` that returns `/{name}` (the
wire/display form with leading slash).
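The naming rule can be illustrated with two free functions. Both names are hypothetical (the task asks for `OperationSpec::path()` as a method), and treating a name without any slash as its own namespace is an assumption, not something the spec states.

```rust
// Namespace is the segment before the first '/'.
// Assumption: a name with no slash is treated as its own namespace.
pub fn namespace_of(name: &str) -> &str {
    name.split_once('/').map_or(name, |(namespace, _rest)| namespace)
}

// Wire/display form: the registry name with a leading slash added.
pub fn display_path(name: &str) -> String {
    format!("/{name}")
}
```

For example, `namespace_of("fs/readFile")` yields `fs` and `display_path("fs/readFile")` yields `/fs/readFile`.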
### OperationType
```rust
pub enum OperationType {
Query, // Read-only, idempotent (e.g., "fs/readFile", "services/list")
Mutation, // Side effects (e.g., "bash/exec", "github/authenticate")
Subscription, // Streaming (e.g., "agent/chat", "events/subscribe")
}
```
### Visibility
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)] // PartialEq: discovery code compares with `==`
pub enum Visibility {
External, // Callable from the wire (call.requested from a client)
Internal, // Composition-only (env.invoke from a handler)
}
```
`External` operations appear in `services/list` and accept `call.requested`.
`Internal` operations return `NOT_FOUND` when called from the wire and do not
appear in `services/list`. The assembly layer declares visibility at
registration. All import adapters register operations as `Internal` by default
(they're composition material); the handler that composes them is `External`.
### ErrorDefinition
```rust
pub struct ErrorDefinition {
pub code: String, // e.g., "FILE_NOT_FOUND", "RATE_LIMITED"
pub description: String, // Human-readable description
pub schema: Value, // JSON Schema for the error detail payload
pub http_status: Option<u16>, // HTTP status for adapter projection (from_openapi/to_openapi)
}
```
A declared operation-level error (ADR-023). When a handler returns a `CallError`
whose `code` matches a declared `ErrorDefinition`, the `call.error` event
carries that code and the error's detail payload. If it doesn't match, the
`call.error` carries `INTERNAL`.
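A minimal sketch of that matching rule, assuming a hypothetical helper named `wire_error_code` and a reduced `ErrorDefinition` with only the fields needed here. Protocol-level codes such as `NOT_FOUND` or `FORBIDDEN` are produced elsewhere and are outside this sketch.

```rust
// Reduced stand-in: the real ErrorDefinition also has `schema` and `http_status`.
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
}

// Hypothetical helper: pick the code that goes into `call.error`.
// A handler code is passed through only if the operation declared it.
pub fn wire_error_code<'a>(declared: &[ErrorDefinition], handler_code: &'a str) -> &'a str {
    if declared.iter().any(|def| def.code == handler_code) {
        handler_code
    } else {
        "INTERNAL"
    }
}
```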
### AccessControl
```rust
pub struct AccessControl {
pub required_scopes: Vec<String>, // AND-checked: caller must have ALL
pub required_scopes_any: Option<Vec<String>>, // OR-checked: caller must have at LEAST ONE
pub resource_type: Option<String>, // e.g., "service"
pub resource_action: Option<String>, // e.g., "read"
}
```
### ACL check flow
When a `call.requested` event arrives:
1. Registry checks **visibility** — if `Internal`, returns `NOT_FOUND` (does
not leak existence)
2. Registry checks `access_control.check(identity)`:
- For external calls (`internal: false`): ACL against the **caller's identity**
- For internal calls (`internal: true`): ACL against the **handler's
composition authority** (ADR-015)
3. If denied: `FORBIDDEN`
4. If identity is `None` and operation has restrictions: `FORBIDDEN` with
message `"authentication required"`
Operations with empty `AccessControl` (no required scopes, no resource checks)
are accessible to all callers, including unauthenticated ones.
### Implement AccessControl::check
```rust
impl AccessControl {
pub fn check(&self, identity: Option<&Identity>) -> AccessResult;
}
pub enum AccessResult {
Allowed,
Forbidden(String), // reason
}
```
The check logic:
- `required_scopes`: caller must have ALL (subset check)
- `required_scopes_any`: caller must have at LEAST ONE (if present)
- `resource_type` / `resource_action`: check against `identity.resources`
- If `identity` is `None` and any scope/resource is required: `Forbidden("authentication required")`
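One possible shape for the check, offered as a sketch rather than the reference implementation. Assumptions are marked in comments: `identity.resources` is read as a map from resource type to permitted actions, a `resource_action` without a `resource_type` is ignored, and the `Forbidden` reason strings other than `"authentication required"` are invented for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

#[derive(Debug, Default)]
pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum AccessResult {
    Allowed,
    Forbidden(String),
}

impl AccessControl {
    // Assumption: an empty `required_scopes_any` list counts as "no restriction".
    fn is_unrestricted(&self) -> bool {
        self.required_scopes.is_empty()
            && self.required_scopes_any.as_ref().map_or(true, |any| any.is_empty())
            && self.resource_type.is_none()
    }

    pub fn check(&self, identity: Option<&Identity>) -> AccessResult {
        if self.is_unrestricted() {
            return AccessResult::Allowed;
        }
        let Some(identity) = identity else {
            return AccessResult::Forbidden("authentication required".into());
        };
        // AND check: every required scope must be present.
        if let Some(missing) = self.required_scopes.iter().find(|s| !identity.scopes.contains(s)) {
            return AccessResult::Forbidden(format!("missing scope: {missing}"));
        }
        // OR check: at least one of the alternatives must be present.
        if let Some(any) = &self.required_scopes_any {
            if !any.is_empty() && !any.iter().any(|s| identity.scopes.contains(s)) {
                return AccessResult::Forbidden("none of the alternative scopes present".into());
            }
        }
        // Resource check (assumed semantics: resources maps type -> allowed actions).
        if let Some(rtype) = &self.resource_type {
            let permitted = match (identity.resources.get(rtype), &self.resource_action) {
                (None, _) => false,
                (Some(_), None) => true,
                (Some(actions), Some(action)) => actions.contains(action),
            };
            if !permitted {
                return AccessResult::Forbidden(format!("no access to resource type: {rtype}"));
            }
        }
        AccessResult::Allowed
    }
}
```

Checking the unrestricted case first keeps the "empty `AccessControl` is open to everyone, including unauthenticated callers" rule in one place.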
## Acceptance Criteria
- [ ] `OperationSpec` struct with all 8 fields
- [ ] `OperationSpec::path()` returns `/{name}` (leading slash for wire/display)
- [ ] `OperationSpec::namespace` derived from name (split on `/`)
- [ ] `OperationType` enum with Query, Mutation, Subscription
- [ ] `Visibility` enum with External, Internal
- [ ] `ErrorDefinition` struct with all 4 fields
- [ ] `AccessControl` struct with all 4 fields
- [ ] `AccessControl::check(identity)` returns `AccessResult`
- [ ] `required_scopes` is AND-checked (caller must have all)
- [ ] `required_scopes_any` is OR-checked (caller must have at least one)
- [ ] `None` identity with restrictions → `Forbidden("authentication required")`
- [ ] Empty AccessControl → `Allowed` for all callers
- [ ] Unit tests for AccessControl::check (all combinations)
- [ ] Unit test: OperationSpec::path() produces leading slash
- [ ] Unit test: namespace derived correctly from name
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationSpec, AccessControl, Visibility
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (visibility, ACL)
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (ErrorDefinition)
## Notes
> Operation names have NO leading slash in the registry (`fs/readFile`). The
> leading slash is added for wire format and display (`/fs/readFile`). This is
> a single rule applied consistently — do not mix the two forms. Visibility
> controls wire-callability: Internal ops return NOT_FOUND from the wire (don't
> leak existence). AccessControl.check is the ACL gate — read it carefully
> against ADR-015 for the internal vs external authority distinction.
## Summary
> To be filled on completion

@@ -0,0 +1,181 @@
---
id: call/registry/service-discovery
name: Implement services/list and services/schema built-in operations
status: pending
depends_on: [call/registry/handler-registration]
scope: narrow
risk: low
impact: isolated
level: implementation
---
## Description
Implement the two built-in service discovery operations in
`src/registry/discovery.rs`. These are read-only operations that expose what
the node offers.
### Operations
| Operation name | Display path | Type | Description |
|---------------|-------------|------|-------------|
| `services/list` | `/services/list` | Query | List registered operation names and metadata |
| `services/schema` | `/services/schema` | Query | Get the OperationSpec for a specific operation |
### services/list
Returns `External` operations only. `Internal` operations are not part of the
wire-facing API surface — they're implementation details of composition. A
remote client cannot enumerate the internal call tree (ADR-015).
```json
{
"operations": [
{ "name": "fs/readFile", "namespace": "fs", "op_type": "query" },
{ "name": "agent/chat", "namespace": "agent", "op_type": "subscription" },
{ "name": "events/subscribe", "namespace": "events", "op_type": "subscription" }
]
}
```
The handler queries the registry's `list_operations()` (which returns External
specs only) and serializes to the above format.
### services/schema
Accepts `{ "name": "fs/readFile" }` (no leading slash — registry form, same as
`OperationSpec.name`) and returns the full `OperationSpec` including
input/output JSON Schemas and declared `error_schemas` (ADR-023).
The CallAdapter strips the leading slash from wire `operationId`s before
lookup. The `services/schema` handler must apply the same normalization to its
`name` input so that it accepts both `fs/readFile` and `/fs/readFile`.
This enables client code generation: a client reading the schema can produce
typed error enums instead of generic error handling.
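The leading-slash handling for the `name` input might look like the following. Both function names are hypothetical, and the `HashMap<String, String>` stands in for whatever the registry actually uses to hold specs.

```rust
use std::collections::HashMap;

// Strip a single leading slash so wire form and registry form compare equal.
pub fn normalize_operation_name(raw: &str) -> &str {
    raw.strip_prefix('/').unwrap_or(raw)
}

// Stand-in lookup: the map values represent serialized specs.
// Returning None corresponds to a NOT_FOUND response.
pub fn lookup_schema<'a>(specs: &'a HashMap<String, String>, raw_name: &str) -> Option<&'a String> {
    specs.get(normalize_operation_name(raw_name))
}
```

Normalizing in one helper keeps the "registry names have no leading slash" rule from being re-implemented in each handler.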
### Registration
These are registered as `Local` provenance with empty composition authority,
empty scoped env, and empty capabilities (they don't compose, don't need
credentials):
```rust
.with_local(services_list_spec(), Arc::new(services_list_handler),
CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())
.with_local(services_schema_spec(), Arc::new(schema_handler),
CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())
```
### Specs
```rust
fn services_list_spec() -> OperationSpec {
OperationSpec {
name: "services/list".into(),
namespace: "services".into(),
op_type: OperationType::Query,
visibility: Visibility::External,
input_schema: json!({}), // no input
output_schema: json!({
"type": "object",
"properties": {
"operations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"namespace": { "type": "string" },
"op_type": { "type": "string", "enum": ["query", "mutation", "subscription"] }
}
}
}
}
}),
error_schemas: vec![],
access_control: AccessControl::default(), // no restrictions — callable by all
}
}
fn services_schema_spec() -> OperationSpec {
OperationSpec {
name: "services/schema".into(),
namespace: "services".into(),
op_type: OperationType::Query,
visibility: Visibility::External,
input_schema: json!({
"type": "object",
"properties": { "name": { "type": "string" } },
"required": ["name"]
}),
output_schema: json!({ /* full OperationSpec schema */ }),
error_schemas: vec![],
access_control: AccessControl::default(),
}
}
```
### Handlers
The handlers need access to the registry. Since handlers are `Arc<dyn Fn>`,
the registry reference is captured in the closure. Use `Arc<OperationRegistry>`
cloned into the closure.
```rust
fn services_list_handler(registry: Arc<OperationRegistry>) -> Handler {
Arc::new(move |_input: Value, ctx: OperationContext| { // input is unused: services/list takes none
let registry = registry.clone();
Box::pin(async move {
let ops: Vec<_> = registry.list_operations()
.into_iter()
.filter(|s| s.visibility == Visibility::External)
.map(|s| json!({
"name": s.name,
"namespace": s.namespace,
"op_type": match s.op_type {
OperationType::Query => "query",
OperationType::Mutation => "mutation",
OperationType::Subscription => "subscription",
}
}))
.collect();
ResponseEnvelope::ok(ctx.request_id, json!({ "operations": ops }))
})
})
}
```
## Acceptance Criteria
- [ ] `services/list` spec with correct fields (Query, External, no input, output schema)
- [ ] `services/schema` spec with correct fields (Query, External, name input, full spec output)
- [ ] `services/list` handler returns External operations only (Internal excluded)
- [ ] `services/list` output format matches spec (operations array with name, namespace, op_type)
- [ ] `services/schema` handler accepts name with or without leading slash
- [ ] `services/schema` returns full OperationSpec (input_schema, output_schema, error_schemas)
- [ ] `services/schema` returns NOT_FOUND for unknown operation name
- [ ] Both registered as Local provenance, empty authority/env/caps
- [ ] Both have empty AccessControl (callable by all, including unauthenticated)
- [ ] Unit test: services/list returns only External ops
- [ ] Unit test: services/schema returns spec for known op
- [ ] Unit test: services/schema returns NOT_FOUND for unknown op
- [ ] Unit test: services/schema accepts both "fs/readFile" and "/fs/readFile"
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — Service Discovery section
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (Internal not in services/list)
## Notes
> services/list returns External ops only — Internal ops are implementation
> details of composition and must not be enumerable from the wire. The
> CallAdapter normalizes leading slashes, so services/schema accepts both
> forms. These are the only built-in operations; no admin operations are
> exposed through the call protocol itself.
## Summary
> To be filled on completion

tasks/call/review-call.md Normal file
@@ -0,0 +1,137 @@
---
id: call/review-call
name: Review alknet-call implementation for spec conformance and pattern consistency
status: pending
depends_on: [call/protocol/abort-cascade]
scope: broad
risk: low
impact: phase
level: review
---
## Description
Review the alknet-call implementation for spec conformance, pattern
consistency, and correctness. This is the quality checkpoint at the end of the
call phase — the most complex crate in this batch.
### Review Checklist
1. **Registry conformance** (operation-registry.md):
- `OperationSpec` has all 8 fields, `path()` adds leading slash
- `OperationType` (Query/Mutation/Subscription), `Visibility` (External/Internal)
- `ErrorDefinition` with code, description, schema, http_status (ADR-023)
- `AccessControl` with required_scopes (AND), required_scopes_any (OR), resource checks
- `AccessControl::check` returns Allowed/Forbidden, None identity with restrictions → Forbidden
- `OperationContext` has all 10 fields, `internal` is pub(crate), `is_internal()` reads
- `AbortPolicy` (AbortDependents default, ContinueRunning opt-in)
- `CompositionAuthority` with label, scopes, resources, `as_identity()`
- `ScopedOperationEnv` with `allows()` reachability check
- `Handler` type (async closure → ResponseEnvelope)
- `HandlerRegistration` with all 6 fields (spec, handler, provenance, authority, scoped_env, caps)
- `OperationProvenance` with all 6 variants
- `OperationRegistry` with register, registration, invoke, list_operations
- `OperationRegistryBuilder` with with_local, with_leaf, with, build
- `OperationEnv` trait with invoke, invoke_with_policy, contains
- `LocalOperationEnv` reachability check, authority switch, fresh metadata, inherited deadline
- `CompositeOperationEnv` overlay dispatch (session → connection → base), contains probe
- `services/list` returns External only, `services/schema` returns full spec
2. **Protocol conformance** (call-protocol.md):
- `EventEnvelope` with type, id, payload (JSON, length-prefixed framing)
- `ResponseEnvelope` with request_id, result
- `CallError` with code, message, retryable, details
- 5 event types: call.requested, call.responded, call.completed, call.aborted, call.error
- Wire payload schemas match spec table
- `call.requested` has operationId (leading slash), input, optional auth_token
- `call.error` has protocol-level codes (NOT_FOUND, FORBIDDEN, INVALID_INPUT, INTERNAL, TIMEOUT)
- `PendingRequestMap` correlates by ID (not stream), handles all event types
- `CallConnection` with Layer 2 overlay, register_imported, overlay_env, call/subscribe/abort
- `CallAdapter` implements ProtocolHandler for alknet/call
- CallAdapter stream handling (accept_bi loop, FrameFramedReader/Writer)
- Per-request identity resolution (auth_token overrides connection-level)
- `build_root_context` sets internal: false, deadline, capabilities from registration
- `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- operationId leading slash stripped before lookup
- ResponseEnvelope → EventEnvelope conversion
- Timeout: 30s default, composed calls inherit parent deadline
- Abort cascade: walks tree by parent_request_id, AbortDependents/ContinueRunning
3. **ADR conformance**:
- ADR-005: irpc framing used
- ADR-012: bidirectional streams, ID-based correlation
- ADR-014: no secret material on wire, Capabilities non-serializable
- ADR-015: internal flag switches authority (handler_identity vs identity), Visibility
- ADR-016: abort cascade, AbortPolicy, default AbortDependents
- ADR-017: connection direction independent of call direction
- ADR-022: registration bundle (provenance, authority, scoped_env, capabilities)
- ADR-023: ErrorDefinition, typed details in call.error
- ADR-024: registry layering (curated + session + connection), OperationEnv as trait
4. **Security constraints**:
- Capabilities non-serializable (no Serialize derive)
- Capabilities zeroized, immutable after construction
- Metadata does not propagate through composition (fresh HashMap::new())
- Call protocol carries no secret material
- Internal ops return NOT_FOUND from wire (don't leak existence)
- Reachability check (scoped_env.allows) bounds composition attack surface
- Request IDs are UUID v4 (non-deterministic; collisions are negligibly unlikely, not impossible)
5. **Pattern consistency**:
- OperationEnv is a trait (not concrete) — enables layering
- CompositeOperationEnv uses contains() probe before dispatch
- Authority switch in invoke_with_policy (child identity = parent handler_identity)
- Deadline inheritance (children don't get fresh 30s)
- ArcSwap not used in call (that's core's pattern)
6. **Test coverage**:
- Unit tests for AccessControl::check (all combinations)
- Unit tests for OperationContext construction
- Unit tests for OperationEnv (LocalOperationEnv, CompositeOperationEnv)
- Unit tests for PendingRequestMap (all event types, timeouts, fail_all)
- Unit tests for framing (round-trip, truncation)
- Unit tests for abort cascade (both policies, tree walking)
- Integration test: call.requested → dispatch → call.responded
- Integration test: auth_token overrides identity
- Integration test: Internal op → NOT_FOUND from wire
- Integration test: ACL denied → FORBIDDEN
- Integration test: subscription streaming (multiple responded, completed)
## Acceptance Criteria
- [ ] All registry types match operation-registry.md
- [ ] All protocol types match call-protocol.md
- [ ] All ADRs conformed to (005, 012, 014, 015, 016, 017, 022, 023, 024)
- [ ] Capabilities non-serializable, zeroized, immutable
- [ ] Metadata does not propagate through composition
- [ ] Internal ops return NOT_FOUND from wire
- [ ] Reachability check bounds composition
- [ ] Request IDs are UUID v4
- [ ] OperationEnv is a trait (not concrete)
- [ ] CompositeOperationEnv uses contains() probe
- [ ] Authority switch correct (internal: true → handler_identity)
- [ ] Deadline inheritance correct (children inherit parent deadline)
- [ ] Test coverage adequate for all functionality
- [ ] `cargo fmt --check -p alknet-call` passes
- [ ] `cargo clippy -p alknet-call` passes with no warnings
- [ ] All tests pass
## References
- docs/architecture/crates/call/README.md
- docs/architecture/crates/call/call-protocol.md
- docs/architecture/crates/call/operation-registry.md
- docs/architecture/decisions/ (relevant ADRs: 005, 012, 014-017, 022-024)
## Notes
> This is the most complex crate in this batch. The review should verify that
> the registry layering (ADR-024), authority switch (ADR-015), abort cascade
> (ADR-016), and composition model (ADR-022) all work correctly together. The
> OperationEnv trait-object design is load-bearing — verify it's a trait, not
> concrete. If deviations are found, document and fix before considering the
> call crate complete.
## Summary
> To be filled on completion

tasks/core/auth.md Normal file
@@ -0,0 +1,162 @@
---
id: core/auth
name: Implement AuthContext, Identity, AuthToken, IdentityProvider trait, and ConfigIdentityProvider
status: pending
depends_on: [core/core-types]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the authentication types in `src/auth.rs`. Auth is hybrid: the
endpoint resolves what it can (TLS-level), handlers resolve what they need
(protocol-level). AuthContext may be partial — handlers complete auth inside
`handle()`.
### AuthContext
```rust
#[derive(Clone)]
pub struct AuthContext {
pub identity: Option<Identity>,
pub alpn: Vec<u8>,
pub remote_addr: Option<SocketAddr>,
pub tls_client_fingerprint: Option<String>,
}
```
Created by the endpoint for each incoming connection. Passed to
`ProtocolHandler::handle()` as an immutable reference.
- `identity`: peer's authenticated identity, if resolved by the endpoint. None
means the endpoint has no identity info for this connection.
- `alpn`: negotiated ALPN — always present after TLS handshake.
- `remote_addr`: peer's address, if available (may be None for iroh).
- `tls_client_fingerprint`: SHA-256 fingerprint of TLS client cert, if presented.
`AuthContext` is `Clone` (handlers clone for per-stream contexts) and immutable
in `handle()` (handlers create local variables for resolved identity, they
don't mutate the shared context).
### Identity
```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
pub id: String,
pub scopes: Vec<String>,
pub resources: HashMap<String, Vec<String>>,
}
```
The authenticated peer identity. `id` is ALPN-agnostic:
- SSH key auth: `"SHA256:abc123..."` (key fingerprint)
- API key auth: `"alk_test"` (key prefix)
- Certificate auth: `"username"` (principal name)
### AuthToken
```rust
#[derive(Debug, Clone)]
pub struct AuthToken {
pub raw: Vec<u8>,
}
```
Opaque authentication token carried in protocol frames. The handler that
extracted it knows its encoding.
### IdentityProvider trait
```rust
pub trait IdentityProvider: Send + Sync + 'static {
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
- `resolve_from_fingerprint()`: used by endpoint (TLS client cert) and SSH (key fingerprint)
- `resolve_from_token()`: used by call protocol (AuthToken in first frame) and HTTP (Bearer header)
- Both return `Option<Identity>` — None means credential not recognized
### ConfigIdentityProvider
```rust
pub struct ConfigIdentityProvider {
dynamic: Arc<ArcSwap<DynamicConfig>>,
}
```
The default implementation. Resolves identities from `DynamicConfig` (reads
from ArcSwap on every call — hot-reloadable).
Resolution logic:
- **Fingerprint**: look up in `DynamicConfig::auth::authorized_fingerprints`.
If found, return `Identity { id: fingerprint, scopes: ["relay:connect"], resources: {} }`.
- **Token**: parse as UTF-8. If it starts with `alk_`, look up in
`DynamicConfig::auth::api_keys` by prefix match + SHA-256 hash. If found and
not expired, return `Identity { id: prefix, scopes: entry.scopes, resources: {} }`
(`ApiKeyEntry` as defined in core/config carries no `resources` field).
Changes to DynamicConfig via ConfigReloadHandle are reflected immediately.
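The token branch of the resolution logic can be sketched as follows. This is a hedged illustration, not the reference implementation: `resolve_token` is a hypothetical free function, the SHA-256 step is injected as a closure (`sha256_hex`) because the hashing crate is not specified here, `expires_at` is assumed to be Unix seconds, and a key is assumed to be expired once `expires_at <= now`.

```rust
// Reduced stand-ins for the config and auth types.
pub struct ApiKeyEntry {
    pub prefix: String,
    pub hash: String,
    pub scopes: Vec<String>,
    pub expires_at: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

// Hypothetical helper. `sha256_hex` is an injected hash function (assumption):
// it should return the hex SHA-256 digest of the raw token bytes.
pub fn resolve_token(
    entries: &[ApiKeyEntry],
    raw: &[u8],
    now_unix: u64,
    sha256_hex: impl Fn(&[u8]) -> String,
) -> Option<Identity> {
    // Tokens must be valid UTF-8 and carry the `alk_` marker.
    let token = std::str::from_utf8(raw).ok()?;
    if !token.starts_with("alk_") {
        return None;
    }
    // The first 8 characters are the non-secret lookup prefix.
    let prefix = token.get(..8)?;
    let digest = sha256_hex(raw);
    let entry = entries
        .iter()
        .find(|e| e.prefix == prefix && e.hash == digest)?;
    // Assumption: expires_at is Unix seconds and the boundary counts as expired.
    if entry.expires_at.is_some_and(|t| t <= now_unix) {
        return None;
    }
    Some(Identity { id: entry.prefix.clone(), scopes: entry.scopes.clone() })
}
```

Injecting the hash function also makes the expiry and prefix logic unit-testable without pulling a hashing dependency into the test.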
### Two Identity Scopes
There are two distinct identity scopes that must not be conflated:
| Scope | Where set | Where stored | Represents | Used for |
|-------|-----------|--------------|------------|----------|
| Connection-level | Handler in `handle()` | `Connection` (via `set_identity`) | Who opened the QUIC connection | Observability, logging |
| Per-request | CallAdapter per `call.requested` | `OperationContext.identity` | Who makes this specific call | ACL (ADR-015) |
The connection-level identity is stable (set once). The per-request identity
is dynamic (resolved per call, potentially different across requests). The
per-request identity takes precedence for ACL.
### Security constraints
- **Token entropy**: generated `alk_` tokens must have ≥128 bits of entropy.
The prefix (first 8 chars) is for O(1) lookup and is not secret; it appears
in logs by design. Anyone who obtains the stored SHA-256 hash can test
candidate tokens offline, so hash-based verification is safe only if the full
token is high-entropy.
- **Config reload must be authenticated**: a reload that adds an authorized
fingerprint or API key grants access immediately. The reload trigger must be
local-only or admin-scoped.
- **Connection-level identity is for observability only**: per-request identity
takes precedence for ACL.
## Acceptance Criteria
- [ ] `AuthContext` struct with all 4 fields, derives `Clone`
- [ ] `Identity` struct with `id`, `scopes`, `resources`, derives `Clone`, `PartialEq`
- [ ] `AuthToken` struct with `raw` field, derives `Clone`
- [ ] `IdentityProvider` trait with both methods
- [ ] `ConfigIdentityProvider` struct holding `Arc<ArcSwap<DynamicConfig>>`
- [ ] `ConfigIdentityProvider::resolve_from_fingerprint` looks up in authorized_fingerprints
- [ ] `ConfigIdentityProvider::resolve_from_token` parses `alk_` prefix, matches by hash, checks expiry
- [ ] ConfigIdentityProvider reads from ArcSwap on every call (hot-reloadable)
- [ ] Unit test: fingerprint resolution (known fingerprint → Some, unknown → None)
- [ ] Unit test: token resolution (valid non-expired → Some, expired → None, unknown → None)
- [ ] Unit test: config reload changes resolution results immediately
- [ ] `cargo test -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
## References
- docs/architecture/crates/core/auth.md — all type definitions, resolution flow
- docs/architecture/decisions/004-auth-as-shared-core.md — ADR-004
- docs/architecture/decisions/011-authcontext-structure.md — ADR-011
## Notes
> Auth is hybrid: endpoint resolves TLS-level, handler resolves protocol-level.
> AuthContext may be partial (identity = None). The two identity scopes
> (connection-level for observability, per-request for ACL) must not be
> conflated. ConfigIdentityProvider reads from ArcSwap on every call so config
> reloads take effect immediately.
## Summary
> To be filled on completion

tasks/core/config.md Normal file
@@ -0,0 +1,190 @@
---
id: core/config
name: Implement StaticConfig, DynamicConfig, AuthPolicy, ApiKeyEntry, ConfigReloadHandle, TlsIdentity
status: pending
depends_on: [core/core-types]
scope: moderate
risk: low
impact: component
level: implementation
---
## Description
Implement the configuration types in `src/config.rs`. These are the config
structures consumed by the endpoint and the CLI binary. StaticConfig is
immutable at startup; DynamicConfig is hot-reloadable via ArcSwap.
### StaticConfig
```rust
pub struct StaticConfig {
pub listen_addr: Option<SocketAddr>,
pub tls_identity: Option<TlsIdentity>,
pub iroh_relay: Option<RelayUrl>,
pub drain_timeout: Duration,
}
```
Immutable configuration resolved at startup. `listen_addr` is None for
iroh-only nodes. `tls_identity` is required if `listen_addr` is Some.
### TlsIdentity
```rust
pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey(iroh::SecretKey),
    SelfSigned,
}
```
Three modes (OQ-12):
- `X509`: domain certificate for browser/WebTransport clients
- `RawKey`: RFC 7250 raw Ed25519 public key — default for P2P, no domain/CA
- `SelfSigned`: development only
`RawKey` uses `iroh::SecretKey` (Ed25519) — re-exported from iroh, which
alknet-core depends on (feature-gated). The key can be derived from
alknet-vault at the assembly layer or generated fresh.
### DynamicConfig
```rust
#[derive(Debug, Clone)]
pub struct DynamicConfig {
    pub auth: AuthPolicy,
    pub rate_limits: RateLimitConfig,
}
```
Runtime-reloadable via ArcSwap.
### AuthPolicy
```rust
pub struct AuthPolicy {
    pub authorized_fingerprints: HashSet<String>,
    pub api_keys: Vec<ApiKeyEntry>,
}
```
Fingerprints stored as strings (no russh dependency in core — ADR-003).
Certificate authority entries deferred to alknet-ssh (omitted from v1 to avoid
referencing an undefined type; adding back is additive).
### ApiKeyEntry
```rust
pub struct ApiKeyEntry {
    pub prefix: String,
    pub hash: String,
    pub scopes: Vec<String>,
    pub description: String,
    pub expires_at: Option<u64>,
}
```
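Carried forward from the reference implementation. The prefix (first 8 chars)
is the lookup key; lookup is O(1) only if entries are indexed by prefix (a scan
of the `api_keys` `Vec` is linear). The SHA-256 hash is used for verification.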
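The prefix-then-hash flow can be sketched as below. This is an illustration under assumptions, not the reference implementation: `find_valid_key` is a hypothetical name, `hash_fn` stands in for a hex-encoded SHA-256 (not available in std), and `expires_at` is assumed to be seconds since the Unix epoch.

```rust
#[derive(Debug, Clone)]
pub struct ApiKeyEntry {
    pub prefix: String,
    pub hash: String,
    pub scopes: Vec<String>,
    pub description: String,
    pub expires_at: Option<u64>,
}

/// Hypothetical lookup. `hash_fn` is a stand-in for hex(SHA-256(token));
/// `now` is assumed to use the same unit as `expires_at`.
pub fn find_valid_key<'a>(
    entries: &'a [ApiKeyEntry],
    token: &str,
    now: u64,
    hash_fn: impl Fn(&str) -> String,
) -> Option<&'a ApiKeyEntry> {
    // Narrow by the first 8 characters before doing any hashing.
    let prefix: String = token.chars().take(8).collect();
    let candidate = entries.iter().find(|e| e.prefix == prefix)?;

    // `None` means the key never expires; expired keys never resolve.
    if candidate.expires_at.map_or(false, |t| t <= now) {
        return None;
    }

    // NOTE: a real implementation should compare digests in constant time.
    if hash_fn(token) == candidate.hash {
        Some(candidate)
    } else {
        None
    }
}
```

Passing the hash function in keeps the sketch free of crypto dependencies; the production code would call its SHA-256 implementation directly.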
### RateLimitConfig
```rust
pub struct RateLimitConfig {
    pub max_connections_per_ip: usize,
    pub max_auth_attempts: usize,
}
```
### ArcSwap pattern
```rust
let dynamic = Arc::new(ArcSwap::new(Arc::new(DynamicConfig::default())));
```
- Reads: `dynamic.load()` returns a `Guard<Arc<DynamicConfig>>` (lock-free); use `dynamic.load_full()` when an owned `Arc<DynamicConfig>` is needed
- Writes: `dynamic.store(Arc::new(new_config))` — atomic swap
- No locks: ArcSwap uses atomic operations
### ConfigReloadHandle
```rust
pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig);
    pub fn dynamic(&self) -> Arc<DynamicConfig>;
}
```
- `reload()`: atomically replaces the dynamic config
- `dynamic()`: returns current config as `Arc<DynamicConfig>`
**Config reload is a privilege-escalation path.** A reload that adds an
authorized fingerprint or API key grants access immediately. The reload
trigger must be authenticated/local-only (SIGHUP, file watch, or admin call
protocol operation). The implementation must not ship a reload endpoint with
no auth "for convenience."
### ConfigError
```rust
pub enum ConfigError {
    InvalidFlag { name: String },
    KeyFileNotFound { path: String },
    BindFailed(io::Error),
    TlsConfig(io::Error),
    IncompatibleOptions,
}
```
### Defaults
- `drain_timeout`: 2 seconds
- `max_connections_per_ip`: implementation default (reference uses a reasonable value)
- `max_auth_attempts`: implementation default
- `DynamicConfig::default()`: empty auth policy, default rate limits
### What NOT to include
Per the spec, StaticConfig does NOT include: `host_key`, `host_key_algorithm`,
`proxy_config`, `stealth`, `transport_mode`, `listeners`. These are removed in
the new model (ALPN dispatch replaces them — see config.md Key Differences).
## Acceptance Criteria
- [ ] `StaticConfig` struct with all fields per config.md
- [ ] `TlsIdentity` enum with X509, RawKey, SelfSigned variants
- [ ] `DynamicConfig` struct with `auth` and `rate_limits` fields
- [ ] `AuthPolicy` struct with `authorized_fingerprints` and `api_keys`
- [ ] `ApiKeyEntry` struct with all 5 fields
- [ ] `RateLimitConfig` struct with both fields
- [ ] `ConfigReloadHandle` with `reload()` and `dynamic()` methods
- [ ] `ConfigError` enum with all variants
- [ ] `DynamicConfig` derives `Clone`, `Debug` (for ArcSwap)
- [ ] Default values match config.md (drain_timeout = 2s, etc.)
- [ ] No russh dependency (fingerprints as strings)
- [ ] Unit tests for Default impls
- [ ] Unit test: ConfigReloadHandle reload swaps config atomically
- [ ] `cargo test -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
## References
- docs/architecture/crates/core/config.md — all type definitions
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003 (no russh in core)
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010 (no ListenerConfig)
## Notes
> Config reload is a privilege-escalation path — do not ship an unauthenticated
> reload endpoint. The ArcSwap pattern carries forward from the reference
> implementation. StaticConfig removes all SSH-centric fields (host_key,
> stealth, transport_mode, listeners) — those are handler concerns now.
## Summary
> To be filled on completion

224
tasks/core/core-types.md Normal file

@@ -0,0 +1,224 @@
---
id: core/core-types
name: "Implement core types: ProtocolHandler, Connection, BiStream, SendStream, RecvStream, StreamError, HandlerError, Capabilities"
status: pending
depends_on: [core/crate-init]
scope: broad
risk: medium
impact: component
level: implementation
---
## Description
Implement the core types in `src/types.rs`. These are the foundational
abstractions that every handler crate depends on. This is the most
cross-crate-boundary task in core — `Capabilities` in particular is used
heavily by alknet-call's operation registry and composition model.
### ProtocolHandler trait
```rust
#[async_trait]
pub trait ProtocolHandler: Send + Sync + 'static {
    fn alpn(&self) -> &'static [u8];
    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError>;
}
```
- `alpn()` returns the handler's ALPN identifier as a static byte string
- `handle()` receives a `Connection` (not a single BiStream) and an `AuthContext`
- Handlers that need a single stream call `connection.accept_bi()` once
- Handlers that multiplex (SSH, call) open/accept streams as needed
See ADR-002, ADR-007.
### HandlerError
```rust
pub enum HandlerError {
    ConnectionClosed,
    StreamError(io::Error),
    AuthRequired,
    Internal(Box<dyn std::error::Error + Send + Sync>),
}
```
Non-fatal errors returned from `handle()`. The endpoint catches these, logs
them, and closes the connection; other connections are unaffected. Handler
panics are contained by tokio's task isolation.
### Connection
```rust
pub struct Connection {
    // Private: wraps the underlying QUIC connection or test mock
    identity: OnceLock<Identity>,
}

impl Connection {
    #[cfg(feature = "quinn")]
    pub fn from_quinn(conn: quinn::Connection) -> Self;
    #[cfg(feature = "iroh")]
    pub fn from_iroh(conn: iroh::Connection) -> Self;
    pub async fn accept_bi(&self) -> Result<(SendStream, RecvStream), StreamError>;
    pub async fn open_bi(&self) -> Result<(SendStream, RecvStream), StreamError>;
    pub fn remote_alpn(&self) -> &[u8];
    pub fn remote_addr(&self) -> Option<SocketAddr>;
    pub fn close(&self, code: u32, reason: &str);
    pub fn set_identity(&self, identity: Identity) -> Result<(), IdentityAlreadySet>;
    pub fn identity(&self) -> Option<&Identity>;
}
```
- Opaque type wrapping a QUIC connection (quinn or iroh, feature-gated)
- `set_identity` is write-once-read-many via `OnceLock` (OQ-11) — handlers
store resolved identity for observability; the endpoint does NOT read it
after `handle()` returns (the Connection is moved into the spawned task)
- Internal enum dispatch for quinn vs iroh vs test mock
- `Connection` does not expose quinn types in its public API
### BiStream trait
```rust
pub trait BiStream: AsyncRead + AsyncWrite + Send + Unpin {}
```
A convenience trait for client-side code, test mocks, and future transport
abstractions (WebTransport, raw TCP). Handlers that need a single stream
obtain one via `connection.accept_bi()` and treat the pair as a BiStream.
### SendStream and RecvStream
```rust
pub struct SendStream { /* wraps quinn::SendStream or iroh::SendStream or test mock */ }
pub struct RecvStream { /* wraps quinn::RecvStream or iroh::RecvStream or test mock */ }
impl AsyncWrite for SendStream { ... }
impl AsyncRead for RecvStream { ... }
```
Concrete wrapper types using internal enum dispatch to delegate to the
appropriate QUIC stream type (quinn or iroh) in production, and to test mocks
in tests.
### StreamError
```rust
pub enum StreamError {
    ConnectionClosed,
    StreamClosed,
    Timeout,
    Internal(io::Error),
}
```
Returned by `accept_bi()`, `open_bi()`, and stream read/write operations.
Maps from `quinn::ConnectionError` / `quinn::StreamError` and iroh equivalents.
### From<StreamError> for HandlerError
```rust
impl From<StreamError> for HandlerError {
    fn from(e: StreamError) -> Self {
        match e {
            StreamError::ConnectionClosed => HandlerError::ConnectionClosed,
            StreamError::StreamClosed => HandlerError::StreamError(
                io::Error::new(io::ErrorKind::ConnectionReset, "stream closed")),
            StreamError::Timeout => HandlerError::StreamError(
                io::Error::new(io::ErrorKind::TimedOut, "stream timed out")),
            StreamError::Internal(e) => HandlerError::StreamError(e),
        }
    }
}
```
This `From` impl is the canonical conversion — handlers use `?` on
`accept_bi()` / `open_bi()`.
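To show what the conversion buys a handler, here is a small self-contained sketch of `?` propagation. `fake_accept_bi` and `handle` are hypothetical stand-ins for `connection.accept_bi()` and a handler body; the enums and the `From` impl repeat the definitions above so the snippet compiles on its own.

```rust
use std::io;

#[derive(Debug)]
pub enum StreamError {
    ConnectionClosed,
    StreamClosed,
    Timeout,
    Internal(io::Error),
}

#[derive(Debug)]
pub enum HandlerError {
    ConnectionClosed,
    StreamError(io::Error),
    AuthRequired,
    Internal(Box<dyn std::error::Error + Send + Sync>),
}

impl From<StreamError> for HandlerError {
    fn from(e: StreamError) -> Self {
        match e {
            StreamError::ConnectionClosed => HandlerError::ConnectionClosed,
            StreamError::StreamClosed => HandlerError::StreamError(
                io::Error::new(io::ErrorKind::ConnectionReset, "stream closed")),
            StreamError::Timeout => HandlerError::StreamError(
                io::Error::new(io::ErrorKind::TimedOut, "stream timed out")),
            StreamError::Internal(e) => HandlerError::StreamError(e),
        }
    }
}

// Hypothetical stand-in for `connection.accept_bi()`: fails on request.
fn fake_accept_bi(fail_with: Option<StreamError>) -> Result<u8, StreamError> {
    match fail_with {
        Some(e) => Err(e),
        None => Ok(0),
    }
}

// Hypothetical handler body: `?` converts StreamError into HandlerError.
fn handle(fail_with: Option<StreamError>) -> Result<(), HandlerError> {
    let _stream = fake_accept_bi(fail_with)?;
    Ok(())
}
```

Because the mapping lives in one `From` impl, handlers never need to match on `StreamError` themselves.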
### Capabilities
```rust
#[derive(Clone, Zeroize, ZeroizeOnDrop)]
pub struct Capabilities {
    entries: HashMap<String, Secret<String>>,
}

impl Capabilities {
    pub fn new() -> Self;
    pub fn with_api_key(mut self, service: &str, key: String) -> Self;
    pub fn with_http_token(mut self, service: &str, token: String) -> Self;
    pub fn get(&self, service: &str) -> Option<&Secret<String>>;
}
```
Critical constraints (ADR-014, ADR-022, review #002 W2):
- **Non-serializable**: does NOT derive `Serialize`. Cannot appear in
`EventEnvelope` payloads even by accident.
- **Zeroized**: derives `Zeroize` and `ZeroizeOnDrop`. Secret material does
not linger in freed heap memory.
- **Clone + Send + Sync**: required by the composition model —
`OperationEnv::invoke()` clones the parent's capabilities for each child.
- **Immutable after construction**: no `set`, no `insert`, no `mut` accessors.
This is the guard from review #002 W2 — makes clone semantics genuinely
two-way (Arc-based vs deep-copy are behaviorally identical when neither
supports mutation).
- **Private fields**: the builder API (`new`, `with_*`) is the only
construction path.
Use `secrecy::Secret<String>` (from the `secrecy` crate) or a similar wrapper
for the secret values. Add `secrecy` to dependencies if needed, or implement
a simple `Secret` wrapper that zeroizes on drop and redacts in Debug.
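If the hand-rolled route is taken, the wrapper might look roughly like this. It is a std-only sketch under assumptions: `Secret::expose` is a hypothetical accessor name, the wipe is a best-effort volatile overwrite (the `zeroize` crate is more careful), and only `with_api_key` is shown because the spec does not say here how `with_http_token` differs.

```rust
use std::collections::HashMap;
use std::fmt;

/// Minimal secret wrapper: redacted in Debug, overwritten on drop.
#[derive(Clone)]
pub struct Secret(String);

impl Secret {
    /// Hypothetical accessor name; read-only by design.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        // Best-effort wipe: volatile writes so the stores are not elided.
        // Zero bytes are valid UTF-8, so the String stays well-formed.
        let bytes = unsafe { self.0.as_bytes_mut() };
        for b in bytes.iter_mut() {
            unsafe { std::ptr::write_volatile(b, 0) };
        }
    }
}

/// Private field plus consuming builder: no `set`, no `insert`, no `&mut` access.
#[derive(Clone, Debug, Default)]
pub struct Capabilities {
    entries: HashMap<String, Secret>,
}

impl Capabilities {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_api_key(mut self, service: &str, key: String) -> Self {
        self.entries.insert(service.to_owned(), Secret(key));
        self
    }

    pub fn get(&self, service: &str) -> Option<&Secret> {
        self.entries.get(service)
    }
}
```

Each clone owns its own copy of the secret bytes, and each copy is wiped independently when dropped, which is consistent with the immutability argument above.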
### IdentityAlreadySet error
```rust
#[derive(Debug, thiserror::Error)]
pub enum IdentityAlreadySet {
    #[error("connection identity already set")]
    AlreadySet,
}
```
Returned by `Connection::set_identity()` if called a second time.
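The write-once behavior falls out of `std::sync::OnceLock` almost directly. A minimal sketch, assuming a trimmed `Identity` (only `id`), a hypothetical `new_for_test` constructor, and a plain enum in place of the `thiserror` derive so the snippet stays std-only:

```rust
use std::sync::OnceLock;

// Trimmed stand-in: the real Identity also carries scopes and resources.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}

#[derive(Debug, PartialEq)]
pub enum IdentityAlreadySet {
    AlreadySet,
}

pub struct Connection {
    identity: OnceLock<Identity>,
}

impl Connection {
    /// Hypothetical constructor for the sketch; the real type wraps a QUIC connection.
    pub fn new_for_test() -> Self {
        Self { identity: OnceLock::new() }
    }

    pub fn set_identity(&self, identity: Identity) -> Result<(), IdentityAlreadySet> {
        // OnceLock::set hands the value back as Err if one is already stored.
        self.identity
            .set(identity)
            .map_err(|_| IdentityAlreadySet::AlreadySet)
    }

    pub fn identity(&self) -> Option<&Identity> {
        self.identity.get()
    }
}
```

Note that `set_identity` takes `&self`, so a handler can record the identity without needing mutable access to the connection.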
## Acceptance Criteria
- [ ] `ProtocolHandler` trait defined with `alpn()` and `handle()` (async)
- [ ] `HandlerError` enum with all 4 variants
- [ ] `Connection` struct with all methods (from_quinn/from_iroh feature-gated)
- [ ] `Connection::set_identity` write-once via `OnceLock`, returns `IdentityAlreadySet` on second call
- [ ] `BiStream` trait defined (AsyncRead + AsyncWrite + Send + Unpin)
- [ ] `SendStream` implements `AsyncWrite`
- [ ] `RecvStream` implements `AsyncRead`
- [ ] `StreamError` enum with all 4 variants
- [ ] `From<StreamError> for HandlerError` impl
- [ ] `Capabilities` struct with `new()`, `with_api_key()`, `with_http_token()`, `get()`
- [ ] `Capabilities` derives `Clone`, `Zeroize`, `ZeroizeOnDrop` — NOT `Serialize`
- [ ] `Capabilities` fields are private (builder API only, no mut accessors)
- [ ] `IdentityAlreadySet` error type
- [ ] Unit tests for Capabilities (build, get, clone, zeroize)
- [ ] Unit test: `Connection::set_identity` once succeeds, twice returns error
- [ ] `cargo test -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
## References
- docs/architecture/crates/core/core-types.md — all type definitions
- docs/architecture/decisions/002-protocol-handler-trait.md — ADR-002
- docs/architecture/decisions/007-bistream-type-definition.md — ADR-007
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md — ADR-014 (Capabilities)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022
## Notes
> This is the most cross-crate-boundary task in core. `Capabilities` is used
> heavily by alknet-call's operation registry and composition model — it must
> be right the first time. The immutability guard (no mut accessors) is the
> security control from review #002 W2 that makes clone semantics safe. The
> `Connection` type uses internal enum dispatch for quinn/iroh/test — do not
> expose quinn types in the public API.
## Summary
> To be filled on completion

116
tasks/core/crate-init.md Normal file

@@ -0,0 +1,116 @@
---
id: core/crate-init
name: Initialize alknet-core crate with Cargo.toml, dependencies, and module skeleton
status: pending
depends_on: []
scope: moderate
risk: low
impact: project
level: implementation
---
## Description
Initialize the `alknet-core` crate from scratch. The workspace currently has
only `alknet-vault`. This task creates the crate directory, `Cargo.toml`,
`lib.rs`, and the module skeleton that subsequent core tasks will fill in.
### Crate setup
Create `crates/alknet-core/` with:
- `Cargo.toml` — package metadata, dependencies, feature flags
- `src/lib.rs` — crate root with module declarations and re-exports
- Module skeleton files (empty or with `// TODO` markers) for:
  - `src/types.rs` — ProtocolHandler, HandlerError, Connection, BiStream, SendStream, RecvStream, StreamError, Capabilities
  - `src/auth.rs` — AuthContext, Identity, IdentityProvider, AuthToken, ConfigIdentityProvider
  - `src/config.rs` — StaticConfig, DynamicConfig, AuthPolicy, ApiKeyEntry, RateLimitConfig, ConfigReloadHandle, ConfigError, TlsIdentity
  - `src/endpoint.rs` — AlknetEndpoint, HandlerRegistry, EndpointError
### Dependencies
Per the architecture specs (overview.md, core/README.md, endpoint.md):
| Crate | Purpose |
|-------|---------|
| `tokio` 1 (full) | Async runtime, watch channel for shutdown |
| `quinn` | QUIC endpoint (feature-gated) |
| `iroh` | P2P relay-assisted endpoint (feature-gated) |
| `rustls` | TLS implementation |
| `rustls-pki-types` | TLS types (CertificateDer, PrivateKeyDer) |
| `serde` 1 | Serialization for config types |
| `serde_json` 1 | JSON for config, JSON Schema values |
| `toml` 0.8 | Config file format |
| `arc-swap` 1 | Atomic config swap for DynamicConfig |
| `async-trait` 0.1 | ProtocolHandler trait (async fn in trait) |
| `tracing` 0.1 | Structured logging |
| `thiserror` 2 | Error enums |
| `zeroize` 1 | Capabilities zeroization |
| `bytes` 1 | Byte buffer types for streams |
| `futures` | AsyncRead/AsyncWrite for BiStream trait |
### Feature flags
```toml
[features]
default = ["quinn"]
quinn = ["dep:quinn"]
iroh = ["dep:iroh"]
```
Both quinn and iroh are optional, both can be active simultaneously (ADR-010).
`quinn` is default-on for the common case; `iroh` is opt-in.
### Workspace Cargo.toml
Add `crates/alknet-core` to the workspace `members` list in the root
`Cargo.toml`.
### Module skeleton
```rust
// src/lib.rs
//! alknet-core: Core library for ALPN-based protocol dispatch.
pub mod types;
pub mod auth;
pub mod config;
pub mod endpoint;
// Re-exports (filled in by subsequent tasks)
```
Each module file gets a doc comment and `// TODO: implement` marker. The
subsequent tasks (core-types, config, auth, endpoint) fill these in.
## Acceptance Criteria
- [ ] `crates/alknet-core/Cargo.toml` exists with all dependencies and feature flags
- [ ] `crates/alknet-core/src/lib.rs` exists with module declarations
- [ ] Module skeleton files exist: `types.rs`, `auth.rs`, `config.rs`, `endpoint.rs`
- [ ] Root `Cargo.toml` `members` list includes `crates/alknet-core`
- [ ] `cargo check -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
- [ ] Dual licensing: `MIT OR Apache-2.0` (workspace-inherited)
## References
- docs/architecture/overview.md — crate graph, shared types
- docs/architecture/crates/core/README.md — crate index
- docs/architecture/crates/core/core-types.md — types to implement
- docs/architecture/crates/core/endpoint.md — endpoint, features (quinn + iroh)
- docs/architecture/crates/core/config.md — config types
- docs/architecture/crates/core/auth.md — auth types
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010 (feature-gating)
## Notes
> This is the foundational setup task for alknet-core. All subsequent core
> tasks depend on this one. The crate has no alknet dependencies (vault is
> standalone; core doesn't depend on vault). The feature flags for quinn/iroh
> are important — both are optional and can be active simultaneously.
## Summary
> To be filled on completion

249
tasks/core/endpoint.md Normal file

@@ -0,0 +1,249 @@
---
id: core/endpoint
name: Implement AlknetEndpoint, HandlerRegistry, accept loops (quinn + iroh), TLS identity, and graceful shutdown
status: pending
depends_on: [core/core-types, core/config, core/auth]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement the ALPN router and endpoint in `src/endpoint.rs`. This is the
integration point of alknet-core — it ties together the core types, config,
and auth into the central runtime that accepts connections and dispatches to
handlers by ALPN string.
### AlknetEndpoint
```rust
pub struct AlknetEndpoint {
    quinn: Option<quinn::Endpoint>,
    iroh: Option<iroh::Endpoint>,
    handlers: Arc<HandlerRegistry>,
    dynamic: Arc<ArcSwap<DynamicConfig>>,
    identity_provider: Arc<dyn IdentityProvider>,
    shutdown: watch::Receiver<bool>,
}
```
Manages one or more QUIC connection sources, each feeding into the same ALPN
router. Both quinn and iroh are optional (feature-gated), both can be active
simultaneously (ADR-010).
### HandlerRegistry
```rust
pub struct HandlerRegistry {
    handlers: HashMap<&'static [u8], Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    pub fn new() -> Self;
    pub fn register(&mut self, handler: Arc<dyn ProtocolHandler>);
    pub fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>>;
    pub fn alpn_strings(&self) -> Vec<Vec<u8>>;
}
```
- `register()`: insert a handler. Panics if ALPN already registered.
- `get()`: look up by ALPN string.
- `alpn_strings()`: all registered ALPNs — used to build TLS ServerConfig
(quinn) and ALPN list (iroh).
- Registration is **static at startup** (OQ-04, ADR-010). The CLI builds the
registry, inserts all handlers, passes to `AlknetEndpoint::new()`.
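The registry semantics listed above can be sketched without any transport code. This is an illustration, not the final implementation: the `ProtocolHandler` trait here is a synchronous stand-in with only `alpn()`, and `Echo` with its ALPN string is a hypothetical handler.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in: the real trait is async and also has `handle()`.
pub trait ProtocolHandler: Send + Sync + 'static {
    fn alpn(&self) -> &'static [u8];
}

#[derive(Default)]
pub struct HandlerRegistry {
    handlers: HashMap<&'static [u8], Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Panics if the handler's ALPN is already registered.
    pub fn register(&mut self, handler: Arc<dyn ProtocolHandler>) {
        let alpn = handler.alpn();
        assert!(
            !self.handlers.contains_key(alpn),
            "ALPN {:?} registered twice",
            String::from_utf8_lossy(alpn)
        );
        self.handlers.insert(alpn, handler);
    }

    pub fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>> {
        self.handlers.get(alpn)
    }

    /// Owned copies of every registered ALPN, in no particular order.
    pub fn alpn_strings(&self) -> Vec<Vec<u8>> {
        self.handlers.keys().map(|k| k.to_vec()).collect()
    }
}

// Hypothetical handler used only for illustration.
pub struct Echo;

impl ProtocolHandler for Echo {
    fn alpn(&self) -> &'static [u8] {
        b"alknet/echo"
    }
}
```

Checking for the duplicate before inserting means a failed registration leaves the original handler in place, which matters if the panic is ever caught in a test.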
### Accept loops
Each active connection source runs its own accept loop. Both dispatch through
the same `HandlerRegistry`.
**Quinn accept loop** (public QUIC+TLS):
```
loop {
    tokio::select! {
        incoming = quinn_endpoint.accept() => {
            let connection = incoming.await;
            match connection {
                Ok(conn) => dispatch(conn),
                Err(e) => { /* log TLS handshake failure, continue */ }
            }
        }
        _ = shutdown.changed() => break,
    }
}
```
**iroh accept loop** (P2P relay-assisted):
```
loop {
    tokio::select! {
        incoming = iroh_endpoint.accept() => {
            let accepting = incoming.accept();
            let alpn = accepting.alpn().await;
            match alpn {
                Ok(alpn) => dispatch(alpn, accepting),
                Err(e) => { /* log handshake failure, continue */ }
            }
        }
        _ = shutdown.changed() => break,
    }
}
```
Use `iroh::Endpoint` directly (not iroh's `Router`) because our HandlerRegistry
is shared between quinn and iroh, and our AuthContext construction differs per
source. See iroh's `protocol.rs` for the reference pattern.
### Dispatch function (shared)
```
fn dispatch(connection) {
    let alpn = connection.alpn();
    match handlers.get(alpn) {
        Some(handler) => {
            let auth = AuthContext::from_connection(&connection);
            let conn = Connection::from_quinn(connection); // or from_iroh
            tokio::spawn(async move {
                if let Err(e) = handler.handle(conn, &auth).await {
                    // log error, connection closes
                }
            });
        }
        None => connection.close(0u32, "no handler"),
    }
}
```
### AuthContext construction
The endpoint constructs `AuthContext` from the QUIC connection:
1. `alpn`: from `connection.alpn()` — always present
2. `remote_addr`: from `connection.remote_addr()` — may be None for iroh
3. `tls_client_fingerprint`: extracted from TLS session's client cert, if presented
4. `identity`: if fingerprint available, call `IdentityProvider::resolve_from_fingerprint()`.
If resolves, `identity = Some(resolved)`. If not, `identity = None`.
### TLS Identity
Three modes per `TlsIdentity` (OQ-12):
**RawKey (RFC 7250, default for P2P)**:
- Build a `rustls::ServerConfig` whose `ResolvesServerCert` impl returns `true` from `only_raw_public_keys()`
- `ResolvesServerCert` generates cert on-the-fly from the Ed25519 key
- ~100 lines — see `iroh/iroh/src/tls/resolver.rs` for the reference pattern
- Works natively with SSH auth and git; browsers do NOT support RFC 7250
**X509 (domain-hosted)**:
- Load cert/key from file paths
- Standard `rustls::ServerConfig`
- For browser/WebTransport clients and public domain services
**SelfSigned (dev only)**:
- Generate self-signed cert on startup
- External clients will not trust it
**ACME (future, not in this task)**:
- The reverse-proxy project demonstrates the complete ACME pattern. It will be
adapted as an additional `TlsIdentity` variant or `ResolvesServerCert` impl.
For now, X509 with manual certs is the domain path. Note this as a TODO.
The quinn endpoint's `rustls::ServerConfig` ALPN list is set from
`registry.alpn_strings()` at construction time. The iroh endpoint's ALPN list
is similarly derived. Both advertise the same set of ALPNs.
### Graceful shutdown
```rust
impl AlknetEndpoint {
    pub fn shutdown_sender(&self) -> watch::Sender<bool>;
    pub async fn shutdown(&self) -> Result<(), EndpointError>;
}
```
- `shutdown_sender()`: clone of shutdown channel sender. `send(true)` signals shutdown.
- `shutdown()`: signals all accept loops to stop, waits for in-flight connections
with drain timeout (default 2s from StaticConfig), then forcefully closes remaining.
- SIGTERM/SIGINT wired to shutdown channel by the CLI binary (not core's concern).
### EndpointError
```rust
pub enum EndpointError {
BindFailed(io::Error),
TlsConfig(io::Error),
HandlerNotFound(Vec<u8>),
}
```
Fatal errors that prevent the endpoint from starting or continuing.
### Accept loop error handling
- **TLS handshake failure**: log and continue. Client may have offered no
compatible ALPN, or cert may be untrusted.
- **Handler panic**: caught by tokio's task isolation. Connection dropped,
others continue.
- **Connection-level errors** (quinn/iroh ConnectionError): log and continue.
Accept loop keeps running.
### What the accept loops do NOT do
- No byte-peeking (ALPN handles protocol detection)
- No per-handler accept loops (ALPN unifies)
- No SSH-specific logic (accept loop is ALPN-agnostic)
### TCP is NOT an endpoint concern
Bare TCP (SSH over port 22) does not use QUIC or ALPN. TCP access is handled by
individual handlers (the SSH handler can listen on TCP independently). This is
handler-specific, not core endpoint.
## Acceptance Criteria
- [ ] `AlknetEndpoint` struct with quinn/iroh (both Option, both feature-gated)
- [ ] `HandlerRegistry` with new/register/get/alpn_strings
- [ ] `register()` panics on duplicate ALPN
- [ ] Quinn accept loop runs, dispatches by ALPN, respects shutdown
- [ ] iroh accept loop runs, dispatches by ALPN, respects shutdown
- [ ] Dispatch function spawns handler task via `tokio::spawn`
- [ ] AuthContext constructed from connection (alpn, remote_addr, fingerprint, identity)
- [ ] TLS RawKey mode: rustls ServerConfig with `only_raw_public_keys()`, on-the-fly cert
- [ ] TLS X509 mode: load cert/key from files, standard ServerConfig
- [ ] TLS SelfSigned mode: generate self-signed cert on startup
- [ ] ALPN list in TLS ServerConfig set from `registry.alpn_strings()`
- [ ] Graceful shutdown: signal accept loops to stop, drain timeout, force close
- [ ] `EndpointError` enum with all variants
- [ ] Accept loop errors logged, loop continues (no crash on handshake failure)
- [ ] Handler panics caught by tokio task isolation (connection dropped, others continue)
- [ ] No byte-peeking, no per-handler accept loops, no SSH-specific logic
- [ ] Unit test: HandlerRegistry register/get/alpn_strings
- [ ] Unit test: HandlerRegistry register panics on duplicate ALPN
- [ ] Integration test: endpoint with mock handler, verify dispatch by ALPN
- [ ] `cargo test -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
## References
- docs/architecture/crates/core/endpoint.md — full endpoint spec
- docs/architecture/decisions/001-alpn-protocol-dispatch.md — ADR-001
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010
- docs/architecture/decisions/006-alpn-convention-and-connection-model.md — ADR-006
- docs/architecture/decisions/007-bistream-type-definition.md — ADR-007
- iroh reference: `/workspace/iroh/iroh/src/protocol.rs` (accept loop pattern)
- iroh reference: `/workspace/iroh/iroh/src/tls/resolver.rs` (RFC 7250 raw key)
## Notes
> This is the integration point of alknet-core — it ties together types, config,
> and auth. The highest-risk task in core because it involves QUIC connection
> handling, TLS identity (3 modes), and graceful shutdown. The RFC 7250 raw key
> path is ~100 lines (iroh has a reference implementation). ACME is deferred —
> note as TODO, use X509 manual certs for the domain path for now. TCP is NOT
> an endpoint concern — it's handler-specific.
## Summary
> To be filled on completion

122
tasks/core/review-core.md Normal file

@@ -0,0 +1,122 @@
---
id: core/review-core
name: Review alknet-core implementation for spec conformance and pattern consistency
status: pending
depends_on: [core/endpoint]
scope: moderate
risk: low
impact: phase
level: review
---
## Description
Review the alknet-core implementation for spec conformance, pattern
consistency, and correctness before alknet-call (which depends on core)
begins implementation. This is the quality checkpoint at the end of the core
phase.
### Review Checklist
1. **Core types conformance** (core-types.md):
   - `ProtocolHandler` trait signature matches spec (alpn, handle)
   - `HandlerError` has all 4 variants (ConnectionClosed, StreamError, AuthRequired, Internal)
   - `Connection` has all methods, from_quinn/from_iroh feature-gated
   - `Connection::set_identity` is write-once via OnceLock
   - `BiStream` is a trait (AsyncRead + AsyncWrite + Send + Unpin)
   - `SendStream` implements AsyncWrite, `RecvStream` implements AsyncRead
   - `StreamError` has all 4 variants
   - `From<StreamError> for HandlerError` impl matches spec mapping table
   - `Capabilities` is non-serializable, zeroized, immutable, Clone+Send+Sync
   - `Capabilities` has builder API (new, with_api_key, with_http_token, get), private fields
2. **Config conformance** (config.md):
   - `StaticConfig` fields match (listen_addr, tls_identity, iroh_relay, drain_timeout)
   - `TlsIdentity` has X509, RawKey, SelfSigned
   - `DynamicConfig` has auth and rate_limits
   - `AuthPolicy` has authorized_fingerprints (HashSet<String>), api_keys (Vec<ApiKeyEntry>)
   - `ApiKeyEntry` has all 5 fields (prefix, hash, scopes, description, expires_at)
   - `ConfigReloadHandle` has reload() and dynamic()
   - No russh dependency (fingerprints as strings)
   - No removed fields (host_key, stealth, transport_mode, listeners)
3. **Auth conformance** (auth.md):
   - `AuthContext` has all 4 fields, derives Clone
   - `Identity` has id, scopes, resources
   - `AuthToken` has raw field
   - `IdentityProvider` trait with both methods
   - `ConfigIdentityProvider` reads from ArcSwap on every call
   - Fingerprint resolution looks up in authorized_fingerprints
   - Token resolution: alk_ prefix, hash match, expiry check
   - Two identity scopes documented (connection-level vs per-request)
4. **Endpoint conformance** (endpoint.md):
   - `AlknetEndpoint` has quinn/iroh (both Option, both feature-gated)
   - `HandlerRegistry` register/get/alpn_strings, panics on duplicate
   - Quinn accept loop: select on accept + shutdown, dispatch by ALPN
   - iroh accept loop: select on accept + shutdown, dispatch by ALPN
   - Dispatch spawns handler task via tokio::spawn
   - AuthContext constructed from connection (alpn, remote_addr, fingerprint, identity)
   - TLS RawKey: only_raw_public_keys(), on-the-fly cert from Ed25519
   - TLS X509: load from files
   - TLS SelfSigned: generate on startup
   - ALPN list in ServerConfig from registry.alpn_strings()
   - Graceful shutdown: drain timeout, force close
   - EndpointError has all 3 variants
   - No byte-peeking, no per-handler loops, no SSH-specific logic
5. **Pattern consistency**:
   - ArcSwap used consistently for DynamicConfig
   - Feature flags (quinn, iroh) gate transport code correctly
   - Error handling patterns consistent (thiserror, Result propagation)
   - No quinn/iroh types in public API (Connection wraps them)
6. **Security constraints**:
   - Capabilities non-serializable (no Serialize derive)
   - Capabilities zeroized (Zeroize, ZeroizeOnDrop)
   - Capabilities immutable (no mut accessors)
   - Config reload is privilege escalation (no unauthenticated reload endpoint)
   - Token entropy requirement documented
7. **Test coverage**:
   - Unit tests for Capabilities (build, get, clone, zeroize)
   - Unit tests for config types and reload
   - Unit tests for auth resolution (fingerprint, token, expiry)
   - Unit tests for HandlerRegistry
   - Integration test: endpoint dispatch by ALPN
## Acceptance Criteria
- [ ] All core types match core-types.md
- [ ] All config types match config.md
- [ ] All auth types match auth.md
- [ ] Endpoint matches endpoint.md (accept loops, TLS modes, shutdown)
- [ ] Capabilities security constraints satisfied (non-serializable, zeroized, immutable)
- [ ] No russh dependency in core
- [ ] No quinn/iroh types in public API
- [ ] ArcSwap pattern consistent
- [ ] Feature flags gate transport code correctly
- [ ] Test coverage adequate for all functionality
- [ ] `cargo fmt --check -p alknet-core` passes
- [ ] `cargo clippy -p alknet-core` passes with no warnings
- [ ] All tests pass
## References
- docs/architecture/crates/core/README.md
- docs/architecture/crates/core/core-types.md
- docs/architecture/crates/core/config.md
- docs/architecture/crates/core/auth.md
- docs/architecture/crates/core/endpoint.md
- docs/architecture/decisions/ (relevant ADRs: 001-011, 014, 015, 022)
## Notes
> This review verifies core is spec-conformant before alknet-call begins.
> alknet-call depends heavily on core types (ProtocolHandler, Connection,
> AuthContext, Capabilities, IdentityProvider) — any issues here propagate to
> call. If deviations are found, document and fix before proceeding.
## Summary
> To be filled on completion


@@ -0,0 +1,85 @@
---
id: vault/cache-zeroization-test
name: Verify and test that HashMap::clear() drops CachedKey values triggering zeroization
status: pending
depends_on: []
scope: single
risk: low
impact: isolated
level: implementation
---
## Description
Fix drift item #6: `KeyCache::clear()` removes entries and relies on
`CachedKey`'s `Drop` impl for zeroization. The spec says to verify that
`HashMap::clear()` actually drops the values (it does, but this is worth a
test). This task adds a test that proves zeroization happens on cache eviction
and clear.
### Background
`CachedKey` derives `Zeroize` and `ZeroizeOnDrop` (via the `DerivedKey` it
holds, which is `#[zeroize(drop)]`). When the cache evicts an entry (LRU or TTL)
or `clear()` is called, the `CachedKey` is dropped, which triggers
`ZeroizeOnDrop` — the private key bytes are zeroized before deallocation.
`HashMap::clear()` drops all values, which triggers their `Drop` impls. This
is standard Rust behavior, but the security-critical nature of key material
warrants an explicit test.
### What to add
A test in `cache.rs` (or `tests/`) that:
1. Inserts a `CachedKey` with a known private key into the cache
2. Verifies the key is present
3. Calls `clear()` (or evicts via LRU/TTL)
4. Verifies the `CachedKey` was dropped and zeroized
Testing zeroization directly is tricky because the memory is freed — you can't
easily inspect it after drop. A practical approach:
- **Option A**: Use a custom type with a `Drop` impl that sets a flag (e.g., an
`Arc<AtomicBool>`) when zeroized. Insert it into the cache, clear, verify the
flag is set. This tests the drop path, not the zeroize path directly, but
confirms `clear()` drops values.
- **Option B**: Test the LRU eviction path — fill the cache to `max_entries`,
insert one more, verify the LRU entry was evicted (dropped).
- **Option C**: Test that `lock()` calls `cache.clear()` and the cache is empty
afterward (integration test via `VaultServiceHandle`).
At minimum, implement Option B and C. Option A is a bonus if feasible without
over-engineering the test type.
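Option A can be sketched with the standard library alone. The block below is a minimal illustration, not the vault's test: `DropProbe` is a hypothetical stand-in for `CachedKey`, and the plain `fill(0)` only imitates what the `zeroize` crate does with volatile writes.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `CachedKey`: wipes its bytes and records that `Drop` ran.
struct DropProbe {
    secret: [u8; 32],
    dropped: Arc<AtomicBool>,
}

impl Drop for DropProbe {
    fn drop(&mut self) {
        // Imitates zeroization; real code would rely on `ZeroizeOnDrop`.
        self.secret.fill(0);
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Inserts one probe, clears the map, and reports whether the probe's `Drop` ran.
fn clear_drops_values() -> bool {
    let dropped = Arc::new(AtomicBool::new(false));
    let mut cache: HashMap<String, DropProbe> = HashMap::new();
    cache.insert(
        "m/74'/2'/0'/0'".to_string(),
        DropProbe { secret: [0xAB; 32], dropped: Arc::clone(&dropped) },
    );
    // Still alive while the map owns it.
    assert!(!dropped.load(Ordering::SeqCst));
    cache.clear();
    dropped.load(Ordering::SeqCst)
}
```

The flag proves that `clear()` runs `Drop` on every value. Whether the bytes are actually wiped remains the responsibility of the `ZeroizeOnDrop` impl, which this probe does not exercise.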
### Scope
This task touches `cache.rs` (test additions) and possibly `tests/`. It does
not depend on the irpc removal task (drift #4) because `cache.rs` is a separate
file. It can run in parallel with drift #4.
## Acceptance Criteria
- [ ] Test: LRU eviction drops the evicted `CachedKey` (cache exceeds `max_entries`, least recently used entry evicted)
- [ ] Test: `lock()` clears the cache (verify cache is empty after lock)
- [ ] Test: TTL expiry evicts entries (set short TTL, wait, verify entry gone)
- [ ] Test: `clear()` removes all entries (verify empty after clear)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #6
- docs/architecture/crates/vault/service.md — Cache section, Security Constraints
- docs/architecture/crates/vault/encryption.md — Security Constraints
## Notes
> `HashMap::clear()` does drop values, triggering their `Drop` impls. This is
> standard Rust behavior, but key material is security-critical enough to
> warrant an explicit test. This task touches only `cache.rs` and can run in
> parallel with the irpc removal task (drift #4).
## Summary
> To be filled on completion

@@ -0,0 +1,140 @@
---
id: vault/derivedkey-serialization
name: Implement always-redact DerivedKey serialization and reject redacted payloads on deserialize
status: pending
depends_on: [vault/irpc-removal]
scope: narrow
risk: medium
impact: component
level: implementation
---
## Description
Fix drift item #5: `DerivedKey` currently has dual serialization behavior — JSON
redacts the private key, but postcard (the binary format used by irpc) preserves
the raw bytes. ADR-025 dropped the postcard/remote path, so `DerivedKey` should
**always** redact on serialize and reject `"[REDACTED]"` on deserialize with an
explicit error.
### Current state
`protocol.rs` has `DerivedKey` with `#[derive(Serialize, Deserialize)]` (or
similar) that redacts the private key in JSON but preserves the raw bytes in
postcard. The postcard tests in the test suite verify the binary round-trip.
### Target state
Per `docs/architecture/crates/vault/protocol.md` → Serialization Redaction:
`DerivedKey` must **not** derive `Serialize` or `Deserialize` via `#[derive]`.
It needs custom `Serialize` and `Deserialize` impls:
**Custom Serialize** — always redacts `private_key`:
```rust
impl serde::Serialize for DerivedKey {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        // `SerializeStruct` lives in `serde::ser`, not at the crate root.
        use serde::ser::SerializeStruct;
        let mut s = serializer.serialize_struct("DerivedKey", 3)?;
        s.serialize_field("key_type", &self.key_type)?;
        // Never emit the real bytes, regardless of the output format.
        s.serialize_field("private_key", "[REDACTED]")?;
        s.serialize_field("public_key", &self.public_key)?;
        s.end()
    }
}
```
**Custom Deserialize** — rejects `"[REDACTED]"` with an explicit error:
```rust
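impl<'de> serde::Deserialize<'de> for DerivedKey {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        // Accept bytes or a string so that a redacted payload reaches the
        // explicit check below instead of failing with a generic type mismatch
        // (string vs. sequence). Untagged enums need a self-describing format
        // such as JSON.
        #[derive(serde::Deserialize)]
        #[serde(untagged)]
        enum PrivateKeyField {
            Bytes(Vec<u8>),
            Text(String),
        }
        #[derive(serde::Deserialize)]
        struct DerivedKeyHelper {
            key_type: KeyType,
            private_key: PrivateKeyField,
            public_key: Vec<u8>,
        }
        let helper = DerivedKeyHelper::deserialize(deserializer)?;
        let private_key = match helper.private_key {
            PrivateKeyField::Bytes(bytes) => bytes,
            PrivateKeyField::Text(text) if text == "[REDACTED]" => {
                return Err(serde::de::Error::custom(
                    "DerivedKey.private_key is \"[REDACTED]\": redacted payloads \
                     cannot be deserialized. JSON round-tripping a DerivedKey is \
                     not supported (the private key is gone).",
                ));
            }
            // Any other string is malformed; do not echo it into the error.
            PrivateKeyField::Text(_) => {
                return Err(serde::de::Error::custom(
                    "DerivedKey.private_key must be a byte sequence, not a string",
                ));
            }
        };
        Ok(DerivedKey {
            key_type: helper.key_type,
            private_key,
            public_key: helper.public_key,
        })
    }
}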
```
**Debug impl** — also redacts:
```rust
use std::fmt;

impl fmt::Debug for DerivedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DerivedKey")
            .field("key_type", &self.key_type)
            // Keep private key bytes out of logs and panic messages.
            .field("private_key", &"[REDACTED]")
            .field("public_key", &self.public_key)
            .finish()
    }
}
```
### Remove postcard tests
The postcard round-trip tests (which verified binary format preserved private
key bytes) are removed — ADR-025 dropped that path. The `postcard`
dev-dependency was removed in the irpc removal task (drift #4).
### Why custom impls instead of derives
A derived `Deserialize` would generate a default impl that conflicts with the
manual one, and would only fail incidentally (serde type mismatch: string vs
sequence), not with the explicit redaction-rejection error the spec requires.
The custom impl is required for the explicit error message.
### Scope
This task touches `protocol.rs` (the `DerivedKey` type, its serde impls, Debug
impl) and test files (remove postcard tests, add redaction tests). It depends on
the irpc removal task (drift #4) because both modify `protocol.rs`.
## Acceptance Criteria
- [ ] `DerivedKey` does not derive `Serialize` or `Deserialize` via `#[derive]`
- [ ] Custom `Serialize` impl always redacts `private_key` as `"[REDACTED]"`
- [ ] Custom `Deserialize` impl rejects `private_key == b"[REDACTED]"` with explicit error
- [ ] Custom `Debug` impl redacts `private_key` as `"[REDACTED]"`
- [ ] Postcard round-trip tests removed
- [ ] Unit test: JSON serialize produces `"[REDACTED]"` for `private_key`
- [ ] Unit test: JSON deserialize of a redacted payload returns an error (not a corrupted key)
- [ ] Unit test: `{:?}` on `DerivedKey` does not contain private key bytes
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #5
- docs/architecture/crates/vault/protocol.md — Serialization Redaction, Debug redaction
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves W8)
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md — ADR-014
## Notes
> The redaction is defense-in-depth for logging safety, not the primary control
> — the primary control is that `DerivedKey` never crosses the call protocol
> wire (ADR-014). ADR-025 dropped the postcard/remote path that previously
> preserved bytes in binary formats. The custom Deserialize impl is required
> because `#[derive(Deserialize)]` would conflict and not produce the explicit
> redaction-rejection error. Depends on irpc removal because both modify
> `protocol.rs`.
## Summary
> To be filled on completion

106
tasks/vault/irpc-removal.md Normal file
@@ -0,0 +1,106 @@
---
id: vault/irpc-removal
name: Remove irpc dependency and actor dispatch from vault, convert to direct method calls on VaultServiceHandle
status: pending
depends_on: []
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Remove the irpc-based actor dispatch from the vault crate and convert to direct
method calls on `VaultServiceHandle`. This is drift item #4 from the vault README
drift table and the foundational ADR-025 refactor — it restructures `service.rs`
and `protocol.rs` fundamentally, which is why most other vault drift tasks depend
on this one.
### What to remove
- `VaultProtocol` enum with `#[rpc_requests]` derive in `protocol.rs`
- `VaultServiceActor` in `service.rs`
- `Client<VaultProtocol>` usage
- `irpc` and `irpc-derive` dependencies from `Cargo.toml`
- `postcard` from dev-dependencies (was only needed for the irpc binary path)
- `tokio` dependency from `Cargo.toml` (the vault uses `std::sync::RwLock`, not
`tokio::sync::RwLock` — ADR-025)
- `VaultMessage` / `VaultProtocol` re-exports from `lib.rs`
### What to keep / change
- `VaultServiceHandle` stays — it becomes the sole API. It is already
`Arc<std::sync::RwLock<VaultServiceInner>>` with synchronous methods. The actor
path (`mpsc` channel + oneshot backchannels via irpc's `Service` trait) is
removed entirely.
- `VaultServiceError` drops `Serialize`/`Deserialize` derives (were needed for
irpc dispatch — ADR-025 removed that path). It becomes a plain `thiserror::Error`
enum.
- `DerivedKey` and `KeyType` stay in `protocol.rs`. With the `VaultProtocol`
enum removed, the file is effectively the types module, but the filename stays
`protocol.rs` for continuity.
- `lib.rs` re-exports are updated to remove `VaultMessage`, `VaultProtocol`,
`VaultServiceActor` and reflect the new public API per the vault README's
Public API section.
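The shape that remains after the actor path is removed can be sketched as follows. This is a simplified illustration under stated assumptions: the `seed` field, the `seed_len` method, and the single `Locked` variant are hypothetical, and the real error type is a `thiserror` enum rather than this plain one.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, PartialEq)]
enum VaultServiceError {
    Locked,
}

/// Hypothetical inner state; the real struct also holds the key cache.
struct VaultServiceInner {
    seed: Option<Vec<u8>>,
}

/// Cheap-to-clone handle; every clone shares the same locked state.
#[derive(Clone)]
struct VaultServiceHandle {
    inner: Arc<RwLock<VaultServiceInner>>,
}

impl VaultServiceHandle {
    fn new() -> Self {
        Self { inner: Arc::new(RwLock::new(VaultServiceInner { seed: None })) }
    }

    /// Synchronous state transition: no channel, no `.await`.
    fn unlock(&self, seed: Vec<u8>) {
        let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
        inner.seed = Some(seed);
    }

    fn lock(&self) {
        let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
        // A real vault would zeroize the seed before dropping it.
        inner.seed = None;
    }

    /// Illustrative read path: fails with `Locked` when no seed is loaded.
    fn seed_len(&self) -> Result<usize, VaultServiceError> {
        let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
        inner.seed.as_ref().map(Vec::len).ok_or(VaultServiceError::Locked)
    }
}
```

Because callers take the lock directly, there is no message enum to serialize and no runtime to spawn an actor on, which is why the `irpc` and `tokio` dependencies can go.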
### Public API after this task
Per `docs/architecture/crates/vault/README.md` → Public API:
```rust
pub use mnemonic::{Language, Mnemonic, Seed};
pub use derivation::{DerivationError, ExtendedPrivKey, PATHS};
pub use encryption::{EncryptedData, EncryptionError, EncryptionKey};
pub use encryption::CURRENT_KEY_VERSION;
pub use protocol::{DerivedKey, KeyType};
pub use service::{VaultServiceError, VaultServiceHandle};
pub use cache::CacheConfig;
```
### Cargo.toml changes
Remove from `[dependencies]`:
- `irpc = { workspace = true }`
- `irpc-derive = { workspace = true }`
- `tokio = { version = "1", features = ["sync", "rt", "macros"] }`
Remove from `[dev-dependencies]`:
- `postcard = { version = "1", features = ["alloc"] }`
The vault should have **zero** async runtime dependency after this task.
## Acceptance Criteria
- [ ] `VaultProtocol` enum and `#[rpc_requests]` derive removed from `protocol.rs`
- [ ] `VaultServiceActor` removed from `service.rs`
- [ ] `Client<VaultProtocol>` usage removed
- [ ] `irpc`, `irpc-derive`, `tokio` removed from `[dependencies]` in `Cargo.toml`
- [ ] `postcard` removed from `[dev-dependencies]` in `Cargo.toml`
- [ ] `VaultServiceError` no longer derives `Serialize`/`Deserialize`
- [ ] `lib.rs` re-exports match the Public API section of vault README (no `VaultMessage`, `VaultProtocol`, `VaultServiceActor`)
- [ ] `VaultServiceHandle` methods are all synchronous (no `async`, no `.await`)
- [ ] `cargo check` succeeds
- [ ] `cargo clippy` succeeds with no warnings
- [ ] `cargo test` succeeds (existing tests updated to remove irpc/postcard usage)
- [ ] No `tokio` dependency remains in the vault `Cargo.toml`
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #4, Public API section
- docs/architecture/crates/vault/service.md — Dispatch section, VaultServiceHandle
- docs/architecture/crates/vault/protocol.md — Local-Only by Construction
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025
## Notes
> This is the foundational vault refactor. It restructures `service.rs` and
> `protocol.rs` — most other vault drift tasks touch these same files and must
> follow this one to avoid merge conflicts. The `VaultServiceHandle` struct
> already uses `std::sync::RwLock` with synchronous methods; the actor path is
> the dead code to remove. After this task, the vault has no async runtime
> dependency and no RPC framework dependency — it is local-only by construction.
## Summary
> To be filled on completion

@@ -0,0 +1,127 @@
---
id: vault/key-versioning-rotation
name: Implement version-indexed encryption key paths, bump CURRENT_KEY_VERSION to 2, and add rotate method
status: pending
depends_on: [vault/irpc-removal]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Fix drift items #3, #9, and #10 as one coherent feature: the version-indexed
key rotation mechanism from ADR-021. These three drifts are tightly coupled —
`CURRENT_KEY_VERSION = 2` (drift #3), version-aware `encrypt`/`decrypt` via
`encryption_path_for_version` (drift #9), and the `rotate` method (drift #10)
form the complete key rotation feature. Splitting them would produce tasks that
don't compile independently.
### Drift #3: Bump CURRENT_KEY_VERSION
Current: `CURRENT_KEY_VERSION = 1` (but the key is HD-derived, and v1 is
reserved for the TypeScript PBKDF2 legacy per ADR-020).
Target: `CURRENT_KEY_VERSION = 2` (HD-derived, per ADR-020).
Version semantics:
- v1: TypeScript predecessor's PBKDF2-encrypted data — the vault **cannot**
decrypt it (different key derivation). Migration is a one-time re-encryption.
- v2: HD-derived at `m/74'/2'/0'/0'` (PATHS::ENCRYPTION) — current.
- v3+: `m/74'/2'/0'/1'`, `m/74'/2'/0'/2'`, etc. — future rotation versions.
### Drift #9: Version-aware encrypt/decrypt
Current: `encrypt`/`decrypt` always derive at `PATHS::ENCRYPTION` regardless of
the `key_version` parameter.
Target:
- `encrypt(plaintext, key_version)`: derive the encryption key at
`encryption_path_for_version(key_version)`, stamp the same `key_version` on
the resulting `EncryptedData`.
- `decrypt(encrypted)`: derive the key at
`encryption_path_for_version(encrypted.key_version)` — the blob carries its
own version, and each version maps to a distinct derivation path.
This requires:
1. `encryption_path_for_version(version: u32) -> Result<String, DerivationError>`
already exists in `derivation.rs` — verify it returns `InvalidPath` for
`version < 2` (v1 is TS legacy, v0 is meaningless).
2. `derive_encryption_key_for_version(version: u32) -> Result<DerivedKey, VaultServiceError>`
— a new method on `VaultServiceHandle` that maps version → path → derive.
Cached by path (same cache as `derive_encryption_key`).
3. `encrypt` and `decrypt` use `derive_encryption_key_for_version` instead of
deriving at the fixed `PATHS::ENCRYPTION` path.
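The version-to-path mapping described above (v2 at index 0, v3 at index 1, and so on) might look like the sketch below. The function already exists in `derivation.rs`; this body and the single-variant `DerivationError` are assumptions made for illustration, not the crate's source.

```rust
#[derive(Debug, PartialEq)]
enum DerivationError {
    InvalidPath(String),
}

/// Maps a key version to its hardened derivation path.
/// v0 is meaningless and v1 is the PBKDF2 legacy, so both are rejected.
fn encryption_path_for_version(version: u32) -> Result<String, DerivationError> {
    if version < 2 {
        return Err(DerivationError::InvalidPath(format!(
            "no HD derivation path for key version {version}"
        )));
    }
    // v2 -> index 0, v3 -> index 1, ...
    Ok(format!("m/74'/2'/0'/{}'", version - 2))
}
```

Keying the cache by the returned path means `derive_encryption_key` and `derive_encryption_key_for_version` hit the same entry for v2.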
### Drift #10: Implement rotate
Current: no `rotate` method exists.
Target:
```rust
pub fn rotate(&self, encrypted: &EncryptedData, to_version: u32) -> Result<EncryptedData, VaultServiceError>;
```
Decrypts with the old version's key (from `encrypted.key_version`), re-encrypts
with the new version's key (`to_version`), and returns the new `EncryptedData`,
which the caller writes back to storage in place of the old blob. No new
mnemonic is needed: the same seed produces all version keys via different
derivation paths (ADR-021).
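The control flow of `rotate` is just decrypt-then-encrypt. The sketch below shows only that flow: the XOR "cipher" and `toy_key_for_version` are placeholders that provide no security, the `String` errors stand in for `VaultServiceError`, and the free functions stand in for methods on `VaultServiceHandle`.

```rust
#[derive(Debug, Clone, PartialEq)]
struct EncryptedData {
    key_version: u32,
    ciphertext: Vec<u8>,
}

/// Placeholder for per-version key derivation. NOT cryptography.
fn toy_key_for_version(version: u32) -> Result<u8, String> {
    if version < 2 {
        return Err(format!("invalid key version {version}"));
    }
    Ok(0x5A ^ (version as u8))
}

fn xor(data: &[u8], key: u8) -> Vec<u8> {
    data.iter().map(|b| b ^ key).collect()
}

fn encrypt(plaintext: &[u8], key_version: u32) -> Result<EncryptedData, String> {
    let key = toy_key_for_version(key_version)?;
    // Stamp the version so decrypt can find the right key later.
    Ok(EncryptedData { key_version, ciphertext: xor(plaintext, key) })
}

fn decrypt(encrypted: &EncryptedData) -> Result<Vec<u8>, String> {
    // The blob carries its own version.
    let key = toy_key_for_version(encrypted.key_version)?;
    Ok(xor(&encrypted.ciphertext, key))
}

/// Decrypt under the blob's version, re-encrypt under `to_version`.
fn rotate(encrypted: &EncryptedData, to_version: u32) -> Result<EncryptedData, String> {
    let plaintext = decrypt(encrypted)?;
    encrypt(&plaintext, to_version)
}
```

The input blob is borrowed and left untouched, so a partially completed rotation leaves old blobs decryptable. A real implementation would also zeroize the intermediate plaintext.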
### Implementation notes
- `derive_encryption_key(path)` (the path-based API) remains as-is for deriving
at arbitrary paths. `derive_encryption_key_for_version(version)` is the
version-aware API used by `encrypt`/`decrypt`. Both share the same cache
(keyed by derivation path).
- `encrypt` and `decrypt` extract the `EncryptionKey` from the `DerivedKey` via
`EncryptionKey::from_derived_bytes` (see encryption.md).
- `encryption_path_for_version` returns `InvalidPath` for `version < 2`.
`derive_encryption_key_for_version` propagates this as
`VaultServiceError::InvalidPath`.
### Scope
This task touches `encryption.rs` (CURRENT_KEY_VERSION), `service.rs` (encrypt,
decrypt, rotate, derive_encryption_key_for_version), and possibly `derivation.rs`
(verify `encryption_path_for_version`). It depends on the irpc removal task
(drift #4) because both modify `service.rs`.
## Acceptance Criteria
- [ ] `CURRENT_KEY_VERSION` is `2` in `encryption.rs`
- [ ] `derive_encryption_key_for_version(version)` method added to `VaultServiceHandle`
- [ ] `derive_encryption_key_for_version` returns `InvalidPath` for `version < 2`
- [ ] `encrypt(plaintext, key_version)` derives at `encryption_path_for_version(key_version)`
- [ ] `encrypt` stamps the passed `key_version` on the resulting `EncryptedData`
- [ ] `decrypt(encrypted)` derives at `encryption_path_for_version(encrypted.key_version)`
- [ ] `rotate(encrypted, to_version)` method implemented: decrypt old, re-encrypt new
- [ ] `rotate` returns `EncryptedData` with `key_version = to_version`
- [ ] Unit test: encrypt at v2, decrypt at v2 — round-trip succeeds
- [ ] Unit test: encrypt at v2, rotate to v3, decrypt at v3 — round-trip succeeds
- [ ] Unit test: decrypt v2 blob after rotation — old key still derivable (partial rotation safe)
- [ ] Unit test: `derive_encryption_key_for_version(1)` returns `InvalidPath`
- [ ] Unit test: `derive_encryption_key_for_version(0)` returns `InvalidPath`
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table items #3, #9, #10
- docs/architecture/crates/vault/encryption.md — Key Versioning, Rotation, EncryptionKey
- docs/architecture/crates/vault/service.md — encrypt, decrypt, rotate, derive_encryption_key_for_version
- docs/architecture/crates/vault/mnemonic-derivation.md — encryption_path_for_version, PATHS
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md — ADR-020
- docs/architecture/decisions/021-key-rotation-via-version-indexed-paths.md — ADR-021
## Notes
> These three drifts are one feature: version-indexed key rotation (ADR-021).
> Splitting them would produce tasks that don't compile independently —
> bumping the version without version-aware encrypt/decrypt would make v2
> blobs undecryptable, and rotate without version-aware encrypt/decrypt has no
> keys to work with. Depends on irpc removal because both modify `service.rs`.
## Summary
> To be filled on completion

@@ -0,0 +1,83 @@
---
id: vault/osrng-iv-generation
name: Replace rand::random() IV generation with OsRng in AES-GCM encryption
status: pending
depends_on: []
scope: single
risk: medium
impact: isolated
level: implementation
---
## Description
Fix drift item #1: the AES-256-GCM IV (nonce) generation in `encryption.rs`
currently uses `rand::random()`, which uses the thread-local RNG and may not be a
CSPRNG on all platforms. Replace with `OsRng` (or equivalent CSPRNG).
This is a security-critical fix. IV reuse under the same AES-GCM key is
catastrophic — it breaks authenticity and creates a two-time-pad on the
plaintext. `OsRng` reads from the operating system's entropy source and is the
correct choice for cryptographic nonces.
### Current state
`encryption.rs` line ~133: IV generation uses `rand::random()` to produce the
12-byte GCM nonce.
### Target state
Use `rand::rngs::OsRng` (from the `rand` crate, which is already a dependency)
to generate the 12-byte IV. The `aes-gcm` crate's `Aes256Gcm` encrypt path takes
a `Nonce` — construct it from `OsRng`-generated bytes.
```rust
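use aes_gcm::Nonce; // assumption: the same `aes-gcm` nonce type the encrypt path already uses
use rand::rngs::OsRng;
use rand::RngCore; // brings `fill_bytes` into scope (rand 0.8 API)

// Fresh 96-bit nonce for every encrypt() call, read from the OS entropy source.
let mut iv_bytes = [0u8; 12];
OsRng.fill_bytes(&mut iv_bytes);
let nonce = Nonce::from_slice(&iv_bytes);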
```
The IV is generated fresh for each `encrypt()` call. The salt (32 bytes, unused
in v2 for key derivation but kept for wire-format compat) should also use `OsRng`
for consistency — it's stored in the `EncryptedData` blob and doesn't need to be
deterministic.
### Scope
This task touches only `encryption.rs`. It does not depend on the irpc removal
(drift #4) because `encryption.rs` is a separate file from `service.rs` /
`protocol.rs`. It can run in parallel with drift #4.
## Acceptance Criteria
- [ ] `encryption::encrypt()` uses `OsRng` for IV generation, not `rand::random()`
- [ ] Salt generation uses `OsRng` (or equivalent CSPRNG)
- [ ] No `rand::random()` calls remain in `encryption.rs`
- [ ] IV is 12 bytes (standard GCM nonce size)
- [ ] Salt is 32 bytes (wire-format compat, unused in key derivation)
- [ ] Unit test: verify IV is fresh on each encrypt call (encrypt twice, different IVs)
- [ ] Unit test: verify decrypt round-trip still works after the change
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #1
- docs/architecture/crates/vault/encryption.md — Security Constraints: OsRng for IVs
- docs/architecture/crates/vault/service.md — Security Constraints: OsRng for IVs
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md — ADR-020
## Notes
> This is a security-critical fix. IV reuse under the same AES-GCM key breaks
> authenticity and creates a two-time-pad on the plaintext. `rand::random()`
> uses the thread-local RNG which may not be a CSPRNG on all platforms; `OsRng`
> reads from the operating system's entropy source. This task touches only
> `encryption.rs` and can run in parallel with the irpc removal task (drift #4).
## Summary
> To be filled on completion

@@ -0,0 +1,86 @@
---
id: vault/poisoned-lock-recovery
name: Replace unwrap() on RwLock acquisition with poisoned-lock recovery via unwrap_or_else
status: pending
depends_on: [vault/irpc-removal]
scope: narrow
risk: low
impact: component
level: implementation
---
## Description
Fix drift item #2: `VaultServiceHandle` methods use `unwrap()` on every
`RwLock` acquisition (read and write locks). A poisoned lock (caused by a panic
while the lock was held) would brick the vault for all subsequent operations.
Replace with `unwrap_or_else(|e| e.into_inner())` to recover the inner data from
a poisoned lock, or explicit error propagation where appropriate.
### Current state
`service.rs` uses `.unwrap()` on `RwLock` read and write acquisitions at
approximately lines 142, 161, 182, 191, 196, 227, 264, 307, 340, 367 (line
numbers may shift after the irpc removal task — match by pattern: every
`.read().unwrap()` and `.write().unwrap()` call in `VaultServiceHandle` method
bodies).
### Target state
For read locks:
```rust
let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
```
For write locks:
```rust
let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
```
The rationale: a poisoned lock means a panic occurred while the lock was held.
The data may be in an inconsistent state, but bricking the vault (panicking on
every subsequent call) is worse than attempting to continue. The vault's
operations are idempotent reads (derive) and state transitions (lock/unlock) —
recovering the inner data and continuing is the pragmatic choice. If the data
is truly corrupted, the next operation will fail with a normal error, not a
panic.
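A standalone sketch of the recovery pattern, and of how a test can poison a lock on purpose, is shown below. The `u32` payload and the function name are illustrative only; the vault's lock wraps `VaultServiceInner`.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Poisons a lock by panicking while holding the write guard, then reads
/// through the poison. Returns (was_poisoned, recovered_value).
fn poison_then_recover() -> (bool, u32) {
    let lock = Arc::new(RwLock::new(7u32));

    let writer = Arc::clone(&lock);
    // The panic unwinds with the write guard alive, which marks the lock poisoned.
    let _ = thread::spawn(move || {
        let _guard = writer.write().unwrap();
        panic!("simulated panic while holding the write lock");
    })
    .join();

    let was_poisoned = lock.is_poisoned();
    // `PoisonError::into_inner` hands back the guard instead of panicking again.
    let value = *lock.read().unwrap_or_else(|e| e.into_inner());
    (was_poisoned, value)
}
```

The spawned thread prints its panic message to stderr, which is expected. The same poison-then-call shape fits the acceptance test for this task.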
### No unwrap() or expect() outside tests
This is a general constraint for the vault: no `unwrap()` or `expect()` outside
test code. After fixing the RwLock acquisitions, audit the rest of `service.rs`
for any remaining `unwrap()`/`expect()` calls and replace them with proper error
propagation (`?` operator, explicit `Result` returns, or
`unwrap_or_else(|e| e.into_inner())` for lock recovery).
### Scope
This task touches `service.rs` only. It depends on the irpc removal task (drift
#4) because that task restructures `service.rs` — doing this first would cause
merge conflicts.
## Acceptance Criteria
- [ ] All `.read().unwrap()` calls in `VaultServiceHandle` methods replaced with `.read().unwrap_or_else(|e| e.into_inner())`
- [ ] All `.write().unwrap()` calls in `VaultServiceHandle` methods replaced with `.write().unwrap_or_else(|e| e.into_inner())`
- [ ] No `unwrap()` or `expect()` calls remain in `service.rs` outside of test code
- [ ] Unit test: vault remains usable after a simulated panic (poison the lock, verify next call recovers)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #2
- docs/architecture/crates/vault/service.md — Security Constraints: No unwrap() outside tests
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025
## Notes
> A panic in one vault operation must not brick the vault for all other
> operations. The poisoned-lock recovery via `unwrap_or_else(|e| e.into_inner())`
> is the standard Rust pattern for this. This task depends on the irpc removal
> task because both modify `service.rs` heavily.
## Summary
> To be filled on completion

@@ -0,0 +1,69 @@
---
id: vault/remove-password-derivation
name: Remove derive_password and site_password_path methods (password-manager pattern not relevant)
status: pending
depends_on: [vault/irpc-removal]
scope: single
risk: trivial
impact: isolated
level: implementation
---
## Description
Fix drift item #7: the vault currently has `derive_password`,
`derive_password_string`, and `site_password_path` methods. These implement a
password-manager pattern (deriving site-specific passwords from the seed) that
is not relevant to an RPC system's vault. Remove them entirely per ADR-025
(resolves review #002 C9).
### What to remove
- `derive_password` method from `VaultServiceHandle` (in `service.rs`)
- `derive_password_string` method from `VaultServiceHandle` (in `service.rs`)
- `site_password_path` function (in `mnemonic-derivation.rs` or `derivation.rs`,
wherever it's defined)
- Any associated path constants for password derivation
- Any tests for these methods
- Any references in `lib.rs` re-exports
### Why
The vault's purpose in alknet is to derive cryptographic keys (Ed25519 for
identity, AES-256-GCM for encryption) and encrypt/decrypt external credentials.
Site-specific password derivation is a password-manager feature that doesn't
belong in a networking toolkit's vault. Keeping it expands the attack surface
and API surface for no benefit.
### Scope
This task touches `service.rs` and possibly `derivation.rs` /
`mnemonic-derivation.rs`. It depends on the irpc removal task (drift #4) because
both modify `service.rs`.
## Acceptance Criteria
- [ ] `derive_password` method removed from `VaultServiceHandle`
- [ ] `derive_password_string` method removed from `VaultServiceHandle`
- [ ] `site_password_path` function removed
- [ ] Any password-derivation path constants removed
- [ ] Tests for password derivation removed
- [ ] No references to password derivation remain in `lib.rs` re-exports
- [ ] `cargo check` succeeds (no dangling references)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #7
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves C9)
## Notes
> Straightforward removal. The password-manager pattern was inherited from the
> POC and is not relevant to alknet's vault use case. Depends on irpc removal
> because both modify `service.rs`.
## Summary
> To be filled on completion

@@ -0,0 +1,112 @@
---
id: vault/review-vault-sync
name: Review vault implementation against specs after all drift fixes
status: pending
depends_on: [vault/irpc-removal, vault/osrng-iv-generation, vault/poisoned-lock-recovery, vault/remove-password-derivation, vault/unlock-new-zeroizing-return, vault/key-versioning-rotation, vault/derivedkey-serialization, vault/cache-zeroization-test]
scope: moderate
risk: low
impact: phase
level: review
---
## Description
Review the vault crate implementation against the architecture specs after all
drift fixes are complete. This is the quality checkpoint before the spec-sync
task — verify that the implementation matches the specs and that no drift
items were missed or incompletely fixed.
### Review Checklist
1. **Drift table verification** — every item in the vault README's Known Source
Drift table is resolved:
- #1: OsRng for IVs (encryption.rs)
- #2: No unwrap() on RwLock (service.rs)
- #3: CURRENT_KEY_VERSION = 2 (encryption.rs)
- #4: irpc removed, direct method calls (protocol.rs, service.rs, Cargo.toml)
- #5: DerivedKey always-redact serialization (protocol.rs)
- #6: Cache zeroization tested (cache.rs)
- #7: derive_password removed (service.rs, derivation)
- #8: unlock_new returns Zeroizing<String> (service.rs)
- #9: encrypt/decrypt version-aware (service.rs)
- #10: rotate implemented (service.rs)
2. **Spec conformance** — implementation matches the spec docs:
- `VaultServiceHandle` API matches service.md (all methods, signatures, semantics)
- `DerivedKey` / `KeyType` match protocol.md (serialization, redaction, move-only)
- `EncryptedData` / `EncryptionKey` match encryption.md (fields, key versioning)
- `Mnemonic` / `Seed` / `ExtendedPrivKey` match mnemonic-derivation.md
- `KeyCache` / `CachedKey` / `CacheConfig` match service.md Cache section
- PATHS constants match mnemonic-derivation.md (IDENTITY, DEVICE_PREFIX, SSH_HOST, ENCRYPTION, ETHEREUM)
- `encryption_path_for_version` matches mnemonic-derivation.md (returns InvalidPath for version < 2)
3. **Security constraints** (from service.md, encryption.md, README.md):
- OsRng for IVs and salt (no `rand::random()`)
- Zeroized drop on Seed, Mnemonic, ExtendedPrivKey, EncryptionKey, CachedKey, DerivedKey
- No `unwrap()` or `expect()` outside tests
- DerivedKey is move-only (no Clone)
- DerivedKey Debug impl redacts private key
- Cache eviction zeroizes (tested)
- No tokio dependency (local-only, std::sync::RwLock)
4. **Public API**: `lib.rs` re-exports match the vault README's Public API section:
- `Mnemonic`, `Seed`, `Language` from mnemonic
- `DerivationError`, `ExtendedPrivKey`, `PATHS` from derivation
- `EncryptedData`, `EncryptionError`, `EncryptionKey`, `CURRENT_KEY_VERSION` from encryption
- `DerivedKey`, `KeyType` from protocol
- `VaultServiceError`, `VaultServiceHandle` from service
- `CacheConfig` from cache
- No `VaultMessage`, `VaultProtocol`, `VaultServiceActor` (removed)
5. **Test coverage**:
- Derivation test vectors (BIP39 "abandon...about" vector)
- Encryption round-trip tests
- Service lifecycle tests (unlock, lock, derive, encrypt, decrypt, rotate)
- Cache tests (LRU, TTL, clear, zeroization)
- Serialization redaction tests (JSON redact, reject redacted deserialize)
6. **Code quality**:
- `cargo fmt --check` passes
- `cargo clippy` passes with no warnings
- No dead code (removed irpc/actor/password paths fully gone)
## Acceptance Criteria
- [ ] All 10 drift items verified resolved
- [ ] VaultServiceHandle API matches service.md
- [ ] DerivedKey / KeyType match protocol.md
- [ ] EncryptedData / EncryptionKey match encryption.md
- [ ] Mnemonic / Seed / ExtendedPrivKey match mnemonic-derivation.md
- [ ] KeyCache / CachedKey / CacheConfig match service.md
- [ ] PATHS constants match mnemonic-derivation.md
- [ ] All security constraints satisfied (OsRng, zeroize, no unwrap, move-only, redaction)
- [ ] Public API (lib.rs re-exports) matches vault README
- [ ] Test coverage adequate for all functionality
- [ ] `cargo fmt --check` passes
- [ ] `cargo clippy` passes with no warnings
- [ ] All tests pass
- [ ] No dead code from removed features (irpc, actor, password derivation)
## References
- docs/architecture/crates/vault/README.md — drift table, public API, security constraints
- docs/architecture/crates/vault/service.md
- docs/architecture/crates/vault/encryption.md
- docs/architecture/crates/vault/protocol.md
- docs/architecture/crates/vault/mnemonic-derivation.md
- docs/architecture/decisions/018-vault-standalone-crate.md
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md
- docs/architecture/decisions/021-key-rotation-via-version-indexed-paths.md
- docs/architecture/decisions/025-vault-local-only-dispatch.md
- docs/architecture/decisions/026-vault-key-model-hd-derivation.md
## Notes
> This review verifies the vault is spec-conformant after all drift fixes. If
> deviations are found, document them and create fix tasks before proceeding
> to the spec-sync task. This is the last checkpoint before the vault docs are
> updated to remove the drift table and bump status.
## Summary
> To be filled on completion

@@ -0,0 +1,107 @@
---
id: vault/spec-sync-remove-drift
name: Update vault specs to remove drift table and security-constraint drift prose, bump doc status
status: pending
depends_on: [vault/review-vault-sync]
scope: narrow
risk: low
impact: component
level: implementation
---
## Description
After the vault review confirms all drift is resolved, update the vault
architecture docs to remove the drift tracking artifacts and reflect the
completed state. The drift table and the "known drift" prose in the security
constraints sections were tracking tools during the spec-to-implementation
sync — now that the sync is complete, they should be cleaned up.
### What to update
1. **vault/README.md**:
- Remove the "Known Source Drift" section (the entire table and its intro
paragraph). The drift is resolved; the table is no longer needed.
- Remove the "Security Constraints" drift prose — the items that said
"The current source uses `rand::random()` — this is a known drift" etc.
Keep the constraint statements themselves (OsRng for IVs, zeroized drop,
no unwrap, etc.) — those are permanent implementation requirements. Remove
only the "current source uses X, this is a known drift" sentences.
- Bump `status: draft` → `status: stable` in the frontmatter (per the
Document Lifecycle in the architecture README: stable = implementation
complete and verified).
2. **vault/encryption.md**:
- In Security Constraints, remove the "The current source uses
`rand::random()` for IV generation (`encryption.rs` line 133) — this is a
known drift from the spec and must be corrected during implementation
sync." sentence. Keep the "OsRng for IVs" constraint.
- In Key Versioning, remove the "The current source uses
`CURRENT_KEY_VERSION = 1` with HD derivation and does not implement
version-indexed paths or `rotate`. These are drift items to be corrected
during implementation sync." paragraph.
- Bump `status: draft` → `status: stable`.
3. **vault/service.md**:
- In Security Constraints, remove the drift prose about `rand::random()`,
`unwrap()` on RwLock, and `KeyCache::clear()` verification. Keep the
constraint statements.
- Bump `status: draft` → `status: stable`.
4. **vault/protocol.md**:
- Remove the "to be updated per ADR-025 — remove `VaultProtocol` enum and
irpc usage" note in References.
- Remove the "postcard tests to be removed" note in References.
- Bump `status: draft` → `status: stable`.
5. **vault/mnemonic-derivation.md**:
- Bump `status: draft` → `status: stable` (no drift prose to remove here,
but the doc should reflect stable status).
6. **architecture/README.md**:
- Update the vault crate doc status entries in the Architecture Documents
table from `draft` to `stable`.
- Update the Current State paragraph to reflect vault implementation is
complete (remove "pending ADR-025/026 refactor" language).
### What NOT to change
- Do not remove the Security Constraints sections themselves — they are
permanent implementation requirements, not drift tracking.
- Do not change the ADRs — they record decisions, not implementation status.
- Do not remove the Public API section — it's a living reference.
### Scope
This task touches only documentation files — no source code changes. It
depends on the review task (which depends on all drift fixes).
## Acceptance Criteria
- [ ] "Known Source Drift" table removed from vault/README.md
- [ ] Drift prose removed from Security Constraints sections (constraint statements kept)
- [ ] All vault doc frontmatter bumped from `status: draft` to `status: stable`
- [ ] architecture/README.md vault doc statuses updated to `stable`
- [ ] architecture/README.md Current State updated (no "pending refactor" language)
- [ ] No drift-tracking language remains anywhere in vault docs
- [ ] Security constraint statements (OsRng, zeroize, no unwrap, etc.) preserved
- [ ] Public API section preserved in vault/README.md
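The status and drift-language criteria above could be checked mechanically. The following is a minimal std-only sketch, not part of the spec: `DRIFT_MARKERS`, `drift_lines`, and `frontmatter_status` are hypothetical names, and the marker list is an assumption based on the phrases quoted in this task. Adjust it to the wording actually found in the vault docs.

```rust
/// Phrases that identified drift-tracking prose. This list is an assumption
/// taken from the wording quoted in this task, not from the docs themselves.
const DRIFT_MARKERS: [&str; 3] = ["known drift", "known source drift", "drift items"];

/// Returns the 1-based line numbers that contain any drift marker
/// (case-insensitive match).
fn drift_lines(doc: &str) -> Vec<usize> {
    doc.lines()
        .enumerate()
        .filter(|(_, line)| {
            let lower = line.to_lowercase();
            DRIFT_MARKERS.iter().any(|marker| lower.contains(marker))
        })
        .map(|(index, _)| index + 1)
        .collect()
}

/// Extracts the `status:` value from a leading `---` frontmatter block.
/// Returns `None` if the document has no frontmatter or no status field.
fn frontmatter_status(doc: &str) -> Option<&str> {
    let mut lines = doc.lines();
    if lines.next()?.trim() != "---" {
        return None;
    }
    for line in lines {
        let line = line.trim();
        if line == "---" {
            break; // frontmatter ended without a status field
        }
        if let Some(value) = line.strip_prefix("status:") {
            return Some(value.trim());
        }
    }
    None
}
```

A reviewer could read each vault doc into a string and confirm that `frontmatter_status` reports `stable` and `drift_lines` returns an empty list. A marker hit is only a prompt for manual inspection, since permanent constraint statements must be kept.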
## References
- docs/architecture/crates/vault/README.md — Known Source Drift, Security Constraints, Public API
- docs/architecture/crates/vault/encryption.md — Security Constraints, Key Versioning
- docs/architecture/crates/vault/service.md — Security Constraints
- docs/architecture/crates/vault/protocol.md — References
- docs/architecture/README.md — Document Lifecycle, Architecture Documents table, Current State
## Notes
> This is the doc cleanup that closes out the vault phase. The drift table and
> "known drift" prose were tracking tools during spec-to-implementation sync;
> now that the sync is complete, they're noise. Keep the permanent constraint
> statements — they guide future implementation agents who touch the vault.
## Summary
> To be filled on completion


@@ -0,0 +1,79 @@
---
id: vault/unlock-new-zeroizing-return
name: Change unlock_new return type from String to Zeroizing<String>
status: pending
depends_on: [vault/irpc-removal]
scope: single
risk: low
impact: isolated
level: implementation
---
## Description
Fix drift item #8: `unlock_new` currently returns `String`, which is not
zeroized on drop. The mnemonic phrase is the root of trust — it must not linger
in freed heap memory. Change the return type to `Zeroizing<String>` (from the
`zeroize` crate, already a dependency).
### Current state
```rust
pub fn unlock_new(&self, word_count: usize) -> Result<String, VaultServiceError>;
```
### Target state
```rust
pub fn unlock_new(&self, word_count: usize) -> Result<Zeroizing<String>, VaultServiceError>;
```
Per `docs/architecture/crates/vault/service.md` → unlock_new:
> The returned phrase is the root of trust — it is heap-allocated and zeroized
> on drop, so it does not linger in freed memory. The caller should extract the
> phrase for secure storage (write down, display to user) and let the
> `Zeroizing<String>` drop when done. Do not clone the returned value or store
> it in a non-zeroizing container.
### Caller adaptation
The assembly layer (CLI binary, not yet implemented) will call `unlock_new` and
extract the phrase. The `Zeroizing<String>` wrapper derefs to `String`, so
`&*result` or `result.as_str()` works for reading. The caller must not clone the
inner `String` into a non-zeroizing container.
Existing tests that call `unlock_new` need updating to handle the new return
type — use `&*phrase` or `phrase.as_str()` to read the string.
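The read-without-clone pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the vault implementation: `ZeroizingString` is a hypothetical std-only stand-in for `zeroize::Zeroizing<String>` so the sketch compiles without external crates, and `generate_phrase_stub` and `word_count_of` are made-up helpers. Real code should use the `zeroize` crate's wrapper rather than this stand-in.

```rust
use std::ops::Deref;

/// Hypothetical std-only stand-in for `zeroize::Zeroizing<String>`.
struct ZeroizingString(String);

impl ZeroizingString {
    /// Overwrites every byte of the buffer with zero, keeping its length.
    fn wipe(&mut self) {
        // SAFETY: a buffer of NUL bytes is still valid UTF-8.
        let bytes = unsafe { self.0.as_mut_vec() };
        for byte in bytes.iter_mut() {
            // Volatile write so the compiler does not optimize the store away.
            unsafe { std::ptr::write_volatile(byte as *mut u8, 0) };
        }
    }
}

impl Deref for ZeroizingString {
    type Target = String;
    fn deref(&self) -> &String {
        &self.0
    }
}

impl Drop for ZeroizingString {
    fn drop(&mut self) {
        self.wipe();
    }
}

/// Hypothetical stand-in for `unlock_new`; returns a fixed, non-secret phrase.
fn generate_phrase_stub() -> ZeroizingString {
    ZeroizingString(String::from("alpha beta gamma"))
}

/// Borrows the phrase as `&str`, so no copy escapes the wrapper.
fn word_count_of(phrase: &str) -> usize {
    phrase.split_whitespace().count()
}

/// Caller-side usage: read through deref coercion, never clone the inner String.
fn demo() -> usize {
    let phrase = generate_phrase_stub();
    word_count_of(&phrase) // `phrase` is wiped when it drops at the end of this scope
}
```

The design point is that helpers take `&str` rather than `String`, which lets callers pass `&phrase` directly and removes the temptation to clone the secret into a plain container.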
### Scope
This task touches `service.rs` (the method signature and body) and test files.
It depends on the irpc removal task (drift #4) because both modify `service.rs`.
## Acceptance Criteria
- [ ] `unlock_new` return type changed from `Result<String, ...>` to `Result<Zeroizing<String>, ...>`
- [ ] Method body constructs `Zeroizing<String>` from the generated phrase
- [ ] Existing tests updated to handle `Zeroizing<String>` return type
- [ ] No `clone()` of the returned value in non-test code
- [ ] `cargo check` succeeds
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #8
- docs/architecture/crates/vault/service.md — unlock_new section
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves W7)
## Notes
> The mnemonic is the root of trust. Returning a plain `String` means the phrase
> lingers in freed heap memory after the caller drops it. `Zeroizing<String>`
> zeroizes the bytes on drop. This resolves review #002 W7. Depends on irpc
> removal because both modify `service.rs`.
## Summary
> To be filled on completion