tasks: decompose vault, core, call crates into 28 atomic implementation tasks
Break down the three initial crates (alknet-vault, alknet-core, alknet-call) into dependency-ordered task files for implementation agents.

Structure:
- tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review, spec sync. Vault is independent and can run fully in parallel with core/call.
- tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint, review. Core is foundational; call depends on it.
- tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs reflecting the two subsystems. CallAdapter is the merge point.

Key decisions:
- Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete ADR-021 rotation feature that doesn't compile in pieces
- Reviews injected at end of each crate phase (vault, core, call)
- Vault spec-sync task removes the drift table and bumps doc status to stable
- ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now)
- OperationEnv kept as a trait (load-bearing for ADR-024 layering)

Validated: 28 tasks, no cycles, 11 generations of parallel work. Critical path runs through call (11 tasks). Vault completes by generation 4. 6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context, operation-env, call-adapter, abort-cascade.
tasks/call/crate-init.md
@@ -0,0 +1,103 @@
---
id: call/crate-init
name: Initialize alknet-call crate with Cargo.toml, dependencies, and module skeleton
status: pending
depends_on: [core/core-types]
scope: moderate
risk: low
impact: project
level: implementation
---

## Description

Initialize the `alknet-call` crate from scratch. This crate implements the call
protocol (structured RPC over QUIC) on ALPN `alknet/call`. It depends on
alknet-core (for ProtocolHandler, Connection, AuthContext, Capabilities,
IdentityProvider) and irpc (for framing).

### Crate setup

Create `crates/alknet-call/` with:

- `Cargo.toml` — package metadata, dependencies
- `src/lib.rs` — crate root with module declarations and re-exports
- Module skeleton files for:
  - `src/registry/mod.rs` — registry module root
  - `src/registry/spec.rs` — OperationSpec, OperationType, Visibility, ErrorDefinition, AccessControl
  - `src/registry/context.rs` — OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
  - `src/registry/registration.rs` — Handler, HandlerRegistration, OperationProvenance, OperationRegistry, OperationRegistryBuilder
  - `src/registry/env.rs` — OperationEnv trait, LocalOperationEnv, CompositeOperationEnv
  - `src/registry/discovery.rs` — services/list, services/schema handlers
  - `src/protocol/mod.rs` — protocol module root
  - `src/protocol/wire.rs` — EventEnvelope, ResponseEnvelope, CallError, framing
  - `src/protocol/pending.rs` — PendingRequestMap, PendingEntry
  - `src/protocol/connection.rs` — CallConnection
  - `src/protocol/adapter.rs` — CallAdapter (ProtocolHandler impl)
  - `src/protocol/abort.rs` — abort cascade logic

### Dependencies

| Crate | Purpose |
|-------|---------|
| `alknet-core` | ProtocolHandler, Connection, AuthContext, Capabilities, IdentityProvider, Identity, HandlerError (workspace path) |
| `irpc` | Framing, service dispatch (workspace dep) |
| `tokio` 1 (full) | Async runtime, sync primitives (oneshot, mpsc, watch) |
| `serde` 1 | Serialization for wire types |
| `serde_json` 1 | JSON wire format, JSON Schema values |
| `async-trait` 0.1 | OperationEnv trait (async fn in trait) |
| `tracing` 0.1 | Structured logging |
| `thiserror` 2 | Error enums |
| `uuid` 1 | Request ID generation (UUID v4) |
| `futures` | Stream trait for subscribe |

### Workspace Cargo.toml

Add `crates/alknet-call` to the workspace `members` list in the root
`Cargo.toml`.

### Module skeleton

```rust
// src/lib.rs
//! alknet-call: Structured RPC over QUIC — operations, streaming, service discovery.
//! Implements ProtocolHandler on ALPN `alknet/call`.

pub mod registry;
pub mod protocol;

// Re-exports (filled in by subsequent tasks)
```

Each module file gets a doc comment and `// TODO: implement` marker.

## Acceptance Criteria

- [ ] `crates/alknet-call/Cargo.toml` exists with all dependencies
- [ ] `crates/alknet-call/src/lib.rs` exists with module declarations
- [ ] All module skeleton files exist (registry/*, protocol/*)
- [ ] Root `Cargo.toml` `members` list includes `crates/alknet-call`
- [ ] `cargo check -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
- [ ] Dual licensing: `MIT OR Apache-2.0` (workspace-inherited)
- [ ] alknet-core dependency uses workspace path (`path = "../alknet-core"`)

## References

- docs/architecture/crates/call/README.md — crate index
- docs/architecture/crates/call/call-protocol.md — CallAdapter, wire format
- docs/architecture/crates/call/operation-registry.md — registry, OperationEnv
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005

## Notes

> alknet-call depends on alknet-core (for ProtocolHandler, Connection,
> AuthContext, Capabilities, IdentityProvider) and irpc (for framing). The
> crate has two subsystems: registry (operation specs, context, dispatch) and
> protocol (wire format, streams, adapter). The module structure reflects
> this split.

## Summary

> To be filled on completion
tasks/call/protocol/abort-cascade.md
@@ -0,0 +1,193 @@
---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: pending
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---

## Description

Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler
composes other operations via `OperationEnv::invoke()`, it creates a call tree:
a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own
children. When `call.aborted` arrives for a parent, the protocol cascades the
abort to all non-terminal descendants.

**Read ADR-016 before starting this task.**

### Call tree

The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The
root request has `parent_request_id: None`. Each composed call has
`parent_request_id: Some(parent.request_id)`.

```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│   ├── r1-a-1 (composed by r1-a's handler)
│   └── r1-a-2
└── r1-b
    └── r1-b-1
```

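The tree walk itself can be sketched with a plain `HashMap<String, Entry>` standing in for `PendingRequestMap`. This is a std-only sketch, not the crate's code: `Entry` and its `terminal` flag are hypothetical simplifications of `PendingEntry`, and the walk is a breadth-first search over `parent_request_id` links.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified stand-in for a PendingRequestMap entry:
/// only the tree link and whether the request has already finished.
#[derive(Debug)]
pub struct Entry {
    pub parent_request_id: Option<String>,
    pub terminal: bool,
}

/// Returns every non-terminal descendant of `parent_id`, breadth-first.
/// Terminal entries are not returned, but their subtrees are still walked
/// so that grandchildren of a finished child are not missed.
pub fn find_descendants(entries: &HashMap<String, Entry>, parent_id: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from([parent_id.to_string()]);
    while let Some(current) = queue.pop_front() {
        // Scanning the whole map for each visited node keeps the sketch short;
        // a real map would likely keep a parent -> children index instead.
        for (id, entry) in entries {
            if entry.parent_request_id.as_deref() == Some(current.as_str()) {
                queue.push_back(id.clone());
                if !entry.terminal {
                    found.push(id.clone());
                }
            }
        }
    }
    found
}
```

`HashMap` iteration order is unspecified, so callers (and tests) should not rely on the order of the returned IDs. An unknown `parent_id` simply yields an empty list.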
### Abort cascade

When `call.aborted` arrives for a parent request:

1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (Drop releases resources)

The CallAdapter walks the tree indexed by `parent_request_id` in
`PendingRequestMap` and sends `call.aborted` for each descendant.

### AbortPolicy

The abort policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides the child's policy,
not the wire caller.

**`AbortDependents` (default)**: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results — continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring).

**`ContinueRunning` (opt-in)**: descendants that have already started continue
to completion; descendants that haven't started yet are aborted; no new
descendants start. Use for long-running work that should survive a parent's
abort (e.g., a subscription that should keep streaming).

### Wire visibility

Composed child `request_id`s are **internal** — they appear in
`PendingRequestMap` for abort-cascade indexing but are not sent as
`call.requested` to any peer. The client only sees `call.aborted` for the root
ID it sent; the server cascades internally to descendants.

The exception is `from_call` ops, which generate their own wire ID when
forwarding to the remote node (the remote node's `PendingRequestMap` indexes
it).

### Implementation

The abort cascade needs access to the `PendingRequestMap` to walk the tree.
The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The
cascade logic:

```rust
use crate::registry::context::AbortPolicy;

pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking.
    // The map indexes entries by request_id, and each entry knows its parent_request_id
    // (from OperationContext, stored when the entry was registered).
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String> {
        todo!("walk find_descendants(root_request_id), filter by policy, abort each")
    }

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String> {
        todo!("follow parent_request_id links in PendingRequestMap")
    }
}
```

### Storing parent_request_id in PendingRequestMap

The `PendingRequestMap` needs to know the `parent_request_id` for each entry to
walk the tree. This means `PendingEntry` needs to store the parent ID (or the
full `OperationContext`):

```rust
use std::time::Instant;

use serde_json::Value;
use tokio::sync::{mpsc, oneshot};

use crate::protocol::wire::CallError;

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```

Update the `PendingRequestMap` (from the pending-request-map task) to store
`parent_request_id` when registering entries. The `register_call` and
`register_subscribe` methods take an optional `parent_request_id` parameter.

### AbortPolicy propagation

The abort policy is propagated through `OperationEnv::invoke()`:

- `invoke()` uses the default impl, which delegates to `invoke_with_policy()`
  with `parent.abort_policy.clone()`
- `invoke_with_policy()` takes an explicit policy — use
  `AbortPolicy::ContinueRunning` for long-running work

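The default-method delegation can be sketched with a synchronous, simplified trait. The real `OperationEnv` is async (via `async-trait`) and its parameters are not specified in this task, so the `invoke`/`invoke_with_policy` signatures, the trimmed `OperationContext`, `RecordingEnv`, and the child-ID scheme below are all hypothetical; only the delegation pattern is the point.

```rust
/// Mirrors the two policies described above; `AbortDependents` is the default.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

/// Only the fields this sketch needs (hypothetical subset of OperationContext).
#[derive(Debug, Clone)]
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub abort_policy: AbortPolicy,
}

pub trait OperationEnv {
    /// Required: the composing handler names the child's policy explicitly.
    fn invoke_with_policy(
        &self,
        parent: &OperationContext,
        operation: &str,
        policy: AbortPolicy,
    ) -> OperationContext;

    /// Default: the child inherits the parent's policy.
    fn invoke(&self, parent: &OperationContext, operation: &str) -> OperationContext {
        self.invoke_with_policy(parent, operation, parent.abort_policy.clone())
    }
}

/// Toy env that returns the child context it would dispatch with.
pub struct RecordingEnv;

impl OperationEnv for RecordingEnv {
    fn invoke_with_policy(
        &self,
        parent: &OperationContext,
        operation: &str,
        policy: AbortPolicy,
    ) -> OperationContext {
        OperationContext {
            // Toy child-ID scheme for illustration only.
            request_id: format!("{}-{}", parent.request_id, operation),
            parent_request_id: Some(parent.request_id.clone()),
            abort_policy: policy,
        }
    }
}
```

Because `invoke()` is a provided method, every env implementation gets the inheritance rule for free and only has to implement the explicit-policy variant.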
When cascading:

- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to
  completion; no new descendants start

Determining "started" vs "unstarted" is tricky. A practical approach:

- A descendant is "started" if its handler has begun executing (the future has
  been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched

This requires tracking dispatch state in `PendingEntry` (for example, a flag set
when the handler is first polled). A tempting shortcut is to abort every
descendant that hasn't sent a `call.responded` yet, but that aborts started
descendants too: `ContinueRunning` would then behave exactly like
`AbortDependents`, and the `ContinueRunning` acceptance criterion and unit test
would fail. Use the shortcut only as a temporary fallback.

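Once the tree walk has produced the non-terminal descendants, the policy decides which of them to abort. A minimal sketch, assuming each descendant carries a hypothetical `started` flag that records dispatch state as described above:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

/// A non-terminal descendant found by the tree walk.
pub struct Descendant {
    pub request_id: String,
    /// Hypothetical flag: true once the handler future has been polled.
    pub started: bool,
}

/// Picks which descendants to abort under `policy`:
/// - AbortDependents: all of them, started or not.
/// - ContinueRunning: only those that have not started; started ones run on.
pub fn select_for_abort(descendants: &[Descendant], policy: AbortPolicy) -> Vec<String> {
    descendants
        .iter()
        .filter(|d| match policy {
            AbortPolicy::AbortDependents => true,
            AbortPolicy::ContinueRunning => !d.started,
        })
        .map(|d| d.request_id.clone())
        .collect()
}
```

The "no new descendants start" rule is not covered here; it would presumably need a check in `invoke()` against the aborted parent.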
### Handler cleanup

Handlers clean up resources when their call is cancelled. In Rust, the future
is dropped and `Drop` guards release resources (HTTP streams, file handles,
locks). This is a handler-level concern; the protocol's job is to cascade the
abort. See ADR-016.

## Acceptance Criteria

- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for unknown request_id is silently discarded
- [ ] Composed child request_ids are internal (not sent as call.requested to peer)
- [ ] Client only sees call.aborted for the root ID it sent
- [ ] AbortPolicy propagated through OperationEnv::invoke()
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md — Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md — AbortPolicy, OperationContext.abort_policy

## Notes

> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream — this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal — the client only
> sees call.aborted for the root ID. The PendingRequestMap needs to store
> parent_request_id for tree walking — update the pending-request-map task's
> output if needed.

## Summary

> To be filled on completion
tasks/call/protocol/call-adapter.md
@@ -0,0 +1,260 @@
---
id: call/protocol/call-adapter
name: Implement CallAdapter (ProtocolHandler for alknet/call) with stream handling, identity resolution, and root context construction
status: pending
depends_on: [call/protocol/call-connection, call/registry/operation-env, call/registry/service-discovery, core/endpoint]
scope: broad
risk: high
impact: component
level: implementation
---

## Description

Implement `CallAdapter` in `src/protocol/adapter.rs`. This is the
`ProtocolHandler` implementation for ALPN `alknet/call` — the merge point of the
registry and protocol strands. It ties everything together: stream handling,
identity resolution, root context construction, env composition, dispatch.

### CallAdapter struct

```rust
use std::sync::Arc;
use std::time::Duration;

// Assumes alknet-core re-exports IdentityProvider at its crate root.
use alknet_core::IdentityProvider;

use crate::registry::registration::OperationRegistry;

pub struct CallAdapter {
    registry: Arc<OperationRegistry>,  // Layer 0 — curated, immutable
    identity_provider: Arc<dyn IdentityProvider>,
    session_source: Option<Arc<dyn SessionOverlaySource + Send + Sync>>,  // Layer 1
    default_timeout: Duration,  // 30s default
}

impl CallAdapter {
    pub fn new(registry: Arc<OperationRegistry>, identity_provider: Arc<dyn IdentityProvider>) -> Self {
        Self { registry, identity_provider, session_source: None,
               default_timeout: Duration::from_secs(30) }
    }

    pub fn with_session_source(mut self, source: Arc<dyn SessionOverlaySource + Send + Sync>) -> Self {
        self.session_source = Some(source);
        self
    }

    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.default_timeout = timeout;
        self
    }
}
```

### SessionOverlaySource trait

```rust
use std::sync::Arc;

use crate::registry::context::OperationContext;
use crate::registry::env::OperationEnv;

pub trait SessionOverlaySource: Send + Sync {
    fn overlay_for(&self, context: &OperationContext) -> Option<Arc<dyn OperationEnv + Send + Sync>>;
}
```

Defined in alknet-call because CallAdapter must name the type — alknet-call
cannot depend on alknet-agent (agent depends on call, not reverse). The agent
crate implements this trait; alknet-call defines it. Same pattern as
IdentityProvider (ADR-004).

### ProtocolHandler impl

```rust
use alknet_core::{AuthContext, Connection, HandlerError, ProtocolHandler};
use async_trait::async_trait;

#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // 1. Create CallConnection from the Connection
        // 2. Spawn a task that continuously calls connection.accept_bi()
        // 3. For each accepted stream, read EventEnvelope frames (FrameFramedReader)
        // 4. Dispatch call.requested events to the operation registry
        // 5. Write response EventEnvelope frames (FrameFramedWriter)
        // 6. Manage PendingRequestMap for outgoing calls
        // 7. On connection close: fail all pending, return Ok or Err(ConnectionClosed)
        todo!()
    }
}
```

### Stream handling

The adapter:

1. Spawns a task that continuously calls `connection.accept_bi()` to receive
   incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using
   `FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server

For outgoing calls (server → client), the adapter:

1. Opens a bidirectional stream with `connection.open_bi()`
2. Adds the request ID to the `PendingRequestMap`
3. Sends `call.requested` on that stream
4. Reads responses from any stream, correlates by ID

Registering the request ID before sending ensures that a fast response, read by
another task, cannot arrive before its `PendingRequestMap` entry exists.

### Identity resolution (per-request)

The CallAdapter resolves identity per-request, not per-connection:

1. The endpoint provides `AuthContext` with whatever identity it resolved at
   the TLS layer (may be `None`)
2. When a `call.requested` event arrives, the CallAdapter constructs an
   `OperationContext` with the connection-level `AuthContext.identity`
3. If the `call.requested` payload includes an `auth_token` field, the
   CallAdapter resolves it using `IdentityProvider::resolve_from_token()`. If
   resolution succeeds, the resulting `Identity` replaces the connection-level
   identity in the `OperationContext`. If resolution fails, the request
   proceeds with the connection-level identity (which may be `None`)
4. The `OperationContext.identity` is passed to the `OperationRegistry` for
   ACL checking
5. If `identity` is `None` and the operation's `AccessControl` has
   restrictions, the registry returns `FORBIDDEN` with message
   `"authentication required"`

**Key point**: Identity is resolved per-request. This allows a single
connection to upgrade authentication mid-session and allows different operations
on the same connection to have different identity levels.

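The per-request rule (a token wins when it resolves, otherwise fall back to the connection-level identity) can be sketched as below. `TokenResolver` is a hypothetical synchronous stand-in for `IdentityProvider::resolve_from_token()`, whose real signature is not given here; `Identity` is likewise trimmed to one field, and `check_authenticated` models only step 5, not full `AccessControl` evaluation.

```rust
/// Hypothetical minimal Identity; the real type lives in alknet-core.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}

/// Sync stand-in for IdentityProvider::resolve_from_token. The real method's
/// signature (async, error type) is not specified here; Option is assumed.
pub trait TokenResolver {
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Per-request identity: a token that resolves wins; otherwise fall back to
/// the connection-level identity from AuthContext (which may be None).
pub fn resolve_request_identity(
    provider: &dyn TokenResolver,
    connection_identity: Option<&Identity>,
    auth_token: Option<&str>,
) -> Option<Identity> {
    auth_token
        .and_then(|token| provider.resolve_from_token(token))
        .or_else(|| connection_identity.cloned())
}

/// Step 5 only: no identity plus a restricted operation is FORBIDDEN.
pub fn check_authenticated(
    identity: Option<&Identity>,
    restricted: bool,
) -> Result<(), (&'static str, &'static str)> {
    if restricted && identity.is_none() {
        Err(("FORBIDDEN", "authentication required"))
    } else {
        Ok(())
    }
}
```

Note that a bad token is not an error on its own: the request silently proceeds with the connection-level identity, and only the ACL check can reject it.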
### Root OperationContext construction

When a `call.requested` arrives from the wire, the CallAdapter constructs the
root `OperationContext` — the entry point of the call tree. This sets
`internal: false`, meaning ACL runs against the caller's `identity`, not a
handler's composition authority (ADR-015, ADR-022).

```rust
fn build_root_context(
    &self,
    connection: &CallConnection,
    request_id: String,
    operation_name: &str,
    identity: Option<Identity>,
) -> OperationContext {
    let registration = self.registry.registration(operation_name);
    let mut context = OperationContext {
        request_id,
        parent_request_id: None,           // wire request — top of call tree
        identity,                          // caller's identity (inbound)
        handler_identity: registration.composition_authority.clone(),
        capabilities: registration.capabilities.clone(),
        metadata: HashMap::new(),
        // Wire calls; subscriptions default to `None` (see Timeout handling).
        deadline: Some(Instant::now() + self.default_timeout),
        scoped_env: registration.scoped_env.clone()
            .unwrap_or_else(ScopedOperationEnv::empty),
        // Placeholder (Layer 0 only): compose_root_env needs the finished
        // context for the session overlay, so the full env is attached below.
        env: Arc::new(LocalOperationEnv { registry: self.registry.clone() }),
        abort_policy: AbortPolicy::default(), // abort-dependents
        internal: false,                      // external call — ACL against caller identity
    };
    context.env = self.compose_root_env(connection, &context);
    context
}
```

### compose_root_env

The per-call `env` composition (ADR-024) builds a `CompositeOperationEnv` from:

- Layer 0: `LocalOperationEnv` (curated registry)
- Layer 1: session overlay (if active, from `session_source.overlay_for()`)
- Layer 2: connection overlay (from `CallConnection.overlay_env()`)

```rust
fn compose_root_env(&self, connection: &CallConnection, context: &OperationContext) -> Arc<dyn OperationEnv + Send + Sync> {
    let base = Arc::new(LocalOperationEnv { registry: self.registry.clone() });
    let session = self.session_source.as_ref()
        .and_then(|s| s.overlay_for(context));
    let connection_overlay = connection.overlay_env();
    Arc::new(CompositeOperationEnv { session, connection: Some(connection_overlay), base })
}
```

### operationId normalization

The `call.requested` payload's `operationId` has a leading slash (`/fs/readFile`).
The CallAdapter strips it before registry lookup (`fs/readFile`). This is a
single rule applied consistently — the registry stores names without leading
slash, the wire format adds it.

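A sketch of the normalization rule and its inverse. The helper names are hypothetical, and the spec does not say what happens to an `operationId` that lacks the slash; this sketch assumes it is malformed and returns `None`, though passing it through unchanged would be an equally defensible choice.

```rust
/// Wire `operationId` ("/fs/readFile") -> registry name ("fs/readFile").
/// Strips exactly one leading slash. Assumption: an id without the slash,
/// or a bare "/", is treated as malformed (None).
pub fn normalize_operation_id(wire_id: &str) -> Option<&str> {
    wire_id.strip_prefix('/').filter(|name| !name.is_empty())
}

/// Registry name -> wire `operationId`.
pub fn to_wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```

Keeping both directions in one place makes the "registry stores names without the slash, wire adds it" rule hard to violate.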
### ResponseEnvelope → EventEnvelope

The CallAdapter converts `ResponseEnvelope` (from local dispatch) to
`EventEnvelope` for the wire:

| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |

For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.

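The mapping in the table can be sketched with std-only types. `Payload`, the trimmed `CallError`, and the `kind` field (the wire `type`, renamed because `type` is a Rust keyword) are hypothetical; the real code would serialize through `serde_json` rather than keep the output as JSON text.

```rust
/// Hypothetical minimal CallError (fields taken from the TIMEOUT example).
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Payload shapes from the table; the output value is kept as JSON text.
#[derive(Debug, PartialEq)]
pub enum Payload {
    Output(String),   // { output: value }
    Error(CallError), // serialized CallError
    Empty,            // {}
}

#[derive(Debug, PartialEq)]
pub struct EventEnvelope {
    pub kind: &'static str, // the wire `type` field
    pub id: String,
    pub payload: Payload,
}

/// One-shot call: Ok -> call.responded, Err -> call.error.
pub fn to_event(request_id: &str, response: Result<String, CallError>) -> EventEnvelope {
    let (kind, payload) = match response {
        Ok(value) => ("call.responded", Payload::Output(value)),
        Err(err) => ("call.error", Payload::Error(err)),
    };
    EventEnvelope { kind, id: request_id.to_string(), payload }
}

/// Subscription: one call.responded per item, all with the same id,
/// followed by call.completed with an empty payload.
pub fn subscription_events(request_id: &str, items: Vec<String>) -> Vec<EventEnvelope> {
    let mut events: Vec<EventEnvelope> = items
        .into_iter()
        .map(|value| to_event(request_id, Ok(value)))
        .collect();
    events.push(EventEnvelope {
        kind: "call.completed",
        id: request_id.to_string(),
        payload: Payload::Empty,
    });
    events
}
```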
### Timeout handling

- Default timeout for wire calls is 30 seconds (`default_timeout`)
- `build_root_context` sets `OperationContext.deadline` to `now + default_timeout`
- Composed calls inherit the parent's deadline (children do NOT get a fresh 30s)
- A composed call that exceeds the deadline is cancelled and returns
  `CallError { code: "TIMEOUT", retryable: true }`
- Subscriptions default to no deadline (`deadline: None` — unbounded); the
  client can specify a timeout in the `call.requested` payload
- The `PendingRequestMap` sweeper runs every 10 seconds and removes expired
  wire entries

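The deadline rules reduce to a few small functions. A sketch with hypothetical helper names and a trimmed `CallError`; the key point it shows is that a child inherits the parent's `Instant` rather than computing a fresh `now + 30s`.

```rust
use std::time::{Duration, Instant};

/// Default deadline budget for wire calls.
pub const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);

#[derive(Debug, PartialEq)]
pub struct CallError {
    pub code: &'static str,
    pub retryable: bool,
}

/// Root wire call: now + default_timeout.
pub fn call_deadline(now: Instant, default_timeout: Duration) -> Option<Instant> {
    Some(now + default_timeout)
}

/// Root subscription: unbounded unless the client supplied a timeout.
pub fn subscription_deadline(now: Instant, client_timeout: Option<Duration>) -> Option<Instant> {
    client_timeout.map(|t| now + t)
}

/// Composed child: inherits the parent's deadline unchanged.
pub fn child_deadline(parent_deadline: Option<Instant>) -> Option<Instant> {
    parent_deadline
}

/// At or past the deadline -> TIMEOUT (retryable); no deadline never expires.
pub fn check_deadline(deadline: Option<Instant>, now: Instant) -> Result<(), CallError> {
    match deadline {
        Some(d) if now >= d => Err(CallError { code: "TIMEOUT", retryable: true }),
        _ => Ok(()),
    }
}
```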
### Error handling in handle()

- If a handler panics, the stream is closed and the PendingRequestMap entry is
  cleaned up by the next sweeper pass. Other streams and the connection are
  unaffected.
- Connection drop: all pending requests are failed with `call.error` code
  `INTERNAL` and message `"connection closed"`, and all subscription channels
  are closed. `handle()` returns `Ok(())` on a clean close or
  `Err(ConnectionClosed)` otherwise.
- Stream reset: `FrameFramedReader` returns an error. If subscription, remove
  PendingRequestMap entry, close mpsc. If call, resolve oneshot with error. No
  `call.aborted` sent — stream is gone.

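The connection-drop rule can be sketched over a simplified pending map. A std-only sketch under stated assumptions: `std::sync::mpsc` senders stand in for the tokio `oneshot`/`mpsc` senders in `PendingEntry`, payloads are JSON text, `Pending` and `fail_all_on_close` are hypothetical names, and `retryable: false` for `INTERNAL` is assumed because the spec does not say.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Simplified PendingEntry: std mpsc senders stand in for tokio's
/// oneshot (Call) and mpsc (Subscribe) channels.
pub enum Pending {
    Call(Sender<Result<String, CallError>>),
    Subscribe(Sender<Result<String, CallError>>),
}

/// Connection drop: fail every pending request with INTERNAL
/// "connection closed", then drop its sender so subscription streams end.
/// Returns how many entries were failed.
pub fn fail_all_on_close(pending: &mut HashMap<String, Pending>) -> usize {
    let error = CallError {
        code: "INTERNAL".to_string(),
        message: "connection closed".to_string(),
        retryable: false, // assumption: the spec does not say
    };
    let mut failed = 0;
    for (_request_id, entry) in pending.drain() {
        let tx = match entry {
            Pending::Call(tx) | Pending::Subscribe(tx) => tx,
        };
        // The receiver may already be gone; ignore send errors.
        let _ = tx.send(Err(error.clone()));
        failed += 1;
        // `tx` is dropped at the end of this iteration, closing the channel.
    }
    failed
}
```

Draining the map both empties it and moves each sender out, so closing every channel falls out of ordinary ownership rather than needing an explicit close call.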
## Acceptance Criteria

- [ ] `CallAdapter` struct with registry, identity_provider, session_source, default_timeout
- [ ] `CallAdapter::new()`, `with_session_source()`, `with_timeout()` constructors
- [ ] `SessionOverlaySource` trait defined with `overlay_for()` method
- [ ] `ProtocolHandler::alpn()` returns `b"alknet/call"`
- [ ] `handle()` accepts streams, reads EventEnvelope frames, dispatches
- [ ] `handle()` spawns task for continuous `accept_bi()`
- [ ] Outgoing calls: open_bi, send call.requested, add to PendingRequestMap
- [ ] Identity resolution: AuthContext.identity used, auth_token overrides per-request
- [ ] auth_token resolution failure → proceed with connection-level identity
- [ ] `build_root_context` sets internal: false, deadline, capabilities from registration
- [ ] `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- [ ] operationId leading slash stripped before registry lookup
- [ ] ResponseEnvelope → EventEnvelope conversion (Ok → responded, Err → error)
- [ ] Subscriptions: multiple call.responded with same id, then call.completed
- [ ] Timeout: 30s default, composed calls inherit parent deadline
- [ ] Handler panic: stream closed, PendingRequestMap cleaned up, others unaffected
- [ ] Connection drop: fail all pending with INTERNAL, return Ok or Err
- [ ] Unit test: CallAdapter alpn returns b"alknet/call"
- [ ] Integration test: call.requested → dispatch → call.responded round-trip
- [ ] Integration test: auth_token overrides connection-level identity
- [ ] Integration test: Internal op called from wire → NOT_FOUND
- [ ] Integration test: ACL denied → FORBIDDEN
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/call-protocol.md — CallAdapter, stream handling, root context
- docs/architecture/crates/call/operation-registry.md — OperationContext construction
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal: false for wire)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env composition)
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012

## Notes

> This is the merge point of the registry and protocol strands — the
> highest-risk task in the call crate. It ties together stream handling, identity
> resolution, root context construction, env composition, and dispatch. The
> per-request identity resolution (auth_token overrides connection-level) is
> important — a single connection can upgrade auth mid-session. The
> compose_root_env builds the CompositeOperationEnv per call from the active
> layers. operationId on the wire has a leading slash; strip it before lookup.

## Summary

> To be filled on completion
tasks/call/protocol/call-connection.md
@@ -0,0 +1,158 @@
---
id: call/protocol/call-connection
name: Implement CallConnection with imported-ops overlay (Layer 2) and call/subscribe/abort methods
status: pending
depends_on: [call/protocol/pending-request-map, call/registry/operation-env]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement `CallConnection` in `src/protocol/connection.rs`. This represents an
established `alknet/call` connection, regardless of which side opened it
(ADR-017). It holds the connection's imported-ops overlay (Layer 2, ADR-024).

### CallConnection

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

use alknet_core::Connection;

use crate::registry::registration::HandlerRegistration;

pub struct CallConnection {
    connection: Connection,
    imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>,
}
```

An established alknet/call connection (either direction — accepted or opened).
Holds the Layer 2 overlay (imported ops from `from_call` discovery).

### Layer 2 registration API

```rust
impl CallConnection {
    /// Register an imported operation into this connection's overlay (Layer 2, ADR-024).
    /// Called by from_call after discovery.
    pub fn register_imported(&self, registration: HandlerRegistration) {
        let name = registration.spec.name.clone();
        // std::sync::RwLock::write() returns a LockResult; poisoning means a
        // writer panicked mid-update, which is treated as a bug here.
        self.imported_operations
            .write()
            .expect("imported_operations lock poisoned")
            .insert(name, registration);
    }

    /// Register multiple imported operations (bulk variant for from_call).
    pub fn register_imported_all(&self, registrations: Vec<HandlerRegistration>) {
        let mut overlay = self
            .imported_operations
            .write()
            .expect("imported_operations lock poisoned");
        for reg in registrations {
            overlay.insert(reg.spec.name.clone(), reg);
        }
    }
}
```

Layer 0 (curated) is built via `OperationRegistryBuilder` at startup. Layer 2
(per-connection) registration uses `CallConnection::register_imported()` at
runtime. When the connection drops, the overlay (and all imported ops) is
dropped — no explicit deregistration needed.

### Overlay env

```rust
impl CallConnection {
    /// Build an OperationEnv impl for this connection's overlay.
    /// Used by the CallAdapter when composing the root OperationContext.env.
    /// Returns an OperationEnv that dispatches to this connection's imported ops
    /// (and reports contains only for ops in the overlay).
    pub fn overlay_env(&self) -> Arc<dyn OperationEnv + Send + Sync> {
        todo!()
    }
}
```

This is an `OperationEnv` impl that dispatches to the connection's imported ops.
The `contains()` method returns true only for ops in the overlay. The
`invoke_with_policy()` method looks up the op in the overlay and dispatches to
its handler.

This env is composed into the `CompositeOperationEnv` by the CallAdapter as the
`connection` layer (Layer 2).

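A std-only sketch of the overlay env's `contains()` and dispatch behavior. `Registration` (just a handler closure), `Overlay`, and `OverlayEnv` are hypothetical stand-ins for `HandlerRegistration` and the real env type, and the methods are synchronous, unlike the async `OperationEnv`. Because the env shares the connection's `Arc`, ops registered after `overlay_env()` is called are still visible.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for HandlerRegistration: just a sync handler.
pub struct Registration {
    pub handler: Box<dyn Fn(&str) -> String + Send + Sync>,
}

pub type Overlay = Arc<RwLock<HashMap<String, Registration>>>;

/// Layer 2 env over one connection's imported ops.
pub struct OverlayEnv {
    ops: Overlay,
}

impl OverlayEnv {
    pub fn new(ops: Overlay) -> Self {
        Self { ops }
    }

    /// True only for ops present in this connection's overlay.
    pub fn contains(&self, name: &str) -> bool {
        self.ops.read().expect("overlay lock poisoned").contains_key(name)
    }

    /// Look the op up in the overlay and run its handler; `None` if it is not
    /// imported here (a composite env would then try another layer).
    pub fn invoke(&self, name: &str, input: &str) -> Option<String> {
        let ops = self.ops.read().expect("overlay lock poisoned");
        ops.get(name).map(|reg| (reg.handler)(input))
    }
}
```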
### Call methods (outgoing)

```rust
impl CallConnection {
    /// Call an operation on the remote peer (sends call.requested).
    pub async fn call(&self, operation_id: &str, input: Value) -> ResponseEnvelope;

    /// Subscribe to a streaming operation on the remote peer.
    pub async fn subscribe(&self, operation_id: &str, input: Value) -> impl Stream<Item = ResponseEnvelope>;

    /// Abort an in-flight request (sends call.aborted, cascades per ADR-016).
    pub async fn abort(&self, request_id: &str);
}
```

`call()` and `subscribe()`:
1. Add the request ID to the PendingRequestMap (before sending, so a fast
response is never discarded as an unknown ID)
2. Open a bidirectional stream with `connection.open_bi()`
3. Send `call.requested` on that stream (via FrameFramedWriter)
4. Read responses from any stream, correlate by ID (via PendingRequestMap)

`call()` resolves on the first `call.responded` (or `call.error`). `subscribe()`
yields each `call.responded` until `call.completed` or `call.aborted`; a
`call.error` is yielded as a final error item.

`abort()` sends `call.aborted` for the given request ID. The abort cascade
(ADR-016) is handled by the abort-cascade task.

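A sketch of the resolution rules for `call()` and `subscribe()`, using a `std::sync::mpsc` channel of already-decoded events in place of the QUIC streams and PendingRequestMap. The `Event` enum and both functions are hypothetical; the real methods are async and produce `ResponseEnvelope` values.

```rust
use std::sync::mpsc::Receiver;

// Hypothetical decoded events; the real client reads EventEnvelopes off QUIC
// streams and routes them through PendingRequestMap.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Responded(String), // call.responded { output }
    Completed,         // call.completed
    Aborted,           // call.aborted
    Error(String),     // call.error (code only, for brevity)
}

/// `call()`: the first `call.responded` or `call.error` settles the request.
pub fn settle_call(events: &Receiver<Event>) -> Result<String, String> {
    for event in events.iter() {
        match event {
            Event::Responded(output) => return Ok(output),
            Event::Error(code) => return Err(code),
            Event::Aborted => return Err("aborted".to_string()),
            Event::Completed => {} // not expected for a plain call; ignored
        }
    }
    // Every sender dropped before a result arrived.
    Err("connection closed".to_string())
}

/// `subscribe()`: collect each `call.responded` until the stream finishes.
pub fn drain_subscription(events: &Receiver<Event>) -> Vec<Result<String, String>> {
    let mut items = Vec::new();
    for event in events.iter() {
        match event {
            Event::Responded(output) => items.push(Ok(output)),
            Event::Error(code) => {
                items.push(Err(code)); // final error item
                break;
            }
            Event::Completed | Event::Aborted => break,
        }
    }
    items
}
```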
### Connection direction independence

Per ADR-017, connection direction is independent of call direction. Both
sides can call each other once connected. The `CallConnection` type is the same
whether the connection was accepted (server side) or opened (client side via
`CallClient`). The `call`/`subscribe`/`abort` methods work the same way.

### from_call integration

The `from_call` adapter (ADR-017) discovers operations on a remote call
protocol endpoint via `services/list` and `services/schema`, then registers
them with `register_imported()` / `register_imported_all()`. This makes
cross-node composition transparent — a handler calling
`env.invoke("worker", "exec", ...)` doesn't know whether the operation is
local or remote.

The `from_call` adapter itself is not implemented in this task — it's a future
task. This task implements the `CallConnection` infrastructure that `from_call`
will use.

## Acceptance Criteria

- [ ] `CallConnection` struct with connection and imported_operations fields
- [ ] `register_imported()` adds to the Layer 2 overlay
- [ ] `register_imported_all()` bulk adds to the overlay
- [ ] `overlay_env()` returns an OperationEnv dispatching to imported ops
- [ ] `overlay_env().contains()` returns true only for ops in the overlay
- [ ] `call()` sends call.requested, resolves on first call.responded
- [ ] `subscribe()` sends call.requested, yields call.responded until completed/aborted
- [ ] `abort()` sends call.aborted for the request ID
- [ ] Outgoing calls add the request ID to PendingRequestMap, then open a stream and send the request
- [ ] Connection drop drops the overlay (no explicit deregistration)
- [ ] Unit test: register_imported adds to overlay, contains returns true
- [ ] Unit test: overlay_env dispatches to imported op
- [ ] Unit test: overlay_env contains returns false for non-imported op
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/call-protocol.md — CallConnection section
- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2)

## Notes

> Connection direction is independent of call direction (ADR-017) — both sides
> can call each other. The Layer 2 overlay is per-connection: when the
> connection drops, the overlay drops (no deregistration needed). The env
> returned by overlay_env() is composed into CompositeOperationEnv by the
> CallAdapter as the connection layer. The from_call adapter itself is a future
> task — this implements the infrastructure it will use.

## Summary

> To be filled on completion
164
tasks/call/protocol/pending-request-map.md
Normal file
@@ -0,0 +1,164 @@
---
id: call/protocol/pending-request-map
name: Implement PendingRequestMap for correlating call.requested and call.responded events
status: pending
depends_on: [call/protocol/wire-types]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement `PendingRequestMap` in `src/protocol/pending.rs`. This manages
in-flight calls and subscriptions, correlating `call.responded` events back to
the original `call.requested` by request ID.

### PendingRequestMap

```rust
use std::collections::HashMap;
use std::time::Instant;

use serde_json::Value;
// `oneshot` / `mpsc`: async channel modules (e.g. tokio::sync; assumption).

pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant, // absolute deadline
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>, // None = unbounded
    },
}
```

### Behavior

When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive

When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.

When `call.aborted` arrives → cancel the pending request by dropping its sender
(oneshot or mpsc), so the waiting `call()` or `subscribe()` observes the
cancellation, and delete the entry. A `call.aborted` for an unknown request ID
is silently discarded.

When `call.error` arrives → resolve the oneshot (Call) or push to channel
(Subscribe) with the error, delete entry.

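A dependency-free sketch of these rules, assuming `String` in place of `serde_json::Value`, a bounded `std::sync::mpsc` channel of capacity 1 as the oneshot, and no timeouts. The shapes follow the method list in this task, but the channel types and the trimmed `CallError` are stand-ins, not the crate's API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender, SyncSender};

#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
}

pub type Item = Result<String, CallError>;

enum PendingEntry {
    Call(SyncSender<Item>),  // plays the oneshot
    Subscribe(Sender<Item>), // plays the mpsc sender
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn register_call(&mut self, request_id: String) -> Receiver<Item> {
        let (tx, rx) = mpsc::sync_channel(1);
        self.pending.insert(request_id, PendingEntry::Call(tx));
        rx
    }

    pub fn register_subscribe(&mut self, request_id: String) -> Receiver<Item> {
        let (tx, rx) = mpsc::channel();
        self.pending.insert(request_id, PendingEntry::Subscribe(tx));
        rx
    }

    pub fn handle_responded(&mut self, request_id: &str, output: String) -> bool {
        // Subscriptions keep their entry: push the item and return.
        if let Some(PendingEntry::Subscribe(tx)) = self.pending.get(request_id) {
            let _ = tx.send(Ok(output)); // receiver may already be gone
            return true;
        }
        // Calls settle once: remove the entry, then resolve it.
        match self.pending.remove(request_id) {
            Some(PendingEntry::Call(tx)) => {
                let _ = tx.send(Ok(output));
                true
            }
            _ => false, // unknown ID (Subscribe was handled above): discarded
        }
    }

    pub fn handle_completed(&mut self, request_id: &str) -> bool {
        // Subscriptions only; removing the entry drops the Sender and closes the channel.
        if matches!(self.pending.get(request_id), Some(PendingEntry::Subscribe(_))) {
            self.pending.remove(request_id);
            true
        } else {
            false
        }
    }

    pub fn handle_aborted(&mut self, request_id: &str) -> bool {
        // Dropping the sender is the cancellation signal the receiver observes.
        self.pending.remove(request_id).is_some()
    }

    pub fn handle_error(&mut self, request_id: &str, error: CallError) -> bool {
        match self.pending.remove(request_id) {
            Some(PendingEntry::Call(tx)) => {
                let _ = tx.send(Err(error));
                true
            }
            Some(PendingEntry::Subscribe(tx)) => {
                let _ = tx.send(Err(error));
                true
            }
            None => false,
        }
    }
}
```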
### Timeouts

Timeouts prevent dangling entries. A background task sweeps expired entries
periodically (every 10 seconds per call-protocol.md), so an entry can outlive
its deadline by up to one sweep interval.

- `Call` entries have a timeout (default 30s from CallAdapter.default_timeout)
- `Subscribe` entries may have `timeout: None` (unbounded — long-running
subscriptions)

When the sweeper finds an expired entry:
- `Call`: resolve oneshot with `CallError { code: "TIMEOUT", retryable: true }`, delete
- `Subscribe`: push a `TIMEOUT` error to the mpsc channel, then close it and delete the entry

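A sketch of the sweep step, assuming entries are reduced to their optional deadline and that the caller passes in `now`, which keeps the function deterministic to test. The real `evict_expired()` would also resolve each evicted entry with a `TIMEOUT` error before dropping it.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Simplified entry: only the deadline; `None` means an unbounded subscription.
pub struct Entry {
    pub deadline: Option<Instant>,
}

/// Remove every entry whose deadline has passed and return the evicted IDs.
pub fn evict_expired(pending: &mut HashMap<String, Entry>, now: Instant) -> Vec<String> {
    let expired: Vec<String> = pending
        .iter()
        .filter(|(_, entry)| matches!(entry.deadline, Some(deadline) if deadline <= now))
        .map(|(id, _)| id.clone())
        .collect();
    for id in &expired {
        // Real code: send CallError { code: "TIMEOUT", retryable: true, .. } first.
        pending.remove(id);
    }
    expired
}
```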
### Methods

```rust
impl PendingRequestMap {
    pub fn new() -> Self; // also implement Default (clippy::new_without_default)

    /// Register a pending call. Returns a oneshot receiver for the result.
    pub fn register_call(&mut self, request_id: String, timeout: Instant) -> oneshot::Receiver<Result<Value, CallError>>;

    /// Register a pending subscription. Returns an mpsc receiver for the stream.
    pub fn register_subscribe(&mut self, request_id: String, timeout: Option<Instant>) -> mpsc::Receiver<Result<Value, CallError>>;

    /// Handle an incoming call.responded event.
    /// Returns true if the entry was found and handled.
    pub fn handle_responded(&mut self, request_id: &str, output: Value) -> bool;

    /// Handle an incoming call.completed event (subscriptions only).
    /// Closes the mpsc channel, deletes entry.
    pub fn handle_completed(&mut self, request_id: &str) -> bool;

    /// Handle an incoming call.aborted event.
    /// Cancels the pending request, deletes entry.
    pub fn handle_aborted(&mut self, request_id: &str) -> bool;

    /// Handle an incoming call.error event.
    /// Resolves with the error, deletes entry.
    pub fn handle_error(&mut self, request_id: &str, error: CallError) -> bool;

    /// Sweep expired entries. Called periodically by a background task.
    pub fn evict_expired(&mut self) -> Vec<String>; // returns evicted request IDs

    /// Fail all pending requests (connection closed). Returns the request IDs that were failed.
    pub fn fail_all(&mut self, error: CallError) -> Vec<String>;

    /// Check if a request ID is pending.
    pub fn contains(&self, request_id: &str) -> bool;

    /// Number of pending entries.
    pub fn len(&self) -> usize;

    /// True when nothing is pending (clippy::len_without_is_empty expects it next to `len`).
    pub fn is_empty(&self) -> bool;
}
```

### Connection drop handling

When the QUIC connection closes, all pending requests are failed with a
`CallError` carrying code `INTERNAL` and message `"connection closed"` (the same
shape as a `call.error` payload). All subscription channels are closed. This is
what `fail_all()` does.

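A sketch of `fail_all()`, simplified so that every entry is a `std::sync::mpsc::Sender` (as a subscription's would be) and `String` stands in for `Value`. Draining the map both delivers the error and drops each sender, which is what closes the receivers.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
}

/// Fail every pending request with `error`, empty the map, and return the IDs.
pub fn fail_all(
    pending: &mut HashMap<String, Sender<Result<String, CallError>>>,
    error: CallError,
) -> Vec<String> {
    let mut failed = Vec::new();
    // drain() empties the map; each sender is dropped right after its error is
    // sent, which closes the matching receiver.
    for (request_id, tx) in pending.drain() {
        let _ = tx.send(Err(error.clone()));
        failed.push(request_id);
    }
    failed
}

/// Error used when the QUIC connection closes (code and message from the spec).
pub fn connection_closed() -> CallError {
    CallError { code: "INTERNAL".to_string(), message: "connection closed".to_string() }
}
```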
### Stream reset handling

When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an
error. If the stream was carrying a subscription, the PendingRequestMap entry
is removed and the mpsc channel is closed. If the stream was carrying a call,
the oneshot is resolved with an error. No `call.aborted` is sent — the stream
is gone. Because the map itself is keyed by request ID, whatever owns the stream
has to remember which request ID it was opened for in order to find the entry.

### Correlation is by ID, not by stream

A response arriving on stream N can fulfill a request sent on stream M. The
`PendingRequestMap` is keyed by ID, not by stream. This is the stream-agnostic
correlation property from ADR-012.

## Acceptance Criteria

- [ ] `PendingRequestMap` struct with pending HashMap
- [ ] `PendingEntry::Call` with oneshot::Sender and timeout
- [ ] `PendingEntry::Subscribe` with mpsc::Sender and optional timeout
- [ ] `register_call` returns oneshot::Receiver
- [ ] `register_subscribe` returns mpsc::Receiver
- [ ] `handle_responded` resolves Call oneshot, pushes to Subscribe channel
- [ ] `handle_completed` closes Subscribe mpsc, deletes entry
- [ ] `handle_aborted` cancels pending, deletes entry
- [ ] `handle_error` resolves with error, deletes entry
- [ ] Unknown request_id in handle_* is silently discarded (returns false)
- [ ] `evict_expired` removes timed-out entries, resolves with TIMEOUT error
- [ ] `fail_all` fails all pending with given error (connection close)
- [ ] Correlation is by request ID, not by stream
- [ ] Unit test: register call, handle_responded → oneshot resolves
- [ ] Unit test: register subscribe, handle multiple responded, handle_completed → stream ends
- [ ] Unit test: expired call → evict_expired resolves with TIMEOUT
- [ ] Unit test: fail_all resolves all pending with INTERNAL error
- [ ] Unit test: unknown request_id handle_responded → false (silently discarded)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/call-protocol.md — PendingRequestMap section
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012 (ID-based correlation)

## Notes

> Correlation is by request ID, not by stream — a response on stream N can
> fulfill a request sent on stream M. This is the stream-agnostic property from
> ADR-012. The sweeper runs every 10 seconds to evict expired entries. Unknown
> request IDs in handle_* are silently discarded (not an error — the entry may
> have already been resolved/cleaned up).

## Summary

> To be filled on completion
219
tasks/call/protocol/wire-types.md
Normal file
@@ -0,0 +1,219 @@
---
id: call/protocol/wire-types
name: Implement EventEnvelope, ResponseEnvelope, CallError, and length-prefixed JSON framing
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement the wire protocol types and framing in `src/protocol/wire.rs`. Every
message on the wire is a length-prefixed JSON `EventEnvelope`.

### EventEnvelope

```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EventEnvelope {
    pub r#type: String, // Event type (serde serializes the field as "type")
    pub id: String,     // Correlation key (request ID, subscription ID)
    pub payload: Value, // serde_json::Value — schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```

The envelope is JSON because it must be consumable from JavaScript, Python, and
any other language. The `Value` type is `serde_json::Value`.

Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within
the `payload` field. The envelope itself does not interpret the payload — this
is a handler-level concern, not a protocol-level concern.

### Event Types

Five event types:

| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |

### Wire Payload Schemas

| Event | `payload` shape |
|-------|----------------|
| `call.requested` | `{ "operationId": "/fs/readFile", "input": {...}, "auth_token": "alk_..." (optional) }` |
| `call.responded` | `{ "output": <Value> }` |
| `call.completed` | `{}` — empty object |
| `call.aborted` | `{}` — empty object |
| `call.error` | `{ "code": "...", "message": "...", "retryable": bool, "details": {...} (optional) }` |

### call.requested payload

```json
{
  "operationId": "/fs/readFile",
  "input": { ... },
  "auth_token": "alk_..." // optional
}
```

- `operationId` — the operation to invoke, **with a leading slash** on the wire.
The registry stores names without the leading slash; the wire format adds it.
The CallAdapter strips the leading slash before registry lookup.
- `input` — the operation input, matching the operation's `input_schema`.
- `auth_token` — optional. If present, the CallAdapter resolves it via
`IdentityProvider::resolve_from_token()`. The resulting `Identity` takes
precedence over the connection-level identity for this request.

The `call.requested` payload does **not** carry an abort policy field. The abort
policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides, not the wire caller.

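The slash convention fits in two small helpers. Both function names are hypothetical, and rejecting an `operationId` that lacks the leading slash (rather than accepting it unchanged) is an assumption the spec does not settle.

```rust
/// Wire `operationId` ("/fs/readFile") to registry name ("fs/readFile").
/// Assumption: IDs without the leading slash, or a bare "/", are rejected.
pub fn registry_name(operation_id: &str) -> Option<&str> {
    operation_id.strip_prefix('/').filter(|name| !name.is_empty())
}

/// Registry name to wire `operationId` (adds the leading slash).
pub fn wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```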
### call.error payload

```json
{
  "code": "FILE_NOT_FOUND",
  "message": "file not found: /etc/nonexistent",
  "retryable": false,
  "details": { "path": "/etc/nonexistent", "errno": 2 }
}
```

Protocol-level codes (emitted by dispatch machinery):
- `NOT_FOUND` — operation not in registry (or Internal op called from wire)
- `FORBIDDEN` — access denied
- `INVALID_INPUT` — input doesn't match JSON Schema
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)

Operation-level domain codes (emitted by handlers, ADR-023): e.g.,
`FILE_NOT_FOUND`, `RATE_LIMITED`. These carry a `details` payload conforming to
the declared `ErrorDefinition.schema`.

New error codes may be added in the future. Clients should treat unrecognized
codes as `INTERNAL` with `retryable: false`.

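A sketch of client-side handling for the protocol-level codes, using a hypothetical `ProtocolCode` enum. Domain codes declared through `ErrorDefinition` would be matched before this fallback; anything still unrecognized becomes `INTERNAL`. Treating every code except `TIMEOUT` as non-retryable by default is an assumption beyond what the list states.

```rust
/// Protocol-level codes emitted by the dispatch machinery.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolCode {
    NotFound,
    Forbidden,
    InvalidInput,
    Internal,
    Timeout,
}

impl ProtocolCode {
    /// Parse a wire code; unrecognized codes fall back to `Internal`.
    pub fn from_wire(code: &str) -> Self {
        match code {
            "NOT_FOUND" => Self::NotFound,
            "FORBIDDEN" => Self::Forbidden,
            "INVALID_INPUT" => Self::InvalidInput,
            "TIMEOUT" => Self::Timeout,
            _ => Self::Internal, // "INTERNAL" itself, plus anything unknown
        }
    }

    pub fn as_wire(self) -> &'static str {
        match self {
            Self::NotFound => "NOT_FOUND",
            Self::Forbidden => "FORBIDDEN",
            Self::InvalidInput => "INVALID_INPUT",
            Self::Internal => "INTERNAL",
            Self::Timeout => "TIMEOUT",
        }
    }

    /// Retry hint when nothing more specific is known (assumption: only TIMEOUT).
    pub fn default_retryable(self) -> bool {
        matches!(self, Self::Timeout)
    }
}
```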
### ResponseEnvelope

```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;

pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    #[serde(skip_serializing_if = "Option::is_none")] // optional on the wire
    pub details: Option<Value>,
}
```

Local dispatch produces `ResponseEnvelope` with no serialization overhead. The
CallAdapter converts it to `EventEnvelope` for the wire.

### ResponseEnvelope → EventEnvelope conversion

| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |

For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.

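A dependency-free sketch of the conversion table as a `From` impl. `Payload` is a hypothetical stand-in for the `serde_json::Value` payloads (`{ "output": ... }`, a serialized `CallError`, or `{}`), outputs are plain `String`s, and `CallError` omits `details`; only the event-type and `id` mapping is meant literally.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    Output(String),   // { "output": <value> }
    Error(CallError), // serialized CallError
    Empty,            // {}
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub r#type: &'static str,
    pub id: String,
    pub payload: Payload,
}

pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<String, CallError>,
}

impl From<ResponseEnvelope> for EventEnvelope {
    fn from(response: ResponseEnvelope) -> Self {
        match response.result {
            Ok(output) => EventEnvelope {
                r#type: "call.responded",
                id: response.request_id,
                payload: Payload::Output(output),
            },
            Err(error) => EventEnvelope {
                r#type: "call.error",
                id: response.request_id,
                payload: Payload::Error(error),
            },
        }
    }
}

/// End-of-stream marker sent after a subscription's last `call.responded`.
pub fn completed(id: String) -> EventEnvelope {
    EventEnvelope { r#type: "call.completed", id, payload: Payload::Empty }
}
```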
### Framing

Length-prefixed JSON: 4-byte big-endian length prefix + UTF-8 JSON body.

Implement:
- `FrameFramedReader` — reads length-prefixed frames from an async reader
(RecvStream)
- `FrameFramedWriter` — writes length-prefixed frames to an async writer
(SendStream)

```rust
pub struct FrameFramedReader<R: AsyncRead + Unpin> { /* ... */ }
impl<R: AsyncRead + Unpin> FrameFramedReader<R> {
    pub fn new(reader: R) -> Self;
    pub async fn read_frame(&mut self) -> Result<EventEnvelope, FrameError>;
}

pub struct FrameFramedWriter<W: AsyncWrite + Unpin> { /* ... */ }
impl<W: AsyncWrite + Unpin> FrameFramedWriter<W> {
    pub fn new(writer: W) -> Self;
    pub async fn write_frame(&mut self, envelope: &EventEnvelope) -> Result<(), FrameError>;
}
```

This is the same framing used by irpc. The Rust implementation in alknet-call is
canonical (ADR-005, ADR-013).

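A blocking sketch of the frame format, with `std::io::Read`/`Write` standing in for `AsyncRead`/`AsyncWrite`. It moves raw JSON text instead of parsing into `EventEnvelope` (so it has no `Json` error variant), and `MAX_FRAME_LEN` is a hypothetical cap the spec does not define. Without some cap, a corrupt or hostile length prefix could demand an allocation of up to 4 GiB.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on a frame body (not part of the spec).
pub const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

#[derive(Debug)]
pub enum FrameError {
    Io(io::Error),
    ConnectionClosed,
    InvalidFrame,
}

/// Write one frame: 4-byte big-endian length prefix, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(writer: &mut W, json: &str) -> Result<(), FrameError> {
    if json.len() > MAX_FRAME_LEN {
        return Err(FrameError::InvalidFrame);
    }
    let len = json.len() as u32; // cannot truncate: MAX_FRAME_LEN < u32::MAX
    writer.write_all(&len.to_be_bytes()).map_err(FrameError::Io)?;
    writer.write_all(json.as_bytes()).map_err(FrameError::Io)
}

/// Read one frame; EOF before the frame is complete maps to ConnectionClosed.
pub fn read_frame<R: Read>(reader: &mut R) -> Result<String, FrameError> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix).map_err(eof_as_closed)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        return Err(FrameError::InvalidFrame);
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body).map_err(eof_as_closed)?;
    String::from_utf8(body).map_err(|_| FrameError::InvalidFrame)
}

fn eof_as_closed(err: io::Error) -> FrameError {
    if err.kind() == io::ErrorKind::UnexpectedEof {
        FrameError::ConnectionClosed
    } else {
        FrameError::Io(err)
    }
}
```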
### ResponseEnvelope helper methods

```rust
impl ResponseEnvelope {
    pub fn ok(request_id: String, output: Value) -> Self;
    pub fn error(request_id: String, error: CallError) -> Self;
    pub fn not_found(request_id: String, op_name: &str) -> Self;
    pub fn forbidden(request_id: String, message: &str) -> Self;
}
```

### FrameError

```rust
use std::io;

#[derive(Debug)]
pub enum FrameError {
    Io(io::Error),
    Json(serde_json::Error),
    ConnectionClosed, // stream ended before a complete frame (includes truncated frames)
    InvalidFrame,
}
```

## Acceptance Criteria

- [ ] `EventEnvelope` struct with type, id, payload fields
- [ ] `ResponseEnvelope` struct with request_id, result fields
- [ ] `CallError` struct with code, message, retryable, details fields
- [ ] `FrameError` enum with Io, Json, ConnectionClosed, InvalidFrame
- [ ] `FrameFramedReader` reads length-prefixed JSON frames
- [ ] `FrameFramedWriter` writes length-prefixed JSON frames
- [ ] 4-byte big-endian length prefix + UTF-8 JSON body
- [ ] `ResponseEnvelope::ok()`, `error()`, `not_found()`, `forbidden()` helpers
- [ ] `ResponseEnvelope` → `EventEnvelope` conversion (Ok → call.responded, Err → call.error)
- [ ] Unit test: write frame, read frame, round-trip EventEnvelope
- [ ] Unit test: ResponseEnvelope::ok produces correct EventEnvelope
- [ ] Unit test: ResponseEnvelope::error produces correct call.error EventEnvelope
- [ ] Unit test: framing handles large payloads
- [ ] Unit test: framing detects truncated frames (ConnectionClosed error)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/call-protocol.md — EventEnvelope, wire format, event types
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (CallError, details)

## Notes

> The envelope is always JSON for cross-language compatibility. Binary
> payloads are base64-encoded within the payload field (handler concern, not
> protocol concern). The 4-byte big-endian length prefix is the same framing
> irpc uses. operationId on the wire has a leading slash; the registry stores
> names without it — the CallAdapter strips it before lookup.

## Summary

> To be filled on completion
202
tasks/call/registry/handler-registration.md
Normal file
@@ -0,0 +1,202 @@
---
id: call/registry/handler-registration
name: Implement Handler, HandlerRegistration, OperationProvenance, OperationRegistry, and OperationRegistryBuilder
status: pending
depends_on: [call/registry/operation-context]
scope: broad
risk: medium
impact: component
level: implementation
---

## Description

Implement the handler registration types and the operation registry in
`src/registry/registration.rs`. The registry maps operation names to
registration bundles and provides the dispatch entry point.

### Handler

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;

use serde_json::Value;

pub type Handler = Arc<
    dyn Fn(Value, OperationContext) -> Pin<Box<dyn Future<Output = ResponseEnvelope> + Send>>
        + Send + Sync
>;
```

Handlers are async. They receive:
- `input: Value` — the operation input; for wire calls, the `input` field of the `call.requested` payload (always `serde_json::Value`)
- `context: OperationContext` — request ID, identity, metadata, env

They return a `ResponseEnvelope`. That type belongs to the protocol/wire-types
task, which this task does not depend on: use a forward reference, or define a
minimal version here and let the wire task supply the full implementation.

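A self-contained sketch of building a value with the `Handler` shape from an `async` block and driving it to completion. `OperationContext` and `ResponseEnvelope` are trimmed stand-ins, `String` replaces `serde_json::Value`, and `block_on` is a toy busy-polling executor included only so the example runs without a runtime crate; real dispatch would await the future on the node's async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-ins (assumptions) for the real OperationContext / ResponseEnvelope.
pub struct OperationContext {
    pub request_id: String,
}

#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<String, String>,
}

pub type HandlerFuture = Pin<Box<dyn Future<Output = ResponseEnvelope> + Send>>;
// Same shape as the spec's alias, with the stand-in types.
pub type Handler = Arc<dyn Fn(String, OperationContext) -> HandlerFuture + Send + Sync>;

/// The explicit return type lets the boxed async block coerce to the trait object.
pub fn echo_handler() -> Handler {
    Arc::new(|input: String, ctx: OperationContext| -> HandlerFuture {
        Box::pin(async move {
            ResponseEnvelope { request_id: ctx.request_id, result: Ok(input) }
        })
    })
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Demo-only executor: busy-polls a future that never waits on I/O.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```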
### HandlerRegistration

```rust
pub struct HandlerRegistration {
    pub spec: OperationSpec,
    pub handler: Handler,
    pub provenance: OperationProvenance,
    pub composition_authority: Option<CompositionAuthority>, // None for leaves
    pub scoped_env: Option<ScopedOperationEnv>,              // None for leaves
    pub capabilities: Capabilities,
}
```

The registration bundle carries everything the dispatch path needs to
construct an `OperationContext`. See ADR-022.

### OperationProvenance

```rust
pub enum OperationProvenance {
    Local,          // Assembly-written, trusted, can compose
    FromOpenAPI,    // HTTP forwarding stub, leaf
    FromMCP,        // MCP forwarding stub, leaf
    FromCall,       // QUIC forwarding stub, leaf locally
    FromJsonSchema, // JSON Schema definition, no handler — schema only
    Session,        // Agent-written, sandboxed, can compose within sandbox
}
```

| Provenance | Can compose? | Has composition authority? | Default visibility |
|-----------|-------------|---------------------------|-------------------|
| `Local` | Yes | Yes | External or Internal (assembly declares) |
| `FromOpenAPI` | No (leaf) | No | Internal |
| `FromMCP` | No (leaf) | No | Internal |
| `FromCall` | No (leaf in local registry) | No | Internal |
| `FromJsonSchema` | N/A (no handler) | No | N/A |
| `Session` | Yes (within sandbox) | Yes | Internal always |

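One way to keep the table next to the code is a pair of helper methods on the enum. Both methods and the `Visibility` enum are hypothetical; the spec only defines the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationProvenance {
    Local,
    FromOpenAPI,
    FromMCP,
    FromCall,
    FromJsonSchema,
    Session,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    External,
    Internal,
}

impl OperationProvenance {
    /// "Can compose?" column: only Local and Session handlers compose.
    pub fn can_compose(self) -> bool {
        matches!(self, Self::Local | Self::Session)
    }

    /// "Default visibility" column. `None` means the assembly declares it
    /// (Local) or there is no handler to expose (FromJsonSchema).
    pub fn default_visibility(self) -> Option<Visibility> {
        match self {
            Self::Local | Self::FromJsonSchema => None,
            Self::FromOpenAPI | Self::FromMCP | Self::FromCall | Self::Session => {
                Some(Visibility::Internal)
            }
        }
    }
}
```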
### OperationRegistry

```rust
pub struct OperationRegistry {
    operations: HashMap<String, HandlerRegistration>,
}
```

The curated layer (Layer 0) is a `HashMap<String, HandlerRegistration>`. Session
and connection overlays (Layers 1 and 2) are separate maps composed into the
per-call `OperationContext.env` by the CallAdapter (ADR-024).

Methods:
- `register(registration)`: add to curated layer at startup
- `registration(name)`: find by operation name (checks active overlays first,
then curated base — ADR-024). Returns spec, handler, provenance, composition
authority, scoped env, capabilities.
- `invoke(name, input, context)`: look up, check visibility and ACL, invoke handler, return result
- `list_operations()`: return all registered specs (for `/services/list` —
returns curated + active overlay ops, External only)

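The overlays-first rule for `registration(name)` can be sketched as a generic lookup. The function is hypothetical and assumes the caller passes overlays in the order they should be consulted (ADR-024 governs the real order). Under this rule an overlay entry shadows a curated entry with the same name.

```rust
use std::collections::HashMap;

/// Overlays-first lookup: consult each active overlay in order, then the
/// curated Layer 0 map. Generic over the registration type `R`.
pub fn lookup<'a, R>(
    overlays: &[&'a HashMap<String, R>],
    curated: &'a HashMap<String, R>,
    name: &str,
) -> Option<&'a R> {
    overlays
        .iter()
        .copied()
        .find_map(|layer| layer.get(name))
        .or_else(|| curated.get(name))
}
```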
### OperationRegistryBuilder

Fluent API with convenience methods:

```rust
pub struct OperationRegistryBuilder {
    operations: HashMap<String, HandlerRegistration>,
}

impl OperationRegistryBuilder {
    pub fn new() -> Self; // also implement Default (clippy::new_without_default)

    // with_local: Local provenance, full bundle — all 5 args required
    pub fn with_local(
        mut self,
        spec: OperationSpec,
        handler: Handler,
        composition_authority: Option<CompositionAuthority>,
        scoped_env: Option<ScopedOperationEnv>,
        capabilities: Capabilities,
    ) -> Self;

    // with_leaf: leaf provenance (FromOpenAPI/FromMCP/FromCall), no authority, no scoped env
    pub fn with_leaf(
        mut self,
        spec: OperationSpec,
        handler: Handler,
        capabilities: Capabilities,
    ) -> Self;

    // with: full manual registration (any provenance)
    pub fn with(mut self, registration: HandlerRegistration) -> Self;

    pub fn build(self) -> OperationRegistry;
}
```

`with_local` sets `provenance: Local`. `with_leaf` sets `composition_authority:
None` and `scoped_env: None`. As declared, `with_leaf` takes no provenance
argument, so it either defaults to `FromOpenAPI` or needs a provenance parameter
before it can register `FromMCP` and `FromCall` leaves. `with` takes the full
bundle for any provenance.

### Registry invoke flow

```rust
impl OperationRegistry {
    pub async fn invoke(&self, name: &str, input: Value, context: OperationContext) -> ResponseEnvelope {
        // 1. Look up registration by name; if absent, return NOT_FOUND
        // 2. Check visibility: if Internal and context is external (internal: false), return NOT_FOUND
        // 3. Check ACL: access_control.check(identity or handler_identity depending on internal flag)
        // 4. If denied: return FORBIDDEN
        // 5. Invoke handler: (handler)(input, context).await
        // 6. Return ResponseEnvelope
    }
}
```

The ACL authority depends on `context.internal`:
- `internal: false` (wire call): check against `context.identity` (caller)
- `internal: true` (composition): check against `context.handler_identity.as_identity()`

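A synchronous sketch of steps 1 to 5 and the authority switch. Identities are modelled as scope sets, the ACL as a set of required scopes, and handlers as plain functions; all of these types are simplifying assumptions, not the crate's API.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    External,
    Internal,
}

pub struct Registration {
    pub visibility: Visibility,
    pub required_scopes: HashSet<&'static str>,
    pub handler: fn(&str) -> String,
}

pub struct Context {
    pub identity: Option<HashSet<&'static str>>,         // the caller
    pub handler_identity: Option<HashSet<&'static str>>, // composition authority
    pub internal: bool,
}

pub fn invoke(
    ops: &HashMap<&str, Registration>,
    name: &str,
    input: &str,
    ctx: &Context,
) -> Result<String, &'static str> {
    // 1. Unknown operation.
    let reg = ops.get(name).ok_or("NOT_FOUND")?;
    // 2. Internal ops are invisible to wire callers: NOT_FOUND, not FORBIDDEN.
    if reg.visibility == Visibility::Internal && !ctx.internal {
        return Err("NOT_FOUND");
    }
    // 3. Authority switch: composition is checked against the handler's authority.
    let authority = if ctx.internal { &ctx.handler_identity } else { &ctx.identity };
    // 4. No authority, or missing scopes, means FORBIDDEN.
    let allowed = authority
        .as_ref()
        .map_or(false, |scopes| reg.required_scopes.is_subset(scopes));
    if !allowed {
        return Err("FORBIDDEN");
    }
    // 5. Run the handler and return its output.
    Ok((reg.handler)(input))
}
```

In the usage below, the same caller scopes that allow a wire read are ignored once `internal` is true, because only the handler's authority counts.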
### Layer 0 immutability

The curated layer (Layer 0 — `Local` provenance ops) is immutable after
construction. Adding a `Local` op requires restarting the process. Session and
imported overlays are dynamic at their respective scopes (ADR-024). The
`OperationRegistryBuilder` is Layer-0-only; runtime overlay registration uses
`CallConnection::register_imported()` (in the protocol/connection task).

## Acceptance Criteria

- [ ] `Handler` type alias (async closure returning ResponseEnvelope)
- [ ] `HandlerRegistration` struct with all 6 fields
- [ ] `OperationProvenance` enum with all 6 variants
- [ ] `OperationRegistry` struct with operations HashMap
- [ ] `OperationRegistry::register()` adds to curated layer
- [ ] `OperationRegistry::registration()` looks up by name
- [ ] `OperationRegistry::invoke()` checks visibility, ACL, invokes handler
- [ ] `OperationRegistry::list_operations()` returns External specs only
- [ ] `OperationRegistryBuilder` with `new()`, `with_local()`, `with_leaf()`, `with()`, `build()`
- [ ] `with_local` sets provenance Local, requires all 5 args
- [ ] `with_leaf` sets provenance leaf, composition_authority None, scoped_env None
- [ ] invoke: Internal op called externally → NOT_FOUND (not FORBIDDEN)
- [ ] invoke: ACL denied → FORBIDDEN
- [ ] invoke: internal: true → ACL against handler_identity, not identity
- [ ] invoke: internal: false → ACL against identity
- [ ] Unit test: register and invoke a simple operation
- [ ] Unit test: Internal op returns NOT_FOUND from external call
- [ ] Unit test: ACL check with sufficient scopes → Allowed
- [ ] Unit test: ACL check with insufficient scopes → Forbidden
- [ ] Unit test: builder with_local and with_leaf produce correct provenance
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/operation-registry.md — Handler, HandlerRegistration, OperationRegistry, builder
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, immutability)

## Notes

> The registry is the dispatch core. The ACL authority switch (internal: true
> → handler_identity, internal: false → identity) is the ADR-015 privilege
> model — get this right. Internal ops return NOT_FOUND from the wire (don't
> leak existence), not FORBIDDEN. The builder is Layer-0-only; runtime overlay
> registration is via CallConnection (protocol task).

## Summary

> To be filled on completion
204
tasks/call/registry/operation-context.md
Normal file
@@ -0,0 +1,204 @@
---
id: call/registry/operation-context
name: Implement OperationContext, AbortPolicy, CompositionAuthority, and ScopedOperationEnv
status: pending
depends_on: [call/registry/operation-spec, core/core-types]
scope: broad
risk: high
impact: component
level: implementation
---

## Description

Implement the operation context types in `src/registry/context.rs`. This is
the highest-density task in the call crate — `OperationContext` has 11 fields,
each tied to an ADR. The authority-switch semantics (`internal: true` → ACL
against `handler_identity`, not `identity`) are where ADR-015, ADR-022, and
ADR-024 converge.

**Read ADR-015, ADR-022, and ADR-024 before starting this task.**

### OperationContext

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;

use serde_json::Value;

pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,                     // Caller's identity (inbound)
    pub handler_identity: Option<CompositionAuthority>, // Handler's composition authority (ADR-022)
    pub capabilities: Capabilities,
    pub metadata: HashMap<String, Value>,
    pub scoped_env: ScopedOperationEnv,                 // Reachability set (data, ADR-022)
    pub env: Arc<dyn OperationEnv + Send + Sync>,       // Composition dispatch trait (ADR-024)
    pub abort_policy: AbortPolicy,                      // ADR-016 Decision 6
    pub deadline: Option<Instant>,
    pub(crate) internal: bool,                          // Not visible outside alknet-call (ADR-015)
}
```

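A sketch of how a composed call could derive its child context, following the field notes in this task: a fresh `parent_id-counter` request ID, `parent_request_id` set, the parent handler's authority as the child's caller identity, fresh metadata, an inherited deadline, and `internal: true`. The trimmed `Ctx` struct and the `child_context` helper are hypothetical, and identities are plain strings.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Trimmed stand-in: only the fields the derivation touches.
#[derive(Debug, Clone)]
pub struct Ctx {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<String>,
    pub handler_identity: Option<String>,
    pub metadata: HashMap<String, String>,
    pub deadline: Option<Instant>,
    pub internal: bool,
}

/// `callee_authority` is the composition authority from the callee's registration.
pub fn child_context(parent: &Ctx, counter: &AtomicU64, callee_authority: Option<String>) -> Ctx {
    // fetch_add hands out distinct numbers even to concurrent siblings.
    let n = counter.fetch_add(1, Ordering::Relaxed);
    Ctx {
        request_id: format!("{}-{}", parent.request_id, n),
        parent_request_id: Some(parent.request_id.clone()),
        identity: parent.handler_identity.clone(), // caller = parent handler's authority
        handler_identity: callee_authority,
        metadata: HashMap::new(),  // metadata does not propagate
        deadline: parent.deadline, // inherited, not a fresh 30s
        internal: true,            // originated from composition
    }
}
```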
Field-by-field:

- `request_id`: correlates with the `id` field of the `call.requested` event.
For wire calls, this is the client-generated ID. For composed calls, it is
generated by `OperationEnv::invoke()` via `generate_request_id()` (UUID v4 or
`parent_id + "-" + counter`). **Deterministic IDs that can repeat must not be
used**: they collide across concurrent invocations, corrupting
PendingRequestMap and the abort-cascade tree.
- `parent_request_id`: set when this call was initiated by another operation
(via OperationEnv). Records the agency chain — the call tree is the
principal→agent chain (ADR-015).
- `identity`: the authenticated caller (from IdentityProvider) — inbound auth
(who is calling me). For external calls, who sent `call.requested`. For
internal calls, the parent handler's `handler_identity` (propagated through
`OperationEnv::invoke()`).
- `handler_identity`: the composition authority of the handler processing this
call. `None` for leaves (FromOpenAPI/FromMCP/FromCall) — they don't compose.
`Some(...)` for Local/Session ops. For internal calls (`internal: true`), ACL
checks against this authority (ADR-015, ADR-022). This is NOT a peer Identity
— it's a declared authority bundle set at registration.
- `capabilities`: outbound credentials the handler may use (decrypted API keys,
scoped vault access). From the registration bundle (ADR-022).
- `metadata`: request-scoped context (tracing IDs, connection info). **Must not
hold secret material** (ADR-014). **Does not propagate through
`OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing
link is `parent_request_id`, not metadata propagation.
- `scoped_env`: the reachability set — operations this handler may compose.
Populated from the registration bundle (ADR-022). This is *data* (a struct),
not a dispatch trait. Empty for leaves (the field itself is not an `Option`).
- `env`: the composition dispatch trait (`Arc<dyn OperationEnv + Send + Sync>`).
A handler calls `context.env.invoke(...)` to compose children. This is a
trait object, not a concrete struct — enables registry layering (ADR-024).
- `abort_policy`: for this call's descendants (ADR-016 Decision 6). Default
`AbortDependents`. `ContinueRunning` is opt-in for long-running work. Set by
the composing handler via `invoke()`, not by the wire caller.
- `deadline`: for this call and all descendants. Set by `build_root_context`
to `now + CallAdapter.default_timeout` (default 30s). Composed calls inherit
the parent's deadline (children do NOT get a fresh 30s). `None` = unbounded
(long-running subscriptions).
- `internal`: when `true`, this call originated from composition (a handler
calling another operation via OperationEnv), not from a wire request. This
switches the authority context: ACL runs against `handler_identity`, not
|
||||
`identity`. Module-private for writes; read via `is_internal()`. Only set by
|
||||
`OperationEnv::invoke()` (true) or `CallAdapter` dispatch path (false).

### AbortPolicy

```rust
// `#[derive(Default)]` with `#[default]` replaces a hand-written `impl Default`,
// which clippy's `derivable_impls` lint flags. `Clone` is required by
// `parent.abort_policy.clone()` in `OperationEnv::invoke()`.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub enum AbortPolicy {
    #[default]
    AbortDependents, // default — abort cascades to all non-terminal descendants
    ContinueRunning, // opt-in — started descendants continue, unstarted aborted
}
```
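
The following std-only sketch shows how the two policies could decide which descendants an abort reaches. `ChildState` and `should_abort` are hypothetical names. The real cascade belongs to the abort-cascade task, and reading "started" as "handler running" is an assumption.

```rust
#[derive(Debug, Clone, Default, PartialEq, Eq)]
enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

// Hypothetical lifecycle of a descendant call at the moment an ancestor aborts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChildState {
    Unstarted, // queued, handler not yet running
    Running,   // handler executing ("started")
    Finished,  // terminal: completed, failed, or already aborted
}

/// Should the abort cascade reach this descendant under `policy`?
fn should_abort(policy: &AbortPolicy, state: ChildState) -> bool {
    match (policy, state) {
        (_, ChildState::Finished) => false, // terminal calls are never touched
        (AbortPolicy::AbortDependents, _) => true,
        (AbortPolicy::ContinueRunning, ChildState::Unstarted) => true,
        (AbortPolicy::ContinueRunning, ChildState::Running) => false,
    }
}
```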

### CompositionAuthority

```rust
#[derive(Debug, Clone)]
pub struct CompositionAuthority {
    pub label: String,                           // e.g., "agent-chat" — not a peer id
    pub scopes: Vec<String>,                     // e.g., ["llm:call", "fs:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["vastai"]}
}

impl CompositionAuthority {
    pub fn none() -> Option<Self> { None } // Convenience for leaves

    pub fn new(label: &str, scopes: impl IntoIterator<Item = String>) -> Self {
        Self {
            label: label.to_owned(),
            scopes: scopes.into_iter().collect(),
            resources: HashMap::new(),
        }
    }

    pub fn as_identity(&self) -> Option<Identity> { ... } // Synthetic Identity for ACL
}
```

The declared authority the handler operates under when composing children.
`None` for leaves. This replaces ADR-015's `handler_identity: Identity` — it's
not a peer identity, it's a declared authority bundle. See ADR-022.

`as_identity()` produces a synthetic `Identity` from the authority (label as
id, scopes, resources) for ACL checking against `AccessControl`.
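
Here is a sketch of `as_identity()`. It assumes a hypothetical `Identity` with `id`, `scopes`, and `resources` fields; the real alknet-core type may differ. The task's signature returns `Option<Identity>`, but this sketch always returns `Some` because the task does not say when `None` should occur.

```rust
use std::collections::HashMap;

// Hypothetical Identity shape; the real type lives in alknet-core and may differ.
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
}

#[derive(Debug, Clone)]
struct CompositionAuthority {
    label: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
}

impl CompositionAuthority {
    /// Synthetic Identity for ACL checks: the label becomes the id,
    /// and scopes and resources are copied across unchanged.
    fn as_identity(&self) -> Option<Identity> {
        Some(Identity {
            id: self.label.clone(),
            scopes: self.scopes.clone(),
            resources: self.resources.clone(),
        })
    }
}
```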

### ScopedOperationEnv

```rust
#[derive(Debug, Clone, Default)]
pub struct ScopedOperationEnv {
    allowed: HashSet<String>, // operation names this handler may reach
}

impl ScopedOperationEnv {
    pub fn empty() -> Self {
        Self::default()
    }

    pub fn new(ops: impl IntoIterator<Item = impl Into<String>>) -> Self {
        Self { allowed: ops.into_iter().map(Into::into).collect() }
    }

    /// Is this op in the reachability set?
    pub fn allows(&self, name: &str) -> bool {
        self.allowed.contains(name)
    }
}
```

The reachability set — the operations this handler may reach via `env.invoke()`.
Populated from the registration bundle (ADR-022). This is *data*, not a dispatch
trait. The reachability check in `OperationEnv::invoke()` consults
`scoped_env.allows(&name)`. Empty for leaves.

### OperationContext methods

```rust
impl OperationContext {
    pub fn is_internal(&self) -> bool { self.internal }
}
```

The `internal` field is `pub(crate)`, so code outside alknet-call cannot set
it. Within the crate, only `OperationEnv::invoke()` and the `CallAdapter`
dispatch path set it. Handlers read it via `is_internal()`.

### generate_request_id

```rust
pub(crate) fn generate_request_id() -> String {
    // UUID v4 — must be unique across concurrent invocations.
    // Deterministic IDs (e.g., format!("env-{name}")) MUST NOT be used.
    uuid::Uuid::new_v4().to_string()
}
```

Use the `uuid` crate (already a dependency; `Uuid::new_v4()` requires its `v4`
feature). This is crate-internal and is called by `OperationEnv::invoke()` for
composed calls.
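
A std-only model shows why deterministic IDs break correlation. The `HashMap` stands in for PendingRequestMap, and a process-wide counter stands in for UUID v4 (it is unique only within one process). Both stand-ins and all helper names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for the pending-request map: request id -> which invocation owns it.
type PendingMap = HashMap<String, &'static str>;

// The forbidden pattern: the id depends only on the operation name.
fn deterministic_id(name: &str) -> String {
    format!("env-{name}")
}

// Stand-in for UUID v4; use uuid::Uuid::new_v4() in the real crate.
fn unique_id() -> String {
    static NEXT: AtomicU64 = AtomicU64::new(0);
    format!("req-{}", NEXT.fetch_add(1, Ordering::Relaxed))
}

// Registers two in-flight invocations of the same operation and returns how
// many entries survive. A colliding id silently overwrites the first entry.
fn register_two(make_id: impl Fn(&str) -> String) -> usize {
    let mut pending = PendingMap::new();
    pending.insert(make_id("llm/call"), "first invocation");
    pending.insert(make_id("llm/call"), "second invocation");
    pending.len()
}
```

With deterministic IDs, the first invocation's entry is lost. Its response or abort can then no longer be routed.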

## Acceptance Criteria

- [ ] `OperationContext` struct with all 11 fields
- [ ] `internal` field is `pub(crate)` (crate-private for writes)
- [ ] `is_internal()` method exposes read access
- [ ] `AbortPolicy` enum with AbortDependents, ContinueRunning
- [ ] `Default for AbortPolicy` returns `AbortDependents`
- [ ] `CompositionAuthority` struct with label, scopes, resources
- [ ] `CompositionAuthority::none()` returns `None`
- [ ] `CompositionAuthority::new(label, scopes)` constructor
- [ ] `CompositionAuthority::as_identity()` produces synthetic Identity for ACL
- [ ] `ScopedOperationEnv` struct with allowed set
- [ ] `ScopedOperationEnv::empty()`, `new()`, `allows()` methods
- [ ] `generate_request_id()` produces UUID v4 (unique, non-deterministic)
- [ ] Unit test: ScopedOperationEnv::allows (in set → true, not in set → false)
- [ ] Unit test: CompositionAuthority::as_identity produces correct Identity
- [ ] Unit test: AbortPolicy default is AbortDependents
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/operation-registry.md — OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal flag, authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (AbortPolicy)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022 (CompositionAuthority, ScopedOperationEnv)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env as trait object)

## Notes

> **Read ADR-015, ADR-022, and ADR-024 before starting.** This is the
> highest-density task in the call crate. OperationContext has 11 fields, each
> tied to an ADR. The authority-switch semantics (internal: true → ACL against
> handler_identity, not identity) are where three ADRs converge. The `internal`
> field is crate-private for writes — only OperationEnv::invoke() and the
> CallAdapter dispatch path set it. Metadata does NOT propagate through
> composition (security constraint, ADR-014). Request IDs must be unique
> (UUID v4) — deterministic IDs corrupt PendingRequestMap and the abort-cascade tree.

## Summary

> To be filled on completion
225
tasks/call/registry/operation-env.md
Normal file
@@ -0,0 +1,225 @@
---
id: call/registry/operation-env
name: Implement OperationEnv trait, LocalOperationEnv, and CompositeOperationEnv
status: pending
depends_on: [call/registry/handler-registration]
scope: broad
risk: high
impact: component
level: implementation
---

## Description

Implement the `OperationEnv` trait and its implementations in
`src/registry/env.rs`. This is the universal composition mechanism — a handler
calls `context.env.invoke(...)` to compose child operations. The trait-object
design is what enables registry layering (ADR-024).

**Read ADR-024 before starting this task.** The trait-object pattern is
load-bearing — making `OperationEnv` concrete would close off the session-overlay
and connection-overlay patterns.

### OperationEnv trait

```rust
use async_trait::async_trait;

#[async_trait]
pub trait OperationEnv: Send + Sync {
    /// Compose a child operation. The child's OperationContext is constructed
    /// with internal: true, inheriting the parent's composition authority as
    /// the child's caller identity. Abort policy defaults to the parent's.
    async fn invoke(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
    ) -> ResponseEnvelope {
        self.invoke_with_policy(namespace, operation, input, parent, parent.abort_policy.clone())
            .await
    }

    /// Compose with explicit abort policy (ADR-016 Decision 6).
    /// This is the required method — invoke() delegates to it.
    async fn invoke_with_policy(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
        policy: AbortPolicy,
    ) -> ResponseEnvelope;

    /// Does this env contain the named operation? Used by CompositeOperationEnv
    /// to probe overlays before dispatching (ADR-024). The parameter is
    /// underscored because the default body ignores it.
    fn contains(&self, _name: &str) -> bool {
        true
    }
}
```

`invoke()` has a default impl that delegates to `invoke_with_policy()` with
the parent's abort policy, so implementations only need to implement
`invoke_with_policy()`. Overlay implementations should also override
`contains()`. The default `true` is only correct for the curated base. An
overlay that kept it would shadow every operation in the layers below it.

### LocalOperationEnv (Layer 0)

```rust
pub struct LocalOperationEnv {
    registry: Arc<OperationRegistry>,
}

#[async_trait]
impl OperationEnv for LocalOperationEnv {
    async fn invoke_with_policy(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
        policy: AbortPolicy,
    ) -> ResponseEnvelope {
        let name = format!("{namespace}/{operation}");

        // 1. Reachability check (ADR-015, ADR-022): is this op in parent's scoped env?
        if !parent.scoped_env.allows(&name) {
            return ResponseEnvelope::not_found(name);
        }

        // 2. Look up registration
        let registration = self.registry.registration(&name);

        // 3. Construct child OperationContext
        let context = OperationContext {
            request_id: generate_request_id(), // UUID v4 — NOT deterministic
            parent_request_id: Some(parent.request_id.clone()),
            // Authority switch: handler_identity is Option<CompositionAuthority>,
            // so map through as_identity() rather than calling it on the Option.
            identity: parent.handler_identity.as_ref().and_then(CompositionAuthority::as_identity),
            handler_identity: registration.composition_authority.clone(),
            capabilities: parent.capabilities.clone(), // inherit
            metadata: HashMap::new(), // fresh — does NOT propagate parent metadata (ADR-014)
            abort_policy: policy,
            deadline: parent.deadline, // inherit — children don't get fresh 30s
            scoped_env: registration.scoped_env.clone().unwrap_or_else(ScopedOperationEnv::empty),
            env: parent.env.clone(), // inherit the same composite env
            internal: true,          // nested calls use handler authority
        };

        // 4. Dispatch
        self.registry.invoke(&name, input, context).await
    }

    // contains() uses the default (true): the curated registry contains everything it can dispatch
}
```

Key points:

- **Reachability check first**: if the op is not in the parent's `scoped_env`,
  return NOT_FOUND. This bounds the parameterized-dispatch attack surface.
- **Authority propagation**: the child's `identity` is the synthetic identity of
  the parent's `handler_identity` (the parent's composition authority becomes
  the caller). This is the authority switch from ADR-015.
- **Fresh metadata**: `HashMap::new()`, NOT parent's metadata. Security
  constraint (ADR-014) — prevents secret leakage through composition.
- **Inherited deadline**: children don't get a fresh 30s — the root call's
  deadline bounds the entire call tree.
- **Inherited env**: child gets `parent.env.clone()` (the same composite of
  curated base + active overlays).
- **`internal: true`**: this is the flag that switches ACL authority.
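
The child-context rules above can be modelled synchronously with trimmed-down, hypothetical stand-ins (`Ctx`, `Authority`, and a one-field `Identity`). Only the fields the key points mention are kept. The request ID is passed in rather than generated so the example stays deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical, trimmed-down stand-ins for the crate's types.
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
}

#[derive(Debug, Clone)]
struct Authority {
    label: String,
}

impl Authority {
    fn as_identity(&self) -> Option<Identity> {
        Some(Identity { id: self.label.clone() })
    }
}

#[derive(Debug)]
struct Ctx {
    request_id: String,
    parent_request_id: Option<String>,
    identity: Option<Identity>,
    handler_identity: Option<Authority>,
    metadata: HashMap<String, String>,
    deadline: Option<Instant>,
    internal: bool,
}

// Builds the child context following LocalOperationEnv step 3.
fn child_context(parent: &Ctx, child_authority: Option<Authority>, request_id: String) -> Ctx {
    Ctx {
        request_id,
        parent_request_id: Some(parent.request_id.clone()),
        identity: parent.handler_identity.as_ref().and_then(Authority::as_identity), // authority switch
        handler_identity: child_authority, // from the child's registration
        metadata: HashMap::new(),          // fresh, never the parent's
        deadline: parent.deadline,         // inherited, not reset
        internal: true,
    }
}
```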

### CompositeOperationEnv (per-call, ADR-024)

```rust
pub struct CompositeOperationEnv {
    session: Option<Arc<dyn OperationEnv + Send + Sync>>,    // Layer 1
    connection: Option<Arc<dyn OperationEnv + Send + Sync>>, // Layer 2
    base: Arc<dyn OperationEnv + Send + Sync>,               // Layer 0 (LocalOperationEnv)
}

#[async_trait]
impl OperationEnv for CompositeOperationEnv {
    async fn invoke_with_policy(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
        policy: AbortPolicy,
    ) -> ResponseEnvelope {
        let name = format!("{namespace}/{operation}");

        // Reachability check (same as LocalOperationEnv)
        if !parent.scoped_env.allows(&name) {
            return ResponseEnvelope::not_found(name);
        }

        // Dispatch in overlay order: session → connection → curated base.
        // The first overlay that *contains* the op wins.
        if let Some(session) = &self.session {
            if session.contains(&name) {
                return session.invoke_with_policy(namespace, operation, input, parent, policy).await;
            }
        }
        if let Some(connection) = &self.connection {
            if connection.contains(&name) {
                return connection.invoke_with_policy(namespace, operation, input, parent, policy).await;
            }
        }
        self.base.invoke_with_policy(namespace, operation, input, parent, policy).await
    }

    fn contains(&self, name: &str) -> bool {
        // is_some_and instead of map_or(false, ..), which clippy's unnecessary_map_or flags
        self.session.as_ref().is_some_and(|s| s.contains(name))
            || self.connection.as_ref().is_some_and(|c| c.contains(name))
            || self.base.contains(name)
    }
}
```

The `contains()` method (review #003 C9) is the overlay-dispatch contract. It
replaces the previous ambiguous "sentinel or contains check" framing. ADR-024
locks both the structural decision (composite trait object, overlay order,
`Arc::clone` inheritance) and the dispatch contract (a `contains()` probe
before `invoke_with_policy()`).
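
The following synchronous, std-only model illustrates the overlay contract. Each layer reports `contains()`, and the composite dispatches to the first overlay that claims the name, falling back to the base. `Env`, `Layer`, `Composite`, and `layer` are hypothetical stand-ins. The real trait is async, returns `ResponseEnvelope`, and re-runs the reachability check; all of that is omitted here.

```rust
use std::collections::HashSet;

// Synchronous stand-in for OperationEnv: invoke() reports which layer answered.
trait Env {
    fn contains(&self, name: &str) -> bool;
    fn invoke(&self, name: &str) -> String;
}

// A layer backed by a fixed set of operation names.
struct Layer {
    label: &'static str,
    ops: HashSet<&'static str>,
}

fn layer(label: &'static str, ops: &[&'static str]) -> Layer {
    Layer { label, ops: ops.iter().copied().collect() }
}

impl Env for Layer {
    fn contains(&self, name: &str) -> bool {
        self.ops.contains(name)
    }
    fn invoke(&self, name: &str) -> String {
        format!("{} handled {}", self.label, name)
    }
}

struct Composite {
    session: Option<Box<dyn Env>>,
    connection: Option<Box<dyn Env>>,
    base: Box<dyn Env>,
}

impl Env for Composite {
    fn contains(&self, name: &str) -> bool {
        self.session.as_ref().is_some_and(|s| s.contains(name))
            || self.connection.as_ref().is_some_and(|c| c.contains(name))
            || self.base.contains(name)
    }

    // Probe overlays in order (session, then connection); the first that
    // contains the op wins. Otherwise fall through to the base.
    fn invoke(&self, name: &str) -> String {
        for overlay in [&self.session, &self.connection].into_iter().flatten() {
            if overlay.contains(name) {
                return overlay.invoke(name);
            }
        }
        self.base.invoke(name)
    }
}
```

In this example, the session overlay shadows the base for `llm/call`, while `fs/readFile` falls through to the base.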

### Why OperationEnv must remain a trait

The trait-based design enables registry layering (ADR-024):

- The CallAdapter composes the root env per call from curated base + active
  connection/session overlays
- Overlays wrap the base via trait layering
- Session-scoped registries (OQ-19) and connection-scoped remote imports
  (ADR-017 `from_call`) are both overlays on the same base

Making `OperationEnv` concrete or hardcoding the global registry into the
dispatch path would close both patterns. This is the same integration-point
pattern as `IdentityProvider` (ADR-004).

## Acceptance Criteria

- [ ] `OperationEnv` trait with `invoke()`, `invoke_with_policy()`, `contains()`
- [ ] `invoke()` has default impl delegating to `invoke_with_policy()` with parent's policy
- [ ] `contains()` has default impl returning `true`
- [ ] `LocalOperationEnv` struct holding `Arc<OperationRegistry>`
- [ ] `LocalOperationEnv::invoke_with_policy` checks reachability (scoped_env.allows)
- [ ] `LocalOperationEnv` constructs child context with internal: true, authority switch
- [ ] `LocalOperationEnv` fresh metadata (HashMap::new(), not parent's)
- [ ] `LocalOperationEnv` inherited deadline (parent.deadline, not fresh 30s)
- [ ] `LocalOperationEnv` inherited env (parent.env.clone())
- [ ] `CompositeOperationEnv` with session, connection, base fields
- [ ] `CompositeOperationEnv::invoke_with_policy` dispatches in overlay order (session → connection → base)
- [ ] `CompositeOperationEnv` uses `contains()` probe before dispatching to overlay
- [ ] `CompositeOperationEnv::contains` returns true if any layer contains the op
- [ ] Reachability check returns NOT_FOUND if op not in scoped_env
- [ ] Unit test: LocalOperationEnv invoke with allowed op → dispatches
- [ ] Unit test: LocalOperationEnv invoke with disallowed op → NOT_FOUND
- [ ] Unit test: child context has internal: true
- [ ] Unit test: child context identity = parent's handler_identity
- [ ] Unit test: child metadata is fresh (empty), not parent's
- [ ] Unit test: CompositeOperationEnv dispatches to session overlay if contains
- [ ] Unit test: CompositeOperationEnv falls through to base if no overlay contains
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/operation-registry.md — OperationEnv, LocalOperationEnv, CompositeOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (abort policy propagation)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, contains contract)

## Notes

> **Read ADR-024 before starting.** The trait-object design is load-bearing —
> OperationEnv MUST remain a trait, not a concrete type. The authority switch
> (child identity = parent handler_identity) is the ADR-015 privilege model.
> Metadata does NOT propagate (ADR-014 security constraint). Deadline
> inherits (children don't get fresh 30s). The `contains()` probe is the
> overlay-dispatch contract from review #003 C9 — any OperationEnv impl that
> correctly reports contains works with the composite.

## Summary

> To be filled on completion
168
tasks/call/registry/operation-spec.md
Normal file
@@ -0,0 +1,168 @@
---
id: call/registry/operation-spec
name: Implement OperationSpec, OperationType, Visibility, ErrorDefinition, and AccessControl
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement the operation specification types in `src/registry/spec.rs`. These
types declare what an operation is, its schemas, and its access control policy.

### OperationSpec

```rust
pub struct OperationSpec {
    pub name: String,                        // e.g., "fs/readFile", "agent/chat" (no leading slash)
    pub namespace: String,                   // e.g., "fs", "agent"
    pub op_type: OperationType,              // Query, Mutation, Subscription
    pub visibility: Visibility,              // External (wire-callable) or Internal (composition-only)
    pub input_schema: Value,                 // JSON Schema for input
    pub output_schema: Value,                // JSON Schema for output
    pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023)
    pub access_control: AccessControl,
}
```

Operation names use slash-based paths **without a leading slash**, aligned with
URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading
slash is added for display (`spec.path()` returns `/fs/readFile`) and for the
wire format. The registry stores names without the leading slash.

The `namespace` field is derived from the name as the segment before the first
`/`: for `fs/readFile` it's `fs`, for `agent/chat` it's `agent`. It's a
convenience accessor for ACL matching and service grouping.

Implement `OperationSpec::path(&self) -> String` that returns `/{name}` (the
wire/display form with leading slash).
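
The naming rule can be expressed as two free functions. This is a sketch only; the task puts `path()` on `OperationSpec` itself. Treating a name with no `/` as its own namespace is an assumption the spec does not cover.

```rust
/// Wire/display form: the registry name with one leading slash.
fn path(name: &str) -> String {
    format!("/{name}")
}

/// Namespace: everything before the first '/'. A name without '/'
/// is treated as its own namespace (assumption).
fn namespace_of(name: &str) -> &str {
    name.split_once('/').map_or(name, |(ns, _)| ns)
}
```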

### OperationType

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "fs/readFile", "services/list")
    Mutation,     // Side effects (e.g., "bash/exec", "github/authenticate")
    Subscription, // Streaming (e.g., "agent/chat", "events/subscribe")
}
```

### Visibility

```rust
// PartialEq is needed for `s.visibility == Visibility::External` in services/list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    External, // Callable from the wire (call.requested from a client)
    Internal, // Composition-only (env.invoke from a handler)
}
```

`External` operations appear in `services/list` and accept `call.requested`.
`Internal` operations return `NOT_FOUND` when called from the wire and do not
appear in `services/list`. The assembly layer declares visibility at
registration. All import adapters register operations as `Internal` by default
(they're composition material); the handler that composes them is `External`.

### ErrorDefinition

```rust
pub struct ErrorDefinition {
    pub code: String,             // e.g., "FILE_NOT_FOUND", "RATE_LIMITED"
    pub description: String,      // Human-readable description
    pub schema: Value,            // JSON Schema for the error detail payload
    pub http_status: Option<u16>, // HTTP status for adapter projection (from_openapi/to_openapi)
}
```

A declared operation-level error (ADR-023). When a handler returns a `CallError`
whose `code` matches a declared `ErrorDefinition`, the `call.error` event
carries that code and the error's detail payload. If it doesn't match, the
`call.error` carries `INTERNAL`.
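
Below is a sketch of the code-projection rule, using a hypothetical one-field `ErrorDefinition`. The detail payload and HTTP status are left out.

```rust
// Hypothetical, trimmed-down ErrorDefinition: only the code matters here.
struct ErrorDefinition {
    code: String,
}

/// The code a `call.error` event carries for a handler error `code`:
/// the code itself if the operation declares it, otherwise "INTERNAL".
fn projected_code<'a>(declared: &[ErrorDefinition], code: &'a str) -> &'a str {
    if declared.iter().any(|d| d.code == code) {
        code
    } else {
        "INTERNAL"
    }
}
```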

### AccessControl

```rust
// Default is needed for `AccessControl::default()` (no restrictions) in the
// services/list and services/schema specs.
#[derive(Debug, Clone, Default)]
pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked: caller must have ALL
    pub required_scopes_any: Option<Vec<String>>, // OR-checked: caller must have at LEAST ONE
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```

### ACL check flow

When an operation is dispatched:

1. For wire calls (`call.requested`), the registry checks **visibility**: if the
   operation is `Internal`, it returns `NOT_FOUND` (does not leak existence).
   Composed calls from `env.invoke()` skip this step, since `Internal`
   operations exist to be composed.
2. Registry checks `access_control.check(identity)`:
   - For external calls (`internal: false`): ACL against the **caller's identity**
   - For internal calls (`internal: true`): ACL against the **handler's
     composition authority** (ADR-015)
3. If denied: `FORBIDDEN`
4. If identity is `None` and the operation has restrictions: `FORBIDDEN` with
   message `"authentication required"`

Operations with empty `AccessControl` (no required scopes, no resource checks)
are accessible to all callers, including unauthenticated ones.
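
The ordering matters. Because the visibility gate runs before the ACL, a wire caller gets `NOT_FOUND` for an `Internal` operation whether or not it is authorized. It therefore cannot probe existence by comparing `NOT_FOUND` with `FORBIDDEN`. In this std-only sketch, `gate`, `Outcome`, and the `acl` argument are hypothetical; `acl` stands in for the result of `AccessControl::check`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    External,
    Internal,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Dispatch,
    NotFound,
    Forbidden(&'static str),
}

// `acl`: None = allowed, Some(reason) = denied, already evaluated against
// whichever identity the authority context selects.
fn gate(visibility: Visibility, internal: bool, acl: Option<&'static str>) -> Outcome {
    // Step 1 first: wire calls cannot see Internal ops at all.
    if !internal && visibility == Visibility::Internal {
        return Outcome::NotFound;
    }
    // Steps 2-4: the ACL verdict.
    match acl {
        None => Outcome::Dispatch,
        Some(reason) => Outcome::Forbidden(reason),
    }
}
```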

### Implement AccessControl::check

```rust
impl AccessControl {
    pub fn check(&self, identity: Option<&Identity>) -> AccessResult;
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AccessResult {
    Allowed,
    Forbidden(String), // reason
}
```

The check logic:

- `required_scopes`: caller must have ALL (subset check)
- `required_scopes_any`: caller must have at LEAST ONE (if present)
- `resource_type` / `resource_action`: check against `identity.resources`
- If `identity` is `None` and any scope/resource is required: `Forbidden("authentication required")`
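
The following std-only sketch covers the scope half of `check`. It assumes a hypothetical `Identity` that carries only a `HashSet<String>` of scopes. The resource checks are omitted because their matching rule is defined by ADR-015, not here. The `Forbidden` reasons other than `"authentication required"` are placeholders.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for alknet-core's Identity: only scopes are modelled.
struct Identity {
    scopes: HashSet<String>,
}

fn identity_with(scopes: &[&str]) -> Identity {
    Identity { scopes: scopes.iter().map(|s| s.to_string()).collect() }
}

#[derive(Default)]
struct AccessControl {
    required_scopes: Vec<String>,
    required_scopes_any: Option<Vec<String>>,
}

#[derive(Debug, PartialEq, Eq)]
enum AccessResult {
    Allowed,
    Forbidden(String),
}

impl AccessControl {
    fn is_unrestricted(&self) -> bool {
        self.required_scopes.is_empty() && self.required_scopes_any.is_none()
    }

    fn check(&self, identity: Option<&Identity>) -> AccessResult {
        if self.is_unrestricted() {
            return AccessResult::Allowed; // empty ACL: open to all, even unauthenticated
        }
        let Some(id) = identity else {
            return AccessResult::Forbidden("authentication required".into());
        };
        // AND: every required scope must be held.
        if let Some(missing) = self.required_scopes.iter().find(|s| !id.scopes.contains(*s)) {
            return AccessResult::Forbidden(format!("missing scope {missing}"));
        }
        // OR: if alternatives are listed, at least one must be held.
        if let Some(any) = &self.required_scopes_any {
            if !any.iter().any(|s| id.scopes.contains(s)) {
                return AccessResult::Forbidden("no alternative scope held".into());
            }
        }
        AccessResult::Allowed
    }
}
```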

## Acceptance Criteria

- [ ] `OperationSpec` struct with all 8 fields
- [ ] `OperationSpec::path()` returns `/{name}` (leading slash for wire/display)
- [ ] `OperationSpec::namespace` derived from name (split on `/`)
- [ ] `OperationType` enum with Query, Mutation, Subscription
- [ ] `Visibility` enum with External, Internal
- [ ] `ErrorDefinition` struct with all 4 fields
- [ ] `AccessControl` struct with all 4 fields
- [ ] `AccessControl::check(identity)` returns `AccessResult`
- [ ] `required_scopes` is AND-checked (caller must have all)
- [ ] `required_scopes_any` is OR-checked (caller must have at least one)
- [ ] `None` identity with restrictions → `Forbidden("authentication required")`
- [ ] Empty AccessControl → `Allowed` for all callers
- [ ] Unit tests for AccessControl::check (all combinations)
- [ ] Unit test: OperationSpec::path() produces leading slash
- [ ] Unit test: namespace derived correctly from name
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/crates/call/operation-registry.md — OperationSpec, AccessControl, Visibility
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (visibility, ACL)
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (ErrorDefinition)

## Notes

> Operation names have NO leading slash in the registry (`fs/readFile`). The
> leading slash is added for wire format and display (`/fs/readFile`). This is
> a single rule applied consistently — do not mix the two forms. Visibility
> controls wire-callability: Internal ops return NOT_FOUND from the wire (don't
> leak existence). AccessControl.check is the ACL gate — read it carefully
> against ADR-015 for the internal vs external authority distinction.

## Summary

> To be filled on completion
181
tasks/call/registry/service-discovery.md
Normal file
@@ -0,0 +1,181 @@
---
id: call/registry/service-discovery
name: Implement services/list and services/schema built-in operations
status: pending
depends_on: [call/registry/handler-registration]
scope: narrow
risk: low
impact: isolated
level: implementation
---

## Description

Implement the two built-in service discovery operations in
`src/registry/discovery.rs`. These are read-only operations that expose what
the node offers.

### Operations

| Operation name | Display path | Type | Description |
|---------------|-------------|------|-------------|
| `services/list` | `/services/list` | Query | List registered operation names and metadata |
| `services/schema` | `/services/schema` | Query | Get the OperationSpec for a specific operation |

### services/list

Returns `External` operations only. `Internal` operations are not part of the
wire-facing API surface — they're implementation details of composition. A
remote client cannot enumerate the internal call tree (ADR-015).

```json
{
  "operations": [
    { "name": "fs/readFile", "namespace": "fs", "op_type": "query" },
    { "name": "agent/chat", "namespace": "agent", "op_type": "subscription" },
    { "name": "events/subscribe", "namespace": "events", "op_type": "subscription" }
  ]
}
```

The handler queries the registry's `list_operations()` (which returns External
specs only) and serializes to the above format.

### services/schema

Accepts `{ "name": "fs/readFile" }` (no leading slash — registry form, same as
`OperationSpec.name`) and returns the full `OperationSpec` including
input/output JSON Schemas and declared `error_schemas` (ADR-023).

The handler strips a leading slash from `name` before lookup, mirroring how
the CallAdapter normalizes wire `operationId`s, so `services/schema` accepts
both `fs/readFile` and `/fs/readFile`.
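
The normalization is a single optional prefix strip. `registry_name` is a hypothetical helper name.

```rust
/// Normalizes a `services/schema` `name` argument to registry form
/// by stripping one optional leading slash.
fn registry_name(name: &str) -> &str {
    name.strip_prefix('/').unwrap_or(name)
}
```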

This enables client code generation: a client reading the schema can produce
typed error enums instead of generic error handling.

### Registration

These are registered as `Local` provenance with no composition authority, an
empty scoped env, and empty capabilities (they don't compose and don't need
credentials):

```rust
// `registry` is the Arc<OperationRegistry> the handlers read from (see Handlers).
// The handler constructors already return an Arc'd Handler, so no extra Arc::new.
.with_local(services_list_spec(), services_list_handler(registry.clone()),
    CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())
.with_local(services_schema_spec(), schema_handler(registry.clone()),
    CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())
```

### Specs

```rust
fn services_list_spec() -> OperationSpec {
    OperationSpec {
        name: "services/list".into(),
        namespace: "services".into(),
        op_type: OperationType::Query,
        visibility: Visibility::External,
        input_schema: json!({}), // no input
        output_schema: json!({
            "type": "object",
            "properties": {
                "operations": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": { "type": "string" },
                            "namespace": { "type": "string" },
                            "op_type": { "type": "string", "enum": ["query", "mutation", "subscription"] }
                        }
                    }
                }
            }
        }),
        error_schemas: vec![],
        access_control: AccessControl::default(), // no restrictions — callable by all
    }
}

fn services_schema_spec() -> OperationSpec {
    OperationSpec {
        name: "services/schema".into(),
        namespace: "services".into(),
        op_type: OperationType::Query,
        visibility: Visibility::External,
        input_schema: json!({
            "type": "object",
            "properties": { "name": { "type": "string" } },
            "required": ["name"]
        }),
        output_schema: json!({ /* full OperationSpec schema */ }),
        error_schemas: vec![],
        access_control: AccessControl::default(),
    }
}
```

### Handlers

The handlers need access to the registry. Since handlers are `Arc<dyn Fn>`,
the registry reference is captured in the closure: clone an
`Arc<OperationRegistry>` into it.

```rust
fn services_list_handler(registry: Arc<OperationRegistry>) -> Handler {
    Arc::new(move |_input: Value, ctx: OperationContext| {
        let registry = registry.clone();
        Box::pin(async move {
            let ops: Vec<_> = registry
                .list_operations()
                .into_iter()
                // Defensive: list_operations() already excludes Internal ops.
                .filter(|s| s.visibility == Visibility::External)
                .map(|s| {
                    json!({
                        "name": s.name,
                        "namespace": s.namespace,
                        "op_type": match s.op_type {
                            OperationType::Query => "query",
                            OperationType::Mutation => "mutation",
                            OperationType::Subscription => "subscription",
                        }
                    })
                })
                .collect();
            ResponseEnvelope::ok(ctx.request_id, json!({ "operations": ops }))
        })
    })
}
```
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `services/list` spec with correct fields (Query, External, no input, output schema)
|
||||
- [ ] `services/schema` spec with correct fields (Query, External, name input, full spec output)
|
||||
- [ ] `services/list` handler returns External operations only (Internal excluded)
|
||||
- [ ] `services/list` output format matches spec (operations array with name, namespace, op_type)
|
||||
- [ ] `services/schema` handler accepts name with or without leading slash
|
||||
- [ ] `services/schema` returns full OperationSpec (input_schema, output_schema, error_schemas)
|
||||
- [ ] `services/schema` returns NOT_FOUND for unknown operation name
|
||||
- [ ] Both registered as Local provenance, empty authority/env/caps
|
||||
- [ ] Both have empty AccessControl (callable by all, including unauthenticated)
|
||||
- [ ] Unit test: services/list returns only External ops
|
||||
- [ ] Unit test: services/schema returns spec for known op
|
||||
- [ ] Unit test: services/schema returns NOT_FOUND for unknown op
|
||||
- [ ] Unit test: services/schema accepts both "fs/readFile" and "/fs/readFile"
|
||||
- [ ] `cargo test -p alknet-call` succeeds
|
||||
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/crates/call/operation-registry.md — Service Discovery section
|
||||
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (Internal not in services/list)
|
||||
|
||||
## Notes
|
||||
|
||||
> services/list returns External ops only — Internal ops are implementation
|
||||
> details of composition and must not be enumerable from the wire. The
|
||||
> The services/schema handler strips a leading slash from its `name` input (mirroring the CallAdapter's operationId normalization), so it accepts both
|
||||
> forms. These are the only built-in operations; no admin operations are
|
||||
> exposed through the call protocol itself.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
137
tasks/call/review-call.md
Normal file
@@ -0,0 +1,137 @@
|
||||
---
|
||||
id: call/review-call
|
||||
name: Review alknet-call implementation for spec conformance and pattern consistency
|
||||
status: pending
|
||||
depends_on: [call/protocol/abort-cascade]
|
||||
scope: broad
|
||||
risk: low
|
||||
impact: phase
|
||||
level: review
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Review the alknet-call implementation for spec conformance, pattern
|
||||
consistency, and correctness. This is the quality checkpoint at the end of the
|
||||
call phase — the most complex crate in this batch.
|
||||
|
||||
### Review Checklist
|
||||
|
||||
1. **Registry conformance** (operation-registry.md):
|
||||
- `OperationSpec` has all 8 fields, `path()` adds leading slash
|
||||
- `OperationType` (Query/Mutation/Subscription), `Visibility` (External/Internal)
|
||||
- `ErrorDefinition` with code, description, schema, http_status (ADR-023)
|
||||
- `AccessControl` with required_scopes (AND), required_scopes_any (OR), resource checks
|
||||
- `AccessControl::check` returns Allowed/Forbidden, None identity with restrictions → Forbidden
|
||||
- `OperationContext` has all 10 fields; `internal` is pub(crate) and exposed read-only through `is_internal()`
|
||||
- `AbortPolicy` (AbortDependents default, ContinueRunning opt-in)
|
||||
- `CompositionAuthority` with label, scopes, resources, `as_identity()`
|
||||
- `ScopedOperationEnv` with `allows()` reachability check
|
||||
- `Handler` type (async closure → ResponseEnvelope)
|
||||
- `HandlerRegistration` with all 6 fields (spec, handler, provenance, authority, scoped_env, caps)
|
||||
- `OperationProvenance` with all 6 variants
|
||||
- `OperationRegistry` with register, registration, invoke, list_operations
|
||||
- `OperationRegistryBuilder` with with_local, with_leaf, with, build
|
||||
- `OperationEnv` trait with invoke, invoke_with_policy, contains
|
||||
- `LocalOperationEnv` reachability check, authority switch, fresh metadata, inherited deadline
|
||||
- `CompositeOperationEnv` overlay dispatch (session → connection → base), contains probe
|
||||
- `services/list` returns External only, `services/schema` returns full spec
|
||||
|
||||
2. **Protocol conformance** (call-protocol.md):
|
||||
- `EventEnvelope` with type, id, payload (JSON, length-prefixed framing)
|
||||
- `ResponseEnvelope` with request_id, result
|
||||
- `CallError` with code, message, retryable, details
|
||||
- 5 event types: call.requested, call.responded, call.completed, call.aborted, call.error
|
||||
- Wire payload schemas match spec table
|
||||
- `call.requested` has operationId (leading slash), input, optional auth_token
|
||||
- `call.error` has protocol-level codes (NOT_FOUND, FORBIDDEN, INVALID_INPUT, INTERNAL, TIMEOUT)
|
||||
- `PendingRequestMap` correlates by ID (not stream), handles all event types
|
||||
- `CallConnection` with Layer 2 overlay, register_imported, overlay_env, call/subscribe/abort
|
||||
- `CallAdapter` implements ProtocolHandler for alknet/call
|
||||
- CallAdapter stream handling (accept_bi loop, FrameFramedReader/Writer)
|
||||
- Per-request identity resolution (auth_token overrides connection-level)
|
||||
- `build_root_context` sets internal: false, deadline, capabilities from registration
|
||||
- `compose_root_env` builds CompositeOperationEnv (base + session + connection)
|
||||
- operationId leading slash stripped before lookup
|
||||
- ResponseEnvelope → EventEnvelope conversion
|
||||
- Timeout: 30s default, composed calls inherit parent deadline
|
||||
- Abort cascade: walks tree by parent_request_id, AbortDependents/ContinueRunning
|
||||
|
||||
3. **ADR conformance**:
|
||||
- ADR-005: irpc framing used
|
||||
- ADR-012: bidirectional streams, ID-based correlation
|
||||
- ADR-014: no secret material on wire, Capabilities non-serializable
|
||||
- ADR-015: internal flag switches authority (handler_identity vs identity), Visibility
|
||||
- ADR-016: abort cascade, AbortPolicy, default AbortDependents
|
||||
- ADR-017: connection direction independent of call direction
|
||||
- ADR-022: registration bundle (provenance, authority, scoped_env, capabilities)
|
||||
- ADR-023: ErrorDefinition, typed details in call.error
|
||||
- ADR-024: registry layering (curated + session + connection), OperationEnv as trait
|
||||
|
||||
4. **Security constraints**:
|
||||
- Capabilities non-serializable (no Serialize derive)
|
||||
- Capabilities zeroized, immutable after construction
|
||||
- Metadata does not propagate through composition (fresh HashMap::new())
|
||||
- Call protocol carries no secret material
|
||||
- Internal ops return NOT_FOUND from wire (don't leak existence)
|
||||
- Reachability check (scoped_env.allows) bounds composition attack surface
|
||||
- Request IDs are UUID v4 (random, so collisions are negligibly unlikely rather than impossible)
|
||||
|
||||
5. **Pattern consistency**:
|
||||
- OperationEnv is a trait (not concrete) — enables layering
|
||||
- CompositeOperationEnv uses contains() probe before dispatch
|
||||
- Authority switch in invoke_with_policy (child identity = parent handler_identity)
|
||||
- Deadline inheritance (children don't get fresh 30s)
|
||||
- ArcSwap not used in call (that's core's pattern)
|
||||
|
||||
6. **Test coverage**:
|
||||
- Unit tests for AccessControl::check (all combinations)
|
||||
- Unit tests for OperationContext construction
|
||||
- Unit tests for OperationEnv (LocalOperationEnv, CompositeOperationEnv)
|
||||
- Unit tests for PendingRequestMap (all event types, timeouts, fail_all)
|
||||
- Unit tests for framing (round-trip, truncation)
|
||||
- Unit tests for abort cascade (both policies, tree walking)
|
||||
- Integration test: call.requested → dispatch → call.responded
|
||||
- Integration test: auth_token overrides identity
|
||||
- Integration test: Internal op → NOT_FOUND from wire
|
||||
- Integration test: ACL denied → FORBIDDEN
|
||||
- Integration test: subscription streaming (multiple responded, completed)
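To make the `AccessControl::check` item in the checklist concrete, here is a minimal sketch of the AND/OR semantics. The types are simplified stand-ins, and `AccessDecision` is a hypothetical name. Resource checks are omitted. It assumes an empty `required_scopes_any` imposes no OR constraint.

```rust
/// Simplified stand-in for the task's Identity (resources omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Simplified stand-in for AccessControl (resource checks omitted).
#[derive(Debug, Clone, Default)]
pub struct AccessControl {
    /// Every listed scope must be held (AND).
    pub required_scopes: Vec<String>,
    /// At least one listed scope must be held, when the list is non-empty (OR).
    pub required_scopes_any: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessDecision {
    Allowed,
    Forbidden,
}

impl AccessControl {
    pub fn check(&self, identity: Option<&Identity>) -> AccessDecision {
        // An empty AccessControl is callable by everyone, including unauthenticated peers.
        if self.required_scopes.is_empty() && self.required_scopes_any.is_empty() {
            return AccessDecision::Allowed;
        }
        // Any restriction combined with no identity is Forbidden.
        let Some(id) = identity else {
            return AccessDecision::Forbidden;
        };
        let all_required = self.required_scopes.iter().all(|s| id.scopes.contains(s));
        let any_required = self.required_scopes_any.is_empty()
            || self.required_scopes_any.iter().any(|s| id.scopes.contains(s));
        if all_required && any_required {
            AccessDecision::Allowed
        } else {
            AccessDecision::Forbidden
        }
    }
}
```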
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] All registry types match operation-registry.md
|
||||
- [ ] All protocol types match call-protocol.md
|
||||
- [ ] All ADRs conformed to (005, 012, 014, 015, 016, 017, 022, 023, 024)
|
||||
- [ ] Capabilities non-serializable, zeroized, immutable
|
||||
- [ ] Metadata does not propagate through composition
|
||||
- [ ] Internal ops return NOT_FOUND from wire
|
||||
- [ ] Reachability check bounds composition
|
||||
- [ ] Request IDs are UUID v4
|
||||
- [ ] OperationEnv is a trait (not concrete)
|
||||
- [ ] CompositeOperationEnv uses contains() probe
|
||||
- [ ] Authority switch correct (internal: true → handler_identity)
|
||||
- [ ] Deadline inheritance correct (children inherit parent deadline)
|
||||
- [ ] Test coverage adequate for all functionality
|
||||
- [ ] `cargo fmt --check -p alknet-call` passes
|
||||
- [ ] `cargo clippy -p alknet-call` passes with no warnings
|
||||
- [ ] All tests pass
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/crates/call/README.md
|
||||
- docs/architecture/crates/call/call-protocol.md
|
||||
- docs/architecture/crates/call/operation-registry.md
|
||||
- docs/architecture/decisions/ (relevant ADRs: 005, 012, 014-017, 022-024)
|
||||
|
||||
## Notes
|
||||
|
||||
> This is the most complex crate in this batch. The review should verify that
|
||||
> the registry layering (ADR-024), authority switch (ADR-015), abort cascade
|
||||
> (ADR-016), and composition model (ADR-022) all work correctly together. The
|
||||
> OperationEnv trait-object design is load-bearing — verify it's a trait, not
|
||||
> concrete. If deviations are found, document and fix before considering the
|
||||
> call crate complete.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
162
tasks/core/auth.md
Normal file
@@ -0,0 +1,162 @@
|
||||
---
|
||||
id: core/auth
|
||||
name: Implement AuthContext, Identity, AuthToken, IdentityProvider trait, and ConfigIdentityProvider
|
||||
status: pending
|
||||
depends_on: [core/core-types]
|
||||
scope: moderate
|
||||
risk: medium
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement the authentication types in `src/auth.rs`. Auth is hybrid: the
|
||||
endpoint resolves what it can (TLS-level), handlers resolve what they need
|
||||
(protocol-level). AuthContext may be partial — handlers complete auth inside
|
||||
`handle()`.
|
||||
|
||||
### AuthContext
|
||||
|
||||
```rust
|
||||
#[derive(Clone)]
|
||||
pub struct AuthContext {
|
||||
pub identity: Option<Identity>,
|
||||
pub alpn: Vec<u8>,
|
||||
pub remote_addr: Option<SocketAddr>,
|
||||
pub tls_client_fingerprint: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
Created by the endpoint for each incoming connection. Passed to
|
||||
`ProtocolHandler::handle()` as an immutable reference.
|
||||
|
||||
- `identity`: peer's authenticated identity, if resolved by the endpoint. None
|
||||
means the endpoint has no identity info for this connection.
|
||||
- `alpn`: negotiated ALPN — always present after TLS handshake.
|
||||
- `remote_addr`: peer's address, if available (may be None for iroh).
|
||||
- `tls_client_fingerprint`: SHA-256 fingerprint of TLS client cert, if presented.
|
||||
|
||||
`AuthContext` is `Clone` (handlers clone for per-stream contexts) and immutable
|
||||
in `handle()` (handlers keep a resolved identity in local variables; they
|
||||
don't mutate the shared context).
|
||||
|
||||
### Identity
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, PartialEq)]
|
||||
pub struct Identity {
|
||||
pub id: String,
|
||||
pub scopes: Vec<String>,
|
||||
pub resources: HashMap<String, Vec<String>>,
|
||||
}
|
||||
```
|
||||
|
||||
The authenticated peer identity. `id` is ALPN-agnostic:
|
||||
- SSH key auth: `"SHA256:abc123..."` (key fingerprint)
|
||||
- API key auth: `"alk_test"` (key prefix)
|
||||
- Certificate auth: `"username"` (principal name)
|
||||
|
||||
### AuthToken
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct AuthToken {
|
||||
pub raw: Vec<u8>,
|
||||
}
|
||||
```
|
||||
|
||||
Opaque authentication token carried in protocol frames. The handler that
|
||||
extracted it knows its encoding.
|
||||
|
||||
### IdentityProvider trait
|
||||
|
||||
```rust
|
||||
pub trait IdentityProvider: Send + Sync + 'static {
|
||||
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
|
||||
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
|
||||
}
|
||||
```
|
||||
|
||||
- `resolve_from_fingerprint()`: used by endpoint (TLS client cert) and SSH (key fingerprint)
|
||||
- `resolve_from_token()`: used by call protocol (AuthToken in first frame) and HTTP (Bearer header)
|
||||
- Both return `Option<Identity>` — None means credential not recognized
|
||||
|
||||
### ConfigIdentityProvider
|
||||
|
||||
```rust
|
||||
pub struct ConfigIdentityProvider {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
```
|
||||
|
||||
The default implementation. Resolves identities from `DynamicConfig` (reads
|
||||
from ArcSwap on every call — hot-reloadable).
|
||||
|
||||
Resolution logic:
|
||||
- **Fingerprint**: look up in `DynamicConfig::auth::authorized_fingerprints`.
|
||||
If found, return `Identity { id: fingerprint, scopes: ["relay:connect"], resources: {} }`.
|
||||
- **Token**: parse as UTF-8. If it starts with `alk_`, look up in
|
||||
`DynamicConfig::auth::api_keys` by prefix match + SHA-256 hash. If found and
|
||||
not expired, return `Identity { id: prefix, scopes: entry.scopes, resources: entry.resources }`.
|
||||
|
||||
Changes to DynamicConfig via ConfigReloadHandle are reflected immediately.
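The token path of the resolution logic can be sketched as follows. This is a minimal, dependency-free sketch under these assumptions:

- `resolve_token` and `hash_hex` are hypothetical names.
- `hash_hex` stands in for a real SHA-256 hex digest, such as one produced with the `sha2` crate.
- `expires_at` is assumed to be Unix seconds.
- A key counts as expired once `now` reaches `expires_at`.
- `resources` is left empty because the `ApiKeyEntry` defined in core/config has no `resources` field.
- `api_keys` is a `Vec`, so the lookup is a linear scan. O(1) lookup would need a map keyed by prefix.

```rust
use std::collections::HashMap;

/// Stand-in for ApiKeyEntry (description omitted).
#[derive(Debug, Clone)]
pub struct ApiKeyEntry {
    pub prefix: String,
    pub hash: String,
    pub scopes: Vec<String>,
    pub expires_at: Option<u64>, // assumed Unix seconds
}

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Length of the non-secret lookup prefix, e.g. "alk_test".
const PREFIX_LEN: usize = 8;

pub fn resolve_token(
    raw: &[u8],
    api_keys: &[ApiKeyEntry],
    now_unix_secs: u64,
    hash_hex: impl Fn(&str) -> String,
) -> Option<Identity> {
    let token = std::str::from_utf8(raw).ok()?; // not UTF-8: not an alk_ token
    if !token.starts_with("alk_") {
        return None;
    }
    let prefix = token.get(..PREFIX_LEN)?; // too short to carry a prefix
    let digest = hash_hex(token);
    // Several entries may share a prefix: filter by prefix, then verify the hash.
    let entry = api_keys
        .iter()
        .filter(|e| e.prefix == prefix)
        .find(|e| e.hash == digest)?;
    if matches!(entry.expires_at, Some(exp) if now_unix_secs >= exp) {
        return None;
    }
    Some(Identity {
        id: entry.prefix.clone(),
        scopes: entry.scopes.clone(),
        resources: HashMap::new(),
    })
}
```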
|
||||
|
||||
### Two Identity Scopes
|
||||
|
||||
There are two distinct identity scopes that must not be conflated:
|
||||
|
||||
| Scope | Where set | Where stored | Represents | Used for |
|
||||
|-------|-----------|--------------|------------|----------|
|
||||
| Connection-level | Handler in `handle()` | `Connection` (via `set_identity`) | Who opened the QUIC connection | Observability, logging |
|
||||
| Per-request | CallAdapter per `call.requested` | `OperationContext.identity` | Who makes this specific call | ACL (ADR-015) |
|
||||
|
||||
The connection-level identity is stable (set once). The per-request identity
|
||||
is dynamic (resolved per call, potentially different across requests). The
|
||||
per-request identity takes precedence for ACL.
|
||||
|
||||
### Security constraints
|
||||
|
||||
- **Token entropy**: generated `alk_` tokens must have ≥128 bits of entropy.
|
||||
The prefix (first 8 chars) is for O(1) lookup and is not secret — it appears
|
||||
in logs by design. Storing a SHA-256 hash of the full token lets anyone who obtains the config test guesses offline; this
|
||||
is safe only if the full token is high-entropy.
|
||||
- **Config reload must be authenticated**: a reload that adds an authorized
|
||||
fingerprint or API key grants access immediately. The reload trigger must be
|
||||
local-only or admin-scoped.
|
||||
- **Connection-level identity is for observability only**: per-request identity
|
||||
takes precedence for ACL.
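The ≥128-bit entropy rule translates into a minimum token length. This worked sketch assumes two things:

- Tokens are `alk_` followed by uniformly random characters.
- The logged 8-char prefix is `alk_` plus 4 random characters. Those 4 characters are then public, so only characters after the prefix count toward entropy.

The function names are hypothetical.

```rust
/// Smallest n such that n * log2(alphabet_size) >= bits.
pub fn min_random_chars(alphabet_size: u32, bits: u32) -> u32 {
    assert!(alphabet_size >= 2, "alphabet must have at least two symbols");
    let per_char = f64::from(alphabet_size).log2();
    // Subtract a tiny epsilon so exact ratios (e.g. 128 / 4) are not
    // bumped up by floating-point rounding.
    ((f64::from(bits) / per_char) - 1e-9).ceil() as u32
}

/// Total token length when the first `logged_prefix_len` chars are public.
pub fn min_token_len(alphabet_size: u32, bits: u32, logged_prefix_len: u32) -> u32 {
    logged_prefix_len + min_random_chars(alphabet_size, bits)
}
```

With a base62 alphabet (about 5.95 bits per character), 128 bits needs 22 random characters after the prefix, or 30 characters in total. With hex it needs 32 after the prefix.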
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `AuthContext` struct with all 4 fields, derives `Clone`
|
||||
- [ ] `Identity` struct with `id`, `scopes`, `resources`, derives `Clone`, `PartialEq`
|
||||
- [ ] `AuthToken` struct with `raw` field, derives `Clone`
|
||||
- [ ] `IdentityProvider` trait with both methods
|
||||
- [ ] `ConfigIdentityProvider` struct holding `Arc<ArcSwap<DynamicConfig>>`
|
||||
- [ ] `ConfigIdentityProvider::resolve_from_fingerprint` looks up in authorized_fingerprints
|
||||
- [ ] `ConfigIdentityProvider::resolve_from_token` parses `alk_` prefix, matches by hash, checks expiry
|
||||
- [ ] ConfigIdentityProvider reads from ArcSwap on every call (hot-reloadable)
|
||||
- [ ] Unit test: fingerprint resolution (known fingerprint → Some, unknown → None)
|
||||
- [ ] Unit test: token resolution (valid non-expired → Some, expired → None, unknown → None)
|
||||
- [ ] Unit test: config reload changes resolution results immediately
|
||||
- [ ] `cargo test -p alknet-core` succeeds
|
||||
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/crates/core/auth.md — all type definitions, resolution flow
|
||||
- docs/architecture/decisions/004-auth-as-shared-core.md — ADR-004
|
||||
- docs/architecture/decisions/011-authcontext-structure.md — ADR-011
|
||||
|
||||
## Notes
|
||||
|
||||
> Auth is hybrid: endpoint resolves TLS-level, handler resolves protocol-level.
|
||||
> AuthContext may be partial (identity = None). The two identity scopes
|
||||
> (connection-level for observability, per-request for ACL) must not be
|
||||
> conflated. ConfigIdentityProvider reads from ArcSwap on every call so config
|
||||
> reloads take effect immediately.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
190
tasks/core/config.md
Normal file
@@ -0,0 +1,190 @@
|
||||
---
|
||||
id: core/config
|
||||
name: Implement StaticConfig, DynamicConfig, AuthPolicy, ApiKeyEntry, ConfigReloadHandle, TlsIdentity
|
||||
status: pending
|
||||
depends_on: [core/core-types]
|
||||
scope: moderate
|
||||
risk: low
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement the configuration types in `src/config.rs`. These are the config
|
||||
structures consumed by the endpoint and the CLI binary. StaticConfig is
|
||||
immutable at startup; DynamicConfig is hot-reloadable via ArcSwap.
|
||||
|
||||
### StaticConfig
|
||||
|
||||
```rust
|
||||
pub struct StaticConfig {
|
||||
pub listen_addr: Option<SocketAddr>,
|
||||
pub tls_identity: Option<TlsIdentity>,
|
||||
pub iroh_relay: Option<RelayUrl>,
|
||||
pub drain_timeout: Duration,
|
||||
}
|
||||
```
|
||||
|
||||
Immutable configuration resolved at startup. `listen_addr` is None for
|
||||
iroh-only nodes. `tls_identity` is required if `listen_addr` is Some.
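The `listen_addr` / `tls_identity` rule is a startup check. Below is a minimal sketch of it. The types are trimmed stand-ins: `RawKey` and `iroh_relay` are omitted because they need iroh types. `validate` is a hypothetical method name. Mapping the violation to `ConfigError::IncompatibleOptions` is an assumption; the spec does not name the variant.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Stand-in for TlsIdentity; RawKey is omitted because it needs iroh::SecretKey.
#[derive(Debug)]
pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    SelfSigned,
}

/// Stand-in for StaticConfig with only the fields this check needs.
#[derive(Debug)]
pub struct StaticConfig {
    pub listen_addr: Option<SocketAddr>,
    pub tls_identity: Option<TlsIdentity>,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    IncompatibleOptions,
}

impl StaticConfig {
    /// A QUIC listener needs a TLS identity; iroh-only nodes (no listen_addr) do not.
    pub fn validate(&self) -> Result<(), ConfigError> {
        match (&self.listen_addr, &self.tls_identity) {
            (Some(_), None) => Err(ConfigError::IncompatibleOptions),
            _ => Ok(()),
        }
    }
}
```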
|
||||
|
||||
### TlsIdentity
|
||||
|
||||
```rust
|
||||
pub enum TlsIdentity {
|
||||
X509 { cert: PathBuf, key: PathBuf },
|
||||
RawKey(iroh::SecretKey),
|
||||
SelfSigned,
|
||||
}
|
||||
```
|
||||
|
||||
Three modes (OQ-12):
|
||||
- `X509`: domain certificate for browser/WebTransport clients
|
||||
- `RawKey`: RFC 7250 raw Ed25519 public key — default for P2P, no domain/CA
|
||||
- `SelfSigned`: development only
|
||||
|
||||
`RawKey` uses `iroh::SecretKey` (Ed25519) — re-exported from iroh, which
|
||||
alknet-core depends on (feature-gated). The key can be derived from
|
||||
alknet-vault at the assembly layer or generated fresh.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct DynamicConfig {
|
||||
pub auth: AuthPolicy,
|
||||
pub rate_limits: RateLimitConfig,
|
||||
}
|
||||
```
|
||||
|
||||
Runtime-reloadable via ArcSwap.
|
||||
|
||||
### AuthPolicy
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Default)]
pub struct AuthPolicy {
|
||||
pub authorized_fingerprints: HashSet<String>,
|
||||
pub api_keys: Vec<ApiKeyEntry>,
|
||||
}
|
||||
```
|
||||
|
||||
Fingerprints stored as strings (no russh dependency in core — ADR-003).
|
||||
Certificate authority entries deferred to alknet-ssh (omitted from v1 to avoid
|
||||
referencing an undefined type; adding back is additive).
|
||||
|
||||
### ApiKeyEntry
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
pub struct ApiKeyEntry {
|
||||
pub prefix: String,
|
||||
pub hash: String,
|
||||
pub scopes: Vec<String>,
|
||||
pub description: String,
|
||||
pub expires_at: Option<u64>,
|
||||
}
|
||||
```
|
||||
|
||||
Carries forward from reference implementation. Prefix (first 8 chars) for O(1)
|
||||
lookup, SHA-256 hash for verification.
|
||||
|
||||
### RateLimitConfig
|
||||
|
||||
```rust
|
||||
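#[derive(Debug, Clone)] // Default is hand-written so limits use the implementation defaults, not 0
pub struct RateLimitConfig {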
|
||||
pub max_connections_per_ip: usize,
|
||||
pub max_auth_attempts: usize,
|
||||
}
|
||||
```
|
||||
|
||||
### ArcSwap pattern
|
||||
|
||||
```rust
|
||||
let dynamic = Arc::new(ArcSwap::new(Arc::new(DynamicConfig::default())));
|
||||
```
|
||||
|
||||
- Reads: `dynamic.load()` returns a lock-free `Guard` that derefs to `Arc<DynamicConfig>`; `dynamic.load_full()` returns an owned `Arc<DynamicConfig>`
|
||||
- Writes: `dynamic.store(Arc::new(new_config))` — atomic swap
|
||||
- No locks: ArcSwap uses atomic operations
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
|
||||
pub struct ConfigReloadHandle {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigReloadHandle {
|
||||
pub fn reload(&self, new_config: DynamicConfig);
|
||||
pub fn dynamic(&self) -> Arc<DynamicConfig>;
|
||||
}
|
||||
```
|
||||
|
||||
- `reload()`: atomically replaces the dynamic config
|
||||
- `dynamic()`: returns current config as `Arc<DynamicConfig>`
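The reload semantics are these: a reload swaps the config atomically, and a snapshot taken earlier keeps its old value. A std-only sketch can show this. It substitutes `RwLock<Arc<_>>` for ArcSwap, so reads briefly take a read lock instead of being lock-free. `ReloadHandle` and the one-field `DynamicConfig` are hypothetical stand-ins.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for DynamicConfig with a single field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DynamicConfig {
    pub authorized_fingerprints: Vec<String>,
}

/// Std-only stand-in for ConfigReloadHandle (RwLock instead of ArcSwap).
#[derive(Clone)]
pub struct ReloadHandle {
    current: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Swap in a new config. Readers that already hold an Arc keep the old one.
    pub fn reload(&self, new_config: DynamicConfig) {
        let mut slot = self.current.write().expect("config lock poisoned");
        *slot = Arc::new(new_config);
    }

    /// Snapshot of the current config (the ArcSwap version would use load_full()).
    pub fn dynamic(&self) -> Arc<DynamicConfig> {
        let slot = self.current.read().expect("config lock poisoned");
        Arc::clone(&*slot)
    }
}
```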
|
||||
|
||||
**Config reload is a privilege-escalation path.** A reload that adds an
|
||||
authorized fingerprint or API key grants access immediately. The reload
|
||||
trigger must be authenticated/local-only (SIGHUP, file watch, or admin call
|
||||
protocol operation). The implementation must not ship a reload endpoint with
|
||||
no auth "for convenience."
|
||||
|
||||
### ConfigError
|
||||
|
||||
```rust
|
||||
pub enum ConfigError {
|
||||
InvalidFlag { name: String },
|
||||
KeyFileNotFound { path: String },
|
||||
BindFailed(io::Error),
|
||||
TlsConfig(io::Error),
|
||||
IncompatibleOptions,
|
||||
}
|
||||
```
|
||||
|
||||
### Defaults
|
||||
|
||||
- `drain_timeout`: 2 seconds
|
||||
- `max_connections_per_ip`: implementation default (reference uses a reasonable value)
|
||||
- `max_auth_attempts`: implementation default
|
||||
- `DynamicConfig::default()`: empty auth policy, default rate limits
|
||||
|
||||
### What NOT to include
|
||||
|
||||
Per the spec, StaticConfig does NOT include: `host_key`, `host_key_algorithm`,
|
||||
`proxy_config`, `stealth`, `transport_mode`, `listeners`. These are removed in
|
||||
the new model (ALPN dispatch replaces them — see config.md Key Differences).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `StaticConfig` struct with all fields per config.md
|
||||
- [ ] `TlsIdentity` enum with X509, RawKey, SelfSigned variants
|
||||
- [ ] `DynamicConfig` struct with `auth` and `rate_limits` fields
|
||||
- [ ] `AuthPolicy` struct with `authorized_fingerprints` and `api_keys`
|
||||
- [ ] `ApiKeyEntry` struct with all 5 fields
|
||||
- [ ] `RateLimitConfig` struct with both fields
|
||||
- [ ] `ConfigReloadHandle` with `reload()` and `dynamic()` methods
|
||||
- [ ] `ConfigError` enum with all variants
|
||||
- [ ] `DynamicConfig` and its field types (`AuthPolicy`, `ApiKeyEntry`, `RateLimitConfig`) derive `Clone` and `Debug`
|
||||
- [ ] Default values match config.md (drain_timeout = 2s, etc.)
|
||||
- [ ] No russh dependency (fingerprints as strings)
|
||||
- [ ] Unit tests for Default impls
|
||||
- [ ] Unit test: ConfigReloadHandle reload swaps config atomically
|
||||
- [ ] `cargo test -p alknet-core` succeeds
|
||||
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/crates/core/config.md — all type definitions
|
||||
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003 (no russh in core)
|
||||
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010 (no ListenerConfig)
|
||||
|
||||
## Notes
|
||||
|
||||
> Config reload is a privilege-escalation path — do not ship an unauthenticated
|
||||
> reload endpoint. The ArcSwap pattern carries forward from the reference
|
||||
> implementation. StaticConfig removes all SSH-centric fields (host_key,
|
||||
> stealth, transport_mode, listeners) — those are handler concerns now.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
224
tasks/core/core-types.md
Normal file
@@ -0,0 +1,224 @@
|
||||
---
|
||||
id: core/core-types
|
||||
name: "Implement core types: ProtocolHandler, Connection, BiStream, SendStream, RecvStream, StreamError, HandlerError, Capabilities"
|
||||
status: pending
|
||||
depends_on: [core/crate-init]
|
||||
scope: broad
|
||||
risk: medium
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement the core types in `src/types.rs`. These are the foundational
|
||||
abstractions that every handler crate depends on. This is the most
|
||||
cross-crate-boundary task in core — `Capabilities` in particular is used
|
||||
heavily by alknet-call's operation registry and composition model.
|
||||
|
||||
### ProtocolHandler trait
|
||||
|
||||
```rust
|
||||
#[async_trait]
|
||||
pub trait ProtocolHandler: Send + Sync + 'static {
|
||||
fn alpn(&self) -> &'static [u8];
|
||||
async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError>;
|
||||
}
|
||||
```
|
||||
|
||||
- `alpn()` returns the handler's ALPN identifier as a static byte string
|
||||
- `handle()` receives a `Connection` (not a single BiStream) and an `AuthContext`
|
||||
- Handlers that need a single stream call `connection.accept_bi()` once
|
||||
- Handlers that multiplex (SSH, call) open/accept streams as needed
|
||||
|
||||
See ADR-002, ADR-007.
|
||||
|
||||
### HandlerError
|
||||
|
||||
```rust
|
||||
pub enum HandlerError {
|
||||
ConnectionClosed,
|
||||
StreamError(io::Error),
|
||||
AuthRequired,
|
||||
Internal(Box<dyn std::error::Error + Send + Sync>),
|
||||
}
|
||||
```
|
||||
|
||||
Non-fatal errors within `handle()`. The endpoint catches these, logs them,
|
||||
and closes the connection. Other connections are unaffected. Handler panics are
|
||||
caught by tokio's task isolation.
|
||||
|
||||
### Connection
|
||||
|
||||
```rust
|
||||
pub struct Connection {
|
||||
// Private: wraps the underlying QUIC connection or test mock
|
||||
identity: OnceLock<Identity>,
|
||||
}
|
||||
|
||||
impl Connection {
|
||||
#[cfg(feature = "quinn")]
|
||||
pub fn from_quinn(conn: quinn::Connection) -> Self;
|
||||
#[cfg(feature = "iroh")]
|
||||
pub fn from_iroh(conn: iroh::Connection) -> Self;
|
||||
pub async fn accept_bi(&self) -> Result<(SendStream, RecvStream), StreamError>;
|
||||
pub async fn open_bi(&self) -> Result<(SendStream, RecvStream), StreamError>;
|
||||
pub fn remote_alpn(&self) -> &[u8];
|
||||
pub fn remote_addr(&self) -> Option<SocketAddr>;
|
||||
pub fn close(&self, code: u32, reason: &str);
|
||||
pub fn set_identity(&self, identity: Identity) -> Result<(), IdentityAlreadySet>;
|
||||
pub fn identity(&self) -> Option<&Identity>;
|
||||
}
|
||||
```
|
||||
|
||||
- Opaque type wrapping a QUIC connection (quinn or iroh, feature-gated)
|
||||
- `set_identity` is write-once-read-many via `OnceLock` (OQ-11) — handlers
|
||||
store resolved identity for observability; the endpoint does NOT read it
|
||||
after `handle()` returns (the Connection is moved into the spawned task)
|
||||
- Internal enum dispatch for quinn vs iroh vs test mock
|
||||
- `Connection` does not expose quinn types in its public API
|
||||
|
||||
### BiStream trait
|
||||
|
||||
```rust
|
||||
pub trait BiStream: AsyncRead + AsyncWrite + Send + Unpin {}
|
||||
```
|
||||
|
||||
A convenience trait for client-side code, test mocks, and future transport
|
||||
abstractions (WebTransport, raw TCP). Handlers that need a single stream
|
||||
obtain one via `connection.accept_bi()` and wrap the returned `(SendStream, RecvStream)` pair in a type that implements BiStream.
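A `(SendStream, RecvStream)` pair can be presented as one stream with a small wrapper. The wrapper delegates reads to the receive half and writes to the send half. This sketch uses the synchronous `std::io::Read`/`Write` traits as an analogue so it runs without tokio. An async version would implement `AsyncRead`/`AsyncWrite` the same way, forwarding `poll_read` to the receive half and `poll_write`/`poll_flush`/`poll_shutdown` to the send half. `Joined` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Reads go to `recv`, writes go to `send`.
pub struct Joined<R, W> {
    pub recv: R,
    pub send: W,
}

impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.recv.read(buf)
    }
}

impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.send.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.send.flush()
    }
}
```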
|
||||
|
||||
### SendStream and RecvStream
|
||||
|
||||
```rust
|
||||
pub struct SendStream { /* wraps quinn::SendStream or iroh::SendStream or test mock */ }
|
||||
pub struct RecvStream { /* wraps quinn::RecvStream or iroh::RecvStream or test mock */ }
|
||||
|
||||
impl AsyncWrite for SendStream { ... }
|
||||
impl AsyncRead for RecvStream { ... }
|
||||
```
|
||||
|
||||
Concrete wrapper types using internal enum dispatch to delegate to the
|
||||
appropriate QUIC stream type (quinn or iroh) in production, and to test mocks
|
||||
in tests.
|
||||
|
||||
### StreamError
|
||||
|
||||
```rust
|
||||
pub enum StreamError {
|
||||
ConnectionClosed,
|
||||
StreamClosed,
|
||||
Timeout,
|
||||
Internal(io::Error),
|
||||
}
|
||||
```
|
||||
|
||||
Returned by `accept_bi()`, `open_bi()`, and stream read/write operations.
|
||||
Maps from `quinn::ConnectionError` / `quinn::StreamError` and iroh equivalents.
|
||||
|
||||
### From<StreamError> for HandlerError
|
||||
|
||||
```rust
|
||||
impl From<StreamError> for HandlerError {
|
||||
fn from(e: StreamError) -> Self {
|
||||
match e {
|
||||
StreamError::ConnectionClosed => HandlerError::ConnectionClosed,
|
||||
StreamError::StreamClosed => HandlerError::StreamError(
|
||||
io::Error::new(io::ErrorKind::ConnectionReset, "stream closed")),
|
||||
StreamError::Timeout => HandlerError::StreamError(
|
||||
io::Error::new(io::ErrorKind::TimedOut, "stream timed out")),
|
||||
StreamError::Internal(e) => HandlerError::StreamError(e),
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This `From` impl is the canonical conversion — handlers use `?` on
|
||||
`accept_bi()` / `open_bi()`.
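Here is how `?` exercises that conversion in a handler body. The sketch uses stand-ins:

- `HandlerError` omits the boxed `Internal` variant.
- `accept_bi_stub` and `handle_stub` are hypothetical synchronous substitutes for `connection.accept_bi()` and `handle()`.

```rust
use std::io;

#[derive(Debug)]
pub enum StreamError {
    ConnectionClosed,
    StreamClosed,
    Timeout,
    Internal(io::Error),
}

/// Stand-in for HandlerError; the boxed `Internal` variant is omitted.
#[derive(Debug)]
pub enum HandlerError {
    ConnectionClosed,
    StreamError(io::Error),
    AuthRequired,
}

impl From<StreamError> for HandlerError {
    fn from(e: StreamError) -> Self {
        match e {
            StreamError::ConnectionClosed => HandlerError::ConnectionClosed,
            StreamError::StreamClosed => HandlerError::StreamError(io::Error::new(
                io::ErrorKind::ConnectionReset,
                "stream closed",
            )),
            StreamError::Timeout => HandlerError::StreamError(io::Error::new(
                io::ErrorKind::TimedOut,
                "stream timed out",
            )),
            StreamError::Internal(e) => HandlerError::StreamError(e),
        }
    }
}

/// Hypothetical synchronous stand-in for `connection.accept_bi()`.
fn accept_bi_stub(outcome: Result<u64, StreamError>) -> Result<u64, StreamError> {
    outcome
}

/// Hypothetical handler body: `?` converts StreamError into HandlerError.
pub fn handle_stub(outcome: Result<u64, StreamError>) -> Result<u64, HandlerError> {
    let stream_id = accept_bi_stub(outcome)?;
    Ok(stream_id)
}
```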
|
||||
|
||||
### Capabilities
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Zeroize, ZeroizeOnDrop)]
|
||||
pub struct Capabilities {
|
||||
entries: HashMap<String, Secret<String>>,
|
||||
}
|
||||
|
||||
impl Capabilities {
|
||||
pub fn new() -> Self;
|
||||
pub fn with_api_key(mut self, service: &str, key: String) -> Self;
|
||||
pub fn with_http_token(mut self, service: &str, token: String) -> Self;
|
||||
pub fn get(&self, service: &str) -> Option<&Secret<String>>;
|
||||
}
|
||||
```
|
||||
|
||||
Critical constraints (ADR-014, ADR-022, review #002 W2):
|
||||
- **Non-serializable**: does NOT derive `Serialize`. Cannot appear in
|
||||
`EventEnvelope` payloads even by accident.
|
||||
- **Zeroized**: derives `Zeroize` and `ZeroizeOnDrop`. Secret material does
|
||||
not linger in freed heap memory.
|
||||
- **Clone + Send + Sync**: required by the composition model —
|
||||
`OperationEnv::invoke()` clones the parent's capabilities for each child.
|
||||
- **Immutable after construction**: no `set`, no `insert`, no `mut` accessors.
|
||||
This is the guard from review #002 W2: it makes the clone strategy an implementation
|
||||
detail (Arc-based sharing and deep copy are behaviorally identical when neither
|
||||
supports mutation).
|
||||
- **Private fields**: the builder API (`new`, `with_*`) is the only
|
||||
construction path.
|
||||
|
||||
Use `secrecy::Secret<String>` (from the `secrecy` crate) or a similar wrapper
|
||||
for the secret values. Add `secrecy` to dependencies if needed, or implement
|
||||
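a simple `Secret` wrapper that zeroizes on drop and redacts in Debug. Note that `#[derive(Clone)]` and `#[derive(Zeroize)]` only compile if every field type (here the map and the chosen secret wrapper) implements `Clone` and `Zeroize` respectively, so verify this against the chosen crate versions.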
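Below is a std-only sketch of the "simple `Secret` wrapper" option, paired with the immutable builder API. It assumes neither `secrecy` nor `zeroize` is used:

- `Secret` and `expose` are hypothetical names.
- `with_http_token` is omitted because it follows the same pattern as `with_api_key`.
- The wipe uses volatile writes plus a compiler fence, similar in spirit to the `zeroize` crate.
- The wipe is best effort: it clears the final buffer, but not copies left behind by earlier reallocations.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Redacts itself in Debug output and overwrites its bytes on drop.
#[derive(Clone)]
pub struct Secret(String);

impl Secret {
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        // Writing 0x00 bytes keeps the String valid UTF-8.
        let bytes = unsafe { self.0.as_bytes_mut() };
        for b in bytes.iter_mut() {
            // Volatile writes stop the compiler from eliding the wipe.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

/// Immutable after construction: built via `with_*`, read via `get`. No Serialize.
#[derive(Clone, Default, Debug)]
pub struct Capabilities {
    entries: HashMap<String, Secret>,
}

impl Capabilities {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_api_key(mut self, service: &str, key: String) -> Self {
        self.entries.insert(service.to_string(), Secret(key));
        self
    }

    pub fn get(&self, service: &str) -> Option<&Secret> {
        self.entries.get(service)
    }
}
```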
|
||||
|
||||
### IdentityAlreadySet error
|
||||
|
||||
```rust
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum IdentityAlreadySet {
|
||||
#[error("connection identity already set")]
|
||||
AlreadySet,
|
||||
}
|
||||
```
|
||||
|
||||
Returned by `Connection::set_identity()` if called a second time.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `ProtocolHandler` trait defined with `alpn()` and `handle()` (async)
|
||||
- [ ] `HandlerError` enum with all 4 variants
|
||||
- [ ] `Connection` struct with all methods (from_quinn/from_iroh feature-gated)
|
||||
- [ ] `Connection::set_identity` write-once via `OnceLock`, returns `IdentityAlreadySet` on second call
|
||||
- [ ] `BiStream` trait defined (AsyncRead + AsyncWrite + Send + Unpin)
|
||||
- [ ] `SendStream` implements `AsyncWrite`
|
||||
- [ ] `RecvStream` implements `AsyncRead`
|
||||
- [ ] `StreamError` enum with all 4 variants
|
||||
- [ ] `From<StreamError> for HandlerError` impl
|
||||
- [ ] `Capabilities` struct with `new()`, `with_api_key()`, `with_http_token()`, `get()`
|
||||
- [ ] `Capabilities` derives `Clone`, `Zeroize`, `ZeroizeOnDrop` — NOT `Serialize`
|
||||
- [ ] `Capabilities` fields are private (builder API only, no mut accessors)
|
||||
- [ ] `IdentityAlreadySet` error type
|
||||
- [ ] Unit tests for Capabilities (build, get, clone, zeroize)
|
||||
- [ ] Unit test: `Connection::set_identity` once succeeds, twice returns error
|
||||
- [ ] `cargo test -p alknet-core` succeeds
|
||||
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
|
||||
|
||||
## References

- docs/architecture/crates/core/core-types.md — all type definitions
- docs/architecture/decisions/002-protocol-handler-trait.md — ADR-002
- docs/architecture/decisions/007-bistream-type-definition.md — ADR-007
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md — ADR-014 (Capabilities)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022
## Notes

> This task has the widest cross-crate reach in core. `Capabilities` is used
> heavily by alknet-call's operation registry and composition model, so it must
> be right the first time. The immutability guard (no mut accessors) is the
> security control from review #002 W2 that makes clone semantics safe. The
> `Connection` type uses internal enum dispatch for quinn/iroh/test; do not
> expose quinn types in the public API.
## Summary

> To be filled on completion
116
tasks/core/crate-init.md
Normal file
@@ -0,0 +1,116 @@
---
id: core/crate-init
name: Initialize alknet-core crate with Cargo.toml, dependencies, and module skeleton
status: pending
depends_on: []
scope: moderate
risk: low
impact: project
level: implementation
---
## Description

Initialize the `alknet-core` crate from scratch. The workspace currently has
only `alknet-vault`. This task creates the crate directory, `Cargo.toml`,
`lib.rs`, and the module skeleton that subsequent core tasks will fill in.
### Crate setup

Create `crates/alknet-core/` with:

- `Cargo.toml` — package metadata, dependencies, feature flags
- `src/lib.rs` — crate root with module declarations and re-exports
- Module skeleton files (empty or with `// TODO` markers) for:
  - `src/types.rs` — ProtocolHandler, HandlerError, Connection, BiStream, SendStream, RecvStream, StreamError, Capabilities
  - `src/auth.rs` — AuthContext, Identity, IdentityProvider, AuthToken, ConfigIdentityProvider
  - `src/config.rs` — StaticConfig, DynamicConfig, AuthPolicy, ApiKeyEntry, RateLimitConfig, ConfigReloadHandle, ConfigError, TlsIdentity
  - `src/endpoint.rs` — AlknetEndpoint, HandlerRegistry, EndpointError
### Dependencies

Per the architecture specs (overview.md, core/README.md, endpoint.md):

| Crate | Purpose |
|-------|---------|
| `tokio` 1 (full) | Async runtime, watch channel for shutdown |
| `quinn` | QUIC endpoint (feature-gated) |
| `iroh` | P2P relay-assisted endpoint (feature-gated) |
| `rustls` | TLS implementation |
| `rustls-pki-types` | TLS types (CertificateDer, PrivateKeyDer) |
| `serde` 1 | Serialization for config types |
| `serde_json` 1 | JSON for config, JSON Schema values |
| `toml` 0.8 | Config file format |
| `arc-swap` 1 | Atomic config swap for DynamicConfig |
| `async-trait` 0.1 | ProtocolHandler trait (async fn in trait) |
| `tracing` 0.1 | Structured logging |
| `thiserror` 2 | Error enums |
| `zeroize` 1 | Capabilities zeroization |
| `bytes` 1 | Byte buffer types for streams |
| `futures` | AsyncRead/AsyncWrite for BiStream trait |
### Feature flags

```toml
[features]
default = ["quinn"]
quinn = ["dep:quinn"]
iroh = ["dep:iroh"]
```

Both `quinn` and `iroh` are optional and can be active simultaneously (ADR-010).
`quinn` is default-on for the common case; `iroh` is opt-in. The `dep:` syntax
requires both crates to be declared with `optional = true` under `[dependencies]`.
### Workspace Cargo.toml

Add `crates/alknet-core` to the workspace `members` list in the root
`Cargo.toml`.
### Module skeleton

```rust
// src/lib.rs
//! alknet-core: Core library for ALPN-based protocol dispatch.

pub mod types;
pub mod auth;
pub mod config;
pub mod endpoint;

// Re-exports (filled in by subsequent tasks)
```

Each module file gets a module doc comment (`//!`) and a `// TODO: implement`
marker. The subsequent tasks (core-types, config, auth, endpoint) fill these in.
## Acceptance Criteria

- [ ] `crates/alknet-core/Cargo.toml` exists with all dependencies and feature flags
- [ ] `crates/alknet-core/src/lib.rs` exists with module declarations
- [ ] Module skeleton files exist: `types.rs`, `auth.rs`, `config.rs`, `endpoint.rs`
- [ ] Root `Cargo.toml` `members` list includes `crates/alknet-core`
- [ ] `cargo check -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
- [ ] Dual licensing: `MIT OR Apache-2.0` (workspace-inherited)
## References

- docs/architecture/overview.md — crate graph, shared types
- docs/architecture/crates/core/README.md — crate index
- docs/architecture/crates/core/core-types.md — types to implement
- docs/architecture/crates/core/endpoint.md — endpoint, features (quinn + iroh)
- docs/architecture/crates/core/config.md — config types
- docs/architecture/crates/core/auth.md — auth types
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010 (feature-gating)
## Notes

> This is the foundational setup task for alknet-core. All subsequent core
> tasks depend on this one. The crate has no alknet dependencies (vault is
> standalone; core doesn't depend on vault). The feature flags for quinn/iroh
> are important — both are optional and can be active simultaneously.
## Summary

> To be filled on completion
249
tasks/core/endpoint.md
Normal file
@@ -0,0 +1,249 @@
---
id: core/endpoint
name: Implement AlknetEndpoint, HandlerRegistry, accept loops (quinn + iroh), TLS identity, and graceful shutdown
status: pending
depends_on: [core/core-types, core/config, core/auth]
scope: broad
risk: high
impact: component
level: implementation
---
## Description

Implement the ALPN router and endpoint in `src/endpoint.rs`. This is the
integration point of alknet-core — it ties together the core types, config,
and auth into the central runtime that accepts connections and dispatches to
handlers by ALPN string.
### AlknetEndpoint

```rust
use std::sync::Arc;

use arc_swap::ArcSwap;
use tokio::sync::watch;

use crate::auth::IdentityProvider;
use crate::config::DynamicConfig;

pub struct AlknetEndpoint {
    // Each transport field only exists when its cargo feature is enabled.
    #[cfg(feature = "quinn")]
    quinn: Option<quinn::Endpoint>,
    #[cfg(feature = "iroh")]
    iroh: Option<iroh::Endpoint>,
    handlers: Arc<HandlerRegistry>,
    dynamic: Arc<ArcSwap<DynamicConfig>>,
    identity_provider: Arc<dyn IdentityProvider>,
    shutdown: watch::Receiver<bool>,
}
```

Manages one or more QUIC connection sources, each feeding into the same ALPN
router. Both quinn and iroh are optional (feature-gated) and can be active
simultaneously (ADR-010).
### HandlerRegistry

```rust
pub struct HandlerRegistry {
    handlers: HashMap<&'static [u8], Arc<dyn ProtocolHandler>>,
}

// Signatures only; bodies are part of this task.
impl HandlerRegistry {
    pub fn new() -> Self;
    pub fn register(&mut self, handler: Arc<dyn ProtocolHandler>);
    pub fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>>;
    pub fn alpn_strings(&self) -> Vec<Vec<u8>>;
}
```

- `register()`: insert a handler under its own `alpn()`. Panics if that ALPN is
  already registered.
- `get()`: look up by ALPN string.
- `alpn_strings()`: all registered ALPNs — used to build the TLS `ServerConfig`
  (quinn) and the ALPN list (iroh).
- Registration is **static at startup** (OQ-04, ADR-010). The CLI builds the
  registry, inserts all handlers, and passes it to `AlknetEndpoint::new()`.
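A std-only sketch of the registry semantics above: panic on a duplicate ALPN, lookup by byte string, and ALPN enumeration. The `ProtocolHandler` trait here is a hypothetical synchronous stand-in that models only `alpn()`; the real trait's async `handle()` is omitted. Sorting in `alpn_strings()` is an added choice, not from the spec, made because `HashMap` iteration order is unspecified and a stable order is easier to test.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the real (async) trait.
pub trait ProtocolHandler: Send + Sync {
    fn alpn(&self) -> &'static [u8];
}

#[derive(Default)]
pub struct HandlerRegistry {
    handlers: HashMap<&'static [u8], Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers `handler` under its own ALPN; panics on a duplicate.
    pub fn register(&mut self, handler: Arc<dyn ProtocolHandler>) {
        let alpn = handler.alpn();
        assert!(
            !self.handlers.contains_key(alpn),
            "ALPN already registered: {}",
            String::from_utf8_lossy(alpn)
        );
        self.handlers.insert(alpn, handler);
    }

    /// `&'static [u8]: Borrow<[u8]>`, so a borrowed slice works as the key.
    pub fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>> {
        self.handlers.get(alpn)
    }

    /// Every registered ALPN, sorted for a deterministic order.
    pub fn alpn_strings(&self) -> Vec<Vec<u8>> {
        let mut alpns: Vec<Vec<u8>> = self.handlers.keys().map(|a| a.to_vec()).collect();
        alpns.sort();
        alpns
    }
}
```

The check-then-insert order means a rejected duplicate never overwrites the handler that was registered first.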
### Accept loops

Each active connection source runs its own accept loop. Both dispatch through
the same `HandlerRegistry`.

**Quinn accept loop** (public QUIC+TLS):

```
loop {
    tokio::select! {
        incoming = quinn_endpoint.accept() => {
            // `None` means the endpoint has been closed.
            let Some(incoming) = incoming else { break };
            // Finish the handshake off the accept loop.
            tokio::spawn(async move {
                match incoming.await {
                    Ok(conn) => dispatch(conn),
                    Err(e) => { /* log TLS handshake failure */ }
                }
            });
        }
        _ = shutdown.changed() => break,
    }
}
```

`accept()` yields `None` once the endpoint has been closed, and the handshake is
awaited in a spawned task so that one slow client cannot stall further accepts.
**iroh accept loop** (P2P relay-assisted):

```
loop {
    tokio::select! {
        incoming = iroh_endpoint.accept() => {
            // `None` means the endpoint has been closed.
            let Some(incoming) = incoming else { break };
            tokio::spawn(async move {
                // Accepting and reading the negotiated ALPN can both fail;
                // log and drop only this connection.
                let Ok(accepting) = incoming.accept() else { /* log */ return };
                match accepting.alpn().await {
                    Ok(alpn) => dispatch(alpn, accepting),
                    Err(e) => { /* log handshake failure */ }
                }
            });
        }
        _ = shutdown.changed() => break,
    }
}
```

Use `iroh::Endpoint` directly (not iroh's `Router`) because our `HandlerRegistry`
is shared between quinn and iroh, and our `AuthContext` construction differs per
source. See iroh's `protocol.rs` for the reference pattern.
### Dispatch function (shared)

```
fn dispatch(connection) {
    let alpn = connection.alpn();
    match handlers.get(alpn) {
        Some(handler) => {
            // The spawned task must own its handle ('static), so clone the Arc.
            let handler = Arc::clone(handler);
            let auth = AuthContext::from_connection(&connection);
            let conn = Connection::from_quinn(connection); // or from_iroh
            tokio::spawn(async move {
                if let Err(e) = handler.handle(conn, &auth).await {
                    // log error; the connection closes
                }
            });
        }
        None => connection.close(0u32.into(), b"no handler"),
    }
}
```
### AuthContext construction

The endpoint constructs `AuthContext` from the QUIC connection:

1. `alpn`: from `connection.alpn()` — always present
2. `remote_addr`: from `connection.remote_addr()` — may be `None` for iroh
3. `tls_client_fingerprint`: extracted from the TLS session's client certificate,
   if one was presented
4. `identity`: if a fingerprint is available, call
   `IdentityProvider::resolve_from_fingerprint()`. If it resolves,
   `identity = Some(resolved)`; otherwise `identity = None`.
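The four steps can be sketched with std types only. `IdentityProvider` here is a hypothetical synchronous reduction to the fingerprint lookup, `Identity` models only `id`, and `build_auth_context` and `MapProvider` are illustrative names. In the real crate the provider would be `ConfigIdentityProvider` reading `authorized_fingerprints` from the live config.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Stand-in for the real `Identity` (only `id` is modelled here).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}

/// Hypothetical synchronous reduction of the provider's fingerprint lookup.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

#[derive(Debug, Clone)]
pub struct AuthContext {
    pub alpn: Vec<u8>,
    pub remote_addr: Option<SocketAddr>,
    pub tls_client_fingerprint: Option<String>,
    pub identity: Option<Identity>,
}

/// Steps 1-4: copy the connection facts, then resolve an identity only when
/// a client fingerprint exists. Unknown fingerprints yield `identity: None`.
pub fn build_auth_context(
    alpn: &[u8],
    remote_addr: Option<SocketAddr>,
    tls_client_fingerprint: Option<String>,
    provider: &dyn IdentityProvider,
) -> AuthContext {
    let identity = tls_client_fingerprint
        .as_deref()
        .and_then(|fp| provider.resolve_from_fingerprint(fp));
    AuthContext { alpn: alpn.to_vec(), remote_addr, tls_client_fingerprint, identity }
}

/// Toy provider backed by a map (fingerprint -> identity id).
pub struct MapProvider(pub HashMap<String, String>);

impl IdentityProvider for MapProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.0.get(fingerprint).map(|id| Identity { id: id.clone() })
    }
}
```

The fingerprint is retained even when it does not resolve, so a handler can still see that a client certificate was presented.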
### TLS Identity

Three modes per `TlsIdentity` (OQ-12):

**RawKey (RFC 7250, default for P2P)**:

- Build `rustls::ServerConfig` around a cert resolver whose
  `only_raw_public_keys()` returns `true`
- The `ResolvesServerCert` impl generates the cert on the fly from the Ed25519 key
- ~100 lines — see `iroh/iroh/src/tls/resolver.rs` for the reference pattern
- Works natively with SSH auth and git; browsers do NOT support RFC 7250

**X509 (domain-hosted)**:

- Load cert/key from file paths
- Standard `rustls::ServerConfig`
- For browser/WebTransport clients and public domain services

**SelfSigned (dev only)**:

- Generate a self-signed cert on startup
- External clients will not trust it

**ACME (future, not in this task)**:

- The reverse-proxy project demonstrates the complete ACME pattern. It will be
  adapted as an additional `TlsIdentity` variant or `ResolvesServerCert` impl.
  For now, X509 with manual certs is the domain path. Note this as a TODO.

The quinn endpoint's `rustls::ServerConfig` ALPN list (`alpn_protocols`, also a
`Vec<Vec<u8>>`) is set from `registry.alpn_strings()` at construction time. The
iroh endpoint's ALPN list is derived the same way. Both advertise the same set
of ALPNs.
### Graceful shutdown

```rust
impl AlknetEndpoint {
    pub fn shutdown_sender(&self) -> watch::Sender<bool>;
    pub async fn shutdown(&self) -> Result<(), EndpointError>;
}
```

- `shutdown_sender()`: returns a clone of the shutdown channel's sender;
  `send(true)` signals shutdown. The endpoint therefore has to retain the
  `watch::Sender`, since the `Receiver` stored in the struct cannot produce one.
- `shutdown()`: signals all accept loops to stop, waits for in-flight connections
  up to the drain timeout (default 2s, from `StaticConfig`), then forcefully
  closes any that remain.
- SIGTERM/SIGINT are wired to the shutdown channel by the CLI binary (not core's
  concern).
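The drain step reduces to "wait until the in-flight count reaches zero or the deadline passes". Below is a std-only analogue using `Condvar::wait_timeout_while`; an async implementation would more likely await a `tokio::task::JoinSet` under `tokio::time::timeout`. `InFlight`, `enter`, `leave`, and `drain` are hypothetical names.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical in-flight connection counter used to implement draining.
#[derive(Default)]
pub struct InFlight {
    count: Mutex<usize>,
    idle: Condvar,
}

impl InFlight {
    /// Called when a handler task starts.
    pub fn enter(&self) {
        *self.count.lock().unwrap() += 1;
    }

    /// Called when a handler task finishes; wakes `drain` when the last one exits.
    pub fn leave(&self) {
        let mut n = self.count.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.idle.notify_all();
        }
    }

    /// Waits up to `timeout` for every connection to finish. Returns `true` if
    /// fully drained, `false` if the caller must force-close the remainder.
    pub fn drain(&self, timeout: Duration) -> bool {
        let guard = self.count.lock().unwrap();
        let (_guard, result) = self
            .idle
            .wait_timeout_while(guard, timeout, |n| *n > 0)
            .unwrap();
        !result.timed_out()
    }
}
```

`wait_timeout_while` re-checks the predicate after every wakeup, so spurious wakeups cannot end the drain early.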
### EndpointError

```rust
use std::io;

// Display/Error impls (e.g. via thiserror, as elsewhere in core) omitted.
#[derive(Debug)]
pub enum EndpointError {
    BindFailed(io::Error),
    TlsConfig(io::Error),
    HandlerNotFound(Vec<u8>),
}
```

Fatal errors that prevent the endpoint from starting or continuing.
### Accept loop error handling

- **TLS handshake failure**: log and continue. The client may have offered no
  compatible ALPN, or its certificate may be untrusted.
- **Handler panic**: contained by tokio's task isolation (the panic surfaces as a
  `JoinError` instead of unwinding into the accept loop). That connection is
  dropped; others continue. This relies on the default `panic = "unwind"`; with
  `panic = "abort"` the whole process exits.
- **Connection-level errors** (quinn/iroh `ConnectionError`): log and continue.
  The accept loop keeps running.
### What the accept loops do NOT do

- No byte-peeking (ALPN handles protocol detection)
- No per-handler accept loops (ALPN unifies)
- No SSH-specific logic (accept loop is ALPN-agnostic)
### TCP is NOT an endpoint concern

Bare TCP (SSH over port 22) does not use QUIC or ALPN. TCP access is handled by
individual handlers (the SSH handler can listen on TCP independently). This is a
handler concern, not part of the core endpoint.
## Acceptance Criteria

- [ ] `AlknetEndpoint` struct with quinn/iroh (both Option, both feature-gated)
- [ ] `HandlerRegistry` with new/register/get/alpn_strings
- [ ] `register()` panics on duplicate ALPN
- [ ] Quinn accept loop runs, dispatches by ALPN, respects shutdown
- [ ] iroh accept loop runs, dispatches by ALPN, respects shutdown
- [ ] Dispatch function spawns handler task via `tokio::spawn`
- [ ] AuthContext constructed from connection (alpn, remote_addr, fingerprint, identity)
- [ ] TLS RawKey mode: rustls `ServerConfig` with a cert resolver whose `only_raw_public_keys()` returns `true`, on-the-fly cert
- [ ] TLS X509 mode: load cert/key from files, standard ServerConfig
- [ ] TLS SelfSigned mode: generate self-signed cert on startup
- [ ] ALPN list in TLS ServerConfig set from `registry.alpn_strings()`
- [ ] Graceful shutdown: signal accept loops to stop, drain timeout, force close
- [ ] `EndpointError` enum with all variants
- [ ] Accept loop errors logged, loop continues (no crash on handshake failure)
- [ ] Handler panics caught by tokio task isolation (connection dropped, others continue)
- [ ] No byte-peeking, no per-handler accept loops, no SSH-specific logic
- [ ] Unit test: HandlerRegistry register/get/alpn_strings
- [ ] Unit test: HandlerRegistry register panics on duplicate ALPN
- [ ] Integration test: endpoint with mock handler, verify dispatch by ALPN
- [ ] `cargo test -p alknet-core` succeeds
- [ ] `cargo clippy -p alknet-core` succeeds with no warnings
## References

- docs/architecture/crates/core/endpoint.md — full endpoint spec
- docs/architecture/decisions/001-alpn-protocol-dispatch.md — ADR-001
- docs/architecture/decisions/010-alpn-router-and-endpoint.md — ADR-010
- docs/architecture/decisions/006-alpn-convention-and-connection-model.md — ADR-006
- docs/architecture/decisions/007-bistream-type-definition.md — ADR-007
- iroh reference: `/workspace/iroh/iroh/src/protocol.rs` (accept loop pattern)
- iroh reference: `/workspace/iroh/iroh/src/tls/resolver.rs` (RFC 7250 raw key)
## Notes

> This is the integration point of alknet-core: it ties together types, config,
> and auth. It is the highest-risk task in core because it involves QUIC
> connection handling, TLS identity (3 modes), and graceful shutdown. The RFC 7250
> raw key path is ~100 lines (iroh has a reference implementation). ACME is
> deferred: note it as a TODO and use X509 with manual certs for the domain path
> for now. TCP is NOT an endpoint concern; it is handler-specific.
## Summary

> To be filled on completion
122
tasks/core/review-core.md
Normal file
@@ -0,0 +1,122 @@
---
id: core/review-core
name: Review alknet-core implementation for spec conformance and pattern consistency
status: pending
depends_on: [core/endpoint]
scope: moderate
risk: low
impact: phase
level: review
---
## Description

Review the alknet-core implementation for spec conformance, pattern
consistency, and correctness before alknet-call (which depends on core)
begins implementation. This is the quality checkpoint at the end of the core
phase.
### Review Checklist

1. **Core types conformance** (core-types.md):
   - `ProtocolHandler` trait signature matches spec (alpn, handle)
   - `HandlerError` has all 4 variants (ConnectionClosed, StreamError, AuthRequired, Internal)
   - `Connection` has all methods, from_quinn/from_iroh feature-gated
   - `Connection::set_identity` is write-once via OnceLock
   - `BiStream` is a trait (AsyncRead + AsyncWrite + Send + Unpin)
   - `SendStream` implements AsyncWrite, `RecvStream` implements AsyncRead
   - `StreamError` has all 4 variants
   - `From<StreamError> for HandlerError` impl matches spec mapping table
   - `Capabilities` is non-serializable, zeroized, immutable, Clone+Send+Sync
   - `Capabilities` has builder API (new, with_api_key, with_http_token, get), private fields

2. **Config conformance** (config.md):
   - `StaticConfig` fields match (listen_addr, tls_identity, iroh_relay, drain_timeout)
   - `TlsIdentity` has X509, RawKey, SelfSigned
   - `DynamicConfig` has auth and rate_limits
   - `AuthPolicy` has authorized_fingerprints (`HashSet<String>`), api_keys (`Vec<ApiKeyEntry>`)
   - `ApiKeyEntry` has all 5 fields (prefix, hash, scopes, description, expires_at)
   - `ConfigReloadHandle` has reload() and dynamic()
   - No russh dependency (fingerprints as strings)
   - No removed fields (host_key, stealth, transport_mode, listeners)

3. **Auth conformance** (auth.md):
   - `AuthContext` has all 4 fields, derives Clone
   - `Identity` has id, scopes, resources
   - `AuthToken` has raw field
   - `IdentityProvider` trait with both methods
   - `ConfigIdentityProvider` reads from ArcSwap on every call
   - Fingerprint resolution looks up in authorized_fingerprints
   - Token resolution: `alk_` prefix, hash match, expiry check
   - Two identity scopes documented (connection-level vs per-request)

4. **Endpoint conformance** (endpoint.md):
   - `AlknetEndpoint` has quinn/iroh (both Option, both feature-gated)
   - `HandlerRegistry` register/get/alpn_strings, panics on duplicate
   - Quinn accept loop: select on accept + shutdown, dispatch by ALPN
   - iroh accept loop: select on accept + shutdown, dispatch by ALPN
   - Dispatch spawns handler task via tokio::spawn
   - AuthContext constructed from connection (alpn, remote_addr, fingerprint, identity)
   - TLS RawKey: only_raw_public_keys(), on-the-fly cert from Ed25519
   - TLS X509: load from files
   - TLS SelfSigned: generate on startup
   - ALPN list in ServerConfig from registry.alpn_strings()
   - Graceful shutdown: drain timeout, force close
   - EndpointError has all 3 variants
   - No byte-peeking, no per-handler loops, no SSH-specific logic

5. **Pattern consistency**:
   - ArcSwap used consistently for DynamicConfig
   - Feature flags (quinn, iroh) gate transport code correctly
   - Error handling patterns consistent (thiserror, Result propagation)
   - No quinn/iroh types in public API (Connection wraps them)

6. **Security constraints**:
   - Capabilities non-serializable (no Serialize derive)
   - Capabilities zeroized (Zeroize, ZeroizeOnDrop)
   - Capabilities immutable (no mut accessors)
   - Config reload is privilege escalation (no unauthenticated reload endpoint)
   - Token entropy requirement documented

7. **Test coverage**:
   - Unit tests for Capabilities (build, get, clone, zeroize)
   - Unit tests for config types and reload
   - Unit tests for auth resolution (fingerprint, token, expiry)
   - Unit tests for HandlerRegistry
   - Integration test: endpoint dispatch by ALPN
## Acceptance Criteria

- [ ] All core types match core-types.md
- [ ] All config types match config.md
- [ ] All auth types match auth.md
- [ ] Endpoint matches endpoint.md (accept loops, TLS modes, shutdown)
- [ ] Capabilities security constraints satisfied (non-serializable, zeroized, immutable)
- [ ] No russh dependency in core
- [ ] No quinn/iroh types in public API
- [ ] ArcSwap pattern consistent
- [ ] Feature flags gate transport code correctly
- [ ] Test coverage adequate for all functionality
- [ ] `cargo fmt --check -p alknet-core` passes
- [ ] `cargo clippy -p alknet-core` passes with no warnings
- [ ] All tests pass
## References

- docs/architecture/crates/core/README.md
- docs/architecture/crates/core/core-types.md
- docs/architecture/crates/core/config.md
- docs/architecture/crates/core/auth.md
- docs/architecture/crates/core/endpoint.md
- docs/architecture/decisions/ (relevant ADRs: 001-011, 014, 015, 022)
## Notes

> This review verifies core is spec-conformant before alknet-call begins.
> alknet-call depends heavily on core types (ProtocolHandler, Connection,
> AuthContext, Capabilities, IdentityProvider) — any issues here propagate to
> call. If deviations are found, document and fix before proceeding.
## Summary

> To be filled on completion
85
tasks/vault/cache-zeroization-test.md
Normal file
@@ -0,0 +1,85 @@
---
id: vault/cache-zeroization-test
name: Verify and test that HashMap::clear() drops CachedKey values triggering zeroization
status: pending
depends_on: []
scope: single
risk: low
impact: isolated
level: implementation
---
## Description

Fix drift item #6: `KeyCache::clear()` removes entries and relies on
`CachedKey`'s `Drop` impl for zeroization. The spec says to verify that
`HashMap::clear()` actually drops the values (it does, but this is worth a
test). This task adds tests showing that cached keys are dropped, and therefore
zeroized via `ZeroizeOnDrop`, both on cache eviction and on `clear()`.
### Background

`CachedKey` gets its `Zeroize`/`ZeroizeOnDrop` behavior from the `DerivedKey` it
holds, which is marked `#[zeroize(drop)]`. When the cache evicts an entry (LRU or
TTL) or `clear()` is called, the `CachedKey` is dropped, which triggers
`ZeroizeOnDrop`: the private key bytes are zeroized before deallocation.

`HashMap::clear()` drops all values, which triggers their `Drop` impls. This
is standard Rust behavior, but the security-critical nature of key material
warrants an explicit test.
### What to add

A test in `cache.rs` (or `tests/`) that:

1. Inserts a `CachedKey` with a known private key into the cache
2. Verifies the key is present
3. Calls `clear()` (or evicts via LRU/TTL)
4. Verifies the `CachedKey` was dropped and zeroized

Testing zeroization directly is tricky because the memory is freed; reading it
after drop would be undefined behavior. A practical approach:

- **Option A**: Use a custom type with a `Drop` impl that sets a flag (e.g., an
  `Arc<AtomicBool>`) when dropped. Insert it into the cache, clear, verify the
  flag is set. This tests the drop path, not the zeroize path directly, but
  confirms `clear()` drops values.
- **Option B**: Test the LRU eviction path — fill the cache to `max_entries`,
  insert one more, verify the LRU entry was evicted (dropped).
- **Option C**: Test that `lock()` calls `cache.clear()` and the cache is empty
  afterward (integration test via `VaultServiceHandle`).

At minimum, implement Options B and C. Option A is a bonus if feasible without
over-engineering the test type.
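A minimal sketch of Option A, written against a plain `HashMap` because whether `KeyCache` can hold a test type depends on its actual definition. `DropFlag` and `clear_drops_values` are hypothetical test helpers.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical test helper: records that it was dropped.
struct DropFlag(Arc<AtomicBool>);

impl Drop for DropFlag {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Inserts one flagged value, clears the map, and reports whether the value
/// was dropped by `clear()` (and not earlier).
fn clear_drops_values() -> bool {
    let dropped = Arc::new(AtomicBool::new(false));
    let mut cache: HashMap<&'static str, DropFlag> = HashMap::new();
    cache.insert("key-1", DropFlag(Arc::clone(&dropped)));
    let dropped_before_clear = dropped.load(Ordering::SeqCst);
    cache.clear();
    !dropped_before_clear && cache.is_empty() && dropped.load(Ordering::SeqCst)
}
```

The same flag pattern could wrap whatever the cache stores if its value type is generic. Otherwise, Options B and C cover eviction and `lock()` by asserting on cache contents.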
### Scope

This task touches `cache.rs` (test additions) and possibly `tests/`. It does
not depend on the irpc removal task (drift #4) because `cache.rs` is a separate
file, so it can run in parallel with drift #4. Option C is the exception: it goes
through `VaultServiceHandle`, which drift #4 restructures, so that test may need
adjusting once the irpc removal lands.
## Acceptance Criteria

- [ ] Test: LRU eviction drops the evicted `CachedKey` (cache exceeds `max_entries`, oldest evicted)
- [ ] Test: `lock()` clears the cache (verify cache is empty after lock)
- [ ] Test: TTL expiry evicts entries (set short TTL, wait, verify entry gone)
- [ ] Test: `clear()` removes all entries (verify empty after clear)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #6
- docs/architecture/crates/vault/service.md — Cache section, Security Constraints
- docs/architecture/crates/vault/encryption.md — Security Constraints
## Notes

> `HashMap::clear()` does drop values, triggering their `Drop` impls. This is
> standard Rust behavior, but key material is security-critical enough to
> warrant an explicit test. This task touches only `cache.rs` and can run in
> parallel with the irpc removal task (drift #4).
## Summary

> To be filled on completion
140
tasks/vault/derivedkey-serialization.md
Normal file
@@ -0,0 +1,140 @@
---
id: vault/derivedkey-serialization
name: Implement always-redact DerivedKey serialization and reject redacted payloads on deserialize
status: pending
depends_on: [vault/irpc-removal]
scope: narrow
risk: medium
impact: component
level: implementation
---
## Description

Fix drift item #5: `DerivedKey` currently has dual serialization behavior — JSON
redacts the private key, but postcard (the binary format used by irpc) preserves
the raw bytes. ADR-025 dropped the postcard/remote path, so `DerivedKey` should
**always** redact on serialize and reject `"[REDACTED]"` on deserialize with an
explicit error.
### Current state

`protocol.rs` has `DerivedKey` with `#[derive(Serialize, Deserialize)]` (or
similar), which redacts the private key in JSON but preserves the raw bytes in
postcard. The postcard tests in the test suite verify the binary round-trip.
### Target state

Per `docs/architecture/crates/vault/protocol.md` → Serialization Redaction:

`DerivedKey` must **not** derive `Serialize` or `Deserialize` via `#[derive]`. It
needs custom `Serialize` and `Deserialize` impls:

**Custom Serialize** — always redacts `private_key`:

```rust
impl serde::Serialize for DerivedKey {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        use serde::ser::SerializeStruct;
        let mut s = serializer.serialize_struct("DerivedKey", 3)?;
        s.serialize_field("key_type", &self.key_type)?;
        s.serialize_field("private_key", "[REDACTED]")?;
        s.serialize_field("public_key", &self.public_key)?;
        s.end()
    }
}
```

**Custom Deserialize** — rejects `"[REDACTED]"` with an explicit error:

```rust
impl<'de> serde::Deserialize<'de> for DerivedKey {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        // Accept a string or a byte sequence so a redacted JSON payload reaches
        // the explicit check below instead of failing with a generic
        // "invalid type: string, expected a sequence" error.
        #[derive(serde::Deserialize)]
        #[serde(untagged)]
        enum PrivateKeyField {
            Marker(String),
            Bytes(Vec<u8>),
        }

        #[derive(serde::Deserialize)]
        struct DerivedKeyHelper {
            key_type: KeyType,
            private_key: PrivateKeyField,
            public_key: Vec<u8>,
        }

        let helper = DerivedKeyHelper::deserialize(deserializer)?;
        let private_key = match helper.private_key {
            PrivateKeyField::Bytes(bytes) => bytes,
            PrivateKeyField::Marker(s) if s == "[REDACTED]" => {
                return Err(serde::de::Error::custom(
                    "DerivedKey.private_key is \"[REDACTED]\" — redacted payloads \
                     cannot be deserialized. JSON round-tripping a DerivedKey is \
                     not supported (the private key is gone).",
                ));
            }
            PrivateKeyField::Marker(_) => {
                return Err(serde::de::Error::custom(
                    "DerivedKey.private_key must be a byte sequence",
                ));
            }
        };
        Ok(DerivedKey {
            key_type: helper.key_type,
            private_key,
            public_key: helper.public_key,
        })
    }
}
```

**Debug impl** — also redacts:

```rust
use std::fmt;

impl fmt::Debug for DerivedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DerivedKey")
            .field("key_type", &self.key_type)
            .field("private_key", &"[REDACTED]")
            .field("public_key", &self.public_key)
            .finish()
    }
}
```
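For the `{:?}` unit test, it is more robust to check that the output lacks the private key's own `Debug` rendering than to search for individual byte values, because the public key's numbers also appear in the output. The sketch below is self-contained. `KeyType::Ed25519` and the plain `Vec<u8>` field types are assumptions standing in for the real vault types, and `debug_is_redacted` is a hypothetical helper.

```rust
use std::fmt;

/// Stand-ins for the real vault types (variant and field types are assumptions).
#[derive(Debug, Clone, Copy)]
enum KeyType {
    Ed25519,
}

struct DerivedKey {
    key_type: KeyType,
    private_key: Vec<u8>,
    public_key: Vec<u8>,
}

impl fmt::Debug for DerivedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DerivedKey")
            .field("key_type", &self.key_type)
            .field("private_key", &"[REDACTED]")
            .field("public_key", &self.public_key)
            .finish()
    }
}

/// True if `{:?}` output shows the redaction marker and does not contain the
/// private key's own `Debug` rendering (e.g. `[222, 173, 190, 239]`).
fn debug_is_redacted(key: &DerivedKey) -> bool {
    let out = format!("{:?}", key);
    out.contains("[REDACTED]") && !out.contains(&format!("{:?}", key.private_key))
}
```

With a private key of `[0xde, 0xad, 0xbe, 0xef]` and a public key of `[1, 2, 3]`, the output is `DerivedKey { key_type: Ed25519, private_key: "[REDACTED]", public_key: [1, 2, 3] }`.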
### Remove postcard tests

The postcard round-trip tests (which verified binary format preserved private
key bytes) are removed — ADR-025 dropped that path. The `postcard`
dev-dependency was removed in the irpc removal task (drift #4).
### Why custom impls instead of derives

A derived `Deserialize` would conflict with the manual one (a type cannot have
two `Deserialize` impls), and on its own it would fail only incidentally (a serde
type mismatch: string vs. sequence), not with the explicit redaction-rejection
error the spec requires. The custom impl is required for the explicit error
message.
### Scope

This task touches `protocol.rs` (the `DerivedKey` type, its serde impls, Debug
impl) and test files (remove postcard tests, add redaction tests). It depends on
the irpc removal task (drift #4) because both modify `protocol.rs`.
## Acceptance Criteria

- [ ] `DerivedKey` does not derive `Serialize` or `Deserialize` via `#[derive]`
- [ ] Custom `Serialize` impl always redacts `private_key` as `"[REDACTED]"`
- [ ] Custom `Deserialize` impl rejects a `private_key` of `"[REDACTED]"` with an explicit error
- [ ] Custom `Debug` impl redacts `private_key` as `"[REDACTED]"`
- [ ] Postcard round-trip tests removed
- [ ] Unit test: JSON serialize produces `"[REDACTED]"` for `private_key`
- [ ] Unit test: JSON deserialize of a redacted payload returns an error (not a corrupted key)
- [ ] Unit test: `{:?}` on `DerivedKey` does not contain private key bytes
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #5
- docs/architecture/crates/vault/protocol.md — Serialization Redaction, Debug redaction
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves W8)
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md — ADR-014
## Notes

> The redaction is defense-in-depth for logging safety, not the primary control
> — the primary control is that `DerivedKey` never crosses the call protocol
> wire (ADR-014). ADR-025 dropped the postcard/remote path that previously
> preserved bytes in binary formats. The custom Deserialize impl is required
> because `#[derive(Deserialize)]` would conflict and not produce the explicit
> redaction-rejection error. Depends on irpc removal because both modify
> `protocol.rs`.
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
106
tasks/vault/irpc-removal.md
Normal file
@@ -0,0 +1,106 @@
---
id: vault/irpc-removal
name: Remove irpc dependency and actor dispatch from vault, convert to direct method calls on VaultServiceHandle
status: pending
depends_on: []
scope: broad
risk: high
impact: component
level: implementation
---

## Description

Remove the irpc-based actor dispatch from the vault crate and convert to direct
method calls on `VaultServiceHandle`. This is drift item #4 from the vault README
drift table and the foundational ADR-025 refactor — it restructures `service.rs`
and `protocol.rs` fundamentally, which is why most other vault drift tasks depend
on this one.

### What to remove

- `VaultProtocol` enum with `#[rpc_requests]` derive in `protocol.rs`
- `VaultServiceActor` in `service.rs`
- `Client<VaultProtocol>` usage
- `irpc` and `irpc-derive` dependencies from `Cargo.toml`
- `postcard` from dev-dependencies (was only needed for the irpc binary path)
- `tokio` dependency from `Cargo.toml` (the vault uses `std::sync::RwLock`, not
  `tokio::sync::RwLock` — ADR-025)
- `VaultMessage` / `VaultProtocol` re-exports from `lib.rs`

### What to keep / change

- `VaultServiceHandle` stays and becomes the sole API. It already wraps
  `Arc<std::sync::RwLock<VaultServiceInner>>` and exposes synchronous methods.
  The actor path (an `mpsc` channel plus oneshot backchannels via irpc's
  `Service` trait) is removed entirely.
- `VaultServiceError` drops its `Serialize`/`Deserialize` derives (they were only
  needed for irpc dispatch, which ADR-025 removed). It becomes a plain
  `thiserror::Error` enum.
- `DerivedKey` and `KeyType` stay in `protocol.rs`. With the `VaultProtocol` enum
  removed, the module effectively becomes the vault's types module, but the
  filename stays `protocol.rs` for continuity.
- `lib.rs` re-exports are updated to remove `VaultMessage`, `VaultProtocol`, and
  `VaultServiceActor`, and to match the Public API section of the vault README.
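
For orientation, here is a std-only sketch of the target shape: a cloneable handle over `Arc<std::sync::RwLock<_>>` whose methods run synchronously on the caller's thread, with no channel, actor, or `.await`. The inner fields and the method names (`unlock_with_seed`, `lock`, `is_unlocked`, `seed_len`) are hypothetical placeholders, not the vault's actual API. `VaultServiceError` implements `Display`/`Error` by hand only so the example does not need `thiserror`.

```rust
use std::fmt;
use std::sync::{Arc, RwLock};

/// Plain error enum: no Serialize/Deserialize, since nothing crosses a wire.
#[derive(Debug, PartialEq, Eq)]
pub enum VaultServiceError {
    Locked,
}

impl fmt::Display for VaultServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VaultServiceError::Locked => f.write_str("vault is locked"),
        }
    }
}

impl std::error::Error for VaultServiceError {}

/// Hypothetical inner state; the real one holds zeroizing seed material and the key cache.
#[derive(Default)]
struct VaultServiceInner {
    seed: Option<Vec<u8>>,
}

/// Cheap to clone: every clone shares the same lock-protected state.
#[derive(Clone, Default)]
pub struct VaultServiceHandle {
    inner: Arc<RwLock<VaultServiceInner>>,
}

impl VaultServiceHandle {
    pub fn new() -> Self {
        Self::default()
    }

    // Lock acquisition uses poisoned-lock recovery (see vault/poisoned-lock-recovery).
    pub fn unlock_with_seed(&self, seed: Vec<u8>) {
        let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
        inner.seed = Some(seed);
    }

    pub fn lock(&self) {
        let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
        inner.seed = None;
    }

    pub fn is_unlocked(&self) -> bool {
        self.inner.read().unwrap_or_else(|e| e.into_inner()).seed.is_some()
    }

    /// A direct call returning a `Result`, where the actor path would have sent
    /// a message and awaited a oneshot reply.
    pub fn seed_len(&self) -> Result<usize, VaultServiceError> {
        let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
        inner.seed.as_ref().map(Vec::len).ok_or(VaultServiceError::Locked)
    }
}
```

Because clones share one `Arc`, unlocking through one handle is immediately visible through every other clone, with no message passing involved.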

### Public API after this task

Per `docs/architecture/crates/vault/README.md` → Public API:

```rust
pub use mnemonic::{Language, Mnemonic, Seed};
pub use derivation::{DerivationError, ExtendedPrivKey, PATHS};
pub use encryption::{EncryptedData, EncryptionError, EncryptionKey};
pub use encryption::CURRENT_KEY_VERSION;
pub use protocol::{DerivedKey, KeyType};
pub use service::{VaultServiceError, VaultServiceHandle};
pub use cache::CacheConfig;
```

### Cargo.toml changes

Remove from `[dependencies]`:
- `irpc = { workspace = true }`
- `irpc-derive = { workspace = true }`
- `tokio = { version = "1", features = ["sync", "rt", "macros"] }`

Remove from `[dev-dependencies]`:
- `postcard = { version = "1", features = ["alloc"] }`

The vault should have **no** async runtime dependencies after this task.

## Acceptance Criteria

- [ ] `VaultProtocol` enum and `#[rpc_requests]` derive removed from `protocol.rs`
- [ ] `VaultServiceActor` removed from `service.rs`
- [ ] `Client<VaultProtocol>` usage removed
- [ ] `irpc`, `irpc-derive`, `tokio` removed from `[dependencies]` in `Cargo.toml`
- [ ] `postcard` removed from `[dev-dependencies]` in `Cargo.toml`
- [ ] `VaultServiceError` no longer derives `Serialize`/`Deserialize`
- [ ] `lib.rs` re-exports match the Public API section of vault README (no `VaultMessage`, `VaultProtocol`, `VaultServiceActor`)
- [ ] `VaultServiceHandle` methods are all synchronous (no `async`, no `.await`)
- [ ] `cargo check` succeeds
- [ ] `cargo clippy` succeeds with no warnings
- [ ] `cargo test` succeeds (existing tests updated to remove irpc/postcard usage)
- [ ] No `tokio` dependency remains in the vault `Cargo.toml`

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #4, Public API section
- docs/architecture/crates/vault/service.md — Dispatch section, VaultServiceHandle
- docs/architecture/crates/vault/protocol.md — Local-Only by Construction
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025

## Notes

> This is the foundational vault refactor. It restructures `service.rs` and
> `protocol.rs` — most other vault drift tasks touch these same files and must
> follow this one to avoid merge conflicts. The `VaultServiceHandle` struct
> already uses `std::sync::RwLock` with synchronous methods; the actor path is
> the dead code to remove. After this task, the vault has no async runtime
> dependency and no RPC framework dependency — it is local-only by construction.

## Summary

> To be filled on completion
127
tasks/vault/key-versioning-rotation.md
Normal file
@@ -0,0 +1,127 @@
---
id: vault/key-versioning-rotation
name: Implement version-indexed encryption key paths, bump CURRENT_KEY_VERSION to 2, and add rotate method
status: pending
depends_on: [vault/irpc-removal]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Fix drift items #3, #9, and #10 as one coherent feature: the version-indexed
key rotation mechanism from ADR-021. These three drifts are tightly coupled —
`CURRENT_KEY_VERSION = 2` (drift #3), version-aware `encrypt`/`decrypt` via
`encryption_path_for_version` (drift #9), and the `rotate` method (drift #10)
form the complete key rotation feature. Splitting them would produce tasks that
don't compile independently.

### Drift #3: Bump CURRENT_KEY_VERSION

Current: `CURRENT_KEY_VERSION = 1` (but the key is HD-derived, and v1 is
reserved for the TypeScript PBKDF2 legacy per ADR-020).

Target: `CURRENT_KEY_VERSION = 2` (HD-derived, per ADR-020).

Version semantics:
- v1: TypeScript predecessor's PBKDF2-encrypted data — the vault **cannot**
  decrypt it (different key derivation). Migration is a one-time re-encryption.
- v2: HD-derived at `m/74'/2'/0'/0'` (PATHS::ENCRYPTION) — current.
- v3+: `m/74'/2'/0'/1'`, `m/74'/2'/0'/2'`, etc. — future rotation versions.
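
The version semantics above can be captured in a small mapping function. This is a sketch, assuming version `v` maps to hardened child index `v - 2` under `m/74'/2'/0'`, so that v2 lands on `PATHS::ENCRYPTION`. The `InvalidPath(String)` payload and the `FIRST_HD_KEY_VERSION` constant are hypothetical; the real `DerivationError` may be shaped differently.

```rust
/// Hypothetical shape of the derivation error; only `InvalidPath` matters here.
#[derive(Debug, PartialEq, Eq)]
pub enum DerivationError {
    InvalidPath(String),
}

/// First version backed by an HD path (v1 is the TypeScript PBKDF2 legacy, v0 is unused).
pub const FIRST_HD_KEY_VERSION: u32 = 2;

/// Maps a key version to its encryption-key derivation path:
/// v2 -> m/74'/2'/0'/0', v3 -> m/74'/2'/0'/1', and so on.
pub fn encryption_path_for_version(version: u32) -> Result<String, DerivationError> {
    if version < FIRST_HD_KEY_VERSION {
        return Err(DerivationError::InvalidPath(format!(
            "key version {version} has no HD derivation path"
        )));
    }
    let index = version - FIRST_HD_KEY_VERSION;
    // Hardened child indices must fit in 31 bits.
    if index >= 1 << 31 {
        return Err(DerivationError::InvalidPath(format!(
            "key version {version} exceeds the hardened index range"
        )));
    }
    Ok(format!("m/74'/2'/0'/{index}'"))
}
```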

### Drift #9: Version-aware encrypt/decrypt

Current: `encrypt`/`decrypt` always derive at `PATHS::ENCRYPTION` regardless of
the `key_version` parameter.

Target:
- `encrypt(plaintext, key_version)`: derive the encryption key at
  `encryption_path_for_version(key_version)`, stamp the same `key_version` on
  the resulting `EncryptedData`.
- `decrypt(encrypted)`: derive the key at
  `encryption_path_for_version(encrypted.key_version)` — the blob carries its
  own version, and each version maps to a distinct derivation path.

This requires:
1. `encryption_path_for_version(version: u32) -> Result<String, DerivationError>`
   already exists in `derivation.rs` — verify it returns `InvalidPath` for
   `version < 2` (v1 is TS legacy, v0 is meaningless).
2. `derive_encryption_key_for_version(version: u32) -> Result<DerivedKey, VaultServiceError>`
   — a new method on `VaultServiceHandle` that maps version → path → derive.
   Results are cached by path, in the same cache `derive_encryption_key` uses.
3. `encrypt` and `decrypt` use `derive_encryption_key_for_version` instead of
   deriving at the fixed `PATHS::ENCRYPTION` path.

### Drift #10: Implement rotate

Current: no `rotate` method exists.

Target:
```rust
pub fn rotate(&self, encrypted: &EncryptedData, to_version: u32) -> Result<EncryptedData, VaultServiceError>;
```

Decrypts with the old version's key (from `encrypted.key_version`), re-encrypts
with the new version's key (`to_version`), and returns the new `EncryptedData`;
the caller replaces the blob in storage. No new mnemonic is needed: the same seed
produces all version keys via different derivation paths (ADR-021).
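
The rotation contract reduces to "decrypt under the blob's own version, re-encrypt under `to_version`". The sketch below expresses that as a default trait method. It assumes a simplified `EncryptedData` holding only `key_version` and `ciphertext` (the real type also carries the IV and salt). The `VersionedCipher` trait and `ToyCipher` are hypothetical. `ToyCipher` XORs bytes with the version number and is **not** encryption; it exists only so the version plumbing can run without AES-GCM.

```rust
/// Simplified stand-in for `EncryptedData` (hypothetical: no IV, salt, or tag).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EncryptedData {
    pub key_version: u32,
    pub ciphertext: Vec<u8>,
}

/// Hypothetical abstraction over the handle's `encrypt`/`decrypt` pair.
pub trait VersionedCipher {
    type Error;

    /// Encrypts under the key for `key_version` and stamps that version on the blob.
    fn encrypt(&self, plaintext: &[u8], key_version: u32) -> Result<EncryptedData, Self::Error>;

    /// Decrypts under the key for `encrypted.key_version`; the blob names its own key.
    fn decrypt(&self, encrypted: &EncryptedData) -> Result<Vec<u8>, Self::Error>;

    /// Re-encrypts a blob under `to_version`; the caller swaps it into storage.
    /// A real implementation should hold `plaintext` in a zeroizing buffer.
    fn rotate(&self, encrypted: &EncryptedData, to_version: u32) -> Result<EncryptedData, Self::Error> {
        let plaintext = self.decrypt(encrypted)?;
        self.encrypt(&plaintext, to_version)
    }
}

/// NOT encryption: XORs each byte with the low byte of the key version.
pub struct ToyCipher;

impl ToyCipher {
    fn check(version: u32) -> Result<u8, String> {
        if version < 2 {
            Err(format!("invalid key version {version}"))
        } else {
            Ok(version as u8)
        }
    }
}

impl VersionedCipher for ToyCipher {
    type Error = String;

    fn encrypt(&self, plaintext: &[u8], key_version: u32) -> Result<EncryptedData, String> {
        let k = Self::check(key_version)?;
        Ok(EncryptedData {
            key_version,
            ciphertext: plaintext.iter().map(|b| b ^ k).collect(),
        })
    }

    fn decrypt(&self, encrypted: &EncryptedData) -> Result<Vec<u8>, String> {
        let k = Self::check(encrypted.key_version)?;
        Ok(encrypted.ciphertext.iter().map(|b| b ^ k).collect())
    }
}
```

Because `decrypt` reads the version from the blob, a v2 blob that has not been rotated yet still decrypts after other blobs move to v3, which is what makes partial rotation safe.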

### Implementation notes

- `derive_encryption_key(path)` (the path-based API) remains as-is for deriving
  at arbitrary paths. `derive_encryption_key_for_version(version)` is the
  version-aware API used by `encrypt`/`decrypt`. Both share the same cache
  (keyed by derivation path).
- `encrypt` and `decrypt` extract the `EncryptionKey` from the `DerivedKey` via
  `EncryptionKey::from_derived_bytes` (see encryption.md).
- `encryption_path_for_version` returns `InvalidPath` for `version < 2`.
  `derive_encryption_key_for_version` propagates this as
  `VaultServiceError::InvalidPath`.
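
One way to get the shared cache described above is to have the version-aware method resolve the path first and then delegate to the path-based method, so both lookups hit the same entry. The sketch below assumes a hypothetical `KeyCacheSketch` backed by a plain `HashMap`. It returns raw `[u8; 32]` keys rather than `DerivedKey`, and `derive_at_path` is a placeholder, **not** HD derivation. The real cache also applies LRU/TTL eviction and zeroizes evicted keys.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum VaultServiceError {
    InvalidPath(String),
}

/// Same version -> path rule as `encryption_path_for_version` (v2 -> m/74'/2'/0'/0').
fn encryption_path_for_version(version: u32) -> Result<String, VaultServiceError> {
    match version.checked_sub(2) {
        Some(index) if index < 1 << 31 => Ok(format!("m/74'/2'/0'/{index}'")),
        _ => Err(VaultServiceError::InvalidPath(format!("key version {version}"))),
    }
}

/// Placeholder for HD derivation: NOT cryptographic, just deterministic per path.
fn derive_at_path(path: &str) -> [u8; 32] {
    let mut key = [0u8; 32];
    for (i, b) in path.bytes().enumerate() {
        key[i % 32] ^= b;
    }
    key
}

#[derive(Default)]
pub struct KeyCacheSketch {
    by_path: HashMap<String, [u8; 32]>,
    /// Counts real derivations, so cache hits are observable.
    pub derivations: usize,
}

impl KeyCacheSketch {
    /// Path-based API: derive once per path, then serve from the cache.
    pub fn derive_encryption_key(&mut self, path: &str) -> [u8; 32] {
        if let Some(key) = self.by_path.get(path) {
            return *key;
        }
        self.derivations += 1;
        let key = derive_at_path(path);
        self.by_path.insert(path.to_string(), key);
        key
    }

    /// Version-aware API: map version -> path, then reuse the path-keyed cache.
    pub fn derive_encryption_key_for_version(&mut self, version: u32) -> Result<[u8; 32], VaultServiceError> {
        let path = encryption_path_for_version(version)?;
        Ok(self.derive_encryption_key(&path))
    }
}
```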

### Scope

This task touches `encryption.rs` (CURRENT_KEY_VERSION), `service.rs` (encrypt,
decrypt, rotate, derive_encryption_key_for_version), and possibly `derivation.rs`
(verify `encryption_path_for_version`). It depends on the irpc removal task
(drift #4) because both modify `service.rs`.

## Acceptance Criteria

- [ ] `CURRENT_KEY_VERSION` is `2` in `encryption.rs`
- [ ] `derive_encryption_key_for_version(version)` method added to `VaultServiceHandle`
- [ ] `derive_encryption_key_for_version` returns `InvalidPath` for `version < 2`
- [ ] `encrypt(plaintext, key_version)` derives at `encryption_path_for_version(key_version)`
- [ ] `encrypt` stamps the passed `key_version` on the resulting `EncryptedData`
- [ ] `decrypt(encrypted)` derives at `encryption_path_for_version(encrypted.key_version)`
- [ ] `rotate(encrypted, to_version)` method implemented: decrypt old, re-encrypt new
- [ ] `rotate` returns `EncryptedData` with `key_version = to_version`
- [ ] Unit test: encrypt at v2, decrypt at v2 — round-trip succeeds
- [ ] Unit test: encrypt at v2, rotate to v3, decrypt at v3 — round-trip succeeds
- [ ] Unit test: decrypt v2 blob after rotation — old key still derivable (partial rotation safe)
- [ ] Unit test: `derive_encryption_key_for_version(1)` returns `InvalidPath`
- [ ] Unit test: `derive_encryption_key_for_version(0)` returns `InvalidPath`
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table items #3, #9, #10
- docs/architecture/crates/vault/encryption.md — Key Versioning, Rotation, EncryptionKey
- docs/architecture/crates/vault/service.md — encrypt, decrypt, rotate, derive_encryption_key_for_version
- docs/architecture/crates/vault/mnemonic-derivation.md — encryption_path_for_version, PATHS
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md — ADR-020
- docs/architecture/decisions/021-key-rotation-via-version-indexed-paths.md — ADR-021

## Notes

> These three drifts are one feature: version-indexed key rotation (ADR-021).
> Splitting them would produce tasks that don't compile independently —
> bumping the version without version-aware encrypt/decrypt would make v2
> blobs undecryptable, and rotate without version-aware encrypt/decrypt has no
> keys to work with. Depends on irpc removal because both modify `service.rs`.

## Summary

> To be filled on completion
83
tasks/vault/osrng-iv-generation.md
Normal file
@@ -0,0 +1,83 @@
---
id: vault/osrng-iv-generation
name: Replace rand::random() IV generation with OsRng in AES-GCM encryption
status: pending
depends_on: []
scope: single
risk: medium
impact: isolated
level: implementation
---

## Description

Fix drift item #1: the AES-256-GCM IV (nonce) generation in `encryption.rs`
currently uses `rand::random()`, which uses the thread-local RNG and may not be a
CSPRNG on all platforms. Replace with `OsRng` (or equivalent CSPRNG).

This is a security-critical fix. IV reuse under the same AES-GCM key is
catastrophic — it breaks authenticity and creates a two-time-pad on the
plaintext. `OsRng` reads from the operating system's entropy source and is the
correct choice for cryptographic nonces.

### Current state

`encryption.rs` line ~133: IV generation uses `rand::random()` to produce the
12-byte GCM nonce.

### Target state

Use `rand::rngs::OsRng` (from the `rand` crate, which is already a dependency)
to generate the 12-byte IV. The `aes-gcm` crate's `Aes256Gcm` encrypt path takes
a `Nonce` — construct it from `OsRng`-generated bytes.

```rust
use aes_gcm::Nonce;
use rand::rngs::OsRng;
use rand::RngCore; // brings `fill_bytes` into scope (assumes the rand 0.8 API)

// Fresh 96-bit (12-byte) nonce from the OS entropy source, once per `encrypt()` call.
let mut iv_bytes = [0u8; 12];
OsRng.fill_bytes(&mut iv_bytes);
let nonce = Nonce::from_slice(&iv_bytes);

// The 32-byte salt comes from the same source.
let mut salt = [0u8; 32];
OsRng.fill_bytes(&mut salt);
```

The IV is generated fresh for each `encrypt()` call. The salt (32 bytes, unused
in v2 for key derivation but kept for wire-format compatibility) should also use
`OsRng` for consistency: it is stored in the `EncryptedData` blob and does not
need to be deterministic.

### Scope

This task touches only `encryption.rs`. It does not depend on the irpc removal
(drift #4) because `encryption.rs` is a separate file from `service.rs` /
`protocol.rs`. It can run in parallel with drift #4.

## Acceptance Criteria

- [ ] `encryption::encrypt()` uses `OsRng` for IV generation, not `rand::random()`
- [ ] Salt generation uses `OsRng` (or equivalent CSPRNG)
- [ ] No `rand::random()` calls remain in `encryption.rs`
- [ ] IV is 12 bytes (standard GCM nonce size)
- [ ] Salt is 32 bytes (wire-format compat, unused in key derivation)
- [ ] Unit test: verify IV is fresh on each encrypt call (encrypt twice, different IVs)
- [ ] Unit test: verify decrypt round-trip still works after the change
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #1
- docs/architecture/crates/vault/encryption.md — Security Constraints: OsRng for IVs
- docs/architecture/crates/vault/service.md — Security Constraints: OsRng for IVs
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md — ADR-020

## Notes

> This is a security-critical fix. IV reuse under the same AES-GCM key breaks
> authenticity and creates a two-time-pad on the plaintext. `rand::random()`
> uses the thread-local RNG which may not be a CSPRNG on all platforms; `OsRng`
> reads from the operating system's entropy source. This task touches only
> `encryption.rs` and can run in parallel with the irpc removal task (drift #4).

## Summary

> To be filled on completion
86
tasks/vault/poisoned-lock-recovery.md
Normal file
@@ -0,0 +1,86 @@
---
id: vault/poisoned-lock-recovery
name: Replace unwrap() on RwLock acquisition with poisoned-lock recovery via unwrap_or_else
status: pending
depends_on: [vault/irpc-removal]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Fix drift item #2: `VaultServiceHandle` methods use `unwrap()` on every
`RwLock` acquisition (read and write locks). A poisoned lock (caused by a panic
while a write guard was held) would brick the vault for all subsequent
operations. Replace with `unwrap_or_else(|e| e.into_inner())` to recover the
inner data from a poisoned lock, or with explicit error propagation where
appropriate.

### Current state

`service.rs` uses `.unwrap()` on `RwLock` read and write acquisitions at
approximately lines 142, 161, 182, 191, 196, 227, 264, 307, 340, 367 (line
numbers may shift after the irpc removal task — match by pattern: every
`.read().unwrap()` and `.write().unwrap()` call in `VaultServiceHandle` method
bodies).

### Target state

For read locks:
```rust
let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
```

For write locks:
```rust
let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
```

The rationale: a poisoned lock means a panic occurred while the write lock was
held. The data may be in an inconsistent state, but bricking the vault (panicking
on every subsequent call) is worse than attempting to continue. The vault's
operations are idempotent reads (derive) and state transitions (lock/unlock), so
recovering the inner data and continuing is the pragmatic choice. If the data
is truly corrupted, the next operation will fail with a normal error, not a
panic.
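
The recovery path can be exercised with nothing but `std`. The shape mirrors the acceptance test's "poison the lock, verify next call recovers" check: a thread panics while holding the write guard, and later reads and writes still succeed. Per the `std::sync::RwLock` documentation, only a panic under a write guard poisons the lock, and the poison flag stays set afterwards, so every later acquisition goes through the `into_inner()` branch. The helper names are illustrative, and the `Vec<u8>` stands in for `VaultServiceInner`.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Poisons `lock` by panicking on another thread while holding the write guard.
pub fn poison(lock: &Arc<RwLock<Vec<u8>>>) {
    let shared = Arc::clone(lock);
    let result = thread::spawn(move || {
        let _guard = shared.write().unwrap_or_else(|e| e.into_inner());
        panic!("simulated panic while holding the vault write lock");
    })
    .join();
    // The spawned thread panicked, so `join` reports an error; that is expected.
    assert!(result.is_err());
}

/// Read access that recovers from poisoning instead of propagating the panic.
pub fn read_len(lock: &RwLock<Vec<u8>>) -> usize {
    lock.read().unwrap_or_else(|e| e.into_inner()).len()
}

/// Write access that recovers from poisoning.
pub fn push(lock: &RwLock<Vec<u8>>, byte: u8) {
    lock.write().unwrap_or_else(|e| e.into_inner()).push(byte);
}
```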

### No unwrap() or expect() outside tests

This is a general constraint for the vault: no `unwrap()` or `expect()` outside
test code. After fixing the RwLock acquisitions, audit the rest of `service.rs`
for any remaining `unwrap()`/`expect()` calls and replace them with proper error
propagation (`?` operator, explicit `Result` returns, or
`unwrap_or_else(|e| e.into_inner())` for lock recovery).

### Scope

This task touches `service.rs` only. It depends on the irpc removal task
(drift #4) because that task restructures `service.rs` — doing this first would
cause merge conflicts.

## Acceptance Criteria

- [ ] All `.read().unwrap()` calls in `VaultServiceHandle` methods replaced with `.read().unwrap_or_else(|e| e.into_inner())`
- [ ] All `.write().unwrap()` calls in `VaultServiceHandle` methods replaced with `.write().unwrap_or_else(|e| e.into_inner())`
- [ ] No `unwrap()` or `expect()` calls remain in `service.rs` outside of test code
- [ ] Unit test: vault remains usable after a simulated panic (poison the lock, verify next call recovers)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #2
- docs/architecture/crates/vault/service.md — Security Constraints: No unwrap() outside tests
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025

## Notes

> A panic in one vault operation must not brick the vault for all other
> operations. The poisoned-lock recovery via `unwrap_or_else(|e| e.into_inner())`
> is the standard Rust pattern for this. This task depends on the irpc removal
> task because both modify `service.rs` heavily.

## Summary

> To be filled on completion
69
tasks/vault/remove-password-derivation.md
Normal file
@@ -0,0 +1,69 @@
---
id: vault/remove-password-derivation
name: Remove derive_password and site_password_path methods (password-manager pattern not relevant)
status: pending
depends_on: [vault/irpc-removal]
scope: single
risk: trivial
impact: isolated
level: implementation
---

## Description

Fix drift item #7: the vault currently has `derive_password`,
`derive_password_string`, and `site_password_path` methods. These implement a
password-manager pattern (deriving site-specific passwords from the seed) that
is not relevant to an RPC system's vault. Remove them entirely per ADR-025
(resolves review #002 C9).

### What to remove

- `derive_password` method from `VaultServiceHandle` (in `service.rs`)
- `derive_password_string` method from `VaultServiceHandle` (in `service.rs`)
- `site_password_path` function (in `mnemonic-derivation.rs` or `derivation.rs`,
  wherever it's defined)
- Any associated path constants for password derivation
- Any tests for these methods
- Any references in `lib.rs` re-exports

### Why

The vault's purpose in alknet is to derive cryptographic keys (Ed25519 for
identity, AES-256-GCM for encryption) and encrypt/decrypt external credentials.
Site-specific password derivation is a password-manager feature that doesn't
belong in a networking toolkit's vault. Keeping it expands the attack surface
and API surface for no benefit.

### Scope

This task touches `service.rs` and possibly `derivation.rs` /
`mnemonic-derivation.rs`. It depends on the irpc removal task (drift #4) because
both modify `service.rs`.

## Acceptance Criteria

- [ ] `derive_password` method removed from `VaultServiceHandle`
- [ ] `derive_password_string` method removed from `VaultServiceHandle`
- [ ] `site_password_path` function removed
- [ ] Any password-derivation path constants removed
- [ ] Tests for password derivation removed
- [ ] No references to password derivation remain in `lib.rs` re-exports
- [ ] `cargo check` succeeds (no dangling references)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #7
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves C9)

## Notes

> Straightforward removal. The password-manager pattern was inherited from the
> POC and is not relevant to alknet's vault use case. Depends on irpc removal
> because both modify `service.rs`.

## Summary

> To be filled on completion
112
tasks/vault/review-vault-sync.md
Normal file
@@ -0,0 +1,112 @@
---
id: vault/review-vault-sync
name: Review vault implementation against specs after all drift fixes
status: pending
depends_on: [vault/irpc-removal, vault/osrng-iv-generation, vault/poisoned-lock-recovery, vault/remove-password-derivation, vault/unlock-new-zeroizing-return, vault/key-versioning-rotation, vault/derivedkey-serialization, vault/cache-zeroization-test]
scope: moderate
risk: low
impact: phase
level: review
---

## Description

Review the vault crate implementation against the architecture specs after all
drift fixes are complete. This is the quality checkpoint before the spec-sync
task — verify that the implementation matches the specs and that no drift
items were missed or incompletely fixed.

### Review Checklist

1. **Drift table verification** — every item in the vault README's Known Source
   Drift table is resolved:
   - #1: OsRng for IVs (encryption.rs)
   - #2: No unwrap() on RwLock (service.rs)
   - #3: CURRENT_KEY_VERSION = 2 (encryption.rs)
   - #4: irpc removed, direct method calls (protocol.rs, service.rs, Cargo.toml)
   - #5: DerivedKey always-redact serialization (protocol.rs)
   - #6: Cache zeroization tested (cache.rs)
   - #7: derive_password removed (service.rs, derivation)
   - #8: unlock_new returns `Zeroizing<String>` (service.rs)
   - #9: encrypt/decrypt version-aware (service.rs)
   - #10: rotate implemented (service.rs)

2. **Spec conformance** — implementation matches the spec docs:
   - `VaultServiceHandle` API matches service.md (all methods, signatures, semantics)
   - `DerivedKey` / `KeyType` match protocol.md (serialization, redaction, move-only)
   - `EncryptedData` / `EncryptionKey` match encryption.md (fields, key versioning)
   - `Mnemonic` / `Seed` / `ExtendedPrivKey` match mnemonic-derivation.md
   - `KeyCache` / `CachedKey` / `CacheConfig` match service.md Cache section
   - PATHS constants match mnemonic-derivation.md (IDENTITY, DEVICE_PREFIX, SSH_HOST, ENCRYPTION, ETHEREUM)
   - `encryption_path_for_version` matches (returns InvalidPath for version < 2)

3. **Security constraints** (from service.md, encryption.md, README.md):
   - OsRng for IVs and salt (no `rand::random()`)
   - Zeroized drop on Seed, Mnemonic, ExtendedPrivKey, EncryptionKey, CachedKey, DerivedKey
   - No `unwrap()` or `expect()` outside tests
   - DerivedKey is move-only (no Clone)
   - DerivedKey Debug impl redacts private key
   - Cache eviction zeroizes (tested)
   - No tokio dependency (local-only, std::sync::RwLock)

4. **Public API** — `lib.rs` re-exports match the vault README's Public API section:
   - `Mnemonic`, `Seed`, `Language` from mnemonic
   - `DerivationError`, `ExtendedPrivKey`, `PATHS` from derivation
   - `EncryptedData`, `EncryptionError`, `EncryptionKey`, `CURRENT_KEY_VERSION` from encryption
   - `DerivedKey`, `KeyType` from protocol
   - `VaultServiceError`, `VaultServiceHandle` from service
   - `CacheConfig` from cache
   - No `VaultMessage`, `VaultProtocol`, `VaultServiceActor` (removed)

5. **Test coverage**:
   - Derivation test vectors (BIP39 "abandon...about" vector)
   - Encryption round-trip tests
   - Service lifecycle tests (unlock, lock, derive, encrypt, decrypt, rotate)
   - Cache tests (LRU, TTL, clear, zeroization)
   - Serialization redaction tests (JSON redact, reject redacted deserialize)

6. **Code quality**:
   - `cargo fmt --check` passes
   - `cargo clippy` passes with no warnings
   - No dead code (removed irpc/actor/password paths fully gone)

## Acceptance Criteria

- [ ] All 10 drift items verified resolved
- [ ] VaultServiceHandle API matches service.md
- [ ] DerivedKey / KeyType match protocol.md
- [ ] EncryptedData / EncryptionKey match encryption.md
- [ ] Mnemonic / Seed / ExtendedPrivKey match mnemonic-derivation.md
- [ ] KeyCache / CachedKey / CacheConfig match service.md
- [ ] PATHS constants match mnemonic-derivation.md
- [ ] All security constraints satisfied (OsRng, zeroize, no unwrap, move-only, redaction)
- [ ] Public API (lib.rs re-exports) matches vault README
- [ ] Tests cover every area listed under Test coverage in the checklist
- [ ] `cargo fmt --check` passes
- [ ] `cargo clippy` passes with no warnings
- [ ] All tests pass
- [ ] No dead code from removed features (irpc, actor, password derivation)

## References

- docs/architecture/crates/vault/README.md — drift table, public API, security constraints
- docs/architecture/crates/vault/service.md
- docs/architecture/crates/vault/encryption.md
- docs/architecture/crates/vault/protocol.md
- docs/architecture/crates/vault/mnemonic-derivation.md
- docs/architecture/decisions/018-vault-standalone-crate.md
- docs/architecture/decisions/020-hd-derivation-for-encryption-keys.md
- docs/architecture/decisions/021-key-rotation-via-version-indexed-paths.md
- docs/architecture/decisions/025-vault-local-only-dispatch.md
- docs/architecture/decisions/026-vault-key-model-hd-derivation.md

## Notes

> This review verifies the vault is spec-conformant after all drift fixes. If
> deviations are found, document them and create fix tasks before proceeding
> to the spec-sync task. This is the last checkpoint before the vault docs are
> updated to remove the drift table and bump status.

## Summary

> To be filled on completion
107
tasks/vault/spec-sync-remove-drift.md
Normal file
@@ -0,0 +1,107 @@
|
||||
---
|
||||
id: vault/spec-sync-remove-drift
|
||||
name: Update vault specs to remove drift table and security-constraint drift prose, bump doc status
|
||||
status: pending
|
||||
depends_on: [vault/review-vault-sync]
|
||||
scope: narrow
|
||||
risk: low
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
After the vault review confirms all drift is resolved, update the vault
|
||||
architecture docs to remove the drift tracking artifacts and reflect the
|
||||
completed state. The drift table and the "known drift" prose in the security
|
||||
constraints sections were tracking tools during the spec-to-implementation
|
||||
sync — now that the sync is complete, they should be cleaned up.
|
||||
|
||||
### What to update
|
||||
|
||||
1. **vault/README.md**:
|
||||
- Remove the "Known Source Drift" section (the entire table and its intro
|
||||
paragraph). The drift is resolved; the table is no longer needed.
|
||||
- Remove the "Security Constraints" drift prose — the items that said
|
||||
"The current source uses `rand::random()` — this is a known drift" etc.
|
||||
Keep the constraint statements themselves (OsRng for IVs, zeroized drop,
|
||||
no unwrap, etc.) — those are permanent implementation requirements. Remove
|
||||
only the "current source uses X, this is a known drift" sentences.
|
||||
- Bump `status: draft` → `status: stable` in the frontmatter (per the
|
||||
Document Lifecycle in the architecture README: stable = implementation
|
||||
complete and verified).
|
||||
|
||||
2. **vault/encryption.md**:
|
||||
- In Security Constraints, remove the "The current source uses
|
||||
`rand::random()` for IV generation (`encryption.rs` line 133) — this is a
|
||||
known drift from the spec and must be corrected during implementation
|
||||
sync." sentence. Keep the "OsRng for IVs" constraint.
|
||||
- In Key Versioning, remove the "The current source uses
|
||||
`CURRENT_KEY_VERSION = 1` with HD derivation and does not implement
|
||||
version-indexed paths or `rotate`. These are drift items to be corrected
|
||||
during implementation sync." paragraph.
|
||||
- Bump `status: draft` → `status: stable`.

3. **vault/service.md**:
   - In Security Constraints, remove the drift prose about `rand::random()`,
     `unwrap()` on RwLock, and `KeyCache::clear()` verification. Keep the
     constraint statements.
   - Bump `status: draft` → `status: stable`.

4. **vault/protocol.md**:
   - Remove the "to be updated per ADR-025 — remove `VaultProtocol` enum and
     irpc usage" note in References.
   - Remove the "postcard tests to be removed" note in References.
   - Bump `status: draft` → `status: stable`.

5. **vault/mnemonic-derivation.md**:
   - Bump `status: draft` → `status: stable`. There is no drift prose to remove
     here, but the doc should still reflect stable status.

6. **architecture/README.md**:
   - Update the vault crate doc status entries in the Architecture Documents
     table from `draft` to `stable`.
   - Update the Current State paragraph to reflect that the vault
     implementation is complete (remove the "pending ADR-025/026 refactor"
     language).

### What NOT to change

- Do not remove the Security Constraints sections themselves — they are
  permanent implementation requirements, not drift tracking.
- Do not change the ADRs — they record decisions, not implementation status.
- Do not remove the Public API section — it's a living reference.

### Scope

This task touches only documentation files — no source code changes. It
depends on the review task (which depends on all drift fixes).

## Acceptance Criteria

- [ ] "Known Source Drift" section (table and intro paragraph) removed from vault/README.md
- [ ] Drift prose removed from Security Constraints sections (constraint statements kept)
- [ ] All vault doc frontmatter bumped from `status: draft` to `status: stable`
- [ ] architecture/README.md vault doc statuses updated to `stable`
- [ ] architecture/README.md Current State updated (no "pending refactor" language)
- [ ] No drift-tracking language remains anywhere in vault docs
- [ ] Security constraint statements (OsRng, zeroize, no unwrap, etc.) preserved
- [ ] Public API section preserved in vault/README.md
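
Most of these criteria are text checks, so a rough mechanical pass can catch leftovers before the final read-through. The sketch below is a hypothetical helper that is not part of any crate. `doc_problems` takes the contents of one vault doc and reports two kinds of problem: a frontmatter `status` other than `stable`, and any line containing one of a few drift phrases. The phrase list is an assumption drawn from the wording quoted in this task. A clean result therefore does not replace reading the docs.

```rust
/// Drift phrases, lowercased, taken from the wording quoted in this task.
/// Assumption: other drift wording may exist and would need adding here.
const DRIFT_PHRASES: &[&str] = &[
    "known drift",
    "known source drift",
    "drift items to be corrected",
    "to be updated per adr-025",
    "postcard tests to be removed",
];

/// Returns the `status:` value from a leading `---`-delimited frontmatter block.
fn frontmatter_status(doc: &str) -> Option<&str> {
    let mut lines = doc.lines();
    if lines.next()?.trim() != "---" {
        return None; // no frontmatter at all
    }
    for line in lines {
        let line = line.trim();
        if line == "---" {
            break; // end of frontmatter without a status field
        }
        if let Some(value) = line.strip_prefix("status:") {
            return Some(value.trim());
        }
    }
    None
}

/// Hypothetical check: lists leftovers that would fail the acceptance criteria.
fn doc_problems(doc: &str) -> Vec<String> {
    let mut problems = Vec::new();
    match frontmatter_status(doc) {
        Some("stable") => {}
        Some(other) => problems.push(format!("status is `{other}`, expected `stable`")),
        None => problems.push("no `status:` field in frontmatter".to_string()),
    }
    for (i, line) in doc.lines().enumerate() {
        // Case-insensitive match, so the "Known Source Drift" heading is caught too.
        let lower = line.to_lowercase();
        for &phrase in DRIFT_PHRASES {
            if lower.contains(phrase) {
                problems.push(format!("line {}: contains \"{phrase}\"", i + 1));
            }
        }
    }
    problems
}

fn main() {
    let before = "---\nstatus: draft\n---\nThe current source uses rand::random(); this is a known drift.\n";
    let after = "---\nstatus: stable\n---\nIVs must be generated with OsRng.\n";
    for problem in doc_problems(before) {
        println!("{problem}");
    }
    assert!(doc_problems(after).is_empty());
}
```

Running it over each file touched by this task (README, encryption, service, protocol, mnemonic-derivation) gives a quick pass or fail per file. The phrase list should grow if the review turns up other drift wording.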

## References

- docs/architecture/crates/vault/README.md — Known Source Drift, Security Constraints, Public API
- docs/architecture/crates/vault/encryption.md — Security Constraints, Key Versioning
- docs/architecture/crates/vault/service.md — Security Constraints
- docs/architecture/crates/vault/protocol.md — References
- docs/architecture/README.md — Document Lifecycle, Architecture Documents table, Current State

## Notes

> This is the doc cleanup that closes out the vault phase. The drift table and
> "known drift" prose were tracking tools during spec-to-implementation sync;
> now that the sync is complete, they're noise. Keep the permanent constraint
> statements — they guide future implementation agents who touch the vault.

## Summary

> To be filled on completion
79
tasks/vault/unlock-new-zeroizing-return.md
Normal file
@@ -0,0 +1,79 @@
---
id: vault/unlock-new-zeroizing-return
name: Change unlock_new return type from String to Zeroizing<String>
status: pending
depends_on: [vault/irpc-removal]
scope: single
risk: low
impact: isolated
level: implementation
---

## Description

Fix drift item #8: `unlock_new` currently returns `String`, which is not
zeroized on drop. The mnemonic phrase is the root of trust — it must not linger
in freed heap memory. Change the return type to `Zeroizing<String>` (from the
`zeroize` crate, already a dependency).

### Current state

```rust
pub fn unlock_new(&self, word_count: usize) -> Result<String, VaultServiceError>;
```

### Target state

```rust
pub fn unlock_new(&self, word_count: usize) -> Result<Zeroizing<String>, VaultServiceError>;
```

Per `docs/architecture/crates/vault/service.md` → unlock_new:

> The returned phrase is the root of trust — it is heap-allocated and zeroized
> on drop, so it does not linger in freed memory. The caller should extract the
> phrase for secure storage (write down, display to user) and let the
> `Zeroizing<String>` drop when done. Do not clone the returned value or store
> it in a non-zeroizing container.
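
The sketch below makes "zeroized on drop" concrete without pulling in the crate. It uses a std-only stand-in, `SecretString`, in place of `zeroize::Zeroizing<String>`. On drop, the stand-in overwrites the buffer with volatile writes followed by a compiler fence, which is broadly the technique `zeroize` uses. Several names here are hypothetical: `SecretString`, the `InvalidWordCount` variant, the unit-struct `VaultService` and the placeholder word list. Real phrase generation is out of scope.

What the sketch shows is the shape of the body. The phrase is built into a buffer reserved at its worst-case size and moved into the wrapper as soon as it is complete. Wrapping only wipes the final buffer: if the `String` reallocated while growing, the earlier buffer would be freed without being wiped.

```rust
use std::ops::Deref;
use std::sync::atomic::{compiler_fence, Ordering};

/// Std-only stand-in for `zeroize::Zeroizing<String>`, for illustration only.
pub struct SecretString(String);

impl Deref for SecretString {
    type Target = String;
    fn deref(&self) -> &String {
        &self.0
    }
}

impl Drop for SecretString {
    fn drop(&mut self) {
        // SAFETY: each initialized byte becomes 0x00, which keeps the String valid UTF-8.
        let bytes = unsafe { self.0.as_bytes_mut() };
        for b in bytes.iter_mut() {
            // Volatile writes keep the compiler from eliding the wipe as a dead store.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

#[derive(Debug)]
pub enum VaultServiceError {
    /// Hypothetical variant used only by this sketch.
    InvalidWordCount(usize),
}

pub struct VaultService;

/// Placeholder words (NOT random); real phrase generation is out of scope here.
const PLACEHOLDER_WORDS: &[&str] = &["alpha", "bravo", "charlie", "delta"];

impl VaultService {
    pub fn unlock_new(&self, word_count: usize) -> Result<SecretString, VaultServiceError> {
        if word_count == 0 {
            return Err(VaultServiceError::InvalidWordCount(word_count));
        }
        // Reserve the worst-case length up front so the buffer never reallocates;
        // a reallocation would leave an earlier, unwiped copy in freed memory.
        let longest = PLACEHOLDER_WORDS.iter().map(|w| w.len()).max().unwrap_or(0);
        let mut phrase = String::with_capacity(word_count * (longest + 1));
        for i in 0..word_count {
            if i > 0 {
                phrase.push(' ');
            }
            phrase.push_str(PLACEHOLDER_WORDS[i % PLACEHOLDER_WORDS.len()]);
        }
        // Moving the String into the wrapper moves its pointer, not the heap bytes.
        Ok(SecretString(phrase))
    }
}

fn main() {
    let service = VaultService;
    match service.unlock_new(12) {
        // `phrase` is dropped at the end of this arm, which wipes its buffer.
        Ok(phrase) => println!("{} words", phrase.split_whitespace().count()),
        Err(VaultServiceError::InvalidWordCount(n)) => println!("rejected word count {n}"),
    }
}
```

With the real crate, the final line would become `Ok(Zeroizing::new(phrase))`. The reserve-then-wrap pattern carries over unchanged.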

### Caller adaptation

The assembly layer (CLI binary, not yet implemented) will call `unlock_new` and
extract the phrase. `Zeroizing<String>` implements `Deref<Target = String>`, so
`&*result` (a `&String`) or `result.as_str()` (a `&str`) can be used to read it.
The caller must not clone the inner `String` into a non-zeroizing container.
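
Existing tests that call `unlock_new` may need updating for the new return
type. Deref coercion lets `&phrase` be passed wherever a `&str` is expected.
Tests that compare the result with a `String`, or move it into one, must
instead read it through `&*phrase` or `phrase.as_str()`.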
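
The sketch below shows which test shapes keep compiling after the change. It uses a hypothetical stand-in, `Wrapped`, that reproduces only the `Deref<Target = String>` behaviour, plus a hypothetical `word_count` helper. One assumption, worth confirming against the `zeroize` docs: `Zeroizing` only implements `PartialEq` against itself, so a direct comparison with a `String` fails to compile for the real type just as it does here.

```rust
use std::ops::Deref;

/// Minimal stand-in for `Zeroizing<String>`; only its `Deref<Target = String>` matters here.
struct Wrapped(String);

impl Deref for Wrapped {
    type Target = String;
    fn deref(&self) -> &String {
        &self.0
    }
}

/// Hypothetical helper that an existing test might already call with `&phrase`.
fn word_count(phrase: &str) -> usize {
    phrase.split_whitespace().count()
}

fn main() {
    let phrase = Wrapped(String::from("alpha bravo charlie delta"));

    // Unchanged call sites: deref coercion turns &Wrapped into &String, then &str.
    assert_eq!(word_count(&phrase), 4);
    // Method calls auto-deref as well.
    assert!(phrase.starts_with("alpha"));

    // Comparing against an owned String needs an explicit read through the wrapper;
    // `assert_eq!(phrase, expected)` would not compile against this stand-in.
    let expected = String::from("alpha bravo charlie delta");
    assert_eq!(&*phrase, &expected);
    assert_eq!(phrase.as_str(), expected);
}
```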

### Scope

This task touches `service.rs` (the method signature and body) and test files.
It depends on the irpc removal task (drift #4) because both modify `service.rs`.

## Acceptance Criteria

- [ ] `unlock_new` return type changed from `Result<String, ...>` to `Result<Zeroizing<String>, ...>`
- [ ] Method body constructs `Zeroizing<String>` from the generated phrase
- [ ] Existing tests updated to handle `Zeroizing<String>` return type
- [ ] No `clone()` of the returned value in non-test code
- [ ] `cargo check` succeeds
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings

## References

- docs/architecture/crates/vault/README.md — Known Source Drift table item #8
- docs/architecture/crates/vault/service.md — unlock_new section
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025 (resolves W7)

## Notes

> The mnemonic is the root of trust. Returning a plain `String` means the phrase
> lingers in freed heap memory after the caller drops it. `Zeroizing<String>`
> zeroizes the bytes on drop. This resolves review #002 W7. Depends on irpc
> removal because both modify `service.rs`.

## Summary

> To be filled on completion