From 2e34590522b5743d4cd3010560162b286b6ed248 Mon Sep 17 00:00:00 2001 From: "glm-5.2" Date: Tue, 23 Jun 2026 10:56:05 +0000 Subject: [PATCH] =?UTF-8?q?docs(architecture):=20resolve=20review=20#003?= =?UTF-8?q?=20=E2=80=94=20type/API=20surface=20completeness?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review #003 found 11 critical, 14 warning, and 6 suggestion findings after reviews #001 (governance/security) and #002 (cross-document consistency/two-way-door audit) were resolved. The theme: types and APIs that were *referenced* but never *defined*, and stale ADR sketches that didn't match the now-updated spec docs. Critical fixes (11): - C1: DerivedKey #[derive(Deserialize)] contradicted the custom Deserialize that rejects "[REDACTED]" — dropped the derive, added explicit manual Serialize/Deserialize impls (protocol.md). - C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the signature takes key_version — updated to encryption_path_for_version (service.md). - C3: derive_encryption_key returned DerivedKey, derive_encryption_key _for_version returned EncryptionKey (same cache) — unified on DerivedKey, defined CachedKey (service.md). - C4: tokio vs std::sync::RwLock contradiction — specified std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025, service.md). - C5: Missing drift rows in vault README — added #9 (key_version ignored) and #10 (rotate not implemented). - C6: ADR-022 build_root_context and invoke() sketches omitted abort_policy (9 fields vs 10) — added the field to both sketches. - C7: Capabilities type referenced 20+ times, never defined — added struct definition to core-types.md with Clone+Send+Sync, Zeroize, sealed builder API, immutability guard. - C8: SessionOverlaySource on CallAdapter but never defined, crate violation (alknet-call can't depend on alknet-agent) — defined the trait in alknet-call (call-protocol.md), matching the IdentityProvider pattern. - C9: CompositeOperationEnv dispatch fall-through was "a two-way door" — added contains() to OperationEnv trait, made the composite probe before dispatching, eliminating the sentinel ambiguity. - C10: No API for Layer 2 (connection overlay) registration, CallConnection undefined — defined CallConnection struct + register_imported() API (call-protocol.md). - C11: with_local signature diverged between two examples (4 args vs 5) — added capabilities as the 5th arg, made both examples consistent. Warning fixes (14): - W1: invoke_with_policy restructured as required method, invoke gets a default impl delegating to it — eliminates duplication across impls. - W2: CachedKey defined (service.md). - W3: EncryptionKey constructor/glue specified, added to re-export list. - W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown. - W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2). - W6: Wire payload schemas for all event types + ResponseEnvelope → EventEnvelope conversion table (call-protocol.md). - W7: Timeout section — deadline on OperationContext, composed calls inherit parent's deadline, CallAdapter::with_timeout(). - W8: Request ID generation spec — UUID v4 for composed calls, wire ID vs internal ID relationship for abort cascade. - W9: unlock_new already-unlocked behavior specified (returns AlreadyUnlocked). - W10: KeyType Serialize/Deserialize justification corrected (stale irpc reference removed). - W11: OperationProvenance and CompositionAuthority defined inline in operation-registry.md (were only in ADR-022). - W12: encrypt/decrypt free functions marked pub(crate), relationship to VaultServiceHandle methods stated. - W13: rotate signature removed from encryption.md (it's a VaultServiceHandle method, not a free function). - W14: CallAdapter::new() + with_session_source() + with_timeout() constructors shown. Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant, ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018 stale actor text, derivation helpers re-export note. --- docs/architecture/README.md | 6 +- .../architecture/crates/call/call-protocol.md | 151 +++++- .../crates/call/operation-registry.md | 186 +++++-- docs/architecture/crates/core/core-types.md | 64 ++- docs/architecture/crates/vault/README.md | 11 +- docs/architecture/crates/vault/encryption.md | 88 ++- .../crates/vault/mnemonic-derivation.md | 58 +- docs/architecture/crates/vault/protocol.md | 70 ++- docs/architecture/crates/vault/service.md | 72 ++- .../decisions/018-vault-standalone-crate.md | 12 +- ...on-provenance-and-composition-authority.md | 4 +- .../025-vault-local-only-dispatch.md | 11 +- docs/architecture/overview.md | 4 +- ...mplementation-architecture-sanity-check.md | 512 ++++++++++++++++++ 14 files changed, 1129 insertions(+), 120 deletions(-) create mode 100644 docs/reviews/003-pre-implementation-architecture-sanity-check.md diff --git a/docs/architecture/README.md b/docs/architecture/README.md index 0155ddc..54bd75f 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -1,15 +1,15 @@ --- status: draft -last_updated: 2026-06-22-20 +last_updated: 2026-06-23 --- # Alknet Architecture ## Current State -**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation exists, pending ADR-025/026 refactor to drop irpc and remove derive_password) and research/reference material. Foundational ADRs (001–026) are in place. ADR-024 resolves the registry mutability question and the `OperationContext.env` type identity crisis by layering the registry by trust boundary. ADR-025 drops irpc from the vault, making it local-only by construction. ADR-026 records the HD-derivation key model as a foundational decision. The alknet-core, alknet-call, and alknet-vault crate specs are in draft. +**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation exists, pending ADR-025/026 refactor to drop irpc and remove derive_password) and research/reference material. Foundational ADRs (001–026) are in place. ADR-024 resolves the registry mutability question and the `OperationContext.env` type identity crisis by layering the registry by trust boundary. ADR-025 drops irpc from the vault, making it local-only by construction. ADR-026 records the HD-derivation key model as a foundational decision. Review #003 (type/API surface completeness) resolved: `DerivedKey` derive contradiction, `encrypt` prose, return-type divergence, RwLock contradiction, drift table gaps, ADR-022 stale sketches, `Capabilities`/`SessionOverlaySource`/`CallConnection`/`CachedKey` definitions, `CompositeOperationEnv` dispatch contract, `with_local` signature, payload schemas, timeout propagation, and request ID generation. The alknet-core, alknet-call, and alknet-vault crate specs are in draft. -**Next step**: Continue working through review #002's remaining Tier 4 findings (vault security decisions, guard clauses, ADR-writing exercises, smaller spec decisions). All open questions for the core and call crates are resolved; the vault crate's OQ-21 (remote vault) is now resolved (ADR-025 — vault is local-only by construction). +**Next step**: Implementation. All open questions are resolved. The specs have passed three review passes (#001 governance/security model, #002 cross-document consistency/two-way-door audit, #003 type/API surface completeness). ## Architecture Documents diff --git a/docs/architecture/crates/call/call-protocol.md b/docs/architecture/crates/call/call-protocol.md index 18fdb60..d3cda72 100644 --- a/docs/architecture/crates/call/call-protocol.md +++ b/docs/architecture/crates/call/call-protocol.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-22 +last_updated: 2026-06-23 --- # Call Protocol @@ -39,13 +39,54 @@ pub struct CallAdapter { /// Layer 1 — optional session-overlay source (agent crate supplies this; /// None for non-agent deployments). See ADR-024, OQ-19. session_source: Option>, + /// Default timeout for wire calls (30s). Composed calls inherit the + /// parent's remaining deadline via `OperationContext.deadline`. + default_timeout: Duration, } -// The connection's imported-ops overlay (Layer 2) is built per CallConnection -// as from_call discovery completes — it's not a field on CallAdapter but -// rather state held by the CallConnection / dispatch context for incoming -// calls on that connection. See ADR-024. -``` +impl CallAdapter { + /// Non-agent deployment: no session overlay, default timeout. + pub fn new( + registry: Arc, + identity_provider: Arc, + ) -> Self { + Self { registry, identity_provider, session_source: None, + default_timeout: Duration::from_secs(30) } + } + + /// Agent deployment: supply a session-overlay source. The agent crate + /// implements `SessionOverlaySource`; alknet-call defines the trait. + pub fn with_session_source(mut self, source: Arc) -> Self { + self.session_source = Some(source); + self + } + + /// Override the default timeout. + pub fn with_timeout(mut self, timeout: Duration) -> Self { + self.default_timeout = timeout; + self + } +} + +/// Session overlay integration point (ADR-024). Defined in alknet-call +/// because `CallAdapter` must name the type — alknet-call cannot depend on +/// alknet-agent (agent depends on call, not reverse). The agent crate +/// implements this trait; alknet-call defines it. This is the same pattern +/// as `IdentityProvider` (ADR-004: core defines the trait, handlers impl it). +/// +/// The session overlay is an `OperationEnv` impl that wraps the curated base +/// (Layer 0). The `CallAdapter` composes it into the root +/// `OperationContext.env` per incoming call when a session is active. The +/// lookup mechanism (session ID in metadata, payload field, connection-bound +/// session state) belongs to the agent crate — this trait is the integration +/// point, not the lookup policy. +pub trait SessionOverlaySource: Send + Sync { + /// Returns the session overlay env for the given call, if a session is + /// active. `None` means no session is active for this call — the root + /// env is `curated base + connection overlay` (no session layer). + /// The agent crate determines how to map a call to its session. + fn overlay_for(&self, context: &OperationContext) -> Option>; +} The `CallAdapter` holds the static curated registry and an optional session-overlay source. Per-connection imported-ops overlays (Layer 2, @@ -53,6 +94,68 @@ ADR-024) are held with the connection and composed into the root `OperationContext.env` per incoming call. See ADR-024 for the layering model and `compose_root_env` below. +### CallConnection + +A `CallConnection` represents an established `alknet/call` connection, +regardless of which side opened it (ADR-017). It holds the connection's +imported-ops overlay (Layer 2, ADR-024) — the set of `from_call`-imported +operations discovered when the connection was established. + +```rust +/// An established alknet/call connection (either direction — accepted or +/// opened). Holds the connection's Layer 2 overlay (imported ops). +pub struct CallConnection { + /// The underlying QUIC connection (from endpoint.accept or CallClient.connect). + connection: Connection, + /// Layer 2 — this connection's imported-ops overlay. Populated by + /// `from_call` discovery when the connection is established. Each + /// imported op is a `HandlerRegistration` with `provenance: FromCall`. + /// This overlay is an `OperationEnv` impl that the `CallAdapter` + /// composes into the root `OperationContext.env` per incoming call. + imported_operations: Arc>>, +} + +impl CallConnection { + /// Register an imported operation into this connection's overlay + /// (Layer 2, ADR-024). Called by `from_call` after discovery. + pub fn register_imported(&self, registration: HandlerRegistration) { + let name = registration.spec.name.clone(); + self.imported_operations.write().insert(name, registration); + } + + /// Register multiple imported operations (bulk variant for `from_call`). + pub fn register_imported_all(&self, registrations: Vec) { + let mut overlay = self.imported_operations.write(); + for reg in registrations { + overlay.insert(reg.spec.name.clone(), reg); + } + } + + /// Build an `OperationEnv` impl for this connection's overlay. Used by + /// the `CallAdapter` when composing the root `OperationContext.env`. + /// Returns an `OperationEnv` that dispatches to this connection's + /// imported ops (and reports `contains` only for ops in the overlay). + pub fn overlay_env(&self) -> Arc; + + /// Call an operation on the remote peer (sends `call.requested`). + pub async fn call(&self, operation_id: &str, input: Value) -> ResponseEnvelope; + + /// Subscribe to a streaming operation on the remote peer. + pub async fn subscribe(&self, operation_id: &str, input: Value) -> impl Stream; + + /// Abort an in-flight request (sends `call.aborted`, cascades per ADR-016). + pub async fn abort(&self, request_id: &str); +} +``` + +**Layer 0 vs Layer 2 registration API** (ADR-024): `OperationRegistryBuilder` +builds Layer 0 (curated, immutable after startup) via `.with_local()` / +`.with_leaf()` / `.with()`. Layer 2 (per-connection) registration uses +`CallConnection::register_imported()` at runtime — the builder is +Layer-0-only; runtime overlay registration uses `CallConnection` methods. +When the connection drops, the overlay (and all imported ops) is dropped — +no explicit deregistration needed. + The adapter: 1. Accepts bidirectional streams on the connection 2. Reads length-prefixed JSON `EventEnvelope` frames from each stream @@ -162,6 +265,29 @@ Fields: New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`. +### Wire Payload Schemas + +The `payload` field of `EventEnvelope` has a different shape per event type: + +| Event | `payload` shape | +|-------|----------------| +| `call.requested` | `{ "operationId": "/fs/readFile", "input": {...}, "auth_token": "alk_..." (optional) }` | +| `call.responded` | `{ "output": }` — the operation's output, matching `output_schema` | +| `call.completed` | `{}` — empty object (subscription stream end signal) | +| `call.aborted` | `{}` — empty object (cancellation signal; the `id` identifies which request) | +| `call.error` | `{ "code": "...", "message": "...", "retryable": bool, "details": {...} (optional) }` | + +### `ResponseEnvelope` → `EventEnvelope` Conversion + +Local dispatch produces `ResponseEnvelope { request_id, result: Result }`. The `CallAdapter` converts it to `EventEnvelope` for the wire: + +| `ResponseEnvelope` | `EventEnvelope` | +|--------------------|-----------------| +| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` | +| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: }` | + +The `request_id` becomes the `id` field. For subscriptions, each `call.responded` is a separate `EventEnvelope` with the same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`. + ### Protocol Operations The call protocol defines four top-level operations, expressed through event types and operation names: @@ -304,6 +430,7 @@ fn build_root_context( handler_identity: registration.composition_authority.clone(), capabilities: registration.capabilities.clone(), // from the registration bundle metadata: HashMap::new(), // fresh per request + deadline: Some(Instant::now() + self.default_timeout), // root deadline (W7) scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // from the bundle, empty for leaves // Per-call env composition (ADR-024): the root env is a composite @@ -349,7 +476,17 @@ Local dispatch produces `ResponseEnvelope` with no serialization overhead. The ` **Stream reset**: When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an error. If the stream was carrying a subscription, the `PendingRequestMap` entry is removed and the mpsc channel is closed. If the stream was carrying a call, the oneshot is resolved with an error. No `call.aborted` is sent — the stream is gone. -**Timeouts**: Default timeout for calls is 30 seconds. Default timeout for subscriptions is optional (the client can specify a timeout in the `call.requested` payload, or leave it open-ended). The `PendingRequestMap` sweeper runs every 10 seconds and removes expired entries. Timeouts are configurable at the `CallAdapter` level, not per-operation. +**Timeouts**: Default timeout for wire calls is 30 seconds, configurable via +`CallAdapter::with_timeout()`. The `build_root_context` sets +`OperationContext.deadline` to `now + default_timeout`. Composed calls +inherit the parent's deadline (children do **not** get a fresh 30s — the +root call's deadline bounds the entire call tree, preventing a depth-5 +composition from running 150s). A composed call that exceeds the deadline +is cancelled (future dropped, `Drop` guards release resources) and returns +`CallError { code: "TIMEOUT", retryable: true }`. Subscriptions default to +no deadline (`deadline: None` — unbounded); the client can specify a +timeout in the `call.requested` payload. The `PendingRequestMap` sweeper +runs every 10 seconds and removes expired wire entries. **Error handling in `CallAdapter::handle()`**: If a handler panics, the stream is closed and the `PendingRequestMap` entry (if any) is cleaned up by the next sweeper pass. Other streams and the connection are unaffected. diff --git a/docs/architecture/crates/call/operation-registry.md b/docs/architecture/crates/call/operation-registry.md index 67211eb..25f550e 100644 --- a/docs/architecture/crates/call/operation-registry.md +++ b/docs/architecture/crates/call/operation-registry.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-22 +last_updated: 2026-06-23 --- # Operation Registry @@ -136,6 +136,14 @@ pub struct OperationContext { /// composing handler via `OperationEnv::invoke()` (or /// `invoke_with_policy()`), not by the wire caller. pub abort_policy: AbortPolicy, + /// Deadline for this call and all descendants. Set by `build_root_context` + /// to `now + CallAdapter.default_timeout` (default 30s). Composed calls + /// inherit the parent's deadline (children do not get a fresh 30s — the + /// root call's deadline bounds the entire call tree). A composed call + /// that exceeds the deadline is cancelled (future dropped, `Drop` guards + /// release resources). `None` means no deadline (unbounded — used for + /// long-running subscriptions). See call-protocol.md → Timeouts. + pub deadline: Option, /// Composition-origin flag. Set by `OperationEnv::invoke()` (true) or the /// `CallAdapter` dispatch path (false) — never by handlers. Module-private /// for writes; read via `is_internal()`. See ADR-015. @@ -191,6 +199,27 @@ The registry maps operation names to `HandlerRegistration` bundles. The curated - `invoke(name, input, context)`: Look up, check ACL, invoke handler, return result - `list_operations()`: Return all registered specs (for `/services/list` — returns curated + active overlay ops) +### Request ID Generation + +Request IDs correlate `call.requested`/`call.responded` events and index the +abort-cascade tree (`PendingRequestMap` is keyed by request ID, ADR-016). + +- **Wire calls**: the root `OperationContext.request_id` is the `id` field + from the wire `call.requested` event (generated by the client). +- **Composed calls**: `OperationEnv::invoke()` generates a new `request_id` + for each child via `generate_request_id()` — a UUID v4 (or + `parent_id + "-" + counter`). Deterministic IDs (e.g. + `format!("env-{name}")`) **must not** be used — they collide across + concurrent invocations of the same operation, corrupting + `PendingRequestMap` correlation and the abort-cascade tree. +- **Wire visibility**: composed child `request_id`s are **internal** — they + appear in `PendingRequestMap` for abort-cascade indexing but are not sent + as `call.requested` to any peer. The client only sees `call.aborted` for + the root ID it sent; the server cascades internally to descendants. The + exception is `from_call` ops, which generate their own wire ID when + forwarding to the remote node (the remote node's `PendingRequestMap` + indexes it). + ### HandlerRegistration The registration bundle carries everything the dispatch path needs to construct an `OperationContext`. See ADR-022 for the full rationale. @@ -206,25 +235,74 @@ pub struct HandlerRegistration { } ``` -- `provenance`: Where the op came from (`Local`, `FromOpenAPI`, `FromMCP`, `FromCall`, `FromJsonSchema`, `Session`). Determines composition capability, default visibility, and trust model. Only `Local` and `Session` ops can compose; leaves get `composition_authority: None` and `scoped_env: None`. -- `composition_authority`: The declared authority (label + scopes + resources) the handler operates under when composing children. `None` for leaves. This replaces ADR-015's `handler_identity: Identity` — it's not a peer identity, it's a declared authority bundle. See ADR-022. +#### OperationProvenance + +Where the op came from. Determines composition capability, default +visibility, and trust model. See ADR-022 for rationale. + +```rust +pub enum OperationProvenance { + Local, // Assembly-written, trusted, can compose + FromOpenAPI, // HTTP forwarding stub (from_openapi), leaf + FromMCP, // MCP forwarding stub (from_mcp), leaf + FromCall, // QUIC forwarding stub (from_call), leaf locally + FromJsonSchema, // JSON Schema definition, no handler — schema only + Session, // Agent-written, sandboxed, can compose within sandbox +} +``` + +| Provenance | Can compose? | Has composition authority? | Default visibility | +|-----------|-------------|---------------------------|-------------------| +| `Local` | Yes | Yes — scopes set by assembly layer | External or Internal (assembly declares) | +| `FromOpenAPI` | No (leaf) | No | Internal | +| `FromMCP` | No (leaf) | No | Internal | +| `FromCall` | No (leaf in local registry) | No | Internal | +| `FromJsonSchema` | N/A (no handler) | No | N/A | +| `Session` | Yes (within sandbox) | Yes — scopes set at sandbox creation | Internal always | + +#### CompositionAuthority + +The declared authority (label + scopes + resources) the handler operates +under when composing children. `None` for leaves. This replaces ADR-015's +`handler_identity: Identity` — it's not a peer identity, it's a declared +authority bundle. See ADR-022. + +```rust +pub struct CompositionAuthority { + pub label: String, // e.g., "agent-chat" — not a peer id + pub scopes: Vec, // e.g., ["llm:call", "fs:read"] + pub resources: HashMap>, // e.g., {"service": ["vastai"]} +} + +impl CompositionAuthority { + pub fn none() -> Option { None } // Convenience for leaves + pub fn new(label: &str, scopes: impl IntoIterator) -> Self { ... } + pub fn as_identity(&self) -> Option { ... } // Synthetic Identity for ACL +} +``` + +- `provenance`: Determines composition capability. Only `Local` and `Session` ops can compose; leaves get `composition_authority: None` and `scoped_env: None`. +- `composition_authority`: The declared authority the handler operates under when composing children. `None` for leaves. See ADR-022. - `scoped_env`: The set of operations this handler may reach via `env.invoke()`. `None` for leaves (empty env). The reachability control from ADR-015. - `capabilities`: Outbound credentials (decrypted API keys, signing keys). Populated by the assembly layer from the vault at registration time. See [Capability Injection](#capability-injection). The `OperationRegistryBuilder` provides a fluent API with convenience methods for common cases: ```rust +// with_local: Local provenance, full bundle — all 5 args required. +// with_local(spec, handler, composition_authority, scoped_env, capabilities) let registry = OperationRegistryBuilder::new() - // Built-in service discovery (Local, no composition) + // Built-in service discovery (Local, no composition — empty authority, empty env, empty caps) .with_local(services_list_spec(), Arc::new(services_list_handler), - CompositionAuthority::none(), ScopedOperationEnv::empty()) + CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new()) .with_local(services_schema_spec(), Arc::new(schema_handler), - CompositionAuthority::none(), ScopedOperationEnv::empty()) - // Agent handler (Local, composes — has authority + scoped env) + CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new()) + // Agent handler (Local, composes — authority + scoped env + capabilities) .with_local(agent_chat_spec(), Arc::new(agent_chat_handler), CompositionAuthority::new("agent-chat", ["llm:call", "fs:read", "vastai:query"]), - ScopedOperationEnv::new(["fs/readFile", "vastai/listMachines", "llm/generate"])) - // Imported ops (leaves — no authority, no scoped env) + ScopedOperationEnv::new(["fs/readFile", "vastai/listMachines", "llm/generate"]), + Capabilities::new().with_api_key("google", google_api_key)) + // Imported ops (leaves — no authority, no scoped env; capabilities for outbound HTTP) .with_leaf(vastai_listMachines_spec(), Arc::new(vastai_handler), vastai_credentials) .build(); ``` @@ -249,19 +327,25 @@ pub trait OperationEnv: Send + Sync { /// Compose a child operation. The child's `OperationContext` is /// constructed with `internal: true`, inheriting the parent's /// composition authority as the child's caller identity. The abort - /// policy defaults to the parent's (ADR-016 Decision 6). + /// policy defaults to the parent's (ADR-016 Decision 6, W19). + /// + /// Default impl: delegates to `invoke_with_policy` with + /// `parent.abort_policy.clone()`. Impls only need to implement + /// `invoke_with_policy` — `invoke` is provided. async fn invoke( &self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, - ) -> ResponseEnvelope; + ) -> ResponseEnvelope { + self.invoke_with_policy(namespace, operation, input, parent, parent.abort_policy.clone()).await + } /// Compose a child with an explicit abort policy (ADR-016 Decision 6). /// Use `AbortPolicy::ContinueRunning` for long-running work that - /// should survive a parent's abort. The default `invoke()` inherits - /// the parent's policy; this method overrides it for this child. + /// should survive a parent's abort. This is the required method — + /// `invoke()` delegates to it with the parent's policy. async fn invoke_with_policy( &self, namespace: &str, @@ -270,6 +354,14 @@ pub trait OperationEnv: Send + Sync { parent: &OperationContext, policy: AbortPolicy, ) -> ResponseEnvelope; + + /// Does this env contain the named operation? Used by + /// `CompositeOperationEnv` to probe overlays before dispatching + /// (ADR-024). The composite checks `session.contains()` → + /// `connection.contains()` → base, dispatching to the first overlay + /// that contains the op. Default impl returns `true` (a single-layer + /// env like `LocalOperationEnv` contains everything it can dispatch). + fn contains(&self, name: &str) -> bool { true } } ``` @@ -292,7 +384,10 @@ pub struct LocalOperationEnv { #[async_trait] impl OperationEnv for LocalOperationEnv { - async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope { + // `invoke` uses the default impl (delegates to `invoke_with_policy` + // with `parent.abort_policy.clone()`). + + async fn invoke_with_policy(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope { let name = format!("{namespace}/{operation}"); // Reachability check (ADR-015, ADR-022): is this op in the parent's @@ -307,7 +402,7 @@ impl OperationEnv for LocalOperationEnv { let registration = self.registry.registration(&name); let context = OperationContext { - // Unique per invocation — a counter, UUID, or parent_id + suffix. + // Unique per invocation — a UUID v4 or parent_id + counter. // A deterministic ID (e.g. format!("env-{name}")) collides across // concurrent invocations of the same operation, which corrupts // PendingRequestMap correlation and the abort-cascade tree @@ -324,21 +419,21 @@ impl OperationEnv for LocalOperationEnv { handler_identity: registration.composition_authority.clone(), capabilities: parent.capabilities.clone(), // Inherit caller's capabilities metadata: HashMap::new(), // Fresh — does NOT propagate parent metadata (ADR-014) + abort_policy: policy, // Explicit policy (from invoke() default or invoke_with_policy) + deadline: parent.deadline, // Inherit parent's deadline (children don't get a fresh 30s) scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // Child's own scoped env (empty for leaves) // Dispatch trait: the child inherits the parent's env (the same // composite of curated base + active overlays). See ADR-024. env: parent.env.clone(), - // Abort policy: inherit the parent's policy by default (ADR-016). - // The parent handler can override via `invoke_with_policy()`. - abort_policy: parent.abort_policy.clone(), internal: true, // Nested calls use handler authority }; self.registry.invoke(&name, input, context).await } - // invoke_with_policy() delegates to invoke() with the policy set on the - // child context (ADR-016 Decision 6). See the trait definition above. + // `contains` uses the default impl (returns true — the curated registry + // contains everything it can dispatch). For a single-layer env, the + // reachability check in `invoke_with_policy` is the real gate. } ``` @@ -357,34 +452,48 @@ pub struct CompositeOperationEnv { #[async_trait] impl OperationEnv for CompositeOperationEnv { - async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope { + // `invoke` uses the default impl (delegates to `invoke_with_policy` + // with `parent.abort_policy.clone()`). + + async fn invoke_with_policy(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope { let name = format!("{namespace}/{operation}"); // Reachability check against parent.scoped_env (same as LocalOperationEnv). if !parent.scoped_env.allows(&name) { return ResponseEnvelope::not_found(name); } // Dispatch in overlay order: session → connection → curated base. - // First match wins. Each overlay is an OperationEnv impl that knows - // its own registry; the composite routes to the right one. + // First overlay that *contains* the op wins. `contains()` (ADR-024) + // is the probe — it avoids the sentinel-return ambiguity and ensures + // cross-impl interop: any OperationEnv impl that correctly reports + // `contains` works with this composite. if let Some(session) = &self.session { - // session impl checks its own registry; if not found, falls - // through (returns a sentinel or the composite continues). - // Implementation detail: the session impl's `invoke` either - // dispatches or returns a "not in this overlay" signal. + if session.contains(&name) { + return session.invoke_with_policy(namespace, operation, input, parent, policy).await; + } } if let Some(connection) = &self.connection { - // same pattern + if connection.contains(&name) { + return connection.invoke_with_policy(namespace, operation, input, parent, policy).await; + } } - self.base.invoke(namespace, operation, input, parent).await + self.base.invoke_with_policy(namespace, operation, input, parent, policy).await + } + + fn contains(&self, name: &str) -> bool { + // The composite contains the op if any layer does. + self.session.as_ref().map_or(false, |s| s.contains(name)) + || self.connection.as_ref().map_or(false, |c| c.contains(name)) + || self.base.contains(name) } } ``` -The exact "first match wins" mechanism (sentinel return, a separate -`contains` check, or a try/else pattern) is a two-way door for -implementation — the structural decision (composite trait object, overlay -order, `Arc::clone` inheritance) is what ADR-024 locks. -``` +The `contains()` method (review #003 C9) is the overlay-dispatch contract. +It replaces the previous "sentinel or contains check — two-way door" framing, +which was ambiguous enough to produce non-interoperable `OperationEnv` impls. +The structural decision (composite trait object, overlay order, `Arc::clone` +inheritance) is locked by ADR-024; the dispatch contract (`contains` probe +before `invoke_with_policy`) is now locked too. Two things happen in `invoke()`: @@ -456,12 +565,12 @@ let vastai_credentials = Capabilities::new().with_http_token("vastai", vastai_to // Register operations — vault operations are NOT registered here let registry = OperationRegistryBuilder::new() - // Built-in service discovery (Local, no composition) + // Built-in service discovery (Local, no composition — empty caps) .with_local(services_list_spec(), Arc::new(services_list_handler), - CompositionAuthority::none(), ScopedOperationEnv::empty()) + CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new()) .with_local(services_schema_spec(), Arc::new(schema_handler), - CompositionAuthority::none(), ScopedOperationEnv::empty()) - // Agent handler (Local, composes — has authority + scoped env + capabilities) + CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new()) + // Agent handler (Local, composes — full bundle via .with()) .with(HandlerRegistration { spec: agent_chat_spec(), handler: Arc::new(agent_chat_handler), @@ -478,6 +587,7 @@ let registry = OperationRegistryBuilder::new() .build(); let call_adapter = CallAdapter::new(Arc::new(registry), identity_provider); +// Agent deployment: let call_adapter = CallAdapter::new(...).with_session_source(source); ``` The vault is used at construction time to populate `capabilities` in the registration bundle, not registered as call protocol operations. The curated layer (Layer 0) is immutable after construction — adding a `Local` op requires restarting the process. Session and imported overlays are dynamic at their respective scopes (ADR-024). This is consistent with OQ-04 (scoped to the `HandlerRegistry` by ADR-024), ADR-008, ADR-014, and ADR-022. diff --git a/docs/architecture/crates/core/core-types.md b/docs/architecture/crates/core/core-types.md index 18e199a..c5a023c 100644 --- a/docs/architecture/crates/core/core-types.md +++ b/docs/architecture/crates/core/core-types.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-16 +last_updated: 2026-06-23 --- # Core Types @@ -178,6 +178,67 @@ in `StreamError`, but once a handler propagates via `HandlerError`, the endpoint treats all variants as "close the connection" (one-ALPN-per- connection, ADR-006). +## Capabilities + +Outbound credentials injected by the assembly layer at registration time. +A handler uses `Capabilities` to make authenticated outbound calls (LLM +provider API keys, HTTP service tokens, signing keys). See ADR-014 for the +secret-material flow and ADR-022 for the registration-bundle wiring. + +```rust +/// Outbound credentials for a handler. Non-serializable, zeroized, +/// immutable after construction. `Clone` is required by the composition +/// model (`parent.capabilities.clone()` in `OperationEnv::invoke()`). +/// +/// The concrete internal shape (a typed map, a struct with named fields) +/// is a two-way door, but the public API is fixed: `new()`, `with_api_key()`, +/// `with_http_token()`, and `get()`. Fields are private — callers cannot +/// mutate the credentials after construction. This makes the clone-semantics +/// two-way door genuinely two-way: Arc-based clone (shared immutable state) +/// and deep-copy clone (isolated state) are behaviorally identical when +/// neither supports mutation. See ADR-014, ADR-022, review #002 W2. +#[derive(Clone, Zeroize, ZeroizeOnDrop)] +pub struct Capabilities { + // Private — no interior mutability. The builder API (new, with_*) is + // the only construction path. Immutability after construction is the + // security guard that makes clone semantics safe. + entries: HashMap>, +} + +impl Capabilities { + /// Empty capabilities — for handlers that make no outbound calls. + pub fn new() -> Self; + + /// Add an API key (e.g., "google", "openai") to the capabilities. + pub fn with_api_key(mut self, service: &str, key: String) -> Self; + + /// Add an HTTP bearer token (e.g., "vastai", "github") to the capabilities. + pub fn with_http_token(mut self, service: &str, token: String) -> Self; + + /// Retrieve a credential by service name, if present. + pub fn get(&self, service: &str) -> Option<&Secret>; +} +``` + +- **Non-serializable**: `Capabilities` does **not** derive `Serialize`. It + cannot appear in `EventEnvelope` payloads even by accident. This is a + type-level enforcement of ADR-014's "call protocol carries no secret material." +- **Zeroized**: derives `Zeroize` and `ZeroizeOnDrop`. Secret material does + not linger in freed heap memory. +- **`Clone` + `Send + Sync`**: required by the composition model — + `OperationEnv::invoke()` clones the parent's capabilities for each child. + `Send + Sync` is required because the context is held across async task + boundaries. +- **Immutable after construction**: no `set`, no `insert`, no `mut` accessors. + This is the guard from review #002 W2 — it makes the Arc-vs-deep-copy clone + semantics genuinely two-way (shared immutable state is safe). +- **Module location**: `Capabilities` lives in alknet-core (it's a shared type + — see overview.md's Shared Types table). alknet-call imports it. + +See [operation-registry.md → Capability Injection](../call/operation-registry.md#capability-injection) +for how the dispatch path populates `OperationContext.capabilities` from the +registration bundle. + ## Design Decisions | Decision | ADR | Summary | @@ -187,6 +248,7 @@ connection, ADR-006). | HandlerError is non-fatal | [ADR-010](../../decisions/010-alpn-router-and-endpoint.md) | Handler errors close the connection, not the endpoint | | SendStream/RecvStream wrap quinn + iroh | [ADR-010](../../decisions/010-alpn-router-and-endpoint.md) | Internal enum dispatch for both QUIC sources | | Connection stores handler-resolved identity | OQ-11 (resolved) | `set_identity` via `OnceLock` — write-once-read-many; read by handler-side logging, not by the endpoint (C13 resolved) | +| Capabilities type | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Non-serializable, zeroized, immutable after construction; `Clone` for composition propagation | ## Open Questions diff --git a/docs/architecture/crates/vault/README.md b/docs/architecture/crates/vault/README.md index 38b8cb9..6871416 100644 --- a/docs/architecture/crates/vault/README.md +++ b/docs/architecture/crates/vault/README.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-25 +last_updated: 2026-06-23 --- # alknet-vault @@ -128,6 +128,8 @@ truth for drift tracking — if an item is fixed in source, update this table. | 6 | `HashMap::clear` zeroization | `KeyCache::clear()` removes entries and relies on `CachedKey`'s `Drop` impl for zeroization | Verify `HashMap::clear()` actually drops values (it does, but worth a test) | `cache.rs` | [service.md → Security Constraints](service.md#security-constraints) | | 7 | `derive_password` / `site_password_path` | `derive_password`, `derive_password_string`, `site_password_path` methods exist | Remove entirely — password-manager pattern not relevant to RPC system's vault (ADR-025, resolves C9) | `service.rs`, `mnemonic-derivation.rs` | [ADR-025](../../decisions/025-vault-local-only-dispatch.md) | | 8 | `unlock_new` return type | Returns `String` (not zeroized on drop) | Return `Zeroizing` — the mnemonic is the root of trust and must not linger in freed memory (resolves W7) | `service.rs` | [service.md → unlock_new](service.md#unlock_newword_count--phrase) | +| 9 | `key_version` ignored in encrypt/decrypt | `encrypt`/`decrypt` always derive at `PATHS::ENCRYPTION` regardless of `key_version` | Derive at `encryption_path_for_version(key_version)` — encrypt stamps the passed version, decrypt selects the key by the blob's version (ADR-021) | `service.rs` | [service.md → encrypt](service.md#encryptplaintext-key_version--encrypteddata), [ADR-021](../../decisions/021-key-rotation-via-version-indexed-paths.md) | +| 10 | `rotate` not implemented | No `rotate` method exists | Implement `rotate(encrypted, to_version)` — decrypt with old version's key, re-encrypt with new version's key (ADR-021) | `service.rs` | [service.md → rotate](service.md#rotateencrypted-to_version--encrypteddata), [ADR-021](../../decisions/021-key-rotation-via-version-indexed-paths.md) | ## Public API @@ -139,9 +141,14 @@ pub use mnemonic::{Language, Mnemonic, Seed}; // Derivation pub use derivation::{DerivationError, ExtendedPrivKey, PATHS}; +// Derivation helpers (derive_path_from_seed, parse_derivation_path, +// device_path, encryption_path_for_version) are accessible as +// alknet_vault::derivation::* — not re-exported at crate root to avoid +// clutter, but fully public. // Encryption -pub use encryption::{EncryptedData, EncryptionError}; +pub use encryption::{EncryptedData, EncryptionError, EncryptionKey}; +pub use encryption::CURRENT_KEY_VERSION; // Key types (DerivedKey, KeyType) pub use protocol::{DerivedKey, KeyType}; diff --git a/docs/architecture/crates/vault/encryption.md b/docs/architecture/crates/vault/encryption.md index c8ee93c..1537216 100644 --- a/docs/architecture/crates/vault/encryption.md +++ b/docs/architecture/crates/vault/encryption.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-20 +last_updated: 2026-06-23 --- # Encryption @@ -59,27 +59,58 @@ equals the mnemonic. Migration is a one-time re-encryption (see ADR-020). ## Encryption Key -The encryption key is derived from the seed at path `m/74'/2'/0'/0'` -(`PATHS::ENCRYPTION`): +The encryption key is derived from the seed at a version-indexed path +(`m/74'/2'/0'/{version-2}'` per ADR-021; v2 is `PATHS::ENCRYPTION`): ```rust +/// AES-256-GCM encryption key. Not `Clone` — move-only, like `DerivedKey`. +/// Implements a custom redacting `Debug` (never prints key bytes). +#[derive(Zeroize, ZeroizeOnDrop)] pub struct EncryptionKey { key_bytes: [u8; 32], // 32-byte AES-256 key key_version: u32, // for rotation tracking } + +impl EncryptionKey { + /// Construct from raw 32 bytes. Private — for internal use. + fn new(key_bytes: [u8; 32], key_version: u32) -> Self; + + /// Take the first 32 bytes of derived key material (the private key + /// bytes from SLIP-0010 derivation) and construct an `EncryptionKey`. + /// This is the bridge from `DerivedKey` (SLIP-0010 output) to + /// `EncryptionKey` (AES-256-GCM input). `VaultServiceHandle::encrypt` + /// and `decrypt` call this on the cached `DerivedKey` to obtain the + /// `EncryptionKey` for the crypto layer. + pub fn from_derived_bytes(derived: &[u8], key_version: u32) -> Self; + + /// Return the key version (for rotation tracking). + pub fn version(&self) -> u32; + + /// Return the key bytes (crate-internal — for `encrypt`/`decrypt`). + pub(crate) fn key_bytes(&self) -> &[u8; 32]; +} + +impl fmt::Debug for EncryptionKey { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + f.debug_struct("EncryptionKey") + .field("key_version", &self.key_version) + .field("key_bytes", &"[REDACTED]") + .finish() + } +} ``` -- `new(key_bytes, key_version)`: Construct from raw bytes. -- `from_derived_bytes(bytes, key_version)`: Take the first 32 bytes of - derived key material (the private key bytes from SLIP-0010 derivation). -- `version()`: Return the key version (for rotation). - `EncryptionKey` implements `Zeroize` and `ZeroizeOnDrop` — the key bytes -are zeroized before deallocation. +are zeroized before deallocation. It does **not** derive `Clone` (move-only, +like `DerivedKey`) and does **not** derive `Serialize` (never crosses a +wire). The `Debug` impl is custom and redacts `key_bytes`. -The key is derived once (at unlock time or on first encrypt/decrypt) and -cached in the `KeyCache` (see [service.md](service.md)). Subsequent -encrypt/decrypt operations use the cached key. +The key is derived once (on first encrypt/decrypt) and cached in the +`KeyCache` as a `CachedKey` wrapping a `DerivedKey` (see +[service.md](service.md)). `encrypt`/`decrypt` extract the `EncryptionKey` +from the cached `DerivedKey` via `EncryptionKey::from_derived_bytes` on each +call (the `DerivedKey` is the cached form; the `EncryptionKey` is a +short-lived per-call value derived from it). ## EncryptedData @@ -129,9 +160,18 @@ existing v2 data. This is additive — see OQ-22 (key rotation) and ADR-020 ## Encrypt and Decrypt +These are **module-internal crypto helpers** (in `encryption.rs`), not the +public API. The public API is `VaultServiceHandle::encrypt` / +`VaultServiceHandle::decrypt` (see [service.md](service.md)), which derive +the key (from the cache or via `derive_encryption_key_for_version`), extract +the `EncryptionKey` via `EncryptionKey::from_derived_bytes`, and call these +helpers. + ```rust -pub fn encrypt(plaintext: &str, key: &EncryptionKey) -> Result; -pub fn decrypt(encrypted: &EncryptedData, key: &EncryptionKey) -> Result; +// Module-internal (encryption.rs). Not re-exported from the crate root. +// VaultServiceHandle::encrypt/decrypt call through to these. +pub(crate) fn encrypt(plaintext: &str, key: &EncryptionKey) -> Result; +pub(crate) fn decrypt(encrypted: &EncryptedData, key: &EncryptionKey) -> Result; ``` `encrypt`: @@ -152,9 +192,10 @@ constraint — see below. ## Key Versioning -`CURRENT_KEY_VERSION` is `2`. Version `1` is reserved for the TypeScript -predecessor's PBKDF2-encrypted data (see ADR-020). Each version maps to a -unique derivation path — the last hardened index is the version offset +`CURRENT_KEY_VERSION` is `2` (defined in `encryption.rs`, re-exported from +the crate root). Version `1` is reserved for the TypeScript predecessor's +PBKDF2-encrypted data (see ADR-020). Each version maps to a unique +derivation path — the last hardened index is the version offset (see ADR-021): ``` @@ -171,13 +212,9 @@ seed doesn't change), so partial rotation is safe. ### Rotation Key rotation re-encrypts a blob from one version to another. The vault -provides a `rotate` method; the caller (assembly layer or migration tool) -handles replacing the blob in storage: - -```rust -pub fn rotate(&self, encrypted: &EncryptedData, to_version: u32) -> Result; -``` - +provides a `VaultServiceHandle::rotate` method (see [service.md → +rotate](service.md#rotateencrypted-to_version--encrypteddata)); the caller +(assembly layer or migration tool) handles replacing the blob in storage. Rotation decrypts with the old version's key and re-encrypts with the new version's key. No new mnemonic needed — the same seed produces all version keys via different paths. See ADR-021 for the full mechanism. @@ -185,7 +222,8 @@ keys via different paths. See ADR-021 for the full mechanism. **The current source uses `CURRENT_KEY_VERSION = 1` with HD derivation and does not implement version-indexed paths or `rotate`.** These are drift items to be corrected during implementation sync. See ADR-020 (version -bump to 2) and ADR-021 (rotation mechanism). +bump to 2) and ADR-021 (rotation mechanism). See the [Known Source +Drift](README.md#known-source-drift) table in the vault README. ## Errors diff --git a/docs/architecture/crates/vault/mnemonic-derivation.md b/docs/architecture/crates/vault/mnemonic-derivation.md index d9b18f9..5f77d74 100644 --- a/docs/architecture/crates/vault/mnemonic-derivation.md +++ b/docs/architecture/crates/vault/mnemonic-derivation.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-19 +last_updated: 2026-06-23 --- # Mnemonic and Key Derivation @@ -91,6 +91,12 @@ pub struct Seed { The 64-byte seed from which all HD keys are derived. Zeroized on drop. This is the input to SLIP-0010 / BIP-0032 master key derivation. +`Seed` derives `Clone` for convenience (derivation functions take `&[u8]`, +and the cache rebuild may need to reference the seed multiple times). +Callers should prefer `&Seed` and avoid cloning — the seed is the root of +trust, and each clone duplicates it into heap memory that lingers until +zeroized. + ## SLIP-0010 Ed25519 Derivation The default derivation scheme. SLIP-0010 specifies Ed25519 HD key @@ -149,6 +155,15 @@ pub struct ExtendedPrivKey { The result of SLIP-0010 derivation. Zeroized on drop. Accessors return slices — the caller copies what it needs. +```rust +impl ExtendedPrivKey { + pub fn private_key(&self) -> &[u8]; // 32 bytes + pub fn public_key(&self) -> &[u8]; // 32 bytes + pub fn chain_code(&self) -> &[u8]; // 32 bytes + pub fn path(&self) -> &str; +} +``` + ## BIP-0032 secp256k1 Derivation (Ethereum) Feature-gated behind `secp256k1`. Implements BIP-0032 HD key derivation for @@ -163,6 +178,33 @@ Unlike SLIP-0010 (Ed25519), BIP-0032 supports both hardened and unhardened child derivation. The standard Ethereum path `m/44'/60'/0'/0/0` uses unhardened indices for the last two levels. +```rust +#[derive(Clone, Zeroize)] +#[zeroize(drop)] +#[cfg(feature = "secp256k1")] +pub struct Secp256k1ExtendedPrivKey { + private_key: Vec, // 32 bytes + public_key: Vec, // 33 bytes (compressed) + chain_code: Vec, // 32 bytes + path: String, // the path that produced this key +} + +#[cfg(feature = "secp256k1")] +impl Secp256k1ExtendedPrivKey { + pub fn private_key(&self) -> &[u8]; + pub fn public_key(&self) -> &[u8]; + pub fn chain_code(&self) -> &[u8]; + pub fn path(&self) -> &str; +} +``` + +The `VaultServiceHandle::derive_ethereum_key` method calls +`derive_secp256k1_path` and wraps the result into a `DerivedKey`: +`DerivedKey { key_type: KeyType::Secp256k1, private_key: +extended.private_key().to_vec(), public_key: +extended.public_key().to_vec() }`. The `Secp256k1ExtendedPrivKey` is then +dropped and zeroized; the `DerivedKey` is the caller-facing type. + ### Why a separate module SLIP-0010 and BIP-0032 differ in: @@ -200,9 +242,17 @@ Helper functions construct parameterized paths: ```rust pub fn device_path(index: u32) -> String; // m/74'/0'/0'/{index}' -pub fn encryption_path_for_version(version: u32) -> String; // m/74'/2'/0'/{version-2}' +pub fn encryption_path_for_version(version: u32) -> Result; +// m/74'/2'/0'/{version-2}' — returns InvalidPath for version < 2 ``` +`encryption_path_for_version` returns `DerivationError::InvalidPath` for +`version < 2`. v1 is reserved for the TS PBKDF2 legacy (ADR-020) — the vault +cannot derive it, and silently mapping v1 to the v2 path would produce the +wrong key (making v1 blobs appear to "decrypt" with a corrupted key). v0 is +meaningless. `derive_encryption_key_for_version` propagates this error +(`VaultServiceError::InvalidPath`). + ### Path semantics | Path | Purpose | Key type | Used by | @@ -216,7 +266,9 @@ pub fn encryption_path_for_version(version: u32) -> String; // m/74'/2'/0'/{v `encryption_path_for_version` maps a key version to its derivation path (ADR-021). v2 (current) maps to `m/74'/2'/0'/0'` (which is `PATHS::ENCRYPTION`); v3 maps to `m/74'/2'/0'/1'`; etc. This is the rotation mechanism — each -version gets a cryptographically independent key from the same seed. +version gets a cryptographically independent key from the same seed. Returns +`InvalidPath` for `version < 2` (v1 is TS PBKDF2 legacy — undecryptable by +the vault by design). `KeyType` tags `DerivedKey` (see [protocol.md](protocol.md)) and `CachedKey` (see [service.md](service.md)) so consumers know what they diff --git a/docs/architecture/crates/vault/protocol.md b/docs/architecture/crates/vault/protocol.md index 44a1509..dc7f695 100644 --- a/docs/architecture/crates/vault/protocol.md +++ b/docs/architecture/crates/vault/protocol.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-25 +last_updated: 2026-06-23 --- # Protocol @@ -26,7 +26,7 @@ The result of key derivation. Holds the key type, private key, and public key. ```rust -#[derive(Zeroize, Deserialize)] +#[derive(Zeroize)] #[zeroize(drop)] pub struct DerivedKey { #[zeroize(skip)] @@ -38,6 +38,12 @@ pub struct DerivedKey { } ``` +`DerivedKey` does **not** derive `Deserialize` via `#[derive]`. It has a **custom +`Deserialize` impl** that rejects redacted payloads — see +[Serialization Redaction](#serialization-redaction) below. (A derived +`Deserialize` would generate a default impl that conflicts with the manual one, +and would not produce the explicit redaction-rejection error the spec requires.) + The `#[zeroize(skip)]` attributes on `key_type` and `public_key` mean only the `private_key` is zeroized when the `DerivedKey` is dropped. The public key and key type are not secret material — zeroizing them is unnecessary @@ -65,10 +71,13 @@ private key, regardless of format: `"[REDACTED]"`. This is defense-in-depth — if a `DerivedKey` accidentally ends up in a log, a JSON config, or debug output, the private key is not exposed. -- **Deserialization**: rejects `private_key == "[REDACTED]"` with an error. - A JSON-deserialized `DerivedKey` with a redacted private key is invalid - and produces a deserialization error, not a corrupted key. This resolves - review #002 W8 (silent corruption on JSON-deserialized `DerivedKey`). +- **Deserialization**: a custom `Deserialize` impl rejects + `private_key == "[REDACTED]"` with a deserialization error (not a corrupted + key). This resolves review #002 W8 (silent corruption on JSON-deserialized + `DerivedKey`). The custom impl is required because `#[derive(Deserialize)]` + would generate a default impl that conflicts and would only fail incidentally + (serde type mismatch: string vs sequence), not with the explicit + redaction-rejection error the spec requires. - **No binary-format preservation path.** ADR-025 dropped the postcard/remote dispatch path that previously preserved private key bytes in binary formats. `DerivedKey` is always used in-process (ADR-014: never appears @@ -76,6 +85,49 @@ private key, regardless of format: `DerivedKey` over the wire, it defines its own serialization for that context — the vault's `DerivedKey` stays redact-always. +```rust +// Custom Serialize — always redacts private_key +impl serde::Serialize for DerivedKey { + fn serialize(&self, serializer: S) -> Result + where S: serde::Serializer { + use serde::SerializeStruct; + let mut s = serializer.serialize_struct("DerivedKey", 3)?; + s.serialize_field("key_type", &self.key_type)?; + s.serialize_field("private_key", "[REDACTED]")?; // never the real bytes + s.serialize_field("public_key", &self.public_key)?; + s.end() + } +} + +// Custom Deserialize — rejects "[REDACTED]" with an error +impl<'de> serde::Deserialize<'de> for DerivedKey { + fn deserialize(deserializer: D) -> Result + where D: serde::Deserializer<'de> { + #[derive(serde::Deserialize)] + struct DerivedKeyHelper { + key_type: KeyType, + private_key: Vec, + public_key: Vec, + } + let helper = DerivedKeyHelper::deserialize(deserializer)?; + // Reject redacted payloads — a JSON-deserialized DerivedKey with a + // redacted private key is invalid, not a corrupted key. + if helper.private_key == b"[REDACTED]" { + return Err(serde::de::Error::custom( + "DerivedKey.private_key is \"[REDACTED]\" — redacted payloads \ + cannot be deserialized. JSON round-tripping a DerivedKey is \ + not supported (the private key is gone)." + )); + } + Ok(DerivedKey { + key_type: helper.key_type, + private_key: helper.private_key, + public_key: helper.public_key, + }) + } +} +``` + The redaction is **not the primary control** for keeping private keys off the wire. The primary control is architectural: `DerivedKey` never appears in call protocol payloads (ADR-014). The redaction is a safety net for @@ -112,8 +164,10 @@ pub enum KeyType { ``` Tags `DerivedKey` and `CachedKey` so consumers know what they received. -`KeyType` is `Serialize`/`Deserialize` (it's part of the irpc protocol) and -`Clone` (it's not secret material — it's a tag). +`KeyType` is `Serialize`/`Deserialize` (retained for `EncryptedData` interop +and future use — ADR-025 removed the irpc dispatch path that previously +justified these derives, but the type remains serializable for structured +storage scenarios) and `Clone` (it's not secret material — it's a tag). ## Wire Format diff --git a/docs/architecture/crates/vault/service.md b/docs/architecture/crates/vault/service.md index 668f77d..90e7a81 100644 --- a/docs/architecture/crates/vault/service.md +++ b/docs/architecture/crates/vault/service.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-25 +last_updated: 2026-06-23 --- # Service @@ -24,12 +24,16 @@ no remote dispatch. ## VaultServiceHandle The primary API for local (in-process) use. Thread-safe via -`Arc>`. +`std::sync::RwLock` — all methods are **synchronous** (no `async`, no +`.await`). The RwLock provides concurrent reads (derive operations) and +exclusive writes (unlock/lock). `tokio` is not a dependency of the vault +(ADR-025); `std::sync::RwLock` is sufficient because no method holds the +lock across an await point. ```rust #[derive(Clone)] pub struct VaultServiceHandle { - inner: Arc>, + inner: Arc>, } struct VaultServiceInner { @@ -40,6 +44,11 @@ struct VaultServiceInner { } ``` +**Invariant**: `unlocked` is `true` iff `seed.is_some()`. The `unlocked` +flag exists for cheap read-only checks (`is_unlocked`); the ground truth is +`seed.is_some()`. `lock()` sets `unlocked = false` and clears `seed`/`mnemonic` +to `None`; `unlock`/`unlock_new` set `unlocked = true` and populate `seed`. + `VaultServiceHandle` is `Clone` — cloning shares the underlying state via `Arc`. This is how the actor and the assembly layer share the same vault. @@ -85,6 +94,10 @@ down, display to user) and let the `Zeroizing` drop when done. Do not clone the returned value or store it in a non-zeroizing container. Supported word counts: 12, 15, 18, 21, 24. +Returns `VaultServiceError::AlreadyUnlocked` if the vault is already +unlocked (matching `unlock`'s behavior — `unlock_new` is a "first run" +operation and should not silently replace an existing mnemonic). + This is the "first run" path — a new node generates its mnemonic, writes it down, and the vault is unlocked for the process lifetime. The `Zeroizing` wrapper (from the `zeroize` crate) ensures the @@ -137,22 +150,27 @@ Derive an AES-256-GCM encryption key at the given path. Same cache behavior as `derive_ed25519`. Returns a `DerivedKey` with `KeyType::Aes256Gcm`. -### derive_encryption_key_for_version(version) → EncryptionKey +### derive_encryption_key_for_version(version) → DerivedKey ```rust -pub fn derive_encryption_key_for_version(&self, version: u32) -> Result; +pub fn derive_encryption_key_for_version(&self, version: u32) -> Result; ``` Derive the encryption key for a specific key version. Maps the version to its derivation path via `encryption_path_for_version(version)` (ADR-021): v2 → `m/74'/2'/0'/0'`, v3 → `m/74'/2'/0'/1'`, etc. Cached by path. This is -the version-aware method that `decrypt` uses to select the correct key for -each blob — see [encryption.md](encryption.md) and ADR-021. +the version-aware method that `encrypt` and `decrypt` use to select the +correct key for each blob — see [encryption.md](encryption.md) and ADR-021. +Returns `VaultServiceError::InvalidPath` for `version < 2` (v1 is TS PBKDF2 +legacy — the vault cannot derive it; v0 is meaningless). `derive_encryption_key(path)` (above) remains as the path-based API for deriving at arbitrary paths. `derive_encryption_key_for_version(version)` -is the version-aware API used by `encrypt` and `decrypt`. The two share -the same cache (keyed by derivation path). +is the version-aware API used by `encrypt` and `decrypt`. Both return +`DerivedKey` with `KeyType::Aes256Gcm` and share the same cache (keyed by +derivation path). `encrypt` and `decrypt` extract the `EncryptionKey` from +the `DerivedKey` via `EncryptionKey::from_derived_bytes` (see +[encryption.md](encryption.md#encryption-key)). ### derive_ethereum_key(path) → DerivedKey (feature-gated) @@ -172,10 +190,11 @@ Derive a secp256k1 keypair at the given BIP-0032 path. Returns pub fn encrypt(&self, plaintext: &str, key_version: u32) -> Result; ``` -Encrypt plaintext using the encryption key derived at `PATHS::ENCRYPTION`. -Derives (and caches) the encryption key on first call, then uses the cache -for subsequent calls. See [encryption.md](encryption.md) for the -cryptographic details. +Encrypt plaintext using the encryption key derived at +`encryption_path_for_version(key_version)` (ADR-021). The same `key_version` +is stamped on the resulting `EncryptedData`. Derives (and caches) the +encryption key on first call, then uses the cache for subsequent calls. See +[encryption.md](encryption.md) for the cryptographic details. ### decrypt(encrypted) → String @@ -218,6 +237,15 @@ pub struct KeyCache { config: CacheConfig, } +/// A cached derived key. Wraps a `DerivedKey` with cache metadata. +/// Derives `Zeroize` and `ZeroizeOnDrop` — the private key is zeroized +/// when the entry is evicted (LRU/TTL) or the cache is cleared. +pub struct CachedKey { + key: DerivedKey, // the derived key (zeroized on drop) + cached_at: Instant, // when the entry was inserted (for TTL) + last_accessed: Instant, // for LRU ordering +} + pub struct CacheConfig { pub ttl: Duration, // default: 1 hour pub max_entries: usize, // default: 64 @@ -228,9 +256,10 @@ pub struct CacheConfig { evicted lazily on access (`get` checks expiry) or via `evict_expired()`. - **LRU**: when the cache exceeds `max_entries` (default 64), the least recently used entry is evicted. Access (`get`) updates the LRU order. -- **Zeroized**: `CachedKey` derives `Zeroize` and `ZeroizeOnDrop`. Evicted - and cleared entries are zeroized — derived private keys do not linger in - freed heap memory. +- **Zeroized**: `CachedKey` derives `Zeroize` and `ZeroizeOnDrop` (via the + `DerivedKey` it holds, which is `#[zeroize(drop)]`). Evicted and cleared + entries are zeroized — derived private keys do not linger in freed heap + memory. - **Cleared on lock**: `lock()` calls `cache.clear()`, which removes and zeroizes all entries. @@ -241,15 +270,18 @@ pub struct CacheConfig { | `derive_ed25519` | Yes | Derivation is expensive; keys are reused | | `derive_encryption_key` | Yes | Same — encryption key reused across calls | | `derive_ethereum_key` | Yes | Same | -| `encrypt` / `decrypt` | Key cached | The encryption key (at `PATHS::ENCRYPTION`) is cached; the plaintext is not | +| `encrypt` / `decrypt` | Key cached | The encryption `DerivedKey` (at `encryption_path_for_version(key_version)`) is cached; the plaintext is not | ## Dispatch The vault uses **direct method calls** on `VaultServiceHandle` — no actor, no message enum, no channels, no serialization (ADR-025). The handle is -`Arc>` — clone it, share it, call methods -directly. The RwLock provides concurrent reads (derive operations) and -exclusive writes (unlock/lock). +`Arc>` — clone it, share it, call +methods directly. The `std::sync::RwLock` provides concurrent reads (derive +operations) and exclusive writes (unlock/lock). All methods are synchronous +(no `async`), so `std::sync::RwLock` is correct — a `tokio::sync::RwLock` +would require async methods or risk blocking a tokio runtime when held +across an await point. The vault does not depend on `tokio` (ADR-025). ``` Assembly layer (CLI binary): diff --git a/docs/architecture/decisions/018-vault-standalone-crate.md b/docs/architecture/decisions/018-vault-standalone-crate.md index 9d6cd9b..152809f 100644 --- a/docs/architecture/decisions/018-vault-standalone-crate.md +++ b/docs/architecture/decisions/018-vault-standalone-crate.md @@ -66,9 +66,11 @@ wraps the vault (see ADR-025, OQ-021). **alknet-vault has zero alknet crate dependencies.** It depends only on external crates (`bip39`, `ed25519-bip32`, `aes-gcm`, `sha2`, `hmac`, -`secp256k1`, `tokio` for `RwLock` sync primitives, `serde`, -`zeroize`, `thiserror`, `base64`, `rand`). ADR-025 dropped `irpc`, -`irpc-derive`, and `postcard` — the vault no longer uses irpc dispatch. +`secp256k1`, `serde`, `zeroize`, `thiserror`, `base64`, `rand`). ADR-025 +dropped `irpc`, `irpc-derive`, `postcard`, and `tokio` — the vault no longer +uses irpc dispatch or async sync primitives. All vault methods are +synchronous; `std::sync::RwLock` provides thread safety without a tokio +dependency. The vault does not depend on: - `alknet-core` — no shared types, no `Identity`, no `AuthContext` @@ -152,8 +154,8 @@ makes the freeze explicit and enforceable by review. **Positive:** - The vault compiles and runs without QUIC, quinn, iroh, rustls, or a tokio - runtime (the `VaultServiceHandle` works with just `std::sync::RwLock`; the - actor uses `tokio::sync::mpsc` but that's a lightweight dependency). + runtime (the `VaultServiceHandle` works with just `std::sync::RwLock`; + ADR-025 removed the actor and its `tokio::sync::mpsc` dependency entirely). - CLI tools, test harnesses, and future WASM targets can use the vault for key derivation without pulling in networking crates. - The vault's API surface is stable — changes to alknet-core types don't diff --git a/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md b/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md index a1fcaee..fc7671c 100644 --- a/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md +++ b/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md @@ -348,12 +348,13 @@ fn build_root_context( handler_identity: registration.composition_authority, // C1: from bundle, None for leaves capabilities: registration.capabilities.clone(), // C3: from bundle metadata: HashMap::new(), + abort_policy: AbortPolicy::default(), // abort-dependents (ADR-016 Decision 6) // env/scoped_env split by ADR-024: scoped_env is the reachability // data (from the bundle), env is the dispatch trait object (composed // per-call by the CallAdapter from active overlays). scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // C2: from bundle, empty for leaves - env: /* CallAdapter.compose_root_env(...) — see ADR-024 */, + env: self.compose_root_env(/* connection, session */), // Arc — see ADR-024 internal: false, // wire call — ACL against caller identity } } @@ -386,6 +387,7 @@ async fn invoke(&self, namespace: &str, operation: &str, input: Value, handler_identity: registration.composition_authority.clone(), // C1: child's own authority capabilities: parent.capabilities.clone(), // C3: propagate through composition metadata: HashMap::new(), // fresh — does NOT propagate (ADR-014) + abort_policy: parent.abort_policy.clone(), // inherit parent's policy (ADR-016 Decision 6, W19) // env/scoped_env split by ADR-024: scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // C2: child's own scoped env diff --git a/docs/architecture/decisions/025-vault-local-only-dispatch.md b/docs/architecture/decisions/025-vault-local-only-dispatch.md index 1bb722c..9e63b7d 100644 --- a/docs/architecture/decisions/025-vault-local-only-dispatch.md +++ b/docs/architecture/decisions/025-vault-local-only-dispatch.md @@ -164,11 +164,12 @@ auth-wrapping handler, and the operation filtering (Unlock/Lock local-only). ### 5. The vault's dependency footprint shrinks The vault drops: `irpc`, `irpc-derive`, `postcard` (for remote), `noq` -(via irpc), `iroh` (via irpc-iroh). It retains: `bip39`, `ed25519-bip32`, -`aes-gcm`, `sha2`, `hmac`, `secp256k1` (feature-gated), `tokio` (for -`RwLock` sync primitives, not for channels), `serde` (for `DerivedKey` -redaction and `EncryptedData` wire format), `zeroize`, `thiserror`, `base64`, -`rand`. +(via irpc), `iroh` (via irpc-iroh), and `tokio` (the actor's +`tokio::sync::mpsc` channels are gone; all vault methods are synchronous +and use `std::sync::RwLock` for thread safety). It retains: `bip39`, +`ed25519-bip32`, `aes-gcm`, `sha2`, `hmac`, `secp256k1` (feature-gated), +`serde` (for `DerivedKey` redaction and `EncryptedData` wire format), +`zeroize`, `thiserror`, `base64`, `rand`. ADR-018's "zero alknet crate dependencies" becomes "zero alknet crate dependencies and zero RPC framework dependencies." This is the cleanest diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 82ac96b..8f768e9 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-22-20 +last_updated: 2026-06-23 --- # Alknet Overview @@ -165,7 +165,7 @@ The following types live in alknet-core and are used across handler crates: | `Identity` | Authenticated peer identity (inbound) | | `IdentityProvider` | Trait for resolving credentials to identity | | `AuthToken` | Opaque authentication token | -| `Capabilities` | Outbound credentials injected by the assembly layer (non-serializable, zeroized) | +| `Capabilities` | Outbound credentials injected by the assembly layer (non-serializable, zeroized, immutable after construction) — defined in [core-types.md](crates/core/core-types.md#capabilities) | | `Visibility` | Operation visibility — External (wire-callable) or Internal (composition-only) | | `StaticConfig` | Immutable configuration loaded at startup | | `DynamicConfig` | Hot-reloadable configuration (`ArcSwap`) | diff --git a/docs/reviews/003-pre-implementation-architecture-sanity-check.md b/docs/reviews/003-pre-implementation-architecture-sanity-check.md new file mode 100644 index 0000000..7a0b7b3 --- /dev/null +++ b/docs/reviews/003-pre-implementation-architecture-sanity-check.md @@ -0,0 +1,512 @@ +--- +status: resolved +last_updated: 2026-06-23 +reviewed_documents: + - overview.md + - README.md + - open-questions.md + - crates/core/README.md + - crates/core/core-types.md + - crates/core/endpoint.md + - crates/core/auth.md + - crates/core/config.md + - crates/call/README.md + - crates/call/call-protocol.md + - crates/call/operation-registry.md + - crates/vault/README.md + - crates/vault/mnemonic-derivation.md + - crates/vault/encryption.md + - crates/vault/service.md + - crates/vault/protocol.md + - decisions/001 through 026 (all 26 ADRs) +reviewer: architect +--- + +# Architecture Gap Review #003 + +## Purpose + +This is the third pre-implementation sanity check. Reviews #001 and #002 +caught the registration-bundle tangle (C1–C4), the missing error schemas (C5), +the governance reconciliation (Proposed ADRs treated as binding, Accepted ADRs +contradicting later refinements), the abort-policy wiring gap, and the vault +deferral audit. All of those are resolved. + +This review goes deeper on a different axis: **type and API surface +completeness**. The prior reviews focused on cross-document *decisions* +(contradictions between ADRs, stale framing). This review asks: could two +implementers, working from these specs alone, produce compatible code? The +findings are mostly places where a type or API is *referenced* but never +*defined*, or where a code sketch in an ADR went stale after the spec docs +were updated. + +The two themes: + +1. **Stale ADR sketches**: ADR-022's `build_root_context` and `invoke()` + sketches were written before ADR-024 split `env` into `scoped_env` + `env` + and before ADR-016's abort policy was added to `OperationContext`. The spec + docs (call-protocol.md, operation-registry.md) were updated; the ADR + sketches were not. An implementer copying the ADR verbatim won't compile. + +2. **Referenced-but-undefined types**: `Capabilities`, `SessionOverlaySource`, + `CachedKey`, `CallConnection`, `Secp256k1ExtendedPrivKey` — all referenced + in specs, none defined. These are exactly the boundary types where two + implementers must agree, and the spec leaves them as exercises. + +## Severity Definitions + +| Severity | Meaning | +|----------|---------| +| **Critical** | Must resolve before implementation — would cause divergent implementations or security issues | +| **Warning** | Should resolve — missing specifications that could cause confusion or rework | +| **Suggestion** | Consider — improvements for clarity or completeness | + +--- + +## Critical Findings + +### C1. `DerivedKey`: `#[derive(Deserialize)]` contradicts "custom Deserialize rejects `[REDACTED]`" + +**Documents**: crates/vault/protocol.md (L29, L68), +decisions/025-vault-local-only-dispatch.md (L137) + +**Problem**: `protocol.md:29` shows `#[derive(Zeroize, Deserialize)]` on +`DerivedKey`. But `protocol.md:68` and ADR-025:137 both require a **custom** +`Deserialize` that explicitly rejects `private_key == "[REDACTED]"` with an +error. These are mutually exclusive: `#[derive(Deserialize)]` generates a +default impl, and you cannot also write a manual `impl Deserialize` (conflicting +impls). With the derived impl, a redacted string would fail only incidentally +(serde type mismatch: string vs sequence), producing a generic error — not the +explicit redaction-rejection the spec describes. + +**Resolution**: Dropped `Deserialize` from the derive list in protocol.md and +showed a separate manual `impl<'de> Deserialize<'de> for DerivedKey` that +checks for `"[REDACTED]"` and returns a clear error. + +--- + +### C2. `encrypt` prose says "derived at `PATHS::ENCRYPTION`" but signature takes `key_version` + +**Documents**: crates/vault/service.md (L175), +decisions/021-key-rotation-via-version-indexed-paths.md (L115–122) + +**Problem**: `service.md:172` signature: `encrypt(&self, plaintext, key_version)`. +`service.md:175` prose: "Encrypt plaintext using the encryption key derived at +`PATHS::ENCRYPTION`." `PATHS::ENCRYPTION` is the v2 path only. ADR-021 makes +`encrypt` derive at `encryption_path_for_version(key_version)`. An implementer +following the prose stamps `key_version: 3` but uses the v2 key — a rotation bug +producing mislabeled blobs that decrypt fine locally but are cryptographically +mislabeled. + +**Resolution**: Updated service.md:175 to "Encrypt plaintext using the +encryption key derived at `encryption_path_for_version(key_version)` (ADR-021); +the same `key_version` is stamped on the resulting `EncryptedData`." + +--- + +### C3. `derive_encryption_key(path)` returns `DerivedKey` but `derive_encryption_key_for_version(version)` returns `EncryptionKey` + +**Documents**: crates/vault/service.md (L133, L143), +decisions/021-... (L104–122) + +**Problem**: Same cache, different return types. No conversion between them is +specified, and `CachedKey` (the cache value type) is never defined. An +implementer cannot reconcile the cache. + +**Resolution**: Made `derive_encryption_key_for_version` return `DerivedKey` +(consistent with `derive_encryption_key`), and `encrypt`/`decrypt` extract the +`EncryptionKey` from the cached `DerivedKey` via `EncryptionKey::from_derived_bytes`. +Defined `CachedKey` as a `#[derive(Zeroize, ZeroizeOnDrop)]` struct wrapping a +`DerivedKey` with cache metadata. + +--- + +### C4. `tokio` vs `std::sync::RwLock` contradiction + +**Documents**: decisions/018 (L69, L155–156), decisions/025 (L166–171), +crates/vault/service.md (L27–33) + +**Problem**: ADR-018:155 says "`std::sync::RwLock` works"; ADR-018:69 and +ADR-025:166 list `tokio` as a retained dep "for RwLock sync primitives"; +service.md shows `RwLock` without specifying which. All vault methods are sync +(no `async`), implying `std::sync::RwLock` — which means tokio isn't needed, +contradicting both ADR dependency lists. ADR-018:155 still references the +removed actor's `tokio::sync::mpsc`. + +**Resolution**: Specified `std::sync::RwLock` in service.md. Dropped `tokio` +from retained-deps lists in ADR-018 and ADR-025. Removed the stale actor/mpsc +sentence in ADR-018. + +--- + +### C5. Missing drift rows in vault README drift table + +**Documents**: crates/vault/README.md (L121–130), +decisions/021-... (L126–128), crates/vault/encryption.md (L185–188) + +**Problem**: The README drift table (the stated "single source of truth") omits +two drift items: (a) `decrypt`/`encrypt` ignore `key_version` and always derive +at `PATHS::ENCRYPTION`; (b) `rotate` is not implemented. An implementer using +only the table would miss these. + +**Resolution**: Added drift rows #9 (key_version ignored) and #10 (rotate not +implemented) to the README table. + +--- + +### C6. ADR-022's `build_root_context` and `invoke()` omit `abort_policy` + +**Documents**: decisions/022 (L344–358, L382–394), +crates/call/call-protocol.md (L296–318), +crates/call/operation-registry.md (L309–336) + +**Problem**: ADR-022's sketches have 9 fields; the spec docs have 10 (including +`abort_policy`). An implementer copying ADR-022 verbatim won't compile against +the `OperationContext` struct. + +**Resolution**: Added `abort_policy: AbortPolicy::default(),` to ADR-022's +`build_root_context` and `abort_policy: parent.abort_policy.clone(),` to its +`invoke()` sketch. Restated the `env` type as `Arc` in both sketches. + +--- + +### C7. `Capabilities` type referenced widely but defined nowhere + +**Documents**: overview.md (L168), crates/call/operation-registry.md (L116, +L206, L455, L473, L507–513), crates/call/call-protocol.md (L305) + +**Problem**: `Capabilities` appears in `OperationContext`, `HandlerRegistration`, +`build_root_context`, and the capability injection section — but has no struct +definition, no trait bounds, no builder API, no module location anywhere. The +spec asserts `Clone`, "non-serializable", "zeroized", "immutable after +construction" in prose only. The core and call implementers will produce +incompatible types. + +**Resolution**: Added a `Capabilities` struct definition to core-types.md with +`Clone + Send + Sync`, `Zeroize`/`ZeroizeOnDrop`, a sealed builder API +(`new()`, `with_api_key()`, `with_http_token()`), private fields (immutability +guard), and no `Serialize` derive. Cross-referenced from operation-registry.md. + +--- + +### C8. `SessionOverlaySource` referenced on `CallAdapter` but never defined; crate violation + +**Documents**: crates/call/call-protocol.md (L41), +decisions/024-... (L447–456) + +**Problem**: `CallAdapter` (alknet-call) has a field of type +`Arc`, but ADR-024 says the trait is "an agent-crate +concern." That's a crate-graph violation: alknet-call can't hold a field whose +type is defined in alknet-agent (agent depends on call, not reverse). + +**Resolution**: Defined `SessionOverlaySource` as a trait in alknet-call +(alongside `OperationEnv`), since the `CallAdapter` must name it. alknet-agent +implements it; alknet-call defines the integration point. This matches the +`IdentityProvider` pattern (ADR-004: core defines the trait, handlers implement +it). + +--- + +### C9. `CompositeOperationEnv` dispatch fall-through semantics unspecified + +**Documents**: crates/call/operation-registry.md (L348–387) + +**Problem**: The `OperationEnv::invoke` signature returns `ResponseEnvelope` — +no "not in this overlay" sentinel. The sketch says "returns a sentinel or the +composite continues" and "the exact mechanism is a two-way door" — but this +affects cross-impl interop. Two implementers picking different mechanisms +(sentinel vs `contains` check) will produce `OperationEnv` impls that don't +compose. + +**Resolution**: Added a `contains(&self, name: &str) -> bool` method to the +`OperationEnv` trait (with a default impl that returns `true` for backward +compat). The composite env probes `contains()` before dispatching, eliminating +the sentinel ambiguity. `CompositeOperationEnv::invoke` now checks +`session.contains()` → `connection.contains()` → base, dispatching to the first +overlay that contains the op. + +--- + +### C10. No API for Layer 2 (connection overlay) registration; `CallConnection` undefined + +**Documents**: crates/call/operation-registry.md (L214–232, L456–480), +decisions/024-... (L252–263) + +**Problem**: `OperationRegistryBuilder` only builds Layer 0. `from_call` imports +go into the connection's overlay at runtime, but there's no registration API, no +`CallConnection` struct definition, and no spec for how the overlay becomes part +of `CompositeOperationEnv.connection`. + +**Resolution**: Added a `CallConnection` struct definition (overlay-related +fields) and a `register_imported(registration)` / `imported_operations` API to +call-protocol.md. Clarified that `OperationRegistryBuilder` is Layer-0-only; +runtime overlay registration uses `CallConnection` methods. + +--- + +### C11. `with_local` signature diverges between two examples in operation-registry.md + +**Documents**: crates/call/operation-registry.md (L217–230, L458–478) + +**Problem**: The first example shows `with_local(spec, handler, authority, +scoped_env)` — 4 args, no capabilities. The second uses +`.with(HandlerRegistration { ... capabilities })`. `HandlerRegistration.capabilities` +is required (not `Option`). The first example silently drops capabilities for +the agent handler. + +**Resolution**: Added `capabilities` as the 5th arg to `with_local` in the first +example, making both examples consistent. Documented the `with_local` and +`with_leaf` signatures explicitly. + +--- + +## Warning Findings + +### W1. `invoke_with_policy` has no default impl and can't delegate to `invoke()` + +**Documents**: crates/call/operation-registry.md (L265–273, L340–342) + +**Problem**: `invoke_with_policy` is a required trait method with no default +impl. `invoke()` hardcodes `abort_policy: parent.abort_policy.clone()`, so +`invoke_with_policy` can't delegate to it — every `OperationEnv` impl must +duplicate the full body. + +**Resolution**: Restructured the trait: `invoke_with_policy` is the required +method; `invoke` has a default impl that calls `invoke_with_policy` with +`parent.abort_policy.clone()`. Impls only write `invoke_with_policy`. Added the +`CompositeOperationEnv::invoke_with_policy` body. + +--- + +### W2. `CachedKey` referenced 7+ times but never defined + +**Documents**: crates/vault/service.md (L216, L231, L330–336, L351–352), +crates/vault/protocol.md (L114), crates/vault/README.md (L99, L128) + +**Resolution**: Added a `CachedKey` struct definition to service.md's Cache +section, showing fields (`key: DerivedKey`, `cached_at: Instant`, +`last_accessed: Instant`) and `Zeroize`/`ZeroizeOnDrop` derives. + +--- + +### W3. `EncryptionKey` constructor + derivation glue unspecified; missing from re-export + +**Documents**: crates/vault/encryption.md (L60–78), +crates/vault/service.md (L140–155), crates/vault/README.md (L144) + +**Resolution**: Added `from_derived_bytes` pseudocode wiring SLIP-0010 output to +`EncryptionKey`. Added `EncryptionKey` to README re-export list. Specified +trait derives (custom redacting `Debug`, no `Clone`, `Zeroize`/`ZeroizeOnDrop`). + +--- + +### W4. `Secp256k1ExtendedPrivKey` never defined; `derive_ethereum_key` glue unspecified + +**Documents**: crates/vault/service.md (L157–165), +crates/vault/mnemonic-derivation.md (L152–164) + +**Resolution**: Added `Secp256k1ExtendedPrivKey` struct definition to +mnemonic-derivation.md (parallel to `ExtendedPrivKey`). Documented +`derive_ethereum_key` → `derive_secp256k1_path` → wrap into `DerivedKey` glue. + +--- + +### W5. `encryption_path_for_version(v<2)` collides onto v2 path + +**Documents**: decisions/021-... (L78–84), +crates/vault/mnemonic-derivation.md (L203), crates/vault/service.md (L147–148) + +**Problem**: `saturating_sub(2)` maps v0/v1/v2 to the same path. v1 is TS PBKDF2 +(undecryptable by design), but `derive_encryption_key_for_version(1)` silently +returns the v2 key. + +**Resolution**: Added a guard: `encryption_path_for_version` and +`derive_encryption_key_for_version` return `VaultServiceError::InvalidPath` for +`version < 2`. Documented in mnemonic-derivation.md and service.md. + +--- + +### W6. `ResponseEnvelope` → `EventEnvelope` conversion unspecified; payload schemas missing + +**Documents**: crates/call/call-protocol.md (L80–85, L331–341) + +**Resolution**: Added a "Wire Payload Schemas" section to call-protocol.md +showing the `payload` shape for `call.responded`, `call.completed`, and +`call.aborted`. Documented the `ResponseEnvelope` → `EventEnvelope` mapping. + +--- + +### W7. Timeout location and composition propagation unspecified + +**Documents**: crates/call/call-protocol.md (L352) + +**Resolution**: Added a timeout section to call-protocol.md specifying: timeout +duration is on `CallAdapter` (constructor param); composed calls inherit the +parent's remaining deadline via a `deadline: Option` on +`OperationContext`; composed calls that exceed the deadline are cancelled (future +dropped). Added a cross-reference from `OperationContext`. + +--- + +### W8. `generate_request_id()` unspecified; wire ID vs internal ID relationship undefined + +**Documents**: crates/call/operation-registry.md (L315, L167), +crates/call/call-protocol.md (L291) + +**Resolution**: Added a "Request ID Generation" subsection to +operation-registry.md specifying UUID v4 generation for composed calls, that wire +`call.requested.id` becomes the root `request_id`, and that composed child IDs +are internal (appear in `PendingRequestMap` for abort indexing but are not sent +as `call.requested` unless the child is a `from_call` op). + +--- + +### W9. `unlock_new` already-unlocked behavior unspecified + +**Documents**: crates/vault/service.md (L74–93) + +**Resolution**: Added: "`unlock_new` returns `AlreadyUnlocked` if the vault is +already unlocked, matching `unlock`'s behavior." + +--- + +### W10. `KeyType`'s `Serialize`/`Deserialize` justification stale post-ADR-025 + +**Documents**: crates/vault/protocol.md (L115) + +**Resolution**: Removed the stale "part of the irpc protocol" justification. +`KeyType` derives `Serialize`/`Deserialize` for `EncryptedData` interop (future +use) and `Clone` because it's a non-secret tag; the derives are retained but the +justification is corrected. + +--- + +### W11. `OperationProvenance` and `CompositionAuthority` defined only in ADR-022 + +**Documents**: crates/call/operation-registry.md (L202, L209, L468), +decisions/022-... (L116–131, L165–181) + +**Resolution**: Added inline definitions of `OperationProvenance` and +`CompositionAuthority` to operation-registry.md, with "See ADR-022 for +rationale" cross-references. + +--- + +### W12. `encrypt`/`decrypt` free functions vs `VaultServiceHandle` methods relationship unstated + +**Documents**: crates/vault/encryption.md (L132–146), +crates/vault/service.md (L169–190) + +**Resolution**: Added a note to encryption.md that the free `encrypt`/`decrypt` +functions are module-internal crypto helpers; `VaultServiceHandle::encrypt`/ +`decrypt` are the public API and call through to them. + +--- + +### W13. `rotate` placement in encryption.md (method signature in crypto doc) + +**Documents**: crates/vault/encryption.md (L177–179) + +**Resolution**: Replaced the `rotate` signature block in encryption.md with a +prose pointer to service.md and ADR-021 (rotate is a `VaultServiceHandle` +method, not a free function). + +--- + +### W14. `CallAdapter::new()` missing session_source param + +**Documents**: crates/call/operation-registry.md (L480), +crates/call/call-protocol.md (L35–42) + +**Resolution**: Added `CallAdapter::new(registry, identity_provider)` (defaults +`session_source` to `None`) and `CallAdapter::with_session_source(source)` +builder method to call-protocol.md. + +--- + +## Suggestions + +### S1. `Seed: Clone` duplicates root of trust + +**Documents**: crates/vault/mnemonic-derivation.md (L84–88) + +**Resolution**: Kept `Clone` on `Seed` (derivation functions take `&[u8]`, so +`Clone` is not strictly needed, but the `to_seed` path and cache rebuild may +benefit). Added a note that callers should prefer `&Seed` and avoid cloning. + +--- + +### S2. `VaultServiceInner.unlocked: bool` redundant with `seed: Option` + +**Documents**: crates/vault/service.md (L35–41) + +**Resolution**: Added an invariant note: `unlocked` is `true` iff +`seed.is_some()`; `is_unlocked()` reads `unlocked` for cheap checks, but the +ground truth is `seed.is_some()`. + +--- + +### S3. `ExtendedPrivKey`/`Mnemonic` accessor signatures incomplete + +**Documents**: crates/vault/mnemonic-derivation.md (L58–63, L138–150) + +**Resolution**: Added accessor signatures for `ExtendedPrivKey` +(`private_key()`, `public_key()`, `chain_code()`, `path()`). + +--- + +### S4. `CURRENT_KEY_VERSION` module location unspecified + +**Documents**: decisions/021-... (L159), crates/vault/encryption.md (L154) + +**Resolution**: Added: `CURRENT_KEY_VERSION` lives in `encryption.rs` and is +re-exported from the crate root. + +--- + +### S5. ADR-018 stale actor reference + +**Documents**: decisions/018 (L155–156) + +**Resolution**: Removed the stale actor/mpsc sentence (addressed in C4). + +--- + +### S6. `device_path`/`encryption_path_for_version`/`derive_path_from_seed` not in re-export list + +**Documents**: crates/vault/README.md (L136–154) + +**Resolution**: Added a note that derivation helpers are accessible as +`alknet_vault::derivation::*`. + +--- + +## Summary Statistics + +| Severity | Count | Status | +|----------|-------|--------| +| Critical | 11 | All resolved (C1–C11) | +| Warning | 14 | All resolved (W1–W14) | +| Suggestion | 6 | All resolved (S1–S6) | + +## Recommended Resolution Order + +All findings resolved. Resolution summary: + +1. **Vault type/reconciliation pass** (C1–C5, W2–W5, W9–W13, S1–S4): Fixed + `DerivedKey` derive contradiction, `encrypt` prose, return-type divergence, + RwLock contradiction, drift table, `CachedKey`/`EncryptionKey`/ + `Secp256k1ExtendedPrivKey` definitions, `encryption_path_for_version` + guard, `unlock_new` behavior, `KeyType` justification, free-vs-method + relationship, `rotate` placement. +2. **Core+call type/API pass** (C6–C11, W1, W6–W8, W14, S5–S6): Fixed + ADR-022 stale sketches, defined `Capabilities`, `SessionOverlaySource`, + `CallConnection`, Layer 2 registration API, `CompositeOperationEnv` + dispatch, `with_local` signature, `invoke_with_policy` default impl, + payload schemas, timeout, request ID generation, `CallAdapter` constructor. +3. **Inline definitions** (W11): Added `OperationProvenance` and + `CompositionAuthority` inline in operation-registry.md. + +Review #003 is complete. All 11 critical, 14 warning, and 6 suggestion findings +resolved. \ No newline at end of file