Files
alknet/docs/reviews/004-post-implementation-sanity-check.md
glm-5.2 97216764ea fix: resolve review #004 findings W1-W4 + close review gate
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into
CallAdapter handle_stream for EVENT_ABORTED. Cascades with
AbortPolicy::AbortDependents, aborts root, no descendant frames on
wire (ADR-016 Decision 2). Two integration tests added.

W2 (core/endpoint-client-fingerprint): extract TLS client cert
fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via
peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId).
Fingerprint format documented in auth.md. Server config change
(with_no_client_auth → request-but-don't-require) deferred to new
follow-up task core/endpoint-request-client-cert.

W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug)
with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed
no Debug impl. Redaction test added.

W4 (core/auth-apikey-resources): Option B — drop entry.resources from
spec. External identities (token/fingerprint) grant scopes only;
resource-scoped ACLs are composition-internal (ADR-015/022). auth.md
corrected + limitation documented. Two tests confirm empty resources.

review-post-impl-fixes: all 4 verified, workspace green (326 tests,
0 failures, 0 clippy warnings). Review #004 status → resolved.

Graph: 34 tasks, 12 gens.
2026-06-24 11:00:54 +00:00

609 lines
31 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
status: resolved
last_updated: 2026-06-24
reviewed_artifacts:
- crates/alknet-vault/src/{lib,cache,derivation,encryption,ethereum,mnemonic,protocol,service}.rs
- crates/alknet-core/src/{lib,auth,config,endpoint,types}.rs
- crates/alknet-call/src/{lib,registry/*,protocol/*}.rs
- docs/architecture/{overview,open-questions}.md
- docs/architecture/crates/{vault,core,call}/*.md
- docs/architecture/decisions/001026
- docs/reviews/001,002,003-*.md
- tasks/{vault,core,call}/**/*.md (28 completed tasks)
reviewer: code-reviewer (post-implementation)
---
# Post-Implementation Sanity Check #004
## Purpose
Sanity check after the 28-task implementation round across the three greenfield
crates (`alknet-vault`, `alknet-core`, `alknet-call`). Reviews #001#003 were
*pre*-implementation and caught spec-level gaps (type-surface completeness,
registration-bundle tangle, abort-policy wiring, vault deferral audit). This
review is *post*-implementation: it asks whether the code matches the now-stable
specs, and whether anything obvious (or not so obvious) slipped through.
Headline result: **the implementation is in remarkably good shape.** It builds
clean (`cargo build --workspace --all-features`), all 332 tests pass, and
`cargo clippy --workspace --all-features --all-targets` reports zero default-
level warnings. The spec drift that remains is concentrated in two places:
(a) one wiring gap where implemented-and-tested code is never called, and (b)
two minor behaviors the spec describes but the implementation shortcuts. None
of the findings block forward progress; all are local fixes.
## Severity Definitions
| Severity | Meaning |
|----------|---------|
| **Critical** | Security or correctness defect that will misbehave in production |
| **Warning** | Spec drift that works today but will surprise a future implementer or breaks an invariant |
| **Suggestion** | Hardening, ergonomics, or test-coverage gaps |
## Methodology
- Read every source file in the three crates (~10.3k LOC).
- Cross-referenced each task's acceptance criteria against the implemented code.
- Cross-referenced each ADR's binding decisions against the implementation.
- Ran `cargo build`, `cargo test`, `cargo clippy` (default + `-W clippy::pedantic`).
- Walked the dispatch paths end-to-end: ALPN handshake → endpoint dispatch →
handler → registry invoke → env compose → child invoke → wire response.
- Traced the secret-material flow: vault unlock → derive → cache → encrypt →
`DerivedKey`/`EncryptionKey` zeroization on drop.
## Summary Statistics
| Severity | Count |
|----------|-------|
| Critical | 0 |
| Warning | 4 (W1W4) |
| Suggestion | 5 (S1S5) |
---
## Warning Findings
### W1. `AbortCascade` is implemented and tested but never invoked by `CallAdapter`
**Files**: `crates/alknet-call/src/protocol/abort.rs` (full impl + 20 tests),
`crates/alknet-call/src/protocol/adapter.rs:200228` (`handle_stream`),
`crates/alknet-call/src/protocol/adapter.rs:261301` (`handle`)
**Problem**: `AbortCascade::cascade_abort` exists with full tree-walking logic
for both `AbortDependents` and `ContinueRunning`, and 20 unit tests cover depth-
3 cascades, mixed Call/Subscribe entries, and the "unknown root silently
discarded" rule from ADR-016. But no caller invokes it.
In `CallAdapter::handle_stream` (adapter.rs:210213), the only inbound event
type dispatched is `EVENT_REQUESTED`:
```rust
if envelope.r#type != EVENT_REQUESTED {
debug!(event_type = %envelope.r#type, id = %envelope.id, "ignoring non-requested event on inbound stream");
continue;
}
```
In `CallAdapter::handle` (adapter.rs:261301), the only abort-related behavior
is `pending.lock().fail_all(...)` on connection close — which fails every
pending wire entry with `INTERNAL` but does not walk the call tree. There is no
path where an inbound `call.aborted` event for a parent request triggers
`AbortCascade::cascade_abort`.
**Spec contradiction**: `docs/architecture/crates/call/call-protocol.md:497`:
> "When `call.aborted` arrives for a parent request, the protocol cascades the
> abort to all non-terminal descendants in the tree. The CallAdapter walks the
> tree (indexed by `parent_request_id` in `PendingRequestMap`) and sends
> `call.aborted` for each descendant."
The implementation does not do this. A wire client that sends `call.aborted`
to cancel its in-flight root request gets the event silently dropped
server-side; the root's composed descendants continue running to completion
under the default `AbortDependents` policy — exactly the wasted-work / unwanted-
side-effects case ADR-016 was written to prevent.
**Why the tests pass anyway**: `AbortCascade`'s unit tests construct a
`PendingRequestMap` directly and call `cascade_abort` on it. They never go
through `CallAdapter`. The call-adapter task's acceptance criteria (tasks/call/
protocol/call-adapter.md:213238) did not include "inbound `call.aborted`
triggers cascade" — only outgoing aborts via `CallConnection::abort()` and the
abort-cascade logic itself. The abort-cascade task's acceptance criteria
(tasks/call/protocol/abort-cascade.md:154172) likewise test `AbortCascade` in
isolation. The integration step — wiring the cascade into the adapter's inbound
event path — is in neither task's acceptance criteria, and so was never done.
This is the classic "two correctly-implemented halves with no bolt between
them" pattern. Each half passes its own tests; the integration is untested and
absent.
**Impact**: Today, with no `from_call` ops and no LLM agent driving nested
composition, the gap is mostly latent — most call trees are depth-1 (single
`call.requested` → single handler → response). The cascade becomes load-
bearing the moment a handler composes other operations and a client aborts the
root mid-flight (e.g., user cancels an agent chat → `/agent/chat` handler has
already invoked `/fs/readFile` and `/bash/exec` → those keep running, possibly
with side effects, after the user-visible cancellation). That is the precise
scenario ADR-016's "aborted parent work has no consumer waiting for results —
continuing is wasted work at best and unwanted side effects at worst" calls
out.
**Resolution**: In `CallAdapter::handle_stream`, add a branch for
`EVENT_ABORTED` (and possibly `EVENT_COMPLETED`/`EVENT_ERROR` for client-
initiated close on subscriptions) that:
1. Looks up the request_id in the connection's `PendingRequestMap`.
2. If present, constructs `AbortCascade::new(&mut pending)` and calls
`cascade_abort(&request_id, AbortPolicy::AbortDependents)` (the wire caller
doesn't get to choose the policy — the root's policy is `AbortDependents`
per ADR-016 Decision 6; `ContinueRunning` is a handler-level opt-in for
children, not a wire field).
3. Optionally sends `call.aborted` back for each cascaded descendant ID, if the
protocol intends the cascade to be wire-visible (the spec is ambiguous
here — call-protocol.md:497 says "sends `call.aborted` for each descendant";
abort-cascade.md:6874 says "Composed child request_ids are internal — the
client only sees `call.aborted` for the root ID it sent". The latter is the
stronger statement and matches ADR-016. Pick one and document it.)
Then add an integration test that opens a stub connection, registers a parent
op whose handler invokes a child op via `env.invoke()`, sends `call.aborted`
for the parent's request_id on an inbound stream, and asserts the child's
`PendingRequestMap` entry is removed. This is the test that would have caught
the gap.
---
### W2. Endpoint never extracts TLS client certificate fingerprint
**Files**: `crates/alknet-core/src/endpoint.rs:306` (`dispatch_quinn`),
`crates/alknet-core/src/endpoint.rs:396` (`dispatch_iroh`),
`crates/alknet-core/src/endpoint.rs:405421` (`build_auth_context`)
**Problem**: Both dispatch functions hard-code `tls_client_fingerprint: None`
when calling `build_auth_context`:
```rust
let auth = build_auth_context(&alpn, remote_addr, None, identity_provider);
// ^^^^
```
So `AuthContext.tls_client_fingerprint` is always `None`, and
`AuthContext.identity` (the endpoint-resolved identity) is always `None` too —
because `build_auth_context` only attempts `resolve_from_fingerprint` when the
fingerprint is `Some`. The endpoint-level auth resolution path described in
`docs/architecture/crates/core/auth.md:159171` is non-functional:
> "QUIC connection arrives → TLS handshake (ALPN negotiation) → Extract TLS
> client certificate fingerprint (if presented) → If fingerprint present:
> IdentityProvider::resolve_from_fingerprint() → Some(identity):
> auth.identity = Some(identity) → Construct AuthContext { identity, alpn,
> remote_addr, tls_client_fingerprint }"
The implementation constructs the AuthContext but with `identity: None` and
`tls_client_fingerprint: None` regardless of whether the TLS peer presented a
client certificate. All identity resolution is deferred to handler-level code
(the `auth_token` path in `CallAdapter::resolve_identity`, or the future SSH
handler's key-fingerprint path).
**Task summary mismatch**: `tasks/core/endpoint.md:252` claims "AuthContext
construction from connection (alpn/remote_addr/fingerprint/identity)" as a
completed deliverable. The fingerprint step is not actually implemented —
`None` is passed in both dispatch sites. The task's acceptance criteria
(tasks/core/endpoint.md:213238 — not directly quoted above, but the same
file) likewise lists fingerprint extraction as an acceptance criterion.
**Why this matters**: For SSH and git handlers, this is fine — they do their
own key-fingerprint extraction inside `handle()`. For P2P nodes using RFC 7250
raw Ed25519 keys (the "default for most alknet nodes" per OQ-12), the
connection-level identity *is* the TLS client cert — there's no separate
protocol-level credential to extract. Without endpoint-level fingerprint
extraction, a raw-key peer connecting to an `alknet/call` endpoint cannot be
identified by fingerprint at the endpoint layer; the call protocol's
`auth_token` field becomes the *only* identity path, which defeats the
"hybrid resolution" design from ADR-004/OQ-02.
**Resolution**: In `dispatch_quinn`, extract the client cert from the rustls
session via `connection.handshake_data()`
`quinn::crypto::rustls::HandshakeData` (which already has `downcast::<...>()`
in `extract_quinn_alpn` at endpoint.rs:316326). The `HandshakeData` struct
exposes `peer_subjective_name`/`client_cert_chain`-equivalent fields; with
`with_no_client_auth()` currently set, no client cert is requested, so this
also requires deciding whether the server config should request client certs
(two-way door — `with_client_auth()` vs `with_no_client_auth()`). For iroh,
`connecting.lms()` / the connection's `tls::Lts` exposes the peer's `NodeId`,
which *is* the raw-key fingerprint in iroh's model. At minimum, extract *some*
fingerprint when one is available; passing `None` unconditionally is the gap.
If client cert extraction is deferred (a defensible call — ACME and full
client-auth are also deferred per the task summary), update the task summary
and the spec to say so explicitly. Right now the spec says "extracted", the
summary says "constructed from fingerprint", and the code says `None`. Pick
two.
---
### W3. `Mnemonic: #[derive(Debug)]` prints the seed phrase
**File**: `crates/alknet-vault/src/mnemonic.rs:35`
**Problem**: `Mnemonic` derives `Debug`:
```rust
#[derive(Debug)]
pub struct Mnemonic {
inner: Bip39Mnemonic,
phrase: String,
}
```
The derived `Debug` prints both fields, including `phrase: "abandon abandon
abandon ..."`. The crate's own module doc (mnemonic.rs:8) says "Seed material
is protected with `Zeroize`" and the `Mnemonic::phrase()` accessor (line 82)
warns "Handle with care — this is the root of trust for all derived keys." The
`Zeroize + Drop` impls (lines 8899) wipe the phrase from memory on drop — but
`Debug` will happily hand the phrase to any `tracing::debug!`, `format!("{:?}")`,
panic backtrace, or error-context printer that touches a `Mnemonic` value.
This is inconsistent with the rest of the crate's secret handling:
- `DerivedKey` has a custom redacting `Debug` (protocol.rs:4856) that prints
`private_key: "[REDACTED]"`.
- `EncryptionKey` has a custom redacting `Debug` (encryption.rs:132139).
- `Capabilities` has a redacting `Debug` (types.rs:112118).
- `Secret<T>` has a redacting `Debug` (types.rs:5155).
- `Mnemonic` ... derives `Debug` and prints the phrase.
**Impact**: Low today (no `tracing::debug!` site currently formats a
`Mnemonic`), but this is exactly the kind of latent secret-leak that bites
later — a future debugging session adds `tracing::debug!(?mnemonic, ...)`, the
phrase lands in a log file, the log file gets attached to a bug report. The
root of trust should never have a `Debug` impl that can print it.
**Resolution**: Replace `#[derive(Debug)]` with a manual impl that redacts,
matching the pattern used for `DerivedKey`:
```rust
impl std::fmt::Debug for Mnemonic {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Mnemonic")
.field("phrase", &"[REDACTED]")
.finish()
}
}
```
Add a test asserting `format!("{:?}", mnemonic)` does not contain any word
from the phrase. Same treatment for `Seed` if it doesn't already redact (it
derives `Zeroize` but I didn't check its `Debug` — verify).
---
### W4. `AuthPolicy::resolve_api_key` does not populate `Identity.resources`
**Files**: `crates/alknet-core/src/config.rs:81118` (`resolve_api_key`),
`crates/alknet-core/src/auth.rs:1519` (`Identity`)
**Problem**: `resolve_api_key` constructs the resolved `Identity` with
`resources: std::collections::HashMap::new()` (config.rs:116) — resources are
always empty, regardless of what the API key entry grants. The spec
(docs/architecture/crates/core/auth.md:153) says:
> "Token: Parse as UTF-8. If it starts with `alk_`, look up in
> `DynamicConfig::auth::api_keys` by prefix match + SHA-256 hash. If found and
> not expired, return `Identity { id: prefix, scopes: entry.scopes,
> resources: entry.resources }`."
The spec references `entry.resources`, but `ApiKeyEntry` (config.rs:5562)
has no `resources` field — only `prefix, hash, scopes, description, expires_at`.
So the spec's `entry.resources` cannot be implemented as written; either the
spec is ahead of the type, or the type is behind the spec.
**Impact**: Today, no operation in the workspace uses resource-based ACLs
against an API-key-resolved identity (the `AccessControl::resource_type`/
`resource_action` fields exist in spec.rs:3237 and are tested in spec.rs:284
303, but always with hand-constructed `Identity` resources — never with a
token-resolved identity). So the gap is latent. The moment an operation
declares a resource-scoped ACL and a caller authenticates via API key, the ACL
check will fail with "missing resource" even if the key was granted that
resource in config — because `resources` is always empty.
**Resolution**: Pick one:
(a) **Add `resources: HashMap<String, Vec<String>>` to `ApiKeyEntry`**, update
`resolve_api_key` to populate `Identity.resources` from `entry.resources`, and
update `AuthPolicy`'s TOML schema (when one exists) to parse it. This matches
the spec.
(b) **Drop `entry.resources` from the spec**, document that API keys grant
scopes only (not resources), and update `auth.md:153` accordingly. Resource-
scoped access then requires fingerprint-based identity (which also currently
returns `resources: HashMap::new()` at config.rs:74 — same gap, same fix
needed there).
Either way, the current state — spec says `entry.resources`, code says `HashMap::new()`,
type has no field for it — is a three-way mismatch that will trip the next
implementer who tries to use resource ACLs with token auth.
---
## Suggestions
### S1. `EncryptionKey::from_derived_bytes` panics on short input
**File**: `crates/alknet-vault/src/encryption.rs:112119`
`bytes[..32]` panics if `bytes.len() < 32`. Today every caller passes a 32-byte
`DerivedKey.private_key` (from SLIP-0010, which always produces 32 bytes), so
this is safe in practice. But the function is `pub`, takes `&[u8]`, and
documents itself as "the bridge from `DerivedKey` (SLIP-0010 output) to
`EncryptionKey`" — a caller passing a short slice (e.g., a future secp256k1
path that accidentally hands a 33-byte compressed pub key where a 32-byte
priv key was expected) gets an index-out-of-bounds panic from a `pub` crypto
API, which is a poor failure mode for a security-sensitive function.
Consider `u32::try_from(bytes.len())` and returning `Result<Self,
EncryptionError>` with `EncryptionError::InvalidKeyLength`, or at minimum
`bytes.get(..32).ok_or(...)` instead of `bytes[..32]`. Tests should cover the
short-input case.
---
### S2. `CallConnection::call` does not enforce the `OperationContext.deadline`
**File**: `crates/alknet-call/src/protocol/connection.rs:75117`
`call()` registers the pending entry with `Instant::now() + DEFAULT_CALL_TIMEOUT`
(a fixed 30s), ignoring the caller's `OperationContext.deadline`. The spec
(call-protocol.md:480489) says composed calls inherit the parent's remaining
deadline — a depth-5 composition should not get 5×30s. The deadline field is
propagated through `OperationContext` (env.rs:83 `deadline: parent.deadline`),
but `CallConnection::call` is the *client*-side path (outgoing wire call) and
doesn't receive an `OperationContext` — it builds its own timeout from a
const. This is fine for top-level client calls; it's a gap for `from_call` ops
(which forward via `CallConnection::call` from inside a handler). When
`from_call` is implemented (currently deferred, see call-connection.md:119),
the forwarding handler should pass `parent.deadline` through to the outgoing
call so a depth-N composition is bounded by the root's deadline, not N×30s.
Not a defect today (no `from_call` callers exist), but worth a TODO at the
call site so the next implementer doesn't ship the same N×30s bug.
---
### S3. `AccessControl::check` for `resource_action` without `resource_type` is surprising
**File**: `crates/alknet-call/src/registry/spec.rs:9199`
When `resource_type` is `None` but `resource_action` is `Some(action)`, the
check scans *every* resource type in `identity.resources` looking for any list
that contains `action`. This is an odd semantics — "the caller has *some*
resource with this action, anywhere" — and is not documented in the spec. The
spec (operation-registry.md) describes resource checks as type-scoped. If the
"any resource" branch is intentional, document it; if not, require
`resource_type` whenever `resource_action` is set (or return `Allowed` when
only `resource_action` is set, matching the principle that restrictions
shouldn't be inferred). The test at spec.rs:284303 only exercises the type-
scoped case, so the type-less branch is untested and unspecified.
---
### S4. `PendingRequestMap::handle_responded` silently drops a Subscribe entry when the channel is full
**File**: `crates/alknet-call/src/protocol/pending.rs:135141`
When a Subscribe entry's `mpsc::Sender::try_send` returns `Full`, the entry is
removed from the map and the subscription is silently closed — the subscriber
sees the channel end but no error, just `None`. The `tracing::warn!` at
pending.rs:136 logs it server-side. The subscriber has no way to distinguish
"stream completed normally" from "you were too slow and got dropped". For
subscriptions that carry semantic state (LLM token streams, event tails),
silent drop is a poor failure mode. Consider sending an explicit
`CallError::internal("subscribe channel full")` before closing, or blocking
on `send().await` with a timeout rather than `try_send`. This is a two-way
door (the protocol doesn't forbid either behavior), but the current choice
should be a documented one, not an accident.
---
### S5. `clippy::pedantic` flags a few minor items worth addressing
**Files**: various
`cargo clippy --all-features --all-targets -- -W clippy::pedantic` (run during
this review) reports 11 unique warnings beyond default clippy:
- `clippy::cast_possible_truncation` at `wire.rs:312, 488, 533` — `body.len()
as u32` in the frame writer. Already guarded by the `len > MAX_FRAME_SIZE`
check at wire.rs:232, but `u32::try_from(body.len()).expect("guarded by
MAX_FRAME_SIZE check")` (or `unwrap_or`) would make the invariant explicit
and silence the lint.
- `clippy::unchecked_time_subtraction` at `pending.rs:556` —
`Instant::now() - Duration::from_millis(1)` in a test. Test-only, but
`checked_sub` is the pedantic-clean form.
- `clippy::redundant_closure_for_method_calls` at `registration.rs:307` —
`scopes.iter().map(|s| s.to_string())` → `.map(ToString::to_string)`.
- A handful of `missing_const_fn` / `needless_pass_by_value` in tests.
None of these are correctness issues. Default `cargo clippy` (the bar the
tasks were held to) is clean. The pedantic set is offered as a one-time
cleanup if the team wants to enable `clippy::pedantic` in CI later — the
fixes are mechanical.
---
## What Was Verified and Found Correct
These are areas I specifically checked because the prior reviews or the ADRs
flagged them as high-risk, and the implementation got them right:
- **`DerivedKey` custom `Deserialize` rejects `"[REDACTED]"`** (protocol.rs:69
91) — exactly as review #003 C1 demanded. The `#[derive(Deserialize)]` is
gone, the manual impl checks for the redaction marker, and the test at
protocol.rs:136149 verifies the error message mentions `[REDACTED]` and
does not leak key bytes.
- **`encrypt`/`decrypt` use `encryption_path_for_version(key_version)`**
(service.rs:320340) — review #003 C2's resolution is implemented faithfully.
The `key_version` is stamped on `EncryptedData` from the same call, not from
`PATHS::ENCRYPTION`. Test at service.rs:610621 confirms v2 round trip;
service.rs:623638 confirms rotate v2→v3.
- **`encryption_path_for_version(version < 2)` returns `InvalidPath`**
(derivation.rs:6168) — review #003 W5's guard is in place. Tests at
derivation.rs:274287 and service.rs:654679 cover v0 and v1 rejection at
both the path and `encrypt` layers.
- **`std::sync::RwLock` (not tokio) in `VaultServiceHandle`** (service.rs:43)
— review #003 C4's resolution is correct. All vault methods are sync; no
`tokio` dependency in `alknet-vault`'s `Cargo.toml`. Poisoned-lock recovery
via `unwrap_or_else(|e| e.into_inner())` at service.rs:133, 152, 173, 190,
etc. — tested at service.rs:429449.
- **`Capabilities` is non-serializable, `Zeroize`/`ZeroizeOnDrop`, `Clone`,
sealed-builder** (types.rs:57118) — review #003 C7's resolution is
implemented: private `entries`, builder via `with_api_key`/`with_http_token`,
no `Serialize` derive (test at types.rs:589592 asserts this), redacting
`Debug`. `Clone` is explicit (not derived) and zeroizes-on-drop via the
`Secret<T>` wrapper.
- **`SessionOverlaySource` is defined in `alknet-call`** (adapter.rs:3843) —
review #003 C8's crate-graph fix is correct. The trait lives alongside
`OperationEnv`, not in a hypothetical `alknet-agent`. `CallAdapter` holds
`Option<Arc<dyn SessionOverlaySource + Send + Sync>>`.
- **`CompositeOperationEnv` dispatch probes `contains()` before delegating**
(env.rs:144161) — review #003 C9's resolution is implemented: session →
connection → base, each gated by `contains()`. No sentinel ambiguity. Tests
at env.rs:397417 (session overlay wins), 419448 (falls through to base),
466488 (contains aggregates), 568597 (connection wins when session absent
or missing the op).
- **`OperationRegistryBuilder` is Layer-0-only; `CallConnection::register_
imported` is the Layer 2 API** (registration.rs:127199, connection.rs:57
67) — review #003 C10's resolution is correct. The builder has no overlay
API; runtime overlay registration uses `CallConnection` methods. The
`OverlayOperationEnv` (connection.rs:232291) implements `OperationEnv`
with `contains()` keyed on the overlay map.
- **`with_local` takes 5 args including `capabilities`** (registration.rs:138
157) — review #003 C11's resolution is in place. Both examples in the spec
now agree; the implementation matches.
- **`invoke_with_policy` is the required method; `invoke` has a default impl**
(env.rs:1135) — review #003 W1's resolution is correct. Impls only write
`invoke_with_policy`; `invoke` delegates with `parent.abort_policy`.
- **`OperationContext` has 11 fields including `abort_policy` and `deadline`**
(context.rs:1123) — matches the post-review-#003 spec. `internal` is
`pub(crate)` (set only by `OperationRegistry::invoke` and the env impls,
not by handlers) — good, prevents handlers from forging internal authority.
- **`CallError.details` is `Option<Value>` and serializes only when present**
(wire.rs:6168, 477481) — ADR-023's typed error payload is wired. The
`skip_serializing_if = "Option::is_none"` keeps protocol-level errors (no
`details`) compact on the wire.
- **`OperationProvenance` and `CompositionAuthority` are defined inline in
the call crate** (registration.rs:1927, context.rs:3865) — review #003
W11's resolution is correct. `CompositionAuthority::as_identity()` produces
the `Identity` used for internal-call ACL (registration.rs:103110),
matching ADR-015's "handler identity, not caller identity, for internal
calls" rule. Test at registration.rs:473503 confirms internal call uses
handler authority; 505538 confirms insufficient handler authority is
forbidden; 540570 confirms external call uses caller identity, not handler
authority.
- **Internal ops return `NOT_FOUND` (not `FORBIDDEN`) from the wire**
(registration.rs:98100) — ADR-015's "internal ops invisible from the wire"
rule is correct. Tested at registration.rs:331351.
- **`auth_token` resolution: success overrides connection identity; failure
falls back** (adapter.rs:87105) — matches call-protocol.md:403. Tested at
adapter.rs:490535 (all four cases: no token, valid token, invalid token
with connection identity, invalid token without connection identity).
- **`call.requested` payload carries no abort policy field; root gets
`AbortPolicy::default()`** (adapter.rs:161) — ADR-016 Decision 6 is
honored. The wire caller cannot set the policy; the composing handler opts
children into `ContinueRunning` via `invoke_with_policy`.
- **Vault has no `tokio` dependency, no `irpc`, no ALPN, no alknet-core
dep** (Cargo.toml) — ADR-018, ADR-025 are satisfied. The vault is local-
only by construction.
- **`VaultServiceHandle` is `Clone` (Arc<RwLock>)** (service.rs:5659) —
matches ADR-019's "assembly layer holds the handle, injects derived
material" model.
- **`rotate` decrypts with old version, re-encrypts with new** (service.rs:350
357) — ADR-021's "no new mnemonic" rotation is correct. Partial rotation is
safe (test at service.rs:640652 confirms old key still derivable after
rotation).
- **`KeyCache` zeroizes on drop via `CachedKey: Zeroize + ZeroizeOnDrop`**
(cache.rs:2537) — `DerivedKey` inside `CachedKey` is `#[zeroize]` on the
private key field. `KeyCache::clear` drops all entries, triggering
zeroization. The drop-tracker tests at cache.rs:215289 verify HashMap
`clear`/`remove`/`insert`-replace all invoke `Drop` on the values.
- **`build_root_context` reads `composition_authority`, `capabilities`,
`scoped_env` from the registration bundle** (adapter.rs:126166) — ADR-
022's bundle is the dispatch source of truth, not a closure-captured
ambient. Capabilities are per-request on `OperationContext`, populated
from the bundle.
- **`OperationContext.metadata` is fresh per composed call** (env.rs:81,
connection.rs:277) — ADR-014's "metadata does not propagate through
composition" security constraint is honored. Test at env.rs:351370
confirms child metadata is empty even when parent has entries.
- **`ScopedOperationEnv` bounds composition reachability** (context.rs:6794,
env.rs:6365, connection.rs:248250) — ADR-015's "handler can only invoke
declared ops" rule is enforced at every env layer. Test at env.rs:288299
confirms disallowed op returns `NOT_FOUND`.
- **Wire framing: 4-byte big-endian length prefix, 64 MiB max, zero-length
rejected, oversized rejected, EOF → `ConnectionClosed`** (wire.rs:185214)
— robust. Tested at wire.rs:297349.
- **`CallAdapter::handle` spawns a 10s sweeper for expired pending entries**
(adapter.rs:246259) and fails all pending on connection close (289297).
Correct lifecycle.
---
## Recommended Resolution Order
1. **W1 (abort cascade wiring)** — highest value. The code exists; the bolt
is missing. ~30 lines in `handle_stream` + one integration test. Without
this, ADR-016 is a documented property that doesn't hold at runtime.
2. **W3 (Mnemonic Debug redaction)** — smallest fix, eliminates a latent
root-of-trust leak. ~10 lines + one test.
3. **W2 (fingerprint extraction)** — decide: implement or explicitly defer.
Either way, reconcile spec / summary / code. If implementing, the quinn
path is ~20 lines (downcast `HandshakeData`, extract client cert, hash to
fingerprint); the iroh path uses the peer `NodeId` directly.
4. **W4 (ApiKeyEntry resources)** — decide: add the field or drop it from the
spec. Whichever way, fix the three-way mismatch.
5. **S1S5** — opportunistic; bundle into a follow-up "polish" task or address
as touched.
Review #004 is open. Zero critical findings; four warnings, all local; five
suggestions. The implementation is sound, the spec drift is bounded, and the
one wiring gap (W1) has all the hard logic already written and tested — it
just needs to be called.
---
## Resolution (2026-06-24)
All four warnings (W1W4) resolved. Workspace green:
`cargo build --workspace --all-features`, `cargo test --workspace
--all-features` (326 tests, 0 failures), `cargo clippy --workspace
--all-features --all-targets` (0 warnings).
- **W1 (abort cascade wiring)**: `CallAdapter::handle_stream` now
matches `EVENT_ABORTED`, invokes `AbortCascade::cascade_abort` with
`AbortPolicy::AbortDependents`, and aborts the root. No descendant
`call.aborted` frames sent on the wire (ADR-016 Decision 2). Two
integration tests cover the cascade + unknown-id no-op paths.
(`tasks/call/protocol/abort-cascade-wiring.md` → completed)
- **W2 (fingerprint extraction)**: `dispatch_quinn` extracts the leaf
client cert DER via `peer_identity()` → `SHA256:<hex>`; `dispatch_iroh`
extracts the peer `NodeId` → `ed25519:<hex>`. Fingerprint format
documented in `auth.md`. Server config still uses
`with_no_client_auth()` — extraction is a safe no-op until the
follow-up task `core/endpoint-request-client-cert` switches to
request-but-don't-require. Two unit tests cover fingerprint format +
determinism.
(`tasks/core/endpoint-client-fingerprint.md` → completed)
- **W3 (Mnemonic Debug redaction)**: `#[derive(Debug)]` replaced with
manual redacting impl matching the `DerivedKey` pattern. `Seed`
confirmed to have no `Debug` impl. Test asserts no phrase word leaks.
(`tasks/vault/mnemonic-debug-redaction.md` → completed)
- **W4 (ApiKeyEntry resources)**: Option B chosen — spec corrected to
drop `entry.resources`; `auth.md` now documents that external
identities (token/fingerprint) grant scopes only, resource-scoped
ACLs are a composition-internal concern (ADR-015/022). Two tests
confirm both resolvers return empty resources.
(`tasks/core/auth-apikey-resources.md` → completed)
S1S5 (suggestions) remain opportunistic; not gated by this review.