docs(reviews): add post-implementation sanity check #004
28-task implementation review across alknet-vault, alknet-core, alknet-call. Zero critical findings; 4 warnings (W1: AbortCascade implemented but never invoked by CallAdapter; W2: endpoint hard-codes tls_client_fingerprint=None; W3: Mnemonic Debug leaks seed phrase; W4: ApiKeyEntry missing resources field); 5 suggestions. Build clean, 332 tests pass, default clippy clean.
This commit is contained in:
569
docs/reviews/004-post-implementation-sanity-check.md
Normal file
569
docs/reviews/004-post-implementation-sanity-check.md
Normal file
@@ -0,0 +1,569 @@
|
||||
---
|
||||
status: open
|
||||
last_updated: 2026-06-23
|
||||
reviewed_artifacts:
|
||||
- crates/alknet-vault/src/{lib,cache,derivation,encryption,ethereum,mnemonic,protocol,service}.rs
|
||||
- crates/alknet-core/src/{lib,auth,config,endpoint,types}.rs
|
||||
- crates/alknet-call/src/{lib,registry/*,protocol/*}.rs
|
||||
- docs/architecture/{overview,open-questions}.md
|
||||
- docs/architecture/crates/{vault,core,call}/*.md
|
||||
- docs/architecture/decisions/001–026
|
||||
- docs/reviews/001,002,003-*.md
|
||||
- tasks/{vault,core,call}/**/*.md (28 completed tasks)
|
||||
reviewer: code-reviewer (post-implementation)
|
||||
---
|
||||
|
||||
# Post-Implementation Sanity Check #004
|
||||
|
||||
## Purpose
|
||||
|
||||
Sanity check after the 28-task implementation round across the three greenfield
|
||||
crates (`alknet-vault`, `alknet-core`, `alknet-call`). Reviews #001–#003 were
|
||||
*pre*-implementation and caught spec-level gaps (type-surface completeness,
|
||||
registration-bundle tangle, abort-policy wiring, vault deferral audit). This
|
||||
review is *post*-implementation: it asks whether the code matches the now-stable
|
||||
specs, and whether anything obvious (or not so obvious) slipped through.
|
||||
|
||||
Headline result: **the implementation is in remarkably good shape.** It builds
|
||||
clean (`cargo build --workspace --all-features`), all 332 tests pass, and
|
||||
`cargo clippy --workspace --all-features --all-targets` reports zero default-
|
||||
level warnings. The spec drift that remains is concentrated in two places:
|
||||
(a) one wiring gap where implemented-and-tested code is never called, and (b)
|
||||
two minor behaviors the spec describes but the implementation shortcuts. None
|
||||
of the findings block forward progress; all are local fixes.
|
||||
|
||||
## Severity Definitions
|
||||
|
||||
| Severity | Meaning |
|
||||
|----------|---------|
|
||||
| **Critical** | Security or correctness defect that will misbehave in production |
|
||||
| **Warning** | Spec drift that works today but will surprise a future implementer or breaks an invariant |
|
||||
| **Suggestion** | Hardening, ergonomics, or test-coverage gaps |
|
||||
|
||||
## Methodology
|
||||
|
||||
- Read every source file in the three crates (~10.3k LOC).
|
||||
- Cross-referenced each task's acceptance criteria against the implemented code.
|
||||
- Cross-referenced each ADR's binding decisions against the implementation.
|
||||
- Ran `cargo build`, `cargo test`, `cargo clippy` (default + `-W clippy::pedantic`).
|
||||
- Walked the dispatch paths end-to-end: ALPN handshake → endpoint dispatch →
|
||||
handler → registry invoke → env compose → child invoke → wire response.
|
||||
- Traced the secret-material flow: vault unlock → derive → cache → encrypt →
|
||||
`DerivedKey`/`EncryptionKey` zeroization on drop.
|
||||
|
||||
## Summary Statistics
|
||||
|
||||
| Severity | Count |
|
||||
|----------|-------|
|
||||
| Critical | 0 |
|
||||
| Warning | 4 (W1–W4) |
|
||||
| Suggestion | 5 (S1–S5) |
|
||||
|
||||
---
|
||||
|
||||
## Warning Findings
|
||||
|
||||
### W1. `AbortCascade` is implemented and tested but never invoked by `CallAdapter`
|
||||
|
||||
**Files**: `crates/alknet-call/src/protocol/abort.rs` (full impl + 20 tests),
|
||||
`crates/alknet-call/src/protocol/adapter.rs:200–228` (`handle_stream`),
|
||||
`crates/alknet-call/src/protocol/adapter.rs:261–301` (`handle`)
|
||||
|
||||
**Problem**: `AbortCascade::cascade_abort` exists with full tree-walking logic
|
||||
for both `AbortDependents` and `ContinueRunning`, and 20 unit tests cover depth-
|
||||
3 cascades, mixed Call/Subscribe entries, and the "unknown root silently
|
||||
discarded" rule from ADR-016. But no caller invokes it.
|
||||
|
||||
In `CallAdapter::handle_stream` (adapter.rs:210–213), the only inbound event
|
||||
type dispatched is `EVENT_REQUESTED`:
|
||||
|
||||
```rust
|
||||
if envelope.r#type != EVENT_REQUESTED {
|
||||
debug!(event_type = %envelope.r#type, id = %envelope.id, "ignoring non-requested event on inbound stream");
|
||||
continue;
|
||||
}
|
||||
```
|
||||
|
||||
In `CallAdapter::handle` (adapter.rs:261–301), the only abort-related behavior
|
||||
is `pending.lock().fail_all(...)` on connection close — which fails every
|
||||
pending wire entry with `INTERNAL` but does not walk the call tree. There is no
|
||||
path where an inbound `call.aborted` event for a parent request triggers
|
||||
`AbortCascade::cascade_abort`.
|
||||
|
||||
**Spec contradiction**: `docs/architecture/crates/call/call-protocol.md:497`:
|
||||
|
||||
> "When `call.aborted` arrives for a parent request, the protocol cascades the
|
||||
> abort to all non-terminal descendants in the tree. The CallAdapter walks the
|
||||
> tree (indexed by `parent_request_id` in `PendingRequestMap`) and sends
|
||||
> `call.aborted` for each descendant."
|
||||
|
||||
The implementation does not do this. A wire client that sends `call.aborted`
|
||||
to cancel its in-flight root request gets the event silently dropped
|
||||
server-side; the root's composed descendants continue running to completion
|
||||
under the default `AbortDependents` policy — exactly the wasted-work / unwanted-
|
||||
side-effects case ADR-016 was written to prevent.
|
||||
|
||||
**Why the tests pass anyway**: `AbortCascade`'s unit tests construct a
|
||||
`PendingRequestMap` directly and call `cascade_abort` on it. They never go
|
||||
through `CallAdapter`. The call-adapter task's acceptance criteria (tasks/call/
|
||||
protocol/call-adapter.md:213–238) did not include "inbound `call.aborted`
|
||||
triggers cascade" — only outgoing aborts via `CallConnection::abort()` and the
|
||||
abort-cascade logic itself. The abort-cascade task's acceptance criteria
|
||||
(tasks/call/protocol/abort-cascade.md:154–172) likewise test `AbortCascade` in
|
||||
isolation. The integration step — wiring the cascade into the adapter's inbound
|
||||
event path — is in neither task's acceptance criteria, and so was never done.
|
||||
|
||||
This is the classic "two correctly-implemented halves with no bolt between
|
||||
them" pattern. Each half passes its own tests; the integration is untested and
|
||||
absent.
|
||||
|
||||
**Impact**: Today, with no `from_call` ops and no LLM agent driving nested
|
||||
composition, the gap is mostly latent — most call trees are depth-1 (single
|
||||
`call.requested` → single handler → response). The cascade becomes load-
|
||||
bearing the moment a handler composes other operations and a client aborts the
|
||||
root mid-flight (e.g., user cancels an agent chat → `/agent/chat` handler has
|
||||
already invoked `/fs/readFile` and `/bash/exec` → those keep running, possibly
|
||||
with side effects, after the user-visible cancellation). That is the precise
|
||||
scenario ADR-016's "aborted parent work has no consumer waiting for results —
|
||||
continuing is wasted work at best and unwanted side effects at worst" calls
|
||||
out.
|
||||
|
||||
**Resolution**: In `CallAdapter::handle_stream`, add a branch for
|
||||
`EVENT_ABORTED` (and possibly `EVENT_COMPLETED`/`EVENT_ERROR` for client-
|
||||
initiated close on subscriptions) that:
|
||||
|
||||
1. Looks up the request_id in the connection's `PendingRequestMap`.
|
||||
2. If present, constructs `AbortCascade::new(&mut pending)` and calls
|
||||
`cascade_abort(&request_id, AbortPolicy::AbortDependents)` (the wire caller
|
||||
doesn't get to choose the policy — the root's policy is `AbortDependents`
|
||||
per ADR-016 Decision 6; `ContinueRunning` is a handler-level opt-in for
|
||||
children, not a wire field).
|
||||
3. Optionally sends `call.aborted` back for each cascaded descendant ID, if the
|
||||
protocol intends the cascade to be wire-visible (the spec is ambiguous
|
||||
here — call-protocol.md:497 says "sends `call.aborted` for each descendant";
|
||||
abort-cascade.md:68–74 says "Composed child request_ids are internal — the
|
||||
client only sees `call.aborted` for the root ID it sent". The latter is the
|
||||
stronger statement and matches ADR-016. Pick one and document it.)
|
||||
|
||||
Then add an integration test that opens a stub connection, registers a parent
|
||||
op whose handler invokes a child op via `env.invoke()`, sends `call.aborted`
|
||||
for the parent's request_id on an inbound stream, and asserts the child's
|
||||
`PendingRequestMap` entry is removed. This is the test that would have caught
|
||||
the gap.
|
||||
|
||||
---
|
||||
|
||||
### W2. Endpoint never extracts TLS client certificate fingerprint
|
||||
|
||||
**Files**: `crates/alknet-core/src/endpoint.rs:306` (`dispatch_quinn`),
|
||||
`crates/alknet-core/src/endpoint.rs:396` (`dispatch_iroh`),
|
||||
`crates/alknet-core/src/endpoint.rs:405–421` (`build_auth_context`)
|
||||
|
||||
**Problem**: Both dispatch functions hard-code `tls_client_fingerprint: None`
|
||||
when calling `build_auth_context`:
|
||||
|
||||
```rust
|
||||
let auth = build_auth_context(&alpn, remote_addr, None, identity_provider);
|
||||
// ^^^^
|
||||
```
|
||||
|
||||
So `AuthContext.tls_client_fingerprint` is always `None`, and
|
||||
`AuthContext.identity` (the endpoint-resolved identity) is always `None` too —
|
||||
because `build_auth_context` only attempts `resolve_from_fingerprint` when the
|
||||
fingerprint is `Some`. The endpoint-level auth resolution path described in
|
||||
`docs/architecture/crates/core/auth.md:159–171` is non-functional:
|
||||
|
||||
> "QUIC connection arrives → TLS handshake (ALPN negotiation) → Extract TLS
|
||||
> client certificate fingerprint (if presented) → If fingerprint present:
|
||||
> IdentityProvider::resolve_from_fingerprint() → Some(identity):
|
||||
> auth.identity = Some(identity) → Construct AuthContext { identity, alpn,
|
||||
> remote_addr, tls_client_fingerprint }"
|
||||
|
||||
The implementation constructs the AuthContext but with `identity: None` and
|
||||
`tls_client_fingerprint: None` regardless of whether the TLS peer presented a
|
||||
client certificate. All identity resolution is deferred to handler-level code
|
||||
(the `auth_token` path in `CallAdapter::resolve_identity`, or the future SSH
|
||||
handler's key-fingerprint path).
|
||||
|
||||
**Task summary mismatch**: `tasks/core/endpoint.md:252` claims "AuthContext
|
||||
construction from connection (alpn/remote_addr/fingerprint/identity)" as a
|
||||
completed deliverable. The fingerprint step is not actually implemented —
|
||||
`None` is passed in both dispatch sites. The task's acceptance criteria
|
||||
(tasks/core/endpoint.md:213–238 — not directly quoted above, but the same
|
||||
file) likewise lists fingerprint extraction as an acceptance criterion.
|
||||
|
||||
**Why this matters**: For SSH and git handlers, this is fine — they do their
|
||||
own key-fingerprint extraction inside `handle()`. For P2P nodes using RFC 7250
|
||||
raw Ed25519 keys (the "default for most alknet nodes" per OQ-12), the
|
||||
connection-level identity *is* the TLS client cert — there's no separate
|
||||
protocol-level credential to extract. Without endpoint-level fingerprint
|
||||
extraction, a raw-key peer connecting to an `alknet/call` endpoint cannot be
|
||||
identified by fingerprint at the endpoint layer; the call protocol's
|
||||
`auth_token` field becomes the *only* identity path, which defeats the
|
||||
"hybrid resolution" design from ADR-004/OQ-02.
|
||||
|
||||
**Resolution**: In `dispatch_quinn`, extract the client cert from the rustls
|
||||
session via `connection.handshake_data()` →
|
||||
`quinn::crypto::rustls::HandshakeData` (which already has `downcast::<...>()`
|
||||
in `extract_quinn_alpn` at endpoint.rs:316–326). The `HandshakeData` struct
|
||||
exposes `peer_subjective_name`/`client_cert_chain`-equivalent fields; with
|
||||
`with_no_client_auth()` currently set, no client cert is requested, so this
|
||||
also requires deciding whether the server config should request client certs
|
||||
(two-way door — `with_client_auth()` vs `with_no_client_auth()`). For iroh,
|
||||
`connecting.lms()` / the connection's `tls::Lts` exposes the peer's `NodeId`,
|
||||
which *is* the raw-key fingerprint in iroh's model. At minimum, extract *some*
|
||||
fingerprint when one is available; passing `None` unconditionally is the gap.
|
||||
|
||||
If client cert extraction is deferred (a defensible call — ACME and full
|
||||
client-auth are also deferred per the task summary), update the task summary
|
||||
and the spec to say so explicitly. Right now the spec says "extracted", the
|
||||
summary says "constructed from fingerprint", and the code says `None`. Pick
|
||||
two.
|
||||
|
||||
---
|
||||
|
||||
### W3. `Mnemonic: #[derive(Debug)]` prints the seed phrase
|
||||
|
||||
**File**: `crates/alknet-vault/src/mnemonic.rs:35`
|
||||
|
||||
**Problem**: `Mnemonic` derives `Debug`:
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Mnemonic {
|
||||
inner: Bip39Mnemonic,
|
||||
phrase: String,
|
||||
}
|
||||
```
|
||||
|
||||
The derived `Debug` prints both fields, including `phrase: "abandon abandon
|
||||
abandon ..."`. The crate's own module doc (mnemonic.rs:8) says "Seed material
|
||||
is protected with `Zeroize`" and the `Mnemonic::phrase()` accessor (line 82)
|
||||
warns "Handle with care — this is the root of trust for all derived keys." The
|
||||
`Zeroize + Drop` impls (lines 88–99) wipe the phrase from memory on drop — but
|
||||
`Debug` will happily hand the phrase to any `tracing::debug!`, `format!("{:?}")`,
|
||||
panic backtrace, or error-context printer that touches a `Mnemonic` value.
|
||||
|
||||
This is inconsistent with the rest of the crate's secret handling:
|
||||
- `DerivedKey` has a custom redacting `Debug` (protocol.rs:48–56) that prints
|
||||
`private_key: "[REDACTED]"`.
|
||||
- `EncryptionKey` has a custom redacting `Debug` (encryption.rs:132–139).
|
||||
- `Capabilities` has a redacting `Debug` (types.rs:112–118).
|
||||
- `Secret<T>` has a redacting `Debug` (types.rs:51–55).
|
||||
- `Mnemonic` ... derives `Debug` and prints the phrase.
|
||||
|
||||
**Impact**: Low today (no `tracing::debug!` site currently formats a
|
||||
`Mnemonic`), but this is exactly the kind of latent secret-leak that bites
|
||||
later — a future debugging session adds `tracing::debug!(?mnemonic, ...)`, the
|
||||
phrase lands in a log file, the log file gets attached to a bug report. The
|
||||
root of trust should never have a `Debug` impl that can print it.
|
||||
|
||||
**Resolution**: Replace `#[derive(Debug)]` with a manual impl that redacts,
|
||||
matching the pattern used for `DerivedKey`:
|
||||
|
||||
```rust
|
||||
impl std::fmt::Debug for Mnemonic {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
f.debug_struct("Mnemonic")
|
||||
.field("phrase", &"[REDACTED]")
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Add a test asserting `format!("{:?}", mnemonic)` does not contain any word
|
||||
from the phrase. Same treatment for `Seed` if it doesn't already redact (it
|
||||
derives `Zeroize` but I didn't check its `Debug` — verify).
|
||||
|
||||
---
|
||||
|
||||
### W4. `AuthPolicy::resolve_api_key` does not populate `Identity.resources`
|
||||
|
||||
**Files**: `crates/alknet-core/src/config.rs:81–118` (`resolve_api_key`),
|
||||
`crates/alknet-core/src/auth.rs:15–19` (`Identity`)
|
||||
|
||||
**Problem**: `resolve_api_key` constructs the resolved `Identity` with
|
||||
`resources: std::collections::HashMap::new()` (config.rs:116) — resources are
|
||||
always empty, regardless of what the API key entry grants. The spec
|
||||
(docs/architecture/crates/core/auth.md:153) says:
|
||||
|
||||
> "Token: Parse as UTF-8. If it starts with `alk_`, look up in
|
||||
> `DynamicConfig::auth::api_keys` by prefix match + SHA-256 hash. If found and
|
||||
> not expired, return `Identity { id: prefix, scopes: entry.scopes,
|
||||
> resources: entry.resources }`."
|
||||
|
||||
The spec references `entry.resources`, but `ApiKeyEntry` (config.rs:55–62)
|
||||
has no `resources` field — only `prefix, hash, scopes, description, expires_at`.
|
||||
So the spec's `entry.resources` cannot be implemented as written; either the
|
||||
spec is ahead of the type, or the type is behind the spec.
|
||||
|
||||
**Impact**: Today, no operation in the workspace uses resource-based ACLs
|
||||
against an API-key-resolved identity (the `AccessControl::resource_type`/
|
||||
`resource_action` fields exist in spec.rs:32–37 and are tested in spec.rs:284–
|
||||
303, but always with hand-constructed `Identity` resources — never with a
|
||||
token-resolved identity). So the gap is latent. The moment an operation
|
||||
declares a resource-scoped ACL and a caller authenticates via API key, the ACL
|
||||
check will fail with "missing resource" even if the key was granted that
|
||||
resource in config — because `resources` is always empty.
|
||||
|
||||
**Resolution**: Pick one:
|
||||
|
||||
(a) **Add `resources: HashMap<String, Vec<String>>` to `ApiKeyEntry`**, update
|
||||
`resolve_api_key` to populate `Identity.resources` from `entry.resources`, and
|
||||
update `AuthPolicy`'s TOML schema (when one exists) to parse it. This matches
|
||||
the spec.
|
||||
|
||||
(b) **Drop `entry.resources` from the spec**, document that API keys grant
|
||||
scopes only (not resources), and update `auth.md:153` accordingly. Resource-
|
||||
scoped access then requires fingerprint-based identity (which also currently
|
||||
returns `resources: HashMap::new()` at config.rs:74 — same gap, same fix
|
||||
needed there).
|
||||
|
||||
Either way, the current state — spec says `entry.resources`, code says `HashMap::new()`,
|
||||
type has no field for it — is a three-way mismatch that will trip the next
|
||||
implementer who tries to use resource ACLs with token auth.
|
||||
|
||||
---
|
||||
|
||||
## Suggestions
|
||||
|
||||
### S1. `EncryptionKey::from_derived_bytes` panics on short input
|
||||
|
||||
**File**: `crates/alknet-vault/src/encryption.rs:112–119`
|
||||
|
||||
`bytes[..32]` panics if `bytes.len() < 32`. Today every caller passes a 32-byte
|
||||
`DerivedKey.private_key` (from SLIP-0010, which always produces 32 bytes), so
|
||||
this is safe in practice. But the function is `pub`, takes `&[u8]`, and
|
||||
documents itself as "the bridge from `DerivedKey` (SLIP-0010 output) to
|
||||
`EncryptionKey`" — a caller passing a short slice (e.g., a future secp256k1
|
||||
path that accidentally hands a 33-byte compressed pub key where a 32-byte
|
||||
priv key was expected) gets an index-out-of-bounds panic from a `pub` crypto
|
||||
API, which is a poor failure mode for a security-sensitive function.
|
||||
|
||||
Consider `u32::try_from(bytes.len())` and returning `Result<Self,
|
||||
EncryptionError>` with `EncryptionError::InvalidKeyLength`, or at minimum
|
||||
`bytes.get(..32).ok_or(...)` instead of `bytes[..32]`. Tests should cover the
|
||||
short-input case.
|
||||
|
||||
---
|
||||
|
||||
### S2. `CallConnection::call` does not enforce the `OperationContext.deadline`
|
||||
|
||||
**File**: `crates/alknet-call/src/protocol/connection.rs:75–117`
|
||||
|
||||
`call()` registers the pending entry with `Instant::now() + DEFAULT_CALL_TIMEOUT`
|
||||
(a fixed 30s), ignoring the caller's `OperationContext.deadline`. The spec
|
||||
(call-protocol.md:480–489) says composed calls inherit the parent's remaining
|
||||
deadline — a depth-5 composition should not get 5×30s. The deadline field is
|
||||
propagated through `OperationContext` (env.rs:83 `deadline: parent.deadline`),
|
||||
but `CallConnection::call` is the *client*-side path (outgoing wire call) and
|
||||
doesn't receive an `OperationContext` — it builds its own timeout from a
|
||||
const. This is fine for top-level client calls; it's a gap for `from_call` ops
|
||||
(which forward via `CallConnection::call` from inside a handler). When
|
||||
`from_call` is implemented (currently deferred, see call-connection.md:119),
|
||||
the forwarding handler should pass `parent.deadline` through to the outgoing
|
||||
call so a depth-N composition is bounded by the root's deadline, not N×30s.
|
||||
|
||||
Not a defect today (no `from_call` callers exist), but worth a TODO at the
|
||||
call site so the next implementer doesn't ship the same N×30s bug.
|
||||
|
||||
---
|
||||
|
||||
### S3. `AccessControl::check` for `resource_action` without `resource_type` is surprising
|
||||
|
||||
**File**: `crates/alknet-call/src/registry/spec.rs:91–99`
|
||||
|
||||
When `resource_type` is `None` but `resource_action` is `Some(action)`, the
|
||||
check scans *every* resource type in `identity.resources` looking for any list
|
||||
that contains `action`. This is an odd semantics — "the caller has *some*
|
||||
resource with this action, anywhere" — and is not documented in the spec. The
|
||||
spec (operation-registry.md) describes resource checks as type-scoped. If the
|
||||
"any resource" branch is intentional, document it; if not, require
|
||||
`resource_type` whenever `resource_action` is set (or return `Allowed` when
|
||||
only `resource_action` is set, matching the principle that restrictions
|
||||
shouldn't be inferred). The test at spec.rs:284–303 only exercises the type-
|
||||
scoped case, so the type-less branch is untested and unspecified.
|
||||
|
||||
---
|
||||
|
||||
### S4. `PendingRequestMap::handle_responded` silently drops a Subscribe entry when the channel is full
|
||||
|
||||
**File**: `crates/alknet-call/src/protocol/pending.rs:135–141`
|
||||
|
||||
When a Subscribe entry's `mpsc::Sender::try_send` returns `Full`, the entry is
|
||||
removed from the map and the subscription is silently closed — the subscriber
|
||||
sees the channel end but no error, just `None`. The `tracing::warn!` at
|
||||
pending.rs:136 logs it server-side. The subscriber has no way to distinguish
|
||||
"stream completed normally" from "you were too slow and got dropped". For
|
||||
subscriptions that carry semantic state (LLM token streams, event tails),
|
||||
silent drop is a poor failure mode. Consider sending an explicit
|
||||
`CallError::internal("subscribe channel full")` before closing, or blocking
|
||||
on `send().await` with a timeout rather than `try_send`. This is a two-way
|
||||
door (the protocol doesn't forbid either behavior), but the current choice
|
||||
should be a documented one, not an accident.
|
||||
|
||||
---
|
||||
|
||||
### S5. `clippy::pedantic` flags a few minor items worth addressing
|
||||
|
||||
**Files**: various
|
||||
|
||||
`cargo clippy --all-features --all-targets -- -W clippy::pedantic` (run during
|
||||
this review) reports 11 unique warnings beyond default clippy:
|
||||
|
||||
- `clippy::cast_possible_truncation` at `wire.rs:312, 488, 533` — `body.len()
|
||||
as u32` in the frame writer. Already guarded by the `len > MAX_FRAME_SIZE`
|
||||
check at wire.rs:232, but `u32::try_from(body.len()).expect("guarded by
|
||||
MAX_FRAME_SIZE check")` (or `unwrap_or`) would make the invariant explicit
|
||||
and silence the lint.
|
||||
- `clippy::unchecked_time_subtraction` at `pending.rs:556` —
|
||||
`Instant::now() - Duration::from_millis(1)` in a test. Test-only, but
|
||||
`checked_sub` is the pedantic-clean form.
|
||||
- `clippy::redundant_closure_for_method_calls` at `registration.rs:307` —
|
||||
`scopes.iter().map(|s| s.to_string())` → `.map(ToString::to_string)`.
|
||||
- A handful of `missing_const_fn` / `needless_pass_by_value` in tests.
|
||||
|
||||
None of these are correctness issues. Default `cargo clippy` (the bar the
|
||||
tasks were held to) is clean. The pedantic set is offered as a one-time
|
||||
cleanup if the team wants to enable `clippy::pedantic` in CI later — the
|
||||
fixes are mechanical.
|
||||
|
||||
---
|
||||
|
||||
## What Was Verified and Found Correct
|
||||
|
||||
These are areas I specifically checked because the prior reviews or the ADRs
|
||||
flagged them as high-risk, and the implementation got them right:
|
||||
|
||||
- **`DerivedKey` custom `Deserialize` rejects `"[REDACTED]"`** (protocol.rs:69–
|
||||
91) — exactly as review #003 C1 demanded. The `#[derive(Deserialize)]` is
|
||||
gone, the manual impl checks for the redaction marker, and the test at
|
||||
protocol.rs:136–149 verifies the error message mentions `[REDACTED]` and
|
||||
does not leak key bytes.
|
||||
- **`encrypt`/`decrypt` use `encryption_path_for_version(key_version)`**
|
||||
(service.rs:320–340) — review #003 C2's resolution is implemented faithfully.
|
||||
The `key_version` is stamped on `EncryptedData` from the same call, not from
|
||||
`PATHS::ENCRYPTION`. Test at service.rs:610–621 confirms v2 round trip;
|
||||
service.rs:623–638 confirms rotate v2→v3.
|
||||
- **`encryption_path_for_version(version < 2)` returns `InvalidPath`**
|
||||
(derivation.rs:61–68) — review #003 W5's guard is in place. Tests at
|
||||
derivation.rs:274–287 and service.rs:654–679 cover v0 and v1 rejection at
|
||||
both the path and `encrypt` layers.
|
||||
- **`std::sync::RwLock` (not tokio) in `VaultServiceHandle`** (service.rs:43)
|
||||
— review #003 C4's resolution is correct. All vault methods are sync; no
|
||||
`tokio` dependency in `alknet-vault`'s `Cargo.toml`. Poisoned-lock recovery
|
||||
via `unwrap_or_else(|e| e.into_inner())` at service.rs:133, 152, 173, 190,
|
||||
etc. — tested at service.rs:429–449.
|
||||
- **`Capabilities` is non-serializable, `Zeroize`/`ZeroizeOnDrop`, `Clone`,
|
||||
sealed-builder** (types.rs:57–118) — review #003 C7's resolution is
|
||||
implemented: private `entries`, builder via `with_api_key`/`with_http_token`,
|
||||
no `Serialize` derive (test at types.rs:589–592 asserts this), redacting
|
||||
`Debug`. `Clone` is explicit (not derived) and zeroizes-on-drop via the
|
||||
`Secret<T>` wrapper.
|
||||
- **`SessionOverlaySource` is defined in `alknet-call`** (adapter.rs:38–43) —
|
||||
review #003 C8's crate-graph fix is correct. The trait lives alongside
|
||||
`OperationEnv`, not in a hypothetical `alknet-agent`. `CallAdapter` holds
|
||||
`Option<Arc<dyn SessionOverlaySource + Send + Sync>>`.
|
||||
- **`CompositeOperationEnv` dispatch probes `contains()` before delegating**
|
||||
(env.rs:144–161) — review #003 C9's resolution is implemented: session →
|
||||
connection → base, each gated by `contains()`. No sentinel ambiguity. Tests
|
||||
at env.rs:397–417 (session overlay wins), 419–448 (falls through to base),
|
||||
466–488 (contains aggregates), 568–597 (connection wins when session absent
|
||||
or missing the op).
|
||||
- **`OperationRegistryBuilder` is Layer-0-only; `CallConnection::register_
|
||||
imported` is the Layer 2 API** (registration.rs:127–199, connection.rs:57–
|
||||
67) — review #003 C10's resolution is correct. The builder has no overlay
|
||||
API; runtime overlay registration uses `CallConnection` methods. The
|
||||
`OverlayOperationEnv` (connection.rs:232–291) implements `OperationEnv`
|
||||
with `contains()` keyed on the overlay map.
|
||||
- **`with_local` takes 5 args including `capabilities`** (registration.rs:138–
|
||||
157) — review #003 C11's resolution is in place. Both examples in the spec
|
||||
now agree; the implementation matches.
|
||||
- **`invoke_with_policy` is the required method; `invoke` has a default impl**
|
||||
(env.rs:11–35) — review #003 W1's resolution is correct. Impls only write
|
||||
`invoke_with_policy`; `invoke` delegates with `parent.abort_policy`.
|
||||
- **`OperationContext` has 11 fields including `abort_policy` and `deadline`**
|
||||
(context.rs:11–23) — matches the post-review-#003 spec. `internal` is
|
||||
`pub(crate)` (set only by `OperationRegistry::invoke` and the env impls,
|
||||
not by handlers) — good, prevents handlers from forging internal authority.
|
||||
- **`CallError.details` is `Option<Value>` and serializes only when present**
|
||||
(wire.rs:61–68, 477–481) — ADR-023's typed error payload is wired. The
|
||||
`skip_serializing_if = "Option::is_none"` keeps protocol-level errors (no
|
||||
`details`) compact on the wire.
|
||||
- **`OperationProvenance` and `CompositionAuthority` are defined inline in
|
||||
the call crate** (registration.rs:19–27, context.rs:38–65) — review #003
|
||||
W11's resolution is correct. `CompositionAuthority::as_identity()` produces
|
||||
the `Identity` used for internal-call ACL (registration.rs:103–110),
|
||||
matching ADR-015's "handler identity, not caller identity, for internal
|
||||
calls" rule. Test at registration.rs:473–503 confirms internal call uses
|
||||
handler authority; 505–538 confirms insufficient handler authority is
|
||||
forbidden; 540–570 confirms external call uses caller identity, not handler
|
||||
authority.
|
||||
- **Internal ops return `NOT_FOUND` (not `FORBIDDEN`) from the wire**
|
||||
(registration.rs:98–100) — ADR-015's "internal ops invisible from the wire"
|
||||
rule is correct. Tested at registration.rs:331–351.
|
||||
- **`auth_token` resolution: success overrides connection identity; failure
|
||||
falls back** (adapter.rs:87–105) — matches call-protocol.md:403. Tested at
|
||||
adapter.rs:490–535 (all four cases: no token, valid token, invalid token
|
||||
with connection identity, invalid token without connection identity).
|
||||
- **`call.requested` payload carries no abort policy field; root gets
|
||||
`AbortPolicy::default()`** (adapter.rs:161) — ADR-016 Decision 6 is
|
||||
honored. The wire caller cannot set the policy; the composing handler opts
|
||||
children into `ContinueRunning` via `invoke_with_policy`.
|
||||
- **Vault has no `tokio` dependency, no `irpc`, no ALPN, no alknet-core
|
||||
dep** (Cargo.toml) — ADR-018, ADR-025 are satisfied. The vault is local-
|
||||
only by construction.
|
||||
- **`VaultServiceHandle` is `Clone` (Arc<RwLock>)** (service.rs:56–59) —
|
||||
matches ADR-019's "assembly layer holds the handle, injects derived
|
||||
material" model.
|
||||
- **`rotate` decrypts with old version, re-encrypts with new** (service.rs:350–
|
||||
357) — ADR-021's "no new mnemonic" rotation is correct. Partial rotation is
|
||||
safe (test at service.rs:640–652 confirms old key still derivable after
|
||||
rotation).
|
||||
- **`KeyCache` zeroizes on drop via `CachedKey: Zeroize + ZeroizeOnDrop`**
|
||||
(cache.rs:25–37) — `DerivedKey` inside `CachedKey` is `#[zeroize]` on the
|
||||
private key field. `KeyCache::clear` drops all entries, triggering
|
||||
zeroization. The drop-tracker tests at cache.rs:215–289 verify HashMap
|
||||
`clear`/`remove`/`insert`-replace all invoke `Drop` on the values.
|
||||
- **`build_root_context` reads `composition_authority`, `capabilities`,
|
||||
`scoped_env` from the registration bundle** (adapter.rs:126–166) — ADR-
|
||||
022's bundle is the dispatch source of truth, not a closure-captured
|
||||
ambient. Capabilities are per-request on `OperationContext`, populated
|
||||
from the bundle.
|
||||
- **`OperationContext.metadata` is fresh per composed call** (env.rs:81,
|
||||
connection.rs:277) — ADR-014's "metadata does not propagate through
|
||||
composition" security constraint is honored. Test at env.rs:351–370
|
||||
confirms child metadata is empty even when parent has entries.
|
||||
- **`ScopedOperationEnv` bounds composition reachability** (context.rs:67–94,
|
||||
env.rs:63–65, connection.rs:248–250) — ADR-015's "handler can only invoke
|
||||
declared ops" rule is enforced at every env layer. Test at env.rs:288–299
|
||||
confirms disallowed op returns `NOT_FOUND`.
|
||||
- **Wire framing: 4-byte big-endian length prefix, 64 MiB max, zero-length
|
||||
rejected, oversized rejected, EOF → `ConnectionClosed`** (wire.rs:185–214)
|
||||
— robust. Tested at wire.rs:297–349.
|
||||
- **`CallAdapter::handle` spawns a 10s sweeper for expired pending entries**
|
||||
(adapter.rs:246–259) and fails all pending on connection close (289–297).
|
||||
Correct lifecycle.
|
||||
|
||||
---
|
||||
|
||||
## Recommended Resolution Order
|
||||
|
||||
1. **W1 (abort cascade wiring)** — highest value. The code exists; the bolt
|
||||
is missing. ~30 lines in `handle_stream` + one integration test. Without
|
||||
this, ADR-016 is a documented property that doesn't hold at runtime.
|
||||
2. **W3 (Mnemonic Debug redaction)** — smallest fix, eliminates a latent
|
||||
root-of-trust leak. ~10 lines + one test.
|
||||
3. **W2 (fingerprint extraction)** — decide: implement or explicitly defer.
|
||||
Either way, reconcile spec / summary / code. If implementing, the quinn
|
||||
path is ~20 lines (downcast `HandshakeData`, extract client cert, hash to
|
||||
fingerprint); the iroh path uses the peer `NodeId` directly.
|
||||
4. **W4 (ApiKeyEntry resources)** — decide: add the field or drop it from the
|
||||
spec. Whichever way, fix the three-way mismatch.
|
||||
5. **S1–S5** — opportunistic; bundle into a follow-up "polish" task or address
|
||||
as touched.
|
||||
|
||||
Review #004 is open. Zero critical findings; four warnings, all local; five
|
||||
suggestions. The implementation is sound, the spec drift is bounded, and the
|
||||
one wiring gap (W1) has all the hard logic already written and tested — it
|
||||
just needs to be called.
|
||||
Reference in New Issue
Block a user