Files
alknet/docs/reviews/003-pre-implementation-architecture-sanity-check.md
glm-5.2 2e34590522 docs(architecture): resolve review #003 — type/API surface completeness
Review #003 found 11 critical, 14 warning, and 6 suggestion findings
after reviews #001 (governance/security) and #002 (cross-document
consistency/two-way-door audit) were resolved. The theme: types and
APIs that were *referenced* but never *defined*, and stale ADR sketches
that didn't match the now-updated spec docs.

Critical fixes (11):

- C1: DerivedKey #[derive(Deserialize)] contradicted the custom
  Deserialize that rejects "[REDACTED]" — dropped the derive, added
  explicit manual Serialize/Deserialize impls (protocol.md).
- C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the
  signature takes key_version — updated to encryption_path_for_version
  (service.md).
- C3: derive_encryption_key returned DerivedKey, derive_encryption_key
  _for_version returned EncryptionKey (same cache) — unified on
  DerivedKey, defined CachedKey (service.md).
- C4: tokio vs std::sync::RwLock contradiction — specified
  std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025,
  service.md).
- C5: Missing drift rows in vault README — added #9 (key_version
  ignored) and #10 (rotate not implemented).
- C6: ADR-022 build_root_context and invoke() sketches omitted
  abort_policy (9 fields vs 10) — added the field to both sketches.
- C7: Capabilities type referenced 20+ times, never defined — added
  struct definition to core-types.md with Clone+Send+Sync, Zeroize,
  sealed builder API, immutability guard.
- C8: SessionOverlaySource on CallAdapter but never defined, crate
  violation (alknet-call can't depend on alknet-agent) — defined the
  trait in alknet-call (call-protocol.md), matching the IdentityProvider
  pattern.
- C9: CompositeOperationEnv dispatch fall-through was "a two-way door"
  — added contains() to OperationEnv trait, made the composite probe
  before dispatching, eliminating the sentinel ambiguity.
- C10: No API for Layer 2 (connection overlay) registration, CallConnection
  undefined — defined CallConnection struct + register_imported() API
  (call-protocol.md).
- C11: with_local signature diverged between two examples (4 args vs 5)
  — added capabilities as the 5th arg, made both examples consistent.

Warning fixes (14):

- W1: invoke_with_policy restructured as required method, invoke gets a
  default impl delegating to it — eliminates duplication across impls.
- W2: CachedKey defined (service.md).
- W3: EncryptionKey constructor/glue specified, added to re-export list.
- W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown.
- W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2).
- W6: Wire payload schemas for all event types + ResponseEnvelope →
  EventEnvelope conversion table (call-protocol.md).
- W7: Timeout section — deadline on OperationContext, composed calls
  inherit parent's deadline, CallAdapter::with_timeout().
- W8: Request ID generation spec — UUID v4 for composed calls, wire ID
  vs internal ID relationship for abort cascade.
- W9: unlock_new already-unlocked behavior specified (returns
  AlreadyUnlocked).
- W10: KeyType Serialize/Deserialize justification corrected (stale
  irpc reference removed).
- W11: OperationProvenance and CompositionAuthority defined inline in
  operation-registry.md (were only in ADR-022).
- W12: encrypt/decrypt free functions marked pub(crate), relationship
  to VaultServiceHandle methods stated.
- W13: rotate signature removed from encryption.md (it's a
  VaultServiceHandle method, not a free function).
- W14: CallAdapter::new() + with_session_source() + with_timeout()
  constructors shown.

Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant,
ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018
stale actor text, derivation helpers re-export note.
2026-06-23 10:56:05 +00:00

512 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
status: resolved
last_updated: 2026-06-23
reviewed_documents:
- overview.md
- README.md
- open-questions.md
- crates/core/README.md
- crates/core/core-types.md
- crates/core/endpoint.md
- crates/core/auth.md
- crates/core/config.md
- crates/call/README.md
- crates/call/call-protocol.md
- crates/call/operation-registry.md
- crates/vault/README.md
- crates/vault/mnemonic-derivation.md
- crates/vault/encryption.md
- crates/vault/service.md
- crates/vault/protocol.md
- decisions/001 through 026 (all 26 ADRs)
reviewer: architect
---
# Architecture Gap Review #003
## Purpose
This is the third pre-implementation sanity check. Reviews #001 and #002
caught the registration-bundle tangle (C1C4), the missing error schemas (C5),
the governance reconciliation (Proposed ADRs treated as binding, Accepted ADRs
contradicting later refinements), the abort-policy wiring gap, and the vault
deferral audit. All of those are resolved.
This review goes deeper on a different axis: **type and API surface
completeness**. The prior reviews focused on cross-document *decisions*
(contradictions between ADRs, stale framing). This review asks: could two
implementers, working from these specs alone, produce compatible code? The
findings are mostly places where a type or API is *referenced* but never
*defined*, or where a code sketch in an ADR went stale after the spec docs
were updated.
The two themes:
1. **Stale ADR sketches**: ADR-022's `build_root_context` and `invoke()`
sketches were written before ADR-024 split `env` into `scoped_env` + `env`
and before ADR-016's abort policy was added to `OperationContext`. The spec
docs (call-protocol.md, operation-registry.md) were updated; the ADR
sketches were not. An implementer copying the ADR verbatim won't compile.
2. **Referenced-but-undefined types**: `Capabilities`, `SessionOverlaySource`,
`CachedKey`, `CallConnection`, `Secp256k1ExtendedPrivKey` — all referenced
in specs, none defined. These are exactly the boundary types where two
implementers must agree, and the spec leaves them as exercises.
## Severity Definitions
| Severity | Meaning |
|----------|---------|
| **Critical** | Must resolve before implementation — would cause divergent implementations or security issues |
| **Warning** | Should resolve — missing specifications that could cause confusion or rework |
| **Suggestion** | Consider — improvements for clarity or completeness |
---
## Critical Findings
### C1. `DerivedKey`: `#[derive(Deserialize)]` contradicts "custom Deserialize rejects `[REDACTED]`"
**Documents**: crates/vault/protocol.md (L29, L68),
decisions/025-vault-local-only-dispatch.md (L137)
**Problem**: `protocol.md:29` shows `#[derive(Zeroize, Deserialize)]` on
`DerivedKey`. But `protocol.md:68` and ADR-025:137 both require a **custom**
`Deserialize` that explicitly rejects `private_key == "[REDACTED]"` with an
error. These are mutually exclusive: `#[derive(Deserialize)]` generates a
default impl, and you cannot also write a manual `impl Deserialize` (conflicting
impls). With the derived impl, a redacted string would fail only incidentally
(serde type mismatch: string vs sequence), producing a generic error — not the
explicit redaction-rejection the spec describes.
**Resolution**: Dropped `Deserialize` from the derive list in protocol.md and
showed a separate manual `impl<'de> Deserialize<'de> for DerivedKey` that
checks for `"[REDACTED]"` and returns a clear error.
---
### C2. `encrypt` prose says "derived at `PATHS::ENCRYPTION`" but signature takes `key_version`
**Documents**: crates/vault/service.md (L175),
decisions/021-key-rotation-via-version-indexed-paths.md (L115122)
**Problem**: `service.md:172` signature: `encrypt(&self, plaintext, key_version)`.
`service.md:175` prose: "Encrypt plaintext using the encryption key derived at
`PATHS::ENCRYPTION`." `PATHS::ENCRYPTION` is the v2 path only. ADR-021 makes
`encrypt` derive at `encryption_path_for_version(key_version)`. An implementer
following the prose stamps `key_version: 3` but uses the v2 key — a rotation bug
producing mislabeled blobs that decrypt fine locally but are cryptographically
mislabeled.
**Resolution**: Updated service.md:175 to "Encrypt plaintext using the
encryption key derived at `encryption_path_for_version(key_version)` (ADR-021);
the same `key_version` is stamped on the resulting `EncryptedData`."
---
### C3. `derive_encryption_key(path)` returns `DerivedKey` but `derive_encryption_key_for_version(version)` returns `EncryptionKey`
**Documents**: crates/vault/service.md (L133, L143),
decisions/021-... (L104122)
**Problem**: Same cache, different return types. No conversion between them is
specified, and `CachedKey` (the cache value type) is never defined. An
implementer cannot reconcile the cache.
**Resolution**: Made `derive_encryption_key_for_version` return `DerivedKey`
(consistent with `derive_encryption_key`), and `encrypt`/`decrypt` extract the
`EncryptionKey` from the cached `DerivedKey` via `EncryptionKey::from_derived_bytes`.
Defined `CachedKey` as a `#[derive(Zeroize, ZeroizeOnDrop)]` struct wrapping a
`DerivedKey` with cache metadata.
---
### C4. `tokio` vs `std::sync::RwLock` contradiction
**Documents**: decisions/018 (L69, L155156), decisions/025 (L166171),
crates/vault/service.md (L2733)
**Problem**: ADR-018:155 says "`std::sync::RwLock` works"; ADR-018:69 and
ADR-025:166 list `tokio` as a retained dep "for RwLock sync primitives";
service.md shows `RwLock` without specifying which. All vault methods are sync
(no `async`), implying `std::sync::RwLock` — which means tokio isn't needed,
contradicting both ADR dependency lists. ADR-018:155 still references the
removed actor's `tokio::sync::mpsc`.
**Resolution**: Specified `std::sync::RwLock` in service.md. Dropped `tokio`
from retained-deps lists in ADR-018 and ADR-025. Removed the stale actor/mpsc
sentence in ADR-018.
---
### C5. Missing drift rows in vault README drift table
**Documents**: crates/vault/README.md (L121130),
decisions/021-... (L126128), crates/vault/encryption.md (L185188)
**Problem**: The README drift table (the stated "single source of truth") omits
two drift items: (a) `decrypt`/`encrypt` ignore `key_version` and always derive
at `PATHS::ENCRYPTION`; (b) `rotate` is not implemented. An implementer using
only the table would miss these.
**Resolution**: Added drift rows #9 (key_version ignored) and #10 (rotate not
implemented) to the README table.
---
### C6. ADR-022's `build_root_context` and `invoke()` omit `abort_policy`
**Documents**: decisions/022 (L344358, L382394),
crates/call/call-protocol.md (L296318),
crates/call/operation-registry.md (L309336)
**Problem**: ADR-022's sketches have 9 fields; the spec docs have 10 (including
`abort_policy`). An implementer copying ADR-022 verbatim won't compile against
the `OperationContext` struct.
**Resolution**: Added `abort_policy: AbortPolicy::default(),` to ADR-022's
`build_root_context` and `abort_policy: parent.abort_policy.clone(),` to its
`invoke()` sketch. Restated the `env` type as `Arc<dyn OperationEnv + Send +
Sync>` in both sketches.
---
### C7. `Capabilities` type referenced widely but defined nowhere
**Documents**: overview.md (L168), crates/call/operation-registry.md (L116,
L206, L455, L473, L507513), crates/call/call-protocol.md (L305)
**Problem**: `Capabilities` appears in `OperationContext`, `HandlerRegistration`,
`build_root_context`, and the capability injection section — but has no struct
definition, no trait bounds, no builder API, no module location anywhere. The
spec asserts `Clone`, "non-serializable", "zeroized", "immutable after
construction" in prose only. The core and call implementers will produce
incompatible types.
**Resolution**: Added a `Capabilities` struct definition to core-types.md with
`Clone + Send + Sync`, `Zeroize`/`ZeroizeOnDrop`, a sealed builder API
(`new()`, `with_api_key()`, `with_http_token()`), private fields (immutability
guard), and no `Serialize` derive. Cross-referenced from operation-registry.md.
---
### C8. `SessionOverlaySource` referenced on `CallAdapter` but never defined; crate violation
**Documents**: crates/call/call-protocol.md (L41),
decisions/024-... (L447456)
**Problem**: `CallAdapter` (alknet-call) has a field of type
`Arc<dyn SessionOverlaySource>`, but ADR-024 says the trait is "an agent-crate
concern." That's a crate-graph violation: alknet-call can't hold a field whose
type is defined in alknet-agent (agent depends on call, not reverse).
**Resolution**: Defined `SessionOverlaySource` as a trait in alknet-call
(alongside `OperationEnv`), since the `CallAdapter` must name it. alknet-agent
implements it; alknet-call defines the integration point. This matches the
`IdentityProvider` pattern (ADR-004: core defines the trait, handlers implement
it).
---
### C9. `CompositeOperationEnv` dispatch fall-through semantics unspecified
**Documents**: crates/call/operation-registry.md (L348387)
**Problem**: The `OperationEnv::invoke` signature returns `ResponseEnvelope`
no "not in this overlay" sentinel. The sketch says "returns a sentinel or the
composite continues" and "the exact mechanism is a two-way door" — but this
affects cross-impl interop. Two implementers picking different mechanisms
(sentinel vs `contains` check) will produce `OperationEnv` impls that don't
compose.
**Resolution**: Added a `contains(&self, name: &str) -> bool` method to the
`OperationEnv` trait (with a default impl that returns `true` for backward
compat). The composite env probes `contains()` before dispatching, eliminating
the sentinel ambiguity. `CompositeOperationEnv::invoke` now checks
`session.contains()``connection.contains()` → base, dispatching to the first
overlay that contains the op.
---
### C10. No API for Layer 2 (connection overlay) registration; `CallConnection` undefined
**Documents**: crates/call/operation-registry.md (L214232, L456480),
decisions/024-... (L252263)
**Problem**: `OperationRegistryBuilder` only builds Layer 0. `from_call` imports
go into the connection's overlay at runtime, but there's no registration API, no
`CallConnection` struct definition, and no spec for how the overlay becomes part
of `CompositeOperationEnv.connection`.
**Resolution**: Added a `CallConnection` struct definition (overlay-related
fields) and a `register_imported(registration)` / `imported_operations` API to
call-protocol.md. Clarified that `OperationRegistryBuilder` is Layer-0-only;
runtime overlay registration uses `CallConnection` methods.
---
### C11. `with_local` signature diverges between two examples in operation-registry.md
**Documents**: crates/call/operation-registry.md (L217230, L458478)
**Problem**: The first example shows `with_local(spec, handler, authority,
scoped_env)` — 4 args, no capabilities. The second uses
`.with(HandlerRegistration { ... capabilities })`. `HandlerRegistration.capabilities`
is required (not `Option`). The first example silently drops capabilities for
the agent handler.
**Resolution**: Added `capabilities` as the 5th arg to `with_local` in the first
example, making both examples consistent. Documented the `with_local` and
`with_leaf` signatures explicitly.
---
## Warning Findings
### W1. `invoke_with_policy` has no default impl and can't delegate to `invoke()`
**Documents**: crates/call/operation-registry.md (L265273, L340342)
**Problem**: `invoke_with_policy` is a required trait method with no default
impl. `invoke()` hardcodes `abort_policy: parent.abort_policy.clone()`, so
`invoke_with_policy` can't delegate to it — every `OperationEnv` impl must
duplicate the full body.
**Resolution**: Restructured the trait: `invoke_with_policy` is the required
method; `invoke` has a default impl that calls `invoke_with_policy` with
`parent.abort_policy.clone()`. Impls only write `invoke_with_policy`. Added the
`CompositeOperationEnv::invoke_with_policy` body.
---
### W2. `CachedKey` referenced 7+ times but never defined
**Documents**: crates/vault/service.md (L216, L231, L330336, L351352),
crates/vault/protocol.md (L114), crates/vault/README.md (L99, L128)
**Resolution**: Added a `CachedKey` struct definition to service.md's Cache
section, showing fields (`key: DerivedKey`, `cached_at: Instant`,
`last_accessed: Instant`) and `Zeroize`/`ZeroizeOnDrop` derives.
---
### W3. `EncryptionKey` constructor + derivation glue unspecified; missing from re-export
**Documents**: crates/vault/encryption.md (L6078),
crates/vault/service.md (L140155), crates/vault/README.md (L144)
**Resolution**: Added `from_derived_bytes` pseudocode wiring SLIP-0010 output to
`EncryptionKey`. Added `EncryptionKey` to README re-export list. Specified
trait derives (custom redacting `Debug`, no `Clone`, `Zeroize`/`ZeroizeOnDrop`).
---
### W4. `Secp256k1ExtendedPrivKey` never defined; `derive_ethereum_key` glue unspecified
**Documents**: crates/vault/service.md (L157165),
crates/vault/mnemonic-derivation.md (L152164)
**Resolution**: Added `Secp256k1ExtendedPrivKey` struct definition to
mnemonic-derivation.md (parallel to `ExtendedPrivKey`). Documented
`derive_ethereum_key``derive_secp256k1_path` → wrap into `DerivedKey` glue.
---
### W5. `encryption_path_for_version(v<2)` collides onto v2 path
**Documents**: decisions/021-... (L7884),
crates/vault/mnemonic-derivation.md (L203), crates/vault/service.md (L147148)
**Problem**: `saturating_sub(2)` maps v0/v1/v2 to the same path. v1 is TS PBKDF2
(undecryptable by design), but `derive_encryption_key_for_version(1)` silently
returns the v2 key.
**Resolution**: Added a guard: `encryption_path_for_version` and
`derive_encryption_key_for_version` return `VaultServiceError::InvalidPath` for
`version < 2`. Documented in mnemonic-derivation.md and service.md.
---
### W6. `ResponseEnvelope` → `EventEnvelope` conversion unspecified; payload schemas missing
**Documents**: crates/call/call-protocol.md (L8085, L331341)
**Resolution**: Added a "Wire Payload Schemas" section to call-protocol.md
showing the `payload` shape for `call.responded`, `call.completed`, and
`call.aborted`. Documented the `ResponseEnvelope``EventEnvelope` mapping.
---
### W7. Timeout location and composition propagation unspecified
**Documents**: crates/call/call-protocol.md (L352)
**Resolution**: Added a timeout section to call-protocol.md specifying: timeout
duration is on `CallAdapter` (constructor param); composed calls inherit the
parent's remaining deadline via a `deadline: Option<Instant>` on
`OperationContext`; composed calls that exceed the deadline are cancelled (future
dropped). Added a cross-reference from `OperationContext`.
---
### W8. `generate_request_id()` unspecified; wire ID vs internal ID relationship undefined
**Documents**: crates/call/operation-registry.md (L315, L167),
crates/call/call-protocol.md (L291)
**Resolution**: Added a "Request ID Generation" subsection to
operation-registry.md specifying UUID v4 generation for composed calls, that wire
`call.requested.id` becomes the root `request_id`, and that composed child IDs
are internal (appear in `PendingRequestMap` for abort indexing but are not sent
as `call.requested` unless the child is a `from_call` op).
---
### W9. `unlock_new` already-unlocked behavior unspecified
**Documents**: crates/vault/service.md (L7493)
**Resolution**: Added: "`unlock_new` returns `AlreadyUnlocked` if the vault is
already unlocked, matching `unlock`'s behavior."
---
### W10. `KeyType`'s `Serialize`/`Deserialize` justification stale post-ADR-025
**Documents**: crates/vault/protocol.md (L115)
**Resolution**: Removed the stale "part of the irpc protocol" justification.
`KeyType` derives `Serialize`/`Deserialize` for `EncryptedData` interop (future
use) and `Clone` because it's a non-secret tag; the derives are retained but the
justification is corrected.
---
### W11. `OperationProvenance` and `CompositionAuthority` defined only in ADR-022
**Documents**: crates/call/operation-registry.md (L202, L209, L468),
decisions/022-... (L116131, L165181)
**Resolution**: Added inline definitions of `OperationProvenance` and
`CompositionAuthority` to operation-registry.md, with "See ADR-022 for
rationale" cross-references.
---
### W12. `encrypt`/`decrypt` free functions vs `VaultServiceHandle` methods relationship unstated
**Documents**: crates/vault/encryption.md (L132146),
crates/vault/service.md (L169190)
**Resolution**: Added a note to encryption.md that the free `encrypt`/`decrypt`
functions are module-internal crypto helpers; `VaultServiceHandle::encrypt`/
`decrypt` are the public API and call through to them.
---
### W13. `rotate` placement in encryption.md (method signature in crypto doc)
**Documents**: crates/vault/encryption.md (L177179)
**Resolution**: Replaced the `rotate` signature block in encryption.md with a
prose pointer to service.md and ADR-021 (rotate is a `VaultServiceHandle`
method, not a free function).
---
### W14. `CallAdapter::new()` missing session_source param
**Documents**: crates/call/operation-registry.md (L480),
crates/call/call-protocol.md (L3542)
**Resolution**: Added `CallAdapter::new(registry, identity_provider)` (defaults
`session_source` to `None`) and `CallAdapter::with_session_source(source)`
builder method to call-protocol.md.
---
## Suggestions
### S1. `Seed: Clone` duplicates root of trust
**Documents**: crates/vault/mnemonic-derivation.md (L8488)
**Resolution**: Kept `Clone` on `Seed` (derivation functions take `&[u8]`, so
`Clone` is not strictly needed, but the `to_seed` path and cache rebuild may
benefit). Added a note that callers should prefer `&Seed` and avoid cloning.
---
### S2. `VaultServiceInner.unlocked: bool` redundant with `seed: Option<Seed>`
**Documents**: crates/vault/service.md (L3541)
**Resolution**: Added an invariant note: `unlocked` is `true` iff
`seed.is_some()`; `is_unlocked()` reads `unlocked` for cheap checks, but the
ground truth is `seed.is_some()`.
---
### S3. `ExtendedPrivKey`/`Mnemonic` accessor signatures incomplete
**Documents**: crates/vault/mnemonic-derivation.md (L5863, L138150)
**Resolution**: Added accessor signatures for `ExtendedPrivKey`
(`private_key()`, `public_key()`, `chain_code()`, `path()`).
---
### S4. `CURRENT_KEY_VERSION` module location unspecified
**Documents**: decisions/021-... (L159), crates/vault/encryption.md (L154)
**Resolution**: Added: `CURRENT_KEY_VERSION` lives in `encryption.rs` and is
re-exported from the crate root.
---
### S5. ADR-018 stale actor reference
**Documents**: decisions/018 (L155156)
**Resolution**: Removed the stale actor/mpsc sentence (addressed in C4).
---
### S6. `device_path`/`encryption_path_for_version`/`derive_path_from_seed` not in re-export list
**Documents**: crates/vault/README.md (L136154)
**Resolution**: Added a note that derivation helpers are accessible as
`alknet_vault::derivation::*`.
---
## Summary Statistics
| Severity | Count | Status |
|----------|-------|--------|
| Critical | 11 | All resolved (C1C11) |
| Warning | 14 | All resolved (W1W14) |
| Suggestion | 6 | All resolved (S1S6) |
## Recommended Resolution Order
All findings resolved. Resolution summary:
1. **Vault type/reconciliation pass** (C1C5, W2W5, W9W13, S1S4): Fixed
`DerivedKey` derive contradiction, `encrypt` prose, return-type divergence,
RwLock contradiction, drift table, `CachedKey`/`EncryptionKey`/
`Secp256k1ExtendedPrivKey` definitions, `encryption_path_for_version`
guard, `unlock_new` behavior, `KeyType` justification, free-vs-method
relationship, `rotate` placement.
2. **Core+call type/API pass** (C6C11, W1, W6W8, W14, S5S6): Fixed
ADR-022 stale sketches, defined `Capabilities`, `SessionOverlaySource`,
`CallConnection`, Layer 2 registration API, `CompositeOperationEnv`
dispatch, `with_local` signature, `invoke_with_policy` default impl,
payload schemas, timeout, request ID generation, `CallAdapter` constructor.
3. **Inline definitions** (W11): Added `OperationProvenance` and
`CompositionAuthority` inline in operation-registry.md.
Review #003 is complete. All 11 critical, 14 warning, and 6 suggestion findings
resolved.