Files

glm-5.2 2e34590522 docs(architecture): resolve review #003 — type/API surface completeness

Review #003 found 11 critical, 14 warning, and 6 suggestion findings
after reviews #001 (governance/security) and #002 (cross-document
consistency/two-way-door audit) were resolved. The theme: types and
APIs that were *referenced* but never *defined*, and stale ADR sketches
that didn't match the now-updated spec docs.

Critical fixes (11):

- C1: DerivedKey #[derive(Deserialize)] contradicted the custom
  Deserialize that rejects "[REDACTED]" — dropped the derive, added
  explicit manual Serialize/Deserialize impls (protocol.md).
- C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the
  signature takes key_version — updated to encryption_path_for_version
  (service.md).
- C3: derive_encryption_key returned DerivedKey, derive_encryption_key
  _for_version returned EncryptionKey (same cache) — unified on
  DerivedKey, defined CachedKey (service.md).
- C4: tokio vs std::sync::RwLock contradiction — specified
  std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025,
  service.md).
- C5: Missing drift rows in vault README — added #9 (key_version
  ignored) and #10 (rotate not implemented).
- C6: ADR-022 build_root_context and invoke() sketches omitted
  abort_policy (9 fields vs 10) — added the field to both sketches.
- C7: Capabilities type referenced 20+ times, never defined — added
  struct definition to core-types.md with Clone+Send+Sync, Zeroize,
  sealed builder API, immutability guard.
- C8: SessionOverlaySource on CallAdapter but never defined, crate
  violation (alknet-call can't depend on alknet-agent) — defined the
  trait in alknet-call (call-protocol.md), matching the IdentityProvider
  pattern.
- C9: CompositeOperationEnv dispatch fall-through was "a two-way door"
  — added contains() to OperationEnv trait, made the composite probe
  before dispatching, eliminating the sentinel ambiguity.
- C10: No API for Layer 2 (connection overlay) registration, CallConnection
  undefined — defined CallConnection struct + register_imported() API
  (call-protocol.md).
- C11: with_local signature diverged between two examples (4 args vs 5)
  — added capabilities as the 5th arg, made both examples consistent.

Warning fixes (14):

- W1: invoke_with_policy restructured as required method, invoke gets a
  default impl delegating to it — eliminates duplication across impls.
- W2: CachedKey defined (service.md).
- W3: EncryptionKey constructor/glue specified, added to re-export list.
- W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown.
- W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2).
- W6: Wire payload schemas for all event types + ResponseEnvelope →
  EventEnvelope conversion table (call-protocol.md).
- W7: Timeout section — deadline on OperationContext, composed calls
  inherit parent's deadline, CallAdapter::with_timeout().
- W8: Request ID generation spec — UUID v4 for composed calls, wire ID
  vs internal ID relationship for abort cascade.
- W9: unlock_new already-unlocked behavior specified (returns
  AlreadyUnlocked).
- W10: KeyType Serialize/Deserialize justification corrected (stale
  irpc reference removed).
- W11: OperationProvenance and CompositionAuthority defined inline in
  operation-registry.md (were only in ADR-022).
- W12: encrypt/decrypt free functions marked pub(crate), relationship
  to VaultServiceHandle methods stated.
- W13: rotate signature removed from encryption.md (it's a
  VaultServiceHandle method, not a free function).
- W14: CallAdapter::new() + with_session_source() + with_timeout()
  constructors shown.

Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant,
ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018
stale actor text, derivation helpers re-export note.

2026-06-23 10:56:05 +00:00

18 KiB

Raw Blame History

ADR-025: Vault Local-Only Dispatch

Status

Accepted

Context

alknet-vault uses irpc for its internal dispatch. The VaultProtocol enum is annotated with #[rpc_requests(message = VaultMessage, no_spans)], which generates a Service trait impl (for in-process mpsc dispatch) and a RemoteService trait impl (for remote QUIC dispatch). The vault's VaultServiceActor processes VaultMessage variants from an mpsc channel. This was adopted from irpc's actor pattern (ADR-005).

What irpc gives the vault

Separating irpc into its constituent parts and asking which the vault actually needs:

irpc component	What it does	Does the vault need it?
`#[rpc_requests]` macro	Generates message enum, `Channels` impls, `From` conversions	Marginally — it's convenient boilerplate, but the vault's protocol is 8 variants
`Service` trait	Local in-process dispatch via mpsc + oneshot	No — `VaultServiceHandle` direct calls are already preferred (service.md: "For local in-process use, prefer `VaultServiceHandle` directly — no channel, no serialization")
`RemoteService` trait	Remote dispatch via QUIC + postcard	No — this is the footgun
`Client<S>`	Wraps either local mpsc or remote QUIC	No — the assembly layer uses the handle directly
`IrohProtocol` handler	Forwards all messages without auth	No — this is the default-insecure handler
postcard serialization	Binary serialization for remote dispatch	No — not needed without remote dispatch
`DerivedKey` dual serialization	JSON redacts, postcard preserves	Only needed because remote dispatch exists

The vault uses irpc for the actor pattern (in-process mpsc dispatch), but the actor pattern is the secondary dispatch path. The primary path — direct method calls on VaultServiceHandle — doesn't use irpc at all. And the thing that makes irpc attractive for the actor pattern (the macro-generated boilerplate) is a convenience, not a structural need. The vault's protocol is small enough that the boilerplate is manageable by hand, or simply unnecessary when the actor is removed.

The security problem: default-insecure

The core problem is not that remote vault access is possible in principle — it's that irpc makes it possible by default, with the unsafe path being the easy path.

The #[rpc_requests] macro generates RemoteService unless you pass no_rpc. The IrohProtocol handler forwards all message types without auth checks. The docs frame "register an ALPN" as a server-setup change (OQ-21: "Enabling remote access is a server-setup change"). The result is an architecture where:

The vault is remote-capable by construction (the footgun is loaded).
Enabling remote access is easy — one line: Router::builder(endpoint) .accept(b"alknet/vault", protocol).spawn().
The default handler has no auth (the safety is off).
Making it safe requires an auth-wrapping handler outside the vault crate (the safety is a separate part you have to remember to install).

This is the default-insecure anti-pattern. Security should be opt-in, not opt-out. The vault should be local-only by default, and remote access should require adding something, not removing a default.

The use cases don't justify the default

Single node, local vault (the designed path): The CLI binary unlocks the vault at startup, derives/decrypts credentials, injects them into handler capabilities. The vault is accessed only at the assembly layer (ADR-019). No network. This is the path every deployment starts with, and it needs only direct in-process method calls on VaultServiceHandle. irpc adds nothing.

Many nodes encrypt/decrypt the same data: The most likely network-vault use case, but a stretch. The better pattern is per-node vaults: the head encrypts credentials for the worker using the worker's public key or a shared derivation path the worker can derive locally. The worker decrypts locally. This is end-to-end encryption between nodes, not a centralized decryption oracle. It matches ADR-008's "capability source" model — credentials are injected at the assembly layer, not fetched over the network at call time.

Machine node → workers (OQ-21's use case): A long-lived machine node holds the mnemonic and exposes a restricted vault API to ephemeral workers. This is the use case the vault docs actually spec. But from_call's trust model already flags the risk: "a compromised remote node can do anything its operations are declared to do" (operation-registry.md). If the machine node is compromised, every worker that calls it is compromised. That's inherent to remote vault access and not a reason to forbid it, but it is a reason to make the exposure a deliberate, hard-to-accidentally-enable act — not the default state of the crate.

None of these use cases justify making the vault remote-capable by construction. The first needs no remote. The second has a better pattern (per-node vaults). The third is real but should be an explicit addition, not a default that's already loaded.

The actor path is dead code

service.md says "For local in-process use, prefer VaultServiceHandle directly — no channel, no serialization." The actor exists for irpc, and the direct path is preferred. So the vault has two dispatch paths, and the one irpc provides (actor) is the secondary one. The primary path (direct method calls) doesn't use irpc at all. The actor is dead code for the designed use case — it exists only to make irpc's Service trait work, which exists only to make RemoteService work, which is the footgun.

Decision

1. alknet-vault drops irpc entirely

The vault's dispatch is direct method calls on VaultServiceHandle. No VaultProtocol enum, no VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no RemoteService trait, no Client<S>, no IrohProtocol handler, no postcard serialization.

The vault's public API is VaultServiceHandle (and the types it returns: DerivedKey, KeyType, EncryptedData, EncryptionKey). That's it. An implementer reading the vault crate sees one way to use it, not two ways with a note saying "prefer the first."

2. The vault is local-only by construction

The vault crate has no remote dispatch capability. There is no RemoteService trait, no remote handler, no wire format for vault messages. Enabling remote vault access is not a flag flip or a server-setup change — it requires building a separate crate that depends on both alknet-core (for auth) and alknet-vault (for the handle) and adds the remote transport

auth-wrapping handler. That is a visible architectural act that shows up in code review, not a runtime config flip on a macro that was already generating the remote code.

This inverts the security default: local-only is the only mode. Remote access requires adding something, not removing a default.

3. `DerivedKey` serialization simplifies

Without the postcard/remote-dispatch path, DerivedKey's custom Serialize always redacts the private key (for logging safety) — there is no "postcard preserves bytes" path. The custom Deserialize rejects private_key == "[REDACTED]" with an error rather than producing a corrupted key (this resolves review #002 finding W8).

The redaction is purely for defense-in-depth against logging accidents. The architectural control — DerivedKey never appears in call protocol payloads (ADR-014) — is unchanged and remains the primary control. The serialization redaction is the safety net, not the primary mechanism.

VaultServiceError no longer needs Serialize/Deserialize (which it had for irpc dispatch). It can be a plain thiserror::Error enum. If a future remote-vault crate needs to serialize errors across the wire, that crate defines the wire representation.

4. If remote vault access is ever needed, it's a separate crate

The vault-server-crate question (review #002 C7) is decided: if remote vault access is ever needed, it is a separate crate that depends on both alknet-core (for IdentityProvider, scopes, auth-wrapping) and alknet-vault (for VaultServiceHandle). The vault crate itself remains local-only. This is a decision not to create the crate now, and not to preclude it. It is the path of least commitment, and it matches ADR-018's standalone-vault principle.

The remote vault crate would need its own ADR (matching ADR-019's language: "requires its own ADR") defining the threat model, the access policy, the auth-wrapping handler, and the operation filtering (Unlock/Lock local-only).

5. The vault's dependency footprint shrinks

The vault drops: irpc, irpc-derive, postcard (for remote), noq (via irpc), iroh (via irpc-iroh), and tokio (the actor's tokio::sync::mpsc channels are gone; all vault methods are synchronous and use std::sync::RwLock for thread safety). It retains: bip39, ed25519-bip32, aes-gcm, sha2, hmac, secp256k1 (feature-gated), serde (for DerivedKey redaction and EncryptedData wire format), zeroize, thiserror, base64, rand.

ADR-018's "zero alknet crate dependencies" becomes "zero alknet crate dependencies and zero RPC framework dependencies." This is the cleanest version of ADR-018's intent.

Consequences

Positive:

The security default is inverted. Local-only is the only mode. Remote access requires building a separate crate — a visible, deliberate act. This matches the principle that security should be opt-in, not opt-out.
The vault's API is honest. VaultServiceHandle is the API. No secondary dispatch path that exists for a feature (remote) that isn't enabled. An implementer sees one way to use the vault, not two with a note saying "prefer the first."
Dead code is removed. The actor path, which service.md says is secondary to direct calls, is gone. The VaultProtocol enum, VaultMessage, VaultServiceActor, and the mpsc dispatch loop are gone. The vault is a pure library with a thread-safe handle.
DerivedKey serialization simplifies. The dual serialization (JSON redacts, postcard preserves) is replaced by always-redact-on-serialize, reject-on-deserialize. No "postcard preserves bytes" path to test or document. This resolves review #002 W8 (silent corruption on JSON-deserialized DerivedKey) — the custom Deserialize rejects redacted payloads with an error.
The dependency footprint shrinks. No irpc, no postcard-for-remote, no noq, no iroh via irpc. The vault is truly standalone (ADR-018's intent, strengthened). Supply-chain surface is reduced.
The vault's concurrency model is honest. VaultServiceHandle is Arc<RwLock<...>> — the RwLock provides concurrent reads (derive) and exclusive writes (unlock/lock). The actor's sequential processing was actually worse for throughput than the RwLock. Removing the actor makes the concurrency model visible and correct.
derive_password and site_password_path are removed from the vault's API and path model. The password-manager pattern (deterministic per-site passwords from HD derivation) is not relevant to an RPC system's vault — handlers call APIs (using API keys, OAuth tokens, mTLS), not websites with passwords. The vault is for cryptographic key derivation and credential encryption. This resolves review #002 C9 (site_password_path hash mapping underspecified) by removing the feature rather than specifying the non-standard string→u32 mapping and Ed25519-as-password- entropy construction. If deterministic password generation is ever needed (browser-automation edge case), it can be re-added or implemented as a separate concern — the cost is near-zero, and removing it now eliminates permanent API surface that was inherited from a prior project's password-manager pattern.

Negative:

The vault's VaultProtocol enum and VaultServiceActor are removed. This is a breaking change to the vault crate's public API (VaultProtocol, VaultMessage, VaultServiceActor, Client<VaultProtocol> are removed from the public exports). Since no implementation consumer exists outside the vault crate itself (ADR-019: the assembly layer uses VaultServiceHandle directly), this is a spec edit, not a migration.
If a future use case needs the actor pattern (e.g., for a remote-vault crate that wants in-process mpsc dispatch before forwarding over the wire), it must be re-added in that crate, not in the vault. This is additive — the vault's direct-handle API is unchanged.
The DerivedKey postcard round-trip tests in protocol.rs are removed. The JSON-redaction tests remain. If a future remote-vault crate needs postcard serialization, it defines and tests its own serialization path for the types it sends over the wire.
VaultServiceError loses Serialize/Deserialize. Any code that serialized vault errors (only the irpc dispatch path, which is removed) must adapt. The assembly layer converts vault errors to alknet-core errors at the boundary (ADR-018), and that conversion is string-based already.

On review #002 findings resolved by this ADR:

C7 (OQ-21 remote vault): resolved. OQ-21 moves from "deferred" to "resolved: remote vault access is not a feature of the vault crate; if needed, a separate vault-server crate wraps the vault and adds remote transport + auth, requiring its own ADR." The vault-server-crate question is decided: not created now, not precluded. The crate-decomposition one-way door (ADR-003 territory) is decided by not creating the crate now.
W8 (DerivedKey JSON deserialization silently corrupts): resolved. Without the postcard path, the custom Deserialize rejects private_key == "[REDACTED]" with an error. There is no "postcard preserves bytes" path to complicate the serialization story. The redaction is purely for logging safety; deserialization of a redacted payload is always an error.
C8 (operation access policy table incomplete): dissolved. Without VaultProtocol's remote capability, there is no operation access policy table to complete — all operations are local-only by default. The table in protocol.md goes away. If a future vault-server crate exposes some operations remotely, that crate defines the access policy in its own ADR.
C9 (site_password_path hash mapping underspecified): resolved. The derive_password / derive_password_string / site_password_path methods are removed from the vault's API. The password-manager pattern is not relevant to an RPC system's vault. No hash mapping to specify, no Ed25519-as-password-entropy question to answer.

Assumptions

The vault's designed use case is local-only. ADR-019 says the assembly layer is the sole direct caller. ADR-008 says the vault is a capability source accessed at assembly time. ADR-014 says handlers receive credentials through OperationContext.capabilities, not by calling vault operations. The vault was always designed to be local — irpc's remote capability was an accident of adoption, not a designed feature.
Per-node vaults are the right pattern for multi-node deployments. Each node has its own vault and mnemonic. Credentials are encrypted for the receiving node's public key, not decrypted centrally. This is end-to-end encryption, not a centralized decryption oracle. If this assumption is wrong (a use case truly requires centralized vault access), a remote-vault crate is the answer — not making the vault remote-capable by default.
The actor pattern's sequential processing is not needed. VaultServiceHandle's Arc<RwLock<...>> provides concurrent reads (derive operations) and exclusive writes (unlock/lock). The actor's sequential processing was a constraint, not a feature — it serialized all operations including independent reads. The RwLock is the better concurrency model for this workload.
The vault's protocol is small enough that macro-generated boilerplate is not a maintenance burden. With 8 operations, the VaultServiceHandle method signatures are the protocol. There is no need for a separate protocol enum when the handle's methods are the API. If the vault grew to dozens of operations (unlikely given its scope), a protocol enum could be re-introduced — but it would be a local enum, not an irpc-generated one.
DerivedKey never needs to cross a wire format that preserves private key bytes. The architectural control (ADR-014: DerivedKey never appears in call protocol payloads) means DerivedKey is always used in-process. The redacting Serialize impl is for logging safety (defense-in-depth), not for wire transport. If a future remote-vault crate needs to send DerivedKey over the wire, it defines its own serialization for that context — the vault's DerivedKey stays redact-always.

References

ADR-005: irpc as call protocol foundation (this ADR amends the vault reference in ADR-005's Decision and Consequences; irpc remains the foundation for alknet-call, just not for alknet-vault)
ADR-008: Vault integration point (the vault is a capability source accessed at assembly time — this ADR makes that the only mode)
ADR-014: Secret material flow and capability injection (DerivedKey never appears in call protocol payloads — the redacting Serialize is defense-in-depth for logging, not for wire transport)
ADR-018: Vault as standalone crate (this ADR strengthens the standalone principle: zero alknet crate dependencies and zero RPC framework dependencies)
ADR-019: Vault assembly-layer-only access (this ADR makes the vault local-only, not just assembly-layer-only-for-direct-calls)
OQ-21: Remote vault administration (resolved by this ADR — not a vault crate feature; if needed, a separate crate with its own ADR)
docs/reviews/002-pre-implementation-architecture-sanity-check.md (findings C7, C8, W8 — resolved or dissolved by this ADR)
irpc design patterns: docs/research/references/iroh/irpc/09-design-patterns-and-examples.md (Pattern 3: no_rpc flag — this ADR goes further by dropping irpc entirely, since the actor pattern is also unnecessary)

18 KiB Raw Blame History