docs(architecture): add ADR-025 — vault local-only dispatch, drop irpc

Drops irpc from alknet-vault entirely. The vault's dispatch is now direct method calls on VaultServiceHandle — no VaultProtocol enum, no VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no RemoteService trait, no postcard serialization. The vault is local-only by construction. The core security argument: irpc made the vault remote-capable by default (RemoteService generated unless no_rpc is passed). The IrohProtocol handler forwards all messages without auth. The docs framed 'register an ALPN' as a server-setup change. This is the default-insecure anti-pattern — security should be opt-in, not opt-out. ADR-025 inverts the default: local-only is the only mode, and remote access requires building a separate vault-server crate (a visible architectural act, not a flag flip). The actor path was already dead code — service.md said 'prefer VaultServiceHandle directly — no channel, no serialization.' The actor existed only to make irpc's Service trait work, which existed only to make RemoteService work, which was the footgun. VaultServiceHandle's Arc<RwLock> provides concurrent reads and exclusive writes — better throughput than the actor's sequential processing. DerivedKey serialization simplifies: always redact on serialize (for logging safety), reject '[REDACTED]' on deserialize with an error. No 'postcard preserves bytes' path. This resolves review #002 W8 (silent corruption on JSON-deserialized DerivedKey). Resolves: - OQ-21: remote vault access — resolved (not deferred). Not a vault crate feature; if needed, a separate vault-server crate with its own ADR. - C7: vault-server-crate question decided — not created now, not precluded. - C8: operation access policy table dissolved — all operations local-only by default; if a vault-server crate exposes some remotely, that crate defines the policy. - W8: DerivedKey JSON deserialization — resolved (reject redacted payloads). Amends ADR-005 (irpc remains for alknet-call, not for alknet-vault), ADR-018 (vault is even more standalone — zero RPC framework deps), ADR-019 (vault is the only layer, not just the only direct-caller layer), ADR-008 (vault integration point unchanged, but now local-only by construction).
2026-06-22 14:53:52 +00:00
parent cdf340bec7
commit 7dda6eec68
13 changed files with 527 additions and 368 deletions
--- a/docs/architecture/decisions/025-vault-local-only-dispatch.md
+++ b/docs/architecture/decisions/025-vault-local-only-dispatch.md
@@ -0,0 +1,314 @@
+# ADR-025: Vault Local-Only Dispatch
+
+## Status
+
+Accepted
+
+## Context
+
+alknet-vault uses irpc for its internal dispatch. The `VaultProtocol` enum is
+annotated with `#[rpc_requests(message = VaultMessage, no_spans)]`, which
+generates a `Service` trait impl (for in-process mpsc dispatch) and a
+`RemoteService` trait impl (for remote QUIC dispatch). The vault's
+`VaultServiceActor` processes `VaultMessage` variants from an mpsc channel.
+This was adopted from irpc's actor pattern (ADR-005).
+
+### What irpc gives the vault
+
+Separating irpc into its constituent parts and asking which the vault
+actually needs:
+
+| irpc component | What it does | Does the vault need it? |
+|---|---|---|
+| `#[rpc_requests]` macro | Generates message enum, `Channels` impls, `From` conversions | Marginally — it's convenient boilerplate, but the vault's protocol is 8 variants |
+| `Service` trait | Local in-process dispatch via mpsc + oneshot | No — `VaultServiceHandle` direct calls are already preferred (service.md: "For local in-process use, prefer `VaultServiceHandle` directly — no channel, no serialization") |
+| `RemoteService` trait | Remote dispatch via QUIC + postcard | No — this is the footgun |
+| `Client<S>` | Wraps either local mpsc or remote QUIC | No — the assembly layer uses the handle directly |
+| `IrohProtocol` handler | Forwards all messages without auth | No — this is the default-insecure handler |
+| postcard serialization | Binary serialization for remote dispatch | No — not needed without remote dispatch |
+| `DerivedKey` dual serialization | JSON redacts, postcard preserves | Only needed *because* remote dispatch exists |
+
+The vault uses irpc for the actor pattern (in-process mpsc dispatch), but
+the actor pattern is the *secondary* dispatch path. The primary path — direct
+method calls on `VaultServiceHandle` — doesn't use irpc at all. And the thing
+that makes irpc attractive for the actor pattern (the macro-generated
+boilerplate) is a convenience, not a structural need. The vault's protocol
+is small enough that the boilerplate is manageable by hand, or simply
+unnecessary when the actor is removed.
+
+### The security problem: default-insecure
+
+The core problem is not that remote vault access is *possible* in principle
+— it's that irpc makes it possible *by default*, with the unsafe path being
+the easy path.
+
+The `#[rpc_requests]` macro generates `RemoteService` unless you pass
+`no_rpc`. The `IrohProtocol` handler forwards all message types without auth
+checks. The docs frame "register an ALPN" as a server-setup change
+(OQ-21: "Enabling remote access is a server-setup change"). The result is
+an architecture where:
+
+1. The vault is remote-capable by construction (the footgun is loaded).
+2. Enabling remote access is easy — one line: `Router::builder(endpoint)
+   .accept(b"alknet/vault", protocol).spawn()`.
+3. The default handler has no auth (the safety is off).
+4. Making it safe requires an auth-wrapping handler *outside the vault
+   crate* (the safety is a separate part you have to remember to install).
+
+This is the **default-insecure anti-pattern**. Security should be opt-in, not
+opt-out. The vault should be local-only by default, and remote access should
+require *adding* something, not *removing* a default.
+
+### The use cases don't justify the default
+
+**Single node, local vault (the designed path):** The CLI binary unlocks the
+vault at startup, derives/decrypts credentials, injects them into handler
+capabilities. The vault is accessed only at the assembly layer (ADR-019). No
+network. This is the path every deployment starts with, and it needs only
+direct in-process method calls on `VaultServiceHandle`. irpc adds nothing.
+
+**Many nodes encrypt/decrypt the same data:** The most likely network-vault
+use case, but a stretch. The better pattern is per-node vaults: the head
+encrypts credentials *for* the worker using the worker's public key or a
+shared derivation path the worker can derive locally. The worker decrypts
+locally. This is end-to-end encryption between nodes, not a centralized
+decryption oracle. It matches ADR-008's "capability source" model —
+credentials are injected at the assembly layer, not fetched over the network
+at call time.
+
+**Machine node → workers (OQ-21's use case):** A long-lived machine node
+holds the mnemonic and exposes a restricted vault API to ephemeral workers.
+This is the use case the vault docs actually spec. But `from_call`'s trust
+model already flags the risk: "a compromised remote node can do anything its
+operations are declared to do" (operation-registry.md). If the machine node
+is compromised, every worker that calls it is compromised. That's inherent
+to remote vault access and not a reason to forbid it, but it *is* a reason
+to make the exposure a deliberate, hard-to-accidentally-enable act — not the
+default state of the crate.
+
+None of these use cases justify making the vault remote-capable *by
+construction*. The first needs no remote. The second has a better pattern
+(per-node vaults). The third is real but should be an explicit addition, not
+a default that's already loaded.
+
+### The actor path is dead code
+
+service.md says "For local in-process use, prefer `VaultServiceHandle`
+directly — no channel, no serialization." The actor exists *for* irpc, and
+the direct path is preferred. So the vault has two dispatch paths, and the
+one irpc provides (actor) is the secondary one. The primary path (direct
+method calls) doesn't use irpc at all. The actor is dead code for the
+designed use case — it exists only to make irpc's `Service` trait work,
+which exists only to make `RemoteService` work, which is the footgun.
+
+## Decision
+
+### 1. alknet-vault drops irpc entirely
+
+The vault's dispatch is direct method calls on `VaultServiceHandle`. No
+`VaultProtocol` enum, no `VaultMessage`, no `VaultServiceActor`, no mpsc
+channel, no `Service` trait, no `RemoteService` trait, no `Client<S>`, no
+`IrohProtocol` handler, no postcard serialization.
+
+The vault's public API is `VaultServiceHandle` (and the types it returns:
+`DerivedKey`, `KeyType`, `EncryptedData`, `EncryptionKey`). That's it. An
+implementer reading the vault crate sees one way to use it, not two ways
+with a note saying "prefer the first."
+
+### 2. The vault is local-only by construction
+
+The vault crate has no remote dispatch capability. There is no
+`RemoteService` trait, no remote handler, no wire format for vault messages.
+Enabling remote vault access is not a flag flip or a server-setup change —
+it requires *building a separate crate* that depends on both alknet-core
+(for auth) and alknet-vault (for the handle) and adds the remote transport
+ auth-wrapping handler. That is a visible architectural act that shows up
+in code review, not a runtime config flip on a macro that was already
+generating the remote code.
+
+This inverts the security default: local-only is the only mode. Remote
+access requires adding something, not removing a default.
+
+### 3. `DerivedKey` serialization simplifies
+
+Without the postcard/remote-dispatch path, `DerivedKey`'s custom
+`Serialize` always redacts the private key (for logging safety) — there is
+no "postcard preserves bytes" path. The custom `Deserialize` rejects
+`private_key == "[REDACTED]"` with an error rather than producing a
+corrupted key (this resolves review #002 finding W8).
+
+The redaction is purely for defense-in-depth against logging accidents.
+The architectural control — `DerivedKey` never appears in call protocol
+payloads (ADR-014) — is unchanged and remains the primary control. The
+serialization redaction is the safety net, not the primary mechanism.
+
+`VaultServiceError` no longer needs `Serialize`/`Deserialize` (which it had
+for irpc dispatch). It can be a plain `thiserror::Error` enum. If a future
+remote-vault crate needs to serialize errors across the wire, *that crate*
+defines the wire representation.
+
+### 4. If remote vault access is ever needed, it's a separate crate
+
+The vault-server-crate question (review #002 C7) is decided: *if* remote
+vault access is ever needed, it is a separate crate that depends on both
+alknet-core (for `IdentityProvider`, scopes, auth-wrapping) and
+alknet-vault (for `VaultServiceHandle`). The vault crate itself remains
+local-only. This is a decision not to create the crate now, and not to
+preclude it. It is the path of least commitment, and it matches ADR-018's
+standalone-vault principle.
+
+The remote vault crate would need its own ADR (matching ADR-019's language:
+"requires its own ADR") defining the threat model, the access policy, the
+auth-wrapping handler, and the operation filtering (Unlock/Lock local-only).
+
+### 5. The vault's dependency footprint shrinks
+
+The vault drops: `irpc`, `irpc-derive`, `postcard` (for remote), `noq`
+(via irpc), `iroh` (via irpc-iroh). It retains: `bip39`, `ed25519-bip32`,
+`aes-gcm`, `sha2`, `hmac`, `secp256k1` (feature-gated), `tokio` (for
+`RwLock` sync primitives, not for channels), `serde` (for `DerivedKey`
+redaction and `EncryptedData` wire format), `zeroize`, `thiserror`, `base64`,
+`rand`.
+
+ADR-018's "zero alknet crate dependencies" becomes "zero alknet crate
+dependencies and zero RPC framework dependencies." This is the cleanest
+version of ADR-018's intent.
+
+## Consequences
+
+**Positive:**
+
+- The security default is inverted. Local-only is the only mode. Remote
+  access requires building a separate crate — a visible, deliberate act.
+  This matches the principle that security should be opt-in, not opt-out.
+- The vault's API is honest. `VaultServiceHandle` is the API. No secondary
+  dispatch path that exists for a feature (remote) that isn't enabled. An
+  implementer sees one way to use the vault, not two with a note saying
+  "prefer the first."
+- Dead code is removed. The actor path, which service.md says is secondary
+  to direct calls, is gone. The `VaultProtocol` enum, `VaultMessage`,
+  `VaultServiceActor`, and the mpsc dispatch loop are gone. The vault is a
+  pure library with a thread-safe handle.
+- `DerivedKey` serialization simplifies. The dual serialization (JSON
+  redacts, postcard preserves) is replaced by always-redact-on-serialize,
+  reject-on-deserialize. No "postcard preserves bytes" path to test or
+  document. This resolves review #002 W8 (silent corruption on
+  JSON-deserialized `DerivedKey`) — the custom `Deserialize` rejects
+  redacted payloads with an error.
+- The dependency footprint shrinks. No irpc, no postcard-for-remote, no
+  noq, no iroh via irpc. The vault is truly standalone (ADR-018's intent,
+  strengthened). Supply-chain surface is reduced.
+- The vault's concurrency model is honest. `VaultServiceHandle` is
+  `Arc<RwLock<...>>` — the RwLock provides concurrent reads (derive) and
+  exclusive writes (unlock/lock). The actor's sequential processing was
+  actually *worse* for throughput than the RwLock. Removing the actor
+  makes the concurrency model visible and correct.
+
+**Negative:**
+
+- The vault's `VaultProtocol` enum and `VaultServiceActor` are removed.
+  This is a breaking change to the vault crate's public API (`VaultProtocol`,
+  `VaultMessage`, `VaultServiceActor`, `Client<VaultProtocol>` are removed
+  from the public exports). Since no implementation consumer exists outside
+  the vault crate itself (ADR-019: the assembly layer uses
+  `VaultServiceHandle` directly), this is a spec edit, not a migration.
+- If a future use case needs the actor pattern (e.g., for a remote-vault
+  crate that wants in-process mpsc dispatch before forwarding over the
+  wire), it must be re-added in *that crate*, not in the vault. This is
+  additive — the vault's direct-handle API is unchanged.
+- The `DerivedKey` postcard round-trip tests in `protocol.rs` are removed.
+  The JSON-redaction tests remain. If a future remote-vault crate needs
+  postcard serialization, it defines and tests its own serialization path
+  for the types it sends over the wire.
+- `VaultServiceError` loses `Serialize`/`Deserialize`. Any code that
+  serialized vault errors (only the irpc dispatch path, which is removed)
+  must adapt. The assembly layer converts vault errors to alknet-core
+  errors at the boundary (ADR-018), and that conversion is string-based
+  already.
+
+**On review #002 findings resolved by this ADR:**
+
+- **C7 (OQ-21 remote vault)**: resolved. OQ-21 moves from "deferred" to
+  "resolved: remote vault access is not a feature of the vault crate; if
+  needed, a separate vault-server crate wraps the vault and adds remote
+  transport + auth, requiring its own ADR." The vault-server-crate question
+  is decided: not created now, not precluded. The crate-decomposition
+  one-way door (ADR-003 territory) is decided by *not* creating the crate
+  now.
+- **W8 (`DerivedKey` JSON deserialization silently corrupts)**: resolved.
+  Without the postcard path, the custom `Deserialize` rejects
+  `private_key == "[REDACTED]"` with an error. There is no
+  "postcard preserves bytes" path to complicate the serialization story.
+  The redaction is purely for logging safety; deserialization of a redacted
+  payload is always an error.
+- **C8 (operation access policy table incomplete)**: dissolved. Without
+  `VaultProtocol`'s remote capability, there is no operation access policy
+  table to complete — all operations are local-only by default. The table
+  in protocol.md goes away. If a future vault-server crate exposes some
+  operations remotely, *that crate* defines the access policy in its own
+  ADR.
+
+## Assumptions
+
+1. **The vault's designed use case is local-only.** ADR-019 says the
+   assembly layer is the sole direct caller. ADR-008 says the vault is a
+   capability source accessed at assembly time. ADR-014 says handlers
+   receive credentials through `OperationContext.capabilities`, not by
+   calling vault operations. The vault was always designed to be local —
+   irpc's remote capability was an accident of adoption, not a designed
+   feature.
+
+2. **Per-node vaults are the right pattern for multi-node deployments.**
+   Each node has its own vault and mnemonic. Credentials are encrypted *for*
+   the receiving node's public key, not decrypted centrally. This is
+   end-to-end encryption, not a centralized decryption oracle. If this
+   assumption is wrong (a use case truly requires centralized vault
+   access), a remote-vault crate is the answer — not making the vault
+   remote-capable by default.
+
+3. **The actor pattern's sequential processing is not needed.**
+   `VaultServiceHandle`'s `Arc<RwLock<...>>` provides concurrent reads
+   (derive operations) and exclusive writes (unlock/lock). The actor's
+   sequential processing was a constraint, not a feature — it serialized
+   all operations including independent reads. The RwLock is the better
+   concurrency model for this workload.
+
+4. **The vault's protocol is small enough that macro-generated boilerplate
+   is not a maintenance burden.** With 8 operations, the
+   `VaultServiceHandle` method signatures *are* the protocol. There is no
+   need for a separate protocol enum when the handle's methods are the
+   API. If the vault grew to dozens of operations (unlikely given its
+   scope), a protocol enum could be re-introduced — but it would be a
+   local enum, not an irpc-generated one.
+
+5. **`DerivedKey` never needs to cross a wire format that preserves
+   private key bytes.** The architectural control (ADR-014:
+   `DerivedKey` never appears in call protocol payloads) means
+   `DerivedKey` is always used in-process. The redacting `Serialize` impl
+   is for logging safety (defense-in-depth), not for wire transport. If a
+   future remote-vault crate needs to send `DerivedKey` over the wire, it
+   defines its own serialization for that context — the vault's
+   `DerivedKey` stays redact-always.
+
+## References
+
+- ADR-005: irpc as call protocol foundation (this ADR amends the vault
+  reference in ADR-005's Decision and Consequences; irpc remains the
+  foundation for alknet-*call*, just not for alknet-*vault*)
+- ADR-008: Vault integration point (the vault is a capability source
+  accessed at assembly time — this ADR makes that the *only* mode)
+- ADR-014: Secret material flow and capability injection (`DerivedKey`
+  never appears in call protocol payloads — the redacting `Serialize`
+  is defense-in-depth for logging, not for wire transport)
+- ADR-018: Vault as standalone crate (this ADR strengthens the
+  standalone principle: zero alknet crate dependencies *and* zero RPC
+  framework dependencies)
+- ADR-019: Vault assembly-layer-only access (this ADR makes the vault
+  local-only, not just assembly-layer-only-for-direct-calls)
+- OQ-21: Remote vault administration (resolved by this ADR — not a vault
+  crate feature; if needed, a separate crate with its own ADR)
+- docs/reviews/002-pre-implementation-architecture-sanity-check.md
+  (findings C7, C8, W8 — resolved or dissolved by this ADR)
+- irpc design patterns: `docs/research/references/iroh/irpc/09-design-patterns-and-examples.md`
+  (Pattern 3: `no_rpc` flag — this ADR goes further by dropping irpc
+  entirely, since the actor pattern is also unnecessary)