docs(architecture): add ADR-025 — vault local-only dispatch, drop irpc

Drops irpc from alknet-vault entirely. The vault's dispatch is now direct
method calls on VaultServiceHandle — no VaultProtocol enum, no
VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no
RemoteService trait, no postcard serialization. The vault is local-only by
construction.

The core security argument: irpc made the vault remote-capable by default
(RemoteService generated unless no_rpc is passed). The IrohProtocol handler
forwards all messages without auth. The docs framed 'register an ALPN' as a
server-setup change. This is the default-insecure anti-pattern — security
should be opt-in, not opt-out. ADR-025 inverts the default: local-only is
the only mode, and remote access requires building a separate vault-server
crate (a visible architectural act, not a flag flip).

The actor path was already dead code — service.md said 'prefer
VaultServiceHandle directly — no channel, no serialization.' The actor
existed only to make irpc's Service trait work, which existed only to make
RemoteService work, which was the footgun. VaultServiceHandle's
Arc<RwLock> provides concurrent reads and exclusive writes — better
throughput than the actor's sequential processing.

DerivedKey serialization simplifies: always redact on serialize (for
logging safety), reject '[REDACTED]' on deserialize with an error. No
'postcard preserves bytes' path. This resolves review #002 W8 (silent
corruption on JSON-deserialized DerivedKey).

Resolves:
- OQ-21: remote vault access — resolved (not deferred). Not a vault crate
  feature; if needed, a separate vault-server crate with its own ADR.
- C7: vault-server-crate question decided — not created now, not precluded.
- C8: operation access policy table dissolved — all operations local-only
  by default; if a vault-server crate exposes some remotely, that crate
  defines the policy.
- W8: DerivedKey JSON deserialization — resolved (reject redacted payloads).

Amends ADR-005 (irpc remains for alknet-call, not for alknet-vault),
ADR-018 (vault is even more standalone — zero RPC framework deps),
ADR-019 (vault is the only layer, not just the only direct-caller layer),
ADR-008 (vault integration point unchanged, but now local-only by
construction).
2026-06-22 14:53:52 +00:00
parent cdf340bec7
commit 7dda6eec68
13 changed files with 527 additions and 368 deletions


@@ -0,0 +1,314 @@
# ADR-025: Vault Local-Only Dispatch
## Status
Accepted
## Context
alknet-vault uses irpc for its internal dispatch. The `VaultProtocol` enum is
annotated with `#[rpc_requests(message = VaultMessage, no_spans)]`, which
generates a `Service` trait impl (for in-process mpsc dispatch) and a
`RemoteService` trait impl (for remote QUIC dispatch). The vault's
`VaultServiceActor` processes `VaultMessage` variants from an mpsc channel.
This was adopted from irpc's actor pattern (ADR-005).
### What irpc gives the vault
Here is irpc broken into its constituent parts, with whether the vault
actually needs each one:

| irpc component | What it does | Does the vault need it? |
|---|---|---|
| `#[rpc_requests]` macro | Generates message enum, `Channels` impls, `From` conversions | Marginally: the boilerplate is convenient, but the vault's protocol has only 8 variants |
| `Service` trait | Local in-process dispatch via mpsc + oneshot | No — `VaultServiceHandle` direct calls are already preferred (service.md: "For local in-process use, prefer `VaultServiceHandle` directly — no channel, no serialization") |
| `RemoteService` trait | Remote dispatch via QUIC + postcard | No — this is the footgun |
| `Client<S>` | Wraps either local mpsc or remote QUIC | No — the assembly layer uses the handle directly |
| `IrohProtocol` handler | Forwards all messages without auth | No — this is the default-insecure handler |
| postcard serialization | Binary serialization for remote dispatch | No — not needed without remote dispatch |
| `DerivedKey` dual serialization | JSON redacts, postcard preserves | Only needed *because* remote dispatch exists |

The vault uses irpc for the actor pattern (in-process mpsc dispatch), but
the actor pattern is the *secondary* dispatch path. The primary path — direct
method calls on `VaultServiceHandle` — doesn't use irpc at all. And the thing
that makes irpc attractive for the actor pattern (the macro-generated
boilerplate) is a convenience, not a structural need. The vault's protocol
is small enough that the boilerplate is manageable by hand, or simply
unnecessary when the actor is removed.
### The security problem: default-insecure
The core problem is not that remote vault access is *possible* in principle
— it's that irpc makes it possible *by default*, with the unsafe path being
the easy path.

The `#[rpc_requests]` macro generates `RemoteService` unless you pass
`no_rpc`. The `IrohProtocol` handler forwards all message types without auth
checks. The docs frame "register an ALPN" as a server-setup change
(OQ-21: "Enabling remote access is a server-setup change"). The result is
an architecture where:
1. The vault is remote-capable by construction (the footgun is loaded).
2. Enabling remote access is easy, a single line:
`Router::builder(endpoint).accept(b"alknet/vault", protocol).spawn()`.
3. The default handler has no auth (the safety is off).
4. Making it safe requires an auth-wrapping handler *outside the vault
crate* (the safety is a separate part you have to remember to install).

This is the **default-insecure anti-pattern**. Security should be opt-in, not
opt-out. The vault should be local-only by default, and remote access should
require *adding* something, not *removing* a default.
### The use cases don't justify the default
**Single node, local vault (the designed path):** The CLI binary unlocks the
vault at startup, derives/decrypts credentials, injects them into handler
capabilities. The vault is accessed only at the assembly layer (ADR-019). No
network. This is the path every deployment starts with, and it needs only
direct in-process method calls on `VaultServiceHandle`. irpc adds nothing.

**Many nodes encrypt/decrypt the same data:** The most likely network-vault
use case, but a stretch. The better pattern is per-node vaults: the head
encrypts credentials *for* the worker using the worker's public key or a
shared derivation path the worker can derive locally. The worker decrypts
locally. This is end-to-end encryption between nodes, not a centralized
decryption oracle. It matches ADR-008's "capability source" model —
credentials are injected at the assembly layer, not fetched over the network
at call time.

**Machine node → workers (OQ-21's use case):** A long-lived machine node
holds the mnemonic and exposes a restricted vault API to ephemeral workers.
This is the use case the vault docs actually specify. But `from_call`'s trust
model already flags the risk: "a compromised remote node can do anything its
operations are declared to do" (operation-registry.md). If the machine node
is compromised, every worker that calls it is compromised. That's inherent
to remote vault access and not a reason to forbid it, but it *is* a reason
to make the exposure a deliberate, hard-to-accidentally-enable act — not the
default state of the crate.

None of these use cases justify making the vault remote-capable *by
construction*. The first needs no remote. The second has a better pattern
(per-node vaults). The third is real but should be an explicit addition, not
a default that's already loaded.
### The actor path is dead code
service.md says "For local in-process use, prefer `VaultServiceHandle`
directly — no channel, no serialization." The actor exists *for* irpc, and
the direct path is preferred. So the vault has two dispatch paths, and the
one irpc provides (actor) is the secondary one. The primary path (direct
method calls) doesn't use irpc at all. The actor is dead code for the
designed use case — it exists only to make irpc's `Service` trait work,
which exists only to make `RemoteService` work, which is the footgun.
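
For contrast, here is a dependency-free sketch of the two dispatch shapes. It uses `std::sync::mpsc` as a stand-in for irpc's mpsc and oneshot channels. `Request`, `spawn_actor`, and `Handle` are hypothetical names, not the vault's real types. The sketch only illustrates how much machinery the actor path needs: a message enum, a reply channel per request, and an actor loop. The direct path answers the same question with one method call.

```rust
use std::sync::mpsc;
use std::thread;

// Actor shape (the path being removed): each request carries its own reply
// channel, and one thread handles messages strictly one at a time.
// `Request` and `spawn_actor` are hypothetical names for illustration.
pub enum Request {
    IsUnlocked { reply: mpsc::Sender<bool> },
}

pub fn spawn_actor(unlocked: bool) -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // Sequential loop: every request waits for the previous one to finish.
        for req in rx {
            match req {
                Request::IsUnlocked { reply } => {
                    // Ignore send errors: the caller may have stopped waiting.
                    let _ = reply.send(unlocked);
                }
            }
        }
    });
    tx
}

// Direct shape (the path that stays): a plain method call on a handle.
pub struct Handle {
    unlocked: bool,
}

impl Handle {
    pub fn new(unlocked: bool) -> Self {
        Handle { unlocked }
    }

    pub fn is_unlocked(&self) -> bool {
        self.unlocked
    }
}

fn main() {
    // Actor path: create a reply channel, send the request, block on the reply.
    let actor = spawn_actor(true);
    let (reply_tx, reply_rx) = mpsc::channel();
    actor
        .send(Request::IsUnlocked { reply: reply_tx })
        .expect("actor thread is running");
    let via_actor = reply_rx.recv().expect("actor replied");

    // Direct path: one call, no channel, no message type.
    let via_handle = Handle::new(true).is_unlocked();

    assert_eq!(via_actor, via_handle);
}
```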
## Decision
### 1. alknet-vault drops irpc entirely
The vault's dispatch is direct method calls on `VaultServiceHandle`. No
`VaultProtocol` enum, no `VaultMessage`, no `VaultServiceActor`, no mpsc
channel, no `Service` trait, no `RemoteService` trait, no `Client<S>`, no
`IrohProtocol` handler, no postcard serialization.

The vault's public API is `VaultServiceHandle` (and the types it returns:
`DerivedKey`, `KeyType`, `EncryptedData`, `EncryptionKey`). That's it. An
implementer reading the vault crate sees one way to use it, not two ways
with a note saying "prefer the first."
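
Below is a minimal sketch of what "the handle is the API" could look like, under stated assumptions:

- It uses `std::sync::RwLock` so it compiles without dependencies. The crate itself uses tokio's `RwLock`, and its real methods are likely async.
- `VaultState` and the method names are hypothetical.
- The "derivation" is a non-cryptographic placeholder standing in for ed25519-bip32.

What the sketch does show is the shape. `unlock` and `lock` take the write lock, `derive` takes the read lock, and there is no message enum or channel anywhere.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, RwLock};

// Deliberately no `Debug` derive, so the key cannot be printed by accident.
#[derive(Clone, PartialEq, Eq)]
pub struct DerivedKey {
    pub path: String,
    pub private_key: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum VaultServiceError {
    Locked,
}

// Hypothetical internal state: `seed` stands in for the unlocked secret.
struct VaultState {
    seed: Option<Vec<u8>>,
}

// Cloning the handle clones the `Arc`, so all clones share one vault.
#[derive(Clone)]
pub struct VaultServiceHandle {
    inner: Arc<RwLock<VaultState>>,
}

impl VaultServiceHandle {
    pub fn new() -> Self {
        VaultServiceHandle {
            inner: Arc::new(RwLock::new(VaultState { seed: None })),
        }
    }

    // Exclusive write: no derive can run while the vault is being unlocked.
    pub fn unlock(&self, seed: &[u8]) {
        self.inner.write().expect("vault lock poisoned").seed = Some(seed.to_vec());
    }

    // Exclusive write: drops the secret.
    pub fn lock(&self) {
        self.inner.write().expect("vault lock poisoned").seed = None;
    }

    // Shared read: any number of derive calls can hold the lock together.
    pub fn derive(&self, path: &str) -> Result<DerivedKey, VaultServiceError> {
        let state = self.inner.read().expect("vault lock poisoned");
        let seed = state.seed.as_ref().ok_or(VaultServiceError::Locked)?;
        // PLACEHOLDER, not a KDF: mixes seed and path deterministically.
        // A real vault would run BIP32-style derivation here.
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        path.hash(&mut hasher);
        Ok(DerivedKey {
            path: path.to_string(),
            private_key: hasher.finish().to_le_bytes().to_vec(),
        })
    }
}

fn main() {
    let vault = VaultServiceHandle::new();
    assert_eq!(vault.derive("m/0").err(), Some(VaultServiceError::Locked));

    vault.unlock(b"demo seed");
    let a = vault.derive("m/0").expect("unlocked");
    let b = vault.clone().derive("m/0").expect("clone shares state");
    assert!(a == b);

    vault.lock();
    assert!(vault.derive("m/0").is_err());
}
```

Because the sketched handle is `Clone` over an `Arc`, the assembly layer can hand out copies without any channel in between.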
### 2. The vault is local-only by construction
The vault crate has no remote dispatch capability. There is no
`RemoteService` trait, no remote handler, no wire format for vault messages.
Enabling remote vault access is not a flag flip or a server-setup change —
it requires *building a separate crate* that depends on both alknet-core
(for auth) and alknet-vault (for the handle) and adds the remote transport
and an auth-wrapping handler. That is a visible architectural act that shows
up in code review, not a one-line router registration against remote code a
macro was already generating.

This inverts the security default: local-only is the only mode. Remote
access requires adding something, not removing a default.
### 3. `DerivedKey` serialization simplifies
Without the postcard/remote-dispatch path, `DerivedKey`'s custom
`Serialize` always redacts the private key (for logging safety) — there is
no "postcard preserves bytes" path. The custom `Deserialize` rejects
`private_key == "[REDACTED]"` with an error rather than producing a
corrupted key (this resolves review #002 finding W8).

The redaction is purely for defense-in-depth against logging accidents.
The architectural control, that `DerivedKey` never appears in call protocol
payloads (ADR-014), is unchanged and remains the primary control. The
serialization redaction is the safety net, not the primary mechanism.

`VaultServiceError` no longer needs `Serialize`/`Deserialize` (which it had
for irpc dispatch). It can be a plain `thiserror::Error` enum. If a future
remote-vault crate needs to serialize errors across the wire, *that crate*
defines the wire representation.
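
The serialization contract in this section can be sketched without serde so that it compiles with no dependencies. Several parts of the sketch are assumptions:

- `private_key_for_output` and `private_key_from_input` stand in for the bodies of the custom `Serialize` and `Deserialize` impls.
- The hex encoding of the key field is assumed; the ADR does not specify one.
- The error variant names are hypothetical.
- The hand-written `Display`/`Error` impls are what `#[derive(thiserror::Error)]` would generate.

In a real `Deserialize` impl, the rejection would be surfaced through `serde::de::Error::custom`, which accepts any `Display` value. The error enum implements no serialization trait.

```rust
use std::error::Error;
use std::fmt;

pub const REDACTED: &str = "[REDACTED]";

pub struct DerivedKey {
    pub path: String,
    pub private_key: Vec<u8>,
}

// Plain error enum: Debug + Display + Error, and no Serialize/Deserialize.
// Variant names are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum VaultServiceError {
    RedactedKey,
    MalformedKey,
}

// Equivalent to `#[error("...")]` attributes under `#[derive(thiserror::Error)]`.
impl fmt::Display for VaultServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VaultServiceError::RedactedKey => {
                write!(f, "refusing to deserialize a redacted private key")
            }
            VaultServiceError::MalformedKey => write!(f, "private key is not valid hex"),
        }
    }
}

impl Error for VaultServiceError {}

impl DerivedKey {
    // Serialize side: the private key field is always written as the marker.
    pub fn private_key_for_output(&self) -> &'static str {
        REDACTED
    }

    // Deserialize side: reject the marker instead of producing a corrupted key.
    pub fn private_key_from_input(field: &str) -> Result<Vec<u8>, VaultServiceError> {
        if field == REDACTED {
            return Err(VaultServiceError::RedactedKey);
        }
        decode_hex(field)
    }
}

// Debug uses the same redaction, so `{:?}` in a log line is safe as well.
impl fmt::Debug for DerivedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DerivedKey")
            .field("path", &self.path)
            .field("private_key", &REDACTED)
            .finish()
    }
}

// Assumed wire encoding for this sketch: hex, either case.
fn decode_hex(s: &str) -> Result<Vec<u8>, VaultServiceError> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(VaultServiceError::MalformedKey);
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|_| VaultServiceError::MalformedKey))
        .collect()
}

fn main() {
    let key = DerivedKey { path: "m/0".to_string(), private_key: vec![0xab, 0xcd] };

    // Logging never shows the bytes (171, 205 in decimal).
    let logged = format!("{key:?}");
    assert!(logged.contains(REDACTED) && !logged.contains("171"));
    assert_eq!(key.private_key_for_output(), REDACTED);

    // Deserializing the redacted form is an error, not a bogus key.
    assert_eq!(
        DerivedKey::private_key_from_input(REDACTED),
        Err(VaultServiceError::RedactedKey)
    );
    assert_eq!(DerivedKey::private_key_from_input("abcd"), Ok(vec![0xab, 0xcd]));
}
```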
### 4. If remote vault access is ever needed, it's a separate crate
The vault-server-crate question (review #002 C7) is decided: *if* remote
vault access is ever needed, it is a separate crate that depends on both
alknet-core (for `IdentityProvider`, scopes, auth-wrapping) and
alknet-vault (for `VaultServiceHandle`). The vault crate itself remains
local-only. This is a decision not to create the crate now, and not to
preclude it. It is the path of least commitment, and it matches ADR-018's
standalone-vault principle.

The remote vault crate would need its own ADR (matching ADR-019's language:
"requires its own ADR") defining the threat model, the access policy, the
auth-wrapping handler, and the operation filtering (Unlock/Lock local-only).
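
To make "operation filtering" concrete, here is a sketch of the kind of access check such a crate's ADR would pin down. Everything in it is hypothetical. The crate does not exist, and `RemoteOp`, `Caller`, `authorize`, and the `vault:*` scope strings are invented for illustration. The operation list is an illustrative subset, not the vault's actual 8 operations. The property worth keeping is the exhaustive `match` with no wildcard: adding an operation fails to compile until someone decides its remote policy, so exposure stays a deliberate act.

```rust
// Would live in a hypothetical separate vault-server crate, never in alknet-vault.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoteOp {
    Unlock,
    Lock,
    Derive,
    Decrypt,
}

// Caller identity as established by the server crate's own auth step.
pub struct Caller {
    pub scopes: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Denied {
    LocalOnly,
    MissingScope(&'static str),
}

pub fn authorize(caller: &Caller, op: RemoteOp) -> Result<(), Denied> {
    // Exhaustive match, no wildcard arm: a new variant is a compile error here.
    let required = match op {
        // Unlock/Lock stay local-only regardless of the caller's scopes.
        RemoteOp::Unlock | RemoteOp::Lock => return Err(Denied::LocalOnly),
        RemoteOp::Derive => "vault:derive",
        RemoteOp::Decrypt => "vault:decrypt",
    };
    if caller.scopes.iter().any(|s| s == required) {
        Ok(())
    } else {
        Err(Denied::MissingScope(required))
    }
}

fn main() {
    let worker = Caller { scopes: vec!["vault:derive".to_string()] };
    assert_eq!(authorize(&worker, RemoteOp::Derive), Ok(()));
    assert_eq!(
        authorize(&worker, RemoteOp::Decrypt),
        Err(Denied::MissingScope("vault:decrypt"))
    );
    assert_eq!(authorize(&worker, RemoteOp::Unlock), Err(Denied::LocalOnly));
}
```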
### 5. The vault's dependency footprint shrinks
The vault drops: `irpc`, `irpc-derive`, `postcard` (for remote), `noq`
(via irpc), `iroh` (via irpc-iroh). It retains: `bip39`, `ed25519-bip32`,
`aes-gcm`, `sha2`, `hmac`, `secp256k1` (feature-gated), `tokio` (for
`RwLock` sync primitives, not for channels), `serde` (for `DerivedKey`
redaction and `EncryptedData` wire format), `zeroize`, `thiserror`, `base64`,
`rand`.

ADR-018's "zero alknet crate dependencies" becomes "zero alknet crate
dependencies and zero RPC framework dependencies." This is the cleanest
version of ADR-018's intent.
## Consequences
**Positive:**
- The security default is inverted. Local-only is the only mode. Remote
access requires building a separate crate — a visible, deliberate act.
This matches the principle that security should be opt-in, not opt-out.
- The vault's API is honest. `VaultServiceHandle` is the API. No secondary
dispatch path that exists for a feature (remote) that isn't enabled. An
implementer sees one way to use the vault, not two with a note saying
"prefer the first."
- Dead code is removed. The actor path, which service.md says is secondary
to direct calls, is gone. The `VaultProtocol` enum, `VaultMessage`,
`VaultServiceActor`, and the mpsc dispatch loop are gone. The vault is a
pure library with a thread-safe handle.
- `DerivedKey` serialization simplifies. The dual serialization (JSON
redacts, postcard preserves) is replaced by always-redact-on-serialize,
reject-on-deserialize. No "postcard preserves bytes" path to test or
document. This resolves review #002 W8 (silent corruption on
JSON-deserialized `DerivedKey`) — the custom `Deserialize` rejects
redacted payloads with an error.
- The dependency footprint shrinks. No irpc, no postcard-for-remote, no
noq, no iroh via irpc. The vault is truly standalone (ADR-018's intent,
strengthened). Supply-chain surface is reduced.
- The vault's concurrency model is honest. `VaultServiceHandle` is
`Arc<RwLock<...>>` — the RwLock provides concurrent reads (derive) and
exclusive writes (unlock/lock). The actor's sequential processing was
actually *worse* for throughput than the RwLock. Removing the actor
makes the concurrency model visible and correct.

**Negative:**
- The vault's `VaultProtocol` enum and `VaultServiceActor` are removed.
This is a breaking change to the vault crate's public API (`VaultProtocol`,
`VaultMessage`, `VaultServiceActor`, `Client<VaultProtocol>` are removed
from the public exports). Since no implementation consumer exists outside
the vault crate itself (ADR-019: the assembly layer uses
`VaultServiceHandle` directly), this is a spec edit, not a migration.
- If a future use case needs the actor pattern (e.g., for a remote-vault
crate that wants in-process mpsc dispatch before forwarding over the
wire), it must be re-added in *that crate*, not in the vault. This is
additive — the vault's direct-handle API is unchanged.
- The `DerivedKey` postcard round-trip tests in `protocol.rs` are removed.
The JSON-redaction tests remain. If a future remote-vault crate needs
postcard serialization, it defines and tests its own serialization path
for the types it sends over the wire.
- `VaultServiceError` loses `Serialize`/`Deserialize`. Any code that
serialized vault errors must adapt; the only such code was the irpc
dispatch path, which is removed. The assembly layer already converts vault
errors to alknet-core errors at the boundary (ADR-018), and that
conversion is string-based.

**On review #002 findings resolved by this ADR:**
- **C7 (OQ-21 remote vault)**: resolved. OQ-21 moves from "deferred" to
"resolved: remote vault access is not a feature of the vault crate; if
needed, a separate vault-server crate wraps the vault and adds remote
transport + auth, requiring its own ADR." The vault-server-crate question
is decided: not created now, not precluded. The crate-decomposition
one-way door (ADR-003 territory) is decided by *not* creating the crate
now.
- **W8 (`DerivedKey` JSON deserialization silently corrupts)**: resolved.
Without the postcard path, the custom `Deserialize` rejects
`private_key == "[REDACTED]"` with an error. There is no
"postcard preserves bytes" path to complicate the serialization story.
The redaction is purely for logging safety; deserialization of a redacted
payload is always an error.
- **C8 (operation access policy table incomplete)**: dissolved. Without
`VaultProtocol`'s remote capability, there is no operation access policy
table to complete — all operations are local-only by default. The table
in protocol.md goes away. If a future vault-server crate exposes some
operations remotely, *that crate* defines the access policy in its own
ADR.
## Assumptions
1. **The vault's designed use case is local-only.** ADR-019 says the
assembly layer is the sole direct caller. ADR-008 says the vault is a
capability source accessed at assembly time. ADR-014 says handlers
receive credentials through `OperationContext.capabilities`, not by
calling vault operations. The vault was always designed to be local —
irpc's remote capability was an accident of adoption, not a designed
feature.
2. **Per-node vaults are the right pattern for multi-node deployments.**
Each node has its own vault and mnemonic. Credentials are encrypted *for*
the receiving node's public key, not decrypted centrally. This is
end-to-end encryption, not a centralized decryption oracle. If this
assumption is wrong (a use case truly requires centralized vault
access), a remote-vault crate is the answer — not making the vault
remote-capable by default.
3. **The actor pattern's sequential processing is not needed.**
`VaultServiceHandle`'s `Arc<RwLock<...>>` provides concurrent reads
(derive operations) and exclusive writes (unlock/lock). The actor's
sequential processing was a constraint, not a feature — it serialized
all operations including independent reads. The RwLock is the better
concurrency model for this workload.
4. **The vault's protocol is small enough that macro-generated boilerplate
is not a maintenance burden.** With 8 operations, the
`VaultServiceHandle` method signatures *are* the protocol. There is no
need for a separate protocol enum when the handle's methods are the
API. If the vault grew to dozens of operations (unlikely given its
scope), a protocol enum could be re-introduced — but it would be a
local enum, not an irpc-generated one.
5. **`DerivedKey` never needs to cross a wire format that preserves
private key bytes.** The architectural control (ADR-014:
`DerivedKey` never appears in call protocol payloads) means
`DerivedKey` is always used in-process. The redacting `Serialize` impl
is for logging safety (defense-in-depth), not for wire transport. If a
future remote-vault crate needs to send `DerivedKey` over the wire, it
defines its own serialization for that context — the vault's
`DerivedKey` stays redact-always.
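
Assumption 3 can be checked with a small, dependency-free experiment. It uses `std::sync::RwLock`; the crate itself uses tokio's `RwLock`, and the `Vec<u8>` here is only a stand-in for vault state. Each reader thread takes a read guard and then waits on a `Barrier` sized to the number of readers while still holding that guard. The barrier can only release if every read guard is held at the same moment, so the function returning at all shows that reads overlap. If reads were serialized, as in a one-message-at-a-time actor, this would deadlock at the barrier.

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

// Returns the total number of bytes the readers saw.
pub fn concurrent_reads(readers: usize) -> usize {
    // Stand-in for vault state.
    let state = Arc::new(RwLock::new(vec![7u8; 32]));
    // Releases only when `readers` threads are waiting at once.
    let barrier = Arc::new(Barrier::new(readers));

    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let state = Arc::clone(&state);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let guard = state.read().expect("lock poisoned");
                // Still holding the read guard: every reader must be inside
                // the lock simultaneously for this wait to return.
                barrier.wait();
                guard.len()
            })
        })
        .collect();

    let seen: usize = handles
        .into_iter()
        .map(|h| h.join().expect("reader panicked"))
        .sum();

    // Writes (unlock/lock in the vault) still get exclusive access once
    // the readers are done.
    state.write().expect("lock poisoned").fill(0);
    assert!(state.read().expect("lock poisoned").iter().all(|&b| b == 0));

    seen
}

fn main() {
    assert_eq!(concurrent_reads(4), 4 * 32);
}
```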
## References
- ADR-005: irpc as call protocol foundation (this ADR amends the vault
reference in ADR-005's Decision and Consequences; irpc remains the
foundation for alknet-*call*, just not for alknet-*vault*)
- ADR-008: Vault integration point (the vault is a capability source
accessed at assembly time — this ADR makes that the *only* mode)
- ADR-014: Secret material flow and capability injection (`DerivedKey`
never appears in call protocol payloads — the redacting `Serialize`
is defense-in-depth for logging, not for wire transport)
- ADR-018: Vault as standalone crate (this ADR strengthens the
standalone principle: zero alknet crate dependencies *and* zero RPC
framework dependencies)
- ADR-019: Vault assembly-layer-only access (this ADR makes the vault
local-only, not just assembly-layer-only-for-direct-calls)
- OQ-21: Remote vault administration (resolved by this ADR — not a vault
crate feature; if needed, a separate crate with its own ADR)
- docs/reviews/002-pre-implementation-architecture-sanity-check.md
(findings C7, C8, W8 — resolved or dissolved by this ADR)
- irpc design patterns: `docs/research/references/iroh/irpc/09-design-patterns-and-examples.md`
(Pattern 3: `no_rpc` flag — this ADR goes further by dropping irpc
entirely, since the actor pattern is also unnecessary)