docs(architecture): document vault remote capability, enrich OQ-21

The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.

OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
  auth checks; needs NodeId allowlist + message filtering in assembly layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
  remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
  evolution = wire break, manageable via ALPN versioning)

The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not in
the vault crate itself — the vault is standalone (ADR-018) and can't
import alknet-core's auth model.

OQ-21 remains deferred — no commitment to implement, but the door is open
and the design space is mapped.
2026-06-20 06:48:23 +00:00
parent dc27753680
commit 9087f0579f
6 changed files with 199 additions and 21 deletions

@@ -87,7 +87,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
**Deferred (not active):**
- **OQ-09**: WASM target boundaries — design constraint, not deliverable
- **OQ-10**: Git adapter scope — start with smart protocol, add ERC721 later
- **OQ-21**: Remote vault administration — network unlock not supported; needs ADR if ever needed
- **OQ-21**: Remote vault access — protocol is remote-capable by construction (irpc `RemoteService`); enabling is a server-setup change with an auth-wrapping handler in the assembly layer; `Unlock`/`Lock` are local-only
## Document Lifecycle

@@ -51,7 +51,7 @@ cross the network.
| OQ | Title | Status | Relevance |
|----|-------|--------|-----------|
| OQ-20 | Encryption key derivation | resolved (ADR-020) | HD derivation from seed; salt field unused in v2 |
| OQ-21 | Remote vault administration | deferred | Network unlock not supported; needs ADR if ever needed |
| OQ-21 | Remote vault access | deferred | Protocol is remote-capable by construction; enabling = server-setup change with auth-wrapping handler; Unlock/Lock local-only |
| OQ-22 | Key rotation mechanism | resolved (ADR-021) | Version-indexed paths; `rotate` method |
## Key Design Principles

@@ -154,15 +154,153 @@ the vault is wrapped in an operation that serializes to JSON — but **no
vault operations are exposed over the call protocol** (ADR-014). The JSON
serialization path exists only for the `DerivedKey` redaction safety net.
## Remote Capability
The `VaultProtocol` is a remote-capable irpc service **by construction**.
The `#[rpc_requests]` macro generates both `Service` (local) and
`RemoteService` (remote) trait implementations. The `VaultServiceActor`
processes `VaultMessage` variants identically regardless of transport —
the only difference between local and remote use is the `Client<VaultProtocol>`
construction and the server-side listener setup.
This was a purposeful design decision: irpc's "zero-overhead local,
transparent remote" architecture means the same protocol definition and
actor code work for both in-process and cross-network dispatch. Enabling
remote vault access is a server-setup change, not a protocol change.
### What's already in place
- **Protocol**: `VaultProtocol` is already a `RemoteService`. No code
changes needed in the protocol definition.
- **Serialization**: `DerivedKey`'s dual serialization (JSON redacts private
key for safety; postcard preserves bytes for remote dispatch) was
designed for this use case.
- **Actor**: `VaultServiceActor` already processes all message types. The
actor is transport-agnostic — it doesn't know whether a message arrived
via a local mpsc channel or a remote QUIC stream.
- **Auth transport**: irpc over iroh uses iroh's QUIC connections, which
authenticate via NodeId (Ed25519, RFC 7250 raw keys) — the same identity
model as the rest of alknet (ADR-010). The connection-level identity
("which NodeId is calling") is available before any vault operation is
dispatched.
### What's not in place (the gap)
The `IrohProtocol` handler that irpc provides forwards **all** message
types to the actor without auth checks. For local use this is correct
(the assembly layer is trusted). For remote use, the listener needs:
1. **NodeId allowlist**: only known worker NodeIds may connect.
2. **Message filtering**: reject `Unlock` and `Lock` from remote callers
(see "Operation access policy" below).
3. **Then** forward to the actor.
This auth-wrapping handler cannot live in the vault crate — the vault is
standalone (ADR-018) and depends on no alknet crate. The auth model
(`IdentityProvider`, `Identity`, scopes) lives in alknet-core. The
auth-wrapping listener lives in the **assembly layer** (the CLI binary)
or a dedicated vault-server crate that depends on both alknet-core and
alknet-vault. This is the same pattern as ADR-019: the vault is a
library, the assembly layer is the integrator.
```
alknet-vault (standalone, no alknet deps)
- VaultProtocol (RemoteService by construction)
- VaultServiceActor (processes all message types, no auth)
- VaultServiceHandle (direct API)
assembly layer / vault-server (depends on alknet-core + alknet-vault)
- AuthWrappingHandler: checks NodeId, filters message types, forwards
- IrohProtocol::new(auth_wrapping_handler)
- Router::builder(endpoint).accept(b"alknet/vault", protocol).spawn()
```
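The three listener steps above (allowlist, message filter, forward) can be sketched as a small gate that runs before anything reaches the actor. This is an illustration under stated assumptions, not irpc's or alknet's API: `NodeIdBytes`, `Gate`, and `AuthGate` are hypothetical names, the NodeId is modelled as raw 32-byte key material, and the message filter is passed in as a plain function so the sketch does not depend on the real `VaultMessage` type.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an iroh NodeId (an Ed25519 public key).
type NodeIdBytes = [u8; 32];

/// Outcome of the gate for one incoming message (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Gate {
    /// The caller is not on the allowlist.
    UnknownCaller,
    /// The caller is known, but the message is local-only.
    LocalOnly,
    /// The message may be forwarded to the actor.
    Forward,
}

/// Hypothetical auth-wrapping check, generic over the message type `M`.
struct AuthGate<M> {
    /// NodeIds of the workers that may connect.
    allowlist: HashSet<NodeIdBytes>,
    /// Returns true for messages that must never be accepted remotely.
    is_local_only: fn(&M) -> bool,
}

impl<M> AuthGate<M> {
    /// Step 1: allowlist. Step 2: message filter. Step 3: forward.
    fn check(&self, caller: &NodeIdBytes, msg: &M) -> Gate {
        if !self.allowlist.contains(caller) {
            return Gate::UnknownCaller;
        }
        if (self.is_local_only)(msg) {
            return Gate::LocalOnly;
        }
        Gate::Forward
    }
}
```

Checking the allowlist before inspecting the message means an unknown caller learns nothing about which operations exist. In a real handler the `Forward` arm would hand the message to the `VaultServiceActor`, and the other two arms would return an error to the caller.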
### Operation access policy
Not all `VaultProtocol` operations are safe to expose remotely. The vault
spec defines the policy; the assembly-layer listener enforces it.
| Operation | Local (assembly layer) | Remote (workers) | Why |
|-----------|----------------------|-------------------|-----|
| `Unlock` | ✅ | ❌ | Sends the mnemonic (root of trust) over the wire. Even with NodeId auth, the mnemonic in transit is a different threat model — it's in memory on the receiving end, potentially in logs/traces. Local-only. |
| `Lock` | ✅ | ❌ | Locking the vault makes it unavailable to every worker on the machine node until it is unlocked locally again. A compromised or buggy worker could DoS the entire machine node. Local-only. |
| `DeriveEd25519` | ✅ | ✅ | Workers need derived keys for signing, identity. The derivation path is the access control — the worker can only derive at paths the assembly layer declares. |
| `DeriveEncryptionKey` | ✅ | ✅ | Workers need encryption keys for credential encryption. Same path-based access control. |
| `DeriveEthereumKey` | ✅ | ✅ | Same as DeriveEd25519, for Ethereum signing. |
| `DerivePassword` | ✅ | ✅ | Workers need deterministic passwords for service credentials. |
| `Encrypt` | ✅ | ✅ | Workers encrypt external credentials (API keys) for storage. |
| `Decrypt` | ✅ | ✅ | Workers decrypt stored credentials at call time. |
The policy is: **`Unlock` and `Lock` are local-only; all other operations
are remote-capable.** The assembly-layer listener filters `Unlock` and
`Lock` messages from remote connections and returns an error.
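The policy table can be expressed as a single exhaustive match. The sketch below is a hedged illustration: `VaultOp`, `Origin`, and `is_allowed` are hypothetical names that mirror the operation names in the table, not the actual `VaultMessage` enum generated by `#[rpc_requests]`.

```rust
/// Hypothetical mirror of the operations listed in the policy table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VaultOp {
    Unlock,
    Lock,
    DeriveEd25519,
    DeriveEncryptionKey,
    DeriveEthereumKey,
    DerivePassword,
    Encrypt,
    Decrypt,
}

/// Where a request came from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Origin {
    /// The trusted assembly layer, in-process.
    Local,
    /// A worker calling over the network.
    Remote,
}

/// Returns true when `op` may be served for a caller of the given origin.
fn is_allowed(op: VaultOp, origin: Origin) -> bool {
    match (origin, op) {
        // The assembly layer may call every operation.
        (Origin::Local, _) => true,
        // Mnemonic handling and lock control stay local-only.
        (Origin::Remote, VaultOp::Unlock | VaultOp::Lock) => false,
        // Derivation, encryption and decryption are remote-capable.
        (
            Origin::Remote,
            VaultOp::DeriveEd25519
            | VaultOp::DeriveEncryptionKey
            | VaultOp::DeriveEthereumKey
            | VaultOp::DerivePassword
            | VaultOp::Encrypt
            | VaultOp::Decrypt,
        ) => true,
    }
}
```

Listing the remote-capable operations explicitly, rather than using a wildcard arm, means a newly added variant fails to compile until someone decides whether it is safe to expose remotely.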
### Use case: machine node → workers
The primary use case is a **machine node** (long-lived, holds the mnemonic,
manages container services) exposing a restricted vault API to its
**workers** (ephemeral, containerized, no mnemonic):
```
Machine Node (head, vault unlocked locally)
├── exposes alknet/vault ALPN to workers
├── NodeId allowlist: only known worker NodeIds may connect
├── message filter: rejects Unlock/Lock from remote callers
├── Worker A (no mnemonic)
│ └── calls DeriveEd25519, Encrypt, Decrypt on machine node's vault
└── Worker B (also a head for its own sub-workers)
├── gets its own credentials from machine node's vault
└── can expose its own restricted vault API to sub-workers
```
Workers don't hold mnemonics. They get static credentials injected at
construction (the common case) and call the machine node's vault for
dynamic derivation or decryption when needed. This is the
defense-in-depth (Russian doll) model: the seed is the innermost layer,
the machine node's vault is the next, iroh's NodeId auth is the outermost,
and workers sit outside all three, calling in through authenticated channels.
### Per-machine-node vaults, not shared
Each machine node has its own vault and mnemonic. Machine nodes do not
share vaults with each other. Compromising one machine node exposes only
that node's workers, not all nodes. This is compartmentalization — the
blast radius of a vault compromise is one machine node, not the entire
fleet.
The remote vault capability is for the **machine→worker** relationship,
not for cross-machine-node sharing. Machine nodes don't expose their
vaults to peer machine nodes — only to their own workers, authenticated
by NodeId.
### What's breaking vs. non-breaking
| Change | Breaking? | Why |
|--------|-----------|-----|
| Enabling remote vault access | **No** | Server-setup change — register `IrohProtocol` with an ALPN. The protocol is already a `RemoteService`. |
| Restricting which operations are remote-capable | **No** | Policy in the assembly-layer handler, not a protocol change. |
| Adding NodeId auth checks | **No** | Implementation in the assembly-layer handler. The vault crate doesn't change. |
| Adding new `VaultProtocol` variants | **Yes (wire break)** | Inherent to irpc — versioning is a non-goal. Would need ALPN versioning (`alknet/vault/v2`) if the protocol evolves. Same constraint as any irpc service. |
| Changing `DerivedKey` serialization | **No** | No change is required. Dual serialization is already in place: postcard preserves bytes for remote, JSON redacts for safety. |
The only breaking change is evolving the `VaultProtocol` enum itself, and
that's manageable with ALPN versioning (`alknet/vault`, then
`alknet/vault/v2` if needed) — the same pattern alknet uses for all ALPN
protocols (ADR-006).
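A minimal sketch of how ALPN versioning could keep two wire formats apart, assuming the two ALPN strings named above. `VaultWireVersion` and `version_for_alpn` are hypothetical names introduced for illustration; the real registration would go through the router's `accept` call with one handler per ALPN.

```rust
/// ALPN identifiers taken from the text above.
const ALPN_V1: &[u8] = b"alknet/vault";
const ALPN_V2: &[u8] = b"alknet/vault/v2";

/// Which `VaultProtocol` wire format a connection speaks (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum VaultWireVersion {
    V1,
    V2,
}

/// Maps the negotiated ALPN to a wire version.
/// Unknown ALPNs yield `None`, so the connection can be refused
/// instead of being decoded with the wrong enum layout.
fn version_for_alpn(alpn: &[u8]) -> Option<VaultWireVersion> {
    if alpn == ALPN_V1 {
        Some(VaultWireVersion::V1)
    } else if alpn == ALPN_V2 {
        Some(VaultWireVersion::V2)
    } else {
        None
    }
}
```

Because the version is fixed at connection time, old workers keep speaking the original format on `alknet/vault` while newer ones negotiate `alknet/vault/v2`, and neither side has to guess the enum layout from the payload.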
## Design Decisions
| Decision | ADR | Summary |
|----------|-----|---------|
| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch |
| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch; remote-capable by construction |
| `DerivedKey` is move-only | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Prevents accidental duplication of secret material |
| JSON redacts private key | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Defense-in-depth for logging accidents |
| postcard preserves private key | — | Required for in-cluster irpc dispatch |
| No vault operations on call protocol | [ADR-008](../../decisions/008-secret-service-integration.md), [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Master seed never crosses the network |
| Unlock/Lock are local-only | OQ-21 (deferred) | Mnemonic and lock control must not be remotely accessible |
| Auth wrapping lives in assembly layer | [ADR-018](../../decisions/018-vault-standalone-crate.md), [ADR-019](../../decisions/019-vault-assembly-layer-only.md) | Vault is standalone; can't import alknet-core's auth model |
## Open Questions

@@ -280,18 +280,29 @@ in-process pattern (the actor and the handle share state via `Arc`).
| Direct (in-process) | `VaultServiceHandle` method calls | None | CLI binary at startup (the supported path) |
| Actor (in-process) | `VaultMessage` over mpsc | None (channel) | irpc service dispatch (in-process) |
Remote (in-cluster) vault dispatch — where the vault runs as a sidecar
and other processes send `VaultMessage` over a network — is **not
supported** (ADR-019, OQ-21). The irpc `RemoteService` trait infrastructure
exists in the library, but exposing the vault over the network would
require its own ADR with an explicit threat model (the master seed must
never cross the network). The dispatch table above lists only the
supported paths.
Remote vault dispatch — where the vault is exposed over irpc/iroh to
workers or other processes — is **deferred** (OQ-21). The `VaultProtocol`
is already a `RemoteService` by construction (irpc's `#[rpc_requests]`
generates it), and `DerivedKey`'s dual serialization was designed for this.
Enabling remote access is a server-setup change (register `IrohProtocol`
with an ALPN), not a protocol change.
However, the `IrohProtocol` handler that irpc provides forwards all
message types without auth checks. Remote use needs an **auth-wrapping
handler** in the assembly layer (not the vault crate — the vault is
standalone, ADR-018, and can't import alknet-core's auth model) that:
1. Checks the caller's NodeId against an allowlist
2. Filters `Unlock` and `Lock` messages from remote callers (local-only)
3. Forwards remaining messages to the actor
See [protocol.md → Remote Capability](protocol.md#remote-capability) for
the full design, operation access policy, use case (machine node →
workers), and breaking-vs-non-breaking analysis.
The assembly layer (CLI binary) uses the direct path. The actor path
exists for in-process irpc dispatch but is not used by the assembly layer
— it's available for test harnesses and future in-process service
patterns. Neither path is on the alknet call protocol (ADR-008, ADR-014).
exists for in-process irpc dispatch. Neither path is on the alknet call
protocol (ADR-008, ADR-014) — the vault has no ALPN until a future
deployment explicitly registers one with an auth-wrapping handler.
## Errors
@@ -328,8 +339,11 @@ error types — the CLI binary converts at the assembly boundary (ADR-018).
See [open-questions.md](../../open-questions.md) for full details.
- **OQ-21** (deferred): Remote vault administration — network unlock is not
supported; needs an ADR if ever needed.
- **OQ-21** (deferred): Remote vault access — the `VaultProtocol` is
remote-capable by construction (irpc `RemoteService`). Enabling remote
access is a server-setup change with an auth-wrapping handler in the
assembly layer. `Unlock`/`Lock` are local-only; other operations are
remote-capable. See [protocol.md → Remote Capability](protocol.md#remote-capability).
## Security Constraints

@@ -251,12 +251,38 @@ These questions are acknowledged but not active. They will be promoted to open w
### OQ-21: Remote Vault Administration
- **Origin**: [service.md](crates/vault/service.md), ADR-019
- **Origin**: [service.md](crates/vault/service.md), [protocol.md](crates/vault/protocol.md), ADR-019
- **Status**: deferred
- **Door type**: One-way (if implemented)
- **Priority**: low
- **Resolution**: Network unlock of a running node's vault is not supported (ADR-008, ADR-019). The vault is unlocked at startup by the CLI binary from a local mnemonic prompt or file. If a future use case requires remote vault administration (e.g., unlocking a headless node's vault over the network), it requires a separate, heavily restricted mechanism: admin scope (ADR-015), mTLS-only (never expose the mnemonic over an unauthenticated channel), and its own ADR with an explicit threat model. This decision does not close that door; it simply does not open it. Deferred because no current use case requires it.
- **Cross-references**: ADR-008, ADR-014, ADR-019, [service.md](crates/vault/service.md)
- **Door type**: One-way (if implemented — wire format exposure), two-way (enabling is non-breaking)
- **Priority**: medium
- **Resolution**: The `VaultProtocol` is a remote-capable irpc service by construction — the `#[rpc_requests]` macro generates both `Service` (local) and `RemoteService` (remote) trait implementations. `DerivedKey`'s dual serialization (JSON redacts private key for safety; postcard preserves bytes for remote dispatch) was designed for this. Enabling remote vault access is a server-setup change (register `IrohProtocol` with an ALPN), not a protocol change.
**What's already in place:**
- Protocol: `VaultProtocol` is already a `RemoteService`
- Serialization: `DerivedKey` redacts in JSON, preserves in postcard
- Actor: `VaultServiceActor` processes all message types, transport-agnostic
- Auth transport: irpc over iroh uses iroh's QUIC connections (NodeId auth, RFC 7250 raw keys)
**What's not in place (the gap):**
- The `IrohProtocol` handler forwards all message types without auth checks
- Remote use needs: (1) NodeId allowlist, (2) message filtering (reject `Unlock`/`Lock` from remote callers), (3) forwarding to the actor
- This auth-wrapping handler cannot live in the vault crate (standalone, ADR-018) — it needs alknet-core's auth model (`IdentityProvider`, scopes). It lives in the assembly layer or a dedicated vault-server crate that depends on both alknet-core and alknet-vault.
**Operation access policy:**
- `Unlock` and `Lock` are local-only (mnemonic and lock control must not be remotely accessible)
- All other operations (`DeriveEd25519`, `DeriveEncryptionKey`, `DeriveEthereumKey`, `DerivePassword`, `Encrypt`, `Decrypt`) are remote-capable
- The policy is documented in the vault spec; the assembly-layer listener enforces it
**Use case:** machine node (head, holds mnemonic) exposes restricted vault API to workers (ephemeral, no mnemonic) over irpc/iroh. Per-machine-node vaults, not shared — compartmentalization limits blast radius.
**What's breaking vs. non-breaking:**
- Enabling remote access: non-breaking (server-setup change)
- Restricting operations / adding auth: non-breaking (handler policy)
- Adding new `VaultProtocol` variants: wire break (inherent to irpc; manageable via ALPN versioning `alknet/vault/v2`)
- Changing `DerivedKey` serialization: non-breaking (no change required; dual serialization already in place)
**Why deferred:** the capability is available and the use cases are clear (machine→worker credential access), but no current deployment needs it. The door is left open intentionally — irpc's remote support was chosen for this reason. When a use case materializes, the assembly-layer auth-wrapping handler is the implementation task, not a protocol change. The vault spec documents the policy (which operations are remote-capable) so the future implementer has clear guidance.
- **Cross-references**: ADR-005, ADR-008, ADR-014, ADR-018, ADR-019, [protocol.md](crates/vault/protocol.md), [service.md](crates/vault/service.md)
### OQ-22: Key Rotation Mechanism

@@ -224,7 +224,7 @@ Open questions are tracked in [open-questions.md](open-questions.md). Key questi
- **OQ-08**: Vault integration point (resolved: CLI-embedded, assembly-layer only — see ADR-008, ADR-014, ADR-018, ADR-019)
- **OQ-16**: Safe vault operations for call protocol exposure (resolved: none for now — see ADR-014)
- **OQ-20**: Encryption key derivation (resolved: HD derivation, not PBKDF2 — see ADR-020)
- **OQ-21**: Remote vault administration (deferred: network unlock not supported — see ADR-019)
- **OQ-21**: Remote vault access (deferred: protocol is remote-capable; enabling = server-setup + auth-wrapping handler; Unlock/Lock local-only — see [protocol.md](crates/vault/protocol.md#remote-capability))
- **OQ-22**: Key rotation (resolved: version-indexed paths, `rotate` method — see ADR-021)
## Failure Modes