docs(architecture): document vault remote capability, enrich OQ-21

The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.

OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
  auth checks; needs NodeId allowlist + message filtering in assembly layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
  remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
  evolution = wire break, manageable via ALPN versioning)

The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not in
the vault crate itself — the vault is standalone (ADR-018) and can't import
alknet-core's auth model.

OQ-21 remains deferred — no commitment to implement, but the door is open
and the design space is mapped.
@@ -51,7 +51,7 @@ cross the network.
 | OQ | Title | Status | Relevance |
 |----|-------|--------|-----------|
 | OQ-20 | Encryption key derivation | resolved (ADR-020) | HD derivation from seed; salt field unused in v2 |
-| OQ-21 | Remote vault administration | deferred | Network unlock not supported; needs ADR if ever needed |
+| OQ-21 | Remote vault access | deferred | Protocol is remote-capable by construction; enabling = server-setup change with auth-wrapping handler; Unlock/Lock local-only |
 | OQ-22 | Key rotation mechanism | resolved (ADR-021) | Version-indexed paths; `rotate` method |
 
 ## Key Design Principles
@@ -154,15 +154,153 @@ the vault is wrapped in an operation that serializes to JSON — but **no
 vault operations are exposed over the call protocol** (ADR-014). The JSON
 serialization path exists only for the `DerivedKey` redaction safety net.
 
+## Remote Capability
+
+The `VaultProtocol` is a remote-capable irpc service **by construction**.
+The `#[rpc_requests]` macro generates both `Service` (local) and
+`RemoteService` (remote) trait implementations. The `VaultServiceActor`
+processes `VaultMessage` variants identically regardless of transport —
+the only difference between local and remote use is the `Client<VaultProtocol>`
+construction and the server-side listener setup.
+
+This was a purposeful design decision: irpc's "zero-overhead local,
+transparent remote" architecture means the same protocol definition and
+actor code work for both in-process and cross-network dispatch. Enabling
+remote vault access is a server-setup change, not a protocol change.
+
+### What's already in place
+
+- **Protocol**: `VaultProtocol` is already a `RemoteService`. No code
+  changes needed in the protocol definition.
+- **Serialization**: `DerivedKey`'s dual serialization (JSON redacts private
+  key for safety; postcard preserves bytes for remote dispatch) was
+  designed for this use case.
+- **Actor**: `VaultServiceActor` already processes all message types. The
+  actor is transport-agnostic — it doesn't know whether a message arrived
+  via a local mpsc channel or a remote QUIC stream.
+- **Auth transport**: irpc over iroh uses iroh's QUIC connections, which
+  authenticate via NodeId (Ed25519, RFC 7250 raw keys) — the same identity
+  model as the rest of alknet (ADR-010). The connection-level identity
+  ("which NodeId is calling") is available before any vault operation is
+  dispatched.
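
Review note: the dual-serialization bullet above can be pictured with a small sketch. This is a std-only stand-in, not the crate's code: the struct layout, the field names `public` and `private`, and the method names are all assumptions made for illustration, and the real `DerivedKey` presumably goes through its JSON and postcard serializers rather than hand-written encoders.

```rust
/// Hypothetical stand-in for `DerivedKey`; the real field layout is not
/// shown in this diff. Deliberately not `Clone`, to mirror "move-only".
pub struct DerivedKey {
    pub public: [u8; 32],
    pub private: [u8; 32],
}

/// Lowercase hex encoding, used only for the human-readable form.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

impl DerivedKey {
    /// Human-readable path (stands in for JSON): the private half is
    /// replaced by a marker so it cannot leak through logs or traces.
    pub fn to_redacted_json(&self) -> String {
        format!(
            "{{\"public\":\"{}\",\"private\":\"[REDACTED]\"}}",
            hex(&self.public)
        )
    }

    /// Binary path (stands in for postcard): both halves are preserved so
    /// a remote caller receives a usable key.
    pub fn to_wire_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(64);
        out.extend_from_slice(&self.public);
        out.extend_from_slice(&self.private);
        out
    }

    /// Inverse of `to_wire_bytes`; returns `None` on a length mismatch.
    pub fn from_wire_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 64 {
            return None;
        }
        let mut public = [0u8; 32];
        let mut private = [0u8; 32];
        public.copy_from_slice(&bytes[..32]);
        private.copy_from_slice(&bytes[32..]);
        Some(DerivedKey { public, private })
    }
}
```

The point of the split is that the lossy encoder is the one most likely to end up in logs, while the lossless one is only reachable on the binary dispatch path.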
+
+### What's not in place (the gap)
+
+The `IrohProtocol` handler that irpc provides forwards **all** message
+types to the actor without auth checks. For local use this is correct
+(the assembly layer is trusted). For remote use, the listener needs:
+
+1. **NodeId allowlist**: only known worker NodeIds may connect.
+2. **Message filtering**: reject `Unlock` and `Lock` from remote callers
+   (see "Operation access policy" below).
+3. **Then** forward to the actor.
+
+This auth-wrapping handler cannot live in the vault crate — the vault is
+standalone (ADR-018) and depends on no alknet crate. The auth model
+(`IdentityProvider`, `Identity`, scopes) lives in alknet-core. The
+auth-wrapping listener lives in the **assembly layer** (the CLI binary)
+or a dedicated vault-server crate that depends on both alknet-core and
+alknet-vault. This is the same pattern as ADR-019: the vault is a
+library, the assembly layer is the integrator.
+
+```
+alknet-vault (standalone, no deps)
+  - VaultProtocol (RemoteService by construction)
+  - VaultServiceActor (processes all message types, no auth)
+  - VaultServiceHandle (direct API)
+
+assembly layer / vault-server (depends on alknet-core + alknet-vault)
+  - AuthWrappingHandler: checks NodeId, filters message types, forwards
+  - IrohProtocol::new(auth_wrapping_handler)
+  - Router::builder(endpoint).accept(b"alknet/vault", protocol).spawn()
+```
+
+### Operation access policy
+
+Not all `VaultProtocol` operations are safe to expose remotely. The vault
+spec defines the policy; the assembly-layer listener enforces it.
+
+| Operation | Local (assembly layer) | Remote (workers) | Why |
+|-----------|----------------------|-------------------|-----|
+| `Unlock` | ✅ | ❌ | Sends the mnemonic (root of trust) over the wire. Even with NodeId auth, the mnemonic in transit is a different threat model — it's in memory on the receiving end, potentially in logs/traces. Local-only. |
+| `Lock` | ✅ | ❌ | Locking the vault bricks the machine node for all workers. A compromised or buggy worker could DoS the entire machine node. Local-only. |
+| `DeriveEd25519` | ✅ | ✅ | Workers need derived keys for signing, identity. The derivation path is the access control — the worker can only derive at paths the assembly layer declares. |
+| `DeriveEncryptionKey` | ✅ | ✅ | Workers need encryption keys for credential encryption. Same path-based access control. |
+| `DeriveEthereumKey` | ✅ | ✅ | Same as DeriveEd25519, for Ethereum signing. |
+| `DerivePassword` | ✅ | ✅ | Workers need deterministic passwords for service credentials. |
+| `Encrypt` | ✅ | ✅ | Workers encrypt external credentials (API keys) for storage. |
+| `Decrypt` | ✅ | ✅ | Workers decrypt stored credentials at call time. |
+
+The policy is: **`Unlock` and `Lock` are local-only; all other operations
+are remote-capable.** The assembly-layer listener filters `Unlock` and
+`Lock` messages from remote connections and returns an error.
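
Review note: the allowlist-then-filter check described above could look roughly like the following. It is a minimal sketch under stated assumptions, not the proposed handler: `NodeId` is modelled as a raw 32-byte array, `VaultOp` is a hypothetical payload-free mirror of the operations in the table, and `RemoteGate` and `Rejection` are invented names. No irpc or iroh APIs are used; forwarding to the actor would only happen once `check` returns `Ok`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an iroh NodeId (an Ed25519 public key).
pub type NodeId = [u8; 32];

/// Hypothetical mirror of the operations in the policy table; payloads omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VaultOp {
    Unlock,
    Lock,
    DeriveEd25519,
    DeriveEncryptionKey,
    DeriveEthereumKey,
    DerivePassword,
    Encrypt,
    Decrypt,
}

/// Why a remote request was refused before reaching the actor.
#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    /// The caller's NodeId is not on the allowlist.
    UnknownNode,
    /// The operation is local-only (`Unlock` or `Lock`).
    LocalOnly(VaultOp),
}

/// The policy from the table: everything except `Unlock` and `Lock`.
pub fn is_remote_capable(op: VaultOp) -> bool {
    !matches!(op, VaultOp::Unlock | VaultOp::Lock)
}

/// Gate applied to remote connections only; local callers bypass it.
pub struct RemoteGate {
    allowlist: HashSet<NodeId>,
}

impl RemoteGate {
    pub fn new(allowed: impl IntoIterator<Item = NodeId>) -> Self {
        RemoteGate { allowlist: allowed.into_iter().collect() }
    }

    /// Step 1: NodeId allowlist. Step 2: message filter.
    /// Step 3 (forwarding to the actor) is the caller's job on `Ok`.
    pub fn check(&self, caller: &NodeId, op: VaultOp) -> Result<(), Rejection> {
        if !self.allowlist.contains(caller) {
            return Err(Rejection::UnknownNode);
        }
        if !is_remote_capable(op) {
            return Err(Rejection::LocalOnly(op));
        }
        Ok(())
    }
}
```

Checking the allowlist first means an unknown node learns nothing about which operations exist; the order matches the numbered list in "What's not in place".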
+
+### Use case: machine node → workers
+
+The primary use case is a **machine node** (long-lived, holds the mnemonic,
+manages container services) exposing a restricted vault API to its
+**workers** (ephemeral, containerized, no mnemonic):
+
+```
+Machine Node (head, vault unlocked locally)
+├── exposes alknet/vault ALPN to workers
+├── NodeId allowlist: only known worker NodeIds may connect
+├── message filter: rejects Unlock/Lock from remote callers
+│
+├── Worker A (no mnemonic)
+│   └── calls DeriveEd25519, Encrypt, Decrypt on machine node's vault
+│
+└── Worker B (also a head for its own sub-workers)
+    ├── gets its own credentials from machine node's vault
+    └── can expose its own restricted vault API to sub-workers
+```
+
+Workers don't hold mnemonics. They get static credentials injected at
+construction (the common case) and call the machine node's vault for
+dynamic derivation or decryption when needed. This is the
+defense-in-depth (Russian doll) model: the seed is the innermost layer,
+the machine node's vault is the next, iroh's NodeId auth is the outer,
+and workers are outside that — calling in through authenticated channels.
+
+### Per-machine-node vaults, not shared
+
+Each machine node has its own vault and mnemonic. Machine nodes do not
+share vaults with each other. Compromising one machine node exposes only
+that node's workers, not all nodes. This is compartmentalization — the
+blast radius of a vault compromise is one machine node, not the entire
+fleet.
+
+The remote vault capability is for the **machine→worker** relationship,
+not for cross-machine-node sharing. Machine nodes don't expose their
+vaults to peer machine nodes — only to their own workers, authenticated
+by NodeId.
+
+### What's breaking vs. non-breaking
+
+| Change | Breaking? | Why |
+|--------|-----------|-----|
+| Enabling remote vault access | **No** | Server-setup change — register `IrohProtocol` with an ALPN. The protocol is already a `RemoteService`. |
+| Restricting which operations are remote-capable | **No** | Policy in the assembly-layer handler, not a protocol change. |
+| Adding NodeId auth checks | **No** | Implementation in the assembly-layer handler. The vault crate doesn't change. |
+| Adding new `VaultProtocol` variants | **Yes (wire break)** | Inherent to irpc — versioning is a non-goal. Would need ALPN versioning (`alknet/vault/v2`) if the protocol evolves. Same constraint as any irpc service. |
+| Changing `DerivedKey` serialization | **No** | Dual serialization is already in place — postcard preserves bytes for remote, JSON redacts for safety. |
+
+The only breaking change is evolving the `VaultProtocol` enum itself, and
+that's manageable with ALPN versioning (`alknet/vault`, then
+`alknet/vault/v2` if needed) — the same pattern alknet uses for all ALPN
+protocols (ADR-006).
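
Review note: ALPN versioning as described above amounts to picking the wire format from the negotiated protocol identifier. A minimal sketch, assuming the two identifiers named in the text; the enum `VaultWireVersion`, the constant names, and the function name are hypothetical, and `alknet/vault/v2` does not exist yet.

```rust
/// ALPN identifiers taken from the text; v2 is only a possible future value.
pub const ALPN_VAULT_V1: &[u8] = b"alknet/vault";
pub const ALPN_VAULT_V2: &[u8] = b"alknet/vault/v2";

/// Hypothetical tag for which `VaultProtocol` enum layout a connection speaks.
#[derive(Debug, PartialEq, Eq)]
pub enum VaultWireVersion {
    V1,
    V2,
}

/// Map a negotiated ALPN to a wire version. Unknown identifiers are
/// refused rather than guessed, since an enum mismatch is a wire break.
pub fn wire_version_for_alpn(alpn: &[u8]) -> Option<VaultWireVersion> {
    if alpn == ALPN_VAULT_V1 {
        Some(VaultWireVersion::V1)
    } else if alpn == ALPN_VAULT_V2 {
        Some(VaultWireVersion::V2)
    } else {
        None
    }
}
```

Because the match is exact, an old client on `alknet/vault` and a new one on `alknet/vault/v2` can be served side by side without either seeing the other's message layout.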
+
 ## Design Decisions
 
 | Decision | ADR | Summary |
 |----------|-----|---------|
-| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch |
+| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch; remote-capable by construction |
 | `DerivedKey` is move-only | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Prevents accidental duplication of secret material |
 | JSON redacts private key | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Defense-in-depth for logging accidents |
 | postcard preserves private key | — | Required for in-cluster irpc dispatch |
 | No vault operations on call protocol | [ADR-008](../../decisions/008-secret-service-integration.md), [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Master seed never crosses the network |
+| Unlock/Lock are local-only | OQ-21 (deferred) | Mnemonic and lock control must not be remotely accessible |
+| Auth wrapping lives in assembly layer | [ADR-018](../../decisions/018-vault-standalone-crate.md), [ADR-019](../../decisions/019-vault-assembly-layer-only.md) | Vault is standalone; can't import alknet-core's auth model |
 
 ## Open Questions
 
@@ -280,18 +280,29 @@ in-process pattern (the actor and the handle share state via `Arc`).
 | Direct (in-process) | `VaultServiceHandle` method calls | None | CLI binary at startup (the supported path) |
 | Actor (in-process) | `VaultMessage` over mpsc | None (channel) | irpc service dispatch (in-process) |
 
-Remote (in-cluster) vault dispatch — where the vault runs as a sidecar
-and other processes send `VaultMessage` over a network — is **not
-supported** (ADR-019, OQ-21). The irpc `RemoteService` trait infrastructure
-exists in the library, but exposing the vault over the network would
-require its own ADR with an explicit threat model (the master seed must
-never cross the network). The dispatch table above lists only the
-supported paths.
+Remote vault dispatch — where the vault is exposed over irpc/iroh to
+workers or other processes — is **deferred** (OQ-21). The `VaultProtocol`
+is already a `RemoteService` by construction (irpc's `#[rpc_requests]`
+generates it), and `DerivedKey`'s dual serialization was designed for this.
+Enabling remote access is a server-setup change (register `IrohProtocol`
+with an ALPN), not a protocol change.
+
+However, the `IrohProtocol` handler that irpc provides forwards all
+message types without auth checks. Remote use needs an **auth-wrapping
+handler** in the assembly layer (not the vault crate — the vault is
+standalone, ADR-018, and can't import alknet-core's auth model) that:
+1. Checks the caller's NodeId against an allowlist
+2. Filters `Unlock` and `Lock` messages from remote callers (local-only)
+3. Forwards remaining messages to the actor
+
+See [protocol.md → Remote Capability](protocol.md#remote-capability) for
+the full design, operation access policy, use case (machine node →
+workers), and breaking-vs-non-breaking analysis.
 
 The assembly layer (CLI binary) uses the direct path. The actor path
-exists for in-process irpc dispatch but is not used by the assembly layer
-— it's available for test harnesses and future in-process service
-patterns. Neither path is on the alknet call protocol (ADR-008, ADR-014).
+exists for in-process irpc dispatch. Neither path is on the alknet call
+protocol (ADR-008, ADR-014) — the vault has no ALPN until a future
+deployment explicitly registers one with an auth-wrapping handler.
 
 ## Errors
 
@@ -328,8 +339,11 @@ error types — the CLI binary converts at the assembly boundary (ADR-018).
 
 See [open-questions.md](../../open-questions.md) for full details.
 
-- **OQ-21** (deferred): Remote vault administration — network unlock is not
-  supported; needs an ADR if ever needed.
+- **OQ-21** (deferred): Remote vault access — the `VaultProtocol` is
+  remote-capable by construction (irpc `RemoteService`). Enabling remote
+  access is a server-setup change with an auth-wrapping handler in the
+  assembly layer. `Unlock`/`Lock` are local-only; other operations are
+  remote-capable. See [protocol.md → Remote Capability](protocol.md#remote-capability).
 
 ## Security Constraints
 