From 9087f0579f71371fa32f5b330ca4b0f9f08ea5a0 Mon Sep 17 00:00:00 2001
From: "glm-5.2"
Date: Sat, 20 Jun 2026 06:48:23 +0000
Subject: [PATCH] docs(architecture): document vault remote capability, enrich OQ-21
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.

OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
  auth checks; needs NodeId allowlist + message filtering in assembly
  layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
  remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
  evolution = wire break, manageable via ALPN versioning)

The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not
in the vault crate itself — the vault is standalone (ADR-018) and can't
import alknet-core's auth model.

OQ-21 remains deferred — no commitment to implement, but the door is
open and the design space is mapped.
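The following is a minimal sketch of the admission check that the auth-wrapping handler would run before forwarding a message. It rests on stated assumptions: `VaultOp`, `NodeId` (modeled as raw 32-byte Ed25519 key bytes), `Rejection`, and `RemoteVaultPolicy` are hypothetical types for illustration only. They are not irpc, iroh, or alknet-vault APIs. The sketch applies the two checks listed above in order: first the NodeId allowlist, then the Unlock/Lock filter. Forwarding an accepted message to VaultServiceActor is omitted because that step depends on the irpc handler API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the VaultMessage variants listed in the
/// operation access policy (request payloads omitted).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VaultOp {
    Unlock,
    Lock,
    DeriveEd25519,
    DeriveEncryptionKey,
    DeriveEthereumKey,
    DerivePassword,
    Encrypt,
    Decrypt,
}

/// Hypothetical: a NodeId modeled as raw 32-byte Ed25519 public key bytes.
type NodeId = [u8; 32];

/// Why a remote request was refused.
#[derive(Debug, PartialEq, Eq)]
enum Rejection {
    /// The caller's NodeId is not on the worker allowlist.
    UnknownNode,
    /// The operation is local-only and must not be served remotely.
    LocalOnly(VaultOp),
}

/// Hypothetical policy object held by the assembly-layer listener.
struct RemoteVaultPolicy {
    allowed_workers: HashSet<NodeId>,
}

impl RemoteVaultPolicy {
    /// Returns Ok(()) if the request may be forwarded to the vault actor.
    fn check(&self, caller: &NodeId, op: VaultOp) -> Result<(), Rejection> {
        // Step 1: the authenticated NodeId must be a known worker.
        if !self.allowed_workers.contains(caller) {
            return Err(Rejection::UnknownNode);
        }
        // Step 2: Unlock and Lock are local-only; everything else may pass.
        match op {
            VaultOp::Unlock | VaultOp::Lock => Err(Rejection::LocalOnly(op)),
            _ => Ok(()),
        }
    }
}

fn main() {
    let worker_a: NodeId = [1u8; 32];
    let stranger: NodeId = [9u8; 32];
    let policy = RemoteVaultPolicy {
        allowed_workers: HashSet::from([worker_a]),
    };

    // A known worker may decrypt, but may not unlock the vault.
    assert_eq!(policy.check(&worker_a, VaultOp::Decrypt), Ok(()));
    assert_eq!(
        policy.check(&worker_a, VaultOp::Unlock),
        Err(Rejection::LocalOnly(VaultOp::Unlock))
    );
    // An unknown node is refused before the operation is considered.
    assert_eq!(policy.check(&stranger, VaultOp::Encrypt), Err(Rejection::UnknownNode));
    println!("remote vault policy checks passed");
}
```

The caller's NodeId is known from the QUIC connection before any message is read. The allowlist half of this check could therefore plausibly run once per connection, leaving only the per-message Unlock/Lock filter.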
---
 docs/architecture/README.md                |   2 +-
 docs/architecture/crates/vault/README.md   |   2 +-
 docs/architecture/crates/vault/protocol.md | 140 ++++++++++++++++++++-
 docs/architecture/crates/vault/service.md  |  38 ++++--
 docs/architecture/open-questions.md        |  36 +++++-
 docs/architecture/overview.md              |   2 +-
 6 files changed, 199 insertions(+), 21 deletions(-)

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index 8768b45..0f3d819 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -87,7 +87,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
 - **OQ-10**: Git adapter scope — start with smart protocol, add ERC721 later
-- **OQ-21**: Remote vault administration — network unlock not supported; needs ADR if ever needed
+- **OQ-21**: Remote vault access — protocol is remote-capable by construction (irpc `RemoteService`); enabling is a server-setup change with an auth-wrapping handler in the assembly layer; `Unlock`/`Lock` are local-only
 
 ## Document Lifecycle
 
diff --git a/docs/architecture/crates/vault/README.md b/docs/architecture/crates/vault/README.md
index 0e9e502..e028281 100644
--- a/docs/architecture/crates/vault/README.md
+++ b/docs/architecture/crates/vault/README.md
@@ -51,7 +51,7 @@ cross the network.
 | OQ | Title | Status | Relevance |
 |----|-------|--------|-----------|
 | OQ-20 | Encryption key derivation | resolved (ADR-020) | HD derivation from seed; salt field unused in v2 |
-| OQ-21 | Remote vault administration | deferred | Network unlock not supported; needs ADR if ever needed |
+| OQ-21 | Remote vault access | deferred | Protocol is remote-capable by construction; enabling = server-setup change with auth-wrapping handler; Unlock/Lock local-only |
 | OQ-22 | Key rotation mechanism | resolved (ADR-021) | Version-indexed paths; `rotate` method |
 
 ## Key Design Principles
diff --git a/docs/architecture/crates/vault/protocol.md b/docs/architecture/crates/vault/protocol.md
index 3abc7da..2cb62ff 100644
--- a/docs/architecture/crates/vault/protocol.md
+++ b/docs/architecture/crates/vault/protocol.md
@@ -154,15 +154,153 @@ the vault is wrapped in an operation that serializes to JSON — but
 **no vault operations are exposed over the call protocol** (ADR-014).
 The JSON serialization path exists only for the `DerivedKey` redaction
 safety net.
+## Remote Capability
+
+The `VaultProtocol` is a remote-capable irpc service **by construction**.
+The `#[rpc_requests]` macro generates both `Service` (local) and
+`RemoteService` (remote) trait implementations. The `VaultServiceActor`
+processes `VaultMessage` variants identically regardless of transport —
+the only difference between local and remote use is the `Client`
+construction and the server-side listener setup.
+
+This was a purposeful design decision: irpc's "zero-overhead local,
+transparent remote" architecture means the same protocol definition and
+actor code work for both in-process and cross-network dispatch. Enabling
+remote vault access is a server-setup change, not a protocol change.
+
+### What's already in place
+
+- **Protocol**: `VaultProtocol` is already a `RemoteService`. No code
+  changes needed in the protocol definition.
+- **Serialization**: `DerivedKey`'s dual serialization (JSON redacts private
+  key for safety; postcard preserves bytes for remote dispatch) was
+  designed for this use case.
+- **Actor**: `VaultServiceActor` already processes all message types. The
+  actor is transport-agnostic — it doesn't know whether a message arrived
+  via a local mpsc channel or a remote QUIC stream.
+- **Auth transport**: irpc over iroh uses iroh's QUIC connections, which
+  authenticate via NodeId (Ed25519, RFC 7250 raw keys) — the same identity
+  model as the rest of alknet (ADR-010). The connection-level identity
+  ("which NodeId is calling") is available before any vault operation is
+  dispatched.
+
+### What's not in place (the gap)
+
+The `IrohProtocol` handler that irpc provides forwards **all** message
+types to the actor without auth checks. For local use this is correct
+(the assembly layer is trusted). For remote use, the listener needs:
+
+1. **NodeId allowlist**: only known worker NodeIds may connect.
+2. **Message filtering**: reject `Unlock` and `Lock` from remote callers
+   (see "Operation access policy" below).
+3. **Then** forward to the actor.
+
+This auth-wrapping handler cannot live in the vault crate — the vault is
+standalone (ADR-018) and depends on no alknet crate. The auth model
+(`IdentityProvider`, `Identity`, scopes) lives in alknet-core. The
+auth-wrapping listener lives in the **assembly layer** (the CLI binary)
+or a dedicated vault-server crate that depends on both alknet-core and
+alknet-vault. This is the same pattern as ADR-019: the vault is a
+library, the assembly layer is the integrator.
+
+```
+alknet-vault (standalone, no alknet deps)
+  - VaultProtocol (RemoteService by construction)
+  - VaultServiceActor (processes all message types, no auth)
+  - VaultServiceHandle (direct API)
+
+assembly layer / vault-server (depends on alknet-core + alknet-vault)
+  - AuthWrappingHandler: checks NodeId, filters message types, forwards
+  - IrohProtocol::new(auth_wrapping_handler)
+  - Router::builder(endpoint).accept(b"alknet/vault", protocol).spawn()
+```
+
+### Operation access policy
+
+Not all `VaultProtocol` operations are safe to expose remotely. The vault
+spec defines the policy; the assembly-layer listener enforces it.
+
+| Operation | Local (assembly layer) | Remote (workers) | Why |
+|-----------|----------------------|-------------------|-----|
+| `Unlock` | ✅ | ❌ | Sends the mnemonic (root of trust) over the wire. Even with NodeId auth, the mnemonic in transit is a different threat model — it's in memory on the receiving end, potentially in logs/traces. Local-only. |
+| `Lock` | ✅ | ❌ | Locking the vault bricks the machine node for all workers. A compromised or buggy worker could DoS the entire machine node. Local-only. |
+| `DeriveEd25519` | ✅ | ✅ | Workers need derived keys for signing, identity. The derivation path is the access control — the worker can only derive at paths the assembly layer declares. |
+| `DeriveEncryptionKey` | ✅ | ✅ | Workers need encryption keys for credential encryption. Same path-based access control. |
+| `DeriveEthereumKey` | ✅ | ✅ | Same as DeriveEd25519, for Ethereum signing. |
+| `DerivePassword` | ✅ | ✅ | Workers need deterministic passwords for service credentials. |
+| `Encrypt` | ✅ | ✅ | Workers encrypt external credentials (API keys) for storage. |
+| `Decrypt` | ✅ | ✅ | Workers decrypt stored credentials at call time. |
+
+The policy is: **`Unlock` and `Lock` are local-only; all other operations
+are remote-capable.** The assembly-layer listener filters `Unlock` and
+`Lock` messages from remote connections and returns an error.
+
+### Use case: machine node → workers
+
+The primary use case is a **machine node** (long-lived, holds the mnemonic,
+manages container services) exposing a restricted vault API to its
+**workers** (ephemeral, containerized, no mnemonic):
+
+```
+Machine Node (head, vault unlocked locally)
+├── exposes alknet/vault ALPN to workers
+├── NodeId allowlist: only known worker NodeIds may connect
+├── message filter: rejects Unlock/Lock from remote callers
+│
+├── Worker A (no mnemonic)
+│   └── calls DeriveEd25519, Encrypt, Decrypt on machine node's vault
+│
+└── Worker B (also a head for its own sub-workers)
+    ├── gets its own credentials from machine node's vault
+    └── can expose its own restricted vault API to sub-workers
+```
+
+Workers don't hold mnemonics. They get static credentials injected at
+construction (the common case) and call the machine node's vault for
+dynamic derivation or decryption when needed. This is the
+defense-in-depth (Russian doll) model: the seed is the innermost layer,
+the machine node's vault is the next, iroh's NodeId auth is the outer,
+and workers are outside that — calling in through authenticated channels.
+
+### Per-machine-node vaults, not shared
+
+Each machine node has its own vault and mnemonic. Machine nodes do not
+share vaults with each other. Compromising one machine node exposes only
+that node's workers, not all nodes. This is compartmentalization — the
+blast radius of a vault compromise is one machine node, not the entire
+fleet.
+
+The remote vault capability is for the **machine→worker** relationship,
+not for cross-machine-node sharing. Machine nodes don't expose their
+vaults to peer machine nodes — only to their own workers, authenticated
+by NodeId.
+
+### What's breaking vs. non-breaking
+
+| Change | Breaking? | Why |
+|--------|-----------|-----|
+| Enabling remote vault access | **No** | Server-setup change — register `IrohProtocol` with an ALPN. The protocol is already a `RemoteService`. |
+| Restricting which operations are remote-capable | **No** | Policy in the assembly-layer handler, not a protocol change. |
+| Adding NodeId auth checks | **No** | Implementation in the assembly-layer handler. The vault crate doesn't change. |
+| Adding new `VaultProtocol` variants | **Yes (wire break)** | Inherent to irpc — versioning is a non-goal. Would need ALPN versioning (`alknet/vault/v2`) if the protocol evolves. Same constraint as any irpc service. |
+| Changing `DerivedKey` serialization | **No** | Dual serialization is already in place — postcard preserves bytes for remote, JSON redacts for safety. |
+
+The only breaking change is evolving the `VaultProtocol` enum itself, and
+that's manageable with ALPN versioning (`alknet/vault`, then
+`alknet/vault/v2` if needed) — the same pattern alknet uses for all ALPN
+protocols (ADR-006).
+
 ## Design Decisions
 
 | Decision | ADR | Summary |
 |----------|-----|---------|
-| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch |
+| irpc for vault dispatch | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | In-process type-safe dispatch; remote-capable by construction |
 | `DerivedKey` is move-only | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Prevents accidental duplication of secret material |
 | JSON redacts private key | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Defense-in-depth for logging accidents |
 | postcard preserves private key | — | Required for in-cluster irpc dispatch |
 | No vault operations on call protocol | [ADR-008](../../decisions/008-secret-service-integration.md), [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Master seed never crosses the network |
+| Unlock/Lock are local-only | OQ-21 (deferred) | Mnemonic and lock control must not be remotely accessible |
+| Auth wrapping lives in assembly layer | [ADR-018](../../decisions/018-vault-standalone-crate.md), [ADR-019](../../decisions/019-vault-assembly-layer-only.md) | Vault is standalone; can't import alknet-core's auth model |
 
 ## Open Questions
 
diff --git a/docs/architecture/crates/vault/service.md b/docs/architecture/crates/vault/service.md
index 286251a..ed67996 100644
--- a/docs/architecture/crates/vault/service.md
+++ b/docs/architecture/crates/vault/service.md
@@ -280,18 +280,29 @@ in-process pattern (the actor and the handle share state via `Arc`).
 | Direct (in-process) | `VaultServiceHandle` method calls | None | CLI binary at startup (the supported path) |
 | Actor (in-process) | `VaultMessage` over mpsc | None (channel) | irpc service dispatch (in-process) |
 
-Remote (in-cluster) vault dispatch — where the vault runs as a sidecar
-and other processes send `VaultMessage` over a network — is **not
-supported** (ADR-019, OQ-21). The irpc `RemoteService` trait infrastructure
-exists in the library, but exposing the vault over the network would
-require its own ADR with an explicit threat model (the master seed must
-never cross the network). The dispatch table above lists only the
-supported paths.
+Remote vault dispatch — where the vault is exposed over irpc/iroh to
+workers or other processes — is **deferred** (OQ-21). The `VaultProtocol`
+is already a `RemoteService` by construction (irpc's `#[rpc_requests]`
+generates it), and `DerivedKey`'s dual serialization was designed for this.
+Enabling remote access is a server-setup change (register `IrohProtocol`
+with an ALPN), not a protocol change.
+
+However, the `IrohProtocol` handler that irpc provides forwards all
+message types without auth checks. Remote use needs an **auth-wrapping
+handler** in the assembly layer (not the vault crate — the vault is
+standalone, ADR-018, and can't import alknet-core's auth model) that:
+1. Checks the caller's NodeId against an allowlist
+2. Filters `Unlock` and `Lock` messages from remote callers (local-only)
+3. Forwards remaining messages to the actor
+
+See [protocol.md → Remote Capability](protocol.md#remote-capability) for
+the full design, operation access policy, use case (machine node →
+workers), and breaking-vs-non-breaking analysis.
 
 The assembly layer (CLI binary) uses the direct path. The actor path
-exists for in-process irpc dispatch but is not used by the assembly layer
-— it's available for test harnesses and future in-process service
-patterns. Neither path is on the alknet call protocol (ADR-008, ADR-014).
+exists for in-process irpc dispatch. Neither path is on the alknet call
+protocol (ADR-008, ADR-014) — the vault has no ALPN until a future
+deployment explicitly registers one with an auth-wrapping handler.
 
 ## Errors
 
@@ -328,8 +339,11 @@ error types — the CLI binary converts at the assembly boundary (ADR-018).
 
 See [open-questions.md](../../open-questions.md) for full details.
 
-- **OQ-21** (deferred): Remote vault administration — network unlock is not
-  supported; needs an ADR if ever needed.
+- **OQ-21** (deferred): Remote vault access — the `VaultProtocol` is
+  remote-capable by construction (irpc `RemoteService`). Enabling remote
+  access is a server-setup change with an auth-wrapping handler in the
+  assembly layer. `Unlock`/`Lock` are local-only; other operations are
+  remote-capable. See [protocol.md → Remote Capability](protocol.md#remote-capability).
 
 ## Security Constraints
 
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
index 753ce8b..3178e16 100644
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -251,12 +251,38 @@ These questions are acknowledged but not active. They will be promoted to open w
 
 ### OQ-21: Remote Vault Administration
 
-- **Origin**: [service.md](crates/vault/service.md), ADR-019
+- **Origin**: [service.md](crates/vault/service.md), [protocol.md](crates/vault/protocol.md), ADR-019
 - **Status**: deferred
-- **Door type**: One-way (if implemented)
-- **Priority**: low
-- **Resolution**: Network unlock of a running node's vault is not supported (ADR-008, ADR-019). The vault is unlocked at startup by the CLI binary from a local mnemonic prompt or file. If a future use case requires remote vault administration (e.g., unlocking a headless node's vault over the network), it requires a separate, heavily restricted mechanism: admin scope (ADR-015), mTLS-only (never expose the mnemonic over an unauthenticated channel), and its own ADR with an explicit threat model. This decision does not close that door; it simply does not open it. Deferred because no current use case requires it.
-- **Cross-references**: ADR-008, ADR-014, ADR-019, [service.md](crates/vault/service.md)
+- **Door type**: One-way (if implemented — wire format exposure), two-way (enabling is non-breaking)
+- **Priority**: medium
+- **Resolution**: The `VaultProtocol` is a remote-capable irpc service by construction — the `#[rpc_requests]` macro generates both `Service` (local) and `RemoteService` (remote) trait implementations. `DerivedKey`'s dual serialization (JSON redacts private key for safety; postcard preserves bytes for remote dispatch) was designed for this. Enabling remote vault access is a server-setup change (register `IrohProtocol` with an ALPN), not a protocol change.
+
+  **What's already in place:**
+  - Protocol: `VaultProtocol` is already a `RemoteService`
+  - Serialization: `DerivedKey` redacts in JSON, preserves in postcard
+  - Actor: `VaultServiceActor` processes all message types, transport-agnostic
+  - Auth transport: irpc over iroh uses iroh's QUIC connections (NodeId auth, RFC 7250 raw keys)
+
+  **What's not in place (the gap):**
+  - The `IrohProtocol` handler forwards all message types without auth checks
+  - Remote use needs: (1) NodeId allowlist, (2) message filtering (reject `Unlock`/`Lock` from remote callers), (3) forwarding to the actor
+  - This auth-wrapping handler cannot live in the vault crate (standalone, ADR-018) — it needs alknet-core's auth model (`IdentityProvider`, scopes). It lives in the assembly layer or a dedicated vault-server crate that depends on both alknet-core and alknet-vault.
+
+  **Operation access policy:**
+  - `Unlock` and `Lock` are local-only (mnemonic and lock control must not be remotely accessible)
+  - All other operations (`DeriveEd25519`, `DeriveEncryptionKey`, `DeriveEthereumKey`, `DerivePassword`, `Encrypt`, `Decrypt`) are remote-capable
+  - The policy is documented in the vault spec; the assembly-layer listener enforces it
+
+  **Use case:** machine node (head, holds mnemonic) exposes restricted vault API to workers (ephemeral, no mnemonic) over irpc/iroh. Per-machine-node vaults, not shared — compartmentalization limits blast radius.
+
+  **What's breaking vs. non-breaking:**
+  - Enabling remote access: non-breaking (server-setup change)
+  - Restricting operations / adding auth: non-breaking (handler policy)
+  - Adding new `VaultProtocol` variants: wire break (inherent to irpc; manageable via ALPN versioning `alknet/vault/v2`)
+  - Changing `DerivedKey` serialization: non-breaking (dual serialization already in place)
+
+  **Why deferred:** the capability is available and the use cases are clear (machine→worker credential access), but no current deployment needs it. The door is left open intentionally — irpc's remote support was chosen for this reason. When a use case materializes, the assembly-layer auth-wrapping handler is the implementation task, not a protocol change. The vault spec documents the policy (which operations are remote-capable) so the future implementer has clear guidance.
+- **Cross-references**: ADR-005, ADR-008, ADR-014, ADR-018, ADR-019, [protocol.md](crates/vault/protocol.md), [service.md](crates/vault/service.md)
 
 ### OQ-22: Key Rotation Mechanism
 
diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index f3e4d30..b7d44dc 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -224,7 +224,7 @@ Open questions are tracked in [open-questions.md](open-questions.md). Key questi
 - **OQ-08**: Vault integration point (resolved: CLI-embedded, assembly-layer only — see ADR-008, ADR-014, ADR-018, ADR-019)
 - **OQ-16**: Safe vault operations for call protocol exposure (resolved: none for now — see ADR-014)
 - **OQ-20**: Encryption key derivation (resolved: HD derivation, not PBKDF2 — see ADR-020)
-- **OQ-21**: Remote vault administration (deferred: network unlock not supported — see ADR-019)
+- **OQ-21**: Remote vault access (deferred: protocol is remote-capable; enabling = server-setup + auth-wrapping handler; Unlock/Lock local-only — see [protocol.md](crates/vault/protocol.md#remote-capability))
 - **OQ-22**: Key rotation (resolved: version-indexed paths, `rotate` method — see ADR-021)
 
 ## Failure Modes