alknet/docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md
glm-5.2 fab2c88444 docs(architecture): rename trusted to internal, add OQ-17 abort cascade and OQ-18 privilege model
The 'trusted' flag on OperationContext was the wrong word — it implies a
trust decision was made, but what actually happens is the call originated
internally (from composition) not externally (from the wire). Renamed to
'internal' with clarified semantics: internal calls switch authority
context to the handler's identity, not skip ACL. This prevents the
privilege escalation vector where composition with 'trusted: true' bypassed
all access control (buggy handler + parameterized dispatch).

- Rename trusted -> internal across operation-registry.md, ADR-014
- Update OperationContext field description and LocalOperationEnv code
- Add OQ-17: abort cascade for nested calls (call.aborted cascades to
  descendants, default abort-dependents, continue-running opt-in). One-way
  door on the protocol event schema; mechanism is a two-way door.
- Add OQ-18: privilege model and authority context (internal = authority
  switch not ACL skip, External/Internal operation visibility, scoped
  composition env + handler identity). Needs agent crate in view.
- Add abort cascade section and constraint to call-protocol.md
- Update crates/call/README.md with OQ-17, OQ-18, and two new design principles
- Update architecture README.md with OQ-17, OQ-18
2026-06-18 07:38:33 +00:00

# ADR-014: Secret Material Flow and Capability Injection
## Status
Accepted
## Context
alknet-vault holds the master seed and can derive keys and encrypt/decrypt
arbitrary data. ADR-008 established that the vault is a **capability source**:
"derived keys and decrypted credentials are injected into operation contexts
at the assembly layer, not passed as vault references to handlers." That
prose was correct but the mechanism was never specified.
The result was a contradiction in the spec documents. ADR-008 said the master
seed never crosses the network, but `operation-registry.md` showed
`vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol
operations — directly on the wire. Those two statements cannot both be true.
The contradiction arose because no injection mechanism existed in the
architecture, so the only way the docs could show a handler obtaining a key was
to expose vault operations over the call protocol.
This is a one-way door. Once secret material crosses the wire as a call
protocol operation, the attack surface is permanent:
- `vault/unlock` accepts a BIP39 mnemonic, the root of trust, over QUIC. One
compromised peer, logging accident, or tracing span that captures the payload
is enough to lose the seed.
- `vault/derive` returns a `DerivedKey`. The type redacts the private key in
JSON today, but the operation's existence means a serialization change, a
binary codec addition, or a wrapper change would leak it. The surface is
the risk, not the current implementation.
- `vault/decrypt` accepts an encrypted blob and returns plaintext. Any
authorized caller can decrypt any blob they possess.
The broader problem this decision addresses is structural: the industry
default for storing LLM provider keys, API tokens, and other credentials is
plaintext config files and environment variables (e.g., the aisdk Rust port
reads `std::env::var("GOOGLE_API_KEY")` and the example backend calls
`dotenv::dotenv()`). alknet replaces that with a vault. But the vault only
solves the storage problem; the flow problem — how decrypted material reaches
the code that needs it without crossing the network — requires its own
decision.
There is a separate, second axis that the current `OperationContext` conflates
with the secret-flow problem. A handler has two orthogonal credential concerns:
- **Identity (inbound)**: who is calling me? Resolved per-request from
`AuthContext` (TLS client cert, auth token). Already in `OperationContext`.
- **Capabilities (outbound)**: what secrets can I use for outbound calls? This
is the missing axis. A handler calling Google's API needs a decrypted Google
API key. That is not the caller's identity — it is the handler's own outbound
credential, provisioned by the assembly layer.
Mixing these two into one channel (e.g., stuffing secrets into
`OperationContext.metadata: HashMap<String, Value>`) is a leak risk: metadata
propagates through nested calls via `OperationEnv::invoke()`, so a secret
placed there by one handler would flow to every downstream operation.
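
The propagation risk can be illustrated with a minimal sketch. The `OperationContext` below is a simplified, hypothetical stand-in (string metadata instead of `Value`), `child()` models what a nested `OperationEnv::invoke()` is described as doing to metadata, and `GoogleHandler` is an illustrative name. The key is a plain `String` here only to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `OperationContext`.
#[derive(Clone, Debug, Default)]
pub struct OperationContext {
    /// Per-request metadata. It is copied into every nested call.
    pub metadata: HashMap<String, String>,
}

impl OperationContext {
    /// Models a nested invoke: the child context inherits the parent's metadata.
    pub fn child(&self) -> OperationContext {
        self.clone()
    }
}

/// A handler that owns its outbound credential (illustrative name).
/// The credential is a field of the handler, not part of the context.
pub struct GoogleHandler {
    api_key: String,
}

impl GoogleHandler {
    pub fn new(api_key: String) -> Self {
        Self { api_key }
    }

    /// Builds the context for a downstream call. The key is used locally
    /// (here: to format a header) and is never copied into the context.
    pub fn call_downstream(&self, ctx: &OperationContext) -> OperationContext {
        let _authorization = format!("Bearer {}", self.api_key);
        ctx.child()
    }
}

/// The anti-pattern: a secret placed in metadata flows to every descendant.
pub fn leaky_call_downstream(ctx: &OperationContext, api_key: &str) -> OperationContext {
    let mut ctx = ctx.clone();
    ctx.metadata
        .insert("google_api_key".to_string(), api_key.to_string());
    ctx.child()
}
```

In this sketch the downstream context produced by `leaky_call_downstream` still contains the key, while the one produced by `GoogleHandler::call_downstream` does not.
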
## Decision
**1. The vault is assembly-layer only.**
The CLI binary (the `alknet` crate, or an embedded assembly layer) is the sole
component that talks to `VaultServiceHandle` directly. It unlocks the vault at
startup, derives and decrypts what each handler needs, and constructs handlers
with the results. No vault operation (`derive`, `decrypt`, `unlock`, `lock`)
is registered as a call protocol operation. The vault has no ALPN. The master
seed and derived private keys never enter the call protocol.
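
A rough sketch of that assembly flow, under stated assumptions: `Vault` is a hypothetical stand-in (the real `VaultServiceHandle` API is not shown in this ADR), its `decrypt` is a placeholder transformation rather than real cryptography, and `LlmHandler` and `assemble` are illustrative names.

```rust
/// Hypothetical stand-in for the vault. Not the real `VaultServiceHandle` API.
pub struct Vault {
    unlocked: bool,
}

impl Vault {
    pub fn locked() -> Self {
        Vault { unlocked: false }
    }

    /// Placeholder unlock: accepts any non-empty mnemonic.
    pub fn unlock(&mut self, mnemonic: &str) -> Result<(), String> {
        if mnemonic.trim().is_empty() {
            return Err("empty mnemonic".to_string());
        }
        self.unlocked = true;
        Ok(())
    }

    /// Placeholder decrypt: reverses the bytes. NOT real cryptography.
    pub fn decrypt(&self, blob: &[u8]) -> Result<Vec<u8>, String> {
        if !self.unlocked {
            return Err("vault is locked".to_string());
        }
        Ok(blob.iter().rev().copied().collect())
    }
}

/// Illustrative handler that holds only the decrypted result.
pub struct LlmHandler {
    api_key: Vec<u8>,
}

impl LlmHandler {
    pub fn new(api_key: Vec<u8>) -> Self {
        Self { api_key }
    }

    pub fn has_credential(&self) -> bool {
        !self.api_key.is_empty()
    }
}

/// Assembly-layer function: the only code in this sketch that touches the vault.
/// The handler receives the decrypted key, never a vault reference.
pub fn assemble(mnemonic: &str, encrypted_key: &[u8]) -> Result<LlmHandler, String> {
    let mut vault = Vault::locked();
    vault.unlock(mnemonic)?;
    let api_key = vault.decrypt(encrypted_key)?;
    Ok(LlmHandler::new(api_key))
}
```

The point of the shape is that `LlmHandler` has no field or parameter through which it could reach the vault; everything it needs is resolved before it is constructed.
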
**2. Capabilities are the injection mechanism.**
A `Capabilities` type carries outbound secret material from the assembly layer
into handlers. Capabilities are distinct from identity (inbound auth) and
distinct from per-request metadata. The concrete shape of the `Capabilities`
type is a two-way door — to be decided during implementation of the
`alknet-call` crate. The one-way constraint is:
- Capabilities hold non-serializable, zeroized secret material. They cannot
cross the call protocol wire even by accident — they are not
`serde_json::Value`, they do not implement `Serialize`, and they do not
appear in `EventEnvelope` payloads.
- Capabilities are injected at handler construction (the common case: a static
decrypted API key held for the handler's lifetime) or scoped per-request for
internal-only flows. They are never populated from call protocol
inputs.
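
Since the concrete shape is left open, the following is only one possible sketch of the one-way constraint, using the standard library alone. `Secret`, `Capabilities::new`, and `llm_api_key` are hypothetical names. The hand-rolled wipe in `Drop` is a best-effort illustration; a real implementation would more likely rely on a dedicated crate such as `zeroize`.

```rust
use std::fmt;

/// Hypothetical secret container. It deliberately implements neither
/// `Clone` nor any serialization trait, so it cannot be turned into a
/// wire payload by accident.
pub struct Secret(Vec<u8>);

impl Secret {
    pub fn new(bytes: Vec<u8>) -> Self {
        Secret(bytes)
    }

    /// Explicit, greppable access to the raw bytes.
    pub fn expose(&self) -> &[u8] {
        &self.0
    }
}

/// Redacted `Debug` so the secret cannot leak through logs or tracing.
impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret(<redacted>)")
    }
}

/// Best-effort wipe of the buffer when the secret is dropped.
impl Drop for Secret {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // Volatile write so the compiler does not elide the wipe.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}

/// Hypothetical capabilities bundle handed to a handler at construction.
#[derive(Debug)]
pub struct Capabilities {
    llm_api_key: Secret,
}

impl Capabilities {
    pub fn new(llm_api_key: Secret) -> Self {
        Self { llm_api_key }
    }

    pub fn llm_api_key(&self) -> &Secret {
        &self.llm_api_key
    }
}
```

Because `Secret` has no serialization impl, a struct that embeds it cannot derive one either, which turns the "not on the wire" rule into a compile-time property for any type that holds capabilities.
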
**3. The call protocol carries no secret material.**
This is a wire-level constraint on the call protocol, not a handler-level
convention. Secret material (private keys, API keys, mnemonics, decrypted
credentials, raw tokens) must not appear in:
- `call.requested` payloads (inputs)
- `call.responded` payloads (outputs)
- `OperationContext.metadata`
The wire format cannot enforce this on its own, because it carries
`serde_json::Value`. The constraint is therefore architectural: it is enforced
by the operation registry and by convention, not by the codec. Operations that
need to share public key material (e.g., for identity verification) use a
dedicated operation that returns only the public component, never the private
key.
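
One way the registry side of that enforcement could look is a registration-time guard. This is a sketch under assumptions, not the registry described in `operation-registry.md`: `OperationRegistry` here is a hypothetical name-only registry, the `vault/` prefix check is an illustrative policy, and `identity/public-key` is an invented operation name.

```rust
use std::collections::BTreeSet;

/// Hypothetical, name-only operation registry.
#[derive(Default)]
pub struct OperationRegistry {
    names: BTreeSet<String>,
}

impl OperationRegistry {
    /// Registers an operation name, refusing anything in the `vault/` namespace.
    /// The prefix rule is an assumption made for this sketch.
    pub fn register(&mut self, name: &str) -> Result<(), String> {
        if name == "vault" || name.starts_with("vault/") {
            return Err(format!(
                "refusing to register `{}`: vault operations are assembly-layer only",
                name
            ));
        }
        if !self.names.insert(name.to_string()) {
            return Err(format!("`{}` is already registered", name));
        }
        Ok(())
    }

    pub fn contains(&self, name: &str) -> bool {
        self.names.contains(name)
    }
}
```

A guard like this only covers operation names. It says nothing about payload contents, which is why the constraint also depends on convention.
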
**4. Adapters take credential sources, not static tokens.**
The `from_openapi` and `from_jsonschema` adapter patterns (defined in Rust in
alknet-call per ADR-013) register HTTP-backed operations. The TypeScript
`@alkdev/operations` `from_openapi` takes `config.auth: { token: "..." }` — a
static string. The Rust adapters take a credential source wired to the assembly
layer (a resolver, a capability handle, or an injected secret), not a literal
token. This is the integration point where the vault feeds credentials into
HTTP-backed operations: the assembly layer decrypts the token at startup and
provides it to the adapter at registration time.
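
As a sketch of what "credential source, not static token" might mean at the type level: the trait `CredentialSource`, the `InjectedToken` implementation, and `HttpOperation::register` below are hypothetical names, not the `alknet-call` API. The token is returned as a plain `String` to keep the example short; it performs no HTTP.

```rust
use std::sync::Arc;

/// Hypothetical credential source supplied by the assembly layer.
pub trait CredentialSource: Send + Sync {
    fn bearer_token(&self) -> Result<String, String>;
}

/// Simplest source: a token the assembly layer already decrypted at startup.
pub struct InjectedToken(String);

impl InjectedToken {
    pub fn new(token: String) -> Self {
        InjectedToken(token)
    }
}

impl CredentialSource for InjectedToken {
    fn bearer_token(&self) -> Result<String, String> {
        Ok(self.0.clone())
    }
}

/// Hypothetical HTTP-backed operation produced by an adapter.
pub struct HttpOperation {
    name: String,
    base_url: String,
    credentials: Arc<dyn CredentialSource>,
}

impl HttpOperation {
    /// The adapter receives a source at registration time, never a literal token.
    pub fn register(name: &str, base_url: &str, credentials: Arc<dyn CredentialSource>) -> Self {
        Self {
            name: name.to_string(),
            base_url: base_url.to_string(),
            credentials,
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn base_url(&self) -> &str {
        &self.base_url
    }

    /// Resolves the credential when an outbound request is built.
    pub fn authorization_header(&self) -> Result<(String, String), String> {
        let token = self.credentials.bearer_token()?;
        Ok(("Authorization".to_string(), format!("Bearer {}", token)))
    }
}
```

Taking a trait object rather than a `String` leaves room for the other options named above (a resolver or a capability handle) without changing the adapter signature.
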
**5. Handlers that need per-request vault access receive a scoped capability.**
The common case (a static decrypted API key) is covered by construction-time
injection. A narrower case — a handler that derives a child key for a specific
operation (e.g., signing for GitHub authentication) — receives a
scoped capability that can only derive at a restricted path set. This is still
not a vault reference: it is a restricted handle that performs a specific
derivation and returns the result to the handler, in-process. The handler
never sees the master seed. Whether this scoped capability is a distinct type
or modeled as a pre-derived key injected at construction is a two-way door
left to the `alknet-call` and `alknet-agent` crate specs.
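
If the distinct-type option were chosen, a restricted handle might look roughly like the sketch below. `ScopedDeriver`, the path strings, and `placeholder_derive` are all hypothetical. The derivation is a placeholder that concatenates bytes and is not a real key-derivation function.

```rust
use std::collections::BTreeSet;

/// Hypothetical scoped capability: it can derive only at an allow-listed
/// set of paths, and it never exposes the seed it was built from.
pub struct ScopedDeriver {
    allowed_paths: BTreeSet<String>,
    derive_fn: Box<dyn Fn(&str) -> Vec<u8> + Send + Sync>,
}

impl ScopedDeriver {
    /// Built by the assembly layer, which captures vault access inside `derive_fn`.
    pub fn new<I, F>(allowed_paths: I, derive_fn: F) -> Self
    where
        I: IntoIterator<Item = &'static str>,
        F: Fn(&str) -> Vec<u8> + Send + Sync + 'static,
    {
        Self {
            allowed_paths: allowed_paths.into_iter().map(String::from).collect(),
            derive_fn: Box::new(derive_fn),
        }
    }

    /// Derives in-process, refusing any path outside the configured scope.
    pub fn derive(&self, path: &str) -> Result<Vec<u8>, String> {
        if !self.allowed_paths.contains(path) {
            return Err(format!("path `{}` is outside this capability's scope", path));
        }
        Ok((self.derive_fn)(path))
    }
}

/// Placeholder derivation for the sketch: seed bytes followed by path bytes.
/// NOT a real KDF; a real vault would perform the derivation.
pub fn placeholder_derive(seed: Vec<u8>) -> impl Fn(&str) -> Vec<u8> + Send + Sync + 'static {
    move |path: &str| seed.iter().chain(path.as_bytes()).copied().collect()
}
```

The handler holding a `ScopedDeriver` can request keys only at the paths chosen at assembly time; the seed lives inside the closure and has no accessor.
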
## Consequences
**Positive:**
- The master seed and derived private keys never cross the network. The attack
surface for the root of trust is local-only.
- The `OperationContext` gains a clean second axis (capabilities) instead of
overloading `metadata` for secrets, preventing accidental propagation of
secret material through nested calls.
- Handlers that need outbound credentials (the agent handler calling an LLM
provider) receive them directly: no indirection through a `vault/derive`
call, no extra round trip, and no failure mode where the vault must be
reachable at call time.
- The adapter contract (OQ-15) gains a concrete shape: adapters take a
credential source from the assembly layer, not a static token. This makes
the `from_openapi` / `from_jsonschema` / `from_call` patterns safe by
construction.
- The model is structurally incompatible with the env-var / plaintext-config
default. There is no `std::env::var("API_KEY")` path — the only way a handler
gets a credential is through a capability, and the only way a capability is
populated is through the assembly layer from the vault.
**Negative:**
- The assembly layer (CLI binary) has more construction-time responsibility: it
must know which handlers need which credentials and wire them. This is
expected — the CLI assembles everything (ADR-008).
- Adding a new handler that needs a new credential requires updating the
assembly layer, not just registering an operation. This is a feature, not a
bug: it forces an explicit decision about what secret material a handler
needs.
- Remote vault administration (unlock a running node's vault over the network)
is not supported by this decision. If that capability is needed in the
future, it would require a separate, heavily restricted mechanism (admin
scope, mTLS-only, never expose the mnemonic over an unauthenticated channel)
and its own ADR. This decision does not close that door; it simply does not
open it.
- The `Capabilities` type shape is not fully specified here. The one-way
constraint (non-serializable, zeroized, injection-only) is fixed; the
concrete API is a two-way door for the `alknet-call` spec.
## Assumptions
These are the load-bearing assumptions. If any of them breaks, the decision
should be revisited:
1. **Handlers need credentials that are known at construction time, not
dynamically discovered at call time.** If a handler needs to derive a key
at an unpredictable path determined by call input, the scoped-capability
model still covers it (the handler holds a scoped vault access), but the
surface area is larger. The assumption is that this case is rare.
2. **The call protocol's threat model excludes the assembly layer.** The CLI
binary is trusted to hold the vault handle and inject capabilities. If the
assembly layer is compromised, all handlers' capabilities are compromised.
This is the same trust boundary as ADR-008.
3. **No legitimate use case requires returning a private key over the wire.**
Public key sharing (identity verification, encryption to a recipient) is
the only cross-node key material flow. If a use case for returning a
private key emerges (e.g., a key-escrow service), it needs its own ADR and a
very different threat model.
4. **Adapters are registered at startup, not at call time.** The credential
source is wired to the adapter when the operation is registered, not when
the operation is invoked. This is consistent with OQ-04 (static
registration at startup).
## References
- ADR-003: Crate decomposition (alknet-vault is standalone)
- ADR-005: irpc as call protocol foundation
- ADR-008: Vault integration point (capability source — this ADR specifies the
mechanism that ADR-008 described in prose)
- ADR-009: One-way door decision framework
- ADR-013: Rust as canonical implementation language
- OQ-15: Call protocol client and adapter contract (this ADR constrains the
adapter contract: adapters take credential sources, not static tokens)
- OQ-16: Safe vault operations for call protocol exposure (resolved by this
ADR: none, for now)
- alknet-vault implementation: `crates/alknet-vault/`