alknet/docs/architecture/decisions/021-key-rotation-via-version-indexed-paths.md
glm-5.2 c62a6adc7b docs(architecture): resolve review #002 Tiers 1-3 — mechanical and consistency fixes
Governance (Tier 2):
- Advance ADR-022 and ADR-023 from Proposed to Accepted (specs already
  depend on their types as source of truth)
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
  update handler_identity type to CompositionAuthority
- Amend ADR-002: note handle() signature revised by ADR-007 (BiStream → Connection)
- Amend ADR-004: note 'enrich/replace' AuthContext language superseded by
  ADR-011's immutability model; update to describe set_identity on Connection
- Update main README ADR table to show ADR-022/023 as Accepted

Spec-ADR consistency (Tier 3):
- Add abort_policy: AbortPolicy field to OperationContext struct (ADR-016
  Decision 6 mandated this but the spec omitted it)
- Define AbortPolicy enum (AbortDependents | ContinueRunning) with Default impl
- Add abort_policy to build_root_context and LocalOperationEnv::invoke()
- Define the OperationEnv trait explicitly with invoke() and
  invoke_with_policy() methods (was referenced as 'must remain a trait'
  but never defined)
- Specify From<StreamError> for HandlerError impl with exact variant mapping
- Add Connection::from_quinn() / from_iroh() constructors (was referenced
  as Connection::new() but never defined)
- Remove undefined CertAuthorityEntry placeholder from AuthPolicy v1 (will
  be added additively when alknet-ssh lands)
- Fix config.md key-differences table: rate limits are in DynamicConfig,
  not StaticConfig

Mechanical fixes (Tier 1):
- overview.md: 'closes the QUIC stream' → 'closes the connection' (stale
  from pre-ADR-007 model)
- overview.md: OQ-04 entry updated from stale 'defer to implementation'
  to 'resolved: static at startup'
- mnemonic-derivation.md: remove duplicate helper functions block (incomplete
  first copy, complete second copy)
- ADR-003: add iroh (feature-gated) to alknet-core dependency list, added
  by ADR-010
- ADR-021: fix ambiguous 'W1 drift issue from the vault review' cross-reference
- ADR-022: rephrase FromCall 'leaf locally' to 'leaf in the local registry'
- ADR-017: add error_schemas to from_call mirror list and services/schema
  step (inconsistency with ADR-023)
- ADR-016: fix self-referential citation ('ADR-016 Assumption 5' → 'Assumption 5')
- Add ScopedOperationEnv::empty(), allows(), new() and
  CompositionAuthority::none(), new() impl blocks (referenced but undefined)
- Add call.completed clarification for non-subscription calls
- Add services/schema leading-slash normalization note
- Crate README ADR tables: add missing ADR-013 (call), ADR-015 (core),
  ADR-006 + ADR-010 (vault)
- Vault README: add consolidated 'Known Source Drift' table tracking all
  four drift items (OsRng, unwrap, CURRENT_KEY_VERSION, spawn bug) in one
  place, including the two previously missing from README
2026-06-22 05:46:37 +00:00


# ADR-021: Key Rotation via Version-Indexed Derivation Paths
## Status
Accepted
## Context
ADR-020 established that the vault derives the AES-256-GCM encryption key
from the BIP39 seed via SLIP-0010 HD derivation at path `m/74'/2'/0'/0'`.
The `EncryptedData.key_version` field exists for rotation tracking, but
the current implementation always derives at the same path regardless of
version — `key_version` is metadata, not a functional selector.
OQ-22 asked: how does key rotation work? The key versioning is in place,
but the rotation mechanism — how a new key is derived, how existing data
is re-encrypted, and how the vault selects the right key for decryption —
is not specified.
### Why rotation matters
Key rotation is a fundamental security hygiene practice. The scenarios
that require it:
1. **Suspected key compromise**: the encryption key may have leaked
(memory dump, process compromise, log accident). All data encrypted
with that key must be re-encrypted with a new key.
2. **Periodic rotation**: security policy mandates key rotation every N
months. The vault must support this without re-deriving from a new
mnemonic (which would require re-deploying all nodes).
3. **Version transition**: moving from TS PBKDF2 data (v1) to vault HD
data (v2, per ADR-020) is itself a rotation. The mechanism should
generalize — it's the same operation.
### What "rotation" means concretely
Rotating from key version N to N+1:
1. Derive a new encryption key at a new derivation path
2. For each existing `EncryptedData` blob with `key_version: N`:
   - Decrypt with the v-N key
   - Re-encrypt the plaintext with the v-(N+1) key
   - Replace the blob in storage with `key_version: N+1`
3. New encryptions use `key_version: N+1`
4. Old keys remain available for decrypting any data that hasn't been
rotated yet (partial rotation is safe)
The question is: **how is the new key derived?** The options:
- **Option A: New derivation path per version.** `m/74'/2'/0'/0'` for v2,
`m/74'/2'/0'/1'` for v3, etc. Each version gets its own HD key. No
new seed needed.
- **Option B: New mnemonic (new seed).** Generate a new mnemonic, unlock
with it, re-encrypt everything. This is heavy — it changes *all* derived
keys (identity, SSH host, etc.), not just the encryption key.
- **Option C: KDF from the existing key.** Use HKDF or PBKDF2 with the
existing derived key + the salt as input. This is the salt field's
potential use (OQ-20 mentioned this), but it adds KDF complexity and
the salt becomes load-bearing.
## Decision
### 1. Version-indexed derivation paths
The vault adopts Option A: each key version maps to a unique derivation
path. The last hardened index in the encryption path encodes the key
version (index = version - 2, because the vault starts at v2):
```
v2: m/74'/2'/0'/0' ← PATHS::ENCRYPTION (current)
v3: m/74'/2'/0'/1'
v4: m/74'/2'/0'/2'
...
```
The `encryption_path_for_version(version)` function constructs the path:
```rust
pub fn encryption_path_for_version(version: u32) -> String {
    // v1 is the TS PBKDF2 legacy, not an HD path. The vault starts at v2.
    // v2 → m/74'/2'/0'/0', v3 → m/74'/2'/0'/1', etc.
    // Note: saturating_sub clamps versions 0 and 1 to index 0 (the v2 path),
    // so callers must not pass a version below 2.
    let index = version.saturating_sub(2);
    format!("m/74'/2'/0'/{}'", index)
}
```
`PATHS::ENCRYPTION` remains `m/74'/2'/0'/0'` — it's the v2 path, and v2
is the current version. When the vault is rotated to v3,
`encryption_path_for_version(3)` produces `m/74'/2'/0'/1'`.
This means:
- No new mnemonic needed — rotation uses the same seed, different path
- Each version's key is cryptographically independent (HD derivation
ensures this)
- The derivation path is self-documenting: the final index is the key
version minus 2, so `m/74'/2'/0'/1'` is the version 3 encryption key
- Old keys are always derivable (the seed doesn't change), so partial
rotation is safe — the vault can decrypt any version
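
The version-to-path mapping above can also be written defensively. The following is a minimal sketch, not part of the vault API: `checked_encryption_path_for_version` and `version_for_encryption_path` are hypothetical names. It assumes that versions 0 and 1 should be rejected rather than clamped, and it relies on the BIP32/SLIP-0010 rule that a hardened index must be below 2^31.

```rust
/// Largest index that can be hardened: hardened child numbers occupy
/// [2^31, 2^32), so the index written before the `'` must be below 2^31.
const MAX_HARDENED_INDEX: u32 = (1 << 31) - 1;

/// First key version that maps to an HD path (v1 is the TS PBKDF2 legacy).
const FIRST_HD_VERSION: u32 = 2;

/// Hypothetical checked variant of `encryption_path_for_version`.
/// Returns `None` for versions without an HD path (0 and 1) and for
/// versions whose index would not fit in a hardened path component.
pub fn checked_encryption_path_for_version(version: u32) -> Option<String> {
    let index = version.checked_sub(FIRST_HD_VERSION)?;
    if index > MAX_HARDENED_INDEX {
        return None;
    }
    Some(format!("m/74'/2'/0'/{}'", index))
}

/// Hypothetical inverse: recover the key version from an encryption path.
/// Returns `None` if the path is not under the encryption prefix.
pub fn version_for_encryption_path(path: &str) -> Option<u32> {
    let index = path
        .strip_prefix("m/74'/2'/0'/")?
        .strip_suffix('\'')?
        .parse::<u32>()
        .ok()?;
    if index > MAX_HARDENED_INDEX {
        return None;
    }
    index.checked_add(FIRST_HD_VERSION)
}
```

Returning `Option` makes a v1 blob fail loudly at path construction instead of silently sharing the v2 path; whether the vault should adopt that behavior is a design choice this ADR does not settle.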
### 2. Version-aware `encrypt` and `decrypt` methods
The `VaultServiceHandle` gains version-aware key derivation:
```rust
impl VaultServiceHandle {
    /// Derive the encryption key for the given version. Cached.
    fn derive_encryption_key_for_version(
        &self,
        version: u32,
    ) -> Result<EncryptionKey, VaultServiceError> {
        let path = encryption_path_for_version(version);
        // ... derive at path, cache by path ...
    }

    /// Encrypt with the key for `key_version`; callers normally pass
    /// `CURRENT_KEY_VERSION`. The version is stamped onto the returned blob.
    pub fn encrypt(&self, plaintext: &str, key_version: u32) -> Result<EncryptedData, VaultServiceError>;

    /// Decrypt by deriving the key at the version indicated by the blob.
    pub fn decrypt(&self, encrypted: &EncryptedData) -> Result<String, VaultServiceError> {
        let key = self.derive_encryption_key_for_version(encrypted.key_version)?;
        encryption::decrypt(encrypted, &key)
    }
}
```
`decrypt` now derives the key at the path **indicated by
`encrypted.key_version`** — not always at `PATHS::ENCRYPTION`. This corrects
a source drift: the current source ignores `key_version` for key selection;
the spec now makes it functional.
### 3. `rotate` method
```rust
impl VaultServiceHandle {
    /// Re-encrypt an EncryptedData blob from one key version to another.
    ///
    /// Decrypts with the key at the blob's current key_version,
    /// re-encrypts with the key at `to_version`. Returns the new
    /// EncryptedData. Does not update storage — the caller replaces the
    /// blob in storage.
    pub fn rotate(
        &self,
        encrypted: &EncryptedData,
        to_version: u32,
    ) -> Result<EncryptedData, VaultServiceError> {
        let plaintext = self.decrypt(encrypted)?;
        self.encrypt(&plaintext, to_version)
    }
}
```
`rotate` is a vault method, not a storage operation. It decrypts and
re-encrypts; the caller (the assembly layer or a migration tool) handles
replacing the blob in storage. This keeps the vault focused on crypto and
the storage system focused on storage.
### 4. `CURRENT_KEY_VERSION` and rotation policy
```rust
pub const CURRENT_KEY_VERSION: u32 = 2;
```
`encrypt()` stamps the version it is given onto new `EncryptedData` blobs;
callers pass `CURRENT_KEY_VERSION` unless they are rotating to a specific
version. The assembly layer decides when to rotate:
- **Manual rotation**: an operator triggers rotation (e.g., a CLI command
`alknet vault rotate --to v3` that loads all blobs, calls `rotate` on
each, and writes them back to storage).
- **No automatic rotation**: the vault does not self-rotate. Rotation is
an operational action, not a runtime behavior. The vault provides the
mechanism; the policy is external.
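
The manual rotation flow described above (load blobs, call `rotate` on each, write back) might look like the following sketch. Everything here is a stand-in: `Blob` models only the `key_version` field of `EncryptedData`, `RotationReport` and `rotate_all` are hypothetical names, and the `rotate` closure takes the place of `VaultServiceHandle::rotate`. A blob that fails to rotate is left at its old version so the pass can be re-run.

```rust
/// Stand-in for the vault's `EncryptedData`; only the field this sketch
/// needs is modelled. The real type carries ciphertext and other metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Blob {
    pub key_version: u32,
    pub payload: String,
}

/// Outcome counters for one rotation pass (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct RotationReport {
    pub rotated: usize,
    pub skipped: usize,
    pub failed: usize,
}

/// Rotate every blob that is not yet at `to_version`, in place.
/// `rotate` stands in for `VaultServiceHandle::rotate`.
pub fn rotate_all<E>(
    blobs: &mut [Blob],
    to_version: u32,
    rotate: impl Fn(&Blob, u32) -> Result<Blob, E>,
) -> RotationReport {
    let mut report = RotationReport::default();
    for blob in blobs.iter_mut() {
        // Already at the target version: nothing to do.
        if blob.key_version == to_version {
            report.skipped += 1;
            continue;
        }
        match rotate(&*blob, to_version) {
            Ok(new_blob) => {
                // The caller-side "replace the blob in storage" step.
                *blob = new_blob;
                report.rotated += 1;
            }
            // Leave the old blob untouched; it stays decryptable.
            Err(_) => report.failed += 1,
        }
    }
    report
}
```

Skipping blobs already at the target version makes the pass idempotent, which fits the incremental "rotate some, verify, rotate the rest" pattern.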
### 5. Cache implications
The `KeyCache` is keyed by derivation path. Since each version has a
distinct path, the cache naturally holds multiple versions simultaneously.
This is correct — during a rotation, the vault may need to decrypt old
blobs (v2) and encrypt new blobs (v3), and both keys should be cached.
The cache's TTL and LRU eviction still apply. If the cache evicts an old
version's key during a long rotation, the next `decrypt` of an old blob
re-derives it (the seed hasn't changed). This is correct but slightly
slower — the rotation tool should be aware that cache misses on old keys
are expected.
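
The cache behavior described above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions: `PathKeyedCache` is a hypothetical stand-in for `KeyCache` with no TTL or LRU logic (eviction is simulated by an explicit `evict` call), and `fake_derive` is a placeholder that is not SLIP-0010 derivation.

```rust
use std::collections::HashMap;

/// Minimal stand-in for `KeyCache`: keyed by derivation path.
#[derive(Default)]
pub struct PathKeyedCache {
    keys: HashMap<String, [u8; 32]>,
    /// How many times the derive function actually ran (cache misses).
    pub derivations: usize,
}

impl PathKeyedCache {
    /// Return the cached key for `path`, deriving and storing it on a miss.
    pub fn get_or_derive(
        &mut self,
        path: &str,
        derive: impl FnOnce(&str) -> [u8; 32],
    ) -> [u8; 32] {
        if let Some(key) = self.keys.get(path) {
            return *key;
        }
        let key = derive(path);
        self.derivations += 1;
        self.keys.insert(path.to_string(), key);
        key
    }

    /// Simulate eviction of one entry (stand-in for TTL/LRU).
    pub fn evict(&mut self, path: &str) {
        self.keys.remove(path);
    }
}

/// Placeholder derivation, NOT real HD derivation: fills the key with a
/// byte-sum of the path so that distinct demo paths give distinct keys.
pub fn fake_derive(path: &str) -> [u8; 32] {
    let byte = path.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    [byte; 32]
}
```

Because the derivation is deterministic in the seed and the path, a key that is re-derived after eviction is identical to the evicted one, so the only cost of a miss is the extra derivation.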
## Consequences
**Positive:**
- Key rotation is a vault method (`rotate`), not a storage operation or a
full mnemonic change. It's cheap (HD derivation) and local.
- Partial rotation is safe. Old and new keys coexist — the vault can
decrypt any version. This means a rotation can be performed incrementally
(rotate some blobs, verify, rotate the rest).
- No new mnemonic needed. The same seed produces all version keys. A
backup node with the same mnemonic can decrypt any version.
- The derivation path is self-documenting: `m/74'/2'/0'/1'` (index =
version - 2) is the version 3 encryption key.
- The `salt` field remains unused — no KDF complexity. Rotation is pure HD
path indexing.
- The mechanism generalizes the TS→vault migration (v1→v2 is a rotation,
though v1 requires the TS PBKDF2 `decrypt`, not the vault's `decrypt`).
**Negative:**
- `decrypt` now derives the key at the version-indicated path, which means
a cache miss on an old version re-derives from the seed. This is a few
HMAC operations — negligible, but the path construction and cache lookup
add a small amount of complexity over the current "always use
`PATHS::ENCRYPTION`" approach.
- The rotation tool (CLI command or migration script) must iterate all
stored blobs and call `rotate` on each. This is an operational concern,
not a vault concern — but the vault spec should document the expected
usage pattern so the tool implementer knows the contract.
- Old version keys are always derivable (the seed doesn't change). This is
a feature (partial rotation is safe) but also means a compromised seed
allows decrypting all versions. If the seed itself is compromised, all
versions are compromised — rotation doesn't help. This is inherent to
HD derivation and not specific to this design.
## Assumptions
1. **The seed is not compromised.** If the seed is compromised, rotating
the encryption key path doesn't help — the attacker can derive all
version keys. Seed compromise requires a full mnemonic change (new
seed, re-derive everything, re-deploy). This ADR covers encryption key
rotation, not seed rotation. Seed rotation is an operational procedure
(generate new mnemonic, unlock with it, re-encrypt all data) that is
outside the vault's API.
2. **Rotation is infrequent.** The vault does not optimize for frequent
rotation (e.g., per-request key derivation). Rotation is an operational
event triggered by policy or incident. The cache and path-indexed
approach are efficient for this usage pattern.
3. **The storage system tracks which blobs to rotate.** The vault's `rotate`
method handles one blob at a time. Iterating all stored
`EncryptedData` blobs is the storage system's job (or the CLI's). The
vault doesn't know what's in storage — it only knows how to rotate a
blob it's given.
4. **v1 (TS PBKDF2) data is not rotated through the vault.** v1 data is
decrypted by the TS `decrypt()` function (PBKDF2), not the vault's
`decrypt()` (which uses HD derivation). The v1→v2 migration is a
separate tool that has access to both. Once data is at v2, future
rotations (v2→v3, etc.) use the vault's `rotate` method.
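
Assumption 4 implies that a blob's `key_version` determines which decryption mechanism applies. A migration tool might make that split explicit as in the sketch below. `KeySource` and `key_source_for_version` are hypothetical names, and treating version 0 as invalid is an assumption, since this ADR does not mention it.

```rust
/// Where the key for a given `key_version` comes from (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum KeySource {
    /// v1: TS PBKDF2 data, handled by the separate v1→v2 migration tool.
    LegacyPbkdf2,
    /// v2 and later: HD derivation at the given path, handled by the vault.
    HdPath(String),
    /// Version 0 is not described by the ADR; treated as invalid here.
    Unknown,
}

/// Classify a key version by the mechanism that can decrypt it.
pub fn key_source_for_version(version: u32) -> KeySource {
    match version {
        0 => KeySource::Unknown,
        1 => KeySource::LegacyPbkdf2,
        // v >= 2 here, so the subtraction cannot underflow.
        v => KeySource::HdPath(format!("m/74'/2'/0'/{}'", v - 2)),
    }
}
```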
## References
- ADR-020: HD derivation for encryption keys (this ADR extends its
`m/74'/2'/0'/0'` path with a version index)
- OQ-22: Key rotation mechanism (resolved by this ADR)
- [encryption.md](../crates/vault/encryption.md) — AES-256-GCM, EncryptedData
- [service.md](../crates/vault/service.md) — encrypt, decrypt, rotate methods
- [mnemonic-derivation.md](../crates/vault/mnemonic-derivation.md) —
derivation paths, `PATHS::ENCRYPTION`