diff --git a/.opencode/agents/implementation-specialist.md b/.opencode/agents/implementation-specialist.md index 6c92f88..bfc1fa1 100644 --- a/.opencode/agents/implementation-specialist.md +++ b/.opencode/agents/implementation-specialist.md @@ -212,12 +212,24 @@ Read `AGENTS.md` at project root for full details. Key rules: 1. **No comments in code** — Per project convention. 2. **Error handling** — Use `anyhow::Result` for application code, `thiserror` for library error types. Never panic in library code. -3. **Feature flags** — Transports are feature-gated (`tls`, `iroh`, `acme`). Base +3. **No `unwrap()` or `expect()` outside tests** — These are debug signals that + something wasn't clear. If you reach for `unwrap()`, it means the error + handling path wasn't specified — stop and think about what should actually + happen on that error. For poisoned locks, use + `unwrap_or_else(|e| e.into_inner())` or explicit error propagation. A panic + in one operation must not cascade to other operations. +4. **Cryptographic nonces use `OsRng`** — AES-GCM IVs and any other cryptographic + nonces must use `OsRng` (or equivalent CSPRNG), never `rand::random()`. IV + reuse under the same key is catastrophic for GCM. +5. **Secret material is zeroized on drop** — Any type holding derived keys, + decrypted credentials, or other secret material must derive `Zeroize` and + `ZeroizeOnDrop`. Secrets must not linger in freed heap memory. +6. **Feature flags** — Transports are feature-gated (`tls`, `iroh`, `acme`). Base crate should compile lean. -4. **Async runtime** — `tokio` is the async runtime. All I/O is async. -5. **Naming conventions** — Rust standard: `snake_case` for functions/variables/ +7. **Async runtime** — `tokio` is the async runtime. All I/O is async. +8. **Naming conventions** — Rust standard: `snake_case` for functions/variables/ modules, `PascalCase` for types/traits, `SCREAMING_SNAKE_CASE` for constants. -6. **Module structure** — One module per component under `src/`. Re-export via +9. **Module structure** — One module per component under `src/`. Re-export via `mod.rs` or `lib.rs` as appropriate. ## Key Principles diff --git a/docs/architecture/crates/call/operation-registry.md b/docs/architecture/crates/call/operation-registry.md index 8598911..e5108c7 100644 --- a/docs/architecture/crates/call/operation-registry.md +++ b/docs/architecture/crates/call/operation-registry.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-21 +last_updated: 2026-06-22 --- # Operation Registry @@ -114,7 +114,7 @@ pub struct OperationContext { - `identity`: The authenticated caller (from `IdentityProvider`) — inbound auth (who is calling me). For external calls, this is who sent the `call.requested`. For internal calls, this is the parent handler's `handler_identity` (propagated through `OperationEnv::invoke()`) - `handler_identity`: The identity of the handler processing this call. Set at registration by the assembly layer. For internal calls (`internal: true`), the ACL check runs against this identity (ADR-015) - `capabilities`: Outbound credentials the handler may use (decrypted API keys, scoped vault access) — see [Capability Injection](#capability-injection) below -- `metadata`: Additional context (connection info, tracing IDs). **Must not hold secret material** — see ADR-014 +- `metadata`: Request-scoped context (tracing IDs, connection info). **Must not hold secret material** — see ADR-014. **Does not propagate through `OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`. - `env`: The operation environment for composing calls to other operations. Scoped — the handler can only invoke a declared set of operations (ADR-015) - `internal`: When `true`, this call originated from composition (a handler calling another operation via `OperationEnv`), not from a wire request. This switches the authority context: ACL runs against `handler_identity`, not `identity`. The `internal` field uses module-private construction — handlers construct `OperationContext` through `OperationEnv::invoke()` which sets `internal: true`, or through the `CallAdapter` dispatch path which sets `internal: false`. The field is not `pub` for writes; only `pub fn is_internal(&self) -> bool` is exposed for reads. See ADR-015. @@ -151,11 +151,26 @@ The CLI binary (or assembly layer) constructs the registry and passes it to the ```rust #[async_trait] -pub trait OperationEnv: Send + Sync { - async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope; +impl OperationEnv for LocalOperationEnv { + async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope { + let name = format!("{namespace}/{operation}"); + let context = OperationContext { + request_id: format!("env-{name}"), + parent_request_id: Some(parent.request_id.clone()), + identity: parent.handler_identity.clone(), // Parent's handler identity becomes the caller + handler_identity: parent.handler_identity.clone(), // Inherit handler authority for ACL + capabilities: parent.capabilities.clone(), // Inherit caller's capabilities + metadata: HashMap::new(), // Fresh — does NOT propagate parent metadata + env: self.clone(), + internal: true, // Nested calls use handler authority + }; + self.registry.invoke(&name, input, context).await + } } ``` +**Metadata does not propagate through composition.** Nested calls get fresh metadata (`HashMap::new()`), not the parent's metadata bag. This is a security constraint (ADR-014): `metadata: HashMap` accepts any `serde_json::Value`, including secret material. If metadata propagated through `env.invoke()`, a handler that accidentally placed a secret in metadata would leak it to every child operation — and if a child is a `from_call` operation (ADR-017), the metadata would cross the wire to the remote node. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`, not in ambient context. + `OperationEnv` is the universal composition mechanism. A handler calls `context.env.invoke("fs", "readFile", input, &context)` and gets a `ResponseEnvelope` back — regardless of whether the operation runs locally, via an irpc service, or on a remote node. The `parent` parameter propagates the calling context: the nested call gets `parent_request_id: Some(parent.request_id)`, inherits `parent.identity`, and is marked `internal: true`. @@ -294,6 +309,8 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe **`from_call` imports remote operations.** The `from_call` adapter (ADR-017) discovers operations on a remote call protocol endpoint via `services/list` and `services/schema`, then registers them with handlers that forward calls over the QUIC connection. This makes cross-node composition transparent — a handler calling `env.invoke("worker", "exec", ...)` doesn't know whether the operation is local or remote. Connection direction (who opened the QUIC connection) is independent of call direction (who calls whom) — both sides can call each other once connected. +**`from_call` trust is transitive.** A `from_call`-imported operation executes the remote node's code, not yours. The scoped env (ADR-015) bounds *which* operations are reachable, but not *what* they do. A compromised remote node can do anything its operations are declared to do (and anything its handler bugs allow). This is inherent to remote composition — same as trusting any RPC endpoint — but it must be explicit in the threat model. `from_call` means "I trust the remote node as much as my own handlers." The scoping protects the caller from reaching arbitrary ops; it does not protect against what the reached op does. + **Scoped composition env.** The `OperationEnv` given to a handler is scoped — it can only invoke a declared set of operations, set at registration by the assembly layer. This bounds the parameterized-dispatch attack surface: a handler (or an LLM picking tools, or a quickjs sandbox) can only reach declared operations, not the entire registry. The scoped env is the reachability control; the handler identity is the authority control. Both are needed for least privilege. See ADR-015. ## Constraints @@ -307,6 +324,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe - **The composition env is scoped.** A handler can only invoke operations declared in its scoped env. This bounds parameterized-dispatch attack surface. See ADR-015. - **No vault operations are registered in the call protocol.** The vault is assembly-layer only (ADR-008, ADR-014). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. - **The call protocol carries no secret material.** Secret material (private keys, API keys, mnemonics, decrypted credentials) must not appear in `call.requested` payloads, `call.responded` payloads, or `OperationContext.metadata`. See ADR-014. +- **Metadata does not propagate through composition.** `OperationEnv::invoke()` constructs fresh metadata for nested calls (`HashMap::new()`), not the parent's metadata. This prevents a handler that accidentally places a secret in metadata from leaking it to child operations — and if a child is a `from_call` operation (ADR-017), across the wire to a remote node. The tracing link is `parent_request_id`, not metadata propagation. See ADR-014. ## Design Decisions diff --git a/docs/architecture/crates/core/auth.md b/docs/architecture/crates/core/auth.md index 89f55e5..482ceec 100644 --- a/docs/architecture/crates/core/auth.md +++ b/docs/architecture/crates/core/auth.md @@ -249,4 +249,15 @@ The endpoint's `AlknetEndpoint` also holds `Arc` for endpo ## Open Questions -None. All auth-related open questions are resolved. \ No newline at end of file +None. All auth-related open questions are resolved. + +## Security Constraints + +These are security-critical implementation requirements, not architectural decisions (the architecture is locked by the ADRs above). They are documented here so implementation agents don't miss them. + +- **Token entropy**: generated `alk_` tokens must have ≥128 bits of entropy. The prefix (first 8 chars) is for O(1) lookup and is not secret — it appears in logs by design. SHA-256 of the full token allows offline verification; this is safe only if the full token is high-entropy. The prefix alone must not be sufficient to authenticate. +- **Config reload must be authenticated**: a reload that adds an authorized fingerprint or API key grants access immediately (see [config.md](config.md)). The reload trigger must be local-only (SIGHUP, file watch) or an admin-scoped call protocol operation. A malicious reload is equivalent to root-level privilege grant. +- **Connection-level identity is for observability only**: `Connection::set_identity` stores the handler-resolved identity for logging/audit. Per-request identity (`OperationContext.identity`) takes precedence for ACL. See OQ-11. +- **Cryptographic nonces use OsRng**: AES-GCM IVs and any other cryptographic nonces must use `OsRng` (or equivalent CSPRNG), not `rand::random()`. IV reuse under the same key is catastrophic for GCM (authenticity breaks, two-time-pad on plaintext). The vault implementation (`crates/alknet-vault/src/encryption.rs`) must use `OsRng` for IV generation. +- **Derived keys are zeroized on drop**: cached derived keys (`CachedKey`) must derive `Zeroize` and `ZeroizeOnDrop`. When the cache evicts an entry (LRU) or the process exits without explicit `lock()`, derived private keys must not linger in freed heap memory. The cache must clear on drop, not just on explicit `lock()`. +- **No `unwrap()` or `expect()` outside tests**: poisoned lock recovery uses `unwrap_or_else(|e| e.into_inner())` or explicit error propagation. A panic in one vault operation must not brick the vault for all other operations. \ No newline at end of file diff --git a/docs/architecture/crates/core/config.md b/docs/architecture/crates/core/config.md index 521e2ab..9621f7f 100644 --- a/docs/architecture/crates/core/config.md +++ b/docs/architecture/crates/core/config.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-16 +last_updated: 2026-06-21 --- # Configuration @@ -197,6 +197,8 @@ impl ConfigReloadHandle { The CLI binary creates a `ConfigReloadHandle` and passes it to a config watcher (file watcher, SIGHUP handler, or call protocol operation) that calls `reload()` when config changes are detected. +**Config reload is a privilege-escalation path.** `ConfigIdentityProvider` reads from `ArcSwap`, so a reload that adds an authorized fingerprint or API key grants access immediately. A malicious reload is equivalent to root-level privilege grant. The reload trigger **must be authenticated/local-only**: SIGHUP (local signal), local file watch, or an admin call protocol operation with the same auth treatment as any other mutation (requires `admin` scope, ADR-015). The implementation must not ship a reload endpoint with no auth "for convenience." + ## ConfigError ```rust