docs(architecture): security constraints from security review
Address security review findings by adding explicit constraints to specs and implementation specialist role: Architectural constraints (spec updates): - metadata does not propagate through OperationEnv::invoke() — fresh HashMap for nested calls, closes the back-door leak channel where a handler that puts a secret in metadata would leak it to children and across from_call to remote nodes (ADR-014) - Config reload must be authenticated/local-only — malicious reload = root-equivalent privilege grant (config.md) - from_call trust is transitive — scoped env bounds reachability, not what the remote op does (operation-registry.md) - Token entropy ≥128 bits — prefix is lookup aid not secret, offline hash verification requires high-entropy tokens (auth.md) Implementation constraints (auth.md security constraints section + role spec): - OsRng for cryptographic nonces (AES-GCM IV reuse is catastrophic) - CachedKey derives Zeroize/ZeroizeOnDrop (no secrets in freed heap) - No unwrap()/expect() outside tests (poisoned lock recovery, not crash) - Implementation specialist role spec updated with all four constraints
This commit is contained in:
@@ -212,12 +212,24 @@ Read `AGENTS.md` at project root for full details. Key rules:
|
||||
1. **No comments in code** — Per project convention.
|
||||
2. **Error handling** — Use `anyhow::Result` for application code, `thiserror` for
|
||||
library error types. Never panic in library code.
|
||||
3. **Feature flags** — Transports are feature-gated (`tls`, `iroh`, `acme`). Base
|
||||
3. **No `unwrap()` or `expect()` outside tests** — These are debug signals that
|
||||
something wasn't clear. If you reach for `unwrap()`, it means the error
|
||||
handling path wasn't specified — stop and think about what should actually
|
||||
happen on that error. For poisoned locks, use
|
||||
`unwrap_or_else(|e| e.into_inner())` or explicit error propagation. A panic
|
||||
in one operation must not cascade to other operations.
|
||||
4. **Cryptographic nonces use `OsRng`** — AES-GCM IVs and any other cryptographic
|
||||
nonces must use `OsRng` (or equivalent CSPRNG), never `rand::random()`. IV
|
||||
reuse under the same key is catastrophic for GCM.
|
||||
5. **Secret material is zeroized on drop** — Any type holding derived keys,
|
||||
decrypted credentials, or other secret material must derive `Zeroize` and
|
||||
`ZeroizeOnDrop`. Secrets must not linger in freed heap memory.
|
||||
6. **Feature flags** — Transports are feature-gated (`tls`, `iroh`, `acme`). Base
|
||||
crate should compile lean.
|
||||
4. **Async runtime** — `tokio` is the async runtime. All I/O is async.
|
||||
5. **Naming conventions** — Rust standard: `snake_case` for functions/variables/
|
||||
7. **Async runtime** — `tokio` is the async runtime. All I/O is async.
|
||||
8. **Naming conventions** — Rust standard: `snake_case` for functions/variables/
|
||||
modules, `PascalCase` for types/traits, `SCREAMING_SNAKE_CASE` for constants.
|
||||
6. **Module structure** — One module per component under `src/`. Re-export via
|
||||
9. **Module structure** — One module per component under `src/`. Re-export via
|
||||
`mod.rs` or `lib.rs` as appropriate.
|
||||
|
||||
## Key Principles
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-21
|
||||
last_updated: 2026-06-22
|
||||
---
|
||||
|
||||
# Operation Registry
|
||||
@@ -114,7 +114,7 @@ pub struct OperationContext {
|
||||
- `identity`: The authenticated caller (from `IdentityProvider`) — inbound auth (who is calling me). For external calls, this is who sent the `call.requested`. For internal calls, this is the parent handler's `handler_identity` (propagated through `OperationEnv::invoke()`)
|
||||
- `handler_identity`: The identity of the handler processing this call. Set at registration by the assembly layer. For internal calls (`internal: true`), the ACL check runs against this identity (ADR-015)
|
||||
- `capabilities`: Outbound credentials the handler may use (decrypted API keys, scoped vault access) — see [Capability Injection](#capability-injection) below
|
||||
- `metadata`: Additional context (connection info, tracing IDs). **Must not hold secret material** — see ADR-014
|
||||
- `metadata`: Request-scoped context (tracing IDs, connection info). **Must not hold secret material** — see ADR-014. **Does not propagate through `OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`.
|
||||
- `env`: The operation environment for composing calls to other operations. Scoped — the handler can only invoke a declared set of operations (ADR-015)
|
||||
- `internal`: When `true`, this call originated from composition (a handler calling another operation via `OperationEnv`), not from a wire request. This switches the authority context: ACL runs against `handler_identity`, not `identity`. The `internal` field uses module-private construction — handlers construct `OperationContext` through `OperationEnv::invoke()` which sets `internal: true`, or through the `CallAdapter` dispatch path which sets `internal: false`. The field is not `pub` for writes; only `pub fn is_internal(&self) -> bool` is exposed for reads. See ADR-015.
|
||||
|
||||
@@ -151,11 +151,26 @@ The CLI binary (or assembly layer) constructs the registry and passes it to the
|
||||
|
||||
```rust
|
||||
#[async_trait]
|
||||
pub trait OperationEnv: Send + Sync {
|
||||
async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope;
|
||||
impl OperationEnv for LocalOperationEnv {
|
||||
async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope {
|
||||
let name = format!("{namespace}/{operation}");
|
||||
let context = OperationContext {
|
||||
request_id: format!("env-{name}"),
|
||||
parent_request_id: Some(parent.request_id.clone()),
|
||||
identity: parent.handler_identity.clone(), // Parent's handler identity becomes the caller
|
||||
handler_identity: parent.handler_identity.clone(), // Inherit handler authority for ACL
|
||||
capabilities: parent.capabilities.clone(), // Inherit caller's capabilities
|
||||
metadata: HashMap::new(), // Fresh — does NOT propagate parent metadata
|
||||
env: self.clone(),
|
||||
internal: true, // Nested calls use handler authority
|
||||
};
|
||||
self.registry.invoke(&name, input, context).await
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Metadata does not propagate through composition.** Nested calls get fresh metadata (`HashMap::new()`), not the parent's metadata bag. This is a security constraint (ADR-014): `metadata: HashMap<String, Value>` accepts any `serde_json::Value`, including secret material. If metadata propagated through `env.invoke()`, a handler that accidentally placed a secret in metadata would leak it to every child operation — and if a child is a `from_call` operation (ADR-017), the metadata would cross the wire to the remote node. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`, not in ambient context.
|
||||
|
||||
`OperationEnv` is the universal composition mechanism. A handler calls `context.env.invoke("fs", "readFile", input, &context)` and gets a `ResponseEnvelope` back — regardless of whether the operation runs locally, via an irpc service, or on a remote node.
|
||||
|
||||
The `parent` parameter propagates the calling context: the nested call gets `parent_request_id: Some(parent.request_id)`, inherits `parent.identity`, and is marked `internal: true`.
|
||||
@@ -294,6 +309,8 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe
|
||||
|
||||
**`from_call` imports remote operations.** The `from_call` adapter (ADR-017) discovers operations on a remote call protocol endpoint via `services/list` and `services/schema`, then registers them with handlers that forward calls over the QUIC connection. This makes cross-node composition transparent — a handler calling `env.invoke("worker", "exec", ...)` doesn't know whether the operation is local or remote. Connection direction (who opened the QUIC connection) is independent of call direction (who calls whom) — both sides can call each other once connected.
|
||||
|
||||
**`from_call` trust is transitive.** A `from_call`-imported operation executes the remote node's code, not yours. The scoped env (ADR-015) bounds *which* operations are reachable, but not *what* they do. A compromised remote node can do anything its operations are declared to do (and anything its handler bugs allow). This is inherent to remote composition — same as trusting any RPC endpoint — but it must be explicit in the threat model. `from_call` means "I trust the remote node as much as my own handlers." The scoping protects the caller from reaching arbitrary ops; it does not protect against what the reached op does.
|
||||
|
||||
**Scoped composition env.** The `OperationEnv` given to a handler is scoped — it can only invoke a declared set of operations, set at registration by the assembly layer. This bounds the parameterized-dispatch attack surface: a handler (or an LLM picking tools, or a quickjs sandbox) can only reach declared operations, not the entire registry. The scoped env is the reachability control; the handler identity is the authority control. Both are needed for least privilege. See ADR-015.
|
||||
|
||||
## Constraints
|
||||
@@ -307,6 +324,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe
|
||||
- **The composition env is scoped.** A handler can only invoke operations declared in its scoped env. This bounds parameterized-dispatch attack surface. See ADR-015.
|
||||
- **No vault operations are registered in the call protocol.** The vault is assembly-layer only (ADR-008, ADR-014). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire.
|
||||
- **The call protocol carries no secret material.** Secret material (private keys, API keys, mnemonics, decrypted credentials) must not appear in `call.requested` payloads, `call.responded` payloads, or `OperationContext.metadata`. See ADR-014.
|
||||
- **Metadata does not propagate through composition.** `OperationEnv::invoke()` constructs fresh metadata for nested calls (`HashMap::new()`), not the parent's metadata. This prevents a handler that accidentally places a secret in metadata from leaking it to child operations — and if a child is a `from_call` operation (ADR-017), across the wire to a remote node. The tracing link is `parent_request_id`, not metadata propagation. See ADR-014.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
|
||||
@@ -250,3 +250,14 @@ The endpoint's `AlknetEndpoint` also holds `Arc<dyn IdentityProvider>` for endpo
|
||||
## Open Questions
|
||||
|
||||
None. All auth-related open questions are resolved.
|
||||
|
||||
## Security Constraints
|
||||
|
||||
These are security-critical implementation requirements, not architectural decisions (the architecture is locked by the ADRs above). They are documented here so implementation agents don't miss them.
|
||||
|
||||
- **Token entropy**: generated `alk_` tokens must have ≥128 bits of entropy. The prefix (first 8 chars) is for O(1) lookup and is not secret — it appears in logs by design. SHA-256 of the full token allows offline verification; this is safe only if the full token is high-entropy. The prefix alone must not be sufficient to authenticate.
|
||||
- **Config reload must be authenticated**: a reload that adds an authorized fingerprint or API key grants access immediately (see [config.md](config.md)). The reload trigger must be local-only (SIGHUP, file watch) or an admin-scoped call protocol operation. A malicious reload is equivalent to root-level privilege grant.
|
||||
- **Connection-level identity is for observability only**: `Connection::set_identity` stores the handler-resolved identity for logging/audit. Per-request identity (`OperationContext.identity`) takes precedence for ACL. See OQ-11.
|
||||
- **Cryptographic nonces use OsRng**: AES-GCM IVs and any other cryptographic nonces must use `OsRng` (or equivalent CSPRNG), not `rand::random()`. IV reuse under the same key is catastrophic for GCM (authenticity breaks, two-time-pad on plaintext). The vault implementation (`crates/alknet-vault/src/encryption.rs`) must use `OsRng` for IV generation.
|
||||
- **Derived keys are zeroized on drop**: cached derived keys (`CachedKey`) must derive `Zeroize` and `ZeroizeOnDrop`. When the cache evicts an entry (LRU) or the process exits without explicit `lock()`, derived private keys must not linger in freed heap memory. The cache must clear on drop, not just on explicit `lock()`.
|
||||
- **No `unwrap()` or `expect()` outside tests**: poisoned lock recovery uses `unwrap_or_else(|e| e.into_inner())` or explicit error propagation. A panic in one vault operation must not brick the vault for all other operations.
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-16
|
||||
last_updated: 2026-06-21
|
||||
---
|
||||
|
||||
# Configuration
|
||||
@@ -197,6 +197,8 @@ impl ConfigReloadHandle {
|
||||
|
||||
The CLI binary creates a `ConfigReloadHandle` and passes it to a config watcher (file watcher, SIGHUP handler, or call protocol operation) that calls `reload()` when config changes are detected.
|
||||
|
||||
**Config reload is a privilege-escalation path.** `ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig>`, so a reload that adds an authorized fingerprint or API key grants access immediately. A malicious reload is equivalent to root-level privilege grant. The reload trigger **must be authenticated/local-only**: SIGHUP (local signal), local file watch, or an admin call protocol operation with the same auth treatment as any other mutation (requires `admin` scope, ADR-015). The implementation must not ship a reload endpoint with no auth "for convenience."
|
||||
|
||||
## ConfigError
|
||||
|
||||
```rust
|
||||
|
||||
Reference in New Issue
Block a user