From cdf340bec763fb2202ebbad477de73052a450a9b Mon Sep 17 00:00:00 2001
From: "glm-5.2"
Date: Mon, 22 Jun 2026 13:44:58 +0000
Subject: [PATCH] =?UTF-8?q?docs(architecture):=20add=20ADR-024=20=E2=80=94?=
 =?UTF-8?q?=20operation=20registry=20layering,=20resolve=20C6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Diagnoses a conflation in the pre-ADR-024 spec: the OperationRegistry inherited immutability by analogy from ADR-010's HandlerRegistry (ALPN-level), but the TLS-config argument that justifies HandlerRegistry immutability does not apply to the operation registry, which lives behind a single ALPN (alknet/call). This made from_call (which discovers ops over a live connection at runtime) structurally incompatible with the blanket immutability claim.

ADR-024 layers the operation registry by trust boundary: curated (Local) ops are static and immutable — the startup trust boundary is where their composition authority is granted; session (Session) and imported (FromCall etc.) ops are dynamic at their respective scopes (per-session, per-connection) — their trust boundaries are per-scope, not per-startup. The principle: immutability follows the trust boundary. Immutability is the security control for composing ops (can escalate privilege); provenance + composition authority are the controls for non-composing ops (can't escalate).

The OperationEnv trait becomes the integration point (Arc<dyn OperationEnv>), following the IdentityProvider precedent (ADR-004): the CallAdapter composes the root OperationContext.env per incoming call from the active layers (curated base + connection overlay + session overlay). Children inherit the parent's composite env by Arc::clone — overlay composition happens once at the root and propagates through the composition tree.

Resolves review #002 C6 (OperationContext.env type identity crisis): the field is split into scoped_env: ScopedOperationEnv (reachability data, from the registration bundle) and env: Arc<dyn OperationEnv> (dispatch trait object). One field was being used as two different types (reachability set with .allows() and dispatch trait with .invoke()).

Localizes W4 (hot-swap ↔ registry mutability coupling) to the connection scope: no global mutable registry to hot-swap; overlays replace naturally with connect/disconnect and session start/end. Schema-drift on reconnect is a per-connection overlay-rebuild concern, not a global hot-swap protocol.

Partially addresses W3 (CallClient registry security): the registry-shape sub-question is resolved by the overlay model; the capability-exposure sub-question (what capabilities a remote peer can trigger) remains for ADR-017 — ADR-024 does not overclaim resolution there.

Amends OQ-04 to scope its immutability claim to the HandlerRegistry and cross-reference ADR-024 for the operation registry.

Generalizes OQ-19's session-overlay mechanism to also cover connection-scoped remote imports — both are per-scope dynamic overlays on the static curated base, using the same trait-layering mechanism.
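
For readers of this message, the overlay order the patch documents (session, then connection, then curated base, first match wins) can be illustrated roughly as follows. This is a simplified, synchronous sketch and not the ADR-024 types: `LayerEnv`, `CompositeEnv`, `demo`, the operation names, and the use of `Option<String>` as the "not in this layer" signal are all hypothetical stand-ins. The real trait is async, takes an OperationContext, and leaves the fall-through mechanism open.

```rust
use std::sync::Arc;

/// Simplified stand-in for the OperationEnv trait (assumption: the real
/// trait is async and returns a ResponseEnvelope). `None` means "this
/// layer does not know the operation".
trait OperationEnv {
    fn invoke(&self, name: &str, input: &str) -> Option<String>;
}

/// Hypothetical single layer: a label plus the operation names it serves.
struct LayerEnv {
    label: &'static str,
    ops: Vec<&'static str>,
}

impl OperationEnv for LayerEnv {
    fn invoke(&self, name: &str, input: &str) -> Option<String> {
        // Answer only for operations registered in this layer.
        let known = self.ops.iter().any(|op| *op == name);
        known.then(|| format!("{}:{}", self.label, input))
    }
}

/// Hypothetical composite: optional overlays wrapped around a mandatory base.
struct CompositeEnv {
    session: Option<Arc<dyn OperationEnv>>,
    connection: Option<Arc<dyn OperationEnv>>,
    base: Arc<dyn OperationEnv>,
}

impl OperationEnv for CompositeEnv {
    fn invoke(&self, name: &str, input: &str) -> Option<String> {
        // Overlay order: session, then connection, then curated base.
        self.session
            .iter()
            .chain(self.connection.iter())
            .find_map(|layer| layer.invoke(name, input))
            .or_else(|| self.base.invoke(name, input))
    }
}

/// Builds a three-layer env and dispatches four illustrative calls.
fn demo() -> Vec<Option<String>> {
    let base: Arc<dyn OperationEnv> = Arc::new(LayerEnv {
        label: "curated",
        ops: vec!["fs/read", "fs/write"],
    });
    let connection: Arc<dyn OperationEnv> = Arc::new(LayerEnv {
        label: "connection",
        ops: vec!["remote/ping"],
    });
    let session: Arc<dyn OperationEnv> = Arc::new(LayerEnv {
        label: "session",
        ops: vec!["fs/read"],
    });
    let env = CompositeEnv {
        session: Some(session),
        connection: Some(connection),
        base,
    };
    vec![
        env.invoke("fs/read", "x"),
        env.invoke("remote/ping", "x"),
        env.invoke("fs/write", "x"),
        env.invoke("nope/op", "x"),
    ]
}
```

In this sketch a name present in both the session layer and the curated base resolves to the session layer, which is what "first match wins" implies. Whether the real overlays are allowed to shadow curated names is a policy question this sketch does not settle.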
--- docs/architecture/README.md | 9 +- docs/architecture/crates/call/README.md | 5 +- .../architecture/crates/call/call-protocol.md | 34 +- .../crates/call/operation-registry.md | 110 +++- ...5-privilege-model-and-authority-context.md | 4 +- ...on-provenance-and-composition-authority.md | 29 +- .../024-operation-registry-layering.md | 483 ++++++++++++++++++ docs/architecture/open-questions.md | 21 +- docs/architecture/overview.md | 5 +- 9 files changed, 655 insertions(+), 45 deletions(-) create mode 100644 docs/architecture/decisions/024-operation-registry-layering.md diff --git a/docs/architecture/README.md b/docs/architecture/README.md index 2f42cef..759fe2a 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -7,9 +7,9 @@ last_updated: 2026-06-22-20 ## Current State -**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation exists) and research/reference material. Foundational ADRs (001–023) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), handler registration, provenance, and composition authority (ADR-022), and operation error schemas (ADR-023). The alknet-core, alknet-call, and alknet-vault crate specs are in draft. +**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. 
The greenfield workspace contains only `alknet-vault` (stable — implementation exists) and research/reference material. Foundational ADRs (001–024) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), handler registration, provenance, and composition authority (ADR-022), operation error schemas (ADR-023), and operation registry layering (ADR-024). ADR-024 resolves the registry mutability question that ADR-022/017 surfaced (`from_call` imports require a runtime-mutable home) and the `OperationContext.env` type identity crisis (review #002 C6), by layering the registry by trust boundary (curated static + session/connection dynamic overlays) and making `OperationEnv` a trait-object integration point. The alknet-core, alknet-call, and alknet-vault crate specs are in draft. -**Next step**: Review the vault spec documents, then begin implementation. All open questions for the core and call crates are resolved; the vault crate has one deferred OQ (OQ-21, remote vault administration) that does not block implementation. +**Next step**: Continue working through review #002's remaining Tier 4 findings (vault security decisions, guard clauses, ADR-writing exercises, smaller spec decisions). All open questions for the core and call crates are resolved; the vault crate has one deferred OQ (OQ-21, remote vault administration) that does not block implementation. 
## Architecture Documents @@ -58,6 +58,7 @@ last_updated: 2026-06-22-20 | [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Accepted | | [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Accepted | | [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Accepted | +| [024](decisions/024-operation-registry-layering.md) | Operation Registry Layering | Accepted | ## Open Questions @@ -76,13 +77,13 @@ See [open-questions.md](open-questions.md) for the full tracker. - **OQ-15**: Call protocol client and adapter contract — `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction (ADR-017) **Resolved two-way doors:** -- **OQ-04**: Dynamic handler registration — static at startup (ADR-010) +- **OQ-04**: Dynamic handler registration — static at startup (ADR-010); scoped to the `HandlerRegistry` (ALPN-level) by ADR-024, which governs `OperationRegistry` mutability separately - **OQ-07**: Call protocol scope — bidirectional streams, EventEnvelope, ID-based correlation (ADR-012) - **OQ-11**: Handler-level auth resolution observability — handlers store resolved identity on Connection (Option B); two identity scopes: connection-level (observability) and per-request (ACL) - **OQ-12**: TLS identity provisioning — two use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME is a proven pattern. 
- **OQ-13**: Operation path format — `/{service}/{op}` is the correct design for alknet-call, not a simplification - **OQ-14**: Batch operation semantics — multiple correlated `call.requested` events is the correct protocol design, not a simplification -- **OQ-19**: Session-scoped registries — agent-written operations via `OperationEnv` trait layering; protocol doesn't need changes; `OperationEnv` must remain a trait +- **OQ-19**: Session-scoped registries — agent-written operations via `OperationEnv` trait layering; protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays as well. - **OQ-20**: Encryption key derivation — HD derivation from BIP39 seed, not PBKDF2; salt field unused in v2 (wire-format compat) (ADR-020) - **OQ-22**: Key rotation — version-indexed derivation paths; `rotate` method re-encrypts (ADR-021) - **OQ-23**: Handler identity registration path — registration bundle with provenance, composition authority, scoped env, capabilities (ADR-022) diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md index 4aeae76..acd5a00 100644 --- a/docs/architecture/crates/call/README.md +++ b/docs/architecture/crates/call/README.md @@ -35,6 +35,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | [017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | Call Protocol Client and Adapter Contract | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction | | [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities | | [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; 
adapter fidelity | +| [024](../../decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) | ## Relevant Open Questions @@ -44,14 +45,14 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | OQ-13 | Operation path format and routing scope | resolved | `/{service}/{op}` is the correct design; remote dispatch is a separate layer | | OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design | | OQ-16 | Safe vault operations for call protocol exposure | resolved (ADR-014) | None exposed for now | -| OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait | +| OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays. | ## Key Design Principles 1. **One connection, full access**: An `alknet/call` connection gives access to the entire operation registry — calls, subscriptions, batch, schema. 2. **Protocol is symmetric**: Both sides can initiate calls. The server calling a client uses the same EventEnvelope format and correlation. 3. **Stream-agnostic correlation**: PendingRequestMap correlates by request ID, not by stream. The protocol works with any stream arrangement. -4. **Operation registry is static**: Operations are registered at startup by the CLI binary. The registry supports JSON Schema discovery. +4. 
**Operation registry is layered**: The curated layer (`Local` provenance) is static — registered at startup by the CLI binary, immutable for the process lifetime. Session (`Session`) and imported (`FromCall` etc.) ops are dynamic overlays at their respective scopes (per-session, per-connection). The registry supports JSON Schema discovery. See ADR-024. 5. **irpc is one dispatch backend**: Local operations dispatch directly. irpc service calls (in-process, type-safe) are internal. The call protocol is the external interface. 6. **Local dispatch only**: The operation registry dispatches to local handlers. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer, not a modification to alknet-call's path format. 7. **No secret material on the wire**: The call protocol carries no private keys, API keys, mnemonics, or decrypted credentials. Handlers receive outbound credentials through `OperationContext.capabilities`, injected at the assembly layer. See ADR-014. diff --git a/docs/architecture/crates/call/call-protocol.md b/docs/architecture/crates/call/call-protocol.md index 370ac69..18fdb60 100644 --- a/docs/architecture/crates/call/call-protocol.md +++ b/docs/architecture/crates/call/call-protocol.md @@ -33,21 +33,26 @@ The `CallAdapter` implements `ProtocolHandler`: ```rust pub struct CallAdapter { + /// Layer 0 — the curated operation registry. Immutable after startup. registry: Arc, identity_provider: Arc, + /// Layer 1 — optional session-overlay source (agent crate supplies this; + /// None for non-agent deployments). See ADR-024, OQ-19. 
+ session_source: Option>, } -#[async_trait] -impl ProtocolHandler for CallAdapter { - fn alpn(&self) -> &'static [u8] { b"alknet/call" } - - async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> { - // Accept bidirectional streams, read EventEnvelopes, - // dispatch to registry, write responses - } -} +// The connection's imported-ops overlay (Layer 2) is built per CallConnection +// as from_call discovery completes — it's not a field on CallAdapter but +// rather state held by the CallConnection / dispatch context for incoming +// calls on that connection. See ADR-024. ``` +The `CallAdapter` holds the static curated registry and an optional +session-overlay source. Per-connection imported-ops overlays (Layer 2, +ADR-024) are held with the connection and composed into the root +`OperationContext.env` per incoming call. See ADR-024 for the layering +model and `compose_root_env` below. + The adapter: 1. Accepts bidirectional streams on the connection 2. Reads length-prefixed JSON `EventEnvelope` frames from each stream @@ -299,8 +304,15 @@ fn build_root_context( handler_identity: registration.composition_authority.clone(), capabilities: registration.capabilities.clone(), // from the registration bundle metadata: HashMap::new(), // fresh per request - env: registration.scoped_env.clone() + scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // from the bundle, empty for leaves + // Per-call env composition (ADR-024): the root env is a composite + // of the curated base + this connection's imported-ops overlay + + // the active session overlay (if any). The CallAdapter builds this + // composite per incoming call — same shape as per-call identity + // resolution via IdentityProvider. Handlers call env.invoke(); + // the composite routes to the right overlay. 
+ env: self.compose_root_env(/* connection, session */), abort_policy: AbortPolicy::default(), // abort-dependents (ADR-016 Decision 6) internal: false, // external call — ACL against caller identity } @@ -309,6 +321,8 @@ fn build_root_context( The `internal: false` here is what makes a wire call a wire call — ACL checks against the caller's resolved `identity`. When a handler subsequently calls `context.env.invoke(...)`, the `OperationEnv::invoke()` path (see [operation-registry.md](operation-registry.md#operationenv)) constructs a nested `OperationContext` with `internal: true`, switching authority to `handler_identity`. The two construction paths — `CallAdapter` for wire-originated, `OperationEnv::invoke()` for composition-originated — are the only places `internal` is set. Handlers cannot set it themselves (the field is module-private for writes — see [operation-registry.md](operation-registry.md#operationcontext) and ADR-015). +The per-call `env` composition (ADR-024) is the operation-dispatch analogue of the per-call identity resolution the CallAdapter already does via `IdentityProvider`. Both are integration-point patterns: the trait object owns the routing, the CallAdapter supplies the right sources per call. A connection's imported-ops overlay is part of the root env only for calls arriving on that connection; a session overlay is part of the root env only when a session is active. See ADR-024. + ### ResponseEnvelope The universal return type from all operation invocations: diff --git a/docs/architecture/crates/call/operation-registry.md b/docs/architecture/crates/call/operation-registry.md index b734bef..4c30d85 100644 --- a/docs/architecture/crates/call/operation-registry.md +++ b/docs/architecture/crates/call/operation-registry.md @@ -11,7 +11,7 @@ OperationSpec, Handler, OperationRegistry, AccessControl, service discovery, and The operation registry maps operation names to specs and handlers. 
It is the dispatch core of the call protocol — when a `call.requested` event arrives, the registry looks up the operation by name, checks access control, invokes the handler, and returns the result. -The registry is populated at startup by the CLI binary (or by the assembly layer in embedded contexts). Operations cannot be added or removed at runtime. This is consistent with OQ-04 (static registration at startup) and the `HandlerRegistry` model in alknet-core. +The registry is **layered by trust boundary** (ADR-024): a static, immutable curated layer (`Local` provenance, registered at startup) plus dynamic overlays for session ops (`Session` provenance, per-session) and imported ops (`FromCall` etc., per-connection). The immutability claim that previously applied to the whole registry is now scoped to the curated layer — see ADR-024 for the layering model and the rationale for why immutability is the security control for composing ops but not for imported leaves. ## Why @@ -115,7 +115,20 @@ pub struct OperationContext { pub handler_identity: Option, // Handler's composition authority (ADR-022) pub capabilities: Capabilities, pub metadata: HashMap, - pub env: OperationEnv, + /// Reachability set — the operations this handler may compose. + /// Populated from the registration bundle's `scoped_env` (ADR-022). + /// The reachability check in `OperationEnv::invoke()` consults + /// `scoped_env.allows(&name)`. This is data, not a dispatch trait. + pub scoped_env: ScopedOperationEnv, + /// Composition dispatch trait. A handler calls `env.invoke(...)` to + /// compose child operations. This is `Arc<dyn OperationEnv>` (a trait + /// object), not a concrete struct — the trait-object design is what + /// enables registry layering (ADR-024): the CallAdapter composes the + /// root env per call from the active layers (curated base + connection + /// overlay + session overlay), and session/connection overlays wrap + /// the base via trait layering. 
Same pattern as `IdentityProvider` + /// (ADR-004). See ADR-024. + pub env: Arc<dyn OperationEnv>, /// Abort policy for this call's descendants (ADR-016 Decision 6). /// Default `AbortDependents` — aborting this request aborts all /// non-terminal descendants. `ContinueRunning` is an opt-in for @@ -157,7 +170,8 @@ impl OperationContext { - `handler_identity`: The composition authority of the handler processing this call. `None` for leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) — they don't compose. `Some(...)` for `Local` and `Session` ops that can compose children. For internal calls (`internal: true`), the ACL check runs against this authority (ADR-015, ADR-022). This is NOT a peer `Identity` — it's a declared authority bundle set at registration by the assembly layer - `capabilities`: Outbound credentials the handler may use (decrypted API keys, scoped vault access) — see [Capability Injection](#capability-injection) below - `metadata`: Request-scoped context (tracing IDs, connection info). **Must not hold secret material** — see ADR-014. **Does not propagate through `OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`. -- `env`: The operation environment for composing calls to other operations. Scoped — the handler can only invoke a declared set of operations (ADR-015). `None`/empty for leaves. +- `scoped_env`: The reachability set — the operations this handler may compose. Populated from the registration bundle's `scoped_env` (ADR-022). The reachability check in `OperationEnv::invoke()` consults `scoped_env.allows(&name)`. This is *data* (a `ScopedOperationEnv` struct), not a dispatch trait. Empty for leaves (the registration bundle's `scoped_env` is `None`). +- `env`: The composition dispatch trait (`Arc<dyn OperationEnv>`). A handler calls `context.env.invoke(...)` to compose child operations. 
This is a trait object, not a concrete struct — the trait-object design enables registry layering (ADR-024): the CallAdapter composes the root env per call from the active layers (curated base + connection overlay + session overlay), and overlays wrap the base via trait layering. Same pattern as `IdentityProvider` (ADR-004). See ADR-024. - `internal`: When `true`, this call originated from composition (a handler calling another operation via `OperationEnv`), not from a wire request. This switches the authority context: ACL runs against `handler_identity`, not `identity`. The `internal` field uses module-private construction — handlers construct `OperationContext` through `OperationEnv::invoke()` which sets `internal: true`, or through the `CallAdapter` dispatch path which sets `internal: false`. The field is not `pub` for writes; only `pub fn is_internal(&self) -> bool` is exposed for reads. See ADR-015. `identity` and `capabilities` are orthogonal: identity is inbound (who is calling me), capabilities are outbound (what credentials I can use). `identity` and `handler_identity` are the principal/agent pair: `identity` is the principal (who delegated), `handler_identity` is the agent (who is acting). See ADR-014 for capabilities, ADR-015 for the privilege model, and ADR-022 for the composition authority type. @@ -170,12 +184,12 @@ pub struct OperationRegistry { } ``` -The registry maps operation names to `HandlerRegistration` bundles. See ADR-022 for the full registration model. Key methods: +The registry maps operation names to `HandlerRegistration` bundles. The curated layer (Layer 0) is a `HashMap`; session and connection overlays (Layers 1 and 2) are separate maps that the `CallAdapter` composes into the per-call `OperationContext.env` (ADR-024). See ADR-022 for the full registration model and ADR-024 for the layering model. 
Key methods: -- `register(registration)`: Add an operation at startup -- `registration(name)`: Find a registration by operation name (returns spec, handler, provenance, composition authority, scoped env, capabilities) +- `register(registration)`: Add an operation to the curated layer at startup +- `registration(name)`: Find a registration by operation name (checks active overlays first, then curated base — ADR-024). Returns spec, handler, provenance, composition authority, scoped env, capabilities. - `invoke(name, input, context)`: Look up, check ACL, invoke handler, return result -- `list_operations()`: Return all registered specs (for `/services/list`) +- `list_operations()`: Return all registered specs (for `/services/list` — returns curated + active overlay ops) ### HandlerRegistration @@ -215,7 +229,7 @@ let registry = OperationRegistryBuilder::new() .build(); ``` -The CLI binary (or assembly layer) constructs the registry and passes it to the `CallAdapter`. Once built, the registry is immutable. +The CLI binary (or assembly layer) constructs the registry and passes it to the `CallAdapter`. Once built, the **curated layer** (Layer 0 — `Local` provenance ops) is immutable. Session and imported overlays are dynamic at their respective scopes (per-session, per-connection) per ADR-024. The `CallAdapter` composes the root `OperationContext.env` per incoming call from the active layers. ### OperationEnv @@ -263,9 +277,15 @@ The `parent` parameter propagates the calling context: the nested call gets `par **Metadata does not propagate through composition.** Nested calls get fresh metadata (`HashMap::new()`), not the parent's metadata bag. This is a security constraint (ADR-014): `metadata: HashMap` accepts any `serde_json::Value`, including secret material. 
If metadata propagated through `env.invoke()`, a handler that accidentally placed a secret in metadata would leak it to every child operation — and if a child is a `from_call` operation (ADR-017), the metadata would cross the wire to the remote node. The tracing link between parent and child is `parent_request_id`, not metadata propagation. Anything a handler needs to pass to a child goes in the call `input`, not in ambient context. -**Local dispatch only.** The initial `OperationEnv` implementation dispatches directly through the local `OperationRegistry`: +**Local dispatch only.** The initial `OperationEnv` implementation for the +curated layer (Layer 0) dispatches directly through the local +`OperationRegistry`. The composite env (curated + session + connection +overlays) is a separate type built by the `CallAdapter` per call — see +ADR-024 and the `CompositeOperationEnv` sketch below. ```rust +/// Layer 0 dispatch — the curated registry. This is the base env that +/// overlays wrap. See ADR-024 for the layering model. pub struct LocalOperationEnv { registry: Arc<OperationRegistry>, } @@ -278,8 +298,10 @@ impl OperationEnv for LocalOperationEnv { // Reachability check (ADR-015, ADR-022): is this op in the parent's // scoped env? If not, return NOT_FOUND. This bounds the // parameterized-dispatch attack surface — a handler (or an LLM - // picking tools) can only reach declared ops. - if !parent.env.allows(&name) { + // picking tools) can only reach declared ops. The reachability set + // is on `parent.scoped_env` (data), not on `parent.env` (dispatch + // trait) — see ADR-024 for the split. 
+ if !parent.scoped_env.allows(&name) { return ResponseEnvelope::not_found(name); } @@ -302,8 +324,11 @@ impl OperationEnv for LocalOperationEnv { handler_identity: registration.composition_authority.clone(), capabilities: parent.capabilities.clone(), // Inherit caller's capabilities metadata: HashMap::new(), // Fresh — does NOT propagate parent metadata (ADR-014) - env: registration.scoped_env.clone() + scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // Child's own scoped env (empty for leaves) + // Dispatch trait: the child inherits the parent's env (the same + // composite of curated base + active overlays). See ADR-024. + env: parent.env.clone(), // Abort policy: inherit the parent's policy by default (ADR-016). // The parent handler can override via `invoke_with_policy()`. abort_policy: parent.abort_policy.clone(), @@ -311,9 +336,56 @@ impl OperationEnv for LocalOperationEnv { }; self.registry.invoke(&name, input, context).await } + + // invoke_with_policy() delegates to invoke() with the policy set on the + // child context (ADR-016 Decision 6). See the trait definition above. } ``` +The composite env (built by the `CallAdapter` per incoming call) wraps the +curated base and any active overlays: + +```rust +/// Per-call composite env (ADR-024). Built by the CallAdapter in +/// build_root_context from the active layers. The child inherits this by +/// Arc::clone through invoke(). +pub struct CompositeOperationEnv { + session: Option<Arc<dyn OperationEnv>>, // Layer 1 — active session, if any + connection: Option<Arc<dyn OperationEnv>>, // Layer 2 — this connection's imported ops + base: Arc<dyn OperationEnv>, // Layer 0 — curated registry (LocalOperationEnv) +} + +#[async_trait] +impl OperationEnv for CompositeOperationEnv { + async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope { + let name = format!("{namespace}/{operation}"); + // Reachability check against parent.scoped_env (same as LocalOperationEnv). 
+ if !parent.scoped_env.allows(&name) { + return ResponseEnvelope::not_found(name); + } + // Dispatch in overlay order: session → connection → curated base. + // First match wins. Each overlay is an OperationEnv impl that knows + // its own registry; the composite routes to the right one. + if let Some(session) = &self.session { + // session impl checks its own registry; if not found, falls + // through (returns a sentinel or the composite continues). + // Implementation detail: the session impl's `invoke` either + // dispatches or returns a "not in this overlay" signal. + } + if let Some(connection) = &self.connection { + // same pattern + } + self.base.invoke(namespace, operation, input, parent).await + } +} +``` + +The exact "first match wins" mechanism (sentinel return, a separate +`contains` check, or a try/else pattern) is a two-way door for +implementation — the structural decision (composite trait object, overlay +order, `Arc::clone` inheritance) is what ADR-024 locks. +``` + Two things happen in `invoke()`: 1. **Reachability check**: before constructing the child context, `invoke()` checks whether the requested op is in the parent's scoped env. If not, `NOT_FOUND`. This is the reachability control — a handler can only compose declared ops. @@ -321,7 +393,7 @@ Two things happen in `invoke()`: Future work may add irpc service dispatch and remote call protocol dispatch as additional backends. The handler-facing API stays the same. -**`OperationEnv` must remain a trait.** This is a constraint, not a suggestion. The trait-based design enables session-scoped registries (OQ-19) — a session env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close the session-overlay pattern. See OQ-19. +**`OperationEnv` must remain a trait.** This is a constraint, not a suggestion. 
The trait-based design enables registry layering (ADR-024): the CallAdapter composes the root env per call from the curated base + active connection/session overlays, and overlays wrap the base via trait layering. Session-scoped registries (OQ-19) and connection-scoped remote imports (ADR-017 `from_call`) are both overlays on the same base, using the same mechanism. Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close both the session-overlay and connection-overlay patterns. This is the same integration-point pattern as `IdentityProvider` (ADR-004). See OQ-19 and ADR-024. ### Service Discovery @@ -409,7 +481,7 @@ let registry = OperationRegistryBuilder::new() let call_adapter = CallAdapter::new(Arc::new(registry), identity_provider); ``` -The vault is used at construction time to populate `capabilities` in the registration bundle, not registered as call protocol operations. The registry is immutable after construction. Adding operations requires restarting the process. This is consistent with OQ-04, ADR-008, ADR-014, and ADR-022. +The vault is used at construction time to populate `capabilities` in the registration bundle, not registered as call protocol operations. The curated layer (Layer 0) is immutable after construction — adding a `Local` op requires restarting the process. Session and imported overlays are dynamic at their respective scopes (ADR-024). This is consistent with OQ-04 (scoped to the `HandlerRegistry` by ADR-024), ADR-008, ADR-014, and ADR-022. ### Capability Injection @@ -452,7 +524,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe ## Constraints -- The registry is immutable after construction. No runtime registration or deregistration. Two-way door — `ArcSwap` can be added later. +- The registry is **layered by trust boundary** (ADR-024). 
The curated layer (`Local` provenance) is immutable after construction — adding a `Local` op requires restarting the process, which re-enters the startup trust boundary. Session (`Session`) and imported (`FromCall` etc.) ops are dynamic at their respective scopes (per-session, per-connection). The pre-ADR-024 blanket immutability claim was inherited by analogy from ADR-010's `HandlerRegistry` (ALPN-level) and did not apply to the operation registry — the TLS-config argument that justifies `HandlerRegistry` immutability does not touch the operation registry, which lives behind the single ALPN `alknet/call`. - Operation specs use JSON Schema. The call protocol's external interface is always JSON. irpc's postcard serialization is internal only. - `OperationEnv::invoke()` dispatches through the local registry. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer — not a prefix added to operation paths. irpc service dispatch is contracted but not built. - The call protocol does not depend on any database. Operation specs are in-memory, populated at startup. @@ -470,11 +542,12 @@ The `Capabilities` type holds non-serializable, zeroized secret material. 
It doe |----------|-----|---------| | irpc as call protocol foundation | [ADR-005](../../decisions/005-irpc-as-call-protocol-foundation.md) | irpc provides framing and service dispatch | | Call protocol stream model | [ADR-012](../../decisions/012-call-protocol-stream-model.md) | Bidirectional streams, EventEnvelope, ID-based correlation | -| Static handler registration | [ADR-010](../../decisions/010-alpn-router-and-endpoint.md) | Registry is immutable after construction | +| Static handler registration | [ADR-010](../../decisions/010-alpn-router-and-endpoint.md) | `HandlerRegistry` (ALPN-level) immutable after construction; `OperationRegistry` layered by ADR-024 (curated immutable, session/imported dynamic) | | Vault integration via assembly layer | [ADR-008](../../decisions/008-secret-service-integration.md) | Vault is a capability source, accessed at assembly time | | Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Capabilities carry outbound credentials; call protocol carries no secret material | | Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; composition authority + scoped env | | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | +| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Curated (static, immutable) + session and connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) | | Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | 
Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` | ## Open Questions @@ -484,13 +557,14 @@ See [open-questions.md](../../open-questions.md) for full details. - **OQ-13** (resolved): Operation path format is `/{service}/{op}`. Remote dispatch is a separate mechanism, not a path prefix. - **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive. - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now. -- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Session ops are `Session` provenance (ADR-022) — always `Internal`, compose under restricted authority scoped down at sandbox creation. +- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Session ops are `Session` provenance (ADR-022) — always `Internal`, compose under restricted authority scoped down at sandbox creation. Generalized by ADR-024 to cover connection-scoped overlays as well. 
## References - [call-protocol.md](call-protocol.md) — CallAdapter, EventEnvelope, stream model, PendingRequestMap - ADR-005: irpc as call protocol foundation - ADR-008: Vault integration point -- ADR-010: ALPN router and endpoint (static registration) +- ADR-010: ALPN router and endpoint (static registration — applies to the `HandlerRegistry`, not the `OperationRegistry`; see ADR-024 for the distinction) - ADR-012: Call protocol stream model +- ADR-024: Operation registry layering (curated + session/connection overlays; `OperationEnv` as trait-object integration point) - Reference implementation: `/workspace/@alkdev/alknet-main/crates/alknet-core/src/call/` \ No newline at end of file diff --git a/docs/architecture/decisions/015-privilege-model-and-authority-context.md b/docs/architecture/decisions/015-privilege-model-and-authority-context.md index 0a72169..c4df516 100644 --- a/docs/architecture/decisions/015-privilege-model-and-authority-context.md +++ b/docs/architecture/decisions/015-privilege-model-and-authority-context.md @@ -132,7 +132,9 @@ pub struct OperationContext { pub handler_identity: Option, // Handler's composition authority pub capabilities: Capabilities, pub metadata: HashMap, - pub env: OperationEnv, + // env/scoped_env split by ADR-024: + pub scoped_env: ScopedOperationEnv, // Reachability data (ADR-022, ADR-024) + pub env: Arc, // Dispatch trait (ADR-024) /// Module-private for writes; read via `is_internal()`. Set only by /// `OperationEnv::invoke()` (true) or `CallAdapter` dispatch (false). 
pub(crate) internal: bool, diff --git a/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md b/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md index 4d144ff..ad72b49 100644 --- a/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md +++ b/docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md @@ -319,8 +319,12 @@ fn build_root_context( handler_identity: registration.composition_authority, // C1: from bundle, None for leaves capabilities: registration.capabilities.clone(), // C3: from bundle metadata: HashMap::new(), - env: registration.scoped_env.clone() + // env/scoped_env split by ADR-024: scoped_env is the reachability + // data (from the bundle), env is the dispatch trait object (composed + // per-call by the CallAdapter from active overlays). + scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // C2: from bundle, empty for leaves + env: /* CallAdapter.compose_root_env(...) — see ADR-024 */, internal: false, // wire call — ACL against caller identity } } @@ -339,7 +343,9 @@ async fn invoke(&self, namespace: &str, operation: &str, input: Value, // Reachability check (C2): is this op in the parent's scoped env? // If not, return NOT_FOUND. This is the reachability control. - if !parent.env.allows(&name) { + // (ADR-024: the reachability check consults parent.scoped_env, not + // parent.env — env is now the dispatch trait, scoped_env is the data.) 
+ if !parent.scoped_env.allows(&name) { return ResponseEnvelope::not_found(name); } @@ -351,8 +357,10 @@ async fn invoke(&self, namespace: &str, operation: &str, input: Value, handler_identity: registration.composition_authority.clone(), // C1: child's own authority capabilities: parent.capabilities.clone(), // C3: propagate through composition metadata: HashMap::new(), // fresh — does NOT propagate (ADR-014) - env: registration.scoped_env.clone() + // env/scoped_env split by ADR-024: + scoped_env: registration.scoped_env.clone() .unwrap_or_else(ScopedOperationEnv::empty), // C2: child's own scoped env + env: parent.env.clone(), // child inherits parent's composite env (Arc::clone) internal: true, // composition — ACL against handler_identity }; self.registry.invoke(&name, input, context).await @@ -580,16 +588,27 @@ the fuzzer validates the implementation against the spec. cascade tree; `parent_request_id` indexes it) - ADR-017: Call protocol client and adapter contract (adapter-registered ops are `Internal` by default; this ADR's provenance makes that explicit) +- ADR-024: Operation registry layering (amends this ADR's Decision 5: the + `env` field shown in `build_root_context` and `invoke()` is split into + `scoped_env: ScopedOperationEnv` (reachability data, populated from the + bundle's `scoped_env`) and `env: Arc<dyn OperationEnv>` + (dispatch trait object). The split is required by ADR-024's overlay model + — the trait-object design is what enables connection and session overlays + to compose. The `HandlerRegistration` bundle shape, provenance model, + composition authority, and capability injection specified by this ADR + are unchanged.)
- ADR-008: Vault integration point (assembly layer is the trust boundary) - OQ-19: Session-scoped operation registries (session ops are `Session` provenance, always `Internal`, compose under restricted authority) - docs/reviews/001-pre-implementation-architecture-sanity-check.md (findings C1–C4, which this ADR resolves) +- docs/reviews/002-pre-implementation-architecture-sanity-check.md (finding + C6, resolved by ADR-024's `env`/`scoped_env` split) - `/workspace/@alkdev/flowgraph/README.md` — operation graph, call graph, and scoped subgraph concepts (the graph model this ADR uses as framing) - `/workspace/@alkdev/alknet-main/docs/architecture/flowgraph.md` — prior Rust speccing of flowgraph (incomplete; this ADR uses the model, not the crate) - Kernel/user mode analogy: `getaddrinfo` runs under kernel authority, not - the caller's `CAP_NET_RAW`; the curated entry point exists to do things the - user can't, on the user's behalf \ No newline at end of file + the caller's `CAP_NET_RAW`; the curated entry point exists to do things + the user can't, on the user's behalf \ No newline at end of file diff --git a/docs/architecture/decisions/024-operation-registry-layering.md b/docs/architecture/decisions/024-operation-registry-layering.md new file mode 100644 index 0000000..123e9e8 --- /dev/null +++ b/docs/architecture/decisions/024-operation-registry-layering.md @@ -0,0 +1,483 @@ +# ADR-024: Operation Registry Layering + +## Status + +Accepted + +## Context + +The architecture has two registries that the spec documents previously treated +as sharing one immutability argument: + +1. **The endpoint's `HandlerRegistry`** (ALPN string → `ProtocolHandler`). + This is what ADR-010 and OQ-04 are about. Its immutability is load-bearing: + ALPN strings are baked into the TLS `ServerConfig` at startup, so adding a + protocol handler at runtime requires rebuilding the TLS config. This is a + genuine one-way door and the rationale is correct. + +2. 
**The call protocol's `OperationRegistry`** (operation name → + `HandlerRegistration`). This lives *inside* the `CallAdapter`, which is one + `ProtocolHandler` behind the single ALPN `alknet/call`. Adding an operation + to the `OperationRegistry` does **not** touch the TLS `ServerConfig` — the + ALPN is already `alknet/call`, registered once at startup. + +`operation-registry.md` stated the operation registry "is immutable after +construction… consistent with OQ-04 and ADR-010." That inheritance was by +analogy, not by shared rationale. The TLS argument that justifies +`HandlerRegistry` immutability does not apply to the `OperationRegistry`. The +operation registry's mutability profile is a separate question, and it has been +answered incorrectly by inheriting a constraint that belongs to a different +registry. + +### Why `from_call` breaks the inherited constraint + +The import adapters have different lifecycle requirements: + +- **`from_openapi` / `from_mcp`** can run at startup — the assembly layer reads + a static spec file or queries a known service before the registry is frozen. + Static import, fits immutability. +- **`from_call`** requires a **live connection** to discover operations + (`services/list` + `services/schema`). Connections happen at runtime. + Workers join and leave dynamically in the machine→worker topology. You + cannot pre-freeze a set you discover over a connection you haven't opened + yet. + +So `from_call` is structurally incompatible with "frozen at startup, never +touched again." The pre-ADR-024 spec held two contradictory positions: the +registry is immutable (operation-registry.md), and `from_call` imports remote +operations at connection time (ADR-017). An implementer would have to resolve +the contradiction by guessing — likely by either forcing all `from_call` +imports to happen at startup (awkward, doesn't fit worker topologies) or +quietly making the registry mutable (undermining the stated constraint without +acknowledging it). 
+ +### Why immutability is not the load-bearing security control for imported ops + +Imported operations (`FromOpenAPI`, `FromMCP`, `FromCall`) are leaves — they +cannot compose (ADR-022 Assumption 5). They have no composition authority, no +scoped env, `Internal` visibility by default, and their trust model is "the +remote endpoint is trusted as much as my own handlers" (ADR-017). Their +reachability from a composing handler is bounded by the *parent handler's* +scoped env, not by their registration timing. + +The security controls on imported ops are **provenance** and **composition +authority** — both set at registration, both checked at dispatch. Immutability +is redundant here. An imported op registered at runtime is no more or less +privileged than one registered at startup; it's a forwarding stub either way, +and its capacity to do harm is bounded by what the *composing parent*'s +authority and scoped env permit. + +Immutability *is* load-bearing for **curated** operations — the `Local` ops +the assembly layer writes at startup, which *can* compose and therefore *can* +escalate privilege under their own authority. For those, the trust boundary is +"the assembly layer declared them at startup," and immutability is what locks +that declaration. But that's a constraint on `Local` provenance specifically, +not on the registry as a whole. 
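The argument above (imported ops are leaves, so registration timing adds no privilege) can be sketched in a few lines. This is an illustrative sketch, not the project's implementation: the trimmed `HandlerRegistration`, the `imported_leaf` / `curated` constructors, and `can_compose` are hypothetical names, and the string-typed authority is an assumption. Only `scoped_env`, `composition_authority`, `allows`, and `ScopedOperationEnv::empty` follow names used in ADR-022.

```rust
use std::collections::HashSet;

/// Reachability data: the op names a handler declared it may compose.
#[derive(Debug, Clone, Default)]
struct ScopedOperationEnv {
    allowed: HashSet<String>,
}

impl ScopedOperationEnv {
    fn empty() -> Self {
        Self::default()
    }

    fn of(names: &[&str]) -> Self {
        Self { allowed: names.iter().map(|n| n.to_string()).collect() }
    }

    fn allows(&self, name: &str) -> bool {
        self.allowed.contains(name)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provenance {
    Local,
    FromCall,
}

/// Hypothetical, trimmed-down registration bundle (assumption for this sketch).
struct HandlerRegistration {
    provenance: Provenance,
    composition_authority: Option<String>,
    scoped_env: Option<ScopedOperationEnv>,
}

/// Imported ops are leaves: no authority and no scoped env, whenever they register.
fn imported_leaf() -> HandlerRegistration {
    HandlerRegistration {
        provenance: Provenance::FromCall,
        composition_authority: None,
        scoped_env: None,
    }
}

/// Curated composing op: authority and scoped env are declared at startup.
fn curated(authority: &str, may_compose: &[&str]) -> HandlerRegistration {
    HandlerRegistration {
        provenance: Provenance::Local,
        composition_authority: Some(authority.to_string()),
        scoped_env: Some(ScopedOperationEnv::of(may_compose)),
    }
}

/// Reachability control: a handler composes only what its own scoped env declares.
fn can_compose(parent: &HandlerRegistration, name: &str) -> bool {
    parent
        .scoped_env
        .clone()
        .unwrap_or_else(ScopedOperationEnv::empty)
        .allows(name)
}
```

In this sketch a leaf registered at runtime reaches nothing, and a curated parent reaches an imported op only if it declared that op in its scoped env, which is the bound the section describes.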
+ +### The trust-boundary principle + +The right axis is not visibility (`Internal` vs `External`) or wire-vs-local — +it is **provenance combined with import timing**, which maps to where each +operation's trust decision is made: + +| Provenance | Import timing | Trust boundary | Layer | Lifetime | +|-----------|---------------|----------------|-------|----------| +| `Local` | Startup | Assembly layer at startup | 0 (curated) | Process — immutable | +| `Session` | Sandbox creation | Composing handler at sandbox creation | 1 (session) | Session — dynamic | +| `FromCall` | Connection (runtime) | Remote node at connection time | 2 (connection) | Connection — dynamic | +| `FromOpenAPI` / `FromMCP` | Startup | External endpoint, discovered at startup | 0 (curated) | Process — immutable | +| `FromOpenAPI` / `FromMCP` | Runtime (rare) | External endpoint, discovered at runtime | 2 (discovery) | Discovery-scoped — dynamic | + +`FromOpenAPI` / `FromMCP` provenance is **layer-polymorphic**: the same +provenance lands in Layer 0 (immutable) or Layer 2 (dynamic) depending on +when the import happens. The common case is startup import into Layer 0 +(Decision 6); runtime import into Layer 2 is permitted but rare. + +**Immutability follows the trust boundary.** Operations are mutable at the +scope where their trust decision is made. `Local` ops (and startup-imported +`FromOpenAPI`/`FromMCP`) are trusted at startup → immutable. Session ops +are trusted at sandbox creation → session-scoped dynamic. `FromCall` ops +(and runtime-imported `FromOpenAPI`/`FromMCP`) are trusted at +connection/discovery time → connection/runtime dynamic. + +Session ops are the edge case that proves the rule: they are `Internal` +visibility and can compose, but their trust boundary is per-session (the +parent handler grants them restricted authority at sandbox creation, per +ADR-022 Assumption 6), not per-startup. Visibility alone would misclassify +them; provenance correctly identifies them as dynamic. 
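The table above can be read as a total function from (provenance, import timing) to layer. A minimal sketch of that reading follows. The enum and function names (`ImportTiming`, `Layer`, `layer_for`, `is_immutable`) are hypothetical, and returning `None` for combinations the table does not list is an assumption of this sketch, not something the ADR specifies.

```rust
/// Where an operation came from (mirrors the provenance column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provenance {
    Local,
    Session,
    FromCall,
    FromOpenApi,
    FromMcp,
}

/// When the registration happened (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImportTiming {
    Startup,
    SandboxCreation,
    Runtime,
}

/// Layer 0, 1, and 2 from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    Curated,
    Session,
    Imported,
}

/// Maps a table row to its layer; unlisted combinations yield `None` (assumption).
fn layer_for(provenance: Provenance, timing: ImportTiming) -> Option<Layer> {
    use ImportTiming::*;
    use Provenance::*;
    match (provenance, timing) {
        (Local, Startup) => Some(Layer::Curated),
        (Session, SandboxCreation) => Some(Layer::Session),
        (FromCall, Runtime) => Some(Layer::Imported),
        // Layer-polymorphic provenance: timing decides the layer.
        (FromOpenApi | FromMcp, Startup) => Some(Layer::Curated),
        (FromOpenApi | FromMcp, Runtime) => Some(Layer::Imported),
        _ => None,
    }
}

/// Immutability follows the trust boundary: only the curated layer is frozen.
fn is_immutable(layer: Layer) -> bool {
    layer == Layer::Curated
}
```

Note that `(Local, Runtime)` maps to `None` here, which matches the ADR's position that adding a `Local` op at runtime is a deployment rather than a runtime operation.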
+ +### The precedent: `IdentityProvider` + +The structural problem — *N consumers need to resolve something from M +sources, don't globalize the sources into one pot, don't make each consumer +know about all sources* — is the same problem `IdentityProvider` solves for +auth (ADR-004). An `IdentityProvider` is a trait (`Arc<dyn IdentityProvider>`) +that centralizes resolution policy behind a stable interface; source +composition is an impl detail. Handlers consume the result; the trait owns the +routing. + +`OperationEnv` is the same problem one layer over: *N handlers need to +dispatch to operations, operations come from M sources (curated local, this +session, this peer connection, that peer connection), don't globalize all +sources into one mutable pot, don't make each handler know about all sources +and pick the right registry.* The solution is the same shape: a trait — +`Arc<dyn OperationEnv>` — that centralizes dispatch routing behind a stable +interface, with overlay composition as an impl detail. + +The alternative — a single global `ArcSwap` into which all +imported ops merge with namespace prefixes — is the registry equivalent of +"every handler reads identity from a global env var." It works at one +connection. At many connections it produces: an unbounded pot, namespace +collisions scaling with connection count, disconnect cleanup requiring a +reverse index (op → owning connection), zero source isolation, and +routing-by-naming-convention instead of routing-by-structure. That is the +failure mode the `IdentityProvider` pattern exists to prevent. + +## Decision + +### 1. The operation registry is layered by trust boundary + +The `OperationRegistry` is not a single flat map. It is a layered structure +where each layer corresponds to a trust boundary: + +``` +Layer 0 — Curated (static, immutable, startup trust boundary) + Local provenance operations from the assembly layer. + Registered once at startup, never mutated for the process lifetime.
+ This is where immutability is load-bearing: these ops can compose, + therefore can escalate privilege under their own authority. The + startup trust boundary + immutability is the security control. + +Layer 1 — Session (dynamic, per-session, sandbox-creation trust boundary) + Session provenance operations, agent-written, sandboxed. + Created and destroyed with each session. + Already specified by OQ-19 as an overlay on Layer 0. + +Layer 2 — Imported (dynamic, per-connection, peer trust boundary) + FromCall operations discovered when a peer connects. + FromOpenAPI / FromMCP operations when imported at runtime (rare; + usually at startup into Layer 0, but runtime import is permitted). + Created and destroyed with the connection / discovery event. +``` + +Layers 1 and 2 are the same shape: **per-scope dynamic overlays on the static +curated base.** The scope is "session" for Layer 1 and "connection" (or +"discovery event") for Layer 2. OQ-19 already specified the overlay mechanism +for Layer 1 (session env wraps global env via `OperationEnv` trait layering). +This ADR generalizes the same mechanism to Layer 2. + +### 2. The `OperationEnv` trait is the integration point + +`OperationContext.env` is `Arc<dyn OperationEnv>` — a trait +object, not a concrete struct. This is required by the overlay model: a +composite env (curated base + connection overlay + session overlay) is built +by composing `OperationEnv` impls, not by merging registries. + +This resolves review #002 finding C6 (`OperationContext.env` type identity +crisis). The pre-ADR-024 spec had `env: OperationEnv` (a trait, which can't +be a field without `dyn`) and used the same field as both a reachability set +(`parent.env.allows()`) and a dispatch trait (`context.env.invoke()`). One +field cannot be both. The split: + +- `scoped_env: ScopedOperationEnv` — reachability data. Populated from the + registration bundle's `scoped_env` (ADR-022). The reachability check in + `invoke()` consults `parent.scoped_env.allows(&name)`.
+- `env: Arc<dyn OperationEnv>` — dispatch trait. The handler + calls `context.env.invoke(...)`; the trait impl routes to the right + overlay. + +This is the `IdentityProvider`-shaped integration point: handlers consume +the trait; source composition is an impl detail. + +### 3. The `CallAdapter` composes the root env per incoming call + +When a `call.requested` arrives over connection C, the `CallAdapter` does +not look up the operation in a single global registry. It composes the root +`OperationContext.env` from the layers active for this call: + +``` +root env = CompositeOperationEnv { + base: curated_registry_env, // Layer 0 — static + connection: C.imported_operations, // Layer 2 — this connection's overlay + session: active_session_overlay, // Layer 1 — if a session is active +} +``` + +The composite impl checks overlays in order (session first, then connection, +then curated base) and dispatches to the first match. This is structural +source binding: a handler composing `worker/exec` reaches it via the +connection overlay that contains it, not via a naming convention in a +global pot. + +**Env inheritance through composition**: the child's `env` is +`parent.env.clone()` — an `Arc::clone`, not a re-composition. Overlay +composition happens once at the root (in `build_root_context`) and +propagates by `Arc` through the composition tree. A child handler sees the +same active overlays its parent saw. This is deliberate: re-composing per +`invoke()` would re-resolve overlays on every dispatch and would break the +session-overlay case (a session that was active when the parent ran must +still be active for the child, even if the session ended mid-composition — +the child is part of the same call tree the parent started). The root env +is composed per incoming call; nested calls inherit it by `Arc::clone`. + +When connection C disconnects, its overlay is dropped. Operations imported +from C vanish from the reachable set with no global mutation and no reverse +index.
Handlers that try to compose a now-gone op receive `NOT_FOUND` (if +the overlay was already dropped when `invoke()` runs the reachability +check) or a connection error with code `INTERNAL` (if the call was +dispatched to the forwarding handler and the connection drops mid-flight). +Both cases are clean failures — no stale-handler-binds-to-dead-connection +hazard. + +### 4. Curated operations remain immutable; imported and session ops are dynamic + +The blanket immutability claim in `operation-registry.md` is replaced by: + +- **Layer 0 (curated, `Local`)**: immutable after startup. The + `OperationRegistry` holding curated ops is constructed once by the + assembly layer and never mutated. This is where the security argument for + immutability applies: composing ops are privileged, the startup trust + boundary is where that privilege is granted, immutability locks it. +- **Layer 1 (session, `Session`)**: dynamic, per-session. Created at sandbox + creation, destroyed at session end. Already specified by OQ-19. +- **Layer 2 (imported, `FromCall` etc.)**: dynamic, per-connection. Created + when a peer connection completes `from_call` discovery, destroyed when the + connection closes. + +Adding a `Local` op at runtime is not supported — it would require re-entering +the startup trust boundary, which is a deployment (restart), not a runtime +operation. This preserves the security property ADR-010/OQ-04 were concerned +with, scoped to where it actually applies. + +### 5. `from_call` imports into the connection's overlay, not the global registry + +The `from_call` adapter (ADR-017) discovers operations on a remote peer and +produces `HandlerRegistration` bundles. Under ADR-024, those bundles are +registered into the **connection's overlay**, not a global mutable registry. + +```rust +// On CallConnection establishment: +let imported = from_call(&connection, config).await; +connection.imported_operations.extend(imported); +// The connection's env now includes these ops. 
+``` + +The handler closures produced by `from_call` capture the `CallConnection` — +when the connection drops, the handlers become unreachable (their env is +dropped), and any in-flight calls to them return connection errors. This is +the natural lifecycle; no explicit deregistration is needed. + +### 6. `from_openapi` and `from_mcp` default to startup import into Layer 0 + +For the common case — the assembly layer imports a static OpenAPI spec or +connects to a known MCP server at startup — `from_openapi` / `from_mcp` +register into the curated (Layer 0) registry, which is then frozen. This +preserves the pre-ADR-024 behavior for the case where it was correct. + +Runtime `from_openapi` / `from_mcp` import (e.g., discovering an MCP server +at connection time) is permitted and follows the Layer 2 model — the imported +ops live in a connection/discovery-scoped overlay. This is additive and +does not affect the startup-import path. + +### 7. OQ-04 scope clarification and OQ-19 generalization + +This ADR amends OQ-04 to scope its immutability claim to the +**`HandlerRegistry`** (ALPN-level, ADR-010). The `OperationRegistry`'s +mutability profile is now governed by this ADR: curated (Layer 0) is +immutable; session and imported layers are dynamic at their trust-boundary +scopes. See the OQ-04 amendment in `open-questions.md`. + +This ADR generalizes OQ-19's session-overlay mechanism to also cover +connection-scoped remote imports. Both are per-scope dynamic overlays on the +static curated base, composed into the per-call `OperationContext.env` by +the `CallAdapter`. `OperationEnv` being a trait object is what enables +both. See the OQ-19 resolution update in `open-questions.md`. + +## Consequences + +**Positive:** + +- `from_call` has a coherent home. Imported ops live with the connection + that produced them, appear when the connection is established, and + disappear when it closes. 
No contradiction with immutability, no awkward + "import everything at startup" workaround. +- The immutability argument is now correctly scoped. Layer 0 (curated, + composing ops) is immutable because that's where the security control + applies. Layers 1 and 2 are dynamic because their trust boundaries are + per-scope. An implementer reading the spec sees the right constraint in + the right place, instead of a blanket claim that doesn't fit all cases. +- The `OperationEnv`-as-trait constraint (OQ-19) is now required by the + overlay model, not just by the session-overlay pattern. The same + mechanism (trait layering) supports both session overlays and connection + overlays — one pattern, two scopes. This makes C6's resolution + (`env: Arc<dyn OperationEnv>`) structurally motivated, not just a + type-system cleanup. +- Disconnect handling is structural. A connection drops → its overlay drops + → its ops vanish from the reachable set. No `ArcSwap` coordination, no + reverse index from op to owning connection, no stale handlers bound to a + dead connection. This is the same lifecycle property session overlays + already have (session ends → session overlay drops). +- Source isolation is structural. Imported ops from peer X are only + reachable from handlers whose `OperationEnv` is wired to X's overlay. + They are not globally callable. A handler that shouldn't be able to + reach peer X's ops simply doesn't have X's overlay in its env. This is + better hygiene than a global registry with namespace prefixes, where + every handler sees every imported op and isolation is a naming + convention. +- The `IdentityProvider` precedent makes the design legible. A future + reader sees "trait-object integration point, source composition as impl + detail" and recognizes the pattern; they don't have to re-derive why + trait-composed overlays were chosen over a global mutable registry.
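The lifecycle and isolation properties listed above can be sketched with a simplified, synchronous stand-in for the dispatch trait. Everything here is illustrative: the real `OperationEnv::invoke` is async and takes a namespace, operation, input, and parent context, whereas this sketch uses string inputs and an `Option` return as the "not in this overlay" signal (one of the mechanisms the spec leaves open). `MapEnv`, `Handler`, and `compose_root_env` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified dispatch trait; `None` means "not in this env".
trait OperationEnv: Send + Sync {
    fn invoke(&self, name: &str, input: &str) -> Option<String>;
}

type Handler = Box<dyn Fn(&str) -> String + Send + Sync>;

/// One layer: a plain name -> handler map (hypothetical helper).
struct MapEnv {
    ops: HashMap<String, Handler>,
}

impl MapEnv {
    /// Builds a one-op layer whose handler tags its output with the layer name.
    fn single(name: &str, tag: &'static str) -> Arc<dyn OperationEnv> {
        let mut ops: HashMap<String, Handler> = HashMap::new();
        ops.insert(
            name.to_string(),
            Box::new(move |input: &str| format!("{}:{}", tag, input)),
        );
        Arc::new(MapEnv { ops })
    }
}

impl OperationEnv for MapEnv {
    fn invoke(&self, name: &str, input: &str) -> Option<String> {
        self.ops.get(name).map(|handler| handler(input))
    }
}

/// Curated base plus optional overlays; composition, not registry merging.
struct CompositeOperationEnv {
    base: Arc<dyn OperationEnv>,
    connection: Option<Arc<dyn OperationEnv>>,
    session: Option<Arc<dyn OperationEnv>>,
}

impl OperationEnv for CompositeOperationEnv {
    fn invoke(&self, name: &str, input: &str) -> Option<String> {
        // Overlay order: session, then connection, then curated base.
        self.session
            .iter()
            .chain(self.connection.iter())
            .find_map(|overlay| overlay.invoke(name, input))
            .or_else(|| self.base.invoke(name, input))
    }
}

/// Composes a root env from whichever layers are active for this call.
fn compose_root_env(
    base: &Arc<dyn OperationEnv>,
    connection: Option<&Arc<dyn OperationEnv>>,
    session: Option<&Arc<dyn OperationEnv>>,
) -> Arc<dyn OperationEnv> {
    Arc::new(CompositeOperationEnv {
        base: Arc::clone(base),
        connection: connection.cloned(),
        session: session.cloned(),
    })
}
```

A root env composed while a peer is connected resolves `worker/exec` through that peer's overlay. A root env composed after the disconnect simply lacks the overlay, so the same name no longer resolves and no global state had to be mutated.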
+ +**Negative:** + +- The dispatch path is a composite lookup (session → connection → curated) + rather than a single `HashMap` lookup. This is a small constant cost — + three hash lookups in the worst case instead of one — and the overlays are + small (a session's ops, a connection's imported ops). The common case + (composing a curated op) hits Layer 0 after two empty-overlay misses, which + is a predictable and cache-friendly path. The cost is justified by the + source isolation and lifecycle properties it buys. +- `OperationContext.env` is now `Arc<dyn OperationEnv>`, which + is a trait object with dynamic dispatch. This is the same cost as + `Arc<dyn IdentityProvider>` — a vtable call per `invoke()`. Negligible + relative to the work an operation does, and the same pattern the codebase + already uses for auth. +- The `CallAdapter` has more responsibility: it composes the root env per + call from the active layers, rather than handing every call the same + global registry. This is expected — the CallAdapter is the integration + point for the call protocol, and per-call env composition is the same + shape as per-call identity resolution (which the CallAdapter already does + via `IdentityProvider`). +- Naming across overlays: if two connections import ops with the same name + (e.g., both peers expose `worker/exec`), the composite env dispatches to + the first overlay that contains the name. This is the same ambiguity + `FromCallConfig`'s namespace prefix (ADR-017) was designed to address — + the caller disambiguates with a prefix at import time. ADR-024 does not + change this; it makes the disambiguation structural (which overlay is in + the env) rather than nominal (which prefix is in the name). +- The blanket immutability claim in `operation-registry.md` and the + cross-references that inherit it (the "Two-way door — + `ArcSwap` can be added later" note, OQ-04's framing) + must be updated. This is a spec edit, not a migration — no implementation + exists yet.
+ +**On review #002 findings resolved by this ADR:** + +- **C6** (`OperationContext.env` type identity crisis): resolved by Decision 2. + The field is split into `scoped_env` (reachability data) and `env` (dispatch + trait object). The split is structurally motivated by the overlay model, + not just a type-system cleanup. +- **W4** (hot-swap ↔ registry mutability coupling): localized to the + connection scope. There is no global mutable registry to hot-swap. + Overlays are per-scope and replace naturally with connect/disconnect and + session start/end. The schema-drift hazard (a peer re-runs + `services/list` on reconnect and re-imports with a changed schema) moves + from global to per-connection — it does not vanish. A handler + mid-composition whose peer reconnects with a changed schema sees the old + schema until the overlay is rebuilt. This is a per-connection concern, + not a global one; the guard clause the review asked for becomes a note on + overlay rebuild semantics rather than a global hot-swap protocol. +- **W3** (CallClient registry security dimension): partially addressed. The + *registry-shape* sub-question is resolved by the overlay model — a + `CallClient`'s incoming-call dispatch uses the same overlay composition, + and sharing the curated base with a remote peer is fine (curated ops are + trusted). The *capability-exposure* sub-question (a remote peer calling + `/llm/generate` uses the local node's API key) is **not resolved by this + ADR** — it is a separate concern about what capabilities a remote peer + can trigger, and it is unaffected by the registry shape. That sub-question + remains open for ADR-017 (a guard-clause note: a peer-scoped subset must + filter by capability remote-safety, not just operation name). ADR-024 + resolves the dispatch shape; ADR-017 retains the capability-exposure + decision. + +## Assumptions + +1. 
**Provenance is knowable at registration time and stable for the + registration's lifetime.** A `Local` op does not become `FromCall` later; + a `FromCall` op does not become `Local`. If a remote-imported op is later + "promoted" to curated, that's a re-registration at the next startup + (deployment), not a runtime mutation. Inherited from ADR-022 Assumption 2. + +2. **Layer 0 immutability is the security control for composing ops.** The + pre-ADR-024 blanket immutability claim was overbroad but not wrong about + `Local` ops. Curated composing ops must be immutable because the startup + trust boundary is where their authority is granted. This ADR narrows the + claim, it does not remove it. + +3. **Imported and session ops do not need immutability as a security + control for privilege escalation.** Their security against privilege + escalation is bounded by provenance (no composition authority → no + privilege escalation) and by the parent handler's scoped env + (reachability control). This is the central argument; if it's wrong — + if a `from_call` op can escalate in some way provenance + scoped env + don't bound — the model needs revisiting. **Immutability is not the + control for non-escalation threats** (availability, schema drift): + availability is bounded by per-handler timeouts (ADR-016) and the + connection's overlay being drop-on-disconnect; schema drift on + reconnect is a per-connection overlay-rebuild concern (see W4 in + Consequences), not a global-registry-mutation concern. The point of + scoping immutability to Layer 0 is that immutability is the right + control *for composing ops* and the wrong control *for non-composing + ops*; it is not a claim that non-composing ops face no threats. + +4. **A connection's overlay is the right scope for `from_call` imports.** + Operations discovered from peer X are reachable from handlers whose env + includes X's overlay. 
If a use case requires imported ops to be globally + reachable (every handler sees every peer's ops), the composite env can be + built to include all active connection overlays — but the default is + per-connection scoping for isolation. + +5. **Disconnect → overlay drop → op vanishes is acceptable behavior.** A + handler composing an op whose peer has disconnected receives `NOT_FOUND` + (or a connection error if the in-flight call was mid-dispatch). This is + the same behavior as a peer that never exposed the op. If a use case + requires disconnected-peer ops to remain reachable (e.g., cached results), + that's a handler-level caching concern, not a registry concern. + +6. **The root env is composed per incoming call, not cached per + connection.** The active session overlay can change during a connection's + lifetime (a session starts or ends mid-connection), so the env cannot be + composed once at connection establishment and reused. `build_root_context` + runs per `call.requested` and composes the env from the layers active at + that moment. The cost (constructing an `Arc<dyn OperationEnv>` per + call) is negligible — it's three `Arc::clone`s, not three registry + traversals. + +7. **Session-overlay attachment is an agent-crate concern.** ADR-024 + generalizes OQ-19's session overlay to also cover connection overlays, + but the mechanism by which a session overlay attaches to a given wire + call (session ID in metadata, payload field, connection-bound session + state, etc.) is not specified here. The `CallAdapter` is wired with an + optional session-overlay source by the assembly layer; the lookup + mechanism belongs to the agent crate spec (OQ-19: "the agent-specific + mechanism belongs to the agent crate spec"). If a wire call has no + active session, the root env is `curated base + connection overlay` + (no session layer). 
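Assumptions 5 to 7 can be sketched together as below. Everything here is an illustrative assumption: the `invoke` signature, `RootEnv`, `CallAdapterSketch`, `build_root_env`, and `on_disconnect` are hypothetical names, and a layer is reduced to a map from operation name to a canned reply. The sketch only shows that the root env is rebuilt per call from whichever layers are active, by cloning `Arc`s rather than copying registries.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical dispatch trait. The ADR names `OperationEnv` and `invoke()`,
/// but this signature is an assumption made for the sketch.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, name: &str) -> Result<String, String>;
}

/// A layer maps operation names to a canned reply (stand-in for a handler).
pub type Layer = HashMap<String, String>;

/// Build a shared layer from (name, reply) pairs (illustration helper).
pub fn layer(pairs: &[(&str, &str)]) -> Arc<Layer> {
    Arc::new(pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect())
}

/// Composite env for one call: session, then connection, then curated.
struct RootEnv {
    session: Option<Arc<Layer>>,
    connection: Option<Arc<Layer>>,
    curated: Arc<Layer>,
}

impl OperationEnv for RootEnv {
    fn invoke(&self, name: &str) -> Result<String, String> {
        self.session
            .iter()
            .chain(self.connection.iter())
            .chain(std::iter::once(&self.curated))
            .find_map(|l| l.get(name).cloned())
            .ok_or_else(|| format!("NOT_FOUND: {}", name))
    }
}

/// Hypothetical adapter holding the curated base and one connection overlay.
pub struct CallAdapterSketch {
    curated: Arc<Layer>,
    connection: Option<Arc<Layer>>,
}

impl CallAdapterSketch {
    pub fn new(curated: Arc<Layer>, connection: Option<Arc<Layer>>) -> Self {
        CallAdapterSketch { curated, connection }
    }

    /// Compose the root env for one incoming call. Only `Arc`s are cloned.
    pub fn build_root_env(&self, session: Option<&Arc<Layer>>) -> Arc<dyn OperationEnv> {
        Arc::new(RootEnv {
            session: session.cloned(),
            connection: self.connection.clone(),
            curated: Arc::clone(&self.curated),
        })
    }

    /// Drop the connection overlay; later calls no longer see its ops.
    pub fn on_disconnect(&mut self) {
        self.connection = None;
    }
}
```

In this sketch an env composed before `on_disconnect` still holds its own `Arc` to the old overlay; the connection-error path for in-flight calls mentioned in assumption 5 is not modeled.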
+ +## References + +- ADR-010: ALPN router and endpoint (the `HandlerRegistry` immutability + argument — this ADR clarifies that it applies to the ALPN registry, not + the operation registry) +- ADR-014: Secret material flow and capability injection (capabilities are + per-`HandlerRegistration` bundle, not per-registry — the overlay model + doesn't change how capabilities flow; an imported op's capabilities come + from its bundle, which for `from_call` is whatever the assembly layer + granted the import) +- ADR-017: Call protocol client and adapter contract (`from_call` adapter; + the `FromCallConfig` namespace prefix is the disambiguation mechanism this + ADR's overlay model uses structurally) +- ADR-022: Handler registration, provenance, and composition authority + (provenance is the axis this ADR's layering is based on; the + `HandlerRegistration` bundle shape is unchanged) +- ADR-004: Auth as shared core (`IdentityProvider` — the precedent for the + trait-object integration point pattern this ADR applies to `OperationEnv`) +- OQ-04: Dynamic handler registration (this ADR amends OQ-04 to scope it to + the `HandlerRegistry`; the operation registry's mutability is now governed + by ADR-024) +- OQ-19: Session-scoped operation registries (this ADR generalizes the + session-overlay mechanism to connection overlays — same pattern, two + scopes) +- docs/reviews/002-pre-implementation-architecture-sanity-check.md + (findings C6, W3, W4: C6 resolved, W4 localized to the connection scope, W3 partially addressed by this ADR) \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index ddf420e..c596dcf 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -49,7 +49,22 @@ Door type classifications follow ADR-009: - **Door type**: Two-way - **Priority**: low - **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. 
ALPN strings in the TLS `ServerConfig` are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The `ArcSwap` pattern can be applied later if needed (two-way door). See ADR-010. -- **Cross-references**: ADR-001, ADR-010, [endpoint.md](crates/core/endpoint.md) + + **Scope clarification (ADR-024)**: This resolution applies to the + **`HandlerRegistry`** (ALPN string → `ProtocolHandler`), which is what + ADR-010 governs. The call protocol's **`OperationRegistry`** (operation + name → `HandlerRegistration`) is a *separate* registry living inside the + `CallAdapter`, behind the single ALPN `alknet/call`. Its mutability + profile is governed by ADR-024, not by this OQ. ADR-024 layers the + operation registry by trust boundary: curated `Local` ops are immutable + (same rationale as here — composing ops are privileged, the startup trust + boundary is where their authority is granted); `Session` and imported + (`FromCall` etc.) ops are dynamic at their respective trust-boundary + scopes (session, connection). The pre-ADR-024 blanket immutability claim + in `operation-registry.md` was inherited by analogy from this OQ and did + not actually apply — the TLS-config argument that justifies + `HandlerRegistry` immutability does not touch the `OperationRegistry`. +- **Cross-references**: ADR-001, ADR-010, ADR-024, [endpoint.md](crates/core/endpoint.md), [operation-registry.md](crates/call/operation-registry.md) ## Theme: Transport and Endpoint @@ -231,12 +246,12 @@ These questions are acknowledged but not active. They will be promoted to open w Session-scoped operations are always `Internal` (ADR-015), run under the handler's identity (the agent handler that authorized the sandbox), can only compose operations in the handler's scoped env, and are ephemeral (gone when the session ends). Core operations are curated — reviewed before promotion. 
The promotion path is the curation checkpoint where autonomous (session-scoped) becomes curated (core). This is not auto-promotion. - **Implementation guard**: `OperationEnv` must remain a trait, not a concrete type. A session-scoped env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close this pattern. The static registration constraint (OQ-04) applies to the global registry only; session registries are dynamic by nature and are a different registry overlaying the global one. + **Implementation guard**: `OperationEnv` must remain a trait, not a concrete type. A session-scoped env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close this pattern. The static registration constraint (OQ-04) applies to the curated (Layer 0) registry only; session registries are dynamic by nature and are a different registry overlaying the curated one. **Generalized by ADR-024**: connection-scoped remote imports (`from_call`) use the same overlay mechanism as session-scoped ops. Both are per-scope dynamic overlays on the static curated base, composed into the per-call `OperationContext.env` by the `CallAdapter`. `OperationEnv` being a trait object (`Arc<dyn OperationEnv>`) is what enables both overlay patterns. Session-scoped operations run in a locked-down sandbox (no direct net/fs/env access), can only reach operations in the handler's scoped env, and their output should be validated against their declared schema before returning. The promotion path requires review — an agent with a `promote` scope (the architect role) performs the promotion; the writing agent (lower-privileged role) requests it. This is the role-based escalation pattern (ADR-015): privileges escalate through a chain of command, not through direct authority. 
The agent-specific mechanism (quickjs sandbox, session registry lifecycle, promotion workflow) belongs to the agent crate spec. The call protocol's job is to keep the `OperationEnv` trait composable and the visibility/ACL model consistent across tiers. -- **Cross-references**: OQ-04, ADR-014, ADR-015, ADR-016, [operation-registry.md](crates/call/operation-registry.md) +- **Cross-references**: OQ-04, ADR-014, ADR-015, ADR-016, ADR-024, [operation-registry.md](crates/call/operation-registry.md) ## Theme: alknet-vault diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 39a5e5e..feadba3 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -175,7 +175,7 @@ The following types live in alknet-core and are used across handler crates: ### One-Way and Two-Way Doors -Not all decisions carry the same reversal cost. One-way door decisions (BiStream type, crate independence, secret material flow) require ADRs and possibly POCs before commitment. Two-way door decisions (static vs dynamic registration, single vs multi-transport) can be decided during implementation — start simple, add complexity when needed. WASM compatibility is a design constraint within this framework, not a separate principle: decisions that would permanently close the WASM door require explicit justification. See [ADR-009](decisions/009-one-way-door-decision-framework.md). +Not all decisions carry the same reversal cost. One-way door decisions (BiStream type, crate independence, secret material flow) require ADRs and possibly POCs before commitment. Two-way door decisions (single vs multi-transport) can be decided during implementation — start simple, add complexity when needed. 
The static-vs-dynamic registration question is now resolved: the `HandlerRegistry` (ALPN-level) is static at startup (ADR-010, OQ-04), while the `OperationRegistry` (call-protocol-level) is layered — curated ops static, session/imported ops dynamic at their trust-boundary scopes (ADR-024). WASM compatibility is a design constraint within this framework, not a separate principle: decisions that would permanently close the WASM door require explicit justification. See [ADR-009](decisions/009-one-way-door-decision-framework.md). ### One ALPN, One Connection, One Handler @@ -214,6 +214,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/). | [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Version-indexed derivation paths; `rotate` re-encrypts between versions | | [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | | [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` | +| [024](decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point | ## Open Questions @@ -222,7 +223,7 @@ Open questions are tracked in [open-questions.md](open-questions.md). 
Key questi - **OQ-01**: BiStream type definition (resolved: trait, Connection parameter — see ADR-007) - **OQ-02**: AuthContext resolution timing (resolved: hybrid — see ADR-004) - **OQ-03**: ALPN string naming convention (resolved: see ADR-006) -- **OQ-04**: Dynamic handler registration (resolved: static at startup — see ADR-010) +- **OQ-04**: Dynamic handler registration (resolved: static at startup for the `HandlerRegistry` — see ADR-010; the `OperationRegistry` is layered by ADR-024: curated ops static, session/imported ops dynamic at their trust-boundary scopes) - **OQ-08**: Vault integration point (resolved: CLI-embedded, assembly-layer only — see ADR-008, ADR-014, ADR-018, ADR-019) - **OQ-16**: Safe vault operations for call protocol exposure (resolved: none for now — see ADR-014) - **OQ-20**: Encryption key derivation (resolved: HD derivation, not PBKDF2 — see ADR-020)