--- status: draft last_updated: 2026-06-20 reviewed_documents: - overview.md - README.md - open-questions.md - crates/core/README.md - crates/core/core-types.md - crates/core/endpoint.md - crates/core/auth.md - crates/core/config.md - crates/call/README.md - crates/call/call-protocol.md - crates/call/operation-registry.md - crates/vault/README.md - crates/vault/mnemonic-derivation.md - crates/vault/encryption.md - crates/vault/service.md - crates/vault/protocol.md - decisions/001 through 021 (all 21 ADRs) reviewer: architect --- # Architecture Gap Review #001 ## Purpose This document captures gaps, ambiguities, and inconsistencies found in the alknet architecture specifications (core, call, vault crates) before decomposition into implementation tasks. Each finding is categorized by severity and structured as **Problem** → **Suggestion** (or **Open Question** where no solution is yet known). This is a final sanity check before implementation. The vault crate carries over source from the POC with documented drift (OsRng, `unwrap()`, `CURRENT_KEY_VERSION=1`, `spawn` return bug) flagged for correction during implementation sync; that drift is out of scope for this review. The focus is cross-boundary ambiguities and inconsistencies that would cause divergent implementations across crates. ## Severity Definitions | Severity | Meaning | |----------|---------| | **Critical** | Must resolve before implementation — would cause divergent implementations or security issues | | **Warning** | Should resolve — missing specifications that could cause confusion or rework | | **Suggestion** | Consider — improvements for clarity or completeness | --- ## Critical Findings ### C1. `handler_identity` Has No Registration Path **Documents**: ADR-015 (lines 79, 141, 251), operation-registry.md (lines 81, 104, 122, 126, 185), call-protocol.md (lines 265, 279, 288) **Problem**: ADR-015 and operation-registry.md are emphatic and repeated that `handler_identity` is "set at registration by the assembly layer": > ACL check runs against the composing handler's `handler_identity` (**set at > registration**) — operation-registry.md L81 > `handler_identity`: The identity of the handler processing this call. **Set > at registration by the assembly layer**. — operation-registry.md L122 > **Handler identity is set at registration by the assembly layer.** — ADR-015 > Assumption 2 But the registration API shown in operation-registry.md has no place for it: - `OperationSpec` (L33–41) has fields: `name`, `namespace`, `op_type`, `visibility`, `input_schema`, `output_schema`, `access_control`. No `handler_identity`. - `OperationRegistry::register(spec, handler)` (L140) takes `(OperationSpec, Handler)`. - `OperationRegistryBuilder::with(spec, handler)` (L148–153, L259–266) takes `(OperationSpec, Handler)`. So the spec says handler_identity is "set at registration" but the registration API doesn't accept it. Tracing how it would reach `OperationContext` at call time reveals the gap: `build_root_context` (call-protocol.md L265–286) sets `handler_identity: None` for wire-originated calls — correct for the root (ACL runs against caller identity). `OperationEnv::invoke()` propagates `parent.handler_identity.clone()` to children (L185). But the parent's `handler_identity` was `None` (root), so every internal call gets `handler_identity: None`. Under the shown API, ADR-015's "ACL runs against `handler_identity` for internal calls" is checking against `None` — which is the privilege-escalation gap ADR-015 was written to close. The handler's identity must enter the system somewhere. The natural place is registration: the assembly layer associates an `Identity` (scopes) with each handler, and the dispatch path attaches it to the `OperationContext` for calls dispatched to that handler (so it propagates via `parent.handler_identity` when that handler composes children). The spec shows no such mechanism. The TypeScript reference (`/workspace/@alkdev/operations/src/registry.ts`) has the same shape problem at a smaller scale — `OperationContext.trusted` (types.ts L27) is set by `buildEnv()` (env.ts L37), not by `register()`. The TypeScript registry never carried a handler identity; it relied on `trusted` being set by the env builder. The Rust rewrite replaces `trusted` with `handler_identity` (ADR-015) but doesn't show the equivalent of `buildEnv`'s wiring point. **Suggestion**: Decide and document where `handler_identity` is supplied at registration. Likely a third parameter to `register`/`with` or part of a registration bundle (see C4). The dispatch path needs to look up the handler's registered identity by operation name and attach it to the `OperationContext` before invoking the handler, so that handler's children inherit it via `parent.handler_identity`. This should be its own ADR (ADR-022: Handler Registration and Dispatch Context) before any call-crate implementation begins — see C4 for the synthesis. --- ### C2. Scoped Composition Env Has No Registration/Construction Path **Documents**: ADR-015 (lines 150–169, 256–260), operation-registry.md (lines 126, 168, 174, 307), call-protocol.md (line 285) **Problem**: Same shape as C1. ADR-015 and operation-registry.md say the scoped composition env is "set at registration by the assembly layer": > The `OperationEnv` given to a handler is scoped — it can only invoke a > declared set of operations, **set at registration by the assembly layer** > — operation-registry.md L307 > **Static scoping at registration**: the assembly layer declares which > operations a handler may compose — ADR-015 L159 > The scoped env is **declared at registration** (static) — ADR-015 > Assumption 3 But: - `register(spec, handler)` / `with(spec, handler)` take no scoped-env declaration. `OperationSpec` has no field for it. - The only `OperationEnv` implementation shown is `LocalOperationEnv { registry: Arc }` (operation-registry.md L168) — the full registry, not a scoped one. - `LocalOperationEnv::invoke` (L174) does `self.registry.invoke(&name, ...)` against the unfiltered registry. There's no scoping layer shown. - `build_root_context` sets `env: self.env.clone()` (call-protocol.md L285) — one env cloned to every root context. That's a single global env, not per-handler scoping. For scoping to work, each handler must receive a different scoped env, but the dispatch path shown gives every root context the same `self.env`. ADR-015 explicitly calls the scoped env one of the three independent controls that are "all needed" (L177–198), and says without it "the handler can call anything in the registry" — the parameterized-dispatch attack surface. The spec locks the requirement but not the mechanism, and the mechanism is non-obvious (it changes the registration API and the per-handler context construction). The TypeScript reference (`/workspace/@alkdev/operations/src/env.ts`) implements scoping via `buildEnv({ registry, context, allowedNamespaces })` — the env builder takes an `allowedNamespaces` filter and returns a pre-filtered env (L14–46). The Rust equivalent needs the same: a per-handler env built from a per-handler scope declaration. The declaration point (registration) and the construction point (dispatch) both need specifying. **Suggestion**: Specify how a scoped env is declared at registration and how it reaches the handler's `OperationContext.env`. Likely the same bundle that resolves C1 — see C4. --- ### C3. `Capabilities` Lives in Two Places and the Two Models Aren't Reconciled **Documents**: ADR-014 (lines 82–83, 114–124), operation-registry.md (lines 105, 253–256, 274–297), call-protocol.md (line 274) **Problem**: ADR-014 and operation-registry.md show two different models for how a handler gets outbound credentials: - **Model A — construction-time capture in the handler closure.** operation-registry.md L253–256: `agent_chat_handler(Capabilities::new().with_api_key("google", google_api_key))`. The handler is constructed with capabilities baked in. ADR-014 L82: "injected at handler construction (the common case: a static decrypted API key held for the handler's lifetime)." - **Model B — per-request on `OperationContext.capabilities`, propagated through composition.** operation-registry.md L105: `OperationContext { pub capabilities: Capabilities }`. L186: `capabilities: parent.capabilities.clone()` in `invoke()`. ADR-014 L83: "or scoped per-request for internal-only flows." call-protocol.md L274: `build_root_context(..., capabilities: Capabilities)` takes a capabilities parameter. These two models don't connect. If the handler closure captured the capabilities at construction (Model A), then `OperationContext.capabilities` (Model B) is either redundant or must be populated from the closure — but the closure isn't passed the context, the closure receives the context. There's a circular dependency: the handler needs capabilities to do its work, the capabilities are on the context the handler receives, but the context is built by the CallAdapter which doesn't have the handler's construction-time capabilities (the `CallAdapter` struct at call-protocol.md L35–38 has no `capabilities` field). `build_root_context`'s comment (call-protocol.md L274) says "the CallAdapter's own capabilities (if any)" — hedged with "if any," suggesting even the spec author wasn't sure. For composition, `invoke()` clones `parent.capabilities` — so children inherit the parent's capabilities. But if the parent's `context.capabilities` was empty (because the parent handler used its captured capabilities, not the context's), children get empty capabilities too. An agent handler composing `/fs/readFile` with an outbound LLM call would lose the LLM API key at the composition boundary. **Suggestion**: Pick one of: - (a) Capabilities are only per-request on `OperationContext`, populated by the dispatch path from a per-handler capabilities table held by the registry/adapter (the construction-time "baking" just populates that table). - (b) Capabilities are captured in the handler closure and `OperationContext.capabilities` is removed, with composition passing capabilities explicitly as an `invoke()` parameter. - (c) Keep both but document the bridge (e.g., the handler closure captures an `Arc` and the registry also holds a clone that the CallAdapter uses to populate `build_root_context`). Option (a) is closest to the shape that resolves C1 and C2 cleanly — see C4. --- ### C4. C1–C3 Are One Tangle: Registration Bundle Is Missing **Documents**: operation-registry.md (lines 130–153, 168–194), call-protocol.md (lines 265–288) **Problem**: C1, C2, C3 are the same underlying gap: the registration/dispatch API surface in operation-registry.md doesn't carry the three things ADR-014 and ADR-015 say are "set at registration" — handler identity, scoped env, and capabilities. All three are load-bearing for the security model, all three are locked in ADRs, and none are reflected in the `register(spec, handler)` signature or the `OperationSpec` struct. The three controls in ADR-015 (visibility, handler identity, scoped env) plus the capability injection in ADR-014 all enter the system at the same boundary: the assembly layer hands the registry a `(spec, handler)` pair plus the handler's runtime context material. The natural shape is a registration bundle that carries all three: ```rust // Illustrative — the exact shape is the decision to make. pub struct HandlerRegistration { pub spec: OperationSpec, pub handler: Handler, pub identity: Identity, // C1: handler identity for ACL pub scoped_env: ScopedOperationEnv, // C2: reachable operations pub capabilities: Capabilities, // C3: outbound credentials } ``` And the dispatch path (`build_root_context` + `OperationEnv::invoke`) reads from this bundle: the CallAdapter looks up the handler's registration by operation name, attaches `registration.identity` to `OperationContext.handler_identity` (so children inherit it), uses `registration.scoped_env` as `OperationContext.env` (so the handler can only reach declared ops), and populates `OperationContext.capabilities` from `registration.capabilities` (so composition propagates them via `parent.capabilities.clone()`). This resolves all three cleanly and matches the prior art: the TypeScript `buildEnv({ registry, context, allowedNamespaces })` already bundles the scoping decision with the env construction; the Rust version generalizes this to a per-handler registration rather than a per-call build. **Suggestion**: Write ADR-022 (Handler Registration and Dispatch Context) that: 1. Defines the registration bundle shape. 2. Updates `OperationRegistry::register` / `OperationRegistryBuilder::with` signatures. 3. Shows how `build_root_context` and `OperationEnv::invoke()` read from the bundle. 4. Reconciles Model A and Model B for capabilities (recommend option (a): the bundle carries capabilities, the closure captures nothing, the dispatch path populates `OperationContext.capabilities` from the bundle). 5. Updates operation-registry.md and call-protocol.md to match. This is the single highest-value piece of work before implementation. It closes the gap between ADR-014/015's locked security model and the API surface in one place, before any implementer has to guess. --- ### C5. Operation Registry Has No Error Schema Concept **Documents**: operation-registry.md (lines 33–41, OperationSpec), call-protocol.md (lines 127–143, call.error payload), ADR-017 (lines 113–124, from_call mirrors remote spec) **Problem**: The `OperationSpec` has `input_schema` and `output_schema` but no `error_schemas`. The `call.error` payload (call-protocol.md L128–134) carries a `code` (string enum: `NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) and a `message`, but there's no way for an operation to declare the specific errors it can return beyond these five infrastructure codes. This matters for two reasons: 1. **OpenAPI/HTTP adapter fidelity.** OpenAPI specs naturally include error information (response status codes with schemas). The `from_openapi` adapter (ADR-017 L113–124) imports operations and mirrors "the remote operation's name, namespace, type, schemas, and access control" — but with no error schema field, error responses from the OpenAPI source are dropped on import. `to_openapi` has nowhere to project error information to. The same gap applies to `from_mcp`/`to_mcp`. An OpenAPI operation that returns `404: { schema: NotFoundError }` and `422: { schema: ValidationError }` cannot be faithfully represented in alknet's `OperationSpec`. 2. **Client ergonomics and type safety.** A client calling `/fs/readFile` can't know from the schema that it might return `FILE_NOT_FOUND` vs `PERMISSION_DENIED` vs `INVALID_PATH`. The five infrastructure codes cover protocol-level failures; they don't cover operation-level domain errors. A caller has to parse `message` strings — the exact anti-pattern that typed RPC is meant to avoid. The TypeScript reference (`/workspace/@alkdev/operations/src/types.ts` L38–47, L94, L112) defines `ErrorDefinitionSchema` and an optional `errorSchemas?: ErrorDefinition[]` on `OperationSpec`. Each error definition has `code`, `description`, `schema`, and optional `httpStatus`. The `mapError()` function (`error.ts` L25–51) matches thrown errors against the declared error schemas by code prefix. This is a proven pattern; the Rust implementation should carry it forward. The translator agent likely omitted it because it's `Optional` in the TS schema (so dropping it doesn't break the happy path) and because error schemas are semantically different from input/output schemas (an operation returns one output but could return any of several errors). That's a reasonable judgment call for a first translation pass, but it leaves a real gap for adapters and clients. Without error schemas, `from_openapi` and `to_openapi` are lossy on the error axis — an OpenAPI operation's error contract is silently dropped. This undermines the "discoverable, type-safe RPC" goal in operation-registry.md L18–24. It's also a cross-boundary issue: the call protocol, the operation registry, and the adapters all need to agree on the error representation, and right now only the call protocol has one (the five infrastructure codes). **Suggestion**: Add an optional `error_schemas: Vec` (or similar) to `OperationSpec`, where `ErrorDefinition` carries a `code`, optional `http_status` (for adapter projection), and a JSON Schema for the error detail payload. Extend the `call.error` payload to optionally carry an error detail matching the declared schema. Document that the five infrastructure codes (`NOT_FOUND` etc.) are protocol-level and distinct from operation-level error codes. This should be its own ADR (ADR-023: Operation Error Schemas) — the error representation is a one-way door (once the wire format carries error codes, changing them breaks clients), and it must be decided before implementing `from_openapi`/`to_openapi`, otherwise the adapter contract in ADR-017 can't be faithfully implemented. --- ## Warning Findings ### W1. ADR-020 Contains a Stale Example Path That Contradicts ADR-021 **Documents**: ADR-020 (line 70) **Problem**: ADR-020 L70 says "different derivation paths (`m/74'/2'/0'/0'` for v1, `m/74'/2'/1'/0'` for a future v2)". This puts the version index at the third level (`2'/1'/0'`). ADR-021 (which resolves the actual rotation scheme) and all the vault specs put the version index at the last level: v2 = `m/74'/2'/0'/0'`, v3 = `m/74'/2'/0'/1'` (mnemonic-derivation.md L226, encryption.md L160–164, ADR-021 L68–73). ADR-020's example is a pre-ADR-021 leftover and directly contradicts the locked decision. An implementer reading ADR-020 in isolation would derive at the wrong path. **Suggestion**: Update ADR-020 L70's example to `m/74'/2'/0'/0'` for v2, `m/74'/2'/0'/1'` for v3, or remove the illustrative example and point to ADR-021 for the scheme. --- ### W2. `EncryptionError::KeyVersionMismatch` Is Stale After OQ-22 Resolved **Documents**: encryption.md (lines 197, 205–210) **Problem**: `EncryptionError::KeyVersionMismatch { expected, actual }` is defined with comment "reserved for future rotation (OQ-22)". encryption.md L205 says it's "defined but unused in v2." But OQ-22 is now resolved by ADR-021, and ADR-021's rotation mechanism (version-indexed paths + `decrypt` derives at the version's path + `rotate` re-encrypts) never emits `KeyVersionMismatch` — there's no version-mismatch to detect, because `decrypt` always derives the key the blob says to use. So after ADR-021, `KeyVersionMismatch` isn't "reserved for future rotation" (rotation is implemented, without it) — it's effectively dead. The "reserved for future (OQ-22)" comment is stale and could mislead an implementer into thinking they need to wire it up for rotation. **Suggestion**: Update the encryption.md comment to state that ADR-021 implements rotation without this variant, and clarify whether it's retained for a different future use or should be removed. --- ### W3. `from_mcp` Is Missing from Adapter Lists in operation-registry.md and ADR-014 **Documents**: operation-registry.md (lines 59, 301), ADR-014 (line 105) **Problem**: ADR-017 L160–165 lists four import adapters: `FromOpenAPI`, `FromMCP`, `FromCall`, `FromJsonSchema`. But: - operation-registry.md L59 says "`from_openapi` and `from_jsonschema` adapters register operations as `Internal` by default" — omits `from_mcp` and `from_call`. - operation-registry.md L301 says "The `from_openapi`, `from_jsonschema`, and `from_call` adapter patterns" — omits `from_mcp`. - ADR-014 L105 says "The `from_openapi` and `from_jsonschema` adapter patterns" — omits `from_mcp` and `from_call`. `from_mcp` is a real import adapter (in ADR-017's trait list and ADR-013's adapter contract). Its absence from the prose in operation-registry.md/ADR-014 is likely an oversight, but it could make an implementer wonder whether `from_mcp` is real or whether it follows the Internal-by-default rule. **Suggestion**: Standardize on listing all four import adapters (or say "all import adapters" / "all `OperationAdapter` implementations") when stating the Internal-by-default rule, so the rule clearly applies to `from_mcp` too. --- ### W4. `derive_encryption_key_for_version` (ADR-021) Not in service.md Method List **Documents**: service.md (lines 119–127, 161–179), ADR-021 (lines 104–122) **Problem**: ADR-021 L104–122 introduces `derive_encryption_key_for_version(version)` and shows `decrypt` using it. But service.md's "Derive Methods" section (L119–127) only shows `derive_encryption_key(path)`, and its `encrypt`/`decrypt` (L161, L172) reference the version-indexed path scheme in prose but don't show the version-aware method. An implementer whose primary reference is service.md wouldn't see `derive_encryption_key_for_version` and might implement `decrypt` as "always derive at `PATHS::ENCRYPTION`" (the pre-ADR-021 behavior, which ADR-021 explicitly calls out as a drift bug to fix). The ADR is authoritative, but the spec doc is what the implementer reads first. **Suggestion**: Add `derive_encryption_key_for_version` to service.md's method list and show `decrypt` calling it, mirroring ADR-021. Clarify whether `derive_encryption_key(path)` (the path-based method) is still public API or subsumed by the version-based one. --- ## Suggestions ### S1. Abort Policy Field Absent from `call.requested` Payload **Documents**: call-protocol.md (lines 112–118, call.requested payload), ADR-016 (lines 86–89, 177–181) **Suggestion**: ADR-016 says the `abort-dependents` vs `continue-running` policy is per-call and "the specific mechanism (a field in the `call.requested` payload, a field on `OperationContext`, or a per-operation declaration) is a two-way door for implementation." call-protocol.md's `call.requested` payload (L112–118) shows `operationId`, `input`, `auth_token` — no policy field. Since ADR-016 explicitly leaves the mechanism open, this isn't an inconsistency, but a one-line note in call-protocol.md ("the abort policy is conveyed via [TBD — two-way door, ADR-016]") would prevent an implementer from assuming the payload is final and forgetting the policy exists. --- ### S2. `build_root_context` "CallAdapter's Own Capabilities (If Any)" Hedge **Documents**: call-protocol.md (line 274) **Suggestion**: The "(if any)" hedge reflects the C3 ambiguity. Once C3/C4 is resolved, this comment should be made definitive — either "the handler's capabilities, looked up from the registry by operation name" (if option (a)) or removed (if capabilities are closure-captured only). Leaving it hedged propagates the ambiguity into the one place an implementer looks for "how does the root context get built." --- ### S3. Example Style: `Arc::new(handler)` Inline vs. Pre-wrapped **Documents**: operation-registry.md (lines 149, 264) **Suggestion**: L149 uses `.with(spec, Arc::new(services_list_handler))` while L264 uses `.with(spec, agent_handler)` where `agent_handler` was already `Arc::new`'d at L253. Both are valid; pick one style in the examples to avoid a moment of "wait, does `with` take an `Arc` or a `Handler`?" confusion. Trivial. --- ### S4. `last_updated` Dates Scattered; README Index Older Than Sub-docs **Documents**: frontmatter across all docs **Suggestion**: The README index date (2026-06-19) is older than several sub-documents (core/README 06-21, call/operation-registry 06-22, etc.). If frontmatter dates are meant to mean something, bump README/overview when sub-docs change. Not worth a cycle on its own. --- ## Summary Statistics | Severity | Count | Status | |----------|-------|--------| | Critical | 5 | Needs resolution before implementation | | Warning | 4 | Should resolve — mostly quick spec clarifications | | Suggestion | 4 | Consider for clarity | ## Recommended Resolution Order 1. **Critical findings C1–C4** — These are one tangle (the registration bundle). Resolve together via ADR-022 (Handler Registration and Dispatch Context) before any call-crate implementation begins. C1, C2, C3 are kept as separate findings for traceability (each cites specific lines) but C4 synthesizes them: the registration bundle carries handler identity, scoped env, and capabilities, and the dispatch path reads from it. This is the highest-value work before implementation. 2. **Critical finding C5** — Error schemas. Resolve via ADR-023 (Operation Error Schemas) before implementing `from_openapi`/`to_openapi`, otherwise the adapter contract in ADR-017 can't be faithfully implemented. 3. **Warning findings W1–W4** — Small doc edits. W1 (stale ADR-020 path) and W4 (missing service.md method) should go in the same pass as the ADR-022 spec updates since they touch the same files. W2 and W3 are one-line fixes. 4. **Suggestions S1–S4** — Address opportunistically during implementation.