From 8f8a8a48f9be8509e202a150b602722ee0b73ae7 Mon Sep 17 00:00:00 2001 From: "glm-5.2" Date: Mon, 22 Jun 2026 05:09:39 +0000 Subject: [PATCH] docs(reviews): add pre-implementation architecture sanity check #002 Second pre-implementation review. Goes wider than #001 on cross-document consistency and the two-way-door framing from ADR-009. Finds 13 critical, 21 warning, 12 suggestion issues: - Governance: ADR-022/023 are Proposed but specs treat them as binding; ADR-015/002/004 (Accepted) contradict later refinements without supersession markers - Abort policy (ADR-016) missing from OperationContext struct; OperationEnv trait never defined - OperationContext.env type identity crisis (reachability set vs dispatch trait) - ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait stale vs ADR-022 bundle - OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides a crate-decomposition decision; RemoteService path unvalidated - Vault operation access policy table incomplete for security-sensitive methods - site_password_path string-to-index mapping breaks determinism guarantee - Two-way-door audit: ADR-022 narrowed several doors without updating OQ classifications; 'published artifact is a contract' blind spot in framework Includes recommended 5-pass resolution order. --- ...mplementation-architecture-sanity-check.md | 1780 +++++++++++++++++ 1 file changed, 1780 insertions(+) create mode 100644 docs/reviews/002-pre-implementation-architecture-sanity-check.md diff --git a/docs/reviews/002-pre-implementation-architecture-sanity-check.md b/docs/reviews/002-pre-implementation-architecture-sanity-check.md new file mode 100644 index 0000000..cb4bf3d --- /dev/null +++ b/docs/reviews/002-pre-implementation-architecture-sanity-check.md @@ -0,0 +1,1780 @@ +--- +status: draft +last_updated: 2026-06-22 +reviewed_documents: + - overview.md + - README.md + - open-questions.md + - crates/core/README.md + - crates/core/core-types.md + - crates/core/endpoint.md + - crates/core/auth.md + - crates/core/config.md + - crates/call/README.md + - crates/call/call-protocol.md + - crates/call/operation-registry.md + - crates/vault/README.md + - crates/vault/mnemonic-derivation.md + - crates/vault/encryption.md + - crates/vault/service.md + - crates/vault/protocol.md + - decisions/001 through 023 (all 23 ADRs) +reviewer: architect +--- + +# Architecture Gap Review #002 + +## Purpose + +This is the second pre-implementation sanity check. Review #001 caught the +registration-bundle tangle (C1–C4) and the missing error schemas (C5), both +resolved via ADR-022 and ADR-023. This review goes deeper on two axes the +first review under-covered: + +1. **Cross-document consistency**: places where an Accepted ADR contradicts a + Proposed ADR, where a spec doc adopted a Proposed ADR's types without the + ADR being advanced, and where a later ADR narrowed a "two-way door" + deferral without updating the OQ classification. +2. **Two-way-door deferral audit**: the user's concern that the one-way/two-way + door framing from ADR-009 has been applied too loosely. Some deferrals are + legitimate, but several are questionable or misleading — either because a + later ADR quietly narrowed the door, because the "path of least resistance" + implementation forecloses a better option, or because multiple + individually-reversible deferrals compound to foreclose a path that no + single deferral would block on its own. + +Review #001 resolved 5 critical, 4 warning, and 4 suggestion findings. This +review finds a different set: the registration-bundle work surfaced new +inconsistencies between ADR-015 (Accepted) and ADR-022 (Proposed), the abort +policy (ADR-016) was never reflected in the OperationContext struct, the +OperationEnv trait was never defined, and several vault deferrals have +security dimensions that the two-way-door framing doesn't surface. + +## Severity Definitions + +| Severity | Meaning | +|----------|---------| +| **Critical** | Must resolve before implementation — would cause divergent implementations or security issues | +| **Warning** | Should resolve — missing specifications that could cause confusion or rework | +| **Suggestion** | Consider — improvements for clarity or completeness | + +--- + +## Critical Findings + +### C1. ADR-015 Is Not Reconciled With ADR-022 — Direct Contradiction in the Document Set + +**Documents**: ADR-015 (Status: Accepted, Decision 3 L114–136, Assumption 6 +L275–280), ADR-022 (Status: Proposed, Decision 2 L145–186), operation-registry.md +(L115, L133, L172), call-protocol.md (L296) + +**Problem**: ADR-015 is marked **Accepted** and defines +`handler_identity: Option` in its Decision 3 (L124) — a peer `Identity` +type resolved via `IdentityProvider`. Assumption 6 (L275–280) explicitly states +"the handler identity is a full `Identity` (with scopes), not a special +principal type… This reuses the existing `Identity` type and `IdentityProvider` +infrastructure (ADR-004)." + +ADR-022 is marked **Proposed** and explicitly supersedes ADR-015's Assumption 6 +(L147–149: "This ADR refines that: composition authority is a declared authority +bundle, not a peer `Identity`"; L183–185: "This supersedes ADR-015's Assumption +6"). The spec docs now use `handler_identity: Option` — +a different, non-peer type. + +This is a **direct contradiction**. Per the ADR lifecycle, a superseded +decision should be marked `Superseded` or carry an inline note pointing to the +superseding ADR. ADR-015 has neither. Worse, ADR-015 has **higher status** +(Accepted) than the ADR that supersedes it (Proposed). An implementer reading +ADR-015 in isolation would build `Identity`-based handler identity; an +implementer reading the specs would build `CompositionAuthority`-based. The +Accepted-overrides-Proposed governance rule means the old `Identity` definition +is, by the document set's own lifecycle, currently authoritative — yet the +specs have already adopted the proposed change. + +This is the same shape of bug as review #001's W1 (ADR-020's stale example +path contradicting the locked ADR-021 scheme), but more severe: it's a +security-model type, not a path string. + +**Suggestion**: Two things must happen before implementation: + +1. Advance ADR-022 and ADR-023 to **Accepted**. The specs already depend on + them as the source of truth for types that appear throughout the code. + Keeping them Proposed while the specs adopt their types is incoherent + governance. (See C2.) +2. Reconcile ADR-015: either update its Decision 3 and Assumption 6 to point + to ADR-022 (inline amendment note + change `Identity` to + `CompositionAuthority` in the struct), or mark the relevant sections + `Superseded by ADR-022 (Decision 2)`. + +The cleanest path is to do both in one pass: advance ADR-022/023 to Accepted, +amend ADR-015 to reference the supersession. This resolves C1, C2, and C6 in +one move. + +--- + +### C2. ADR-022 and ADR-023 Are "Proposed" but the Specs Treat Them as Authoritative + +**Documents**: ADR-022 (Status: Proposed), ADR-023 (Status: Proposed), +operation-registry.md (L40, L115, L133, L149, L158, L160–168, L286, L372–379, +L404–406), call-protocol.md (L296, L305, L369–373), README.md (ADR table +L36–60) + +**Problem**: Both ADR-022 and ADR-023 are marked **Proposed**. Yet the spec +docs depend on them pervasively as the source of truth: + +- `OperationSpec.error_schemas` (ADR-023) — operation-registry.md L40, L286. +- `ErrorDefinition` (ADR-023) — operation-registry.md L56–61. +- `HandlerRegistration` with `provenance`, `composition_authority`, + `scoped_env`, `capabilities` (ADR-022) — operation-registry.md L160–168, + the central registration type. +- `OperationContext.handler_identity: Option` (ADR-022 + Decision 2) — operation-registry.md L115, L133; call-protocol.md L296. +- `CompositionAuthority`, `ScopedOperationEnv`, `OperationProvenance` + (ADR-022) — used throughout. + +The README's ADR table (L36–60) shows ADR-022 and ADR-023 as "Proposed," but +the call-crate README's ADR table has **no Status column** at all — a reader +can't tell which ADRs are binding. Per the ADR lifecycle, "Proposed" means +under review, not yet a commitment. + +Combined with C1 (ADR-015 Accepted still contradicts ADR-022 Proposed), the +governance state is incoherent: the Accepted ADR says `Identity`, the Proposed +ADR says `CompositionAuthority`, and the specs have picked the Proposed one. + +**Suggestion**: Advance ADR-022 and ADR-023 to **Accepted** before +implementation. They are clearly intended to be binding — the specs already +implement them. Add a Status column to the crate README ADR tables so the +binding/non-binding distinction is visible. This is the governance fix that +unblocks C1. + +--- + +### C3. The Abort Policy (ADR-016 Decision 6) Is Missing from `OperationContext` and `OperationEnv::invoke()` + +**Documents**: ADR-016 (Decision 6 L120–164, Accepted), operation-registry.md +(`OperationContext` struct L111–128, `LocalOperationEnv::invoke()` L211–249), +call-protocol.md (L124, L345) + +**Problem**: ADR-016 Decision 6 states unambiguously (L122–125): "The abort +policy (`abort-dependents` vs `continue-running`) is set on `OperationContext` +and propagated to children via `OperationEnv::invoke()`." It shows +`OperationContext` carrying the policy (L154) and an opt-in +`invoke_with_policy(...)` method (L149–151). + +The `OperationContext` struct in operation-registry.md (L111–128) has **no +`abort_policy` field**. The fields shown are: `request_id`, +`parent_request_id`, `identity`, `handler_identity`, `capabilities`, +`metadata`, `env`, `internal`. No policy. The `LocalOperationEnv::invoke()` +implementation (L211–249) likewise constructs a child `OperationContext` with +no policy field and takes no policy parameter. + +call-protocol.md L124 and L345 correctly *state* that the policy is on +`OperationContext` and propagated via `OperationEnv::invoke()`, but the actual +struct definition in operation-registry.md — the one an implementer would copy +— omits it. This is a gap between ADR-016 (Accepted) and the spec. An +implementer copying the spec would simply omit the abort policy, and the +`continue-running` opt-in would have no wiring point. + +**Suggestion**: Add `pub abort_policy: AbortPolicy` (with a default of +`AbortPolicy::AbortDependents`) to the `OperationContext` struct in +operation-registry.md, and define the `AbortPolicy` enum +(`AbortDependents | ContinueRunning`). The `build_root_context` in +call-protocol.md should set `abort_policy: AbortPolicy::AbortDependents` (the +default per ADR-016 L140). See C4 for the `invoke()` side. + +--- + +### C4. `OperationEnv` Trait Is Never Defined; `invoke_with_policy()` Absent from Spec + +**Documents**: operation-registry.md (L196–259), ADR-016 (Decision 6 L144–164) + +**Problem**: Two related gaps: + +1. **The `OperationEnv` trait is never defined.** operation-registry.md + repeatedly references `OperationEnv` as a **trait** that "must remain a + trait" (L259, also OQ-19), and shows + `#[async_trait] impl OperationEnv for LocalOperationEnv` (L211–212), but + **the trait definition itself is never shown**. Only the `LocalOperationEnv` + struct (L207–209) and its impl block (L211–249) appear. An implementer + cannot know the trait's method signatures, return types, or whether + `invoke()` is the sole method. + +2. **The `invoke_with_policy()` opt-in is absent.** ADR-016 Decision 6 + (L144–152) shows the policy entering through `OperationEnv::invoke()` via + `invoke_with_policy("fs", "readFile", input, &context, + AbortPolicy::ContinueRunning)`. The spec's `OperationEnv::invoke()` + signature (L213) takes no policy parameter, and there is no + `invoke_with_policy` method, no `InvokeOptions` struct, no builder + pattern. The spec never defines the trait surface, so the policy opt-in + mechanism ADR-016 mandates is unspecified. + +This is load-bearing because OQ-19 makes the trait-ness of `OperationEnv` a +**constraint**: session-scoped registries depend on wrapping the global env +via trait layering. If the trait surface is unspecified, two implementers +could define it differently, breaking the wrapping pattern. + +**Suggestion**: Define the `OperationEnv` trait explicitly in +operation-registry.md, showing at minimum: + +```rust +#[async_trait] +pub trait OperationEnv: Send + Sync { + async fn invoke( + &self, + namespace: &str, + operation: &str, + input: Value, + parent: &OperationContext, + ) -> ResponseEnvelope; + + /// Opt-in: compose a child with a non-default abort policy. + async fn invoke_with_policy( + &self, + namespace: &str, + operation: &str, + input: Value, + parent: &OperationContext, + policy: AbortPolicy, + ) -> ResponseEnvelope { + // default implementation: set policy on child context, delegate to invoke + } +} +``` + +This satisfies both the "must remain a trait" constraint and ADR-016's +policy parameter. See C7 for a related type ambiguity on the same struct. + +--- + +### C5. ADR-017's `from_call` Mirror List Does Not Include `error_schemas` — Inconsistency with ADR-023 + +**Documents**: ADR-017 (Decision 3 L113–122, Assumption 4 L274–279, Accepted), +ADR-023 (Decision 1 L112–147, L42–47, L257, Proposed) + +**Problem**: ADR-017 Decision 3, step 3 (L118–119) says `from_call` constructs +a spec that "mirrors the remote operation's name, namespace, type, schemas, +and access control." Assumption 4 (L274–275) repeats: "the imported +`OperationSpec` has the same name, namespace, type, schemas, and access +control as the remote operation." The list is: name, namespace, type, +schemas, access control. There is **no `error_schemas`** in either list. + +ADR-023 adds `error_schemas: Vec` to `OperationSpec` (L123) +and Decision 5 (L236–261) makes adapter fidelity explicit for +`from_openapi`/`to_openapi`/`from_mcp`/`to_mcp`. But ADR-023 does **not** +mention `from_call` in Decision 5 — it covers OpenAPI and MCP adapters only. +Meanwhile ADR-017's `from_call` mirror list predates `error_schemas` existing +as a field. + +The result: it's ambiguous whether `from_call` should mirror `error_schemas`. +ADR-017's step 2 (L116–117) says `from_call` calls `services/schema` "for each +→ gets the input/output JSON Schemas" — again, no mention of error schemas. +ADR-023 Decision 6 (L264–286) confirms `services/schema` *returns* +`error_schemas`, so the data is available, but ADR-017 doesn't say to consume +it. + +An implementer following ADR-017 literally would import operations with empty +`error_schemas`, silently dropping the remote error contract — the exact +lossy-import problem ADR-023 was written to solve for OpenAPI/MCP, now +recurring for `from_call`. + +**Suggestion**: Update ADR-017 Decision 3 step 3 and Assumption 4 to include +`error_schemas` in the mirror list (name, namespace, type, schemas, +**error_schemas**, access control), and update step 2 to note that +`services/schema` now also returns error schemas. Add a cross-reference to +ADR-023. This is an inconsistency between an Accepted ADR (017) and a later +ADR (023) that must be reconciled — same shape as review #001's W3 +(`from_mcp` missing from adapter lists), but for a new field. + +--- + +### C6. `OperationContext.env` Has a Type Identity Crisis — Reachability Set vs Dispatch Trait + +**Documents**: ADR-015 (L127: `pub env: OperationEnv`), ADR-022 (L208–215 +`ScopedOperationEnv`, L286–287 `env: registration.scoped_env…`), +operation-registry.md (L118: `pub env: OperationEnv`; L220: +`parent.env.allows(&name)`; L243–244: `env: registration.scoped_env.clone()…`; +L198: `context.env.invoke(...)`), call-protocol.md (L298–299) + +**Problem**: There is a type identity problem for the `env` field on +`OperationContext`: + +- ADR-015 L127 and operation-registry.md L118 declare `pub env: OperationEnv` + (a trait object / the trait type). +- ADR-022 and operation-registry.md L243–244 and call-protocol.md L298–299 + *populate* `env` with a `ScopedOperationEnv` (a concrete struct: + `allowed_operations: HashSet`, ADR-022 L208–214). + +`ScopedOperationEnv` (a reachability set — ADR-022's reachability control) +and `OperationEnv` (the composition dispatch trait — the thing with +`invoke()`) are **different concepts**. ADR-015 conflated them by putting +`OperationEnv` on the context; ADR-022 introduced `ScopedOperationEnv` as the +reachability *input* but the dispatch path assigns the scoped set directly to +`env:`. + +operation-registry.md L220 calls `parent.env.allows(&name)` — implying `env` +is the `ScopedOperationEnv` (has `.allows()`). But the same field is named +`OperationEnv` in the struct and described (L136) as "the operation environment +for composing calls… `OperationEnv`" with `invoke()`. A handler calls +`context.env.invoke(...)` (L198, L200) — implying `env` is the dispatch +trait. One field cannot be both a `HashSet`-based reachability gate and a +trait with an async `invoke()` method, unless they are the *same type* (they +are not — ADR-022 treats `ScopedOperationEnv` as data and the dispatch +`OperationEnv`/`LocalOperationEnv` as the trait impl that *consults* the +scoped env). + +This is a genuine type-system ambiguity that would force an implementer to +guess: is `context.env` the dispatch trait (then where does the scoped +reachability set live?), or is it the reachability set (then how does the +handler call `invoke()` on it?), or is it a composite type that holds both? + +This interacts with OQ-19's constraint that `OperationEnv` must remain a trait: +if `env` is `ScopedOperationEnv` (a concrete struct), the session-overlay +pattern is closed because the field can't hold a trait object. + +**Suggestion**: Separate the two concepts explicitly. Likely: +`OperationContext` carries `scoped_env: ScopedOperationEnv` (reachability +data, used for the `allows()` check) and a separate `env: Box` (or `Arc`) that is the dispatch trait (used +for `invoke()`). The reachability check in `invoke()` consults +`parent.scoped_env.allows(&name)`, and the actual dispatch goes through the +`OperationEnv` trait impl. The current field naming (`env` for both) is the +root of the confusion and should be disambiguated. This may warrant an ADR-022 +clarification, since it changes the struct shape that ADR-022 Decision 5 shows. + +--- + +### C7. OQ-21's "Non-Breaking" Deferral Conflicts With ADR-019 and Hides a Crate-Decomposition Decision + +**Documents**: open-questions.md (OQ-21 L252–285), ADR-019 (L133–136), +crates/vault/protocol.md (L187–216, L278–291), crates/vault/service.md +(L300–322) + +**Problem**: Three interlocking problems: + +1. **ADR-019 directly contradicts OQ-21's deferral framing.** ADR-019 L133–136 + states: *"Remote vault administration (unlock a running node's vault over + the network) is not supported. If needed in the future, it requires a + separate, heavily restricted mechanism (admin scope, mTLS-only, never + expose the mnemonic over an unauthenticated channel) and **its own ADR**."* + OQ-21 defers remote vault *access* (derive/encrypt/decrypt) as + "non-breaking" with no ADR required. The boundary between ADR-019's + "remote vault administration" (needs an ADR) and OQ-21's "remote vault + access" (deferred, no ADR) is undefined. An implementer reading ADR-019 + could reasonably conclude that *any* remote vault exposure requires an + ADR; an implementer reading OQ-21 could conclude none is needed. + +2. **The "non-breaking" claim is only true for the wire format, not for + secure enablement.** OQ-21 and protocol.md L278–286 frame enabling remote + access as a "server-setup change." But the auth-wrapping handler that makes + remote access *safe* cannot live in the vault crate (ADR-018) and requires + either (a) assembly-layer integration or (b) **a new dedicated vault-server + crate depending on both alknet-core and alknet-vault** (protocol.md + L201–204, open-questions.md L269). Option (b) is a crate-decomposition + change (ADR-003 territory), not "server setup." OQ-21's "two-way (enabling + is non-breaking)" door-type classification (L256) is misleading: once + workers build dependencies on the remote vault API, disabling it breaks + them — the door is operationally one-way even if the wire format is + additive. + +3. **Nested, undecided deferral.** OQ-21 L269 says the auth handler "lives in + the assembly layer **or** a dedicated vault-server crate." This "or" is + itself an undecided architectural choice (affects the dependency graph and + ADR-003's crate decomposition) deferred *inside* the OQ-21 deferral. It is + not tracked as a separate question. + +4. **The RemoteService path is unvalidated.** The deferral means + `VaultProtocol`'s `RemoteService` trait impl, `DerivedKey`'s dual + serialization (JSON redacts, postcard preserves), and + `VaultServiceActor`'s transport-agnosticism have never been exercised over + a real network. The spec *asserts* these work ("by construction"), but + there are concrete failure modes that only surface at enablement — + e.g., `DerivedKey`'s custom `Serialize` redacts `private_key` for + "human-readable formats"; whether postcard correctly bypasses the + redaction is a known source of subtle bugs. If it redacts under postcard + too, remote dispatch silently corrupts private keys. ADR-009:33 warns + "if the right choice is unclear, validate with a POC." The deferral skips + the POC and asserts the door is open by construction. + +5. **Safety footgun.** The default `IrohProtocol` handler forwards all + message types without auth (protocol.md L189–191). If someone naively + registers the ALPN without the wrapping handler, `Unlock` (mnemonic over + the wire) and `Lock` (DoS) become remotely callable. The operation access + policy is documented in prose only; nothing in the vault crate prevents + the insecure configuration. + +**Suggestion**: Promote OQ-21 from "deferred" to "open." Reconcile with +ADR-019 by explicitly defining the administration-vs-access boundary. Decide +now whether a vault-server crate is part of the architecture (this is a +crate-decomposition one-way door per ADR-003/ADR-018 and should be decided, +not deferred). If remote access is genuinely deferred, require that the +*enablement* (not just the protocol) be covered by a future ADR — matching +ADR-019's "requires its own ADR" language. Correct the door-type +classification: enabling remote access is a one-way door operationally (workers +depend on it), even if the wire format is additive. At minimum, add a guard +clause: "The RemoteService path is unvalidated. The `DerivedKey` postcard +serialization and the actor's transport-agnosticism must be verified by a +POC before claiming the door is open." + +--- + +### C8. The Operation Access Policy Table Is Incomplete — Security-Sensitive Methods Are Unclassified + +**Documents**: crates/vault/protocol.md (L218–236), crates/vault/service.md +(L104–212) + +**Problem**: The "Operation access policy" table (protocol.md L223–232) +covers exactly the 8 `VaultProtocol` enum variants. But `VaultServiceHandle` +exposes additional methods not in the protocol enum: `rotate`, `unlock_new`, +`derive_encryption_key_for_version`, `derive_password_string`, `is_unlocked`. +These methods have no access classification — an implementer cannot determine +from the spec whether they are local-only, remote-capable, or simply not +protocol-exposed. This is security-relevant: + +- **`rotate`** re-encrypts blobs (security-sensitive operational action; + ADR-021 L168–170 says "rotation is an operational action, not a runtime + behavior"). If it were ever exposed remotely, a compromised worker could + trigger mass re-encryption. It is not in the protocol enum and not in the + access policy table — its status is undefined. +- **`unlock_new`** generates and returns the mnemonic (root of trust) as a + plaintext `String`. Not in the protocol enum; not in the policy table. Its + local-only status is implied but not stated. +- **`derive_encryption_key_for_version`** returns an `EncryptionKey` directly + (raw key material), unlike `Encrypt`/`Decrypt` which use the key internally. + If exposed remotely, a worker could extract raw encryption keys. Not + classified. + +The spec never states the rule that "service methods not in `VaultProtocol` +are local-only by construction." An implementer could add protocol variants +for these methods without recognizing the security implication, since the +policy table only constrains existing variants. + +**Suggestion**: Add an explicit rule: "Service methods absent from +`VaultProtocol` are local-only by default; adding a protocol variant requires +updating this policy table and an ADR if the method is security-sensitive." +Add `rotate`, `unlock_new`, and `derive_encryption_key_for_version` to the +policy table (or an explicit "local-only" list) with rationale. Document the +mapping between service methods and protocol variants. + +--- + +### C9. `site_password_path(site_hash: &str)` Is Underspecified — String-to-Index Mapping Breaks Determinism + +**Documents**: crates/vault/mnemonic-derivation.md (L202–204, L213, L219–222) + +**Problem**: The helper `site_password_path(site_hash: &str) -> String` +produces path `m/74'/1'/0'/{site_hash}'`, but SLIP-0010 hardened derivation +indices are `u32` (0 to 2³¹−1). A `&str` "site hash" cannot be a derivation +index directly — it must be converted to a `u32`. The conversion (hash +algorithm, truncation, endianness, collision handling) is completely +unspecified. Two implementers could produce different, incompatible mappings: +SHA-256(site) → first 4 bytes → u32; or last 4 bytes; or BLAKE3; or a string +hash. Since the path semantics table (L213) says these are "Ed25519 bytes" +used for "Per-site passwords (not cached)," and determinism is a stated design +principle (L234–237: "the same mnemonic + passphrase + path always produces +the same key"), the string→index mapping must be deterministic and specified. +An underspecified mapping means the same site + mnemonic produces different +passwords across implementations — breaking the reproducibility guarantee. + +Additionally, "Ed25519 bytes" (L213) as a password source is an undefined +concept. The path produces a 32-byte Ed25519 private key; using those raw +bytes as a password is a non-standard construction with no ADR explaining why +Ed25519-derived material is suitable as password entropy (vs HKDF or a +dedicated KDF). + +**Suggestion**: Specify the exact string→u32 mapping (e.g., +`u32::from_be_bytes(SHA-256(site_hash)[0..4]) & 0x7FFFFFFF` for hardened +index). Add a design note or ADR explaining why Ed25519-derived bytes are +acceptable as password material. Specify the password encoding (raw bytes vs +base64url — service.md L166 says base64url for the string variant, but the +bytes variant's encoding is the caller's concern). + +--- + +### C10. `HandlerError::StreamError(io::Error)` vs `StreamError` Variant — Conversion Gap + +**Documents**: crates/core/core-types.md (L33–38, L118–139), +crates/core/endpoint.md (L248–253), crates/core/auth.md (L55, L59), ADR-010 + +**Problem**: `core-types.md` and `endpoint.md` (citing ADR-010) both define +`HandlerError::StreamError(io::Error)`. But `auth.md:59` shows handler code +calling +`self.identity_provider.resolve_from_token(&token).ok_or(HandlerError::AuthRequired)?` +after `let stream = connection.accept_bi().await?;` (L55). The `.await?` on +`accept_bi()` returns `StreamError` (per `core-types.md:60`), yet there is no +`From for HandlerError` impl specified. The mapping table in +`core-types.md:132–137` is descriptive prose, not a trait impl. Without a +specified `From for HandlerError` (or `#[from]`), the `?` in the +example will not compile. Two implementers following these docs literally +will diverge: one will hand-map per the table, another will expect an auto- +`From`. + +Additionally, `HandlerError::StreamError(io::Error)` holds `io::Error`, but +`StreamError` itself holds `Internal(io::Error)` (L118–124). So +`StreamError::Internal(io::Error)` → `HandlerError::StreamError(io::Error)` +is a lossy wrap (the distinction between stream-level and connection-level is +flattened). The spec should state whether this `From` impl exists and its +exact behavior, because handler code (`?` operator) depends on it. + +**Suggestion**: Specify the exact conversion mechanism. Either (a) declare +`HandlerError::StreamError(#[from] StreamError)` — but then the variant must +hold `StreamError`, not `io::Error`, contradicting ADR-010/core-types; or (b) +explicitly state handlers must use `.map_err(HandlerError::from)` or +`.map_err(|e| e.into())` and define the `From` impl; or (c) keep `io::Error` +and define `From for HandlerError` matching the mapping table. +Resolve the variant payload (`io::Error` vs `StreamError`) and specify the +`From` impl. Update `auth.md:55` example to use the canonical conversion +pattern. + +--- + +### C11. ADR-004 Says Handlers "Enrich or Replace" AuthContext; ADR-011 Says AuthContext Is Immutable in `handle()` + +**Documents**: ADR-004 (L39, Accepted), ADR-011 (L131–133, Accepted), +crates/core/auth.md (L82–85) + +**Problem**: ADR-004 line 39 states the handler "enriches or replaces the +partial `AuthContext` with the fully resolved `Identity`." This describes +**mutation** of the shared AuthContext. ADR-011 explicitly contradicts this +at lines 131–133: "`handle()` signature passes `&AuthContext` (immutable +reference). Handlers that resolve identity create a local variable… they +don't mutate the AuthContext." `auth.md:85` reiterates immutability. The +immutability decision is the later, more refined one, but ADR-004 was never +amended. An implementer reading ADR-004 alone would mutate `AuthContext` — +causing cross-stream contamination (one stream's token resolution leaking +into another stream's AuthContext on the same connection), exactly what +ADR-011 prevents. + +This is security-relevant — mutation across streams is a contamination +vector. Same shape as C1 (Accepted ADR contradicts a later refinement that +was never recorded as a supersession). + +**Suggestion**: Amend ADR-004 to note that the "enrich/replace" language was +superseded by ADR-011's immutability model (handlers resolve into a local +variable, store on `Connection` via `set_identity`). Add an inline note at +ADR-004 L39 pointing to ADR-011. Same pattern as the C1 fix for ADR-015. + +--- + +### C12. ADR-002's `ProtocolHandler::handle` Signature Is Stale — Not Marked as Superseded by ADR-007 + +**Documents**: ADR-002 (L19–27, Accepted), ADR-007 (L44–53, Accepted), +overview.md (L89), crates/core/core-types.md (L14–19) + +**Problem**: ADR-002 (Status: Accepted) still shows +`async fn handle(&self, stream: BiStream, auth: &AuthContext)` at L26. +ADR-007 revised the signature to +`handle(&self, connection: Connection, auth: &AuthContext)` and its References +note "ADR-002: ProtocolHandler trait (signature revised by this ADR)." +However, ADR-002 itself has **no superseded/amended marker** — it reads as +the current authoritative definition. `overview.md:89` even says "This differs +from the original ADR-002 signature which passed `BiStream`," confirming the +discrepancy is known but ADR-002 was never annotated. An implementer reading +only ADR-002 (as the README's "Applicable ADRs" table encourages) gets the +wrong trait. + +**Suggestion**: Add a "Superseded by ADR-007" amendment note at the top of +ADR-002's Decision section (or a `## Amendments` section) stating the +signature was changed by ADR-007. Alternatively, update the ADR-002 code block +to the `Connection` signature with a note "Revised by ADR-007." ADRs must +reflect current state or explicitly mark supersession — this is the same +lifecycle rule that C1 and C11 violate. + +--- + +### C13. `Connection::set_identity` Observability Read Path Is Under-Specified + +**Documents**: crates/core/core-types.md (L53–67), crates/core/auth.md (L80), +open-questions.md (OQ-11 L138), crates/core/endpoint.md (L122–136) + +**Problem**: `Connection` has private `identity: OnceLock`. +`set_identity(&self,…)` takes `&self` (interior mutability) and is +"write-once-read-many." But the endpoint "holds a reference for logging" +(OQ-11:138) and the dispatch code in `endpoint.md:122–136` **moves** `conn` +into the spawned task (`Connection::new(connection)` then +`tokio::spawn(async move { ... handler.handle(conn, &auth) })`). After the +move, the endpoint no longer has the `Connection` to read `identity()` for +observability. OQ-11 says "the endpoint may also hold a reference for +logging" — but there is no `Arc` or shared handle shown anywhere; +the struct definition in `core-types.md:53–57` has no sharing mechanism, and +`AlknetEndpoint` in `endpoint.md:17–27` does not hold a registry of live +connections. So the stated observability reader (the endpoint) has no defined +way to access the connection after spawn. + +**Suggestion**: Specify the observability read path precisely. Either (a) +the endpoint holds a side table of live connections keyed by connection ID +with `Weak` or an `Arc` shared with the handler task, +and `Connection` becomes `Arc`-shareable; (b) the handler emits the resolved +identity via a channel/observer callback and the endpoint records it; or (c) +drop the claim that the endpoint reads it and state observability is +handler-side logging only. The current spec is internally inconsistent on +who reads `identity()` and how they obtain the reference. + +--- + +## Warning Findings + +### W1. ADR-017's `OperationAdapter` Trait Is Stale Relative to ADR-022 — And the Sync/Async "Two-Way Door" Is Already Decided + +**Documents**: ADR-017 (L155–158, L171–174, Accepted), ADR-022 (L225–255, +Proposed) + +**Problem**: ADR-017 L171–172 states: "The specific trait signatures (async +vs sync, error types, configuration parameters) are two-way doors for +implementation." Two problems: + +1. **The bundle shape is already constrained.** ADR-022 changes registration + from `(OperationSpec, Handler)` to `HandlerRegistration` (spec + handler + + provenance + composition_authority + scoped_env + capabilities). + ADR-022 L257–261 says adapter convenience methods construct + `HandlerRegistration` with `composition_authority: None` and + `scoped_env: None`. So the adapter trait must now produce + `Vec`, not `Vec<(OperationSpec, Handler)>`. The trait + signature shown in ADR-017 (`fn import(&self) -> Vec<(OperationSpec, + Handler)>`) is wrong per ADR-022. The "two-way door for implementation" on + the signature has already been narrowed by ADR-022 — the return type is + committed to the bundle, not the pair. ADR-017 has not been updated. + +2. **The async/sync question is not fully two-way.** `from_call` is inherently + async — it discovers operations via `services/list` + `services/schema` + over a QUIC connection. A synchronous `fn import(&self) -> Vec<…>` trait + cannot accommodate `from_call` without a separate async pre-step that + populates a cache, then a sync import that reads the cache. This means the + trait shape is constrained: either the trait is async (`async fn import`), + or `from_call` is not an `OperationAdapter` impl and uses a different path. + The "two-way door" framing suggests you can pick sync now and switch to + async later; in reality, `from_call` forces at least an async-capable path + from day one. + +This is a "two-way door" label that ADR-022 has silently narrowed without +updating the OQ classification. (See the two-way-door audit, item 7.) + +**Suggestion**: Update ADR-017 to: (1) return `Vec` (not +`(OperationSpec, Handler)`), (2) acknowledge that `from_call` requires async +discovery and the trait must be async or `from_call` must be special-cased, (3) +reclassify the signature as one-way (bundle shape) with a narrow two-way door +on ergonomic details (error types, config params). The current framing in +ADR-017 is inconsistent with ADR-022 and should be reconciled — same +reconciliation pass as C5. + +--- + +### W2. `Capabilities` Concrete Shape and Clone Semantics — "Two-Way Door" With a Hidden One-Way Door Inside + +**Documents**: operation-registry.md (L364, L369), ADR-014 (L74–76, L123–124, +L163–165) + +**Problem**: The spec says "the concrete shape of the type (a typed map, a +struct with named fields, a trait object) is a two-way door for implementation" +and "the concrete cloning semantics (reference-counted `Arc` vs deep copy of +zeroized material) is a two-way door for implementation, but +`Capabilities: Clone` is required." + +The `Clone` requirement is accurately identified as a one-way constraint. The +problem is the *clone semantics*, which the framing treats as two-way but +which contain a hidden one-way door: + +- **Arc-based clone (cheap, shared).** If `Capabilities` is + `Arc>`, `clone()` bumps the refcount. All calls in a + composition tree share the same secret material. If a handler can mutate + capabilities (e.g., add a derived key for a child operation), that mutation + is visible to siblings and the parent — shared mutable state across the + call tree. This is a security-relevant behavior. +- **Deep-copy clone (expensive, isolated).** Each call gets its own copy. + Mutations are isolated. But cloning zeroized secret material on every + `invoke()` duplicates secrets N times for a depth-N tree, increasing the + attack surface. + +The "path of least resistance" is Arc-based (cheap, the obvious choice for a +`Clone` type holding secrets). Once shipped, handlers may come to depend on +shared-mutation. Switching from Arc-shared to deep-copy-isolated later is a +behavior change that breaks those handlers. + +Worse, ADR-022's composition model *assumes* cheap clone — it clones +unconditionally on every `invoke()`. A depth-5 composition tree with Arc-based +clone does 5 refcount bumps (cheap). With deep-copy, it does 5 full copies of +all secret material. The ADR-022 design bakes in the assumption that clone is +cheap — which means it bakes in the Arc-based choice, making the "two-way door" +effectively decided by the composition model's performance assumptions. + +**Suggestion**: Add a guard clause: `Capabilities` must be *immutable after +construction* (no interior mutability, no `Mutex`, no `RefCell`). This +makes Arc-based clone safe (shared immutable state is fine) and makes the +two-way door genuinely two-way: you can switch between `Arc` and +deep-copy without behavior change, because neither supports mutation. Without +this guard, the "two-way door" is a future one-way door waiting for the first +handler that mutates capabilities. + +--- + +### W3. `CallClient` Registry Mechanism — "Two-Way Door" With a Security Dimension ADR-022 Introduced + +**Documents**: ADR-017 (L236–237, Accepted), ADR-022 (L225–242, Proposed) + +**Problem**: ADR-017 L236–237: "The specific mechanism (sharing the global +registry, a peer-scoped subset, or a separate registry) is a two-way door." + +The three options are mechanically possible with the `HandlerRegistration` +bundle. But ADR-022 introduced a security dimension the framing doesn't flag: + +- **Sharing the global registry exposes local capabilities to the remote + peer.** Each `HandlerRegistration` carries `Capabilities` with secret + material. If the `CallClient` shares the global registry, the remote peer + can invoke any External operation — and those operations' handlers use the + *local* capabilities. A remote peer calling `/llm/generate` uses the local + node's Google API key. The remote peer effectively gains access to all + local secret material that backs External operations. +- **A peer-scoped subset must filter by capability remote-safety, not just + operation name.** The "path of least resistance" for a peer-scoped subset is + to filter by operation name/namespace. But the real filter must be: "is this + operation's capability safe to expose to this peer?" An operation backed by + a signing key that should never leave the node is not remote-safe, + regardless of its name. + +The framing was written before ADR-022 put capabilities in the bundle. +Post-ADR-022, the "two-way door" on the registry mechanism has a security +constraint attached that the framing doesn't mention. + +**Suggestion**: Append to ADR-017's negative consequences: "Sharing the +global registry with a `CallClient` exposes all local capabilities (API keys, +signing keys) to the remote peer, because the dispatch path populates +`OperationContext.capabilities` from the registration bundle. A peer-scoped +subset must filter by capability remote-safety, not just operation name. The +registry-mechanism choice is two-way mechanically but has a security dimension +post-ADR-022: the 'share global' option is a capability-exposure decision, not +just a dispatch decision." + +--- + +### W4. `from_call` Hot-Swapping and Registry Mutability — Two "Two-Way Doors" That Gate Each Other + +**Documents**: ADR-017 (L279), operation-registry.md (L383) + +**Problem**: ADR-017 L279: "Hot-swapping imported specs is a two-way door." +operation-registry.md L383: "The registry is immutable after construction. No +runtime registration or deregistration. Two-way door — +`ArcSwap` can be added later." + +These are presented as independent two-way doors. They are not — one gates the +other. If the initial implementation registers `from_call`-imported specs +immutably (per the immutable-registry constraint), and handlers/clients +compile/generated-code against those specs, then "hot-swapping" later means: +re-importing on reconnect, diffing specs, invalidating generated clients, +handling in-flight calls whose schema changed. If the registry is +`ArcSwap`-able, hot-swap is mechanically possible but semantically fraught (a +handler mid-composition doesn't expect the child's schema to change). + +Deciding "hot-swap is a two-way door" without resolving the registry-mutability +question leaves an implementer to guess whether to build for immutability or +hot-swap. The deferral hides a coupling: the *immutability* of the registry (a +one-way door per ADR-010 and operation-registry.md L14) and the *mutability* +of remote specs (inherent to `from_call`) are in tension. + +Additionally, hot-swapping with ADR-022 bundles requires re-injecting +capabilities (cloning from the old registry or re-deriving from the vault) and +rebuilding the TLS `ServerConfig` (ALPN strings come from the registry). +The "simple ArcSwap" now touches the secret-material flow and the TLS config. +This isn't closing the door, but it raises the cost and security sensitivity +of the swap. + +**Suggestion**: Resolve the registry-mutability question (immutable vs +`ArcSwap`) *before* deciding hot-swap, since hot-swap depends on it — don't +let both be independent "two-way doors" when one gates the other. Add a guard +clause to OQ-04: "Hot-swapping the registry with ADR-022 bundles requires +re-injecting capabilities (cloning from the old registry or re-deriving from +the vault) and rebuilding the TLS ServerConfig. The `ArcSwap` upgrade is +mechanical but not trivial — it is a secret-material-handling operation, not +a pure pointer swap." + +--- + +### W5. `to_*` Adapter "Best-Effort" Mappings Become Compatibility Contracts Once Published + +**Documents**: ADR-017 (L244–246, L281–286, Accepted), ADR-023 (error schemas +projection) + +**Problem**: ADR-017 L244–246: "Some semantics don't map cleanly (e.g., +subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle +these with best-effort mappings and document the gaps." + +The "best-effort" framing implies the mappings are malleable — two-way, +adjustable. This is misleading once the generated spec is published: + +- **A published spec is a compatibility contract.** `to_openapi` generates an + OpenAPI spec. Once external HTTP clients consume that spec and build against + it, the mapping semantics (e.g., subscriptions → SSE long-poll, or + subscriptions → a polling endpoint) become a de facto contract. Changing the + mapping later breaks every client that built against the original spec. The + "best-effort" mapping, once published, is a one-way door for the clients + that depend on it. +- **`from_openapi` round-trip lossiness compounds.** If `to_openapi` maps + subscriptions to SSE, and later `from_openapi` imports a spec where + subscriptions are modeled as webhooks, the two adapters disagree on what a + "subscription" means in OpenAPI. The `to_*` mapping choices constrain the + `from_*` parsing expectations. +- **With ADR-023**, `to_openapi` now maps `ErrorDefinition.http_status` to + OpenAPI response codes. Once published, those status code mappings are a + contract. Changing `FILE_NOT_FOUND` from 404 to 422 in a later `to_openapi` + revision breaks clients. + +This is the "informal spec becomes formal spec" problem. The generated +artifact is the contract; the "best-effort" label is internal framing that +external consumers never see. ADR-009's framework classifies doors by +reversal cost in the codebase, not by compatibility cost for external +consumers — a systematic blind spot. + +**Suggestion**: Add a guard clause to ADR-017: "The `to_*` mapping choices, +once published as a generated spec, become a compatibility contract for +external consumers. Best-effort mappings are two-way before first publication +but one-way after. Version the generated specs (e.g., OpenAPI spec version +tied to the registry's External operation set version) and document that +mapping changes are breaking for consumers. The `to_*` adapters should emit a +spec version marker so consumers can detect mapping changes." + +--- + +### W6. The Salt Field's "Reserved for Future KDF" Framing Is Misleading — Semantics Can't Change for Existing Data + +**Documents**: ADR-020 (L102–108, L151–157, L204–207, Accepted), +crates/vault/encryption.md (L112–128) + +**Problem**: ADR-020 L108: "If a future KDF-based derivation is needed (see +'Future KDF' below), the field is already present." ADR-020 L156: "This is +additive — it doesn't change v2 data." + +The "reserved" framing creates a false sense of flexibility: + +- **The salt in v2 is cryptographically meaningless and cannot be + retroactively made load-bearing for v2 data.** v2 encrypts with an + HD-derived key at `m/74'/2'/0'/0'`. The salt is random, stored, and unused. + If v3 introduces a KDF that uses the salt, v3's key derivation is different + from v2's. v3 cannot decrypt v2 data using the salt — v2 was never encrypted + with a salt-derived key. The salt is only usable for NEW v3+ data. So "the + field is already present" is technically true but misleading: the field is + present in v2 blobs, but its presence doesn't make v2 data upgradeable to a + KDF. It only means v3 blobs can reuse the field without a wire-format + change. +- **The wire-format-compat framing is the real two-way door, and it's + accurate.** Keeping the field means the `EncryptedData` struct doesn't + change shape across v2→v3. That IS a wire-format two-way door — you can add + a KDF in v3 without changing the struct. But the *semantics* of the salt + field ("unused" in v2, "load-bearing" in v3) cannot be changed for existing + v2 data. +- **A KDF doesn't fit the rotation scheme.** Rotation is version-indexed + paths (`m/74'/2'/0'/{version-2}'`). A KDF-based derivation is a different + derivation *method*, not a different path. If a KDF is introduced, the + rotation mechanism (version-indexed paths) and the KDF mechanism + (salt-based) coexist or conflict. The "reserved salt" framing doesn't + acknowledge that a KDF is a different derivation family, not just another + version index. + +**Suggestion**: Update ADR-020: "The salt field is reserved for FUTURE +versions' use. v2 data's salt is permanently unused — it was random, never +participated in key derivation, and cannot be retroactively made +load-bearing. Introducing a KDF in v3 is a new derivation method (not a +version-indexed path), requiring its own design and a v2→v3 migration +(re-encrypt with the new KDF, using a newly-generated v3 salt — the v2 salt +is not reused). The field's presence saves a wire-format struct change only; +it does not make the KDF design or migration trivial." Drop the speculative +"reserved for future KDF" framing or explicitly mark it as speculative, not a +planned feature. + +--- + +### W7. `unlock_new` Returns the Mnemonic as a Non-Zeroizing `String` — Security Gap + +**Documents**: crates/vault/service.md (L71–79) + +**Problem**: `unlock_new(word_count) -> Result` +returns the generated mnemonic phrase as a `String`. The `Mnemonic` type +zeroizes on drop (mnemonic-derivation.md L77–78), but `String` does not +implement `Zeroize` by default. The root of trust is returned to the caller +as a plaintext heap-allocated `String` that lingers in freed memory until the +allocator reuses it. The spec says "Store the returned phrase securely" +(L78–79) but does not address that the returned `String` cannot be zeroized +by the caller without `unsafe` or converting to a zeroizing container. + +This contradicts the stated principle "Zeroize everything sensitive" +(vault README L65–67) and the `Mnemonic` type's own zeroization. The Security +Constraints section (service.md L365–399) does not mention this. + +**Suggestion**: Either return a zeroizing wrapper (e.g., `Zeroizing` +or a dedicated `Phrase` type that implements `Zeroize`/`ZeroizeOnDrop`), or +document in Security Constraints that the caller must convert the returned +`String` to a zeroizing type and must not let it be cloned/logged. At +minimum, flag this as a known limitation. + +--- + +### W8. `DerivedKey` JSON Deserialization Silently Produces a Corrupted Key + +**Documents**: crates/vault/protocol.md (L94–112) + +**Problem**: protocol.md L104–107 states: "A JSON-deserialized `DerivedKey` +will have `"[REDACTED]"` as its `private_key` string — this is expected; JSON +round-tripping a `DerivedKey` is not a supported use case (the private key is +gone)." + +This is a silent-failure footgun: if a `DerivedKey` is accidentally +serialized to JSON and then deserialized (e.g., in a log pipeline, a debug +tool, or a caching layer that uses JSON), the resulting `DerivedKey` has a +10-byte string (`"[REDACTED]"`) as its `private_key` instead of 32 bytes. Any +code that then uses this `DerivedKey` for signing will either produce garbage +signatures or panic on length mismatch — with no error at deserialization +time. The redaction is a defense-in-depth for *serialization* (logs), but the +*deserialization* side silently accepts the redacted form, producing a +corrupted key. + +For the irpc/postcard path this is not an issue (postcard preserves bytes). +But if any code path ever uses JSON for `DerivedKey` transport (the spec says +this is "not a supported use case" but doesn't prevent it), the corruption is +silent. + +**Suggestion**: Either (a) make `DerivedKey::deserialize` reject payloads +where `private_key == "[REDACTED]"` (returning an error rather than a +corrupted key), or (b) document explicitly that `private_key` length must be +validated before use and that JSON-deserialized `DerivedKey`s are invalid. +Consider whether the custom `Deserialize` impl should enforce a 32-byte +length check. + +--- + +### W9. No ADR for the HD-Derivation Key Model (Identity/SSH/Signing) — Only Encryption Keys Have ADR-020 + +**Documents**: crates/vault/mnemonic-derivation.md (L31–46, L248–256), +crates/vault/README.md (L68–70) + +**Problem**: ADR-020 covers HD derivation for *encryption* keys. But the +vault's primary use of HD derivation is for identity keys, SSH host keys, and +signing keys (mnemonic-derivation.md L26–29, L184–215). The rationale for "HD +derivation, not stored keys" (L31–46: no key storage, reproducible across +nodes, domain separation, auditable derivation) is inline in the spec's "Why +HD Derivation" section. The Design Decisions table (L250–256) lists this with +no ADR ("HD derivation (not stored keys) | — | One seed, many keys, no key +storage"). + +This is a foundational architectural decision — the entire vault model +depends on it — yet it has no ADR. The choice of HD derivation vs. stored keys +is a one-way door (switching to stored keys would change the entire trust +model). ADR-018 and ADR-019 both *assume* HD derivation but don't *decide* it. + +Similarly, these design choices have inline rationale but no ADR: +- **`74'` coin type reservation** (L186, L254): SLIP-0044 unallocated coin + type claimed for alknet. This is a namespace allocation that, once keys are + derived, cannot be changed without re-deriving all keys — effectively + one-way. No ADR records this allocation. +- **secp256k1 feature-gating** (L166–179): rationale (heavy C dependency, only + needed for Ethereum) is inline. No ADR. +- **AES-256-GCM cipher/mode choice** (encryption.md L27–37): rationale + (authenticated, hardware-accelerated) is inline. No ADR. + +**Suggestion**: Either (a) create an ADR for "HD derivation as the vault's key +model" covering identity/SSH/signing keys (the encryption case is already +ADR-020), or (b) broaden ADR-020's scope, or (c) create a single "Vault Key +Model" ADR covering the overarching HD-derivation decision with ADR-020 as a +special case for encryption keys. Record the `74'` coin type reservation and +the AES-256-GCM choice in ADRs or a tracked decisions log, since they are +one-way doors with no current ADR. + +--- + +### W10. The `EncryptedData` "Stable Wire Format" Is a One-Way Door Documented Only in Prose + +**Documents**: crates/vault/encryption.md (L84–95), ADR-018 (L101–104) + +**Problem**: encryption.md L86–95 states: "This is the **stable wire format** +shared with `alknet-storage` (a future crate) by type-level agreement, not by +a crate dependency. Both crates must agree on the serialization format… This +cross-language compatibility is why the wire format must stay stable — +changing it breaks both `alknet-storage` and the TypeScript consumer." +ADR-018 L101–104 repeats this. + +This is a one-way door: once `alknet-storage` and TS consumers depend on the +format, changing it breaks them. Yet it is not locked by an ADR. It is a +cross-crate compatibility contract enforced by documentation, not by a decided +constraint. An implementer modifying `EncryptedData` (e.g., removing the +unused `salt` field) would find no ADR saying "this format is frozen." The +"type-level agreement, not a crate dependency" approach (ADR-018) means there +is no compiler enforcement either — the stability contract exists only in +prose. + +**Suggestion**: Create an ADR (or add a decision to ADR-018) locking the +`EncryptedData` wire format as stable, with the compatibility surface +explicitly enumerated (fields, base64 encoding, field semantics). Note that +the `salt` field, though unused in v2, is part of the frozen format and cannot +be removed without a format-version migration. + +--- + +### W11. `OperationContext.identity` Conversion from `CompositionAuthority` Is Undefined + +**Documents**: operation-registry.md (L236: +`identity: parent.handler_identity.as_identity()`), ADR-022 (L314: +`identity: parent.handler_identity_as_identity()`) + +**Problem**: Both spec and ADR-022 populate the child's `identity` by +converting the parent's `handler_identity: Option` into +an `Option` via a method `as_identity()` / +`handler_identity_as_identity()`. This conversion method is **never defined** +and its semantics are unclear: `CompositionAuthority` (label + scopes + +resources) is a declared authority bundle, while `Identity` is a peer identity +resolved via `IdentityProvider`. ADR-022 goes to lengths to stress that +`CompositionAuthority` is *not* a peer `Identity` (L150–160). Yet the dispatch +path needs an `Identity` to run ACL against for the child. + +So there's an implicit conversion from a non-peer authority bundle to a peer +identity type. Is it a synthetic `Identity` constructed with the authority's +label as `id` and its scopes? Does it bypass `IdentityProvider`? Is +`as_identity()` returning `None` when `handler_identity` is `None` (the leaf +case)? For leaves, the ACL "checks against the leaf's `AccessControl`" — but +if `identity` is `None` because the parent is a leaf with +`handler_identity: None`, then per operation-registry.md L86–87 "If the +relevant identity is `None` and the operation has restrictions, the adapter +returns `FORBIDDEN`." That would mean a composing handler with no composition +authority cannot compose *any* restricted leaf — which contradicts the model +where leaves are composed under the *parent's* authority. + +Actually: a composing parent (Local/Session) *does* have +`handler_identity: Some(...)`, so `as_identity()` returns `Some`. Leaves being +composed *are* the child; their own `handler_identity` is `None` but that's +fine because the child's ACL uses the child's `identity` (which is the parent's +authority). So the conversion is load-bearing and must be specified. + +**Suggestion**: Define `CompositionAuthority::as_identity() -> Option` +(or `handler_identity_as_identity()` on `OperationContext`) explicitly in the +spec or ADR-022. Specify that it constructs a synthetic +`Identity { id: label, scopes, resources }` that is *not* resolvable via +`IdentityProvider` but is used directly for ACL matching. Note that this +creates a second `Identity` construction path (the first is +`IdentityProvider::resolve_*`), which should be acknowledged. + +--- + +### W12. Source Drift Items Are Tracked Only in Scattered Prose — The `spawn` Return Bug Could Be Lost + +**Documents**: crates/vault/README.md (L79–103), crates/vault/encryption.md +(L185–188, L245–250), crates/vault/service.md (L280–286, L374–391) + +**Problem**: Four source-vs-spec drift items are documented in prose across +multiple files: + +1. **OsRng vs `rand::random()`** — documented in README L87–89, encryption.md + L245–250, service.md L374–376 (3 locations). +2. **`unwrap()` on RwLock** — documented in README L94–97, service.md + L384–391 (2 locations). +3. **`CURRENT_KEY_VERSION = 1` (should be 2)** — documented in encryption.md + L185–188, ADR-020 L120–124 (2 locations, none in README). +4. **`spawn` returns unspawned actor** — documented **only** in service.md + L280–286 (1 location, inside a method description, not in a Security + Constraints section). + +Problems with this approach: +- The `spawn` bug (item 4) is documented in exactly one place, buried in the + `spawn` method's bullet description, not in any "Security Constraints" or + "Known Drift" section. An implementer who reads the Security Constraints + sections (the natural place to look for implementation-sync corrections) + would miss it. +- README's Security Constraints (L79–103) lists OsRng and unwrap but **omits** + `CURRENT_KEY_VERSION` and the `spawn` bug. README is the index document; an + implementer using it as a checklist would miss two drift items. +- There is no single, structured tracking mechanism. If one prose location is + updated and others aren't, the docs silently diverge. + +**Suggestion**: Add a single "Known Source Drift" table to README.md (or a +dedicated section) listing all drift items with: item, current behavior, +target behavior, location in source, and status. At minimum, add +`CURRENT_KEY_VERSION` and the `spawn` bug to README's Security Constraints. +Move the `spawn` bug note out of the method description and into the drift +tracker. + +--- + +### W13. `CertAuthorityEntry` Is an Undefined Placeholder Type Used in `DynamicConfig` + +**Documents**: crates/core/config.md (L119–130) + +**Problem**: `AuthPolicy.cert_authorities: Vec` is +declared, but `CertAuthorityEntry` is explicitly "a placeholder type. Its +fields will be defined when alknet-ssh is implemented." This means alknet-core's +public API references a type that does not exist yet. The spec says "for v1, +`cert_authorities` will be an empty vector," implying the field is dead weight. +An implementer cannot compile `DynamicConfig` without defining +`CertAuthorityEntry` somewhere. Where does it live — alknet-core (but it's +SSH-specific per the doc) or alknet-ssh (but then alknet-core can't reference it +without a dependency)? This is a crate-dependency contradiction: core +references an SSH-defined type, but core cannot depend on alknet-ssh (ADR-003 +forbids it). + +**Suggestion**: Either define a minimal `CertAuthorityEntry` in alknet-core +(as an opaque/newtype or a generic struct with the fields that are +ALPN-agnostic), or remove `cert_authorities` from v1 `AuthPolicy` entirely and +add it back when alknet-ssh lands (it's a two-way door — adding a field is +additive). Leaving an undefined type in a public struct blocks compilation. + +--- + +### W14. `SecretKey` Type in `TlsIdentity::RawKey(SecretKey)` Is Undefined and Its Origin Ambiguous + +**Documents**: crates/core/config.md (L44, L79) + +**Problem**: `TlsIdentity::RawKey(SecretKey)` and `SecretKey::generate()` are +used but `SecretKey` is never imported or attributed. It could be +`ed25519_dalek::SecretKey`, `iroh::SecretKey`, `alknet_vault::SecretKey`, or a +core-defined type. Since config.md says the key "can be derived from +alknet-vault" (per endpoint.md:181), and alknet-core must not depend on +alknet-vault (ADR-003, ADR-018), this matters: if `SecretKey` is from iroh, +then alknet-core depends on iroh even for quinn-only nodes; if it's a +re-export/newtype, that needs stating. The crate dependency table in ADR-003 +L27 lists alknet-core depending on "tokio, quinn, rustls, irpc" — **not iroh** +— yet `config.md` and `endpoint.md` show iroh types in core's public API. + +**Suggestion**: Specify the exact type and crate for `SecretKey`. If it's +`iroh::SecretKey`, update ADR-003's dependency list (iroh is already used for +`iroh::Endpoint`, so this is consistent, but ADR-003 omits iroh from +alknet-core's deps — a separate inconsistency). If it's `ed25519_dalek`, say +so. Resolve whether alknet-core publicly depends on iroh types. + +--- + +### W15. ADR-003 Omits iroh From alknet-core's Dependency List; Spec Docs Assume It + +**Documents**: ADR-003 (L27), crates/core/endpoint.md (L20), config.md (L44) + +**Problem**: ADR-003 L27 lists alknet-core as depending on "tokio, quinn, +rustls, irpc" — no iroh. But `AlknetEndpoint` holds +`iroh: Option` (endpoint.md L20, ADR-010 L67), +`SendStream`/`RecvStream` wrap "iroh::SendStream or iroh::RecvStream" +(core-types.md L103), and `TlsIdentity::RawKey(SecretKey)` likely uses iroh's +key type. ADR-010 L184 says this is "mitigated: both are feature-gated." But +ADR-003's dependency table is now stale — it doesn't reflect the iroh +dependency that ADR-010 introduced. + +**Suggestion**: Update ADR-003's dependency table to include iroh +(feature-gated) for alknet-core, or add an amendment note that ADR-010 added +the iroh dependency. + +--- + +### W16. Overview Failure-Mode Table Says Handler Error Closes "QUIC Stream"; Spec Says Connection + +**Documents**: overview.md (L237–238), crates/core/core-types.md (L30, L46), +crates/core/endpoint.md (L255–257, L262) + +**Problem**: `overview.md:237` says "Handler `handle()` returns `HandlerError` +| Endpoint logs the error, closes the QUIC **stream**." But +`core-types.md:30` says "The endpoint catches these, logs them, and closes the +**connection**." `endpoint.md:256` says HandlerError is "Non-fatal errors +within a handler" and the connection closes. The whole architecture is +one-ALPN-per-connection (ADR-006), and `handle()` receives the whole +`Connection`. Closing a "stream" doesn't make sense when the handler owns the +connection. This is a stale leftover from the pre-ADR-007 model where +`handle()` received a `BiStream`. + +**Suggestion**: Change `overview.md:237–238` from "closes the QUIC stream" to +"closes the QUIC connection." + +--- + +### W17. OQ-09's "BiStream Trait Preserves the WASM Door" Is Only True for the Client Side + +**Documents**: open-questions.md (OQ-09 L100–107), ADR-007, ADR-009 (L41, L44), +crates/core/core-types.md, crates/core/endpoint.md + +**Problem**: The resolution claims "The BiStream trait decision (ADR-007) has +already preserved the most important WASM door." This is accurate for the +*client* path (a browser can implement `BiStream` over WebTransport) but +oversells what is preserved for a *server-side* WASM peer: + +- **`Connection` is a concrete struct, not a trait** (ADR-007). Its + `accept_bi()`/`open_bi()` return `SendStream`/`RecvStream` described as + wrapping "the underlying QUIC stream types." A WASM server-side peer cannot + implement this `Connection` without a QUIC library that compiles to WASM. + BiStream being a trait does not help — handlers receive `Connection`, and + `Connection` is the binding point. +- **The accept loop uses `tokio::spawn`** (ADR-010). Tokio does not run on + WASM. The entire endpoint runtime is tokio-bound. +- **`PendingRequestMap` and the `CallAdapter` use tokio `oneshot`/`mpsc` + channels** (call-protocol.md L218–240). These are tokio-specific. + +The OQ framing implies BiStream was the binding constraint and the rest is +detail. In reality, BiStream preserves the *client stream* door, but +`Connection`, the accept loop, and the call-protocol dispatch internals +quietly close the *server-side dispatch* door. + +**Suggestion**: Update OQ-09's resolution to state: "BiStream being a trait +preserves the client-side stream door. The server-side dispatch door +(`Connection` as a concrete quinn-bound struct, tokio-based accept loop, tokio +channel dispatch) is NOT preserved by ADR-007 and would require a `Connection` +trait and a runtime-abstracted accept loop to open. This is a known, accepted +closure — the browser path is client-side via a JS SDK, not server-side +Rust-to-WASM." This converts the misleading framing into an explicit, accepted +one-way door. + +--- + +### W18. OQ-10's "Start With Smart Protocol" Deferral Forecloses Git Composability + +**Documents**: open-questions.md (OQ-10 L109–116) + +**Problem**: The deferral says "start with git smart protocol over QUIC +streams, ERC721 and full server capabilities are additive." The ALPN/endpoint +design does not constrain the git adapter's *wire format* — the handler owns +its format and `alknet/git` is a dedicated connection. That part is genuinely +two-way. + +However, the deferral hides a fork that the "path of least resistance" will +silently resolve in a way that forecloses composability: + +- **Raw smart protocol vs. call-protocol projection.** If git ships as a + `ProtocolHandler` speaking raw git smart protocol over QUIC streams (the + simplest path), git operations are NOT callable via `env.invoke()`. They + exist only on the `alknet/git` ALPN, not in the `OperationRegistry`. An + agent handler that wants to compose `git/clone` cannot — there's no + `OperationSpec`, no `Handler`, no registration. The agent would have to + open a raw QUIC connection to `alknet/git` and speak pkt-line, bypassing + the entire composition/security model (ADR-015, ADR-022). +- To make git composable later, you'd need a *separate* call-protocol + projection — a set of `OperationSpec`/`Handler` pairs that wrap git + operations behind the registry. That's additive, but it means the "simple" + choice (raw smart protocol only) produces a git adapter that is permanently + non-composable unless a second handler layer is built. + +The deferral frames ERC721/full-server as the additive part. But the real +additive question is composability, and the simplest implementation forecloses +it. + +**Suggestion**: Append to OQ-10's resolution: "The composability of git +operations (whether git ops are registered in the `OperationRegistry` and +callable via `env.invoke()`, or only available as raw smart protocol on +`alknet/git`) is a separate decision from ERC721 scope. The path of least +resistance (raw smart protocol only) forecloses agent composition of git +operations. If agent tool dispatch needs to compose git ops, a call-protocol +projection must be built alongside or instead of the raw handler. Resolve +this when speccing alknet-git, not deferred past it." + +--- + +### W19. ADR-016 Self-Referential Citation and Under-Specified Grandchild Propagation + +**Documents**: ADR-016 (L134, L154–158) + +**Problem**: Two minor issues in ADR-016: + +1. **Self-referential citation** (L134): ADR-016 cites itself ("ADR-016 + Assumption 5 says the policy is per-call, not per-operation") in its own + Decision 6 rationale. This should just be "Assumption 5" (it's the same + document). + +2. **Grandchild propagation under-specified** (L154–157): "The child's + `OperationContext` carries the policy. If the child itself composes + grandchildren, the policy propagates unless the child explicitly overrides + it." This is the only statement of propagation-vs-override semantics. It + doesn't specify: does the grandchild *inherit* the child's policy by + default (so if the child got `ContinueRunning`, the grandchild is + `ContinueRunning` unless overridden)? Or does each `invoke()` reset to + the default `AbortDependents` unless explicitly set? The phrase + "propagates unless overridden" implies inheritance, which is reasonable, + but the `invoke()` default (operation-registry.md L213, ADR-016 L146) shows + `context.env.invoke(...)` with no policy = default. So is "no policy arg" + = "inherit parent's" or "reset to AbortDependents"? These conflict. + +**Suggestion**: (1) Change "ADR-016 Assumption 5" to "Assumption 5" within +ADR-016. (2) Clarify: does `invoke()` with no policy argument (a) inherit the +parent's current policy, or (b) reset to `AbortDependents`? Recommend (b) +reset-to-default for predictability (each composition is a fresh decision) +unless the parent opts the child into `ContinueRunning`, and document that +`ContinueRunning` does *not* auto-propagate to grandchildren (the child must +re-affirm it for each grandchild). Or pick (a) and update the `invoke()` +examples. Either is defensible; pick one. + +--- + +### W20. ADR-023's `from_openapi` Error Code Example Collides With Protocol-Level `NOT_FOUND` + +**Documents**: ADR-023 (Assumption 3 L366–373, Decision 5 L241) + +**Problem**: ADR-023 Assumption 3 acknowledges the collision risk: an operation +declaring `NOT_FOUND` as a domain code collides with the protocol-level +`NOT_FOUND` (operation not registered). The resolution (L367–368): "the +protocol code takes precedence in the dispatch machinery." And (L372–373): +"The assumption is that operations use domain-specific codes (`FILE_NOT_FOUND`) +rather than reusing protocol codes (`NOT_FOUND`). This is a naming convention, +not a type-system enforcement." + +But Decision 5's example (L241) shows `from_openapi` mapping a 404 to +`ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), … }` — i.e., +`from_openapi` *does* produce a domain code that collides with a protocol +code. So the "naming convention" is violated by the very adapter the ADR +specifies. An implementer following Decision 5's example would create a +`FILE_NOT_FOUND`-vs-`NOT_FOUND` ambiguity: when `call.error` carries +`code: "NOT_FOUND"`, is it "operation not registered" (protocol) or "file not +found" (operation-level, from a 404)? The `details` field disambiguates +(present for operation-level, absent for protocol-level), but the `code` alone +is ambiguous, and ADR-023 L171 says "Clients should switch on `code`, not +parse `message`." + +**Suggestion**: Either (a) make `from_openapi` prefix/namespace the imported +error codes (e.g., `HTTP_404` or `_NOT_FOUND`) to avoid collision, or (b) +update Decision 5's example to use a non-colliding code and add a normative +rule that `from_openapi` must not produce codes that match the five +protocol-level codes. Assumption 3's "naming convention" should be elevated to +a requirement for adapters, since the adapter example breaks it. + +--- + +### W21. `ScopedOperationEnv` Public Field Leaks the `HashSet` Representation — Future Subgraph Refactor Is Breaking + +**Documents**: ADR-022 (L208–214, L97–105), operation-registry.md (L188, L328) + +**Problem**: ADR-022 L97–105: "For v1, the scoped env can be a +`HashSet` of reachable operation names. A dedicated +`alknet-flowgraph` crate … is a future enhancement — not a prerequisite for +the security model." + +The `HashSet` representation is fine *internally*. The problem is that +it leaks into the public API in a way that makes the future refactor a +breaking change: + +- **`ScopedOperationEnv` is a public struct with a public field.** ADR-022 + L208–214 shows `pub struct ScopedOperationEnv { pub allowed_operations: + HashSet }`. The assembly layer and agent crate construct this + directly: `ScopedOperationEnv { allowed_operations: HashSet::from(…) }` + (operation-registry.md L188, L328). If the representation later changes to a + subgraph type (with type-compatibility edges, as the flowgraph model + requires), every construction site breaks — the field type changes from + `HashSet` to `Subgraph` or `Vec`. This is a public-API + breaking change, not an internal refactor. + +The framing says the flowgraph crate is "not a prerequisite for the security +model." That's true — the security model works with a flat set. But the +*refactor* from flat set to subgraph is not the trivial "two-way door" the +framing implies. It's a breaking change to construction sites. + +Additionally, this interacts with OQ-19 (session-scoped registries): session +ops compose under a scoped env. If the scoped env is a flat `HashSet`, +session ops get reachability control (can this op be reached?) but NOT +type-compatibility validation (does the output of op A feed validly into op +B?). A session op (untrusted, LLM-written) that composes an op whose output +type doesn't match the next op's input gets a runtime error, not a static +rejection. The flowgraph deferral means session ops ship without static type +validation — a security-relevant gap for untrusted code. + +**Suggestion**: Make `allowed_operations` private (not `pub`), with a +constructor `ScopedOperationEnv::new(ops: impl IntoIterator)` +and a query method `allows(&self, name: &str) -> bool`. This keeps the +representation internal and makes the future subgraph refactor a non-breaking +change. Add to the flowgraph deferral: "The `HashSet` representation +does not support type-compatibility validation. Session-scoped ops (OQ-19, +untrusted code) compose without static type checking until the flowgraph +crate is built. This is a known gap for the untrusted-code path, not just a +visualization feature." + +--- + +## Suggestions + +### S1. Inline Decision Rationale in Spec Docs That Should Be in ADRs + +**Documents**: call-protocol.md (L341, L343–344, L124), operation-registry.md +(L202, L364–369, L377), crates/vault/encryption.md (L27–37), +crates/vault/mnemonic-derivation.md (L31–46) + +**Problem**: Per the architect's documentation principles, spec docs should +reference ADRs by number, not contain the rationale. Several spec sections +contain *why* rationale that duplicates or extends ADR content: + +- call-protocol.md L341 duplicates ADR-016 Decision 2's rationale ("the + correct default because aborted parent work has no consumer…") verbatim. +- operation-registry.md L202 explains *why* metadata doesn't propagate (the + secret-leak scenario) — this is ADR-014's rationale, restated inline. +- operation-registry.md L364–369 explains the `Clone` semantics of + `Capabilities` and the security implication — this is a design rationale, + not in any ADR. It should be in ADR-014 or ADR-022. +- operation-registry.md L377 ("`from_call` trust is transitive… a compromised + remote node can do anything") is a threat-model statement that belongs in + ADR-017. +- crates/vault/encryption.md L27–37 (AES-256-GCM rationale) and + mnemonic-derivation.md L31–46 (HD derivation rationale) have no ADRs (see + W9). + +**Suggestion**: Move the unique rationale to the relevant ADRs (ADR-016 +already has the abort-dependents rationale — just cite it; add +capabilities-clone rationale to ADR-014 or ADR-022; add `from_call` transitive +trust to ADR-017) and replace the inline rationale with ADR references. Keep +only the *constraint statement* in the spec. + +--- + +### S2. Untracked "Two-Way Door" Deferrals Scattered Through Specs + +**Documents**: operation-registry.md (L383, L257, L364, L369), +call-protocol.md (L333), ADR-017 (L236–237, L279) + +**Problem**: Per the documentation principles, inline open questions should be +in `open-questions.md`. Several "two-way door" / "future work" notes are +scattered through the specs without being tracked as open questions: + +- operation-registry.md L383: "Two-way door — `ArcSwap` + can be added later" (contradicts the "immutable after construction" + constraint at L14, L194, L340). +- operation-registry.md L257: "Future work may add irpc service dispatch and + remote call protocol dispatch as additional backends." +- operation-registry.md L364: "The concrete shape of the type… is a two-way + door for implementation." +- operation-registry.md L369: "The concrete cloning semantics… is a two-way + door for implementation." +- call-protocol.md L333: timeout policy (30s default, adapter-level config) — + no ADR governs this. +- ADR-017 L236–237, L279: CallClient registry mechanism and `from_call` + hot-swapping. + +**Suggestion**: Add the genuinely-open items (Capabilities concrete shape, +Capabilities clone semantics, `OperationAdapter` trait signatures, CallClient +registry mechanism, `to_*` adapter gap semantics, `from_call` hot-swap, +registry hot-swap via ArcSwap, timeout policy) to `open-questions.md` with +explicit status. For items that are *decided* deferrals, record the deferral +in the relevant ADR's Consequences rather than as inline spec commentary. + +--- + +### S3. Missing ADR References for Design Choices ("Why X Over Y?") + +**Documents**: call-protocol.md (L86, L116, L262–272, L333), +operation-registry.md (L64–67, L120–127, L139) + +**Problem**: Several design choices are stated in the spec with no ADR +backing: + +- **Binary payload convention** (call-protocol.md L86): base64-in-JSON. Why + not a separate binary event type? No ADR. +- **`auth_token` in payload** (call-protocol.md L116–122, L262–272): per- + request identity upgrade via an optional `auth_token` field. Significant + security-model decision with no ADR. +- **Per-request vs per-connection identity** (L272): "Identity is resolved + per-request, not per-connection." No ADR. +- **Timeout policy** (L333): 30s default, adapter-level config. No ADR. +- **Module-private `internal` writes** (operation-registry.md L120–127): + the enforcement mechanism for "handlers can't set `internal`." The + requirement is from ADR-015; the mechanism (Rust visibility) has no ADR. +- **Namespace derivation** (operation-registry.md L66): "`namespace` is + derived from the name." Why derived rather than independently declared? No + ADR. + +**Suggestion**: For load-bearing choices (binary convention, auth_token / +per-request identity, timeout policy), either add an ADR or extend an +existing one. For minor choices (namespace derivation, visibility pattern), a +one-line note suffices, but they shouldn't be silently decided in the spec. + +--- + +### S4. README ADR Tables Missing Entries and Status Column + +**Documents**: crates/call/README.md (L19–36), crates/core/README.md +(L21–31), crates/vault/README.md (L38–47) + +**Problem**: Three issues across the crate READMEs: + +1. **call README** omits ADR-013 (Rust canonical implementation), which both + call spec docs reference (call-protocol.md L88, operation-registry.md L24). +2. **core README** omits ADR-015, referenced by auth.md L78, L200 and + config.md L200. +3. **vault README** omits ADR-010 and ADR-006, referenced by + mnemonic-derivation.md L26 and protocol.md L290. +4. **None of the crate READMEs have a Status column** in their ADR tables, + which contributes to C2 (reader can't tell which ADRs are binding). + +**Suggestion**: Add the missing ADRs to each crate README's ADR table. Add a +Status column (Accepted/Proposed/Superseded) to make the governance state +visible — this directly supports resolving C1/C2. + +--- + +### S5. `Connection::new()` Referenced but Never Defined + +**Documents**: crates/core/core-types.md (L113), crates/core/endpoint.md +(L127), ADR-010 (L117) + +**Problem**: `Connection::new(connection)` is called in dispatch pseudocode +(endpoint.md L127, ADR-010 L117) but the `impl Connection` block in +`core-types.md:59–67` does not include a `new()` constructor, nor specify its +signature/parameter. `core-types.md:113` says "`Connection::new()` wraps the +appropriate stream source based on where the connection came from" — but +there's no formal signature, and the argument type differs by source (a +`quinn::Connection` vs an iroh `Accepting`/`Connecting`). An implementer +doesn't know the constructor's parameter type. + +**Suggestion**: Add `pub fn new(source: ConnectionSource) -> Connection` (or +two constructors `from_quinn` / `from_iroh`, or a private enum) to the +`impl Connection` block. Define `ConnectionSource` or the variant +constructors. + +--- + +### S6. `services/schema` Input Shape Under-Specified (Leading Slash) + +**Documents**: operation-registry.md (L286), call-protocol.md (L120, L126) + +**Problem**: The input shape for `services/schema` is shown as +`{ "name": "fs/readFile" }` (no leading slash) but the wire `operationId` +convention (call-protocol.md L120, L126) is *with* a leading slash +(`/fs/readFile`). So does `services/schema`'s `name` input use the wire form +(leading slash) or the registry form (no slash)? The spec shows no slash, +implying registry form, but a client calling `services/schema` over the wire +would naturally use the same `operationId` form it uses elsewhere. This is a +small ambiguity but could cause a `NOT_FOUND` if the adapter doesn't +normalize. + +**Suggestion**: Specify whether `services/schema`'s `name` input has a +leading slash and whether the adapter strips it (consistent with +`call.requested`'s `operationId` handling per call-protocol.md L120). + +--- + +### S7. `call.completed` For Non-Subscription Calls Not Addressed + +**Documents**: call-protocol.md (L96–106) + +**Problem**: The event table lists `call.completed` as "Signal end of +subscription stream" — subscription-only. But for a plain `call()` +(request/response, one result), the lifecycle is `call.requested` → +`call.responded` (one) and then… nothing? Is the request considered complete +on the first `call.responded`? Is a `call.completed` ever sent for +non-subscription calls? The spec says (L103) "call(): Sends `call.requested`, +resolves on first `call.responded`" — so no `call.completed` for calls. But +then what signals to the server that the client is done listening? This is +consistent but the asymmetry (subscriptions get `completed`, calls don't) +could be stated explicitly to avoid an implementer sending a spurious +`call.completed` after a `call.responded`. + +**Suggestion**: Add a note: "`call.completed` is sent only for subscriptions. +A plain `call()` is complete after its single `call.responded`; no +`call.completed` follows." + +--- + +### S8. `OperationProvenance::FromCall` Comment Is Ambiguous ("Leaf Locally") + +**Documents**: ADR-022 (L123–124) + +**Problem**: `FromCall` => "QUIC forwarding stub (from_call), leaf locally — +cannot compose." The "locally" qualifier is ambiguous. Does it mean: (a) +`from_call` ops are leaves in the *local* registry but could compose on the +*remote* node? Or (b) they're leaves, full stop, and "locally" is just +emphasis? Assumption 5 (L518–524) clarifies, but the phrasing invites +misreading. + +**Suggestion**: Rephrase to "leaf in the local registry — forwards calls to +a remote node; cannot compose locally" or similar to remove the ambiguity. + +--- + +### S9. `ScopedOperationEnv::empty()` and `CompositionAuthority::none()` Referenced but Undefined + +**Documents**: operation-registry.md (L182–184, L243–244, L298–299, L318–320), +ADR-022 (L287) + +**Problem**: The spec and ADR-022 use `ScopedOperationEnv::empty()` and +`CompositionAuthority::none()` (constructor methods) but neither type's +definition (ADR-022 L164–180 for `CompositionAuthority`, L208–214 for +`ScopedOperationEnv`) lists these methods. They're implied (empty set / +None-equivalent) but not shown. + +**Suggestion**: Add `impl ScopedOperationEnv { pub fn empty() -> Self { … } +pub fn allows(&self, name: &str) -> bool { … } }` and +`impl CompositionAuthority { pub fn none() -> Option { None } }` (or +similar) to the spec or ADR-022. + +--- + +### S10. Overview OQ-04 Entry Is Stale + +**Documents**: overview.md (L225) + +**Problem**: `overview.md:225` lists "OQ-04: Dynamic handler registration at +runtime vs static at startup (two-way door, defer to implementation)." But +OQ-04 in `open-questions.md:47–51` is **resolved** ("Static registration at +startup"). The overview shows a stale framing ("defer to implementation") +while the OQ is resolved. Compare to OQ-01/OQ-02/OQ-03 which are correctly +marked "resolved" in the same overview list. + +**Suggestion**: Update `overview.md:225` to "OQ-04: Dynamic handler +registration (resolved: static at startup — see ADR-010)" to match the +resolved state. + +--- + +### S11. ADR-021 References "W1 Drift Issue from the Vault Review" — Ambiguous Cross-Reference + +**Documents**: ADR-021 (L127–129) + +**Problem**: ADR-021 L127–129 states: "This is the fix for the W1 drift issue +from the vault review: the current source ignores `key_version` for key +selection; the spec now makes it functional." But review 001's W1 (L355–371) +is about *ADR-020's stale example path* (`m/74'/2'/1'/0'` vs +`m/74'/2'/0'/1'`), not about the source ignoring `key_version`. The +source-ignoring-`key_version` drift is not tracked in review 001 at all. The +"W1" label in ADR-021 doesn't match review 001's W1. + +**Suggestion**: Clarify the reference — either cite review 001 with the +correct finding number, or remove the "W1" label and describe the drift +directly (which the text already does). + +--- + +### S12. Duplicate "Helper Functions" Block in mnemonic-derivation.md + +**Documents**: crates/vault/mnemonic-derivation.md (L199–204, L217–223) + +**Problem**: The helper functions `device_path` and `site_password_path` are +documented twice: L199–204 (missing `encryption_path_for_version`) and +L217–223 (complete). An implementer reading the first block might not realize +`encryption_path_for_version` exists, or might think there are two conflicting +definitions. The duplication is a maintenance hazard. + +**Suggestion**: Remove the first block (L199–204); keep the complete list at +L217–223. + +--- + +## Cross-Cutting Theme: ADR-022 Narrowed Several "Two-Way Doors" Without Updating the OQ Classifications + +The two-way-door audit found a systematic pattern: ADR-022 (Proposed, not yet +Accepted) is the single largest source of door-type staleness. It narrowed +several doors without updating the OQ classifications that were written +pre-ADR-022: + +| Door | Pre-ADR-022 classification | Post-ADR-022 reality | +|------|---------------------------|----------------------| +| Capabilities concrete shape (W2) | Two-way (implementation detail) | Clone semantics become a behavioral contract once handlers depend on shared mutation; ADR-022's composition model bakes in the cheap-clone assumption | +| OperationAdapter trait signatures (W1) | Two-way (async vs sync) | ADR-022 committed the return type to `HandlerRegistration`; `from_call` forces async — both are already decided | +| CallClient registry mechanism (W3) | Two-way (share global vs subset vs separate) | ADR-022 put capabilities (secrets) in the bundle — "share global" is now a capability-exposure decision, not just dispatch | +| Dynamic registration (W4) | Two-way (add ArcSwap later) | Hot-swap now requires re-injecting capabilities (secret-material handling) + TLS config rebuild, not a pointer swap | +| Session-scoped registries (OQ-19) | Two-way (protocol doesn't need changes) | `OperationContext.env` field type is a current commitment — if it's a concrete struct, the session-overlay pattern is closed (see C6) | +| Graph structures (W21) | Two-way (HashSet now, flowgraph later) | `ScopedOperationEnv`'s public field leaks the representation; the refactor is a public-API break, not an internal change | + +**Pattern**: ADR-022 was written to resolve review #001's C1–C4 (the +registration-bundle tangle), and in doing so it quietly narrowed several +previously-two-way doors. The OQ tracker and ADR-017 should be reconciled +with ADR-022 before the "two-way door" labels can be trusted. ADR-009's +framework is sound; its application is drifting because later ADRs don't +update earlier OQ classifications. + +The fix is not to re-litigate each door — most of ADR-022's narrowing is +correct (the bundle *should* carry capabilities, the adapter *should* produce +`HandlerRegistration`). The fix is to add guard clauses to the affected OQs +and ADRs documenting the narrowed scope, so an implementer reading the OQ +tracker sees the post-ADR-022 reality, not the pre-ADR-022 optimism. + +--- + +## Cross-Cutting Theme: The "Published Artifact Is a Contract" Blind Spot + +The audit found a systematic blind spot in ADR-009's framework: it +classifies doors by reversal cost in the codebase, not by compatibility cost +for external consumers. Three deferrals share this pattern: + +- **`to_*` best-effort mappings (W5)**: "best-effort" is true before first + publication but one-way after — external clients depend on the generated + spec. +- **Error schema `http_status` mappings (W20)**: once `to_openapi` publishes + `FILE_NOT_FOUND` → 404, changing it breaks clients. +- **The salt field's wire format (W6)**: the field is "reserved" (wire-format- + stable), but its *semantics* cannot change for existing data — "reserved" + implies more flexibility than exists. + +The framework treats these as two-way because the *format* is additive; it +misses that the *semantics*, once published, are frozen. This is the same +class of issue as a public API: adding a method is additive, but changing an +existing method's behavior is breaking. The spec should acknowledge that +published artifacts (generated OpenAPI specs, published error code mappings, +shipped wire formats) are compatibility contracts, not internal +implementation details. + +--- + +## Summary Statistics + +| Severity | Count | +|----------|-------| +| Critical | 13 | +| Warning | 21 | +| Suggestion | 12 | + +## Recommended Resolution Order + +The findings cluster into a few resolution passes. Recommended order: + +### Pass 1: Governance reconciliation (resolves C1, C2, C12, C11) + +1. Advance ADR-022 and ADR-023 to **Accepted**. +2. Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022; + update the struct to `CompositionAuthority`. +3. Amend ADR-002: note the signature was revised by ADR-007. +4. Amend ADR-004: note the "enrich/replace" language was superseded by + ADR-011's immutability model. +5. Add a Status column to all crate README ADR tables. + +This is the highest-value pass — it resolves the direct contradictions in the +document set and makes the governance state coherent. + +### Pass 2: Spec-ADR consistency (resolves C3, C4, C5, C6, C10, C13, W1) + +6. Add `abort_policy` to `OperationContext`; define `AbortPolicy` enum (C3). +7. Define the `OperationEnv` trait explicitly, including + `invoke_with_policy()` (C4). +8. Separate `OperationContext.env` (dispatch trait) from + `OperationContext.scoped_env` (reachability set) — resolve the type + identity crisis (C6). +9. Update ADR-017's `from_call` mirror list and `OperationAdapter` trait to + include `error_schemas` and `HandlerRegistration` (C5, W1). +10. Specify `From for HandlerError` and resolve the variant + payload (C10). +11. Specify the `Connection::set_identity` observability read path (C13). + +### Pass 3: Vault security and deferral reconciliation (resolves C7, C8, C9, W6, W7, W8, W9, W10, W12) + +12. Promote OQ-21 from "deferred" to "open"; reconcile with ADR-019; decide + the vault-server-crate question (C7). +13. Complete the operation access policy table (C8). +14. Specify the `site_password_path` string→u32 mapping (C9). +15. Add an ADR for the HD-derivation key model (identity/SSH/signing) (W9). +16. Add an ADR locking the `EncryptedData` wire format (W10). +17. Add a "Known Source Drift" table to the vault README (W12). +18. Address `unlock_new` returning non-zeroizing `String` (W7) and + `DerivedKey` JSON deserialization (W8). +19. Clarify the salt field's "reserved" semantics (W6). + +### Pass 4: Two-way-door guard clauses (resolves W2, W3, W4, W5, W17, W18, W21) + +20. Add guard clauses to the OQs and ADRs that ADR-022 narrowed (W2, W3, W4, + W21). +21. Add the "published artifact is a contract" guard to `to_*` adapters (W5). +22. Correct OQ-09 (WASM server-side) and OQ-10 (git composability) framings + (W17, W18). + +### Pass 5: Minor consistency (resolves remaining Warnings and Suggestions) + +23. Fix the overview.md "stream" → "connection" wording (W16). +24. Fix ADR-016's self-referential citation and grandchild propagation (W19). +25. Fix ADR-023's `from_openapi` collision example (W20). +26. Address the remaining suggestions (S1–S12). + +--- + +## Note on Effort + +This review found more issues than review #001 (13 critical vs 5, 21 warning +vs 4), but that's expected: review #001 caught the registration-bundle tangle +and the missing error schemas, which were the highest-density gaps. This +review goes wider — cross-document consistency, the two-way-door framing, and +the vault deferrals — and surfaces issues that were always there but not yet +audited. The critical findings here are individually smaller than review +#001's C1–C5 (no single tangle of four interlocking gaps), but they are +collectively important because several are governance issues (Proposed ADRs +treated as binding, Accepted ADRs contradicting later refinements) that +undermine the document set's authority. + +The two-way-door audit's core finding: the framework is sound, but its +application is drifting because later ADRs don't update earlier OQ +classifications. The fix is not to re-litigate each door — it's to add guard +clauses documenting the narrowed scope, so the OQ tracker reflects +post-ADR-022 reality. \ No newline at end of file