Files
alknet/docs/reviews/002-pre-implementation-architecture-sanity-check.md
glm-5.2 8f8a8a48f9 docs(reviews): add pre-implementation architecture sanity check #002
Second pre-implementation review. Goes wider than #001 on cross-document
consistency and the two-way-door framing from ADR-009.

Finds 13 critical, 21 warning, 12 suggestion issues:
- Governance: ADR-022/023 are Proposed but specs treat them as binding;
  ADR-015/002/004 (Accepted) contradict later refinements without supersession
  markers
- Abort policy (ADR-016) missing from OperationContext struct; OperationEnv
  trait never defined
- OperationContext.env type identity crisis (reachability set vs dispatch
  trait)
- ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait
  stale vs ADR-022 bundle
- OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides
  a crate-decomposition decision; RemoteService path unvalidated
- Vault operation access policy table incomplete for security-sensitive methods
- site_password_path string-to-index mapping breaks determinism guarantee
- Two-way-door audit: ADR-022 narrowed several doors without updating OQ
  classifications; 'published artifact is a contract' blind spot in framework

Includes recommended 5-pass resolution order.
2026-06-22 05:09:39 +00:00

90 KiB
Raw Blame History

status, last_updated, reviewed_documents, reviewer
status last_updated reviewed_documents reviewer
draft 2026-06-22
overview.md
README.md
open-questions.md
crates/core/README.md
crates/core/core-types.md
crates/core/endpoint.md
crates/core/auth.md
crates/core/config.md
crates/call/README.md
crates/call/call-protocol.md
crates/call/operation-registry.md
crates/vault/README.md
crates/vault/mnemonic-derivation.md
crates/vault/encryption.md
crates/vault/service.md
crates/vault/protocol.md
decisions/001 through 023 (all 23 ADRs)
architect

Architecture Gap Review #002

Purpose

This is the second pre-implementation sanity check. Review #001 caught the registration-bundle tangle (C1C4) and the missing error schemas (C5), both resolved via ADR-022 and ADR-023. This review goes deeper on two axes the first review under-covered:

  1. Cross-document consistency: places where an Accepted ADR contradicts a Proposed ADR, where a spec doc adopted a Proposed ADR's types without the ADR being advanced, and where a later ADR narrowed a "two-way door" deferral without updating the OQ classification.
  2. Two-way-door deferral audit: the user's concern that the one-way/two-way door framing from ADR-009 has been applied too loosely. Some deferrals are legitimate, but several are questionable or misleading — either because a later ADR quietly narrowed the door, because the "path of least resistance" implementation forecloses a better option, or because multiple individually-reversible deferrals compound to foreclose a path that no single deferral would block on its own.

Review #001 resolved 5 critical, 4 warning, and 4 suggestion findings. This review finds a different set: the registration-bundle work surfaced new inconsistencies between ADR-015 (Accepted) and ADR-022 (Proposed), the abort policy (ADR-016) was never reflected in the OperationContext struct, the OperationEnv trait was never defined, and several vault deferrals have security dimensions that the two-way-door framing doesn't surface.

Severity Definitions

Severity Meaning
Critical Must resolve before implementation — would cause divergent implementations or security issues
Warning Should resolve — missing specifications that could cause confusion or rework
Suggestion Consider — improvements for clarity or completeness

Critical Findings

C1. ADR-015 Is Not Reconciled With ADR-022 — Direct Contradiction in the Document Set

Documents: ADR-015 (Status: Accepted, Decision 3 L114136, Assumption 6 L275280), ADR-022 (Status: Proposed, Decision 2 L145186), operation-registry.md (L115, L133, L172), call-protocol.md (L296)

Problem: ADR-015 is marked Accepted and defines handler_identity: Option<Identity> in its Decision 3 (L124) — a peer Identity type resolved via IdentityProvider. Assumption 6 (L275280) explicitly states "the handler identity is a full Identity (with scopes), not a special principal type… This reuses the existing Identity type and IdentityProvider infrastructure (ADR-004)."

ADR-022 is marked Proposed and explicitly supersedes ADR-015's Assumption 6 (L147149: "This ADR refines that: composition authority is a declared authority bundle, not a peer Identity"; L183185: "This supersedes ADR-015's Assumption 6"). The spec docs now use handler_identity: Option<CompositionAuthority> — a different, non-peer type.

This is a direct contradiction. Per the ADR lifecycle, a superseded decision should be marked Superseded or carry an inline note pointing to the superseding ADR. ADR-015 has neither. Worse, ADR-015 has higher status (Accepted) than the ADR that supersedes it (Proposed). An implementer reading ADR-015 in isolation would build Identity-based handler identity; an implementer reading the specs would build CompositionAuthority-based. The Accepted-overrides-Proposed governance rule means the old Identity definition is, by the document set's own lifecycle, currently authoritative — yet the specs have already adopted the proposed change.

This is the same shape of bug as review #001's W1 (ADR-020's stale example path contradicting the locked ADR-021 scheme), but more severe: it's a security-model type, not a path string.

Suggestion: Two things must happen before implementation:

  1. Advance ADR-022 and ADR-023 to Accepted. The specs already depend on them as the source of truth for types that appear throughout the code. Keeping them Proposed while the specs adopt their types is incoherent governance. (See C2.)
  2. Reconcile ADR-015: either update its Decision 3 and Assumption 6 to point to ADR-022 (inline amendment note + change Identity to CompositionAuthority in the struct), or mark the relevant sections Superseded by ADR-022 (Decision 2).

The cleanest path is to do both in one pass: advance ADR-022/023 to Accepted, amend ADR-015 to reference the supersession. This resolves C1, C2, and C6 in one move.


C2. ADR-022 and ADR-023 Are "Proposed" but the Specs Treat Them as Authoritative

Documents: ADR-022 (Status: Proposed), ADR-023 (Status: Proposed), operation-registry.md (L40, L115, L133, L149, L158, L160168, L286, L372379, L404406), call-protocol.md (L296, L305, L369373), README.md (ADR table L3660)

Problem: Both ADR-022 and ADR-023 are marked Proposed. Yet the spec docs depend on them pervasively as the source of truth:

  • OperationSpec.error_schemas (ADR-023) — operation-registry.md L40, L286.
  • ErrorDefinition (ADR-023) — operation-registry.md L5661.
  • HandlerRegistration with provenance, composition_authority, scoped_env, capabilities (ADR-022) — operation-registry.md L160168, the central registration type.
  • OperationContext.handler_identity: Option<CompositionAuthority> (ADR-022 Decision 2) — operation-registry.md L115, L133; call-protocol.md L296.
  • CompositionAuthority, ScopedOperationEnv, OperationProvenance (ADR-022) — used throughout.

The README's ADR table (L3660) shows ADR-022 and ADR-023 as "Proposed," but the call-crate README's ADR table has no Status column at all — a reader can't tell which ADRs are binding. Per the ADR lifecycle, "Proposed" means under review, not yet a commitment.

Combined with C1 (ADR-015 Accepted still contradicts ADR-022 Proposed), the governance state is incoherent: the Accepted ADR says Identity, the Proposed ADR says CompositionAuthority, and the specs have picked the Proposed one.

Suggestion: Advance ADR-022 and ADR-023 to Accepted before implementation. They are clearly intended to be binding — the specs already implement them. Add a Status column to the crate README ADR tables so the binding/non-binding distinction is visible. This is the governance fix that unblocks C1.


C3. The Abort Policy (ADR-016 Decision 6) Is Missing from OperationContext and OperationEnv::invoke()

Documents: ADR-016 (Decision 6 L120164, Accepted), operation-registry.md (OperationContext struct L111128, LocalOperationEnv::invoke() L211249), call-protocol.md (L124, L345)

Problem: ADR-016 Decision 6 states unambiguously (L122125): "The abort policy (abort-dependents vs continue-running) is set on OperationContext and propagated to children via OperationEnv::invoke()." It shows OperationContext carrying the policy (L154) and an opt-in invoke_with_policy(...) method (L149151).

The OperationContext struct in operation-registry.md (L111128) has no abort_policy field. The fields shown are: request_id, parent_request_id, identity, handler_identity, capabilities, metadata, env, internal. No policy. The LocalOperationEnv::invoke() implementation (L211249) likewise constructs a child OperationContext with no policy field and takes no policy parameter.

call-protocol.md L124 and L345 correctly state that the policy is on OperationContext and propagated via OperationEnv::invoke(), but the actual struct definition in operation-registry.md — the one an implementer would copy — omits it. This is a gap between ADR-016 (Accepted) and the spec. An implementer copying the spec would simply omit the abort policy, and the continue-running opt-in would have no wiring point.

Suggestion: Add pub abort_policy: AbortPolicy (with a default of AbortPolicy::AbortDependents) to the OperationContext struct in operation-registry.md, and define the AbortPolicy enum (AbortDependents | ContinueRunning). The build_root_context in call-protocol.md should set abort_policy: AbortPolicy::AbortDependents (the default per ADR-016 L140). See C4 for the invoke() side.


C4. OperationEnv Trait Is Never Defined; invoke_with_policy() Absent from Spec

Documents: operation-registry.md (L196259), ADR-016 (Decision 6 L144164)

Problem: Two related gaps:

  1. The OperationEnv trait is never defined. operation-registry.md repeatedly references OperationEnv as a trait that "must remain a trait" (L259, also OQ-19), and shows #[async_trait] impl OperationEnv for LocalOperationEnv (L211212), but the trait definition itself is never shown. Only the LocalOperationEnv struct (L207209) and its impl block (L211249) appear. An implementer cannot know the trait's method signatures, return types, or whether invoke() is the sole method.

  2. The invoke_with_policy() opt-in is absent. ADR-016 Decision 6 (L144152) shows the policy entering through OperationEnv::invoke() via invoke_with_policy("fs", "readFile", input, &context, AbortPolicy::ContinueRunning). The spec's OperationEnv::invoke() signature (L213) takes no policy parameter, and there is no invoke_with_policy method, no InvokeOptions struct, no builder pattern. The spec never defines the trait surface, so the policy opt-in mechanism ADR-016 mandates is unspecified.

This is load-bearing because OQ-19 makes the trait-ness of OperationEnv a constraint: session-scoped registries depend on wrapping the global env via trait layering. If the trait surface is unspecified, two implementers could define it differently, breaking the wrapping pattern.

Suggestion: Define the OperationEnv trait explicitly in operation-registry.md, showing at minimum:

#[async_trait]
pub trait OperationEnv: Send + Sync {
    async fn invoke(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
    ) -> ResponseEnvelope;

    /// Opt-in: compose a child with a non-default abort policy.
    async fn invoke_with_policy(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
        policy: AbortPolicy,
    ) -> ResponseEnvelope {
        // default implementation: set policy on child context, delegate to invoke
    }
}

This satisfies both the "must remain a trait" constraint and ADR-016's policy parameter. See C7 for a related type ambiguity on the same struct.


C5. ADR-017's from_call Mirror List Does Not Include error_schemas — Inconsistency with ADR-023

Documents: ADR-017 (Decision 3 L113122, Assumption 4 L274279, Accepted), ADR-023 (Decision 1 L112147, L4247, L257, Proposed)

Problem: ADR-017 Decision 3, step 3 (L118119) says from_call constructs a spec that "mirrors the remote operation's name, namespace, type, schemas, and access control." Assumption 4 (L274275) repeats: "the imported OperationSpec has the same name, namespace, type, schemas, and access control as the remote operation." The list is: name, namespace, type, schemas, access control. There is no error_schemas in either list.

ADR-023 adds error_schemas: Vec<ErrorDefinition> to OperationSpec (L123) and Decision 5 (L236261) makes adapter fidelity explicit for from_openapi/to_openapi/from_mcp/to_mcp. But ADR-023 does not mention from_call in Decision 5 — it covers OpenAPI and MCP adapters only. Meanwhile ADR-017's from_call mirror list predates error_schemas existing as a field.

The result: it's ambiguous whether from_call should mirror error_schemas. ADR-017's step 2 (L116117) says from_call calls services/schema "for each → gets the input/output JSON Schemas" — again, no mention of error schemas. ADR-023 Decision 6 (L264286) confirms services/schema returns error_schemas, so the data is available, but ADR-017 doesn't say to consume it.

An implementer following ADR-017 literally would import operations with empty error_schemas, silently dropping the remote error contract — the exact lossy-import problem ADR-023 was written to solve for OpenAPI/MCP, now recurring for from_call.

Suggestion: Update ADR-017 Decision 3 step 3 and Assumption 4 to include error_schemas in the mirror list (name, namespace, type, schemas, error_schemas, access control), and update step 2 to note that services/schema now also returns error schemas. Add a cross-reference to ADR-023. This is an inconsistency between an Accepted ADR (017) and a later ADR (023) that must be reconciled — same shape as review #001's W3 (from_mcp missing from adapter lists), but for a new field.


C6. OperationContext.env Has a Type Identity Crisis — Reachability Set vs Dispatch Trait

Documents: ADR-015 (L127: pub env: OperationEnv), ADR-022 (L208215 ScopedOperationEnv, L286287 env: registration.scoped_env…), operation-registry.md (L118: pub env: OperationEnv; L220: parent.env.allows(&name); L243244: env: registration.scoped_env.clone()…; L198: context.env.invoke(...)), call-protocol.md (L298299)

Problem: There is a type identity problem for the env field on OperationContext:

  • ADR-015 L127 and operation-registry.md L118 declare pub env: OperationEnv (a trait object / the trait type).
  • ADR-022 and operation-registry.md L243244 and call-protocol.md L298299 populate env with a ScopedOperationEnv (a concrete struct: allowed_operations: HashSet<String>, ADR-022 L208214).

ScopedOperationEnv (a reachability set — ADR-022's reachability control) and OperationEnv (the composition dispatch trait — the thing with invoke()) are different concepts. ADR-015 conflated them by putting OperationEnv on the context; ADR-022 introduced ScopedOperationEnv as the reachability input but the dispatch path assigns the scoped set directly to env:.

operation-registry.md L220 calls parent.env.allows(&name) — implying env is the ScopedOperationEnv (has .allows()). But the same field is named OperationEnv in the struct and described (L136) as "the operation environment for composing calls… OperationEnv" with invoke(). A handler calls context.env.invoke(...) (L198, L200) — implying env is the dispatch trait. One field cannot be both a HashSet-based reachability gate and a trait with an async invoke() method, unless they are the same type (they are not — ADR-022 treats ScopedOperationEnv as data and the dispatch OperationEnv/LocalOperationEnv as the trait impl that consults the scoped env).

This is a genuine type-system ambiguity that would force an implementer to guess: is context.env the dispatch trait (then where does the scoped reachability set live?), or is it the reachability set (then how does the handler call invoke() on it?), or is it a composite type that holds both?

This interacts with OQ-19's constraint that OperationEnv must remain a trait: if env is ScopedOperationEnv (a concrete struct), the session-overlay pattern is closed because the field can't hold a trait object.

Suggestion: Separate the two concepts explicitly. Likely: OperationContext carries scoped_env: ScopedOperationEnv (reachability data, used for the allows() check) and a separate env: Box<dyn OperationEnv> (or Arc<dyn OperationEnv>) that is the dispatch trait (used for invoke()). The reachability check in invoke() consults parent.scoped_env.allows(&name), and the actual dispatch goes through the OperationEnv trait impl. The current field naming (env for both) is the root of the confusion and should be disambiguated. This may warrant an ADR-022 clarification, since it changes the struct shape that ADR-022 Decision 5 shows.


C7. OQ-21's "Non-Breaking" Deferral Conflicts With ADR-019 and Hides a Crate-Decomposition Decision

Documents: open-questions.md (OQ-21 L252285), ADR-019 (L133136), crates/vault/protocol.md (L187216, L278291), crates/vault/service.md (L300322)

Problem: Three interlocking problems:

  1. ADR-019 directly contradicts OQ-21's deferral framing. ADR-019 L133136 states: "Remote vault administration (unlock a running node's vault over the network) is not supported. If needed in the future, it requires a separate, heavily restricted mechanism (admin scope, mTLS-only, never expose the mnemonic over an unauthenticated channel) and its own ADR." OQ-21 defers remote vault access (derive/encrypt/decrypt) as "non-breaking" with no ADR required. The boundary between ADR-019's "remote vault administration" (needs an ADR) and OQ-21's "remote vault access" (deferred, no ADR) is undefined. An implementer reading ADR-019 could reasonably conclude that any remote vault exposure requires an ADR; an implementer reading OQ-21 could conclude none is needed.

  2. The "non-breaking" claim is only true for the wire format, not for secure enablement. OQ-21 and protocol.md L278286 frame enabling remote access as a "server-setup change." But the auth-wrapping handler that makes remote access safe cannot live in the vault crate (ADR-018) and requires either (a) assembly-layer integration or (b) a new dedicated vault-server crate depending on both alknet-core and alknet-vault (protocol.md L201204, open-questions.md L269). Option (b) is a crate-decomposition change (ADR-003 territory), not "server setup." OQ-21's "two-way (enabling is non-breaking)" door-type classification (L256) is misleading: once workers build dependencies on the remote vault API, disabling it breaks them — the door is operationally one-way even if the wire format is additive.

  3. Nested, undecided deferral. OQ-21 L269 says the auth handler "lives in the assembly layer or a dedicated vault-server crate." This "or" is itself an undecided architectural choice (affects the dependency graph and ADR-003's crate decomposition) deferred inside the OQ-21 deferral. It is not tracked as a separate question.

  4. The RemoteService path is unvalidated. The deferral means VaultProtocol's RemoteService trait impl, DerivedKey's dual serialization (JSON redacts, postcard preserves), and VaultServiceActor's transport-agnosticism have never been exercised over a real network. The spec asserts these work ("by construction"), but there are concrete failure modes that only surface at enablement — e.g., DerivedKey's custom Serialize redacts private_key for "human-readable formats"; whether postcard correctly bypasses the redaction is a known source of subtle bugs. If it redacts under postcard too, remote dispatch silently corrupts private keys. ADR-009:33 warns "if the right choice is unclear, validate with a POC." The deferral skips the POC and asserts the door is open by construction.

  5. Safety footgun. The default IrohProtocol handler forwards all message types without auth (protocol.md L189191). If someone naively registers the ALPN without the wrapping handler, Unlock (mnemonic over the wire) and Lock (DoS) become remotely callable. The operation access policy is documented in prose only; nothing in the vault crate prevents the insecure configuration.

Suggestion: Promote OQ-21 from "deferred" to "open." Reconcile with ADR-019 by explicitly defining the administration-vs-access boundary. Decide now whether a vault-server crate is part of the architecture (this is a crate-decomposition one-way door per ADR-003/ADR-018 and should be decided, not deferred). If remote access is genuinely deferred, require that the enablement (not just the protocol) be covered by a future ADR — matching ADR-019's "requires its own ADR" language. Correct the door-type classification: enabling remote access is a one-way door operationally (workers depend on it), even if the wire format is additive. At minimum, add a guard clause: "The RemoteService path is unvalidated. The DerivedKey postcard serialization and the actor's transport-agnosticism must be verified by a POC before claiming the door is open."


C8. The Operation Access Policy Table Is Incomplete — Security-Sensitive Methods Are Unclassified

Documents: crates/vault/protocol.md (L218236), crates/vault/service.md (L104212)

Problem: The "Operation access policy" table (protocol.md L223232) covers exactly the 8 VaultProtocol enum variants. But VaultServiceHandle exposes additional methods not in the protocol enum: rotate, unlock_new, derive_encryption_key_for_version, derive_password_string, is_unlocked. These methods have no access classification — an implementer cannot determine from the spec whether they are local-only, remote-capable, or simply not protocol-exposed. This is security-relevant:

  • rotate re-encrypts blobs (security-sensitive operational action; ADR-021 L168170 says "rotation is an operational action, not a runtime behavior"). If it were ever exposed remotely, a compromised worker could trigger mass re-encryption. It is not in the protocol enum and not in the access policy table — its status is undefined.
  • unlock_new generates and returns the mnemonic (root of trust) as a plaintext String. Not in the protocol enum; not in the policy table. Its local-only status is implied but not stated.
  • derive_encryption_key_for_version returns an EncryptionKey directly (raw key material), unlike Encrypt/Decrypt which use the key internally. If exposed remotely, a worker could extract raw encryption keys. Not classified.

The spec never states the rule that "service methods not in VaultProtocol are local-only by construction." An implementer could add protocol variants for these methods without recognizing the security implication, since the policy table only constrains existing variants.

Suggestion: Add an explicit rule: "Service methods absent from VaultProtocol are local-only by default; adding a protocol variant requires updating this policy table and an ADR if the method is security-sensitive." Add rotate, unlock_new, and derive_encryption_key_for_version to the policy table (or an explicit "local-only" list) with rationale. Document the mapping between service methods and protocol variants.


C9. site_password_path(site_hash: &str) Is Underspecified — String-to-Index Mapping Breaks Determinism

Documents: crates/vault/mnemonic-derivation.md (L202204, L213, L219222)

Problem: The helper site_password_path(site_hash: &str) -> String produces path m/74'/1'/0'/{site_hash}', but SLIP-0010 hardened derivation indices are u32 (0 to 2³¹1). A &str "site hash" cannot be a derivation index directly — it must be converted to a u32. The conversion (hash algorithm, truncation, endianness, collision handling) is completely unspecified. Two implementers could produce different, incompatible mappings: SHA-256(site) → first 4 bytes → u32; or last 4 bytes; or BLAKE3; or a string hash. Since the path semantics table (L213) says these are "Ed25519 bytes" used for "Per-site passwords (not cached)," and determinism is a stated design principle (L234237: "the same mnemonic + passphrase + path always produces the same key"), the string→index mapping must be deterministic and specified. An underspecified mapping means the same site + mnemonic produces different passwords across implementations — breaking the reproducibility guarantee.

Additionally, "Ed25519 bytes" (L213) as a password source is an undefined concept. The path produces a 32-byte Ed25519 private key; using those raw bytes as a password is a non-standard construction with no ADR explaining why Ed25519-derived material is suitable as password entropy (vs HKDF or a dedicated KDF).

Suggestion: Specify the exact string→u32 mapping (e.g., u32::from_be_bytes(SHA-256(site_hash)[0..4]) & 0x7FFFFFFF for hardened index). Add a design note or ADR explaining why Ed25519-derived bytes are acceptable as password material. Specify the password encoding (raw bytes vs base64url — service.md L166 says base64url for the string variant, but the bytes variant's encoding is the caller's concern).


C10. HandlerError::StreamError(io::Error) vs StreamError Variant — Conversion Gap

Documents: crates/core/core-types.md (L3338, L118139), crates/core/endpoint.md (L248253), crates/core/auth.md (L55, L59), ADR-010

Problem: core-types.md and endpoint.md (citing ADR-010) both define HandlerError::StreamError(io::Error). But auth.md:59 shows handler code calling self.identity_provider.resolve_from_token(&token).ok_or(HandlerError::AuthRequired)? after let stream = connection.accept_bi().await?; (L55). The .await? on accept_bi() returns StreamError (per core-types.md:60), yet there is no From<StreamError> for HandlerError impl specified. The mapping table in core-types.md:132137 is descriptive prose, not a trait impl. Without a specified From<StreamError> for HandlerError (or #[from]), the ? in the example will not compile. Two implementers following these docs literally will diverge: one will hand-map per the table, another will expect an auto- From.

Additionally, HandlerError::StreamError(io::Error) holds io::Error, but StreamError itself holds Internal(io::Error) (L118124). So StreamError::Internal(io::Error)HandlerError::StreamError(io::Error) is a lossy wrap (the distinction between stream-level and connection-level is flattened). The spec should state whether this From impl exists and its exact behavior, because handler code (? operator) depends on it.

Suggestion: Specify the exact conversion mechanism. Either (a) declare HandlerError::StreamError(#[from] StreamError) — but then the variant must hold StreamError, not io::Error, contradicting ADR-010/core-types; or (b) explicitly state handlers must use .map_err(HandlerError::from) or .map_err(|e| e.into()) and define the From impl; or (c) keep io::Error and define From<StreamError> for HandlerError matching the mapping table. Resolve the variant payload (io::Error vs StreamError) and specify the From impl. Update auth.md:55 example to use the canonical conversion pattern.


C11. ADR-004 Says Handlers "Enrich or Replace" AuthContext; ADR-011 Says AuthContext Is Immutable in handle()

Documents: ADR-004 (L39, Accepted), ADR-011 (L131133, Accepted), crates/core/auth.md (L8285)

Problem: ADR-004 line 39 states the handler "enriches or replaces the partial AuthContext with the fully resolved Identity." This describes mutation of the shared AuthContext. ADR-011 explicitly contradicts this at lines 131133: "handle() signature passes &AuthContext (immutable reference). Handlers that resolve identity create a local variable… they don't mutate the AuthContext." auth.md:85 reiterates immutability. The immutability decision is the later, more refined one, but ADR-004 was never amended. An implementer reading ADR-004 alone would mutate AuthContext — causing cross-stream contamination (one stream's token resolution leaking into another stream's AuthContext on the same connection), exactly what ADR-011 prevents.

This is security-relevant — mutation across streams is a contamination vector. Same shape as C1 (Accepted ADR contradicts a later refinement that was never recorded as a supersession).

Suggestion: Amend ADR-004 to note that the "enrich/replace" language was superseded by ADR-011's immutability model (handlers resolve into a local variable, store on Connection via set_identity). Add an inline note at ADR-004 L39 pointing to ADR-011. Same pattern as the C1 fix for ADR-015.


C12. ADR-002's ProtocolHandler::handle Signature Is Stale — Not Marked as Superseded by ADR-007

Documents: ADR-002 (L1927, Accepted), ADR-007 (L4453, Accepted), overview.md (L89), crates/core/core-types.md (L1419)

Problem: ADR-002 (Status: Accepted) still shows async fn handle(&self, stream: BiStream, auth: &AuthContext) at L26. ADR-007 revised the signature to handle(&self, connection: Connection, auth: &AuthContext) and its References note "ADR-002: ProtocolHandler trait (signature revised by this ADR)." However, ADR-002 itself has no superseded/amended marker — it reads as the current authoritative definition. overview.md:89 even says "This differs from the original ADR-002 signature which passed BiStream," confirming the discrepancy is known but ADR-002 was never annotated. An implementer reading only ADR-002 (as the README's "Applicable ADRs" table encourages) gets the wrong trait.

Suggestion: Add a "Superseded by ADR-007" amendment note at the top of ADR-002's Decision section (or a ## Amendments section) stating the signature was changed by ADR-007. Alternatively, update the ADR-002 code block to the Connection signature with a note "Revised by ADR-007." ADRs must reflect current state or explicitly mark supersession — this is the same lifecycle rule that C1 and C11 violate.


C13. Connection::set_identity Observability Read Path Is Under-Specified

Documents: crates/core/core-types.md (L5367), crates/core/auth.md (L80), open-questions.md (OQ-11 L138), crates/core/endpoint.md (L122136)

Problem: Connection has private identity: OnceLock<Identity>. set_identity(&self,…) takes &self (interior mutability) and is "write-once-read-many." But the endpoint "holds a reference for logging" (OQ-11:138) and the dispatch code in endpoint.md:122136 moves conn into the spawned task (Connection::new(connection) then tokio::spawn(async move { ... handler.handle(conn, &auth) })). After the move, the endpoint no longer has the Connection to read identity() for observability. OQ-11 says "the endpoint may also hold a reference for logging" — but there is no Arc<Connection> or shared handle shown anywhere; the struct definition in core-types.md:5357 has no sharing mechanism, and AlknetEndpoint in endpoint.md:1727 does not hold a registry of live connections. So the stated observability reader (the endpoint) has no defined way to access the connection after spawn.

Suggestion: Specify the observability read path precisely. Either (a) the endpoint holds a side table of live connections keyed by connection ID with Weak<Connection> or an Arc<Connection> shared with the handler task, and Connection becomes Arc-shareable; (b) the handler emits the resolved identity via a channel/observer callback and the endpoint records it; or (c) drop the claim that the endpoint reads it and state observability is handler-side logging only. The current spec is internally inconsistent on who reads identity() and how they obtain the reference.


Warning Findings

W1. ADR-017's OperationAdapter Trait Is Stale Relative to ADR-022 — And the Sync/Async "Two-Way Door" Is Already Decided

Documents: ADR-017 (L155158, L171174, Accepted), ADR-022 (L225255, Proposed)

Problem: ADR-017 L171172 states: "The specific trait signatures (async vs sync, error types, configuration parameters) are two-way doors for implementation." Two problems:

  1. The bundle shape is already constrained. ADR-022 changes registration from (OperationSpec, Handler) to HandlerRegistration (spec + handler + provenance + composition_authority + scoped_env + capabilities). ADR-022 L257261 says adapter convenience methods construct HandlerRegistration with composition_authority: None and scoped_env: None. So the adapter trait must now produce Vec<HandlerRegistration>, not Vec<(OperationSpec, Handler)>. The trait signature shown in ADR-017 (fn import(&self) -> Vec<(OperationSpec, Handler)>) is wrong per ADR-022. The "two-way door for implementation" on the signature has already been narrowed by ADR-022 — the return type is committed to the bundle, not the pair. ADR-017 has not been updated.

  2. The async/sync question is not fully two-way. from_call is inherently async — it discovers operations via services/list + services/schema over a QUIC connection. A synchronous fn import(&self) -> Vec<…> trait cannot accommodate from_call without a separate async pre-step that populates a cache, then a sync import that reads the cache. This means the trait shape is constrained: either the trait is async (async fn import), or from_call is not an OperationAdapter impl and uses a different path. The "two-way door" framing suggests you can pick sync now and switch to async later; in reality, from_call forces at least an async-capable path from day one.

This is a "two-way door" label that ADR-022 has silently narrowed without updating the OQ classification. (See the two-way-door audit, item 7.)

Suggestion: Update ADR-017 to: (1) return Vec<HandlerRegistration> (not (OperationSpec, Handler)), (2) acknowledge that from_call requires async discovery and the trait must be async or from_call must be special-cased, (3) reclassify the signature as one-way (bundle shape) with a narrow two-way door on ergonomic details (error types, config params). The current framing in ADR-017 is inconsistent with ADR-022 and should be reconciled — same reconciliation pass as C5.


W2. Capabilities Concrete Shape and Clone Semantics — "Two-Way Door" With a Hidden One-Way Door Inside

Documents: operation-registry.md (L364, L369), ADR-014 (L7476, L123124, L163165)

Problem: The spec says "the concrete shape of the type (a typed map, a struct with named fields, a trait object) is a two-way door for implementation" and "the concrete cloning semantics (reference-counted Arc vs deep copy of zeroized material) is a two-way door for implementation, but Capabilities: Clone is required."

The Clone requirement is accurately identified as a one-way constraint. The problem is the clone semantics, which the framing treats as two-way but which contain a hidden one-way door:

  • Arc-based clone (cheap, shared). If Capabilities is Arc<Map<String, Secret>>, clone() bumps the refcount. All calls in a composition tree share the same secret material. If a handler can mutate capabilities (e.g., add a derived key for a child operation), that mutation is visible to siblings and the parent — shared mutable state across the call tree. This is a security-relevant behavior.
  • Deep-copy clone (expensive, isolated). Each call gets its own copy. Mutations are isolated. But cloning zeroized secret material on every invoke() duplicates secrets N times for a depth-N tree, increasing the attack surface.

The "path of least resistance" is Arc-based (cheap, the obvious choice for a Clone type holding secrets). Once shipped, handlers may come to depend on shared-mutation. Switching from Arc-shared to deep-copy-isolated later is a behavior change that breaks those handlers.

Worse, ADR-022's composition model assumes cheap clone — it clones unconditionally on every invoke(). A depth-5 composition tree with Arc-based clone does 5 refcount bumps (cheap). With deep-copy, it does 5 full copies of all secret material. The ADR-022 design bakes in the assumption that clone is cheap — which means it bakes in the Arc-based choice, making the "two-way door" effectively decided by the composition model's performance assumptions.

Suggestion: Add a guard clause: Capabilities must be immutable after construction (no interior mutability, no Mutex<Map>, no RefCell). This makes Arc-based clone safe (shared immutable state is fine) and makes the two-way door genuinely two-way: you can switch between Arc<Map> and deep-copy without behavior change, because neither supports mutation. Without this guard, the "two-way door" is a future one-way door waiting for the first handler that mutates capabilities.


W3. CallClient Registry Mechanism — "Two-Way Door" With a Security Dimension ADR-022 Introduced

Documents: ADR-017 (L236237, Accepted), ADR-022 (L225242, Proposed)

Problem: ADR-017 L236237: "The specific mechanism (sharing the global registry, a peer-scoped subset, or a separate registry) is a two-way door."

The three options are mechanically possible with the HandlerRegistration bundle. But ADR-022 introduced a security dimension the framing doesn't flag:

  • Sharing the global registry exposes local capabilities to the remote peer. Each HandlerRegistration carries Capabilities with secret material. If the CallClient shares the global registry, the remote peer can invoke any External operation — and those operations' handlers use the local capabilities. A remote peer calling /llm/generate uses the local node's Google API key. The remote peer effectively gains access to all local secret material that backs External operations.
  • A peer-scoped subset must filter by capability remote-safety, not just operation name. The "path of least resistance" for a peer-scoped subset is to filter by operation name/namespace. But the real filter must be: "is this operation's capability safe to expose to this peer?" An operation backed by a signing key that should never leave the node is not remote-safe, regardless of its name.

The framing was written before ADR-022 put capabilities in the bundle. Post-ADR-022, the "two-way door" on the registry mechanism has a security constraint attached that the framing doesn't mention.

Suggestion: Append to ADR-017's negative consequences: "Sharing the global registry with a CallClient exposes all local capabilities (API keys, signing keys) to the remote peer, because the dispatch path populates OperationContext.capabilities from the registration bundle. A peer-scoped subset must filter by capability remote-safety, not just operation name. The registry-mechanism choice is two-way mechanically but has a security dimension post-ADR-022: the 'share global' option is a capability-exposure decision, not just a dispatch decision."


W4. from_call Hot-Swapping and Registry Mutability — Two "Two-Way Doors" That Gate Each Other

Documents: ADR-017 (L279), operation-registry.md (L383)

Problem: ADR-017 L279: "Hot-swapping imported specs is a two-way door." operation-registry.md L383: "The registry is immutable after construction. No runtime registration or deregistration. Two-way door — ArcSwap<OperationRegistry> can be added later."

These are presented as independent two-way doors. They are not — one gates the other. If the initial implementation registers from_call-imported specs immutably (per the immutable-registry constraint), and handlers/clients compile/generated-code against those specs, then "hot-swapping" later means: re-importing on reconnect, diffing specs, invalidating generated clients, handling in-flight calls whose schema changed. If the registry is ArcSwap-able, hot-swap is mechanically possible but semantically fraught (a handler mid-composition doesn't expect the child's schema to change).

Deciding "hot-swap is a two-way door" without resolving the registry-mutability question leaves an implementer to guess whether to build for immutability or hot-swap. The deferral hides a coupling: the immutability of the registry (a one-way door per ADR-010 and operation-registry.md L14) and the mutability of remote specs (inherent to from_call) are in tension.

Additionally, hot-swapping with ADR-022 bundles requires re-injecting capabilities (cloning from the old registry or re-deriving from the vault) and rebuilding the TLS ServerConfig (ALPN strings come from the registry). The "simple ArcSwap" now touches the secret-material flow and the TLS config. This isn't closing the door, but it raises the cost and security sensitivity of the swap.

Suggestion: Resolve the registry-mutability question (immutable vs ArcSwap) before deciding hot-swap, since hot-swap depends on it — don't let both be independent "two-way doors" when one gates the other. Add a guard clause to OQ-04: "Hot-swapping the registry with ADR-022 bundles requires re-injecting capabilities (cloning from the old registry or re-deriving from the vault) and rebuilding the TLS ServerConfig. The ArcSwap upgrade is mechanical but not trivial — it is a secret-material-handling operation, not a pure pointer swap."


W5. to_* Adapter "Best-Effort" Mappings Become Compatibility Contracts Once Published

Documents: ADR-017 (L244246, L281286, Accepted), ADR-023 (error schemas projection)

Problem: ADR-017 L244246: "Some semantics don't map cleanly (e.g., subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle these with best-effort mappings and document the gaps."

The "best-effort" framing implies the mappings are malleable — two-way, adjustable. This is misleading once the generated spec is published:

  • A published spec is a compatibility contract. to_openapi generates an OpenAPI spec. Once external HTTP clients consume that spec and build against it, the mapping semantics (e.g., subscriptions → SSE long-poll, or subscriptions → a polling endpoint) become a de facto contract. Changing the mapping later breaks every client that built against the original spec. The "best-effort" mapping, once published, is a one-way door for the clients that depend on it.
  • from_openapi round-trip lossiness compounds. If to_openapi maps subscriptions to SSE, and later from_openapi imports a spec where subscriptions are modeled as webhooks, the two adapters disagree on what a "subscription" means in OpenAPI. The to_* mapping choices constrain the from_* parsing expectations.
  • With ADR-023, to_openapi now maps ErrorDefinition.http_status to OpenAPI response codes. Once published, those status code mappings are a contract. Changing FILE_NOT_FOUND from 404 to 422 in a later to_openapi revision breaks clients.

This is the "informal spec becomes formal spec" problem. The generated artifact is the contract; the "best-effort" label is internal framing that external consumers never see. ADR-009's framework classifies doors by reversal cost in the codebase, not by compatibility cost for external consumers — a systematic blind spot.

Suggestion: Add a guard clause to ADR-017: "The to_* mapping choices, once published as a generated spec, become a compatibility contract for external consumers. Best-effort mappings are two-way before first publication but one-way after. Version the generated specs (e.g., OpenAPI spec version tied to the registry's External operation set version) and document that mapping changes are breaking for consumers. The to_* adapters should emit a spec version marker so consumers can detect mapping changes."


W6. The Salt Field's "Reserved for Future KDF" Framing Is Misleading — Semantics Can't Change for Existing Data

Documents: ADR-020 (L102108, L151157, L204207, Accepted), crates/vault/encryption.md (L112128)

Problem: ADR-020 L108: "If a future KDF-based derivation is needed (see 'Future KDF' below), the field is already present." ADR-020 L156: "This is additive — it doesn't change v2 data."

The "reserved" framing creates a false sense of flexibility:

  • The salt in v2 is cryptographically meaningless and cannot be retroactively made load-bearing for v2 data. v2 encrypts with an HD-derived key at m/74'/2'/0'/0'. The salt is random, stored, and unused. If v3 introduces a KDF that uses the salt, v3's key derivation is different from v2's. v3 cannot decrypt v2 data using the salt — v2 was never encrypted with a salt-derived key. The salt is only usable for NEW v3+ data. So "the field is already present" is technically true but misleading: the field is present in v2 blobs, but its presence doesn't make v2 data upgradeable to a KDF. It only means v3 blobs can reuse the field without a wire-format change.
  • The wire-format-compat framing is the real two-way door, and it's accurate. Keeping the field means the EncryptedData struct doesn't change shape across v2→v3. That IS a wire-format two-way door — you can add a KDF in v3 without changing the struct. But the semantics of the salt field ("unused" in v2, "load-bearing" in v3) cannot be changed for existing v2 data.
  • A KDF doesn't fit the rotation scheme. Rotation is version-indexed paths (m/74'/2'/0'/{version-2}'). A KDF-based derivation is a different derivation method, not a different path. If a KDF is introduced, the rotation mechanism (version-indexed paths) and the KDF mechanism (salt-based) coexist or conflict. The "reserved salt" framing doesn't acknowledge that a KDF is a different derivation family, not just another version index.

Suggestion: Update ADR-020: "The salt field is reserved for FUTURE versions' use. v2 data's salt is permanently unused — it was random, never participated in key derivation, and cannot be retroactively made load-bearing. Introducing a KDF in v3 is a new derivation method (not a version-indexed path), requiring its own design and a v2→v3 migration (re-encrypt with the new KDF, using a newly-generated v3 salt — the v2 salt is not reused). The field's presence saves a wire-format struct change only; it does not make the KDF design or migration trivial." Drop the speculative "reserved for future KDF" framing or explicitly mark it as speculative, not a planned feature.


W7. unlock_new Returns the Mnemonic as a Non-Zeroizing String — Security Gap

Documents: crates/vault/service.md (L7179)

Problem: unlock_new(word_count) -> Result<String, VaultServiceError> returns the generated mnemonic phrase as a String. The Mnemonic type zeroizes on drop (mnemonic-derivation.md L7778), but String does not implement Zeroize by default. The root of trust is returned to the caller as a plaintext heap-allocated String that lingers in freed memory until the allocator reuses it. The spec says "Store the returned phrase securely" (L7879) but does not address that the returned String cannot be zeroized by the caller without unsafe or converting to a zeroizing container.

This contradicts the stated principle "Zeroize everything sensitive" (vault README L6567) and the Mnemonic type's own zeroization. The Security Constraints section (service.md L365399) does not mention this.

Suggestion: Either return a zeroizing wrapper (e.g., Zeroizing<String> or a dedicated Phrase type that implements Zeroize/ZeroizeOnDrop), or document in Security Constraints that the caller must convert the returned String to a zeroizing type and must not let it be cloned/logged. At minimum, flag this as a known limitation.


W8. DerivedKey JSON Deserialization Silently Produces a Corrupted Key

Documents: crates/vault/protocol.md (L94112)

Problem: protocol.md L104107 states: "A JSON-deserialized DerivedKey will have "[REDACTED]" as its private_key string — this is expected; JSON round-tripping a DerivedKey is not a supported use case (the private key is gone)."

This is a silent-failure footgun: if a DerivedKey is accidentally serialized to JSON and then deserialized (e.g., in a log pipeline, a debug tool, or a caching layer that uses JSON), the resulting DerivedKey has a 10-byte string ("[REDACTED]") as its private_key instead of 32 bytes. Any code that then uses this DerivedKey for signing will either produce garbage signatures or panic on length mismatch — with no error at deserialization time. The redaction is a defense-in-depth for serialization (logs), but the deserialization side silently accepts the redacted form, producing a corrupted key.

For the irpc/postcard path this is not an issue (postcard preserves bytes). But if any code path ever uses JSON for DerivedKey transport (the spec says this is "not a supported use case" but doesn't prevent it), the corruption is silent.

Suggestion: Either (a) make DerivedKey::deserialize reject payloads where private_key == "[REDACTED]" (returning an error rather than a corrupted key), or (b) document explicitly that private_key length must be validated before use and that JSON-deserialized DerivedKeys are invalid. Consider whether the custom Deserialize impl should enforce a 32-byte length check.


W9. No ADR for the HD-Derivation Key Model (Identity/SSH/Signing) — Only Encryption Keys Have ADR-020

Documents: crates/vault/mnemonic-derivation.md (L3146, L248256), crates/vault/README.md (L6870)

Problem: ADR-020 covers HD derivation for encryption keys. But the vault's primary use of HD derivation is for identity keys, SSH host keys, and signing keys (mnemonic-derivation.md L2629, L184215). The rationale for "HD derivation, not stored keys" (L3146: no key storage, reproducible across nodes, domain separation, auditable derivation) is inline in the spec's "Why HD Derivation" section. The Design Decisions table (L250256) lists this with no ADR ("HD derivation (not stored keys) | — | One seed, many keys, no key storage").

This is a foundational architectural decision — the entire vault model depends on it — yet it has no ADR. The choice of HD derivation vs. stored keys is a one-way door (switching to stored keys would change the entire trust model). ADR-018 and ADR-019 both assume HD derivation but don't decide it.

Similarly, these design choices have inline rationale but no ADR:

  • 74' coin type reservation (L186, L254): SLIP-0044 unallocated coin type claimed for alknet. This is a namespace allocation that, once keys are derived, cannot be changed without re-deriving all keys — effectively one-way. No ADR records this allocation.
  • secp256k1 feature-gating (L166179): rationale (heavy C dependency, only needed for Ethereum) is inline. No ADR.
  • AES-256-GCM cipher/mode choice (encryption.md L2737): rationale (authenticated, hardware-accelerated) is inline. No ADR.

Suggestion: Either (a) create an ADR for "HD derivation as the vault's key model" covering identity/SSH/signing keys (the encryption case is already ADR-020), or (b) broaden ADR-020's scope, or (c) create a single "Vault Key Model" ADR covering the overarching HD-derivation decision with ADR-020 as a special case for encryption keys. Record the 74' coin type reservation and the AES-256-GCM choice in ADRs or a tracked decisions log, since they are one-way doors with no current ADR.


W10. The EncryptedData "Stable Wire Format" Is a One-Way Door Documented Only in Prose

Documents: crates/vault/encryption.md (L8495), ADR-018 (L101104)

Problem: encryption.md L8695 states: "This is the stable wire format shared with alknet-storage (a future crate) by type-level agreement, not by a crate dependency. Both crates must agree on the serialization format… This cross-language compatibility is why the wire format must stay stable — changing it breaks both alknet-storage and the TypeScript consumer." ADR-018 L101104 repeats this.

This is a one-way door: once alknet-storage and TS consumers depend on the format, changing it breaks them. Yet it is not locked by an ADR. It is a cross-crate compatibility contract enforced by documentation, not by a decided constraint. An implementer modifying EncryptedData (e.g., removing the unused salt field) would find no ADR saying "this format is frozen." The "type-level agreement, not a crate dependency" approach (ADR-018) means there is no compiler enforcement either — the stability contract exists only in prose.

Suggestion: Create an ADR (or add a decision to ADR-018) locking the EncryptedData wire format as stable, with the compatibility surface explicitly enumerated (fields, base64 encoding, field semantics). Note that the salt field, though unused in v2, is part of the frozen format and cannot be removed without a format-version migration.


W11. OperationContext.identity Conversion from CompositionAuthority Is Undefined

Documents: operation-registry.md (L236: identity: parent.handler_identity.as_identity()), ADR-022 (L314: identity: parent.handler_identity_as_identity())

Problem: Both spec and ADR-022 populate the child's identity by converting the parent's handler_identity: Option<CompositionAuthority> into an Option<Identity> via a method as_identity() / handler_identity_as_identity(). This conversion method is never defined and its semantics are unclear: CompositionAuthority (label + scopes + resources) is a declared authority bundle, while Identity is a peer identity resolved via IdentityProvider. ADR-022 goes to lengths to stress that CompositionAuthority is not a peer Identity (L150160). Yet the dispatch path needs an Identity to run ACL against for the child.

So there's an implicit conversion from a non-peer authority bundle to a peer identity type. Is it a synthetic Identity constructed with the authority's label as id and its scopes? Does it bypass IdentityProvider? Is as_identity() returning None when handler_identity is None (the leaf case)? For leaves, the ACL "checks against the leaf's AccessControl" — but if identity is None because the parent is a leaf with handler_identity: None, then per operation-registry.md L8687 "If the relevant identity is None and the operation has restrictions, the adapter returns FORBIDDEN." That would mean a composing handler with no composition authority cannot compose any restricted leaf — which contradicts the model where leaves are composed under the parent's authority.

Actually: a composing parent (Local/Session) does have handler_identity: Some(...), so as_identity() returns Some. Leaves being composed are the child; their own handler_identity is None but that's fine because the child's ACL uses the child's identity (which is the parent's authority). So the conversion is load-bearing and must be specified.

Suggestion: Define CompositionAuthority::as_identity() -> Option<Identity> (or handler_identity_as_identity() on OperationContext) explicitly in the spec or ADR-022. Specify that it constructs a synthetic Identity { id: label, scopes, resources } that is not resolvable via IdentityProvider but is used directly for ACL matching. Note that this creates a second Identity construction path (the first is IdentityProvider::resolve_*), which should be acknowledged.


W12. Source Drift Items Are Tracked Only in Scattered Prose — The spawn Return Bug Could Be Lost

Documents: crates/vault/README.md (L79103), crates/vault/encryption.md (L185188, L245250), crates/vault/service.md (L280286, L374391)

Problem: Four source-vs-spec drift items are documented in prose across multiple files:

  1. OsRng vs rand::random() — documented in README L8789, encryption.md L245250, service.md L374376 (3 locations).
  2. unwrap() on RwLock — documented in README L9497, service.md L384391 (2 locations).
  3. CURRENT_KEY_VERSION = 1 (should be 2) — documented in encryption.md L185188, ADR-020 L120124 (2 locations, none in README).
  4. spawn returns unspawned actor — documented only in service.md L280286 (1 location, inside a method description, not in a Security Constraints section).

Problems with this approach:

  • The spawn bug (item 4) is documented in exactly one place, buried in the spawn method's bullet description, not in any "Security Constraints" or "Known Drift" section. An implementer who reads the Security Constraints sections (the natural place to look for implementation-sync corrections) would miss it.
  • README's Security Constraints (L79103) lists OsRng and unwrap but omits CURRENT_KEY_VERSION and the spawn bug. README is the index document; an implementer using it as a checklist would miss two drift items.
  • There is no single, structured tracking mechanism. If one prose location is updated and others aren't, the docs silently diverge.

Suggestion: Add a single "Known Source Drift" table to README.md (or a dedicated section) listing all drift items with: item, current behavior, target behavior, location in source, and status. At minimum, add CURRENT_KEY_VERSION and the spawn bug to README's Security Constraints. Move the spawn bug note out of the method description and into the drift tracker.


W13. CertAuthorityEntry Is an Undefined Placeholder Type Used in DynamicConfig

Documents: crates/core/config.md (L119130)

Problem: AuthPolicy.cert_authorities: Vec<CertAuthorityEntry> is declared, but CertAuthorityEntry is explicitly "a placeholder type. Its fields will be defined when alknet-ssh is implemented." This means alknet-core's public API references a type that does not exist yet. The spec says "for v1, cert_authorities will be an empty vector," implying the field is dead weight. An implementer cannot compile DynamicConfig without defining CertAuthorityEntry somewhere. Where does it live — alknet-core (but it's SSH-specific per the doc) or alknet-ssh (but then alknet-core can't reference it without a dependency)? This is a crate-dependency contradiction: core references an SSH-defined type, but core cannot depend on alknet-ssh (ADR-003 forbids it).

Suggestion: Either define a minimal CertAuthorityEntry in alknet-core (as an opaque/newtype or a generic struct with the fields that are ALPN-agnostic), or remove cert_authorities from v1 AuthPolicy entirely and add it back when alknet-ssh lands (it's a two-way door — adding a field is additive). Leaving an undefined type in a public struct blocks compilation.


W14. SecretKey Type in TlsIdentity::RawKey(SecretKey) Is Undefined and Its Origin Ambiguous

Documents: crates/core/config.md (L44, L79)

Problem: TlsIdentity::RawKey(SecretKey) and SecretKey::generate() are used but SecretKey is never imported or attributed. It could be ed25519_dalek::SecretKey, iroh::SecretKey, alknet_vault::SecretKey, or a core-defined type. Since config.md says the key "can be derived from alknet-vault" (per endpoint.md:181), and alknet-core must not depend on alknet-vault (ADR-003, ADR-018), this matters: if SecretKey is from iroh, then alknet-core depends on iroh even for quinn-only nodes; if it's a re-export/newtype, that needs stating. The crate dependency table in ADR-003 L27 lists alknet-core depending on "tokio, quinn, rustls, irpc" — not iroh — yet config.md and endpoint.md show iroh types in core's public API.

Suggestion: Specify the exact type and crate for SecretKey. If it's iroh::SecretKey, update ADR-003's dependency list (iroh is already used for iroh::Endpoint, so this is consistent, but ADR-003 omits iroh from alknet-core's deps — a separate inconsistency). If it's ed25519_dalek, say so. Resolve whether alknet-core publicly depends on iroh types.


W15. ADR-003 Omits iroh From alknet-core's Dependency List; Spec Docs Assume It

Documents: ADR-003 (L27), crates/core/endpoint.md (L20), config.md (L44)

Problem: ADR-003 L27 lists alknet-core as depending on "tokio, quinn, rustls, irpc" — no iroh. But AlknetEndpoint holds iroh: Option<iroh::Endpoint> (endpoint.md L20, ADR-010 L67), SendStream/RecvStream wrap "iroh::SendStream or iroh::RecvStream" (core-types.md L103), and TlsIdentity::RawKey(SecretKey) likely uses iroh's key type. ADR-010 L184 says this is "mitigated: both are feature-gated." But ADR-003's dependency table is now stale — it doesn't reflect the iroh dependency that ADR-010 introduced.

Suggestion: Update ADR-003's dependency table to include iroh (feature-gated) for alknet-core, or add an amendment note that ADR-010 added the iroh dependency.


W16. Overview Failure-Mode Table Says Handler Error Closes "QUIC Stream"; Spec Says Connection

Documents: overview.md (L237238), crates/core/core-types.md (L30, L46), crates/core/endpoint.md (L255257, L262)

Problem: overview.md:237 says "Handler handle() returns HandlerError | Endpoint logs the error, closes the QUIC stream." But core-types.md:30 says "The endpoint catches these, logs them, and closes the connection." endpoint.md:256 says HandlerError is "Non-fatal errors within a handler" and the connection closes. The whole architecture is one-ALPN-per-connection (ADR-006), and handle() receives the whole Connection. Closing a "stream" doesn't make sense when the handler owns the connection. This is a stale leftover from the pre-ADR-007 model where handle() received a BiStream.

Suggestion: Change overview.md:237238 from "closes the QUIC stream" to "closes the QUIC connection."


W17. OQ-09's "BiStream Trait Preserves the WASM Door" Is Only True for the Client Side

Documents: open-questions.md (OQ-09 L100107), ADR-007, ADR-009 (L41, L44), crates/core/core-types.md, crates/core/endpoint.md

Problem: The resolution claims "The BiStream trait decision (ADR-007) has already preserved the most important WASM door." This is accurate for the client path (a browser can implement BiStream over WebTransport) but oversells what is preserved for a server-side WASM peer:

  • Connection is a concrete struct, not a trait (ADR-007). Its accept_bi()/open_bi() return SendStream/RecvStream described as wrapping "the underlying QUIC stream types." A WASM server-side peer cannot implement this Connection without a QUIC library that compiles to WASM. BiStream being a trait does not help — handlers receive Connection, and Connection is the binding point.
  • The accept loop uses tokio::spawn (ADR-010). Tokio does not run on WASM. The entire endpoint runtime is tokio-bound.
  • PendingRequestMap and the CallAdapter use tokio oneshot/mpsc channels (call-protocol.md L218240). These are tokio-specific.

The OQ framing implies BiStream was the binding constraint and the rest is detail. In reality, BiStream preserves the client stream door, but Connection, the accept loop, and the call-protocol dispatch internals quietly close the server-side dispatch door.

Suggestion: Update OQ-09's resolution to state: "BiStream being a trait preserves the client-side stream door. The server-side dispatch door (Connection as a concrete quinn-bound struct, tokio-based accept loop, tokio channel dispatch) is NOT preserved by ADR-007 and would require a Connection trait and a runtime-abstracted accept loop to open. This is a known, accepted closure — the browser path is client-side via a JS SDK, not server-side Rust-to-WASM." This converts the misleading framing into an explicit, accepted one-way door.


W18. OQ-10's "Start With Smart Protocol" Deferral Forecloses Git Composability

Documents: open-questions.md (OQ-10 L109116)

Problem: The deferral says "start with git smart protocol over QUIC streams, ERC721 and full server capabilities are additive." The ALPN/endpoint design does not constrain the git adapter's wire format — the handler owns its format and alknet/git is a dedicated connection. That part is genuinely two-way.

However, the deferral hides a fork that the "path of least resistance" will silently resolve in a way that forecloses composability:

  • Raw smart protocol vs. call-protocol projection. If git ships as a ProtocolHandler speaking raw git smart protocol over QUIC streams (the simplest path), git operations are NOT callable via env.invoke(). They exist only on the alknet/git ALPN, not in the OperationRegistry. An agent handler that wants to compose git/clone cannot — there's no OperationSpec, no Handler, no registration. The agent would have to open a raw QUIC connection to alknet/git and speak pkt-line, bypassing the entire composition/security model (ADR-015, ADR-022).
  • To make git composable later, you'd need a separate call-protocol projection — a set of OperationSpec/Handler pairs that wrap git operations behind the registry. That's additive, but it means the "simple" choice (raw smart protocol only) produces a git adapter that is permanently non-composable unless a second handler layer is built.

The deferral frames ERC721/full-server as the additive part. But the real additive question is composability, and the simplest implementation forecloses it.

Suggestion: Append to OQ-10's resolution: "The composability of git operations (whether git ops are registered in the OperationRegistry and callable via env.invoke(), or only available as raw smart protocol on alknet/git) is a separate decision from ERC721 scope. The path of least resistance (raw smart protocol only) forecloses agent composition of git operations. If agent tool dispatch needs to compose git ops, a call-protocol projection must be built alongside or instead of the raw handler. Resolve this when speccing alknet-git, not deferred past it."


W19. ADR-016 Self-Referential Citation and Under-Specified Grandchild Propagation

Documents: ADR-016 (L134, L154158)

Problem: Two minor issues in ADR-016:

  1. Self-referential citation (L134): ADR-016 cites itself ("ADR-016 Assumption 5 says the policy is per-call, not per-operation") in its own Decision 6 rationale. This should just be "Assumption 5" (it's the same document).

  2. Grandchild propagation under-specified (L154157): "The child's OperationContext carries the policy. If the child itself composes grandchildren, the policy propagates unless the child explicitly overrides it." This is the only statement of propagation-vs-override semantics. It doesn't specify: does the grandchild inherit the child's policy by default (so if the child got ContinueRunning, the grandchild is ContinueRunning unless overridden)? Or does each invoke() reset to the default AbortDependents unless explicitly set? The phrase "propagates unless overridden" implies inheritance, which is reasonable, but the invoke() default (operation-registry.md L213, ADR-016 L146) shows context.env.invoke(...) with no policy = default. So is "no policy arg" = "inherit parent's" or "reset to AbortDependents"? These conflict.

Suggestion: (1) Change "ADR-016 Assumption 5" to "Assumption 5" within ADR-016. (2) Clarify: does invoke() with no policy argument (a) inherit the parent's current policy, or (b) reset to AbortDependents? Recommend (b) reset-to-default for predictability (each composition is a fresh decision) unless the parent opts the child into ContinueRunning, and document that ContinueRunning does not auto-propagate to grandchildren (the child must re-affirm it for each grandchild). Or pick (a) and update the invoke() examples. Either is defensible; pick one.


W20. ADR-023's from_openapi Error Code Example Collides With Protocol-Level NOT_FOUND

Documents: ADR-023 (Assumption 3 L366373, Decision 5 L241)

Problem: ADR-023 Assumption 3 acknowledges the collision risk: an operation declaring NOT_FOUND as a domain code collides with the protocol-level NOT_FOUND (operation not registered). The resolution (L367368): "the protocol code takes precedence in the dispatch machinery." And (L372373): "The assumption is that operations use domain-specific codes (FILE_NOT_FOUND) rather than reusing protocol codes (NOT_FOUND). This is a naming convention, not a type-system enforcement."

But Decision 5's example (L241) shows from_openapi mapping a 404 to ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), … } — i.e., from_openapi does produce a domain code that collides with a protocol code. So the "naming convention" is violated by the very adapter the ADR specifies. An implementer following Decision 5's example would create a FILE_NOT_FOUND-vs-NOT_FOUND ambiguity: when call.error carries code: "NOT_FOUND", is it "operation not registered" (protocol) or "file not found" (operation-level, from a 404)? The details field disambiguates (present for operation-level, absent for protocol-level), but the code alone is ambiguous, and ADR-023 L171 says "Clients should switch on code, not parse message."

Suggestion: Either (a) make from_openapi prefix/namespace the imported error codes (e.g., HTTP_404 or <op>_NOT_FOUND) to avoid collision, or (b) update Decision 5's example to use a non-colliding code and add a normative rule that from_openapi must not produce codes that match the five protocol-level codes. Assumption 3's "naming convention" should be elevated to a requirement for adapters, since the adapter example breaks it.


W21. ScopedOperationEnv Public Field Leaks the HashSet Representation — Future Subgraph Refactor Is Breaking

Documents: ADR-022 (L208214, L97105), operation-registry.md (L188, L328)

Problem: ADR-022 L97105: "For v1, the scoped env can be a HashSet<String> of reachable operation names. A dedicated alknet-flowgraph crate … is a future enhancement — not a prerequisite for the security model."

The HashSet<String> representation is fine internally. The problem is that it leaks into the public API in a way that makes the future refactor a breaking change:

  • ScopedOperationEnv is a public struct with a public field. ADR-022 L208214 shows pub struct ScopedOperationEnv { pub allowed_operations: HashSet<String> }. The assembly layer and agent crate construct this directly: ScopedOperationEnv { allowed_operations: HashSet::from(…) } (operation-registry.md L188, L328). If the representation later changes to a subgraph type (with type-compatibility edges, as the flowgraph model requires), every construction site breaks — the field type changes from HashSet<String> to Subgraph or Vec<Edge>. This is a public-API breaking change, not an internal refactor.

The framing says the flowgraph crate is "not a prerequisite for the security model." That's true — the security model works with a flat set. But the refactor from flat set to subgraph is not the trivial "two-way door" the framing implies. It's a breaking change to construction sites.

Additionally, this interacts with OQ-19 (session-scoped registries): session ops compose under a scoped env. If the scoped env is a flat HashSet<String>, session ops get reachability control (can this op be reached?) but NOT type-compatibility validation (does the output of op A feed validly into op B?). A session op (untrusted, LLM-written) that composes an op whose output type doesn't match the next op's input gets a runtime error, not a static rejection. The flowgraph deferral means session ops ship without static type validation — a security-relevant gap for untrusted code.

Suggestion: Make allowed_operations private (not pub), with a constructor ScopedOperationEnv::new(ops: impl IntoIterator<Item = String>) and a query method allows(&self, name: &str) -> bool. This keeps the representation internal and makes the future subgraph refactor a non-breaking change. Add to the flowgraph deferral: "The HashSet<String> representation does not support type-compatibility validation. Session-scoped ops (OQ-19, untrusted code) compose without static type checking until the flowgraph crate is built. This is a known gap for the untrusted-code path, not just a visualization feature."


Suggestions

S1. Inline Decision Rationale in Spec Docs That Should Be in ADRs

Documents: call-protocol.md (L341, L343344, L124), operation-registry.md (L202, L364369, L377), crates/vault/encryption.md (L2737), crates/vault/mnemonic-derivation.md (L3146)

Problem: Per the architect's documentation principles, spec docs should reference ADRs by number, not contain the rationale. Several spec sections contain why rationale that duplicates or extends ADR content:

  • call-protocol.md L341 duplicates ADR-016 Decision 2's rationale ("the correct default because aborted parent work has no consumer…") verbatim.
  • operation-registry.md L202 explains why metadata doesn't propagate (the secret-leak scenario) — this is ADR-014's rationale, restated inline.
  • operation-registry.md L364369 explains the Clone semantics of Capabilities and the security implication — this is a design rationale, not in any ADR. It should be in ADR-014 or ADR-022.
  • operation-registry.md L377 ("from_call trust is transitive… a compromised remote node can do anything") is a threat-model statement that belongs in ADR-017.
  • crates/vault/encryption.md L2737 (AES-256-GCM rationale) and mnemonic-derivation.md L3146 (HD derivation rationale) have no ADRs (see W9).

Suggestion: Move the unique rationale to the relevant ADRs (ADR-016 already has the abort-dependents rationale — just cite it; add capabilities-clone rationale to ADR-014 or ADR-022; add from_call transitive trust to ADR-017) and replace the inline rationale with ADR references. Keep only the constraint statement in the spec.


S2. Untracked "Two-Way Door" Deferrals Scattered Through Specs

Documents: operation-registry.md (L383, L257, L364, L369), call-protocol.md (L333), ADR-017 (L236237, L279)

Problem: Per the documentation principles, inline open questions should be in open-questions.md. Several "two-way door" / "future work" notes are scattered through the specs without being tracked as open questions:

  • operation-registry.md L383: "Two-way door — ArcSwap<OperationRegistry> can be added later" (contradicts the "immutable after construction" constraint at L14, L194, L340).
  • operation-registry.md L257: "Future work may add irpc service dispatch and remote call protocol dispatch as additional backends."
  • operation-registry.md L364: "The concrete shape of the type… is a two-way door for implementation."
  • operation-registry.md L369: "The concrete cloning semantics… is a two-way door for implementation."
  • call-protocol.md L333: timeout policy (30s default, adapter-level config) — no ADR governs this.
  • ADR-017 L236237, L279: CallClient registry mechanism and from_call hot-swapping.

Suggestion: Add the genuinely-open items (Capabilities concrete shape, Capabilities clone semantics, OperationAdapter trait signatures, CallClient registry mechanism, to_* adapter gap semantics, from_call hot-swap, registry hot-swap via ArcSwap, timeout policy) to open-questions.md with explicit status. For items that are decided deferrals, record the deferral in the relevant ADR's Consequences rather than as inline spec commentary.


S3. Missing ADR References for Design Choices ("Why X Over Y?")

Documents: call-protocol.md (L86, L116, L262272, L333), operation-registry.md (L6467, L120127, L139)

Problem: Several design choices are stated in the spec with no ADR backing:

  • Binary payload convention (call-protocol.md L86): base64-in-JSON. Why not a separate binary event type? No ADR.
  • auth_token in payload (call-protocol.md L116122, L262272): per- request identity upgrade via an optional auth_token field. Significant security-model decision with no ADR.
  • Per-request vs per-connection identity (L272): "Identity is resolved per-request, not per-connection." No ADR.
  • Timeout policy (L333): 30s default, adapter-level config. No ADR.
  • Module-private internal writes (operation-registry.md L120127): the enforcement mechanism for "handlers can't set internal." The requirement is from ADR-015; the mechanism (Rust visibility) has no ADR.
  • Namespace derivation (operation-registry.md L66): "namespace is derived from the name." Why derived rather than independently declared? No ADR.

Suggestion: For load-bearing choices (binary convention, auth_token / per-request identity, timeout policy), either add an ADR or extend an existing one. For minor choices (namespace derivation, visibility pattern), a one-line note suffices, but they shouldn't be silently decided in the spec.


S4. README ADR Tables Missing Entries and Status Column

Documents: crates/call/README.md (L1936), crates/core/README.md (L2131), crates/vault/README.md (L3847)

Problem: Three issues across the crate READMEs:

  1. call README omits ADR-013 (Rust canonical implementation), which both call spec docs reference (call-protocol.md L88, operation-registry.md L24).
  2. core README omits ADR-015, referenced by auth.md L78, L200 and config.md L200.
  3. vault README omits ADR-010 and ADR-006, referenced by mnemonic-derivation.md L26 and protocol.md L290.
  4. None of the crate READMEs have a Status column in their ADR tables, which contributes to C2 (reader can't tell which ADRs are binding).

Suggestion: Add the missing ADRs to each crate README's ADR table. Add a Status column (Accepted/Proposed/Superseded) to make the governance state visible — this directly supports resolving C1/C2.


S5. Connection::new() Referenced but Never Defined

Documents: crates/core/core-types.md (L113), crates/core/endpoint.md (L127), ADR-010 (L117)

Problem: Connection::new(connection) is called in dispatch pseudocode (endpoint.md L127, ADR-010 L117) but the impl Connection block in core-types.md:5967 does not include a new() constructor, nor specify its signature/parameter. core-types.md:113 says "Connection::new() wraps the appropriate stream source based on where the connection came from" — but there's no formal signature, and the argument type differs by source (a quinn::Connection vs an iroh Accepting/Connecting). An implementer doesn't know the constructor's parameter type.

Suggestion: Add pub fn new(source: ConnectionSource) -> Connection (or two constructors from_quinn / from_iroh, or a private enum) to the impl Connection block. Define ConnectionSource or the variant constructors.


S6. services/schema Input Shape Under-Specified (Leading Slash)

Documents: operation-registry.md (L286), call-protocol.md (L120, L126)

Problem: The input shape for services/schema is shown as { "name": "fs/readFile" } (no leading slash) but the wire operationId convention (call-protocol.md L120, L126) is with a leading slash (/fs/readFile). So does services/schema's name input use the wire form (leading slash) or the registry form (no slash)? The spec shows no slash, implying registry form, but a client calling services/schema over the wire would naturally use the same operationId form it uses elsewhere. This is a small ambiguity but could cause a NOT_FOUND if the adapter doesn't normalize.

Suggestion: Specify whether services/schema's name input has a leading slash and whether the adapter strips it (consistent with call.requested's operationId handling per call-protocol.md L120).


S7. call.completed For Non-Subscription Calls Not Addressed

Documents: call-protocol.md (L96106)

Problem: The event table lists call.completed as "Signal end of subscription stream" — subscription-only. But for a plain call() (request/response, one result), the lifecycle is call.requestedcall.responded (one) and then… nothing? Is the request considered complete on the first call.responded? Is a call.completed ever sent for non-subscription calls? The spec says (L103) "call(): Sends call.requested, resolves on first call.responded" — so no call.completed for calls. But then what signals to the server that the client is done listening? This is consistent but the asymmetry (subscriptions get completed, calls don't) could be stated explicitly to avoid an implementer sending a spurious call.completed after a call.responded.

Suggestion: Add a note: "call.completed is sent only for subscriptions. A plain call() is complete after its single call.responded; no call.completed follows."


S8. OperationProvenance::FromCall Comment Is Ambiguous ("Leaf Locally")

Documents: ADR-022 (L123124)

Problem: FromCall => "QUIC forwarding stub (from_call), leaf locally — cannot compose." The "locally" qualifier is ambiguous. Does it mean: (a) from_call ops are leaves in the local registry but could compose on the remote node? Or (b) they're leaves, full stop, and "locally" is just emphasis? Assumption 5 (L518524) clarifies, but the phrasing invites misreading.

Suggestion: Rephrase to "leaf in the local registry — forwards calls to a remote node; cannot compose locally" or similar to remove the ambiguity.


S9. ScopedOperationEnv::empty() and CompositionAuthority::none() Referenced but Undefined

Documents: operation-registry.md (L182184, L243244, L298299, L318320), ADR-022 (L287)

Problem: The spec and ADR-022 use ScopedOperationEnv::empty() and CompositionAuthority::none() (constructor methods) but neither type's definition (ADR-022 L164180 for CompositionAuthority, L208214 for ScopedOperationEnv) lists these methods. They're implied (empty set / None-equivalent) but not shown.

Suggestion: Add impl ScopedOperationEnv { pub fn empty() -> Self { … } pub fn allows(&self, name: &str) -> bool { … } } and impl CompositionAuthority { pub fn none() -> Option<Self> { None } } (or similar) to the spec or ADR-022.


S10. Overview OQ-04 Entry Is Stale

Documents: overview.md (L225)

Problem: overview.md:225 lists "OQ-04: Dynamic handler registration at runtime vs static at startup (two-way door, defer to implementation)." But OQ-04 in open-questions.md:4751 is resolved ("Static registration at startup"). The overview shows a stale framing ("defer to implementation") while the OQ is resolved. Compare to OQ-01/OQ-02/OQ-03 which are correctly marked "resolved" in the same overview list.

Suggestion: Update overview.md:225 to "OQ-04: Dynamic handler registration (resolved: static at startup — see ADR-010)" to match the resolved state.


S11. ADR-021 References "W1 Drift Issue from the Vault Review" — Ambiguous Cross-Reference

Documents: ADR-021 (L127129)

Problem: ADR-021 L127129 states: "This is the fix for the W1 drift issue from the vault review: the current source ignores key_version for key selection; the spec now makes it functional." But review 001's W1 (L355371) is about ADR-020's stale example path (m/74'/2'/1'/0' vs m/74'/2'/0'/1'), not about the source ignoring key_version. The source-ignoring-key_version drift is not tracked in review 001 at all. The "W1" label in ADR-021 doesn't match review 001's W1.

Suggestion: Clarify the reference — either cite review 001 with the correct finding number, or remove the "W1" label and describe the drift directly (which the text already does).


S12. Duplicate "Helper Functions" Block in mnemonic-derivation.md

Documents: crates/vault/mnemonic-derivation.md (L199204, L217223)

Problem: The helper functions device_path and site_password_path are documented twice: L199204 (missing encryption_path_for_version) and L217223 (complete). An implementer reading the first block might not realize encryption_path_for_version exists, or might think there are two conflicting definitions. The duplication is a maintenance hazard.

Suggestion: Remove the first block (L199204); keep the complete list at L217223.


Cross-Cutting Theme: ADR-022 Narrowed Several "Two-Way Doors" Without Updating the OQ Classifications

The two-way-door audit found a systematic pattern: ADR-022 (Proposed, not yet Accepted) is the single largest source of door-type staleness. It narrowed several doors without updating the OQ classifications that were written pre-ADR-022:

Door Pre-ADR-022 classification Post-ADR-022 reality
Capabilities concrete shape (W2) Two-way (implementation detail) Clone semantics become a behavioral contract once handlers depend on shared mutation; ADR-022's composition model bakes in the cheap-clone assumption
OperationAdapter trait signatures (W1) Two-way (async vs sync) ADR-022 committed the return type to HandlerRegistration; from_call forces async — both are already decided
CallClient registry mechanism (W3) Two-way (share global vs subset vs separate) ADR-022 put capabilities (secrets) in the bundle — "share global" is now a capability-exposure decision, not just dispatch
Dynamic registration (W4) Two-way (add ArcSwap later) Hot-swap now requires re-injecting capabilities (secret-material handling) + TLS config rebuild, not a pointer swap
Session-scoped registries (OQ-19) Two-way (protocol doesn't need changes) OperationContext.env field type is a current commitment — if it's a concrete struct, the session-overlay pattern is closed (see C6)
Graph structures (W21) Two-way (HashSet now, flowgraph later) ScopedOperationEnv's public field leaks the representation; the refactor is a public-API break, not an internal change

Pattern: ADR-022 was written to resolve review #001's C1C4 (the registration-bundle tangle), and in doing so it quietly narrowed several previously-two-way doors. The OQ tracker and ADR-017 should be reconciled with ADR-022 before the "two-way door" labels can be trusted. ADR-009's framework is sound; its application is drifting because later ADRs don't update earlier OQ classifications.

The fix is not to re-litigate each door — most of ADR-022's narrowing is correct (the bundle should carry capabilities, the adapter should produce HandlerRegistration). The fix is to add guard clauses to the affected OQs and ADRs documenting the narrowed scope, so an implementer reading the OQ tracker sees the post-ADR-022 reality, not the pre-ADR-022 optimism.


Cross-Cutting Theme: The "Published Artifact Is a Contract" Blind Spot

The audit found a systematic blind spot in ADR-009's framework: it classifies doors by reversal cost in the codebase, not by compatibility cost for external consumers. Three deferrals share this pattern:

  • to_* best-effort mappings (W5): "best-effort" is true before first publication but one-way after — external clients depend on the generated spec.
  • Error schema http_status mappings (W20): once to_openapi publishes FILE_NOT_FOUND → 404, changing it breaks clients.
  • The salt field's wire format (W6): the field is "reserved" (wire-format- stable), but its semantics cannot change for existing data — "reserved" implies more flexibility than exists.

The framework treats these as two-way because the format is additive; it misses that the semantics, once published, are frozen. This is the same class of issue as a public API: adding a method is additive, but changing an existing method's behavior is breaking. The spec should acknowledge that published artifacts (generated OpenAPI specs, published error code mappings, shipped wire formats) are compatibility contracts, not internal implementation details.


Summary Statistics

Severity Count
Critical 13
Warning 21
Suggestion 12

The findings cluster into a few resolution passes. Recommended order:

Pass 1: Governance reconciliation (resolves C1, C2, C12, C11)

  1. Advance ADR-022 and ADR-023 to Accepted.
  2. Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022; update the struct to CompositionAuthority.
  3. Amend ADR-002: note the signature was revised by ADR-007.
  4. Amend ADR-004: note the "enrich/replace" language was superseded by ADR-011's immutability model.
  5. Add a Status column to all crate README ADR tables.

This is the highest-value pass — it resolves the direct contradictions in the document set and makes the governance state coherent.

Pass 2: Spec-ADR consistency (resolves C3, C4, C5, C6, C10, C13, W1)

  1. Add abort_policy to OperationContext; define AbortPolicy enum (C3).
  2. Define the OperationEnv trait explicitly, including invoke_with_policy() (C4).
  3. Separate OperationContext.env (dispatch trait) from OperationContext.scoped_env (reachability set) — resolve the type identity crisis (C6).
  4. Update ADR-017's from_call mirror list and OperationAdapter trait to include error_schemas and HandlerRegistration (C5, W1).
  5. Specify From<StreamError> for HandlerError and resolve the variant payload (C10).
  6. Specify the Connection::set_identity observability read path (C13).

Pass 3: Vault security and deferral reconciliation (resolves C7, C8, C9, W6, W7, W8, W9, W10, W12)

  1. Promote OQ-21 from "deferred" to "open"; reconcile with ADR-019; decide the vault-server-crate question (C7).
  2. Complete the operation access policy table (C8).
  3. Specify the site_password_path string→u32 mapping (C9).
  4. Add an ADR for the HD-derivation key model (identity/SSH/signing) (W9).
  5. Add an ADR locking the EncryptedData wire format (W10).
  6. Add a "Known Source Drift" table to the vault README (W12).
  7. Address unlock_new returning non-zeroizing String (W7) and DerivedKey JSON deserialization (W8).
  8. Clarify the salt field's "reserved" semantics (W6).

Pass 4: Two-way-door guard clauses (resolves W2, W3, W4, W5, W17, W18, W21)

  1. Add guard clauses to the OQs and ADRs that ADR-022 narrowed (W2, W3, W4, W21).
  2. Add the "published artifact is a contract" guard to to_* adapters (W5).
  3. Correct OQ-09 (WASM server-side) and OQ-10 (git composability) framings (W17, W18).

Pass 5: Minor consistency (resolves remaining Warnings and Suggestions)

  1. Fix the overview.md "stream" → "connection" wording (W16).
  2. Fix ADR-016's self-referential citation and grandchild propagation (W19).
  3. Fix ADR-023's from_openapi collision example (W20).
  4. Address the remaining suggestions (S1S12).

Note on Effort

This review found more issues than review #001 (13 critical vs 5, 21 warning vs 4), but that's expected: review #001 caught the registration-bundle tangle and the missing error schemas, which were the highest-density gaps. This review goes wider — cross-document consistency, the two-way-door framing, and the vault deferrals — and surfaces issues that were always there but not yet audited. The critical findings here are individually smaller than review #001's C1C5 (no single tangle of four interlocking gaps), but they are collectively important because several are governance issues (Proposed ADRs treated as binding, Accepted ADRs contradicting later refinements) that undermine the document set's authority.

The two-way-door audit's core finding: the framework is sound, but its application is drifting because later ADRs don't update earlier OQ classifications. The fix is not to re-litigate each door — it's to add guard clauses documenting the narrowed scope, so the OQ tracker reflects post-ADR-022 reality.