Second pre-implementation review. Goes wider than #001 on cross-document consistency and the two-way-door framing from ADR-009. Finds 13 critical, 21 warning, 12 suggestion issues: - Governance: ADR-022/023 are Proposed but specs treat them as binding; ADR-015/002/004 (Accepted) contradict later refinements without supersession markers - Abort policy (ADR-016) missing from OperationContext struct; OperationEnv trait never defined - OperationContext.env type identity crisis (reachability set vs dispatch trait) - ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait stale vs ADR-022 bundle - OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides a crate-decomposition decision; RemoteService path unvalidated - Vault operation access policy table incomplete for security-sensitive methods - site_password_path string-to-index mapping breaks determinism guarantee - Two-way-door audit: ADR-022 narrowed several doors without updating OQ classifications; 'published artifact is a contract' blind spot in framework Includes recommended 5-pass resolution order.
90 KiB
status, last_updated, reviewed_documents, reviewer
| status | last_updated | reviewed_documents | reviewer | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| draft | 2026-06-22 |
|
architect |
Architecture Gap Review #002
Purpose
This is the second pre-implementation sanity check. Review #001 caught the registration-bundle tangle (C1–C4) and the missing error schemas (C5), both resolved via ADR-022 and ADR-023. This review goes deeper on two axes the first review under-covered:
- Cross-document consistency: places where an Accepted ADR contradicts a Proposed ADR, where a spec doc adopted a Proposed ADR's types without the ADR being advanced, and where a later ADR narrowed a "two-way door" deferral without updating the OQ classification.
- Two-way-door deferral audit: the user's concern that the one-way/two-way door framing from ADR-009 has been applied too loosely. Some deferrals are legitimate, but several are questionable or misleading — either because a later ADR quietly narrowed the door, because the "path of least resistance" implementation forecloses a better option, or because multiple individually-reversible deferrals compound to foreclose a path that no single deferral would block on its own.
Review #001 resolved 5 critical, 4 warning, and 4 suggestion findings. This review finds a different set: the registration-bundle work surfaced new inconsistencies between ADR-015 (Accepted) and ADR-022 (Proposed), the abort policy (ADR-016) was never reflected in the OperationContext struct, the OperationEnv trait was never defined, and several vault deferrals have security dimensions that the two-way-door framing doesn't surface.
Severity Definitions
| Severity | Meaning |
|---|---|
| Critical | Must resolve before implementation — would cause divergent implementations or security issues |
| Warning | Should resolve — missing specifications that could cause confusion or rework |
| Suggestion | Consider — improvements for clarity or completeness |
Critical Findings
C1. ADR-015 Is Not Reconciled With ADR-022 — Direct Contradiction in the Document Set
Documents: ADR-015 (Status: Accepted, Decision 3 L114–136, Assumption 6 L275–280), ADR-022 (Status: Proposed, Decision 2 L145–186), operation-registry.md (L115, L133, L172), call-protocol.md (L296)
Problem: ADR-015 is marked Accepted and defines
handler_identity: Option<Identity> in its Decision 3 (L124) — a peer Identity
type resolved via IdentityProvider. Assumption 6 (L275–280) explicitly states
"the handler identity is a full Identity (with scopes), not a special
principal type… This reuses the existing Identity type and IdentityProvider
infrastructure (ADR-004)."
ADR-022 is marked Proposed and explicitly supersedes ADR-015's Assumption 6
(L147–149: "This ADR refines that: composition authority is a declared authority
bundle, not a peer Identity"; L183–185: "This supersedes ADR-015's Assumption
6"). The spec docs now use handler_identity: Option<CompositionAuthority> —
a different, non-peer type.
This is a direct contradiction. Per the ADR lifecycle, a superseded
decision should be marked Superseded or carry an inline note pointing to the
superseding ADR. ADR-015 has neither. Worse, ADR-015 has higher status
(Accepted) than the ADR that supersedes it (Proposed). An implementer reading
ADR-015 in isolation would build Identity-based handler identity; an
implementer reading the specs would build CompositionAuthority-based. The
Accepted-overrides-Proposed governance rule means the old Identity definition
is, by the document set's own lifecycle, currently authoritative — yet the
specs have already adopted the proposed change.
This is the same shape of bug as review #001's W1 (ADR-020's stale example path contradicting the locked ADR-021 scheme), but more severe: it's a security-model type, not a path string.
Suggestion: Two things must happen before implementation:
- Advance ADR-022 and ADR-023 to Accepted. The specs already depend on them as the source of truth for types that appear throughout the code. Keeping them Proposed while the specs adopt their types is incoherent governance. (See C2.)
- Reconcile ADR-015: either update its Decision 3 and Assumption 6 to point
to ADR-022 (inline amendment note + change
IdentitytoCompositionAuthorityin the struct), or mark the relevant sectionsSuperseded by ADR-022 (Decision 2).
The cleanest path is to do both in one pass: advance ADR-022/023 to Accepted, amend ADR-015 to reference the supersession. This resolves C1, C2, and C6 in one move.
C2. ADR-022 and ADR-023 Are "Proposed" but the Specs Treat Them as Authoritative
Documents: ADR-022 (Status: Proposed), ADR-023 (Status: Proposed), operation-registry.md (L40, L115, L133, L149, L158, L160–168, L286, L372–379, L404–406), call-protocol.md (L296, L305, L369–373), README.md (ADR table L36–60)
Problem: Both ADR-022 and ADR-023 are marked Proposed. Yet the spec docs depend on them pervasively as the source of truth:
OperationSpec.error_schemas(ADR-023) — operation-registry.md L40, L286.ErrorDefinition(ADR-023) — operation-registry.md L56–61.HandlerRegistrationwithprovenance,composition_authority,scoped_env,capabilities(ADR-022) — operation-registry.md L160–168, the central registration type.OperationContext.handler_identity: Option<CompositionAuthority>(ADR-022 Decision 2) — operation-registry.md L115, L133; call-protocol.md L296.CompositionAuthority,ScopedOperationEnv,OperationProvenance(ADR-022) — used throughout.
The README's ADR table (L36–60) shows ADR-022 and ADR-023 as "Proposed," but the call-crate README's ADR table has no Status column at all — a reader can't tell which ADRs are binding. Per the ADR lifecycle, "Proposed" means under review, not yet a commitment.
Combined with C1 (ADR-015 Accepted still contradicts ADR-022 Proposed), the
governance state is incoherent: the Accepted ADR says Identity, the Proposed
ADR says CompositionAuthority, and the specs have picked the Proposed one.
Suggestion: Advance ADR-022 and ADR-023 to Accepted before implementation. They are clearly intended to be binding — the specs already implement them. Add a Status column to the crate README ADR tables so the binding/non-binding distinction is visible. This is the governance fix that unblocks C1.
C3. The Abort Policy (ADR-016 Decision 6) Is Missing from OperationContext and OperationEnv::invoke()
Documents: ADR-016 (Decision 6 L120–164, Accepted), operation-registry.md
(OperationContext struct L111–128, LocalOperationEnv::invoke() L211–249),
call-protocol.md (L124, L345)
Problem: ADR-016 Decision 6 states unambiguously (L122–125): "The abort
policy (abort-dependents vs continue-running) is set on OperationContext
and propagated to children via OperationEnv::invoke()." It shows
OperationContext carrying the policy (L154) and an opt-in
invoke_with_policy(...) method (L149–151).
The OperationContext struct in operation-registry.md (L111–128) has no
abort_policy field. The fields shown are: request_id,
parent_request_id, identity, handler_identity, capabilities,
metadata, env, internal. No policy. The LocalOperationEnv::invoke()
implementation (L211–249) likewise constructs a child OperationContext with
no policy field and takes no policy parameter.
call-protocol.md L124 and L345 correctly state that the policy is on
OperationContext and propagated via OperationEnv::invoke(), but the actual
struct definition in operation-registry.md — the one an implementer would copy
— omits it. This is a gap between ADR-016 (Accepted) and the spec. An
implementer copying the spec would simply omit the abort policy, and the
continue-running opt-in would have no wiring point.
Suggestion: Add pub abort_policy: AbortPolicy (with a default of
AbortPolicy::AbortDependents) to the OperationContext struct in
operation-registry.md, and define the AbortPolicy enum
(AbortDependents | ContinueRunning). The build_root_context in
call-protocol.md should set abort_policy: AbortPolicy::AbortDependents (the
default per ADR-016 L140). See C4 for the invoke() side.
C4. OperationEnv Trait Is Never Defined; invoke_with_policy() Absent from Spec
Documents: operation-registry.md (L196–259), ADR-016 (Decision 6 L144–164)
Problem: Two related gaps:
-
The
OperationEnvtrait is never defined. operation-registry.md repeatedly referencesOperationEnvas a trait that "must remain a trait" (L259, also OQ-19), and shows#[async_trait] impl OperationEnv for LocalOperationEnv(L211–212), but the trait definition itself is never shown. Only theLocalOperationEnvstruct (L207–209) and its impl block (L211–249) appear. An implementer cannot know the trait's method signatures, return types, or whetherinvoke()is the sole method. -
The
invoke_with_policy()opt-in is absent. ADR-016 Decision 6 (L144–152) shows the policy entering throughOperationEnv::invoke()viainvoke_with_policy("fs", "readFile", input, &context, AbortPolicy::ContinueRunning). The spec'sOperationEnv::invoke()signature (L213) takes no policy parameter, and there is noinvoke_with_policymethod, noInvokeOptionsstruct, no builder pattern. The spec never defines the trait surface, so the policy opt-in mechanism ADR-016 mandates is unspecified.
This is load-bearing because OQ-19 makes the trait-ness of OperationEnv a
constraint: session-scoped registries depend on wrapping the global env
via trait layering. If the trait surface is unspecified, two implementers
could define it differently, breaking the wrapping pattern.
Suggestion: Define the OperationEnv trait explicitly in
operation-registry.md, showing at minimum:
#[async_trait]
pub trait OperationEnv: Send + Sync {
async fn invoke(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
) -> ResponseEnvelope;
/// Opt-in: compose a child with a non-default abort policy.
async fn invoke_with_policy(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
policy: AbortPolicy,
) -> ResponseEnvelope {
// default implementation: set policy on child context, delegate to invoke
}
}
This satisfies both the "must remain a trait" constraint and ADR-016's policy parameter. See C7 for a related type ambiguity on the same struct.
C5. ADR-017's from_call Mirror List Does Not Include error_schemas — Inconsistency with ADR-023
Documents: ADR-017 (Decision 3 L113–122, Assumption 4 L274–279, Accepted), ADR-023 (Decision 1 L112–147, L42–47, L257, Proposed)
Problem: ADR-017 Decision 3, step 3 (L118–119) says from_call constructs
a spec that "mirrors the remote operation's name, namespace, type, schemas,
and access control." Assumption 4 (L274–275) repeats: "the imported
OperationSpec has the same name, namespace, type, schemas, and access
control as the remote operation." The list is: name, namespace, type,
schemas, access control. There is no error_schemas in either list.
ADR-023 adds error_schemas: Vec<ErrorDefinition> to OperationSpec (L123)
and Decision 5 (L236–261) makes adapter fidelity explicit for
from_openapi/to_openapi/from_mcp/to_mcp. But ADR-023 does not
mention from_call in Decision 5 — it covers OpenAPI and MCP adapters only.
Meanwhile ADR-017's from_call mirror list predates error_schemas existing
as a field.
The result: it's ambiguous whether from_call should mirror error_schemas.
ADR-017's step 2 (L116–117) says from_call calls services/schema "for each
→ gets the input/output JSON Schemas" — again, no mention of error schemas.
ADR-023 Decision 6 (L264–286) confirms services/schema returns
error_schemas, so the data is available, but ADR-017 doesn't say to consume
it.
An implementer following ADR-017 literally would import operations with empty
error_schemas, silently dropping the remote error contract — the exact
lossy-import problem ADR-023 was written to solve for OpenAPI/MCP, now
recurring for from_call.
Suggestion: Update ADR-017 Decision 3 step 3 and Assumption 4 to include
error_schemas in the mirror list (name, namespace, type, schemas,
error_schemas, access control), and update step 2 to note that
services/schema now also returns error schemas. Add a cross-reference to
ADR-023. This is an inconsistency between an Accepted ADR (017) and a later
ADR (023) that must be reconciled — same shape as review #001's W3
(from_mcp missing from adapter lists), but for a new field.
C6. OperationContext.env Has a Type Identity Crisis — Reachability Set vs Dispatch Trait
Documents: ADR-015 (L127: pub env: OperationEnv), ADR-022 (L208–215
ScopedOperationEnv, L286–287 env: registration.scoped_env…),
operation-registry.md (L118: pub env: OperationEnv; L220:
parent.env.allows(&name); L243–244: env: registration.scoped_env.clone()…;
L198: context.env.invoke(...)), call-protocol.md (L298–299)
Problem: There is a type identity problem for the env field on
OperationContext:
- ADR-015 L127 and operation-registry.md L118 declare
pub env: OperationEnv(a trait object / the trait type). - ADR-022 and operation-registry.md L243–244 and call-protocol.md L298–299
populate
envwith aScopedOperationEnv(a concrete struct:allowed_operations: HashSet<String>, ADR-022 L208–214).
ScopedOperationEnv (a reachability set — ADR-022's reachability control)
and OperationEnv (the composition dispatch trait — the thing with
invoke()) are different concepts. ADR-015 conflated them by putting
OperationEnv on the context; ADR-022 introduced ScopedOperationEnv as the
reachability input but the dispatch path assigns the scoped set directly to
env:.
operation-registry.md L220 calls parent.env.allows(&name) — implying env
is the ScopedOperationEnv (has .allows()). But the same field is named
OperationEnv in the struct and described (L136) as "the operation environment
for composing calls… OperationEnv" with invoke(). A handler calls
context.env.invoke(...) (L198, L200) — implying env is the dispatch
trait. One field cannot be both a HashSet-based reachability gate and a
trait with an async invoke() method, unless they are the same type (they
are not — ADR-022 treats ScopedOperationEnv as data and the dispatch
OperationEnv/LocalOperationEnv as the trait impl that consults the
scoped env).
This is a genuine type-system ambiguity that would force an implementer to
guess: is context.env the dispatch trait (then where does the scoped
reachability set live?), or is it the reachability set (then how does the
handler call invoke() on it?), or is it a composite type that holds both?
This interacts with OQ-19's constraint that OperationEnv must remain a trait:
if env is ScopedOperationEnv (a concrete struct), the session-overlay
pattern is closed because the field can't hold a trait object.
Suggestion: Separate the two concepts explicitly. Likely:
OperationContext carries scoped_env: ScopedOperationEnv (reachability
data, used for the allows() check) and a separate env: Box<dyn OperationEnv> (or Arc<dyn OperationEnv>) that is the dispatch trait (used
for invoke()). The reachability check in invoke() consults
parent.scoped_env.allows(&name), and the actual dispatch goes through the
OperationEnv trait impl. The current field naming (env for both) is the
root of the confusion and should be disambiguated. This may warrant an ADR-022
clarification, since it changes the struct shape that ADR-022 Decision 5 shows.
C7. OQ-21's "Non-Breaking" Deferral Conflicts With ADR-019 and Hides a Crate-Decomposition Decision
Documents: open-questions.md (OQ-21 L252–285), ADR-019 (L133–136), crates/vault/protocol.md (L187–216, L278–291), crates/vault/service.md (L300–322)
Problem: Three interlocking problems:
-
ADR-019 directly contradicts OQ-21's deferral framing. ADR-019 L133–136 states: "Remote vault administration (unlock a running node's vault over the network) is not supported. If needed in the future, it requires a separate, heavily restricted mechanism (admin scope, mTLS-only, never expose the mnemonic over an unauthenticated channel) and its own ADR." OQ-21 defers remote vault access (derive/encrypt/decrypt) as "non-breaking" with no ADR required. The boundary between ADR-019's "remote vault administration" (needs an ADR) and OQ-21's "remote vault access" (deferred, no ADR) is undefined. An implementer reading ADR-019 could reasonably conclude that any remote vault exposure requires an ADR; an implementer reading OQ-21 could conclude none is needed.
-
The "non-breaking" claim is only true for the wire format, not for secure enablement. OQ-21 and protocol.md L278–286 frame enabling remote access as a "server-setup change." But the auth-wrapping handler that makes remote access safe cannot live in the vault crate (ADR-018) and requires either (a) assembly-layer integration or (b) a new dedicated vault-server crate depending on both alknet-core and alknet-vault (protocol.md L201–204, open-questions.md L269). Option (b) is a crate-decomposition change (ADR-003 territory), not "server setup." OQ-21's "two-way (enabling is non-breaking)" door-type classification (L256) is misleading: once workers build dependencies on the remote vault API, disabling it breaks them — the door is operationally one-way even if the wire format is additive.
-
Nested, undecided deferral. OQ-21 L269 says the auth handler "lives in the assembly layer or a dedicated vault-server crate." This "or" is itself an undecided architectural choice (affects the dependency graph and ADR-003's crate decomposition) deferred inside the OQ-21 deferral. It is not tracked as a separate question.
-
The RemoteService path is unvalidated. The deferral means
VaultProtocol'sRemoteServicetrait impl,DerivedKey's dual serialization (JSON redacts, postcard preserves), andVaultServiceActor's transport-agnosticism have never been exercised over a real network. The spec asserts these work ("by construction"), but there are concrete failure modes that only surface at enablement — e.g.,DerivedKey's customSerializeredactsprivate_keyfor "human-readable formats"; whether postcard correctly bypasses the redaction is a known source of subtle bugs. If it redacts under postcard too, remote dispatch silently corrupts private keys. ADR-009:33 warns "if the right choice is unclear, validate with a POC." The deferral skips the POC and asserts the door is open by construction. -
Safety footgun. The default
IrohProtocolhandler forwards all message types without auth (protocol.md L189–191). If someone naively registers the ALPN without the wrapping handler,Unlock(mnemonic over the wire) andLock(DoS) become remotely callable. The operation access policy is documented in prose only; nothing in the vault crate prevents the insecure configuration.
Suggestion: Promote OQ-21 from "deferred" to "open." Reconcile with
ADR-019 by explicitly defining the administration-vs-access boundary. Decide
now whether a vault-server crate is part of the architecture (this is a
crate-decomposition one-way door per ADR-003/ADR-018 and should be decided,
not deferred). If remote access is genuinely deferred, require that the
enablement (not just the protocol) be covered by a future ADR — matching
ADR-019's "requires its own ADR" language. Correct the door-type
classification: enabling remote access is a one-way door operationally (workers
depend on it), even if the wire format is additive. At minimum, add a guard
clause: "The RemoteService path is unvalidated. The DerivedKey postcard
serialization and the actor's transport-agnosticism must be verified by a
POC before claiming the door is open."
C8. The Operation Access Policy Table Is Incomplete — Security-Sensitive Methods Are Unclassified
Documents: crates/vault/protocol.md (L218–236), crates/vault/service.md (L104–212)
Problem: The "Operation access policy" table (protocol.md L223–232)
covers exactly the 8 VaultProtocol enum variants. But VaultServiceHandle
exposes additional methods not in the protocol enum: rotate, unlock_new,
derive_encryption_key_for_version, derive_password_string, is_unlocked.
These methods have no access classification — an implementer cannot determine
from the spec whether they are local-only, remote-capable, or simply not
protocol-exposed. This is security-relevant:
rotatere-encrypts blobs (security-sensitive operational action; ADR-021 L168–170 says "rotation is an operational action, not a runtime behavior"). If it were ever exposed remotely, a compromised worker could trigger mass re-encryption. It is not in the protocol enum and not in the access policy table — its status is undefined.unlock_newgenerates and returns the mnemonic (root of trust) as a plaintextString. Not in the protocol enum; not in the policy table. Its local-only status is implied but not stated.derive_encryption_key_for_versionreturns anEncryptionKeydirectly (raw key material), unlikeEncrypt/Decryptwhich use the key internally. If exposed remotely, a worker could extract raw encryption keys. Not classified.
The spec never states the rule that "service methods not in VaultProtocol
are local-only by construction." An implementer could add protocol variants
for these methods without recognizing the security implication, since the
policy table only constrains existing variants.
Suggestion: Add an explicit rule: "Service methods absent from
VaultProtocol are local-only by default; adding a protocol variant requires
updating this policy table and an ADR if the method is security-sensitive."
Add rotate, unlock_new, and derive_encryption_key_for_version to the
policy table (or an explicit "local-only" list) with rationale. Document the
mapping between service methods and protocol variants.
C9. site_password_path(site_hash: &str) Is Underspecified — String-to-Index Mapping Breaks Determinism
Documents: crates/vault/mnemonic-derivation.md (L202–204, L213, L219–222)
Problem: The helper site_password_path(site_hash: &str) -> String
produces path m/74'/1'/0'/{site_hash}', but SLIP-0010 hardened derivation
indices are u32 (0 to 2³¹−1). A &str "site hash" cannot be a derivation
index directly — it must be converted to a u32. The conversion (hash
algorithm, truncation, endianness, collision handling) is completely
unspecified. Two implementers could produce different, incompatible mappings:
SHA-256(site) → first 4 bytes → u32; or last 4 bytes; or BLAKE3; or a string
hash. Since the path semantics table (L213) says these are "Ed25519 bytes"
used for "Per-site passwords (not cached)," and determinism is a stated design
principle (L234–237: "the same mnemonic + passphrase + path always produces
the same key"), the string→index mapping must be deterministic and specified.
An underspecified mapping means the same site + mnemonic produces different
passwords across implementations — breaking the reproducibility guarantee.
Additionally, "Ed25519 bytes" (L213) as a password source is an undefined concept. The path produces a 32-byte Ed25519 private key; using those raw bytes as a password is a non-standard construction with no ADR explaining why Ed25519-derived material is suitable as password entropy (vs HKDF or a dedicated KDF).
Suggestion: Specify the exact string→u32 mapping (e.g.,
u32::from_be_bytes(SHA-256(site_hash)[0..4]) & 0x7FFFFFFF for hardened
index). Add a design note or ADR explaining why Ed25519-derived bytes are
acceptable as password material. Specify the password encoding (raw bytes vs
base64url — service.md L166 says base64url for the string variant, but the
bytes variant's encoding is the caller's concern).
C10. HandlerError::StreamError(io::Error) vs StreamError Variant — Conversion Gap
Documents: crates/core/core-types.md (L33–38, L118–139), crates/core/endpoint.md (L248–253), crates/core/auth.md (L55, L59), ADR-010
Problem: core-types.md and endpoint.md (citing ADR-010) both define
HandlerError::StreamError(io::Error). But auth.md:59 shows handler code
calling
self.identity_provider.resolve_from_token(&token).ok_or(HandlerError::AuthRequired)?
after let stream = connection.accept_bi().await?; (L55). The .await? on
accept_bi() returns StreamError (per core-types.md:60), yet there is no
From<StreamError> for HandlerError impl specified. The mapping table in
core-types.md:132–137 is descriptive prose, not a trait impl. Without a
specified From<StreamError> for HandlerError (or #[from]), the ? in the
example will not compile. Two implementers following these docs literally
will diverge: one will hand-map per the table, another will expect an auto-
From.
Additionally, HandlerError::StreamError(io::Error) holds io::Error, but
StreamError itself holds Internal(io::Error) (L118–124). So
StreamError::Internal(io::Error) → HandlerError::StreamError(io::Error)
is a lossy wrap (the distinction between stream-level and connection-level is
flattened). The spec should state whether this From impl exists and its
exact behavior, because handler code (? operator) depends on it.
Suggestion: Specify the exact conversion mechanism. Either (a) declare
HandlerError::StreamError(#[from] StreamError) — but then the variant must
hold StreamError, not io::Error, contradicting ADR-010/core-types; or (b)
explicitly state handlers must use .map_err(HandlerError::from) or
.map_err(|e| e.into()) and define the From impl; or (c) keep io::Error
and define From<StreamError> for HandlerError matching the mapping table.
Resolve the variant payload (io::Error vs StreamError) and specify the
From impl. Update auth.md:55 example to use the canonical conversion
pattern.
C11. ADR-004 Says Handlers "Enrich or Replace" AuthContext; ADR-011 Says AuthContext Is Immutable in handle()
Documents: ADR-004 (L39, Accepted), ADR-011 (L131–133, Accepted), crates/core/auth.md (L82–85)
Problem: ADR-004 line 39 states the handler "enriches or replaces the
partial AuthContext with the fully resolved Identity." This describes
mutation of the shared AuthContext. ADR-011 explicitly contradicts this
at lines 131–133: "handle() signature passes &AuthContext (immutable
reference). Handlers that resolve identity create a local variable… they
don't mutate the AuthContext." auth.md:85 reiterates immutability. The
immutability decision is the later, more refined one, but ADR-004 was never
amended. An implementer reading ADR-004 alone would mutate AuthContext —
causing cross-stream contamination (one stream's token resolution leaking
into another stream's AuthContext on the same connection), exactly what
ADR-011 prevents.
This is security-relevant — mutation across streams is a contamination vector. Same shape as C1 (Accepted ADR contradicts a later refinement that was never recorded as a supersession).
Suggestion: Amend ADR-004 to note that the "enrich/replace" language was
superseded by ADR-011's immutability model (handlers resolve into a local
variable, store on Connection via set_identity). Add an inline note at
ADR-004 L39 pointing to ADR-011. Same pattern as the C1 fix for ADR-015.
C12. ADR-002's ProtocolHandler::handle Signature Is Stale — Not Marked as Superseded by ADR-007
Documents: ADR-002 (L19–27, Accepted), ADR-007 (L44–53, Accepted), overview.md (L89), crates/core/core-types.md (L14–19)
Problem: ADR-002 (Status: Accepted) still shows
async fn handle(&self, stream: BiStream, auth: &AuthContext) at L26.
ADR-007 revised the signature to
handle(&self, connection: Connection, auth: &AuthContext) and its References
note "ADR-002: ProtocolHandler trait (signature revised by this ADR)."
However, ADR-002 itself has no superseded/amended marker — it reads as
the current authoritative definition. overview.md:89 even says "This differs
from the original ADR-002 signature which passed BiStream," confirming the
discrepancy is known but ADR-002 was never annotated. An implementer reading
only ADR-002 (as the README's "Applicable ADRs" table encourages) gets the
wrong trait.
Suggestion: Add a "Superseded by ADR-007" amendment note at the top of
ADR-002's Decision section (or a ## Amendments section) stating the
signature was changed by ADR-007. Alternatively, update the ADR-002 code block
to the Connection signature with a note "Revised by ADR-007." ADRs must
reflect current state or explicitly mark supersession — this is the same
lifecycle rule that C1 and C11 violate.
C13. Connection::set_identity Observability Read Path Is Under-Specified
Documents: crates/core/core-types.md (L53–67), crates/core/auth.md (L80), open-questions.md (OQ-11 L138), crates/core/endpoint.md (L122–136)
Problem: Connection has private identity: OnceLock<Identity>.
set_identity(&self,…) takes &self (interior mutability) and is
"write-once-read-many." But the endpoint "holds a reference for logging"
(OQ-11:138) and the dispatch code in endpoint.md:122–136 moves conn
into the spawned task (Connection::new(connection) then
tokio::spawn(async move { ... handler.handle(conn, &auth) })). After the
move, the endpoint no longer has the Connection to read identity() for
observability. OQ-11 says "the endpoint may also hold a reference for
logging" — but there is no Arc<Connection> or shared handle shown anywhere;
the struct definition in core-types.md:53–57 has no sharing mechanism, and
AlknetEndpoint in endpoint.md:17–27 does not hold a registry of live
connections. So the stated observability reader (the endpoint) has no defined
way to access the connection after spawn.
Suggestion: Specify the observability read path precisely. Either (a)
the endpoint holds a side table of live connections keyed by connection ID
with Weak<Connection> or an Arc<Connection> shared with the handler task,
and Connection becomes Arc-shareable; (b) the handler emits the resolved
identity via a channel/observer callback and the endpoint records it; or (c)
drop the claim that the endpoint reads it and state observability is
handler-side logging only. The current spec is internally inconsistent on
who reads identity() and how they obtain the reference.
Warning Findings
W1. ADR-017's OperationAdapter Trait Is Stale Relative to ADR-022 — And the Sync/Async "Two-Way Door" Is Already Decided
Documents: ADR-017 (L155–158, L171–174, Accepted), ADR-022 (L225–255, Proposed)
Problem: ADR-017 L171–172 states: "The specific trait signatures (async vs sync, error types, configuration parameters) are two-way doors for implementation." Two problems:
-
The bundle shape is already constrained. ADR-022 changes registration from
(OperationSpec, Handler)toHandlerRegistration(spec + handler + provenance + composition_authority + scoped_env + capabilities). ADR-022 L257–261 says adapter convenience methods constructHandlerRegistrationwithcomposition_authority: Noneandscoped_env: None. So the adapter trait must now produceVec<HandlerRegistration>, notVec<(OperationSpec, Handler)>. The trait signature shown in ADR-017 (fn import(&self) -> Vec<(OperationSpec, Handler)>) is wrong per ADR-022. The "two-way door for implementation" on the signature has already been narrowed by ADR-022 — the return type is committed to the bundle, not the pair. ADR-017 has not been updated. -
The async/sync question is not fully two-way.
from_callis inherently async — it discovers operations viaservices/list+services/schemaover a QUIC connection. A synchronousfn import(&self) -> Vec<…>trait cannot accommodatefrom_callwithout a separate async pre-step that populates a cache, then a sync import that reads the cache. This means the trait shape is constrained: either the trait is async (async fn import), orfrom_callis not anOperationAdapterimpl and uses a different path. The "two-way door" framing suggests you can pick sync now and switch to async later; in reality,from_callforces at least an async-capable path from day one.
This is a "two-way door" label that ADR-022 has silently narrowed without updating the OQ classification. (See the two-way-door audit, item 7.)
Suggestion: Update ADR-017 to: (1) return Vec<HandlerRegistration> (not
(OperationSpec, Handler)), (2) acknowledge that from_call requires async
discovery and the trait must be async or from_call must be special-cased, (3)
reclassify the signature as one-way (bundle shape) with a narrow two-way door
on ergonomic details (error types, config params). The current framing in
ADR-017 is inconsistent with ADR-022 and should be reconciled — same
reconciliation pass as C5.
W2. Capabilities Concrete Shape and Clone Semantics — "Two-Way Door" With a Hidden One-Way Door Inside
Documents: operation-registry.md (L364, L369), ADR-014 (L74–76, L123–124, L163–165)
Problem: The spec says "the concrete shape of the type (a typed map, a
struct with named fields, a trait object) is a two-way door for implementation"
and "the concrete cloning semantics (reference-counted Arc vs deep copy of
zeroized material) is a two-way door for implementation, but
Capabilities: Clone is required."
The Clone requirement is accurately identified as a one-way constraint. The
problem is the clone semantics, which the framing treats as two-way but
which contain a hidden one-way door:
- Arc-based clone (cheap, shared). If
CapabilitiesisArc<Map<String, Secret>>,clone()bumps the refcount. All calls in a composition tree share the same secret material. If a handler can mutate capabilities (e.g., add a derived key for a child operation), that mutation is visible to siblings and the parent — shared mutable state across the call tree. This is a security-relevant behavior. - Deep-copy clone (expensive, isolated). Each call gets its own copy.
Mutations are isolated. But cloning zeroized secret material on every
invoke()duplicates secrets N times for a depth-N tree, increasing the attack surface.
The "path of least resistance" is Arc-based (cheap, the obvious choice for a
Clone type holding secrets). Once shipped, handlers may come to depend on
shared-mutation. Switching from Arc-shared to deep-copy-isolated later is a
behavior change that breaks those handlers.
Worse, ADR-022's composition model assumes cheap clone — it clones
unconditionally on every invoke(). A depth-5 composition tree with Arc-based
clone does 5 refcount bumps (cheap). With deep-copy, it does 5 full copies of
all secret material. The ADR-022 design bakes in the assumption that clone is
cheap — which means it bakes in the Arc-based choice, making the "two-way door"
effectively decided by the composition model's performance assumptions.
Suggestion: Add a guard clause: Capabilities must be immutable after
construction (no interior mutability, no Mutex<Map>, no RefCell). This
makes Arc-based clone safe (shared immutable state is fine) and makes the
two-way door genuinely two-way: you can switch between Arc<Map> and
deep-copy without behavior change, because neither supports mutation. Without
this guard, the "two-way door" is a future one-way door waiting for the first
handler that mutates capabilities.
W3. CallClient Registry Mechanism — "Two-Way Door" With a Security Dimension ADR-022 Introduced
Documents: ADR-017 (L236–237, Accepted), ADR-022 (L225–242, Proposed)
Problem: ADR-017 L236–237: "The specific mechanism (sharing the global registry, a peer-scoped subset, or a separate registry) is a two-way door."
The three options are mechanically possible with the HandlerRegistration
bundle. But ADR-022 introduced a security dimension the framing doesn't flag:
- Sharing the global registry exposes local capabilities to the remote
peer. Each
HandlerRegistrationcarriesCapabilitieswith secret material. If theCallClientshares the global registry, the remote peer can invoke any External operation — and those operations' handlers use the local capabilities. A remote peer calling/llm/generateuses the local node's Google API key. The remote peer effectively gains access to all local secret material that backs External operations. - A peer-scoped subset must filter by capability remote-safety, not just operation name. The "path of least resistance" for a peer-scoped subset is to filter by operation name/namespace. But the real filter must be: "is this operation's capability safe to expose to this peer?" An operation backed by a signing key that should never leave the node is not remote-safe, regardless of its name.
The framing was written before ADR-022 put capabilities in the bundle. Post-ADR-022, the "two-way door" on the registry mechanism has a security constraint attached that the framing doesn't mention.
Suggestion: Append to ADR-017's negative consequences: "Sharing the
global registry with a CallClient exposes all local capabilities (API keys,
signing keys) to the remote peer, because the dispatch path populates
OperationContext.capabilities from the registration bundle. A peer-scoped
subset must filter by capability remote-safety, not just operation name. The
registry-mechanism choice is two-way mechanically but has a security dimension
post-ADR-022: the 'share global' option is a capability-exposure decision, not
just a dispatch decision."
W4. from_call Hot-Swapping and Registry Mutability — Two "Two-Way Doors" That Gate Each Other
Documents: ADR-017 (L279), operation-registry.md (L383)
Problem: ADR-017 L279: "Hot-swapping imported specs is a two-way door."
operation-registry.md L383: "The registry is immutable after construction. No
runtime registration or deregistration. Two-way door —
ArcSwap<OperationRegistry> can be added later."
These are presented as independent two-way doors. They are not — one gates the
other. If the initial implementation registers from_call-imported specs
immutably (per the immutable-registry constraint), and handlers/clients
compile/generated-code against those specs, then "hot-swapping" later means:
re-importing on reconnect, diffing specs, invalidating generated clients,
handling in-flight calls whose schema changed. If the registry is
ArcSwap-able, hot-swap is mechanically possible but semantically fraught (a
handler mid-composition doesn't expect the child's schema to change).
Deciding "hot-swap is a two-way door" without resolving the registry-mutability
question leaves an implementer to guess whether to build for immutability or
hot-swap. The deferral hides a coupling: the immutability of the registry (a
one-way door per ADR-010 and operation-registry.md L14) and the mutability
of remote specs (inherent to from_call) are in tension.
Additionally, hot-swapping with ADR-022 bundles requires re-injecting
capabilities (cloning from the old registry or re-deriving from the vault) and
rebuilding the TLS ServerConfig (ALPN strings come from the registry).
The "simple ArcSwap" now touches the secret-material flow and the TLS config.
This isn't closing the door, but it raises the cost and security sensitivity
of the swap.
Suggestion: Resolve the registry-mutability question (immutable vs
ArcSwap) before deciding hot-swap, since hot-swap depends on it — don't
let both be independent "two-way doors" when one gates the other. Add a guard
clause to OQ-04: "Hot-swapping the registry with ADR-022 bundles requires
re-injecting capabilities (cloning from the old registry or re-deriving from
the vault) and rebuilding the TLS ServerConfig. The ArcSwap upgrade is
mechanical but not trivial — it is a secret-material-handling operation, not
a pure pointer swap."
W5. to_* Adapter "Best-Effort" Mappings Become Compatibility Contracts Once Published
Documents: ADR-017 (L244–246, L281–286, Accepted), ADR-023 (error schemas projection)
Problem: ADR-017 L244–246: "Some semantics don't map cleanly (e.g., subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle these with best-effort mappings and document the gaps."
The "best-effort" framing implies the mappings are malleable — two-way, adjustable. This is misleading once the generated spec is published:
- A published spec is a compatibility contract.
to_openapigenerates an OpenAPI spec. Once external HTTP clients consume that spec and build against it, the mapping semantics (e.g., subscriptions → SSE long-poll, or subscriptions → a polling endpoint) become a de facto contract. Changing the mapping later breaks every client that built against the original spec. The "best-effort" mapping, once published, is a one-way door for the clients that depend on it. from_openapiround-trip lossiness compounds. Ifto_openapimaps subscriptions to SSE, and laterfrom_openapiimports a spec where subscriptions are modeled as webhooks, the two adapters disagree on what a "subscription" means in OpenAPI. Theto_*mapping choices constrain thefrom_*parsing expectations.- With ADR-023,
to_openapinow mapsErrorDefinition.http_statusto OpenAPI response codes. Once published, those status code mappings are a contract. ChangingFILE_NOT_FOUNDfrom 404 to 422 in a laterto_openapirevision breaks clients.
This is the "informal spec becomes formal spec" problem. The generated artifact is the contract; the "best-effort" label is internal framing that external consumers never see. ADR-009's framework classifies doors by reversal cost in the codebase, not by compatibility cost for external consumers — a systematic blind spot.
Suggestion: Add a guard clause to ADR-017: "The to_* mapping choices,
once published as a generated spec, become a compatibility contract for
external consumers. Best-effort mappings are two-way before first publication
but one-way after. Version the generated specs (e.g., OpenAPI spec version
tied to the registry's External operation set version) and document that
mapping changes are breaking for consumers. The to_* adapters should emit a
spec version marker so consumers can detect mapping changes."
W6. The Salt Field's "Reserved for Future KDF" Framing Is Misleading — Semantics Can't Change for Existing Data
Documents: ADR-020 (L102–108, L151–157, L204–207, Accepted), crates/vault/encryption.md (L112–128)
Problem: ADR-020 L108: "If a future KDF-based derivation is needed (see 'Future KDF' below), the field is already present." ADR-020 L156: "This is additive — it doesn't change v2 data."
The "reserved" framing creates a false sense of flexibility:
- The salt in v2 is cryptographically meaningless and cannot be
retroactively made load-bearing for v2 data. v2 encrypts with an
HD-derived key at
m/74'/2'/0'/0'. The salt is random, stored, and unused. If v3 introduces a KDF that uses the salt, v3's key derivation is different from v2's. v3 cannot decrypt v2 data using the salt — v2 was never encrypted with a salt-derived key. The salt is only usable for NEW v3+ data. So "the field is already present" is technically true but misleading: the field is present in v2 blobs, but its presence doesn't make v2 data upgradeable to a KDF. It only means v3 blobs can reuse the field without a wire-format change. - The wire-format-compat framing is the real two-way door, and it's
accurate. Keeping the field means the
EncryptedDatastruct doesn't change shape across v2→v3. That IS a wire-format two-way door — you can add a KDF in v3 without changing the struct. But the semantics of the salt field ("unused" in v2, "load-bearing" in v3) cannot be changed for existing v2 data. - A KDF doesn't fit the rotation scheme. Rotation is version-indexed
paths (
m/74'/2'/0'/{version-2}'). A KDF-based derivation is a different derivation method, not a different path. If a KDF is introduced, the rotation mechanism (version-indexed paths) and the KDF mechanism (salt-based) coexist or conflict. The "reserved salt" framing doesn't acknowledge that a KDF is a different derivation family, not just another version index.
Suggestion: Update ADR-020: "The salt field is reserved for FUTURE versions' use. v2 data's salt is permanently unused — it was random, never participated in key derivation, and cannot be retroactively made load-bearing. Introducing a KDF in v3 is a new derivation method (not a version-indexed path), requiring its own design and a v2→v3 migration (re-encrypt with the new KDF, using a newly-generated v3 salt — the v2 salt is not reused). The field's presence saves a wire-format struct change only; it does not make the KDF design or migration trivial." Drop the speculative "reserved for future KDF" framing or explicitly mark it as speculative, not a planned feature.
W7. unlock_new Returns the Mnemonic as a Non-Zeroizing String — Security Gap
Documents: crates/vault/service.md (L71–79)
Problem: unlock_new(word_count) -> Result<String, VaultServiceError>
returns the generated mnemonic phrase as a String. The Mnemonic type
zeroizes on drop (mnemonic-derivation.md L77–78), but String does not
implement Zeroize by default. The root of trust is returned to the caller
as a plaintext heap-allocated String that lingers in freed memory until the
allocator reuses it. The spec says "Store the returned phrase securely"
(L78–79) but does not address that the returned String cannot be zeroized
by the caller without unsafe or converting to a zeroizing container.
This contradicts the stated principle "Zeroize everything sensitive"
(vault README L65–67) and the Mnemonic type's own zeroization. The Security
Constraints section (service.md L365–399) does not mention this.
Suggestion: Either return a zeroizing wrapper (e.g., Zeroizing<String>
or a dedicated Phrase type that implements Zeroize/ZeroizeOnDrop), or
document in Security Constraints that the caller must convert the returned
String to a zeroizing type and must not let it be cloned/logged. At
minimum, flag this as a known limitation.
W8. DerivedKey JSON Deserialization Silently Produces a Corrupted Key
Documents: crates/vault/protocol.md (L94–112)
Problem: protocol.md L104–107 states: "A JSON-deserialized DerivedKey
will have "[REDACTED]" as its private_key string — this is expected; JSON
round-tripping a DerivedKey is not a supported use case (the private key is
gone)."
This is a silent-failure footgun: if a DerivedKey is accidentally
serialized to JSON and then deserialized (e.g., in a log pipeline, a debug
tool, or a caching layer that uses JSON), the resulting DerivedKey has a
10-byte string ("[REDACTED]") as its private_key instead of 32 bytes. Any
code that then uses this DerivedKey for signing will either produce garbage
signatures or panic on length mismatch — with no error at deserialization
time. The redaction is a defense-in-depth for serialization (logs), but the
deserialization side silently accepts the redacted form, producing a
corrupted key.
For the irpc/postcard path this is not an issue (postcard preserves bytes).
But if any code path ever uses JSON for DerivedKey transport (the spec says
this is "not a supported use case" but doesn't prevent it), the corruption is
silent.
Suggestion: Either (a) make DerivedKey::deserialize reject payloads
where private_key == "[REDACTED]" (returning an error rather than a
corrupted key), or (b) document explicitly that private_key length must be
validated before use and that JSON-deserialized DerivedKeys are invalid.
Consider whether the custom Deserialize impl should enforce a 32-byte
length check.
W9. No ADR for the HD-Derivation Key Model (Identity/SSH/Signing) — Only Encryption Keys Have ADR-020
Documents: crates/vault/mnemonic-derivation.md (L31–46, L248–256), crates/vault/README.md (L68–70)
Problem: ADR-020 covers HD derivation for encryption keys. But the vault's primary use of HD derivation is for identity keys, SSH host keys, and signing keys (mnemonic-derivation.md L26–29, L184–215). The rationale for "HD derivation, not stored keys" (L31–46: no key storage, reproducible across nodes, domain separation, auditable derivation) is inline in the spec's "Why HD Derivation" section. The Design Decisions table (L250–256) lists this with no ADR ("HD derivation (not stored keys) | — | One seed, many keys, no key storage").
This is a foundational architectural decision — the entire vault model depends on it — yet it has no ADR. The choice of HD derivation vs. stored keys is a one-way door (switching to stored keys would change the entire trust model). ADR-018 and ADR-019 both assume HD derivation but don't decide it.
Similarly, these design choices have inline rationale but no ADR:
74'coin type reservation (L186, L254): SLIP-0044 unallocated coin type claimed for alknet. This is a namespace allocation that, once keys are derived, cannot be changed without re-deriving all keys — effectively one-way. No ADR records this allocation.- secp256k1 feature-gating (L166–179): rationale (heavy C dependency, only needed for Ethereum) is inline. No ADR.
- AES-256-GCM cipher/mode choice (encryption.md L27–37): rationale (authenticated, hardware-accelerated) is inline. No ADR.
Suggestion: Either (a) create an ADR for "HD derivation as the vault's key
model" covering identity/SSH/signing keys (the encryption case is already
ADR-020), or (b) broaden ADR-020's scope, or (c) create a single "Vault Key
Model" ADR covering the overarching HD-derivation decision with ADR-020 as a
special case for encryption keys. Record the 74' coin type reservation and
the AES-256-GCM choice in ADRs or a tracked decisions log, since they are
one-way doors with no current ADR.
W10. The EncryptedData "Stable Wire Format" Is a One-Way Door Documented Only in Prose
Documents: crates/vault/encryption.md (L84–95), ADR-018 (L101–104)
Problem: encryption.md L86–95 states: "This is the stable wire format
shared with alknet-storage (a future crate) by type-level agreement, not by
a crate dependency. Both crates must agree on the serialization format… This
cross-language compatibility is why the wire format must stay stable —
changing it breaks both alknet-storage and the TypeScript consumer."
ADR-018 L101–104 repeats this.
This is a one-way door: once alknet-storage and TS consumers depend on the
format, changing it breaks them. Yet it is not locked by an ADR. It is a
cross-crate compatibility contract enforced by documentation, not by a decided
constraint. An implementer modifying EncryptedData (e.g., removing the
unused salt field) would find no ADR saying "this format is frozen." The
"type-level agreement, not a crate dependency" approach (ADR-018) means there
is no compiler enforcement either — the stability contract exists only in
prose.
Suggestion: Create an ADR (or add a decision to ADR-018) locking the
EncryptedData wire format as stable, with the compatibility surface
explicitly enumerated (fields, base64 encoding, field semantics). Note that
the salt field, though unused in v2, is part of the frozen format and cannot
be removed without a format-version migration.
W11. OperationContext.identity Conversion from CompositionAuthority Is Undefined
Documents: operation-registry.md (L236:
identity: parent.handler_identity.as_identity()), ADR-022 (L314:
identity: parent.handler_identity_as_identity())
Problem: Both spec and ADR-022 populate the child's identity by
converting the parent's handler_identity: Option<CompositionAuthority> into
an Option<Identity> via a method as_identity() /
handler_identity_as_identity(). This conversion method is never defined
and its semantics are unclear: CompositionAuthority (label + scopes +
resources) is a declared authority bundle, while Identity is a peer identity
resolved via IdentityProvider. ADR-022 goes to lengths to stress that
CompositionAuthority is not a peer Identity (L150–160). Yet the dispatch
path needs an Identity to run ACL against for the child.
So there's an implicit conversion from a non-peer authority bundle to a peer
identity type. Is it a synthetic Identity constructed with the authority's
label as id and its scopes? Does it bypass IdentityProvider? Is
as_identity() returning None when handler_identity is None (the leaf
case)? For leaves, the ACL "checks against the leaf's AccessControl" — but
if identity is None because the parent is a leaf with
handler_identity: None, then per operation-registry.md L86–87 "If the
relevant identity is None and the operation has restrictions, the adapter
returns FORBIDDEN." That would mean a composing handler with no composition
authority cannot compose any restricted leaf — which contradicts the model
where leaves are composed under the parent's authority.
Actually: a composing parent (Local/Session) does have
handler_identity: Some(...), so as_identity() returns Some. Leaves being
composed are the child; their own handler_identity is None but that's
fine because the child's ACL uses the child's identity (which is the parent's
authority). So the conversion is load-bearing and must be specified.
Suggestion: Define CompositionAuthority::as_identity() -> Option<Identity>
(or handler_identity_as_identity() on OperationContext) explicitly in the
spec or ADR-022. Specify that it constructs a synthetic
Identity { id: label, scopes, resources } that is not resolvable via
IdentityProvider but is used directly for ACL matching. Note that this
creates a second Identity construction path (the first is
IdentityProvider::resolve_*), which should be acknowledged.
W12. Source Drift Items Are Tracked Only in Scattered Prose — The spawn Return Bug Could Be Lost
Documents: crates/vault/README.md (L79–103), crates/vault/encryption.md (L185–188, L245–250), crates/vault/service.md (L280–286, L374–391)
Problem: Four source-vs-spec drift items are documented in prose across multiple files:
- OsRng vs
rand::random()— documented in README L87–89, encryption.md L245–250, service.md L374–376 (3 locations). unwrap()on RwLock — documented in README L94–97, service.md L384–391 (2 locations).CURRENT_KEY_VERSION = 1(should be 2) — documented in encryption.md L185–188, ADR-020 L120–124 (2 locations, none in README).spawnreturns unspawned actor — documented only in service.md L280–286 (1 location, inside a method description, not in a Security Constraints section).
Problems with this approach:
- The
spawnbug (item 4) is documented in exactly one place, buried in thespawnmethod's bullet description, not in any "Security Constraints" or "Known Drift" section. An implementer who reads the Security Constraints sections (the natural place to look for implementation-sync corrections) would miss it. - README's Security Constraints (L79–103) lists OsRng and unwrap but omits
CURRENT_KEY_VERSIONand thespawnbug. README is the index document; an implementer using it as a checklist would miss two drift items. - There is no single, structured tracking mechanism. If one prose location is updated and others aren't, the docs silently diverge.
Suggestion: Add a single "Known Source Drift" table to README.md (or a
dedicated section) listing all drift items with: item, current behavior,
target behavior, location in source, and status. At minimum, add
CURRENT_KEY_VERSION and the spawn bug to README's Security Constraints.
Move the spawn bug note out of the method description and into the drift
tracker.
W13. CertAuthorityEntry Is an Undefined Placeholder Type Used in DynamicConfig
Documents: crates/core/config.md (L119–130)
Problem: AuthPolicy.cert_authorities: Vec<CertAuthorityEntry> is
declared, but CertAuthorityEntry is explicitly "a placeholder type. Its
fields will be defined when alknet-ssh is implemented." This means alknet-core's
public API references a type that does not exist yet. The spec says "for v1,
cert_authorities will be an empty vector," implying the field is dead weight.
An implementer cannot compile DynamicConfig without defining
CertAuthorityEntry somewhere. Where does it live — alknet-core (but it's
SSH-specific per the doc) or alknet-ssh (but then alknet-core can't reference it
without a dependency)? This is a crate-dependency contradiction: core
references an SSH-defined type, but core cannot depend on alknet-ssh (ADR-003
forbids it).
Suggestion: Either define a minimal CertAuthorityEntry in alknet-core
(as an opaque/newtype or a generic struct with the fields that are
ALPN-agnostic), or remove cert_authorities from v1 AuthPolicy entirely and
add it back when alknet-ssh lands (it's a two-way door — adding a field is
additive). Leaving an undefined type in a public struct blocks compilation.
W14. SecretKey Type in TlsIdentity::RawKey(SecretKey) Is Undefined and Its Origin Ambiguous
Documents: crates/core/config.md (L44, L79)
Problem: TlsIdentity::RawKey(SecretKey) and SecretKey::generate() are
used but SecretKey is never imported or attributed. It could be
ed25519_dalek::SecretKey, iroh::SecretKey, alknet_vault::SecretKey, or a
core-defined type. Since config.md says the key "can be derived from
alknet-vault" (per endpoint.md:181), and alknet-core must not depend on
alknet-vault (ADR-003, ADR-018), this matters: if SecretKey is from iroh,
then alknet-core depends on iroh even for quinn-only nodes; if it's a
re-export/newtype, that needs stating. The crate dependency table in ADR-003
L27 lists alknet-core depending on "tokio, quinn, rustls, irpc" — not iroh
— yet config.md and endpoint.md show iroh types in core's public API.
Suggestion: Specify the exact type and crate for SecretKey. If it's
iroh::SecretKey, update ADR-003's dependency list (iroh is already used for
iroh::Endpoint, so this is consistent, but ADR-003 omits iroh from
alknet-core's deps — a separate inconsistency). If it's ed25519_dalek, say
so. Resolve whether alknet-core publicly depends on iroh types.
W15. ADR-003 Omits iroh From alknet-core's Dependency List; Spec Docs Assume It
Documents: ADR-003 (L27), crates/core/endpoint.md (L20), config.md (L44)
Problem: ADR-003 L27 lists alknet-core as depending on "tokio, quinn,
rustls, irpc" — no iroh. But AlknetEndpoint holds
iroh: Option<iroh::Endpoint> (endpoint.md L20, ADR-010 L67),
SendStream/RecvStream wrap "iroh::SendStream or iroh::RecvStream"
(core-types.md L103), and TlsIdentity::RawKey(SecretKey) likely uses iroh's
key type. ADR-010 L184 says this is "mitigated: both are feature-gated." But
ADR-003's dependency table is now stale — it doesn't reflect the iroh
dependency that ADR-010 introduced.
Suggestion: Update ADR-003's dependency table to include iroh (feature-gated) for alknet-core, or add an amendment note that ADR-010 added the iroh dependency.
W16. Overview Failure-Mode Table Says Handler Error Closes "QUIC Stream"; Spec Says Connection
Documents: overview.md (L237–238), crates/core/core-types.md (L30, L46), crates/core/endpoint.md (L255–257, L262)
Problem: overview.md:237 says "Handler handle() returns HandlerError
| Endpoint logs the error, closes the QUIC stream." But
core-types.md:30 says "The endpoint catches these, logs them, and closes the
connection." endpoint.md:256 says HandlerError is "Non-fatal errors
within a handler" and the connection closes. The whole architecture is
one-ALPN-per-connection (ADR-006), and handle() receives the whole
Connection. Closing a "stream" doesn't make sense when the handler owns the
connection. This is a stale leftover from the pre-ADR-007 model where
handle() received a BiStream.
Suggestion: Change overview.md:237–238 from "closes the QUIC stream" to
"closes the QUIC connection."
W17. OQ-09's "BiStream Trait Preserves the WASM Door" Is Only True for the Client Side
Documents: open-questions.md (OQ-09 L100–107), ADR-007, ADR-009 (L41, L44), crates/core/core-types.md, crates/core/endpoint.md
Problem: The resolution claims "The BiStream trait decision (ADR-007) has
already preserved the most important WASM door." This is accurate for the
client path (a browser can implement BiStream over WebTransport) but
oversells what is preserved for a server-side WASM peer:
Connectionis a concrete struct, not a trait (ADR-007). Itsaccept_bi()/open_bi()returnSendStream/RecvStreamdescribed as wrapping "the underlying QUIC stream types." A WASM server-side peer cannot implement thisConnectionwithout a QUIC library that compiles to WASM. BiStream being a trait does not help — handlers receiveConnection, andConnectionis the binding point.- The accept loop uses
tokio::spawn(ADR-010). Tokio does not run on WASM. The entire endpoint runtime is tokio-bound. PendingRequestMapand theCallAdapteruse tokiooneshot/mpscchannels (call-protocol.md L218–240). These are tokio-specific.
The OQ framing implies BiStream was the binding constraint and the rest is
detail. In reality, BiStream preserves the client stream door, but
Connection, the accept loop, and the call-protocol dispatch internals
quietly close the server-side dispatch door.
Suggestion: Update OQ-09's resolution to state: "BiStream being a trait
preserves the client-side stream door. The server-side dispatch door
(Connection as a concrete quinn-bound struct, tokio-based accept loop, tokio
channel dispatch) is NOT preserved by ADR-007 and would require a Connection
trait and a runtime-abstracted accept loop to open. This is a known, accepted
closure — the browser path is client-side via a JS SDK, not server-side
Rust-to-WASM." This converts the misleading framing into an explicit, accepted
one-way door.
W18. OQ-10's "Start With Smart Protocol" Deferral Forecloses Git Composability
Documents: open-questions.md (OQ-10 L109–116)
Problem: The deferral says "start with git smart protocol over QUIC
streams, ERC721 and full server capabilities are additive." The ALPN/endpoint
design does not constrain the git adapter's wire format — the handler owns
its format and alknet/git is a dedicated connection. That part is genuinely
two-way.
However, the deferral hides a fork that the "path of least resistance" will silently resolve in a way that forecloses composability:
- Raw smart protocol vs. call-protocol projection. If git ships as a
ProtocolHandlerspeaking raw git smart protocol over QUIC streams (the simplest path), git operations are NOT callable viaenv.invoke(). They exist only on thealknet/gitALPN, not in theOperationRegistry. An agent handler that wants to composegit/clonecannot — there's noOperationSpec, noHandler, no registration. The agent would have to open a raw QUIC connection toalknet/gitand speak pkt-line, bypassing the entire composition/security model (ADR-015, ADR-022). - To make git composable later, you'd need a separate call-protocol
projection — a set of
OperationSpec/Handlerpairs that wrap git operations behind the registry. That's additive, but it means the "simple" choice (raw smart protocol only) produces a git adapter that is permanently non-composable unless a second handler layer is built.
The deferral frames ERC721/full-server as the additive part. But the real additive question is composability, and the simplest implementation forecloses it.
Suggestion: Append to OQ-10's resolution: "The composability of git
operations (whether git ops are registered in the OperationRegistry and
callable via env.invoke(), or only available as raw smart protocol on
alknet/git) is a separate decision from ERC721 scope. The path of least
resistance (raw smart protocol only) forecloses agent composition of git
operations. If agent tool dispatch needs to compose git ops, a call-protocol
projection must be built alongside or instead of the raw handler. Resolve
this when speccing alknet-git, not deferred past it."
W19. ADR-016 Self-Referential Citation and Under-Specified Grandchild Propagation
Documents: ADR-016 (L134, L154–158)
Problem: Two minor issues in ADR-016:
-
Self-referential citation (L134): ADR-016 cites itself ("ADR-016 Assumption 5 says the policy is per-call, not per-operation") in its own Decision 6 rationale. This should just be "Assumption 5" (it's the same document).
-
Grandchild propagation under-specified (L154–157): "The child's
OperationContextcarries the policy. If the child itself composes grandchildren, the policy propagates unless the child explicitly overrides it." This is the only statement of propagation-vs-override semantics. It doesn't specify: does the grandchild inherit the child's policy by default (so if the child gotContinueRunning, the grandchild isContinueRunningunless overridden)? Or does eachinvoke()reset to the defaultAbortDependentsunless explicitly set? The phrase "propagates unless overridden" implies inheritance, which is reasonable, but theinvoke()default (operation-registry.md L213, ADR-016 L146) showscontext.env.invoke(...)with no policy = default. So is "no policy arg" = "inherit parent's" or "reset to AbortDependents"? These conflict.
Suggestion: (1) Change "ADR-016 Assumption 5" to "Assumption 5" within
ADR-016. (2) Clarify: does invoke() with no policy argument (a) inherit the
parent's current policy, or (b) reset to AbortDependents? Recommend (b)
reset-to-default for predictability (each composition is a fresh decision)
unless the parent opts the child into ContinueRunning, and document that
ContinueRunning does not auto-propagate to grandchildren (the child must
re-affirm it for each grandchild). Or pick (a) and update the invoke()
examples. Either is defensible; pick one.
W20. ADR-023's from_openapi Error Code Example Collides With Protocol-Level NOT_FOUND
Documents: ADR-023 (Assumption 3 L366–373, Decision 5 L241)
Problem: ADR-023 Assumption 3 acknowledges the collision risk: an operation
declaring NOT_FOUND as a domain code collides with the protocol-level
NOT_FOUND (operation not registered). The resolution (L367–368): "the
protocol code takes precedence in the dispatch machinery." And (L372–373):
"The assumption is that operations use domain-specific codes (FILE_NOT_FOUND)
rather than reusing protocol codes (NOT_FOUND). This is a naming convention,
not a type-system enforcement."
But Decision 5's example (L241) shows from_openapi mapping a 404 to
ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), … } — i.e.,
from_openapi does produce a domain code that collides with a protocol
code. So the "naming convention" is violated by the very adapter the ADR
specifies. An implementer following Decision 5's example would create a
FILE_NOT_FOUND-vs-NOT_FOUND ambiguity: when call.error carries
code: "NOT_FOUND", is it "operation not registered" (protocol) or "file not
found" (operation-level, from a 404)? The details field disambiguates
(present for operation-level, absent for protocol-level), but the code alone
is ambiguous, and ADR-023 L171 says "Clients should switch on code, not
parse message."
Suggestion: Either (a) make from_openapi prefix/namespace the imported
error codes (e.g., HTTP_404 or <op>_NOT_FOUND) to avoid collision, or (b)
update Decision 5's example to use a non-colliding code and add a normative
rule that from_openapi must not produce codes that match the five
protocol-level codes. Assumption 3's "naming convention" should be elevated to
a requirement for adapters, since the adapter example breaks it.
W21. ScopedOperationEnv Public Field Leaks the HashSet Representation — Future Subgraph Refactor Is Breaking
Documents: ADR-022 (L208–214, L97–105), operation-registry.md (L188, L328)
Problem: ADR-022 L97–105: "For v1, the scoped env can be a
HashSet<String> of reachable operation names. A dedicated
alknet-flowgraph crate … is a future enhancement — not a prerequisite for
the security model."
The HashSet<String> representation is fine internally. The problem is that
it leaks into the public API in a way that makes the future refactor a
breaking change:
ScopedOperationEnvis a public struct with a public field. ADR-022 L208–214 showspub struct ScopedOperationEnv { pub allowed_operations: HashSet<String> }. The assembly layer and agent crate construct this directly:ScopedOperationEnv { allowed_operations: HashSet::from(…) }(operation-registry.md L188, L328). If the representation later changes to a subgraph type (with type-compatibility edges, as the flowgraph model requires), every construction site breaks — the field type changes fromHashSet<String>toSubgraphorVec<Edge>. This is a public-API breaking change, not an internal refactor.
The framing says the flowgraph crate is "not a prerequisite for the security model." That's true — the security model works with a flat set. But the refactor from flat set to subgraph is not the trivial "two-way door" the framing implies. It's a breaking change to construction sites.
Additionally, this interacts with OQ-19 (session-scoped registries): session
ops compose under a scoped env. If the scoped env is a flat HashSet<String>,
session ops get reachability control (can this op be reached?) but NOT
type-compatibility validation (does the output of op A feed validly into op
B?). A session op (untrusted, LLM-written) that composes an op whose output
type doesn't match the next op's input gets a runtime error, not a static
rejection. The flowgraph deferral means session ops ship without static type
validation — a security-relevant gap for untrusted code.
Suggestion: Make allowed_operations private (not pub), with a
constructor ScopedOperationEnv::new(ops: impl IntoIterator<Item = String>)
and a query method allows(&self, name: &str) -> bool. This keeps the
representation internal and makes the future subgraph refactor a non-breaking
change. Add to the flowgraph deferral: "The HashSet<String> representation
does not support type-compatibility validation. Session-scoped ops (OQ-19,
untrusted code) compose without static type checking until the flowgraph
crate is built. This is a known gap for the untrusted-code path, not just a
visualization feature."
Suggestions
S1. Inline Decision Rationale in Spec Docs That Should Be in ADRs
Documents: call-protocol.md (L341, L343–344, L124), operation-registry.md (L202, L364–369, L377), crates/vault/encryption.md (L27–37), crates/vault/mnemonic-derivation.md (L31–46)
Problem: Per the architect's documentation principles, spec docs should reference ADRs by number, not contain the rationale. Several spec sections contain why rationale that duplicates or extends ADR content:
- call-protocol.md L341 duplicates ADR-016 Decision 2's rationale ("the correct default because aborted parent work has no consumer…") verbatim.
- operation-registry.md L202 explains why metadata doesn't propagate (the secret-leak scenario) — this is ADR-014's rationale, restated inline.
- operation-registry.md L364–369 explains the
Clonesemantics ofCapabilitiesand the security implication — this is a design rationale, not in any ADR. It should be in ADR-014 or ADR-022. - operation-registry.md L377 ("
from_calltrust is transitive… a compromised remote node can do anything") is a threat-model statement that belongs in ADR-017. - crates/vault/encryption.md L27–37 (AES-256-GCM rationale) and mnemonic-derivation.md L31–46 (HD derivation rationale) have no ADRs (see W9).
Suggestion: Move the unique rationale to the relevant ADRs (ADR-016
already has the abort-dependents rationale — just cite it; add
capabilities-clone rationale to ADR-014 or ADR-022; add from_call transitive
trust to ADR-017) and replace the inline rationale with ADR references. Keep
only the constraint statement in the spec.
S2. Untracked "Two-Way Door" Deferrals Scattered Through Specs
Documents: operation-registry.md (L383, L257, L364, L369), call-protocol.md (L333), ADR-017 (L236–237, L279)
Problem: Per the documentation principles, inline open questions should be
in open-questions.md. Several "two-way door" / "future work" notes are
scattered through the specs without being tracked as open questions:
- operation-registry.md L383: "Two-way door —
ArcSwap<OperationRegistry>can be added later" (contradicts the "immutable after construction" constraint at L14, L194, L340). - operation-registry.md L257: "Future work may add irpc service dispatch and remote call protocol dispatch as additional backends."
- operation-registry.md L364: "The concrete shape of the type… is a two-way door for implementation."
- operation-registry.md L369: "The concrete cloning semantics… is a two-way door for implementation."
- call-protocol.md L333: timeout policy (30s default, adapter-level config) — no ADR governs this.
- ADR-017 L236–237, L279: CallClient registry mechanism and
from_callhot-swapping.
Suggestion: Add the genuinely-open items (Capabilities concrete shape,
Capabilities clone semantics, OperationAdapter trait signatures, CallClient
registry mechanism, to_* adapter gap semantics, from_call hot-swap,
registry hot-swap via ArcSwap, timeout policy) to open-questions.md with
explicit status. For items that are decided deferrals, record the deferral
in the relevant ADR's Consequences rather than as inline spec commentary.
S3. Missing ADR References for Design Choices ("Why X Over Y?")
Documents: call-protocol.md (L86, L116, L262–272, L333), operation-registry.md (L64–67, L120–127, L139)
Problem: Several design choices are stated in the spec with no ADR backing:
- Binary payload convention (call-protocol.md L86): base64-in-JSON. Why not a separate binary event type? No ADR.
auth_tokenin payload (call-protocol.md L116–122, L262–272): per- request identity upgrade via an optionalauth_tokenfield. Significant security-model decision with no ADR.- Per-request vs per-connection identity (L272): "Identity is resolved per-request, not per-connection." No ADR.
- Timeout policy (L333): 30s default, adapter-level config. No ADR.
- Module-private
internalwrites (operation-registry.md L120–127): the enforcement mechanism for "handlers can't setinternal." The requirement is from ADR-015; the mechanism (Rust visibility) has no ADR. - Namespace derivation (operation-registry.md L66): "
namespaceis derived from the name." Why derived rather than independently declared? No ADR.
Suggestion: For load-bearing choices (binary convention, auth_token / per-request identity, timeout policy), either add an ADR or extend an existing one. For minor choices (namespace derivation, visibility pattern), a one-line note suffices, but they shouldn't be silently decided in the spec.
S4. README ADR Tables Missing Entries and Status Column
Documents: crates/call/README.md (L19–36), crates/core/README.md (L21–31), crates/vault/README.md (L38–47)
Problem: Three issues across the crate READMEs:
- call README omits ADR-013 (Rust canonical implementation), which both call spec docs reference (call-protocol.md L88, operation-registry.md L24).
- core README omits ADR-015, referenced by auth.md L78, L200 and config.md L200.
- vault README omits ADR-010 and ADR-006, referenced by mnemonic-derivation.md L26 and protocol.md L290.
- None of the crate READMEs have a Status column in their ADR tables, which contributes to C2 (reader can't tell which ADRs are binding).
Suggestion: Add the missing ADRs to each crate README's ADR table. Add a Status column (Accepted/Proposed/Superseded) to make the governance state visible — this directly supports resolving C1/C2.
S5. Connection::new() Referenced but Never Defined
Documents: crates/core/core-types.md (L113), crates/core/endpoint.md (L127), ADR-010 (L117)
Problem: Connection::new(connection) is called in dispatch pseudocode
(endpoint.md L127, ADR-010 L117) but the impl Connection block in
core-types.md:59–67 does not include a new() constructor, nor specify its
signature/parameter. core-types.md:113 says "Connection::new() wraps the
appropriate stream source based on where the connection came from" — but
there's no formal signature, and the argument type differs by source (a
quinn::Connection vs an iroh Accepting/Connecting). An implementer
doesn't know the constructor's parameter type.
Suggestion: Add pub fn new(source: ConnectionSource) -> Connection (or
two constructors from_quinn / from_iroh, or a private enum) to the
impl Connection block. Define ConnectionSource or the variant
constructors.
S6. services/schema Input Shape Under-Specified (Leading Slash)
Documents: operation-registry.md (L286), call-protocol.md (L120, L126)
Problem: The input shape for services/schema is shown as
{ "name": "fs/readFile" } (no leading slash) but the wire operationId
convention (call-protocol.md L120, L126) is with a leading slash
(/fs/readFile). So does services/schema's name input use the wire form
(leading slash) or the registry form (no slash)? The spec shows no slash,
implying registry form, but a client calling services/schema over the wire
would naturally use the same operationId form it uses elsewhere. This is a
small ambiguity but could cause a NOT_FOUND if the adapter doesn't
normalize.
Suggestion: Specify whether services/schema's name input has a
leading slash and whether the adapter strips it (consistent with
call.requested's operationId handling per call-protocol.md L120).
S7. call.completed For Non-Subscription Calls Not Addressed
Documents: call-protocol.md (L96–106)
Problem: The event table lists call.completed as "Signal end of
subscription stream" — subscription-only. But for a plain call()
(request/response, one result), the lifecycle is call.requested →
call.responded (one) and then… nothing? Is the request considered complete
on the first call.responded? Is a call.completed ever sent for
non-subscription calls? The spec says (L103) "call(): Sends call.requested,
resolves on first call.responded" — so no call.completed for calls. But
then what signals to the server that the client is done listening? This is
consistent but the asymmetry (subscriptions get completed, calls don't)
could be stated explicitly to avoid an implementer sending a spurious
call.completed after a call.responded.
Suggestion: Add a note: "call.completed is sent only for subscriptions.
A plain call() is complete after its single call.responded; no
call.completed follows."
S8. OperationProvenance::FromCall Comment Is Ambiguous ("Leaf Locally")
Documents: ADR-022 (L123–124)
Problem: FromCall => "QUIC forwarding stub (from_call), leaf locally —
cannot compose." The "locally" qualifier is ambiguous. Does it mean: (a)
from_call ops are leaves in the local registry but could compose on the
remote node? Or (b) they're leaves, full stop, and "locally" is just
emphasis? Assumption 5 (L518–524) clarifies, but the phrasing invites
misreading.
Suggestion: Rephrase to "leaf in the local registry — forwards calls to a remote node; cannot compose locally" or similar to remove the ambiguity.
S9. ScopedOperationEnv::empty() and CompositionAuthority::none() Referenced but Undefined
Documents: operation-registry.md (L182–184, L243–244, L298–299, L318–320), ADR-022 (L287)
Problem: The spec and ADR-022 use ScopedOperationEnv::empty() and
CompositionAuthority::none() (constructor methods) but neither type's
definition (ADR-022 L164–180 for CompositionAuthority, L208–214 for
ScopedOperationEnv) lists these methods. They're implied (empty set /
None-equivalent) but not shown.
Suggestion: Add impl ScopedOperationEnv { pub fn empty() -> Self { … } pub fn allows(&self, name: &str) -> bool { … } } and
impl CompositionAuthority { pub fn none() -> Option<Self> { None } } (or
similar) to the spec or ADR-022.
S10. Overview OQ-04 Entry Is Stale
Documents: overview.md (L225)
Problem: overview.md:225 lists "OQ-04: Dynamic handler registration at
runtime vs static at startup (two-way door, defer to implementation)." But
OQ-04 in open-questions.md:47–51 is resolved ("Static registration at
startup"). The overview shows a stale framing ("defer to implementation")
while the OQ is resolved. Compare to OQ-01/OQ-02/OQ-03 which are correctly
marked "resolved" in the same overview list.
Suggestion: Update overview.md:225 to "OQ-04: Dynamic handler
registration (resolved: static at startup — see ADR-010)" to match the
resolved state.
S11. ADR-021 References "W1 Drift Issue from the Vault Review" — Ambiguous Cross-Reference
Documents: ADR-021 (L127–129)
Problem: ADR-021 L127–129 states: "This is the fix for the W1 drift issue
from the vault review: the current source ignores key_version for key
selection; the spec now makes it functional." But review 001's W1 (L355–371)
is about ADR-020's stale example path (m/74'/2'/1'/0' vs
m/74'/2'/0'/1'), not about the source ignoring key_version. The
source-ignoring-key_version drift is not tracked in review 001 at all. The
"W1" label in ADR-021 doesn't match review 001's W1.
Suggestion: Clarify the reference — either cite review 001 with the correct finding number, or remove the "W1" label and describe the drift directly (which the text already does).
S12. Duplicate "Helper Functions" Block in mnemonic-derivation.md
Documents: crates/vault/mnemonic-derivation.md (L199–204, L217–223)
Problem: The helper functions device_path and site_password_path are
documented twice: L199–204 (missing encryption_path_for_version) and
L217–223 (complete). An implementer reading the first block might not realize
encryption_path_for_version exists, or might think there are two conflicting
definitions. The duplication is a maintenance hazard.
Suggestion: Remove the first block (L199–204); keep the complete list at L217–223.
Cross-Cutting Theme: ADR-022 Narrowed Several "Two-Way Doors" Without Updating the OQ Classifications
The two-way-door audit found a systematic pattern: ADR-022 (Proposed, not yet Accepted) is the single largest source of door-type staleness. It narrowed several doors without updating the OQ classifications that were written pre-ADR-022:
| Door | Pre-ADR-022 classification | Post-ADR-022 reality |
|---|---|---|
| Capabilities concrete shape (W2) | Two-way (implementation detail) | Clone semantics become a behavioral contract once handlers depend on shared mutation; ADR-022's composition model bakes in the cheap-clone assumption |
| OperationAdapter trait signatures (W1) | Two-way (async vs sync) | ADR-022 committed the return type to HandlerRegistration; from_call forces async — both are already decided |
| CallClient registry mechanism (W3) | Two-way (share global vs subset vs separate) | ADR-022 put capabilities (secrets) in the bundle — "share global" is now a capability-exposure decision, not just dispatch |
| Dynamic registration (W4) | Two-way (add ArcSwap later) | Hot-swap now requires re-injecting capabilities (secret-material handling) + TLS config rebuild, not a pointer swap |
| Session-scoped registries (OQ-19) | Two-way (protocol doesn't need changes) | OperationContext.env field type is a current commitment — if it's a concrete struct, the session-overlay pattern is closed (see C6) |
| Graph structures (W21) | Two-way (HashSet now, flowgraph later) | ScopedOperationEnv's public field leaks the representation; the refactor is a public-API break, not an internal change |
Pattern: ADR-022 was written to resolve review #001's C1–C4 (the registration-bundle tangle), and in doing so it quietly narrowed several previously-two-way doors. The OQ tracker and ADR-017 should be reconciled with ADR-022 before the "two-way door" labels can be trusted. ADR-009's framework is sound; its application is drifting because later ADRs don't update earlier OQ classifications.
The fix is not to re-litigate each door — most of ADR-022's narrowing is
correct (the bundle should carry capabilities, the adapter should produce
HandlerRegistration). The fix is to add guard clauses to the affected OQs
and ADRs documenting the narrowed scope, so an implementer reading the OQ
tracker sees the post-ADR-022 reality, not the pre-ADR-022 optimism.
Cross-Cutting Theme: The "Published Artifact Is a Contract" Blind Spot
The audit found a systematic blind spot in ADR-009's framework: it classifies doors by reversal cost in the codebase, not by compatibility cost for external consumers. Three deferrals share this pattern:
to_*best-effort mappings (W5): "best-effort" is true before first publication but one-way after — external clients depend on the generated spec.- Error schema
http_statusmappings (W20): onceto_openapipublishesFILE_NOT_FOUND→ 404, changing it breaks clients. - The salt field's wire format (W6): the field is "reserved" (wire-format- stable), but its semantics cannot change for existing data — "reserved" implies more flexibility than exists.
The framework treats these as two-way because the format is additive; it misses that the semantics, once published, are frozen. This is the same class of issue as a public API: adding a method is additive, but changing an existing method's behavior is breaking. The spec should acknowledge that published artifacts (generated OpenAPI specs, published error code mappings, shipped wire formats) are compatibility contracts, not internal implementation details.
Summary Statistics
| Severity | Count |
|---|---|
| Critical | 13 |
| Warning | 21 |
| Suggestion | 12 |
Recommended Resolution Order
The findings cluster into a few resolution passes. Recommended order:
Pass 1: Governance reconciliation (resolves C1, C2, C12, C11)
- Advance ADR-022 and ADR-023 to Accepted.
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
update the struct to
CompositionAuthority. - Amend ADR-002: note the signature was revised by ADR-007.
- Amend ADR-004: note the "enrich/replace" language was superseded by ADR-011's immutability model.
- Add a Status column to all crate README ADR tables.
This is the highest-value pass — it resolves the direct contradictions in the document set and makes the governance state coherent.
Pass 2: Spec-ADR consistency (resolves C3, C4, C5, C6, C10, C13, W1)
- Add
abort_policytoOperationContext; defineAbortPolicyenum (C3). - Define the
OperationEnvtrait explicitly, includinginvoke_with_policy()(C4). - Separate
OperationContext.env(dispatch trait) fromOperationContext.scoped_env(reachability set) — resolve the type identity crisis (C6). - Update ADR-017's
from_callmirror list andOperationAdaptertrait to includeerror_schemasandHandlerRegistration(C5, W1). - Specify
From<StreamError> for HandlerErrorand resolve the variant payload (C10). - Specify the
Connection::set_identityobservability read path (C13).
Pass 3: Vault security and deferral reconciliation (resolves C7, C8, C9, W6, W7, W8, W9, W10, W12)
- Promote OQ-21 from "deferred" to "open"; reconcile with ADR-019; decide the vault-server-crate question (C7).
- Complete the operation access policy table (C8).
- Specify the
site_password_pathstring→u32 mapping (C9). - Add an ADR for the HD-derivation key model (identity/SSH/signing) (W9).
- Add an ADR locking the
EncryptedDatawire format (W10). - Add a "Known Source Drift" table to the vault README (W12).
- Address
unlock_newreturning non-zeroizingString(W7) andDerivedKeyJSON deserialization (W8). - Clarify the salt field's "reserved" semantics (W6).
Pass 4: Two-way-door guard clauses (resolves W2, W3, W4, W5, W17, W18, W21)
- Add guard clauses to the OQs and ADRs that ADR-022 narrowed (W2, W3, W4, W21).
- Add the "published artifact is a contract" guard to
to_*adapters (W5). - Correct OQ-09 (WASM server-side) and OQ-10 (git composability) framings (W17, W18).
Pass 5: Minor consistency (resolves remaining Warnings and Suggestions)
- Fix the overview.md "stream" → "connection" wording (W16).
- Fix ADR-016's self-referential citation and grandchild propagation (W19).
- Fix ADR-023's
from_openapicollision example (W20). - Address the remaining suggestions (S1–S12).
Note on Effort
This review found more issues than review #001 (13 critical vs 5, 21 warning vs 4), but that's expected: review #001 caught the registration-bundle tangle and the missing error schemas, which were the highest-density gaps. This review goes wider — cross-document consistency, the two-way-door framing, and the vault deferrals — and surfaces issues that were always there but not yet audited. The critical findings here are individually smaller than review #001's C1–C5 (no single tangle of four interlocking gaps), but they are collectively important because several are governance issues (Proposed ADRs treated as binding, Accepted ADRs contradicting later refinements) that undermine the document set's authority.
The two-way-door audit's core finding: the framework is sound, but its application is drifting because later ADRs don't update earlier OQ classifications. The fix is not to re-litigate each door — it's to add guard clauses documenting the narrowed scope, so the OQ tracker reflects post-ADR-022 reality.