Files
alknet/docs/reviews/002-pre-implementation-architecture-sanity-check.md
glm-5.2 8f8a8a48f9 docs(reviews): add pre-implementation architecture sanity check #002
Second pre-implementation review. Goes wider than #001 on cross-document
consistency and the two-way-door framing from ADR-009.

Finds 13 critical, 21 warning, 12 suggestion issues:
- Governance: ADR-022/023 are Proposed but specs treat them as binding;
  ADR-015/002/004 (Accepted) contradict later refinements without supersession
  markers
- Abort policy (ADR-016) missing from OperationContext struct; OperationEnv
  trait never defined
- OperationContext.env type identity crisis (reachability set vs dispatch
  trait)
- ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait
  stale vs ADR-022 bundle
- OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides
  a crate-decomposition decision; RemoteService path unvalidated
- Vault operation access policy table incomplete for security-sensitive methods
- site_password_path string-to-index mapping breaks determinism guarantee
- Two-way-door audit: ADR-022 narrowed several doors without updating OQ
  classifications; 'published artifact is a contract' blind spot in framework

Includes recommended 5-pass resolution order.
2026-06-22 05:09:39 +00:00

1780 lines
90 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
status: draft
last_updated: 2026-06-22
reviewed_documents:
- overview.md
- README.md
- open-questions.md
- crates/core/README.md
- crates/core/core-types.md
- crates/core/endpoint.md
- crates/core/auth.md
- crates/core/config.md
- crates/call/README.md
- crates/call/call-protocol.md
- crates/call/operation-registry.md
- crates/vault/README.md
- crates/vault/mnemonic-derivation.md
- crates/vault/encryption.md
- crates/vault/service.md
- crates/vault/protocol.md
- decisions/001 through 023 (all 23 ADRs)
reviewer: architect
---
# Architecture Gap Review #002
## Purpose
This is the second pre-implementation sanity check. Review #001 caught the
registration-bundle tangle (C1C4) and the missing error schemas (C5), both
resolved via ADR-022 and ADR-023. This review goes deeper on two axes the
first review under-covered:
1. **Cross-document consistency**: places where an Accepted ADR contradicts a
Proposed ADR, where a spec doc adopted a Proposed ADR's types without the
ADR being advanced, and where a later ADR narrowed a "two-way door"
deferral without updating the OQ classification.
2. **Two-way-door deferral audit**: the user's concern that the one-way/two-way
door framing from ADR-009 has been applied too loosely. Some deferrals are
legitimate, but several are questionable or misleading — either because a
later ADR quietly narrowed the door, because the "path of least resistance"
implementation forecloses a better option, or because multiple
individually-reversible deferrals compound to foreclose a path that no
single deferral would block on its own.
Review #001 resolved 5 critical, 4 warning, and 4 suggestion findings. This
review finds a different set: the registration-bundle work surfaced new
inconsistencies between ADR-015 (Accepted) and ADR-022 (Proposed), the abort
policy (ADR-016) was never reflected in the OperationContext struct, the
OperationEnv trait was never defined, and several vault deferrals have
security dimensions that the two-way-door framing doesn't surface.
## Severity Definitions
| Severity | Meaning |
|----------|---------|
| **Critical** | Must resolve before implementation — would cause divergent implementations or security issues |
| **Warning** | Should resolve — missing specifications that could cause confusion or rework |
| **Suggestion** | Consider — improvements for clarity or completeness |
---
## Critical Findings
### C1. ADR-015 Is Not Reconciled With ADR-022 — Direct Contradiction in the Document Set
**Documents**: ADR-015 (Status: Accepted, Decision 3 L114136, Assumption 6
L275280), ADR-022 (Status: Proposed, Decision 2 L145186), operation-registry.md
(L115, L133, L172), call-protocol.md (L296)
**Problem**: ADR-015 is marked **Accepted** and defines
`handler_identity: Option<Identity>` in its Decision 3 (L124) — a peer `Identity`
type resolved via `IdentityProvider`. Assumption 6 (L275280) explicitly states
"the handler identity is a full `Identity` (with scopes), not a special
principal type… This reuses the existing `Identity` type and `IdentityProvider`
infrastructure (ADR-004)."
ADR-022 is marked **Proposed** and explicitly supersedes ADR-015's Assumption 6
(L147149: "This ADR refines that: composition authority is a declared authority
bundle, not a peer `Identity`"; L183185: "This supersedes ADR-015's Assumption
6"). The spec docs now use `handler_identity: Option<CompositionAuthority>`
a different, non-peer type.
This is a **direct contradiction**. Per the ADR lifecycle, a superseded
decision should be marked `Superseded` or carry an inline note pointing to the
superseding ADR. ADR-015 has neither. Worse, ADR-015 has **higher status**
(Accepted) than the ADR that supersedes it (Proposed). An implementer reading
ADR-015 in isolation would build `Identity`-based handler identity; an
implementer reading the specs would build `CompositionAuthority`-based. The
Accepted-overrides-Proposed governance rule means the old `Identity` definition
is, by the document set's own lifecycle, currently authoritative — yet the
specs have already adopted the proposed change.
This is the same shape of bug as review #001's W1 (ADR-020's stale example
path contradicting the locked ADR-021 scheme), but more severe: it's a
security-model type, not a path string.
**Suggestion**: Two things must happen before implementation:
1. Advance ADR-022 and ADR-023 to **Accepted**. The specs already depend on
them as the source of truth for types that appear throughout the code.
Keeping them Proposed while the specs adopt their types is incoherent
governance. (See C2.)
2. Reconcile ADR-015: either update its Decision 3 and Assumption 6 to point
to ADR-022 (inline amendment note + change `Identity` to
`CompositionAuthority` in the struct), or mark the relevant sections
`Superseded by ADR-022 (Decision 2)`.
The cleanest path is to do both in one pass: advance ADR-022/023 to Accepted,
amend ADR-015 to reference the supersession. This resolves C1, C2, and C6 in
one move.
---
### C2. ADR-022 and ADR-023 Are "Proposed" but the Specs Treat Them as Authoritative
**Documents**: ADR-022 (Status: Proposed), ADR-023 (Status: Proposed),
operation-registry.md (L40, L115, L133, L149, L158, L160168, L286, L372379,
L404406), call-protocol.md (L296, L305, L369373), README.md (ADR table
L3660)
**Problem**: Both ADR-022 and ADR-023 are marked **Proposed**. Yet the spec
docs depend on them pervasively as the source of truth:
- `OperationSpec.error_schemas` (ADR-023) — operation-registry.md L40, L286.
- `ErrorDefinition` (ADR-023) — operation-registry.md L5661.
- `HandlerRegistration` with `provenance`, `composition_authority`,
`scoped_env`, `capabilities` (ADR-022) — operation-registry.md L160168,
the central registration type.
- `OperationContext.handler_identity: Option<CompositionAuthority>` (ADR-022
Decision 2) — operation-registry.md L115, L133; call-protocol.md L296.
- `CompositionAuthority`, `ScopedOperationEnv`, `OperationProvenance`
(ADR-022) — used throughout.
The README's ADR table (L3660) shows ADR-022 and ADR-023 as "Proposed," but
the call-crate README's ADR table has **no Status column** at all — a reader
can't tell which ADRs are binding. Per the ADR lifecycle, "Proposed" means
under review, not yet a commitment.
Combined with C1 (ADR-015 Accepted still contradicts ADR-022 Proposed), the
governance state is incoherent: the Accepted ADR says `Identity`, the Proposed
ADR says `CompositionAuthority`, and the specs have picked the Proposed one.
**Suggestion**: Advance ADR-022 and ADR-023 to **Accepted** before
implementation. They are clearly intended to be binding — the specs already
implement them. Add a Status column to the crate README ADR tables so the
binding/non-binding distinction is visible. This is the governance fix that
unblocks C1.
---
### C3. The Abort Policy (ADR-016 Decision 6) Is Missing from `OperationContext` and `OperationEnv::invoke()`
**Documents**: ADR-016 (Decision 6 L120164, Accepted), operation-registry.md
(`OperationContext` struct L111128, `LocalOperationEnv::invoke()` L211249),
call-protocol.md (L124, L345)
**Problem**: ADR-016 Decision 6 states unambiguously (L122125): "The abort
policy (`abort-dependents` vs `continue-running`) is set on `OperationContext`
and propagated to children via `OperationEnv::invoke()`." It shows
`OperationContext` carrying the policy (L154) and an opt-in
`invoke_with_policy(...)` method (L149151).
The `OperationContext` struct in operation-registry.md (L111128) has **no
`abort_policy` field**. The fields shown are: `request_id`,
`parent_request_id`, `identity`, `handler_identity`, `capabilities`,
`metadata`, `env`, `internal`. No policy. The `LocalOperationEnv::invoke()`
implementation (L211249) likewise constructs a child `OperationContext` with
no policy field and takes no policy parameter.
call-protocol.md L124 and L345 correctly *state* that the policy is on
`OperationContext` and propagated via `OperationEnv::invoke()`, but the actual
struct definition in operation-registry.md — the one an implementer would copy
— omits it. This is a gap between ADR-016 (Accepted) and the spec. An
implementer copying the spec would simply omit the abort policy, and the
`continue-running` opt-in would have no wiring point.
**Suggestion**: Add `pub abort_policy: AbortPolicy` (with a default of
`AbortPolicy::AbortDependents`) to the `OperationContext` struct in
operation-registry.md, and define the `AbortPolicy` enum
(`AbortDependents | ContinueRunning`). The `build_root_context` in
call-protocol.md should set `abort_policy: AbortPolicy::AbortDependents` (the
default per ADR-016 L140). See C4 for the `invoke()` side.
---
### C4. `OperationEnv` Trait Is Never Defined; `invoke_with_policy()` Absent from Spec
**Documents**: operation-registry.md (L196259), ADR-016 (Decision 6 L144164)
**Problem**: Two related gaps:
1. **The `OperationEnv` trait is never defined.** operation-registry.md
repeatedly references `OperationEnv` as a **trait** that "must remain a
trait" (L259, also OQ-19), and shows
`#[async_trait] impl OperationEnv for LocalOperationEnv` (L211212), but
**the trait definition itself is never shown**. Only the `LocalOperationEnv`
struct (L207209) and its impl block (L211249) appear. An implementer
cannot know the trait's method signatures, return types, or whether
`invoke()` is the sole method.
2. **The `invoke_with_policy()` opt-in is absent.** ADR-016 Decision 6
(L144152) shows the policy entering through `OperationEnv::invoke()` via
`invoke_with_policy("fs", "readFile", input, &context,
AbortPolicy::ContinueRunning)`. The spec's `OperationEnv::invoke()`
signature (L213) takes no policy parameter, and there is no
`invoke_with_policy` method, no `InvokeOptions` struct, no builder
pattern. The spec never defines the trait surface, so the policy opt-in
mechanism ADR-016 mandates is unspecified.
This is load-bearing because OQ-19 makes the trait-ness of `OperationEnv` a
**constraint**: session-scoped registries depend on wrapping the global env
via trait layering. If the trait surface is unspecified, two implementers
could define it differently, breaking the wrapping pattern.
**Suggestion**: Define the `OperationEnv` trait explicitly in
operation-registry.md, showing at minimum:
```rust
#[async_trait]
pub trait OperationEnv: Send + Sync {
async fn invoke(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
) -> ResponseEnvelope;
/// Opt-in: compose a child with a non-default abort policy.
async fn invoke_with_policy(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
policy: AbortPolicy,
) -> ResponseEnvelope {
// default implementation: set policy on child context, delegate to invoke
}
}
```
This satisfies both the "must remain a trait" constraint and ADR-016's
policy parameter. See C7 for a related type ambiguity on the same struct.
---
### C5. ADR-017's `from_call` Mirror List Does Not Include `error_schemas` — Inconsistency with ADR-023
**Documents**: ADR-017 (Decision 3 L113122, Assumption 4 L274279, Accepted),
ADR-023 (Decision 1 L112147, L4247, L257, Proposed)
**Problem**: ADR-017 Decision 3, step 3 (L118119) says `from_call` constructs
a spec that "mirrors the remote operation's name, namespace, type, schemas,
and access control." Assumption 4 (L274275) repeats: "the imported
`OperationSpec` has the same name, namespace, type, schemas, and access
control as the remote operation." The list is: name, namespace, type,
schemas, access control. There is **no `error_schemas`** in either list.
ADR-023 adds `error_schemas: Vec<ErrorDefinition>` to `OperationSpec` (L123)
and Decision 5 (L236261) makes adapter fidelity explicit for
`from_openapi`/`to_openapi`/`from_mcp`/`to_mcp`. But ADR-023 does **not**
mention `from_call` in Decision 5 — it covers OpenAPI and MCP adapters only.
Meanwhile ADR-017's `from_call` mirror list predates `error_schemas` existing
as a field.
The result: it's ambiguous whether `from_call` should mirror `error_schemas`.
ADR-017's step 2 (L116117) says `from_call` calls `services/schema` "for each
→ gets the input/output JSON Schemas" — again, no mention of error schemas.
ADR-023 Decision 6 (L264286) confirms `services/schema` *returns*
`error_schemas`, so the data is available, but ADR-017 doesn't say to consume
it.
An implementer following ADR-017 literally would import operations with empty
`error_schemas`, silently dropping the remote error contract — the exact
lossy-import problem ADR-023 was written to solve for OpenAPI/MCP, now
recurring for `from_call`.
**Suggestion**: Update ADR-017 Decision 3 step 3 and Assumption 4 to include
`error_schemas` in the mirror list (name, namespace, type, schemas,
**error_schemas**, access control), and update step 2 to note that
`services/schema` now also returns error schemas. Add a cross-reference to
ADR-023. This is an inconsistency between an Accepted ADR (017) and a later
ADR (023) that must be reconciled — same shape as review #001's W3
(`from_mcp` missing from adapter lists), but for a new field.
---
### C6. `OperationContext.env` Has a Type Identity Crisis — Reachability Set vs Dispatch Trait
**Documents**: ADR-015 (L127: `pub env: OperationEnv`), ADR-022 (L208215
`ScopedOperationEnv`, L286287 `env: registration.scoped_env…`),
operation-registry.md (L118: `pub env: OperationEnv`; L220:
`parent.env.allows(&name)`; L243244: `env: registration.scoped_env.clone()…`;
L198: `context.env.invoke(...)`), call-protocol.md (L298299)
**Problem**: There is a type identity problem for the `env` field on
`OperationContext`:
- ADR-015 L127 and operation-registry.md L118 declare `pub env: OperationEnv`
(a trait object / the trait type).
- ADR-022 and operation-registry.md L243244 and call-protocol.md L298299
*populate* `env` with a `ScopedOperationEnv` (a concrete struct:
`allowed_operations: HashSet<String>`, ADR-022 L208214).
`ScopedOperationEnv` (a reachability set — ADR-022's reachability control)
and `OperationEnv` (the composition dispatch trait — the thing with
`invoke()`) are **different concepts**. ADR-015 conflated them by putting
`OperationEnv` on the context; ADR-022 introduced `ScopedOperationEnv` as the
reachability *input* but the dispatch path assigns the scoped set directly to
`env:`.
operation-registry.md L220 calls `parent.env.allows(&name)` — implying `env`
is the `ScopedOperationEnv` (has `.allows()`). But the same field is named
`OperationEnv` in the struct and described (L136) as "the operation environment
for composing calls… `OperationEnv`" with `invoke()`. A handler calls
`context.env.invoke(...)` (L198, L200) — implying `env` is the dispatch
trait. One field cannot be both a `HashSet`-based reachability gate and a
trait with an async `invoke()` method, unless they are the *same type* (they
are not — ADR-022 treats `ScopedOperationEnv` as data and the dispatch
`OperationEnv`/`LocalOperationEnv` as the trait impl that *consults* the
scoped env).
This is a genuine type-system ambiguity that would force an implementer to
guess: is `context.env` the dispatch trait (then where does the scoped
reachability set live?), or is it the reachability set (then how does the
handler call `invoke()` on it?), or is it a composite type that holds both?
This interacts with OQ-19's constraint that `OperationEnv` must remain a trait:
if `env` is `ScopedOperationEnv` (a concrete struct), the session-overlay
pattern is closed because the field can't hold a trait object.
**Suggestion**: Separate the two concepts explicitly. Likely:
`OperationContext` carries `scoped_env: ScopedOperationEnv` (reachability
data, used for the `allows()` check) and a separate `env: Box<dyn
OperationEnv>` (or `Arc<dyn OperationEnv>`) that is the dispatch trait (used
for `invoke()`). The reachability check in `invoke()` consults
`parent.scoped_env.allows(&name)`, and the actual dispatch goes through the
`OperationEnv` trait impl. The current field naming (`env` for both) is the
root of the confusion and should be disambiguated. This may warrant an ADR-022
clarification, since it changes the struct shape that ADR-022 Decision 5 shows.
---
### C7. OQ-21's "Non-Breaking" Deferral Conflicts With ADR-019 and Hides a Crate-Decomposition Decision
**Documents**: open-questions.md (OQ-21 L252285), ADR-019 (L133136),
crates/vault/protocol.md (L187216, L278291), crates/vault/service.md
(L300322)
**Problem**: Three interlocking problems:
1. **ADR-019 directly contradicts OQ-21's deferral framing.** ADR-019 L133136
states: *"Remote vault administration (unlock a running node's vault over
the network) is not supported. If needed in the future, it requires a
separate, heavily restricted mechanism (admin scope, mTLS-only, never
expose the mnemonic over an unauthenticated channel) and **its own ADR**."*
OQ-21 defers remote vault *access* (derive/encrypt/decrypt) as
"non-breaking" with no ADR required. The boundary between ADR-019's
"remote vault administration" (needs an ADR) and OQ-21's "remote vault
access" (deferred, no ADR) is undefined. An implementer reading ADR-019
could reasonably conclude that *any* remote vault exposure requires an
ADR; an implementer reading OQ-21 could conclude none is needed.
2. **The "non-breaking" claim is only true for the wire format, not for
secure enablement.** OQ-21 and protocol.md L278286 frame enabling remote
access as a "server-setup change." But the auth-wrapping handler that makes
remote access *safe* cannot live in the vault crate (ADR-018) and requires
either (a) assembly-layer integration or (b) **a new dedicated vault-server
crate depending on both alknet-core and alknet-vault** (protocol.md
L201204, open-questions.md L269). Option (b) is a crate-decomposition
change (ADR-003 territory), not "server setup." OQ-21's "two-way (enabling
is non-breaking)" door-type classification (L256) is misleading: once
workers build dependencies on the remote vault API, disabling it breaks
them — the door is operationally one-way even if the wire format is
additive.
3. **Nested, undecided deferral.** OQ-21 L269 says the auth handler "lives in
the assembly layer **or** a dedicated vault-server crate." This "or" is
itself an undecided architectural choice (affects the dependency graph and
ADR-003's crate decomposition) deferred *inside* the OQ-21 deferral. It is
not tracked as a separate question.
4. **The RemoteService path is unvalidated.** The deferral means
`VaultProtocol`'s `RemoteService` trait impl, `DerivedKey`'s dual
serialization (JSON redacts, postcard preserves), and
`VaultServiceActor`'s transport-agnosticism have never been exercised over
a real network. The spec *asserts* these work ("by construction"), but
there are concrete failure modes that only surface at enablement —
e.g., `DerivedKey`'s custom `Serialize` redacts `private_key` for
"human-readable formats"; whether postcard correctly bypasses the
redaction is a known source of subtle bugs. If it redacts under postcard
too, remote dispatch silently corrupts private keys. ADR-009:33 warns
"if the right choice is unclear, validate with a POC." The deferral skips
the POC and asserts the door is open by construction.
5. **Safety footgun.** The default `IrohProtocol` handler forwards all
message types without auth (protocol.md L189191). If someone naively
registers the ALPN without the wrapping handler, `Unlock` (mnemonic over
the wire) and `Lock` (DoS) become remotely callable. The operation access
policy is documented in prose only; nothing in the vault crate prevents
the insecure configuration.
**Suggestion**: Promote OQ-21 from "deferred" to "open." Reconcile with
ADR-019 by explicitly defining the administration-vs-access boundary. Decide
now whether a vault-server crate is part of the architecture (this is a
crate-decomposition one-way door per ADR-003/ADR-018 and should be decided,
not deferred). If remote access is genuinely deferred, require that the
*enablement* (not just the protocol) be covered by a future ADR — matching
ADR-019's "requires its own ADR" language. Correct the door-type
classification: enabling remote access is a one-way door operationally (workers
depend on it), even if the wire format is additive. At minimum, add a guard
clause: "The RemoteService path is unvalidated. The `DerivedKey` postcard
serialization and the actor's transport-agnosticism must be verified by a
POC before claiming the door is open."
---
### C8. The Operation Access Policy Table Is Incomplete — Security-Sensitive Methods Are Unclassified
**Documents**: crates/vault/protocol.md (L218236), crates/vault/service.md
(L104212)
**Problem**: The "Operation access policy" table (protocol.md L223232)
covers exactly the 8 `VaultProtocol` enum variants. But `VaultServiceHandle`
exposes additional methods not in the protocol enum: `rotate`, `unlock_new`,
`derive_encryption_key_for_version`, `derive_password_string`, `is_unlocked`.
These methods have no access classification — an implementer cannot determine
from the spec whether they are local-only, remote-capable, or simply not
protocol-exposed. This is security-relevant:
- **`rotate`** re-encrypts blobs (security-sensitive operational action;
ADR-021 L168170 says "rotation is an operational action, not a runtime
behavior"). If it were ever exposed remotely, a compromised worker could
trigger mass re-encryption. It is not in the protocol enum and not in the
access policy table — its status is undefined.
- **`unlock_new`** generates and returns the mnemonic (root of trust) as a
plaintext `String`. Not in the protocol enum; not in the policy table. Its
local-only status is implied but not stated.
- **`derive_encryption_key_for_version`** returns an `EncryptionKey` directly
(raw key material), unlike `Encrypt`/`Decrypt` which use the key internally.
If exposed remotely, a worker could extract raw encryption keys. Not
classified.
The spec never states the rule that "service methods not in `VaultProtocol`
are local-only by construction." An implementer could add protocol variants
for these methods without recognizing the security implication, since the
policy table only constrains existing variants.
**Suggestion**: Add an explicit rule: "Service methods absent from
`VaultProtocol` are local-only by default; adding a protocol variant requires
updating this policy table and an ADR if the method is security-sensitive."
Add `rotate`, `unlock_new`, and `derive_encryption_key_for_version` to the
policy table (or an explicit "local-only" list) with rationale. Document the
mapping between service methods and protocol variants.
---
### C9. `site_password_path(site_hash: &str)` Is Underspecified — String-to-Index Mapping Breaks Determinism
**Documents**: crates/vault/mnemonic-derivation.md (L202204, L213, L219222)
**Problem**: The helper `site_password_path(site_hash: &str) -> String`
produces path `m/74'/1'/0'/{site_hash}'`, but SLIP-0010 hardened derivation
indices are `u32` (0 to 2³¹1). A `&str` "site hash" cannot be a derivation
index directly — it must be converted to a `u32`. The conversion (hash
algorithm, truncation, endianness, collision handling) is completely
unspecified. Two implementers could produce different, incompatible mappings:
SHA-256(site) → first 4 bytes → u32; or last 4 bytes; or BLAKE3; or a string
hash. Since the path semantics table (L213) says these are "Ed25519 bytes"
used for "Per-site passwords (not cached)," and determinism is a stated design
principle (L234237: "the same mnemonic + passphrase + path always produces
the same key"), the string→index mapping must be deterministic and specified.
An underspecified mapping means the same site + mnemonic produces different
passwords across implementations — breaking the reproducibility guarantee.
Additionally, "Ed25519 bytes" (L213) as a password source is an undefined
concept. The path produces a 32-byte Ed25519 private key; using those raw
bytes as a password is a non-standard construction with no ADR explaining why
Ed25519-derived material is suitable as password entropy (vs HKDF or a
dedicated KDF).
**Suggestion**: Specify the exact string→u32 mapping (e.g.,
`u32::from_be_bytes(SHA-256(site_hash)[0..4]) & 0x7FFFFFFF` for hardened
index). Add a design note or ADR explaining why Ed25519-derived bytes are
acceptable as password material. Specify the password encoding (raw bytes vs
base64url — service.md L166 says base64url for the string variant, but the
bytes variant's encoding is the caller's concern).
---
### C10. `HandlerError::StreamError(io::Error)` vs `StreamError` Variant — Conversion Gap
**Documents**: crates/core/core-types.md (L3338, L118139),
crates/core/endpoint.md (L248253), crates/core/auth.md (L55, L59), ADR-010
**Problem**: `core-types.md` and `endpoint.md` (citing ADR-010) both define
`HandlerError::StreamError(io::Error)`. But `auth.md:59` shows handler code
calling
`self.identity_provider.resolve_from_token(&token).ok_or(HandlerError::AuthRequired)?`
after `let stream = connection.accept_bi().await?;` (L55). The `.await?` on
`accept_bi()` returns `StreamError` (per `core-types.md:60`), yet there is no
`From<StreamError> for HandlerError` impl specified. The mapping table in
`core-types.md:132137` is descriptive prose, not a trait impl. Without a
specified `From<StreamError> for HandlerError` (or `#[from]`), the `?` in the
example will not compile. Two implementers following these docs literally
will diverge: one will hand-map per the table, another will expect an auto-
`From`.
Additionally, `HandlerError::StreamError(io::Error)` holds `io::Error`, but
`StreamError` itself holds `Internal(io::Error)` (L118124). So
`StreamError::Internal(io::Error)``HandlerError::StreamError(io::Error)`
is a lossy wrap (the distinction between stream-level and connection-level is
flattened). The spec should state whether this `From` impl exists and its
exact behavior, because handler code (`?` operator) depends on it.
**Suggestion**: Specify the exact conversion mechanism. Either (a) declare
`HandlerError::StreamError(#[from] StreamError)` — but then the variant must
hold `StreamError`, not `io::Error`, contradicting ADR-010/core-types; or (b)
explicitly state handlers must use `.map_err(HandlerError::from)` or
`.map_err(|e| e.into())` and define the `From` impl; or (c) keep `io::Error`
and define `From<StreamError> for HandlerError` matching the mapping table.
Resolve the variant payload (`io::Error` vs `StreamError`) and specify the
`From` impl. Update `auth.md:55` example to use the canonical conversion
pattern.
---
### C11. ADR-004 Says Handlers "Enrich or Replace" AuthContext; ADR-011 Says AuthContext Is Immutable in `handle()`
**Documents**: ADR-004 (L39, Accepted), ADR-011 (L131133, Accepted),
crates/core/auth.md (L8285)
**Problem**: ADR-004 line 39 states the handler "enriches or replaces the
partial `AuthContext` with the fully resolved `Identity`." This describes
**mutation** of the shared AuthContext. ADR-011 explicitly contradicts this
at lines 131133: "`handle()` signature passes `&AuthContext` (immutable
reference). Handlers that resolve identity create a local variable… they
don't mutate the AuthContext." `auth.md:85` reiterates immutability. The
immutability decision is the later, more refined one, but ADR-004 was never
amended. An implementer reading ADR-004 alone would mutate `AuthContext`
causing cross-stream contamination (one stream's token resolution leaking
into another stream's AuthContext on the same connection), exactly what
ADR-011 prevents.
This is security-relevant — mutation across streams is a contamination
vector. Same shape as C1 (Accepted ADR contradicts a later refinement that
was never recorded as a supersession).
**Suggestion**: Amend ADR-004 to note that the "enrich/replace" language was
superseded by ADR-011's immutability model (handlers resolve into a local
variable, store on `Connection` via `set_identity`). Add an inline note at
ADR-004 L39 pointing to ADR-011. Same pattern as the C1 fix for ADR-015.
---
### C12. ADR-002's `ProtocolHandler::handle` Signature Is Stale — Not Marked as Superseded by ADR-007
**Documents**: ADR-002 (L1927, Accepted), ADR-007 (L4453, Accepted),
overview.md (L89), crates/core/core-types.md (L1419)
**Problem**: ADR-002 (Status: Accepted) still shows
`async fn handle(&self, stream: BiStream, auth: &AuthContext)` at L26.
ADR-007 revised the signature to
`handle(&self, connection: Connection, auth: &AuthContext)` and its References
note "ADR-002: ProtocolHandler trait (signature revised by this ADR)."
However, ADR-002 itself has **no superseded/amended marker** — it reads as
the current authoritative definition. `overview.md:89` even says "This differs
from the original ADR-002 signature which passed `BiStream`," confirming the
discrepancy is known but ADR-002 was never annotated. An implementer reading
only ADR-002 (as the README's "Applicable ADRs" table encourages) gets the
wrong trait.
**Suggestion**: Add a "Superseded by ADR-007" amendment note at the top of
ADR-002's Decision section (or a `## Amendments` section) stating the
signature was changed by ADR-007. Alternatively, update the ADR-002 code block
to the `Connection` signature with a note "Revised by ADR-007." ADRs must
reflect current state or explicitly mark supersession — this is the same
lifecycle rule that C1 and C11 violate.
---
### C13. `Connection::set_identity` Observability Read Path Is Under-Specified
**Documents**: crates/core/core-types.md (L5367), crates/core/auth.md (L80),
open-questions.md (OQ-11 L138), crates/core/endpoint.md (L122136)
**Problem**: `Connection` has private `identity: OnceLock<Identity>`.
`set_identity(&self,…)` takes `&self` (interior mutability) and is
"write-once-read-many." But the endpoint "holds a reference for logging"
(OQ-11:138) and the dispatch code in `endpoint.md:122136` **moves** `conn`
into the spawned task (`Connection::new(connection)` then
`tokio::spawn(async move { ... handler.handle(conn, &auth) })`). After the
move, the endpoint no longer has the `Connection` to read `identity()` for
observability. OQ-11 says "the endpoint may also hold a reference for
logging" — but there is no `Arc<Connection>` or shared handle shown anywhere;
the struct definition in `core-types.md:5357` has no sharing mechanism, and
`AlknetEndpoint` in `endpoint.md:1727` does not hold a registry of live
connections. So the stated observability reader (the endpoint) has no defined
way to access the connection after spawn.
**Suggestion**: Specify the observability read path precisely. Either (a)
the endpoint holds a side table of live connections keyed by connection ID
with `Weak<Connection>` or an `Arc<Connection>` shared with the handler task,
and `Connection` becomes `Arc`-shareable; (b) the handler emits the resolved
identity via a channel/observer callback and the endpoint records it; or (c)
drop the claim that the endpoint reads it and state observability is
handler-side logging only. The current spec is internally inconsistent on
who reads `identity()` and how they obtain the reference.
---
## Warning Findings
### W1. ADR-017's `OperationAdapter` Trait Is Stale Relative to ADR-022 — And the Sync/Async "Two-Way Door" Is Already Decided
**Documents**: ADR-017 (L155158, L171174, Accepted), ADR-022 (L225255,
Proposed)
**Problem**: ADR-017 L171172 states: "The specific trait signatures (async
vs sync, error types, configuration parameters) are two-way doors for
implementation." Two problems:
1. **The bundle shape is already constrained.** ADR-022 changes registration
from `(OperationSpec, Handler)` to `HandlerRegistration` (spec + handler +
provenance + composition_authority + scoped_env + capabilities).
ADR-022 L257261 says adapter convenience methods construct
`HandlerRegistration` with `composition_authority: None` and
`scoped_env: None`. So the adapter trait must now produce
`Vec<HandlerRegistration>`, not `Vec<(OperationSpec, Handler)>`. The trait
signature shown in ADR-017 (`fn import(&self) -> Vec<(OperationSpec,
Handler)>`) is wrong per ADR-022. The "two-way door for implementation" on
the signature has already been narrowed by ADR-022 — the return type is
committed to the bundle, not the pair. ADR-017 has not been updated.
2. **The async/sync question is not fully two-way.** `from_call` is inherently
async — it discovers operations via `services/list` + `services/schema`
over a QUIC connection. A synchronous `fn import(&self) -> Vec<…>` trait
cannot accommodate `from_call` without a separate async pre-step that
populates a cache, then a sync import that reads the cache. This means the
trait shape is constrained: either the trait is async (`async fn import`),
or `from_call` is not an `OperationAdapter` impl and uses a different path.
The "two-way door" framing suggests you can pick sync now and switch to
async later; in reality, `from_call` forces at least an async-capable path
from day one.
This is a "two-way door" label that ADR-022 has silently narrowed without
updating the OQ classification. (See the two-way-door audit, item 7.)
**Suggestion**: Update ADR-017 to: (1) return `Vec<HandlerRegistration>` (not
`(OperationSpec, Handler)`), (2) acknowledge that `from_call` requires async
discovery and the trait must be async or `from_call` must be special-cased, (3)
reclassify the signature as one-way (bundle shape) with a narrow two-way door
on ergonomic details (error types, config params). The current framing in
ADR-017 is inconsistent with ADR-022 and should be reconciled — same
reconciliation pass as C5.
---
### W2. `Capabilities` Concrete Shape and Clone Semantics — "Two-Way Door" With a Hidden One-Way Door Inside
**Documents**: operation-registry.md (L364, L369), ADR-014 (L7476, L123124,
L163165)
**Problem**: The spec says "the concrete shape of the type (a typed map, a
struct with named fields, a trait object) is a two-way door for implementation"
and "the concrete cloning semantics (reference-counted `Arc` vs deep copy of
zeroized material) is a two-way door for implementation, but
`Capabilities: Clone` is required."
The `Clone` requirement is accurately identified as a one-way constraint. The
problem is the *clone semantics*, which the framing treats as two-way but
which contain a hidden one-way door:
- **Arc-based clone (cheap, shared).** If `Capabilities` is
`Arc<Map<String, Secret>>`, `clone()` bumps the refcount. All calls in a
composition tree share the same secret material. If a handler can mutate
capabilities (e.g., add a derived key for a child operation), that mutation
is visible to siblings and the parent — shared mutable state across the
call tree. This is a security-relevant behavior.
- **Deep-copy clone (expensive, isolated).** Each call gets its own copy.
Mutations are isolated. But cloning zeroized secret material on every
`invoke()` duplicates secrets N times for a depth-N tree, increasing the
attack surface.
The "path of least resistance" is Arc-based (cheap, the obvious choice for a
`Clone` type holding secrets). Once shipped, handlers may come to depend on
shared-mutation. Switching from Arc-shared to deep-copy-isolated later is a
behavior change that breaks those handlers.
Worse, ADR-022's composition model *assumes* cheap clone — it clones
unconditionally on every `invoke()`. A depth-5 composition tree with Arc-based
clone does 5 refcount bumps (cheap). With deep-copy, it does 5 full copies of
all secret material. The ADR-022 design bakes in the assumption that clone is
cheap — which means it bakes in the Arc-based choice, making the "two-way door"
effectively decided by the composition model's performance assumptions.
**Suggestion**: Add a guard clause: `Capabilities` must be *immutable after
construction* (no interior mutability, no `Mutex<Map>`, no `RefCell`). This
makes Arc-based clone safe (shared immutable state is fine) and makes the
two-way door genuinely two-way: you can switch between `Arc<Map>` and
deep-copy without behavior change, because neither supports mutation. Without
this guard, the "two-way door" is a future one-way door waiting for the first
handler that mutates capabilities.
---
### W3. `CallClient` Registry Mechanism — "Two-Way Door" With a Security Dimension ADR-022 Introduced
**Documents**: ADR-017 (L236237, Accepted), ADR-022 (L225242, Proposed)
**Problem**: ADR-017 L236237: "The specific mechanism (sharing the global
registry, a peer-scoped subset, or a separate registry) is a two-way door."
The three options are mechanically possible with the `HandlerRegistration`
bundle. But ADR-022 introduced a security dimension the framing doesn't flag:
- **Sharing the global registry exposes local capabilities to the remote
peer.** Each `HandlerRegistration` carries `Capabilities` with secret
material. If the `CallClient` shares the global registry, the remote peer
can invoke any External operation — and those operations' handlers use the
*local* capabilities. A remote peer calling `/llm/generate` uses the local
node's Google API key. The remote peer effectively gains access to all
local secret material that backs External operations.
- **A peer-scoped subset must filter by capability remote-safety, not just
operation name.** The "path of least resistance" for a peer-scoped subset is
to filter by operation name/namespace. But the real filter must be: "is this
operation's capability safe to expose to this peer?" An operation backed by
a signing key that should never leave the node is not remote-safe,
regardless of its name.
The framing was written before ADR-022 put capabilities in the bundle.
Post-ADR-022, the "two-way door" on the registry mechanism has a security
constraint attached that the framing doesn't mention.
**Suggestion**: Append to ADR-017's negative consequences: "Sharing the
global registry with a `CallClient` exposes all local capabilities (API keys,
signing keys) to the remote peer, because the dispatch path populates
`OperationContext.capabilities` from the registration bundle. A peer-scoped
subset must filter by capability remote-safety, not just operation name. The
registry-mechanism choice is two-way mechanically but has a security dimension
post-ADR-022: the 'share global' option is a capability-exposure decision, not
just a dispatch decision."
---
### W4. `from_call` Hot-Swapping and Registry Mutability — Two "Two-Way Doors" That Gate Each Other
**Documents**: ADR-017 (L279), operation-registry.md (L383)
**Problem**: ADR-017 L279: "Hot-swapping imported specs is a two-way door."
operation-registry.md L383: "The registry is immutable after construction. No
runtime registration or deregistration. Two-way door —
`ArcSwap<OperationRegistry>` can be added later."
These are presented as independent two-way doors. They are not — one gates the
other. If the initial implementation registers `from_call`-imported specs
immutably (per the immutable-registry constraint), and handlers/clients
compile/generated-code against those specs, then "hot-swapping" later means:
re-importing on reconnect, diffing specs, invalidating generated clients,
handling in-flight calls whose schema changed. If the registry is
`ArcSwap`-able, hot-swap is mechanically possible but semantically fraught (a
handler mid-composition doesn't expect the child's schema to change).
Deciding "hot-swap is a two-way door" without resolving the registry-mutability
question leaves an implementer to guess whether to build for immutability or
hot-swap. The deferral hides a coupling: the *immutability* of the registry (a
one-way door per ADR-010 and operation-registry.md L14) and the *mutability*
of remote specs (inherent to `from_call`) are in tension.
Additionally, hot-swapping with ADR-022 bundles requires re-injecting
capabilities (cloning from the old registry or re-deriving from the vault) and
rebuilding the TLS `ServerConfig` (ALPN strings come from the registry).
The "simple ArcSwap" now touches the secret-material flow and the TLS config.
This isn't closing the door, but it raises the cost and security sensitivity
of the swap.
**Suggestion**: Resolve the registry-mutability question (immutable vs
`ArcSwap`) *before* deciding hot-swap, since hot-swap depends on it — don't
let both be independent "two-way doors" when one gates the other. Add a guard
clause to OQ-04: "Hot-swapping the registry with ADR-022 bundles requires
re-injecting capabilities (cloning from the old registry or re-deriving from
the vault) and rebuilding the TLS ServerConfig. The `ArcSwap` upgrade is
mechanical but not trivial — it is a secret-material-handling operation, not
a pure pointer swap."
---
### W5. `to_*` Adapter "Best-Effort" Mappings Become Compatibility Contracts Once Published
**Documents**: ADR-017 (L244246, L281286, Accepted), ADR-023 (error schemas
projection)
**Problem**: ADR-017 L244246: "Some semantics don't map cleanly (e.g.,
subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle
these with best-effort mappings and document the gaps."
The "best-effort" framing implies the mappings are malleable — two-way,
adjustable. This is misleading once the generated spec is published:
- **A published spec is a compatibility contract.** `to_openapi` generates an
OpenAPI spec. Once external HTTP clients consume that spec and build against
it, the mapping semantics (e.g., subscriptions → SSE long-poll, or
subscriptions → a polling endpoint) become a de facto contract. Changing the
mapping later breaks every client that built against the original spec. The
"best-effort" mapping, once published, is a one-way door for the clients
that depend on it.
- **`from_openapi` round-trip lossiness compounds.** If `to_openapi` maps
subscriptions to SSE, and later `from_openapi` imports a spec where
subscriptions are modeled as webhooks, the two adapters disagree on what a
"subscription" means in OpenAPI. The `to_*` mapping choices constrain the
`from_*` parsing expectations.
- **With ADR-023**, `to_openapi` now maps `ErrorDefinition.http_status` to
OpenAPI response codes. Once published, those status code mappings are a
contract. Changing `FILE_NOT_FOUND` from 404 to 422 in a later `to_openapi`
revision breaks clients.
This is the "informal spec becomes formal spec" problem. The generated
artifact is the contract; the "best-effort" label is internal framing that
external consumers never see. ADR-009's framework classifies doors by
reversal cost in the codebase, not by compatibility cost for external
consumers — a systematic blind spot.
**Suggestion**: Add a guard clause to ADR-017: "The `to_*` mapping choices,
once published as a generated spec, become a compatibility contract for
external consumers. Best-effort mappings are two-way before first publication
but one-way after. Version the generated specs (e.g., OpenAPI spec version
tied to the registry's External operation set version) and document that
mapping changes are breaking for consumers. The `to_*` adapters should emit a
spec version marker so consumers can detect mapping changes."
---
### W6. The Salt Field's "Reserved for Future KDF" Framing Is Misleading — Semantics Can't Change for Existing Data
**Documents**: ADR-020 (L102108, L151157, L204207, Accepted),
crates/vault/encryption.md (L112128)
**Problem**: ADR-020 L108: "If a future KDF-based derivation is needed (see
'Future KDF' below), the field is already present." ADR-020 L156: "This is
additive — it doesn't change v2 data."
The "reserved" framing creates a false sense of flexibility:
- **The salt in v2 is cryptographically meaningless and cannot be
retroactively made load-bearing for v2 data.** v2 encrypts with an
HD-derived key at `m/74'/2'/0'/0'`. The salt is random, stored, and unused.
If v3 introduces a KDF that uses the salt, v3's key derivation is different
from v2's. v3 cannot decrypt v2 data using the salt — v2 was never encrypted
with a salt-derived key. The salt is only usable for NEW v3+ data. So "the
field is already present" is technically true but misleading: the field is
present in v2 blobs, but its presence doesn't make v2 data upgradeable to a
KDF. It only means v3 blobs can reuse the field without a wire-format
change.
- **The wire-format-compat framing is the real two-way door, and it's
accurate.** Keeping the field means the `EncryptedData` struct doesn't
change shape across v2→v3. That IS a wire-format two-way door — you can add
a KDF in v3 without changing the struct. But the *semantics* of the salt
field ("unused" in v2, "load-bearing" in v3) cannot be changed for existing
v2 data.
- **A KDF doesn't fit the rotation scheme.** Rotation is version-indexed
paths (`m/74'/2'/0'/{version-2}'`). A KDF-based derivation is a different
derivation *method*, not a different path. If a KDF is introduced, the
rotation mechanism (version-indexed paths) and the KDF mechanism
(salt-based) coexist or conflict. The "reserved salt" framing doesn't
acknowledge that a KDF is a different derivation family, not just another
version index.
**Suggestion**: Update ADR-020: "The salt field is reserved for FUTURE
versions' use. v2 data's salt is permanently unused — it was random, never
participated in key derivation, and cannot be retroactively made
load-bearing. Introducing a KDF in v3 is a new derivation method (not a
version-indexed path), requiring its own design and a v2→v3 migration
(re-encrypt with the new KDF, using a newly-generated v3 salt — the v2 salt
is not reused). The field's presence saves a wire-format struct change only;
it does not make the KDF design or migration trivial." Drop the speculative
"reserved for future KDF" framing or explicitly mark it as speculative, not a
planned feature.
---
### W7. `unlock_new` Returns the Mnemonic as a Non-Zeroizing `String` — Security Gap
**Documents**: crates/vault/service.md (L7179)
**Problem**: `unlock_new(word_count) -> Result<String, VaultServiceError>`
returns the generated mnemonic phrase as a `String`. The `Mnemonic` type
zeroizes on drop (mnemonic-derivation.md L7778), but `String` does not
implement `Zeroize` by default. The root of trust is returned to the caller
as a plaintext heap-allocated `String` that lingers in freed memory until the
allocator reuses it. The spec says "Store the returned phrase securely"
(L7879) but does not address that the returned `String` cannot be zeroized
by the caller without `unsafe` or converting to a zeroizing container.
This contradicts the stated principle "Zeroize everything sensitive"
(vault README L6567) and the `Mnemonic` type's own zeroization. The Security
Constraints section (service.md L365399) does not mention this.
**Suggestion**: Either return a zeroizing wrapper (e.g., `Zeroizing<String>`
or a dedicated `Phrase` type that implements `Zeroize`/`ZeroizeOnDrop`), or
document in Security Constraints that the caller must convert the returned
`String` to a zeroizing type and must not let it be cloned/logged. At
minimum, flag this as a known limitation.
---
### W8. `DerivedKey` JSON Deserialization Silently Produces a Corrupted Key
**Documents**: crates/vault/protocol.md (L94112)
**Problem**: protocol.md L104107 states: "A JSON-deserialized `DerivedKey`
will have `"[REDACTED]"` as its `private_key` string — this is expected; JSON
round-tripping a `DerivedKey` is not a supported use case (the private key is
gone)."
This is a silent-failure footgun: if a `DerivedKey` is accidentally
serialized to JSON and then deserialized (e.g., in a log pipeline, a debug
tool, or a caching layer that uses JSON), the resulting `DerivedKey` has a
10-byte string (`"[REDACTED]"`) as its `private_key` instead of 32 bytes. Any
code that then uses this `DerivedKey` for signing will either produce garbage
signatures or panic on length mismatch — with no error at deserialization
time. The redaction is a defense-in-depth for *serialization* (logs), but the
*deserialization* side silently accepts the redacted form, producing a
corrupted key.
For the irpc/postcard path this is not an issue (postcard preserves bytes).
But if any code path ever uses JSON for `DerivedKey` transport (the spec says
this is "not a supported use case" but doesn't prevent it), the corruption is
silent.
**Suggestion**: Either (a) make `DerivedKey::deserialize` reject payloads
where `private_key == "[REDACTED]"` (returning an error rather than a
corrupted key), or (b) document explicitly that `private_key` length must be
validated before use and that JSON-deserialized `DerivedKey`s are invalid.
Consider whether the custom `Deserialize` impl should enforce a 32-byte
length check.
---
### W9. No ADR for the HD-Derivation Key Model (Identity/SSH/Signing) — Only Encryption Keys Have ADR-020
**Documents**: crates/vault/mnemonic-derivation.md (L3146, L248256),
crates/vault/README.md (L6870)
**Problem**: ADR-020 covers HD derivation for *encryption* keys. But the
vault's primary use of HD derivation is for identity keys, SSH host keys, and
signing keys (mnemonic-derivation.md L2629, L184215). The rationale for "HD
derivation, not stored keys" (L3146: no key storage, reproducible across
nodes, domain separation, auditable derivation) is inline in the spec's "Why
HD Derivation" section. The Design Decisions table (L250256) lists this with
no ADR ("HD derivation (not stored keys) | — | One seed, many keys, no key
storage").
This is a foundational architectural decision — the entire vault model
depends on it — yet it has no ADR. The choice of HD derivation vs. stored keys
is a one-way door (switching to stored keys would change the entire trust
model). ADR-018 and ADR-019 both *assume* HD derivation but don't *decide* it.
Similarly, these design choices have inline rationale but no ADR:
- **`74'` coin type reservation** (L186, L254): SLIP-0044 unallocated coin
type claimed for alknet. This is a namespace allocation that, once keys are
derived, cannot be changed without re-deriving all keys — effectively
one-way. No ADR records this allocation.
- **secp256k1 feature-gating** (L166179): rationale (heavy C dependency, only
needed for Ethereum) is inline. No ADR.
- **AES-256-GCM cipher/mode choice** (encryption.md L2737): rationale
(authenticated, hardware-accelerated) is inline. No ADR.
**Suggestion**: Either (a) create an ADR for "HD derivation as the vault's key
model" covering identity/SSH/signing keys (the encryption case is already
ADR-020), or (b) broaden ADR-020's scope, or (c) create a single "Vault Key
Model" ADR covering the overarching HD-derivation decision with ADR-020 as a
special case for encryption keys. Record the `74'` coin type reservation and
the AES-256-GCM choice in ADRs or a tracked decisions log, since they are
one-way doors with no current ADR.
---
### W10. The `EncryptedData` "Stable Wire Format" Is a One-Way Door Documented Only in Prose
**Documents**: crates/vault/encryption.md (L8495), ADR-018 (L101104)
**Problem**: encryption.md L8695 states: "This is the **stable wire format**
shared with `alknet-storage` (a future crate) by type-level agreement, not by
a crate dependency. Both crates must agree on the serialization format… This
cross-language compatibility is why the wire format must stay stable —
changing it breaks both `alknet-storage` and the TypeScript consumer."
ADR-018 L101104 repeats this.
This is a one-way door: once `alknet-storage` and TS consumers depend on the
format, changing it breaks them. Yet it is not locked by an ADR. It is a
cross-crate compatibility contract enforced by documentation, not by a decided
constraint. An implementer modifying `EncryptedData` (e.g., removing the
unused `salt` field) would find no ADR saying "this format is frozen." The
"type-level agreement, not a crate dependency" approach (ADR-018) means there
is no compiler enforcement either — the stability contract exists only in
prose.
**Suggestion**: Create an ADR (or add a decision to ADR-018) locking the
`EncryptedData` wire format as stable, with the compatibility surface
explicitly enumerated (fields, base64 encoding, field semantics). Note that
the `salt` field, though unused in v2, is part of the frozen format and cannot
be removed without a format-version migration.
---
### W11. `OperationContext.identity` Conversion from `CompositionAuthority` Is Undefined
**Documents**: operation-registry.md (L236:
`identity: parent.handler_identity.as_identity()`), ADR-022 (L314:
`identity: parent.handler_identity_as_identity()`)
**Problem**: Both spec and ADR-022 populate the child's `identity` by
converting the parent's `handler_identity: Option<CompositionAuthority>` into
an `Option<Identity>` via a method `as_identity()` /
`handler_identity_as_identity()`. This conversion method is **never defined**
and its semantics are unclear: `CompositionAuthority` (label + scopes +
resources) is a declared authority bundle, while `Identity` is a peer identity
resolved via `IdentityProvider`. ADR-022 goes to lengths to stress that
`CompositionAuthority` is *not* a peer `Identity` (L150160). Yet the dispatch
path needs an `Identity` to run ACL against for the child.
So there's an implicit conversion from a non-peer authority bundle to a peer
identity type. Is it a synthetic `Identity` constructed with the authority's
label as `id` and its scopes? Does it bypass `IdentityProvider`? Is
`as_identity()` returning `None` when `handler_identity` is `None` (the leaf
case)? For leaves, the ACL "checks against the leaf's `AccessControl`" — but
if `identity` is `None` because the parent is a leaf with
`handler_identity: None`, then per operation-registry.md L8687 "If the
relevant identity is `None` and the operation has restrictions, the adapter
returns `FORBIDDEN`." That would mean a composing handler with no composition
authority cannot compose *any* restricted leaf — which contradicts the model
where leaves are composed under the *parent's* authority.
Actually: a composing parent (Local/Session) *does* have
`handler_identity: Some(...)`, so `as_identity()` returns `Some`. Leaves being
composed *are* the child; their own `handler_identity` is `None` but that's
fine because the child's ACL uses the child's `identity` (which is the parent's
authority). So the conversion is load-bearing and must be specified.
**Suggestion**: Define `CompositionAuthority::as_identity() -> Option<Identity>`
(or `handler_identity_as_identity()` on `OperationContext`) explicitly in the
spec or ADR-022. Specify that it constructs a synthetic
`Identity { id: label, scopes, resources }` that is *not* resolvable via
`IdentityProvider` but is used directly for ACL matching. Note that this
creates a second `Identity` construction path (the first is
`IdentityProvider::resolve_*`), which should be acknowledged.
---
### W12. Source Drift Items Are Tracked Only in Scattered Prose — The `spawn` Return Bug Could Be Lost
**Documents**: crates/vault/README.md (L79103), crates/vault/encryption.md
(L185188, L245250), crates/vault/service.md (L280286, L374391)
**Problem**: Four source-vs-spec drift items are documented in prose across
multiple files:
1. **OsRng vs `rand::random()`** — documented in README L8789, encryption.md
L245250, service.md L374376 (3 locations).
2. **`unwrap()` on RwLock** — documented in README L9497, service.md
L384391 (2 locations).
3. **`CURRENT_KEY_VERSION = 1` (should be 2)** — documented in encryption.md
L185188, ADR-020 L120124 (2 locations, none in README).
4. **`spawn` returns unspawned actor** — documented **only** in service.md
L280286 (1 location, inside a method description, not in a Security
Constraints section).
Problems with this approach:
- The `spawn` bug (item 4) is documented in exactly one place, buried in the
`spawn` method's bullet description, not in any "Security Constraints" or
"Known Drift" section. An implementer who reads the Security Constraints
sections (the natural place to look for implementation-sync corrections)
would miss it.
- README's Security Constraints (L79103) lists OsRng and unwrap but **omits**
`CURRENT_KEY_VERSION` and the `spawn` bug. README is the index document; an
implementer using it as a checklist would miss two drift items.
- There is no single, structured tracking mechanism. If one prose location is
updated and others aren't, the docs silently diverge.
**Suggestion**: Add a single "Known Source Drift" table to README.md (or a
dedicated section) listing all drift items with: item, current behavior,
target behavior, location in source, and status. At minimum, add
`CURRENT_KEY_VERSION` and the `spawn` bug to README's Security Constraints.
Move the `spawn` bug note out of the method description and into the drift
tracker.
---
### W13. `CertAuthorityEntry` Is an Undefined Placeholder Type Used in `DynamicConfig`
**Documents**: crates/core/config.md (L119130)
**Problem**: `AuthPolicy.cert_authorities: Vec<CertAuthorityEntry>` is
declared, but `CertAuthorityEntry` is explicitly "a placeholder type. Its
fields will be defined when alknet-ssh is implemented." This means alknet-core's
public API references a type that does not exist yet. The spec says "for v1,
`cert_authorities` will be an empty vector," implying the field is dead weight.
An implementer cannot compile `DynamicConfig` without defining
`CertAuthorityEntry` somewhere. Where does it live — alknet-core (but it's
SSH-specific per the doc) or alknet-ssh (but then alknet-core can't reference it
without a dependency)? This is a crate-dependency contradiction: core
references an SSH-defined type, but core cannot depend on alknet-ssh (ADR-003
forbids it).
**Suggestion**: Either define a minimal `CertAuthorityEntry` in alknet-core
(as an opaque/newtype or a generic struct with the fields that are
ALPN-agnostic), or remove `cert_authorities` from v1 `AuthPolicy` entirely and
add it back when alknet-ssh lands (it's a two-way door — adding a field is
additive). Leaving an undefined type in a public struct blocks compilation.
---
### W14. `SecretKey` Type in `TlsIdentity::RawKey(SecretKey)` Is Undefined and Its Origin Ambiguous
**Documents**: crates/core/config.md (L44, L79)
**Problem**: `TlsIdentity::RawKey(SecretKey)` and `SecretKey::generate()` are
used but `SecretKey` is never imported or attributed. It could be
`ed25519_dalek::SecretKey`, `iroh::SecretKey`, `alknet_vault::SecretKey`, or a
core-defined type. Since config.md says the key "can be derived from
alknet-vault" (per endpoint.md:181), and alknet-core must not depend on
alknet-vault (ADR-003, ADR-018), this matters: if `SecretKey` is from iroh,
then alknet-core depends on iroh even for quinn-only nodes; if it's a
re-export/newtype, that needs stating. The crate dependency table in ADR-003
L27 lists alknet-core depending on "tokio, quinn, rustls, irpc" — **not iroh**
— yet `config.md` and `endpoint.md` show iroh types in core's public API.
**Suggestion**: Specify the exact type and crate for `SecretKey`. If it's
`iroh::SecretKey`, update ADR-003's dependency list (iroh is already used for
`iroh::Endpoint`, so this is consistent, but ADR-003 omits iroh from
alknet-core's deps — a separate inconsistency). If it's `ed25519_dalek`, say
so. Resolve whether alknet-core publicly depends on iroh types.
---
### W15. ADR-003 Omits iroh From alknet-core's Dependency List; Spec Docs Assume It
**Documents**: ADR-003 (L27), crates/core/endpoint.md (L20), config.md (L44)
**Problem**: ADR-003 L27 lists alknet-core as depending on "tokio, quinn,
rustls, irpc" — no iroh. But `AlknetEndpoint` holds
`iroh: Option<iroh::Endpoint>` (endpoint.md L20, ADR-010 L67),
`SendStream`/`RecvStream` wrap "iroh::SendStream or iroh::RecvStream"
(core-types.md L103), and `TlsIdentity::RawKey(SecretKey)` likely uses iroh's
key type. ADR-010 L184 says this is "mitigated: both are feature-gated." But
ADR-003's dependency table is now stale — it doesn't reflect the iroh
dependency that ADR-010 introduced.
**Suggestion**: Update ADR-003's dependency table to include iroh
(feature-gated) for alknet-core, or add an amendment note that ADR-010 added
the iroh dependency.
---
### W16. Overview Failure-Mode Table Says Handler Error Closes "QUIC Stream"; Spec Says Connection
**Documents**: overview.md (L237238), crates/core/core-types.md (L30, L46),
crates/core/endpoint.md (L255257, L262)
**Problem**: `overview.md:237` says "Handler `handle()` returns `HandlerError`
| Endpoint logs the error, closes the QUIC **stream**." But
`core-types.md:30` says "The endpoint catches these, logs them, and closes the
**connection**." `endpoint.md:256` says HandlerError is "Non-fatal errors
within a handler" and the connection closes. The whole architecture is
one-ALPN-per-connection (ADR-006), and `handle()` receives the whole
`Connection`. Closing a "stream" doesn't make sense when the handler owns the
connection. This is a stale leftover from the pre-ADR-007 model where
`handle()` received a `BiStream`.
**Suggestion**: Change `overview.md:237238` from "closes the QUIC stream" to
"closes the QUIC connection."
---
### W17. OQ-09's "BiStream Trait Preserves the WASM Door" Is Only True for the Client Side
**Documents**: open-questions.md (OQ-09 L100107), ADR-007, ADR-009 (L41, L44),
crates/core/core-types.md, crates/core/endpoint.md
**Problem**: The resolution claims "The BiStream trait decision (ADR-007) has
already preserved the most important WASM door." This is accurate for the
*client* path (a browser can implement `BiStream` over WebTransport) but
oversells what is preserved for a *server-side* WASM peer:
- **`Connection` is a concrete struct, not a trait** (ADR-007). Its
`accept_bi()`/`open_bi()` return `SendStream`/`RecvStream` described as
wrapping "the underlying QUIC stream types." A WASM server-side peer cannot
implement this `Connection` without a QUIC library that compiles to WASM.
BiStream being a trait does not help — handlers receive `Connection`, and
`Connection` is the binding point.
- **The accept loop uses `tokio::spawn`** (ADR-010). Tokio does not run on
WASM. The entire endpoint runtime is tokio-bound.
- **`PendingRequestMap` and the `CallAdapter` use tokio `oneshot`/`mpsc`
channels** (call-protocol.md L218240). These are tokio-specific.
The OQ framing implies BiStream was the binding constraint and the rest is
detail. In reality, BiStream preserves the *client stream* door, but
`Connection`, the accept loop, and the call-protocol dispatch internals
quietly close the *server-side dispatch* door.
**Suggestion**: Update OQ-09's resolution to state: "BiStream being a trait
preserves the client-side stream door. The server-side dispatch door
(`Connection` as a concrete quinn-bound struct, tokio-based accept loop, tokio
channel dispatch) is NOT preserved by ADR-007 and would require a `Connection`
trait and a runtime-abstracted accept loop to open. This is a known, accepted
closure — the browser path is client-side via a JS SDK, not server-side
Rust-to-WASM." This converts the misleading framing into an explicit, accepted
one-way door.
---
### W18. OQ-10's "Start With Smart Protocol" Deferral Forecloses Git Composability
**Documents**: open-questions.md (OQ-10 L109116)
**Problem**: The deferral says "start with git smart protocol over QUIC
streams, ERC721 and full server capabilities are additive." The ALPN/endpoint
design does not constrain the git adapter's *wire format* — the handler owns
its format and `alknet/git` is a dedicated connection. That part is genuinely
two-way.
However, the deferral hides a fork that the "path of least resistance" will
silently resolve in a way that forecloses composability:
- **Raw smart protocol vs. call-protocol projection.** If git ships as a
`ProtocolHandler` speaking raw git smart protocol over QUIC streams (the
simplest path), git operations are NOT callable via `env.invoke()`. They
exist only on the `alknet/git` ALPN, not in the `OperationRegistry`. An
agent handler that wants to compose `git/clone` cannot — there's no
`OperationSpec`, no `Handler`, no registration. The agent would have to
open a raw QUIC connection to `alknet/git` and speak pkt-line, bypassing
the entire composition/security model (ADR-015, ADR-022).
- To make git composable later, you'd need a *separate* call-protocol
projection — a set of `OperationSpec`/`Handler` pairs that wrap git
operations behind the registry. That's additive, but it means the "simple"
choice (raw smart protocol only) produces a git adapter that is permanently
non-composable unless a second handler layer is built.
The deferral frames ERC721/full-server as the additive part. But the real
additive question is composability, and the simplest implementation forecloses
it.
**Suggestion**: Append to OQ-10's resolution: "The composability of git
operations (whether git ops are registered in the `OperationRegistry` and
callable via `env.invoke()`, or only available as raw smart protocol on
`alknet/git`) is a separate decision from ERC721 scope. The path of least
resistance (raw smart protocol only) forecloses agent composition of git
operations. If agent tool dispatch needs to compose git ops, a call-protocol
projection must be built alongside or instead of the raw handler. Resolve
this when speccing alknet-git, not deferred past it."
---
### W19. ADR-016 Self-Referential Citation and Under-Specified Grandchild Propagation
**Documents**: ADR-016 (L134, L154158)
**Problem**: Two minor issues in ADR-016:
1. **Self-referential citation** (L134): ADR-016 cites itself ("ADR-016
Assumption 5 says the policy is per-call, not per-operation") in its own
Decision 6 rationale. This should just be "Assumption 5" (it's the same
document).
2. **Grandchild propagation under-specified** (L154157): "The child's
`OperationContext` carries the policy. If the child itself composes
grandchildren, the policy propagates unless the child explicitly overrides
it." This is the only statement of propagation-vs-override semantics. It
doesn't specify: does the grandchild *inherit* the child's policy by
default (so if the child got `ContinueRunning`, the grandchild is
`ContinueRunning` unless overridden)? Or does each `invoke()` reset to
the default `AbortDependents` unless explicitly set? The phrase
"propagates unless overridden" implies inheritance, which is reasonable,
but the `invoke()` default (operation-registry.md L213, ADR-016 L146) shows
`context.env.invoke(...)` with no policy = default. So is "no policy arg"
= "inherit parent's" or "reset to AbortDependents"? These conflict.
**Suggestion**: (1) Change "ADR-016 Assumption 5" to "Assumption 5" within
ADR-016. (2) Clarify: does `invoke()` with no policy argument (a) inherit the
parent's current policy, or (b) reset to `AbortDependents`? Recommend (b)
reset-to-default for predictability (each composition is a fresh decision)
unless the parent opts the child into `ContinueRunning`, and document that
`ContinueRunning` does *not* auto-propagate to grandchildren (the child must
re-affirm it for each grandchild). Or pick (a) and update the `invoke()`
examples. Either is defensible; pick one.
---
### W20. ADR-023's `from_openapi` Error Code Example Collides With Protocol-Level `NOT_FOUND`
**Documents**: ADR-023 (Assumption 3 L366373, Decision 5 L241)
**Problem**: ADR-023 Assumption 3 acknowledges the collision risk: an operation
declaring `NOT_FOUND` as a domain code collides with the protocol-level
`NOT_FOUND` (operation not registered). The resolution (L367368): "the
protocol code takes precedence in the dispatch machinery." And (L372373):
"The assumption is that operations use domain-specific codes (`FILE_NOT_FOUND`)
rather than reusing protocol codes (`NOT_FOUND`). This is a naming convention,
not a type-system enforcement."
But Decision 5's example (L241) shows `from_openapi` mapping a 404 to
`ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), … }` — i.e.,
`from_openapi` *does* produce a domain code that collides with a protocol
code. So the "naming convention" is violated by the very adapter the ADR
specifies. An implementer following Decision 5's example would create a
`FILE_NOT_FOUND`-vs-`NOT_FOUND` ambiguity: when `call.error` carries
`code: "NOT_FOUND"`, is it "operation not registered" (protocol) or "file not
found" (operation-level, from a 404)? The `details` field disambiguates
(present for operation-level, absent for protocol-level), but the `code` alone
is ambiguous, and ADR-023 L171 says "Clients should switch on `code`, not
parse `message`."
**Suggestion**: Either (a) make `from_openapi` prefix/namespace the imported
error codes (e.g., `HTTP_404` or `<op>_NOT_FOUND`) to avoid collision, or (b)
update Decision 5's example to use a non-colliding code and add a normative
rule that `from_openapi` must not produce codes that match the five
protocol-level codes. Assumption 3's "naming convention" should be elevated to
a requirement for adapters, since the adapter example breaks it.
---
### W21. `ScopedOperationEnv` Public Field Leaks the `HashSet` Representation — Future Subgraph Refactor Is Breaking
**Documents**: ADR-022 (L208214, L97105), operation-registry.md (L188, L328)
**Problem**: ADR-022 L97105: "For v1, the scoped env can be a
`HashSet<String>` of reachable operation names. A dedicated
`alknet-flowgraph` crate … is a future enhancement — not a prerequisite for
the security model."
The `HashSet<String>` representation is fine *internally*. The problem is that
it leaks into the public API in a way that makes the future refactor a
breaking change:
- **`ScopedOperationEnv` is a public struct with a public field.** ADR-022
L208214 shows `pub struct ScopedOperationEnv { pub allowed_operations:
HashSet<String> }`. The assembly layer and agent crate construct this
directly: `ScopedOperationEnv { allowed_operations: HashSet::from(…) }`
(operation-registry.md L188, L328). If the representation later changes to a
subgraph type (with type-compatibility edges, as the flowgraph model
requires), every construction site breaks — the field type changes from
`HashSet<String>` to `Subgraph` or `Vec<Edge>`. This is a public-API
breaking change, not an internal refactor.
The framing says the flowgraph crate is "not a prerequisite for the security
model." That's true — the security model works with a flat set. But the
*refactor* from flat set to subgraph is not the trivial "two-way door" the
framing implies. It's a breaking change to construction sites.
Additionally, this interacts with OQ-19 (session-scoped registries): session
ops compose under a scoped env. If the scoped env is a flat `HashSet<String>`,
session ops get reachability control (can this op be reached?) but NOT
type-compatibility validation (does the output of op A feed validly into op
B?). A session op (untrusted, LLM-written) that composes an op whose output
type doesn't match the next op's input gets a runtime error, not a static
rejection. The flowgraph deferral means session ops ship without static type
validation — a security-relevant gap for untrusted code.
**Suggestion**: Make `allowed_operations` private (not `pub`), with a
constructor `ScopedOperationEnv::new(ops: impl IntoIterator<Item = String>)`
and a query method `allows(&self, name: &str) -> bool`. This keeps the
representation internal and makes the future subgraph refactor a non-breaking
change. Add to the flowgraph deferral: "The `HashSet<String>` representation
does not support type-compatibility validation. Session-scoped ops (OQ-19,
untrusted code) compose without static type checking until the flowgraph
crate is built. This is a known gap for the untrusted-code path, not just a
visualization feature."
---
## Suggestions
### S1. Inline Decision Rationale in Spec Docs That Should Be in ADRs
**Documents**: call-protocol.md (L341, L343344, L124), operation-registry.md
(L202, L364369, L377), crates/vault/encryption.md (L2737),
crates/vault/mnemonic-derivation.md (L3146)
**Problem**: Per the architect's documentation principles, spec docs should
reference ADRs by number, not contain the rationale. Several spec sections
contain *why* rationale that duplicates or extends ADR content:
- call-protocol.md L341 duplicates ADR-016 Decision 2's rationale ("the
correct default because aborted parent work has no consumer…") verbatim.
- operation-registry.md L202 explains *why* metadata doesn't propagate (the
secret-leak scenario) — this is ADR-014's rationale, restated inline.
- operation-registry.md L364369 explains the `Clone` semantics of
`Capabilities` and the security implication — this is a design rationale,
not in any ADR. It should be in ADR-014 or ADR-022.
- operation-registry.md L377 ("`from_call` trust is transitive… a compromised
remote node can do anything") is a threat-model statement that belongs in
ADR-017.
- crates/vault/encryption.md L2737 (AES-256-GCM rationale) and
mnemonic-derivation.md L3146 (HD derivation rationale) have no ADRs (see
W9).
**Suggestion**: Move the unique rationale to the relevant ADRs (ADR-016
already has the abort-dependents rationale — just cite it; add
capabilities-clone rationale to ADR-014 or ADR-022; add `from_call` transitive
trust to ADR-017) and replace the inline rationale with ADR references. Keep
only the *constraint statement* in the spec.
---
### S2. Untracked "Two-Way Door" Deferrals Scattered Through Specs
**Documents**: operation-registry.md (L383, L257, L364, L369),
call-protocol.md (L333), ADR-017 (L236237, L279)
**Problem**: Per the documentation principles, inline open questions should be
in `open-questions.md`. Several "two-way door" / "future work" notes are
scattered through the specs without being tracked as open questions:
- operation-registry.md L383: "Two-way door — `ArcSwap<OperationRegistry>`
can be added later" (contradicts the "immutable after construction"
constraint at L14, L194, L340).
- operation-registry.md L257: "Future work may add irpc service dispatch and
remote call protocol dispatch as additional backends."
- operation-registry.md L364: "The concrete shape of the type… is a two-way
door for implementation."
- operation-registry.md L369: "The concrete cloning semantics… is a two-way
door for implementation."
- call-protocol.md L333: timeout policy (30s default, adapter-level config) —
no ADR governs this.
- ADR-017 L236237, L279: CallClient registry mechanism and `from_call`
hot-swapping.
**Suggestion**: Add the genuinely-open items (Capabilities concrete shape,
Capabilities clone semantics, `OperationAdapter` trait signatures, CallClient
registry mechanism, `to_*` adapter gap semantics, `from_call` hot-swap,
registry hot-swap via ArcSwap, timeout policy) to `open-questions.md` with
explicit status. For items that are *decided* deferrals, record the deferral
in the relevant ADR's Consequences rather than as inline spec commentary.
---
### S3. Missing ADR References for Design Choices ("Why X Over Y?")
**Documents**: call-protocol.md (L86, L116, L262272, L333),
operation-registry.md (L6467, L120127, L139)
**Problem**: Several design choices are stated in the spec with no ADR
backing:
- **Binary payload convention** (call-protocol.md L86): base64-in-JSON. Why
not a separate binary event type? No ADR.
- **`auth_token` in payload** (call-protocol.md L116122, L262272): per-
request identity upgrade via an optional `auth_token` field. Significant
security-model decision with no ADR.
- **Per-request vs per-connection identity** (L272): "Identity is resolved
per-request, not per-connection." No ADR.
- **Timeout policy** (L333): 30s default, adapter-level config. No ADR.
- **Module-private `internal` writes** (operation-registry.md L120127):
the enforcement mechanism for "handlers can't set `internal`." The
requirement is from ADR-015; the mechanism (Rust visibility) has no ADR.
- **Namespace derivation** (operation-registry.md L66): "`namespace` is
derived from the name." Why derived rather than independently declared? No
ADR.
**Suggestion**: For load-bearing choices (binary convention, auth_token /
per-request identity, timeout policy), either add an ADR or extend an
existing one. For minor choices (namespace derivation, visibility pattern), a
one-line note suffices, but they shouldn't be silently decided in the spec.
---
### S4. README ADR Tables Missing Entries and Status Column
**Documents**: crates/call/README.md (L1936), crates/core/README.md
(L2131), crates/vault/README.md (L3847)
**Problem**: Three issues across the crate READMEs:
1. **call README** omits ADR-013 (Rust canonical implementation), which both
call spec docs reference (call-protocol.md L88, operation-registry.md L24).
2. **core README** omits ADR-015, referenced by auth.md L78, L200 and
config.md L200.
3. **vault README** omits ADR-010 and ADR-006, referenced by
mnemonic-derivation.md L26 and protocol.md L290.
4. **None of the crate READMEs have a Status column** in their ADR tables,
which contributes to C2 (reader can't tell which ADRs are binding).
**Suggestion**: Add the missing ADRs to each crate README's ADR table. Add a
Status column (Accepted/Proposed/Superseded) to make the governance state
visible — this directly supports resolving C1/C2.
---
### S5. `Connection::new()` Referenced but Never Defined
**Documents**: crates/core/core-types.md (L113), crates/core/endpoint.md
(L127), ADR-010 (L117)
**Problem**: `Connection::new(connection)` is called in dispatch pseudocode
(endpoint.md L127, ADR-010 L117) but the `impl Connection` block in
`core-types.md:5967` does not include a `new()` constructor, nor specify its
signature/parameter. `core-types.md:113` says "`Connection::new()` wraps the
appropriate stream source based on where the connection came from" — but
there's no formal signature, and the argument type differs by source (a
`quinn::Connection` vs an iroh `Accepting`/`Connecting`). An implementer
doesn't know the constructor's parameter type.
**Suggestion**: Add `pub fn new(source: ConnectionSource) -> Connection` (or
two constructors `from_quinn` / `from_iroh`, or a private enum) to the
`impl Connection` block. Define `ConnectionSource` or the variant
constructors.
---
### S6. `services/schema` Input Shape Under-Specified (Leading Slash)
**Documents**: operation-registry.md (L286), call-protocol.md (L120, L126)
**Problem**: The input shape for `services/schema` is shown as
`{ "name": "fs/readFile" }` (no leading slash) but the wire `operationId`
convention (call-protocol.md L120, L126) is *with* a leading slash
(`/fs/readFile`). So does `services/schema`'s `name` input use the wire form
(leading slash) or the registry form (no slash)? The spec shows no slash,
implying registry form, but a client calling `services/schema` over the wire
would naturally use the same `operationId` form it uses elsewhere. This is a
small ambiguity but could cause a `NOT_FOUND` if the adapter doesn't
normalize.
**Suggestion**: Specify whether `services/schema`'s `name` input has a
leading slash and whether the adapter strips it (consistent with
`call.requested`'s `operationId` handling per call-protocol.md L120).
---
### S7. `call.completed` For Non-Subscription Calls Not Addressed
**Documents**: call-protocol.md (L96106)
**Problem**: The event table lists `call.completed` as "Signal end of
subscription stream" — subscription-only. But for a plain `call()`
(request/response, one result), the lifecycle is `call.requested`
`call.responded` (one) and then… nothing? Is the request considered complete
on the first `call.responded`? Is a `call.completed` ever sent for
non-subscription calls? The spec says (L103) "call(): Sends `call.requested`,
resolves on first `call.responded`" — so no `call.completed` for calls. But
then what signals to the server that the client is done listening? This is
consistent but the asymmetry (subscriptions get `completed`, calls don't)
could be stated explicitly to avoid an implementer sending a spurious
`call.completed` after a `call.responded`.
**Suggestion**: Add a note: "`call.completed` is sent only for subscriptions.
A plain `call()` is complete after its single `call.responded`; no
`call.completed` follows."
---
### S8. `OperationProvenance::FromCall` Comment Is Ambiguous ("Leaf Locally")
**Documents**: ADR-022 (L123124)
**Problem**: `FromCall` => "QUIC forwarding stub (from_call), leaf locally —
cannot compose." The "locally" qualifier is ambiguous. Does it mean: (a)
`from_call` ops are leaves in the *local* registry but could compose on the
*remote* node? Or (b) they're leaves, full stop, and "locally" is just
emphasis? Assumption 5 (L518524) clarifies, but the phrasing invites
misreading.
**Suggestion**: Rephrase to "leaf in the local registry — forwards calls to
a remote node; cannot compose locally" or similar to remove the ambiguity.
---
### S9. `ScopedOperationEnv::empty()` and `CompositionAuthority::none()` Referenced but Undefined
**Documents**: operation-registry.md (L182184, L243244, L298299, L318320),
ADR-022 (L287)
**Problem**: The spec and ADR-022 use `ScopedOperationEnv::empty()` and
`CompositionAuthority::none()` (constructor methods) but neither type's
definition (ADR-022 L164180 for `CompositionAuthority`, L208214 for
`ScopedOperationEnv`) lists these methods. They're implied (empty set /
None-equivalent) but not shown.
**Suggestion**: Add `impl ScopedOperationEnv { pub fn empty() -> Self { … }
pub fn allows(&self, name: &str) -> bool { … } }` and
`impl CompositionAuthority { pub fn none() -> Option<Self> { None } }` (or
similar) to the spec or ADR-022.
---
### S10. Overview OQ-04 Entry Is Stale
**Documents**: overview.md (L225)
**Problem**: `overview.md:225` lists "OQ-04: Dynamic handler registration at
runtime vs static at startup (two-way door, defer to implementation)." But
OQ-04 in `open-questions.md:4751` is **resolved** ("Static registration at
startup"). The overview shows a stale framing ("defer to implementation")
while the OQ is resolved. Compare to OQ-01/OQ-02/OQ-03 which are correctly
marked "resolved" in the same overview list.
**Suggestion**: Update `overview.md:225` to "OQ-04: Dynamic handler
registration (resolved: static at startup — see ADR-010)" to match the
resolved state.
---
### S11. ADR-021 References "W1 Drift Issue from the Vault Review" — Ambiguous Cross-Reference
**Documents**: ADR-021 (L127129)
**Problem**: ADR-021 L127129 states: "This is the fix for the W1 drift issue
from the vault review: the current source ignores `key_version` for key
selection; the spec now makes it functional." But review 001's W1 (L355371)
is about *ADR-020's stale example path* (`m/74'/2'/1'/0'` vs
`m/74'/2'/0'/1'`), not about the source ignoring `key_version`. The
source-ignoring-`key_version` drift is not tracked in review 001 at all. The
"W1" label in ADR-021 doesn't match review 001's W1.
**Suggestion**: Clarify the reference — either cite review 001 with the
correct finding number, or remove the "W1" label and describe the drift
directly (which the text already does).
---
### S12. Duplicate "Helper Functions" Block in mnemonic-derivation.md
**Documents**: crates/vault/mnemonic-derivation.md (L199204, L217223)
**Problem**: The helper functions `device_path` and `site_password_path` are
documented twice: L199204 (missing `encryption_path_for_version`) and
L217223 (complete). An implementer reading the first block might not realize
`encryption_path_for_version` exists, or might think there are two conflicting
definitions. The duplication is a maintenance hazard.
**Suggestion**: Remove the first block (L199204); keep the complete list at
L217223.
---
## Cross-Cutting Theme: ADR-022 Narrowed Several "Two-Way Doors" Without Updating the OQ Classifications
The two-way-door audit found a systematic pattern: ADR-022 (Proposed, not yet
Accepted) is the single largest source of door-type staleness. It narrowed
several doors without updating the OQ classifications that were written
pre-ADR-022:
| Door | Pre-ADR-022 classification | Post-ADR-022 reality |
|------|---------------------------|----------------------|
| Capabilities concrete shape (W2) | Two-way (implementation detail) | Clone semantics become a behavioral contract once handlers depend on shared mutation; ADR-022's composition model bakes in the cheap-clone assumption |
| OperationAdapter trait signatures (W1) | Two-way (async vs sync) | ADR-022 committed the return type to `HandlerRegistration`; `from_call` forces async — both are already decided |
| CallClient registry mechanism (W3) | Two-way (share global vs subset vs separate) | ADR-022 put capabilities (secrets) in the bundle — "share global" is now a capability-exposure decision, not just dispatch |
| Dynamic registration (W4) | Two-way (add ArcSwap later) | Hot-swap now requires re-injecting capabilities (secret-material handling) + TLS config rebuild, not a pointer swap |
| Session-scoped registries (OQ-19) | Two-way (protocol doesn't need changes) | `OperationContext.env` field type is a current commitment — if it's a concrete struct, the session-overlay pattern is closed (see C6) |
| Graph structures (W21) | Two-way (HashSet now, flowgraph later) | `ScopedOperationEnv`'s public field leaks the representation; the refactor is a public-API break, not an internal change |
**Pattern**: ADR-022 was written to resolve review #001's C1C4 (the
registration-bundle tangle), and in doing so it quietly narrowed several
previously-two-way doors. The OQ tracker and ADR-017 should be reconciled
with ADR-022 before the "two-way door" labels can be trusted. ADR-009's
framework is sound; its application is drifting because later ADRs don't
update earlier OQ classifications.
The fix is not to re-litigate each door — most of ADR-022's narrowing is
correct (the bundle *should* carry capabilities, the adapter *should* produce
`HandlerRegistration`). The fix is to add guard clauses to the affected OQs
and ADRs documenting the narrowed scope, so an implementer reading the OQ
tracker sees the post-ADR-022 reality, not the pre-ADR-022 optimism.
---
## Cross-Cutting Theme: The "Published Artifact Is a Contract" Blind Spot
The audit found a systematic blind spot in ADR-009's framework: it
classifies doors by reversal cost in the codebase, not by compatibility cost
for external consumers. Three deferrals share this pattern:
- **`to_*` best-effort mappings (W5)**: "best-effort" is true before first
publication but one-way after — external clients depend on the generated
spec.
- **Error schema `http_status` mappings (W20)**: once `to_openapi` publishes
`FILE_NOT_FOUND` → 404, changing it breaks clients.
- **The salt field's wire format (W6)**: the field is "reserved" (wire-format-
stable), but its *semantics* cannot change for existing data — "reserved"
implies more flexibility than exists.
The framework treats these as two-way because the *format* is additive; it
misses that the *semantics*, once published, are frozen. This is the same
class of issue as a public API: adding a method is additive, but changing an
existing method's behavior is breaking. The spec should acknowledge that
published artifacts (generated OpenAPI specs, published error code mappings,
shipped wire formats) are compatibility contracts, not internal
implementation details.
---
## Summary Statistics
| Severity | Count |
|----------|-------|
| Critical | 13 |
| Warning | 21 |
| Suggestion | 12 |
## Recommended Resolution Order
The findings cluster into a few resolution passes. Recommended order:
### Pass 1: Governance reconciliation (resolves C1, C2, C12, C11)
1. Advance ADR-022 and ADR-023 to **Accepted**.
2. Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
update the struct to `CompositionAuthority`.
3. Amend ADR-002: note the signature was revised by ADR-007.
4. Amend ADR-004: note the "enrich/replace" language was superseded by
ADR-011's immutability model.
5. Add a Status column to all crate README ADR tables.
This is the highest-value pass — it resolves the direct contradictions in the
document set and makes the governance state coherent.
### Pass 2: Spec-ADR consistency (resolves C3, C4, C5, C6, C10, C13, W1)
6. Add `abort_policy` to `OperationContext`; define `AbortPolicy` enum (C3).
7. Define the `OperationEnv` trait explicitly, including
`invoke_with_policy()` (C4).
8. Separate `OperationContext.env` (dispatch trait) from
`OperationContext.scoped_env` (reachability set) — resolve the type
identity crisis (C6).
9. Update ADR-017's `from_call` mirror list and `OperationAdapter` trait to
include `error_schemas` and `HandlerRegistration` (C5, W1).
10. Specify `From<StreamError> for HandlerError` and resolve the variant
payload (C10).
11. Specify the `Connection::set_identity` observability read path (C13).
### Pass 3: Vault security and deferral reconciliation (resolves C7, C8, C9, W6, W7, W8, W9, W10, W12)
12. Promote OQ-21 from "deferred" to "open"; reconcile with ADR-019; decide
the vault-server-crate question (C7).
13. Complete the operation access policy table (C8).
14. Specify the `site_password_path` string→u32 mapping (C9).
15. Add an ADR for the HD-derivation key model (identity/SSH/signing) (W9).
16. Add an ADR locking the `EncryptedData` wire format (W10).
17. Add a "Known Source Drift" table to the vault README (W12).
18. Address `unlock_new` returning non-zeroizing `String` (W7) and
`DerivedKey` JSON deserialization (W8).
19. Clarify the salt field's "reserved" semantics (W6).
### Pass 4: Two-way-door guard clauses (resolves W2, W3, W4, W5, W17, W18, W21)
20. Add guard clauses to the OQs and ADRs that ADR-022 narrowed (W2, W3, W4,
W21).
21. Add the "published artifact is a contract" guard to `to_*` adapters (W5).
22. Correct OQ-09 (WASM server-side) and OQ-10 (git composability) framings
(W17, W18).
### Pass 5: Minor consistency (resolves remaining Warnings and Suggestions)
23. Fix the overview.md "stream" → "connection" wording (W16).
24. Fix ADR-016's self-referential citation and grandchild propagation (W19).
25. Fix ADR-023's `from_openapi` collision example (W20).
26. Address the remaining suggestions (S1S12).
---
## Note on Effort
This review found more issues than review #001 (13 critical vs 5, 21 warning
vs 4), but that's expected: review #001 caught the registration-bundle tangle
and the missing error schemas, which were the highest-density gaps. This
review goes wider — cross-document consistency, the two-way-door framing, and
the vault deferrals — and surfaces issues that were always there but not yet
audited. The critical findings here are individually smaller than review
#001's C1C5 (no single tangle of four interlocking gaps), but they are
collectively important because several are governance issues (Proposed ADRs
treated as binding, Accepted ADRs contradicting later refinements) that
undermine the document set's authority.
The two-way-door audit's core finding: the framework is sound, but its
application is drifting because later ADRs don't update earlier OQ
classifications. The fix is not to re-litigate each door — it's to add guard
clauses documenting the narrowed scope, so the OQ tracker reflects
post-ADR-022 reality.