Second pre-implementation review. Goes wider than #001 on cross-document consistency and the two-way-door framing from ADR-009. Finds 13 critical, 21 warning, 12 suggestion issues: - Governance: ADR-022/023 are Proposed but specs treat them as binding; ADR-015/002/004 (Accepted) contradict later refinements without supersession markers - Abort policy (ADR-016) missing from OperationContext struct; OperationEnv trait never defined - OperationContext.env type identity crisis (reachability set vs dispatch trait) - ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait stale vs ADR-022 bundle - OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides a crate-decomposition decision; RemoteService path unvalidated - Vault operation access policy table incomplete for security-sensitive methods - site_password_path string-to-index mapping breaks determinism guarantee - Two-way-door audit: ADR-022 narrowed several doors without updating OQ classifications; 'published artifact is a contract' blind spot in framework Includes recommended 5-pass resolution order.
1780 lines
90 KiB
Markdown
1780 lines
90 KiB
Markdown
---
|
||
status: draft
|
||
last_updated: 2026-06-22
|
||
reviewed_documents:
|
||
- overview.md
|
||
- README.md
|
||
- open-questions.md
|
||
- crates/core/README.md
|
||
- crates/core/core-types.md
|
||
- crates/core/endpoint.md
|
||
- crates/core/auth.md
|
||
- crates/core/config.md
|
||
- crates/call/README.md
|
||
- crates/call/call-protocol.md
|
||
- crates/call/operation-registry.md
|
||
- crates/vault/README.md
|
||
- crates/vault/mnemonic-derivation.md
|
||
- crates/vault/encryption.md
|
||
- crates/vault/service.md
|
||
- crates/vault/protocol.md
|
||
- decisions/001 through 023 (all 23 ADRs)
|
||
reviewer: architect
|
||
---
|
||
|
||
# Architecture Gap Review #002
|
||
|
||
## Purpose
|
||
|
||
This is the second pre-implementation sanity check. Review #001 caught the
|
||
registration-bundle tangle (C1–C4) and the missing error schemas (C5), both
|
||
resolved via ADR-022 and ADR-023. This review goes deeper on two axes the
|
||
first review under-covered:
|
||
|
||
1. **Cross-document consistency**: places where an Accepted ADR contradicts a
|
||
Proposed ADR, where a spec doc adopted a Proposed ADR's types without the
|
||
ADR being advanced, and where a later ADR narrowed a "two-way door"
|
||
deferral without updating the OQ classification.
|
||
2. **Two-way-door deferral audit**: the user's concern that the one-way/two-way
|
||
door framing from ADR-009 has been applied too loosely. Some deferrals are
|
||
legitimate, but several are questionable or misleading — either because a
|
||
later ADR quietly narrowed the door, because the "path of least resistance"
|
||
implementation forecloses a better option, or because multiple
|
||
individually-reversible deferrals compound to foreclose a path that no
|
||
single deferral would block on its own.
|
||
|
||
Review #001 resolved 5 critical, 4 warning, and 4 suggestion findings. This
|
||
review finds a different set: the registration-bundle work surfaced new
|
||
inconsistencies between ADR-015 (Accepted) and ADR-022 (Proposed), the abort
|
||
policy (ADR-016) was never reflected in the OperationContext struct, the
|
||
OperationEnv trait was never defined, and several vault deferrals have
|
||
security dimensions that the two-way-door framing doesn't surface.
|
||
|
||
## Severity Definitions
|
||
|
||
| Severity | Meaning |
|
||
|----------|---------|
|
||
| **Critical** | Must resolve before implementation — would cause divergent implementations or security issues |
|
||
| **Warning** | Should resolve — missing specifications that could cause confusion or rework |
|
||
| **Suggestion** | Consider — improvements for clarity or completeness |
|
||
|
||
---
|
||
|
||
## Critical Findings
|
||
|
||
### C1. ADR-015 Is Not Reconciled With ADR-022 — Direct Contradiction in the Document Set
|
||
|
||
**Documents**: ADR-015 (Status: Accepted, Decision 3 L114–136, Assumption 6
|
||
L275–280), ADR-022 (Status: Proposed, Decision 2 L145–186), operation-registry.md
|
||
(L115, L133, L172), call-protocol.md (L296)
|
||
|
||
**Problem**: ADR-015 is marked **Accepted** and defines
|
||
`handler_identity: Option<Identity>` in its Decision 3 (L124) — a peer `Identity`
|
||
type resolved via `IdentityProvider`. Assumption 6 (L275–280) explicitly states
|
||
"the handler identity is a full `Identity` (with scopes), not a special
|
||
principal type… This reuses the existing `Identity` type and `IdentityProvider`
|
||
infrastructure (ADR-004)."
|
||
|
||
ADR-022 is marked **Proposed** and explicitly supersedes ADR-015's Assumption 6
|
||
(L147–149: "This ADR refines that: composition authority is a declared authority
|
||
bundle, not a peer `Identity`"; L183–185: "This supersedes ADR-015's Assumption
|
||
6"). The spec docs now use `handler_identity: Option<CompositionAuthority>` —
|
||
a different, non-peer type.
|
||
|
||
This is a **direct contradiction**. Per the ADR lifecycle, a superseded
|
||
decision should be marked `Superseded` or carry an inline note pointing to the
|
||
superseding ADR. ADR-015 has neither. Worse, ADR-015 has **higher status**
|
||
(Accepted) than the ADR that supersedes it (Proposed). An implementer reading
|
||
ADR-015 in isolation would build `Identity`-based handler identity; an
|
||
implementer reading the specs would build `CompositionAuthority`-based. The
|
||
Accepted-overrides-Proposed governance rule means the old `Identity` definition
|
||
is, by the document set's own lifecycle, currently authoritative — yet the
|
||
specs have already adopted the proposed change.
|
||
|
||
This is the same shape of bug as review #001's W1 (ADR-020's stale example
|
||
path contradicting the locked ADR-021 scheme), but more severe: it's a
|
||
security-model type, not a path string.
|
||
|
||
**Suggestion**: Two things must happen before implementation:
|
||
|
||
1. Advance ADR-022 and ADR-023 to **Accepted**. The specs already depend on
|
||
them as the source of truth for types that appear throughout the code.
|
||
Keeping them Proposed while the specs adopt their types is incoherent
|
||
governance. (See C2.)
|
||
2. Reconcile ADR-015: either update its Decision 3 and Assumption 6 to point
|
||
to ADR-022 (inline amendment note + change `Identity` to
|
||
`CompositionAuthority` in the struct), or mark the relevant sections
|
||
`Superseded by ADR-022 (Decision 2)`.
|
||
|
||
The cleanest path is to do both in one pass: advance ADR-022/023 to Accepted,
|
||
amend ADR-015 to reference the supersession. This resolves C1, C2, and C6 in
|
||
one move.
|
||
|
||
---
|
||
|
||
### C2. ADR-022 and ADR-023 Are "Proposed" but the Specs Treat Them as Authoritative
|
||
|
||
**Documents**: ADR-022 (Status: Proposed), ADR-023 (Status: Proposed),
|
||
operation-registry.md (L40, L115, L133, L149, L158, L160–168, L286, L372–379,
|
||
L404–406), call-protocol.md (L296, L305, L369–373), README.md (ADR table
|
||
L36–60)
|
||
|
||
**Problem**: Both ADR-022 and ADR-023 are marked **Proposed**. Yet the spec
|
||
docs depend on them pervasively as the source of truth:
|
||
|
||
- `OperationSpec.error_schemas` (ADR-023) — operation-registry.md L40, L286.
|
||
- `ErrorDefinition` (ADR-023) — operation-registry.md L56–61.
|
||
- `HandlerRegistration` with `provenance`, `composition_authority`,
|
||
`scoped_env`, `capabilities` (ADR-022) — operation-registry.md L160–168,
|
||
the central registration type.
|
||
- `OperationContext.handler_identity: Option<CompositionAuthority>` (ADR-022
|
||
Decision 2) — operation-registry.md L115, L133; call-protocol.md L296.
|
||
- `CompositionAuthority`, `ScopedOperationEnv`, `OperationProvenance`
|
||
(ADR-022) — used throughout.
|
||
|
||
The README's ADR table (L36–60) shows ADR-022 and ADR-023 as "Proposed," but
|
||
the call-crate README's ADR table has **no Status column** at all — a reader
|
||
can't tell which ADRs are binding. Per the ADR lifecycle, "Proposed" means
|
||
under review, not yet a commitment.
|
||
|
||
Combined with C1 (ADR-015 Accepted still contradicts ADR-022 Proposed), the
|
||
governance state is incoherent: the Accepted ADR says `Identity`, the Proposed
|
||
ADR says `CompositionAuthority`, and the specs have picked the Proposed one.
|
||
|
||
**Suggestion**: Advance ADR-022 and ADR-023 to **Accepted** before
|
||
implementation. They are clearly intended to be binding — the specs already
|
||
implement them. Add a Status column to the crate README ADR tables so the
|
||
binding/non-binding distinction is visible. This is the governance fix that
|
||
unblocks C1.
|
||
|
||
---
|
||
|
||
### C3. The Abort Policy (ADR-016 Decision 6) Is Missing from `OperationContext` and `OperationEnv::invoke()`
|
||
|
||
**Documents**: ADR-016 (Decision 6 L120–164, Accepted), operation-registry.md
|
||
(`OperationContext` struct L111–128, `LocalOperationEnv::invoke()` L211–249),
|
||
call-protocol.md (L124, L345)
|
||
|
||
**Problem**: ADR-016 Decision 6 states unambiguously (L122–125): "The abort
|
||
policy (`abort-dependents` vs `continue-running`) is set on `OperationContext`
|
||
and propagated to children via `OperationEnv::invoke()`." It shows
|
||
`OperationContext` carrying the policy (L154) and an opt-in
|
||
`invoke_with_policy(...)` method (L149–151).
|
||
|
||
The `OperationContext` struct in operation-registry.md (L111–128) has **no
|
||
`abort_policy` field**. The fields shown are: `request_id`,
|
||
`parent_request_id`, `identity`, `handler_identity`, `capabilities`,
|
||
`metadata`, `env`, `internal`. No policy. The `LocalOperationEnv::invoke()`
|
||
implementation (L211–249) likewise constructs a child `OperationContext` with
|
||
no policy field and takes no policy parameter.
|
||
|
||
call-protocol.md L124 and L345 correctly *state* that the policy is on
|
||
`OperationContext` and propagated via `OperationEnv::invoke()`, but the actual
|
||
struct definition in operation-registry.md — the one an implementer would copy
|
||
— omits it. This is a gap between ADR-016 (Accepted) and the spec. An
|
||
implementer copying the spec would simply omit the abort policy, and the
|
||
`continue-running` opt-in would have no wiring point.
|
||
|
||
**Suggestion**: Add `pub abort_policy: AbortPolicy` (with a default of
|
||
`AbortPolicy::AbortDependents`) to the `OperationContext` struct in
|
||
operation-registry.md, and define the `AbortPolicy` enum
|
||
(`AbortDependents | ContinueRunning`). The `build_root_context` in
|
||
call-protocol.md should set `abort_policy: AbortPolicy::AbortDependents` (the
|
||
default per ADR-016 L140). See C4 for the `invoke()` side.
|
||
|
||
---
|
||
|
||
### C4. `OperationEnv` Trait Is Never Defined; `invoke_with_policy()` Absent from Spec
|
||
|
||
**Documents**: operation-registry.md (L196–259), ADR-016 (Decision 6 L144–164)
|
||
|
||
**Problem**: Two related gaps:
|
||
|
||
1. **The `OperationEnv` trait is never defined.** operation-registry.md
|
||
repeatedly references `OperationEnv` as a **trait** that "must remain a
|
||
trait" (L259, also OQ-19), and shows
|
||
`#[async_trait] impl OperationEnv for LocalOperationEnv` (L211–212), but
|
||
**the trait definition itself is never shown**. Only the `LocalOperationEnv`
|
||
struct (L207–209) and its impl block (L211–249) appear. An implementer
|
||
cannot know the trait's method signatures, return types, or whether
|
||
`invoke()` is the sole method.
|
||
|
||
2. **The `invoke_with_policy()` opt-in is absent.** ADR-016 Decision 6
|
||
(L144–152) shows the policy entering through `OperationEnv::invoke()` via
|
||
`invoke_with_policy("fs", "readFile", input, &context,
|
||
AbortPolicy::ContinueRunning)`. The spec's `OperationEnv::invoke()`
|
||
signature (L213) takes no policy parameter, and there is no
|
||
`invoke_with_policy` method, no `InvokeOptions` struct, no builder
|
||
pattern. The spec never defines the trait surface, so the policy opt-in
|
||
mechanism ADR-016 mandates is unspecified.
|
||
|
||
This is load-bearing because OQ-19 makes the trait-ness of `OperationEnv` a
|
||
**constraint**: session-scoped registries depend on wrapping the global env
|
||
via trait layering. If the trait surface is unspecified, two implementers
|
||
could define it differently, breaking the wrapping pattern.
|
||
|
||
**Suggestion**: Define the `OperationEnv` trait explicitly in
|
||
operation-registry.md, showing at minimum:
|
||
|
||
```rust
|
||
#[async_trait]
|
||
pub trait OperationEnv: Send + Sync {
|
||
async fn invoke(
|
||
&self,
|
||
namespace: &str,
|
||
operation: &str,
|
||
input: Value,
|
||
parent: &OperationContext,
|
||
) -> ResponseEnvelope;
|
||
|
||
/// Opt-in: compose a child with a non-default abort policy.
|
||
async fn invoke_with_policy(
|
||
&self,
|
||
namespace: &str,
|
||
operation: &str,
|
||
input: Value,
|
||
parent: &OperationContext,
|
||
policy: AbortPolicy,
|
||
) -> ResponseEnvelope {
|
||
// default implementation: set policy on child context, delegate to invoke
|
||
}
|
||
}
|
||
```
|
||
|
||
This satisfies both the "must remain a trait" constraint and ADR-016's
|
||
policy parameter. See C7 for a related type ambiguity on the same struct.
|
||
|
||
---
|
||
|
||
### C5. ADR-017's `from_call` Mirror List Does Not Include `error_schemas` — Inconsistency with ADR-023
|
||
|
||
**Documents**: ADR-017 (Decision 3 L113–122, Assumption 4 L274–279, Accepted),
|
||
ADR-023 (Decision 1 L112–147, L42–47, L257, Proposed)
|
||
|
||
**Problem**: ADR-017 Decision 3, step 3 (L118–119) says `from_call` constructs
|
||
a spec that "mirrors the remote operation's name, namespace, type, schemas,
|
||
and access control." Assumption 4 (L274–275) repeats: "the imported
|
||
`OperationSpec` has the same name, namespace, type, schemas, and access
|
||
control as the remote operation." The list is: name, namespace, type,
|
||
schemas, access control. There is **no `error_schemas`** in either list.
|
||
|
||
ADR-023 adds `error_schemas: Vec<ErrorDefinition>` to `OperationSpec` (L123)
|
||
and Decision 5 (L236–261) makes adapter fidelity explicit for
|
||
`from_openapi`/`to_openapi`/`from_mcp`/`to_mcp`. But ADR-023 does **not**
|
||
mention `from_call` in Decision 5 — it covers OpenAPI and MCP adapters only.
|
||
Meanwhile ADR-017's `from_call` mirror list predates `error_schemas` existing
|
||
as a field.
|
||
|
||
The result: it's ambiguous whether `from_call` should mirror `error_schemas`.
|
||
ADR-017's step 2 (L116–117) says `from_call` calls `services/schema` "for each
|
||
→ gets the input/output JSON Schemas" — again, no mention of error schemas.
|
||
ADR-023 Decision 6 (L264–286) confirms `services/schema` *returns*
|
||
`error_schemas`, so the data is available, but ADR-017 doesn't say to consume
|
||
it.
|
||
|
||
An implementer following ADR-017 literally would import operations with empty
|
||
`error_schemas`, silently dropping the remote error contract — the exact
|
||
lossy-import problem ADR-023 was written to solve for OpenAPI/MCP, now
|
||
recurring for `from_call`.
|
||
|
||
**Suggestion**: Update ADR-017 Decision 3 step 3 and Assumption 4 to include
|
||
`error_schemas` in the mirror list (name, namespace, type, schemas,
|
||
**error_schemas**, access control), and update step 2 to note that
|
||
`services/schema` now also returns error schemas. Add a cross-reference to
|
||
ADR-023. This is an inconsistency between an Accepted ADR (017) and a later
|
||
ADR (023) that must be reconciled — same shape as review #001's W3
|
||
(`from_mcp` missing from adapter lists), but for a new field.
|
||
|
||
---
|
||
|
||
### C6. `OperationContext.env` Has a Type Identity Crisis — Reachability Set vs Dispatch Trait
|
||
|
||
**Documents**: ADR-015 (L127: `pub env: OperationEnv`), ADR-022 (L208–215
|
||
`ScopedOperationEnv`, L286–287 `env: registration.scoped_env…`),
|
||
operation-registry.md (L118: `pub env: OperationEnv`; L220:
|
||
`parent.env.allows(&name)`; L243–244: `env: registration.scoped_env.clone()…`;
|
||
L198: `context.env.invoke(...)`), call-protocol.md (L298–299)
|
||
|
||
**Problem**: There is a type identity problem for the `env` field on
|
||
`OperationContext`:
|
||
|
||
- ADR-015 L127 and operation-registry.md L118 declare `pub env: OperationEnv`
|
||
(a trait object / the trait type).
|
||
- ADR-022 and operation-registry.md L243–244 and call-protocol.md L298–299
|
||
*populate* `env` with a `ScopedOperationEnv` (a concrete struct:
|
||
`allowed_operations: HashSet<String>`, ADR-022 L208–214).
|
||
|
||
`ScopedOperationEnv` (a reachability set — ADR-022's reachability control)
|
||
and `OperationEnv` (the composition dispatch trait — the thing with
|
||
`invoke()`) are **different concepts**. ADR-015 conflated them by putting
|
||
`OperationEnv` on the context; ADR-022 introduced `ScopedOperationEnv` as the
|
||
reachability *input* but the dispatch path assigns the scoped set directly to
|
||
`env:`.
|
||
|
||
operation-registry.md L220 calls `parent.env.allows(&name)` — implying `env`
|
||
is the `ScopedOperationEnv` (has `.allows()`). But the same field is named
|
||
`OperationEnv` in the struct and described (L136) as "the operation environment
|
||
for composing calls… `OperationEnv`" with `invoke()`. A handler calls
|
||
`context.env.invoke(...)` (L198, L200) — implying `env` is the dispatch
|
||
trait. One field cannot be both a `HashSet`-based reachability gate and a
|
||
trait with an async `invoke()` method, unless they are the *same type* (they
|
||
are not — ADR-022 treats `ScopedOperationEnv` as data and the dispatch
|
||
`OperationEnv`/`LocalOperationEnv` as the trait impl that *consults* the
|
||
scoped env).
|
||
|
||
This is a genuine type-system ambiguity that would force an implementer to
|
||
guess: is `context.env` the dispatch trait (then where does the scoped
|
||
reachability set live?), or is it the reachability set (then how does the
|
||
handler call `invoke()` on it?), or is it a composite type that holds both?
|
||
|
||
This interacts with OQ-19's constraint that `OperationEnv` must remain a trait:
|
||
if `env` is `ScopedOperationEnv` (a concrete struct), the session-overlay
|
||
pattern is closed because the field can't hold a trait object.
|
||
|
||
**Suggestion**: Separate the two concepts explicitly. Likely:
|
||
`OperationContext` carries `scoped_env: ScopedOperationEnv` (reachability
|
||
data, used for the `allows()` check) and a separate `env: Box<dyn
|
||
OperationEnv>` (or `Arc<dyn OperationEnv>`) that is the dispatch trait (used
|
||
for `invoke()`). The reachability check in `invoke()` consults
|
||
`parent.scoped_env.allows(&name)`, and the actual dispatch goes through the
|
||
`OperationEnv` trait impl. The current field naming (`env` for both) is the
|
||
root of the confusion and should be disambiguated. This may warrant an ADR-022
|
||
clarification, since it changes the struct shape that ADR-022 Decision 5 shows.
|
||
|
||
---
|
||
|
||
### C7. OQ-21's "Non-Breaking" Deferral Conflicts With ADR-019 and Hides a Crate-Decomposition Decision
|
||
|
||
**Documents**: open-questions.md (OQ-21 L252–285), ADR-019 (L133–136),
|
||
crates/vault/protocol.md (L187–216, L278–291), crates/vault/service.md
|
||
(L300–322)
|
||
|
||
**Problem**: Three interlocking problems:
|
||
|
||
1. **ADR-019 directly contradicts OQ-21's deferral framing.** ADR-019 L133–136
|
||
states: *"Remote vault administration (unlock a running node's vault over
|
||
the network) is not supported. If needed in the future, it requires a
|
||
separate, heavily restricted mechanism (admin scope, mTLS-only, never
|
||
expose the mnemonic over an unauthenticated channel) and **its own ADR**."*
|
||
OQ-21 defers remote vault *access* (derive/encrypt/decrypt) as
|
||
"non-breaking" with no ADR required. The boundary between ADR-019's
|
||
"remote vault administration" (needs an ADR) and OQ-21's "remote vault
|
||
access" (deferred, no ADR) is undefined. An implementer reading ADR-019
|
||
could reasonably conclude that *any* remote vault exposure requires an
|
||
ADR; an implementer reading OQ-21 could conclude none is needed.
|
||
|
||
2. **The "non-breaking" claim is only true for the wire format, not for
|
||
secure enablement.** OQ-21 and protocol.md L278–286 frame enabling remote
|
||
access as a "server-setup change." But the auth-wrapping handler that makes
|
||
remote access *safe* cannot live in the vault crate (ADR-018) and requires
|
||
either (a) assembly-layer integration or (b) **a new dedicated vault-server
|
||
crate depending on both alknet-core and alknet-vault** (protocol.md
|
||
L201–204, open-questions.md L269). Option (b) is a crate-decomposition
|
||
change (ADR-003 territory), not "server setup." OQ-21's "two-way (enabling
|
||
is non-breaking)" door-type classification (L256) is misleading: once
|
||
workers build dependencies on the remote vault API, disabling it breaks
|
||
them — the door is operationally one-way even if the wire format is
|
||
additive.
|
||
|
||
3. **Nested, undecided deferral.** OQ-21 L269 says the auth handler "lives in
|
||
the assembly layer **or** a dedicated vault-server crate." This "or" is
|
||
itself an undecided architectural choice (affects the dependency graph and
|
||
ADR-003's crate decomposition) deferred *inside* the OQ-21 deferral. It is
|
||
not tracked as a separate question.
|
||
|
||
4. **The RemoteService path is unvalidated.** The deferral means
|
||
`VaultProtocol`'s `RemoteService` trait impl, `DerivedKey`'s dual
|
||
serialization (JSON redacts, postcard preserves), and
|
||
`VaultServiceActor`'s transport-agnosticism have never been exercised over
|
||
a real network. The spec *asserts* these work ("by construction"), but
|
||
there are concrete failure modes that only surface at enablement —
|
||
e.g., `DerivedKey`'s custom `Serialize` redacts `private_key` for
|
||
"human-readable formats"; whether postcard correctly bypasses the
|
||
redaction is a known source of subtle bugs. If it redacts under postcard
|
||
too, remote dispatch silently corrupts private keys. ADR-009:33 warns
|
||
"if the right choice is unclear, validate with a POC." The deferral skips
|
||
the POC and asserts the door is open by construction.
|
||
|
||
5. **Safety footgun.** The default `IrohProtocol` handler forwards all
|
||
message types without auth (protocol.md L189–191). If someone naively
|
||
registers the ALPN without the wrapping handler, `Unlock` (mnemonic over
|
||
the wire) and `Lock` (DoS) become remotely callable. The operation access
|
||
policy is documented in prose only; nothing in the vault crate prevents
|
||
the insecure configuration.
|
||
|
||
**Suggestion**: Promote OQ-21 from "deferred" to "open." Reconcile with
|
||
ADR-019 by explicitly defining the administration-vs-access boundary. Decide
|
||
now whether a vault-server crate is part of the architecture (this is a
|
||
crate-decomposition one-way door per ADR-003/ADR-018 and should be decided,
|
||
not deferred). If remote access is genuinely deferred, require that the
|
||
*enablement* (not just the protocol) be covered by a future ADR — matching
|
||
ADR-019's "requires its own ADR" language. Correct the door-type
|
||
classification: enabling remote access is a one-way door operationally (workers
|
||
depend on it), even if the wire format is additive. At minimum, add a guard
|
||
clause: "The RemoteService path is unvalidated. The `DerivedKey` postcard
|
||
serialization and the actor's transport-agnosticism must be verified by a
|
||
POC before claiming the door is open."
|
||
|
||
---
|
||
|
||
### C8. The Operation Access Policy Table Is Incomplete — Security-Sensitive Methods Are Unclassified
|
||
|
||
**Documents**: crates/vault/protocol.md (L218–236), crates/vault/service.md
|
||
(L104–212)
|
||
|
||
**Problem**: The "Operation access policy" table (protocol.md L223–232)
|
||
covers exactly the 8 `VaultProtocol` enum variants. But `VaultServiceHandle`
|
||
exposes additional methods not in the protocol enum: `rotate`, `unlock_new`,
|
||
`derive_encryption_key_for_version`, `derive_password_string`, `is_unlocked`.
|
||
These methods have no access classification — an implementer cannot determine
|
||
from the spec whether they are local-only, remote-capable, or simply not
|
||
protocol-exposed. This is security-relevant:
|
||
|
||
- **`rotate`** re-encrypts blobs (security-sensitive operational action;
|
||
ADR-021 L168–170 says "rotation is an operational action, not a runtime
|
||
behavior"). If it were ever exposed remotely, a compromised worker could
|
||
trigger mass re-encryption. It is not in the protocol enum and not in the
|
||
access policy table — its status is undefined.
|
||
- **`unlock_new`** generates and returns the mnemonic (root of trust) as a
|
||
plaintext `String`. Not in the protocol enum; not in the policy table. Its
|
||
local-only status is implied but not stated.
|
||
- **`derive_encryption_key_for_version`** returns an `EncryptionKey` directly
|
||
(raw key material), unlike `Encrypt`/`Decrypt` which use the key internally.
|
||
If exposed remotely, a worker could extract raw encryption keys. Not
|
||
classified.
|
||
|
||
The spec never states the rule that "service methods not in `VaultProtocol`
|
||
are local-only by construction." An implementer could add protocol variants
|
||
for these methods without recognizing the security implication, since the
|
||
policy table only constrains existing variants.
|
||
|
||
**Suggestion**: Add an explicit rule: "Service methods absent from
|
||
`VaultProtocol` are local-only by default; adding a protocol variant requires
|
||
updating this policy table and an ADR if the method is security-sensitive."
|
||
Add `rotate`, `unlock_new`, and `derive_encryption_key_for_version` to the
|
||
policy table (or an explicit "local-only" list) with rationale. Document the
|
||
mapping between service methods and protocol variants.
|
||
|
||
---
|
||
|
||
### C9. `site_password_path(site_hash: &str)` Is Underspecified — String-to-Index Mapping Breaks Determinism
|
||
|
||
**Documents**: crates/vault/mnemonic-derivation.md (L202–204, L213, L219–222)
|
||
|
||
**Problem**: The helper `site_password_path(site_hash: &str) -> String`
|
||
produces path `m/74'/1'/0'/{site_hash}'`, but SLIP-0010 hardened derivation
|
||
indices are `u32` (0 to 2³¹−1). A `&str` "site hash" cannot be a derivation
|
||
index directly — it must be converted to a `u32`. The conversion (hash
|
||
algorithm, truncation, endianness, collision handling) is completely
|
||
unspecified. Two implementers could produce different, incompatible mappings:
|
||
SHA-256(site) → first 4 bytes → u32; or last 4 bytes; or BLAKE3; or a string
|
||
hash. Since the path semantics table (L213) says these are "Ed25519 bytes"
|
||
used for "Per-site passwords (not cached)," and determinism is a stated design
|
||
principle (L234–237: "the same mnemonic + passphrase + path always produces
|
||
the same key"), the string→index mapping must be deterministic and specified.
|
||
An underspecified mapping means the same site + mnemonic produces different
|
||
passwords across implementations — breaking the reproducibility guarantee.
|
||
|
||
Additionally, "Ed25519 bytes" (L213) as a password source is an undefined
|
||
concept. The path produces a 32-byte Ed25519 private key; using those raw
|
||
bytes as a password is a non-standard construction with no ADR explaining why
|
||
Ed25519-derived material is suitable as password entropy (vs HKDF or a
|
||
dedicated KDF).
|
||
|
||
**Suggestion**: Specify the exact string→u32 mapping (e.g.,
|
||
`u32::from_be_bytes(SHA-256(site_hash)[0..4]) & 0x7FFFFFFF` for hardened
|
||
index). Add a design note or ADR explaining why Ed25519-derived bytes are
|
||
acceptable as password material. Specify the password encoding (raw bytes vs
|
||
base64url — service.md L166 says base64url for the string variant, but the
|
||
bytes variant's encoding is the caller's concern).
|
||
|
||
---
|
||
|
||
### C10. `HandlerError::StreamError(io::Error)` vs `StreamError` Variant — Conversion Gap
|
||
|
||
**Documents**: crates/core/core-types.md (L33–38, L118–139),
|
||
crates/core/endpoint.md (L248–253), crates/core/auth.md (L55, L59), ADR-010
|
||
|
||
**Problem**: `core-types.md` and `endpoint.md` (citing ADR-010) both define
|
||
`HandlerError::StreamError(io::Error)`. But `auth.md:59` shows handler code
|
||
calling
|
||
`self.identity_provider.resolve_from_token(&token).ok_or(HandlerError::AuthRequired)?`
|
||
after `let stream = connection.accept_bi().await?;` (L55). The `.await?` on
|
||
`accept_bi()` returns `StreamError` (per `core-types.md:60`), yet there is no
|
||
`From<StreamError> for HandlerError` impl specified. The mapping table in
|
||
`core-types.md:132–137` is descriptive prose, not a trait impl. Without a
|
||
specified `From<StreamError> for HandlerError` (or `#[from]`), the `?` in the
|
||
example will not compile. Two implementers following these docs literally
|
||
will diverge: one will hand-map per the table, another will expect an auto-
|
||
`From`.
|
||
|
||
Additionally, `HandlerError::StreamError(io::Error)` holds `io::Error`, but
|
||
`StreamError` itself holds `Internal(io::Error)` (L118–124). So
|
||
`StreamError::Internal(io::Error)` → `HandlerError::StreamError(io::Error)`
|
||
is a lossy wrap (the distinction between stream-level and connection-level is
|
||
flattened). The spec should state whether this `From` impl exists and its
|
||
exact behavior, because handler code (`?` operator) depends on it.
|
||
|
||
**Suggestion**: Specify the exact conversion mechanism. Either (a) declare
|
||
`HandlerError::StreamError(#[from] StreamError)` — but then the variant must
|
||
hold `StreamError`, not `io::Error`, contradicting ADR-010/core-types; or (b)
|
||
explicitly state handlers must use `.map_err(HandlerError::from)` or
|
||
`.map_err(|e| e.into())` and define the `From` impl; or (c) keep `io::Error`
|
||
and define `From<StreamError> for HandlerError` matching the mapping table.
|
||
Resolve the variant payload (`io::Error` vs `StreamError`) and specify the
|
||
`From` impl. Update `auth.md:55` example to use the canonical conversion
|
||
pattern.
|
||
|
||
---
|
||
|
||
### C11. ADR-004 Says Handlers "Enrich or Replace" AuthContext; ADR-011 Says AuthContext Is Immutable in `handle()`
|
||
|
||
**Documents**: ADR-004 (L39, Accepted), ADR-011 (L131–133, Accepted),
|
||
crates/core/auth.md (L82–85)
|
||
|
||
**Problem**: ADR-004 line 39 states the handler "enriches or replaces the
|
||
partial `AuthContext` with the fully resolved `Identity`." This describes
|
||
**mutation** of the shared AuthContext. ADR-011 explicitly contradicts this
|
||
at lines 131–133: "`handle()` signature passes `&AuthContext` (immutable
|
||
reference). Handlers that resolve identity create a local variable… they
|
||
don't mutate the AuthContext." `auth.md:85` reiterates immutability. The
|
||
immutability decision is the later, more refined one, but ADR-004 was never
|
||
amended. An implementer reading ADR-004 alone would mutate `AuthContext` —
|
||
causing cross-stream contamination (one stream's token resolution leaking
|
||
into another stream's AuthContext on the same connection), exactly what
|
||
ADR-011 prevents.
|
||
|
||
This is security-relevant — mutation across streams is a contamination
|
||
vector. Same shape as C1 (Accepted ADR contradicts a later refinement that
|
||
was never recorded as a supersession).
|
||
|
||
**Suggestion**: Amend ADR-004 to note that the "enrich/replace" language was
|
||
superseded by ADR-011's immutability model (handlers resolve into a local
|
||
variable, store on `Connection` via `set_identity`). Add an inline note at
|
||
ADR-004 L39 pointing to ADR-011. Same pattern as the C1 fix for ADR-015.
|
||
|
||
---
|
||
|
||
### C12. ADR-002's `ProtocolHandler::handle` Signature Is Stale — Not Marked as Superseded by ADR-007
|
||
|
||
**Documents**: ADR-002 (L19–27, Accepted), ADR-007 (L44–53, Accepted),
|
||
overview.md (L89), crates/core/core-types.md (L14–19)
|
||
|
||
**Problem**: ADR-002 (Status: Accepted) still shows
|
||
`async fn handle(&self, stream: BiStream, auth: &AuthContext)` at L26.
|
||
ADR-007 revised the signature to
|
||
`handle(&self, connection: Connection, auth: &AuthContext)` and its References
|
||
note "ADR-002: ProtocolHandler trait (signature revised by this ADR)."
|
||
However, ADR-002 itself has **no superseded/amended marker** — it reads as
|
||
the current authoritative definition. `overview.md:89` even says "This differs
|
||
from the original ADR-002 signature which passed `BiStream`," confirming the
|
||
discrepancy is known but ADR-002 was never annotated. An implementer reading
|
||
only ADR-002 (as the README's "Applicable ADRs" table encourages) gets the
|
||
wrong trait.
|
||
|
||
**Suggestion**: Add a "Superseded by ADR-007" amendment note at the top of
|
||
ADR-002's Decision section (or a `## Amendments` section) stating the
|
||
signature was changed by ADR-007. Alternatively, update the ADR-002 code block
|
||
to the `Connection` signature with a note "Revised by ADR-007." ADRs must
|
||
reflect current state or explicitly mark supersession — this is the same
|
||
lifecycle rule that C1 and C11 violate.
|
||
|
||
---
|
||
|
||
### C13. `Connection::set_identity` Observability Read Path Is Under-Specified
|
||
|
||
**Documents**: crates/core/core-types.md (L53–67), crates/core/auth.md (L80),
|
||
open-questions.md (OQ-11 L138), crates/core/endpoint.md (L122–136)
|
||
|
||
**Problem**: `Connection` has private `identity: OnceLock<Identity>`.
|
||
`set_identity(&self,…)` takes `&self` (interior mutability) and is
|
||
"write-once-read-many." But the endpoint "holds a reference for logging"
|
||
(OQ-11:138) and the dispatch code in `endpoint.md:122–136` **moves** `conn`
|
||
into the spawned task (`Connection::new(connection)` then
|
||
`tokio::spawn(async move { ... handler.handle(conn, &auth) })`). After the
|
||
move, the endpoint no longer has the `Connection` to read `identity()` for
|
||
observability. OQ-11 says "the endpoint may also hold a reference for
|
||
logging" — but there is no `Arc<Connection>` or shared handle shown anywhere;
|
||
the struct definition in `core-types.md:53–57` has no sharing mechanism, and
|
||
`AlknetEndpoint` in `endpoint.md:17–27` does not hold a registry of live
|
||
connections. So the stated observability reader (the endpoint) has no defined
|
||
way to access the connection after spawn.
|
||
|
||
**Suggestion**: Specify the observability read path precisely. Either (a)
|
||
the endpoint holds a side table of live connections keyed by connection ID
|
||
with `Weak<Connection>` or an `Arc<Connection>` shared with the handler task,
|
||
and `Connection` becomes `Arc`-shareable; (b) the handler emits the resolved
|
||
identity via a channel/observer callback and the endpoint records it; or (c)
|
||
drop the claim that the endpoint reads it and state observability is
|
||
handler-side logging only. The current spec is internally inconsistent on
|
||
who reads `identity()` and how they obtain the reference.
|
||
|
||
---
|
||
|
||
## Warning Findings
|
||
|
||
### W1. ADR-017's `OperationAdapter` Trait Is Stale Relative to ADR-022 — And the Sync/Async "Two-Way Door" Is Already Decided
|
||
|
||
**Documents**: ADR-017 (L155–158, L171–174, Accepted), ADR-022 (L225–255,
|
||
Proposed)
|
||
|
||
**Problem**: ADR-017 L171–172 states: "The specific trait signatures (async
|
||
vs sync, error types, configuration parameters) are two-way doors for
|
||
implementation." Two problems:
|
||
|
||
1. **The bundle shape is already constrained.** ADR-022 changes registration
|
||
from `(OperationSpec, Handler)` to `HandlerRegistration` (spec + handler +
|
||
provenance + composition_authority + scoped_env + capabilities).
|
||
ADR-022 L257–261 says adapter convenience methods construct
|
||
`HandlerRegistration` with `composition_authority: None` and
|
||
`scoped_env: None`. So the adapter trait must now produce
|
||
`Vec<HandlerRegistration>`, not `Vec<(OperationSpec, Handler)>`. The trait
|
||
signature shown in ADR-017 (`fn import(&self) -> Vec<(OperationSpec,
|
||
Handler)>`) is wrong per ADR-022. The "two-way door for implementation" on
|
||
the signature has already been narrowed by ADR-022 — the return type is
|
||
committed to the bundle, not the pair. ADR-017 has not been updated.
|
||
|
||
2. **The async/sync question is not fully two-way.** `from_call` is inherently
|
||
async — it discovers operations via `services/list` + `services/schema`
|
||
over a QUIC connection. A synchronous `fn import(&self) -> Vec<…>` trait
|
||
cannot accommodate `from_call` without a separate async pre-step that
|
||
populates a cache, then a sync import that reads the cache. This means the
|
||
trait shape is constrained: either the trait is async (`async fn import`),
|
||
or `from_call` is not an `OperationAdapter` impl and uses a different path.
|
||
The "two-way door" framing suggests you can pick sync now and switch to
|
||
async later; in reality, `from_call` forces at least an async-capable path
|
||
from day one.
|
||
|
||
This is a "two-way door" label that ADR-022 has silently narrowed without
|
||
updating the OQ classification. (See the two-way-door audit, item 7.)
|
||
|
||
**Suggestion**: Update ADR-017 to: (1) return `Vec<HandlerRegistration>` (not
|
||
`(OperationSpec, Handler)`), (2) acknowledge that `from_call` requires async
|
||
discovery and the trait must be async or `from_call` must be special-cased, (3)
|
||
reclassify the signature as one-way (bundle shape) with a narrow two-way door
|
||
on ergonomic details (error types, config params). The current framing in
|
||
ADR-017 is inconsistent with ADR-022 and should be reconciled — same
|
||
reconciliation pass as C5.
|
||
|
||
---
|
||
|
||
### W2. `Capabilities` Concrete Shape and Clone Semantics — "Two-Way Door" With a Hidden One-Way Door Inside
|
||
|
||
**Documents**: operation-registry.md (L364, L369), ADR-014 (L74–76, L123–124,
|
||
L163–165)
|
||
|
||
**Problem**: The spec says "the concrete shape of the type (a typed map, a
|
||
struct with named fields, a trait object) is a two-way door for implementation"
|
||
and "the concrete cloning semantics (reference-counted `Arc` vs deep copy of
|
||
zeroized material) is a two-way door for implementation, but
|
||
`Capabilities: Clone` is required."
|
||
|
||
The `Clone` requirement is accurately identified as a one-way constraint. The
|
||
problem is the *clone semantics*, which the framing treats as two-way but
|
||
which contain a hidden one-way door:
|
||
|
||
- **Arc-based clone (cheap, shared).** If `Capabilities` is
|
||
`Arc<Map<String, Secret>>`, `clone()` bumps the refcount. All calls in a
|
||
composition tree share the same secret material. If a handler can mutate
|
||
capabilities (e.g., add a derived key for a child operation), that mutation
|
||
is visible to siblings and the parent — shared mutable state across the
|
||
call tree. This is a security-relevant behavior.
|
||
- **Deep-copy clone (expensive, isolated).** Each call gets its own copy.
|
||
Mutations are isolated. But cloning zeroized secret material on every
|
||
`invoke()` duplicates secrets N times for a depth-N tree, increasing the
|
||
attack surface.
|
||
|
||
The "path of least resistance" is Arc-based (cheap, the obvious choice for a
|
||
`Clone` type holding secrets). Once shipped, handlers may come to depend on
|
||
shared-mutation. Switching from Arc-shared to deep-copy-isolated later is a
|
||
behavior change that breaks those handlers.
|
||
|
||
Worse, ADR-022's composition model *assumes* cheap clone — it clones
|
||
unconditionally on every `invoke()`. A depth-5 composition tree with Arc-based
|
||
clone does 5 refcount bumps (cheap). With deep-copy, it does 5 full copies of
|
||
all secret material. The ADR-022 design bakes in the assumption that clone is
|
||
cheap — which means it bakes in the Arc-based choice, making the "two-way door"
|
||
effectively decided by the composition model's performance assumptions.
|
||
|
||
**Suggestion**: Add a guard clause: `Capabilities` must be *immutable after
|
||
construction* (no interior mutability, no `Mutex<Map>`, no `RefCell`). This
|
||
makes Arc-based clone safe (shared immutable state is fine) and makes the
|
||
two-way door genuinely two-way: you can switch between `Arc<Map>` and
|
||
deep-copy without behavior change, because neither supports mutation. Without
|
||
this guard, the "two-way door" is a future one-way door waiting for the first
|
||
handler that mutates capabilities.
|
||
|
||
---
|
||
|
||
### W3. `CallClient` Registry Mechanism — "Two-Way Door" With a Security Dimension ADR-022 Introduced
|
||
|
||
**Documents**: ADR-017 (L236–237, Accepted), ADR-022 (L225–242, Proposed)
|
||
|
||
**Problem**: ADR-017 L236–237: "The specific mechanism (sharing the global
|
||
registry, a peer-scoped subset, or a separate registry) is a two-way door."
|
||
|
||
The three options are mechanically possible with the `HandlerRegistration`
|
||
bundle. But ADR-022 introduced a security dimension the framing doesn't flag:
|
||
|
||
- **Sharing the global registry exposes local capabilities to the remote
|
||
peer.** Each `HandlerRegistration` carries `Capabilities` with secret
|
||
material. If the `CallClient` shares the global registry, the remote peer
|
||
can invoke any External operation — and those operations' handlers use the
|
||
*local* capabilities. A remote peer calling `/llm/generate` uses the local
|
||
node's Google API key. The remote peer effectively gains access to all
|
||
local secret material that backs External operations.
|
||
- **A peer-scoped subset must filter by capability remote-safety, not just
|
||
operation name.** The "path of least resistance" for a peer-scoped subset is
|
||
to filter by operation name/namespace. But the real filter must be: "is this
|
||
operation's capability safe to expose to this peer?" An operation backed by
|
||
a signing key that should never leave the node is not remote-safe,
|
||
regardless of its name.
|
||
|
||
The framing was written before ADR-022 put capabilities in the bundle.
|
||
Post-ADR-022, the "two-way door" on the registry mechanism has a security
|
||
constraint attached that the framing doesn't mention.
|
||
|
||
**Suggestion**: Append to ADR-017's negative consequences: "Sharing the
|
||
global registry with a `CallClient` exposes all local capabilities (API keys,
|
||
signing keys) to the remote peer, because the dispatch path populates
|
||
`OperationContext.capabilities` from the registration bundle. A peer-scoped
|
||
subset must filter by capability remote-safety, not just operation name. The
|
||
registry-mechanism choice is two-way mechanically but has a security dimension
|
||
post-ADR-022: the 'share global' option is a capability-exposure decision, not
|
||
just a dispatch decision."
|
||
|
||
---
|
||
|
||
### W4. `from_call` Hot-Swapping and Registry Mutability — Two "Two-Way Doors" That Gate Each Other
|
||
|
||
**Documents**: ADR-017 (L279), operation-registry.md (L383)
|
||
|
||
**Problem**: ADR-017 L279: "Hot-swapping imported specs is a two-way door."
|
||
operation-registry.md L383: "The registry is immutable after construction. No
|
||
runtime registration or deregistration. Two-way door —
|
||
`ArcSwap<OperationRegistry>` can be added later."
|
||
|
||
These are presented as independent two-way doors. They are not — one gates the
|
||
other. If the initial implementation registers `from_call`-imported specs
|
||
immutably (per the immutable-registry constraint), and handlers/clients
|
||
compile/generated-code against those specs, then "hot-swapping" later means:
|
||
re-importing on reconnect, diffing specs, invalidating generated clients,
|
||
handling in-flight calls whose schema changed. If the registry is
|
||
`ArcSwap`-able, hot-swap is mechanically possible but semantically fraught (a
|
||
handler mid-composition doesn't expect the child's schema to change).
|
||
|
||
Deciding "hot-swap is a two-way door" without resolving the registry-mutability
|
||
question leaves an implementer to guess whether to build for immutability or
|
||
hot-swap. The deferral hides a coupling: the *immutability* of the registry (a
|
||
one-way door per ADR-010 and operation-registry.md L14) and the *mutability*
|
||
of remote specs (inherent to `from_call`) are in tension.
|
||
|
||
Additionally, hot-swapping with ADR-022 bundles requires re-injecting
|
||
capabilities (cloning from the old registry or re-deriving from the vault) and
|
||
rebuilding the TLS `ServerConfig` (ALPN strings come from the registry).
|
||
The "simple ArcSwap" now touches the secret-material flow and the TLS config.
|
||
This isn't closing the door, but it raises the cost and security sensitivity
|
||
of the swap.
|
||
|
||
**Suggestion**: Resolve the registry-mutability question (immutable vs
|
||
`ArcSwap`) *before* deciding hot-swap, since hot-swap depends on it — don't
|
||
let both be independent "two-way doors" when one gates the other. Add a guard
|
||
clause to OQ-04: "Hot-swapping the registry with ADR-022 bundles requires
|
||
re-injecting capabilities (cloning from the old registry or re-deriving from
|
||
the vault) and rebuilding the TLS ServerConfig. The `ArcSwap` upgrade is
|
||
mechanical but not trivial — it is a secret-material-handling operation, not
|
||
a pure pointer swap."
|
||
|
||
---
|
||
|
||
### W5. `to_*` Adapter "Best-Effort" Mappings Become Compatibility Contracts Once Published
|
||
|
||
**Documents**: ADR-017 (L244–246, L281–286, Accepted), ADR-023 (error schemas
|
||
projection)
|
||
|
||
**Problem**: ADR-017 L244–246: "Some semantics don't map cleanly (e.g.,
|
||
subscriptions in OpenAPI, bidirectional calls in MCP). The adapters handle
|
||
these with best-effort mappings and document the gaps."
|
||
|
||
The "best-effort" framing implies the mappings are malleable — two-way,
|
||
adjustable. This is misleading once the generated spec is published:
|
||
|
||
- **A published spec is a compatibility contract.** `to_openapi` generates an
|
||
OpenAPI spec. Once external HTTP clients consume that spec and build against
|
||
it, the mapping semantics (e.g., subscriptions → SSE long-poll, or
|
||
subscriptions → a polling endpoint) become a de facto contract. Changing the
|
||
mapping later breaks every client that built against the original spec. The
|
||
"best-effort" mapping, once published, is a one-way door for the clients
|
||
that depend on it.
|
||
- **`from_openapi` round-trip lossiness compounds.** If `to_openapi` maps
|
||
subscriptions to SSE, and later `from_openapi` imports a spec where
|
||
subscriptions are modeled as webhooks, the two adapters disagree on what a
|
||
"subscription" means in OpenAPI. The `to_*` mapping choices constrain the
|
||
`from_*` parsing expectations.
|
||
- **With ADR-023**, `to_openapi` now maps `ErrorDefinition.http_status` to
|
||
OpenAPI response codes. Once published, those status code mappings are a
|
||
contract. Changing `FILE_NOT_FOUND` from 404 to 422 in a later `to_openapi`
|
||
revision breaks clients.
|
||
|
||
This is the "informal spec becomes formal spec" problem. The generated
|
||
artifact is the contract; the "best-effort" label is internal framing that
|
||
external consumers never see. ADR-009's framework classifies doors by
|
||
reversal cost in the codebase, not by compatibility cost for external
|
||
consumers — a systematic blind spot.
|
||
|
||
**Suggestion**: Add a guard clause to ADR-017: "The `to_*` mapping choices,
|
||
once published as a generated spec, become a compatibility contract for
|
||
external consumers. Best-effort mappings are two-way before first publication
|
||
but one-way after. Version the generated specs (e.g., OpenAPI spec version
|
||
tied to the registry's External operation set version) and document that
|
||
mapping changes are breaking for consumers. The `to_*` adapters should emit a
|
||
spec version marker so consumers can detect mapping changes."
|
||
|
||
---
|
||
|
||
### W6. The Salt Field's "Reserved for Future KDF" Framing Is Misleading — Semantics Can't Change for Existing Data
|
||
|
||
**Documents**: ADR-020 (L102–108, L151–157, L204–207, Accepted),
|
||
crates/vault/encryption.md (L112–128)
|
||
|
||
**Problem**: ADR-020 L108: "If a future KDF-based derivation is needed (see
|
||
'Future KDF' below), the field is already present." ADR-020 L156: "This is
|
||
additive — it doesn't change v2 data."
|
||
|
||
The "reserved" framing creates a false sense of flexibility:
|
||
|
||
- **The salt in v2 is cryptographically meaningless and cannot be
|
||
retroactively made load-bearing for v2 data.** v2 encrypts with an
|
||
HD-derived key at `m/74'/2'/0'/0'`. The salt is random, stored, and unused.
|
||
If v3 introduces a KDF that uses the salt, v3's key derivation is different
|
||
from v2's. v3 cannot decrypt v2 data using the salt — v2 was never encrypted
|
||
with a salt-derived key. The salt is only usable for NEW v3+ data. So "the
|
||
field is already present" is technically true but misleading: the field is
|
||
present in v2 blobs, but its presence doesn't make v2 data upgradeable to a
|
||
KDF. It only means v3 blobs can reuse the field without a wire-format
|
||
change.
|
||
- **The wire-format-compat framing is the real two-way door, and it's
|
||
accurate.** Keeping the field means the `EncryptedData` struct doesn't
|
||
change shape across v2→v3. That IS a wire-format two-way door — you can add
|
||
a KDF in v3 without changing the struct. But the *semantics* of the salt
|
||
field ("unused" in v2, "load-bearing" in v3) cannot be changed for existing
|
||
v2 data.
|
||
- **A KDF doesn't fit the rotation scheme.** Rotation is version-indexed
|
||
paths (`m/74'/2'/0'/{version-2}'`). A KDF-based derivation is a different
|
||
derivation *method*, not a different path. If a KDF is introduced, the
|
||
rotation mechanism (version-indexed paths) and the KDF mechanism
|
||
(salt-based) coexist or conflict. The "reserved salt" framing doesn't
|
||
acknowledge that a KDF is a different derivation family, not just another
|
||
version index.
|
||
|
||
**Suggestion**: Update ADR-020: "The salt field is reserved for FUTURE
|
||
versions' use. v2 data's salt is permanently unused — it was random, never
|
||
participated in key derivation, and cannot be retroactively made
|
||
load-bearing. Introducing a KDF in v3 is a new derivation method (not a
|
||
version-indexed path), requiring its own design and a v2→v3 migration
|
||
(re-encrypt with the new KDF, using a newly-generated v3 salt — the v2 salt
|
||
is not reused). The field's presence saves a wire-format struct change only;
|
||
it does not make the KDF design or migration trivial." Drop the speculative
|
||
"reserved for future KDF" framing or explicitly mark it as speculative, not a
|
||
planned feature.
|
||
|
||
---
|
||
|
||
### W7. `unlock_new` Returns the Mnemonic as a Non-Zeroizing `String` — Security Gap
|
||
|
||
**Documents**: crates/vault/service.md (L71–79)
|
||
|
||
**Problem**: `unlock_new(word_count) -> Result<String, VaultServiceError>`
|
||
returns the generated mnemonic phrase as a `String`. The `Mnemonic` type
|
||
zeroizes on drop (mnemonic-derivation.md L77–78), but `String` does not
|
||
implement `Zeroize` by default. The root of trust is returned to the caller
|
||
as a plaintext heap-allocated `String` that lingers in freed memory until the
|
||
allocator reuses it. The spec says "Store the returned phrase securely"
|
||
(L78–79) but does not address that the returned `String` cannot be zeroized
|
||
by the caller without `unsafe` or converting to a zeroizing container.
|
||
|
||
This contradicts the stated principle "Zeroize everything sensitive"
|
||
(vault README L65–67) and the `Mnemonic` type's own zeroization. The Security
|
||
Constraints section (service.md L365–399) does not mention this.
|
||
|
||
**Suggestion**: Either return a zeroizing wrapper (e.g., `Zeroizing<String>`
|
||
or a dedicated `Phrase` type that implements `Zeroize`/`ZeroizeOnDrop`), or
|
||
document in Security Constraints that the caller must convert the returned
|
||
`String` to a zeroizing type and must not let it be cloned/logged. At
|
||
minimum, flag this as a known limitation.
|
||
|
||
---
|
||
|
||
### W8. `DerivedKey` JSON Deserialization Silently Produces a Corrupted Key
|
||
|
||
**Documents**: crates/vault/protocol.md (L94–112)
|
||
|
||
**Problem**: protocol.md L104–107 states: "A JSON-deserialized `DerivedKey`
|
||
will have `"[REDACTED]"` as its `private_key` string — this is expected; JSON
|
||
round-tripping a `DerivedKey` is not a supported use case (the private key is
|
||
gone)."
|
||
|
||
This is a silent-failure footgun: if a `DerivedKey` is accidentally
|
||
serialized to JSON and then deserialized (e.g., in a log pipeline, a debug
|
||
tool, or a caching layer that uses JSON), the resulting `DerivedKey` has a
|
||
10-byte string (`"[REDACTED]"`) as its `private_key` instead of 32 bytes. Any
|
||
code that then uses this `DerivedKey` for signing will either produce garbage
|
||
signatures or panic on length mismatch — with no error at deserialization
|
||
time. The redaction is a defense-in-depth for *serialization* (logs), but the
|
||
*deserialization* side silently accepts the redacted form, producing a
|
||
corrupted key.
|
||
|
||
For the irpc/postcard path this is not an issue (postcard preserves bytes).
|
||
But if any code path ever uses JSON for `DerivedKey` transport (the spec says
|
||
this is "not a supported use case" but doesn't prevent it), the corruption is
|
||
silent.
|
||
|
||
**Suggestion**: Either (a) make `DerivedKey::deserialize` reject payloads
|
||
where `private_key == "[REDACTED]"` (returning an error rather than a
|
||
corrupted key), or (b) document explicitly that `private_key` length must be
|
||
validated before use and that JSON-deserialized `DerivedKey`s are invalid.
|
||
Consider whether the custom `Deserialize` impl should enforce a 32-byte
|
||
length check.
|
||
|
||
---
|
||
|
||
### W9. No ADR for the HD-Derivation Key Model (Identity/SSH/Signing) — Only Encryption Keys Have ADR-020
|
||
|
||
**Documents**: crates/vault/mnemonic-derivation.md (L31–46, L248–256),
|
||
crates/vault/README.md (L68–70)
|
||
|
||
**Problem**: ADR-020 covers HD derivation for *encryption* keys. But the
|
||
vault's primary use of HD derivation is for identity keys, SSH host keys, and
|
||
signing keys (mnemonic-derivation.md L26–29, L184–215). The rationale for "HD
|
||
derivation, not stored keys" (L31–46: no key storage, reproducible across
|
||
nodes, domain separation, auditable derivation) is inline in the spec's "Why
|
||
HD Derivation" section. The Design Decisions table (L250–256) lists this with
|
||
no ADR ("HD derivation (not stored keys) | — | One seed, many keys, no key
|
||
storage").
|
||
|
||
This is a foundational architectural decision — the entire vault model
|
||
depends on it — yet it has no ADR. The choice of HD derivation vs. stored keys
|
||
is a one-way door (switching to stored keys would change the entire trust
|
||
model). ADR-018 and ADR-019 both *assume* HD derivation but don't *decide* it.
|
||
|
||
Similarly, these design choices have inline rationale but no ADR:
|
||
- **`74'` coin type reservation** (L186, L254): SLIP-0044 unallocated coin
|
||
type claimed for alknet. This is a namespace allocation that, once keys are
|
||
derived, cannot be changed without re-deriving all keys — effectively
|
||
one-way. No ADR records this allocation.
|
||
- **secp256k1 feature-gating** (L166–179): rationale (heavy C dependency, only
|
||
needed for Ethereum) is inline. No ADR.
|
||
- **AES-256-GCM cipher/mode choice** (encryption.md L27–37): rationale
|
||
(authenticated, hardware-accelerated) is inline. No ADR.
|
||
|
||
**Suggestion**: Either (a) create an ADR for "HD derivation as the vault's key
|
||
model" covering identity/SSH/signing keys (the encryption case is already
|
||
ADR-020), or (b) broaden ADR-020's scope, or (c) create a single "Vault Key
|
||
Model" ADR covering the overarching HD-derivation decision with ADR-020 as a
|
||
special case for encryption keys. Record the `74'` coin type reservation and
|
||
the AES-256-GCM choice in ADRs or a tracked decisions log, since they are
|
||
one-way doors with no current ADR.
|
||
|
||
---
|
||
|
||
### W10. The `EncryptedData` "Stable Wire Format" Is a One-Way Door Documented Only in Prose
|
||
|
||
**Documents**: crates/vault/encryption.md (L84–95), ADR-018 (L101–104)
|
||
|
||
**Problem**: encryption.md L86–95 states: "This is the **stable wire format**
|
||
shared with `alknet-storage` (a future crate) by type-level agreement, not by
|
||
a crate dependency. Both crates must agree on the serialization format… This
|
||
cross-language compatibility is why the wire format must stay stable —
|
||
changing it breaks both `alknet-storage` and the TypeScript consumer."
|
||
ADR-018 L101–104 repeats this.
|
||
|
||
This is a one-way door: once `alknet-storage` and TS consumers depend on the
|
||
format, changing it breaks them. Yet it is not locked by an ADR. It is a
|
||
cross-crate compatibility contract enforced by documentation, not by a decided
|
||
constraint. An implementer modifying `EncryptedData` (e.g., removing the
|
||
unused `salt` field) would find no ADR saying "this format is frozen." The
|
||
"type-level agreement, not a crate dependency" approach (ADR-018) means there
|
||
is no compiler enforcement either — the stability contract exists only in
|
||
prose.
|
||
|
||
**Suggestion**: Create an ADR (or add a decision to ADR-018) locking the
|
||
`EncryptedData` wire format as stable, with the compatibility surface
|
||
explicitly enumerated (fields, base64 encoding, field semantics). Note that
|
||
the `salt` field, though unused in v2, is part of the frozen format and cannot
|
||
be removed without a format-version migration.
|
||
|
||
---
|
||
|
||
### W11. `OperationContext.identity` Conversion from `CompositionAuthority` Is Undefined
|
||
|
||
**Documents**: operation-registry.md (L236:
|
||
`identity: parent.handler_identity.as_identity()`), ADR-022 (L314:
|
||
`identity: parent.handler_identity_as_identity()`)
|
||
|
||
**Problem**: Both spec and ADR-022 populate the child's `identity` by
|
||
converting the parent's `handler_identity: Option<CompositionAuthority>` into
|
||
an `Option<Identity>` via a method `as_identity()` /
|
||
`handler_identity_as_identity()`. This conversion method is **never defined**
|
||
and its semantics are unclear: `CompositionAuthority` (label + scopes +
|
||
resources) is a declared authority bundle, while `Identity` is a peer identity
|
||
resolved via `IdentityProvider`. ADR-022 goes to lengths to stress that
|
||
`CompositionAuthority` is *not* a peer `Identity` (L150–160). Yet the dispatch
|
||
path needs an `Identity` to run ACL against for the child.
|
||
|
||
So there's an implicit conversion from a non-peer authority bundle to a peer
|
||
identity type. Is it a synthetic `Identity` constructed with the authority's
|
||
label as `id` and its scopes? Does it bypass `IdentityProvider`? Is
|
||
`as_identity()` returning `None` when `handler_identity` is `None` (the leaf
|
||
case)? For leaves, the ACL "checks against the leaf's `AccessControl`" — but
|
||
if `identity` is `None` because the parent is a leaf with
|
||
`handler_identity: None`, then per operation-registry.md L86–87 "If the
|
||
relevant identity is `None` and the operation has restrictions, the adapter
|
||
returns `FORBIDDEN`." That would mean a composing handler with no composition
|
||
authority cannot compose *any* restricted leaf — which contradicts the model
|
||
where leaves are composed under the *parent's* authority.
|
||
|
||
Actually: a composing parent (Local/Session) *does* have
|
||
`handler_identity: Some(...)`, so `as_identity()` returns `Some`. Leaves being
|
||
composed *are* the child; their own `handler_identity` is `None` but that's
|
||
fine because the child's ACL uses the child's `identity` (which is the parent's
|
||
authority). So the conversion is load-bearing and must be specified.
|
||
|
||
**Suggestion**: Define `CompositionAuthority::as_identity() -> Option<Identity>`
|
||
(or `handler_identity_as_identity()` on `OperationContext`) explicitly in the
|
||
spec or ADR-022. Specify that it constructs a synthetic
|
||
`Identity { id: label, scopes, resources }` that is *not* resolvable via
|
||
`IdentityProvider` but is used directly for ACL matching. Note that this
|
||
creates a second `Identity` construction path (the first is
|
||
`IdentityProvider::resolve_*`), which should be acknowledged.
|
||
|
||
---
|
||
|
||
### W12. Source Drift Items Are Tracked Only in Scattered Prose — The `spawn` Return Bug Could Be Lost
|
||
|
||
**Documents**: crates/vault/README.md (L79–103), crates/vault/encryption.md
|
||
(L185–188, L245–250), crates/vault/service.md (L280–286, L374–391)
|
||
|
||
**Problem**: Four source-vs-spec drift items are documented in prose across
|
||
multiple files:
|
||
|
||
1. **OsRng vs `rand::random()`** — documented in README L87–89, encryption.md
|
||
L245–250, service.md L374–376 (3 locations).
|
||
2. **`unwrap()` on RwLock** — documented in README L94–97, service.md
|
||
L384–391 (2 locations).
|
||
3. **`CURRENT_KEY_VERSION = 1` (should be 2)** — documented in encryption.md
|
||
L185–188, ADR-020 L120–124 (2 locations, none in README).
|
||
4. **`spawn` returns unspawned actor** — documented **only** in service.md
|
||
L280–286 (1 location, inside a method description, not in a Security
|
||
Constraints section).
|
||
|
||
Problems with this approach:
|
||
- The `spawn` bug (item 4) is documented in exactly one place, buried in the
|
||
`spawn` method's bullet description, not in any "Security Constraints" or
|
||
"Known Drift" section. An implementer who reads the Security Constraints
|
||
sections (the natural place to look for implementation-sync corrections)
|
||
would miss it.
|
||
- README's Security Constraints (L79–103) lists OsRng and unwrap but **omits**
|
||
`CURRENT_KEY_VERSION` and the `spawn` bug. README is the index document; an
|
||
implementer using it as a checklist would miss two drift items.
|
||
- There is no single, structured tracking mechanism. If one prose location is
|
||
updated and others aren't, the docs silently diverge.
|
||
|
||
**Suggestion**: Add a single "Known Source Drift" table to README.md (or a
|
||
dedicated section) listing all drift items with: item, current behavior,
|
||
target behavior, location in source, and status. At minimum, add
|
||
`CURRENT_KEY_VERSION` and the `spawn` bug to README's Security Constraints.
|
||
Move the `spawn` bug note out of the method description and into the drift
|
||
tracker.
|
||
|
||
---
|
||
|
||
### W13. `CertAuthorityEntry` Is an Undefined Placeholder Type Used in `DynamicConfig`
|
||
|
||
**Documents**: crates/core/config.md (L119–130)
|
||
|
||
**Problem**: `AuthPolicy.cert_authorities: Vec<CertAuthorityEntry>` is
|
||
declared, but `CertAuthorityEntry` is explicitly "a placeholder type. Its
|
||
fields will be defined when alknet-ssh is implemented." This means alknet-core's
|
||
public API references a type that does not exist yet. The spec says "for v1,
|
||
`cert_authorities` will be an empty vector," implying the field is dead weight.
|
||
An implementer cannot compile `DynamicConfig` without defining
|
||
`CertAuthorityEntry` somewhere. Where does it live — alknet-core (but it's
|
||
SSH-specific per the doc) or alknet-ssh (but then alknet-core can't reference it
|
||
without a dependency)? This is a crate-dependency contradiction: core
|
||
references an SSH-defined type, but core cannot depend on alknet-ssh (ADR-003
|
||
forbids it).
|
||
|
||
**Suggestion**: Either define a minimal `CertAuthorityEntry` in alknet-core
|
||
(as an opaque/newtype or a generic struct with the fields that are
|
||
ALPN-agnostic), or remove `cert_authorities` from v1 `AuthPolicy` entirely and
|
||
add it back when alknet-ssh lands (it's a two-way door — adding a field is
|
||
additive). Leaving an undefined type in a public struct blocks compilation.
|
||
|
||
---
|
||
|
||
### W14. `SecretKey` Type in `TlsIdentity::RawKey(SecretKey)` Is Undefined and Its Origin Ambiguous
|
||
|
||
**Documents**: crates/core/config.md (L44, L79)
|
||
|
||
**Problem**: `TlsIdentity::RawKey(SecretKey)` and `SecretKey::generate()` are
|
||
used but `SecretKey` is never imported or attributed. It could be
|
||
`ed25519_dalek::SecretKey`, `iroh::SecretKey`, `alknet_vault::SecretKey`, or a
|
||
core-defined type. Since config.md says the key "can be derived from
|
||
alknet-vault" (per endpoint.md:181), and alknet-core must not depend on
|
||
alknet-vault (ADR-003, ADR-018), this matters: if `SecretKey` is from iroh,
|
||
then alknet-core depends on iroh even for quinn-only nodes; if it's a
|
||
re-export/newtype, that needs stating. The crate dependency table in ADR-003
|
||
L27 lists alknet-core depending on "tokio, quinn, rustls, irpc" — **not iroh**
|
||
— yet `config.md` and `endpoint.md` show iroh types in core's public API.
|
||
|
||
**Suggestion**: Specify the exact type and crate for `SecretKey`. If it's
|
||
`iroh::SecretKey`, update ADR-003's dependency list (iroh is already used for
|
||
`iroh::Endpoint`, so this is consistent, but ADR-003 omits iroh from
|
||
alknet-core's deps — a separate inconsistency). If it's `ed25519_dalek`, say
|
||
so. Resolve whether alknet-core publicly depends on iroh types.
|
||
|
||
---
|
||
|
||
### W15. ADR-003 Omits iroh From alknet-core's Dependency List; Spec Docs Assume It
|
||
|
||
**Documents**: ADR-003 (L27), crates/core/endpoint.md (L20), config.md (L44)
|
||
|
||
**Problem**: ADR-003 L27 lists alknet-core as depending on "tokio, quinn,
|
||
rustls, irpc" — no iroh. But `AlknetEndpoint` holds
|
||
`iroh: Option<iroh::Endpoint>` (endpoint.md L20, ADR-010 L67),
|
||
`SendStream`/`RecvStream` wrap "iroh::SendStream or iroh::RecvStream"
|
||
(core-types.md L103), and `TlsIdentity::RawKey(SecretKey)` likely uses iroh's
|
||
key type. ADR-010 L184 says this is "mitigated: both are feature-gated." But
|
||
ADR-003's dependency table is now stale — it doesn't reflect the iroh
|
||
dependency that ADR-010 introduced.
|
||
|
||
**Suggestion**: Update ADR-003's dependency table to include iroh
|
||
(feature-gated) for alknet-core, or add an amendment note that ADR-010 added
|
||
the iroh dependency.
|
||
|
||
---
|
||
|
||
### W16. Overview Failure-Mode Table Says Handler Error Closes "QUIC Stream"; Spec Says Connection
|
||
|
||
**Documents**: overview.md (L237–238), crates/core/core-types.md (L30, L46),
|
||
crates/core/endpoint.md (L255–257, L262)
|
||
|
||
**Problem**: `overview.md:237` says "Handler `handle()` returns `HandlerError`
|
||
| Endpoint logs the error, closes the QUIC **stream**." But
|
||
`core-types.md:30` says "The endpoint catches these, logs them, and closes the
|
||
**connection**." `endpoint.md:256` says HandlerError is "Non-fatal errors
|
||
within a handler" and the connection closes. The whole architecture is
|
||
one-ALPN-per-connection (ADR-006), and `handle()` receives the whole
|
||
`Connection`. Closing a "stream" doesn't make sense when the handler owns the
|
||
connection. This is a stale leftover from the pre-ADR-007 model where
|
||
`handle()` received a `BiStream`.
|
||
|
||
**Suggestion**: Change `overview.md:237–238` from "closes the QUIC stream" to
|
||
"closes the QUIC connection."
|
||
|
||
---
|
||
|
||
### W17. OQ-09's "BiStream Trait Preserves the WASM Door" Is Only True for the Client Side
|
||
|
||
**Documents**: open-questions.md (OQ-09 L100–107), ADR-007, ADR-009 (L41, L44),
|
||
crates/core/core-types.md, crates/core/endpoint.md
|
||
|
||
**Problem**: The resolution claims "The BiStream trait decision (ADR-007) has
|
||
already preserved the most important WASM door." This is accurate for the
|
||
*client* path (a browser can implement `BiStream` over WebTransport) but
|
||
oversells what is preserved for a *server-side* WASM peer:
|
||
|
||
- **`Connection` is a concrete struct, not a trait** (ADR-007). Its
|
||
`accept_bi()`/`open_bi()` return `SendStream`/`RecvStream` described as
|
||
wrapping "the underlying QUIC stream types." A WASM server-side peer cannot
|
||
implement this `Connection` without a QUIC library that compiles to WASM.
|
||
BiStream being a trait does not help — handlers receive `Connection`, and
|
||
`Connection` is the binding point.
|
||
- **The accept loop uses `tokio::spawn`** (ADR-010). Tokio does not run on
|
||
WASM. The entire endpoint runtime is tokio-bound.
|
||
- **`PendingRequestMap` and the `CallAdapter` use tokio `oneshot`/`mpsc`
|
||
channels** (call-protocol.md L218–240). These are tokio-specific.
|
||
|
||
The OQ framing implies BiStream was the binding constraint and the rest is
|
||
detail. In reality, BiStream preserves the *client stream* door, but
|
||
`Connection`, the accept loop, and the call-protocol dispatch internals
|
||
quietly close the *server-side dispatch* door.
|
||
|
||
**Suggestion**: Update OQ-09's resolution to state: "BiStream being a trait
|
||
preserves the client-side stream door. The server-side dispatch door
|
||
(`Connection` as a concrete quinn-bound struct, tokio-based accept loop, tokio
|
||
channel dispatch) is NOT preserved by ADR-007 and would require a `Connection`
|
||
trait and a runtime-abstracted accept loop to open. This is a known, accepted
|
||
closure — the browser path is client-side via a JS SDK, not server-side
|
||
Rust-to-WASM." This converts the misleading framing into an explicit, accepted
|
||
one-way door.
|
||
|
||
---
|
||
|
||
### W18. OQ-10's "Start With Smart Protocol" Deferral Forecloses Git Composability
|
||
|
||
**Documents**: open-questions.md (OQ-10 L109–116)
|
||
|
||
**Problem**: The deferral says "start with git smart protocol over QUIC
|
||
streams, ERC721 and full server capabilities are additive." The ALPN/endpoint
|
||
design does not constrain the git adapter's *wire format* — the handler owns
|
||
its format and `alknet/git` is a dedicated connection. That part is genuinely
|
||
two-way.
|
||
|
||
However, the deferral hides a fork that the "path of least resistance" will
|
||
silently resolve in a way that forecloses composability:
|
||
|
||
- **Raw smart protocol vs. call-protocol projection.** If git ships as a
|
||
`ProtocolHandler` speaking raw git smart protocol over QUIC streams (the
|
||
simplest path), git operations are NOT callable via `env.invoke()`. They
|
||
exist only on the `alknet/git` ALPN, not in the `OperationRegistry`. An
|
||
agent handler that wants to compose `git/clone` cannot — there's no
|
||
`OperationSpec`, no `Handler`, no registration. The agent would have to
|
||
open a raw QUIC connection to `alknet/git` and speak pkt-line, bypassing
|
||
the entire composition/security model (ADR-015, ADR-022).
|
||
- To make git composable later, you'd need a *separate* call-protocol
|
||
projection — a set of `OperationSpec`/`Handler` pairs that wrap git
|
||
operations behind the registry. That's additive, but it means the "simple"
|
||
choice (raw smart protocol only) produces a git adapter that is permanently
|
||
non-composable unless a second handler layer is built.
|
||
|
||
The deferral frames ERC721/full-server as the additive part. But the real
|
||
additive question is composability, and the simplest implementation forecloses
|
||
it.
|
||
|
||
**Suggestion**: Append to OQ-10's resolution: "The composability of git
|
||
operations (whether git ops are registered in the `OperationRegistry` and
|
||
callable via `env.invoke()`, or only available as raw smart protocol on
|
||
`alknet/git`) is a separate decision from ERC721 scope. The path of least
|
||
resistance (raw smart protocol only) forecloses agent composition of git
|
||
operations. If agent tool dispatch needs to compose git ops, a call-protocol
|
||
projection must be built alongside or instead of the raw handler. Resolve
|
||
this when speccing alknet-git, not deferred past it."
|
||
|
||
---
|
||
|
||
### W19. ADR-016 Self-Referential Citation and Under-Specified Grandchild Propagation
|
||
|
||
**Documents**: ADR-016 (L134, L154–158)
|
||
|
||
**Problem**: Two minor issues in ADR-016:
|
||
|
||
1. **Self-referential citation** (L134): ADR-016 cites itself ("ADR-016
|
||
Assumption 5 says the policy is per-call, not per-operation") in its own
|
||
Decision 6 rationale. This should just be "Assumption 5" (it's the same
|
||
document).
|
||
|
||
2. **Grandchild propagation under-specified** (L154–157): "The child's
|
||
`OperationContext` carries the policy. If the child itself composes
|
||
grandchildren, the policy propagates unless the child explicitly overrides
|
||
it." This is the only statement of propagation-vs-override semantics. It
|
||
doesn't specify: does the grandchild *inherit* the child's policy by
|
||
default (so if the child got `ContinueRunning`, the grandchild is
|
||
`ContinueRunning` unless overridden)? Or does each `invoke()` reset to
|
||
the default `AbortDependents` unless explicitly set? The phrase
|
||
"propagates unless overridden" implies inheritance, which is reasonable,
|
||
but the `invoke()` default (operation-registry.md L213, ADR-016 L146) shows
|
||
`context.env.invoke(...)` with no policy = default. So is "no policy arg"
|
||
= "inherit parent's" or "reset to AbortDependents"? These conflict.
|
||
|
||
**Suggestion**: (1) Change "ADR-016 Assumption 5" to "Assumption 5" within
|
||
ADR-016. (2) Clarify: does `invoke()` with no policy argument (a) inherit the
|
||
parent's current policy, or (b) reset to `AbortDependents`? Recommend (b)
|
||
reset-to-default for predictability (each composition is a fresh decision)
|
||
unless the parent opts the child into `ContinueRunning`, and document that
|
||
`ContinueRunning` does *not* auto-propagate to grandchildren (the child must
|
||
re-affirm it for each grandchild). Or pick (a) and update the `invoke()`
|
||
examples. Either is defensible; pick one.
|
||
|
||
---
|
||
|
||
### W20. ADR-023's `from_openapi` Error Code Example Collides With Protocol-Level `NOT_FOUND`
|
||
|
||
**Documents**: ADR-023 (Assumption 3 L366–373, Decision 5 L241)
|
||
|
||
**Problem**: ADR-023 Assumption 3 acknowledges the collision risk: an operation
|
||
declaring `NOT_FOUND` as a domain code collides with the protocol-level
|
||
`NOT_FOUND` (operation not registered). The resolution (L367–368): "the
|
||
protocol code takes precedence in the dispatch machinery." And (L372–373):
|
||
"The assumption is that operations use domain-specific codes (`FILE_NOT_FOUND`)
|
||
rather than reusing protocol codes (`NOT_FOUND`). This is a naming convention,
|
||
not a type-system enforcement."
|
||
|
||
But Decision 5's example (L241) shows `from_openapi` mapping a 404 to
|
||
`ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), … }` — i.e.,
|
||
`from_openapi` *does* produce a domain code that collides with a protocol
|
||
code. So the "naming convention" is violated by the very adapter the ADR
|
||
specifies. An implementer following Decision 5's example would create a
|
||
`FILE_NOT_FOUND`-vs-`NOT_FOUND` ambiguity: when `call.error` carries
|
||
`code: "NOT_FOUND"`, is it "operation not registered" (protocol) or "file not
|
||
found" (operation-level, from a 404)? The `details` field disambiguates
|
||
(present for operation-level, absent for protocol-level), but the `code` alone
|
||
is ambiguous, and ADR-023 L171 says "Clients should switch on `code`, not
|
||
parse `message`."
|
||
|
||
**Suggestion**: Either (a) make `from_openapi` prefix/namespace the imported
|
||
error codes (e.g., `HTTP_404` or `<op>_NOT_FOUND`) to avoid collision, or (b)
|
||
update Decision 5's example to use a non-colliding code and add a normative
|
||
rule that `from_openapi` must not produce codes that match the five
|
||
protocol-level codes. Assumption 3's "naming convention" should be elevated to
|
||
a requirement for adapters, since the adapter example breaks it.
|
||
|
||
---
|
||
|
||
### W21. `ScopedOperationEnv` Public Field Leaks the `HashSet` Representation — Future Subgraph Refactor Is Breaking
|
||
|
||
**Documents**: ADR-022 (L208–214, L97–105), operation-registry.md (L188, L328)
|
||
|
||
**Problem**: ADR-022 L97–105: "For v1, the scoped env can be a
|
||
`HashSet<String>` of reachable operation names. A dedicated
|
||
`alknet-flowgraph` crate … is a future enhancement — not a prerequisite for
|
||
the security model."
|
||
|
||
The `HashSet<String>` representation is fine *internally*. The problem is that
|
||
it leaks into the public API in a way that makes the future refactor a
|
||
breaking change:
|
||
|
||
- **`ScopedOperationEnv` is a public struct with a public field.** ADR-022
|
||
L208–214 shows `pub struct ScopedOperationEnv { pub allowed_operations:
|
||
HashSet<String> }`. The assembly layer and agent crate construct this
|
||
directly: `ScopedOperationEnv { allowed_operations: HashSet::from(…) }`
|
||
(operation-registry.md L188, L328). If the representation later changes to a
|
||
subgraph type (with type-compatibility edges, as the flowgraph model
|
||
requires), every construction site breaks — the field type changes from
|
||
`HashSet<String>` to `Subgraph` or `Vec<Edge>`. This is a public-API
|
||
breaking change, not an internal refactor.
|
||
|
||
The framing says the flowgraph crate is "not a prerequisite for the security
|
||
model." That's true — the security model works with a flat set. But the
|
||
*refactor* from flat set to subgraph is not the trivial "two-way door" the
|
||
framing implies. It's a breaking change to construction sites.
|
||
|
||
Additionally, this interacts with OQ-19 (session-scoped registries): session
|
||
ops compose under a scoped env. If the scoped env is a flat `HashSet<String>`,
|
||
session ops get reachability control (can this op be reached?) but NOT
|
||
type-compatibility validation (does the output of op A feed validly into op
|
||
B?). A session op (untrusted, LLM-written) that composes an op whose output
|
||
type doesn't match the next op's input gets a runtime error, not a static
|
||
rejection. The flowgraph deferral means session ops ship without static type
|
||
validation — a security-relevant gap for untrusted code.
|
||
|
||
**Suggestion**: Make `allowed_operations` private (not `pub`), with a
|
||
constructor `ScopedOperationEnv::new(ops: impl IntoIterator<Item = String>)`
|
||
and a query method `allows(&self, name: &str) -> bool`. This keeps the
|
||
representation internal and makes the future subgraph refactor a non-breaking
|
||
change. Add to the flowgraph deferral: "The `HashSet<String>` representation
|
||
does not support type-compatibility validation. Session-scoped ops (OQ-19,
|
||
untrusted code) compose without static type checking until the flowgraph
|
||
crate is built. This is a known gap for the untrusted-code path, not just a
|
||
visualization feature."
|
||
|
||
---
|
||
|
||
## Suggestions
|
||
|
||
### S1. Inline Decision Rationale in Spec Docs That Should Be in ADRs
|
||
|
||
**Documents**: call-protocol.md (L341, L343–344, L124), operation-registry.md
|
||
(L202, L364–369, L377), crates/vault/encryption.md (L27–37),
|
||
crates/vault/mnemonic-derivation.md (L31–46)
|
||
|
||
**Problem**: Per the architect's documentation principles, spec docs should
|
||
reference ADRs by number, not contain the rationale. Several spec sections
|
||
contain *why* rationale that duplicates or extends ADR content:
|
||
|
||
- call-protocol.md L341 duplicates ADR-016 Decision 2's rationale ("the
|
||
correct default because aborted parent work has no consumer…") verbatim.
|
||
- operation-registry.md L202 explains *why* metadata doesn't propagate (the
|
||
secret-leak scenario) — this is ADR-014's rationale, restated inline.
|
||
- operation-registry.md L364–369 explains the `Clone` semantics of
|
||
`Capabilities` and the security implication — this is a design rationale,
|
||
not in any ADR. It should be in ADR-014 or ADR-022.
|
||
- operation-registry.md L377 ("`from_call` trust is transitive… a compromised
|
||
remote node can do anything") is a threat-model statement that belongs in
|
||
ADR-017.
|
||
- crates/vault/encryption.md L27–37 (AES-256-GCM rationale) and
|
||
mnemonic-derivation.md L31–46 (HD derivation rationale) have no ADRs (see
|
||
W9).
|
||
|
||
**Suggestion**: Move the unique rationale to the relevant ADRs (ADR-016
|
||
already has the abort-dependents rationale — just cite it; add
|
||
capabilities-clone rationale to ADR-014 or ADR-022; add `from_call` transitive
|
||
trust to ADR-017) and replace the inline rationale with ADR references. Keep
|
||
only the *constraint statement* in the spec.
|
||
|
||
---
|
||
|
||
### S2. Untracked "Two-Way Door" Deferrals Scattered Through Specs
|
||
|
||
**Documents**: operation-registry.md (L383, L257, L364, L369),
|
||
call-protocol.md (L333), ADR-017 (L236–237, L279)
|
||
|
||
**Problem**: Per the documentation principles, inline open questions should be
|
||
in `open-questions.md`. Several "two-way door" / "future work" notes are
|
||
scattered through the specs without being tracked as open questions:
|
||
|
||
- operation-registry.md L383: "Two-way door — `ArcSwap<OperationRegistry>`
|
||
can be added later" (contradicts the "immutable after construction"
|
||
constraint at L14, L194, L340).
|
||
- operation-registry.md L257: "Future work may add irpc service dispatch and
|
||
remote call protocol dispatch as additional backends."
|
||
- operation-registry.md L364: "The concrete shape of the type… is a two-way
|
||
door for implementation."
|
||
- operation-registry.md L369: "The concrete cloning semantics… is a two-way
|
||
door for implementation."
|
||
- call-protocol.md L333: timeout policy (30s default, adapter-level config) —
|
||
no ADR governs this.
|
||
- ADR-017 L236–237, L279: CallClient registry mechanism and `from_call`
|
||
hot-swapping.
|
||
|
||
**Suggestion**: Add the genuinely-open items (Capabilities concrete shape,
|
||
Capabilities clone semantics, `OperationAdapter` trait signatures, CallClient
|
||
registry mechanism, `to_*` adapter gap semantics, `from_call` hot-swap,
|
||
registry hot-swap via ArcSwap, timeout policy) to `open-questions.md` with
|
||
explicit status. For items that are *decided* deferrals, record the deferral
|
||
in the relevant ADR's Consequences rather than as inline spec commentary.
|
||
|
||
---
|
||
|
||
### S3. Missing ADR References for Design Choices ("Why X Over Y?")
|
||
|
||
**Documents**: call-protocol.md (L86, L116, L262–272, L333),
|
||
operation-registry.md (L64–67, L120–127, L139)
|
||
|
||
**Problem**: Several design choices are stated in the spec with no ADR
|
||
backing:
|
||
|
||
- **Binary payload convention** (call-protocol.md L86): base64-in-JSON. Why
|
||
not a separate binary event type? No ADR.
|
||
- **`auth_token` in payload** (call-protocol.md L116–122, L262–272): per-
|
||
request identity upgrade via an optional `auth_token` field. Significant
|
||
security-model decision with no ADR.
|
||
- **Per-request vs per-connection identity** (L272): "Identity is resolved
|
||
per-request, not per-connection." No ADR.
|
||
- **Timeout policy** (L333): 30s default, adapter-level config. No ADR.
|
||
- **Module-private `internal` writes** (operation-registry.md L120–127):
|
||
the enforcement mechanism for "handlers can't set `internal`." The
|
||
requirement is from ADR-015; the mechanism (Rust visibility) has no ADR.
|
||
- **Namespace derivation** (operation-registry.md L66): "`namespace` is
|
||
derived from the name." Why derived rather than independently declared? No
|
||
ADR.
|
||
|
||
**Suggestion**: For load-bearing choices (binary convention, auth_token /
|
||
per-request identity, timeout policy), either add an ADR or extend an
|
||
existing one. For minor choices (namespace derivation, visibility pattern), a
|
||
one-line note suffices, but they shouldn't be silently decided in the spec.
|
||
|
||
---
|
||
|
||
### S4. README ADR Tables Missing Entries and Status Column
|
||
|
||
**Documents**: crates/call/README.md (L19–36), crates/core/README.md
|
||
(L21–31), crates/vault/README.md (L38–47)
|
||
|
||
**Problem**: Three issues across the crate READMEs:
|
||
|
||
1. **call README** omits ADR-013 (Rust canonical implementation), which both
|
||
call spec docs reference (call-protocol.md L88, operation-registry.md L24).
|
||
2. **core README** omits ADR-015, referenced by auth.md L78, L200 and
|
||
config.md L200.
|
||
3. **vault README** omits ADR-010 and ADR-006, referenced by
|
||
mnemonic-derivation.md L26 and protocol.md L290.
|
||
4. **None of the crate READMEs have a Status column** in their ADR tables,
|
||
which contributes to C2 (reader can't tell which ADRs are binding).
|
||
|
||
**Suggestion**: Add the missing ADRs to each crate README's ADR table. Add a
|
||
Status column (Accepted/Proposed/Superseded) to make the governance state
|
||
visible — this directly supports resolving C1/C2.
|
||
|
||
---
|
||
|
||
### S5. `Connection::new()` Referenced but Never Defined
|
||
|
||
**Documents**: crates/core/core-types.md (L113), crates/core/endpoint.md
|
||
(L127), ADR-010 (L117)
|
||
|
||
**Problem**: `Connection::new(connection)` is called in dispatch pseudocode
|
||
(endpoint.md L127, ADR-010 L117) but the `impl Connection` block in
|
||
`core-types.md:59–67` does not include a `new()` constructor, nor specify its
|
||
signature/parameter. `core-types.md:113` says "`Connection::new()` wraps the
|
||
appropriate stream source based on where the connection came from" — but
|
||
there's no formal signature, and the argument type differs by source (a
|
||
`quinn::Connection` vs an iroh `Accepting`/`Connecting`). An implementer
|
||
doesn't know the constructor's parameter type.
|
||
|
||
**Suggestion**: Add `pub fn new(source: ConnectionSource) -> Connection` (or
|
||
two constructors `from_quinn` / `from_iroh`, or a private enum) to the
|
||
`impl Connection` block. Define `ConnectionSource` or the variant
|
||
constructors.
|
||
|
||
---
|
||
|
||
### S6. `services/schema` Input Shape Under-Specified (Leading Slash)
|
||
|
||
**Documents**: operation-registry.md (L286), call-protocol.md (L120, L126)
|
||
|
||
**Problem**: The input shape for `services/schema` is shown as
|
||
`{ "name": "fs/readFile" }` (no leading slash) but the wire `operationId`
|
||
convention (call-protocol.md L120, L126) is *with* a leading slash
|
||
(`/fs/readFile`). So does `services/schema`'s `name` input use the wire form
|
||
(leading slash) or the registry form (no slash)? The spec shows no slash,
|
||
implying registry form, but a client calling `services/schema` over the wire
|
||
would naturally use the same `operationId` form it uses elsewhere. This is a
|
||
small ambiguity but could cause a `NOT_FOUND` if the adapter doesn't
|
||
normalize.
|
||
|
||
**Suggestion**: Specify whether `services/schema`'s `name` input has a
|
||
leading slash and whether the adapter strips it (consistent with
|
||
`call.requested`'s `operationId` handling per call-protocol.md L120).
|
||
|
||
---
|
||
|
||
### S7. `call.completed` For Non-Subscription Calls Not Addressed
|
||
|
||
**Documents**: call-protocol.md (L96–106)
|
||
|
||
**Problem**: The event table lists `call.completed` as "Signal end of
|
||
subscription stream" — subscription-only. But for a plain `call()`
|
||
(request/response, one result), the lifecycle is `call.requested` →
|
||
`call.responded` (one) and then… nothing? Is the request considered complete
|
||
on the first `call.responded`? Is a `call.completed` ever sent for
|
||
non-subscription calls? The spec says (L103) "call(): Sends `call.requested`,
|
||
resolves on first `call.responded`" — so no `call.completed` for calls. But
|
||
then what signals to the server that the client is done listening? This is
|
||
consistent but the asymmetry (subscriptions get `completed`, calls don't)
|
||
could be stated explicitly to avoid an implementer sending a spurious
|
||
`call.completed` after a `call.responded`.
|
||
|
||
**Suggestion**: Add a note: "`call.completed` is sent only for subscriptions.
|
||
A plain `call()` is complete after its single `call.responded`; no
|
||
`call.completed` follows."
|
||
|
||
---
|
||
|
||
### S8. `OperationProvenance::FromCall` Comment Is Ambiguous ("Leaf Locally")
|
||
|
||
**Documents**: ADR-022 (L123–124)
|
||
|
||
**Problem**: `FromCall` => "QUIC forwarding stub (from_call), leaf locally —
|
||
cannot compose." The "locally" qualifier is ambiguous. Does it mean: (a)
|
||
`from_call` ops are leaves in the *local* registry but could compose on the
|
||
*remote* node? Or (b) they're leaves, full stop, and "locally" is just
|
||
emphasis? Assumption 5 (L518–524) clarifies, but the phrasing invites
|
||
misreading.
|
||
|
||
**Suggestion**: Rephrase to "leaf in the local registry — forwards calls to
|
||
a remote node; cannot compose locally" or similar to remove the ambiguity.
|
||
|
||
---
|
||
|
||
### S9. `ScopedOperationEnv::empty()` and `CompositionAuthority::none()` Referenced but Undefined
|
||
|
||
**Documents**: operation-registry.md (L182–184, L243–244, L298–299, L318–320),
|
||
ADR-022 (L287)
|
||
|
||
**Problem**: The spec and ADR-022 use `ScopedOperationEnv::empty()` and
|
||
`CompositionAuthority::none()` (constructor methods) but neither type's
|
||
definition (ADR-022 L164–180 for `CompositionAuthority`, L208–214 for
|
||
`ScopedOperationEnv`) lists these methods. They're implied (empty set /
|
||
None-equivalent) but not shown.
|
||
|
||
**Suggestion**: Add `impl ScopedOperationEnv { pub fn empty() -> Self { … }
|
||
pub fn allows(&self, name: &str) -> bool { … } }` and
|
||
`impl CompositionAuthority { pub fn none() -> Option<Self> { None } }` (or
|
||
similar) to the spec or ADR-022.
|
||
|
||
---
|
||
|
||
### S10. Overview OQ-04 Entry Is Stale
|
||
|
||
**Documents**: overview.md (L225)
|
||
|
||
**Problem**: `overview.md:225` lists "OQ-04: Dynamic handler registration at
|
||
runtime vs static at startup (two-way door, defer to implementation)." But
|
||
OQ-04 in `open-questions.md:47–51` is **resolved** ("Static registration at
|
||
startup"). The overview shows a stale framing ("defer to implementation")
|
||
while the OQ is resolved. Compare to OQ-01/OQ-02/OQ-03 which are correctly
|
||
marked "resolved" in the same overview list.
|
||
|
||
**Suggestion**: Update `overview.md:225` to "OQ-04: Dynamic handler
|
||
registration (resolved: static at startup — see ADR-010)" to match the
|
||
resolved state.
|
||
|
||
---
|
||
|
||
### S11. ADR-021 References "W1 Drift Issue from the Vault Review" — Ambiguous Cross-Reference
|
||
|
||
**Documents**: ADR-021 (L127–129)
|
||
|
||
**Problem**: ADR-021 L127–129 states: "This is the fix for the W1 drift issue
|
||
from the vault review: the current source ignores `key_version` for key
|
||
selection; the spec now makes it functional." But review 001's W1 (L355–371)
|
||
is about *ADR-020's stale example path* (`m/74'/2'/1'/0'` vs
|
||
`m/74'/2'/0'/1'`), not about the source ignoring `key_version`. The
|
||
source-ignoring-`key_version` drift is not tracked in review 001 at all. The
|
||
"W1" label in ADR-021 doesn't match review 001's W1.
|
||
|
||
**Suggestion**: Clarify the reference — either cite review 001 with the
|
||
correct finding number, or remove the "W1" label and describe the drift
|
||
directly (which the text already does).
|
||
|
||
---
|
||
|
||
### S12. Duplicate "Helper Functions" Block in mnemonic-derivation.md
|
||
|
||
**Documents**: crates/vault/mnemonic-derivation.md (L199–204, L217–223)
|
||
|
||
**Problem**: The helper functions `device_path` and `site_password_path` are
|
||
documented twice: L199–204 (missing `encryption_path_for_version`) and
|
||
L217–223 (complete). An implementer reading the first block might not realize
|
||
`encryption_path_for_version` exists, or might think there are two conflicting
|
||
definitions. The duplication is a maintenance hazard.
|
||
|
||
**Suggestion**: Remove the first block (L199–204); keep the complete list at
|
||
L217–223.
|
||
|
||
---
|
||
|
||
## Cross-Cutting Theme: ADR-022 Narrowed Several "Two-Way Doors" Without Updating the OQ Classifications
|
||
|
||
The two-way-door audit found a systematic pattern: ADR-022 (Proposed, not yet
|
||
Accepted) is the single largest source of door-type staleness. It narrowed
|
||
several doors without updating the OQ classifications that were written
|
||
pre-ADR-022:
|
||
|
||
| Door | Pre-ADR-022 classification | Post-ADR-022 reality |
|
||
|------|---------------------------|----------------------|
|
||
| Capabilities concrete shape (W2) | Two-way (implementation detail) | Clone semantics become a behavioral contract once handlers depend on shared mutation; ADR-022's composition model bakes in the cheap-clone assumption |
|
||
| OperationAdapter trait signatures (W1) | Two-way (async vs sync) | ADR-022 committed the return type to `HandlerRegistration`; `from_call` forces async — both are already decided |
|
||
| CallClient registry mechanism (W3) | Two-way (share global vs subset vs separate) | ADR-022 put capabilities (secrets) in the bundle — "share global" is now a capability-exposure decision, not just dispatch |
|
||
| Dynamic registration (W4) | Two-way (add ArcSwap later) | Hot-swap now requires re-injecting capabilities (secret-material handling) + TLS config rebuild, not a pointer swap |
|
||
| Session-scoped registries (OQ-19) | Two-way (protocol doesn't need changes) | `OperationContext.env` field type is a current commitment — if it's a concrete struct, the session-overlay pattern is closed (see C6) |
|
||
| Graph structures (W21) | Two-way (HashSet now, flowgraph later) | `ScopedOperationEnv`'s public field leaks the representation; the refactor is a public-API break, not an internal change |
|
||
|
||
**Pattern**: ADR-022 was written to resolve review #001's C1–C4 (the
|
||
registration-bundle tangle), and in doing so it quietly narrowed several
|
||
previously-two-way doors. The OQ tracker and ADR-017 should be reconciled
|
||
with ADR-022 before the "two-way door" labels can be trusted. ADR-009's
|
||
framework is sound; its application is drifting because later ADRs don't
|
||
update earlier OQ classifications.
|
||
|
||
The fix is not to re-litigate each door — most of ADR-022's narrowing is
|
||
correct (the bundle *should* carry capabilities, the adapter *should* produce
|
||
`HandlerRegistration`). The fix is to add guard clauses to the affected OQs
|
||
and ADRs documenting the narrowed scope, so an implementer reading the OQ
|
||
tracker sees the post-ADR-022 reality, not the pre-ADR-022 optimism.
|
||
|
||
---
|
||
|
||
## Cross-Cutting Theme: The "Published Artifact Is a Contract" Blind Spot
|
||
|
||
The audit found a systematic blind spot in ADR-009's framework: it
|
||
classifies doors by reversal cost in the codebase, not by compatibility cost
|
||
for external consumers. Three deferrals share this pattern:
|
||
|
||
- **`to_*` best-effort mappings (W5)**: "best-effort" is true before first
|
||
publication but one-way after — external clients depend on the generated
|
||
spec.
|
||
- **Error schema `http_status` mappings (W20)**: once `to_openapi` publishes
|
||
`FILE_NOT_FOUND` → 404, changing it breaks clients.
|
||
- **The salt field's wire format (W6)**: the field is "reserved" (wire-format-
|
||
stable), but its *semantics* cannot change for existing data — "reserved"
|
||
implies more flexibility than exists.
|
||
|
||
The framework treats these as two-way because the *format* is additive; it
|
||
misses that the *semantics*, once published, are frozen. This is the same
|
||
class of issue as a public API: adding a method is additive, but changing an
|
||
existing method's behavior is breaking. The spec should acknowledge that
|
||
published artifacts (generated OpenAPI specs, published error code mappings,
|
||
shipped wire formats) are compatibility contracts, not internal
|
||
implementation details.
|
||
|
||
---
|
||
|
||
## Summary Statistics
|
||
|
||
| Severity | Count |
|
||
|----------|-------|
|
||
| Critical | 13 |
|
||
| Warning | 21 |
|
||
| Suggestion | 12 |
|
||
|
||
## Recommended Resolution Order
|
||
|
||
The findings cluster into a few resolution passes. Recommended order:
|
||
|
||
### Pass 1: Governance reconciliation (resolves C1, C2, C12, C11)
|
||
|
||
1. Advance ADR-022 and ADR-023 to **Accepted**.
|
||
2. Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
|
||
update the struct to `CompositionAuthority`.
|
||
3. Amend ADR-002: note the signature was revised by ADR-007.
|
||
4. Amend ADR-004: note the "enrich/replace" language was superseded by
|
||
ADR-011's immutability model.
|
||
5. Add a Status column to all crate README ADR tables.
|
||
|
||
This is the highest-value pass — it resolves the direct contradictions in the
|
||
document set and makes the governance state coherent.
|
||
|
||
### Pass 2: Spec-ADR consistency (resolves C3, C4, C5, C6, C10, C13, W1)
|
||
|
||
6. Add `abort_policy` to `OperationContext`; define `AbortPolicy` enum (C3).
|
||
7. Define the `OperationEnv` trait explicitly, including
|
||
`invoke_with_policy()` (C4).
|
||
8. Separate `OperationContext.env` (dispatch trait) from
|
||
`OperationContext.scoped_env` (reachability set) — resolve the type
|
||
identity crisis (C6).
|
||
9. Update ADR-017's `from_call` mirror list and `OperationAdapter` trait to
|
||
include `error_schemas` and `HandlerRegistration` (C5, W1).
|
||
10. Specify `From<StreamError> for HandlerError` and resolve the variant
|
||
payload (C10).
|
||
11. Specify the `Connection::set_identity` observability read path (C13).
|
||
|
||
### Pass 3: Vault security and deferral reconciliation (resolves C7, C8, C9, W6, W7, W8, W9, W10, W12)
|
||
|
||
12. Promote OQ-21 from "deferred" to "open"; reconcile with ADR-019; decide
|
||
the vault-server-crate question (C7).
|
||
13. Complete the operation access policy table (C8).
|
||
14. Specify the `site_password_path` string→u32 mapping (C9).
|
||
15. Add an ADR for the HD-derivation key model (identity/SSH/signing) (W9).
|
||
16. Add an ADR locking the `EncryptedData` wire format (W10).
|
||
17. Add a "Known Source Drift" table to the vault README (W12).
|
||
18. Address `unlock_new` returning non-zeroizing `String` (W7) and
|
||
`DerivedKey` JSON deserialization (W8).
|
||
19. Clarify the salt field's "reserved" semantics (W6).
|
||
|
||
### Pass 4: Two-way-door guard clauses (resolves W2, W3, W4, W5, W17, W18, W21)
|
||
|
||
20. Add guard clauses to the OQs and ADRs that ADR-022 narrowed (W2, W3, W4,
|
||
W21).
|
||
21. Add the "published artifact is a contract" guard to `to_*` adapters (W5).
|
||
22. Correct OQ-09 (WASM server-side) and OQ-10 (git composability) framings
|
||
(W17, W18).
|
||
|
||
### Pass 5: Minor consistency (resolves remaining Warnings and Suggestions)
|
||
|
||
23. Fix the overview.md "stream" → "connection" wording (W16).
|
||
24. Fix ADR-016's self-referential citation and grandchild propagation (W19).
|
||
25. Fix ADR-023's `from_openapi` collision example (W20).
|
||
26. Address the remaining suggestions (S1–S12).
|
||
|
||
---
|
||
|
||
## Note on Effort
|
||
|
||
This review found more issues than review #001 (13 critical vs 5, 21 warning
|
||
vs 4), but that's expected: review #001 caught the registration-bundle tangle
|
||
and the missing error schemas, which were the highest-density gaps. This
|
||
review goes wider — cross-document consistency, the two-way-door framing, and
|
||
the vault deferrals — and surfaces issues that were always there but not yet
|
||
audited. The critical findings here are individually smaller than review
|
||
#001's C1–C5 (no single tangle of four interlocking gaps), but they are
|
||
collectively important because several are governance issues (Proposed ADRs
|
||
treated as binding, Accepted ADRs contradicting later refinements) that
|
||
undermine the document set's authority.
|
||
|
||
The two-way-door audit's core finding: the framework is sound, but its
|
||
application is drifting because later ADRs don't update earlier OQ
|
||
classifications. The fix is not to re-litigate each door — it's to add guard
|
||
clauses documenting the narrowed scope, so the OQ tracker reflects
|
||
post-ADR-022 reality. |