docs(arch): resolve OQ-26 (AdapterError variants) + OQ-33 (PeerId = logical id) + OQ-34 (persistent peer registry)

OQ-26 (resolved): AdapterError variants decided — DiscoveryFailed,
SchemaParse, Transport, Unauthorized, SamePeerCollision (replaces flat
Conflict per ADR-029 §5). #[non_exhaustive] for downstream extension.
Two-way door: the initial set matches the enum the code already returns,
and variants can be added later without breaking callers.

OQ-33 (resolved): PeerId is a logical identifier, NOT Identity.id. The
research's v1 default (PeerId = fingerprint) is overridden: coupling PeerId
to crypto material breaks every in-flight PeerRef::Specific and every ACL
entry on key rotation. v1 source is a connection-assigned UUID — a
no-storage workaround that works for the immediate use case (head→workers,
a reconnect produces a fresh PeerRef, and an in-flight PeerRef::Specific
gets NOT_FOUND, which is the correct outcome).
The one-way door: PeerId is logical, not crypto — this determines
PeerCompositeEnv key type and PeerRef::Specific payload. The id source
(UUID vs configured name vs peer registry) is the two-way-door remainder.

OQ-34 (new): the storage dimension OQ-33 surfaced. The core crates are
deliberately DB-free (smaller, fewer deps, simpler testing) — this served
local-only state (vault, registry) well, but peer identity is the first
cross-node state that wants persistence. The real solution (a persistent
peer registry mapping stable logical name → current crypto material,
surviving key rotation) is not a v1 blocker (UUID works), but tracked so the
no-DB posture's limit is deliberate, not accidental. The storage boundary
(core gets a PeerRegistry trait vs stays storage-free) is the one-way door;
the backend choice is two-way. Key-rotation/ACL note: decoupling PeerId from
crypto keeps the door open for ACL entries that persist across key rotation
— when the peer registry is built, ACLs key on the logical name and key
rotation becomes vault-only with no remote-side ACL update.
2026-06-27 06:34:35 +00:00
parent 77eb35a8a5
commit 99c6dd9483
5 changed files with 167 additions and 38 deletions


@@ -349,22 +349,26 @@ revisited during implementation without a new ADR.
### OQ-26: OperationAdapter Error Type (AdapterError Variants)
- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §5
- **Status**: open
- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §5, [ADR-029](decisions/029-peer-graph-routing-model.md) §5
- **Status**: **resolved** (2026-06-27)
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: ADR-017 §5 showed `async fn import(&self) ->
Vec<HandlerRegistration>` with no error type. The trait returns
`Result<Vec<HandlerRegistration>, AdapterError>` where `AdapterError` is a
crate-level enum. The *presence* of an error type is recorded in
[client-and-adapters.md](crates/call/client-and-adapters.md); the exact
variants are the two-way-door remainder. The failure modes real
implementations hit: discovery transport failure (`from_call` remote
unreachable), schema parse failure (`from_openapi`, `from_jsonschema`),
unauthorized (HTTP 401 for `from_openapi`, `from_mcp`). Likely variants:
`DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`. Decided
during implementation; recorded here, not in a full ADR.
- **Cross-references**: ADR-017, [client-and-adapters.md](crates/call/client-and-adapters.md)
- **Resolution**: The `AdapterError` enum is `#[non_exhaustive]` +
`thiserror::Error`, with these v1 variants:
- `DiscoveryFailed { message: String }` — `from_call` remote unreachable / `services/list` failed
- `SchemaParse { message: String }` — `from_openapi` / `from_jsonschema` couldn't parse the spec
- `Transport { message: String }` — underlying transport error (QUIC for `from_call`, HTTP for `from_openapi`/`from_mcp`)
- `Unauthorized { message: String }` — HTTP 401 for `from_openapi`/`from_mcp`, auth rejected for `from_call`
- `SamePeerCollision { message: String }` — namespace collision *within a single peer* (ADR-029 §5: cross-peer collision dissolves; same-peer collision stays an error). Replaces the flat `Conflict` variant from the pre-ADR-029 implementation.
`#[non_exhaustive]` lets `alknet-http`'s adapters extend without breaking
match arms. The variant payloads are `String` messages — kept simple and
`Send + Sync` by construction. This matches the shipped implementation
(`crates/alknet-call/src/client/mod.rs`) except that `Conflict` becomes
`SamePeerCollision` (the ADR-029 migration renames it). Two-way door:
adding variants later is non-breaking; renaming a variant is a match-arm
update but not an architectural change.
- **Cross-references**: ADR-017, ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
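The variant set above can be sketched as follows. This is a minimal illustration, not the shipped code: the real enum derives `thiserror::Error`, whereas this sketch writes `Display` and `Error` by hand so it compiles with the standard library alone. The display strings and the `variant_name` and `assert_send_sync` helpers are hypothetical and exist only to show the wildcard arm and the `Send + Sync` property.

```rust
use std::error::Error;
use std::fmt;

/// Sketch of the v1 variant set. The display strings are assumptions.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AdapterError {
    DiscoveryFailed { message: String },
    SchemaParse { message: String },
    Transport { message: String },
    Unauthorized { message: String },
    SamePeerCollision { message: String },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Hand-written stand-in for what `thiserror::Error` would generate.
        let (kind, message) = match self {
            AdapterError::DiscoveryFailed { message } => ("discovery failed", message),
            AdapterError::SchemaParse { message } => ("schema parse failed", message),
            AdapterError::Transport { message } => ("transport error", message),
            AdapterError::Unauthorized { message } => ("unauthorized", message),
            AdapterError::SamePeerCollision { message } => ("same-peer collision", message),
        };
        write!(f, "{kind}: {message}")
    }
}

impl Error for AdapterError {}

/// Hypothetical helper showing the match shape a downstream crate must use.
/// Outside the defining crate, `#[non_exhaustive]` makes the `_` arm
/// mandatory, which is why adding a variant later does not break callers.
#[allow(unreachable_patterns)]
pub fn variant_name(err: &AdapterError) -> &'static str {
    match err {
        AdapterError::DiscoveryFailed { .. } => "DiscoveryFailed",
        AdapterError::SchemaParse { .. } => "SchemaParse",
        AdapterError::Transport { .. } => "Transport",
        AdapterError::Unauthorized { .. } => "Unauthorized",
        AdapterError::SamePeerCollision { .. } => "SamePeerCollision",
        _ => "unknown",
    }
}

/// Compile-time check: `String` payloads keep the enum `Send + Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Inside the defining crate the wildcard arm is redundant (hence the `allow`); it is shown because downstream adapters such as those in `alknet-http` would need it.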
### OQ-27: from_call Re-Import Trigger
@@ -485,4 +489,111 @@ revisited during implementation without a new ADR.
suffices. Whether multi-hop federation becomes a real use case is a future
decision; the peer-keyed model does not foreclose it. Not designed; tracked
here so the v1 model's extendability is recorded.
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
### OQ-33: PeerId — Cryptographic Identity vs Stable Logical Identifier
- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) Assumption 1, `docs/research/alknet-call-peer-routing/findings.md` §6.1
- **Status**: **resolved** (2026-06-27)
- **Door type**: One-way (composition semantics), two-way (id source)
- **Priority**: high
- **Resolution**: `PeerId` is a **logical identifier, decoupled from the
cryptographic identity**. It is *not* `Identity.id` (the TLS fingerprint or
API-key prefix) — those change on key rotation, which would break every
in-flight `PeerRef::Specific` and every ACL entry referencing that peer.
**v1 source**: connection-assigned UUID (v4) at `connect()`/`accept()` time.
Stable for the connection's lifetime; changes on reconnect. This is a
**no-storage workaround** — the project has deliberately avoided a DB
backend for the core crates (smaller, fewer deps, simpler testing), which
has served the local-only crates (vault, registry) well. But peer identity
is the first *cross-node* state that wants persistence: what we actually
want is a persistent mapping from a logical peer identity to its current
cryptographic material, updated on key rotation, surviving restarts.
Without a DB, the UUID is the least-bad ephemeral option — the failure
mode (in-flight `PeerRef::Specific` gets `NOT_FOUND` on reconnect) is
acceptable for v1, and the re-`from_call` produces a fresh `PeerRef`.
**The real solution (future, tracked as OQ-34):** a persistent peer
registry — a mapping from a stable logical peer identity (configured node
name or registered identity) to its current cryptographic material,
persisted across restarts and key rotations. This is what makes the
ACL-stability concern below work correctly: the ACL entry keys on the
logical name, the peer registry tracks the current crypto identity for
that name, and key rotation becomes a vault-only operation with no ACL
update on the remote side. The no-DB posture of the core crates means
this registry lives outside the core — likely in a service crate or an
assembly-layer store — not in alknet-call itself. See OQ-34.
**Key-rotation / ACL note (context for the future, not a v1 decision):**
if `PeerId` were the fingerprint, rotating a node's TLS key would change
its `PeerId`, invalidating every ACL entry that references that peer. The
vault makes local key rotation easy (derive a new key, re-encrypt,
ADR-021); the problem is the *remote* side's ACL — the hub's
`authorized_fingerprints` / `AccessControl` entries that reference the old
fingerprint. Decoupling `PeerId` from the crypto material means the ACL
entry *can* persist across key rotation — but only if there's a store that
maps the logical name to the new crypto identity after rotation. That
store is OQ-34. The v1 decision (logical id, not crypto; UUID source)
keeps the door open for it without requiring it now.
**The one-way door:** `PeerId` is a logical id, not `Identity.id`. This
determines the `PeerCompositeEnv` key type, the `PeerRef::Specific`
payload type, and the `ScopedPeerEnv.peer_pinned` entry shape. Reversing
it (switching to `Identity.id`) would break the peer-keyed overlay, the
routing selector, and the reachability set simultaneously. The *source* of
the logical id (UUID now, peer registry later) is the two-way-door
remainder — switching from UUID to a persistent registry changes the
id-generation path, not the composition model.
- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-021, ADR-027,
ADR-029, OQ-34, [client-and-adapters.md](crates/call/client-and-adapters.md),
[operation-registry.md](crates/call/operation-registry.md),
[auth.md](crates/core/auth.md)
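The v1 behaviour described in this resolution can be sketched as below. Everything except the names `PeerId` and `PeerRef::Specific` is an assumption made for illustration: `ConnectionTable`, `Connection`, and `RouteError` are hypothetical, and a monotonically increasing `u128` stands in for the UUID v4 so the sketch needs no external crate.

```rust
use std::collections::HashMap;

/// Logical peer id. The `u128` stands in for a connection-assigned UUID v4;
/// it is deliberately not derived from the peer's fingerprint.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(u128);

/// Routing selector. Only the `Specific` variant is sketched here.
pub enum PeerRef {
    Specific(PeerId),
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    NotFound,
}

/// Hypothetical per-connection record holding the crypto identity.
pub struct Connection {
    pub fingerprint: String,
}

/// Hypothetical table of live connections, keyed by the logical id.
#[derive(Default)]
pub struct ConnectionTable {
    next: u128,
    live: HashMap<PeerId, Connection>,
}

impl ConnectionTable {
    /// Assigns a fresh id at connect time, regardless of the fingerprint.
    pub fn connect(&mut self, fingerprint: &str) -> PeerId {
        self.next += 1;
        let id = PeerId(self.next);
        self.live.insert(id, Connection { fingerprint: fingerprint.to_string() });
        id
    }

    /// Drops the connection; its id is never reused.
    pub fn disconnect(&mut self, id: PeerId) {
        self.live.remove(&id);
    }

    /// A `PeerRef::Specific` pinned to a dropped connection yields `NotFound`.
    pub fn resolve(&self, peer: &PeerRef) -> Result<&Connection, RouteError> {
        match peer {
            PeerRef::Specific(id) => self.live.get(id).ok_or(RouteError::NotFound),
        }
    }
}
```

A reconnect with the same fingerprint still gets a new `PeerId`, so a reference pinned before the reconnect resolves to `NotFound` and the caller has to obtain a fresh `PeerRef`, which is the failure mode this entry accepts for v1.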
### OQ-34: Persistent Peer Registry (Cross-Node State Storage)
- **Origin**: OQ-33 (the storage dimension it surfaced), the no-DB posture of ADR-008/018/025
- **Status**: open
- **Door type**: One-way (storage boundary), two-way (backend choice)
- **Priority**: medium (not a v1 blocker — UUID works for v1; becomes real
when key rotation across nodes or peer-attribution persistence matters)
- **Resolution**: The core crates (alknet-core, alknet-call, alknet-vault)
are deliberately storage-free — no DB, no persistence layer, in-memory
state only. This has kept the core small and testable, and it works for
local-only state (vault key rotation is version-indexed paths, no DB
needed, ADR-021). **Peer identity is the first cross-node state that
wants persistence**: a stable logical peer identity mapped to its current
cryptographic material, surviving restarts and key rotations. The v1
workaround (OQ-33: connection-assigned UUID) is ephemeral — it works for
the immediate use case (head→workers, operator-controlled, reconnects
produce a fresh UUID) but doesn't support ACL entries that persist across
key rotation, because there's nowhere to store "worker-a's current crypto
identity is X."
**What this OQ tracks (not designed, not a v1 decision):**
- Whether a persistent peer registry belongs in a service crate (e.g., an
`alknet-registry` or `alknet-peer-store`), in the assembly layer (a
SQLite file the binary owns), or as a new alknet-core abstraction
(a `PeerRegistry` trait with no built-in impl, like `IdentityProvider`).
- Whether the no-DB posture extends to "core has a trait, service has the
impl" (the `IdentityProvider` pattern) or stays "core is storage-free,
persistence is entirely outside the crate graph."
- The backend choice (SQLite, a key-value store, a config file) is the
two-way-door remainder; the *storage boundary* (does core know about
persistence at all?) is the one-way door.
**Why this is a one-way door on the storage boundary, not a two-way door:**
if core gains a `PeerRegistry` trait, downstream crates depend on it and
the trait shape becomes a contract. If core stays storage-free, the
registry lives in a service crate and core never knows about persistence.
Reversing either direction breaks downstream consumers. The decision
should be made when a concrete use case (key rotation across nodes,
durable peer attribution, multi-hop federation with OQ-32) forces it —
not before.
**Not a v1 blocker.** The UUID works for v1; this OQ exists so the
no-DB posture's limit is tracked and the decision is made deliberately
when it's needed, not accidentally when someone bolts a SQLite file onto
the assembly layer and it becomes load-bearing.
- **Cross-references**: ADR-008, ADR-018, ADR-021, ADR-025, ADR-029, OQ-33,
[auth.md](crates/core/auth.md), [config.md](crates/core/config.md)
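For orientation only, one of the options tracked above ("core has a trait, service has the impl") might look roughly like the sketch below. Nothing here is decided: the trait methods, `InMemoryPeerRegistry`, and `Acl` are hypothetical, and the in-memory map only stands in for a backend that would actually persist across restarts.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical trait shape: maps a stable logical peer name to its
/// current cryptographic material.
pub trait PeerRegistry {
    fn current_fingerprint(&self, name: &str) -> Option<String>;
    fn rotate(&mut self, name: &str, new_fingerprint: &str);
}

/// Stand-in backend. A real one would persist (SQLite, key-value store,
/// config file); this map is lost on restart.
#[derive(Default)]
pub struct InMemoryPeerRegistry {
    entries: HashMap<String, String>,
}

impl PeerRegistry for InMemoryPeerRegistry {
    fn current_fingerprint(&self, name: &str) -> Option<String> {
        self.entries.get(name).cloned()
    }

    fn rotate(&mut self, name: &str, new_fingerprint: &str) {
        self.entries.insert(name.to_string(), new_fingerprint.to_string());
    }
}

/// Hypothetical ACL keyed on logical names rather than fingerprints.
pub struct Acl {
    allowed: HashSet<String>,
}

impl Acl {
    pub fn allow(names: &[&str]) -> Self {
        Acl { allowed: names.iter().map(|n| n.to_string()).collect() }
    }

    /// A presented fingerprint is accepted if it is the current material
    /// of any allowed logical name.
    pub fn permits(&self, registry: &dyn PeerRegistry, presented: &str) -> bool {
        self.allowed
            .iter()
            .any(|name| registry.current_fingerprint(name).as_deref() == Some(presented))
    }
}
```

With this shape, rotating `worker-a`'s key touches only the registry entry; the ACL itself is never edited, which is the property the key-rotation note in OQ-33 is after.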