From 1d94aaea51ac03928573155764f8bd2991dfc757 Mon Sep 17 00:00:00 2001
From: "glm-5.2"
Date: Sun, 28 Jun 2026 05:35:52 +0000
Subject: [PATCH] docs(arch): resolve call-crate OQs, promote OQ-29 to load-bearing on ADR-030
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resolve the call-crate open questions where the decision is made — OQ-27 (auto-re-import), OQ-28 (same-peer collision = error), OQ-30 (PeerRef::Any insertion-order first-match), OQ-31 (services/list-peers opt-in). These were previously marked 'open' with 'v1' hedging language despite having a decided default. What remains (refresh(), richer routing, and the services/list-peers op itself) is genuine feature work, not unmade architecture.

Reframe OQ-32 (multi-hop) as a feature extension rather than a 'v1' deferral — the one-hop model is the architectural commitment; extending to multi-hop doesn't break downstream.

Promote OQ-29 (CallClient TLS client-auth) from medium to high priority and surface its real interaction with ADR-030. Previously framed as 'additive — two-way-door remainder,' but ADR-030's PeerEntry fingerprint → peer_id resolution requires the client to present a TLS client cert. With with_no_client_auth(), no fingerprint is extracted, the PeerEntry path is dormant, and PeerCompositeEnv keys on None or the API-key prefix instead of the stable peer_id. This is the activation path for ADR-030's primary use case, not an additive feature. Three options are laid out: (a) wire client-auth with the ADR-029 migration, (b) ship token-only and switch later (the 'compounds into a mess' path), (c) extend PeerEntry to cover auth_token-based identity. Requires a decision before the migration lands.

Clarify OQ-36 (concrete adapter shapes): the trait shapes and in-memory adapters ship with core — the deferral is only for the persistence adapters (SQLite, etc.). The in-memory adapters are real implementations of a full repo pattern, not stubs.

Update call_client.rs source comment to reference OQ-29 instead of the 'v1' / 'two-way-door remainder' framing. Workspace green: 326 tests pass, build clean. --- crates/alknet-call/src/client/call_client.rs | 24 +- docs/architecture/README.md | 22 +- docs/architecture/crates/call/README.md | 25 +- .../crates/call/client-and-adapters.md | 53 ++-- docs/architecture/open-questions.md | 251 +++++++++++------- 5 files changed, 224 insertions(+), 151 deletions(-) diff --git a/crates/alknet-call/src/client/call_client.rs b/crates/alknet-call/src/client/call_client.rs index 18e6bb0..45fbce6 100644 --- a/crates/alknet-call/src/client/call_client.rs +++ b/crates/alknet-call/src/client/call_client.rs @@ -207,17 +207,21 @@ fn build_quinn_client_config( _credentials: &CallCredentials, alpn: &[u8], ) -> Result { - // v1 connects without client-auth TLS identity: the server-side - // `AcceptAnyCertVerifier` (in alknet-core::endpoint) does not require or + // TODO(OQ-29): connects without client-auth TLS identity. The server-side + // `AcceptAnyCertVerifier` (in alknet-core::endpoint) requests but does not // verify client certs, so a client cert is not needed to establish a - // connection. Wiring the local node's RawKey/X509 identity as a quinn - // client-auth cert (for servers that *do* verify client identity) is a - // two-way-door remainder — the `credentials.tls_identity` field is - // carried through `CallCredentials` so the assembly layer can populate - // it, and a future task plugs it into the rustls client config. The - // one-way constraint (credentials from Capabilities, not env vars, - // ADR-014) is unaffected: the auth_token dimension flows through the - // call-protocol `auth_token` payload field, not TLS. + // connection. However, without a client cert, the server cannot extract a + // fingerprint, so `IdentityProvider::resolve_from_fingerprint` returns + // None and the peer gets no stable `PeerEntry.peer_id` (ADR-030). 
This is + // load-bearing on ADR-030's peer-identity model — see OQ-29 for the + // decision needed before the ADR-029 migration lands. + // + // The `credentials.tls_identity` field is carried through `CallCredentials` + // so the assembly layer can populate it; wiring it into the rustls client + // config is the missing piece. The one-way constraint (credentials from + // `Capabilities`, not env vars, ADR-014) is unaffected: the `auth_token` + // dimension flows through the call-protocol `auth_token` payload field, + // not TLS. let provider = Arc::new(rustls::crypto::aws_lc_rs::default_provider()); let mut config = rustls::ClientConfig::builder_with_provider(provider) .with_safe_default_protocol_versions() diff --git a/docs/architecture/README.md b/docs/architecture/README.md index feab166..d9a507f 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -114,16 +114,18 @@ See [open-questions.md](open-questions.md) for the full tracker. - **OQ-34**: ~~Persistent peer registry~~ — **resolved by ADR-030 + ADR-031 + ADR-033** (storage boundary: core defines repo traits + in-memory defaults; persistence adapters are separate crates) - **OQ-35**: API key identity vs peer identity — resolved (recorded by ADR-030; the asymmetry between fingerprint and API-key paths is deliberate) -**Open (two-way-door remainders from alknet-call completion + peer-graph routing):** -- **OQ-25**: ~~Remote-safe marking shape~~ — **dissolved by ADR-029** (no marking; peer authorization is `AccessControl::check(peer_identity)`) -- **OQ-26**: ~~`OperationAdapter` error type~~ — **resolved** (`AdapterError` variants: `DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`, `SamePeerCollision`; `#[non_exhaustive]`) -- **OQ-27**: `from_call` re-import trigger — v1 default auto-on-reconnect; explicit `refresh()` additive -- **OQ-28**: `from_call` namespace collision — cross-peer **dissolved by ADR-029** (separate sub-overlays); same-peer stays error -- **OQ-29**: 
`CallClient` TLS client-auth — v1 `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive -- **OQ-30**: `PeerRef::Any` routing policy — v1 insertion-order first-match; round-robin/least-loaded is future (ADR-029) -- **OQ-31**: `services/list-peers` re-export semantics — v1 "own ops only"; `services/list-peers` is opt-in (ADR-029) -- **OQ-32**: Multi-hop federation — v1 one-hop; peer-keyed model extends without redesign; petgraph candidate (ADR-029) -- **OQ-36**: Concrete adapter shapes — open (deferred for exploration; the repo/adapter pattern is committed by ADR-033, the concrete adapter shapes are not) +**Resolved by the call-completion / ADR-029 work:** +- **OQ-27**: ~~`from_call` re-import trigger~~ — **resolved** (auto-re-import on connection establishment; `refresh()` is a feature addition) +- **OQ-28**: ~~`from_call` namespace collision~~ — **resolved** (same-peer collision = error; cross-peer dissolved by ADR-029) +- **OQ-30**: ~~`PeerRef::Any` routing policy~~ — **resolved** (insertion-order first-match; richer routing is a feature extension) +- **OQ-31**: ~~`services/list-peers` re-export semantics~~ — **resolved** (opt-in `services/list-peers`; `services/list` is "own ops only") + +**Open (requires decision before ADR-029 migration lands):** +- **OQ-29**: `CallClient` TLS client-auth — **promoted to high priority, load-bearing on ADR-030**. Not "additive" as previously framed — it's the activation path for the `PeerEntry` fingerprint → `peer_id` resolution. Without it, `PeerCompositeEnv` keys on `None` or the API-key prefix, not the stable `peer_id`. See OQ-29 for the three options (wire client-auth with the migration / ship token-only / extend PeerEntry to cover auth_token). 
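A minimal sketch of why OQ-29 is load-bearing, assuming hypothetical shapes for `IdentityProvider`, `PeerEntry`, and `ApiKeyEntry` (the real ADR-030 types may differ) and assuming the fingerprint path wins when both credentials are present. `InboundConn` and `peer_key` are invented for illustration. The sketch shows the three keys a peer-keyed overlay could end up with: `None` under `with_no_client_auth()` with no token, the API-key prefix with a token only, and the stable `peer_id` only when a client cert is presented.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the ADR-030 types; field sets are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PeerId(pub String);

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprint: String, // "ed25519:..." (raw key) or "SHA256:..." (X.509)
}

pub struct ApiKeyEntry {
    pub prefix: String,
    pub token: String,
}

/// What the server can learn from one inbound connection (hypothetical).
pub struct InboundConn<'a> {
    /// `Some` only if the client presented a TLS client cert.
    pub client_fingerprint: Option<&'a str>,
    /// The call-protocol `auth_token` payload field, if sent.
    pub auth_token: Option<&'a str>,
}

pub struct IdentityProvider {
    peers: Vec<PeerEntry>,
    api_keys: HashMap<String, ApiKeyEntry>, // keyed by full token
}

impl IdentityProvider {
    pub fn new(peers: Vec<PeerEntry>, keys: Vec<ApiKeyEntry>) -> Self {
        let api_keys: HashMap<String, ApiKeyEntry> =
            keys.into_iter().map(|k| (k.token.clone(), k)).collect();
        Self { peers, api_keys }
    }

    /// Fingerprint path: a plain string match, so raw-key and X.509 fingerprints work alike.
    pub fn resolve_from_fingerprint(&self, fp: &str) -> Option<String> {
        self.peers
            .iter()
            .find(|p| p.fingerprint == fp)
            .map(|p| p.peer_id.clone())
    }

    /// Token path: yields the API-key prefix, never a `PeerEntry.peer_id`.
    pub fn resolve_from_token(&self, token: &str) -> Option<String> {
        self.api_keys.get(token).map(|k| k.prefix.clone())
    }

    /// The key a peer-keyed overlay would be filed under for this connection.
    /// Assumption: the fingerprint path takes precedence over the token path.
    pub fn peer_key(&self, conn: &InboundConn<'_>) -> Option<PeerId> {
        conn.client_fingerprint
            .and_then(|fp| self.resolve_from_fingerprint(fp))
            .or_else(|| conn.auth_token.and_then(|t| self.resolve_from_token(t)))
            .map(PeerId)
    }
}
```

The fingerprint lookup is a plain string comparison, which is why it would work unchanged for raw-key and X.509 fingerprints; in this model the only missing piece is that the client never presents a cert, so `client_fingerprint` is always `None`.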
+ +**Open (feature extensions, not blocking):** +- **OQ-32**: Multi-hop federation — the one-hop model is the architectural commitment; multi-hop is a feature extension that doesn't break downstream consumers +- **OQ-36**: Concrete adapter shapes — the repo/adapter pattern is committed (ADR-033); the trait shapes and in-memory adapters ship with core, and only the persistence adapter shapes (SQLite, etc.) are deferred for exploration **Deferred (not active):** - **OQ-09**: WASM target boundaries — design constraint, not deliverable diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md index 3cff9d9..baf3456 100644 --- a/docs/architecture/crates/call/README.md +++ b/docs/architecture/crates/call/README.md @@ -1,7 +1,7 @@ --- status: draft -last_updated: 2026-06-26 -review: call/review-call passed 2026-06-23 — registry, protocol, ADR (005/012/014/015/016/017/022/023/024), security, and pattern-consistency checks all conformant; 159 unit/integration tests green; `cargo build`, `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test` clean. Call-completion gap (ADR-017 client/adapter surface) addressed 2026-06-26 — ADR-028 + client-and-adapters.md added; implementation pending. +last_updated: 2026-06-27 +review: call/review-call passed 2026-06-23 — registry, protocol, ADR (005/012/014/015/016/017/022/023/024), security, and pattern-consistency checks all conformant; 159 unit/integration tests green; `cargo build`, `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test` clean. Call-completion gap (ADR-017 client/adapter surface) addressed 2026-06-26; ADR-029 migration pending.
--- # alknet-call @@ -40,6 +40,9 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | [024](../../decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) | | [028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | ~~Peer-Scoped Registry Filtering~~ | ~~Accepted~~ → **Superseded** by ADR-029 (flat-namespace single-peer model couldn't express head→N-workers; parallel auth system duplicated `AccessControl`) | | [029](../../decisions/029-peer-graph-routing-model.md) | Peer-Graph Routing Model | Peer-keyed overlays + `PeerRef` routing; `AccessControl`-based peer authorization; retires `remote_safe`/`trusted_peer` | +| [030](../../decisions/030-peerentry-and-identity-id-decoupling.md) | PeerEntry and Identity.id Decoupling | `PeerId` source = `Identity.id` = `PeerEntry.peer_id` (stable); supersedes ADR-029's UUID source | +| [032](../../decisions/032-forwarded-for-identity.md) | Forwarded-For Identity | `forwarded_for` on `OperationContext` and `call.requested`; metadata only, never used by `AccessControl::check` | +| [033](../../decisions/033-storage-boundary-and-repo-adapter-pattern.md) | Storage Boundary and Repo/Adapter Pattern | Core defines repo traits + in-memory defaults; persistence adapters are separate crates | ## Relevant Open Questions @@ -52,14 +55,14 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays. 
| | OQ-25 | ~~Remote-safe marking shape~~ | **dissolved** (ADR-029) | `remote_safe`/`trusted_peer` retired; peer authorization is `AccessControl::check(peer_identity)` | | OQ-26 | OperationAdapter error type (AdapterError variants) | **resolved** | `DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`, `SamePeerCollision`; `#[non_exhaustive]` | -| OQ-27 | from_call re-import trigger | open (two-way) | v1 default: auto-on-reconnect; explicit `refresh()` additive | -| OQ-28 | from_call namespace collision | cross-peer **dissolved** (ADR-029) / same-peer stays | Cross-peer: separate sub-overlays, no collision. Same-peer: error. `namespace_prefix` is local-naming sugar | -| OQ-29 | CallClient TLS client-auth and remote-identity verification | open (two-way) | v1 `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive (orthogonal to ADR-029) | -| OQ-30 | `PeerRef::Any` routing policy | open (two-way) | v1 insertion-order first-match; round-robin/least-loaded is future (ADR-029) | -| OQ-31 | `services/list-peers` re-export semantics | open (two-way) | v1 "own ops only"; `services/list-peers` is opt-in (ADR-029) | -| OQ-32 | Multi-hop federation | open | v1 one-hop; peer-keyed model extends without redesign; petgraph candidate (ADR-029) | -| OQ-33 | PeerId — crypto identity vs stable logical id | **resolved** | Logical id (UUID v1), not `Identity.id`; decoupled from crypto for key-rotation-safe ACLs | -| OQ-34 | Persistent peer registry (cross-node state storage) | open | Not a v1 blocker (UUID works); the no-DB posture's limit, tracked for deliberate future decision | +| OQ-27 | from_call re-import trigger | **resolved** | Auto-re-import on connection establishment; `refresh()` is a feature addition | +| OQ-28 | from_call namespace collision | **resolved** | Same-peer collision = error; cross-peer dissolved by ADR-029 (separate sub-overlays) | +| OQ-29 | CallClient TLS client-auth | **open (high, load-bearing on ADR-030)** | 
NOT "additive" — activates the `PeerEntry` fingerprint → `peer_id` path. Requires decision before ADR-029 migration. | +| OQ-30 | `PeerRef::Any` routing policy | **resolved** | Insertion-order first-match; richer routing is a feature extension | +| OQ-31 | `services/list-peers` re-export semantics | **resolved** | Opt-in `services/list-peers`; `services/list` is "own ops only" | +| OQ-32 | Multi-hop federation | open (feature extension) | One-hop model is the commitment; multi-hop is a feature extension, not a deferral | +| OQ-33 | PeerId — crypto identity vs stable logical id | **resolved** (ADR-030) | `PeerId = Identity.id = PeerEntry.peer_id` (stable across key rotation) | +| OQ-34 | Persistent peer registry | **resolved** (ADR-030+033) | Core trait + in-memory default; persistence adapters are separate crates | ## Key Design Principles @@ -74,6 +77,6 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, 9. **Internal calls switch authority context, not skip ACL**: The `internal` flag marks composition-originated calls. ACL runs against the handler's composition authority, not the caller's and not as a blanket skip. Operations have External/Internal visibility. Scoped composition env bounds reachability. See ADR-015, ADR-022. 10. **Provenance determines composition capability**: Only `Local` and `Session` ops can compose. Leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) are forwarding stubs — they don't get composition authority or a scoped env. The assembly layer is the sole grantor of composition authority. See ADR-022. 11. **Connection direction is independent of call direction**: Who opens the QUIC connection is a connection-layer concern, not a protocol-layer concern. Both sides can call each other once connected. The `CallAdapter` accepts connections; the `CallClient` opens them; both produce the same `CallConnection` and dispatch through the same loop. See ADR-017, [client-and-adapters.md](client-and-adapters.md). -12. 
**CallClient registry is default-deny**: A `CallClient` exposes no operations to the remote peer unless explicitly marked remote-safe. Sharing the global registry is an explicit trusted-peer opt-in, never the default. This prevents a remote peer's call from triggering dispatch that populates `OperationContext.capabilities` from the local node's registration bundle. See ADR-028. +12. **Peer authorization via `AccessControl`**: A remote peer's call is authorized by `AccessControl::check(peer_identity)` against the op's `AccessControl` — the same mechanism that gates every other call. No `remote_safe` flag, no `trusted_peer` bypass. An op with `AccessControl::default()` is callable by any peer; an op with `required_scopes` is callable only by peers whose `Identity.scopes` satisfy them; an op with `Visibility::Internal` is never callable from the wire. See ADR-029. 13. **Adapter trait lives with the types; implementations live with their transport**: `OperationAdapter` is in `alknet-call`; `from_call`/`from_jsonschema` are in `alknet-call` (QUIC / pure parse); `from_openapi`/`from_mcp`/`to_openapi`/`to_mcp` are in `alknet-http` (reqwest / axum). `alknet-call` stays lean — no HTTP client, no HTTP server. See [client-and-adapters.md](client-and-adapters.md). 14. **No handler reads outbound credentials from any source other than `OperationContext.capabilities`** (no-env-vars invariant): the credential injection path is vault → assembly layer → `Capabilities` → `HandlerRegistration.capabilities` → `OperationContext.capabilities` → handler. Downstream consumers' `std::env::var` reads are unreachable because the assembly layer never calls `Default::default()`. See ADR-014, [client-and-adapters.md](client-and-adapters.md). 
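Principle 12 can be sketched as a single wire-side gate that every peer passes through. This is an illustrative model, not the crate's API: `OpMeta`, `authorize_wire_call`, and the field layouts of `AccessControl` and `Identity` are assumptions, and scope matching is reduced to a plain set-subset test.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    External,
    Internal,
}

/// Hypothetical shape: an op's access policy reduced to its required scopes.
#[derive(Debug, Clone, Default)]
pub struct AccessControl {
    pub required_scopes: HashSet<String>,
}

/// Hypothetical shape: the calling peer's resolved identity (other fields elided).
pub struct Identity {
    pub scopes: HashSet<String>,
}

impl AccessControl {
    /// Every required scope must be held; the default (empty) policy admits any peer.
    pub fn check(&self, peer: &Identity) -> bool {
        self.required_scopes.is_subset(&peer.scopes)
    }
}

/// Hypothetical per-op metadata consulted for a call arriving from the wire.
pub struct OpMeta {
    pub visibility: Visibility,
    pub access: AccessControl,
}

/// One gate for every peer: no `remote_safe` flag, no `trusted_peer` bypass.
pub fn authorize_wire_call(op: &OpMeta, peer: &Identity) -> bool {
    // Internal ops are unreachable from the wire regardless of scopes.
    op.visibility == Visibility::External && op.access.check(peer)
}
```

Because there is no peer-specific flag in the path, authorizing a new peer is purely a matter of which scopes its `Identity` carries.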
\ No newline at end of file diff --git a/docs/architecture/crates/call/client-and-adapters.md b/docs/architecture/crates/call/client-and-adapters.md index 3f841bd..ef5da43 100644 --- a/docs/architecture/crates/call/client-and-adapters.md +++ b/docs/architecture/crates/call/client-and-adapters.md @@ -625,26 +625,30 @@ See [open-questions.md](../../open-questions.md) for full details. - **OQ-26** (resolved): `AdapterError` variants — `DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`, `SamePeerCollision` (replaces flat `Conflict`). `#[non_exhaustive]`. -- **OQ-27** (open, two-way): `from_call` re-import trigger — auto-on-reconnect - (v1 default, recorded here) vs explicit `CallConnection::refresh()`. v1 is - auto-on-reconnect; the explicit path is additive. The overlay is now - peer-scoped (drops with the connection), so re-import is naturally scoped. -- **OQ-28** (cross-peer dissolved by ADR-029 / same-peer stays): Cross-peer - collision dissolves — same name on different peers is fine (separate - sub-overlays). Same-peer collision stays an error. `namespace_prefix` is - optional local-naming sugar, not the disambiguation mechanism. -- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity - verification — v1 connects with `with_no_client_auth()` and - `AcceptAnyServerCertVerifier`. Wiring RawKey client-auth is additive. - Orthogonal to the routing model (ADR-029); `auth_token` flows through the - call-protocol payload, not TLS, so the no-env-vars invariant is unaffected. -- **OQ-30** (open, two-way): `PeerRef::Any` routing policy — v1 insertion-order - first-match; round-robin/least-loaded is the future extension (ADR-029 §2). -- **OQ-31** (open, two-way): `services/list-peers` re-export semantics — v1 - defaults to "own ops only"; `services/list-peers` is the opt-in (ADR-029 §6). 
-- **OQ-32** (open): Multi-hop federation — v1 is one-hop; the peer-keyed - overlay model extends to multi-hop without redesign; petgraph is the - candidate if path-finding becomes real (ADR-029 §3.7). +- **OQ-27** (resolved): `from_call` re-import trigger — auto-re-import on + connection establishment. `CallConnection::refresh()` is a feature + addition, not an unmade decision. +- **OQ-28** (resolved): `from_call` namespace collision — same-peer + collision = error; cross-peer dissolved by ADR-029 (separate sub-overlays). + `namespace_prefix` is optional local-naming sugar. +- **OQ-29** (open, **high priority, load-bearing on ADR-030**): `CallClient` + TLS client-auth — NOT "additive" as previously framed. ADR-030's + `PeerEntry` fingerprint → `peer_id` resolution requires the client to + present a TLS client cert; `with_no_client_auth()` means no fingerprint, + no `PeerEntry` resolution, no stable `peer_id`. The `auth_token` path + resolves to `Identity.id = ApiKeyEntry.prefix`, not `peer_id`. See OQ-29 + for the three options (wire client-auth with the migration / ship + token-only / extend PeerEntry to cover auth_token). Requires a decision + before the ADR-029 migration lands. +- **OQ-30** (resolved): `PeerRef::Any` routing policy — insertion-order + first-match. A richer `RoutingPolicy` is a feature extension. +- **OQ-31** (resolved): `services/list-peers` — opt-in; `services/list` + is "own ops only." +- **OQ-32** (open, feature extension): Multi-hop federation — the one-hop + model is the architectural commitment; multi-hop is a feature extension + that doesn't break downstream. The peer-keyed model extends to multi-hop + without redesign; petgraph is the candidate if path-finding becomes real + (ADR-029 §3.7). - **OQ-33** (resolved by ADR-030): `PeerId` is a logical id. Source is `Identity.id` from `IdentityProvider` resolution (= `PeerEntry.peer_id`, stable across key rotation), not a connection-assigned UUID. 
The UUID @@ -657,11 +661,10 @@ See [open-questions.md](../../open-questions.md) for full details. asymmetry between the fingerprint path (gets `PeerEntry` id-decoupling) and the API-key path (doesn't) is deliberate. See OQ-35 in open-questions.md. -- **OQ-36** (tracked by ADR-033): Concrete adapter shapes — the repo/adapter - pattern is committed (core trait + in-memory default; persistence adapters - are separate crates); the concrete adapter shapes (table schemas, backend - choice, indexing) are deferred for exploration. See OQ-36 in - open-questions.md. +- **OQ-36** (open, deferred for exploration): Concrete persistence adapter + shapes — the repo/adapter pattern is committed (ADR-033); the in-memory + adapters ship with core; the persistence adapter shapes (SQLite, etc.) + are deferred for exploration. See OQ-36 in open-questions.md. ## References diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index 95040a6..b613332 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -323,9 +323,10 @@ These open questions are the remainders from the call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`, DC-1..4) and the peer-graph routing research (`docs/research/alknet-call-peer-routing/findings.md`). ADR-029 supersedes ADR-028 and dissolves OQ-25 and the cross-peer half of -OQ-28; the remaining two-way-door shape/defaults are recorded in -[client-and-adapters.md](crates/call/client-and-adapters.md) and may be -revisited during implementation without a new ADR. +OQ-28. Most of the remaining OQs are now resolved (decisions made, defaults +recorded). OQ-29 is the exception — it's load-bearing on ADR-030 and +requires a decision before the ADR-029 migration lands. OQ-32 (multi-hop) +is a feature extension, not an unmade architecture decision. 
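A minimal sketch of the peer-keyed overlay model behind ADR-029, showing why cross-peer collisions dissolve, how the same-peer rule from OQ-28 fails loudly, and how insertion-order first-match from OQ-30 picks a peer for `PeerRef::Any`. `PeerOverlays`, `import`, `route`, and `disconnect` are hypothetical; `PeerRef` and `AdapterError::SamePeerCollision` reuse names from these docs, but their shapes here are assumptions, and operations are reduced to bare names.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins; only the variant needed here is modeled.
pub type PeerId = String;

#[derive(Debug, PartialEq, Eq)]
pub enum AdapterError {
    SamePeerCollision { peer: PeerId, op: String },
}

/// Routing target (shape assumed; only the two cases discussed here).
pub enum PeerRef<'a> {
    Peer(&'a str),
    Any,
}

/// One sub-overlay of op names per connected peer.
#[derive(Default)]
pub struct PeerOverlays {
    // A Vec, not a map, so iteration order is connection (insertion) order.
    peers: Vec<(PeerId, HashSet<String>)>,
}

impl PeerOverlays {
    /// Import a peer's discovered ops on connection establishment.
    pub fn import(&mut self, peer: &str, ops: &[&str]) -> Result<(), AdapterError> {
        let mut names = HashSet::new();
        for op in ops {
            // Same name twice from the same peer: fail loudly, never last-wins.
            if !names.insert(op.to_string()) {
                return Err(AdapterError::SamePeerCollision {
                    peer: peer.to_string(),
                    op: op.to_string(),
                });
            }
        }
        // Each peer owns its own sub-overlay, so names never collide across peers.
        let existing = self.peers.iter().position(|entry| entry.0 == peer);
        match existing {
            Some(i) => self.peers[i].1 = names, // re-import replaces in place
            None => self.peers.push((peer.to_string(), names)),
        }
        Ok(())
    }

    /// The sub-overlay drops with the connection.
    pub fn disconnect(&mut self, peer: &str) {
        self.peers.retain(|entry| entry.0 != peer);
    }

    /// Pick the peer that should serve `op`.
    pub fn route(&self, target: PeerRef<'_>, op: &str) -> Option<&str> {
        let found = match target {
            PeerRef::Peer(id) => self.peers.iter().find(|e| e.0 == id && e.1.contains(op)),
            // Insertion-order first-match: the earliest-connected peer serving `op`.
            PeerRef::Any => self.peers.iter().find(|e| e.1.contains(op)),
        };
        found.map(|e| e.0.as_str())
    }
}
```

Keeping the sub-overlays in a `Vec` is what makes first-match deterministic; in this model a richer `RoutingPolicy` would only replace the `find` in the `Any` arm.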
### OQ-25: ~~Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering~~ (Dissolved by ADR-029) @@ -373,122 +374,170 @@ revisited during implementation without a new ADR. ### OQ-27: from_call Re-Import Trigger - **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 Assumption 4 -- **Status**: open +- **Status**: **resolved** (2026-06-27) - **Door type**: Two-way - **Priority**: low -- **Resolution**: ADR-017 Assumption 4 noted re-import "happens on - reconnection or is triggered explicitly." The v1 default is - **auto-re-import on connection establishment**. The overlay is - per-connection (Layer 2, ADR-024), so a stale overlay dies with the - connection; re-import on reconnect is naturally scoped to the new - connection. This is the right default for the runner pattern (a worker - reconnects → the hub re-discovers the worker's ops automatically). - Explicit re-import via a future `CallConnection::refresh()` method is - additive and can be added if a deployment needs manual control. Reversal - is cheap; no ADR needed. +- **Resolution**: The decision is **auto-re-import on connection + establishment**. The overlay is per-connection (Layer 2, ADR-024), so a + stale overlay dies with the connection; re-import on reconnect is + naturally scoped to the new connection. This is the right default for the + runner pattern (a worker reconnects → the hub re-discovers the worker's + ops automatically). An explicit `CallConnection::refresh()` method, for + deployments that need manual control, is a genuine feature addition + (non-breaking and additive).
- **Cross-references**: ADR-017, ADR-024, [client-and-adapters.md](crates/call/client-and-adapters.md) ### OQ-28: from_call Namespace Collision Behavior - **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §3 -- **Status**: open +- **Status**: **resolved** (2026-06-27) - **Door type**: Two-way - **Priority**: low - **Resolution**: ADR-017 §3's `FromCallConfig` namespace prefix is - **optional, default no prefix, collision = error**. A node importing from - two remotes that both expose `/container/exec` without prefixes should fail - loudly rather than silently overwrite. The operator adds prefixes when they - know they're importing from multiple sources. This matches the - default-deny, explicit-allow posture (ADR-015, ADR-028). Reversal is cheap; - no ADR needed. The alternative (last-wins) would silently mask one - remote's op behind another's, which is the kind of surprise the + **optional, default no prefix, same-peer collision = error**. A node + importing from a peer that exposes two ops with the same name should fail + loudly rather than silently overwrite. This matches the default-deny, + explicit-allow posture (ADR-015). The alternative (last-wins) would + silently mask one op behind another, which is the kind of surprise the default-deny posture exists to avoid. - **Cross-peer collision dissolved by ADR-029.** Under the peer-keyed overlay - model, same name on different peers is fine — they live in separate - peer sub-overlays, no collision, no prefix needed. The collision rule now - stays only *within* a peer (same name on the same peer is still an error — - a peer shouldn't expose two ops with the same name). `FromCallConfig::namespace_prefix` - becomes optional local-naming sugar, not the disambiguation mechanism. See - ADR-029 §5. 
+ **Cross-peer collision dissolved by ADR-029.** Under the peer-keyed + overlay model, same name on different peers is fine — they live in + separate peer sub-overlays, no collision, no prefix needed. + `FromCallConfig::namespace_prefix` is optional local-naming sugar for + when the importing node wants to expose a peer's ops under a different + name *locally* — a local-naming concern, not a disambiguation concern. + See ADR-029 §5. - **Cross-references**: ADR-015, ADR-017, ~~ADR-028~~ (superseded), ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md) ### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification - **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §7 -- **Status**: open -- **Door type**: Two-way -- **Priority**: medium -- **Resolution**: v1 `CallClient::connect()` builds the quinn client config - with `with_no_client_auth()` and an `AcceptAnyServerCertVerifier` — the - client does not present its TLS identity (`credentials.tls_identity`) as a - client cert, and does not pin the remote's expected identity from - `credentials.remote_identity`. The server-side - `AcceptAnyCertVerifier` (in alknet-core's endpoint) does not require or - verify client certs, so a client cert is not needed to establish a - connection in v1. Wiring the local node's RawKey/X509 identity as a rustls - client-auth cert (for servers that *do* verify client identity) and - plugging `credentials.remote_identity` into a real `ServerCertVerifier` is - additive — a two-way-door remainder surfaced during implementation. - **The one-way constraint (credentials from `Capabilities`, not env vars, - ADR-014) is unaffected**: the `auth_token` dimension flows through the - call-protocol `auth_token` payload field, not TLS, so the no-env-vars - invariant holds independently of this gap. Decided during a future task that - wires RawKey client-auth; recorded here, not in a full ADR. 
-- **Cross-references**: ADR-014, ADR-017, ADR-027, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md) +- **Status**: **open — load-bearing on ADR-030** (not "additive" as previously framed) +- **Door type**: One-way (identity model interaction), two-way (mechanism) +- **Priority**: **high** (was medium; promoted — this is the activation path + for ADR-030's `PeerEntry` fingerprint → `peer_id` resolution) +- **Resolution**: **Previously framed as "additive — two-way-door + remainder." That framing is incorrect.** ADR-030 makes `PeerId = + Identity.id = PeerEntry.peer_id` on the fingerprint path. But the + fingerprint path requires the client to present a TLS client cert, and + the current `CallClient::connect()` uses `with_no_client_auth()` — no + client cert is presented, no fingerprint is extracted by the server's + `AcceptAnyCertVerifier`, and `IdentityProvider::resolve_from_fingerprint` + returns `None`. The peer gets no `PeerId` from the fingerprint path. + + The `auth_token` path (`resolve_from_token`) still works, but it + resolves to `Identity.id = ApiKeyEntry.prefix` (the API-key identity + path), **not** to `PeerEntry.peer_id`. So with TLS client-auth unwired, + a calling peer's `PeerId` is either `None` (no client cert) or an + API-key prefix (if an `auth_token` is used) — neither is the stable + `PeerEntry.peer_id` that ADR-030 commits. The PeerEntry path is dormant + until client-auth is wired. + + This is not a "two-way-door remainder" — it's the activation path for + ADR-030's primary use case (stable `peer_id` across key rotation for + peer-keyed overlays). The decision to make is: + + - **(a)** Wire TLS client-auth as part of the ADR-029 migration, so the + fingerprint → `PeerEntry` → `peer_id` path is live from day one. The + server's `AcceptAnyCertVerifier` already requests (but doesn't verify) + client certs; the client's `with_no_client_auth()` is the gap. 
Wiring + the local node's `RawKey`/`X509` identity as a rustls client-auth cert + is the missing piece. Remote-identity verification (plugging + `credentials.remote_identity` into a real `ServerCertVerifier`) is + genuinely additive — the server-side fingerprint extraction is what + matters for `PeerId`, not the client-side verification of the server. + + - **(b)** Ship the ADR-029 migration with `auth_token`-only peer identity + and treat TLS client-auth as a follow-up. This means `PeerCompositeEnv` + keys on `Identity.id = ApiKeyEntry.prefix` (the token prefix) until + client-auth is wired, then switches to `PeerEntry.peer_id` when it is. + The switch is a behavior change for any deployment that built on the + token-prefix identity — the `PeerId` changes from the prefix to the + `peer_id`. This is the "compounds into a mess" path. + + - **(c)** Extend `PeerEntry` to also cover `auth_token`-based peer + identity — a peer entry keyed by token prefix (or a `PeerEntry.token` + field) instead of (or alongside) fingerprint. This unifies the two + identity paths under `PeerEntry`, so the `PeerId` is stable regardless + of which credential path the peer used. This is a design change to + ADR-030, not just an implementation choice. + + **The X.509 / raw-key wrinkle:** the vast majority of end users will use + Ed25519 raw keys (RFC 7250) — the same key type as SSH keys, native to + iroh's `NodeId` model. The fingerprint format for raw keys is + `ed25519:`. For X.509 (public-facing endpoints like + `api.alk.dev`, relays), the fingerprint is `SHA256:` — a + different format, a different key type, but the same `PeerEntry.fingerprint` + field. The `IdentityProvider::resolve_from_fingerprint` path is + format-agnostic (it's a string match against `PeerEntry.fingerprint`), + so both key types work once client-auth is wired. The wrinkle is on the + client side: presenting an Ed25519 raw key as a TLS client cert uses a + different rustls path than presenting an X.509 cert. 
Both are supported + by rustls; the `CallCredentials.tls_identity` field already carries the + `TlsIdentity` enum (RawKey / X509). The wiring is per-variant. + + **Not decided yet.** This OQ is promoted to high priority and requires a + decision before the ADR-029 migration lands. The previous "additive, + two-way-door remainder" framing is struck. +- **Cross-references**: ADR-014, ADR-017, ADR-027, ADR-029, ADR-030, + [client-and-adapters.md](crates/call/client-and-adapters.md), + [endpoint.md](crates/core/endpoint.md), [auth.md](crates/core/auth.md) ### OQ-30: PeerRef::Any Routing Policy - **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §2, [client-and-adapters.md](crates/call/client-and-adapters.md), `docs/research/alknet-call-peer-routing/findings.md` §3.2 -- **Status**: open +- **Status**: **resolved** (2026-06-27) - **Door type**: Two-way - **Priority**: low -- **Resolution**: v1 `PeerRef::Any` uses insertion-order first-match — - deterministic but order-dependent (worker A connects before worker B → `Any` - routes to A until A disconnects). This is the simplest routing policy and is - correct for the immediate use case (the head picks the first worker that - serves the op). A richer `RoutingPolicy` (round-robin, least-loaded, - affinity) is the two-way-door remainder; the `PeerRef` enum is designed to - compose with a `Route { selector, policy }` struct without breaking the - `invoke_peer` signature. Decided during implementation when a fan-out use - case needs it; recorded here, not in a full ADR. +- **Resolution**: `PeerRef::Any` uses **insertion-order first-match** — + deterministic but order-dependent (worker A connects before worker B → + `Any` routes to A until A disconnects). This is the simplest routing + policy and is correct for the immediate use case (the head picks the + first worker that serves the op). 
A richer `RoutingPolicy` (round-robin, + least-loaded, affinity) is a feature extension — the `PeerRef` enum is + designed to compose with a `Route { selector, policy }` struct without + breaking the `invoke_peer` signature. Adding a routing policy is + non-breaking; it's a feature addition when a fan-out use case needs it, + not an unmade architectural decision. - **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md) ### OQ-31: services/list-peers Re-Export Semantics - **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §6, `docs/research/alknet-call-peer-routing/findings.md` §3.5 -- **Status**: open +- **Status**: **resolved** (2026-06-27) - **Door type**: Two-way - **Priority**: low -- **Resolution**: v1 defaults to "own ops only" — `services/list` shows the - head's own Layer 0 `External` ops, filtered by `AccessControl::check(calling_peer)`, - unchanged from today (minus the `remote_safe` filter). A `services/list-peers` - opt-in (new built-in operation) lists the peer overlays with attribution: - each peer's sub-overlay listed as `{ peer: Option, operations: [...] }`, - filtered by the calling peer's authorization. Whether re-exported peer ops - are listed by default, opt-in, or per-peer-policy is the two-way-door - remainder; v1 is opt-in (`services/list-peers`). The re-export policy is an - `AccessControl` decision on the listing op. Decided during implementation - when a consumer needs peer-attributed discovery; recorded here, not in a - full ADR. +- **Resolution**: `services/list` defaults to **"own ops only"** — it shows + the head's own Layer 0 `External` ops, filtered by + `AccessControl::check(calling_peer)`, unchanged from today (minus the + retired `remote_safe` filter). A `services/list-peers` opt-in (new + built-in operation) lists the peer overlays with attribution: each + peer's sub-overlay listed as `{ peer: Option, operations: [...] }`, + filtered by the calling peer's authorization. 
The re-export policy is an + `AccessControl` decision on the listing op. Whether `services/list-peers` + is built now or as a feature addition is a scheduling question — the + decision (opt-in, `AccessControl`-filtered) is made. - **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md) ### OQ-32: Multi-Hop Federation - **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §3.7, `docs/research/alknet-call-peer-routing/findings.md` §3.7 -- **Status**: open +- **Status**: open (feature extension, not an unmade architecture decision) - **Door type**: One-way (federation model), two-way (mechanism) - **Priority**: low -- **Resolution**: v1 is one-hop — worker A does not transitively see worker - B's ops through the head unless the head explicitly re-exports them. The - peer-keyed overlay model extends to multi-hop without redesign (a chain of - `PeerRef::Specific` routing decisions), but path-finding (which peer reaches - which op transitively) is where a graph library (petgraph) would pay off. - For v1 (one hop, shallow), a nested `HashMap>` - suffices. Whether multi-hop federation becomes a real use case is a future - decision; the peer-keyed model does not foreclose it. Not designed; tracked - here so the v1 model's extendability is recorded. +- **Resolution**: The model is **one-hop** — worker A does not transitively + see worker B's ops through the head unless the head explicitly re-exports + them. The peer-keyed overlay model extends to multi-hop without redesign + (a chain of `PeerRef::Specific` routing decisions), but path-finding + (which peer reaches which op transitively) is where a graph library + (petgraph) would pay off. For one-hop (shallow), a nested + `HashMap>` suffices. Multi-hop federation is + a feature extension — the one-hop model is the architectural commitment; + extending to multi-hop doesn't break downstream crates. 
Whether multi-hop + becomes a real use case is a future decision; the peer-keyed model does + not foreclose it. - **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md) ### OQ-33: PeerId — Cryptographic Identity vs Stable Logical Identifier @@ -591,7 +640,7 @@ revisited during implementation without a new ADR. - **Cross-references**: ADR-030, [auth.md](crates/core/auth.md), [config.md](crates/core/config.md) -### OQ-36: Concrete Adapter Shapes (Deferred for Exploration) +### OQ-36: Concrete Persistence Adapter Shapes (Deferred for Exploration) - **Origin**: ADR-033 §"What this does NOT do" (concrete adapter shapes not specified), the project's note that the repo pattern is a tool to reach @@ -599,23 +648,35 @@ revisited during implementation without a new ADR. - **Status**: open (deferred for exploration) - **Door type**: Two-way (adapter shapes are implementation details; the trait shapes are the one-way doors, already committed by ADR-030/031/033) -- **Priority**: low (becomes real when a persistence use case forces a - concrete adapter build) +- **Priority**: medium (must be addressed before the next round of + implementation; not blocking the current OQ-29 decision) - **Resolution**: The repo/adapter pattern is committed (ADR-033): core defines repo traits + in-memory default adapters; persistence adapters - are separate crates; the assembly layer wires the adapter. The - **concrete adapter shapes** — table schemas, backend choice (SQLite + - honker vs. a key-value store vs. a remote service), indexing, caching, - connection management — are deferred for exploration. + are separate crates; the assembly layer wires the adapter. - The project is iterating on adapter simplification. The trait shapes - (`IdentityProvider`, `CredentialStore`) are the commitment; the adapter - shapes are not. 
When a concrete use case (peer identity persistence - across restarts, credential persistence across restarts, ACL delegation - graph) forces a persistence adapter build, the adapter shape gets - reasoned through then, not speculatively now. + **What ships with core** (not deferred): the repo traits + (`IdentityProvider`, `CredentialStore`) and their in-memory default + adapters (`ConfigIdentityProvider`, `InMemoryCredentialStore`). The traits are + the one-way-door commitments; traits and in-memory adapters both ship + with the core crate, not as separate adapter crates. The in-memory + adapters are real implementations, not + stubs — a full repo pattern (the same trait surface a persistence + adapter would implement), just backed by config / `HashMap` instead of + a database. + + **What's deferred**: the concrete *persistence* adapter shapes — table + schemas, backend choice (SQLite + honker vs. a key-value store vs. a + remote service), indexing, caching, connection management. These are the + separate-crate adapters (e.g., `alknet-peer-store-sqlite`, + `alknet-credential-store-sqlite`) that implement the core traits against + a specific backend. The project is iterating on adapter simplification; + the trait shapes are the commitment; the persistence adapter shapes are + not. When a concrete use case (peer identity persistence across + restarts, credential persistence across restarts, ACL delegation graph) + forces a persistence adapter build, the adapter shape gets reasoned + through then. This OQ exists so the deferral is deliberate, not accidental — the - pattern is committed, the adapters are not, and the gap is tracked. + pattern is committed, the in-memory adapters ship with core, and the + persistence adapter shapes are the open exploration. - **Cross-references**: ADR-030, ADR-031, ADR-033, OQ-34, [auth.md](crates/core/auth.md), [config.md](crates/core/config.md) \ No newline at end of file
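The identity gap described in OQ-29 can be modeled in a few lines of Rust. This is a minimal sketch, not the alknet-core API: `ToyIdentityProvider`, `PresentedCredentials`, `ResolvedId`, `example_provider`, and the example fingerprint and token strings are hypothetical stand-ins for the `PeerEntry` and `ApiKeyEntry` lookups. It also assumes a fingerprint-first, token-second lookup order, which the real `IdentityProvider` may not use.

```rust
use std::collections::HashMap;

/// Hypothetical model of what a calling peer presented on connect.
#[derive(Debug, Clone)]
struct PresentedCredentials {
    /// Fingerprint of the TLS client cert, if one was presented.
    /// With `with_no_client_auth()` this is always `None`.
    client_cert_fingerprint: Option<String>,
    /// Call-protocol `auth_token`, if any.
    auth_token: Option<String>,
}

/// Which identity path produced the caller's id (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ResolvedId {
    /// Stable id, standing in for `PeerEntry.peer_id` (fingerprint path).
    Peer(String),
    /// API-key id, standing in for `ApiKeyEntry.prefix` (token path).
    ApiKeyPrefix(String),
}

/// Toy provider: two maps stand in for peer and API-key records.
struct ToyIdentityProvider {
    peers_by_fingerprint: HashMap<String, String>,
    api_key_prefix_by_token: HashMap<String, String>,
}

impl ToyIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<ResolvedId> {
        self.peers_by_fingerprint
            .get(fingerprint)
            .cloned()
            .map(ResolvedId::Peer)
    }

    fn resolve_from_token(&self, token: &str) -> Option<ResolvedId> {
        self.api_key_prefix_by_token
            .get(token)
            .cloned()
            .map(ResolvedId::ApiKeyPrefix)
    }

    /// Assumed order: try the fingerprint path, fall back to the token path.
    fn resolve(&self, creds: &PresentedCredentials) -> Option<ResolvedId> {
        creds
            .client_cert_fingerprint
            .as_deref()
            .and_then(|fp| self.resolve_from_fingerprint(fp))
            .or_else(|| {
                creds
                    .auth_token
                    .as_deref()
                    .and_then(|token| self.resolve_from_token(token))
            })
    }
}

/// One known peer and one known API key, both with made-up values.
fn example_provider() -> ToyIdentityProvider {
    ToyIdentityProvider {
        peers_by_fingerprint: HashMap::from([(
            "ed25519:EXAMPLE".to_string(),
            "worker-a".to_string(),
        )]),
        api_key_prefix_by_token: HashMap::from([(
            "example-token".to_string(),
            "example".to_string(),
        )]),
    }
}

fn main() {
    let provider = example_provider();
    let cases = [
        ("no cert, no token", PresentedCredentials {
            client_cert_fingerprint: None,
            auth_token: None,
        }),
        ("no cert, token", PresentedCredentials {
            client_cert_fingerprint: None,
            auth_token: Some("example-token".to_string()),
        }),
        ("cert + token", PresentedCredentials {
            client_cert_fingerprint: Some("ed25519:EXAMPLE".to_string()),
            auth_token: Some("example-token".to_string()),
        }),
    ];
    for (label, creds) in &cases {
        println!("{label}: {:?}", provider.resolve(creds));
    }
}
```

Only the case carrying a client-cert fingerprint resolves to a `Peer` id. With the client fingerprint always `None`, the best a caller can get is the API-key prefix, which is the dormant-`PeerEntry` situation options (a) to (c) address; option (c) would roughly correspond to letting the token path also yield a `Peer` id.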