diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index ef08004..8703f90 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -9,9 +9,9 @@ last_updated: 2026-06-26
 **Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation complete and verified, local-only by construction per ADR-025, HD-derivation key model per ADR-026) and research/reference material. Foundational ADRs (001–028) are in place. ADR-024 resolves the registry mutability question and the `OperationContext.env` type identity crisis by layering the registry by trust boundary. ADR-025 drops irpc from the vault, making it local-only by construction. ADR-026 records the HD-derivation key model as a foundational decision. Review #003 (type/API surface completeness) resolved: `DerivedKey` derive contradiction, `encrypt` prose, return-type divergence, RwLock contradiction, drift table gaps, ADR-022 stale sketches, `Capabilities`/`SessionOverlaySource`/`CallConnection`/`CachedKey` definitions, `CompositeOperationEnv` dispatch contract, `with_local` signature, payload schemas, timeout propagation, and request ID generation. The alknet-core and alknet-call crate specs are in draft; the alknet-vault crate specs are stable.
-The alknet-call server-side core (`CallAdapter`, `CallConnection` dispatch loop, wire framing, pending map, abort cascade, operation registry, service discovery) is implemented and tested (159 tests passing). The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`) identified the missing client/adapter surface specced in ADR-017 — `CallClient`, `from_call`, `from_jsonschema`, the `OperationAdapter` trait — plus four decisions (DC-1..4) needed before implementation. DC-1 (the one-way door: peer-scoped registry filtering) is resolved by ADR-028; DC-2/3/4 are two-way-door defaults recorded in `client-and-adapters.md` and tracked as OQ-25..28. The client/adapter surface is specced (`crates/call/client-and-adapters.md`); implementation is pending.
+The alknet-call crate is **implemented and reviewed** — both the server-side core and the client/adapter surface. The server-side core (`CallAdapter`, `CallConnection` dispatch loop, wire framing, pending map, abort cascade, operation registry, service discovery) and the client/adapter surface (`CallClient`, `from_call`, `from_jsonschema`, `OperationAdapter` trait, shared `Dispatcher`) are implemented and tested (207 lib + 2 integration tests passing). The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`) identified the missing client/adapter surface specced in ADR-017 plus four decisions (DC-1..4); all are resolved — DC-1 by ADR-028 (peer-scoped default-deny filtering), DC-2/3/4 as two-way-door defaults in `client-and-adapters.md` (OQ-25..28). A post-implementation review (`tasks/call/review-completion.md`) confirmed spec conformance; the one spec-sketch omission (`connect()`'s `ClientError` return type) was the intended illustrative-sketch gap, now filled in. A TLS client-auth gap surfaced during implementation is tracked as OQ-29.
-**Next step**: Implementation of the alknet-call client/adapter surface (priority order in `client-and-adapters.md`): `CallClient` → `from_call` → `OperationAdapter` trait → `from_jsonschema`. All one-way doors are resolved; remaining open questions (OQ-25..28) are two-way-door shape/defaults decided during implementation.
+**Next step**: The alknet-call crate is ready for downstream consumers (alknet-http's `OperationAdapter` implementations, the container-service/runner pattern, alknet-agent, alknet-napi). The remaining open questions (OQ-25..29) are all two-way-door shape/defaults, not blockers. The next crate phase is alknet-http (Phase 0 findings in `docs/research/alknet-http/`).
 ## Architecture Documents
@@ -102,6 +102,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 - **OQ-26**: `OperationAdapter` error type — `import()` returns `Result<_, AdapterError>`; variants decided in implementation
 - **OQ-27**: `from_call` re-import trigger — v1 default auto-on-reconnect; explicit `refresh()` additive
 - **OQ-28**: `from_call` namespace collision — v1 default error-on-collision (no prefix by default)
+- **OQ-29**: `CallClient` TLS client-auth + remote-identity verification — v1 connects with `with_no_client_auth()` and `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive (the no-env-vars invariant is unaffected — `auth_token` flows through the call-protocol payload, not TLS)
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md
index be6a020..8b38677 100644
--- a/docs/architecture/crates/call/README.md
+++ b/docs/architecture/crates/call/README.md
@@ -53,6 +53,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | OQ-26 | OperationAdapter error type (AdapterError variants) | open (two-way) | `import()` returns `Result<_, AdapterError>`; variants decided in implementation |
 | OQ-27 | from_call re-import trigger | open (two-way) | v1 default: auto-on-reconnect; explicit `refresh()` is additive |
 | OQ-28 | from_call namespace collision behavior | open (two-way) | v1 default: error on collision (no prefix by default) |
+| OQ-29 | CallClient TLS client-auth and remote-identity verification | open (two-way) | v1 connects with `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth and a real `ServerCertVerifier` is additive (no-env-vars invariant unaffected — `auth_token` flows via call-protocol payload, not TLS) |
 ## Key Design Principles
diff --git a/docs/architecture/crates/call/call-protocol.md b/docs/architecture/crates/call/call-protocol.md
index ac0faed..0539509 100644
--- a/docs/architecture/crates/call/call-protocol.md
+++ b/docs/architecture/crates/call/call-protocol.md
@@ -164,6 +164,15 @@ The adapter:
 5. Writes response `EventEnvelope` frames back to the appropriate stream
 6. Manages the `PendingRequestMap` for outgoing calls
+The dispatch loop is **shared** with `CallClient` (ADR-017 §1): both
+`CallAdapter::handle` (accept path) and `CallClient::connect` (connect path)
+construct a `Dispatcher` (`protocol/dispatch.rs`) and call `run_loop` — the
+dispatch half is one implementation, the connection-establishment half differs
+(accept vs dial). The `Dispatcher` carries a `RemoteFilter` (ADR-028) that
+gates dispatch by `remote_safe`; the accept path uses `RemoteFilter::trusted()`
+by convention. See [client-and-adapters.md](client-and-adapters.md) for the
+`Dispatcher`/`RemoteFilter` mechanism.
+
 ### Stream Model
 See ADR-012 for the full rationale.
@@ -538,6 +547,7 @@ See [open-questions.md](../../open-questions.md) for full details.
 - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
 - **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait.
 - **OQ-25..28** (open, two-way): Call-completion remainders — `CallClient` remote-safe marking shape, `OperationAdapter` error type, `from_call` re-import trigger, `from_call` namespace collision. The `CallClient`/adapter surface itself is specced in [client-and-adapters.md](client-and-adapters.md); the one-way door among these (existence of default-deny filtering) is resolved by ADR-028.
+- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity verification — v1 connects with `with_no_client_auth()` and `AcceptAnyServerCertVerifier`; wiring RawKey client-auth and a real `ServerCertVerifier` is additive. See [client-and-adapters.md](client-and-adapters.md).
 ## References
diff --git a/docs/architecture/crates/call/client-and-adapters.md b/docs/architecture/crates/call/client-and-adapters.md
index 5590c31..e97938b 100644
--- a/docs/architecture/crates/call/client-and-adapters.md
+++ b/docs/architecture/crates/call/client-and-adapters.md
@@ -92,15 +92,9 @@ pub struct CallClient {
 }
 impl CallClient {
-    /// Open a QUIC connection to `addr` on ALPN `alknet/call`, perform
-    /// credential handshake, and return a CallConnection running the shared
-    /// dispatch loop. Credentials come from capabilities (ADR-014), not env
-    /// vars — see "No-Env-Vars Invariant" below.
-    pub async fn connect(
-        &self,
-        addr: SocketAddr,
-        credentials: CallCredentials,
-    ) -> Result;
+    /// Default-deny mode: only `remote_safe: true` ops dispatch/list to the
+    /// remote peer (ADR-028).
+    pub fn new(registry: Arc, idp: Arc) -> Self;
     /// Trusted-peer mode: construct a CallClient that exposes all External
     /// ops from `registry` to the remote peer, ignoring the remote-safe
@@ -109,6 +103,18 @@ impl CallClient {
         registry: Arc,
         identity_provider: Arc,
     ) -> Self;
+
+    /// Open a QUIC connection to `addr` on ALPN `alknet/call`, perform
+    /// credential handshake, and return a CallConnection running the shared
+    /// dispatch loop. Credentials come from capabilities (ADR-014), not env
+    /// vars — see "No-Env-Vars Invariant" below. The dispatch loop runs on a
+    /// spawned task; the returned `CallConnection` is live until the remote
+    /// closes the connection or the caller drops it.
+    pub async fn connect(
+        &self,
+        addr: SocketAddr,
+        credentials: CallCredentials,
+    ) -> Result;
 }
 ```
@@ -127,6 +133,63 @@ both a caller and a callee — it dispatches incoming calls from the remote peer
 against its peer-scoped registry view, and it initiates outgoing calls through
 the `CallConnection::call()` / `subscribe()` / `abort()` API.
+#### Shared Dispatcher
+
+The shared dispatch loop lives in `protocol/dispatch.rs` as the `Dispatcher`
+struct. This is the architectural mechanism that keeps `CallClient` from
+becoming a parallel protocol implementation (ADR-017 §1): both `CallAdapter`'s
+accept path and `CallClient`'s connect path construct a `Dispatcher` and call
+`run_loop` — the dispatch half is one implementation, the
+connection-establishment half differs (accept vs dial).
+
+```rust
+/// Peer-scoped registry filter state (ADR-028). `trusted_peer: false`
+/// (default-deny for a CallClient) hides ops whose
+/// `HandlerRegistration.remote_safe` is false from both dispatch and
+/// `services/list`. `trusted_peer: true` (explicit opt-in, also used by the
+/// CallAdapter's local accept path) bypasses the filter.
+pub struct RemoteFilter { pub trusted_peer: bool }
+
+/// Shared dispatcher for an established CallConnection. Constructed by both
+/// CallAdapter (accept path) and CallClient (connect path). Holds no
+/// per-connection state; the CallConnection is passed into run_loop.
+pub struct Dispatcher {
+    pub registry: Arc,
+    pub identity_provider: Arc,
+    pub session_source: Option>,
+    pub default_timeout: Duration,
+    pub remote_filter: RemoteFilter,
+}
+```
+
+The `remote_filter` is the dispatch-time gate that enforces ADR-028's
+default-deny: `dispatch_requested` checks `remote_filter.allows(registration.remote_safe)`
+**before** building the context or invoking the handler — a non-remote-safe op
+returns `NOT_FOUND` before any capability material reaches the handler (the
+security argument for default-deny, ADR-028 Context). The accept path
+(`CallAdapter`) uses `RemoteFilter::trusted()` by convention — a direct QUIC
+client is not a filtered `CallClient` peer in the ADR-028 sense.
+
+`CallClient::spawn_dispatch(connection)` is the lower-level API that takes a
+pre-established `Connection`, constructs a `CallConnection`, builds a
+`Dispatcher` with the appropriate `RemoteFilter`, spawns the dispatch task,
+and returns the live `CallConnection`. `connect()` uses it after the QUIC dial
+completes; tests use it to wire mock/loopback connections directly.
+
+#### services/list peer-scoped serving
+
+The `services/list` hide behavior (ADR-028 Assumption 2) is wired via a
+separate handler factory: `services_list_handler_peer_scoped(registry,
+trusted_peer)` in `registry/discovery.rs`, backed by
+`OperationRegistry::list_operations_peer_scoped(trusted_peer)`. The assembly
+layer constructs the `CallClient`'s registry with this peer-scoped handler
+(not the plain `services_list_handler` used by the `CallAdapter`'s local
+accept path) so that when the remote peer calls `services/list` on the
+`CallClient`, the response hides non-remote-safe ops in default-deny mode.
+The dispatch-path `RemoteFilter` (above) and the `services/list`-handler
+filter are the two halves of the same default-deny posture — discovery and
+dispatch filters agree.
+
 ### Credential sources for connections
 `CallClient::connect()` takes a `CallCredentials` bundle. Credentials come
@@ -139,6 +202,14 @@ pub struct CallCredentials {
     pub auth_token: Option, // call-protocol-level token
     pub remote_identity: Option, // expected fingerprint/cert
 }
+
+/// Expected identity of the remote node (ADR-017 §7). v1 carries a
+/// fingerprint string the assembly layer derives from `Capabilities`.
+pub struct RemoteIdentity { pub fingerprint: String }
+
+/// Errors produced by `CallClient::connect`.
+#[non_exhaustive]
+pub enum ClientError { Transport { .. }, TlsSetup { .. }, ConnectionClosed }
 ```
 - **TLS identity** — the local node's Ed25519 raw key (RFC 7250) or X.509 cert,
@@ -154,6 +225,18 @@ invariant (below). The concrete shapes of `TlsIdentity`, `AuthToken`, and
 `RemoteIdentity` are implementation-detail two-way doors; the one-way
 constraints are that they come from `Capabilities`, not env vars (ADR-014).
+**v1 TLS client-auth gap** (OQ-29): v1 `connect()` builds the quinn client
+config with `with_no_client_auth()` and an `AcceptAnyServerCertVerifier` — the
+client does not present its TLS identity as a client cert, and does not pin the
+remote's expected identity from `credentials.remote_identity`. This is a
+two-way-door remainder: wiring the local node's RawKey/X509 identity as a
+rustls client-auth cert (for servers that verify client identity) and
+plugging `credentials.remote_identity` into a real `ServerCertVerifier` is
+additive. The one-way constraint (credentials from `Capabilities`, not env
+vars, ADR-014) is unaffected — the `auth_token` dimension flows through the
+call-protocol `auth_token` payload field, not TLS, so the no-env-vars
+invariant holds independently of this gap.
+
 ### from_call
 `from_call` discovers the remote peer's `External` operations and registers
@@ -511,6 +594,14 @@ See [open-questions.md](../../open-questions.md) for full details.
   auto-on-reconnect; the explicit path is additive.
 - **OQ-28** (open, two-way): `from_call` namespace collision behavior — error on collision (v1 default, recorded here) vs last-wins.
+- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity
+  verification — v1 connects with `with_no_client_auth()` and
+  `AcceptAnyServerCertVerifier` (does not present a client cert, does not pin
+  the remote's expected identity from `credentials.remote_identity`). Wiring
+  the local node's RawKey/X509 identity as a rustls client-auth cert and
+  plugging `remote_identity` into a real `ServerCertVerifier` is additive.
+  The one-way constraint (credentials from `Capabilities`, ADR-014) is
+  unaffected — `auth_token` flows through the call-protocol payload, not TLS.
 ## References
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
index b9176db..ae5e5a3 100644
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -408,4 +408,28 @@ revisited during implementation without a new ADR.
   no ADR needed. The alternative (last-wins) would silently mask one remote's op behind another's, which is the kind of surprise the default-deny posture exists to avoid.
-- **Cross-references**: ADR-015, ADR-017, ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md)
\ No newline at end of file
+- **Cross-references**: ADR-015, ADR-017, ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md)
+
+### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification
+
+- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §7
+- **Status**: open
+- **Door type**: Two-way
+- **Priority**: medium
+- **Resolution**: v1 `CallClient::connect()` builds the quinn client config
+  with `with_no_client_auth()` and an `AcceptAnyServerCertVerifier` — the
+  client does not present its TLS identity (`credentials.tls_identity`) as a
+  client cert, and does not pin the remote's expected identity from
+  `credentials.remote_identity`. The server-side
+  `AcceptAnyCertVerifier` (in alknet-core's endpoint) does not require or
+  verify client certs, so a client cert is not needed to establish a
+  connection in v1. Wiring the local node's RawKey/X509 identity as a rustls
+  client-auth cert (for servers that *do* verify client identity) and
+  plugging `credentials.remote_identity` into a real `ServerCertVerifier` is
+  additive — a two-way-door remainder surfaced during implementation.
+  **The one-way constraint (credentials from `Capabilities`, not env vars,
+  ADR-014) is unaffected**: the `auth_token` dimension flows through the
+  call-protocol `auth_token` payload field, not TLS, so the no-env-vars
+  invariant holds independently of this gap. Decided during a future task that
+  wires RawKey client-auth; recorded here, not in a full ADR.
+- **Cross-references**: ADR-014, ADR-017, ADR-027, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md)
\ No newline at end of file
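
The default-deny gate described in the `client-and-adapters.md` hunk (filter check before any handler runs, `NOT_FOUND` for a filtered op) can be sketched roughly as below. This is a minimal illustration, not the crate's code: `RemoteFilter`, `trusted()`, `allows`, and `dispatch_requested` are names from the diff, but the body of `allows` is an assumption, and `default_deny()`, `Registration`, `Outcome`, the `HashMap` registry, and the simplified `dispatch_requested` signature are hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// Mirrors the `RemoteFilter` shape shown in the diff (ADR-028).
#[derive(Clone, Copy, Debug)]
pub struct RemoteFilter {
    pub trusted_peer: bool,
}

impl RemoteFilter {
    /// Trusted-peer mode (constructor named in the diff): bypasses the filter.
    pub fn trusted() -> Self {
        RemoteFilter { trusted_peer: true }
    }

    /// Hypothetical constructor name for the default-deny mode.
    pub fn default_deny() -> Self {
        RemoteFilter { trusted_peer: false }
    }

    /// Assumed body: an op passes if the peer is trusted or the op is remote-safe.
    pub fn allows(&self, remote_safe: bool) -> bool {
        self.trusted_peer || remote_safe
    }
}

/// Hypothetical stand-in for `HandlerRegistration`; only the flag matters here.
pub struct Registration {
    pub remote_safe: bool,
}

/// Hypothetical outcome type standing in for a dispatched call or a `NOT_FOUND` reply.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Dispatched,
    NotFound,
}

/// Gate first, then dispatch: a filtered op looks the same as a missing one,
/// so no context is built and no handler runs for it.
pub fn dispatch_requested(
    filter: RemoteFilter,
    registry: &HashMap<&str, Registration>,
    op: &str,
) -> Outcome {
    match registry.get(op) {
        Some(reg) if filter.allows(reg.remote_safe) => Outcome::Dispatched,
        _ => Outcome::NotFound,
    }
}
```

Under these assumptions, a `services/list` handler that applies the same `allows` predicate to each registration would hide exactly the ops that dispatch refuses, which is the agreement between discovery and dispatch that the diff calls out.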