diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index 2a1d140..294b088 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-27
+last_updated: 2026-06-28
 ---
 
 # Alknet Architecture
@@ -78,6 +78,7 @@ The alknet-call crate is **implemented and reviewed** — both the server-side c
 | [031](decisions/031-credentialstore-repo-trait.md) | CredentialStore Repo Trait | Accepted |
 | [032](decisions/032-forwarded-for-identity.md) | Forwarded-For Identity (Metadata, Not Authority) | Accepted |
 | [033](decisions/033-storage-boundary-and-repo-adapter-pattern.md) | Storage Boundary and Repo/Adapter Pattern | Accepted |
+| [034](decisions/034-outgoing-only-x509-and-three-peer-roles.md) | Outgoing-Only X.509 and the Three Peer Roles | Accepted |
 
 ## Open Questions
 
@@ -124,7 +125,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 **Open (feature extensions, not blocking):**
 - **OQ-32**: Multi-hop federation — the one-hop model is the architectural commitment; multi-hop is a feature extension that doesn't break downstream
 - **OQ-36**: Concrete persistence adapter shapes — the repo/adapter pattern is committed (ADR-033); in-memory adapters ship with core; persistence adapters (SQLite, etc.) are deferred for exploration
-- **OQ-37**: X.509 outgoing-only case — the three auth types (Ed25519, X.509, bearer token) and how X.509 server identity fits the peer model. Not blocking the ADR-029 migration; downstream (HTTP crate phase)
+- **OQ-37**: ~~X.509 outgoing-only case~~ — **resolved by ADR-034** (three remote roles named: public X.509 endpoint, transport relay, hub; `PeerEntry` asymmetry is correct; client-side verifier selection by `PeerEntry` presence)
 
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md
index 174573a..8d2a3dc 100644
--- a/docs/architecture/crates/call/README.md
+++ b/docs/architecture/crates/call/README.md
@@ -64,7 +64,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | OQ-33 | PeerId — crypto identity vs stable logical id | **resolved** (ADR-030) | `PeerId = Identity.id = PeerEntry.peer_id` (stable across key rotation) |
 | OQ-34 | Persistent peer registry | **resolved** (ADR-030+033) | Core trait + in-memory default; persistence adapters are separate crates |
 | OQ-35 | ~~API key asymmetry~~ | **dissolved** | `PeerEntry` supports multiple credential paths; `ApiKeyEntry` is for tokens that ARE the identity |
-| OQ-37 | X.509 outgoing-only case | open | Three auth types; how X.509 server identity fits the peer model. Not blocking. |
+| OQ-37 | X.509 outgoing-only case | **resolved** (ADR-034) | Three remote roles (public X.509 endpoint, transport relay, hub); `PeerEntry` asymmetry correct; verifier by `PeerEntry` presence |
 
 ## Key Design Principles
 
diff --git a/docs/architecture/crates/call/client-and-adapters.md b/docs/architecture/crates/call/client-and-adapters.md
index 6114ee0..49efe96 100644
--- a/docs/architecture/crates/call/client-and-adapters.md
+++ b/docs/architecture/crates/call/client-and-adapters.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-27
+last_updated: 2026-06-28
 ---
 
 # alknet-call — Client and Adapters
@@ -205,16 +205,28 @@ credential dimensions (ADR-017 §7):
 pub struct CallCredentials {
     pub tls_identity: Option, // RFC 7250 raw key or X.509
     pub auth_token: Option, // call-protocol-level token
-    pub remote_identity: Option, // expected fingerprint/cert
+    pub remote_identity: Option, // expected fingerprint/cert (None = CA path, see below)
 }
 
-/// Expected identity of the remote node (ADR-017 §7). v1 carries a
-/// fingerprint string the assembly layer derives from `Capabilities`.
+/// Expected identity of the remote node (ADR-017 §7, extended by
+/// ADR-034 §2). Carries a fingerprint string the assembly layer
+/// derives from `Capabilities` when the local node has a `PeerEntry`
+/// for the remote (the known-peer case → fingerprint pin).
+///
+/// `remote_identity: None` is the **public X.509 endpoint** case: the
+/// local node has no `PeerEntry` for the remote, so there is no
+/// fingerprint to pin. Combined with an X.509 transport, `None`
+/// selects CA verification (`WebPkiServerVerifier`) per the
+/// verifier-selection rule in ADR-034 §3. Combined with an Ed25519
+/// raw-key transport, `None` fails closed (raw-key remotes are always
+/// known peers — no CA to fall back to).
+///
+/// The `Option` is therefore load-bearing, not cosmetic: `Some(fingerprint)`
+/// means "pin this" (known peer), `None` means "trust the CA or fail"
+/// (unknown remote). An implementer must not default `remote_identity`
+/// to a placeholder value to "satisfy" the field — `None` is a real
+/// state that drives verifier selection.
 pub struct RemoteIdentity { pub fingerprint: String }
-
-/// Errors produced by `CallClient::connect`.
-#[non_exhaustive]
-pub enum ClientError { Transport { .. }, TlsSetup { .. }, ConnectionClosed }
 ```
 
 - **TLS identity** — the local node's Ed25519 raw key (RFC 7250) or X.509 cert,
@@ -222,7 +234,10 @@ pub enum ClientError { Transport { .. }, TlsSetup { .. }, ConnectionClosed }
 - **Auth token** — an opaque call-protocol-level token, decrypted from the
   vault or derived from a shared secret.
 - **Remote identity verification** — the expected fingerprint/cert of the
-  remote node, stored as a capability.
+  remote node, stored as a capability. `Some` → fingerprint pin (known
+  peer with a `PeerEntry`); `None` → CA verification for X.509 remotes,
+  fail-closed for Ed25519 raw-key remotes (ADR-034 §2/§3). The `None`
+  case is the public-X.509-endpoint path, not a missing field.
 
 These are populated by the assembly layer at `CallClient` construction time
 from vault-derived `Capabilities`. The credential path is the no-env-vars
@@ -242,6 +257,22 @@ vars, ADR-014) is unaffected — the `auth_token` dimension flows through the
 call-protocol `auth_token` payload field, not TLS, so the no-env-vars
 invariant holds independently of this gap.
 
+**Outgoing X.509 and the peer model** (ADR-034): the client-side
+`ServerCertVerifier` is selected by whether the local node has a
+`PeerEntry` for the remote, not by key type alone. A pure-client
+connection to a **public X.509 endpoint** (no `PeerEntry` on the local
+side — e.g., dialing `api.alk.dev` or a third-party API) uses
+`WebPkiServerVerifier` (CA verification), gets **no `PeerId`** on the
+client side, and is **not added to `PeerCompositeEnv`** — it is not in
+the call-protocol peer graph (ADR-029). Ops discovered via `from_call`
+on such a connection land in the connection's Layer 2 overlay
+(ADR-024) and are invoked through the `CallConnection` handle directly,
+not via `PeerRef::Specific`. A connection to a **hub** (a `PeerEntry`
+with mixed Ed25519 + X.509 fingerprints) uses fingerprint pinning on
+both cert paths and does enter the peer graph. See
+[ADR-034](../../decisions/034-outgoing-only-x509-and-three-peer-roles.md)
+for the verifier selection rule and the three-role naming.
+
 ### from_call
 
 `from_call` discovers the remote peer's `External` operations and registers
@@ -591,6 +622,22 @@ Based on the gap analysis and the downstream unblock chain:
 - **MCP stdio transport is not built.** Streamable HTTP is the only supported
   MCP transport in alknet. stdio = spawn arbitrary executable = built-in RCE.
   Recorded as an explicit security position, not a feature gap.
+- **Pure-client X.509 connections are not in the peer graph on the client
+  side.** A `CallClient` connection to a public X.509 endpoint with no
+  local `PeerEntry` for the remote gets no `PeerId`, is not added to
+  `PeerCompositeEnv`, and is not addressable via `PeerRef::Specific`.
+  Ops discovered on it live in the connection's Layer 2 overlay and are
+  invoked through the `CallConnection` handle. The client-side
+  `ServerCertVerifier` uses CA verification (`WebPkiServerVerifier`) for
+  such remotes; known peers (hub with `PeerEntry`) use fingerprint
+  pinning. See [ADR-034](../../decisions/034-outgoing-only-x509-and-three-peer-roles.md).
+- **`CallCredentials.remote_identity: None` is load-bearing.** `None`
+  means "no `PeerEntry` for this remote → use CA verification (X.509)
+  or fail closed (Ed25519 raw key)" per the ADR-034 §3 verifier rule.
+  The implementation must not default `remote_identity` to a placeholder
+  to satisfy the field, and must not treat `None` as "skip verification"
+  — `None` + X.509 is CA verification, `None` + raw key is a hard
+  failure. `Some(fingerprint)` is the known-peer pin path.
 
 ## Design Decisions
 
@@ -609,6 +656,7 @@ Based on the gap analysis and the downstream unblock chain:
 | Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | Cross-node abort through `from_call` forwarding handler's `parent_request_id` |
 | Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | `error_schemas` mirrored by `from_call` from remote op's spec |
 | TLS identity redesign | [ADR-027](../../decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | RFC 7250 raw key / X.509 cert dimensions of `CallCredentials` |
+| Outgoing-only X.509 and three peer roles | [ADR-034](../../decisions/034-outgoing-only-x509-and-three-peer-roles.md) | Public X.509 endpoint is not a `PeerEntry` on the client side (no `PeerId`, not in peer graph); client-side verifier by `PeerEntry` presence (CA vs fingerprint pin); hub = mixed-fingerprint `PeerEntry` |
 | HD derivation for encryption keys | [ADR-020](../../decisions/020-hd-derivation-for-encryption-keys.md) | Vault-derived TLS identity material |
 | Vault key model | [ADR-026](../../decisions/026-vault-key-model-hd-derivation.md) | Vault-derived TLS identity material |
 | Vault local-only dispatch | [ADR-025](../../decisions/025-vault-local-only-dispatch.md) | Vault access at assembly layer only; the credential injection path's first hop |
@@ -662,9 +710,17 @@ See [open-questions.md](../../open-questions.md) for full details.
   shapes — the repo/adapter pattern is committed (ADR-033); the in-memory
   adapters ship with core; the persistence adapter shapes (SQLite, etc.)
   are deferred for exploration. See OQ-36 in open-questions.md.
-- **OQ-37** (open): X.509 outgoing-only case — the three auth types and
-  how X.509 server identity fits the peer model. Not blocking the
-  ADR-029 migration. See OQ-37 in open-questions.md.
+- **OQ-37** (resolved by ADR-034): X.509 outgoing-only case — three
+  remote roles named (public X.509 endpoint, transport relay, hub).
+  `PeerEntry` asymmetry is correct: a pure-client connection to a public
+  X.509 endpoint is **not** in the call-protocol peer graph on the
+  client side — no `PeerEntry`, no `PeerId`, no `PeerRef::Specific`
+  routing. Ops discovered via `from_call`/`from_openapi`/`from_mcp`
+  land in the connection's Layer 2 overlay and are invoked through the
+  connection handle. The client-side `ServerCertVerifier` is selected
+  by `PeerEntry` presence: known peer → fingerprint pin; unknown X.509
+  remote → CA verification (`WebPkiServerVerifier`). See ADR-034 and
+  OQ-37 in open-questions.md.
 
 ## References
 
diff --git a/docs/architecture/crates/core/README.md b/docs/architecture/crates/core/README.md
index 0148dd9..e1d96fe 100644
--- a/docs/architecture/crates/core/README.md
+++ b/docs/architecture/crates/core/README.md
@@ -45,7 +45,7 @@ Core library for ALPN-based protocol dispatch. Every handler crate depends on al
 | OQ-34 | Persistent peer registry (storage boundary) | resolved by ADR-030+031+033 | Core defines repo traits + in-memory defaults; persistence adapters are separate crates |
 | OQ-35 | ~~API key asymmetry~~ | dissolved | `PeerEntry` supports multiple credential paths; `ApiKeyEntry` is for tokens that ARE the identity |
 | OQ-36 | Concrete persistence adapter shapes | open (deferred for exploration) | The repo/adapter pattern is committed (ADR-033); in-memory adapters ship with core; persistence adapters deferred |
-| OQ-37 | X.509 outgoing-only case | open | Three auth types; how X.509 server identity fits the peer model. Not blocking. |
+| OQ-37 | X.509 outgoing-only case | resolved by ADR-034 | Three remote roles (public X.509 endpoint, transport relay, hub); `PeerEntry` asymmetry correct; client-side verifier by `PeerEntry` presence (CA vs fingerprint pin) |
 
 ## Key Design Principles
 
diff --git a/docs/architecture/crates/core/auth.md b/docs/architecture/crates/core/auth.md
index 6c1287b..d628482 100644
--- a/docs/architecture/crates/core/auth.md
+++ b/docs/architecture/crates/core/auth.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-27
+last_updated: 2026-06-28
 ---
 
 # Authentication
@@ -132,6 +132,64 @@ Bearer tokens have two paths:
 The distinction is whether the token needs a stable logical id across rotation
 (`PeerEntry`) or not (`ApiKeyEntry`). See ADR-030 §"Bearer tokens."
 
+## Three Remote Roles (ADR-034)
+
+The three credential types above describe how a *single* `PeerEntry` can
+be authenticated. Separately, there are **three distinct remote roles**
+that the architecture must not conflate (see [ADR-034](../../decisions/034-outgoing-only-x509-and-three-peer-roles.md)):
+
+| Role | Identity | alknet peer? | `PeerEntry` on local side? |
+|------|----------|--------------|----------------------------|
+| **Public X.509 endpoint** | Domain + CA-issued X.509 | No (local node is a client) | No |
+| **Transport relay** (iroh's DERP-equivalent) | iroh `NodeId` (Ed25519) | No (infrastructure) | No |
+| **Hub / hosting node** | Ed25519 raw key **and/or** X.509 | Yes (full peer) | Yes |
+
+(Transport path and examples per role are in ADR-034; this table is
+auth-focused — identity, peer-graph membership, and `PeerEntry`
+presence on the local side.)
+
+`PeerEntry` (and the `PeerId` it resolves to) is the model for peers in
+the call-protocol peer graph (ADR-029) — peers that get a stable logical
+identity, are addressable via `PeerRef::Specific`, and whose ops land in
+the peer-keyed overlay. A pure-client connection to a public X.509
+endpoint (e.g., `api.alk.dev`, a third-party API) is **not** in that
+graph on the client side: the local node holds no `PeerEntry` for it,
+the connection gets no `PeerId`, and ops discovered via
+`from_call`/`from_openapi`/`from_mcp` are invoked through the
+connection handle directly (Layer 2 overlay, ADR-024), not through
+peer-keyed routing. The asymmetry is deliberate — a public domain's
+operator can change hands, so there is no stable logical identity to
+attach; the local node trusts the CA today and holds the connection
+handle.
+
+The **hub** case is an ordinary `PeerEntry` that happens to expose both
+an Ed25519 fingerprint (P2P path) and an X.509 fingerprint
+(`SHA256:`, WebTransport/HTTPS path) — already supported by
+`PeerEntry.fingerprints: Vec` (ADR-030). Browsers connecting to
+a hub over WebTransport/HTTPS are *not* alknet peers on the hub's side
+either — they're served by `alknet-http`, authenticate by bearer token,
+and get no `PeerId`.
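
A minimal sketch, not part of this patch, of how a mixed-fingerprint hub entry resolves both cert paths to one stable peer id. The `PeerEntry` here is a simplified two-field stand-in, `resolve_peer_id` and `example_peers` are hypothetical helper names, and the fingerprint strings are placeholders; the real `IdentityProvider::resolve_from_fingerprint()` signature is not shown in this diff.

```rust
/// Simplified, hypothetical stand-in for the `PeerEntry` from ADR-030.
/// The real struct carries more fields (auth_token_hash, scopes, resources).
#[derive(Debug, Clone)]
pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
}

/// Return the stable peer id of the first entry that lists `fingerprint`.
/// Illustrative only; not the crate's `resolve_from_fingerprint()`.
pub fn resolve_peer_id<'a>(peers: &'a [PeerEntry], fingerprint: &str) -> Option<&'a str> {
    peers
        .iter()
        .find(|p| p.fingerprints.iter().any(|f| f.as_str() == fingerprint))
        .map(|p| p.peer_id.as_str())
}

/// One hub with an Ed25519 fingerprint and an X.509 fingerprint.
/// Both values are placeholders, not real digests.
pub fn example_peers() -> Vec<PeerEntry> {
    vec![PeerEntry {
        peer_id: "hub-a".to_string(),
        fingerprints: vec![
            "ed25519:aaaa".to_string(), // placeholder, P2P path
            "SHA256:bbbb".to_string(),  // placeholder, WebTransport/HTTPS path
        ],
    }]
}
```

Either fingerprint maps to `"hub-a"`, which is the property the hub paragraph relies on; an unlisted fingerprint resolves to nothing.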
+
+### Client-side verifier selection (outgoing connections)
+
+The `CallClient` / `from_openapi` / `from_mcp` client-side
+`ServerCertVerifier` is selected by **whether the local node has a
+`PeerEntry` for the remote**, not by key type alone:
+
+| Local has `PeerEntry` for remote? | Remote cert type | Client verifier |
+|----------------------------------|------------------|-----------------|
+| No (public X.509 endpoint) | X.509 | `WebPkiServerVerifier` (CA verification) |
+| No | Ed25519 raw key | fails closed (no CA to fall back to — raw-key remotes are always known peers) |
+| Yes (hub, Ed25519 path) | Ed25519 raw key | fingerprint match (`ed25519:`) |
+| Yes (hub, X.509 path) | X.509 | fingerprint match (`SHA256:`) |
+
+This is the key-type-aware verifier from OQ-29, with the peer-model
+criterion (ADR-034) made explicit. `AcceptAnyServerCertVerifier` is a
+security hole for X.509 and is only safe for raw-key fingerprint
+extraction on the *server* side; the *client* side must use CA
+verification for unknown X.509 remotes and fingerprint pinning for
+known peers.
+
 ## AuthToken
 
 Opaque authentication token carried in protocol frames.
@@ -230,7 +288,7 @@ The verifier accepts any presented cert without CA verification because
 alknet's identity model is fingerprint-based, not PKI-based — the
 `AuthPolicy::peers` set is the trust anchor, not a root CA store. The cert
 bytes are extracted at the TLS layer and hashed to a fingerprint
-string; the fingerprint is then matched against the configured `PeerEntry.fingerprint`
+string; the fingerprint is then matched against the configured `PeerEntry.fingerprints`
 fields by `IdentityProvider::resolve_from_fingerprint()`.
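
In contrast to the server-side accept-and-fingerprint verifier described above, the client-side selection table added to auth.md in this patch can be read as a pure decision function. The following is a sketch under stated assumptions, not the crate's implementation: `RemoteCertKind`, `VerifierChoice`, and `select_verifier` are hypothetical names, and the `Option<&str>` argument stands in for the fingerprint a local `PeerEntry` would supply (`None` when there is no `PeerEntry`).

```rust
/// Kind of certificate the remote presents (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoteCertKind {
    X509,
    Ed25519RawKey,
}

/// Outcome of the selection rule (hypothetical enum).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VerifierChoice {
    /// Unknown X.509 remote: verify against a CA root store.
    CaVerification,
    /// Known peer: the presented cert must hash to this fingerprint.
    FingerprintPin(String),
    /// Unknown raw-key remote: there is no CA to fall back to.
    FailClosed,
}

/// Pick a verifier from "is there a pinned fingerprint?" and the cert kind.
/// `None` never means "skip verification".
pub fn select_verifier(pinned: Option<&str>, kind: RemoteCertKind) -> VerifierChoice {
    match (pinned, kind) {
        (Some(fp), _) => VerifierChoice::FingerprintPin(fp.to_string()),
        (None, RemoteCertKind::X509) => VerifierChoice::CaVerification,
        (None, RemoteCertKind::Ed25519RawKey) => VerifierChoice::FailClosed,
    }
}
```

Keeping the decision separate from TLS setup makes the fail-closed row easy to unit-test; wiring the result into an actual `rustls` verifier is out of scope for this sketch.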
 
 ## Resolution Flow
@@ -328,12 +386,13 @@ The endpoint's `AlknetEndpoint` also holds `Arc` for endpo
 | PeerEntry and Identity.id decoupling | [ADR-030](../../decisions/030-peerentry-and-identity-id-decoupling.md) | `authorized_fingerprints` → `peers: Vec`; `Identity.id` = `peer_id` (stable), not fingerprint; key rotation changes fingerprint, not identity |
 | CredentialStore repo trait | [ADR-031](../../decisions/031-credentialstore-repo-trait.md) | Second repo trait in core (alongside `IdentityProvider`); `InMemoryCredentialStore` default adapter |
 | Storage boundary and repo/adapter pattern | [ADR-033](../../decisions/033-storage-boundary-and-repo-adapter-pattern.md) | Core defines traits + in-memory defaults; persistence adapters are separate crates |
+| Three remote roles and outgoing-only X.509 | [ADR-034](../../decisions/034-outgoing-only-x509-and-three-peer-roles.md) | Public X.509 endpoint / transport relay / hub; `PeerEntry` asymmetry (pure-client X.509 is not a peer); client-side verifier by `PeerEntry` presence |
 
 ## Open Questions
 
 - **OQ-29** (resolved): `CallClient` TLS client-auth — wire quinn client-auth (present Ed25519 key as raw public key client cert); key-type-aware server cert verification (raw key = fingerprint match, X.509 = CA verification); fingerprint normalization (`ed25519:` across quinn/iroh). See OQ-29 in open-questions.md.
 - **OQ-35** (dissolved): the "API key asymmetry" framing was wrong; `PeerEntry` supports multiple credential paths (fingerprints + auth_token_hash), `ApiKeyEntry` is for tokens that ARE the identity. See OQ-35 in open-questions.md.
-- **OQ-37** (open): X.509 outgoing-only case — the three auth types and how X.509 server identity fits the peer model. Not blocking the ADR-029 migration. See OQ-37 in open-questions.md.
+- **OQ-37** (resolved): X.509 outgoing-only case — three remote roles named (public X.509 endpoint, transport relay, hub); `PeerEntry` asymmetry is correct (pure-client X.509 connections are not in the peer graph on the client side); client-side verifier selection by `PeerEntry` presence (CA verification for unknown X.509, fingerprint pin for known peers). See ADR-034 and OQ-37 in open-questions.md.
 
 ## Security Constraints
 
diff --git a/docs/architecture/decisions/034-outgoing-only-x509-and-three-peer-roles.md b/docs/architecture/decisions/034-outgoing-only-x509-and-three-peer-roles.md
new file mode 100644
index 0000000..b07db80
--- /dev/null
+++ b/docs/architecture/decisions/034-outgoing-only-x509-and-three-peer-roles.md
@@ -0,0 +1,398 @@
+# ADR-034: Outgoing-Only X.509 and the Three Peer Roles
+
+## Status
+
+Accepted (resolves OQ-37)
+
+## Context
+
+OQ-37 framed the open question as: "the three credential types (Ed25519,
+X.509, bearer token) and how X.509 server identity fits the peer model."
+During resolution, it became clear that **three distinct remote roles**
+had been conflated under the single label "X.509 endpoint," and that the
+conflation was the actual source of the confusion — not the TLS
+mechanics, which ADR-027 and ADR-030 had already settled.
+
+The three roles are real and structurally different:
+
+1. **Public X.509 endpoint** — a remote HTTPS or `alknet/call`-over-TLS
+   server reachable by domain name, authenticated by a CA-issued X.509
+   cert. The local alknet node is a *client* of it. Examples: a
+   third-party API (`vast.ai`, `api.openai.com`), a public alknet hub
+   that the local node dials over the open internet, an `alknet/call`
+   peer that has chosen to expose a domain + X.509 instead of (or in
+   addition to) an Ed25519 raw key. The client authenticates to the
+   server by **bearer token** (browsers and most HTTP clients cannot do
+   TLS client-auth); the server authenticates to the client by **CA
+   verification** (WebPKI), not by fingerprint pinning.
+
+2. **Transport relay** — iroh's DERP-equivalent (`iroh-relay`). A
+   connectivity-assistance node that forwards encrypted datagrams
+   between peers who cannot directly connect (NAT traversal). It is
+   *infrastructure*, not an alknet application peer: it does not
+   register operations, does not participate in the call protocol's
+   peer graph, and has no `PeerEntry` / `PeerId` in alknet's auth
+   model. Alknet inherits it for free when the `iroh` feature is on; the
+   relay's own identity (an Ed25519 `NodeId`) is iroh's concern, not
+   alknet's.
+
+3. **Hub / hosting node** — an alknet application peer that acts as a
+   hub in a hub-and-spoke (head/worker) topology. It is an ordinary
+   `PeerEntry` that *happens* to also expose a public domain + X.509
+   (so browsers / external HTTPS clients can reach it) *and* an Ed25519
+   identity (so other alknet nodes can reach it P2P via iroh or direct
+   quinn). The git-hosting-relay-with-gossip-sync use case is this role:
+   the hub is a full alknet peer that additionally serves browsers.
+
+The pre-ADR-034 framing asked whether `PeerEntry` should be made
+**symmetric** — i.e., whether the local node should hold a `PeerEntry`
+for *every* remote it might dial, including pure-public-API servers it
+has no P2P relationship with. This ADR answers **no**: the asymmetry is
+correct and reflects a real difference in trust model. `PeerEntry` (and
+the `PeerId` it produces) is the model for **peers in the call-protocol
+peer graph** (ADR-029) — peers that get a stable logical identity, are
+addressable via `PeerRef::Specific`, and whose ops land in the
+peer-keyed overlay. A pure-client connection to a public HTTPS API is
+not that.
+
+This distinction matters because forcing a stable logical `peer_id`
+onto "the operator of `api.example.com`" is wrong: a public domain's
+operator can change hands, the cert can be reissued, and the local node
+has no stable logical identity to attach — only "domain X verified by
+CA Y today." That is a different trust model from "this Ed25519 key is
+`worker-a`, and key rotation updates the fingerprint but not the
+identity" (ADR-030).
+
+## Decision
+
+### 1. Name the three roles; stop using "relay" ambiguously
+
+The architecture documents use three distinct terms:
+
+| Role | Identity | Transport | alknet peer? | Example |
+|------|----------|-----------|--------------|---------|
+| **Public X.509 endpoint** | Domain + CA-issued X.509 | HTTPS / `alknet/call`-over-TLS | No (client only, unless also role 3) | `api.alk.dev`, `vast.ai` |
+| **Transport relay** | iroh `NodeId` (Ed25519) | iroh's DERP-like protocol | No (infrastructure) | `relay.iroh.network` |
+| **Hub / hosting node** | Ed25519 raw key **and/or** X.509 | iroh / direct quinn / HTTPS | Yes (full `PeerEntry`) | git-hosting hub, head node |
+
+Existing specs that say "relay" when they mean "domain-hosted service"
+or "hub" are amended by reference to this table. ADR-027's "domain-
+hosted services" and ADR-030's "X.509 cert" credential path refer to
+the **public X.509 endpoint** role and the **hub** role; iroh's
+transport relay is a separate, inherited component referenced only in
+the iroh transport path.
+
+### 2. Outgoing-only X.509 is not a `PeerEntry` on the client side
+
+When a `CallClient` (or `from_openapi` / `from_mcp`) dials a remote
+that is a **public X.509 endpoint** and the local node has no P2P
+relationship with it (no `PeerEntry` for the remote):
+
+- The server is authenticated by **CA verification**
+  (`rustls::WebPkiServerVerifier` with the platform root store or a
+  configured CA bundle). There is no fingerprint to pin — pinning a
+  `SHA256:` fingerprint against an external CA-issued cert
+  is brittle (cert renewal changes the fingerprint) and is not the
+  WebPKI trust model. The trigger for CA verification is **the absence
+  of a `PeerEntry` for the remote combined with an X.509 transport**;
+  the verifier selection rule is stated in full in §3 below. The
+  `CallCredentials.remote_identity: Option` field
+  (ADR-017 §7) carries an expected fingerprint/cert when the caller has
+  one to pin (`Some`); for a pure-client X.509 dial with no
+  `PeerEntry`, `remote_identity` is `None` and the CA path applies. The
+  `Option` is load-bearing — `None` is the public-X.509-endpoint state,
+  not a missing field: an implementer must not default it to a
+  placeholder, and must not treat `None` as "skip verification" (`None`
+  + X.509 = CA verification; `None` + Ed25519 raw key = fail closed).
+  (ADR-017 §7 specified `remote_identity` as "expected fingerprint or
+  cert"; this ADR extends its semantics so that `remote_identity: None`
+  + no `PeerEntry` + X.509 transport selects CA verification, and
+  `remote_identity: None` + Ed25519 raw-key transport fails closed.)
+- The client authenticates to the server by **bearer token**
+  (`CallCredentials.auth_token`), carried in the call-protocol
+  `auth_token` payload field (or the HTTP `Authorization` header for
+  `from_openapi` / `from_mcp`). What the *server* does with that token
+  depends on which kind of public X.509 endpoint it is:
+  - **Third-party API** (`api.openai.com`, `vast.ai` — not an alknet
+    node): the server applies its own auth scheme (its own API-key
+    validation, its own ACL). Alknet's `PeerEntry` / `ApiKeyEntry` types
+    do not apply on the far side; the alknet client just carries the
+    token in the shape the remote expects (an HTTP header, a
+    call-protocol `auth_token` payload) and treats the remote's
+    response as authoritative.
+  - **Alknet hub reached over its public X.509 path** (a role-3 hub
+    dialed over the domain instead of P2P): the hub resolves the
+    client's token via its own `PeerEntry.auth_token_hash` or
+    `ApiKeyEntry` — the *server's* bookkeeping, not the client's. The
+    client still holds no `PeerEntry` for the hub on its own side
+    unless it also has a P2P trust relationship with that hub (in which
+    case the §3 mixed-fingerprint path applies, not this one).
+- The client may still present its TLS client cert (Ed25519 raw public
+  key, per OQ-29) when one is configured; bearer token is the
+  *authorization* credential, and TLS client-auth (when presented) is
+  *additional* identity material the server may use. For a third-party
+  API the cert is ignored; for an alknet hub it may be extracted as a
+  fingerprint. Presenting or omitting the client cert is the caller's
+  choice via `CallCredentials`; this ADR does not require disabling
+  client-auth on this path.
+- The connection does **not** get a `PeerId` on the client side. It is
+  not added to `PeerCompositeEnv` (ADR-029). There is no
+  `PeerRef::Specific` routing to it. The connection is a live
+  `CallConnection` (or HTTP client session) the caller holds directly;
+  ops discovered via `from_call` / `from_openapi` / `from_mcp` land in
+  that connection's Layer 2 overlay (ADR-024) and are invoked through
+  the connection handle, not through the peer-keyed routing layer.
+
+This is the **asymmetry** OQ-37 worried about, stated as a deliberate
+design property: `PeerEntry` is for peers in the call-protocol peer
+graph. Pure-client connections to public X.509 endpoints are not in
+that graph on the client side. The server may have a `PeerEntry` for
+*us* (resolving our bearer token, in the alknet-hub sub-case); we
+don't need one for *it*.
+
+### 3. The hub case is already covered by ADR-030's mixed-fingerprint `PeerEntry`
+
+A **hub / hosting node** that is reachable both P2P (Ed25519 raw key
+via iroh or direct quinn) and via a public domain (X.509 for browsers)
+is a single `PeerEntry` with mixed fingerprints:
+
+```rust
+PeerEntry {
+    peer_id: "hub-a".into(),
+    fingerprints: vec![
+        "ed25519:", // P2P path
+        "SHA256:", // WebTransport / HTTPS path
+    ],
+    auth_token_hash: Some(""),
+    scopes: vec![...],
+    resources: {...},
+    ...
+}
+```
+
+When an alknet node dials this hub P2P, the Ed25519 fingerprint
+matches; when it dials over the public X.509 path (e.g., because P2P
+connectivity failed), the X.509 fingerprint matches — both resolve to
+the same `peer_id` (`"hub-a"`). The X.509 path here uses
+**fingerprint pinning** (the `SHA256:` is in `PeerEntry`), *not*
+CA verification, because the local node has a prior P2P trust
+relationship with this specific hub and has recorded its cert's
+fingerprint. This is the one case where X.509 fingerprint pinning is
+correct: the peer is a known alknet peer, not an arbitrary public API.
+
+The choice between **CA verification** (role 1) and **fingerprint
+pinning** (role 3, X.509 path) is driven by whether the local node has
+a `PeerEntry` for the remote — this is the authoritative verifier
+selection rule, referenced from §2:
+
+| Local has `PeerEntry` for remote? | Remote cert type | Client verifier |
+|----------------------------------|------------------|-----------------|
+| No (public X.509 endpoint) | X.509 | `WebPkiServerVerifier` (CA verification) |
+| No | Ed25519 raw key | fails closed (no CA to fall back to — raw-key remotes are always known peers; fingerprint IS identity) |
+| Yes (hub, Ed25519 path) | Ed25519 raw key | fingerprint match (`ed25519:`) |
+| Yes (hub, X.509 path) | X.509 | fingerprint match (`SHA256:`) |
+
+This is the key-type-aware verifier from OQ-29, with the *peer-model*
+criterion made explicit: the verifier choice is determined by whether
+the remote is a known peer (`PeerEntry` present → pin) or an external
+server (`PeerEntry` absent → CA, or fail closed for raw keys).
+
+### 4. Browsers connecting to a hub are not alknet peers
+
+A browser reaching a hub over WebTransport (or HTTPS) is served by the
+hub's `alknet-http` handler. The browser authenticates by **bearer
+token** (HTTP `Authorization`), resolved by the hub's
+`IdentityProvider::resolve_from_token` against the hub's
+`PeerEntry.auth_token_hash` or `ApiKeyEntry`. The browser is **not** an
+alknet peer on the hub's side either — it does not get a `PeerId`, does
+not enter `PeerCompositeEnv`, and its "ops" are HTTP routes / WebTransport
+streams served by `alknet-http`, not entries in the call-protocol
+peer-keyed overlay. The hub's `PeerEntry` for the browser (if any) is
+about authorizing the bearer token, not about peer-graph membership.
+
+This keeps the peer graph populated only by full alknet nodes (role 3
+hubs and role-3-style spoke nodes), never by browsers or pure HTTP
+clients.
+
+### 5. WebTransport relay-as-proxy is deferred with h3 / WebTransport
+
+A **WebTransport proxy** that terminates the browser's WebTransport
+connection and proxies encrypted traffic to a hub's P2P endpoint
+(avoiding the need for the hub itself to expose a public X.509 endpoint)
+is a real feature, especially for the browser-to-P2P-peer case. It is
+**not** load-bearing on the auth model resolved here:
+
+- The proxy does not change how identities resolve. The browser still
+  authenticates by bearer token; the hub still resolves it via
+  `PeerEntry.auth_token_hash`. The proxy is transport-only.
+- The fingerprint normalization committed in ADR-030 §6
+  (`ed25519:` for raw keys across quinn and iroh) was already
+  designed to keep the proxied path clean: a proxied connection's
+  Ed25519 identity is the same `ed25519:` whether the client
+  connected directly or through the proxy.
+
+WebTransport support is already deferred past v1 in the alknet-http
+Phase 0 findings (decision point DH-2, "h3/WebTransport — in v1 or
+deferred?"). The WebTransport-relay-as-proxy feature
+belongs in that same deferral bucket — it lands when `h3` /
+WebTransport lands, and it does not require any change to the auth
+model in this ADR. It is recorded here so it is not lost; it is not an
+open question for the auth model.
+
+### 6. On-chain / smart-contract peer discovery fits the OQ-36 adapter pattern
+
+The downstream use case — storing relay/repo info and org/user ACL on a
+smart-contract platform, with relays (hubs) syncing git repos via
+iroh's gossip protocol — is a **discovery and ACL-source** concern, not
+an auth-model concern. It does not change any of decisions 1–4:
+
+- The hubs are role-3 `PeerEntry` peers (mixed fingerprints, full peer-
+  graph membership, gossip-synced).
+- The smart contract is a **source of `PeerEntry` records**. It maps
+  cleanly onto the repo/adapter pattern (ADR-033): a future
+  `alknet-peer-store-onchain` adapter implementing `IdentityProvider`
+  against a smart contract is additive, exactly like
+  `alknet-peer-store-sqlite`. The auth model (`PeerEntry`, `PeerId`,
+  `Identity`) is unchanged; only the *source* of the records changes.
+- The repo/ACL data on-chain is consumed by the hub's authorization + layer (`AccessControl::check` against scopes/resources populated from + the on-chain `PeerEntry`), not by the TLS / fingerprint path. + +Designing that adapter now would be premature — it is downstream of +both the repo/adapter exploration (OQ-36) and the git crate (OQ-10). +It is noted here only to confirm it does not reopen OQ-37. + +## What this does NOT change + +- **`PeerEntry` struct shape** (ADR-030) — unchanged. Mixed + fingerprints (Ed25519 + X.509) were already supported. +- **`Identity` / `IdentityProvider` trait** — unchanged. The verifier + choice is a `CallClient` / `from_openapi` / `from_mcp` concern, not + an `IdentityProvider` concern. +- **`CallCredentials` struct** — unchanged. `remote_identity` already + carries the expected key type (OQ-29); this ADR specifies how the + verifier is chosen from it (CA for unknown X.509 remotes, fingerprint + match for known peers). +- **`PeerCompositeEnv` / `PeerRef`** (ADR-029) — unchanged. Pure-client + X.509 connections simply do not enter the peer-keyed overlay. +- **`TlsIdentity`** (ADR-027) — unchanged. The server-side X.509 / ACME + / RawKey modes are unaffected; this ADR is about the *client-side* + verifier choice for outgoing connections. +- **The no-env-vars invariant** — unaffected. The bearer token for the + outgoing X.509 case still comes from `Capabilities`, not env vars. + +## Consequences + +**Positive:** +- OQ-37 is resolved. The "make `PeerEntry` symmetric" instinct is + rejected with a clear criterion: `PeerEntry` is for peers in the + call-protocol peer graph; pure-client connections to public X.509 + endpoints are not in that graph on the client side. +- The three remote roles are named, so future specs and conversations + can distinguish "public X.509 endpoint," "transport relay," and + "hub / hosting node" instead of overloading "relay." 
+- The client-side verifier choice has a single rule: known peer + (`PeerEntry` present) → fingerprint pin; unknown X.509 remote + (`PeerEntry` absent) → CA verification. This closes the + `AcceptAnyServerCertVerifier` security hole for X.509 that OQ-29 + flagged, with the peer-model criterion made explicit. +- The hub case (mixed Ed25519 + X.509 fingerprints, browser access via + WebTransport/HTTPS) is confirmed to need no new types — ADR-030's + `fingerprints: Vec` already covers it. +- The WebTransport-relay-as-proxy and on-chain-discovery use cases are + recorded with clear homes (h3/WebTransport deferral bucket; OQ-36 + adapter pattern) so they don't get lost and don't reopen the auth + model. + +**Negative:** +- The `alknet-http` and `alknet-call` client paths must branch on + "is this remote a known `PeerEntry`?" when selecting a + `ServerCertVerifier`. This is a small implementation cost and is + local to the client connection-establishment code; it is not a + structural change. +- Operators must understand the distinction between "I have a + `PeerEntry` for this remote (pin its fingerprint)" and "I'm calling a + public API (trust the CA)." In practice this is intuitive (it's the + difference between `~/.ssh/known_hosts` and a browser's CA trust + store), but the docs must state it clearly, which this ADR and the + spec amendments do. +- Pure-client X.509 connections have no `PeerId` on the client side, so + any future feature that wants to route to "the connection I opened to + `api.alk.dev`" must hold the `CallConnection` handle directly rather + than using `PeerRef::Specific`. This is the correct constraint — + `PeerRef::Specific` is for known peers, not for arbitrary dials — but + it is a constraint downstream code must respect. + +## Assumptions + +1. **A remote reachable by Ed25519 raw key is always a known peer.** + Raw-key remotes have no CA; the fingerprint IS the trust anchor. 
An + unknown Ed25519 remote cannot be verified at all (there is no CA to + fall back to), so the connection fails closed. This means the + "public X.509 endpoint" role is the *only* role where the local node + dials a remote it has no `PeerEntry` for. This is correct and + intended — it is the same model iroh uses. + +2. **Browsers never enter the peer-keyed overlay.** A browser is + served by `alknet-http` (HTTP routes / WebTransport streams) and + authenticates by bearer token. The hub may have a `PeerEntry` for + the browser's token (to authorize it), but the browser is not a + `PeerId`-bearing peer. This is the explicit closure of the + "browser as peer" path — browsers are clients, not peers. + +3. **X.509 fingerprint pinning is only for known hubs.** Pinning an + X.509 fingerprint for an arbitrary public API is brittle (cert + renewal) and is not done. The `PeerEntry.fingerprints` X.509 entry + is for the hub case where the local node has a P2P trust + relationship and wants to also recognize the hub's domain-facing + cert. + +4. **The on-chain / smart-contract discovery use case does not change + the auth model.** It is a source of `PeerEntry` records, implemented + as an additive `IdentityProvider` adapter (ADR-033 / OQ-36). The + hub-and-gossip topology it implies is built from role-3 hubs, which + this ADR confirms are ordinary `PeerEntry` peers. 
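The verifier selection rule that the decisions and assumptions above describe can be sketched as a small pure function. This is only an illustration under stated assumptions, not the `alknet-call` implementation: `RemoteCertType`, `VerifierChoice`, and `select_verifier` are hypothetical names, fingerprints are modeled as plain strings with the `ed25519:` / `SHA256:` prefixes, and failing closed when a known peer has no fingerprint for the presented key type is an assumption the ADR does not state.

```rust
/// Certificate type the remote presents during the TLS handshake.
/// (Hypothetical type for this sketch; not part of the ADR.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoteCertType {
    X509,
    Ed25519RawKey,
}

/// Outcome of the client-side verifier selection.
/// (Hypothetical type for this sketch; not part of the ADR.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VerifierChoice {
    /// Unknown X.509 remote: verify against the CA trust store
    /// (the role the ADR assigns to `WebPkiServerVerifier`).
    CaVerification,
    /// Known peer: pin the expected fingerprint from its `PeerEntry`.
    FingerprintPin(String),
    /// No way to establish trust: refuse the connection.
    FailClosed,
}

/// Select a verifier from `PeerEntry` presence and the remote key type.
///
/// `known_fingerprints` is `None` when there is no `PeerEntry` for the
/// remote, and `Some(..)` with that entry's fingerprints otherwise.
pub fn select_verifier(
    known_fingerprints: Option<&[&str]>,
    cert: RemoteCertType,
) -> VerifierChoice {
    // Prefix convention taken from the ADR text.
    let prefix = match cert {
        RemoteCertType::X509 => "SHA256:",
        RemoteCertType::Ed25519RawKey => "ed25519:",
    };

    match known_fingerprints {
        // No `PeerEntry`: only X.509 has a CA to fall back to.
        None => match cert {
            RemoteCertType::X509 => VerifierChoice::CaVerification,
            RemoteCertType::Ed25519RawKey => VerifierChoice::FailClosed,
        },
        // Known peer: pin the fingerprint matching the presented key type.
        // Assumption of this sketch: if the entry has none, fail closed.
        Some(fps) => fps
            .iter()
            .find(|fp| fp.starts_with(prefix))
            .map(|fp| VerifierChoice::FingerprintPin(fp.to_string()))
            .unwrap_or(VerifierChoice::FailClosed),
    }
}
```

Keeping the rule a pure function of "is there a `PeerEntry`?" and "which key type did the remote present?" mirrors the four-row table in decision 3. In this sketch, a hub with mixed fingerprints is handled by the same branch as any other known peer.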
+ +## References + +- OQ-37 (resolved by this ADR) — the three auth types and how X.509 + server identity fits the peer model +- [ADR-027](027-tls-identity-redesign-acme-rawkey-decoupling.md) — + `TlsIdentity` (RawKey / X509 / Acme), the browser limitation (no RFC + 7250), WebTransport requires X.509 +- [ADR-029](029-peer-graph-routing-model.md) — the peer-keyed overlay + model that `PeerEntry` / `PeerId` feed into; pure-client connections + are not in this graph +- [ADR-030](030-peerentry-and-identity-id-decoupling.md) — `PeerEntry` + with mixed fingerprints; fingerprint normalization (`ed25519:` across + quinn/iroh); the `SHA256:` X.509 fingerprint format +- [ADR-033](033-storage-boundary-and-repo-adapter-pattern.md) — the + repo/adapter pattern that an on-chain `IdentityProvider` adapter + follows; OQ-36 (concrete adapter shapes deferred for exploration) +- [ADR-017](017-call-protocol-client-and-adapter-contract.md) §7 — + `CallCredentials.remote_identity` (ADR-017 specified "expected + fingerprint or cert"; this ADR §2 extends its semantics so that + `remote_identity: None` + no `PeerEntry` + X.509 transport selects + CA verification) +- [ADR-024](024-operation-registry-layering.md) — the Layer 2 + per-connection overlay where ops discovered via `from_call` / + `from_openapi` / `from_mcp` on a pure-client X.509 connection land +- OQ-29 (resolved) — key-type-aware server cert verification; this ADR + adds the peer-model criterion (known peer vs. 
public X.509 endpoint) + that selects the verifier +- OQ-10 (deferred) — git adapter scope; the on-chain / gossip-synced + git-hosting hub use case in §6 is downstream of the git crate +- OQ-36 (open, deferred for exploration) — concrete persistence adapter + shapes; the on-chain `IdentityProvider` adapter in §6 follows this + pattern +- `docs/research/alknet-http/phase-0-findings.md` — DH-2 (h3 / + WebTransport deferred past v1); the WebTransport-relay-as-proxy + feature noted in this ADR's §5 belongs in that deferral bucket +- `docs/research/references/iroh/iroh/04-sub-crates.md` — iroh's + transport relay (`iroh-relay`), referenced to distinguish it from + alknet's hub role +- `docs/architecture/crates/core/auth.md` — amended: three-role + naming, the outgoing X.509 verifier selection rule +- `docs/architecture/crates/call/client-and-adapters.md` — amended: + outgoing X.509 connection has no client-side `PeerId`; verifier + selection by `PeerEntry` presence \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index e4c7d3a..a17ae14 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-27 +last_updated: 2026-06-28 --- # Open Questions @@ -666,58 +666,62 @@ is a feature extension, not an unmade architecture decision. 
## Theme: TLS Identity -### OQ-37: X.509 Outgoing-Only Case (Three Auth Types) +### OQ-37: X.509 Outgoing-Only Case (Three Peer Roles) - **Origin**: ADR-030 §"Bearer tokens" (the three credential types), the discussion that X.509 is fundamentally different from Ed25519 -- **Status**: open (lingering — the X.509 server-identity case needs design) +- **Status**: **resolved** (2026-06-28 by ADR-034) - **Door type**: One-way (how X.509 server identity integrates with the peer model) -- **Priority**: medium -- **Resolution**: The three credential types are: Ed25519 raw key (the - common case, normalized to `ed25519:` across quinn/iroh), X.509 - (domain-facing endpoints, ACME, `SHA256:`), and bearer token - (`PeerEntry.auth_token_hash` or `ApiKeyEntry`). +- **Priority**: medium → resolved +- **Resolution**: **The pre-ADR-034 framing conflated three distinct + remote roles under "X.509 endpoint."** [ADR-034](decisions/034-outgoing-only-x509-and-three-peer-roles.md) + names them and resolves the peer-model question: - Ed25519 and bearer token are resolved (ADR-030 + OQ-29). The X.509 case - that remains open is **outgoing-only**: a client connects to a public - X.509 endpoint (e.g., `api.alk.dev`). The client must verify the server - cert against a CA (rustls's `WebPkiServerVerifier`) — the - `AcceptAnyServerCertVerifier` is a security hole for X.509. The server - may or may not require a client cert (most public X.509 endpoints - won't — browsers can't easily do TLS client-auth). + 1. **Public X.509 endpoint** — a remote HTTPS / `alknet/call`-over-TLS + server reachable by domain, authenticated by CA verification + (`WebPkiServerVerifier`). The local node is a *client*; it + authenticates by bearer token. **Not a `PeerEntry` on the client + side** — it is not in the call-protocol peer graph (ADR-029), gets + no `PeerId`, and is not addressable via `PeerRef::Specific`. 
Ops + discovered via `from_call`/`from_openapi`/`from_mcp` land in the + connection's Layer 2 overlay and are invoked through the + connection handle. + 2. **Transport relay** — iroh's DERP-equivalent (`iroh-relay`). + Infrastructure, not an alknet peer; no `PeerEntry` / `PeerId`. + Inherited with the `iroh` feature; its identity is iroh's concern. + 3. **Hub / hosting node** — an alknet application peer (head/worker + hub, git-hosting hub) that *also* exposes a public domain + X.509 + for browsers. A single `PeerEntry` with **mixed fingerprints** + (`ed25519:...` + `SHA256:...`), already supported by ADR-030. + Browsers connecting to it are *not* alknet peers — served by + `alknet-http`, bearer-token auth, no `PeerId`. - What's resolved: - - The `PeerEntry.fingerprints` field accepts X.509 fingerprints - (`SHA256:`) alongside Ed25519 fingerprints. - - The client-side verifier is key-type-aware (OQ-29): raw keys use - fingerprint-matching, X.509 uses CA verification. + **The "make `PeerEntry` symmetric" instinct is rejected.** `PeerEntry` + is for peers in the call-protocol peer graph; pure-client connections + to public X.509 endpoints are not in that graph on the client side. + The asymmetry reflects a real trust-model difference: known peers have + stable logical identities (pin the fingerprint); public APIs don't + (trust the CA, hold the connection handle directly). - What's open: - - How does the outgoing X.509 case interact with `PeerEntry`? If a - client connects to `api.alk.dev` (X.509, no client-auth), the client - doesn't present a cert, so the server has no fingerprint to resolve. - The client authenticates via `auth_token` (the bearer-token path). - The server's `PeerEntry` for this client uses `auth_token_hash`, not - `fingerprints`. This works — but the server's `PeerEntry` might not - have a fingerprint at all for an HTTP-only client. 
- - Conversely, if the server requires X.509 client-auth (mutual TLS), - the client presents its X.509 cert, the server extracts the - `SHA256:` fingerprint, and `PeerEntry.fingerprints` matches it. - This works too. - - The open question is whether there are cases where X.509 server - identity needs to be part of the `PeerEntry` model (the server's - identity, not the client's) — e.g., for the client to know "I'm - connected to `api.alk.dev`, which is peer-id `api-server`." Currently - `PeerEntry` is about the *remote* peer's credentials, as seen by the - *local* node. For an outgoing connection, the local node is the - client, and `PeerEntry` describes the server. This may need a - design pass to make sure the model is symmetric. + **Client-side verifier selection rule (extends OQ-29):** known peer + (`PeerEntry` present) → fingerprint pin (Ed25519 `ed25519:` or + X.509 `SHA256:`); unknown X.509 remote (`PeerEntry` absent) → CA + verification. An unknown Ed25519 raw-key remote cannot be verified at + all (no CA fallback) and fails closed — same model as iroh. + + **Downstream, not blocking, recorded so they don't get lost:** + WebTransport relay-as-proxy (browser → proxy → P2P hub) is deferred + with the rest of h3/WebTransport (alknet-http DH-2); ADR-030 §6's + fingerprint normalization already keeps the proxied path clean. On- + chain / smart-contract peer discovery (relays syncing git repos via + iroh gossip) is a *source* of `PeerEntry` records, fits the OQ-36 + repo/adapter pattern (`alknet-peer-store-onchain` implementing + `IdentityProvider`), and does not change the auth model. Not blocking the ADR-029 migration — the Ed25519 path is the primary - use case and it's resolved. The X.509 outgoing-only case is a real - question but it's downstream (the HTTP crate phase, when - `from_openapi`/`from_mcp` handlers connect to X.509 endpoints). 
-- **Cross-references**: ADR-027, ADR-029, ADR-030, OQ-29, - [client-and-adapters.md](crates/call/client-and-adapters.md), + use case and was already resolved; this ADR closes the X.509 + outgoing-only remainder. +- **Cross-references**: ADR-027, ADR-029, ADR-030, ADR-033, ADR-034, + OQ-29, OQ-36, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md), [auth.md](crates/core/auth.md) \ No newline at end of file diff --git a/docs/research/alknet-http/phase-0-findings.md b/docs/research/alknet-http/phase-0-findings.md index cdb1f42..90f5892 100644 --- a/docs/research/alknet-http/phase-0-findings.md +++ b/docs/research/alknet-http/phase-0-findings.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-25 +last_updated: 2026-06-28 --- # alknet-http — Phase 0 Research Findings @@ -166,6 +166,19 @@ HTTP/2 is sufficient. WebTransport is the browser path for the agent service lands as a fast-follow when the agent service needs browser streaming. This keeps v1 focused on the adapter + REST surface. Two-way door. +**WebTransport relay-as-proxy (recorded via ADR-034, not a v1 item):** a +distinct WebTransport feature — a proxy that terminates the browser's +WebTransport connection and forwards encrypted traffic to a P2P hub's +Ed25519 endpoint (so the hub need not expose its own public X.509 cert) +— belongs in this same deferral bucket. It does not change the auth +model: the browser still authenticates by bearer token, the hub still +resolves it via `PeerEntry.auth_token_hash`, and the proxy is +transport-only. ADR-030 §6's fingerprint normalization +(`ed25519:` across quinn/iroh) was already designed to keep the +proxied path clean. See +[ADR-034](../../architecture/decisions/034-outgoing-only-x509-and-three-peer-roles.md) +§5 for the recording. + ### DH-3: How does HTTP map to call-protocol operations? 
*(One-way door — needs an ADR)* @@ -335,6 +348,12 @@ implementation detail; the credential (API key/token) comes from strategy for generated OpenAPI specs (tied to the registry's External operation set version) needs specifying. One-way door after first publication. +- **OQ-HTTP-07 (WebTransport relay-as-proxy)**: a WebTransport proxy that + fronts a P2P hub for browsers (so the hub need not expose public X.509) + is a real feature for the browser-to-P2P-peer case. Deferred with h3 / + WebTransport (DH-2); recorded in ADR-034 §5 so it is not lost. Does not + change the auth model (bearer token + `PeerEntry.auth_token_hash`; + proxy is transport-only). Two-way door; lands with the `h3` fast-follow. ## Next Steps (Phase 0 → Phase 1)