docs(arch): multi-credential PeerEntry, resolve OQ-29, dissolve OQ-35, add OQ-37
Amend ADR-030 with three changes from the auth-type analysis:

1. PeerEntry is now multi-credential: fingerprints: Vec<String> (Ed25519
   and/or X.509) + auth_token_hash: Option<String> (bearer token). All
   resolve to the same peer_id. A peer that authenticates via Ed25519
   today and via auth_token tomorrow gets the same PeerId. The 'peer
   bearer vs auth bearer' distinction was wrong; the correct framing is
   the three credential types (Ed25519, X.509, bearer token) and whether
   the token needs a stable logical id across rotation (PeerEntry) or
   not (ApiKeyEntry).

2. Fingerprint normalization (§6): quinn extracts the raw Ed25519 public
   key from the SPKI cert and formats it as ed25519:<hex>, matching iroh.
   The same key has the same fingerprint regardless of transport. X.509
   fingerprints stay as SHA256:<hex of DER>. This also simplifies the
   coming WebTransport relay work.

3. The 'API keys' section is replaced with 'Bearer tokens', correctly
   framing the three auth types and the two bearer-token paths
   (PeerEntry.auth_token_hash vs ApiKeyEntry).

Resolve OQ-29 (CallClient TLS client-auth): wire quinn client-auth
(present the Ed25519 key as a raw public key client cert; the server-side
extraction already works); key-type-aware server cert verification (raw
key = fingerprint match, X.509 = CA verification via
WebPkiServerVerifier; AcceptAnyServerCertVerifier is only safe for raw
keys); fingerprint normalization. The iroh path already works (RFC 7250
raw keys, both sides exchange automatically); the gap was quinn-only.

Dissolve OQ-35: the 'API key asymmetry' framing was wrong. PeerEntry
supports multiple credential paths; ApiKeyEntry is for tokens that ARE
the identity.

Add OQ-37: X.509 outgoing-only case (the three auth types and how X.509
server identity fits the peer model). Not blocking the ADR-029
migration; downstream (HTTP crate phase).

Update auth.md, config.md, client-and-adapters.md, call/README.md,
core/README.md, open-questions.md, README.md, and call_client.rs source
comment.

Workspace green: 326 tests pass, build clean.
@@ -414,73 +414,52 @@ is a feature extension, not an unmade architecture decision.
 ### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification

 - **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §7
-- **Status**: **open, load-bearing on ADR-030** (not "additive" as previously framed)
+- **Status**: **resolved** (2026-06-27 by ADR-030 §6 + this decision)
 - **Door type**: One-way (identity model interaction), two-way (mechanism)
-- **Priority**: **high** (was medium; promoted: this is the activation path
-for ADR-030's `PeerEntry` fingerprint → `peer_id` resolution)
-- **Resolution**: **Previously framed as "additive, two-way-door
-remainder." That framing is incorrect.** ADR-030 makes `PeerId =
-Identity.id = PeerEntry.peer_id` on the fingerprint path. But the
-fingerprint path requires the client to present a TLS client cert, and
-the current `CallClient::connect()` uses `with_no_client_auth()`: no
-client cert is presented, no fingerprint is extracted by the server's
-`AcceptAnyCertVerifier`, and `IdentityProvider::resolve_from_fingerprint`
-returns `None`. The peer gets no `PeerId` from the fingerprint path.
+- **Priority**: ~~high~~ → resolved
+- **Resolution**: **Three things are decided:**

-The `auth_token` path (`resolve_from_token`) still works, but it
-resolves to `Identity.id = ApiKeyEntry.prefix` (the API-key identity
-path), **not** to `PeerEntry.peer_id`. So with TLS client-auth unwired,
-a calling peer's `PeerId` is either `None` (no client cert) or an
-API-key prefix (if an `auth_token` is used); neither is the stable
-`PeerEntry.peer_id` that ADR-030 commits. The PeerEntry path is dormant
-until client-auth is wired.
+1. **Wire quinn client-auth.** The client presents its Ed25519 key as an
+RFC 7250 raw public key client cert (the client-side equivalent of
+the server's `RawKeyCertResolver`). The server's
+`AcceptAnyCertVerifier` already requests client certs and extracts
+the fingerprint; the gap was entirely on the client side
+(`with_no_client_auth()` → present the key). This activates the
+`PeerEntry` fingerprint → `peer_id` resolution path on quinn
+connections.

-This is not a "two-way-door remainder"; it's the activation path for
-ADR-030's primary use case (stable `peer_id` across key rotation for
-peer-keyed overlays). The decision to make is:
+2. **Key-type-aware server cert verification.** The client's
+`ServerCertVerifier` depends on the remote's identity type:
+- **Ed25519 raw key** (the common case): accept the cert, extract the
+fingerprint, match against `PeerEntry.fingerprints`. The fingerprint
+IS the trust anchor; no CA needed. (Same model as iroh.)
+- **X.509** (domain-facing endpoints, ACME): verify against a CA
+(rustls's `WebPkiServerVerifier` with the platform root store or a
+configured CA). `AcceptAnyServerCertVerifier` is a security hole for
+X.509; it's only safe for raw keys.
+- The verifier choice is driven by `CallCredentials.remote_identity`,
+which carries the expected key type.

-- **(a)** Wire TLS client-auth as part of the ADR-029 migration, so the
-fingerprint → `PeerEntry` → `peer_id` path is live from day one. The
-server's `AcceptAnyCertVerifier` already requests (but doesn't verify)
-client certs; the client's `with_no_client_auth()` is the gap. Wiring
-the local node's `RawKey`/`X509` identity as a rustls client-auth cert
-is the missing piece. Remote-identity verification (plugging
-`credentials.remote_identity` into a real `ServerCertVerifier`) is
-genuinely additive: the server-side fingerprint extraction is what
-matters for `PeerId`, not the client-side verification of the server.
+3. **Fingerprint normalization** (ADR-030 §6): the quinn path extracts
+the raw Ed25519 public key from the SPKI cert and formats it as
+`ed25519:<hex>`, matching iroh. The same key has the same fingerprint
+regardless of transport. X.509 fingerprints stay as `SHA256:<hex of
+DER>`.

-- **(b)** Ship the ADR-029 migration with `auth_token`-only peer identity
-and treat TLS client-auth as a follow-up. This means `PeerCompositeEnv`
-keys on `Identity.id = ApiKeyEntry.prefix` (the token prefix) until
-client-auth is wired, then switches to `PeerEntry.peer_id` when it is.
-The switch is a behavior change for any deployment that built on the
-token-prefix identity: the `PeerId` changes from the prefix to the
-`peer_id`. This is the "compounds into a mess" path.
+**The iroh path already works**: iroh uses RFC 7250 raw keys, both
+sides automatically exchange Ed25519 public keys during the TLS
+handshake, and `extract_iroh_client_fingerprint` already gets the
+`NodeId`. No client-auth wiring needed for iroh (direct or relay). The
+gap was quinn-only.

-- **(c)** Extend `PeerEntry` to also cover `auth_token`-based peer
-identity: a peer entry keyed by token prefix (or a `PeerEntry.token`
-field) instead of (or alongside) fingerprint. This unifies the two
-identity paths under `PeerEntry`, so the `PeerId` is stable regardless
-of which credential path the peer used. This is a design change to
-ADR-030, not just an implementation choice.
+**What's genuinely additive** (not blocking the ADR-029 migration):
+remote-identity verification (the client verifying the server's
+fingerprint against an expected value) is additive: the server-side
+fingerprint extraction is what matters for `PeerId`, not the client-side
+verification. The verifier for raw keys can start as "accept any, extract
+fingerprint" and add fingerprint-pinning later.

-**The X.509 / raw-key wrinkle:** the vast majority of end users will use
-Ed25519 raw keys (RFC 7250), the same key type as SSH keys, native to
-iroh's `NodeId` model. The fingerprint format for raw keys is
-`ed25519:<hex>`. For X.509 (public-facing endpoints like
-`api.alk.dev`, relays), the fingerprint is `SHA256:<hex of DER>`, a
-different format, a different key type, but the same `PeerEntry.fingerprint`
-field. The `IdentityProvider::resolve_from_fingerprint` path is
-format-agnostic (it's a string match against `PeerEntry.fingerprint`),
-so both key types work once client-auth is wired. The wrinkle is on the
-client side: presenting an Ed25519 raw key as a TLS client cert uses a
-different rustls path than presenting an X.509 cert. Both are supported
-by rustls; the `CallCredentials.tls_identity` field already carries the
-`TlsIdentity` enum (RawKey / X509). The wiring is per-variant.
-
-**Not decided yet.** This OQ is promoted to high priority and requires a
-decision before the ADR-029 migration lands. The previous "additive,
-two-way-door remainder" framing is struck.
+See ADR-030 §6 for the fingerprint normalization details.
 - **Cross-references**: ADR-014, ADR-017, ADR-027, ADR-029, ADR-030,
 [client-and-adapters.md](crates/call/client-and-adapters.md),
 [endpoint.md](crates/core/endpoint.md), [auth.md](crates/core/auth.md)
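
The fingerprint normalization in item 3 of the OQ-29 resolution can be sketched roughly as below. This is an illustration only, not the code in this commit: the function names are hypothetical, and lowercase hex is an assumption (the text only says `<hex>`). The one fixed fact it relies on is the Ed25519 SubjectPublicKeyInfo encoding from RFC 8410, a 12-byte DER prefix followed by the 32-byte raw key.

```rust
use std::convert::TryFrom;
use std::fmt::Write;

/// DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410). The full
/// structure is 44 bytes; the last 32 bytes are the raw public key.
pub const ED25519_SPKI_PREFIX: [u8; 12] = [
    0x30, 0x2a, 0x30, 0x05, 0x06, 0x03, 0x2b, 0x65, 0x70, 0x03, 0x21, 0x00,
];

/// Hypothetical helper: extract the raw 32-byte key from an Ed25519 SPKI
/// blob. Returns `None` for any other key type or a malformed length.
pub fn raw_ed25519_from_spki(spki_der: &[u8]) -> Option<[u8; 32]> {
    let key = spki_der.strip_prefix(&ED25519_SPKI_PREFIX[..])?;
    <[u8; 32]>::try_from(key).ok()
}

/// Hypothetical helper: format a raw key as `ed25519:<hex>`.
/// Lowercase hex is an assumption of this sketch.
pub fn ed25519_fingerprint(raw_key: &[u8; 32]) -> String {
    let mut out = String::with_capacity(8 + 64);
    out.push_str("ed25519:");
    for byte in raw_key {
        write!(out, "{:02x}", byte).expect("writing to a String cannot fail");
    }
    out
}

/// Quinn-side normalization: SPKI in, transport-independent fingerprint out.
pub fn normalize_quinn_fingerprint(spki_der: &[u8]) -> Option<String> {
    raw_ed25519_from_spki(spki_der).map(|key| ed25519_fingerprint(&key))
}
```

Because the fingerprint is derived from the raw key alone, the same key would yield the same string whether it arrives wrapped in an SPKI on the quinn path or as a bare key on the iroh path, which is the property the resolution asks for.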
@@ -615,28 +594,30 @@ is a feature extension, not an unmade architecture decision.

 ## Theme: Storage and Adapters

-### OQ-35: API Key Identity vs Peer Identity
+### OQ-35: ~~API Key Identity vs Peer Identity~~ (Dissolved)

 - **Origin**: ADR-030 §"API keys" (the asymmetry between the two auth paths)
-- **Status**: resolved (recorded by ADR-030, not a blocker)
-- **Door type**: One-way (the asymmetry is deliberate, not an oversight)
-- **Priority**: medium
-- **Resolution**: The fingerprint auth path gets the `PeerEntry`
-id-decoupling treatment (`Identity.id = peer_id`, stable across key
-rotation); the API-key auth path does not (`Identity.id = prefix`,
-changes with the key). This is deliberate:
+- **Status**: **dissolved** (2026-06-27: the framing was wrong)
+- **Door type**: ~~One-way~~
+- **Priority**: ~~medium~~
+- **Resolution**: **Dissolved.** The original framing ("the fingerprint
+path gets `PeerEntry` id-decoupling, the API-key path doesn't; the
+asymmetry is deliberate") was based on a false distinction between "peer
+bearer" and "auth bearer" tokens. The correct framing is the three
+credential types (Ed25519, X.509, bearer token) and whether the token
+needs a stable logical id across rotation:

-- Node identity (fingerprint path) must survive key rotation: the
-same logical node rotates its TLS key, and every ACL entry / routing
-reference to it should stay stable. `PeerEntry` provides this.
-- Bearer-token identity (API-key path) IS the token: rotating the key
-means a new prefix and a new identity, by design (revocation is the
-rotation mechanism for API keys). Decoupling the API key identity
-from the prefix would solve a problem API keys don't have.
+- `PeerEntry` supports multiple credential paths: `fingerprints: Vec<String>`
+(Ed25519 and/or X.509) + `auth_token_hash: Option<String>` (bearer
+token). All resolve to the same `peer_id`.
+- `ApiKeyEntry` is for bearer tokens that ARE the identity (rotation =
+new identity, no stable logical id needed).

-The asymmetry is documented in `auth.md` ("API keys vs peer entries")
-and in ADR-030 §"API keys" so it's explicit, not an oversight. See
-[auth.md](crates/core/auth.md) for the table comparing the two paths.
+A bearer token that is one credential path among several for a stable
+peer goes in `PeerEntry.auth_token_hash`. A bearer token that IS the
+identity stays in `ApiKeyEntry`. The distinction is whether the token
+needs a stable logical id across rotation, not "peer bearer vs auth
+bearer." See ADR-030 §"Bearer tokens."
 - **Cross-references**: ADR-030, [auth.md](crates/core/auth.md),
 [config.md](crates/core/config.md)
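
The two bearer-token paths and the multi-credential `PeerEntry` described in the OQ-35 resolution could look roughly like this. It is a minimal sketch under stated assumptions: the three `PeerEntry` fields and `ApiKeyEntry.prefix` come from the text, while `ApiKeyEntry.key_hash`, the `Registry` type, and the lookup signatures are hypothetical stand-ins for whatever `IdentityProvider` actually does. Token hashing is left out; the caller is assumed to pass an already-hashed token.

```rust
/// A stable logical peer with several credential paths (fields per ADR-030).
#[derive(Debug, Clone)]
pub struct PeerEntry {
    pub peer_id: String,
    /// `ed25519:<hex>` and/or `SHA256:<hex of DER>` fingerprints.
    pub fingerprints: Vec<String>,
    /// Hash of a bearer token that is one credential path for this peer.
    pub auth_token_hash: Option<String>,
}

/// A bearer token that IS the identity: rotating it yields a new identity.
#[derive(Debug, Clone)]
pub struct ApiKeyEntry {
    pub prefix: String,
    /// Hypothetical field for this sketch; not named in the source text.
    pub key_hash: String,
}

/// Hypothetical in-memory lookup table used only for illustration.
pub struct Registry {
    pub peers: Vec<PeerEntry>,
    pub api_keys: Vec<ApiKeyEntry>,
}

impl Registry {
    /// Fingerprint path: any listed fingerprint resolves to the `peer_id`.
    pub fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<&str> {
        self.peers
            .iter()
            .find(|p| p.fingerprints.iter().any(|f| f == fingerprint))
            .map(|p| p.peer_id.as_str())
    }

    /// Token path: a `PeerEntry` token resolves to the same `peer_id` as
    /// the fingerprints; otherwise fall back to the API-key prefix.
    pub fn resolve_from_token_hash(&self, token_hash: &str) -> Option<&str> {
        self.peers
            .iter()
            .find(|p| p.auth_token_hash.as_deref() == Some(token_hash))
            .map(|p| p.peer_id.as_str())
            .or_else(|| {
                self.api_keys
                    .iter()
                    .find(|k| k.key_hash == token_hash)
                    .map(|k| k.prefix.as_str())
            })
    }
}
```

With this shape, a peer that authenticates by Ed25519 fingerprint on one connection and by `auth_token` on the next resolves to the same id, while an `ApiKeyEntry` token resolves to its own prefix and changes identity on rotation.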
@@ -679,4 +660,62 @@ is a feature extension, not an unmade architecture decision.
 pattern is committed, the in-memory adapters ship with core, and the
 persistence adapter shapes are the open exploration.
 - **Cross-references**: ADR-030, ADR-031, ADR-033, OQ-34,
-[auth.md](crates/core/auth.md), [config.md](crates/core/config.md)
+[auth.md](crates/core/auth.md), [config.md](crates/core/config.md)
+
+## Theme: TLS Identity
+
+### OQ-37: X.509 Outgoing-Only Case (Three Auth Types)
+
+- **Origin**: ADR-030 §"Bearer tokens" (the three credential types), the
+discussion that X.509 is fundamentally different from Ed25519
+- **Status**: open (lingering; the X.509 server-identity case needs design)
+- **Door type**: One-way (how X.509 server identity integrates with the
+peer model)
+- **Priority**: medium
+- **Resolution**: The three credential types are: Ed25519 raw key (the
+common case, normalized to `ed25519:<hex>` across quinn/iroh), X.509
+(domain-facing endpoints, ACME, `SHA256:<hex>`), and bearer token
+(`PeerEntry.auth_token_hash` or `ApiKeyEntry`).
+
+Ed25519 and bearer token are resolved (ADR-030 + OQ-29). The X.509 case
+that remains open is **outgoing-only**: a client connects to a public
+X.509 endpoint (e.g., `api.alk.dev`). The client must verify the server
+cert against a CA (rustls's `WebPkiServerVerifier`); the
+`AcceptAnyServerCertVerifier` is a security hole for X.509. The server
+may or may not require a client cert (most public X.509 endpoints
+won't; browsers can't easily do TLS client-auth).
+
+What's resolved:
+- The `PeerEntry.fingerprints` field accepts X.509 fingerprints
+(`SHA256:<hex of DER>`) alongside Ed25519 fingerprints.
+- The client-side verifier is key-type-aware (OQ-29): raw keys use
+fingerprint-matching, X.509 uses CA verification.
+
+What's open:
+- How does the outgoing X.509 case interact with `PeerEntry`? If a
+client connects to `api.alk.dev` (X.509, no client-auth), the client
+doesn't present a cert, so the server has no fingerprint to resolve.
+The client authenticates via `auth_token` (the bearer-token path).
+The server's `PeerEntry` for this client uses `auth_token_hash`, not
+`fingerprints`. This works, but the server's `PeerEntry` might not
+have a fingerprint at all for an HTTP-only client.
+- Conversely, if the server requires X.509 client-auth (mutual TLS),
+the client presents its X.509 cert, the server extracts the
+`SHA256:<hex>` fingerprint, and `PeerEntry.fingerprints` matches it.
+This works too.
+- The open question is whether there are cases where X.509 server
+identity needs to be part of the `PeerEntry` model (the server's
+identity, not the client's), e.g., for the client to know "I'm
+connected to `api.alk.dev`, which is peer-id `api-server`." Currently
+`PeerEntry` is about the *remote* peer's credentials, as seen by the
+*local* node. For an outgoing connection, the local node is the
+client, and `PeerEntry` describes the server. This may need a
+design pass to make sure the model is symmetric.
+
+Not blocking the ADR-029 migration: the Ed25519 path is the primary
+use case and it's resolved. The X.509 outgoing-only case is a real
+question but it's downstream (the HTTP crate phase, when
+`from_openapi`/`from_mcp` handlers connect to X.509 endpoints).
+- **Cross-references**: ADR-027, ADR-029, ADR-030, OQ-29,
+[client-and-adapters.md](crates/call/client-and-adapters.md),
+[endpoint.md](crates/core/endpoint.md), [auth.md](crates/core/auth.md)
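
The resolved half of OQ-37, the key-type-aware choice of client-side verifier, might be modelled as follows. Everything here is a hypothetical sketch: `RemoteIdentity` stands in for whatever `CallCredentials.remote_identity` actually carries, `VerifierPlan` and both functions are invented for illustration, and no rustls types are touched. The point is only the branching rule: raw keys are pinned by fingerprint, X.509 is delegated to CA verification and never accepted on a fingerprint match alone.

```rust
/// Hypothetical shape of the expected remote identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RemoteIdentity {
    /// Ed25519 raw key (RFC 7250); the fingerprint is the trust anchor.
    Ed25519RawKey { fingerprint: String },
    /// X.509 endpoint, identified by the name the CA chain must cover.
    X509 { server_name: String },
}

/// Hypothetical description of which verification strategy to build.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VerifierPlan {
    /// Accept the raw key only if its fingerprint matches the pinned one.
    PinFingerprint(String),
    /// Delegate to CA verification (e.g. rustls's `WebPkiServerVerifier`).
    CaVerify { server_name: String },
}

/// Pick the strategy from the expected key type.
pub fn plan_verifier(remote: &RemoteIdentity) -> VerifierPlan {
    match remote {
        RemoteIdentity::Ed25519RawKey { fingerprint } => {
            VerifierPlan::PinFingerprint(fingerprint.clone())
        }
        RemoteIdentity::X509 { server_name } => VerifierPlan::CaVerify {
            server_name: server_name.clone(),
        },
    }
}

/// Raw-key acceptance check. An X.509 plan never passes here, because
/// accepting any presented cert is only safe for raw keys.
pub fn raw_key_cert_accepted(plan: &VerifierPlan, presented_fingerprint: &str) -> bool {
    match plan {
        VerifierPlan::PinFingerprint(expected) => expected == presented_fingerprint,
        VerifierPlan::CaVerify { .. } => false,
    }
}
```

How the server's own X.509 identity would map onto a `PeerEntry` on the client side is the part this sketch deliberately leaves out, since that is the open question.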