alknet/docs/architecture/decisions/030-peerentry-and-identity-id-decoupling.md
glm-5.2 7d812af8f4 docs(arch): multi-credential PeerEntry, resolve OQ-29, dissolve OQ-35, add OQ-37
Amend ADR-030 with three changes from the auth-type analysis:

1. PeerEntry is now multi-credential: fingerprints: Vec<String> (Ed25519
   and/or X.509) + auth_token_hash: Option<String> (bearer token). All
   resolve to the same peer_id. A peer that authenticates via Ed25519
   today and via auth_token tomorrow gets the same PeerId. The 'peer
   bearer vs auth bearer' distinction was wrong — the correct framing is
   the three credential types (Ed25519, X.509, bearer token) and whether
   the token needs a stable logical id across rotation (PeerEntry) or not
   (ApiKeyEntry).

2. Fingerprint normalization (§6): quinn extracts the raw Ed25519 public
   key from the SPKI cert and formats as ed25519:<hex>, matching iroh.
   The same key has the same fingerprint regardless of transport. X.509
   fingerprints stay as SHA256:<hex of DER>. This also simplifies the
   coming WebTransport relay work.

3. The 'API keys' section is replaced with 'Bearer tokens' — correctly
   framing the three auth types and the two bearer-token paths
   (PeerEntry.auth_token_hash vs ApiKeyEntry).

Resolve OQ-29 (CallClient TLS client-auth): wire quinn client-auth (present
Ed25519 key as raw public key client cert — the server-side extraction
already works); key-type-aware server cert verification (raw key =
fingerprint match, X.509 = CA verification via WebPkiServerVerifier —
AcceptAnyServerCertVerifier is only safe for raw keys); fingerprint
normalization. The iroh path already works (RFC 7250 raw keys, both sides
exchange automatically); the gap was quinn-only.

Dissolve OQ-35: the 'API key asymmetry' framing was wrong. PeerEntry
supports multiple credential paths; ApiKeyEntry is for tokens that ARE the
identity.

Add OQ-37: X.509 outgoing-only case — the three auth types and how X.509
server identity fits the peer model. Not blocking the ADR-029 migration;
downstream (HTTP crate phase).

Update auth.md, config.md, client-and-adapters.md, call/README.md,
core/README.md, open-questions.md, README.md, and call_client.rs source
comment.

Workspace green: 326 tests pass, build clean.
2026-06-28 08:49:36 +00:00

# ADR-030: PeerEntry and Identity.id Decoupling
## Status
Accepted (supersedes the "v1 UUID" source in ADR-029 Assumption 1; resolves
the "real solution" half of OQ-33 and the storage-boundary half of OQ-34)
## Context
`Identity.id` is the string that keys authorization decisions across the
alknet crate graph. Today it is **coupled to the cryptographic material**:
```rust
// crates/alknet-core/src/config.rs (current implementation)
pub struct AuthPolicy {
    pub authorized_fingerprints: HashSet<String>, // just strings, no stable id
    pub api_keys: Vec<ApiKeyEntry>,
}

impl AuthPolicy {
    pub fn resolve_identity_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        if self.authorized_fingerprints.contains(fingerprint) {
            Some(Identity {
                id: fingerprint.to_string(), // ← identity IS the crypto material
                scopes: vec!["relay:connect".to_string()],
                ...
            })
        } else {
            None
        }
    }
}
```
This coupling is a latent bug for any cross-node authorization decision:
- A TLS fingerprint or raw-key identity changes when the node rotates its key.
- When it changes, every ACL entry that references the old fingerprint stops
matching — the peer "disappears" from the authorization system even though
it is the same logical node.
- `PeerRef::Specific(PeerId)` (ADR-029) routes by `Identity.id`; a key
rotation would break in-flight routing references the same way.
- The hub's `authorized_fingerprints` set has to be manually updated on every
rotation on the *remote* side, which is exactly the operational pain the
  vault's local key rotation (ADR-021) was meant to remove.

ADR-029 §1 set `PeerId = Identity.id` and specified `PeerId` as a logical
identifier, "NOT `Identity.id` (the fingerprint)", meaning not the
fingerprint value that `Identity.id` carries today. It left the *source* of
that logical identifier as a connection-assigned UUID (OQ-33's v1
workaround). The UUID is ephemeral: it lives only as long as the
connection, changes on reconnect, and cannot persist across restarts or key
rotations. It is a no-storage workaround, not a real identity.

The research at `docs/research/alknet-storage-strategy/findings.md` §4
established the real fix: introduce a `PeerEntry` config model that maps a
**stable logical peer id** to its current cryptographic material and
authorization scopes, and have `ConfigIdentityProvider` resolve
fingerprint → `PeerEntry` → `Identity { id: peer_entry.peer_id, scopes:
peer_entry.scopes, ... }`. `Identity.id` becomes the stable `peer_id`,
decoupled from the fingerprint. Key rotation is a single field update in the
peer entry; the `peer_id` and every ACL and routing reference to it stay
stable.

This is the storage-boundary question OQ-34 tracks. ADR-033 (the
repo/adapter pattern) establishes that core defines repo traits and that the
default in-memory adapter lives alongside the trait. Under that pattern,
core gets the `PeerEntry` config model and the
`ConfigIdentityProvider::resolve_from_fingerprint → Identity { id: peer_id
}` resolution path now, with no SQLite dependency in core. A future
`alknet-peer-store-sqlite` adapter that persists `PeerEntry` records is
additive: it implements the same `IdentityProvider` trait against a `peers`
table instead of config. The trait is the one-way door; the adapter is the
two-way door.
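
A minimal sketch of the one-way/two-way door split follows. It assumes a simplified `IdentityProvider` with a single `resolve_from_fingerprint` method. The real trait lives in core (ADR-004), and its exact signature is not reproduced here. `TableIdentityProvider` is a hypothetical stand-in for a future persisted adapter such as `alknet-peer-store-sqlite`, and it uses an in-memory map rather than SQLite. Callers hold only `&dyn IdentityProvider`, so swapping the adapter leaves them untouched.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Simplified, hypothetical trait shape (an assumption for this sketch).
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
    pub scopes: Vec<String>,
    pub enabled: bool,
}

/// Config-backed adapter: scans the `peers` list loaded from config.
pub struct ConfigIdentityProvider {
    pub peers: Vec<PeerEntry>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.peers
            .iter()
            .find(|p| p.enabled && p.fingerprints.iter().any(|f| f == fingerprint))
            .map(|p| Identity { id: p.peer_id.clone(), scopes: p.scopes.clone() })
    }
}

/// Hypothetical stand-in for a persisted adapter: rows keyed by fingerprint.
/// A real store would query a `peers` table instead of a HashMap.
pub struct TableIdentityProvider {
    pub rows: HashMap<String, Identity>,
}

impl IdentityProvider for TableIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.rows.get(fingerprint).cloned()
    }
}

/// Callers depend only on the trait, never on a concrete adapter.
pub fn peer_id_for(provider: &dyn IdentityProvider, fingerprint: &str) -> Option<String> {
    provider.resolve_from_fingerprint(fingerprint).map(|identity| identity.id)
}
```

Both adapters answer the same question with the same `peer_id`. Replacing one with the other is a wiring change, not an API change.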
## Decision
### 1. Add `PeerEntry` to `AuthPolicy`, replacing `authorized_fingerprints`
```rust
use std::collections::HashMap;

pub struct PeerEntry {
    /// Stable logical peer id ("worker-a", "alice"). Does NOT change on
    /// key rotation. This becomes Identity.id on resolution, regardless of
    /// which credential path resolved the identity.
    pub peer_id: String,
    /// TLS fingerprints for this peer, one or more. A peer may have
    /// multiple keys (e.g., an Ed25519 raw key for P2P and an X.509 cert
    /// for domain-facing endpoints). Resolution matches against any entry.
    /// Format: "ed25519:<hex of 32-byte pub key>" for RFC 7250 raw keys
    /// (normalized across quinn and iroh; see §6), "SHA256:<hex>" for
    /// X.509 certs (DER hash). Changes on key rotation.
    pub fingerprints: Vec<String>,
    /// Optional bearer-token authentication for this peer. A peer that
    /// also authenticates via auth_token (e.g., HTTP clients that can't
    /// do TLS client-auth) stores the SHA-256 hash of the token here.
    /// Resolution via resolve_from_token matches this field and returns
    /// the same Identity { id: peer_id, ... } as the fingerprint path.
    pub auth_token_hash: Option<String>,
    /// Authorization scopes granted to this peer. Resolved into
    /// Identity.scopes.
    pub scopes: Vec<String>,
    /// Named resource lists granted to this peer. Resolved into
    /// Identity.resources.
    pub resources: HashMap<String, Vec<String>>,
    /// Human-readable display name for logs / UIs. Optional.
    pub display_name: Option<String>,
    /// Whether this peer is authorized at all. false = recognized but
    /// disabled (revoked). Resolution returns None.
    pub enabled: bool,
}

pub struct AuthPolicy {
    /// Replaces authorized_fingerprints: HashSet<String>. Each entry maps
    /// a stable logical peer_id to its credential paths (fingerprints,
    /// optional auth_token_hash) + scopes + resources. The list is keyed
    /// by peer_id; resolution looks up by fingerprint OR auth_token.
    pub peers: Vec<PeerEntry>,
    /// API keys for bearer-token auth where the token IS the identity
    /// (rotation = new identity). Peers that need a stable logical id
    /// across credential rotation use PeerEntry.auth_token_hash instead.
    /// See "Bearer tokens" below.
    pub api_keys: Vec<ApiKeyEntry>,
}
```
### 2. `Identity.id` becomes `PeerEntry.peer_id` on resolution (any credential path)
`ConfigIdentityProvider::resolve_from_fingerprint` resolves fingerprint →
matching `PeerEntry` (by any entry in `fingerprints`) → `Identity { id:
peer_entry.peer_id, ... }`. `ConfigIdentityProvider::resolve_from_token`
resolves token → matching `PeerEntry` (by `auth_token_hash`) → the same
`Identity { id: peer_entry.peer_id, ... }`. Both paths produce the same
`Identity` — the `peer_id` is the stable logical id regardless of how the
peer authenticated.
```rust
impl AuthPolicy {
    pub fn resolve_identity_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.peers
            .iter()
            .find(|p| p.enabled && p.fingerprints.iter().any(|f| f == fingerprint))
            .map(|p| Identity {
                id: p.peer_id.clone(),
                scopes: p.scopes.clone(),
                resources: p.resources.clone(),
            })
    }

    pub fn resolve_identity_from_token(&self, token: &str) -> Option<Identity> {
        // SHA-256 of the presented token (helper not shown)
        let token_hash = sha256(token);
        self.peers
            .iter()
            // Option<&str> must be compared with Option<&str>, hence as_str()
            .find(|p| p.enabled && p.auth_token_hash.as_deref() == Some(token_hash.as_str()))
            .map(|p| Identity {
                id: p.peer_id.clone(),
                scopes: p.scopes.clone(),
                resources: p.resources.clone(),
            })
            .or_else(|| self.resolve_api_key(token)) // fall through to ApiKeyEntry
    }
}
```
If the token doesn't match any `PeerEntry.auth_token_hash`, resolution falls
through to `resolve_api_key` (the `ApiKeyEntry` path, where `Identity.id =
prefix`). This preserves the existing API-key path for bearer tokens that
ARE the identity, while adding the `PeerEntry` token path for tokens that
are one credential path among several for a stable logical peer.

This removes the pre-ADR-030 limitation in `auth.md`
§"Resource-scoped ACLs and external identities" — resolved identities now
carry `resources` from the `PeerEntry`, not just from the composition path.
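
The two resolution paths and the `ApiKeyEntry` fallthrough can be exercised end to end in a self-contained sketch. It mirrors the field names above but makes several assumptions. `ApiKeyEntry` is reduced to a hypothetical `prefix`/`key_hash`/`scopes` shape, and `resolve_api_key` is a hypothetical helper that matches on the token hash. `token_hash` uses std's `DefaultHasher` as a placeholder because SHA-256 is not in the Rust standard library. `DefaultHasher` is not cryptographic; a real implementation would store and compare SHA-256 digests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
    pub auth_token_hash: Option<String>,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
    pub enabled: bool,
}

/// Hypothetical, reduced ApiKeyEntry: the token IS the identity (id = prefix).
pub struct ApiKeyEntry {
    pub prefix: String,
    pub key_hash: String,
    pub scopes: Vec<String>,
}

pub struct AuthPolicy {
    pub peers: Vec<PeerEntry>,
    pub api_keys: Vec<ApiKeyEntry>,
}

/// PLACEHOLDER ONLY: DefaultHasher is not SHA-256 and not cryptographic.
/// It stands in for the SHA-256 digest stored in auth_token_hash.
pub fn token_hash(token: &str) -> String {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

impl PeerEntry {
    fn identity(&self) -> Identity {
        Identity {
            id: self.peer_id.clone(), // stable logical id, whatever the credential path
            scopes: self.scopes.clone(),
            resources: self.resources.clone(),
        }
    }
}

impl AuthPolicy {
    pub fn resolve_identity_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.peers
            .iter()
            .find(|p| p.enabled && p.fingerprints.iter().any(|f| f == fingerprint))
            .map(PeerEntry::identity)
    }

    pub fn resolve_identity_from_token(&self, token: &str) -> Option<Identity> {
        let hash = token_hash(token);
        self.peers
            .iter()
            .find(|p| p.enabled && p.auth_token_hash.as_deref() == Some(hash.as_str()))
            .map(PeerEntry::identity)
            .or_else(|| self.resolve_api_key(&hash)) // fall through to ApiKeyEntry
    }

    /// Hypothetical helper: ApiKeyEntry match, where Identity.id = prefix.
    fn resolve_api_key(&self, hash: &str) -> Option<Identity> {
        self.api_keys
            .iter()
            .find(|k| k.key_hash == hash)
            .map(|k| Identity {
                id: k.prefix.clone(),
                scopes: k.scopes.clone(),
                resources: HashMap::new(),
            })
    }
}
```

With one `PeerEntry` holding an `ed25519:` fingerprint, a `SHA256:` fingerprint, and a token hash, all three lookups return the identical `Identity`. A token that matches only an `ApiKeyEntry` resolves to that key's `prefix` instead.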
### 3. Key rotation is a `PeerEntry` field update (no `peer_id` change)
Rotating a peer's TLS key:

- The vault derives the new key locally (ADR-020/021).
- The remote side's config updates the `PeerEntry.fingerprints` entry for
  that `peer_id`. The `peer_id`, `scopes`, `resources`, ACL entries, and
  any `PeerRef::Specific(peer_id)` references stay stable.
- A config reload (`ConfigReloadHandle::reload`) makes the change live.

Rotating a peer's auth token:

- Update `PeerEntry.auth_token_hash` for that `peer_id`. The `peer_id`
  and everything that references it stay stable.

No ACL update, no routing-reference invalidation, and no peer
"disappears." The vault's local rotation plus a remote-side config edit is
the full key rotation story across nodes, for any credential path.
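
The following sketch shows why rotation leaves references intact. It makes simplifying assumptions. `rotate_fingerprint` and `Acl` are hypothetical names, and the ACL is modeled as a set of allowed `peer_id`s. The in-place edit stands in for a config edit followed by `ConfigReloadHandle::reload`. The ACL never stores a fingerprint, so swapping one changes nothing the ACL matches on.

```rust
use std::collections::HashSet;

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
    pub enabled: bool,
}

/// Hypothetical ACL: keyed by the stable peer_id, never by fingerprint.
pub struct Acl {
    pub allowed_peers: HashSet<String>,
}

impl Acl {
    pub fn allows(&self, peer_id: &str) -> bool {
        self.allowed_peers.contains(peer_id)
    }
}

/// Fingerprint -> peer_id, as in §2.
pub fn resolve_peer_id(peers: &[PeerEntry], fingerprint: &str) -> Option<String> {
    peers
        .iter()
        .find(|p| p.enabled && p.fingerprints.iter().any(|f| f == fingerprint))
        .map(|p| p.peer_id.clone())
}

/// Hypothetical rotation helper: swaps one fingerprint in place.
/// Only `fingerprints` changes; returns false if the peer or old value is missing.
pub fn rotate_fingerprint(peers: &mut [PeerEntry], peer_id: &str, old: &str, new: &str) -> bool {
    let entry = match peers.iter_mut().find(|p| p.peer_id == peer_id) {
        Some(entry) => entry,
        None => return false,
    };
    match entry.fingerprints.iter_mut().find(|f| f.as_str() == old) {
        Some(slot) => {
            *slot = new.to_string();
            true
        }
        None => false,
    }
}
```

After rotation the old fingerprint no longer resolves. The new fingerprint resolves to the same `peer_id`, and the unchanged ACL still admits it.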
### 4. `PeerId` source changes from UUID to `Identity.id` from `PeerEntry`
ADR-029 Assumption 1 said `PeerId` is a connection-assigned UUID (v4). With
`Identity.id` now stable (`peer_id`), the UUID workaround is no longer
needed: `PeerId = Identity.id` from `IdentityProvider` resolution. This
tightens the one-way door without moving it. ADR-029 always specified
`PeerId` as logical, not crypto; only its *source* was the UUID, and that
source now becomes the auth system.
```rust
// ADR-029 §1, updated by this ADR:
pub type PeerId = String; // = Identity.id from IdentityProvider resolution
                          // = PeerEntry.peer_id (stable, not crypto material)
```
ADR-029 §2's `invoke_peer` / `PeerRef::Specific(PeerId)` signatures are
unchanged. The `PeerId` payload is now stable across reconnects and key
rotations, instead of ephemeral. An in-flight `PeerRef::Specific` that
survives a reconnect now keeps resolving (the `peer_id` is unchanged), which
is the property the UUID workaround could not provide.
### 5. The `PeerId` for a connection comes from `IdentityProvider` resolution
The dispatch path that builds a `CallConnection` and assigns a `PeerId` to
the peer-keyed overlay (`PeerCompositeEnv::attach_peer`) reads
`connection.identity().id` — the resolved `Identity.id` from the
`IdentityProvider`. If identity resolution returns `None` (no client cert,
unrecognized fingerprint), the peer has no `PeerId` and the connection
cannot be added to the peer-keyed overlay. The handler either rejects the
connection or falls back to a connection-without-peer-identity path (the
caller-id-is-the-connection case, e.g., anonymous dial-in).

The UUID fallback is removed. A connection with no resolved identity has no
`PeerId`, not a random one.
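
The rule above reduces to a small decision. The sketch below uses hypothetical names (`Admission`, `admit`, and the `allow_anonymous` policy flag are not part of the codebase) and a pared-down `Identity`. No branch fabricates a `PeerId`; only a resolved identity yields one.

```rust
/// PeerId is Identity.id (ADR-029 §1 as amended by this ADR).
pub type PeerId = String;

/// Pared-down Identity for this sketch (scopes/resources omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}

/// Hypothetical admission outcome for a new connection.
#[derive(Debug, PartialEq)]
pub enum Admission {
    /// Resolved identity: attach to the peer-keyed overlay under this PeerId.
    Peer(PeerId),
    /// No identity, but the handler permits a no-peer-identity path.
    AnonymousFallback,
    /// No identity and no fallback: reject the connection.
    Rejected,
}

/// `resolved` is the IdentityProvider result; there is no UUID branch.
pub fn admit(resolved: Option<Identity>, allow_anonymous: bool) -> Admission {
    match resolved {
        Some(identity) => Admission::Peer(identity.id),
        None if allow_anonymous => Admission::AnonymousFallback,
        None => Admission::Rejected,
    }
}
```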
### 6. Fingerprint format normalization: `ed25519:` for raw keys
Ed25519 raw keys (RFC 7250) produce different fingerprint formats depending
on the transport:
- **iroh** (direct or relay): `ed25519:<hex of 32-byte public key>`
extracted from `connection.remote_node_id()`, which returns the NodeId
(the raw Ed25519 public key). Already implemented.
- **quinn RawKey**: currently `SHA256:<hex of cert DER>`, because
  `fingerprint_from_cert_der` hashes the SPKI DER bytes. The SPKI DER is
  not the raw 32-byte public key; it wraps the key in an ASN.1 structure
  that also carries the algorithm identifier.

So the same Ed25519 key produces `ed25519:abc...` on iroh and
`SHA256:def...` on quinn: two different fingerprints for the same key.

The fix normalizes on iroh's format. The quinn path extracts the Ed25519
public key from the cert DER (the `RawKeyCertResolver` already has the raw
key bytes via `Ed25519SecretKey::public()`) and formats it as
`ed25519:<hex>`, matching iroh. A peer that connects via quinn direct and
via iroh has the same fingerprint in `PeerEntry.fingerprints`: one entry,
both transports.

The normalization lives in `extract_quinn_client_fingerprint`. When the
presented cert is an RFC 7250 raw public key (an SPKI with the Ed25519
algorithm identifier), it extracts the raw 32-byte public key and formats it
as `ed25519:<hex>`. When the cert is X.509, it keeps the
`SHA256:<hex of DER>` format (X.509 certs have no "raw public key" form;
the DER hash is the fingerprint).

This also simplifies the coming WebTransport relay work. A WebTransport
relay acts as a proxy, and the proxied connection's Ed25519 identity
should be the same `ed25519:<hex>` whether the client connected directly
or through the relay. Normalizing on the iroh pattern means the relay
doesn't need a separate fingerprint format.
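
Here is a sketch of the normalization step. It assumes the presented certificate bytes are the DER-encoded SubjectPublicKeyInfo, as with RFC 7250 raw public keys. An Ed25519 SPKI (RFC 8410) is always 44 bytes: a fixed 12-byte prefix carrying the algorithm OID 1.3.101.112, followed by the 32-byte public key. Detection can therefore be a prefix-and-length check. `normalize_fingerprint` is a hypothetical name, not the actual `extract_quinn_client_fingerprint`. The X.509 branch takes the fingerprint function as a parameter because SHA-256 is not in the Rust standard library.

```rust
/// DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410):
/// SEQUENCE(42) { SEQUENCE(5) { OID 1.3.101.112 }, BIT STRING(33, 0 unused bits) }
pub const ED25519_SPKI_PREFIX: [u8; 12] = [
    0x30, 0x2a, 0x30, 0x05, 0x06, 0x03, 0x2b, 0x65, 0x70, 0x03, 0x21, 0x00,
];

/// Lowercase hex encoding.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Returns the raw 32-byte key if `spki_der` is exactly an Ed25519 SPKI.
pub fn ed25519_raw_key(spki_der: &[u8]) -> Option<&[u8]> {
    let is_ed25519 = spki_der.len() == ED25519_SPKI_PREFIX.len() + 32
        && spki_der.starts_with(&ED25519_SPKI_PREFIX);
    if is_ed25519 {
        Some(&spki_der[ED25519_SPKI_PREFIX.len()..])
    } else {
        None
    }
}

/// Hypothetical normalizer: raw Ed25519 keys become "ed25519:<hex>" (the iroh
/// format); anything else is fingerprinted by the caller-supplied X.509
/// function (expected to return "SHA256:<hex of DER>").
pub fn normalize_fingerprint(der: &[u8], x509_fingerprint: impl Fn(&[u8]) -> String) -> String {
    match ed25519_raw_key(der) {
        Some(key) => format!("ed25519:{}", to_hex(key)),
        None => x509_fingerprint(der),
    }
}
```

Passing `x509_fingerprint` in keeps the sketch dependency-free. A real implementation would compute the `SHA256:<hex of DER>` value with a SHA-256 library.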
## Bearer tokens
There are three credential types in the alknet auth model:
1. **Ed25519 raw key** (RFC 7250) — the most common. Same key type as SSH
keys, native to iroh's `NodeId`. Fingerprint format: `ed25519:<hex>`.
Used for direct quinn, iroh direct, and iroh relay connections. The
fingerprint IS the trust anchor (no CA needed).
2. **X.509 cert** — for domain-facing endpoints (`api.alk.dev`, relays,
ACME/Let's Encrypt). Fingerprint format: `SHA256:<hex of DER>`. Requires
CA verification on the client side. The outgoing-only case (a client
connects to a public X.509 endpoint) is tracked as OQ-37.
3. **Bearer token** (auth_token) — for HTTP clients that can't do TLS
client-auth (browsers, curl), or as a secondary credential path. Carried
in the call-protocol `auth_token` payload field.

A `PeerEntry` can have any combination of these: `fingerprints: Vec<String>`
for one or more TLS keys (Ed25519 and/or X.509), and `auth_token_hash:
Option<String>` for an optional bearer-token path. All resolve to the same
`peer_id`. A peer that authenticates via Ed25519 today and via auth_token
tomorrow gets the same `PeerId`; the logical identity is stable across
credential paths.

`ApiKeyEntry` stays as a separate path for bearer tokens where the token IS
the identity (rotation = new identity, no stable logical id needed). When a
bearer token is one credential path among several for a stable peer, it
goes in `PeerEntry.auth_token_hash`. The distinction is not "peer bearer vs
auth bearer"; it is whether the token needs a stable logical id across
rotation (`PeerEntry`) or not (`ApiKeyEntry`).

| Credential type | `PeerEntry` field | `Identity.id` | Rotation |
|-----------------|-------------------|---------------|----------|
| Ed25519 raw key | `fingerprints[i]` (`ed25519:...`) | `peer_id` (stable) | Update `fingerprints` entry |
| X.509 cert | `fingerprints[i]` (`SHA256:...`) | `peer_id` (stable) | Update `fingerprints` entry |
| Bearer token (peer) | `auth_token_hash` | `peer_id` (stable) | Update `auth_token_hash` |
| Bearer token (identity) | `ApiKeyEntry` (separate) | `prefix` (changes with key) | New `ApiKeyEntry` |
## What this does NOT change
- **`Identity` struct shape** — `id: String`, `scopes: Vec<String>`,
`resources: HashMap<String, Vec<String>>` are unchanged. Only the
*meaning* of `id` on the fingerprint path changes (fingerprint →
peer_id).
- **`IdentityProvider` trait** — unchanged. The adapter's resolution
semantics change, not the trait.
- **`AccessControl::check`** — unchanged. Still a flat scope/resource match
against `Identity`. The `Identity` it checks now has a stable `id` on the
fingerprint path, but `check` doesn't key on `id` (it checks scopes and
resources).
- **`AuthToken`, `AuthContext`** — unchanged.
- **`PeerRef::Specific(PeerId)` signature** — unchanged. The payload is now
stable.
- **`CompositeOperationEnv` → `PeerCompositeEnv` migration** — unchanged.
This ADR provides the stable `PeerId` source; ADR-029 still owns the
overlay-keying model.
## Consequences
**Positive:**
- Key rotation no longer breaks ACL entries or routing references on the
remote side — for any credential path (TLS key or auth token). The
vault's local rotation story (ADR-021) is now the complete story.
- `PeerRef::Specific` survives reconnects. An in-flight routing reference
to "worker-a" keeps resolving after worker-a's TLS key rotates and after
worker-a reconnects.
- OQ-33's UUID workaround is removed — the stable logical id is the real
thing, not an ephemeral stand-in.
- OQ-34's storage-boundary question is resolved: core has the config model
(`PeerEntry`) + the in-memory adapter (`ConfigIdentityProvider`); a
future `alknet-peer-store-sqlite` adapter that persists `PeerEntry`
records is additive, implementing the same `IdentityProvider` trait
against a `peers` table. See ADR-033.
- Resolved identities now carry `resources` (the pre-ADR-030 limitation is
lifted) — `AccessControl::check` against `resource_type`/
`resource_action` works for external authenticated callers when
configured, regardless of credential path.
- A peer can authenticate via Ed25519 today and via auth_token tomorrow,
getting the same `PeerId` — the logical identity is stable across
credential paths.
- Fingerprint normalization (`ed25519:<hex>` for raw keys across quinn and
iroh) means the same key has the same fingerprint regardless of transport.
  This also simplifies the coming WebTransport relay work.

**Negative:**
- `AuthPolicy.authorized_fingerprints: HashSet<String>` is replaced with
`AuthPolicy.peers: Vec<PeerEntry>`. This is a breaking config change —
existing config files with `authorized_fingerprints` migrate to `peers`
entries. The migration is mechanical (each fingerprint becomes a
`PeerEntry { peer_id: <chosen name>, fingerprints: vec![<old value>], ... }`),
and operators must choose a `peer_id` per peer, but it is a config break.
- `Identity.id` for resolved identities changes from the fingerprint to
the `peer_id`. Code that logs or compares `Identity.id` and assumed it
was the fingerprint string will see the `peer_id` instead. This is the
correct behavior (logs should show the logical name, not the rotating
crypto material), but it's a behavior change in log output.
- The quinn fingerprint extraction changes from `SHA256:<hex of DER>` to
`ed25519:<hex of raw key>` for raw-key certs. Existing configs with
`SHA256:` fingerprints for Ed25519 keys migrate to `ed25519:` format.
X.509 fingerprints stay as `SHA256:<hex of DER>`.
- ADR-029 Assumption 1 is superseded on the `PeerId` source dimension:
the one-way door (`PeerId` is logical, not crypto) is preserved, but the
UUID source is replaced by `Identity.id` from `PeerEntry`. The
Assumption's framing of "no-storage workaround" is no longer accurate —
the storage boundary is now explicitly `config + in-memory adapter`
(this ADR + ADR-033), with the SQLite adapter additive.
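
The config migration described above is mechanical enough to script. The following is a minimal sketch. It assumes a hypothetical `migrate_authorized_fingerprints` helper and an operator-supplied `name_for` callback that picks each `peer_id`. Every migrated entry keeps the single `relay:connect` scope that the current fingerprint path grants, so authorization behavior is unchanged.

```rust
use std::collections::{HashMap, HashSet};

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
    pub auth_token_hash: Option<String>,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
    pub display_name: Option<String>,
    pub enabled: bool,
}

/// Hypothetical migration helper: one PeerEntry per old authorized fingerprint.
/// `name_for` is where the operator supplies the chosen peer_id.
pub fn migrate_authorized_fingerprints(
    authorized_fingerprints: &HashSet<String>,
    name_for: impl Fn(&str) -> String,
) -> Vec<PeerEntry> {
    // HashSet iteration order is unspecified; sort so the output is deterministic.
    let mut fingerprints: Vec<&String> = authorized_fingerprints.iter().collect();
    fingerprints.sort();
    fingerprints
        .into_iter()
        .map(|fp| PeerEntry {
            peer_id: name_for(fp.as_str()),
            fingerprints: vec![fp.clone()],
            auth_token_hash: None,
            // Same scope the pre-ADR-030 fingerprint path grants.
            scopes: vec!["relay:connect".to_string()],
            resources: HashMap::new(),
            display_name: None,
            enabled: true,
        })
        .collect()
}
```

The sketch copies fingerprints as-is. An existing `SHA256:` value for an Ed25519 key cannot be turned into `ed25519:<hex>` from the stored string, because a SHA-256 digest is not reversible. That rewrite needs the peer's raw public key.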
## Assumptions
1. **The dispatch path can require identity resolution for peer-keyed
overlay membership.** A connection that fails `IdentityProvider`
resolution has no `PeerId` and is not added to `PeerCompositeEnv`. The
caller either authenticates with a recognized fingerprint (and gets a
`peer_id`) or is rejected / falls back to a no-peer-identity path. The
v1 UUID fallback is removed deliberately — anonymous dial-in to a
peer-keyed composition env is a contradiction.
2. **`PeerEntry.peer_id` is operator-chosen and unique within a config.**
Config validation enforces uniqueness; duplicate `peer_id` values in
`AuthPolicy.peers` are a config error.
3. **Bearer tokens have two paths.** `PeerEntry.auth_token_hash` is for
tokens that are one credential path among several for a stable logical
peer (the token rotates, the `peer_id` stays). `ApiKeyEntry` is for
tokens that ARE the identity (rotation = new identity, no stable
logical id needed). See "Bearer tokens" above. The distinction is not
"peer bearer vs auth bearer" — it's whether the token needs a stable
logical id across rotation.
4. **The `peers` list resolution is O(peers) per fingerprint lookup.** The
   expected peer count per node is small (tens to hundreds), so a linear
   scan without a side index is fine. A `HashMap<fingerprint, &PeerEntry>`
   index is an implementation-detail two-way door.
5. **Adapter crates that persist `PeerEntry` records are additive and not
specified here.** ADR-033 establishes the pattern (core trait + in-memory
default; persistence adapters are separate crates); the concrete adapter
shapes are deferred for exploration per the user's note. This ADR's
commitment is to the `PeerEntry` config model + the resolution
semantics + the `PeerId` source, not to any specific backend.
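
Assumptions 2 and 4 can be sketched together: a uniqueness check that rejects duplicate `peer_id` values, and the optional fingerprint side index. The names `validate_unique_peer_ids`, `build_fingerprint_index`, and `ConfigError` are hypothetical. The index skips disabled entries and keeps the first match per fingerprint, so its lookups agree with the linear `find` in §2.

```rust
use std::collections::{HashMap, HashSet};

pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
    pub enabled: bool,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    DuplicatePeerId(String),
}

/// Assumption 2: duplicate peer_id values are a config error.
pub fn validate_unique_peer_ids(peers: &[PeerEntry]) -> Result<(), ConfigError> {
    let mut seen = HashSet::new();
    for p in peers {
        // insert returns false if the value was already present
        if !seen.insert(p.peer_id.as_str()) {
            return Err(ConfigError::DuplicatePeerId(p.peer_id.clone()));
        }
    }
    Ok(())
}

/// Assumption 4: optional side index, fingerprint -> entry.
/// Indexes enabled entries only and keeps the first match per fingerprint,
/// so it returns the same entry as the linear scan.
pub fn build_fingerprint_index(peers: &[PeerEntry]) -> HashMap<&str, &PeerEntry> {
    let mut index = HashMap::new();
    for p in peers.iter().filter(|p| p.enabled) {
        for fp in &p.fingerprints {
            index.entry(fp.as_str()).or_insert(p);
        }
    }
    index
}
```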
## References
- ADR-004: Auth as Shared Core (`IdentityProvider` in core)
- ADR-015: Privilege Model and Authority Context (`AccessControl::check`
against `Identity`)
- ADR-021: Key Rotation via Version-Indexed Paths (the local rotation half
this ADR completes across nodes)
- ADR-022: Handler Registration, Provenance, and Composition Authority
(the registration bundle's `composition_authority` path produces its own
`Identity`; this ADR's `PeerEntry.resources` populates the external-auth
path's `Identity.resources`)
- ADR-029: Peer-Graph Routing Model (the `PeerId = Identity.id` model;
Assumption 1's UUID source is superseded by this ADR's `PeerEntry.peer_id`
source — the one-way door is preserved)
- ADR-033: Storage Boundary and Repo/Adapter Pattern (the overarching pattern
this ADR's `PeerEntry` + `ConfigIdentityProvider` follows)
- OQ-33: PeerId — Cryptographic Identity vs Stable Logical Identifier
(resolved by this ADR — the "real solution" half, replacing the UUID
workaround)
- OQ-34: Persistent Peer Registry (resolved by this ADR + ADR-033 — the
storage boundary is `config + in-memory adapter` now, SQLite adapter
additive)
- ~~OQ-35: API Key Identity vs Peer Identity~~ (dissolved — the
"asymmetry" framing was wrong; `PeerEntry` supports multiple credential
paths, and `ApiKeyEntry` is for tokens that ARE the identity)
- OQ-29: CallClient TLS Client-Auth (resolved by this ADR's §6 fingerprint
normalization + the client-auth wiring decision recorded in OQ-29)
- OQ-37: X.509 outgoing-only case (the three auth types and how X.509
server identity fits — see OQ-37 in open-questions.md)
- `docs/research/alknet-storage-strategy/findings.md` §4 (the `PeerEntry`
model and resolution path)
- `docs/architecture/crates/core/auth.md` (the spec amended by this ADR)
- `docs/architecture/crates/core/config.md` (the `AuthPolicy` change)