alknet/docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md
glm-5.2 d94d7a132a docs(adr-027): TLS identity redesign — ACME + RawKey decoupling
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:

1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
   directory, contact) with async AcmeState constructed at endpoint
   setup via two-phase TlsSetup (not stuffed into the Clone-able enum).

2. TlsIdentity::RawKey decoupled from the iroh feature — uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
   default for most alknet nodes) now works in quinn-only builds.
   iroh transport converts via SecretKey::from_bytes.

3. ACME feature-gated behind new acme feature (rustls-acme optional
   dep). Non-ACME builds don't compile it.

4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
   is handled at the rustls cert resolver layer during the handshake;
   the guard closes challenge connections gracefully instead of logging
   a misleading "no handler" warning.

Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.

Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).

Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
2026-06-24 12:29:24 +00:00


# ADR-027: TLS Identity Redesign — ACME Integration + RawKey Decoupling
## Status
Accepted
## Context
OQ-12, marked "resolved", identified two TLS identity use cases: RFC 7250
raw Ed25519 keys (default, P2P) and X.509 certs (domain-hosted, browsers).
ACME auto-provisioning was described as "additive — it will be adapted
when domain-hosted nodes need it." That deferral created two
architectural issues that surface now that ACME is a concrete target.
### Issue 1: `TlsIdentity` cannot represent ACME
`TlsIdentity` is `#[derive(Debug, Clone)]` and lives in `StaticConfig`,
a static, synchronous config value. ACME requires:
- A long-lived async state machine (`AcmeState` event loop, spawned for
the endpoint's lifetime) that handles ordering, challenge response,
cert renewal, and cache I/O.
- TLS-ALPN-01 challenge handling: `acme-tls/1` must be in the server's
`alpn_protocols`, and a `ResolvesServerCertAcme` must serve challenge
certs during the TLS handshake.
- Config fields: domains, cache directory, ACME directory URL, contact
email.

`AcmeState` is not `Clone`. It cannot be a `TlsIdentity` variant. The
current `build_rustls_server_config(&TlsIdentity) -> ServerConfig` is
synchronous — there's no room for spawning an async state machine or
holding a runtime resolver handle. The reverse-proxy project solved this
with a two-phase construction: static config → `TlsMode` (runtime
objects) → `ServerConfig`. alknet needs the same split.
### Issue 2: `RawKey` is coupled to the `iroh` feature
`TlsIdentity::RawKey(iroh::SecretKey)` is gated `#[cfg(feature = "iroh")]`.
The `RawKeyCertResolver` and `Ed25519SigningKey` impls are gated
`#[cfg(all(feature = "quinn", feature = "iroh"))]`. This means a
quinn-only build (the default feature set) **cannot use RFC 7250 raw-key
identity** — the very mode described as "default for most alknet nodes."
The coupling is artificial. `iroh::SecretKey` is a thin newtype over
`ed25519_dalek::SigningKey` (`pub struct SecretKey(SigningKey)`). The
alknet code uses exactly three APIs: `.public().as_bytes()`, `.sign(msg)`,
and `.clone()`. None of these are iroh-specific. The raw-key TLS path
needs Ed25519 signing + SPKI encoding — both available from
`ed25519-dalek` + `rustls` without iroh.
The iroh *transport* (`build_iroh_endpoint`) does need `iroh::SecretKey`
for `iroh::Endpoint::builder().secret_key(...)`. If `TlsIdentity::RawKey`
no longer carries an `iroh::SecretKey`, the iroh transport must convert
from the new key type — trivial since `iroh::SecretKey::from_bytes(&[u8;
32])` accepts raw Ed25519 key bytes.
### ACME challenge handling with quinn (QUIC, not TCP)
Research confirmed how TLS-ALPN-01 works with quinn:
- The `ResolvesServerCertAcme` resolver intercepts the challenge at the
**cert resolution step**, during the TLS handshake, before the
handshake result is surfaced to the application.
- When an ACME CA connects with ALPN `[acme-tls/1]`, rustls calls the
resolver, which returns the challenge cert. The handshake completes.
The CA inspects the cert's SAN and validates the challenge — no
application-layer data exchange needed.
- quinn's `connecting.await` then returns a completed `Connection` with
ALPN `acme-tls/1`. alknet's `dispatch_quinn` would find no handler for
that ALPN and close the connection. **The challenge already succeeded**
— the close is cosmetic.
- Unlike the reverse-proxy (TCP + `LazyConfigAcceptor`), quinn gives no
"peek at ClientHello" hook. The challenge is fully TLS-layer-handled;
the application only needs to close challenge connections gracefully
(silent close, not a "no handler" warning).

Key constraint: ACME requires `with_cert_resolver(ResolvesServerCertAcme)`,
not `with_single_cert`. You cannot just append `acme-tls/1` to an
`X509`/`SelfSigned` config — there'd be no resolver to serve the
challenge cert. ACME is a distinct `ServerConfig` construction path.
## Decision
### 1. Add `TlsIdentity::Acme` variant (static config data only)
```rust
pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey(Ed25519SecretKey), // see Decision 3
    SelfSigned,
    Acme { // NEW
        domains: Vec<String>,
        cache_dir: PathBuf,
        directory: AcmeDirectory, // enum: Production, Staging, Custom(String)
        contact: Vec<String>, // e.g. ["mailto:admin@example.com"]
    },
}
```
`Acme` holds only static, `Clone`/`Debug`-safe config data. No
`AcmeState`, no resolver, no runtime objects. The async state machine is
constructed at endpoint setup time (Decision 2).
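The `directory` field is only described by a comment above. A minimal sketch of what that enum could look like follows. The `url()` helper is hypothetical, and mapping `Production` and `Staging` to the Let's Encrypt directory URLs is an assumption, since the ADR does not name a CA.

```rust
/// Hypothetical shape for the `AcmeDirectory` enum named in the comment above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AcmeDirectory {
    Production,
    Staging,
    Custom(String),
}

impl AcmeDirectory {
    /// Resolve the variant to an ACME directory URL.
    /// Assumption: the built-in variants point at Let's Encrypt.
    pub fn url(&self) -> &str {
        match self {
            AcmeDirectory::Production => "https://acme-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Staging => "https://acme-staging-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Custom(url) => url.as_str(),
        }
    }
}
```

Because the enum carries only strings, it stays `Clone` and `Debug`, which is the property this decision relies on.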
### 2. Split server-config construction into two phases
Replace the synchronous `build_rustls_server_config(&TlsIdentity) ->
ServerConfig` with a two-phase construction:
**Phase 1 — `TlsSetup` (async, at endpoint construction):**
```rust
struct TlsSetup {
    server_config: rustls::ServerConfig,
    acme_state: Option<AcmeStateHandle>, // spawned task + handle for shutdown
}
```
For `X509`, `SelfSigned`, `RawKey`: construct `ServerConfig`
synchronously (current path, unchanged). `acme_state` is `None`.
For `Acme`: construct `AcmeConfig`, spawn the `AcmeState` event loop,
get `ResolvesServerCertAcme`, build `ServerConfig` with
`with_cert_resolver(resolver)`, add `acme-tls/1` to `alpn_protocols`.
`acme_state` is `Some(handle)` so the endpoint can abort the ACME task
on shutdown.
**Phase 2 — use `TlsSetup.server_config` to build `quinn::ServerConfig`:**
Same as today: `QuicServerConfig::try_from(rustls_config)`, then
`quinn::ServerConfig::with_crypto(...)`.
The `TlsSetup` is constructed inside `AlknetEndpoint::new()` (or
`run_quinn_accept_loop`), not inside `TlsIdentity`. The `TlsIdentity`
enum stays a pure data structure.
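The split between pure config data and runtime objects can be illustrated without any TLS or async crates. The sketch below is a simplified model, not the alknet implementation: `std::thread` and an `mpsc` channel stand in for the spawned `AcmeState` task and its handle, `ServerConfigStub` stands in for `rustls::ServerConfig`, and every name in it (`Identity`, `Setup`, `WorkerHandle`, `build_setup`) is hypothetical.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Pure data, playing the role of `TlsIdentity` (trimmed to two variants).
#[derive(Debug, Clone)]
pub enum Identity {
    SelfSigned,
    Acme { domains: Vec<String> },
}

/// Stand-in for `rustls::ServerConfig`; records which construction path ran.
#[derive(Debug)]
pub struct ServerConfigStub {
    pub acme_resolver: bool,
}

/// Stand-in for `AcmeStateHandle`: owns the background worker and a stop signal.
pub struct WorkerHandle {
    stop: Sender<()>,
    join: JoinHandle<()>,
}

impl WorkerHandle {
    /// Signal the worker to stop and wait for it to exit.
    pub fn shutdown(self) {
        let _ = self.stop.send(());
        let _ = self.join.join();
    }
}

/// Runtime objects produced from the static config.
pub struct Setup {
    pub server_config: ServerConfigStub,
    pub acme_state: Option<WorkerHandle>,
}

/// Phase 1: turn static config into runtime objects.
pub fn build_setup(identity: &Identity) -> Setup {
    match identity {
        Identity::SelfSigned => Setup {
            server_config: ServerConfigStub { acme_resolver: false },
            acme_state: None,
        },
        Identity::Acme { .. } => {
            let (stop, stopped) = mpsc::channel::<()>();
            // A real task would drive ordering and renewal; this one only
            // blocks until it is told to stop (or the sender is dropped).
            let join = thread::spawn(move || {
                let _ = stopped.recv();
            });
            Setup {
                server_config: ServerConfigStub { acme_resolver: true },
                acme_state: Some(WorkerHandle { stop, join }),
            }
        }
    }
}
```

The config enum never holds the worker, so it can stay `Clone`; only `Setup` owns the non-cloneable handle that the endpoint uses at shutdown.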
### 3. Decouple `RawKey` from iroh — use `ed25519-dalek` directly
Replace `TlsIdentity::RawKey(iroh::SecretKey)` with
`TlsIdentity::RawKey(Ed25519SecretKey)`, where `Ed25519SecretKey` is a
thin alknet-core-owned wrapper over `ed25519_dalek::SigningKey`:
```rust
pub struct Ed25519SecretKey(ed25519_dalek::SigningKey);
```
This type is `Clone`, `Debug` (redacting), `Zeroize`, and not gated
behind any feature flag. `ed25519-dalek` becomes a direct dependency of
alknet-core (it's already in the dependency tree transitively via iroh).
The `RawKeyCertResolver` and `Ed25519SigningKey` rustls impls move from
`#[cfg(all(feature = "quinn", feature = "iroh"))]` to
`#[cfg(feature = "quinn")]` — raw-key TLS identity works in quinn-only
builds.
The `iroh` feature gate on `TlsIdentity::RawKey` is removed. The
variant is always available.
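A rough sketch of the wrapper's surface, under stated assumptions: `SigningKeyStub` is a hypothetical stand-in for `ed25519_dalek::SigningKey` that holds only the 32 secret bytes, signing is left out, and `Zeroize` is omitted because it would come from an external crate. The `from_bytes` and `as_bytes` method names are illustrative (`as_bytes` matches the call used in Decision 4).

```rust
use std::fmt;

/// Stand-in for `ed25519_dalek::SigningKey`: secret bytes only, no signing.
#[derive(Clone)]
pub struct SigningKeyStub {
    secret: [u8; 32],
}

/// Sketch of the wrapper: `Clone`, with a `Debug` impl that never prints key bytes.
#[derive(Clone)]
pub struct Ed25519SecretKey(SigningKeyStub);

impl Ed25519SecretKey {
    /// Build a key from raw 32-byte secret material.
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(SigningKeyStub { secret: *bytes })
    }

    /// Borrow the raw secret bytes, e.g. for conversion to another key type.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0.secret
    }
}

impl fmt::Debug for Ed25519SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Redact: logs and panics must not leak key material.
        f.write_str("Ed25519SecretKey(<redacted>)")
    }
}
```

Writing `Debug` by hand instead of deriving it is what keeps the secret out of log output when a `TlsIdentity` value is printed.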
### 4. iroh transport converts from `Ed25519SecretKey`
`build_iroh_endpoint` currently reads `TlsIdentity::RawKey(iroh::SecretKey)`
and passes it to `iroh::Endpoint::builder().secret_key(...)`. After
decoupling, it converts:
```rust
if let Some(TlsIdentity::RawKey(key)) = static_config.tls_identity.as_ref() {
    let iroh_key = iroh::SecretKey::from_bytes(key.as_bytes());
    builder = builder.secret_key(iroh_key);
}
```
`iroh::SecretKey::from_bytes(&[u8; 32])` accepts raw Ed25519 key bytes —
no information loss. This conversion is `#[cfg(feature = "iroh")]` only.
### 5. ACME ALPN challenge handling in `dispatch_quinn`
Add an early-return guard in `dispatch_quinn` before the handler lookup:
```rust
if alpn == b"acme-tls/1" {
    debug!("acme-tls/1 challenge connection completed at TLS layer; closing");
    connection.close(0u32.into(), b"acme done");
    return;
}
```
This avoids the misleading "no handler for ALPN" warning. The challenge
is already answered at the TLS layer; the application just closes
gracefully. No `ProtocolHandler` registration for `acme-tls/1`.
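The ordering of the guard relative to the handler lookup can be modeled as a pure function, which also makes it easy to unit-test. This is a sketch only: `Dispatch`, `classify`, and the `alknet/1` ALPN used in the usage note are hypothetical, and a `HashSet` stands in for whatever the real handler registry uses.

```rust
use std::collections::HashSet;

/// ALPN identifier reserved for TLS-ALPN-01 (RFC 8737).
pub const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// Challenge connection: close quietly, nothing to hand off.
    AcmeChallenge,
    /// A registered handler exists for this ALPN.
    Handler,
    /// Unknown ALPN: the only case that deserves the warning.
    NoHandler,
}

/// Decide what to do with a completed connection based on its ALPN.
pub fn classify(alpn: &[u8], registered: &HashSet<Vec<u8>>) -> Dispatch {
    // The guard runs before the lookup, so the challenge ALPN never
    // reaches the "no handler" branch.
    if alpn == ACME_TLS_ALPN {
        return Dispatch::AcmeChallenge;
    }
    if registered.contains(alpn) {
        Dispatch::Handler
    } else {
        Dispatch::NoHandler
    }
}
```

With a registry containing only `alknet/1`, `classify(b"acme-tls/1", &registered)` yields `Dispatch::AcmeChallenge` even though no handler is registered for it.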
### 6. Feature-gate ACME behind a new `acme` feature
Add an `acme` feature to alknet-core:
```toml
[features]
acme = ["dep:rustls-acme"]
```
`TlsIdentity::Acme` is available regardless of feature (it's just config
data), but constructing `TlsSetup` with an `Acme` variant requires the
`acme` feature. Without it, `TlsIdentity::Acme` at endpoint construction
returns an error ("ACME feature not enabled"). This keeps the
footprint down for nodes that don't need ACME — `rustls-acme` and its
dependencies are only compiled when the feature is on.
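One way to get this behavior is a pair of functions selected by `cfg`, so the fallback exists only in builds without the feature. The sketch below assumes hypothetical names (`SetupError`, `acme_setup`) and leaves the real ACME construction as a comment.

```rust
/// Error returned when a config asks for something the build cannot provide.
#[derive(Debug, PartialEq, Eq)]
pub struct SetupError(pub &'static str);

/// Compiled only when the crate is built with the `acme` feature.
#[cfg(feature = "acme")]
pub fn acme_setup(_domains: &[String]) -> Result<(), SetupError> {
    // Real code would build the ACME config and cert resolver here.
    Ok(())
}

/// Fallback for builds without the feature: the config variant still
/// exists, but turning it into runtime state fails with a clear error.
#[cfg(not(feature = "acme"))]
pub fn acme_setup(_domains: &[String]) -> Result<(), SetupError> {
    Err(SetupError("ACME feature not enabled"))
}
```

Callers see the same signature either way, so the `match` over `TlsIdentity` does not need its own `cfg` attributes.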
### 7. `acme-tls/1` in ALPN list only when ACME is active
When `TlsIdentity::Acme` is configured, `acme-tls/1` is appended to the
`alpn_protocols` list alongside the handler ALPNs. When ACME is not
configured, `acme-tls/1` is not advertised — no behavior change for
non-ACME nodes.
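As a small illustration of that rule, the advertised list can be derived from the handler ALPNs plus a flag. The function name `advertised_alpns` and the duplicate check are assumptions made for the sketch, not part of the ADR.

```rust
/// ALPN identifier used by TLS-ALPN-01 challenge connections.
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// Build the ALPN list to advertise: handler ALPNs first, then
/// `acme-tls/1` only when ACME is the configured identity.
pub fn advertised_alpns(handler_alpns: &[Vec<u8>], acme_active: bool) -> Vec<Vec<u8>> {
    let mut alpns = handler_alpns.to_vec();
    // Avoid a duplicate entry if a caller already listed the challenge ALPN.
    let already_listed = alpns.iter().any(|a| a.as_slice() == ACME_TLS_ALPN);
    if acme_active && !already_listed {
        alpns.push(ACME_TLS_ALPN.to_vec());
    }
    alpns
}
```

With `acme_active` set to `false` the output equals the input, which is the "no behavior change" guarantee for non-ACME nodes.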
## Consequences
- **Breaking change to `TlsIdentity`**: `RawKey(iroh::SecretKey)` becomes
`RawKey(Ed25519SecretKey)`. Pre-1.0 crate, in-repo consumers only.
The assembly layer and tests that construct `TlsIdentity::RawKey` must
update.
- **`ed25519-dalek` becomes a direct dependency** of alknet-core. It's
already in the dependency tree (transitive via iroh), so no new
compilation cost for `iroh` builds. Quinn-only builds that were not
using `RawKey` before will now compile `ed25519-dalek` — it's a small,
pure-Rust crate with no C dependencies.
- **`rustls-acme` is feature-gated** (`acme` feature). Nodes not using
ACME don't compile it. The feature is compatible with `quinn` (ACME
is quinn-only; iroh uses its own TLS).
- **`build_rustls_server_config` becomes async** (or is replaced by an
async `TlsSetup::new`). The accept loop already runs in an async
context, so this is a local change.
- **ACME state machine lifecycle**: the `AcmeState` task is spawned in
`AlknetEndpoint::new()` and aborted on shutdown. The `TlsSetup` struct
carries the `JoinHandle` so `AlknetEndpoint::shutdown()` can abort it.
- **No handler needed for `acme-tls/1`**: the `dispatch_quinn` guard
handles it. `HandlerRegistry` is not involved.
## Alternatives Considered
### A. ACME as a `ResolvesServerCert` wrapper behind `X509`
OQ-12 suggested ACME "fits naturally as an additional `TlsIdentity`
variant or as a `rustls::ResolvesServerCert` implementation behind the
existing `X509` path." The second option — wrapping `X509` — was
rejected because ACME needs async state + config fields (domains, cache,
contact) that don't fit behind the static `X509 { cert, key }` variant.
A `ResolvesServerCert` that internally does ACME would need to be
constructed at config time with those fields, which means `X509` would
need to carry them — bloating the variant for non-ACME users. A
dedicated `Acme` variant is cleaner.
### B. Keep `RawKey` coupled to iroh, only add ACME
Rejected because the coupling is the root cause of quinn-only builds not
supporting the "default" identity mode. Fixing only ACME would leave the
artificial iroh dependency in place. Since both changes touch
`TlsIdentity` and `build_rustls_server_config`, doing them together
avoids two breaking changes to the same enum.
### C. Use `iroh::SecretKey` for both, re-export from alknet-core
Rejected because it would make `iroh` a non-optional dependency of
alknet-core, defeating the feature-gated transport design (ADR-010).
`ed25519-dalek` is a lightweight, pure-Rust crate; `iroh` is not.
### D. Register a no-op `ProtocolHandler` for `acme-tls/1`
Rejected because it would require the handler registry to know about
ACME (a TLS-layer concern), polluting the ALPN dispatch abstraction.
The `dispatch_quinn` guard is a one-line check that keeps ACME handling
in the endpoint layer where it belongs.
## Cross-References
- OQ-12 (TLS identity provisioning) — updated by this ADR
- [ADR-010](010-alpn-router-and-endpoint.md) — multi-connectivity endpoint, feature-gated transports
- [ADR-004](004-auth-as-shared-core.md) — auth as shared core
- `docs/architecture/crates/core/endpoint.md` — TLS identity use cases, updated
- `docs/architecture/crates/core/config.md` (`TlsIdentity` enum, updated)
- `/workspace/@alkdev/reverse-proxy/src/tls/` — proven ACME implementation pattern
- `rustls-acme` crate — ACME state machine + cert resolver