alknet/docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md
glm-5.2 d94d7a132a docs(adr-027): TLS identity redesign — ACME + RawKey decoupling
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:

1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
   directory, contact) with async AcmeState constructed at endpoint
   setup via two-phase TlsSetup (not stuffed into the Clone-able enum).

2. TlsIdentity::RawKey decoupled from the iroh feature — uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
   default for most alknet nodes) now works in quinn-only builds.
   iroh transport converts via SecretKey::from_bytes.

3. ACME feature-gated behind new acme feature (rustls-acme optional
   dep). Non-ACME builds don't compile it.

4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
   is handled at the rustls cert resolver layer during the handshake;
   the guard closes challenge connections gracefully instead of logging
   a misleading "no handler" warning.

Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.

Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).

Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
2026-06-24 12:29:24 +00:00


ADR-027: TLS Identity Redesign — ACME Integration + RawKey Decoupling

Status

Accepted

Context

OQ-12, marked "resolved", identified two TLS identity use cases: RFC 7250 raw Ed25519 keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME auto-provisioning was described as "additive — it will be adapted when domain-hosted nodes need it." That deferral left two architectural issues that surface now that ACME is a concrete target.

Issue 1: TlsIdentity cannot represent ACME

TlsIdentity is #[derive(Debug, Clone)] and lives in StaticConfig — a static, synchronous config value. ACME requires:

  • A long-lived async state machine (AcmeState event loop, spawned for the endpoint's lifetime) that handles ordering, challenge response, cert renewal, and cache I/O.
  • TLS-ALPN-01 challenge handling: acme-tls/1 must be in the server's alpn_protocols, and a ResolvesServerCertAcme must serve challenge certs during the TLS handshake.
  • Config fields: domains, cache directory, ACME directory URL, contact email.

AcmeState is not Clone. It cannot be a TlsIdentity variant. The current build_rustls_server_config(&TlsIdentity) -> ServerConfig is synchronous — there's no room for spawning an async state machine or holding a runtime resolver handle. The reverse-proxy project solved this with a two-phase construction: static config → TlsMode (runtime objects) → ServerConfig. alknet needs the same split.

Issue 2: RawKey is coupled to the iroh feature

TlsIdentity::RawKey(iroh::SecretKey) is gated #[cfg(feature = "iroh")]. The RawKeyCertResolver and Ed25519SigningKey impls are gated #[cfg(all(feature = "quinn", feature = "iroh"))]. This means a quinn-only build (the default feature set) cannot use RFC 7250 raw-key identity — the very mode described as "default for most alknet nodes."

The coupling is artificial. iroh::SecretKey is a thin newtype over ed25519_dalek::SigningKey (pub struct SecretKey(SigningKey)). The alknet code uses exactly three APIs: .public().as_bytes(), .sign(msg), and .clone(). None of these are iroh-specific. The raw-key TLS path needs Ed25519 signing + SPKI encoding — both available from ed25519-dalek + rustls without iroh.

The iroh transport (build_iroh_endpoint) does need iroh::SecretKey for iroh::Endpoint::builder().secret_key(...). If TlsIdentity::RawKey no longer carries an iroh::SecretKey, the iroh transport must convert from the new key type — trivial since iroh::SecretKey::from_bytes(&[u8; 32]) accepts raw Ed25519 key bytes.

ACME challenge handling with quinn (QUIC, not TCP)

Research confirmed how TLS-ALPN-01 works with quinn:

  • The ResolvesServerCertAcme resolver intercepts the challenge at the cert resolution step, during the TLS handshake, before the handshake result is surfaced to the application.
  • When an ACME CA connects with ALPN [acme-tls/1], rustls calls the resolver, which returns the challenge cert. The handshake completes. The CA inspects the cert's SAN and validates the challenge — no application-layer data exchange needed.
  • quinn's connecting.await then returns a completed Connection with ALPN acme-tls/1. alknet's dispatch_quinn would find no handler for that ALPN and close the connection. The challenge already succeeded — the close is cosmetic.
  • Unlike the reverse-proxy (TCP + LazyConfigAcceptor), quinn gives no "peek at ClientHello" hook. The challenge is fully TLS-layer-handled; the application only needs to close challenge connections gracefully (silent close, not a "no handler" warning).

Key constraint: ACME requires with_cert_resolver(ResolvesServerCertAcme), not with_single_cert. You cannot just append acme-tls/1 to an X509/SelfSigned config — there'd be no resolver to serve the challenge cert. ACME is a distinct ServerConfig construction path.
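The per-handshake decision described above can be sketched without any TLS library. This is a minimal std-only model, not the rustls-acme implementation: ResolverModel, Served, and the string "certs" are hypothetical stand-ins, and the only assumption about the protocol is that a challenge ClientHello offers exactly the acme-tls/1 ALPN.

```rust
/// ALPN identifier used by TLS-ALPN-01 challenge connections.
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// What the resolver hands back for one handshake. Strings stand in for the
/// certified keys a real resolver would return.
#[derive(Debug, PartialEq)]
enum Served<'a> {
    ChallengeCert(&'a str),
    RegularCert(&'a str),
    Nothing,
}

/// Hypothetical resolver state: the regular cert plus any pending challenge cert.
struct ResolverModel {
    regular_cert: Option<String>,
    challenge_cert: Option<String>,
}

impl ResolverModel {
    /// A ClientHello offering only `acme-tls/1` is answered with the challenge
    /// cert; every other handshake gets the regular cert.
    fn resolve(&self, offered_alpn: &[&[u8]]) -> Served<'_> {
        let is_challenge = offered_alpn.len() == 1 && offered_alpn[0] == ACME_TLS_ALPN;
        if is_challenge {
            match &self.challenge_cert {
                Some(cert) => Served::ChallengeCert(cert.as_str()),
                None => Served::Nothing,
            }
        } else {
            match &self.regular_cert {
                Some(cert) => Served::RegularCert(cert.as_str()),
                None => Served::Nothing,
            }
        }
    }
}
```

A config built with with_single_cert has no equivalent of the challenge branch, which is why appending acme-tls/1 to an existing X509 or SelfSigned config is not enough.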

Decision

1. Add TlsIdentity::Acme variant (static config data only)

pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey(Ed25519SecretKey),      // see Decision 3
    SelfSigned,
    Acme {                         // NEW
        domains: Vec<String>,
        cache_dir: PathBuf,
        directory: AcmeDirectory,  // enum: Production, Staging, Custom(String)
        contact: Vec<String>,      // e.g. ["mailto:admin@example.com"]
    },
}

Acme holds only static, Clone/Debug-safe config data. No AcmeState, no resolver, no runtime objects. The async state machine is constructed at endpoint setup time (Decision 2).
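The AcmeDirectory enum is only named in a comment above. A minimal sketch of how it might look follows. The url() helper is hypothetical, and mapping Production and Staging to the Let's Encrypt directory URLs is an assumption; the ADR does not name a CA.

```rust
/// Sketch of the directory selector referenced by TlsIdentity::Acme.
#[derive(Debug, Clone, PartialEq)]
pub enum AcmeDirectory {
    Production,
    Staging,
    Custom(String),
}

impl AcmeDirectory {
    /// Resolve the variant to a directory URL.
    /// Assumption: the built-in variants point at Let's Encrypt.
    pub fn url(&self) -> &str {
        match self {
            AcmeDirectory::Production => "https://acme-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Staging => "https://acme-staging-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Custom(url) => url.as_str(),
        }
    }
}
```

Because the enum holds only strings, it derives Clone and Debug without effort, which is the property the Acme variant relies on.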

2. Split server-config construction into two phases

Replace the synchronous build_rustls_server_config(&TlsIdentity) -> ServerConfig with a two-phase construction:

Phase 1 — TlsSetup (async, at endpoint construction):

struct TlsSetup {
    server_config: rustls::ServerConfig,
    acme_state: Option<AcmeStateHandle>,  // spawned task + handle for shutdown
}

For X509, SelfSigned, RawKey: construct ServerConfig synchronously (current path, unchanged). acme_state is None.

For Acme: construct AcmeConfig, spawn the AcmeState event loop, get ResolvesServerCertAcme, build ServerConfig with with_cert_resolver(resolver), add acme-tls/1 to alpn_protocols. acme_state is Some(handle) so the endpoint can abort the ACME task on shutdown.

Phase 2 — use TlsSetup.server_config to build quinn::ServerConfig:

Same as today: QuicServerConfig::try_from(rustls_config) → quinn::ServerConfig::with_crypto(...).

The TlsSetup is constructed inside AlknetEndpoint::new() (or run_quinn_accept_loop), not inside TlsIdentity. The TlsIdentity enum stays a pure data structure.
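The Phase 1 branching can be modelled with plain data. This sketch is std-only and synchronous, so it shows only which identities produce runtime ACME state. Every type name ending in Model, the CertSource enum, and build_tls_setup are hypothetical; the real path would be async and would build a rustls::ServerConfig. Treating SelfSigned as a single-cert config is inferred from the key constraint in the Context section, and ALPN assembly is left out here.

```rust
use std::path::PathBuf;

/// Stand-in for TlsIdentity; RawKey holds raw bytes so no crates are needed.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum TlsIdentityModel {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey([u8; 32]),
    SelfSigned,
    Acme { domains: Vec<String> },
}

/// How the server config obtains its certificate.
#[derive(Debug, PartialEq)]
enum CertSource {
    SingleCert,
    RawKeyResolver,
    AcmeResolver,
}

/// Stand-in for the handle to the spawned ACME task.
#[allow(dead_code)]
#[derive(Debug)]
struct AcmeStateHandle {
    domains: Vec<String>,
}

/// Stand-in for TlsSetup: runtime objects live here, not in the identity enum.
#[derive(Debug)]
struct TlsSetupModel {
    cert_source: CertSource,
    acme_state: Option<AcmeStateHandle>,
}

fn build_tls_setup(identity: &TlsIdentityModel) -> TlsSetupModel {
    match identity {
        TlsIdentityModel::X509 { .. } | TlsIdentityModel::SelfSigned => TlsSetupModel {
            cert_source: CertSource::SingleCert,
            acme_state: None,
        },
        TlsIdentityModel::RawKey(_) => TlsSetupModel {
            cert_source: CertSource::RawKeyResolver,
            acme_state: None,
        },
        // Only this arm creates runtime state that must be torn down later.
        TlsIdentityModel::Acme { domains } => TlsSetupModel {
            cert_source: CertSource::AcmeResolver,
            acme_state: Some(AcmeStateHandle { domains: domains.clone() }),
        },
    }
}
```

The identity value is borrowed and never mutated, so it stays a clonable description while the setup result owns anything with a lifetime.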

3. Decouple RawKey from iroh — use ed25519-dalek directly

Replace TlsIdentity::RawKey(iroh::SecretKey) with TlsIdentity::RawKey(Ed25519SecretKey), where Ed25519SecretKey is a thin alknet-core-owned wrapper over ed25519_dalek::SigningKey:

pub struct Ed25519SecretKey(ed25519_dalek::SigningKey);

This type is Clone, Debug (redacting), Zeroize, and not gated behind any feature flag. ed25519-dalek becomes a direct dependency of alknet-core (it's already in the dependency tree transitively via iroh).

The RawKeyCertResolver and Ed25519SigningKey rustls impls move from #[cfg(all(feature = "quinn", feature = "iroh"))] to #[cfg(feature = "quinn")] — raw-key TLS identity works in quinn-only builds.

The iroh feature gate on TlsIdentity::RawKey is removed. The variant is always available.
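The "Debug (redacting)" property is easy to get wrong with a derive, so a sketch may help. This model wraps a plain [u8; 32] instead of ed25519_dalek::SigningKey to stay std-only. The name Ed25519SecretKeyModel, the from_bytes constructor, and the exact redaction text are assumptions, and Zeroize is omitted because it needs an external crate.

```rust
use std::fmt;

/// Stand-in for Ed25519SecretKey. The real wrapper would hold an
/// ed25519_dalek::SigningKey; raw bytes are used here for illustration only.
#[derive(Clone)]
pub struct Ed25519SecretKeyModel([u8; 32]);

impl Ed25519SecretKeyModel {
    /// Hypothetical constructor from raw secret-key bytes.
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(*bytes)
    }

    /// Borrow the raw bytes, e.g. for conversion into a transport key type.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

/// Hand-written Debug so key material never reaches logs.
impl fmt::Debug for Ed25519SecretKeyModel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Ed25519SecretKey(<redacted>)")
    }
}
```

Deriving Debug instead would print all 32 bytes whenever a config struct containing the key is logged.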

4. iroh transport converts from Ed25519SecretKey

build_iroh_endpoint currently reads TlsIdentity::RawKey(iroh::SecretKey) and passes it to iroh::Endpoint::builder().secret_key(...). After decoupling, it converts:

if let Some(TlsIdentity::RawKey(key)) = static_config.tls_identity.as_ref() {
    let iroh_key = iroh::SecretKey::from_bytes(key.as_bytes());
    builder = builder.secret_key(iroh_key);
}

iroh::SecretKey::from_bytes(&[u8; 32]) accepts raw Ed25519 key bytes — no information loss. This conversion is #[cfg(feature = "iroh")] only.

5. ACME ALPN challenge handling in dispatch_quinn

Add an early-return guard in dispatch_quinn before the handler lookup:

if alpn == b"acme-tls/1" {
    debug!("acme-tls/1 challenge connection completed at TLS layer; closing");
    connection.close(0u32.into(), b"acme done");
    return;
}

This avoids the misleading "no handler for ALPN" warning. The challenge is already answered at the TLS layer; the application just closes gracefully. No ProtocolHandler registration for acme-tls/1.
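The ordering matters: the guard has to run before the registry lookup, otherwise the challenge connection falls into the "no handler" branch. A std-only sketch of that ordering follows. DispatchOutcome, dispatch, and the HashMap registry are hypothetical stand-ins for dispatch_quinn and HandlerRegistry.

```rust
use std::collections::HashMap;

const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// What the dispatcher decided for one accepted connection.
#[derive(Debug, PartialEq)]
enum DispatchOutcome<'a> {
    /// Challenge already answered during the handshake; close quietly.
    AcmeChallengeClosed,
    /// A registered handler takes the connection.
    Handled(&'a str),
    /// Unknown ALPN; this is the branch that logs a warning.
    NoHandler,
}

fn dispatch<'a>(alpn: &[u8], handlers: &'a HashMap<Vec<u8>, String>) -> DispatchOutcome<'a> {
    // Early return: the registry is never consulted for challenge connections.
    if alpn == ACME_TLS_ALPN {
        return DispatchOutcome::AcmeChallengeClosed;
    }
    match handlers.get(alpn) {
        Some(name) => DispatchOutcome::Handled(name.as_str()),
        None => DispatchOutcome::NoHandler,
    }
}
```

With this shape, a handler accidentally registered under acme-tls/1 would never be reached, which matches the decision to keep ACME out of the registry.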

6. Feature-gate ACME behind a new acme feature

Add an acme feature to alknet-core:

[features]
acme = ["dep:rustls-acme"]

TlsIdentity::Acme is available regardless of feature (it's just config data), but constructing TlsSetup with an Acme variant requires the acme feature. Without it, TlsIdentity::Acme at endpoint construction returns an error ("ACME feature not enabled"). This keeps the footprint down for nodes that don't need ACME — rustls-acme and its dependencies are only compiled when the feature is on.
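One way to get "variant always present, construction gated" is a pair of cfg-selected functions with the same signature. The sketch below assumes that approach. SetupError and setup_acme are hypothetical names, and the enabled branch is a placeholder for the real rustls-acme construction.

```rust
use std::fmt;

/// Hypothetical error type for endpoint setup.
#[derive(Debug, PartialEq)]
enum SetupError {
    AcmeFeatureDisabled,
}

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SetupError::AcmeFeatureDisabled => f.write_str("ACME feature not enabled"),
        }
    }
}

/// Compiled only with `--features acme`; the real body would build the ACME
/// state and resolver here.
#[cfg(feature = "acme")]
fn setup_acme(_domains: &[String]) -> Result<(), SetupError> {
    Ok(())
}

/// Compiled when the feature is off: same signature, always an error, and no
/// reference to rustls-acme anywhere in the build.
#[cfg(not(feature = "acme"))]
fn setup_acme(_domains: &[String]) -> Result<(), SetupError> {
    Err(SetupError::AcmeFeatureDisabled)
}
```

Callers match on the identity enum unconditionally and only this one function changes with the feature set, so a config naming ACME fails at endpoint construction rather than at compile time.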

7. acme-tls/1 in ALPN list only when ACME is active

When TlsIdentity::Acme is configured, acme-tls/1 is appended to the alpn_protocols list alongside the handler ALPNs. When ACME is not configured, acme-tls/1 is not advertised — no behavior change for non-ACME nodes.
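The ALPN list assembly can be sketched as a small pure function. The function name, the acme_active flag, and the duplicate check are assumptions for illustration; the handler ALPN used in the usage note is a placeholder.

```rust
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// Build the advertised ALPN list: handler ALPNs first, then `acme-tls/1`
/// only when ACME is the configured identity.
fn alpn_protocols(handler_alpns: &[Vec<u8>], acme_active: bool) -> Vec<Vec<u8>> {
    let mut protocols = handler_alpns.to_vec();
    let already_listed = protocols.iter().any(|p| p.as_slice() == ACME_TLS_ALPN);
    if acme_active && !already_listed {
        protocols.push(ACME_TLS_ALPN.to_vec());
    }
    protocols
}
```

For a node with a single handler ALPN such as example/1, the list is unchanged when ACME is off and gains one trailing entry when it is on.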

Consequences

  • Breaking change to TlsIdentity: RawKey(iroh::SecretKey) → RawKey(Ed25519SecretKey). Pre-1.0 crate, in-repo consumers only. The assembly layer and tests that construct TlsIdentity::RawKey must update.
  • ed25519-dalek becomes a direct dependency of alknet-core. It's already in the dependency tree (transitive via iroh), so no new compilation cost for iroh builds. Quinn-only builds that were not using RawKey before will now compile ed25519-dalek — it's a small, pure-Rust crate with no C dependencies.
  • rustls-acme is feature-gated (acme feature). Nodes not using ACME don't compile it. The feature is compatible with quinn (ACME is quinn-only; iroh uses its own TLS).
  • build_rustls_server_config becomes async (or is replaced by an async TlsSetup::new). The accept loop already runs in an async context, so this is a local change.
  • ACME state machine lifecycle: the AcmeState task is spawned in AlknetEndpoint::new() and aborted on shutdown. The TlsSetup struct carries the task handle (the acme_state field from Decision 2) so AlknetEndpoint::shutdown() can abort it.
  • No handler needed for acme-tls/1: the dispatch_quinn guard handles it. HandlerRegistry is not involved.

Alternatives Considered

A. ACME as a ResolvesServerCert wrapper behind X509

OQ-12 suggested ACME "fits naturally as an additional TlsIdentity variant or as a rustls::ResolvesServerCert implementation behind the existing X509 path." The second option — wrapping X509 — was rejected because ACME needs async state + config fields (domains, cache, contact) that don't fit behind the static X509 { cert, key } variant. A ResolvesServerCert that internally does ACME would need to be constructed at config time with those fields, which means X509 would need to carry them — bloating the variant for non-ACME users. A dedicated Acme variant is cleaner.

B. Keep RawKey coupled to iroh, only add ACME

Rejected because the coupling is the root cause of quinn-only builds not supporting the "default" identity mode. Fixing only ACME would leave the artificial iroh dependency in place. Since both changes touch TlsIdentity and build_rustls_server_config, doing them together avoids two breaking changes to the same enum.

C. Use iroh::SecretKey for both, re-export from alknet-core

Rejected because it would make iroh a non-optional dependency of alknet-core, defeating the feature-gated transport design (ADR-010). ed25519-dalek is a lightweight, pure-Rust crate; iroh is not.

D. Register a no-op ProtocolHandler for acme-tls/1

Rejected because it would require the handler registry to know about ACME (a TLS-layer concern), polluting the ALPN dispatch abstraction. The dispatch_quinn guard is a one-line check that keeps ACME handling in the endpoint layer where it belongs.

Cross-References

  • OQ-12 (TLS identity provisioning) — updated by this ADR
  • ADR-010 — multi-connectivity endpoint, feature-gated transports
  • ADR-004 — auth as shared core
  • docs/architecture/crates/core/endpoint.md — TLS identity use cases, updated
  • docs/architecture/crates/core/config.md — TlsIdentity enum, updated
  • /workspace/@alkdev/reverse-proxy/src/tls/ — proven ACME implementation pattern
  • rustls-acme crate — ACME state machine + cert resolver