docs(adr-027): TLS identity redesign — ACME + RawKey decoupling

ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:

1. TlsIdentity::Acme variant: static config data (domains, cache_dir,
   directory, contact), with the async AcmeState constructed at endpoint
   setup via two-phase TlsSetup (not stuffed into the Clone-able enum).
2. TlsIdentity::RawKey decoupled from the iroh feature: uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
   default for most alknet nodes) now works in quinn-only builds. The
   iroh transport converts via SecretKey::from_bytes.
3. ACME feature-gated behind a new acme feature (rustls-acme optional
   dep). Non-ACME builds don't compile it.
4. dispatch_quinn guard for acme-tls/1 challenge connections:
   TLS-ALPN-01 is handled at the rustls cert resolver layer during the
   handshake; the guard closes challenge connections gracefully instead
   of logging a misleading "no handler" warning.

Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.

Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).

Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey).

Graph: 36 tasks.
@@ -61,6 +61,7 @@ last_updated: 2026-06-23
 | [024](decisions/024-operation-registry-layering.md) | Operation Registry Layering | Accepted |
 | [025](decisions/025-vault-local-only-dispatch.md) | Vault Local-Only Dispatch | Accepted |
 | [026](decisions/026-vault-key-model-hd-derivation.md) | Vault Key Model — HD Derivation | Accepted |
+| [027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | TLS Identity Redesign — ACME + RawKey Decoupling | Accepted |

 ## Open Questions

@@ -82,7 +83,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 - **OQ-04**: Dynamic handler registration — static at startup (ADR-010); scoped to the `HandlerRegistry` (ALPN-level) by ADR-024, which governs `OperationRegistry` mutability separately
 - **OQ-07**: Call protocol scope — bidirectional streams, EventEnvelope, ID-based correlation (ADR-012)
 - **OQ-11**: Handler-level auth resolution observability — handlers store resolved identity on Connection (Option B); two identity scopes: connection-level (observability) and per-request (ACL)
-- **OQ-12**: TLS identity provisioning — two use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME is a proven pattern.
+- **OQ-12**: TLS identity provisioning — two use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME designed in ADR-027; RawKey decoupled from iroh feature.
 - **OQ-13**: Operation path format — `/{service}/{op}` is the correct design for alknet-call, not a simplification
 - **OQ-14**: Batch operation semantics — multiple correlated `call.requested` events is the correct protocol design, not a simplification
 - **OQ-19**: Session-scoped registries — agent-written operations via `OperationEnv` trait layering; protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays as well.
@@ -41,15 +41,24 @@ pub enum TlsIdentity {
     /// RFC 7250 raw Ed25519 public key.
     /// No domain, no CA, no cert renewal. Key = identity.
     /// Same model as iroh's NodeId, but for direct QUIC connections.
-    /// `SecretKey` is `iroh::SecretKey` (Ed25519) — re-exported from iroh,
-    /// which alknet-core already depends on (feature-gated, ADR-010). The
-    /// key can be derived from alknet-vault at the assembly layer
-    /// (endpoint.md) or generated fresh. See OQ-12, W14.
-    RawKey(iroh::SecretKey),
+    /// Uses `Ed25519SecretKey` (alknet-core-owned wrapper over
+    /// `ed25519_dalek::SigningKey`) — not coupled to the `iroh` feature.
+    /// Available in quinn-only builds. See ADR-027.
+    RawKey(Ed25519SecretKey),

     /// Self-signed X.509 cert for development.
     /// Generated on startup, not validated by external clients.
     SelfSigned,
+
+    /// ACME auto-provisioning via Let's Encrypt (rustls-acme).
+    /// Produces X.509 certs at runtime; handles TLS-ALPN-01 challenges
+    /// and automatic renewal. Feature-gated behind `acme`. See ADR-027.
+    Acme {
+        domains: Vec<String>,
+        cache_dir: PathBuf,
+        directory: AcmeDirectory, // Production, Staging, Custom(url)
+        contact: Vec<String>, // e.g. ["mailto:admin@example.com"]
+    },
 }
 ```

@@ -57,7 +66,52 @@ pub enum TlsIdentity {

 TLS identity in alknet has two distinct use cases, not one. The original `tls_cert: Option<PathBuf>` / `tls_key: Option<PathBuf>` assumed X.509 was the only TLS identity model. RFC 7250 raw public keys (used by iroh, supported by rustls) provide a fundamentally different mode: Ed25519 key as identity, no X.509, no CA, no domain. This is the default for most alknet nodes — it works natively with SSH auth and git. X.509 certs are for domain-hosted services and browser/WebTransport clients, which don't support RFC 7250.

-The `TlsIdentity` enum captures both use cases plus a development mode. See OQ-12 for the full rationale.
+The `TlsIdentity` enum captures all four modes. See OQ-12 for the use-case
+rationale and [ADR-027](../../decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md)
+for the ACME + RawKey decoupling design.
+
+### `Ed25519SecretKey`
+
+A thin alknet-core-owned wrapper over `ed25519_dalek::SigningKey`. Not
+feature-gated — available in all builds. Used by `TlsIdentity::RawKey`
+for RFC 7250 raw public key TLS identity. When the `iroh` transport is
+configured, `build_iroh_endpoint` converts to `iroh::SecretKey::from_bytes`
+(see ADR-027, Decision 4).
+
+### `AcmeDirectory`
+
+```rust
+pub enum AcmeDirectory {
+    Production, // Let's Encrypt production
+    Staging, // Let's Encrypt staging
+    Custom(String), // custom ACME directory URL
+}
+```
+
+### Construction examples (updated)
+
+```rust
+// P2P / key-based identity (default for most nodes) — no iroh dep needed
+let p2p_config = StaticConfig {
+    listen_addr: Some("0.0.0.0:4433".parse()?),
+    tls_identity: Some(TlsIdentity::RawKey(Ed25519SecretKey::generate())),
+    iroh_relay: None,
+    drain_timeout: Duration::from_secs(2),
+};
+
+// Domain-hosted service with ACME auto-provisioning
+let acme_config = StaticConfig {
+    listen_addr: Some("0.0.0.0:443".parse()?),
+    tls_identity: Some(TlsIdentity::Acme {
+        domains: vec!["relay.alk.dev".to_string()],
+        cache_dir: "/var/lib/alknet/acme".into(),
+        directory: AcmeDirectory::Production,
+        contact: vec!["mailto:admin@alk.dev".to_string()],
+    }),
+    iroh_relay: None,
+    drain_timeout: Duration::from_secs(2),
+};
+```

 ### Key differences from reference implementation

@@ -80,12 +134,12 @@ The reference `StaticConfig` (in `alknet-main/crates/alknet-core/src/config/stat
 // P2P / key-based identity (default for most nodes)
 let p2p_config = StaticConfig {
     listen_addr: Some("0.0.0.0:4433".parse()?),
-    tls_identity: Some(TlsIdentity::RawKey(iroh::SecretKey::generate())),
+    tls_identity: Some(TlsIdentity::RawKey(Ed25519SecretKey::generate())),
     iroh_relay: None,
     drain_timeout: Duration::from_secs(2),
 };

-// Domain-hosted service (relays, public services, browsers)
+// Domain-hosted service (relays, public services, browsers) — manual certs
 let domain_config = StaticConfig {
     listen_addr: Some("0.0.0.0:4433".parse()?),
     tls_identity: Some(TlsIdentity::X509 {
@@ -206,7 +206,7 @@ This mode works natively with SSH auth (same key type) and git (SSH key-based au
 Nodes that serve browser/WebTransport clients, or nodes with public domain names, use X.509 certificates. This has two sub-cases:

 - **Manual**: Provide cert/key file paths via `TlsIdentity::X509`. The endpoint loads them at startup and builds a standard `rustls::ServerConfig`.
-- **ACME auto-provisioning**: Let's Encrypt via `rustls-acme`. The reverse-proxy project (`/workspace/@alkdev/reverse-proxy`) demonstrates the complete pattern: per-listener ACME state machine, `ResolvesServerCertAcme` rustls integration, TLS-ALPN-01 challenge handling, automatic renewal. This is a proven, solved implementation pattern. It will be adapted to alknet's `AlknetEndpoint` context as an additional `TlsIdentity` variant or `ResolvesServerCert` implementation.
+- **ACME auto-provisioning**: Let's Encrypt via `rustls-acme`. `TlsIdentity::Acme { domains, cache_dir, directory, contact }` carries the static config; the endpoint constructs the `AcmeState` async state machine and `ResolvesServerCertAcme` at setup time (ADR-027). The `acme` feature gate keeps `rustls-acme` out of non-ACME builds. See [ADR-027](../../decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) for the full design.

 `TlsIdentity::SelfSigned` is for development only — the endpoint generates a self-signed cert on startup. External clients will not trust it.

@@ -219,10 +219,17 @@ The iroh endpoint does not need TLS certificate configuration — it uses `NodeI
 | Path | Identity model | Client compatibility | Use case |
 |------|---------------|---------------------|----------|
 | quinn + `TlsIdentity::RawKey` | RFC 7250 Ed25519 raw key | alknet-native, SSH, git | Personal nodes, P2P, most deployments |
-| quinn + `TlsIdentity::X509` | X.509 domain certificate | All clients including browsers | Relays, public services, WebTransport |
+| quinn + `TlsIdentity::X509` | X.509 domain certificate (manual) | All clients including browsers | Relays, public services, WebTransport |
+| quinn + `TlsIdentity::Acme` | X.509 via ACME auto-provisioning | All clients including browsers | Public relays, domain-hosted services |
 | quinn + `TlsIdentity::SelfSigned` | X.509 self-signed cert | None (dev only) | Local development |
 | iroh | NodeId (Ed25519, RFC 7250 built-in) | alknet-native, iroh clients | NAT traversal, home servers |

+Note: `TlsIdentity::RawKey` uses `Ed25519SecretKey` (alknet-core-owned,
+backed by `ed25519-dalek`), not `iroh::SecretKey`. It is available in
+quinn-only builds without the `iroh` feature. When the iroh transport is
+also configured, `build_iroh_endpoint` converts the key to
+`iroh::SecretKey::from_bytes` (ADR-027).
+
 ## Graceful Shutdown

 ```rust
@@ -294,4 +301,4 @@ See [open-questions.md](../../open-questions.md) for full details.

 - **OQ-04**: Resolved — HandlerRegistry is static at startup.
 - **OQ-05**: Resolved — multi-connectivity endpoint with quinn + iroh, both feature-gated.
-- **OQ-12**: Resolved — two distinct TLS identity use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME is a proven pattern from the reverse-proxy project, not speculative future work.
+- **OQ-12**: Resolved — two distinct TLS identity use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME auto-provisioning designed in [ADR-027](../../decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md); RawKey decoupled from the `iroh` feature (available in quinn-only builds).
@@ -183,7 +183,7 @@ pub enum HandlerError {
 **Negative:**
 - alknet-core depends on both quinn and iroh (mitigated: both are feature-gated; a node that only needs one doesn't compile the other)
 - The endpoint is more complex than a single quinn listener — it manages multiple accept loops
-- TLS identity provisioning has two distinct use cases: RFC 7250 raw keys (default for P2P/key-based identity) and X.509 certs (for domain-hosted services and browsers). ACME auto-provisioning for X.509 is a proven pattern from the reverse-proxy project, not speculative future work. See OQ-12.
+- TLS identity provisioning has two distinct use cases: RFC 7250 raw keys (default for P2P/key-based identity) and X.509 certs (for domain-hosted services and browsers). ACME auto-provisioning and RawKey decoupling from the `iroh` feature are designed in ADR-027. See OQ-12.
 - No runtime handler registration without regenerating the TLS config (mitigated: two-way door, start static, add ArcSwap later if needed)

 ## References
@@ -0,0 +1,279 @@
+# ADR-027: TLS Identity Redesign — ACME Integration + RawKey Decoupling
+
+## Status
+
+Accepted
+
+## Context
+
+OQ-12, marked "resolved", identified two TLS identity use cases: RFC 7250
+raw Ed25519 keys (default, P2P) and X.509 certs (domain-hosted, browsers).
+ACME auto-provisioning was described as "additive — it will be adapted
+when domain-hosted nodes need it." That deferral created two
+architectural issues that surface now that ACME is a concrete target.
+
+### Issue 1: `TlsIdentity` cannot represent ACME
+
+`TlsIdentity` is `#[derive(Debug, Clone)]` and lives in `StaticConfig` —
+a static, synchronous config value. ACME requires:
+
+- A long-lived async state machine (`AcmeState` event loop, spawned for
+  the endpoint's lifetime) that handles ordering, challenge response,
+  cert renewal, and cache I/O.
+- TLS-ALPN-01 challenge handling: `acme-tls/1` must be in the server's
+  `alpn_protocols`, and a `ResolvesServerCertAcme` must serve challenge
+  certs during the TLS handshake.
+- Config fields: domains, cache directory, ACME directory URL, contact
+  email.
+
+`AcmeState` is not `Clone`. It cannot be a `TlsIdentity` variant. The
+current `build_rustls_server_config(&TlsIdentity) -> ServerConfig` is
+synchronous — there's no room for spawning an async state machine or
+holding a runtime resolver handle. The reverse-proxy project solved this
+with a two-phase construction: static config → `TlsMode` (runtime
+objects) → `ServerConfig`. alknet needs the same split.
+
+### Issue 2: `RawKey` is coupled to the `iroh` feature
+
+`TlsIdentity::RawKey(iroh::SecretKey)` is gated `#[cfg(feature = "iroh")]`.
+The `RawKeyCertResolver` and `Ed25519SigningKey` impls are gated
+`#[cfg(all(feature = "quinn", feature = "iroh"))]`. This means a
+quinn-only build (the default feature set) **cannot use RFC 7250 raw-key
+identity** — the very mode described as "default for most alknet nodes."
+
+The coupling is artificial. `iroh::SecretKey` is a thin newtype over
+`ed25519_dalek::SigningKey` (`pub struct SecretKey(SigningKey)`). The
+alknet code uses exactly three APIs: `.public().as_bytes()`, `.sign(msg)`,
+and `.clone()`. None of these are iroh-specific. The raw-key TLS path
+needs Ed25519 signing + SPKI encoding — both available from
+`ed25519-dalek` + `rustls` without iroh.
+
+The iroh *transport* (`build_iroh_endpoint`) does need `iroh::SecretKey`
+for `iroh::Endpoint::builder().secret_key(...)`. If `TlsIdentity::RawKey`
+no longer carries an `iroh::SecretKey`, the iroh transport must convert
+from the new key type — trivial since `iroh::SecretKey::from_bytes(&[u8;
+32])` accepts raw Ed25519 key bytes.
+
+### ACME challenge handling with quinn (QUIC, not TCP)
+
+Research confirmed how TLS-ALPN-01 works with quinn:
+
+- The `ResolvesServerCertAcme` resolver intercepts the challenge at the
+  **cert resolution step**, during the TLS handshake, before the
+  handshake result is surfaced to the application.
+- When an ACME CA connects with ALPN `[acme-tls/1]`, rustls calls the
+  resolver, which returns the challenge cert. The handshake completes.
+  The CA inspects the cert's SAN and validates the challenge — no
+  application-layer data exchange needed.
+- quinn's `connecting.await` then returns a completed `Connection` with
+  ALPN `acme-tls/1`. alknet's `dispatch_quinn` would find no handler for
+  that ALPN and close the connection. **The challenge already succeeded**
+  — the close is cosmetic.
+- Unlike the reverse-proxy (TCP + `LazyConfigAcceptor`), quinn gives no
+  "peek at ClientHello" hook. The challenge is fully TLS-layer-handled;
+  the application only needs to close challenge connections gracefully
+  (silent close, not a "no handler" warning).
+
+Key constraint: ACME requires `with_cert_resolver(ResolvesServerCertAcme)`,
+not `with_single_cert`. You cannot just append `acme-tls/1` to an
+`X509`/`SelfSigned` config — there'd be no resolver to serve the
+challenge cert. ACME is a distinct `ServerConfig` construction path.
+
+## Decision
+
+### 1. Add `TlsIdentity::Acme` variant (static config data only)
+
+```rust
+pub enum TlsIdentity {
+    X509 { cert: PathBuf, key: PathBuf },
+    RawKey(Ed25519SecretKey), // see Decision 3
+    SelfSigned,
+    Acme { // NEW
+        domains: Vec<String>,
+        cache_dir: PathBuf,
+        directory: AcmeDirectory, // enum: Production, Staging, Custom(String)
+        contact: Vec<String>, // e.g. ["mailto:admin@example.com"]
+    },
+}
+```
+
+`Acme` holds only static, `Clone`/`Debug`-safe config data. No
+`AcmeState`, no resolver, no runtime objects. The async state machine is
+constructed at endpoint setup time (Decision 2).
+
+### 2. Split server-config construction into two phases
+
+Replace the synchronous `build_rustls_server_config(&TlsIdentity) ->
+ServerConfig` with a two-phase construction:
+
+**Phase 1 — `TlsSetup` (async, at endpoint construction):**
+
+```rust
+struct TlsSetup {
+    server_config: rustls::ServerConfig,
+    acme_state: Option<AcmeStateHandle>, // spawned task + handle for shutdown
+}
+```
+
+For `X509`, `SelfSigned`, `RawKey`: construct `ServerConfig`
+synchronously (current path, unchanged). `acme_state` is `None`.
+
+For `Acme`: construct `AcmeConfig`, spawn the `AcmeState` event loop,
+get `ResolvesServerCertAcme`, build `ServerConfig` with
+`with_cert_resolver(resolver)`, add `acme-tls/1` to `alpn_protocols`.
+`acme_state` is `Some(handle)` so the endpoint can abort the ACME task
+on shutdown.
+
+**Phase 2 — use `TlsSetup.server_config` to build `quinn::ServerConfig`:**
+
+Same as today: `QuicServerConfig::try_from(rustls_config)` →
+`quinn::ServerConfig::with_crypto(...)`.
+
+The `TlsSetup` is constructed inside `AlknetEndpoint::new()` (or
+`run_quinn_accept_loop`), not inside `TlsIdentity`. The `TlsIdentity`
+enum stays a pure data structure.
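A minimal sketch of the two-phase split in Decision 2, offered as an illustration and not as code from this commit. It runs synchronously and replaces the real runtime objects with stand-ins: `ServerConfigStub`, `AcmeStateHandle` as a plain struct, `build_tls_setup`, and the `[u8; 32]` payload of `RawKey` are all hypothetical names or shapes chosen for the example. The point it shows is that `TlsIdentity` stays cloneable config data while the runtime objects are produced in a separate setup step.

```rust
use std::path::PathBuf;

/// Static ACME directory choice, as in the ADR.
#[derive(Debug, Clone)]
pub enum AcmeDirectory {
    Production,
    Staging,
    Custom(String),
}

/// Pure config data: cloneable, holds no runtime objects.
#[derive(Debug, Clone)]
pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey([u8; 32]), // stand-in for Ed25519SecretKey (assumption)
    SelfSigned,
    Acme {
        domains: Vec<String>,
        cache_dir: PathBuf,
        directory: AcmeDirectory,
        contact: Vec<String>,
    },
}

/// Hypothetical stand-in for `rustls::ServerConfig`.
#[derive(Debug)]
pub struct ServerConfigStub {
    pub alpn_protocols: Vec<Vec<u8>>,
    pub acme_resolver: bool,
}

/// Hypothetical stand-in for the handle of the spawned ACME task.
#[derive(Debug)]
pub struct AcmeStateHandle {
    pub domains: Vec<String>,
}

/// Runtime objects derived from the static config (phase 1 output).
#[derive(Debug)]
pub struct TlsSetup {
    pub server_config: ServerConfigStub,
    pub acme_state: Option<AcmeStateHandle>,
}

/// Phase 1: turn static config into runtime objects.
pub fn build_tls_setup(identity: &TlsIdentity, handler_alpns: &[Vec<u8>]) -> TlsSetup {
    let mut alpn_protocols = handler_alpns.to_vec();
    match identity {
        TlsIdentity::Acme { domains, .. } => {
            // ACME path: resolver-based config plus the challenge ALPN.
            alpn_protocols.push(b"acme-tls/1".to_vec());
            TlsSetup {
                server_config: ServerConfigStub { alpn_protocols, acme_resolver: true },
                acme_state: Some(AcmeStateHandle { domains: domains.clone() }),
            }
        }
        // Non-ACME paths: no ACME task and no challenge ALPN.
        TlsIdentity::X509 { .. } | TlsIdentity::RawKey(_) | TlsIdentity::SelfSigned => TlsSetup {
            server_config: ServerConfigStub { alpn_protocols, acme_resolver: false },
            acme_state: None,
        },
    }
}
```

In the real design the `Acme` arm would be async and would spawn the `rustls-acme` state machine; the sketch only models which outputs each variant produces.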
+
+### 3. Decouple `RawKey` from iroh — use `ed25519-dalek` directly
+
+Replace `TlsIdentity::RawKey(iroh::SecretKey)` with
+`TlsIdentity::RawKey(Ed25519SecretKey)`, where `Ed25519SecretKey` is a
+thin alknet-core-owned wrapper over `ed25519_dalek::SigningKey`:
+
+```rust
+pub struct Ed25519SecretKey(ed25519_dalek::SigningKey);
+```
+
+This type is `Clone`, `Debug` (redacting), `Zeroize`, and not gated
+behind any feature flag. `ed25519-dalek` becomes a direct dependency of
+alknet-core (it's already in the dependency tree transitively via iroh).
+
+The `RawKeyCertResolver` and `Ed25519SigningKey` rustls impls move from
+`#[cfg(all(feature = "quinn", feature = "iroh"))]` to
+`#[cfg(feature = "quinn")]` — raw-key TLS identity works in quinn-only
+builds.
+
+The `iroh` feature gate on `TlsIdentity::RawKey` is removed. The
+variant is always available.
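To illustrate the "`Clone`, `Debug` (redacting)" property of the wrapper in Decision 3, here is a small sketch, not taken from this commit. It assumes a plain `[u8; 32]` seed in place of `ed25519_dalek::SigningKey` so that it compiles with the standard library alone; the `from_bytes` constructor and the exact redaction string are assumptions, and `Zeroize` is omitted.

```rust
use std::fmt;

/// Stand-in wrapper: the ADR's type wraps `ed25519_dalek::SigningKey`;
/// a raw 32-byte seed is used here only to keep the sketch dependency-free.
#[derive(Clone)]
pub struct Ed25519SecretKey([u8; 32]);

impl Ed25519SecretKey {
    /// Hypothetical constructor from raw seed bytes.
    pub fn from_bytes(seed: [u8; 32]) -> Self {
        Self(seed)
    }

    /// Borrow the raw key bytes, e.g. for conversion to another key type.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

/// Manual `Debug` so key material never reaches logs or panic messages.
impl fmt::Debug for Ed25519SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Ed25519SecretKey(<redacted>)")
    }
}
```

Because `Debug` is implemented by hand, an enclosing config type can still `#[derive(Debug, Clone)]` without leaking the key.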
+
+### 4. iroh transport converts from `Ed25519SecretKey`
+
+`build_iroh_endpoint` currently reads `TlsIdentity::RawKey(iroh::SecretKey)`
+and passes it to `iroh::Endpoint::builder().secret_key(...)`. After
+decoupling, it converts:
+
+```rust
+if let Some(TlsIdentity::RawKey(key)) = static_config.tls_identity.as_ref() {
+    let iroh_key = iroh::SecretKey::from_bytes(key.as_bytes());
+    builder = builder.secret_key(iroh_key);
+}
+```
+
+`iroh::SecretKey::from_bytes(&[u8; 32])` accepts raw Ed25519 key bytes —
+no information loss. This conversion is `#[cfg(feature = "iroh")]` only.
+
+### 5. ACME ALPN challenge handling in `dispatch_quinn`
+
+Add an early-return guard in `dispatch_quinn` before the handler lookup:
+
+```rust
+if alpn == b"acme-tls/1" {
+    debug!("acme-tls/1 challenge connection completed at TLS layer; closing");
+    connection.close(0u32.into(), b"acme done");
+    return;
+}
+```
+
+This avoids the misleading "no handler for ALPN" warning. The challenge
+is already answered at the TLS layer; the application just closes
+gracefully. No `ProtocolHandler` registration for `acme-tls/1`.
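The ordering in Decision 5 (guard first, handler lookup second) can be sketched as a pure function, which makes it easy to test without a live QUIC connection. This is an illustration, not code from this commit: `Dispatch`, `classify`, the `HashMap`-based registry, and the `alknet/1` ALPN are hypothetical stand-ins for `dispatch_quinn` and `HandlerRegistry`.

```rust
use std::collections::HashMap;

/// ALPN used by TLS-ALPN-01 challenge connections.
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// Outcome of inspecting a completed connection's negotiated ALPN.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// Challenge already answered during the handshake: close quietly.
    AcmeChallengeDone,
    /// A registered handler (identified by name here) takes the connection.
    Handler(&'static str),
    /// Unknown ALPN: this is the case that deserves a warning.
    NoHandler,
}

/// Check the ACME guard before the handler lookup, as Decision 5 describes.
pub fn classify(alpn: &[u8], handlers: &HashMap<Vec<u8>, &'static str>) -> Dispatch {
    if alpn == ACME_TLS_ALPN {
        return Dispatch::AcmeChallengeDone;
    }
    match handlers.get(alpn) {
        Some(name) => Dispatch::Handler(*name),
        None => Dispatch::NoHandler,
    }
}
```

Keeping the guard outside the registry means the registry never needs an `acme-tls/1` entry, which matches the rejection of Alternative D.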
+
+### 6. Feature-gate ACME behind a new `acme` feature
+
+Add an `acme` feature to alknet-core:
+
+```toml
+[features]
+acme = ["dep:rustls-acme"]
+```
+
+`TlsIdentity::Acme` is available regardless of feature (it's just config
+data), but constructing `TlsSetup` with an `Acme` variant requires the
+`acme` feature. Without it, `TlsIdentity::Acme` at endpoint construction
+returns an error ("ACME feature not enabled"). This keeps the
+footprint down for nodes that don't need ACME — `rustls-acme` and its
+dependencies are only compiled when the feature is on.
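One way to get the behaviour in Decision 6 (config variant always present, construction failing without the feature) is a pair of `cfg`-selected functions. The sketch below is an assumption about shape, not code from this commit: `setup_acme`, `AcmeParams`, and `TlsSetupError` are hypothetical names, and the enabled branch returns a placeholder string where a real build would construct the `rustls-acme` state.

```rust
/// Error returned when ACME is configured but not compiled in.
#[derive(Debug, PartialEq, Eq)]
pub enum TlsSetupError {
    AcmeFeatureDisabled,
}

/// Static ACME parameters; always available, independent of the feature.
#[derive(Debug, Clone)]
pub struct AcmeParams {
    pub domains: Vec<String>,
}

/// Compiled only with `--features acme`: the real ACME setup would live here.
#[cfg(feature = "acme")]
pub fn setup_acme(params: &AcmeParams) -> Result<String, TlsSetupError> {
    // Placeholder for constructing the ACME state machine and resolver.
    Ok(format!("acme resolver for {}", params.domains.join(",")))
}

/// Compiled without the feature: same signature, fails at endpoint setup.
#[cfg(not(feature = "acme"))]
pub fn setup_acme(_params: &AcmeParams) -> Result<String, TlsSetupError> {
    Err(TlsSetupError::AcmeFeatureDisabled)
}
```

Callers see one signature either way, so a node built without `acme` reports the misconfiguration at endpoint construction instead of failing to compile.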
+
+### 7. `acme-tls/1` in ALPN list only when ACME is active
+
+When `TlsIdentity::Acme` is configured, `acme-tls/1` is appended to the
+`alpn_protocols` list alongside the handler ALPNs. When ACME is not
+configured, `acme-tls/1` is not advertised — no behavior change for
+non-ACME nodes.
+
+## Consequences
+
+- **Breaking change to `TlsIdentity`**: `RawKey(iroh::SecretKey)` →
+  `RawKey(Ed25519SecretKey)`. Pre-1.0 crate, in-repo consumers only.
+  The assembly layer and tests that construct `TlsIdentity::RawKey` must
+  update.
+- **`ed25519-dalek` becomes a direct dependency** of alknet-core. It's
+  already in the dependency tree (transitive via iroh), so no new
+  compilation cost for `iroh` builds. Quinn-only builds that were not
+  using `RawKey` before will now compile `ed25519-dalek` — it's a small,
+  pure-Rust crate with no C dependencies.
+- **`rustls-acme` is feature-gated** (`acme` feature). Nodes not using
+  ACME don't compile it. The feature is compatible with `quinn` (ACME
+  is quinn-only; iroh uses its own TLS).
+- **`build_rustls_server_config` becomes async** (or is replaced by an
+  async `TlsSetup::new`). The accept loop already runs in an async
+  context, so this is a local change.
+- **ACME state machine lifecycle**: the `AcmeState` task is spawned in
+  `AlknetEndpoint::new()` and aborted on shutdown. The `TlsSetup` struct
+  carries the `JoinHandle` so `AlknetEndpoint::shutdown()` can abort it.
+- **No handler needed for `acme-tls/1`**: the `dispatch_quinn` guard
+  handles it. `HandlerRegistry` is not involved.
+
+## Alternatives Considered
+
+### A. ACME as a `ResolvesServerCert` wrapper behind `X509`
+
+OQ-12 suggested ACME "fits naturally as an additional `TlsIdentity`
+variant or as a `rustls::ResolvesServerCert` implementation behind the
+existing `X509` path." The second option — wrapping `X509` — was
+rejected because ACME needs async state + config fields (domains, cache,
+contact) that don't fit behind the static `X509 { cert, key }` variant.
+A `ResolvesServerCert` that internally does ACME would need to be
+constructed at config time with those fields, which means `X509` would
+need to carry them — bloating the variant for non-ACME users. A
+dedicated `Acme` variant is cleaner.
+
+### B. Keep `RawKey` coupled to iroh, only add ACME
+
+Rejected because the coupling is the root cause of quinn-only builds not
+supporting the "default" identity mode. Fixing only ACME would leave the
+artificial iroh dependency in place. Since both changes touch
+`TlsIdentity` and `build_rustls_server_config`, doing them together
+avoids two breaking changes to the same enum.
+
+### C. Use `iroh::SecretKey` for both, re-export from alknet-core
+
+Rejected because it would make `iroh` a non-optional dependency of
+alknet-core, defeating the feature-gated transport design (ADR-010).
+`ed25519-dalek` is a lightweight, pure-Rust crate; `iroh` is not.
+
+### D. Register a no-op `ProtocolHandler` for `acme-tls/1`
+
+Rejected because it would require the handler registry to know about
+ACME (a TLS-layer concern), polluting the ALPN dispatch abstraction.
+The `dispatch_quinn` guard is a one-line check that keeps ACME handling
+in the endpoint layer where it belongs.
+
+## Cross-References
+
+- OQ-12 (TLS identity provisioning) — updated by this ADR
+- [ADR-010](010-alpn-router-and-endpoint.md) — multi-connectivity endpoint, feature-gated transports
+- [ADR-004](004-auth-as-shared-core.md) — auth as shared core
+- `docs/architecture/crates/core/endpoint.md` — TLS identity use cases, updated
+- `docs/architecture/crates/core/config.md` — `TlsIdentity` enum, updated
+- `/workspace/@alkdev/reverse-proxy/src/tls/` — proven ACME implementation pattern
+- `rustls-acme` crate — ACME state machine + cert resolver
@@ -172,16 +172,16 @@ These questions are acknowledged but not active. They will be promoted to open w
 - **Priority**: high
 - **Resolution**: TLS identity in alknet has two distinct use cases, not one:

-**Use case 1 — P2P / key-based identity (default for most alknet nodes):** RFC 7250 raw Ed25519 public keys. No domain, no CA, no cert renewal. The Ed25519 public key IS the node's identity. This is the same model iroh uses with its `NodeId`. It works natively with SSH auth (same key type) and git (SSH key-based auth). `TlsIdentity::RawKey` in `StaticConfig` covers this. This is the primary identity mode for alknet-native clients — most nodes will use this.
+**Use case 1 — P2P / key-based identity (default for most alknet nodes):** RFC 7250 raw Ed25519 public keys. No domain, no CA, no cert renewal. The Ed25519 public key IS the node's identity. This is the same model iroh uses with its `NodeId`. It works natively with SSH auth (same key type) and git (SSH key-based auth). `TlsIdentity::RawKey(Ed25519SecretKey)` in `StaticConfig` covers this. As of [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md), `RawKey` uses `ed25519_dalek::SigningKey` (via an alknet-core wrapper), **not** `iroh::SecretKey` — so raw-key TLS identity is available in quinn-only builds without the `iroh` feature.

 **Use case 2 — Domain-hosted services (relays, public-facing nodes):** X.509 certificates with domain names. Required for browser/WebTransport clients, which don't support RFC 7250. This has two sub-cases:
 - **Manual**: Provide cert/key file paths via `TlsIdentity::X509`. Already specified in `StaticConfig`.
-- **ACME auto-provisioning**: Let's Encrypt via rustls-acme. The reverse-proxy project (`/workspace/@alkdev/reverse-proxy`) demonstrates the complete pattern: per-listener ACME state machine, `ResolvesServerCertAcme` rustls integration, TLS-ALPN-01 challenge handling, automatic renewal. This is a proven, solved implementation pattern — not speculative future work. It will be adapted to alknet's `AlknetEndpoint` context when domain-hosted nodes need it.
+- **ACME auto-provisioning**: Let's Encrypt via `rustls-acme`. `TlsIdentity::Acme { domains, cache_dir, directory, contact }` carries static config; the endpoint constructs the `AcmeState` async state machine at setup time. Feature-gated behind `acme`. Designed in [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md). The reverse-proxy project (`/workspace/@alkdev/reverse-proxy`) demonstrates the proven pattern: `AcmeConfig`, `ResolvesServerCertAcme`, TLS-ALPN-01 challenge handling, automatic renewal.

 **Browser constraint**: Browsers require X.509 and don't support RFC 7250. For browser/WebTransport clients, domain-hosted nodes with X.509 certs are mandatory. All other clients (SSH, git, alknet-native) work with raw keys by default.

-The `TlsIdentity` enum in `StaticConfig` already captures all three modes (`X509`, `RawKey`, `SelfSigned`). ACME auto-provisioning is additive — it produces an X.509 cert at runtime rather than from files, and fits naturally as an additional `TlsIdentity` variant or as a `rustls::ResolvesServerCert` implementation behind the existing `X509` path.
-- **Cross-references**: ADR-010, [config.md](crates/core/config.md), [endpoint.md](crates/core/endpoint.md)
+The `TlsIdentity` enum in `StaticConfig` captures all four modes (`X509`, `RawKey`, `SelfSigned`, `Acme`). [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) records the design decisions for ACME integration and RawKey decoupling.
+- **Cross-references**: ADR-010, ADR-027, [config.md](crates/core/config.md), [endpoint.md](crates/core/endpoint.md)

 ### OQ-13: Operation Path Format and Routing Scope

@@ -217,6 +217,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [024](decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point |
 | [025](decisions/025-vault-local-only-dispatch.md) | Vault Local-Only Dispatch | Dropped irpc from vault; direct method calls; local-only by construction |
 | [026](decisions/026-vault-key-model-hd-derivation.md) | Vault Key Model — HD Derivation | HD derivation from BIP39 seed; `74'` coin type; SLIP-0010/Ed25519 default; AES-256-GCM for credentials |
+| [027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | TLS Identity Redesign — ACME + RawKey Decoupling | `TlsIdentity::Acme` variant + two-phase server config; `RawKey` uses `ed25519-dalek` (not `iroh::SecretKey`); `acme` feature gate |

 ## Open Questions

191
tasks/core/acme-integration.md
Normal file
@@ -0,0 +1,191 @@
---
id: core/acme-integration
name: Add ACME auto-provisioning via rustls-acme (ADR-027)
status: pending
depends_on: [core/rawkey-decouple-from-iroh]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement ACME auto-provisioning (Let's Encrypt) for alknet endpoints,
following ADR-027. Adds `TlsIdentity::Acme`, a new `acme` feature gate,
a two-phase server-config construction (`TlsSetup`), and a
`dispatch_quinn` guard for `acme-tls/1` challenge connections.

The reverse-proxy project (`/workspace/@alkdev/reverse-proxy/src/tls/`)
demonstrates the proven pattern: `AcmeConfig`, `AcmeState` event loop,
`ResolvesServerCertAcme`, TLS-ALPN-01 challenge handling, DirCache for
cert persistence. This task adapts that pattern to alknet's quinn-based
endpoint.

### Implementation steps

1. **Add `acme` feature to alknet-core `Cargo.toml`:**

   ```toml
   [features]
   acme = ["dep:rustls-acme"]

   [dependencies]
   rustls-acme = { version = "0.12", optional = true, features = ["aws-lc-rs"] }
   ```

   Use the same version as reverse-proxy (`=0.12.1` or compatible).
   Confirm the exact version against the latest available and the
   reverse-proxy's `Cargo.toml`.

2. **Add `TlsIdentity::Acme` variant and supporting types** in
   `config.rs`:

   ```rust
   pub enum TlsIdentity {
       X509 { cert: PathBuf, key: PathBuf },
       RawKey(Ed25519SecretKey),
       SelfSigned,
       Acme {
           domains: Vec<String>,
           cache_dir: PathBuf,
           directory: AcmeDirectory,
           contact: Vec<String>,
       },
   }

   pub enum AcmeDirectory {
       Production,
       Staging,
       Custom(String),
   }
   ```

   `Acme` holds only static, `Clone`/`Debug`-safe data. No `AcmeState`.

3. **Introduce `TlsSetup`** in `endpoint.rs` — the two-phase
   construction (ADR-027 Decision 2):

   ```rust
   struct TlsSetup {
       server_config: rustls::ServerConfig,
       acme_state_handle: Option<tokio::task::JoinHandle<()>>,
   }

   impl TlsSetup {
       async fn new(
           tls_identity: &TlsIdentity,
           alpns: &[Vec<u8>],
       ) -> Result<Self, EndpointError> {
           match tls_identity {
               TlsIdentity::X509 { .. } | TlsIdentity::SelfSigned | TlsIdentity::RawKey(_) => {
                   // synchronous path (current build_rustls_server_config)
                   let config = build_rustls_server_config(tls_identity, alpns)?;
                   Ok(Self { server_config: config, acme_state_handle: None })
               }
               TlsIdentity::Acme { domains, cache_dir, directory, contact } => {
                   #[cfg(feature = "acme")]
                   { Self::new_acme(domains, cache_dir, directory, contact, alpns).await }
                   #[cfg(not(feature = "acme"))]
                   { Err(EndpointError::TlsConfig(io::Error::other("ACME feature not enabled"))) }
               }
           }
       }
   }
   ```

4. **Implement `TlsSetup::new_acme`** (`#[cfg(feature = "acme")]`):
   - Build `AcmeConfig::new(domains)` with `DirCache::new(cache_dir)`,
     directory URL (from `AcmeDirectory`), and contact.
   - Get `state = acme_config.state()` and `resolver = state.resolver()`.
   - Build `rustls::ServerConfig` with
     `with_cert_resolver(resolver)` (NOT `with_single_cert`).
   - Append `b"acme-tls/1"` to `alpn_protocols` alongside handler ALPNs.
   - Spawn the `AcmeState` event loop as a tokio task (pattern from
     `reverse-proxy/src/tls/acme.rs:spawn_acme_state`). Log
     `DeployedCachedCert`, `DeployedNewCert`, and error events.
   - Return `TlsSetup { server_config, acme_state_handle: Some(handle) }`.

5. **Wire `TlsSetup` into the endpoint construction**: replace the
   direct `build_quinn_server_config` call in the accept loop setup with
   `TlsSetup::new(...).await?`. The `acme_state_handle` is stored on
   `AlknetEndpoint` (or the accept loop context) so it can be aborted on
   shutdown.

6. **Add `acme-tls/1` guard in `dispatch_quinn`** (ADR-027 Decision 5):

   ```rust
   if alpn == b"acme-tls/1" {
       debug!("acme-tls/1 challenge connection completed at TLS layer; closing");
       connection.close(0u32.into(), b"acme done");
       return;
   }
   ```

   Place this before the `handlers.get(&alpn)` lookup. This is
   `#[cfg(feature = "acme")]` — without the feature, the guard is
   absent and `acme-tls/1` is never in the ALPN list.

7. **Shutdown**: abort the `acme_state_handle` JoinHandle in
   `AlknetEndpoint::shutdown()` alongside the existing shutdown logic.
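
Steps 2 and 4 rely on turning an `AcmeDirectory` value into a directory URL, which is also what the first unit test in the acceptance criteria checks. A minimal sketch of that mapping follows, assuming a helper named `url` (a hypothetical name, not taken from ADR-027) and the derives shown here. The two Let's Encrypt endpoints are the publicly documented ACME v2 directory URLs.

```rust
/// Mirrors the `AcmeDirectory` enum from step 2; the derives are an assumption.
#[derive(Clone, Debug, PartialEq)]
pub enum AcmeDirectory {
    Production,
    Staging,
    Custom(String),
}

impl AcmeDirectory {
    /// Hypothetical helper: the ACME directory URL this variant stands for.
    pub fn url(&self) -> &str {
        match self {
            AcmeDirectory::Production => "https://acme-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Staging => "https://acme-staging-v02.api.letsencrypt.org/directory",
            // A custom directory is passed through unchanged.
            AcmeDirectory::Custom(url) => url.as_str(),
        }
    }
}
```

Keeping the mapping on the enum means `TlsIdentity::Acme` can stay plain data, and the URL is only resolved when `new_acme` builds its `AcmeConfig`.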

### ACME challenge handling (from research)

The `ResolvesServerCertAcme` resolver intercepts TLS-ALPN-01 challenges
at the cert resolution step — during the TLS handshake, before the
connection surfaces to the application. The challenge cert (with the
SHA-256 digest of the key authorization in its critical `acmeIdentifier`
extension and the validated domain in its SAN, per RFC 8737) is served
by the resolver; the CA validates it during the handshake. By the time
`dispatch_quinn` runs, the challenge has already been answered. The
`acme-tls/1` guard just closes the connection gracefully instead of
logging a misleading "no handler" warning.

Key constraint: ACME requires `with_cert_resolver`, not
`with_single_cert`. The `acme-tls/1` ALPN must be in
`alpn_protocols` or the challenge handshake aborts with
`no_application_protocol`.
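
The ALPN constraint above can be sketched without any TLS dependency. The function name `advertised_alpns` and the `acme_enabled` flag are hypothetical; in the real code the resulting list would be assigned to the `alpn_protocols` field of `rustls::ServerConfig`.

```rust
/// ALPN identifier for TLS-ALPN-01 challenge handshakes (RFC 8737).
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// Hypothetical helper: the ALPN list to advertise, given the handler ALPNs.
/// When ACME is configured, `acme-tls/1` is appended exactly once.
fn advertised_alpns(handler_alpns: &[Vec<u8>], acme_enabled: bool) -> Vec<Vec<u8>> {
    let mut alpns = handler_alpns.to_vec();
    let already_present = alpns.iter().any(|a| a.as_slice() == ACME_TLS_ALPN);
    if acme_enabled && !already_present {
        alpns.push(ACME_TLS_ALPN.to_vec());
    }
    alpns
}
```

Without ACME the list is returned unchanged, which matches the statement that `acme-tls/1` is never advertised in non-ACME builds.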

### What NOT to change

- `TlsIdentity::X509`, `RawKey`, `SelfSigned` construction paths —
  unchanged (the RawKey decoupling is done by the predecessor task).
- iroh endpoint — ACME is quinn-only (iroh uses its own TLS).
- `endpoint-request-client-cert` — independent task, can proceed in
  parallel.

## Acceptance Criteria

- [ ] `acme` feature added to alknet-core with `rustls-acme` as optional dep
- [ ] `TlsIdentity::Acme` variant exists with `domains`, `cache_dir`, `directory`, `contact`
- [ ] `AcmeDirectory` enum exists (Production, Staging, Custom)
- [ ] `TlsSetup` two-phase construction: synchronous for X509/RawKey/SelfSigned, async for Acme
- [ ] ACME path uses `with_cert_resolver(ResolvesServerCertAcme)`, not `with_single_cert`
- [ ] `acme-tls/1` added to `alpn_protocols` when ACME is configured
- [ ] `dispatch_quinn` has `acme-tls/1` guard (closes silently, no "no handler" warning)
- [ ] ACME state machine spawned as tokio task, aborted on endpoint shutdown
- [ ] `TlsIdentity::Acme` without `acme` feature returns a clear error at endpoint construction
- [ ] Unit test: `AcmeDirectory` resolves to correct Let's Encrypt URLs (staging vs production)
- [ ] Unit test: `TlsSetup::new` with `X509`/`RawKey`/`SelfSigned` returns `acme_state_handle: None`
- [ ] `cargo build -p alknet-core --features quinn` (no acme) succeeds — no rustls-acme compiled
- [ ] `cargo build -p alknet-core --features "quinn acme"` succeeds
- [ ] `cargo test -p alknet-core --all-features` succeeds
- [ ] `cargo clippy -p alknet-core --all-features --all-targets` clean
- [ ] `cargo clippy -p alknet-core --features quinn --all-targets` clean (no acme, no warnings)

## References

- ADR-027 — full design (two-phase construction, challenge handling, feature gate)
- /workspace/@alkdev/reverse-proxy/src/tls/acme.rs — `AcmeTlsConfig`, `spawn_acme_state` (proven pattern)
- /workspace/@alkdev/reverse-proxy/src/tls/acceptor.rs — `build_acme_server_config`, `acme-tls/1` ALPN
- crates/alknet-core/src/endpoint.rs:286-314 — `dispatch_quinn` (guard insertion site)
- crates/alknet-core/src/endpoint.rs:464-509 — `build_rustls_server_config` (TlsSetup replaces this for Acme)
- crates/alknet-core/src/config.rs:33-41 — `TlsIdentity` enum (new Acme variant)

## Notes

> Depends on `core/rawkey-decouple-from-iroh` because both modify
> `TlsIdentity` and `build_rustls_server_config`. The decoupling task
> cleans up the enum shape first; this task adds the Acme variant on top.
> The `acme` feature gate is critical — it keeps `rustls-acme` and its
> deps out of non-ACME builds. The reverse-proxy project is the reference
> implementation; adapt its event loop logging and cache patterns.
119
tasks/core/rawkey-decouple-from-iroh.md
Normal file
@@ -0,0 +1,119 @@
---
id: core/rawkey-decouple-from-iroh
name: Decouple TlsIdentity::RawKey from the iroh feature (ADR-027)
status: pending
depends_on: []
scope: narrow
risk: medium
impact: component
level: implementation
---

## Description

`TlsIdentity::RawKey(iroh::SecretKey)` is gated `#[cfg(feature = "iroh")]`
and the `RawKeyCertResolver` / `Ed25519SigningKey` rustls impls are gated
`#[cfg(all(feature = "quinn", feature = "iroh"))]`. This means quinn-only
builds (the default feature set) cannot use RFC 7250 raw-key identity —
the mode described as "default for most alknet nodes" (OQ-12, ADR-027).

The coupling is artificial: `iroh::SecretKey` is a thin newtype over
`ed25519_dalek::SigningKey`. The alknet code uses only `.public().as_bytes()`,
`.sign(msg)`, and `.clone()`. This task replaces `iroh::SecretKey` with an
alknet-core-owned `Ed25519SecretKey` wrapper, un-gates the raw-key TLS
path from the `iroh` feature, and updates the iroh transport to convert.

See ADR-027 for the full design rationale.

### Implementation steps

1. **Add `ed25519-dalek` as a direct dependency** of alknet-core in
   `Cargo.toml`. It's already in the lockfile (transitive via iroh).
   Version: `2.2` (match what's in `Cargo.lock`).

2. **Introduce `Ed25519SecretKey`** in `config.rs` (or a new
   `tls.rs` module if config.rs is getting large):

   ```rust
   #[derive(Clone)]
   pub struct Ed25519SecretKey(ed25519_dalek::SigningKey);

   impl Ed25519SecretKey {
       pub fn generate() -> Self { ... }
       pub fn from_bytes(bytes: &[u8; 32]) -> Self { ... }
       pub fn as_bytes(&self) -> &[u8; 32] { ... }
       pub fn public(&self) -> ed25519_dalek::VerifyingKey { ... }
   }
   ```

   Add `ZeroizeOnDrop` (the key is secret material). Add a redacting
   `Debug` impl (like `Secret<T>` in types.rs). Do NOT derive `Debug` —
   the raw key bytes must not be printed.

3. **Change `TlsIdentity::RawKey`** from `RawKey(iroh::SecretKey)` to
   `RawKey(Ed25519SecretKey)`. Remove the `#[cfg(feature = "iroh")]` gate
   — `RawKey` is available in all builds.

4. **Rewire `Ed25519SigningKey`** in `endpoint.rs`:
   - Change the inner field from `iroh::SecretKey` to `Ed25519SecretKey`
     (or `ed25519_dalek::SigningKey`).
   - `spki_public_key()`: use `self.key.public().as_bytes()` (same logic,
     different key type — `ed25519_dalek::VerifyingKey` has `as_bytes()`).
   - `sign()`: use `self.key.sign(message)`. ed25519-dalek's
     `SigningKey::sign` comes from the `Signer` trait (which must be in
     scope) and returns a `Signature`, which has `to_bytes()`. If the
     inner field is the `Ed25519SecretKey` wrapper, the wrapper also
     needs a `sign` method, since step 2 does not list one.
   - Change the cfg gate from
     `#[cfg(all(feature = "quinn", feature = "iroh"))]` to
     `#[cfg(feature = "quinn")]` on `RawKeyCertResolver`,
     `Ed25519SigningKey`, and all related impls.

5. **Update `build_iroh_endpoint`**: when `TlsIdentity::RawKey(key)` is
   present, convert to `iroh::SecretKey::from_bytes(key.as_bytes())`
   before passing to `iroh::Endpoint::builder().secret_key(...)`. This
   conversion is `#[cfg(feature = "iroh")]` only.

6. **Update `build_rustls_server_config`**: the `RawKey` arm changes
   from `#[cfg(feature = "iroh")]` to always-available (within the
   `#[cfg(feature = "quinn")]` function). The `RawKeyCertResolver::new`
   takes `&Ed25519SecretKey` instead of `&iroh::SecretKey`.

7. **Update all tests** that construct `TlsIdentity::RawKey`:
   - `endpoint.rs` tests: `iroh::SecretKey::generate(&mut csprng)` →
     `Ed25519SecretKey::generate()`.
   - Any test in `config.rs` that constructs `RawKey`.
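
Two properties from the steps above, the redacting `Debug` impl (step 2) and the byte-level conversion to a transport key (step 5), can be illustrated with a dependency-free sketch. It is only a stand-in: the wrapper here holds a plain `[u8; 32]` instead of `ed25519_dalek::SigningKey`, `TransportKey` is a hypothetical substitute for `iroh::SecretKey`, and the redaction text is an arbitrary choice.

```rust
use std::fmt;

/// Stand-in for the real wrapper: holds raw bytes instead of
/// `ed25519_dalek::SigningKey` so the sketch needs no external crates.
#[derive(Clone)]
pub struct Ed25519SecretKey([u8; 32]);

impl Ed25519SecretKey {
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(*bytes)
    }

    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

/// Redacting `Debug`: prints the type name but never the key bytes.
impl fmt::Debug for Ed25519SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Ed25519SecretKey(<redacted>)")
    }
}

/// Hypothetical transport-side key type, standing in for `iroh::SecretKey`.
pub struct TransportKey([u8; 32]);

impl TransportKey {
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(*bytes)
    }

    pub fn to_bytes(&self) -> [u8; 32] {
        self.0
    }
}
```

The conversion in step 5 then has the shape `TransportKey::from_bytes(key.as_bytes())`: the two key types share nothing but the 32-byte secret, which is what lets alknet-core drop the compile-time dependency on iroh for this path.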

### What NOT to change

- `TlsIdentity::X509` and `SelfSigned` — untouched by this task.
- The `endpoint-request-client-cert` task (server config client auth) —
  independent, can proceed in parallel or before/after this task.
- ACME — separate follow-up task (`core/acme-integration`).

## Acceptance Criteria

- [ ] `ed25519-dalek` is a direct dependency of alknet-core
- [ ] `Ed25519SecretKey` type exists with `generate`, `from_bytes`, `as_bytes`, `public`; redacting `Debug`; `ZeroizeOnDrop`
- [ ] `TlsIdentity::RawKey` uses `Ed25519SecretKey`, not `iroh::SecretKey`
- [ ] `TlsIdentity::RawKey` is not gated behind `#[cfg(feature = "iroh")]`
- [ ] `RawKeyCertResolver` and `Ed25519SigningKey` are gated `#[cfg(feature = "quinn")]` only (not `all(feature = "quinn", feature = "iroh")`)
- [ ] `build_iroh_endpoint` converts `Ed25519SecretKey` → `iroh::SecretKey::from_bytes`
- [ ] `cargo build -p alknet-core --features quinn` (no iroh) succeeds with `TlsIdentity::RawKey` usable
- [ ] `cargo build -p alknet-core --all-features` succeeds
- [ ] `cargo test -p alknet-core --all-features` succeeds
- [ ] `cargo test -p alknet-core --features quinn` succeeds (quinn-only, no iroh)
- [ ] `cargo clippy -p alknet-core --all-features --all-targets` clean
- [ ] `cargo clippy -p alknet-core --features quinn --all-targets` clean

## References

- ADR-027 — full design rationale
- crates/alknet-core/src/config.rs:33-41 — `TlsIdentity` enum
- crates/alknet-core/src/endpoint.rs:593-689 — `RawKeyCertResolver`, `Ed25519SigningKey`
- crates/alknet-core/src/endpoint.rs:511-538 — `build_iroh_endpoint` (conversion site)
- crates/alknet-core/src/endpoint.rs:484-495 — `build_rustls_server_config` RawKey arm
- /workspace/iroh/iroh-base/src/key.rs:261 — `iroh::SecretKey(SigningKey)` newtype

## Notes

> This is the foundation task for ADR-027. The ACME task
> (`core/acme-integration`) depends on this one because both modify
> `TlsIdentity` and `build_rustls_server_config`. Doing decoupling first
> means the ACME task builds on the cleaned-up enum without iroh coupling.