docs(adr-027): TLS identity redesign — ACME + RawKey decoupling

ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:

1. TlsIdentity::Acme variant: static config data (domains, cache_dir,
   directory, contact) with async AcmeState constructed at endpoint
   setup via two-phase TlsSetup (not stuffed into the Clone-able enum).
2. TlsIdentity::RawKey decoupled from the iroh feature: uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
   default for most alknet nodes) now works in quinn-only builds. iroh
   transport converts via SecretKey::from_bytes.
3. ACME feature-gated behind new acme feature (rustls-acme optional
   dep). Non-ACME builds don't compile it.
4. dispatch_quinn guard for acme-tls/1 challenge connections:
   TLS-ALPN-01 is handled at the rustls cert resolver layer during the
   handshake; the guard closes challenge connections gracefully instead
   of logging a misleading "no handler" warning.

Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.

Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).

Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey).

Graph: 36 tasks.
tasks/core/acme-integration.md
@@ -0,0 +1,191 @@
---
id: core/acme-integration
name: Add ACME auto-provisioning via rustls-acme (ADR-027)
status: pending
depends_on: [core/rawkey-decouple-from-iroh]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement ACME auto-provisioning (Let's Encrypt) for alknet endpoints,
following ADR-027. Adds `TlsIdentity::Acme`, a new `acme` feature gate,
a two-phase server-config construction (`TlsSetup`), and a
`dispatch_quinn` guard for `acme-tls/1` challenge connections.

The reverse-proxy project (`/workspace/@alkdev/reverse-proxy/src/tls/`)
demonstrates the proven pattern: `AcmeConfig`, `AcmeState` event loop,
`ResolvesServerCertAcme`, TLS-ALPN-01 challenge handling, DirCache for
cert persistence. This task adapts that pattern to alknet's quinn-based
endpoint.

### Implementation steps

1. **Add `acme` feature to alknet-core `Cargo.toml`:**

   ```toml
   [features]
   acme = ["dep:rustls-acme"]

   [dependencies]
   rustls-acme = { version = "0.12", optional = true, features = ["aws-lc-rs"] }
   ```

   Use the same version as reverse-proxy (`=0.12.1` or compatible).
   Confirm the exact version against the latest available and the
   reverse-proxy's `Cargo.toml`.

2. **Add `TlsIdentity::Acme` variant and supporting types** in
   `config.rs`:

   ```rust
   pub enum TlsIdentity {
       X509 { cert: PathBuf, key: PathBuf },
       RawKey(Ed25519SecretKey),
       SelfSigned,
       Acme {
           domains: Vec<String>,
           cache_dir: PathBuf,
           directory: AcmeDirectory,
           contact: Vec<String>,
       },
   }

   pub enum AcmeDirectory {
       Production,
       Staging,
       Custom(String),
   }
   ```

   `Acme` holds only static, `Clone`/`Debug`-safe data. No `AcmeState`.

3. **Introduce `TlsSetup`** in `endpoint.rs`, the two-phase
   construction (ADR-027 Decision 2):

   ```rust
   struct TlsSetup {
       server_config: rustls::ServerConfig,
       acme_state_handle: Option<tokio::task::JoinHandle<()>>,
   }

   impl TlsSetup {
       async fn new(
           tls_identity: &TlsIdentity,
           alpns: &[Vec<u8>],
       ) -> Result<Self, EndpointError> {
           match tls_identity {
               TlsIdentity::X509 { .. } | TlsIdentity::SelfSigned | TlsIdentity::RawKey(_) => {
                   // synchronous path (current build_rustls_server_config)
                   let config = build_rustls_server_config(tls_identity, alpns)?;
                   Ok(Self { server_config: config, acme_state_handle: None })
               }
               TlsIdentity::Acme { domains, cache_dir, directory, contact } => {
                   #[cfg(feature = "acme")]
                   { Self::new_acme(domains, cache_dir, directory, contact, alpns).await }
                   #[cfg(not(feature = "acme"))]
                   { Err(EndpointError::TlsConfig(io::Error::other("ACME feature not enabled"))) }
               }
           }
       }
   }
   ```

4. **Implement `TlsSetup::new_acme`** (`#[cfg(feature = "acme")]`):
   - Build `AcmeConfig::new(domains)` with `DirCache::new(cache_dir)`,
     directory URL (from `AcmeDirectory`), and contact.
   - Get `state = acme_config.state()` and `resolver = state.resolver()`.
   - Build `rustls::ServerConfig` with
     `with_cert_resolver(resolver)` (NOT `with_single_cert`).
   - Append `b"acme-tls/1"` to `alpn_protocols` alongside handler ALPNs.
   - Spawn the `AcmeState` event loop as a tokio task (pattern from
     `reverse-proxy/src/tls/acme.rs:spawn_acme_state`). Log
     `DeployedCachedCert`, `DeployedNewCert`, and error events.
   - Return `TlsSetup { server_config, acme_state_handle: Some(handle) }`.

5. **Wire `TlsSetup` into the endpoint construction**: replace the
   direct `build_quinn_server_config` call in the accept loop setup with
   `TlsSetup::new(...).await?`. The `acme_state_handle` is stored on
   `AlknetEndpoint` (or the accept loop context) so it can be aborted on
   shutdown.

6. **Add `acme-tls/1` guard in `dispatch_quinn`** (ADR-027 Decision 5):

   ```rust
   if alpn == b"acme-tls/1" {
       debug!("acme-tls/1 challenge connection completed at TLS layer; closing");
       connection.close(0u32.into(), b"acme done");
       return;
   }
   ```

   Place this before the `handlers.get(&alpn)` lookup. This is
   `#[cfg(feature = "acme")]`: without the feature, the guard is
   absent and `acme-tls/1` is never in the ALPN list.

7. **Shutdown**: abort the `acme_state_handle` JoinHandle in
   `AlknetEndpoint::shutdown()` alongside the existing shutdown logic.
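
Steps 2 and 3 can be modelled without quinn or rustls. The following std-only sketch is an illustration, not the alknet implementation: `AcmeDirectory::url`, `TlsSetupPlan`, `plan_tls_setup`, and the `acme_enabled` flag (standing in for `cfg(feature = "acme")`) are hypothetical names, and `RawKey` carries plain bytes instead of `Ed25519SecretKey`. The two directory URLs are Let's Encrypt's published ACME v2 endpoints.

```rust
use std::path::PathBuf;

/// Which ACME directory to use (mirrors the enum from step 2).
#[derive(Clone, Debug, PartialEq)]
pub enum AcmeDirectory {
    Production,
    Staging,
    Custom(String),
}

impl AcmeDirectory {
    /// Hypothetical helper: resolve the variant to a directory URL.
    pub fn url(&self) -> &str {
        match self {
            AcmeDirectory::Production => "https://acme-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Staging => "https://acme-staging-v02.api.letsencrypt.org/directory",
            AcmeDirectory::Custom(url) => url.as_str(),
        }
    }
}

/// Simplified stand-in for `TlsIdentity`; `RawKey` holds bytes here (assumption).
#[derive(Clone, Debug)]
pub enum TlsIdentity {
    X509 { cert: PathBuf, key: PathBuf },
    RawKey([u8; 32]),
    SelfSigned,
    Acme {
        domains: Vec<String>,
        cache_dir: PathBuf,
        directory: AcmeDirectory,
        contact: Vec<String>,
    },
}

/// Hypothetical summary of what the two-phase setup would produce.
#[derive(Debug)]
pub struct TlsSetupPlan {
    pub alpn_protocols: Vec<Vec<u8>>,
    pub directory_url: Option<String>,
    pub needs_acme_task: bool,
}

/// Static identities take the synchronous path; `Acme` needs the feature,
/// the extra ALPN entry, and a background task.
pub fn plan_tls_setup(
    identity: &TlsIdentity,
    alpns: &[Vec<u8>],
    acme_enabled: bool, // stands in for cfg(feature = "acme")
) -> Result<TlsSetupPlan, String> {
    match identity {
        TlsIdentity::X509 { .. } | TlsIdentity::RawKey(_) | TlsIdentity::SelfSigned => {
            Ok(TlsSetupPlan {
                alpn_protocols: alpns.to_vec(),
                directory_url: None,
                needs_acme_task: false,
            })
        }
        TlsIdentity::Acme { directory, .. } => {
            if !acme_enabled {
                return Err("ACME feature not enabled".to_string());
            }
            let mut alpn_protocols = alpns.to_vec();
            alpn_protocols.push(b"acme-tls/1".to_vec());
            Ok(TlsSetupPlan {
                alpn_protocols,
                directory_url: Some(directory.url().to_string()),
                needs_acme_task: true,
            })
        }
    }
}
```

Keeping the URL mapping in one function makes the staging-versus-production unit test from the acceptance criteria a pair of plain equality checks.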

### ACME challenge handling (from research)

The `ResolvesServerCertAcme` resolver intercepts TLS-ALPN-01 challenges
at the cert resolution step, during the TLS handshake and before the
connection surfaces to the application. The challenge cert (with the
SHA-256 digest of the key authorization in its critical `acmeIdentifier`
extension, per RFC 8737; the SAN carries the domain being validated) is
served by the resolver; the CA validates it during the handshake. By the
time `dispatch_quinn` runs, the challenge has already been answered. The
`acme-tls/1` guard just closes the connection gracefully instead of
logging a misleading "no handler" warning.
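
The ordering argument (guard first, handler lookup second) can be illustrated with a small std-only model. This is a sketch, not the real `dispatch_quinn`: the `Dispatch` enum, the `route` function, and the string handler names are hypothetical stand-ins for closing a quinn connection or invoking a registered handler.

```rust
use std::collections::HashMap;

/// ALPN protocol ID used by TLS-ALPN-01 challenge connections.
const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

/// What the dispatcher decides to do with an accepted connection.
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// Challenge connection: answered during the handshake, so just close it.
    CloseAcmeChallenge,
    /// Hand the connection to the named handler.
    Handler(&'static str),
    /// Nothing registered; the real code would log a "no handler" warning.
    NoHandler,
}

/// Route by negotiated ALPN, checking the ACME guard before the handler map.
pub fn route(alpn: &[u8], handlers: &HashMap<Vec<u8>, &'static str>) -> Dispatch {
    if alpn == ACME_TLS_ALPN {
        return Dispatch::CloseAcmeChallenge;
    }
    match handlers.get(alpn).copied() {
        Some(name) => Dispatch::Handler(name),
        None => Dispatch::NoHandler,
    }
}
```

Because no handler is ever registered for `acme-tls/1`, removing the guard from this model would send every challenge connection down the `NoHandler` branch, which is the misleading warning the guard exists to avoid.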

Key constraint: ACME requires `with_cert_resolver`, not
`with_single_cert`. The `acme-tls/1` ALPN must be in
`alpn_protocols` or the challenge handshake aborts with
`no_application_protocol`.
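
To see why a missing ALPN entry aborts the handshake, consider a simplified model of server-side ALPN selection as described in RFC 7301. `select_alpn` is a hypothetical helper written for this note, not rustls's internal logic, and real stacks differ in details; the validation client is modelled as offering only `acme-tls/1`.

```rust
/// Simplified model of server-side ALPN selection (RFC 7301).
/// Returns the chosen protocol, `None` when ALPN is not in play,
/// or the name of the fatal alert when there is no overlap.
pub fn select_alpn(
    server: &[Vec<u8>],
    client: &[Vec<u8>],
) -> Result<Option<Vec<u8>>, &'static str> {
    // ALPN not configured or not offered: no protocol is selected.
    if server.is_empty() || client.is_empty() {
        return Ok(None);
    }
    // Server preference order: first configured protocol the client also offers.
    match server.iter().find(|p| client.contains(*p)) {
        Some(p) => Ok(Some(p.clone())),
        None => Err("no_application_protocol"),
    }
}
```

With only handler ALPNs configured, the challenge client's single offer has no match and the model returns the alert; appending `acme-tls/1` to the server list makes the same offer succeed.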

### What NOT to change

- `TlsIdentity::X509`, `RawKey`, `SelfSigned` construction paths:
  unchanged (the RawKey decoupling is done by the predecessor task).
- iroh endpoint: ACME is quinn-only (iroh uses its own TLS).
- `endpoint-request-client-cert`: independent task, can proceed in
  parallel.

## Acceptance Criteria

- [ ] `acme` feature added to alknet-core with `rustls-acme` as optional dep
- [ ] `TlsIdentity::Acme` variant exists with `domains`, `cache_dir`, `directory`, `contact`
- [ ] `AcmeDirectory` enum exists (Production, Staging, Custom)
- [ ] `TlsSetup` two-phase construction: synchronous for X509/RawKey/SelfSigned, async for Acme
- [ ] ACME path uses `with_cert_resolver(ResolvesServerCertAcme)`, not `with_single_cert`
- [ ] `acme-tls/1` added to `alpn_protocols` when ACME is configured
- [ ] `dispatch_quinn` has `acme-tls/1` guard (closes silently, no "no handler" warning)
- [ ] ACME state machine spawned as tokio task, aborted on endpoint shutdown
- [ ] `TlsIdentity::Acme` without `acme` feature returns a clear error at endpoint construction
- [ ] Unit test: `AcmeDirectory` resolves to correct Let's Encrypt URLs (staging vs production)
- [ ] Unit test: `TlsSetup::new` with `X509`/`RawKey`/`SelfSigned` returns `acme_state_handle: None`
- [ ] `cargo build -p alknet-core --features quinn` (no acme) succeeds with no rustls-acme compiled
- [ ] `cargo build -p alknet-core --features "quinn acme"` succeeds
- [ ] `cargo test -p alknet-core --all-features` succeeds
- [ ] `cargo clippy -p alknet-core --all-features --all-targets` clean
- [ ] `cargo clippy -p alknet-core --features quinn --all-targets` clean (no acme, no warnings)

## References

- ADR-027: full design (two-phase construction, challenge handling, feature gate)
- /workspace/@alkdev/reverse-proxy/src/tls/acme.rs, `AcmeTlsConfig`, `spawn_acme_state` (proven pattern)
- /workspace/@alkdev/reverse-proxy/src/tls/acceptor.rs, `build_acme_server_config`, `acme-tls/1` ALPN
- crates/alknet-core/src/endpoint.rs:286-314, `dispatch_quinn` (guard insertion site)
- crates/alknet-core/src/endpoint.rs:464-509, `build_rustls_server_config` (TlsSetup replaces this for Acme)
- crates/alknet-core/src/config.rs:33-41, `TlsIdentity` enum (new Acme variant)

## Notes

> Depends on `core/rawkey-decouple-from-iroh` because both modify
> `TlsIdentity` and `build_rustls_server_config`. The decoupling task
> cleans up the enum shape first; this task adds the Acme variant on top.
> The `acme` feature gate is critical: it keeps `rustls-acme` and its
> deps out of non-ACME builds. The reverse-proxy project is the reference
> implementation; adapt its event loop logging and cache patterns.
tasks/core/rawkey-decouple-from-iroh.md
@@ -0,0 +1,119 @@
---
id: core/rawkey-decouple-from-iroh
name: Decouple TlsIdentity::RawKey from the iroh feature (ADR-027)
status: pending
depends_on: []
scope: narrow
risk: medium
impact: component
level: implementation
---

## Description

`TlsIdentity::RawKey(iroh::SecretKey)` is gated `#[cfg(feature = "iroh")]`
and the `RawKeyCertResolver` / `Ed25519SigningKey` rustls impls are gated
`#[cfg(all(feature = "quinn", feature = "iroh"))]`. This means quinn-only
builds (the default feature set) cannot use RFC 7250 raw-key identity,
the mode described as "default for most alknet nodes" (OQ-12, ADR-027).

The coupling is artificial: `iroh::SecretKey` is a thin newtype over
`ed25519_dalek::SigningKey`. The alknet code uses only `.public().as_bytes()`,
`.sign(msg)`, and `.clone()`. This task replaces `iroh::SecretKey` with an
alknet-core-owned `Ed25519SecretKey` wrapper, un-gates the raw-key TLS
path from the `iroh` feature, and updates the iroh transport to convert.

See ADR-027 for the full design rationale.

### Implementation steps

1. **Add `ed25519-dalek` as a direct dependency** of alknet-core in
   `Cargo.toml`. It's already in the lockfile (transitive via iroh).
   Version: `2.2` (match what's in `Cargo.lock`).

2. **Introduce `Ed25519SecretKey`** in `config.rs` (or a new
   `tls.rs` module if config.rs is getting large):

   ```rust
   #[derive(Clone)]
   pub struct Ed25519SecretKey(ed25519_dalek::SigningKey);

   impl Ed25519SecretKey {
       pub fn generate() -> Self { ... }
       pub fn from_bytes(bytes: &[u8; 32]) -> Self { ... }
       pub fn as_bytes(&self) -> &[u8; 32] { ... }
       pub fn public(&self) -> ed25519_dalek::VerifyingKey { ... }
   }
   ```

   Add `ZeroizeOnDrop` (the key is secret material). Add a redacting
   `Debug` impl (like `Secret<T>` in types.rs). Do NOT derive `Debug`:
   the raw key bytes must not be printed.

3. **Change `TlsIdentity::RawKey`** from `RawKey(iroh::SecretKey)` to
   `RawKey(Ed25519SecretKey)`. Remove the `#[cfg(feature = "iroh")]` gate
   so that `RawKey` is available in all builds.

4. **Rewire `Ed25519SigningKey`** in `endpoint.rs`:
   - Change the inner field from `iroh::SecretKey` to `Ed25519SecretKey`
     (or `ed25519_dalek::SigningKey`).
   - `spki_public_key()`: use `self.key.public().as_bytes()` (same logic,
     different key type; `ed25519_dalek::VerifyingKey` has `as_bytes()`).
   - `sign()`: use `self.key.sign(message)`; ed25519-dalek's
     `SigningKey::sign` returns a `Signature`, which has `to_bytes()`.
   - Change the cfg gate from
     `#[cfg(all(feature = "quinn", feature = "iroh"))]` to
     `#[cfg(feature = "quinn")]` on `RawKeyCertResolver`,
     `Ed25519SigningKey`, and all related impls.

5. **Update `build_iroh_endpoint`**: when `TlsIdentity::RawKey(key)` is
   present, convert to `iroh::SecretKey::from_bytes(key.as_bytes())`
   before passing to `iroh::Endpoint::builder().secret_key(...)`. This
   conversion is `#[cfg(feature = "iroh")]` only.

6. **Update `build_rustls_server_config`**: the `RawKey` arm changes
   from `#[cfg(feature = "iroh")]` to always-available (within the
   `#[cfg(feature = "quinn")]` function). The `RawKeyCertResolver::new`
   takes `&Ed25519SecretKey` instead of `&iroh::SecretKey`.

7. **Update all tests** that construct `TlsIdentity::RawKey`:
   - `endpoint.rs` tests: `iroh::SecretKey::generate(&mut csprng)` →
     `Ed25519SecretKey::generate()`.
   - Any test in `config.rs` that constructs `RawKey`.
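
The wrapper from step 2 and the byte-level conversion from step 5 can be sketched with the standard library alone. This is an assumption-laden illustration, not the alknet code: the stand-in wraps `[u8; 32]` rather than `ed25519_dalek::SigningKey`, the hand-written `Drop` approximates what `ZeroizeOnDrop` would provide, and `TransportSecretKey` and `to_transport_key` are hypothetical placeholders for `iroh::SecretKey` and the conversion site.

```rust
use std::fmt;

/// Stand-in for the wrapper from step 2 (raw bytes instead of a dalek key).
#[derive(Clone)]
pub struct Ed25519SecretKey([u8; 32]);

impl Ed25519SecretKey {
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(*bytes)
    }

    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

/// Redacting `Debug`: never prints the key material.
impl fmt::Debug for Ed25519SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Ed25519SecretKey(<redacted>)")
    }
}

/// Manual wipe on drop; real code would rely on the `zeroize` crate instead.
impl Drop for Ed25519SecretKey {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // Volatile write so the compiler does not optimize the wipe away.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}

/// Hypothetical placeholder for a transport-owned key type such as `iroh::SecretKey`.
pub struct TransportSecretKey([u8; 32]);

impl TransportSecretKey {
    pub fn from_bytes(bytes: &[u8; 32]) -> Self {
        Self(*bytes)
    }

    pub fn to_bytes(&self) -> [u8; 32] {
        self.0
    }
}

/// Conversion at the transport boundary goes through the 32 secret bytes only.
pub fn to_transport_key(key: &Ed25519SecretKey) -> TransportSecretKey {
    TransportSecretKey::from_bytes(key.as_bytes())
}
```

Routing the conversion through `as_bytes` and `from_bytes` means the core type never names the transport's key type, which is what allows the raw-key path to compile without the `iroh` feature.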

### What NOT to change

- `TlsIdentity::X509` and `SelfSigned`: untouched by this task.
- The `endpoint-request-client-cert` task (server config client auth):
  independent, can proceed in parallel or before/after this task.
- ACME: separate follow-up task (`core/acme-integration`).

## Acceptance Criteria

- [ ] `ed25519-dalek` is a direct dependency of alknet-core
- [ ] `Ed25519SecretKey` type exists with `generate`, `from_bytes`, `as_bytes`, `public`; redacting `Debug`; `ZeroizeOnDrop`
- [ ] `TlsIdentity::RawKey` uses `Ed25519SecretKey`, not `iroh::SecretKey`
- [ ] `TlsIdentity::RawKey` is not gated behind `#[cfg(feature = "iroh")]`
- [ ] `RawKeyCertResolver` and `Ed25519SigningKey` are gated `#[cfg(feature = "quinn")]` only (not `all(feature = "quinn", feature = "iroh")`)
- [ ] `build_iroh_endpoint` converts `Ed25519SecretKey` → `iroh::SecretKey::from_bytes`
- [ ] `cargo build -p alknet-core --features quinn` (no iroh) succeeds with `TlsIdentity::RawKey` usable
- [ ] `cargo build -p alknet-core --all-features` succeeds
- [ ] `cargo test -p alknet-core --all-features` succeeds
- [ ] `cargo test -p alknet-core --features quinn` succeeds (quinn-only, no iroh)
- [ ] `cargo clippy -p alknet-core --all-features --all-targets` clean
- [ ] `cargo clippy -p alknet-core --features quinn --all-targets` clean

## References

- ADR-027: full design rationale
- crates/alknet-core/src/config.rs:33-41, `TlsIdentity` enum
- crates/alknet-core/src/endpoint.rs:593-689, `RawKeyCertResolver`, `Ed25519SigningKey`
- crates/alknet-core/src/endpoint.rs:511-538, `build_iroh_endpoint` (conversion site)
- crates/alknet-core/src/endpoint.rs:484-495, `build_rustls_server_config` RawKey arm
- /workspace/iroh/iroh-base/src/key.rs:261, `iroh::SecretKey(SigningKey)` newtype

## Notes

> This is the foundation task for ADR-027. The ACME task
> (`core/acme-integration`) depends on this one because both modify
> `TlsIdentity` and `build_rustls_server_config`. Doing decoupling first
> means the ACME task builds on the cleaned-up enum without iroh coupling.