Files
alknet/docs/architecture/decisions/008-acme-lets-encrypt.md
glm-5.1 13b0991fb8 Resolve all architecture open questions, add 13 ADRs, update specs
Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo,
WRAUTH_SERVER typo, ADR-005 stale wraith-tun refs,
undefined ACL feature removed from server.md,
--proxy semantic difference documented.
2026-06-01 17:31:28 +00:00

38 lines
2.6 KiB
Markdown

# ADR-008: ACME/Let's Encrypt Certificate Provisioning
## Status
Accepted
## Context
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
There are two ACME flows:
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
Both flows are important for wraith's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
## Decision
Support both ACME certificate provisioning paths:
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps wraith self-contained as a single binary.
## Consequences
- **Positive**: Users can run `wraith serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
## References
- [server.md](../server.md)
- [OQ-01](../open-questions.md) — resolved by this ADR
- [OQ-07](../open-questions.md) — resolved by this ADR
- Production certbot setup: `/workspace/system/dev1/certbot.md`