glm-5.1 0d54eba41e Update architecture specs to reflect live deployment findings and fix two bugs
Architecture updates based on gaps discovered during live deployment testing:

- ADR-023: HTTP/2 client-facing support via ALPN-based protocol detection.
  The spec previously said HTTP/2 was out of scope, but the deployment
  revealed that modern browsers negotiate HTTP/2 via ALPN. The proxy now
  correctly detects the negotiated ALPN protocol and uses the appropriate
  HTTP server builder (http2::Builder for h2, auto::Builder for http/1.1).
  Upstream connections remain HTTP/1.1. Host resolution now falls back to
  URI host for HTTP/2 :authority pseudo-headers.

- ADR-024: ANSI-disabled logging. All tracing-subscriber layers now use
  with_ansi(false) to prevent ANSI escape codes in log output, which broke
  fail2ban regex matching in Docker deployments. Also documents the fail2ban
  regex anchor fix (^RATE_LIMIT → RATE_LIMIT).

Bug fixes found by architecture review:

- Fix missing ALPN protocols in manual TLS mode. build_manual_server_config
  and build_multi_domain_server_config did not set alpn_protocols, meaning
  manual TLS mode could not support HTTP/2. Added h2 and http/1.1 ALPN
  entries to both functions (acme-tls/1 only in ACME mode).

- Fix missing with_ansi(false) in JSON log format. The init_json function
  with file output did not disable ANSI on stdout or file layers, which would
  break fail2ban in production JSON logging mode.

Other spec updates:

- All document statuses updated from draft to reviewed
- proxy.md: documented Server header removal, upstream HTTPS client,
  two-phase timeout enforcement, HTTP/2 host resolution, connect timeout
- tls.md: documented ALPN configuration differing by mode (ACME vs manual)
- overview.md: added HTTP/2 client-facing support to scope, updated crate
  deps (hyper-rustls, rustls-native-certs, hyper-util), clarified out-of-scope
- config.md: fixed http_port type (u16→u32) to match implementation, added
  ANSI-disabled note for LoggingConfig
- operations.md: documented ANSI-disabled logging, fail2ban regex anchor
- open-questions.md: updated OQ-09 resolution (connect timeout fully
  implemented), OQ-10 (C2 bug is fixed)
2026-06-12 11:28:31 +00:00


---
status: reviewed
last_updated: 2026-06-12
---
# TLS Termination
## What It Is
The TLS termination component handles all aspects of encrypted connections:
certificate provisioning (ACME and manual), TLS handshake, SNI-based certificate
selection, and connection wrapping for the axum router.
## Why It Exists
TLS termination is the security boundary between the public internet and our
upstream services. It replaces nginx's `ssl_certificate`, `ssl_protocols`, and
`ssl_ciphers` configuration with a memory-safe Rust implementation using rustls.
## Architecture
The proxy supports multiple independent TLS listeners, each with its own bind
address, TLS configuration, and site routing. See ADR-019 for the rationale.
```
┌────────────────────────────────────────────┐
│              TLS Termination               │
│                                            │
│ ┌─ Listener 1 ───────────────────────────┐ │
│ │ bind_addr_1:443                        │ │
│ │ TcpListener::bind(bind_addr_1)         │ │
│ │         │                              │ │
│ │         ▼                              │ │
│ │ tokio-rustls::TlsAcceptor              │ │
│ │         │                              │ │
│ │ ACME or Manual TLS config              │ │
│ │ (per-listener TLS mode)                │ │
│ │         │                              │ │
│ │         ▼                              │ │
│ │ TlsStream<TcpStream>                   │ │
│ │         │                              │ │
│ │         ▼                              │ │
│ │ axum router (per-listener sites)       │ │
│ └────────────────────────────────────────┘ │
│                                            │
│ ┌─ Listener N ───────────────────────────┐ │
│ │ bind_addr_N:443                        │ │
│ │ ... (same structure)                   │ │
│ └────────────────────────────────────────┘ │
└────────────────────────────────────────────┘
bind_addr:80 ──► HTTP listener (redirect to HTTPS, no TLS)
```
Each listener is independently configured. This supports two deployment models:
1. **Shared-IP multi-domain**: One listener with multiple domains in
`acme_domains`, using a single SAN certificate and SNI routing.
2. **Dedicated-IP single-domain**: Multiple listeners, each with its own IP,
its own TLS certificate, and its own site. No SNI needed.
## Certificate Provisioning
Each listener has its own TLS mode (ACME or manual), configured independently.
### ACME Mode (Primary)
Uses `rustls-acme` for automatic certificate provisioning and renewal through
Let's Encrypt. This is the primary mode — no certbot dependency, no cron jobs,
no deploy hooks.
**How it works:**
1. Each listener in ACME mode creates its own `AcmeCertProvider` with the
listener's domain list, cache directory, and Let's Encrypt directory.
2. `AcmeConfig::new(domains)` creates an ACME configuration for the domains
listed in that listener's `acme_domains`. Let's Encrypt will issue a
certificate covering those domains (a single SAN certificate or a
single-domain certificate, depending on how many domains are listed).
3. The `acme_contact` field provides a contact email address (as a `mailto:`
URI) required by Let's Encrypt for production certificate requests. Without
a contact email, Let's Encrypt production API returns a 400-level error.
4. The ACME state machine runs as a background tokio task per listener,
handling:
- Account registration with Let's Encrypt
- Certificate ordering
- TLS-ALPN-01 challenge (or HTTP-01 challenge)
- Certificate issuance
- Certificate renewal (automatic, ~30 days before expiry)
5. `ResolvesServerCertAcme` is a rustls `ResolvesServerCert` implementation
that automatically serves the ACME-provisioned certificate.
6. When a new certificate is issued, the resolver updates atomically — no
restart or signal handling needed.
**Configuration (within a `[[listeners]]` entry):**
```toml
[[listeners]]
bind_addr = "203.0.113.10"
[listeners.tls]
mode = "acme"
acme_domains = ["git.alk.dev", "alk.dev"]
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
acme_directory = "production" # or "staging" for testing
acme_contact = "mailto:admin@alk.dev" # Required for Let's Encrypt production
```
**Cache directory:** The `DirCache` from rustls-acme persists ACME account data,
private keys, and certificates between restarts. This avoids re-provisioning on
every restart. Each listener should use its own cache directory to avoid conflicts
between separate ACME state machines.
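The "one cache directory per listener" rule lends itself to a startup check. The following is a minimal sketch, not the proxy's actual validation code: the `AcmeListener` struct and `find_cache_dir_conflict` function are hypothetical names introduced here for illustration.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical, trimmed-down view of one `[[listeners]]` entry in ACME mode.
struct AcmeListener {
    bind_addr: String,
    acme_cache_dir: PathBuf,
}

/// Returns the bind address of the first listener whose cache directory is
/// already claimed by an earlier listener, or `None` if all are distinct.
fn find_cache_dir_conflict(listeners: &[AcmeListener]) -> Option<&str> {
    let mut seen = HashSet::new();
    for listener in listeners {
        // `insert` returns false when the path was already present.
        if !seen.insert(listener.acme_cache_dir.as_path()) {
            return Some(listener.bind_addr.as_str());
        }
    }
    None
}
```

Note that this compares paths textually; two different spellings of the same directory (for example via a symlink) would not be detected.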
### Manual Mode (Fallback)
For environments where ACME is not desired (testing, self-signed certs,
corporate CAs, or BYO certificates), the proxy loads certificates from file
paths at startup.
```toml
[[listeners]]
bind_addr = "203.0.113.11"
[listeners.tls]
mode = "manual"
cert_path = "/etc/ssl/alk.dev/fullchain.pem"
key_path = "/etc/ssl/alk.dev/privkey.pem"
```
Certificate files are loaded once at startup using `rustls_pemfile`. Manual
mode requires a restart to pick up new certificates. See ADR-004 for the
rationale behind making ACME the primary mode and manual mode restart-dependent.
## TLS Configuration
### Protocol Versions
The proxy supports TLS 1.2 and TLS 1.3 only, matching the minimum security
level of the current nginx configuration. rustls already defaults to these
protocol versions; configuring them explicitly ensures no regression if the
defaults change in a future rustls release.
### Cipher Suites
Cipher suites are explicitly restricted to match the scope of our current nginx
configuration. See ADR-012 for the full rationale.
**TLS 1.2 (explicitly selected):**
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
**TLS 1.3 (all default suites):**
- `TLS_AES_128_GCM_SHA256`
- `TLS_AES_256_GCM_SHA384`
- `TLS_CHACHA20_POLY1305_SHA256`
This is configured by building a `CryptoProvider` with a custom `cipher_suites`
list and passing it to `ServerConfig::builder_with_provider()`. Restricting the
list this way provides behavioral parity with nginx during migration.
### ServerConfig Construction
Each listener constructs its own `ServerConfig` based on its TLS mode.
For manual mode, the `ServerConfig` is built with `with_no_client_auth()` and
the loaded certificate chain and private key. If a single listener serves
multiple domains, a custom `ResolvesServerCert` implementation maps SNI
hostnames to certificate/key pairs loaded from disk.
For ACME mode, the `ServerConfig` is built with `with_cert_resolver()`, passing
the `ResolvesServerCertAcme` resolver. The ACME configuration includes the
domains listed in that listener's `acme_domains`, and the resolver manages the
certificate.
The TLS `ServerConfig` advertises ALPN protocols to enable HTTP/2 negotiation.
The ALPN configuration differs by TLS mode:
- **ACME mode**: `h2`, `http/1.1`, and `acme-tls/1`. The `acme-tls/1` entry is
required for TLS-ALPN-01 challenge verification during certificate provisioning.
- **Manual mode** (single-cert and multi-domain/SNI): `h2` and `http/1.1` only.
The `acme-tls/1` entry is not included because manual mode does not use ACME
challenges.
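The per-mode ALPN lists above can be sketched as a small helper. This is an illustrative sketch only: `TlsMode` and `alpn_protocols` are hypothetical names, and the sketch assumes the result is assigned to the `alpn_protocols` field of the rustls `ServerConfig`, which holds a `Vec<Vec<u8>>` in preference order.

```rust
/// Hypothetical representation of the per-listener `mode` setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TlsMode {
    Acme,
    Manual,
}

/// Builds the ALPN protocol list for a listener, most preferred first.
fn alpn_protocols(mode: TlsMode) -> Vec<Vec<u8>> {
    // Both modes advertise HTTP/2 and HTTP/1.1 to clients.
    let mut protocols = vec![b"h2".to_vec(), b"http/1.1".to_vec()];
    // Only ACME mode needs the TLS-ALPN-01 challenge protocol.
    if mode == TlsMode::Acme {
        protocols.push(b"acme-tls/1".to_vec());
    }
    protocols
}
```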
After the TLS handshake, the proxy inspects the negotiated ALPN protocol to
select the appropriate HTTP server: `h2` triggers
`hyper::server::conn::http2::Builder`, while `http/1.1` (or no ALPN) triggers
`hyper_util::server::conn::auto::Builder` with HTTP/1.1 and upgrade support.
See ADR-023 for details.
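The selection step can be sketched as a pure function over the negotiated protocol. This is a sketch under assumptions rather than the proxy's actual code: `HttpServerKind` and `select_http_server` are hypothetical names, and the input is assumed to be the `Option<&[u8]>` that rustls reports for the connection's negotiated ALPN protocol.

```rust
/// Hypothetical tag for which hyper connection builder to use.
#[derive(Debug, PartialEq, Eq)]
enum HttpServerKind {
    /// Serve with `hyper::server::conn::http2::Builder`.
    Http2,
    /// Serve with `hyper_util::server::conn::auto::Builder`.
    Auto,
}

/// Chooses the HTTP server from the ALPN protocol negotiated in the handshake.
fn select_http_server(negotiated_alpn: Option<&[u8]>) -> HttpServerKind {
    match negotiated_alpn {
        Some(proto) if proto == b"h2" => HttpServerKind::Http2,
        // `http/1.1`, any other value, or no ALPN at all falls back to auto.
        _ => HttpServerKind::Auto,
    }
}
```

Treating every non-`h2` outcome as the fallback keeps clients that send no ALPN extension working over HTTP/1.1.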
Both modes use the `aws_lc_rs` crypto provider with safe default protocol
versions (TLS 1.2 and TLS 1.3).
## SNI-Based Certificate Selection
How a certificate is selected depends on the listener's deployment model and
TLS mode. The subsections below cover each case.
### Dedicated-IP Single-Domain (Multi-Config)
In the dedicated-IP model, each listener binds to its own IP address and serves
exactly one domain with one certificate. SNI is not required for certificate
selection — the listener's TLS config already has the correct certificate.
This is the simplest case: one IP, one listener, one certificate, one domain.
No SNI resolution logic is needed.
### Shared-IP Multi-Domain (SAN Certificate)
In the shared-IP model, a single listener serves multiple domains using a SAN
certificate. SNI-based certificate selection is required.
In ACME mode, `rustls-acme` manages a single SAN certificate covering all
configured domains for that listener. The `ResolvesServerCertAcme` resolver
automatically serves the correct certificate during the TLS handshake.
1. **TLS handshake**: The client sends the SNI extension indicating which
hostname it's connecting to.
2. **Certificate resolution**: `ResolvesServerCertAcme` matches the SNI
hostname against the provisioned certificate's Subject Alternative Names
and serves the certificate.
3. **HTTP routing**: After the TLS handshake, axum's `Host` extractor routes
the request to the correct site handler based on the `Host` header (for
HTTP/2, the URI host from the `:authority` pseudo-header; see ADR-023).
This is the same pattern nginx uses — SNI selects the cert during TLS, then
`Host` header selects the server block. ACME mode handles this automatically
through the cert resolver.
### Manual Mode with Multiple Domains
In manual mode on a shared-IP listener, a custom `ResolvesServerCert`
implementation maps SNI hostnames to the correct `CertifiedKey`. This
implementation:
1. Loads certificate files at startup (a restart is required to pick up new
certificates; see ADR-004)
2. Maps each domain name to its certificate chain and private key
3. During the TLS handshake, looks up the SNI hostname and returns the
matching `CertifiedKey`
The custom resolver must handle the case where no matching certificate exists
for the SNI hostname — in this case, the handshake fails, which is the correct
behavior (we don't serve a default certificate for unknown domains).
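The lookup logic can be sketched with a plain map. This is a minimal sketch, not the proxy's resolver: `CertifiedKeyStub` stands in for `rustls::sign::CertifiedKey`, and `SniResolver` is a hypothetical type. In the real implementation, the `resolve` method would live in an `impl ResolvesServerCert` and read the name from `ClientHello::server_name()`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for `rustls::sign::CertifiedKey` (certificate chain + signing key).
#[derive(Debug)]
struct CertifiedKeyStub {
    label: String,
}

/// Hypothetical SNI-to-certificate map, populated once at startup.
struct SniResolver {
    by_name: HashMap<String, Arc<CertifiedKeyStub>>,
}

impl SniResolver {
    fn new() -> Self {
        Self { by_name: HashMap::new() }
    }

    /// Registers a certificate for a domain; names are stored lowercased.
    fn add(&mut self, domain: &str, key: CertifiedKeyStub) {
        self.by_name.insert(domain.to_ascii_lowercase(), Arc::new(key));
    }

    /// Returns the matching key, or `None` so that the handshake fails when
    /// the client sent no SNI or an unknown hostname (no default certificate).
    fn resolve(&self, sni: Option<&str>) -> Option<Arc<CertifiedKeyStub>> {
        let name = sni?.to_ascii_lowercase();
        self.by_name.get(&name).cloned()
    }
}
```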
## HTTP Listener (Port 80)
Each listener has its own HTTP listener on port 80 (or the configured
`http_port`). It is a plain TCP listener with no TLS. It has one job: redirect
all requests to the HTTPS equivalent.
Each HTTP listener binds to the same IP address as its corresponding TLS
listener, on port 80 or the configured `http_port`.
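The redirect target can be sketched as follows. This is an illustrative sketch, not the proxy's handler: `https_redirect_location` is a hypothetical helper, and it assumes HTTPS is served on the default port 443, so any port in the incoming `Host` header is dropped.

```rust
/// Builds the `Location` value for an HTTP-to-HTTPS redirect.
///
/// `host_header` is the raw `Host` header (possibly with a port), and
/// `path_and_query` is the request target, e.g. `/repo?page=2`.
fn https_redirect_location(host_header: &str, path_and_query: &str) -> String {
    // Strip a trailing `:port`, leaving bracketed IPv6 literals intact:
    // a colon followed by `]` belongs to the address, not to a port.
    let host = match host_header.rfind(':') {
        Some(idx) if !host_header[idx..].contains(']') => &host_header[..idx],
        _ => host_header,
    };
    format!("https://{host}{path_and_query}")
}
```

For example, a request for `/repo?page=2` with `Host: git.alk.dev:80` maps to `https://git.alk.dev/repo?page=2`.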
### ACME Challenge Type
The default ACME challenge type is **TLS-ALPN-01**, since the proxy already
listens on port 443. This avoids requiring a separate HTTP-01 challenge server.
HTTP-01 is available as a fallback for environments where TLS-ALPN-01 is not
suitable (e.g., behind a CDN that terminates TLS). When using HTTP-01, the
port 80 listener serves `/.well-known/acme-challenge/{token}` paths for
challenge verification.
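Recognizing challenge requests on the port 80 listener can be sketched as a path check. This is a sketch under assumptions: `acme_challenge_token` is a hypothetical helper, and it accepts only the base64url alphabet that ACME tokens use, so other paths fall through to the normal redirect.

```rust
const ACME_CHALLENGE_PREFIX: &str = "/.well-known/acme-challenge/";

/// Extracts the token from an HTTP-01 challenge path, or returns `None`
/// if the path is not a well-formed challenge request.
fn acme_challenge_token(path: &str) -> Option<&str> {
    let token = path.strip_prefix(ACME_CHALLENGE_PREFIX)?;
    // Tokens are non-empty and limited to base64url characters, which also
    // rejects nested paths such as `token/../secret`.
    let valid = !token.is_empty()
        && token
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    valid.then_some(token)
}
```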
## Certificate Failure Behavior
ACME certificate provisioning and renewal can fail for various reasons (network
outages, Let's Encrypt unavailability, DNS issues, rate limiting). The proxy's
behavior depends on the scenario:
| Scenario | Behavior |
|----------|----------|
| First start, no cached cert, ACME unreachable | **Fail to start** with clear error message. The proxy cannot serve TLS without a certificate. |
| First start, no cached cert, ACME succeeds | Normal startup. Certificate is provisioned and cached. |
| Start with cached cert, ACME unreachable for renewal | **Start normally** with cached cert. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
| Renewal failure after startup | **Continue serving existing cert**. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
| Cached cert expired, renewal fails at startup | **Fail to start** if cert is expired at startup. An expired certificate cannot serve valid TLS. |
| Cached cert expires during runtime | **Continue serving expired cert**. Clients will receive certificate errors. Log at `error` level. This is the correct behavior — silently dropping TLS would be worse. |
The key principle: **never start without a valid TLS certificate**, but **once
running, keep serving the current certificate** even if renewal fails.
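The startup rows of the table can be sketched as a decision function. This is a hedged sketch, not the proxy's startup code: `CachedCert`, `StartupDecision`, and `startup_decision` are hypothetical names, and the "expired cache, ACME reachable" case is not covered by the table, so its handling here is an assumption.

```rust
/// Hypothetical state of the ACME cache when the proxy starts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CachedCert {
    Missing,
    Valid,
    Expired,
}

#[derive(Debug, PartialEq, Eq)]
enum StartupDecision {
    /// Start normally (provisioning or renewing as needed).
    Start,
    /// Start with the cached certificate and log at `warn` level.
    StartWithWarning,
    /// Refuse to start with a clear error message.
    FailToStart,
}

/// Maps cache state and ACME reachability to the startup behavior.
fn startup_decision(cached: CachedCert, acme_reachable: bool) -> StartupDecision {
    match (cached, acme_reachable) {
        (CachedCert::Valid, true) => StartupDecision::Start,
        (CachedCert::Valid, false) => StartupDecision::StartWithWarning,
        (CachedCert::Missing, true) => StartupDecision::Start,
        (CachedCert::Missing, false) => StartupDecision::FailToStart,
        (CachedCert::Expired, false) => StartupDecision::FailToStart,
        // ASSUMPTION: not in the table; treated like a first start that
        // successfully provisions a fresh certificate.
        (CachedCert::Expired, true) => StartupDecision::Start,
    }
}
```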
## TLS Error Handling
TLS handshake failures are logged and the connection is closed. The proxy does
not serve a default certificate for unknown hostnames — connections that don't
match any configured certificate fail.
| Scenario | Behavior |
|----------|----------|
| SNI hostname doesn't match any certificate (manual mode) | TLS handshake fails with `unrecognized_name` alert. Log at `warn` level with client IP and SNI hostname. |
| No SNI extension sent by client | TLS handshake fails with `handshake_failure` alert. Log at `warn` level with client IP. |
| Unsupported TLS version (1.0/1.1) | TLS handshake fails with `protocol_version` alert. Log at `info` level. |
| Cipher suite negotiation fails | TLS handshake fails with `handshake_failure` alert. Log at `info` level with client IP. |
| Certificate expired (manual mode) | Connection fails during TLS handshake. Log at `error` level. Other listeners/connections continue serving. |
In ACME mode, the `ResolvesServerCertAcme` resolver handles certificate
selection automatically — there is no SNI mismatch scenario because the
resolver serves the ACME-provisioned certificate for all valid domains.
## Key Files and Crates
| Component | Crate | Purpose |
|-----------|-------|---------|
| TLS acceptor | `tokio-rustls` 0.26 | Async TLS handshake over TCP streams |
| TLS config | `rustls` 0.23 | ServerConfig, CryptoProvider, cipher suites |
| ACME client | `rustls-acme` 0.12 | Automatic cert provisioning and renewal |
| PEM parsing | `rustls-pemfile` 2 | Load cert/key from PEM files (manual mode) |
| PKI types | `rustls-pki-types` 1 | CertificateDer, PrivateKeyDer |
## Design Decisions
All design decisions are documented as ADRs in [decisions/](decisions/).
| ADR | Decision | Summary |
|-----|----------|---------|
| [004](decisions/004-rustls-acme.md) | ACME-primary cert management | Eliminates certbot; automatic provisioning and renewal |
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
| [012](decisions/012-cipher-suite-restriction.md) | Restrict cipher suites | Match nginx scope: four ECDHE-AES-GCM suites for TLS 1.2, all TLS 1.3 suites |
| [019](decisions/019-multi-config-listeners.md) | Multi-config listeners | `[[listeners]]` supporting both dedicated-IP and shared-IP deployment models |
| [023](decisions/023-http2-client-facing.md) | HTTP/2 client-facing support | ALPN-based protocol detection; `h2` and `http/1.1` advertised |
## Open Questions
Open questions are tracked in [open-questions.md](open-questions.md). Key
questions affecting this document:
- ~~**OQ-01**: Should cipher suites be restricted beyond rustls defaults?~~ (resolved — ADR-012: restrict to nginx scope)
- ~~**OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains?~~ (resolved — ADR-019: `[[listeners]]` with per-listener TLS config)