Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues

Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy must
support git.alk.dev and alk.dev from initial release. Add multi-domain TLS
configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 — avoids config format migration later
- ADR-011: Multi-domain TLS — single SAN cert, acme_domains Vec<String>
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through, dual licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites, validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual mode uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new OQ-07 (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref, config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
@@ -14,6 +14,10 @@ memory-safe Rust/axum reverse proxy. The primary motivation is CVE-2026-42945
 (unauthenticated RCE in nginx's rewrite module) and the broader pattern of
 memory corruption bugs in nginx's C codebase.
 
+The proxy supports multiple domains from initial release (git.alk.dev and
+alk.dev), with per-domain host-based routing and a single multi-domain SAN
+certificate via ACME.
+
 ## Architecture Documents
 
 | Document | Status | Description |
@@ -37,6 +41,8 @@ memory corruption bugs in nginx's C codebase.
 | [007](decisions/007-custom-log-format.md) | Custom Structured Log Format | Accepted |
 | [008](decisions/008-static-dynamic-config-split.md) | Static/Dynamic Config Split with ArcSwap | Accepted |
 | [009](decisions/009-signal-handling.md) | Signal Handling Strategy | Accepted |
+| [010](decisions/010-multi-site-phase1.md) | Multi-Site Support in Phase 1 | Accepted |
+| [011](decisions/011-multi-domain-tls.md) | Multi-Domain TLS Configuration | Accepted |
 
 ## Open Questions
 
@@ -48,8 +54,9 @@ See [open-questions.md](open-questions.md) for the full tracker.
 | ~~OQ-02~~ | ~~What log format should fail2ban consume?~~ | ~~high~~ | **resolved** (ADR-007) |
 | OQ-03 | Should the health check endpoint be on a separate port? | low | open |
 | OQ-04 | Config reload: SIGHUP only or also Unix socket API? | low | open |
-| OQ-05 | Should the proxy bind to multiple addresses? | low | open |
+| ~~OQ-05~~ | ~~Should the proxy bind to multiple addresses?~~ | ~~low~~ | **resolved** (single bind_addr sufficient) |
 | OQ-06 | Should upstream timeouts be configurable per-site? | low | open |
+| OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | open |
 
 ## Document Lifecycle
 
@@ -39,7 +39,7 @@ config.toml
 │ http_port │ │ rate_limit │
 │ https_port │ │ body_limit │
 │ tls.mode │ │ proxy_headers │
-│ tls.acme_domain │ │ │
+│ tls.acme_domains │ │ │
 │ tls.cert_path │ │ ← ArcSwap → │
 │ tls.key_path │ │ ConfigReloadHandle │
 │ tls.cache_dir │ │ .reload(new_config) │
@@ -59,11 +59,11 @@ Immutable after startup. Changes require a process restart.
 
 | Field | Type | Description |
 |-------|------|-------------|
-| `bind_addr` | `String` | IP address to bind to (e.g., `"15.235.125.95"`) |
+| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`) |
 | `http_port` | `u16` | Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable) |
 | `https_port` | `u16` | Port for TLS listener (default: `443`) |
 | `tls.mode` | `"acme"` or `"manual"` | Certificate provisioning mode |
-| `tls.acme_domain` | `String` | Domain for ACME (ACME mode only) |
+| `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
 | `tls.acme_cache_dir` | `String` | ACME state cache directory |
 | `tls.acme_directory` | `"production"` or `"staging"` | Let's Encrypt directory |
 | `tls.cert_path` | `String` | Certificate file path (manual mode only) |
@@ -71,9 +71,10 @@ Immutable after startup. Changes require a process restart.
 | `log_level` | `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"` | Logging verbosity |
 | `log_format` | `"text"` or `"json"` | Log output format |
 
-**Why these are static:** Changing bind addresses, ports, or TLS mode requires
-creating new listeners and TLS configurations — operations that fundamentally
-require a restart. There's no safe way to change these at runtime.
+**Why these are static:** See ADR-008 for the rationale behind the
+static/dynamic split. In summary: changing bind addresses, ports, or TLS mode
+requires creating new listeners and TLS configurations — operations that
+fundamentally require a restart.
 
 ### DynamicConfig
 
@@ -95,10 +96,10 @@ connections immediately.
 | `upstream` | `String` | Upstream address (e.g., `"127.0.0.1:3000"`) |
 | `upstream_scheme` | `"http"` or `"https"` | Protocol for upstream connection (default: `"http"`) |
 
-**Why these are dynamic:** Site definitions and rate limits are per-request
-concerns. Adding a site or changing a rate limit should not require restarting
-the proxy and dropping active connections. Rate limits and body limits are
-global settings in Phase 1; per-site configuration for these may be added in
+**Why these are dynamic:** See ADR-008 for the rationale. Site definitions
+and rate limits are per-request concerns that should not require restarting
+the proxy or dropping active connections. Rate limits and body limits are
+global settings in Phase 1; per-site configuration for these is deferred to
 Phase 2.
 
 ## Config Reload
@@ -136,13 +137,13 @@ config reload, but SIGHUP is sufficient for Phase 1.
 # reverse-proxy config
 
 [server]
-bind_addr = "15.235.125.95"
+bind_addr = "203.0.113.10" # Replace with actual bind address
 http_port = 80
 https_port = 443
 
 [server.tls]
 mode = "acme" # "acme" or "manual"
-acme_domain = "git.alk.dev"
+acme_domains = ["git.alk.dev", "alk.dev"]
 acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
 acme_directory = "production" # "production" or "staging"
 
@@ -166,6 +167,11 @@ limit_bytes = 104857600 # 100 MB
 host = "git.alk.dev"
 upstream = "127.0.0.1:3000"
 upstream_scheme = "http"
+
+[[sites]]
+host = "alk.dev"
+upstream = "127.0.0.1:8080"
+upstream_scheme = "http"
 ```
 
 ### Validation
@@ -173,12 +179,13 @@ upstream_scheme = "http"
 On startup, the config is validated:
 
 1. `bind_addr` is not `0.0.0.0` (must be explicit)
-2. In ACME mode, `acme_domain` must be set
+2. In ACME mode, `acme_domains` must be non-empty
 3. In manual mode, `cert_path` and `key_path` must both be set and the files
 must be readable
 4. Each site must have a `host` and `upstream`
-5. `rate_limit.requests_per_second` must be > 0
-6. `body.limit_bytes` must be > 0
+5. Site `host` values must be unique (no duplicate hostnames)
+6. `rate_limit.requests_per_second` must be > 0
+7. `body.limit_bytes` must be > 0
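
Editor's sketch: the seven rules in the updated list could be checked in a single pass over the parsed config, roughly as below. This is a minimal sketch and not the project's implementation. The `Config`, `Tls`, and `Site` types and the `validate` function are hypothetical names chosen for illustration, and comparing hosts case-insensitively is an assumption (DNS names are case-insensitive) that the document does not state.

```rust
use std::collections::HashSet;
use std::fs::File;

/// TLS settings; this enum layout is an assumption made for the sketch.
#[derive(Debug, Clone)]
pub enum Tls {
    Acme { acme_domains: Vec<String> },
    Manual { cert_path: String, key_path: String },
}

/// One `[[sites]]` entry (hypothetical struct, fields named as in config.md).
#[derive(Debug, Clone)]
pub struct Site {
    pub host: String,
    pub upstream: String,
}

/// Flattened view of the settings the validation rules look at.
#[derive(Debug, Clone)]
pub struct Config {
    pub bind_addr: String,
    pub tls: Tls,
    pub sites: Vec<Site>,
    pub requests_per_second: u32,
    pub body_limit_bytes: u64,
}

/// Applies rules 1-7 in order and returns the first violation found.
pub fn validate(cfg: &Config) -> Result<(), String> {
    // Rule 1: the bind address must be explicit.
    if cfg.bind_addr == "0.0.0.0" {
        return Err("bind_addr must not be 0.0.0.0".to_string());
    }
    match &cfg.tls {
        // Rule 2: ACME mode needs at least one domain.
        Tls::Acme { acme_domains } => {
            if acme_domains.is_empty() {
                return Err("acme_domains must be non-empty in ACME mode".to_string());
            }
        }
        // Rule 3: manual mode needs both paths, and both files must open.
        Tls::Manual { cert_path, key_path } => {
            for path in [cert_path, key_path] {
                if path.is_empty() {
                    return Err("cert_path and key_path must both be set".to_string());
                }
                File::open(path).map_err(|e| format!("cannot read {}: {}", path, e))?;
            }
        }
    }
    // Rules 4 and 5: every site is complete and hosts are unique.
    let mut seen = HashSet::new();
    for site in &cfg.sites {
        if site.host.is_empty() || site.upstream.is_empty() {
            return Err("each site needs a host and an upstream".to_string());
        }
        // Assumption: hosts are compared after ASCII lowercasing.
        if !seen.insert(site.host.to_ascii_lowercase()) {
            return Err(format!("duplicate site host: {}", site.host));
        }
    }
    // Rules 6 and 7: limits must be positive.
    if cfg.requests_per_second == 0 {
        return Err("rate_limit.requests_per_second must be > 0".to_string());
    }
    if cfg.body_limit_bytes == 0 {
        return Err("body.limit_bytes must be > 0".to_string());
    }
    Ok(())
}
```

Returning the first violation keeps the sketch short. Collecting every violation into a list would serve operators better when a config has several problems at once.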
 
 On SIGHUP reload, the same validation applies. If the new config fails
 validation, the reload is rejected and the old config remains active. An error
@@ -196,6 +203,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 |-----|----------|---------|
 | [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
 | [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config split | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
+| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
+| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains |
 
 ## Open Questions
 
@@ -204,3 +213,5 @@ questions affecting this document:
 
 - **OQ-04**: Should config reload support a Unix domain socket API in addition
 to SIGHUP? (open)
+- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
+domains? (open)
@@ -16,8 +16,9 @@ available:
 2. **Custom handler** (Felix Knorr pattern): Build a handler using hyper's
 `Client` to forward requests. ~50-100 lines of Rust for our needs.
 
-Our use case is minimal: single upstream per domain, single domain, no load
-balancing, no retry, no HTTP/2 proxying.
+Our use case is minimal: single upstream per domain, no load balancing, no
+retry, no HTTP/2 proxying. While the proxy supports multiple domains
+(ADR-010), each domain routes to exactly one upstream.
 
 ## Decision
 
@@ -31,6 +32,8 @@ project's channel proxy.
 path-based routing to multiple backends)
 - Our proxy case is the simplest possible: match a Host header, forward the
 entire request to a single upstream, stream the response back
+- Multi-domain support (ADR-010) doesn't change this — each domain still maps
+to one upstream
 - The Felix Knorr pattern is proven, idiomatic, and ~50-100 lines
 - We maintain full control over header injection, error handling, and upstream
 connection behavior
@@ -46,11 +49,12 @@ project's channel proxy.
 
 **Negative:**
 - We implement and maintain proxy logic ourselves (but it's trivial for our
-use case)
+use case — each domain maps to one upstream)
 - If requirements grow to load balancing or retry, we'd need to add that
 ourselves or switch to `axum-reverse-proxy`
 
 ## References
 
 - [proxy.md](../proxy.md)
+- [ADR-010](010-multi-site-phase1.md) (multi-site in Phase 1)
 - Felix Knorr, "Replacing nginx with axum" (felix-knorr.net/posts/2024-10-13-replacing-nginx-with-axum.html)
90
docs/architecture/decisions/010-multi-site-phase1.md
Normal file
@@ -0,0 +1,90 @@
+# ADR-010: Multi-Site Support in Phase 1
+
+## Status
+
+Accepted
+
+## Context
+
+The original architecture phased multi-site support into Phase 2, treating
+Phase 1 as a single-domain replacement for nginx serving only `git.alk.dev`.
+This was based on the assumption that only one domain needed proxying initially.
+
+However, `alk.dev` (the bare domain) will need proxying in the near future.
+While `alk.dev` is a simple case — proxying to a Deno/Fresh container with no
+special requirements — the proxy must support multiple sites from day one. The
+config format, routing logic, and TLS certificate provisioning all need
+multi-site awareness.
+
+Additionally, `api.alk.dev` is explicitly out of scope (it runs its own
+HTTP/2+ server natively), but the proxy must not prevent future sites from
+being added.
+
+The cost of deferring multi-site is high: we'd need a config format migration,
+routing logic rewrite, and TLS cert management changes later. Supporting
+multi-site from the start costs very little — the config format just uses an
+array of sites (which it already does), host-based routing is trivial in axum,
+and `rustls-acme` supports multi-domain certificates natively.
+
+## Decision
+
+Move multi-site support from Phase 2 into Phase 1. The proxy supports multiple
+sites from the initial release:
+
+- `[[sites]]` array in config (already the planned format)
+- Host-based routing via axum's `Host` extractor (already the planned approach)
+- Multi-domain ACME certificate provisioning via `rustls-acme`
+- Each site maps a hostname to an upstream address
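
Editor's sketch: the hostname-to-upstream mapping in the last bullet amounts to a table lookup keyed on the request's Host header. The example below is a standard-library-only sketch under stated assumptions: `SiteTable` and `upstream_for` are hypothetical names, and stripping a trailing `:port` and lowercasing the host are normalisation steps this ADR does not specify. In the actual proxy the header value would come from axum's `Host` extractor. What to do when no site matches is left to the caller, since fallback behavior is not defined here.

```rust
use std::collections::HashMap;

/// Hypothetical lookup table from hostname to upstream address.
pub struct SiteTable {
    by_host: HashMap<String, String>,
}

impl SiteTable {
    /// Builds the table from `(host, upstream)` pairs, lowercasing hosts.
    pub fn new(sites: &[(&str, &str)]) -> Self {
        let by_host = sites
            .iter()
            .map(|&(host, upstream)| (host.to_ascii_lowercase(), upstream.to_string()))
            .collect();
        SiteTable { by_host }
    }

    /// Returns the upstream for a Host header value, or `None` if no site matches.
    pub fn upstream_for(&self, host_header: &str) -> Option<&str> {
        // Drop a trailing ":port" when the suffix is all digits.
        let host = match host_header.rsplit_once(':') {
            Some((name, port))
                if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) =>
            {
                name
            }
            _ => host_header,
        };
        self.by_host
            .get(&host.to_ascii_lowercase())
            .map(|upstream| upstream.as_str())
    }
}
```

With the two sites from the config example, a request for `git.alk.dev` resolves to `127.0.0.1:3000` and one for `alk.dev` resolves to `127.0.0.1:8080`. A host such as `api.alk.dev`, which has no site entry, yields `None`.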
+
+Phase 1 scope becomes:
+
+1. Multi-site reverse proxy with TLS termination
+2. ACME certificate management (multi-domain)
+3. HTTP → HTTPS redirect
+4. Rate limiting, logging, health check, graceful shutdown
+5. Systemd integration
+
+Phase 2 scope shifts to operational hardening:
+
+1. Per-site rate limits and body limits
+2. Per-site upstream timeouts
+3. Metrics endpoint (Prometheus-compatible)
+4. Connection limits and timeouts
+5. Log rotation
+
+Phase 3 remains future enhancements.
+
+## Rationale
+
+- The config format already uses `[[sites]]` — no format change needed
+- Host-based routing is the natural axum pattern and was already planned
+- `rustls-acme` accepts `Vec<domain>` — multi-domain is its default usage
+- The cost of adding multi-site later (config migration, routing rewrite,
+cert management changes) far exceeds the cost of supporting it now (zero
+additional complexity)
+- `alk.dev` is confirmed as a near-term need, not a hypothetical
+- The proxy's value proposition is being a memory-safe reverse proxy for *our
+infrastructure*, which has multiple domains
+
+## Consequences
+
+**Positive:**
+- No config format migration needed later
+- `alk.dev` can be added to the config without code changes
+- TLS cert management handles multiple domains from the start
+- Eliminates an entire phase of work
+
+**Negative:**
+- Slightly more testing surface (must verify correct routing with multiple
+sites)
+- Must test multi-domain ACME provisioning (not just single-domain)
+- Wildcard or fallback site behavior needs to be defined (addressed in
+OQ-07)
+
+## References
+
+- [overview.md](../overview.md)
+- [config.md](../config.md)
+- [tls.md](../tls.md)
+- [proxy.md](../proxy.md)
+- ADR-002 (custom proxy handler — rationale updated for multi-site)
92
docs/architecture/decisions/011-multi-domain-tls.md
Normal file
@@ -0,0 +1,92 @@
+# ADR-011: Multi-Domain TLS Configuration
+
+## Status
+
+Accepted
+
+## Context
+
+With multi-site support in Phase 1 (ADR-010), the TLS configuration must
+support multiple domains. The previous design used a single `tls.acme_domain`
+string field, which only works for one domain.
+
+There are several approaches to multi-domain TLS:
+
+1. **Single ACME config with domain list**: `acme_domains = ["git.alk.dev",
+"alk.dev"]` — one certificate covering all domains (SAN certificate)
+2. **Per-site TLS configuration**: Each site entry specifies its own TLS
+mode (ACME or manual) and domain — more flexible but complex
+3. **Hybrid**: A global TLS section with ACME domains, plus per-site overrides
+for manual certificates
+
+For our use case, all proxied domains use the same ACME certificate authority
+(Let's Encrypt) and the same challenge type (TLS-ALPN-01). There's no need
+for per-site TLS configuration in Phase 1.
+
+## Decision
+
+Use a single ACME configuration with a list of domains, producing one SAN
+certificate covering all proxied domains. Manual mode uses certificate file
+paths (single cert file with all domains, or one cert per domain resolved via
+SNI).
+
+The config format changes from the previous single-domain format:
+
+```toml
+# Previous (single-domain) format — no longer used
+[tls]
+mode = "acme"
+acme_domain = "git.alk.dev" # single string
+```
+
+To the current multi-domain format:
+
+```toml
+[tls]
+mode = "acme"
+acme_domains = ["git.alk.dev", "alk.dev"] # array of strings
+```
+
+In ACME mode, `rustls-acme` provisions a single certificate covering all
+listed domains via Subject Alternative Names (SAN). This is the standard
+Let's Encrypt approach for multi-domain certificates.
+
+In manual mode, the cert and key files must cover all domains (either a SAN
+certificate or separate certificates resolved via SNI).
+
+## Rationale
+
+- A single SAN certificate is simpler to manage (one renewal, one cert)
+- Let's Encrypt supports SAN certificates with up to 100 domains
+- `rustls-acme` accepts `Vec<String>` for domain lists — this is its natural
+API
+- All our domains use the same ACME configuration (Let's Encrypt production,
+TLS-ALPN-01 challenge)
+- Per-site TLS overrides add complexity with no current benefit
+- If per-site TLS configuration is needed later (e.g., a site with a manual
+cert), it can be added as an optional override without changing the global
+config structure
+
+## Consequences
+
+**Positive:**
+- Single certificate for all domains — simpler renewal, simpler cert management
+- Matches `rustls-acme`'s natural API (`AcmeConfig::new(domains: Vec<String>)`)
+- All domains in one cert means SNI resolution is handled by ACME automatically
+- Config format is a minimal change from single-domain
+
+**Negative:**
+- Adding or removing a domain requires re-provisioning the certificate (ACME
+handles this automatically, but it means cert changes affect all domains)
+- If one domain fails ACME validation, the entire cert renewal fails (all
+domains must be validated) — mitigated by Let's Encrypt's domain-level
+validation
+- Per-site TLS configuration (e.g., a domain with a manual cert) requires a
+future config extension (OQ-07)
+
+## References
+
+- [tls.md](../tls.md)
+- [config.md](../config.md)
+- ADR-010 (multi-site in Phase 1)
+- ADR-004 (ACME-primary certificate management)
@@ -21,8 +21,6 @@ last_updated: 2026-06-11
 than the current nginx config.
 - **Cross-references**: ADR-005
 
-## Logging and Monitoring
-
 ### ~~OQ-02: What log format should fail2ban consume?~~
 
 - **Origin**: [operations.md](operations.md), [proxy.md](proxy.md)
@@ -33,6 +31,22 @@ last_updated: 2026-06-11
 See ADR-007.
 - **Cross-references**: ADR-007
 
+### OQ-07: Should per-site TLS overrides be supported for mixed ACME/manual domains?
+
+- **Origin**: [tls.md](tls.md), [config.md](config.md)
+- **Status**: open
+- **Priority**: low
+- **Context**: Phase 1 uses a single TLS configuration (ACME or manual) for all
+domains. All domains share the same ACME config and certificate. If a future
+domain needs a manual certificate (e.g., a corporate CA cert) while other
+domains use ACME, a per-site TLS override would be needed. This would require
+a custom `ResolvesServerCert` that combines ACME-provisioned certs with
+manually loaded certs. For now, all proxied domains use the same ACME config,
+so this is not needed.
+- **Cross-references**: ADR-011
+
+## Logging and Monitoring
+
 ### OQ-03: Should the health check endpoint be on a separate port?
 
 - **Origin**: [operations.md](operations.md)
@@ -61,15 +75,15 @@ last_updated: 2026-06-11
 
 ## Deployment
 
-### OQ-05: Should the proxy bind to multiple addresses or just one?
+### ~~OQ-05: Should the proxy bind to multiple addresses or just one?~~
 
 - **Origin**: [overview.md](overview.md)
-- **Status**: open
+- **Status**: resolved
 - **Priority**: low
-- **Context**: Current nginx config binds to a specific IP (`15.235.125.95`).
-The proposed config uses `bind_addr` which could be any IP. For Phase 1, the
-config will specify a single IP address. Multi-address binding (listening on
-multiple IPs) is not needed but could be added as an array of addresses.
+- **Resolution**: A single `bind_addr` is sufficient. The proxy binds to one
+explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
+this single-server deployment. If needed in the future, `bind_addr` could be
+extended to an array. See config.md for the `bind_addr` field.
 - **Cross-references**: None
 
 ## Proxy
@@ -42,9 +42,10 @@ Requests` and logs the event with structured fields.
 ### State Eviction
 
 The per-IP token bucket state grows over time as new IPs are seen. A
-background task runs at a configurable interval (default: 60 seconds) and
-removes entries that haven't been accessed within the cleanup interval. This
-prevents unbounded memory growth.
+background task runs every 60 seconds (configurable) and removes entries
+whose last access timestamp is older than a configurable eviction age
+(default: 300 seconds / 5 minutes). This prevents unbounded memory growth
+while preserving recent entries that may still receive traffic.
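
Editor's sketch: the updated wording separates the scan interval (how often the task runs) from the eviction age (how old an entry must be before it is removed). One scan could look like the sketch below. It is an illustration under stated assumptions and not the project's code: `Bucket` and `evict_stale` are hypothetical names, and the background scheduling that would call the function every 60 seconds is omitted.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-IP state; only `last_access` matters for eviction.
pub struct Bucket {
    pub tokens: f64,
    pub last_access: Instant,
}

/// Runs one eviction scan: drops entries whose last access is older than
/// `eviction_age` relative to `now`, and returns how many were removed.
pub fn evict_stale(
    buckets: &mut HashMap<IpAddr, Bucket>,
    now: Instant,
    eviction_age: Duration,
) -> usize {
    let before = buckets.len();
    // saturating_duration_since yields zero if last_access is later than `now`.
    buckets.retain(|_, bucket| now.saturating_duration_since(bucket.last_access) <= eviction_age);
    before - buckets.len()
}
```

Passing `now` as a parameter instead of calling `Instant::now()` inside the function lets the age comparison be tested without sleeping.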
 
 ### Fail2ban Integration
 
@@ -55,7 +56,7 @@ format decision.
 The log format uses `key=value` pairs with a `RATE_LIMIT` prefix:
 
 ```
-RATE_LIMIT client_ip=X.X.X.X host=Y.Z path=/W status=429
+RATE_LIMIT client_ip=203.0.113.50 host=Y.Z path=/W status=429
 ```
 
 A corresponding fail2ban filter and jail configuration are provided as part
@@ -71,15 +72,15 @@ log entries:
 1. **Access logs**: Every proxied request is logged at `info` level with
 structured fields.
 
 ```
-REQUEST client_ip=1.2.3.4 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
+REQUEST client_ip=203.0.113.50 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
 ```
 
 2. **Event logs**: Rate limits, TLS errors, upstream failures, config reloads,
 etc.
 
 ```
-RATE_LIMIT client_ip=1.2.3.4 host=git.alk.dev path=/login status=429
+RATE_LIMIT client_ip=203.0.113.50 host=git.alk.dev path=/login status=429
 UPSTREAM_ERROR host=git.alk.dev upstream=127.0.0.1:3000 error="connection refused"
 CONFIG_RELOAD status=success sites=1
 ```
@@ -8,10 +8,12 @@ last_updated: 2026-06-11
 ## Vision
 
 A memory-safe, minimal reverse proxy that replaces our vulnerable nginx instance
-for forward-proxying to backend services. The proxy terminates TLS, injects
+for forwarding requests to backend services. The proxy terminates TLS, injects
 standard proxy headers, enforces rate limits, and forwards requests to upstream
-services — with operational feature parity for our current single-domain Gitea
-setup.
+services — supporting multiple domains from initial release.
+
+This project is open source under dual licensing: MIT OR Apache-2.0, consistent
+with standard Rust project licensing.
 
 ## Why This Exists
 
@@ -35,38 +37,44 @@ details.
 
 ### In Scope
 
-- **Phase 1**: Replace nginx for `git.alk.dev` with feature parity
-- TLS termination with ACME (Let's Encrypt) certificate management
+- **Phase 1**: Multi-site reverse proxy with TLS termination
+- TLS termination with ACME (Let's Encrypt) multi-domain certificate management
 - Manual certificate paths as fallback mode
 - HTTP → HTTPS redirect
-- Reverse proxy to Gitea at `127.0.0.1:3000`
+- Host-based routing to multiple upstream services
+- Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
+- Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
 - Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
-- Request rate limiting with fail2ban-compatible logging (global per-IP; per-site in Phase 2)
-- 100 MB body size limit (global; per-site in Phase 2)
+- Request rate limiting with fail2ban-compatible logging (global per-IP)
+- 100 MB body size limit (global)
 - Configurable bind address (no `0.0.0.0` default)
 - Health check endpoint
 - Graceful shutdown (SIGTERM handling)
 - Systemd unit file
+- Dual licensing: MIT OR Apache-2.0
 
-- **Phase 2**: Multi-site support
-- SNI-based TLS routing for multiple domains
-- Config file for site definitions
-- Dynamic config reload (ArcSwap pattern)
-
-- **Phase 3**: Operational hardening
+- **Phase 2**: Operational hardening
+- Per-site rate limits and body limits
+- Per-site upstream timeouts
 - Metrics endpoint (Prometheus-compatible)
 - Connection limits and timeouts
 - Log rotation
 
+- **Phase 3**: Future enhancements
+- Wildcard subdomain support
+- Per-site TLS overrides (manual certs for specific domains)
+- Unix domain socket config reload API
+
 ### Out of Scope
 
 - HTTP/2 or HTTP/3 proxying (services that need these run their own native
-Rust servers — e.g., `api.alk.dev`)
+Rust servers — e.g., `api.alk.dev` runs its own HTTP/2+ server)
 - Load balancing or round-robin upstream selection
- WebSocket proxying (can be added later if needed)
|
- WebSocket proxying (can be added later if needed)
|
||||||
- Static file serving
|
- Static file serving
|
||||||
- Access control beyond rate limiting (no auth, no IP allowlists in Phase 1)
|
- Access control beyond rate limiting (no auth, no IP allowlists in Phase 1)
|
||||||
- CGI, SCGI, uWSGI, FastCGI
|
- CGI, SCGI, uWSGI, FastCGI
|
||||||
|
- Per-site TLS configuration (all domains share one ACME config in Phase 1)
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
@@ -81,11 +89,14 @@ bind_addr:80 ──► │ HTTP listener → 301 redirect │
|
|||||||
│ │
|
│ │
|
||||||
bind_addr:443 ──► │ TLS listener (tokio-rustls) │
|
bind_addr:443 ──► │ TLS listener (tokio-rustls) │
|
||||||
│ ├─ ACME mode: rustls-acme resolver │
|
│ ├─ ACME mode: rustls-acme resolver │
|
||||||
│ │ (auto cert provisioning/renewal) │
|
│ │ (multi-domain SAN cert, │
|
||||||
|
│ │ auto-provision & renew) │
|
||||||
│ └─ Manual mode: cert/key file paths │
|
│ └─ Manual mode: cert/key file paths │
|
||||||
│ │
|
│ │
|
||||||
│ axum router │
|
│ axum router │
|
||||||
│ ├─ Host-based routing │
|
│ ├─ Host-based routing │
|
||||||
|
│ │ ├─ git.alk.dev → :3000 │
|
||||||
|
│ │ └─ alk.dev → :8080 │
|
||||||
│ ├─ Rate limiting middleware │
|
│ ├─ Rate limiting middleware │
|
||||||
│ ├─ Proxy header injection │
|
│ ├─ Proxy header injection │
|
||||||
│ ├─ Body size limit (100MB) │
|
│ ├─ Body size limit (100MB) │
|
||||||
@@ -147,7 +158,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| ADR | Decision | Summary |
|
| ADR | Decision | Summary |
|
||||||
|-----|----------|---------|
|
|-----|----------|---------|
|
||||||
| [001](decisions/001-rust-axum.md) | Rust with axum | Memory safety eliminates the bug class causing nginx CVEs; axum provides ergonomic tower integration |
|
| [001](decisions/001-rust-axum.md) | Rust with axum | Memory safety eliminates the bug class causing nginx CVEs; axum provides ergonomic tower integration |
|
||||||
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream, single domain — axum-reverse-proxy adds unnecessary complexity |
|
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream per domain — simpler than a general proxy library |
|
||||||
| [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
|
| [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
|
||||||
| [004](decisions/004-rustls-acme.md) | ACME-primary certificate management | Eliminates certbot dependency; automatic provisioning and renewal |
|
| [004](decisions/004-rustls-acme.md) | ACME-primary certificate management | Eliminates certbot dependency; automatic provisioning and renewal |
|
||||||
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly, not axum-server | Full control over TLS config, ACME resolver integration, cipher suite configuration |
|
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly, not axum-server | Full control over TLS config, ACME resolver integration, cipher suite configuration |
|
||||||
@@ -155,6 +166,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
||||||
| [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config with ArcSwap | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
|
| [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config with ArcSwap | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
|
||||||
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
||||||
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
|
||||||
|
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
@@ -163,4 +176,4 @@ questions affecting this document:
|
|||||||
|
|
||||||
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
||||||
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
|
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
|
||||||
- **OQ-05**: Should the proxy bind to multiple addresses or just one? (open)
|
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
|
||||||
@@ -14,8 +14,9 @@ injection, body size limits), and forwards it to the upstream service.
|
|||||||
## Why It Exists
|
## Why It Exists
|
||||||
|
|
||||||
This component replaces nginx's `proxy_pass` directive. For our use case —
|
This component replaces nginx's `proxy_pass` directive. For our use case —
|
||||||
single upstream per domain, no load balancing, no HTTP/2 proxying — a custom
|
one upstream per domain across multiple domains, no load balancing, no HTTP/2
|
||||||
handler is simpler and more maintainable than a general-purpose proxy library.
|
proxying — a custom handler is simpler and more maintainable than a
|
||||||
|
general-purpose proxy library (ADR-002, ADR-010).
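
The "one upstream per domain" model described above can be sketched as a plain lookup table. This is a minimal illustration only, not the project's handler: `SiteTable` and `upstream_for` are hypothetical names, and the sketch assumes the `Host` header is a hostname with an optional port (IPv6 literals are not handled).

```rust
use std::collections::HashMap;

/// Hypothetical site table: exactly one upstream address per domain.
pub struct SiteTable {
    upstreams: HashMap<String, String>,
}

impl SiteTable {
    /// Build the table from (host, upstream address) pairs.
    /// Hosts are stored lowercased so lookups are case-insensitive.
    pub fn new(sites: &[(&str, &str)]) -> Self {
        let upstreams = sites
            .iter()
            .map(|&(host, upstream)| (host.to_ascii_lowercase(), upstream.to_string()))
            .collect();
        Self { upstreams }
    }

    /// Resolve a raw `Host` header value to an upstream address.
    /// An optional `:port` suffix is stripped before the lookup.
    /// Returns `None` for hosts that are not configured.
    pub fn upstream_for(&self, host_header: &str) -> Option<&str> {
        let host = host_header
            .rsplit_once(':')
            .map_or(host_header, |(name, _port)| name);
        self.upstreams
            .get(&host.to_ascii_lowercase())
            .map(String::as_str)
    }
}
```

A caller would build the table once from the site definitions (for example `git.alk.dev` and `alk.dev`) and treat a `None` result as an unknown host, which the proxy can reject rather than forward.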
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
@@ -140,9 +141,9 @@ services typically run on the same host (e.g., `127.0.0.1:3000`). The
|
|||||||
`upstream_scheme` field in each site's configuration allows specifying `https://`
|
`upstream_scheme` field in each site's configuration allows specifying `https://`
|
||||||
for upstreams that require TLS (e.g., separate hosts or secure internal services).
|
for upstreams that require TLS (e.g., separate hosts or secure internal services).
|
||||||
|
|
||||||
For the initial deployment (`git.alk.dev` → `127.0.0.1:3000`), the upstream
|
For the initial deployment, upstream connections use plain HTTP (e.g.,
|
||||||
connection uses plain HTTP, as TLS between the proxy and Gitea on loopback is
|
`git.alk.dev` → `127.0.0.1:3000`, `alk.dev` → `127.0.0.1:8080`) since TLS
|
||||||
unnecessary.
|
between the proxy and backend services on loopback is unnecessary.
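
As a rough sketch of how a per-site `upstream_scheme` might feed into the forwarded URL, the following assumes the field is parsed into a small enum. `UpstreamScheme` and `upstream_url` are hypothetical names chosen for illustration; the real configuration types may differ.

```rust
/// Hypothetical parsed form of a site's `upstream_scheme` setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpstreamScheme {
    Http,
    Https,
}

impl UpstreamScheme {
    /// Accepts "http", "https", or the same values with a trailing "://".
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim_end_matches("://") {
            "http" => Some(Self::Http),
            "https" => Some(Self::Https),
            _ => None,
        }
    }

    fn as_str(self) -> &'static str {
        match self {
            Self::Http => "http",
            Self::Https => "https",
        }
    }
}

/// Join scheme, upstream address, and the request's path and query
/// into the URL the proxy would forward to.
pub fn upstream_url(scheme: UpstreamScheme, addr: &str, path_and_query: &str) -> String {
    format!("{}://{}{}", scheme.as_str(), addr, path_and_query)
}
```

With the loopback upstreams from the initial deployment, the scheme would be `Http`; `Https` only matters for upstreams on separate hosts or secured internal services.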
|
||||||
|
|
||||||
## Body Size Limit
|
## Body Size Limit
|
||||||
|
|
||||||
@@ -157,8 +158,9 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
|
|
||||||
| ADR | Decision | Summary |
|
| ADR | Decision | Summary |
|
||||||
|-----|----------|---------|
|
|-----|----------|---------|
|
||||||
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream, single domain — simpler than a general proxy library |
|
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | One upstream per domain — simpler than a general proxy library |
|
||||||
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
||||||
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
|
|||||||
@@ -57,10 +57,11 @@ no deploy hooks.
|
|||||||
|
|
||||||
**How it works:**
|
**How it works:**
|
||||||
|
|
||||||
1. `AcmeCertProvider` configures the ACME client with the domain, cache
|
1. `AcmeCertProvider` configures the ACME client with the domain list, cache
|
||||||
directory, and Let's Encrypt directory (staging or production).
|
directory, and Let's Encrypt directory (staging or production).
|
||||||
2. `AcmeConfig::new(vec![domain])` creates an ACME configuration for the
|
2. `AcmeConfig::new(domains)` creates an ACME configuration for all listed
|
||||||
domain.
|
domains. Let's Encrypt will issue a single SAN certificate covering all
|
||||||
|
domains.
|
||||||
3. The ACME state machine runs as a background tokio task, handling:
|
3. The ACME state machine runs as a background tokio task, handling:
|
||||||
- Account registration with Let's Encrypt
|
- Account registration with Let's Encrypt
|
||||||
- Certificate ordering
|
- Certificate ordering
|
||||||
@@ -75,9 +76,9 @@ no deploy hooks.
|
|||||||
**Configuration:**
|
**Configuration:**
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
[tls]
|
[server.tls]
|
||||||
mode = "acme"
|
mode = "acme"
|
||||||
acme_domain = "git.alk.dev"
|
acme_domains = ["git.alk.dev", "alk.dev"]
|
||||||
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
|
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
|
||||||
acme_directory = "production" # or "staging" for testing
|
acme_directory = "production" # or "staging" for testing
|
||||||
```
|
```
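
Because `acme_domains` is now a list, it is worth checking before it reaches the ACME client. The sketch below is one possible validation pass, assuming the list must be non-empty and free of duplicates (compared case-insensitively). `validate_acme_domains` and `DomainError` are hypothetical names, not part of the documented config module.

```rust
use std::collections::HashSet;

/// Hypothetical validation errors for the `acme_domains` list.
#[derive(Debug, PartialEq, Eq)]
pub enum DomainError {
    /// The list has no entries.
    Empty,
    /// An entry is empty or whitespace only.
    Blank,
    /// The same domain appears more than once (after lowercasing).
    Duplicate(String),
}

/// Check that the list is non-empty, has no blank entries, and has no
/// duplicates. Comparison is case-insensitive because DNS names are.
pub fn validate_acme_domains(domains: &[String]) -> Result<(), DomainError> {
    if domains.is_empty() {
        return Err(DomainError::Empty);
    }
    let mut seen = HashSet::new();
    for domain in domains {
        let normalized = domain.trim().to_ascii_lowercase();
        if normalized.is_empty() {
            return Err(DomainError::Blank);
        }
        // `insert` returns false when the value was already present.
        if !seen.insert(normalized.clone()) {
            return Err(DomainError::Duplicate(normalized));
        }
    }
    Ok(())
}
```

Running a check like this at config load time would surface a mistyped or repeated domain before any certificate order is attempted.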
|
||||||
@@ -100,13 +101,8 @@ key_path = "/etc/letsencrypt/live/git.alk.dev/privkey.pem"
|
|||||||
```
|
```
|
||||||
|
|
||||||
Certificate files are loaded once at startup using `rustls_pemfile`. Manual
|
Certificate files are loaded once at startup using `rustls_pemfile`. Manual
|
||||||
mode requires a restart to pick up new certificates.
|
mode requires a restart to pick up new certificates. See ADR-004 for the
|
||||||
|
rationale behind making ACME the primary mode and manual mode restart-dependent.
|
||||||
**Why not hot-reload manual certs?** ACME mode handles renewal automatically.
|
|
||||||
Manual mode is for cases where you control cert rotation externally (certbot,
|
|
||||||
manual renewal). In that case, a SIGHUP-triggered restart is simpler and more
|
|
||||||
reliable than file watching. If zero-downtime cert rotation is needed, use ACME
|
|
||||||
mode.
|
|
||||||
|
|
||||||
## TLS Configuration
|
## TLS Configuration
|
||||||
|
|
||||||
@@ -142,10 +138,13 @@ restrict cipher suites beyond rustls defaults.
|
|||||||
### ServerConfig Construction
|
### ServerConfig Construction
|
||||||
|
|
||||||
For manual mode, the `ServerConfig` is built with `with_no_client_auth()` and
|
For manual mode, the `ServerConfig` is built with `with_no_client_auth()` and
|
||||||
`with_single_cert()`, loading the certificate chain and private key from disk.
|
a custom `ResolvesServerCert` implementation that maps SNI hostnames to
|
||||||
|
certificate/key pairs loaded from disk.
|
||||||
|
|
||||||
For ACME mode, the `ServerConfig` is built with `with_cert_resolver()`, passing
|
For ACME mode, the `ServerConfig` is built with `with_cert_resolver()`, passing
|
||||||
the `ResolvesServerCertAcme` resolver. The ACME TLS-ALPN-01 protocol identifier
|
the `ResolvesServerCertAcme` resolver. The ACME configuration includes all
|
||||||
|
domains listed in `acme_domains`, and the resolver manages a single SAN
|
||||||
|
certificate covering all of them. The ACME TLS-ALPN-01 protocol identifier
|
||||||
(`acme-tls/1`) must be registered in the `alpn_protocols` list so the server
|
(`acme-tls/1`) must be registered in the `alpn_protocols` list so the server
|
||||||
can respond to TLS-ALPN-01 challenges.
|
can respond to TLS-ALPN-01 challenges.
|
||||||
|
|
||||||
@@ -154,28 +153,39 @@ versions (TLS 1.2 and TLS 1.3).
|
|||||||
|
|
||||||
## SNI-Based Certificate Selection
|
## SNI-Based Certificate Selection
|
||||||
|
|
||||||
### Current (Single Domain)
|
### ACME Mode (Multi-Domain)
|
||||||
|
|
||||||
For single-domain setups, SNI selection is trivial: there's only one
|
In ACME mode, `rustls-acme` manages a single SAN certificate covering all
|
||||||
certificate, so `with_single_cert()` or `ResolvesServerCertAcme` (which
|
configured domains. The `ResolvesServerCertAcme` resolver automatically serves
|
||||||
handles the domain) is sufficient.
|
the correct certificate during the TLS handshake.
|
||||||
|
|
||||||
### Future (Multi-Domain)
|
|
||||||
|
|
||||||
When multiple domains are served, SNI selection works as follows:
|
|
||||||
|
|
||||||
1. **TLS handshake**: The client sends the SNI extension indicating which
|
1. **TLS handshake**: The client sends the SNI extension indicating which
|
||||||
hostname it's connecting to.
|
hostname it's connecting to.
|
||||||
2. **Certificate resolution**: In ACME mode, `ResolvesServerCertAcme` handles
|
2. **Certificate resolution**: `ResolvesServerCertAcme` matches the SNI
|
||||||
this automatically — it stores certificates keyed by domain. In manual mode,
|
hostname against the provisioned certificate's Subject Alternative Names
|
||||||
a custom `ResolvesServerCert` implementation maps SNI hostname to the
|
and serves the certificate.
|
||||||
correct `CertifiedKey`.
|
|
||||||
3. **HTTP routing**: After the TLS handshake, axum's `Host` extractor routes
|
3. **HTTP routing**: After the TLS handshake, axum's `Host` extractor routes
|
||||||
the request to the correct site handler based on the `Host` header.
|
the request to the correct site handler based on the `Host` header.
|
||||||
|
|
||||||
This is the same pattern nginx uses — SNI selects the cert during TLS, then
|
This is the same pattern nginx uses — SNI selects the cert during TLS, then
|
||||||
`Host` header selects the server block. In manual mode, a `ResolvesServerCert`
|
`Host` header selects the server block. ACME mode handles this automatically
|
||||||
implementation maps SNI hostname to the correct `CertifiedKey`.
|
through the cert resolver.
|
||||||
|
|
||||||
|
### Manual Mode (Multi-Domain)
|
||||||
|
|
||||||
|
In manual mode, a custom `ResolvesServerCert` implementation is required to
|
||||||
|
map SNI hostnames to the correct `CertifiedKey`. This implementation:
|
||||||
|
|
||||||
|
1. Loads certificate files at startup (or on SIGHUP for reload)
|
||||||
|
2. Maps each domain name to its certificate chain and private key
|
||||||
|
3. During the TLS handshake, looks up the SNI hostname and returns the
|
||||||
|
matching `CertifiedKey`
|
||||||
|
|
||||||
|
The custom resolver must handle the case where no matching certificate exists
|
||||||
|
for the SNI hostname — in this case, the handshake fails, which is the
|
||||||
|
correct behavior (we don't serve a default certificate for unknown domains).
|
||||||
|
|
||||||
|
See [open-questions.md](open-questions.md) OQ-07 for per-site TLS overrides.
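
The lookup logic of the manual-mode resolver described above can be sketched without any TLS dependency. This is an illustration under stated assumptions: `CertEntry` is a stand-in for rustls's `CertifiedKey`, `SniCertMap` is a hypothetical name, and a real implementation would wrap this lookup in the `ResolvesServerCert` trait instead of exposing `resolve` directly.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a loaded certificate chain and private key.
/// (Assumption: the real resolver would hold a rustls `CertifiedKey`.)
#[derive(Debug, PartialEq, Eq)]
pub struct CertEntry {
    pub cert_path: String,
    pub key_path: String,
}

/// Hypothetical SNI hostname -> certificate map, filled once at startup.
pub struct SniCertMap {
    by_host: HashMap<String, Arc<CertEntry>>,
}

impl SniCertMap {
    pub fn new() -> Self {
        Self { by_host: HashMap::new() }
    }

    /// Register the certificate for one domain (stored lowercased).
    pub fn insert(&mut self, host: &str, entry: CertEntry) {
        self.by_host.insert(host.to_ascii_lowercase(), Arc::new(entry));
    }

    /// Return the certificate matching the SNI hostname, if any.
    /// `None` means no SNI was sent or no certificate matches; the
    /// caller fails the handshake instead of serving a default cert.
    pub fn resolve(&self, sni: Option<&str>) -> Option<Arc<CertEntry>> {
        let host = sni?.to_ascii_lowercase();
        self.by_host.get(&host).cloned()
    }
}
```

Returning `None` for an unknown or missing SNI hostname mirrors the behavior stated above: no default certificate is served for unknown domains.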
|
||||||
|
|
||||||
## HTTP Listener (Port 80)
|
## HTTP Listener (Port 80)
|
||||||
|
|
||||||
@@ -211,6 +221,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
|-----|----------|---------|
|
|-----|----------|---------|
|
||||||
| [004](decisions/004-rustls-acme.md) | ACME-primary cert management | Eliminates certbot; automatic provisioning and renewal |
|
| [004](decisions/004-rustls-acme.md) | ACME-primary cert management | Eliminates certbot; automatic provisioning and renewal |
|
||||||
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
|
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
|
||||||
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
||||||
|
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
@@ -218,3 +230,5 @@ Open questions are tracked in [open-questions.md](open-questions.md). Key
|
|||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
||||||
|
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
|
||||||
|
domains? (open)
|
||||||