Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues

Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy
must support git.alk.dev and alk.dev from initial release. Add multi-domain
TLS configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 — avoids config format migration later
- ADR-011: Multi-domain TLS — single SAN cert, acme_domains Vec<String>
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through,
  dual licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual
  mode uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new
  OQ-07 (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
2026-06-11 08:50:03 +00:00
parent 8ee6284b62
commit 7efc142406
10 changed files with 356 additions and 108 deletions


@@ -16,8 +16,9 @@ available:
2. **Custom handler** (Felix Knorr pattern): Build a handler using hyper's
`Client` to forward requests. ~50-100 lines of Rust for our needs.
-Our use case is minimal: single upstream per domain, single domain, no load
-balancing, no retry, no HTTP/2 proxying.
+Our use case is minimal: single upstream per domain, no load balancing, no
+retry, no HTTP/2 proxying. While the proxy supports multiple domains
+(ADR-010), each domain routes to exactly one upstream.
## Decision
@@ -31,6 +32,8 @@ project's channel proxy.
path-based routing to multiple backends)
- Our proxy case is the simplest possible: match a Host header, forward the
entire request to a single upstream, stream the response back
+- Multi-domain support (ADR-010) doesn't change this — each domain still maps
+to one upstream
- The Felix Knorr pattern is proven, idiomatic, and ~50-100 lines
- We maintain full control over header injection, error handling, and upstream
connection behavior
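
The routing step described in these bullets (match a Host header, pick the single upstream for that domain) can be sketched with the standard library alone. This is a minimal illustration, not the project's handler: the function names and the upstream addresses are hypothetical, and forwarding the request itself (hyper's `Client`) is left out.

```rust
use std::collections::HashMap;

/// Normalize a `Host` header value: drop an optional `:port` suffix and
/// lowercase the name, since host names are case-insensitive.
/// IPv6 literals are not handled in this sketch.
fn normalize_host(host_header: &str) -> String {
    let name = host_header.split(':').next().unwrap_or("");
    name.trim().to_ascii_lowercase()
}

/// Look up the single upstream configured for the requested host.
/// Returns `None` for unknown hosts; the caller decides how to reject them.
fn upstream_for<'a>(routes: &'a HashMap<String, String>, host_header: &str) -> Option<&'a str> {
    routes.get(&normalize_host(host_header)).map(String::as_str)
}

/// Hypothetical routing table; the upstream addresses are placeholders.
fn example_routes() -> HashMap<String, String> {
    HashMap::from([
        ("git.alk.dev".to_string(), "127.0.0.1:3000".to_string()),
        ("alk.dev".to_string(), "127.0.0.1:8000".to_string()),
    ])
}
```

Because each domain has exactly one upstream, the lookup is a plain map access; load balancing or retry would need a richer value type than a single address.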
@@ -46,11 +49,12 @@ project's channel proxy.
**Negative:**
- We implement and maintain proxy logic ourselves (but it's trivial for our
-use case)
+use case — each domain maps to one upstream)
- If requirements grow to load balancing or retry, we'd need to add that
ourselves or switch to `axum-reverse-proxy`
## References
- [proxy.md](../proxy.md)
+- [ADR-010](010-multi-site-phase1.md) (multi-site in Phase 1)
- Felix Knorr, "Replacing nginx with axum" (felix-knorr.net/posts/2024-10-13-replacing-nginx-with-axum.html)


@@ -0,0 +1,90 @@
# ADR-010: Multi-Site Support in Phase 1
## Status
Accepted
## Context
The original architecture phased multi-site support into Phase 2, treating
Phase 1 as a single-domain replacement for nginx serving only `git.alk.dev`.
This was based on the assumption that only one domain needed proxying initially.
However, `alk.dev` (the bare domain) will need proxying in the near future.
While `alk.dev` is a simple case — proxying to a Deno/Fresh container with no
special requirements — the proxy must support multiple sites from day one. The
config format, routing logic, and TLS certificate provisioning all need
multi-site awareness.
Additionally, `api.alk.dev` is explicitly out of scope (it runs its own
HTTP/2+ server natively), but the proxy must not prevent future sites from
being added.
The cost of deferring multi-site is high: we'd need a config format migration,
routing logic rewrite, and TLS cert management changes later. Supporting
multi-site from the start costs very little — the config format just uses an
array of sites (which it already does), host-based routing is trivial in axum,
and `rustls-acme` supports multi-domain certificates natively.
## Decision
Move multi-site support from Phase 2 into Phase 1. The proxy supports multiple
sites from the initial release:
- `[[sites]]` array in config (already the planned format)
- Host-based routing via axum's `Host` extractor (already the planned approach)
- Multi-domain ACME certificate provisioning via `rustls-acme`
- Each site maps a hostname to an upstream address
Phase 1 scope becomes:
1. Multi-site reverse proxy with TLS termination
2. ACME certificate management (multi-domain)
3. HTTP → HTTPS redirect
4. Rate limiting, logging, health check, graceful shutdown
5. Systemd integration
Phase 2 scope shifts to operational hardening:
1. Per-site rate limits and body limits
2. Per-site upstream timeouts
3. Metrics endpoint (Prometheus-compatible)
4. Connection limits and timeouts
5. Log rotation
Phase 3 remains future enhancements.
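
As an illustration of the `[[sites]]` decision, the sketch below turns a list of site entries into a host-to-upstream table and rejects duplicate hosts, which is one way to implement the unique host check. The `Site` field names, the `ConfigError` type, and the `site` helper are assumptions made for this example; TOML parsing is omitted.

```rust
use std::collections::HashMap;

/// One `[[sites]]` entry: a hostname and the upstream it proxies to.
/// Field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Site {
    host: String,
    upstream: String,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    NoSites,
    DuplicateHost(String),
}

/// Build the host -> upstream table, rejecting duplicate hosts so that
/// each domain maps to exactly one upstream.
fn build_routes(sites: &[Site]) -> Result<HashMap<String, String>, ConfigError> {
    if sites.is_empty() {
        return Err(ConfigError::NoSites);
    }
    let mut routes = HashMap::with_capacity(sites.len());
    for site in sites {
        // Host names are case-insensitive, so compare them lowercased.
        let host = site.host.to_ascii_lowercase();
        if routes.insert(host.clone(), site.upstream.clone()).is_some() {
            return Err(ConfigError::DuplicateHost(host));
        }
    }
    Ok(routes)
}

/// Convenience constructor used only by the usage example.
fn site(host: &str, upstream: &str) -> Site {
    Site { host: host.to_string(), upstream: upstream.to_string() }
}
```

Adding `alk.dev` then means adding one more entry to the input list; the routing code itself does not change.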
## Rationale
- The config format already uses `[[sites]]` — no format change needed
- Host-based routing is the natural axum pattern and was already planned
- `rustls-acme` accepts a list of domain names, so multi-domain is its
default usage
- The cost of adding multi-site later (config migration, routing rewrite,
cert management changes) far exceeds the cost of supporting it now (very
little additional complexity)
- `alk.dev` is confirmed as a near-term need, not a hypothetical
- The proxy's value proposition is being a memory-safe reverse proxy for *our
infrastructure*, which has multiple domains
## Consequences
**Positive:**
- No config format migration needed later
- `alk.dev` can be added to the config without code changes
- TLS cert management handles multiple domains from the start
- Removes multi-site as a separate later phase of work (Phase 2 shifts to
operational hardening)
**Negative:**
- Slightly more testing surface (must verify correct routing with multiple
sites)
- Must test multi-domain ACME provisioning (not just single-domain)
- Wildcard or fallback site behavior needs to be defined (addressed in
OQ-07)
## References
- [overview.md](../overview.md)
- [config.md](../config.md)
- [tls.md](../tls.md)
- [proxy.md](../proxy.md)
- ADR-002 (custom proxy handler — rationale updated for multi-site)


@@ -0,0 +1,92 @@
# ADR-011: Multi-Domain TLS Configuration
## Status
Accepted
## Context
With multi-site support in Phase 1 (ADR-010), the TLS configuration must
support multiple domains. The previous design used a single `tls.acme_domain`
string field, which only works for one domain.
There are several approaches to multi-domain TLS:
1. **Single ACME config with domain list**: `acme_domains = ["git.alk.dev",
"alk.dev"]` — one certificate covering all domains (SAN certificate)
2. **Per-site TLS configuration**: Each site entry specifies its own TLS
mode (ACME or manual) and domain — more flexible but complex
3. **Hybrid**: A global TLS section with ACME domains, plus per-site overrides
for manual certificates
For our use case, all proxied domains use the same ACME certificate authority
(Let's Encrypt) and the same challenge type (TLS-ALPN-01). There's no need
for per-site TLS configuration in Phase 1.
## Decision
Use a single ACME configuration with a list of domains, producing one SAN
certificate covering all proxied domains. Manual mode uses certificate file
paths (single cert file with all domains, or one cert per domain resolved via
SNI).
The config format changes from the previous single-domain format:
```toml
# Previous (single-domain) format — no longer used
[tls]
mode = "acme"
acme_domain = "git.alk.dev" # single string
```
To the current multi-domain format:
```toml
[tls]
mode = "acme"
acme_domains = ["git.alk.dev", "alk.dev"] # array of strings
```
In ACME mode, `rustls-acme` provisions a single certificate covering all
listed domains via Subject Alternative Names (SAN). This is the standard
Let's Encrypt approach for multi-domain certificates.
In manual mode, the cert and key files must cover all domains (either a SAN
certificate or separate certificates resolved via SNI).
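
A startup check for the `acme_domains` list could look like the sketch below. It is an illustration under stated assumptions, not the project's validation code: the function and error names are hypothetical, and the rule that every site host must appear in `acme_domains` is an assumption of this example. The 100-name cap reflects the Let's Encrypt limit on names per certificate.

```rust
use std::collections::HashSet;

/// Let's Encrypt issues certificates with at most 100 names.
const MAX_SAN_NAMES: usize = 100;

#[derive(Debug, PartialEq)]
enum TlsConfigError {
    NoDomains,
    TooManyDomains(usize),
    DuplicateDomain(String),
    SiteNotCovered(String),
}

/// Check an `acme_domains` list against the configured site hosts.
/// Requiring every site host to be listed is an assumption of this sketch.
fn validate_acme_domains(
    acme_domains: &[&str],
    site_hosts: &[&str],
) -> Result<(), TlsConfigError> {
    if acme_domains.is_empty() {
        return Err(TlsConfigError::NoDomains);
    }
    if acme_domains.len() > MAX_SAN_NAMES {
        return Err(TlsConfigError::TooManyDomains(acme_domains.len()));
    }
    // Collect lowercased names, failing on the first repeated entry.
    let mut seen = HashSet::new();
    for domain in acme_domains {
        let name = domain.to_ascii_lowercase();
        if !seen.insert(name.clone()) {
            return Err(TlsConfigError::DuplicateDomain(name));
        }
    }
    // Every proxied host needs a matching name in the SAN certificate.
    for host in site_hosts {
        let name = host.to_ascii_lowercase();
        if !seen.contains(&name) {
            return Err(TlsConfigError::SiteNotCovered(name));
        }
    }
    Ok(())
}
```

Failing at config load time keeps a missing domain from surfacing later as a certificate mismatch during the TLS handshake.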
## Rationale
- A single SAN certificate is simpler to manage (one renewal, one cert)
- Let's Encrypt supports SAN certificates with up to 100 domains
- `rustls-acme` accepts `Vec<String>` for domain lists — this is its natural
API
- All our domains use the same ACME configuration (Let's Encrypt production,
TLS-ALPN-01 challenge)
- Per-site TLS overrides add complexity with no current benefit
- If per-site TLS configuration is needed later (e.g., a site with a manual
cert), it can be added as an optional override without changing the global
config structure
## Consequences
**Positive:**
- Single certificate for all domains — simpler renewal, simpler cert management
- Matches `rustls-acme`'s natural API (`AcmeConfig::new` takes the list of
domains)
- With all domains in one certificate, ACME mode needs no per-domain
certificate selection by SNI
- Config format is a minimal change from single-domain
**Negative:**
- Adding or removing a domain requires re-provisioning the certificate (ACME
handles this automatically, but it means cert changes affect all domains)
- If any one domain fails ACME validation, renewal of the whole certificate
fails, because every listed domain must validate; this is partly mitigated by
Let's Encrypt validating each domain separately
- Per-site TLS configuration (e.g., a domain with a manual cert) requires a
future config extension (OQ-07)
## References
- [tls.md](../tls.md)
- [config.md](../config.md)
- ADR-010 (multi-site in Phase 1)
- ADR-004 (ACME-primary certificate management)