Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues

Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy must
support git.alk.dev and alk.dev from initial release. Add multi-domain TLS
configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 (avoids config format migration later)
- ADR-011: Multi-domain TLS (single SAN cert, acme_domains Vec<String>)
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through, dual
  licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual mode
  uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new OQ-07
  (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
2026-06-11 08:50:03 +00:00
parent 8ee6284b62
commit 7efc142406
10 changed files with 356 additions and 108 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -14,6 +14,10 @@ memory-safe Rust/axum reverse proxy. The primary motivation is CVE-2026-42945
 (unauthenticated RCE in nginx's rewrite module) and the broader pattern of
 memory corruption bugs in nginx's C codebase.
 The proxy supports multiple domains from initial release (git.alk.dev and
 alk.dev), with per-domain host-based routing and a single multi-domain SAN
 certificate via ACME.
 ## Architecture Documents
 | Document | Status | Description |
@@ -37,6 +41,8 @@ memory corruption bugs in nginx's C codebase.
 | [007](decisions/007-custom-log-format.md) | Custom Structured Log Format | Accepted |
 | [008](decisions/008-static-dynamic-config-split.md) | Static/Dynamic Config Split with ArcSwap | Accepted |
 | [009](decisions/009-signal-handling.md) | Signal Handling Strategy | Accepted |
 | [010](decisions/010-multi-site-phase1.md) | Multi-Site Support in Phase 1 | Accepted |
 | [011](decisions/011-multi-domain-tls.md) | Multi-Domain TLS Configuration | Accepted |
 ## Open Questions
@@ -48,8 +54,9 @@ See [open-questions.md](open-questions.md) for the full tracker.
 | ~~OQ-02~~ | ~~What log format should fail2ban consume?~~ | ~~high~~ | **resolved** (ADR-007) |
 | OQ-03 | Should the health check endpoint be on a separate port? | low | open |
 | OQ-04 | Config reload: SIGHUP only or also Unix socket API? | low | open |
-| OQ-05 | Should the proxy bind to multiple addresses? | low | open |
+| ~~OQ-05~~ | ~~Should the proxy bind to multiple addresses?~~ | ~~low~~ | **resolved** (single bind_addr sufficient) |
 | OQ-06 | Should upstream timeouts be configurable per-site? | low | open |
 | OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | open |
 ## Document Lifecycle
--- a/docs/architecture/config.md
+++ b/docs/architecture/config.md
@@ -39,7 +39,7 @@ config.toml
 │  http_port           │     │  rate_limit           │
 │  https_port          │     │  body_limit           │
 │  tls.mode            │     │  proxy_headers        │
-│  tls.acme_domain     │     │                       │
+│  tls.acme_domains    │     │                       │
 │  tls.cert_path       │     │  ← ArcSwap →          │
 │  tls.key_path        │     │  ConfigReloadHandle    │
 │  tls.cache_dir       │     │  .reload(new_config)  │
@@ -59,11 +59,11 @@ Immutable after startup. Changes require a process restart.
 | Field | Type | Description |
 |-------|------|-------------|
-| `bind_addr` | `String` | IP address to bind to (e.g., `"15.235.125.95"`) |
+| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`) |
 | `http_port` | `u16` | Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable) |
 | `https_port` | `u16` | Port for TLS listener (default: `443`) |
 | `tls.mode` | `"acme"` or `"manual"` | Certificate provisioning mode |
-| `tls.acme_domain` | `String` | Domain for ACME (ACME mode only) |
+| `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
 | `tls.acme_cache_dir` | `String` | ACME state cache directory |
 | `tls.acme_directory` | `"production"` or `"staging"` | Let's Encrypt directory |
 | `tls.cert_path` | `String` | Certificate file path (manual mode only) |
@@ -71,9 +71,10 @@ Immutable after startup. Changes require a process restart.
 | `log_level` | `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"` | Logging verbosity |
 | `log_format` | `"text"` or `"json"` | Log output format |
-**Why these are static:** Changing bind addresses, ports, or TLS mode requires
+**Why these are static:** See ADR-008 for the rationale behind the
-creating new listeners and TLS configurations — operations that fundamentally
+static/dynamic split. In summary: changing bind addresses, ports, or TLS mode
-require a restart. There's no safe way to change these at runtime.
+requires creating new listeners and TLS configurations — operations that
 fundamentally require a restart.
 ### DynamicConfig
@@ -95,10 +96,10 @@ connections immediately.
 | `upstream` | `String` | Upstream address (e.g., `"127.0.0.1:3000"`) |
 | `upstream_scheme` | `"http"` or `"https"` | Protocol for upstream connection (default: `"http"`) |
-**Why these are dynamic:** Site definitions and rate limits are per-request
+**Why these are dynamic:** See ADR-008 for the rationale. Site definitions
-concerns. Adding a site or changing a rate limit should not require restarting
+and rate limits are per-request concerns that should not require restarting
-the proxy and dropping active connections. Rate limits and body limits are
+the proxy or dropping active connections. Rate limits and body limits are
-global settings in Phase 1; per-site configuration for these may be added in
+global settings in Phase 1; per-site configuration for these is deferred to
 Phase 2.
 ## Config Reload
@@ -136,13 +137,13 @@ config reload, but SIGHUP is sufficient for Phase 1.
 # reverse-proxy config
 [server]
-bind_addr = "15.235.125.95"
+bind_addr = "203.0.113.10"  # Replace with actual bind address
 http_port = 80
 https_port = 443
 [server.tls]
 mode = "acme"                    # "acme" or "manual"
-acme_domain = "git.alk.dev"
+acme_domains = ["git.alk.dev", "alk.dev"]
 acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
 acme_directory = "production"    # "production" or "staging"
@@ -166,6 +167,11 @@ limit_bytes = 104857600          # 100 MB
 host = "git.alk.dev"
 upstream = "127.0.0.1:3000"
 upstream_scheme = "http"
 [[sites]]
 host = "alk.dev"
 upstream = "127.0.0.1:8080"
 upstream_scheme = "http"
 ```
 ### Validation
@@ -173,12 +179,13 @@ upstream_scheme = "http"
 On startup, the config is validated:
 1. `bind_addr` is not `0.0.0.0` (must be explicit)
-2. In ACME mode, `acme_domain` must be set
+2. In ACME mode, `acme_domains` must be non-empty
 3. In manual mode, `cert_path` and `key_path` must both be set and the files
   must be readable
 4. Each site must have a `host` and `upstream`
-5. `rate_limit.requests_per_second` must be > 0
+5. Site `host` values must be unique (no duplicate hostnames)
-6. `body.limit_bytes` must be > 0
+6. `rate_limit.requests_per_second` must be > 0
 7. `body.limit_bytes` must be > 0
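
The rules above could be checked roughly as follows. This is a minimal sketch using only the standard library: the `Config`, `Site`, and `TlsMode` types and the `validate` function are hypothetical names chosen for illustration, not the project's actual structs. The file-readability part of rule 3 is left out to keep the sketch free of I/O, and the case-insensitive host comparison is an assumption.

```rust
use std::collections::HashSet;

// Hypothetical types for illustration; field names follow the tables above.
pub enum TlsMode {
    Acme { acme_domains: Vec<String> },
    Manual { cert_path: String, key_path: String },
}

pub struct Site {
    pub host: String,
    pub upstream: String,
}

pub struct Config {
    pub bind_addr: String,
    pub tls: TlsMode,
    pub sites: Vec<Site>,
    pub requests_per_second: u32,
    pub limit_bytes: u64,
}

/// Returns the first rule violation found, or Ok(()) if the config passes.
pub fn validate(cfg: &Config) -> Result<(), String> {
    if cfg.bind_addr == "0.0.0.0" {
        return Err("bind_addr must be explicit, not 0.0.0.0".into());
    }
    match &cfg.tls {
        TlsMode::Acme { acme_domains } if acme_domains.is_empty() => {
            return Err("acme_domains must be non-empty in ACME mode".into());
        }
        TlsMode::Manual { cert_path, key_path }
            if cert_path.is_empty() || key_path.is_empty() =>
        {
            return Err("cert_path and key_path must both be set in manual mode".into());
        }
        _ => {}
    }
    let mut seen = HashSet::new();
    for site in &cfg.sites {
        if site.host.is_empty() || site.upstream.is_empty() {
            return Err("each site needs a host and an upstream".into());
        }
        // Assumption of this sketch: hostnames are compared case-insensitively.
        if !seen.insert(site.host.to_ascii_lowercase()) {
            return Err(format!("duplicate site host: {}", site.host));
        }
    }
    if cfg.requests_per_second == 0 {
        return Err("rate_limit.requests_per_second must be > 0".into());
    }
    if cfg.limit_bytes == 0 {
        return Err("body.limit_bytes must be > 0".into());
    }
    Ok(())
}
```

Returning the first violation keeps the error message short; collecting all violations into a list would be an equally reasonable choice for a reload log entry.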
 On SIGHUP reload, the same validation applies. If the new config fails
 validation, the reload is rejected and the old config remains active. An error
@@ -196,6 +203,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 |-----|----------|---------|
 | [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
 | [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config split | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains |
 ## Open Questions
@@ -204,3 +213,5 @@ questions affecting this document:
 - **OQ-04**: Should config reload support a Unix domain socket API in addition
  to SIGHUP? (open)
 - **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
  domains? (open)
--- a/docs/architecture/decisions/002-custom-proxy-handler.md
+++ b/docs/architecture/decisions/002-custom-proxy-handler.md
@@ -16,8 +16,9 @@ available:
 2. **Custom handler** (Felix Knorr pattern): Build a handler using hyper's
   `Client` to forward requests. ~50-100 lines of Rust for our needs.
-Our use case is minimal: single upstream per domain, single domain, no load
+Our use case is minimal: single upstream per domain, no load balancing, no
-balancing, no retry, no HTTP/2 proxying.
+retry, no HTTP/2 proxying. While the proxy supports multiple domains
 (ADR-010), each domain routes to exactly one upstream.
 ## Decision
@@ -31,6 +32,8 @@ project's channel proxy.
  path-based routing to multiple backends)
 - Our proxy case is the simplest possible: match a Host header, forward the
  entire request to a single upstream, stream the response back
 - Multi-domain support (ADR-010) doesn't change this — each domain still maps
  to one upstream
 - The Felix Knorr pattern is proven, idiomatic, and ~50-100 lines
 - We maintain full control over header injection, error handling, and upstream
  connection behavior
@@ -46,11 +49,12 @@ project's channel proxy.
 **Negative:**
 - We implement and maintain proxy logic ourselves (but it's trivial for our
-  use case)
+  use case — each domain maps to one upstream)
 - If requirements grow to load balancing or retry, we'd need to add that
  ourselves or switch to `axum-reverse-proxy`
 ## References
 - [proxy.md](../proxy.md)
 - [ADR-010](010-multi-site-phase1.md) (multi-site in Phase 1)
 - Felix Knorr, "Replacing nginx with axum" (felix-knorr.net/posts/2024-10-13-replacing-nginx-with-axum.html)
--- a/docs/architecture/decisions/010-multi-site-phase1.md
+++ b/docs/architecture/decisions/010-multi-site-phase1.md
@@ -0,0 +1,90 @@
 # ADR-010: Multi-Site Support in Phase 1
 ## Status
 Accepted
 ## Context
 The original architecture phased multi-site support into Phase 2, treating
 Phase 1 as a single-domain replacement for nginx serving only `git.alk.dev`.
 This was based on the assumption that only one domain needed proxying initially.
 However, `alk.dev` (the bare domain) will need proxying in the near future.
 While `alk.dev` is a simple case — proxying to a Deno/Fresh container with no
 special requirements — the proxy must support multiple sites from day one. The
 config format, routing logic, and TLS certificate provisioning all need
 multi-site awareness.
 Additionally, `api.alk.dev` is explicitly out of scope (it runs its own
 HTTP/2+ server natively), but the proxy must not prevent future sites from
 being added.
 The cost of deferring multi-site is high: we'd need a config format migration,
 routing logic rewrite, and TLS cert management changes later. Supporting
 multi-site from the start costs very little — the config format just uses an
 array of sites (which it already does), host-based routing is trivial in axum,
 and `rustls-acme` supports multi-domain certificates natively.
 ## Decision
 Move multi-site support from Phase 2 into Phase 1. The proxy supports multiple
 sites from the initial release:
 - `[[sites]]` array in config (already the planned format)
 - Host-based routing via axum's `Host` extractor (already the planned approach)
 - Multi-domain ACME certificate provisioning via `rustls-acme`
 - Each site maps a hostname to an upstream address
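
The hostname-to-upstream mapping can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the proxy's actual handler: `RouteTable`, `upstream_for`, and `strip_port` are hypothetical names, and the sketch assumes the Host header value has already been extracted (for example by axum's `Host` extractor) and may carry a port suffix.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: normalized hostname -> upstream address.
pub struct RouteTable {
    upstreams: HashMap<String, String>,
}

impl RouteTable {
    /// Builds the table from (host, upstream) pairs, lowercasing each host.
    pub fn new(sites: &[(&str, &str)]) -> Self {
        let upstreams = sites
            .iter()
            .map(|(host, upstream)| (host.to_ascii_lowercase(), upstream.to_string()))
            .collect();
        RouteTable { upstreams }
    }

    /// Looks up the upstream for a raw Host header value such as "alk.dev:443".
    /// Returns None when no site is configured for that host.
    pub fn upstream_for(&self, host_header: &str) -> Option<&str> {
        let host = strip_port(host_header).to_ascii_lowercase();
        self.upstreams.get(&host).map(String::as_str)
    }
}

/// Drops a trailing ":<digits>" port, leaving other input unchanged.
fn strip_port(host_header: &str) -> &str {
    match host_header.rsplit_once(':') {
        Some((host, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => host,
        _ => host_header,
    }
}
```

With the two sites from this ADR, `upstream_for("git.alk.dev")` yields `127.0.0.1:3000`, and an unconfigured host such as `api.alk.dev` yields `None`, which the handler would turn into an error response.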
 Phase 1 scope becomes:
 1. Multi-site reverse proxy with TLS termination
 2. ACME certificate management (multi-domain)
 3. HTTP → HTTPS redirect
 4. Rate limiting, logging, health check, graceful shutdown
 5. Systemd integration
 Phase 2 scope shifts to operational hardening:
 1. Per-site rate limits and body limits
 2. Per-site upstream timeouts
 3. Metrics endpoint (Prometheus-compatible)
 4. Connection limits and timeouts
 5. Log rotation
 Phase 3 remains future enhancements.
 ## Rationale
 - The config format already uses `[[sites]]` — no format change needed
 - Host-based routing is the natural axum pattern and was already planned
 - `rustls-acme` accepts a list of domains, so multi-domain is its default usage
 - The cost of adding multi-site later (config migration, routing rewrite,
  cert management changes) far exceeds the cost of supporting it now (little
  additional complexity beyond extra testing)
 - `alk.dev` is confirmed as a near-term need, not a hypothetical
 - The proxy's value proposition is being a memory-safe reverse proxy for *our
  infrastructure*, which has multiple domains
 ## Consequences
 **Positive:**
 - No config format migration needed later
 - `alk.dev` can be added to the config without code changes
 - TLS cert management handles multiple domains from the start
 - Eliminates an entire phase of work
 **Negative:**
 - Slightly more testing surface (must verify correct routing with multiple
  sites)
 - Must test multi-domain ACME provisioning (not just single-domain)
 - Wildcard or fallback site behavior needs to be defined (addressed in
  OQ-07)
 ## References
 - [overview.md](../overview.md)
 - [config.md](../config.md)
 - [tls.md](../tls.md)
 - [proxy.md](../proxy.md)
 - ADR-002 (custom proxy handler — rationale updated for multi-site)
--- a/docs/architecture/decisions/011-multi-domain-tls.md
+++ b/docs/architecture/decisions/011-multi-domain-tls.md
@@ -0,0 +1,92 @@
 # ADR-011: Multi-Domain TLS Configuration
 ## Status
 Accepted
 ## Context
 With multi-site support in Phase 1 (ADR-010), the TLS configuration must
 support multiple domains. The previous design used a single `tls.acme_domain`
 string field, which only works for one domain.
 There are several approaches to multi-domain TLS:
 1. **Single ACME config with domain list**: `acme_domains = ["git.alk.dev",
   "alk.dev"]` — one certificate covering all domains (SAN certificate)
 2. **Per-site TLS configuration**: Each site entry specifies its own TLS
   mode (ACME or manual) and domain — more flexible but complex
 3. **Hybrid**: A global TLS section with ACME domains, plus per-site overrides
   for manual certificates
 For our use case, all proxied domains use the same ACME certificate authority
 (Let's Encrypt) and the same challenge type (TLS-ALPN-01). There's no need
 for per-site TLS configuration in Phase 1.
 ## Decision
 Use a single ACME configuration with a list of domains, producing one SAN
 certificate covering all proxied domains. Manual mode uses certificate file
 paths (single cert file with all domains, or one cert per domain resolved via
 SNI).
 The config format changes from the previous single-domain format:
 ```toml
 # Previous (single-domain) format — no longer used
 [tls]
 mode = "acme"
 acme_domain = "git.alk.dev"  # single string
 ```
 To the current multi-domain format:
 ```toml
 [server.tls]
 mode = "acme"
 acme_domains = ["git.alk.dev", "alk.dev"]  # array of strings
 ```
 In ACME mode, `rustls-acme` provisions a single certificate covering all
 listed domains via Subject Alternative Names (SAN). This is the standard
 Let's Encrypt approach for multi-domain certificates.
 In manual mode, the cert and key files must cover all domains (either a SAN
 certificate or separate certificates resolved via SNI).
 ## Rationale
 - A single SAN certificate is simpler to manage (one renewal, one cert)
 - Let's Encrypt supports SAN certificates with up to 100 domains
 - `rustls-acme` accepts `Vec<String>` for domain lists — this is its natural
  API
 - All our domains use the same ACME configuration (Let's Encrypt production,
  TLS-ALPN-01 challenge)
 - Per-site TLS overrides add complexity with no current benefit
 - If per-site TLS configuration is needed later (e.g., a site with a manual
  cert), it can be added as an optional override without changing the global
  config structure
 ## Consequences
 **Positive:**
 - Single certificate for all domains — simpler renewal, simpler cert management
 - Matches `rustls-acme`'s natural API (`AcmeConfig::new` takes the domain list directly)
 - All domains in one cert means certificate selection by SNI is handled by the `rustls-acme` resolver automatically
 - Config format is a minimal change from single-domain
 **Negative:**
 - Adding or removing a domain requires re-provisioning the certificate (ACME
  handles this automatically, but it means cert changes affect all domains)
 - If one domain fails ACME validation, the entire cert renewal fails (all
  domains must be validated) — mitigated by Let's Encrypt's domain-level
  validation
 - Per-site TLS configuration (e.g., a domain with a manual cert) requires a
  future config extension (OQ-07)
 ## References
 - [tls.md](../tls.md)
 - [config.md](../config.md)
 - ADR-010 (multi-site in Phase 1)
 - ADR-004 (ACME-primary certificate management)
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -21,8 +21,6 @@ last_updated: 2026-06-11
  than the current nginx config.
 - **Cross-references**: ADR-005
 ## Logging and Monitoring
 ### ~~OQ-02: What log format should fail2ban consume?~~
 - **Origin**: [operations.md](operations.md), [proxy.md](proxy.md)
@@ -33,6 +31,22 @@ last_updated: 2026-06-11
  See ADR-007.
 - **Cross-references**: ADR-007
 ### OQ-07: Should per-site TLS overrides be supported for mixed ACME/manual domains?
 - **Origin**: [tls.md](tls.md), [config.md](config.md)
 - **Status**: open
 - **Priority**: low
 - **Context**: Phase 1 uses a single TLS configuration (ACME or manual) for all
  domains. All domains share the same ACME config and certificate. If a future
  domain needs a manual certificate (e.g., a corporate CA cert) while other
  domains use ACME, a per-site TLS override would be needed. This would require
  a custom `ResolvesServerCert` that combines ACME-provisioned certs with
  manually loaded certs. For now, all proxied domains use the same ACME config,
  so this is not needed.
 - **Cross-references**: ADR-011
 ## Logging and Monitoring
 ### OQ-03: Should the health check endpoint be on a separate port?
 - **Origin**: [operations.md](operations.md)
@@ -61,15 +75,15 @@ last_updated: 2026-06-11
 ## Deployment
-### OQ-05: Should the proxy bind to multiple addresses or just one?
+### ~~OQ-05: Should the proxy bind to multiple addresses or just one?~~
 - **Origin**: [overview.md](overview.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: low
- **Context**: Current nginx config binds to a specific IP (`15.235.125.95`).
+- **Resolution**: A single `bind_addr` is sufficient. The proxy binds to one
-  The proposed config uses `bind_addr` which could be any IP. For Phase 1, the
+  explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
-  config will specify a single IP address. Multi-address binding (listening on
+  this single-server deployment. If needed in the future, `bind_addr` could be
-  multiple IPs) is not needed but could be added as an array of addresses.
+  extended to an array. See config.md for the `bind_addr` field.
 - **Cross-references**: None
 ## Proxy
--- a/docs/architecture/operations.md
+++ b/docs/architecture/operations.md
@@ -42,9 +42,10 @@ Requests` and logs the event with structured fields.
 ### State Eviction
 The per-IP token bucket state grows over time as new IPs are seen. A
-background task runs at a configurable interval (default: 60 seconds) and
+background task runs every 60 seconds (configurable) and removes entries
-removes entries that haven't been accessed within the cleanup interval. This
+whose last access timestamp is older than a configurable eviction age
-prevents unbounded memory growth.
+(default: 300 seconds / 5 minutes). This prevents unbounded memory growth
 while preserving recent entries that may still receive traffic.
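
The distinction between scan interval and eviction age can be sketched as below. This is an illustrative sketch only: `RateLimitState`, `touch`, and `evict_stale` are hypothetical names, the token-bucket fields are omitted, and the caller is assumed to be a background task that invokes `evict_stale` once per scan interval (60 seconds) with the eviction age (300 seconds).

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-IP state; the token-bucket fields are omitted.
pub struct RateLimitState {
    last_access: HashMap<IpAddr, Instant>,
}

impl RateLimitState {
    pub fn new() -> Self {
        RateLimitState { last_access: HashMap::new() }
    }

    /// Records that `ip` was seen at `now`.
    pub fn touch(&mut self, ip: IpAddr, now: Instant) {
        self.last_access.insert(ip, now);
    }

    /// Removes entries whose last access is older than `eviction_age`
    /// and returns how many entries were removed.
    pub fn evict_stale(&mut self, now: Instant, eviction_age: Duration) -> usize {
        let before = self.last_access.len();
        self.last_access
            .retain(|_, last| now.saturating_duration_since(*last) <= eviction_age);
        before - self.last_access.len()
    }

    /// Number of IPs currently tracked.
    pub fn len(&self) -> usize {
        self.last_access.len()
    }
}
```

Passing `now` as a parameter instead of calling `Instant::now()` inside the method makes the eviction rule easy to test without sleeping.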
 ### Fail2ban Integration
@@ -55,7 +56,7 @@ format decision.
 The log format uses `key=value` pairs with a `RATE_LIMIT` prefix:
 ```
-RATE_LIMIT client_ip=X.X.X.X host=Y.Z path=/W status=429
+RATE_LIMIT client_ip=203.0.113.50 host=Y.Z path=/W status=429
 ```
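
A line in this layout could be produced by a small formatting helper. The sketch below is an assumption about how that might look, not the project's logging code: `rate_limit_line` is a hypothetical name, and the sketch assumes `host` and `path` contain no spaces or newlines (escaping is not shown).

```rust
use std::net::IpAddr;

/// Formats a rate-limit event in the key=value layout shown above.
/// Assumes `host` and `path` are already free of whitespace.
pub fn rate_limit_line(client_ip: IpAddr, host: &str, path: &str) -> String {
    format!(
        "RATE_LIMIT client_ip={} host={} path={} status=429",
        client_ip, host, path
    )
}
```

Keeping the prefix and field order fixed is what lets a fail2ban filter match the line with a single regular expression.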
 A corresponding fail2ban filter and jail configuration are provided as part
@@ -71,15 +72,15 @@ log entries:
 1. **Access logs**: Every proxied request is logged at `info` level with
   structured fields.
-   ```
+```
-   REQUEST client_ip=1.2.3.4 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
+REQUEST client_ip=203.0.113.50 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
-   ```
+```
 2. **Event logs**: Rate limits, TLS errors, upstream failures, config reloads,
   etc.
   ```
-   RATE_LIMIT client_ip=1.2.3.4 host=git.alk.dev path=/login status=429
+   RATE_LIMIT client_ip=203.0.113.50 host=git.alk.dev path=/login status=429
   UPSTREAM_ERROR host=git.alk.dev upstream=127.0.0.1:3000 error="connection refused"
   CONFIG_RELOAD status=success sites=1
   ```
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -8,10 +8,12 @@ last_updated: 2026-06-11
 ## Vision
 A memory-safe, minimal reverse proxy that replaces our vulnerable nginx instance
-for forward-proxying to backend services. The proxy terminates TLS, injects
+for forwarding requests to backend services. The proxy terminates TLS, injects
 standard proxy headers, enforces rate limits, and forwards requests to upstream
-services — with operational feature parity for our current single-domain Gitea
+services — supporting multiple domains from initial release.
-setup.
+
 This project is open source under dual licensing: MIT OR Apache-2.0, consistent
 with standard Rust project licensing.
 ## Why This Exists
@@ -35,38 +37,44 @@ details.
 ### In Scope
- **Phase 1**: Replace nginx for `git.alk.dev` with feature parity
+- **Phase 1**: Multi-site reverse proxy with TLS termination
-  - TLS termination with ACME (Let's Encrypt) certificate management
+  - TLS termination with ACME (Let's Encrypt) multi-domain certificate management
  - Manual certificate paths as fallback mode
  - HTTP → HTTPS redirect
-  - Reverse proxy to Gitea at `127.0.0.1:3000`
+  - Host-based routing to multiple upstream services
  - Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
  - Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
  - Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
-  - Request rate limiting with fail2ban-compatible logging (global per-IP; per-site in Phase 2)
+  - Request rate limiting with fail2ban-compatible logging (global per-IP)
-  - 100 MB body size limit (global; per-site in Phase 2)
+  - 100 MB body size limit (global)
  - Configurable bind address (no `0.0.0.0` default)
  - Health check endpoint
  - Graceful shutdown (SIGTERM handling)
  - Systemd unit file
  - Dual licensing: MIT OR Apache-2.0
- **Phase 2**: Multi-site support
+- **Phase 2**: Operational hardening
-  - SNI-based TLS routing for multiple domains
+  - Per-site rate limits and body limits
-  - Config file for site definitions
+  - Per-site upstream timeouts
  - Dynamic config reload (ArcSwap pattern)
 - **Phase 3**: Operational hardening
  - Metrics endpoint (Prometheus-compatible)
  - Connection limits and timeouts
  - Log rotation
 - **Phase 3**: Future enhancements
  - Wildcard subdomain support
  - Per-site TLS overrides (manual certs for specific domains)
  - Unix domain socket config reload API
 ### Out of Scope
 - HTTP/2 or HTTP/3 proxying (services that need these run their own native
-  Rust servers — e.g., `api.alk.dev`)
+  Rust servers — e.g., `api.alk.dev` runs its own HTTP/2+ server)
 - Load balancing or round-robin upstream selection
 - WebSocket proxying (can be added later if needed)
 - Static file serving
 - Access control beyond rate limiting (no auth, no IP allowlists in Phase 1)
 - CGI, SCGI, uWSGI, FastCGI
 - Per-site TLS configuration (all domains share one ACME config in Phase 1)
 ## Architecture
@@ -81,11 +89,14 @@ bind_addr:80   ──►  │  HTTP listener → 301 redirect        │
                     │                                      │
 bind_addr:443  ──►  │  TLS listener (tokio-rustls)         │
                     │  ├─ ACME mode: rustls-acme resolver  │
-                    │  │  (auto cert provisioning/renewal) │
+                     │  │  (multi-domain SAN cert,           │
                     │  │   auto-provision & renew)          │
                     │  └─ Manual mode: cert/key file paths  │
                     │                                      │
                     │  axum router                         │
                     │  ├─ Host-based routing                │
                     │  │  ├─ git.alk.dev → :3000            │
                     │  │  └─ alk.dev     → :8080            │
                     │  ├─ Rate limiting middleware          │
                     │  ├─ Proxy header injection            │
                     │  ├─ Body size limit (100MB)           │
@@ -147,7 +158,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | ADR | Decision | Summary |
 |-----|----------|---------|
 | [001](decisions/001-rust-axum.md) | Rust with axum | Memory safety eliminates the bug class causing nginx CVEs; axum provides ergonomic tower integration |
-| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream, single domain — axum-reverse-proxy adds unnecessary complexity |
+| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream per domain — simpler than a general proxy library |
 | [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
 | [004](decisions/004-rustls-acme.md) | ACME-primary certificate management | Eliminates certbot dependency; automatic provisioning and renewal |
 | [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly, not axum-server | Full control over TLS config, ACME resolver integration, cipher suite configuration |
@@ -155,6 +166,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
 | [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config with ArcSwap | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
 | [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
 ## Open Questions
@@ -163,4 +176,4 @@ questions affecting this document:
 - **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
 - **OQ-03**: Should the health check endpoint be on a separate port? (open)
- **OQ-05**: Should the proxy bind to multiple addresses or just one? (open)
+- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
--- a/docs/architecture/proxy.md
+++ b/docs/architecture/proxy.md
@@ -14,8 +14,9 @@ injection, body size limits), and forwards it to the upstream service.
 ## Why It Exists
 This component replaces nginx's `proxy_pass` directive. For our use case —
-single upstream per domain, no load balancing, no HTTP/2 proxying — a custom
+one upstream per domain across multiple domains, no load balancing, no HTTP/2
-handler is simpler and more maintainable than a general-purpose proxy library.
+proxying — a custom handler is simpler and more maintainable than a
 general-purpose proxy library (ADR-002, ADR-010).
 ## Architecture
@@ -140,9 +141,9 @@ services typically run on the same host (e.g., `127.0.0.1:3000`). The
 `upstream_scheme` field in each site's configuration allows specifying `https://`
 for upstreams that require TLS (e.g., separate hosts or secure internal services).
-For the initial deployment (`git.alk.dev` → `127.0.0.1:3000`), the upstream
+For the initial deployment, upstream connections use plain HTTP (e.g.,
-connection uses plain HTTP, as TLS between the proxy and Gitea on loopback is
+`git.alk.dev` → `127.0.0.1:3000`, `alk.dev` → `127.0.0.1:8080`) since TLS
-unnecessary.
+between the proxy and backend services on loopback is unnecessary.
 ## Body Size Limit
@@ -157,8 +158,9 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | ADR | Decision | Summary |
 |-----|----------|---------|
-| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream, single domain — simpler than a general proxy library |
+| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | One upstream per domain — simpler than a general proxy library |
 | [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 ## Open Questions
--- a/docs/architecture/tls.md
+++ b/docs/architecture/tls.md
@@ -57,10 +57,11 @@ no deploy hooks.
 **How it works:**
-1. `AcmeCertProvider` configures the ACME client with the domain, cache
+1. `AcmeCertProvider` configures the ACME client with the domain list, cache
   directory, and Let's Encrypt directory (staging or production).
-2. `AcmeConfig::new(vec![domain])` creates an ACME configuration for the
+2. `AcmeConfig::new(domains)` creates an ACME configuration for all listed
-   domain.
+   domains. Let's Encrypt will issue a single SAN certificate covering all
   domains.
 3. The ACME state machine runs as a background tokio task, handling:
   - Account registration with Let's Encrypt
   - Certificate ordering
@@ -75,9 +76,9 @@ no deploy hooks.
 **Configuration:**
 ```toml
-[tls]
+[server.tls]
 mode = "acme"
-acme_domain = "git.alk.dev"
+acme_domains = ["git.alk.dev", "alk.dev"]
 acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
 acme_directory = "production"  # or "staging" for testing
 ```
@@ -100,13 +101,8 @@ key_path = "/etc/letsencrypt/live/git.alk.dev/privkey.pem"
 ```
 Certificate files are loaded once at startup using `rustls_pemfile`. Manual
-mode requires a restart to pick up new certificates.
+mode requires a restart to pick up new certificates. See ADR-004 for the
-
+rationale behind making ACME the primary mode and manual mode restart-dependent.
 **Why not hot-reload manual certs?** ACME mode handles renewal automatically.
 Manual mode is for cases where you control cert rotation externally (certbot,
 manual renewal). In that case, a SIGHUP-triggered restart is simpler and more
 reliable than file watching. If zero-downtime cert rotation is needed, use ACME
 mode.
 ## TLS Configuration
@@ -142,10 +138,13 @@ restrict cipher suites beyond rustls defaults.
 ### ServerConfig Construction
 For manual mode, the `ServerConfig` is built with `with_no_client_auth()` and
-`with_single_cert()`, loading the certificate chain and private key from disk.
+a custom `ResolvesServerCert` implementation that maps SNI hostnames to
 certificate/key pairs loaded from disk.
 For ACME mode, the `ServerConfig` is built with `with_cert_resolver()`, passing
-the `ResolvesServerCertAcme` resolver. The ACME TLS-ALPN-01 protocol identifier
+the `ResolvesServerCertAcme` resolver. The ACME configuration includes all
 domains listed in `acme_domains`, and the resolver manages a single SAN
 certificate covering all of them. The ACME TLS-ALPN-01 protocol identifier
 (`acme-tls/1`) must be registered in the `alpn_protocols` list so the server
 can respond to TLS-ALPN-01 challenges.
@@ -154,28 +153,39 @@ versions (TLS 1.2 and TLS 1.3).
 ## SNI-Based Certificate Selection
-### Current (Single Domain)
+### ACME Mode (Multi-Domain)
-For single-domain setups, SNI selection is trivial: there's only one
+In ACME mode, `rustls-acme` manages a single SAN certificate covering all
-certificate, so `with_single_cert()` or `ResolvesServerCertAcme` (which
+configured domains. The `ResolvesServerCertAcme` resolver automatically serves
-handles the domain) is sufficient.
+the correct certificate during the TLS handshake.
 ### Future (Multi-Domain)
 When multiple domains are served, SNI selection works as follows:
 1. **TLS handshake**: The client sends the SNI extension indicating which
   hostname it's connecting to.
-2. **Certificate resolution**: In ACME mode, `ResolvesServerCertAcme` handles
+2. **Certificate resolution**: `ResolvesServerCertAcme` matches the SNI
-   this automatically — it stores certificates keyed by domain. In manual mode,
+   hostname against the provisioned certificate's Subject Alternative Names
-   a custom `ResolvesServerCert` implementation maps SNI hostname to the
+   and serves the certificate.
   correct `CertifiedKey`.
 3. **HTTP routing**: After the TLS handshake, axum's `Host` extractor routes
   the request to the correct site handler based on the `Host` header.
 This is the same pattern nginx uses — SNI selects the cert during TLS, then
-`Host` header selects the server block. In manual mode, a `ResolvesServerCert`
+`Host` header selects the server block. ACME mode handles this automatically
-implementation maps SNI hostname to the correct `CertifiedKey`.
+through the cert resolver.
 ### Manual Mode (Multi-Domain)
 In manual mode, a custom `ResolvesServerCert` implementation is required to
 map SNI hostnames to the correct `CertifiedKey`. This implementation:
 1. Loads certificate files at startup (or on SIGHUP for reload)
 2. Maps each domain name to its certificate chain and private key
 3. During the TLS handshake, looks up the SNI hostname and returns the
   matching `CertifiedKey`
 The custom resolver must handle the case where no matching certificate exists
 for the SNI hostname — in this case, the handshake fails, which is the
 correct behavior (we don't serve a default certificate for unknown domains).
 See [open-questions.md](open-questions.md) OQ-07 for per-site TLS overrides.
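
The lookup at the heart of such a resolver can be sketched with the standard library alone. This is a simplified illustration, not the real implementation: `CertEntry` is a stand-in for rustls's `CertifiedKey`, and `SniCertMap` is a hypothetical name. In the actual proxy, this lookup would sit inside an implementation of rustls's `ResolvesServerCert` trait, with the hostname taken from the client hello's server name.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a loaded certificate chain and private key.
/// In the real proxy this would be rustls's `CertifiedKey`.
pub struct CertEntry {
    pub label: String,
}

/// Hypothetical SNI lookup table: lowercase hostname -> certificate entry.
pub struct SniCertMap {
    by_host: HashMap<String, Arc<CertEntry>>,
}

impl SniCertMap {
    pub fn new() -> Self {
        SniCertMap { by_host: HashMap::new() }
    }

    /// Registers `entry` for `host`. One entry may be shared by several hosts
    /// (for example a single SAN certificate).
    pub fn insert(&mut self, host: &str, entry: Arc<CertEntry>) {
        self.by_host.insert(host.to_ascii_lowercase(), entry);
    }

    /// Returns the entry for the SNI hostname, or None when the client sent
    /// no SNI or no certificate matches, in which case the handshake fails.
    pub fn resolve(&self, sni: Option<&str>) -> Option<Arc<CertEntry>> {
        let host = sni?.to_ascii_lowercase();
        self.by_host.get(&host).cloned()
    }
}
```

Returning `None` for unknown hostnames matches the behavior described above: no default certificate is served.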
 ## HTTP Listener (Port 80)
@@ -211,6 +221,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 |-----|----------|---------|
 | [004](decisions/004-rustls-acme.md) | ACME-primary cert management | Eliminates certbot; automatic provisioning and renewal |
 | [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
 ## Open Questions
@@ -218,3 +230,5 @@ Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
 - **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
 - **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
  domains? (open)