Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues

Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy
must support git.alk.dev and alk.dev from initial release. Add multi-domain
TLS configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 — avoids config format migration later
- ADR-011: Multi-domain TLS — single SAN cert, acme_domains Vec<String>
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through,
  dual licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual
  mode uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new
  OQ-07 (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
2026-06-11 08:50:03 +00:00
parent 8ee6284b62
commit 7efc142406
10 changed files with 356 additions and 108 deletions


@@ -8,10 +8,12 @@ last_updated: 2026-06-11
## Vision
A memory-safe, minimal reverse proxy that replaces our vulnerable nginx instance
for forward-proxying to backend services. The proxy terminates TLS, injects
for forwarding requests to backend services. The proxy terminates TLS, injects
standard proxy headers, enforces rate limits, and forwards requests to upstream
services — with operational feature parity for our current single-domain Gitea
setup.
services — supporting multiple domains from initial release.
This project is open source under dual licensing: MIT OR Apache-2.0, consistent
with standard Rust project licensing.
## Why This Exists
@@ -35,65 +37,74 @@ details.
### In Scope
- **Phase 1**: Replace nginx for `git.alk.dev` with feature parity
- TLS termination with ACME (Let's Encrypt) certificate management
- **Phase 1**: Multi-site reverse proxy with TLS termination
- TLS termination with ACME (Let's Encrypt) multi-domain certificate management
- Manual certificate paths as fallback mode
- HTTP → HTTPS redirect
- Reverse proxy to Gitea at `127.0.0.1:3000`
- Host-based routing to multiple upstream services
- Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
- Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
- Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
- Request rate limiting with fail2ban-compatible logging (global per-IP; per-site in Phase 2)
- 100 MB body size limit (global; per-site in Phase 2)
- Request rate limiting with fail2ban-compatible logging (global per-IP)
- 100 MB body size limit (global)
- Configurable bind address (no `0.0.0.0` default)
- Health check endpoint
- Graceful shutdown (SIGTERM handling)
- Systemd unit file
- Dual licensing: MIT OR Apache-2.0
- **Phase 2**: Multi-site support
- SNI-based TLS routing for multiple domains
- Config file for site definitions
- Dynamic config reload (ArcSwap pattern)
- **Phase 3**: Operational hardening
- **Phase 2**: Operational hardening
- Per-site rate limits and body limits
- Per-site upstream timeouts
- Metrics endpoint (Prometheus-compatible)
- Connection limits and timeouts
- Log rotation
- **Phase 3**: Future enhancements
- Wildcard subdomain support
- Per-site TLS overrides (manual certs for specific domains)
- Unix domain socket config reload API
### Out of Scope
- HTTP/2 or HTTP/3 proxying (services that need these run their own native
Rust servers — e.g., `api.alk.dev`)
Rust servers — e.g., `api.alk.dev` runs its own HTTP/2+ server)
- Load balancing or round-robin upstream selection
- WebSocket proxying (can be added later if needed)
- Static file serving
- Access control beyond rate limiting (no auth, no IP allowlists in Phase 1)
- CGI, SCGI, uWSGI, FastCGI
- Per-site TLS configuration (all domains share one ACME config in Phase 1)
## Architecture
```
                    ┌────────────────────────────────────┐
                    │  reverse-proxy (Rust/axum)         │
config.toml ──────► │  StaticConfig + DynamicConfig      │
                    │  (ArcSwap for hot-reload)          │
                    │                                    │
bind_addr:80  ──►   │  HTTP listener → 301 redirect      │
                    │  to HTTPS                          │
                    │                                    │
bind_addr:443 ──►   │  TLS listener (tokio-rustls)       │
                    │  ├─ ACME mode: rustls-acme resolver│
                    │  │   (multi-domain SAN cert,       │
                    │  │    auto-provision & renew)      │
                    │  └─ Manual mode: cert/key file paths
                    │  axum router                       │
                    │  ├─ Host-based routing             │
                    │  │   ├─ git.alk.dev → :3000        │
                    │  │   └─ alk.dev → :8080            │
                    │  ├─ Rate limiting middleware       │
                    │  ├─ Proxy header injection         │
                    │  ├─ Body size limit (100MB)        │
                    │  └─ Reverse proxy handler          │
                    │      └─ hyper Client → upstream    │
                    │                                    │
                    │  /health → 200 OK                  │
                    └────────────────────────────────────┘
```
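The host-based routing step in the diagram can be pictured as a lookup from the request's `Host` header to an upstream address. The following is a minimal sketch using only the standard library, not the project's actual handler: `SiteTable`, `resolve`, and `strip_port` are hypothetical names, and the `127.0.0.1:8080` address for `alk.dev` is an assumption based on the `:8080` shown in the diagram.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: lowercase host name -> upstream address.
pub struct SiteTable {
    upstreams: HashMap<String, String>,
}

impl SiteTable {
    /// Build the table from (host, upstream) pairs, normalizing host case.
    pub fn new(sites: &[(&str, &str)]) -> Self {
        let upstreams = sites
            .iter()
            .map(|(host, upstream)| (host.to_ascii_lowercase(), upstream.to_string()))
            .collect();
        SiteTable { upstreams }
    }

    /// Resolve a raw `Host` header value to an upstream address, if any.
    pub fn resolve(&self, host_header: &str) -> Option<&str> {
        let host = strip_port(host_header.trim()).to_ascii_lowercase();
        self.upstreams.get(&host).map(String::as_str)
    }
}

/// Drop a trailing `:port` from a host name.
/// Assumption: IPv6 literals are not handled in this sketch.
fn strip_port(host: &str) -> &str {
    match host.rsplit_once(':') {
        Some((name, port))
            if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) =>
        {
            name
        }
        _ => host,
    }
}
```

With a table built from `("git.alk.dev", "127.0.0.1:3000")` and `("alk.dev", "127.0.0.1:8080")`, `resolve("git.alk.dev:443")` would return the Gitea upstream, and an unknown host would return `None`, which a handler could map to an error response.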
## Crate Dependencies
@@ -147,7 +158,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-rust-axum.md) | Rust with axum | Memory safety eliminates the bug class causing nginx CVEs; axum provides ergonomic tower integration |
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream, single domain — axum-reverse-proxy adds unnecessary complexity |
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | Single upstream per domain — simpler than a general proxy library |
| [003](decisions/003-toml-config.md) | TOML configuration format | Rust-native, unambiguous, excellent serde support |
| [004](decisions/004-rustls-acme.md) | ACME-primary certificate management | Eliminates certbot dependency; automatic provisioning and renewal |
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly, not axum-server | Full control over TLS config, ACME resolver integration, cipher suite configuration |
@@ -155,6 +166,8 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
| [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config with ArcSwap | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
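
As a rough illustration of the configuration shape behind ADR-010 and ADR-011, the sketch below models `acme_domains` as a `Vec<String>` and implements the unique host check mentioned in the commit message. Apart from `acme_domains`, every name here (`ProxyConfig`, `Site`, `sites`, `host`, `upstream`, `ConfigError`, `validate`) is hypothetical, and the real validation in config.md may check more than this.

```rust
use std::collections::HashSet;

/// Hypothetical error type for config validation.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    /// The same host appears in more than one site definition.
    DuplicateHost(String),
}

/// Hypothetical site definition: one upstream per domain.
pub struct Site {
    pub host: String,
    pub upstream: String,
}

/// Hypothetical top-level config. `acme_domains` is a list because a
/// single SAN certificate covers all configured domains.
pub struct ProxyConfig {
    pub acme_domains: Vec<String>,
    pub sites: Vec<Site>,
}

/// Reject configs in which two sites share a host (case-insensitive).
pub fn validate(config: &ProxyConfig) -> Result<(), ConfigError> {
    let mut seen = HashSet::new();
    for site in &config.sites {
        let host = site.host.to_ascii_lowercase();
        // `insert` returns false when the host was already present.
        if !seen.insert(host.clone()) {
            return Err(ConfigError::DuplicateHost(host));
        }
    }
    Ok(())
}
```

Comparing hosts case-insensitively is a design choice in this sketch, made because DNS names are case-insensitive; the source does not say how the actual check normalizes hosts.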
## Open Questions
@@ -163,4 +176,4 @@ questions affecting this document:
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
- **OQ-05**: Should the proxy bind to multiple addresses or just one? (open)
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)