Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues

Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy must
support git.alk.dev and alk.dev from initial release. Add multi-domain TLS
configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 (avoids config format migration later)
- ADR-011: Multi-domain TLS (single SAN cert, acme_domains Vec<String>)
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through, dual
  licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual mode
  uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new OQ-07
  (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)

@@ -16,8 +16,9 @@ available:
 2. **Custom handler** (Felix Knorr pattern): Build a handler using hyper's
    `Client` to forward requests. ~50-100 lines of Rust for our needs.

-Our use case is minimal: single upstream per domain, single domain, no load
-balancing, no retry, no HTTP/2 proxying.
+Our use case is minimal: single upstream per domain, no load balancing, no
+retry, no HTTP/2 proxying. While the proxy supports multiple domains
+(ADR-010), each domain routes to exactly one upstream.

 ## Decision

@@ -31,6 +32,8 @@ project's channel proxy.
   path-based routing to multiple backends)
 - Our proxy case is the simplest possible: match a Host header, forward the
   entire request to a single upstream, stream the response back
+- Multi-domain support (ADR-010) doesn't change this: each domain still maps
+  to one upstream
 - The Felix Knorr pattern is proven, idiomatic, and ~50-100 lines
 - We maintain full control over header injection, error handling, and upstream
   connection behavior
@@ -46,11 +49,12 @@ project's channel proxy.

 **Negative:**
 - We implement and maintain proxy logic ourselves (but it's trivial for our
-  use case)
+  use case: each domain maps to one upstream)
 - If requirements grow to load balancing or retry, we'd need to add that
   ourselves or switch to `axum-reverse-proxy`

 ## References

 - [proxy.md](../proxy.md)
+- [ADR-010](010-multi-site-phase1.md) (multi-site in Phase 1)
 - Felix Knorr, "Replacing nginx with axum" (felix-knorr.net/posts/2024-10-13-replacing-nginx-with-axum.html)

90
docs/architecture/decisions/010-multi-site-phase1.md
Normal file
@@ -0,0 +1,90 @@
# ADR-010: Multi-Site Support in Phase 1

## Status

Accepted

## Context

The original architecture phased multi-site support into Phase 2, treating
Phase 1 as a single-domain replacement for nginx serving only `git.alk.dev`.
This was based on the assumption that only one domain needed proxying initially.

However, `alk.dev` (the bare domain) will need proxying in the near future.
While `alk.dev` is a simple case (proxying to a Deno/Fresh container with no
special requirements), the proxy must support multiple sites from day one. The
config format, routing logic, and TLS certificate provisioning all need
multi-site awareness.

Additionally, `api.alk.dev` is explicitly out of scope (it runs its own
HTTP/2+ server natively), but the proxy must not prevent future sites from
being added.

The cost of deferring multi-site is high: we'd need a config format migration,
routing logic rewrite, and TLS cert management changes later. Supporting
multi-site from the start costs very little: the config format just uses an
array of sites (which it already does), host-based routing is trivial in axum,
and `rustls-acme` supports multi-domain certificates natively.

## Decision

Move multi-site support from Phase 2 into Phase 1. The proxy supports multiple
sites from the initial release:

- `[[sites]]` array in config (already the planned format)
- Host-based routing via axum's `Host` extractor (already the planned approach)
- Multi-domain ACME certificate provisioning via `rustls-acme`
- Each site maps a hostname to an upstream address

Phase 1 scope becomes:

1. Multi-site reverse proxy with TLS termination
2. ACME certificate management (multi-domain)
3. HTTP → HTTPS redirect
4. Rate limiting, logging, health check, graceful shutdown
5. Systemd integration

Phase 2 scope shifts to operational hardening:

1. Per-site rate limits and body limits
2. Per-site upstream timeouts
3. Metrics endpoint (Prometheus-compatible)
4. Connection limits and timeouts
5. Log rotation

Phase 3 remains future enhancements.

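
The routing model in the decision above (each site maps a hostname to one upstream) can be sketched with the standard library alone. This is a minimal illustration, not the project's handler: the `Site` and `Router` types, their field names, and the `normalize_host` helper are assumptions made for the example. The real proxy would read `[[sites]]` from config and take the host from axum's `Host` extractor.

```rust
use std::collections::HashMap;

/// One `[[sites]]` entry: a hostname mapped to a single upstream address.
/// Field names are illustrative, not taken from config.md.
#[derive(Debug, Clone, PartialEq)]
struct Site {
    host: String,
    upstream: String,
}

/// Lookup table from normalized hostname to upstream address.
struct Router {
    by_host: HashMap<String, String>,
}

impl Router {
    /// Builds the table, rejecting duplicate hosts so that each domain
    /// has exactly one upstream.
    fn new(sites: &[Site]) -> Result<Self, String> {
        let mut by_host = HashMap::new();
        for site in sites {
            let key = normalize_host(&site.host);
            if by_host.insert(key.clone(), site.upstream.clone()).is_some() {
                return Err(format!("duplicate site host: {key}"));
            }
        }
        Ok(Router { by_host })
    }

    /// Resolves the upstream for a Host header value, if any site matches.
    fn upstream_for(&self, host_header: &str) -> Option<&str> {
        self.by_host
            .get(&normalize_host(host_header))
            .map(String::as_str)
    }
}

/// Lowercases the host and strips an optional `:port` suffix.
/// Simplified for the sketch: IPv6 literals are not handled.
fn normalize_host(host: &str) -> String {
    let without_port = host.rsplit_once(':').map_or(host, |(h, _)| h);
    without_port.to_ascii_lowercase()
}
```

With hypothetical upstreams `git.alk.dev -> 127.0.0.1:3000` and `alk.dev -> 127.0.0.1:8000`, a `Host: GIT.alk.dev:443` header resolves to `127.0.0.1:3000`, and an unconfigured host such as `api.alk.dev` yields `None`. How unmatched hosts are answered is left open in this sketch.
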
## Rationale

- The config format already uses `[[sites]]`, so no format change is needed
- Host-based routing is the natural axum pattern and was already planned
- `rustls-acme` accepts a list of domains, so multi-domain is its default usage
- The cost of adding multi-site later (config migration, routing rewrite,
  cert management changes) far exceeds the cost of supporting it now (very
  little additional complexity)
- `alk.dev` is confirmed as a near-term need, not a hypothetical
- The proxy's value proposition is being a memory-safe reverse proxy for *our
  infrastructure*, which has multiple domains

## Consequences

**Positive:**
- No config format migration needed later
- `alk.dev` can be added to the config without code changes
- TLS cert management handles multiple domains from the start
- Eliminates the separate multi-site phase of work

**Negative:**
- Slightly more testing surface (must verify correct routing with multiple
  sites)
- Must test multi-domain ACME provisioning (not just single-domain)
- Wildcard or fallback site behavior needs to be defined (addressed in
  OQ-07)

## References

- [overview.md](../overview.md)
- [config.md](../config.md)
- [tls.md](../tls.md)
- [proxy.md](../proxy.md)
- ADR-002 (custom proxy handler; rationale updated for multi-site)

92
docs/architecture/decisions/011-multi-domain-tls.md
Normal file
@@ -0,0 +1,92 @@
# ADR-011: Multi-Domain TLS Configuration

## Status

Accepted

## Context

With multi-site support in Phase 1 (ADR-010), the TLS configuration must
support multiple domains. The previous design used a single `tls.acme_domain`
string field, which only works for one domain.

There are several approaches to multi-domain TLS:

1. **Single ACME config with domain list**: `acme_domains = ["git.alk.dev", "alk.dev"]`,
   one certificate covering all domains (SAN certificate)
2. **Per-site TLS configuration**: Each site entry specifies its own TLS
   mode (ACME or manual) and domain; more flexible but complex
3. **Hybrid**: A global TLS section with ACME domains, plus per-site overrides
   for manual certificates

For our use case, all proxied domains use the same ACME certificate authority
(Let's Encrypt) and the same challenge type (TLS-ALPN-01). There's no need
for per-site TLS configuration in Phase 1.

## Decision

Use a single ACME configuration with a list of domains, producing one SAN
certificate covering all proxied domains. Manual mode uses certificate file
paths (single cert file with all domains, or one cert per domain resolved via
SNI).

The config format changes from the previous single-domain format:

```toml
# Previous (single-domain) format, no longer used
[tls]
mode = "acme"
acme_domain = "git.alk.dev"  # single string
```

To the current multi-domain format:

```toml
[tls]
mode = "acme"
acme_domains = ["git.alk.dev", "alk.dev"]  # array of strings
```

In ACME mode, `rustls-acme` provisions a single certificate covering all
listed domains via Subject Alternative Names (SAN). This is the standard
Let's Encrypt approach for multi-domain certificates.

In manual mode, the cert and key files must cover all domains (either a SAN
certificate or separate certificates resolved via SNI).

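
One way to turn `acme_domains` into the SAN list for a single certificate request is sketched below. It is an illustration under assumptions, not the project's validation code: the function name `validate_acme_domains` is hypothetical, and the rule that every site host must appear in `acme_domains` is an assumption of this sketch rather than something this ADR states.

```rust
use std::collections::HashSet;

/// Returns the lowercased SAN list for the certificate request, or a
/// description of the first problem found.
fn validate_acme_domains(
    acme_domains: &[String],
    site_hosts: &[String],
) -> Result<Vec<String>, String> {
    if acme_domains.is_empty() {
        return Err("acme_domains must list at least one domain".to_string());
    }

    // Reject duplicates so each name appears once in the SAN list.
    let mut seen = HashSet::new();
    let mut sans = Vec::with_capacity(acme_domains.len());
    for domain in acme_domains {
        let name = domain.to_ascii_lowercase();
        if !seen.insert(name.clone()) {
            return Err(format!("duplicate entry in acme_domains: {name}"));
        }
        sans.push(name);
    }

    // Assumed rule: every proxied host must be covered by the certificate.
    for host in site_hosts {
        if !seen.contains(&host.to_ascii_lowercase()) {
            return Err(format!("site host not covered by acme_domains: {host}"));
        }
    }

    Ok(sans)
}
```

The returned list preserves the order from the config, so it could be handed to the ACME client as-is. Running a check like this at config load would surface a missing domain before any certificate request is made.
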
## Rationale

- A single SAN certificate is simpler to manage (one renewal, one cert)
- Let's Encrypt supports SAN certificates with up to 100 domains
- `rustls-acme` accepts `Vec<String>` for domain lists, which is its natural
  API
- All our domains use the same ACME configuration (Let's Encrypt production,
  TLS-ALPN-01 challenge)
- Per-site TLS overrides add complexity with no current benefit
- If per-site TLS configuration is needed later (e.g., a site with a manual
  cert), it can be added as an optional override without changing the global
  config structure

## Consequences

**Positive:**
- Single certificate for all domains: simpler renewal, simpler cert management
- Matches `rustls-acme`'s natural API (`AcmeConfig::new` takes the domain list)
- All domains share one cert, so ACME mode needs no per-domain SNI mapping
- Config format is a minimal change from single-domain

**Negative:**
- Adding or removing a domain requires re-provisioning the certificate (ACME
  handles this automatically, but it means cert changes affect all domains)
- If one domain fails ACME validation, the entire cert renewal fails (all
  domains must be validated). Let's Encrypt validates each domain separately,
  so the failing domain can be identified, but no certificate is issued until
  every domain passes
- Per-site TLS configuration (e.g., a domain with a manual cert) requires a
  future config extension (OQ-07)

## References

- [tls.md](../tls.md)
- [config.md](../config.md)
- ADR-010 (multi-site in Phase 1)
- ADR-004 (ACME-primary certificate management)