Add architecture specification for Rust/axum reverse proxy

Phase 1 architecture docs covering the proxy handler, TLS termination (ACME + manual), TOML config with a static/dynamic split (ArcSwap), and operations (rate limiting, logging, health check, systemd, graceful shutdown). Nine ADRs document the key decisions: Rust/axum, custom proxy handler, TOML config, rustls-acme for cert management, direct use of tokio-rustls, token bucket rate limiting, custom log format for fail2ban, static/dynamic config split, and signal handling strategy. Includes threat landscape research on the nginx CVEs motivating this project.
2026-06-11 07:25:50 +00:00
parent 5c54a28822
commit 8ee6284b62
17 changed files with 1819 additions and 0 deletions
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -0,0 +1,86 @@
+---
+status: draft
+last_updated: 2026-06-11
+---
+
+# Open Questions
+
+## TLS
+
+### OQ-01: Should cipher suites be restricted beyond rustls defaults?
+
+- **Origin**: [tls.md](tls.md)
+- **Status**: open
+- **Priority**: medium
+- **Context**: Our current nginx config explicitly restricts cipher suites to
+  four ECDHE-AES-GCM suites. rustls 0.23 with `aws_lc_rs` defaults to a
+  conservative set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4,
+  no CBC-mode suites, no RSA key exchange). The defaults include TLS 1.3
+  suites, which nginx also allows. Restricting further would reduce
+  compatibility with older clients; not restricting means accepting a wider
+  but still safe set (it adds ChaCha20-Poly1305) than the current nginx config.
+- **Cross-references**: ADR-005
+
+## Logging and Monitoring
+
+### ~~OQ-02: What log format should fail2ban consume?~~
+
+- **Origin**: [operations.md](operations.md), [proxy.md](proxy.md)
+- **Status**: resolved
+- **Priority**: high
+- **Resolution**: Custom structured log format with `key=value` pairs and
+  `RATE_LIMIT` prefix. A corresponding custom fail2ban filter will be provided.
+  See ADR-007.
+- **Cross-references**: ADR-007
+
+### OQ-03: Should the health check endpoint be on a separate port?
+
+- **Origin**: [operations.md](operations.md)
+- **Status**: open
+- **Priority**: low
+- **Context**: The health check is currently served on the main HTTPS listener
+  at `/health`. Options: (a) a separate unencrypted port for health checks
+  (simpler for load balancers but less secure), (b) an admin port with its own
+  listener (more complex but isolates operational traffic), (c) stay on the
+  main listener (simplest, proposed approach). For a single-server deployment
+  with no external load balancer in front, the main listener is sufficient.
+- **Cross-references**: None
+
+## Configuration
+
+### OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?
+
+- **Origin**: [config.md](config.md)
+- **Status**: open
+- **Priority**: low
+- **Context**: Phase 1 uses SIGHUP for config reload, which is simple and proven
+  but gives the sender no result. A Unix domain socket API would allow
+  programmatic reload (e.g., from an admin tool or CI/CD pipeline) and could
+  return success/failure status, but adds complexity not needed for Phase 1.
+- **Cross-references**: None
+
+## Deployment
+
+### OQ-05: Should the proxy bind to multiple addresses or just one?
+
+- **Origin**: [overview.md](overview.md)
+- **Status**: open
+- **Priority**: low
+- **Context**: The current nginx config binds to a specific IP (`15.235.125.95`).
+  The proposed config uses `bind_addr`, which can be any IP. For Phase 1, the
+  config will specify a single IP address. Multi-address binding (listening on
+  multiple IPs) is not needed but could be added later as an array of addresses.
+- **Cross-references**: None
+
+## Proxy
+
+### OQ-06: Should upstream timeouts be configurable per-site?
+
+- **Origin**: [proxy.md](proxy.md)
+- **Status**: open
+- **Priority**: low
+- **Context**: Phase 1 uses global defaults (5s connect timeout, 60s request
+  timeout) for all upstream connections. Per-site timeout configuration would
+  allow tuning for different upstream services (e.g., a slow database-backed
+  API vs. a fast static site). It is not needed for Phase 1's single upstream.
+- **Cross-references**: None
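
Note on OQ-06: if per-site timeouts are adopted later, one possible shape is optional per-site overrides that fall back to the global defaults. The sketch below is only an illustration under that assumption. The type and field names (`Timeouts`, `SiteTimeouts`, `resolve`) are hypothetical and do not come from the architecture docs; only the 5s and 60s defaults are taken from the text above.

```rust
use std::time::Duration;

/// Effective upstream timeouts. The defaults mirror the Phase 1 values
/// stated in OQ-06 (5s connect, 60s request).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Timeouts {
    pub connect: Duration,
    pub request: Duration,
}

impl Default for Timeouts {
    fn default() -> Self {
        Self {
            connect: Duration::from_secs(5),
            request: Duration::from_secs(60),
        }
    }
}

/// Hypothetical per-site overrides. `None` means "inherit the global value",
/// so a site section that omits both fields behaves exactly like Phase 1.
#[derive(Debug, Clone, Copy, Default)]
pub struct SiteTimeouts {
    pub connect: Option<Duration>,
    pub request: Option<Duration>,
}

impl SiteTimeouts {
    /// Resolve the effective timeouts for one site against the global ones.
    pub fn resolve(&self, global: Timeouts) -> Timeouts {
        Timeouts {
            connect: self.connect.unwrap_or(global.connect),
            request: self.request.unwrap_or(global.request),
        }
    }
}
```

With this shape, a slow API site would set only `request` and keep the global connect timeout, while a site with no overrides resolves to the unchanged global defaults, so existing single-upstream configs would not need to change.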