Resolve all architecture review findings (7 critical, 14 warnings, 6 suggestions)

Critical findings resolved:
- C1: Site routing is global (per-listener TOML, global runtime lookup)
- C2: X-Forwarded-For replaces (not appends): edge proxy model (ADR-021)
- C3: Hop-by-hop header handling rules specified (proxy.md)
- C4: ACME failure behavior defined (tls.md)
- C5: Startup sequence with fail-fast semantics (operations.md)
- C6: Per-listener Router instances with shared global state (overview.md)
- C7: Rate limiter adopts new params on next request, no state clear (operations.md)

Warnings resolved:
- W1: Admin socket wire protocol specified
- W2: Host header port stripped, hostnames only in config
- W3: HTTP redirect URL construction with port handling
- W4: /health on HTTPS matches regardless of Host header
- W5: Static config changes logged as warning during reload
- W6: Reload operations serialized via Mutex
- W7: http_port validation rules added (9 new rules total)
- W8: upstream format validation (host:port required, no scheme)
- W9: TLS error handling table (SNI, version, cipher failures)
- W10: IPv6 rate limited per /64 prefix
- W11: Graceful shutdown sequence specified (6 steps)
- W12: Error response bodies: minimal plain text, no version disclosure
- W13: upstream_scheme HTTPS uses system CA store
- W14: allow_wildcard_bind is OR between config and CLI
- W15: ADR-010 Phase 2 list updated (timeouts moved to Phase 1)
- W17: LoggingConfig static/restart note added

Suggestions applied:
- S2: ConnectInfo propagation note
- S3: Case-insensitive host matching (RFC 7230)
- S5: Response streaming behavior (chunk-by-chunk)
- S6: Token bucket nodelay semantics
- S7: File watching explicitly out of scope
- S8: All paths forwarded without filtering
- S9: shutdown_timeout_secs referenced in shutdown description
- S11: Consolidated defaults table in config.md
2026-06-11 10:56:40 +00:00
parent bcc58bc7ce
commit ceb59ad9b9
8 changed files with 467 additions and 61 deletions
--- a/docs/architecture/operations.md
+++ b/docs/architecture/operations.md
@@ -36,6 +36,12 @@ Rate limits are global per-IP in Phase 1 (not per-site). A request from IP
 address X counts against the same bucket regardless of which site it targets.
 Per-site rate limits may be added in Phase 2.

+The token bucket uses **nodelay** semantics matching nginx's `limit_req burst
+nodelay`: when the bucket is empty, the request is immediately rejected with
+429; requests are never queued. Tokens are added at `requests_per_second`
+tokens per second (one token every `1000 / requests_per_second` ms), and the
+bucket capacity is the `burst` value.
+
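Review note: the nodelay behavior added above can be sketched as a small token bucket. This is a minimal illustration, not the proxy's implementation. The `TokenBucket` fields, the `try_acquire` name, the millisecond clock parameter, and the choice to start a new bucket full are all assumptions made for the example.

```rust
/// Hypothetical token bucket; names are illustrative, not from the proxy source.
#[derive(Debug, Clone)]
pub struct TokenBucket {
    tokens: f64,
    last_refill_ms: u64,
}

impl TokenBucket {
    /// Assumption: a new bucket starts full, so a fresh client may burst at once.
    pub fn new(burst: u32, now_ms: u64) -> Self {
        Self { tokens: f64::from(burst), last_refill_ms: now_ms }
    }

    /// Returns `true` to admit the request, `false` to reject it with 429.
    /// There is no queueing or sleeping: that is the "nodelay" part.
    pub fn try_acquire(&mut self, requests_per_second: u32, burst: u32, now_ms: u64) -> bool {
        // Refill for the elapsed time: `requests_per_second` tokens per 1000 ms.
        let elapsed_ms = now_ms.saturating_sub(self.last_refill_ms);
        self.last_refill_ms = self.last_refill_ms.max(now_ms);
        let refill = elapsed_ms as f64 * f64::from(requests_per_second) / 1000.0;
        // The bucket never holds more than `burst` tokens.
        self.tokens = (self.tokens + refill).min(f64::from(burst));
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With 10 req/s and burst 2, two back-to-back requests pass, a third is rejected immediately, and one more token is available 100 ms later.
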
 When a request exceeds the rate limit, the middleware returns `429 Too Many
 Requests` and logs the event with structured fields.

@@ -47,6 +53,37 @@ whose last access timestamp is older than a configurable eviction age
 (default: 300 seconds / 5 minutes). This prevents unbounded memory growth
 while preserving recent entries that may still receive traffic.

+### Config Reload Behavior
+
+When rate limit parameters change (e.g., from 10 req/s burst 20 to 20 req/s
+burst 40), the behavior is:
+
+1. New `DynamicConfig` is swapped in via ArcSwap.
+2. On the next request from an existing IP, the rate limiter reads the current
+   `DynamicConfig` for rate/burst parameters.
+3. The token bucket refills using the new rate, and its capacity is set to the
+   new burst maximum.
+4. If the current token count exceeds the new burst maximum, it is capped to
+   the new burst maximum.
+
+The HashMap is **not** cleared — this avoids creating a rate-limiting gap.
+Existing buckets adopt new parameters on their next request. The eviction task
+continues removing stale entries independently.
+
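Review note: the reload steps above can be sketched with a limiter that reads its parameters on every request and never clears its map. This is a hedged, std-only illustration. `RateParams`, `RateLimiter`, `check`, and `reload` are hypothetical names, and an `RwLock` stands in for the `ArcSwap<DynamicConfig>` the document describes.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Mutex, RwLock};

/// Hypothetical subset of `DynamicConfig` relevant to rate limiting.
#[derive(Debug, Clone, Copy)]
pub struct RateParams {
    pub requests_per_second: u32,
    pub burst: u32,
}

struct Bucket {
    tokens: f64,
    last_ms: u64,
}

pub struct RateLimiter {
    // Stand-in for ArcSwap<DynamicConfig>; only the swap-and-read pattern matters here.
    params: RwLock<RateParams>,
    buckets: Mutex<HashMap<IpAddr, Bucket>>,
}

impl RateLimiter {
    pub fn new(params: RateParams) -> Self {
        Self { params: RwLock::new(params), buckets: Mutex::new(HashMap::new()) }
    }

    /// Step 1: swap in new parameters. The bucket map is left untouched.
    pub fn reload(&self, params: RateParams) {
        *self.params.write().unwrap() = params;
    }

    /// Steps 2-4: read current parameters, refill at the new rate, cap to the new burst.
    pub fn check(&self, ip: IpAddr, now_ms: u64) -> bool {
        let p = *self.params.read().unwrap();
        let mut buckets = self.buckets.lock().unwrap();
        let b = buckets
            .entry(ip)
            .or_insert(Bucket { tokens: f64::from(p.burst), last_ms: now_ms });
        let elapsed_ms = now_ms.saturating_sub(b.last_ms);
        b.last_ms = b.last_ms.max(now_ms);
        let refill = elapsed_ms as f64 * f64::from(p.requests_per_second) / 1000.0;
        // `min` both enforces capacity and caps a count left over from a larger burst.
        b.tokens = (b.tokens + refill).min(f64::from(p.burst));
        if b.tokens >= 1.0 {
            b.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    pub fn tracked_ips(&self) -> usize {
        self.buckets.lock().unwrap().len()
    }
}
```

Because the cap is applied on the next request rather than at reload time, a reload touches no buckets and there is no window in which an IP is untracked.
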
+### IPv6 Rate Limiting
+
+IPv6 addresses have a vastly larger address space than IPv4. Rate limiting per
+individual IPv6 address (`/128`) is ineffective against attackers who can
+generate many addresses within a `/64` prefix.
+
+- **IPv4**: Rate limited per individual address (`/32`).
+- **IPv6**: Rate limited per `/64` prefix. All addresses in the same `/64` share
+  the same token bucket. This matches RFC 4941 privacy extension boundaries and
+  common anti-abuse practice.
+
+The rate limiter normalizes IPv6 addresses to their `/64` prefix before
+bucket lookup.
+
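Review note: the `/64` normalization above amounts to masking off the low 64 bits of an IPv6 address before using it as the bucket key. A minimal sketch follows. The function name `rate_limit_key` is an assumption, and the sketch does not special-case IPv4-mapped IPv6 addresses, which the document does not discuss.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Hypothetical helper: maps a client address to the key used for bucket lookup.
pub fn rate_limit_key(addr: IpAddr) -> IpAddr {
    match addr {
        // IPv4 is limited per individual address (/32): use it unchanged.
        IpAddr::V4(_) => addr,
        // IPv6 is limited per /64: keep the upper 64 bits, zero the interface ID.
        IpAddr::V6(v6) => {
            let masked = u128::from(v6) & (!0u128 << 64);
            IpAddr::V6(Ipv6Addr::from(masked))
        }
    }
}
```

Two addresses such as `2001:db8:1:2::1` and `2001:db8:1:2:aaaa:bbbb:cccc:dddd` then resolve to the same key, `2001:db8:1:2::`, and share one bucket.
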
 ### Fail2ban Integration

 Rate limit events are logged in a structured format that a custom fail2ban
@@ -225,13 +262,46 @@ process does not exit on SIGHUP.

 The admin Unix domain socket provides programmatic config reload with feedback.
 This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
-command protocol.
+rationale.

-### Timeout
+**Protocol:**

-In-flight requests have a configurable shutdown timeout (default: 30 seconds).
-After the timeout, remaining connections are forcefully closed and the process
-exits.
+- **Connection lifecycle**: One command per connection. Client connects, sends
+  one newline-terminated command, receives one newline-terminated JSON
+  response, then the server closes the connection.
+- **Message framing**: Newline-delimited (`\n`). Responses end with `\n`.
+- **Commands**:
+  - `reload` — Re-read config file, validate, and swap DynamicConfig. Returns
+    `{"status": "ok"}` or `{"status": "error", "message": "..."}`.
+  - `status` — Return basic process info. Returns
+    `{"status": "ok", "uptime_secs": 1234, "sites": 2}`.
+- **Error responses**: Unrecognized commands return
+  `{"status": "error", "message": "unknown command: <cmd>"}`. Invalid or empty
+  input returns `{"status": "error", "message": "invalid input"}`.
+- **Concurrency**: Multiple clients can connect simultaneously, but reload
+  operations are serialized (see Config Reload section in config.md).
+- **Socket cleanup**: At startup, the proxy removes a stale socket file (one
+  with no listener) before binding. If another process is listening on it, a
+  warning is logged and the admin socket is disabled; the proxy keeps starting.
+
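Review note: the command handling in the protocol above can be sketched as a pure function from one input line to one newline-terminated JSON response. This is an illustration only. `handle_command`, its parameters, and the hand-rolled `json_escape` are assumptions, and treating whitespace-only input as "invalid input" is one reading of the spec.

```rust
/// Minimal JSON string escaping for error messages (illustrative, std-only).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn error(message: &str) -> String {
    format!("{{\"status\": \"error\", \"message\": \"{}\"}}\n", json_escape(message))
}

/// Hypothetical dispatcher: one command in, one newline-terminated JSON response out.
/// `reload` is whatever re-reads, validates, and swaps the config.
pub fn handle_command(
    line: &str,
    uptime_secs: u64,
    sites: usize,
    reload: impl FnOnce() -> Result<(), String>,
) -> String {
    match line.trim() {
        "" => error("invalid input"),
        "reload" => match reload() {
            Ok(()) => "{\"status\": \"ok\"}\n".to_string(),
            Err(e) => error(&e),
        },
        "status" => format!(
            "{{\"status\": \"ok\", \"uptime_secs\": {}, \"sites\": {}}}\n",
            uptime_secs, sites
        ),
        other => error(&format!("unknown command: {}", other)),
    }
}
```

Keeping the dispatcher free of socket I/O makes the wire format easy to test; the connection handler would read one line, call this function, write the result, and close.
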
+### Shutdown Sequence
+
+On SIGTERM or SIGINT, the proxy performs a graceful shutdown:
+
+1. **Stop accepting new connections** — Close all TCP listening sockets. No new
+   connections are accepted.
+2. **Close idle keep-alive connections** — Close idle connections right away;
+   responses to in-flight requests carry `Connection: close`.
+3. **Wait for in-flight requests** — Up to `shutdown_timeout_secs` (default: 30)
+   for active requests to complete.
+4. **Force-close remaining connections** — After the timeout, any remaining
+   connections are forcefully closed via TCP RST.
+5. **Cancel background tasks** — ACME renewal tasks, rate limiter eviction task,
+   and admin socket listener are all cancelled.
+6. **Exit with code 0**.
+
+The `shutdown_timeout_secs` is configurable in StaticConfig (default: 30
+seconds). See config.md for details.
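
Review note: steps 3 and 4 of the shutdown sequence reduce to waiting on an in-flight counter with a deadline. The sketch below uses a blocking poll loop and std only to show the logic. An async proxy would await a notification instead, and `DrainOutcome`, `drain_in_flight`, and the 10 ms poll interval are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum DrainOutcome {
    /// All in-flight requests finished before the timeout.
    Drained,
    /// Timeout hit; `remaining` connections would now be force-closed.
    TimedOut { remaining: usize },
}

/// Waits up to `timeout` (the role of `shutdown_timeout_secs`) for the
/// in-flight request counter to reach zero.
pub fn drain_in_flight(in_flight: &AtomicUsize, timeout: Duration) -> DrainOutcome {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = in_flight.load(Ordering::Acquire);
        if remaining == 0 {
            return DrainOutcome::Drained;
        }
        if Instant::now() >= deadline {
            return DrainOutcome::TimedOut { remaining };
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

The caller would force-close connections only on `TimedOut`, then cancel background tasks and exit with code 0 in either case.
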

 ## Deployment

@@ -443,6 +513,51 @@ continues to be routed directly to the Gitea container via Docker port
 publishing (e.g., `203.0.113.10:22:2222`), matching the current deployment
 pattern.

+## Startup Sequence
+
+The proxy starts components in a specific order to ensure fail-fast behavior
+and correct dependency initialization:
+
+1. **Parse and validate config** — Read the TOML config file, deserialize into
+   `StaticConfig` and `DynamicConfig`, and validate all rules. If validation
+   fails, log the errors and exit with a non-zero code. No ports are bound.
+
+2. **Initialize DynamicConfig** — Load sites, rate limits, and body limits into
+   `ArcSwap<DynamicConfig>`.
+
+3. **Initialize shared state** — Create the rate limiter
+   `HashMap<IpAddr, TokenBucket>`, the shared `hyper::Client`, and the
+   `tracing-subscriber` with file and stdout layers.
+
+4. **Bind health check port** (if enabled) — Bind `127.0.0.1:{health_check_port}`.
+   Fail-fast if bind fails.
+
+5. **Bind admin socket** (if enabled) — Remove any stale socket file first, then
+   bind the Unix domain socket. If the socket file exists and another process is
+   listening, log a warning and disable the admin socket, then continue
+   starting (the admin socket is non-critical).
+
+6. **Bind all listener ports** — For each listener: bind HTTP port (if enabled)
+   and HTTPS port. If any bind fails, fail-fast and exit. All ports are bound
+   before proceeding.
+
+7. **Load TLS configuration** — For each listener: load manual certificates or
+   initialize ACME state machine. If manual certificate loading fails, fail-fast
+   and exit. For ACME: if no cached certificate exists and ACME provisioning
+   fails, fail-fast and exit.
+
+8. **Start TCP listeners** — Begin accepting connections on all bound ports.
+
+9. **Start background tasks** — ACME renewal tasks (per listener in ACME mode),
+   rate limiter eviction task, signal handler task, admin socket handler task.
+
+10. **Signal readiness** — Send `sd_notify("READY=1")` to systemd (if running
+    under systemd).
+
+**Failure semantics**: **Fail-fast**. If any step except the non-critical admin
+socket bind (step 5) fails, the process exits with a non-zero code. The proxy
+does not partially start: all ports are bound before any connection is accepted.
+
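Review note: the startup ordering can be sketched as a list of steps run in sequence, where a failing critical step aborts everything after it and a failing non-critical step (the admin socket) only produces a warning. This is a minimal model under stated assumptions: `Step`, `run_startup`, and the message formats are hypothetical, and real steps would bind sockets and load certificates rather than run closures.

```rust
/// Hypothetical startup step: a name, whether failure is fatal, and the action.
pub struct Step<'a> {
    pub name: &'static str,
    pub critical: bool,
    pub run: Box<dyn FnOnce() -> Result<(), String> + 'a>,
}

impl<'a> Step<'a> {
    pub fn new(
        name: &'static str,
        critical: bool,
        run: impl FnOnce() -> Result<(), String> + 'a,
    ) -> Self {
        Self { name, critical, run: Box::new(run) }
    }
}

/// Runs steps in order. Returns collected warnings on success, or the first
/// fatal error; steps after a fatal error are never executed.
pub fn run_startup(steps: Vec<Step<'_>>) -> Result<Vec<String>, String> {
    let mut warnings = Vec::new();
    for step in steps {
        if let Err(reason) = (step.run)() {
            if step.critical {
                // Fail-fast: the caller logs this and exits with a non-zero code.
                return Err(format!("{} failed: {}", step.name, reason));
            }
            // Non-critical (admin socket): warn, disable, keep starting.
            warnings.push(format!("{} disabled: {}", step.name, reason));
        }
    }
    Ok(warnings)
}
```

Placing "start TCP listeners" after every bind and TLS step in the list is what guarantees that no connection is accepted by a partially started proxy.
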
 ## Design Decisions

 All design decisions are documented as ADRs in [decisions/](decisions/).