Resolve all architecture review findings (7 critical, 14 warnings, 6 suggestions)

Critical findings resolved: - C1: Site routing is global (per-listener TOML, global runtime lookup) - C2: X-Forwarded-For replaces (not appends) — edge proxy model (ADR-021) - C3: Hop-by-hop header handling rules specified (proxy.md) - C4: ACME failure behavior defined (tls.md) - C5: Startup sequence with fail-fast semantics (operations.md) - C6: Per-listener Router instances with shared global state (overview.md) - C7: Rate limiter adopts new params on next request, no state clear (operations.md) Warnings resolved: - W1: Admin socket wire protocol specified - W2: Host header port stripped, hostnames only in config - W3: HTTP redirect URL construction with port handling - W4: /health on HTTPS matches regardless of Host header - W5: Static config changes logged as warning during reload - W6: Reload operations serialized via Mutex - W7: http_port validation rules added (9 new rules total) - W8: upstream format validation (host:port required, no scheme) - W9: TLS error handling table (SNI, version, cipher failures) - W10: IPv6 rate limited per /64 prefix - W11: Graceful shutdown sequence specified (6 steps) - W12: Error response bodies: minimal plain text, no version disclosure - W13: upstream_scheme HTTPS uses system CA store - W14: allow_wildcard_bind is OR between config and CLI - W15: ADR-010 Phase 2 list updated (timeouts moved to Phase 1) - W17: LoggingConfig static/restart note added Suggestions applied: - S2: ConnectInfo propagation note - S3: Case-insensitive host matching (RFC 7230) - S5: Response streaming behavior (chunk-by-chunk) - S6: Token bucket nodelay semantics - S7: File watching explicitly out of scope - S8: All paths forwarded without filtering - S9: shutdown_timeout_secs referenced in shutdown description - S11: Consolidated defaults table in config.md
2026-06-11 10:56:40 +00:00
parent bcc58bc7ce
commit ceb59ad9b9
8 changed files with 467 additions and 61 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -52,6 +52,7 @@ certificate via ACME.
 | [018](decisions/018-body-size-limit.md) | Request Body Size Limit | Accepted |
 | [019](decisions/019-multi-config-listeners.md) | Multi-Config Listener Support | Accepted |
 | [020](decisions/020-container-deployment.md) | Container Deployment Model | Accepted |
+| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For Edge Proxy Model | Accepted |

 ## Open Questions

--- a/docs/architecture/config.md
+++ b/docs/architecture/config.md
@@ -100,6 +100,11 @@ Immutable after startup. Changes require a process restart.
 | `format` | `"text"` or `"json"` | Log output format |
 | `log_file_path` | `String` | Path to log file. When set, structured logs are written to this file in addition to stdout/stderr. Strongly recommended for fail2ban integration in container deployments (see ADR-020). Default: not set (file logging disabled) |

+**Note**: The entire `LoggingConfig` (including `log_file_path`) is static and
+requires a process restart to change. Log file path changes require reopening
+file handles, which is complex and low-value for Phase 1. Log rotation (Phase 2)
+will be handled via signal-based or built-in rotation.
+
 **ListenerConfig** (per-listener static config):

 | Field | Type | Description |
@@ -141,10 +146,13 @@ connections immediately.
 | `upstream_connect_timeout_secs` | `u64` | TCP connect timeout in seconds (default: `5`; see ADR-015, ADR-017) |
 | `upstream_request_timeout_secs` | `u64` | Full request timeout in seconds (default: `60`; see ADR-015, ADR-017) |

-Sites are defined per listener in the `[[listeners]]` entries. Each listener
-routes its own sites independently. The `DynamicConfig` collects all sites
-across all listeners for hot-reload via `ArcSwap`. When a config reload
-occurs, all listener site mappings are updated atomically.
+Sites are defined per listener in the `[[listeners]]` entries for organizational
+purposes, but at runtime they are collected into a single global routing table
+in `DynamicConfig`. The proxy looks up the `Host` header in this global table to
+route requests. Hostnames must be unique across all listeners — a `Host` header
+can only match one site definition, regardless of which listener received the
+request. See ADR-019 for the rationale behind the `[[listeners]]` configuration
+format.

 **Why these are dynamic:** See ADR-008 for the rationale. Site definitions
 and rate limits are per-request concerns that should not require restarting
@@ -152,6 +160,29 @@ the proxy or dropping active connections. Rate limits and body limits are
 global settings in Phase 1; per-site configuration for these is deferred to
 Phase 2.

+### Default Values
+
+| Field | Type | Default | Required |
+|-------|------|---------|----------|
+| `allow_wildcard_bind` | `bool` | `false` | No |
+| `health_check_port` | `u16` | `9900` | No |
+| `admin_socket_path` | `String` | `/run/reverse-proxy/admin.sock` | No |
+| `shutdown_timeout_secs` | `u64` | `30` | No |
+| `logging.level` | `String` | `"info"` | No |
+| `logging.format` | `String` | `"text"` | No |
+| `logging.log_file_path` | `String` | (not set) | No |
+| `listeners[].http_port` | `u16` | `80` | No |
+| `listeners[].https_port` | `u16` | `443` | No |
+| `listeners[].tls.acme_directory` | `String` | `"production"` | No |
+| `sites[].upstream_scheme` | `String` | `"http"` | No |
+| `sites[].upstream_connect_timeout_secs` | `u64` | `5` | No |
+| `sites[].upstream_request_timeout_secs` | `u64` | `60` | No |
+| `rate_limit.requests_per_second` | `u32` | — | Yes |
+| `rate_limit.burst` | `u32` | — | Yes |
+| `body.limit_bytes` | `u64` | — | Yes |
+
+Fields without defaults are required and must be specified in the config file.
+
 ## Config Reload

 ### ArcSwap Pattern
@@ -187,6 +218,41 @@ Both mechanisms converge on the same code path:
 3. Validate (check upstream reachability is optional)
 4. Call `ConfigReloadHandle::reload(new_config)`

+### Static Config Changes During Reload
+
+When the config file is reloaded (via SIGHUP or admin socket), the entire file
+is read and validated — both static and dynamic portions. This provides early
+error detection for misconfigurations that would prevent a restart from
+succeeding.
+
+If the full config fails validation, the reload is rejected and the old
+DynamicConfig remains active.
+
+If the full config passes validation but static fields have changed, the
+DynamicConfig is swapped normally and a warning is logged listing the changed
+static fields and noting that a restart is required for those changes to take
+effect. This gives operators early feedback about config drift.
+
+Only the DynamicConfig portion is swapped via ArcSwap. StaticConfig changes
+require a process restart to take effect.
+
+### Reload Serialization
+
+Reload operations are serialized using a `tokio::sync::Mutex` on the reload
+code path. If a reload is in progress (triggered by SIGHUP or admin socket) and
+a second reload is requested, the second request waits for the first to
+complete, then re-reads the config file (getting the latest version) and
+proceeds. This prevents race conditions where two concurrent reloads could apply
+an older config over a newer one.
+
+### Out of Scope: File Watching
+
+Automatic file watching (inotify, fsnotify, etc.) is out of scope for Phase 1.
+Config reload is triggered explicitly by SIGHUP or admin socket command. File
+watching adds complexity (debouncing, handling atomic renames, handling editor
+swap files) that is not justified for a single-instance proxy with infrequent
+config changes.
+
 ## TOML Config Format

 ### Multi-Config (Dedicated-IP Per Domain)
@@ -296,7 +362,7 @@ upstream = "127.0.0.1:8080"
 On startup, the config is validated:

 1. At least one `[[listeners]]` entry must exist
-2. Each listener's `bind_addr` is not `0.0.0.0` unless `allow_wildcard_bind = true` (in config) or `--allow-wildcard-bind` (CLI flag) is set (see ADR-016, ADR-020)
+2. Each listener's `bind_addr` is not `0.0.0.0` unless `allow_wildcard_bind` is enabled. This can be enabled via config (`allow_wildcard_bind = true`) or CLI flag (`--allow-wildcard-bind`). Either source enables it — it is an OR relationship, not AND. The CLI flag does not override the config value; if either is set, wildcard binding is allowed.
 3. Each listener's `bind_addr` and `https_port` combination must be unique
 4. In ACME mode, `acme_domains` must be non-empty
 5. In manual mode, `cert_path` and `key_path` must both be set and the files
@@ -308,6 +374,23 @@ On startup, the config is validated:
   route a request to when the `Host` header matches multiple sites.
 8. `rate_limit.requests_per_second` must be > 0
 9. `body.limit_bytes` must be > 0
+10. Each listener's `bind_addr` and `http_port` combination must be unique
+    (prevents bind-time errors, same as rule 3 for `https_port`)
+11. Within a listener, `http_port` and `https_port` must differ
+12. `https_port` must be 1–65535 (required — TLS needs a port)
+13. `http_port` must be 0 (disabled) or 1–65535
+14. `health_check_port` must not conflict with any listener's `http_port` or
+    `https_port` on the same bind address
+15. Site `host` values must not include a port number (e.g., `git.alk.dev`,
+    not `git.alk.dev:443`)
+16. Site `host` values must be valid hostnames (not IP addresses, not
+    including ports). Hostnames are normalized to lowercase during validation.
+17. `upstream` must be in `host:port` format where `port` is a required integer
+    1–65535. Examples: `gitea:3000`, `127.0.0.1:3000`, `[::1]:3000`. Invalid
+    examples: `gitea` (missing port), `http://gitea:3000` (includes scheme),
+    `10.0.0.5` (missing port). The `upstream_scheme` field handles the protocol.
+18. `upstream_scheme` values are case-sensitive: only `"http"` or `"https"`
+    (lowercase). Default is `"http"`.

 On SIGHUP reload, the same validation applies. If the new config fails
 validation, the reload is rejected and the old config remains active. An error
--- a/docs/architecture/decisions/010-multi-site-phase1.md
+++ b/docs/architecture/decisions/010-multi-site-phase1.md
@@ -48,10 +48,12 @@ Phase 1 scope becomes:
 Phase 2 scope shifts to operational hardening:

 1. Per-site rate limits and body limits
-2. Per-site upstream timeouts
-3. Metrics endpoint (Prometheus-compatible)
-4. Connection limits and timeouts
-5. Log rotation
+2. Metrics endpoint (Prometheus-compatible)
+3. Connection limits and timeouts
+4. Log rotation
+
+Note: "Per-site upstream timeouts" was originally listed here but was moved to
+Phase 1 via ADR-015.

 Phase 3 remains future enhancements.

--- a/docs/architecture/decisions/021-x-forwarded-for-edge-proxy.md
+++ b/docs/architecture/decisions/021-x-forwarded-for-edge-proxy.md
@@ -0,0 +1,74 @@
+# ADR-021: X-Forwarded-For Edge Proxy Model
+
+## Status
+
+Accepted
+
+## Context
+
+The reverse proxy terminates TLS and injects standard proxy headers (Host,
+X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) before forwarding requests to
+upstream services. The question is how to handle the `X-Forwarded-For` header
+when the request already contains one.
+
+Two approaches exist:
+
+1. **Append**: Add the client IP to any existing `X-Forwarded-For` value.
+   This preserves the proxy chain history but trusts that existing values are
+   legitimate. In a chained proxy scenario (e.g., CDN → reverse proxy →
+   backend), appending is correct because the CDN's `X-Forwarded-For` value
+   is trustworthy.
+
+2. **Replace**: Set `X-Forwarded-For` to the client's IP address from
+   `ConnectInfo<SocketAddr>`, discarding any existing value. This is correct
+   when the proxy is the edge proxy directly facing the internet, because
+   client-provided `X-Forwarded-For` headers are untrusted — any client can
+   inject arbitrary values.
+
+This proxy is deployed as the **edge proxy** — it sits directly in front of the
+internet with no trusted proxies upstream. All client connections come directly
+to this proxy.
+
+## Decision
+
+Set `X-Forwarded-For` to the client's IP address from `ConnectInfo<SocketAddr>`,
+**replacing** any existing value rather than appending.
+
+The proxy is an edge proxy. There are no trusted proxies upstream, so existing
+`X-Forwarded-For` headers from clients cannot be trusted. Replacing prevents
+header spoofing attacks where a malicious client injects fake IP addresses to
+confuse upstream services or bypass IP-based access controls.
+
+## Rationale
+
+- The proxy is the edge proxy — it directly faces the internet. No CDN or
+  other trusted proxy sits in front of it.
+- Client-provided `X-Forwarded-For` headers are untrusted. Any client can send
+  `X-Forwarded-For: 1.2.3.4` to spoof their IP.
+- Appending to an untrusted header creates a security vulnerability: upstream
+  services (like Gitea) may use `X-Forwarded-For` for IP-based rate limiting or
+  access control. Spoofed values would bypass these protections.
+- `X-Real-IP` is also set to the client's IP from `ConnectInfo`, providing a
+  trustworthy header for upstream services that need the real client IP.
+- If this proxy is ever placed behind a CDN or other trusted proxy, the header
+  handling model should be revisited. A "trusted proxies" configuration can be
+  added in Phase 2 to support chained proxy scenarios.
+
+## Consequences
+
+**Positive:**
+- Prevents `X-Forwarded-For` spoofing attacks
+- Upstream services receive the real client IP, not a spoofed one
+- Simple, predictable header handling — no trust model to configure
+- Consistent with the proxy's role as the edge proxy
+
+**Negative:**
+- If the proxy is placed behind a CDN or other proxy in the future, the header
+  handling must be updated to support a "trusted proxies" model
+- Does not preserve legitimate proxy chain history (not applicable in our
+  deployment — there are no proxies upstream)
+
+## References
+
+- [proxy.md](../proxy.md)
+- Architecture Review C2 (X-Forwarded-For security model)
--- a/docs/architecture/operations.md
+++ b/docs/architecture/operations.md
@@ -36,6 +36,12 @@ Rate limits are global per-IP in Phase 1 (not per-site). A request from IP
 address X counts against the same bucket regardless of which site it targets.
 Per-site rate limits may be added in Phase 2.

+The token bucket uses **nodelay** semantics matching nginx's `limit_req burst
+nodelay`: when the bucket is empty, the request is immediately rejected with
+429 — requests are not queued. Tokens are added at a rate of
+`requests_per_second` (1 token every 1000ms / requests_per_second), and the
+bucket capacity is the `burst` value.
+
 When a request exceeds the rate limit, the middleware returns `429 Too Many
 Requests` and logs the event with structured fields.

@@ -47,6 +53,37 @@ whose last access timestamp is older than a configurable eviction age
 (default: 300 seconds / 5 minutes). This prevents unbounded memory growth
 while preserving recent entries that may still receive traffic.

+### Config Reload Behavior
+
+When rate limit parameters change (e.g., from 10 req/s burst 20 to 20 req/s
+burst 40), the behavior is:
+
+1. New `DynamicConfig` is swapped in via ArcSwap.
+2. On the next request from an existing IP, the rate limiter reads the current
+   `DynamicConfig` for rate/burst parameters.
+3. The token bucket refills using the new rate, and its capacity is set to the
+   new burst maximum.
+4. If the current token count exceeds the new burst maximum, it is capped to
+   the new burst maximum.
+
+The HashMap is **not** cleared — this avoids creating a rate-limiting gap.
+Existing buckets adopt new parameters on their next request. The eviction task
+continues removing stale entries independently.
+
+### IPv6 Rate Limiting
+
+IPv6 addresses have a vastly larger address space than IPv4. Rate limiting per
+individual IPv6 address (`/128`) is ineffective against attackers who can
+generate many addresses within a `/64` prefix.
+
+- **IPv4**: Rate limited per individual address (`/32`).
+- **IPv6**: Rate limited per `/64` prefix. All addresses in the same `/64` share
+  the same token bucket. This matches RFC 4941 privacy extension boundaries and
+  common anti-abuse practice.
+
+The rate limiter normalizes IPv6 addresses to their `/64` prefix before
+bucket lookup.
+
 ### Fail2ban Integration

 Rate limit events are logged in a structured format that a custom fail2ban
@@ -225,13 +262,46 @@ process does not exit on SIGHUP.

 The admin Unix domain socket provides programmatic config reload with feedback.
 This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
-command protocol.
+rationale.

-### Timeout
+**Protocol:**

-In-flight requests have a configurable shutdown timeout (default: 30 seconds).
-After the timeout, remaining connections are forcefully closed and the process
-exits.
+- **Connection lifecycle**: One command per connection. Client connects, sends
+  one newline-terminated command, receives one newline-terminated JSON
+  response, then the server closes the connection.
+- **Message framing**: Newline-delimited (`\n`). Responses end with `\n`.
+- **Commands**:
+  - `reload` — Re-read config file, validate, and swap DynamicConfig. Returns
+    `{"status": "ok"}` or `{"status": "error", "message": "..."}`.
+  - `status` — Return basic process info. Returns
+    `{"status": "ok", "uptime_secs": 1234, "sites": 2}`.
+- **Error responses**: Unrecognized commands return
+  `{"status": "error", "message": "unknown command: <cmd>"}`. Invalid or empty
+  input returns `{"status": "error", "message": "invalid input"}`.
+- **Concurrency**: Multiple clients can connect simultaneously, but reload
+  operations are serialized (see Config Reload section in config.md).
+- **Socket cleanup**: The proxy removes any existing socket file at startup
+  before binding. If the file exists and another process is listening, a warning
+  is logged and the admin socket is disabled (but the proxy continues starting).
+
+### Shutdown Sequence
+
+On SIGTERM or SIGINT, the proxy performs a graceful shutdown:
+
+1. **Stop accepting new connections** — Close all TCP listening sockets. No new
+   connections are accepted.
+2. **Close idle keep-alive connections** — Send `Connection: close` on any idle
+   connections in the keep-alive pool.
+3. **Wait for in-flight requests** — Up to `shutdown_timeout_secs` (default: 30)
+   for active requests to complete.
+4. **Force-close remaining connections** — After the timeout, any remaining
+   connections are forcefully closed via TCP RST.
+5. **Cancel background tasks** — ACME renewal tasks, rate limiter eviction task,
+   and admin socket listener are all cancelled.
+6. **Exit with code 0**.
+
+The `shutdown_timeout_secs` is configurable in StaticConfig (default: 30
+seconds). See config.md for details.

 ## Deployment

@@ -443,6 +513,51 @@ continues to be routed directly to the Gitea container via Docker port
 publishing (e.g., `203.0.113.10:22:2222`), matching the current deployment
 pattern.

+## Startup Sequence
+
+The proxy starts components in a specific order to ensure fail-fast behavior
+and correct dependency initialization:
+
+1. **Parse and validate config** — Read the TOML config file, deserialize into
+   `StaticConfig` and `DynamicConfig`, and validate all rules. If validation
+   fails, exit with non-zero code and log errors. No ports are bound.
+
+2. **Initialize DynamicConfig** — Load sites, rate limits, and body limits into
+   `ArcSwap<DynamicConfig>`.
+
+3. **Initialize shared state** — Create the rate limiter
+   `HashMap<IpAddr, TokenBucket>`, the shared `hyper::Client`, and the
+   `tracing-subscriber` with file and stdout layers.
+
+4. **Bind health check port** (if enabled) — Bind `127.0.0.1:{health_check_port}`.
+   Fail-fast if bind fails.
+
+5. **Bind admin socket** (if enabled) — Remove any stale socket file first, then
+   bind the Unix domain socket. If the socket file exists and another process is
+   listening, log a warning and fail the admin socket (but continue starting —
+   the admin socket is non-critical).
+
+6. **Bind all listener ports** — For each listener: bind HTTP port (if enabled)
+   and HTTPS port. If any bind fails, fail-fast and exit. All ports are bound
+   before proceeding.
+
+7. **Load TLS configuration** — For each listener: load manual certificates or
+   initialize ACME state machine. If manual certificate loading fails, fail-fast
+   and exit. For ACME: if no cached certificate exists and ACME provisioning
+   fails, fail-fast and exit.
+
+8. **Start TCP listeners** — Begin accepting connections on all bound ports.
+
+9. **Start background tasks** — ACME renewal tasks (per listener in ACME mode),
+   rate limiter eviction task, signal handler task, admin socket handler task.
+
+10. **Signal readiness** — Send `sd_notify("READY=1")` to systemd (if running
+    under systemd).
+
+**Failure semantics**: **Fail-fast**. If any step fails, the process exits with
+a non-zero code. The proxy does not partially start. All ports are bound before
+any connections are accepted.
+
 ## Design Decisions

 All design decisions are documented as ADRs in [decisions/](decisions/).
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -92,8 +92,9 @@ details.
                       │                                      │
  bind_addr:443 ────►  │  │  TLS listener (tokio-rustls)    │   │
  (published)          │  │  ├─ ACME or Manual TLS config    │   │
-                      │  │  └─ axum router                  │   │
-                      │  │     ├─ Host-based routing         │   │
+                       │  │  └─ axum router (per-listener)   │   │
+                       │  │     ├─ /health → 200 OK (any)    │   │
+                       │  │     ├─ Host → global site lookup  │   │
                       │  │     ├─ git.alk.dev → gitea:3000  │   │
                       │  │     └─ Rate limiting, headers     │   │
                       │  └────────────────────────────────┘   │
@@ -104,12 +105,15 @@ details.
                       │                                      │
  bind_addr_N:443 ───► │  │  TLS listener (tokio-rustls)    │   │
                       │  │  ├─ Manual TLS cert             │   │
-                      │  │  └─ axum router                  │   │
+                       │  │  └─ axum router (per-listener)   │   │
+                       │  │     ├─ /health → 200 OK (any)    │   │
+                       │  │     ├─ Host → global site lookup  │   │
                       │  │     ├─ alk.dev → app:8080       │   │
                       │  │     └─ Rate limiting, headers     │   │
                       │  └────────────────────────────────┘   │
                       │                                      │
                       │  /health → 200 OK (port 9900)        │
+                       │  Admin socket (Unix domain)           │
                       └────────────────────────────────────┘
                            │              │
                     ┌──────┘              └──────┐
@@ -121,6 +125,13 @@ details.
                                         └─ admin socket (rw)
 ```

+Each listener has its own `axum::Router` instance with its own middleware stack,
+but all routers share `Arc<ArcSwap<DynamicConfig>>` and
+`Arc<Mutex<HashMap<IpAddr, TokenBucket>>>` via axum State. Site routing is
+global: the `Host` header is matched against a single routing table collected
+from all listeners' site definitions. Hostnames must be unique across all
+listeners — see C1 resolution in the architecture review.
+
 In container deployments (ADR-020), the proxy runs in a minimal container with
 `0.0.0.0` bind address and Docker port publishing. Upstream addresses use Docker
 DNS names for same-host containers (e.g., `gitea:3000`) but also support
@@ -196,6 +207,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
 | [019](decisions/019-multi-config-listeners.md) | Multi-config listeners | `[[listeners]]` supporting both dedicated-IP and shared-IP deployment models |
 | [020](decisions/020-container-deployment.md) | Container deployment model | Defense-in-depth via container isolation; file-primary logging; flexible upstream addressing |
+| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For edge proxy model | Replace, don't append — proxy is the edge, no trusted upstream proxies |

 ## Open Questions

--- a/docs/architecture/proxy.md
+++ b/docs/architecture/proxy.md
@@ -76,26 +76,43 @@ Incoming HTTPS request
 ### 1. Host-Based Routing

 The axum router uses a `Host` extractor to match incoming requests to site
-definitions from `DynamicConfig`. Each site definition maps a hostname to an
-upstream address.
+definitions from `DynamicConfig`. Sites are defined per-listener in the TOML
+configuration for organizational purposes, but at runtime they are collected
+into a single global routing table. The proxy looks up the `Host` header in
+this global table and either proxies to the upstream or returns 404.

-Where `host_based_proxy` reads the `Host` header, looks up the site in
-`DynamicConfig.sites`, and either proxies to the upstream or returns 404.
+Host matching is **case-insensitive** per RFC 7230 §2.7.3. The `Host` header
+is normalized to lowercase before matching. Site `host` values in
+configuration are normalized to lowercase during validation.
+
+The `Host` header port component (e.g., `git.alk.dev:443`) is stripped before
+matching. Site `host` values must not include ports.
+
+The proxy does not filter or restrict paths. All paths and query strings on a
+known host are forwarded to the upstream without modification.
+
+The `/health` path is a special case: it matches regardless of the `Host`
+header and is evaluated before host-based routing. A `GET /health` request on
+any hostname returns `200 OK` with an empty body.

 ### 2. Proxy Header Injection

-Headers are injected before forwarding. The handler reads connection metadata
-from axum's `ConnectInfo` and the original request:
+Headers are injected before forwarding. The proxy is an **edge proxy** — it
+sits directly in front of the internet with no trusted proxies upstream. This
+means the client IP from `ConnectInfo<SocketAddr>` is the real client IP, and
+existing `X-Forwarded-For` headers from the client cannot be trusted.

 | Header | Value Source | Notes |
 |--------|-------------|-------|
-| `Host` | Original request `Host` header | Already present; preserved as-is |
+| `Host` | Original request `Host` header | Preserved as-is |
 | `X-Real-IP` | `ConnectInfo<SocketAddr>` remote IP | Set to client's IP address |
-| `X-Forwarded-For` | Client IP, appended if header exists | Comma-separated list of proxies |
+| `X-Forwarded-For` | `ConnectInfo<SocketAddr>` remote IP | **Replaced**, not appended. The proxy is the edge proxy — there are no trusted proxies upstream, so existing `X-Forwarded-For` values from the client cannot be trusted. |
 | `X-Forwarded-Proto` | Determined by which listener port received the request | `https` for requests on the listener's `https_port`, `http` for requests on the listener's `http_port` |

-The `X-Forwarded-For` handling must append the client IP to any existing value
-(rather than replacing it), to support chained proxies correctly.
+**ConnectInfo propagation**: `ConnectInfo<SocketAddr>` is populated by
+extracting `TcpStream::peer_addr()` before wrapping the connection in
+`TlsStream`. Each listener provides this information to its axum Router via
+`axum::ServiceExt::into_make_service_with_connect_info::<SocketAddr>()`.

 ### 3. Request Forwarding

@@ -106,7 +123,12 @@ The proxy handler constructs a new request to the upstream:
 2. Copy the request method, headers, and body from the original
 3. Inject proxy headers (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
 4. Send the request via a shared hyper Client instance
-5. Stream the response back to the client
+5. Stream the response back to the client (chunk-by-chunk, not buffered)
+
+   If the client disconnects while the upstream is still sending, the upstream
+   connection is closed and the event is logged at `debug` level. If the
+   upstream disconnects mid-stream, the client receives whatever data was
+   already sent and the connection is closed.

 The hyper Client is created once at startup and shared via axum's `State`. It
 must be configured with (see ADR-017 for rationale):
@@ -118,22 +140,77 @@ Per-site timeout overrides are available via `upstream_connect_timeout_secs`
 and `upstream_request_timeout_secs` in `SiteConfig` (see ADR-015). When not
 specified, defaults of 5s connect and 60s request are used.

-### 4. Error Handling
+### 4. Header Handling

-| Upstream Condition | Response | Notes |
-|-------------------|----------|-------|
-| Upstream reachable | Stream response as-is | Headers, status, body all forwarded |
-| Upstream unreachable | 502 Bad Gateway | Logged at `warn` level |
-| Upstream timeout | 504 Gateway Timeout | Logged at `warn` level |
-| Request body too large | 413 Payload Too Large | From `DefaultBodyLimit` middleware |
-| Rate limit exceeded | 429 Too Many Requests | Logged at `info` level |
-| Unknown Host header | 404 Not Found | No matching site definition |
+The proxy must handle request and response headers correctly to avoid security
+issues and protocol violations.

-### 5. HTTP → HTTPS Redirect
+**Headers removed before forwarding (hop-by-hop headers per RFC 2616 §13.5.1):**
+
+- `Connection`
+- `Keep-Alive`
+- `Proxy-Authorization`
+- `Proxy-Authenticate`
+- `TE`
+- `Trailers`
+- `Transfer-Encoding`
+- `Upgrade`
+
+These headers are connection-specific and must not be forwarded to the
+upstream. Removing `Proxy-Authorization` and `Proxy-Authenticate` prevents
+credential leakage.
+
+**Headers added or modified:**
+
+See the Proxy Header Injection section above for the full list of proxy headers
+(X-Real-IP, X-Forwarded-For, X-Forwarded-Proto, Host).
+
+**Headers NOT added in Phase 1:**
+
+- `Via`: Not added. The proxy is an edge proxy and `Via` is primarily for
+  tracking proxy chains. Can be added in Phase 2 if needed.
+
+**Response headers:**
+
+Upstream response headers are forwarded as-is to the client, with the following
+exceptions:
+- Hop-by-hop headers listed above are removed
+- The proxy does not add a `Server` header to responses
+
+### 5. Error Handling
+
+All error responses use plain text bodies with no proxy version or identity
+information. No upstream error details are included. Response format:
+
+- Content-Type: `text/plain; charset=utf-8`
+- Body: Brief status text matching the HTTP status (e.g., `Bad Gateway` for 502)
+
+| Upstream Condition | Response | Body | Notes |
+|-------------------|----------|------|-------|
+| Upstream reachable | Stream response as-is | (upstream body) | Headers, status, body all forwarded |
+| Upstream unreachable | 502 Bad Gateway | `Bad Gateway` | Logged at `warn` level |
+| Upstream timeout | 504 Gateway Timeout | `Gateway Timeout` | Logged at `warn` level |
+| Request body too large | 413 Payload Too Large | `Payload Too Large` | From `DefaultBodyLimit` middleware |
+| Rate limit exceeded | 429 Too Many Requests | `Too Many Requests` | Logged at `info` level |
+| Unknown Host header | 404 Not Found | `Not Found` | No matching site definition |
+| Missing Host header | 400 Bad Request | `Bad Request` | Required for routing |
+
+### 6. HTTP → HTTPS Redirect

 A separate HTTP listener on port 80 (per listener) handles redirect. It reads
 the `Host` header from the incoming request and returns a 301 Permanent Redirect
-to the HTTPS equivalent URL (preserving the path and query string).
+to the HTTPS equivalent URL.
+
+The redirect URL is constructed as:
+`https://{host}:{https_port}/{path}?{query}`
+
+Where:
+- `{host}` is the hostname portion of the `Host` header (port stripped)
+- `{https_port}` is the listener's `https_port`, omitted if it's 443
+- `{path}` and `{query}` are preserved from the original request
+
+If the incoming request has no `Host` header, the proxy returns `400 Bad
+Request`.

 Each listener has its own HTTP redirect on its own bind address.

@@ -148,6 +225,11 @@ For the initial deployment, upstream connections use plain HTTP (e.g.,
 `git.alk.dev` → `127.0.0.1:3000`, `alk.dev` → `127.0.0.1:8080`) since TLS
 between the proxy and backend services on loopback is unnecessary.

+When `upstream_scheme` is `"https"`, the proxy validates the upstream's TLS
+certificate using the system's native TLS root certificates (via `rustls` root
+cert store). Certificate validation failures result in a 502 Bad Gateway
+response. No certificate pinning or custom CA support is provided in Phase 1.
+
 ## Body Size Limit

 axum's `DefaultBodyLimit` layer sets the maximum request body size. The default
@@ -168,6 +250,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
 | [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
 | [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
+| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For edge proxy model | Replace, don't append — proxy is the edge, no trusted upstream proxies |

 ## Open Questions

--- a/docs/architecture/tls.md
+++ b/docs/architecture/tls.md
@@ -243,6 +243,42 @@ suitable (e.g., behind a CDN that terminates TLS). When using HTTP-01, the
 port 80 listener serves `/.well-known/acme-challenge/{token}` paths for
 challenge verification.

+## Certificate Failure Behavior
+
+ACME certificate provisioning and renewal can fail for various reasons (network
+outages, Let's Encrypt unavailability, DNS issues, rate limiting). The proxy's
+behavior depends on the scenario:
+
+| Scenario | Behavior |
+|----------|----------|
+| First start, no cached cert, ACME unreachable | **Fail to start** with clear error message. The proxy cannot serve TLS without a certificate. |
+| First start, no cached cert, ACME succeeds | Normal startup. Certificate is provisioned and cached. |
+| Start with cached cert, ACME unreachable for renewal | **Start normally** with cached cert. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
+| Renewal failure after startup | **Continue serving existing cert**. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
+| Cached cert expired, renewal fails at startup | **Fail to start** if cert is expired at startup. An expired certificate cannot serve valid TLS. |
+| Cached cert expires during runtime | **Continue serving expired cert**. Clients will receive certificate errors. Log at `error` level. This is the correct behavior — silently dropping TLS would be worse. |
+
+The key principle: **never start without a valid TLS certificate**, but **always
+continue serving if a valid cert exists**, even if renewal fails.
+
+## TLS Error Handling
+
+TLS handshake failures are logged and the connection is closed. The proxy does
+not serve a default certificate for unknown hostnames — connections that don't
+match any configured certificate fail.
+
+| Scenario | Behavior |
+|----------|----------|
+| SNI hostname doesn't match any certificate (manual mode) | TLS handshake fails with `unrecognized_name` alert. Log at `warn` level with client IP and SNI hostname. |
+| No SNI extension sent by client | TLS handshake fails with `handshake_failure` alert. Log at `warn` level with client IP. |
+| Unsupported TLS version (1.0/1.1) | TLS handshake fails with `protocol_version` alert. Log at `info` level. |
+| Cipher suite negotiation fails | TLS handshake fails with `handshake_failure` alert. Log at `info` level with client IP. |
+| Certificate expired (manual mode) | Connection fails during TLS handshake. Log at `error` level. Other listeners/connections continue serving. |
+
+In ACME mode, the `ResolvesServerCertAcme` resolver handles certificate
+selection automatically — there is no SNI mismatch scenario because the
+resolver serves the ACME-provisioned certificate for all valid domains.
+
 ## Key Files and Crates

 | Component | Crate | Purpose |