Resolve 5 open questions, add 7 ADRs for previously undocumented decisions

Resolve open questions: - OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012 - OQ-03: Health check on separate local port (default 9900, localhost only) — ADR-013 - OQ-04: Add Unix domain socket admin API for config reload alongside SIGHUP, with structured success/failure responses — ADR-014 - OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s request), overridable in SiteConfig — ADR-015 Document previously undocumented decisions flagged by architecture review: - ADR-016: Explicit bind address requirement (reject 0.0.0.0) - ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling) - ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility) OQ-07 (per-site TLS overrides) remains open for future consideration. Spec updates: - config.md: add health_check_port, admin_socket_path, per-site timeout fields, update TOML example and validation rules - proxy.md: reference ADR-015/017/018 for timeouts, connection defaults, and body limit decisions - tls.md: replace OQ-01 cipher suite section with ADR-012 decision - operations.md: add local health check port section, admin socket reload - overview.md: update Phase 1 scope with new features, add ADR references - open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
2026-06-11 09:07:36 +00:00
parent 7efc142406
commit 9a2352e61c
14 changed files with 613 additions and 89 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -43,6 +43,13 @@ certificate via ACME.
 | [009](decisions/009-signal-handling.md) | Signal Handling Strategy | Accepted |
 | [010](decisions/010-multi-site-phase1.md) | Multi-Site Support in Phase 1 | Accepted |
 | [011](decisions/011-multi-domain-tls.md) | Multi-Domain TLS Configuration | Accepted |
 | [012](decisions/012-cipher-suite-restriction.md) | Restrict Cipher Suites to nginx Scope | Accepted |
 | [013](decisions/013-health-check-port.md) | Health Check on Separate Local Port | Accepted |
 | [014](decisions/014-unix-socket-reload.md) | Unix Domain Socket Config Reload API | Accepted |
 | [015](decisions/015-per-site-timeouts.md) | Per-Site Upstream Timeouts with Defaults | Accepted |
 | [016](decisions/016-explicit-bind-address.md) | Explicit Bind Address Requirement | Accepted |
 | [017](decisions/017-upstream-connection-defaults.md) | Upstream Connection Defaults | Accepted |
 | [018](decisions/018-body-size-limit.md) | Request Body Size Limit | Accepted |
 ## Open Questions
@@ -50,12 +57,12 @@ See [open-questions.md](open-questions.md) for the full tracker.
 | OQ | Question | Priority | Status |
 |----|----------|----------|--------|
-| OQ-01 | Should cipher suites be restricted beyond rustls defaults? | medium | open |
+| ~~OQ-01~~ | ~~Should cipher suites be restricted beyond rustls defaults?~~ | ~~medium~~ | **resolved** (ADR-012) |
 | ~~OQ-02~~ | ~~What log format should fail2ban consume?~~ | ~~high~~ | **resolved** (ADR-007) |
-| OQ-03 | Should the health check endpoint be on a separate port? | low | open |
+| ~~OQ-03~~ | ~~Should the health check endpoint be on a separate port?~~ | ~~low~~ | **resolved** (ADR-013) |
-| OQ-04 | Config reload: SIGHUP only or also Unix socket API? | low | open |
+| ~~OQ-04~~ | ~~Config reload: SIGHUP only or also Unix socket API?~~ | ~~low~~ | **resolved** (ADR-014) |
 | ~~OQ-05~~ | ~~Should the proxy bind to multiple addresses?~~ | ~~low~~ | **resolved** (single bind_addr sufficient) |
-| OQ-06 | Should upstream timeouts be configurable per-site? | low | open |
+| ~~OQ-06~~ | ~~Should upstream timeouts be configurable per-site?~~ | ~~low~~ | **resolved** (ADR-015) |
 | OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | open |
 ## Document Lifecycle
@@ -63,6 +70,6 @@ See [open-questions.md](open-questions.md) for the full tracker.
 | Status | Meaning | Transitions |
 |--------|---------|-------------|
 | `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
-| `reviewed` | Architecture is final. Implementation may begin. | → `stable` when implementation is complete |
+| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
 | `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
 | `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
--- a/docs/architecture/config.md
+++ b/docs/architecture/config.md
@@ -38,7 +38,9 @@ config.toml
 │  bind_addr           │     │  sites[]              │
 │  http_port           │     │  rate_limit           │
 │  https_port          │     │  body_limit           │
-│  tls.mode            │     │  proxy_headers        │
+│  health_check_port   │     │  proxy_headers        │
 │  admin_socket_path   │     │                       │
 │  tls.mode            │     │  ← ArcSwap →          │
 │  tls.acme_domains    │     │                       │
 │  tls.cert_path       │     │  ← ArcSwap →          │
 │  tls.key_path        │     │  ConfigReloadHandle    │
@@ -59,9 +61,11 @@ Immutable after startup. Changes require a process restart.
 | Field | Type | Description |
 |-------|------|-------------|
-| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`) |
+| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`; see ADR-016) |
 | `http_port` | `u16` | Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable) |
 | `https_port` | `u16` | Port for TLS listener (default: `443`) |
 | `health_check_port` | `u16` | Port for local health check endpoint (default: `9900`; set to `0` to disable; see ADR-013) |
 | `admin_socket_path` | `String` | Unix domain socket path for admin API (default: `/run/reverse-proxy/admin.sock`; empty string to disable; see ADR-014) |
 | `tls.mode` | `"acme"` or `"manual"` | Certificate provisioning mode |
 | `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
 | `tls.acme_cache_dir` | `String` | ACME state cache directory |
@@ -95,6 +99,8 @@ connections immediately.
 | `host` | `String` | Hostname to match (e.g., `"git.alk.dev"`) |
 | `upstream` | `String` | Upstream address (e.g., `"127.0.0.1:3000"`) |
 | `upstream_scheme` | `"http"` or `"https"` | Protocol for upstream connection (default: `"http"`) |
 | `upstream_connect_timeout_secs` | `u64` | TCP connect timeout in seconds (default: `5`; see ADR-015, ADR-017) |
 | `upstream_request_timeout_secs` | `u64` | Full request timeout in seconds (default: `60`; see ADR-015, ADR-017) | |
 **Why these are dynamic:** See ADR-008 for the rationale. Site definitions
 and rate limits are per-request concerns that should not require restarting
@@ -120,17 +126,23 @@ behind this split.
 ### Reload Trigger
-The initial implementation uses SIGHUP as the reload trigger. When the process
+Config reload is triggered by two mechanisms:
 receives SIGHUP:
 1. **SIGHUP**: Re-reads the config file, validates, and swaps DynamicConfig if
   valid. Simple and well-understood, but provides no feedback on success or
   failure.
 2. **Admin socket**: The `reload` command via the admin Unix domain socket
   performs the same action as SIGHUP but returns a structured response
   indicating success or failure with an error message. See ADR-014 for
   details.
 Both mechanisms converge on the same code path:
 1. Re-read the config file from disk
 2. Deserialize into `DynamicConfig`
 3. Validate (check upstream reachability is optional)
 4. Call `ConfigReloadHandle::reload(new_config)`
 Future implementations could add a Unix domain socket API or HTTP endpoint for
 config reload, but SIGHUP is sufficient for Phase 1.
 ## TOML Config Format
 ```toml
@@ -140,6 +152,8 @@ config reload, but SIGHUP is sufficient for Phase 1.
 bind_addr = "203.0.113.10"  # Replace with actual bind address
 http_port = 80
 https_port = 443
 health_check_port = 9900     # Local health check (0 to disable)
 admin_socket_path = "/run/reverse-proxy/admin.sock"  # Empty string to disable
 [server.tls]
 mode = "acme"                    # "acme" or "manual"
@@ -167,6 +181,8 @@ limit_bytes = 104857600          # 100 MB
 host = "git.alk.dev"
 upstream = "127.0.0.1:3000"
 upstream_scheme = "http"
 # upstream_connect_timeout_secs = 5    # Default: 5s
 # upstream_request_timeout_secs = 60   # Default: 60s
 [[sites]]
 host = "alk.dev"
@@ -205,13 +221,17 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config split | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains |
 | [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
 | [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
 | [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
 | [016](decisions/016-explicit-bind-address.md) | Explicit bind address required | Rejects `0.0.0.0` to prevent accidental exposure |
 ## Open Questions
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
- **OQ-04**: Should config reload support a Unix domain socket API in addition
+- ~~**OQ-04**: Should config reload support a Unix domain socket API in addition
-  to SIGHUP? (open)
+  to SIGHUP?~~ (resolved — ADR-014: Unix domain socket admin API added)
 - **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
  domains? (open)
--- a/docs/architecture/decisions/012-cipher-suite-restriction.md
+++ b/docs/architecture/decisions/012-cipher-suite-restriction.md
@@ -0,0 +1,83 @@
 # ADR-012: Restrict Cipher Suites to Match nginx Scope
 ## Status
 Accepted
 ## Context
 Our current nginx configuration explicitly restricts cipher suites to four
 ECDHE-AES-GCM suites:
 ```
 ECDHE-ECDSA-AES128-GCM-SHA256
 ECDHE-RSA-AES128-GCM-SHA256
 ECDHE-ECDSA-AES256-GCM-SHA384
 ECDHE-RSA-AES256-GCM-SHA384
 ```
 rustls 0.23 with the `aws_lc_rs` crypto provider defaults to a conservative
 set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4, no CBC-mode
 suites, no RSA key exchange). The rustls defaults include these four suites
 plus TLS 1.3 suites (which nginx also allows via `TLSv1.3`).
 The question was whether to accept the wider rustls defaults or restrict to
 the same scope as our current nginx configuration (see OQ-01).
 ## Decision
 Explicitly restrict cipher suites to match the same scope as our current nginx
 configuration: the four ECDHE-AES-GCM suites for TLS 1.2, plus all TLS 1.3
 suites. This is slightly more restrictive than rustls defaults but preserves
 compatibility with all modern clients.
 The restricted set in rustls terms:
 **TLS 1.2 (explicitly selected):**
 - `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
 - `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
 - `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
 - `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
 **TLS 1.3 (all default suites):**
 - `TLS_AES_128_GCM_SHA256`
 - `TLS_AES_256_GCM_SHA384`
 - `TLS_CHACHA20_POLY1305_SHA256`
 This is configured by building a `CryptoProvider` with a custom
 `cipher_suite` list and passing it to `ServerConfig::builder_with_provider()`.
 ## Rationale
 - Maintains behavioral parity with our current nginx configuration during
  migration — no client that worked with nginx should see different TLS
  behavior
 - The four TLS 1.2 suites are the same ones nginx allows, providing a known
  security baseline
 - TLS 1.3 suites are all modern and secure — restricting them provides no
  meaningful security benefit and reduces compatibility
 - rustls defaults include ChaCha20-Poly1305 suites for TLS 1.2 which nginx does
  not; these are cryptographically sound but represent a wider scope than our
  current nginx config allows
 - Explicit configuration means the cipher list is documented and auditable
 - If compatibility issues arise, expanding the list is straightforward; the
  reverse (restricting after deployment) risks breaking existing clients
 ## Consequences
 **Positive:**
 - Behavioral parity with current nginx TLS configuration
 - Explicitly auditable cipher list
 - No client that currently works will see different TLS behavior
 - Matches security review expectations
 **Negative:**
 - Slightly more restrictive than rustls defaults — excludes ChaCha20-Poly1305
  for TLS 1.2 and AES-CCM suites (rarely used)
 - Must update the cipher list when deprecating TLS 1.2 in the future
 - Custom `CryptoProvider` construction is slightly more code than using defaults
 ## References
 - [tls.md](../tls.md)
 - OQ-01 (now resolved)
--- a/docs/architecture/decisions/013-health-check-port.md
+++ b/docs/architecture/decisions/013-health-check-port.md
@@ -0,0 +1,69 @@
 # ADR-013: Health Check on Separate Local Port
 ## Status
 Accepted
 ## Context
 The health check endpoint (`/health`) needs to be accessible for monitoring
 without requiring TLS. Currently the design places it on the main HTTPS
 listener, which means:
 1. TLS handshake must succeed for the health check to respond
 2. External monitoring tools need to handle TLS
 3. A TLS configuration error would make the health check unreachable, creating
   a false-negative monitoring signal
 Three options were considered (see OQ-03):
 1. **Main HTTPS listener only**: Simplest, but TLS config errors make health
   checks unreachable
 2. **Separate unencrypted port on localhost**: Simple, works with standard
   monitoring tools, but health checks bypass TLS
 3. **Admin port with its own listener**: Most flexible but adds complexity
 ## Decision
 Add a configurable health check port that binds to `127.0.0.1` only (localhost),
 serving `/health` over plain HTTP. This is a separate listener from the main
 HTTP and HTTPS listeners.
 The port is configurable via `health_check_port` in StaticConfig. Setting it
 to `0` (default) disables the separate health check listener, and `/health`
 remains available on the main HTTPS listener as a fallback.
 ## Rationale
 - A local-only health check port is the standard pattern for reverse proxies
  and service meshes (envoy, haproxy, k8s health probes all use this pattern)
 - Health checks should work even when TLS is misconfigured — that's the whole
  point of monitoring
 - Binding to `127.0.0.1` only means the health check is not exposed to the
  internet — only local monitoring tools (systemd, scripts, load balancers on
  the same host) can reach it
 - Configurable port allows different deployment scenarios (some monitoring runs
  on different ports)
 - Disabling via `health_check_port = 0` keeps the main HTTPS `/health` endpoint
  available for cases where a separate port isn't needed
 - When this project is folded into alknet, the health check will use alknet's
  existing patterns, making the separate port unnecessary in that context
 ## Consequences
 **Positive:**
 - Health checks work even when TLS is misconfigured
 - Standard pattern that monitoring tools expect
 - Not exposed to the internet (localhost only)
 - Configurable — can be disabled if not needed
 - systemd can use it for `NotifyAccess` readiness checks
 **Negative:**
 - Additional listener to manage (minimal complexity)
 - Two health check endpoints exist when the separate port is enabled (the
  local one and the HTTPS one) — monitoring should prefer the local one
 ## References
 - [operations.md](../operations.md)
 - OQ-03 (now resolved)
--- a/docs/architecture/decisions/014-unix-socket-reload.md
+++ b/docs/architecture/decisions/014-unix-socket-reload.md
@@ -0,0 +1,76 @@
 # ADR-014: Unix Domain Socket Config Reload API
 ## Status
 Accepted
 ## Context
 The proxy supports config reload via SIGHUP (ADR-009). SIGHUP is simple and
 well-understood, but has limitations:
 1. No feedback — the sender doesn't know if the reload succeeded or failed
 2. No structured input — you can only signal "reload", not specify which parts
   to reload or pass validation context
 3. Requires process signal permissions — not all deployment tools can send
   signals
 A Unix domain socket API would allow programmatic config reload with
 success/failure status, enabling integration with CI/CD pipelines, admin
 tools, and automated configuration management.
 ## Decision
 Add a Unix domain socket API for config reload alongside SIGHUP. The socket
 accepts commands and returns structured responses.
 The socket path is configurable via `admin_socket_path` in StaticConfig
 (default: `/run/reverse-proxy/admin.sock`). Setting it to empty string
 disables the admin socket.
 Initial commands:
 - `reload` — Re-read config file, validate, and swap DynamicConfig. Returns
  `{"status": "ok"}` or `{"status": "error", "message": "..."}`.
 - `status` — Return basic process info (uptime, config load time, site count).
  Returns `{"status": "ok", "uptime_secs": 1234, "sites": 2}`.
 Future commands (not in Phase 1, but the protocol supports extension):
 - `metrics` — Return Prometheus-compatible metrics
 - `shutdown` — Graceful shutdown command
 ## Rationale
 - Providing reload feedback is operationally valuable — CI/CD pipelines can
  verify config changes before proceeding
 - The implementation cost is low — a Unix domain socket listener is ~50 lines
  of tokio code, and the command protocol is simple
 - SIGHUP is retained as a fallback for environments where socket access is
  inconvenient
 - This pattern will integrate naturally with alknet's admin interface when the
  projects merge
 - Unix domain sockets are filesystem-permission-based, providing access control
  without additional authentication
 - The socket path is configurable, allowing deployment-specific paths
 ## Consequences
 **Positive:**
 - Config reload with success/failure feedback
 - Programmatic integration with CI/CD and admin tools
 - Structured response format enables automation
 - SIGHUP still works as fallback
 - Natural path to future admin commands
 **Negative:**
 - Additional listener and command parsing logic (~100-150 lines)
 - Socket file management (cleanup on startup, stale socket detection)
 - One more config option (`admin_socket_path`)
 ## References
 - [operations.md](../operations.md)
 - [config.md](../config.md)
 - ADR-009 (signal handling — SIGHUP retained as fallback)
 - OQ-04 (now resolved)
--- a/docs/architecture/decisions/015-per-site-timeouts.md
+++ b/docs/architecture/decisions/015-per-site-timeouts.md
@@ -0,0 +1,69 @@
 # ADR-015: Per-Site Upstream Timeouts with Defaults
 ## Status
 Accepted
 ## Context
 The proxy forwards requests to upstream services. Connection and request
 timeouts affect reliability — too short and legitimate slow responses fail,
 too long and the proxy is vulnerable to resource exhaustion from stalled
 connections.
 Phase 1 initially specified global timeout defaults (5s connect, 60s request)
 for all upstreams. However, different upstream services have different latency
 profiles:
 - Gitea (git.alk.dev): Git push operations can be slow for large repos; 60s
  may be insufficient for clone/push operations with large pack files
 - Deno/Fresh (alk.dev): Fast responses expected; 60s is generous
 Global timeouts don't accommodate these differences. Per-site timeout
 configuration allows tuning for each upstream without affecting others.
 ## Decision
 Add optional per-site upstream timeout configuration to `SiteConfig`. When not
 specified, sensible defaults are used.
 **SiteConfig additions:**
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
 | `upstream_connect_timeout_secs` | `u64` | `5` | TCP connection timeout in seconds |
 | `upstream_request_timeout_secs` | `u64` | `60` | Full request timeout in seconds |
 These are part of `DynamicConfig` (hot-reloadable via ArcSwap) since they
 affect per-request behavior and should not require a restart to change.
 ## Rationale
 - Different upstreams genuinely have different latency profiles — Gitea pushes
  with large pack files need more time than a fast static site
 - Defaults of 5s connect and 60s request match common reverse proxy conventions
  (nginx defaults: 60s, haproxy defaults: 30s connect, 60s server)
 - Making these per-site rather than global allows tuning without side effects
 - Per-site overrides in DynamicConfig means timeout changes don't require
  restarts
 - The defaults are reasonable for most services; explicit configuration is only
  needed for outliers
 ## Consequences
 **Positive:**
 - Each upstream can be tuned independently
 - Defaults work for most cases — explicit configuration is optional
 - Hot-reloadable (part of DynamicConfig)
 - Consistent with how other reverse proxies handle timeouts
 **Negative:**
 - Two more fields per site in config (mitigated by sensible defaults)
 - Per-site timeout means the proxy must look up per-request config for each
  upstream connection (already required for routing, so no additional overhead)
 ## References
 - [proxy.md](../proxy.md)
 - [config.md](../config.md)
 - OQ-06 (now resolved)
--- a/docs/architecture/decisions/016-explicit-bind-address.md
+++ b/docs/architecture/decisions/016-explicit-bind-address.md
@@ -0,0 +1,53 @@
 # ADR-016: Explicit Bind Address Requirement
 ## Status
 Accepted
 ## Context
 The proxy's `bind_addr` configuration field specifies the IP address to bind
 to. The default for most network services is `0.0.0.0` (bind all interfaces),
 which means the service is accessible on every network interface, including
 public-facing ones.
 For a reverse proxy that terminates TLS and handles security-sensitive traffic,
 binding to all interfaces is risky. The proxy should only listen on the
 intended interface(s) — typically a specific public IP for a single-server
 deployment.
 ## Decision
 The `bind_addr` field must be an explicit IP address. `0.0.0.0` is rejected
 during config validation. The proxy will not start if `bind_addr` is `0.0.0.0`.
 ## Rationale
 - A reverse proxy terminating TLS is a security boundary — it should not be
  accidentally exposed on unintended interfaces
 - Single-server deployments have a specific public IP; binding to it
  deliberately is the correct operational practice
 - `0.0.0.0` is often a default in example configurations and can be deployed
  without the operator realizing the service is accessible on all interfaces
 - Rejecting it at validation time prevents this common mistake
 - If a deployment genuinely needs to bind all interfaces, `0.0.0.0` can be
  overridden with an explicit flag, but this should be a deliberate choice
 - This matches the principle of explicit over implicit for security-sensitive
  configuration
 ## Consequences
 **Positive:**
 - Prevents accidental exposure on unintended network interfaces
 - Forces operators to be deliberate about which interface the proxy serves
 - Config validation catches the mistake before deployment
 **Negative:**
 - Not suitable for deployments that genuinely need to bind all interfaces
  (mitigated by explicit override if needed in the future)
 - Slightly more configuration required (operator must know their public IP)
 ## References
 - [config.md](../config.md)
 - [overview.md](../overview.md)
--- a/docs/architecture/decisions/017-upstream-connection-defaults.md
+++ b/docs/architecture/decisions/017-upstream-connection-defaults.md
@@ -0,0 +1,66 @@
 # ADR-017: Upstream Connection Defaults
 ## Status
 Accepted
 ## Context
 The proxy forwards requests to upstream services using a shared hyper `Client`.
 The client's configuration affects all upstream connections: timeouts, redirect
 handling, and connection pooling. These are significant design choices that
 affect production behavior and should be traceable.
 ## Decision
 Configure the hyper `Client` with the following defaults:
 | Setting | Value | Rationale |
 |---------|-------|-----------|
 | Connect timeout | 5 seconds | Prevents hanging on unreachable upstreams |
 | Request timeout | 60 seconds (default) | Per-site override available (ADR-015) |
 | Redirect following | Disabled | Proxies should not follow redirects — they pass response status to the client |
 | Connection pooling | Enabled (hyper default) | Reuses connections to the same upstream for performance |
 | HTTP version | HTTP/1.1 only | Upstream connections use HTTP/1.1; HTTP/2 proxying is out of scope |
 The connect timeout of 5 seconds applies to establishing a TCP connection to
 the upstream. The request timeout of 60 seconds applies to the full
 request/response cycle and can be overridden per-site (ADR-015).
 ## Rationale
 - **5s connect timeout**: Matches common reverse proxy conventions. Long
  enough for remote upstreams with network latency, short enough to fail fast
  on unreachable services.
 - **60s request timeout**: Generous default that accommodates Gitea push
  operations with large pack files. Per-site overrides allow tuning for faster
  upstreams.
 - **No redirect following**: Standard reverse proxy behavior. If an upstream
  returns a 3xx, the proxy streams it to the client. Following redirects would
  break the client's expectation and could create security issues (redirect
  loops, credential forwarding).
 - **Connection pooling**: Hyper's default connection reuse improves performance
  for repeated requests to the same upstream without additional configuration.
 - **HTTP/1.1 only**: The proxy communicates with upstreams over plain HTTP
  (loopback). HTTP/2 proxying is explicitly out of scope (see overview.md).
  Upstreams needing HTTP/2+ run their own servers (e.g., api.alk.dev).
 ## Consequences
 **Positive:**
 - Explicit, documented defaults that match reverse proxy conventions
 - Per-site timeout overrides available for specific needs
 - No redirect following prevents common proxy pitfalls
 - Connection pooling provides good performance by default
 **Negative:**
 - 60s default may be too long for fast upstreams (mitigated by per-site
  overrides in ADR-015)
 - HTTP/1.1 only for upstream means no HTTP/2 multiplexing to backends (out of
  scope per project design)
 ## References
 - [proxy.md](../proxy.md)
 - ADR-015 (per-site upstream timeouts)
 - ADR-002 (custom proxy handler)
--- a/docs/architecture/decisions/018-body-size-limit.md
+++ b/docs/architecture/decisions/018-body-size-limit.md
@@ -0,0 +1,53 @@
 # ADR-018: Request Body Size Limit
 ## Status
 Accepted
 ## Context
 The proxy enforces a maximum request body size to protect against resource
 exhaustion attacks. The default limit must balance security with usability.
 Gitea push operations can involve large Git pack files. The current nginx
 configuration uses `client_max_body_size 100m`, and Gitea's documentation
 recommends allowing up to 100 MB for push operations.
 ## Decision
 Set the default request body size limit to 100 MB (104,857,600 bytes),
 matching our current nginx configuration. The limit is global in Phase 1
 (configurable via `body.limit_bytes` in DynamicConfig).
 ## Rationale
 - 100 MB matches the current nginx `client_max_body_size 100m`, ensuring
  behavioral parity during migration
 - Gitea push operations with large repositories regularly exceed 50 MB
 - 100 MB is large enough for any legitimate Git operation while still
  providing protection against resource exhaustion (a 100 MB body is not
  enough to exhaust memory on modern servers, but prevents unbounded uploads)
 - The limit is configurable — operators can reduce it for deployments that
  don't need large uploads
 - In Phase 2, per-site limits will allow different limits for different
  upstreams (e.g., a lower limit for alk.dev, the current limit for
  git.alk.dev)
 ## Consequences
 **Positive:**
 - Behavioral parity with current nginx configuration
 - Gitea push operations work without configuration changes
 - Configurable for deployments with different needs
 **Negative:**
 - 100 MB is a generous default — some deployments may want a lower limit
  (mitigated by configurability)
 - Global limit means all sites share the same maximum (mitigated by Phase 2
  per-site limits)
 ## References
 - [proxy.md](../proxy.md)
 - [config.md](../config.md)
 - nginx `client_max_body_size` documentation
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -7,19 +7,15 @@ last_updated: 2026-06-11
 ## TLS
-### OQ-01: Should cipher suites be restricted beyond rustls defaults?
+### ~~OQ-01: Should cipher suites be restricted beyond rustls defaults?~~
 - **Origin**: [tls.md](tls.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: medium
- **Context**: Our current nginx config explicitly restricts cipher suites to
+- **Resolution**: Restrict cipher suites to match the nginx scope: four
-  four ECDHE-AES-GCM suites. rustls 0.23 with `aws_lc_rs` defaults to a
+  ECDHE-AES-GCM suites for TLS 1.2 plus all TLS 1.3 suites. This provides
-  conservative set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4,
+  behavioral parity during migration. See ADR-012.
-  no CBC-mode suites, no RSA key exchange). The defaults include TLS 1.3 suites
+- **Cross-references**: ADR-005, ADR-012
  which nginx also allows. Restricting further would reduce compatibility with
  older clients; not restricting means accepting a wider (but still safe) set
  than the current nginx config.
 - **Cross-references**: ADR-005
 ### ~~OQ-02: What log format should fail2ban consume?~~
@@ -47,31 +43,28 @@ last_updated: 2026-06-11
 ## Logging and Monitoring
-### OQ-03: Should the health check endpoint be on a separate port?
+### ~~OQ-03: Should the health check endpoint be on a separate port?~~
 - **Origin**: [operations.md](operations.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: low
- **Context**: Currently the health check is on the main HTTPS listener at
+- **Resolution**: Add a configurable local health check port (default: 9900)
-  `/health`. Alternatives: (a) separate unencrypted port for health checks
+  bound to `127.0.0.1` only. Health checks work even when TLS is misconfigured.
-  (simpler for load balancers but less secure), (b) admin port with its own
+  The main HTTPS `/health` endpoint remains available as a fallback. See
-  listener (more complex but isolates operational traffic), (c) on the main
+  ADR-013.
-  listener (simplest, proposed approach). For a single-server deployment behind
+- **Cross-references**: ADR-013
  no external load balancer, the main listener is fine.
 - **Cross-references**: None
 ## Configuration
-### OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?
+### ~~OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?~~
 - **Origin**: [config.md](config.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: low
- **Context**: Phase 1 uses SIGHUP for config reload, which is simple and proven.
+- **Resolution**: Yes. Add a Unix domain socket admin API alongside SIGHUP.
-  A Unix domain socket API would allow programmatic reload (e.g., from an admin
+  The socket accepts a `reload` command and returns structured success/failure
-  tool or CI/CD pipeline) and could return success/failure status. This adds
+  responses. SIGHUP is retained as a fallback. See ADR-014.
-  complexity and is not needed for Phase 1.
+- **Cross-references**: ADR-014
 - **Cross-references**: None
 ## Deployment
@@ -84,17 +77,16 @@ last_updated: 2026-06-11
  explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
  this single-server deployment. If needed in the future, `bind_addr` could be
  extended to an array. See config.md for the `bind_addr` field.
- **Cross-references**: None
+- **Cross-references**: ADR-016
 ## Proxy
-### OQ-06: Should upstream timeouts be configurable per-site?
+### ~~OQ-06: Should upstream timeouts be configurable per-site?~~
 - **Origin**: [proxy.md](proxy.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: low
- **Context**: Phase 1 uses global defaults (5s connect timeout, 60s request
+- **Resolution**: Yes. Per-site upstream timeouts with sensible defaults (5s
-  timeout) for all upstream connections. Per-site timeout configuration would
+  connect, 60s request). Optional fields in SiteConfig that override global
-  allow tuning for different upstream services (e.g., a slow database-backed
+  defaults when specified. See ADR-015.
-  API vs. a fast static site). Not needed for Phase 1 with a single upstream.
+- **Cross-references**: ADR-015, ADR-017
 - **Cross-references**: None
--- a/docs/architecture/operations.md
+++ b/docs/architecture/operations.md
@@ -109,23 +109,30 @@ Configurable via `log_level` in StaticConfig.
 ## Health Check
-### Endpoint
+### Local Health Check Port
 The primary health check endpoint is served on a separate local port (default:
 9900), bound to `127.0.0.1` only. This ensures health checks work even when TLS
 is misconfigured. See ADR-013 for the rationale.
 ```
-GET /health → 200 OK (empty body)
+GET http://127.0.0.1:9900/health → 200 OK (empty body)
 ```
-The health check endpoint is accessible on the main HTTPS listener. It returns
+The port is configurable via `health_check_port` in StaticConfig. Setting it
-200 if the process is alive and serving requests.
+to `0` disables the separate health check listener.
-**Limitation**: Since `/health` is served over TLS, it cannot detect TLS
+### HTTPS Health Check (Fallback)
-configuration errors that prevent the TLS handshake from completing. External
+
-monitoring should also check TCP connectivity to port 443 independently.
+When the local health check port is enabled, `/health` is also available on the
 main HTTPS listener for cases where TLS-level health verification is desired.
 External monitoring should prefer the local health check for liveness checks
 and can use the HTTPS endpoint for TLS verification.
 ### What It Checks
 - Process is running and the tokio runtime is responsive
- TLS listener is accepting connections
+- TLS listener is accepting connections (HTTPS endpoint only)
 - Config is loaded (StaticConfig and DynamicConfig are initialized)
 It does **not** check upstream reachability. The health check answers "is the
@@ -180,13 +187,22 @@ The proxy handles three signals via `signal-hook` (see [ADR-009](decisions/009-s
 - **SIGTERM / SIGINT**: Graceful shutdown. Stop accepting new connections, wait
  for in-flight requests to complete (up to a configurable timeout), then exit.
 - **SIGHUP**: Config reload. Re-read the config file, validate, and swap
-  DynamicConfig if valid.
+  DynamicConfig if valid. No feedback on success or failure.
 - **Admin socket reload**: Send `reload` command via the Unix domain socket
  (default: `/run/reverse-proxy/admin.sock`). Returns structured response
  indicating success or failure. See ADR-014 for details.
 ### SIGHUP for Config Reload
 SIGHUP triggers config reload (see [config.md](config.md) for details). The
 process does not exit on SIGHUP.
 ### Admin Socket for Config Reload
 The admin Unix domain socket provides programmatic config reload with feedback.
 This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
 command protocol.
 ### Timeout
 In-flight requests have a configurable shutdown timeout (default: 30 seconds).
@@ -242,10 +258,13 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [006](decisions/006-rate-limiting-approach.md) | Token bucket rate limiting | In-memory per-IP token bucket matching nginx burst semantics |
 | [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
 | [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
 | [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
 | [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
 ## Open Questions
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
+- ~~**OQ-03**: Should the health check endpoint be on a separate port?~~ (resolved
  — ADR-013: separate local port, default 9900, localhost only)
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -40,22 +40,24 @@ details.
 - **Phase 1**: Multi-site reverse proxy with TLS termination
  - TLS termination with ACME (Let's Encrypt) multi-domain certificate management
  - Manual certificate paths as fallback mode
  - Cipher suite restriction matching nginx scope (ECDHE-AES-GCM + TLS 1.3)
  - HTTP → HTTPS redirect
  - Host-based routing to multiple upstream services
  - Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
  - Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
  - Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
  - Per-site upstream timeouts with sensible defaults (5s connect, 60s request)
  - Request rate limiting with fail2ban-compatible logging (global per-IP)
  - 100 MB body size limit (global)
-  - Configurable bind address (no `0.0.0.0` default)
+  - Configurable bind address (must be explicit, no `0.0.0.0`)
-  - Health check endpoint
+  - Local health check endpoint on separate port (default: 9900, localhost only)
  - Unix domain socket admin API for config reload with feedback
  - Graceful shutdown (SIGTERM handling)
  - Systemd unit file
  - Dual licensing: MIT OR Apache-2.0
 - **Phase 2**: Operational hardening
  - Per-site rate limits and body limits
  - Per-site upstream timeouts
  - Metrics endpoint (Prometheus-compatible)
  - Connection limits and timeouts
  - Log rotation
@@ -63,7 +65,6 @@ details.
 - **Phase 3**: Future enhancements
  - Wildcard subdomain support
  - Per-site TLS overrides (manual certs for specific domains)
  - Unix domain socket config reload API
 ### Out of Scope
@@ -168,12 +169,19 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
 | [012](decisions/012-cipher-suite-restriction.md) | Restrict cipher suites | Match nginx scope: ECDHE-AES-GCM for TLS 1.2, all TLS 1.3 |
 | [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
 | [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
 | [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
 | [016](decisions/016-explicit-bind-address.md) | Explicit bind address required | Rejects `0.0.0.0` to prevent accidental exposure |
 | [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
 | [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
 ## Open Questions
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
+- ~~**OQ-01**: Should cipher suites be restricted beyond rustls defaults?~~ (resolved — ADR-012)
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
+- ~~**OQ-03**: Should the health check endpoint be on a separate port?~~ (resolved — ADR-013)
 - **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
--- a/docs/architecture/proxy.md
+++ b/docs/architecture/proxy.md
@@ -109,12 +109,15 @@ The proxy handler constructs a new request to the upstream:
 5. Stream the response back to the client
 The hyper Client is created once at startup and shared via axum's `State`. It
-must be configured with:
+must be configured with (see ADR-017 for rationale):
 - Connection pooling (hyper default behavior)
- Connect timeout: 5 seconds
+- HTTP/1.1 only for upstream connections (HTTP/2 proxying is out of scope)
 - Request timeout: 60 seconds
 - No redirect following (proxies should not follow redirects)
 Per-site timeout overrides are available via `upstream_connect_timeout_secs`
 and `upstream_request_timeout_secs` in `SiteConfig` (see ADR-015). When not
 specified, defaults of 5s connect and 60s request are used.
 ### 4. Error Handling
 | Upstream Condition | Response | Notes |
@@ -147,10 +150,11 @@ between the proxy and backend services on loopback is unnecessary.
 ## Body Size Limit
-axum's `DefaultBodyLimit` layer sets the maximum request body size. For
+axum's `DefaultBodyLimit` layer sets the maximum request body size. The default
-compatibility with Gitea's push operations (large pack files), this defaults
+of 100 MB (104,857,600 bytes) matches our current nginx configuration and
-to 100 MB. In Phase 1, the body limit is a global setting; Phase 2 may add
+accommodates Gitea's push operations with large pack files (see ADR-018). In
-per-site body limits.
+Phase 1, the body limit is a global setting; Phase 2 may add per-site body
 limits.
 ## Design Decisions
@@ -161,11 +165,14 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | One upstream per domain — simpler than a general proxy library |
 | [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 | [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
 | [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
 | [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
 ## Open Questions
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
- **OQ-06**: Should upstream timeouts be configurable per-site? (open — Phase 1
+- ~~**OQ-06**: Should upstream timeouts be configurable per-site?~~ (resolved —
-  uses global defaults of 5s connect, 60s request)
+  ADR-015: per-site timeout overrides with defaults)
--- a/docs/architecture/tls.md
+++ b/docs/architecture/tls.md
@@ -115,25 +115,26 @@ regression if defaults change in future rustls releases.
 ### Cipher Suites
-rustls 0.23 with the `aws_lc_rs` crypto provider defaults to a conservative
+Cipher suites are explicitly restricted to match the scope of our current nginx
-cipher suite selection that excludes all weak ciphers (no SHA-1, no 3DES, no
+configuration. See ADR-012 for the full rationale.
 RC4, no CBC-mode suites, no RSA key exchange).
-The current nginx config explicitly restricts to:
+**TLS 1.2 (explicitly selected):**
-```
+- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
-ECDHE-ECDSA-AES128-GCM-SHA256
+- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
-ECDHE-RSA-AES128-GCM-SHA256
+- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
-ECDHE-ECDSA-AES256-GCM-SHA384
+- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
 ECDHE-RSA-AES256-GCM-SHA384
 ```
-rustls's defaults include these plus TLS 1.3 suites (which nginx's config
+**TLS 1.3 (all default suites):**
 also allows via `TLSv1.3`). The default rustls cipher list is a strict subset
 of what browsers accept.
-See [open-questions.md](open-questions.md) OQ-01 for whether to further
+- `TLS_AES_128_GCM_SHA256`
-restrict cipher suites beyond rustls defaults.
+- `TLS_AES_256_GCM_SHA384`
 - `TLS_CHACHA20_POLY1305_SHA256`
 This is configured by building a `CryptoProvider` with a custom `cipher_suite`
 list and passing it to `ServerConfig::builder_with_provider()`. The cipher
 list matches our current nginx configuration's scope, providing behavioral
 parity during migration.
 ### ServerConfig Construction
@@ -223,12 +224,13 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
 | [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
 | [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
 | [012](decisions/012-cipher-suite-restriction.md) | Restrict cipher suites | Match nginx scope: four ECDHE-AES-GCM suites for TLS 1.2, all TLS 1.3 suites |
 ## Open Questions
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
+- ~~**OQ-01**: Should cipher suites be restricted beyond rustls defaults?~~ (resolved — ADR-012: restrict to nginx scope)
 - **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
  domains? (open)