Resolve 5 open questions, add 7 ADRs for previously undocumented decisions
Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s request), overridable in SiteConfig — ADR-015

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults, and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
@@ -109,23 +109,30 @@ Configurable via `log_level` in StaticConfig.

 ## Health Check

-### Endpoint
+### Local Health Check Port

+The primary health check endpoint is served on a separate local port (default:
+9900), bound to `127.0.0.1` only. This ensures health checks work even when TLS
+is misconfigured. See ADR-013 for the rationale.
+
 ```
-GET /health → 200 OK (empty body)
+GET http://127.0.0.1:9900/health → 200 OK (empty body)
 ```

-The health check endpoint is accessible on the main HTTPS listener. It returns
-200 if the process is alive and serving requests.
+The port is configurable via `health_check_port` in StaticConfig. Setting it
+to `0` disables the separate health check listener.

-**Limitation**: Since `/health` is served over TLS, it cannot detect TLS
-configuration errors that prevent the TLS handshake from completing. External
-monitoring should also check TCP connectivity to port 443 independently.
+### HTTPS Health Check (Fallback)
+
+When the local health check port is enabled, `/health` is also available on the
+main HTTPS listener for cases where TLS-level health verification is desired.
+External monitoring should prefer the local health check for liveness checks
+and can use the HTTPS endpoint for TLS verification.

 ### What It Checks

 - Process is running and the tokio runtime is responsive
-- TLS listener is accepting connections
+- TLS listener is accepting connections (HTTPS endpoint only)
 - Config is loaded (StaticConfig and DynamicConfig are initialized)

 It does **not** check upstream reachability. The health check answers "is the
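The local health check rules in the hunk above (loopback-only bind, port `0` disables the listener, `GET /health` returns an empty 200) could be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the project's code: the function names `health_check_addr` and `health_response` are hypothetical, and the 404 reply for other paths is an assumption that the spec does not state.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Maps the `health_check_port` setting to a bind address.
/// Port 0 means the separate listener is disabled, so no address is returned.
/// The listener is always bound to 127.0.0.1, never to a public interface.
pub fn health_check_addr(health_check_port: u16) -> Option<SocketAddr> {
    if health_check_port == 0 {
        None
    } else {
        Some(SocketAddr::from((Ipv4Addr::LOCALHOST, health_check_port)))
    }
}

/// Builds the raw HTTP response for one request line.
/// `GET /health` yields 200 with an empty body, as described in the spec.
/// ASSUMPTION: any other request gets a 404; the spec does not say.
pub fn health_response(request_line: &str) -> &'static str {
    let mut parts = request_line.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("GET"), Some("/health")) => "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n",
        _ => "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n",
    }
}
```

Under these assumptions, `health_check_addr(9900)` gives the default address `127.0.0.1:9900`, and a caller would skip starting the listener entirely when it returns `None`.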
@@ -180,13 +187,22 @@ The proxy handles three signals via `signal-hook` (see [ADR-009](decisions/009-s
 - **SIGTERM / SIGINT**: Graceful shutdown. Stop accepting new connections, wait
 for in-flight requests to complete (up to a configurable timeout), then exit.
 - **SIGHUP**: Config reload. Re-read the config file, validate, and swap
-DynamicConfig if valid.
+DynamicConfig if valid. No feedback on success or failure.
+- **Admin socket reload**: Send `reload` command via the Unix domain socket
+(default: `/run/reverse-proxy/admin.sock`). Returns structured response
+indicating success or failure. See ADR-014 for details.

 ### SIGHUP for Config Reload

 SIGHUP triggers config reload (see [config.md](config.md) for details). The
 process does not exit on SIGHUP.

+### Admin Socket for Config Reload
+
+The admin Unix domain socket provides programmatic config reload with feedback.
+This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
+command protocol.
+
 ### Timeout

 In-flight requests have a configurable shutdown timeout (default: 30 seconds).
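For the admin socket reload added in the hunk above, a client might look roughly like the sketch below. The actual command protocol is defined in ADR-014, which is not shown here, so the wire format is an assumption: a newline-terminated `reload` command answered by a single response line. The function names `send_reload` and `reload_via_socket` are hypothetical.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Sends the `reload` command over an already-connected stream and returns
/// the first response line with its trailing newline removed.
/// ASSUMPTION: newline-terminated command and single-line reply; the real
/// framing is specified in ADR-014.
pub fn send_reload<S: Read + Write>(mut stream: S) -> io::Result<String> {
    stream.write_all(b"reload\n")?;
    stream.flush()?;

    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        // The server closed the connection without answering.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "admin socket closed without a response",
        ));
    }
    Ok(line.trim_end().to_string())
}

/// Connects to the admin socket at `socket_path` and requests a reload.
pub fn reload_via_socket(socket_path: &Path) -> io::Result<String> {
    send_reload(UnixStream::connect(socket_path)?)
}
```

A deployment script could call `reload_via_socket(Path::new("/run/reverse-proxy/admin.sock"))` and fail the pipeline on an error reply, which is the feedback that SIGHUP alone does not provide.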
@@ -242,10 +258,13 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [006](decisions/006-rate-limiting-approach.md) | Token bucket rate limiting | In-memory per-IP token bucket matching nginx burst semantics |
 | [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
 | [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
+| [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
+| [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |

 ## Open Questions

 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:

-- **OQ-03**: Should the health check endpoint be on a separate port? (open)
+- ~~**OQ-03**: Should the health check endpoint be on a separate port?~~ (resolved
+— ADR-013: separate local port, default 9900, localhost only)