Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM suites for TLS 1.2 + all TLS 1.3 suites) (ADR-012)
- OQ-03: Health check on separate local port (default 9900, localhost only) (ADR-013)
- OQ-04: Add Unix domain socket admin API for config reload alongside SIGHUP, with structured success/failure responses (ADR-014)
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s request), overridable in SiteConfig (ADR-015)
Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)
OQ-07 (per-site TLS overrides) remains open for future consideration.
Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults, and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
ADR-013: Health Check on Separate Local Port
Status
Accepted
Context
The health check endpoint (/health) needs to be accessible for monitoring
without requiring TLS. Currently the design places it on the main HTTPS
listener, which means:
- TLS handshake must succeed for the health check to respond
- External monitoring tools need to handle TLS
- A TLS configuration error would make the health check unreachable, so monitoring would report the proxy as down even when the process itself is running
Three options were considered (see OQ-03):
- Main HTTPS listener only: Simplest, but TLS config errors make health checks unreachable
- Separate unencrypted port on localhost: Simple, works with standard monitoring tools, but health checks bypass TLS
- Admin port with its own listener: Most flexible but adds complexity
Decision
Add a configurable health check port that binds to 127.0.0.1 only (localhost),
serving /health over plain HTTP. This is a separate listener from the main
HTTP and HTTPS listeners.
The port is configurable via health_check_port in StaticConfig. Setting it
to 0 (default) disables the separate health check listener, and /health
remains available on the main HTTPS listener as a fallback.
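The decision above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation: the names `start_health_listener` and `HealthHandler`, the `ok` response body, and the port range check are assumptions made for the example. It shows the two behaviours the ADR specifies: a value of 0 disables the separate listener, and any other port binds to 127.0.0.1 only and serves /health over plain HTTP.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

LOOPBACK = "127.0.0.1"  # the ADR binds the health listener to localhost only


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health over plain HTTP; every other path gets 404."""

    def do_GET(self) -> None:
        if self.path == "/health":
            body = b"ok\n"  # assumed body; the ADR does not define the payload
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # suppress per-request logging to keep the sketch quiet


def start_health_listener(health_check_port: int) -> Optional[ThreadingHTTPServer]:
    """Return a bound server, or None when the separate listener is disabled."""
    if not 0 <= health_check_port <= 65535:
        raise ValueError(f"health_check_port out of range: {health_check_port}")
    if health_check_port == 0:
        return None  # disabled: /health stays on the main HTTPS listener only
    # Binding to the loopback address keeps the endpoint off external interfaces.
    return ThreadingHTTPServer((LOOPBACK, health_check_port), HealthHandler)
```

A caller would run the returned server in the background, for example `server = start_health_listener(9900)` followed by `server.serve_forever()` on its own thread, and skip that step when the function returns `None`.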
Rationale
- A local-only health check port is the standard pattern for reverse proxies and service meshes (Envoy, HAProxy, and Kubernetes health probes all use this pattern)
- Health checks should work even when TLS is misconfigured — that's the whole point of monitoring
- Binding to 127.0.0.1 only means the health check is not exposed to the internet; only local monitoring tools (systemd, scripts, load balancers on the same host) can reach it
- Configurable port allows different deployment scenarios (some monitoring runs on different ports)
- Disabling via health_check_port = 0 keeps the main HTTPS /health endpoint available for cases where a separate port isn't needed
- When this project is folded into alknet, the health check will use alknet's existing patterns, making the separate port unnecessary in that context
Consequences
Positive:
- Health checks work even when TLS is misconfigured
- Standard pattern that monitoring tools expect
- Not exposed to the internet (localhost only)
- Configurable — can be disabled if not needed
- systemd can use it for NotifyAccess readiness checks
Negative:
- Additional listener to manage (minimal complexity)
- Two health check endpoints exist when the separate port is enabled (the local one and the HTTPS one) — monitoring should prefer the local one
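One way a monitoring script might prefer the local endpoint is sketched below in Python. Everything here is illustrative: `health_urls` and `is_healthy` are hypothetical helpers, the host name in the usage note is a placeholder, and treating HTTP 200 as "healthy" and falling back to HTTPS are assumptions the ADR does not specify.

```python
import urllib.error
import urllib.request
from typing import List


def health_urls(health_check_port: int, https_host: str) -> List[str]:
    """Order the endpoints a monitor should try, local plain HTTP first."""
    urls = []
    if health_check_port != 0:  # 0 means the separate listener is disabled
        urls.append(f"http://127.0.0.1:{health_check_port}/health")
    # /health is always served on the main HTTPS listener as well.
    urls.append(f"https://{https_host}/health")
    return urls


def is_healthy(urls: List[str], timeout: float = 2.0) -> bool:
    """Return True as soon as any endpoint answers with HTTP 200 (assumed)."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            continue  # unreachable, TLS failure, or error status: try the next one
    return False
```

For example, `is_healthy(health_urls(9900, "proxy.example.com"))` tries the local port first and only reaches for the HTTPS endpoint if the local one does not answer.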
References
- operations.md
- OQ-03 (now resolved)