Resolve all open questions, remove /health from main listener (ADR-022)

Resolve OQ-08 through OQ-12 after reviewing implementation findings:

- OQ-08: Remove /health route from the main HTTPS listener entirely. Health checking belongs on port 9900 and admin socket only, not on the public-facing proxy. This eliminates upstream collision problems and special-case routing logic. (ADR-022)
- OQ-09: Not an architectural unknown: ADR-015 already decided on a separate connect timeout. The implementation gap is a known issue.
- OQ-10: Not an open question: acme_contact is already specified as required in config.md. The empty contact list is bug C2.
- OQ-11: Hardcoded is_https=true is correct for a TLS-terminating proxy. HTTP listener redirects, doesn't proxy. Just needs a comment.
- OQ-12: Access logging is already specified as mandatory/always-on in operations.md. Missing log_request! calls are bug W13.

Updated docs: proxy.md, operations.md, overview.md, config.md, open-questions.md, README.md, ADR-013. Created ADR-022.
2026-06-12 03:39:52 +00:00
parent 68d27c4789
commit fe1ae6c05e
8 changed files with 204 additions and 149 deletions
--- a/docs/architecture/decisions/013-health-check-port.md
+++ b/docs/architecture/decisions/013-health-check-port.md
@@ -7,21 +7,23 @@ Accepted
 ## Context

 The health check endpoint (`/health`) needs to be accessible for monitoring
-without requiring TLS. Currently the design places it on the main HTTPS
-listener, which means:
+without requiring TLS. Serving it on the main HTTPS listener would mean:

 1. TLS handshake must succeed for the health check to respond
 2. External monitoring tools need to handle TLS
 3. A TLS configuration error would make the health check unreachable, creating
   a false-negative monitoring signal
+4. It would collide with upstream applications that use `/health` for their
+   own health checks (see ADR-022)

 Three options were considered (see OQ-03):

-1. **Main HTTPS listener only**: Simplest, but TLS config errors make health
-   checks unreachable
-2. **Separate unencrypted port on localhost**: Simple, works with standard
-   monitoring tools, but health checks bypass TLS
+1. **Separate unencrypted port on localhost (chosen)**: Simple, works with
+   standard monitoring tools, health checks work even when TLS is misconfigured
+2. **Main HTTPS listener only**: Would require TLS for health checks, creating
+   a circular dependency — TLS config errors would make health checks unreachable
 3. **Admin port with its own listener**: Most flexible but adds complexity
+   beyond what's needed for a simple health check

 ## Decision

@@ -31,8 +33,8 @@ HTTP and HTTPS listeners.

 The port is configurable via `health_check_port` in StaticConfig. The default
 value is `9900` (enabled, localhost only). Setting it to `0` disables the
-separate health check listener, and `/health` remains available on the main
-HTTPS listener as a fallback.
+health check listener entirely — there is no `/health` route on the main HTTPS
+listener (see ADR-022).

 ## Rationale

@@ -45,8 +47,9 @@ HTTPS listener as a fallback.
  the same host) can reach it
 - Configurable port allows different deployment scenarios (some monitoring runs
  on different ports)
- Disabling via `health_check_port = 0` keeps the main HTTPS `/health` endpoint
-  available for cases where a separate port isn't needed
+- Disabling via `health_check_port = 0` removes the health check entirely —
+  the admin socket's `status` command remains available as an alternative
+  health/status mechanism
 - When this project is folded into alknet, the health check will use alknet's
  existing patterns, making the separate port unnecessary in that context

@@ -61,10 +64,9 @@ HTTPS listener as a fallback.

 **Negative:**
 - Additional listener to manage (minimal complexity)
- Two health check endpoints exist when the separate port is enabled (the
-  local one and the HTTPS one) — monitoring should prefer the local one

 ## References

 - [operations.md](../operations.md)
+- [ADR-022](022-health-check-scope.md) — Health check scope (no `/health` on main listener)
 - OQ-03 (now resolved)
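
The revised ADR-013 semantics for `health_check_port` can be illustrated with a minimal Rust sketch. It is not taken from the project's implementation: the helper names `health_bind_addr` and `health_status` are hypothetical, and the 404 for paths other than `/health` is an assumption, since the ADR only specifies `GET /health → 200 OK`.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Hypothetical helper: maps the `health_check_port` setting to a bind address.
/// A value of 0 means the health check listener is disabled entirely, so no
/// address is returned and no fallback route exists on the main listener.
pub fn health_bind_addr(health_check_port: u16) -> Option<SocketAddr> {
    if health_check_port == 0 {
        None
    } else {
        // Bound to localhost only, as ADR-013 specifies.
        Some(SocketAddr::from((Ipv4Addr::LOCALHOST, health_check_port)))
    }
}

/// Hypothetical handler for the local health listener.
/// Only `GET /health` answers 200; the 404 for anything else is an assumption.
pub fn health_status(method: &str, path: &str) -> u16 {
    if method == "GET" && path == "/health" {
        200
    } else {
        404
    }
}
```

With the default port, `health_bind_addr(9900)` yields `127.0.0.1:9900`; with `0` it yields `None`, which leaves the admin socket's `status` command as the remaining status mechanism.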
--- a/docs/architecture/decisions/022-health-check-scope.md
+++ b/docs/architecture/decisions/022-health-check-scope.md
@@ -0,0 +1,56 @@
+# ADR-022: Health Check Scope — Local Port and Admin Socket Only
+
+## Status
+
+Accepted
+
+## Context
+
+The implementation served a `GET /health` route on the main HTTPS listener that
+returned 200 OK regardless of the Host header. This route was evaluated before
+host-based routing, meaning any upstream application using `/health` for its own
+health checks would have those requests silently intercepted by the proxy and
+never reach the upstream (implementation review finding W5).
+
+The architecture already specified a separate local health check port (9900,
+bound to 127.0.0.1 only) via ADR-013. The question was whether to keep the
+main-listener `/health` route alongside the dedicated port (and possibly make
+the path configurable), or to remove it entirely.
+
+## Decision
+
+The main HTTPS listener does **not** serve a `/health` route. Health checking is
+handled exclusively by:
+
+1. **Local health check port** (default: 9900, bound to `127.0.0.1`) — serves
+   `GET /health → 200 OK`. This is the primary health check mechanism for
+   container orchestration, load balancers, and monitoring systems.
+2. **Admin socket** (`status` command) — returns process information including
+   uptime and site count.
+
+The `/health` route is removed from the main listener entirely. No configurable
+path is needed because the route simply does not exist on the public listener.
+
+## Consequences
+
+**Positive:**
+- No collision with upstream applications that use `/health` for their own
+  health checks
+- The main listener's routing logic is simpler — all requests go through
+  host-based routing, no special cases
+- Clear separation of concerns: the main listener proxies, the local port
+  answers health checks
+- No configurable path needed — the problem disappears entirely
+
+**Negative:**
+- External monitoring that needs to verify TLS is working must connect to the
+  HTTPS port directly and check for a successful TLS handshake or a 404
+  response, rather than getting a 200 from `/health`. This is a minor
+  inconvenience — any successful TLS response (even 404) confirms the proxy is
+  serving TLS correctly.
+
+## References
+
+- ADR-013: Health check on separate local port
+- OQ-08: Resolved by this ADR
+- Implementation review finding W5 (hardcoded `/health` path)
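
The routing consequence of ADR-022 can be sketched as follows. This is an illustrative model under stated assumptions, not the project's code: the `Routed` enum, the `route` function, and the host-to-upstream `HashMap` are hypothetical, and host matching is shown as an exact string lookup for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical routing outcome on the main HTTPS listener.
#[derive(Debug, PartialEq)]
pub enum Routed<'a> {
    /// Forward the request to this upstream address.
    Upstream(&'a str),
    /// No site is configured for the requested host.
    NotFound,
}

/// Hypothetical router: every request goes through host-based routing.
/// The path is deliberately ignored here, so `/health` is not special-cased
/// and reaches the upstream like any other path.
pub fn route<'a>(sites: &'a HashMap<String, String>, host: &str, _path: &str) -> Routed<'a> {
    match sites.get(host) {
        Some(upstream) => Routed::Upstream(upstream.as_str()),
        None => Routed::NotFound,
    }
}
```

In this model a request for `/health` on a configured host is forwarded to that site's upstream, which is the collision that finding W5 describes being resolved. A request for an unconfigured host gets `NotFound`, which corresponds to the 404 that external TLS monitoring would observe.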