Resolve all open questions, remove /health from main listener (ADR-022)

Resolve OQ-08 through OQ-12 after reviewing implementation findings: - OQ-08: Remove /health route from the main HTTPS listener entirely. Health checking belongs on port 9900 and admin socket only, not on the public-facing proxy. This eliminates upstream collision problems and special-case routing logic. (ADR-022) - OQ-09: Not an architectural unknown — ADR-015 already decided on a separate connect timeout. The implementation gap is a known issue. - OQ-10: Not an open question — acme_contact is already specified as required in config.md. The empty contact list is bug C2. - OQ-11: Hardcoded is_https=true is correct for a TLS-terminating proxy. HTTP listener redirects, doesn't proxy. Just needs a comment. - OQ-12: Access logging is already specified as mandatory/always-on in operations.md. Missing log_request! calls are bug W13. Updated docs: proxy.md, operations.md, overview.md, config.md, open-questions.md, README.md, ADR-013. Created ADR-022.
2026-06-12 03:39:52 +00:00
parent 68d27c4789
commit fe1ae6c05e
8 changed files with 204 additions and 149 deletions
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-11
+last_updated: 2026-06-12
 ---

 # Open Questions
@@ -50,9 +50,10 @@ last_updated: 2026-06-11
 - **Priority**: low
 - **Resolution**: Add a configurable local health check port (default: 9900)
  bound to `127.0.0.1` only. Health checks work even when TLS is misconfigured.
-  The main HTTPS `/health` endpoint remains available as a fallback. See
-  ADR-013.
- **Cross-references**: ADR-013
+  There is no `/health` route on the main HTTPS listener — health checking is
+  handled exclusively by the local port and admin socket. See ADR-013 and
+  ADR-022.
+- **Cross-references**: ADR-013, ADR-022

 ## Configuration

@@ -92,92 +93,79 @@ last_updated: 2026-06-11
  that override global defaults when specified.
 - **Cross-references**: ADR-015, ADR-017

-### OQ-08: Should the `/health` path use a less common endpoint to avoid upstream collision?
+### ~~OQ-08: Should the `/health` path use a less common endpoint to avoid upstream collision?~~

 - **Origin**: Implementation review finding W5, [proxy.md](proxy.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: medium
- **Resolution**: None yet. The proxy currently intercepts `GET /health` on all
-  hosts before host-based routing, which means any upstream application that
-  uses `/health` for its own health checks will have those requests silently
-  intercepted. Options: (1) Use a less common path like `/__health` or
-  `/healthz`; (2) Only intercept `/health` when the Host header doesn't match
-  any known site (fallthrough); (3) Make the health check path configurable
-  via `StaticConfig`. Option 1 is simplest for Phase 1. Option 3 is most
-  flexible long-term. The architecture spec (proxy.md, ADR-013) currently
-  specifies `/health` as a top-level route regardless of Host.
- **Cross-references**: ADR-013
+- **Resolution**: The `/health` route does not belong on the main listener at
+  all. Health checking is an operational concern served by the dedicated local
+  port (9900) and the admin socket's `status` command — not by intercepting
+  traffic on the public-facing proxy. Serving `/health` on the main listener
+  creates collision with upstream applications, requires special-case routing
+  logic before host-based matching, and is architecturally wrong: the main
+  listener's job is to proxy requests, not to serve operational endpoints. The
+  local health check port (bound to `127.0.0.1:9900`) and the admin socket are
+  the sole health/status mechanisms. See ADR-022.
+- **Cross-references**: ADR-013, ADR-022

-### OQ-09: How should `upstream_connect_timeout_secs` be enforced?
+### ~~OQ-09: How should `upstream_connect_timeout_secs` be enforced?~~

 - **Origin**: Implementation review finding W4, ADR-015, ADR-017
- **Status**: open
+- **Status**: resolved
 - **Priority**: medium
- **Resolution**: None yet. The architecture (ADR-015, ADR-017) specifies a
-  5-second default connect timeout separate from the request timeout, and
-  `SiteConfig` includes `upstream_connect_timeout_secs`. However, the
-  implementation only applies `upstream_request_timeout_secs` as a blanket
-  timeout covering the entire exchange. The hyper client handles TCP connect
-  internally, making a two-phase timeout harder to implement without custom
-  connect logic. Need to decide: (1) implement a two-phase timeout using
-  `tokio::time::timeout` for connect phase then request phase; (2) configure
-  the hyper client's `connect_timeout` parameter; or (3) accept the current
-  behavior for Phase 1 and add connect timeout enforcement in Phase 2.
+- **Resolution**: This is an implementation gap, not an architectural unknown.
+  The architecture already specifies a 5-second default connect timeout
+  separate from the request timeout (ADR-015, ADR-017), and `SiteConfig`
+  already includes `upstream_connect_timeout_secs`. The implementation must
+  wire this field to hyper's `connect_timeout` parameter. If hyper's API
+  doesn't expose a separate connect timeout, a two-phase `tokio::time::timeout`
+  approach should be used for Phase 2. For Phase 1, the connect timeout field
+  exists in config but is not enforced — this is a documented known gap. No ADR
+  needed; the decision was already made in ADR-015.
 - **Cross-references**: ADR-015, ADR-017

-## Configuration
-
-### OQ-10: Should ACME contact email be a required config field?
+### ~~OQ-10: Should ACME contact email be a required config field?~~

 - **Origin**: Implementation review finding C2, [tls.md](tls.md), [config.md](config.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: high
- **Resolution**: None yet. Let's Encrypt requires a contact email for production
-  certificate requests. The current architecture spec does not include an
-  `acme_contact` field in `TlsConfig` or `ListenerConfig`. Without it, ACME
-  registration with Let's Encrypt production will fail. Options: (1) Add a
-  required `acme_contact` field to the TLS config within each `[[listeners]]`
-  entry that uses ACME mode; (2) Add a global `acme_contact` field shared
-  across all ACME listeners. Per-listener is more flexible but adds config
-  noise. Global is simpler for typical deployments. Need to update config.md
-  and tls.md.
+- **Resolution**: This is not an open question — the architecture already
+  specifies `acme_contact` as a required field in ACME mode (config.md
+  validation rule 19). The field is defined in the `ListenerConfig` table and
+  shown in TOML examples. Let's Encrypt requires a contact email for production
+  certificate requests. The implementation bug (C2: `contact: vec![]`) must be
+  fixed to use the configured `acme_contact` value. No new ADR needed — the
+  decision is already documented in config.md and tls.md.
 - **Cross-references**: ADR-004

-### OQ-11: How should `X-Forwarded-Proto` be derived per-listener?
+### ~~OQ-11: How should `X-Forwarded-Proto` be derived per-listener?~~

 - **Origin**: Implementation review finding W14, [proxy.md](proxy.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: medium
- **Resolution**: None yet. The architecture spec (proxy.md) states
-  `X-Forwarded-Proto` should be "determined by which listener port received the
-  request" — `https` for requests on the listener's `https_port`, `http` for
-  requests on the listener's `http_port`. The implementation hardcodes
-  `is_https: true` in `ProxyState`. For a TLS-terminating reverse proxy this
-  is correct (all TLS connections arrive on the HTTPS port), but the HTTP
-  redirect listener should set `X-Forwarded-Proto: https` since it redirects to
-  HTTPS. Need to clarify: (1) The HTTPS listener always sets `X-Forwarded-Proto:
-  https` (correct, since it terminates TLS); (2) The HTTP redirect listener
-  sends a 301 redirect and does NOT proxy, so `X-Forwarded-Proto` on the
-  redirect response is not applicable. The hardcoded behavior is correct but
-  should be documented.
+- **Resolution**: The hardcoded `is_https: true` behavior is correct for a
+  TLS-terminating reverse proxy. The proxy only proxies requests on the HTTPS
+  listener, which always sets `X-Forwarded-Proto: https`. The HTTP redirect
+  listener sends a 301 redirect and does NOT proxy requests, so
+  `X-Forwarded-Proto` is not set there. This behavior is correct and matches
+  the architecture spec (proxy.md). The implementation should add a comment
+  documenting this rationale to prevent future "fixes" that would change the
+  behavior. No ADR or spec change needed — just a code comment.
 - **Cross-references**: ADR-021

 ## Operations

-### OQ-12: Should request access logging be mandatory or optional?
+### ~~OQ-12: Should request access logging be mandatory or optional?~~

 - **Origin**: Implementation review finding W13, [operations.md](operations.md)
- **Status**: open
+- **Status**: resolved
 - **Priority**: high
- **Resolution**: None yet. The architecture spec (operations.md) defines an
-  access log format (`REQUEST client_ip=... host=... method=... path=...
-  status=... upstream=... duration_ms=...`) and a `log_request!` macro, but
-  the implementation does not emit access logs. Without request-level logging,
-  the proxy is operationally blind — there is no observability into traffic,
-  response codes, or upstream latency. This also blocks fail2ban integration
-  for access-log-based jails. The question is whether to: (1) Make access
-  logging mandatory (always-on at `info` level); (2) Make it configurable
-  (e.g., `access_log` boolean in `LoggingConfig`); or (3) Tie it to the
-  existing `log_file_path` setting. The architecture spec implies it's always
-  on.
+- **Resolution**: Access logging is mandatory and always-on at `info` level.
+  The architecture spec (operations.md) already states: "Access logging is
+  **always-on** — it is the primary observability mechanism for the proxy and
+  is required for fail2ban integration. There is no configuration option to
+  disable access logging." The `log_request!` macro exists in the codebase
+  but is not called — this is an implementation gap (W13), not an
+  architectural question. No ADR needed; ADR-007 already covers the log format.
 - **Cross-references**: ADR-007