Triage implementation review findings and update architecture specs

Analyzed 29 findings from the implementation review (002-implementation-review.md)
and identified 8 architecture-level concerns requiring spec changes:

Architecture gaps addressed:
- C2: Added acme_contact field to config.md, tls.md, and operations.md.
  The spec had no field for the ACME account contact used when registering
  with Let's Encrypt in production. Whether the field should be required is
  tracked as OQ-10.
- C4: Added StaticConfig drift tracking requirement to config.md reload
  section. ConfigReloadHandle must update its stored StaticConfig after each
  successful reload to prevent stale warnings.
- W1: Updated shutdown sequence in operations.md to specify that server tasks
  should be joined (not aborted) during the drain window.
- W5: Added health check path collision note to proxy.md.
- W13: Clarified that access logging is always-on in operations.md.
- W14: Updated X-Forwarded-Proto description in proxy.md to clarify that it
  is always 'https' since the HTTP listener redirects rather than proxies.
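
For C4, one possible reading of the drift-tracking requirement is sketched
below in Rust. Only the names StaticConfig and ConfigReloadHandle come from
the spec; the listen_addr field, the reload signature, and the warning text
are assumptions made for illustration.

```rust
// Hypothetical, simplified stand-in for the real StaticConfig.
#[derive(Clone, Debug, PartialEq)]
struct StaticConfig {
    // Assumed example of a setting that only takes effect on restart.
    listen_addr: String,
}

struct ConfigReloadHandle {
    // Baseline that the next reload is compared against.
    stored: StaticConfig,
}

impl ConfigReloadHandle {
    fn new(initial: StaticConfig) -> Self {
        Self { stored: initial }
    }

    // Called after a reload has succeeded. Returns true if a drift warning
    // was emitted for this reload.
    fn reload(&mut self, incoming: StaticConfig) -> bool {
        let drifted = incoming != self.stored;
        if drifted {
            eprintln!("warning: static config changed; restart required to apply");
        }
        // Update the baseline so the same change does not warn again on
        // every later reload.
        self.stored = incoming;
        drifted
    }
}
```

Under these assumptions, reloading a changed StaticConfig warns once, and
reloading the same file again is silent because the baseline has moved.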

New open questions added:
- OQ-08: Should /health use a less common path to avoid upstream collision?
- OQ-09: How should upstream_connect_timeout_secs be enforced?
- OQ-10: Should ACME contact email be a required config field?
- OQ-11: How should X-Forwarded-Proto be derived per-listener?
- OQ-12: Should request access logging be mandatory or optional?

The remaining 21 findings are implementation-level bugs, code quality issues,
or Phase 2 improvements that don't require architecture spec changes.
2026-06-11 15:04:09 +00:00
parent 5478df7ab7
commit 68d27c4789
6 changed files with 135 additions and 10 deletions


@@ -107,7 +107,10 @@ All logs use `tracing` with structured fields. The proxy outputs two types of
log entries:
1. **Access logs**: Every proxied request is logged at `info` level with
-structured fields.
+structured fields. Access logging is **always-on** — it is the primary
+observability mechanism for the proxy and is required for fail2ban
+integration. There is no configuration option to disable access logging
+(see OQ-12).
```
REQUEST client_ip=203.0.113.50 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
@@ -293,7 +296,9 @@ On SIGTERM or SIGINT, the proxy performs a graceful shutdown:
2. **Close idle keep-alive connections** — Send `Connection: close` on any idle
connections in the keep-alive pool.
3. **Wait for in-flight requests** — Up to `shutdown_timeout_secs` (default: 30)
-for active requests to complete.
+for active requests to complete. Server tasks are joined (not aborted) so
+that in-flight requests can drain normally. Only after the timeout expires
+are remaining tasks aborted.
4. **Force-close remaining connections** — After the timeout, any remaining
connections are forcefully closed via TCP RST.
5. **Cancel background tasks** — ACME renewal tasks, rate limiter eviction task,
@@ -477,6 +482,7 @@ mode = "acme"
acme_domains = ["git.example.com"]
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
acme_directory = "production"
+acme_contact = "mailto:admin@example.com"
[[listeners.sites]]
host = "git.example.com"