glm-5.1 68d27c4789 Triage implementation review findings and update architecture specs
Analyzed 29 findings from the implementation review (002-implementation-review.md)
and identified 8 architecture-level concerns requiring spec changes:

Architecture gaps addressed:
- C2: Added acme_contact field to config.md, tls.md, and operations.md.
  Let's Encrypt requires a contact email for production; the spec was missing
  this required field.
- C4: Added StaticConfig drift tracking requirement to config.md reload
  section. ConfigReloadHandle must update its stored StaticConfig after each
  successful reload to prevent stale warnings.
- W1: Updated shutdown sequence in operations.md to specify that server tasks
  should be joined (not aborted) during the drain window.
- W5: Added health check path collision note to proxy.md.
- W13: Clarified that access logging is always-on in operations.md.
- W14: Updated X-Forwarded-Proto description in proxy.md to clarify that it
  is always 'https' since the HTTP listener redirects rather than proxies.

New open questions added:
- OQ-08: Should /health use a less common path to avoid upstream collision?
- OQ-09: How should upstream_connect_timeout_secs be enforced?
- OQ-10: Should ACME contact email be a required config field?
- OQ-11: How should X-Forwarded-Proto be derived per-listener?
- OQ-12: Should request access logging be mandatory or optional?

The remaining 21 findings are implementation-level bugs, code quality issues,
or Phase 2 improvements that don't require architecture spec changes.
2026-06-11 15:04:09 +00:00

status: draft
last_updated: 2026-06-11

Reverse Proxy — Architecture

Current State

Phase 0 (Exploration) — Complete. Phase 1 (Architecture) — In progress.

This project replaces our vulnerable nginx 1.24.0 installation with a memory-safe Rust/axum reverse proxy. The primary motivation is CVE-2026-42945 (unauthenticated RCE in nginx's rewrite module) and the broader pattern of memory corruption bugs in nginx's C codebase.

The proxy supports multiple domains from the initial release (git.alk.dev and alk.dev), using host-based routing to select the site for each request and a single multi-domain SAN certificate obtained via ACME.
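
As a rough illustration of host-based routing, the sketch below maps a request's Host header to a per-site entry. It is not the project's handler: the `Site` struct, the `normalize_host` and `route` helpers, and the upstream addresses are all hypothetical placeholders, and the real schema is defined in config.md.

```rust
use std::collections::HashMap;

/// Hypothetical per-site entry; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Site {
    upstream: String,
}

/// Normalize a Host header value: strip a trailing numeric port, then lowercase.
fn normalize_host(host: &str) -> String {
    let name = match host.rsplit_once(':') {
        // Only treat the suffix as a port if it is non-empty and all digits,
        // so bracketed IPv6 literals without a port are left intact.
        Some((name, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => name,
        _ => host,
    };
    name.to_ascii_lowercase()
}

/// Look up the site for a Host header; `None` means no configured domain matched.
fn route<'a>(sites: &'a HashMap<String, Site>, host_header: &str) -> Option<&'a Site> {
    sites.get(&normalize_host(host_header))
}

fn main() {
    let mut sites = HashMap::new();
    // Upstream addresses below are placeholders, not values from the real config.
    sites.insert("git.alk.dev".to_string(), Site { upstream: "127.0.0.1:3000".to_string() });
    sites.insert("alk.dev".to_string(), Site { upstream: "127.0.0.1:8080".to_string() });

    let hit = route(&sites, "GIT.alk.dev:443").map(|s| s.upstream.as_str());
    assert_eq!(hit, Some("127.0.0.1:3000"));
    assert!(route(&sites, "unknown.example").is_none());
}
```

An exact-match map keeps the lookup simple for two domains. How unmatched hosts should be answered is not specified here, so the sketch only reports the miss.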

Architecture Documents

| Document | Status | Description |
|----------|--------|-------------|
| overview.md | Draft | Vision, scope, crate dependencies, exports |
| proxy.md | Draft | Reverse proxy handler, request flow, header injection |
| tls.md | Draft | TLS termination, ACME, manual certs, SNI |
| config.md | Draft | TOML config format, static/dynamic split, ArcSwap reload |
| operations.md | Draft | Rate limiting, logging, health check, systemd, shutdown |
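
To make the static/dynamic split and the drift-tracking requirement for config.md more concrete, here is a minimal sketch of a reload handle that remembers the last-seen static config. The names `StaticConfig` and `ConfigReloadHandle` come from the commit message; everything else is an assumption, including the field names, which fields count as static, and the use of `std::sync::RwLock` as a stand-in for ArcSwap so the example needs only the standard library.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Settings assumed to require a restart (field choice is hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct StaticConfig {
    bind_addr: String,
}

/// Settings assumed to be safe to swap at runtime (field is hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct DynamicConfig {
    rate_limit_per_sec: u32,
}

struct ConfigReloadHandle {
    /// Static config as of the last successful reload, used for drift checks.
    last_static: Mutex<StaticConfig>,
    /// Stand-in for an ArcSwap: readers clone the Arc, reloads replace it.
    dynamic: RwLock<Arc<DynamicConfig>>,
}

impl ConfigReloadHandle {
    fn new(initial_static: StaticConfig, initial_dynamic: DynamicConfig) -> Self {
        Self {
            last_static: Mutex::new(initial_static),
            dynamic: RwLock::new(Arc::new(initial_dynamic)),
        }
    }

    /// Apply a reload. Returns true if a static-field drift warning was raised.
    fn reload(&self, new_static: StaticConfig, new_dynamic: DynamicConfig) -> bool {
        let mut stored = self.last_static.lock().unwrap();
        let drifted = *stored != new_static;
        if drifted {
            eprintln!("warning: static config changed; restart required to apply it");
        }
        *self.dynamic.write().unwrap() = Arc::new(new_dynamic);
        // Update the stored copy so the next reload compares against this one
        // instead of repeating a stale warning.
        *stored = new_static;
        drifted
    }

    fn current(&self) -> Arc<DynamicConfig> {
        let current = self.dynamic.read().unwrap().clone();
        current
    }
}

fn main() {
    let handle = ConfigReloadHandle::new(
        StaticConfig { bind_addr: "0.0.0.0:443".to_string() },
        DynamicConfig { rate_limit_per_sec: 10 },
    );
    let edited = StaticConfig { bind_addr: "0.0.0.0:8443".to_string() };

    assert!(handle.reload(edited.clone(), DynamicConfig { rate_limit_per_sec: 20 }));
    // Reloading the same file again does not warn a second time.
    assert!(!handle.reload(edited, DynamicConfig { rate_limit_per_sec: 20 }));
    assert_eq!(handle.current().rate_limit_per_sec, 20);
}
```

Without the final assignment in `reload`, every later reload would be compared against the startup config and would keep emitting the same warning.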

ADR Table

| ADR | Title | Status |
|-----|-------|--------|
| 001 | Rust with Axum | Accepted |
| 002 | Custom Proxy Handler | Accepted |
| 003 | TOML Configuration Format | Accepted |
| 004 | ACME-Primary Certificate Management | Accepted |
| 005 | tokio-rustls Directly, Not axum-server | Accepted |
| 006 | Token Bucket Rate Limiting | Accepted |
| 007 | Custom Structured Log Format | Accepted |
| 008 | Static/Dynamic Config Split with ArcSwap | Accepted |
| 009 | Signal Handling Strategy | Accepted |
| 010 | Multi-Site Support in Phase 1 | Accepted |
| 011 | Multi-Domain TLS Configuration | Accepted |
| 012 | Restrict Cipher Suites to nginx Scope | Accepted |
| 013 | Health Check on Separate Local Port | Accepted |
| 014 | Unix Domain Socket Config Reload API | Accepted |
| 015 | Per-Site Upstream Timeouts with Defaults | Accepted |
| 016 | Explicit Bind Address Requirement | Accepted |
| 017 | Upstream Connection Defaults | Accepted |
| 018 | Request Body Size Limit | Accepted |
| 019 | Multi-Config Listener Support | Accepted |
| 020 | Container Deployment Model | Accepted |
| 021 | X-Forwarded-For Edge Proxy Model | Accepted |
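
ADR 006 names token bucket rate limiting. For readers unfamiliar with the technique, the following is a generic sketch of it, not the project's implementation: the `TokenBucket` type, its parameters, and the choice to pass the clock in explicitly are all assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Generic token bucket: holds up to `capacity` tokens, refilled continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Start with a full bucket so an initial burst up to `capacity` is allowed.
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Refill based on elapsed time, then try to spend one token.
    /// Returns true if the request is allowed.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        // Never move the refill timestamp backwards.
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let start = Instant::now();
    // Burst of 2 requests, then 1 request per second sustained.
    let mut bucket = TokenBucket::new(2, 1.0, start);

    assert!(bucket.try_acquire(start));
    assert!(bucket.try_acquire(start));
    assert!(!bucket.try_acquire(start)); // bucket is empty

    let later = start + Duration::from_secs(1);
    assert!(bucket.try_acquire(later)); // one token refilled
    assert!(!bucket.try_acquire(later));
}
```

Passing `now` as an argument keeps the refill arithmetic deterministic under test. A real limiter would typically keep one bucket per client key, which this sketch does not cover.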

Open Questions

See open-questions.md for the full tracker.

| OQ | Question | Priority | Status |
|----|----------|----------|--------|
| OQ-01 | Should cipher suites be restricted beyond rustls defaults? | medium | resolved (ADR-012) |
| OQ-02 | What log format should fail2ban consume? | high | resolved (ADR-007) |
| OQ-03 | Should the health check endpoint be on a separate port? | low | resolved (ADR-013) |
| OQ-04 | Config reload: SIGHUP only or also Unix socket API? | low | resolved (ADR-014) |
| OQ-05 | Should the proxy bind to multiple addresses? | low | resolved (single bind_addr sufficient) |
| OQ-06 | Should upstream timeouts be configurable per-site? | low | resolved (ADR-015) |
| OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | resolved (ADR-019) |
| OQ-08 | Should the /health path use a less common endpoint to avoid upstream collision? | medium | open |
| OQ-09 | How should upstream_connect_timeout_secs be enforced? | medium | open |
| OQ-10 | Should ACME contact email be a required config field? | high | open |
| OQ-11 | How should X-Forwarded-Proto be derived per-listener? | medium | open |
| OQ-12 | Should request access logging be mandatory or optional? | high | open |

Document Lifecycle

| Status | Meaning | Transitions |
|--------|---------|-------------|
| draft | Under active development. May change significantly. | reviewed when open questions are resolved |
| reviewed | Architecture is final. Implementation may begin. Changes require review. | stable when implementation is complete |
| stable | Locked. Changes require review and may warrant an ADR. | deprecated when superseded |
| deprecated | Superseded. Kept for reference. | Removed when no longer referenced |