Commit Graph

8 Commits

Author SHA1 Message Date
0d54eba41e Update architecture specs to reflect live deployment findings and fix two bugs
Architecture updates based on gaps discovered during live deployment testing:

- ADR-023: HTTP/2 client-facing support via ALPN-based protocol detection.
  The spec previously said HTTP/2 was out of scope, but the deployment
  revealed that modern browsers negotiate HTTP/2 via ALPN. The proxy now
  correctly detects the negotiated ALPN protocol and uses the appropriate
  HTTP server builder (http2::Builder for h2, auto::Builder for http/1.1).
  Upstream connections remain HTTP/1.1. Host resolution now falls back to
  URI host for HTTP/2 :authority pseudo-headers.

- ADR-024: ANSI-disabled logging. All tracing-subscriber layers now use
  with_ansi(false) to prevent ANSI escape codes in log output, which broke
  fail2ban regex matching in Docker deployments. Also documents the fail2ban
  regex anchor fix (^RATE_LIMIT → RATE_LIMIT).

Bug fixes found by architecture review:

- Fix missing ALPN protocols in manual TLS mode. build_manual_server_config
  and build_multi_domain_server_config did not set alpn_protocols, meaning
  manual TLS mode could not support HTTP/2. Added h2 and http/1.1 ALPN
  entries to both functions (acme-tls/1 only in ACME mode).

- Fix missing with_ansi(false) in JSON log format. The init_json function
  with file output did not disable ANSI on stdout or file layers, which would
  break fail2ban in production JSON logging mode.

Other spec updates:

- All document statuses updated from draft to reviewed
- proxy.md: documented Server header removal, upstream HTTPS client,
  two-phase timeout enforcement, HTTP/2 host resolution, connect timeout
- tls.md: documented ALPN configuration differing by mode (ACME vs manual)
- overview.md: added HTTP/2 client-facing support to scope, updated crate
  deps (hyper-rustls, rustls-native-certs, hyper-util), clarified out-of-scope
- config.md: fixed http_port type (u16→u32) to match implementation, added
  ANSI-disabled note for LoggingConfig
- operations.md: documented ANSI-disabled logging, fail2ban regex anchor
- open-questions.md: updated OQ-09 resolution (connect timeout fully
  implemented), OQ-10 (C2 bug is fixed)
2026-06-12 11:28:31 +00:00
fe1ae6c05e Resolve all open questions, remove /health from main listener (ADR-022)
Resolve OQ-08 through OQ-12 after reviewing implementation findings:

- OQ-08: Remove /health route from the main HTTPS listener entirely.
  Health checking belongs on port 9900 and admin socket only, not on
  the public-facing proxy. This eliminates upstream collision problems
  and special-case routing logic. (ADR-022)

- OQ-09: Not an architectural unknown — ADR-015 already decided on a
  separate connect timeout. The implementation gap is a known issue.

- OQ-10: Not an open question — acme_contact is already specified as
  required in config.md. The empty contact list is bug C2.

- OQ-11: Hardcoded is_https=true is correct for a TLS-terminating
  proxy. HTTP listener redirects, doesn't proxy. Just needs a comment.

- OQ-12: Access logging is already specified as mandatory/always-on in
  operations.md. Missing log_request! calls are bug W13.

Updated docs: proxy.md, operations.md, overview.md, config.md,
open-questions.md, README.md, ADR-013. Created ADR-022.
2026-06-12 03:39:52 +00:00
ceb59ad9b9 Resolve all architecture review findings (7 critical, 14 warnings, 6 suggestions)
Critical findings resolved:
- C1: Site routing is global (per-listener TOML, global runtime lookup)
- C2: X-Forwarded-For replaces (not appends) — edge proxy model (ADR-021)
- C3: Hop-by-hop header handling rules specified (proxy.md)
- C4: ACME failure behavior defined (tls.md)
- C5: Startup sequence with fail-fast semantics (operations.md)
- C6: Per-listener Router instances with shared global state (overview.md)
- C7: Rate limiter adopts new params on next request, no state clear (operations.md)

Warnings resolved:
- W1: Admin socket wire protocol specified
- W2: Host header port stripped, hostnames only in config
- W3: HTTP redirect URL construction with port handling
- W4: /health on HTTPS matches regardless of Host header
- W5: Static config changes logged as warning during reload
- W6: Reload operations serialized via Mutex
- W7: http_port validation rules added (9 new rules total)
- W8: upstream format validation (host:port required, no scheme)
- W9: TLS error handling table (SNI, version, cipher failures)
- W10: IPv6 rate limited per /64 prefix
- W11: Graceful shutdown sequence specified (6 steps)
- W12: Error response bodies: minimal plain text, no version disclosure
- W13: upstream_scheme HTTPS uses system CA store
- W14: allow_wildcard_bind is OR between config and CLI
- W15: ADR-010 Phase 2 list updated (timeouts moved to Phase 1)
- W17: LoggingConfig static/restart note added

Suggestions applied:
- S2: ConnectInfo propagation note
- S3: Case-insensitive host matching (RFC 7230)
- S5: Response streaming behavior (chunk-by-chunk)
- S6: Token bucket nodelay semantics
- S7: File watching explicitly out of scope
- S8: All paths forwarded without filtering
- S9: shutdown_timeout_secs referenced in shutdown description
- S11: Consolidated defaults table in config.md
2026-06-11 10:56:40 +00:00
fecc385d75 Add container deployment model (ADR-020) and fix review issues
- ADR-020: Document defense-in-depth rationale for running in a minimal
  Docker container (memory-safe language + container isolation), flexible
  upstream addressing (Docker DNS, loopback, LAN, tunnel endpoints),
  file-primary logging for fail2ban, and volume mount strategy
- ADR-016: Add allow_wildcard_bind override for container deployments
  where 0.0.0.0 is correct inside the container network namespace
- operations.md: Add container deployment section with Docker Compose
  example, networking table, volume mounts, and health check integration;
  flip logging to file-primary for fail2ban reliability; note systemd as
  alternative to container deployment
- config.md: Restructure logging fields into nested LoggingConfig (matching
  TOML [logging] section), add allow_wildcard_bind, shutdown_timeout_secs,
  and log_file_path fields; clarify upstream addressing supports Docker
  DNS and tunnel endpoints; update validation rule for 0.0.0.0 override
- overview.md: Update architecture diagram for container model with Docker
  networking and volume mounts; add ADR-020 reference
- proxy.md: Clarify X-Forwarded-Proto is determined by listener port, not
  hardcoded 80/443
- ADR-013: Fix health_check_port default contradiction (default is 9900,
  not 0/disabled as previously stated)
2026-06-11 10:10:32 +00:00
346754fb2b Resolve OQ-07: add multi-config listener support (ADR-019)
Introduce [[listeners]] configuration to support both dedicated-IP
(1 IP = 1 cert = 1 domain) and shared-IP (SAN certificate) deployment
models. Each listener is an independent TLS endpoint with its own bind
address, TLS config, and site routing. OQ-07 is now resolved.

Changes:
- Add ADR-019 for multi-config listener support
- Update config format from [server] to [[listeners]] entries
- Update tls.md for per-listener TLS and certificate provisioning
- Update overview.md architecture diagram and scope
- Update proxy.md for per-listener HTTP redirect
- Fix stale references in ADR-010, ADR-011, ADR-016
- Update OQ-05 resolution (per-listener bind_addr supersedes)
- Add unique-host rationale to config validation rules
- Architecture review: fix all 3 critical and 6 warning issues
2026-06-11 09:35:24 +00:00
9a2352e61c Resolve 5 open questions, add 7 ADRs for previously undocumented decisions
Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM
  suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost
  only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside
  SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s
  request), overridable in SiteConfig — ADR-015

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout
  fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults,
  and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
2026-06-11 09:07:36 +00:00
7efc142406 Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues
Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy
must support git.alk.dev and alk.dev from initial release. Add multi-domain
TLS configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 — avoids config format migration later
- ADR-011: Multi-domain TLS — single SAN cert, acme_domains Vec<String>
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through,
  dual licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual
  mode uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new
  OQ-07 (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
2026-06-11 08:50:03 +00:00
8ee6284b62 Add architecture specification for Rust/axum reverse proxy
Phase 1 architecture docs covering proxy handler, TLS termination (ACME +
manual), TOML config with static/dynamic split (ArcSwap), and operations
(rate limiting, logging, health check, systemd, graceful shutdown).

Nine ADRs documenting key decisions: Rust/axum, custom proxy handler,
TOML config, rustls-acme for cert management, tokio-rustls direct,
token bucket rate limiting, custom log format for fail2ban,
static/dynamic config split, and signal handling strategy.

Includes threat landscape research documenting the nginx CVEs motivating
this project.
2026-06-11 07:25:50 +00:00