Commit Graph

14 Commits

Author SHA1 Message Date
4f537c80d2 Add security and bug review #003 (4 critical, 12 warnings, 10 suggestions) 2026-06-12 13:03:20 +00:00
0d54eba41e Update architecture specs to reflect live deployment findings and fix two bugs
Architecture updates based on gaps discovered during live deployment testing:

- ADR-023: HTTP/2 client-facing support via ALPN-based protocol detection.
  The spec previously said HTTP/2 was out of scope, but the deployment
  revealed that modern browsers negotiate HTTP/2 via ALPN. The proxy now
  correctly detects the negotiated ALPN protocol and uses the appropriate
  HTTP server builder (http2::Builder for h2, auto::Builder for http/1.1).
  Upstream connections remain HTTP/1.1. Host resolution now falls back to
  URI host for HTTP/2 :authority pseudo-headers.

- ADR-024: ANSI-disabled logging. All tracing-subscriber layers now use
  with_ansi(false) to prevent ANSI escape codes in log output, which broke
  fail2ban regex matching in Docker deployments. Also documents the fail2ban
  regex anchor fix (^RATE_LIMIT → RATE_LIMIT).

Bug fixes found by architecture review:

- Fix missing ALPN protocols in manual TLS mode. build_manual_server_config
  and build_multi_domain_server_config did not set alpn_protocols, meaning
  manual TLS mode could not support HTTP/2. Added h2 and http/1.1 ALPN
  entries to both functions (acme-tls/1 only in ACME mode).

- Fix missing with_ansi(false) in JSON log format. The init_json function
  with file output did not disable ANSI on stdout or file layers, which would
  break fail2ban in production JSON logging mode.

Other spec updates:

- All document statuses updated from draft to reviewed
- proxy.md: documented Server header removal, upstream HTTPS client,
  two-phase timeout enforcement, HTTP/2 host resolution, connect timeout
- tls.md: documented ALPN configuration differing by mode (ACME vs manual)
- overview.md: added HTTP/2 client-facing support to scope, updated crate
  deps (hyper-rustls, rustls-native-certs, hyper-util), clarified out-of-scope
- config.md: fixed http_port type (u16→u32) to match implementation, added
  ANSI-disabled note for LoggingConfig
- operations.md: documented ANSI-disabled logging, fail2ban regex anchor
- open-questions.md: updated OQ-09 resolution (connect timeout fully
  implemented), OQ-10 (C2 bug is fixed)
2026-06-12 11:28:31 +00:00
fe1ae6c05e Resolve all open questions, remove /health from main listener (ADR-022)
Resolve OQ-08 through OQ-12 after reviewing implementation findings:

- OQ-08: Remove /health route from the main HTTPS listener entirely.
  Health checking belongs on port 9900 and admin socket only, not on
  the public-facing proxy. This eliminates upstream collision problems
  and special-case routing logic. (ADR-022)

- OQ-09: Not an architectural unknown — ADR-015 already decided on a
  separate connect timeout. The implementation gap is a known issue.

- OQ-10: Not an open question — acme_contact is already specified as
  required in config.md. The empty contact list is bug C2.

- OQ-11: Hardcoded is_https=true is correct for a TLS-terminating
  proxy. HTTP listener redirects, doesn't proxy. Just needs a comment.

- OQ-12: Access logging is already specified as mandatory/always-on in
  operations.md. Missing log_request! calls are bug W13.

Updated docs: proxy.md, operations.md, overview.md, config.md,
open-questions.md, README.md, ADR-013. Created ADR-022.
2026-06-12 03:39:52 +00:00
68d27c4789 Triage implementation review findings and update architecture specs
Analyzed 29 findings from the implementation review (002-implementation-review.md)
and identified 8 architecture-level concerns requiring spec changes:

Architecture gaps addressed:
- C2: Added acme_contact field to config.md, tls.md, and operations.md.
  Let's Encrypt requires a contact email for production; the spec was missing
  this required field.
- C4: Added StaticConfig drift tracking requirement to config.md reload
  section. ConfigReloadHandle must update its stored StaticConfig after each
  successful reload to prevent stale warnings.
- W1: Updated shutdown sequence in operations.md to specify that server tasks
  should be joined (not aborted) during the drain window.
- W5: Added health check path collision note to proxy.md.
- W13: Clarified that access logging is always-on in operations.md.
- W14: Updated X-Forwarded-Proto description in proxy.md to clarify that it
  is always 'https' since the HTTP listener redirects rather than proxies.

New open questions added:
- OQ-08: Should /health use a less common path to avoid upstream collision?
- OQ-09: How should upstream_connect_timeout_secs be enforced?
- OQ-10: Should ACME contact email be a required config field?
- OQ-11: How should X-Forwarded-Proto be derived per-listener?
- OQ-12: Should request access logging be mandatory or optional?

The remaining 21 findings are implementation-level bugs, code quality issues,
or Phase 2 improvements that don't require architecture spec changes.
2026-06-11 15:04:09 +00:00
5478df7ab7 Add W13-W14, S9-S11 findings to implementation review
W13: No request access logging - log_request! macro defined but never called
W14: is_https hardcoded to true on ProxyState - X-Forwarded-Proto always https
S9: Rate limiting silently bypassed when no client IP found
S10: Integration test TOML has [[listeners.listeners.sites]] typo
S11: No Server response header added by proxy (upstream's is stripped)
2026-06-11 14:49:50 +00:00
39e1b82308 Add post-implementation code review (4 critical, 12 warning, 8 suggestion findings) 2026-06-11 14:20:06 +00:00
ceb59ad9b9 Resolve all architecture review findings (7 critical, 14 warnings, 6 suggestions)
Critical findings resolved:
- C1: Site routing is global (per-listener TOML, global runtime lookup)
- C2: X-Forwarded-For replaces (not appends) — edge proxy model (ADR-021)
- C3: Hop-by-hop header handling rules specified (proxy.md)
- C4: ACME failure behavior defined (tls.md)
- C5: Startup sequence with fail-fast semantics (operations.md)
- C6: Per-listener Router instances with shared global state (overview.md)
- C7: Rate limiter adopts new params on next request, no state clear (operations.md)

Warnings resolved:
- W1: Admin socket wire protocol specified
- W2: Host header port stripped, hostnames only in config
- W3: HTTP redirect URL construction with port handling
- W4: /health on HTTPS matches regardless of Host header
- W5: Static config changes logged as warning during reload
- W6: Reload operations serialized via Mutex
- W7: http_port validation rules added (9 new rules total)
- W8: upstream format validation (host:port required, no scheme)
- W9: TLS error handling table (SNI, version, cipher failures)
- W10: IPv6 rate limited per /64 prefix
- W11: Graceful shutdown sequence specified (6 steps)
- W12: Error response bodies: minimal plain text, no version disclosure
- W13: upstream_scheme HTTPS uses system CA store
- W14: allow_wildcard_bind is OR between config and CLI
- W15: ADR-010 Phase 2 list updated (timeouts moved to Phase 1)
- W17: LoggingConfig static/restart note added

Suggestions applied:
- S2: ConnectInfo propagation note
- S3: Case-insensitive host matching (RFC 7230)
- S5: Response streaming behavior (chunk-by-chunk)
- S6: Token bucket nodelay semantics
- S7: File watching explicitly out of scope
- S8: All paths forwarded without filtering
- S9: shutdown_timeout_secs referenced in shutdown description
- S11: Consolidated defaults table in config.md
2026-06-11 10:56:40 +00:00
bcc58bc7ce Add architecture gap review identifying 7 critical, 17 warning, and 11 suggestion findings
Comprehensive pre-implementation review of all architecture specs (overview,
proxy, tls, config, operations, 20 ADRs, open questions). Findings cover:
- Site routing model contradiction (per-listener vs global)
- X-Forwarded-For security model (edge proxy should replace, not append)
- Missing hop-by-hop header handling rules
- Undefined ACME failure behavior at startup/renewal
- Unspecified startup sequence and partial failure semantics
- Ambiguous per-listener vs shared router architecture
- Rate limiter state behavior on config reload

Plus warnings about admin socket protocol, Host header port handling,
port validation gaps, upstream format validation, TLS error handling,
shutdown draining, error response bodies, reload race conditions, and more.
2026-06-11 10:43:27 +00:00
fecc385d75 Add container deployment model (ADR-020) and fix review issues
- ADR-020: Document defense-in-depth rationale for running in a minimal
  Docker container (memory-safe language + container isolation), flexible
  upstream addressing (Docker DNS, loopback, LAN, tunnel endpoints),
  file-primary logging for fail2ban, and volume mount strategy
- ADR-016: Add allow_wildcard_bind override for container deployments
  where 0.0.0.0 is correct inside the container network namespace
- operations.md: Add container deployment section with Docker Compose
  example, networking table, volume mounts, and health check integration;
  flip logging to file-primary for fail2ban reliability; note systemd as
  alternative to container deployment
- config.md: Restructure logging fields into nested LoggingConfig (matching
  TOML [logging] section), add allow_wildcard_bind, shutdown_timeout_secs,
  and log_file_path fields; clarify upstream addressing supports Docker
  DNS and tunnel endpoints; update validation rule for 0.0.0.0 override
- overview.md: Update architecture diagram for container model with Docker
  networking and volume mounts; add ADR-020 reference
- proxy.md: Clarify X-Forwarded-Proto is determined by listener port, not
  hardcoded 80/443
- ADR-013: Fix health_check_port default contradiction (default is 9900,
  not 0/disabled as previously stated)
2026-06-11 10:10:32 +00:00
346754fb2b Resolve OQ-07: add multi-config listener support (ADR-019)
Introduce [[listeners]] configuration to support both dedicated-IP
(1 IP = 1 cert = 1 domain) and shared-IP (SAN certificate) deployment
models. Each listener is an independent TLS endpoint with its own bind
address, TLS config, and site routing. OQ-07 is now resolved.

Changes:
- Add ADR-019 for multi-config listener support
- Update config format from [server] to [[listeners]] entries
- Update tls.md for per-listener TLS and certificate provisioning
- Update overview.md architecture diagram and scope
- Update proxy.md for per-listener HTTP redirect
- Fix stale references in ADR-010, ADR-011, ADR-016
- Update OQ-05 resolution (per-listener bind_addr supersedes)
- Add unique-host rationale to config validation rules
- Architecture review: fix all 3 critical and 6 warning issues
2026-06-11 09:35:24 +00:00
9a2352e61c Resolve 5 open questions, add 7 ADRs for previously undocumented decisions
Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM
  suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost
  only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside
  SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s
  request), overridable in SiteConfig — ADR-015

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout
  fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults,
  and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
2026-06-11 09:07:36 +00:00
7efc142406 Expand architecture: multi-site Phase 1, multi-domain TLS, fix review issues
Promote multi-site support from Phase 2 to Phase 1 (ADR-010): the proxy
must support git.alk.dev and alk.dev from initial release. Add multi-domain
TLS configuration (ADR-011): acme_domains array replaces acme_domain string,
single SAN certificate via rustls-acme.

Key changes:
- ADR-010: Multi-site in Phase 1 — avoids config format migration later
- ADR-011: Multi-domain TLS — single SAN cert, acme_domains Vec<String>
- ADR-002: Updated rationale for multi-site (one upstream per domain)
- overview.md: Phase 1 now includes multi-site, alk.dev pass-through,
  dual licensing (MIT OR Apache-2.0), real IP removed
- config.md: acme_domain → acme_domains, TOML example shows both sites,
  validation adds unique host check, real IP replaced with 203.0.113.10
- tls.md: Multi-domain SNI section moved from Future to current, manual
  mode uses ResolvesServerCert for SNI mapping, TOML header fixed
- proxy.md: Updated for multi-site, removed single-domain language
- operations.md: RFC 5737 documentation IPs, clarified rate limit eviction
  semantics (distinct scan interval vs eviction age)
- open-questions.md: OQ-05 resolved (single bind_addr sufficient), new
  OQ-07 (per-site TLS overrides)

Review fixes:
- acme_domains (plural) consistently used across all docs and diagram
- ADR-011 clearly scopes acme_domain as previous design
- Inline decision rationale extracted: tls.md hot-reload → ADR-004 ref,
  config.md static/dynamic → ADR-008 ref
- TOML section headers consistent (server.tls)
2026-06-11 08:50:03 +00:00
8ee6284b62 Add architecture specification for Rust/axum reverse proxy
Phase 1 architecture docs covering proxy handler, TLS termination (ACME +
manual), TOML config with static/dynamic split (ArcSwap), and operations
(rate limiting, logging, health check, systemd, graceful shutdown).

Nine ADRs documenting key decisions: Rust/axum, custom proxy handler,
TOML config, rustls-acme for cert management, tokio-rustls direct,
token bucket rate limiting, custom log format for fail2ban,
static/dynamic config split, and signal handling strategy.

Includes threat landscape research documenting the nginx CVEs motivating
this project.
2026-06-11 07:25:50 +00:00
5c54a28822 init 2026-06-11 06:32:33 +00:00