Resolve 4 open questions, add 7 ADRs for previously undocumented decisions

Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM
  suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost
  only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside
  SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s
  request), overridable in SiteConfig — ADR-015
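
As a rough illustration of the OQ-03 decision, a loopback-only health listener could look like the sketch below. It uses only the Rust standard library; the function names and the plain-text `ok` body are assumptions made for this example and are not taken from ADR-013.

```rust
use std::io::{self, Read, Write};
use std::net::{Ipv4Addr, SocketAddr, TcpListener, TcpStream};

/// Default local health check port named in the commit message.
pub const DEFAULT_HEALTH_CHECK_PORT: u16 = 9900;

/// Bind the health listener to 127.0.0.1 only, so it is never reachable
/// from other hosts. (Hypothetical helper name.)
pub fn bind_health_listener(port: u16) -> io::Result<TcpListener> {
    TcpListener::bind(SocketAddr::from((Ipv4Addr::LOCALHOST, port)))
}

/// Answer a single connection with a minimal HTTP 200 response.
/// The response body is an assumption; the real format may differ.
fn respond_healthy(mut stream: TcpStream) -> io::Result<()> {
    // Read (and ignore) whatever request bytes have arrived.
    let mut buf = [0u8; 1024];
    let _ = stream.read(&mut buf)?;

    let body = "ok\n";
    write!(
        stream,
        "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )?;
    stream.flush()
}

/// Accept one connection and respond; a real server would loop.
pub fn serve_one(listener: &TcpListener) -> io::Result<()> {
    let (stream, _peer) = listener.accept()?;
    respond_healthy(stream)
}
```

Because this listener does not involve TLS at all, it can keep answering even when the certificate configuration on the main listener is broken.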

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)
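
The ADR-016 rule (explicit bind address, no `0.0.0.0`) can be sketched as a small validation function. The error type and function name are hypothetical, and rejecting the IPv6 wildcard `::` as well is an assumption that goes beyond what the commit message states.

```rust
use std::net::IpAddr;

/// Hypothetical validation error for the `bind_addr` config field.
#[derive(Debug, PartialEq, Eq)]
pub enum BindAddrError {
    /// The value could not be parsed as an IPv4 or IPv6 address.
    NotAnIpAddress(String),
    /// The value is a wildcard address (0.0.0.0, and by assumption also ::).
    Unspecified(IpAddr),
}

/// Accept only an explicit IP address for `bind_addr`.
pub fn validate_bind_addr(raw: &str) -> Result<IpAddr, BindAddrError> {
    let addr: IpAddr = raw
        .trim()
        .parse()
        .map_err(|_| BindAddrError::NotAnIpAddress(raw.to_string()))?;

    // `is_unspecified` is true for 0.0.0.0 and for ::.
    if addr.is_unspecified() {
        return Err(BindAddrError::Unspecified(addr));
    }
    Ok(addr)
}
```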

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout
  fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults,
  and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
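
For the admin socket reload mentioned above, one possible shape of the per-connection handler is sketched here. The line-based wire format (`ok reload` / `error reload <reason>`) is purely an assumption for illustration; the commit only says that responses are structured. The `reload` closure stands in for whatever the real config reload routine is.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Handle one admin connection: read a single command line, run it,
/// and write back a one-line success or failure reply.
/// (Hypothetical function; reply format is assumed, not from ADR-014.)
pub fn handle_admin_conn<F>(stream: UnixStream, reload: F) -> io::Result<()>
where
    F: FnOnce() -> Result<(), String>,
{
    // Separate handles for buffered reading and for writing the reply.
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;

    let mut line = String::new();
    reader.read_line(&mut line)?;

    let reply = match line.trim() {
        "reload" => match reload() {
            Ok(()) => "ok reload\n".to_string(),
            Err(reason) => format!("error reload {}\n", reason),
        },
        other => format!("error unknown-command {}\n", other),
    };
    writer.write_all(reply.as_bytes())
}
```

Unlike SIGHUP, the caller gets the outcome back on the same connection, which is the main benefit the OQ-04 resolution cites.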
2026-06-11 09:07:36 +00:00
parent 7efc142406
commit 9a2352e61c
14 changed files with 613 additions and 89 deletions


@@ -7,19 +7,15 @@ last_updated: 2026-06-11
 ## TLS
-### OQ-01: Should cipher suites be restricted beyond rustls defaults?
+### ~~OQ-01: Should cipher suites be restricted beyond rustls defaults?~~
 - **Origin**: [tls.md](tls.md)
-- **Status**: open
+- **Status**: resolved
 - **Priority**: medium
-- **Context**: Our current nginx config explicitly restricts cipher suites to
-  four ECDHE-AES-GCM suites. rustls 0.23 with `aws_lc_rs` defaults to a
-  conservative set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4,
-  no CBC-mode suites, no RSA key exchange). The defaults include TLS 1.3 suites
-  which nginx also allows. Restricting further would reduce compatibility with
-  older clients; not restricting means accepting a wider (but still safe) set
-  than the current nginx config.
-- **Cross-references**: ADR-005
+- **Resolution**: Restrict cipher suites to match the nginx scope: four
+  ECDHE-AES-GCM suites for TLS 1.2 plus all TLS 1.3 suites. This provides
+  behavioral parity during migration. See ADR-012.
+- **Cross-references**: ADR-005, ADR-012
 ### ~~OQ-02: What log format should fail2ban consume?~~
@@ -47,31 +43,28 @@ last_updated: 2026-06-11
 ## Logging and Monitoring
-### OQ-03: Should the health check endpoint be on a separate port?
+### ~~OQ-03: Should the health check endpoint be on a separate port?~~
 - **Origin**: [operations.md](operations.md)
-- **Status**: open
+- **Status**: resolved
 - **Priority**: low
-- **Context**: Currently the health check is on the main HTTPS listener at
-  `/health`. Alternatives: (a) separate unencrypted port for health checks
-  (simpler for load balancers but less secure), (b) admin port with its own
-  listener (more complex but isolates operational traffic), (c) on the main
-  listener (simplest, proposed approach). For a single-server deployment behind
-  no external load balancer, the main listener is fine.
-- **Cross-references**: None
+- **Resolution**: Add a configurable local health check port (default: 9900)
+  bound to `127.0.0.1` only. Health checks work even when TLS is misconfigured.
+  The main HTTPS `/health` endpoint remains available as a fallback. See
+  ADR-013.
+- **Cross-references**: ADR-013
 ## Configuration
-### OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?
+### ~~OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?~~
 - **Origin**: [config.md](config.md)
-- **Status**: open
+- **Status**: resolved
 - **Priority**: low
-- **Context**: Phase 1 uses SIGHUP for config reload, which is simple and proven.
-  A Unix domain socket API would allow programmatic reload (e.g., from an admin
-  tool or CI/CD pipeline) and could return success/failure status. This adds
-  complexity and is not needed for Phase 1.
-- **Cross-references**: None
+- **Resolution**: Yes. Add a Unix domain socket admin API alongside SIGHUP.
+  The socket accepts a `reload` command and returns structured success/failure
+  responses. SIGHUP is retained as a fallback. See ADR-014.
+- **Cross-references**: ADR-014
 ## Deployment
@@ -84,17 +77,16 @@ last_updated: 2026-06-11
   explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
   this single-server deployment. If needed in the future, `bind_addr` could be
   extended to an array. See config.md for the `bind_addr` field.
-- **Cross-references**: None
+- **Cross-references**: ADR-016
 ## Proxy
-### OQ-06: Should upstream timeouts be configurable per-site?
+### ~~OQ-06: Should upstream timeouts be configurable per-site?~~
 - **Origin**: [proxy.md](proxy.md)
-- **Status**: open
+- **Status**: resolved
 - **Priority**: low
-- **Context**: Phase 1 uses global defaults (5s connect timeout, 60s request
-  timeout) for all upstream connections. Per-site timeout configuration would
-  allow tuning for different upstream services (e.g., a slow database-backed
-  API vs. a fast static site). Not needed for Phase 1 with a single upstream.
-- **Cross-references**: None
+- **Resolution**: Yes. Per-site upstream timeouts with sensible defaults (5s
+  connect, 60s request). Optional fields in SiteConfig that override global
+  defaults when specified. See ADR-015.
+- **Cross-references**: ADR-015, ADR-017
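
The override behavior described in the OQ-06 resolution can be sketched as optional per-site fields that fall back to the global defaults. `SiteConfig` is named in the resolution, but the field names, the `UpstreamTimeouts` type, and the choice of whole seconds as the unit are assumptions for this example.

```rust
use std::time::Duration;

/// Global defaults stated in the resolution: 5s connect, 60s request.
pub const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
pub const DEFAULT_REQUEST_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical subset of SiteConfig; only the timeout fields are shown
/// and their names are assumed.
#[derive(Debug, Default, Clone)]
pub struct SiteConfig {
    pub connect_timeout_secs: Option<u64>,
    pub request_timeout_secs: Option<u64>,
}

/// Effective timeouts for one site after applying defaults.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UpstreamTimeouts {
    pub connect: Duration,
    pub request: Duration,
}

impl SiteConfig {
    /// Use the per-site value when present, otherwise the global default.
    pub fn upstream_timeouts(&self) -> UpstreamTimeouts {
        UpstreamTimeouts {
            connect: self
                .connect_timeout_secs
                .map(Duration::from_secs)
                .unwrap_or(DEFAULT_CONNECT_TIMEOUT),
            request: self
                .request_timeout_secs
                .map(Duration::from_secs)
                .unwrap_or(DEFAULT_REQUEST_TIMEOUT),
        }
    }
}
```

Each field is resolved independently, so a site that sets only a connect timeout still inherits the 60s request default.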