diff --git a/tasks/fix/acme-contact-validation.md b/tasks/fix/acme-contact-validation.md new file mode 100644 index 0000000..0ec1c6c --- /dev/null +++ b/tasks/fix/acme-contact-validation.md @@ -0,0 +1,68 @@ +--- +id: fix/acme-contact-validation +name: Validate ACME contact email has non-empty address with @ sign (W2) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [W2] +--- + +## Description + +The ACME contact validation only checks that `contact.starts_with("mailto:")` +but doesn't verify there's an actual email address after the prefix. Values +like `mailto:` (empty email) pass validation but will fail at the Let's Encrypt +API at certificate provisioning time — after the proxy has already started. + +The spec (config.md validation rule 19) now requires: "a valid `mailto:` URI +with a non-empty email address containing an `@` sign." + +### Changes Required + +**`src/config/validation.rs`** — ACME contact validation (lines 148-153): +- After checking `starts_with("mailto:")`, validate the email part: + ```rust + let contact = &listener.tls.acme_contact; + if contact.is_empty() || !contact.starts_with("mailto:") { + errors.push(ValidationError::AcmeContactInvalid { + bind_addr: listener.bind_addr.clone(), + }); + } else { + let email = &contact[7..]; // after "mailto:" + if email.is_empty() || !email.contains('@') { + errors.push(ValidationError::AcmeContactInvalid { + bind_addr: listener.bind_addr.clone(), + }); + } + } + ``` +- Add tests for: + - Valid: `mailto:admin@example.com` + - Invalid: `mailto:` (empty), `mailto:user` (no @), empty string, non-mailto + +## Acceptance Criteria + +- [ ] `mailto:` (empty email) is rejected by validation +- [ ] `mailto:user` (no @ sign) is rejected by validation +- [ ] `mailto:admin@example.com` still passes validation +- [ ] Non-mailto values still rejected +- [ ] New unit tests for the tightened validation +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/config.md — validation rule 19 +- docs/reviews/003-security-and-bug-review.md — W2 finding +- src/config/validation.rs — ACME contact validation, existing tests + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/admin-socket-reload-mutex-visibility.md b/tasks/fix/admin-socket-reload-mutex-visibility.md new file mode 100644 index 0000000..fc7b25f --- /dev/null +++ b/tasks/fix/admin-socket-reload-mutex-visibility.md @@ -0,0 +1,52 @@ +--- +id: fix/admin-socket-reload-mutex-visibility +name: Gate AdminSocket::reload_mutex with #[cfg(test)] (W11) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [W11] +--- + +## Description + +`AdminSocket::reload_mutex()` is a public method that exists solely for the +`test_reload_serialized_with_mutex` test. It exposes an internal synchronization +primitive, and the test acquires the mutex before sending a reload command — +coupling the test to implementation details. + +### Changes Required + +**`src/admin/socket.rs`**: +- Gate `reload_mutex()` with `#[cfg(test)]`: + ```rust + #[cfg(test)] + pub fn reload_mutex(&self) -> Arc> { + self.reload_mutex.clone() + } + ``` +- The existing test `test_reload_serialized_with_mutex` already uses this + method, so it will continue to work. + +## Acceptance Criteria + +- [ ] `reload_mutex()` is only available in test builds (`#[cfg(test)]`) +- [ ] The `test_reload_serialized_with_mutex` test still compiles and passes +- [ ] `cargo clippy` passes with no warnings in non-test build + +## References + +- docs/reviews/003-security-and-bug-review.md — W11 finding +- src/admin/socket.rs — `AdminSocket::reload_mutex()`, test + +## Notes + +> The review suggests alternatively removing the method entirely and testing +> serialization through observable behavior. For Phase 1, gating with +> `#[cfg(test)]` is the simpler fix that preserves the existing test. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/admin-socket-resource-limits.md b/tasks/fix/admin-socket-resource-limits.md new file mode 100644 index 0000000..2eeac01 --- /dev/null +++ b/tasks/fix/admin-socket-resource-limits.md @@ -0,0 +1,100 @@ +--- +id: fix/admin-socket-resource-limits +name: Add read timeout and line length limit to admin socket (ADR-027) +status: pending +depends_on: [] +scope: narrow +risk: low +impact: component +level: implementation +review_findings: [W4, S4] +--- + +## Description + +The admin socket's `handle_connection` reads one newline-terminated line with +`reader.read_line(&mut line)` but sets no timeout and no length limit. This +allows: +1. A client to connect and send no data, holding a connection indefinitely +2. A client to send unbounded data without a newline, causing OOM + +ADR-027 specifies: 5-second read timeout, 4096 byte line length limit. + +### Changes Required + +**`src/admin/socket.rs`** — `handle_connection` function (lines 166-210): +- Wrap the `BufReader` with `tokio::io::take` to limit read size to 4096 bytes: + ```rust + let (reader, mut writer) = stream.into_split(); + let mut reader = BufReader::new(tokio::io::take(reader, 4096)); + let mut line = String::new(); + ``` +- Wrap the `read_line` call in a `tokio::time::timeout`: + ```rust + use std::time::Duration; + let read_result = tokio::time::timeout( + Duration::from_secs(5), + reader.read_line(&mut line), + ).await; + ``` +- Handle timeout and line-too-long cases: + ```rust + match read_result { + Ok(Ok(0)) | Ok(Err(_)) => { + // existing "invalid input" handling + } + Err(_) => { + // timeout + tracing::debug!("admin socket connection timed out"); + let _ = writer.write_all( + format!("{}\n", serde_json::to_string(&ErrorResponse { + status: "error", + message: "read timeout".to_string(), + }).unwrap()).as_bytes() + ).await; + return; + } + Ok(Ok(n)) => { + // Check if line was truncated (no newline found within limit) + if !line.ends_with('\n') && n > 0 { + tracing::warn!("admin socket command exceeded 4096 byte limit"); + let _ = writer.write_all( + format!("{}\n", serde_json::to_string(&ErrorResponse { + status: "error", + message: "command too long".to_string(), + }).unwrap()).as_bytes() + ).await; + return; + } + // ... existing command handling + } + } + ``` +- Update existing tests and add new tests for timeout and line length limit. + +## Acceptance Criteria + +- [ ] Read timeout of 5 seconds applied to admin socket connections +- [ ] Line length limit of 4096 bytes applied (via `tokio::io::take`) +- [ ] Timeout logged at `debug` level per ADR-027 +- [ ] Line-too-long logged at `warn` level per ADR-027 +- [ ] Both conditions return appropriate error JSON to the client +- [ ] Legitimate commands (`reload`, `status`) still work +- [ ] New tests for timeout and line length limit behavior +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/decisions/027-admin-socket-resource-limits.md — ADR-027 +- docs/architecture/operations.md — admin socket resource limits +- docs/reviews/003-security-and-bug-review.md — W4, S4 findings +- src/admin/socket.rs — `handle_connection` + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/connector-timeout-ceiling.md b/tasks/fix/connector-timeout-ceiling.md new file mode 100644 index 0000000..332b480 --- /dev/null +++ b/tasks/fix/connector-timeout-ceiling.md @@ -0,0 +1,58 @@ +--- +id: fix/connector-timeout-ceiling +name: Raise connector timeout ceiling to 30s per ADR-026 +status: pending +depends_on: [] +scope: single +risk: low +impact: component +level: implementation +review_findings: [C3] +--- + +## Description + +The HTTP connector's `set_connect_timeout` is hardcoded to 5 seconds. Per-site +connect timeout values > 5s are silently capped because the connector's internal +timeout fires before the `tokio::time::timeout` wrapper. + +ADR-026 establishes a 30-second ceiling on the connector. The per-site +`tokio::time::timeout` enforces the actual per-site connect timeout. The +connector ceiling is a safety backstop, not the primary enforcement mechanism. + +### Changes Required + +**`src/proxy/handler.rs`**: +- Change `DEFAULT_CONNECT_TIMEOUT_SECS` from 5 to 30: + ```rust + const CONNECT_TIMEOUT_CEILING_SECS: u64 = 30; + ``` +- Rename the constant to make its role clear (ceiling, not the default connect + timeout for sites). +- Update both `create_http_client` and `create_https_client` to use the renamed + constant. + +## Acceptance Criteria + +- [ ] Connector `set_connect_timeout` is set to 30 seconds +- [ ] Constant is named to reflect its ceiling role (not "default") +- [ ] Per-site connect timeout (default 5s) via `tokio::time::timeout` is unchanged +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/decisions/026-connector-timeout-ceiling.md — ADR-026 +- docs/architecture/proxy.md — Upstream connection section +- docs/reviews/003-security-and-bug-review.md — C3 finding +- src/proxy/handler.rs — `create_http_client`, `create_https_client` + +## Notes + +> The previous `fix/connect-timeout` task wired the two-phase timeout approach. +> This task completes that work by raising the connector ceiling so per-site +> timeouts > 5s actually work. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/consolidate-config-types.md b/tasks/fix/consolidate-config-types.md new file mode 100644 index 0000000..debc988 --- /dev/null +++ b/tasks/fix/consolidate-config-types.md @@ -0,0 +1,95 @@ +--- +id: fix/consolidate-config-types +name: Delete RawConfig and use FullConfig in load_config (W6, S5) +status: pending +depends_on: [] +scope: narrow +risk: medium +impact: component +level: implementation +review_findings: [W6, S5] +--- + +## Description + +`RawConfig` (in `src/cli.rs`) and `FullConfig` (in `src/config/mod.rs`) have +identical fields and identical serde attributes. They exist because the initial +load path manually constructs `StaticConfig` + `SerializableDynamicConfig`, +while the reload path uses `FullConfig::into_static_and_dynamic()`. Any new +config field must be added in two places. + +The fix is to delete `RawConfig` and use `FullConfig` in `load_config`. The +`collect_sites` helper can also be removed since `into_static_and_dynamic` +already collects sites from all listeners. + +### Changes Required + +**`src/cli.rs`**: +- Delete the `RawConfig` struct (lines 49-65) +- Rewrite `load_config` to use `FullConfig`: + ```rust + pub fn load_config(cli: &Cli) -> Result { + let config_path = Path::new(&cli.config); + let config_content = std::fs::read_to_string(config_path) + .with_context(|| format!("failed to read config file: {}", cli.config))?; + + let full_config = crate::config::FullConfig::parse(&config_content) + .with_context(|| format!("failed to parse config file: {}", cli.config))?; + + let (static_config, dynamic_config) = full_config.into_static_and_dynamic(); + + let allow_wildcard_bind = static_config.allow_wildcard_bind || cli.allow_wildcard_bind; + + validate(&static_config, &dynamic_config, cli.allow_wildcard_bind).map_err(|errors| { + anyhow::anyhow!( + "config validation failed:\n{}", + errors.iter() + .map(|e| format!(" - {}", e)) + .collect::>() + .join("\n") + ) + })?; + + Ok(LoadedConfig { + static_config, + dynamic_config, + allow_wildcard_bind, + }) + } + ``` +- Delete the `collect_sites` helper function (lines 112-118) +- Remove the now-unused imports: `SerializableDynamicConfig`, `BodyConfig`, + `RateLimitConfig` (if they're only used via `RawConfig`) + +**`src/config/mod.rs`**: +- No changes needed — `FullConfig` already has `into_static_and_dynamic()` +- Verify `FullConfig::parse` and `into_static_and_dynamic` produce identical + results to the old `RawConfig` path + +## Acceptance Criteria + +- [ ] `RawConfig` struct deleted from `src/cli.rs` +- [ ] `collect_sites` function deleted from `src/cli.rs` +- [ ] `load_config` uses `FullConfig::parse` + `into_static_and_dynamic` +- [ ] Startup config loading produces same results as before +- [ ] Config validation still runs on both startup and reload +- [ ] All existing `cli.rs` tests pass +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/config.md — config reload, FullConfig +- docs/reviews/003-security-and-bug-review.md — W6, S5 findings +- src/cli.rs — RawConfig, load_config, collect_sites +- src/config/mod.rs — FullConfig, into_static_and_dynamic + +## Notes + +> This changes the startup config parsing path. While the behavior should be +> identical (same fields, same serde attributes), this is a sensitive area. A +> review task follows the code quality generation. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/http-port-type-u16.md b/tasks/fix/http-port-type-u16.md new file mode 100644 index 0000000..8281efb --- /dev/null +++ b/tasks/fix/http-port-type-u16.md @@ -0,0 +1,77 @@ +--- +id: fix/http-port-type-u16 +name: Change http_port type from u32 to u16 per spec (W12) +status: pending +depends_on: [] +scope: narrow +risk: low +impact: component +level: implementation +review_findings: [W12] +--- + +## Description + +`http_port` is declared as `u32` in `ListenerConfig` but `https_port` is `u16`. +Both represent TCP port numbers (valid range 1–65535). The type inconsistency +means comparisons require casting (`listener.http_port == listener.https_port as +u32`) and `http_port` could theoretically hold values > 65535 caught by +validation rather than the type system. + +The spec (config.md) now declares `http_port` as `u16`. + +### Changes Required + +**`src/config/static_config.rs`**: +- Change `http_port` field type from `u32` to `u16` in `ListenerConfig` +- Update `default_http_port()` return type to `u16` + +**`src/config/validation.rs`**: +- Change `DuplicateHttpBind` error type: `http_port: u32` → `http_port: u16` +- Change `HttpsAndHttpPortSame` error type: `http_port: u32` → `http_port: u16` +- Change `HttpPortInvalid` error type: `http_port: u32` → `http_port: u16` +- Remove `as u32` casts — both `http_port` and `https_port` are now `u16` +- Remove the `http_port > 65535` check (impossible with `u16`, but keep `http_port > 0` + for the "disabled" check) +- Update comparison: `listener.http_port == listener.https_port` (no cast needed) +- Update health check port comparison: remove `as u32` cast + +**`src/main.rs`**: +- Update any `http_port` references that assume `u32` + +**`src/cli.rs`**: +- Update `RawConfig.http_port` type from `u32` to `u16` (if `RawConfig` still + exists after `fix/consolidate-config-types`; if not, this file is unaffected) + +**`src/config/test_fixtures.rs`**: +- Update any test fixture `http_port` values from `u32` to `u16` + +**`tests/integration_test.rs`**: +- Update any hardcoded `http_port` values + +## Acceptance Criteria + +- [ ] `http_port` is `u16` in `ListenerConfig` +- [ ] All `as u32` casts on `http_port` removed +- [ ] `http_port > 65535` validation check removed (impossible with u16) +- [ ] `http_port == https_port` comparison works without casting +- [ ] All validation tests pass +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/config.md — `http_port` type declaration +- docs/reviews/003-security-and-bug-review.md — W12 finding +- src/config/static_config.rs — `ListenerConfig` struct +- src/config/validation.rs — validation rules, error types + +## Notes + +> If `fix/consolidate-config-types` runs first and removes `RawConfig`, the +> `src/cli.rs` changes in this task are reduced. The two tasks are independent +> in terms of the type change itself. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/inflight-counter-increment.md b/tasks/fix/inflight-counter-increment.md new file mode 100644 index 0000000..a998080 --- /dev/null +++ b/tasks/fix/inflight-counter-increment.md @@ -0,0 +1,74 @@ +--- +id: fix/inflight-counter-increment +name: Fix InFlightCounter to increment before spawning task (C2 + drain interval) +status: pending +depends_on: [] +scope: narrow +risk: medium +impact: component +level: implementation +review_findings: [C2] +--- + +## Description + +`InFlightCounter::increment()` is never called anywhere in the codebase. The +`InFlightGuard` only decrements on drop. Since `count` stays at 0, the first +guard drop does `fetch_sub(1)` on an `AtomicUsize` with value 0, which wraps to +`usize::MAX`. `is_zero()` checks `count == 0`, which never becomes true again. +The drain logic in `drain_in_flight` always times out. + +The spec (operations.md shutdown sequence) states: "each request **must** +increment the counter when it begins and decrement when it completes (via guard +drop). The increment must happen before the request task is spawned." + +Additionally, the spec states the drain polls every 100ms, but the current +implementation uses 50ms. Align with the spec. + +### Changes Required + +**`src/server.rs`**: +- Fold the increment into `InFlightGuard::new()` so it's impossible to forget: + ```rust + impl InFlightGuard { + fn new(counter: Arc) -> Self { + counter.increment(); + Self(counter) + } + } + ``` +- Update `serve_https_listener` to use `InFlightGuard::new(in_flight.clone())` + instead of `InFlightGuard(in_flight.clone())`. +- Make `InFlightGuard`'s tuple struct private (if it isn't already) so callers + must use `new()`. + +**`src/server.rs` — `drain_in_flight`**: +- Change polling interval from 50ms to 100ms per the spec (operations.md): + ```rust + tokio::time::sleep(std::time::Duration::from_millis(100)).await; + ``` + +## Acceptance Criteria + +- [ ] `InFlightGuard::new()` calls `counter.increment()` before returning +- [ ] `InFlightGuard` is constructed via `new()` only, not the tuple struct +- [ ] `serve_https_listener` uses `InFlightGuard::new(in_flight.clone())` +- [ ] `drain_in_flight` polls every 100ms (not 50ms) +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/operations.md — shutdown sequence, in-flight counter +- docs/reviews/003-security-and-bug-review.md — C2 finding +- src/server.rs — InFlightCounter, InFlightGuard, drain_in_flight +- src/main.rs — drain_in_flight caller + +## Notes + +> The previous `fix/graceful-shutdown` task addressed the abort-vs-join logic +> but did not fix the increment bug. This task completes that work. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/json-format-without-logfile.md b/tasks/fix/json-format-without-logfile.md new file mode 100644 index 0000000..7f457d8 --- /dev/null +++ b/tasks/fix/json-format-without-logfile.md @@ -0,0 +1,58 @@ +--- +id: fix/json-format-without-logfile +name: Fix JSON format not applied when no log file is configured (C4) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [C4] +--- + +## Description + +When `format = "json"` is configured but no `log_file_path` is set, the `None` +branch of `init_json` creates a layer **without** calling `.json()`. The output +is plain text, not JSON. The `format = "json"` config value is silently ignored. + +The spec (operations.md) states: "Both output destinations must respect the +`format` config value: when `format = "json"`, both file and stdout output must +use JSON formatting." + +### Changes Required + +**`src/logging/mod.rs`** — `init_json` function, `None` branch (lines 54-58): +- Add `.json()` to the stdout-only layer: + ```rust + None => { + let layer = tracing_subscriber::fmt::layer() + .json() + .with_ansi(false) + .with_filter(env_filter); + tracing_subscriber::registry().with(layer).try_init()?; + } + ``` + +## Acceptance Criteria + +- [ ] `.json()` is called in the `None` branch of `init_json` +- [ ] `format = "json"` with no `log_file_path` produces JSON output on stdout +- [ ] `format = "json"` with `log_file_path` still produces JSON on both outputs +- [ ] `format = "text"` paths are unchanged +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/operations.md — logging output, format guarantee +- docs/reviews/003-security-and-bug-review.md — C4 finding +- src/logging/mod.rs — `init_json` function + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/log-root-cert-count.md b/tasks/fix/log-root-cert-count.md new file mode 100644 index 0000000..d0d166a --- /dev/null +++ b/tasks/fix/log-root-cert-count.md @@ -0,0 +1,66 @@ +--- +id: fix/log-root-cert-count +name: Log system root certificate count at startup (S3) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [S3] +--- + +## Description + +`root_certs()` loads native certificates silently — only logs errors. If the +system has zero root certificates (misconfigured CA bundle), all HTTPS upstream +connections will fail with opaque TLS errors and no diagnostic message. + +### Changes Required + +**`src/proxy/handler.rs`** — `root_certs()` function (lines 246-258): +- Add an info-level log with cert count and warn if zero: + ```rust + fn root_certs() -> rustls::RootCertStore { + let mut roots = rustls::RootCertStore::empty(); + let result = rustls_native_certs::load_native_certs(); + for cert in result.certs { + roots.add(cert).ok(); + } + let cert_count = roots.len(); + let error_count = result.errors.len(); + if cert_count == 0 { + warn!(certs_loaded = cert_count, errors = error_count, + "no system root certificates loaded — HTTPS upstream connections will fail"); + } else { + info!(certs_loaded = cert_count, errors = error_count, + "loaded system root certificates"); + } + for err in &result.errors { + warn!(error = %err, "failed to load native certificate"); + } + roots + } + ``` + +## Acceptance Criteria + +- [ ] Info-level log with cert count when certs > 0 +- [ ] Warn-level log when cert count is 0 +- [ ] Error count included in log output +- [ ] Individual cert load errors still logged at warn level +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/reviews/003-security-and-bug-review.md — S3 finding +- src/proxy/handler.rs — `root_certs()` function + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/rate-limiter-connectinfo-tests.md b/tasks/fix/rate-limiter-connectinfo-tests.md new file mode 100644 index 0000000..38e2d73 --- /dev/null +++ b/tasks/fix/rate-limiter-connectinfo-tests.md @@ -0,0 +1,67 @@ +--- +id: fix/rate-limiter-connectinfo-tests +name: Update rate limiter tests to use ConnectInfo instead of X-Forwarded-For (S10) +status: pending +depends_on: [fix/rate-limiter-ip-source] +scope: narrow +risk: medium +impact: component +level: implementation +review_findings: [S10] +--- + +## Description + +After `fix/rate-limiter-ip-source` removes X-Forwarded-For parsing from the rate +limiter, existing integration tests that pass client IPs via `X-Forwarded-For` +headers will no longer work correctly. The rate limiter now reads exclusively +from `ConnectInfo`, so tests must provide `ConnectInfo` in request +extensions. + +This task also addresses review finding S10: "No test verifies the ConnectInfo +extraction path, which is the primary path after C1 is fixed." + +### Changes Required + +**`tests/integration_test.rs`**: +- Find all rate limit integration tests (lines 90-163 and any others) that set + `X-Forwarded-For` headers for rate limiting purposes +- Update those tests to set `ConnectInfo` in request extensions + instead of (or in addition to) `X-Forwarded-For` +- Verify that `X-Forwarded-For` headers are now **ignored** by the rate limiter + — add a test that sends requests with different `X-Forwarded-For` values from + the same `ConnectInfo` IP and confirms they all count against the same bucket +- Add a test that verifies requests **without** `ConnectInfo` are rejected with + 429 (per ADR-025) + +**`src/rate_limit/mod.rs`** (test module): +- Update any unit tests that rely on `X-Forwarded-For` extraction +- Add a test that verifies `ConnectInfo`-based rate limiting works without any + `X-Forwarded-For` header + +## Acceptance Criteria + +- [ ] All rate limit integration tests use `ConnectInfo` for client IP +- [ ] Tests verify `X-Forwarded-For` headers are ignored by rate limiter +- [ ] New test: requests without `ConnectInfo` are rejected with 429 +- [ ] New test: different `X-Forwarded-For` values from same `ConnectInfo` IP + count against the same bucket (proving XFF is ignored) +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025 +- docs/reviews/003-security-and-bug-review.md — S10 finding +- tests/integration_test.rs — rate limit tests +- src/rate_limit/mod.rs — rate limiter middleware and tests + +## Notes + +> This task depends on `fix/rate-limiter-ip-source` because the code changes +> must be in place before the tests can be updated. Attempting this before the +> C1 fix would require writing tests for the old (broken) behavior. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/rate-limiter-ip-source.md b/tasks/fix/rate-limiter-ip-source.md new file mode 100644 index 0000000..0d1058a --- /dev/null +++ b/tasks/fix/rate-limiter-ip-source.md @@ -0,0 +1,78 @@ +--- +id: fix/rate-limiter-ip-source +name: Fix rate limiter to use ConnectInfo only (ADR-025) +status: pending +depends_on: [] +scope: narrow +risk: high +impact: component +level: implementation +review_findings: [C1] +--- + +## Description + +The rate limiter currently extracts the client IP by checking the `X-Forwarded-For` +header **first**, then falling back to `ConnectInfo`. This is a security +vulnerability: since the rate limiter runs as middleware before the proxy +handler injects trusted headers, the `X-Forwarded-For` value is whatever the +client sent — completely untrusted. This enables two attack vectors: + +1. **Rate limit bypass**: Attacker sends each request with a different random + `X-Forwarded-For` value, evading per-IP token buckets entirely. +2. **DoS via IP spoofing**: Attacker sends requests with `X-Forwarded-For: + `, depleting the victim's bucket. + +ADR-025 establishes that the rate limiter must use `ConnectInfo` as +the **sole** source of client IP. If `ConnectInfo` is absent, the request must +be **rejected** (not fall back to an untrusted header). + +### Changes Required + +**`src/rate_limit/mod.rs`**: +- Replace the current IP extraction logic in `rate_limit_middleware` (lines 66-76) + with ConnectInfo-first, no fallback: + ```rust + let client_ip = req + .extensions() + .get::>() + .map(|ci| ci.ip()); + ``` +- When `ConnectInfo` is absent, return 429 (reject the request) rather than + passing through without rate limiting. Per ADR-025: "If ConnectInfo is absent, + the request must be rejected rather than falling back to an untrusted header." + ```rust + let Some(ip) = client_ip else { + return (StatusCode::TOO_MANY_REQUESTS, "Too Many Requests").into_response(); + }; + ``` +- Remove all `X-Forwarded-For` header parsing from the rate limiter middleware. + +## Acceptance Criteria + +- [ ] Rate limiter extracts client IP from `ConnectInfo` only +- [ ] No `X-Forwarded-For` header parsing in the rate limiter middleware +- [ ] Requests without `ConnectInfo` are rejected with 429 (not passed through) +- [ ] `cargo test` passes (note: integration tests that pass `X-Forwarded-For` + for rate limiting will need to be updated in the separate test task + `fix/rate-limiter-connectinfo-tests`) +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025 +- docs/architecture/operations.md — Rate limiting design, IP source section +- docs/architecture/proxy.md — Rate limiter IP source section +- docs/reviews/003-security-and-bug-review.md — C1 finding +- src/rate_limit/mod.rs — current implementation (lines 61-104) + +## Notes + +> This is the highest-priority security fix. After this change, the integration +> tests in `tests/integration_test.rs` that rely on `X-Forwarded-For` for rate +> limiting will need to be updated. That work is tracked separately in +> `fix/rate-limiter-connectinfo-tests`. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/remove-dead-code-remnants.md b/tasks/fix/remove-dead-code-remnants.md new file mode 100644 index 0000000..de67271 --- /dev/null +++ b/tasks/fix/remove-dead-code-remnants.md @@ -0,0 +1,96 @@ +--- +id: fix/remove-dead-code-remnants +name: Remove dead code items identified in security review #003 (S1) +status: pending +depends_on: [] +scope: narrow +risk: trivial +impact: component +level: implementation +review_findings: [S1] +--- + +## Description + +Security review #003 identifies the following dead code items that survived the +previous `fix/clean-dead-code` task. These items are defined but never called in +production code: + +| Item | File | Note | +|------|------|------| +| `log_rate_limit!` macro | `src/logging/format.rs:72-83` | Rate limiter uses `warn!` directly | +| `log_config_reload!` macro | `src/logging/format.rs:97-106` | Reload uses `info!`/`warn!` directly | +| `format_event_fields()` | `src/logging/format.rs:50-54` | Never called | +| `ProxyError::PayloadTooLarge` | `src/proxy/error.rs:12` | Body limit returns tuple directly | +| `ProxyError::NotFound` | `src/proxy/error.rs:20` | Code uses `UnknownHost` instead | +| `ProxyError::BadRequest` | `src/proxy/error.rs:22` | Code uses `MissingHost` instead | +| `ProxyError::UpstreamTls` | `src/proxy/error.rs:28` | Never constructed | +| `build_multi_domain_server_config()` | `src/tls/config.rs:78-102` | Never called | +| `SniCertResolver` | `src/tls/config.rs:104-126` | Only used by above | +| `AcmeTlsConfig::directory_url()` | `src/tls/acme.rs:53-59` | Only used in tests | + +### Changes Required + +For each item, either wire it up (if it should be used) or remove it: + +**Logging macros** (`src/logging/format.rs`): +- `log_rate_limit!`: The rate limiter middleware now uses `warn!` directly. + Remove the macro and its test invocation in `log_macros_compile`. +- `log_config_reload!`: The admin socket reload uses `info!`/`warn!` directly. + Remove the macro and its test invocation. +- `format_event_fields()`: Never called externally. Remove it. + +**ProxyError variants** (`src/proxy/error.rs`): +- `PayloadTooLarge`: Not used by body limit middleware (returns tuple directly). + Keep for now with a comment explaining it's reserved for future use, OR remove + if no plan to use it. The `#[non_exhaustive]` attribute means removing a variant + is not a breaking change for external consumers. +- `NotFound`: Superseded by `UnknownHost`. Remove and update `status_code()` and + `body()` match arms. +- `BadRequest`: Superseded by `MissingHost`. Remove and update match arms. +- `UpstreamTls`: Never constructed. Remove and update match arms. +- Update unit tests that reference removed variants. + +**TLS dead code** (`src/tls/config.rs`, `src/tls/acme.rs`): +- `build_multi_domain_server_config()`: Remove function. +- `SniCertResolver`: Remove struct and impl (only used by above function). +- `AcmeTlsConfig::directory_url()`: Check if it's used in tests. If only in + tests, gate with `#[cfg(test)]`. If truly unused, remove. + +**Test helpers** (`tests/helpers/http_test_helper.rs`): +- `TestUpstream::url()`, `TestUpstream::upstream_addr()`: Currently annotated + `#[allow(dead_code)]`. If not used in any test, remove. If used, remove the + `#[allow(dead_code)]` annotation. + +## Acceptance Criteria + +- [ ] `log_rate_limit!` and `log_config_reload!` macros removed +- [ ] `format_event_fields()` removed +- [ ] Unused `ProxyError` variants removed (`NotFound`, `BadRequest`, `UpstreamTls`) +- [ ] `PayloadTooLarge` either removed or documented with a TODO comment +- [ ] `build_multi_domain_server_config()` and `SniCertResolver` removed +- [ ] `directory_url()` either `#[cfg(test)]`-gated or removed +- [ ] Dead test helper methods removed or `#[allow(dead_code)]` removed +- [ ] All match arms in `ProxyError` methods updated for removed variants +- [ ] All unit tests updated for removed variants +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/reviews/003-security-and-bug-review.md — S1 finding +- src/logging/format.rs — dead macros and function +- src/proxy/error.rs — unused ProxyError variants +- src/tls/config.rs — dead TLS functions +- src/tls/acme.rs — directory_url method +- tests/helpers/http_test_helper.rs — dead_code-annotated helpers + +## Notes + +> The previous `fix/clean-dead-code` task added `#[non_exhaustive]` and removed +> some dead code. This task addresses the items that survived that round or were +> newly identified in review #003. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/rename-misleading-test.md b/tasks/fix/rename-misleading-test.md new file mode 100644 index 0000000..b6ea5c3 --- /dev/null +++ b/tasks/fix/rename-misleading-test.md @@ -0,0 +1,57 @@ +--- +id: fix/rename-misleading-test +name: Rename misleading health check test and fix dynamic config test (W8, W9) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [W8, W9] +--- + +## Description + +Two test quality issues in `tests/integration_test.rs`: + +**W8**: `test_health_check_disabled_when_port_zero` is misleading — port 0 means +"OS picks a random port" not "disabled". The test verifies that +`start_health_check_listener(0)` works (binds to a random port), which is +correct but the name implies it tests the disable path. The actual disable +logic is in `main.rs:94` where `health_check_port > 0` gates the call. + +**W9**: `test_dynamic_config_with_limit` constructs `DynamicConfig` directly +with `routing_table: Default::default()` (empty HashMap), bypassing +`DynamicConfig::from_sites()`. The body limit tests only read +`body.limit_bytes` so it doesn't matter, but it's a latent trap for anyone +copying this pattern. + +### Changes Required + +**`tests/integration_test.rs`**: +- Rename `test_health_check_disabled_when_port_zero` to + `test_health_check_binds_random_port_when_zero` +- Update `test_dynamic_config_with_limit` to use `DynamicConfig::from_sites()` + instead of directly constructing with an empty routing table, OR add a + comment explaining why the routing table is intentionally empty for this test + +## Acceptance Criteria + +- [ ] Test renamed to `test_health_check_binds_random_port_when_zero` +- [ ] `test_dynamic_config_with_limit` either uses `from_sites()` or has a + comment explaining the intentional empty routing table +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/reviews/003-security-and-bug-review.md — W8, W9 findings +- tests/integration_test.rs — test names and implementations + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/tls-mode-wildcard-mismatch.md b/tasks/fix/tls-mode-wildcard-mismatch.md new file mode 100644 index 0000000..fd02721 --- /dev/null +++ b/tasks/fix/tls-mode-wildcard-mismatch.md @@ -0,0 +1,62 @@ +--- +id: fix/tls-mode-wildcard-mismatch +name: Add explicit listener/acceptor count check or remove TlsMode wildcard (W5) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [W5] +--- + +## Description + +The `match tls_mode` in `main.rs` has a wildcard `_` arm that logs a warning +and pushes **no** acceptor. Then `bound_listeners.into_iter().zip(tls_acceptors.into_iter())` +uses `zip`, which silently stops at the shorter iterator. If the wildcard arm +were ever reached, some listeners would have no TLS acceptor and would be +silently dropped. + +`setup_tls` already rejects unknown modes with `bail!`, so the wildcard is +unreachable in practice. But it's a latent bug for future refactors. + +### Changes Required + +**`src/main.rs`** (lines 170-194): +- Option A (preferred): Remove the wildcard `_` arm entirely. Since `TlsMode` + only has two variants (`Manual` and `Acme`) and `setup_tls` already validates, + the wildcard is dead code. Removing it means the compiler will catch future + `TlsMode` additions. + +- Option B: Add an explicit count check after the match loop: + ```rust + if bound_listeners.len() != tls_acceptors.len() { + anyhow::bail!("listener/acceptor count mismatch: {} listeners, {} acceptors", + bound_listeners.len(), tls_acceptors.len()); + } + ``` + + If removing the wildcard, this check is redundant but harmless as a + defense-in-depth assertion. + +## Acceptance Criteria + +- [ ] Wildcard `_` arm removed from the `match tls_mode` block, OR +- [ ] Explicit count mismatch check added after the acceptor construction loop +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/reviews/003-security-and-bug-review.md — W5 finding +- src/main.rs — TLS acceptor construction loop (lines 170-194) +- src/tls/acceptor.rs — `setup_tls`, `TlsMode` enum + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/token-bucket-field-visibility.md b/tasks/fix/token-bucket-field-visibility.md new file mode 100644 index 0000000..7a3e552 --- /dev/null +++ b/tasks/fix/token-bucket-field-visibility.md @@ -0,0 +1,51 @@ +--- +id: fix/token-bucket-field-visibility +name: Make TokenBucket fields private except last_access (W10, S6) +status: pending +depends_on: [] +scope: single +risk: trivial +impact: isolated +level: implementation +review_findings: [W10, S6] +--- + +## Description + +All `TokenBucket` fields are `pub` but only `last_access` is read externally (by +`evict_stale` in `rate_limit/mod.rs`). The other fields (`tokens`, `last_refill`, +`rate`, `max`) should be private to prevent accidental direct mutation that +bypasses `try_consume`/`refill` logic. + +### Changes Required + +**`src/rate_limit/bucket.rs`**: +- Make `tokens`, `last_refill`, `rate`, `max` private (remove `pub`) +- Keep `last_access` as `pub(crate)` for `evict_stale` access +- `TokenBucket::new()` already exists as a constructor, so no changes needed there +- Update any unit tests that directly access private fields. The tests in + `bucket.rs` are in the same module so they have access to private fields. + Tests in `mod.rs` may need adjustment if they access `bucket.tokens` etc. + +## Acceptance Criteria + +- [ ] `tokens`, `last_refill`, `rate`, `max` fields are private +- [ ] `last_access` is `pub(crate)` +- [ ] `new()` constructor is the only way to create a `TokenBucket` externally +- [ ] `evict_stale` still compiles and works (uses `last_access`) +- [ ] All unit tests pass (in-module tests can still access private fields) +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/reviews/003-security-and-bug-review.md — W10, S6 findings +- src/rate_limit/bucket.rs — TokenBucket struct +- src/rate_limit/mod.rs — evict_stale + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/upstream-host-validation.md b/tasks/fix/upstream-host-validation.md new file mode 100644 index 0000000..5c4e407 --- /dev/null +++ b/tasks/fix/upstream-host-validation.md @@ -0,0 +1,83 @@ +--- +id: fix/upstream-host-validation +name: Validate host part of upstream address in config (W1) +status: pending +depends_on: [] +scope: narrow +risk: low +impact: component +level: implementation +review_findings: [W1] +--- + +## Description + +`is_valid_upstream` checks that the upstream has a `host:port` format with a +valid port number, but performs **no validation on the host part** beyond +checking it's non-empty and doesn't start with `http://` or `https://`. Values +like `!!!bad!!!:3000` or `@#$%:8080` pass validation. + +The spec (config.md validation rule 17) now requires: "the host part must parse +as a valid IpAddr or pass is_valid_hostname validation." Bracket-enclosed values +must be parsed as IPv6 addresses. + +### Changes Required + +**`src/config/validation.rs`** — `is_valid_upstream` function (lines 309-327): +- After validating the port, validate the host part: + ```rust + fn is_valid_upstream(upstream: &str) -> bool { + if let Some(idx) = upstream.rfind(':') { + let host_part = &upstream[..idx]; + let port_str = &upstream[idx + 1..]; + if host_part.is_empty() { return false; } + if upstream.starts_with("http://") || upstream.starts_with("https://") { return false; } + let port: u16 = match port_str.parse() { Ok(p) => p, Err(_) => return false }; + if port == 0 { return false; } + // Validate host part per config.md rule 17 + if host_part.starts_with('[') && host_part.ends_with(']') { + let inner = &host_part[1..host_part.len()-1]; + inner.parse::().is_ok() + } else { + host_part.parse::().is_ok() || is_valid_hostname(host_part) + } + } else { + false + } + } + ``` +- Note: `is_valid_hostname` already exists in the same file and is used for site + `host` validation. It rejects IP addresses, which is correct for site hosts + but wrong for upstream hosts — upstream hosts CAN be IPs. The upstream + validation must check `IpAddr::parse` first, then fall back to + `is_valid_hostname` for DNS names. + +- Add tests for: + - Valid: `gitea:3000`, `127.0.0.1:3000`, `[::1]:3000` + - Invalid: `!!!bad!!!:3000`, `@#$%:8080`, `:3000` (empty host) + +## Acceptance Criteria + +- [ ] `is_valid_upstream` validates the host part as IP address or valid hostname +- [ ] IPv6 bracket notation is handled (e.g., `[::1]:3000`) +- [ ] Invalid host characters like `!!!bad!!!:3000` are rejected +- [ ] Valid upstream formats still pass: `gitea:3000`, `127.0.0.1:3000` +- [ ] New unit tests for valid and invalid upstream host parts +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/config.md — validation rule 17 +- docs/reviews/003-security-and-bug-review.md — W1 finding +- src/config/validation.rs — `is_valid_upstream`, `is_valid_hostname` + +## Notes + +> `is_valid_hostname` currently rejects IP addresses (intentional for site +> hosts). The upstream validation must handle IPs separately before falling +> back to `is_valid_hostname`. + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/fix/upstream-uri-error-handling.md b/tasks/fix/upstream-uri-error-handling.md new file mode 100644 index 0000000..90da494 --- /dev/null +++ b/tasks/fix/upstream-uri-error-handling.md @@ -0,0 +1,80 @@ +--- +id: fix/upstream-uri-error-handling +name: Return 502 on upstream URI parse failure instead of dropping query string (W3) +status: pending +depends_on: [] +scope: narrow +risk: low +impact: component +level: implementation +review_findings: [W3] +--- + +## Description + +`build_upstream_uri` silently drops the query string on parse failure and +`.unwrap()`s the fallback. This corrupts requests — the upstream receives the +wrong URL with no query parameters, and neither the client nor operator is +notified. The `.unwrap()` on the fallback could also panic. + +The spec (proxy.md request forwarding step 1) states: "If URI construction +fails (e.g., the resulting URI is malformed), the proxy must return 502 Bad +Gateway and log the error at warn level. The proxy must never silently drop +parts of the URI." + +### Changes Required + +**`src/proxy/handler.rs`**: +- Change `build_upstream_uri` to return `Result`: + ```rust + fn build_upstream_uri(scheme: &str, upstream: &str, original_uri: &Uri) -> Result { + let path = original_uri.path(); + let query = original_uri + .query() + .map(|q| format!("?{}", q)) + .unwrap_or_default(); + let uri_string = format!("{}://{}{}{}", scheme, upstream, path, query); + uri_string.parse::().map_err(|e| { + warn!(error = %e, uri = %uri_string, "failed to parse upstream URI"); + }) + } + ``` +- Update `proxy_handler` to handle the `Err` case: + ```rust + let upstream_uri = match build_upstream_uri(&upstream_scheme, &upstream, req.uri()) { + Ok(uri) => uri, + Err(()) => { + log_upstream_error!(&host_owned, &upstream_addr, "malformed upstream URI"); + let duration_ms = start.elapsed().as_millis() as u64; + log_request!(&client_ip, &host_owned, &method, &path, 502u16, &upstream, duration_ms); + return StatusCode::BAD_GATEWAY.into_response(); + } + }; + ``` +- Update the existing unit tests for `build_upstream_uri` to handle the + `Result` return type. + +## Acceptance Criteria + +- [ ] `build_upstream_uri` returns `Result` instead of `Uri` +- [ ] URI parse failure logs a warning with the malformed URI string +- [ ] URI parse failure returns 502 Bad Gateway to the client +- [ ] No `.unwrap()` calls in `build_upstream_uri` +- [ ] Query strings are never silently dropped +- [ ] Existing unit tests updated for `Result` return type +- [ ] `cargo test` passes +- [ ] `cargo clippy` passes with no warnings + +## References + +- docs/architecture/proxy.md — request forwarding step 1, URI error handling +- docs/reviews/003-security-and-bug-review.md — W3 finding +- src/proxy/handler.rs — `build_upstream_uri`, `proxy_handler` + +## Notes + +> To be filled on completion + +## Summary + +> To be filled on completion \ No newline at end of file diff --git a/tasks/review/post-security-fix-review.md b/tasks/review/post-security-fix-review.md new file mode 100644 index 0000000..51b6ef3 --- /dev/null +++ b/tasks/review/post-security-fix-review.md @@ -0,0 +1,60 @@ +--- +id: review/post-security-fix-review +name: Review security fix implementations before production consideration +status: pending +depends_on: + - fix/rate-limiter-ip-source + - fix/inflight-counter-increment + - fix/connector-timeout-ceiling + - fix/json-format-without-logfile + - fix/upstream-host-validation + - fix/acme-contact-validation + - fix/upstream-uri-error-handling + - fix/admin-socket-resource-limits + - fix/consolidate-config-types + - fix/rate-limiter-connectinfo-tests +scope: moderate +risk: low +impact: project +level: review +--- + +## Description + +Review all security and bug fix implementations from Review #003 before +considering them production-ready. Verify that the fixes correctly implement +the architecture decisions (ADR-025, ADR-026, ADR-027) and the updated spec +documents. + +## Acceptance Criteria + +- [ ] C1 fix: Rate limiter uses ConnectInfo only, rejects without it (ADR-025) +- [ ] C2 fix: InFlightCounter increments before task spawn, drain polls 100ms +- [ ] C3 fix: Connector ceiling is 30s, per-site timeouts work >5s (ADR-026) +- [ ] C4 fix: JSON format applied in stdout-only path +- [ ] W1 fix: Upstream host part validated (DNS name or IP, IPv6 brackets) +- [ ] W2 fix: ACME contact email validated (non-empty, contains @) +- [ ] W3 fix: URI parse failure returns 502, never drops query string silently +- [ ] W4 fix: Admin socket has 5s timeout and 4096 byte line limit (ADR-027) +- [ ] W6 fix: RawConfig eliminated, FullConfig used in both paths +- [ ] S10 fix: Rate limit tests use ConnectInfo, verify XFF is ignored +- [ ] All `cargo test` passes +- [ ] All `cargo clippy` passes with no warnings +- [ ] No regressions in integration tests + +## References + +- docs/reviews/003-security-and-bug-review.md — all findings +- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025 +- docs/architecture/decisions/026-connector-timeout-ceiling.md — ADR-026 +- docs/architecture/decisions/027-admin-socket-resource-limits.md — ADR-027 + +## Notes + +> This review covers the critical security fixes and the sensitive config +> consolidation. It should be the last task before the generation 4+ code +> quality items are considered final. + +## Summary + +> To be filled on completion \ No newline at end of file