Decompose security review #003 findings into 17 fix tasks and 1 review task

Address 4 critical, 8 warning, and 5 suggestion findings from the security and bug review by creating atomic, dependency-ordered tasks: Critical fixes (C1-C4): rate limiter IP source (ADR-025), InFlightCounter increment + drain interval, connector timeout ceiling (ADR-026), JSON format without log file. Validation tightening (W1, W2): upstream host validation, ACME contact email validation. Robustness (W3, W4, W5, W12): upstream URI error handling (502 not silent drop), admin socket resource limits (ADR-027), TlsMode wildcard mismatch, http_port u32→u16. Code quality (W6, W10, W11, S1, S3, W8/W9): config type consolidation, TokenBucket field visibility, reload_mutex #[cfg(test)], dead code removal, root cert count logging, misleading test names. Test coverage (S10): rate limiter ConnectInfo tests (depends on C1 fix). Review: post-security-fix-review checkpoint covering all critical fixes and sensitive config consolidation path.
2026-06-12 13:42:37 +00:00
parent 80d1fd0fb3
commit 54f1725173
18 changed files with 1282 additions and 0 deletions
--- a/tasks/fix/acme-contact-validation.md
+++ b/tasks/fix/acme-contact-validation.md
@@ -0,0 +1,68 @@
+---
+id: fix/acme-contact-validation
+name: Validate ACME contact email has non-empty address with @ sign (W2)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [W2]
+---
+
+## Description
+
+The ACME contact validation only checks that `contact.starts_with("mailto:")`
+but doesn't verify there's an actual email address after the prefix. Values
+like `mailto:` (empty email) pass validation but will fail at the Let's Encrypt
+API at certificate provisioning time — after the proxy has already started.
+
+The spec (config.md validation rule 19) now requires: "a valid `mailto:` URI
+with a non-empty email address containing an `@` sign."
+
+### Changes Required
+
+**`src/config/validation.rs`** — ACME contact validation (lines 148-153):
+- After checking `starts_with("mailto:")`, validate the email part:
+  ```rust
+  let contact = &listener.tls.acme_contact;
+  if contact.is_empty() || !contact.starts_with("mailto:") {
+      errors.push(ValidationError::AcmeContactInvalid {
+          bind_addr: listener.bind_addr.clone(),
+      });
+  } else {
+      let email = &contact[7..]; // after "mailto:"
+      if email.is_empty() || !email.contains('@') {
+          errors.push(ValidationError::AcmeContactInvalid {
+              bind_addr: listener.bind_addr.clone(),
+          });
+      }
+  }
+  ```
+- Add tests for:
+  - Valid: `mailto:admin@example.com`
+  - Invalid: `mailto:` (empty), `mailto:user` (no @), empty string, non-mailto
+
+## Acceptance Criteria
+
+- [ ] `mailto:` (empty email) is rejected by validation
+- [ ] `mailto:user` (no @ sign) is rejected by validation
+- [ ] `mailto:admin@example.com` still passes validation
+- [ ] Non-mailto values still rejected
+- [ ] New unit tests for the tightened validation
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/config.md — validation rule 19
+- docs/reviews/003-security-and-bug-review.md — W2 finding
+- src/config/validation.rs — ACME contact validation, existing tests
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/admin-socket-reload-mutex-visibility.md
+++ b/tasks/fix/admin-socket-reload-mutex-visibility.md
@@ -0,0 +1,52 @@
+---
+id: fix/admin-socket-reload-mutex-visibility
+name: Gate AdminSocket::reload_mutex with #[cfg(test)] (W11)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [W11]
+---
+
+## Description
+
+`AdminSocket::reload_mutex()` is a public method that exists solely for the
+`test_reload_serialized_with_mutex` test. It exposes an internal synchronization
+primitive, and the test acquires the mutex before sending a reload command —
+coupling the test to implementation details.
+
+### Changes Required
+
+**`src/admin/socket.rs`**:
+- Gate `reload_mutex()` with `#[cfg(test)]`:
+  ```rust
+  #[cfg(test)]
+  pub fn reload_mutex(&self) -> Arc<Mutex<()>> {
+      self.reload_mutex.clone()
+  }
+  ```
+- The existing test `test_reload_serialized_with_mutex` already uses this
+  method, so it will continue to work.
+
+## Acceptance Criteria
+
+- [ ] `reload_mutex()` is only available in test builds (`#[cfg(test)]`)
+- [ ] The `test_reload_serialized_with_mutex` test still compiles and passes
+- [ ] `cargo clippy` passes with no warnings in non-test build
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — W11 finding
+- src/admin/socket.rs — `AdminSocket::reload_mutex()`, test
+
+## Notes
+
+> The review suggests alternatively removing the method entirely and testing
+> serialization through observable behavior. For Phase 1, gating with
+> `#[cfg(test)]` is the simpler fix that preserves the existing test.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/admin-socket-resource-limits.md
+++ b/tasks/fix/admin-socket-resource-limits.md
@@ -0,0 +1,100 @@
+---
+id: fix/admin-socket-resource-limits
+name: Add read timeout and line length limit to admin socket (ADR-027)
+status: pending
+depends_on: []
+scope: narrow
+risk: low
+impact: component
+level: implementation
+review_findings: [W4, S4]
+---
+
+## Description
+
+The admin socket's `handle_connection` reads one newline-terminated line with
+`reader.read_line(&mut line)` but sets no timeout and no length limit. This
+allows:
+1. A client to connect and send no data, holding a connection indefinitely
+2. A client to send unbounded data without a newline, causing OOM
+
+ADR-027 specifies: 5-second read timeout, 4096 byte line length limit.
+
+### Changes Required
+
+**`src/admin/socket.rs`** — `handle_connection` function (lines 166-210):
+- Wrap the `BufReader` with `tokio::io::take` to limit read size to 4096 bytes:
+  ```rust
+  let (reader, mut writer) = stream.into_split();
+  let mut reader = BufReader::new(tokio::io::take(reader, 4096));
+  let mut line = String::new();
+  ```
+- Wrap the `read_line` call in a `tokio::time::timeout`:
+  ```rust
+  use std::time::Duration;
+  let read_result = tokio::time::timeout(
+      Duration::from_secs(5),
+      reader.read_line(&mut line),
+  ).await;
+  ```
+- Handle timeout and line-too-long cases:
+  ```rust
+  match read_result {
+      Ok(Ok(0)) | Ok(Err(_)) => {
+          // existing "invalid input" handling
+      }
+      Err(_) => {
+          // timeout
+          tracing::debug!("admin socket connection timed out");
+          let _ = writer.write_all(
+              format!("{}\n", serde_json::to_string(&ErrorResponse {
+                  status: "error",
+                  message: "read timeout".to_string(),
+              }).unwrap()).as_bytes()
+          ).await;
+          return;
+      }
+      Ok(Ok(n)) => {
+          // Check if line was truncated (no newline found within limit)
+          if !line.ends_with('\n') && n > 0 {
+              tracing::warn!("admin socket command exceeded 4096 byte limit");
+              let _ = writer.write_all(
+                  format!("{}\n", serde_json::to_string(&ErrorResponse {
+                      status: "error",
+                      message: "command too long".to_string(),
+                  }).unwrap()).as_bytes()
+              ).await;
+              return;
+          }
+          // ... existing command handling
+      }
+  }
+  ```
+- Update existing tests and add new tests for timeout and line length limit.
+
+## Acceptance Criteria
+
+- [ ] Read timeout of 5 seconds applied to admin socket connections
+- [ ] Line length limit of 4096 bytes applied (via `tokio::io::take`)
+- [ ] Timeout logged at `debug` level per ADR-027
+- [ ] Line-too-long logged at `warn` level per ADR-027
+- [ ] Both conditions return appropriate error JSON to the client
+- [ ] Legitimate commands (`reload`, `status`) still work
+- [ ] New tests for timeout and line length limit behavior
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/decisions/027-admin-socket-resource-limits.md — ADR-027
+- docs/architecture/operations.md — admin socket resource limits
+- docs/reviews/003-security-and-bug-review.md — W4, S4 findings
+- src/admin/socket.rs — `handle_connection`
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/connector-timeout-ceiling.md
+++ b/tasks/fix/connector-timeout-ceiling.md
@@ -0,0 +1,58 @@
+---
+id: fix/connector-timeout-ceiling
+name: Raise connector timeout ceiling to 30s per ADR-026
+status: pending
+depends_on: []
+scope: single
+risk: low
+impact: component
+level: implementation
+review_findings: [C3]
+---
+
+## Description
+
+The HTTP connector's `set_connect_timeout` is hardcoded to 5 seconds. Per-site
+connect timeout values > 5s are silently capped because the connector's internal
+timeout fires before the `tokio::time::timeout` wrapper.
+
+ADR-026 establishes a 30-second ceiling on the connector. The per-site
+`tokio::time::timeout` enforces the actual per-site connect timeout. The
+connector ceiling is a safety backstop, not the primary enforcement mechanism.
+
+### Changes Required
+
+**`src/proxy/handler.rs`**:
+- Change `DEFAULT_CONNECT_TIMEOUT_SECS` from 5 to 30:
+  ```rust
+  const CONNECT_TIMEOUT_CEILING_SECS: u64 = 30;
+  ```
+- Rename the constant to make its role clear (ceiling, not the default connect
+  timeout for sites).
+- Update both `create_http_client` and `create_https_client` to use the renamed
+  constant.
+
+## Acceptance Criteria
+
+- [ ] Connector `set_connect_timeout` is set to 30 seconds
+- [ ] Constant is named to reflect its ceiling role (not "default")
+- [ ] Per-site connect timeout (default 5s) via `tokio::time::timeout` is unchanged
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/decisions/026-connector-timeout-ceiling.md — ADR-026
+- docs/architecture/proxy.md — Upstream connection section
+- docs/reviews/003-security-and-bug-review.md — C3 finding
+- src/proxy/handler.rs — `create_http_client`, `create_https_client`
+
+## Notes
+
+> The previous `fix/connect-timeout` task wired the two-phase timeout approach.
+> This task completes that work by raising the connector ceiling so per-site
+> timeouts > 5s actually work.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/consolidate-config-types.md
+++ b/tasks/fix/consolidate-config-types.md
@@ -0,0 +1,95 @@
+---
+id: fix/consolidate-config-types
+name: Delete RawConfig and use FullConfig in load_config (W6, S5)
+status: pending
+depends_on: []
+scope: narrow
+risk: medium
+impact: component
+level: implementation
+review_findings: [W6, S5]
+---
+
+## Description
+
+`RawConfig` (in `src/cli.rs`) and `FullConfig` (in `src/config/mod.rs`) have
+identical fields and identical serde attributes. They exist because the initial
+load path manually constructs `StaticConfig` + `SerializableDynamicConfig`,
+while the reload path uses `FullConfig::into_static_and_dynamic()`. Any new
+config field must be added in two places.
+
+The fix is to delete `RawConfig` and use `FullConfig` in `load_config`. The
+`collect_sites` helper can also be removed since `into_static_and_dynamic`
+already collects sites from all listeners.
+
+### Changes Required
+
+**`src/cli.rs`**:
+- Delete the `RawConfig` struct (lines 49-65)
+- Rewrite `load_config` to use `FullConfig`:
+  ```rust
+  pub fn load_config(cli: &Cli) -> Result<LoadedConfig> {
+      let config_path = Path::new(&cli.config);
+      let config_content = std::fs::read_to_string(config_path)
+          .with_context(|| format!("failed to read config file: {}", cli.config))?;
+
+      let full_config = crate::config::FullConfig::parse(&config_content)
+          .with_context(|| format!("failed to parse config file: {}", cli.config))?;
+
+      let (static_config, dynamic_config) = full_config.into_static_and_dynamic();
+
+      let allow_wildcard_bind = static_config.allow_wildcard_bind || cli.allow_wildcard_bind;
+
+      validate(&static_config, &dynamic_config, cli.allow_wildcard_bind).map_err(|errors| {
+          anyhow::anyhow!(
+              "config validation failed:\n{}",
+              errors.iter()
+                  .map(|e| format!("  - {}", e))
+                  .collect::<Vec<_>>()
+                  .join("\n")
+          )
+      })?;
+
+      Ok(LoadedConfig {
+          static_config,
+          dynamic_config,
+          allow_wildcard_bind,
+      })
+  }
+  ```
+- Delete the `collect_sites` helper function (lines 112-118)
+- Remove the now-unused imports: `SerializableDynamicConfig`, `BodyConfig`,
+  `RateLimitConfig` (if they're only used via `RawConfig`)
+
+**`src/config/mod.rs`**:
+- No changes needed — `FullConfig` already has `into_static_and_dynamic()`
+- Verify `FullConfig::parse` and `into_static_and_dynamic` produce identical
+  results to the old `RawConfig` path
+
+## Acceptance Criteria
+
+- [ ] `RawConfig` struct deleted from `src/cli.rs`
+- [ ] `collect_sites` function deleted from `src/cli.rs`
+- [ ] `load_config` uses `FullConfig::parse` + `into_static_and_dynamic`
+- [ ] Startup config loading produces same results as before
+- [ ] Config validation still runs on both startup and reload
+- [ ] All existing `cli.rs` tests pass
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/config.md — config reload, FullConfig
+- docs/reviews/003-security-and-bug-review.md — W6, S5 findings
+- src/cli.rs — RawConfig, load_config, collect_sites
+- src/config/mod.rs — FullConfig, into_static_and_dynamic
+
+## Notes
+
+> This changes the startup config parsing path. While the behavior should be
+> identical (same fields, same serde attributes), this is a sensitive area. A
+> review task follows the code quality generation.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/http-port-type-u16.md
+++ b/tasks/fix/http-port-type-u16.md
@@ -0,0 +1,77 @@
+---
+id: fix/http-port-type-u16
+name: Change http_port type from u32 to u16 per spec (W12)
+status: pending
+depends_on: []
+scope: narrow
+risk: low
+impact: component
+level: implementation
+review_findings: [W12]
+---
+
+## Description
+
+`http_port` is declared as `u32` in `ListenerConfig` but `https_port` is `u16`.
+Both represent TCP port numbers (valid range 1–65535). The type inconsistency
+means comparisons require casting (`listener.http_port == listener.https_port as
+u32`) and `http_port` could theoretically hold values > 65535 caught by
+validation rather than the type system.
+
+The spec (config.md) now declares `http_port` as `u16`.
+
+### Changes Required
+
+**`src/config/static_config.rs`**:
+- Change `http_port` field type from `u32` to `u16` in `ListenerConfig`
+- Update `default_http_port()` return type to `u16`
+
+**`src/config/validation.rs`**:
+- Change `DuplicateHttpBind` error type: `http_port: u32` → `http_port: u16`
+- Change `HttpsAndHttpPortSame` error type: `http_port: u32` → `http_port: u16`
+- Change `HttpPortInvalid` error type: `http_port: u32` → `http_port: u16`
+- Remove `as u32` casts — both `http_port` and `https_port` are now `u16`
+- Remove the `http_port > 65535` check (impossible with `u16`, but keep `http_port > 0`
+  for the "disabled" check)
+- Update comparison: `listener.http_port == listener.https_port` (no cast needed)
+- Update health check port comparison: remove `as u32` cast
+
+**`src/main.rs`**:
+- Update any `http_port` references that assume `u32`
+
+**`src/cli.rs`**:
+- Update `RawConfig.http_port` type from `u32` to `u16` (if `RawConfig` still
+  exists after `fix/consolidate-config-types`; if not, this file is unaffected)
+
+**`src/config/test_fixtures.rs`**:
+- Update any test fixture `http_port` values from `u32` to `u16`
+
+**`tests/integration_test.rs`**:
+- Update any hardcoded `http_port` values
+
+## Acceptance Criteria
+
+- [ ] `http_port` is `u16` in `ListenerConfig`
+- [ ] All `as u32` casts on `http_port` removed
+- [ ] `http_port > 65535` validation check removed (impossible with u16)
+- [ ] `http_port == https_port` comparison works without casting
+- [ ] All validation tests pass
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/config.md — `http_port` type declaration
+- docs/reviews/003-security-and-bug-review.md — W12 finding
+- src/config/static_config.rs — `ListenerConfig` struct
+- src/config/validation.rs — validation rules, error types
+
+## Notes
+
+> If `fix/consolidate-config-types` runs first and removes `RawConfig`, the
+> `src/cli.rs` changes in this task are reduced. The two tasks are independent
+> in terms of the type change itself.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/inflight-counter-increment.md
+++ b/tasks/fix/inflight-counter-increment.md
@@ -0,0 +1,74 @@
+---
+id: fix/inflight-counter-increment
+name: Fix InFlightCounter to increment before spawning task (C2 + drain interval)
+status: pending
+depends_on: []
+scope: narrow
+risk: medium
+impact: component
+level: implementation
+review_findings: [C2]
+---
+
+## Description
+
+`InFlightCounter::increment()` is never called anywhere in the codebase. The
+`InFlightGuard` only decrements on drop. Since `count` stays at 0, the first
+guard drop does `fetch_sub(1)` on an `AtomicUsize` with value 0, which wraps to
+`usize::MAX`. `is_zero()` checks `count == 0`, which never becomes true again.
+The drain logic in `drain_in_flight` always times out.
+
+The spec (operations.md shutdown sequence) states: "each request **must**
+increment the counter when it begins and decrement when it completes (via guard
+drop). The increment must happen before the request task is spawned."
+
+Additionally, the spec states the drain polls every 100ms, but the current
+implementation uses 50ms. Align with the spec.
+
+### Changes Required
+
+**`src/server.rs`**:
+- Fold the increment into `InFlightGuard::new()` so it's impossible to forget:
+  ```rust
+  impl InFlightGuard {
+      fn new(counter: Arc<InFlightCounter>) -> Self {
+          counter.increment();
+          Self(counter)
+      }
+  }
+  ```
+- Update `serve_https_listener` to use `InFlightGuard::new(in_flight.clone())`
+  instead of `InFlightGuard(in_flight.clone())`.
+- Make `InFlightGuard`'s tuple struct private (if it isn't already) so callers
+  must use `new()`.
+
+**`src/server.rs` — `drain_in_flight`**:
+- Change polling interval from 50ms to 100ms per the spec (operations.md):
+  ```rust
+  tokio::time::sleep(std::time::Duration::from_millis(100)).await;
+  ```
+
+## Acceptance Criteria
+
+- [ ] `InFlightGuard::new()` calls `counter.increment()` before returning
+- [ ] `InFlightGuard` is constructed via `new()` only, not the tuple struct
+- [ ] `serve_https_listener` uses `InFlightGuard::new(in_flight.clone())`
+- [ ] `drain_in_flight` polls every 100ms (not 50ms)
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/operations.md — shutdown sequence, in-flight counter
+- docs/reviews/003-security-and-bug-review.md — C2 finding
+- src/server.rs — InFlightCounter, InFlightGuard, drain_in_flight
+- src/main.rs — drain_in_flight caller
+
+## Notes
+
+> The previous `fix/graceful-shutdown` task addressed the abort-vs-join logic
+> but did not fix the increment bug. This task completes that work.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/json-format-without-logfile.md
+++ b/tasks/fix/json-format-without-logfile.md
@@ -0,0 +1,58 @@
+---
+id: fix/json-format-without-logfile
+name: Fix JSON format not applied when no log file is configured (C4)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [C4]
+---
+
+## Description
+
+When `format = "json"` is configured but no `log_file_path` is set, the `None`
+branch of `init_json` creates a layer **without** calling `.json()`. The output
+is plain text, not JSON. The `format = "json"` config value is silently ignored.
+
+The spec (operations.md) states: "Both output destinations must respect the
+`format` config value: when `format = "json"`, both file and stdout output must
+use JSON formatting."
+
+### Changes Required
+
+**`src/logging/mod.rs`** — `init_json` function, `None` branch (lines 54-58):
+- Add `.json()` to the stdout-only layer:
+  ```rust
+  None => {
+      let layer = tracing_subscriber::fmt::layer()
+          .json()
+          .with_ansi(false)
+          .with_filter(env_filter);
+      tracing_subscriber::registry().with(layer).try_init()?;
+  }
+  ```
+
+## Acceptance Criteria
+
+- [ ] `.json()` is called in the `None` branch of `init_json`
+- [ ] `format = "json"` with no `log_file_path` produces JSON output on stdout
+- [ ] `format = "json"` with `log_file_path` still produces JSON on both outputs
+- [ ] `format = "text"` paths are unchanged
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/operations.md — logging output, format guarantee
+- docs/reviews/003-security-and-bug-review.md — C4 finding
+- src/logging/mod.rs — `init_json` function
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/log-root-cert-count.md
+++ b/tasks/fix/log-root-cert-count.md
@@ -0,0 +1,66 @@
+---
+id: fix/log-root-cert-count
+name: Log system root certificate count at startup (S3)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [S3]
+---
+
+## Description
+
+`root_certs()` loads native certificates silently — only logs errors. If the
+system has zero root certificates (misconfigured CA bundle), all HTTPS upstream
+connections will fail with opaque TLS errors and no diagnostic message.
+
+### Changes Required
+
+**`src/proxy/handler.rs`** — `root_certs()` function (lines 246-258):
+- Add an info-level log with cert count and warn if zero:
+  ```rust
+  fn root_certs() -> rustls::RootCertStore {
+      let mut roots = rustls::RootCertStore::empty();
+      let result = rustls_native_certs::load_native_certs();
+      for cert in result.certs {
+          roots.add(cert).ok();
+      }
+      let cert_count = roots.len();
+      let error_count = result.errors.len();
+      if cert_count == 0 {
+          warn!(certs_loaded = cert_count, errors = error_count,
+              "no system root certificates loaded — HTTPS upstream connections will fail");
+      } else {
+          info!(certs_loaded = cert_count, errors = error_count,
+              "loaded system root certificates");
+      }
+      for err in &result.errors {
+          warn!(error = %err, "failed to load native certificate");
+      }
+      roots
+  }
+  ```
+
+## Acceptance Criteria
+
+- [ ] Info-level log with cert count when certs > 0
+- [ ] Warn-level log when cert count is 0
+- [ ] Error count included in log output
+- [ ] Individual cert load errors still logged at warn level
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — S3 finding
+- src/proxy/handler.rs — `root_certs()` function
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/rate-limiter-connectinfo-tests.md
+++ b/tasks/fix/rate-limiter-connectinfo-tests.md
@@ -0,0 +1,67 @@
+---
+id: fix/rate-limiter-connectinfo-tests
+name: Update rate limiter tests to use ConnectInfo instead of X-Forwarded-For (S10)
+status: pending
+depends_on: [fix/rate-limiter-ip-source]
+scope: narrow
+risk: medium
+impact: component
+level: implementation
+review_findings: [S10]
+---
+
+## Description
+
+After `fix/rate-limiter-ip-source` removes X-Forwarded-For parsing from the rate
+limiter, existing integration tests that pass client IPs via `X-Forwarded-For`
+headers will no longer work correctly. The rate limiter now reads exclusively
+from `ConnectInfo<SocketAddr>`, so tests must provide `ConnectInfo` in request
+extensions.
+
+This task also addresses review finding S10: "No test verifies the ConnectInfo
+extraction path, which is the primary path after C1 is fixed."
+
+### Changes Required
+
+**`tests/integration_test.rs`**:
+- Find all rate limit integration tests (lines 90-163 and any others) that set
+  `X-Forwarded-For` headers for rate limiting purposes
+- Update those tests to set `ConnectInfo<SocketAddr>` in request extensions
+  instead of (or in addition to) `X-Forwarded-For`
+- Verify that `X-Forwarded-For` headers are now **ignored** by the rate limiter
+  — add a test that sends requests with different `X-Forwarded-For` values from
+  the same `ConnectInfo` IP and confirms they all count against the same bucket
+- Add a test that verifies requests **without** `ConnectInfo` are rejected with
+  429 (per ADR-025)
+
+**`src/rate_limit/mod.rs`** (test module):
+- Update any unit tests that rely on `X-Forwarded-For` extraction
+- Add a test that verifies `ConnectInfo`-based rate limiting works without any
+  `X-Forwarded-For` header
+
+## Acceptance Criteria
+
+- [ ] All rate limit integration tests use `ConnectInfo` for client IP
+- [ ] Tests verify `X-Forwarded-For` headers are ignored by rate limiter
+- [ ] New test: requests without `ConnectInfo` are rejected with 429
+- [ ] New test: different `X-Forwarded-For` values from same `ConnectInfo` IP
+      count against the same bucket (proving XFF is ignored)
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025
+- docs/reviews/003-security-and-bug-review.md — S10 finding
+- tests/integration_test.rs — rate limit tests
+- src/rate_limit/mod.rs — rate limiter middleware and tests
+
+## Notes
+
+> This task depends on `fix/rate-limiter-ip-source` because the code changes
+> must be in place before the tests can be updated. Attempting this before the
+> C1 fix would require writing tests for the old (broken) behavior.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/rate-limiter-ip-source.md
+++ b/tasks/fix/rate-limiter-ip-source.md
@@ -0,0 +1,78 @@
+---
+id: fix/rate-limiter-ip-source
+name: Fix rate limiter to use ConnectInfo only (ADR-025)
+status: pending
+depends_on: []
+scope: narrow
+risk: high
+impact: component
+level: implementation
+review_findings: [C1]
+---
+
+## Description
+
+The rate limiter currently extracts the client IP by checking the `X-Forwarded-For`
+header **first**, then falling back to `ConnectInfo`. This is a security
+vulnerability: since the rate limiter runs as middleware before the proxy
+handler injects trusted headers, the `X-Forwarded-For` value is whatever the
+client sent — completely untrusted. This enables two attack vectors:
+
+1. **Rate limit bypass**: Attacker sends each request with a different random
+   `X-Forwarded-For` value, evading per-IP token buckets entirely.
+2. **DoS via IP spoofing**: Attacker sends requests with `X-Forwarded-For:
+   <victim IP>`, depleting the victim's bucket.
+
+ADR-025 establishes that the rate limiter must use `ConnectInfo<SocketAddr>` as
+the **sole** source of client IP. If `ConnectInfo` is absent, the request must
+be **rejected** (not fall back to an untrusted header).
+
+### Changes Required
+
+**`src/rate_limit/mod.rs`**:
+- Replace the current IP extraction logic in `rate_limit_middleware` (lines 66-76)
+  with ConnectInfo-first, no fallback:
+  ```rust
+  let client_ip = req
+      .extensions()
+      .get::<axum::extract::ConnectInfo<std::net::SocketAddr>>()
+      .map(|ci| ci.ip());
+  ```
+- When `ConnectInfo` is absent, return 429 (reject the request) rather than
+  passing through without rate limiting. Per ADR-025: "If ConnectInfo is absent,
+  the request must be rejected rather than falling back to an untrusted header."
+  ```rust
+  let Some(ip) = client_ip else {
+      return (StatusCode::TOO_MANY_REQUESTS, "Too Many Requests").into_response();
+  };
+  ```
+- Remove all `X-Forwarded-For` header parsing from the rate limiter middleware.
+
+## Acceptance Criteria
+
+- [ ] Rate limiter extracts client IP from `ConnectInfo<SocketAddr>` only
+- [ ] No `X-Forwarded-For` header parsing in the rate limiter middleware
+- [ ] Requests without `ConnectInfo` are rejected with 429 (not passed through)
+- [ ] `cargo test` passes (note: integration tests that pass `X-Forwarded-For`
+      for rate limiting will need to be updated in the separate test task
+      `fix/rate-limiter-connectinfo-tests`)
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025
+- docs/architecture/operations.md — Rate limiting design, IP source section
+- docs/architecture/proxy.md — Rate limiter IP source section
+- docs/reviews/003-security-and-bug-review.md — C1 finding
+- src/rate_limit/mod.rs — current implementation (lines 61-104)
+
+## Notes
+
+> This is the highest-priority security fix. After this change, the integration
+> tests in `tests/integration_test.rs` that rely on `X-Forwarded-For` for rate
+> limiting will need to be updated. That work is tracked separately in
+> `fix/rate-limiter-connectinfo-tests`.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/remove-dead-code-remnants.md
+++ b/tasks/fix/remove-dead-code-remnants.md
@@ -0,0 +1,96 @@
+---
+id: fix/remove-dead-code-remnants
+name: Remove dead code items identified in security review #003 (S1)
+status: pending
+depends_on: []
+scope: narrow
+risk: trivial
+impact: component
+level: implementation
+review_findings: [S1]
+---
+
+## Description
+
+Security review #003 identifies the following dead code items that survived the
+previous `fix/clean-dead-code` task. These items are defined but never called in
+production code:
+
+| Item | File | Note |
+|------|------|------|
+| `log_rate_limit!` macro | `src/logging/format.rs:72-83` | Rate limiter uses `warn!` directly |
+| `log_config_reload!` macro | `src/logging/format.rs:97-106` | Reload uses `info!`/`warn!` directly |
+| `format_event_fields()` | `src/logging/format.rs:50-54` | Never called |
+| `ProxyError::PayloadTooLarge` | `src/proxy/error.rs:12` | Body limit returns tuple directly |
+| `ProxyError::NotFound` | `src/proxy/error.rs:20` | Code uses `UnknownHost` instead |
+| `ProxyError::BadRequest` | `src/proxy/error.rs:22` | Code uses `MissingHost` instead |
+| `ProxyError::UpstreamTls` | `src/proxy/error.rs:28` | Never constructed |
+| `build_multi_domain_server_config()` | `src/tls/config.rs:78-102` | Never called |
+| `SniCertResolver` | `src/tls/config.rs:104-126` | Only used by above |
+| `AcmeTlsConfig::directory_url()` | `src/tls/acme.rs:53-59` | Only used in tests |
+
+### Changes Required
+
+For each item, either wire it up (if it should be used) or remove it:
+
+**Logging macros** (`src/logging/format.rs`):
+- `log_rate_limit!`: The rate limiter middleware now uses `warn!` directly.
+  Remove the macro and its test invocation in `log_macros_compile`.
+- `log_config_reload!`: The admin socket reload uses `info!`/`warn!` directly.
+  Remove the macro and its test invocation.
+- `format_event_fields()`: Never called externally. Remove it.
+
+**ProxyError variants** (`src/proxy/error.rs`):
+- `PayloadTooLarge`: Not used by body limit middleware (returns tuple directly).
+  Keep for now with a comment explaining it's reserved for future use, OR remove
+  if no plan to use it. The `#[non_exhaustive]` attribute means removing a variant
+  is not a breaking change for external consumers.
+- `NotFound`: Superseded by `UnknownHost`. Remove and update `status_code()` and
+  `body()` match arms.
+- `BadRequest`: Superseded by `MissingHost`. Remove and update match arms.
+- `UpstreamTls`: Never constructed. Remove and update match arms.
+- Update unit tests that reference removed variants.
+
+**TLS dead code** (`src/tls/config.rs`, `src/tls/acme.rs`):
+- `build_multi_domain_server_config()`: Remove function.
+- `SniCertResolver`: Remove struct and impl (only used by above function).
+- `AcmeTlsConfig::directory_url()`: Check if it's used in tests. If only in
+  tests, gate with `#[cfg(test)]`. If truly unused, remove.
+
+**Test helpers** (`tests/helpers/http_test_helper.rs`):
+- `TestUpstream::url()`, `TestUpstream::upstream_addr()`: Currently annotated
+  `#[allow(dead_code)]`. If not used in any test, remove. If used, remove the
+  `#[allow(dead_code)]` annotation.
+
+## Acceptance Criteria
+
+- [ ] `log_rate_limit!` and `log_config_reload!` macros removed
+- [ ] `format_event_fields()` removed
+- [ ] Unused `ProxyError` variants removed (`NotFound`, `BadRequest`, `UpstreamTls`)
+- [ ] `PayloadTooLarge` either removed or documented with a TODO comment
+- [ ] `build_multi_domain_server_config()` and `SniCertResolver` removed
+- [ ] `directory_url()` either `#[cfg(test)]`-gated or removed
+- [ ] Dead test helper methods removed or `#[allow(dead_code)]` removed
+- [ ] All match arms in `ProxyError` methods updated for removed variants
+- [ ] All unit tests updated for removed variants
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — S1 finding
+- src/logging/format.rs — dead macros and function
+- src/proxy/error.rs — unused ProxyError variants
+- src/tls/config.rs — dead TLS functions
+- src/tls/acme.rs — directory_url method
+- tests/helpers/http_test_helper.rs — dead_code-annotated helpers
+
+## Notes
+
+> The previous `fix/clean-dead-code` task added `#[non_exhaustive]` and removed
+> some dead code. This task addresses the items that survived that round or were
+> newly identified in review #003.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/rename-misleading-test.md
+++ b/tasks/fix/rename-misleading-test.md
@@ -0,0 +1,57 @@
+---
+id: fix/rename-misleading-test
+name: Rename misleading health check test and fix dynamic config test (W8, W9)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [W8, W9]
+---
+
+## Description
+
+Two test quality issues in `tests/integration_test.rs`:
+
+**W8**: `test_health_check_disabled_when_port_zero` is misleading — port 0 means
+"OS picks a random port" not "disabled". The test verifies that
+`start_health_check_listener(0)` works (binds to a random port), which is
+correct but the name implies it tests the disable path. The actual disable
+logic is in `main.rs:94` where `health_check_port > 0` gates the call.
+
+**W9**: `test_dynamic_config_with_limit` constructs `DynamicConfig` directly
+with `routing_table: Default::default()` (empty HashMap), bypassing
+`DynamicConfig::from_sites()`. The body limit tests only read
+`body.limit_bytes` so it doesn't matter, but it's a latent trap for anyone
+copying this pattern.
+
+### Changes Required
+
+**`tests/integration_test.rs`**:
+- Rename `test_health_check_disabled_when_port_zero` to
+  `test_health_check_binds_random_port_when_zero`
+- Update `test_dynamic_config_with_limit` to use `DynamicConfig::from_sites()`
+  instead of directly constructing with an empty routing table, OR add a
+  comment explaining why the routing table is intentionally empty for this test
+
+## Acceptance Criteria
+
+- [ ] Test renamed to `test_health_check_binds_random_port_when_zero`
+- [ ] `test_dynamic_config_with_limit` either uses `from_sites()` or has a
+      comment explaining the intentional empty routing table
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — W8, W9 findings
+- tests/integration_test.rs — test names and implementations
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/tls-mode-wildcard-mismatch.md
+++ b/tasks/fix/tls-mode-wildcard-mismatch.md
@@ -0,0 +1,62 @@
+---
+id: fix/tls-mode-wildcard-mismatch
+name: Add explicit listener/acceptor count check or remove TlsMode wildcard (W5)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [W5]
+---
+
+## Description
+
+The `match tls_mode` in `main.rs` has a wildcard `_` arm that logs a warning
+and pushes **no** acceptor. Then `bound_listeners.into_iter().zip(tls_acceptors.into_iter())`
+uses `zip`, which silently stops at the shorter iterator. If the wildcard arm
+were ever reached, some listeners would have no TLS acceptor and would be
+silently dropped.
+
+`setup_tls` already rejects unknown modes with `bail!`, so the wildcard is
+unreachable in practice. But it's a latent bug for future refactors.
+
+### Changes Required
+
+**`src/main.rs`** (lines 170-194):
+- Option A (preferred): Remove the wildcard `_` arm entirely. Since `TlsMode`
+  only has two variants (`Manual` and `Acme`) and `setup_tls` already validates,
+  the wildcard is dead code. Removing it means the compiler will catch future
+  `TlsMode` additions.
+
+- Option B: Add an explicit count check after the match loop:
+  ```rust
+  if bound_listeners.len() != tls_acceptors.len() {
+      anyhow::bail!("listener/acceptor count mismatch: {} listeners, {} acceptors",
+          bound_listeners.len(), tls_acceptors.len());
+  }
+  ```
+
+  If removing the wildcard, this check is redundant but harmless as a
+  defense-in-depth assertion.
+
+## Acceptance Criteria
+
+- [ ] Wildcard `_` arm removed from the `match tls_mode` block, OR
+- [ ] Explicit count mismatch check added after the acceptor construction loop
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — W5 finding
+- src/main.rs — TLS acceptor construction loop (lines 170-194)
+- src/tls/acceptor.rs — `setup_tls`, `TlsMode` enum
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/token-bucket-field-visibility.md
+++ b/tasks/fix/token-bucket-field-visibility.md
@@ -0,0 +1,51 @@
+---
+id: fix/token-bucket-field-visibility
+name: Make TokenBucket fields private except last_access (W10, S6)
+status: pending
+depends_on: []
+scope: single
+risk: trivial
+impact: isolated
+level: implementation
+review_findings: [W10, S6]
+---
+
+## Description
+
+All `TokenBucket` fields are `pub` but only `last_access` is read externally (by
+`evict_stale` in `rate_limit/mod.rs`). The other fields (`tokens`, `last_refill`,
+`rate`, `max`) should be private to prevent accidental direct mutation that
+bypasses `try_consume`/`refill` logic.
+
+### Changes Required
+
+**`src/rate_limit/bucket.rs`**:
+- Make `tokens`, `last_refill`, `rate`, `max` private (remove `pub`)
+- Keep `last_access` as `pub(crate)` for `evict_stale` access
+- `TokenBucket::new()` already exists as a constructor, so no changes needed there
+- Update any unit tests that directly access private fields. The tests in
+  `bucket.rs` are in the same module so they have access to private fields.
+  Tests in `mod.rs` may need adjustment if they access `bucket.tokens` etc.
+
+## Acceptance Criteria
+
+- [ ] `tokens`, `last_refill`, `rate`, `max` fields are private
+- [ ] `last_access` is `pub(crate)`
+- [ ] `new()` constructor is the only way to create a `TokenBucket` externally
+- [ ] `evict_stale` still compiles and works (uses `last_access`)
+- [ ] All unit tests pass (in-module tests can still access private fields)
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/reviews/003-security-and-bug-review.md — W10, S6 findings
+- src/rate_limit/bucket.rs — TokenBucket struct
+- src/rate_limit/mod.rs — evict_stale
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/upstream-host-validation.md
+++ b/tasks/fix/upstream-host-validation.md
@@ -0,0 +1,83 @@
+---
+id: fix/upstream-host-validation
+name: Validate host part of upstream address in config (W1)
+status: pending
+depends_on: []
+scope: narrow
+risk: low
+impact: component
+level: implementation
+review_findings: [W1]
+---
+
+## Description
+
+`is_valid_upstream` checks that the upstream has a `host:port` format with a
+valid port number, but performs **no validation on the host part** beyond
+checking it's non-empty and doesn't start with `http://` or `https://`. Values
+like `!!!bad!!!:3000` or `@#$%:8080` pass validation.
+
+The spec (config.md validation rule 17) now requires: "the host part must parse
+as a valid IpAddr or pass is_valid_hostname validation." Bracket-enclosed values
+must be parsed as IPv6 addresses.
+
+### Changes Required
+
+**`src/config/validation.rs`** — `is_valid_upstream` function (lines 309-327):
+- After validating the port, validate the host part:
+  ```rust
+  fn is_valid_upstream(upstream: &str) -> bool {
+      if let Some(idx) = upstream.rfind(':') {
+          let host_part = &upstream[..idx];
+          let port_str = &upstream[idx + 1..];
+          if host_part.is_empty() { return false; }
+          if upstream.starts_with("http://") || upstream.starts_with("https://") { return false; }
+          let port: u16 = match port_str.parse() { Ok(p) => p, Err(_) => return false };
+          if port == 0 { return false; }
+          // Validate host part per config.md rule 17
+          if host_part.starts_with('[') && host_part.ends_with(']') {
+              let inner = &host_part[1..host_part.len()-1];
+              inner.parse::<std::net::Ipv6Addr>().is_ok()
+          } else {
+              host_part.parse::<std::net::IpAddr>().is_ok() || is_valid_hostname(host_part)
+          }
+      } else {
+          false
+      }
+  }
+  ```
+- Note: `is_valid_hostname` already exists in the same file and is used for site
+  `host` validation. It rejects IP addresses, which is correct for site hosts
+  but wrong for upstream hosts — upstream hosts CAN be IPs. The upstream
+  validation must check `IpAddr::parse` first, then fall back to
+  `is_valid_hostname` for DNS names.
+
+- Add tests for:
+  - Valid: `gitea:3000`, `127.0.0.1:3000`, `[::1]:3000`
+  - Invalid: `!!!bad!!!:3000`, `@#$%:8080`, `:3000` (empty host)
+
+## Acceptance Criteria
+
+- [ ] `is_valid_upstream` validates the host part as IP address or valid hostname
+- [ ] IPv6 bracket notation is handled (e.g., `[::1]:3000`)
+- [ ] Invalid host characters like `!!!bad!!!:3000` are rejected
+- [ ] Valid upstream formats still pass: `gitea:3000`, `127.0.0.1:3000`
+- [ ] New unit tests for valid and invalid upstream host parts
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/config.md — validation rule 17
+- docs/reviews/003-security-and-bug-review.md — W1 finding
+- src/config/validation.rs — `is_valid_upstream`, `is_valid_hostname`
+
+## Notes
+
+> `is_valid_hostname` currently rejects IP addresses (intentional for site
+> hosts). The upstream validation must handle IPs separately before falling
+> back to `is_valid_hostname`.
+
+## Summary
+
+> To be filled on completion
--- a/tasks/fix/upstream-uri-error-handling.md
+++ b/tasks/fix/upstream-uri-error-handling.md
@@ -0,0 +1,80 @@
+---
+id: fix/upstream-uri-error-handling
+name: Return 502 on upstream URI parse failure instead of dropping query string (W3)
+status: pending
+depends_on: []
+scope: narrow
+risk: low
+impact: component
+level: implementation
+review_findings: [W3]
+---
+
+## Description
+
+`build_upstream_uri` silently drops the query string on parse failure and
+`.unwrap()`s the fallback. This corrupts requests — the upstream receives the
+wrong URL with no query parameters, and neither the client nor operator is
+notified. The `.unwrap()` on the fallback could also panic.
+
+The spec (proxy.md request forwarding step 1) states: "If URI construction
+fails (e.g., the resulting URI is malformed), the proxy must return 502 Bad
+Gateway and log the error at warn level. The proxy must never silently drop
+parts of the URI."
+
+### Changes Required
+
+**`src/proxy/handler.rs`**:
+- Change `build_upstream_uri` to return `Result<Uri, ()>`:
+  ```rust
+  fn build_upstream_uri(scheme: &str, upstream: &str, original_uri: &Uri) -> Result<Uri, ()> {
+      let path = original_uri.path();
+      let query = original_uri
+          .query()
+          .map(|q| format!("?{}", q))
+          .unwrap_or_default();
+      let uri_string = format!("{}://{}{}{}", scheme, upstream, path, query);
+      uri_string.parse::<Uri>().map_err(|e| {
+          warn!(error = %e, uri = %uri_string, "failed to parse upstream URI");
+      })
+  }
+  ```
+- Update `proxy_handler` to handle the `Err` case:
+  ```rust
+  let upstream_uri = match build_upstream_uri(&upstream_scheme, &upstream, req.uri()) {
+      Ok(uri) => uri,
+      Err(()) => {
+          log_upstream_error!(&host_owned, &upstream_addr, "malformed upstream URI");
+          let duration_ms = start.elapsed().as_millis() as u64;
+          log_request!(&client_ip, &host_owned, &method, &path, 502u16, &upstream, duration_ms);
+          return StatusCode::BAD_GATEWAY.into_response();
+      }
+  };
+  ```
+- Update the existing unit tests for `build_upstream_uri` to handle the
+  `Result` return type.
+
+## Acceptance Criteria
+
+- [ ] `build_upstream_uri` returns `Result<Uri, ()>` instead of `Uri`
+- [ ] URI parse failure logs a warning with the malformed URI string
+- [ ] URI parse failure returns 502 Bad Gateway to the client
+- [ ] No `.unwrap()` calls in `build_upstream_uri`
+- [ ] Query strings are never silently dropped
+- [ ] Existing unit tests updated for `Result` return type
+- [ ] `cargo test` passes
+- [ ] `cargo clippy` passes with no warnings
+
+## References
+
+- docs/architecture/proxy.md — request forwarding step 1, URI error handling
+- docs/reviews/003-security-and-bug-review.md — W3 finding
+- src/proxy/handler.rs — `build_upstream_uri`, `proxy_handler`
+
+## Notes
+
+> To be filled on completion
+
+## Summary
+
+> To be filled on completion