Decompose security review #003 findings into 17 fix tasks and 1 review task

Address 4 critical, 8 warning, and 5 suggestion findings from the
security and bug review by creating atomic, dependency-ordered tasks:

Critical fixes (C1-C4): rate limiter IP source (ADR-025), InFlightCounter
increment + drain interval, connector timeout ceiling (ADR-026), JSON format
without log file.

Validation tightening (W1, W2): upstream host validation, ACME contact email
validation.

Robustness (W3, W4, W5, W12): upstream URI error handling (502 not silent
drop), admin socket resource limits (ADR-027), TlsMode wildcard mismatch,
http_port u32→u16.

Code quality (W6, W10, W11, S1, S3, W8/W9): config type consolidation,
TokenBucket field visibility, reload_mutex #[cfg(test)], dead code removal,
root cert count logging, misleading test names.

Test coverage (S10): rate limiter ConnectInfo tests (depends on C1 fix).

Review: post-security-fix-review checkpoint covering all critical fixes
and sensitive config consolidation path.
This commit is contained in:
2026-06-12 13:42:37 +00:00
parent 80d1fd0fb3
commit 54f1725173
18 changed files with 1282 additions and 0 deletions

View File

@@ -0,0 +1,68 @@
---
id: fix/acme-contact-validation
name: Validate ACME contact email has non-empty address with @ sign (W2)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [W2]
---
## Description
The ACME contact validation only checks that `contact.starts_with("mailto:")`
but doesn't verify there's an actual email address after the prefix. Values
like `mailto:` (empty email) pass validation but will fail at the Let's Encrypt
API at certificate provisioning time — after the proxy has already started.
The spec (config.md validation rule 19) now requires: "a valid `mailto:` URI
with a non-empty email address containing an `@` sign."
### Changes Required
**`src/config/validation.rs`** — ACME contact validation (lines 148-153):
- After checking `starts_with("mailto:")`, validate the email part:
```rust
let contact = &listener.tls.acme_contact;
if contact.is_empty() || !contact.starts_with("mailto:") {
errors.push(ValidationError::AcmeContactInvalid {
bind_addr: listener.bind_addr.clone(),
});
} else {
let email = &contact[7..]; // after "mailto:"
if email.is_empty() || !email.contains('@') {
errors.push(ValidationError::AcmeContactInvalid {
bind_addr: listener.bind_addr.clone(),
});
}
}
```
- Add tests for:
- Valid: `mailto:admin@example.com`
- Invalid: `mailto:` (empty), `mailto:user` (no @), empty string, non-mailto
## Acceptance Criteria
- [ ] `mailto:` (empty email) is rejected by validation
- [ ] `mailto:user` (no @ sign) is rejected by validation
- [ ] `mailto:admin@example.com` still passes validation
- [ ] Non-mailto values still rejected
- [ ] New unit tests for the tightened validation
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/config.md — validation rule 19
- docs/reviews/003-security-and-bug-review.md — W2 finding
- src/config/validation.rs — ACME contact validation, existing tests
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,52 @@
---
id: fix/admin-socket-reload-mutex-visibility
name: Gate AdminSocket::reload_mutex with #[cfg(test)] (W11)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [W11]
---
## Description
`AdminSocket::reload_mutex()` is a public method that exists solely for the
`test_reload_serialized_with_mutex` test. It exposes an internal synchronization
primitive, and the test acquires the mutex before sending a reload command —
coupling the test to implementation details.
### Changes Required
**`src/admin/socket.rs`**:
- Gate `reload_mutex()` with `#[cfg(test)]`:
```rust
#[cfg(test)]
pub fn reload_mutex(&self) -> Arc<Mutex<()>> {
self.reload_mutex.clone()
}
```
- The existing test `test_reload_serialized_with_mutex` already uses this
method, so it will continue to work.
## Acceptance Criteria
- [ ] `reload_mutex()` is only available in test builds (`#[cfg(test)]`)
- [ ] The `test_reload_serialized_with_mutex` test still compiles and passes
- [ ] `cargo clippy` passes with no warnings in non-test build
## References
- docs/reviews/003-security-and-bug-review.md — W11 finding
- src/admin/socket.rs — `AdminSocket::reload_mutex()`, test
## Notes
> The review suggests alternatively removing the method entirely and testing
> serialization through observable behavior. For Phase 1, gating with
> `#[cfg(test)]` is the simpler fix that preserves the existing test.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,100 @@
---
id: fix/admin-socket-resource-limits
name: Add read timeout and line length limit to admin socket (ADR-027)
status: pending
depends_on: []
scope: narrow
risk: low
impact: component
level: implementation
review_findings: [W4, S4]
---
## Description
The admin socket's `handle_connection` reads one newline-terminated line with
`reader.read_line(&mut line)` but sets no timeout and no length limit. This
allows:
1. A client to connect and send no data, holding a connection indefinitely
2. A client to send unbounded data without a newline, causing OOM
ADR-027 specifies: 5-second read timeout, 4096 byte line length limit.
### Changes Required
**`src/admin/socket.rs`** — `handle_connection` function (lines 166-210):
- Wrap the `BufReader` with `tokio::io::take` to limit read size to 4096 bytes:
```rust
let (reader, mut writer) = stream.into_split();
let mut reader = BufReader::new(tokio::io::take(reader, 4096));
let mut line = String::new();
```
- Wrap the `read_line` call in a `tokio::time::timeout`:
```rust
use std::time::Duration;
let read_result = tokio::time::timeout(
Duration::from_secs(5),
reader.read_line(&mut line),
).await;
```
- Handle timeout and line-too-long cases:
```rust
match read_result {
Ok(Ok(0)) | Ok(Err(_)) => {
// existing "invalid input" handling
}
Err(_) => {
// timeout
tracing::debug!("admin socket connection timed out");
let _ = writer.write_all(
format!("{}\n", serde_json::to_string(&ErrorResponse {
status: "error",
message: "read timeout".to_string(),
}).unwrap()).as_bytes()
).await;
return;
}
Ok(Ok(n)) => {
// Check if line was truncated (no newline found within limit)
if !line.ends_with('\n') && n > 0 {
tracing::warn!("admin socket command exceeded 4096 byte limit");
let _ = writer.write_all(
format!("{}\n", serde_json::to_string(&ErrorResponse {
status: "error",
message: "command too long".to_string(),
}).unwrap()).as_bytes()
).await;
return;
}
// ... existing command handling
}
}
```
- Update existing tests and add new tests for timeout and line length limit.
## Acceptance Criteria
- [ ] Read timeout of 5 seconds applied to admin socket connections
- [ ] Line length limit of 4096 bytes applied (via `tokio::io::take`)
- [ ] Timeout logged at `debug` level per ADR-027
- [ ] Line-too-long logged at `warn` level per ADR-027
- [ ] Both conditions return appropriate error JSON to the client
- [ ] Legitimate commands (`reload`, `status`) still work
- [ ] New tests for timeout and line length limit behavior
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/decisions/027-admin-socket-resource-limits.md — ADR-027
- docs/architecture/operations.md — admin socket resource limits
- docs/reviews/003-security-and-bug-review.md — W4, S4 findings
- src/admin/socket.rs — `handle_connection`
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,58 @@
---
id: fix/connector-timeout-ceiling
name: Raise connector timeout ceiling to 30s per ADR-026
status: pending
depends_on: []
scope: single
risk: low
impact: component
level: implementation
review_findings: [C3]
---
## Description
The HTTP connector's `set_connect_timeout` is hardcoded to 5 seconds. Per-site
connect timeout values > 5s are silently capped because the connector's internal
timeout fires before the `tokio::time::timeout` wrapper.
ADR-026 establishes a 30-second ceiling on the connector. The per-site
`tokio::time::timeout` enforces the actual per-site connect timeout. The
connector ceiling is a safety backstop, not the primary enforcement mechanism.
### Changes Required
**`src/proxy/handler.rs`**:
- Change `DEFAULT_CONNECT_TIMEOUT_SECS` from 5 to 30:
```rust
const CONNECT_TIMEOUT_CEILING_SECS: u64 = 30;
```
- Rename the constant to make its role clear (ceiling, not the default connect
timeout for sites).
- Update both `create_http_client` and `create_https_client` to use the renamed
constant.
## Acceptance Criteria
- [ ] Connector `set_connect_timeout` is set to 30 seconds
- [ ] Constant is named to reflect its ceiling role (not "default")
- [ ] Per-site connect timeout (default 5s) via `tokio::time::timeout` is unchanged
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/decisions/026-connector-timeout-ceiling.md — ADR-026
- docs/architecture/proxy.md — Upstream connection section
- docs/reviews/003-security-and-bug-review.md — C3 finding
- src/proxy/handler.rs — `create_http_client`, `create_https_client`
## Notes
> The previous `fix/connect-timeout` task wired the two-phase timeout approach.
> This task completes that work by raising the connector ceiling so per-site
> timeouts > 5s actually work.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,95 @@
---
id: fix/consolidate-config-types
name: Delete RawConfig and use FullConfig in load_config (W6, S5)
status: pending
depends_on: []
scope: narrow
risk: medium
impact: component
level: implementation
review_findings: [W6, S5]
---
## Description
`RawConfig` (in `src/cli.rs`) and `FullConfig` (in `src/config/mod.rs`) have
identical fields and identical serde attributes. They exist because the initial
load path manually constructs `StaticConfig` + `SerializableDynamicConfig`,
while the reload path uses `FullConfig::into_static_and_dynamic()`. Any new
config field must be added in two places.
The fix is to delete `RawConfig` and use `FullConfig` in `load_config`. The
`collect_sites` helper can also be removed since `into_static_and_dynamic`
already collects sites from all listeners.
### Changes Required
**`src/cli.rs`**:
- Delete the `RawConfig` struct (lines 49-65)
- Rewrite `load_config` to use `FullConfig`:
```rust
pub fn load_config(cli: &Cli) -> Result<LoadedConfig> {
let config_path = Path::new(&cli.config);
let config_content = std::fs::read_to_string(config_path)
.with_context(|| format!("failed to read config file: {}", cli.config))?;
let full_config = crate::config::FullConfig::parse(&config_content)
.with_context(|| format!("failed to parse config file: {}", cli.config))?;
let (static_config, dynamic_config) = full_config.into_static_and_dynamic();
let allow_wildcard_bind = static_config.allow_wildcard_bind || cli.allow_wildcard_bind;
validate(&static_config, &dynamic_config, cli.allow_wildcard_bind).map_err(|errors| {
anyhow::anyhow!(
"config validation failed:\n{}",
errors.iter()
.map(|e| format!(" - {}", e))
.collect::<Vec<_>>()
.join("\n")
)
})?;
Ok(LoadedConfig {
static_config,
dynamic_config,
allow_wildcard_bind,
})
}
```
- Delete the `collect_sites` helper function (lines 112-118)
- Remove the now-unused imports: `SerializableDynamicConfig`, `BodyConfig`,
`RateLimitConfig` (if they're only used via `RawConfig`)
**`src/config/mod.rs`**:
- No changes needed — `FullConfig` already has `into_static_and_dynamic()`
- Verify `FullConfig::parse` and `into_static_and_dynamic` produce identical
results to the old `RawConfig` path
## Acceptance Criteria
- [ ] `RawConfig` struct deleted from `src/cli.rs`
- [ ] `collect_sites` function deleted from `src/cli.rs`
- [ ] `load_config` uses `FullConfig::parse` + `into_static_and_dynamic`
- [ ] Startup config loading produces same results as before
- [ ] Config validation still runs on both startup and reload
- [ ] All existing `cli.rs` tests pass
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/config.md — config reload, FullConfig
- docs/reviews/003-security-and-bug-review.md — W6, S5 findings
- src/cli.rs — RawConfig, load_config, collect_sites
- src/config/mod.rs — FullConfig, into_static_and_dynamic
## Notes
> This changes the startup config parsing path. While the behavior should be
> identical (same fields, same serde attributes), this is a sensitive area. A
> review task follows the code quality generation.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,77 @@
---
id: fix/http-port-type-u16
name: Change http_port type from u32 to u16 per spec (W12)
status: pending
depends_on: []
scope: narrow
risk: low
impact: component
level: implementation
review_findings: [W12]
---
## Description
`http_port` is declared as `u32` in `ListenerConfig` but `https_port` is `u16`.
Both represent TCP port numbers (valid range 165535). The type inconsistency
means comparisons require casting (`listener.http_port == listener.https_port as
u32`) and `http_port` could theoretically hold values > 65535 caught by
validation rather than the type system.
The spec (config.md) now declares `http_port` as `u16`.
### Changes Required
**`src/config/static_config.rs`**:
- Change `http_port` field type from `u32` to `u16` in `ListenerConfig`
- Update `default_http_port()` return type to `u16`
**`src/config/validation.rs`**:
- Change `DuplicateHttpBind` error type: `http_port: u32``http_port: u16`
- Change `HttpsAndHttpPortSame` error type: `http_port: u32``http_port: u16`
- Change `HttpPortInvalid` error type: `http_port: u32``http_port: u16`
- Remove `as u32` casts — both `http_port` and `https_port` are now `u16`
- Remove the `http_port > 65535` check (impossible with `u16`, but keep `http_port > 0`
for the "disabled" check)
- Update comparison: `listener.http_port == listener.https_port` (no cast needed)
- Update health check port comparison: remove `as u32` cast
**`src/main.rs`**:
- Update any `http_port` references that assume `u32`
**`src/cli.rs`**:
- Update `RawConfig.http_port` type from `u32` to `u16` (if `RawConfig` still
exists after `fix/consolidate-config-types`; if not, this file is unaffected)
**`src/config/test_fixtures.rs`**:
- Update any test fixture `http_port` values from `u32` to `u16`
**`tests/integration_test.rs`**:
- Update any hardcoded `http_port` values
## Acceptance Criteria
- [ ] `http_port` is `u16` in `ListenerConfig`
- [ ] All `as u32` casts on `http_port` removed
- [ ] `http_port > 65535` validation check removed (impossible with u16)
- [ ] `http_port == https_port` comparison works without casting
- [ ] All validation tests pass
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/config.md — `http_port` type declaration
- docs/reviews/003-security-and-bug-review.md — W12 finding
- src/config/static_config.rs — `ListenerConfig` struct
- src/config/validation.rs — validation rules, error types
## Notes
> If `fix/consolidate-config-types` runs first and removes `RawConfig`, the
> `src/cli.rs` changes in this task are reduced. The two tasks are independent
> in terms of the type change itself.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,74 @@
---
id: fix/inflight-counter-increment
name: Fix InFlightCounter to increment before spawning task (C2 + drain interval)
status: pending
depends_on: []
scope: narrow
risk: medium
impact: component
level: implementation
review_findings: [C2]
---
## Description
`InFlightCounter::increment()` is never called anywhere in the codebase. The
`InFlightGuard` only decrements on drop. Since `count` stays at 0, the first
guard drop does `fetch_sub(1)` on an `AtomicUsize` with value 0, which wraps to
`usize::MAX`. `is_zero()` checks `count == 0`, which never becomes true again.
The drain logic in `drain_in_flight` always times out.
The spec (operations.md shutdown sequence) states: "each request **must**
increment the counter when it begins and decrement when it completes (via guard
drop). The increment must happen before the request task is spawned."
Additionally, the spec states the drain polls every 100ms, but the current
implementation uses 50ms. Align with the spec.
### Changes Required
**`src/server.rs`**:
- Fold the increment into `InFlightGuard::new()` so it's impossible to forget:
```rust
impl InFlightGuard {
fn new(counter: Arc<InFlightCounter>) -> Self {
counter.increment();
Self(counter)
}
}
```
- Update `serve_https_listener` to use `InFlightGuard::new(in_flight.clone())`
instead of `InFlightGuard(in_flight.clone())`.
- Make `InFlightGuard`'s tuple struct private (if it isn't already) so callers
must use `new()`.
**`src/server.rs` — `drain_in_flight`**:
- Change polling interval from 50ms to 100ms per the spec (operations.md):
```rust
tokio::time::sleep(std::time::Duration::from_millis(100)).await;
```
## Acceptance Criteria
- [ ] `InFlightGuard::new()` calls `counter.increment()` before returning
- [ ] `InFlightGuard` is constructed via `new()` only, not the tuple struct
- [ ] `serve_https_listener` uses `InFlightGuard::new(in_flight.clone())`
- [ ] `drain_in_flight` polls every 100ms (not 50ms)
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/operations.md — shutdown sequence, in-flight counter
- docs/reviews/003-security-and-bug-review.md — C2 finding
- src/server.rs — InFlightCounter, InFlightGuard, drain_in_flight
- src/main.rs — drain_in_flight caller
## Notes
> The previous `fix/graceful-shutdown` task addressed the abort-vs-join logic
> but did not fix the increment bug. This task completes that work.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,58 @@
---
id: fix/json-format-without-logfile
name: Fix JSON format not applied when no log file is configured (C4)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [C4]
---
## Description
When `format = "json"` is configured but no `log_file_path` is set, the `None`
branch of `init_json` creates a layer **without** calling `.json()`. The output
is plain text, not JSON. The `format = "json"` config value is silently ignored.
The spec (operations.md) states: "Both output destinations must respect the
`format` config value: when `format = "json"`, both file and stdout output must
use JSON formatting."
### Changes Required
**`src/logging/mod.rs`** — `init_json` function, `None` branch (lines 54-58):
- Add `.json()` to the stdout-only layer:
```rust
None => {
let layer = tracing_subscriber::fmt::layer()
.json()
.with_ansi(false)
.with_filter(env_filter);
tracing_subscriber::registry().with(layer).try_init()?;
}
```
## Acceptance Criteria
- [ ] `.json()` is called in the `None` branch of `init_json`
- [ ] `format = "json"` with no `log_file_path` produces JSON output on stdout
- [ ] `format = "json"` with `log_file_path` still produces JSON on both outputs
- [ ] `format = "text"` paths are unchanged
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/operations.md — logging output, format guarantee
- docs/reviews/003-security-and-bug-review.md — C4 finding
- src/logging/mod.rs — `init_json` function
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,66 @@
---
id: fix/log-root-cert-count
name: Log system root certificate count at startup (S3)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [S3]
---
## Description
`root_certs()` loads native certificates silently — only logs errors. If the
system has zero root certificates (misconfigured CA bundle), all HTTPS upstream
connections will fail with opaque TLS errors and no diagnostic message.
### Changes Required
**`src/proxy/handler.rs`** — `root_certs()` function (lines 246-258):
- Add an info-level log with cert count and warn if zero:
```rust
fn root_certs() -> rustls::RootCertStore {
let mut roots = rustls::RootCertStore::empty();
let result = rustls_native_certs::load_native_certs();
for cert in result.certs {
roots.add(cert).ok();
}
let cert_count = roots.len();
let error_count = result.errors.len();
if cert_count == 0 {
warn!(certs_loaded = cert_count, errors = error_count,
"no system root certificates loaded — HTTPS upstream connections will fail");
} else {
info!(certs_loaded = cert_count, errors = error_count,
"loaded system root certificates");
}
for err in &result.errors {
warn!(error = %err, "failed to load native certificate");
}
roots
}
```
## Acceptance Criteria
- [ ] Info-level log with cert count when certs > 0
- [ ] Warn-level log when cert count is 0
- [ ] Error count included in log output
- [ ] Individual cert load errors still logged at warn level
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/reviews/003-security-and-bug-review.md — S3 finding
- src/proxy/handler.rs — `root_certs()` function
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,67 @@
---
id: fix/rate-limiter-connectinfo-tests
name: Update rate limiter tests to use ConnectInfo instead of X-Forwarded-For (S10)
status: pending
depends_on: [fix/rate-limiter-ip-source]
scope: narrow
risk: medium
impact: component
level: implementation
review_findings: [S10]
---
## Description
After `fix/rate-limiter-ip-source` removes X-Forwarded-For parsing from the rate
limiter, existing integration tests that pass client IPs via `X-Forwarded-For`
headers will no longer work correctly. The rate limiter now reads exclusively
from `ConnectInfo<SocketAddr>`, so tests must provide `ConnectInfo` in request
extensions.
This task also addresses review finding S10: "No test verifies the ConnectInfo
extraction path, which is the primary path after C1 is fixed."
### Changes Required
**`tests/integration_test.rs`**:
- Find all rate limit integration tests (lines 90-163 and any others) that set
`X-Forwarded-For` headers for rate limiting purposes
- Update those tests to set `ConnectInfo<SocketAddr>` in request extensions
instead of (or in addition to) `X-Forwarded-For`
- Verify that `X-Forwarded-For` headers are now **ignored** by the rate limiter
— add a test that sends requests with different `X-Forwarded-For` values from
the same `ConnectInfo` IP and confirms they all count against the same bucket
- Add a test that verifies requests **without** `ConnectInfo` are rejected with
429 (per ADR-025)
**`src/rate_limit/mod.rs`** (test module):
- Update any unit tests that rely on `X-Forwarded-For` extraction
- Add a test that verifies `ConnectInfo`-based rate limiting works without any
`X-Forwarded-For` header
## Acceptance Criteria
- [ ] All rate limit integration tests use `ConnectInfo` for client IP
- [ ] Tests verify `X-Forwarded-For` headers are ignored by rate limiter
- [ ] New test: requests without `ConnectInfo` are rejected with 429
- [ ] New test: different `X-Forwarded-For` values from same `ConnectInfo` IP
count against the same bucket (proving XFF is ignored)
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025
- docs/reviews/003-security-and-bug-review.md — S10 finding
- tests/integration_test.rs — rate limit tests
- src/rate_limit/mod.rs — rate limiter middleware and tests
## Notes
> This task depends on `fix/rate-limiter-ip-source` because the code changes
> must be in place before the tests can be updated. Attempting this before the
> C1 fix would require writing tests for the old (broken) behavior.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,78 @@
---
id: fix/rate-limiter-ip-source
name: Fix rate limiter to use ConnectInfo only (ADR-025)
status: pending
depends_on: []
scope: narrow
risk: high
impact: component
level: implementation
review_findings: [C1]
---
## Description
The rate limiter currently extracts the client IP by checking the `X-Forwarded-For`
header **first**, then falling back to `ConnectInfo`. This is a security
vulnerability: since the rate limiter runs as middleware before the proxy
handler injects trusted headers, the `X-Forwarded-For` value is whatever the
client sent — completely untrusted. This enables two attack vectors:
1. **Rate limit bypass**: Attacker sends each request with a different random
`X-Forwarded-For` value, evading per-IP token buckets entirely.
2. **DoS via IP spoofing**: Attacker sends requests with `X-Forwarded-For:
<victim IP>`, depleting the victim's bucket.
ADR-025 establishes that the rate limiter must use `ConnectInfo<SocketAddr>` as
the **sole** source of client IP. If `ConnectInfo` is absent, the request must
be **rejected** (not fall back to an untrusted header).
### Changes Required
**`src/rate_limit/mod.rs`**:
- Replace the current IP extraction logic in `rate_limit_middleware` (lines 66-76)
with ConnectInfo-first, no fallback:
```rust
let client_ip = req
.extensions()
.get::<axum::extract::ConnectInfo<std::net::SocketAddr>>()
.map(|ci| ci.ip());
```
- When `ConnectInfo` is absent, return 429 (reject the request) rather than
passing through without rate limiting. Per ADR-025: "If ConnectInfo is absent,
the request must be rejected rather than falling back to an untrusted header."
```rust
let Some(ip) = client_ip else {
return (StatusCode::TOO_MANY_REQUESTS, "Too Many Requests").into_response();
};
```
- Remove all `X-Forwarded-For` header parsing from the rate limiter middleware.
## Acceptance Criteria
- [ ] Rate limiter extracts client IP from `ConnectInfo<SocketAddr>` only
- [ ] No `X-Forwarded-For` header parsing in the rate limiter middleware
- [ ] Requests without `ConnectInfo` are rejected with 429 (not passed through)
- [ ] `cargo test` passes (note: integration tests that pass `X-Forwarded-For`
for rate limiting will need to be updated in the separate test task
`fix/rate-limiter-connectinfo-tests`)
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/decisions/025-rate-limiter-ip-source.md — ADR-025
- docs/architecture/operations.md — Rate limiting design, IP source section
- docs/architecture/proxy.md — Rate limiter IP source section
- docs/reviews/003-security-and-bug-review.md — C1 finding
- src/rate_limit/mod.rs — current implementation (lines 61-104)
## Notes
> This is the highest-priority security fix. After this change, the integration
> tests in `tests/integration_test.rs` that rely on `X-Forwarded-For` for rate
> limiting will need to be updated. That work is tracked separately in
> `fix/rate-limiter-connectinfo-tests`.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,96 @@
---
id: fix/remove-dead-code-remnants
name: Remove dead code items identified in security review #003 (S1)
status: pending
depends_on: []
scope: narrow
risk: trivial
impact: component
level: implementation
review_findings: [S1]
---
## Description
Security review #003 identifies the following dead code items that survived the
previous `fix/clean-dead-code` task. These items are defined but never called in
production code:
| Item | File | Note |
|------|------|------|
| `log_rate_limit!` macro | `src/logging/format.rs:72-83` | Rate limiter uses `warn!` directly |
| `log_config_reload!` macro | `src/logging/format.rs:97-106` | Reload uses `info!`/`warn!` directly |
| `format_event_fields()` | `src/logging/format.rs:50-54` | Never called |
| `ProxyError::PayloadTooLarge` | `src/proxy/error.rs:12` | Body limit returns tuple directly |
| `ProxyError::NotFound` | `src/proxy/error.rs:20` | Code uses `UnknownHost` instead |
| `ProxyError::BadRequest` | `src/proxy/error.rs:22` | Code uses `MissingHost` instead |
| `ProxyError::UpstreamTls` | `src/proxy/error.rs:28` | Never constructed |
| `build_multi_domain_server_config()` | `src/tls/config.rs:78-102` | Never called |
| `SniCertResolver` | `src/tls/config.rs:104-126` | Only used by above |
| `AcmeTlsConfig::directory_url()` | `src/tls/acme.rs:53-59` | Only used in tests |
### Changes Required
For each item, either wire it up (if it should be used) or remove it:
**Logging macros** (`src/logging/format.rs`):
- `log_rate_limit!`: The rate limiter middleware now uses `warn!` directly.
Remove the macro and its test invocation in `log_macros_compile`.
- `log_config_reload!`: The admin socket reload uses `info!`/`warn!` directly.
Remove the macro and its test invocation.
- `format_event_fields()`: Never called externally. Remove it.
**ProxyError variants** (`src/proxy/error.rs`):
- `PayloadTooLarge`: Not used by body limit middleware (returns tuple directly).
Keep for now with a comment explaining it's reserved for future use, OR remove
if no plan to use it. The `#[non_exhaustive]` attribute means removing a variant
is not a breaking change for external consumers.
- `NotFound`: Superseded by `UnknownHost`. Remove and update `status_code()` and
`body()` match arms.
- `BadRequest`: Superseded by `MissingHost`. Remove and update match arms.
- `UpstreamTls`: Never constructed. Remove and update match arms.
- Update unit tests that reference removed variants.
**TLS dead code** (`src/tls/config.rs`, `src/tls/acme.rs`):
- `build_multi_domain_server_config()`: Remove function.
- `SniCertResolver`: Remove struct and impl (only used by above function).
- `AcmeTlsConfig::directory_url()`: Check if it's used in tests. If only in
tests, gate with `#[cfg(test)]`. If truly unused, remove.
**Test helpers** (`tests/helpers/http_test_helper.rs`):
- `TestUpstream::url()`, `TestUpstream::upstream_addr()`: Currently annotated
`#[allow(dead_code)]`. If not used in any test, remove. If used, remove the
`#[allow(dead_code)]` annotation.
## Acceptance Criteria
- [ ] `log_rate_limit!` and `log_config_reload!` macros removed
- [ ] `format_event_fields()` removed
- [ ] Unused `ProxyError` variants removed (`NotFound`, `BadRequest`, `UpstreamTls`)
- [ ] `PayloadTooLarge` either removed or documented with a TODO comment
- [ ] `build_multi_domain_server_config()` and `SniCertResolver` removed
- [ ] `directory_url()` either `#[cfg(test)]`-gated or removed
- [ ] Dead test helper methods removed or `#[allow(dead_code)]` removed
- [ ] All match arms in `ProxyError` methods updated for removed variants
- [ ] All unit tests updated for removed variants
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/reviews/003-security-and-bug-review.md — S1 finding
- src/logging/format.rs — dead macros and function
- src/proxy/error.rs — unused ProxyError variants
- src/tls/config.rs — dead TLS functions
- src/tls/acme.rs — directory_url method
- tests/helpers/http_test_helper.rs — dead_code-annotated helpers
## Notes
> The previous `fix/clean-dead-code` task added `#[non_exhaustive]` and removed
> some dead code. This task addresses the items that survived that round or were
> newly identified in review #003.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,57 @@
---
id: fix/rename-misleading-test
name: Rename misleading health check test and fix dynamic config test (W8, W9)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [W8, W9]
---
## Description
Two test quality issues in `tests/integration_test.rs`:
**W8**: `test_health_check_disabled_when_port_zero` is misleading — port 0 means
"OS picks a random port" not "disabled". The test verifies that
`start_health_check_listener(0)` works (binds to a random port), which is
correct but the name implies it tests the disable path. The actual disable
logic is in `main.rs:94` where `health_check_port > 0` gates the call.
**W9**: `test_dynamic_config_with_limit` constructs `DynamicConfig` directly
with `routing_table: Default::default()` (empty HashMap), bypassing
`DynamicConfig::from_sites()`. The body limit tests only read
`body.limit_bytes` so it doesn't matter, but it's a latent trap for anyone
copying this pattern.
### Changes Required
**`tests/integration_test.rs`**:
- Rename `test_health_check_disabled_when_port_zero` to
`test_health_check_binds_random_port_when_zero`
- Update `test_dynamic_config_with_limit` to use `DynamicConfig::from_sites()`
instead of directly constructing with an empty routing table, OR add a
comment explaining why the routing table is intentionally empty for this test
## Acceptance Criteria
- [ ] Test renamed to `test_health_check_binds_random_port_when_zero`
- [ ] `test_dynamic_config_with_limit` either uses `from_sites()` or has a
comment explaining the intentional empty routing table
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/reviews/003-security-and-bug-review.md — W8, W9 findings
- tests/integration_test.rs — test names and implementations
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,62 @@
---
id: fix/tls-mode-wildcard-mismatch
name: Add explicit listener/acceptor count check or remove TlsMode wildcard (W5)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [W5]
---
## Description
The `match tls_mode` in `main.rs` has a wildcard `_` arm that logs a warning
and pushes **no** acceptor. Then `bound_listeners.into_iter().zip(tls_acceptors.into_iter())`
uses `zip`, which silently stops at the shorter iterator. If the wildcard arm
were ever reached, some listeners would have no TLS acceptor and would be
silently dropped.
`setup_tls` already rejects unknown modes with `bail!`, so the wildcard is
unreachable in practice. But it's a latent bug for future refactors.
### Changes Required
**`src/main.rs`** (lines 170-194):
- Option A (preferred): Remove the wildcard `_` arm entirely. Since `TlsMode`
only has two variants (`Manual` and `Acme`) and `setup_tls` already validates,
the wildcard is dead code. Removing it means the compiler will catch future
`TlsMode` additions.
- Option B: Add an explicit count check after the match loop:
```rust
if bound_listeners.len() != tls_acceptors.len() {
anyhow::bail!("listener/acceptor count mismatch: {} listeners, {} acceptors",
bound_listeners.len(), tls_acceptors.len());
}
```
If removing the wildcard, this check is redundant but harmless as a
defense-in-depth assertion.
## Acceptance Criteria
- [ ] Wildcard `_` arm removed from the `match tls_mode` block, OR
- [ ] Explicit count mismatch check added after the acceptor construction loop
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/reviews/003-security-and-bug-review.md — W5 finding
- src/main.rs — TLS acceptor construction loop (lines 170-194)
- src/tls/acceptor.rs — `setup_tls`, `TlsMode` enum
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,51 @@
---
id: fix/token-bucket-field-visibility
name: Make TokenBucket fields private except last_access (W10, S6)
status: pending
depends_on: []
scope: single
risk: trivial
impact: isolated
level: implementation
review_findings: [W10, S6]
---
## Description
All `TokenBucket` fields are `pub` but only `last_access` is read externally (by
`evict_stale` in `rate_limit/mod.rs`). The other fields (`tokens`, `last_refill`,
`rate`, `max`) should be private to prevent accidental direct mutation that
bypasses `try_consume`/`refill` logic.
### Changes Required
**`src/rate_limit/bucket.rs`**:
- Make `tokens`, `last_refill`, `rate`, `max` private (remove `pub`)
- Keep `last_access` as `pub(crate)` for `evict_stale` access
- `TokenBucket::new()` already exists as a constructor, so no changes needed there
- Update any unit tests that directly access private fields. The tests in
`bucket.rs` are in the same module so they have access to private fields.
Tests in `mod.rs` may need adjustment if they access `bucket.tokens` etc.
## Acceptance Criteria
- [ ] `tokens`, `last_refill`, `rate`, `max` fields are private
- [ ] `last_access` is `pub(crate)`
- [ ] `new()` constructor is the only way to create a `TokenBucket` externally
- [ ] `evict_stale` still compiles and works (uses `last_access`)
- [ ] All unit tests pass (in-module tests can still access private fields)
- [ ] `cargo clippy` passes with no warnings
## References
- docs/reviews/003-security-and-bug-review.md — W10, S6 findings
- src/rate_limit/bucket.rs — TokenBucket struct
- src/rate_limit/mod.rs — evict_stale
## Notes
> To be filled on completion
## Summary
> To be filled on completion

View File

@@ -0,0 +1,83 @@
---
id: fix/upstream-host-validation
name: Validate host part of upstream address in config (W1)
status: pending
depends_on: []
scope: narrow
risk: low
impact: component
level: implementation
review_findings: [W1]
---
## Description
`is_valid_upstream` checks that the upstream has a `host:port` format with a
valid port number, but performs **no validation on the host part** beyond
checking it's non-empty and doesn't start with `http://` or `https://`. Values
like `!!!bad!!!:3000` or `@#$%:8080` pass validation.
The spec (config.md validation rule 17) now requires: "the host part must parse
as a valid IpAddr or pass is_valid_hostname validation." Bracket-enclosed values
must be parsed as IPv6 addresses.
### Changes Required
**`src/config/validation.rs`** — `is_valid_upstream` function (lines 309-327):
- After validating the port, validate the host part:
```rust
fn is_valid_upstream(upstream: &str) -> bool {
if let Some(idx) = upstream.rfind(':') {
let host_part = &upstream[..idx];
let port_str = &upstream[idx + 1..];
if host_part.is_empty() { return false; }
if upstream.starts_with("http://") || upstream.starts_with("https://") { return false; }
let port: u16 = match port_str.parse() { Ok(p) => p, Err(_) => return false };
if port == 0 { return false; }
// Validate host part per config.md rule 17
if host_part.starts_with('[') && host_part.ends_with(']') {
let inner = &host_part[1..host_part.len()-1];
inner.parse::<std::net::Ipv6Addr>().is_ok()
} else {
host_part.parse::<std::net::IpAddr>().is_ok() || is_valid_hostname(host_part)
}
} else {
false
}
}
```
- Note: `is_valid_hostname` already exists in the same file and is used for site
`host` validation. It rejects IP addresses, which is correct for site hosts
but wrong for upstream hosts — upstream hosts CAN be IPs. The upstream
validation must check `IpAddr::parse` first, then fall back to
`is_valid_hostname` for DNS names.
- Add tests for:
- Valid: `gitea:3000`, `127.0.0.1:3000`, `[::1]:3000`
- Invalid: `!!!bad!!!:3000`, `@#$%:8080`, `:3000` (empty host)
## Acceptance Criteria
- [ ] `is_valid_upstream` validates the host part as IP address or valid hostname
- [ ] IPv6 bracket notation is handled (e.g., `[::1]:3000`)
- [ ] Invalid host characters like `!!!bad!!!:3000` are rejected
- [ ] Valid upstream formats still pass: `gitea:3000`, `127.0.0.1:3000`
- [ ] New unit tests for valid and invalid upstream host parts
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/config.md — validation rule 17
- docs/reviews/003-security-and-bug-review.md — W1 finding
- src/config/validation.rs — `is_valid_upstream`, `is_valid_hostname`
## Notes
> `is_valid_hostname` currently rejects IP addresses (intentional for site
> hosts). The upstream validation must handle IPs separately before falling
> back to `is_valid_hostname`.
## Summary
> To be filled on completion

View File

@@ -0,0 +1,80 @@
---
id: fix/upstream-uri-error-handling
name: Return 502 on upstream URI parse failure instead of dropping query string (W3)
status: pending
depends_on: []
scope: narrow
risk: low
impact: component
level: implementation
review_findings: [W3]
---
## Description
`build_upstream_uri` silently drops the query string on parse failure and
`.unwrap()`s the fallback. This corrupts requests — the upstream receives the
wrong URL with no query parameters, and neither the client nor operator is
notified. The `.unwrap()` on the fallback could also panic.
The spec (proxy.md request forwarding step 1) states: "If URI construction
fails (e.g., the resulting URI is malformed), the proxy must return 502 Bad
Gateway and log the error at warn level. The proxy must never silently drop
parts of the URI."
### Changes Required
**`src/proxy/handler.rs`**:
- Change `build_upstream_uri` to return `Result<Uri, ()>`:
```rust
fn build_upstream_uri(scheme: &str, upstream: &str, original_uri: &Uri) -> Result<Uri, ()> {
let path = original_uri.path();
let query = original_uri
.query()
.map(|q| format!("?{}", q))
.unwrap_or_default();
let uri_string = format!("{}://{}{}{}", scheme, upstream, path, query);
uri_string.parse::<Uri>().map_err(|e| {
warn!(error = %e, uri = %uri_string, "failed to parse upstream URI");
})
}
```
- Update `proxy_handler` to handle the `Err` case:
```rust
let upstream_uri = match build_upstream_uri(&upstream_scheme, &upstream, req.uri()) {
Ok(uri) => uri,
Err(()) => {
log_upstream_error!(&host_owned, &upstream_addr, "malformed upstream URI");
let duration_ms = start.elapsed().as_millis() as u64;
log_request!(&client_ip, &host_owned, &method, &path, 502u16, &upstream, duration_ms);
return StatusCode::BAD_GATEWAY.into_response();
}
};
```
- Update the existing unit tests for `build_upstream_uri` to handle the
`Result` return type.
## Acceptance Criteria
- [ ] `build_upstream_uri` returns `Result<Uri, ()>` instead of `Uri`
- [ ] URI parse failure logs a warning with the malformed URI string
- [ ] URI parse failure returns 502 Bad Gateway to the client
- [ ] No `.unwrap()` calls in `build_upstream_uri`
- [ ] Query strings are never silently dropped
- [ ] Existing unit tests updated for `Result` return type
- [ ] `cargo test` passes
- [ ] `cargo clippy` passes with no warnings
## References
- docs/architecture/proxy.md — request forwarding step 1, URI error handling
- docs/reviews/003-security-and-bug-review.md — W3 finding
- src/proxy/handler.rs — `build_upstream_uri`, `proxy_handler`
## Notes
> To be filled on completion
## Summary
> To be filled on completion