---
status: draft
last_updated: 2026-06-11
reviewed_code:
  - src/main.rs
  - src/server.rs
  - src/tls/acceptor.rs
  - src/tls/acme.rs
  - src/tls/config.rs
  - src/tls/redirect.rs
  - src/config/validation.rs
  - src/config/dynamic_config.rs
  - src/config/static_config.rs
  - src/config/mod.rs
  - src/proxy/handler.rs
  - src/proxy/headers.rs
  - src/proxy/error.rs
  - src/proxy/body_limit.rs
  - src/proxy/mod.rs
  - src/rate_limit/mod.rs
  - src/rate_limit/bucket.rs
  - src/admin/socket.rs
  - src/shutdown.rs
  - src/health.rs
  - src/logging/mod.rs
  - src/logging/format.rs
  - src/cli.rs
reviewer: code-reviewer
---

# Implementation Review #002

## Purpose

Post-implementation review of all modules. Each finding is structured as
**Problem** → **Solution** (or **Open Question** where no solution is yet known).

## Severity Definitions

| Severity | Meaning |
|----------|---------|
| **Critical** | Will cause incorrect behavior or security issues in production |
| **Warning** | Could cause issues under specific conditions or represents a missed edge case |
| **Suggestion** | Code quality, style, or minor improvement opportunity |

---

## Critical Findings

### C1. ACME Challenge Listener Not Started

**File**: `src/main.rs:168-181`

**Problem**: When `TlsMode::Acme` is matched, the `challenge_config` and `resolver`
fields are destructured with `_` (discarded). The ACME TLS-ALPN-01 challenge
listener — required for Let's Encrypt certificate provisioning — is never started.
Without it, ACME certificate issuance will fail: Let's Encrypt cannot verify domain
ownership, and no certificates will ever be obtained.

**Solution**: Wire the challenge config into a separate TLS listener or configure
the `rustls-acme` resolver to serve TLS-ALPN-01 challenges on the same port.
The `rustls-acme` crate's `ResolvesServerCertAcme` handles ALPN protocol
negotiation automatically — if the ACME challenge ALPN protocol (`acme-tls/1`)
is included in the server's ALPN list, the resolver will serve challenge
certificates on the main HTTPS port. Verify that the `build_acme_server_config`
function includes `ACME_TLS_ALPN_01` in its ALPN list (it already does at line 28).
This means the main listener should be sufficient for TLS-ALPN-01 challenges
if the resolver is correctly installed. The separate `challenge_config` in
`TlsMode::Acme` may be unnecessary — confirm whether `rustls-acme` requires a
dedicated listener or if the resolver on the default config handles challenges
automatically.

---

### C2. ACME Contact Email Always Empty

**File**: `src/tls/acme.rs:86`

**Problem**: `AcmeTlsConfig` always sets `contact: vec![]`. Let's Encrypt requires
a contact email for production certificate requests. The `acme_directory` config
field supports `"production"` but an empty contact list will cause ACME
registration to fail with a 400-level error from the Let's Encrypt API.

**Solution**: Add an `acme_contact` field (e.g., `acme_contact = "mailto:admin@example.com"`)
to the `TlsConfig` struct and wire it through to `AcmeTlsConfig.contact`. This
requires changes to:
1. `src/config/static_config.rs` — add `acme_contact: Vec<String>` to `TlsConfig`
2. `src/tls/acme.rs` — use `self.contact` in `AcmeTlsConfig::setup`
3. `src/tls/acceptor.rs` — pass the contact list from `TlsConfig`

---

### C3. X-Forwarded-For Replaces Instead of Appending

**File**: `src/proxy/headers.rs:28`

**Problem**: `inject_proxy_headers` uses `headers.insert()` for `X-Forwarded-For`,
which **replaces** any existing value. The spec (review #001, finding C2) decided
that as an edge proxy the X-Forwarded-For should be **set** (not appended),
because there are no trusted proxies in front. However, the implementation
doesn't match either behavior — it silently discards any existing header rather
than explicitly setting it to just the client IP, which is correct for an edge
proxy but was implemented without a clear comment explaining the rationale.

**Solution**: The current behavior is actually **correct** for an edge proxy
(per review #001/C2). The fix is documentation, not code change. Add a comment
to `inject_proxy_headers` explaining:

```rust
// X-Forwarded-For is SET (not appended) because this proxy is the outermost
// edge proxy. Any existing X-Forwarded-For from the client is untrusted and
// must be replaced with the actual client IP from ConnectInfo.
```

This ensures future maintainers don't "fix" this by changing `insert` to `append`.

---

### C4. ConfigReloadHandle Never Updates Stored StaticConfig

**File**: `src/config/dynamic_config.rs:124-148`

**Problem**: The `reload()` method computes `diff_static_config(&self.static_config, &new_static)`
and returns the changed fields, but **never updates `self.static_config`** with
the new static config. This means:
- The first reload correctly reports changed fields
- The second reload compares against the **original** static config, not the
  last-reloaded one, and reports the same changes again
- Operators get repeated "static config fields changed" warnings for the same
  fields on every reload

**Solution**: After validation succeeds and the diff is computed, update
`self.static_config` with the new value. Since `ConfigReloadHandle` is accessed
concurrently, the static config field needs interior mutability. Use
`ArcSwap<StaticConfig>` or `std::sync::RwLock<StaticConfig>`:

```rust
pub struct ConfigReloadHandle {
    config: Arc<ArcSwap<DynamicConfig>>,
    static_config: ArcSwap<StaticConfig>,  // Changed from StaticConfig
    reload_mutex: Mutex<()>,
}
```

Then in `reload()`:
```rust
let changed_fields = diff_static_config(&self.static_config.load(), &new_static);
self.static_config.store(Arc::new(new_static));
self.config.store(Arc::new(new_dynamic));
```

---

## Warning Findings

### W1. Shutdown Aborts Listeners Without Draining

**File**: `src/main.rs:238-240`

**Problem**: On shutdown, `handle.abort()` is called on each HTTPS server task.
This immediately kills the tokio task, which can interrupt in-flight request
processing. The `drain_in_flight` counter is only decremented in
`InFlightGuard::drop`, but `abort()` prevents the guard from being dropped
normally for connections still being processed by the task.

**Solution**: Instead of `abort()`, use `tokio::time::timeout` with the shutdown
timeout to `join` each handle:

```rust
let shutdown_timeout = shutdown.shutdown_timeout();
for handle in https_server_handles {
    match tokio::time::timeout(shutdown_timeout, handle).await {
        Ok(_) => {}
        Err(_) => {
            warn!("shutdown timeout expired, aborting listener task");
            handle.abort();
        }
    }
}
```

Alternatively, keep the `shutdown_rx` pattern already in `serve_https_listener`
(which breaks the accept loop on shutdown signal) and remove the `abort()` calls
— the tasks will naturally exit when they stop accepting new connections and
existing connections drain.

---

### W2. Shutdown Doesn't Stop Admin Socket or Background Tasks

**File**: `src/main.rs:238-250`

**Problem**: The shutdown sequence aborts HTTPS listeners but doesn't stop:
- The admin socket listener (runs in infinite loop at `src/admin/socket.rs:99-111`)
- The rate limiter eviction task (runs in infinite loop at `src/rate_limit/mod.rs:106-112`)
- The ACME state machine task

These tasks will continue running until the process exits, which is fine for
process termination but means they can't be gracefully stopped in tests or
during a clean shutdown.

**Solution**: For Phase 1, this is acceptable — process termination will clean
up these tasks. For future improvements:
- Pass a `CancellationToken` or `watch::Receiver<bool>` to `start_admin_socket`
  and the eviction task
- On shutdown, signal cancellation before waiting for drain
- The ACME state machine task already exits when the stream ends (`None` branch
  at `src/tls/acme.rs:152-158`), but it should also be cancellable

---

### W3. Fragile Error Detection in Connection Error Handler

**File**: `src/server.rs:95-97`

**Problem**: The check `e.to_string().contains("incomplete message")` to silently
suppress connection errors is fragile. String matching on error descriptions can
break across hyper versions, locale changes, or error message reformatting.

**Solution**: Match on the error type instead. Check if `hyper` exposes a
client-disconnect error variant, or use `e.is_incomplete_message()` if available
in the hyper error API. If no typed variant exists, add a comment explaining
why string matching is used and which version(s) of hyper produce this message.

---

### W4. No Separate Connect Timeout for Upstream Requests

**File**: `src/proxy/handler.rs:73-79`

**Problem**: The proxy uses `tokio::time::timeout` with
`upstream_request_timeout_secs` for the entire request, but there's no separate
connect timeout. A slow DNS resolution or TCP handshake will consume the full
request timeout budget, leaving no time for the actual request/response cycle.

**Solution**: Add a `connect_timeout` (either a fixed default or from
`upstream_connect_timeout_secs` in `SiteConfig`). Structure the proxy call as:

```rust
let connect_timeout = Duration::from_secs(site.upstream_connect_timeout_secs);
let result = tokio::time::timeout(
    request_timeout,
    async {
        let upstream_req = build_upstream_request(req, &upstream_uri)?;
        // The hyper client handles connect internally, but we can wrap
        // the request in a two-phase timeout if needed
        client.request(upstream_req).await
    }
).await;
```

The `SiteConfig` already has `upstream_connect_timeout_secs` but it's not used
in `proxy_handler`. This should be wired up to either set the client's connect
timeout or to implement a two-phase timeout.

---

### W5. Hardcoded `/health` Path Intercepted on All Hosts

**File**: `src/proxy/handler.rs:37-39`

**Problem**: The proxy handler returns 200 OK for `GET /health` regardless of the
Host header. This means any site's `/health` path will be intercepted by the
proxy and never reach the upstream. If an upstream application uses `/health`
for its own health checks, those requests will never reach it.

**Solution**: Either:
1. Use a less common path like `/__health` or `/healthz` that won't collide
   with upstream applications, OR
2. Only intercept `/health` when the Host header doesn't match any known site
   (fallthrough), OR
3. Make the health check path configurable via `StaticConfig`

Option 1 is simplest for Phase 1. Option 3 is most flexible long-term.

---

### W6. Token Bucket Refill Uses Millisecond Precision

**File**: `src/rate_limit/bucket.rs:37`

**Problem**: `elapsed.as_millis()` truncates sub-millisecond time, which can lead
to token refill inaccuracies under high-frequency request bursts. For example,
two requests arriving 500µs apart both see `0ms` elapsed and don't refill tokens.

**Solution**: Use `as_nanos()` for the refill calculation:

```rust
let elapsed = now.duration_since(self.last_refill).as_nanos() as f64;
let tokens_to_add = (elapsed / 1_000_000_000.0) * rate;
```

This provides nanosecond-precision refill while keeping the math in floating point.

---

### W7. Admin Socket Has No Shutdown Mechanism

**File**: `src/admin/socket.rs:99-111`

**Problem**: `start_admin_socket` runs an infinite `loop` accepting connections
with no way to break out. It doesn't accept a shutdown signal, so it can't be
gracefully stopped during process shutdown or in tests.

**Solution**: Accept a `watch::Receiver<bool>` parameter and use `tokio::select!`
to check for shutdown:

```rust
tokio::select! {
    result = listener.accept() => { /* handle connection */ },
    _ = shutdown_rx.changed() => {
        info!("admin socket shutting down");
        break;
    }
}
```

This also requires cleaning up the socket file on exit.

---

### W8. Server Header Unconditionally Stripped from Upstream Response

**File**: `src/proxy/handler.rs:85`

**Problem**: `parts.headers.remove("server")` unconditionally removes the upstream
`Server` header. This is a design choice, not necessarily a bug, but it means
downstream clients can't see what software the upstream is running, which may be
undesirable for debugging.

**Solution**: This is acceptable behavior for a security-focused reverse proxy
(hiding upstream identity is a defense-in-depth measure). Document this decision
with a comment in the code explaining the rationale.

---

### W9. Logging Test Fails Due to Global Subscriber

**File**: `src/logging/mod.rs:99-111`

**Problem**: The test `init_creates_log_directory_and_file` calls `init()` which
sets a global default tracing subscriber. When tests run in parallel, this
conflicts with other tests that may also set a subscriber, causing the test to
fail with "a global default trace dispatcher has already been set."

**Solution**: Use `tracing_subscriber::fmt().with_test_writer()` and guard
against double-initialization. Alternatively, use `std::sync::OnceLock` or
`tracing_subscriber::util::SubscriberInitExt::try_init()` which returns an error
if already set rather than panicking.

---

### W10. Body Limit Middleware Only Checks Content-Length Header

**File**: `src/proxy/body_limit.rs:26-33`

**Problem**: The middleware checks the `Content-Length` header but doesn't handle
requests that lack `Content-Length` (e.g., chunked transfers, HTTP/2). A
malicious client can send `Transfer-Encoding: chunked` without `Content-Length`
and bypass the initial check, though the `Limited` body wrapper at line 37 will
still enforce the limit during streaming.

**Solution**: This is actually acceptable — the `Limited` body wrapper on line
37 is the real enforcement mechanism. The `Content-Length` check on line 26 is
an early-rejection optimization for clients that do include the header. Add a
comment explaining this two-layer defense:

```rust
// Early rejection: if Content-Length is present and exceeds the limit, reject
// immediately without reading the body. For requests without Content-Length
// (chunked, HTTP/2), the Limited body wrapper below enforces the limit during
// streaming.
```

---

### W11. Health Check Port Conflict Check Is Incomplete

**File**: `src/config/validation.rs:169-186`

**Problem**: The validation checks if `health_check_port` conflicts with any
listener's `https_port` or `http_port`, but it doesn't check whether the health
check port's bind address conflicts with a listener on a different bind address.
For example, `health_check_port = 80` bound to `127.0.0.1` shouldn't conflict
with a listener's `http_port = 80` bound to `203.0.113.10`.

**Solution**: This is acceptable for Phase 1 since the health check always binds
to `127.0.0.1` (hardcoded in `src/health.rs:20`). Document that health check
always binds to localhost, so the conflict check is conservative (warns even
when it might not actually conflict). Add a comment in the validation code
explaining this.

---

### W12. `build_upstream_request` Copies All Headers Without Filtering

**File**: `src/proxy/handler.rs:124-128`

**Problem**: `build_upstream_request` copies all headers from the original request
to the upstream request. However, `remove_hop_by_hop` was already called on the
original request's headers at line 59 before `build_upstream_request` is called,
so hop-by-hop headers have already been removed. The proxy headers
(`X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`) were also already injected
at line 58. This means the function works correctly, but the code path is
spread across multiple locations, making it harder to reason about header
lifecycle.

**Solution**: Consider consolidating header manipulation into a single function
or adding inline comments that trace the header lifecycle:

```rust
// Header lifecycle:
// 1. inject_proxy_headers() — adds X-Real-IP, X-Forwarded-For, X-Forwarded-Proto
// 2. remove_hop_by_hop() — removes Connection, Keep-Alive, etc.
// 3. build_upstream_request() — copies remaining headers to upstream request
// 4. Response: remove_hop_by_hop() on upstream response headers
```

---

## Suggestions

### S1. `http_port` Validation Allows 0 but Not Documented

**File**: `src/config/validation.rs:106`

**Problem**: `http_port = 0` is treated as "disabled" (not validated as a port
number), which is correct, but the validation error message for invalid ports
says "must be 0 (disabled) or 1-65535" while `HttpPortInvalid` only checks
`http_port > 0 && http_port < 1` implicitly. The actual check is `http_port > 0`
to enter the conflict check, and port 0 is always allowed for HTTP.

**Solution**: Add explicit validation for `http_port` being in the range 0 or
1-65535 (currently it only validates conflicts, not the port range itself for
http). Add a `HttpPortInvalid` check similar to `HttpsPortInvalid`:

```rust
if listener.http_port > 65535 {
    errors.push(ValidationError::HttpPortInvalid { ... });
}
```

---

### S2. Consider Using `#[non_exhaustive]` on Public Enums

**Files**: `src/tls/acceptor.rs:49` (`TlsMode`), `src/proxy/error.rs:5` (`ProxyError`),
`src/admin/socket.rs:15` (`AdminSocketError`), `src/config/validation.rs:10` (`ValidationError`)

**Problem**: These public enums can be matched exhaustively by downstream consumers.
Adding a new variant would be a breaking change.

**Solution**: Add `#[non_exhaustive]` to these enums to allow future expansion
without breaking changes. This is especially important for `TlsMode` (which may
gain a `"letsencrypt"` or `"auto"` mode) and `ProxyError` (which may gain
`UpstreamTls` error handling).

---

### S3. `normalize_host` in `dynamic_config.rs` Doesn't Handle Edge Cases

**File**: `src/config/dynamic_config.rs:52-55`

**Problem**: `normalize_host` uses `split(':').next()` to strip ports, but this
fails for IPv6 addresses in brackets (e.g., `[::1]:443` would normalize to
`[::1]` instead of `::1`). The `strip_port_from_host` function in
`src/tls/redirect.rs:16-28` correctly handles this case.

**Solution**: Either reuse the `strip_port_from_host` logic from `redirect.rs`
or add a shared utility function. The IPv6 bracket handling is important for
correctness if the proxy ever receives an IPv6 Host header.

---

### S4. Multiple `#[allow(dead_code)]` Annotations on Public API

**Files**: `src/tls/acceptor.rs:14,33,48,58`, `src/tls/acme.rs:9,11,15,23,55`,
`src/config/static_config.rs:4,31,44,49,54,56,70,76,86,91`

**Problem**: Many public items are annotated with `#[allow(dead_code)]`, which
suggests they're defined but not yet used by the binary crate. This is fine
during initial development but should be cleaned up before release.

**Solution**: Remove `#[allow(dead_code)]` annotations once all features are
wired up (especially after C1 is fixed, since `build_acme_challenge_config` and
`TlsMode::Acme.challenge_config` will be needed). Run `cargo check` to identify
which items are actually dead code vs. just not yet referenced.

---

### S5. `InFlightCounter` Could Use Atomic Usize Directly

**File**: `src/server.rs:16-46`

**Problem**: `InFlightCounter` wraps an `AtomicUsize` with `increment`/`decrement`/`is_zero`
methods, plus a separate `InFlightGuard` RAII type. This is clean, but the
`Arc<InFlightCounter>` pattern means every connection task clones the `Arc`,
which has a small allocation cost.

**Solution**: This is a fine pattern for correctness. No change needed — the
RAII guard pattern correctly ensures `decrement` is always called even on panic.
Mentioning as a suggestion for awareness, not action.

---

### S6. Add `Clone` to `SiteConfig` Derive Is Correct But May Mask Cloning Hot Paths

**File**: `src/config/dynamic_config.rs:70`

**Problem**: `SiteConfig` derives `Clone`, which is used in
`src/proxy/handler.rs:53` where `site.clone()` clones the site config on every
request. For hot paths, this allocates new `String`s for `host`, `upstream`, and
`upstream_scheme`.

**Solution**: Consider using `Arc<SiteConfig>` in the routing table so that
lookups return an `Arc` clone (cheap atomic refcount) instead of a full `String`
clone. This would change `routing_table` from `HashMap<String, SiteConfig>` to
`HashMap<String, Arc<SiteConfig>>`. This is a performance optimization for
later — not a correctness issue.

---

### S7. HTTPS Client Trusts System Root Certificates Unconditionally

**File**: `src/proxy/handler.rs:153-165`

**Problem**: The `root_certs()` function loads native certificates and silently
skips any that fail to parse (`roots.add(cert).ok()`). While this matches the
spec (review #001, W13), certificate validation failures for upstream TLS
connections will produce opaque IO errors.

**Solution**: This is acceptable for Phase 1 per the spec (ADR-017). Consider
logging the number of root certificates loaded for operational visibility:

```rust
let cert_count = result.certs.len();
info!(certs_loaded = cert_count, "loaded system root certificates");
```

---

### S8. Request Timeout Applies to Entire HTTP Exchange, Not Just Response

**File**: `src/proxy/handler.rs:75-79`

**Problem**: `tokio::time::timeout(request_timeout, client.request(...))` applies
the timeout to the entire HTTP round-trip including response body streaming.
For large file downloads or slow upstreams, this means a 60-second timeout kills
the response even if the upstream is actively sending data. A more precise
timeout would apply only to the connection + first-byte response, then stream
the body without a timeout.

**Solution**: For Phase 1, this is acceptable behavior — the timeout name
(`upstream_request_timeout_secs`) is documented as applying to the full request.
Consider splitting into connect-timeout and response-timeout in Phase 2. The
`SiteConfig` already has separate `upstream_connect_timeout_secs` and
`upstream_request_timeout_secs` fields, but `upstream_connect_timeout_secs` is
unused (see W4).

---

## Summary Statistics

| Severity | Count | Status |
|----------|-------|--------|
| Critical | 4 | Must fix before production |
| Warning | 12 | Should fix — correctness and robustness |
| Suggestion | 8 | Consider for code quality |

## Recommended Fix Priority

1. **C1 (ACME challenge listener)** — Without this, ACME cert provisioning is
   completely broken. This is the highest priority fix.
2. **C2 (ACME contact email)** — Without this, production Let's Encrypt
   registration fails.
3. **C4 (ConfigReloadHandle static config drift)** — Every reload will produce
   stale warnings, confusing operators.
4. **C3 (X-Forwarded-For comment)** — Correct behavior, just needs a clarifying
   comment to prevent future "fixes."
5. **W1 (Shutdown drain)** — Can cause connection drops on graceful restart.
6. **W4 (Connect timeout)** — Slow upstreams can exhaust the full request
   timeout budget.
7. **W5 (Health path collision)** — Upstream `/health` endpoints are silently
   intercepted.
8. **W7 (Admin socket shutdown)** — Cannot gracefully stop.
9. **Remaining W and S findings** — Fix opportunistically.