Add post-implementation code review (4 critical, 12 warning, 8 suggestion findings)
This commit is contained in:
586
docs/reviews/002-implementation-review.md
Normal file
586
docs/reviews/002-implementation-review.md
Normal file
@@ -0,0 +1,586 @@
|
|||||||
|
---
|
||||||
|
status: draft
|
||||||
|
last_updated: 2026-06-11
|
||||||
|
reviewed_code:
|
||||||
|
- src/main.rs
|
||||||
|
- src/server.rs
|
||||||
|
- src/tls/acceptor.rs
|
||||||
|
- src/tls/acme.rs
|
||||||
|
- src/tls/config.rs
|
||||||
|
- src/tls/redirect.rs
|
||||||
|
- src/config/validation.rs
|
||||||
|
- src/config/dynamic_config.rs
|
||||||
|
- src/config/static_config.rs
|
||||||
|
- src/config/mod.rs
|
||||||
|
- src/proxy/handler.rs
|
||||||
|
- src/proxy/headers.rs
|
||||||
|
- src/proxy/error.rs
|
||||||
|
- src/proxy/body_limit.rs
|
||||||
|
- src/proxy/mod.rs
|
||||||
|
- src/rate_limit/mod.rs
|
||||||
|
- src/rate_limit/bucket.rs
|
||||||
|
- src/admin/socket.rs
|
||||||
|
- src/shutdown.rs
|
||||||
|
- src/health.rs
|
||||||
|
- src/logging/mod.rs
|
||||||
|
- src/logging/format.rs
|
||||||
|
- src/cli.rs
|
||||||
|
reviewer: code-reviewer
|
||||||
|
---
|
||||||
|
|
||||||
|
# Implementation Review #002
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Post-implementation review of all modules. Each finding is structured as
|
||||||
|
**Problem** → **Solution** (or **Open Question** where no solution is yet known).
|
||||||
|
|
||||||
|
## Severity Definitions
|
||||||
|
|
||||||
|
| Severity | Meaning |
|
||||||
|
|----------|---------|
|
||||||
|
| **Critical** | Will cause incorrect behavior or security issues in production |
|
||||||
|
| **Warning** | Could cause issues under specific conditions or represents a missed edge case |
|
||||||
|
| **Suggestion** | Code quality, style, or minor improvement opportunity |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical Findings
|
||||||
|
|
||||||
|
### C1. ACME Challenge Listener Not Started
|
||||||
|
|
||||||
|
**File**: `src/main.rs:168-181`
|
||||||
|
|
||||||
|
**Problem**: When `TlsMode::Acme` is matched, the `challenge_config` and `resolver`
|
||||||
|
fields are destructured with `_` (discarded). The ACME TLS-ALPN-01 challenge
|
||||||
|
listener — required for Let's Encrypt certificate provisioning — is never started.
|
||||||
|
Without it, ACME certificate issuance will fail: Let's Encrypt cannot verify domain
|
||||||
|
ownership, and no certificates will ever be obtained.
|
||||||
|
|
||||||
|
**Solution**: Wire the challenge config into a separate TLS listener or configure
|
||||||
|
the `rustls-acme` resolver to serve TLS-ALPN-01 challenges on the same port.
|
||||||
|
The `rustls-acme` crate's `ResolvesServerCertAcme` handles ALPN protocol
|
||||||
|
negotiation automatically — if the ACME challenge ALPN protocol (`acme-tls/1`)
|
||||||
|
is included in the server's ALPN list, the resolver will serve challenge
|
||||||
|
certificates on the main HTTPS port. Verify that the `build_acme_server_config`
|
||||||
|
function includes `ACME_TLS_ALPN_01` in its ALPN list (it already does at line 28).
|
||||||
|
This means the main listener should be sufficient for TLS-ALPN-01 challenges
|
||||||
|
if the resolver is correctly installed. The separate `challenge_config` in
|
||||||
|
`TlsMode::Acme` may be unnecessary — confirm whether `rustls-acme` requires a
|
||||||
|
dedicated listener or if the resolver on the default config handles challenges
|
||||||
|
automatically.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### C2. ACME Contact Email Always Empty
|
||||||
|
|
||||||
|
**File**: `src/tls/acme.rs:86`
|
||||||
|
|
||||||
|
**Problem**: `AcmeTlsConfig` always sets `contact: vec![]`. Let's Encrypt requires
|
||||||
|
a contact email for production certificate requests. The `acme_directory` config
|
||||||
|
field supports `"production"` but an empty contact list will cause ACME
|
||||||
|
registration to fail with a 400-level error from the Let's Encrypt API.
|
||||||
|
|
||||||
|
**Solution**: Add an `acme_contact` field (e.g., `acme_contact = "mailto:admin@example.com"`)
|
||||||
|
to the `TlsConfig` struct and wire it through to `AcmeTlsConfig.contact`. This
|
||||||
|
requires changes to:
|
||||||
|
1. `src/config/static_config.rs` — add `acme_contact: Vec<String>` to `TlsConfig`
|
||||||
|
2. `src/tls/acme.rs` — use `self.contact` in `AcmeTlsConfig::setup`
|
||||||
|
3. `src/tls/acceptor.rs` — pass the contact list from `TlsConfig`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### C3. X-Forwarded-For Replaces Instead of Appending
|
||||||
|
|
||||||
|
**File**: `src/proxy/headers.rs:28`
|
||||||
|
|
||||||
|
**Problem**: `inject_proxy_headers` uses `headers.insert()` for `X-Forwarded-For`,
|
||||||
|
which **replaces** any existing value. The spec (review #001, finding C2) decided
|
||||||
|
that as an edge proxy the X-Forwarded-For should be **set** (not appended),
|
||||||
|
because there are no trusted proxies in front. However, the implementation
|
||||||
|
doesn't match either behavior — it silently discards any existing header rather
|
||||||
|
than explicitly setting it to just the client IP, which is correct for an edge
|
||||||
|
proxy but was implemented without a clear comment explaining the rationale.
|
||||||
|
|
||||||
|
**Solution**: The current behavior is actually **correct** for an edge proxy
|
||||||
|
(per review #001/C2). The fix is documentation, not code change. Add a comment
|
||||||
|
to `inject_proxy_headers` explaining:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// X-Forwarded-For is SET (not appended) because this proxy is the outermost
|
||||||
|
// edge proxy. Any existing X-Forwarded-For from the client is untrusted and
|
||||||
|
// must be replaced with the actual client IP from ConnectInfo.
|
||||||
|
```
|
||||||
|
|
||||||
|
This ensures future maintainers don't "fix" this by changing `insert` to `append`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### C4. ConfigReloadHandle Never Updates Stored StaticConfig
|
||||||
|
|
||||||
|
**File**: `src/config/dynamic_config.rs:124-148`
|
||||||
|
|
||||||
|
**Problem**: The `reload()` method computes `diff_static_config(&self.static_config, &new_static)`
|
||||||
|
and returns the changed fields, but **never updates `self.static_config`** with
|
||||||
|
the new static config. This means:
|
||||||
|
- The first reload correctly reports changed fields
|
||||||
|
- The second reload compares against the **original** static config, not the
|
||||||
|
last-reloaded one, and reports the same changes again
|
||||||
|
- Operators get repeated "static config fields changed" warnings for the same
|
||||||
|
fields on every reload
|
||||||
|
|
||||||
|
**Solution**: After validation succeeds and the diff is computed, update
|
||||||
|
`self.static_config` with the new value. Since `ConfigReloadHandle` is accessed
|
||||||
|
concurrently, the static config field needs interior mutability. Use
|
||||||
|
`ArcSwap<StaticConfig>` or `std::sync::RwLock<StaticConfig>`:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub struct ConfigReloadHandle {
|
||||||
|
config: Arc<ArcSwap<DynamicConfig>>,
|
||||||
|
static_config: ArcSwap<StaticConfig>, // Changed from StaticConfig
|
||||||
|
reload_mutex: Mutex<()>,
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Then in `reload()`:
|
||||||
|
```rust
|
||||||
|
let changed_fields = diff_static_config(&self.static_config.load(), &new_static);
|
||||||
|
self.static_config.store(Arc::new(new_static));
|
||||||
|
self.config.store(Arc::new(new_dynamic));
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Warning Findings
|
||||||
|
|
||||||
|
### W1. Shutdown Aborts Listeners Without Draining
|
||||||
|
|
||||||
|
**File**: `src/main.rs:238-240`
|
||||||
|
|
||||||
|
**Problem**: On shutdown, `handle.abort()` is called on each HTTPS server task.
|
||||||
|
This immediately kills the tokio task, which can interrupt in-flight request
|
||||||
|
processing. The `drain_in_flight` counter is only decremented in
|
||||||
|
`InFlightGuard::drop`, but `abort()` prevents the guard from being dropped
|
||||||
|
normally for connections still being processed by the task.
|
||||||
|
|
||||||
|
**Solution**: Instead of `abort()`, use `tokio::time::timeout` with the shutdown
|
||||||
|
timeout to `join` each handle:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let shutdown_timeout = shutdown.shutdown_timeout();
|
||||||
|
for handle in https_server_handles {
|
||||||
|
match tokio::time::timeout(shutdown_timeout, handle).await {
|
||||||
|
Ok(_) => {}
|
||||||
|
Err(_) => {
|
||||||
|
warn!("shutdown timeout expired, aborting listener task");
|
||||||
|
handle.abort();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Alternatively, keep the `shutdown_rx` pattern already in `serve_https_listener`
|
||||||
|
(which breaks the accept loop on shutdown signal) and remove the `abort()` calls
|
||||||
|
— the tasks will naturally exit when they stop accepting new connections and
|
||||||
|
existing connections drain.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W2. Shutdown Doesn't Stop Admin Socket or Background Tasks
|
||||||
|
|
||||||
|
**File**: `src/main.rs:238-250`
|
||||||
|
|
||||||
|
**Problem**: The shutdown sequence aborts HTTPS listeners but doesn't stop:
|
||||||
|
- The admin socket listener (runs in infinite loop at `src/admin/socket.rs:99-111`)
|
||||||
|
- The rate limiter eviction task (runs in infinite loop at `src/rate_limit/mod.rs:106-112`)
|
||||||
|
- The ACME state machine task
|
||||||
|
|
||||||
|
These tasks will continue running until the process exits, which is fine for
|
||||||
|
process termination but means they can't be gracefully stopped in tests or
|
||||||
|
during a clean shutdown.
|
||||||
|
|
||||||
|
**Solution**: For Phase 1, this is acceptable — process termination will clean
|
||||||
|
up these tasks. For future improvements:
|
||||||
|
- Pass a `CancellationToken` or `watch::Receiver<bool>` to `start_admin_socket`
|
||||||
|
and the eviction task
|
||||||
|
- On shutdown, signal cancellation before waiting for drain
|
||||||
|
- The ACME state machine task already exits when the stream ends (`None` branch
|
||||||
|
at `src/tls/acme.rs:152-158`), but it should also be cancellable
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W3. Fragile Error Detection in Connection Error Handler
|
||||||
|
|
||||||
|
**File**: `src/server.rs:95-97`
|
||||||
|
|
||||||
|
**Problem**: The check `e.to_string().contains("incomplete message")` to silently
|
||||||
|
suppress connection errors is fragile. String matching on error descriptions can
|
||||||
|
break across hyper versions, locale changes, or error message reformatting.
|
||||||
|
|
||||||
|
**Solution**: Match on the error type instead. Check if `hyper` exposes a
|
||||||
|
client-disconnect error variant, or use `e.is_incomplete_message()` if available
|
||||||
|
in the hyper error API. If no typed variant exists, add a comment explaining
|
||||||
|
why string matching is used and which version(s) of hyper produce this message.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W4. No Separate Connect Timeout for Upstream Requests
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:73-79`
|
||||||
|
|
||||||
|
**Problem**: The proxy uses `tokio::time::timeout` with
|
||||||
|
`upstream_request_timeout_secs` for the entire request, but there's no separate
|
||||||
|
connect timeout. A slow DNS resolution or TCP handshake will consume the full
|
||||||
|
request timeout budget, leaving no time for the actual request/response cycle.
|
||||||
|
|
||||||
|
**Solution**: Add a `connect_timeout` (either a fixed default or from
|
||||||
|
`upstream_connect_timeout_secs` in `SiteConfig`). Structure the proxy call as:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let connect_timeout = Duration::from_secs(site.upstream_connect_timeout_secs);
|
||||||
|
let result = tokio::time::timeout(
|
||||||
|
request_timeout,
|
||||||
|
async {
|
||||||
|
let upstream_req = build_upstream_request(req, &upstream_uri)?;
|
||||||
|
// The hyper client handles connect internally, but we can wrap
|
||||||
|
// the request in a two-phase timeout if needed
|
||||||
|
client.request(upstream_req).await
|
||||||
|
}
|
||||||
|
).await;
|
||||||
|
```
|
||||||
|
|
||||||
|
The `SiteConfig` already has `upstream_connect_timeout_secs` but it's not used
|
||||||
|
in `proxy_handler`. This should be wired up to either set the client's connect
|
||||||
|
timeout or to implement a two-phase timeout.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W5. Hardcoded `/health` Path Intercepted on All Hosts
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:37-39`
|
||||||
|
|
||||||
|
**Problem**: The proxy handler returns 200 OK for `GET /health` regardless of the
|
||||||
|
Host header. This means any site's `/health` path will be intercepted by the
|
||||||
|
proxy and never reach the upstream. If an upstream application uses `/health`
|
||||||
|
for its own health checks, those requests will never reach it.
|
||||||
|
|
||||||
|
**Solution**: Either:
|
||||||
|
1. Use a less common path like `/__health` or `/healthz` that won't collide
|
||||||
|
with upstream applications, OR
|
||||||
|
2. Only intercept `/health` when the Host header doesn't match any known site
|
||||||
|
(fallthrough), OR
|
||||||
|
3. Make the health check path configurable via `StaticConfig`
|
||||||
|
|
||||||
|
Option 1 is simplest for Phase 1. Option 3 is most flexible long-term.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W6. Token Bucket Refill Uses Millisecond Precision
|
||||||
|
|
||||||
|
**File**: `src/rate_limit/bucket.rs:37`
|
||||||
|
|
||||||
|
**Problem**: `elapsed.as_millis()` truncates sub-millisecond time, which can lead
|
||||||
|
to token refill inaccuracies under high-frequency request bursts. For example,
|
||||||
|
two requests arriving 500µs apart both see `0ms` elapsed and don't refill tokens.
|
||||||
|
|
||||||
|
**Solution**: Use `as_nanos()` for the refill calculation:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let elapsed = now.duration_since(self.last_refill).as_nanos() as f64;
|
||||||
|
let tokens_to_add = (elapsed / 1_000_000_000.0) * rate;
|
||||||
|
```
|
||||||
|
|
||||||
|
This provides nanosecond-precision refill while keeping the math in floating point.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W7. Admin Socket Has No Shutdown Mechanism
|
||||||
|
|
||||||
|
**File**: `src/admin/socket.rs:99-111`
|
||||||
|
|
||||||
|
**Problem**: `start_admin_socket` runs an infinite `loop` accepting connections
|
||||||
|
with no way to break out. It doesn't accept a shutdown signal, so it can't be
|
||||||
|
gracefully stopped during process shutdown or in tests.
|
||||||
|
|
||||||
|
**Solution**: Accept a `watch::Receiver<bool>` parameter and use `tokio::select!`
|
||||||
|
to check for shutdown:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
tokio::select! {
|
||||||
|
result = listener.accept() => { /* handle connection */ },
|
||||||
|
_ = shutdown_rx.changed() => {
|
||||||
|
info!("admin socket shutting down");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This also requires cleaning up the socket file on exit.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W8. Server Header Unconditionally Stripped from Upstream Response
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:85`
|
||||||
|
|
||||||
|
**Problem**: `parts.headers.remove("server")` unconditionally removes the upstream
|
||||||
|
`Server` header. This is a design choice, not necessarily a bug, but it means
|
||||||
|
downstream clients can't see what software the upstream is running, which may be
|
||||||
|
undesirable for debugging.
|
||||||
|
|
||||||
|
**Solution**: This is acceptable behavior for a security-focused reverse proxy
|
||||||
|
(hiding upstream identity is a defense-in-depth measure). Document this decision
|
||||||
|
with a comment in the code explaining the rationale.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W9. Logging Test Fails Due to Global Subscriber
|
||||||
|
|
||||||
|
**File**: `src/logging/mod.rs:99-111`
|
||||||
|
|
||||||
|
**Problem**: The test `init_creates_log_directory_and_file` calls `init()` which
|
||||||
|
sets a global default tracing subscriber. When tests run in parallel, this
|
||||||
|
conflicts with other tests that may also set a subscriber, causing the test to
|
||||||
|
fail with "a global default trace dispatcher has already been set."
|
||||||
|
|
||||||
|
**Solution**: Use `tracing_subscriber::fmt().with_test_writer()` and guard
|
||||||
|
against double-initialization. Alternatively, use `std::sync::OnceLock` or
|
||||||
|
`tracing_subscriber::util::SubscriberInitExt::try_init()` which returns an error
|
||||||
|
if already set rather than panicking.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W10. Body Limit Middleware Only Checks Content-Length Header
|
||||||
|
|
||||||
|
**File**: `src/proxy/body_limit.rs:26-33`
|
||||||
|
|
||||||
|
**Problem**: The middleware checks the `Content-Length` header but doesn't handle
|
||||||
|
requests that lack `Content-Length` (e.g., chunked transfers, HTTP/2). A
|
||||||
|
malicious client can send `Transfer-Encoding: chunked` without `Content-Length`
|
||||||
|
and bypass the initial check, though the `Limited` body wrapper at line 37 will
|
||||||
|
still enforce the limit during streaming.
|
||||||
|
|
||||||
|
**Solution**: This is actually acceptable — the `Limited` body wrapper on line
|
||||||
|
37 is the real enforcement mechanism. The `Content-Length` check on line 26 is
|
||||||
|
an early-rejection optimization for clients that do include the header. Add a
|
||||||
|
comment explaining this two-layer defense:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Early rejection: if Content-Length is present and exceeds the limit, reject
|
||||||
|
// immediately without reading the body. For requests without Content-Length
|
||||||
|
// (chunked, HTTP/2), the Limited body wrapper below enforces the limit during
|
||||||
|
// streaming.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W11. Health Check Port Conflict Check Is Incomplete
|
||||||
|
|
||||||
|
**File**: `src/config/validation.rs:169-186`
|
||||||
|
|
||||||
|
**Problem**: The validation checks if `health_check_port` conflicts with any
|
||||||
|
listener's `https_port` or `http_port`, but it doesn't check whether the health
|
||||||
|
check port's bind address conflicts with a listener on a different bind address.
|
||||||
|
For example, `health_check_port = 80` bound to `127.0.0.1` shouldn't conflict
|
||||||
|
with a listener's `http_port = 80` bound to `203.0.113.10`.
|
||||||
|
|
||||||
|
**Solution**: This is acceptable for Phase 1 since the health check always binds
|
||||||
|
to `127.0.0.1` (hardcoded in `src/health.rs:20`). Document that health check
|
||||||
|
always binds to localhost, so the conflict check is conservative (warns even
|
||||||
|
when it might not actually conflict). Add a comment in the validation code
|
||||||
|
explaining this.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### W12. `build_upstream_request` Copies All Headers Without Filtering
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:124-128`
|
||||||
|
|
||||||
|
**Problem**: `build_upstream_request` copies all headers from the original request
|
||||||
|
to the upstream request. However, `remove_hop_by_hop` was already called on the
|
||||||
|
original request's headers at line 59 before `build_upstream_request` is called,
|
||||||
|
so hop-by-hop headers have already been removed. The proxy headers
|
||||||
|
(`X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`) were also already injected
|
||||||
|
at line 58. This means the function works correctly, but the code path is
|
||||||
|
spread across multiple locations, making it harder to reason about header
|
||||||
|
lifecycle.
|
||||||
|
|
||||||
|
**Solution**: Consider consolidating header manipulation into a single function
|
||||||
|
or adding inline comments that trace the header lifecycle:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Header lifecycle:
|
||||||
|
// 1. inject_proxy_headers() — adds X-Real-IP, X-Forwarded-For, X-Forwarded-Proto
|
||||||
|
// 2. remove_hop_by_hop() — removes Connection, Keep-Alive, etc.
|
||||||
|
// 3. build_upstream_request() — copies remaining headers to upstream request
|
||||||
|
// 4. Response: remove_hop_by_hop() on upstream response headers
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Suggestions
|
||||||
|
|
||||||
|
### S1. `http_port` Validation Allows 0 but Not Documented
|
||||||
|
|
||||||
|
**File**: `src/config/validation.rs:106`
|
||||||
|
|
||||||
|
**Problem**: `http_port = 0` is treated as "disabled" (not validated as a port
|
||||||
|
number), which is correct, but the validation error message for invalid ports
|
||||||
|
says "must be 0 (disabled) or 1-65535" while `HttpPortInvalid` only checks
|
||||||
|
`http_port > 0 && http_port < 1` implicitly. The actual check is `http_port > 0`
|
||||||
|
to enter the conflict check, and port 0 is always allowed for HTTP.
|
||||||
|
|
||||||
|
**Solution**: Add explicit validation for `http_port` being in the range 0 or
|
||||||
|
1-65535 (currently it only validates conflicts, not the port range itself for
|
||||||
|
http). Add a `HttpPortInvalid` check similar to `HttpsPortInvalid`:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
if listener.http_port > 65535 {
|
||||||
|
errors.push(ValidationError::HttpPortInvalid { ... });
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S2. Consider Using `#[non_exhaustive]` on Public Enums
|
||||||
|
|
||||||
|
**Files**: `src/tls/acceptor.rs:49` (`TlsMode`), `src/proxy/error.rs:5` (`ProxyError`),
|
||||||
|
`src/admin/socket.rs:15` (`AdminSocketError`), `src/config/validation.rs:10` (`ValidationError`)
|
||||||
|
|
||||||
|
**Problem**: These public enums can be matched exhaustively by downstream consumers.
|
||||||
|
Adding a new variant would be a breaking change.
|
||||||
|
|
||||||
|
**Solution**: Add `#[non_exhaustive]` to these enums to allow future expansion
|
||||||
|
without breaking changes. This is especially important for `TlsMode` (which may
|
||||||
|
gain a `"letsencrypt"` or `"auto"` mode) and `ProxyError` (which may gain
|
||||||
|
`UpstreamTls` error handling).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S3. `normalize_host` in `dynamic_config.rs` Doesn't Handle Edge Cases
|
||||||
|
|
||||||
|
**File**: `src/config/dynamic_config.rs:52-55`
|
||||||
|
|
||||||
|
**Problem**: `normalize_host` uses `split(':').next()` to strip ports, but this
|
||||||
|
fails for IPv6 addresses in brackets (e.g., `[::1]:443` would normalize to
|
||||||
|
`[::1]` instead of `::1`). The `strip_port_from_host` function in
|
||||||
|
`src/tls/redirect.rs:16-28` correctly handles this case.
|
||||||
|
|
||||||
|
**Solution**: Either reuse the `strip_port_from_host` logic from `redirect.rs`
|
||||||
|
or add a shared utility function. The IPv6 bracket handling is important for
|
||||||
|
correctness if the proxy ever receives an IPv6 Host header.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S4. Multiple `#[allow(dead_code)]` Annotations on Public API
|
||||||
|
|
||||||
|
**Files**: `src/tls/acceptor.rs:14,33,48,58`, `src/tls/acme.rs:9,11,15,23,55`,
|
||||||
|
`src/config/static_config.rs:4,31,44,49,54,56,70,76,86,91`
|
||||||
|
|
||||||
|
**Problem**: Many public items are annotated with `#[allow(dead_code)]`, which
|
||||||
|
suggests they're defined but not yet used by the binary crate. This is fine
|
||||||
|
during initial development but should be cleaned up before release.
|
||||||
|
|
||||||
|
**Solution**: Remove `#[allow(dead_code)]` annotations once all features are
|
||||||
|
wired up (especially after C1 is fixed, since `build_acme_challenge_config` and
|
||||||
|
`TlsMode::Acme.challenge_config` will be needed). Run `cargo check` to identify
|
||||||
|
which items are actually dead code vs. just not yet referenced.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S5. `InFlightCounter` Could Use Atomic Usize Directly
|
||||||
|
|
||||||
|
**File**: `src/server.rs:16-46`
|
||||||
|
|
||||||
|
**Problem**: `InFlightCounter` wraps an `AtomicUsize` with `increment`/`decrement`/`is_zero`
|
||||||
|
methods, plus a separate `InFlightGuard` RAII type. This is clean, but the
|
||||||
|
`Arc<InFlightCounter>` pattern means every connection task clones the `Arc`,
|
||||||
|
which has a small allocation cost.
|
||||||
|
|
||||||
|
**Solution**: This is a fine pattern for correctness. No change needed — the
|
||||||
|
RAII guard pattern correctly ensures `decrement` is always called even on panic.
|
||||||
|
Mentioning as a suggestion for awareness, not action.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S6. Add `Clone` to `SiteConfig` Derive Is Correct But May Mask Cloning Hot Paths
|
||||||
|
|
||||||
|
**File**: `src/config/dynamic_config.rs:70`
|
||||||
|
|
||||||
|
**Problem**: `SiteConfig` derives `Clone`, which is used in
|
||||||
|
`src/proxy/handler.rs:53` where `site.clone()` clones the site config on every
|
||||||
|
request. For hot paths, this allocates new `String`s for `host`, `upstream`, and
|
||||||
|
`upstream_scheme`.
|
||||||
|
|
||||||
|
**Solution**: Consider using `Arc<SiteConfig>` in the routing table so that
|
||||||
|
lookups return an `Arc` clone (cheap atomic refcount) instead of a full `String`
|
||||||
|
clone. This would change `routing_table` from `HashMap<String, SiteConfig>` to
|
||||||
|
`HashMap<String, Arc<SiteConfig>>`. This is a performance optimization for
|
||||||
|
later — not a correctness issue.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S7. HTTPS Client Trusts System Root Certificates Unconditionally
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:153-165`
|
||||||
|
|
||||||
|
**Problem**: The `root_certs()` function loads native certificates and silently
|
||||||
|
skips any that fail to parse (`roots.add(cert).ok()`). While this matches the
|
||||||
|
spec (review #001, W13), certificate validation failures for upstream TLS
|
||||||
|
connections will produce opaque IO errors.
|
||||||
|
|
||||||
|
**Solution**: This is acceptable for Phase 1 per the spec (ADR-017). Consider
|
||||||
|
logging the number of root certificates loaded for operational visibility:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let cert_count = result.certs.len();
|
||||||
|
info!(certs_loaded = cert_count, "loaded system root certificates");
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### S8. Request Timeout Applies to Entire HTTP Exchange, Not Just Response
|
||||||
|
|
||||||
|
**File**: `src/proxy/handler.rs:75-79`
|
||||||
|
|
||||||
|
**Problem**: `tokio::time::timeout(request_timeout, client.request(...))` applies
|
||||||
|
the timeout to the entire HTTP round-trip including response body streaming.
|
||||||
|
For large file downloads or slow upstreams, this means a 60-second timeout kills
|
||||||
|
the response even if the upstream is actively sending data. A more precise
|
||||||
|
timeout would apply only to the connection + first-byte response, then stream
|
||||||
|
the body without a timeout.
|
||||||
|
|
||||||
|
**Solution**: For Phase 1, this is acceptable behavior — the timeout name
|
||||||
|
(`upstream_request_timeout_secs`) is documented as applying to the full request.
|
||||||
|
Consider splitting into connect-timeout and response-timeout in Phase 2. The
|
||||||
|
`SiteConfig` already has separate `upstream_connect_timeout_secs` and
|
||||||
|
`upstream_request_timeout_secs` fields, but `upstream_connect_timeout_secs` is
|
||||||
|
unused (see W4).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Statistics
|
||||||
|
|
||||||
|
| Severity | Count | Status |
|
||||||
|
|----------|-------|--------|
|
||||||
|
| Critical | 4 | Must fix before production |
|
||||||
|
| Warning | 12 | Should fix — correctness and robustness |
|
||||||
|
| Suggestion | 8 | Consider for code quality |
|
||||||
|
|
||||||
|
## Recommended Fix Priority
|
||||||
|
|
||||||
|
1. **C1 (ACME challenge listener)** — Without this, ACME cert provisioning is
|
||||||
|
completely broken. This is the highest priority fix.
|
||||||
|
2. **C2 (ACME contact email)** — Without this, production Let's Encrypt
|
||||||
|
registration fails.
|
||||||
|
3. **C4 (ConfigReloadHandle static config drift)** — Every reload will produce
|
||||||
|
stale warnings, confusing operators.
|
||||||
|
4. **C3 (X-Forwarded-For comment)** — Correct behavior, just needs a clarifying
|
||||||
|
comment to prevent future "fixes."
|
||||||
|
5. **W1 (Shutdown drain)** — Can cause connection drops on graceful restart.
|
||||||
|
6. **W4 (Connect timeout)** — Slow upstreams can exhaust the full request
|
||||||
|
timeout budget.
|
||||||
|
7. **W5 (Health path collision)** — Upstream `/health` endpoints are silently
|
||||||
|
intercepted.
|
||||||
|
8. **W7 (Admin socket shutdown)** — Cannot gracefully stop.
|
||||||
|
9. **Remaining W and S findings** — Fix opportunistically.
|
||||||
Reference in New Issue
Block a user