Add post-implementation code review (4 critical, 12 warning, 8 suggestion findings)
This commit is contained in:
586
docs/reviews/002-implementation-review.md
Normal file
586
docs/reviews/002-implementation-review.md
Normal file
@@ -0,0 +1,586 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-11
|
||||
reviewed_code:
|
||||
- src/main.rs
|
||||
- src/server.rs
|
||||
- src/tls/acceptor.rs
|
||||
- src/tls/acme.rs
|
||||
- src/tls/config.rs
|
||||
- src/tls/redirect.rs
|
||||
- src/config/validation.rs
|
||||
- src/config/dynamic_config.rs
|
||||
- src/config/static_config.rs
|
||||
- src/config/mod.rs
|
||||
- src/proxy/handler.rs
|
||||
- src/proxy/headers.rs
|
||||
- src/proxy/error.rs
|
||||
- src/proxy/body_limit.rs
|
||||
- src/proxy/mod.rs
|
||||
- src/rate_limit/mod.rs
|
||||
- src/rate_limit/bucket.rs
|
||||
- src/admin/socket.rs
|
||||
- src/shutdown.rs
|
||||
- src/health.rs
|
||||
- src/logging/mod.rs
|
||||
- src/logging/format.rs
|
||||
- src/cli.rs
|
||||
reviewer: code-reviewer
|
||||
---
|
||||
|
||||
# Implementation Review #002
|
||||
|
||||
## Purpose
|
||||
|
||||
Post-implementation review of all modules. Each finding is structured as
|
||||
**Problem** → **Solution** (or **Open Question** where no solution is yet known).
|
||||
|
||||
## Severity Definitions
|
||||
|
||||
| Severity | Meaning |
|
||||
|----------|---------|
|
||||
| **Critical** | Will cause incorrect behavior or security issues in production |
|
||||
| **Warning** | Could cause issues under specific conditions or represents a missed edge case |
|
||||
| **Suggestion** | Code quality, style, or minor improvement opportunity |
|
||||
|
||||
---
|
||||
|
||||
## Critical Findings
|
||||
|
||||
### C1. ACME Challenge Listener Not Started
|
||||
|
||||
**File**: `src/main.rs:168-181`
|
||||
|
||||
**Problem**: When `TlsMode::Acme` is matched, the `challenge_config` and `resolver`
|
||||
fields are destructured with `_` (discarded). The ACME TLS-ALPN-01 challenge
|
||||
listener — required for Let's Encrypt certificate provisioning — is never started.
|
||||
Without it, ACME certificate issuance will fail: Let's Encrypt cannot verify domain
|
||||
ownership, and no certificates will ever be obtained.
|
||||
|
||||
**Solution**: Wire the challenge config into a separate TLS listener or configure
|
||||
the `rustls-acme` resolver to serve TLS-ALPN-01 challenges on the same port.
|
||||
The `rustls-acme` crate's `ResolvesServerCertAcme` handles ALPN protocol
|
||||
negotiation automatically — if the ACME challenge ALPN protocol (`acme-tls/1`)
|
||||
is included in the server's ALPN list, the resolver will serve challenge
|
||||
certificates on the main HTTPS port. Verify that the `build_acme_server_config`
|
||||
function includes `ACME_TLS_ALPN_01` in its ALPN list (it already does at line 28).
|
||||
This means the main listener should be sufficient for TLS-ALPN-01 challenges
|
||||
if the resolver is correctly installed. The separate `challenge_config` in
|
||||
`TlsMode::Acme` may be unnecessary — confirm whether `rustls-acme` requires a
|
||||
dedicated listener or if the resolver on the default config handles challenges
|
||||
automatically.
|
||||
|
||||
---
|
||||
|
||||
### C2. ACME Contact Email Always Empty
|
||||
|
||||
**File**: `src/tls/acme.rs:86`
|
||||
|
||||
**Problem**: `AcmeTlsConfig` always sets `contact: vec![]`. Let's Encrypt requires
|
||||
a contact email for production certificate requests. The `acme_directory` config
|
||||
field supports `"production"` but an empty contact list will cause ACME
|
||||
registration to fail with a 400-level error from the Let's Encrypt API.
|
||||
|
||||
**Solution**: Add an `acme_contact` field (e.g., `acme_contact = "mailto:admin@example.com"`)
|
||||
to the `TlsConfig` struct and wire it through to `AcmeTlsConfig.contact`. This
|
||||
requires changes to:
|
||||
1. `src/config/static_config.rs` — add `acme_contact: Vec<String>` to `TlsConfig`
|
||||
2. `src/tls/acme.rs` — use `self.contact` in `AcmeTlsConfig::setup`
|
||||
3. `src/tls/acceptor.rs` — pass the contact list from `TlsConfig`
|
||||
|
||||
---
|
||||
|
||||
### C3. X-Forwarded-For Replaces Instead of Appending
|
||||
|
||||
**File**: `src/proxy/headers.rs:28`
|
||||
|
||||
**Problem**: `inject_proxy_headers` uses `headers.insert()` for `X-Forwarded-For`,
|
||||
which **replaces** any existing value. The spec (review #001, finding C2) decided
|
||||
that as an edge proxy the X-Forwarded-For should be **set** (not appended),
|
||||
because there are no trusted proxies in front. However, the implementation
|
||||
doesn't match either behavior — it silently discards any existing header rather
|
||||
than explicitly setting it to just the client IP, which is correct for an edge
|
||||
proxy but was implemented without a clear comment explaining the rationale.
|
||||
|
||||
**Solution**: The current behavior is actually **correct** for an edge proxy
|
||||
(per review #001/C2). The fix is documentation, not code change. Add a comment
|
||||
to `inject_proxy_headers` explaining:
|
||||
|
||||
```rust
|
||||
// X-Forwarded-For is SET (not appended) because this proxy is the outermost
|
||||
// edge proxy. Any existing X-Forwarded-For from the client is untrusted and
|
||||
// must be replaced with the actual client IP from ConnectInfo.
|
||||
```
|
||||
|
||||
This ensures future maintainers don't "fix" this by changing `insert` to `append`.
|
||||
|
||||
---
|
||||
|
||||
### C4. ConfigReloadHandle Never Updates Stored StaticConfig
|
||||
|
||||
**File**: `src/config/dynamic_config.rs:124-148`
|
||||
|
||||
**Problem**: The `reload()` method computes `diff_static_config(&self.static_config, &new_static)`
|
||||
and returns the changed fields, but **never updates `self.static_config`** with
|
||||
the new static config. This means:
|
||||
- The first reload correctly reports changed fields
|
||||
- The second reload compares against the **original** static config, not the
|
||||
last-reloaded one, and reports the same changes again
|
||||
- Operators get repeated "static config fields changed" warnings for the same
|
||||
fields on every reload
|
||||
|
||||
**Solution**: After validation succeeds and the diff is computed, update
|
||||
`self.static_config` with the new value. Since `ConfigReloadHandle` is accessed
|
||||
concurrently, the static config field needs interior mutability. Use
|
||||
`ArcSwap<StaticConfig>` or `std::sync::RwLock<StaticConfig>`:
|
||||
|
||||
```rust
|
||||
pub struct ConfigReloadHandle {
|
||||
config: Arc<ArcSwap<DynamicConfig>>,
|
||||
static_config: ArcSwap<StaticConfig>, // Changed from StaticConfig
|
||||
reload_mutex: Mutex<()>,
|
||||
}
|
||||
```
|
||||
|
||||
Then in `reload()`:
|
||||
```rust
|
||||
let changed_fields = diff_static_config(&self.static_config.load(), &new_static);
|
||||
self.static_config.store(Arc::new(new_static));
|
||||
self.config.store(Arc::new(new_dynamic));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Warning Findings
|
||||
|
||||
### W1. Shutdown Aborts Listeners Without Draining
|
||||
|
||||
**File**: `src/main.rs:238-240`
|
||||
|
||||
**Problem**: On shutdown, `handle.abort()` is called on each HTTPS server task.
|
||||
This immediately kills the tokio task, which can interrupt in-flight request
|
||||
processing. The `drain_in_flight` counter is only decremented in
|
||||
`InFlightGuard::drop`, but `abort()` prevents the guard from being dropped
|
||||
normally for connections still being processed by the task.
|
||||
|
||||
**Solution**: Instead of `abort()`, use `tokio::time::timeout` with the shutdown
|
||||
timeout to `join` each handle:
|
||||
|
||||
```rust
|
||||
let shutdown_timeout = shutdown.shutdown_timeout();
|
||||
for handle in https_server_handles {
|
||||
match tokio::time::timeout(shutdown_timeout, handle).await {
|
||||
Ok(_) => {}
|
||||
Err(_) => {
|
||||
warn!("shutdown timeout expired, aborting listener task");
|
||||
handle.abort();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Alternatively, keep the `shutdown_rx` pattern already in `serve_https_listener`
|
||||
(which breaks the accept loop on shutdown signal) and remove the `abort()` calls
|
||||
— the tasks will naturally exit when they stop accepting new connections and
|
||||
existing connections drain.
|
||||
|
||||
---
|
||||
|
||||
### W2. Shutdown Doesn't Stop Admin Socket or Background Tasks
|
||||
|
||||
**File**: `src/main.rs:238-250`
|
||||
|
||||
**Problem**: The shutdown sequence aborts HTTPS listeners but doesn't stop:
|
||||
- The admin socket listener (runs in infinite loop at `src/admin/socket.rs:99-111`)
|
||||
- The rate limiter eviction task (runs in infinite loop at `src/rate_limit/mod.rs:106-112`)
|
||||
- The ACME state machine task
|
||||
|
||||
These tasks will continue running until the process exits, which is fine for
|
||||
process termination but means they can't be gracefully stopped in tests or
|
||||
during a clean shutdown.
|
||||
|
||||
**Solution**: For Phase 1, this is acceptable — process termination will clean
|
||||
up these tasks. For future improvements:
|
||||
- Pass a `CancellationToken` or `watch::Receiver<bool>` to `start_admin_socket`
|
||||
and the eviction task
|
||||
- On shutdown, signal cancellation before waiting for drain
|
||||
- The ACME state machine task already exits when the stream ends (`None` branch
|
||||
at `src/tls/acme.rs:152-158`), but it should also be cancellable
|
||||
|
||||
---
|
||||
|
||||
### W3. Fragile Error Detection in Connection Error Handler
|
||||
|
||||
**File**: `src/server.rs:95-97`
|
||||
|
||||
**Problem**: The check `e.to_string().contains("incomplete message")` to silently
|
||||
suppress connection errors is fragile. String matching on error descriptions can
|
||||
break across hyper versions, locale changes, or error message reformatting.
|
||||
|
||||
**Solution**: Match on the error type instead. Check if `hyper` exposes a
|
||||
client-disconnect error variant, or use `e.is_incomplete_message()` if available
|
||||
in the hyper error API. If no typed variant exists, add a comment explaining
|
||||
why string matching is used and which version(s) of hyper produce this message.
|
||||
|
||||
---
|
||||
|
||||
### W4. No Separate Connect Timeout for Upstream Requests
|
||||
|
||||
**File**: `src/proxy/handler.rs:73-79`
|
||||
|
||||
**Problem**: The proxy uses `tokio::time::timeout` with
|
||||
`upstream_request_timeout_secs` for the entire request, but there's no separate
|
||||
connect timeout. A slow DNS resolution or TCP handshake will consume the full
|
||||
request timeout budget, leaving no time for the actual request/response cycle.
|
||||
|
||||
**Solution**: Add a `connect_timeout` (either a fixed default or from
|
||||
`upstream_connect_timeout_secs` in `SiteConfig`). Structure the proxy call as:
|
||||
|
||||
```rust
|
||||
let connect_timeout = Duration::from_secs(site.upstream_connect_timeout_secs);
|
||||
let result = tokio::time::timeout(
|
||||
request_timeout,
|
||||
async {
|
||||
let upstream_req = build_upstream_request(req, &upstream_uri)?;
|
||||
// The hyper client handles connect internally, but we can wrap
|
||||
// the request in a two-phase timeout if needed
|
||||
client.request(upstream_req).await
|
||||
}
|
||||
).await;
|
||||
```
|
||||
|
||||
The `SiteConfig` already has `upstream_connect_timeout_secs` but it's not used
|
||||
in `proxy_handler`. This should be wired up to either set the client's connect
|
||||
timeout or to implement a two-phase timeout.
|
||||
|
||||
---
|
||||
|
||||
### W5. Hardcoded `/health` Path Intercepted on All Hosts
|
||||
|
||||
**File**: `src/proxy/handler.rs:37-39`
|
||||
|
||||
**Problem**: The proxy handler returns 200 OK for `GET /health` regardless of the
|
||||
Host header. This means any site's `/health` path will be intercepted by the
|
||||
proxy and never reach the upstream. If an upstream application uses `/health`
|
||||
for its own health checks, those requests will never reach it.
|
||||
|
||||
**Solution**: Either:
|
||||
1. Use a less common path like `/__health` or `/healthz` that won't collide
|
||||
with upstream applications, OR
|
||||
2. Only intercept `/health` when the Host header doesn't match any known site
|
||||
(fallthrough), OR
|
||||
3. Make the health check path configurable via `StaticConfig`
|
||||
|
||||
Option 1 is simplest for Phase 1. Option 3 is most flexible long-term.
|
||||
|
||||
---
|
||||
|
||||
### W6. Token Bucket Refill Uses Millisecond Precision
|
||||
|
||||
**File**: `src/rate_limit/bucket.rs:37`
|
||||
|
||||
**Problem**: `elapsed.as_millis()` truncates sub-millisecond time, which can lead
|
||||
to token refill inaccuracies under high-frequency request bursts. For example,
|
||||
two requests arriving 500µs apart both see `0ms` elapsed and don't refill tokens.
|
||||
|
||||
**Solution**: Use `as_nanos()` for the refill calculation:
|
||||
|
||||
```rust
|
||||
let elapsed = now.duration_since(self.last_refill).as_nanos() as f64;
|
||||
let tokens_to_add = (elapsed / 1_000_000_000.0) * rate;
|
||||
```
|
||||
|
||||
This provides nanosecond-precision refill while keeping the math in floating point.
|
||||
|
||||
---
|
||||
|
||||
### W7. Admin Socket Has No Shutdown Mechanism
|
||||
|
||||
**File**: `src/admin/socket.rs:99-111`
|
||||
|
||||
**Problem**: `start_admin_socket` runs an infinite `loop` accepting connections
|
||||
with no way to break out. It doesn't accept a shutdown signal, so it can't be
|
||||
gracefully stopped during process shutdown or in tests.
|
||||
|
||||
**Solution**: Accept a `watch::Receiver<bool>` parameter and use `tokio::select!`
|
||||
to check for shutdown:
|
||||
|
||||
```rust
|
||||
tokio::select! {
|
||||
result = listener.accept() => { /* handle connection */ },
|
||||
_ = shutdown_rx.changed() => {
|
||||
info!("admin socket shutting down");
|
||||
break;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This also requires cleaning up the socket file on exit.
|
||||
|
||||
---
|
||||
|
||||
### W8. Server Header Unconditionally Stripped from Upstream Response
|
||||
|
||||
**File**: `src/proxy/handler.rs:85`
|
||||
|
||||
**Problem**: `parts.headers.remove("server")` unconditionally removes the upstream
|
||||
`Server` header. This is a design choice, not necessarily a bug, but it means
|
||||
downstream clients can't see what software the upstream is running, which may be
|
||||
undesirable for debugging.
|
||||
|
||||
**Solution**: This is acceptable behavior for a security-focused reverse proxy
|
||||
(hiding upstream identity is a defense-in-depth measure). Document this decision
|
||||
with a comment in the code explaining the rationale.
|
||||
|
||||
---
|
||||
|
||||
### W9. Logging Test Fails Due to Global Subscriber
|
||||
|
||||
**File**: `src/logging/mod.rs:99-111`
|
||||
|
||||
**Problem**: The test `init_creates_log_directory_and_file` calls `init()` which
|
||||
sets a global default tracing subscriber. When tests run in parallel, this
|
||||
conflicts with other tests that may also set a subscriber, causing the test to
|
||||
fail with "a global default trace dispatcher has already been set."
|
||||
|
||||
**Solution**: Use `tracing_subscriber::fmt().with_test_writer()` and guard
|
||||
against double-initialization. Alternatively, use `std::sync::OnceLock` or
|
||||
`tracing_subscriber::util::SubscriberInitExt::try_init()` which returns an error
|
||||
if already set rather than panicking.
|
||||
|
||||
---
|
||||
|
||||
### W10. Body Limit Middleware Only Checks Content-Length Header
|
||||
|
||||
**File**: `src/proxy/body_limit.rs:26-33`
|
||||
|
||||
**Problem**: The middleware checks the `Content-Length` header but doesn't handle
|
||||
requests that lack `Content-Length` (e.g., chunked transfers, HTTP/2). A
|
||||
malicious client can send `Transfer-Encoding: chunked` without `Content-Length`
|
||||
and bypass the initial check, though the `Limited` body wrapper at line 37 will
|
||||
still enforce the limit during streaming.
|
||||
|
||||
**Solution**: This is actually acceptable — the `Limited` body wrapper on line
|
||||
37 is the real enforcement mechanism. The `Content-Length` check on line 26 is
|
||||
an early-rejection optimization for clients that do include the header. Add a
|
||||
comment explaining this two-layer defense:
|
||||
|
||||
```rust
|
||||
// Early rejection: if Content-Length is present and exceeds the limit, reject
|
||||
// immediately without reading the body. For requests without Content-Length
|
||||
// (chunked, HTTP/2), the Limited body wrapper below enforces the limit during
|
||||
// streaming.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### W11. Health Check Port Conflict Check Is Incomplete
|
||||
|
||||
**File**: `src/config/validation.rs:169-186`
|
||||
|
||||
**Problem**: The validation checks if `health_check_port` conflicts with any
|
||||
listener's `https_port` or `http_port`, but it doesn't check whether the health
|
||||
check port's bind address conflicts with a listener on a different bind address.
|
||||
For example, `health_check_port = 80` bound to `127.0.0.1` shouldn't conflict
|
||||
with a listener's `http_port = 80` bound to `203.0.113.10`.
|
||||
|
||||
**Solution**: This is acceptable for Phase 1 since the health check always binds
|
||||
to `127.0.0.1` (hardcoded in `src/health.rs:20`). Document that health check
|
||||
always binds to localhost, so the conflict check is conservative (warns even
|
||||
when it might not actually conflict). Add a comment in the validation code
|
||||
explaining this.
|
||||
|
||||
---
|
||||
|
||||
### W12. `build_upstream_request` Copies All Headers Without Filtering
|
||||
|
||||
**File**: `src/proxy/handler.rs:124-128`
|
||||
|
||||
**Problem**: `build_upstream_request` copies all headers from the original request
|
||||
to the upstream request. However, `remove_hop_by_hop` was already called on the
|
||||
original request's headers at line 59 before `build_upstream_request` is called,
|
||||
so hop-by-hop headers have already been removed. The proxy headers
|
||||
(`X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`) were also already injected
|
||||
at line 58. This means the function works correctly, but the code path is
|
||||
spread across multiple locations, making it harder to reason about header
|
||||
lifecycle.
|
||||
|
||||
**Solution**: Consider consolidating header manipulation into a single function
|
||||
or adding inline comments that trace the header lifecycle:
|
||||
|
||||
```rust
|
||||
// Header lifecycle:
|
||||
// 1. inject_proxy_headers() — adds X-Real-IP, X-Forwarded-For, X-Forwarded-Proto
|
||||
// 2. remove_hop_by_hop() — removes Connection, Keep-Alive, etc.
|
||||
// 3. build_upstream_request() — copies remaining headers to upstream request
|
||||
// 4. Response: remove_hop_by_hop() on upstream response headers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Suggestions
|
||||
|
||||
### S1. `http_port` Validation Allows 0 but Not Documented
|
||||
|
||||
**File**: `src/config/validation.rs:106`
|
||||
|
||||
**Problem**: `http_port = 0` is treated as "disabled" (not validated as a port
|
||||
number), which is correct, but the validation error message for invalid ports
|
||||
says "must be 0 (disabled) or 1-65535" while `HttpPortInvalid` only checks
|
||||
`http_port > 0 && http_port < 1` implicitly. The actual check is `http_port > 0`
|
||||
to enter the conflict check, and port 0 is always allowed for HTTP.
|
||||
|
||||
**Solution**: Add explicit validation for `http_port` being in the range 0 or
|
||||
1-65535 (currently it only validates conflicts, not the port range itself for
|
||||
http). Add a `HttpPortInvalid` check similar to `HttpsPortInvalid`:
|
||||
|
||||
```rust
|
||||
if listener.http_port > 65535 {
|
||||
errors.push(ValidationError::HttpPortInvalid { ... });
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### S2. Consider Using `#[non_exhaustive]` on Public Enums
|
||||
|
||||
**Files**: `src/tls/acceptor.rs:49` (`TlsMode`), `src/proxy/error.rs:5` (`ProxyError`),
|
||||
`src/admin/socket.rs:15` (`AdminSocketError`), `src/config/validation.rs:10` (`ValidationError`)
|
||||
|
||||
**Problem**: These public enums can be matched exhaustively by downstream consumers.
|
||||
Adding a new variant would be a breaking change.
|
||||
|
||||
**Solution**: Add `#[non_exhaustive]` to these enums to allow future expansion
|
||||
without breaking changes. This is especially important for `TlsMode` (which may
|
||||
gain a `"letsencrypt"` or `"auto"` mode) and `ProxyError` (which may gain
|
||||
`UpstreamTls` error handling).
|
||||
|
||||
---
|
||||
|
||||
### S3. `normalize_host` in `dynamic_config.rs` Doesn't Handle Edge Cases
|
||||
|
||||
**File**: `src/config/dynamic_config.rs:52-55`
|
||||
|
||||
**Problem**: `normalize_host` uses `split(':').next()` to strip ports, but this
|
||||
fails for IPv6 addresses in brackets (e.g., `[::1]:443` would normalize to
|
||||
`[::1]` instead of `::1`). The `strip_port_from_host` function in
|
||||
`src/tls/redirect.rs:16-28` correctly handles this case.
|
||||
|
||||
**Solution**: Either reuse the `strip_port_from_host` logic from `redirect.rs`
|
||||
or add a shared utility function. The IPv6 bracket handling is important for
|
||||
correctness if the proxy ever receives an IPv6 Host header.
|
||||
|
||||
---
|
||||
|
||||
### S4. Multiple `#[allow(dead_code)]` Annotations on Public API
|
||||
|
||||
**Files**: `src/tls/acceptor.rs:14,33,48,58`, `src/tls/acme.rs:9,11,15,23,55`,
|
||||
`src/config/static_config.rs:4,31,44,49,54,56,70,76,86,91`
|
||||
|
||||
**Problem**: Many public items are annotated with `#[allow(dead_code)]`, which
|
||||
suggests they're defined but not yet used by the binary crate. This is fine
|
||||
during initial development but should be cleaned up before release.
|
||||
|
||||
**Solution**: Remove `#[allow(dead_code)]` annotations once all features are
|
||||
wired up (especially after C1 is fixed, since `build_acme_challenge_config` and
|
||||
`TlsMode::Acme.challenge_config` will be needed). Run `cargo check` to identify
|
||||
which items are actually dead code vs. just not yet referenced.
|
||||
|
||||
---
|
||||
|
||||
### S5. `InFlightCounter` Could Use Atomic Usize Directly
|
||||
|
||||
**File**: `src/server.rs:16-46`
|
||||
|
||||
**Problem**: `InFlightCounter` wraps an `AtomicUsize` with `increment`/`decrement`/`is_zero`
|
||||
methods, plus a separate `InFlightGuard` RAII type. This is clean, but the
|
||||
`Arc<InFlightCounter>` pattern means every connection task clones the `Arc`,
|
||||
which has a small allocation cost.
|
||||
|
||||
**Solution**: This is a fine pattern for correctness. No change needed — the
|
||||
RAII guard pattern correctly ensures `decrement` is always called even on panic.
|
||||
Mentioning as a suggestion for awareness, not action.
|
||||
|
||||
---
|
||||
|
||||
### S6. Add `Clone` to `SiteConfig` Derive Is Correct But May Mask Cloning Hot Paths
|
||||
|
||||
**File**: `src/config/dynamic_config.rs:70`
|
||||
|
||||
**Problem**: `SiteConfig` derives `Clone`, which is used in
|
||||
`src/proxy/handler.rs:53` where `site.clone()` clones the site config on every
|
||||
request. For hot paths, this allocates new `String`s for `host`, `upstream`, and
|
||||
`upstream_scheme`.
|
||||
|
||||
**Solution**: Consider using `Arc<SiteConfig>` in the routing table so that
|
||||
lookups return an `Arc` clone (cheap atomic refcount) instead of a full `String`
|
||||
clone. This would change `routing_table` from `HashMap<String, SiteConfig>` to
|
||||
`HashMap<String, Arc<SiteConfig>>`. This is a performance optimization for
|
||||
later — not a correctness issue.
|
||||
|
||||
---
|
||||
|
||||
### S7. HTTPS Client Trusts System Root Certificates Unconditionally
|
||||
|
||||
**File**: `src/proxy/handler.rs:153-165`
|
||||
|
||||
**Problem**: The `root_certs()` function loads native certificates and silently
|
||||
skips any that fail to parse (`roots.add(cert).ok()`). While this matches the
|
||||
spec (review #001, W13), certificate validation failures for upstream TLS
|
||||
connections will produce opaque IO errors.
|
||||
|
||||
**Solution**: This is acceptable for Phase 1 per the spec (ADR-017). Consider
|
||||
logging the number of root certificates loaded for operational visibility:
|
||||
|
||||
```rust
|
||||
let cert_count = result.certs.len();
|
||||
info!(certs_loaded = cert_count, "loaded system root certificates");
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### S8. Request Timeout Applies to Entire HTTP Exchange, Not Just Response
|
||||
|
||||
**File**: `src/proxy/handler.rs:75-79`
|
||||
|
||||
**Problem**: `tokio::time::timeout(request_timeout, client.request(...))` applies
|
||||
the timeout to the entire HTTP round-trip including response body streaming.
|
||||
For large file downloads or slow upstreams, this means a 60-second timeout kills
|
||||
the response even if the upstream is actively sending data. A more precise
|
||||
timeout would apply only to the connection + first-byte response, then stream
|
||||
the body without a timeout.
|
||||
|
||||
**Solution**: For Phase 1, this is acceptable behavior — the timeout name
|
||||
(`upstream_request_timeout_secs`) is documented as applying to the full request.
|
||||
Consider splitting into connect-timeout and response-timeout in Phase 2. The
|
||||
`SiteConfig` already has separate `upstream_connect_timeout_secs` and
|
||||
`upstream_request_timeout_secs` fields, but `upstream_connect_timeout_secs` is
|
||||
unused (see W4).
|
||||
|
||||
---
|
||||
|
||||
## Summary Statistics
|
||||
|
||||
| Severity | Count | Status |
|
||||
|----------|-------|--------|
|
||||
| Critical | 4 | Must fix before production |
|
||||
| Warning | 12 | Should fix — correctness and robustness |
|
||||
| Suggestion | 8 | Consider for code quality |
|
||||
|
||||
## Recommended Fix Priority
|
||||
|
||||
1. **C1 (ACME challenge listener)** — Without this, ACME cert provisioning is
|
||||
completely broken. This is the highest priority fix.
|
||||
2. **C2 (ACME contact email)** — Without this, production Let's Encrypt
|
||||
registration fails.
|
||||
3. **C4 (ConfigReloadHandle static config drift)** — Every reload will produce
|
||||
stale warnings, confusing operators.
|
||||
4. **C3 (X-Forwarded-For comment)** — Correct behavior, just needs a clarifying
|
||||
comment to prevent future "fixes."
|
||||
5. **W1 (Shutdown drain)** — Can cause connection drops on graceful restart.
|
||||
6. **W4 (Connect timeout)** — Slow upstreams can exhaust the full request
|
||||
timeout budget.
|
||||
7. **W5 (Health path collision)** — Upstream `/health` endpoints are silently
|
||||
intercepted.
|
||||
8. **W7 (Admin socket shutdown)** — Cannot gracefully stop.
|
||||
9. **Remaining W and S findings** — Fix opportunistically.
|
||||
Reference in New Issue
Block a user