W13: No request access logging - log_request! macro defined but never called W14: is_https hardcoded to true on ProxyState - X-Forwarded-Proto always https S9: Rate limiting silently bypassed when no client IP found S10: Integration test TOML has [[listeners.listeners.sites]] typo S11: No Server response header added by proxy (upstream's is stripped)
27 KiB
status, last_updated, reviewed_code, reviewer
| status | last_updated | reviewed_code | reviewer | |||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| draft | 2026-06-11 |
|
code-reviewer |
Implementation Review #002
Purpose
Post-implementation review of all modules. Each finding is structured as Problem → Solution (or Open Question where no solution is yet known).
Severity Definitions
| Severity | Meaning |
|---|---|
| Critical | Will cause incorrect behavior or security issues in production |
| Warning | Could cause issues under specific conditions or represents a missed edge case |
| Suggestion | Code quality, style, or minor improvement opportunity |
Critical Findings
C1. ACME Challenge Listener Not Started
File: src/main.rs:168-181
Problem: When TlsMode::Acme is matched, the challenge_config and resolver
fields are destructured with _ (discarded). The ACME TLS-ALPN-01 challenge
listener — required for Let's Encrypt certificate provisioning — is never started.
Without it, ACME certificate issuance will fail: Let's Encrypt cannot verify domain
ownership, and no certificates will ever be obtained.
Solution: Wire the challenge config into a separate TLS listener or configure
the rustls-acme resolver to serve TLS-ALPN-01 challenges on the same port.
The rustls-acme crate's ResolvesServerCertAcme handles ALPN protocol
negotiation automatically — if the ACME challenge ALPN protocol (acme-tls/1)
is included in the server's ALPN list, the resolver will serve challenge
certificates on the main HTTPS port. Verify that the build_acme_server_config
function includes ACME_TLS_ALPN_01 in its ALPN list (it already does at line 28).
This means the main listener should be sufficient for TLS-ALPN-01 challenges
if the resolver is correctly installed. The separate challenge_config in
TlsMode::Acme may be unnecessary — confirm whether rustls-acme requires a
dedicated listener or if the resolver on the default config handles challenges
automatically.
C2. ACME Contact Email Always Empty
File: src/tls/acme.rs:86
Problem: AcmeTlsConfig always sets contact: vec![]. Let's Encrypt requires
a contact email for production certificate requests. The acme_directory config
field supports "production" but an empty contact list will cause ACME
registration to fail with a 400-level error from the Let's Encrypt API.
Solution: Add an acme_contact field (e.g., acme_contact = "mailto:admin@example.com")
to the TlsConfig struct and wire it through to AcmeTlsConfig.contact. This
requires changes to:
src/config/static_config.rs— addacme_contact: Vec<String>toTlsConfigsrc/tls/acme.rs— useself.contactinAcmeTlsConfig::setupsrc/tls/acceptor.rs— pass the contact list fromTlsConfig
C3. X-Forwarded-For Replaces Instead of Appending
File: src/proxy/headers.rs:28
Problem: inject_proxy_headers uses headers.insert() for X-Forwarded-For,
which replaces any existing value. The spec (review #001, finding C2) decided
that as an edge proxy the X-Forwarded-For should be set (not appended),
because there are no trusted proxies in front. However, the implementation
doesn't match either behavior — it silently discards any existing header rather
than explicitly setting it to just the client IP, which is correct for an edge
proxy but was implemented without a clear comment explaining the rationale.
Solution: The current behavior is actually correct for an edge proxy
(per review #001/C2). The fix is documentation, not code change. Add a comment
to inject_proxy_headers explaining:
// X-Forwarded-For is SET (not appended) because this proxy is the outermost
// edge proxy. Any existing X-Forwarded-For from the client is untrusted and
// must be replaced with the actual client IP from ConnectInfo.
This ensures future maintainers don't "fix" this by changing insert to append.
C4. ConfigReloadHandle Never Updates Stored StaticConfig
File: src/config/dynamic_config.rs:124-148
Problem: The reload() method computes diff_static_config(&self.static_config, &new_static)
and returns the changed fields, but never updates self.static_config with
the new static config. This means:
- The first reload correctly reports changed fields
- The second reload compares against the original static config, not the last-reloaded one, and reports the same changes again
- Operators get repeated "static config fields changed" warnings for the same fields on every reload
Solution: After validation succeeds and the diff is computed, update
self.static_config with the new value. Since ConfigReloadHandle is accessed
concurrently, the static config field needs interior mutability. Use
ArcSwap<StaticConfig> or std::sync::RwLock<StaticConfig>:
pub struct ConfigReloadHandle {
config: Arc<ArcSwap<DynamicConfig>>,
static_config: ArcSwap<StaticConfig>, // Changed from StaticConfig
reload_mutex: Mutex<()>,
}
Then in reload():
let changed_fields = diff_static_config(&self.static_config.load(), &new_static);
self.static_config.store(Arc::new(new_static));
self.config.store(Arc::new(new_dynamic));
Warning Findings
W1. Shutdown Aborts Listeners Without Draining
File: src/main.rs:238-240
Problem: On shutdown, handle.abort() is called on each HTTPS server task.
This immediately kills the tokio task, which can interrupt in-flight request
processing. The drain_in_flight counter is only decremented in
InFlightGuard::drop, but abort() prevents the guard from being dropped
normally for connections still being processed by the task.
Solution: Instead of abort(), use tokio::time::timeout with the shutdown
timeout to join each handle:
let shutdown_timeout = shutdown.shutdown_timeout();
for handle in https_server_handles {
match tokio::time::timeout(shutdown_timeout, handle).await {
Ok(_) => {}
Err(_) => {
warn!("shutdown timeout expired, aborting listener task");
handle.abort();
}
}
}
Alternatively, keep the shutdown_rx pattern already in serve_https_listener
(which breaks the accept loop on shutdown signal) and remove the abort() calls
— the tasks will naturally exit when they stop accepting new connections and
existing connections drain.
W2. Shutdown Doesn't Stop Admin Socket or Background Tasks
File: src/main.rs:238-250
Problem: The shutdown sequence aborts HTTPS listeners but doesn't stop:
- The admin socket listener (runs in infinite loop at
src/admin/socket.rs:99-111) - The rate limiter eviction task (runs in infinite loop at
src/rate_limit/mod.rs:106-112) - The ACME state machine task
These tasks will continue running until the process exits, which is fine for process termination but means they can't be gracefully stopped in tests or during a clean shutdown.
Solution: For Phase 1, this is acceptable — process termination will clean up these tasks. For future improvements:
- Pass a
CancellationTokenorwatch::Receiver<bool>tostart_admin_socketand the eviction task - On shutdown, signal cancellation before waiting for drain
- The ACME state machine task already exits when the stream ends (
Nonebranch atsrc/tls/acme.rs:152-158), but it should also be cancellable
W3. Fragile Error Detection in Connection Error Handler
File: src/server.rs:95-97
Problem: The check e.to_string().contains("incomplete message") to silently
suppress connection errors is fragile. String matching on error descriptions can
break across hyper versions, locale changes, or error message reformatting.
Solution: Match on the error type instead. Check if hyper exposes a
client-disconnect error variant, or use e.is_incomplete_message() if available
in the hyper error API. If no typed variant exists, add a comment explaining
why string matching is used and which version(s) of hyper produce this message.
W4. No Separate Connect Timeout for Upstream Requests
File: src/proxy/handler.rs:73-79
Problem: The proxy uses tokio::time::timeout with
upstream_request_timeout_secs for the entire request, but there's no separate
connect timeout. A slow DNS resolution or TCP handshake will consume the full
request timeout budget, leaving no time for the actual request/response cycle.
Solution: Add a connect_timeout (either a fixed default or from
upstream_connect_timeout_secs in SiteConfig). Structure the proxy call as:
let connect_timeout = Duration::from_secs(site.upstream_connect_timeout_secs);
let result = tokio::time::timeout(
request_timeout,
async {
let upstream_req = build_upstream_request(req, &upstream_uri)?;
// The hyper client handles connect internally, but we can wrap
// the request in a two-phase timeout if needed
client.request(upstream_req).await
}
).await;
The SiteConfig already has upstream_connect_timeout_secs but it's not used
in proxy_handler. This should be wired up to either set the client's connect
timeout or to implement a two-phase timeout.
W5. Hardcoded /health Path Intercepted on All Hosts
File: src/proxy/handler.rs:37-39
Problem: The proxy handler returns 200 OK for GET /health regardless of the
Host header. This means any site's /health path will be intercepted by the
proxy and never reach the upstream. If an upstream application uses /health
for its own health checks, those requests will never reach it.
Solution: Either:
- Use a less common path like
/__healthor/healthzthat won't collide with upstream applications, OR - Only intercept
/healthwhen the Host header doesn't match any known site (fallthrough), OR - Make the health check path configurable via
StaticConfig
Option 1 is simplest for Phase 1. Option 3 is most flexible long-term.
W6. Token Bucket Refill Uses Millisecond Precision
File: src/rate_limit/bucket.rs:37
Problem: elapsed.as_millis() truncates sub-millisecond time, which can lead
to token refill inaccuracies under high-frequency request bursts. For example,
two requests arriving 500µs apart both see 0ms elapsed and don't refill tokens.
Solution: Use as_nanos() for the refill calculation:
let elapsed = now.duration_since(self.last_refill).as_nanos() as f64;
let tokens_to_add = (elapsed / 1_000_000_000.0) * rate;
This provides nanosecond-precision refill while keeping the math in floating point.
W7. Admin Socket Has No Shutdown Mechanism
File: src/admin/socket.rs:99-111
Problem: start_admin_socket runs an infinite loop accepting connections
with no way to break out. It doesn't accept a shutdown signal, so it can't be
gracefully stopped during process shutdown or in tests.
Solution: Accept a watch::Receiver<bool> parameter and use tokio::select!
to check for shutdown:
tokio::select! {
result = listener.accept() => { /* handle connection */ },
_ = shutdown_rx.changed() => {
info!("admin socket shutting down");
break;
}
}
This also requires cleaning up the socket file on exit.
W8. Server Header Unconditionally Stripped from Upstream Response
File: src/proxy/handler.rs:85
Problem: parts.headers.remove("server") unconditionally removes the upstream
Server header. This is a design choice, not necessarily a bug, but it means
downstream clients can't see what software the upstream is running, which may be
undesirable for debugging.
Solution: This is acceptable behavior for a security-focused reverse proxy (hiding upstream identity is a defense-in-depth measure). Document this decision with a comment in the code explaining the rationale.
W9. Logging Test Fails Due to Global Subscriber
File: src/logging/mod.rs:99-111
Problem: The test init_creates_log_directory_and_file calls init() which
sets a global default tracing subscriber. When tests run in parallel, this
conflicts with other tests that may also set a subscriber, causing the test to
fail with "a global default trace dispatcher has already been set."
Solution: Use tracing_subscriber::fmt().with_test_writer() and guard
against double-initialization. Alternatively, use std::sync::OnceLock or
tracing_subscriber::util::SubscriberInitExt::try_init() which returns an error
if already set rather than panicking.
W10. Body Limit Middleware Only Checks Content-Length Header
File: src/proxy/body_limit.rs:26-33
Problem: The middleware checks the Content-Length header but doesn't handle
requests that lack Content-Length (e.g., chunked transfers, HTTP/2). A
malicious client can send Transfer-Encoding: chunked without Content-Length
and bypass the initial check, though the Limited body wrapper at line 37 will
still enforce the limit during streaming.
Solution: This is actually acceptable — the Limited body wrapper on line
37 is the real enforcement mechanism. The Content-Length check on line 26 is
an early-rejection optimization for clients that do include the header. Add a
comment explaining this two-layer defense:
// Early rejection: if Content-Length is present and exceeds the limit, reject
// immediately without reading the body. For requests without Content-Length
// (chunked, HTTP/2), the Limited body wrapper below enforces the limit during
// streaming.
W11. Health Check Port Conflict Check Is Incomplete
File: src/config/validation.rs:169-186
Problem: The validation checks if health_check_port conflicts with any
listener's https_port or http_port, but it doesn't check whether the health
check port's bind address conflicts with a listener on a different bind address.
For example, health_check_port = 80 bound to 127.0.0.1 shouldn't conflict
with a listener's http_port = 80 bound to 203.0.113.10.
Solution: This is acceptable for Phase 1 since the health check always binds
to 127.0.0.1 (hardcoded in src/health.rs:20). Document that health check
always binds to localhost, so the conflict check is conservative (warns even
when it might not actually conflict). Add a comment in the validation code
explaining this.
W12. build_upstream_request Copies All Headers Without Filtering
File: src/proxy/handler.rs:124-128
Problem: build_upstream_request copies all headers from the original request
to the upstream request. However, remove_hop_by_hop was already called on the
original request's headers at line 59 before build_upstream_request is called,
so hop-by-hop headers have already been removed. The proxy headers
(X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) were also already injected
at line 58. This means the function works correctly, but the code path is
spread across multiple locations, making it harder to reason about header
lifecycle.
Solution: Consider consolidating header manipulation into a single function or adding inline comments that trace the header lifecycle:
// Header lifecycle:
// 1. inject_proxy_headers() — adds X-Real-IP, X-Forwarded-For, X-Forwarded-Proto
// 2. remove_hop_by_hop() — removes Connection, Keep-Alive, etc.
// 3. build_upstream_request() — copies remaining headers to upstream request
// 4. Response: remove_hop_by_hop() on upstream response headers
Suggestions
S1. http_port Validation Allows 0 but Not Documented
File: src/config/validation.rs:106
Problem: http_port = 0 is treated as "disabled" (not validated as a port
number), which is correct, but the validation error message for invalid ports
says "must be 0 (disabled) or 1-65535" while HttpPortInvalid only checks
http_port > 0 && http_port < 1 implicitly. The actual check is http_port > 0
to enter the conflict check, and port 0 is always allowed for HTTP.
Solution: Add explicit validation for http_port being in the range 0 or
1-65535 (currently it only validates conflicts, not the port range itself for
http). Add a HttpPortInvalid check similar to HttpsPortInvalid:
if listener.http_port > 65535 {
errors.push(ValidationError::HttpPortInvalid { ... });
}
S2. Consider Using #[non_exhaustive] on Public Enums
Files: src/tls/acceptor.rs:49 (TlsMode), src/proxy/error.rs:5 (ProxyError),
src/admin/socket.rs:15 (AdminSocketError), src/config/validation.rs:10 (ValidationError)
Problem: These public enums can be matched exhaustively by downstream consumers. Adding a new variant would be a breaking change.
Solution: Add #[non_exhaustive] to these enums to allow future expansion
without breaking changes. This is especially important for TlsMode (which may
gain a "letsencrypt" or "auto" mode) and ProxyError (which may gain
UpstreamTls error handling).
S3. normalize_host in dynamic_config.rs Doesn't Handle Edge Cases
File: src/config/dynamic_config.rs:52-55
Problem: normalize_host uses split(':').next() to strip ports, but this
fails for IPv6 addresses in brackets (e.g., [::1]:443 would normalize to
[::1] instead of ::1). The strip_port_from_host function in
src/tls/redirect.rs:16-28 correctly handles this case.
Solution: Either reuse the strip_port_from_host logic from redirect.rs
or add a shared utility function. The IPv6 bracket handling is important for
correctness if the proxy ever receives an IPv6 Host header.
S4. Multiple #[allow(dead_code)] Annotations on Public API
Files: src/tls/acceptor.rs:14,33,48,58, src/tls/acme.rs:9,11,15,23,55,
src/config/static_config.rs:4,31,44,49,54,56,70,76,86,91
Problem: Many public items are annotated with #[allow(dead_code)], which
suggests they're defined but not yet used by the binary crate. This is fine
during initial development but should be cleaned up before release.
Solution: Remove #[allow(dead_code)] annotations once all features are
wired up (especially after C1 is fixed, since build_acme_challenge_config and
TlsMode::Acme.challenge_config will be needed). Run cargo check to identify
which items are actually dead code vs. just not yet referenced.
S5. InFlightCounter Could Use Atomic Usize Directly
File: src/server.rs:16-46
Problem: InFlightCounter wraps an AtomicUsize with increment/decrement/is_zero
methods, plus a separate InFlightGuard RAII type. This is clean, but the
Arc<InFlightCounter> pattern means every connection task clones the Arc,
which has a small allocation cost.
Solution: This is a fine pattern for correctness. No change needed — the
RAII guard pattern correctly ensures decrement is always called even on panic.
Mentioning as a suggestion for awareness, not action.
S6. Add Clone to SiteConfig Derive Is Correct But May Mask Cloning Hot Paths
File: src/config/dynamic_config.rs:70
Problem: SiteConfig derives Clone, which is used in
src/proxy/handler.rs:53 where site.clone() clones the site config on every
request. For hot paths, this allocates new Strings for host, upstream, and
upstream_scheme.
Solution: Consider using Arc<SiteConfig> in the routing table so that
lookups return an Arc clone (cheap atomic refcount) instead of a full String
clone. This would change routing_table from HashMap<String, SiteConfig> to
HashMap<String, Arc<SiteConfig>>. This is a performance optimization for
later — not a correctness issue.
S7. HTTPS Client Trusts System Root Certificates Unconditionally
File: src/proxy/handler.rs:153-165
Problem: The root_certs() function loads native certificates and silently
skips any that fail to parse (roots.add(cert).ok()). While this matches the
spec (review #001, W13), certificate validation failures for upstream TLS
connections will produce opaque IO errors.
Solution: This is acceptable for Phase 1 per the spec (ADR-017). Consider logging the number of root certificates loaded for operational visibility:
let cert_count = result.certs.len();
info!(certs_loaded = cert_count, "loaded system root certificates");
S8. Request Timeout Applies to Entire HTTP Exchange, Not Just Response
File: src/proxy/handler.rs:75-79
Problem: tokio::time::timeout(request_timeout, client.request(...)) applies
the timeout to the entire HTTP round-trip including response body streaming.
For large file downloads or slow upstreams, this means a 60-second timeout kills
the response even if the upstream is actively sending data. A more precise
timeout would apply only to the connection + first-byte response, then stream
the body without a timeout.
Solution: For Phase 1, this is acceptable behavior — the timeout name
(upstream_request_timeout_secs) is documented as applying to the full request.
Consider splitting into connect-timeout and response-timeout in Phase 2. The
SiteConfig already has separate upstream_connect_timeout_secs and
upstream_request_timeout_secs fields, but upstream_connect_timeout_secs is
unused (see W4).
W13. No Request Access Logging
File: src/proxy/handler.rs, src/logging/format.rs
Problem: The proxy_handler function does not log successful or failed proxied
requests. The log_request! macro is defined in src/logging/format.rs but is
never called anywhere in the codebase. Without request-level access logging, there
is no observability into what the proxy is doing — no way to see which hosts are
being requested, response status codes, upstream latency, or client IPs for normal
traffic. This also means fail2ban cannot consume proxy access logs for the scanner-
catching jails (nginx-badbots, nginx-botsearch) that were part of the original
nginx setup.
Solution: Add request logging in proxy_handler that emits a structured log
line for every proxied request (method, host, path, status code, upstream address,
duration, client IP). Use the log_request! macro that already exists. This is
critical for production — without it, the proxy is operationally blind.
W14. is_https Hardcoded to true on ProxyState
File: src/main.rs:77
Problem: ProxyState.is_https is always set to true:
let proxy_state = Arc::new(ProxyState {
config: config_arc.clone(),
http_client,
https_client,
is_https: true,
});
This means X-Forwarded-Proto will always be set to https in
inject_proxy_headers. For a TLS-terminating reverse proxy this is correct
(since all connections arrive over TLS), but it's a latent bug if the proxy is
ever used in a non-TLS context or if someone adds an HTTP listener that routes
through the same handler.
Solution: Either remove the is_https field and hardcode "https" in the
header injection (with a comment explaining why), or derive the value from the
listener configuration so each listener sets it correctly. For Phase 1, a comment
explaining the assumption is sufficient.
S9. Rate Limiting Silently Bypassed When No Client IP Found
File: src/rate_limit/mod.rs:78-79
Problem: When rate_limit_middleware cannot extract a client IP (no
X-Forwarded-For header and no ConnectInfo extension), the request passes
through without rate limiting. In the current deployment, ConnectInfo is always
set by ConnectInfoService in server.rs, so this won't happen in practice.
However, if the proxy were ever deployed without ConnectInfo propagation (e.g.,
behind another load balancer that strips it), rate limiting would silently become
a no-op for all requests.
Solution: Consider logging a warning when a request has no identifiable client IP, and optionally returning a 429 or 503 instead of passing through. For Phase 1, add a comment documenting the fallthrough behavior.
S10. Integration Test TOML Has Double-Nested Listeners Key
File: tests/integration_test.rs:514
Problem: The write_valid_config helper contains:
[[listeners.listeners.sites]]
This should be [[listeners.sites]] — the double-nested listeners.listeners
is a TOML path that doesn't match the config schema. The test passes because the
binary validates the file and exits with an error, and the test checks that
--validate with this file fails (it's used as an invalid config test fixture).
However, the variable name write_valid_config suggests it was intended to produce
a valid config, and the invalid TOML path makes the test's intent unclear.
Solution: Fix the TOML key to [[listeners.sites]] and verify the test still
passes, or rename the function to write_invalid_config if the invalid TOML is
intentional.
S11. No Server Response Header Added by Proxy
File: src/proxy/handler.rs:85
Problem: The proxy strips the upstream Server header (per W8) but does not
add its own Server header. Some load balancers, monitoring tools, and HTTP
spec-compliant clients expect a Server header in responses. This is a minor
observation, not a bug — omitting Server is a valid security hardening choice
(hiding server identity).
Solution: This is acceptable for Phase 1. If desired later, add a configurable
Server header (e.g., Server: reverse-proxy) or leave it absent.
Summary Statistics
| Severity | Count | Status |
|---|---|---|
| Critical | 4 | Must fix before production |
| Warning | 14 | Should fix — correctness and robustness |
| Suggestion | 11 | Consider for code quality |
Recommended Fix Priority
- C1 (ACME challenge listener) — Without this, ACME cert provisioning is completely broken. This is the highest priority fix.
- C2 (ACME contact email) — Without this, production Let's Encrypt registration fails.
- C4 (ConfigReloadHandle static config drift) — Every reload will produce stale warnings, confusing operators.
- C3 (X-Forwarded-For comment) — Correct behavior, just needs a clarifying comment to prevent future "fixes."
- W13 (No request access logging) — Proxy is operationally blind without access logs; fail2ban cannot consume proxy traffic data.
- W1 (Shutdown drain) — Can cause connection drops on graceful restart.
- W4 (Connect timeout) — Slow upstreams can exhaust the full request timeout budget.
- W5 (Health path collision) — Upstream
/healthendpoints are silently intercepted. - W7 (Admin socket shutdown) — Cannot gracefully stop.
- W14 (is_https hardcoded) — Latent bug for non-TLS contexts; needs comment.
- Remaining W and S findings — Fix opportunistically.