Break down findings from review #002 into dependency-ordered fix tasks:

Critical/High:
- fix/acme-contact-and-challenge (C1+C2): Add acme_contact field, wire to ACME, remove unused challenge_config, add validation rule 19
- fix/remove-health-and-hardcode-https (W5+W14+ADR-022): Remove /health from main listener, hardcode X-Forwarded-Proto to https
- fix/config-reload-static-drift (C4): Use ArcSwap<StaticConfig> so reload diffs against last config, not startup config
- fix/access-logging (W13): Wire up log_request! macro for every proxied request with client_ip, host, method, path, status, upstream, duration_ms

Medium:
- fix/graceful-shutdown (W1+W7): Join HTTPS tasks with timeout instead of abort, add shutdown signal to admin socket and eviction task
- fix/connect-timeout (W4): Wire upstream_connect_timeout_secs to enforce separate connect timeout

Low/Independent:
- fix/token-bucket-nanosecond (W6): Use as_nanos() instead of as_millis()
- fix/normalize-host-ipv6 (S3): Handle IPv6 bracket notation in normalize_host
- fix/http-port-validation (S1): Validate http_port in range 0 or 1-65535
- fix/integration-test-toml (S10): Fix double-nested listeners.listeners.sites
- fix/logging-test-global-subscriber (W9): Use try_init() to avoid test conflicts
- fix/fragile-error-detection (W3): Add typed error matching or documented string match
- fix/add-code-comments (C3,W8,W10,W11,S9): Document correct-but-non-obvious behaviors
- fix/request-timeout-scope (S8): Document full-request timeout scope
- fix/clean-dead-code (S4+S2): Remove dead_code annotations, add #[non_exhaustive]

Review gate:
- review/post-fix-review: Verify all fixes against architecture spec
| id | name | status | depends_on | scope | risk | impact | level | review_findings |
|---|---|---|---|---|---|---|---|---|
| fix/graceful-shutdown | Fix shutdown to drain listeners and stop background tasks cleanly | pending | | moderate | medium | component | implementation | |
Description
Two related shutdown issues:
- W1: On shutdown, `handle.abort()` is called on each HTTPS server task in `main.rs`. This immediately kills the tokio task, interrupting in-flight request processing. The `InFlightGuard` RAII type ensures `decrement` is called on drop, but `abort()` cancels the task at its next `.await` point, so an in-flight request is cut off mid-processing instead of completing. The architecture spec says tasks should be joined with a timeout, not aborted, with an abort only after the shutdown timeout expires.

  The good news: `serve_https_listener` already has a `shutdown_rx` that breaks the accept loop on shutdown signal, so tasks will stop accepting new connections. We just need to wait for them to drain in-flight requests instead of aborting them.
- W7: `start_admin_socket` runs an infinite `loop` accepting connections with no way to break out. It doesn't accept a shutdown signal, so it can't be gracefully stopped. Similarly, the rate limiter eviction task runs an infinite loop with no cancellation mechanism.
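To make the drop-based accounting in W1 concrete, here is a minimal sketch of an RAII in-flight counter. It is an illustration only: the real `InFlightCounter` and `InFlightGuard` live in `src/server.rs` and may be shaped differently, and the method names `enter` and `current` are assumptions made for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared count of requests currently being processed (hypothetical shape).
#[derive(Clone, Default)]
pub struct InFlightCounter(Arc<AtomicUsize>);

/// RAII guard: holding one means a request is in flight.
pub struct InFlightGuard(Arc<AtomicUsize>);

impl InFlightCounter {
    /// Increment the count and return a guard that decrements it on drop.
    /// `enter` is an assumed name, not taken from the codebase.
    pub fn enter(&self) -> InFlightGuard {
        self.0.fetch_add(1, Ordering::SeqCst);
        InFlightGuard(Arc::clone(&self.0))
    }

    /// Number of requests that have started but not yet finished.
    pub fn current(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        // Runs whenever the guard goes out of scope, including early returns.
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}
```

A drain step can then poll `current()` until it reaches zero or the shutdown timeout expires. The counter only says how many requests are still open; it does not by itself let them finish, which is why the listener tasks need to be joined rather than aborted.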
Changes Required
src/main.rs:
- Replace the `handle.abort()` loop with a timeout-based join:
  ```rust
  let shutdown_timeout = shutdown.shutdown_timeout();
  for mut handle in https_server_handles {
      // Await by mutable reference so the handle is still usable for abort()
      // if the timeout fires (passing it by value would move it).
      match tokio::time::timeout(shutdown_timeout, &mut handle).await {
          Ok(_) => {}
          Err(_) => {
              warn!("shutdown timeout expired, aborting listener task");
              handle.abort();
          }
      }
  }
  ```
- After draining, signal cancellation to admin socket and eviction task
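The join loop above applies `shutdown_timeout` to each handle in turn, so with several listeners the worst-case total wait is the timeout multiplied by the number of handles. If a single overall bound is what the architecture spec intends (an assumption, not something this task states), one option is to compute a deadline once and give each join only the time that remains. The helper name `remaining_budget` is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Time left until `deadline`, measured from `now`.
/// Returns zero once the deadline has passed, never a negative or panicking value.
pub fn remaining_budget(deadline: Instant, now: Instant) -> Duration {
    deadline.saturating_duration_since(now)
}
```

In the loop, the deadline would be computed once as `Instant::now() + shutdown_timeout`, and each iteration would pass `remaining_budget(deadline, Instant::now())` to `tokio::time::timeout` in place of the full `shutdown_timeout`. A zero budget makes the timeout fire almost immediately, so any listener that has not finished by the deadline is aborted without further waiting.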
src/admin/socket.rs:
- Add a `shutdown_rx: tokio::sync::watch::Receiver<bool>` parameter to `start_admin_socket`
- Replace the infinite `loop { listener.accept().await }` with `tokio::select!`:
  ```rust
  // `shutdown_rx` must be bound as `mut`, since `changed()` takes `&mut self`.
  loop {
      tokio::select! {
          result = listener.accept() => { /* handle connection */ },
          _ = shutdown_rx.changed() => {
              info!("admin socket shutting down");
              break;
          }
      }
  }
  ```
- Clean up the socket file on exit (remove the Unix domain socket file)
- Update callers in `main.rs` to pass the shutdown channel
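For the socket-file cleanup, one possible approach is a small drop guard, so the file is removed on every exit path from `start_admin_socket` rather than only after the `break`. This is a sketch under assumptions: `SocketFileGuard` is a hypothetical helper, not an existing type in the codebase, and it uses only the standard library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Removes the Unix domain socket file when dropped (hypothetical helper).
pub struct SocketFileGuard {
    path: PathBuf,
}

impl SocketFileGuard {
    /// Take responsibility for removing the file at `path`.
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    /// Path of the socket file this guard will remove.
    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for SocketFileGuard {
    fn drop(&mut self) {
        match fs::remove_file(&self.path) {
            Ok(()) => {}
            // Already gone: nothing to clean up.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            // Drop cannot return an error, so report it and carry on.
            Err(e) => eprintln!("failed to remove {}: {e}", self.path.display()),
        }
    }
}
```

Creating the guard right after the listener binds would tie the file's lifetime to the function scope. Note that a task cancelled with `abort()` still drops its locals, but a process killed by a signal does not run destructors, so a stale file from a previous run may still need handling at startup.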
src/rate_limit/mod.rs:
- Add a `shutdown_rx: tokio::sync::watch::Receiver<bool>` parameter to `start_eviction_task`
- Replace the infinite loop with `tokio::select!`:
  ```rust
  // `shutdown_rx` must be bound as `mut`, since `changed()` takes `&mut self`.
  loop {
      tokio::select! {
          _ = interval_timer.tick() => {
              limiter.evict_stale(max_age);
          },
          _ = shutdown_rx.changed() => {
              info!("rate limiter eviction task shutting down");
              break;
          }
      }
  }
  ```
- Update caller in `main.rs`
Acceptance Criteria
- HTTPS server tasks are joined with a timeout, not immediately aborted
- Tasks are only aborted if the shutdown timeout expires before they finish
- Admin socket listener breaks its accept loop on shutdown signal
- Admin socket file is cleaned up on shutdown
- Rate limiter eviction task breaks its loop on shutdown signal
- ACME state machine task is cancellable (it already exits on `None` from stream, but should also respond to cancellation)
- In-flight requests are allowed to drain before forceful shutdown
- All existing tests pass
- `cargo clippy` passes with no warnings
References
- docs/architecture/operations.md — shutdown sequence
- docs/reviews/002-implementation-review.md — W1, W7 findings
- src/main.rs — current shutdown sequence
- src/admin/socket.rs — current infinite loop
- src/rate_limit/mod.rs — current infinite eviction loop
- src/server.rs — InFlightCounter and drain_in_flight
Notes
To be filled by implementation agent
Summary
To be filled on completion