| id | name | status | depends_on | scope | risk | impact | level | review_findings |
|---|---|---|---|---|---|---|---|---|
| fix/graceful-shutdown | Fix shutdown to drain listeners and stop background tasks cleanly | completed | | moderate | medium | component | implementation | |
Description
Two related shutdown issues:
- W1: On shutdown, `handle.abort()` is called on each HTTPS server task in `main.rs`. This immediately kills the tokio task, interrupting in-flight request processing. The `InFlightGuard` RAII type ensures `decrement` is called on drop, but `abort()` cancels the task at its next `.await` point, so in-flight requests are cut off instead of completing normally. The architecture spec says tasks should be joined with a timeout and aborted only after the shutdown timeout expires.

  The good news: `serve_https_listener` already has a `shutdown_rx` that breaks the accept loop on the shutdown signal, so tasks will stop accepting new connections. We just need to wait for them to drain in-flight requests instead of aborting them.
- W7: `start_admin_socket` runs an infinite `loop` accepting connections, with no way to break out. It doesn't accept a shutdown signal, so it can't be stopped gracefully. Similarly, the rate limiter eviction task runs an infinite loop with no cancellation mechanism.
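To make the RAII behaviour mentioned in W1 concrete, here is a minimal sketch of a counter and guard pair. The type names echo `InFlightCounter` and `InFlightGuard` from the references, but the fields and the `enter` and `current` methods are assumptions for illustration, not the actual contents of `src/server.rs`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the in-flight counter in src/server.rs.
#[derive(Clone, Default)]
struct InFlightCounter {
    active: Arc<AtomicUsize>,
}

/// RAII guard: created when a request starts, decrements when dropped.
struct InFlightGuard {
    active: Arc<AtomicUsize>,
}

impl InFlightCounter {
    /// Count one more in-flight request and return its guard.
    fn enter(&self) -> InFlightGuard {
        self.active.fetch_add(1, Ordering::SeqCst);
        InFlightGuard { active: Arc::clone(&self.active) }
    }

    /// Number of requests currently in flight.
    fn current(&self) -> usize {
        self.active.load(Ordering::SeqCst)
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        // Runs whenever the guard goes out of scope, including early
        // returns and panic unwinding.
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

A drain step can then poll `current()` until it reaches zero or the shutdown timeout expires.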
Changes Required
src/main.rs:
- Replace the `handle.abort()` loop with a timeout-based join:
  ```rust
  let shutdown_timeout = shutdown.shutdown_timeout();
  for mut handle in https_server_handles {
      // Await by mutable reference so the handle is still usable for abort().
      match tokio::time::timeout(shutdown_timeout, &mut handle).await {
          Ok(_) => {}
          Err(_) => {
              warn!("shutdown timeout expired, aborting listener task");
              handle.abort();
          }
      }
  }
  ```
- After draining, signal cancellation to the admin socket and eviction task
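One design point worth checking: the join loop in the `src/main.rs` change gives each handle the full `shutdown_timeout` in turn, so with several listeners the worst case is the timeout multiplied by the number of handles. If the architecture spec intends a single overall budget (an assumption, since the spec is not quoted here), a shared deadline is one way to get it. The helper name `remaining_budget` below is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Time left before `deadline`, measured at `now`.
/// Returns zero once the deadline has passed.
fn remaining_budget(deadline: Instant, now: Instant) -> Duration {
    deadline.saturating_duration_since(now)
}

// Intended use inside the join loop (not compiled here, it needs tokio):
//
//   let deadline = Instant::now() + shutdown_timeout;
//   for mut handle in https_server_handles {
//       let budget = remaining_budget(deadline, Instant::now());
//       if tokio::time::timeout(budget, &mut handle).await.is_err() {
//           handle.abort();
//       }
//   }
```

With a zero budget, `tokio::time::timeout` still polls the handle once, so tasks that have already finished are joined cleanly rather than aborted.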
src/admin/socket.rs:
- Add a `shutdown_rx: tokio::sync::watch::Receiver<bool>` parameter to `start_admin_socket`
- Replace the infinite `loop { listener.accept().await }` with `tokio::select!`:
  ```rust
  loop {
      tokio::select! {
          result = listener.accept() => { /* handle connection */ },
          _ = shutdown_rx.changed() => {
              info!("admin socket shutting down");
              break;
          }
      }
  }
  ```
- Clean up the socket file on exit (remove the Unix domain socket file)
- Update callers in `main.rs` to pass the shutdown channel
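For the socket file cleanup step, one option is a small drop guard, so the file is removed on every exit path from `start_admin_socket` and not only after the `break`. This is a sketch under assumptions: `SocketFileGuard` is a hypothetical name, and the real function may prefer an explicit `remove_file` call after the loop.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical guard that removes the admin socket file when dropped.
struct SocketFileGuard {
    path: PathBuf,
}

impl SocketFileGuard {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }
}

impl Drop for SocketFileGuard {
    fn drop(&mut self) {
        match fs::remove_file(&self.path) {
            Ok(()) => {}
            // A missing file is fine: it was never created or is already gone.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            // Drop cannot return an error, so report it and carry on.
            Err(e) => eprintln!("failed to remove {}: {e}", self.path.display()),
        }
    }
}
```

Creating the guard right after binding the listener keeps the cleanup tied to the lifetime of the accept loop.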
src/rate_limit/mod.rs:
- Add a `shutdown_rx: tokio::sync::watch::Receiver<bool>` parameter to `start_eviction_task`
- Replace the infinite loop with `tokio::select!`:
  ```rust
  loop {
      tokio::select! {
          _ = interval_timer.tick() => { limiter.evict_stale(max_age); },
          _ = shutdown_rx.changed() => {
              info!("rate limiter eviction task shutting down");
              break;
          }
      }
  }
  ```
- Update the caller in `main.rs`
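The control flow of the eviction loop (do periodic work, stop on a signal) can be exercised without an async runtime. The sketch below deliberately swaps in a plain thread-style loop over `std::sync::mpsc` in place of the tokio task and `watch` channel the spec calls for, so it illustrates the loop shape only. `run_periodic` is a hypothetical name. Unlike a tokio interval, each wait here restarts after the tick's work finishes.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Runs `on_tick` every `interval` until a shutdown message arrives or the
/// sender is dropped. Returns how many ticks ran.
fn run_periodic(
    shutdown_rx: &Receiver<()>,
    interval: Duration,
    mut on_tick: impl FnMut(),
) -> usize {
    let mut ticks = 0;
    loop {
        match shutdown_rx.recv_timeout(interval) {
            // No signal within one interval: do the periodic work.
            Err(RecvTimeoutError::Timeout) => {
                on_tick();
                ticks += 1;
            }
            // Explicit signal, or the sender went away: stop either way.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    ticks
}
```

Treating a dropped sender as a shutdown signal mirrors the `_ = shutdown_rx.changed()` branch, which also fires when the `watch` sender is dropped, because `changed()` then resolves to an error.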
Acceptance Criteria
- HTTPS server tasks are joined with a timeout, not immediately aborted
- Tasks are only aborted if the shutdown timeout expires before they finish
- Admin socket listener breaks its accept loop on shutdown signal
- Admin socket file is cleaned up on shutdown
- Rate limiter eviction task breaks its loop on shutdown signal
- ACME state machine task is cancellable (it already exits on `None` from the stream, but should also respond to cancellation)
- In-flight requests are allowed to drain before forceful shutdown
- All existing tests pass
- `cargo clippy` passes with no warnings
References
- docs/architecture/operations.md — shutdown sequence
- docs/reviews/002-implementation-review.md — W1, W7 findings
- src/main.rs — current shutdown sequence
- src/admin/socket.rs — current infinite loop
- src/rate_limit/mod.rs — current infinite eviction loop
- src/server.rs — InFlightCounter and drain_in_flight
Notes
To be filled by implementation agent
Summary
To be filled on completion