Add architecture specification for Rust/axum reverse proxy
Phase 1 architecture docs covering proxy handler, TLS termination (ACME + manual), TOML config with static/dynamic split (ArcSwap), and operations (rate limiting, logging, health check, systemd, graceful shutdown). Nine ADRs documenting key decisions: Rust/axum, custom proxy handler, TOML config, rustls-acme for cert management, tokio-rustls direct, token bucket rate limiting, custom log format for fail2ban, static/dynamic config split, and signal handling strategy. Includes threat landscape research documenting the nginx CVEs motivating this project.
61
docs/architecture/decisions/001-rust-axum.md
Normal file
@@ -0,0 +1,61 @@
# ADR-001: Rust with Axum

## Status

Accepted

## Context

Our current nginx 1.24.0 installation is vulnerable to multiple actively-exploited
CVEs, most critically CVE-2026-42945 (CVSS 9.2, unauthenticated RCE via
`ngx_http_rewrite_module`). Six of seven recent nginx CVEs are memory corruption
bugs (buffer overflow, use-after-free, buffer overread): the exact class of
vulnerabilities that safe Rust rules out by construction.

The threat landscape is worsening: LLM-assisted fuzzing is accelerating bug
discovery in nginx's C codebase, and security researchers report additional
undisclosed vulnerabilities.

We need to replace nginx with a memory-safe alternative that can handle:
- TLS termination
- HTTP reverse proxying to backend services
- Rate limiting with fail2ban-compatible logging
- Operational simplicity (single binary, systemd integration)

## Decision

Use Rust with the axum web framework for the reverse proxy implementation.

**Rust** provides:
- Memory safety by construction in safe Rust (no buffer overflows,
  use-after-free, or double-free)
- rustls (pure Rust TLS) avoids OpenSSL dependency and its CVE history
- Single static binary deployment with no runtime dependencies
- Excellent async I/O support via tokio

**axum** provides:
- Ergonomic handler definitions with extractors
- Tower middleware ecosystem (Service trait, layers)
- Type-safe routing and state management
- Well-maintained, widely used, good documentation

## Consequences

**Positive:**
- Eliminates the class of memory corruption vulnerabilities affecting nginx
  (outside any `unsafe` code in our dependencies)
- Single binary deployment simplifies operations
- Rust's type system catches many errors at compile time
- axum + tower provides composable middleware

**Negative:**
- Smaller ecosystem than nginx for HTTP proxy features (but our use case is
  simple)
- We maintain the code (vs. using a battle-tested C project)
- Less granular control over HTTP/2 and connection pooling compared to nginx
- Team needs Rust expertise (already available)

## References

- [threat-landscape.md](../../research/threat-landscape.md)
- [overview.md](../overview.md)
56
docs/architecture/decisions/002-custom-proxy-handler.md
Normal file
@@ -0,0 +1,56 @@
# ADR-002: Custom Proxy Handler

## Status

Accepted

## Context

We need to implement HTTP reverse proxying: receiving requests and forwarding
them to an upstream service (Gitea on localhost:3000). Two approaches are
available:

1. **`axum-reverse-proxy` crate**: Provides path-based routing, header
   forwarding, round-robin load balancing, TLS support, retry mechanisms, and
   RFC 9110 compliance.
2. **Custom handler** (Felix Knorr pattern): Build a handler using hyper's
   `Client` to forward requests (in hyper 1.x the client lives in `hyper-util`
   as `hyper_util::client::legacy::Client`). ~50-100 lines of Rust for our
   needs.

Our use case is minimal: a single domain with a single upstream, no load
balancing, no retry, no HTTP/2 proxying.

## Decision

Implement a custom proxy handler using hyper's `Client` for request forwarding,
following the pattern demonstrated by Felix Knorr and used in the alknet
project's channel proxy.

## Rationale

- `axum-reverse-proxy` adds complexity we don't need (load balancing, retry,
  path-based routing to multiple backends)
- Our proxy case is the simplest possible: match a Host header, forward the
  entire request to a single upstream, stream the response back
- The Felix Knorr pattern is proven, idiomatic, and ~50-100 lines
- We maintain full control over header injection, error handling, and upstream
  connection behavior
- If requirements grow, we can adopt `axum-reverse-proxy` later
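
The "match a Host header, forward to a single upstream" step can be sketched
with the standard library alone. This is a minimal illustration, not the
project's handler: the function names (`normalize_host`, `resolve_upstream`,
`upstream_uri`), the hostname `git.example.com`, and the plain `HashMap` site
table are assumptions made for the example. Actual forwarding through hyper's
client is omitted.

```rust
use std::collections::HashMap;

/// Normalize a `Host` header value: drop a numeric `:port` suffix and
/// lowercase the name. Hypothetical helper; IPv6 literals pass through as-is.
fn normalize_host(host_header: &str) -> String {
    let name = match host_header.rsplit_once(':') {
        Some((name, port)) if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) => name,
        _ => host_header,
    };
    name.trim().to_ascii_lowercase()
}

/// Look up the upstream address for a request, or `None` if the host is
/// not configured (the caller decides which error status to return).
fn resolve_upstream<'a>(sites: &'a HashMap<String, String>, host_header: &str) -> Option<&'a str> {
    sites.get(&normalize_host(host_header)).map(String::as_str)
}

/// Build the upstream URI, keeping the original path and query untouched.
fn upstream_uri(upstream: &str, path_and_query: &str) -> String {
    format!("http://{}{}", upstream, path_and_query)
}

fn main() {
    // Assumed site table: one hostname mapped to the Gitea upstream.
    let mut sites = HashMap::new();
    sites.insert("git.example.com".to_string(), "localhost:3000".to_string());

    match resolve_upstream(&sites, "Git.Example.com:443") {
        Some(upstream) => println!("forward to {}", upstream_uri(upstream, "/user/repo?tab=code")),
        None => println!("unknown host, reject"),
    }
}
```

Normalizing before the lookup keeps the site table small: clients may send the
host in any case and with or without an explicit port.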

## Consequences

**Positive:**
- Minimal dependencies
- Full control over proxy behavior
- Easy to understand and audit (~100 lines of proxy code)
- No unnecessary abstraction layers

**Negative:**
- We implement and maintain proxy logic ourselves (but it's trivial for our
  use case)
- If requirements grow to load balancing or retry, we'd need to add that
  ourselves or switch to `axum-reverse-proxy`

## References

- [proxy.md](../proxy.md)
- Felix Knorr, "Replacing nginx with axum" (felix-knorr.net/posts/2024-10-13-replacing-nginx-with-axum.html)
44
docs/architecture/decisions/003-toml-config.md
Normal file
@@ -0,0 +1,44 @@
# ADR-003: TOML Configuration Format

## Status

Accepted

## Context

The proxy needs a configuration file format for defining sites, TLS settings,
bind addresses, and rate limits. Options include TOML, YAML, JSON, and custom
binary formats.

## Decision

Use TOML as the configuration file format.

## Rationale

- **Rust-native**: TOML is the configuration format for Cargo (Rust's package
  manager). The Rust ecosystem has first-class TOML support via `serde` +
  `toml` crate.
- **Unambiguous**: TOML values are explicitly typed (strings are always
  quoted), unlike YAML, whose implicit typing has surprising coercion rules
  (e.g., unquoted `no` → boolean, a version written as `1.10` → the float
  `1.1`).
- **Human-friendly**: TOML is easy to read and write for simple configurations
  like ours. It supports sections (tables), arrays, and inline tables.
- **Good error messages**: The `toml` crate provides clear deserialization
  error messages pointing to the exact field that failed.

## Consequences

**Positive:**
- Familiar to Rust developers (Cargo.toml)
- Clear, unambiguous syntax
- Excellent serde integration with detailed error reporting
- No type coercion surprises

**Negative:**
- Not as widely used for config outside Rust (but our audience is ourselves)
- No `#include` or file composition (each config file is self-contained)

## References

- [config.md](../config.md)
67
docs/architecture/decisions/004-rustls-acme.md
Normal file
@@ -0,0 +1,67 @@
# ADR-004: ACME-Primary Certificate Management

## Status

Accepted

## Context

The proxy needs TLS certificates for HTTPS. Two approaches are available:

1. **certbot (external ACME client)**: Run certbot as a cron job or systemd
   timer to obtain and renew certificates. The proxy loads certificates from
   files on disk. Renewal requires either SIGHUP/restart or inotify file
   watching to pick up new certs.

2. **rustls-acme (built-in ACME client)**: The proxy handles ACME
   certificate provisioning and renewal internally as a background task. No
   external certbot dependency. The `ResolvesServerCertAcme` cert resolver
   automatically serves the correct certificate and updates when renewed.

The alknet project has successfully implemented the rustls-acme approach, and
its patterns are directly reusable.

## Decision

Use `rustls-acme` as the primary certificate management mode, with manual
certificate paths as a fallback mode for testing, self-signed certs, and
corporate CA environments.

## Rationale

- **Eliminates certbot dependency**: No external cron job, no deploy hooks, no
  certbot package to install and maintain. The proxy is self-contained.
- **Automatic renewal**: `rustls-acme` runs as a background tokio task that
  handles certificate provisioning and renewal automatically (~30 days before
  expiry).
- **No restart needed**: When `rustls-acme` provisions a new certificate, the
  `ResolvesServerCertAcme` resolver updates atomically. No SIGHUP, no restart,
  no file watching.
- **Proven pattern**: alknet uses the same approach successfully.
- **Cache persistence**: `DirCache` persists ACME state between restarts,
  avoiding re-provisioning.
- **Fallback mode**: Manual cert paths are still supported for environments
  where ACME is not possible.

## Consequences

**Positive:**
- Single binary deployment (no certbot dependency)
- Zero-downtime certificate renewal
- Simpler operational model (no certbot cron, no deploy hooks)
- Proven in alknet

**Negative:**
- `rustls-acme` is an additional dependency
- ACME challenges require either port 80 (HTTP-01) or TLS-ALPN-01 on port 443,
  so the proxy must be reachable from the CA on one of those ports (it already
  listens on both)
- Less control over certificate issuance compared to certbot (e.g., no DNS-01
  challenge support, though rustls-acme supports TLS-ALPN-01 which is sufficient
  for our use case)
- Manual mode requires restart for cert changes (acceptable for fallback)

## References

- [tls.md](../tls.md)
- alknet ADR-008: ACME/Let's Encrypt decision
- `rustls-acme` crate: https://github.com/FlorianUekermann/rustls-acme
65
docs/architecture/decisions/005-tokio-rustls-direct.md
Normal file
@@ -0,0 +1,65 @@
# ADR-005: tokio-rustls Directly, Not axum-server

## Status

Accepted

## Context

We need to serve HTTPS (TLS) traffic through axum. Two approaches exist for
integrating TLS with axum:

1. **`axum-server`**: A wrapper that provides TLS support for axum via its
   `tls-rustls` feature. Handles TCP binding, TLS accept, and passing TLS
   streams to axum. Simple API but limited control over the TLS configuration.

2. **`tokio-rustls` directly**: Bind TCP manually, perform TLS handshake with
   `TlsAcceptor`, then serve the TLS stream to axum/hyper. More code but full
   control over `ServerConfig`, cipher suites, ALPN protocols, and cert
   resolvers.

The alknet project uses tokio-rustls directly and has proven this pattern for
both manual and ACME certificate management.

## Decision

Use `tokio-rustls` directly for TLS termination, with `hyper` serving the
resulting TLS streams to axum. Do not use `axum-server`.

## Rationale

- **ACME integration**: The `rustls-acme` `ResolvesServerCertAcme` resolver
  needs to be set as the certificate resolver on `ServerConfig` via
  `with_cert_resolver()`. `axum-server` does not expose this level of control
  over the `ServerConfig`.
- **Cipher suite control**: We may need to configure cipher suites beyond the
  defaults (see OQ-01). `axum-server` wraps the `ServerConfig` construction
  and may not expose `CryptoProvider` configuration. Direct `tokio-rustls`
  usage gives us full control.
- **ALPN configuration**: ACME TLS-ALPN-01 challenge requires adding
  `acme-tls/1` to the ALPN protocol list. This is only possible with direct
  `ServerConfig` access.
- **Proven pattern**: alknet uses exactly this approach (`TlsAcceptor` wrapping
  `tokio-rustls`, with manual or ACME `ServerConfig` construction).
- **No abstraction cost**: The code to bind TCP, accept TLS, and serve to
  axum/hyper is ~50 lines. `axum-server` saves little for our simple case.

## Consequences

**Positive:**
- Full control over TLS configuration
- Direct `rustls-acme` integration
- Ability to add ALPN protocols for ACME challenges
- Proven pattern from alknet

**Negative:**
- Slightly more code than `axum-server` (~50 lines for the TLS acceptor loop)
- Need to manage the TCP listener and TLS accept explicitly
- Must handle the `TlsStream<TcpStream>` → `hyper::service::service_fn` → axum
  integration manually (well-documented pattern from Felix Knorr's blog and
  alknet)

## References

- [tls.md](../tls.md)
- alknet transport layer (`alknet-core/src/transport/tls.rs`, `alknet-core/src/transport/acme.rs`)
77
docs/architecture/decisions/006-rate-limiting-approach.md
Normal file
@@ -0,0 +1,77 @@
# ADR-006: Token Bucket Rate Limiting with In-Memory State

## Status

Accepted

## Context

The proxy must enforce request rate limits per client IP address, replacing
nginx's `limit_req_zone` / `limit_req` directives. Rate limiting is critical
for preventing abuse and for fail2ban integration (rate-limited requests
trigger fail2ban actions).

Several rate limiting approaches exist:
- **Token bucket**: Tokens accumulate at a fixed rate; each request consumes a
  token. Allows short bursts up to the bucket capacity.
- **Leaky bucket**: Requests are processed at a fixed rate; excess requests
  queue or are rejected. No burst allowance.
- **Fixed window**: Count requests in fixed time windows (e.g., per minute).
  Allows burst at window boundaries.
- **Sliding window**: Count requests in a rolling time window. More accurate
  than fixed window but more complex.

The current nginx config uses `limit_req zone=gitea_limit burst=20 nodelay`.
nginx documents `limit_req` as a leaky bucket, but with `burst` and `nodelay`
it behaves like a token bucket with a burst allowance of 20.

For state storage:
- **In-memory HashMap**: Fast, no external dependencies, lost on restart.
- **External store (Redis, etc.)**: Shared across instances, persists across
  restarts. Adds operational complexity.
- **tower-governor crate**: Pre-built rate limiting middleware. Uses the
  generic cell rate algorithm (GCRA). Adds dependency.

## Decision

Use a token bucket algorithm with in-memory `HashMap<IpAddr, TokenBucket>`
state, protected by `tokio::sync::Mutex`. Rate limiting runs as axum middleware
before the proxy handler.
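
A minimal sketch of the per-IP token bucket described above, using only the
standard library. The names `TokenBucket::try_acquire` and
`RateLimiter::check`, and the rate of 10 requests per second, are assumptions
for illustration; the burst of 20 mirrors the current nginx setting. The
`tokio::sync::Mutex` wrapper and the axum middleware glue are left out, and
time is passed in explicitly so the logic can be exercised deterministically.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One bucket per client IP. Tokens are fractional so refill is smooth.
struct TokenBucket {
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// A new client starts with a full bucket, so a first burst is allowed.
    fn new(burst: f64, now: Instant) -> Self {
        TokenBucket { tokens: burst, last_refill: now }
    }

    /// Refill for the elapsed time (capped at `burst`), then try to spend
    /// one token. Returns false when the request should get a 429.
    fn try_acquire(&mut self, rate_per_sec: f64, burst: f64, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * rate_per_sec).min(burst);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Global per-IP limiter state (the real one would sit behind a mutex).
struct RateLimiter {
    rate_per_sec: f64,
    burst: f64,
    buckets: HashMap<IpAddr, TokenBucket>,
}

impl RateLimiter {
    fn new(rate_per_sec: f64, burst: f64) -> Self {
        RateLimiter { rate_per_sec, burst, buckets: HashMap::new() }
    }

    fn check(&mut self, ip: IpAddr, now: Instant) -> bool {
        let burst = self.burst;
        let bucket = self.buckets.entry(ip).or_insert_with(|| TokenBucket::new(burst, now));
        bucket.try_acquire(self.rate_per_sec, burst, now)
    }
}

fn main() {
    let mut limiter = RateLimiter::new(10.0, 20.0); // assumed rate, nginx-like burst
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    let t0 = Instant::now();
    let allowed = (0..25).filter(|_| limiter.check(ip, t0)).count();
    println!("allowed {} of 25 back-to-back requests", allowed);
    let later = t0 + Duration::from_millis(150);
    println!("allowed again after 150 ms: {}", limiter.check(ip, later));
}
```

With these numbers the first 20 back-to-back requests pass and the rest are
rejected until tokens refill, which is the `burst=20 nodelay` behavior the
migration needs to preserve.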

Rate limits are global per-IP (not per-site) in Phase 1. Per-site rate limits
may be added in Phase 2 as the config model evolves.

Stale entries in the HashMap are cleaned up periodically. A background task
scans the HashMap at a configurable interval (default: 60 seconds) and removes
entries that haven't been accessed within the cleanup interval.
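
The cleanup pass can be sketched as a single `retain` over the map. The
`Entry` type and `evict_stale` function are hypothetical names for this
example; in the proxy the scan would run from a periodic background task while
holding the same lock as the rate limiter.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Stand-in for the per-IP bucket; only the access timestamp matters here.
struct Entry {
    last_seen: Instant,
}

/// Remove entries idle for at least `max_idle`; returns how many were evicted.
fn evict_stale(buckets: &mut HashMap<IpAddr, Entry>, now: Instant, max_idle: Duration) -> usize {
    let before = buckets.len();
    buckets.retain(|_, entry| now.saturating_duration_since(entry.last_seen) < max_idle);
    before - buckets.len()
}

fn main() {
    let t0 = Instant::now();
    let now = t0 + Duration::from_secs(120);
    let mut buckets = HashMap::new();
    buckets.insert("203.0.113.7".parse::<IpAddr>().unwrap(), Entry { last_seen: t0 });
    buckets.insert(
        "198.51.100.9".parse::<IpAddr>().unwrap(),
        Entry { last_seen: t0 + Duration::from_secs(90) },
    );
    let evicted = evict_stale(&mut buckets, now, Duration::from_secs(60));
    println!("evicted {}, {} remaining", evicted, buckets.len());
}
```

Eviction does not loosen the limit as long as the idle threshold is at least
`burst / rate` seconds: a bucket idle that long has refilled completely, so a
fresh full bucket created on the client's next request is equivalent.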

## Rationale

- Token bucket matches the effective behavior of nginx's `limit_req` with
  `burst` and `nodelay`, ensuring behavioral compatibility during migration.
- In-memory state is sufficient for a single-instance proxy (no shared state
  needed).
- `tokio::sync::Mutex` (not `std::sync::Mutex`) yields to the async runtime
  instead of blocking a worker thread while waiting for the lock, and its guard
  may be held across await points if the middleware ever needs that.
- Custom implementation gives full control over logging output for fail2ban
  integration (ADR-007).
- State loss on restart is acceptable because rate limit state is inherently
  ephemeral.

## Consequences

**Positive:**
- Behavioral compatibility with nginx rate limiting
- Full control over fail2ban log format
- No external dependencies (Redis, etc.)
- Simple implementation (~100 lines)

**Negative:**
- Rate limit state is lost on restart (acceptable for single-instance deploy)
- Not suitable for multi-instance deployments without external state store
  (Phase 1 is single-instance)
- HashMap would grow without bound if entries were never evicted (mitigated by
  the periodic cleanup)

## References

- [operations.md](../operations.md)
- nginx `limit_req` documentation
67
docs/architecture/decisions/007-custom-log-format.md
Normal file
@@ -0,0 +1,67 @@
# ADR-007: Custom Structured Log Format for Fail2ban

## Status

Accepted

## Context

The proxy needs to produce log output that fail2ban can parse to detect and ban
abusive IP addresses. The current nginx setup uses nginx's default log format
with standard fail2ban filters.

Options for fail2ban integration:
- **nginx-compatible format**: Replicate nginx's log format so existing
  fail2ban filters work unchanged. Couples us to nginx's format.
- **Custom structured format**: Design a clean, parseable format with a
  corresponding custom fail2ban filter. Gives us control and clarity.
- **JSON format**: Machine-readable but harder for fail2ban regex matching.

## Decision

Use a custom structured log format with a corresponding custom fail2ban filter.

The format for rate-limited requests:

```
RATE_LIMIT client_ip=<IP> host=<host> path=<path> status=429
```

The format for general access logs:

```
REQUEST client_ip=<IP> host=<host> method=<METHOD> path=<path> status=<code> upstream=<addr> duration_ms=<ms>
```

A corresponding fail2ban filter (`/etc/fail2ban/filter.d/reverse-proxy.conf`)
uses regex matching on the `RATE_LIMIT` prefix and `client_ip=<HOST>` field.
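
A small sketch of how the `RATE_LIMIT` line could be produced. The function
names are hypothetical, and the sanitizing step is an assumption added here,
not part of the format above: `host` and `path` are client-controlled, so
whitespace and control characters are replaced to keep each event on one line
with intact `key=value` boundaries.

```rust
use std::net::IpAddr;

/// Replace whitespace and control characters so a client-supplied value
/// cannot inject extra fields or forge a second log line. (Assumed policy.)
fn sanitize(field: &str) -> String {
    field
        .chars()
        .map(|c| if c.is_whitespace() || c.is_control() { '_' } else { c })
        .collect()
}

/// Format one rate-limit event in the `RATE_LIMIT key=value ...` layout.
fn rate_limit_line(client_ip: IpAddr, host: &str, path: &str) -> String {
    format!(
        "RATE_LIMIT client_ip={} host={} path={} status=429",
        client_ip,
        sanitize(host),
        sanitize(path)
    )
}

fn main() {
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    println!("{}", rate_limit_line(ip, "git.example.com", "/user/login"));
}
```

Because `client_ip` is formatted from a parsed `IpAddr` rather than from a
raw header string, the value fail2ban extracts is always a well-formed address.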

## Rationale

- Custom format is clear, unambiguous, and self-documenting
- No coupling to nginx's format, which may change or include fields we don't
  produce
- `key=value` pairs are easy to parse with regex and easy to extend
- The `RATE_LIMIT` prefix makes it trivial to distinguish rate-limit events
  from other logs
- Writing a custom fail2ban filter is straightforward (5 lines of config)
- We control both sides (the proxy and the filter), so compatibility is
  guaranteed

## Consequences

**Positive:**
- Clean, purpose-built format
- Easy to extend with new fields
- No dependency on nginx log format
- Custom fail2ban filter is simple to maintain

**Negative:**
- Cannot reuse existing nginx fail2ban filters (trivial to write our own)
- Existing fail2ban configurations need updating (acceptable since we're
  replacing nginx entirely)

## References

- [operations.md](../operations.md)
- [open-questions.md](../open-questions.md) OQ-02 (now resolved)
@@ -0,0 +1,76 @@
# ADR-008: Static/Dynamic Configuration Split with ArcSwap

## Status

Accepted

## Context

The proxy needs configuration that can be partially reloaded at runtime (site
definitions, rate limits) without restarting the process and dropping active
connections. However, some configuration (bind addresses, TLS mode) fundamentally
requires creating new listeners and cannot be changed at runtime.

Two approaches:
- **Full restart for all config changes**: Simple, but requires dropping
  active connections for every change, including trivial rate limit adjustments.
- **Static/dynamic split**: Immutable parameters (bind address, TLS mode) in a
  `StaticConfig` that requires restart; runtime-adjustable parameters (sites,
  rate limits) in a `DynamicConfig` that can be atomically swapped via
  `Arc<ArcSwap<DynamicConfig>>` without dropping connections.

This pattern is proven in the alknet project, which uses the same
`ArcSwap<DynamicConfig>` approach for auth policy, forwarding rules, and rate
limits.

## Decision

Split configuration into `StaticConfig` (immutable after startup) and
`DynamicConfig` (hot-reloadable via `ArcSwap`). The split is:

**StaticConfig** (restart required):
- Bind address, HTTP port, HTTPS port
- TLS mode (ACME vs. manual), cert paths, ACME settings
- Log level and format

**DynamicConfig** (hot-reloadable via SIGHUP):
- Site definitions (hostname → upstream mappings)
- Rate limits (requests per second, burst)
- Body size limits

`ConfigReloadHandle` provides a `reload(DynamicConfig)` method that atomically
swaps the entire config. All request handlers read `DynamicConfig` via
`ArcSwap::load()`, a lock-free operation.
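
The validate-then-swap contract of `ConfigReloadHandle` can be sketched
without external crates. Note the substitution: this example uses
`std::sync::RwLock<Arc<DynamicConfig>>` as a stand-in for `ArcSwap`, so reads
here take a brief lock and are not lock-free as in the real design. The
`DynamicConfig` fields and the validation rules are assumptions chosen for
illustration.

```rust
use std::sync::{Arc, RwLock};

/// Assumed shape of the hot-reloadable settings (site table omitted).
#[derive(Debug, Clone, PartialEq)]
struct DynamicConfig {
    requests_per_second: f64,
    burst: u32,
    max_body_bytes: u64,
}

impl DynamicConfig {
    /// Hypothetical validation rules; an invalid config must never go live.
    fn validate(&self) -> Result<(), String> {
        if self.requests_per_second.is_nan() || self.requests_per_second <= 0.0 {
            return Err("requests_per_second must be positive".to_string());
        }
        if self.burst == 0 {
            return Err("burst must be at least 1".to_string());
        }
        Ok(())
    }
}

/// Stand-in for the ArcSwap-backed handle: readers get a cheap `Arc`
/// snapshot; a reload replaces the whole config in one step.
struct ConfigReloadHandle {
    current: RwLock<Arc<DynamicConfig>>,
}

impl ConfigReloadHandle {
    fn new(initial: DynamicConfig) -> Self {
        ConfigReloadHandle { current: RwLock::new(Arc::new(initial)) }
    }

    /// Snapshot used by request handlers; unaffected by later reloads.
    fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Validate first, swap second: on error the old config stays active.
    fn reload(&self, new: DynamicConfig) -> Result<(), String> {
        new.validate()?;
        *self.current.write().unwrap() = Arc::new(new);
        Ok(())
    }
}

fn main() {
    let handle = ConfigReloadHandle::new(DynamicConfig {
        requests_per_second: 10.0,
        burst: 20,
        max_body_bytes: 1 << 20,
    });
    let bad = DynamicConfig { requests_per_second: 0.0, burst: 20, max_body_bytes: 1 << 20 };
    println!("reload of invalid config: {:?}", handle.reload(bad));
    println!("still active: {:?}", handle.load());
}
```

A request that loaded its snapshot before a reload keeps using the old values
until it finishes, so no request ever sees a mix of old and new settings.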

## Rationale

- Rate limits and site definitions change more frequently than bind addresses
  and TLS settings. Hot-reload avoids unnecessary downtime.
- `ArcSwap` provides lock-free reads and atomic writes: no partial updates,
  no lock contention on the hot path.
- Proven pattern from alknet, where it's used for auth policy, forwarding
  rules, and rate limits.
- SIGHUP trigger is simple, well-understood, and compatible with systemd and
  process supervisors.
- The entire config is swapped at once, preventing inconsistent states where
  some sites use the old config and others use the new one.

## Consequences

**Positive:**
- Zero-downtime config reload for sites and rate limits
- Lock-free reads on the request hot path
- Atomic config updates, no partial states
- Proven pattern from alknet

**Negative:**
- Two config types add conceptual complexity
- SIGHUP reload requires reading the config file from disk (need to handle
  file read errors gracefully)
- Must validate DynamicConfig before swapping (invalid config must not replace
  valid config)

## References

- [config.md](../config.md)
- alknet ADR-030 (static/dynamic config split)
62
docs/architecture/decisions/009-signal-handling.md
Normal file
@@ -0,0 +1,62 @@
# ADR-009: Signal Handling Strategy

## Status

Accepted

## Context

The proxy needs to handle Unix signals for:
- **Graceful shutdown**: SIGTERM and SIGINT should stop accepting new
  connections, drain in-flight requests, then exit.
- **Config reload**: SIGHUP should trigger a DynamicConfig reload from disk.

Two approaches for signal handling:
- **`tokio::signal`**: Built into tokio. `ctrl_c()` covers only SIGINT;
  SIGTERM and SIGHUP are reachable only through the Unix-specific
  `tokio::signal::unix` API.
- **`signal-hook`**: External crate. Handles all Unix signals including SIGHUP.
  More flexible but adds a dependency.

## Decision

Use `signal-hook` for all signal handling. Specifically:
- `signal_hook::flag` to set termination flags on SIGTERM/SIGINT
- `signal-hook` to register a SIGHUP handler that triggers config reload

`tokio::signal::ctrl_c()` is registered as a secondary shutdown trigger; both
mechanisms converge on the same shutdown path. This is a belt-and-suspenders
approach: `signal-hook` handles all signals including SIGHUP, while
`ctrl_c()` provides a fallback for environments where signal handling may not
be fully wired (e.g., container runtimes).

Signal behavior:
1. On SIGTERM or SIGINT: stop accepting new connections, wait up to 30 seconds
   for in-flight requests to complete, then exit with code 0.
2. On SIGHUP: re-read config file, validate, and swap DynamicConfig if valid.
   Log the result.
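
The drain step of the shutdown path can be sketched with standard-library
atomics. This is a blocking illustration under stated assumptions: the `drain`
function, the in-flight counter, and the 5 ms polling interval are
hypothetical, the flag is set by hand where a signal handler would set it, and
an async implementation would await instead of sleeping a thread.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Wait until no requests are in flight or the grace period runs out.
/// Returns true if every request finished before the deadline.
fn drain(in_flight: &AtomicUsize, grace: Duration) -> bool {
    let deadline = Instant::now() + grace;
    while in_flight.load(Ordering::Acquire) > 0 {
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(5));
    }
    true
}

fn main() {
    let shutdown = Arc::new(AtomicBool::new(false));
    let in_flight = Arc::new(AtomicUsize::new(1));

    // Simulated request that completes shortly after shutdown begins.
    let worker = {
        let in_flight = Arc::clone(&in_flight);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(20));
            in_flight.fetch_sub(1, Ordering::Release);
        })
    };

    // Stand-in for the signal handler setting the termination flag.
    shutdown.store(true, Ordering::Release);

    if shutdown.load(Ordering::Acquire) {
        // The accept loop would stop here; then wait up to 30 seconds.
        let clean = drain(&in_flight, Duration::from_secs(30));
        println!("drained cleanly: {}", clean);
    }
    worker.join().unwrap();
}
```

Returning a boolean lets the caller log whether the grace period expired with
requests still in flight before the process exits.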

## Rationale

- SIGHUP handling is required for config reload, and `tokio::signal::ctrl_c()`
  covers neither SIGHUP nor SIGTERM.
- `signal-hook` is well-maintained, widely used, and handles all Unix signals.
- Routing every signal through one primary mechanism (`signal-hook`), with
  `ctrl_c()` only as a fallback into the same shutdown path, is simpler than
  splitting signals between two mechanisms and avoids edge cases.
- `signal_hook::flag` is a minimal, safe API for signal-triggered flags.

## Consequences

**Positive:**
- SIGHUP for config reload is simple and well-understood
- Single primary signal handling mechanism for all signals
- Compatible with systemd (SIGTERM for shutdown) and standard Unix conventions

**Negative:**
- `signal-hook` is an additional dependency (but a well-established one)
- Signal handling requires careful coordination with the tokio runtime (async
  signal receivers must be properly integrated)

## References

- [operations.md](../operations.md)
- [config.md](../config.md)