Decompose architecture into 23 atomic tasks across 7 parallel generations
Task graph covers all Phase 1 concerns: config system, TLS termination, proxy handler, operations (rate limiting, logging, health check, admin socket, signals, shutdown, body size limit), deployment artifacts, and two review checkpoints. No circular dependencies. Critical path length of 7. Risk distribution: 3 high-risk (ACME, TLS listener setup, startup orchestration), 7 medium, 11 low, 2 trivial.
64
tasks/config/cli-parsing.md
Normal file
@@ -0,0 +1,64 @@
---
id: config/cli-parsing
name: Implement CLI argument parsing with clap and config file loading
status: pending
depends_on: [config/static-config, config/validation]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Implement the CLI entry point using `clap` with derive macros. The CLI reads the config file, deserializes both static and dynamic portions, validates, and returns the parsed config.

### CLI Interface

```
reverse-proxy [OPTIONS]

Options:
  --config <PATH>         Path to config file [default: /etc/reverse-proxy/config.toml]
  --validate              Validate config and exit
  --allow-wildcard-bind   Permit 0.0.0.0 as a bind address (for container deployments)
  --help                  Show help
  --version               Show version
```

The `--allow-wildcard-bind` flag is OR'd with the config `allow_wildcard_bind` field — if either is set, wildcard binding is allowed.

### Behavior

- `--validate`: Load and validate the config, print success or errors, exit 0 or 1
- Normal run: Load, validate, return config for the startup sequence
- Config file not found: exit with error
- Config validation fails: exit with code 1 and log all errors
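
The OR rule and the exit-code mapping described above can be sketched with the standard library alone. This is only an illustration: `effective_wildcard_bind` and `exit_code` are hypothetical names, and the real entry point is expected to use `clap` derive macros rather than these plain functions.

```rust
/// Wildcard binding is allowed if either the CLI flag or the config field is set.
fn effective_wildcard_bind(cli_flag: bool, config_field: bool) -> bool {
    cli_flag || config_field
}

/// Maps a validation outcome to a process exit code: 0 on success, 1 on failure.
/// Every error is printed, so operators see all problems in one pass.
fn exit_code(validation: &Result<(), Vec<String>>) -> i32 {
    match validation {
        Ok(()) => 0,
        Err(errors) => {
            for e in errors {
                eprintln!("config error: {e}");
            }
            1
        }
    }
}
```

A caller would pass the result of `exit_code` to `std::process::exit` for both `--validate` and a failed normal startup.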

## Acceptance Criteria

- [ ] `clap` with derive macros for CLI parsing
- [ ] `--config` flag with default `/etc/reverse-proxy/config.toml`
- [ ] `--validate` flag: loads, validates, reports, exits
- [ ] `--allow-wildcard-bind` flag: OR'd with config value
- [ ] `--version` prints version from `Cargo.toml`
- [ ] Config file loading and TOML deserialization
- [ ] Validation runs on every load (startup and `--validate`)
- [ ] Error messages are clear and actionable
- [ ] Unit tests for CLI argument parsing
- [ ] Integration test: `--validate` with valid config exits 0
- [ ] Integration test: `--validate` with invalid config exits 1 and reports errors

## References

- docs/architecture/operations.md — CLI interface
- docs/architecture/config.md — config loading, validation
- docs/architecture/decisions/016-explicit-bind-address.md — `allow_wildcard_bind`

## Notes

> To be filled by implementation agent

## Summary

> To be filled on completion
70
tasks/config/dynamic-config.md
Normal file
@@ -0,0 +1,70 @@
---
id: config/dynamic-config
name: Implement DynamicConfig with ArcSwap hot-reload and ConfigReloadHandle
status: pending
depends_on: [config/static-config]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement the dynamic configuration that can be hot-reloaded at runtime without restarting the process. This is the core of the config reload mechanism.

### DynamicConfig

Hot-reloadable at runtime via `ArcSwap`. Changes take effect for new connections immediately.

- `sites: Vec<SiteConfig>` — hostname → upstream mapping (collected from all listeners)
- `rate_limit: RateLimitConfig` — `requests_per_second: u32`, `burst: u32`
- `body_limit_bytes: u64` — max request body size

**RateLimitConfig**:
- `requests_per_second: u32` — required, > 0
- `burst: u32` — required, > 0

### ArcSwap Pattern

- `Arc<ArcSwap<DynamicConfig>>` provides lock-free reads on the request hot path
- `ConfigReloadHandle` with `reload(new_config)` method atomically swaps the entire config
- No partial updates — the entire DynamicConfig is swapped at once
- All request handlers read current config via `Arc` dereference (no lock contention)

### Reload Flow

1. Read the TOML config file from disk
2. Deserialize into full config (both static and dynamic portions)
3. Validate the full config (catches static misconfigurations early)
4. If valid, swap DynamicConfig via ArcSwap; log warnings for any static changes
5. If invalid, reject the reload and keep the old DynamicConfig

### Reload Serialization

Use `tokio::sync::Mutex` on the reload code path. If a reload is in progress and a second is requested, the second waits, re-reads the config file (getting the latest), then proceeds.
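
The reload flow and its serialization can be sketched as follows. To stay dependency-free, this sketch swaps in `std::sync::RwLock<Arc<_>>` for `ArcSwap` and `std::sync::Mutex` for `tokio::sync::Mutex`, so it shows the shape of the logic rather than the lock-free implementation the task calls for. The reduced `DynamicConfig` fields, the `load` method, and the closure-based `reload` signature are assumptions made for illustration.

```rust
use std::sync::{Arc, Mutex, RwLock};

#[derive(Debug, Clone, PartialEq)]
struct DynamicConfig {
    body_limit_bytes: u64,
    requests_per_second: u32,
}

/// Stand-in handle: RwLock<Arc<_>> replaces ArcSwap, std Mutex replaces tokio's.
struct ConfigReloadHandle {
    current: RwLock<Arc<DynamicConfig>>,
    reload_lock: Mutex<()>,
}

impl ConfigReloadHandle {
    fn new(initial: DynamicConfig) -> Self {
        Self {
            current: RwLock::new(Arc::new(initial)),
            reload_lock: Mutex::new(()),
        }
    }

    /// Readers clone the Arc and keep one consistent snapshot per request.
    fn load(&self) -> Arc<DynamicConfig> {
        Arc::clone(&*self.current.read().unwrap())
    }

    /// `read_config` runs only once the reload lock is held, so a reload that
    /// had to wait reads the latest file contents before proceeding.
    fn reload<F>(&self, read_config: F) -> Result<(), String>
    where
        F: FnOnce() -> Result<DynamicConfig, String>,
    {
        let _guard = self.reload_lock.lock().unwrap();
        let candidate = read_config()?;
        validate(&candidate)?; // on error the old config stays active
        *self.current.write().unwrap() = Arc::new(candidate); // whole-config swap
        Ok(())
    }
}

/// Reduced validation, covering only the two fields in this sketch.
fn validate(cfg: &DynamicConfig) -> Result<(), String> {
    if cfg.requests_per_second == 0 {
        return Err("rate_limit.requests_per_second must be > 0".into());
    }
    if cfg.body_limit_bytes == 0 {
        return Err("body_limit_bytes must be > 0".into());
    }
    Ok(())
}
```

Because the candidate is validated before the swap, a rejected reload never exposes a partially applied config to request handlers.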

## Acceptance Criteria

- [ ] `DynamicConfig` struct defined with `sites`, `rate_limit`, and `body_limit_bytes` fields
- [ ] `RateLimitConfig` struct defined with `requests_per_second` and `burst`
- [ ] `Arc<ArcSwap<DynamicConfig>>` used for lock-free reads in handlers
- [ ] `ConfigReloadHandle` struct with `reload(DynamicConfig)` method
- [ ] Reload serialization via `tokio::sync::Mutex` prevents concurrent reload race conditions
- [ ] Static config change detection: if static fields differ from current, log warning listing changed fields
- [ ] Unit tests for ArcSwap swap (verify new config visible after reload)
- [ ] Unit tests for reload rejection on invalid config
- [ ] Unit tests for concurrent reload serialization

## References

- docs/architecture/config.md — DynamicConfig, ArcSwap pattern, reload flow
- docs/architecture/decisions/008-static-dynamic-config-split.md — ArcSwap rationale

## Notes

> The sites vector is collected from all listeners into a single global routing table. Hostname uniqueness validation happens in the validation step, not in DynamicConfig itself.

## Summary

> To be filled on completion
74
tasks/config/static-config.md
Normal file
@@ -0,0 +1,74 @@
---
id: config/static-config
name: Implement StaticConfig, ListenerConfig, TlsConfig, and LoggingConfig structs with TOML deserialization
status: pending
depends_on: [setup/project-init]
scope: moderate
risk: low
impact: component
level: implementation
---

## Description

Implement the static configuration structs that are immutable after startup. These are deserialized from the TOML config file and validated at startup. Changes to static config require a process restart.

### Structs to Implement

**StaticConfig** (top-level, immutable after startup):
- `listeners: Vec<ListenerConfig>` — at least one required
- `allow_wildcard_bind: bool` — default `false`
- `health_check_port: u16` — default `9900`, `0` to disable
- `admin_socket_path: String` — default `/run/reverse-proxy/admin.sock`, empty string to disable
- `shutdown_timeout_secs: u64` — default `30`
- `logging: LoggingConfig`

**LoggingConfig** (nested in `[logging]`):
- `level: String` — default `"info"`
- `format: String` — default `"text"`
- `log_file_path: Option<String>` — optional, enables file logging when set

**ListenerConfig** (per `[[listeners]]`):
- `bind_addr: String` — required
- `http_port: u16` — default `80`, `0` to disable
- `https_port: u16` — default `443`
- `tls: TlsConfig`
- `sites: Vec<SiteConfig>` — sites defined per listener (moved to global routing in DynamicConfig)

**TlsConfig** (nested in `[listeners.tls]`):
- `mode: String` — `"acme"` or `"manual"`
- ACME fields: `acme_domains`, `acme_cache_dir`, `acme_directory`
- Manual fields: `cert_path`, `key_path`

**SiteConfig** (per `[[listeners.sites]]`):
- `host: String` — hostname to match
- `upstream: String` — `host:port` format
- `upstream_scheme: String` — default `"http"`
- `upstream_connect_timeout_secs: u64` — default `5`
- `upstream_request_timeout_secs: u64` — default `60`

All structs derive `Debug`, `Clone`, `serde::Deserialize`. Use serde defaults for optional fields.

## Acceptance Criteria

- [ ] `StaticConfig`, `LoggingConfig`, `ListenerConfig`, `TlsConfig`, and `SiteConfig` structs defined with correct fields and types
- [ ] All structs derive `Debug`, `Clone`, `serde::Deserialize`
- [ ] Default values implemented per config.md defaults table
- [ ] TOML deserialization works for both multi-config (dedicated-IP) and shared-IP (SAN certificate) config formats
- [ ] Unit tests verify deserialization of both example configs from config.md
- [ ] `cargo check` and `cargo test` succeed

## References

- docs/architecture/config.md — full config structure, defaults, TOML format
- docs/architecture/decisions/003-toml-config.md — TOML format decision
- docs/architecture/decisions/008-static-dynamic-config-split.md — static/dynamic split rationale
- docs/architecture/decisions/019-multi-config-listeners.md — `[[listeners]]` format

## Notes

> SiteConfig is defined per-listener in TOML but collected into a global routing table in DynamicConfig. The per-listener definition is just for config organization; at runtime, hostnames must be unique across all listeners.

## Summary

> To be filled on completion
72
tasks/config/validation.md
Normal file
@@ -0,0 +1,72 @@
---
id: config/validation
name: Implement config validation with all 18 validation rules and error reporting
status: pending
depends_on: [config/static-config]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement comprehensive config validation per the 18 rules defined in config.md. Validation runs on startup (fail-fast, exit with non-zero code) and on reload (reject reload, log error).

### Validation Rules (from config.md)

1. At least one `[[listeners]]` entry must exist
2. Each listener's `bind_addr` is not `0.0.0.0` unless `allow_wildcard_bind` is enabled (config OR CLI flag — OR relationship)
3. Each listener's `bind_addr` and `https_port` combination must be unique
4. In ACME mode, `acme_domains` must be non-empty
5. In manual mode, `cert_path` and `key_path` must both be set and files must be readable
6. Each site must have a `host` and `upstream`
7. Site `host` values must be unique across all listeners (no duplicate hostnames)
8. `rate_limit.requests_per_second` must be > 0
9. `body.limit_bytes` must be > 0
10. Each listener's `bind_addr` and `http_port` combination must be unique (if http_port > 0)
11. Within a listener, `http_port` and `https_port` must differ
12. `https_port` must be 1–65535 (required — TLS needs a port)
13. `http_port` must be 0 (disabled) or 1–65535
14. `health_check_port` must not conflict with any listener's `http_port` or `https_port` on the same bind address
15. Site `host` values must not include a port number (e.g., `git.alk.dev`, not `git.alk.dev:443`)
16. Site `host` values must be valid hostnames (not IP addresses, not including ports). Hostnames normalized to lowercase
17. `upstream` must be in `host:port` format where `port` is 1–65535
18. `upstream_scheme` values must be `"http"` or `"https"` (lowercase)

### Error Reporting

On validation failure, collect ALL errors (don't stop at first) and report them together. This helps operators fix multiple issues in one pass. Use a `Vec<ValidationError>` that is logged or printed on startup failure.
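
The collect-all-errors approach might look like the sketch below. It covers only a handful of the rules (1, 2, 7, 11, and 17), and the `Listener` and `Site` structs, the enum variants, and the `validate` signature are simplified assumptions rather than the signature given in the acceptance criteria.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ValidationError {
    NoListeners,
    WildcardBindNotAllowed { listener: usize },
    SamePorts { listener: usize },
    DuplicateHost(String),
    InvalidUpstream(String),
}

struct Site {
    host: String,
    upstream: String,
}

struct Listener {
    bind_addr: String,
    http_port: u16,
    https_port: u16,
    sites: Vec<Site>,
}

/// Rule 17: `host:port` where the port parses as 1..=65535.
fn upstream_is_valid(upstream: &str) -> bool {
    match upstream.rsplit_once(':') {
        Some((host, port)) => !host.is_empty() && port.parse::<u16>().map_or(false, |p| p > 0),
        None => false,
    }
}

/// Pushes every violation into one Vec instead of returning at the first.
fn validate(listeners: &[Listener], allow_wildcard_bind: bool) -> Result<(), Vec<ValidationError>> {
    let mut errors = Vec::new();
    let mut seen_hosts = HashSet::new();
    if listeners.is_empty() {
        errors.push(ValidationError::NoListeners);
    }
    for (i, l) in listeners.iter().enumerate() {
        if l.bind_addr == "0.0.0.0" && !allow_wildcard_bind {
            errors.push(ValidationError::WildcardBindNotAllowed { listener: i });
        }
        if l.http_port != 0 && l.http_port == l.https_port {
            errors.push(ValidationError::SamePorts { listener: i });
        }
        for s in &l.sites {
            let host = s.host.to_lowercase(); // hostnames are compared lowercased
            if !seen_hosts.insert(host.clone()) {
                errors.push(ValidationError::DuplicateHost(host));
            }
            if !upstream_is_valid(&s.upstream) {
                errors.push(ValidationError::InvalidUpstream(s.upstream.clone()));
            }
        }
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

The same `Vec` can be logged at startup before exiting, or logged on reload while the old config stays active.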

### Startup vs Reload Behavior

- **Startup**: If validation fails, exit with non-zero code and log all validation errors
- **Reload**: If validation fails, reject the reload, log all errors, keep old DynamicConfig active

## Acceptance Criteria

- [ ] All 18 validation rules implemented
- [ ] Validation collects all errors (doesn't stop at first)
- [ ] `ValidationError` enum with descriptive messages for each rule
- [ ] `validate(config: &StaticConfig, dynamic: &DynamicConfig) -> Result<(), Vec<ValidationError>>` function
- [ ] Startup validation: exits with code 1 on failure, logs all errors
- [ ] Reload validation: rejects reload on failure, logs all errors, keeps old config
- [ ] `allow_wildcard_bind` OR logic: config flag OR CLI flag enables it
- [ ] Hostname normalization to lowercase during validation
- [ ] File existence and readability check for manual mode `cert_path` and `key_path`
- [ ] Unit tests covering each validation rule with valid and invalid inputs
- [ ] Integration test: valid config from config.md examples passes all validation

## References

- docs/architecture/config.md — full validation rules, default values, TOML format
- docs/architecture/decisions/016-explicit-bind-address.md — `0.0.0.0` rejection rationale
- docs/architecture/decisions/020-container-deployment.md — `allow_wildcard_bind` for containers

## Notes

> Rule 5 (file readability check for manual certs) should check that the files exist and are readable at validation time, not just that the paths are set. This provides early feedback on misconfiguration.

## Summary

> To be filled on completion
95
tasks/deploy/systemd-and-container.md
Normal file
@@ -0,0 +1,95 @@
---
id: deploy/systemd-and-container
name: Create systemd unit file, Dockerfile, and docker-compose.yml for production deployment
status: pending
depends_on: [ops/signals-and-shutdown]
scope: moderate
risk: low
impact: component
level: implementation
---

## Description

Create the deployment artifacts: systemd unit file, Dockerfile, and docker-compose.yml template.

### Systemd Unit File

```ini
[Unit]
Description=Reverse Proxy
After=network.target
Wants=network-online.target

[Service]
Type=notify
NotifyAccess=all
ExecStart=/usr/local/bin/reverse-proxy --config /etc/reverse-proxy/config.toml
Restart=on-failure
RestartSec=5

# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
ReadWritePaths=/var/lib/reverse-proxy /var/log/reverse-proxy

# ACME challenge cache directory
StateDirectory=reverse-proxy

[Install]
WantedBy=multi-user.target
```

The proxy signals readiness to systemd via `sd_notify("READY=1")` after binding listeners and completing initial configuration load.

### Dockerfile

Multi-stage build:
- Build stage: `rust:alpine` with `x86_64-unknown-linux-musl` target for static linking
- Run stage: `alpine` (or `scratch` for absolute minimum)
- The `aws_lc_rs` crypto provider is statically linked — no OpenSSL dependency
- Binary is self-contained, no runtime dependencies beyond libc for DNS resolution

### Docker Compose Template

Example template showing:
- Reverse proxy with volume mounts for config, ACME cache, logs, and admin socket
- `allow_wildcard_bind = true` for container deployments
- Health check using `wget` against local health endpoint
- Network configuration for upstream services

### Fail2ban Configuration

- Filter definition matching the `RATE_LIMIT` log prefix
- Jail configuration for rate-limiting offenders

## Acceptance Criteria

- [ ] Systemd unit file at `deploy/reverse-proxy.service`
- [ ] `Type=notify` with `sd_notify("READY=1")` integration in binary
- [ ] Security hardening directives in unit file
- [ ] `ReadWritePaths` for ACME cache and log directory
- [ ] Dockerfile with multi-stage build (`rust:alpine` → `alpine`/`scratch`)
- [ ] Static linking with `x86_64-unknown-linux-musl` target
- [ ] Docker Compose template at `deploy/docker-compose.yml`
- [ ] Volume mounts for config (ro), ACME cache (rw), logs (rw), admin socket (rw)
- [ ] Health check in Docker Compose using `wget` against `http://127.0.0.1:9900/health`
- [ ] Fail2ban filter definition at `deploy/fail2ban/filter.d/reverse-proxy.conf`
- [ ] Fail2ban jail configuration at `deploy/fail2ban/jail.d/reverse-proxy.conf`
- [ ] `docker build` succeeds
- [ ] Container starts and responds to health check

## References

- docs/architecture/operations.md — systemd, container deployment, fail2ban, health check
- docs/architecture/decisions/020-container-deployment.md — container model rationale

## Notes

> The Dockerfile should use `musl` for static linking. The `aws_lc_rs` crypto provider is statically linked. The resulting binary has no runtime dependencies beyond libc for DNS resolution (which `musl` provides).

## Summary

> To be filled on completion
70
tasks/integration/startup-orchestration.md
Normal file
@@ -0,0 +1,70 @@
---
id: integration/startup-orchestration
name: Wire startup sequence with all components and sd_notify readiness signaling
status: pending
depends_on: [config/cli-parsing, config/validation, config/dynamic-config, tls/tls-listener-setup, tls/http-redirect, proxy/host-routing, proxy/headers-and-forwarding, proxy/error-responses, ops/rate-limiting, ops/logging, ops/health-check, ops/admin-socket, ops/signals-and-shutdown, ops/body-size-limit]
scope: broad
risk: high
impact: project
level: implementation
---

## Description

Wire together all components in the correct startup sequence from operations.md. This is the `main.rs` that orchestrates everything.

### Startup Sequence (per operations.md)

1. **Parse and validate config** — CLI args, TOML deserialization, validation
2. **Initialize DynamicConfig** — Load into `ArcSwap<DynamicConfig>`
3. **Initialize shared state** — Rate limiter HashMap, shared hyper Client, tracing subscriber
4. **Bind health check port** (if enabled) — Fail-fast if bind fails
5. **Bind admin socket** (if enabled) — Remove stale socket, warn if occupied
6. **Bind all listener ports** — HTTP and HTTPS for each listener. Fail-fast if any bind fails
7. **Load TLS configuration** — Manual certs or ACME init. Fail-fast on error
8. **Start TCP listeners** — Begin accepting connections on all bound ports
9. **Start background tasks** — ACME renewal, rate limiter eviction, signal handler, admin socket handler
10. **Signal readiness** — `sd_notify("READY=1")` if running under systemd

### Component Wiring

- **axum Router per listener**: Each listener gets its own Router with its own middleware stack
- **Shared State**: `Arc<ArcSwap<DynamicConfig>>`, `Arc<Mutex<HashMap<IpAddr, TokenBucket>>>`, `Arc<hyper::Client>` shared via axum State
- **Middleware order**: Rate limiting → Body size limit → Proxy header injection → Host routing → Proxy handler
- **Health endpoint**: On both the local health check port and the HTTPS listener(s)

### Fail-Fast Behavior

If any step in the startup sequence fails, the process exits with a non-zero code. The proxy does not partially start. All ports are bound before any connections are accepted.
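
The bind-everything-first rule can be illustrated with the standard library's blocking `TcpListener`. The helper name `bind_all` is hypothetical, and the real implementation would use tokio listeners; the point is only that collecting into an `io::Result` stops at the first failed bind and drops the sockets bound so far.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Binds every address before any connection is accepted.
/// If one bind fails, the listeners bound so far are dropped (closing their
/// sockets) and the error is returned so the caller can exit non-zero.
fn bind_all(addrs: &[SocketAddr]) -> io::Result<Vec<TcpListener>> {
    addrs.iter().map(|addr| TcpListener::bind(addr)).collect()
}
```

Only after `bind_all` returns `Ok` would the caller start its accept loops, which keeps the proxy from partially starting.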

## Acceptance Criteria

- [ ] Startup sequence follows the exact order from operations.md
- [ ] Fail-fast behavior: exit with non-zero code on any startup failure
- [ ] All ports bound before any connections accepted
- [ ] Per-listener axum Router with shared State
- [ ] Middleware stack in correct order: rate limiting → body limit → headers → routing → proxy
- [ ] `ConnectInfo<SocketAddr>` propagated to routers for `X-Real-IP`
- [ ] `sd_notify("READY=1")` sent after all listeners started
- [ ] Graceful shutdown on SIGTERM/SIGINT
- [ ] SIGHUP triggers config reload
- [ ] Admin socket accepts `reload` and `status` commands
- [ ] Health check endpoint responds on local port and HTTPS
- [ ] Integration test: full proxy startup with test config, verify all endpoints work
- [ ] Integration test: config reload via SIGHUP updates routing
- [ ] Integration test: config reload via admin socket updates routing with feedback

## References

- docs/architecture/operations.md — startup sequence, component wiring
- docs/architecture/overview.md — architecture diagram
- docs/architecture/config.md — StaticConfig, DynamicConfig, reload flow
- docs/architecture/proxy.md — middleware stack, request flow

## Notes

> This is the critical integration task. All other implementation tasks must be complete before this one can start. The implementation agent should follow the startup sequence from operations.md precisely and ensure fail-fast behavior at every step.

## Summary

> To be filled on completion
74
tasks/ops/admin-socket.md
Normal file
@@ -0,0 +1,74 @@
---
id: ops/admin-socket
name: Implement Unix domain socket admin API for config reload with feedback and status
status: pending
depends_on: [config/dynamic-config]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement the Unix domain socket admin API for programmatic config reload with success/failure feedback. This is an alternative to SIGHUP that provides structured responses.

### Protocol

- **Connection lifecycle**: One command per connection. Client connects, sends one newline-terminated command, receives one newline-terminated JSON response, then the server closes the connection.
- **Message framing**: Newline-delimited (`\n`). Responses end with `\n`.

### Commands

- `reload` — Re-read config file, validate, and swap DynamicConfig. Returns:
  - Success: `{"status": "ok"}`
  - Failure: `{"status": "error", "message": "..."}`
- `status` — Return basic process info. Returns:
  - `{"status": "ok", "uptime_secs": 1234, "sites": 2}`

### Error Responses

- Unrecognized commands: `{"status": "error", "message": "unknown command: <cmd>"}`
- Invalid or empty input: `{"status": "error", "message": "invalid input"}`
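
The command and error responses above can be sketched as a pure function that maps one input line to one response line. `handle_command`, `error_response`, and `json_escape` are hypothetical names, the `reload` callback stands in for the shared reload path, and the JSON is assembled by hand only to keep the sketch dependency-free.

```rust
/// Escapes the characters that would break a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the newline-terminated error response.
fn error_response(message: &str) -> String {
    format!("{{\"status\": \"error\", \"message\": \"{}\"}}\n", json_escape(message))
}

/// Handles one newline-terminated command and returns one newline-terminated
/// JSON response. The caller closes the connection after writing it.
fn handle_command(
    line: &str,
    uptime_secs: u64,
    sites: usize,
    reload: impl FnOnce() -> Result<(), String>,
) -> String {
    match line.trim() {
        "" => error_response("invalid input"),
        "reload" => match reload() {
            Ok(()) => "{\"status\": \"ok\"}\n".to_string(),
            Err(e) => error_response(&e),
        },
        "status" => format!(
            "{{\"status\": \"ok\", \"uptime_secs\": {uptime_secs}, \"sites\": {sites}}}\n"
        ),
        other => error_response(&format!("unknown command: {other}")),
    }
}
```

Keeping the protocol logic free of I/O makes it straightforward to unit test without binding a socket.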

### Socket Lifecycle

- Socket path from `admin_socket_path` config (default: `/run/reverse-proxy/admin.sock`)
- Empty string disables the admin socket
- Remove any stale socket file (one with no process listening on it) at startup before binding
- If the socket file exists and another process is listening, log a warning and disable the admin socket (but continue starting)

### Concurrency

- Multiple clients can connect simultaneously
- Reload operations are serialized via the same `tokio::sync::Mutex` used by SIGHUP reload
- If a reload is in progress, subsequent reload requests wait, then re-read the config file (getting the latest version)

## Acceptance Criteria

- [ ] Unix domain socket bound at `admin_socket_path`
- [ ] `reload` command triggers config reload and returns structured JSON response
- [ ] `status` command returns process uptime and site count
- [ ] Unknown commands return `{"status": "error", "message": "unknown command: ..."}`
- [ ] Empty/invalid input returns `{"status": "error", "message": "invalid input"}`
- [ ] One command per connection, server closes connection after response
- [ ] Stale socket file removed at startup
- [ ] If socket file exists and is active (another process), log warning and continue
- [ ] `admin_socket_path = ""` disables admin socket
- [ ] Reload operations serialized with same Mutex as SIGHUP reload
- [ ] Integration test: connect to socket, send `reload`, receive JSON response
- [ ] Integration test: connect to socket, send `status`, receive JSON response

## References

- docs/architecture/operations.md — admin socket section
- docs/architecture/decisions/014-unix-socket-reload.md — admin socket rationale
- docs/architecture/config.md — reload serialization

## Notes

> The admin socket and SIGHUP converge on the same reload code path. The only difference is that the admin socket returns a structured response while SIGHUP provides no feedback.

## Summary

> To be filled on completion
52
tasks/ops/body-size-limit.md
Normal file
@@ -0,0 +1,52 @@
---
id: ops/body-size-limit
name: Implement global request body size limit with axum DefaultBodyLimit middleware
status: pending
depends_on: [config/dynamic-config]
scope: single
risk: trivial
impact: isolated
level: implementation
---

## Description

Implement the global request body size limit using axum's `DefaultBodyLimit` middleware. The default limit is 100 MB (104,857,600 bytes), matching the current nginx configuration and accommodating Gitea's push operations with large pack files (ADR-018).

### Implementation

- Set `DefaultBodyLimit::max(body_limit_bytes)` as axum middleware
- `body_limit_bytes` comes from `DynamicConfig`, so it can be changed at runtime via config reload
- When the limit is exceeded, axum returns `413 Payload Too Large` with `Payload Too Large` body
- In Phase 1, the limit is global (not per-site)

### Config Reload

Since `body_limit_bytes` is in `DynamicConfig`, it updates on config reload. However, axum's `DefaultBodyLimit` is typically set as a layer at router construction time. The implementation needs to ensure the current limit is read from `DynamicConfig` on each request, not cached at router construction time.

This may require a custom middleware that reads `DynamicConfig` via `ArcSwap` on each request, rather than relying solely on axum's `DefaultBodyLimit`.
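
The per-request read can be illustrated without axum. In this sketch an `AtomicU64` stands in for the `ArcSwap<DynamicConfig>` lookup and a plain method stands in for the middleware; `BodyLimit`, `set`, and `check` are hypothetical names. It checks a declared `Content-Length` only, so a real middleware would also need to cap bodies streamed without that header.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the reloadable limit; the real value would live in `DynamicConfig`.
struct BodyLimit(AtomicU64);

impl BodyLimit {
    fn new(bytes: u64) -> Self {
        Self(AtomicU64::new(bytes))
    }

    /// Called from the reload path when a new config is swapped in.
    fn set(&self, bytes: u64) {
        self.0.store(bytes, Ordering::Relaxed);
    }

    /// Loads the limit on every call, so a reload takes effect for the next
    /// request. Returns Err(413) when the declared length exceeds the limit.
    fn check(&self, content_length: u64) -> Result<(), u16> {
        if content_length > self.0.load(Ordering::Relaxed) {
            Err(413)
        } else {
            Ok(())
        }
    }
}
```

Because nothing is captured at construction time, lowering or raising the limit requires no router rebuild.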

## Acceptance Criteria

- [ ] Body size limit enforced on all proxied requests
- [ ] Default: 100 MB (104,857,600 bytes)
- [ ] 413 Payload Too Large response when limit exceeded
- [ ] Limit is configurable via `DynamicConfig`
- [ ] Limit can be changed at runtime via config reload
- [ ] Config value is read from ArcSwap on each request (not cached)
- [ ] Integration test: request with body > limit receives 413
- [ ] Integration test: request with body < limit succeeds

## References

- docs/architecture/proxy.md — body size limit section
- docs/architecture/config.md — DynamicConfig, body_limit_bytes
- docs/architecture/decisions/018-body-size-limit.md — 100 MB default rationale

## Notes

> The implementation agent should investigate whether axum's `DefaultBodyLimit` can be dynamically updated, or if a custom middleware reading from ArcSwap is needed. The important thing is that config reload changes the limit without restarting.

## Summary

> To be filled on completion
57
tasks/ops/health-check.md
Normal file
@@ -0,0 +1,57 @@
---
id: ops/health-check
name: Implement health check endpoint on separate local port and HTTPS fallback
status: pending
depends_on: [config/static-config]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Implement the health check endpoint on a separate local port (default: 9900, bound to `127.0.0.1` only) and as a fallback on the HTTPS listener.

### Local Health Check Port

- Binds to `127.0.0.1:{health_check_port}`
- `GET /health` returns `200 OK` with empty body
- `health_check_port = 0` disables the separate listener
- Port must not conflict with any listener's `http_port` or `https_port` on `127.0.0.1` (validated in config validation)

### HTTPS Health Check Fallback

When the local health check port is enabled, `/health` is also available on the HTTPS listener(s) for TLS-level health verification. External monitoring should prefer the local health check for liveness and can use the HTTPS endpoint for TLS verification.

### What Health Check Verifies

- Process is running and tokio runtime is responsive
- TLS listener is accepting connections (HTTPS endpoint only)
- Config is loaded (StaticConfig and DynamicConfig are initialized)

It does **NOT** check upstream reachability. The health check answers "is the proxy process healthy?", not "is the upstream reachable?"

## Acceptance Criteria

- [ ] Local health check binds to `127.0.0.1:{health_check_port}` only
- [ ] `GET /health` returns `200 OK` with empty body
- [ ] `health_check_port = 0` disables the listener
- [ ] Port conflict detection in config validation
- [ ] `/health` available on HTTPS listener(s) as fallback
- [ ] Health check does not verify upstream reachability
- [ ] Integration test: local health check responds 200
- [ ] Integration test: HTTPS health check responds 200

## References

- docs/architecture/operations.md — health check section
- docs/architecture/decisions/013-health-check-port.md — separate local port rationale

## Notes

> To be filled by implementation agent

## Summary

> To be filled on completion
89
tasks/ops/logging.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
id: ops/logging
|
||||
name: Implement structured logging with tracing, file output, and fail2ban-compatible format
|
||||
status: pending
|
||||
depends_on: [setup/project-init]
|
||||
scope: moderate
|
||||
risk: low
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement structured logging using `tracing` and `tracing-subscriber` with dual output (file + stdout) and fail2ban-compatible log format.
|
||||
|
||||
### Log Types
|
||||
|
||||
1. **Access logs** (every proxied request, `info` level):
|
||||
```
REQUEST client_ip=203.0.113.50 host=git.alk.dev method=GET path=/user/repo status=200 upstream=127.0.0.1:3000 duration_ms=45
```
|
||||
|
||||
2. **Event logs** (rate limits, TLS errors, upstream failures, config reloads):
|
||||
```
RATE_LIMIT client_ip=203.0.113.50 host=git.alk.dev path=/login status=429
UPSTREAM_ERROR host=git.alk.dev upstream=127.0.0.1:3000 error="connection refused"
CONFIG_RELOAD status=success sites=1
```
|
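
The `key=value` lines shown above could be assembled by a small helper such as the one below. This is a minimal sketch using only the standard library: `format_event` is a hypothetical name, and the quoting rule (quote a value when it is empty or contains whitespace, a double quote, or a control character) is an assumption inferred from the `error="connection refused"` sample, not something the spec states. In the real task this logic would presumably sit inside a custom `tracing-subscriber` event formatter.

```rust
/// Render one log line as `PREFIX key=value key=value ...`.
/// Hypothetical helper; the quoting rule is an assumption, not from the spec.
pub fn format_event(prefix: &str, fields: &[(&str, &str)]) -> String {
    let mut line = String::from(prefix);
    for (key, value) in fields {
        line.push(' ');
        line.push_str(key);
        line.push('=');
        let needs_quotes = value.is_empty()
            || value
                .chars()
                .any(|c| c.is_whitespace() || c.is_control() || c == '"');
        if needs_quotes {
            line.push('"');
            for c in value.chars() {
                match c {
                    // Escape characters that would end the quoted value early.
                    '"' | '\\' => {
                        line.push('\\');
                        line.push(c);
                    }
                    // Flatten control characters so one event stays on one line.
                    c if c.is_control() => line.push(' '),
                    c => line.push(c),
                }
            }
            line.push('"');
        } else {
            line.push_str(value);
        }
    }
    line
}
```

Keeping every event on a single line matters for fail2ban, which matches its filters line by line; flattening newlines in client-controlled values such as `path` is one way to avoid forged log entries.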
||||
|
||||
### Dual Output
|
||||
|
||||
- **File** (primary): Written to `log_file_path` when configured. This is the authoritative source for fail2ban.
|
||||
- **stdout/stderr** (always-on): For `docker logs`, `journalctl`, and development.
|
||||
|
||||
Use `tracing-subscriber` `Layer` composition to write to both simultaneously.
|
||||
|
||||
### Log Levels
|
||||
|
||||
| Level | Use |
|-------|-----|
| `error` | Unrecoverable failures (TLS handshake failure, config validation) |
| `warn` | Rate limit exceeded, upstream unreachable, upstream timeout |
| `info` | Access logs, config reloads, ACME events, startup/shutdown |
| `debug` | Request/response headers, connection details |
| `trace` | Detailed protocol-level information |
|
||||
|
||||
Configurable via `logging.level` in StaticConfig.
|
||||
|
||||
### Configuration
|
||||
|
||||
- `logging.level`: Log verbosity (default: `"info"`)
|
||||
- `logging.format`: `"text"` or `"json"` (default: `"text"`)
|
||||
- `logging.log_file_path`: Optional file path; when set, logs are written to this file in addition to stdout
|
||||
|
||||
### File Logging and fail2ban
|
||||
|
||||
File logging is the primary integration point for fail2ban. In container deployments, the log directory is volume-mounted so fail2ban on the host can read it directly.
|
||||
|
||||
A corresponding fail2ban filter definition and jail configuration will be provided in the deployment task.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `tracing` and `tracing-subscriber` initialized with dual output (file + stdout)
|
||||
- [ ] File output enabled when `log_file_path` is configured
|
||||
- [ ] Stdout output always enabled
|
||||
- [ ] Custom event format with `key=value` pairs
|
||||
- [ ] `REQUEST` prefix for access logs
|
||||
- [ ] `RATE_LIMIT` prefix for rate limit events
|
||||
- [ ] `UPSTREAM_ERROR` prefix for upstream failures
|
||||
- [ ] `CONFIG_RELOAD` prefix for config reload events
|
||||
- [ ] Log level configurable via `logging.level`
|
||||
- [ ] JSON format output when `logging.format = "json"`
|
||||
- [ ] Text format output when `logging.format = "text"` (default)
|
||||
- [ ] `duration_ms` field in access logs for response time
|
||||
- [ ] Unit tests for log format output
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/operations.md — logging section, log format
|
||||
- docs/architecture/decisions/007-custom-log-format.md — custom log format rationale
|
||||
- docs/architecture/decisions/020-container-deployment.md — file-primary logging rationale
|
||||
|
||||
## Notes
|
||||
|
||||
> The fail2ban filter and jail configuration are a separate deployment task. This task focuses on producing the correct log format.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
89
tasks/ops/rate-limiting.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
id: ops/rate-limiting
|
||||
name: Implement token bucket rate limiting with IPv6 /64 normalization and background eviction
|
||||
status: pending
|
||||
depends_on: [config/dynamic-config]
|
||||
scope: moderate
|
||||
risk: medium
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement per-IP token bucket rate limiting as axum middleware. This runs before the proxy handler and rejects requests that exceed the rate limit with 429 Too Many Requests.
|
||||
|
||||
### Token Bucket Algorithm
|
||||
|
||||
- **Nodelay** semantics matching nginx's `limit_req burst nodelay`
|
||||
- When bucket is empty, request is immediately rejected with 429 — no queuing
|
||||
- Tokens added at rate of `requests_per_second` (1 token every `1000ms / requests_per_second`)
|
||||
- Bucket capacity is `burst` value
|
||||
- Per-IP in Phase 1 (not per-site)
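
The nodelay behaviour described above can be sketched as follows. This is an illustrative, standard-library-only version: the field names follow the `TokenBucket` struct named in this task, while `new`, `try_acquire`, and the choice to start each bucket full are assumptions made for the example. Passing `now` in explicitly (rather than calling `Instant::now()` inside) is a design choice that keeps the algorithm deterministic in unit tests.

```rust
use std::time::Instant;

/// Token bucket with the fields listed in this task.
#[derive(Debug)]
pub struct TokenBucket {
    tokens: f64,
    last_refill: Instant,
    rate: f64, // tokens added per second (`requests_per_second`)
    max: u32,  // bucket capacity (`burst`)
}

impl TokenBucket {
    /// Assumption: a new bucket starts full, so a fresh client may burst.
    pub fn new(rate: f64, max: u32, now: Instant) -> Self {
        Self { tokens: f64::from(max), last_refill: now, rate, max }
    }

    /// Refill for the elapsed time, then take one token if available.
    /// Returns `false` when the request should be rejected with 429;
    /// nothing is queued or delayed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(f64::from(self.max));
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With `rate = 2.0` and `max = 3`, three requests at the same instant pass, a fourth is rejected, and one more passes half a second later once a single token has been refilled.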
|
||||
|
||||
### IPv6 Normalization
|
||||
|
||||
- **IPv4**: Rate limited per individual address (`/32`)
|
||||
- **IPv6**: Rate limited per `/64` prefix. All addresses in the same `/64` share a token bucket
|
||||
- Normalize IPv6 addresses to their `/64` prefix before bucket lookup
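
One way to perform the normalization above is to zero the low 64 bits of an IPv6 address before using it as the map key. The function name `rate_limit_key` is hypothetical; the sketch relies only on `std::net`.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// Map a client address to its rate-limit key:
/// IPv4 is used as-is (/32), IPv6 is truncated to its /64 prefix.
pub fn rate_limit_key(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V4(_) => ip,
        IpAddr::V6(v6) => {
            let s = v6.segments();
            // Keep the first four 16-bit groups (64 bits), zero the rest.
            IpAddr::V6(Ipv6Addr::new(s[0], s[1], s[2], s[3], 0, 0, 0, 0))
        }
    }
}
```

For example, `2001:db8:1:2:aaaa:bbbb:cccc:dddd` and `2001:db8:1:2::1` both map to `2001:db8:1:2::` and therefore share a bucket. IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`) are not treated specially in this sketch; whether they should be is left open here.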
|
||||
|
||||
### Rate Limit State
|
||||
|
||||
- `Arc<Mutex<HashMap<IpAddr, TokenBucket>>>` shared via axum State
|
||||
- Token bucket struct with: `tokens: f64`, `last_refill: Instant`, `rate: f64`, `max: u32`
|
||||
|
||||
### Background Eviction Task
|
||||
|
||||
- Runs every 60 seconds (configurable)
|
||||
- Removes entries whose last access timestamp is older than 300 seconds (5 minutes default)
|
||||
- Prevents unbounded memory growth
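
The eviction pass itself can be a single `retain` over the map. In this sketch the map value is reduced to a last-seen `Instant` to keep the example short; in the real limiter that timestamp would presumably live on the bucket. `evict_stale` is an illustrative name, and the periodic scheduling (every 60 seconds) is left out.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Drop every entry that has been idle for longer than `max_idle`.
/// Returns how many entries were removed.
/// `last_seen` stands in for the bucket's last-access timestamp.
pub fn evict_stale(
    last_seen: &mut HashMap<IpAddr, Instant>,
    now: Instant,
    max_idle: Duration,
) -> usize {
    let before = last_seen.len();
    last_seen.retain(|_, seen| now.saturating_duration_since(*seen) <= max_idle);
    before - last_seen.len()
}
```

Because eviction takes the same lock as request handling, doing all the work in one `retain` call keeps the lock held for a single pass over the map.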
|
||||
|
||||
### Config Reload Behavior
|
||||
|
||||
When rate limit parameters change:
|
||||
1. New `DynamicConfig` swapped in via ArcSwap
|
||||
2. On next request from an existing IP, rate limiter reads current DynamicConfig
|
||||
3. Token bucket refills using new rate, capacity set to new burst
|
||||
4. If current token count exceeds new burst max, cap to new burst max
|
||||
5. HashMap is NOT cleared — avoids rate-limiting gap
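
Steps 3 and 4 amount to overwriting the bucket's parameters and clamping its token count. A minimal sketch, with a cut-down `Bucket` type and a hypothetical `adopt_limits` method standing in for whatever the real limiter exposes:

```rust
/// Minimal stand-in for the limiter state that matters on reload.
#[derive(Debug, PartialEq)]
pub struct Bucket {
    pub tokens: f64,
    pub rate: f64,
    pub max: u32,
}

impl Bucket {
    /// Adopt the current `requests_per_second` and `burst`, capping any
    /// surplus tokens. Existing state is kept, so there is no gap.
    pub fn adopt_limits(&mut self, rate: f64, burst: u32) {
        self.rate = rate;
        self.max = burst;
        self.tokens = self.tokens.min(f64::from(burst));
    }
}
```

A bucket holding 8 tokens drops to 4 when `burst` is lowered to 4, while a bucket holding 1.5 tokens is left unchanged.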
|
||||
|
||||
### Logging
|
||||
|
||||
Rate limit events logged with `RATE_LIMIT` prefix:
|
||||
```
RATE_LIMIT client_ip=203.0.113.50 host=git.alk.dev path=/login status=429
```
|
||||
|
||||
### Middleware Integration
|
||||
|
||||
Rate limiting runs as tower middleware before the proxy handler in the axum router.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Token bucket implementation with nodelay semantics
|
||||
- [ ] Per-IP rate limiting with configurable rate and burst
|
||||
- [ ] IPv6 addresses normalized to `/64` prefix before bucket lookup
|
||||
- [ ] IPv4 addresses used as-is (`/32`)
|
||||
- [ ] Background eviction task removes stale entries every 60 seconds
|
||||
- [ ] Config reload: new rate/burst parameters adopted on next request from existing IP
|
||||
- [ ] Token count capped to new burst max when burst decreases
|
||||
- [ ] HashMap not cleared on config reload (no rate-limiting gap)
|
||||
- [ ] `429 Too Many Requests` response with `Too Many Requests` body
|
||||
- [ ] `RATE_LIMIT` prefixed log event with `client_ip`, `host`, `path`, `status`
|
||||
- [ ] Rate limiter state shared via `Arc<Mutex<HashMap<IpAddr, TokenBucket>>>`
|
||||
- [ ] Unit tests for token bucket algorithm (fill, drain, reject)
|
||||
- [ ] Unit tests for IPv6 `/64` normalization
|
||||
- [ ] Integration test: requests above rate limit receive 429
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/operations.md — rate limiting section
|
||||
- docs/architecture/decisions/006-rate-limiting-approach.md — token bucket rationale
|
||||
|
||||
## Notes
|
||||
|
||||
> The rate limiter sits on the hot path, so lock hold times must stay short. Every check also mutates the bucket, so there is no read-only path; a sharded concurrent map such as `DashMap` reduces contention compared with a single `Mutex<HashMap>`. The spec says `Mutex<HashMap>`, but an implementation agent may choose a more performant concurrent data structure.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
76
tasks/ops/signals-and-shutdown.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
id: ops/signals-and-shutdown
|
||||
name: Implement signal handling (SIGTERM/SIGINT/SIGHUP) and graceful shutdown sequence
|
||||
status: pending
|
||||
depends_on: [config/dynamic-config, ops/admin-socket]
|
||||
scope: moderate
|
||||
risk: medium
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement signal handling for SIGTERM, SIGINT, and SIGHUP, plus the graceful shutdown sequence.
|
||||
|
||||
### Signal Handling
|
||||
|
||||
Using `signal-hook` crate (per ADR-009):
|
||||
|
||||
- **SIGTERM / SIGINT**: Graceful shutdown
|
||||
- **SIGHUP**: Config reload (same code path as admin socket `reload` command)
|
||||
|
||||
### Graceful Shutdown Sequence
|
||||
|
||||
On SIGTERM or SIGINT:
|
||||
|
||||
1. **Stop accepting new connections**: close all TCP listening sockets
2. **Close idle keep-alive connections**: close connections that have no request in flight, and send `Connection: close` on responses still being written so those connections are not reused
3. **Wait for in-flight requests**: up to `shutdown_timeout_secs` (default: 30)
4. **Force-close remaining connections**: after the timeout, TCP RST
5. **Cancel background tasks**: ACME renewal, rate limiter eviction, admin socket
6. **Exit with code 0**
|
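
The wait in step 3 boils down to "block until the in-flight count reaches zero or the timeout expires". The sketch below shows that idea with a blocking `Mutex` and `Condvar` from the standard library; the type `InFlight` and its methods are hypothetical, and the actual proxy would more likely use async primitives from `tokio` for the same purpose.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Counts in-flight requests so shutdown can wait for them to finish.
#[derive(Default)]
pub struct InFlight {
    count: Mutex<usize>,
    drained: Condvar,
}

impl InFlight {
    /// Call when a request starts.
    pub fn enter(&self) {
        *self.count.lock().unwrap() += 1;
    }

    /// Call when a request finishes; wakes the waiter on reaching zero.
    pub fn leave(&self) {
        let mut count = self.count.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            self.drained.notify_all();
        }
    }

    /// Step 3: block until no requests remain or `timeout` elapses.
    /// Returns `true` if drained, `false` if step 4 (force-close) is needed.
    pub fn wait_drained(&self, timeout: Duration) -> bool {
        let guard = self.count.lock().unwrap();
        let (_guard, result) = self
            .drained
            .wait_timeout_while(guard, timeout, |count| *count > 0)
            .unwrap();
        !result.timed_out()
    }
}
```

The boolean result maps directly onto the sequence: `true` means the process can move on to cancelling background tasks, `false` means remaining connections have to be force-closed first.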
||||
|
||||
### SIGHUP for Config Reload
|
||||
|
||||
SIGHUP triggers the same config reload as the admin socket `reload` command:
|
||||
|
||||
1. Re-read the config file from disk
|
||||
2. Deserialize into full config (static + dynamic)
|
||||
3. Validate the full config
|
||||
4. If valid: swap DynamicConfig, log warnings for any static changes
|
||||
5. If invalid: reject reload, log error, keep old DynamicConfig
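
The keep-the-old-config-on-failure rule in steps 4 and 5 can be expressed compactly. In this sketch `DynamicConfig` is a placeholder with a single field, `reload` is a hypothetical function, and the `load` closure stands for steps 1 to 3 (read, deserialize, validate). A plain `Arc` assignment is used in place of the `ArcSwap` the architecture calls for, so that the example needs no external crates.

```rust
use std::sync::Arc;

/// Placeholder for the real dynamic configuration.
#[derive(Debug, PartialEq)]
pub struct DynamicConfig {
    pub sites: Vec<String>,
}

/// `load` re-reads, parses, and validates the config file.
/// Only a valid result replaces the current config.
pub fn reload(
    current: &mut Arc<DynamicConfig>,
    load: impl FnOnce() -> Result<DynamicConfig, String>,
) -> Result<(), String> {
    match load() {
        Ok(new_config) => {
            // The real code would call `ArcSwap::store` here instead.
            *current = Arc::new(new_config);
            Ok(())
        }
        // On failure the old config stays in place; the caller logs `reason`.
        Err(reason) => Err(reason),
    }
}
```

Routing both SIGHUP and the admin socket through one function like this is what makes the "same code path" criterion easy to verify: the signal handler logs the `Result`, the admin socket serializes it into its response.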
|
||||
|
||||
SIGHUP gives the sender no direct indication of success or failure; the outcome is only written to the log. The admin socket is the programmatic alternative with structured responses.
|
||||
|
||||
### Shutdown Timeout
|
||||
|
||||
Configurable via `shutdown_timeout_secs` in StaticConfig (default: 30 seconds).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `signal-hook` handles SIGTERM, SIGINT, SIGHUP
|
||||
- [ ] SIGTERM/SIGINT triggers graceful shutdown sequence
|
||||
- [ ] SIGHUP triggers config reload (same code path as admin socket)
|
||||
- [ ] Graceful shutdown: close listening sockets first
|
||||
- [ ] Graceful shutdown: close idle keep-alive connections
|
||||
- [ ] Graceful shutdown: wait for in-flight requests up to timeout
|
||||
- [ ] Graceful shutdown: force-close remaining connections after timeout
|
||||
- [ ] Cancel background tasks (ACME, eviction, admin socket) on shutdown
|
||||
- [ ] Exit code 0 on graceful shutdown
|
||||
- [ ] `shutdown_timeout_secs` configurable in StaticConfig
|
||||
- [ ] SIGHUP reload converges on same code path as admin socket reload
|
||||
- [ ] Integration test: send SIGTERM, verify graceful shutdown sequence
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/operations.md — signal handling, shutdown sequence
|
||||
- docs/architecture/decisions/009-signal-handling.md — signal handling strategy
|
||||
|
||||
## Notes
|
||||
|
||||
> The shutdown sequence must be carefully ordered. Closing listening sockets before waiting for in-flight requests ensures no new connections arrive while existing ones drain.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
61
tasks/proxy/error-responses.md
Normal file
@@ -0,0 +1,61 @@
|
||||
---
|
||||
id: proxy/error-responses
|
||||
name: Implement proxy error responses with plain text bodies and correct status codes
|
||||
status: pending
|
||||
depends_on: [proxy/host-routing]
|
||||
scope: single
|
||||
risk: trivial
|
||||
impact: isolated
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement the error response types for the proxy handler. All error responses use plain text bodies with no proxy version or identity information. No upstream error details are included.
|
||||
|
||||
### Error Response Table
|
||||
|
||||
| Condition | Response | Body |
|-----------|----------|------|
| Upstream reachable | Stream response as-is | (upstream body) |
| Upstream unreachable | 502 Bad Gateway | `Bad Gateway` |
| Upstream timeout | 504 Gateway Timeout | `Gateway Timeout` |
| Request body too large | 413 Payload Too Large | `Payload Too Large` |
| Rate limit exceeded | 429 Too Many Requests | `Too Many Requests` |
| Unknown Host header | 404 Not Found | `Not Found` |
| Missing Host header | 400 Bad Request | `Bad Request` |
|
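
The error rows of the table above map naturally onto an enum. The sketch below is one possible shape, not the prescribed one: the variant and method names are illustrative, and status codes are returned as plain `u16` so the example does not depend on `http` or `axum` types.

```rust
/// One variant per error row of the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProxyError {
    UpstreamUnreachable,
    UpstreamTimeout,
    BodyTooLarge,
    RateLimited,
    UnknownHost,
    MissingHost,
}

/// Content-Type shared by every error response.
pub const ERROR_CONTENT_TYPE: &str = "text/plain; charset=utf-8";

impl ProxyError {
    /// HTTP status code for this error.
    pub fn status(self) -> u16 {
        match self {
            ProxyError::UpstreamUnreachable => 502,
            ProxyError::UpstreamTimeout => 504,
            ProxyError::BodyTooLarge => 413,
            ProxyError::RateLimited => 429,
            ProxyError::UnknownHost => 404,
            ProxyError::MissingHost => 400,
        }
    }

    /// Fixed plain-text body; never includes upstream or version details.
    pub fn body(self) -> &'static str {
        match self {
            ProxyError::UpstreamUnreachable => "Bad Gateway",
            ProxyError::UpstreamTimeout => "Gateway Timeout",
            ProxyError::BodyTooLarge => "Payload Too Large",
            ProxyError::RateLimited => "Too Many Requests",
            ProxyError::UnknownHost => "Not Found",
            ProxyError::MissingHost => "Bad Request",
        }
    }
}
```

Returning `&'static str` bodies makes it structurally impossible for an upstream error message to end up in a response, which is the information-leakage property this task cares about.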
||||
|
||||
### Response Format
|
||||
|
||||
- Content-Type: `text/plain; charset=utf-8`
|
||||
- Body: Brief status text matching the HTTP status
|
||||
- No proxy version or identity information
|
||||
- No upstream error details leaked
|
||||
|
||||
### Logging
|
||||
|
||||
- 502 and 504 responses logged at `warn` level with structured fields
|
||||
- 429 responses logged at `info` level with RATE_LIMIT prefix
|
||||
- 404 and 400 responses not specially logged (normal routing)
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Error response type/enum covering all cases in the table
|
||||
- [ ] All error responses use `text/plain; charset=utf-8` Content-Type
|
||||
- [ ] Error bodies are brief status text with no version or identity info
|
||||
- [ ] 502 logged at `warn` level with host and upstream
|
||||
- [ ] 504 logged at `warn` level with host and upstream
|
||||
- [ ] 429 logged at `info` level with RATE_LIMIT prefix
|
||||
- [ ] Unit tests for each error response type
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/proxy.md — error handling section
|
||||
|
||||
## Notes
|
||||
|
||||
> This is a small but important task — correct error responses without information leakage are a security concern. Implementation agents should not add extra detail to error bodies.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
89
tasks/proxy/headers-and-forwarding.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
id: proxy/headers-and-forwarding
|
||||
name: Implement proxy header injection, hop-by-hop removal, and request forwarding with hyper Client
|
||||
status: pending
|
||||
depends_on: [proxy/host-routing]
|
||||
scope: moderate
|
||||
risk: medium
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
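Implement the core reverse proxy logic: inject proxy headers, remove hop-by-hop headers, and forward requests to the upstream via a shared hyper client. (With `hyper` 1.x the pooled `Client` is no longer available as `hyper::Client`; it is provided by the `hyper-util` crate as `hyper_util::client::legacy::Client`.)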
|
||||
|
||||
### Proxy Header Injection
|
||||
|
||||
The proxy is an **edge proxy** — it sits directly in front of the internet with no trusted proxies upstream. This means existing `X-Forwarded-For` headers from the client cannot be trusted.
|
||||
|
||||
| Header | Value Source | Behavior |
|--------|-------------|----------|
| `Host` | Original request `Host` header | Preserved as-is |
| `X-Real-IP` | `ConnectInfo<SocketAddr>` remote IP | Set to client's IP address |
| `X-Forwarded-For` | `ConnectInfo<SocketAddr>` remote IP | **Replaced**, not appended |
| `X-Forwarded-Proto` | Determined by listener port | `https` for `https_port`, `http` for `http_port` |
|
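
The replace-not-append rule in the table above can be illustrated with a small function. To stay free of external crates, headers are modelled here as a `Vec` of name/value pairs rather than `http::HeaderMap`; `Headers` and `inject_proxy_headers` are hypothetical names for this sketch.

```rust
use std::net::IpAddr;

/// Headers as (name, value) pairs; a stand-in for `http::HeaderMap`.
pub type Headers = Vec<(String, String)>;

/// Overwrite the proxy headers with values derived from the connection.
/// Client-supplied copies are dropped first, so they cannot be spoofed.
/// `Host` and all other headers are left untouched.
pub fn inject_proxy_headers(headers: &mut Headers, client_ip: IpAddr, is_https: bool) {
    const MANAGED: [&str; 3] = ["x-real-ip", "x-forwarded-for", "x-forwarded-proto"];
    headers.retain(|(name, _)| !MANAGED.iter().any(|m| name.eq_ignore_ascii_case(m)));

    let ip = client_ip.to_string();
    headers.push(("X-Real-IP".to_string(), ip.clone()));
    headers.push(("X-Forwarded-For".to_string(), ip));
    let proto = if is_https { "https" } else { "http" };
    headers.push(("X-Forwarded-Proto".to_string(), proto.to_string()));
}
```

A request arriving with `X-Forwarded-For: 1.2.3.4` from client `203.0.113.50` leaves this function with exactly one `X-Forwarded-For` header, set to `203.0.113.50`.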
||||
|
||||
### Hop-by-Hop Header Removal
|
||||
|
||||
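Remove these headers before forwarding to upstream (RFC 2616 §13.5.1, superseded by RFC 9110 §7.6.1):
- `Connection`, `Keep-Alive`, `Proxy-Authorization`, `Proxy-Authenticate`
- `TE`, `Trailer` (listed as `Trailers` in RFC 2616), `Transfer-Encoding`, `Upgrade`
- Any header named as a token in the `Connection` header value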
|
||||
|
||||
Also remove these from upstream responses before sending to client.
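
A sketch of the removal step, again modelling headers as name/value pairs instead of `http::HeaderMap`. `strip_hop_by_hop` is an illustrative name. The same function can be applied to the request before forwarding and to the upstream response before it is returned.

```rust
/// Headers as (name, value) pairs; a stand-in for `http::HeaderMap`.
pub type Headers = Vec<(String, String)>;

/// Lowercase hop-by-hop header names. `trailers` is the RFC 2616 spelling;
/// the registered header name is `Trailer`, so both are matched here.
const HOP_BY_HOP: [&str; 9] = [
    "connection", "keep-alive", "proxy-authorization", "proxy-authenticate",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
];

/// Remove the fixed hop-by-hop headers, as well as any header named by the
/// `Connection` header itself (expected of intermediaries by RFC 9110 §7.6.1).
pub fn strip_hop_by_hop(headers: &mut Headers) {
    // Collect the connection-specific names first, as owned lowercase strings.
    let named: Vec<String> = headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case("connection"))
        .flat_map(|(_, value)| value.split(','))
        .map(|token| token.trim().to_ascii_lowercase())
        .filter(|token| !token.is_empty())
        .collect();

    headers.retain(|(name, _)| {
        let lower = name.to_ascii_lowercase();
        !HOP_BY_HOP.contains(&lower.as_str()) && !named.contains(&lower)
    });
}
```

Given `Connection: keep-alive, X-Custom`, the function drops `Connection`, `Keep-Alive`, and `X-Custom`, and leaves end-to-end headers such as `Accept` in place.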
|
||||
|
||||
### Request Forwarding
|
||||
|
||||
1. Build the upstream URI: `{upstream_scheme}://{upstream}{path}`, followed by `?{query}` only when the request has a query string
|
||||
2. Copy request method, headers (with proxy headers injected, hop-by-hop removed), and body
|
||||
3. Send via shared `hyper::Client` with per-site timeout overrides
|
||||
4. Stream response back to client (chunk-by-chunk, not buffered)
|
||||
5. Handle client disconnect (log at debug, close upstream connection)
|
||||
6. Handle upstream disconnect (send whatever was already sent, close connection)
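
Step 1 is simple, but the optional query string is easy to get wrong. A minimal sketch with a hypothetical `upstream_uri` helper that works on plain strings (the real handler would more likely build an `http::Uri`):

```rust
/// Build the upstream URI for a request. `path` is expected to start
/// with `/`; `query` is the raw query string without the leading `?`.
pub fn upstream_uri(scheme: &str, upstream: &str, path: &str, query: Option<&str>) -> String {
    match query {
        Some(q) => format!("{scheme}://{upstream}{path}?{q}"),
        None => format!("{scheme}://{upstream}{path}"),
    }
}
```

The path and query are passed through verbatim, with no decoding or re-encoding, so the upstream sees exactly what the client sent.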
|
||||
|
||||
### hyper Client Configuration
|
||||
|
||||
- Created once at startup, shared via axum State
|
||||
- HTTP/1.1 only for upstream connections
|
||||
- No redirect following (proxies should not follow redirects)
|
||||
- Connection pooling (hyper default behavior)
|
||||
- Per-site timeout overrides: `upstream_connect_timeout_secs` (default 5s), `upstream_request_timeout_secs` (default 60s)
|
||||
|
||||
### Upstream Scheme
|
||||
|
||||
Default is `http://`. When `upstream_scheme` is `"https"`, validate the upstream's TLS certificate using the system's native TLS root certificates. Certificate validation failures result in `502 Bad Gateway`.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `X-Real-IP` set from `ConnectInfo<SocketAddr>` remote IP
|
||||
- [ ] `X-Forwarded-For` **replaced** (not appended) with client IP
|
||||
- [ ] `X-Forwarded-Proto` set to `https` or `http` based on listener port
|
||||
- [ ] `Host` header preserved as-is
|
||||
- [ ] Hop-by-hop headers removed before forwarding to upstream
|
||||
- [ ] Hop-by-hop headers removed from upstream response before sending to client
|
||||
- [ ] No `Server` header added to responses
|
||||
- [ ] No `Via` header added in Phase 1
|
||||
- [ ] Request body streamed (not buffered) to upstream
|
||||
- [ ] Response body streamed (not buffered) to client
|
||||
- [ ] Client disconnect logged at debug level, upstream connection closed
|
||||
- [ ] Upstream disconnect: client receives whatever was already sent
|
||||
- [ ] Per-site timeout overrides applied to hyper Client requests
|
||||
- [ ] `upstream_scheme: "https"` validates upstream TLS certificate with system roots
|
||||
- [ ] Shared `hyper::Client` instance via axum State
|
||||
- [ ] Unit tests for header injection and removal
|
||||
- [ ] Integration test: proxy request to upstream, verify headers and response
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/proxy.md — header injection, request forwarding, error handling
|
||||
- docs/architecture/decisions/002-custom-proxy-handler.md — custom handler rationale
|
||||
- docs/architecture/decisions/017-upstream-connection-defaults.md — HTTP/1.1, no redirects
|
||||
- docs/architecture/decisions/021-x-forwarded-for-edge-proxy.md — edge proxy model
|
||||
|
||||
## Notes
|
||||
|
||||
> The `X-Forwarded-For: replace, don't append` behavior is critical. The proxy is the edge — there are no trusted proxies upstream. Existing `X-Forwarded-For` values from the client could be spoofed and must not be trusted.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
56
tasks/proxy/host-routing.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
id: proxy/host-routing
|
||||
name: Implement Host-based routing with global routing table from DynamicConfig
|
||||
status: pending
|
||||
depends_on: [config/dynamic-config]
|
||||
scope: narrow
|
||||
risk: low
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Implement the host-based routing that matches incoming requests to site definitions. Sites are defined per-listener in TOML but collected into a single global routing table in `DynamicConfig`.
|
||||
|
||||
### Routing Logic
|
||||
|
||||
1. Check for `/health` path — if matched, return 200 OK with empty body (regardless of Host)
|
||||
2. Extract `Host` header from request
|
||||
3. If no `Host` header, return `400 Bad Request`
|
||||
4. Normalize `Host` to lowercase, strip port component (e.g., `git.alk.dev:443` → `git.alk.dev`)
|
||||
5. Look up normalized host in the global routing table
|
||||
6. If found, forward to the matching `SiteConfig`'s upstream
|
||||
7. If not found, return `404 Not Found`
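
The seven steps above can be sketched as two small functions. This is an illustration under simplifying assumptions: the routing table maps a hostname straight to an upstream address string rather than to a full `SiteConfig`, and the names `Route`, `normalize_host`, and `route` are invented for the example. Handling of bracketed IPv6 literals is an addition that the steps above do not mention.

```rust
use std::collections::HashMap;

/// Outcome of the routing decision.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    Health,
    BadRequest,
    NotFound,
    Upstream(&'a str),
}

/// Lowercase the host and strip an optional `:port` suffix.
pub fn normalize_host(raw: &str) -> String {
    let raw = raw.trim();
    let host = if raw.starts_with('[') {
        // Bracketed IPv6 literal: keep everything through `]`.
        raw.find(']').map_or(raw, |end| &raw[..=end])
    } else {
        match raw.rsplit_once(':') {
            Some((name, port)) if port.bytes().all(|b| b.is_ascii_digit()) => name,
            _ => raw,
        }
    };
    host.to_ascii_lowercase()
}

/// Apply the routing steps in order. `table` maps hostname to upstream.
pub fn route<'a>(
    path: &str,
    host: Option<&str>,
    table: &'a HashMap<String, String>,
) -> Route<'a> {
    if path == "/health" {
        return Route::Health; // step 1: regardless of Host
    }
    match host {
        None => Route::BadRequest, // step 3
        Some(h) => match table.get(&normalize_host(h)) {
            Some(upstream) => Route::Upstream(upstream.as_str()), // step 6
            None => Route::NotFound,                              // step 7
        },
    }
}
```

Because table keys are stored lowercase and without a port, a request for `GIT.alk.dev:443` matches the `git.alk.dev` entry.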
|
||||
|
||||
### Global Routing Table
|
||||
|
||||
The routing table is a `HashMap<String, SiteConfig>` (or similar) in `DynamicConfig`, built by collecting all sites from all listeners. Hostnames must be unique — validation enforces this.
|
||||
|
||||
The routing table is part of `DynamicConfig` and is swapped atomically on config reload. This means a config reload can add, remove, or change site routing without restarting.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Host-based routing extracts `Host` header and normalizes to lowercase
|
||||
- [ ] Port component stripped from `Host` header before matching
|
||||
- [ ] `/health` path matches regardless of `Host` header, returns 200 OK
|
||||
- [ ] Missing `Host` header returns `400 Bad Request`
|
||||
- [ ] Unknown host returns `404 Not Found`
|
||||
- [ ] Global routing table built from all listeners' site definitions
|
||||
- [ ] Routing table updates atomically on config reload via ArcSwap
|
||||
- [ ] Case-insensitive host matching per RFC 7230 §2.7.3
|
||||
- [ ] Unit tests for host normalization (case, port stripping)
|
||||
- [ ] Unit tests for routing table lookup (match, no match)
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/proxy.md — Host-based routing section
|
||||
- docs/architecture/config.md — DynamicConfig, global routing table
|
||||
|
||||
## Notes
|
||||
|
||||
> To be filled by implementation agent
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
76
tasks/review/core-components.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
id: review/core-components
|
||||
name: Review core component implementations for spec conformance and pattern consistency
|
||||
status: pending
|
||||
depends_on: [config/static-config, config/dynamic-config, config/validation, config/cli-parsing, tls/manual-tls, tls/acme-tls, proxy/host-routing, proxy/headers-and-forwarding, proxy/error-responses]
|
||||
scope: moderate
|
||||
risk: low
|
||||
impact: phase
|
||||
level: review
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Review the core component implementations (config, TLS, proxy) for spec conformance, pattern consistency, and correctness before proceeding to the integration and operations phase.
|
||||
|
||||
### Review Checklist
|
||||
|
||||
1. **Config conformance**:
|
||||
- StaticConfig fields match config.md exactly
|
||||
- DynamicConfig fields match config.md exactly
|
||||
- All 18 validation rules implemented
|
||||
- Default values match config.md defaults table
|
||||
- TOML deserialization works for both example configs
|
||||
|
||||
2. **TLS conformance**:
|
||||
- Manual mode: PEM loading, ServerConfig construction, cipher suite restriction
|
||||
- ACME mode: rustls-acme integration, challenge handling, certificate failure behavior
|
||||
- Cipher suites match ADR-012 (4 TLS 1.2 suites + all TLS 1.3)
|
||||
- Protocol versions restricted to TLS 1.2 and 1.3
|
||||
|
||||
3. **Proxy conformance**:
|
||||
- Host-based routing: case-insensitive, port-stripped, global routing table
|
||||
- Header injection: X-Real-IP, X-Forwarded-For (replaced), X-Forwarded-Proto, Host
|
||||
- Hop-by-hop header removal
|
||||
- Error responses: correct status codes, plain text, no information leakage
|
||||
- Request forwarding: streaming, no buffering, hyper Client configuration
|
||||
|
||||
4. **Pattern consistency**:
|
||||
- ArcSwap used consistently for DynamicConfig
|
||||
- ConnectInfo propagated correctly
|
||||
- Error handling patterns are consistent
|
||||
- Logging patterns are consistent
|
||||
|
||||
5. **Test coverage**:
|
||||
- Unit tests for config validation rules
|
||||
- Unit tests for host routing
|
||||
- Unit tests for header injection
|
||||
- Integration tests for proxy forwarding
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] All StaticConfig/DynamicConfig fields match config.md
|
||||
- [ ] All validation rules implemented correctly
|
||||
- [ ] TLS cipher suites and protocol versions match ADR-012
|
||||
- [ ] Proxy headers match ADR-021 (X-Forwarded-For replaced, not appended)
|
||||
- [ ] Error responses match proxy.md table
|
||||
- [ ] ArcSwap pattern consistent across codebase
|
||||
- [ ] Test coverage adequate for core functionality
|
||||
- [ ] `cargo clippy` passes with no warnings
|
||||
- [ ] `cargo fmt --check` passes
|
||||
- [ ] All existing tests pass
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/config.md
|
||||
- docs/architecture/tls.md
|
||||
- docs/architecture/proxy.md
|
||||
- docs/architecture/decisions/ (relevant ADRs)
|
||||
|
||||
## Notes
|
||||
|
||||
> This review should verify that the core components are ready for integration. Focus on spec conformance and pattern consistency. If deviations are found, document them and decide whether to fix or accept.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
84
tasks/review/integration-readiness.md
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
id: review/integration-readiness
|
||||
name: Review full integration and deployment readiness before release
|
||||
status: pending
|
||||
depends_on: [integration/startup-orchestration, deploy/systemd-and-container]
|
||||
scope: broad
|
||||
risk: medium
|
||||
impact: project
|
||||
level: review
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Review the full integration and deployment readiness. This is the final review before the proxy is considered production-ready.
|
||||
|
||||
### Review Checklist
|
||||
|
||||
1. **Startup sequence**:
|
||||
- All components initialize in the correct order
|
||||
- Fail-fast on any initialization error
|
||||
- All ports bound before accepting connections
|
||||
- `sd_notify("READY=1")` sent correctly
|
||||
|
||||
2. **Config reload**:
|
||||
- SIGHUP reload works correctly
|
||||
- Admin socket `reload` and `status` commands work
|
||||
- Reload serialization prevents race conditions
|
||||
- Static config change detection logs warnings
|
||||
- Invalid config rejection preserves old config
|
||||
|
||||
3. **Graceful shutdown**:
|
||||
- SIGTERM/SIGINT triggers graceful shutdown
|
||||
- Listening sockets closed
|
||||
- In-flight requests drained within timeout
|
||||
- Background tasks cancelled
|
||||
- Exit code 0 on clean shutdown
|
||||
|
||||
4. **Security**:
|
||||
- No information leakage in error responses
|
||||
- X-Forwarded-For replaced (not appended)
|
||||
- Cipher suites restricted to the ADR-012 set (4 TLS 1.2 suites + all TLS 1.3)
|
||||
- Bind address validation (no 0.0.0.0 unless allowed)
|
||||
- Rate limiting effective against basic abuse
|
||||
|
||||
5. **Production readiness**:
|
||||
- Docker image builds and runs correctly
|
||||
- Systemd unit file works
|
||||
- Health check endpoint responds
|
||||
- Log file output in correct format for fail2ban
|
||||
- ACME certificate provisioning works (manual testing against staging)
|
||||
|
||||
6. **Documentation**:
|
||||
- Config file examples are correct and complete
|
||||
- Deployment guide covers both systemd and container setups
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Full startup sequence works with both single and multi-listener configs
|
||||
- [ ] Config reload via SIGHUP works with feedback in logs
|
||||
- [ ] Config reload via admin socket works with structured JSON feedback
|
||||
- [ ] Graceful shutdown completes within timeout
|
||||
- [ ] No error response leaks version or identity information
|
||||
- [ ] Docker image builds and passes health check
|
||||
- [ ] Systemd unit file is correct
|
||||
- [ ] fail2ban filter matches `RATE_LIMIT` log format
|
||||
- [ ] All tests pass: `cargo test`
|
||||
- [ ] No clippy warnings: `cargo clippy`
|
||||
- [ ] Formatting clean: `cargo fmt --check`
|
||||
- [ ] Manual testing against ACME staging succeeds
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/operations.md — full operations review
|
||||
- docs/architecture/config.md — config reload
|
||||
- docs/architecture/tls.md — ACME testing
|
||||
- docs/architecture/decisions/ (all ADRs)
|
||||
|
||||
## Notes
|
||||
|
||||
> This review should be thorough and practical. Manual testing against ACME staging should be done at this point. Any deviations from the spec should be documented and accepted or fixed.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
98
tasks/setup/project-init.md
Normal file
@@ -0,0 +1,98 @@
|
||||
---
|
||||
id: setup/project-init
|
||||
name: Initialize Rust project with Cargo, dependencies, and module skeleton
|
||||
status: pending
|
||||
depends_on: []
|
||||
scope: moderate
|
||||
risk: low
|
||||
impact: project
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Initialize the Rust project from scratch. The repo currently has only `docs/` and `.git/`. Set up a single-binary Rust project with all core dependencies per the architecture spec (overview.md), and create the module skeleton that subsequent tasks will fill in.
|
||||
|
||||
This is a single-binary project — there are no library exports. The product is the `reverse-proxy` binary.
|
||||
|
||||
### Core Dependencies
|
||||
|
||||
| Crate | Purpose |
|-------|---------|
| `axum` 0.8 | HTTP framework, routing, middleware, extractors |
| `tokio` 1 (full) | Async runtime |
| `hyper` 1 | HTTP protocol; the pooled proxy `Client` comes from `hyper-util` in hyper 1.x |
| `tower` 0.5 | Middleware ecosystem, Service trait |
| `rustls` 0.23 | TLS implementation, `aws_lc_rs` crypto provider |
| `tokio-rustls` 0.26 | Async TLS I/O |
| `rustls-acme` 0.12 | ACME client for Let's Encrypt |
| `serde` 1 | Serialization |
| `toml` 0.8 | Config format |
| `arc-swap` 1 | Atomic config swap for DynamicConfig |
| `tracing` 0.1 | Structured logging |
| `tracing-subscriber` 0.3 | Log output (file + stdout) |
| `rustls-pemfile` 2 | PEM parsing for manual cert loading |
| `rustls-pki-types` 1 | TLS types (CertificateDer, PrivateKeyDer) |
| `clap` 4 | CLI arguments |
| `signal-hook` 0.3 | SIGTERM/SIGINT/SIGHUP handling |
|
||||
|
||||
Pin exact versions in `Cargo.toml` (`=x.y.z` requirements) and commit `Cargo.lock`, so that builds of the binary are reproducible.
|
||||
|
||||
### Module Skeleton
|
||||
|
||||
```
src/
├── main.rs                  # entry point, CLI parsing, startup orchestration
├── config/
│   ├── mod.rs               # config module, re-exports
│   ├── static_config.rs     # StaticConfig, ListenerConfig, TlsConfig, LoggingConfig
│   ├── dynamic_config.rs    # DynamicConfig, SiteConfig, RateLimitConfig
│   └── validation.rs        # config validation logic
├── proxy/
│   ├── mod.rs               # proxy module, re-exports
│   ├── handler.rs           # reverse proxy handler
│   ├── headers.rs           # proxy header injection
│   └── error.rs             # error response types
├── tls/
│   ├── mod.rs               # TLS module, re-exports
│   ├── acceptor.rs          # TLS acceptor construction (manual + ACME)
│   └── redirect.rs          # HTTP → HTTPS redirect handler
├── rate_limit/
│   ├── mod.rs               # rate limit module
│   └── bucket.rs            # token bucket implementation
├── logging/
│   ├── mod.rs               # logging module
│   └── format.rs            # custom structured log format
├── admin/
│   ├── mod.rs               # admin socket module
│   └── socket.rs            # Unix domain socket handler
├── health.rs                # health check endpoint
└── shutdown.rs              # graceful shutdown logic
```

## Acceptance Criteria

- [ ] `Cargo.toml` with all dependencies listed in overview.md, exact versions pinned
- [ ] `src/main.rs` with minimal `fn main()` that compiles
- [ ] All module files exist with `mod.rs` re-exports and skeleton content
- [ ] `cargo check` succeeds with no errors
- [ ] `cargo clippy` succeeds with no warnings
- [ ] Binary name is `reverse-proxy` in `Cargo.toml`
- [ ] `.gitignore` covers `target/`
- [ ] Dual licensing: `MIT OR Apache-2.0` in `Cargo.toml`

## References

- docs/architecture/overview.md — crate dependencies, exports
- docs/architecture/config.md — config structure
- docs/architecture/proxy.md — proxy handler architecture
- docs/architecture/tls.md — TLS architecture
- docs/architecture/operations.md — rate limiting, logging, health check, shutdown

## Notes

> To be filled by implementation agent

## Summary

> To be filled on completion
43
tasks/setup/test-infrastructure.md
Normal file
@@ -0,0 +1,43 @@
---
id: setup/test-infrastructure
name: Set up test infrastructure with integration test helpers and fixtures
status: pending
depends_on: [setup/project-init]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Set up the testing infrastructure that subsequent implementation tasks will use. This includes the integration test directory structure, test helpers for creating mock configs, and HTTP test utilities.

Create:

1. **Test module structure**: `tests/` directory for integration tests, `src/config/test_fixtures.rs` for config test helpers
2. **Test config fixtures**: Helper functions that create valid `StaticConfig` and `DynamicConfig` instances for tests (a minimal config that passes validation)
3. **HTTP test helpers**: Utilities for spinning up test HTTP servers (for upstream mocking) using a minimal `hyper` server or a bare `tokio::net::TcpListener`
4. **Test TLS helpers**: Self-signed certificate generation for TLS tests (using `rcgen` as a dev-dependency)

## Acceptance Criteria

- [ ] `tests/` directory exists with a sample integration test that compiles
- [ ] Test helper module with `test_static_config()` and `test_dynamic_config()` fixture functions
- [ ] `rcgen` added as a dev-dependency for self-signed cert generation
- [ ] `tokio-test` or equivalent test utilities available
- [ ] `cargo test` succeeds with the skeleton test
- [ ] Test config fixtures produce configs that would pass validation (once validation is implemented)

## References

- docs/architecture/config.md — config structures to create fixtures for
- docs/architecture/proxy.md — proxy handler that will need upstream mocking

## Notes

> To be filled by implementation agent

## Summary

> To be filled on completion
72
tasks/tls/acme-tls.md
Normal file
@@ -0,0 +1,72 @@
---
id: tls/acme-tls
name: Implement ACME certificate provisioning with rustls-acme for automatic Let's Encrypt
status: pending
depends_on: [setup/project-init]
scope: moderate
risk: high
impact: component
level: implementation
---

## Description

Implement ACME mode TLS certificate provisioning using `rustls-acme`. Each listener in ACME mode creates its own `AcmeCertProvider` with the listener's domain list, cache directory, and ACME directory (Let's Encrypt staging or production).

### ACME Mode

For each listener in ACME mode:

1. Create `AcmeConfig::new(domains)` with the domains from `acme_domains`
2. Run the ACME state machine as a background tokio task per listener
3. `ResolvesServerCertAcme` serves the ACME-provisioned certificate
4. Certificate renewal is automatic (~30 days before expiry)
5. Cache directory persists ACME state between restarts via `DirCache`

### Certificate Failure Behavior

| Scenario | Behavior |
|----------|----------|
| First start, no cached cert, ACME unreachable | **Fail to start** with clear error |
| First start, no cached cert, ACME succeeds | Normal startup |
| Start with cached cert, ACME unreachable for renewal | **Start normally** with cached cert, log `warn` |
| Renewal failure after startup | **Continue serving existing cert**, log `warn` |
| Cached cert expired, renewal fails at startup | **Fail to start** |
| Cached cert expires during runtime | **Continue serving expired cert**, log `error` |

Key principle: **never start without a valid TLS certificate, but once running, never stop serving because of a certificate or renewal failure**.
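
The failure table can be read as a single decision function. The sketch below is illustrative only: the `Phase`, `CachedCert`, and `Action` names are hypothetical, and two cases the table does not list (ACME succeeding with an expired cache, and a running process with no certificate at all) are filled in as labeled assumptions.

```rust
/// When the decision is made. Hypothetical type, not from the architecture docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Startup,
    Runtime,
}

/// State of the certificate held in the listener's cache directory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CachedCert {
    Missing,
    Valid,
    Expired,
}

/// What the listener should do, including the log level to use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Proceed,      // start or keep serving, nothing to report
    ProceedWarn,  // start or keep serving, log `warn`
    ProceedError, // keep serving, log `error`
    FailToStart,  // refuse to start
}

/// Maps one row of the certificate failure table to an action.
/// `acme_ok` is true when issuance or renewal succeeded.
pub fn decide(phase: Phase, cert: CachedCert, acme_ok: bool) -> Action {
    match (phase, cert, acme_ok) {
        // Assumption: a successful ACME exchange always yields a usable cert.
        (_, _, true) => Action::Proceed,
        // No usable certificate at startup: never start.
        (Phase::Startup, CachedCert::Missing, false) => Action::FailToStart,
        (Phase::Startup, CachedCert::Expired, false) => Action::FailToStart,
        // A valid cached cert is good enough to start or keep serving.
        (Phase::Startup, CachedCert::Valid, false) => Action::ProceedWarn,
        (Phase::Runtime, CachedCert::Valid, false) => Action::ProceedWarn,
        // Expired during runtime: keep serving, but log at `error`.
        (Phase::Runtime, CachedCert::Expired, false) => Action::ProceedError,
        // Assumption: not in the table and should be unreachable, because
        // startup would already have failed; treated like the expired case.
        (Phase::Runtime, CachedCert::Missing, false) => Action::ProceedError,
    }
}
```

Keeping the policy in one pure function makes each table row a one-line unit test, with no ACME server involved.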

### ACME Challenge Type

Default is TLS-ALPN-01 since the proxy already listens on port 443. HTTP-01 is available as a fallback via the port 80 redirect listener serving `/.well-known/acme-challenge/{token}`.

### ServerConfig for ACME Mode

Build `ServerConfig` with `with_cert_resolver()`, passing the `ResolvesServerCertAcme` resolver. Register `acme-tls/1` in `alpn_protocols` for TLS-ALPN-01 challenge handling.

## Acceptance Criteria

- [ ] ACME state machine runs as background tokio task per listener
- [ ] `AcmeConfig` created per listener with correct domains, cache dir, and directory
- [ ] `ResolvesServerCertAcme` integrated into `ServerConfig`
- [ ] `acme-tls/1` ALPN protocol registered for TLS-ALPN-01 challenges
- [ ] Cipher suite and protocol version restrictions applied (same as manual mode)
- [ ] Certificate failure behavior matches the table above
- [ ] Cache directory (`DirCache`) persists ACME state between restarts
- [ ] Each listener uses its own cache directory to avoid conflicts
- [ ] ACME renewal is automatic, no manual intervention
- [ ] `staging` vs `production` ACME directory selection works
- [ ] Unit tests for ACME config construction (mocked, not real Let's Encrypt calls)

## References

- docs/architecture/tls.md — ACME mode, certificate failure behavior, challenge types
- docs/architecture/decisions/004-rustls-acme.md — ACME-primary rationale
- docs/architecture/decisions/005-tokio-rustls-direct.md — direct tokio-rustls for ACME integration

## Notes

> Real ACME integration tests require a network connection to Let's Encrypt staging. For CI, consider mock tests that verify the config and state machine setup without making real ACME requests. Manual testing against staging should be done before deployment.

## Summary

> To be filled on completion
60
tasks/tls/http-redirect.md
Normal file
@@ -0,0 +1,60 @@
---
id: tls/http-redirect
name: Implement HTTP to HTTPS redirect listener with Host-based URL construction
status: pending
depends_on: [config/static-config, config/dynamic-config]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Implement the HTTP → HTTPS redirect listener. Each listener that has `http_port > 0` runs a plain HTTP listener that redirects all requests to the HTTPS equivalent URL.

### Redirect Behavior

1. Read the `Host` header from the incoming request
2. If no `Host` header, return `400 Bad Request`
3. Construct redirect URL: `https://{host}:{https_port}/{path}?{query}`
   - `{host}` is the hostname portion of the `Host` header (port stripped)
   - `{https_port}` is the listener's `https_port`, omitted (together with the `:`) if 443
   - `{path}` and `{query}` are preserved from the original request; the `?` is omitted when the request has no query string
4. Return `301 Moved Permanently` with a `Location` header
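
The URL construction in step 3 is a pure function and can be sketched without any HTTP library. In the sketch below, `redirect_location` and `strip_port` are hypothetical names; the caller is assumed to pass the request's path and query as one string (for example `/a/b?x=1`) and to answer `400 Bad Request` when the function returns `None`.

```rust
/// Strips an optional `:port` suffix from a `Host` header value.
/// Bracketed IPv6 literals such as `[::1]:8080` keep their brackets.
fn strip_port(host: &str) -> &str {
    if host.starts_with('[') {
        // IPv6 literal: keep everything up to and including the `]`.
        match host.find(']') {
            Some(end) => &host[..=end],
            None => host,
        }
    } else {
        match host.rfind(':') {
            Some(idx) => &host[..idx],
            None => host,
        }
    }
}

/// Builds the `Location` value for the HTTP to HTTPS redirect.
/// Returns `None` when the `Host` header is missing or empty.
pub fn redirect_location(
    host_header: Option<&str>,
    https_port: u16,
    path_and_query: &str,
) -> Option<String> {
    let host = strip_port(host_header?.trim());
    if host.is_empty() {
        return None;
    }
    // The standard HTTPS port is left out of the URL entirely.
    let port = if https_port == 443 {
        String::new()
    } else {
        format!(":{https_port}")
    };
    // Guarantee exactly one leading slash on the path.
    let path = if path_and_query.starts_with('/') {
        path_and_query.to_string()
    } else {
        format!("/{path_and_query}")
    };
    Some(format!("https://{host}{port}{path}"))
}
```

For example, `redirect_location(Some("example.com:8080"), 8443, "/a?x=1")` yields `https://example.com:8443/a?x=1`.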

### Per-Listener

Each listener has its own HTTP redirect on its own bind address and `http_port`. Multiple listeners on different IPs can each have their own redirect.

### ACME HTTP-01 Challenge Support

When a listener is in ACME mode and uses HTTP-01 challenges, the redirect listener must also serve `/.well-known/acme-challenge/{token}` paths. This is a fallback for environments where TLS-ALPN-01 is not suitable.

Note: TLS-ALPN-01 is the default and primary challenge type. HTTP-01 support should be implemented but is not the primary path.

## Acceptance Criteria

- [ ] HTTP listener binds to `bind_addr:http_port` for each enabled listener
- [ ] Redirect to `https://{host}:{https_port}/{path}?{query}` with 301 status
- [ ] Port 443 is omitted from redirect URL (standard HTTPS port)
- [ ] Non-443 HTTPS ports are included in redirect URL
- [ ] Missing `Host` header returns `400 Bad Request`
- [ ] Per-listener redirect: each listener has its own HTTP redirect
- [ ] `http_port = 0` disables HTTP redirect for that listener
- [ ] ACME HTTP-01 challenge path handling (placeholder for future integration)
- [ ] Unit tests for redirect URL construction
- [ ] Integration test: HTTP request redirects to correct HTTPS URL

## References

- docs/architecture/proxy.md — HTTP → HTTPS redirect section
- docs/architecture/tls.md — ACME challenge types, HTTP listener

## Notes

> To be filled by implementation agent

## Summary

> To be filled on completion
78
tasks/tls/manual-tls.md
Normal file
@@ -0,0 +1,78 @@
---
id: tls/manual-tls
name: Implement manual TLS certificate loading and ServerConfig construction
status: pending
depends_on: [setup/project-init]
scope: narrow
risk: low
impact: component
level: implementation
---

## Description

Implement the manual TLS mode where certificates are loaded from PEM files on disk at startup. This covers building a `rustls::ServerConfig` with manually loaded certificate chains and private keys.

### Manual Mode

For each listener in manual mode:

1. Load `cert_path` PEM file using `rustls_pemfile` → `Vec<CertificateDer>`
2. Load `key_path` PEM file using `rustls_pemfile` → `PrivateKeyDer`
3. Build `ServerConfig` with `with_no_client_auth()` and the loaded cert/key
4. Configure cipher suites (restricted set per ADR-012)
5. Configure protocol versions (TLS 1.2 and 1.3 only)

### Cipher Suite Configuration

Per ADR-012, restrict to nginx-equivalent cipher suites:

**TLS 1.2 (explicitly selected):**
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`

**TLS 1.3 (all default suites):**
- `TLS_AES_128_GCM_SHA256`
- `TLS_AES_256_GCM_SHA384`
- `TLS_CHACHA20_POLY1305_SHA256`

This is configured via a custom `CryptoProvider` whose `cipher_suites` list is restricted to the suites above and which is passed to `ServerConfig::builder_with_provider()`.

### Single-Domain Manual Mode

For a listener with one domain, build a simple `ServerConfig` with the single certificate chain and private key. No SNI resolver needed.

### Multi-Domain Manual Mode (on shared-IP listener)

For a listener with multiple sites on a shared IP, implement a custom `ResolvesServerCert` that maps SNI hostnames to `CertifiedKey` entries loaded from disk. If no certificate matches the SNI hostname, the handshake fails — we don't serve a default certificate for unknown domains.

Note: multi-domain manual mode with different certs per domain is a rare edge case. The initial implementation should handle the common case (single cert per manual listener). The SNI resolver can be a follow-up if needed.
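
The "no default certificate" rule comes down to a map lookup that returns nothing for an unknown name. The following is a std-only sketch of that lookup, not the rustls integration itself: `SniCertMap` is a hypothetical helper, generic over the certificate type so it can be tested without real keys. A real resolver would wrap it in an `impl ResolvesServerCert`, where returning `None` from `resolve` causes the handshake to fail.

```rust
use std::collections::HashMap;

/// Maps SNI hostnames to certificates. `C` stands in for
/// `Arc<CertifiedKey>` in a real resolver (an assumption of this sketch).
pub struct SniCertMap<C> {
    by_host: HashMap<String, C>,
}

impl<C> SniCertMap<C> {
    pub fn new() -> Self {
        Self { by_host: HashMap::new() }
    }

    /// Registers a certificate; hostnames are stored lower-cased
    /// because DNS names compare case-insensitively.
    pub fn insert(&mut self, hostname: &str, cert: C) {
        self.by_host.insert(hostname.to_ascii_lowercase(), cert);
    }

    /// Looks up the certificate for the client's SNI value.
    /// `None` (no SNI sent, or unknown hostname) means "fail the
    /// handshake": there is deliberately no fallback entry.
    pub fn resolve(&self, sni: Option<&str>) -> Option<&C> {
        let name = sni?;
        self.by_host.get(&name.to_ascii_lowercase())
    }
}
```

Keeping the lookup separate from the TLS types lets the unknown-hostname case be covered by a plain unit test.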

## Acceptance Criteria

- [ ] `rustls::ServerConfig` construction for manual TLS mode
- [ ] PEM file loading via `rustls_pemfile` for certificates and private keys
- [ ] Cipher suite restriction per ADR-012 (4 TLS 1.2 suites + all TLS 1.3)
- [ ] Protocol version restriction to TLS 1.2 and 1.3
- [ ] `aws_lc_rs` crypto provider used
- [ ] `with_no_client_auth()` for no client certificate requirement
- [ ] Custom `ResolvesServerCert` for SNI-based cert selection in multi-domain manual mode
- [ ] Unknown SNI hostname → handshake fails (no default cert)
- [ ] Unit tests for ServerConfig construction with test certs (using `rcgen`)
- [ ] Unit tests for cipher suite and protocol version configuration

## References

- docs/architecture/tls.md — manual mode, cipher suites, SNI
- docs/architecture/decisions/004-rustls-acme.md — manual mode is fallback
- docs/architecture/decisions/005-tokio-rustls-direct.md — direct tokio-rustls usage
- docs/architecture/decisions/012-cipher-suite-restriction.md — cipher suite selection

## Notes

> This task focuses on ServerConfig construction. The actual TCP listener + TLS acceptor wiring is in tls/tls-listener-setup.

## Summary

> To be filled on completion
77
tasks/tls/tls-listener-setup.md
Normal file
@@ -0,0 +1,77 @@
---
id: tls/tls-listener-setup
name: Implement multi-listener TLS setup with ConnectInfo propagation and per-listener routers
status: pending
depends_on: [tls/manual-tls, tls/acme-tls, config/static-config, config/dynamic-config]
scope: broad
risk: high
impact: phase
level: implementation
---

## Description

Wire up the TLS listeners — this is the core integration task that brings together manual TLS, ACME TLS, and the config system to create running TLS listeners.

For each `ListenerConfig`:

1. Bind TCP listener on `bind_addr:https_port`
2. Construct the appropriate `ServerConfig` (manual or ACME)
3. Create `tokio_rustls::TlsAcceptor` from the `ServerConfig`
4. Accept connections, extract `peer_addr()` before wrapping in TLS
5. Create a per-listener `axum::Router` with its middleware stack
6. Provide `ConnectInfo<SocketAddr>` to the router via `into_make_service_with_connect_info::<SocketAddr>()`

### ConnectInfo Propagation

`ConnectInfo<SocketAddr>` is critical for the proxy handler — it provides the real client IP for `X-Real-IP` and `X-Forwarded-For` headers. The peer address must be extracted from the `TcpStream` before wrapping in `TlsStream`.
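
As a rough illustration of why the peer address matters, the sketch below derives both header values from a `SocketAddr`. The function name `forwarded_headers` is hypothetical, and appending to an existing `X-Forwarded-For` value (rather than replacing it) is an assumption of this sketch; the actual policy belongs to the proxy header task.

```rust
use std::net::SocketAddr;

/// Returns `(x_real_ip, x_forwarded_for)` for an incoming connection.
/// `peer` is the address captured from the `TcpStream` before TLS wrapping.
pub fn forwarded_headers(peer: SocketAddr, existing_xff: Option<&str>) -> (String, String) {
    // Only the IP is forwarded; the ephemeral client port is dropped.
    let ip = peer.ip().to_string();
    // Assumption: an existing, non-empty chain is extended, not replaced.
    let xff = match existing_xff.map(str::trim).filter(|v| !v.is_empty()) {
        Some(prev) => format!("{prev}, {ip}"),
        None => ip.clone(),
    };
    (ip, xff)
}
```

If the address were read after the TLS handshake from a type that no longer exposes the socket, these values could not be filled in, which is why the accept loop captures `peer_addr()` first.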

### Per-Listener Routers

Each listener has its own `axum::Router` instance with its own middleware stack. All routers share `Arc<ArcSwap<DynamicConfig>>` and `Arc<Mutex<HashMap<IpAddr, TokenBucket>>>` via axum State.
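
The effect of sharing the rate limiter map is that a client's budget is the same no matter which listener it connects to. The sketch below shows only that sharing property: `TokenBucket` here is a deliberately simplified stand-in with no refill logic, and `try_acquire` and `RateLimitState` are hypothetical names. The real bucket is defined by the rate limiting task.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Simplified stand-in: a fixed budget with no time-based refill.
#[derive(Debug)]
pub struct TokenBucket {
    tokens: u32,
}

/// The shared state type named in the task description.
pub type RateLimitState = Arc<Mutex<HashMap<IpAddr, TokenBucket>>>;

/// Takes one token for `client`, creating a full bucket on first sight.
/// Returns `false` once the client's budget is exhausted.
pub fn try_acquire(state: &RateLimitState, client: IpAddr, capacity: u32) -> bool {
    let mut buckets = state.lock().expect("rate limiter mutex poisoned");
    let bucket = buckets
        .entry(client)
        .or_insert(TokenBucket { tokens: capacity });
    if bucket.tokens > 0 {
        bucket.tokens -= 1;
        true
    } else {
        false
    }
}
```

Each router would hold its own `Arc::clone` of the same `RateLimitState`, so requests arriving on different listeners draw from one bucket per client IP.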

### Startup Sequence

The TLS listener setup follows the startup sequence from operations.md:

1. Parse and validate config
2. Initialize DynamicConfig in ArcSwap
3. Initialize shared state (rate limiter, hyper client, logging)
4. Bind health check port
5. Bind admin socket
6. Bind all listener ports (TCP bind)
7. Load TLS configuration (manual certs or ACME init)
8. Start accepting connections on the bound TCP listeners
9. Start background tasks (ACME renewal, rate limiter eviction, signal handler, admin socket)

Fail-fast if any bind or TLS load fails.
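
The ordering and the fail-fast rule can be sketched as a loop that stops at the first failing step. This is a minimal illustration: `StartupStep`, `STARTUP_ORDER`, and `run_startup` are hypothetical names, and the real work for each step is supplied by the caller as a closure.

```rust
/// The nine startup steps, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupStep {
    ParseConfig,
    InitDynamicConfig,
    InitSharedState,
    BindHealthPort,
    BindAdminSocket,
    BindListenerPorts,
    LoadTls,
    StartAccepting,
    StartBackgroundTasks,
}

pub const STARTUP_ORDER: [StartupStep; 9] = [
    StartupStep::ParseConfig,
    StartupStep::InitDynamicConfig,
    StartupStep::InitSharedState,
    StartupStep::BindHealthPort,
    StartupStep::BindAdminSocket,
    StartupStep::BindListenerPorts,
    StartupStep::LoadTls,
    StartupStep::StartAccepting,
    StartupStep::StartBackgroundTasks,
];

/// Runs every step in order and returns the first failure, if any.
/// Steps after a failure are never executed (fail-fast).
pub fn run_startup<F>(mut run_step: F) -> Result<(), (StartupStep, String)>
where
    F: FnMut(StartupStep) -> Result<(), String>,
{
    for step in STARTUP_ORDER {
        run_step(step).map_err(|e| (step, e))?;
    }
    Ok(())
}
```

Because `StartAccepting` comes after both `BindListenerPorts` and `LoadTls`, a failure in either means no connection is ever accepted, and `main` can turn the returned error into a non-zero exit code.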

### Health Endpoint on HTTPS

When the local health check port is enabled, `/health` is also available on the HTTPS listener(s) as a fallback for TLS-level health verification.

## Acceptance Criteria

- [ ] Multi-listener setup: each `ListenerConfig` creates its own TCP listener + TLS acceptor
- [ ] `ConnectInfo<SocketAddr>` populated from `TcpStream::peer_addr()` before TLS wrapping
- [ ] Per-listener `axum::Router` instances sharing `Arc<ArcSwap<DynamicConfig>>` state
- [ ] Both manual and ACME TLS modes work for different listeners
- [ ] Fail-fast behavior: if any bind or TLS load fails, exit with non-zero code
- [ ] All ports bound before any connections accepted
- [ ] `/health` endpoint available on HTTPS listener(s)
- [ ] `sd_notify("READY=1")` sent after all listeners started (systemd integration)
- [ ] Integration test: start proxy with test config, verify HTTPS listener accepts connections
- [ ] Integration test: multi-listener config with both manual and ACME listeners

## References

- docs/architecture/tls.md — multi-listener architecture, ConnectInfo
- docs/architecture/proxy.md — Host-based routing, ConnectInfo propagation
- docs/architecture/operations.md — startup sequence, health check
- docs/architecture/config.md — ListenerConfig, StaticConfig

## Notes

> This task is the critical integration point. It depends on manual TLS, ACME TLS, static config, and dynamic config all being complete. The implementation agent should wire these together carefully, following the startup sequence in operations.md.

## Summary

> To be filled on completion