Decompose architecture into 23 atomic tasks across 7 parallel generations

Task graph covers all Phase 1 concerns: config system, TLS termination,
proxy handler, operations (rate limiting, logging, health check, admin
socket, signals, shutdown, body size limit), deployment artifacts, and
two review checkpoints.

No circular dependencies. Critical path length of 7. Risk distribution:
3 high-risk (ACME, TLS listener setup, startup orchestration), 7 medium,
11 low, 2 trivial.
2026-06-11 11:21:10 +00:00
parent ceb59ad9b9
commit 309878c561
23 changed files with 1676 additions and 0 deletions

tasks/tls/acme-tls.md

@@ -0,0 +1,72 @@
---
id: tls/acme-tls
name: Implement ACME certificate provisioning with rustls-acme for automatic Let's Encrypt certificates
status: pending
depends_on: [setup/project-init]
scope: moderate
risk: high
impact: component
level: implementation
---
## Description
Implement ACME mode TLS certificate provisioning using `rustls-acme`. Each listener in ACME mode creates its own `AcmeCertProvider` with the listener's domain list, cache directory, and Let's Encrypt directory.
### ACME Mode
For each listener in ACME mode:
1. Create `AcmeConfig::new(domains)` with the domains from `acme_domains`
2. Configure the ACME state machine as a background tokio task per listener
3. `ResolvesServerCertAcme` serves the ACME-provisioned certificate
4. Certificate renewal is automatic (~30 days before expiry)
5. Cache directory persists ACME state between restarts via `DirCache`
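The per-listener inputs listed above (domains, cache directory, ACME directory) could be gathered into one value before they are handed to `rustls-acme`. The following is a minimal std-only sketch, not the project's actual type: `AcmeListenerSettings`, `acme_settings`, `cache_root`, and `listener_name` are hypothetical names, and deriving the cache directory from a listener name is an assumption. The two URLs are Let's Encrypt's public ACME v2 directory endpoints.

```rust
use std::path::{Path, PathBuf};

/// Let's Encrypt ACME v2 directory endpoints.
pub const LETS_ENCRYPT_PRODUCTION: &str = "https://acme-v02.api.letsencrypt.org/directory";
pub const LETS_ENCRYPT_STAGING: &str = "https://acme-staging-v02.api.letsencrypt.org/directory";

/// Hypothetical bundle of the values one ACME-mode listener needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AcmeListenerSettings {
    pub domains: Vec<String>,
    pub cache_dir: PathBuf,
    pub directory_url: &'static str,
}

/// Builds the settings for one listener. Each listener gets its own
/// subdirectory under `cache_root` so cached ACME state never collides
/// (assumption: listeners have unique names).
pub fn acme_settings(
    cache_root: &Path,
    listener_name: &str,
    acme_domains: &[String],
    staging: bool,
) -> AcmeListenerSettings {
    AcmeListenerSettings {
        domains: acme_domains.to_vec(),
        cache_dir: cache_root.join(listener_name),
        directory_url: if staging {
            LETS_ENCRYPT_STAGING
        } else {
            LETS_ENCRYPT_PRODUCTION
        },
    }
}
```

Keeping this construction free of I/O makes it straightforward to cover with the mocked unit tests called for in the acceptance criteria.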
### Certificate Failure Behavior
| Scenario | Behavior |
|----------|----------|
| First start, no cached cert, ACME unreachable | **Fail to start** with clear error |
| First start, no cached cert, ACME succeeds | Normal startup |
| Start with cached cert, ACME unreachable for renewal | **Start normally** with cached cert, log `warn` |
| Renewal failure after startup | **Continue serving existing cert**, log `warn` |
| Cached cert expired, renewal fails at startup | **Fail to start** |
| Cached cert expires during runtime | **Continue serving expired cert**, log `error` |
Key principle: **never start without a valid TLS certificate, but once running, keep serving the certificate already loaded, even if renewal fails or that certificate expires**.
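The failure table can be read as a pure decision function, which is convenient for unit testing. This is a sketch under assumptions: `Phase`, `CertState`, `Action`, and `decide` are hypothetical names, and the cases the table does not list (a successful ACME run with an expired cached cert, and a running listener with no cert at all) are filled in with a guess that is marked in the comments.

```rust
/// When the certificate check happens.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Startup,
    Runtime,
}

/// State of the certificate currently available (cached or in memory).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CertState {
    Missing,
    Valid,
    Expired,
}

/// What the listener does, including the log level to emit (if any).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Serve,
    ServeAndWarn,
    ServeAndError,
    FailToStart,
}

pub fn decide(phase: Phase, cert: CertState, acme_ok: bool) -> Action {
    match (phase, cert, acme_ok) {
        // Assumption: a successful ACME order or renewal always yields a
        // usable certificate, whatever was cached before.
        (_, _, true) => Action::Serve,
        // Startup without a usable certificate: refuse to start.
        (Phase::Startup, CertState::Missing, false) => Action::FailToStart,
        (Phase::Startup, CertState::Expired, false) => Action::FailToStart,
        // A valid certificate keeps the listener serving; a failed
        // renewal is only a warning.
        (_, CertState::Valid, false) => Action::ServeAndWarn,
        // Already running: keep serving the expired certificate.
        (Phase::Runtime, CertState::Expired, false) => Action::ServeAndError,
        // Not in the table; unreachable if the startup rules hold.
        // Treated here as an error-level condition (assumption).
        (Phase::Runtime, CertState::Missing, false) => Action::ServeAndError,
    }
}
```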
### ACME Challenge Type
Default is TLS-ALPN-01 since the proxy already listens on port 443. HTTP-01 is available as a fallback via the port 80 redirect listener serving `/.well-known/acme-challenge/{token}`.
### ServerConfig for ACME Mode
Build `ServerConfig` with `with_cert_resolver()`, passing the `ResolvesServerCertAcme` resolver. Register `acme-tls/1` in `alpn_protocols` for TLS-ALPN-01 challenge handling.
## Acceptance Criteria
- [ ] ACME state machine runs as background tokio task per listener
- [ ] `AcmeConfig` created per listener with correct domains, cache dir, and directory
- [ ] `ResolvesServerCertAcme` integrated into `ServerConfig`
- [ ] `acme-tls/1` ALPN protocol registered for TLS-ALPN-01 challenges
- [ ] Cipher suite and protocol version restrictions applied (same as manual mode)
- [ ] Certificate failure behavior matches the table above
- [ ] Cache directory (`DirCache`) persists ACME state between restarts
- [ ] Each listener uses its own cache directory to avoid conflicts
- [ ] ACME renewal is automatic, no manual intervention
- [ ] `staging` vs `production` ACME directory selection works
- [ ] Unit tests for ACME config construction (mocked, not real Let's Encrypt calls)
## References
- docs/architecture/tls.md — ACME mode, certificate failure behavior, challenge types
- docs/architecture/decisions/004-rustls-acme.md — ACME-primary rationale
- docs/architecture/decisions/005-tokio-rustls-direct.md — direct tokio-rustls for ACME integration
## Notes
> Real ACME integration tests require a network connection to Let's Encrypt staging. For CI, consider mock tests that verify the config and state machine setup without making real ACME requests. Manual testing against staging should be done before deployment.
## Summary
> To be filled on completion


@@ -0,0 +1,60 @@
---
id: tls/http-redirect
name: Implement HTTP to HTTPS redirect listener with Host-based URL construction
status: pending
depends_on: [config/static-config, config/dynamic-config]
scope: narrow
risk: low
impact: component
level: implementation
---
## Description
Implement the HTTP → HTTPS redirect listener. Each listener that has `http_port > 0` runs a plain HTTP listener that redirects all requests to the HTTPS equivalent URL.
### Redirect Behavior
1. Read the `Host` header from the incoming request
2. If no `Host` header, return `400 Bad Request`
3. Construct redirect URL: `https://{host}:{https_port}/{path}?{query}`
   - `{host}` is the hostname portion of the `Host` header (port stripped)
   - `{https_port}` is the listener's `https_port`, omitted if 443
   - `{path}` and `{query}` are preserved from the original request
4. Return `301 Moved Permanently` with a `Location` header
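The URL construction in the steps above can be sketched as a pure function. This is an illustration only: `redirect_location` and `strip_port` are hypothetical names, the function takes the already-joined path and query as one string, and keeping bracketed IPv6 literals intact is an assumption the task text does not spell out. A `None` result stands for the `400 Bad Request` case.

```rust
/// Builds the `Location` value for the HTTP to HTTPS redirect.
/// Returns `None` when the `Host` header is missing or empty, in which
/// case the caller would answer `400 Bad Request`.
pub fn redirect_location(
    host_header: Option<&str>,
    https_port: u16,
    path_and_query: &str,
) -> Option<String> {
    let host = strip_port(host_header?.trim());
    if host.is_empty() {
        return None;
    }
    // Origin-form request targets start with '/'; add it if absent.
    let target = if path_and_query.starts_with('/') {
        path_and_query.to_string()
    } else {
        format!("/{path_and_query}")
    };
    Some(if https_port == 443 {
        // The standard HTTPS port is omitted from the URL.
        format!("https://{host}{target}")
    } else {
        format!("https://{host}:{https_port}{target}")
    })
}

/// Removes a trailing `:port` from a `Host` value, leaving bracketed
/// IPv6 literals such as `[::1]` intact.
fn strip_port(host: &str) -> &str {
    if host.starts_with('[') {
        // Keep everything up to and including the closing bracket.
        return match host.find(']') {
            Some(end) => &host[..=end],
            None => host,
        };
    }
    match host.rfind(':') {
        Some(idx) => &host[..idx],
        None => host,
    }
}
```

For example, a request for `/a?b=1` with `Host: example.com:80` on a listener whose `https_port` is 443 would be redirected to `https://example.com/a?b=1`.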
### Per-Listener
Each listener has its own HTTP redirect on its own bind address and `http_port`. Multiple listeners on different IPs can each have their own redirect.
### ACME HTTP-01 Challenge Support
When a listener is in ACME mode and uses HTTP-01 challenges, the redirect listener must also serve `/.well-known/acme-challenge/{token}` paths. This is a fallback for environments where TLS-ALPN-01 is not suitable.
Note: TLS-ALPN-01 is the default and primary challenge type. HTTP-01 support should be implemented but is not the primary path.
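Before redirecting, the listener has to recognise challenge requests. A minimal sketch of that check follows, assuming a hypothetical helper named `acme_challenge_token`; how the token is then answered depends on the ACME integration and is not shown. The character check reflects the ACME rule that tokens use only the base64url alphabet.

```rust
const ACME_CHALLENGE_PREFIX: &str = "/.well-known/acme-challenge/";

/// Returns the challenge token if `path` is an HTTP-01 challenge path,
/// or `None` if the request should fall through to the normal redirect.
pub fn acme_challenge_token(path: &str) -> Option<&str> {
    let token = path.strip_prefix(ACME_CHALLENGE_PREFIX)?;
    // Tokens consist only of base64url characters; rejecting anything
    // else also rules out path traversal such as "../".
    let valid = !token.is_empty()
        && token
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    if valid {
        Some(token)
    } else {
        None
    }
}
```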
## Acceptance Criteria
- [ ] HTTP listener binds to `bind_addr:http_port` for each enabled listener
- [ ] Redirect to `https://{host}:{https_port}/{path}?{query}` with 301 status
- [ ] Port 443 is omitted from redirect URL (standard HTTPS port)
- [ ] Non-443 HTTPS ports are included in redirect URL
- [ ] Missing `Host` header returns `400 Bad Request`
- [ ] Per-listener redirect: each listener has its own HTTP redirect
- [ ] `http_port = 0` disables HTTP redirect for that listener
- [ ] ACME HTTP-01 challenge path handling (placeholder for future integration)
- [ ] Unit tests for redirect URL construction
- [ ] Integration test: HTTP request redirects to correct HTTPS URL
## References
- docs/architecture/proxy.md — HTTP → HTTPS redirect section
- docs/architecture/tls.md — ACME challenge types, HTTP listener
## Notes
> To be filled by implementation agent
## Summary
> To be filled on completion

tasks/tls/manual-tls.md

@@ -0,0 +1,78 @@
---
id: tls/manual-tls
name: Implement manual TLS certificate loading and ServerConfig construction
status: pending
depends_on: [setup/project-init]
scope: narrow
risk: low
impact: component
level: implementation
---
## Description
Implement the manual TLS mode where certificates are loaded from PEM files on disk at startup. This covers building a `rustls::ServerConfig` with manually loaded certificate chains and private keys.
### Manual Mode
For each listener in manual mode:
1. Load `cert_path` PEM file using `rustls_pemfile` → `Vec<CertificateDer>`
2. Load `key_path` PEM file using `rustls_pemfile` → `PrivateKeyDer`
3. Build `ServerConfig` with `with_no_client_auth()` and the loaded cert/key
4. Configure cipher suites (restricted set per ADR-012)
5. Configure protocol versions (TLS 1.2 and 1.3 only)
### Cipher Suite Configuration
Per ADR-012, restrict to nginx-equivalent cipher suites:
**TLS 1.2 (explicitly selected):**
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
**TLS 1.3 (all default suites):**
- `TLS_AES_128_GCM_SHA256`
- `TLS_AES_256_GCM_SHA384`
- `TLS_CHACHA20_POLY1305_SHA256`
This is configured via a custom `CryptoProvider` whose `cipher_suites` list is restricted to the suites above; the provider is then passed to `ServerConfig::builder_with_provider()`.
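The allow-list itself can be expressed and tested independently of `rustls`. The sketch below works on suite names only and is an assumption about how the filtering might be organised: `ALLOWED_TLS12`, `ALLOWED_TLS13`, `is_allowed`, and `restrict` are hypothetical names, and the real code would filter the provider's suite values rather than strings.

```rust
/// TLS 1.2 suites explicitly selected per ADR-012.
pub const ALLOWED_TLS12: [&str; 4] = [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
];

/// TLS 1.3 suites (all defaults are kept).
pub const ALLOWED_TLS13: [&str; 3] = [
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
];

/// True if the named suite is on the allow-list.
pub fn is_allowed(name: &str) -> bool {
    ALLOWED_TLS12.contains(&name) || ALLOWED_TLS13.contains(&name)
}

/// Keeps only allowed suites, preserving the provider's preference order.
pub fn restrict<'a>(provider_suites: &[&'a str]) -> Vec<&'a str> {
    provider_suites
        .iter()
        .copied()
        .filter(|s| is_allowed(s))
        .collect()
}
```

Filtering the provider's own list, rather than building a new one, keeps the provider's preference order and drops anything not named in ADR-012, such as the TLS 1.2 ChaCha20 suites.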
### Single-Domain Manual Mode
For a listener with one domain, build a simple `ServerConfig` with the single certificate chain and private key. No SNI resolver needed.
### Multi-Domain Manual Mode (on shared-IP listener)
For a listener with multiple sites on a shared IP, implement a custom `ResolvesServerCert` that maps SNI hostnames to `CertifiedKey` entries loaded from disk. If no certificate matches the SNI hostname, the handshake fails — we don't serve a default certificate for unknown domains.
Note: multi-domain manual mode with different certs per domain is a rare edge case. The initial implementation should handle the common case (single cert per manual listener). The SNI resolver can be a follow-up if needed.
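If the SNI resolver is built later, its lookup logic might resemble the following std-only sketch. `SniCertMap` and `CertEntry` are hypothetical stand-ins (a real resolver would implement `ResolvesServerCert` and hold `CertifiedKey` values), and the case-insensitive match with a trailing dot stripped is an assumption. A `None` result corresponds to failing the handshake.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a loaded certificate chain and key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CertEntry {
    pub cert_path: String,
    pub key_path: String,
}

/// Maps SNI hostnames to certificates; there is no default entry.
#[derive(Default)]
pub struct SniCertMap {
    by_host: HashMap<String, CertEntry>,
}

impl SniCertMap {
    pub fn insert(&mut self, hostname: &str, entry: CertEntry) {
        // Hostnames are compared case-insensitively.
        self.by_host.insert(hostname.to_ascii_lowercase(), entry);
    }

    /// Returns the certificate for the SNI name, or `None` if the client
    /// sent no SNI or an unknown name (the handshake then fails).
    pub fn resolve(&self, sni: Option<&str>) -> Option<&CertEntry> {
        let name = sni?.trim_end_matches('.').to_ascii_lowercase();
        self.by_host.get(&name)
    }
}
```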
## Acceptance Criteria
- [ ] `rustls::ServerConfig` construction for manual TLS mode
- [ ] PEM file loading via `rustls_pemfile` for certificates and private keys
- [ ] Cipher suite restriction per ADR-012 (4 TLS 1.2 suites + all TLS 1.3)
- [ ] Protocol version restriction to TLS 1.2 and 1.3
- [ ] `aws_lc_rs` crypto provider used
- [ ] `with_no_client_auth()` for no client certificate requirement
- [ ] Custom `ResolvesServerCert` for SNI-based cert selection in multi-domain manual mode
- [ ] Unknown SNI hostname → handshake fails (no default cert)
- [ ] Unit tests for ServerConfig construction with test certs (using `rcgen`)
- [ ] Unit tests for cipher suite and protocol version configuration
## References
- docs/architecture/tls.md — manual mode, cipher suites, SNI
- docs/architecture/decisions/004-rustls-acme.md — manual mode is fallback
- docs/architecture/decisions/005-tokio-rustls-direct.md — direct tokio-rustls usage
- docs/architecture/decisions/012-cipher-suite-restriction.md — cipher suite selection
## Notes
> This task focuses on ServerConfig construction. The actual TCP listener + TLS acceptor wiring is in tls/tls-listener-setup.
## Summary
> To be filled on completion


@@ -0,0 +1,77 @@
---
id: tls/tls-listener-setup
name: Implement multi-listener TLS setup with ConnectInfo propagation and per-listener routers
status: pending
depends_on: [tls/manual-tls, tls/acme-tls, config/static-config, config/dynamic-config]
scope: broad
risk: high
impact: phase
level: implementation
---
## Description
Wire up the TLS listeners — this is the core integration task that brings together manual TLS, ACME TLS, and the config system to create running TLS listeners.
For each `ListenerConfig`:
1. Bind TCP listener on `bind_addr:https_port`
2. Construct the appropriate `ServerConfig` (manual or ACME)
3. Create `tokio_rustls::TlsAcceptor` from the `ServerConfig`
4. Accept connections, extract `peer_addr()` before wrapping in TLS
5. Create a per-listener `axum::Router` with its middleware stack
6. Provide `ConnectInfo<SocketAddr>` to the router via `into_make_service_with_connect_info::<SocketAddr>()`
### ConnectInfo Propagation
`ConnectInfo<SocketAddr>` is critical for the proxy handler — it provides the real client IP for `X-Real-IP` and `X-Forwarded-For` headers. The peer address must be extracted from the `TcpStream` before wrapping in `TlsStream`.
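To illustrate why the peer address matters downstream, the sketch below derives the two header values from a `SocketAddr`. `forwarding_headers` is a hypothetical helper, and appending to an existing `X-Forwarded-For` value instead of replacing it is an assumption; the proxy handler task defines the actual policy.

```rust
use std::net::SocketAddr;

/// Returns `(x_real_ip, x_forwarded_for)` for a request whose TCP peer
/// is `peer`. An existing `X-Forwarded-For` value, if any, is extended
/// with the peer IP (assumed policy).
pub fn forwarding_headers(peer: SocketAddr, existing_xff: Option<&str>) -> (String, String) {
    // Only the IP is forwarded; the ephemeral source port is dropped.
    let ip = peer.ip().to_string();
    let xff = match existing_xff.map(str::trim).filter(|v| !v.is_empty()) {
        Some(prev) => format!("{prev}, {ip}"),
        None => ip.clone(),
    };
    (ip, xff)
}
```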
### Per-Listener Routers
Each listener has its own `axum::Router` instance with its own middleware stack. All routers share `Arc<ArcSwap<DynamicConfig>>` and `Arc<Mutex<HashMap<IpAddr, TokenBucket>>>` via axum State.
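The sharing pattern for the rate limiter can be shown with the standard library alone. This is a simplified sketch: `ListenerState`, `try_acquire`, and the refill-free `TokenBucket` are hypothetical, and the `ArcSwap<DynamicConfig>` half of the state is left out because it needs an external crate.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Simplified bucket: a token count with no refill logic.
#[derive(Debug)]
pub struct TokenBucket {
    pub tokens: u32,
}

pub type RateLimiter = Arc<Mutex<HashMap<IpAddr, TokenBucket>>>;

/// Per-listener state; cloning it clones the `Arc`, not the map.
#[derive(Clone)]
pub struct ListenerState {
    pub name: &'static str,
    pub limiter: RateLimiter,
}

impl ListenerState {
    /// Takes one token for `ip`, creating a bucket of `burst` tokens on
    /// first sight. Returns false once the bucket is empty.
    pub fn try_acquire(&self, ip: IpAddr, burst: u32) -> bool {
        let mut map = self.limiter.lock().unwrap();
        let bucket = map.entry(ip).or_insert(TokenBucket { tokens: burst });
        if bucket.tokens > 0 {
            bucket.tokens -= 1;
            true
        } else {
            false
        }
    }
}
```

Because every listener holds a clone of the same `Arc`, a client that exhausts its bucket through one listener is also limited on the others.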
### Startup Sequence
The TLS listener setup follows the startup sequence from operations.md:
1. Parse and validate config
2. Initialize DynamicConfig in ArcSwap
3. Initialize shared state (rate limiter, hyper client, logging)
4. Bind health check port
5. Bind admin socket
6. Bind all listener ports (TCP bind)
7. Load TLS configuration (manual certs or ACME init)
8. Start TCP listeners
9. Start background tasks (ACME renewal, rate limiter eviction, signal handler, admin socket)
Fail-fast if any bind or TLS load fails.
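The ordering and fail-fast rule can be captured in a small driver that is easy to unit test. In this sketch `StartupStep`, `STARTUP_ORDER`, and `run_startup` are hypothetical names; the caller supplies a closure that performs each step, so no real sockets or certificates are involved.

```rust
/// The nine startup steps, in the order given above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupStep {
    ParseConfig,
    InitDynamicConfig,
    InitSharedState,
    BindHealthPort,
    BindAdminSocket,
    BindListenerPorts,
    LoadTlsConfig,
    StartListeners,
    StartBackgroundTasks,
}

pub const STARTUP_ORDER: [StartupStep; 9] = [
    StartupStep::ParseConfig,
    StartupStep::InitDynamicConfig,
    StartupStep::InitSharedState,
    StartupStep::BindHealthPort,
    StartupStep::BindAdminSocket,
    StartupStep::BindListenerPorts,
    StartupStep::LoadTlsConfig,
    StartupStep::StartListeners,
    StartupStep::StartBackgroundTasks,
];

/// Runs every step in order and stops at the first failure, returning
/// the failing step and its error so the caller can exit non-zero.
pub fn run_startup<F>(mut run_step: F) -> Result<(), (StartupStep, String)>
where
    F: FnMut(StartupStep) -> Result<(), String>,
{
    for step in STARTUP_ORDER.iter().copied() {
        run_step(step).map_err(|e| (step, e))?;
    }
    Ok(())
}
```

If the closure fails at `LoadTlsConfig`, for instance, `StartListeners` is never reached, so no connections are accepted with a partially initialised proxy.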
### Health Endpoint on HTTPS
When the local health check port is enabled, `/health` is also available on the HTTPS listener(s) as a fallback for TLS-level health verification.
## Acceptance Criteria
- [ ] Multi-listener setup: each `ListenerConfig` creates its own TCP listener + TLS acceptor
- [ ] `ConnectInfo<SocketAddr>` populated from `TcpStream::peer_addr()` before TLS wrapping
- [ ] Per-listener `axum::Router` instances sharing `Arc<ArcSwap<DynamicConfig>>` state
- [ ] Both manual and ACME TLS modes work for different listeners
- [ ] Fail-fast behavior: if any bind or TLS load fails, exit with non-zero code
- [ ] All ports bound before any connections accepted
- [ ] `/health` endpoint available on HTTPS listener(s)
- [ ] `sd_notify("READY=1")` sent after all listeners started (systemd integration)
- [ ] Integration test: start proxy with test config, verify HTTPS listener accepts connections
- [ ] Integration test: multi-listener config with both manual and ACME listeners
## References
- docs/architecture/tls.md — multi-listener architecture, ConnectInfo
- docs/architecture/proxy.md — Host-based routing, ConnectInfo propagation
- docs/architecture/operations.md — startup sequence, health check
- docs/architecture/config.md — ListenerConfig, StaticConfig
## Notes
> This task is the critical integration point. It depends on manual TLS, ACME TLS, static config, and dynamic config all being complete. The implementation agent should wire these together carefully, following the startup sequence in operations.md.
## Summary
> To be filled on completion