Resolve all architecture review findings (7 critical, 14 warnings, 6 suggestions)

Critical findings resolved:
- C1: Site routing is global (per-listener TOML, global runtime lookup)
- C2: X-Forwarded-For replaces (not appends) — edge proxy model (ADR-021)
- C3: Hop-by-hop header handling rules specified (proxy.md)
- C4: ACME failure behavior defined (tls.md)
- C5: Startup sequence with fail-fast semantics (operations.md)
- C6: Per-listener Router instances with shared global state (overview.md)
- C7: Rate limiter adopts new params on next request, no state clear (operations.md)

Warnings resolved:
- W1: Admin socket wire protocol specified
- W2: Host header port stripped, hostnames only in config
- W3: HTTP redirect URL construction with port handling
- W4: /health on HTTPS matches regardless of Host header
- W5: Static config changes logged as warning during reload
- W6: Reload operations serialized via Mutex
- W7: http_port validation rules added (9 new rules total)
- W8: upstream format validation (host:port required, no scheme)
- W9: TLS error handling table (SNI, version, cipher failures)
- W10: IPv6 rate limited per /64 prefix
- W11: Graceful shutdown sequence specified (6 steps)
- W12: Error response bodies: minimal plain text, no version disclosure
- W13: upstream_scheme HTTPS uses system CA store
- W14: allow_wildcard_bind is OR between config and CLI
- W15: ADR-010 Phase 2 list updated (timeouts moved to Phase 1)
- W17: LoggingConfig static/restart note added

Suggestions applied:
- S2: ConnectInfo propagation note
- S3: Case-insensitive host matching (RFC 7230)
- S5: Response streaming behavior (chunk-by-chunk)
- S6: Token bucket nodelay semantics
- S7: File watching explicitly out of scope
- S8: All paths forwarded without filtering
- S9: shutdown_timeout_secs referenced in shutdown description
- S11: Consolidated defaults table in config.md
This commit is contained in:
2026-06-11 10:56:40 +00:00
parent bcc58bc7ce
commit ceb59ad9b9
8 changed files with 467 additions and 61 deletions

View File

@@ -52,6 +52,7 @@ certificate via ACME.
| [018](decisions/018-body-size-limit.md) | Request Body Size Limit | Accepted |
| [019](decisions/019-multi-config-listeners.md) | Multi-Config Listener Support | Accepted |
| [020](decisions/020-container-deployment.md) | Container Deployment Model | Accepted |
| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For Edge Proxy Model | Accepted |
## Open Questions

View File

@@ -100,6 +100,11 @@ Immutable after startup. Changes require a process restart.
| `format` | `"text"` or `"json"` | Log output format |
| `log_file_path` | `String` | Path to log file. When set, structured logs are written to this file in addition to stdout/stderr. Strongly recommended for fail2ban integration in container deployments (see ADR-020). Default: not set (file logging disabled) |
**Note**: The entire `LoggingConfig` (including `log_file_path`) is static and
requires a process restart to change. Log file path changes require reopening
file handles, which is complex and low-value for Phase 1. Log rotation (Phase 2)
will be handled via signal-based or built-in rotation.
**ListenerConfig** (per-listener static config):
| Field | Type | Description |
@@ -141,10 +146,13 @@ connections immediately.
| `upstream_connect_timeout_secs` | `u64` | TCP connect timeout in seconds (default: `5`; see ADR-015, ADR-017) |
| `upstream_request_timeout_secs` | `u64` | Full request timeout in seconds (default: `60`; see ADR-015, ADR-017) |
Sites are defined per listener in the `[[listeners]]` entries. Each listener
routes its own sites independently. The `DynamicConfig` collects all sites
across all listeners for hot-reload via `ArcSwap`. When a config reload
occurs, all listener site mappings are updated atomically.
Sites are defined per listener in the `[[listeners]]` entries for organizational
purposes, but at runtime they are collected into a single global routing table
in `DynamicConfig`. The proxy looks up the `Host` header in this global table to
route requests. Hostnames must be unique across all listeners — a `Host` header
can only match one site definition, regardless of which listener received the
request. See ADR-019 for the rationale behind the `[[listeners]]` configuration
format.
**Why these are dynamic:** See ADR-008 for the rationale. Site definitions
and rate limits are per-request concerns that should not require restarting
@@ -152,6 +160,29 @@ the proxy or dropping active connections. Rate limits and body limits are
global settings in Phase 1; per-site configuration for these is deferred to
Phase 2.
### Default Values
| Field | Type | Default | Required |
|-------|------|---------|----------|
| `allow_wildcard_bind` | `bool` | `false` | No |
| `health_check_port` | `u16` | `9900` | No |
| `admin_socket_path` | `String` | `/run/reverse-proxy/admin.sock` | No |
| `shutdown_timeout_secs` | `u64` | `30` | No |
| `logging.level` | `String` | `"info"` | No |
| `logging.format` | `String` | `"text"` | No |
| `logging.log_file_path` | `String` | (not set) | No |
| `listeners[].http_port` | `u16` | `80` | No |
| `listeners[].https_port` | `u16` | `443` | No |
| `listeners[].tls.acme_directory` | `String` | `"production"` | No |
| `sites[].upstream_scheme` | `String` | `"http"` | No |
| `sites[].upstream_connect_timeout_secs` | `u64` | `5` | No |
| `sites[].upstream_request_timeout_secs` | `u64` | `60` | No |
| `rate_limit.requests_per_second` | `u32` | — | Yes |
| `rate_limit.burst` | `u32` | — | Yes |
| `body.limit_bytes` | `u64` | — | Yes |
Fields without defaults are required and must be specified in the config file.
## Config Reload
### ArcSwap Pattern
@@ -187,6 +218,41 @@ Both mechanisms converge on the same code path:
3. Validate (check upstream reachability is optional)
4. Call `ConfigReloadHandle::reload(new_config)`
### Static Config Changes During Reload
When the config file is reloaded (via SIGHUP or admin socket), the entire file
is read and validated — both static and dynamic portions. This provides early
error detection for misconfigurations that would prevent a restart from
succeeding.
If the full config fails validation, the reload is rejected and the old
DynamicConfig remains active.
If the full config passes validation but static fields have changed, the
DynamicConfig is swapped normally and a warning is logged listing the changed
static fields and noting that a restart is required for those changes to take
effect. This gives operators early feedback about config drift.
Only the DynamicConfig portion is swapped via ArcSwap. StaticConfig changes
require a process restart to take effect.
### Reload Serialization
Reload operations are serialized using a `tokio::sync::Mutex` on the reload
code path. If a reload is in progress (triggered by SIGHUP or admin socket) and
a second reload is requested, the second request waits for the first to
complete, then re-reads the config file (getting the latest version) and
proceeds. This prevents race conditions where two concurrent reloads could apply
an older config over a newer one.
### Out of Scope: File Watching
Automatic file watching (inotify, fsnotify, etc.) is out of scope for Phase 1.
Config reload is triggered explicitly by SIGHUP or admin socket command. File
watching adds complexity (debouncing, handling atomic renames, handling editor
swap files) that is not justified for a single-instance proxy with infrequent
config changes.
## TOML Config Format
### Multi-Config (Dedicated-IP Per Domain)
@@ -296,7 +362,7 @@ upstream = "127.0.0.1:8080"
On startup, the config is validated:
1. At least one `[[listeners]]` entry must exist
2. Each listener's `bind_addr` is not `0.0.0.0` unless `allow_wildcard_bind = true` (in config) or `--allow-wildcard-bind` (CLI flag) is set (see ADR-016, ADR-020)
2. Each listener's `bind_addr` is not `0.0.0.0` unless `allow_wildcard_bind` is enabled. This can be enabled via config (`allow_wildcard_bind = true`) or CLI flag (`--allow-wildcard-bind`). Either source enables it — it is an OR relationship, not AND. The CLI flag does not override the config value; if either is set, wildcard binding is allowed.
3. Each listener's `bind_addr` and `https_port` combination must be unique
4. In ACME mode, `acme_domains` must be non-empty
5. In manual mode, `cert_path` and `key_path` must both be set and the files
@@ -308,6 +374,23 @@ On startup, the config is validated:
route a request to when the `Host` header matches multiple sites.
8. `rate_limit.requests_per_second` must be > 0
9. `body.limit_bytes` must be > 0
10. Each listener's `bind_addr` and `http_port` combination must be unique
(prevents bind-time errors, same as rule 3 for `https_port`)
11. Within a listener, `http_port` and `https_port` must differ
12. `https_port` must be 165535 (required — TLS needs a port)
13. `http_port` must be 0 (disabled) or 165535
14. `health_check_port` must not conflict with any listener's `http_port` or
`https_port` on the same bind address
15. Site `host` values must not include a port number (e.g., `git.alk.dev`,
not `git.alk.dev:443`)
16. Site `host` values must be valid hostnames (not IP addresses, not
including ports). Hostnames are normalized to lowercase during validation.
17. `upstream` must be in `host:port` format where `port` is a required integer
165535. Examples: `gitea:3000`, `127.0.0.1:3000`, `[::1]:3000`. Invalid
examples: `gitea` (missing port), `http://gitea:3000` (includes scheme),
`10.0.0.5` (missing port). The `upstream_scheme` field handles the protocol.
18. `upstream_scheme` values are case-sensitive: only `"http"` or `"https"`
(lowercase). Default is `"http"`.
On SIGHUP reload, the same validation applies. If the new config fails
validation, the reload is rejected and the old config remains active. An error

View File

@@ -48,10 +48,12 @@ Phase 1 scope becomes:
Phase 2 scope shifts to operational hardening:
1. Per-site rate limits and body limits
2. Per-site upstream timeouts
3. Metrics endpoint (Prometheus-compatible)
4. Connection limits and timeouts
5. Log rotation
2. Metrics endpoint (Prometheus-compatible)
3. Connection limits and timeouts
4. Log rotation
Note: "Per-site upstream timeouts" was originally listed here but was moved to
Phase 1 via ADR-015.
Phase 3 remains future enhancements.

View File

@@ -0,0 +1,74 @@
# ADR-021: X-Forwarded-For Edge Proxy Model
## Status
Accepted
## Context
The reverse proxy terminates TLS and injects standard proxy headers (Host,
X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) before forwarding requests to
upstream services. The question is how to handle the `X-Forwarded-For` header
when the request already contains one.
Two approaches exist:
1. **Append**: Add the client IP to any existing `X-Forwarded-For` value.
This preserves the proxy chain history but trusts that existing values are
legitimate. In a chained proxy scenario (e.g., CDN → reverse proxy →
backend), appending is correct because the CDN's `X-Forwarded-For` value
is trustworthy.
2. **Replace**: Set `X-Forwarded-For` to the client's IP address from
`ConnectInfo<SocketAddr>`, discarding any existing value. This is correct
when the proxy is the edge proxy directly facing the internet, because
client-provided `X-Forwarded-For` headers are untrusted — any client can
inject arbitrary values.
This proxy is deployed as the **edge proxy** — it sits directly in front of the
internet with no trusted proxies upstream. All client connections come directly
to this proxy.
## Decision
Set `X-Forwarded-For` to the client's IP address from `ConnectInfo<SocketAddr>`,
**replacing** any existing value rather than appending.
The proxy is an edge proxy. There are no trusted proxies upstream, so existing
`X-Forwarded-For` headers from clients cannot be trusted. Replacing prevents
header spoofing attacks where a malicious client injects fake IP addresses to
confuse upstream services or bypass IP-based access controls.
## Rationale
- The proxy is the edge proxy — it directly faces the internet. No CDN or
other trusted proxy sits in front of it.
- Client-provided `X-Forwarded-For` headers are untrusted. Any client can send
`X-Forwarded-For: 1.2.3.4` to spoof their IP.
- Appending to an untrusted header creates a security vulnerability: upstream
services (like Gitea) may use `X-Forwarded-For` for IP-based rate limiting or
access control. Spoofed values would bypass these protections.
- `X-Real-IP` is also set to the client's IP from `ConnectInfo`, providing a
trustworthy header for upstream services that need the real client IP.
- If this proxy is ever placed behind a CDN or other trusted proxy, the header
handling model should be revisited. A "trusted proxies" configuration can be
added in Phase 2 to support chained proxy scenarios.
## Consequences
**Positive:**
- Prevents `X-Forwarded-For` spoofing attacks
- Upstream services receive the real client IP, not a spoofed one
- Simple, predictable header handling — no trust model to configure
- Consistent with the proxy's role as the edge proxy
**Negative:**
- If the proxy is placed behind a CDN or other proxy in the future, the header
handling must be updated to support a "trusted proxies" model
- Does not preserve legitimate proxy chain history (not applicable in our
deployment — there are no proxies upstream)
## References
- [proxy.md](../proxy.md)
- Architecture Review C2 (X-Forwarded-For security model)

View File

@@ -36,6 +36,12 @@ Rate limits are global per-IP in Phase 1 (not per-site). A request from IP
address X counts against the same bucket regardless of which site it targets.
Per-site rate limits may be added in Phase 2.
The token bucket uses **nodelay** semantics matching nginx's `limit_req burst
nodelay`: when the bucket is empty, the request is immediately rejected with
429 — requests are not queued. Tokens are added at a rate of
`requests_per_second` (1 token every 1000ms / requests_per_second), and the
bucket capacity is the `burst` value.
When a request exceeds the rate limit, the middleware returns `429 Too Many
Requests` and logs the event with structured fields.
@@ -47,6 +53,37 @@ whose last access timestamp is older than a configurable eviction age
(default: 300 seconds / 5 minutes). This prevents unbounded memory growth
while preserving recent entries that may still receive traffic.
### Config Reload Behavior
When rate limit parameters change (e.g., from 10 req/s burst 20 to 20 req/s
burst 40), the behavior is:
1. New `DynamicConfig` is swapped in via ArcSwap.
2. On the next request from an existing IP, the rate limiter reads the current
`DynamicConfig` for rate/burst parameters.
3. The token bucket refills using the new rate, and its capacity is set to the
new burst maximum.
4. If the current token count exceeds the new burst maximum, it is capped to
the new burst maximum.
The HashMap is **not** cleared — this avoids creating a rate-limiting gap.
Existing buckets adopt new parameters on their next request. The eviction task
continues removing stale entries independently.
### IPv6 Rate Limiting
IPv6 addresses have a vastly larger address space than IPv4. Rate limiting per
individual IPv6 address (`/128`) is ineffective against attackers who can
generate many addresses within a `/64` prefix.
- **IPv4**: Rate limited per individual address (`/32`).
- **IPv6**: Rate limited per `/64` prefix. All addresses in the same `/64` share
the same token bucket. This matches RFC 4941 privacy extension boundaries and
common anti-abuse practice.
The rate limiter normalizes IPv6 addresses to their `/64` prefix before
bucket lookup.
### Fail2ban Integration
Rate limit events are logged in a structured format that a custom fail2ban
@@ -225,13 +262,46 @@ process does not exit on SIGHUP.
The admin Unix domain socket provides programmatic config reload with feedback.
This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
command protocol.
rationale.
### Timeout
**Protocol:**
In-flight requests have a configurable shutdown timeout (default: 30 seconds).
After the timeout, remaining connections are forcefully closed and the process
exits.
- **Connection lifecycle**: One command per connection. Client connects, sends
one newline-terminated command, receives one newline-terminated JSON
response, then the server closes the connection.
- **Message framing**: Newline-delimited (`\n`). Responses end with `\n`.
- **Commands**:
- `reload` — Re-read config file, validate, and swap DynamicConfig. Returns
`{"status": "ok"}` or `{"status": "error", "message": "..."}`.
- `status` — Return basic process info. Returns
`{"status": "ok", "uptime_secs": 1234, "sites": 2}`.
- **Error responses**: Unrecognized commands return
`{"status": "error", "message": "unknown command: <cmd>"}`. Invalid or empty
input returns `{"status": "error", "message": "invalid input"}`.
- **Concurrency**: Multiple clients can connect simultaneously, but reload
operations are serialized (see Config Reload section in config.md).
- **Socket cleanup**: The proxy removes any existing socket file at startup
before binding. If the file exists and another process is listening, a warning
is logged and the admin socket is disabled (but the proxy continues starting).
### Shutdown Sequence
On SIGTERM or SIGINT, the proxy performs a graceful shutdown:
1. **Stop accepting new connections** — Close all TCP listening sockets. No new
connections are accepted.
2. **Close idle keep-alive connections** — Send `Connection: close` on any idle
connections in the keep-alive pool.
3. **Wait for in-flight requests** — Up to `shutdown_timeout_secs` (default: 30)
for active requests to complete.
4. **Force-close remaining connections** — After the timeout, any remaining
connections are forcefully closed via TCP RST.
5. **Cancel background tasks** — ACME renewal tasks, rate limiter eviction task,
and admin socket listener are all cancelled.
6. **Exit with code 0**.
The `shutdown_timeout_secs` is configurable in StaticConfig (default: 30
seconds). See config.md for details.
## Deployment
@@ -443,6 +513,51 @@ continues to be routed directly to the Gitea container via Docker port
publishing (e.g., `203.0.113.10:22:2222`), matching the current deployment
pattern.
## Startup Sequence
The proxy starts components in a specific order to ensure fail-fast behavior
and correct dependency initialization:
1. **Parse and validate config** — Read the TOML config file, deserialize into
`StaticConfig` and `DynamicConfig`, and validate all rules. If validation
fails, exit with non-zero code and log errors. No ports are bound.
2. **Initialize DynamicConfig** — Load sites, rate limits, and body limits into
`ArcSwap<DynamicConfig>`.
3. **Initialize shared state** — Create the rate limiter
`HashMap<IpAddr, TokenBucket>`, the shared `hyper::Client`, and the
`tracing-subscriber` with file and stdout layers.
4. **Bind health check port** (if enabled) — Bind `127.0.0.1:{health_check_port}`.
Fail-fast if bind fails.
5. **Bind admin socket** (if enabled) — Remove any stale socket file first, then
bind the Unix domain socket. If the socket file exists and another process is
listening, log a warning and fail the admin socket (but continue starting —
the admin socket is non-critical).
6. **Bind all listener ports** — For each listener: bind HTTP port (if enabled)
and HTTPS port. If any bind fails, fail-fast and exit. All ports are bound
before proceeding.
7. **Load TLS configuration** — For each listener: load manual certificates or
initialize ACME state machine. If manual certificate loading fails, fail-fast
and exit. For ACME: if no cached certificate exists and ACME provisioning
fails, fail-fast and exit.
8. **Start TCP listeners** — Begin accepting connections on all bound ports.
9. **Start background tasks** — ACME renewal tasks (per listener in ACME mode),
rate limiter eviction task, signal handler task, admin socket handler task.
10. **Signal readiness** — Send `sd_notify("READY=1")` to systemd (if running
under systemd).
**Failure semantics**: **Fail-fast**. If any step fails, the process exits with
a non-zero code. The proxy does not partially start. All ports are bound before
any connections are accepted.
## Design Decisions
All design decisions are documented as ADRs in [decisions/](decisions/).

View File

@@ -92,8 +92,9 @@ details.
│ │
bind_addr:443 ────► │ │ TLS listener (tokio-rustls) │ │
(published) │ │ ├─ ACME or Manual TLS config │ │
│ │ └─ axum router │ │
│ │ ├─ Host-based routing │ │
│ │ └─ axum router (per-listener) │ │
│ │ ├─ /health → 200 OK (any) │ │
│ │ ├─ Host → global site lookup │ │
│ │ ├─ git.alk.dev → gitea:3000 │ │
│ │ └─ Rate limiting, headers │ │
│ └────────────────────────────────┘ │
@@ -104,12 +105,15 @@ details.
│ │
bind_addr_N:443 ───► │ │ TLS listener (tokio-rustls) │ │
│ │ ├─ Manual TLS cert │ │
│ │ └─ axum router │ │
│ │ └─ axum router (per-listener) │ │
│ │ ├─ /health → 200 OK (any) │ │
│ │ ├─ Host → global site lookup │ │
│ │ ├─ alk.dev → app:8080 │ │
│ │ └─ Rate limiting, headers │ │
│ └────────────────────────────────┘ │
│ │
│ /health → 200 OK (port 9900) │
│ Admin socket (Unix domain) │
└────────────────────────────────────┘
│ │
┌──────┘ └──────┐
@@ -121,6 +125,13 @@ details.
└─ admin socket (rw)
```
Each listener has its own `axum::Router` instance with its own middleware stack,
but all routers share `Arc<ArcSwap<DynamicConfig>>` and
`Arc<Mutex<HashMap<IpAddr, TokenBucket>>>` via axum State. Site routing is
global: the `Host` header is matched against a single routing table collected
from all listeners' site definitions. Hostnames must be unique across all
listeners — see C1 resolution in the architecture review.
In container deployments (ADR-020), the proxy runs in a minimal container with
`0.0.0.0` bind address and Docker port publishing. Upstream addresses use Docker
DNS names for same-host containers (e.g., `gitea:3000`) but also support
@@ -196,6 +207,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
| [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
| [019](decisions/019-multi-config-listeners.md) | Multi-config listeners | `[[listeners]]` supporting both dedicated-IP and shared-IP deployment models |
| [020](decisions/020-container-deployment.md) | Container deployment model | Defense-in-depth via container isolation; file-primary logging; flexible upstream addressing |
| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For edge proxy model | Replace, don't append — proxy is the edge, no trusted upstream proxies |
## Open Questions

View File

@@ -76,26 +76,43 @@ Incoming HTTPS request
### 1. Host-Based Routing
The axum router uses a `Host` extractor to match incoming requests to site
definitions from `DynamicConfig`. Each site definition maps a hostname to an
upstream address.
definitions from `DynamicConfig`. Sites are defined per-listener in the TOML
configuration for organizational purposes, but at runtime they are collected
into a single global routing table. The proxy looks up the `Host` header in
this global table and either proxies to the upstream or returns 404.
Where `host_based_proxy` reads the `Host` header, looks up the site in
`DynamicConfig.sites`, and either proxies to the upstream or returns 404.
Host matching is **case-insensitive** per RFC 7230 §2.7.3. The `Host` header
is normalized to lowercase before matching. Site `host` values in
configuration are normalized to lowercase during validation.
The `Host` header port component (e.g., `git.alk.dev:443`) is stripped before
matching. Site `host` values must not include ports.
The proxy does not filter or restrict paths. All paths and query strings on a
known host are forwarded to the upstream without modification.
The `/health` path is a special case: it matches regardless of the `Host`
header and is evaluated before host-based routing. A `GET /health` request on
any hostname returns `200 OK` with an empty body.
### 2. Proxy Header Injection
Headers are injected before forwarding. The handler reads connection metadata
from axum's `ConnectInfo` and the original request:
Headers are injected before forwarding. The proxy is an **edge proxy** — it
sits directly in front of the internet with no trusted proxies upstream. This
means the client IP from `ConnectInfo<SocketAddr>` is the real client IP, and
existing `X-Forwarded-For` headers from the client cannot be trusted.
| Header | Value Source | Notes |
|--------|-------------|-------|
| `Host` | Original request `Host` header | Already present; preserved as-is |
| `Host` | Original request `Host` header | Preserved as-is |
| `X-Real-IP` | `ConnectInfo<SocketAddr>` remote IP | Set to client's IP address |
| `X-Forwarded-For` | Client IP, appended if header exists | Comma-separated list of proxies |
| `X-Forwarded-For` | `ConnectInfo<SocketAddr>` remote IP | **Replaced**, not appended. The proxy is the edge proxy — there are no trusted proxies upstream, so existing `X-Forwarded-For` values from the client cannot be trusted. |
| `X-Forwarded-Proto` | Determined by which listener port received the request | `https` for requests on the listener's `https_port`, `http` for requests on the listener's `http_port` |
The `X-Forwarded-For` handling must append the client IP to any existing value
(rather than replacing it), to support chained proxies correctly.
**ConnectInfo propagation**: `ConnectInfo<SocketAddr>` is populated by
extracting `TcpStream::peer_addr()` before wrapping the connection in
`TlsStream`. Each listener provides this information to its axum Router via
`axum::ServiceExt::into_make_service_with_connect_info::<SocketAddr>()`.
### 3. Request Forwarding
@@ -106,7 +123,12 @@ The proxy handler constructs a new request to the upstream:
2. Copy the request method, headers, and body from the original
3. Inject proxy headers (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
4. Send the request via a shared hyper Client instance
5. Stream the response back to the client
5. Stream the response back to the client (chunk-by-chunk, not buffered)
If the client disconnects while the upstream is still sending, the upstream
connection is closed and the event is logged at `debug` level. If the
upstream disconnects mid-stream, the client receives whatever data was
already sent and the connection is closed.
The hyper Client is created once at startup and shared via axum's `State`. It
must be configured with (see ADR-017 for rationale):
@@ -118,22 +140,77 @@ Per-site timeout overrides are available via `upstream_connect_timeout_secs`
and `upstream_request_timeout_secs` in `SiteConfig` (see ADR-015). When not
specified, defaults of 5s connect and 60s request are used.
### 4. Error Handling
### 4. Header Handling
| Upstream Condition | Response | Notes |
|-------------------|----------|-------|
| Upstream reachable | Stream response as-is | Headers, status, body all forwarded |
| Upstream unreachable | 502 Bad Gateway | Logged at `warn` level |
| Upstream timeout | 504 Gateway Timeout | Logged at `warn` level |
| Request body too large | 413 Payload Too Large | From `DefaultBodyLimit` middleware |
| Rate limit exceeded | 429 Too Many Requests | Logged at `info` level |
| Unknown Host header | 404 Not Found | No matching site definition |
The proxy must handle request and response headers correctly to avoid security
issues and protocol violations.
### 5. HTTP → HTTPS Redirect
**Headers removed before forwarding (hop-by-hop headers per RFC 2616 §13.5.1):**
- `Connection`
- `Keep-Alive`
- `Proxy-Authorization`
- `Proxy-Authenticate`
- `TE`
- `Trailers`
- `Transfer-Encoding`
- `Upgrade`
These headers are connection-specific and must not be forwarded to the
upstream. Removing `Proxy-Authorization` and `Proxy-Authenticate` prevents
credential leakage.
**Headers added or modified:**
See the Proxy Header Injection section above for the full list of proxy headers
(X-Real-IP, X-Forwarded-For, X-Forwarded-Proto, Host).
**Headers NOT added in Phase 1:**
- `Via`: Not added. The proxy is an edge proxy and `Via` is primarily for
tracking proxy chains. Can be added in Phase 2 if needed.
**Response headers:**
Upstream response headers are forwarded as-is to the client, with the following
exceptions:
- Hop-by-hop headers listed above are removed
- The proxy does not add a `Server` header to responses
### 5. Error Handling
All error responses use plain text bodies with no proxy version or identity
information. No upstream error details are included. Response format:
- Content-Type: `text/plain; charset=utf-8`
- Body: Brief status text matching the HTTP status (e.g., `Bad Gateway` for 502)
| Upstream Condition | Response | Body | Notes |
|-------------------|----------|------|-------|
| Upstream reachable | Stream response as-is | (upstream body) | Headers, status, body all forwarded |
| Upstream unreachable | 502 Bad Gateway | `Bad Gateway` | Logged at `warn` level |
| Upstream timeout | 504 Gateway Timeout | `Gateway Timeout` | Logged at `warn` level |
| Request body too large | 413 Payload Too Large | `Payload Too Large` | From `DefaultBodyLimit` middleware |
| Rate limit exceeded | 429 Too Many Requests | `Too Many Requests` | Logged at `info` level |
| Unknown Host header | 404 Not Found | `Not Found` | No matching site definition |
| Missing Host header | 400 Bad Request | `Bad Request` | Required for routing |
### 6. HTTP → HTTPS Redirect
A separate HTTP listener on port 80 (per listener) handles redirect. It reads
the `Host` header from the incoming request and returns a 301 Permanent Redirect
to the HTTPS equivalent URL (preserving the path and query string).
to the HTTPS equivalent URL.
The redirect URL is constructed as:
`https://{host}:{https_port}/{path}?{query}`
Where:
- `{host}` is the hostname portion of the `Host` header (port stripped)
- `{https_port}` is the listener's `https_port`, omitted if it's 443
- `{path}` and `{query}` are preserved from the original request
If the incoming request has no `Host` header, the proxy returns `400 Bad
Request`.
Each listener has its own HTTP redirect on its own bind address.
@@ -148,6 +225,11 @@ For the initial deployment, upstream connections use plain HTTP (e.g.,
`git.alk.dev``127.0.0.1:3000`, `alk.dev``127.0.0.1:8080`) since TLS
between the proxy and backend services on loopback is unnecessary.
When `upstream_scheme` is `"https"`, the proxy validates the upstream's TLS
certificate using the system's native TLS root certificates (via `rustls` root
cert store). Certificate validation failures result in a 502 Bad Gateway
response. No certificate pinning or custom CA support is provided in Phase 1.
## Body Size Limit
axum's `DefaultBodyLimit` layer sets the maximum request body size. The default
@@ -168,6 +250,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
| [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
| [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
| [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
| [021](decisions/021-x-forwarded-for-edge-proxy.md) | X-Forwarded-For edge proxy model | Replace, don't append — proxy is the edge, no trusted upstream proxies |
## Open Questions

View File

@@ -243,6 +243,42 @@ suitable (e.g., behind a CDN that terminates TLS). When using HTTP-01, the
port 80 listener serves `/.well-known/acme-challenge/{token}` paths for
challenge verification.
## Certificate Failure Behavior
ACME certificate provisioning and renewal can fail for various reasons (network
outages, Let's Encrypt unavailability, DNS issues, rate limiting). The proxy's
behavior depends on the scenario:
| Scenario | Behavior |
|----------|----------|
| First start, no cached cert, ACME unreachable | **Fail to start** with clear error message. The proxy cannot serve TLS without a certificate. |
| First start, no cached cert, ACME succeeds | Normal startup. Certificate is provisioned and cached. |
| Start with cached cert, ACME unreachable for renewal | **Start normally** with cached cert. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
| Renewal failure after startup | **Continue serving existing cert**. Log error at `warn` level. `rustls-acme` retries per its built-in schedule. |
| Cached cert expired, renewal fails at startup | **Fail to start** if cert is expired at startup. An expired certificate cannot serve valid TLS. |
| Cached cert expires during runtime | **Continue serving expired cert**. Clients will receive certificate errors. Log at `error` level. This is the correct behavior — silently dropping TLS would be worse. |
The key principle: **never start without a valid TLS certificate**, but **always
continue serving if a valid cert exists**, even if renewal fails.
## TLS Error Handling
TLS handshake failures are logged and the connection is closed. The proxy does
not serve a default certificate for unknown hostnames — connections that don't
match any configured certificate fail.
| Scenario | Behavior |
|----------|----------|
| SNI hostname doesn't match any certificate (manual mode) | TLS handshake fails with `unrecognized_name` alert. Log at `warn` level with client IP and SNI hostname. |
| No SNI extension sent by client | TLS handshake fails with `handshake_failure` alert. Log at `warn` level with client IP. |
| Unsupported TLS version (1.0/1.1) | TLS handshake fails with `protocol_version` alert. Log at `info` level. |
| Cipher suite negotiation fails | TLS handshake fails with `handshake_failure` alert. Log at `info` level with client IP. |
| Certificate expired (manual mode) | Connection fails during TLS handshake. Log at `error` level. Other listeners/connections continue serving. |
In ACME mode, the `ResolvesServerCertAcme` resolver handles certificate
selection automatically — there is no SNI mismatch scenario because the
resolver serves the ACME-provisioned certificate for all valid domains.
## Key Files and Crates
| Component | Crate | Purpose |