Resolve 5 open questions, add 7 ADRs for previously undocumented decisions
Resolve open questions: - OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012 - OQ-03: Health check on separate local port (default 9900, localhost only) — ADR-013 - OQ-04: Add Unix domain socket admin API for config reload alongside SIGHUP, with structured success/failure responses — ADR-014 - OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s request), overridable in SiteConfig — ADR-015 Document previously undocumented decisions flagged by architecture review: - ADR-016: Explicit bind address requirement (reject 0.0.0.0) - ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling) - ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility) OQ-07 (per-site TLS overrides) remains open for future consideration. Spec updates: - config.md: add health_check_port, admin_socket_path, per-site timeout fields, update TOML example and validation rules - proxy.md: reference ADR-015/017/018 for timeouts, connection defaults, and body limit decisions - tls.md: replace OQ-01 cipher suite section with ADR-012 decision - operations.md: add local health check port section, admin socket reload - overview.md: update Phase 1 scope with new features, add ADR references - open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open
This commit is contained in:
@@ -43,6 +43,13 @@ certificate via ACME.
|
|||||||
| [009](decisions/009-signal-handling.md) | Signal Handling Strategy | Accepted |
|
| [009](decisions/009-signal-handling.md) | Signal Handling Strategy | Accepted |
|
||||||
| [010](decisions/010-multi-site-phase1.md) | Multi-Site Support in Phase 1 | Accepted |
|
| [010](decisions/010-multi-site-phase1.md) | Multi-Site Support in Phase 1 | Accepted |
|
||||||
| [011](decisions/011-multi-domain-tls.md) | Multi-Domain TLS Configuration | Accepted |
|
| [011](decisions/011-multi-domain-tls.md) | Multi-Domain TLS Configuration | Accepted |
|
||||||
|
| [012](decisions/012-cipher-suite-restriction.md) | Restrict Cipher Suites to nginx Scope | Accepted |
|
||||||
|
| [013](decisions/013-health-check-port.md) | Health Check on Separate Local Port | Accepted |
|
||||||
|
| [014](decisions/014-unix-socket-reload.md) | Unix Domain Socket Config Reload API | Accepted |
|
||||||
|
| [015](decisions/015-per-site-timeouts.md) | Per-Site Upstream Timeouts with Defaults | Accepted |
|
||||||
|
| [016](decisions/016-explicit-bind-address.md) | Explicit Bind Address Requirement | Accepted |
|
||||||
|
| [017](decisions/017-upstream-connection-defaults.md) | Upstream Connection Defaults | Accepted |
|
||||||
|
| [018](decisions/018-body-size-limit.md) | Request Body Size Limit | Accepted |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
@@ -50,12 +57,12 @@ See [open-questions.md](open-questions.md) for the full tracker.
|
|||||||
|
|
||||||
| OQ | Question | Priority | Status |
|
| OQ | Question | Priority | Status |
|
||||||
|----|----------|----------|--------|
|
|----|----------|----------|--------|
|
||||||
| OQ-01 | Should cipher suites be restricted beyond rustls defaults? | medium | open |
|
| ~~OQ-01~~ | ~~Should cipher suites be restricted beyond rustls defaults?~~ | ~~medium~~ | **resolved** (ADR-012) |
|
||||||
| ~~OQ-02~~ | ~~What log format should fail2ban consume?~~ | ~~high~~ | **resolved** (ADR-007) |
|
| ~~OQ-02~~ | ~~What log format should fail2ban consume?~~ | ~~high~~ | **resolved** (ADR-007) |
|
||||||
| OQ-03 | Should the health check endpoint be on a separate port? | low | open |
|
| ~~OQ-03~~ | ~~Should the health check endpoint be on a separate port?~~ | ~~low~~ | **resolved** (ADR-013) |
|
||||||
| OQ-04 | Config reload: SIGHUP only or also Unix socket API? | low | open |
|
| ~~OQ-04~~ | ~~Config reload: SIGHUP only or also Unix socket API?~~ | ~~low~~ | **resolved** (ADR-014) |
|
||||||
| ~~OQ-05~~ | ~~Should the proxy bind to multiple addresses?~~ | ~~low~~ | **resolved** (single bind_addr sufficient) |
|
| ~~OQ-05~~ | ~~Should the proxy bind to multiple addresses?~~ | ~~low~~ | **resolved** (single bind_addr sufficient) |
|
||||||
| OQ-06 | Should upstream timeouts be configurable per-site? | low | open |
|
| ~~OQ-06~~ | ~~Should upstream timeouts be configurable per-site?~~ | ~~low~~ | **resolved** (ADR-015) |
|
||||||
| OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | open |
|
| OQ-07 | Should per-site TLS overrides be supported for mixed ACME/manual domains? | low | open |
|
||||||
|
|
||||||
## Document Lifecycle
|
## Document Lifecycle
|
||||||
@@ -63,6 +70,6 @@ See [open-questions.md](open-questions.md) for the full tracker.
|
|||||||
| Status | Meaning | Transitions |
|
| Status | Meaning | Transitions |
|
||||||
|--------|---------|-------------|
|
|--------|---------|-------------|
|
||||||
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
|
||||||
| `reviewed` | Architecture is final. Implementation may begin. | → `stable` when implementation is complete |
|
| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
|
||||||
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
|
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
|
||||||
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
|
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
|
||||||
@@ -38,7 +38,9 @@ config.toml
|
|||||||
│ bind_addr │ │ sites[] │
|
│ bind_addr │ │ sites[] │
|
||||||
│ http_port │ │ rate_limit │
|
│ http_port │ │ rate_limit │
|
||||||
│ https_port │ │ body_limit │
|
│ https_port │ │ body_limit │
|
||||||
│ tls.mode │ │ proxy_headers │
|
│ health_check_port │ │ proxy_headers │
|
||||||
|
│ admin_socket_path │ │ │
|
||||||
|
│ tls.mode │ │ ← ArcSwap → │
|
||||||
│ tls.acme_domains │ │ │
|
│ tls.acme_domains │ │ │
|
||||||
│ tls.cert_path │ │ ← ArcSwap → │
|
│ tls.cert_path │ │ ← ArcSwap → │
|
||||||
│ tls.key_path │ │ ConfigReloadHandle │
|
│ tls.key_path │ │ ConfigReloadHandle │
|
||||||
@@ -59,9 +61,11 @@ Immutable after startup. Changes require a process restart.
|
|||||||
|
|
||||||
| Field | Type | Description |
|
| Field | Type | Description |
|
||||||
|-------|------|-------------|
|
|-------|------|-------------|
|
||||||
| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`) |
|
| `bind_addr` | `String` | IP address to bind to (must be explicit, no `0.0.0.0`; see ADR-016) |
|
||||||
| `http_port` | `u16` | Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable) |
|
| `http_port` | `u16` | Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable) |
|
||||||
| `https_port` | `u16` | Port for TLS listener (default: `443`) |
|
| `https_port` | `u16` | Port for TLS listener (default: `443`) |
|
||||||
|
| `health_check_port` | `u16` | Port for local health check endpoint (default: `9900`; set to `0` to disable; see ADR-013) |
|
||||||
|
| `admin_socket_path` | `String` | Unix domain socket path for admin API (default: `/run/reverse-proxy/admin.sock`; empty string to disable; see ADR-014) |
|
||||||
| `tls.mode` | `"acme"` or `"manual"` | Certificate provisioning mode |
|
| `tls.mode` | `"acme"` or `"manual"` | Certificate provisioning mode |
|
||||||
| `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
|
| `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
|
||||||
| `tls.acme_cache_dir` | `String` | ACME state cache directory |
|
| `tls.acme_cache_dir` | `String` | ACME state cache directory |
|
||||||
@@ -95,6 +99,8 @@ connections immediately.
|
|||||||
| `host` | `String` | Hostname to match (e.g., `"git.alk.dev"`) |
|
| `host` | `String` | Hostname to match (e.g., `"git.alk.dev"`) |
|
||||||
| `upstream` | `String` | Upstream address (e.g., `"127.0.0.1:3000"`) |
|
| `upstream` | `String` | Upstream address (e.g., `"127.0.0.1:3000"`) |
|
||||||
| `upstream_scheme` | `"http"` or `"https"` | Protocol for upstream connection (default: `"http"`) |
|
| `upstream_scheme` | `"http"` or `"https"` | Protocol for upstream connection (default: `"http"`) |
|
||||||
|
| `upstream_connect_timeout_secs` | `u64` | TCP connect timeout in seconds (default: `5`; see ADR-015, ADR-017) |
|
||||||
|
| `upstream_request_timeout_secs` | `u64` | Full request timeout in seconds (default: `60`; see ADR-015, ADR-017) | |
|
||||||
|
|
||||||
**Why these are dynamic:** See ADR-008 for the rationale. Site definitions
|
**Why these are dynamic:** See ADR-008 for the rationale. Site definitions
|
||||||
and rate limits are per-request concerns that should not require restarting
|
and rate limits are per-request concerns that should not require restarting
|
||||||
@@ -120,17 +126,23 @@ behind this split.
|
|||||||
|
|
||||||
### Reload Trigger
|
### Reload Trigger
|
||||||
|
|
||||||
The initial implementation uses SIGHUP as the reload trigger. When the process
|
Config reload is triggered by two mechanisms:
|
||||||
receives SIGHUP:
|
|
||||||
|
|
||||||
|
1. **SIGHUP**: Re-reads the config file, validates, and swaps DynamicConfig if
|
||||||
|
valid. Simple and well-understood, but provides no feedback on success or
|
||||||
|
failure.
|
||||||
|
|
||||||
|
2. **Admin socket**: The `reload` command via the admin Unix domain socket
|
||||||
|
performs the same action as SIGHUP but returns a structured response
|
||||||
|
indicating success or failure with an error message. See ADR-014 for
|
||||||
|
details.
|
||||||
|
|
||||||
|
Both mechanisms converge on the same code path:
|
||||||
1. Re-read the config file from disk
|
1. Re-read the config file from disk
|
||||||
2. Deserialize into `DynamicConfig`
|
2. Deserialize into `DynamicConfig`
|
||||||
3. Validate (check upstream reachability is optional)
|
3. Validate (check upstream reachability is optional)
|
||||||
4. Call `ConfigReloadHandle::reload(new_config)`
|
4. Call `ConfigReloadHandle::reload(new_config)`
|
||||||
|
|
||||||
Future implementations could add a Unix domain socket API or HTTP endpoint for
|
|
||||||
config reload, but SIGHUP is sufficient for Phase 1.
|
|
||||||
|
|
||||||
## TOML Config Format
|
## TOML Config Format
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
@@ -140,6 +152,8 @@ config reload, but SIGHUP is sufficient for Phase 1.
|
|||||||
bind_addr = "203.0.113.10" # Replace with actual bind address
|
bind_addr = "203.0.113.10" # Replace with actual bind address
|
||||||
http_port = 80
|
http_port = 80
|
||||||
https_port = 443
|
https_port = 443
|
||||||
|
health_check_port = 9900 # Local health check (0 to disable)
|
||||||
|
admin_socket_path = "/run/reverse-proxy/admin.sock" # Empty string to disable
|
||||||
|
|
||||||
[server.tls]
|
[server.tls]
|
||||||
mode = "acme" # "acme" or "manual"
|
mode = "acme" # "acme" or "manual"
|
||||||
@@ -167,6 +181,8 @@ limit_bytes = 104857600 # 100 MB
|
|||||||
host = "git.alk.dev"
|
host = "git.alk.dev"
|
||||||
upstream = "127.0.0.1:3000"
|
upstream = "127.0.0.1:3000"
|
||||||
upstream_scheme = "http"
|
upstream_scheme = "http"
|
||||||
|
# upstream_connect_timeout_secs = 5 # Default: 5s
|
||||||
|
# upstream_request_timeout_secs = 60 # Default: 60s
|
||||||
|
|
||||||
[[sites]]
|
[[sites]]
|
||||||
host = "alk.dev"
|
host = "alk.dev"
|
||||||
@@ -205,13 +221,17 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config split | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
|
| [008](decisions/008-static-dynamic-config-split.md) | Static/dynamic config split | Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap |
|
||||||
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
||||||
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains |
|
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains |
|
||||||
|
| [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
|
||||||
|
| [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
|
||||||
|
| [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
|
||||||
|
| [016](decisions/016-explicit-bind-address.md) | Explicit bind address required | Rejects `0.0.0.0` to prevent accidental exposure |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-04**: Should config reload support a Unix domain socket API in addition
|
- ~~**OQ-04**: Should config reload support a Unix domain socket API in addition
|
||||||
to SIGHUP? (open)
|
to SIGHUP?~~ (resolved — ADR-014: Unix domain socket admin API added)
|
||||||
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
|
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
|
||||||
domains? (open)
|
domains? (open)
|
||||||
83
docs/architecture/decisions/012-cipher-suite-restriction.md
Normal file
83
docs/architecture/decisions/012-cipher-suite-restriction.md
Normal file
@@ -0,0 +1,83 @@
|
|||||||
|
# ADR-012: Restrict Cipher Suites to Match nginx Scope
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
Our current nginx configuration explicitly restricts cipher suites to four
|
||||||
|
ECDHE-AES-GCM suites:
|
||||||
|
|
||||||
|
```
|
||||||
|
ECDHE-ECDSA-AES128-GCM-SHA256
|
||||||
|
ECDHE-RSA-AES128-GCM-SHA256
|
||||||
|
ECDHE-ECDSA-AES256-GCM-SHA384
|
||||||
|
ECDHE-RSA-AES256-GCM-SHA384
|
||||||
|
```
|
||||||
|
|
||||||
|
rustls 0.23 with the `aws_lc_rs` crypto provider defaults to a conservative
|
||||||
|
set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4, no CBC-mode
|
||||||
|
suites, no RSA key exchange). The rustls defaults include these four suites
|
||||||
|
plus TLS 1.3 suites (which nginx also allows via `TLSv1.3`).
|
||||||
|
|
||||||
|
The question was whether to accept the wider rustls defaults or restrict to
|
||||||
|
the same scope as our current nginx configuration (see OQ-01).
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Explicitly restrict cipher suites to match the same scope as our current nginx
|
||||||
|
configuration: the four ECDHE-AES-GCM suites for TLS 1.2, plus all TLS 1.3
|
||||||
|
suites. This is slightly more restrictive than rustls defaults but preserves
|
||||||
|
compatibility with all modern clients.
|
||||||
|
|
||||||
|
The restricted set in rustls terms:
|
||||||
|
|
||||||
|
**TLS 1.2 (explicitly selected):**
|
||||||
|
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
|
||||||
|
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
|
||||||
|
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
|
||||||
|
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
|
||||||
|
|
||||||
|
**TLS 1.3 (all default suites):**
|
||||||
|
- `TLS_AES_128_GCM_SHA256`
|
||||||
|
- `TLS_AES_256_GCM_SHA384`
|
||||||
|
- `TLS_CHACHA20_POLY1305_SHA256`
|
||||||
|
|
||||||
|
This is configured by building a `CryptoProvider` with a custom
|
||||||
|
`cipher_suite` list and passing it to `ServerConfig::builder_with_provider()`.
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- Maintains behavioral parity with our current nginx configuration during
|
||||||
|
migration — no client that worked with nginx should see different TLS
|
||||||
|
behavior
|
||||||
|
- The four TLS 1.2 suites are the same ones nginx allows, providing a known
|
||||||
|
security baseline
|
||||||
|
- TLS 1.3 suites are all modern and secure — restricting them provides no
|
||||||
|
meaningful security benefit and reduces compatibility
|
||||||
|
- rustls defaults include ChaCha20-Poly1305 suites for TLS 1.2 which nginx does
|
||||||
|
not; these are cryptographically sound but represent a wider scope than our
|
||||||
|
current nginx config allows
|
||||||
|
- Explicit configuration means the cipher list is documented and auditable
|
||||||
|
- If compatibility issues arise, expanding the list is straightforward; the
|
||||||
|
reverse (restricting after deployment) risks breaking existing clients
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Behavioral parity with current nginx TLS configuration
|
||||||
|
- Explicitly auditable cipher list
|
||||||
|
- No client that currently works will see different TLS behavior
|
||||||
|
- Matches security review expectations
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- Slightly more restrictive than rustls defaults — excludes ChaCha20-Poly1305
|
||||||
|
for TLS 1.2 and AES-CCM suites (rarely used)
|
||||||
|
- Must update the cipher list when deprecating TLS 1.2 in the future
|
||||||
|
- Custom `CryptoProvider` construction is slightly more code than using defaults
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [tls.md](../tls.md)
|
||||||
|
- OQ-01 (now resolved)
|
||||||
69
docs/architecture/decisions/013-health-check-port.md
Normal file
69
docs/architecture/decisions/013-health-check-port.md
Normal file
@@ -0,0 +1,69 @@
|
|||||||
|
# ADR-013: Health Check on Separate Local Port
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The health check endpoint (`/health`) needs to be accessible for monitoring
|
||||||
|
without requiring TLS. Currently the design places it on the main HTTPS
|
||||||
|
listener, which means:
|
||||||
|
|
||||||
|
1. TLS handshake must succeed for the health check to respond
|
||||||
|
2. External monitoring tools need to handle TLS
|
||||||
|
3. A TLS configuration error would make the health check unreachable, creating
|
||||||
|
a false-negative monitoring signal
|
||||||
|
|
||||||
|
Three options were considered (see OQ-03):
|
||||||
|
|
||||||
|
1. **Main HTTPS listener only**: Simplest, but TLS config errors make health
|
||||||
|
checks unreachable
|
||||||
|
2. **Separate unencrypted port on localhost**: Simple, works with standard
|
||||||
|
monitoring tools, but health checks bypass TLS
|
||||||
|
3. **Admin port with its own listener**: Most flexible but adds complexity
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Add a configurable health check port that binds to `127.0.0.1` only (localhost),
|
||||||
|
serving `/health` over plain HTTP. This is a separate listener from the main
|
||||||
|
HTTP and HTTPS listeners.
|
||||||
|
|
||||||
|
The port is configurable via `health_check_port` in StaticConfig. Setting it
|
||||||
|
to `0` (default) disables the separate health check listener, and `/health`
|
||||||
|
remains available on the main HTTPS listener as a fallback.
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- A local-only health check port is the standard pattern for reverse proxies
|
||||||
|
and service meshes (envoy, haproxy, k8s health probes all use this pattern)
|
||||||
|
- Health checks should work even when TLS is misconfigured — that's the whole
|
||||||
|
point of monitoring
|
||||||
|
- Binding to `127.0.0.1` only means the health check is not exposed to the
|
||||||
|
internet — only local monitoring tools (systemd, scripts, load balancers on
|
||||||
|
the same host) can reach it
|
||||||
|
- Configurable port allows different deployment scenarios (some monitoring runs
|
||||||
|
on different ports)
|
||||||
|
- Disabling via `health_check_port = 0` keeps the main HTTPS `/health` endpoint
|
||||||
|
available for cases where a separate port isn't needed
|
||||||
|
- When this project is folded into alknet, the health check will use alknet's
|
||||||
|
existing patterns, making the separate port unnecessary in that context
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Health checks work even when TLS is misconfigured
|
||||||
|
- Standard pattern that monitoring tools expect
|
||||||
|
- Not exposed to the internet (localhost only)
|
||||||
|
- Configurable — can be disabled if not needed
|
||||||
|
- systemd can use it for `NotifyAccess` readiness checks
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- Additional listener to manage (minimal complexity)
|
||||||
|
- Two health check endpoints exist when the separate port is enabled (the
|
||||||
|
local one and the HTTPS one) — monitoring should prefer the local one
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [operations.md](../operations.md)
|
||||||
|
- OQ-03 (now resolved)
|
||||||
76
docs/architecture/decisions/014-unix-socket-reload.md
Normal file
76
docs/architecture/decisions/014-unix-socket-reload.md
Normal file
@@ -0,0 +1,76 @@
|
|||||||
|
# ADR-014: Unix Domain Socket Config Reload API
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The proxy supports config reload via SIGHUP (ADR-009). SIGHUP is simple and
|
||||||
|
well-understood, but has limitations:
|
||||||
|
|
||||||
|
1. No feedback — the sender doesn't know if the reload succeeded or failed
|
||||||
|
2. No structured input — you can only signal "reload", not specify which parts
|
||||||
|
to reload or pass validation context
|
||||||
|
3. Requires process signal permissions — not all deployment tools can send
|
||||||
|
signals
|
||||||
|
|
||||||
|
A Unix domain socket API would allow programmatic config reload with
|
||||||
|
success/failure status, enabling integration with CI/CD pipelines, admin
|
||||||
|
tools, and automated configuration management.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Add a Unix domain socket API for config reload alongside SIGHUP. The socket
|
||||||
|
accepts commands and returns structured responses.
|
||||||
|
|
||||||
|
The socket path is configurable via `admin_socket_path` in StaticConfig
|
||||||
|
(default: `/run/reverse-proxy/admin.sock`). Setting it to empty string
|
||||||
|
disables the admin socket.
|
||||||
|
|
||||||
|
Initial commands:
|
||||||
|
|
||||||
|
- `reload` — Re-read config file, validate, and swap DynamicConfig. Returns
|
||||||
|
`{"status": "ok"}` or `{"status": "error", "message": "..."}`.
|
||||||
|
- `status` — Return basic process info (uptime, config load time, site count).
|
||||||
|
Returns `{"status": "ok", "uptime_secs": 1234, "sites": 2}`.
|
||||||
|
|
||||||
|
Future commands (not in Phase 1, but the protocol supports extension):
|
||||||
|
|
||||||
|
- `metrics` — Return Prometheus-compatible metrics
|
||||||
|
- `shutdown` — Graceful shutdown command
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- Providing reload feedback is operationally valuable — CI/CD pipelines can
|
||||||
|
verify config changes before proceeding
|
||||||
|
- The implementation cost is low — a Unix domain socket listener is ~50 lines
|
||||||
|
of tokio code, and the command protocol is simple
|
||||||
|
- SIGHUP is retained as a fallback for environments where socket access is
|
||||||
|
inconvenient
|
||||||
|
- This pattern will integrate naturally with alknet's admin interface when the
|
||||||
|
projects merge
|
||||||
|
- Unix domain sockets are filesystem-permission-based, providing access control
|
||||||
|
without additional authentication
|
||||||
|
- The socket path is configurable, allowing deployment-specific paths
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Config reload with success/failure feedback
|
||||||
|
- Programmatic integration with CI/CD and admin tools
|
||||||
|
- Structured response format enables automation
|
||||||
|
- SIGHUP still works as fallback
|
||||||
|
- Natural path to future admin commands
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- Additional listener and command parsing logic (~100-150 lines)
|
||||||
|
- Socket file management (cleanup on startup, stale socket detection)
|
||||||
|
- One more config option (`admin_socket_path`)
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [operations.md](../operations.md)
|
||||||
|
- [config.md](../config.md)
|
||||||
|
- ADR-009 (signal handling — SIGHUP retained as fallback)
|
||||||
|
- OQ-04 (now resolved)
|
||||||
69
docs/architecture/decisions/015-per-site-timeouts.md
Normal file
69
docs/architecture/decisions/015-per-site-timeouts.md
Normal file
@@ -0,0 +1,69 @@
|
|||||||
|
# ADR-015: Per-Site Upstream Timeouts with Defaults
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The proxy forwards requests to upstream services. Connection and request
|
||||||
|
timeouts affect reliability — too short and legitimate slow responses fail,
|
||||||
|
too long and the proxy is vulnerable to resource exhaustion from stalled
|
||||||
|
connections.
|
||||||
|
|
||||||
|
Phase 1 initially specified global timeout defaults (5s connect, 60s request)
|
||||||
|
for all upstreams. However, different upstream services have different latency
|
||||||
|
profiles:
|
||||||
|
|
||||||
|
- Gitea (git.alk.dev): Git push operations can be slow for large repos; 60s
|
||||||
|
may be insufficient for clone/push operations with large pack files
|
||||||
|
- Deno/Fresh (alk.dev): Fast responses expected; 60s is generous
|
||||||
|
|
||||||
|
Global timeouts don't accommodate these differences. Per-site timeout
|
||||||
|
configuration allows tuning for each upstream without affecting others.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Add optional per-site upstream timeout configuration to `SiteConfig`. When not
|
||||||
|
specified, sensible defaults are used.
|
||||||
|
|
||||||
|
**SiteConfig additions:**
|
||||||
|
|
||||||
|
| Field | Type | Default | Description |
|
||||||
|
|-------|------|---------|-------------|
|
||||||
|
| `upstream_connect_timeout_secs` | `u64` | `5` | TCP connection timeout in seconds |
|
||||||
|
| `upstream_request_timeout_secs` | `u64` | `60` | Full request timeout in seconds |
|
||||||
|
|
||||||
|
These are part of `DynamicConfig` (hot-reloadable via ArcSwap) since they
|
||||||
|
affect per-request behavior and should not require a restart to change.
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- Different upstreams genuinely have different latency profiles — Gitea pushes
|
||||||
|
with large pack files need more time than a fast static site
|
||||||
|
- Defaults of 5s connect and 60s request match common reverse proxy conventions
|
||||||
|
(nginx defaults: 60s, haproxy defaults: 30s connect, 60s server)
|
||||||
|
- Making these per-site rather than global allows tuning without side effects
|
||||||
|
- Per-site overrides in DynamicConfig means timeout changes don't require
|
||||||
|
restarts
|
||||||
|
- The defaults are reasonable for most services; explicit configuration is only
|
||||||
|
needed for outliers
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Each upstream can be tuned independently
|
||||||
|
- Defaults work for most cases — explicit configuration is optional
|
||||||
|
- Hot-reloadable (part of DynamicConfig)
|
||||||
|
- Consistent with how other reverse proxies handle timeouts
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- Two more fields per site in config (mitigated by sensible defaults)
|
||||||
|
- Per-site timeout means the proxy must look up per-request config for each
|
||||||
|
upstream connection (already required for routing, so no additional overhead)
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [proxy.md](../proxy.md)
|
||||||
|
- [config.md](../config.md)
|
||||||
|
- OQ-06 (now resolved)
|
||||||
53
docs/architecture/decisions/016-explicit-bind-address.md
Normal file
53
docs/architecture/decisions/016-explicit-bind-address.md
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
# ADR-016: Explicit Bind Address Requirement
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The proxy's `bind_addr` configuration field specifies the IP address to bind
|
||||||
|
to. The default for most network services is `0.0.0.0` (bind all interfaces),
|
||||||
|
which means the service is accessible on every network interface, including
|
||||||
|
public-facing ones.
|
||||||
|
|
||||||
|
For a reverse proxy that terminates TLS and handles security-sensitive traffic,
|
||||||
|
binding to all interfaces is risky. The proxy should only listen on the
|
||||||
|
intended interface(s) — typically a specific public IP for a single-server
|
||||||
|
deployment.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
The `bind_addr` field must be an explicit IP address. `0.0.0.0` is rejected
|
||||||
|
during config validation. The proxy will not start if `bind_addr` is `0.0.0.0`.
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- A reverse proxy terminating TLS is a security boundary — it should not be
|
||||||
|
accidentally exposed on unintended interfaces
|
||||||
|
- Single-server deployments have a specific public IP; binding to it
|
||||||
|
deliberately is the correct operational practice
|
||||||
|
- `0.0.0.0` is often a default in example configurations and can be deployed
|
||||||
|
without the operator realizing the service is accessible on all interfaces
|
||||||
|
- Rejecting it at validation time prevents this common mistake
|
||||||
|
- If a deployment genuinely needs to bind all interfaces, `0.0.0.0` can be
|
||||||
|
overridden with an explicit flag, but this should be a deliberate choice
|
||||||
|
- This matches the principle of explicit over implicit for security-sensitive
|
||||||
|
configuration
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Prevents accidental exposure on unintended network interfaces
|
||||||
|
- Forces operators to be deliberate about which interface the proxy serves
|
||||||
|
- Config validation catches the mistake before deployment
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- Not suitable for deployments that genuinely need to bind all interfaces
|
||||||
|
(mitigated by explicit override if needed in the future)
|
||||||
|
- Slightly more configuration required (operator must know their public IP)
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [config.md](../config.md)
|
||||||
|
- [overview.md](../overview.md)
|
||||||
@@ -0,0 +1,66 @@
|
|||||||
|
# ADR-017: Upstream Connection Defaults
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The proxy forwards requests to upstream services using a shared hyper `Client`.
|
||||||
|
The client's configuration affects all upstream connections: timeouts, redirect
|
||||||
|
handling, and connection pooling. These are significant design choices that
|
||||||
|
affect production behavior and should be traceable.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Configure the hyper `Client` with the following defaults:
|
||||||
|
|
||||||
|
| Setting | Value | Rationale |
|
||||||
|
|---------|-------|-----------|
|
||||||
|
| Connect timeout | 5 seconds | Prevents hanging on unreachable upstreams |
|
||||||
|
| Request timeout | 60 seconds (default) | Per-site override available (ADR-015) |
|
||||||
|
| Redirect following | Disabled | Proxies should not follow redirects — they pass response status to the client |
|
||||||
|
| Connection pooling | Enabled (hyper default) | Reuses connections to the same upstream for performance |
|
||||||
|
| HTTP version | HTTP/1.1 only | Upstream connections use HTTP/1.1; HTTP/2 proxying is out of scope |
|
||||||
|
|
||||||
|
The connect timeout of 5 seconds applies to establishing a TCP connection to
|
||||||
|
the upstream. The request timeout of 60 seconds applies to the full
|
||||||
|
request/response cycle and can be overridden per-site (ADR-015).
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- **5s connect timeout**: Matches common reverse proxy conventions. Long
|
||||||
|
enough for remote upstreams with network latency, short enough to fail fast
|
||||||
|
on unreachable services.
|
||||||
|
- **60s request timeout**: Generous default that accommodates Gitea push
|
||||||
|
operations with large pack files. Per-site overrides allow tuning for faster
|
||||||
|
upstreams.
|
||||||
|
- **No redirect following**: Standard reverse proxy behavior. If an upstream
|
||||||
|
returns a 3xx, the proxy streams it to the client. Following redirects would
|
||||||
|
break the client's expectation and could create security issues (redirect
|
||||||
|
loops, credential forwarding).
|
||||||
|
- **Connection pooling**: Hyper's default connection reuse improves performance
|
||||||
|
for repeated requests to the same upstream without additional configuration.
|
||||||
|
- **HTTP/1.1 only**: The proxy communicates with upstreams over plain HTTP
|
||||||
|
(loopback). HTTP/2 proxying is explicitly out of scope (see overview.md).
|
||||||
|
Upstreams needing HTTP/2+ run their own servers (e.g., api.alk.dev).
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Explicit, documented defaults that match reverse proxy conventions
|
||||||
|
- Per-site timeout overrides available for specific needs
|
||||||
|
- No redirect following prevents common proxy pitfalls
|
||||||
|
- Connection pooling provides good performance by default
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- 60s default may be too long for fast upstreams (mitigated by per-site
|
||||||
|
overrides in ADR-015)
|
||||||
|
- HTTP/1.1 only for upstream means no HTTP/2 multiplexing to backends (out of
|
||||||
|
scope per project design)
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [proxy.md](../proxy.md)
|
||||||
|
- ADR-015 (per-site upstream timeouts)
|
||||||
|
- ADR-002 (custom proxy handler)
|
||||||
53
docs/architecture/decisions/018-body-size-limit.md
Normal file
53
docs/architecture/decisions/018-body-size-limit.md
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
# ADR-018: Request Body Size Limit
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
Accepted
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The proxy enforces a maximum request body size to protect against resource
|
||||||
|
exhaustion attacks. The default limit must balance security with usability.
|
||||||
|
|
||||||
|
Gitea push operations can involve large Git pack files. The current nginx
|
||||||
|
configuration uses `client_max_body_size 100m`, and Gitea's documentation
|
||||||
|
recommends allowing up to 100 MB for push operations.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Set the default request body size limit to 100 MB (104,857,600 bytes),
|
||||||
|
matching our current nginx configuration. The limit is global in Phase 1
|
||||||
|
(configurable via `body.limit_bytes` in DynamicConfig).
|
||||||
|
|
||||||
|
## Rationale
|
||||||
|
|
||||||
|
- 100 MB matches the current nginx `client_max_body_size 100m`, ensuring
|
||||||
|
behavioral parity during migration
|
||||||
|
- Gitea push operations with large repositories regularly exceed 50 MB
|
||||||
|
- 100 MB is large enough for any legitimate Git operation while still
|
||||||
|
providing protection against resource exhaustion (a 100 MB body is not
|
||||||
|
enough to exhaust memory on modern servers, but prevents unbounded uploads)
|
||||||
|
- The limit is configurable — operators can reduce it for deployments that
|
||||||
|
don't need large uploads
|
||||||
|
- In Phase 2, per-site limits will allow different limits for different
|
||||||
|
upstreams (e.g., a lower limit for alk.dev, the current limit for
|
||||||
|
git.alk.dev)
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- Behavioral parity with current nginx configuration
|
||||||
|
- Gitea push operations work without configuration changes
|
||||||
|
- Configurable for deployments with different needs
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- 100 MB is a generous default — some deployments may want a lower limit
|
||||||
|
(mitigated by configurability)
|
||||||
|
- Global limit means all sites share the same maximum (mitigated by Phase 2
|
||||||
|
per-site limits)
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [proxy.md](../proxy.md)
|
||||||
|
- [config.md](../config.md)
|
||||||
|
- nginx `client_max_body_size` documentation
|
||||||
@@ -7,19 +7,15 @@ last_updated: 2026-06-11
|
|||||||
|
|
||||||
## TLS
|
## TLS
|
||||||
|
|
||||||
### OQ-01: Should cipher suites be restricted beyond rustls defaults?
|
### ~~OQ-01: Should cipher suites be restricted beyond rustls defaults?~~
|
||||||
|
|
||||||
- **Origin**: [tls.md](tls.md)
|
- **Origin**: [tls.md](tls.md)
|
||||||
- **Status**: open
|
- **Status**: resolved
|
||||||
- **Priority**: medium
|
- **Priority**: medium
|
||||||
- **Context**: Our current nginx config explicitly restricts cipher suites to
|
- **Resolution**: Restrict cipher suites to match the nginx scope: four
|
||||||
four ECDHE-AES-GCM suites. rustls 0.23 with `aws_lc_rs` defaults to a
|
ECDHE-AES-GCM suites for TLS 1.2 plus all TLS 1.3 suites. This provides
|
||||||
conservative set that excludes all weak ciphers (no SHA-1, no 3DES, no RC4,
|
behavioral parity during migration. See ADR-012.
|
||||||
no CBC-mode suites, no RSA key exchange). The defaults include TLS 1.3 suites
|
- **Cross-references**: ADR-005, ADR-012
|
||||||
which nginx also allows. Restricting further would reduce compatibility with
|
|
||||||
older clients; not restricting means accepting a wider (but still safe) set
|
|
||||||
than the current nginx config.
|
|
||||||
- **Cross-references**: ADR-005
|
|
||||||
|
|
||||||
### ~~OQ-02: What log format should fail2ban consume?~~
|
### ~~OQ-02: What log format should fail2ban consume?~~
|
||||||
|
|
||||||
@@ -47,31 +43,28 @@ last_updated: 2026-06-11
|
|||||||
|
|
||||||
## Logging and Monitoring
|
## Logging and Monitoring
|
||||||
|
|
||||||
### OQ-03: Should the health check endpoint be on a separate port?
|
### ~~OQ-03: Should the health check endpoint be on a separate port?~~
|
||||||
|
|
||||||
- **Origin**: [operations.md](operations.md)
|
- **Origin**: [operations.md](operations.md)
|
||||||
- **Status**: open
|
- **Status**: resolved
|
||||||
- **Priority**: low
|
- **Priority**: low
|
||||||
- **Context**: Currently the health check is on the main HTTPS listener at
|
- **Resolution**: Add a configurable local health check port (default: 9900)
|
||||||
`/health`. Alternatives: (a) separate unencrypted port for health checks
|
bound to `127.0.0.1` only. Health checks work even when TLS is misconfigured.
|
||||||
(simpler for load balancers but less secure), (b) admin port with its own
|
The main HTTPS `/health` endpoint remains available as a fallback. See
|
||||||
listener (more complex but isolates operational traffic), (c) on the main
|
ADR-013.
|
||||||
listener (simplest, proposed approach). For a single-server deployment behind
|
- **Cross-references**: ADR-013
|
||||||
no external load balancer, the main listener is fine.
|
|
||||||
- **Cross-references**: None
|
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
### OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?
|
### ~~OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP?~~
|
||||||
|
|
||||||
- **Origin**: [config.md](config.md)
|
- **Origin**: [config.md](config.md)
|
||||||
- **Status**: open
|
- **Status**: resolved
|
||||||
- **Priority**: low
|
- **Priority**: low
|
||||||
- **Context**: Phase 1 uses SIGHUP for config reload, which is simple and proven.
|
- **Resolution**: Yes. Add a Unix domain socket admin API alongside SIGHUP.
|
||||||
A Unix domain socket API would allow programmatic reload (e.g., from an admin
|
The socket accepts a `reload` command and returns structured success/failure
|
||||||
tool or CI/CD pipeline) and could return success/failure status. This adds
|
responses. SIGHUP is retained as a fallback. See ADR-014.
|
||||||
complexity and is not needed for Phase 1.
|
- **Cross-references**: ADR-014
|
||||||
- **Cross-references**: None
|
|
||||||
|
|
||||||
## Deployment
|
## Deployment
|
||||||
|
|
||||||
@@ -84,17 +77,16 @@ last_updated: 2026-06-11
|
|||||||
explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
|
explicit IP address (not `0.0.0.0`). Multi-address binding is not needed for
|
||||||
this single-server deployment. If needed in the future, `bind_addr` could be
|
this single-server deployment. If needed in the future, `bind_addr` could be
|
||||||
extended to an array. See config.md for the `bind_addr` field.
|
extended to an array. See config.md for the `bind_addr` field.
|
||||||
- **Cross-references**: None
|
- **Cross-references**: ADR-016
|
||||||
|
|
||||||
## Proxy
|
## Proxy
|
||||||
|
|
||||||
### OQ-06: Should upstream timeouts be configurable per-site?
|
### ~~OQ-06: Should upstream timeouts be configurable per-site?~~
|
||||||
|
|
||||||
- **Origin**: [proxy.md](proxy.md)
|
- **Origin**: [proxy.md](proxy.md)
|
||||||
- **Status**: open
|
- **Status**: resolved
|
||||||
- **Priority**: low
|
- **Priority**: low
|
||||||
- **Context**: Phase 1 uses global defaults (5s connect timeout, 60s request
|
- **Resolution**: Yes. Per-site upstream timeouts with sensible defaults (5s
|
||||||
timeout) for all upstream connections. Per-site timeout configuration would
|
connect, 60s request). Optional fields in SiteConfig that override global
|
||||||
allow tuning for different upstream services (e.g., a slow database-backed
|
defaults when specified. See ADR-015.
|
||||||
API vs. a fast static site). Not needed for Phase 1 with a single upstream.
|
- **Cross-references**: ADR-015, ADR-017
|
||||||
- **Cross-references**: None
|
|
||||||
@@ -109,23 +109,30 @@ Configurable via `log_level` in StaticConfig.
|
|||||||
|
|
||||||
## Health Check
|
## Health Check
|
||||||
|
|
||||||
### Endpoint
|
### Local Health Check Port
|
||||||
|
|
||||||
|
The primary health check endpoint is served on a separate local port (default:
|
||||||
|
9900), bound to `127.0.0.1` only. This ensures health checks work even when TLS
|
||||||
|
is misconfigured. See ADR-013 for the rationale.
|
||||||
|
|
||||||
```
|
```
|
||||||
GET /health → 200 OK (empty body)
|
GET http://127.0.0.1:9900/health → 200 OK (empty body)
|
||||||
```
|
```
|
||||||
|
|
||||||
The health check endpoint is accessible on the main HTTPS listener. It returns
|
The port is configurable via `health_check_port` in StaticConfig. Setting it
|
||||||
200 if the process is alive and serving requests.
|
to `0` disables the separate health check listener.
|
||||||
|
|
||||||
**Limitation**: Since `/health` is served over TLS, it cannot detect TLS
|
### HTTPS Health Check (Fallback)
|
||||||
configuration errors that prevent the TLS handshake from completing. External
|
|
||||||
monitoring should also check TCP connectivity to port 443 independently.
|
When the local health check port is enabled, `/health` is also available on the
|
||||||
|
main HTTPS listener for cases where TLS-level health verification is desired.
|
||||||
|
External monitoring should prefer the local health check for liveness checks
|
||||||
|
and can use the HTTPS endpoint for TLS verification.
|
||||||
|
|
||||||
### What It Checks
|
### What It Checks
|
||||||
|
|
||||||
- Process is running and the tokio runtime is responsive
|
- Process is running and the tokio runtime is responsive
|
||||||
- TLS listener is accepting connections
|
- TLS listener is accepting connections (HTTPS endpoint only)
|
||||||
- Config is loaded (StaticConfig and DynamicConfig are initialized)
|
- Config is loaded (StaticConfig and DynamicConfig are initialized)
|
||||||
|
|
||||||
It does **not** check upstream reachability. The health check answers "is the
|
It does **not** check upstream reachability. The health check answers "is the
|
||||||
@@ -180,13 +187,22 @@ The proxy handles three signals via `signal-hook` (see [ADR-009](decisions/009-s
|
|||||||
- **SIGTERM / SIGINT**: Graceful shutdown. Stop accepting new connections, wait
|
- **SIGTERM / SIGINT**: Graceful shutdown. Stop accepting new connections, wait
|
||||||
for in-flight requests to complete (up to a configurable timeout), then exit.
|
for in-flight requests to complete (up to a configurable timeout), then exit.
|
||||||
- **SIGHUP**: Config reload. Re-read the config file, validate, and swap
|
- **SIGHUP**: Config reload. Re-read the config file, validate, and swap
|
||||||
DynamicConfig if valid.
|
DynamicConfig if valid. No feedback on success or failure.
|
||||||
|
- **Admin socket reload**: Send `reload` command via the Unix domain socket
|
||||||
|
(default: `/run/reverse-proxy/admin.sock`). Returns structured response
|
||||||
|
indicating success or failure. See ADR-014 for details.
|
||||||
|
|
||||||
### SIGHUP for Config Reload
|
### SIGHUP for Config Reload
|
||||||
|
|
||||||
SIGHUP triggers config reload (see [config.md](config.md) for details). The
|
SIGHUP triggers config reload (see [config.md](config.md) for details). The
|
||||||
process does not exit on SIGHUP.
|
process does not exit on SIGHUP.
|
||||||
|
|
||||||
|
### Admin Socket for Config Reload
|
||||||
|
|
||||||
|
The admin Unix domain socket provides programmatic config reload with feedback.
|
||||||
|
This is useful for CI/CD pipelines and automation tools. See ADR-014 for the
|
||||||
|
command protocol.
|
||||||
|
|
||||||
### Timeout
|
### Timeout
|
||||||
|
|
||||||
In-flight requests have a configurable shutdown timeout (default: 30 seconds).
|
In-flight requests have a configurable shutdown timeout (default: 30 seconds).
|
||||||
@@ -242,10 +258,13 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [006](decisions/006-rate-limiting-approach.md) | Token bucket rate limiting | In-memory per-IP token bucket matching nginx burst semantics |
|
| [006](decisions/006-rate-limiting-approach.md) | Token bucket rate limiting | In-memory per-IP token bucket matching nginx burst semantics |
|
||||||
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
||||||
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
||||||
|
| [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
|
||||||
|
| [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
|
- ~~**OQ-03**: Should the health check endpoint be on a separate port?~~ (resolved
|
||||||
|
— ADR-013: separate local port, default 9900, localhost only)
|
||||||
@@ -40,22 +40,24 @@ details.
|
|||||||
- **Phase 1**: Multi-site reverse proxy with TLS termination
|
- **Phase 1**: Multi-site reverse proxy with TLS termination
|
||||||
- TLS termination with ACME (Let's Encrypt) multi-domain certificate management
|
- TLS termination with ACME (Let's Encrypt) multi-domain certificate management
|
||||||
- Manual certificate paths as fallback mode
|
- Manual certificate paths as fallback mode
|
||||||
|
- Cipher suite restriction matching nginx scope (ECDHE-AES-GCM + TLS 1.3)
|
||||||
- HTTP → HTTPS redirect
|
- HTTP → HTTPS redirect
|
||||||
- Host-based routing to multiple upstream services
|
- Host-based routing to multiple upstream services
|
||||||
- Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
|
- Reverse proxy to Gitea at `127.0.0.1:3000` (git.alk.dev)
|
||||||
- Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
|
- Reverse proxy to Deno/Fresh container for alk.dev (simple pass-through)
|
||||||
- Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
|
- Proxy header injection (Host, X-Real-IP, X-Forwarded-For, X-Forwarded-Proto)
|
||||||
|
- Per-site upstream timeouts with sensible defaults (5s connect, 60s request)
|
||||||
- Request rate limiting with fail2ban-compatible logging (global per-IP)
|
- Request rate limiting with fail2ban-compatible logging (global per-IP)
|
||||||
- 100 MB body size limit (global)
|
- 100 MB body size limit (global)
|
||||||
- Configurable bind address (no `0.0.0.0` default)
|
- Configurable bind address (must be explicit, no `0.0.0.0`)
|
||||||
- Health check endpoint
|
- Local health check endpoint on separate port (default: 9900, localhost only)
|
||||||
|
- Unix domain socket admin API for config reload with feedback
|
||||||
- Graceful shutdown (SIGTERM handling)
|
- Graceful shutdown (SIGTERM handling)
|
||||||
- Systemd unit file
|
- Systemd unit file
|
||||||
- Dual licensing: MIT OR Apache-2.0
|
- Dual licensing: MIT OR Apache-2.0
|
||||||
|
|
||||||
- **Phase 2**: Operational hardening
|
- **Phase 2**: Operational hardening
|
||||||
- Per-site rate limits and body limits
|
- Per-site rate limits and body limits
|
||||||
- Per-site upstream timeouts
|
|
||||||
- Metrics endpoint (Prometheus-compatible)
|
- Metrics endpoint (Prometheus-compatible)
|
||||||
- Connection limits and timeouts
|
- Connection limits and timeouts
|
||||||
- Log rotation
|
- Log rotation
|
||||||
@@ -63,7 +65,6 @@ details.
|
|||||||
- **Phase 3**: Future enhancements
|
- **Phase 3**: Future enhancements
|
||||||
- Wildcard subdomain support
|
- Wildcard subdomain support
|
||||||
- Per-site TLS overrides (manual certs for specific domains)
|
- Per-site TLS overrides (manual certs for specific domains)
|
||||||
- Unix domain socket config reload API
|
|
||||||
|
|
||||||
### Out of Scope
|
### Out of Scope
|
||||||
|
|
||||||
@@ -168,12 +169,19 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
| [009](decisions/009-signal-handling.md) | Signal handling strategy | signal-hook for SIGTERM/SIGINT/SIGHUP |
|
||||||
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release; avoids config migration later |
|
||||||
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
||||||
|
| [012](decisions/012-cipher-suite-restriction.md) | Restrict cipher suites | Match nginx scope: ECDHE-AES-GCM for TLS 1.2, all TLS 1.3 |
|
||||||
|
| [013](decisions/013-health-check-port.md) | Health check on separate local port | Localhost-only HTTP health check, configurable port |
|
||||||
|
| [014](decisions/014-unix-socket-reload.md) | Unix domain socket config reload API | Programmatic reload with success/failure feedback |
|
||||||
|
| [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
|
||||||
|
| [016](decisions/016-explicit-bind-address.md) | Explicit bind address required | Rejects `0.0.0.0` to prevent accidental exposure |
|
||||||
|
| [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
|
||||||
|
| [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
- ~~**OQ-01**: Should cipher suites be restricted beyond rustls defaults?~~ (resolved — ADR-012)
|
||||||
- **OQ-03**: Should the health check endpoint be on a separate port? (open)
|
- ~~**OQ-03**: Should the health check endpoint be on a separate port?~~ (resolved — ADR-013)
|
||||||
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
|
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
|
||||||
@@ -109,12 +109,15 @@ The proxy handler constructs a new request to the upstream:
|
|||||||
5. Stream the response back to the client
|
5. Stream the response back to the client
|
||||||
|
|
||||||
The hyper Client is created once at startup and shared via axum's `State`. It
|
The hyper Client is created once at startup and shared via axum's `State`. It
|
||||||
must be configured with:
|
must be configured with (see ADR-017 for rationale):
|
||||||
- Connection pooling (hyper default behavior)
|
- Connection pooling (hyper default behavior)
|
||||||
- Connect timeout: 5 seconds
|
- HTTP/1.1 only for upstream connections (HTTP/2 proxying is out of scope)
|
||||||
- Request timeout: 60 seconds
|
|
||||||
- No redirect following (proxies should not follow redirects)
|
- No redirect following (proxies should not follow redirects)
|
||||||
|
|
||||||
|
Per-site timeout overrides are available via `upstream_connect_timeout_secs`
|
||||||
|
and `upstream_request_timeout_secs` in `SiteConfig` (see ADR-015). When not
|
||||||
|
specified, defaults of 5s connect and 60s request are used.
|
||||||
|
|
||||||
### 4. Error Handling
|
### 4. Error Handling
|
||||||
|
|
||||||
| Upstream Condition | Response | Notes |
|
| Upstream Condition | Response | Notes |
|
||||||
@@ -147,10 +150,11 @@ between the proxy and backend services on loopback is unnecessary.
|
|||||||
|
|
||||||
## Body Size Limit
|
## Body Size Limit
|
||||||
|
|
||||||
axum's `DefaultBodyLimit` layer sets the maximum request body size. For
|
axum's `DefaultBodyLimit` layer sets the maximum request body size. The default
|
||||||
compatibility with Gitea's push operations (large pack files), this defaults
|
of 100 MB (104,857,600 bytes) matches our current nginx configuration and
|
||||||
to 100 MB. In Phase 1, the body limit is a global setting; Phase 2 may add
|
accommodates Gitea's push operations with large pack files (see ADR-018). In
|
||||||
per-site body limits.
|
Phase 1, the body limit is a global setting; Phase 2 may add per-site body
|
||||||
|
limits.
|
||||||
|
|
||||||
## Design Decisions
|
## Design Decisions
|
||||||
|
|
||||||
@@ -161,11 +165,14 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | One upstream per domain — simpler than a general proxy library |
|
| [002](decisions/002-custom-proxy-handler.md) | Custom proxy handler | One upstream per domain — simpler than a general proxy library |
|
||||||
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
| [007](decisions/007-custom-log-format.md) | Custom structured log format | key=value pairs with RATE_LIMIT prefix for fail2ban |
|
||||||
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
||||||
|
| [015](decisions/015-per-site-timeouts.md) | Per-site upstream timeouts with defaults | 5s connect / 60s request defaults, per-site overrides |
|
||||||
|
| [017](decisions/017-upstream-connection-defaults.md) | Upstream connection defaults | HTTP/1.1, no redirects, connection pooling |
|
||||||
|
| [018](decisions/018-body-size-limit.md) | Request body size limit | 100 MB default matching nginx, Gitea push compatibility |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-06**: Should upstream timeouts be configurable per-site? (open — Phase 1
|
- ~~**OQ-06**: Should upstream timeouts be configurable per-site?~~ (resolved —
|
||||||
uses global defaults of 5s connect, 60s request)
|
ADR-015: per-site timeout overrides with defaults)
|
||||||
@@ -115,25 +115,26 @@ regression if defaults change in future rustls releases.
|
|||||||
|
|
||||||
### Cipher Suites
|
### Cipher Suites
|
||||||
|
|
||||||
rustls 0.23 with the `aws_lc_rs` crypto provider defaults to a conservative
|
Cipher suites are explicitly restricted to match the scope of our current nginx
|
||||||
cipher suite selection that excludes all weak ciphers (no SHA-1, no 3DES, no
|
configuration. See ADR-012 for the full rationale.
|
||||||
RC4, no CBC-mode suites, no RSA key exchange).
|
|
||||||
|
|
||||||
The current nginx config explicitly restricts to:
|
**TLS 1.2 (explicitly selected):**
|
||||||
|
|
||||||
```
|
- `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`
|
||||||
ECDHE-ECDSA-AES128-GCM-SHA256
|
- `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`
|
||||||
ECDHE-RSA-AES128-GCM-SHA256
|
- `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`
|
||||||
ECDHE-ECDSA-AES256-GCM-SHA384
|
- `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`
|
||||||
ECDHE-RSA-AES256-GCM-SHA384
|
|
||||||
```
|
|
||||||
|
|
||||||
rustls's defaults include these plus TLS 1.3 suites (which nginx's config
|
**TLS 1.3 (all default suites):**
|
||||||
also allows via `TLSv1.3`). The default rustls cipher list is a strict subset
|
|
||||||
of what browsers accept.
|
|
||||||
|
|
||||||
See [open-questions.md](open-questions.md) OQ-01 for whether to further
|
- `TLS_AES_128_GCM_SHA256`
|
||||||
restrict cipher suites beyond rustls defaults.
|
- `TLS_AES_256_GCM_SHA384`
|
||||||
|
- `TLS_CHACHA20_POLY1305_SHA256`
|
||||||
|
|
||||||
|
This is configured by building a `CryptoProvider` with a custom `cipher_suite`
|
||||||
|
list and passing it to `ServerConfig::builder_with_provider()`. The cipher
|
||||||
|
list matches our current nginx configuration's scope, providing behavioral
|
||||||
|
parity during migration.
|
||||||
|
|
||||||
### ServerConfig Construction
|
### ServerConfig Construction
|
||||||
|
|
||||||
@@ -223,12 +224,13 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
|||||||
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
|
| [005](decisions/005-tokio-rustls-direct.md) | tokio-rustls directly | Full control over TLS config and ACME resolver integration |
|
||||||
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
| [010](decisions/010-multi-site-phase1.md) | Multi-site in Phase 1 | Multiple domains from initial release |
|
||||||
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
| [011](decisions/011-multi-domain-tls.md) | Multi-domain TLS config | Single SAN certificate covering all domains via rustls-acme |
|
||||||
|
| [012](decisions/012-cipher-suite-restriction.md) | Restrict cipher suites | Match nginx scope: four ECDHE-AES-GCM suites for TLS 1.2, all TLS 1.3 suites |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||||
questions affecting this document:
|
questions affecting this document:
|
||||||
|
|
||||||
- **OQ-01**: Should cipher suites be restricted beyond rustls defaults? (open)
|
- ~~**OQ-01**: Should cipher suites be restricted beyond rustls defaults?~~ (resolved — ADR-012: restrict to nginx scope)
|
||||||
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
|
- **OQ-07**: Should per-site TLS overrides be supported for mixed ACME/manual
|
||||||
domains? (open)
|
domains? (open)
|
||||||
Reference in New Issue
Block a user