
glm-5.1 9a2352e61c Resolve 5 open questions, add 7 ADRs for previously undocumented decisions

Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM
  suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost
  only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside
  SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s
  request), overridable in SiteConfig — ADR-015

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout
  fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults,
  and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open

2026-06-11 09:07:36 +00:00

9.8 KiB



status	last_updated
draft	2026-06-11

Configuration

What It Is

The configuration system defines the proxy's configuration schema, how configuration is loaded at startup, and how the dynamic portion can be reloaded without restarting the process.

Why It Exists

The proxy needs to be configurable without hard-coding domains, upstream addresses, or TLS settings. The configuration system separates immutable startup parameters (bind addresses, TLS mode) from runtime-adjustable parameters (site definitions, rate limits) using the ArcSwap pattern proven in the alknet project.

Architecture

config.toml
    │
    ▼
┌────────────────────────┐
│  serde::Deserialize    │
│  (TOML → Config)       │
└───────────┬────────────┘
            │
            ▼
┌────────────────────────┐     ┌────────────────────────┐
│  StaticConfig          │     │  DynamicConfig         │
│  (immutable)           │     │  (hot-reloadable)      │
│                        │     │                        │
│  bind_addr             │     │  sites[]               │
│  http_port             │     │  rate_limit            │
│  https_port            │     │  body_limit_bytes      │
│  health_check_port     │     │  proxy_headers         │
│  admin_socket_path     │     │                        │
│  tls.mode              │     │  held in ArcSwap;      │
│  tls.acme_domains      │     │  replaced via          │
│  tls.acme_cache_dir    │     │  ConfigReloadHandle    │
│  tls.acme_directory    │     │  .reload(new_config)   │
│  tls.cert_path         │     └────────────────────────┘
│  tls.key_path          │
│  log_level             │
│  log_format            │
└────────────────────────┘

Static vs Dynamic Configuration

This split follows the pattern established in alknet (ADR-030), adapted for our simpler use case.

StaticConfig

Immutable after startup. Changes require a process restart.

Field	Type	Description
`bind_addr`	`String`	IP address to bind to (must be explicit, no `0.0.0.0`; see ADR-016)
`http_port`	`u16`	Port for HTTP→HTTPS redirect (default: `80`; set to `0` to disable)
`https_port`	`u16`	Port for TLS listener (default: `443`)
`health_check_port`	`u16`	Port for local health check endpoint (default: `9900`; set to `0` to disable; see ADR-013)
`admin_socket_path`	`String`	Unix domain socket path for admin API (default: `/run/reverse-proxy/admin.sock`; empty string to disable; see ADR-014)
`tls.mode`	`"acme"` or `"manual"`	Certificate provisioning mode
`tls.acme_domains`	`Vec<String>`	Domains to include in the ACME SAN certificate (ACME mode only)
`tls.acme_cache_dir`	`String`	ACME state cache directory (ACME mode only)
`tls.acme_directory`	`"production"` or `"staging"`	Let's Encrypt directory to use (ACME mode only)
`tls.cert_path`	`String`	Certificate file path (manual mode only)
`tls.key_path`	`String`	Private key file path (manual mode only)
`log_level`	`"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`	Logging verbosity (set as `level` under `[server.logging]` in the TOML file)
`log_format`	`"text"` or `"json"`	Log output format (set as `format` under `[server.logging]` in the TOML file)

Why these are static: See ADR-008 for the rationale behind the static/dynamic split. In summary, changing the bind address, ports, or TLS mode means creating new listeners and TLS configurations, which this design performs only at startup, so such changes require a restart.
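The default and disable conventions in the StaticConfig table can be sketched in std-only Rust as below. This is a hypothetical, simplified subset (TLS and logging fields omitted); the helper names `redirect_port`, `health_check_addr`, and `admin_socket` are illustrative, and binding the health check to IPv4 loopback `127.0.0.1` is an assumption drawn from ADR-013's localhost-only decision.

```rust
use std::net::SocketAddr;
use std::path::Path;

/// Hypothetical subset of StaticConfig; TLS and logging fields omitted.
pub struct StaticConfig {
    pub bind_addr: String,
    pub http_port: u16,
    pub https_port: u16,
    pub health_check_port: u16,
    pub admin_socket_path: String,
}

impl Default for StaticConfig {
    fn default() -> Self {
        Self {
            // No usable default: config.toml must name an explicit address (ADR-016).
            bind_addr: String::new(),
            http_port: 80,
            https_port: 443,
            health_check_port: 9900,
            admin_socket_path: "/run/reverse-proxy/admin.sock".to_string(),
        }
    }
}

impl StaticConfig {
    /// `http_port = 0` disables the HTTP→HTTPS redirect listener.
    pub fn redirect_port(&self) -> Option<u16> {
        (self.http_port != 0).then_some(self.http_port)
    }

    /// `health_check_port = 0` disables the health check; otherwise it
    /// listens on loopback only (assumed to be 127.0.0.1 in this sketch).
    pub fn health_check_addr(&self) -> Option<SocketAddr> {
        (self.health_check_port != 0)
            .then(|| SocketAddr::from(([127, 0, 0, 1], self.health_check_port)))
    }

    /// An empty `admin_socket_path` disables the admin API.
    pub fn admin_socket(&self) -> Option<&Path> {
        (!self.admin_socket_path.is_empty()).then(|| Path::new(&self.admin_socket_path))
    }
}
```

With serde, field-level `#[serde(default = "...")]` attributes could supply the same defaults during deserialization; that wiring is left out here.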

DynamicConfig

Hot-reloadable at runtime via ArcSwap. Changes take effect immediately for new requests.

Field	Type	Description
`sites`	`Vec<SiteConfig>`	Site definitions (hostname → upstream mapping)
`rate_limit.requests_per_second`	`u32`	Rate limit per IP (global in Phase 1)
`rate_limit.burst`	`u32`	Burst capacity (global in Phase 1)
`body_limit_bytes`	`u64`	Max request body size in bytes (global in Phase 1; set as `limit_bytes` under `[body]` in the TOML file)

SiteConfig:

Field	Type	Description
`host`	`String`	Hostname to match (e.g., `"git.alk.dev"`)
`upstream`	`String`	Upstream address (e.g., `"127.0.0.1:3000"`)
`upstream_scheme`	`"http"` or `"https"`	Protocol for upstream connection (default: `"http"`)
`upstream_connect_timeout_secs`	`u64`	TCP connect timeout in seconds (default: `5`; see ADR-015, ADR-017)
`upstream_request_timeout_secs`	`u64`	Full request timeout in seconds (default: `60`; see ADR-015, ADR-017)

Why these are dynamic: See ADR-008 for the rationale. Site definitions and rate limits are per-request concerns that should not require restarting the proxy or dropping active connections. Rate limits and body limits are global settings in Phase 1; per-site configuration for these is deferred to Phase 2.
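A std-only sketch of DynamicConfig and SiteConfig follows. Representing unset timeouts as `Option<u64>` and resolving them to a `Duration` at use time is an assumption of this sketch, as is the hypothetical `site_for_host` helper, which matches hostnames ASCII case-insensitively.

```rust
use std::time::Duration;

// Defaults from ADR-015.
const DEFAULT_CONNECT_TIMEOUT_SECS: u64 = 5;
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 60;

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum UpstreamScheme {
    #[default]
    Http,
    Https,
}

#[derive(Debug, Clone)]
pub struct SiteConfig {
    pub host: String,
    pub upstream: String,
    pub upstream_scheme: UpstreamScheme,
    /// `None` means "not set in config.toml", so the default applies.
    pub upstream_connect_timeout_secs: Option<u64>,
    pub upstream_request_timeout_secs: Option<u64>,
}

impl SiteConfig {
    pub fn connect_timeout(&self) -> Duration {
        Duration::from_secs(self.upstream_connect_timeout_secs.unwrap_or(DEFAULT_CONNECT_TIMEOUT_SECS))
    }

    pub fn request_timeout(&self) -> Duration {
        Duration::from_secs(self.upstream_request_timeout_secs.unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS))
    }
}

#[derive(Debug, Clone)]
pub struct RateLimit {
    pub requests_per_second: u32,
    pub burst: u32,
}

/// Everything in this struct is replaced as one unit on reload.
#[derive(Debug, Clone)]
pub struct DynamicConfig {
    pub sites: Vec<SiteConfig>,
    pub rate_limit: RateLimit, // global in Phase 1
    pub body_limit_bytes: u64, // global in Phase 1
}

impl DynamicConfig {
    /// Hypothetical lookup helper: hostnames compared ASCII case-insensitively.
    pub fn site_for_host(&self, host: &str) -> Option<&SiteConfig> {
        self.sites.iter().find(|s| s.host.eq_ignore_ascii_case(host))
    }
}
```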

Config Reload

ArcSwap Pattern

DynamicConfig is wrapped in Arc<ArcSwap<DynamicConfig>>. This provides:

Lock-free reads: Every handler obtains the current config with a single atomic load from the ArcSwap, which takes no lock, so there is no lock contention on the request hot path.
Atomic writes: ConfigReloadHandle::reload(new_config) replaces the entire config in one atomic swap. Every request that loads the config after the swap sees the new version.
No partial updates: The entire config is swapped at once. There's no risk of reading a half-updated config.

See ADR-008 for the rationale behind holding DynamicConfig in an ArcSwap.
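To illustrate these snapshot semantics in std-only Rust, the sketch below deliberately swaps in `RwLock<Arc<_>>` for the arc-swap crate's `ArcSwap`. That substitution is not the design itself: readers here take a briefly held read lock, whereas ArcSwap reads are lock-free. The behavior it demonstrates is the same: a request keeps the `Arc` it loaded, and `reload` replaces the whole config in one pointer swap. The single-field `DynamicConfig` is a placeholder.

```rust
use std::sync::{Arc, RwLock};

/// Placeholder for the real DynamicConfig.
#[derive(Debug)]
pub struct DynamicConfig {
    pub body_limit_bytes: u64,
}

/// std-only stand-in for Arc<ArcSwap<DynamicConfig>>.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    current: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Snapshot for one request: holding this Arc keeps the config the
    /// request started with, even if a reload happens meanwhile.
    pub fn load(&self) -> Arc<DynamicConfig> {
        // Only the cloned Arc escapes; the read lock is released before returning.
        Arc::clone(&*self.current.read().unwrap())
    }

    /// Replaces the whole config in one pointer swap, so readers never
    /// observe a partially updated value.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(new_config);
    }
}
```

With the arc-swap crate and `current: Arc<ArcSwap<DynamicConfig>>`, `load` would become `self.current.load_full()` and `reload` would become `self.current.store(Arc::new(new_config))`.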

Reload Trigger

Config reload is triggered by two mechanisms:

SIGHUP: Re-reads the config file, validates it, and swaps DynamicConfig if valid. Simple and well-understood, but it gives the sender no feedback on success or failure; the outcome appears only in the log.
Admin socket: Sending the reload command over the admin Unix domain socket performs the same action as SIGHUP but returns a structured response indicating success, or failure with an error message. See ADR-014 for details.

Both mechanisms converge on the same code path:

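1. Re-read the config file from disk.
2. Deserialize it into DynamicConfig.
3. Validate it (an upstream reachability check is optional).
4. If validation passes, call ConfigReloadHandle::reload(new_config); otherwise keep the current config and report the error.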
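Both triggers can share one function. In the sketch below, `reload_from_file` and `ReloadOutcome` are hypothetical names. Because serde and the toml crate are outside std, deserialization, validation, and the swap are passed in as closures that stand in for the real deserializer, the Validation rules, and `ConfigReloadHandle::reload`.

```rust
use std::fs;
use std::path::Path;

/// Result of one reload attempt.
#[derive(Debug, PartialEq)]
pub enum ReloadOutcome {
    Applied,
    Rejected(String),
}

/// Shared reload path for SIGHUP and the admin socket `reload` command.
pub fn reload_from_file<C>(
    path: &Path,
    parse: impl Fn(&str) -> Result<C, String>,
    validate: impl Fn(&C) -> Result<(), String>,
    apply: impl FnOnce(C),
) -> ReloadOutcome {
    // 1. Re-read the config file from disk.
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) => return ReloadOutcome::Rejected(format!("cannot read {}: {e}", path.display())),
    };
    // 2. Deserialize into a candidate config.
    let candidate = match parse(&text) {
        Ok(c) => c,
        Err(e) => return ReloadOutcome::Rejected(format!("parse error: {e}")),
    };
    // 3. Validate; returning early leaves the active config in place.
    if let Err(e) = validate(&candidate) {
        return ReloadOutcome::Rejected(format!("validation failed: {e}"));
    }
    // 4. Swap in the new config.
    apply(candidate);
    ReloadOutcome::Applied
}
```

The admin socket handler would turn a `ReloadOutcome` into its structured response, while the SIGHUP handler would only log it.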

TOML Config Format

# reverse-proxy config

[server]
bind_addr = "203.0.113.10"  # Replace with actual bind address
http_port = 80
https_port = 443
health_check_port = 9900     # Local health check (0 to disable)
admin_socket_path = "/run/reverse-proxy/admin.sock"  # Empty string to disable

[server.tls]
mode = "acme"                    # "acme" or "manual"
acme_domains = ["git.alk.dev", "alk.dev"]
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
acme_directory = "production"    # "production" or "staging"

# Manual mode (uncomment these lines and comment out the ACME settings above)
# mode = "manual"
# cert_path = "/etc/letsencrypt/live/git.alk.dev/fullchain.pem"
# key_path = "/etc/letsencrypt/live/git.alk.dev/privkey.pem"

[server.logging]
level = "info"
format = "text"                  # "text" or "json"

[rate_limit]
requests_per_second = 10
burst = 20

[body]
limit_bytes = 104857600          # 100 MiB (100 * 1024 * 1024 bytes)

[[sites]]
host = "git.alk.dev"
upstream = "127.0.0.1:3000"
upstream_scheme = "http"
# upstream_connect_timeout_secs = 5    # Default: 5s
# upstream_request_timeout_secs = 60   # Default: 60s

[[sites]]
host = "alk.dev"
upstream = "127.0.0.1:8080"
upstream_scheme = "http"

Validation

On startup, the config is validated:

bind_addr must not be 0.0.0.0 (an explicit address is required; see ADR-016)
In ACME mode, acme_domains must be non-empty
In manual mode, cert_path and key_path must both be set and the files must be readable
Each site must have a host and upstream
Site host values must be unique (no duplicate hostnames)
rate_limit.requests_per_second must be > 0
body.limit_bytes must be > 0

On reload (via SIGHUP or the admin socket), the same validation applies. If the new config fails validation, the reload is rejected, the old config remains active, and the error is logged; the admin socket also returns it in its response.

On startup, if config validation fails, the process logs the validation errors and exits with a non-zero code. The proxy will not start with an invalid configuration.
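The checks above can be sketched as one function that collects every violation instead of stopping at the first, so startup can log them all before exiting. The `Config`, `TlsMode`, and `Site` types are a hypothetical flattened view of the fields these rules touch. Treating an unparseable `bind_addr` as an error and comparing hostnames case-insensitively are assumptions of this sketch, and only the literal IPv4 `0.0.0.0` is rejected, as the rule states.

```rust
use std::collections::HashSet;
use std::fs::File;
use std::net::{IpAddr, Ipv4Addr};

pub enum TlsMode {
    Acme { domains: Vec<String> },
    Manual { cert_path: String, key_path: String },
}

pub struct Site {
    pub host: String,
    pub upstream: String,
}

/// Hypothetical flattened view of the fields the validation rules touch.
pub struct Config {
    pub bind_addr: String,
    pub tls: TlsMode,
    pub sites: Vec<Site>,
    pub requests_per_second: u32,
    pub body_limit_bytes: u64,
}

/// Returns every rule violation; an empty Vec means the config is valid.
pub fn validate(cfg: &Config) -> Vec<String> {
    let mut errors = Vec::new();

    // ADR-016: an explicit bind address is required.
    match cfg.bind_addr.parse::<IpAddr>() {
        Ok(ip) if ip == IpAddr::V4(Ipv4Addr::UNSPECIFIED) => {
            errors.push("bind_addr must not be 0.0.0.0 (ADR-016)".to_string())
        }
        Ok(_) => {}
        Err(_) => errors.push(format!("bind_addr {:?} is not an IP address", cfg.bind_addr)),
    }

    match &cfg.tls {
        TlsMode::Acme { domains } if domains.is_empty() => {
            errors.push("tls.acme_domains must be non-empty in ACME mode".to_string())
        }
        TlsMode::Acme { .. } => {}
        TlsMode::Manual { cert_path, key_path } => {
            for (name, path) in [("tls.cert_path", cert_path), ("tls.key_path", key_path)] {
                if path.is_empty() {
                    errors.push(format!("{name} must be set in manual mode"));
                } else if let Err(e) = File::open(path) {
                    // Opening the file is how this sketch checks readability.
                    errors.push(format!("{name} {path:?} is not readable: {e}"));
                }
            }
        }
    }

    let mut seen_hosts = HashSet::new();
    for (i, site) in cfg.sites.iter().enumerate() {
        if site.host.is_empty() || site.upstream.is_empty() {
            errors.push(format!("sites[{i}] needs both host and upstream"));
        }
        // insert() returns false when the host was already present.
        if !site.host.is_empty() && !seen_hosts.insert(site.host.to_ascii_lowercase()) {
            errors.push(format!("duplicate site host {:?}", site.host));
        }
    }

    if cfg.requests_per_second == 0 {
        errors.push("rate_limit.requests_per_second must be > 0".to_string());
    }
    if cfg.body_limit_bytes == 0 {
        errors.push("body.limit_bytes must be > 0".to_string());
    }
    errors
}
```

At startup, a non-empty result would be logged and followed by a non-zero exit (for example `std::process::exit(1)`); on reload, the same list would become the rejection message.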

Design Decisions

Design decisions are documented as ADRs in the decisions/ directory. Those relevant to configuration are:

ADR	Decision	Summary
003	TOML configuration format	Rust-native, unambiguous, excellent serde support
008	Static/dynamic config split	Immutable StaticConfig, hot-reloadable DynamicConfig via ArcSwap
010	Multi-site in Phase 1	Multiple domains from initial release
011	Multi-domain TLS config	Single SAN certificate covering all domains
013	Health check on separate local port	Localhost-only HTTP health check, configurable port
014	Unix domain socket config reload API	Programmatic reload with success/failure feedback
015	Per-site upstream timeouts with defaults	5s connect / 60s request defaults, per-site overrides
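016	Explicit bind address required	Rejects `0.0.0.0` to prevent accidental exposure
017	Upstream connection defaults	HTTP/1.1, no redirects, connection pooling
018	100 MB body size limit	Matches nginx; Gitea compatibility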

Open Questions

Open questions are tracked in open-questions.md. Key questions affecting this document:

OQ-04: Should config reload support a Unix domain socket API in addition to SIGHUP? (resolved — ADR-014: Unix domain socket admin API added)
OQ-07: Should per-site TLS overrides be supported for mixed ACME/manual domains? (open)
