
glm-5.1 9a2352e61c Resolve 5 open questions, add 7 ADRs for previously undocumented decisions

Resolve open questions:
- OQ-01: Restrict cipher suites to match nginx scope (4 ECDHE-AES-GCM
  suites for TLS 1.2 + all TLS 1.3 suites) — ADR-012
- OQ-03: Health check on separate local port (default 9900, localhost
  only) — ADR-013
- OQ-04: Add Unix domain socket admin API for config reload alongside
  SIGHUP, with structured success/failure responses — ADR-014
- OQ-06: Per-site upstream timeouts with defaults (5s connect, 60s
  request), overridable in SiteConfig — ADR-015

Document previously undocumented decisions flagged by architecture review:
- ADR-016: Explicit bind address requirement (reject 0.0.0.0)
- ADR-017: Upstream connection defaults (HTTP/1.1, no redirects, pooling)
- ADR-018: 100 MB body size limit (matches nginx, Gitea compatibility)

OQ-07 (per-site TLS overrides) remains open for future consideration.

Spec updates:
- config.md: add health_check_port, admin_socket_path, per-site timeout
  fields, update TOML example and validation rules
- proxy.md: reference ADR-015/017/018 for timeouts, connection defaults,
  and body limit decisions
- tls.md: replace OQ-01 cipher suite section with ADR-012 decision
- operations.md: add local health check port section, admin socket reload
- overview.md: update Phase 1 scope with new features, add ADR references
- open-questions.md: resolve OQ-01/03/04/06, keep OQ-07 open

2026-06-11 09:07:36 +00:00

2.5 KiB


ADR-015: Per-Site Upstream Timeouts with Defaults

Status

Accepted

Context

The proxy forwards requests to upstream services. Connection and request timeouts affect reliability: if they are too short, legitimate slow responses fail; if they are too long, stalled connections can exhaust proxy resources.

Phase 1 initially specified global timeout defaults (5s connect, 60s request) for all upstreams. However, different upstream services have different latency profiles:

- Gitea (git.alk.dev): clone and push operations on large repositories transfer large pack files, so 60s may be insufficient
- Deno/Fresh (alk.dev): fast responses are expected, so 60s is generous

Global timeouts don't accommodate these differences. Per-site timeout configuration allows tuning for each upstream without affecting others.

Decision

Add optional per-site upstream timeout configuration to SiteConfig. When a field is omitted, its default from the table below applies.

SiteConfig additions:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `upstream_connect_timeout_secs` | `u64` | `5` | TCP connection timeout in seconds |
| `upstream_request_timeout_secs` | `u64` | `60` | Full request timeout in seconds |

These are part of DynamicConfig (hot-reloadable via ArcSwap) since they affect per-request behavior and should not require a restart to change.
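A minimal sketch of how the two fields and their defaults might be modeled. Only the field names and the 5s/60s defaults come from this ADR. Representing the fields as `Option<u64>`, the accessor methods, and the constant names are assumptions made for illustration, and the real SiteConfig has other fields that are not shown.

```rust
use std::time::Duration;

/// Defaults stated in this ADR.
const DEFAULT_CONNECT_TIMEOUT_SECS: u64 = 5;
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 60;

/// Hypothetical subset of SiteConfig: only the timeout fields are shown.
/// `None` means the field was omitted from the site's configuration.
#[derive(Debug, Clone, Default)]
pub struct SiteConfig {
    pub upstream_connect_timeout_secs: Option<u64>,
    pub upstream_request_timeout_secs: Option<u64>,
}

impl SiteConfig {
    /// TCP connect timeout, falling back to the 5s default.
    pub fn connect_timeout(&self) -> Duration {
        Duration::from_secs(
            self.upstream_connect_timeout_secs
                .unwrap_or(DEFAULT_CONNECT_TIMEOUT_SECS),
        )
    }

    /// Full request timeout, falling back to the 60s default.
    pub fn request_timeout(&self) -> Duration {
        Duration::from_secs(
            self.upstream_request_timeout_secs
                .unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS),
        )
    }
}
```

With this shape, a site such as Gitea could set only `upstream_request_timeout_secs` (for example to an illustrative 600) and still inherit the 5s connect default.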

Rationale

- Different upstreams have different latency profiles: Gitea pushes with large pack files need more time than a fast static site
- The 60s request default matches nginx's default `proxy_read_timeout` of 60s. The 5s connect default is shorter than nginx's 60s `proxy_connect_timeout` default, so an unreachable upstream fails fast. HAProxy has no built-in timeout defaults; they must be set explicitly in its configuration
- Making these per-site rather than global allows tuning one upstream without side effects on the others
- Keeping per-site overrides in DynamicConfig means timeout changes don't require restarts
- The defaults are reasonable for most services; explicit configuration is only needed for outliers

Consequences

Positive:

- Each upstream can be tuned independently
- Defaults work for most cases, so explicit configuration is optional
- Hot-reloadable (part of DynamicConfig)
- Consistent with other reverse proxies, which allow timeouts to be set per location (nginx) or per backend (HAProxy)

Negative:

- Two more fields per site in config (mitigated by sensible defaults)
- The proxy must read the site's timeouts from the current config on each request. That lookup is already required for routing, so it adds no overhead
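The per-request lookup and hot reload described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `std::sync::RwLock<Arc<_>>` stands in for ArcSwap so the example needs only the standard library, and `SiteTimeouts`, `ConfigHandle`, and `timeouts_for` are hypothetical names. Only `DynamicConfig` is named in this ADR, and its real shape is not shown here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::Duration;

/// Hypothetical resolved timeouts for one site (defaults already applied).
#[derive(Debug, Clone)]
pub struct SiteTimeouts {
    pub connect: Duration,
    pub request: Duration,
}

/// Hypothetical shape: sites keyed by host name.
#[derive(Debug, Default)]
pub struct DynamicConfig {
    pub sites: HashMap<String, SiteTimeouts>,
}

/// Stand-in for an ArcSwap-style handle: readers take a cheap snapshot,
/// and a reload replaces the whole config at once.
pub struct ConfigHandle {
    current: RwLock<Arc<DynamicConfig>>,
}

impl ConfigHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Called once per request; the returned Arc stays valid across reloads.
    pub fn snapshot(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().expect("config lock poisoned");
        Arc::clone(&guard)
    }

    /// Called on config reload; in-flight requests keep their old snapshot.
    pub fn reload(&self, next: DynamicConfig) {
        let mut guard = self.current.write().expect("config lock poisoned");
        *guard = Arc::new(next);
    }
}

/// Look up a site's timeouts in the snapshot already used for routing.
pub fn timeouts_for(cfg: &DynamicConfig, host: &str) -> Option<SiteTimeouts> {
    cfg.sites.get(host).cloned()
}
```

A request that took its snapshot before a reload finishes with the timeouts it started with, while requests that arrive afterwards see the new values.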

References

proxy.md
config.md
OQ-06 (now resolved)
