Triage implementation review findings and update architecture specs

Analyzed 29 findings from the implementation review (002-implementation-review.md)
and identified 8 architecture-level concerns requiring spec changes:

Architecture gaps addressed:
- C2: Added acme_contact field to config.md, tls.md, and operations.md.
  Let's Encrypt requires a contact email for production; the spec was missing
  this required field.
- C4: Added StaticConfig drift tracking requirement to config.md reload
  section. ConfigReloadHandle must update its stored StaticConfig after each
  successful reload to prevent stale warnings.
- W1: Updated shutdown sequence in operations.md to specify that server tasks
  should be joined (not aborted) during the drain window.
- W5: Added health check path collision note to proxy.md.
- W13: Clarified that access logging is always-on in operations.md.
- W14: Updated X-Forwarded-Proto description in proxy.md to clarify that it
  is always 'https' since the HTTP listener redirects rather than proxies.

New open questions added:
- OQ-08: Should /health use a less common path to avoid upstream collision?
- OQ-09: How should upstream_connect_timeout_secs be enforced?
- OQ-10: Should ACME contact email be a required config field?
- OQ-11: How should X-Forwarded-Proto be derived per-listener?
- OQ-12: Should request access logging be mandatory or optional?

The remaining 21 findings are implementation-level bugs, code quality issues,
or Phase 2 improvements that don't require architecture spec changes.
2026-06-11 15:04:09 +00:00
parent 5478df7ab7
commit 68d27c4789
6 changed files with 135 additions and 10 deletions

@@ -116,6 +116,7 @@ will be handled via signal-based or built-in rotation.
| `tls.acme_domains` | `Vec<String>` | Domains for ACME SAN certificate (ACME mode only) |
| `tls.acme_cache_dir` | `String` | ACME state cache directory |
| `tls.acme_directory` | `"production"` or `"staging"` | Let's Encrypt directory |
| `tls.acme_contact` | `String` | Contact email for ACME registration (e.g., `"mailto:admin@example.com"`). Required for production; Let's Encrypt rejects registrations without a contact email. See OQ-10. |
| `tls.cert_path` | `String` | Certificate file path (manual mode only) |
| `tls.key_path` | `String` | Private key file path (manual mode only) |
@@ -174,6 +175,7 @@ Phase 2.
| `listeners[].http_port` | `u16` | `80` | No |
| `listeners[].https_port` | `u16` | `443` | No |
| `listeners[].tls.acme_directory` | `String` | `"production"` | No |
| `listeners[].tls.acme_contact` | `String` | — | Yes (ACME mode only) |
| `sites[].upstream_scheme` | `String` | `"http"` | No |
| `sites[].upstream_connect_timeout_secs` | `u64` | `5` | No |
| `sites[].upstream_request_timeout_secs` | `u64` | `60` | No |
@@ -236,6 +238,13 @@ effect. This gives operators early feedback about config drift.
Only the DynamicConfig portion is swapped via ArcSwap. StaticConfig changes
require a process restart to take effect.

**Important**: The `ConfigReloadHandle` must track the last-known StaticConfig
so that it can correctly detect changes on subsequent reloads. After each
successful reload, the stored StaticConfig is updated with the new value (via
`ArcSwap<StaticConfig>` or similar interior mutability). This prevents stale
warnings: if the same static config change is present on two consecutive
reloads, the operator should see the warning only once, not on every reload.
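
The drift-tracking rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `StaticConfig` fields and the `note_successful_reload` method are hypothetical, and a `std::sync::Mutex` stands in for the `ArcSwap<StaticConfig>` the spec suggests so that the sketch needs only the standard library.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the spec's StaticConfig; the real fields may differ.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StaticConfig {
    pub http_port: u16,
    pub https_port: u16,
}

/// Only the drift-tracking part of the handle is sketched here.
pub struct ConfigReloadHandle {
    /// StaticConfig seen at startup or at the last successful reload.
    last_seen_static: Mutex<StaticConfig>,
}

impl ConfigReloadHandle {
    pub fn new(initial: StaticConfig) -> Self {
        Self {
            last_seen_static: Mutex::new(initial),
        }
    }

    /// Call after a reload has passed validation (assumed method name).
    /// Returns true when a "restart required" warning should be logged,
    /// i.e. when the static portion differs from the last one seen.
    pub fn note_successful_reload(&self, new_static: &StaticConfig) -> bool {
        // Recover the guard even if another thread panicked while holding it.
        let mut last = self
            .last_seen_static
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        if *last == *new_static {
            return false; // nothing new since the last reload: stay quiet
        }
        // Remember the new value so the next reload does not warn again.
        *last = new_static.clone();
        true
    }
}
```

With this shape, reloading the same edited file twice produces one warning. One consequence of comparing against the last-seen value rather than the running value is that reverting the file to its original contents also counts as a change; whether that should warn is a design choice the spec text does not settle.
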

### Reload Serialization

Reload operations are serialized using a `tokio::sync::Mutex` on the reload
@@ -290,6 +299,7 @@ mode = "acme"
acme_domains = ["git.alk.dev"]
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache-git"
acme_directory = "production"
acme_contact = "mailto:admin@alk.dev"
[[listeners.sites]]
host = "git.alk.dev"
@@ -347,6 +357,7 @@ mode = "acme"
acme_domains = ["git.alk.dev", "alk.dev"]
acme_cache_dir = "/var/lib/reverse-proxy/acme-cache"
acme_directory = "production"
acme_contact = "mailto:admin@alk.dev"
[[listeners.sites]]
host = "git.alk.dev"
@@ -391,6 +402,9 @@ On startup, the config is validated:
`10.0.0.5` (missing port). The `upstream_scheme` field handles the protocol.
18. `upstream_scheme` values are case-sensitive: only `"http"` or `"https"`
(lowercase). Default is `"http"`.
19. In ACME mode, `tls.acme_contact` must be a valid `mailto:` URI
(e.g., `"mailto:admin@example.com"`). Let's Encrypt requires a contact
email for production certificate requests.
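
Validation rule 19 could be checked along these lines. This is a rough sketch under stated assumptions: `validate_acme_contact` and `ContactError` are hypothetical names, the scheme is matched as lowercase `mailto:` only, and the address check is deliberately shallow (one `@`, non-empty parts) rather than a full address parser. Rejecting `,` and `?` mirrors the ACME rule that servers do not accept `mailto` URLs with multiple addresses or header fields.

```rust
/// Hypothetical error type for the contact check.
#[derive(Debug, PartialEq, Eq)]
pub enum ContactError {
    /// The value does not start with "mailto:".
    MissingScheme,
    /// Nothing follows the "mailto:" prefix.
    EmptyAddress,
    /// The address part is not a single plausible user@domain.
    InvalidAddress,
}

/// Shallow check that `contact` looks like "mailto:user@domain".
pub fn validate_acme_contact(contact: &str) -> Result<(), ContactError> {
    let addr = contact
        .strip_prefix("mailto:")
        .ok_or(ContactError::MissingScheme)?;
    if addr.is_empty() {
        return Err(ContactError::EmptyAddress);
    }
    // Split at the first '@'; a second '@' in the remainder is rejected below.
    let (local, domain) = addr
        .split_once('@')
        .ok_or(ContactError::InvalidAddress)?;
    let plausible = !local.is_empty()
        && !domain.is_empty()
        && !domain.contains('@')
        // No whitespace, no address lists (','), no header fields ('?').
        && !addr.chars().any(|c| c.is_whitespace() || c == ',' || c == '?');
    if plausible {
        Ok(())
    } else {
        Err(ContactError::InvalidAddress)
    }
}
```

A check like this would run only when the listener's TLS mode is ACME, both at startup and on reload, since the same validation applies in both cases.
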
On SIGHUP reload, the same validation applies. If the new config fails
validation, the reload is rejected and the old config remains active. An error