Add architectural recommendation to replace Unix socket with authenticated HTTP

This commit is contained in:
2026-06-14 09:22:41 +00:00
parent 38649e9485
commit 0af1683c49

View File

@@ -627,6 +627,169 @@ that clearly states:
between startup and reload validation. between startup and reload validation.
9. **Remaining W and S findings** — Fix opportunistically. 9. **Remaining W and S findings** — Fix opportunistically.
## Architectural Recommendation: Replace Unix Domain Socket with Authenticated HTTP Endpoint
### Problem
Every finding in this review (C1C3, W1W7, S1S6) stems from the fundamental
design choice of using a Unix domain socket for the admin API. Unix sockets
bring an entire class of vulnerabilities that don't exist in HTTP endpoints:
- **Filesystem-based attack surface** — symlink races, stale socket cleanup,
path traversal, permission management, directory mount issues in containers
- **No built-in authentication** — filesystem permissions are the only access
control, which breaks in container environments (volume mounts, tmpfs quirks,
umask variations) and provides no defense against any local user
- **Custom protocol** — line-based command protocol instead of standard HTTP,
requiring `socat` or custom tooling instead of `curl`
- **Operational complexity** — socket cleanup on startup, stale socket detection,
shutdown cleanup, container volume mounts for socket access
### Existing Infrastructure
The proxy already has a localhost-only HTTP listener (`src/health.rs`) bound to
`127.0.0.1:9900` that serves `/health`. This listener:
- Is already axum-based with a router
- Is already restricted to localhost
- Already supports middleware layers
- Already has integration tests
### Proposed Replacement
Replace the Unix domain socket admin API with authenticated HTTP endpoints on
the existing health check listener:
```
GET http://127.0.0.1:9900/admin/reload → triggers config reload
GET http://127.0.0.1:9900/admin/status → returns uptime + site count
```
Authentication via a Bearer token supplied at startup (config or env var
`ADMIN_KEY`), validated using constant-time comparison (`subtle` crate) to
prevent timing attacks. This is the same pattern used in
`/workspace/embedding_service/src/auth.rs`:
```rust
use subtle::ConstantTimeEq;
pub struct AdminAuthConfig {
pub admin_key: Option<String>, // None = admin endpoints disabled
}
pub async fn admin_auth_middleware(
State(auth_config): State<Arc<AdminAuthConfig>>,
mut request: Request,
next: Next,
) -> Result<Response, StatusCode> {
let expected = auth_config.admin_key.as_deref().unwrap_or_default();
let provided = request
.headers()
.get(header::AUTHORIZATION)
.and_then(|h| h.to_str().ok())
.and_then(|h| h.strip_prefix("Bearer "));
match provided {
Some(key) if key.as_bytes().ct_eq(expected.as_bytes()).into() => {
request.headers_mut().remove(header::AUTHORIZATION);
Ok(next.run(request).await)
}
_ => Err(StatusCode::UNAUTHORIZED),
}
}
```
### Config Change
Replace `admin_socket_path` (StaticConfig) with `admin_key` (StaticConfig or
environment variable):
```toml
# Before
admin_socket_path = "/run/reverse-proxy/admin.sock"
# After
admin_key = "" # empty = admin endpoints disabled (return 404)
# set to a token to enable admin endpoints
```
Or via environment variable:
```
ADMIN_KEY=your-secure-random-token-here
```
### Why This Eliminates Every Finding
| Finding | Socket-specific? | Eliminated by HTTP? |
|---------|-------------------|---------------------|
| C1 (symlink race) | Yes | No filesystem path management at all |
| C2 (no auth) | Yes | Bearer token with constant-time comparison |
| C3 (info leak) | Partially | Generic HTTP error responses, no paths |
| W1 (no conn limit) | Partially | axum/TCP backlog handles this naturally |
| W2 (config TOCTOU) | No | Still exists (same file read pattern) |
| W3 (path validation) | Yes | No socket path to validate |
| W4 (is_socket_active) | Yes | No stale socket detection needed |
| W5 (wildcard flag) | No | Still exists (reload validation) |
| W6 (changed_fields) | No | Still exists (operational UX) |
| W7 (health port recon) | Partially | Merged into admin auth |
### What Changes
**Removed:**
- `src/admin/socket.rs` — entire file (826 lines)
- `src/admin/mod.rs` — re-exports
- `admin_socket_path` config field
- Socket cleanup/stale detection logic
- `socat`-based operational procedures
- Docker volume mount for `/run/reverse-proxy`
**Added:**
- `src/admin/auth.rs` — Bearer token middleware with `subtle::ConstantTimeEq`
- `src/admin/handler.rs` — HTTP handlers for `/admin/reload` and `/admin/status`
- `admin_key` config field (or env var)
- Admin route registration on the health check listener
**Modified:**
- `src/health.rs` — add admin routes behind auth middleware
- `src/main.rs` — remove socket init, add admin route setup
- `src/config/static_config.rs` — replace `admin_socket_path` with `admin_key`
- `docs/architecture/decisions/014-unix-socket-reload.md` — superseded
- `deploy/docker-compose.yml` — remove socket volume mount
- `deploy/README.md` — replace `socat` commands with `curl` examples
**Unchanged:**
- `src/shutdown.rs` — SIGHUP reload path remains as-is
- `src/config/dynamic_config.rs``ConfigReloadHandle` still used by both
SIGHUP and admin reload
### Operational Comparison
```bash
# Before (Unix socket)
echo "reload" | socat - UNIX-CONNECT:/run/reverse-proxy/admin.sock
echo "status" | socat - UNIX-CONNECT:/run/reverse-proxy/admin.sock
# After (HTTP with Bearer token)
curl -H "Authorization: Bearer $ADMIN_KEY" http://127.0.0.1:9900/admin/reload
curl -H "Authorization: Bearer $ADMIN_KEY" http://127.0.0.1:9900/admin/status
```
HTTP is universally understood by CI/CD tools, load balancer health checks,
and monitoring systems. No special client (`socat`) required.
### SIGHUP Still Works
This change does **not** remove SIGHUP as a reload mechanism. SIGHUP reload
(`src/shutdown.rs:handle_sighup_reload`) continues to work exactly as before.
The admin HTTP endpoint provides the same functionality with the addition of
structured feedback and authentication.
### Recommendation
Mark C1, C2, C3, W1, W3, W4, S1S6 as **resolved by architectural change**.
Implement W2 and W5 as standalone fixes regardless. Schedule the socket-to-HTTP
migration as the primary implementation task for the next development round.
## Additional Notes ## Additional Notes
The unusual traffic on this file was likely driven by interest in C1 and C2. The unusual traffic on this file was likely driven by interest in C1 and C2.