diff --git a/docs/reviews/004-post-fix-review.md b/docs/reviews/004-post-fix-review.md new file mode 100644 index 0000000..9face98 --- /dev/null +++ b/docs/reviews/004-post-fix-review.md @@ -0,0 +1,154 @@ +--- +status: draft +last_updated: 2026-06-12 +reviewed_code: + - src/main.rs + - src/server.rs + - src/cli.rs + - src/config/static_config.rs + - src/config/validation.rs + - src/proxy/handler.rs + - src/proxy/error.rs + - src/rate_limit/mod.rs + - src/rate_limit/bucket.rs + - src/admin/socket.rs + - src/logging/mod.rs + - src/logging/format.rs + - src/tls/acceptor.rs + - src/tls/config.rs + - tests/integration_test.rs + - tests/helpers/http_test_helper.rs +reviewer: code-reviewer +based_on: docs/reviews/003-security-and-bug-review.md +--- + +# Post-Fix Review #004 + +## Purpose + +Verify that the 4 critical, 12 warning, and 10 suggestion findings from review +#003 were correctly addressed. Flag any regressions or remaining issues. + +## Critical Findings — All Fixed + +### C1. Rate Limiter IP Source ✓ + +`src/rate_limit/mod.rs:66-73` — Now uses `ConnectInfo` exclusively. No fallback +to client-supplied `X-Forwarded-For`. Returns 429 if `ConnectInfo` is absent. +Three new tests verify: ConnectInfo-only extraction, rejection without +ConnectInfo, and XFF header ignored (same bucket for different XFF values). + +### C2. InFlightCounter Increment ✓ + +`src/server.rs:23-28,81` — `InFlightGuard::new()` now calls +`counter.increment()` on construction. The guard is created with +`InFlightGuard::new(in_flight.clone())` at line 81. Drain interval tightened +from 50ms to 100ms. + +### C3. Connect Timeout Ceiling ✓ + +`src/proxy/handler.rs:230,234,242` — `CONNECT_TIMEOUT_CEILING_SECS = 30` +replaces the hardcoded 5s. Per-site `tokio::time::timeout` can now enforce any +value up to 30s. + +### C4. JSON Format Without Log File ✓ + +`src/logging/mod.rs:55-56` — `.json()` added to the `None` branch of +`init_json`. + +--- + +## Warning Findings — All Fixed + +| # | Finding | Status | +|---|---------|--------| +| W1 | Upstream host validation | ✓ `is_valid_upstream` validates host part (IPv6 bracket, IP, hostname). 9 new tests. | +| W2 | ACME contact validation | ✓ Checks non-empty email with `@` after `mailto:`. 2 new tests (`mailto:` empty, no `@`). | +| W3 | Query string silently dropped | ✓ `build_upstream_uri` returns `Result`, logs warning, handler returns 502. | +| W4 | Admin socket resource limits | ✓ 5s read timeout, 4096 byte limit via `reader.take(4096)`. 3 new tests. | +| W5 | TlsMode wildcard mismatch | ✓ Wildcard arm removed. Count mismatch check added at `main.rs:190-196`. | +| W6 | RawConfig/FullConfig duplication | ✓ `RawConfig` removed. `load_config` uses `FullConfig::parse()` + `into_static_and_dynamic()`. | +| W7 | cleanup_stale_socket side effect | Not addressed (acceptable per review — Phase 1). | +| W8 | Misleading test name | ~ Test name unchanged despite commit claim. See N1 below. | +| W9 | Empty routing_table in test helper | ✓ `test_dynamic_config_with_limit` now uses `DynamicConfig::from_sites()`. | +| W10 | TokenBucket pub fields | ✓ `tokens`, `last_refill`, `rate`, `max` now private. `last_access` is `pub(crate)`. | +| W11 | reload_mutex exposed for testing | ✓ Gated with `#[cfg(test)]`. | +| W12 | http_port u32 vs u16 | ✓ `http_port` changed to `u16`. All `as u32` casts removed. | + +--- + +## Suggestions — Addressed + +| # | Suggestion | Status | +|---|-----------|--------| +| S1 | Remove dead code | ✓ `log_rate_limit!`, `log_config_reload!`, `format_event_fields`, 4 unused `ProxyError` variants, `build_multi_domain_server_config`, `SniCertResolver`, `TestUpstream::url()`/`upstream_addr()` all removed. `KvVisitor` gated with `#[cfg(test)]`. | +| S3 | Log root cert count | ✓ `root_certs()` logs count at info, warns if zero. | +| S4 | Admin socket resource limits | ✓ (same as W4) | +| S5 | Consolidate config types | ✓ (same as W6) | +| S6 | TokenBucket fields private | ✓ (same as W10) | +| S8 | Connect timeout ceiling | ✓ (same as C3) | +| S10 | ConnectInfo rate limit tests | ✓ 3 new tests in `src/rate_limit/mod.rs`. | +| S2, S7, S9 | Eviction config, status info, acme_contact Vec | Deferred (Phase 2 / config format change). | + +--- + +## New Findings + +### N1. Test Name `test_health_check_disabled_when_port_zero` Still Misleading + +**File**: `tests/integration_test.rs:82` + +**Problem**: Commit `855c0f1` claimed to rename this test but the name is +unchanged. Port 0 means "OS picks a random port" — the health check listener +still starts. The actual disable logic is `health_check_port > 0` in +`main.rs:94`. The test name implies it tests the disable path but it tests the +random-port binding path. + +**Severity**: Suggestion — rename to `test_health_check_binds_random_port_when_zero`. + +--- + +### N2. `FullConfig` Test Struct Duplicated in `static_config.rs` + +**File**: `src/config/static_config.rs:189-206` + +**Problem**: The test module defines its own `FullConfig` struct with +`#[allow(dead_code)]` that mirrors the production `FullConfig` in +`config/mod.rs`. If fields change, both must be updated. This is test +infrastructure only, so low risk. + +**Severity**: Suggestion — use the production `FullConfig` from `config/mod.rs` +in these tests, or accept the duplication as intentional test isolation. + +--- + +### N3. `is_valid_upstream` Accepts Bare IPv6 Without Brackets + +**File**: `src/config/validation.rs:327` + +**Problem**: `::1:3000` passes validation because `::1` parses as a valid +`IpAddr`. The config spec says IPv6 should use bracket notation +(`[::1]:3000`). In practice nobody writes bare IPv6 with a port, and the +address would still work, so this is extremely low risk. + +**Severity**: Suggestion — could tighten by requiring brackets for IPv6, but +not worth the churn. + +--- + +## Summary + +All 4 critical and 11 of 12 warning findings are properly fixed. W7 was +explicitly deferred as acceptable for Phase 1. W8 (test rename) was claimed +fixed but the name wasn't actually changed — minor. Three new suggestion-level +notes, none blocking. + +The codebase is in significantly better shape than at review #003. The rate +limiter spoofing vulnerability (C1) was the most important fix and it's done +correctly with good test coverage. + +| Severity | Count | +|----------|-------| +| Critical | 0 | +| Warning | 0 | +| Suggestion | 3 (new, minor) | diff --git a/docs/reviews/005-admin-socket-security-review.md b/docs/reviews/005-admin-socket-security-review.md new file mode 100644 index 0000000..9a512e1 --- /dev/null +++ b/docs/reviews/005-admin-socket-security-review.md @@ -0,0 +1,638 @@ +--- +status: draft +last_updated: 2026-06-14 +reviewed_code: + - src/admin/socket.rs + - src/admin/mod.rs + - src/main.rs + - src/shutdown.rs + - src/config/dynamic_config.rs + - src/config/validation.rs + - src/health.rs + - src/proxy/handler.rs + - src/proxy/headers.rs + - src/proxy/mod.rs + - src/rate_limit/mod.rs + - src/server.rs +reviewer: code-reviewer +based_on: docs/reviews/004-post-fix-review.md +--- + +# Admin Socket Security Review #005 + +## Purpose + +Focused security review of `src/admin/socket.rs` and related code paths, +triggered by unusual traffic patterns observed on the file in both public and +self-hosted git. The review examines the Unix domain socket admin interface for +vulnerabilities including symlink attacks, privilege escalation, information +disclosure, and DoS vectors. Broader codebase was also surveyed for related +issues. + +## Severity Definitions + +| Severity | Meaning | +|----------|---------| +| **Critical** | Will cause incorrect behavior or security issues in production | +| **Warning** | Could cause issues under specific conditions or represents a missed edge case | +| **Suggestion** | Code quality, style, or minor improvement opportunity | + +--- + +## Critical Findings + +### C1. Symlink Race in `cleanup_stale_socket` Enables Arbitrary File Deletion + +**File**: `src/admin/socket.rs:143-161` + +**Problem**: `cleanup_stale_socket` checks whether a socket file exists and +whether another process is listening on it, then removes the file. Between the +`is_socket_active()` check (which connects to the socket) and the +`remove_file()` call, an attacker with local access can replace the socket file +with a symlink pointing to an arbitrary path (e.g., `/etc/passwd`, a critical +database file). The `remove_file()` call then follows the symlink and deletes +the target: + +```rust +async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> { + let socket_path = Path::new(path); + if !socket_path.exists() { + return Ok(()); + } + + if is_socket_active(path).await { // check + // ... + return Err(AdminSocketError::SocketInUse(path.to_string())); + } + + warn!("removing stale socket file: {}", path); + tokio::fs::remove_file(path) // act — follows symlinks! + .await + .map_err(AdminSocketError::Io) +} +``` + +This is a classic TOCTOU (time-of-check/time-of-use) race. The window is small +but exploitable with local access, which is exactly the threat model for a Unix +domain socket (any local user can reach it). + +Additionally, `is_socket_active` works by **actually connecting** to the socket +(`src/admin/socket.rs:163-165`). If the path is a symlink to another service's +Unix socket, this creates a real connection in that service, which is an +unintended side effect that could be logged or trigger behavior in the other +service. + +**Solution**: Replace the check-then-remove sequence with a safe alternative: + +1. Use `std::fs::metadata()` to verify the file is actually a socket before + removal (sockets cannot be symlinked to — `metadata` does not follow + symlinks when `std::fs::symlink_metadata` is used, and sockets are a + distinct file type). + +2. Alternatively, use `std::fs::remove_file` only after verifying with + `symlink_metadata` that the file type is `FileType::is_socket()`: + +```rust +async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> { + let socket_path = Path::new(path); + if !socket_path.exists() { + return Ok(()); + } + + let metadata = std::fs::symlink_metadata(path) + .map_err(AdminSocketError::Io)?; + + if metadata.file_type().is_symlink() { + warn!("admin socket path {} is a symlink, refusing to remove", path); + return Err(AdminSocketError::BindFailed( + "socket path is a symlink, refusing to remove".to_string() + )); + } + + if !metadata.file_type().is_socket() { + warn!("admin socket path {} is not a socket file, refusing to remove", path); + return Err(AdminSocketError::BindFailed( + "path exists but is not a socket".to_string() + )); + } + + if is_socket_active(path).await { + warn!("socket file {} exists and another process is listening; disabling admin socket", path); + return Err(AdminSocketError::SocketInUse(path.to_string())); + } + + warn!("removing stale socket file: {}", path); + tokio::fs::remove_file(path) + .await + .map_err(AdminSocketError::Io) +} +``` + +This prevents both symlink attacks (refuses to remove symlinks) and accidental +removal of non-socket files. + +--- + +### C2. Admin Socket Has No Access Control — Any Local User Can Trigger Reload + +**File**: `src/admin/socket.rs:108-133,254-305` + +**Problem**: The admin socket accepts connections from any local user. There is +no authentication, no peer credential check, and no ownership/permission +restriction on the socket file itself. After `UnixListener::bind()`, the socket +inherits the process umask but no explicit restrictive permissions are set. + +Any local user who can reach the socket path can: + +1. **Trigger a config reload** (`reload` command) — re-reads the config file + from disk and hot-swaps the live routing table. If an attacker can write to + the config file path (e.g., via a separate misconfiguration or directory + permission issue), they can chain this: write a malicious config, then send + `reload` via the admin socket to activate it. This could redirect traffic + to an attacker-controlled upstream. + +2. **Read status** (`status` command) — reveals uptime and number of configured + sites. Minor information disclosure but useful for reconnaissance. + +The `reload` command is particularly dangerous because it reads the config file +from disk each time (`src/admin/socket.rs:257`). The config path is set at +startup from `StaticConfig` and cannot be changed at runtime, but the file +contents at that path can be modified by any process with write access. The +admin socket becomes a trigger mechanism for activating malicious configs. + +**Solution**: Multi-layered defense: + +1. Set restrictive permissions on the socket immediately after binding: + +```rust +use std::os::unix::fs::PermissionsExt; + +let listener = UnixListener::bind(socket_path)?; +let perms = std::fs::Permissions::from_mode(0o660); // owner + group only +std::fs::set_permissions(socket_path, perms)?; +``` + + Or `0o600` for owner-only access. This requires the proxy to run under a + dedicated user and for the admin tool (`socat`, etc.) to run as the same + user or group. + +2. Add peer credential checking using `SO_PEERCRED` on Linux: + +```rust +use std::os::unix::net::UnixStream; + +fn check_peer_uid(stream: &UnixStream) -> bool { + use std::os::unix::io::AsRawFd; + let uid = nix::sys::socket::getpeereid(stream.as_raw_fd()) + .map(|(_, uid, _)| uid); + match uid { + Ok(peer_uid) => peer_uid == 0 || peer_uid == get_current_uid(), + Err(_) => false, + } +} +``` + +3. Document that the socket path should be in a directory with restrictive + permissions (e.g., `/run/reverse-proxy/` owned by the proxy user with mode + `0700`). + +--- + +### C3. Admin Socket `reload` Response Leaks Filesystem Paths and Error Details + +**File**: `src/admin/socket.rs:257-265,268-276` + +**Problem**: When the `reload` command fails, the error response includes the +full `std::io::Error` or `toml::de::Error` message, which can contain absolute +filesystem paths, file permissions, and internal config structure details: + +```rust +Err(e) => { + return serde_json::to_string(&ErrorResponse { + status: "error", + message: format!("failed to read config file: {}", e), // leaks path + }) + .unwrap(); +} + +Err(e) => { + return serde_json::to_string(&ErrorResponse { + status: "error", + message: format!("failed to parse config file: {}", e), // leaks structure + }) + .unwrap(); +} +``` + +The same applies to the `unknown command` response at line 247, which echoes +arbitrary input back without sanitization: + +```rust +message: format!("unknown command: {}", command), +``` + +For an unauthenticated socket, this information disclosure helps an attacker +enumerate the system (filesystem layout, config syntax, software version via +error messages). + +**Solution**: Return generic error messages to the socket client and log the +details server-side: + +```rust +Err(e) => { + tracing::error!(error = %e, "failed to read config file"); + serde_json::to_string(&ErrorResponse { + status: "error", + message: "reload failed".to_string(), + }).unwrap() +} +``` + +For unknown commands, avoid echoing input: + +```rust +_ => serde_json::to_string(&ErrorResponse { + status: "error", + message: "unknown command".to_string(), +}).unwrap(), +``` + +--- + +## Warning Findings + +### W1. No Concurrency Limit on Admin Socket Connections + +**File**: `src/admin/socket.rs:108-133` + +**Problem**: Each accepted connection spawns a new tokio task (`tokio::spawn` at +line 114) with no concurrency limit. A local user with access to the socket can +open many simultaneous connections, spawning unlimited tasks. While the 5-second +read timeout and 4096-byte limit (added in review #004) mitigate the most +trivial DoS, a determined attacker can: + +- Open many connections simultaneously, each sending data slowly (within the + 5-second timeout), consuming memory and task slots. +- Send data up to 4096 bytes per connection — with enough concurrent + connections, this still consumes significant memory. + +**Solution**: Add a `tokio::sync::Semaphore` with a reasonable limit (e.g., 10 +concurrent connections): + +```rust +let semaphore = Arc::new(tokio::sync::Semaphore::new(10)); + +// In the accept loop: +let permit = match semaphore.clone().acquire_owned().await { + Ok(p) => p, + Err(_) => { + warn!("admin socket connection limit reached, dropping connection"); + continue; + } +}; +tokio::spawn(async move { + let _permit = permit; + handle_connection(stream, admin_socket).await; +}); +``` + +--- + +### W2. Config File TOCTOU: Reload Reads File Without Atomicity + +**File**: `src/admin/socket.rs:257`, `src/shutdown.rs:88` + +**Problem**: Both `handle_reload` and `handle_sighup_reload` read the config +file with `tokio::fs::read_to_string()`, then parse it. If another process is +writing to the config file at the same time (e.g., a configuration management +tool like Ansible writing a partial file), the proxy could read a partially +written config and apply it. This is a filesystem-level TOCTOU issue. + +Unlike the symlink race (C1), this is harder to exploit directly — the parse +will likely fail on a partial file, resulting in a reload error rather than a +bad config being applied. However, it could result in a brief window where the +operator sees confusing errors during config rotation. + +**Solution**: Use atomic file replacement — write to a temporary file in the +same directory, then rename over the target. Document this pattern for +operators. Alternatively, compute a checksum or stat the file before and after +reading to detect mid-write changes: + +```rust +let metadata_before = tokio::fs::metadata(&admin_socket.config_path).await?; +let config_content = tokio::fs::read_to_string(&admin_socket.config_path).await?; +let metadata_after = tokio::fs::metadata(&admin_socket.config_path).await?; + +if metadata_before.modified()? != metadata_after.modified()? { + return serde_json::to_string(&ErrorResponse { + status: "error", + message: "config file changed during read, please retry".to_string(), + }).unwrap(); +} +``` + +--- + +### W3. Admin Socket Path Is Not Validated or Sanitized + +**File**: `src/admin/socket.rs:81-83`, `src/config/static_config.rs:22-24` + +**Problem**: The `admin_socket_path` from the config file is used directly as a +filesystem path with no validation. A malicious or misconfigured path could +point to: + +- A path on a critical filesystem (e.g., `/etc/passwd`) +- An extremely long path (potential buffer issues in downstream code) +- A path with directory traversal (e.g., `../../etc/cron.d/malicious`) + +While the default `/run/reverse-proxy/admin.sock` is safe, the config is user- +controlled and loaded from disk. Combined with C2 (no authentication), a local +attacker who can write to the config file and trigger a reload could redirect +the admin socket to an arbitrary path. + +This is partially mitigated by the fact that `admin_socket_path` is in +`StaticConfig` (requires restart, not hot-reloadable), but the startup config +still trusts the path. + +**Solution**: Add validation that the socket path: + +1. Ends with `.sock` or `.socket` (or at least doesn't end with a suspicious + extension) +2. Is under a known-safe directory prefix (e.g., `/run/`, `/var/run/`, or a + configurable base directory) +3. Does not contain path traversal components (`..`) +4. Has a reasonable length limit + +```rust +fn validate_admin_socket_path(path: &str) -> Result<(), ValidationError> { + if path.is_empty() { return Ok(()); } // disabled is valid + if path.len() > 255 { + return Err(ValidationError::AdminSocketPathTooLong); + } + if path.contains("..") { + return Err(ValidationError::AdminSocketPathTraversal); + } + let path = Path::new(path); + if path.is_absolute() { + Ok(()) + } else { + Err(ValidationError::AdminSocketPathRelative) + } +} +``` + +--- + +### W4. `is_socket_active` Side-Effect on Other Processes + +**File**: `src/admin/socket.rs:163-165` + +**Problem**: This was flagged in review #003 (W7) and accepted as Phase 1. +However, in the context of the symlink attack in C1, this function becomes more +dangerous: if an attacker replaces the socket path with a symlink to another +service's socket, `is_socket_active` will connect to that service, which could +trigger behavior in the target service (e.g., accepting a connection, logging +it, starting a session). This amplifies the C1 symlink attack beyond file +deletion. + +**Solution**: See C1 — the proposed fix uses `symlink_metadata()` and checks +`is_socket()` before calling `is_socket_active`, which eliminates the symlink +attack surface. Additionally, the `is_socket_active` check should only be +reached after verifying the file is not a symlink and is actually a socket +file type. + +--- + +### W5. `reload` Command Does Not Validate Config Before Applying + +**File**: `src/admin/socket.rs:268-279`, `src/config/dynamic_config.rs:134-158` + +**Problem**: While `ConfigReloadHandle::reload()` calls `validate()` before +storing the new config, the validation does not check that the config file +itself hasn't been tampered with (e.g., via checksum). More critically, the +validation passes `cli_allow_wildcard_bind: false` during reload +(`validate(&new_static, &new_dynamic, false)` at line 141), but the startup +path may have passed `true` (via `--allow-wildcard-bind` CLI flag). This means +a config reload could tighten the wildcard bind restriction that was +intentionally relaxed via CLI, causing existing listeners on `0.0.0.0` to +continue running while the validation reports them as errors. + +This is a corner case — the running listeners are not stopped — but it creates +a confusing state where the config is rejected on reload even though it was +accepted at startup. + +**Solution**: Store the `cli_allow_wildcard_bind` flag in `ConfigReloadHandle` +or `StaticConfig` so that reload uses the same flag as startup: + +```rust +pub struct ConfigReloadHandle { + config: Arc>, + static_config: ArcSwap, + reload_mutex: Mutex<()>, + cli_allow_wildcard_bind: bool, // stored from startup +} +``` + +--- + +### W6. Reload Error Response Includes Static Config Change Warnings That Should Not Be Exposed + +**File**: `src/admin/socket.rs:286-293` + +**Problem**: On successful reload, if static config fields have changed, the +response includes a log at warning level but the admin socket only returns +`{"status": "ok"}`. However, the `reload_mutex` inside `ConfigReloadHandle` +already ensures only one reload runs at a time, but the `changed_fields` list +is only logged, not returned to the caller. An operator sending `reload` via +`socat` has no way to know if their reload actually changed static fields that +require a restart. This is an operational concern rather than a security issue. + +**Solution**: Include `changed_fields` in the reload response when non-empty: + +```rust +Ok(changed_fields) => { + if !changed_fields.is_empty() { + serde_json::to_string(&OkWithFieldsResponse { + status: "ok", + message: format!("static fields changed (restart required): {}", changed_fields.join(", ")), + }).unwrap() + } else { + serde_json::to_string(&OkResponse { status: "ok" }).unwrap() + } +} +``` + +--- + +### W7. Health Check Endpoint Exposes Service on Predictable Port + +**File**: `src/health.rs:17-36` + +**Problem**: The health check listener binds to `127.0.0.1` on the configured +port (default 9900). While binding to localhost is correct, the port is +predictable and not configurable to bind to a Unix socket instead. On shared +systems, any local process can connect to `127.0.0.1:9900/health` and confirm +the proxy is running. This is low risk (the response is just `200 OK` with no +body), but the endpoint is completely unauthenticated and could be used for +reconnaissance. + +More importantly, there's no rate limiting on the health endpoint. A local +attacker could flood it to consume connection resources, though this is +mitigated by the fact that it's localhost-only and a simple GET. + +**Solution**: Consider adding an option to bind the health check to a Unix +socket instead of a TCP port, or adding a shared-secret token header check. +For Phase 1, this is acceptable as-is since the endpoint is localhost-only and +returns minimal information. + +--- + +## Suggestions + +### S1. Add Connection Peer Logging to Admin Socket + +**File**: `src/admin/socket.rs:112` + +**Suggestion**: Log the peer credentials (UID, PID, GID) of each admin socket +connection using `SO_PEERCRED`. This provides an audit trail for reload +operations: + +```rust +use std::os::unix::io::AsRawFd; + +fn log_peer_credentials(stream: &tokio::net::UnixStream) { + if let Ok((_, uid)) = nix::sys::socket::getpeereid(stream.as_raw_fd()) { + info!(peer_uid = uid, "admin socket connection accepted"); + } +} +``` + +--- + +### S2. Use `AF_UNIX` Peer Credential Validation Instead of File Permissions + +**File**: `src/admin/socket.rs` + +**Suggestion**: Rather than (or in addition to) setting socket file permissions, +use `SO_PEERCRED`/`getpeereid` to validate the connecting process's UID at +accept time. This is more robust than filesystem permissions because: + +1. It works regardless of the umask +2. It allows fine-grained control (e.g., allow root and the proxy user only) +3. It cannot be bypassed by group membership changes + +Requires adding `nix` as a dependency (or using `libc` directly). + +--- + +### S3. Set Socket File Permissions Immediately After Bind + +**File**: `src/admin/socket.rs:90-102` + +**Suggestion**: After the successful `UnixListener::bind()`, immediately set +restrictive permissions on the socket file: + +```rust +let listener = UnixListener::bind(socket_path)?; +let perms = std::fs::Permissions::from_mode(0o660); +std::fs::set_permissions(socket_path, perms)?; +info!("admin socket listening on {} (mode 660)", socket_path); +``` + +This is the minimum defense-in-depth even if full peer credential checking +is not implemented in Phase 1. + +--- + +### S4. Consider Adding a Simple Challenge-Response to the Admin Socket Protocol + +**File**: `src/admin/socket.rs:167-252` + +**Suggestion**: Even without full authentication, a simple shared-secret token +check would significantly raise the bar for local attackers. Add a `token` +field to `StaticConfig` (or an environment variable) and require it as a prefix +to commands: + +``` +:reload\n +:status\n +``` + +This prevents casual discovery of the socket path from enabling reload. The +token could be stored in an environment variable (`ADMIN_SOCKET_TOKEN`) that +the proxy reads at startup, keeping it out of the config file. + +--- + +### S5. Move `is_socket_active` Check After File Type Verification + +**File**: `src/admin/socket.rs:143-161` + +**Suggestion**: This is the implementation detail of C1's fix. The current +order is: + +1. Check if path exists +2. Check if socket is active (connect to it) +3. Remove the file + +The safe order should be: + +1. Check if path exists +2. Verify it's not a symlink (`symlink_metadata` + `is_symlink()`) +3. Verify it's a socket file type (`is_socket()`) +4. Check if socket is active +5. Remove the file + +--- + +### S6. Document Admin Socket Security Model + +**Suggestion**: Add a section to `docs/architecture/operations.md` (or similar) +that clearly states: + +1. The admin socket is unauthenticated by design for Phase 1 +2. The socket path must be in a directory with restrictive permissions +3. The socket file permissions should be `0600` or `0660` +4. Only trusted users should have filesystem access to the socket path +5. Future phases may add peer credential authentication or challenge-response + +--- + +## Summary Statistics + +| Severity | Count | Status | +|----------|-------|--------| +| Critical | 3 | Must fix before production | +| Warning | 7 | Should fix — security hardening | +| Suggestion | 6 | Consider for defense-in-depth | + +## Recommended Fix Priority + +1. **C1 (symlink race / arbitrary file deletion)** — Exploitable by any local + user who can reach the socket directory. Fix with `symlink_metadata` + file + type check before removal. +2. **C2 (no authentication / unrestricted socket permissions)** — Any local user + can trigger config reload. Set socket permissions to 0o660 or 0o600 after + bind as a minimum; add peer credential checking for full protection. +3. **C3 (error messages leak filesystem paths)** — Information disclosure on an + unauthenticated socket. Return generic errors and log details server-side. +4. **W1 (no connection concurrency limit)** — DoS vector. Add a semaphore. +5. **W2 (config file read without atomicity)** — Partial-read risk during + config rotation. Document atomic replacement pattern for operators. +6. **W3 (socket path not validated)** — Path traversal/symlink risk in config. + Add basic validation. +7. **W4 (is_socket_active side effect)** — Amplifies C1. Fixed by C1's + `symlink_metadata` guard. +8. **W5 (reload validation uses different wildcard flag)** — Inconsistency + between startup and reload validation. +9. **Remaining W and S findings** — Fix opportunistically. + +## Additional Notes + +The unusual traffic on this file was likely driven by interest in C1 and C2. +The symlink race (C1) is the most directly exploitable vulnerability — it +requires only local filesystem access to the socket directory and timing to +replace the socket with a symlink. Combined with C2 (no authentication), a +local attacker who can write to the socket directory can both delete arbitrary +files (C1) and trigger config reloads with attacker-controlled content (C2 + +config file write access). \ No newline at end of file