Files
reverse-proxy/docs/reviews/005-admin-socket-security-review.md

638 lines
22 KiB
Markdown

---
status: draft
last_updated: 2026-06-14
reviewed_code:
- src/admin/socket.rs
- src/admin/mod.rs
- src/main.rs
- src/shutdown.rs
- src/config/dynamic_config.rs
- src/config/validation.rs
- src/health.rs
- src/proxy/handler.rs
- src/proxy/headers.rs
- src/proxy/mod.rs
- src/rate_limit/mod.rs
- src/server.rs
reviewer: code-reviewer
based_on: docs/reviews/004-post-fix-review.md
---
# Admin Socket Security Review #005
## Purpose
Focused security review of `src/admin/socket.rs` and related code paths,
triggered by unusual traffic patterns observed on the file in both public and
self-hosted git. The review examines the Unix domain socket admin interface for
vulnerabilities including symlink attacks, privilege escalation, information
disclosure, and DoS vectors. Broader codebase was also surveyed for related
issues.
## Severity Definitions
| Severity | Meaning |
|----------|---------|
| **Critical** | Will cause incorrect behavior or security issues in production |
| **Warning** | Could cause issues under specific conditions or represents a missed edge case |
| **Suggestion** | Code quality, style, or minor improvement opportunity |
---
## Critical Findings
### C1. Symlink Race in `cleanup_stale_socket` Enables Arbitrary File Deletion
**File**: `src/admin/socket.rs:143-161`
**Problem**: `cleanup_stale_socket` checks whether a socket file exists and
whether another process is listening on it, then removes the file. Between the
`is_socket_active()` check (which connects to the socket) and the
`remove_file()` call, an attacker with local access can replace the socket file
with a symlink pointing to an arbitrary path (e.g., `/etc/passwd`, a critical
database file). The `remove_file()` call then follows the symlink and deletes
the target:
```rust
async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> {
let socket_path = Path::new(path);
if !socket_path.exists() {
return Ok(());
}
if is_socket_active(path).await { // check
// ...
return Err(AdminSocketError::SocketInUse(path.to_string()));
}
warn!("removing stale socket file: {}", path);
tokio::fs::remove_file(path) // act — follows symlinks!
.await
.map_err(AdminSocketError::Io)
}
```
This is a classic TOCTOU (time-of-check/time-of-use) race. The window is small
but exploitable with local access, which is exactly the threat model for a Unix
domain socket (any local user can reach it).
Additionally, `is_socket_active` works by **actually connecting** to the socket
(`src/admin/socket.rs:163-165`). If the path is a symlink to another service's
Unix socket, this creates a real connection in that service, which is an
unintended side effect that could be logged or trigger behavior in the other
service.
**Solution**: Replace the check-then-remove sequence with a safe alternative:
1. Use `std::fs::metadata()` to verify the file is actually a socket before
removal (sockets cannot be symlinked to — `metadata` does not follow
symlinks when `std::fs::symlink_metadata` is used, and sockets are a
distinct file type).
2. Alternatively, use `std::fs::remove_file` only after verifying with
`symlink_metadata` that the file type is `FileType::is_socket()`:
```rust
async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> {
let socket_path = Path::new(path);
if !socket_path.exists() {
return Ok(());
}
let metadata = std::fs::symlink_metadata(path)
.map_err(AdminSocketError::Io)?;
if metadata.file_type().is_symlink() {
warn!("admin socket path {} is a symlink, refusing to remove", path);
return Err(AdminSocketError::BindFailed(
"socket path is a symlink, refusing to remove".to_string()
));
}
if !metadata.file_type().is_socket() {
warn!("admin socket path {} is not a socket file, refusing to remove", path);
return Err(AdminSocketError::BindFailed(
"path exists but is not a socket".to_string()
));
}
if is_socket_active(path).await {
warn!("socket file {} exists and another process is listening; disabling admin socket", path);
return Err(AdminSocketError::SocketInUse(path.to_string()));
}
warn!("removing stale socket file: {}", path);
tokio::fs::remove_file(path)
.await
.map_err(AdminSocketError::Io)
}
```
This prevents both symlink attacks (refuses to remove symlinks) and accidental
removal of non-socket files.
---
### C2. Admin Socket Has No Access Control — Any Local User Can Trigger Reload
**File**: `src/admin/socket.rs:108-133,254-305`
**Problem**: The admin socket accepts connections from any local user. There is
no authentication, no peer credential check, and no ownership/permission
restriction on the socket file itself. After `UnixListener::bind()`, the socket
inherits the process umask but no explicit restrictive permissions are set.
Any local user who can reach the socket path can:
1. **Trigger a config reload** (`reload` command) — re-reads the config file
from disk and hot-swaps the live routing table. If an attacker can write to
the config file path (e.g., via a separate misconfiguration or directory
permission issue), they can chain this: write a malicious config, then send
`reload` via the admin socket to activate it. This could redirect traffic
to an attacker-controlled upstream.
2. **Read status** (`status` command) — reveals uptime and number of configured
sites. Minor information disclosure but useful for reconnaissance.
The `reload` command is particularly dangerous because it reads the config file
from disk each time (`src/admin/socket.rs:257`). The config path is set at
startup from `StaticConfig` and cannot be changed at runtime, but the file
contents at that path can be modified by any process with write access. The
admin socket becomes a trigger mechanism for activating malicious configs.
**Solution**: Multi-layered defense:
1. Set restrictive permissions on the socket immediately after binding:
```rust
use std::os::unix::fs::PermissionsExt;
let listener = UnixListener::bind(socket_path)?;
let perms = std::fs::Permissions::from_mode(0o660); // owner + group only
std::fs::set_permissions(socket_path, perms)?;
```
Or `0o600` for owner-only access. This requires the proxy to run under a
dedicated user and for the admin tool (`socat`, etc.) to run as the same
user or group.
2. Add peer credential checking using `SO_PEERCRED` on Linux:
```rust
use std::os::unix::net::UnixStream;
fn check_peer_uid(stream: &UnixStream) -> bool {
use std::os::unix::io::AsRawFd;
let uid = nix::sys::socket::getpeereid(stream.as_raw_fd())
.map(|(_, uid, _)| uid);
match uid {
Ok(peer_uid) => peer_uid == 0 || peer_uid == get_current_uid(),
Err(_) => false,
}
}
```
3. Document that the socket path should be in a directory with restrictive
permissions (e.g., `/run/reverse-proxy/` owned by the proxy user with mode
`0700`).
---
### C3. Admin Socket `reload` Response Leaks Filesystem Paths and Error Details
**File**: `src/admin/socket.rs:257-265,268-276`
**Problem**: When the `reload` command fails, the error response includes the
full `std::io::Error` or `toml::de::Error` message, which can contain absolute
filesystem paths, file permissions, and internal config structure details:
```rust
Err(e) => {
return serde_json::to_string(&ErrorResponse {
status: "error",
message: format!("failed to read config file: {}", e), // leaks path
})
.unwrap();
}
Err(e) => {
return serde_json::to_string(&ErrorResponse {
status: "error",
message: format!("failed to parse config file: {}", e), // leaks structure
})
.unwrap();
}
```
The same applies to the `unknown command` response at line 247, which echoes
arbitrary input back without sanitization:
```rust
message: format!("unknown command: {}", command),
```
For an unauthenticated socket, this information disclosure helps an attacker
enumerate the system (filesystem layout, config syntax, software version via
error messages).
**Solution**: Return generic error messages to the socket client and log the
details server-side:
```rust
Err(e) => {
tracing::error!(error = %e, "failed to read config file");
serde_json::to_string(&ErrorResponse {
status: "error",
message: "reload failed".to_string(),
}).unwrap()
}
```
For unknown commands, avoid echoing input:
```rust
_ => serde_json::to_string(&ErrorResponse {
status: "error",
message: "unknown command".to_string(),
}).unwrap(),
```
---
## Warning Findings
### W1. No Concurrency Limit on Admin Socket Connections
**File**: `src/admin/socket.rs:108-133`
**Problem**: Each accepted connection spawns a new tokio task (`tokio::spawn` at
line 114) with no concurrency limit. A local user with access to the socket can
open many simultaneous connections, spawning unlimited tasks. While the 5-second
read timeout and 4096-byte limit (added in review #004) mitigate the most
trivial DoS, a determined attacker can:
- Open many connections simultaneously, each sending data slowly (within the
5-second timeout), consuming memory and task slots.
- Send data up to 4096 bytes per connection — with enough concurrent
connections, this still consumes significant memory.
**Solution**: Add a `tokio::sync::Semaphore` with a reasonable limit (e.g., 10
concurrent connections):
```rust
let semaphore = Arc::new(tokio::sync::Semaphore::new(10));
// In the accept loop:
let permit = match semaphore.clone().acquire_owned().await {
Ok(p) => p,
Err(_) => {
warn!("admin socket connection limit reached, dropping connection");
continue;
}
};
tokio::spawn(async move {
let _permit = permit;
handle_connection(stream, admin_socket).await;
});
```
---
### W2. Config File TOCTOU: Reload Reads File Without Atomicity
**File**: `src/admin/socket.rs:257`, `src/shutdown.rs:88`
**Problem**: Both `handle_reload` and `handle_sighup_reload` read the config
file with `tokio::fs::read_to_string()`, then parse it. If another process is
writing to the config file at the same time (e.g., a configuration management
tool like Ansible writing a partial file), the proxy could read a partially
written config and apply it. This is a filesystem-level TOCTOU issue.
Unlike the symlink race (C1), this is harder to exploit directly — the parse
will likely fail on a partial file, resulting in a reload error rather than a
bad config being applied. However, it could result in a brief window where the
operator sees confusing errors during config rotation.
**Solution**: Use atomic file replacement — write to a temporary file in the
same directory, then rename over the target. Document this pattern for
operators. Alternatively, compute a checksum or stat the file before and after
reading to detect mid-write changes:
```rust
let metadata_before = tokio::fs::metadata(&admin_socket.config_path).await?;
let config_content = tokio::fs::read_to_string(&admin_socket.config_path).await?;
let metadata_after = tokio::fs::metadata(&admin_socket.config_path).await?;
if metadata_before.modified()? != metadata_after.modified()? {
return serde_json::to_string(&ErrorResponse {
status: "error",
message: "config file changed during read, please retry".to_string(),
}).unwrap();
}
```
---
### W3. Admin Socket Path Is Not Validated or Sanitized
**File**: `src/admin/socket.rs:81-83`, `src/config/static_config.rs:22-24`
**Problem**: The `admin_socket_path` from the config file is used directly as a
filesystem path with no validation. A malicious or misconfigured path could
point to:
- A path on a critical filesystem (e.g., `/etc/passwd`)
- An extremely long path (potential buffer issues in downstream code)
- A path with directory traversal (e.g., `../../etc/cron.d/malicious`)
While the default `/run/reverse-proxy/admin.sock` is safe, the config is user-
controlled and loaded from disk. Combined with C2 (no authentication), a local
attacker who can write to the config file and trigger a reload could redirect
the admin socket to an arbitrary path.
This is partially mitigated by the fact that `admin_socket_path` is in
`StaticConfig` (requires restart, not hot-reloadable), but the startup config
still trusts the path.
**Solution**: Add validation that the socket path:
1. Ends with `.sock` or `.socket` (or at least doesn't end with a suspicious
extension)
2. Is under a known-safe directory prefix (e.g., `/run/`, `/var/run/`, or a
configurable base directory)
3. Does not contain path traversal components (`..`)
4. Has a reasonable length limit
```rust
fn validate_admin_socket_path(path: &str) -> Result<(), ValidationError> {
if path.is_empty() { return Ok(()); } // disabled is valid
if path.len() > 255 {
return Err(ValidationError::AdminSocketPathTooLong);
}
if path.contains("..") {
return Err(ValidationError::AdminSocketPathTraversal);
}
let path = Path::new(path);
if path.is_absolute() {
Ok(())
} else {
Err(ValidationError::AdminSocketPathRelative)
}
}
```
---
### W4. `is_socket_active` Side-Effect on Other Processes
**File**: `src/admin/socket.rs:163-165`
**Problem**: This was flagged in review #003 (W7) and accepted as Phase 1.
However, in the context of the symlink attack in C1, this function becomes more
dangerous: if an attacker replaces the socket path with a symlink to another
service's socket, `is_socket_active` will connect to that service, which could
trigger behavior in the target service (e.g., accepting a connection, logging
it, starting a session). This amplifies the C1 symlink attack beyond file
deletion.
**Solution**: See C1 — the proposed fix uses `symlink_metadata()` and checks
`is_socket()` before calling `is_socket_active`, which eliminates the symlink
attack surface. Additionally, the `is_socket_active` check should only be
reached after verifying the file is not a symlink and is actually a socket
file type.
---
### W5. `reload` Command Does Not Validate Config Before Applying
**File**: `src/admin/socket.rs:268-279`, `src/config/dynamic_config.rs:134-158`
**Problem**: While `ConfigReloadHandle::reload()` calls `validate()` before
storing the new config, the validation does not check that the config file
itself hasn't been tampered with (e.g., via checksum). More critically, the
validation passes `cli_allow_wildcard_bind: false` during reload
(`validate(&new_static, &new_dynamic, false)` at line 141), but the startup
path may have passed `true` (via `--allow-wildcard-bind` CLI flag). This means
a config reload could tighten the wildcard bind restriction that was
intentionally relaxed via CLI, causing existing listeners on `0.0.0.0` to
continue running while the validation reports them as errors.
This is a corner case — the running listeners are not stopped — but it creates
a confusing state where the config is rejected on reload even though it was
accepted at startup.
**Solution**: Store the `cli_allow_wildcard_bind` flag in `ConfigReloadHandle`
or `StaticConfig` so that reload uses the same flag as startup:
```rust
pub struct ConfigReloadHandle {
config: Arc<ArcSwap<DynamicConfig>>,
static_config: ArcSwap<StaticConfig>,
reload_mutex: Mutex<()>,
cli_allow_wildcard_bind: bool, // stored from startup
}
```
---
### W6. Reload Error Response Includes Static Config Change Warnings That Should Not Be Exposed
**File**: `src/admin/socket.rs:286-293`
**Problem**: On successful reload, if static config fields have changed, the
response includes a log at warning level but the admin socket only returns
`{"status": "ok"}`. However, the `reload_mutex` inside `ConfigReloadHandle`
already ensures only one reload runs at a time, but the `changed_fields` list
is only logged, not returned to the caller. An operator sending `reload` via
`socat` has no way to know if their reload actually changed static fields that
require a restart. This is an operational concern rather than a security issue.
**Solution**: Include `changed_fields` in the reload response when non-empty:
```rust
Ok(changed_fields) => {
if !changed_fields.is_empty() {
serde_json::to_string(&OkWithFieldsResponse {
status: "ok",
message: format!("static fields changed (restart required): {}", changed_fields.join(", ")),
}).unwrap()
} else {
serde_json::to_string(&OkResponse { status: "ok" }).unwrap()
}
}
```
---
### W7. Health Check Endpoint Exposes Service on Predictable Port
**File**: `src/health.rs:17-36`
**Problem**: The health check listener binds to `127.0.0.1` on the configured
port (default 9900). While binding to localhost is correct, the port is
predictable and not configurable to bind to a Unix socket instead. On shared
systems, any local process can connect to `127.0.0.1:9900/health` and confirm
the proxy is running. This is low risk (the response is just `200 OK` with no
body), but the endpoint is completely unauthenticated and could be used for
reconnaissance.
More importantly, there's no rate limiting on the health endpoint. A local
attacker could flood it to consume connection resources, though this is
mitigated by the fact that it's localhost-only and a simple GET.
**Solution**: Consider adding an option to bind the health check to a Unix
socket instead of a TCP port, or adding a shared-secret token header check.
For Phase 1, this is acceptable as-is since the endpoint is localhost-only and
returns minimal information.
---
## Suggestions
### S1. Add Connection Peer Logging to Admin Socket
**File**: `src/admin/socket.rs:112`
**Suggestion**: Log the peer credentials (UID, PID, GID) of each admin socket
connection using `SO_PEERCRED`. This provides an audit trail for reload
operations:
```rust
use std::os::unix::io::AsRawFd;
fn log_peer_credentials(stream: &tokio::net::UnixStream) {
if let Ok((_, uid)) = nix::sys::socket::getpeereid(stream.as_raw_fd()) {
info!(peer_uid = uid, "admin socket connection accepted");
}
}
```
---
### S2. Use `AF_UNIX` Peer Credential Validation Instead of File Permissions
**File**: `src/admin/socket.rs`
**Suggestion**: Rather than (or in addition to) setting socket file permissions,
use `SO_PEERCRED`/`getpeereid` to validate the connecting process's UID at
accept time. This is more robust than filesystem permissions because:
1. It works regardless of the umask
2. It allows fine-grained control (e.g., allow root and the proxy user only)
3. It cannot be bypassed by group membership changes
Requires adding `nix` as a dependency (or using `libc` directly).
---
### S3. Set Socket File Permissions Immediately After Bind
**File**: `src/admin/socket.rs:90-102`
**Suggestion**: After the successful `UnixListener::bind()`, immediately set
restrictive permissions on the socket file:
```rust
let listener = UnixListener::bind(socket_path)?;
let perms = std::fs::Permissions::from_mode(0o660);
std::fs::set_permissions(socket_path, perms)?;
info!("admin socket listening on {} (mode 660)", socket_path);
```
This is the minimum defense-in-depth even if full peer credential checking
is not implemented in Phase 1.
---
### S4. Consider Adding a Simple Challenge-Response to the Admin Socket Protocol
**File**: `src/admin/socket.rs:167-252`
**Suggestion**: Even without full authentication, a simple shared-secret token
check would significantly raise the bar for local attackers. Add a `token`
field to `StaticConfig` (or an environment variable) and require it as a prefix
to commands:
```
<token>:reload\n
<token>:status\n
```
This prevents casual discovery of the socket path from enabling reload. The
token could be stored in an environment variable (`ADMIN_SOCKET_TOKEN`) that
the proxy reads at startup, keeping it out of the config file.
---
### S5. Move `is_socket_active` Check After File Type Verification
**File**: `src/admin/socket.rs:143-161`
**Suggestion**: This is the implementation detail of C1's fix. The current
order is:
1. Check if path exists
2. Check if socket is active (connect to it)
3. Remove the file
The safe order should be:
1. Check if path exists
2. Verify it's not a symlink (`symlink_metadata` + `is_symlink()`)
3. Verify it's a socket file type (`is_socket()`)
4. Check if socket is active
5. Remove the file
---
### S6. Document Admin Socket Security Model
**Suggestion**: Add a section to `docs/architecture/operations.md` (or similar)
that clearly states:
1. The admin socket is unauthenticated by design for Phase 1
2. The socket path must be in a directory with restrictive permissions
3. The socket file permissions should be `0600` or `0660`
4. Only trusted users should have filesystem access to the socket path
5. Future phases may add peer credential authentication or challenge-response
---
## Summary Statistics
| Severity | Count | Status |
|----------|-------|--------|
| Critical | 3 | Must fix before production |
| Warning | 7 | Should fix — security hardening |
| Suggestion | 6 | Consider for defense-in-depth |
## Recommended Fix Priority
1. **C1 (symlink race / arbitrary file deletion)** — Exploitable by any local
user who can reach the socket directory. Fix with `symlink_metadata` + file
type check before removal.
2. **C2 (no authentication / unrestricted socket permissions)** — Any local user
can trigger config reload. Set socket permissions to 0o660 or 0o600 after
bind as a minimum; add peer credential checking for full protection.
3. **C3 (error messages leak filesystem paths)** — Information disclosure on an
unauthenticated socket. Return generic errors and log details server-side.
4. **W1 (no connection concurrency limit)** — DoS vector. Add a semaphore.
5. **W2 (config file read without atomicity)** — Partial-read risk during
config rotation. Document atomic replacement pattern for operators.
6. **W3 (socket path not validated)** — Path traversal/symlink risk in config.
Add basic validation.
7. **W4 (is_socket_active side effect)** — Amplifies C1. Fixed by C1's
`symlink_metadata` guard.
8. **W5 (reload validation uses different wildcard flag)** — Inconsistency
between startup and reload validation.
9. **Remaining W and S findings** — Fix opportunistically.
## Additional Notes
The unusual traffic on this file was likely driven by interest in C1 and C2.
The symlink race (C1) is the most directly exploitable vulnerability — it
requires only local filesystem access to the socket directory and timing to
replace the socket with a symlink. Combined with C2 (no authentication), a
local attacker who can write to the socket directory can both delete arbitrary
files (C1) and trigger config reloads with attacker-controlled content (C2 +
config file write access).