reverse-proxy/docs/reviews/005-admin-socket-security-review.md


status: draft
last_updated: 2026-06-14
reviewed_code:
  - src/admin/socket.rs
  - src/admin/mod.rs
  - src/main.rs
  - src/shutdown.rs
  - src/config/dynamic_config.rs
  - src/config/validation.rs
  - src/health.rs
  - src/proxy/handler.rs
  - src/proxy/headers.rs
  - src/proxy/mod.rs
  - src/rate_limit/mod.rs
  - src/server.rs
reviewer: code-reviewer
based_on: docs/reviews/004-post-fix-review.md

Admin Socket Security Review #005

Purpose

Focused security review of src/admin/socket.rs and related code paths. It was triggered by unusual traffic patterns observed on this file in both the public and the self-hosted git repositories. The review examines the Unix domain socket admin interface for symlink attacks, privilege escalation, information disclosure, and DoS vectors. The broader codebase was also surveyed for related issues.

Severity Definitions

| Severity | Meaning |
|---|---|
| Critical | Will cause incorrect behavior or security issues in production |
| Warning | Could cause issues under specific conditions or represents a missed edge case |
| Suggestion | Code quality, style, or minor improvement opportunity |

Critical Findings

C1. Symlink Race in Stale Socket Cleanup

File: src/admin/socket.rs:143-161

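Problem: cleanup_stale_socket checks whether a socket file exists and whether another process is listening on it, then removes the file. Nothing ties the removal to the file that was checked. Between the is_socket_active() check (which connects to the socket) and the remove_file() call, an attacker with local write access can swap what the path names.

remove_file() maps to unlink(2), which does not follow a symlink in the final path component. If admin.sock itself has become a symlink, the link is removed, not its target. Symlinks in the directory components are resolved, however. An attacker who can replace a parent directory with a symlink therefore redirects the removal to a file of the same name in a directory of their choosing (e.g., a critical database or state directory), and that file is deleted with the proxy's privileges: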

async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> {
    let socket_path = Path::new(path);
    if !socket_path.exists() {
        return Ok(());
    }

    if is_socket_active(path).await {         // check
        // ...
        return Err(AdminSocketError::SocketInUse(path.to_string()));
    }

    warn!("removing stale socket file: {}", path);
    tokio::fs::remove_file(path)              // act: unlinks whatever `path` names now
        .await
        .map_err(AdminSocketError::Io)
}

This is a classic TOCTOU (time-of-check/time-of-use) race. The window is small but exploitable with local access, which is exactly the threat model for a Unix domain socket (any local user can reach it).

Additionally, is_socket_active works by actually connecting to the socket (src/admin/socket.rs:163-165). If the path is a symlink to another service's Unix socket, this creates a real connection in that service, which is an unintended side effect that could be logged or trigger behavior in the other service.
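Which file the final remove_file() call reaches depends on where a symlink sits in the path. std::fs::remove_file corresponds to unlink(2) on Unix. A trailing symlink is removed itself, while symlinks in directory components are followed.

The std-only sketch below checks both cases in a scratch directory. The function name and the file names are illustrative; nothing here comes from the proxy's code.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Returns (target_survives_trailing_link, same_name_file_survives_dir_link).
/// `base` should be a fresh scratch directory; everything is created inside it.
fn unlink_through_symlinks(base: &Path) -> io::Result<(bool, bool)> {
    let real_dir = base.join("real");
    fs::create_dir_all(&real_dir)?;

    // Case 1: the last component ("admin.sock") is itself a symlink.
    let victim = real_dir.join("victim.db");
    fs::write(&victim, b"data")?;
    let link = base.join("admin.sock");
    symlink(&victim, &link)?;
    fs::remove_file(&link)?; // unlink(2) removes the link, not its target
    let target_survives = victim.exists();

    // Case 2: a directory component ("run") is a symlink to another directory.
    let same_name = real_dir.join("admin.sock");
    fs::write(&same_name, b"data")?;
    let dir_link = base.join("run");
    symlink(&real_dir, &dir_link)?;
    fs::remove_file(dir_link.join("admin.sock"))?; // "run" is resolved first
    let same_name_survives = same_name.exists();

    Ok((target_survives, same_name_survives))
}
```

On Linux and macOS the trailing symlink is removed harmlessly. The directory symlink, by contrast, redirects the unlink to real/admin.sock.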

Solution: Inspect the path itself, not what it points to, before connecting to it or removing it:

  1. Use std::fs::symlink_metadata() rather than std::fs::metadata(), which follows symlinks. A symlink is then reported as a symlink instead of as its target. Unlike Path::exists() (which also follows symlinks), symlink_metadata() reports a dangling symlink as present rather than absent.

  2. Proceed only if the file type is a socket (FileType::is_socket(), from std::os::unix::fs::FileTypeExt). Only then probe with is_socket_active and remove the file:

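use std::io::ErrorKind;
use std::os::unix::fs::FileTypeExt; // brings FileType::is_socket() into scope

async fn cleanup_stale_socket(path: &str) -> Result<(), AdminSocketError> {
    // symlink_metadata() describes the path itself, not a symlink's target,
    // and sees a dangling symlink that Path::exists() would report as absent.
    let metadata = match tokio::fs::symlink_metadata(path).await {
        Ok(metadata) => metadata,
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(()),
        Err(e) => return Err(AdminSocketError::Io(e)),
    };

    if metadata.file_type().is_symlink() {
        warn!("admin socket path {} is a symlink, refusing to remove", path);
        return Err(AdminSocketError::BindFailed(
            "socket path is a symlink, refusing to remove".to_string()
        ));
    }

    if !metadata.file_type().is_socket() {
        warn!("admin socket path {} is not a socket file, refusing to remove", path);
        return Err(AdminSocketError::BindFailed(
            "path exists but is not a socket".to_string()
        ));
    }

    // Probe only a real socket file, never a symlink to another service's socket.
    if is_socket_active(path).await {
        warn!("socket file {} exists and another process is listening; disabling admin socket", path);
        return Err(AdminSocketError::SocketInUse(path.to_string()));
    }

    // The path can still be swapped between the checks above and this call;
    // a socket directory writable only by the proxy user closes that gap.
    warn!("removing stale socket file: {}", path);
    tokio::fs::remove_file(path)
        .await
        .map_err(AdminSocketError::Io)
}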

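This rejects symlinks and non-socket files before anything is connected to or removed. It narrows the race rather than eliminating it, because the file can still be replaced between symlink_metadata() and remove_file(). Restrictive permissions on the socket directory (see C2) are what close it completely.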


C2. Admin Socket Has No Access Control — Any Local User Can Trigger Reload

File: src/admin/socket.rs:108-133,254-305

Problem: The admin socket accepts connections from any local user. There is no authentication, no peer credential check, and no ownership/permission restriction on the socket file itself. After UnixListener::bind(), the socket file's mode is derived from the process umask; no explicit restrictive permissions are set. On Linux, connect() requires write permission on the socket file, so these mode bits decide who can connect.

Any local user who can reach the socket path can:

  1. Trigger a config reload (reload command) — re-reads the config file from disk and hot-swaps the live routing table. If an attacker can write to the config file path (e.g., via a separate misconfiguration or directory permission issue), they can chain this: write a malicious config, then send reload via the admin socket to activate it. This could redirect traffic to an attacker-controlled upstream.

  2. Read status (status command) — reveals uptime and number of configured sites. Minor information disclosure but useful for reconnaissance.

The reload command is particularly dangerous because it reads the config file from disk each time (src/admin/socket.rs:257). The config path is set at startup from StaticConfig and cannot be changed at runtime, but the file contents at that path can be modified by any process with write access. The admin socket becomes a trigger mechanism for activating malicious configs.

Solution: Multi-layered defense:

  1. Set restrictive permissions on the socket immediately after binding:
use std::os::unix::fs::PermissionsExt;
use tokio::net::UnixListener;

let listener = UnixListener::bind(socket_path)?;
let perms = std::fs::Permissions::from_mode(0o660); // owner + group only
std::fs::set_permissions(socket_path, perms)?;

Or use 0o600 for owner-only access. Either mode assumes the proxy runs under a dedicated user. With 0o600 the admin tool (socat, etc.) must run as that same user, while 0o660 also admits members of the socket's group.

  2. Add peer credential checking using SO_PEERCRED on Linux (tokio exposes it as UnixStream::peer_cred()):
use tokio::net::UnixStream;

/// Accept only root or the user the proxy runs as; fail closed on errors.
fn check_peer_uid(stream: &UnixStream) -> bool {
    let own_uid = nix::unistd::geteuid().as_raw();
    match stream.peer_cred() {
        Ok(cred) => cred.uid() == 0 || cred.uid() == own_uid,
        Err(_) => false,
    }
}
  3. Document that the socket path should be in a directory with restrictive permissions (e.g., /run/reverse-proxy/ owned by the proxy user with mode 0700).
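Between bind() and set_permissions() the socket briefly carries umask-derived permissions. Creating it inside a directory that only the proxy user can enter closes that gap.

The sketch below is std-only and makes several assumptions. bind_admin_socket is a hypothetical helper, not something in src/admin/socket.rs. It uses the synchronous std::os::unix::net::UnixListener for brevity. A tokio listener can be obtained from it with tokio::net::UnixListener::from_std after calling set_nonblocking(true).

```rust
use std::fs::{self, DirBuilder, Permissions};
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::os::unix::net::UnixListener;
use std::path::Path;

/// Hypothetical helper: binds `dir/name` as the admin socket.
/// A missing `dir` is created with mode 0o700 (further masked by the umask);
/// an existing directory keeps its current mode, so check it separately.
fn bind_admin_socket(dir: &Path, name: &str) -> io::Result<UnixListener> {
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;

    let socket_path = dir.join(name);
    let listener = UnixListener::bind(&socket_path)?;
    // Tighten the socket file itself as well, in case the directory is ever loosened.
    fs::set_permissions(&socket_path, Permissions::from_mode(0o600))?;
    Ok(listener)
}
```

Keeping both layers means a mistake in one (a loosened directory or a forgotten chmod) does not by itself expose the reload command.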

C3. Admin Socket reload Response Leaks Filesystem Paths and Error Details

File: src/admin/socket.rs:257-265,268-276

Problem: When the reload command fails, the error response includes the full std::io::Error or toml::de::Error message, which can contain absolute filesystem paths, file permissions, and internal config structure details:

Err(e) => {
    return serde_json::to_string(&ErrorResponse {
        status: "error",
        message: format!("failed to read config file: {}", e),  // leaks path
    })
    .unwrap();
}

Err(e) => {
    return serde_json::to_string(&ErrorResponse {
        status: "error",
        message: format!("failed to parse config file: {}", e),  // leaks structure
    })
    .unwrap();
}

The same applies to the unknown command response at line 247, which echoes arbitrary input back without sanitization:

message: format!("unknown command: {}", command),

For an unauthenticated socket, this information disclosure helps an attacker enumerate the system (filesystem layout, config syntax, software version via error messages).

Solution: Return generic error messages to the socket client and log the details server-side:

Err(e) => {
    tracing::error!(error = %e, "failed to read config file");
    serde_json::to_string(&ErrorResponse {
        status: "error",
        message: "reload failed".to_string(),
    }).unwrap()
}

For unknown commands, avoid echoing input:

_ => serde_json::to_string(&ErrorResponse {
    status: "error",
    message: "unknown command".to_string(),
}).unwrap(),

Warning Findings

W1. No Concurrency Limit on Admin Socket Connections

File: src/admin/socket.rs:108-133

Problem: Each accepted connection spawns a new tokio task (tokio::spawn at line 114) with no concurrency limit. A local user with access to the socket can open many simultaneous connections, spawning unlimited tasks. While the 5-second read timeout and 4096-byte limit (added in review #004) mitigate the most trivial DoS, a determined attacker can:

  • Open many connections simultaneously, each sending data slowly (within the 5-second timeout), consuming memory and task slots.
  • Send data up to 4096 bytes per connection — with enough concurrent connections, this still consumes significant memory.

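Solution: Add a tokio::sync::Semaphore with a reasonable limit (e.g., 10 concurrent connections). Acquire permits with try_acquire_owned() so that connections over the limit are dropped immediately rather than queued:

use std::sync::Arc;
use tokio::sync::Semaphore;

let semaphore = Arc::new(Semaphore::new(10));

// In the accept loop. acquire_owned().await would wait for a free permit (and
// only errors once the semaphore is closed); try_acquire_owned() fails at once.
let permit = match semaphore.clone().try_acquire_owned() {
    Ok(p) => p,
    Err(_) => {
        warn!("admin socket connection limit reached, dropping connection");
        continue; // `stream` goes out of scope here, closing the connection
    }
};
tokio::spawn(async move {
    let _permit = permit; // returned to the semaphore when the task ends
    handle_connection(stream, admin_socket).await;
});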
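The drop-when-full behaviour can also be sketched with std alone, which makes the permit semantics explicit. An atomic counter plays the role of the semaphore, and an RAII guard plays the role of the owned permit.

ConnectionLimiter and ConnectionGuard are hypothetical names. In the proxy itself, the tokio Semaphore remains the simpler choice.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts live admin connections; try_acquire fails instead of waiting,
/// mirroring Semaphore::try_acquire_owned.
struct ConnectionLimiter {
    active: AtomicUsize,
    max: usize,
}

/// Releases its slot when dropped, like an OwnedSemaphorePermit.
struct ConnectionGuard(Arc<ConnectionLimiter>);

impl ConnectionLimiter {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { active: AtomicUsize::new(0), max })
    }

    fn try_acquire(limiter: &Arc<Self>) -> Option<ConnectionGuard> {
        // Increment only while below the limit; fetch_update retries on contention.
        limiter
            .active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < limiter.max).then_some(n + 1)
            })
            .ok()
            .map(|_| ConnectionGuard(Arc::clone(limiter)))
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        self.0.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Each spawned connection task would own one guard, so the slot is freed however the task ends, including on an early return or a timeout.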

W2. Config File TOCTOU: Reload Reads File Without Atomicity

File: src/admin/socket.rs:257, src/shutdown.rs:88

Problem: Both handle_reload and handle_sighup_reload read the config file with tokio::fs::read_to_string(), then parse it. If another process is writing to the config file at the same time (e.g., a configuration management tool like Ansible writing a partial file), the proxy could read a partially written config and apply it. This is a filesystem-level TOCTOU issue.

Unlike the symlink race (C1), this is harder to exploit directly — the parse will likely fail on a partial file, resulting in a reload error rather than a bad config being applied. However, it could result in a brief window where the operator sees confusing errors during config rotation.

Solution: Use atomic file replacement — write to a temporary file in the same directory, then rename over the target. Document this pattern for operators. Alternatively, compute a checksum or stat the file before and after reading to detect mid-write changes:

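use std::io;
use std::path::Path;

/// Reads the config file, returning Ok(None) if its mtime or length changed
/// during the read. Heuristic only: a same-length rewrite within one
/// timestamp tick can still slip through.
async fn read_config_checked(path: &Path) -> io::Result<Option<String>> {
    let before = tokio::fs::metadata(path).await?;
    let content = tokio::fs::read_to_string(path).await?;
    let after = tokio::fs::metadata(path).await?;

    if before.modified()? != after.modified()? || before.len() != after.len() {
        return Ok(None);
    }
    Ok(Some(content))
}

// handle_reload maps Ok(None) to an ErrorResponse with the message
// "config file changed during read, please retry".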
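The atomic replacement pattern recommended for operators is sketched below in Rust with std only. The function name and the .<name>.tmp naming scheme are illustrative assumptions; any tool that writes to a temporary file and renames it over the target gets the same guarantee. The proxy then sees either the old file or the complete new one, never a partial write.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replaces `path` atomically. The temp file lives in the same directory
/// because rename(2) is only atomic within a single filesystem.
fn write_config_atomically(path: &Path, contents: &str) -> io::Result<()> {
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let tmp_path = dir.join(format!(".{}.tmp", file_name.to_string_lossy()));

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents.as_bytes())?;
    tmp.sync_all()?; // data reaches disk before the rename makes it visible
    drop(tmp);

    fs::rename(&tmp_path, path) // atomically swaps the directory entry
}
```

The temp file name is predictable, so this assumes the config directory is not writable by untrusted users. That is the same assumption the rest of this review relies on.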

W3. Admin Socket Path Is Not Validated or Sanitized

File: src/admin/socket.rs:81-83, src/config/static_config.rs:22-24

Problem: The admin_socket_path from the config file is used directly as a filesystem path with no validation. A malicious or misconfigured path could point to:

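  • A critical system file (e.g., /etc/passwd). With the current cleanup_stale_socket, is_socket_active cannot connect to a regular file, so an existing non-socket file at the configured path is treated as stale and removed before bind.
  • An overlong path. sun_path holds 108 bytes on Linux (104 on macOS and the BSDs), including the terminating NUL, so a longer path fails at bind time.
  • A path with directory traversal (e.g., ../../etc/cron.d/malicious)

While the default /run/reverse-proxy/admin.sock is safe, the config is user-controlled and loaded from disk. Combined with C2 (no authentication), a local attacker who can write to the config file could point admin_socket_path at an arbitrary location.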

This is partially mitigated by the fact that admin_socket_path is in StaticConfig (requires restart, not hot-reloadable), but the startup config still trusts the path.

Solution: Add validation that the socket path:

  1. Ends with .sock or .socket (or at least doesn't end with a suspicious extension)
  2. Is under a known-safe directory prefix (e.g., /run/, /var/run/, or a configurable base directory)
  3. Does not contain path traversal components (..)
  4. Has a reasonable length limit
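use std::path::{Component, Path};

// Linux sun_path is 108 bytes including the trailing NUL (104 on macOS/BSD).
const MAX_SOCKET_PATH_LEN: usize = 107;

fn validate_admin_socket_path(path: &str) -> Result<(), ValidationError> {
    if path.is_empty() {
        return Ok(()); // an empty path disables the admin socket
    }
    if path.len() > MAX_SOCKET_PATH_LEN {
        return Err(ValidationError::AdminSocketPathTooLong);
    }
    let path = Path::new(path);
    // Check whole components, so a file name like "admin..sock" is still allowed.
    if path.components().any(|c| c == Component::ParentDir) {
        return Err(ValidationError::AdminSocketPathTraversal);
    }
    if !path.is_absolute() {
        return Err(ValidationError::AdminSocketPathRelative);
    }
    Ok(())
}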
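validate_admin_socket_path covers the traversal, relative-path, and length checks. The suffix and directory-prefix checks from the list could live in a separate helper.

The sketch below assumes a hypothetical allowed_dirs setting. Path::starts_with is purely lexical, so the helper is meant to run only after the ".." check has passed.

```rust
use std::path::Path;

/// Socket-like file name under one of the allowed directories.
/// `allowed_dirs` is a hypothetical setting (e.g. ["/run", "/var/run"]).
fn is_allowed_socket_location(path: &str, allowed_dirs: &[&str]) -> bool {
    let path = Path::new(path);
    let has_socket_suffix = matches!(
        path.extension().and_then(|e| e.to_str()),
        Some("sock") | Some("socket")
    );
    // Path::starts_with compares whole components, so "/run-evil" does not
    // match the prefix "/run".
    let under_allowed_dir = allowed_dirs.iter().any(|dir| path.starts_with(dir));
    has_socket_suffix && under_allowed_dir
}
```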

W4. is_socket_active Side-Effect on Other Processes

File: src/admin/socket.rs:163-165

Problem: This was flagged in review #003 (W7) and accepted as Phase 1. However, in the context of the symlink attack in C1, this function becomes more dangerous: if an attacker replaces the socket path with a symlink to another service's socket, is_socket_active will connect to that service, which could trigger behavior in the target service (e.g., accepting a connection, logging it, starting a session). This amplifies the C1 symlink attack beyond file deletion.

Solution: See C1. The proposed fix calls symlink_metadata() and requires is_socket() before is_socket_active runs. The connection probe is therefore never made through a symlink at the socket path.


W5. reload Validates With a Different Wildcard-Bind Flag Than Startup

File: src/admin/socket.rs:268-279, src/config/dynamic_config.rs:134-158

Problem: ConfigReloadHandle::reload() does call validate() before storing the new config, although validation does not check that the config file itself hasn't been tampered with (e.g., via checksum). The more concrete problem is the wildcard-bind flag. Reload always passes cli_allow_wildcard_bind: false (validate(&new_static, &new_dynamic, false) at line 141), while startup may have passed true via the --allow-wildcard-bind CLI flag.

A reload therefore silently drops the relaxation granted on the command line. A config that binds 0.0.0.0 and was accepted at startup is rejected on reload, while the existing listeners on 0.0.0.0 keep running.

This is a corner case, since the running listeners are not stopped. It still creates a confusing state where the config is rejected on reload even though it was accepted at startup.

Solution: Store the cli_allow_wildcard_bind flag in ConfigReloadHandle or StaticConfig so that reload uses the same flag as startup:

pub struct ConfigReloadHandle {
    config: Arc<ArcSwap<DynamicConfig>>,
    static_config: ArcSwap<StaticConfig>,
    reload_mutex: Mutex<()>,
    cli_allow_wildcard_bind: bool,  // stored from startup
}
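The design point is that the flag is captured once and every later validation reads the stored value. The std-only sketch below uses hypothetical names (ReloadHandle, validate_listen_addr). It treats both 0.0.0.0 and :: as wildcard addresses, which may be broader than the proxy's actual check.

```rust
use std::net::IpAddr;

/// Hypothetical stand-in for ConfigReloadHandle: it keeps the startup value of
/// --allow-wildcard-bind so startup and reload apply the same rule.
struct ReloadHandle {
    cli_allow_wildcard_bind: bool,
}

impl ReloadHandle {
    fn new(cli_allow_wildcard_bind: bool) -> Self {
        Self { cli_allow_wildcard_bind }
    }

    /// Used on both paths; is_unspecified() is true for 0.0.0.0 and ::.
    fn validate_listen_addr(&self, addr: IpAddr) -> Result<(), String> {
        if addr.is_unspecified() && !self.cli_allow_wildcard_bind {
            return Err(format!("wildcard bind to {addr} requires --allow-wildcard-bind"));
        }
        Ok(())
    }
}
```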

W6. Reload Response Does Not Report Static Config Changes That Require a Restart

File: src/admin/socket.rs:286-293

Problem: On a successful reload, if static config fields have changed, the proxy logs them at warning level, but the admin socket still returns only {"status": "ok"}. The changed_fields list never reaches the caller. An operator sending reload via socat therefore has no way to know whether the reload touched static fields that require a restart. This is an operational concern rather than a security issue.

Solution: Include changed_fields in the reload response when non-empty:

Ok(changed_fields) => {
    if !changed_fields.is_empty() {
        serde_json::to_string(&OkWithFieldsResponse {
            status: "ok",
            message: format!("static fields changed (restart required): {}", changed_fields.join(", ")),
        }).unwrap()
    } else {
        serde_json::to_string(&OkResponse { status: "ok" }).unwrap()
    }
}

W7. Health Check Endpoint Exposes Service on Predictable Port

File: src/health.rs:17-36

Problem: The health check listener binds to 127.0.0.1 on the configured port (default 9900). While binding to localhost is correct, the port is predictable and not configurable to bind to a Unix socket instead. On shared systems, any local process can connect to 127.0.0.1:9900/health and confirm the proxy is running. This is low risk (the response is just 200 OK with no body), but the endpoint is completely unauthenticated and could be used for reconnaissance.

More importantly, there's no rate limiting on the health endpoint. A local attacker could flood it to consume connection resources, though this is mitigated by the fact that it's localhost-only and a simple GET.

Solution: Consider adding an option to bind the health check to a Unix socket instead of a TCP port, or adding a shared-secret token header check. For Phase 1, this is acceptable as-is since the endpoint is localhost-only and returns minimal information.


Suggestions

S1. Add Connection Peer Logging to Admin Socket

File: src/admin/socket.rs:112

Suggestion: Log the peer credentials (UID, PID, GID) of each admin socket connection using SO_PEERCRED. This provides an audit trail for reload operations:

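use tokio::net::UnixStream;
use tracing::{info, warn};

fn log_peer_credentials(stream: &UnixStream) {
    // tokio's peer_cred() reads SO_PEERCRED on Linux; pid() is None where
    // the platform cannot report it.
    match stream.peer_cred() {
        Ok(cred) => info!(
            peer_uid = cred.uid(), peer_gid = cred.gid(), peer_pid = ?cred.pid(),
            "admin socket connection accepted"
        ),
        Err(e) => warn!(error = %e, "could not read admin socket peer credentials"),
    }
}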

S2. Use AF_UNIX Peer Credential Validation Instead of File Permissions

File: src/admin/socket.rs

Suggestion: Rather than (or in addition to) setting socket file permissions, use SO_PEERCRED/getpeereid to validate the connecting process's UID at accept time. This is more robust than filesystem permissions because:

  1. It works regardless of the umask
  2. It allows fine-grained control (e.g., allow root and the proxy user only)
  3. It cannot be bypassed by group membership changes

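tokio's UnixStream::peer_cred() already exposes these credentials. nix (or libc directly) is only needed for related calls such as looking up the proxy's own UID.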


S3. Set Socket File Permissions Immediately After Bind

File: src/admin/socket.rs:90-102

Suggestion: After the successful UnixListener::bind(), immediately set restrictive permissions on the socket file:

use std::os::unix::fs::PermissionsExt;
use tokio::net::UnixListener;

let listener = UnixListener::bind(socket_path)?;
let perms = std::fs::Permissions::from_mode(0o660);
std::fs::set_permissions(socket_path, perms)?;
info!("admin socket listening on {} (mode 660)", socket_path);

This is the minimum defense-in-depth even if full peer credential checking is not implemented in Phase 1.


S4. Consider Adding a Simple Challenge-Response to the Admin Socket Protocol

File: src/admin/socket.rs:167-252

Suggestion: Even without full authentication, a simple shared-secret token check would significantly raise the bar for local attackers. Add a token field to StaticConfig (or an environment variable) and require it as a prefix to commands:

<token>:reload\n
<token>:status\n

This prevents casual discovery of the socket path from enabling reload. The token could be stored in an environment variable (ADMIN_SOCKET_TOKEN) that the proxy reads at startup, keeping it out of the config file.
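The sketch below shows the token check. It assumes the <token>:<command> framing above and a token read once from ADMIN_SOCKET_TOKEN at startup; both function names are hypothetical.

The comparison is constant-time on a best-effort basis only. A vetted crate such as subtle is preferable in production.

```rust
/// Splits "<token>:<command>" and returns the command only if the token matches.
fn authorize_command<'a>(line: &'a str, expected_token: &str) -> Option<&'a str> {
    if expected_token.is_empty() {
        return None; // no token configured: fail closed rather than accept ":reload"
    }
    let (token, command) = line.trim_end().split_once(':')?;
    constant_time_eq(token.as_bytes(), expected_token.as_bytes()).then_some(command)
}

/// Touches every byte regardless of where a mismatch occurs, so timing does not
/// reveal how much of a guessed token was correct. The length itself is revealed.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

With socat, the operator would send a line such as s3cret:reload. Anything without a matching prefix falls through to the generic error response.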


S5. Move is_socket_active Check After File Type Verification

File: src/admin/socket.rs:143-161

Suggestion: This is the implementation detail of C1's fix. The current order is:

  1. Check if path exists
  2. Check if socket is active (connect to it)
  3. Remove the file

The safe order should be:

  1. Check if path exists
  2. Verify it's not a symlink (symlink_metadata + is_symlink())
  3. Verify it's a socket file type (is_socket())
  4. Check if socket is active
  5. Remove the file
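The safe order can be sketched synchronously with std alone, which makes it easy to exercise against real files. cleanup_stale_socket_sync and StaleSocketAction are hypothetical names. The blocking UnixStream::connect stands in for is_socket_active; the proxy itself would keep the async tokio version from C1.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::os::unix::fs::FileTypeExt;
use std::os::unix::net::UnixStream;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum StaleSocketAction {
    NothingThere,    // step 1: path does not exist
    RefuseSymlink,   // step 2: path is a symlink
    RefuseNotSocket, // step 3: path is not a socket file
    InUse,           // step 4: a live listener answered
    Removed,         // step 5: stale socket removed
}

fn cleanup_stale_socket_sync(path: &Path) -> io::Result<StaleSocketAction> {
    // Steps 1-3: inspect the path itself without following a trailing symlink.
    let file_type = match fs::symlink_metadata(path) {
        Ok(meta) => meta.file_type(),
        Err(e) if e.kind() == ErrorKind::NotFound => return Ok(StaleSocketAction::NothingThere),
        Err(e) => return Err(e),
    };
    if file_type.is_symlink() {
        return Ok(StaleSocketAction::RefuseSymlink);
    }
    if !file_type.is_socket() {
        return Ok(StaleSocketAction::RefuseNotSocket);
    }
    // Step 4: only now probe, and only a real socket file.
    if UnixStream::connect(path).is_ok() {
        return Ok(StaleSocketAction::InUse);
    }
    // Step 5.
    fs::remove_file(path)?;
    Ok(StaleSocketAction::Removed)
}
```

The window between symlink_metadata() and remove_file() is narrowed, not removed. A socket directory that only the proxy user can write is what closes it.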

S6. Document Admin Socket Security Model

Suggestion: Add a section to docs/architecture/operations.md (or similar) that clearly states:

  1. The admin socket is unauthenticated by design for Phase 1
  2. The socket path must be in a directory with restrictive permissions
  3. The socket file permissions should be 0600 or 0660
  4. Only trusted users should have filesystem access to the socket path
  5. Future phases may add peer credential authentication or challenge-response

Summary Statistics

| Severity | Count | Status |
|---|---|---|
| Critical | 3 | Must fix before production |
| Warning | 7 | Should fix (security hardening) |
| Suggestion | 6 | Consider for defense-in-depth |

  1. C1 (symlink race / unintended file deletion in stale-socket cleanup) — Exploitable by any local user who can write to the socket directory or one of its parents. Fix with symlink_metadata + file type check before removal.
  2. C2 (no authentication / unrestricted socket permissions) — Any local user can trigger config reload. Set socket permissions to 0o660 or 0o600 after bind as a minimum; add peer credential checking for full protection.
  3. C3 (error messages leak filesystem paths) — Information disclosure on an unauthenticated socket. Return generic errors and log details server-side.
  4. W1 (no connection concurrency limit) — DoS vector. Add a semaphore.
  5. W2 (config file read without atomicity) — Partial-read risk during config rotation. Document atomic replacement pattern for operators.
  6. W3 (socket path not validated) — Path traversal/symlink risk in config. Add basic validation.
  7. W4 (is_socket_active side effect) — Amplifies C1. Fixed by C1's symlink_metadata guard.
  8. W5 (reload validation uses different wildcard flag) — Inconsistency between startup and reload validation.
  9. Remaining W and S findings — Fix opportunistically.

Additional Notes

The unusual traffic on this file was likely driven by interest in C1 and C2. The symlink race (C1) is the most directly exploitable vulnerability. It requires only local write access to the socket's directory tree and the timing to swap the path between the check and the removal. Combined with C2 (no authentication), a local attacker with that access can do two things. They can make the proxy delete a file it never inspected (C1), and they can trigger config reloads with attacker-controlled content (C2 + config file write access).