Resolve all architecture open questions, add 13 ADRs, update specs

Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo, WRAUTH_SERVER typo,
ADR-005 stale wraith-tun refs, undefined ACL feature removed from server.md,
--proxy semantic difference documented.
2026-06-01 17:31:28 +00:00
parent dad8224686
commit 13b0991fb8
23 changed files with 777 additions and 249 deletions
--- a/docs/architecture/server.md
+++ b/docs/architecture/server.md
@@ -23,7 +23,7 @@ The server is the tunnel endpoint. It receives SSH channels requesting TCP conne
 │                                                   │
 │  ┌─────────────────────────────────────────────┐ │
 │  │          SSH Server (russh)                  │ │
-│  │   ServerHandler per connection              │ │
+│  │   ServerHandler per connection               │ │
 │  │   - auth_publickey() → Accept/Reject        │ │
 │  │   - channel_open_direct_tcpip() → connect   │ │
 │  │   - channel_open_forwarded_tcpip() → proxy  │ │
@@ -35,29 +35,53 @@ The server is the tunnel endpoint. It receives SSH channels requesting TCP conne
 │  └──────────────────────────────────────────────┘ │
 │                                                   │
 │  ┌──────────────────────────────────────────────┐ │
-│  │         Outbound Proxy (optional)             │ │
+│  │         Outbound Proxy (optional)            │ │
 │  │   - Direct TCP                               │ │
-│  │   - SOCKS5 proxy                             │ │
-│  │   - HTTP CONNECT proxy                       │ │
+│  │   - SOCKS5 proxy                            │ │
+│  │   - HTTP CONNECT proxy                      │ │
+│  └──────────────────────────────────────────────┘ │
+│                                                   │
+│  ┌──────────────────────────────────────────────┐ │
+│  │         Rate Limiter                         │ │
+│  │   - max-connections-per-ip                   │ │
+│  │   - max-auth-attempts                        │ │
 │  └──────────────────────────────────────────────┘ │
 └──────────────────────────────────────────────────┘
 ```

 ### Authentication

-The server supports Ed25519 public key authentication by default:
+The server supports Ed25519 public key authentication (default) and OpenSSH certificate authority authentication (ADR-012):

-1. Load authorized keys from `~/.ssh/authorized_keys` or a specified path
+**Ed25519 public key** (default):
+1. Load authorized keys from a specified path or in-memory data
 2. `auth_publickey()` checks the presented key against the authorized set
 3. Uses constant-time comparison to prevent timing attacks

-Optional password authentication (not recommended, controlled by feature flag or CLI flag).
+**OpenSSH certificate authority** (ADR-012):
+1. Load a trusted CA public key (`--cert-authority <path>`)
+2. `auth_publickey()` validates the presented certificate: checks CA signature, expiry, and principal restrictions
+3. Supports certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
+
+This enables multi-user deployments where adding one CA line to `authorized_keys` is simpler than managing individual keys for every user.
+
+**No password authentication over SSH.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
+
+### TLS Certificate Provisioning
+
+The server supports three TLS certificate modes (ADR-008):
+
+1. **Manual certs** (`--tls-cert` / `--tls-key`): User provides certificate and key files. For users with existing PKI.
+2. **Domain-based ACME** (`--acme-domain <domain>`): Auto-provisions certificates from Let's Encrypt using HTTP-01 or TLS-ALPN-01 challenges. The certificate is domain-bound and auto-renews. HTTP-01 needs inbound port 80 and TLS-ALPN-01 needs inbound port 443; in both cases the domain's DNS must point at the server.
+3. **IP-based ACME**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain name needed, but certificates expire frequently. The ACME client runs continuously.
+
+ACME support is feature-gated behind the `acme` feature flag to keep the base binary lean. Implementation uses `rustls-acme` or a similar pure-Rust ACME client to avoid an external `certbot` dependency.

 ### Channel Handling

 When a client opens a `channel_open_direct_tcpip(host, port, originator_addr, originator_port)`:

-1. **ACL check** — verify the client is allowed to connect to `host:port` (if ACLs are configured)
+1. **Connection** — connect to `host:port`, either directly or via the configured outbound proxy
 2. **Outbound connection** — connect to the target, either directly or via the configured outbound proxy
 3. **Bidirectional proxy** — `tokio::io::copy_bidirectional` between the SSH channel stream and the outbound TCP stream
 4. **Cleanup** — close the channel and TCP stream when either side disconnects
@@ -88,6 +112,7 @@ This makes the server appear as an ordinary web server to port scanners and DPI
 ```rust
 struct WraithServerHandler {
    authorized_keys: HashSet<PublicKey>,
+    cert_authorities: Vec<PublicKey>,
    proxy_config: Option<ProxyConfig>,
 }

@@ -95,11 +120,19 @@ impl server::Handler for WraithServerHandler {
    type Error = anyhow::Error;

    async fn auth_publickey(&mut self, user: &str, key: &PublicKey) -> Auth {
+        // Check direct key match
        if self.authorized_keys.contains(key) {
-            Auth::Accept
-        } else {
-            Auth::Reject { proceed_with_methods: None, partial_success: false }
+            return Auth::Accept;
        }
+        // Check certificate authority validation
+        if let Some(cert) = key.as_certificate() {
+            for ca in &self.cert_authorities {
+                if cert.verify(ca) && cert.is_valid() {
+                    return Auth::Accept;
+                }
+            }
+        }
+        Auth::Reject { proceed_with_methods: None, partial_success: false }
    }

    async fn channel_open_direct_tcpip(
@@ -111,7 +144,6 @@ impl server::Handler for WraithServerHandler {
        originator_port: u32,
        session: &mut server::Session,
    ) -> Result<Channel<server::Msg>, Self::Error> {
-        // ACL check (if configured)
        // Connect to host:port (directly or via proxy)
        // Spawn bidirectional proxy task
        Ok(channel)
@@ -119,12 +151,29 @@ impl server::Handler for WraithServerHandler {
 }
 ```

-### Logging
+### Logging and Rate Limiting

- **Log**: Auth attempts (timestamp, source IP, user, key fingerprint, success/failure)
- **Do not log**: Channel open targets, DNS resolutions, bytes transferred, connection duration
+**Logging** (for fail2ban integration on Linux):

-This provides enough information for fail2ban integration without creating a privacy-sensitive audit trail.
+- `INFO` level: auth attempts (remote_addr, user, key_fingerprint, accept/reject)
+- `INFO` level: connection opened (remote_addr, transport kind)
+- `INFO` level: connection closed (remote_addr, duration)
+- Do NOT log: channel open targets, DNS resolutions, bytes transferred
+
+This matches our production fail2ban setup which filters on source IP + failure indicators. Example log lines:
+```
+INFO auth attempt remote_addr=203.0.113.50 user=root key_fingerprint=SHA256:abc... result=reject
+INFO connection opened remote_addr=203.0.113.50 transport=tls
+```
+
+**Built-in rate limiting** (platform-independent):
+
+| Flag | Default | Purpose |
+|------|---------|---------|
+| `--max-connections-per-ip` | 0 (unlimited) | Reject new connections from IPs with N active connections |
+| `--max-auth-attempts` | 10 | Disconnect after N failed auth attempts per connection |
+
+These provide abuse protection on platforms without fail2ban (macOS, Windows, BSD) and complement fail2ban on Linux.

 ### CLI Interface

@@ -132,17 +181,21 @@ This provides enough information for fail2ban integration without creating a pri
 # Basic server (SSH on port 22)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key

-# With TLS on port 443
+# With TLS (manual certs)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
    --tls-cert /etc/ssl/cert.pem \
    --tls-key /etc/ssl/key.pem

+# With TLS (auto ACME, domain-based)
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --transport tls \
+    --acme-domain example.com
+
 # With TLS + stealth (fake nginx 404 to scanners)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
-    --tls-cert /etc/ssl/cert.pem \
-    --tls-key /etc/ssl/key.pem \
+    --acme-domain example.com \
    --stealth

 # With iroh transport (no public IP needed)
@@ -153,44 +206,64 @@ wraith serve --key ~/.ssh/ssh_host_ed25519_key \
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --proxy socks5://127.0.0.1:9050

+# With certificate authority authentication
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --cert-authority /etc/wraith/ca.pub
+
+# With rate limiting
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --max-connections-per-ip 5 \
+    --max-auth-attempts 3
+
 # All options
 wraith serve \
-  --key <path> \              # SSH host key path (required)
-  --authorized-keys <path> \  # Authorized keys file (default: ~/.ssh/authorized_keys)
-  --transport tcp|tls|iroh \  # Transport mode
-  --listen <addr:port> \      # Listen address for TCP/TLS (default: 0.0.0.0:22)
-  --tls-cert <path> \         # TLS certificate (required for tls transport)
-  --tls-key <path> \          # TLS private key (required for tls transport)
-  --stealth \                 # Serve fake nginx 404 to non-SSH connections
-  --proxy <url> \             # Outbound proxy URL (socks5:// or http://)
-  --iroh-relay <url>          # iroh relay server URL (default: n0 relay)
+  --key <path-or-buffer> \       # SSH host key (required)
+  --authorized-keys <path> \     # Authorized keys file
+  --cert-authority <path> \      # CA public key for cert-auth
+  --transport tcp|tls|iroh \     # Transport mode
+  --listen <addr:port> \         # Listen address for TCP/TLS (default: 0.0.0.0:22)
+  --tls-cert <path> \            # TLS certificate (manual)
+  --tls-key <path> \             # TLS private key (manual)
+  --acme-domain <domain> \       # ACME auto-cert domain
+  --stealth \                    # Serve fake nginx 404 to non-SSH connections
+  --proxy <url> \                # Outbound proxy URL (socks5:// or http://)
+  --iroh-relay <url> \           # iroh relay server URL (default: n0 relay)
+  --max-connections-per-ip <n> \ # Max concurrent connections per IP (default: unlimited)
+  --max-auth-attempts <n>        # Max auth failures before disconnect (default: 10)
 ```

 ### iroh Server Mode

 When running with `--transport iroh`, the server:

-1. Creates an `iroh::Endpoint` with the SSH ALPN
+1. Creates an `iroh::Endpoint` with ALPN value `b"wraith-ssh"`
 2. Prints its `EndpointId` (Ed25519 public key) — this is what clients use to connect
 3. Uses `iroh::protocol::Router` to accept incoming connections
 4. For each connection, accepts an `open_bi()` stream and passes it to `server::run_stream()`

-No listening port is needed. The server connects outbound to the iroh relay and awaits connections from clients who know its `EndpointId`.
+No listening port is needed. The server connects outbound to the iroh relay (default: n0, override with `--iroh-relay`) and awaits connections from clients who know its `EndpointId`.

 ## Constraints

- The server does not log tunnel destinations (ADR-006, pending)
+- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
 - One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
 - The server binds to a single transport at a time. Running multiple transports (e.g., TCP + iroh) simultaneously requires separate processes or a future multiplexing feature.
+- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
+- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).

 ## Open Questions

- **OQ-07**: Whether to support ACME/Let's Encrypt auto-provisioning for TLS certificates
- **OQ-08**: Connection limits and rate limiting configuration
+None — all resolved.

 ## Design Decisions

 | ADR | Decision | Summary |
 |-----|----------|---------|
 | [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait, SSH consumes stream |
-| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
+| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
+| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Server logs auth and connections, not destinations |
+| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
+| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority auth | No password auth; support OpenSSH cert-authority |
+| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
+| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
+| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `wraith-control` destination for pubsub |
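
Reviewer sketch (not part of the diff): the two rate-limiting flags added above, `--max-connections-per-ip` and `--max-auth-attempts`, could be backed by counters along the following lines. This is only an illustration under stated assumptions. The type names `ConnLimiter` and `AuthAttempts` and all of their methods are hypothetical and do not come from the wraith codebase. The sketch assumes that a limit of `0` means unlimited, as the flag table states, and that the shared limiter would sit behind the `Arc` shared state mentioned under Constraints.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical per-IP connection counter (illustration only).
/// A limit of 0 is treated as "unlimited", mirroring the documented default.
pub struct ConnLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Records a new connection and returns true if `ip` is under its limit;
    /// returns false (and records nothing) if the limit is already reached.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let current = self.active.get(&ip).copied().unwrap_or(0);
        if self.max_per_ip != 0 && current >= self.max_per_ip {
            return false;
        }
        self.active.insert(ip, current + 1);
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        if let Some(count) = self.active.get_mut(&ip) {
            *count -= 1;
            if *count == 0 {
                // Drop empty entries so the map does not grow without bound.
                self.active.remove(&ip);
            }
        }
    }
}

/// Hypothetical per-connection auth failure counter (illustration only).
pub struct AuthAttempts {
    max: u32,
    failed: u32,
}

impl AuthAttempts {
    pub fn new(max: u32) -> Self {
        Self { max, failed: 0 }
    }

    /// Records one failed attempt; returns true once the connection
    /// has reached `max` failures and should be disconnected.
    pub fn record_failure(&mut self) -> bool {
        self.failed += 1;
        self.failed >= self.max
    }
}
```

In a real handler, `try_acquire` would presumably run when a transport connection is accepted and `release` when it closes, while one `AuthAttempts` value would live in each per-connection handler and be consulted on every rejected `auth_publickey()` call.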