Resolve all architecture open questions, add 13 ADRs, update specs

Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo, WRAUTH_SERVER typo, ADR-005 stale wraith-tun refs, undefined ACL feature removed from server.md, --proxy semantic difference documented.
2026-06-01 17:31:28 +00:00
parent dad8224686
commit 13b0991fb8
23 changed files with 777 additions and 249 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -7,7 +7,7 @@ last_updated: 2026-06-01

 ## Current State

-Pre-implementation. Feasibility assessment complete (see research/ssh-tunnel-vpn-alternative-feasibility.md). Architecture specification in progress.
+Pre-implementation. Feasibility assessment complete. Architecture specification drafted — all open questions resolved, pending review.

 ## Architecture Documents

@@ -17,7 +17,7 @@ Pre-implementation. Feasibility assessment complete (see research/ssh-tunnel-vpn
 | [transport.md](transport.md) | draft | Transport abstraction: TCP, TLS, iroh |
 | [client.md](client.md) | draft | Client connection, SOCKS5, port forwarding |
 | [server.md](server.md) | draft | Server acceptance, channel handling, proxy |
-| [tun-shim.md](tun-shim.md) | draft | Privileged TUN interface wrapper (separate process) |
+| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
 | [napi-and-pubsub.md](napi-and-pubsub.md) | draft | NAPI wrapper and pubsub event target adapter |

 ## ADR Table
@@ -25,14 +25,27 @@ Pre-implementation. Feasibility assessment complete (see research/ssh-tunnel-vpn
 | ADR | Title | Status |
 |-----|-------|--------|
 | [001](decisions/001-pluggable-transport.md) | Pluggable transport via `AsyncRead+AsyncWrite` trait | Accepted |
-| [002](decisions/002-tun-separate-process.md) | TUN shim as separate process | Accepted |
-| [003](decisions/002-iroh-stream-join.md) | iroh stream via `tokio::io::join` | Accepted |
+| [002](decisions/002-tun-separate-process.md) | TUN shim as separate process | Superseded by ADR-014 |
+| [003](decisions/003-iroh-stream-join.md) | iroh stream via `tokio::io::join` | Accepted |
 | [004](decisions/004-ssh-over-transport.md) | SSH runs over transport, not alongside | Accepted |
 | [005](decisions/005-socks5-before-tun.md) | SOCKS5 as primary interface, TUN as add-on | Accepted |
+| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Accepted |
+| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | Accepted |
+| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt certificate provisioning | Accepted |
+| [009](decisions/009-default-iroh-relay.md) | Default iroh relay with override | Accepted |
+| [010](decisions/010-transport-chaining-cli.md) | Transport chaining in CLI | Accepted |
+| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API, no file-based config | Accepted |
+| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 keys + OpenSSH cert-authority, no password auth | Accepted |
+| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging + built-in rate limiting | Accepted |
+| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN, recommend local SOCKS5 + tun2proxy | Accepted |
+| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI bridge | Accepted |
+| [016](decisions/016-napi-expose-connect-and-serve.md) | NAPI exposes both connect() and serve() | Accepted |
+| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted |
+| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted |

 ## Open Questions

-See [open-questions.md](open-questions.md)
+All open questions have been resolved. See [open-questions.md](open-questions.md) for details on each resolution.

 ## Lifecycle Definitions

--- a/docs/architecture/client.md
+++ b/docs/architecture/client.md
@@ -7,11 +7,11 @@ last_updated: 2026-06-01

 ## What

-The wraith client establishes an SSH session to a server (via pluggable transport) and exposes local interfaces for routing traffic through that session: SOCKS5 proxy, port forwarding, and eventually TUN.
+The wraith client establishes an SSH session to a server (via pluggable transport) and exposes a local SOCKS5 proxy for routing traffic through that session. Port forwarding (`-L` / `-R` style) covers specific service access like Postgres or Redis.

 ## Why

-Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary interface — it's standard, well-supported by browsers and CLI tools, and needs no privileges. Port forwarding (`-L` / `-R` style) covers specific service access like Postgres or Redis. TUN covers full-system VPN-like behavior.
+Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary interface — it's standard, well-supported by browsers and CLI tools, and needs no privileges. Port forwarding covers specific service access. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside wraith (ADR-014).

 ## Architecture

@@ -21,11 +21,11 @@ Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary
 ┌────────────────────────────────────────────────────────┐
 │                     wraith connect                      │
 │                                                        │
-│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
-│  │ SOCKS5   │ │ Port     │ │ Remote   │ │ (TUN     │ │
-│  │ Server   │ │ Forward  │ │ Forward  │ │  shim)   │ │
-│  │ :1080    │ │ -L spec  │ │ -R spec  │ │ separate │ │
-│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────────┘ │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐              │
+│  │ SOCKS5   │ │ Port     │ │ Remote   │              │
+│  │ Server   │ │ Forward  │ │ Forward  │              │
+│  │ :1080    │ │ -L spec  │ │ -R spec  │              │
+│  └────┬─────┘ └────┬─────┘ └────┬─────┘              │
 │       │             │             │                     │
 │       ▼             ▼             ▼                     │
 │  ┌─────────────────────────────────┐                   │
@@ -94,6 +94,16 @@ On transport failure:

 Existing TCP connections through the tunnel are lost on reconnect. This is acceptable — same as any VPN.

+### Programmatic Configuration (ADR-011)
+
+The client uses programmatic configuration — no `~/.ssh/config` parsing, no custom config files. Configuration comes from:
+
+1. **CLI flags**: `--server`, `--identity`, `--transport`, etc.
+2. **Library API**: `ConnectOptions` and `ServeOptions` structs in `wraith-core`, constructable programmatically
+3. **Environment variables**: `WRAITH_SERVER`, `WRAITH_IDENTITY` as convenience defaults
+
+This approach avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`) and makes the library API clean for programmatic consumers like the NAPI wrapper. Keys can be provided as file paths or in-memory data.
+
 ### CLI Interface

 ```bash
@@ -103,9 +113,18 @@ wraith connect --server example.com --identity ~/.ssh/id_ed25519
 # With TLS
 wraith connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519

+# With TLS + insecure (self-signed certs)
+wraith connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519 --insecure
+
 # With iroh (no public IP needed)
 wraith connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519

+# With iroh + custom relay
+wraith connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --iroh-relay https://relay.example.com
+
+# With iroh + proxy (transport chaining)
+wraith connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --proxy socks5://127.0.0.1:1080
+
 # SOCKS5 on custom port
 wraith connect --server example.com --socks5 127.0.0.1:1080 --identity ~/.ssh/id_ed25519

@@ -114,30 +133,36 @@ wraith connect --server example.com --forward 5432:db.internal:5432 --forward 63

 # All options
 wraith connect \
-  --server <addr> \          # TCP server address (required for tcp/tls)
+  --server <addr> \          # TCP/TLS server address (required for tcp/tls)
  --peer <endpoint-id> \    # iroh peer ID (required for iroh)
  --transport tcp|tls|iroh \ # Transport mode
-  --identity <path> \       # SSH private key path
+  --identity <path-or-buffer> \ # SSH private key (path or in-memory)
  --socks5 <addr:port> \    # SOCKS5 listen address (default: 127.0.0.1:1080)
  --forward <spec> \        # Port forward spec (repeatable)
  --remote-forward <spec> \ # Remote port forward spec (repeatable)
-  --proxy <url>              # Upstream proxy (SOCKS5/HTTP CONNECT)
+  --proxy <url> \            # Upstream proxy (socks5:// or http://)
+  --iroh-relay <url> \      # iroh relay URL (default: n0 relay)
+  --tls-server-name <host> \ # SNI hostname for TLS
+  --insecure                 # Accept self-signed TLS certs
 ```

 ## Constraints

 - SOCKS5 is always enabled when `wraith connect` runs (it's the primary interface). Port forwards are optional.
- The client does not know or log what destinations are accessed. The SOCKS5 server connects and proxies — no logging of SOCKS5 request targets.
- Authentication is Ed25519 public key only by default. Password auth supported but not recommended. (OQ-04)
+- The client does not log tunnel destinations. The SOCKS5 server connects and proxies — no logging of SOCKS5 request targets.
+- Authentication is Ed25519 public key or OpenSSH certificate (ADR-012). No password authentication over SSH.
 - Only one SSH session per `wraith connect` process. Multiple sessions = multiple processes (or a future multiplexer).
+- No `~/.ssh/config` parsing. Configuration is programmatic via CLI flags, env vars, or library API structs (ADR-011).
+- VPN-like "route all traffic" behavior is provided by running `tun2proxy --proxy socks5://127.0.0.1:1080` alongside the client, not by a built-in TUN interface (ADR-014).

 ## Open Questions

- **OQ-04**: Authentication beyond Ed25519 keys
- **OQ-06**: Whether to support SSH config file parsing (`~/.ssh/config`) for default host/key/port settings
+None — all resolved.

 ## Design Decisions

 | ADR | Decision | Summary |
 |-----|----------|---------|
-| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface, TUN forwards to it |
+| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
+| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs, env vars, or CLI flags |
+| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | No password auth; OpenSSH cert-authority for multi-user |
--- a/docs/architecture/decisions/002-tun-separate-process.md
+++ b/docs/architecture/decisions/002-tun-separate-process.md
@@ -1,7 +1,7 @@
 # ADR-002: TUN Shim as Separate Process

 ## Status
-Accepted
+Superseded by ADR-014

 ## Context
 TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
--- a/docs/architecture/decisions/005-socks5-before-tun.md
+++ b/docs/architecture/decisions/005-socks5-before-tun.md
@@ -28,11 +28,11 @@ TUN forwards to SOCKS5 rather than directly to SSH because:
 - No root code in the core binary

 ## Consequences
- **Positive**: Core binary is root-free. TUN is optional and separate.
+- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
 - **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
- **Positive**: TUN implementation is simplified — it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode would handle non-DNS UDP via the SOCKS5 UDP association or drop it.
+- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
+- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `wraith connect` — two processes instead of one integrated binary.
+- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.

 ## References
 - [client.md](../client.md)
--- a/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
+++ b/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
@@ -0,0 +1,38 @@
+# ADR-006: No Logging of Tunnel Destinations
+
+## Status
+Accepted
+
+## Context
+An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
+
+- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
+- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
+- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
+
+However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
+
+## Decision
+The server does NOT log:
+- `channel_open_direct_tcpip` destinations (host, port)
+- DNS resolutions performed by the server on behalf of clients
+- Bytes transferred through tunnel channels
+- Connection duration or throughput
+
+The server DOES log (ADR-013):
+- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
+- Connection opened (remote_addr, transport kind)
+- Connection closed (remote_addr, duration)
+
+This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
+
+## Consequences
+- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
+- **Positive**: Reduces legal and privacy exposure for server operators.
+- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
+- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside wraith (e.g., network-level logging at the target host).
+- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
+
+## References
+- [server.md](../server.md)
+- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
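
One way to make the logged/unlogged split in ADR-006 hard to violate is to give log events a type that has no destination field at all. The following is a minimal sketch under that assumption. `ServerEvent` and `to_log_line` are hypothetical names, not part of wraith, and the field layout simply mirrors the log format listed in ADR-013.

```rust
use std::net::IpAddr;

/// Hypothetical event type: only fields that ADR-006 allows in logs exist here.
/// There is deliberately no variant that can carry a tunnel destination.
#[derive(Debug, Clone, PartialEq)]
pub enum ServerEvent {
    AuthAttempt {
        remote_addr: IpAddr,
        user: String,
        key_fingerprint: String,
        accepted: bool,
    },
    ConnectionOpened {
        remote_addr: IpAddr,
        transport: &'static str,
    },
    ConnectionClosed {
        remote_addr: IpAddr,
        duration_secs: u64,
    },
}

impl ServerEvent {
    /// Render the event as one key=value line, following the ADR-013 field names.
    pub fn to_log_line(&self) -> String {
        match self {
            ServerEvent::AuthAttempt { remote_addr, user, key_fingerprint, accepted } => {
                let result = if *accepted { "accept" } else { "reject" };
                format!(
                    "level=INFO, msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
                    remote_addr, user, key_fingerprint, result
                )
            }
            ServerEvent::ConnectionOpened { remote_addr, transport } => format!(
                "level=INFO, msg=\"connection opened\", remote_addr={}, transport={}",
                remote_addr, transport
            ),
            ServerEvent::ConnectionClosed { remote_addr, duration_secs } => format!(
                "level=INFO, msg=\"connection closed\", remote_addr={}, duration={}",
                remote_addr, duration_secs
            ),
        }
    }
}
```

The `user` value is attacker-controlled, so a real implementation would likely need to escape or quote it before writing the line. That step is omitted here.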
--- a/docs/architecture/decisions/007-napi-single-stream.md
+++ b/docs/architecture/decisions/007-napi-single-stream.md
@@ -1,4 +1,4 @@
-# ADR-006: NAPI Exposes Single Duplex Stream
+# ADR-007: NAPI Exposes Single Duplex Stream

 ## Status
 Accepted
--- a/docs/architecture/decisions/008-acme-lets-encrypt.md
+++ b/docs/architecture/decisions/008-acme-lets-encrypt.md
@@ -0,0 +1,38 @@
+# ADR-008: ACME/Let's Encrypt Certificate Provisioning
+
+## Status
+Accepted
+
+## Context
+TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
+
+There are two ACME flows:
+1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
+2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
+
+Both flows are important for wraith's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
+
+## Decision
+Support both ACME certificate provisioning paths:
+
+1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
+
+2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
+
+3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
+
+The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps wraith self-contained as a single binary.
+
+## Consequences
+- **Positive**: Users can run `wraith serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
+- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
+- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
+- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
+- **Negative**: IP-based short-lived certs require more frequent renewal handling.
+- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
+
+## References
+- [server.md](../server.md)
+- [OQ-01](../open-questions.md) — resolved by this ADR
+- [OQ-07](../open-questions.md) — resolved by this ADR
+- Production certbot setup: `/workspace/system/dev1/certbot.md`
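
The three provisioning paths in ADR-008 could be modelled as a single choice made at startup. This is a sketch only: `CertSource` and `choose_cert_source` are hypothetical names, and the precedence shown (manual files first, then domain ACME, then IP ACME as the fallback) is an assumption, since the ADR does not state how the paths are selected or which flag enables the IP-based flow.

```rust
use std::path::PathBuf;

/// Hypothetical model of the three certificate paths described in ADR-008.
#[derive(Debug, PartialEq)]
pub enum CertSource {
    /// `--tls-cert` / `--tls-key` supplied by the operator.
    Manual { cert: PathBuf, key: PathBuf },
    /// `--acme-domain <domain>`: domain-bound certificate.
    AcmeDomain(String),
    /// No domain available: short-lived IP certificate.
    AcmeIp,
}

/// Assumed precedence (not specified by the ADR): manual > domain ACME > IP ACME.
pub fn choose_cert_source(
    tls_cert: Option<&str>,
    tls_key: Option<&str>,
    acme_domain: Option<&str>,
) -> Result<CertSource, String> {
    match (tls_cert, tls_key) {
        (Some(cert), Some(key)) => Ok(CertSource::Manual {
            cert: PathBuf::from(cert),
            key: PathBuf::from(key),
        }),
        // A certificate without its key (or the reverse) is a configuration error.
        (Some(_), None) | (None, Some(_)) => {
            Err("--tls-cert and --tls-key must be given together".to_string())
        }
        (None, None) => Ok(match acme_domain {
            Some(domain) => CertSource::AcmeDomain(domain.to_string()),
            None => CertSource::AcmeIp,
        }),
    }
}
```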
--- a/docs/architecture/decisions/009-default-iroh-relay.md
+++ b/docs/architecture/decisions/009-default-iroh-relay.md
@@ -0,0 +1,28 @@
+# ADR-009: Default iroh Relay with Override
+
+## Status
+Accepted
+
+## Context
+iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
+
+- n0's relay could change terms, rate-limit, or go down
+- Production deployments may want self-hosted relays for reliability and privacy
+- The relay URL is a configuration point that should be explicit
+
+Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
+
+## Decision
+Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
+
+This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
+
+## Consequences
+- **Positive**: Zero-config iroh transport for testing and development. `wraith serve --transport iroh` just works.
+- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
+- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
+- **Negative**: Privacy-conscious users must remember to pass `--iroh-relay` to avoid n0. Mitigated by documentation.
+
+## References
+- [transport.md](../transport.md)
+- [OQ-02](../open-questions.md) — resolved by this ADR
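
The default-with-override rule in ADR-009 is small enough to sketch directly. `RelayChoice` and `resolve_relay` are hypothetical names, and requiring an `https://` URL is an assumption made for this example rather than something the ADR states. Handing the result to iroh is left out because that API is not described in this document.

```rust
/// Hypothetical result of resolving the `--iroh-relay` flag.
#[derive(Debug, PartialEq)]
pub enum RelayChoice {
    /// No flag given: use the n0 default relays.
    N0Default,
    /// Operator-supplied relay URL.
    Custom(String),
}

/// Resolve the optional `--iroh-relay <url>` value.
/// Assumption: custom relays must be `https://` URLs with a non-empty host part.
pub fn resolve_relay(iroh_relay: Option<&str>) -> Result<RelayChoice, String> {
    const PREFIX: &str = "https://";
    match iroh_relay {
        None => Ok(RelayChoice::N0Default),
        Some(url) if url.starts_with(PREFIX) && url.len() > PREFIX.len() => {
            Ok(RelayChoice::Custom(url.to_string()))
        }
        Some(url) => Err(format!("invalid relay URL {url:?}: expected https://<host>")),
    }
}
```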
--- a/docs/architecture/decisions/010-transport-chaining-cli.md
+++ b/docs/architecture/decisions/010-transport-chaining-cli.md
@@ -0,0 +1,33 @@
+# ADR-010: Transport Chaining in CLI
+
+## Status
+Accepted
+
+## Context
+Transport chaining allows combining iroh with an upstream proxy, e.g.:
+
+```bash
+wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
+```
+
+This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another wraith instance. This is important for:
+- Nested tunnel topologies
+- Environments where iroh needs to go through an existing proxy
+- Composing transports in flexible ways
+
+iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
+
+## Decision
+Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
+
+For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
+
+## Consequences
+- **Positive**: Flexible transport composition without requiring separate manual configuration.
+- **Positive**: Matches user expectation from the overview doc's transport chaining example.
+- **Positive**: Implementation is minimal — iroh already supports proxy config.
+- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
+
+## References
+- [transport.md](../transport.md)
+- [OQ-05](../open-questions.md) — resolved by this ADR
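
The "parse the proxy URL" step from ADR-010 might look like the sketch below. `UpstreamProxy`, `ProxyScheme`, and `parse_proxy` are hypothetical names. The parser accepts only the two schemes listed in client.md (`socks5://` and `http://`), does not handle credentials in the URL, and stops before the hand-off to iroh's endpoint builder.

```rust
/// Proxy schemes listed for `--proxy` in client.md.
#[derive(Debug, PartialEq)]
pub enum ProxyScheme {
    Socks5,
    Http,
}

/// Hypothetical parsed form of a `--proxy <url>` value.
#[derive(Debug, PartialEq)]
pub struct UpstreamProxy {
    pub scheme: ProxyScheme,
    pub host: String,
    pub port: u16,
}

/// Parse `socks5://host:port` or `http://host:port`.
/// Simplification: no userinfo, no path; an explicit port is required.
pub fn parse_proxy(url: &str) -> Result<UpstreamProxy, String> {
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {url:?}"))?;
    let scheme = match scheme {
        "socks5" => ProxyScheme::Socks5,
        "http" => ProxyScheme::Http,
        other => return Err(format!("unsupported proxy scheme {other:?}")),
    };
    // Tolerate a trailing slash, then split on the last ':' so that
    // bracketed IPv6 literals such as [::1]:1080 keep their host part intact.
    let authority = rest.trim_end_matches('/');
    let (host, port) = authority
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in {url:?}"))?;
    if host.is_empty() {
        return Err(format!("missing host in {url:?}"));
    }
    let port: u16 = port
        .parse()
        .map_err(|_| format!("invalid port in {url:?}"))?;
    Ok(UpstreamProxy { scheme, host: host.to_string(), port })
}
```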
--- a/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
+++ b/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
@@ -0,0 +1,38 @@
+# ADR-011: Programmatic-First API, No File-Based Config
+
+## Status
+Accepted
+
+## Context
+The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
+
+1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
+2. **Custom config file**: Wraith-specific config file (TOML/YAML) with host definitions.
+3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
+4. **Hybrid**: `--config` flag pointing to a wraith-specific config file, but no OpenSSH config parsing.
+
+## Decision
+Option 3: Programmatic-first API. Configuration is provided via:
+- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
+- **Library API**: `wraith_core::client::ConnectOptions` and `wraith_core::server::ServeOptions` structs, constructable programmatically
+- **Environment variables**: for a few convenience defaults (e.g., `WRAITH_SERVER`, `WRAITH_IDENTITY`)
+
+No `~/.ssh/config` parsing, no wraith-specific config files. This approach:
+- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
+- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
+- Keeps the CLI simple and explicit — no hidden behavior from config files
+- Matches the design principle that the library crate (`wraith-core`) is the primary interface
+
+If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
+
+## Consequences
+- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
+- **Positive**: No cross-platform path issues in the core library.
+- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
+- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
+- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
+- **Negative**: No config file convenience. Users accustomed to `~/.ssh/config` may find this inconvenient.
+
+## References
+- [client.md](../client.md)
+- [OQ-06](../open-questions.md) — resolved by this ADR
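
To illustrate how the three configuration sources in ADR-011 could combine, here is a sketch in which explicit flags win over environment variables. The ADR names the `ConnectOptions` struct but not its fields, so the field set, the `Identity` and `Transport` enums, and `resolve_connect_options` are all assumptions. The environment lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transport {
    Tcp,
    Tls,
    Iroh,
}

/// Keys may be given as a file path or as in-memory data (per client.md).
#[derive(Debug, Clone, PartialEq)]
pub enum Identity {
    Path(PathBuf),
    InMemory(Vec<u8>),
}

/// Hypothetical shape of the options struct; the real fields are not specified.
#[derive(Debug, Clone, PartialEq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Identity,
    pub transport: Transport,
}

/// Build options from CLI flags, falling back to WRAITH_SERVER / WRAITH_IDENTITY.
/// `env` abstracts the lookup; a CLI would pass `|k| std::env::var(k).ok()`.
pub fn resolve_connect_options(
    flag_server: Option<String>,
    flag_identity: Option<String>,
    transport: Transport,
    env: impl Fn(&str) -> Option<String>,
) -> Result<ConnectOptions, String> {
    let server = flag_server
        .or_else(|| env("WRAITH_SERVER"))
        .ok_or_else(|| "no server: pass --server or set WRAITH_SERVER".to_string())?;
    let identity = flag_identity
        .or_else(|| env("WRAITH_IDENTITY"))
        .ok_or_else(|| "no identity: pass --identity or set WRAITH_IDENTITY".to_string())?;
    Ok(ConnectOptions {
        server,
        identity: Identity::Path(PathBuf::from(identity)),
        transport,
    })
}
```

A library consumer such as the NAPI wrapper would skip this function and construct `ConnectOptions` directly, for example with `Identity::InMemory`.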
--- a/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
+++ b/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
@@ -0,0 +1,42 @@
+# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
+
+## Status
+Accepted
+
+## Context
+SSH authentication has several options:
+- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
+- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
+- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
+
+The question is which auth methods to support and prioritize.
+
+## Decision
+
+**Primary: Ed25519 public key** (already specified, no change).
+
+**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
+
+**Not supported: Password authentication.** Password auth on the local SOCKS5 proxy (i.e., the proxy itself requiring a password) is not in scope. Password auth for the SSH session itself is rejected because:
+- It's less secure than key-based auth
+- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
+- It's not needed when cert-authority provides easy multi-user management
+- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
+
+The server's `authorized_keys` file format follows OpenSSH conventions:
+- Regular keys: `ssh-ed25519 AAAA... user@host`
+- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
+- CA trusts with options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
+
+## Consequences
+- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
+- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
+- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
+- **Positive**: `russh` supports OpenSSH certificate verification natively.
+- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
+- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
+
+## References
+- [client.md](../client.md)
+- [server.md](../server.md)
+- [OQ-04](../open-questions.md) — resolved by this ADR
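
Telling regular keys apart from CA trust lines in `authorized_keys` can be sketched as follows. `AuthorizedEntry` and `parse_authorized_key_line` are hypothetical names. The parser is deliberately simplified: it does not handle quoted option values that contain spaces or commas, and it does not decode or verify the key material.

```rust
/// Hypothetical parsed form of one `authorized_keys` line.
#[derive(Debug, PartialEq)]
pub struct AuthorizedEntry {
    /// True when the options include `cert-authority`.
    pub is_ca: bool,
    pub options: Vec<String>,
    pub key_type: String,
    pub key_b64: String,
    pub comment: String,
}

/// Rough check for a key-type token (covers the common OpenSSH prefixes).
fn is_key_type(token: &str) -> bool {
    token.starts_with("ssh-") || token.starts_with("ecdsa-") || token.starts_with("sk-")
}

/// Parse one line; returns None for blanks, comments, and malformed lines.
/// Simplification: option values with quoted spaces or commas are not supported.
pub fn parse_authorized_key_line(line: &str) -> Option<AuthorizedEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut tokens = line.split_whitespace();
    let first = tokens.next()?;
    // If the first token is not a key type, treat it as the options field.
    let (options, key_type): (Vec<String>, &str) = if is_key_type(first) {
        (Vec::new(), first)
    } else {
        (first.split(',').map(str::to_string).collect(), tokens.next()?)
    };
    if !is_key_type(key_type) {
        return None;
    }
    let key_b64 = tokens.next()?.to_string();
    let comment = tokens.collect::<Vec<_>>().join(" ");
    let is_ca = options.iter().any(|o| o.eq_ignore_ascii_case("cert-authority"));
    Some(AuthorizedEntry {
        is_ca,
        options,
        key_type: key_type.to_string(),
        key_b64,
        comment,
    })
}
```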
--- a/docs/architecture/decisions/013-fail2ban-friendly-logging.md
+++ b/docs/architecture/decisions/013-fail2ban-friendly-logging.md
@@ -0,0 +1,39 @@
+# ADR-013: Fail2ban-Friendly Server Logging
+
+## Status
+Accepted
+
+## Context
+The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in `/workspace/system/dev1/fail2ban.md`) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
+
+However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
+
+## Decision
+The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
+
+**Logging** (for fail2ban integration on Linux):
+- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
+- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
+- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
+- Do NOT log: channel open targets, DNS resolutions, bytes transferred
+
+This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
+
+**Built-in rate limiting** (for all platforms):
+- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
+- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
+- Rate limiting happens at the SSH layer, before channels are opened
+
+This ensures that even without fail2ban, the server rejects obviously abusive connections.
+
+## Consequences
+- **Positive**: fail2ban can parse wraith logs the same way it parses SSH and nginx logs on our production systems.
+- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
+- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
+- **Negative**: Slightly more code in the server for connection tracking per IP.
+- **Negative**: Users with custom fail2ban filters need to write regex for wraith's log format (documented examples provided).
+
+## References
+- [server.md](../server.md)
+- [OQ-08](../open-questions.md) — resolved by this ADR
+- Production fail2ban setup: `/workspace/system/dev1/fail2ban.md`
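
The `--max-connections-per-ip` behaviour from ADR-013 amounts to a counter per source address. The sketch below assumes a hypothetical `ConnectionLimiter` type and keeps the ADR's convention that `0` means unlimited. It is single-threaded for clarity, so a real server would need to wrap it in a mutex or similar.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical tracker for active connections per source IP.
pub struct ConnectionLimiter {
    /// 0 means unlimited, matching the ADR-013 default.
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns true and counts the connection if the IP is under its limit.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let max = self.max_per_ip;
        let count = self.active.entry(ip).or_insert(0);
        if max != 0 && *count >= max {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        let remaining = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        // Drop idle entries so the map does not grow without bound.
        if remaining == 0 {
            self.active.remove(&ip);
        }
    }

    /// Number of currently tracked connections for `ip`.
    pub fn active_for(&self, ip: IpAddr) -> usize {
        self.active.get(&ip).copied().unwrap_or(0)
    }
}
```

The per-connection `--max-auth-attempts` limit is a separate counter owned by each SSH session and is not shown here.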
--- a/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
+++ b/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
@@ -0,0 +1,41 @@
+# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
+
+## Status
+Accepted
+
+## Context
+The original plan included a TUN shim (`wraith-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through wraith's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
+
+However, TUN implementation has significant complexities:
+- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
+- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
+- Virtual DNS handling
+- Root/CAP_NET_ADMIN requirements
+- TUN is easy to get wrong and hard to debug
+
+The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
+
+## Decision
+Defer TUN implementation entirely. Remove `wraith-tun` from the architecture. Instead:
+
+1. **Core interface**: wraith's local SOCKS5 proxy (always available, no root required)
+2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `wraith connect`
+3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
+
+This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `wraith-tun` can be added as a thin wrapper around tun2proxy's pattern.
+
+The `tun` feature flag and `wraith-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
+
+## Consequences
+- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
+- **Positive**: tun2proxy is already well-tested for this exact use case.
+- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
+- **Positive**: Cleaner architecture — wraith only does SSH tunneling + SOCKS5. tun2proxy does TUN.
+- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
+- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
+- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
+
+## References
+- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
+- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
+- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
--- a/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
+++ b/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
@@ -0,0 +1,27 @@
+# ADR-015: napi-rs for FFI Bridge
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
+
+1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
+
+2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
+
+The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
+
+## Decision
+Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
+
+## Consequences
+- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
+- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
+- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
+- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
+- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [OQ-11](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
+++ b/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
@@ -0,0 +1,40 @@
+# ADR-016: NAPI Exposes Both connect() and serve()
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to wraith's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
+
+1. **connect()**: Establish a client connection to a wraith server. Used by workers/spokes that need to tunnel events through a wraith server.
+2. **serve()**: Start a wraith server from Node.js. Used by hubs that want to accept wraith connections and route events.
+
+The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `wraith serve` process.
+
+More importantly, both `connect()` and `serve()` are fundamental operations of the wraith library. Since the NAPI wrapper is a thin layer over `wraith-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
+
+## Decision
+The NAPI wrapper exposes both `connect()` and `serve()` from the start:
+
+```typescript
+// @alkdev/wraith
+function connect(options: WraithConnectOptions): Promise<Duplex>;
+function serve(options: WraithServeOptions): Promise<WraithServer>;
+```
+
+- `connect()` returns a `Duplex` stream (as per ADR-007)
+- `serve()` returns a `WraithServer` object with a `close()` method and events for new connections
+
+The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
+
+## Consequences
+- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
+- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
+- **Positive**: Implementation is still minimal — `serve()` is just `wraith_core::server::run()` behind `#[napi]`.
+- **Negative**: Slightly larger API surface (two functions + `WraithServer` type instead of just `connect()`).
+- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `WraithServer`.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
+- [OQ-10](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
+++ b/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
@@ -0,0 +1,30 @@
+# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
+
+## Status
+Accepted
+
+## Context
+When running a wraith server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention — if SSH traffic on port 443 is detectable, it can be blocked.
+
+After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
+
+## Decision
+When `--stealth` is enabled with TLS transport:
+
+1. After completing the TLS handshake, peek at the first few bytes of the connection
+2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
+3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
+
+This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
+
+The fake response carries a `Server: nginx` header to match the most common web server profile.
+
+## Consequences
+- **Positive**: TLS+wraith servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
+- **Positive**: Simple implementation — just peek at the first bytes and branch.
+- **Positive**: Consistent with censorship circumvention best practices.
+- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
+- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
+
+## References
+- [server.md](../server.md)
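
Review note on ADR-017: the peek-and-branch step can be sketched as a pure function over the first bytes read after the TLS handshake. This is only an illustration under stated assumptions: the function name `sniffProtocol` and the `"need-more"` result (for peeks shorter than the 8-byte `SSH-2.0-` prefix) are not in the ADR and are added here to show how a partial read might be handled.

```typescript
// Prefix and decoy response as given in ADR-017.
const SSH_PREFIX = "SSH-2.0-";
const DECOY_RESPONSE = "HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

// "need-more" is an assumption of this sketch, not part of the ADR.
type Sniff = "ssh" | "decoy" | "need-more";

// Classify a connection from the bytes peeked so far.
function sniffProtocol(firstBytes: Uint8Array): Sniff {
  const n = Math.min(firstBytes.length, SSH_PREFIX.length);
  for (let i = 0; i < n; i++) {
    // Any mismatch against the SSH banner means we answer with the decoy.
    if (firstBytes[i] !== SSH_PREFIX.charCodeAt(i)) return "decoy";
  }
  // All peeked bytes match; decide only once the full prefix has arrived.
  return n === SSH_PREFIX.length ? "ssh" : "need-more";
}
```

On `"ssh"` the server would hand the stream to the SSH layer, and on `"decoy"` it would write `DECOY_RESPONSE` and close. A real implementation would likely also bound how long it waits in the `"need-more"` state.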
--- a/docs/architecture/decisions/018-control-channel-for-pubsub.md
+++ b/docs/architecture/decisions/018-control-channel-for-pubsub.md
@@ -0,0 +1,38 @@
+# ADR-018: Control Channel for PubSub over SSH
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper and pubsub integration need a way to use wraith's SSH channel as a data plane for event routing. When a `wraith connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
+
+For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
+
+1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `wraith-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
+2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
+3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
+
+## Decision
+Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `wraith-control:0`:
+
+1. The `channel_open_direct_tcpip` handler detects the special target via string matching
+2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
+3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
+
+The destination string `wraith-control` is reserved. The name is syntactically a valid hostname, so a collision with a real host is possible in principle but unlikely in practice; the residual risk is covered under Consequences.
+
+Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
+
+Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
+
+## Consequences
+- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
+- **Positive**: No separate port or service needs to run on the server. The control channel is built into wraith.
+- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
+- **Positive**: Port forwarding to a specific port is still available as an alternative.
+- **Negative**: The string `wraith-control` is a magic constant. It should be defined as a constant in the crate.
+- **Negative**: Regular TCP destinations accidentally matching `wraith-control` would be misrouted. Mitigated by reserving the entire `wraith-` prefix namespace.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [server.md](../server.md)
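
Review note on ADR-018: the string matching in the `channel_open_direct_tcpip` handler might be sketched as below. The function `routeDestination`, the `"reject"` outcome for other `wraith-` names, and the case-insensitive comparison are assumptions made for illustration; the ADR only specifies that `wraith-control:0` is routed to the event bus and that the `wraith-` prefix is reserved.

```typescript
// Constants named in ADR-018; defining them once avoids a scattered magic string.
const CONTROL_HOST = "wraith-control";
const RESERVED_PREFIX = "wraith-";

type Route = "control" | "reject" | "tcp";

// Decide how a direct_tcpip open request should be handled.
function routeDestination(host: string, port: number): Route {
  const h = host.toLowerCase(); // assumption: hostnames compare case-insensitively
  if (h === CONTROL_HOST && port === 0) return "control";
  // Assumption: any other name in the reserved namespace is refused, not dialed.
  if (h.startsWith(RESERVED_PREFIX)) return "reject";
  return "tcp";
}
```

Refusing the rest of the reserved namespace, rather than dialing it, is one way to make sure a destination that merely resembles the control name never reaches the network.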
--- a/docs/architecture/napi-and-pubsub.md
+++ b/docs/architecture/napi-and-pubsub.md
@@ -9,20 +9,20 @@ last_updated: 2026-06-01

 Two integration layers that enable TypeScript/JavaScript consumers to use wraith as a transport:

-1. **NAPI wrapper** (`@alkdev/wraith`) — A minimal Node.js native addon exposing `connect()` and `serve()` that return duplex streams
+1. **NAPI wrapper** (`@alkdev/wraith`) — A Node.js native addon (via napi-rs) exposing `connect()` and `serve()`, both of which hand duplex streams to the caller
 2. **PubSub event target** (`@alkdev/pubsub` adapter) — An implementation of the `TypedEventTarget` interface that routes events over wraith's SSH channel

 ## Why

 The wraith Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent spokes) is TypeScript-first. These integration layers let TypeScript code use wraith's transport without reimplementing SSH.

-The NAPI surface is intentionally tiny — it exposes the transport connection, not the full SSH protocol. The pubsub adapter is also minimal — it implements `TypedEventTarget` and serializes `EventEnvelope` JSON over the stream.
+The NAPI surface is intentionally minimal — it exposes transport connections as duplex streams, not the full SSH protocol. The pubsub adapter wraps those streams with `EventEnvelope` serialization.

 ## Architecture

-### NAPI Wrapper
+### NAPI Wrapper (napi-rs)

-The wrapper exposes a single function that establishes a wraith connection and returns a Node.js `Duplex` stream:
+The wrapper uses napi-rs (ADR-015) and exposes two functions (ADR-016):

 ```typescript
 // @alkdev/wraith (TypeScript side)
@@ -35,25 +35,50 @@ interface WraithConnectOptions {
  // Transport
  transport: 'tcp' | 'tls' | 'iroh';
  // Auth
-  identity?: string;         // path to SSH key
+  identity?: string | Buffer; // path to SSH key, or Buffer with key data
  // TLS
  tlsServerName?: string;    // SNI hostname
-  insecure?: boolean;         // accept self-signed certs
+  insecure?: boolean;        // accept self-signed certs
  // iroh
-  irohRelay?: string;        // relay URL (default: n0)
+  irohRelay?: string;       // relay URL (default: n0)
+  // Proxy
+  proxy?: string;            // upstream SOCKS5/HTTP proxy URL
 }

-function connect(options: WraithConnectOptions): Duplex;
+interface WraithServeOptions {
+  // Transport
+  transport: 'tcp' | 'tls' | 'iroh';
+  // Auth
+  hostKey?: string | Buffer;        // path to SSH host key, or Buffer with key data
+  authorizedKeys?: string | Buffer; // path to authorized_keys, or Buffer with key data
+  certAuthority?: string;   // path to CA public key for cert-authority auth
+  // TLS
+  tlsCert?: string;          // path to TLS cert
+  tlsKey?: string;           // path to TLS key
+  acmeDomain?: string;      // ACME domain for auto-cert (ADR-008)
+  // Listen
+  listen?: string;           // listen address (default: 0.0.0.0:22)
+  // iroh
+  irohRelay?: string;       // relay URL (default: n0)
+}
+
+// Returns a Duplex stream for the SSH channel
+function connect(options: WraithConnectOptions): Promise<Duplex>;
+
+// Returns a server object with close() and connection events
+function serve(options: WraithServeOptions): Promise<WraithServer>;
+
+interface WraithServer {
+  close(): Promise<void>;
+  onConnection(callback: (stream: Duplex, info: ConnectionInfo) => void): void;
+}
 ```

-The `Duplex` stream carries raw SSH channel data. On the Rust side, the NAPI function:
+The NAPI layer is **transport-agnostic** — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.

-1. Creates a transport (TCP/TLS/iroh) based on options
-2. Establishes an SSH session via `client::connect_stream()`
-3. Opens a single `direct_tcpip` channel to a well-known destination (or uses a control protocol)
-4. Returns the channel as a NAPI `Buffer` stream
+### Programmatic Configuration (ADR-011)

-**Key design decision**: The NAPI wrapper does NOT expose the full SSH channel multiplexing API. It returns one duplex stream. If the TypeScript consumer needs multiple logical channels, it multiplexes them itself (e.g., via pubsub's event routing).
+Both `connect()` and `serve()` accept options as plain objects. No file paths are mandatory — keys can be provided as `Buffer` data directly, making programmatic usage straightforward. Environment variables (`WRAITH_SERVER`, `WRAITH_IDENTITY`) provide convenience defaults.

 ### PubSub Event Target Adapter

@@ -63,7 +88,7 @@ This implements `TypedEventTarget` from `@alkdev/pubsub`:
 // @alkdev/pubsub (new adapter: event-target-wraith.ts)

 export interface WraithEventTargetOptions {
-  stream: Duplex;  // from @alkdev/wraith.connect()
+  stream: Duplex;  // from @alkdev/wraith connect(), or from a serve() onConnection callback
 }

 export interface WraithEventTarget<TEvent extends TypedEvent>
@@ -85,9 +110,11 @@ Wire protocol (same as other pubsub adapters):

 ### On the Server Side

-The wraith server exposes a "control channel" — a special `direct_tcpip` destination (e.g., `wraith-control:0`) that routes to the pubsub event bus instead of a TCP target. When a client connects to this destination:
+The wraith server uses a reserved `direct_tcpip` destination (`wraith-control:0`) for the pubsub control channel (ADR-018). When a client connects to this destination:

-1. Server's `channel_open_direct_tcpip` handler detects the special target
+1. The server's `channel_open_direct_tcpip` handler detects the reserved `wraith-control` target
 2. Instead of opening a TCP connection, it bridges the channel to its local pubsub event bus
 3. `EventEnvelope` JSON flows bidirectionally over the SSH channel

@@ -104,17 +131,21 @@ The pubsub adapter doesn't care which side initiated the SSH session. It just ne

 ## Constraints

- The NAPI wrapper exposes a single duplex stream, not the full SSH channel API. Multiplexing is done at the pubsub layer.
+- The NAPI wrapper exposes duplex streams, not the full SSH channel API. Multiplexing is done at the pubsub layer.
 - The pubsub wire protocol is length-prefixed JSON, matching the existing adapter pattern. Binary payloads should be base64-encoded in the `EventEnvelope.payload`.
 - The NAPI binary size will be ~5-10MB (includes russh + tokio + cryptography). The `iroh` feature adds significant size; it should be an optional feature.
+- Keys can be provided as file paths or `Buffer` data, supporting both CLI and programmatic usage patterns (ADR-011).

 ## Open Questions

- **OQ-10**: Whether the NAPI wrapper should expose raw channel access or a higher-level "send JSON, receive JSON" API
- **OQ-11**: Whether to use napi-rs or uniffi for the FFI bridge (napi-rs is more established for Node.js, uniffi supports more targets)
+None — all resolved.

 ## Design Decisions

 | ADR | Decision | Summary |
 |-----|----------|---------|
 | [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | No SSH multiplexing in JS, pubsub handles it |
+| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs or env vars |
+| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI | Standard Node.js native addon tooling |
+| [016](decisions/016-napi-expose-connect-and-serve.md) | Both connect() and serve() | NAPI exposes client and server sides from the start |
+| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `wraith-control` destination for event bus |
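
Review note on napi-and-pubsub.md: the Constraints section describes the wire protocol as length-prefixed JSON but this diff does not show the prefix format. The sketch below assumes a 4-byte big-endian unsigned length followed by UTF-8 JSON; that layout, and the names `encodeFrame` and `FrameDecoder`, are assumptions for illustration and should be checked against the existing pubsub adapters.

```typescript
// Assumed framing: 4-byte big-endian length, then UTF-8 JSON (not confirmed by the spec).
function encodeFrame(envelope: object): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(envelope));
  const frame = new Uint8Array(4 + body.length);
  new DataView(frame.buffer).setUint32(0, body.length, false);
  frame.set(body, 4);
  return frame;
}

// Reassembles frames from arbitrarily chunked stream data.
class FrameDecoder {
  private pending = new Uint8Array(0);

  // Feed one chunk; returns every envelope completed by it.
  push(chunk: Uint8Array): unknown[] {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const out: unknown[] = [];
    const view = new DataView(merged.buffer);
    let offset = 0;
    while (merged.length - offset >= 4) {
      const len = view.getUint32(offset, false);
      if (merged.length - offset - 4 < len) break; // body not fully received yet
      const body = merged.subarray(offset + 4, offset + 4 + len);
      out.push(JSON.parse(new TextDecoder().decode(body)));
      offset += 4 + len;
    }
    this.pending = merged.slice(offset); // keep the incomplete tail
    return out;
  }
}
```

A pubsub adapter would call `push()` from the `Duplex` stream's data handler and write `encodeFrame()` output when dispatching events. A production decoder would also want a maximum frame size so that a bad length prefix cannot force unbounded buffering.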
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -9,85 +9,85 @@ last_updated: 2026-06-01

 ### OQ-01: TLS certificate management strategy
 - **Origin**: [server.md](server.md)
- **Status**: open
- **Priority**: medium
- **Details**: Should the server support ACME/Let's Encrypt auto-provisioning (like https_proxy does), or is manual cert management sufficient? Auto-provisioning is more user-friendly but adds complexity and a dependency on the ACME protocol. Self-signed certs with `--insecure` flag on the client side covers the simple case.
- **Cross-references**: Server spec, TlsTransport implementation
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~ —
+- **Resolution**: ADR-008 — Support both domain-based and IP-based ACME/Let's Encrypt auto-provisioning, plus manual certs. Domain-based uses standard certbot-style flow with HTTP-01/TLS-ALPN-01 challenges. IP-based uses short-lived certs via TLS-ALPN-01 on port 443. Manual certs via `--tls-cert`/`--tls-key` always supported. Implementation uses `rustls-acme` or similar pure-Rust ACME client.
+- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport implementation

 ### OQ-02: iroh relay configuration defaults
 - **Origin**: [transport.md](transport.md)
- **Status**: open
- **Priority**: low
- **Details**: Should the default iroh relay be n0's free servers, or should users be required to specify one? n0's relay is convenient for testing and quick start but creates a dependency. Self-hosted relay is better for production. Consider: default to n0, allow `--iroh-relay` override, and document self-hosting.
- **Cross-references**: Transport spec, iroh docs
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-009 — Default to n0's free relay servers. Allow override via `--iroh-relay <url>`. Document self-hosted relay setup. This matches iroh's own defaults and minimizes friction for testing/development.
+- **Cross-references**: [ADR-009](decisions/009-default-iroh-relay.md), Transport spec

 ### OQ-05: Transport chaining support in CLI
 - **Origin**: [transport.md](transport.md)
- **Status**: open
- **Priority**: low
- **Details**: Should `--transport iroh --proxy socks5://...` be supported natively, or should chaining be a manual configuration thing? The iroh transport's `connect()` method would need to route its outbound through the proxy. This is possible (iroh's `Endpoint::builder` supports proxy configuration) but adds CLI complexity. Consider: defer to Phase 2.
- **Cross-references**: Transport spec
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-010 — Support `--transport iroh --proxy socks5://...` natively in the CLI. iroh's endpoint builder accepts proxy configuration directly, so the implementation is minimal. Other transport combinations (TCP+TLS) are already implicit.
+- **Cross-references**: [ADR-010](decisions/010-transport-chaining-cli.md), Transport spec

 ## Client

 ### OQ-06: SSH config file parsing
 - **Origin**: [client.md](client.md)
- **Status**: open
- **Priority**: low
- **Details**: Should the client read `~/.ssh/config` for default host/key/port settings? russh-config crate exists and can parse this. Would reduce CLI verbosity for frequent connections. Consider: `--config` flag to read from a wraith-specific config instead, avoiding OpenSSH config parsing complexity.
- **Cross-references**: Client spec
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-011 — No `~/.ssh/config` parsing, no custom config file. Configuration is programmatic-first: CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. Cross-platform path issues (`~` expansion) are avoided. The library API is the primary interface; if config files are needed later, they can be a separate layer.
+- **Cross-references**: [ADR-011](decisions/011-no-ssh-config-programmatic-api.md), Client spec

 ## Server

 ### OQ-07: ACME/Let's Encrypt support
 - **Origin**: [server.md](server.md)
- **Status**: open
- **Priority**: medium
- **Details**: Auto-provisioning TLS certs from Let's Encrypt would make TLS mode much easier to set up. But it requires port 80 or port 443 + TLS-ALPN-01 challenge support, and a persistent cert store. Consider: defer to Phase 2, document manual cert setup for MVP.
- **Cross-references**: Server spec, TlsTransport
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~ —
+- **Resolution**: ADR-008 — Same resolution as OQ-01. Both domain-based (standard, domain-bound, auto-renewing) and IP-based (short-lived, no domain required) ACME flows are supported. The domain-based path requires port 80 or DNS access for challenges. The IP-based path uses TLS-ALPN-01 on port 443 and requires the ACME client to run continuously.
+- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport

 ### OQ-08: Connection limits and rate limiting
 - **Origin**: [server.md](server.md)
- **Status**: open
- **Priority**: low
- **Details**: Should the server support configurable connection limits, rate limiting, and max simultaneous channels? Useful for preventing abuse on public-facing servers. Consider: `--max-connections` and `--max-channels-per-connection` flags.
- **Cross-references**: Server spec
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-013 — Two-layer approach: (1) Structured logging of auth attempts and connections at INFO level for fail2ban integration on Linux — matches our production fail2ban setup with nftables and systemd journal. (2) Built-in rate limiting: `--max-connections-per-ip` and `--max-auth-attempts` flags providing platform-independent abuse protection.
+- **Cross-references**: [ADR-013](decisions/013-fail2ban-friendly-logging.md), Server spec, Production fail2ban docs

 ### OQ-04: Authentication beyond Ed25519 keys
 - **Origin**: [client.md](client.md), [server.md](server.md)
- **Status**: open
- **Priority**: low
- **Details**: Should password authentication be supported? Should SSH certificates (OpenSSH cert-authority) be supported? Password auth is convenient but less secure. Certificates are useful for large-scale deployments. Consider: password auth as optional flag (`--allow-password`), certificates as future feature.
- **Cross-references**: Client spec, Server spec
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-012 — Ed25519 public key (default, unchanged) + OpenSSH certificate authority support (new, important for multi-user). No password authentication over SSH channels. If a local SOCKS5 proxy needs its own auth, that's a separate concern. Cert-authority makes multi-user management practical: one CA entry in `authorized_keys` instead of N individual keys. Certificates support expiry and restrictions.
+- **Cross-references**: [ADR-012](decisions/012-auth-ed25519-and-cert-authority.md), Client spec, Server spec

 ## TUN

 ### OQ-03: Windows TUN support scope
 - **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: open
- **Priority**: low
- **Details**: tun-rs supports Windows via wintun.dll but distributing a DLL adds complexity. Consider: Linux and macOS only for MVP, Windows as a follow-up.
- **Cross-references**: tun-shim.md
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-014 — TUN is deferred entirely from the wraith project. For VPN-like behavior, users run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside wraith. This eliminates all TUN-related scope questions (Windows, TCP reconstruction, etc.).
+- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

 ### OQ-09: TCP reconstruction approach for TUN
 - **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: open
- **Priority**: medium
- **Details**: Should the TUN shim use a userspace TCP stack (like smoltcp or tun2proxy's ip-stack) for reliable TCP reconstruction, or forward raw IP packets through SOCKS5? Raw packet forwarding requires handling segmentation, retransmission, and reordering. Userspace TCP solves this but is more code. Consider: start with SOCKS5 proxying (each TUN packet becomes a SOCKS5 connection) and add TCP reconstruction if needed.
- **Cross-references**: tun-shim.md
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~ —
+- **Resolution**: ADR-014 — TUN is deferred from wraith. tun2proxy (external tool) handles this if users need VPN-like behavior.
+- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

 ## NAPI / PubSub

 ### OQ-10: NAPI wrapper API surface
 - **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: open
- **Priority**: medium
- **Details**: Should the NAPI wrapper expose just a `connect()` function returning a `Duplex` stream, or also expose `serve()` for server-side use from Node.js? Server-side would enable running a wraith server from a Node.js process. Consider: `connect()` only for MVP, `serve()` as follow-up.
- **Cross-references**: napi-and-pubsub.md
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~ —
+- **Resolution**: ADR-016 — Expose both `connect()` and `serve()` from the start. Both are fundamental operations needed by the pubsub event target system (spokes use `connect()`, hubs could use `serve()`). The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream. This ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
+- **Cross-references**: [ADR-016](decisions/016-napi-expose-connect-and-serve.md), napi-and-pubsub.md

 ### OQ-11: napi-rs vs uniffi for FFI bridge
 - **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: open
- **Priority**: low
- **Details**: napi-rs is the standard for Node.js native addons and has the best ecosystem. uniffi supports more targets (Python, Swift, Kotlin) but is less mature for Node.js. Since the primary consumer is TypeScript/Node.js (pubsub/operations ecosystem), napi-rs is the logical choice. But if future Python or mobile consumers are anticipated, uniffi could be worth the investment.
- **Cross-references**: napi-and-pubsub.md
+- **Status**: ~~resolved~~
+- **Priority**: ~~low~~ —
+- **Resolution**: ADR-015 — Use napi-rs. It's the standard for Node.js native addons, matches our primary consumer (TypeScript/Node.js), and has the best ecosystem and documentation. If future Python or mobile consumers are needed, a separate uniffi layer can be added — the Rust core doesn't change.
+- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -38,6 +38,7 @@ The `wraith-core` crate exports the pluggable components for embedding or progra
 - `Socks5Server` — local SOCKS5 proxy that forwards through SSH channels
 - `PortForwarder` — manages local/remote port forwards
 - `ServerHandler` — russh server handler with configurable auth and channel policies
+- `ConnectOptions` / `ServeOptions` — programmatic configuration structs (no file parsing)

 ## Dependencies

@@ -47,45 +48,63 @@ The `wraith-core` crate exports the pluggable components for embedding or progra
 | `tokio` | Async runtime | No (core) |
 | `tokio-rustls` | TLS wrapping | Yes (`tls`) |
 | `rustls` | TLS implementation | Yes (`tls`) |
+| `rustls-acme` | ACME/Let's Encrypt auto-cert | Yes (`acme`) |
 | `iroh` | P2P QUIC transport | Yes (`iroh`) |
-| `tun-rs` | TUN interface | Yes (`tun`) |
 | `clap` | CLI argument parsing | No (core) |
 | `tracing` | Structured logging | No (core) |
 | `anyhow` / `thiserror` | Error handling | No (core) |

+> Note: `tun-rs` is no longer a dependency. TUN support is deferred in favor of the external `tun2proxy` tool (ADR-014).
+
 ## Architecture Constraints

 1. **SSH runs over transport, not alongside** — The transport layer produces a single `AsyncRead+AsyncWrite+Unpin+Send` stream. SSH runs over that stream via `russh::client::connect_stream()` / `russh::server::run_stream()`. The SSH layer never knows what transport it's on. (ADR-001, ADR-004)

-2. **TUN is a separate process** — The core binary never requires root. TUN functionality is a separate `wraith-tun` process that reads from a TUN device and forwards to the core's SOCKS5 port. (ADR-002)
+2. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside wraith's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)

-3. **SOCKS5 is the primary client interface** — Port forwarding and TUN are built on top of SOCKS5, not alongside it. Everything flows through the SSH channel abstraction. (ADR-005)
+3. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)

-4. **No logging of tunnel destinations** — The server logs auth attempts (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, pending)
+4. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing, no custom config files. (ADR-011)

-5. **Feature flags control transport inclusion** — `tls`, `iroh`, `tun` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
+5. **Feature flags control transport inclusion** — `tls`, `iroh`, `acme` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
+
+6. **Authentication is key-based** — Ed25519 public key (default) and OpenSSH certificate authority. No password authentication over SSH. (ADR-012)
+
+7. **NAPI exposes both connect() and serve()** — The Node.js wrapper is built with napi-rs as the FFI bridge and provides both client and server functionality. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)

 ## Design Decisions

 | ADR | Decision | Summary |
 |-----|----------|---------|
 | [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces `AsyncRead+AsyncWrite+Unpin+Send`, SSH consumes it |
-| [002](decisions/002-tun-separate-process.md) | TUN shim separate | Core binary unprivileged, TUN is a thin root wrapper |
+| [002](decisions/002-tun-separate-process.md) | TUN shim separate | Superseded — TUN is deferred, use tun2proxy (ADR-014) |
 | [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join(recv, send)` combines QUIC halves |
 | [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never accesses TCP/iroh/TLS directly |
-| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface, TUN forwards to it |
+| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
+| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Server logs auth and connections, not destinations |
+| [007](decisions/007-napi-single-stream.md) | NAPI single stream | NAPI exposes duplex streams, not SSH multiplexing |
+| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
+| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
+| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
+| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No file-based config; options are structs, env vars, CLI flags |
+| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | Ed25519 keys + OpenSSH CA; no password auth |
+| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly | Structured auth logs + built-in rate limiting |
+| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN | Use tun2proxy for VPN-like behavior; no wraith-tun binary |
+| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs | Standard Node.js native addon tooling |
+| [016](decisions/016-napi-expose-connect-and-serve.md) | connect + serve | NAPI exposes both client and server from the start |
+| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
+| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `wraith-control` destination for pubsub |

 ## Open Questions

- **OQ-01**: TLS certificate management strategy (ADR-006, pending)
- **OQ-02**: iroh relay configuration defaults (n0 relay vs self-hosted)
- **OQ-03**: Windows TUN support scope (wintun.dll dependency)
- **OQ-04**: Authentication beyond Ed25519 keys (password auth, certificate auth)
+All open questions have been resolved. See [open-questions.md](open-questions.md) for resolution details.

 ## References

 - [Feasibility Assessment](../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
 - [russh API](/workspace/russh) — SSH client/server library
 - [Dispatch](/workspace/@alkdev/dispatch) — Reference implementation of russh port forwarding
- [tun-rs](https://github.com/tun-rs/tun-rs) — Cross-platform TUN/TAP library
 - [iroh](/workspace/iroh) — P2P QUIC connections
+- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — Recommended external TUN-to-SOCKS5 tool
+- [Production certbot setup](/workspace/system/dev1/certbot.md) — Let's Encrypt on our infrastructure
+- [Production fail2ban setup](/workspace/system/dev1/fail2ban.md) — fail2ban with nftables on our infrastructure
--- a/docs/architecture/server.md
+++ b/docs/architecture/server.md
@@ -23,7 +23,7 @@ The server is the tunnel endpoint. It receives SSH channels requesting TCP conne
 │                                                   │
 │  ┌─────────────────────────────────────────────┐ │
 │  │          SSH Server (russh)                  │ │
-│  │   ServerHandler per connection              │ │
+│  │   ServerHandler per connection               │ │
 │  │   - auth_publickey() → Accept/Reject        │ │
 │  │   - channel_open_direct_tcpip() → connect   │ │
 │  │   - channel_open_forwarded_tcpip() → proxy  │ │
@@ -35,29 +35,53 @@ The server is the tunnel endpoint. It receives SSH channels requesting TCP conne
 │  └──────────────────────────────────────────────┘ │
 │                                                   │
 │  ┌──────────────────────────────────────────────┐ │
-│  │         Outbound Proxy (optional)             │ │
+│  │         Outbound Proxy (optional)            │ │
 │  │   - Direct TCP                               │ │
-│  │   - SOCKS5 proxy                             │ │
-│  │   - HTTP CONNECT proxy                       │ │
+│  │   - SOCKS5 proxy                            │ │
+│  │   - HTTP CONNECT proxy                      │ │
+│  └──────────────────────────────────────────────┘ │
+│                                                   │
+│  ┌──────────────────────────────────────────────┐ │
+│  │         Rate Limiter                         │ │
+│  │   - max-connections-per-ip                   │ │
+│  │   - max-auth-attempts                        │ │
 │  └──────────────────────────────────────────────┘ │
 └──────────────────────────────────────────────────┘
 ```

 ### Authentication

-The server supports Ed25519 public key authentication by default:
+The server supports Ed25519 public key authentication (default) and OpenSSH certificate authority authentication (ADR-012):

-1. Load authorized keys from `~/.ssh/authorized_keys` or a specified path
+**Ed25519 public key** (default):
+1. Load authorized keys from a specified path or in-memory data
 2. `auth_publickey()` checks the presented key against the authorized set
 3. Uses constant-time comparison to prevent timing attacks

-Optional password authentication (not recommended, controlled by feature flag or CLI flag).
+**OpenSSH certificate authority** (ADR-012):
+1. Load a trusted CA public key (`--cert-authority <path>`)
+2. `auth_publickey()` validates the presented certificate: checks CA signature, expiry, and principal restrictions
+3. Supports certificate extensions and critical options such as `permit-port-forwarding`, `permit-pty`, and `source-address`
+
+This enables multi-user deployments where adding one CA line to `authorized_keys` is simpler than managing individual keys for every user.
+
+**No password authentication over SSH.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
+
+### TLS Certificate Provisioning
+
+The server supports three TLS certificate modes (ADR-008):
+
+1. **Manual certs** (`--tls-cert` / `--tls-key`): User provides certificate and key files. For users with existing PKI.
+2. **Domain-based ACME** (`--acme-domain <domain>`): Auto-provisions certificates from Let's Encrypt using HTTP-01 or TLS-ALPN-01 challenges. Certificate is domain-bound and auto-renews. Requires port 80 (HTTP-01) or port 443 (TLS-ALPN-01) to be reachable for challenges.
+3. **IP-based ACME**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain name needed, but certificates expire frequently. The ACME client runs continuously.
+
+ACME support is feature-gated behind the `acme` feature flag to keep the base binary lean. Implementation uses `rustls-acme` or a similar pure-Rust ACME client to avoid an external `certbot` dependency.

 ### Channel Handling

 When a client opens a `channel_open_direct_tcpip(host, port, originator_addr, originator_port)`:

-1. **ACL check** — verify the client is allowed to connect to `host:port` (if ACLs are configured)
+1. **Target selection** — take `host:port` from the channel-open request as the connection target
 2. **Outbound connection** — connect to the target, either directly or via the configured outbound proxy
 3. **Bidirectional proxy** — `tokio::io::copy_bidirectional` between the SSH channel stream and the outbound TCP stream
 4. **Cleanup** — close the channel and TCP stream when either side disconnects
@@ -88,6 +112,7 @@ This makes the server appear as an ordinary web server to port scanners and DPI
 ```rust
 struct WraithServerHandler {
    authorized_keys: HashSet<PublicKey>,
+    cert_authorities: Vec<PublicKey>,
    proxy_config: Option<ProxyConfig>,
 }

@@ -95,11 +120,19 @@ impl server::Handler for WraithServerHandler {
    type Error = anyhow::Error;

    async fn auth_publickey(&mut self, user: &str, key: &PublicKey) -> Auth {
+        // Check direct key match
        if self.authorized_keys.contains(key) {
-            Auth::Accept
-        } else {
-            Auth::Reject { proceed_with_methods: None, partial_success: false }
+            return Auth::Accept;
        }
+        // Check certificate authority validation (pseudocode: `as_certificate`, `verify`, and `is_valid` stand in for the real certificate API)
+        if let Some(cert) = key.as_certificate() {
+            for ca in &self.cert_authorities {
+                if cert.verify(ca) && cert.is_valid() {
+                    return Auth::Accept;
+                }
+            }
+        }
+        Auth::Reject { proceed_with_methods: None, partial_success: false }
    }

    async fn channel_open_direct_tcpip(
@@ -111,7 +144,6 @@ impl server::Handler for WraithServerHandler {
        originator_port: u32,
        session: &mut server::Session,
    ) -> Result<Channel<server::Msg>, Self::Error> {
-        // ACL check (if configured)
        // Connect to host:port (directly or via proxy)
        // Spawn bidirectional proxy task
        Ok(channel)
@@ -119,12 +151,29 @@ impl server::Handler for WraithServerHandler {
 }
 ```

-### Logging
+### Logging and Rate Limiting

- **Log**: Auth attempts (timestamp, source IP, user, key fingerprint, success/failure)
- **Do not log**: Channel open targets, DNS resolutions, bytes transferred, connection duration
+**Logging** (for fail2ban integration on Linux):

-This provides enough information for fail2ban integration without creating a privacy-sensitive audit trail.
+- `INFO` level: auth attempts (remote_addr, user, key_fingerprint, accept/reject)
+- `INFO` level: connection opened (remote_addr, transport kind)
+- `INFO` level: connection closed (remote_addr, duration)
+- Do NOT log: channel open targets, DNS resolutions, bytes transferred
+
+This matches our production fail2ban setup, which filters on source IP + failure indicators. Example log lines:
+```
+INFO auth attempt remote_addr=203.0.113.50 user=root key_fingerprint=SHA256:abc... result=reject
+INFO connection opened remote_addr=203.0.113.50 transport=tls
+```
+
+**Built-in rate limiting** (platform-independent):
+
+| Flag | Default | Purpose |
+|------|---------|---------|
+| `--max-connections-per-ip` | 0 (unlimited) | Reject new connections from IPs with N active connections |
+| `--max-auth-attempts` | 10 | Disconnect after N failed auth attempts per connection |
+
+These provide abuse protection on platforms without fail2ban (macOS, Windows, BSD) and complement fail2ban on Linux.

 ### CLI Interface

@@ -132,17 +181,21 @@ This provides enough information for fail2ban integration without creating a pri
 # Basic server (SSH on port 22)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key

-# With TLS on port 443
+# With TLS (manual certs)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
    --tls-cert /etc/ssl/cert.pem \
    --tls-key /etc/ssl/key.pem

+# With TLS (auto ACME, domain-based)
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --transport tls \
+    --acme-domain example.com
+
 # With TLS + stealth (fake nginx 404 to scanners)
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
-    --tls-cert /etc/ssl/cert.pem \
-    --tls-key /etc/ssl/key.pem \
+    --acme-domain example.com \
    --stealth

 # With iroh transport (no public IP needed)
@@ -153,40 +206,54 @@ wraith serve --key ~/.ssh/ssh_host_ed25519_key \
 wraith serve --key ~/.ssh/ssh_host_ed25519_key \
    --proxy socks5://127.0.0.1:9050

+# With certificate authority authentication
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --cert-authority /etc/wraith/ca.pub
+
+# With rate limiting
+wraith serve --key ~/.ssh/ssh_host_ed25519_key \
+    --max-connections-per-ip 5 \
+    --max-auth-attempts 3
+
 # All options
 wraith serve \
-  --key <path> \              # SSH host key path (required)
-  --authorized-keys <path> \  # Authorized keys file (default: ~/.ssh/authorized_keys)
-  --transport tcp|tls|iroh \  # Transport mode
-  --listen <addr:port> \      # Listen address for TCP/TLS (default: 0.0.0.0:22)
-  --tls-cert <path> \         # TLS certificate (required for tls transport)
-  --tls-key <path> \          # TLS private key (required for tls transport)
-  --stealth \                 # Serve fake nginx 404 to non-SSH connections
-  --proxy <url> \             # Outbound proxy URL (socks5:// or http://)
-  --iroh-relay <url>          # iroh relay server URL (default: n0 relay)
+  --key <path-or-buffer> \       # SSH host key (required)
+  --authorized-keys <path> \     # Authorized keys file
+  --cert-authority <path> \      # CA public key for cert-auth
+  --transport tcp|tls|iroh \     # Transport mode
+  --listen <addr:port> \         # Listen address for TCP/TLS (default: 0.0.0.0:22)
+  --tls-cert <path> \            # TLS certificate (manual)
+  --tls-key <path> \             # TLS private key (manual)
+  --acme-domain <domain> \       # ACME auto-cert domain
+  --stealth \                    # Serve fake nginx 404 to non-SSH connections
+  --proxy <url> \                # Outbound proxy URL (socks5:// or http://)
+  --iroh-relay <url> \           # iroh relay server URL (default: n0 relay)
+  --max-connections-per-ip <n> \ # Max concurrent connections per IP (default: unlimited)
+  --max-auth-attempts <n>        # Max auth failures before disconnect (default: 10)
 ```

 ### iroh Server Mode

 When running with `--transport iroh`, the server:

-1. Creates an `iroh::Endpoint` with the SSH ALPN
+1. Creates an `iroh::Endpoint` with ALPN value `b"wraith-ssh"`
 2. Prints its `EndpointId` (Ed25519 public key) — this is what clients use to connect
 3. Uses `iroh::protocol::Router` to accept incoming connections
 4. For each connection, accepts a bidirectional stream via `accept_bi()` and passes it to `server::run_stream()`

-No listening port is needed. The server connects outbound to the iroh relay and awaits connections from clients who know its `EndpointId`.
+No listening port is needed. The server connects outbound to the iroh relay (default: n0, override with `--iroh-relay`) and awaits connections from clients who know its `EndpointId`.

 ## Constraints

- The server does not log tunnel destinations (ADR-006, pending)
+- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
 - One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
 - The server binds to a single transport at a time. Running multiple transports (e.g., TCP + iroh) simultaneously requires separate processes or a future multiplexing feature.
+- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
+- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).

 ## Open Questions

- **OQ-07**: Whether to support ACME/Let's Encrypt auto-provisioning for TLS certificates
- **OQ-08**: Connection limits and rate limiting configuration
+None — all resolved.

 ## Design Decisions

@@ -194,3 +261,9 @@ No listening port is needed. The server connects outbound to the iroh relay and
 |-----|----------|---------|
 | [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait, SSH consumes stream |
 | [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
+| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Server logs auth and connections, not destinations |
+| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
+| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority auth | No password auth; support OpenSSH cert-authority |
+| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
+| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
+| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `wraith-control` destination for pubsub |
--- a/docs/architecture/transport.md
+++ b/docs/architecture/transport.md
@@ -84,6 +84,32 @@ let stream = tokio::io::join(recv_stream, send_stream);

 See ADR-003 for the decision to use `tokio::io::join` over a custom wrapper.

+### iroh Relay Configuration
+
+By default, iroh transport uses n0's free relay servers (`https://relay.iroh.network/`). This provides zero-config NAT traversal for testing and development. For production deployments, users override with `--iroh-relay <url>` to point to a self-hosted relay.
+
+The relay URL is passed to iroh's `Endpoint::builder()` configuration. Self-hosted relay setup is documented in the project wiki.
+
+See ADR-009 for the decision to default to n0's relay with override.
+
+### Transport Chaining
+
+Transports can be nested. The CLI supports `--transport iroh --proxy socks5://...` natively (ADR-010):
+
+```bash
+wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
+```
+
+This routes iroh's outbound TCP connections through the specified SOCKS5 proxy. iroh's `Endpoint::builder` accepts proxy configuration directly, so the implementation is minimal — parse the proxy URL and pass it to the endpoint builder.
+
+For other combinations:
+- TCP + TLS is already implicit (TLS wraps TCP in `TlsTransport`)
+- TLS + SOCKS5 proxy is also supported via `--proxy` with `--transport tls`
+
+**Note**: `--proxy` has different semantics on the client vs the server:
+- **Client**: `--proxy` routes the *transport connection* through the proxy (e.g., iroh endpoint → SOCKS5 → iroh relay)
+- **Server**: `--proxy` routes *outbound target connections* through the proxy (e.g., SSH channel request → SOCKS5 → target host)
+
 ### Connection Lifecycle

 ```
@@ -107,27 +133,16 @@ Client                                          Server
  │  └────────────────────────────────────┘       │
 ```

-## Transport Chaining
-
-Transports can be nested. A common pattern: iroh transport through an upstream SOCKS5 proxy (which itself tunnels through another wraith instance):
-
-```
-wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
-```
-
-The iroh endpoint's outbound TCP connections go through the SOCKS5 proxy, which itself tunnels through another wraith instance's SSH channel. The SOCKS5 proxy is provided by the core SOCKS5 server — no special chaining code needed.
-
 ## Constraints

 - SSH sees only the stream. It never opens its own TCP connections. (ADR-004)
 - Each transport produces exactly one stream per SSH session. Multiple sessions need multiple `connect()` calls.
 - The iroh transport reuses a single `Endpoint` across multiple sessions (one QUIC connection per peer, multiple `open_bi()` streams). The endpoint is created once and shared.
- TLS transport requires certificate configuration on the server side. The client can accept any certificate (self-signed) or verify against a CA.
+- TLS transport requires certificate configuration on the server side. The client can accept any certificate (self-signed) or verify against a CA. Server-side ACME is supported (ADR-008).

 ## Open Questions

- **OQ-02**: iroh relay configuration defaults
- **OQ-05**: Whether to support transport chaining in the CLI (`--transport iroh --proxy socks5://...`) or keep it as a separate manual configuration
+None — all resolved.

 ## Design Decisions

@@ -136,3 +151,6 @@ The iroh endpoint's outbound TCP connections go through the SOCKS5 proxy, which
 | [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces stream, SSH consumes it |
 | [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join` combines QUIC halves |
 | [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches TCP/iroh/TLS directly |
+| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
+| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
+| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
--- a/docs/architecture/tun-shim.md
+++ b/docs/architecture/tun-shim.md
@@ -1,111 +1,28 @@
 ---
-status: draft
+status: deprecated
 last_updated: 2026-06-01
 ---

-# TUN Shim
+# TUN Shim (Deprecated)

-## What
+> **Note**: TUN functionality has been deferred from the wraith project. For VPN-like "route all traffic" behavior, use `tun2proxy` alongside wraith's SOCKS5 proxy. See ADR-014 for the rationale.

-A separate process (`wraith-tun`) that creates a TUN interface, reads IP packets from it, and forwards them through the core wraith client's SOCKS5 port. Requires root or `CAP_NET_ADMIN`.
+## What Changed

-## Why
-
-The core wraith binary must never require root. TUN interfaces need elevated privileges. By separating TUN into its own minimal process, we:
-
- Minimize the root-required code surface (auditable in an afternoon)
- Keep the core binary unprivileged for SOCKS5 and port forwarding
- Allow the TUN shim to crash without affecting the SSH session
- Match the proven tun2proxy architecture
-
-(ADR-002)
-
-## Architecture
-
-```
-┌─────────────────┐     ┌──────────────────────────────┐
-│   wraith-tun     │     │      wraith connect           │
-│   (root)         │     │      (unprivileged)            │
-│                  │     │                                │
-│  ┌────────────┐ │     │  ┌──────────────────────────┐ │
-│  │ TUN Device │ │     │  │ SOCKS5 Server             │ │
-│  │ (tun-rs)   │◄├─────┤├►│ :1080                    │ │
-│  │ 10.0.0.1/24│ │     │  └──────────────────────────┘ │
-│  └────────────┘ │     │                                │
-│                  │     │  ┌──────────────────────────┐ │
-│  Route all      │     │  │ SSH Client (russh)        │ │
-│  traffic via    │     │  │ connect via Transport     │ │
-│  TUN device    │     │  └──────────────────────────┘ │
-└─────────────────┘     └──────────────────────────────┘
-```
-
-### Data Flow
-
-1. OS routing table sends all traffic through TUN device `tun0`
-2. `wraith-tun` reads IP packets from TUN device
-3. `wraith-tun` extracts destination IP:port from each packet
-4. `wraith-tun` connects to `127.0.0.1:1080` (wraith SOCKS5) with `SOCKS5h` (domain resolution by proxy)
-5. `wraith-tun` proxies the TCP connection through SOCKS5
-6. wraith's SOCKS5 server opens an SSH direct-tcpip channel to the destination
-7. Bytes flow: application → TUN → SOCKS5 → SSH channel → server → target
-
-### Virtual DNS
-
-The TUN shim implements virtual DNS (same approach as tun2proxy):
-
- DNS queries to port 53 arriving at the TUN device are intercepted
- Query names are mapped to fake IPs from `198.18.0.0/15`
- Connections to fake IPs are resolved to the original domain name via SOCKS5h
- This prevents DNS leaks (all DNS resolution happens server-side)
-
-### CLI Interface
+The separate `wraith-tun` process and all TUN-related code are out of scope. The recommended approach for VPN-like behavior is:

 ```bash
-# Basic TUN mode (uses wraith's SOCKS5 on 127.0.0.1:1080)
-sudo wraith-tun --socks5 127.0.0.1:1080
+# Terminal 1: wraith SOCKS5 proxy (no root required)
+wraith connect --server example.com --identity ~/.ssh/id_ed25519

-# With custom TUN address
-sudo wraith-tun --socks5 127.0.0.1:1080 --tun-addr 10.0.0.1/24
-
-# With DNS configuration
-sudo wraith-tun --socks5 127.0.0.1:1080 --dns virtual
-
-# Unprivileged mode (creates network namespace)
-wraith-tun --socks5 127.0.0.1:1080 --unshare
+# Terminal 2: tun2proxy routes all traffic through wraith's SOCKS5
+sudo tun2proxy --proxy socks5://127.0.0.1:1080
 ```

-### Unprivileged Mode
+This keeps the core wraith binary free of TUN complexity and leverages an existing, well-tested tool for TUN-to-SOCKS5 bridging.

-The `--unshare` flag creates a new network namespace, sets up the TUN device inside it, and maintains connectivity to the SOCKS5 proxy via the global namespace. This allows running without root, using only `CAP_NET_ADMIN` capability or namespace creation permissions.
+## References

-## Scope
-
-This is Phase 3 of the implementation plan. The core (`wraith serve`, `wraith connect` with SOCKS5 and port forwarding) comes first. TUN is an add-on.
-
-### What `wraith-tun` Does NOT Do
-
- It does not manage SSH sessions
- It does not know about transports
- It does not handle authentication
- It does not read SSH keys
-
-It only: reads packets from TUN, forwards to SOCKS5. That's it.
-
-## Constraints
-
- Requires root or `CAP_NET_ADMIN` (or `--unshare` namespace isolation)
- IPv4 only in initial release, IPv6 follow-up
- UDP over TCP (DNS queries are handled via SOCKS5h, other UDP is dropped in initial release)
- Approximately 200-500 lines of Rust for the initial implementation
-
-## Open Questions
-
- **OQ-03**: Windows TUN support scope (wintun.dll dependency)
- **OQ-09**: Whether to use tun2proxy's `ip-stack` crate for TCP reconstruction or implement a simpler packet-level approach
-
-## Design Decisions
-
-| ADR | Decision | Summary |
-|-----|----------|---------|
-| [002](decisions/002-tun-separate-process.md) | TUN separate process | Core never needs root, TUN is thin wrapper |
-| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | TUN forwards to SOCKS5, not to SSH directly |
+- [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md) — decision to defer TUN
+- [ADR-005](decisions/005-socks5-before-tun.md) — SOCKS5 is still the primary interface
+- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — recommended external tool for TUN support
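
Reviewer sketch for the `--max-connections-per-ip` limit introduced in the server.md changes above: the Constraints section says per-connection handlers can share state through an `Arc` for things like connection limits, and the following shows one possible shape for that state. It is a minimal sketch under stated assumptions, not the project's implementation. `ConnLimiter`, `ConnGuard`, `try_acquire`, and `active_for` are hypothetical names that do not come from wraith or russh, and the only behavior taken from the docs is that a limit of 0 means unlimited.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Hypothetical per-IP connection limiter (not a wraith or russh type).
/// A limit of 0 means unlimited, matching the documented default.
#[derive(Clone)]
pub struct ConnLimiter {
    max_per_ip: usize,
    active: Arc<Mutex<HashMap<IpAddr, usize>>>,
}

/// Held for the lifetime of one connection; releases its slot on drop.
pub struct ConnGuard {
    ip: IpAddr,
    active: Arc<Mutex<HashMap<IpAddr, usize>>>,
}

impl ConnLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self {
            max_per_ip,
            active: Arc::new(Mutex::new(HashMap::new())),
        }
    }

    /// Returns a guard if `ip` is under the limit, or `None` to reject.
    pub fn try_acquire(&self, ip: IpAddr) -> Option<ConnGuard> {
        let mut active = self.active.lock().unwrap();
        let count = active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return None;
        }
        *count += 1;
        Some(ConnGuard {
            ip,
            active: Arc::clone(&self.active),
        })
    }

    /// Number of connections currently counted for `ip`.
    pub fn active_for(&self, ip: IpAddr) -> usize {
        self.active.lock().unwrap().get(&ip).copied().unwrap_or(0)
    }
}

impl Drop for ConnGuard {
    fn drop(&mut self) {
        let mut active = self.active.lock().unwrap();
        if let Some(count) = active.get_mut(&self.ip) {
            *count -= 1;
            // Remove empty entries so the map does not grow without bound.
            if *count == 0 {
                active.remove(&self.ip);
            }
        }
    }
}
```

In this sketch the accept loop would call `try_acquire(remote_ip)` before handing the stream to the SSH layer and keep the returned guard alive inside the connection task, so the slot is released on every exit path without explicit bookkeeping.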