Add architecture specification for wraith SSH tunnel tool

Docs:
- README.md: index with doc table, ADR table, lifecycle definitions
- overview.md: purpose, exports, dependencies, constraints
- transport.md: Transport trait, TCP/TLS/iroh implementations, stream join
- client.md: SOCKS5 server, port forwarding, channel manager, reconnection
- server.md: auth, channel handling, stealth mode, outbound proxy
- tun-shim.md: separate privileged process, virtual DNS, --unshare mode
- napi-and-pubsub.md: NAPI wrapper, pubsub event target adapter

ADRs:
- 001: Pluggable transport via AsyncRead+AsyncWrite trait
- 002: TUN shim as separate process
- 003: iroh stream via tokio::io::join
- 004: SSH runs over transport, not alongside
- 005: SOCKS5 as primary interface, TUN as add-on
- 006(007): NAPI exposes single duplex stream

Open questions: 11 items covering TLS certs, iroh relay defaults, Windows TUN, auth expansion, NAPI surface, TCP reconstruction
2026-06-01 15:01:45 +00:00
parent c1275e2dfd
commit dad8224686
14 changed files with 1116 additions and 0 deletions
--- a/docs/architecture/decisions/001-pluggable-transport.md
+++ b/docs/architecture/decisions/001-pluggable-transport.md
@@ -0,0 +1,26 @@
+# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
+
+## Status
+Accepted
+
+## Context
+Wraith needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
+
+russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in wraith's own type system or handle each transport case-by-case.
+
+## Decision
+Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
+
+On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
+
+This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
+
+## Consequences
+- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
+- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
+- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
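+- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch, roughly `Box<dyn Transport<Stream = Box<dyn DuplexStream>>>`, where `DuplexStream` (a name used here for illustration) is a helper supertrait of `AsyncRead + AsyncWrite + Unpin + Send`. The helper is needed because Rust does not accept `dyn AsyncRead + AsyncWrite` directly: a trait object may name only one non-auto trait.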
+
+## References
+- [transport.md](../transport.md)
+- [Feasibility assessment §3](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
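A minimal sketch of the dynamic-dispatch consequence in ADR-001 above, written with std only. It is a synchronous stand-in: `std::io::Read`/`Write` take the place of tokio's `AsyncRead`/`AsyncWrite`, and the names `Stream`, `Loopback`, `MockTransport`, and `pick_transport` are hypothetical, not part of wraith. It shows a combined supertrait with a blanket impl, a boxed stream returned from `connect()`, and a mock transport that produces an in-memory stream.

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};

/// Combined stream trait. A trait object may name only one non-auto trait,
/// so `dyn Read + Write` is rejected; a supertrait works around that.
pub trait Stream: Read + Write + Send {}
impl<T: Read + Write + Send> Stream for T {}

/// Hypothetical, synchronous stand-in for the ADR's `Transport` trait.
pub trait Transport {
    fn connect(&self) -> io::Result<Box<dyn Stream>>;
}

/// In-memory loopback stream: bytes written can be read back in order.
#[derive(Default)]
pub struct Loopback {
    buf: VecDeque<u8>,
}

impl Read for Loopback {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let n = out.len().min(self.buf.len());
        for slot in out.iter_mut().take(n) {
            // Safe: n never exceeds the number of buffered bytes.
            *slot = self.buf.pop_front().unwrap();
        }
        Ok(n)
    }
}

impl Write for Loopback {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Mock transport for tests: every `connect()` yields a fresh loopback.
pub struct MockTransport;

impl Transport for MockTransport {
    fn connect(&self) -> io::Result<Box<dyn Stream>> {
        Ok(Box::new(Loopback::default()))
    }
}

/// Runtime selection, as a CLI flag would need. Only "mock" exists here.
pub fn pick_transport(name: &str) -> Option<Box<dyn Transport>> {
    match name {
        "mock" => Some(Box::new(MockTransport)),
        _ => None,
    }
}
```

The caller only ever sees `Box<dyn Transport>` and `Box<dyn Stream>`, which is the property that lets the SSH layer stay unaware of the concrete transport.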
--- a/docs/architecture/decisions/002-tun-separate-process.md
+++ b/docs/architecture/decisions/002-tun-separate-process.md
@@ -0,0 +1,30 @@
+# ADR-002: TUN Shim as Separate Process
+
+## Status
+Accepted
+
+## Context
+TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
+
+The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
+
+## Decision
+The TUN functionality is a separate `wraith-tun` binary that:
+1. Creates a TUN device (requires root / CAP_NET_ADMIN)
+2. Reads IP packets from it
+3. Forwards each connection to the core wraith's SOCKS5 port (127.0.0.1:1080)
+4. Proxies bytes between TUN packets and SOCKS5 connections
+
+The core `wraith connect` binary never needs root. The `wraith-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
+
+## Consequences
+- **Positive**: Root-required code surface is tiny and auditable.
+- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
+- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
+- **Positive**: Matches the proven tun2proxy architecture.
+- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
+- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
+
+## References
+- [tun-shim.md](../tun-shim.md)
+- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy
--- a/docs/architecture/decisions/003-iroh-stream-join.md
+++ b/docs/architecture/decisions/003-iroh-stream-join.md
@@ -0,0 +1,31 @@
+# ADR-003: iroh Stream via tokio::io::join
+
+## Status
+Accepted
+
+## Context
+iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
+
+Options considered:
+1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
+2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
+3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
+
+## Decision
+Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
+
+One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
+
+If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
+
+Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
+
+## Consequences
+- **Positive**: Minimal code. One line to bridge iroh and russh.
+- **Positive**: No custom types to maintain.
+- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
+- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
+
+## References
+- [transport.md](../transport.md)
+- [Feasibility assessment §11](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
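To illustrate the delegation that ADR-003 above relies on, here is a std-only, synchronous stand-in for the join combinator. It is not tokio's implementation: `Join`, `join`, and `echo_once` below are hypothetical local definitions that use `std::io::Read`/`Write` in place of `AsyncRead`/`AsyncWrite`. In the real code the equivalent step would be the single call `tokio::io::join(recv_stream, send_stream)`.

```rust
use std::io::{self, Read, Write};

/// Synchronous stand-in for a joined stream: reads are delegated to one
/// half and writes to the other, so the halves never interfere.
pub struct Join<R, W> {
    reader: R,
    writer: W,
}

/// Combine a read half and a write half into one duplex value.
pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Join<R, W> {
    Join { reader, writer }
}

impl<R, W> Join<R, W> {
    /// Split the joined value back into its two halves.
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

impl<R: Read, W> Read for Join<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

impl<R, W: Write> Write for Join<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}

/// Stand-in for an API that wants a single value that is both readable
/// and writable: it reads once and writes the same bytes back out.
pub fn echo_once<S: Read + Write>(stream: &mut S) -> io::Result<usize> {
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;
    stream.write_all(&buf[..n])?;
    Ok(n)
}
```

A byte slice can serve as the receive half and a `Vec<u8>` as the send half, which also makes this shape convenient for in-memory tests.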
--- a/docs/architecture/decisions/004-ssh-over-transport.md
+++ b/docs/architecture/decisions/004-ssh-over-transport.md
@@ -0,0 +1,28 @@
+# ADR-004: SSH Runs Over Transport, Not Alongside
+
+## Status
+Accepted
+
+## Context
+There are two ways to structure the relationship between SSH and the transport layer:
+
+1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
+
+2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
+
+## Decision
+SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
+
+This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
+
+## Consequences
+- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
+- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
+- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
+- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through wraith) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
+- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
+- **Negative**: The transport trait produces one stream per `connect()` call, so running multiple SSH sessions over the same iroh connection requires the iroh `Endpoint` (not a stream) to be shared between sessions. The `Endpoint` must therefore be created externally and shared; the `IrohTransport` struct holds an `Arc<Endpoint>` for this purpose.
+
+## References
+- [transport.md](../transport.md)
+- [Feasibility assessment §3.4](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
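The reconnection consequence in ADR-004 above (a dead stream means a new transport stream plus a new SSH session) can be sketched as a small retry helper. This is an assumption-laden illustration: `establish_session` and its closure parameters are hypothetical, there is no backoff or delay, and the closures stand in for `transport.connect()` and the SSH handshake.

```rust
/// Hypothetical helper: every attempt opens a fresh stream and then runs a
/// fresh handshake over it. Nothing from a failed attempt is reused.
/// Always tries at least once, even if `max_attempts` is 0.
pub fn establish_session<S, Sess, E>(
    mut open_stream: impl FnMut() -> Result<S, E>,
    mut handshake: impl FnMut(S) -> Result<Sess, E>,
    max_attempts: u32,
) -> Result<Sess, E> {
    let mut attempt = 1;
    loop {
        // A failure in either step discards the stream and starts over.
        match open_stream().and_then(&mut handshake) {
            Ok(session) => return Ok(session),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}
```

Because the stream is consumed by the handshake closure, a session can never outlive or be re-attached to the stream it was created on, which mirrors the "you get a new session" rule.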
--- a/docs/architecture/decisions/005-socks5-before-tun.md
+++ b/docs/architecture/decisions/005-socks5-before-tun.md
@@ -0,0 +1,39 @@
+# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
+
+## Status
+Accepted
+
+## Context
+A "VPN-like" tool needs to route traffic. There are three approaches:
+
+1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
+2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
+3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
+
+## Decision
+SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
+
+SOCKS5 is the core because:
+- It requires no privileges
+- `curl --socks5-hostname` works everywhere
+- Browsers, most CLI tools, and many applications support SOCKS5
+- SOCKS5h prevents DNS leaks by resolving names server-side
+- It's the interface that the NAPI wrapper and pubsub adapter build on
+- TUN is only needed for "route all traffic" use cases, which are a subset of users
+
+TUN forwards to SOCKS5 rather than directly to SSH because:
+- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
+- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
+- The `wraith-tun` binary stays minimal (~200-500 lines)
+- No root code in the core binary
+
+## Consequences
+- **Positive**: Core binary is root-free. TUN is optional and separate.
+- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
+- **Positive**: TUN implementation is simplified — it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
+- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
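+- **Negative**: SOCKS5 mode does not capture UDP traffic. DNS is the exception, and only indirectly: with SOCKS5h the client sends hostnames to the proxy, so no local DNS query is made. In TUN mode, non-DNS UDP would either go through SOCKS5 UDP association or be dropped.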
+
+## References
+- [client.md](../client.md)
+- [tun-shim.md](../tun-shim.md)
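As an illustration of why SOCKS5h avoids DNS leaks (ADR-005 above), the sketch below builds the CONNECT request defined in RFC 1928 with the domain-name address type, so the hostname itself travels to the proxy. The function name `socks5_connect_request` is hypothetical, and the method-negotiation greeting that precedes this request on the wire is omitted.

```rust
/// Build a SOCKS5 CONNECT request (RFC 1928) that names the target by
/// hostname (ATYP = 0x03), leaving DNS resolution to the proxy.
/// Returns `None` if the hostname is empty or longer than 255 bytes,
/// because the length must fit in a single byte.
pub fn socks5_connect_request(host: &str, port: u16) -> Option<Vec<u8>> {
    if host.is_empty() || host.len() > 255 {
        return None;
    }
    let mut req = Vec::with_capacity(7 + host.len());
    // VER = 5, CMD = 1 (CONNECT), RSV = 0, ATYP = 3 (domain name), LEN
    req.extend_from_slice(&[0x05, 0x01, 0x00, 0x03, host.len() as u8]);
    req.extend_from_slice(host.as_bytes());
    // Port is sent in network byte order (big-endian).
    req.extend_from_slice(&port.to_be_bytes());
    Some(req)
}
```

A client that resolved the name locally would instead send ATYP 0x01 or 0x04 with an IP address, and the lookup would already have left the machine outside the tunnel.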
--- a/docs/architecture/decisions/007-napi-single-stream.md
+++ b/docs/architecture/decisions/007-napi-single-stream.md
@@ -0,0 +1,26 @@
+# ADR-006: NAPI Exposes Single Duplex Stream
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper for wraith could expose different granularity levels:
+
+1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
+2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
+
+## Decision
+Option 2: NAPI exposes a single duplex stream.
+
+The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
+
+If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
+
+## Consequences
+- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
+- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
+- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
+- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)