Add architecture specification for wraith SSH tunnel tool

Docs:
- README.md: index with doc table, ADR table, lifecycle definitions
- overview.md: purpose, exports, dependencies, constraints
- transport.md: Transport trait, TCP/TLS/iroh implementations, stream join
- client.md: SOCKS5 server, port forwarding, channel manager, reconnection
- server.md: auth, channel handling, stealth mode, outbound proxy
- tun-shim.md: separate privileged process, virtual DNS, --unshare mode
- napi-and-pubsub.md: NAPI wrapper, pubsub event target adapter

ADRs:
- 001: Pluggable transport via AsyncRead+AsyncWrite trait
- 002: TUN shim as separate process
- 003: iroh stream via tokio::io::join
- 004: SSH runs over transport, not alongside
- 005: SOCKS5 as primary interface, TUN as add-on
- 006(007): NAPI exposes single duplex stream

Open questions: 11 items covering TLS certs, iroh relay defaults,
Windows TUN, auth expansion, NAPI surface, TCP reconstruction
This commit is contained in:
2026-06-01 15:01:45 +00:00
parent c1275e2dfd
commit dad8224686
14 changed files with 1116 additions and 0 deletions

View File

@@ -0,0 +1,26 @@
# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
## Status
Accepted
## Context
Wraith needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in wraith's own type system or handle each transport case-by-case.
## Decision
Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
## Consequences
- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch: `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`.
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)

View File

@@ -0,0 +1,30 @@
# ADR-002: TUN Shim as Separate Process
## Status
Accepted
## Context
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
## Decision
The TUN functionality is a separate `wraith-tun` binary that:
1. Creates a TUN device (requires root / CAP_NET_ADMIN)
2. Reads IP packets from it
3. Forwards each connection to the core wraith's SOCKS5 port (127.0.0.1:1080)
4. Proxies bytes between TUN packets and SOCKS5 connections
The core `wraith connect` binary never needs root. The `wraith-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
## Consequences
- **Positive**: Root-required code surface is tiny and auditable.
- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
- **Positive**: Matches the proven tun2proxy architecture.
- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
## References
- [tun-shim.md](../tun-shim.md)
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy

View File

@@ -0,0 +1,31 @@
# ADR-003: iroh Stream via tokio::io::join
## Status
Accepted
## Context
iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
Options considered:
1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
## Decision
Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
## Consequences
- **Positive**: Minimal code. One line to bridge iroh and russh.
- **Positive**: No custom types to maintain.
- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
## References
- [transport.md](../transport.md)
- [Feasibility assessment §11](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)

View File

@@ -0,0 +1,28 @@
# ADR-004: SSH Runs Over Transport, Not Alongside
## Status
Accepted
## Context
There are two ways to structure the relationship between SSH and the transport layer:
1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
## Decision
SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
## Consequences
- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through wraith) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3.4](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)

View File

@@ -0,0 +1,39 @@
# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
## Status
Accepted
## Context
A "VPN-like" tool needs to route traffic. There are three approaches:
1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
## Decision
SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
SOCKS5 is the core because:
- It requires no privileges
- `curl --socks5-hostname` works everywhere
- Browsers, most CLI tools, and many applications support SOCKS5
- SOCKS5h prevents DNS leaks by resolving names server-side
- It's the interface that the NAPI wrapper and pubsub adapter build on
- TUN is only needed for "route all traffic" use cases, which are a subset of users
TUN forwards to SOCKS5 rather than directly to SSH because:
- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
- The `wraith-tun` binary stays minimal (~200-500 lines)
- No root code in the core binary
## Consequences
- **Positive**: Core binary is root-free. TUN is optional and separate.
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
- **Positive**: TUN implementation is simplified — it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode would handle non-DNS UDP via the SOCKS5 UDP association or drop it.
## References
- [client.md](../client.md)
- [tun-shim.md](../tun-shim.md)

View File

@@ -0,0 +1,26 @@
# ADR-006: NAPI Exposes Single Duplex Stream
## Status
Accepted
## Context
The NAPI wrapper for wraith could expose different granularity levels:
1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
## Decision
Option 2: NAPI exposes a single duplex stream.
The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
## Consequences
- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)