Resolve all architecture open questions, add 13 ADRs, update specs

Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo,
WRAUTH_SERVER typo, ADR-005 stale wraith-tun refs,
undefined ACL feature removed from server.md,
--proxy semantic difference documented.
2026-06-01 17:31:28 +00:00
parent dad8224686
commit 13b0991fb8
23 changed files with 777 additions and 249 deletions

@@ -1,7 +1,7 @@
# ADR-002: TUN Shim as Separate Process
## Status
-Accepted
+Superseded by ADR-014
## Context
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.

@@ -28,11 +28,11 @@ TUN forwards to SOCKS5 rather than directly to SSH because:
- No root code in the core binary
## Consequences
-- **Positive**: Core binary is root-free. TUN is optional and separate.
+- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
-- **Positive**: TUN implementation is simplified: it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
-- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
-- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode would handle non-DNS UDP via the SOCKS5 UDP association or drop it.
+- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
+- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `wraith connect`: two processes instead of one integrated binary.
+- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
## References
- [client.md](../client.md)

@@ -0,0 +1,38 @@
# ADR-006: No Logging of Tunnel Destinations
## Status
Accepted
## Context
An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
## Decision
The server does NOT log:
- `channel_open_direct_tcpip` destinations (host, port)
- DNS resolutions performed by the server on behalf of clients
- Bytes transferred through tunnel channels
- Per-channel duration or throughput (connection-level duration is logged, see below)
The server DOES log (ADR-013):
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
- Connection opened (remote_addr, transport kind)
- Connection closed (remote_addr, duration)
This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
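One way to make this split hard to violate is to give destination-carrying events no log representation at all. The following is a minimal sketch under assumptions: the `ServerEvent` type, its variant names, and the `key=value` rendering are hypothetical and only mirror the field lists above.

```rust
use std::net::IpAddr;

/// Hypothetical event type; names are illustrative, not part of the spec.
pub enum ServerEvent {
    AuthAttempt { remote_addr: IpAddr, user: String, key_fingerprint: String, accepted: bool },
    ConnectionOpened { remote_addr: IpAddr, transport: &'static str },
    ConnectionClosed { remote_addr: IpAddr, duration_secs: u64 },
    /// Holds the tunnel destination at runtime but is never rendered.
    ChannelOpened { host: String, port: u16 },
}

impl ServerEvent {
    /// Returns a log line for loggable events and `None` for private ones.
    pub fn log_line(&self) -> Option<String> {
        match self {
            ServerEvent::AuthAttempt { remote_addr, user, key_fingerprint, accepted } => {
                let result = if *accepted { "accept" } else { "reject" };
                Some(format!(
                    "msg=\"auth attempt\" remote_addr={} user={} key_fingerprint={} result={}",
                    remote_addr, user, key_fingerprint, result
                ))
            }
            ServerEvent::ConnectionOpened { remote_addr, transport } => Some(format!(
                "msg=\"connection opened\" remote_addr={} transport={}",
                remote_addr, transport
            )),
            ServerEvent::ConnectionClosed { remote_addr, duration_secs } => Some(format!(
                "msg=\"connection closed\" remote_addr={} duration={}",
                remote_addr, duration_secs
            )),
            // Destinations are deliberately dropped here.
            ServerEvent::ChannelOpened { .. } => None,
        }
    }
}
```

Because the only path to the log goes through `log_line`, adding a destination to the output would require an explicit code change rather than a stray format argument.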
## Consequences
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
- **Positive**: Reduces legal and privacy exposure for server operators.
- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside wraith (e.g., network-level logging at the target host).
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
## References
- [server.md](../server.md)
- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log

@@ -1,4 +1,4 @@
-# ADR-006: NAPI Exposes Single Duplex Stream
+# ADR-007: NAPI Exposes Single Duplex Stream
## Status
Accepted

@@ -0,0 +1,38 @@
# ADR-008: ACME/Let's Encrypt Certificate Provisioning
## Status
Accepted
## Context
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
There are two ACME flows:
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
Both flows are important for wraith's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
## Decision
Support both ACME certificate provisioning paths, plus manually supplied certificates:
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps wraith self-contained as a single binary.
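The three paths can be sketched as a single selection step at server start-up. This is only an illustration: `TlsOptions`, `CertSource`, and `select_cert_source` are hypothetical names, and treating "no flags given" as the IP-based path is an assumption, since this ADR names no flag for that path.

```rust
/// Hypothetical options struct mirroring the flags named above.
#[derive(Default)]
pub struct TlsOptions {
    pub acme_domain: Option<String>, // --acme-domain
    pub tls_cert: Option<String>,    // --tls-cert
    pub tls_key: Option<String>,     // --tls-key
}

#[derive(Debug, PartialEq)]
pub enum CertSource {
    Manual { cert: String, key: String },
    AcmeDomain(String),
    /// Assumed fallback when neither a domain nor manual certs are given.
    AcmeIp,
}

/// Chooses the certificate source and rejects contradictory flag sets.
pub fn select_cert_source(opts: &TlsOptions) -> Result<CertSource, String> {
    match (&opts.tls_cert, &opts.tls_key, &opts.acme_domain) {
        (Some(c), Some(k), None) => Ok(CertSource::Manual { cert: c.clone(), key: k.clone() }),
        (None, None, Some(d)) => Ok(CertSource::AcmeDomain(d.clone())),
        (None, None, None) => Ok(CertSource::AcmeIp),
        (Some(_), None, _) | (None, Some(_), _) => {
            Err("--tls-cert and --tls-key must be given together".to_string())
        }
        (Some(_), Some(_), Some(_)) => {
            Err("manual certificates and --acme-domain are mutually exclusive".to_string())
        }
    }
}
```

Resolving the source once, up front, keeps the challenge responder and renewal timer out of the manual-certificate code path.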
## Consequences
- **Positive**: Users can run `wraith serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
## References
- [server.md](../server.md)
- [OQ-01](../open-questions.md) — resolved by this ADR
- [OQ-07](../open-questions.md) — resolved by this ADR
- Production certbot setup: `/workspace/system/dev1/certbot.md`

@@ -0,0 +1,28 @@
# ADR-009: Default iroh Relay with Override
## Status
Accepted
## Context
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
- n0's relay could change terms, rate-limit, or go down
- Production deployments may want self-hosted relays for reliability and privacy
- The relay URL is a configuration point that should be explicit
Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
## Decision
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
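The default-with-override rule is small enough to sketch directly. The helper name `resolve_relay` is hypothetical, the default URL is the one quoted in the context above, and the scheme check is an illustrative assumption rather than something this ADR specifies.

```rust
/// Default relay URL as quoted in the context section above.
pub const DEFAULT_RELAY: &str = "https://relay.iroh.network/";

/// Hypothetical helper: picks the relay URL for the iroh transport.
/// `flag` is the value of `--iroh-relay`, if the user passed one.
pub fn resolve_relay(flag: Option<&str>) -> Result<String, String> {
    match flag {
        // No override: fall back to n0's relay.
        None => Ok(DEFAULT_RELAY.to_string()),
        Some(url) if url.starts_with("https://") || url.starts_with("http://") => {
            Ok(url.to_string())
        }
        Some(url) => Err(format!("--iroh-relay expects an http(s) URL, got {:?}", url)),
    }
}
```

Returning the resolved URL as a value also makes it easy to print at start-up, so users can see whether they are on n0's relay or their own.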
## Consequences
- **Positive**: Zero-config iroh transport for testing and development. `wraith serve --transport iroh` just works.
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
- **Negative**: Privacy-conscious users must remember to pass `--iroh-relay` to avoid n0. Mitigated by documentation.
## References
- [transport.md](../transport.md)
- [OQ-02](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,33 @@
# ADR-010: Transport Chaining in CLI
## Status
Accepted
## Context
Transport chaining allows combining iroh with an upstream proxy, e.g.:
```bash
wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
```
This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another wraith instance. This is important for:
- Nested tunnel topologies
- Environments where iroh needs to go through an existing proxy
- Composing transports in flexible ways
iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
## Decision
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
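The "parse the proxy URL" half of this decision might look like the sketch below. `Socks5Proxy` and `parse_proxy` are hypothetical names. The sketch handles only `socks5://host:port` and rejects credentials; handing the result to iroh's endpoint builder is left out, because the exact builder call is not specified here.

```rust
/// Parsed `--proxy` value. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct Socks5Proxy {
    pub host: String,
    pub port: u16,
}

/// Parses `socks5://host:port`. Other schemes and embedded credentials
/// are rejected in this sketch.
pub fn parse_proxy(url: &str) -> Result<Socks5Proxy, String> {
    let rest = url
        .strip_prefix("socks5://")
        .ok_or_else(|| format!("unsupported proxy URL: {}", url))?;
    let rest = rest.trim_end_matches('/');
    if rest.contains('@') {
        return Err("credentials in proxy URLs are not handled here".to_string());
    }
    // Split on the last ':' so bracketed IPv6 hosts keep their colons.
    let (host, port) = rest
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in proxy URL: {}", url))?;
    if host.is_empty() {
        return Err(format!("missing host in proxy URL: {}", url));
    }
    let port: u16 = port
        .parse()
        .map_err(|_| format!("invalid port in proxy URL: {}", url))?;
    Ok(Socks5Proxy { host: host.to_string(), port })
}
```

Validating the URL in the CLI layer means a malformed `--proxy` fails before any transport is started.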
## Consequences
- **Positive**: Flexible transport composition without requiring separate manual configuration.
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
- **Positive**: Implementation is minimal — iroh already supports proxy config.
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
## References
- [transport.md](../transport.md)
- [OQ-05](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,38 @@
# ADR-011: Programmatic-First API, No File-Based Config
## Status
Accepted
## Context
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
2. **Custom config file**: Wraith-specific config file (TOML/YAML) with host definitions.
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing, which also sidesteps `~/.ssh/` path conventions that are troublesome across platforms (`~` expansion, Windows paths, etc.).
4. **Hybrid**: `--config` flag pointing to a wraith-specific config file, but no OpenSSH config parsing.
## Decision
Option 3: Programmatic-first API. Configuration is provided via:
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
- **Library API**: `wraith_core::client::ConnectOptions` and `wraith_core::server::ServeOptions` structs, constructable programmatically
- **Environment variables**: for a few convenience defaults (e.g., `WRAITH_SERVER`, `WRAITH_IDENTITY`)
No `~/.ssh/config` parsing, no wraith-specific config files. This approach:
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
- Keeps the CLI simple and explicit — no hidden behavior from config files
- Matches the design principle that the library crate (`wraith-core`) is the primary interface
If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
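A sketch of how the three sources could be combined into one options struct follows. The field set of `ConnectOptions` shown here is hypothetical (the ADR names the struct but not its fields), and both the flags-over-environment precedence and the `tcp` default are assumptions made for illustration.

```rust
/// Hypothetical shape of `ConnectOptions`; the real fields are not specified here.
#[derive(Debug, Clone, PartialEq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Option<String>,
    pub transport: String,
}

/// Merges CLI flag values with environment defaults. Flags win (assumed).
/// The environment lookup is injected so this function performs no I/O.
pub fn resolve_options<F>(
    flag_server: Option<String>,
    flag_identity: Option<String>,
    flag_transport: Option<String>,
    env: F,
) -> Result<ConnectOptions, String>
where
    F: Fn(&str) -> Option<String>,
{
    let server = flag_server
        .or_else(|| env("WRAITH_SERVER"))
        .ok_or_else(|| "no server: pass --server or set WRAITH_SERVER".to_string())?;
    let identity = flag_identity.or_else(|| env("WRAITH_IDENTITY"));
    // Assumed default; the ADR does not name one.
    let transport = flag_transport.unwrap_or_else(|| "tcp".to_string());
    Ok(ConnectOptions { server, identity, transport })
}
```

The CLI would call this with `|k| std::env::var(k).ok()`, while library and NAPI consumers construct `ConnectOptions` directly and never touch the environment.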
## Consequences
- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
- **Positive**: No cross-platform path issues in the core library.
- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
- **Negative**: No config file convenience. Users accustomed to `~/.ssh/config` may find this inconvenient.
## References
- [client.md](../client.md)
- [OQ-06](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,42 @@
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
## Status
Accepted
## Context
SSH authentication has several options:
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
The question is which auth methods to support and prioritize.
## Decision
**Primary: Ed25519 public key** (already specified, no change).
**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
**Not supported: password authentication.** The server rejects SSH password authentication because:
- It's less secure than key-based auth
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
- It's not needed when cert-authority provides easy multi-user management
Password protection on the local SOCKS5 proxy is a separate concern and is out of scope for this ADR.
The server's `authorized_keys` file format follows OpenSSH conventions:
- Regular keys: `ssh-ed25519 AAAA... user@host`
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
- CA trusts with options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
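The entry forms above can be told apart with a small line parser. The sketch below uses hypothetical names (`AuthEntry`, `parse_authorized_line`) and makes simplifying assumptions: key types start with `ssh-`, which holds for the Ed25519-only policy here, and option values contain no quoted spaces or commas, which real OpenSSH files may. Options on non-CA entries are discarded.

```rust
#[derive(Debug, PartialEq)]
pub enum AuthEntry {
    /// A plain public key entry.
    Key { key_type: String, key_b64: String },
    /// A trusted CA, plus any other options found on the line.
    CertAuthority { options: Vec<String>, key_type: String, key_b64: String },
}

/// Parses one `authorized_keys` line; returns `None` for blanks,
/// comments, and lines with too few fields.
pub fn parse_authorized_line(line: &str) -> Option<AuthEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    let first = fields.next()?;
    if first.starts_with("ssh-") {
        // No options field: the line starts with the key type.
        let key_b64 = fields.next()?.to_string();
        return Some(AuthEntry::Key { key_type: first.to_string(), key_b64 });
    }
    // Otherwise the first field is a comma-separated option list.
    let options: Vec<String> = first.split(',').map(|s| s.to_string()).collect();
    let key_type = fields.next()?.to_string();
    let key_b64 = fields.next()?.to_string();
    if options.iter().any(|o| o == "cert-authority") {
        let rest = options.into_iter().filter(|o| o != "cert-authority").collect();
        Some(AuthEntry::CertAuthority { options: rest, key_type, key_b64 })
    } else {
        Some(AuthEntry::Key { key_type, key_b64 })
    }
}
```

The trailing comment field (`user@host`, `CA name`) is ignored, since it plays no part in authentication.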
## Consequences
- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
- **Positive**: `russh` supports OpenSSH certificate verification natively.
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
## References
- [client.md](../client.md)
- [server.md](../server.md)
- [OQ-04](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,39 @@
# ADR-013: Fail2ban-Friendly Server Logging
## Status
Accepted
## Context
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in `/workspace/system/dev1/fail2ban.md`) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
## Decision
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
**Logging** (for fail2ban integration on Linux):
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
**Built-in rate limiting** (for all platforms):
- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
- Rate limiting happens at the SSH layer, before channels are opened
This ensures that even without fail2ban, the server rejects obviously abusive connections.
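The per-IP limit amounts to a counter keyed by source address. Below is a minimal sketch, assuming a hypothetical `ConnLimiter` type owned by the accept loop. It covers only `--max-connections-per-ip`, with 0 treated as unlimited as described above.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical tracker for `--max-connections-per-ip`; 0 means unlimited.
pub struct ConnLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        ConnLimiter { max_per_ip, active: HashMap::new() }
    }

    /// Records the connection and returns true if it is allowed.
    pub fn try_accept(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        if let Some(count) = self.active.get_mut(&ip) {
            *count -= 1;
            if *count == 0 {
                // Drop idle entries so the map does not grow without bound.
                self.active.remove(&ip);
            }
        }
    }
}
```

A real server would share this behind a mutex, or own it in a single accept task, and would pair every accepted `try_accept` with exactly one `release`.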
## Consequences
- **Positive**: fail2ban can parse wraith logs the same way it parses SSH and nginx logs on our production systems.
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
- **Negative**: Slightly more code in the server for connection tracking per IP.
- **Negative**: Users with custom fail2ban filters need to write regex for wraith's log format (documented examples provided).
## References
- [server.md](../server.md)
- [OQ-08](../open-questions.md) — resolved by this ADR
- Production fail2ban setup: `/workspace/system/dev1/fail2ban.md`

@@ -0,0 +1,41 @@
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
## Status
Accepted
## Context
The original plan included a TUN shim (`wraith-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through wraith's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
However, TUN implementation has significant complexities:
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
- Virtual DNS handling
- Root/CAP_NET_ADMIN requirements
- TUN is easy to get wrong and hard to debug
The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
## Decision
Defer TUN implementation entirely. Remove `wraith-tun` from the architecture. Instead:
1. **Core interface**: wraith's local SOCKS5 proxy (always available, no root required)
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `wraith connect`
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `wraith-tun` can be added as a thin wrapper around tun2proxy's pattern.
The `tun` feature flag and `wraith-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
## Consequences
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
- **Positive**: tun2proxy is already well-tested for this exact use case.
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
- **Positive**: Cleaner architecture — wraith only does SSH tunneling + SOCKS5. tun2proxy does TUN.
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
## References
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external

@@ -0,0 +1,27 @@
# ADR-015: napi-rs for FFI Bridge
## Status
Accepted
## Context
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
## Decision
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
## Consequences
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [OQ-11](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,40 @@
# ADR-016: NAPI Exposes Both connect() and serve()
## Status
Accepted
## Context
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to wraith's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
1. **connect()**: Establish a client connection to a wraith server. Used by workers/spokes that need to tunnel events through a wraith server.
2. **serve()**: Start a wraith server from Node.js. Used by hubs that want to accept wraith connections and route events.
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `wraith serve` process.
More importantly, both `connect()` and `serve()` are fundamental operations of the wraith library. Since the NAPI wrapper is a thin layer over `wraith-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
## Decision
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
```typescript
// @alkdev/wraith
function connect(options: WraithConnectOptions): Promise<Duplex>;
function serve(options: WraithServeOptions): Promise<WraithServer>;
```
- `connect()` returns a `Duplex` stream (as per ADR-007)
- `serve()` returns a `WraithServer` object with a `close()` method and events for new connections
The NAPI layer is payload-agnostic: it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
## Consequences
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
- **Positive**: Implementation is still minimal — `serve()` is just `wraith_core::server::run()` behind `#[napi]`.
- **Negative**: Slightly larger API surface (two functions + `WraithServer` type instead of just `connect()`).
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `WraithServer`.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
- [OQ-10](../open-questions.md) — resolved by this ADR

@@ -0,0 +1,30 @@
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
## Status
Accepted
## Context
When running a wraith server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention — if SSH traffic on port 443 is detectable, it can be blocked.
After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
## Decision
When `--stealth` is enabled with TLS transport:
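1. After completing the TLS handshake, peek at the first few bytes of the connection before sending anything; the server must hold back its own `SSH-2.0-` identification string until the client's has been seen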
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
The fake response uses a `Server: nginx` header to match the most common web server profile.
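The peek-and-branch step can be sketched as a pure function over the bytes read so far. The names `sniff`, `Sniff`, and `DECOY_RESPONSE` are hypothetical. The `NeedMore` case is an added assumption covering reads that return fewer bytes than the prefix, which the steps above do not address.

```rust
/// Prefix of an SSH-2.0 identification string.
pub const SSH_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy reply quoted in the decision above.
pub const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

#[derive(Debug, PartialEq)]
pub enum Sniff {
    /// Full prefix matched: hand the stream to the SSH server.
    Ssh,
    /// Mismatch: send `DECOY_RESPONSE` and close.
    NotSsh,
    /// Everything so far matches, but fewer than 8 bytes have arrived.
    NeedMore,
}

/// Classifies the first bytes of a post-handshake TLS stream.
pub fn sniff(peeked: &[u8]) -> Sniff {
    let n = peeked.len().min(SSH_PREFIX.len());
    if peeked[..n] != SSH_PREFIX[..n] {
        Sniff::NotSsh
    } else if n == SSH_PREFIX.len() {
        Sniff::Ssh
    } else {
        Sniff::NeedMore
    }
}
```

A caller would keep reading while the result is `NeedMore`, ideally under a timeout so that a silent client cannot hold the connection open indefinitely.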
## Consequences
- **Positive**: TLS+wraith servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
- **Positive**: Simple implementation — just peek at the first bytes and branch.
- **Positive**: Consistent with censorship circumvention best practices.
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
## References
- [server.md](../server.md)

@@ -0,0 +1,38 @@
# ADR-018: Control Channel for PubSub over SSH
## Status
Accepted
## Context
The NAPI wrapper and pubsub integration need a way to use wraith's SSH channel as a data plane for event routing. When a `wraith connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `wraith-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
## Decision
Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `wraith-control:0`:
1. The `channel_open_direct_tcpip` handler detects the special target via string matching
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
The destination string `wraith-control` is reserved. Although it is a syntactically valid hostname, it is paired with port 0, which is not a usable TCP destination port, so a collision with a real target is unlikely in practice.
Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
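The string-matching step might be isolated into a routing function like the sketch below. All names are hypothetical. Rejecting every other `wraith-` host follows the prefix reservation noted in the consequences. Taking the port as `u32` is an assumption based on the SSH wire format, which carries ports as 32-bit integers.

```rust
/// Reserved control destination (host part) from the decision above.
pub const CONTROL_HOST: &str = "wraith-control";
/// Reserved namespace; other hosts with this prefix are refused.
pub const RESERVED_PREFIX: &str = "wraith-";

#[derive(Debug, PartialEq)]
pub enum ChannelTarget {
    /// Bridge the channel to the local pubsub event bus.
    ControlBus,
    /// Ordinary tunnel: connect out to this TCP target.
    Tcp { host: String, port: u16 },
    /// Refuse the channel open request.
    Rejected,
}

/// Decides what a direct-tcpip channel open request should attach to.
pub fn route_direct_tcpip(host: &str, port: u32) -> ChannelTarget {
    if host == CONTROL_HOST && port == 0 {
        ChannelTarget::ControlBus
    } else if host.starts_with(RESERVED_PREFIX) {
        ChannelTarget::Rejected
    } else if port == 0 || port > u32::from(u16::MAX) {
        ChannelTarget::Rejected
    } else {
        ChannelTarget::Tcp { host: host.to_string(), port: port as u16 }
    }
}
```

Keeping the decision in a pure function makes the reserved-name behavior easy to unit test without a running SSH session.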
## Consequences
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
- **Positive**: No separate port or service needs to run on the server. The control channel is built into wraith.
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
- **Positive**: Port forwarding to a specific port is still available as an alternative.
- **Negative**: The string `wraith-control` is a magic constant. It should be defined as a constant in the crate.
- **Negative**: Regular TCP destinations accidentally matching `wraith-control` would be misrouted. Mitigated by reserving the entire `wraith-` prefix namespace.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [server.md](../server.md)