Resolve all architecture open questions, add 13 ADRs, update specs

Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo, WRAUTH_SERVER typo, ADR-005 stale wraith-tun refs, undefined ACL feature removed from server.md, --proxy semantic difference documented.
2026-06-01 17:31:28 +00:00
parent dad8224686
commit 13b0991fb8
23 changed files with 777 additions and 249 deletions
--- a/docs/architecture/decisions/002-tun-separate-process.md
+++ b/docs/architecture/decisions/002-tun-separate-process.md
@@ -1,7 +1,7 @@
 # ADR-002: TUN Shim as Separate Process

 ## Status
-Accepted
+Superseded by ADR-014

 ## Context
 TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
--- a/docs/architecture/decisions/005-socks5-before-tun.md
+++ b/docs/architecture/decisions/005-socks5-before-tun.md
@@ -28,11 +28,11 @@ TUN forwards to SOCKS5 rather than directly to SSH because:
 - No root code in the core binary

 ## Consequences
- **Positive**: Core binary is root-free. TUN is optional and separate.
+- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
 - **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
- **Positive**: TUN implementation is simplified — it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode would handle non-DNS UDP via the SOCKS5 UDP association or drop it.
+- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
+- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `wraith connect` — two processes instead of one integrated binary.
+- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.

 ## References
 - [client.md](../client.md)
--- a/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
+++ b/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
@@ -0,0 +1,38 @@
+# ADR-006: No Logging of Tunnel Destinations
+
+## Status
+Accepted
+
+## Context
+An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
+
+- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
+- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
+- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
+
+However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
+
+## Decision
+The server does NOT log:
+- `channel_open_direct_tcpip` destinations (host, port)
+- DNS resolutions performed by the server on behalf of clients
+- Bytes transferred through tunnel channels
+- Per-channel duration or throughput
+
+The server DOES log (ADR-013):
+- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
+- Connection opened (remote_addr, transport kind)
+- Connection closed (remote_addr, duration)
+
+This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
+
+## Consequences
+- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
+- **Positive**: Reduces legal and privacy exposure for server operators.
+- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
+- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside wraith (e.g., network-level logging at the target host).
+- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
+
+## References
+- [server.md](../server.md)
+- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
--- a/docs/architecture/decisions/007-napi-single-stream.md
+++ b/docs/architecture/decisions/007-napi-single-stream.md
@@ -1,4 +1,4 @@
-# ADR-006: NAPI Exposes Single Duplex Stream
+# ADR-007: NAPI Exposes Single Duplex Stream

 ## Status
 Accepted
--- a/docs/architecture/decisions/008-acme-lets-encrypt.md
+++ b/docs/architecture/decisions/008-acme-lets-encrypt.md
@@ -0,0 +1,38 @@
+# ADR-008: ACME/Let's Encrypt Certificate Provisioning
+
+## Status
+Accepted
+
+## Context
+TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
+
+There are two ACME flows:
+1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
+2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
+
+Both flows are important for wraith's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
+
+## Decision
+Support both ACME certificate provisioning paths:
+
+1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
+
+2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
+
+3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
+
+The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps wraith self-contained as a single binary.
+
+## Consequences
+- **Positive**: Users can run `wraith serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
+- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
+- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
+- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
+- **Negative**: IP-based short-lived certs require more frequent renewal handling.
+- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
+
+## References
+- [server.md](../server.md)
+- [OQ-01](../open-questions.md) — resolved by this ADR
+- [OQ-07](../open-questions.md) — resolved by this ADR
+- Production certbot setup: `/workspace/system/dev1/certbot.md`
--- a/docs/architecture/decisions/009-default-iroh-relay.md
+++ b/docs/architecture/decisions/009-default-iroh-relay.md
@@ -0,0 +1,28 @@
+# ADR-009: Default iroh Relay with Override
+
+## Status
+Accepted
+
+## Context
+iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
+
+- n0's relay could change terms, rate-limit, or go down
+- Production deployments may want self-hosted relays for reliability and privacy
+- The relay URL is a configuration point that should be explicit
+
+Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
+
+## Decision
+Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
+
+This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
+
+## Consequences
+- **Positive**: Zero-config iroh transport for testing and development. `wraith serve --transport iroh` just works.
+- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
+- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
+- **Negative**: Privacy-conscious users must remember to pass `--iroh-relay` to avoid n0. Mitigated by documentation.
+
+## References
+- [transport.md](../transport.md)
+- [OQ-02](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/010-transport-chaining-cli.md
+++ b/docs/architecture/decisions/010-transport-chaining-cli.md
@@ -0,0 +1,33 @@
+# ADR-010: Transport Chaining in CLI
+
+## Status
+Accepted
+
+## Context
+Transport chaining allows combining iroh with an upstream proxy, e.g.:
+
+```bash
+wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
+```
+
+This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another wraith instance. This is important for:
+- Nested tunnel topologies
+- Environments where iroh needs to go through an existing proxy
+- Composing transports in flexible ways
+
+iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
+
+## Decision
+Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
+
+For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
+
+## Consequences
+- **Positive**: Flexible transport composition without requiring separate manual configuration.
+- **Positive**: Matches user expectation from the overview doc's transport chaining example.
+- **Positive**: Implementation is minimal — iroh already supports proxy config.
+- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
+
+## References
+- [transport.md](../transport.md)
+- [OQ-05](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
+++ b/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
@@ -0,0 +1,38 @@
+# ADR-011: Programmatic-First API, No File-Based Config
+
+## Status
+Accepted
+
+## Context
+The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
+
+1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
+2. **Custom config file**: Wraith-specific config file (TOML/YAML) with host definitions.
+3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
+4. **Hybrid**: `--config` flag pointing to a wraith-specific config file, but no OpenSSH config parsing.
+
+## Decision
+Option 3: Programmatic-first API. Configuration is provided via:
+- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
+- **Library API**: `wraith_core::client::ConnectOptions` and `wraith_core::server::ServeOptions` structs, constructable programmatically
+- **Environment variables**: for a few convenience defaults (e.g., `WRAITH_SERVER`, `WRAITH_IDENTITY`)
+
+No `~/.ssh/config` parsing, no wraith-specific config files. This approach:
+- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
+- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
+- Keeps the CLI simple and explicit — no hidden behavior from config files
+- Matches the design principle that the library crate (`wraith-core`) is the primary interface
+
+If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
+
+## Consequences
+- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
+- **Positive**: No cross-platform path issues in the core library.
+- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
+- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
+- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
+- **Negative**: No config file convenience. Users coming from OpenSSH's `~/.ssh/config` may find this inconvenient.
+
+## References
+- [client.md](../client.md)
+- [OQ-06](../open-questions.md) — resolved by this ADR
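
The precedence ADR-011 describes (explicit flag first, then a convenience environment variable, never a config file) can be sketched as below. This is only an illustration: the `ConnectOptions` fields, the `resolve` function, and the injected `env` lookup are assumptions made for the example, not the actual `wraith_core` API. Only the variable names `WRAITH_SERVER` and `WRAITH_IDENTITY` come from the ADR.

```rust
/// Hypothetical, simplified stand-in for `wraith_core::client::ConnectOptions`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Option<String>,
}

/// Builds options from CLI flags, falling back to environment variables.
/// The `env` lookup is passed in so the core never touches files or
/// process state directly.
pub fn resolve(
    flag_server: Option<String>,
    flag_identity: Option<String>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<ConnectOptions, String> {
    // An explicit flag always wins over the environment.
    let server = flag_server
        .or_else(|| env("WRAITH_SERVER"))
        .ok_or_else(|| "no server given: pass --server or set WRAITH_SERVER".to_string())?;
    // The identity is optional in this sketch.
    let identity = flag_identity.or_else(|| env("WRAITH_IDENTITY"));
    Ok(ConnectOptions { server, identity })
}
```

A CLI front end could call this with `|k| std::env::var(k).ok()` as the lookup, while a library consumer such as the NAPI wrapper would construct `ConnectOptions` directly and skip `resolve` entirely.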
--- a/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
+++ b/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
@@ -0,0 +1,42 @@
+# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
+
+## Status
+Accepted
+
+## Context
+SSH authentication has several options:
+- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
+- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
+- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
+
+The question is which auth methods to support and prioritize.
+
+## Decision
+
+**Primary: Ed25519 public key** (already specified, no change).
+
+**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
+
+**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
+- It's less secure than key-based auth
+- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
+- It's not needed when cert-authority provides easy multi-user management
+- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
+
+The server's `authorized_keys` file format follows OpenSSH conventions:
+- Regular keys: `ssh-ed25519 AAAA... user@host`
+- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
+- CA trusts restricted to principals: `cert-authority,principals="tunnel" ssh-ed25519 AAAA... CA name`
+
+## Consequences
+- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
+- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
+- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
+- **Positive**: `russh` supports OpenSSH certificate verification natively.
+- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
+- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
+
+## References
+- [client.md](../client.md)
+- [server.md](../server.md)
+- [OQ-04](../open-questions.md) — resolved by this ADR
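
To make the `authorized_keys` conventions in ADR-012 concrete, here is a minimal sketch of how a line could be classified as a regular key or a CA trust. The `Entry` type and `parse_line` function are hypothetical names for illustration, not part of wraith or russh. The sketch assumes Ed25519 keys only, as the ADR does, and it does not handle quoted option values that contain spaces or commas.

```rust
/// The only key type this sketch accepts, per the ADR.
const KEY_TYPE: &str = "ssh-ed25519";

/// One parsed `authorized_keys` entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Entry<'a> {
    /// True when the line carries the `cert-authority` option.
    pub is_ca: bool,
    /// Remaining options, e.g. `no-pty`.
    pub options: Vec<&'a str>,
    /// Base64 key blob, left undecoded here.
    pub blob: &'a str,
}

/// Parses one line; returns None for blanks, comments, and unsupported keys.
pub fn parse_line<'a>(line: &'a str) -> Option<Entry<'a>> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    let first = fields.next()?;
    // If the first field is not the key type, treat it as an option list.
    let (options, key_type) = if first == KEY_TYPE {
        (Vec::new(), first)
    } else {
        (first.split(',').collect::<Vec<_>>(), fields.next()?)
    };
    if key_type != KEY_TYPE {
        return None;
    }
    let blob = fields.next()?;
    let is_ca = options.contains(&"cert-authority");
    let options = options
        .into_iter()
        .filter(|o| *o != "cert-authority")
        .collect();
    Some(Entry { is_ca, options, blob })
}
```

An entry with `is_ca == true` would mean "trust certificates signed by this key", so the server would still need to validate the presented certificate's signature, expiry, and permissions as the ADR describes.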
--- a/docs/architecture/decisions/013-fail2ban-friendly-logging.md
+++ b/docs/architecture/decisions/013-fail2ban-friendly-logging.md
@@ -0,0 +1,39 @@
+# ADR-013: Fail2ban-Friendly Server Logging
+
+## Status
+Accepted
+
+## Context
+The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in `/workspace/system/dev1/fail2ban.md`) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
+
+However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
+
+## Decision
+The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
+
+**Logging** (for fail2ban integration on Linux):
+- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
+- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
+- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
+- Do NOT log: channel open targets, DNS resolutions, bytes transferred
+
+This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
+
+**Built-in rate limiting** (for all platforms):
+- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
+- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
+- Rate limiting happens at the SSH layer, before channels are opened
+
+This ensures that even without fail2ban, the server rejects obviously abusive connections.
+
+## Consequences
+- **Positive**: fail2ban can parse wraith logs the same way it parses SSH and nginx logs on our production systems.
+- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
+- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
+- **Negative**: Slightly more code in the server for connection tracking per IP.
+- **Negative**: Users with custom fail2ban filters need to write regex for wraith's log format (documented examples provided).
+
+## References
+- [server.md](../server.md)
+- [OQ-08](../open-questions.md) — resolved by this ADR
+- Production fail2ban setup: `/workspace/system/dev1/fail2ban.md`
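
The two halves of ADR-013 (a per-IP connection cap and a parseable auth log line) can be sketched with the standard library alone. `ConnLimiter` and `auth_log_line` are hypothetical names chosen for this example; only the log field names and the "0 = unlimited" default are taken from the ADR. Real code would likely emit the fields through a structured logging crate, not a hand-built string.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP (hypothetical type).
pub struct ConnLimiter {
    /// 0 means unlimited, mirroring the ADR's default.
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Records a new connection and returns true if the IP is under its limit.
    pub fn try_open(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Releases one slot when a connection from this IP closes.
    pub fn close(&mut self, ip: IpAddr) {
        let now_empty = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if now_empty {
            self.active.remove(&ip);
        }
    }
}

/// Formats an auth-attempt event using the field names listed in the ADR.
pub fn auth_log_line(ip: IpAddr, user: &str, fingerprint: &str, accepted: bool) -> String {
    let result = if accepted { "accept" } else { "reject" };
    format!(
        "level=INFO, msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
        ip, user, fingerprint, result
    )
}
```

Note that nothing in either function sees a tunnel destination, which keeps the sketch consistent with ADR-006.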
--- a/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
+++ b/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
@@ -0,0 +1,41 @@
+# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
+
+## Status
+Accepted
+
+## Context
+The original plan included a TUN shim (`wraith-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through wraith's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
+
+However, TUN implementation has significant complexities:
+- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
+- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
+- Virtual DNS handling
+- Root/CAP_NET_ADMIN requirements
+- TUN is easy to get wrong and hard to debug
+
+The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
+
+## Decision
+Defer TUN implementation entirely. Remove `wraith-tun` from the architecture. Instead:
+
+1. **Core interface**: wraith's local SOCKS5 proxy (always available, no root required)
+2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `wraith connect`
+3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
+
+This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `wraith-tun` can be added as a thin wrapper around tun2proxy's pattern.
+
+The `tun` feature flag and `wraith-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
+
+## Consequences
+- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
+- **Positive**: tun2proxy is already well-tested for this exact use case.
+- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
+- **Positive**: Cleaner architecture — wraith only does SSH tunneling + SOCKS5. tun2proxy does TUN.
+- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
+- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
+- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
+
+## References
+- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
+- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
+- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
--- a/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
+++ b/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
@@ -0,0 +1,27 @@
+# ADR-015: napi-rs for FFI Bridge
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
+
+1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
+
+2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
+
+The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
+
+## Decision
+Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
+
+## Consequences
+- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
+- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
+- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
+- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
+- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [OQ-11](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
+++ b/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
@@ -0,0 +1,40 @@
+# ADR-016: NAPI Exposes Both connect() and serve()
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to wraith's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
+
+1. **connect()**: Establish a client connection to a wraith server. Used by workers/spokes that need to tunnel events through a wraith server.
+2. **serve()**: Start a wraith server from Node.js. Used by hubs that want to accept wraith connections and route events.
+
+The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `wraith serve` process.
+
+More importantly, both `connect()` and `serve()` are fundamental operations of the wraith library. Since the NAPI wrapper is a thin layer over `wraith-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
+
+## Decision
+The NAPI wrapper exposes both `connect()` and `serve()` from the start:
+
+```typescript
+// @alkdev/wraith
+function connect(options: WraithConnectOptions): Promise<Duplex>;
+function serve(options: WraithServeOptions): Promise<WraithServer>;
+```
+
+- `connect()` returns a `Duplex` stream (as per ADR-007)
+- `serve()` returns a `WraithServer` object with a `close()` method and events for new connections
+
+The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
+
+## Consequences
+- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
+- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
+- **Positive**: Implementation is still minimal — `serve()` is just `wraith_core::server::run()` behind `#[napi]`.
+- **Negative**: Slightly larger API surface (two functions + `WraithServer` type instead of just `connect()`).
+- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `WraithServer`.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
+- [OQ-10](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
+++ b/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
@@ -0,0 +1,30 @@
+# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
+
+## Status
+Accepted
+
+## Context
+When running a wraith server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention — if SSH traffic on port 443 is detectable, it can be blocked.
+
+After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
+
+## Decision
+When `--stealth` is enabled with TLS transport:
+
+1. After completing the TLS handshake, peek at the first few bytes of the connection
+2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
+3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
+
+This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
+
+The fake response uses `Server: nginx` headers to match the most common web server profile.
+
+## Consequences
+- **Positive**: TLS+wraith servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
+- **Positive**: Simple implementation — just peek at the first bytes and branch.
+- **Positive**: Consistent with censorship circumvention best practices.
+- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
+- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
+
+## References
+- [server.md](../server.md)
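
The peek-and-branch step in ADR-017 can be illustrated as a pure function over the bytes read so far. The names `Sniffed`, `sniff`, and `decoy_response` are assumptions for this sketch; the `SSH-2.0-` prefix and the decoy response bytes are the ones quoted in the ADR. The `NeedMore` case is an addition the ADR does not spell out: a peek may return fewer than eight bytes, so the sketch waits while the partial input is still consistent with an SSH banner.

```rust
/// SSH identification prefix sent first by a genuine client.
const SSH_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy response quoted in the ADR for non-SSH connections.
const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

/// Outcome of inspecting the first bytes after the TLS handshake.
#[derive(Debug, PartialEq, Eq)]
pub enum Sniffed {
    /// Hand the stream to the SSH session.
    Ssh,
    /// Write the decoy response and close.
    Decoy,
    /// Too few bytes so far, but still a possible SSH banner.
    NeedMore,
}

/// Classifies a connection from the bytes peeked so far.
pub fn sniff(peeked: &[u8]) -> Sniffed {
    if peeked.len() >= SSH_PREFIX.len() {
        if peeked.starts_with(SSH_PREFIX) {
            Sniffed::Ssh
        } else {
            Sniffed::Decoy
        }
    } else if SSH_PREFIX.starts_with(peeked) {
        // Partial match: keep reading before deciding.
        Sniffed::NeedMore
    } else {
        Sniffed::Decoy
    }
}

/// Bytes to send before closing a non-SSH connection.
pub fn decoy_response() -> &'static [u8] {
    DECOY_RESPONSE
}
```

A server loop would presumably also bound how long it waits in `NeedMore`, since a scanner that sends nothing should not hold the connection open indefinitely.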
--- a/docs/architecture/decisions/018-control-channel-for-pubsub.md
+++ b/docs/architecture/decisions/018-control-channel-for-pubsub.md
@@ -0,0 +1,38 @@
+# ADR-018: Control Channel for PubSub over SSH
+
+## Status
+Accepted
+
+## Context
+The NAPI wrapper and pubsub integration need a way to use wraith's SSH channel as a data plane for event routing. When a `wraith connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
+
+For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
+
+1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `wraith-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
+2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
+3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
+
+## Decision
+Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `wraith-control:0`:
+
+1. The `channel_open_direct_tcpip` handler detects the special target via string matching
+2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
+3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
+
+The destination string `wraith-control` is reserved. Real TCP targets are hostnames or IP addresses paired with a non-zero port, so a collision with `wraith-control:0` is unlikely in practice.
+
+Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
+
+Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
+
+## Consequences
+- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
+- **Positive**: No separate port or service needs to run on the server. The control channel is built into wraith.
+- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
+- **Positive**: Port forwarding to a specific port is still available as an alternative.
+- **Negative**: The string `wraith-control` is a magic constant. It should be defined as a constant in the crate.
+- **Negative**: Regular TCP destinations accidentally matching `wraith-control` would be misrouted. Mitigated by reserving the entire `wraith-` prefix namespace.
+
+## References
+- [napi-and-pubsub.md](../napi-and-pubsub.md)
+- [server.md](../server.md)
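
The routing decision in ADR-018 amounts to a small match on the requested destination. The sketch below is one possible shape, assuming hypothetical names (`Route`, `route`, and the constants). It also rejects every other host under the `wraith-` prefix, following the mitigation mentioned in the Consequences, and treats port 0 on ordinary hosts as invalid. The port is a `u32` because SSH channel-open requests carry it as a 32-bit integer.

```rust
/// Reserved destination from the ADR; the constant names are hypothetical.
pub const CONTROL_HOST: &str = "wraith-control";
pub const CONTROL_PORT: u32 = 0;
/// Namespace reserved to avoid accidental matches.
pub const RESERVED_PREFIX: &str = "wraith-";

/// Where a `direct_tcpip` request should go.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Bridge the channel to the local pubsub event bus.
    Control,
    /// Open an ordinary outbound TCP connection.
    Tcp { host: String, port: u16 },
    /// Refuse the channel open.
    Rejected,
}

/// Decides how to handle a channel-open request for `host:port`.
pub fn route(host: &str, port: u32) -> Route {
    if host == CONTROL_HOST && port == CONTROL_PORT {
        return Route::Control;
    }
    // Anything else in the reserved namespace is refused, not dialed.
    if host.starts_with(RESERVED_PREFIX) {
        return Route::Rejected;
    }
    if (1..=65535).contains(&port) {
        Route::Tcp { host: host.to_string(), port: port as u16 }
    } else {
        Route::Rejected
    }
}
```

Keeping this as a pure function, separate from the russh handler, would make the magic constant easy to unit test without standing up an SSH session.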