Resolve all architecture open questions, add 13 ADRs, update specs
Resolved all 11 open questions based on project guidance:

Transport:
- OQ-01/OQ-07: ACME/Let's Encrypt with domain + IP paths (ADR-008)
- OQ-02: Default to n0 relay, --iroh-relay override (ADR-009)
- OQ-05: Transport chaining supported natively (ADR-010)

Client:
- OQ-06: Programmatic-first API, no ~/.ssh/config (ADR-011)

Server:
- OQ-04: Ed25519 + OpenSSH cert-authority, no password auth (ADR-012)
- OQ-08: fail2ban-friendly logging + built-in rate limiting (ADR-013)

TUN:
- OQ-03/OQ-09: Deferred entirely, recommend tun2proxy (ADR-014)
- tun-shim.md marked deprecated

NAPI:
- OQ-10: Expose both connect() and serve() (ADR-016)
- OQ-11: Use napi-rs for FFI bridge (ADR-015)

Additional ADRs created during review:
- ADR-006: No logging of tunnel destinations (was phantom reference)
- ADR-017: Stealth mode protocol multiplexing
- ADR-018: Control channel for pubsub over SSH

Fixed: ADR-002 status → Superseded, ADR-007 title typo, WRAUTH_SERVER typo, ADR-005 stale wraith-tun refs, undefined ACL feature removed from server.md, --proxy semantic difference documented.
@@ -1,7 +1,7 @@
|
||||
# ADR-002: TUN Shim as Separate Process
|
||||
|
||||
## Status
|
||||
-Accepted
+Superseded by ADR-014
|
||||
|
||||
## Context
|
||||
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core wraith binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
|
||||
|
||||
@@ -28,11 +28,11 @@ TUN forwards to SOCKS5 rather than directly to SSH because:
|
||||
- No root code in the core binary
|
||||
|
||||
## Consequences
|
||||
-- **Positive**: Core binary is root-free. TUN is optional and separate.
+- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
|
||||
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
|
||||
-- **Positive**: TUN implementation is simplified: it's a thin wrapper around tun2proxy's pattern pointed at localhost:1080.
-- **Negative**: TUN adds one network hop (TUN → localhost SOCKS5 → SSH). The latency impact is negligible (localhost).
-- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode would handle non-DNS UDP via the SOCKS5 UDP association or drop it.
+- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
+- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `wraith connect`, so two processes instead of one integrated binary.
+- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
|
||||
@@ -0,0 +1,38 @@
|
||||
# ADR-006: No Logging of Tunnel Destinations
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
|
||||
|
||||
- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
|
||||
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
|
||||
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
|
||||
|
||||
However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
|
||||
|
||||
## Decision
|
||||
The server does NOT log:
|
||||
- `channel_open_direct_tcpip` destinations (host, port)
|
||||
- DNS resolutions performed by the server on behalf of clients
|
||||
- Bytes transferred through tunnel channels
|
||||
- Per-channel duration or throughput
|
||||
|
||||
The server DOES log (ADR-013):
|
||||
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
|
||||
- Connection opened (remote_addr, transport kind)
|
||||
- Connection closed (remote_addr, duration)
|
||||
|
||||
This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
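
One way to make this split hard to violate is to give the logger a closed set of event types that have no destination field at all. The following is a minimal sketch under that assumption; `LogEvent` and `render` are hypothetical names, not part of wraith's actual API.

```rust
use std::net::IpAddr;

/// Events the server may log. No variant carries a tunnel destination,
/// a DNS name, or a byte counter, so those cannot reach the log by accident.
#[derive(Debug, Clone, PartialEq)]
pub enum LogEvent {
    AuthAttempt { remote_addr: IpAddr, user: String, key_fingerprint: String, accepted: bool },
    ConnectionOpened { remote_addr: IpAddr, transport: String },
    ConnectionClosed { remote_addr: IpAddr, duration_secs: u64 },
}

impl LogEvent {
    /// Render the event as a single key=value line (hypothetical format).
    pub fn render(&self) -> String {
        match self {
            LogEvent::AuthAttempt { remote_addr, user, key_fingerprint, accepted } => format!(
                "msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
                remote_addr,
                user,
                key_fingerprint,
                if *accepted { "accept" } else { "reject" }
            ),
            LogEvent::ConnectionOpened { remote_addr, transport } => format!(
                "msg=\"connection opened\", remote_addr={}, transport={}",
                remote_addr, transport
            ),
            LogEvent::ConnectionClosed { remote_addr, duration_secs } => format!(
                "msg=\"connection closed\", remote_addr={}, duration={}",
                remote_addr, duration_secs
            ),
        }
    }
}
```

With this shape, the channel-open handler has nothing it could pass to the logger that would include a host or port, which keeps the guarantee structural rather than a matter of code review.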
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
|
||||
- **Positive**: Reduces legal and privacy exposure for server operators.
|
||||
- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
|
||||
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside wraith (e.g., network-level logging at the target host).
|
||||
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
|
||||
@@ -1,4 +1,4 @@
|
||||
-# ADR-006: NAPI Exposes Single Duplex Stream
+# ADR-007: NAPI Exposes Single Duplex Stream
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
38
docs/architecture/decisions/008-acme-lets-encrypt.md
Normal file
@@ -0,0 +1,38 @@
|
||||
# ADR-008: ACME/Let's Encrypt Certificate Provisioning
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
|
||||
|
||||
There are two ACME flows:
|
||||
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
|
||||
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
|
||||
|
||||
Both flows are important for wraith's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
|
||||
|
||||
## Decision
|
||||
Support both ACME certificate provisioning paths:
|
||||
|
||||
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
|
||||
|
||||
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
|
||||
|
||||
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
|
||||
|
||||
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps wraith self-contained as a single binary.
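
The three provisioning paths can be modelled as a single choice made once at startup. The sketch below is illustrative only: `TlsFlags`, `CertSource`, and `select_cert_source` are hypothetical names, and falling back to IP-based ACME when no flag is given is an assumption, since this ADR does not name a flag for that path.

```rust
use std::path::PathBuf;

/// Where the TLS certificate comes from.
#[derive(Debug, PartialEq)]
pub enum CertSource {
    Manual { cert: PathBuf, key: PathBuf },
    AcmeDomain(String),
    AcmeIp,
}

/// Hypothetical mirror of the flags named in this ADR.
#[derive(Default)]
pub struct TlsFlags {
    pub tls_cert: Option<PathBuf>,
    pub tls_key: Option<PathBuf>,
    pub acme_domain: Option<String>,
}

/// Pick exactly one certificate source, rejecting ambiguous combinations.
pub fn select_cert_source(flags: &TlsFlags) -> Result<CertSource, String> {
    match (&flags.tls_cert, &flags.tls_key, &flags.acme_domain) {
        (Some(_), Some(_), Some(_)) => {
            Err("--tls-cert/--tls-key and --acme-domain are mutually exclusive".to_string())
        }
        (Some(cert), Some(key), None) => Ok(CertSource::Manual {
            cert: cert.clone(),
            key: key.clone(),
        }),
        (Some(_), None, _) | (None, Some(_), _) => {
            Err("--tls-cert and --tls-key must be given together".to_string())
        }
        (None, None, Some(domain)) => Ok(CertSource::AcmeDomain(domain.clone())),
        // Assumption: with no flags, fall back to the IP-based ACME path.
        (None, None, None) => Ok(CertSource::AcmeIp),
    }
}
```

Resolving the choice into one enum keeps the challenge responder and renewal timer out of the manual-cert path entirely, which would also suit an `acme` feature flag.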
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Users can run `wraith serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
|
||||
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
|
||||
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
|
||||
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
|
||||
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
|
||||
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-01](../open-questions.md) — resolved by this ADR
|
||||
- [OQ-07](../open-questions.md) — resolved by this ADR
|
||||
- Production certbot setup: `/workspace/system/dev1/certbot.md`
|
||||
28
docs/architecture/decisions/009-default-iroh-relay.md
Normal file
@@ -0,0 +1,28 @@
|
||||
# ADR-009: Default iroh Relay with Override
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
|
||||
|
||||
- n0's relay could change terms, rate-limit, or go down
|
||||
- Production deployments may want self-hosted relays for reliability and privacy
|
||||
- The relay URL is a configuration point that should be explicit
|
||||
|
||||
Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
|
||||
|
||||
## Decision
|
||||
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
|
||||
|
||||
This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
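
The default-plus-override rule is small enough to sketch directly. The function name and the scheme check below are assumptions for illustration; the default URL is the one quoted in the Context section.

```rust
/// Default relay, as quoted in the Context section of this ADR.
pub const DEFAULT_RELAY_URL: &str = "https://relay.iroh.network/";

/// Resolve the relay URL from an optional `--iroh-relay` value.
/// Accepting only http(s) URLs is an assumption of this sketch.
pub fn resolve_relay_url(flag: Option<&str>) -> Result<String, String> {
    match flag.map(str::trim) {
        None => Ok(DEFAULT_RELAY_URL.to_string()),
        Some(url) if url.starts_with("https://") || url.starts_with("http://") => {
            Ok(url.to_string())
        }
        Some(other) => Err(format!(
            "--iroh-relay expects an http(s) URL, got {:?}",
            other
        )),
    }
}
```

Returning an error for a malformed override, rather than silently falling back to the default, matters for the privacy case: a typo should not route traffic through n0 unnoticed.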
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Zero-config iroh transport for testing and development. `wraith serve --transport iroh` just works.
|
||||
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
|
||||
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
|
||||
- **Negative**: Privacy-conscious users must remember to pass `--iroh-relay` to avoid n0. Mitigated by documentation.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-02](../open-questions.md) — resolved by this ADR
|
||||
33
docs/architecture/decisions/010-transport-chaining-cli.md
Normal file
@@ -0,0 +1,33 @@
|
||||
# ADR-010: Transport Chaining in CLI
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
Transport chaining allows combining iroh with an upstream proxy, e.g.:
|
||||
|
||||
```bash
|
||||
wraith connect --transport iroh --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another wraith instance. This is important for:
|
||||
- Nested tunnel topologies
|
||||
- Environments where iroh needs to go through an existing proxy
|
||||
- Composing transports in flexible ways
|
||||
|
||||
iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
|
||||
|
||||
## Decision
|
||||
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
|
||||
|
||||
For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
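
The "parse the proxy URL" step might look like the following standard-library sketch. `ProxyTarget` and `parse_socks5_proxy` are hypothetical names; credentials in the URL (`user:pass@`) are not handled, and the hand-off to iroh's builder is deliberately left out because its exact signature is not confirmed here.

```rust
/// Host and port of an upstream SOCKS5 proxy.
#[derive(Debug, PartialEq)]
pub struct ProxyTarget {
    pub host: String,
    pub port: u16,
}

/// Parse a `socks5://host:port` URL as passed to `--proxy`.
pub fn parse_socks5_proxy(url: &str) -> Result<ProxyTarget, String> {
    let rest = url
        .strip_prefix("socks5://")
        .ok_or_else(|| format!("unsupported proxy scheme in {:?}", url))?;
    let rest = rest.trim_end_matches('/');
    // Split on the last ':' so bracketed IPv6 literals keep their colons.
    let (host, port) = rest
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in {:?}", url))?;
    let host = host.trim_start_matches('[').trim_end_matches(']');
    if host.is_empty() {
        return Err(format!("missing host in {:?}", url));
    }
    let port: u16 = port
        .parse()
        .map_err(|e| format!("invalid port in {:?}: {}", url, e))?;
    Ok(ProxyTarget { host: host.to_string(), port })
}
```

Validating the URL in the CLI layer means a bad `--proxy` value fails before any transport is started.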
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Flexible transport composition without requiring separate manual configuration.
|
||||
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
|
||||
- **Positive**: Implementation is minimal — iroh already supports proxy config.
|
||||
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-05](../open-questions.md) — resolved by this ADR
|
||||
@@ -0,0 +1,38 @@
|
||||
# ADR-011: Programmatic-First API, No File-Based Config
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
|
||||
|
||||
1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
|
||||
2. **Custom config file**: Wraith-specific config file (TOML/YAML) with host definitions.
|
||||
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
|
||||
4. **Hybrid**: `--config` flag pointing to a wraith-specific config file, but no OpenSSH config parsing.
|
||||
|
||||
## Decision
|
||||
Option 3: Programmatic-first API. Configuration is provided via:
|
||||
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
|
||||
- **Library API**: `wraith_core::client::ConnectOptions` and `wraith_core::server::ServeOptions` structs, constructable programmatically
|
||||
- **Environment variables**: for a few convenience defaults (e.g., `WRAITH_SERVER`, `WRAITH_IDENTITY`)
|
||||
|
||||
No `~/.ssh/config` parsing, no wraith-specific config files. This approach:
|
||||
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
|
||||
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
|
||||
- Keeps the CLI simple and explicit — no hidden behavior from config files
|
||||
- Matches the design principle that the library crate (`wraith-core`) is the primary interface
|
||||
|
||||
If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
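
As a rough illustration of the layering, the sketch below builds an options struct from flags with environment-variable fallbacks. The field set, the `build_connect_options` helper, and the `"tcp"` default are assumptions; only the struct name `ConnectOptions` and the variables `WRAITH_SERVER` and `WRAITH_IDENTITY` come from this ADR.

```rust
/// Simplified stand-in for `wraith_core::client::ConnectOptions`;
/// the real field set is not specified here.
#[derive(Debug, Clone, PartialEq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Option<String>,
    pub transport: String,
}

/// Build options from CLI flags, falling back to environment variables.
/// `env` is injected so the core never touches process state directly.
pub fn build_connect_options(
    flag_server: Option<String>,
    flag_identity: Option<String>,
    flag_transport: Option<String>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<ConnectOptions, String> {
    // An explicit flag always wins over the environment.
    let server = flag_server
        .or_else(|| env("WRAITH_SERVER"))
        .ok_or_else(|| "no server: pass --server or set WRAITH_SERVER".to_string())?;
    let identity = flag_identity.or_else(|| env("WRAITH_IDENTITY"));
    // Assumption: plain TCP is the default transport.
    let transport = flag_transport.unwrap_or_else(|| "tcp".to_string());
    Ok(ConnectOptions { server, identity, transport })
}
```

A CLI would call this with `|k| std::env::var(k).ok()`, while a library consumer such as the NAPI wrapper can construct `ConnectOptions` directly and skip the environment lookup. A future config-file layer would be one more caller that fills in the same struct.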
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
|
||||
- **Positive**: No cross-platform path issues in the core library.
|
||||
- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
|
||||
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
|
||||
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
|
||||
- **Negative**: No config file convenience. Users accustomed to `~/.ssh/config` host aliases may find this inconvenient.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [OQ-06](../open-questions.md) — resolved by this ADR
|
||||
@@ -0,0 +1,42 @@
|
||||
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
SSH authentication has several options:
|
||||
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
|
||||
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
|
||||
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
|
||||
|
||||
The question is which auth methods to support and prioritize.
|
||||
|
||||
## Decision
|
||||
|
||||
**Primary: Ed25519 public key** (already specified, no change).
|
||||
|
||||
**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
|
||||
|
||||
**Not supported: Password authentication.** Password auth on the local SOCKS5 proxy is out of scope for this ADR. Password auth for the SSH session itself is rejected because:
|
||||
- It's less secure than key-based auth
|
||||
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
|
||||
- It's not needed when cert-authority provides easy multi-user management
|
||||
- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
|
||||
|
||||
The server's `authorized_keys` file format follows OpenSSH conventions:
|
||||
- Regular keys: `ssh-ed25519 AAAA... user@host`
|
||||
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
|
||||
- CA trust with options: `cert-authority,restrict,port-forwarding,principals="ops" ssh-ed25519 AAAA... CA name`
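
To make the line format concrete, here is a minimal parsing sketch that distinguishes plain keys from CA trust lines. It is not how `russh` or OpenSSH parse the file: `AuthorizedEntry` and `parse_authorized_keys_line` are hypothetical, only `ssh-ed25519` is accepted, and option values containing spaces or commas inside quotes are not handled.

```rust
/// One parsed `authorized_keys` line.
#[derive(Debug, PartialEq)]
pub enum AuthorizedEntry {
    Key { key_b64: String, comment: String },
    CertAuthority { options: Vec<String>, key_b64: String, comment: String },
}

/// Parse a single line. Returns `None` for blanks, comments, and
/// anything this sketch does not understand.
pub fn parse_authorized_keys_line(line: &str) -> Option<AuthorizedEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    let first = fields.next()?;
    // If the first field is not the key type, treat it as an option list.
    let (options, key_type): (Vec<String>, &str) = if first == "ssh-ed25519" {
        (Vec::new(), first)
    } else {
        (first.split(',').map(str::to_string).collect(), fields.next()?)
    };
    if key_type != "ssh-ed25519" {
        return None;
    }
    let key_b64 = fields.next()?.to_string();
    let comment = fields.collect::<Vec<_>>().join(" ");
    let is_ca = options.iter().any(|o| o == "cert-authority");
    match (is_ca, options.is_empty()) {
        (true, _) => Some(AuthorizedEntry::CertAuthority {
            options: options.into_iter().filter(|o| o != "cert-authority").collect(),
            key_b64,
            comment,
        }),
        (false, true) => Some(AuthorizedEntry::Key { key_b64, comment }),
        // Non-CA lines with options are out of scope for this sketch.
        (false, false) => None,
    }
}
```

A `CertAuthority` entry only establishes which CA is trusted; the certificate presented at login still has to pass signature, expiry, and principal checks before the session is accepted.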
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
|
||||
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
|
||||
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
|
||||
- **Positive**: `russh` supports OpenSSH certificate verification natively.
|
||||
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
|
||||
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [server.md](../server.md)
|
||||
- [OQ-04](../open-questions.md) — resolved by this ADR
|
||||
39
docs/architecture/decisions/013-fail2ban-friendly-logging.md
Normal file
@@ -0,0 +1,39 @@
|
||||
# ADR-013: Fail2ban-Friendly Server Logging
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in `/workspace/system/dev1/fail2ban.md`) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
|
||||
|
||||
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
|
||||
|
||||
## Decision
|
||||
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
|
||||
|
||||
**Logging** (for fail2ban integration on Linux):
|
||||
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
|
||||
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
|
||||
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
|
||||
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
|
||||
|
||||
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
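
Seen from the consumer side, a filter only has to pull the source IP out of rejected auth lines. The sketch below does that for the field layout listed above; in practice fail2ban would use a regex filter, and `rejected_auth_ip` is a hypothetical helper written in Rust purely to show which fields matter.

```rust
use std::net::IpAddr;

/// Return the source IP if `line` records a rejected auth attempt.
/// Assumes fields are separated by ", " as in the format above.
pub fn rejected_auth_ip(line: &str) -> Option<IpAddr> {
    let mut is_auth_attempt = false;
    let mut rejected = false;
    let mut remote_addr: Option<IpAddr> = None;
    for field in line.split(", ") {
        let (key, value) = match field.split_once('=') {
            Some(kv) => kv,
            None => continue,
        };
        match key {
            "msg" => is_auth_attempt = value.trim_matches('"') == "auth attempt",
            "result" => rejected = value == "reject",
            "remote_addr" => remote_addr = value.parse().ok(),
            _ => {}
        }
    }
    if is_auth_attempt && rejected {
        remote_addr
    } else {
        None
    }
}
```

Because `user` is supplied by the client, the server would need to escape or quote that field when writing the line; otherwise a crafted username containing `, remote_addr=...` could mislead a parser like this one.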
|
||||
|
||||
**Built-in rate limiting** (for all platforms):
|
||||
- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
|
||||
- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
|
||||
- Rate limiting happens at the SSH layer, before channels are opened
|
||||
|
||||
This ensures that even without fail2ban, the server rejects obviously abusive connections.
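
The per-IP cap can be sketched as a small counter map. `ConnectionLimiter` and its methods are hypothetical names; the `0 = unlimited` convention follows the `--max-connections-per-ip` default described above, and locking for concurrent use is left out.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP.
pub struct ConnectionLimiter {
    max_per_ip: usize, // 0 means unlimited
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Try to admit a new connection; returns false if the IP is at its cap.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Record that a connection from `ip` has closed.
    pub fn release(&mut self, ip: IpAddr) {
        let now_idle = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        // Drop idle entries so the map does not grow without bound.
        if now_idle {
            self.active.remove(&ip);
        }
    }
}
```

The accept loop would call `try_acquire` before starting the SSH handshake and `release` when the connection closes, so a rejected peer costs only a socket accept.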
|
||||
|
||||
## Consequences
|
||||
- **Positive**: fail2ban can parse wraith logs the same way it parses SSH and nginx logs on our production systems.
|
||||
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
|
||||
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
|
||||
- **Negative**: Slightly more code in the server for connection tracking per IP.
|
||||
- **Negative**: Users with custom fail2ban filters need to write regex for wraith's log format (documented examples provided).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-08](../open-questions.md) — resolved by this ADR
|
||||
- Production fail2ban setup: `/workspace/system/dev1/fail2ban.md`
|
||||
@@ -0,0 +1,41 @@
|
||||
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The original plan included a TUN shim (`wraith-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through wraith's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
|
||||
|
||||
However, TUN implementation has significant complexities:
|
||||
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
|
||||
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
|
||||
- Virtual DNS handling
|
||||
- Root/CAP_NET_ADMIN requirements
|
||||
- TUN is easy to get wrong and hard to debug
|
||||
|
||||
The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
|
||||
|
||||
## Decision
|
||||
Defer TUN implementation entirely. Remove `wraith-tun` from the architecture. Instead:
|
||||
|
||||
1. **Core interface**: wraith's local SOCKS5 proxy (always available, no root required)
|
||||
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `wraith connect`
|
||||
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
|
||||
|
||||
This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `wraith-tun` can be added as a thin wrapper around tun2proxy's pattern.
|
||||
|
||||
The `tun` feature flag and `wraith-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
|
||||
- **Positive**: tun2proxy is already well-tested for this exact use case.
|
||||
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
|
||||
- **Positive**: Cleaner architecture — wraith only does SSH tunneling + SOCKS5. tun2proxy does TUN.
|
||||
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
|
||||
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
|
||||
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
|
||||
|
||||
## References
|
||||
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
|
||||
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
|
||||
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
|
||||
27
docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# ADR-015: napi-rs for FFI Bridge
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
|
||||
|
||||
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
|
||||
|
||||
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
|
||||
|
||||
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
|
||||
|
||||
## Decision
|
||||
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
|
||||
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
|
||||
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
|
||||
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
|
||||
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [OQ-11](../open-questions.md) — resolved by this ADR
|
||||
@@ -0,0 +1,40 @@
|
||||
# ADR-016: NAPI Exposes Both connect() and serve()
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to wraith's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
|
||||
|
||||
1. **connect()**: Establish a client connection to a wraith server. Used by workers/spokes that need to tunnel events through a wraith server.
|
||||
2. **serve()**: Start a wraith server from Node.js. Used by hubs that want to accept wraith connections and route events.
|
||||
|
||||
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `wraith serve` process.
|
||||
|
||||
More importantly, both `connect()` and `serve()` are fundamental operations of the wraith library. Since the NAPI wrapper is a thin layer over `wraith-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
|
||||
|
||||
## Decision
|
||||
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
|
||||
|
||||
```typescript
|
||||
// @alkdev/wraith
|
||||
function connect(options: WraithConnectOptions): Promise<Duplex>;
|
||||
function serve(options: WraithServeOptions): Promise<WraithServer>;
|
||||
```
|
||||
|
||||
- `connect()` returns a `Duplex` stream (as per ADR-007)
|
||||
- `serve()` returns a `WraithServer` object with a `close()` method and events for new connections
|
||||
|
||||
The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
|
||||
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
|
||||
- **Positive**: Implementation is still minimal — `serve()` is just `wraith_core::server::run()` behind `#[napi]`.
|
||||
- **Negative**: Slightly larger API surface (two functions + `WraithServer` type instead of just `connect()`).
|
||||
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `WraithServer`.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
|
||||
- [OQ-10](../open-questions.md) — resolved by this ADR
|
||||
@@ -0,0 +1,30 @@
|
||||
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
When running a wraith server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention — if SSH traffic on port 443 is detectable, it can be blocked.
|
||||
|
||||
After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
|
||||
|
||||
## Decision
|
||||
When `--stealth` is enabled with TLS transport:
|
||||
|
||||
1. After completing the TLS handshake, peek at the first few bytes of the connection
|
||||
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
|
||||
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
|
||||
|
||||
This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
|
||||
|
||||
The decoy response sets a `Server: nginx` header to match a common web server profile.
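
The peek-and-branch step can be written as a pure function over the bytes seen so far. The sketch below assumes the caller peeks without consuming; `Sniff` and `sniff` are hypothetical names, and the `NeedMoreData` case is an addition of this sketch to cover short first reads.

```rust
/// SSH identification prefix (RFC 4253, section 4.2).
pub const SSH_IDENT_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy response from the Decision above.
pub const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

/// Outcome of inspecting the first bytes after the TLS handshake.
#[derive(Debug, PartialEq)]
pub enum Sniff {
    /// Hand the stream to the SSH server.
    Ssh,
    /// Write `DECOY_RESPONSE` and close.
    Decoy,
    /// Fewer than 8 bytes so far, all consistent with SSH: peek again.
    NeedMoreData,
}

/// Classify a connection from its peeked prefix.
pub fn sniff(peeked: &[u8]) -> Sniff {
    if peeked.len() >= SSH_IDENT_PREFIX.len() {
        if peeked.starts_with(SSH_IDENT_PREFIX) {
            Sniff::Ssh
        } else {
            Sniff::Decoy
        }
    } else if SSH_IDENT_PREFIX.starts_with(peeked) {
        Sniff::NeedMoreData
    } else {
        Sniff::Decoy
    }
}
```

A caller would likely pair `NeedMoreData` with a short timeout so that a client which connects and sends nothing eventually receives the decoy as well.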
|
||||
|
||||
## Consequences
|
||||
- **Positive**: TLS+wraith servers on port 443 resemble ordinary HTTPS sites to automated scanners that probe for an SSH banner.
|
||||
- **Positive**: Simple implementation — just peek at the first bytes and branch.
|
||||
- **Positive**: Consistent with censorship circumvention best practices.
|
||||
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
|
||||
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
@@ -0,0 +1,38 @@
|
||||
# ADR-018: Control Channel for PubSub over SSH
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper and pubsub integration need a way to use wraith's SSH channel as a data plane for event routing. When a `wraith connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
|
||||
|
||||
For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
|
||||
|
||||
1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `wraith-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
|
||||
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
|
||||
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
|
||||
|
||||
## Decision
|
||||
Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `wraith-control:0`:
|
||||
|
||||
1. The `channel_open_direct_tcpip` handler detects the special target via string matching
|
||||
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
|
||||
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
|
||||
|
||||
The destination string `wraith-control` is reserved. Because the reserved target uses port 0, which is not a connectable TCP port, a collision with a real destination is unlikely in practice.
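
The string-matching step in the handler could be sketched as below. `Route` and `route_direct_tcpip` are hypothetical names, and rejecting every other `wraith-` host is an assumption based on the prefix reservation mentioned under Consequences. The port is a `u32` because that is how the SSH connection protocol encodes it.

```rust
/// Reserved destination for the control channel.
pub const CONTROL_HOST: &str = "wraith-control";
/// Assumption: the whole prefix is reserved for future internal targets.
pub const RESERVED_PREFIX: &str = "wraith-";

/// What to do with a `direct_tcpip` channel-open request.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Bridge the channel to the local pubsub event bus.
    ControlChannel,
    /// Open an ordinary outbound TCP connection.
    Tcp { host: String, port: u16 },
    /// Refuse the channel.
    Rejected,
}

/// Decide how to route a requested destination.
pub fn route_direct_tcpip(host: &str, port: u32) -> Route {
    if host == CONTROL_HOST && port == 0 {
        return Route::ControlChannel;
    }
    if host.starts_with(RESERVED_PREFIX) {
        return Route::Rejected;
    }
    if port == 0 || port > u16::MAX as u32 {
        return Route::Rejected;
    }
    Route::Tcp { host: host.to_string(), port: port as u16 }
}
```

Keeping the decision in one function gives the magic string a single home and makes the reserved-name behaviour easy to unit test without an SSH session.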
|
||||
|
||||
Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
|
||||
|
||||
Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
|
||||
- **Positive**: No separate port or service needs to run on the server. The control channel is built into wraith.
|
||||
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
|
||||
- **Positive**: Port forwarding to a specific port is still available as an alternative.
|
||||
- **Negative**: The string `wraith-control` is a magic constant. It should be defined as a constant in the crate.
|
||||
- **Negative**: Regular TCP destinations accidentally matching `wraith-control` would be misrouted. Mitigated by reserving the entire `wraith-` prefix namespace.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [server.md](../server.md)
|
||||