docs: add configuration architecture research
Explore static/dynamic config split, hot-reloadable auth via ArcSwap, forwarding policy, multi-transport listeners, and config file format. Documents three problems: no auth hot-reload, no forwarding access control, no structured config beyond CLI flags. Key findings: - Static config (transport, TLS, host key) loaded once at startup - Dynamic config (auth, forwarding, rate limits) reloadable via ArcSwap - ForwardingPolicy with rule-based allow/deny, first-match evaluation - Multi-transport: Server spawns Vec<ListenerConfig> sharing auth config - WebTransport out of scope for now (requires separate auth model) - Proposes ADR-020 (static/dynamic split), ADR-021 (forwarding policy), ADR-022 (multi-transport listeners) Adds OQ-12 through OQ-17 to open-questions.md.
This commit is contained in:
@@ -7,7 +7,7 @@ last_updated: 2026-06-02
|
||||
|
||||
## Current State
|
||||
|
||||
Architecture specification reviewed and ready for implementation. All open questions resolved. 19 ADRs accepted.
|
||||
Architecture specification reviewed and ready for implementation. 19 ADRs accepted. Configuration architecture under exploration — see [research/configuration.md](../research/configuration.md).
|
||||
|
||||
## Architecture Documents
|
||||
|
||||
@@ -20,6 +20,12 @@ Architecture specification reviewed and ready for implementation. All open quest
|
||||
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
|
||||
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper and pubsub event target adapter |
|
||||
|
||||
## Research Documents
|
||||
|
||||
| Document | Status | Description |
|
||||
|----------|--------|-------------|
|
||||
| [configuration.md](../research/configuration.md) | draft | Configuration architecture: static/dynamic split, hot reload, forwarding policy |
|
||||
|
||||
## ADR Table
|
||||
|
||||
| ADR | Title | Status |
|
||||
@@ -46,7 +52,8 @@ Architecture specification reviewed and ready for implementation. All open quest
|
||||
|
||||
## Open Questions
|
||||
|
||||
All open questions have been resolved. See [open-questions.md](open-questions.md) for details on each resolution.
|
||||
Most open questions have been resolved. New questions from configuration
|
||||
research — see [open-questions.md](open-questions.md) for details.
|
||||
|
||||
## Lifecycle Definitions
|
||||
|
||||
|
||||
@@ -90,4 +90,48 @@ last_updated: 2026-06-02
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: ADR-015 — Use napi-rs. It's the standard for Node.js native addons, matches our primary consumer (TypeScript/Node.js), and has the best ecosystem and documentation. If future Python or mobile consumers are needed, a separate uniffi layer can be added — the Rust core doesn't change.
|
||||
- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md
|
||||
- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md
|
||||
|
||||
## Configuration
|
||||
|
||||
### OQ-12: Per-user forwarding scope vs global rules
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending)
|
||||
- **Cross-references**: ADR-020 (proposed)
|
||||
|
||||
### OQ-13: Config file auto-reload via file watching
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: No file watching. CLI loads once at startup; NAPI/hub reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
|
||||
- **Cross-references**: ADR-020 (proposed)
|
||||
|
||||
### OQ-14: ArcSwap vs RwLock for dynamic config
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained.
|
||||
- **Cross-references**: ADR-020 (proposed)
|
||||
|
||||
### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — needs R&D in WebTransport transport session)
|
||||
- **Cross-references**: ADR-022 (proposed)
|
||||
|
||||
### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to wraith-* channels)
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (pending — defer to forwarding policy design)
|
||||
- **Cross-references**: ADR-021 (proposed)
|
||||
|
||||
### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — defer until non-SSH transport is implemented)
|
||||
- **Cross-references**: ADR-020 (proposed), OQ-15
|
||||
568
docs/research/configuration.md
Normal file
568
docs/research/configuration.md
Normal file
@@ -0,0 +1,568 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-04
|
||||
phase: exploration
|
||||
---
|
||||
|
||||
# Configuration Architecture
|
||||
|
||||
## Problem
|
||||
|
||||
Wraith's configuration is loaded once at startup and never changes. This has
|
||||
three specific failures:
|
||||
|
||||
1. **No hot reload of authentication credentials.** Adding or removing an
|
||||
authorized key requires restarting the server process. In a hub/spoke
|
||||
deployment where keys are managed via a database (see
|
||||
`@alkdev/storage`'s `peer_credentials` table), the wraith process must be
|
||||
restarted every time a key is added, revoked, or rotated. This is
|
||||
operationally unacceptable for a production service.
|
||||
|
||||
2. **No port forwarding access control.** Any authenticated client can open a
|
||||
`direct-tcpip` channel to any destination. There is no policy governing
|
||||
which hosts, ports, or `wraith-*` control channels a client may access. This
|
||||
is a security gap — a compromised key grants unrestricted network access
|
||||
through the tunnel.
|
||||
|
||||
3. **No structured configuration beyond CLI flags.** ADR-011 chose
|
||||
programmatic-first configuration for the alpha. This was correct — it
|
||||
avoided cross-platform path issues and kept the API surface small. But as
|
||||
wraith moves toward publishable releases, operators need config files for
|
||||
reproducible deployments, and the NAPI layer needs programmatic reload
|
||||
capability that the current `ServeOptions` builder pattern doesn't support.
|
||||
|
||||
### What's Not The Problem
|
||||
|
||||
- This does not propose depending on Honker, SQLite, or any specific data
|
||||
source at the `wraith-core` level. The core provides a reload mechanism;
|
||||
data sources plug in from outside.
|
||||
- This does not propose file-watching (potential attack vector, unnecessary
|
||||
complexity). CLI usage loads config once at startup. Programmatic usage
|
||||
(NAPI, hub) calls reload explicitly.
|
||||
- This does not replace the existing `ServeOptions` builder pattern. It
|
||||
generalizes it.
|
||||
|
||||
## Analysis
|
||||
|
||||
### Static vs Dynamic Configuration
|
||||
|
||||
Not all configuration should be reloadable. Transport-level settings (listen
|
||||
address, TLS certificates, host key) require socket/TLS renegotiation to change
|
||||
at runtime — effectively a restart. Auth and forwarding policy can change
|
||||
atomically without disrupting existing connections.
|
||||
|
||||
| Category | Examples | Reloadable? |
|
||||
|---|---|---|
|
||||
| Transport | listen addr, TLS cert/key, iroh relay, stealth mode | No — requires bind change |
|
||||
| Identity | host key, host key algorithm | No — requires SSH re-negotiation |
|
||||
| Auth | authorized keys, cert authorities | **Yes** — next auth check picks up changes |
|
||||
| Forwarding | allowed destinations, per-principal rules | **Yes** — next channel open picks up changes |
|
||||
| Rate limits | max connections per IP, max auth attempts | **Yes** — next check picks up changes |
|
||||
|
||||
The split is clean: anything that affects the SSH handshake or socket binding
|
||||
is static. Anything that's checked per-connection or per-channel is dynamic.
|
||||
|
||||
### Current Architecture
|
||||
|
||||
```
|
||||
ServeOptions (builder) → Server::new()
|
||||
├─ Arc<server::Config> (russh config, immutable)
|
||||
├─ Arc<ServerAuthConfig> (keys + CAs, immutable after load)
|
||||
├─ Arc<ConnectionRateLimiter> (mutable but not reloadable)
|
||||
└─ ServerHandler::new(auth_config, ...)
|
||||
|
||||
ServerHandler
|
||||
├─ auth_config: Arc<ServerAuthConfig> ← shared, immutable
|
||||
├─ connection_limiter: Arc<ConnectionRateLimiter>
|
||||
├─ outbound_proxy: Option<ProxyConfig>
|
||||
└─ (no forwarding policy field)
|
||||
```
|
||||
|
||||
`auth_publickey()` reads from `self.auth_config` via `Arc` dereference. No
|
||||
path to update it.
|
||||
|
||||
### Proposed Architecture
|
||||
|
||||
Replace `Arc<ServerAuthConfig>` with a reloadable provider:
|
||||
|
||||
```
|
||||
StaticConfig (Arc, loaded once)
|
||||
├─ transport mode, listen addr, TLS config, iroh config
|
||||
├─ stealth, proxy
|
||||
├─ host key
|
||||
└─ max_auth_attempts, max_connections_per_ip
|
||||
|
||||
DynamicConfig (Arc<ArcSwap<DynamicConfig>>, reloadable)
|
||||
├─ auth: ServerAuthConfig
|
||||
├─ forwarding: ForwardingPolicy
|
||||
└─ rate_limits: RateLimitConfig
|
||||
|
||||
ConfigReloadHandle (exposed to NAPI)
|
||||
└─ reload(DynamicConfig)
|
||||
```
|
||||
|
||||
`ArcSwap` provides lock-free reads on the hot path. Every `auth_publickey()`
|
||||
and `channel_open_direct_tcpip()` call does an `Arc` dereference — zero cost
|
||||
compared to the current approach. Writes are atomic: `store()` swaps the
|
||||
pointer. Existing connections finish with their current config, new connections
|
||||
get the new config.
|
||||
|
||||
### Forwarding Policy
|
||||
|
||||
Currently, `channel_open_direct_tcpip` in `handler.rs` spawns a proxy task for
|
||||
any destination. The only gate is authentication. A forwarding policy adds a
|
||||
check before the proxy spawn:
|
||||
|
||||
```rust
|
||||
pub struct ForwardingPolicy {
|
||||
default: ForwardingAction,
|
||||
rules: Vec<ForwardingRule>,
|
||||
}
|
||||
|
||||
pub struct ForwardingRule {
|
||||
target: TargetPattern,
|
||||
action: ForwardingAction,
|
||||
principals: Vec<String>,
|
||||
}
|
||||
|
||||
pub enum ForwardingAction { Allow, Deny }
|
||||
pub enum TargetPattern {
|
||||
Any,
|
||||
Host(String),
|
||||
Cidr(IpNetwork),
|
||||
PortRange(String, Range<u16>),
|
||||
WraithPrefix,
|
||||
}
|
||||
```
|
||||
|
||||
Rule evaluation: first match wins, default applies if no rule matches. This
|
||||
model maps to OpenSSH's `AllowTcpForwarding` + `PermitOpen` but is more
|
||||
expressive. It also maps to `peer_credentials.metadata.scopes` in `@alkdev/storage`
|
||||
— the hub can generate forwarding rules from stored scopes.
|
||||
|
||||
Rule ordering matters. A deny-then-allow pattern gives blocklist semantics. An
|
||||
allow-then-deny pattern gives allowlist semantics. Both are useful. The
|
||||
default determines the fallback.
|
||||
|
||||
### Configuration File Format
|
||||
|
||||
ADR-011 chose "programmatic-first, no config file." This was correct for alpha.
|
||||
For publishable releases, a config file enables:
|
||||
|
||||
- Reproducible deployments (version-controlled config)
|
||||
- Less verbose CLI invocations
|
||||
- Separate files for static and dynamic config (only static needs to be in the
|
||||
config file; dynamic comes from the reload mechanism)
|
||||
|
||||
TOML is the idiomatic Rust choice. The config file covers static config only —
|
||||
the same fields as `ServeOptions`. Dynamic config (auth, forwarding) comes from
|
||||
the reload mechanism, not from the file. This preserves ADR-011's intent: the
|
||||
core doesn't know about the data source for auth keys, it just provides a place
|
||||
to put them.
|
||||
|
||||
```toml
|
||||
[server]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = false
|
||||
max_connections_per_ip = 5
|
||||
max_auth_attempts = 3
|
||||
|
||||
[server.tls]
|
||||
cert = "/etc/wraith/tls/cert.pem"
|
||||
key = "/etc/wraith/tls/key.pem"
|
||||
|
||||
[server.iroh]
|
||||
relay = "https://relay.alk.dev"
|
||||
|
||||
[auth]
|
||||
host_key = "/etc/wraith/ssh/host_key"
|
||||
|
||||
[forwarding]
|
||||
default = "deny"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "localhost:*"
|
||||
action = "allow"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "wraith-*"
|
||||
action = "allow"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "*:22"
|
||||
action = "deny"
|
||||
```
|
||||
|
||||
The `[[forwarding.rules]]` array syntax is TOML's array-of-tables pattern.
|
||||
Rules are evaluated in order; first match wins.
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
The NAPI layer exposes the reload handle:
|
||||
|
||||
```typescript
|
||||
interface WraithServer {
|
||||
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
|
||||
reloadForwarding(policy: ForwardingPolicyConfig): void;
|
||||
reloadAll(config: DynamicConfig): void;
|
||||
}
|
||||
|
||||
interface ForwardingPolicyConfig {
|
||||
default: 'allow' | 'deny';
|
||||
rules: ForwardingRuleConfig[];
|
||||
}
|
||||
|
||||
interface ForwardingRuleConfig {
|
||||
target: string; // "localhost:*", "10.0.0.0/8:80", "wraith-*"
|
||||
action: 'allow' | 'deny';
|
||||
principals?: string[]; // default ["*"]
|
||||
}
|
||||
```
|
||||
|
||||
The hub calls `server.reloadAuth(...)` after writing to `peer_credentials`.
|
||||
The NAPI layer parses the key data and constructs a new `DynamicConfig`, then
|
||||
calls the `ConfigReloadHandle`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration is almost entirely static (which server to connect to,
|
||||
which key to use). The only potential dynamic config is key rotation, which is
|
||||
less urgent because clients don't serve. For now, client configuration stays
|
||||
as `ConnectOptions` — no `ArcSwap` needed.
|
||||
|
||||
A config file for client connections could define named profiles:
|
||||
|
||||
```toml
|
||||
[profiles.production]
|
||||
server = "hub.alk.dev:443"
|
||||
transport = "tls"
|
||||
identity = "/home/user/.ssh/id_ed25519"
|
||||
|
||||
[profiles.staging]
|
||||
server = "staging.alk.dev:22"
|
||||
transport = "tcp"
|
||||
identity = "/home/user/.ssh/staging_key"
|
||||
```
|
||||
|
||||
This is a convenience layer on top of `ConnectOptions`, not a replacement.
|
||||
|
||||
### CLI vs Programmatic Behavior
|
||||
|
||||
| Interface | Static config | Dynamic config | Reload mechanism |
|
||||
|---|---|---|---|
|
||||
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
|
||||
| Core Rust | `StaticConfig` struct | `ArcSwap<DynamicConfig>` | `ConfigReloadHandle::reload()` |
|
||||
| NAPI | `serve()` options | Same `ArcSwap` | `server.reloadAuth()`, `server.reloadForwarding()` |
|
||||
|
||||
The CLI doesn't need a reload mechanism. When you're running wraith from the
|
||||
command line, restarting is fine. The reload mechanism exists for programmatic
|
||||
consumers that manage credentials in a database.
|
||||
|
||||
### Multi-Transport Listeners
|
||||
|
||||
A host may want to accept connections on multiple transports simultaneously:
|
||||
|
||||
- TCP on port 22 (simple, direct SSH)
|
||||
- TLS on port 443 (stealth mode, corporate firewalls)
|
||||
- iroh QUIC (P2P, no port forwarding needed)
|
||||
- WebTransport on port 443 (browser clients, shares the HTTP/3 listener)
|
||||
|
||||
Currently `ServeTransportMode` is a single enum and `Server::run()` takes one
|
||||
acceptor. To serve multiple transports, the architecture needs to change.
|
||||
|
||||
**Option A: `Server` manages multiple listeners internally.**
|
||||
|
||||
```rust
|
||||
pub struct Server {
|
||||
// Shared state (one copy, shared across all listeners)
|
||||
config: Arc<server::Config>,
|
||||
dynamic_config: Arc<ArcSwap<DynamicConfig>>,
|
||||
connection_limiter: Arc<ConnectionRateLimiter>,
|
||||
outbound_proxy: Option<ProxyConfig>,
|
||||
sessions: Arc<tokio::sync::Mutex<Vec<ActiveSession>>>,
|
||||
shutdown_tx: tokio::sync::watch::Sender<bool>,
|
||||
shutdown_rx: tokio::sync::watch::Receiver<bool>,
|
||||
|
||||
// Per-listener state
|
||||
listeners: Vec<ListenerConfig>,
|
||||
}
|
||||
|
||||
pub struct ListenerConfig {
|
||||
transport: ServeTransportMode,
|
||||
listen_addr: SocketAddr,
|
||||
stealth: bool,
|
||||
// Transport-specific config (TLS cert, iroh relay, etc.)
|
||||
tls: Option<TlsConfig>,
|
||||
iroh: Option<IrohConfig>,
|
||||
}
|
||||
```
|
||||
|
||||
`Server::run()` spawns one accept loop per `ListenerConfig`. Each loop
|
||||
constructs its own acceptor and `ServerHandler` (with the appropriate
|
||||
`TransportKind` tag), but shares the auth config, connection limiter, and
|
||||
session list. Shutdown signal goes to all loops.
|
||||
|
||||
**Option B: Caller manages multiple `Server` instances.**
|
||||
|
||||
The caller creates N `Server` objects, each with its own transport. They share
|
||||
`Arc<ArcSwap<DynamicConfig>>` and `Arc<ConnectionRateLimiter>` explicitly.
|
||||
|
||||
Option A is better because: shared shutdown, shared session tracking, single
|
||||
point for config reload. Option B puts coordination burden on the caller and
|
||||
makes graceful shutdown harder (N independent shutdown channels).
|
||||
|
||||
**The TLS + WebTransport coexistence question.** Both TLS and WebTransport
|
||||
use port 443. WebTransport is HTTP/3 (QUIC), TLS on port 443 is typically
|
||||
TCP+TLS. They can share the port because they're different protocols — QUIC
|
||||
is UDP, TLS-over-TCP is TCP. The kernel routes by protocol. But if both are
|
||||
on 443, the stealth mode protocol detector needs to handle HTTP/3 as well:
|
||||
|
||||
```
|
||||
Port 443:
|
||||
TCP connection → TLS handshake → SSH (existing)
|
||||
UDP "connection" → QUIC handshake → WebTransport → stream proxy
|
||||
```
|
||||
|
||||
This is similar to how iroh-live-relay works: HTTP/3 listener accepts
|
||||
WebTransport sessions, each session opens bidirectional streams that map to
|
||||
internal services.
|
||||
|
||||
**Config file for multi-transport:**
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = true
|
||||
|
||||
[listeners.tls]
|
||||
cert = "/etc/wraith/tls/cert.pem"
|
||||
key = "/etc/wraith/tls/key.pem"
|
||||
|
||||
[[listeners]]
|
||||
transport = "tcp"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
[[listeners]]
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
|
||||
[[listeners]]
|
||||
transport = "webtransport"
|
||||
listen = "0.0.0.0:443"
|
||||
# WebTransport shares port 443 with TLS because QUIC is UDP, TLS is TCP
|
||||
|
||||
[listeners.webtransport]
|
||||
cert = "/etc/wraith/tls/cert.pem"
|
||||
key = "/etc/wraith/tls/key.pem"
|
||||
```
|
||||
|
||||
The `[[listeners]]` array-of-tables pattern means each listener is an
|
||||
independent config block. The `[auth]`, `[forwarding]`, and `[server]`
|
||||
sections at the top level are shared — they apply to all listeners.
|
||||
|
||||
**NAPI multi-transport:**
|
||||
|
||||
```typescript
|
||||
const server = await serve({
|
||||
listeners: [
|
||||
{ transport: 'tls', listen: '0.0.0.0:443', stealth: true, tlsCert: '...', tlsKey: '...' },
|
||||
{ transport: 'tcp', listen: '0.0.0.0:22' },
|
||||
{ transport: 'iroh', irohRelay: 'https://relay.alk.dev' },
|
||||
],
|
||||
hostKey: hostKeyBuffer,
|
||||
authorizedKeys: keysBuffer,
|
||||
});
|
||||
```
|
||||
|
||||
Single `WraithServer` object, single `reloadAuth()` call affects all
|
||||
listeners.
|
||||
|
||||
### Transport Kind and WebTransport
|
||||
|
||||
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) tags each connection
|
||||
so the handler can behave differently per transport. Adding `WebTransport` to
|
||||
this enum is straightforward — WebTransport connections are identifiable at
|
||||
accept time. The handler behavior is the same (port forwarding only), but
|
||||
the tag enables transport-specific logging and future policy differences
|
||||
(e.g., WebTransport clients can only access `wraith-*` control channels).
|
||||
|
||||
## Proposed Solution
|
||||
|
||||
### Phase 1: Static/Dynamic Split
|
||||
|
||||
1. Introduce `StaticConfig` and `DynamicConfig` structs
|
||||
2. Replace `Arc<ServerAuthConfig>` in `ServerHandler` with
|
||||
`Arc<ArcSwap<DynamicConfig>>`
|
||||
3. Add `ConfigReloadHandle` with `reload(DynamicConfig)` method
|
||||
4. Expose `reloadAuth()` on the NAPI `WraithServer` object
|
||||
|
||||
**Scope**: `wraith-core` auth module + `wraith-napi` serve module
|
||||
|
||||
**Risk**: Low — internal refactor, no protocol changes
|
||||
|
||||
### Phase 2: Forwarding Policy
|
||||
|
||||
1. Add `ForwardingPolicy` to `DynamicConfig`
|
||||
2. Add policy check to `channel_open_direct_tcpip` before proxy spawn
|
||||
3. Expose `reloadForwarding()` on NAPI `WraithServer`
|
||||
|
||||
**Scope**: `wraith-core` handler + `wraith-napi`
|
||||
|
||||
**Risk**: Low — new check, default-allow preserves current behavior
|
||||
|
||||
### Phase 3: Config File
|
||||
|
||||
1. Add `--config <path>` CLI flag parsing TOML
|
||||
2. CLI flags override config file values (same precedence as cargo)
|
||||
3. Config file only covers static config + initial auth config path
|
||||
4. Add `serde` derive to `StaticConfig`
|
||||
|
||||
**Scope**: `wraith-cli` (new binary crate) + `wraith-core` config module
|
||||
|
||||
**Risk**: Medium — new dependency (`toml` crate), new CLI surface to validate
|
||||
|
||||
### Phase 4: Client Profiles
|
||||
|
||||
1. Add `[profiles]` section to client config file
|
||||
2. `--profile production` loads named profile
|
||||
3. CLI flags override profile values
|
||||
|
||||
**Scope**: `wraith-cli`
|
||||
|
||||
**Risk**: Low — convenience layer only
|
||||
|
||||
### Phase 5: Multi-Transport Listeners
|
||||
|
||||
1. Change `ServeTransportMode` from single enum to `Vec<ListenerConfig>`
|
||||
2. `Server::run()` spawns one accept loop per listener, sharing `DynamicConfig`
|
||||
3. Single shutdown signal drains all listeners
|
||||
4. Add `[[listeners]]` to config file format
|
||||
5. NAPI `serve()` accepts `listeners` array instead of single `transport`
|
||||
6. Add `WebTransport` to `TransportKind` enum (initially as a tag only;
|
||||
actual WebTransport acceptor is a separate R&D phase)
|
||||
|
||||
**Scope**: `wraith-core` serve.rs + `wraith-napi` + `wraith-cli`
|
||||
|
||||
**Risk**: Medium — changes the primary API surface of `serve()`. Backwards
|
||||
compat via accepting both `transport: string` (single) and
|
||||
`listeners: array` (multi) in NAPI.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-CFG-01**: Should forwarding rules support per-user scope derived from
|
||||
the authenticated key's metadata (e.g., `peer_credentials.metadata.scopes`)?
|
||||
Or is a global rules table with principal matching sufficient?
|
||||
|
||||
Global rules with principal matching is simpler and covers most cases. Per-user
|
||||
scope derived from certificates is more granular but requires the server to
|
||||
maintain a mapping from key fingerprint to scope. This mapping comes from the
|
||||
hub's database, not from the SSH protocol. Phase 2 starts with global rules;
|
||||
per-user scope can be added as an extension.
|
||||
|
||||
- **OQ-CFG-02**: Should the config file watch for changes and auto-reload?
|
||||
|
||||
No. File watching is a potential attack vector (symlink races, inotify
|
||||
limitations on network filesystems). The CLI loads once at startup. The NAPI
|
||||
layer reloads explicitly. This is the right model for a security-sensitive
|
||||
tool.
|
||||
|
||||
- **OQ-CFG-03**: Should `ArcSwap` be the reload primitive, or is `RwLock`
|
||||
sufficient?
|
||||
|
||||
`ArcSwap` is the standard pattern for this in Rust network services
|
||||
(`arc-swap` crate). It provides lock-free reads (the hot path) and atomic
|
||||
writes. `RwLock` would also work but adds lock contention on reads. The
|
||||
`arc-swap` dependency is small (~500 lines) and well-maintained. Prefer it.
|
||||
|
||||
- **OQ-CFG-04**: Should TLS and WebTransport on the same port share a single
|
||||
QUIC listener (like iroh Router's ALPN dispatch), or run as separate
|
||||
listeners on the same port?
|
||||
|
||||
They can't conflict because QUIC is UDP and TLS-over-TCP is TCP — the
|
||||
kernel routes by protocol, not by port number. They're naturally separate
|
||||
listeners even on the same port. However, if iroh is also running on the
|
||||
same host, the iroh endpoint already owns a QUIC listener. The WebTransport
|
||||
listener needs its own. Options: (a) share the iroh endpoint's QUIC listener
|
||||
with ALPN dispatch (reuses `from_endpoint` pattern), (b) separate QUIC
|
||||
listeners on different ports, (c) bind both to 443/UDP — possible if
|
||||
`SO_REUSEPORT` is used. Needs R&D; defer to WebTransport transport design
|
||||
session.
|
||||
|
||||
**Update**: WebTransport is out of scope for the current configuration
|
||||
work. It requires a fundamentally different authentication model (HTTP-level
|
||||
API keys/session tokens vs SSH key-based auth). The `ServerHandler` only
|
||||
knows SSH `auth_publickey`. WebTransport auth would need its own handler
|
||||
path. This connects to the broader question of whether `DynamicConfig.auth`
|
||||
should be transport-aware (see OQ-CFG-06). WebTransport transport design
|
||||
is a separate R&D session.
|
||||
|
||||
- **OQ-CFG-05**: Does `TransportKind::WebTransport` need any handler behavior
|
||||
different from other transports?
|
||||
|
||||
Initially no — all transports get the same port-forwarding-only handler.
|
||||
But WebTransport connections come from browsers, which have different trust
|
||||
assumptions. A future forwarding policy might restrict WebTransport clients
|
||||
to `wraith-*` control channels only (no arbitrary host:port forwarding).
|
||||
This is a policy question, not a transport question. The `TransportKind` tag
|
||||
on the handler enables transport-aware policy rules in `ForwardingPolicy`
|
||||
without changing the handler. Defer to Phase 2 (forwarding policy design).
|
||||
|
||||
- **OQ-CFG-06**: Should the auth layer be transport-aware?
|
||||
|
||||
Currently `DynamicConfig.auth` is `ServerAuthConfig` — SSH keys and CAs
|
||||
only. This works for SSH over any transport (TCP, TLS, iroh) because SSH
|
||||
carries its own auth protocol. But non-SSH transports (WebTransport,
|
||||
WebSocket) use HTTP-level authentication (API keys, session tokens in
|
||||
headers/query params). The auth question is: does the same `DynamicConfig`
|
||||
serve both models, or does each transport carry its own auth config?
|
||||
|
||||
Option A: `AuthPolicy` contains both SSH auth and API key auth:
|
||||
```rust
|
||||
pub struct AuthPolicy {
|
||||
ssh: SshAuthConfig, // for SSH-over-any-transport
|
||||
api_keys: Option<ApiKeysConfig>, // for non-SSH transports
|
||||
}
|
||||
```
|
||||
|
||||
Option B: Auth is per-listener. Each `ListenerConfig` carries its own auth
|
||||
config appropriate to its transport.
|
||||
|
||||
Option A is simpler for the initial implementation — the SSH auth path is
|
||||
unchanged, and API key auth is additive. Option B is more flexible but
|
||||
duplicates the shared auth state (keys should be reloadable once, not per
|
||||
listener).
|
||||
|
||||
For now, the config architecture should accommodate Option A as a future
|
||||
extension. Phase 1 implements `DynamicConfig` with SSH auth only. API key
|
||||
auth is added when a non-SSH transport is implemented.
|
||||
|
||||
## Decisions Required
|
||||
|
||||
These decisions will be extracted into ADRs when the architecture is finalized:
|
||||
|
||||
1. **ADR-020**: Static/dynamic config split, `ArcSwap<DynamicConfig>` for
|
||||
hot-reloadable auth and forwarding policy. Supersedes ADR-011's "no config
|
||||
file" — adds optional config file while preserving programmatic-first API.
|
||||
|
||||
2. **ADR-021**: Forwarding policy with rule-based allow/deny. Default-allow
|
||||
preserves current behavior during migration; default-deny for production
|
||||
deployments.
|
||||
|
||||
3. **ADR-022**: Multi-transport listeners. `Server` spawns multiple accept
|
||||
loops sharing auth config, session state, and shutdown. Replaces single
|
||||
`ServeTransportMode` with `Vec<ListenerConfig>`.
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-011](../architecture/decisions/011-no-ssh-config-programmatic-api.md) — Programmatic-first API (superseded by ADR-020)
|
||||
- [ADR-012](../architecture/decisions/012-auth-ed25519-and-cert-authority.md) — Auth key format
|
||||
- [ADR-018](../architecture/decisions/018-control-channel-for-pubsub.md) — Control channel routing
|
||||
- `server/handler.rs` — Current `Arc<ServerAuthConfig>` usage
|
||||
- `server/serve.rs` — Current single-transport `Server::run()` accept loop
|
||||
- `auth/server_auth.rs` — `ServerAuthConfig` struct
|
||||
- `auth/keys.rs` — `KeySource` and key loading
|
||||
- `@alkdev/storage/docs/architecture/sqlite-host.md` — `peer_credentials` table schema
|
||||
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library (in `/workspace/wtransport`)
|
||||
- [arc-swap crate](https://docs.rs/arc-swap) — Lock-free read, atomic write for shared state
|
||||
Reference in New Issue
Block a user