From af7f4d00067aeb1166ba01622bf2d5a7d3b2e5b9 Mon Sep 17 00:00:00 2001 From: "glm-5.1" Date: Fri, 5 Jun 2026 08:19:41 +0000 Subject: [PATCH] docs: add auth, call protocol architecture specs and ADRs 023-025 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Unified authentication (ADR-023): SSH and WebTransport auth share the same Ed25519 key material. Token auth uses signed timestamps verified against the same authorized_keys set. IdentityProvider trait decouples core from identity storage. Bidirectional call protocol (ADR-024): Generalizes control channel (ADR-018) to support hub→spoke and spoke→hub calls. Operation paths use /{spoke}/{service}/{op} format for three-level routing. EventEnvelope wire format, five call events, PendingRequestMap for correlation. Handler/spec separation (ADR-025): Downstream consumers register operations without modifying core. OperationRegistry maps paths to specs + handlers. Service discovery via /services/list and /services/schema. Resolves OQ-17 (transport-aware auth), OQ-21 (spoke routing), OQ-CFG-04 and OQ-CFG-06 (WebTransport auth and transport-aware auth layer). Adds OQ-18 through OQ-22 for remaining open questions. 
--- docs/architecture/README.md | 19 +- docs/architecture/auth.md | 261 ++++++++++++ docs/architecture/call-protocol.md | 402 ++++++++++++++++++ .../023-unified-auth-shared-key-material.md | 85 ++++ .../024-bidirectional-call-protocol.md | 63 +++ .../decisions/025-handler-spec-separation.md | 73 ++++ docs/architecture/open-questions.md | 57 ++- docs/research/configuration.md | 30 +- 8 files changed, 971 insertions(+), 19 deletions(-) create mode 100644 docs/architecture/auth.md create mode 100644 docs/architecture/call-protocol.md create mode 100644 docs/architecture/decisions/023-unified-auth-shared-key-material.md create mode 100644 docs/architecture/decisions/024-bidirectional-call-protocol.md create mode 100644 docs/architecture/decisions/025-handler-spec-separation.md diff --git a/docs/architecture/README.md b/docs/architecture/README.md index b143327..eb96010 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -1,13 +1,16 @@ --- -status: reviewed -last_updated: 2026-06-02 +status: draft +last_updated: 2026-06-04 --- # Wraith Architecture ## Current State -Architecture specification reviewed and ready for implementation. 19 ADRs accepted. Configuration architecture under exploration — see [research/configuration.md](../research/configuration.md). +Architecture specification in active development. 22 ADRs accepted. Unified +auth and call protocol architecture being specified — see [auth.md](auth.md) +and [call-protocol.md](call-protocol.md). Configuration architecture under +exploration — see [research/configuration.md](../research/configuration.md). ## Architecture Documents @@ -15,6 +18,8 @@ Architecture specification reviewed and ready for implementation. 
19 ADRs accept |----------|--------|-------------| | [overview.md](overview.md) | reviewed | Package purpose, exports, dependencies | | [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh | +| [auth.md](auth.md) | draft | Unified auth: SSH + token, IdentityProvider trait | +| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, operation registry | | [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding | | [server.md](server.md) | reviewed | Server acceptance, channel handling, proxy | | [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy | @@ -49,11 +54,15 @@ Architecture specification reviewed and ready for implementation. 19 ADRs accept | [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted | | [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted | | [019](decisions/019-proxy-dual-semantics.md) | `--proxy` dual semantics (client vs server) | Accepted | +| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted | +| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted | +| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted | ## Open Questions -Most open questions have been resolved. New questions from configuration -research — see [open-questions.md](open-questions.md) for details. +Most open questions have been resolved. Open questions remain for +configuration, auth, and call protocol — see +[open-questions.md](open-questions.md) for details. 
## Lifecycle Definitions diff --git a/docs/architecture/auth.md b/docs/architecture/auth.md new file mode 100644 index 0000000..79ec06a --- /dev/null +++ b/docs/architecture/auth.md @@ -0,0 +1,261 @@ +--- +status: draft +last_updated: 2026-06-04 +--- + +# Authentication & Identity + +## What + +A unified authentication and identity layer that works across all transports — +SSH-over-any-transport and WebTransport (non-SSH HTTP-level transports). The +same key material (Ed25519 authorized keys and certificate authorities) is +shared across both auth paths. Identity resolution produces a transport-agnostic +`Identity` that carries scopes and resources for downstream authorization. + +## Why + +Wraith currently authenticates connections exclusively through SSH public key +auth. Non-SSH transports (WebTransport) cannot perform SSH key exchange — they +need a different auth presentation that shares the same key material. The +unified auth layer ensures one key set, one identity, one rotation mechanism +across all transports. See ADR-023 for the decision context. + +## Architecture + +### Auth Presentation Per Transport + +| Transport | Auth presentation | Verification | +|-----------|-------------------|-------------| +| SSH (TCP, TLS, iroh) | SSH public key auth in the SSH handshake | `ServerAuthConfig::authenticate_publickey()` — key lookup in authorized set | +| WebTransport (HTTP/3) | Signed timestamp token in CONNECT request | Token auth — same authorized set verifies the Ed25519 signature | +| Future (WebSocket, etc.) | Signed timestamp token in headers/query | Same token verification | + +The **key material is shared**. The **presentation differs per transport**. The +**verification result is the same**: an authenticated identity with scopes. 
+ +### Token Authentication + +For non-SSH transports, the client constructs an authentication token: + +``` +AuthToken = base64url(key_id || timestamp || signature) + + key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes) + timestamp = Unix seconds, big-endian u64 (8 bytes) + signature = Ed25519 sign(key_id || timestamp_bytes, private_key) +``` + +Wire format when passed in a WebTransport CONNECT request: +``` +CONNECT https://server:443/wraith?token= +``` + +Server verification: + +1. Base64url-decode the token +2. Extract `key_id` (first 32 bytes) +3. Look up `key_id` in the same `authorized_keys` set that SSH auth uses +4. Verify the Ed25519 `signature` against `(key_id || timestamp_bytes)` using + the matching public key +5. Check `timestamp` is within the acceptable window (configurable, default + ±300 seconds) +6. Resolve to the same `Identity` that SSH pubkey auth would produce + +The key fingerprint in the token serves double duty: it identifies which key +to verify against, and it ties the signature to a specific key (swapping +`key_id` invalidates the signature). + +### Replay Protection + +V1 uses timestamp-only replay protection (±300s window, no server state). The +replay trade-offs and future zero-replay options (nonce challenge-response) are +documented in ADR-023. + +### IdentityProvider Trait + +The `IdentityProvider` trait decouples wraith-core from any specific identity +storage. It resolves a key fingerprint or auth token to an `Identity` with +scopes and resources. + +```rust +pub trait IdentityProvider: Send + Sync + 'static { + /// Resolve an SSH public key fingerprint to an identity. + fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>; + + /// Resolve an auth token to an identity. + /// Returns None if the token is invalid, expired, or the key is not authorized. 
+ fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>; +} + +pub struct Identity { + pub id: String, // Unique identifier — fingerprint (config) or account UUID (database) + pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"] + pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]} +} +``` + +**Default implementation**: `ConfigIdentityProvider` loads from +`DynamicConfig.auth` (the `authorized_keys` set). Every authorized key gets a +default scope set. No database required. + +**Hub implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and +`accounts` tables plus the ACL graph. Resolves fingerprint → account → +organization membership → effective scopes. Uses `ArcSwap` for hot reload. + +The trait is the contract. The backing store is pluggable. Wraith-core never +depends on Honker, SQLite, or any specific database. + +### AuthPolicy Structure + +`AuthPolicy` in `DynamicConfig` holds both auth paths, sharing key material: + +```rust +pub struct AuthPolicy { + pub ssh: SshAuthConfig, + pub token: TokenAuthConfig, +} + +pub struct SshAuthConfig { + pub authorized_keys: HashSet, + pub cert_authorities: Vec, + // Existing fields from current ServerAuthConfig +} + +pub struct TokenAuthConfig { + pub enabled: bool, + pub max_token_age: Duration, // Timestamp window (default: 300s) + pub key_source: TokenKeySource, +} + +pub enum TokenKeySource { + /// Share the same authorized_keys set with SshAuthConfig. + /// Default and recommended for v1. + Shared, + /// Separate key set for non-SSH transports. + /// For deployments that want distinct access control per transport. + Separate(HashSet), +} +``` + +When `TokenKeySource::Shared` (the default), adding a key to +`authorized_keys` immediately grants access via both SSH and WebTransport. +One key set, one `reloadAuth()` call, one rotation. 
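As a rough illustration of how `max_token_age` might be applied, the sketch below splits a base64url-decoded token into the three fields listed under Token Authentication and checks the timestamp window. `ParsedToken`, `parse_token`, and `within_window` are hypothetical names, not part of the spec. The 64-byte signature length is the standard Ed25519 size and is an assumption, since the spec gives sizes only for `key_id` and `timestamp`. Base64url decoding, key lookup, and signature verification are deliberately left out.

```rust
use std::time::Duration;

// key_id and timestamp sizes come from the token layout in the spec.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
// Assumption: standard Ed25519 signature size (not stated in the spec).
const SIGNATURE_LEN: usize = 64;

/// Hypothetical parsed form of an already base64url-decoded token.
pub struct ParsedToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

/// Split raw token bytes into key_id || timestamp || signature.
/// Returns None when the length does not match the fixed layout.
pub fn parse_token(raw: &[u8]) -> Option<ParsedToken> {
    if raw.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(&raw[..KEY_ID_LEN]);

    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(&raw[KEY_ID_LEN..KEY_ID_LEN + TIMESTAMP_LEN]);

    let mut signature = [0u8; SIGNATURE_LEN];
    signature.copy_from_slice(&raw[KEY_ID_LEN + TIMESTAMP_LEN..]);

    Some(ParsedToken {
        key_id,
        // The timestamp is Unix seconds encoded as a big-endian u64.
        timestamp: u64::from_be_bytes(ts),
        signature,
    })
}

/// True when the token timestamp is within +/- max_token_age of `now`.
pub fn within_window(token_ts: u64, now: u64, max_token_age: Duration) -> bool {
    let skew = if now >= token_ts { now - token_ts } else { token_ts - now };
    skew <= max_token_age.as_secs()
}
```

Following the verification order listed earlier, a server would presumably run `within_window` only after the signature check has passed, so that an unauthenticated timestamp never drives a decision.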
+ +### Auth Flow in the Server + +**SSH transport (existing, unchanged):** +``` +Client connects → SSH handshake → auth_publickey() callback + → ServerAuthConfig::authenticate_publickey() or authenticate_certificate() + → Auth::Accept or Auth::Reject +``` + +**WebTransport transport (new):** +``` +Browser connects → WebTransport CONNECT request + → SessionRequest inspection: extract token from URL path or header + → TokenAuthConfig verification: decode token → lookup key_id → verify signature → check timestamp + → session_request.accept() or session_request.forbidden() +``` + +After auth, both paths produce an `Identity`. The `Identity` is attached to the +connection and used by `ForwardingPolicy` and the call protocol to make +authorization decisions. + +### WebTransport SessionRequest Inspection + +The wtransport library's `SessionRequest` provides: + +- `path()` — URL path (e.g., `/wraith?token=...`) +- `headers()` — HTTP headers (for `Authorization: Bearer ...`) +- `origin()` — Browser origin (for CORS-like restrictions) +- `remote_address()` — Client UDP address + +Token extraction from URL path is preferred for browser WebTransport because +the W3C API (`new WebTransport(url)`) naturally includes query parameters. For +native clients (Deno, CLI), the `Authorization` header is also supported. 
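The two token locations described above (URL query parameter and `Authorization: Bearer` header) could be handled with plain string parsing once the path and header values have been read from the session request. The sketch below assumes those values are already available as `&str`; it does not call into wtransport, and all three function names are hypothetical. Checking the header before the query parameter is also an assumption, since the spec does not say which wins when both are present.

```rust
/// Pull the `token` query parameter out of a request path such as
/// `/wraith?token=abc`. Returns None when it is absent or empty.
pub fn token_from_path(path: &str) -> Option<&str> {
    let query = path.splitn(2, '?').nth(1)?;
    query
        .split('&')
        .filter_map(|pair| {
            let mut kv = pair.splitn(2, '=');
            match (kv.next(), kv.next()) {
                (Some("token"), Some(v)) if !v.is_empty() => Some(v),
                _ => None,
            }
        })
        .next()
}

/// Extract the credential from an `Authorization: Bearer <token>` value.
/// Only the exact scheme spelling "Bearer" is accepted in this sketch.
pub fn token_from_authorization(header: &str) -> Option<&str> {
    let token = header.trim().strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

/// Assumption: prefer the header, fall back to the query parameter.
pub fn extract_token<'a>(path: &'a str, authorization: Option<&'a str>) -> Option<&'a str> {
    authorization
        .and_then(token_from_authorization)
        .or_else(|| token_from_path(path))
}
```

The returned string is still base64url text; decoding and verification happen afterwards.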
+ +### Browser-Side Token Construction + +```javascript +// Illustrative — see client SDK for production implementation +async function createAuthToken(keyPair) { + const publicKey = await crypto.subtle.exportKey('raw', keyPair.publicKey); + const keyId = new Uint8Array(await crypto.subtle.digest('SHA-256', publicKey)); + + const timestamp = new ArrayBuffer(8); + new DataView(timestamp).setBigUint64(0, BigInt(Math.floor(Date.now() / 1000))); + + const message = new Uint8Array([...keyId, ...new Uint8Array(timestamp)]); + const signature = await crypto.subtle.sign('Ed25519', keyPair.privateKey, message); + + const token = new Uint8Array([...keyId, ...new Uint8Array(timestamp), ...new Uint8Array(signature)]); + return btoa(String.fromCharCode(...token)) + .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, ''); +} +``` + +Browsers support Ed25519 key generation and signing via `SubtleCrypto` (Chrome +105+, Firefox 130+, Safari 17+). Deno supports it natively. No external +dependencies needed. + +## Constraints + +- Auth tokens are Ed25519-signed with the same key pair used for SSH auth. No + separate key management for non-SSH transports. +- `IdentityProvider` is the only interface between wraith-core and identity + storage. No database dependency at the core level. +- The SSH auth path is unchanged. `auth_publickey()` continues to work exactly + as it does today. Token auth is additive. +- Certificate authority tokens are not supported for token auth in v1. CA + verification requires the full OpenSSH certificate structure, which doesn't + fit in a simple signed timestamp. This can be added later if needed. +- Token auth is only available on transports that carry HTTP metadata (URL + path, headers). SSH-over-TCP/TLS/iroh continues to use SSH native auth + exclusively. + +### Security Considerations + +**Token in URL**: The auth token is passed as a URL query parameter +(`?token=...`) for browser WebTransport compatibility. 
This is a known web +security consideration: + +- **Server logs**: The token may appear in HTTP access logs. Servers MUST + strip or redact the `token` query parameter before logging the request URL. +- **Browser history**: The token may appear in browser history. Timestamps + limit exposure to the token window (±300s). +- **Referrer headers**: WebTransport does not send referrer headers, so the + token does not leak via HTTP Referer. +- **Native clients**: Deno and native clients SHOULD prefer the `Authorization: + Bearer` header over URL parameters when the client supports custom headers. + +## Open Questions + +- **OQ-18**: Should `Identity.scopes` be populated from `ForwardingPolicy` + rules, from an external `IdentityProvider`, or from both? See + [open-questions.md](open-questions.md). + +- **OQ-19**: Should the WebTransport listener require its own TLS identity + (separate from the SSH-over-TLS listener), or can they share the same + certificate? See [open-questions.md](open-questions.md). 
+ +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords | +| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth | + +## References + +- [server.md](server.md) — Current SSH auth handler +- [transport.md](transport.md) — Transport abstraction +- [configuration.md](../research/configuration.md) — DynamicConfig, AuthPolicy structure +- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18, OQ-19 +- `server/handler.rs` — Current `auth_publickey()` callback +- `auth/server_auth.rs` — Current `ServerAuthConfig` struct +- `auth/keys.rs` — `KeySource` and key loading +- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library +- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API +- [@alkdev/storage](/workspace/@alkdev/storage) — `peer_credentials` table, ACL graph \ No newline at end of file diff --git a/docs/architecture/call-protocol.md b/docs/architecture/call-protocol.md new file mode 100644 index 0000000..a11d247 --- /dev/null +++ b/docs/architecture/call-protocol.md @@ -0,0 +1,402 @@ +--- +status: draft +last_updated: 2026-06-04 +--- + +# Call Protocol + +## What + +A bidirectional, transport-agnostic call and event protocol that runs over +authenticated pipes. It supports request/response calls, streaming +subscriptions, and unidirectional events — all using the same wire format. The +protocol is defined as a spec + handler + registry; downstream consumers (NAPI, +Python, hub/spoke) register their own operations without modifying core. + +## Why + +The current control channel (ADR-018) is unidirectional (client → server) and +provides fire-and-forget event dispatch without request/response semantics. 
+The call protocol generalizes it to support bidirectional calls (ADR-024) and +downstream service registration (ADR-025), enabling the hub/spoke model where +spokes expose operations the hub invokes. + +## Architecture + +### Operation Paths + +Operation names use slash-based paths aligned with URL routing conventions: + +``` +/{spoke}/{service}/{op} +``` + +- **spoke** — identity prefix of the node that exposes the operation. The hub + uses this segment to route calls to the correct connected node. +- **service** — the logical service namespace. Groups related operations + under one handler prefix. +- **op** — the specific operation within that service. + +Examples: + +| Path | Meaning | +|------|---------| +| `/dev1/fs/readFile` | Spoke `dev1`, service `fs`, operation `readFile` | +| `/dev1/bash/exec` | Spoke `dev1`, service `bash`, operation `exec` | +| `/hub/agent/chat` | Hub's own `agent` service, operation `chat` | +| `/hub/sessions/list` | Hub's own `sessions` service, operation `list` | +| `/browser-1/notify/alert` | Browser spoke `browser-1`, `notify` service | + +This three-level routing mirrors iroh's ALPN dispatch: the first segment +routes to a connected node (like ALPN routes to a protocol handler), the +remaining path dispatches within that node's registry. See ADR-025 for the +handler/spec separation decision. + +The `namespace` field on `OperationSpec` is derived from the path (`namespace` += second path segment). It's a convenience accessor for ACL matching and +service grouping. 
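A minimal sketch of splitting a full `/{spoke}/{service}/{op}` path into its three segments, with the namespace derived from the second one. `OperationPath` and `parse_operation_path` are hypothetical names. Rejecting paths with fewer or more than three segments is an assumption made for this example; the spec does not say whether deeper paths are allowed.

```rust
/// Hypothetical borrowed view of a full operation path.
#[derive(Debug, PartialEq)]
pub struct OperationPath<'a> {
    pub spoke: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

impl<'a> OperationPath<'a> {
    /// The namespace is the second path segment, i.e. the service.
    pub fn namespace(&self) -> &'a str {
        self.service
    }
}

/// Parse "/{spoke}/{service}/{op}". Returns None for a missing leading
/// slash, an empty segment, or a segment count other than three.
pub fn parse_operation_path<'a>(path: &'a str) -> Option<OperationPath<'a>> {
    let rest = path.strip_prefix('/')?;
    let mut parts = rest.split('/');
    let spoke = parts.next()?;
    let service = parts.next()?;
    let op = parts.next()?;
    if parts.next().is_some() || spoke.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { spoke, service, op })
}
```

For `/dev1/fs/readFile` this yields spoke `dev1`, service `fs`, and operation `readFile`, matching the first row of the examples table.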
+ +### Wire Format: EventEnvelope + +Every message on the wire is a length-prefixed JSON `EventEnvelope`: + +```rust +pub struct EventEnvelope { + pub r#type: String, // Event type (e.g., "call.requested", "call.responded") + pub id: String, // Correlation key (requestId, topic, or "" for broadcasts) + pub payload: Value, // JSON payload — schema depends on event type +} + +// Frame: 4-byte big-endian length prefix + UTF-8 JSON body +``` + +This is the same format used by `@alkdev/pubsub` adapters. It is JSON because +it must be consumable from JavaScript, Python, and any language. The envelope +is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh +bidirectional streams, WebSocket, or Worker postMessage. + +Binary payloads (postcard, protobuf, etc.) are base64-encoded in the `payload` +field. The envelope itself stays JSON for cross-language compatibility. + +### Call Protocol Events + +Five event types carry request/response and subscription semantics: + +| Event | Direction | Purpose | +|-------|-----------|---------| +| `call.requested` | Caller → Handler | Initiate a call or subscription | +| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) | +| `call.completed` | Handler → Caller | Signal end of subscription stream | +| `call.aborted` | Either side | Cancel the call/subscription | +| `call.error` | Handler → Caller | Signal an error | + +**`call.error` payload**: +```json +{ + "code": "string", + "message": "string", + "retryable": false +} +``` + +**A call is just a subscribe that resolves after one event.** Both `call()` and +`subscribe()` send the same `call.requested` event. The difference is +consumption pattern: + +- **`call()`**: Sends `call.requested`, resolves `Promise` on first `call.responded` +- **`subscribe()`**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted` + +The `id` field carries the `requestId` for correlation. 
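The framing described under Wire Format (4-byte big-endian length prefix followed by a UTF-8 JSON body) can be sketched with the standard library alone. The envelope is treated as an already serialized JSON string here, so no JSON crate is needed. `write_frame`, `read_frame`, and the `MAX_FRAME_LEN` cap are hypothetical; the spec does not define a maximum frame size.

```rust
use std::io::{self, Read, Write};

/// Assumption: an upper bound on the body size, to avoid allocating
/// whatever a peer claims in the length prefix.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame and return its body as a String.
/// Fails on a short read, an oversized length, or invalid UTF-8.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both functions are generic over `Read` and `Write`, the same code applies to any of the byte-stream transports; message-oriented transports such as `postMessage` would carry the envelope without this prefix logic or wrap it differently.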
+ +### Bidirectional Calls and Routing + +Both sides of a connection can initiate calls. The hub routes calls to spokes +using the first path segment: + +``` +Hub (server) Spoke: "dev1" (client) + │ │ + │ call.requested │ + │ name: "/dev1/fs/readFile" │ + │ payload: { path: "/src/main.rs" } │ + │──────────────────────────────────────────▶│ + │ │ + │ call.responded │ + │ id: │ + │ payload: { content: "fn main()..." } │ + │◀──────────────────────────────────────────│ + │ │ + │ Spoke exposes /dev1/fs/*, │ + │ /dev1/bash/* to hub │ + │ │ + │◀─ call.requested ────────────────────────│ + │ name: "/hub/agent/chat" │ + │ payload: { provider: "anthropic", ... } │ + │ │ + │── call.responded ──────────────────────▶ │ + │ id: │ + │ payload: { completion: "..." } │ +``` + +The hub's registry includes: +- **Hub-local operations** (`/hub/*`) — handled directly +- **Remote operations** (`/{spoke}/*`) — forwarded to the spoke connection + +When the hub routes `/dev1/fs/readFile` to spoke `dev1`, it strips the spoke +prefix and delivers the call to the spoke's local registry as `/fs/readFile`. +The spoke doesn't need to know its own alias. 
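The prefix-stripping rule above can be illustrated with a small routing function. This is only a sketch: `Route` and `route` are hypothetical names, the literal alias `hub` for hub-local operations is taken from the examples rather than from a stated rule, and keeping the full path for hub-local calls is an assumption.

```rust
/// Hypothetical routing decision made by the hub for an incoming call.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the hub itself; the full path is kept (assumption).
    HubLocal(&'a str),
    /// Forwarded to a spoke, with the spoke prefix stripped.
    Remote { spoke: &'a str, local_path: &'a str },
}

/// Split off the first path segment and decide where the call goes.
/// Returns None when the path has no leading slash or nothing after
/// the first segment.
pub fn route<'a>(path: &'a str) -> Option<Route<'a>> {
    let rest = path.strip_prefix('/')?;
    let idx = rest.find('/')?;
    let (first, local_path) = rest.split_at(idx);
    // local_path still starts with '/', so it needs at least one more byte.
    if first.is_empty() || local_path.len() < 2 {
        return None;
    }
    if first == "hub" {
        Some(Route::HubLocal(path))
    } else {
        Some(Route::Remote { spoke: first, local_path })
    }
}
```

With this shape, `/dev1/fs/readFile` becomes a remote call to spoke `dev1` with local path `/fs/readFile`, which is what the spoke's own registry would look up.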
+ +### Hub/Spoke Architecture + +``` + ┌─────────────────────────────────┐ + │ Hub │ + │ │ + │ Hub-local services: │ + │ /hub/agent/chat (LLM coord) │ + │ /hub/agent/complete │ + │ /hub/sessions/list │ + │ /hub/sessions/history │ + │ │ + │ Spoke registry (discovered): │ + │ /dev1/fs/* → dev1 connection │ + │ /dev1/bash/* → dev1 connection │ + │ /dev2/fs/* → dev2 connection │ + │ /browser-1/notify/* → WT conn │ + └──────┬───────┬───────┬──────────┘ + │ │ │ + ┌─────────▼┐ ┌───▼────┐ ┌▼───────────┐ + │ Dev Spoke│ │Dev Spk │ │Browser Spoke│ + │ "dev1" │ │"dev2" │ │"browser-1" │ + │ /fs/* │ │/fs/* │ │/notify/* │ + │ /bash/* │ │/bash/* │ │ │ + │ /search/*│ │ │ │ │ + └───────────┘ └────────┘ └─────────────┘ +``` + +When a spoke connects, it registers its operations with the hub: + +``` +spoke → hub: call.requested { name: "/hub/services/register", payload: { + spoke: "dev1", + operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"] +}} +``` + +The hub adds these to its routing table with the spoke prefix. Other spokes +and browser clients can then call `/dev1/fs/readFile` without knowing how +the hub routes it internally. + +### Operation Registry + +The operation registry maps paths to specs and handlers. **Specs and handlers +are separate** — downstream consumers register both (ADR-025). 
+ +```rust +pub struct OperationSpec { + pub name: String, // e.g., "/fs/readFile", "/agent/chat" + pub namespace: String, // e.g., "fs", "agent" + pub op_type: OperationType, // Query, Mutation, Subscription + pub input_schema: Value, // JSON Schema for input + pub output_schema: Value, // JSON Schema for output + pub access_control: AccessControl, // Required scopes/resources +} + +pub enum OperationType { + Query, // Read-only, idempotent (e.g., "/fs/readFile", "/search/query") + Mutation, // Side effects (e.g., "/bash/exec", "/sessions/create") + Subscription, // Streaming (e.g., "/events/subscribe") +} + +pub struct AccessControl { + pub required_scopes: Vec<String>, // AND-checked + pub required_scopes_any: Option<Vec<String>>, // OR-checked + pub resource_type: Option<String>, // e.g., "service" + pub resource_action: Option<String>, // e.g., "read" +} +``` + +**Registration is separated from implementation:** + +```rust +// Core registers discovery operations +registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler); +registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler); + +// A dev env spoke registers its tools +registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler); +registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler); + +// A browser client registers notification UDFs +registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler); +``` + +Core-provided operations use short paths without a spoke prefix +(`/services/list`, `/services/schema`). They live on whatever node the +caller is connected to. Spoke-prefixed operations (`/dev1/fs/readFile`) +are routed by the hub. 
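To make the spec/handler pairing concrete, here is a deliberately reduced registry sketch. It is not the core implementation: `OperationSpec` is cut down to two fields, payloads are plain strings instead of JSON values so the example needs no external crates, and the `Handler` alias, the `call`, `spec`, and `list` methods, and the reject-duplicates behavior are all assumptions.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the spec type; the real one carries schemas,
/// an operation type, and access control.
pub struct OperationSpec {
    pub name: String,
    pub namespace: String,
}

/// Hypothetical handler shape: string payload in, string result or error out.
pub type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

pub struct OperationRegistry {
    entries: HashMap<String, (OperationSpec, Handler)>,
}

impl OperationRegistry {
    pub fn new() -> Self {
        OperationRegistry { entries: HashMap::new() }
    }

    /// Register a spec together with its handler.
    /// Returns false and leaves the registry unchanged if the path is taken.
    pub fn register(&mut self, spec: OperationSpec, handler: Handler) -> bool {
        if self.entries.contains_key(&spec.name) {
            return false;
        }
        self.entries.insert(spec.name.clone(), (spec, handler));
        true
    }

    /// Look up the spec for a path, as a schema query might.
    pub fn spec(&self, name: &str) -> Option<&OperationSpec> {
        self.entries.get(name).map(|(spec, _)| spec)
    }

    /// Sorted list of registered paths, as a list query might return.
    pub fn list(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self.entries.keys().map(|k| k.as_str()).collect();
        names.sort();
        names
    }

    /// Dispatch a call; None means no such operation is registered.
    pub fn call(&self, name: &str, payload: &str) -> Option<Result<String, String>> {
        self.entries.get(name).map(|(_, handler)| handler(payload))
    }
}
```

Keeping the spec next to the handler in one map entry means discovery and dispatch read from the same source, so a listed operation is always callable.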
+ +### ACL Per Operation Path + +Access control maps to path prefixes using standard URL-like matching: + +| Pattern | Matches | Purpose | +|---------|---------|---------| +| `/dev1/*` | All operations on spoke `dev1` | Full access to a spoke | +| `/*/fs/*` | `fs` service on any spoke | Read file access across dev envs | +| `/*/bash/*` | `bash` service on any spoke | Shell access (higher risk) | +| `/hub/agent/*` | Hub LLM agent | LLM calls | +| `/hub/sessions/*` | Hub session management | Session history | +| `/browser-1/notify/alert` | Specific operation on specific spoke | One UI notification | + +Higher-risk operations (shell, filesystem write) can require tighter scopes +than read-only operations. The ACL evaluates against the caller's +`Identity.scopes` and `Identity.resources` from the auth layer (see auth.md). + +### Service Discovery + +The `/services/list` and `/services/schema` operations expose what a node +offers. Read-only — no admin operations: + +| Operation | Type | Description | +|-----------|------|-------------| +| `/services/list` | Query | List registered operation paths + metadata | +| `/services/schema` | Query | Get `OperationSpec` for a specific operation | + +These tell the caller: "here's what you can call." They are not a control +panel. Access control is enforced at the operation level. + +### PendingRequestMap + +Manages in-flight calls and subscriptions. Correlates `call.responded` events +back to the original `call.requested`: + +```rust +pub struct PendingRequestMap { + pending: HashMap, +} + +enum PendingEntry { + Call { + tx: oneshot::Sender>, + timeout: Instant, + }, + Subscribe { + tx: mpsc::Sender>, + timeout: Option, + }, +} +``` + +When a `call.responded` event arrives: +- If `PendingEntry::Call` → resolve the oneshot, delete entry +- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive + +When `call.completed` arrives on a subscription → close the mpsc channel, delete +entry. 
When `call.aborted` arrives → cancel/drop whichever side initiated it. A +`call.aborted` for an unknown `requestId` is silently discarded — no error +response is generated. + +Timeouts prevent dangling entries. A background task sweeps expired entries +periodically. + +### Protocol Adapter Layer + +The call protocol is transport-agnostic by design. It maps to any transport +that carries `EventEnvelope` frames: + +| Transport | Channel mechanism | Direction | +|-----------|-------------------|-----------| +| SSH | Reserved `direct_tcpip` destination (ADR-018) | Bidirectional over SSH channel | +| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream | +| iroh QUIC | Bidirectional `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream | +| WebSocket | Single WS connection | Bidirectional over WS frames | +| Worker | `postMessage` | Bidirectional over structured clone | + +The framing is always: 4-byte BE length prefix + JSON. The envelope shape is +the same regardless of transport. + +### Relationship to @alkdev/pubsub and @alkdev/operations + +The call protocol in core is a Rust reimplementation of the same protocol +defined in `@alkdev/operations`. The TypeScript implementation provides: + +- `PendingRequestMap` — request/response correlation +- `CallHandler` — bridges pubsub events to operation registry +- `OperationSpec`, `AccessControl`, `Identity` — type definitions + +The Rust implementation mirrors these types and behaviors. TypeScript consumers +continue using `@alkdev/operations` over `@alkdev/pubsub` adapters (including +the `event-target-wraith` adapter). Rust consumers use core's registry directly. +Both speak the same wire protocol and can interoperate. + +The key principle: **the same `EventEnvelope` can flow from a Rust handler +through core, out over SSH channel, into a JavaScript pubsub adapter, and +be dispatched through `@alkdev/operations`'s call handler** — with zero +translation at the wire level. 
+ +### Agent Service Pattern + +The hub commonly runs an agent service that coordinates between LLM providers +and tool calls. This service is just another set of registered operations — +no special treatment: + +- `/hub/agent/chat` — send a message, get a completion. Routes to the + appropriate LLM provider based on available spokes and configuration. +- `/hub/agent/complete` — streaming completion. Yields tokens as they arrive. +- `/hub/sessions/list` — list session histories (backed by Honker or other + durable storage). +- `/hub/sessions/history` — retrieve a specific session's message history. + +The agent service uses the same call protocol to invoke tools on spokes: +`/dev1/fs/readFile` for file access, `/dev1/bash/exec` for shell commands. It +stores session state via whatever mechanism the hub deployment provides — core +doesn't mandate Honker or any specific storage. + +## Constraints + +- The call protocol does not depend on Honker, SQLite, or any database. The + `PendingRequestMap` is in-memory. Durable session storage is a consumer concern. +- Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf) + can be carried as base64-encoded blobs in the `payload`, but the envelope + itself is always JSON. +- Service discovery (`/services/list`, `/services/schema`) is read-only. No + admin operations are exposed through the call protocol itself. +- Batch is not a protocol primitive. Multiple `call.requested` events with + correlated `requestId`s provide equivalent semantics. +- The spoke prefix in the operation path is a routing mechanism, not a security + boundary. ACL is enforced at the `AccessControl` level, not by path prefix + alone. A spoke that exposes `/dev1/bash/exec` can restrict access via + `required_scopes` — not every authenticated identity should have shell access. + +## Open Questions + +- **OQ-20**: How does the hub track which spokes expose which operations when + spokes connect and disconnect? 
Registration on connect and cleanup on + disconnect, or heartbeat-based discovery? See + [open-questions.md](open-questions.md). + +- **OQ-22**: Should the call protocol support streaming inputs (client streaming + in gRPC terms), or is client→server always a single request payload with + streaming only server→client? See [open-questions.md](open-questions.md). + +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus | +| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call | +| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core | + +## References + +- [auth.md](auth.md) — Identity and `IdentityProvider` trait +- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter +- [server.md](server.md) — Channel handling and control channel routing +- [transport.md](transport.md) — Transport abstraction +- [configuration.md](../research/configuration.md) — ForwardingPolicy, service metadata +- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope` +- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry +- `@alkdev/storage` — `peer_credentials` table, ACL graph, `Identity` +- [irpc](/workspace/irpc) — iroh streaming RPC (postcard-only, Rust-to-Rust) +- [iroh](/workspace/iroh) — P2P QUIC transport \ No newline at end of file diff --git a/docs/architecture/decisions/023-unified-auth-shared-key-material.md b/docs/architecture/decisions/023-unified-auth-shared-key-material.md new file mode 100644 index 0000000..6e3965d --- /dev/null +++ b/docs/architecture/decisions/023-unified-auth-shared-key-material.md @@ -0,0 +1,85 @@ +# ADR-023: Unified Authentication with Shared Key Material + +## Status +Accepted + +## Context + +Wraith 
currently authenticates connections exclusively through SSH public key +auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS, +iroh) because SSH carries its own auth protocol. But WebTransport and other +HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3, +not SSH. + +Without unification, non-SSH transports would need a completely separate +identity system (API keys, JWTs, session tokens). This creates two problems: +(1) operators manage two key sets with two rotation mechanisms, and (2) the +same person connecting via SSH and WebTransport appears as two different +identities. + +The `IdentityProvider` trait is needed to decouple wraith-core from any +specific identity storage (config file vs. database). Without it, wraith-core +would either hardcode config-file-based auth or take a database dependency — +neither is acceptable for a library crate. + +## Decision + +**Unified authentication**: The same Ed25519 key material (`authorized_keys` +and `cert_authorities`) is shared across both SSH auth and token auth. The +presentation differs per transport, but the verification result (an +`Identity` with scopes) is the same. + +**Token auth for non-SSH transports**: WebTransport clients present a signed +timestamp token in the CONNECT request URL: + +``` +AuthToken = base64url(key_id || timestamp || signature) + key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes) + timestamp = Unix seconds, big-endian u64 (8 bytes) + signature = Ed25519 sign(key_id || timestamp_bytes, private_key) +``` + +Server extracts the fingerprint, looks it up in the same `authorized_keys` +set, verifies the signature, and checks the timestamp window (default ±300s). + +**`IdentityProvider` trait**: Decouples wraith-core from identity storage. The +trait resolves a fingerprint or token to an `Identity`. Default implementation +loads from `DynamicConfig.auth` (no database). Hub implementation can back it +with `@alkdev/storage`. 
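The token layout in this Decision section can be illustrated with a minimal sketch that splits an already base64url-decoded token into its fields and applies the timestamp window. The names `RawToken`, `TokenError`, `split_token`, `signed_message`, and `check_window` are hypothetical and not taken from wraith-core. The 64-byte signature length is the standard Ed25519 signature size. Treating a skew of exactly 300 seconds as accepted is an assumption. The fingerprint lookup and the Ed25519 verification itself are left out because they depend on the crypto crate in use.

```rust
// Field sizes from the token layout; 64 bytes is the standard Ed25519 signature size.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
const SIGNATURE_LEN: usize = 64;
const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Hypothetical view over a base64url-decoded token (not a wraith-core type).
#[derive(Debug, PartialEq)]
pub struct RawToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    BadLength(usize),
    OutsideWindow { skew_secs: u64 },
}

/// Splits `key_id || timestamp || signature` into its three fields.
pub fn split_token(bytes: &[u8]) -> Result<RawToken, TokenError> {
    if bytes.len() != TOKEN_LEN {
        return Err(TokenError::BadLength(bytes.len()));
    }
    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(&bytes[..KEY_ID_LEN]);
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(&bytes[KEY_ID_LEN..KEY_ID_LEN + TIMESTAMP_LEN]);
    let mut signature = [0u8; SIGNATURE_LEN];
    signature.copy_from_slice(&bytes[KEY_ID_LEN + TIMESTAMP_LEN..]);
    Ok(RawToken {
        key_id,
        timestamp: u64::from_be_bytes(ts), // big-endian Unix seconds
        signature,
    })
}

/// Rebuilds the bytes the signature covers: `key_id || timestamp_bytes`.
pub fn signed_message(token: &RawToken) -> [u8; KEY_ID_LEN + TIMESTAMP_LEN] {
    let mut msg = [0u8; KEY_ID_LEN + TIMESTAMP_LEN];
    msg[..KEY_ID_LEN].copy_from_slice(&token.key_id);
    msg[KEY_ID_LEN..].copy_from_slice(&token.timestamp.to_be_bytes());
    msg
}

/// Accepts the token when |now - token_ts| <= window_secs (inclusive bound is an assumption).
pub fn check_window(token_ts: u64, now: u64, window_secs: u64) -> Result<(), TokenError> {
    let skew_secs = now.abs_diff(token_ts);
    if skew_secs > window_secs {
        Err(TokenError::OutsideWindow { skew_secs })
    } else {
        Ok(())
    }
}
```

A verifier would call `split_token`, look up `key_id` in the authorized key set, verify `signature` over `signed_message`, and only then apply `check_window` with the configured window.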
+ +**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set +as SSH auth by default. Deployments that want separate access control can use +`TokenKeySource::Separate` with a distinct key set. + +**Replay protection via timestamps**: V1 uses timestamp-only (no server state). +Zero-replay can be added later via a nonce challenge-response without changing +the key material. + +## Consequences + +- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a + key to `authorized_keys` immediately grants access via both SSH and + WebTransport. +- **Positive**: `IdentityProvider` trait makes wraith-core independent of any + specific database. Default: config file. Hub: `@alkdev/storage`. +- **Positive**: Browser clients can authenticate using Ed25519 keys via + SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it + natively. +- **Positive**: No JWT library dependency. The token is a simple Ed25519 + signature over a fixed structure — same primitives SSH already uses. +- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a + QUIC packet can replay the token within the window. Acceptable because QUIC + interception is the same threat level as connection hijacking. +- **Negative**: Certificate authority tokens are not supported in v1. CA + verification requires the full OpenSSH certificate structure, which doesn't + fit in a signed timestamp. +- **Negative**: Browser-side key management is less ergonomic than SSH key + files. The private key must be imported into SubtleCrypto. This is a UI/UX + concern, not a protocol concern. 
+ +## References + +- [auth.md](../auth.md) — Full auth architecture spec +- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth +- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR) +- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved) \ No newline at end of file diff --git a/docs/architecture/decisions/024-bidirectional-call-protocol.md b/docs/architecture/decisions/024-bidirectional-call-protocol.md new file mode 100644 index 0000000..3496b22 --- /dev/null +++ b/docs/architecture/decisions/024-bidirectional-call-protocol.md @@ -0,0 +1,63 @@ +# ADR-024: Bidirectional Call Protocol + +## Status +Accepted + +## Context + +The wraith control channel (ADR-018) routes from client → server's event bus. +This is unidirectional: clients can send events to the server, but the server +cannot call operations on the client. In the hub/spoke model, spokes (dev env +containers) connect to a hub and expose operations (fs, bash, search) that the +hub invokes. The hub needs to call *spoke* operations. + +Additionally, the current control channel provides no request/response semantics. +Every consumer that needs call/response reinvents the pending-request correlation. + +## Decision + +The call protocol is bidirectional. Both sides can send `call.requested` and +receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte +BE length prefix + JSON) — the same as `@alkdev/pubsub`. + +Five event types: `call.requested`, `call.responded`, `call.completed`, +`call.aborted`, `call.error`. + +A call is a subscribe that resolves after one event. Both use `call.requested` +with correlated `requestId`. `PendingRequestMap` in core provides correlation. + +Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first +path segment routes the call to the correct connected node. The hub's registry +maps spoke prefixes to connections. 
This mirrors iroh's ALPN dispatch: the +first segment is the routing key, remaining path dispatches within the node. + +Core-provided operations use short paths without a spoke prefix +(`/services/list`, `/services/schema`). Spoke operations are prefixed +(`/dev1/fs/readFile`). + +This generalizes ADR-018's control channel: the `wraith-*` destination becomes +a transport for `EventEnvelope` frames with call protocol semantics, instead of +raw pubsub dispatch. + +## Consequences + +- **Positive**: Hub can invoke operations on spokes. Dev env containers + expose fs, bash, search — the hub calls them as needed. +- **Positive**: Browser clients can expose custom UDFs. Any connected participant + can both call and serve operations. +- **Positive**: Built-in request/response correlation. One `PendingRequestMap` + in core serves all consumers. +- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and + iroh's ALPN dispatch. First segment = routing key. +- **Positive**: Multiple spokes exposing the same service (two dev envs both + exposing `/fs/*`) are naturally differentiated by the spoke prefix. +- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be + cleaned up on timeout or connection close. +- **Negative**: The hub must maintain a routing table mapping spoke identities + to connections, with registration on connect and cleanup on disconnect. 
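The wire framing named in the ADR-024 Decision section (a 4-byte big-endian length prefix followed by JSON) can be sketched as follows. This is an illustration under assumptions, not the wraith-core implementation: `encode_frame` and `decode_frame` are hypothetical names, and the JSON payload is treated as opaque bytes so the sketch needs no JSON crate.

```rust
const PREFIX_LEN: usize = 4;

/// Prepends a 4-byte big-endian length to an already serialized JSON payload.
/// Returns `None` if the payload does not fit in a u32 length.
pub fn encode_frame(json: &[u8]) -> Option<Vec<u8>> {
    if json.len() > u32::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(PREFIX_LEN + json.len());
    out.extend_from_slice(&(json.len() as u32).to_be_bytes());
    out.extend_from_slice(json);
    Some(out)
}

/// Tries to read one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed, or `None` when the
/// buffer does not yet hold a complete frame.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < PREFIX_LEN {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = PREFIX_LEN.checked_add(len)?;
    let payload = buf.get(PREFIX_LEN..end)?;
    Some((payload, end))
}
```

Returning the consumed byte count lets a reader loop drain several frames from one receive buffer and keep any trailing partial frame for the next read.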
+ +## References + +- [call-protocol.md](../call-protocol.md) — Full call protocol spec +- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized) +- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter \ No newline at end of file diff --git a/docs/architecture/decisions/025-handler-spec-separation.md b/docs/architecture/decisions/025-handler-spec-separation.md new file mode 100644 index 0000000..c091883 --- /dev/null +++ b/docs/architecture/decisions/025-handler-spec-separation.md @@ -0,0 +1,73 @@ +# ADR-025: Handler/Spec Separation for Downstream Service Registration + +## Status +Accepted + +## Context + +The current control channel (ADR-018) is hardcoded: `wraith-control:0` bridges +to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or +`bash.exec` as callable operations, it has no way to register these with core's +channel routing. The NAPI handler would need to intercept channel data outside +of core. + +For the hub/spoke model, spokes register their operations with the hub when +they connect. The hub's registry must include both hub-local operations and +remote operations exposed by spokes. + +## Decision + +Operation specs and handlers are separated from core. Core provides: + +1. `OperationSpec` — describes what an operation does (name, type, input/output + schemas, access control) +2. `OperationHandler` — implements the operation logic +3. `OperationRegistry` — maps paths to specs + handlers +4. Built-in operations: `/services/list`, `/services/schema` + +Downstream consumers register their own operations: + +```rust +// NAPI layer registers dev env tools +registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler); +registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler); + +// Browser client registers a custom UDF +registry.register(OperationSpec { name: "/notify/alert", ... 
}, notify_handler); +``` + +Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first +segment routes to the node. The `namespace` field on `OperationSpec` is +derived from the second path segment (`service`). + +When spoke operations are registered with the hub, the hub adds the spoke +prefix: an operation that spoke "dev1" registers as `/fs/readFile` becomes +addressable as `/dev1/fs/readFile` in the hub's routing table. + +The `/services/list` operation returns all registered specs. The +`/services/schema` operation returns the spec for a specific operation. These +are read-only — no admin operations. + +## Consequences + +- **Positive**: NAPI, Python, and any downstream consumer can register + operations without modifying core. +- **Positive**: Service discovery is built in. Clients query `/services/list` + to learn what operations a hub offers. +- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing + the same service (dev1 vs dev2). +- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation + authorization. Higher-risk operations (shell, filesystem write) can require + tighter scopes. +- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec + maps directly to MCP tool definitions. +- **Negative**: The registry adds complexity. Core now owns `OperationSpec`, + `OperationRegistry`, and `PendingRequestMap`. +- **Negative**: Namespace collisions between downstream consumers are possible. + The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`. 
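The spoke-prefix rule described in ADR-025 can be illustrated with a small path sketch. `Route`, `parse_route`, and `with_spoke_prefix` are hypothetical names chosen for this example, not wraith-core APIs. The sketch assumes that, at the hub, a two-segment path is hub-local (such as `/services/list`) and a three-segment path is `/{spoke}/{service}/{op}`.

```rust
/// Hypothetical routing result for an operation path as seen by the hub.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Two-segment path handled by the hub itself, e.g. `/services/list`.
    Local { service: &'a str, op: &'a str },
    /// Three-segment path forwarded to the connection registered for `spoke`.
    Spoke { spoke: &'a str, service: &'a str, op: &'a str },
}

/// Splits an operation path into its routing key and the remaining segments.
/// Returns `None` for paths without a leading slash, with empty segments,
/// or with a segment count other than two or three.
pub fn parse_route(path: &str) -> Option<Route<'_>> {
    let rest = path.strip_prefix('/')?;
    let segments: Vec<&str> = rest.split('/').collect();
    if segments.iter().any(|s| s.is_empty()) {
        return None;
    }
    match segments.as_slice() {
        [service, op] => Some(Route::Local { service: *service, op: *op }),
        [spoke, service, op] => Some(Route::Spoke {
            spoke: *spoke,
            service: *service,
            op: *op,
        }),
        _ => None,
    }
}

/// Builds the hub-side path for an operation a spoke registered locally.
pub fn with_spoke_prefix(spoke: &str, local_path: &str) -> String {
    format!("/{}{}", spoke, local_path)
}
```

With this shape, the hub only needs the `spoke` field to pick a connection; the `service` and `op` segments are passed through for dispatch inside the spoke.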
+ +## References + +- [call-protocol.md](../call-protocol.md) — Full call protocol spec +- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized) +- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index 790d386..d4a5983 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -1,6 +1,6 @@ --- -status: reviewed -last_updated: 2026-06-02 +status: draft +last_updated: 2026-06-04 --- # Open Questions @@ -99,39 +99,78 @@ last_updated: 2026-06-02 - **Status**: open - **Priority**: medium - **Resolution**: (pending) -- **Cross-references**: ADR-020 (proposed) +- **Cross-references**: configuration.md ### OQ-13: Config file auto-reload via file watching - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: resolved - **Priority**: low - **Resolution**: No file watching. CLI loads once at startup; NAPI/hub reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool. -- **Cross-references**: ADR-020 (proposed) +- **Cross-references**: configuration.md ### OQ-14: ArcSwap vs RwLock for dynamic config - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: resolved - **Priority**: low - **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained. 
-- **Cross-references**: ADR-020 (proposed) +- **Cross-references**: configuration.md ### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: open - **Priority**: medium - **Resolution**: (pending — needs R&D in WebTransport transport session) -- **Cross-references**: ADR-022 (proposed) +- **Cross-references**: [auth.md](auth.md), OQ-19 ### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to wraith-* channels) - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: open - **Priority**: low - **Resolution**: (pending — defer to forwarding policy design) -- **Cross-references**: ADR-021 (proposed) +- **Cross-references**: configuration.md ### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports) - **Origin**: [research/configuration.md](../research/configuration.md) +- **Status**: ~~resolved~~ +- **Priority**: ~~medium~~ — +- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples wraith-core from identity storage. +- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [auth.md](auth.md), OQ-15 + +## Auth + +### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both? 
+- **Origin**: [auth.md](auth.md) - **Status**: open - **Priority**: medium -- **Resolution**: (pending — defer until non-SSH transport is implemented) -- **Cross-references**: ADR-020 (proposed), OQ-15 \ No newline at end of file +- **Resolution**: (pending) +- **Cross-references**: ADR-023, [call-protocol.md](call-protocol.md) + +### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS? +- **Origin**: [auth.md](auth.md) +- **Status**: open +- **Priority**: low +- **Resolution**: (pending) +- **Cross-references**: OQ-15 + +## Call Protocol + +### OQ-20: Spoke registration and discovery on connect/disconnect +- **Origin**: [call-protocol.md](call-protocol.md) +- **Status**: open +- **Priority**: medium +- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach) +- **Cross-references**: ADR-024, ADR-025 + +### OQ-21: Routing calls to specific spokes with same-service operations +- **Origin**: [call-protocol.md](call-protocol.md) +- **Status**: ~~resolved~~ +- **Priority**: ~~medium~~ — +- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{spoke}/{service}/{op}` format. The first path segment identifies the spoke and routes the call to the correct connected node. Multiple spokes exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the spoke prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The hub maintains a routing table mapping spoke identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key. +- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025 + +### OQ-22: Client streaming (streaming inputs) in the call protocol? 
+- **Origin**: [call-protocol.md](call-protocol.md) +- **Status**: open +- **Priority**: low +- **Resolution**: (pending) +- **Cross-references**: ADR-024 \ No newline at end of file diff --git a/docs/research/configuration.md b/docs/research/configuration.md index 4a09c97..1bbe8e9 100644 --- a/docs/research/configuration.md +++ b/docs/research/configuration.md @@ -490,13 +490,20 @@ compat via accepting both `transport: string` (single) and `SO_REUSEPORT` is used. Needs R&D; defer to WebTransport transport design session. - **Update**: WebTransport is out of scope for the current configuration + ~~**Update**: WebTransport is out of scope for the current configuration work. It requires a fundamentally different authentication model (HTTP-level API keys/session tokens vs SSH key-based auth). The `ServerHandler` only knows SSH `auth_publickey`. WebTransport auth would need its own handler path. This connects to the broader question of whether `DynamicConfig.auth` should be transport-aware (see OQ-CFG-06). WebTransport transport design - is a separate R&D session. + is a separate R&D session.~~ + + **Update 2**: Auth concern is resolved by ADR-023. The same authorized_keys + set verifies both SSH pubkey auth and token auth (Ed25519-signed timestamp + for WebTransport). One key material, two presentations. The remaining + question is purely about QUIC listener coexistence — which is a transport + implementation detail, not an auth question. See [auth.md](../architecture/auth.md) + and [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md). - **OQ-CFG-05**: Does `TransportKind::WebTransport` need any handler behavior different from other transports? @@ -518,7 +525,7 @@ compat via accepting both `transport: string` (single) and headers/query params). The auth question is: does the same `DynamicConfig` serve both models, or does each transport carry its own auth config? 
- Option A: `AuthPolicy` contains both SSH auth and API key auth: + ~~Option A: `AuthPolicy` contains both SSH auth and API key auth: ```rust pub struct AuthPolicy { ssh: SshAuthConfig, // for SSH-over-any-transport @@ -536,7 +543,17 @@ compat via accepting both `transport: string` (single) and For now, the config architecture should accommodate Option A as a future extension. Phase 1 implements `DynamicConfig` with SSH auth only. API key - auth is added when a non-SSH transport is implemented. + auth is added when a non-SSH transport is implemented.~~ + + **Resolved by ADR-023**: The auth layer is transport-aware in its + *presentation*, not its *material*. `AuthPolicy` holds `SshAuthConfig` and + `TokenAuthConfig`, where `TokenAuthConfig.key_source` defaults to + `Shared` (same `authorized_keys` set as SSH auth). The same Ed25519 keys + serve both paths: SSH presents the public key in the handshake; WebTransport + presents an Ed25519-signed timestamp token. Verification produces the same + `Identity` type via the `IdentityProvider` trait. One `reloadAuth()` call + updates both. See [auth.md](../architecture/auth.md) and + [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md). 
## Decisions Required @@ -565,4 +582,7 @@ These decisions will be extracted into ADRs when the architecture is finalized: - `auth/keys.rs` — `KeySource` and key loading - `@alkdev/storage/docs/architecture/sqlite-host.md` — `peer_credentials` table schema - [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library (in `/workspace/wtransport`) -- [arc-swap crate](https://docs.rs/arc-swap) — Lock-free read, atomic write for shared state \ No newline at end of file +- [arc-swap crate](https://docs.rs/arc-swap) — Lock-free read, atomic write for shared state +- [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md) — Unified auth with shared key material +- [auth.md](../architecture/auth.md) — Unified auth architecture spec +- [call-protocol.md](../architecture/call-protocol.md) — Bidirectional call protocol spec \ No newline at end of file