docs: add auth, call protocol architecture specs and ADRs 023-025

Unified authentication (ADR-023): SSH and WebTransport auth share the same
Ed25519 key material. Token auth uses signed timestamps verified against the
same authorized_keys set. IdentityProvider trait decouples core from identity
storage.

Bidirectional call protocol (ADR-024): Generalizes control channel (ADR-018)
to support hub→spoke and spoke→hub calls. Operation paths use /{spoke}/{service}/{op}
format for three-level routing. EventEnvelope wire format, five call events,
PendingRequestMap for correlation.

Handler/spec separation (ADR-025): Downstream consumers register operations
without modifying core. OperationRegistry maps paths to specs + handlers.
Service discovery via /services/list and /services/schema.

Resolves OQ-17 (transport-aware auth), OQ-21 (spoke routing), OQ-CFG-04 and
OQ-CFG-06 (WebTransport auth and transport-aware auth layer). Adds OQ-18
through OQ-22 for remaining open questions.
2026-06-05 08:19:41 +00:00
parent 41062d810e
commit af7f4d0006
8 changed files with 971 additions and 19 deletions


@@ -1,13 +1,16 @@
 ---
-status: reviewed
-last_updated: 2026-06-02
+status: draft
+last_updated: 2026-06-04
 ---
 # Wraith Architecture
 ## Current State
-Architecture specification reviewed and ready for implementation. 19 ADRs accepted. Configuration architecture under exploration — see [research/configuration.md](../research/configuration.md).
+Architecture specification in active development. 22 ADRs accepted. Unified
+auth and call protocol architecture being specified — see [auth.md](auth.md)
+and [call-protocol.md](call-protocol.md). Configuration architecture under
+exploration — see [research/configuration.md](../research/configuration.md).
 ## Architecture Documents
@@ -15,6 +18,8 @@ Architecture specification reviewed and ready for implementation. 19 ADRs accept
 |----------|--------|-------------|
 | [overview.md](overview.md) | reviewed | Package purpose, exports, dependencies |
 | [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh |
+| [auth.md](auth.md) | draft | Unified auth: SSH + token, IdentityProvider trait |
+| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, operation registry |
 | [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding |
 | [server.md](server.md) | reviewed | Server acceptance, channel handling, proxy |
 | [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
@@ -49,11 +54,15 @@ Architecture specification reviewed and ready for implementation. 19 ADRs accept
 | [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted |
 | [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted |
 | [019](decisions/019-proxy-dual-semantics.md) | `--proxy` dual semantics (client vs server) | Accepted |
+| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted |
+| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted |
+| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted |
 ## Open Questions
-Most open questions have been resolved. New questions from configuration
-research — see [open-questions.md](open-questions.md) for details.
+Most open questions have been resolved. Open questions remain for
+configuration, auth, and call protocol — see
+[open-questions.md](open-questions.md) for details.
 ## Lifecycle Definitions

261
docs/architecture/auth.md Normal file

@@ -0,0 +1,261 @@
---
status: draft
last_updated: 2026-06-04
---
# Authentication & Identity
## What
A unified authentication and identity layer that works across all transports:
SSH-over-any-transport and WebTransport (a non-SSH, HTTP-level transport). Both
auth paths verify against the same Ed25519 authorized keys; certificate
authorities are honored only on the SSH path in v1 (see Constraints). Identity
resolution produces a transport-agnostic `Identity` that carries scopes and
resources for downstream authorization.
## Why
Wraith currently authenticates connections exclusively through SSH public key
auth. Non-SSH transports (WebTransport) cannot perform SSH key exchange — they
need a different auth presentation that shares the same key material. The
unified auth layer ensures one key set, one identity, one rotation mechanism
across all transports. See ADR-023 for the decision context.
## Architecture
### Auth Presentation Per Transport
| Transport | Auth presentation | Verification |
|-----------|-------------------|-------------|
| SSH (TCP, TLS, iroh) | SSH public key auth in the SSH handshake | `ServerAuthConfig::authenticate_publickey()` — key lookup in authorized set |
| WebTransport (HTTP/3) | Signed timestamp token in CONNECT request | Token auth — same authorized set verifies the Ed25519 signature |
| Future (WebSocket, etc.) | Signed timestamp token in headers/query | Same token verification |
The **key material is shared**. The **presentation differs per transport**. The
**verification result is the same**: an authenticated identity with scopes.
### Token Authentication
For non-SSH transports, the client constructs an authentication token:
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
Wire format when passed in a WebTransport CONNECT request:
```
CONNECT https://server:443/wraith?token=<AuthToken>
```
Server verification:
1. Base64url-decode the token
2. Extract `key_id` (first 32 bytes)
3. Look up `key_id` in the same `authorized_keys` set that SSH auth uses
4. Verify the Ed25519 `signature` against `(key_id || timestamp_bytes)` using
the matching public key
5. Check `timestamp` is within the acceptable window (configurable, default
±300 seconds)
6. Resolve to the same `Identity` that SSH pubkey auth would produce
The key fingerprint in the token serves double duty: it identifies which key
to verify against, and it ties the signature to a specific key (swapping
`key_id` invalidates the signature).
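The field extraction and timestamp check from the steps above can be sketched as follows. This is a minimal illustration, not the wraith implementation: it assumes the token has already been base64url-decoded, relies on the fixed 64-byte size of an Ed25519 signature, and leaves key lookup and signature verification (steps 3 and 4) to the caller. The names `ParsedToken`, `TokenError`, `parse_token`, and `check_window` are hypothetical.

```rust
/// Decoded token layout: key_id (32 bytes) || timestamp (8 bytes) || signature (64 bytes).
#[derive(Debug, PartialEq)]
pub struct ParsedToken {
    pub key_id: [u8; 32],
    pub timestamp: u64,
    pub signature: [u8; 64],
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    BadLength(usize),
    OutsideWindow,
}

/// Step 2: split already-decoded bytes into the three fixed-width fields.
pub fn parse_token(raw: &[u8]) -> Result<ParsedToken, TokenError> {
    if raw.len() != 32 + 8 + 64 {
        return Err(TokenError::BadLength(raw.len()));
    }
    let mut key_id = [0u8; 32];
    key_id.copy_from_slice(&raw[..32]);
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&raw[32..40]);
    let mut signature = [0u8; 64];
    signature.copy_from_slice(&raw[40..]);
    Ok(ParsedToken {
        key_id,
        timestamp: u64::from_be_bytes(ts), // big-endian, per the token layout
        signature,
    })
}

/// Step 5: accept timestamps within +/- `window_secs` of the server clock.
pub fn check_window(token_ts: u64, now: u64, window_secs: u64) -> Result<(), TokenError> {
    if token_ts.abs_diff(now) <= window_secs {
        Ok(())
    } else {
        Err(TokenError::OutsideWindow)
    }
}
```

Rejecting on length before touching any field keeps the later slicing panic-free, and `abs_diff` covers both clock-skew directions of the ± window.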
### Replay Protection
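V1 relies on the timestamp alone: a token is accepted while its timestamp is
within the ±300s window, and the server keeps no per-token state. A captured
token can therefore be replayed until the window closes. These trade-offs and
the future zero-replay option (nonce challenge-response) are documented in
ADR-023.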
### IdentityProvider Trait
The `IdentityProvider` trait decouples wraith-core from any specific identity
storage. It resolves a key fingerprint or auth token to an `Identity` with
scopes and resources.
```rust
pub trait IdentityProvider: Send + Sync + 'static {
/// Resolve an SSH public key fingerprint to an identity.
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
/// Resolve an auth token to an identity.
/// Returns None if the token is invalid, expired, or the key is not authorized.
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
pub struct Identity {
pub id: String, // Unique identifier — fingerprint (config) or account UUID (database)
pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"]
pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
**Default implementation**: `ConfigIdentityProvider` loads from
`DynamicConfig.auth` (the `authorized_keys` set). Every authorized key gets a
default scope set. No database required.
**Hub implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and
`accounts` tables plus the ACL graph. Resolves fingerprint → account →
organization membership → effective scopes. Uses `ArcSwap` for hot reload.
The trait is the contract. The backing store is pluggable. Wraith-core never
depends on Honker, SQLite, or any specific database.
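To make the contract concrete, here is a hedged sketch of an in-memory provider in the spirit of the default implementation. `StaticIdentityProvider` is a hypothetical name, and the `AuthToken` shown is a stand-in that is assumed to have already passed signature and timestamp verification; the real `ConfigIdentityProvider` and `AuthToken` types are not shown in this document.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Debug, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical stand-in: a token that was already verified elsewhere.
pub struct AuthToken {
    pub key_fingerprint: String,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

/// Every authorized fingerprint receives the same default scope set.
pub struct StaticIdentityProvider {
    pub authorized: HashSet<String>,
    pub default_scopes: Vec<String>,
}

impl IdentityProvider for StaticIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        if !self.authorized.contains(fingerprint) {
            return None;
        }
        Some(Identity {
            id: fingerprint.to_string(),
            scopes: self.default_scopes.clone(),
            resources: HashMap::new(),
        })
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // Both paths funnel into the same lookup, so they yield the same Identity.
        self.resolve_from_fingerprint(&token.key_fingerprint)
    }
}
```

Routing the token path through the fingerprint lookup is one way to guarantee that SSH and token auth cannot drift apart for the same key.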
### AuthPolicy Structure
`AuthPolicy` in `DynamicConfig` holds both auth paths, sharing key material:
```rust
pub struct AuthPolicy {
pub ssh: SshAuthConfig,
pub token: TokenAuthConfig,
}
pub struct SshAuthConfig {
pub authorized_keys: HashSet<PublicKey>,
pub cert_authorities: Vec<CertAuthorityEntry>,
// Existing fields from current ServerAuthConfig
}
pub struct TokenAuthConfig {
pub enabled: bool,
pub max_token_age: Duration, // Timestamp window (default: 300s)
pub key_source: TokenKeySource,
}
pub enum TokenKeySource {
/// Share the same authorized_keys set with SshAuthConfig.
/// Default and recommended for v1.
Shared,
/// Separate key set for non-SSH transports.
/// For deployments that want distinct access control per transport.
Separate(HashSet<PublicKey>),
}
```
When `TokenKeySource::Shared` (the default), adding a key to
`authorized_keys` immediately grants access via both SSH and WebTransport.
One key set, one `reloadAuth()` call, one rotation.
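The effect of `TokenKeySource` on key lookup can be illustrated with a small sketch. It assumes a 32-byte array as a stand-in for the real `PublicKey` type, and `token_keys` is a hypothetical helper, not an existing wraith function.

```rust
use std::collections::HashSet;

/// Stand-in for the real public key type (assumption for this sketch).
pub type PublicKey = [u8; 32];

pub enum TokenKeySource {
    Shared,
    Separate(HashSet<PublicKey>),
}

/// Select the key set that token verification should consult.
pub fn token_keys<'a>(
    ssh_keys: &'a HashSet<PublicKey>,
    source: &'a TokenKeySource,
) -> &'a HashSet<PublicKey> {
    match source {
        // Shared: token auth sees exactly what SSH auth sees.
        TokenKeySource::Shared => ssh_keys,
        // Separate: token auth is limited to its own set.
        TokenKeySource::Separate(keys) => keys,
    }
}
```

Because `Shared` returns a reference to the SSH set rather than a copy, a key added there is visible to token auth without any extra synchronization step.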
### Auth Flow in the Server
**SSH transport (existing, unchanged):**
```
Client connects → SSH handshake → auth_publickey() callback
→ ServerAuthConfig::authenticate_publickey() or authenticate_certificate()
→ Auth::Accept or Auth::Reject
```
**WebTransport transport (new):**
```
Browser connects → WebTransport CONNECT request
→ SessionRequest inspection: extract token from URL path or header
→ TokenAuthConfig verification: decode token → lookup key_id → verify signature → check timestamp
→ session_request.accept() or session_request.forbidden()
```
After auth, both paths produce an `Identity`. The `Identity` is attached to the
connection and used by `ForwardingPolicy` and the call protocol to make
authorization decisions.
### WebTransport SessionRequest Inspection
The wtransport library's `SessionRequest` provides:
- `path()` — URL path (e.g., `/wraith?token=...`)
- `headers()` — HTTP headers (for `Authorization: Bearer ...`)
- `origin()` — Browser origin (for CORS-like restrictions)
- `remote_address()` — Client UDP address
Token extraction from URL path is preferred for browser WebTransport because
the W3C API (`new WebTransport(url)`) naturally includes query parameters. For
native clients (Deno, CLI), the `Authorization` header is also supported.
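Token extraction itself is plain string handling. The sketch below works on the path and header strings only and does not call any wtransport API; `token_from_path` and `token_from_authorization` are hypothetical helper names. No percent-decoding is attempted, on the assumption that the token is unpadded base64url, whose alphabet is URL-safe.

```rust
/// Pull the `token` query parameter out of a request path such as
/// `/wraith?token=...`. Returns None if it is absent or empty.
pub fn token_from_path(path: &str) -> Option<&str> {
    let (_, query) = path.split_once('?')?;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "token")
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}

/// Pull the token out of an `Authorization: Bearer <token>` header value.
pub fn token_from_authorization(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|token| !token.is_empty())
}
```

A server that accepts both presentations would still need to decide which one wins when a request carries both; that policy is not specified here.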
### Browser-Side Token Construction
```javascript
// Illustrative — see client SDK for production implementation
async function createAuthToken(keyPair) {
const publicKey = await crypto.subtle.exportKey('raw', keyPair.publicKey);
const keyId = new Uint8Array(await crypto.subtle.digest('SHA-256', publicKey));
const timestamp = new ArrayBuffer(8);
new DataView(timestamp).setBigUint64(0, BigInt(Math.floor(Date.now() / 1000)));
const message = new Uint8Array([...keyId, ...new Uint8Array(timestamp)]);
const signature = await crypto.subtle.sign('Ed25519', keyPair.privateKey, message);
const token = new Uint8Array([...keyId, ...new Uint8Array(timestamp), ...new Uint8Array(signature)]);
return btoa(String.fromCharCode(...token))
.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```
Browsers support Ed25519 key generation and signing via `SubtleCrypto` (Chrome
137+, Firefox 130+, Safari 17+). Deno supports it natively. No external
dependencies needed.
## Constraints
- Auth tokens are Ed25519-signed with the same key pair used for SSH auth. No
separate key management for non-SSH transports.
- `IdentityProvider` is the only interface between wraith-core and identity
storage. No database dependency at the core level.
- The SSH auth path is unchanged. `auth_publickey()` continues to work exactly
as it does today. Token auth is additive.
- Certificate authority tokens are not supported for token auth in v1. CA
verification requires the full OpenSSH certificate structure, which doesn't
fit in a simple signed timestamp. This can be added later if needed.
- Token auth is only available on transports that carry HTTP metadata (URL
path, headers). SSH-over-TCP/TLS/iroh continues to use SSH native auth
exclusively.
### Security Considerations
**Token in URL**: The auth token is passed as a URL query parameter
(`?token=...`) for browser WebTransport compatibility. This is a known web
security consideration:
- **Server logs**: The token may appear in HTTP access logs. Servers MUST
strip or redact the `token` query parameter before logging the request URL.
- **Browser history**: The token may appear in browser history. Timestamps
limit exposure to the token window (±300s).
- **Referrer headers**: WebTransport does not send referrer headers, so the
token does not leak via HTTP Referer.
- **Native clients**: Deno and native clients SHOULD prefer the `Authorization:
Bearer` header over URL parameters when the client supports custom headers.
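The logging requirement above amounts to rewriting the query string before it reaches the log sink. A minimal sketch, assuming the URL is available as a string and that `REDACTED` is an acceptable placeholder (`redact_token` is a hypothetical helper):

```rust
/// Replace the value of the `token` query parameter so the URL can be logged.
/// Other parameters and URLs without a query string pass through unchanged.
pub fn redact_token(url: &str) -> String {
    let (path, query) = match url.split_once('?') {
        Some(parts) => parts,
        None => return url.to_string(),
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("token", _)) => "token=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{}?{}", path, redacted.join("&"))
}
```

Redacting rather than dropping the parameter keeps the log line useful for debugging: it still shows that a token was presented.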
## Open Questions
- **OQ-18**: Should `Identity.scopes` be populated from `ForwardingPolicy`
rules, from an external `IdentityProvider`, or from both? See
[open-questions.md](open-questions.md).
- **OQ-19**: Should the WebTransport listener require its own TLS identity
(separate from the SSH-over-TLS listener), or can they share the same
certificate? See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth |
## References
- [server.md](server.md) — Current SSH auth handler
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](../research/configuration.md) — DynamicConfig, AuthPolicy structure
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18, OQ-19
- `server/handler.rs` — Current `auth_publickey()` callback
- `auth/server_auth.rs` — Current `ServerAuthConfig` struct
- `auth/keys.rs` — `KeySource` and key loading
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API
- [@alkdev/storage](/workspace/@alkdev/storage) — `peer_credentials` table, ACL graph


@@ -0,0 +1,402 @@
---
status: draft
last_updated: 2026-06-04
---
# Call Protocol
## What
A bidirectional, transport-agnostic call and event protocol that runs over
authenticated pipes. It supports request/response calls, streaming
subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, hub/spoke) register their own operations without modifying core.
## Why
The current control channel (ADR-018) is unidirectional (client → server) and
provides fire-and-forget event dispatch without request/response semantics.
The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the hub/spoke model where
spokes expose operations the hub invokes.
## Architecture
### Operation Paths
Operation names use slash-based paths aligned with URL routing conventions:
```
/{spoke}/{service}/{op}
```
- **spoke** — identity prefix of the node that exposes the operation. The hub
uses this segment to route calls to the correct connected node.
- **service** — the logical service namespace. Groups related operations
under one handler prefix.
- **op** — the specific operation within that service.
Examples:
| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Spoke `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Spoke `dev1`, service `bash`, operation `exec` |
| `/hub/agent/chat` | Hub's own `agent` service, operation `chat` |
| `/hub/sessions/list` | Hub's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Browser spoke `browser-1`, `notify` service |
This three-level routing mirrors iroh's ALPN dispatch: the first segment
routes to a connected node (like ALPN routes to a protocol handler), the
remaining path dispatches within that node's registry. See ADR-025 for the
handler/spec separation decision.
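The `namespace` field on `OperationSpec` is derived from the path: it is the
service segment, which is the second segment of a full hub path
(`/dev1/fs/readFile` → `fs`) and the first segment of a spoke-local path
(`/fs/readFile` → `fs`). It's a convenience accessor for ACL matching and
service grouping.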
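Splitting a full hub path into its three segments can be sketched as below. `OperationPath` and `parse_operation_path` are hypothetical names, and the strictness (exactly three non-empty segments, leading slash required) is an assumption of this sketch rather than a stated rule.

```rust
#[derive(Debug, PartialEq)]
pub struct OperationPath<'a> {
    pub spoke: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

/// Parse `/{spoke}/{service}/{op}`; anything else yields None.
pub fn parse_operation_path<'a>(path: &'a str) -> Option<OperationPath<'a>> {
    let rest = path.strip_prefix('/')?;
    let mut parts = rest.split('/');
    let spoke = parts.next()?;
    let service = parts.next()?;
    let op = parts.next()?;
    // Reject extra segments and empty segments such as `//fs/readFile`.
    if parts.next().is_some() || spoke.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { spoke, service, op })
}
```

Spoke-local paths such as `/fs/readFile` have only two segments and would need their own parser; this sketch covers the hub-facing form only.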
### Wire Format: EventEnvelope
Every message on the wire is a length-prefixed JSON `EventEnvelope`:
```rust
pub struct EventEnvelope {
pub r#type: String, // Event type (e.g., "call.requested", "call.responded")
pub id: String, // Correlation key (requestId, topic, or "" for broadcasts)
pub payload: Value, // JSON payload — schema depends on event type
}
// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
This is the same format used by `@alkdev/pubsub` adapters. It is JSON because
it must be consumable from JavaScript, Python, and any language. The envelope
is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh
bidirectional streams, WebSocket, or Worker postMessage.
Binary payloads (postcard, protobuf, etc.) are base64-encoded in the `payload`
field. The envelope itself stays JSON for cross-language compatibility.
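The framing can be sketched with the standard library alone. The JSON body is treated as an opaque string here, and the `max_len` guard is an assumption of this sketch (the document does not specify a frame size limit); `write_frame` and `read_frame` are hypothetical names.

```rust
use std::convert::TryFrom;
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length prefix followed by the UTF-8 JSON body.
pub fn write_frame<W: Write>(writer: &mut W, json: &str) -> io::Result<()> {
    let len = u32::try_from(json.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json.as_bytes())
}

/// Read one frame, refusing bodies longer than `max_len` bytes.
pub fn read_frame<R: Read>(reader: &mut R, max_len: u32) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut body = vec![0u8; len as usize];
    reader.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Checking the declared length before allocating prevents a peer from forcing a multi-gigabyte allocation with a single four-byte prefix.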
### Call Protocol Events
Five event types carry request/response and subscription semantics:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
**`call.error` payload**:
```json
{
"code": "string",
"message": "string",
"retryable": false
}
```
**A call is just a subscribe that resolves after one event.** Both `call()` and
`subscribe()` send the same `call.requested` event. The difference is
consumption pattern:
- **`call()`**: Sends `call.requested`, resolves `Promise` on first `call.responded`
- **`subscribe()`**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`
The `id` field carries the `requestId` for correlation.
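The two consumption patterns can be illustrated with a blocking `std::sync::mpsc` channel standing in for the async machinery. `CallEvent`, `consume_call`, and `consume_subscription` are hypothetical names, and the choice to treat `call.error` as terminal for both patterns is an assumption of this sketch.

```rust
use std::sync::mpsc::Receiver;

/// Events delivered for one requestId, with payloads simplified to strings.
#[derive(Debug, Clone, PartialEq)]
pub enum CallEvent {
    Responded(String),
    Completed,
    Aborted,
    Error(String),
}

/// call(): resolve on the first `call.responded`.
pub fn consume_call(rx: &Receiver<CallEvent>) -> Result<String, String> {
    match rx.recv() {
        Ok(CallEvent::Responded(value)) => Ok(value),
        Ok(CallEvent::Error(message)) => Err(message),
        Ok(other) => Err(format!("call ended without a response: {:?}", other)),
        Err(_) => Err("connection closed".to_string()),
    }
}

/// subscribe(): collect every `call.responded` until the stream ends.
pub fn consume_subscription(rx: &Receiver<CallEvent>) -> Result<Vec<String>, String> {
    let mut items = Vec::new();
    for event in rx.iter() {
        match event {
            CallEvent::Responded(value) => items.push(value),
            CallEvent::Completed | CallEvent::Aborted => return Ok(items),
            CallEvent::Error(message) => return Err(message),
        }
    }
    Ok(items) // sender dropped without an explicit completion
}
```

Both functions read from the same kind of channel fed by the same event types, which is the point of the design: only the consumer differs.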
### Bidirectional Calls and Routing
Both sides of a connection can initiate calls. The hub routes calls to spokes
using the first path segment:
```
Hub (server) Spoke: "dev1" (client)
│ │
│ call.requested │
│ name: "/dev1/fs/readFile" │
│ payload: { path: "/src/main.rs" } │
│──────────────────────────────────────────▶│
│ │
│ call.responded │
│ id: <requestId> │
│ payload: { content: "fn main()..." } │
│◀──────────────────────────────────────────│
│ │
│ Spoke exposes /dev1/fs/*, │
│ /dev1/bash/* to hub │
│ │
│◀─ call.requested ────────────────────────│
│ name: "/hub/agent/chat" │
│ payload: { provider: "anthropic", ... } │
│ │
│── call.responded ──────────────────────▶ │
│ id: <requestId> │
│ payload: { completion: "..." } │
```
The hub's registry includes:
- **Hub-local operations** (`/hub/*`) — handled directly
- **Remote operations** (`/{spoke}/*`) — forwarded to the spoke connection
When the hub routes `/dev1/fs/readFile` to spoke `dev1`, it strips the spoke
prefix and delivers the call to the spoke's local registry as `/fs/readFile`.
The spoke doesn't need to know its own alias.
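The routing decision and prefix stripping can be sketched as a pure function. `Route` and `route` are hypothetical names; treating the literal first segment `hub` as the marker for hub-local operations follows the examples in this document, and keeping the full path for local operations is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the hub's own registry; the path is passed through unchanged.
    Local(&'a str),
    /// Forwarded to a spoke connection with the spoke prefix stripped.
    Forward { spoke: &'a str, local_path: &'a str },
}

pub fn route<'a>(path: &'a str) -> Option<Route<'a>> {
    let rest = path.strip_prefix('/')?;
    let (first, _) = rest.split_once('/')?;
    if first.is_empty() {
        return None;
    }
    if first == "hub" {
        Some(Route::Local(path))
    } else {
        // Slice after the first segment so the remainder keeps its leading slash.
        Some(Route::Forward {
            spoke: first,
            local_path: &rest[first.len()..],
        })
    }
}
```

The spoke receives `/fs/readFile` exactly as it registered it, so the same handler code works regardless of the alias the hub assigned.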
### Hub/Spoke Architecture
```
      ┌──────────────────────────────────────────┐
      │                   Hub                    │
      │                                          │
      │ Hub-local services:                      │
      │   /hub/agent/chat (LLM coord)            │
      │   /hub/agent/complete                    │
      │   /hub/sessions/list                     │
      │   /hub/sessions/history                  │
      │                                          │
      │ Spoke registry (discovered):             │
      │   /dev1/fs/*          → dev1 connection  │
      │   /dev1/bash/*        → dev1 connection  │
      │   /dev2/fs/*          → dev2 connection  │
      │   /browser-1/notify/* → WT conn          │
      └┬────────────────┬────────────────┬───────┘
       │                │                │
┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
│ Dev Spoke   │  │ Dev Spoke   │  │Browser Spoke│
│ "dev1"      │  │ "dev2"      │  │ "browser-1" │
│ /fs/*       │  │ /fs/*       │  │ /notify/*   │
│ /bash/*     │  │ /bash/*     │  │             │
│ /search/*   │  │             │  │             │
└─────────────┘  └─────────────┘  └─────────────┘
```
When a spoke connects, it registers its operations with the hub:
```
spoke → hub: call.requested { name: "/hub/services/register", payload: {
spoke: "dev1",
operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```
The hub adds these to its routing table with the spoke prefix. Other spokes
and browser clients can then call `/dev1/fs/readFile` without knowing how
the hub routes it internally.
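How the hub might fold a registration into its routing table is sketched below. `RoutingTable` is a hypothetical type that maps each full path to the spoke name; a real hub would map to a connection handle, and cleanup on disconnect is deliberately left out because it is still an open question (OQ-20).

```rust
use std::collections::HashMap;

/// Full operation path -> name of the spoke that serves it.
#[derive(Default)]
pub struct RoutingTable {
    routes: HashMap<String, String>,
}

impl RoutingTable {
    /// Add a spoke's local operations under its prefix,
    /// e.g. `/fs/readFile` from `dev1` becomes `/dev1/fs/readFile`.
    pub fn register_spoke(&mut self, spoke: &str, operations: &[&str]) {
        for op in operations {
            self.routes
                .insert(format!("/{}{}", spoke, op), spoke.to_string());
        }
    }

    pub fn lookup(&self, full_path: &str) -> Option<&str> {
        self.routes.get(full_path).map(String::as_str)
    }
}
```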
### Operation Registry
The operation registry maps paths to specs and handlers. **Specs and handlers
are separate** — downstream consumers register both (ADR-025).
```rust
pub struct OperationSpec {
pub name: String, // e.g., "/fs/readFile", "/agent/chat"
pub namespace: String, // e.g., "fs", "agent"
pub op_type: OperationType, // Query, Mutation, Subscription
pub input_schema: Value, // JSON Schema for input
pub output_schema: Value, // JSON Schema for output
pub access_control: AccessControl, // Required scopes/resources
}
pub enum OperationType {
Query, // Read-only, idempotent (e.g., "/fs/readFile", "/search/query")
Mutation, // Side effects (e.g., "/bash/exec", "/sessions/create")
Subscription, // Streaming (e.g., "/events/subscribe")
}
pub struct AccessControl {
pub required_scopes: Vec<String>, // AND-checked
pub required_scopes_any: Option<Vec<String>>, // OR-checked
pub resource_type: Option<String>, // e.g., "service"
pub resource_action: Option<String>, // e.g., "read"
}
```
**Registration is separated from implementation:**
```rust
// Core registers discovery operations
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);
// A dev env spoke registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
// A browser client registers notification UDFs
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
Core-provided operations use short paths without a spoke prefix
(`/services/list`, `/services/schema`). They live on whatever node the
caller is connected to. Spoke-prefixed operations (`/dev1/fs/readFile`)
are routed by the hub.
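A stripped-down registry shows how specs and handlers can be stored side by side yet queried independently. This is a sketch under simplifying assumptions: `Value` is a plain `String` stand-in for a JSON value, handlers are synchronous closures, and `OperationSpec` is reduced to two fields. The method names other than `register` are hypothetical.

```rust
use std::collections::HashMap;

pub type Value = String; // stand-in for a JSON value
pub type Handler = Box<dyn Fn(&Value) -> Result<Value, String> + Send + Sync>;

pub struct OperationSpec {
    pub name: String,
    pub namespace: String,
}

#[derive(Default)]
pub struct OperationRegistry {
    ops: HashMap<String, (OperationSpec, Handler)>,
}

impl OperationRegistry {
    pub fn register<F>(&mut self, spec: OperationSpec, handler: F)
    where
        F: Fn(&Value) -> Result<Value, String> + Send + Sync + 'static,
    {
        self.ops.insert(spec.name.clone(), (spec, Box::new(handler)));
    }

    /// Spec lookup never touches the handler (what `/services/schema` would need).
    pub fn spec(&self, name: &str) -> Option<&OperationSpec> {
        self.ops.get(name).map(|(spec, _)| spec)
    }

    /// Sorted operation paths (what `/services/list` would need).
    pub fn list(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self.ops.keys().map(String::as_str).collect();
        names.sort();
        names
    }

    pub fn call(&self, name: &str, input: &Value) -> Result<Value, String> {
        let (_, handler) = self
            .ops
            .get(name)
            .ok_or_else(|| format!("unknown operation: {}", name))?;
        handler(input)
    }
}
```

Because the registry only sees a spec and a boxed closure, a downstream crate can register operations without core knowing anything about the handler's internals.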
### ACL Per Operation Path
Access control maps to path prefixes using standard URL-like matching:
| Pattern | Matches | Purpose |
|---------|---------|---------|
| `/dev1/*` | All operations on spoke `dev1` | Full access to a spoke |
| `/*/fs/*` | `fs` service on any spoke | File access across dev envs |
| `/*/bash/*` | `bash` service on any spoke | Shell access (higher risk) |
| `/hub/agent/*` | Hub LLM agent | LLM calls |
| `/hub/sessions/*` | Hub session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific spoke | One UI notification |
Higher-risk operations (shell, filesystem write) can require tighter scopes
than read-only operations. The ACL evaluates against the caller's
`Identity.scopes` and `Identity.resources` from the auth layer (see auth.md).
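The document does not define the exact matching rules, so the sketch below fixes one plausible interpretation: a `*` matches exactly one segment, except that a trailing `*` matches one or more remaining segments. `path_matches` is a hypothetical helper, and these semantics are an assumption chosen to agree with every row of the table above.

```rust
/// Segment-wise pattern match under the assumed semantics described above.
pub fn path_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let is_last = i + 1 == pat.len();
        match segs.get(i) {
            // Path ran out of segments before the pattern did.
            None => return false,
            // Trailing wildcard swallows this segment and everything after it.
            Some(_) if *p == "*" && is_last => return true,
            // Inner wildcard or literal match: move to the next segment.
            Some(s) if *p == "*" || p == s => continue,
            Some(_) => return false,
        }
    }
    // Without a trailing wildcard the lengths must agree exactly.
    pat.len() == segs.len()
}
```

Matching by segment rather than by raw string prefix avoids accidental matches such as `/dev1/*` granting access to a spoke named `dev10`.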
### Service Discovery
The `/services/list` and `/services/schema` operations expose what a node
offers. Read-only — no admin operations:
| Operation | Type | Description |
|-----------|------|-------------|
| `/services/list` | Query | List registered operation paths + metadata |
| `/services/schema` | Query | Get `OperationSpec` for a specific operation |
These tell the caller: "here's what you can call." They are not a control
panel. Access control is enforced at the operation level.
### PendingRequestMap
Manages in-flight calls and subscriptions. Correlates `call.responded` events
back to the original `call.requested`:
```rust
pub struct PendingRequestMap {
pending: HashMap<String, PendingEntry>,
}
enum PendingEntry {
Call {
tx: oneshot::Sender<Result<Value>>,
timeout: Instant,
},
Subscribe {
tx: mpsc::Sender<Result<Value>>,
timeout: Option<Instant>,
},
}
```
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete
entry. When `call.aborted` arrives → drop the matching entry and stop
delivering results, regardless of which side initiated the abort. A
`call.aborted` for an unknown `requestId` is silently discarded — no error
response is generated.
Timeouts prevent dangling entries. A background task sweeps expired entries
periodically.
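The correlation and sweep behavior can be sketched with standard-library channels in place of the async `oneshot` and `mpsc` types shown above. All method names are hypothetical, `Value` is a `String` stand-in, and the sweep takes the current time as a parameter so that it is easy to test.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::time::{Duration, Instant};

type Value = String; // stand-in for a JSON value
type Reply = Result<Value, String>;

enum PendingEntry {
    Call { tx: Sender<Reply>, deadline: Instant },
    Subscribe { tx: Sender<Reply>, deadline: Option<Instant> },
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn insert_call(&mut self, id: &str, timeout: Duration) -> Receiver<Reply> {
        let (tx, rx) = channel();
        let deadline = Instant::now() + timeout;
        self.pending.insert(id.to_string(), PendingEntry::Call { tx, deadline });
        rx
    }

    pub fn insert_subscription(&mut self, id: &str) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Subscribe { tx, deadline: None });
        rx
    }

    /// `call.responded`: a call resolves once and is removed; a subscription stays.
    pub fn on_responded(&mut self, id: &str, value: Value) {
        let is_call = matches!(self.pending.get(id), Some(PendingEntry::Call { .. }));
        if is_call {
            if let Some(PendingEntry::Call { tx, .. }) = self.pending.remove(id) {
                let _ = tx.send(Ok(value));
            }
        } else if let Some(PendingEntry::Subscribe { tx, .. }) = self.pending.get(id) {
            let _ = tx.send(Ok(value));
        }
    }

    /// `call.completed` / `call.aborted`: dropping the entry closes its channel.
    /// Unknown ids are ignored silently.
    pub fn on_finished(&mut self, id: &str) {
        self.pending.remove(id);
    }

    /// Remove entries whose deadline has passed.
    pub fn sweep(&mut self, now: Instant) {
        self.pending.retain(|_, entry| match entry {
            PendingEntry::Call { deadline, .. } => *deadline > now,
            PendingEntry::Subscribe { deadline, .. } => deadline.map_or(true, |d| d > now),
        });
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

Dropping the sender is what signals end-of-stream to the consumer, so `on_finished` needs no explicit close call.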
### Protocol Adapter Layer
The call protocol is transport-agnostic by design. It maps to any transport
that carries `EventEnvelope` frames:
| Transport | Channel mechanism | Direction |
|-----------|-------------------|-----------|
| SSH | Reserved `direct_tcpip` destination (ADR-018) | Bidirectional over SSH channel |
| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream |
| iroh QUIC | Bidirectional `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream |
| WebSocket | Single WS connection | Bidirectional over WS frames |
| Worker | `postMessage` | Bidirectional over structured clone |
The framing is always: 4-byte BE length prefix + JSON. The envelope shape is
the same regardless of transport.
### Relationship to @alkdev/pubsub and @alkdev/operations
The call protocol in core is a Rust reimplementation of the same protocol
defined in `@alkdev/operations`. The TypeScript implementation provides:
- `PendingRequestMap` — request/response correlation
- `CallHandler` — bridges pubsub events to operation registry
- `OperationSpec`, `AccessControl`, `Identity` — type definitions
The Rust implementation mirrors these types and behaviors. TypeScript consumers
continue using `@alkdev/operations` over `@alkdev/pubsub` adapters (including
the `event-target-wraith` adapter). Rust consumers use core's registry directly.
Both speak the same wire protocol and can interoperate.
The key principle: **the same `EventEnvelope` can flow from a Rust handler
through core, out over SSH channel, into a JavaScript pubsub adapter, and
be dispatched through `@alkdev/operations`'s call handler** — with zero
translation at the wire level.
### Agent Service Pattern
The hub commonly runs an agent service that coordinates between LLM providers
and tool calls. This service is just another set of registered operations —
no special treatment:
- `/hub/agent/chat` — send a message, get a completion. Routes to the
appropriate LLM provider based on available spokes and configuration.
- `/hub/agent/complete` — streaming completion. Yields tokens as they arrive.
- `/hub/sessions/list` — list session histories (backed by Honker or other
durable storage).
- `/hub/sessions/history` — retrieve a specific session's message history.
The agent service uses the same call protocol to invoke tools on spokes:
`/dev1/fs/readFile` for file access, `/dev1/bash/exec` for shell commands. It
stores session state via whatever mechanism the hub deployment provides — core
doesn't mandate Honker or any specific storage.
## Constraints
- The call protocol does not depend on Honker, SQLite, or any database. The
`PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf)
can be carried as base64-encoded blobs in the `payload`, but the envelope
itself is always JSON.
- Service discovery (`/services/list`, `/services/schema`) is read-only. No
admin operations are exposed through the call protocol itself.
- Batch is not a protocol primitive. Multiple `call.requested` events with
correlated `requestId`s provide equivalent semantics.
- The spoke prefix in the operation path is a routing mechanism, not a security
boundary. ACL is enforced at the `AccessControl` level, not by path prefix
alone. A spoke that exposes `/dev1/bash/exec` can restrict access via
`required_scopes` — not every authenticated identity should have shell access.
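The scope portion of an `AccessControl` check can be sketched as follows, using the AND and OR semantics noted in the struct comments. `scopes_allow` is a hypothetical helper, the resource fields are omitted, and treating an empty `required_scopes_any` list as a denial is an assumption of this sketch.

```rust
pub struct AccessControl {
    pub required_scopes: Vec<String>,             // every one must be present
    pub required_scopes_any: Option<Vec<String>>, // at least one must be present
}

/// Check a caller's scopes against an operation's requirements.
pub fn scopes_allow(ac: &AccessControl, caller_scopes: &[String]) -> bool {
    let all_ok = ac
        .required_scopes
        .iter()
        .all(|scope| caller_scopes.contains(scope));
    let any_ok = match &ac.required_scopes_any {
        None => true,
        Some(list) => list.iter().any(|scope| caller_scopes.contains(scope)),
    };
    all_ok && any_ok
}
```

This check runs regardless of the path prefix, which is what keeps an authenticated but unprivileged identity away from operations such as `/dev1/bash/exec`.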
## Open Questions
- **OQ-20**: How does the hub track which spokes expose which operations when
spokes connect and disconnect? Registration on connect and cleanup on
disconnect, or heartbeat-based discovery? See
[open-questions.md](open-questions.md).
- **OQ-22**: Should the call protocol support streaming inputs (client streaming
in gRPC terms), or is client→server always a single request payload with
streaming only server→client? See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
## References
- [auth.md](auth.md) — Identity and `IdentityProvider` trait
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [server.md](server.md) — Channel handling and control channel routing
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](../research/configuration.md) — ForwardingPolicy, service metadata
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
- `@alkdev/storage` — `peer_credentials` table, ACL graph, `Identity`
- [irpc](/workspace/irpc) — iroh streaming RPC (postcard-only, Rust-to-Rust)
- [iroh](/workspace/iroh) — P2P QUIC transport


@@ -0,0 +1,85 @@
# ADR-023: Unified Authentication with Shared Key Material
## Status
Accepted
## Context
Wraith currently authenticates connections exclusively through SSH public key
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
iroh) because SSH carries its own auth protocol. But WebTransport and other
HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
not SSH.
Without unification, non-SSH transports would need a completely separate
identity system (API keys, JWTs, session tokens). This creates two problems:
(1) operators manage two key sets with two rotation mechanisms, and (2) the
same person connecting via SSH and WebTransport appears as two different
identities.
The `IdentityProvider` trait is needed to decouple wraith-core from any
specific identity storage (config file vs. database). Without it, wraith-core
would either hardcode config-file-based auth or take a database dependency —
neither is acceptable for a library crate.
## Decision
**Unified authentication**: The same Ed25519 key material (`authorized_keys`
and `cert_authorities`) is shared across both SSH auth and token auth. The
presentation differs per transport, but the verification result (an
`Identity` with scopes) is the same.
**Token auth for non-SSH transports**: WebTransport clients present a signed
timestamp token in the CONNECT request URL:
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
Server extracts the fingerprint, looks it up in the same `authorized_keys`
set, verifies the signature, and checks the timestamp window (default ±300s).
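The server-side checks described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the wraith implementation: `ParsedToken`, `TokenError`, `parse_token`, and `check_window` are hypothetical names, the input is assumed to be already base64url-decoded, and the `authorized_keys` lookup and Ed25519 signature verification are omitted because they need a crypto crate.

```rust
// Field sizes follow the token layout above: 32 + 8 + 64 bytes.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
const SIGNATURE_LEN: usize = 64; // Ed25519 signatures are always 64 bytes
const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Hypothetical container for the three token fields.
#[derive(Debug, PartialEq)]
pub struct ParsedToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    BadLength(usize),
    OutsideWindow,
}

/// Splits already-decoded token bytes into key_id, timestamp, and signature.
pub fn parse_token(raw: &[u8]) -> Result<ParsedToken, TokenError> {
    if raw.len() != TOKEN_LEN {
        return Err(TokenError::BadLength(raw.len()));
    }
    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(&raw[..KEY_ID_LEN]);
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(&raw[KEY_ID_LEN..KEY_ID_LEN + TIMESTAMP_LEN]);
    let mut signature = [0u8; SIGNATURE_LEN];
    signature.copy_from_slice(&raw[KEY_ID_LEN + TIMESTAMP_LEN..]);
    Ok(ParsedToken {
        key_id,
        timestamp: u64::from_be_bytes(ts), // big-endian, as in the layout
        signature,
    })
}

/// Accepts the token only if its timestamp is within `window_secs` of `now`.
pub fn check_window(token: &ParsedToken, now: u64, window_secs: u64) -> Result<(), TokenError> {
    if token.timestamp.abs_diff(now) <= window_secs {
        Ok(())
    } else {
        Err(TokenError::OutsideWindow)
    }
}
```

A real verifier would still have to look up `key_id` and verify the signature over `key_id || timestamp` before trusting either field; the sketch only covers the structural and time-window checks.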
**`IdentityProvider` trait**: Decouples wraith-core from identity storage. The
trait resolves a fingerprint or token to an `Identity`. Default implementation
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
with `@alkdev/storage`.
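One possible shape for the trait and its config-backed default is sketched below. The field names on `Identity`, the `Fingerprint` alias, the `resolve` method, and `ConfigIdentityProvider` are assumptions made for illustration; the authoritative definitions belong to auth.md.

```rust
use std::collections::HashMap;

/// Hypothetical shape of a verified identity (name plus scopes).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub name: String,
    pub scopes: Vec<String>,
}

/// SHA-256 fingerprint of an Ed25519 public key.
pub type Fingerprint = [u8; 32];

/// Resolves a fingerprint to an identity without exposing where identities are stored.
pub trait IdentityProvider: Send + Sync {
    fn resolve(&self, fingerprint: &Fingerprint) -> Option<Identity>;
}

/// Config-backed provider: everything is held in memory, no database.
pub struct ConfigIdentityProvider {
    by_fingerprint: HashMap<Fingerprint, Identity>,
}

impl ConfigIdentityProvider {
    pub fn new(entries: impl IntoIterator<Item = (Fingerprint, Identity)>) -> Self {
        Self {
            by_fingerprint: entries.into_iter().collect(),
        }
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve(&self, fingerprint: &Fingerprint) -> Option<Identity> {
        self.by_fingerprint.get(fingerprint).cloned()
    }
}
```

Because callers hold only a `dyn IdentityProvider`, a hub could swap in a storage-backed implementation without touching the code that performs verification.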
**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
as SSH auth by default. Deployments that want separate access control can use
`TokenKeySource::Separate` with a distinct key set.
**Replay protection via timestamps**: V1 relies on timestamps only (no server state).
Zero-replay can be added later via a nonce challenge-response without changing
the key material.
## Consequences
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
key to `authorized_keys` immediately grants access via both SSH and
WebTransport.
- **Positive**: `IdentityProvider` trait makes wraith-core independent of any
specific database. Default: config file. Hub: `@alkdev/storage`.
- **Positive**: Browser clients can authenticate using Ed25519 keys via
SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
natively.
- **Positive**: No JWT library dependency. The token is a simple Ed25519
signature over a fixed structure — same primitives SSH already uses.
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
QUIC packet can replay the token within the window. Acceptable because QUIC
interception is the same threat level as connection hijacking.
- **Negative**: Certificate authority tokens are not supported in v1. CA
verification requires the full OpenSSH certificate structure, which doesn't
fit in a signed timestamp.
- **Negative**: Browser-side key management is less ergonomic than SSH key
files. The private key must be imported into SubtleCrypto. This is a UI/UX
concern, not a protocol concern.
## References
- [auth.md](../auth.md) — Full auth architecture spec
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)

@@ -0,0 +1,63 @@
# ADR-024: Bidirectional Call Protocol
## Status
Accepted
## Context
The wraith control channel (ADR-018) routes events from client → server's event bus.
This is unidirectional: clients can send events to the server, but the server
cannot call operations on the client. In the hub/spoke model, spokes (dev env
containers) connect to a hub and expose operations (fs, bash, search) that the
hub invokes. The hub needs to call *spoke* operations.
Additionally, the current control channel provides no request/response semantics.
Every consumer that needs call/response reinvents the pending-request correlation.
## Decision
The call protocol is bidirectional. Both sides can send `call.requested` and
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
BE length prefix + JSON) — the same as `@alkdev/pubsub`.
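The framing itself (a 4-byte big-endian length followed by the JSON bytes) can be sketched as follows. The function names `encode_frame` and `decode_frame` are hypothetical, and the JSON payload is treated as opaque bytes so the sketch makes no claim about the fields of `EventEnvelope`.

```rust
use std::convert::{TryFrom, TryInto};

/// Prepends a 4-byte big-endian length to an already-serialized JSON payload.
/// Returns `None` if the payload is too large for a u32 length prefix.
pub fn encode_frame(json: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(json.len()).ok()?;
    let mut frame = Vec::with_capacity(4 + json.len());
    frame.extend_from_slice(&len.to_be_bytes());
    frame.extend_from_slice(json);
    Some(frame)
}

/// Tries to split one complete frame off the front of `buf`.
/// Returns the payload and the remaining bytes, or `None` if more data is needed.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let header: [u8; 4] = buf.get(..4)?.try_into().ok()?;
    let len = u32::from_be_bytes(header) as usize;
    let body = buf.get(4..)?;
    if body.len() < len {
        return None; // partial frame: wait for more bytes
    }
    Some(body.split_at(len))
}
```

Returning the unread remainder lets a reader loop over a stream buffer that holds several frames or ends in a partial one.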
Five event types: `call.requested`, `call.responded`, `call.completed`,
`call.aborted`, `call.error`.
A call is a subscription that resolves after one event. Calls and subscriptions
both use `call.requested` with a correlated `requestId`. `PendingRequestMap` in
core provides the correlation.
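A minimal sketch of such a correlation map follows, using standard-library channels as one-shot reply slots. The method names (`register`, `resolve`, `cancel`) and the generic response type `R` are assumptions for illustration; the real `PendingRequestMap` may differ, for example by being async.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Correlates outgoing requests with their responses by request id.
/// `R` stands for whatever payload a response carries.
pub struct PendingRequestMap<R> {
    pending: HashMap<String, Sender<R>>,
}

impl<R> PendingRequestMap<R> {
    pub fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    /// Records a pending request and returns the receiver the caller waits on.
    pub fn register(&mut self, request_id: &str) -> Receiver<R> {
        let (tx, rx) = channel();
        self.pending.insert(request_id.to_owned(), tx);
        rx
    }

    /// Delivers a response; returns false if no matching request is pending.
    pub fn resolve(&mut self, request_id: &str, response: R) -> bool {
        match self.pending.remove(request_id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }

    /// Drops an entry on timeout or connection close.
    /// The waiting receiver then observes a disconnect error.
    pub fn cancel(&mut self, request_id: &str) -> bool {
        self.pending.remove(request_id).is_some()
    }
}
```

Removing the entry in both `resolve` and `cancel` is what keeps the in-memory state bounded: a late or duplicate response simply finds nothing to deliver to.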
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
path segment routes the call to the correct connected node. The hub's registry
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
first segment is the routing key, remaining path dispatches within the node.
Core-provided operations use short paths without a spoke prefix
(`/services/list`, `/services/schema`). Spoke operations are prefixed
(`/dev1/fs/readFile`).
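The first-segment dispatch can be sketched as below. The `Route` enum and `route` function are hypothetical, and so is the disambiguation rule: this sketch assumes a path is forwarded only when its first segment names a currently connected spoke, and is otherwise handled locally.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Forward to the named spoke; `path` is what the spoke dispatches on.
    Spoke { spoke: &'a str, path: &'a str },
    /// Handle in the local registry (for example `/services/list`).
    Local { path: &'a str },
}

/// Decides where a call goes based on the first path segment.
/// Returns `None` for paths that do not start with a non-empty segment.
pub fn route<'a>(path: &'a str, spokes: &HashSet<String>) -> Option<Route<'a>> {
    let rest = path.strip_prefix('/')?;
    let (first, remainder) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, ""),
    };
    if first.is_empty() {
        return None;
    }
    if !remainder.is_empty() && spokes.contains(first) {
        Some(Route::Spoke { spoke: first, path: remainder })
    } else {
        Some(Route::Local { path })
    }
}
```

Under this rule a spoke whose name collides with a core service name would shadow the core path, so a real hub would presumably reserve those names.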
This generalizes ADR-018's control channel: the `wraith-*` destination becomes
a transport for `EventEnvelope` frames with call protocol semantics, instead of
raw pubsub dispatch.
## Consequences
- **Positive**: Hub can invoke operations on spokes. Dev env containers
expose fs, bash, search — the hub calls them as needed.
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
can both call and serve operations.
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
in core serves all consumers.
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
iroh's ALPN dispatch. First segment = routing key.
- **Positive**: Multiple spokes exposing the same service (two dev envs both
exposing `/fs/*`) are naturally differentiated by the spoke prefix.
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
cleaned up on timeout or connection close.
- **Negative**: The hub must maintain a routing table mapping spoke identities
to connections, with registration on connect and cleanup on disconnect.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter

@@ -0,0 +1,73 @@
# ADR-025: Handler/Spec Separation for Downstream Service Registration
## Status
Accepted
## Context
The current control channel (ADR-018) is hardcoded: `wraith-control:0` bridges
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
`bash.exec` as callable operations, it has no way to register these with core's
channel routing. The NAPI handler would need to intercept channel data outside
of core.
For the hub/spoke model, spokes register their operations with the hub when
they connect. The hub's registry must include both hub-local operations and
remote operations exposed by spokes.
## Decision
Operation specs and handlers are separated from core. Core provides:
1. `OperationSpec` — describes what an operation does (name, type, input/output
schemas, access control)
2. `OperationHandler` — implements the operation logic
3. `OperationRegistry` — maps paths to specs + handlers
4. Built-in operations: `/services/list`, `/services/schema`
Downstream consumers register their own operations:
```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
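To make the registration calls above concrete, here is a self-contained sketch of a registry they could run against. It is an assumption-laden reduction, not core's API: `OperationSpec` carries only `name` and `namespace`, the handler signature (string in, string or error out) is invented for the example, and duplicate registration is rejected by choice.

```rust
use std::collections::HashMap;

/// Reduced spec; the full one also carries type, schemas, and access control.
#[derive(Debug, Clone, PartialEq)]
pub struct OperationSpec {
    pub name: String,
    pub namespace: String,
}

/// Handler signature assumed for this sketch only.
pub type OperationHandler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

#[derive(Default)]
pub struct OperationRegistry {
    entries: HashMap<String, (OperationSpec, OperationHandler)>,
}

impl OperationRegistry {
    /// Registers a spec and its handler; rejects a path that is already taken.
    pub fn register(&mut self, spec: OperationSpec, handler: OperationHandler) -> Result<(), String> {
        if self.entries.contains_key(&spec.name) {
            return Err(format!("operation already registered: {}", spec.name));
        }
        self.entries.insert(spec.name.clone(), (spec, handler));
        Ok(())
    }

    /// Backs a `/services/list`-style query: all specs, sorted by path.
    pub fn list(&self) -> Vec<&OperationSpec> {
        let mut specs: Vec<&OperationSpec> = self.entries.values().map(|(s, _)| s).collect();
        specs.sort_by(|a, b| a.name.cmp(&b.name));
        specs
    }

    /// Looks up the handler for `path` and invokes it with `input`.
    pub fn call(&self, path: &str, input: &str) -> Result<String, String> {
        let (_, handler) = self
            .entries
            .get(path)
            .ok_or_else(|| format!("unknown operation: {}", path))?;
        handler(input)
    }
}
```

Storing the spec next to the handler means discovery and dispatch read from the same table, so a listed operation is always callable.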
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
segment routes to the node. The `namespace` field on `OperationSpec` is
derived from the second path segment (`service`).
When spoke operations are registered with the hub, the hub adds the spoke
prefix: an operation that spoke "dev1" registers as `/fs/readFile` becomes
addressable as `/dev1/fs/readFile` in the hub's routing table.
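The prefixing and namespace derivation can be illustrated with two small helpers. Both `hub_path` and `namespace_of` are hypothetical names, and the validation rules (no `/` in a spoke name, a local path needs both a service and an op segment) are assumptions of this sketch.

```rust
/// Builds the hub-side path for an operation a spoke registered locally.
pub fn hub_path(spoke: &str, local_path: &str) -> Option<String> {
    // A spoke name becomes a single path segment, so it must not contain '/'.
    if spoke.is_empty() || spoke.contains('/') || !local_path.starts_with('/') {
        return None;
    }
    Some(format!("/{}{}", spoke, local_path))
}

/// Derives the namespace (the service segment) from a local `/{service}/{op}` path.
pub fn namespace_of(local_path: &str) -> Option<&str> {
    let mut segments = local_path.strip_prefix('/')?.split('/');
    let service = segments.next().filter(|s| !s.is_empty())?;
    // Require an op segment as well, otherwise the path is incomplete.
    segments.next().filter(|s| !s.is_empty())?;
    Some(service)
}
```

Deriving the namespace from the local path, before the spoke prefix is added, keeps it equal to the service name regardless of which spoke exposes it.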
The `/services/list` operation returns all registered specs. The
`/services/schema` operation returns the spec for a specific operation. These
are read-only — no admin operations.
## Consequences
- **Positive**: NAPI, Python, and any downstream consumer can register
operations without modifying core.
- **Positive**: Service discovery is built in. Clients query `/services/list`
to learn what operations a hub offers.
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
the same service (dev1 vs dev2).
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
authorization. Higher-risk operations (shell, filesystem write) can require
tighter scopes.
- **Positive**: Schema exposure enables MCP adapter generation. `OperationSpec`
maps directly to MCP tool definitions.
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
`OperationRegistry`, and `PendingRequestMap`.
- **Negative**: Namespace collisions between downstream consumers are possible.
The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry

@@ -1,6 +1,6 @@
 ---
-status: reviewed
-last_updated: 2026-06-02
+status: draft
+last_updated: 2026-06-04
 ---
 # Open Questions
@@ -99,39 +99,78 @@ last_updated: 2026-06-02
 - **Status**: open
 - **Priority**: medium
 - **Resolution**: (pending)
-- **Cross-references**: ADR-020 (proposed)
+- **Cross-references**: configuration.md
 ### OQ-13: Config file auto-reload via file watching
 - **Origin**: [research/configuration.md](../research/configuration.md)
 - **Status**: resolved
 - **Priority**: low
 - **Resolution**: No file watching. CLI loads once at startup; NAPI/hub reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
-- **Cross-references**: ADR-020 (proposed)
+- **Cross-references**: configuration.md
 ### OQ-14: ArcSwap vs RwLock for dynamic config
 - **Origin**: [research/configuration.md](../research/configuration.md)
 - **Status**: resolved
 - **Priority**: low
 - **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained.
-- **Cross-references**: ADR-020 (proposed)
+- **Cross-references**: configuration.md
 ### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence
 - **Origin**: [research/configuration.md](../research/configuration.md)
 - **Status**: open
 - **Priority**: medium
 - **Resolution**: (pending — needs R&D in WebTransport transport session)
-- **Cross-references**: ADR-022 (proposed)
+- **Cross-references**: [auth.md](auth.md), OQ-19
 ### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to wraith-* channels)
 - **Origin**: [research/configuration.md](../research/configuration.md)
 - **Status**: open
 - **Priority**: low
 - **Resolution**: (pending — defer to forwarding policy design)
-- **Cross-references**: ADR-021 (proposed)
+- **Cross-references**: configuration.md
 ### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
 - **Origin**: [research/configuration.md](../research/configuration.md)
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~
+- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples wraith-core from identity storage.
+- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [auth.md](auth.md), OQ-15
+## Auth
+### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both?
+- **Origin**: [auth.md](auth.md)
 - **Status**: open
 - **Priority**: medium
-- **Resolution**: (pending — defer until non-SSH transport is implemented)
-- **Cross-references**: ADR-020 (proposed), OQ-15
+- **Resolution**: (pending)
+- **Cross-references**: ADR-023, [call-protocol.md](call-protocol.md)
+### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS?
+- **Origin**: [auth.md](auth.md)
+- **Status**: open
+- **Priority**: low
+- **Resolution**: (pending)
+- **Cross-references**: OQ-15
+## Call Protocol
+### OQ-20: Spoke registration and discovery on connect/disconnect
+- **Origin**: [call-protocol.md](call-protocol.md)
+- **Status**: open
+- **Priority**: medium
+- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach)
+- **Cross-references**: ADR-024, ADR-025
+### OQ-21: Routing calls to specific spokes with same-service operations
+- **Origin**: [call-protocol.md](call-protocol.md)
+- **Status**: ~~resolved~~
+- **Priority**: ~~medium~~
+- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{spoke}/{service}/{op}` format. The first path segment identifies the spoke and routes the call to the correct connected node. Multiple spokes exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the spoke prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The hub maintains a routing table mapping spoke identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key.
+- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
+### OQ-22: Client streaming (streaming inputs) in the call protocol?
+- **Origin**: [call-protocol.md](call-protocol.md)
+- **Status**: open
+- **Priority**: low
+- **Resolution**: (pending)
+- **Cross-references**: ADR-024

@@ -490,13 +490,20 @@ compat via accepting both `transport: string` (single) and
 `SO_REUSEPORT` is used. Needs R&D; defer to WebTransport transport design
 session.
-**Update**: WebTransport is out of scope for the current configuration
+~~**Update**: WebTransport is out of scope for the current configuration
 work. It requires a fundamentally different authentication model (HTTP-level
 API keys/session tokens vs SSH key-based auth). The `ServerHandler` only
 knows SSH `auth_publickey`. WebTransport auth would need its own handler
 path. This connects to the broader question of whether `DynamicConfig.auth`
 should be transport-aware (see OQ-CFG-06). WebTransport transport design
-is a separate R&D session.
+is a separate R&D session.~~
+**Update 2**: Auth concern is resolved by ADR-023. The same authorized_keys
+set verifies both SSH pubkey auth and token auth (Ed25519-signed timestamp
+for WebTransport). One key material, two presentations. The remaining
+question is purely about QUIC listener coexistence — which is a transport
+implementation detail, not an auth question. See [auth.md](../architecture/auth.md)
+and [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md).
 - **OQ-CFG-05**: Does `TransportKind::WebTransport` need any handler behavior
 different from other transports?
@@ -518,7 +525,7 @@ compat via accepting both `transport: string` (single) and
 headers/query params). The auth question is: does the same `DynamicConfig`
 serve both models, or does each transport carry its own auth config?
-Option A: `AuthPolicy` contains both SSH auth and API key auth:
+~~Option A: `AuthPolicy` contains both SSH auth and API key auth:
 ```rust
 pub struct AuthPolicy {
 ssh: SshAuthConfig, // for SSH-over-any-transport
@@ -536,7 +543,17 @@ compat via accepting both `transport: string` (single) and
 For now, the config architecture should accommodate Option A as a future
 extension. Phase 1 implements `DynamicConfig` with SSH auth only. API key
-auth is added when a non-SSH transport is implemented.
+auth is added when a non-SSH transport is implemented.~~
+**Resolved by ADR-023**: The auth layer is transport-aware in its
+*presentation*, not its *material*. `AuthPolicy` holds `SshAuthConfig` and
+`TokenAuthConfig`, where `TokenAuthConfig.key_source` defaults to
+`Shared` (same `authorized_keys` set as SSH auth). The same Ed25519 keys
+serve both paths: SSH presents the public key in the handshake; WebTransport
+presents an Ed25519-signed timestamp token. Verification produces the same
+`Identity` type via the `IdentityProvider` trait. One `reloadAuth()` call
+updates both. See [auth.md](../architecture/auth.md) and
+[ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md).
 ## Decisions Required
@@ -566,3 +583,6 @@ These decisions will be extracted into ADRs when the architecture is finalized:
 - `@alkdev/storage/docs/architecture/sqlite-host.md` — `peer_credentials` table schema
 - [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library (in `/workspace/wtransport`)
 - [arc-swap crate](https://docs.rs/arc-swap) — Lock-free read, atomic write for shared state
+- [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md) — Unified auth with shared key material
+- [auth.md](../architecture/auth.md) — Unified auth architecture spec
+- [call-protocol.md](../architecture/call-protocol.md) — Bidirectional call protocol spec