docs: add auth, call protocol architecture specs and ADRs 023-025

Unified authentication (ADR-023): SSH and WebTransport auth share the same Ed25519 key material. Token auth uses signed timestamps verified against the same authorized_keys set. IdentityProvider trait decouples core from identity storage.

Bidirectional call protocol (ADR-024): Generalizes control channel (ADR-018) to support hub→spoke and spoke→hub calls. Operation paths use /{spoke}/{service}/{op} format for three-level routing. EventEnvelope wire format, five call events, PendingRequestMap for correlation.

Handler/spec separation (ADR-025): Downstream consumers register operations without modifying core. OperationRegistry maps paths to specs + handlers. Service discovery via /services/list and /services/schema.

Resolves OQ-17 (transport-aware auth), OQ-21 (spoke routing), OQ-CFG-04 and OQ-CFG-06 (WebTransport auth and transport-aware auth layer). Adds OQ-18 through OQ-22 for remaining open questions.
2026-06-05 08:19:41 +00:00
parent 41062d810e
commit af7f4d0006
8 changed files with 971 additions and 19 deletions
--- a/docs/architecture/decisions/023-unified-auth-shared-key-material.md
+++ b/docs/architecture/decisions/023-unified-auth-shared-key-material.md
@@ -0,0 +1,85 @@
+# ADR-023: Unified Authentication with Shared Key Material
+
+## Status
+Accepted
+
+## Context
+
+Wraith currently authenticates connections exclusively through SSH public key
+auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
+iroh) because SSH carries its own auth protocol. But WebTransport and other
+HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
+not SSH.
+
+Without unification, non-SSH transports would need a completely separate
+identity system (API keys, JWTs, session tokens). This creates two problems:
+(1) operators manage two key sets with two rotation mechanisms, and (2) the
+same person connecting via SSH and WebTransport appears as two different
+identities.
+
+The `IdentityProvider` trait is needed to decouple wraith-core from any
+specific identity storage (config file vs. database). Without it, wraith-core
+would either hardcode config-file-based auth or take a database dependency —
+neither is acceptable for a library crate.
+
+## Decision
+
+**Unified authentication**: The same Ed25519 key material (`authorized_keys`
+and `cert_authorities`) is shared across both SSH auth and token auth. The
+presentation differs per transport, but the verification result (an
+`Identity` with scopes) is the same.
+
+**Token auth for non-SSH transports**: WebTransport clients present a signed
+timestamp token in the CONNECT request URL:
+
+```
+AuthToken = base64url(key_id || timestamp || signature)
+  key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
+  timestamp = Unix seconds, big-endian u64 (8 bytes)
+  signature = Ed25519 sign(key_id || timestamp, private_key) (64 bytes)
+```
+
+Server extracts the fingerprint, looks it up in the same `authorized_keys`
+set, verifies the signature, and checks the timestamp window (default ±300s).
+
+**`IdentityProvider` trait**: Decouples wraith-core from identity storage. The
+trait resolves a fingerprint or token to an `Identity`. Default implementation
+loads from `DynamicConfig.auth` (no database). Hub implementation can back it
+with `@alkdev/storage`.
+
+**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
+as SSH auth by default. Deployments that want separate access control can use
+`TokenKeySource::Separate` with a distinct key set.
+
+**Replay protection via timestamps**: V1 relies on timestamps alone, with no
+server state. Zero-replay can be added later via a nonce challenge-response
+without changing the key material.
+
+## Consequences
+
+- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
+  key to `authorized_keys` immediately grants access via both SSH and
+  WebTransport.
+- **Positive**: `IdentityProvider` trait makes wraith-core independent of any
+  specific database. Default: config file. Hub: `@alkdev/storage`.
+- **Positive**: Browser clients can authenticate using Ed25519 keys via
+  SubtleCrypto (Chrome 137+, Firefox 130+, Safari 17+). Deno supports it
+  natively.
+- **Positive**: No JWT library dependency. The token is a simple Ed25519
+  signature over a fixed structure — same primitives SSH already uses.
+- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
+  QUIC packet can replay the token within the window. Acceptable because QUIC
+  interception is the same threat level as connection hijacking.
+- **Negative**: Certificate authority tokens are not supported in v1. CA
+  verification requires the full OpenSSH certificate structure, which doesn't
+  fit in a signed timestamp.
+- **Negative**: Browser-side key management is less ergonomic than SSH key
+  files. The private key must be imported into SubtleCrypto. This is a UI/UX
+  concern, not a protocol concern.
+
+## References
+
+- [auth.md](../auth.md) — Full auth architecture spec
+- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
+- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
+- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)
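
The `AuthToken` layout in ADR-023 above can be sketched in Rust as follows. This is a minimal illustration, not the wraith-core implementation: every name here (`ParsedToken`, `TokenError`, `parse_token`, `signed_message`, `check_window`) is hypothetical. It assumes the token has already been base64url-decoded, and it stops short of Ed25519 verification and the `authorized_keys` lookup, which need a crypto crate and the identity store.

```rust
// Hypothetical sketch of the ADR-023 AuthToken layout. All names are
// illustrative. Input is the token after base64url decoding.

pub const KEY_ID_LEN: usize = 32; // SHA-256 fingerprint of the public key
pub const TIMESTAMP_LEN: usize = 8; // Unix seconds, big-endian u64
pub const SIGNATURE_LEN: usize = 64; // Ed25519 signature
pub const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

#[derive(Debug, PartialEq)]
pub struct ParsedToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    BadLength(usize),
    OutsideWindow,
}

/// Split the decoded token into its three fixed-width fields.
pub fn parse_token(raw: &[u8]) -> Result<ParsedToken, TokenError> {
    if raw.len() != TOKEN_LEN {
        return Err(TokenError::BadLength(raw.len()));
    }
    let (key_part, rest) = raw.split_at(KEY_ID_LEN);
    let (ts_part, sig_part) = rest.split_at(TIMESTAMP_LEN);

    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(key_part);
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(ts_part);
    let mut signature = [0u8; SIGNATURE_LEN];
    signature.copy_from_slice(sig_part);

    Ok(ParsedToken { key_id, timestamp: u64::from_be_bytes(ts), signature })
}

/// Rebuild the bytes the signature covers: key_id || timestamp.
pub fn signed_message(token: &ParsedToken) -> Vec<u8> {
    let mut msg = Vec::with_capacity(KEY_ID_LEN + TIMESTAMP_LEN);
    msg.extend_from_slice(&token.key_id);
    msg.extend_from_slice(&token.timestamp.to_be_bytes());
    msg
}

/// Accept the timestamp only if it is within +/- window_secs of `now`.
pub fn check_window(timestamp: u64, now: u64, window_secs: u64) -> Result<(), TokenError> {
    if now.abs_diff(timestamp) <= window_secs {
        Ok(())
    } else {
        Err(TokenError::OutsideWindow)
    }
}
```

A server following the ADR would presumably call `parse_token`, look up `key_id` in the key set, verify `signature` over `signed_message(..)`, and then call `check_window` with a 300-second window. The order of the last two steps is a guess, not something the ADR states.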
--- a/docs/architecture/decisions/024-bidirectional-call-protocol.md
+++ b/docs/architecture/decisions/024-bidirectional-call-protocol.md
@@ -0,0 +1,63 @@
+# ADR-024: Bidirectional Call Protocol
+
+## Status
+Accepted
+
+## Context
+
+The wraith control channel (ADR-018) routes events from the client to the
+server's event bus. This is unidirectional: clients can send events to the
+server, but the server cannot call operations on the client. In the hub/spoke
+model, spokes (dev env containers) connect to a hub and expose operations (fs,
+bash, search), so the hub needs to call *spoke* operations.
+
+Additionally, the current control channel provides no request/response semantics.
+Every consumer that needs call/response reinvents the pending-request correlation.
+
+## Decision
+
+The call protocol is bidirectional. Both sides can send `call.requested` and
+receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
+BE length prefix + JSON) — the same as `@alkdev/pubsub`.
+
+Five event types: `call.requested`, `call.responded`, `call.completed`,
+`call.aborted`, `call.error`.
+
+A call is a subscription that resolves after one event. Both are started with
+`call.requested` and correlated by `requestId` via core's `PendingRequestMap`.
+
+Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
+path segment routes the call to the correct connected node. The hub's registry
+maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
+first segment is the routing key, remaining path dispatches within the node.
+
+Core-provided operations use short paths without a spoke prefix
+(`/services/list`, `/services/schema`). Spoke operations are prefixed
+(`/dev1/fs/readFile`).
+
+This generalizes ADR-018's control channel: the `wraith-*` destination becomes
+a transport for `EventEnvelope` frames with call protocol semantics, instead of
+raw pubsub dispatch.
+
+## Consequences
+
+- **Positive**: Hub can invoke operations on spokes. Dev env containers
+  expose fs, bash, search — the hub calls them as needed.
+- **Positive**: Browser clients can expose custom UDFs. Any connected participant
+  can both call and serve operations.
+- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
+  in core serves all consumers.
+- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
+  iroh's ALPN dispatch. First segment = routing key.
+- **Positive**: Multiple spokes exposing the same service (two dev envs both
+  exposing `/fs/*`) are naturally differentiated by the spoke prefix.
+- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
+  cleaned up on timeout or connection close.
+- **Negative**: The hub must maintain a routing table mapping spoke identities
+  to connections, with registration on connect and cleanup on disconnect.
+
+## References
+
+- [call-protocol.md](../call-protocol.md) — Full call protocol spec
+- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
+- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
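
The `EventEnvelope` framing in ADR-024 above (4-byte big-endian length prefix followed by JSON) can be sketched as below. This is only an illustration under two assumptions: that the prefix counts the JSON bytes and not itself, and that the payload is handled as opaque bytes here, since the envelope's JSON fields are not shown in this diff. The function names `encode_frame` and `decode_frame` are hypothetical.

```rust
// Hypothetical sketch of length-prefixed framing as described in ADR-024.
// Assumption: the 4-byte big-endian prefix counts only the JSON payload bytes.

/// Prepend a 4-byte big-endian length to a JSON payload.
pub fn encode_frame(json: &str) -> Vec<u8> {
    assert!(json.len() <= u32::MAX as usize, "payload too large for a u32 prefix");
    let len = json.len() as u32;
    let mut out = Vec::with_capacity(4 + json.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(json.as_bytes());
    out
}

/// Try to take one frame off the front of `buf`.
/// Returns `Some((payload, remaining))` when a whole frame is present,
/// or `None` when more bytes are needed.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Returning the remaining bytes lets a reader loop over a buffer that holds several frames, and the `None` case covers a partial read from the stream.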
--- a/docs/architecture/decisions/025-handler-spec-separation.md
+++ b/docs/architecture/decisions/025-handler-spec-separation.md
@@ -0,0 +1,73 @@
+# ADR-025: Handler/Spec Separation for Downstream Service Registration
+
+## Status
+Accepted
+
+## Context
+
+The current control channel (ADR-018) is hardcoded: `wraith-control:0` bridges
+to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
+`bash.exec` as callable operations, it has no way to register these with core's
+channel routing. The NAPI handler would need to intercept channel data outside
+of core.
+
+For the hub/spoke model, spokes register their operations with the hub when
+they connect. The hub's registry must include both hub-local operations and
+remote operations exposed by spokes.
+
+## Decision
+
+Concrete specs and handlers live outside core. Core provides the building blocks:
+
+1. `OperationSpec` — describes what an operation does (name, type, input/output
+   schemas, access control)
+2. `OperationHandler` — implements the operation logic
+3. `OperationRegistry` — maps paths to specs + handlers
+4. Built-in operations: `/services/list`, `/services/schema`
+
+Downstream consumers register their own operations:
+
+```rust
+// NAPI layer registers dev env tools
+registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
+registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
+
+// Browser client registers a custom UDF
+registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
+```
+
+Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
+segment routes to the node. The `namespace` field on `OperationSpec` is
+derived from the second path segment (`service`).
+
+When spoke operations are registered with the hub, the hub adds the spoke
+prefix: an operation that spoke "dev1" registers as `/fs/readFile` becomes
+addressable as `/dev1/fs/readFile` in the hub's routing table.
+
+The `/services/list` operation returns all registered specs. The
+`/services/schema` operation returns the spec for a specific operation. These
+are read-only — no admin operations.
+
+## Consequences
+
+- **Positive**: NAPI, Python, and any downstream consumer can register
+  operations without modifying core.
+- **Positive**: Service discovery is built in. Clients query `/services/list`
+  to learn what operations a hub offers.
+- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
+  the same service (dev1 vs dev2).
+- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
+  authorization. Higher-risk operations (shell, filesystem write) can require
+  tighter scopes.
+- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
+  maps directly to MCP tool definitions.
+- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
+  `OperationRegistry`, and `PendingRequestMap`.
+- **Negative**: Namespace collisions between downstream consumers are possible.
+  The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
+
+## References
+
+- [call-protocol.md](../call-protocol.md) — Full call protocol spec
+- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
+- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry
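
The registration and spoke-prefixing rules in ADR-025 above can be sketched with a toy registry. This does not reproduce the types that will ship in core: the fields on `OperationSpec`, the handler signature (`Fn(&str) -> String`), and the methods `register`, `spec`, `list`, and `call` are all simplifying assumptions, chosen only to show how a spoke-local path becomes a hub-level path and how `namespace` falls out of the service segment.

```rust
// Hypothetical sketch of ADR-025's registry. Field and method names are
// illustrative, and the handler signature is a simplification.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct OperationSpec {
    pub name: String,      // full routed path, e.g. "/dev1/fs/readFile"
    pub namespace: String, // service segment, e.g. "fs"
}

pub type OperationHandler = Box<dyn Fn(&str) -> String>;

#[derive(Default)]
pub struct OperationRegistry {
    ops: HashMap<String, (OperationSpec, OperationHandler)>,
}

impl OperationRegistry {
    /// Register `local_path` (e.g. "/fs/readFile"). When `spoke` is given,
    /// the hub-level path gets the spoke prefix: "/dev1/fs/readFile".
    pub fn register(&mut self, spoke: Option<&str>, local_path: &str, handler: OperationHandler) {
        // The service is the first segment of the spoke-local path.
        let namespace = local_path
            .trim_start_matches('/')
            .split('/')
            .next()
            .unwrap_or("")
            .to_string();
        let name = match spoke {
            Some(s) => format!("/{}{}", s, local_path),
            None => local_path.to_string(),
        };
        let spec = OperationSpec { name: name.clone(), namespace };
        self.ops.insert(name, (spec, handler));
    }

    /// Look up the spec for one path (the idea behind `/services/schema`).
    pub fn spec(&self, path: &str) -> Option<&OperationSpec> {
        self.ops.get(path).map(|(spec, _)| spec)
    }

    /// All registered paths, sorted (the idea behind `/services/list`).
    pub fn list(&self) -> Vec<String> {
        let mut paths: Vec<String> = self.ops.keys().cloned().collect();
        paths.sort();
        paths
    }

    /// Dispatch a call to the handler registered at `path`, if any.
    pub fn call(&self, path: &str, input: &str) -> Option<String> {
        self.ops.get(path).map(|(_, handler)| handler(input))
    }
}
```

Two spokes registering the same local path end up under distinct keys (`/dev1/fs/readFile` and `/dev2/fs/readFile`), which is the collision mitigation the ADR describes.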