docs: add auth, call protocol architecture specs and ADRs 023-025

Unified authentication (ADR-023): SSH and WebTransport auth share the same
Ed25519 key material. Token auth uses signed timestamps verified against the
same authorized_keys set. IdentityProvider trait decouples core from identity
storage.

Bidirectional call protocol (ADR-024): Generalizes control channel (ADR-018)
to support hub→spoke and spoke→hub calls. Operation paths use /{spoke}/{service}/{op}
format for three-level routing. EventEnvelope wire format, five call events,
PendingRequestMap for correlation.

Handler/spec separation (ADR-025): Downstream consumers register operations
without modifying core. OperationRegistry maps paths to specs + handlers.
Service discovery via /services/list and /services/schema.

Resolves OQ-17 (transport-aware auth), OQ-21 (spoke routing), OQ-CFG-04 and
OQ-CFG-06 (WebTransport auth and transport-aware auth layer). Adds OQ-18
through OQ-22 for remaining open questions.
This commit is contained in:
2026-06-05 08:19:41 +00:00
parent 41062d810e
commit af7f4d0006
8 changed files with 971 additions and 19 deletions

View File

@@ -0,0 +1,85 @@
# ADR-023: Unified Authentication with Shared Key Material
## Status
Accepted
## Context
Wraith currently authenticates connections exclusively through SSH public key
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
iroh) because SSH carries its own auth protocol. But WebTransport and other
HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
not SSH.
Without unification, non-SSH transports would need a completely separate
identity system (API keys, JWTs, session tokens). This creates two problems:
(1) operators manage two key sets with two rotation mechanisms, and (2) the
same person connecting via SSH and WebTransport appears as two different
identities.
The `IdentityProvider` trait is needed to decouple wraith-core from any
specific identity storage (config file vs. database). Without it, wraith-core
would either hardcode config-file-based auth or take a database dependency —
neither is acceptable for a library crate.
## Decision
**Unified authentication**: The same Ed25519 key material (`authorized_keys`
and `cert_authorities`) is shared across both SSH auth and token auth. The
presentation differs per transport, but the verification result (an
`Identity` with scopes) is the same.
**Token auth for non-SSH transports**: WebTransport clients present a signed
timestamp token in the CONNECT request URL:
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
Server extracts the fingerprint, looks it up in the same `authorized_keys`
set, verifies the signature, and checks the timestamp window (default ±300s).
**`IdentityProvider` trait**: Decouples wraith-core from identity storage. The
trait resolves a fingerprint or token to an `Identity`. Default implementation
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
with `@alkdev/storage`.
**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
as SSH auth by default. Deployments that want separate access control can use
`TokenKeySource::Separate` with a distinct key set.
**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
Zero-replay can be added later via a nonce challenge-response without changing
the key material.
## Consequences
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
key to `authorized_keys` immediately grants access via both SSH and
WebTransport.
- **Positive**: `IdentityProvider` trait makes wraith-core independent of any
specific database. Default: config file. Hub: `@alkdev/storage`.
- **Positive**: Browser clients can authenticate using Ed25519 keys via
SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
natively.
- **Positive**: No JWT library dependency. The token is a simple Ed25519
signature over a fixed structure — same primitives SSH already uses.
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
QUIC packet can replay the token within the window. Acceptable because QUIC
interception is the same threat level as connection hijacking.
- **Negative**: Certificate authority tokens are not supported in v1. CA
verification requires the full OpenSSH certificate structure, which doesn't
fit in a signed timestamp.
- **Negative**: Browser-side key management is less ergonomic than SSH key
files. The private key must be imported into SubtleCrypto. This is a UI/UX
concern, not a protocol concern.
## References
- [auth.md](../auth.md) — Full auth architecture spec
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)

View File

@@ -0,0 +1,63 @@
# ADR-024: Bidirectional Call Protocol
## Status
Accepted
## Context
The wraith control channel (ADR-018) routes from client → server's event bus.
This is unidirectional: clients can send events to the server, but the server
cannot call operations on the client. In the hub/spoke model, spokes (dev env
containers) connect to a hub and expose operations (fs, bash, search) that the
hub invokes. The hub needs to call *spoke* operations.
Additionally, the current control channel provides no request/response semantics.
Every consumer that needs call/response reinvents the pending-request correlation.
## Decision
The call protocol is bidirectional. Both sides can send `call.requested` and
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
BE length prefix + JSON) — the same as `@alkdev/pubsub`.
Five event types: `call.requested`, `call.responded`, `call.completed`,
`call.aborted`, `call.error`.
A call is a subscribe that resolves after one event. Both use `call.requested`
with correlated `requestId`. `PendingRequestMap` in core provides correlation.
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
path segment routes the call to the correct connected node. The hub's registry
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
first segment is the routing key, remaining path dispatches within the node.
Core-provided operations use short paths without a spoke prefix
(`/services/list`, `/services/schema`). Spoke operations are prefixed
(`/dev1/fs/readFile`).
This generalizes ADR-018's control channel: the `wraith-*` destination becomes
a transport for `EventEnvelope` frames with call protocol semantics, instead of
raw pubsub dispatch.
## Consequences
- **Positive**: Hub can invoke operations on spokes. Dev env containers
expose fs, bash, search — the hub calls them as needed.
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
can both call and serve operations.
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
in core serves all consumers.
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
iroh's ALPN dispatch. First segment = routing key.
- **Positive**: Multiple spokes exposing the same service (two dev envs both
exposing `/fs/*`) are naturally differentiated by the spoke prefix.
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
cleaned up on timeout or connection close.
- **Negative**: The hub must maintain a routing table mapping spoke identities
to connections, with registration on connect and cleanup on disconnect.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter

View File

@@ -0,0 +1,73 @@
# ADR-025: Handler/Spec Separation for Downstream Service Registration
## Status
Accepted
## Context
The current control channel (ADR-018) is hardcoded: `wraith-control:0` bridges
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
`bash.exec` as callable operations, it has no way to register these with core's
channel routing. The NAPI handler would need to intercept channel data outside
of core.
For the hub/spoke model, spokes register their operations with the hub when
they connect. The hub's registry must include both hub-local operations and
remote operations exposed by spokes.
## Decision
Operation specs and handlers are separated from core. Core provides:
1. `OperationSpec` — describes what an operation does (name, type, input/output
schemas, access control)
2. `OperationHandler` — implements the operation logic
3. `OperationRegistry` — maps paths to specs + handlers
4. Built-in operations: `/services/list`, `/services/schema`
Downstream consumers register their own operations:
```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
segment routes to the node. The `namespace` field on `OperationSpec` is
derived from the second path segment (`service`).
When spoke operations are registered with the hub, the hub adds the spoke
prefix: a spoke that registers `/fs/readFile` as "dev1" becomes addressable as
`/dev1/fs/readFile` in the hub's routing table.
The `/services/list` operation returns all registered specs. The
`/services/schema` operation returns the spec for a specific operation. These
are read-only — no admin operations.
## Consequences
- **Positive**: NAPI, Python, and any downstream consumer can register
operations without modifying core.
- **Positive**: Service discovery is built in. Clients query `/services/list`
to learn what operations a hub offers.
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
the same service (dev1 vs dev2).
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
authorization. Higher-risk operations (shell, filesystem write) can require
tighter scopes.
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
maps directly to MCP tool definitions.
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
`OperationRegistry`, and `PendingRequestMap`.
- **Negative**: Namespace collisions between downstream consumers are possible.
The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry