docs(http): decompose alknet-http spec into 19 implementation tasks
Break the alknet-http architecture spec into atomic, dependency-ordered tasks in tasks/http/, following the taskgraph frontmatter conventions used by the call/core/vault crates.

Tasks span 8 phases (Phase 0 through Phase 7) across 5 module subdirectories (server/, gateway/, client/, adapters/, websocket/):

- Phase 0: crate-init (foundation)
- Phase 1: gateway-dispatch-spine, error-mapping, shared-http-client (shared infrastructure)
- Phase 2: http-adapter, bearer-auth-middleware, gateway-endpoints, healthz-decoy (HTTP server surface)
- Phase 3: to-openapi (OpenAPI gateway projection)
- Phase 4: from-openapi (OpenAPI adapter, reqwest forwarding)
- Phase 5: dispatcher-transport-abstraction, upgrade-handler, connection-overlay (WebSocket browser bidirectional path)
- Phase 6: from-mcp, to-mcp (MCP adapters, feature-gated)
- Phase 7: review-http, review-websocket, review-mcp, review-http-final (quality checkpoints)

The gateway-dispatch-spine task implements the thin shared core recommended by the gateway-factoring research (concrete struct, not a trait). The dispatcher-transport-abstraction task is a cross-crate change to alknet-call (it exposes an EventEnvelope-level dispatch API for non-QUIC transports) and is the highest-risk task. WebTransport/h3 is deferred per ADR-044 and has no tasks; from_wss is out of scope.

Validated: 19 tasks, no cycles, 8 parallel generations, critical path length 8 (through the WebSocket strand).
182
tasks/http/websocket/connection-overlay.md
Normal file
@@ -0,0 +1,182 @@
---
id: http/websocket/connection-overlay
name: Implement connection-local Layer 2 overlay for browser-registered ops (no PeerId, ADR-024/034/044)
status: pending
depends_on: [http/websocket/upgrade-handler]
scope: moderate
risk: medium
impact: component
level: implementation
---

## Description

Implement the connection-local Layer 2 overlay for browser-registered
ops in `src/websocket/overlay.rs`. This is the mechanism that gives a
browser bidirectional-call capability *without* peer-graph membership
(ADR-024, ADR-034 §4, ADR-044 §5). A browser over WebSocket has no
`PeerId`, does not enter `PeerCompositeEnv`, and any ops it registers
land in a per-`CallConnection` overlay that dies when the connection
drops.

### Browsers are not alknet peers (websocket.md §"Browsers are not alknet peers")

A browser over WebSocket authenticates by bearer token, gets no
`PeerId`, does not enter `PeerCompositeEnv`, and its registered ops (if
any) land in the connection-local Layer 2 overlay. The rationale, stated
in ADR-044 §5 and amending ADR-034 §4 by reference, is a load-bearing
distinction:

**"Peer" in alknet means an addressable node in the call-protocol peer
graph** — a stable `PeerId`, reachable via `PeerRef::Specific`, whose ops
land in `PeerCompositeEnv`, whose identity is stable across reconnects.
It does *not* mean "any endpoint that exchanges calls during a live
session." A browser is the second thing but not the first, on three
concrete grounds:

1. **No stable cryptographic identity of its own.** A `PeerEntry` is
   anchored to fingerprints (Ed25519, X.509) that *the peer* presents
   and the local node pins. A browser presents a bearer token the *hub*
   issued; the "identity" is the hub's bookkeeping for that token, not
   something the browser owns or that could be pinned by another node.
   There is nothing to put in `PeerEntry.fingerprints`.

2. **Ephemeral.** Close the tab → connection dies → the connection-local
   Layer 2 overlay dies with it. A `PeerEntry` keyed to a browser would
   be a permanently-dead entry within seconds. `PeerRef::Specific("browser-X")`
   from another node would route to nothing.

3. **Not addressable from other nodes.** `PeerRef::Specific` resolves
   through `PeerEntry` → `PeerId`. Another alknet node has no way to
   reach "the browser currently connected to hub-A"; the hub holds that
   connection as a live `CallConnection` handle, not as a peer-graph
   entry. The connection-local overlay is precisely the mechanism that
   gives the browser bidirectional-call capability *without* peer-graph
   membership.

### The overlay (websocket.md §"Connection-local overlay")

A browser over WebSocket has no `PeerId` on the hub's side. Any ops the
browser registers land in a **connection-local Layer 2 overlay**
(ADR-024) — a per-`CallConnection` overlay that dies when the connection
drops. This is the same mechanism ADR-034 §2 describes for the inbound
browser case: the browser is a bidirectional call target during a live
session, not a peer-graph member, and the connection-local overlay is
what gives it bidirectional-call capability *without* peer-graph
membership.

When the WS connection closes (browser closes the tab, network drops),
the overlay and all its registered ops are dropped — no explicit
deregistration needed. A `PeerRef::Specific("browser-X")` from another
node would route to nothing, because there is no `PeerEntry` for the
browser.
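
The drop-with-the-connection behaviour described above can be sketched with standard-library types only. This is a minimal model, not the `alknet-call` implementation: the `Handler` alias, the `overlay_view()` helper, and `call_via_overlay()` are hypothetical names introduced for illustration, and the real `imported_operations` map stores `HandlerRegistration` values rather than plain function pointers.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Hypothetical stand-in for a registered handler.
pub type Handler = fn(&str) -> String;

/// Hypothetical model of a connection that owns its Layer 2 overlay.
pub struct CallConnection {
    imported_operations: Arc<RwLock<HashMap<String, Handler>>>,
}

impl CallConnection {
    pub fn new() -> Self {
        Self { imported_operations: Arc::new(RwLock::new(HashMap::new())) }
    }

    /// Register a browser op in the connection-local overlay.
    pub fn register_imported(&self, name: &str, handler: Handler) {
        self.imported_operations
            .write()
            .unwrap()
            .insert(name.to_string(), handler);
    }

    /// Weak view (assumption): observers can look ops up but cannot keep
    /// the overlay alive after the connection is dropped.
    pub fn overlay_view(&self) -> Weak<RwLock<HashMap<String, Handler>>> {
        Arc::downgrade(&self.imported_operations)
    }
}

/// Look an op up through the weak view; `None` once the connection is gone.
pub fn call_via_overlay(
    view: &Weak<RwLock<HashMap<String, Handler>>>,
    op: &str,
    input: &str,
) -> Option<String> {
    let overlay = view.upgrade()?;
    let guard = overlay.read().unwrap();
    let handler = guard.get(op)?;
    Some(handler(input))
}

/// Example browser-side handler used in the usage note.
pub fn echo(input: &str) -> String {
    format!("echo:{input}")
}
```

Registering `echo` under `ui/dragged` makes it callable through the view while the connection value is alive; after `drop(conn)` the same lookup yields `None` without any deregistration step, which mirrors the "WS close drops the overlay" rule.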

### Bidirectionality (websocket.md §"Bidirectionality")

The WS call-protocol session inherits the call protocol's native
bidirectionality: both sides can send `call.requested` frames. The
browser calls operations on the hub; the hub can call operations
registered on the browser's side, over the same session, using the same
`PendingRequestMap` and `EventEnvelope` framing as `alknet/call`.

The browser case where the client registers no operations of its own
is the common case — the server→client call direction is unused because
the browser exposes nothing for the hub to call. That is a matter of
use-case scoping, not an architectural limitation. A browser that *does*
expose ops (e.g., a UI that registers a `ui/dragged` op the hub can call
to push live updates) registers them in the connection-local Layer 2
overlay, and the hub reaches them through the live `CallConnection`
handle — not through `PeerRef::Specific` (the browser is not a peer).

### Implementation

The `CallConnection` constructed by the upgrade handler (the
`upgrade-handler` task, via the `dispatcher-transport-abstraction`
task's non-QUIC constructor) already holds a Layer 2 overlay
(`imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>`)
and exposes `register_imported()` / `register_imported_all()` /
`overlay_env()`. The browser registers ops via these methods; the
overlay is per-connection and dies when the `CallConnection` is dropped
(WS close).

This task ensures:

1. The overlay is correctly scoped to the WS connection (not the
   `PeerCompositeEnv` — no `PeerId`, no `PeerEntry`).
2. The hub's outgoing `call.requested` to browser-registered ops routes
   through the `CallConnection`'s overlay (via `overlay_env()`), not
   through `PeerRef::Specific`.
3. The overlay is dropped on WS close (no explicit deregistration; the
   `Arc<RwLock<HashMap>>` is dropped when the `CallConnection` is
   dropped).
4. The `AccessControl` on the browser's registered ops gates whether the
   hub is allowed to call them. When the hub sends `call.requested` to a
   browser op, the *hub* is the caller: the call runs with the hub's
   identity as caller identity and the browser's registration bundle's
   `composition_authority` as handler identity. The browser's
   bearer-token identity is not the caller identity for these outgoing
   calls.
5. Abort cascade on WS disconnect (ADR-016): when the WS connection
   closes, all in-flight subscriptions and calls to browser ops are
   aborted, cascading to descendants.
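
The caller/handler roles for a hub-to-browser call can be illustrated with a small standalone model. Everything here is an assumption made for the sketch: the allow-list form of `AccessControl`, the `BrowserOp` struct, and `hub_calls_browser_op()` are hypothetical and do not reflect the real `alknet-call` types or the actual `AccessControl::check` signature.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Identity(pub String);

/// Hypothetical allow-list attached to a browser-registered op.
pub struct AccessControl {
    allowed_callers: HashSet<Identity>,
}

impl AccessControl {
    pub fn allow(callers: &[&str]) -> Self {
        Self {
            allowed_callers: callers.iter().map(|c| Identity(c.to_string())).collect(),
        }
    }

    /// Gate on the *caller* identity.
    pub fn check(&self, caller: &Identity) -> bool {
        self.allowed_callers.contains(caller)
    }
}

/// Hypothetical registration: the op's gate plus the handler-side identity.
pub struct BrowserOp {
    pub access: AccessControl,
    pub composition_authority: Identity,
}

#[derive(Debug, PartialEq)]
pub enum CallOutcome {
    Invoked { caller: Identity, handler: Identity },
    Forbidden,
}

/// The hub is the caller; the browser's registration supplies the handler identity.
pub fn hub_calls_browser_op(hub: &Identity, op: &BrowserOp) -> CallOutcome {
    if !op.access.check(hub) {
        return CallOutcome::Forbidden;
    }
    CallOutcome::Invoked {
        caller: hub.clone(),
        handler: op.composition_authority.clone(),
    }
}
```

With an op that allows `hub-a`, a call from `hub-a` is invoked with the hub as caller and the registration's `composition_authority` as handler; any other caller identity is refused before the handler would run.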

### What this task does NOT do

- **No `PeerEntry` for the browser.** The browser is not in the peer
  graph. This task ensures the overlay is connection-local, not
  peer-graph.
- **No `from_wss` adapter.** Out of scope (websocket.md §"Future" —
  scope decision). This task is about the browser *registering* ops on
  its connection, not about importing a remote node's ops over WS.

## Acceptance Criteria

- [ ] Browser-registered ops land in the `CallConnection`'s Layer 2 overlay (not `PeerCompositeEnv`)
- [ ] No `PeerId` created for the browser (no `PeerEntry`, no peer-graph membership)
- [ ] `register_imported()` / `register_imported_all()` work for browser ops
- [ ] Hub's outgoing `call.requested` to browser ops routes through `overlay_env()`
- [ ] Hub's outgoing calls do NOT route through `PeerRef::Specific` (browser is not a peer)
- [ ] `AccessControl` on browser-registered ops gates the hub's calls
- [ ] Overlay dropped on WS close (no explicit deregistration; `Arc<RwLock<HashMap>>` dropped)
- [ ] `PeerRef::Specific("browser-X")` from another node → routes to nothing (no `PeerEntry`)
- [ ] WS close → all in-flight subscriptions/calls to browser ops aborted (ADR-016 cascade)
- [ ] WS close → overlay and all registered ops dropped
- [ ] Bidirectionality: hub can `call.requested` to browser-registered ops
- [ ] Browser with no registered ops → server→client direction unused (use-case scoping, not a limitation)
- [ ] Integration test: browser registers op → hub calls it via overlay
- [ ] Integration test: WS close → overlay dropped (op no longer reachable)
- [ ] Integration test: `PeerRef::Specific("browser-X")` → NOT_FOUND (no PeerEntry)
- [ ] Integration test: WS close mid-call to browser op → `call.aborted` cascade
- [ ] Integration test: `AccessControl` on browser op gates hub's call
- [ ] `cargo test -p alknet-http` succeeds
- [ ] `cargo clippy -p alknet-http --all-targets` succeeds with no warnings

## References

- docs/architecture/crates/http/websocket.md — Connection-local overlay (§"Connection-local overlay"), Bidirectionality (§"Bidirectionality"), Browsers are not peers (§"Browsers are not alknet peers")
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2 connection-local overlay)
- docs/architecture/decisions/034-outgoing-only-x509-and-three-peer-roles.md — ADR-034 §4 (browsers are not peers)
- docs/architecture/decisions/044-defer-webtransport-browsers-use-websocket.md — ADR-044 §5 (addressability vs bidirectionality rationale)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (abort cascade on disconnect)
- docs/architecture/decisions/029-peer-graph-routing-model.md — ADR-029 (PeerRef::Specific routes through PeerEntry → PeerId)

## Notes

> The connection-local overlay is the mechanism that gives a browser
> bidirectional-call capability without peer-graph membership. The
> browser has no PeerId, no PeerEntry, no PeerCompositeEnv entry — it is
> a bidirectional call target during a live session, not a peer-graph
> member. The overlay dies with the WS connection (no explicit
> deregistration). The hub reaches browser ops through the live
> CallConnection handle's overlay_env(), not through PeerRef::Specific.
> The "browsers are not peers" rationale (ADR-044 §5) is load-bearing:
> "peer" means addressable peer-graph node, not "any endpoint that
> exchanges calls during a live session." A browser has no stable
> cryptographic identity, is ephemeral, and is not addressable from
> other nodes — three concrete grounds for not being a peer.

## Summary

> To be filled on completion
180
tasks/http/websocket/dispatcher-transport-abstraction.md
Normal file
@@ -0,0 +1,180 @@
---
id: http/websocket/dispatcher-transport-abstraction
name: Expose EventEnvelope-level dispatch API in alknet-call for non-QUIC transports (WebSocket)
status: pending
depends_on: [http/crate-init]
scope: moderate
risk: high
impact: project
level: implementation
---

## Description

Expose an `EventEnvelope`-level dispatch API in `alknet-call` so the
WebSocket handler can feed deserialized envelopes directly to the shared
`Dispatcher`, without requiring a QUIC `Connection`. This is a
**cross-crate task** (modifies `alknet-call`) and the **highest-risk
task** in the http phase: the spec says "the `Dispatcher` runs unchanged"
over WS (ADR-012, ADR-048), but the current implementation is
QUIC-specific in two places that need loosening.

### The problem

The current `Dispatcher` (in `crates/alknet-call/src/protocol/dispatch.rs`)
is transport-agnostic in *intent* (ADR-012 — stream-agnostic
correlation) but QUIC-specific in *two* integration points:

1. **`Dispatcher::handle_stream`** takes raw `SendStream` / `RecvStream`
   (QUIC-backed `alknet_core::types::SendStream` / `RecvStream`) and uses
   `FrameFramedReader` (4-byte length-prefixed framing). The WebSocket
   path does NOT use length-prefix framing — a WS binary message is
   already length-delimited by the WS frame boundary (ADR-044 Assumption
   1). The WS handler deserializes `EventEnvelope` from each binary WS
   message directly (no `FrameFramedReader`), and needs to feed the
   envelope to the dispatch logic.

2. **`CallConnection`** wraps an `alknet_core::types::Connection` (which
   wraps a QUIC `quinn::Connection` or `iroh::endpoint::Connection`).
   The WS path has no QUIC connection — it has a WS message stream. The
   `CallConnection` is needed for: the Layer 2 overlay
   (`imported_operations`), the `PendingRequestMap` (correlation), and
   the `connection.identity()` (the resolved bearer identity). The WS
   path needs a `CallConnection`-equivalent that holds these without a
   QUIC `Connection`.

### The fix: expose `dispatch_requested` as `pub`

The core dispatch logic — `Dispatcher::dispatch_requested` — is already
transport-agnostic: it takes a `request_id: String`, a `payload: Value`
(the `EventEnvelope` payload), and a `&Arc<CallConnection>`, and returns
a `ResponseEnvelope`. It is currently `pub(crate)`. **Expose it as
`pub`** so the WS handler can call it directly with a deserialized
`EventEnvelope` payload.

Similarly, the abort-cascade handling (`call.aborted` events) is in
`Dispatcher::handle_stream` — extract the abort-handling logic into a
`pub` method so the WS handler can call it for `call.aborted` events.

### The fix: `CallConnection` from a non-QUIC transport

The `CallConnection` needs to be constructible from a non-QUIC source.
Two options (pick the cleaner one during implementation):

**Option A: A `CallConnection::new_overlay_only(identity)` constructor.**
Construct a `CallConnection` that holds the Layer 2 overlay +
`PendingRequestMap` + the resolved bearer `Identity`, but no QUIC
`Connection`. The `connection()` accessor returns a stub, or the
identity is stored directly. This is the minimal change —
`CallConnection` gains a constructor that doesn't require a QUIC
`Connection`, and `identity()` reads from a stored field rather
than `connection.identity()`.

**Option B: Extract a `CallSession` trait.** Define a trait that
`CallConnection` and a new `WsCallSession` both implement, with
`identity()`, `overlay_env()`, `pending()`, `register_imported()`. The
`Dispatcher` takes `&Arc<dyn CallSession>`. This is more invasive but
cleaner; it's the right choice if the QUIC/WS divergence is large.

**Recommendation: Option A** unless the divergence is larger than it
appears. The `CallConnection` already holds the overlay + pending as
`Arc<RwLock<...>>` / `Arc<Mutex<...>>` (independent of the QUIC
`Connection`); the only QUIC-coupled piece is the `connection: Arc<Connection>`
field and the `connection.identity()` call. A constructor that stores
the `Identity` directly (and returns `None` from `connection()` or
provides an `identity()` accessor that reads the stored field) is the
minimal change.
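
A minimal sketch of what Option A could look like follows, under stated assumptions: the `Connection` and `Identity` types below are simplified stand-ins, the `Option<Arc<Connection>>` field layout is one possible reading of "returns `None` from `connection()`", and the overlay and `PendingRequestMap` fields are omitted for brevity. The real `CallConnection` in `alknet-call` may differ.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity(pub String);

/// Hypothetical stand-in for the QUIC-backed connection type.
pub struct Connection {
    peer_identity: Identity,
}

impl Connection {
    pub fn new(peer_identity: Identity) -> Self {
        Self { peer_identity }
    }

    pub fn identity(&self) -> &Identity {
        &self.peer_identity
    }
}

/// Sketch of Option A: the identity is stored directly, the transport is optional.
pub struct CallConnection {
    connection: Option<Arc<Connection>>,
    identity: Identity,
}

impl CallConnection {
    /// QUIC path: identity is copied from the connection at construction time.
    pub fn new(connection: Arc<Connection>) -> Self {
        let identity = connection.identity().clone();
        Self { connection: Some(connection), identity }
    }

    /// Non-QUIC path (e.g. WebSocket): only the resolved identity is supplied.
    pub fn new_overlay_only(identity: Identity) -> Self {
        Self { connection: None, identity }
    }

    /// Reads the stored field, so it works for both paths.
    pub fn identity(&self) -> &Identity {
        &self.identity
    }

    /// `None` for connections built without a QUIC transport.
    pub fn connection(&self) -> Option<&Arc<Connection>> {
        self.connection.as_ref()
    }
}
```

The design point the sketch tries to show is that existing QUIC callers keep using `new(connection)` unchanged, while the WS path gets a value with the same `identity()` behaviour and no transport handle.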

### The WS dispatch loop (how the WS handler uses this)

The WS upgrade handler (the `websocket/upgrade-handler` task) will:

1. Resolve the bearer identity at upgrade time.
2. Construct a `CallConnection` (via the new constructor — Option A) or
   equivalent (Option B) holding the identity, a fresh Layer 2 overlay,
   and a fresh `PendingRequestMap`.
3. Construct a `Dispatcher` (already `pub`).
4. For each binary WS message: deserialize `EventEnvelope`, match on
   `envelope.r#type`:
   - `call.requested` → call `Dispatcher::dispatch_requested(connection,
     request_id, payload)` (now `pub`), get `ResponseEnvelope`, convert
     to `EventEnvelope`, write back as binary WS message.
   - `call.aborted` → call the extracted `pub` abort-handling method.
   - `call.responded` / `call.completed` → correlate via
     `PendingRequestMap` (the WS handler's outgoing calls —
     bidirectionality, ADR-043 §2).
5. On WS close: fail all pending, drop the overlay (connection-local,
   dies with the WS connection).
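
The per-message routing in steps 4 and 5 can be sketched as a plain `match` on the envelope type. This is a simplified model under stated assumptions: the payload is a `String` instead of `serde_json::Value`, the `Session`, `Action`, and `expect_reply()` names are hypothetical, a `HashSet` of ids stands in for `PendingRequestMap`, and the `call.requested` arm returns a canned reply where the real loop would call `Dispatcher::dispatch_requested`.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub r#type: String,
    pub id: String,
    pub payload: String, // simplification: the real payload is a JSON value
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Reply(EventEnvelope), // write back as a binary WS message
    Aborted(String),      // hand off to the abort-handling method
    Correlated(String),   // matched an outgoing call by id
    Unmatched(String),    // no outgoing call with that id
    Rejected(String),     // unknown event type
}

/// Hypothetical per-connection state: ids of outgoing calls awaiting replies.
#[derive(Default)]
pub struct Session {
    pending: HashSet<String>,
}

impl Session {
    pub fn expect_reply(&mut self, id: &str) {
        self.pending.insert(id.to_string());
    }

    pub fn on_message(&mut self, env: EventEnvelope) -> Action {
        match env.r#type.as_str() {
            "call.requested" => Action::Reply(EventEnvelope {
                r#type: "call.responded".to_string(),
                id: env.id,
                payload: format!("handled:{}", env.payload),
            }),
            "call.aborted" => Action::Aborted(env.id),
            "call.responded" if self.pending.contains(&env.id) => Action::Correlated(env.id),
            "call.completed" if self.pending.remove(&env.id) => Action::Correlated(env.id),
            "call.responded" | "call.completed" => Action::Unmatched(env.id),
            other => Action::Rejected(other.to_string()),
        }
    }

    /// On WS close: every still-pending outgoing call is failed.
    pub fn on_close(&mut self) -> Vec<String> {
        let mut failed: Vec<String> = self.pending.drain().collect();
        failed.sort();
        failed
    }
}
```

A `call.completed` removes the id from the pending set, while `call.responded` leaves it in place so that a subscription can deliver further events; `on_close()` returns whatever is still outstanding so the caller can fail those requests.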

### What this task does NOT do

- **No WS upgrade handler.** The upgrade handler is the
  `websocket/upgrade-handler` task. This task exposes the API it calls.
- **No WS framing.** The WS message → `EventEnvelope` deserialization is
  the `websocket/upgrade-handler` task. This task takes deserialized
  envelopes.
- **No `from_wss` adapter.** Out of scope (websocket.md §"Future" —
  scope decision, not a two-way-door deferral).

### Why this is the highest-risk task

This task modifies `alknet-call`'s security-relevant dispatch code. The
`dispatch_requested` method runs `AccessControl::check(identity)` — the
sole authorization gate (ADR-029 §3). Exposing it as `pub` is safe (the
WS handler is in `alknet-http`, a trusted crate), but the change must
not alter the dispatch logic itself. The `CallConnection` change must
not break the existing QUIC path (the `CallAdapter` and `CallClient`
construct `CallConnection` from a QUIC `Connection` — that path must
continue to work unchanged). Run the full `alknet-call` test suite after
the change.

## Acceptance Criteria

- [ ] `Dispatcher::dispatch_requested` is `pub` (was `pub(crate)`)
- [ ] Abort-cascade handling extracted to a `pub` method (was inline in `handle_stream`)
- [ ] `CallConnection` constructible from a non-QUIC source (Option A or B)
- [ ] New `CallConnection` constructor stores `Identity` directly (or equivalent)
- [ ] `CallConnection::identity()` works for the non-QUIC case
- [ ] `CallConnection::overlay_env()`, `pending()`, `register_imported()` work for non-QUIC
- [ ] Existing QUIC path (`CallAdapter`, `CallClient`) unchanged — no regressions
- [ ] `Dispatcher::handle_stream` (QUIC path) still works unchanged
- [ ] `Dispatcher::run_loop` (QUIC path) still works unchanged
- [ ] `cargo test -p alknet-call` — all existing tests pass (no regressions)
- [ ] `cargo clippy -p alknet-call --all-targets` — no warnings
- [ ] Unit test: `dispatch_requested` callable with a non-QUIC `CallConnection`
- [ ] Unit test: abort-handling method callable with a non-QUIC `CallConnection`
- [ ] Unit test: `CallConnection` from non-QUIC source holds overlay + pending + identity
- [ ] Integration test: dispatch a `call.requested` via the `pub` API → `ResponseEnvelope`
- [ ] Integration test: abort cascade via the `pub` API
- [ ] `cargo test -p alknet-http` succeeds (the WS handler can use the API)
- [ ] `cargo clippy -p alknet-http --all-targets` succeeds with no warnings

## References

- docs/architecture/crates/http/websocket.md — Dispatch (§"Dispatch: the shared Dispatcher, unchanged"), Framing (§"Framing")
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012 (stream-agnostic correlation)
- docs/architecture/decisions/048-websocket-native-session-not-gateway.md — ADR-048 (WS carries native session)
- docs/architecture/decisions/044-defer-webtransport-browsers-use-websocket.md — ADR-044 (WS message boundary is delimiter, no length prefix)
- docs/architecture/crates/call/call-protocol.md — Dispatcher, EventEnvelope wire format
- docs/architecture/crates/call/client-and-adapters.md — Shared Dispatcher (§"Shared Dispatcher")
- crates/alknet-call/src/protocol/dispatch.rs — current Dispatcher implementation
- crates/alknet-call/src/protocol/connection.rs — current CallConnection implementation

## Notes

> This is the highest-risk task in the http phase. It modifies
> alknet-call's security-relevant dispatch code to expose an
> EventEnvelope-level API for non-QUIC transports. The spec says "the
> Dispatcher runs unchanged" (ADR-012), but the current implementation is
> QUIC-specific in two places: handle_stream takes raw SendStream/RecvStream
> (length-prefixed framing), and CallConnection wraps a QUIC Connection.
> The fix is to expose dispatch_requested as pub and make CallConnection
> constructible from a non-QUIC source. The existing QUIC path (CallAdapter,
> CallClient) must not regress — run the full alknet-call test suite. The
> WS handler (websocket/upgrade-handler task) is the consumer of this API.
> This task is tracked in tasks/http/ because it unblocks the WS path, but
> it modifies alknet-call — coordinate with the call crate's conventions.

## Summary

> To be filled on completion
230
tasks/http/websocket/upgrade-handler.md
Normal file
@@ -0,0 +1,230 @@
---
id: http/websocket/upgrade-handler
name: Implement WebSocket upgrade handler (native EventEnvelope session, no length prefix, bearer auth)
status: pending
depends_on: [http/server/http-adapter, http/websocket/dispatcher-transport-abstraction, http/server/bearer-auth-middleware]
scope: broad
risk: high
impact: component
level: implementation
---

## Description

Implement the WebSocket upgrade handler in `src/websocket/upgrade.rs`.
This is the v1 browser bidirectional path (ADR-044): a browser (or any
WS client) upgrades an HTTP/1.1 or HTTP/2 request to WebSocket and
speaks the call protocol over binary WS messages — full-duplex, both
sides can initiate calls (the call protocol's native bidirectionality,
ADR-012). The WS path carries the **native `EventEnvelope` session, not
the HTTP gateway shape** (ADR-048): the gateway endpoints
(`/search`/`/schema`/`/call`/`/batch`/`/subscribe`) are HTTP-only and do
not appear on WS; discovery is via `services/list`/`services/schema` as
ordinary call-protocol ops.

### The upgrade handler (websocket.md §"The WS upgrade handler")

The WS upgrade is an HTTP/1.1 or HTTP/2 request handled by an axum route
on `HttpAdapter`'s router. The handler:

1. Receives the HTTP upgrade request (axum's `WebSocketUpgrade` extractor).
2. Resolves the caller's identity from the `Authorization: Bearer` header
   via `identity_provider.resolve_from_token(&AuthToken { raw:
   token_bytes })` (the shared `bearer_auth_middleware` — same auth path
   as any HTTP request). The upgrade is rejected (`401`) if no token is
   present; insufficient scopes for any op the browser later calls
   surface as `403`/`FORBIDDEN` at call time, not at upgrade time (the
   upgrade doesn't know which ops the browser will call).
3. Upgrades to WebSocket (axum's `WebSocketUpgrade::on_upgrade`),
   producing a full-duplex `WebSocket` stream.
4. Wraps the `WebSocket` stream as a `BiStream`-satisfying transport — a
   WS binary message in either direction is one `EventEnvelope` frame.
5. Constructs a `Dispatcher` (the shared dispatch loop) with the
   `Arc<OperationRegistry>` and `Arc<dyn IdentityProvider>` the
   `HttpAdapter` holds, plus a connection-local Layer 2 overlay for any
   ops the browser registers (the `connection-overlay` task).
6. Spawns the dispatch task on a tokio task; the WS connection is live
   until either side closes it or the browser drops the handle (closes
   the tab).
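
The upgrade-time half of step 2 (reject with `401` when no bearer token is present) can be sketched without any HTTP framework. The `UpgradeDecision` type and `check_upgrade_auth()` function are hypothetical names for this illustration; the real handler goes through the shared `bearer_auth_middleware` and `resolve_from_token`, which are not modelled here. As a simplification, the sketch matches the `Bearer` scheme case-sensitively.

```rust
#[derive(Debug, PartialEq)]
pub enum UpgradeDecision<'a> {
    /// Token present: proceed to identity resolution and the WS upgrade.
    Accept { token: &'a str },
    /// No usable bearer token: reject the upgrade with this HTTP status.
    Reject { status: u16 },
}

/// Decide from the raw `Authorization` header value (if any).
/// Scope checks are deliberately absent: they happen per call, not here.
pub fn check_upgrade_auth(authorization: Option<&str>) -> UpgradeDecision<'_> {
    let token = authorization
        .and_then(|header| header.strip_prefix("Bearer "))
        .map(str::trim);
    match token {
        Some(token) if !token.is_empty() => UpgradeDecision::Accept { token },
        _ => UpgradeDecision::Reject { status: 401 },
    }
}
```

Note that nothing in this check can produce a `403`: at upgrade time the handler does not yet know which ops the browser will call, so scope failures only surface later as `FORBIDDEN` on individual calls.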

### The upgrade path

The **default upgrade path is `/alknet/call`** (the deployment may
override it via the `extra_routes` mechanism of ADR-046, but a
deployment that passes no custom routes gets `/alknet/call`). The path
must not collide with the reserved gateway/`/healthz`/`/openapi.json`/
MCP/custom-route paths per ADR-046's collision rule; `/alknet/call`
namespaces away from the reserved set naturally. A deployment that
builds a custom REST projection with `POST /{service}/{op}` routes
(ADR-047 §4) coexists with the WS upgrade at `/alknet/call`: axum's
router matches static path segments before dynamic ones, so after
`Router::merge` the WS upgrade's exact `/alknet/call` route still takes
precedence over any `/{service}/{op}` pattern.

The upgrade runs over HTTP/1.1 (the standard `Upgrade: websocket` header,
RFC 6455) or HTTP/2 (the extended CONNECT protocol, RFC 8441);
axum/hyper supports both, and the handler does not branch on which —
the WS frame stream is the same once the upgrade completes.

### Framing: `EventEnvelope` over binary WS messages (websocket.md §"Framing")

Every message on the WS connection is a binary WebSocket message
containing one `EventEnvelope`:

```rust
pub struct EventEnvelope {
    pub r#type: String,   // "call.requested" | "call.responded" | "call.completed" | "call.aborted" | "call.error"
    pub id: String,       // Correlation key (request ID, subscription ID)
    pub payload: Value,   // serde_json::Value — schema depends on event type
}
```

This is the call protocol's wire format verbatim. **The WS path carries
no length prefix**: one `EventEnvelope` JSON object = one binary WS
message, and the WS message boundary is the delimiter. The
implementation must not prepend the QUIC length prefix on outbound WS
messages or expect it on inbound ones — the two framings are
deliberately different, matching each transport's native boundary
semantics. (The `FrameFramedReader`/`FrameFramedWriter` types the QUIC
dispatch loop uses are replaced on the WS path by direct JSON serde
over the WS message type; the `Dispatcher` itself is transport-agnostic
and consumes `EventEnvelope` values, not raw bytes.)

Binary payloads within `EventEnvelope.payload` follow the same
base64-as-JSON-string convention the QUIC path uses — the envelope
carries `serde_json::Value` and does not interpret binary fields; that's
a handler-level concern, transport-agnostic.

Text WS messages are not used; all call-protocol frames are binary. A
client that sends a text message gets a protocol-level close (the WS
handler validates message type).
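
The difference between the two framings can be made concrete with a byte-level sketch. The names below (`encode_length_prefixed`, `WsMessage`, `encode_ws`, `decode_ws`) are hypothetical, and the big-endian byte order of the 4-byte prefix is an assumption: the task text only says the QUIC path uses 4-byte length-prefixed framing. The `WsMessage` enum is a stand-in for whatever message type the WS library provides.

```rust
/// QUIC-style framing: a 4-byte length prefix followed by the JSON bytes.
/// Byte order is assumed big-endian for this sketch.
pub fn encode_length_prefixed(json: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + json.len());
    out.extend_from_slice(&(json.len() as u32).to_be_bytes());
    out.extend_from_slice(json);
    out
}

/// Hypothetical stand-in for a WebSocket message.
#[derive(Debug, PartialEq)]
pub enum WsMessage {
    Binary(Vec<u8>),
    Text(String),
}

/// WS framing: the JSON bytes are the whole message; no prefix is added.
pub fn encode_ws(json: &[u8]) -> WsMessage {
    WsMessage::Binary(json.to_vec())
}

/// Inbound side: binary messages carry one envelope; text is a protocol error.
pub fn decode_ws(msg: &WsMessage) -> Result<&[u8], &'static str> {
    match msg {
        WsMessage::Binary(bytes) => Ok(bytes.as_slice()),
        WsMessage::Text(_) => Err("text messages are not part of the call protocol"),
    }
}
```

For the same envelope bytes, the length-prefixed form is exactly four bytes longer than the WS form, and a text message is refused before any JSON parsing would take place.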

### Dispatch: the shared `Dispatcher` (websocket.md §"Dispatch")

The WS message stream is handed to the `Dispatcher` — the same dispatch
loop the `CallAdapter` uses for `alknet/call` QUIC connections. The
dispatch half is one implementation; the connection-establishment half
differs (WS upgrade handler vs QUIC accept/dial), but after
establishment the `Dispatcher` runs identically:

- Reads `EventEnvelope` frames from the WS message stream (deserialized
  from binary WS messages — no `FrameFramedReader`).
- For `call.requested`: resolves the caller's identity (the bearer-token
  identity resolved at upgrade time, stored on the connection), runs
  `AccessControl::check(identity)` against the op's `AccessControl`,
  dispatches via `OperationRegistry::invoke()` if allowed, returns
  `FORBIDDEN` (→ `call.error`) before the handler runs if not.
- For `call.responded`/`call.completed`/`call.aborted`: correlates by
  `id` via `PendingRequestMap` (keyed by request ID, not by transport —
  ADR-012).
- Writes response `EventEnvelope` frames back as binary WS messages.

Peer authorization flows through the existing `AccessControl::check`
against the resolved identity — no `RemoteFilter`, no `remote_safe`
gate (retired by ADR-029 §3).
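
The "check before the handler runs" ordering can be illustrated with a small standalone sketch. The scope-based `Op` model, the `Outcome` enum, and `dispatch_requested_sketch()` are hypothetical and exist only for this example; the real gate is `AccessControl::check(identity)` inside `Dispatcher::dispatch_requested`, whose signature is not reproduced here.

```rust
/// Hypothetical op: a required scope plus the handler to run when allowed.
pub struct Op<'a> {
    pub required_scope: &'a str,
    pub handler: &'a dyn Fn(&str) -> String,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Handler ran; the result would go back as `call.responded`.
    Responded(String),
    /// Gate refused; the code would go back as `call.error`.
    Error(&'static str),
}

/// The access check comes first, so a refused call never reaches the handler.
pub fn dispatch_requested_sketch(caller_scopes: &[&str], op: &Op<'_>, input: &str) -> Outcome {
    if !caller_scopes.contains(&op.required_scope) {
        return Outcome::Error("FORBIDDEN");
    }
    Outcome::Responded((op.handler)(input))
}
```

Wrapping the handler in a closure that counts its invocations is an easy way to confirm the ordering in a test: a refused call leaves the counter untouched, and an allowed call increments it once.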

### Using the exposed dispatch API

This task uses the `pub` dispatch API exposed by the
`dispatcher-transport-abstraction` task:

- `Dispatcher::dispatch_requested(connection, request_id, payload)` —
  for `call.requested` events.
- The `pub` abort-handling method — for `call.aborted` events.
- `CallConnection` constructed from the non-QUIC source (holding the
  resolved bearer identity, a fresh Layer 2 overlay, a fresh
  `PendingRequestMap`).

### Bidirectionality (websocket.md §"Bidirectionality")

The WS call-protocol session inherits the call protocol's native
bidirectionality: both sides can send `call.requested` frames. The
browser calls operations on the hub; the hub can call operations
registered on the browser's side, over the same session, using the same
`PendingRequestMap` and `EventEnvelope` framing as `alknet/call`.

The browser case where the client registers no operations of its own
is the common case — the server→client call direction is unused
because the browser exposes nothing for the hub to call. That is a
matter of use-case scoping, not an architectural limitation. A browser
that *does* expose ops registers them in the connection-local Layer 2
overlay (the `connection-overlay` task).

### Streaming: native `call.responded` events, no SSE (websocket.md §"Streaming")

A `Subscription` operation invoked over WS streams `call.responded`
events as binary WS messages directly — **no SSE `data:` framing**. SSE
is the `h2`/`http/1.1` streaming projection; on WS it is unnecessary
because WS is already a framed full-duplex channel. The browser receives
`call.responded` events one per WS binary message, with the same `id`
correlating them to the original `call.requested`; `call.completed`
closes the subscription; `call.aborted` closes it with an error frame.

On WS client disconnect (the browser closes the tab mid-subscription),
the WS handler detects the stream close and sends `call.aborted` for
the in-flight subscription, which cascades to descendants per ADR-016.
|
||||
|
||||
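The subscription lifecycle above behaves like a small state machine, sketched here from the receiver's point of view. `StreamState`, `step`, `run`, and `abort_on_disconnect` are hypothetical names for illustration; events are modelled as plain strings rather than `EventEnvelope` values.

```rust
/// State of one subscription, identified elsewhere by its request `id`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamState {
    Open { received: usize },
    Completed { received: usize },
    Aborted { received: usize },
}

/// Apply one event name to the subscription state.
pub fn step(state: StreamState, event: &str) -> StreamState {
    match (state, event) {
        // Each `call.responded` is one item, delivered as one binary WS message.
        (StreamState::Open { received }, "call.responded") => {
            StreamState::Open { received: received + 1 }
        }
        // `call.completed` closes the subscription normally.
        (StreamState::Open { received }, "call.completed") => {
            StreamState::Completed { received }
        }
        // `call.aborted` closes it with an error.
        (StreamState::Open { received }, "call.aborted") => StreamState::Aborted { received },
        // Terminal states and unknown events leave the state unchanged.
        (state, _) => state,
    }
}

/// Fold a sequence of events into the final state.
pub fn run(events: &[&str]) -> StreamState {
    events
        .iter()
        .copied()
        .fold(StreamState::Open { received: 0 }, step)
}

/// On WS disconnect, every in-flight subscription gets a `call.aborted`.
pub fn abort_on_disconnect(in_flight_ids: &[&str]) -> Vec<(String, &'static str)> {
    in_flight_ids
        .iter()
        .map(|id| (id.to_string(), "call.aborted"))
        .collect()
}
```

The ADR-016 cascade to descendant calls is not modelled; the sketch only shows that disconnect is translated into the same `call.aborted` event the protocol already defines.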
## Acceptance Criteria

- [ ] WS upgrade route at `/alknet/call` (default, ADR-046 collision rule)
- [ ] Upgrade handler uses axum's `WebSocketUpgrade` extractor
- [ ] Bearer auth on upgrade request via shared `bearer_auth_middleware`
- [ ] No token → `401` (upgrade rejected)
- [ ] Token present but insufficient scopes → `403` at call time (not upgrade time)
- [ ] Resolved identity stored on the `CallConnection` (for observability + AccessControl)
- [ ] WS binary message = one `EventEnvelope` (JSON serde, no length prefix)
- [ ] No `FrameFramedReader`/`FrameFramedWriter` on the WS path (WS message boundary is delimiter)
- [ ] Text WS messages rejected (protocol-level close)
- [ ] `call.requested` → `Dispatcher::dispatch_requested` (the pub API)
- [ ] `AccessControl::check(identity)` gates every `call.requested`
- [ ] `FORBIDDEN` → `call.error` event (before handler runs)
- [ ] `call.responded`/`call.completed`/`call.aborted` correlated by `id` via `PendingRequestMap`
- [ ] Response `EventEnvelope` frames written as binary WS messages
- [ ] `call.aborted` → the pub abort-handling method
- [ ] Bidirectionality: hub can send `call.requested` to browser-registered ops
- [ ] `Subscription` streams `call.responded` as binary WS messages (no SSE)
- [ ] `call.completed` closes subscription; `call.aborted` closes with error
- [ ] WS client disconnect mid-subscription → `call.aborted` (ADR-016 cascade)
- [ ] WS close → fail all pending, drop overlay (connection-local)
- [ ] Upgrade works over HTTP/1.1 (RFC 6455) and HTTP/2 (RFC 8441)
- [ ] Handler does not branch on HTTP version (WS frame stream is same post-upgrade)
- [ ] Integration test: WS upgrade → `call.requested` → `call.responded` round-trip
- [ ] Integration test: no Bearer token → 401
- [ ] Integration test: `AccessControl` denied → `call.error` FORBIDDEN
- [ ] Integration test: `Subscription` over WS → multiple `call.responded` + `call.completed`
- [ ] Integration test: WS disconnect mid-subscription → `call.aborted` cascade
- [ ] Integration test: text WS message → protocol close
- [ ] Integration test: bidirectional (hub calls browser-registered op)
- [ ] `cargo test -p alknet-http` succeeds
- [ ] `cargo clippy -p alknet-http --all-targets` succeeds with no warnings

## References

- docs/architecture/crates/http/websocket.md: full WS spec (upgrade handler, framing, dispatch, bidirectionality, streaming)
- docs/architecture/decisions/044-defer-webtransport-browsers-use-websocket.md: ADR-044 (WS is v1 browser path, no length prefix)
- docs/architecture/decisions/048-websocket-native-session-not-gateway.md: ADR-048 (native session, not gateway shape)
- docs/architecture/decisions/012-call-protocol-stream-model.md: ADR-012 (stream-agnostic correlation)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md: ADR-016 (disconnect → abort cascade)
- docs/architecture/decisions/029-peer-graph-routing-model.md: ADR-029 §3 (AccessControl::check is sole gate)
- docs/architecture/decisions/046-assembly-layer-custom-http-routes.md: ADR-046 (collision rule for /alknet/call)
- /workspace/@alkdev/pubsub/src/event-target-websocket-client.ts: TypeScript prior art (EventEnvelope over WS binary messages)

## Notes

> The WS path is the native EventEnvelope session, not the gateway shape
> (ADR-048). The gateway endpoints are HTTP-only; discovery is via
> services/list and services/schema as call-protocol ops. The WS path carries
> no length prefix (ADR-044 Assumption 1: the WS message boundary is the
> delimiter, unlike QUIC's 4-byte prefix). Text messages are rejected. The
> dispatch uses the pub API exposed by the dispatcher-transport-abstraction
> task (dispatch_requested + abort-handling + non-QUIC CallConnection).
> Bidirectionality: both sides can send call.requested (ADR-043 §2 transferred
> per ADR-044 §3). Streaming is native call.responded events, no SSE. The
> default upgrade path is /alknet/call (namespaced away from reserved paths
> per ADR-046). This is the second-highest-risk task (after the transport
> abstraction): the WS dispatch loop must be identical to the QUIC dispatch
> loop on the security axis (AccessControl, identity, abort cascade).

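The framing difference called out in the notes can be illustrated with a short sketch. All names here (`encode_length_prefixed`, `decode_length_prefixed`, `WsMessage`, `ws_payload`) are hypothetical, and the big-endian byte order of the 4-byte prefix is an assumption; the notes only state that QUIC uses a 4-byte prefix and WS uses none.

```rust
/// Stream-style framing: a 4-byte length prefix, then the payload.
/// Big-endian is an assumption made for this sketch.
pub fn encode_length_prefixed(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None; // does not fit in a 4-byte prefix
    }
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Split one frame off the front of `buf`; returns (payload, remaining bytes).
pub fn decode_length_prefixed(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None; // header incomplete
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None; // payload incomplete
    }
    Some(rest.split_at(len))
}

/// Hypothetical stand-in for a WebSocket message as seen by the handler.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WsMessage {
    Binary(Vec<u8>),
    Text(String),
}

/// On WS the message boundary is the delimiter: a binary message is already
/// exactly one envelope's bytes, and text messages are rejected.
pub fn ws_payload(message: &WsMessage) -> Result<&[u8], &'static str> {
    match message {
        WsMessage::Binary(bytes) => Ok(bytes.as_slice()),
        WsMessage::Text(_) => Err("text messages are rejected"),
    }
}
```

The WS side needs no buffering or partial-frame handling, which is why the length-prefixed reader and writer have no role on that path.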
## Summary

> To be filled on completion