docs(research): fix alknet-docker POC normalization crate boundary — alknet-compute is a workload not the fleet layer; add head-worker/machine-node model and dispatch reverse-runner prior art

docs(research): add alknet-tty phase-0 findings — terminal session protocol as separate ALPN, TtyBackend trait, dissolves alknet-ssh PTY hedge
docs(research): add alknet-docker POC summary — validates two-carriage model (JSON + raw) for bollard docker ops over framed bidi streams
2026-07-03 10:28:53 +00:00 · 2026-07-03 07:48:21 +00:00 · 2026-07-02 17:08:52 +00:00 · 2026-07-02 10:12:19 +00:00 · 2026-07-02 10:10:55 +00:00 · 2026-07-02 10:10:42 +00:00
5 changed files with 873 additions and 10 deletions
--- a/crates/alknet-http/src/websocket/upgrade.rs
+++ b/crates/alknet-http/src/websocket/upgrade.rs
@@ -779,10 +779,11 @@ mod tests {
        let out = handle_inbound_envelope(&dp, &conn, request)
            .await
            .expect("response");
-        assert_eq!(out.r#type, EVENT_ERROR);
+        assert_eq!(out.r#type, EVENT_RESPONDED);
+        assert_eq!(out.id, "sub-0");
        assert_eq!(
-            out.payload.get("code"),
-            Some(&serde_json::json!("INVALID_OPERATION_TYPE"))
+            out.payload.get("output"),
+            Some(&serde_json::json!({ "n": 1 }))
        );
    }

@@ -1077,10 +1078,10 @@ mod tests {
            MockMsg::Binary(bytes) => {
                let env: EventEnvelope = serde_json::from_slice(&bytes).unwrap();
                assert_eq!(env.id, "sub-ws-0");
-                assert_eq!(env.r#type, EVENT_ERROR);
+                assert_eq!(env.r#type, EVENT_RESPONDED);
                assert_eq!(
-                    env.payload.get("code"),
-                    Some(&serde_json::json!("INVALID_OPERATION_TYPE"))
+                    env.payload.get("output"),
+                    Some(&serde_json::json!({ "n": 1 }))
                );
            }
            other => panic!("expected binary, got {other:?}"),
--- a/docs/research/alknet-docker/poc-summary.md
+++ b/docs/research/alknet-docker/poc-summary.md
@@ -0,0 +1,245 @@
+# alknet-docker: POC Research Summary
+
+**Status:** Research complete — all three high-leverage unknowns validated against a live docker daemon. The approach is viable; the remaining unknowns are spec-scope, not feasibility.
+**Date:** 2026-07-02
+**Scope:** Captures what the POC proved about mapping bollard's docker operations onto framed bidirectional streams, the two-carriage model (JSON call protocol vs raw bytes), and what remains open for the `alknet-docker` crate spec.
+
+---
+
+## Executive Summary
+
+A POC (`alknet-docker-poc`, `/workspace/alknet-docker-poc`) validated the three highest-leverage unknowns for wrapping bollard into alknet's call protocol:
+
+1. **Interactive attach round-trip via raw carriage** — a client drives an interactive `sh` session in a container through a framed bidi stream. After a single JSON `call.requested` frame, the stream switches to a 1-byte-prefixed chunk format for stdin/stdout. Proves the stdin question is solved without modifying the core call protocol's wire format.
+2. **Logs subscription → deterministic completion** — a container's log stream maps to `call.responded` frames and container exit produces a single `call.completed` frame on the client. Proves the stopgap coordination path: a coordinator spawns a container, subscribes to logs, and gets a reliable completion notification — no plugin state to corrupt.
+3. **Exec with exit code propagation** — exit code rides on a final `call.responded` frame `{ "exitCode": N }` before `call.completed`. Proves streaming operations can carry a result-at-end without changing `call.completed`'s empty-payload shape.
+
+**6 tests pass** (3 docker-integration + 3 frame/codec unit tests) against a live docker daemon (Docker Engine 29.2.1, API 1.53) using `alpine:3`.
+
+The POC depends on the local bollard checkout (0.21.0 at `/workspace/bollard`) and uses `tokio::io::duplex` as a stand-in for a QUIC bidi stream. The framing layer is byte-identical to alknet-call's `protocol/wire.rs`, so a future swap to `alknet_call::protocol::wire::*` is mechanical.
+
+---
+
+## The Two-Carriage Model
+
+The central design decision validated by the POC: **the call protocol is the negotiation layer; the carriage is per-operation.** A single `call.requested` frame carries the operation name, parameters, and a `carriage` field that tells both sides what bytes come next on the bidi stream.
+
+### JSON carriage (`carriage: "json"`)
+
+Used for request/response operations (lifecycle, list, inspect) and for log/progress subscriptions where each event is naturally JSON-shaped.
+
+- After `call.requested`, all bytes on the stream are length-prefixed `EventEnvelope` frames (the same framing that alknet-call's `FrameFramedReader`/`FrameFramedWriter` read and write).
+- For subscriptions: each event → `call.responded`, natural stream end → `call.completed`, error → `call.error` (terminal, no `completed`).
+- The dispatcher's `pump_stream` (`alknet-call/src/protocol/dispatch.rs:340`) already does exactly this: a docker logs subscription is just a `StreamingHandler` that wraps `Docker::logs()` and maps each item to a `ResponseEnvelope::ok(...)`.
+
+### Raw carriage (`carriage: "raw"`)
+
+Used for interactive attach/exec where JSON-encoding every byte chunk is wasteful and lossy (containers emit binary, TTYs stream partial lines, and — as noted in the conversation — "it might not be JSON").
+
+- After `call.requested`, the stream switches to a chunk format:
+  ```text
+  [stream_type: u8][length: u32 be][payload bytes]
+  ```
+- `stream_type` mirrors bollard's `NewlineLogOutputDecoder` header byte (`/workspace/bollard/src/read.rs:46`): 0=stdin, 1=stdout, 2=stderr.
+- This is the smallest viable framing that still gives multiplexing (stdout vs stderr) and length-delimiting on a stream without natural message boundaries.
+- The same pattern generalizes to `alknet-ssh` and other protocols that are "just bytes on a bidi stream" — the call protocol negotiates the mode, the protocol is the bytes.
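A minimal sketch of the raw chunk framing described above, assuming only the header layout `[stream_type: u8][length: u32 be][payload bytes]`. The `Chunk` name comes from the POC's `raw.rs`; the field names and the `encode`/`decode` helpers are hypothetical and are not taken from the POC's actual API.

```rust
/// Stream-type values from the list above: 0 = stdin, 1 = stdout, 2 = stderr.
pub const STDIN: u8 = 0;
pub const STDOUT: u8 = 1;
pub const STDERR: u8 = 2;

/// One raw-carriage chunk. The type name matches the POC; the fields are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub stream_type: u8,
    pub payload: Vec<u8>,
}

impl Chunk {
    /// Serialize as [stream_type: u8][length: u32 be][payload bytes].
    /// Assumes the payload fits in a u32; a real codec would reject larger ones.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(5 + self.payload.len());
        out.push(self.stream_type);
        out.extend_from_slice(&(self.payload.len() as u32).to_be_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    /// Decode one chunk from the front of `buf`.
    /// Returns the chunk and the number of bytes consumed,
    /// or `None` if `buf` does not yet hold a complete chunk.
    pub fn decode(buf: &[u8]) -> Option<(Chunk, usize)> {
        if buf.len() < 5 {
            return None;
        }
        let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
        let end = 5usize.checked_add(len)?;
        if buf.len() < end {
            return None;
        }
        let chunk = Chunk {
            stream_type: buf[0],
            payload: buf[5..end].to_vec(),
        };
        Some((chunk, end))
    }
}
```

Returning the consumed byte count lets a caller decode several chunks from one buffer; an incomplete buffer yields `None` instead of an error so the caller can read more bytes and retry.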
+
+### Why not JSON for everything?
+
+The conversation identified the core tension: the call protocol is a JSON-schema-backed JSON-RPC, which maps cleanly to websockets, HTTP request/response, MCP, etc. But it doesn't fit every situation — a container's stdout isn't JSON, a TTY streams partial bytes, and forcing everything through `serde_json` is both wasteful (base64 for binary) and lossy (line-boundary semantics).
+
+The two-carriage model resolves this: **JSON is the default/fallback for structured operations; raw is the escape hatch for byte-stream protocols.** The `carriage` field in the initial `call.requested` is the single point of negotiation that selects which mode the rest of the stream uses. This keeps the call protocol's wire format unchanged (the `call.requested` frame is still a normal JSON envelope) while letting the *subsequent* bytes on the same bidi stream be whatever the operation needs.
+
+This connects to the stream-agnostic model from the alknet-ssh research: a protocol can run over QUIC (raw or iroh p2p), TLS, or TCP. The call protocol is the ALPN negotiation layer that sets up the stream; the protocol itself is bytes. The `alknet-docker` crate is the first concrete instance of this pattern, and it validates that the pattern works.
+
+---
+
+## POC Target 1: Interactive Attach (Raw Carriage)
+
+**Question:** Can a client drive an interactive TTY session in a container through a framed bidi stream, with stdin flowing client→server and stdout/stderr flowing server→client, without modifying the core call protocol's wire format?
+
+**Answer:** Yes. The reliable `attach_container()` (HTTP upgrade to TCP, not websocket) returns `AttachContainerResults { output: Stream<LogOutput>, input: AsyncWrite }`. The POC bridges both onto a single raw-chunk bidi stream:
+
+- **server→client:** each `LogOutput` from bollard's output stream becomes a `Chunk` with the matching `stream_type` (StdOut→1, StdErr→2, StdIn→0, Console→1), written via `ChunkWriter`.
+- **client→server:** `ChunkReader` reads stdin chunks, writes the bytes to bollard's `container_input` (`AsyncWrite`).
+- **completion:** when bollard's output stream ends (container exited), the server sends a zero-length stdout chunk as a "drained" sentinel, then closes.
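The server→client mapping above can be sketched as a single match. The `LogOutput` enum below is a self-contained stand-in that borrows only the variant names from the text; bollard's real type is not used here, and `to_chunk_parts` and `drained_sentinel` are hypothetical helper names.

```rust
/// Stand-in for bollard's `LogOutput` (variant names only; payload type assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogOutput {
    StdIn(Vec<u8>),
    StdOut(Vec<u8>),
    StdErr(Vec<u8>),
    Console(Vec<u8>),
}

/// Map one output item to (stream_type, payload) using the table in the text:
/// StdIn -> 0, StdOut -> 1, StdErr -> 2, Console -> 1.
pub fn to_chunk_parts(item: LogOutput) -> (u8, Vec<u8>) {
    match item {
        LogOutput::StdIn(bytes) => (0, bytes),
        LogOutput::StdOut(bytes) => (1, bytes),
        LogOutput::StdErr(bytes) => (2, bytes),
        // TTY output is not split into stdout/stderr, so it rides on stdout.
        LogOutput::Console(bytes) => (1, bytes),
    }
}

/// The "drained" sentinel described above: a zero-length stdout chunk.
pub fn drained_sentinel() -> (u8, Vec<u8>) {
    (1, Vec::new())
}
```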
+
+**Test:** `docker_attach_raw_round_trips_stdin_to_stdout` — creates an interactive `sh` container, sends `echo hello-from-attach\n` as a stdin chunk, reads stdout chunks until the echo appears, sends `exit\n`, cleans up. Passes.
+
+**Why the websocket path was not used:** bollard's own docs (`/workspace/bollard/src/container.rs:577`) warn that the websocket attach endpoint "has compatibility issues with standard RFC 6455 WebSocket implementations" and that "data flow may be unreliable on some Docker versions." The reliable `attach_container()` (HTTP upgrade to TCP) uses the same `process_upgraded()` mechanism and returns the same `AttachContainerResults` shape. The POC uses the reliable path. The websocket path remains available behind bollard's `websocket` feature for browser-attach scenarios, but the inlining/forking concern raised in the conversation would only apply if we needed websocket-specific framing — we don't, because the raw chunk format is our own, layered on top of whichever bollard attach method we use.
+
+**The `NewlineLogOutputDecoder` insight:** bollard's decoder (`read.rs:46`) already parses the docker daemon's 8-byte header (`[stream_type: u8][3 padding bytes][length: u32 be]`) into `LogOutput::StdOut/StdErr/StdIn/Console`. The POC's chunk format carries the same two fields in a 5-byte header (the padding is dropped), on our framed stream instead of docker's upgraded TCP stream. This makes the mapping a near-identity transformation: `LogOutput` → `Chunk` is a simple match on the variant. The bytes are already framed; we just re-emit them on a different transport.
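To make the header relationship concrete, here is a hedged sketch that re-frames one docker multiplexed frame into the POC-style chunk layout. Docker's stream header is 8 bytes (stream type, three zero padding bytes, then a big-endian u32 length); the chunk header keeps only the type and length. The POC itself works on bollard's already-decoded `LogOutput` items, so operating on raw bytes here is purely illustrative, and `reframe` is a hypothetical name.

```rust
/// Re-frame one docker multiplexed frame into the 5-byte-header chunk layout.
/// Input:  [stream_type: u8][0, 0, 0][length: u32 be][payload]
/// Output: [stream_type: u8][length: u32 be][payload]
/// Returns `None` if the input is shorter than its header claims.
pub fn reframe(docker_frame: &[u8]) -> Option<Vec<u8>> {
    if docker_frame.len() < 8 {
        return None;
    }
    let len = u32::from_be_bytes([
        docker_frame[4],
        docker_frame[5],
        docker_frame[6],
        docker_frame[7],
    ]) as usize;
    let end = 8usize.checked_add(len)?;
    let payload = docker_frame.get(8..end)?;

    let mut out = Vec::with_capacity(5 + len);
    out.push(docker_frame[0]); // stream type is carried over unchanged
    out.extend_from_slice(&docker_frame[4..8]); // same big-endian length bytes
    out.extend_from_slice(payload);
    Some(out)
}
```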
+
+---
+
+## POC Target 2: Logs Subscription → Completion Notification
+
+**Question:** Does a container's log stream map cleanly to `call.responded` frames, and does container exit produce a deterministic `call.completed` on the client?
+
+**Answer:** Yes. `Docker::logs()` with `follow=true` returns a `Stream<Item = Result<LogOutput, Error>>` that ends when the container exits (for non-running containers, it returns historical logs then ends immediately). The POC's `drive_logs`:
+
+1. Reads one `call.requested` frame (the request).
+2. Calls `docker.logs(container, follow=true, stdout=true, stderr=true)`.
+3. For each `LogOutput` → `EventEnvelope::responded(request_id, { "stream": "stdout"|"stderr", "text": "..." })`.
+4. On stream end → `EventEnvelope::completed(request_id)`.
+5. On error → `EventEnvelope::error(...)` (terminal, no `completed`).
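The frame sequence in steps 3 to 5 can be sketched without docker or async at all. The version below is a simplification under stated assumptions: it takes a plain iterator instead of a `Stream`, and `Event` and `pump_logs` are hypothetical names standing in for the `EventEnvelope` constructors mentioned above.

```rust
/// Simplified stand-in for the three frame kinds a logs subscription emits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    /// call.responded carrying { "stream": ..., "text": ... }.
    Responded { stream: &'static str, text: String },
    /// call.completed, sent once on natural stream end.
    Completed,
    /// call.error, terminal: nothing follows it.
    Error(String),
}

/// Turn a finite sequence of log items into the frames a client would see.
/// Each Ok item becomes a Responded frame; the first Err ends the
/// sequence with an Error frame and no Completed frame.
pub fn pump_logs<I>(items: I) -> Vec<Event>
where
    I: IntoIterator<Item = Result<(&'static str, String), String>>,
{
    let mut out = Vec::new();
    for item in items {
        match item {
            Ok((stream, text)) => out.push(Event::Responded { stream, text }),
            Err(message) => {
                out.push(Event::Error(message));
                return out; // terminal: do not emit Completed
            }
        }
    }
    out.push(Event::Completed);
    out
}
```

The early return in the error arm is what keeps `call.error` and `call.completed` mutually exclusive for a given request.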
+
+**Test:** `docker_logs_subscription_pumps_frames_and_completes` — container runs `echo line1; echo line2; exit 0`, client receives 2× `call.responded` (with timestamped text) + 1× `call.completed`. Passes.
+
+**The stopgap coordination path this validates:** a coordinator spawns a container, subscribes to its logs, and gets `call.completed` when the container exits — no plugin state, no polling, no worktree-tracking to corrupt. This is the "reliable completion notification" the conversation identified as the thing that would have saved the session from the mid-point crisis. The completion comes from the docker daemon's own stream-termination semantics, which is as reliable as the daemon itself — far more reliable than an opencode plugin's session tracking.
+
+**Timestamps:** the POC sets `timestamps=true` on the logs query, so each `call.responded` carries the docker timestamp in the `text` field. A production version would separate `timestamp` and `text` into distinct JSON fields.
+
+---
+
+## POC Target 3: Exec with Exit Code
+
+**Question:** Can the exit code of an exec operation propagate cleanly through the streaming completion path?
+
+**Answer:** Yes, via a final `call.responded` frame carrying `{ "exitCode": N, "terminal": true }` before `call.completed`. This keeps `call.completed`'s payload empty (`{}`), matching alknet-call's current wire format (`wire.rs:48`) — no core protocol change needed.
+
+**Test:** `docker_exec_streams_output_and_exit_code` — exec runs `echo hello-from-exec; exit 7`, client receives stdout `call.responded` frames + a final `call.responded` with `exitCode: 7` + `call.completed`. Passes.
+
+**The completion-shape decision this validates:** the conversation raised whether `call.completed` should carry a payload (for exit codes) or whether the exit code rides on a final `call.responded`. The POC validates the latter: **`call.completed` stays empty; the exit code is the last `call.responded` before completion.** This is less invasive — no change to alknet-call's wire format — and it composes with the dispatcher's existing `pump_stream` logic, which already writes `call.completed` on natural stream end after the last `call.responded`.
+
+**bollard API note:** `start_exec` returns `StartExecResults::Attached { output, input }` (an enum, not a struct — the POC had to fix this against 0.21's API). The `output` is a `Stream<LogOutput>`; the exit code is *not* on the stream — it requires a separate `inspect_exec()` call after the stream ends. The POC does this: pump the output stream, then `inspect_exec` for the exit code, then send the exit-code `call.responded`, then `call.completed`. This is the correct ordering and it works.
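The ordering described above (output frames, then the exit-code frame, then completion) can be sketched as a pure function. `ExecFrame` and `exec_frames` are hypothetical names; the exit code is taken as a plain `i64` argument here, standing in for the value the POC obtains from `inspect_exec()` after the output stream ends.

```rust
/// Simplified stand-in for the frames an exec operation emits.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecFrame {
    /// call.responded carrying a piece of process output.
    Output(String),
    /// call.responded carrying { "exitCode": N, "terminal": true }.
    Exit { exit_code: i64, terminal: bool },
    /// call.completed with an empty payload.
    Completed,
}

/// Build the frame sequence: all output first, then the exit code,
/// then completion. The exit code is only known after the output ends,
/// which is why it cannot be interleaved with the output frames.
pub fn exec_frames(output: Vec<String>, exit_code: i64) -> Vec<ExecFrame> {
    let mut frames: Vec<ExecFrame> = output.into_iter().map(ExecFrame::Output).collect();
    frames.push(ExecFrame::Exit { exit_code, terminal: true });
    frames.push(ExecFrame::Completed);
    frames
}
```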
+
+---
+
+## What the POC Does NOT Validate
+
+Following the filesystem POC's pattern of distinguishing feasibility-validated from scope-deferred:
+
+1. **Real QUIC transport.** Uses `tokio::io::duplex` as a stand-in. The framing layer is transport-agnostic (`AsyncRead`/`AsyncWrite`); the alknet-core `Connection` type wraps the same shape. Swapping to quinn is mechanical.
+
+2. **Operation registry integration.** The POC's `DockerOps` exposes three `drive_*` methods. The real crate registers `OperationSpec`s into a shared `OperationRegistry` and lets the dispatcher's `handle_stream` call them. The `StreamingHandler` shape in alknet-call (`registry/registration.rs:20`) maps 1:1 to what `drive_logs`/`drive_exec` do — return a `Stream<ResponseEnvelope>`. The raw-carriage attach is the exception: it needs the dispatcher to hand off the raw bidi stream after the request frame, which is the one place the call protocol's `handle_stream` (`protocol/dispatch.rs:295`) would need a branch for `carriage: "raw"`.
+
+3. **Access control / identity.** The call protocol's `AccessControl` (scopes, resources) is orthogonal. The POC has no auth. The real crate would use `AccessControl::resource_type("container")` + `resource_action("exec")` to gate operations by peer identity.
+
+4. **Lifecycle mutations (create/start/stop/remove/list/inspect).** Mechanical bollard wrapping, no feasibility risk. The POC deliberately skips these — they're `Query`/`Mutation` operations with single `call.responded` responses, the boring case.
+
+5. **Image management (pull, list, build).** Pull is a subscription (progress events → `call.responded`, done → `call.completed`) — same shape as logs, no new unknowns. Build (buildkit) is a large feature, deferred.
+
+6. **Label namespace / ownership.** Dispatch used `dispatch.managed=true`. The real crate needs a configurable label prefix and ownership mapping (`alknet.owner=<peer-id>`) tied to the call protocol's identity model. Spec-scope, not feasibility.
+
+7. **Fleet view (multiple hosts).** The POC is single-host (one `bollard::Docker` client, local socket). The fleet view — multiple dedicated servers + rented instances (e.g. runpod) — is a client-side concern: a `CallClient` talking to multiple endpoints, each running alknet-docker locally. This composes with the ALPN model cleanly. The later normalization crate is the fleet client that picks which endpoint to call — see §6 below for the boundary and the head-worker/machine-node model that frames it.
+
+> **Note (correction):** an earlier draft of this section called the normalization crate `alknet-compute`. That name is wrong. `alknet-compute` is an example of something a *normalized* `alknet-container` might **run inside** a container — a workload, not the fleet layer. The normalization crate is `alknet-container` (or similar), and its job is making any docker-capable machine addressable through one shape.
+
+---
+
+## Open Unknowns (For the Spec)
+
+### 1. Raw-carriage handoff in the dispatcher (design)
+
+The POC's `drive_attach_raw` reads the `call.requested` frame itself, then switches to raw chunks. In the real crate, the dispatcher's `handle_stream` (`alknet-call/src/protocol/dispatch.rs:295`) currently reads the request frame and calls `dispatch()` which returns a `DispatchResult::Stream(ResponseStream)`. For raw carriage, the handler needs the *raw bidi stream* (the `send`/`recv` pair), not just a `ResponseStream` to pump.
+
+Two options:
+- **(a)** Branch in `handle_stream` on the `carriage` field in the request payload: if `raw`, hand the raw streams to a `RawHandler` trait instead of pumping a `ResponseStream`. Localizes the change to `handle_stream`; the wire format and the rest of the dispatcher stay unchanged.
+- **(b)** A separate ALPN for raw-carriage operations (e.g. `alknet/docker-raw`). Avoids touching the call dispatcher entirely; the `ProtocolHandler` for that ALPN owns the whole stream. Less elegant but zero blast radius.
+
+The POC validates the *mechanism* (raw chunks on a bidi stream after a JSON request); the *integration point* is a spec decision. Option (a) is cleaner and keeps all docker ops on `alknet/call`; option (b) is the safest for a first cut.
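A rough sketch of what the option (a) branch might look like, reduced to its decision logic. `Carriage::Json`/`Raw` is named in this document; `parse_carriage`, `Route`, and `route` are hypothetical, and treating a missing `carriage` field as JSON is an assumption based on the "JSON is the default/fallback" statement earlier, not a confirmed rule.

```rust
/// The two carriage modes a `call.requested` frame can select.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Carriage {
    Json,
    Raw,
}

/// Read the `carriage` field from the request payload.
/// Assumption: an absent field falls back to JSON.
pub fn parse_carriage(field: Option<&str>) -> Result<Carriage, String> {
    match field {
        None | Some("json") => Ok(Carriage::Json),
        Some("raw") => Ok(Carriage::Raw),
        Some(other) => Err(format!("unknown carriage: {other}")),
    }
}

/// What the stream handler does after reading the request frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    /// Existing path: dispatch, then pump a response stream as JSON frames.
    PumpResponseStream,
    /// New path: give the handler the raw send/recv pair.
    HandOffRawStreams,
}

/// The single branch option (a) would add.
pub fn route(carriage: Carriage) -> Route {
    match carriage {
        Carriage::Json => Route::PumpResponseStream,
        Carriage::Raw => Route::HandOffRawStreams,
    }
}
```

Rejecting unknown values instead of silently falling back keeps a typo in the `carriage` field from being interpreted as JSON frames on a stream the client believes is raw.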
+
+### 2. ALPN layout (design)
+
+Should docker ops register on the shared `alknet/call` ALPN (as operations in a shared `OperationRegistry`) or get their own `alknet/docker` ALPN (as a `ProtocolHandler`)? The conversation leans shared. The POC doesn't resolve this — it's a spec decision tied to how the assembly layer (the CLI binary) composes handlers. Shared registry is more composable (docker ops are callable from any call client, including peer routing); separate ALPN is more isolated.
+
+### 3. Container-as-resource identity model (design)
+
+How do containers map to the call protocol's `AccessControl::resource_type`/`resource_action`? A container ID is a natural resource. `docker/container/exec` could require `resource: container/<id>:exec`. But containers are created at runtime — the resource set is dynamic. The `IdentityProvider` model in alknet-core is currently static (`PeerEntry` set). Dynamic resource ownership (who created this container, who can exec into it) needs a spec.
+
+### 4. Stdin closure semantics for raw carriage (design)
+
+The POC uses a zero-length stdin chunk as "client done sending input." bollard's `container_input.shutdown()` then closes the container's stdin so the process sees EOF. This works for the interactive case. But for a non-interactive exec with stdin (piping bytes in), the closure semantics need to be clearer: does the client send a zero-length chunk, or just close the write half of the duplex? The POC handles both (zero-length chunk breaks the loop; `ConnectionClosed` also breaks the loop), but the spec should pick one as the canonical "stdin done" signal.
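The two "stdin done" signals can be compared in a small synchronous sketch. This uses `std::io` rather than the POC's async readers, `forward_stdin` and `StdinEnd` are hypothetical names, and the 16 MiB cap is an illustrative limit rather than a value confirmed for the POC. Non-stdin chunks are skipped, which is also an assumption.

```rust
use std::io::{self, Read, Write};

/// Illustrative upper bound on a single chunk payload (assumed, not from the POC).
const MAX_CHUNK_LEN: usize = 16 * 1024 * 1024;

/// Which signal ended the client's input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StdinEnd {
    /// The client sent a zero-length chunk.
    ZeroLengthChunk,
    /// The client closed its write half; the read hit end-of-stream.
    StreamClosed,
}

/// Read [stream_type: u8][length: u32 be][payload] chunks from `from`
/// and forward stdin payloads (stream_type 0) to `to` until either
/// end signal is seen. A header truncated by EOF is treated as closure.
pub fn forward_stdin<R: Read, W: Write>(mut from: R, mut to: W) -> io::Result<StdinEnd> {
    loop {
        let mut header = [0u8; 5];
        match from.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => {
                return Ok(StdinEnd::StreamClosed);
            }
            Err(e) => return Err(e),
        }
        let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;
        if len == 0 {
            return Ok(StdinEnd::ZeroLengthChunk);
        }
        if len > MAX_CHUNK_LEN {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "chunk too large"));
        }
        let mut payload = vec![0u8; len];
        from.read_exact(&mut payload)?;
        if header[0] == 0 {
            to.write_all(&payload)?;
        }
    }
}
```

Returning which signal fired, instead of collapsing both into "done", makes it easy for a spec to later declare one canonical and log or reject the other.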
+
+### 5. bollard version pinning (scoping)
+
+The POC uses the local checkout at 0.21.0. The real crate should depend on published 0.21 from crates.io (the dispatch POC pinned 0.18 — a 3-version jump). The `websocket` feature is optional; the `http` and `pipe` features are needed for socket/http connect. Confirm the published 0.21 has the same API surface as the checkout (it should — same version number).
+
+### 6. The normalization crate boundary (scoping)
+
+Where does `alknet-docker` end and the later normalization crate begin?
+
+**alknet-docker** stays a thin, single-host, bollard-specific wrapper. It talks to one local docker daemon and exposes operations over the call protocol. The POC validates this side.
+
+**The normalization layer** — tentatively `alknet-container` — is the fleet client that talks to multiple alknet-docker endpoints over the call protocol (not bollard). It makes "any docker-capable machine" addressable through one shape, regardless of whether that machine is a dedicated OVH server, a runpod non-GPU instance ($0.07/hr), a vast.ai GPU box, or a local dev box.
+
+**What `alknet-compute` actually is:** a workload — an example of something a normalized `alknet-container` would *run inside* a container it manages, not the fleet layer itself. An earlier conflation of these two is the thing being corrected here.
+
+**The head-worker / machine-node model.** Framed ray.io-style to untangle the fleet topology:
+
+- **Machine node** — any node capable of running docker. Neutral about role.
+- **Head node (hub)** — a node that other nodes connect *to* and that manages them. E.g. a dedicated server hosting its existing containers *plus* a hub endpoint running in a container on that same node.
+- **Worker node (spoke)** — a node that connects *to* a head and exposes its local operations so the head can manage its containers. E.g. a second dedicated server would connect to the hub and expose its docker operations for remote management.
+
+A machine can be both spoke and hub. Two dedicated servers (e.g. rented from OVH) are both machine nodes; one additionally hosts the hub. When scaling dev agents or needing GPUs, rented runpod/vast.ai instances become worker spokes that dial the same hub.
+
+**Prior art: the dispatch POC.** `/workspace/@alkdev/dispatch` is an older POC with out-of-date dependencies that demonstrates the *reverse* of a typical GitHub/Gitea runner: instead of the runner dialing a control plane, the control plane dials into worker nodes over SSH. Its `InstanceProvider` trait (`src/provider.rs`) and `DockerProvider` (`src/docker.rs`, bollard 0.18, `dispatch.managed=true` labels, SSH-key-injection into containers) express the same "normalize heterogeneous compute" idea, but implement it by requiring SSH on the worker end. The SSH requirement is realistic for runpod/vast.ai but is exactly the friction alknet-container removes: the worker dials the hub over the call protocol and exposes its docker operations directly, with no SSH, no key injection, and no port binding to 127.0.0.1.
+
+**How external providers normalize.** runpod exposes a standard OpenAPI spec; `alknet-http`'s `from_openapi` adapter (`crates/alknet-http/src/adapters/from_openapi.rs`) can import it wholesale and surface its operations as call-protocol operations. vast.ai has a similar API but needs customization (no clean OpenAPI drop-in). The normalization crate wraps both behind one `InstanceProvider`-shaped trait so the fleet client is provider-agnostic.
+
+This keeps alknet-docker single-host and bollard-specific; the normalization layer is transport- and provider-agnostic (it talks the call protocol and `from_openapi`-imported HTTP APIs, not bollard or raw SSH).
+
+---
+
+## Test Coverage
+
+```
+running 6 tests
+test frame_completed_carries_empty_payload ... ok
+test raw_chunk_round_trip_stdin_and_stdout ... ok
+test frame_round_trip_request_and_response ... ok
+test docker_attach_raw_round_trips_stdin_to_stdout ... ok
+test docker_logs_subscription_pumps_frames_and_completes ... ok
+test docker_exec_streams_output_and_exit_code ... ok
+
+test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 9.65s
+```
+
+The three docker-integration tests run against a live daemon (Docker Engine 29.2.1, API 1.53) using `alpine:3`. They pull the image if missing, create short-lived labeled containers, and clean up after. The three unit tests validate the frame/codec round-trip without docker.
+
+---
+
+## POC Structure
+
+```
+alknet-docker-poc/
+  Cargo.toml          — depends on bollard (path = "../bollard"), tokio, serde_json
+  src/
+    lib.rs            — module docs, the two-carriage model rationale
+    frame.rs          — EventEnvelope, FrameFramedReader/Writer (mirrors alknet-call wire.rs)
+    raw.rs            — Chunk, ChunkReader/Writer (1-byte stream-type + 4-byte length)
+    ops.rs            — DockerOps: drive_logs, drive_exec, drive_attach_raw
+  tests/
+    integration.rs    — 6 tests (3 docker-integration + 3 codec unit)
+```
+
+---
+
+## Key Code-to-Concept Mappings
+
+| POC concept | alknet-call equivalent | bollard equivalent |
+|---|---|---|
+| `EventEnvelope` (`frame.rs`) | `alknet_call::protocol::wire::EventEnvelope` | — |
+| `FrameFramedReader/Writer` | `alknet_call::protocol::wire::FrameFramedReader/Writer` | — |
+| `call.requested`/`responded`/`completed` | same event types | — |
+| `Chunk` stream_type 0/1/2 | — | `NewlineLogOutputDecoder` header byte (`read.rs:46`) |
+| `drive_logs` pump | `StreamingHandler` returning `Stream<ResponseEnvelope>` | `Docker::logs()` → `Stream<LogOutput>` |
+| `drive_exec` exit code | final `call.responded` before `call.completed` | `Docker::inspect_exec()` → `ExecInspectResponse.exit_code` |
+| `drive_attach_raw` raw handoff | `handle_stream` branch on `carriage: "raw"` (spec decision) | `Docker::attach_container()` → `AttachContainerResults { output, input }` |
+| `Carriage::Json`/`Raw` | (new field in `call.requested` payload) | — |
+
+---
+
+## References
+
+- bollard source (0.21.0): `/workspace/bollard` — `src/container.rs` (`attach_container` at :540, `attach_container_websocket` at :613, `LogOutput` at :96, `AttachContainerResults` at :80), `src/exec.rs` (`CreateExecOptions` at :28, `StartExecResults` enum at :99, `start_exec` at :225), `src/read.rs` (`NewlineLogOutputDecoder` at :32)
+- bollard examples: `/workspace/bollard/examples/attach_container.rs` (reliable attach + tty), `/workspace/bollard/examples/websocket_attach.rs` (websocket attach with reliability warning)
+- alknet-call wire format: `/workspace/@alkdev/alknet/crates/alknet-call/src/protocol/wire.rs` (EventEnvelope, FrameFramedReader/Writer — the POC's `frame.rs` mirrors this)
+- alknet-call dispatch: `/workspace/@alkdev/alknet/crates/alknet-call/src/protocol/dispatch.rs` (`handle_stream` at :295, `pump_stream` at :340 — the streaming pump the POC's `drive_logs`/`drive_exec` mirror)
+- alknet-call registry: `/workspace/@alkdev/alknet/crates/alknet-call/src/registry/registration.rs` (`StreamingHandler` at :20 — the handler shape for subscription ops)
+- dispatch POC (prior art, "reverse runner"): `/workspace/@alkdev/dispatch` — `src/provider.rs` (`InstanceProvider` trait), `src/docker.rs` (bollard 0.18 wrapping, SSH-key-injection model), `src/vast.rs`, `AGENTS.md` (provider architecture summary)
+- alknet-http `from_openapi` adapter (runpod-style provider import): `/workspace/@alkdev/alknet/crates/alknet-http/src/adapters/from_openapi.rs`
+- filesystem POC summary (structure reference): `/workspace/@alkdev/alknet/docs/research/alknet-filesystem/poc-summary.md`
+- SDD process: `/workspace/@alkdev/alknet/docs/sdd_process.md` (Phase 0 exploration → Phase 1 architecture)
+- System docs (private, not in-repo): the maintainer's two-server fleet setup that motivates this design. The fleet use case is captured abstractly in §6 above; the concrete hostnames/IPs/paths are kept out of the public repo.
--- a/docs/research/alknet-tty/phase-0-findings.md
+++ b/docs/research/alknet-tty/phase-0-findings.md
@@ -0,0 +1,617 @@
+---
+status: draft
+last_updated: 2026-07-03
+---
+
+# alknet-tty — Phase 0 Research Findings
+
+This document captures Phase 0 (Exploration) findings for the `alknet-tty`
+crate. The objective of Phase 0 per `docs/sdd_process.md` is: *"Capture vision
+and guiding principles; research options; validate approaches; converge on a
+recommended approach."* It is the input to Phase 1 (Architecture), where the
+Architect will produce `docs/architecture/crates/tty/*.md` specs, ADRs, and
+open questions.
+
+This document was drafted 2026-07-03, immediately after the `alknet-docker`
+POC (`docs/research/alknet-docker/poc-summary.md`) validated that bollard's
+container attach maps cleanly onto a framed bidi stream with a 1-byte
+stream-type multiplexer. The POC's raw chunk format is the seed of
+`alknet-tty`'s wire format.
+
+## Vision Recap
+
+`alknet-tty` is a terminal session protocol handler for the ALPN-as-service
+architecture (ADR-001). It registers the `alknet/tty` ALPN on the shared
+`AlknetEndpoint` and implements the `ProtocolHandler` trait (ADR-002,
+ADR-007).
+
+The guiding insight, surfaced during the alknet-docker POC and recognized in
+the conversation that followed:
+
+> **A terminal session is not an SSH concern, or a Docker concern — it is a
+> terminal concern. SSH and Docker are just two backends that can allocate
+> a PTY.**
+
+The alknet-docker POC proved that the hard part of interactive attach —
+bidirectional byte pumping over a framed stream with a multiplexing header —
+is the same problem regardless of whether the backend is `bollard::attach_container()`
+or russh's `pty_request` + session channel. The POC's raw chunk format
+(`[stream_type: u8][length: u32 be][payload bytes]`, with stream_type
+0=stdin, 1=stdout, 2=stderr) is a deliberately impoverished version of SSH's
+channel multiplexer: fixed set of channel types, no negotiation, no
+open/close handshake, no windowing (QUIC provides flow control on the bidi
+stream). That impoverishment is the feature — a terminal session needs
+exactly those channels and no more.
+
+`alknet-tty` extracts that pattern into its own crate and ALPN. The
+backends (Docker, SSH, local process) implement a `TtyBackend` trait; the
+`alknet/tty` handler is backend-agnostic. This dissolves the PTY hedge in
+the alknet-ssh research (`docs/research/alknet-ssh/phase-0-findings.md`
+DP-5: "shell_request and pty_request default-reject; interactive shell is
+an explicit opt-in") — PTY is not an SSH feature, it's a tty feature that
+SSH happens to be able to provide.
+
+Beyond terminals, the same wire format and backend trait support a general
+"runner" pattern: a process (local `std::process::Command`, docker
+container, SSH exec) whose stdin/stdout/stderr/exit-code are streamed over
+a framed bidi connection. The dispatch project
+(`/workspace/@alkdev/dispatch/`) is a reverse runner that currently requires
+an SSH server on the remote end; with `alknet-tty` and a local-process
+backend, the same runner pattern works without SSH at all — the endpoint
+runs the process directly and streams its I/O back. This is the same shape
+as GitHub/Gitea Actions runners, just over alknet's transport instead of
+HTTP polling.
+
+## Sources Investigated
+
+| Source | Path | Note |
+|--------|------|------|
+| alknet-docker POC | `/workspace/alknet-docker-poc/` | Validated raw chunk format, two-carriage model, bidirectional pumping against live docker. The POC's `src/raw.rs` is the seed of alknet-tty's wire format. |
+| alknet-docker POC summary | `docs/research/alknet-docker/poc-summary.md` | Documents the two-carriage model (JSON negotiation → raw bytes), the three validated targets, the open unknowns. |
+| alknet-ssh phase-0 findings | `docs/research/alknet-ssh/phase-0-findings.md` | DP-5 hedges PTY as an SSH concern; the channel decomposition (Layers 1-7) treats PTY as part of Layer 4 (Session/exec). This document dissolves that hedge. |
+| alknet-core types | `crates/alknet-core/src/types.rs` | `ProtocolHandler`, `Connection`, `SendStream`, `RecvStream` — the handler interface alknet-tty implements. |
+| alknet-call wire format | `crates/alknet-call/src/protocol/wire.rs` | `EventEnvelope`, `FrameFramedReader/Writer` — the JSON carriage layer alknet-tty uses for the initial `call.requested` negotiation frame. |
+| alknet-call dispatch | `crates/alknet-call/src/protocol/dispatch.rs` | `handle_stream` (:295), `pump_stream` (:340) — the streaming pump pattern. alknet-tty's raw-carriage path is a sibling to this, not a consumer of it. |
+| bollard source | `/workspace/bollard/src/` | `container.rs` (`attach_container` :540, `LogOutput` :96, `AttachContainerResults` :80), `read.rs` (`NewlineLogOutputDecoder` :32 — the 8-byte header format our chunk format mirrors), `exec.rs` (`StartExecResults` enum :99) |
+| bollard examples | `/workspace/bollard/examples/attach_container.rs` | Reliable attach + TTY passthrough. |
+| dispatch project | `/workspace/@alkdev/dispatch/` | The "reverse runner" — axum + russh SSH client for exec/forwarding/sync over Docker/vast.ai. `src/handlers.rs` (`start_job`, `job_status`, `job_logs`) is the runner pattern alknet-tty generalizes. Currently requires SSH on the remote; alknet-tty with a local-process backend removes that requirement. |
+| russh source | `/workspace/russh/` | `server::Handler` — `pty_request` (allocates PTY), `window_change` (resize), `signal` (signal forwarding), `shell_request`/`exec_request`. These are the SSH-side operations a `SshTtyBackend` wraps. |
+| alknet-runtime research | `docs/research/alknet-runtime/summary.md` | The "operation host" pattern — a node that exposes ops on a registry. alknet-tty is the same pattern for process execution: a node that can run a process and stream its I/O. |
+| Rust std::process | stdlib | `Command`, `Stdio` (piped stdin/stdout/stderr), `Child::wait` (exit code). The local-process backend. The threading/deadlock caveat (must read stdout/stderr concurrently with writing stdin to avoid pipe-buffer deadlock) is handled by the bidirectional pump, same as docker attach. |
+
+## The Wire Format: From POC to Spec
+
+### What the alknet-docker POC validated
+
+The POC's `src/raw.rs` defines a chunk format for raw carriage on a bidi
+stream:
+
+```text
+[stream_type: u8][length: u32 be][payload bytes]
+```
+
+- `stream_type` mirrors the first byte of the 8-byte header that bollard's
+  `NewlineLogOutputDecoder` parses (`/workspace/bollard/src/read.rs:46`):
+  0=stdin, 1=stdout, 2=stderr. The chunk header itself is 5 bytes.
+- `length` is the payload length in bytes (u32 big-endian, capped at
+  16 MiB).
+- A zero-length chunk is a sentinel (used for completion notification).
+
+The POC proved this format works for:
+- **server→client stdout/stderr**: each `LogOutput` from bollard's attach
+  stream becomes a chunk with the matching stream_type.
+- **client→server stdin**: `ChunkWriter::write_stdin(bytes)` writes a
+  type-0 chunk; the server reads it and writes the bytes to bollard's
+  `container_input` (`AsyncWrite`).
+- **completion**: when bollard's output stream ends (container exited),
+  the server sends a zero-length type-1 chunk as a "drained" sentinel.
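+
+A minimal sketch of that framing, assuming only the layout above. The
+names `write_chunk`, `read_chunk`, and `MAX_CHUNK` are illustrative, not
+the POC's actual `ChunkReader`/`ChunkWriter` API, and the sketch uses
+blocking `std::io` rather than the async streams the POC runs on.
+
+```rust
+use std::io::{self, Cursor, Read, Write};
+
+/// Assumed cap, taken from the 16 MiB limit noted above.
+const MAX_CHUNK: usize = 16 * 1024 * 1024;
+
+/// Writes one chunk: [stream_type: u8][length: u32 be][payload].
+fn write_chunk<W: Write>(w: &mut W, stream_type: u8, payload: &[u8]) -> io::Result<()> {
+    if payload.len() > MAX_CHUNK {
+        return Err(io::Error::new(io::ErrorKind::InvalidInput, "chunk too large"));
+    }
+    w.write_all(&[stream_type])?;
+    w.write_all(&(payload.len() as u32).to_be_bytes())?;
+    w.write_all(payload)
+}
+
+/// Reads one chunk; returns Ok(None) on a clean end of stream.
+fn read_chunk<R: Read>(r: &mut R) -> io::Result<Option<(u8, Vec<u8>)>> {
+    let mut ty = [0u8; 1];
+    if r.read(&mut ty)? == 0 {
+        return Ok(None);
+    }
+    let mut len = [0u8; 4];
+    r.read_exact(&mut len)?;
+    let len = u32::from_be_bytes(len) as usize;
+    if len > MAX_CHUNK {
+        return Err(io::Error::new(io::ErrorKind::InvalidData, "chunk too large"));
+    }
+    let mut payload = vec![0u8; len];
+    r.read_exact(&mut payload)?;
+    Ok(Some((ty[0], payload)))
+}
+
+fn main() -> io::Result<()> {
+    let mut wire = Vec::new();
+    write_chunk(&mut wire, 1, b"hello\n")?; // stdout data
+    write_chunk(&mut wire, 1, b"")?; // zero-length "drained" sentinel
+    let mut r = Cursor::new(wire);
+    while let Some((ty, payload)) = read_chunk(&mut r)? {
+        println!("type={ty} len={}", payload.len());
+    }
+    Ok(())
+}
+```
+
+The length check runs on both sides so that a corrupt or hostile header
+cannot trigger a multi-gigabyte allocation on the reader.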
+
+### What alknet-tty adds
+
+A terminal session needs two things the docker attach POC didn't:
+
+1. **Control messages during the raw phase.** Window resize (SIGWINCH) and
+   signal forwarding (Ctrl-C → SIGINT) must ride *during* the byte stream,
+   not as a new request. The chunk format handles this by reserving a 4th
+   stream_type:
+
+   | stream_type | channel | direction | payload |
+   |---|---|---|---|
+   | 0 | data-in (stdin) | client→server | raw bytes |
+   | 1 | data-out (stdout) | server→client | raw bytes |
+   | 2 | data-err (stderr) | server→client | raw bytes |
+   | 3 | control | bidirectional | JSON control message |
+
+   Control chunks carry a small JSON payload:
+   - `{"type":"resize","cols":80,"rows":24,"pixel_width":0,"pixel_height":0}` —
+     window resize (maps to SSH `window-change`, docker exec resize, or
+     `ioctl(TIOCSWINSZ)` on a local PTY).
+   - `{"type":"signal","name":"INT"}` — signal forwarding (maps to SSH
+     `signal`, a docker container kill with that signal, or `kill(pid, sig)`
+     on a local process).
+   - `{"type":"eof"}` — client signals no more stdin (maps to SSH channel
+     EOF, docker stdin close, or `ChildStdin::drop`).
+   - `{"type":"exit","code":0}` — server signals process exit (terminal,
+     no more data chunks follow; the stream then closes).
+
+2. **Terminal parameters at negotiation time.** The initial `call.requested`
+   frame (JSON carriage, same as the POC) carries the terminal attributes
+   that the backend needs to allocate the PTY:
+
+   ```json
+   {
+     "operationId": "/tty/open",
+     "carriage": "raw",
+     "backend": "docker",
+     "container": "abc123",
+     "tty": {
+       "term": "xterm-256color",
+       "cols": 80,
+       "rows": 24,
+       "pixel_width": 0,
+       "pixel_height": 0,
+       "modes": {}
+     },
+     "cmd": ["/bin/bash"]
+   }
+   ```
+
+   The `tty` block maps directly to SSH's `pty_request` parameters
+   (term, cols, rows, pixel_width, pixel_height, modes) and to docker's
+   `CreateExecOptions { tty: true }`. A local-process backend passes them
+   to `portable_pty::PtySystem::openpty` (or equivalent).
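+
+The control messages in item 1 can be sketched as a small enum that
+serializes to the JSON shapes listed there and frames itself as a type-3
+chunk. `ControlMsg`, `to_json`, and `to_chunk` are hypothetical names, and
+the JSON is hand-formatted only to keep the sketch dependency-free; a real
+implementation would more likely use a JSON library.
+
+```rust
+/// Stream type reserved for control messages in the table above.
+const CONTROL: u8 = 3;
+
+/// The four control messages listed above (hypothetical Rust shape).
+enum ControlMsg {
+    Resize { cols: u16, rows: u16, pixel_width: u16, pixel_height: u16 },
+    Signal { name: &'static str },
+    Eof,
+    Exit { code: i32 },
+}
+
+impl ControlMsg {
+    /// Hand-rolled JSON; acceptable here because no field needs escaping.
+    fn to_json(&self) -> String {
+        match self {
+            ControlMsg::Resize { cols, rows, pixel_width, pixel_height } => format!(
+                r#"{{"type":"resize","cols":{cols},"rows":{rows},"pixel_width":{pixel_width},"pixel_height":{pixel_height}}}"#
+            ),
+            ControlMsg::Signal { name } => format!(r#"{{"type":"signal","name":"{name}"}}"#),
+            ControlMsg::Eof => r#"{"type":"eof"}"#.to_string(),
+            ControlMsg::Exit { code } => format!(r#"{{"type":"exit","code":{code}}}"#),
+        }
+    }
+
+    /// Frames the message as [3][length: u32 be][json bytes].
+    fn to_chunk(&self) -> Vec<u8> {
+        let json = self.to_json();
+        let mut out = Vec::with_capacity(5 + json.len());
+        out.push(CONTROL);
+        out.extend_from_slice(&(json.len() as u32).to_be_bytes());
+        out.extend_from_slice(json.as_bytes());
+        out
+    }
+}
+
+fn main() {
+    let msgs = [
+        ControlMsg::Resize { cols: 120, rows: 40, pixel_width: 0, pixel_height: 0 },
+        ControlMsg::Signal { name: "INT" },
+        ControlMsg::Eof,
+        ControlMsg::Exit { code: 0 },
+    ];
+    for m in &msgs {
+        println!("{} -> {} bytes on the wire", m.to_json(), m.to_chunk().len());
+    }
+}
+```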
+
+### Why fixed channel set, not extensible
+
+SSH's channels are `ChannelId(u32)` with string-named types negotiated per
+channel. alknet-tty's channels are a fixed `u8` set with no negotiation.
+This is a one-way door (adding a 5th channel type is a wire-format change),
+and it's the right one-way door:
+
+- **The use cases are bounded.** A terminal session has stdin, stdout,
+  stderr, and control. If something genuinely new appears (say, a
+  sideband file-transfer channel alongside the terminal), that's a
+  different ALPN, not a 5th tty channel type. The ALPN model handles
+  extensibility at the protocol level — a new ALPN is cheap, a wire-format
+  change is not.
+- **1 byte vs length-prefixed string + negotiation round-trip.** The fixed
+  set is faster, simpler, and the demuxing is a `match` instead of a hash
+  lookup. For a terminal session where every chunk is hot, this matters.
+- **The comparison to SSH channels is the justification, not the
+  constraint.** SSH needs dynamic channels because it multiplexes
+  *arbitrary* services (forwarding, SFTP, agent, X11) over one connection.
+  alknet-tty multiplexes *one* service (a terminal session) with a fixed
+  channel structure. The impoverishment is the feature.
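+
+A sketch of what "demuxing is a `match`" looks like in practice, assuming
+a hypothetical `StreamType` enum over the four values in the table above.
+Unknown bytes are rejected outright, which is the behavior a fixed,
+non-negotiated channel set implies.
+
+```rust
+use std::convert::TryFrom;
+
+/// The fixed channel set; discriminants are the wire values.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum StreamType {
+    Stdin = 0,
+    Stdout = 1,
+    Stderr = 2,
+    Control = 3,
+}
+
+impl TryFrom<u8> for StreamType {
+    /// The offending byte is returned so the caller can report it.
+    type Error = u8;
+
+    fn try_from(b: u8) -> Result<Self, u8> {
+        match b {
+            0 => Ok(StreamType::Stdin),
+            1 => Ok(StreamType::Stdout),
+            2 => Ok(StreamType::Stderr),
+            3 => Ok(StreamType::Control),
+            other => Err(other),
+        }
+    }
+}
+
+fn main() {
+    for b in [0u8, 1, 2, 3, 4] {
+        match StreamType::try_from(b) {
+            Ok(t) => println!("{b} -> {t:?}"),
+            Err(b) => println!("{b} -> rejected (unknown stream_type)"),
+        }
+    }
+}
+```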
+
+## The Backend Trait
+
+The `TtyBackend` trait is the inversion point that keeps alknet-tty
+decoupled from its backends:
+
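+```rust
+// Assumed crates for this sketch: async-trait, bytes, futures, tokio.
+use std::{collections::HashMap, path::PathBuf, pin::Pin};
+
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::{future::BoxFuture, Stream};
+use tokio::io::AsyncWrite;
+
+// `TtyError`, `TerminalParams`, `SshChannelRef`, and `TtyControl` are
+// defined elsewhere in the crate (shapes left to the Phase 1 spec).
+
+#[async_trait]
+pub trait TtyBackend: Send + Sync {
+    /// Starts the process (and PTY, if requested) and returns its I/O handles.
+    async fn allocate(&self, params: &TtyParams) -> Result<TtyHandle, TtyError>;
+}
+
+pub struct TtyParams {
+    pub backend_params: BackendParams,     // backend-specific (container id, ssh channel, local cwd/env)
+    pub terminal: Option<TerminalParams>,  // term, cols, rows, modes; None = piped "runner" mode
+    pub cmd: Vec<String>,
+}
+
+pub enum BackendParams {
+    Docker { container: String },
+    Ssh { channel: SshChannelRef },
+    Local { cwd: Option<PathBuf>, env: HashMap<String, String> },
+}
+
+pub struct TtyHandle {
+    pub stdin: Box<dyn AsyncWrite + Send + Unpin>,
+    pub stdout: Pin<Box<dyn Stream<Item = Bytes> + Send>>,
+    pub stderr: Option<Pin<Box<dyn Stream<Item = Bytes> + Send>>>,  // None if PTY (merged into stdout)
+    pub exit_code: BoxFuture<'static, Result<i32, TtyError>>,
+    pub control: Box<dyn TtyControl + Send + Unpin>,  // resize, signal
+}
+```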
+
+The `TtyAdapter` (the `ProtocolHandler` for `alknet/tty`) receives the
+`Connection`, reads the `call.requested` frame, selects the backend by the
+`backend` field, calls `allocate()`, and pumps bytes bidirectionally using
+the chunk format. Control chunks are dispatched to `TtyHandle::control`.
+When `exit_code` resolves, the server sends a `{"type":"exit","code":N}`
+control chunk and closes the stream.
+
+Three implementations, each in its own crate (the no-handler-depends-on-
+another-handler rule from ADR-003 is preserved — backends depend on
+alknet-tty for the trait, alknet-tty doesn't depend on them):
+
+- **`DockerTtyBackend`** (in alknet-docker, or a thin adapter): wraps
+  bollard's `Docker::attach_container()` → `AttachContainerResults { output, input }`
+  for interactive attach, or `Docker::start_exec` on an exec created with
+  `tty: true` for exec-with-PTY. The POC's `drive_attach_raw` *is* this
+  backend, inlined; with the trait, it becomes
+  `impl TtyBackend for DockerTtyBackend`. `control.resize()` calls
+  `Docker::resize_exec` or `Docker::resize_container_tty`.
+
+- **`SshTtyBackend`** (in alknet-ssh): wraps russh's `pty_request` +
+  `shell_request` (or `exec_request` with a PTY) on a session channel.
+  `channel.into_stream()` gives `(AsyncRead, AsyncWrite)` — the stream
+  *is* the PTY; russh handles kernel PTY allocation on the server side.
+  `control.resize()` sends a `window_change` channel request;
+  `control.signal()` sends a `signal` channel request. stdout and stderr
+  are merged (PTY property), so `TtyHandle.stderr` is `None`.
+
+- **`LocalTtyBackend`** (in alknet-tty or a sibling crate): wraps
+  `std::process::Command` with `Stdio::piped()` for stdin/stdout/stderr,
+  OR `portable_pty` for a real PTY (needed for terminal escape sequences,
+  signal delivery, window resize). Without a PTY, it's a "runner" (piped
+  process); with a PTY, it's a terminal. `control.resize()` calls
+  `ioctl(TIOCSWINSZ)` on the PTY master; `control.signal()` calls
+  `kill(child.pid, sig)`. The threading/deadlock caveat (must read
+  stdout/stderr concurrently with writing stdin to avoid pipe-buffer
+  deadlock) is handled by the bidirectional pump — the same pattern as
+  docker attach, where `tokio::spawn` runs the two directions concurrently.
+
+### The runner generalization
+
+The `LocalTtyBackend` without a PTY is the "runner" pattern: a process
+whose stdin/stdout/stderr/exit-code are streamed over a framed bidi
+connection. This is functionally identical to GitHub/Gitea Actions runners,
+just over alknet's transport instead of HTTP polling:
+
+- A coordinator sends `{"backend":"local","cmd":["cargo","test"],"tty":null}`
+  — no terminal, just a command.
+- The endpoint runs `cargo test` with piped stdio, streams stdout/stderr
+  chunks back, sends `{"type":"exit","code":N}` when it finishes.
+- The coordinator gets reliable completion notification (the exit control
+  chunk + stream close) — the same stopgap property as the docker logs
+  subscription.
+
+The dispatch project (`/workspace/@alkdev/dispatch/`) is a reverse runner
+that currently requires an SSH server on the remote end (it uses russh to
+exec commands and stream output). With `LocalTtyBackend`, the same pattern
+works without SSH — the endpoint runs the process directly. SSH becomes
+one transport option (for reaching hosts that don't run alknet), not a
+requirement. This is "discuss afterwards" territory per the conversation,
+but the trait shape preserves the option.
+
+## What This Dissolves in alknet-ssh
+
+### DP-5's PTY hedge
+
+The alknet-ssh research (`phase-0-findings.md` DP-5) says:
+
+> `shell_request` and `pty_request` default-reject; `exec_request`
+> permitted (gated by ACL). This keeps alknet-ssh a focused forwarding/exec
+> appliance rather than a general-purpose interactive login server.
+> Interactive shell is an explicit opt-in (two-way door).
+
+With alknet-tty, PTY is not an SSH feature — it's a tty feature. alknet-ssh
+implements `TtyBackend` for SSH session channels; alknet-tty owns the
+terminal session lifecycle. alknet-ssh's session channel (Layer 4) still
+does `exec` (structured, JSON carriage, exit code on completion) but
+*delegates* PTY to alknet-tty. The "default-reject" stance stays for the
+SSH channel policy (alknet-ssh still rejects `pty_request` on its own
+session channels — it doesn't serve terminals directly), but the PTY
+capability is provided by a separate crate via a separate ALPN, not hedged
+inside alknet-ssh.
+
+### Layer 4 simplifies
+
+The alknet-ssh build order was "1-4 first (SSH+exec), then 5 (forwarding),
+then 6/7 (SOCKS5/SFTP)." PTY was a deferred wart on Layer 4. With
+alknet-tty, Layer 4 is just `exec` (one-shot command, JSON carriage, exit
+code on completion) — clean and complete. PTY is a *different ALPN*
+(`alknet/tty`) that happens to use SSH as its backend.
+
+### The browser case gets a terminal for free
+
+The alknet-ssh research notes the browser runs a WASM SSH client over
+WebTransport (ADR-040). But a browser terminal (xterm.js) doesn't want SSH
+— it wants a terminal. With `alknet/tty` as an ALPN, xterm.js connects via
+WebTransport to `/alknet/tty`, negotiates a session (docker container, SSH
+PTY, or local process), and gets raw bytes. The browser doesn't need to
+implement SSH at all for the terminal use case — it only needs SSH if it
+wants SSH-specific features (port forwarding, SFTP). This is a cleaner
+browser story than "run a WASM SSH client."
+
+## Straightforward Parts
+
+These are settled by the POC, existing ADRs, and the wire format above.
+Phase 1 should document them as spec rather than re-litigate.
+
+### 1. alknet-tty is a `ProtocolHandler` on `alknet/tty`
+
+Same pattern as every other handler: `TtyAdapter` implements
+`ProtocolHandler::handle(&self, connection: Connection, auth: &AuthContext)`
+with `alpn() = b"alknet/tty"`. The handler owns the entire `Connection`
+lifecycle (ADR-006) and accepts one bidi stream per terminal session.
+
+### 2. The two-carriage model is inherited from the POC
+
+The initial `call.requested` frame is JSON (length-prefixed `EventEnvelope`,
+identical to alknet-call's `FrameFramedReader/Writer`). After the request,
+the stream switches to raw chunks. The `carriage` field in the request
+payload is `"raw"` for terminal sessions. This is the same mechanism the
+POC validated; no new wire-format invention.
+
+### 3. Raw chunk format is POC-validated
+
+The `[stream_type: u8][length: u32 be][payload]` format, the `ChunkReader`/
+`ChunkWriter` types, and the bidirectional pump pattern are all directly
+from the POC's `src/raw.rs`. The only addition is `stream_type: 3` for
+control messages: one new value in the existing type byte, so the header
+layout of the validated format is unchanged.
+
+### 4. Backend trait is the inversion point
+
+alknet-tty defines `TtyBackend`; the backend crates (alknet-docker,
+alknet-ssh, local) implement it. The `TtyAdapter` is backend-agnostic.
+This preserves ADR-003's no-handler-depends-on-another-handler rule:
+alknet-tty depends on alknet-core; the backend crates depend on alknet-tty
+(for the trait); alknet-tty doesn't depend on any backend.
+
+### 5. Completion notification is free
+
+The exit control chunk (`{"type":"exit","code":N}`) + stream close gives
+the coordinator deterministic completion notification — the same stopgap
+property the docker POC validated for logs subscriptions. No plugin state,
+no polling. The container/process exiting is the signal.
+
+## Less Straightforward Parts (Decision Points)
+
+### DP-1: Local-process backend in alknet-tty or a sibling crate?
+
+*(Recommended: two-way door — sibling crate, re-exported behind a feature flag)*
+
+The `LocalTtyBackend` (std::process::Command / portable_pty) is the
+simplest backend and the one that enables the runner pattern. It has no
+heavy dependencies (no bollard, no russh — just std + optionally
+`portable_pty`). Two options:
+
+- **(a) In alknet-tty**: the crate ships with the local backend built-in.
+  Pro: zero-config runner, one crate gets you a terminal/process-streaming
+  endpoint. Con: alknet-tty pulls in `portable_pty` even for deployments
+  that only use docker/ssh backends.
+- **(b) In a sibling crate (`alknet-tty-local`)**: alknet-tty defines the
+  trait; the local backend is a separate crate. Pro: alknet-tty stays
+  dependency-light; consumers opt into the local backend explicitly. Con:
+  one extra crate for the common case.
+
+**Recommendation**: **(b) sibling crate**, behind a feature flag on
+alknet-tty for the common case (`features = ["local"]` → re-export from
+`alknet-tty-local`). This keeps alknet-tty's default dependency surface
+minimal while making the local backend a one-feature opt-in. The local
+backend is where the `portable_pty` dependency lives; alknet-tty itself
+depends only on alknet-core and the frame/raw codec. Extraction is cheap
+because the trait is the seam.
+
+### DP-2: PTY vs pipe for the local backend
+
+*(Recommended: two-way door — support both, PTY is opt-in)*
+
+`std::process::Command` with `Stdio::piped()` gives pipes: no terminal
+semantics, so no window resize, no escape-sequence handling, and no
+keyboard-generated signals (an explicit `kill(pid, sig)` still works).
+`portable_pty` gives a real PTY (terminal semantics, resize, signals,
+escape sequences). The `TtyParams.terminal` field distinguishes:
+if `terminal` is `Some(TerminalParams { ... })`, the backend allocates a
+PTY; if `None`, it uses pipes (the runner case).
+
+**Recommendation**: support both. The `TtyHandle.stderr` field is `None`
+for PTY (stdout/stderr merged) and `Some` for pipes (separate streams).
+For pipes, `control.resize()` is a no-op (there is no window to resize),
+while `control.signal()` still forwards via `kill(pid, sig)`.
+The decision is per-session, not per-deployment.
+
+### DP-3: Control message format — JSON vs binary
+
+*(Recommended: two-way door — JSON first, binary if hot)*
+
+Control chunks (stream_type 3) carry a JSON payload (`{"type":"resize",
+"cols":80,"rows":24}`). This is consistent with the call protocol's
+JSON-everything stance and easy to extend. A binary format
+(`[control_type: u8][params...]`) would be faster but harder to extend and
+inconsistent with the negotiation layer.
+
+**Recommendation**: JSON first. Control messages are rare (resize happens
+on window drag, signal on Ctrl-C) — the serialization cost is negligible
+compared to the data chunks. If a hot control path appears (unlikely for
+terminals), a binary format can be added as a `control_type` extension
+without breaking the chunk format.
+
+### DP-4: The threading/deadlock caveat for piped processes
+
+*(Recommended: acknowledged constraint — the bidirectional pump handles it)*
+
+`std::process::Command` with piped stdio can deadlock if stdin writes
+block while stdout/stderr buffers fill — the classic pipe-buffer deadlock.
+The fix is concurrent reads on stdout/stderr alongside stdin writes, which
+is exactly what the bidirectional pump does (the POC's `drive_attach_raw`
+runs the two directions as concurrent `tokio::spawn` tasks). The same
+pattern works for `LocalTtyBackend`: spawn one task pumping stdin→process,
+one task pumping process→stdout-chunks, one for stderr if piped.
+
+**Recommendation**: Phase 1 records this as a known constraint with a
+known solution (concurrent pumping). No design decision needed — the POC
+already proved the pattern. The spec notes that `LocalTtyBackend` must use
+the concurrent-pump pattern, not sequential read-then-write.
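+
+A minimal sketch of the concurrent-pump constraint for a piped child,
+using OS threads from the standard library instead of the `tokio::spawn`
+tasks the POC uses. `run_piped` is a hypothetical helper, and the example
+assumes a Unix-like host where `cat` is on `PATH`.
+
+```rust
+use std::io::{Read, Write};
+use std::process::{Command, Stdio};
+use std::thread;
+
+/// Runs `program args...` with piped stdio, feeding `input` to stdin while
+/// draining stdout and stderr concurrently. Returns (stdout, stderr, exit code).
+fn run_piped(
+    program: &str,
+    args: &[&str],
+    input: Vec<u8>,
+) -> std::io::Result<(Vec<u8>, Vec<u8>, Option<i32>)> {
+    let mut child = Command::new(program)
+        .args(args)
+        .stdin(Stdio::piped())
+        .stdout(Stdio::piped())
+        .stderr(Stdio::piped())
+        .spawn()?;
+
+    let mut stdin = child.stdin.take().expect("piped stdin");
+    let mut stdout = child.stdout.take().expect("piped stdout");
+    let mut stderr = child.stderr.take().expect("piped stderr");
+
+    // Writer thread: `stdin` is dropped when the closure returns, which
+    // closes the pipe and signals EOF to the child.
+    let writer = thread::spawn(move || stdin.write_all(&input));
+    // Stderr gets its own thread so a chatty stderr cannot stall stdout.
+    let err_reader = thread::spawn(move || {
+        let mut buf = Vec::new();
+        stderr.read_to_end(&mut buf).map(|_| buf)
+    });
+
+    // The calling thread drains stdout while the other two run.
+    let mut out = Vec::new();
+    stdout.read_to_end(&mut out)?;
+
+    writer.join().expect("writer thread panicked")?;
+    let err = err_reader.join().expect("stderr thread panicked")?;
+    let status = child.wait()?;
+    Ok((out, err, status.code()))
+}
+
+fn main() -> std::io::Result<()> {
+    // 1 MiB is far larger than a typical pipe buffer, so writing all of
+    // stdin before reading any stdout would stall against `cat`.
+    let input = vec![b'x'; 1 << 20];
+    let (out, err, code) = run_piped("cat", &[], input.clone())?;
+    assert_eq!(out, input);
+    println!("stdout={} bytes, stderr={} bytes, exit={:?}", out.len(), err.len(), code);
+    Ok(())
+}
+```
+
+The async version has the same shape: one task per direction, with the
+exit status awaited only after the output streams have drained.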
+
+### DP-5: Exit code propagation — control chunk vs final data chunk
+
+*(Recommended: one-way door — control chunk)*
+
+The alknet-docker POC validated exit-code-on-final-`call.responded` for
+the JSON carriage path (exec with exit code). The raw carriage path needs
+a different mechanism because there's no `call.responded` after the raw
+phase begins. Two options:
+
+- **(a) Control chunk**: `{"type":"exit","code":N}` as the last chunk
+  before stream close. Clean, explicit, carries the code as structured
+  data.
+- **(b) Final data chunk with exit code**: a special stdout chunk with an
+  exit-code payload. Hacky — overloads the data channel for metadata.
+
+**Recommendation**: **(a) control chunk**. The exit code is control
+metadata, not data. The control channel (stream_type 3) exists for exactly
+this. The chunk is the last thing before stream close; the client reads it
+and knows the process exited with code N. This is a one-way door because
+clients will depend on the "exit chunk is last" invariant.
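+
+A client-side sketch of the "exit chunk is last" invariant, operating on
+already-decoded `(stream_type, payload)` pairs. `drain`, `SessionEnd`, and
+`exit_code` are hypothetical names, and `exit_code` is a deliberately
+naive stand-in for a real JSON parser.
+
+```rust
+/// A decoded chunk: stream_type plus payload.
+type Chunk = (u8, Vec<u8>);
+
+#[derive(Debug, PartialEq)]
+enum SessionEnd {
+    Exited(i32),
+    /// Stream closed without an exit chunk (for example, a dropped connection).
+    Truncated,
+}
+
+/// Naive stand-in for a JSON parser: extracts the integer after `"code":`
+/// from an exit control payload. A real client would use a JSON library.
+fn exit_code(payload: &[u8]) -> Option<i32> {
+    let s = std::str::from_utf8(payload).ok()?;
+    if !s.contains(r#""type":"exit""#) {
+        return None;
+    }
+    let rest = s.split(r#""code":"#).nth(1)?;
+    let digits: String = rest
+        .chars()
+        .take_while(|c| c.is_ascii_digit() || *c == '-')
+        .collect();
+    digits.parse().ok()
+}
+
+/// Consumes server chunks until the stream ends, enforcing "exit is last".
+fn drain(chunks: impl IntoIterator<Item = Chunk>) -> Result<(Vec<u8>, SessionEnd), String> {
+    let mut output = Vec::new();
+    let mut exit = None;
+    for (ty, payload) in chunks {
+        if exit.is_some() {
+            return Err(format!("protocol error: type-{ty} chunk after exit"));
+        }
+        match ty {
+            1 | 2 => output.extend_from_slice(&payload), // stdout/stderr merged for brevity
+            3 => exit = exit_code(&payload),             // non-exit control messages are ignored
+            other => return Err(format!("unexpected stream_type {other} from server")),
+        }
+    }
+    Ok((output, exit.map_or(SessionEnd::Truncated, SessionEnd::Exited)))
+}
+
+fn main() {
+    let chunks = vec![
+        (1u8, b"ok\n".to_vec()),
+        (3u8, br#"{"type":"exit","code":7}"#.to_vec()),
+    ];
+    println!("{:?}", drain(chunks));
+}
+```
+
+Distinguishing `Truncated` from `Exited` is the point of the invariant: a
+stream that closes without an exit chunk is reported as a failure of the
+transport, not as a successful run.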
+
+### DP-6: Multiple sessions per connection
+
+*(Recommended: two-way door — one session per stream, multiple streams per connection)*
+
+A `Connection` (ADR-007) can open/accept multiple bidi streams. Should one
+`alknet/tty` connection host multiple terminal sessions (one per stream),
+or one session per connection?
+
+**Recommendation**: **one session per bidi stream, multiple streams per
+connection**. This matches the call protocol's model (one operation per
+stream, multiple operations per connection) and is the natural fit for
+QUIC's stream multiplexing. A coordinator opens one connection to an
+endpoint and launches multiple sessions (one stream each) for parallel
+tasks. The `TtyAdapter::handle` accepts the connection and loops
+`accept_bi`, dispatching each stream to a session — same pattern as
+alknet-call's `Dispatcher::run_loop` (`protocol/dispatch.rs:369`).
+
+## Recommended Approach
+
+### Crate
+
+`alknet-tty`, depends on `alknet-core` (for `ProtocolHandler`, `Connection`).
+Defines the `TtyBackend` trait, the wire format (chunk codec + control
+messages), and the `TtyAdapter` (`ProtocolHandler` for `alknet/tty`). Does
+not depend on bollard, russh, or portable_pty — those are in the backend
+crates.
+
+### Build order
+
+**Step 1: Wire format + TtyAdapter + mock backend.**
+- Extract `raw.rs` from the POC into alknet-tty's wire format module.
+- Add `stream_type: 3` (control) and the control message types
+  (resize, signal, eof, exit).
+- Implement `TtyAdapter` with a mock backend (in-memory pipes) to validate
+  the full protocol: negotiate → pump → control → exit → close.
+- **Result**: a working `alknet/tty` handler with no real backends, but
+  the wire format and session lifecycle are proven.
+
+**Step 2: LocalTtyBackend (runner).**
+- `alknet-tty-local` crate (or feature): `impl TtyBackend for LocalTtyBackend`
+  using `std::process::Command` with piped stdio.
+- Validate the runner pattern: `cargo test` as the command, stream
+  stdout/stderr/exit over `alknet/tty`.
+- Add `portable_pty` for the PTY case (terminal semantics, resize, signals).
+- **Result**: a working runner/terminal endpoint with no docker or SSH
+  dependency.
+
+**Step 3: DockerTtyBackend.**
+- In alknet-docker: `impl TtyBackend for DockerTtyBackend` wrapping
+  bollard's `Docker::attach_container` / exec with `tty: true`.
+- The POC's `drive_attach_raw` becomes this backend; the `TtyAdapter` calls
+  it via the trait.
+- **Result**: docker containers as terminal sessions via `alknet/tty`.
+
+**Step 4: SshTtyBackend.**
+- In alknet-ssh: `impl TtyBackend for SshTtyBackend` wrapping russh's
+  `pty_request` + `shell_request`/`exec_request` on a session channel.
+- `control.resize()` → `window_change` channel request;
+  `control.signal()` → `signal` channel request.
+- **Result**: SSH PTYs as terminal sessions via `alknet/tty`. alknet-ssh's
+  DP-5 hedge dissolves — PTY is delegated to alknet-tty.
+
+### De-risk POC (extending the alknet-docker POC)
+
+The alknet-docker POC already validated targets 1 (attach round-trip), 2
+(logs completion), and 3 (exec exit code). Two extensions validate the
+alknet-tty additions:
+
+1. **Control message during raw phase** — add `stream_type: 3` to the POC's
+   chunk format, send a `resize` control chunk mid-session, prove the
+   backend receives it. For docker this requires `tty: true` on the exec
+   and bollard's `Docker::resize_exec`. Small POC, validates the control
+   channel mechanism.
+
+2. **PTY allocation via docker exec with TTY** — `CreateExecOptions { tty:
+   true }` allocates a real PTY. Validate that stdout/stderr merge
+   (stream_type always 1) and that resize works. Proves the docker-as-PTY-
+   backend path.
+
+Both are extensions to the existing POC, not new POCs. The wire format and
+bidirectional pump are already proven; these just confirm the control
+channel and PTY-specific paths.
+
+## Open Questions to Carry into Phase 1
+
+- **OQ-TTY-01 (backend trait shape)**: the exact `TtyHandle` field set —
+  is `control` a separate trait object or are resize/signal methods on
+  `TtyHandle` directly? Does `exit_code` belong on the handle or is it a
+  separate `Future` the adapter awaits? Resolved by Phase 1 spec; the POC
+  extension informs the decision.
+- **OQ-TTY-02 (terminal modes)**: SSH's `pty_request` carries TTY modes
+  (echo, raw, canonical, etc.) as an encoded list of opcode/value pairs
+  (RFC 4254 §8). Does alknet-tty support these, or defer to the backend's
+  defaults? Likely defer for v1 (the common case is "default terminal
+  modes"); the `modes` field in `TerminalParams` is reserved for future
+  use.
+- **OQ-TTY-03 (flow control)**: the chunk format has no windowing (QUIC
+  provides flow control on the bidi stream). Is this sufficient for
+  high-throughput stdout (e.g., `cargo build` output)? QUIC's per-stream
+  flow control should handle it, but a POC with real high-volume output
+  would confirm. Low risk — the docker POC's logs subscription handled
+  multi-line output without issue.
+- **OQ-TTY-04 (local backend crate placement)**: confirm `alknet-tty-local`
+  as a sibling crate vs a feature flag on alknet-tty. DP-1 recommends
+  sibling + feature re-export; Phase 1 confirms.
+- **OQ-TTY-05 (runner API surface)**: the "runner" generalization
+  (local-process backend without PTY) is noted as "discuss afterwards" in
+  the conversation. Phase 1 should at minimum preserve the option
+  (`TtyParams.terminal = None` → pipe mode) even if the runner-specific
+  API surface (job management, log persistence, task graph integration) is
+  deferred to a later crate.
+
+## Next Steps (Phase 0 → Phase 1)
+
+1. **POC extension**: extend `/workspace/alknet-docker-poc` with
+   `stream_type: 3` (control) and `tty: true` exec to validate the control
+   channel and PTY allocation. Timeboxed; the wire format is already
+   proven, these are extensions.
+2. **You decide** on the DP recommendations (or amend them). DP-1 (local
+   backend placement) and DP-5 (exit code on control chunk) are the
+   load-bearing choices. DP-2, DP-3, DP-4, DP-6 are defaults recommended
+   as-is.
+3. **Phase 1 (Architect)**: produce `docs/architecture/crates/tty/README.md`
+   + component specs (`tty-wire.md` for the chunk format + control
+   messages, `tty-backend.md` for the `TtyBackend` trait + `TtyHandle`,
+   `tty-adapter.md` for the `ProtocolHandler` + session lifecycle,
+   `tty-local.md` for the local backend / runner), ADRs for the accepted
+   DPs (wire format + fixed channel set, backend trait as inversion point,
+   local backend placement, exit code on control chunk), and the OQs above
+   in `open-questions.md`. Update `docs/architecture/README.md` index and
+   ADR table.
+
+## References
+
+- `docs/research/alknet-docker/poc-summary.md` — the POC that seeded this
+  crate. Raw chunk format, two-carriage model, three validated targets.
+- `/workspace/alknet-docker-poc/src/raw.rs` — the chunk codec
+  (`ChunkReader`, `ChunkWriter`, stream_type 0/1/2) that alknet-tty
+  extends with stream_type 3.
+- `/workspace/alknet-docker-poc/src/ops.rs` — `drive_attach_raw` (the
+  bidirectional pump pattern, the session lifecycle) that the
+  `TtyAdapter` generalizes.
+- `docs/research/alknet-ssh/phase-0-findings.md` — DP-5 (PTY hedge, dissolved
+  by this crate), the channel decomposition (Layers 1-7, PTY moves out of
+  Layer 4), the browser case (xterm.js over WebTransport to `/alknet/tty`).
+- `docs/architecture/decisions/001-alpn-protocol-dispatch.md` — ALPN dispatch
+- `docs/architecture/decisions/002-protocol-handler-trait.md` — ProtocolHandler
+- `docs/architecture/decisions/007-bistream-type-definition.md` — Connection,
+  SendStream, RecvStream
+- `docs/architecture/decisions/003-crate-decomposition.md` — no-handler-depends-
+  on-another-handler (alknet-tty depends on alknet-core; backends depend on
+  alknet-tty for the trait)
+- `docs/architecture/decisions/040-webtransport-alpn-stream-proxy.md` —
+  WebTransport stream → `Connection` (the browser terminal path)
+- `/workspace/bollard/src/read.rs` — `NewlineLogOutputDecoder` (the 8-byte
+  header format our chunk format mirrors)
+- `/workspace/russh/` — `server::Handler` (`pty_request`, `window_change`,
+  `signal`) — the SSH operations a `SshTtyBackend` wraps
+- `/workspace/@alkdev/dispatch/` — the reverse runner that currently requires
+  SSH; `LocalTtyBackend` removes that requirement
+- `docs/research/alknet-runtime/summary.md` — the "operation host" pattern
+  (alknet-tty is the same pattern for process execution)
--- a/tasks/http/server/subscribe-sse-streaming.md
+++ b/tasks/http/server/subscribe-sse-streaming.md
@@ -1,7 +1,7 @@
 ---
 id: http/server/subscribe-sse-streaming
 name: Wire /subscribe handler to GatewayDispatch::invoke_streaming() and pipe BoxStream to SSE
-status: pending
+status: completed
 depends_on: [http/gateway/invoke-streaming]
 scope: narrow
 risk: medium
@@ -153,4 +153,4 @@ stream is dropped (not leaked) on disconnect.

 ## Summary

-> To be filled on completion
+> Replaced /subscribe one-event placeholder with real streaming path. subscribe_handler now calls GatewayDispatch::invoke_streaming() and pipes BoxStream to SSE via subscribe_stream_from_envelope_stream (StreamExt::map). Ok → data: frame, Err → event:error (terminal, stream ends after). Removed placeholder helpers (subscribe_stream_from_envelope, envelope_to_sse_stream). Kept subscribe_stream_internal_error for Internal ops (NOT_FOUND). Added 6 unit tests. Also fixed 2 pre-existing websocket subscription tests that expected INVALID_OPERATION_TYPE but now get call.responded (dispatch_requested routes Subscription via invoke_streaming). 247 tests pass.
--- a/tasks/review-streaming-impl.md
+++ b/tasks/review-streaming-impl.md
@@ -1,7 +1,7 @@
 ---
 id: review-streaming-impl
 name: Review ADR-049 streaming handler implementation for spec conformance and end-to-end correctness
-status: pending
+status: completed
 depends_on: [call/protocol/dispatch-streaming-branch, call/client/from-call-streaming-forwarding, http/gateway/invoke-streaming, http/server/subscribe-sse-streaming, http/adapters/from-openapi-sse-streaming]
 scope: broad
 risk: low
@@ -207,4 +207,4 @@ review.

 ## Summary

-> To be filled on completion
+> Reviewed ADR-049 streaming handler implementation across all 12 checklist points. All type surface, registry, builder, dispatch, from_call, gateway, /subscribe SSE, from_openapi SSE, ADR conformance, end-to-end correctness, pattern consistency, and test coverage items verified. 555 tests pass (306 call + 2 integration + 247 http), clippy clean, fmt clean. Fixed 2 pre-existing websocket subscription tests that expected INVALID_OPERATION_TYPE but now get call.responded (dispatch_requested routes Subscription via invoke_streaming). All 9 ADR-049 decisions implemented. Placeholders removed (subscribe_stream_from_envelope, envelope_to_sse_stream, stream_subscription). from_mcp unchanged (always HandlerKind::Once).
Author	SHA1	Message	Date
glm-5.2	8c7443c7c6	docs(research): fix alknet-docker POC normalization crate boundary — alknet-compute is a workload not the fleet layer; add head-worker/machine-node model and dispatch reverse-runner prior art	2026-07-03 10:28:53 +00:00
glm-5.2	4500338384	docs(research): add alknet-tty phase-0 findings — terminal session protocol as separate ALPN, TtyBackend trait, dissolves alknet-ssh PTY hedge	2026-07-03 07:48:21 +00:00
glm-5.2	157f1dfb18	docs(research): add alknet-docker POC summary — validates two-carriage model (JSON + raw) for bollard docker ops over framed bidi streams	2026-07-02 17:08:52 +00:00
glm-5.2	e258ce0523	docs(review): mark review-streaming-impl completed — ADR-049 streaming handler review passes all 12 checklist points	2026-07-02 10:12:19 +00:00
glm-5.2	ab610730c0	docs(http): mark http/server/subscribe-sse-streaming completed — /subscribe pipes BoxStream to SSE	2026-07-02 10:10:55 +00:00
glm-5.2	c77024cdf5	fix(http): update websocket subscription tests to expect call.responded (dispatch_requested now routes Subscription via invoke_streaming)	2026-07-02 10:10:42 +00:00