Reframes the SSH scope around the channel multiplexer as the decomposition point. Each feature (forwarding, SOCKS5, SFTP) is a channel type or a consumer of channel types, stacking on the core — each layer functional when built, none shipped broken. Dissolves the 'massive v1' framing that produced hedging language proposing non-functional or half-built versions. Three developments since the initial 2026-06-25 research changed the framing: (1) WebTransport landed as ADRs 038/040/043, grounding SSH-over-WebTransport as a constraint (the handler must be source-agnostic about its Connection); (2) russh's runtime abstraction (russh-util swaps tokio::spawn for wasm_bindgen_futures on wasm32) means the SSH *client* runs in WASM when fed a WebTransport BiStream — the browser case is real, not speculative; (3) the http crate intersection (ALPN-stream-proxy depends on SSH handlers being source-agnostic) is now visible and specified. The layered build order (1-4 stream+connection+channels+exec, then 5 forwarding, then 6 SOCKS5, then 7 SFTP) doubles as the configuration surface: each layer beyond the core is an opt-in channel type, gating on the default-deny ACL baseline inherited from russh.
| status | last_updated |
|---|---|
| draft | 2026-06-29 |
alknet-ssh — Phase 0 Research Findings
This document captures Phase 0 (Exploration) findings for the alknet-ssh
crate. The objective of Phase 0 per docs/sdd_process.md is: "Capture vision
and guiding principles; research options; validate approaches; converge on a
recommended approach." It is the input to Phase 1 (Architecture), where the
Architect will produce docs/architecture/crates/ssh/*.md specs, ADRs, and
open questions.
This document was initially drafted 2026-06-25 and revised 2026-06-29 to reflect two developments that changed the framing: (1) the WebTransport architecture landed as ADRs 038/040/043, grounding the SSH-over-WebTransport path that was previously speculative; (2) the recognition that SSH's channel multiplexer is the natural decomposition point, dissolving the "massive v1 scope" problem into a stack of independently functional layers.
Vision Recap
alknet-ssh is the SSH protocol handler for the ALPN-as-service architecture
(ADR-001). It registers the alknet/ssh ALPN on the shared AlknetEndpoint
and implements the ProtocolHandler trait (ADR-002, ADR-007).
The guiding insight, carried over from the reference implementation at
/workspace/@alkdev/alknet-main/, is:
SSH does not care where its underlying byte stream comes from.
The reference implementation built on this — it ran the russh SSH-2 state
machine over a Transport-produced duplex stream (AsyncRead + AsyncWrite + Unpin + Send) rather than over its own TCP sockets. The greenfield rebuild
keeps the insight and drops the messy transport-abstraction layer that grew
around it: in the new model the AlknetEndpoint hands the handler a
Connection (quinn/iroh QUIC), and the handler is responsible for
opening/accepting the bidirectional QUIC stream that carries the SSH-2
protocol. The same handler can equally be reached via a WebTransport stream
proxied through the h3 ALPN-stream-proxy (ADR-040) — the handler sees a
Connection either way, and SSH doesn't care.
The reference implementation reportedly has ~3.5k clones in 14 days on the GitHub push mirror (30-60 unique clones/day, a mix of bots and humans/LLMs inspecting it), which suggests real-world demand for the "SSH-over-arbitrary-stream" capability. The greenfield effort is a total rewrite; the vault was initially copied over and has since been rewritten as well.
Sources Investigated
| Source | Path | Note |
|---|---|---|
| Existing arch docs (core) | docs/architecture/crates/core/* | ProtocolHandler, Connection, BiStream, AuthContext, IdentityProvider, Endpoint |
| Existing arch docs (http) | docs/architecture/crates/http/* | WebTransport substrate, ALPN-stream-proxy (new since initial research) |
| Existing ADRs 001–043 | docs/architecture/decisions/* | ADR-002/007/010/004/011 (core); ADR-038/040/043 (WebTransport, new) |
| russh reference deep-dives | docs/research/references/ssh/russh/01-06 | Overview, keys, protocol, crypto, internals, usage |
| russh-sftp reference deep-dives | docs/research/references/ssh/russh-sftp/01-07 | SFTP protocol, client/server API, data flow |
| russh source (authoritative) | /workspace/russh/ | Cargo.toml version 0.60.2, edition 2024, MSRV 1.85. The cargo registry cache only contains russh-0.49.2; use /workspace/russh/ as canonical. |
| russh-sftp source | /workspace/russh-sftp/ | SFTP subsystem implementation, WASM-targeted protocol parsing |
| alknet Cargo.lock | Cargo.lock | Does not yet contain a russh entry |
| Reference implementation | /workspace/@alkdev/alknet-main/ | crates/alknet-core/src/{interface/ssh.rs, server/*, client/*, socks5/*} |
| Concrete consumer | /workspace/@alkdev/dispatch/ | axum + russh = "0.60" SSH client for "reverse git runner" over Docker/vast.ai. Textbook consumer of the SSH client + forwarding primitives. |
Note on the russh clone:
/workspace/russh declares version = "0.60.2" with edition = "2024" and MSRV 1.85, matching the research references. The cargo-cache mismatch (0.49.2 only) matters because 0.49.2 → 0.60.2 spans major API changes (the server::run_stream generic signature, the Auth enum shape, and the server::Handler method set all differ). When alknet-ssh's Cargo.toml pins russh = "0.60", Cargo will fetch the matching 0.60.x.
The Channel Decomposition (Core Insight)
The initial research framed alknet-ssh's scope as a single massive v1: server
+ client + SOCKS5 + bidirectional port forwarding, all at once. That framing made the crate feel unmanageably large and produced hedging language ("v1 default," "can be revisited later," "two-way door, decide later") that proposed shipping non-functional or half-built versions. This revision dissolves that problem by recognizing that SSH's channel multiplexer is the natural decomposition point, and the features that felt like a massive scope are layers that stack on top of it, each functional on its own.
How SSH channels work
SSH multiplexes multiple logical channels over a single encrypted transport
stream (RFC 4254). ChannelId(u32) identifies channels; all channel traffic
(CHANNEL_OPEN/DATA/EOF/CLOSE/...) is interleaved on the single
underlying SSH transport. This is independent of QUIC's own stream
multiplexing — one QUIC bistream (or one WebTransport stream, or one TCP
connection) ↔ one SSH connection ↔ many SSH channels riding inside it.
The crucial property: channel types are negotiated. If one side requests a channel type the other doesn't implement, the request is rejected with an error. This means a partial channel implementation is not "broken" — it correctly negotiates the types it supports and rejects the ones it doesn't. This is the opposite of a half-built protocol; it's a layered protocol where each layer stands on its own.
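The negotiation property can be illustrated with a small std-only sketch. The reason codes are the SSH_MSG_CHANNEL_OPEN_FAILURE values from RFC 4254 §5.1; `ChannelSupport` and `on_channel_open` are hypothetical names used only for illustration and are not part of russh or alknet-ssh.

```rust
use std::collections::HashSet;

/// Reason codes carried by SSH_MSG_CHANNEL_OPEN_FAILURE (RFC 4254 §5.1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpenFailure {
    AdministrativelyProhibited = 1,
    ConnectFailed = 2,
    UnknownChannelType = 3,
    ResourceShortage = 4,
}

/// Hypothetical record of the channel types a given build implements.
pub struct ChannelSupport {
    implemented: HashSet<&'static str>,
}

impl ChannelSupport {
    pub fn new(types: &[&'static str]) -> Self {
        ChannelSupport { implemented: types.iter().copied().collect() }
    }

    /// Decide how to answer a CHANNEL_OPEN for `channel_type`.
    /// An unimplemented type is refused cleanly; it is not a protocol error.
    pub fn on_channel_open(&self, channel_type: &str) -> Result<(), OpenFailure> {
        if self.implemented.contains(channel_type) {
            Ok(())
        } else {
            Err(OpenFailure::UnknownChannelType)
        }
    }
}
```

Under these assumptions, a build that only implements "session" answers a "direct-tcpip" open with reason code 3, and the rest of the connection keeps working.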
The layer stack
Layer 7: SFTP subsystem (session channel; "subsystem" request, name: "sftp")
Layer 6: SOCKS5 server (consumer of Layer 5 — opens direct-tcpip channels)
Layer 5: Port forwarding (channel types: "direct-tcpip", "forwarded-tcpip")
Layer 4: Session / exec (channel type: "session"; exec/shell/pty requests)
Layer 3: Channel multiplexer (russh internal — CHANNEL_OPEN/DATA/CLOSE)
Layer 2: SSH connection (key exchange, auth, encrypted session)
Layer 1: Stream transport (QUIC bistream / WebTransport stream / TCP)
Each layer is functional when built:
- Layers 1-4 (stream + SSH connection + channels + session/exec): a working SSH server that authenticates and runs commands. This is immediately useful: it's the dispatch "reverse git runner" primitive (exec on a session channel) and the foundation everything else builds on.
- + Layer 5 (port forwarding): add the direct-tcpip (local→remote) and forwarded-tcpip (remote→local, requested via tcpip_forward) channel types. Now the SSH connection can forward ports in both directions. Each forwarded connection is a channel, not a separate transport stream. This unlocks the VPN-like topology (WireGuard + Postgres + Redis over SSH forwarding) that the reference implementation was built for.
- + Layer 6 (SOCKS5): a SOCKS5 server that accepts local connections and opens direct-tcpip channels to forward them. It's a consumer of the forwarding API, not a new channel type: SOCKS5 is a protocol spoken on the client side (the entity that wants to proxy), and the forwarding channel is what carries the bytes. This is where the "maybe a separate crate" question lives: SOCKS5 is a consumer of Layer 5's API, so if that API is clean, SOCKS5 can be in alknet-ssh or extracted (a two-way door).
- + Layer 7 (SFTP): a session channel carrying a "subsystem" request (name "sftp") that runs the SFTP protocol. russh-sftp::server::run takes the channel's stream (channel.into_stream() → AsyncRead + AsyncWrite + Unpin + Send) and a handler. It's another channel-layer consumer, stacking on Layer 3/4.
No layer ships broken. You build 1-4, ship a working SSH+exec appliance. You add 5, ship a working SSH+forwarding appliance. You add 6, ship a working SSH+SOCKS5 proxy. You add 7, ship SFTP. Each increment is a complete, functional SSH server for the channel types it supports — and a clean rejection for the ones it doesn't. This is decomposition, not phasing: there is no "phase 1 ships something that can't be used."
What this means for the crate boundary
The decomposition clarifies which pieces are "foundational to SSH" vs "consumers of SSH":
- Foundational (in alknet-ssh): Layers 1-5. The stream transport, SSH connection, channel multiplexer, session/exec, and port forwarding are the SSH protocol itself. Forwarding (direct-tcpip / forwarded-tcpip) is defined by RFC 4254 §7; it's not an add-on, it's part of the protocol.
- Consumer (in alknet-ssh or extractable): Layers 6-7. SOCKS5 and SFTP are consumers of the channel API. SOCKS5 is a proxy protocol that opens forwarding channels; SFTP is a file protocol that runs over a subsystem channel. Both could live in alknet-ssh or in separate crates. The decision is a two-way door because they consume a clean interface (the channel/stream API), so extraction is cheap if a second consumer appears.
The "maybe a separate socks proxy crate, and maybe not" question is answered by this framing: start with SOCKS5 in alknet-ssh (the VPN-like use case needs it there), and extract only if a second consumer of the forwarding API appears — the stream-agnostic philosophy makes extraction cheap. SFTP is the same: start with it as a subsystem the SSH handler can serve, extract only if warranted. Neither is deferred; both are built as stacking layers.
What's Changed Since Initial Research
Three things changed between the initial 2026-06-25 research and this revision:
1. WebTransport is now architecturally grounded
ADRs 038 (HTTP/3 + WebTransport as first-class), 040 (WebTransport
ALPN-stream-proxy), and 043 (WebTransport as a bidirectional ALPN transport
substrate) now exist. The path "a browser opens a WebTransport session to
/alknet/ssh, the h3 handler proxies the stream to SshAdapter::handle(),
the browser runs a WASM SSH client over the stream" is no longer speculative
— the substrate is specified. ADR-040 Assumption 2 states the constraint
explicitly: the target ALPN handler accepts a proxied Connection; if a
handler assumes its Connection came from a specific QUIC source, it breaks
the proxy. alknet-ssh must not assume its stream came from accept_bi() on a
native QUIC connection — it could be a WebTransport stream wrapped as a
Connection.
This is a constraint on alknet-ssh's design, not a feature to add later:
the handler's stream-acquisition path must be source-agnostic from the start.
The tokio::io::join(recv, send) adapter works identically whether the halves
came from a QUIC bistream or a WebTransport stream — both produce
AsyncRead + AsyncWrite + Unpin + Send. The constraint is satisfied by
construction if alknet-ssh uses the BiStream/Connection abstraction rather
than reaching for concrete quinn types.
2. The SSH client can run in WASM
The initial research (DP-7) framed tokio as a hard transitive dependency and
treated WASM as a one-way-door closure on the server side (OQ-09). That's
correct for the server dispatch path (the accept loop uses tokio::spawn,
the endpoint is quinn-bound), but incorrect for the client side.
Verifying against /workspace/russh/russh-util/src/runtime.rs:
#[cfg(target_arch = "wasm32")]
macro_rules! spawn_impl { ($fn:expr) => { wasm_bindgen_futures::spawn_local($fn) }; }
#[cfg(not(target_arch = "wasm32"))]
macro_rules! spawn_impl { ($fn:expr) => { tokio::spawn($fn) }; }
russh's spawn swaps to wasm_bindgen_futures::spawn_local on wasm32, and
russh-util/src/time.rs swaps to a chrono-based Instant on WASM. The client
connect_stream<H, R>(config, stream, handler) path takes a generic
R: AsyncRead + AsyncWrite + Unpin + Send + 'static — if the stream is
provided externally (a WebTransport BiStream implemented in WASM), the
client state machine runs in WASM. The russh-sftp protocol parsing already
targets WASM, confirming the pattern.
The browser case is real: a browser connects via WebTransport to
/alknet/ssh, the hub's h3 handler proxies the stream to SshAdapter, and
the browser runs a WASM build of the alknet-ssh client (russh client +
connect_stream over a WebTransport BiStream) to speak SSH over the proxied
stream. The browser doesn't open native ports — it sends packets over the SSH
protocol, which forwards them as channels. The server side stays tokio-native
(the accept loop, the endpoint); the client side is the WASM target.
This reframes DP-7: tokio is a hard dependency for the server path, but
the client path is WASM-compatible because russh already abstracted its
runtime. alknet-ssh's client API must not reach for tokio-specific types
(TcpStream, tokio::net) in its public surface — the client should take a
stream, like russh's connect_stream does, so a WASM build can feed it a
WebTransport BiStream.
3. The http crate intersection is now visible
The alknet-http specs are drafted (ADR-036 through ADR-043). The
ALPN-stream-proxy (ADR-040) means alknet-http's h3 handler holds a
HandlerRegistry reference and routes WebTransport streams to ALPN handlers by
CONNECT path. alknet-ssh is one of those handlers. This is a structural
relationship: alknet-ssh doesn't depend on alknet-http, but alknet-http's
WebTransport path depends on alknet-ssh (and every other ALPN handler) being
source-agnostic about its Connection. The specs must be consistent on this
point — ADR-040 Assumption 2 is the contract both crates must honor.
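A minimal sketch of the routing relationship described above, assuming the CONNECT path is simply the ALPN name with a leading slash (that mapping rule is an assumption for illustration, not something the ADRs are quoted as specifying here). The `ProtocolHandler` trait is reduced to a single method and `route_connect_path` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real ProtocolHandler trait (only `alpn` is modelled).
pub trait ProtocolHandler {
    fn alpn(&self) -> &'static [u8];
}

pub struct SshAdapter;

impl ProtocolHandler for SshAdapter {
    fn alpn(&self) -> &'static [u8] {
        b"alknet/ssh"
    }
}

/// Hypothetical registry keyed by ALPN bytes.
#[derive(Default)]
pub struct HandlerRegistry {
    by_alpn: HashMap<Vec<u8>, Box<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    pub fn register(&mut self, handler: Box<dyn ProtocolHandler>) {
        self.by_alpn.insert(handler.alpn().to_vec(), handler);
    }

    /// Assumed rule: CONNECT path "/alknet/ssh" selects ALPN "alknet/ssh".
    /// The handler never learns whether the stream was native QUIC or proxied.
    pub fn route_connect_path(&self, path: &str) -> Option<&dyn ProtocolHandler> {
        let alpn = path.strip_prefix('/')?;
        self.by_alpn.get(alpn.as_bytes()).map(|h| &**h)
    }
}
```

The point of the sketch is the direction of the dependency: the registry knows about handlers, while `SshAdapter` contains nothing that refers to the proxy.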
Straightforward Parts
These are settled by existing ADRs, the reference implementation, and the channel decomposition. Phase 1 should document them as spec rather than re-litigate them.
1. SSH is a ProtocolHandler on alknet/ssh
Confirmed by overview.md's ALPN Registry and core-types.md. SshAdapter
implements ProtocolHandler::handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> with alpn() = b"alknet/ssh". The
handler owns the entire Connection lifecycle (ADR-006: one ALPN, one
connection, one handler). It may open/accept multiple QUIC streams, but SSH
needs only one: russh multiplexes all SSH channels inside a single bistream.
2. SSH runs over a single bidirectional stream — source-agnostic
The reference implementation's transport/iroh_transport.rs proves the
approach: open a QUIC bistream, join the two halves into a single duplex
type with tokio::io::join(recv, send) and feed that to russh. This is a
one-liner:
// from alknet-main/.../iroh_transport.rs:94
let conn = self.endpoint.connect(self.node_id, ALPN).await?;
let (send, recv) = conn.open_bi().await?;
Ok(io::join(recv, send)) // produces: AsyncRead + AsyncWrite + Unpin + Send
tokio::io::join already produces the AsyncRead + AsyncWrite combo russh
requires (russh internally re-splits via tokio::io::split). No custom
adapter struct is required — Connection::accept_bi() / open_bi() plus
tokio::io::join is sufficient for the QUIC path, and the same join pattern
works for a WebTransport stream wrapped as a Connection (ADR-040).
This is now a constraint, not just a finding: per ADR-040 Assumption 2,
the handler must accept a Connection that came from a WebTransport stream,
not assume it came from a native QUIC accept_bi(). The BiStream/Connection
abstraction (ADR-007) is what makes this work — alknet-ssh must use it, not
reach for concrete quinn types.
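The source-agnostic constraint can be sketched without any async runtime. The following is a synchronous, std-only analogue of the join pattern: `Join` is a stand-in written for this example (it is not `tokio::io::join`), and `exchange_greeting` is a hypothetical handler entry point whose only requirement is a duplex bound.

```rust
use std::io::{self, Read, Write};

/// Synchronous stand-in for the join adapter: glues a read half and a
/// write half into one duplex value.
pub struct Join<R, W> {
    reader: R,
    writer: W,
}

pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Join<R, W> {
    Join { reader, writer }
}

impl<R, W> Join<R, W> {
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

impl<R: Read, W: Write> Read for Join<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

impl<R: Read, W: Write> Write for Join<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}

/// Hypothetical handler step: it names only the duplex bound, so the caller
/// decides whether the halves came from QUIC, WebTransport, or a test buffer.
pub fn exchange_greeting<S: Read + Write>(stream: &mut S) -> io::Result<Vec<u8>> {
    stream.write_all(b"hello\n")?;
    let mut peer = Vec::new();
    stream.read_to_end(&mut peer)?;
    Ok(peer)
}
```

Because the function is generic over the stream, an in-memory pair (a `Cursor` for the read half, a `Vec<u8>` for the write half) exercises the same code path a real transport would.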
3. russh accepts a generic stream on both client and server side
Verified from /workspace/russh/russh/src/:
- server::run_stream<H, R>(config: Arc<Config>, stream: R, handler: H) where R: AsyncRead + AsyncWrite + Unpin + Send + 'static (server/mod.rs:997).
- client::connect_stream<H, R>(config: Arc<Config>, stream: R, handler: H) with the same bound (client/mod.rs:982).
Neither path assumes TCP — TCP-specific code (set_nodelay, TcpListener) is
confined to run_on_socket / connect / run_on_address. The generic stream
path is clean of TCP assumptions. russh writes its own SSH identification banner
first, then reads the peer's — no caller-side banner pre-work is needed.
4. SSH channels multiplex inside the stream — this is the decomposition axis
ChannelId(u32) identifies channels; all channel traffic is interleaved on
the single underlying SSH transport stream that russh owns. Port forwarding
(direct-tcpip, forwarded-tcpip) is ordinary channel traffic — each
forwarded TCP connection is a channel, not a separate stream. SFTP is a
subsystem channel. SOCKS5 is a consumer of forwarding channels.
This is the cleanest mapping and the right default: alknet-ssh does not try to map SSH channels onto QUIC streams (which would require bypassing russh's own multiplexer). It hands russh one bistream and lets russh multiplex inside it. The channel multiplexer is the decomposition point — each feature (forwarding, SOCKS5, SFTP) is a channel type or a consumer of channel types, stacking on Layer 3. See "The Channel Decomposition" above.
5. Auth routes through the shared IdentityProvider
ADR-004 establishes the hybrid auth model: the endpoint resolves what it can
(TLS client cert → fingerprint), the handler resolves what it must (SSH key
fingerprint). auth.md shows the SshAdapter pattern exactly — constructor-
inject Arc<dyn IdentityProvider>, call resolve_from_fingerprint() inside
handle() when auth.identity is None, store the resolved Identity on the
Connection via set_identity() for observability (OQ-11). The
ConfigIdentityProvider already resolves SSH key fingerprints against
DynamicConfig::auth::authorized_keys_fingerprints. No new auth machinery
is needed for SSH.
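The resolution flow described above can be sketched as follows. All types here are simplified stand-ins for the ones in auth.md (the real `Identity`, `AuthContext`, and `IdentityProvider` have shapes not shown in this document), and `resolve_identity` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the resolved identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub name: String,
    pub scopes: Vec<String>,
}

/// Simplified stand-in: what the endpoint managed to resolve before dispatch.
pub struct AuthContext {
    pub identity: Option<Identity>,
}

pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Stand-in for a config-backed provider: fingerprint -> identity table.
pub struct ConfigIdentityProvider {
    pub authorized: HashMap<String, Identity>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).cloned()
    }
}

/// Hybrid model: keep what the endpoint resolved, otherwise fall back to the
/// SSH key fingerprint. `None` means the peer is rejected.
pub fn resolve_identity(
    auth: &AuthContext,
    provider: &dyn IdentityProvider,
    ssh_key_fingerprint: &str,
) -> Option<Identity> {
    match &auth.identity {
        Some(id) => Some(id.clone()),
        None => provider.resolve_from_fingerprint(ssh_key_fingerprint),
    }
}
```

In the real handler the resolved value would then be stored on the Connection via set_identity(); that step is omitted because the Connection type is not modelled here.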
6. Outbound credentials (if any) come from Capabilities
ADR-014 / ADR-022 establish that handlers get outbound credentials through the
registration bundle's capabilities field, populated by the assembly layer
from the vault. SSH itself typically needs no outbound credentials (the SSH
host key is a network-identity concern, the SSH client key for auth comes from
the peer), but if alknet-ssh ever needs an outbound secret (e.g., to dial an
upstream SOCKS proxy), it comes from Capabilities, not from env vars or
vault-on-wire.
7. TCP SSH is a handler concern, not an endpoint concern
ADR-010 is explicit: "TCP is NOT an endpoint concern... the SSH handler can
listen on a TCP socket independently." This means alknet-ssh may optionally bind
a plain TCP listener (port 22-style) and accept raw SSH connections outside
the ALPN endpoint. The alknet/ssh ALPN path and the bare-TCP path can coexist;
they share the same russh::server::Config and the same server::Handler
implementation, differing only in how the stream is obtained. This is a
two-way-door additive capability — the TCP listener can be added without
touching the ALPN path.
8. The WebTransport path is grounded — SSH-over-WebTransport is a constraint
Per ADR-040/043, the h3 handler proxies WebTransport streams to ALPN
handlers. A browser opening a WebTransport session to /alknet/ssh gets its
stream handed to SshAdapter::handle() as a Connection. The browser runs a
WASM SSH client (the alknet-ssh client, built for wasm32) over the stream.
The handler must be source-agnostic about its Connection — this is a
constraint on the design, satisfied by using the BiStream/Connection
abstraction rather than concrete quinn types. This is no longer an open
question; it's a requirement.
Less Straightforward Parts (Decision Points)
These are the points where Phase 0 surfaced genuine choices that affect the architecture. Each is tagged with a door type per ADR-009. The Architect should turn the accepted recommendations into ADRs, and the genuinely unresolved ones into open questions. Door type classifies reversal cost, not urgency — a two-way door is a decision made now that can be reverted later, not a decision to defer (ADR-009 §"What this framework is NOT").
DP-1: Host key sourcing — vault-derived vs config-loaded vs both
(Recommended: one-way door — needs an ADR)
russh's server::Config.keys: Vec<PrivateKey> holds the SSH host keys the
server presents during key exchange. The host key is the SSH layer's analogue
of the TLS layer's network identity — it is what the SSH client verifies
against known_hosts. Three sourcing paths exist:
- (a) Vault-derived: derive an Ed25519 key from the alknet-vault seed (HD path) and use it as the SSH host key. Aligns with the project's "everything keys-from-seed" philosophy (ADR-020, ADR-026) and means the SSH host key is deterministic from the mnemonic — a node restored from mnemonic gets the same SSH host key fingerprint.
- (b) Config-loaded: operator provides SSH host key file path(s) in StaticConfig/DynamicConfig. Matches how OpenSSH works (/etc/ssh/ssh_host_ed25519_key). Simplest, decoupled from the vault.
- (c) Both: vault-derived by default, config override for operators who bring their own keys. Mirrors the TLS identity model (ADR-027's TlsIdentity::RawKey default + X509/Acme for domain-hosted).
Recommendation: (c) both, with vault-derived as the default. This
matches the symmetry with TlsIdentity in endpoint.md and respects the
"fingerprint-based, keys-from-seed" identity model. The vault is local-only by
construction (ADR-025) and assembly-layer-only access (ADR-019), so the SSH host
key is derived at startup and injected into SshAdapter::Config the same way
TLS RawKey identity is. Operators who want stable host keys independent of the
mnemonic can supply a key file. Phase 1 should write an ADR for this and a
corresponding OQ if the exact config-field shape is unresolved.
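One possible shape for option (c), as a sketch only. The exact config-field shape is unresolved in this document, so `HostKeySource`, `select_host_key_source`, and the "empty list means no override" convention are all assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical description of where the SSH host keys come from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HostKeySource {
    /// Default: derive the host key from the vault seed at startup.
    VaultDerived,
    /// Operator override: load host keys from these files instead.
    KeyFiles(Vec<PathBuf>),
}

/// Assumed convention: an empty list of configured key files means the
/// operator supplied no override, so the vault-derived key is used.
pub fn select_host_key_source(configured_key_files: &[PathBuf]) -> HostKeySource {
    if configured_key_files.is_empty() {
        HostKeySource::VaultDerived
    } else {
        HostKeySource::KeyFiles(configured_key_files.to_vec())
    }
}
```

Keeping the choice in one enum would let the assembly layer perform the derivation or file load and hand only the resulting keys to the handler, in line with the vault being assembly-layer-only.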
DP-2: Per-connection host key selection
(Recommended: one-way door — needs an ADR, ties to DP-1)
When supporting multiple host keys (e.g., an Ed25519 default + an RSA key for
legacy clients), russh's server::Config.keys is a Vec and russh negotiates
which to use based on the client's offered algorithms. The selection is
deterministic per-russh-version but not configurable per-connection. Question:
do we need per-peer host key selection (e.g., present different host keys to
different peer networks)? No — one host key set per node, advertised
uniformly. Per-connection selection is not needed; if a use case arises, it's
an additive two-way-door. Phase 1 records the simple model.
DP-3: Crypto backend — aws-lc-rs (default) vs ring
(Recommended: two-way door — decided: aws-lc-rs, can flip later)
russh 0.60.2 requires one of aws-lc-rs (default) or ring to be enabled;
if both are enabled, it silently picks aws-lc-rs. Both provide AES-GCM / ChaCha20-Poly1305.
- aws-lc-rs is the russh default, has broader algorithm coverage, but brings NIST build machinery (a heavier build, requires a C compiler + cmake).
- ring is lighter-weight, smaller binary, simpler build.
- Cross-crate consequence: alknet-core already depends on rustls-acme = "0.12" with features = ["aws-lc-rs"], so aws-lc-rs is already in the workspace's build. Choosing ring for russh while alknet-core uses aws-lc-rs would put both crypto backends in the final binary: wasteful but not incorrect.
Recommendation: aws-lc-rs (aligns with the rest of the workspace and
avoids a duplicate crypto backend). This is a decision, not a deferral: it's
a two-way door that can be flipped by setting default-features = false on
russh and enabling its ring feature if binary-size pressure arises later. Phase 1 notes this; likely not a
full ADR (it's a default, not a structural decision) but a documented design
choice in the ssh spec.
DP-4: Client + forwarding + SOCKS5 + SFTP scope — reframed as layer order
(Recommended: one-way door on "all in alknet-ssh"; two-way door on extraction)
The initial research framed this as "is all of this in v1?" — a massive scope question. The channel decomposition dissolves it. The question is not "do we ship it all at once" but "what's the build order, and are all the layers in alknet-ssh?"
Server side (the ProtocolHandler for alknet/ssh): owns Layers 1-5
(stream transport, SSH connection, channels, session/exec, port forwarding).
These are the SSH protocol itself. Forwarding is defined by RFC 4254 §7 — it's
not an add-on. The server also serves SFTP (Layer 7) as a subsystem channel
when configured.
Client side (outbound SSH dialing): owns the same layers, as a client. The
client opens session channels for exec (the dispatch "reverse git runner"
pattern), opens direct-tcpip channels for local→remote forwarding, and
requests tcpip_forward for remote→local forwarding. The client is the WASM
target — russh's connect_stream runs in WASM when fed a WebTransport
BiStream. This is why the client lives in alknet-ssh, not in each consumer:
dispatch and the VPN-like topology both consume the same client + forwarding
primitives, and the browser case needs the client in WASM.
SOCKS5 (Layer 6): a consumer of the forwarding API. The SOCKS5 server
accepts local connections and opens direct-tcpip channels to forward them.
It lives in alknet-ssh because the VPN-like use case needs it there; if a
second consumer of the forwarding API appears, the SOCKS5 codec can extract to
a tiny alknet-socks5 crate (consuming a byte stream) — a two-way door, cheap
because the interface (the forwarding channel API) is clean.
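To make the "consumer of the forwarding API" relationship concrete, here is a std-only sketch that parses a SOCKS5 CONNECT request (wire layout per RFC 1928 §4) into the host and port that a direct-tcpip channel open would carry. `ForwardTarget`, `Socks5Error`, and `parse_connect_request` are hypothetical names; the method-negotiation step that precedes the request and the channel open itself are not shown.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Destination to hand to a direct-tcpip channel open (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct ForwardTarget {
    pub host: String,
    pub port: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Socks5Error {
    Truncated,
    BadVersion(u8),
    UnsupportedCommand(u8),
    UnsupportedAddressType(u8),
    BadDomain,
}

/// Parse VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT for CMD = CONNECT.
pub fn parse_connect_request(req: &[u8]) -> Result<ForwardTarget, Socks5Error> {
    if req.len() < 4 {
        return Err(Socks5Error::Truncated);
    }
    if req[0] != 0x05 {
        return Err(Socks5Error::BadVersion(req[0]));
    }
    if req[1] != 0x01 {
        return Err(Socks5Error::UnsupportedCommand(req[1]));
    }
    let (host, rest) = match req[3] {
        // IPv4: four address bytes.
        0x01 => {
            let a = req.get(4..8).ok_or(Socks5Error::Truncated)?;
            (Ipv4Addr::new(a[0], a[1], a[2], a[3]).to_string(), &req[8..])
        }
        // Domain name: one length byte, then that many bytes.
        0x03 => {
            let len = *req.get(4).ok_or(Socks5Error::Truncated)? as usize;
            let raw = req.get(5..5 + len).ok_or(Socks5Error::Truncated)?;
            let name = std::str::from_utf8(raw).map_err(|_| Socks5Error::BadDomain)?;
            (name.to_string(), &req[5 + len..])
        }
        // IPv6: sixteen address bytes.
        0x04 => {
            let raw = req.get(4..20).ok_or(Socks5Error::Truncated)?;
            let mut a = [0u8; 16];
            a.copy_from_slice(raw);
            (Ipv6Addr::from(a).to_string(), &req[20..])
        }
        other => return Err(Socks5Error::UnsupportedAddressType(other)),
    };
    // Port is two bytes in network (big-endian) order.
    let p = rest.get(..2).ok_or(Socks5Error::Truncated)?;
    Ok(ForwardTarget { host, port: u16::from_be_bytes([p[0], p[1]]) })
}
```

The codec touches only bytes, which is why extracting it into a separate crate later would be cheap: nothing in it refers to SSH.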
SFTP (Layer 7): a subsystem channel. russh-sftp::server::run takes the
channel's stream and a handler. It's in alknet-ssh as a subsystem the server
can serve; the client side uses russh-sftp::client::SftpSession over a
channel stream. Same extraction logic as SOCKS5 — start in alknet-ssh, extract
only if warranted.
Recommendation: alknet-ssh owns all layers (server + client + forwarding + SOCKS5 + SFTP). The build order is 1-4 first (functional SSH+exec), then 5 (forwarding), then 6 (SOCKS5) and 7 (SFTP) — each layer functional when built, none shipped broken. Phase 1 writes an ADR confirming this scope and the layered build order. The extraction question (SOCKS5/SFTP to separate crates) is a two-way door, decided as "in alknet-ssh, extract if a second consumer appears" — a decision, not a deferral.
DP-5: Channel-policy surface — which SSH services does alknet-ssh expose?
(Recommended: one-way door — needs an ADR; the default-deny baseline is non-negotiable)
russh's server::Handler defaults every channel-request method to reject/no-op
(or, for auth_publickey_offered, accept the offer through to signature
verification). alknet-ssh must decide its default channel policy:
- session channels: the dispatch use case uses channel_open_session().exec() heavily; this is the "reverse git runner" pattern (run a command on the remote instance, capture stdout/stderr/exit). For the server side of alknet/ssh, the question is whether alknet-ssh runs a real shell on its own node. Given the VPN-like / forwarding use case is primary and the "shell server" use case is secondary, the default is exec-only: shell_request and pty_request default-reject; exec_request permitted (gated by ACL). This keeps alknet-ssh a focused forwarding/exec appliance rather than a general-purpose interactive login server. Interactive shell is an explicit opt-in (two-way door).
- port forwarding in both directions (direct-tcpip in, tcpip_forward / forwarded-tcpip out): in scope (Layer 5). The policy (which destinations are allowed, whether to restrict by ACL/scope) needs specifying.
- SFTP subsystem: in scope (Layer 7), gated by ACL.
- PTY/X11/agent forwarding: default-reject for security; explicit opt-in. (Consistent with the exec-only session stance.)
Default-deny baseline: russh's server::Handler already defaults every
channel/auth/forwarding callback to reject or no-op — so alknet-ssh gets
default-deny for free by overriding only the methods it wants to enable. This
is the explicit baseline: every forwarding destination, every exec command,
every channel type must be explicitly permitted by config + ACL, never
implicitly allowed. This applies to both the ALPN/QUIC path and the
bare-TCP path (DP-10) — a TCP-listener client gets exactly the same policy
treatment; only the transport differs.
ACL gating: forwarding destinations and exec commands are gated by scopes on
the resolved Identity. The exact scope vocabulary (e.g., ssh:forward:*,
ssh:forward:127.0.0.1:5432, ssh:exec:git-upload-pack) is a design choice the
Architect makes — likely a small, capability-shaped scope set with wildcards,
consistent with Identity.scopes / Identity.resources (auth.md). The
"resources" field on Identity (populated only by composition per
CompositionAuthority::as_identity, ADR-022) is not available to
fingerprint/token-resolved external identities, so per-destination ACLs for
inbound SSH must live in scopes, not resources.
Recommendation: Phase 1 writes an ADR defining the channel-policy surface:
exec (gated) + bidirectional port forwarding (gated) + SFTP (gated), with
shell/PTY/X11/agent forwarding default-rejected. Default-deny baseline
inherited from russh. Forwarding destinations + exec commands gated by ACL
scopes. The exact scope vocabulary is an OQ for Phase 1 (it interacts with how
operators express "allow forwarding to 127.0.0.1:5432" in DynamicConfig).
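As an illustration of what a scope check could look like, the sketch below uses the example scope strings from this section. The matching semantics (exact match, or a final `:*` segment matching any remainder) are an assumption, since the scope vocabulary is explicitly left as an open question; `scope_permits`, `is_allowed`, and `forward_scope` are hypothetical helpers.

```rust
/// Does one granted scope permit the requested action?
/// Assumed semantics: exact match, or a trailing ":*" that matches any
/// non-empty remainder after the prefix.
pub fn scope_permits(granted: &str, requested: &str) -> bool {
    match granted.strip_suffix(":*") {
        Some(prefix) => requested
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with(':') && rest.len() > 1),
        None => granted == requested,
    }
}

/// Default-deny: with no matching scope, the action is refused.
pub fn is_allowed(scopes: &[&str], requested: &str) -> bool {
    scopes.iter().any(|g| scope_permits(g, requested))
}

/// Build the scope string for a forwarding destination.
pub fn forward_scope(host: &str, port: u16) -> String {
    format!("ssh:forward:{}:{}", host, port)
}
```

An identity holding only ssh:forward:127.0.0.1:5432 could then forward to Postgres on loopback but not to any other port, and an identity with no scopes is refused everything, which matches the default-deny baseline.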
DP-6: Auth method coverage — publickey-only vs password/kbdint too
(Recommended: two-way door — decided: publickey-only, extend later if needed)
russh supports none, password, publickey, keyboard-interactive, and
OpenSSH certificate auth server-side. alknet's identity model (auth.md) is
fingerprint-based — SSH key fingerprint → IdentityProvider → Identity.
This maps naturally onto publickey (the fingerprint is the SHA-256 of the
presented public key) and OpenSSH certificate auth (cert fingerprint).
Password / keyboard-interactive don't fit the fingerprint model as cleanly
(there's no resolve_from_password on IdentityProvider).
Recommendation: publickey-only (and certificate auth, which is a superset of publickey from the fingerprint POV). Password / keyboard-interactive are a two-way door — can be added later if a use case arises. Phase 1 notes this as a documented design choice in the ssh spec, likely not a full ADR (it's a default, not a structural decision).
DP-7: Runtime — tokio (server) vs WASM-compatible (client)
(Recommended: acknowledged constraint — server needs tokio, client is WASM-compatible)
russh 0.60.2 uses russh-util::runtime::spawn, which swaps to
wasm_bindgen_futures::spawn_local on wasm32 and tokio::spawn otherwise.
russh-util::time::Instant swaps to a chrono-based implementation on WASM.
This means:
- Server side (the ProtocolHandler accept path): requires tokio. The endpoint's accept loop uses tokio::spawn, the Connection is quinn-bound, and the dispatch path is a one-way door away from WASM (OQ-09). alknet-ssh's server inherits this: it runs inside the tokio runtime that alknet-core's endpoint already provides (tokio = { version = "1", features = ["full"] }).
- Client side (outbound dialing / the WASM target): WASM-compatible. The client connect_stream path takes a generic stream; if the stream is a WebTransport BiStream implemented in WASM, the client state machine runs in WASM. alknet-ssh's client API must not reach for tokio-specific types (TcpStream, tokio::net) in its public surface. It should take a stream, like russh's connect_stream does, so a WASM build can feed it a WebTransport BiStream. The browser runs the alknet-ssh client in WASM to speak SSH over the proxied WebTransport stream (ADR-040/043).
Recommendation: Phase 1 records the split: server = tokio (hard constraint, consistent with workspace), client = WASM-compatible (russh already abstracted its runtime; alknet-ssh's client API preserves this by taking a stream, not a socket). This is a known constraint, not a decision to fight. OQ-09 (WASM boundaries) documents the server-side closure; the client-side WASM compatibility is a new finding that keeps the browser door open.
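To make the "take a stream, not a socket" constraint concrete, here is a small std-only sketch. It is an illustration under assumptions: `std::io::Read + Write` stands in for the async stream traits the real client would use, `exchange_banners` and `ScriptedStream` are hypothetical names, and only the identification-line exchange is modeled, not the SSH protocol.

```rust
use std::io::{self, Cursor, Read, Write};

/// Sends our identification line and returns the peer's.
/// Generic over any byte stream: no TcpStream or socket type appears
/// in the signature, so an in-memory or browser-provided stream fits.
pub fn exchange_banners<S: Read + Write>(stream: &mut S, ours: &str) -> io::Result<String> {
    stream.write_all(ours.as_bytes())?;
    stream.write_all(b"\r\n")?;
    stream.flush()?;

    // Read byte by byte until LF so nothing past the banner is consumed.
    let mut line = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        if stream.read(&mut byte)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "peer closed before sending a banner",
            ));
        }
        if byte[0] == b'\n' {
            break;
        }
        line.push(byte[0]);
    }
    if line.last() == Some(&b'\r') {
        line.pop();
    }
    String::from_utf8(line).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// In-memory stream used to exercise the sketch: reads come from a
/// pre-recorded buffer, writes are captured for inspection.
pub struct ScriptedStream {
    incoming: Cursor<Vec<u8>>,
    pub outgoing: Vec<u8>,
}

impl ScriptedStream {
    pub fn new(incoming: &[u8]) -> Self {
        ScriptedStream { incoming: Cursor::new(incoming.to_vec()), outgoing: Vec::new() }
    }
}

impl Read for ScriptedStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.incoming.read(buf)
    }
}

impl Write for ScriptedStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.outgoing.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Because the function is bounded only by stream traits, the same call works against a TCP socket, a joined QUIC stream pair, or a test double like `ScriptedStream`.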
DP-8: The ssh-key crate is forked
(Recommended: acknowledged constraint — use the russh re-export)
russh 0.60.2 depends on internal-russh-forked-ssh-key = "0.6.18" (a renamed
fork), not upstream ssh-key. alknet-ssh must not add upstream ssh-key
directly — that would put two ssh-key versions in the tree and the
PublicKey/PrivateKey types wouldn't unify. The fork is re-exported through
russh::keys::ssh_key, so alknet-ssh should always reach key types via
russh::keys::* (or russh::keys::ssh_key::*) to stay on the same fork. Phase
1 notes this as an implementation constraint; it's a real footgun if missed.
DP-9: End-to-end over a non-TCP stream is untested upstream
(Recommended: de-risk early with a POC test)
russh's own test suite only exercises the client↔server round trip over real
TCP loopback. There is no test connecting connect_stream ↔ run_stream
over tokio::io::duplex() or any other in-memory pipe. The SshRead::read_ssh_id
unit tests feed &[u8] directly, proving the banner parser works on
non-socket streams — but a full client↔server round trip over a non-TCP stream
is unverified upstream.
The reference implementation uses this path in production (transport/iroh_transport.rs
using tokio::io::join), which is strong empirical evidence it works. But the
greenfield rewrite should close this gap early with an integration test
using tokio::io::duplex() connecting connect_stream ↔ run_stream before
going near real QUIC. The WebTransport path adds a second POC target: a
WebTransport stream wrapped as a BiStream/Connection fed to run_stream,
validating the ADR-040 Assumption 2 contract (the handler accepts a proxied
Connection).
Recommendation: per sdd_process.md Phase 0, this is a candidate for a
POC Specialist task (.worktrees/research/ssh-stream-poc/). Two POC scenarios:
(1) duplex()-based round trip, (2) WebTransport-stream-as-Connection →
run_stream. Phase 1's architecture docs reference the POC outcomes. If the
POC surfaces issues (half-open stream handling, poll_shutdown semantics,
maximum packet size), they feed back into the spec as constraints.
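The shape of the first POC scenario can be sketched without any dependencies. The following is a std-only stand-in, not the POC itself: two `std::sync::mpsc` channels play the role of tokio::io::duplex(), two threads play client and server, and only identification banners are exchanged. `PipeEnd`, `duplex`, and `round_trip` are hypothetical names; a real POC would drive russh's connect_stream and run_stream over the pipe.

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// One end of an in-memory, bidirectional byte pipe.
pub struct PipeEnd {
    tx: Sender<Vec<u8>>,
    rx: Receiver<Vec<u8>>,
    pending: VecDeque<u8>,
}

/// Builds two connected ends; bytes written to one are read from the other.
pub fn duplex() -> (PipeEnd, PipeEnd) {
    let (a_tx, b_rx) = channel();
    let (b_tx, a_rx) = channel();
    (
        PipeEnd { tx: a_tx, rx: a_rx, pending: VecDeque::new() },
        PipeEnd { tx: b_tx, rx: b_rx, pending: VecDeque::new() },
    )
}

impl Read for PipeEnd {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pending.is_empty() {
            match self.rx.recv() {
                Ok(chunk) => self.pending.extend(chunk),
                Err(_) => return Ok(0), // peer dropped: report EOF
            }
        }
        let n = buf.len().min(self.pending.len());
        for slot in buf.iter_mut().take(n) {
            *slot = self.pending.pop_front().unwrap();
        }
        Ok(n)
    }
}

impl Write for PipeEnd {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0); // never send an empty chunk, it would look like EOF
        }
        self.tx
            .send(buf.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "peer dropped"))?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Reads up to and excluding LF, trimming a trailing CR.
fn read_line<S: Read>(stream: &mut S) -> io::Result<String> {
    let mut line = Vec::new();
    let mut byte = [0u8; 1];
    while stream.read(&mut byte)? == 1 && byte[0] != b'\n' {
        line.push(byte[0]);
    }
    Ok(String::from_utf8_lossy(&line).trim_end().to_string())
}

/// Runs the banner exchange; returns (seen by client, seen by server).
pub fn round_trip() -> (String, String) {
    let (mut client, mut server) = duplex();
    let server_thread = thread::spawn(move || {
        server.write_all(b"SSH-2.0-sketch_server\r\n").unwrap();
        read_line(&mut server).unwrap()
    });
    client.write_all(b"SSH-2.0-sketch_client\r\n").unwrap();
    let seen_by_client = read_line(&mut client).unwrap();
    let seen_by_server = server_thread.join().unwrap();
    (seen_by_client, seen_by_server)
}
```

The dropped-peer behavior (EOF on read, an error on write) is the kind of half-open edge the POC is meant to pin down against the real stream types.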
DP-10: Bare-TCP SSH listener — first-class path for git-over-SSH
(Recommended: one-way door on the config shape, two-way door on the listener itself)
ADR-010 establishes that bare-TCP SSH is a handler concern — the SSH handler
can listen on a TCP socket independently of the alknet/ssh ALPN path.
Git-over-SSH (ssh git@host ...) runs on TCP port 22, not over QUIC — git
clients (git, libgit2, gix) dial a TCP socket and expect the SSH-2 protocol
directly. To make alknet-ssh a viable git-over-SSH target, the bare-TCP listener
must be a first-class path.
The two paths (ALPN/QUIC vs bare-TCP) share the same russh::server::Config and
the same server::Handler implementation; they differ only in how the duplex
stream is obtained:
- ALPN path: handle() receives the QUIC Connection, calls accept_bi(), joins the halves with tokio::io::join, and hands the result to run_stream.
- TCP path: a tokio::net::TcpListener accept loop hands each accepted TcpStream directly to run_stream (or run_on_socket, keeping config/handler identical across both paths).
- WebTransport path (new): handle() receives a Connection wrapped from a WebTransport stream (ADR-040); same run_stream call, same config/handler.
All three paths share the same server::Config + Handler + ACL policy —
only the stream source differs. The TCP listener is off by default (must
be explicitly configured to bind), consistent with the default-deny posture.
Recommendation: Phase 1 records the three-path model in the ssh spec — ALPN/QUIC primary, bare-TCP as a co-equal first-class path (off by default), WebTransport as the browser path (via ADR-040). Reserve the TCP-listener config fields (one-way door on the config schema — retrofitting is messier than reserving the shape up front). The listener implementation is a two-way door; the config shape is locked.
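A rough sketch of the three-path model, assuming hypothetical names throughout (`StreamSource`, `ListenerConfig`, `tcp_bind`, `SharedServer`). It shows two things: the TCP bind field is present in the config shape but unset by default, and every source funnels into the same serve call. `std::io::Read + Write` stands in for the real async stream bound, and the banner write stands in for run_stream.

```rust
use std::io::{self, Read, Write};
use std::net::SocketAddr;

/// Where a duplex stream came from. Only used for bookkeeping;
/// the serving code never branches on it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StreamSource {
    AlpnQuic,
    BareTcp,
    WebTransport,
}

/// Listener configuration. The TCP field is reserved in the schema
/// up front and defaults to None, so no TCP socket is bound unless
/// the operator sets it explicitly.
#[derive(Debug, Default, PartialEq)]
pub struct ListenerConfig {
    pub tcp_bind: Option<SocketAddr>,
}

impl ListenerConfig {
    pub fn tcp_listener_enabled(&self) -> bool {
        self.tcp_bind.is_some()
    }
}

/// State shared by all paths (stand-in for the server config,
/// handler, and ACL policy).
pub struct SharedServer {
    pub banner: &'static str,
}

impl SharedServer {
    /// One entry point for every source: the stream type is generic
    /// and the source tag does not change behavior.
    pub fn serve<S: Read + Write>(&self, source: StreamSource, stream: &mut S) -> io::Result<()> {
        let _ = source; // would feed logging or metrics in a real server
        stream.write_all(self.banner.as_bytes())?;
        stream.write_all(b"\r\n")
    }
}
```

Reserving `tcp_bind` as an `Option` keeps the schema stable whether or not the listener implementation ships in the first build.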
Recommended Approach: Layered Build Order
Based on the channel decomposition and the decision points above, the recommended approach to take into Phase 1:
Crate
alknet-ssh, depends on alknet-core and russh = "0.60" (default features,
i.e. aws-lc-rs). Implements ProtocolHandler for b"alknet/ssh". Owns
both the SSH server and the SSH client — the server is the ProtocolHandler;
the client is the shared primitive that dispatch, the VPN-like topology, and the
browser-WASM case all consume. Owns all channel layers (1-7): stream
transport, SSH connection, channel multiplexer, session/exec, port
forwarding, SOCKS5, SFTP.
Build order (each layer functional when built)
Layer 1-4: SSH connection + channels + session/exec
- Stream wiring: handle() accepts the Connection, calls accept_bi() (or receives a WebTransport-proxied stream), joins the halves with tokio::io::join, and hands the result to russh::server::run_stream. Source-agnostic (ADR-040 constraint).
- Auth: constructor-injected Arc<dyn IdentityProvider>. Inside handle(), if auth.identity is None, russh's server::Handler::auth_publickey resolves the offered key's fingerprint through the provider; on success, store the resolved Identity on the Connection via set_identity() (OQ-11). Publickey-only (plus OpenSSH cert).
- Host keys (DP-1): vault-derived Ed25519 by default, optional config override.
- Channel policy: exec (gated) only; shell/PTY/X11/agent default-reject.
- Client: connect_stream over a provided stream (WASM-compatible); session channel exec for the dispatch "reverse git runner" pattern.
- Result: a working SSH+exec appliance (server + client). Immediately useful.
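The channel policy in the list above can be sketched as a single decision function. This is an illustration under assumptions: the enum and struct names are hypothetical, and gating exec by an allow-list of program names is one possible mechanism, since the actual scope vocabulary for exec is still open (OQ-SSH-02).

```rust
/// Session-channel requests a client may send (a subset, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum ChannelRequest {
    Exec { command: String },
    Shell,
    Pty,
    X11,
    AgentForward,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Reject,
}

/// Exec gate. The default value has an empty allow-list, so a freshly
/// constructed policy rejects every command (default-deny).
#[derive(Debug, Default)]
pub struct ExecPolicy {
    pub allowed_programs: Vec<String>,
}

/// Exec is allowed only when its program name is on the allow-list;
/// shell, PTY, X11, and agent forwarding are always rejected.
pub fn decide(policy: &ExecPolicy, request: &ChannelRequest) -> Decision {
    match request {
        ChannelRequest::Exec { command } => {
            // The first whitespace-separated token is treated as the program.
            let program = command.split_whitespace().next().unwrap_or("");
            if policy.allowed_programs.iter().any(|p| p.as_str() == program) {
                Decision::Allow
            } else {
                Decision::Reject
            }
        }
        ChannelRequest::Shell
        | ChannelRequest::Pty
        | ChannelRequest::X11
        | ChannelRequest::AgentForward => Decision::Reject,
    }
}
```

A git-over-SSH deployment would, under this sketch, list programs such as `git-upload-pack` and leave everything else denied.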
Layer 5: Port forwarding (bidirectional)
- direct-tcpip (local→remote) and forwarded-tcpip / tcpip_forward (remote→local) channel types, both gated by ACL scopes.
- Client-side: opens direct-tcpip channels (dispatch's start_forward pattern); requests tcpip_forward for remote→local.
- Result: a working SSH+forwarding appliance. The VPN-like topology (WireGuard + Postgres + Redis over SSH forwarding) works.
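As an illustration of ACL-gated forwarding, the sketch below checks a direct-tcpip destination against granted scopes. It assumes the flat `ssh:forward:<host>:<port>` string shape, which this document only offers as one candidate; the final scope vocabulary is an open question. `ForwardRequest`, `scope_for`, and `forward_allowed` are hypothetical names.

```rust
/// Destination a client names when opening a direct-tcpip channel.
pub struct ForwardRequest<'a> {
    pub host: &'a str,
    pub port: u16,
}

/// Builds the scope string a caller must hold for this destination.
/// The "ssh:forward:host:port" shape is an assumption for this sketch.
pub fn scope_for(request: &ForwardRequest<'_>) -> String {
    format!("ssh:forward:{}:{}", request.host, request.port)
}

/// Default-deny: the channel opens only on an exact scope match,
/// so an identity with no scopes can forward nowhere.
pub fn forward_allowed(granted_scopes: &[String], request: &ForwardRequest<'_>) -> bool {
    let needed = scope_for(request);
    granted_scopes.iter().any(|scope| *scope == needed)
}
```

Exact matching keeps the sketch simple; wildcard hosts or port ranges would be a policy extension the spec would need to define.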
Layer 6: SOCKS5 server
- A SOCKS5 server that accepts local connections and opens direct-tcpip channels to forward them. Consumer of Layer 5's API.
- In alknet-ssh (the VPN-like use case needs it there). Extractable to alknet-socks5 if a second consumer appears (two-way door).
- Result: a working SSH+SOCKS5 proxy. The reference implementation's SOCKS5 feature is preserved.
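The SOCKS5 layer's job reduces to turning a CONNECT request into a host and port for a direct-tcpip channel. The following sketch parses the request layout from RFC 1928 (version, command, reserved byte, address type, address, big-endian port). The type and function names are hypothetical, and the method-negotiation step that precedes the request is omitted.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Destination extracted from a SOCKS5 CONNECT request; this is what
/// would be handed to the forwarding layer.
#[derive(Debug, PartialEq)]
pub struct ConnectTarget {
    pub host: String,
    pub port: u16,
}

#[derive(Debug, PartialEq)]
pub enum Socks5Error {
    Truncated,
    BadVersion(u8),
    UnsupportedCommand(u8),
    BadAddressType(u8),
    BadDomain,
}

/// Parses VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT.
pub fn parse_connect(req: &[u8]) -> Result<ConnectTarget, Socks5Error> {
    if req.len() < 4 {
        return Err(Socks5Error::Truncated);
    }
    if req[0] != 0x05 {
        return Err(Socks5Error::BadVersion(req[0]));
    }
    if req[1] != 0x01 {
        // Only CONNECT is modeled; BIND and UDP ASSOCIATE are rejected.
        return Err(Socks5Error::UnsupportedCommand(req[1]));
    }
    let (host, rest) = match req[3] {
        0x01 => {
            let a = req.get(4..8).ok_or(Socks5Error::Truncated)?;
            (Ipv4Addr::new(a[0], a[1], a[2], a[3]).to_string(), &req[8..])
        }
        0x03 => {
            // Domain names carry a one-byte length prefix.
            let len = *req.get(4).ok_or(Socks5Error::Truncated)? as usize;
            let raw = req.get(5..5 + len).ok_or(Socks5Error::Truncated)?;
            let name = std::str::from_utf8(raw).map_err(|_| Socks5Error::BadDomain)?;
            (name.to_string(), &req[5 + len..])
        }
        0x04 => {
            let a = req.get(4..20).ok_or(Socks5Error::Truncated)?;
            let mut octets = [0u8; 16];
            octets.copy_from_slice(a);
            (Ipv6Addr::from(octets).to_string(), &req[20..])
        }
        other => return Err(Socks5Error::BadAddressType(other)),
    };
    let port = rest.get(0..2).ok_or(Socks5Error::Truncated)?;
    Ok(ConnectTarget { host, port: u16::from_be_bytes([port[0], port[1]]) })
}
```

The returned `ConnectTarget` carries exactly the pair a direct-tcpip open needs, which is why this layer is a consumer of the forwarding API rather than a separate transport.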
Layer 7: SFTP subsystem
- Server: russh-sftp::server::run over a subsystem channel's stream.
- Client: russh-sftp::client::SftpSession over a channel stream.
- In alknet-ssh; extractable if warranted (two-way door).
- Result: SFTP file transfer over SSH.
De-risk POC (DP-9)
A Phase 0 POC validating connect_stream ↔ run_stream over
tokio::io::duplex(), plus a WebTransport-stream-as-Connection →
run_stream POC validating the ADR-040 contract. Timeboxed; if they pass, the
stream-wiring spec is straightforward; if they surface constraints, they fold
into the spec.
Three-path model (DP-10)
ALPN/QUIC primary, bare-TCP co-equal (off by default, config reserved in the
schema for git-over-SSH), WebTransport as the browser path (ADR-040). All three
share server::Config + Handler + ACL; only the stream source differs.
Open Questions to Carry into Phase 1
The following should become OQs in docs/architecture/open-questions.md
(numbering assigned by the Architect — likely OQ-41 onwards, since OQ-01–OQ-40
exist):
- OQ-SSH-01 (host key sourcing): vault-derived default + config override, resolved by the DP-1 ADR. The exact config-field shape may stay open.
- OQ-SSH-02 (channel policy v1 surface + default-deny scope vocabulary): the set of allowed channel types / request types is resolved by the DP-5 ADR; the exact scope vocabulary for forwarding destinations + exec commands (e.g., ssh:forward:127.0.0.1:5432 vs a resources-style shape) stays open. It interacts with how operators express allow-lists in DynamicConfig and with the fact that Identity.resources is composition-only (ADR-022).
- OQ-SSH-03 (SOCKS5/SFTP extraction): confirm SOCKS5 and SFTP start in alknet-ssh and extract only if a second consumer of the forwarding/channel API appears. Resolved (in favor of in-alknet-ssh-now, extract-later) by the DP-4 ADR. Two-way door.
- OQ-SSH-04 (POC outcome): did the duplex()-based round-trip POC pass, and did the WebTransport-stream POC validate the ADR-040 contract? Resolved by POC Specialist results.
- OQ-SSH-05 (client WASM surface): confirm alknet-ssh's client API takes a stream (not a socket), preserving the WASM door russh's runtime abstraction opened. This is a design constraint, not a deferral: the client must not reach for tokio::net types in its public surface.
- OQ-SSH-06 (bare-TCP listener): config shape reserved; listener implementation is a two-way door. Git-over-SSH is the forcing function: decide based on whether the build needs to be a git-over-SSH target.
Next Steps (Phase 0 → Phase 1)
- You decide on the DP recommendations (or amend them). DP-1, DP-4, DP-5, DP-10 are the load-bearing architectural choices. DP-2, DP-3, DP-6, DP-7, DP-8 are defaults recommended as-is; DP-9 is a POC task.
- POC (DP-9): spawn a POC Specialist to validate connect_stream ↔ run_stream over tokio::io::duplex() and the WebTransport-stream path. Timeboxed; if it passes, the stream-wiring spec is straightforward; if it surfaces constraints, they fold into the spec.
- Phase 1 (Architect): produce docs/architecture/crates/ssh/README.md + component specs organized by channel layer (e.g., ssh-stream.md for Layer 1, ssh-connection.md for Layer 2, ssh-channels.md for Layer 3, ssh-exec.md for Layer 4, ssh-forwarding.md for Layer 5, ssh-socks5.md for Layer 6, ssh-sftp.md for Layer 7, ssh-client.md for the client/WASM path, ssh-tcp-listener.md for the bare-TCP path), ADRs for the accepted DPs (host-key sourcing, channel policy + default-deny, ssh server+client+forwarding+socks5+sftp scope + layered build order, bare-TCP config shape), and the OQs above in open-questions.md. Update docs/architecture/README.md index and ADR table.
References
- docs/sdd_process.md: Phase 0 process definition
- docs/architecture/overview.md: ALPN-as-service, crate graph, ProtocolHandler
- docs/architecture/crates/core/core-types.md: ProtocolHandler, Connection, BiStream
- docs/architecture/crates/core/auth.md: AuthContext, IdentityProvider, SshAdapter example
- docs/architecture/crates/http/webtransport.md: WebTransport substrate spec
- docs/architecture/decisions/001-alpn-protocol-dispatch.md: ALPN dispatch
- docs/architecture/decisions/002-protocol-handler-trait.md: ProtocolHandler trait
- docs/architecture/decisions/004-auth-as-shared-core.md: hybrid auth
- docs/architecture/decisions/007-bistream-type-definition.md: BiStream trait
- docs/architecture/decisions/010-alpn-router-and-endpoint.md: endpoint, TCP-is-handler-concern
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md: Capabilities
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md: registration bundle
- docs/architecture/decisions/025-vault-local-only-dispatch.md: vault local-only
- docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md: TLS identity model (symmetry reference for DP-1)
- docs/architecture/decisions/038-http3-and-webtransport-as-first-class.md: h3/WebTransport first-class
- docs/architecture/decisions/040-webtransport-alpn-stream-proxy.md: ALPN-stream-proxy (SSH-over-WebTransport path)
- docs/architecture/decisions/043-webtransport-bidirectional-alpn-substrate.md: WebTransport as bidirectional ALPN substrate
- docs/research/references/ssh/russh/01-06: russh deep-dives (overview, keys, protocol, crypto, internals, usage)
- docs/research/references/ssh/russh-sftp/01-07: russh-sftp deep-dives (overview, wire protocol, key types, client/server API, data flow, quick reference)
- /workspace/russh/: russh 0.60.2 source (authoritative; russh-util/src/runtime.rs shows the WASM runtime swap)
- /workspace/russh-sftp/: russh-sftp source (WASM-targeted protocol parsing)
- /workspace/@alkdev/alknet-main/crates/alknet-core/src/: reference implementation (transport/iroh_transport.rs:94 shows the tokio::io::join adapter; server/, interface/ssh.rs, client/, socks5/ for prior art)
- /workspace/@alkdev/dispatch/: concrete downstream consumer the user wants to replace with this stack: axum + russh = "0.60" SSH client for "reverse git runner" over Docker/vast.ai. src/ssh.rs (russh client wrapper, 143 lines), src/handlers.rs::start_forward (channel_open_direct_tcpip local→remote forwarding), src/sftp.rs (russh-sftp client). No SOCKS5; that's the alknet-original feature preserved here. Dispatch is a textbook consumer of the alknet-ssh client + forwarding primitives, which is why those live in alknet-ssh rather than being duplicated per-consumer.