Phase 0 exploration for alknet-ssh: confirms SSH-over-QUIC-bistream via tokio::io::join (no custom adapter needed, per reference impl), russh 0.60.2 generic run_stream/connect_stream, and channel-into-bistream multiplexing. Surfaces 9 decision points for Phase 1: host key sourcing (vault-derived vs config), channel policy v1 surface, client + SOCKS5 crate split, crypto backend, auth method coverage, and a stream-handling POC to close russh's upstream test gap.
| status | last_updated |
|---|---|
| draft | 2026-06-25 |
alknet-ssh — Phase 0 Research Findings
This document captures Phase 0 (Exploration) findings for the alknet-ssh
crate. The objective of Phase 0 per docs/sdd_process.md is: "Capture vision
and guiding principles; research options; validate approaches; converge on a
recommended approach." It is the input to Phase 1 (Architecture), where the
Architect will produce docs/architecture/crates/ssh/*.md specs, ADRs, and open
questions.
Vision Recap
alknet-ssh is the SSH protocol handler for the ALPN-as-service architecture
(ADR-001). It registers the alknet/ssh ALPN on the shared AlknetEndpoint
and implements the ProtocolHandler trait (ADR-002, ADR-007).
The guiding insight, carried over from the reference implementation at
/workspace/@alkdev/alknet-main/, is:
SSH does not care where its underlying byte stream comes from.
The reference implementation built on this — it ran the russh SSH-2 state
machine over a Transport-produced duplex stream (AsyncRead + AsyncWrite + Unpin + Send) rather than over its own TCP sockets. The greenfield rebuild
keeps the insight and drops the messy transport-abstraction layer that grew
around it: in the new model the AlknetEndpoint hands the handler a Connection
(quinn/iroh QUIC), and the handler is responsible for opening/accepting the
bidirectional QUIC stream that carries the SSH-2 protocol.
The reference implementation reportedly had 3.5k clones in the past 14 days, which suggests real-world demand for the "SSH-over-arbitrary-stream" capability. The greenfield rebuild is a total rewrite; the one exception is the vault, most of which was initially copied over and has since been rewritten as well.
Sources Investigated
| Source | Path | Note |
|---|---|---|
| Existing arch docs (core) | docs/architecture/crates/core/* | ProtocolHandler, Connection, BiStream, AuthContext, IdentityProvider, Endpoint |
| Existing ADRs 001–027 | docs/architecture/decisions/* | All Accepted; ADR-002/007/010/004/011 most relevant to SSH |
| russh reference deep-dives | docs/research/references/ssh/russh/01-06 | Already authored; covered overview, keys, protocol, crypto, internals, usage |
| russh source (authoritative) | /workspace/russh/ | Checked out at Cargo.toml version 0.60.2. The cargo registry cache only contains russh-0.49.2, which is older and NOT the intended version. Use /workspace/russh/ as the canonical 0.60.2 reference. |
| alknet Cargo.lock | Cargo.lock | Does not yet contain a russh entry; russh is not wired into the workspace dependency graph yet |
| Reference implementation | /workspace/@alkdev/alknet-main/ | crates/alknet-core/src/{interface/ssh.rs, server/handler.rs, server/serve.rs, transport/*, client/*} |
Note on the russh clone: the /workspace/russh checkout was inspected and its russh/Cargo.toml declares version = "0.60.2" with edition = "2024" and MSRV 1.85, matching the research references. The agent flagged the cargo-cache mismatch; verifying against the checkout rather than the cache is the safe choice since 0.49.2 → 0.60.2 spans major API changes (the server::run_stream generic signature, the Auth enum shape, and the server::Handler method set all differ). When alknet-ssh's Cargo.toml pins russh = "0.60", Cargo will fetch the matching 0.60.x into the cache, at which point the cache becomes authoritative for future investigations.
Straightforward Parts
These are settled by existing ADRs and the reference implementation; Phase 1 should document them as spec rather than re-litigate them.
1. SSH is a ProtocolHandler on alknet/ssh
Confirmed by overview.md's ALPN Registry and core-types.md. SshAdapter
implements ProtocolHandler::handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> with alpn() = b"alknet/ssh". The
handler owns the entire Connection lifecycle (ADR-006: one ALPN, one
connection, one handler) and may open/accept multiple QUIC streams because it
multiplexes SSH channels.
2. SSH runs over a single QUIC bidirectional stream
The reference implementation's transport/iroh_transport.rs proves the
approach: open a QUIC bistream, then join the two halves into a single duplex
type with tokio::io::join(recv, send) and feed that to russh. This is the
key adapter — it is already a one-liner in tokio:
// from alknet-main/.../iroh_transport.rs:94
let conn = self.endpoint.connect(self.node_id, ALPN).await?;
let (send, recv) = conn.open_bi().await?;
Ok(io::join(recv, send)) // produces: AsyncRead + AsyncWrite + Unpin + Send
The Phase 0 research subagent initially speculated a custom QuicSshStream
adapter struct would be needed. Verifying against the reference implementation
revealed that tokio::io::join already produces the AsyncRead + AsyncWrite
combo russh requires (russh internally re-splits via tokio::io::split). No
custom adapter struct is required — the Connection::accept_bi() /
open_bi() pair plus tokio::io::join is sufficient. This is a meaningful
simplification over the speculative approach.
3. russh accepts a generic stream on both client and server side
Verified from /workspace/russh/russh/src/:
- server::run_stream<H, R>(config: Arc<Config>, stream: R, handler: H) where R: AsyncRead + AsyncWrite + Unpin + Send + 'static (server/mod.rs:997).
- client::connect_stream<H, R>(config: Arc<Config>, stream: R, handler: H) with the same bound (client/mod.rs:982).
Neither path assumes TCP — TCP-specific code (set_nodelay, TcpListener) is
confined to run_on_socket / connect / run_on_address. The generic stream
path is clean of TCP assumptions. russh writes its own SSH identification banner
first, then reads the peer's — no caller-side banner pre-work is needed.
4. SSH channels multiplex inside the QUIC bistream
ChannelId(u32) identifies channels; all channel traffic
(CHANNEL_OPEN/DATA/EOF/CLOSE/...) is interleaved on the single
underlying SSH transport stream that russh owns. This is independent of
QUIC's own stream multiplexing — one QUIC bistream ↔ one SSH connection ↔ many
SSH channels riding inside it. Port forwarding (direct-tcpip,
forwarded-tcpip) is ordinary channel traffic — each forwarded TCP connection
is a channel, not a separate QUIC stream.
This is the cleanest mapping and the right default: alknet-ssh does not try to map SSH channels onto QUIC streams (which would require bypassing russh's own multiplexer). It hands russh one bistream and lets russh multiplex inside it.
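To make the layering concrete, here is a toy sketch of several logical channels sharing one ordered byte stream. The frame layout (4-byte big-endian channel id, 4-byte big-endian length, payload) and the names encode_frame and demux are illustrative assumptions. This is not the SSH wire format and not how russh implements its multiplexer; it only shows why one bistream is enough for many channels.

```rust
use std::collections::BTreeMap;

/// Toy frame: [channel id: u32 BE][payload len: u32 BE][payload].
/// Illustrative only; real SSH channel messages are framed differently.
fn encode_frame(channel: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&channel.to_be_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Splits one interleaved byte stream back into per-channel byte sequences.
/// Returns None if the stream ends in the middle of a frame.
fn demux(mut stream: &[u8]) -> Option<BTreeMap<u32, Vec<u8>>> {
    let mut channels: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    while !stream.is_empty() {
        if stream.len() < 8 {
            return None; // truncated header
        }
        let channel = u32::from_be_bytes([stream[0], stream[1], stream[2], stream[3]]);
        let len = u32::from_be_bytes([stream[4], stream[5], stream[6], stream[7]]) as usize;
        let rest = &stream[8..];
        if rest.len() < len {
            return None; // truncated payload
        }
        // Order is preserved per channel because the carrier stream is ordered.
        channels.entry(channel).or_default().extend_from_slice(&rest[..len]);
        stream = &rest[len..];
    }
    Some(channels)
}
```

Frames for channel 0 and channel 1 can be interleaved in any order on the carrier; each channel still sees its own bytes in order, which is the property a forwarded TCP connection needs.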
5. Auth routes through the shared IdentityProvider
ADR-004 establishes the hybrid auth model: the endpoint resolves what it can
(TLS client cert → fingerprint), the handler resolves what it must (SSH key
fingerprint). auth.md shows the SshAdapter pattern exactly: constructor-inject
Arc<dyn IdentityProvider>, call resolve_from_fingerprint() inside
handle() when auth.identity is None, store the resolved Identity on the
Connection via set_identity() for observability (OQ-11). The
ConfigIdentityProvider already resolves SSH key fingerprints against
DynamicConfig::auth::authorized_keys_fingerprints. No new auth machinery is
needed for SSH.
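The handler-side half of the hybrid model can be sketched as follows. The types here are simplified, hypothetical stand-ins: the real Identity, AuthContext, and IdentityProvider live in alknet-core and their exact signatures are not reproduced in this document, so the trait shape, the MapProvider type, and the fingerprint strings are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for alknet-core's Identity (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    name: String,
}

/// Simplified stand-in for alknet-core's AuthContext (assumed shape).
struct AuthContext {
    identity: Option<Identity>,
}

/// Hypothetical, reduced version of the IdentityProvider trait.
trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Map-backed provider, loosely modelled on a list of authorized fingerprints.
struct MapProvider {
    authorized: HashMap<String, Identity>,
}

impl IdentityProvider for MapProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).cloned()
    }
}

/// Keep an identity the endpoint already resolved; otherwise resolve the
/// SSH key fingerprint through the shared provider.
fn resolve_identity(
    auth: &AuthContext,
    provider: &dyn IdentityProvider,
    ssh_key_fingerprint: &str,
) -> Option<Identity> {
    auth.identity
        .clone()
        .or_else(|| provider.resolve_from_fingerprint(ssh_key_fingerprint))
}
```

In the real handler the Some result would then be stored on the Connection via set_identity(), and a None result would reject the publickey attempt.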
6. Outbound credentials (if any) come from Capabilities
ADR-014 / ADR-022 establish that handlers get outbound credentials through the
registration bundle's capabilities field, populated by the assembly layer
from the vault. SSH itself typically needs no outbound credentials (the SSH host
key is a network-identity concern, the SSH client key for auth comes from the
peer), but if alknet-ssh ever needs an outbound secret (e.g., to dial an upstream
SOCKS proxy), it comes from Capabilities, not from env vars or vault-on-wire.
7. TCP SSH is a handler concern, not an endpoint concern
ADR-010 is explicit: "TCP is NOT an endpoint concern... the SSH handler can
listen on a TCP socket independently." This means alknet-ssh may optionally bind
a plain TCP listener (port 22-style) and accept raw SSH connections outside
the ALPN endpoint. The alknet/ssh ALPN path and the bare-TCP path can coexist;
they share the same russh::server::Config and the same server::Handler
implementation, differing only in how the stream is obtained. This is a
two-way-door additive capability — the TCP listener can be added later without
touching the ALPN path.
Less Straightforward Parts (Decision Points)
These are the points where Phase 0 surfaced genuine choices that affect the architecture. Each is tagged with a recommended door type per ADR-009. The Architect should turn the accepted recommendations into ADRs, and the deferred ones into open questions.
DP-1: Host key sourcing — vault-derived vs config-loaded vs both
(Recommended: one-way door — needs an ADR)
russh's server::Config.keys: Vec<PrivateKey> holds the SSH host keys the
server presents during key exchange. The host key is the SSH layer's analogue
of the TLS layer's network identity — it is what the SSH client verifies
against known_hosts. Three sourcing paths exist:
- (a) Vault-derived: derive an Ed25519 key from the alknet-vault seed (HD path) and use it as the SSH host key. Aligns with the project's "everything keys-from-seed" philosophy (ADR-020, ADR-026) and means the SSH host key is deterministic from the mnemonic — a node restored from mnemonic gets the same SSH host key fingerprint.
- (b) Config-loaded: operator provides SSH host key file path(s) in StaticConfig/DynamicConfig. Matches how OpenSSH works (/etc/ssh/ssh_host_ed25519_key). Simplest, decoupled from the vault.
- (c) Both: vault-derived by default, config override for operators who bring their own keys. Mirrors the TLS identity model (ADR-027's TlsIdentity::RawKey default + X509/Acme for domain-hosted).
Recommendation: (c) both, with vault-derived as the default. This
matches the symmetry with TlsIdentity in endpoint.md and respects the
"fingerprint-based, keys-from-seed" identity model. The vault is local-only by
construction (ADR-025) and assembly-layer-only access (ADR-019), so the SSH host
key is derived at startup and injected into SshAdapter::Config the same way
TLS RawKey identity is. Operators who want stable host keys independent of the
mnemonic can supply a key file. Phase 1 should write an ADR for this (likely
ADR-028) and a corresponding OQ if the exact config-field shape is unresolved.
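As a minimal sketch of option (c), the selection rule is "config override wins, otherwise derive from the vault". The HostKeySource enum and select_host_key_source function are hypothetical names, and the optional path stands in for whatever config field Phase 1 settles on; key derivation and file loading are deliberately left out.

```rust
use std::path::PathBuf;

/// Where the SSH host key comes from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum HostKeySource {
    /// Derived from the vault seed by the assembly layer at startup.
    VaultDerived,
    /// Loaded from an operator-supplied key file.
    ConfigFile(PathBuf),
}

/// Option (c): an operator-supplied key file overrides the vault-derived default.
fn select_host_key_source(config_key_path: Option<PathBuf>) -> HostKeySource {
    match config_key_path {
        Some(path) => HostKeySource::ConfigFile(path),
        None => HostKeySource::VaultDerived,
    }
}
```

Keeping the decision in one small function means the assembly layer is the only place that knows about both sources; the handler just receives the resulting key.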
DP-2: Per-connection host key selection
(Recommended: one-way door — needs an ADR, ties to DP-1)
When supporting multiple host keys (e.g., an Ed25519 default + an RSA key for
legacy clients), russh's server::Config.keys is a Vec and russh negotiates
which to use based on the client's offered algorithms. The selection is
deterministic per-russh-version but not configurable per-connection. Question:
do we need per-peer host key selection (e.g., present different host keys to
different peer networks)? Almost certainly no for v1 — one host key set per
node, advertised uniformly. Phase 1 should record this as the simple model and
leave per-connection selection as a future two-way-door if a use case arises.
DP-3: Crypto backend — aws-lc-rs (default) vs ring
(Recommended: two-way door — decide at implementation time, but pin the choice in an ADR if it has cross-crate consequences)
russh 0.60.2 requires exactly one of aws-lc-rs (default) or ring enabled;
enabling both silently picks aws-lc-rs. Both produce AES-GCM / ChaCha20-Poly1305.
Considerations:
- aws-lc-rs is the russh default and has broader algorithm coverage, but brings a heavier native build (the AWS-LC build requires a C compiler and cmake).
- ring is lighter-weight, with a smaller binary and a simpler build.
- Cross-crate consequence: alknet-core already depends on rustls-acme = "0.12" with features = ["aws-lc-rs"] (see crates/alknet-core/Cargo.toml), so aws-lc-rs is already in the workspace's build. Choosing ring for russh while alknet-core uses aws-lc-rs would put both crypto backends in the final binary: wasteful but not incorrect.
Recommendation: default to aws-lc-rs (aligns with the rest of the
workspace and avoids a duplicate crypto backend), but treat the choice as a
two-way door: it can be flipped by setting default-features = false on
russh and enabling the ring feature instead. Phase 1 should note this and not
spend an ADR on it unless the duplicate-backend concern turns out to matter for
binary size.
DP-4: Client side — full russh::client vs SSH-only-server
(Recommended: one-way door — needs an ADR)
alknet-ssh as described in the README is the SSH handler (server side of the
alknet/ssh ALPN). But the reference implementation also ships a substantial
client (crates/alknet-core/src/client/*: SOCKS5 client, connect logic,
channel manager, ~1900 lines) and a SOCKS5 implementation
(src/socks5/*, ~800 lines) that turns the SSH server into a SOCKS5 proxy
endpoint clients can dial. The README lists alknet-ssh's purpose as "SSH
handler (russh), SOCKS5, port forwarding" — so the client/proxy functionality is
intended.
Questions:
- Does alknet-ssh own both the SSH server (handling alknet/ssh connections) and the SSH/SOCKS5 client (for the node to dial out via SSH to other hosts)? Or does the client live elsewhere?
- Is the SOCKS5 server a feature of alknet-ssh, or a separate crate? The SOCKS5 protocol itself is independent of SSH (it just needs a byte stream), so it could be its own reusable crate that alknet-ssh composes with.
Recommendation: Phase 1 should clarify scope with an ADR. My tentative
recommendation: alknet-ssh owns the SSH server (the ProtocolHandler) plus
the SSH client (for outbound SSH dialing, needed for port forwarding and
SOCKS-via-SSH). SOCKS5 itself becomes a small, self-contained, reusable crate
(e.g., alknet-socks5) that consumes a byte stream — keeping it decoupled from
SSH matches the "stream-agnostic" philosophy and unlocks SOCKS5 reuse over
non-SSH transports. This is a real architectural choice that deserves an ADR
rather than an implicit decision.
DP-5: Channel-policy surface — which SSH services does alknet-ssh expose?
(Recommended: one-way door — needs an ADR, at least the default policy)
russh's server::Handler defaults every channel-request method to reject/no-op
(or, for auth_publickey_offered, accept the offer through to signature
verification). alknet-ssh must decide its default channel policy:
- session channels (shell, exec, subsystem): does alknet-ssh run a real shell? A restricted command set? Nothing (exec-only)? This is a major behavioral choice. The reference implementation (per overview.md's "what stays") had a 974-line server/handler.rs and a 555-line server/channel_proxy.rs; it clearly did substantial channel work (proxying channels to upstream connections).
- port forwarding (direct-tcpip in, tcpip-forward/forwarded-tcpip out): the README explicitly lists "port forwarding" as an alknet-ssh feature, so this is in scope. But the policy (which destinations are allowed, whether to restrict by ACL/scope) needs specifying.
- PTY/X11/agent forwarding: almost certainly disabled by default for security; explicit opt-in.
Recommendation: Phase 1 should write an ADR defining the v1 channel-policy surface — likely "exec + port-forwarding in scope; shell/PTY/X11/agent deferred; channel destinations gated by ACL scopes." The exact scope set is a design choice the Architect makes with the user.
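One way to picture the tentative v1 surface is a default-reject decision function. Everything below is a hypothetical sketch: the ChannelRequest and Decision enums, the decide function, and the "ssh:forward" scope string are invented for illustration and are neither russh types nor an agreed alknet ACL vocabulary.

```rust
/// Channel-level requests the handler may see (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum ChannelRequest {
    Exec,
    Shell,
    Pty,
    X11,
    AgentForward,
    DirectTcpip { host: String, port: u16 },
    TcpipForward { bind_port: u16 },
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Reject,
}

/// Tentative v1 policy: exec is in scope, forwarding requires a scope on the
/// resolved identity, everything else is rejected by default.
/// The scope name "ssh:forward" is an assumption, not a defined alknet scope.
fn decide(request: &ChannelRequest, scopes: &[&str]) -> Decision {
    let can_forward = scopes.iter().any(|s| *s == "ssh:forward");
    let allowed = match request {
        ChannelRequest::Exec => true,
        ChannelRequest::DirectTcpip { .. } | ChannelRequest::TcpipForward { .. } => can_forward,
        // Deferred in v1: shell, PTY, X11, agent forwarding.
        ChannelRequest::Shell
        | ChannelRequest::Pty
        | ChannelRequest::X11
        | ChannelRequest::AgentForward => false,
    };
    if allowed { Decision::Allow } else { Decision::Reject }
}
```

A real policy would also inspect the forwarding destination (host and port) against the ACL; the sketch only gates on a single coarse scope to keep the default-reject shape visible.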
DP-6: Auth method coverage — publickey-only vs password/kbdint too
(Recommended: two-way door — start publickey-only, extend later)
russh supports none, password, publickey, keyboard-interactive, and
OpenSSH certificate auth server-side. alknet's identity model (auth.md) is
fingerprint-based — SSH key fingerprint → IdentityProvider → Identity.
This maps naturally onto publickey (the fingerprint is the SHA-256 of the
presented public key) and OpenSSH certificate auth (cert fingerprint).
Password / keyboard-interactive don't fit the fingerprint model as cleanly
(there's no resolve_from_password on IdentityProvider).
Recommendation: start publickey-only (and certificate auth, which is a superset of publickey from the fingerprint POV). Treat password / keyboard-interactive as a two-way door — can be added later if a use case arises, but the natural alknet identity story is key-based. Phase 1 should note this; likely not a full ADR (it's a default, not a structural decision) but at least a documented design choice in the ssh spec.
DP-7: tokio as a hard transitive dependency
(Recommended: acknowledged constraint, not a decision)
russh 0.60.2 transitively requires tokio (no "no-tokio" feature; only WASM swaps
the spawner). The server loop uses tokio::time::sleep for keepalive/inactivity
timers, so the tokio runtime must have its time driver enabled. alknet-ssh
must run inside a tokio runtime — which it will, because alknet-core's endpoint
already runs on tokio (tokio = { version = "1", features = ["full"] }). This
is consistent with the rest of the workspace and not a constraint to fight.
Phase 1 should record it as a known constraint; OQ-09 (WASM boundaries) already
documents that the server-side dispatch path is a one-way door away from WASM
— alknet-ssh inherits that.
DP-8: The ssh-key crate is forked
(Recommended: acknowledged constraint — use the russh re-export)
russh 0.60.2 depends on internal-russh-forked-ssh-key = "0.6.18" (a renamed
fork), not upstream ssh-key. alknet-ssh must not add upstream ssh-key
directly — that would put two ssh-key versions in the tree and the
PublicKey/PrivateKey types wouldn't unify. The fork is re-exported through
russh::keys::ssh_key, so alknet-ssh should always reach key types via
russh::keys::* (or russh::keys::ssh_key::*) to stay on the same fork. Phase
1 should note this as an implementation constraint; it's not architecturally
interesting but a real footgun if missed.
DP-9: End-to-end over a non-TCP stream is untested upstream
(Recommended: de-risk early with a POC test)
russh's own test suite (/workspace/russh/russh/src/tests.rs and
client/test.rs) only exercises the client↔server round trip over real TCP
loopback. There is no test connecting connect_stream ↔ run_stream over
tokio::io::duplex() or any other in-memory pipe. The SshRead::read_ssh_id
unit tests feed &[u8] directly, proving the banner parser works on
non-socket streams — but a full client↔server round trip over a non-TCP stream
is unverified upstream.
The reference implementation uses this path in production (per
transport/iroh_transport.rs using tokio::io::join), which is strong
empirical evidence it works. But the alknet greenfield rewrite should close
this gap early with an integration test using tokio::io::duplex() connecting
connect_stream ↔ run_stream before going near real QUIC.
Recommendation: per sdd_process.md Phase 0, this is a candidate for a POC
Specialist task (.worktrees/research/ssh-stream-poc/). Phase 1's
architecture docs should reference the POC's outcome. If the POC surfaces
issues (half-open stream handling, poll_shutdown semantics, etc.), they feed
back into the spec as constraints.
Tentative Recommended Approach (Convergence)
Based on the above, the recommended approach to take into Phase 1:
- Crate: alknet-ssh, depends on alknet-core and russh = "0.60" (default features, i.e. aws-lc-rs). Implements ProtocolHandler for b"alknet/ssh".
- Stream wiring: handle() accepts the QUIC Connection, calls connection.accept_bi() once to get (SendStream, RecvStream), joins them with tokio::io::join(recv, send), and hands the resulting duplex stream to russh::server::run_stream(Arc::clone(&config), stream, handler). One QUIC bistream ↔ one SSH connection; russh multiplexes SSH channels inside it.
- Auth: constructor-injected Arc<dyn IdentityProvider> (per auth.md's SshAdapter example). Inside handle(), if auth.identity is None, russh's server::Handler::auth_publickey resolves the offered key's fingerprint through the provider; on success, store the resolved Identity on the Connection via set_identity() (OQ-11). Start publickey-only (plus OpenSSH cert, which rides the same fingerprint path).
- Host keys (DP-1): vault-derived Ed25519 by default (derived from the seed at startup by the assembly layer and injected into SshAdapter's config), with an optional config-supplied key file override. Symmetric with TlsIdentity::RawKey (ADR-027). Needs an ADR.
- Channel policy (DP-5): v1 supports exec + port forwarding (direct-tcpip/forwarded-tcpip); shell/PTY/X11/agent forwarding deferred (default-reject). Forwarding destinations gated by ACL scopes on the resolved Identity. Needs an ADR defining the v1 surface.
- Client + SOCKS5 (DP-4): alknet-ssh also owns the SSH client (outbound dialing, needed for forwarding). SOCKS5 protocol factors out into a small reusable alknet-socks5 crate that consumes a byte stream: decoupled from SSH, reusable over other transports. Needs an ADR confirming the scope split.
- De-risk POC (DP-9): a Phase 0 POC validating connect_stream ↔ run_stream over tokio::io::duplex() before Phase 1 finalizes the stream wiring spec. Strong empirical evidence from the reference implementation suggests it will pass, but the upstream test gap is real.
- TCP listener (Straightforward Part 7 / ADR-010): optional, additive, deferred past v1. The alknet/ssh ALPN path is the primary surface; a bare-TCP SSH listener can be added later sharing the same server::Config and Handler.
Open Questions to Carry into Phase 1
The following should become OQs in docs/architecture/open-questions.md
(numbering will be assigned by the Architect — likely OQ-25 onwards, since
OQ-01–OQ-24 exist):
- OQ-SSH-01 (host key sourcing): vault-derived default + config override. Resolved by the DP-1 ADR.
- OQ-SSH-02 (channel policy v1 surface): the exact set of allowed channel types / request types. Resolved by the DP-5 ADR; some sub-questions (e.g., default forwarding ACL) may stay open.
- OQ-SSH-03 (client + SOCKS5 split): confirm alknet-ssh owns the client and alknet-socks5 is a separate crate. Resolved by the DP-4 ADR.
- OQ-SSH-04 (POC outcome): did the duplex()-based round-trip POC pass, and did it surface any stream-handling constraints (half-open, poll_shutdown, maximum packet size) that constrain the spec? Resolved by POC Specialist results.
- OQ-SSH-05 (crypto backend): confirm aws-lc-rs default aligns with the rest of the workspace; defer flipping to ring unless binary-size pressure arises. Two-way door.
Next Steps (Phase 0 → Phase 1)
- You decide on the DP-1, DP-4, DP-5 recommendations (or amend them); these are the load-bearing architectural choices. DP-3, DP-6, DP-7, DP-8 are defaults I recommend accepting as-is; DP-9 is a POC task.
- Optional POC (DP-9): spawn a POC Specialist to validate connect_stream ↔ run_stream over tokio::io::duplex(). Timeboxed; if it passes, the stream-wiring spec is straightforward; if it surfaces constraints, they fold into the spec.
- Phase 1 (Architect): produce docs/architecture/crates/ssh/README.md + component specs (e.g., ssh-handler.md, ssh-stream.md, ssh-channels.md, ssh-auth.md), ADRs for the accepted DPs (likely ADR-028 host-key sourcing, ADR-029 channel policy, ADR-030 ssh client + socks5 split), and the OQs above in open-questions.md. Update docs/architecture/README.md index and ADR table.
References
- docs/sdd_process.md: Phase 0 process definition
- docs/architecture/overview.md: ALPN-as-service, crate graph, ProtocolHandler
- docs/architecture/crates/core/core-types.md: ProtocolHandler, Connection, BiStream
- docs/architecture/crates/core/auth.md: AuthContext, IdentityProvider, SshAdapter example
- docs/architecture/decisions/001-alpn-protocol-dispatch.md: ALPN dispatch
- docs/architecture/decisions/002-protocol-handler-trait.md: ProtocolHandler trait
- docs/architecture/decisions/004-auth-as-shared-core.md: hybrid auth
- docs/architecture/decisions/007-bistream-type-definition.md: BiStream trait
- docs/architecture/decisions/010-alpn-router-and-endpoint.md: endpoint, TCP-is-handler-concern
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md: Capabilities
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md: registration bundle
- docs/architecture/decisions/025-vault-local-only-dispatch.md: vault local-only
- docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md: TLS identity model (symmetry reference for DP-1)
- docs/research/references/ssh/russh/01-06: existing russh deep-dives
- /workspace/russh/: russh 0.60.2 source (authoritative; cargo cache has 0.49.2 only)
- /workspace/@alkdev/alknet-main/crates/alknet-core/src/: reference implementation (transport/iroh_transport.rs:94 shows the tokio::io::join adapter; server/, interface/ssh.rs, client/, socks5/ for prior art)