Incorporates user clarifications: SOCKS5 and bidirectional port forwarding are core non-negotiable v1 features (the VPN-like use case + the 3.5k-clones demand). Adds DP-10 for the bare-TCP SSH listener as a first-class path needed for future git-over-SSH, with config shape reserved in v1 (off-by-default, default-deny). Grounds the client/forwarding recommendations in the dispatch downstream consumer at /workspace/@alkdev/dispatch, which is a textbook russh SSH client + direct-tcpip forwarder the user wants to replace with this stack. alknet-ssh now owns both server and client + SOCKS5-server in v1; the SOCKS5 codec may extract to a separate crate later (two-way door).
| status | last_updated |
|---|---|
| draft | 2026-06-25 |
alknet-ssh — Phase 0 Research Findings
This document captures Phase 0 (Exploration) findings for the alknet-ssh
crate. The objective of Phase 0 per docs/sdd_process.md is: "Capture vision
and guiding principles; research options; validate approaches; converge on a
recommended approach." It is the input to Phase 1 (Architecture), where the
Architect will produce docs/architecture/crates/ssh/*.md specs, ADRs, and open
questions.
Vision Recap
alknet-ssh is the SSH protocol handler for the ALPN-as-service architecture
(ADR-001). It registers the alknet/ssh ALPN on the shared AlknetEndpoint
and implements the ProtocolHandler trait (ADR-002, ADR-007).
The guiding insight, carried over from the reference implementation at
/workspace/@alkdev/alknet-main/, is:
SSH does not care where its underlying byte stream comes from.
The reference implementation built on this — it ran the russh SSH-2 state
machine over a Transport-produced duplex stream (AsyncRead + AsyncWrite + Unpin + Send) rather than over its own TCP sockets. The greenfield rebuild
keeps the insight and drops the messy transport-abstraction layer that grew
around it: in the new model the AlknetEndpoint hands the handler a Connection
(quinn/iroh QUIC), and the handler is responsible for opening/accepting the
bidirectional QUIC stream that carries the SSH-2 protocol.
The reference implementation reportedly has 3.5k clones in the past 14 days, so there is real-world demand for the "SSH-over-arbitrary-stream" capability. The greenfield version is a total rewrite, except that most of the vault was initially copied over (and has since been rewritten as well).
Sources Investigated
| Source | Path | Note |
|---|---|---|
| Existing arch docs (core) | docs/architecture/crates/core/* | ProtocolHandler, Connection, BiStream, AuthContext, IdentityProvider, Endpoint |
| Existing ADRs 001–027 | docs/architecture/decisions/* | All Accepted; ADR-002/007/010/004/011 most relevant to SSH |
| russh reference deep-dives | docs/research/references/ssh/russh/01-06 | Already authored; covered overview, keys, protocol, crypto, internals, usage |
| russh source (authoritative) | /workspace/russh/ | Checked out at Cargo.toml version 0.60.2. The cargo registry cache only contains russh-0.49.2, which is older and NOT the intended version. Use /workspace/russh/ as the canonical 0.60.2 reference. |
| alknet Cargo.lock | Cargo.lock | Does not yet contain a russh entry; russh is not wired into the workspace dependency graph yet |
| Reference implementation | /workspace/@alkdev/alknet-main/ | crates/alknet-core/src/{interface/ssh.rs, server/handler.rs, server/serve.rs, transport/*, client/*} |
Note on the russh clone: the /workspace/russh checkout was inspected and its russh/Cargo.toml declares version = "0.60.2" with edition = "2024" and MSRV 1.85, matching the research references. The agent flagged the cargo-cache mismatch; verifying against the checkout rather than the cache is the safe choice since 0.49.2 → 0.60.2 spans major API changes (the server::run_stream generic signature, the Auth enum shape, and the server::Handler method set all differ). When alknet-ssh's Cargo.toml pins russh = "0.60", Cargo will fetch the matching 0.60.x into the cache, at which point the cache becomes authoritative for future investigations.
Straightforward Parts
These are settled by existing ADRs and the reference implementation; Phase 1 should document them as spec rather than re-litigate them.
1. SSH is a ProtocolHandler on alknet/ssh
Confirmed by overview.md's ALPN Registry and core-types.md. SshAdapter
implements ProtocolHandler::handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> with alpn() = b"alknet/ssh". The
handler owns the entire Connection lifecycle (ADR-006: one ALPN, one
connection, one handler) and may open/accept multiple QUIC streams because it
multiplexes SSH channels.
2. SSH runs over a single QUIC bidirectional stream
The reference implementation's transport/iroh_transport.rs proves the
approach: open a QUIC bistream, then join the two halves into a single duplex
type with tokio::io::join(recv, send) and feed that to russh. This is the
key adapter — it is already a one-liner in tokio:
// from alknet-main/.../iroh_transport.rs:94
let conn = self.endpoint.connect(self.node_id, ALPN).await?;
let (send, recv) = conn.open_bi().await?;
Ok(io::join(recv, send)) // produces: AsyncRead + AsyncWrite + Unpin + Send
The Phase 0 research subagent initially speculated a custom QuicSshStream
adapter struct would be needed. Verifying against the reference implementation
revealed that tokio::io::join already produces the AsyncRead + AsyncWrite
combo russh requires (russh internally re-splits via tokio::io::split). No
custom adapter struct is required — the Connection::accept_bi() /
open_bi() pair plus tokio::io::join is sufficient. This is a meaningful
simplification over the speculative approach.
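To make the "join two halves into one duplex stream" idea concrete, here is a minimal blocking, std-only analogue. It is not what alknet-ssh would ship: the real wiring uses the async tokio::io::join on the QUIC halves, as described above. The names Joined and exchange_banner are hypothetical and exist only for this illustration.

```rust
use std::io::{self, Read, Write};

/// Blocking analogue of `tokio::io::join` (hypothetical, illustration only):
/// pairs a read half and a write half so callers see one duplex stream.
pub struct Joined<R, W> {
    reader: R,
    writer: W,
}

impl<R, W> Joined<R, W> {
    pub fn new(reader: R, writer: W) -> Self {
        Self { reader, writer }
    }

    /// Give the two halves back, e.g. to inspect what was written.
    pub fn into_parts(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reads are served by the receive half only.
        self.reader.read(buf)
    }
}

impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Writes go to the send half only.
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}

/// Stand-in for a protocol engine that only needs "some duplex stream":
/// it writes its own banner first, then reads whatever the peer sent.
pub fn exchange_banner<S: Read + Write>(stream: &mut S, ours: &[u8]) -> io::Result<Vec<u8>> {
    stream.write_all(ours)?;
    stream.flush()?;
    let mut peer = Vec::new();
    stream.read_to_end(&mut peer)?;
    Ok(peer)
}
```

The point of the sketch is that the consumer (exchange_banner here, russh in the real stack) is written against a duplex bound and never learns that two separate halves sit underneath.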
3. russh accepts a generic stream on both client and server side
Verified from /workspace/russh/russh/src/:
- server::run_stream<H, R>(config: Arc<Config>, stream: R, handler: H) where R: AsyncRead + AsyncWrite + Unpin + Send + 'static (server/mod.rs:997).
- client::connect_stream<H, R>(config: Arc<Config>, stream: R, handler: H) with the same bound (client/mod.rs:982).
Neither path assumes TCP — TCP-specific code (set_nodelay, TcpListener) is
confined to run_on_socket / connect / run_on_address. The generic stream
path is clean of TCP assumptions. russh writes its own SSH identification banner
first, then reads the peer's — no caller-side banner pre-work is needed.
4. SSH channels multiplex inside the QUIC bistream
ChannelId(u32) identifies channels; all channel traffic
(CHANNEL_OPEN/DATA/EOF/CLOSE/...) is interleaved on the single
underlying SSH transport stream that russh owns. This is independent of
QUIC's own stream multiplexing — one QUIC bistream ↔ one SSH connection ↔ many
SSH channels riding inside it. Port forwarding (direct-tcpip,
forwarded-tcpip) is ordinary channel traffic — each forwarded TCP connection
is a channel, not a separate QUIC stream.
This is the cleanest mapping and the right default: alknet-ssh does not try to map SSH channels onto QUIC streams (which would require bypassing russh's own multiplexer). It hands russh one bistream and lets russh multiplex inside it.
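As a rough mental model of "many channels inside one stream", the following std-only sketch reassembles per-channel byte streams from frames interleaved on a single transport. ChannelFrame and demultiplex are hypothetical names; real SSH channel messages carry more fields (message type, window adjustments), and russh performs this work internally, so alknet-ssh would not implement it.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified frame: only the channel id and the payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChannelFrame {
    pub channel_id: u32,
    pub payload: Vec<u8>,
}

/// Reassemble per-channel byte streams from frames that arrived
/// interleaved on one underlying transport stream.
pub fn demultiplex(frames: &[ChannelFrame]) -> BTreeMap<u32, Vec<u8>> {
    let mut channels: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    for frame in frames {
        channels
            .entry(frame.channel_id)
            .or_default()
            .extend_from_slice(&frame.payload);
    }
    channels
}
```

Each forwarded TCP connection would correspond to one key in that map, which is why forwarding needs no additional QUIC streams.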
5. Auth routes through the shared IdentityProvider
ADR-004 establishes the hybrid auth model: the endpoint resolves what it can
(TLS client cert → fingerprint), the handler resolves what it must (SSH key
fingerprint). auth.md shows the SshAdapter pattern exactly — constructor-
inject Arc<dyn IdentityProvider>, call resolve_from_fingerprint() inside
handle() when auth.identity is None, store the resolved Identity on the
Connection via set_identity() for observability (OQ-11). The
ConfigIdentityProvider already resolves SSH key fingerprints against
DynamicConfig::auth::authorized_keys_fingerprints. No new auth machinery is
needed for SSH.
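The resolution flow above can be sketched as follows. This is a reduced, synchronous stand-in: the trait and type names come from the text, but their shapes here (fields, return types, the resolve_identity helper) are assumptions made for illustration, not the definitions in auth.md.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced shape of a resolved identity.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub name: String,
    pub scopes: Vec<String>,
}

/// Reduced stand-in for the shared trait; only the method the text names.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Config-backed provider: a plain fingerprint -> identity lookup.
pub struct ConfigIdentityProvider {
    authorized_keys_fingerprints: HashMap<String, Identity>,
}

impl ConfigIdentityProvider {
    pub fn new(entries: Vec<(String, Identity)>) -> Self {
        Self {
            authorized_keys_fingerprints: entries.into_iter().collect(),
        }
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized_keys_fingerprints.get(fingerprint).cloned()
    }
}

/// Keep the endpoint's result if it already resolved an identity;
/// otherwise ask the provider about the offered SSH key fingerprint.
/// `None` means the key is unknown and authentication should be rejected.
pub fn resolve_identity(
    already_resolved: Option<Identity>,
    provider: &dyn IdentityProvider,
    offered_fingerprint: &str,
) -> Option<Identity> {
    already_resolved.or_else(|| provider.resolve_from_fingerprint(offered_fingerprint))
}
```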
6. Outbound credentials (if any) come from Capabilities
ADR-014 / ADR-022 establish that handlers get outbound credentials through the
registration bundle's capabilities field, populated by the assembly layer
from the vault. SSH itself typically needs no outbound credentials (the SSH host
key is a network-identity concern, the SSH client key for auth comes from the
peer), but if alknet-ssh ever needs an outbound secret (e.g., to dial an upstream
SOCKS proxy), it comes from Capabilities, not from env vars or vault-on-wire.
7. TCP SSH is a handler concern, not an endpoint concern
ADR-010 is explicit: "TCP is NOT an endpoint concern... the SSH handler can
listen on a TCP socket independently." This means alknet-ssh may optionally bind
a plain TCP listener (port 22-style) and accept raw SSH connections outside
the ALPN endpoint. The alknet/ssh ALPN path and the bare-TCP path can coexist;
they share the same russh::server::Config and the same server::Handler
implementation, differing only in how the stream is obtained. This is a
two-way-door additive capability — the TCP listener can be added later without
touching the ALPN path.
Less Straightforward Parts (Decision Points)
These are the points where Phase 0 surfaced genuine choices that affect the architecture. Each is tagged with a recommended door type per ADR-009. The Architect should turn the accepted recommendations into ADRs, and the deferred ones into open questions.
DP-1: Host key sourcing — vault-derived vs config-loaded vs both
(Recommended: one-way door — needs an ADR)
russh's server::Config.keys: Vec<PrivateKey> holds the SSH host keys the
server presents during key exchange. The host key is the SSH layer's analogue
of the TLS layer's network identity — it is what the SSH client verifies
against known_hosts. Three sourcing paths exist:
- (a) Vault-derived: derive an Ed25519 key from the alknet-vault seed (HD path) and use it as the SSH host key. Aligns with the project's "everything keys-from-seed" philosophy (ADR-020, ADR-026) and means the SSH host key is deterministic from the mnemonic: a node restored from mnemonic gets the same SSH host key fingerprint.
- (b) Config-loaded: operator provides SSH host key file path(s) in StaticConfig/DynamicConfig. Matches how OpenSSH works (/etc/ssh/ssh_host_ed25519_key). Simplest, decoupled from the vault.
- (c) Both: vault-derived by default, config override for operators who bring their own keys. Mirrors the TLS identity model (ADR-027's TlsIdentity::RawKey default + X509/Acme for domain-hosted).
Recommendation: (c) both, with vault-derived as the default. This
matches the symmetry with TlsIdentity in endpoint.md and respects the
"fingerprint-based, keys-from-seed" identity model. The vault is local-only by
construction (ADR-025) and assembly-layer-only access (ADR-019), so the SSH host
key is derived at startup and injected into SshAdapter::Config the same way
TLS RawKey identity is. Operators who want stable host keys independent of the
mnemonic can supply a key file. Phase 1 should write an ADR for this (likely
ADR-028) and a corresponding OQ if the exact config-field shape is unresolved.
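One possible shape for option (c), as a sketch only: the exact config-field shape is unresolved, so HostKeySource, SshHostKeyConfig, and host_key_file are hypothetical names, and actual key derivation and file loading are deliberately left out.

```rust
use std::path::PathBuf;

/// Where the SSH host key comes from (hypothetical enum).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HostKeySource {
    /// Derive an Ed25519 key from the vault seed at startup.
    VaultDerived,
    /// Load an operator-supplied key file instead.
    KeyFile(PathBuf),
}

/// Hypothetical config fragment: an absent override means
/// "derive the host key from the vault".
#[derive(Debug, Clone, Default)]
pub struct SshHostKeyConfig {
    pub host_key_file: Option<PathBuf>,
}

impl SshHostKeyConfig {
    /// Vault-derived unless the operator configured a key file.
    pub fn source(&self) -> HostKeySource {
        match &self.host_key_file {
            Some(path) => HostKeySource::KeyFile(path.clone()),
            None => HostKeySource::VaultDerived,
        }
    }
}
```

Modeling the override as an Option keeps the default path free of configuration, which matches the "vault-derived by default" recommendation.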
DP-2: Per-connection host key selection
(Recommended: one-way door — needs an ADR, ties to DP-1)
When supporting multiple host keys (e.g., an Ed25519 default + an RSA key for
legacy clients), russh's server::Config.keys is a Vec and russh negotiates
which to use based on the client's offered algorithms. The selection is
deterministic per-russh-version but not configurable per-connection. Question:
do we need per-peer host key selection (e.g., present different host keys to
different peer networks)? Almost certainly no for v1 — one host key set per
node, advertised uniformly. Phase 1 should record this as the simple model and
leave per-connection selection as a future two-way-door if a use case arises.
DP-3: Crypto backend — aws-lc-rs (default) vs ring
(Recommended: two-way door — decide at implementation time, but pin the choice in an ADR if it has cross-crate consequences)
russh 0.60.2 requires exactly one of aws-lc-rs (default) or ring enabled;
enabling both silently picks aws-lc-rs. Both produce AES-GCM / ChaCha20-Poly1305.
Considerations:
- aws-lc-rs is the russh default and has broader algorithm coverage, but brings NIST build machinery (a heavier build that requires a C compiler + cmake for the AWS-LC build).
- ring is lighter-weight, with a smaller binary and a simpler build.
- Cross-crate consequence: alknet-core already depends on rustls-acme = "0.12" with features = ["aws-lc-rs"] (see crates/alknet-core/Cargo.toml), so aws-lc-rs is already in the workspace's build. Choosing ring for russh while alknet-core uses aws-lc-rs would put both crypto backends in the final binary: wasteful but not incorrect.
Recommendation: default to aws-lc-rs (aligns with the rest of the
workspace and avoids a duplicate crypto backend), but treat the choice as a
two-way door — it can be flipped by changing default-features = false on
russh. Phase 1 should note this and not spend an ADR on it unless the
duplicate-backend concern turns out to matter for binary size.
DP-4: Client side — full russh::client vs SSH-only-server
(Recommended: one-way door — needs an ADR; user-clarified)
alknet-ssh as described in the README is the SSH handler (server side of the
alknet/ssh ALPN). But the reference implementation also ships a substantial
client (crates/alknet-core/src/client/*: SOCKS5 client, connect logic,
channel manager, ~1900 lines) and a SOCKS5 implementation
(src/socks5/*, ~800 lines) that turns the SSH server into a SOCKS5 proxy
endpoint clients can dial. The README lists alknet-ssh's purpose as "SSH
handler (russh), SOCKS5, port forwarding" — so the client/proxy functionality is
intended.
User clarification (necessary context): SOCKS5 and port forwarding in both directions are core, non-negotiable features for v1: they are "the basic features that made the first version gain interest" (3.5k clones/14 days). The user runs an actual VPN-like topology (WireGuard + Postgres + Redis today) over this, and explicitly wants the port-forwarding-in-both-directions capability to unlock the VPN-like functionality in the new stack. The growing worldwide trend of banning or blocking "VPNs" (which most users treat as a proxy or location-hiding tool) makes a self-hostable, stream-agnostic SSH-with-forwarding stack strategically valuable beyond alknet itself.
A concrete downstream consumer that the user wants to replace with this stack
is /workspace/@alkdev/dispatch — a single-crate axum service that uses
russh = "0.60" as an SSH client to act as a "reverse git runner" for
Docker containers and remote GPU instances (vast.ai, and eventually runpod /
ubicloud / others). Dispatch's src/ssh.rs is a textbook russh client wrapper
(connect + auth + channel_open_session().exec() + disconnect), and its
src/handlers.rs::start_forward does channel_open_direct_tcpip local→remote
forwarding (the VPN-like pattern). Dispatch has no SOCKS5 — that's the
alknet-original feature the user wants preserved. Dispatch also factors into a
future "abstract container service" — both it and alknet-ssh share the SSH
client + forwarding primitives, which argues strongly for those primitives living
in alknet-ssh (not duplicated in each consumer).
This reframes the questions:
- Does alknet-ssh own both the SSH server (handling alknet/ssh connections) and the SSH client (for outbound SSH dialing)? Yes (recommended strongly; dispatch and the VPN-like use case both need it, and factoring it into alknet-ssh avoids primitive duplication).
- Is the SOCKS5 server (what an SSH connection's client dials through the alknet node) a feature of alknet-ssh, or a separate crate? The SOCKS5 protocol itself is transport-independent (it just needs a byte stream), so it could factor out, but it's tightly coupled to the SSH-forwarding feature and to the VPN-like use case. The user explicitly abstracts some things out to optional crates but stresses that "some is pretty foundational stuff to ssh."
Recommendation: alknet-ssh owns both the SSH server (ProtocolHandler
for alknet/ssh) and the SSH client (outbound dialing, the primitives
dispatch and the VPN-like topology both consume). Port forwarding in both
directions (direct-tcpip local→remote, forwarded-tcpip/tcpip_forward
remote→local) is in v1 scope, not deferred. SOCKS5 is in v1 scope within
alknet-ssh (the VPN-like use case needs the node to expose a SOCKS5 server
that forwards over the SSH connection); the question of whether the SOCKS5
protocol codec factors into a tiny reusable alknet-socks5 crate (consuming a
byte stream, reusable over other transports) is left as a two-way-door
implementation detail — recommend starting with the codec inside alknet-ssh and
extracting only if a second consumer appears (the "stream-agnostic" philosophy
says this extraction, if done, is cheap). Phase 1 writes an ADR recording this
scope: server + client + bidirectional forwarding + SOCKS5-server-all-in-v1.
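To illustrate why the SOCKS5 codec is transport-independent, here is a minimal sketch of parsing a SOCKS5 CONNECT request (RFC 1928 layout: VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT) from a byte slice. It is not the reference implementation's code; the type and function names are hypothetical, and it skips the method-negotiation handshake and the BIND / UDP ASSOCIATE commands.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TargetAddr {
    Ipv4(Ipv4Addr),
    Ipv6(Ipv6Addr),
    Domain(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConnectRequest {
    pub addr: TargetAddr,
    pub port: u16,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Socks5Error {
    Truncated,
    BadVersion(u8),
    UnsupportedCommand(u8),
    UnsupportedAddressType(u8),
    InvalidDomain,
}

/// Parse a SOCKS5 CONNECT request from a complete buffer.
pub fn parse_connect_request(buf: &[u8]) -> Result<ConnectRequest, Socks5Error> {
    if buf.len() < 4 {
        return Err(Socks5Error::Truncated);
    }
    if buf[0] != 0x05 {
        return Err(Socks5Error::BadVersion(buf[0]));
    }
    if buf[1] != 0x01 {
        // Only CONNECT (0x01) is handled in this sketch.
        return Err(Socks5Error::UnsupportedCommand(buf[1]));
    }
    // buf[2] is the reserved byte; it is ignored here.
    let (addr, rest) = match buf[3] {
        0x01 => {
            let b = buf.get(4..8).ok_or(Socks5Error::Truncated)?;
            (TargetAddr::Ipv4(Ipv4Addr::new(b[0], b[1], b[2], b[3])), &buf[8..])
        }
        0x03 => {
            // Domain names are prefixed by a one-byte length.
            let len = *buf.get(4).ok_or(Socks5Error::Truncated)? as usize;
            let raw = buf.get(5..5 + len).ok_or(Socks5Error::Truncated)?;
            let name = std::str::from_utf8(raw).map_err(|_| Socks5Error::InvalidDomain)?;
            (TargetAddr::Domain(name.to_owned()), &buf[5 + len..])
        }
        0x04 => {
            let b = buf.get(4..20).ok_or(Socks5Error::Truncated)?;
            let mut octets = [0u8; 16];
            octets.copy_from_slice(b);
            (TargetAddr::Ipv6(Ipv6Addr::from(octets)), &buf[20..])
        }
        other => return Err(Socks5Error::UnsupportedAddressType(other)),
    };
    // The port is two bytes in network (big-endian) order.
    let port = rest.get(..2).ok_or(Socks5Error::Truncated)?;
    Ok(ConnectRequest {
        addr,
        port: u16::from_be_bytes([port[0], port[1]]),
    })
}
```

The parsed address and port are the values a SOCKS5 server would presumably pass on as the destination of a direct-tcpip channel. Because the parser only sees bytes, the same codec could sit in alknet-ssh now and move to a separate crate later without changes.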
DP-5: Channel-policy surface — which SSH services does alknet-ssh expose?
(Recommended: one-way door — needs an ADR, at least the default policy; user-clarified)
russh's server::Handler defaults every channel-request method to reject/no-op
(or, for auth_publickey_offered, accept the offer through to signature
verification). alknet-ssh must decide its default channel policy. The user's
clarification sharpens this:
- session channels: the dispatch use case uses channel_open_session().exec() heavily; that's the "reverse git runner" pattern (run a command on the remote instance, capture stdout/stderr/exit). For the server side of alknet/ssh, though, the question is whether alknet-ssh runs a real shell on its own node. Given the VPN-like / forwarding use case is primary and the "shell server" use case is secondary, the default should be exec-only: shell_request and pty_request default-reject; exec_request permitted (gated by ACL, see forwarding below). This keeps alknet-ssh a focused forwarding/exec appliance rather than a general-purpose interactive login server. Interactive shell can be an explicit opt-in later (two-way door).
- port forwarding in both directions (direct-tcpip in, tcpip_forward/forwarded-tcpip out): in v1 scope, both directions, per user clarification. The policy (which destinations are allowed, whether to restrict by ACL/scope) still needs specifying.
- PTY/X11/agent forwarding: default-reject for security; explicit opt-in. (Consistent with the exec-only session stance.)
Default-deny baseline: the user explicitly called out that "the configuration
needs to be such that it's kind of 'default deny', which russh does by default."
russh's server::Handler already defaults every channel/auth/forwarding callback
to reject or no-op — so alknet-ssh gets default-deny for free by overriding
only the methods it wants to enable. Phase 1 must record this as the explicit
baseline: every forwarding destination, every exec command, every channel type
must be explicitly permitted by config + ACL, never implicitly allowed.
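The default-deny baseline can be sketched as a policy table in which nothing is enabled unless it is switched on explicitly. All names here (ChannelRequest, Decision, ChannelPolicy, v1) are hypothetical; in the real crate this logic would live in the overridden server::Handler callbacks, which this sketch does not model.

```rust
/// Kinds of channel-level requests a peer might make (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelRequest {
    Exec,
    Shell,
    Pty,
    X11,
    AgentForward,
    DirectTcpip,
    TcpipForward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Refused outright, regardless of the peer's scopes.
    Reject,
    /// Feature is enabled; the request still has to pass the ACL scope check.
    AllowIfScoped,
}

/// `Default` leaves every flag `false`: an unconfigured policy denies everything.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ChannelPolicy {
    pub exec: bool,
    pub direct_tcpip: bool,
    pub tcpip_forward: bool,
}

impl ChannelPolicy {
    /// The v1 surface proposed in this document: exec plus forwarding in both directions.
    pub fn v1() -> Self {
        Self { exec: true, direct_tcpip: true, tcpip_forward: true }
    }

    pub fn decide(&self, request: ChannelRequest) -> Decision {
        let enabled = match request {
            ChannelRequest::Exec => self.exec,
            ChannelRequest::DirectTcpip => self.direct_tcpip,
            ChannelRequest::TcpipForward => self.tcpip_forward,
            // No switch exists for these in this sketch, so they are always rejected.
            ChannelRequest::Shell
            | ChannelRequest::Pty
            | ChannelRequest::X11
            | ChannelRequest::AgentForward => false,
        };
        if enabled { Decision::AllowIfScoped } else { Decision::Reject }
    }
}
```

Note that even an enabled request only reaches AllowIfScoped: config enables a feature, and the ACL scopes on the resolved Identity decide the individual request.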
ACL gating: forwarding destinations and exec commands are gated by scopes on
the resolved Identity. The exact scope vocabulary (e.g., ssh:forward:*,
ssh:forward:127.0.0.1:5432, ssh:exec:git-upload-pack) is a design choice the
Architect makes — likely a small, capability-shaped scope set with wildcards,
consistent with Identity.scopes / Identity.resources (auth.md). The
"resources" field on Identity (populated only by composition per
CompositionAuthority::as_identity, ADR-022) is not available to
fingerprint/token-resolved external identities, so per-destination ACLs for
inbound SSH must live in scopes, not resources.
Recommendation: Phase 1 writes an ADR defining the v1 channel-policy
surface: exec (gated) + bidirectional port forwarding (gated), with
shell/PTY/X11/agent forwarding default-rejected. Default-deny baseline is
inherited from russh. Forwarding destinations + exec commands gated by ACL
scopes. The exact scope vocabulary is an OQ for Phase 1 (it interacts with how
operators express "allow forwarding to 127.0.0.1:5432" in DynamicConfig).
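Since the scope vocabulary is still open, the following is only one candidate shape, built on the example scopes mentioned above. It assumes exact-match scopes plus a trailing-wildcard form such as ssh:forward:*; the function names and the matching rule are hypothetical, not a decided design.

```rust
/// Does a single granted scope cover the required scope?
/// Assumed rule: exact match, or a trailing `*` directly after a `:`
/// that matches any remainder (e.g. `ssh:forward:*`).
pub fn scope_matches(granted: &str, required: &str) -> bool {
    match granted.strip_suffix('*') {
        // A bare `*` has no `:`-terminated prefix and therefore grants nothing.
        Some(prefix) => prefix.ends_with(':') && required.starts_with(prefix),
        None => granted == required,
    }
}

/// Default-deny: permitted only if some granted scope covers the request.
pub fn is_permitted(scopes: &[&str], required: &str) -> bool {
    scopes.iter().any(|granted| scope_matches(granted, required))
}

/// Scope required to forward to `host:port` (hypothetical format).
pub fn forward_scope(host: &str, port: u16) -> String {
    format!("ssh:forward:{}:{}", host, port)
}

/// Scope required to exec `command` (hypothetical format).
pub fn exec_scope(command: &str) -> String {
    format!("ssh:exec:{}", command)
}
```

Prefix matching is used rather than segment-by-segment matching because forwarding destinations themselves contain colons. Whether that trade-off is acceptable is exactly the kind of question the scope-vocabulary OQ would need to settle.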
DP-6: Auth method coverage — publickey-only vs password/kbdint too
(Recommended: two-way door — start publickey-only, extend later)
russh supports none, password, publickey, keyboard-interactive, and
OpenSSH certificate auth server-side. alknet's identity model (auth.md) is
fingerprint-based — SSH key fingerprint → IdentityProvider → Identity.
This maps naturally onto publickey (the fingerprint is the SHA-256 of the
presented public key) and OpenSSH certificate auth (cert fingerprint).
Password / keyboard-interactive don't fit the fingerprint model as cleanly
(there's no resolve_from_password on IdentityProvider).
Recommendation: start publickey-only (and certificate auth, which is a superset of publickey from the fingerprint POV). Treat password / keyboard-interactive as a two-way door — can be added later if a use case arises, but the natural alknet identity story is key-based. Phase 1 should note this; likely not a full ADR (it's a default, not a structural decision) but at least a documented design choice in the ssh spec.
DP-7: tokio as a hard transitive dependency
(Recommended: acknowledged constraint, not a decision)
russh 0.60.2 transitively requires tokio (no "no-tokio" feature; only WASM swaps
the spawner). The server loop uses tokio::time::sleep for keepalive/inactivity
timers, so the tokio runtime must have its time driver enabled. alknet-ssh
must run inside a tokio runtime — which it will, because alknet-core's endpoint
already runs on tokio (tokio = { version = "1", features = ["full"] }). This
is consistent with the rest of the workspace and not a constraint to fight.
Phase 1 should record it as a known constraint; OQ-09 (WASM boundaries) already
documents that the server-side dispatch path is a one-way door away from WASM
— alknet-ssh inherits that.
DP-8: The ssh-key crate is forked
(Recommended: acknowledged constraint — use the russh re-export)
russh 0.60.2 depends on internal-russh-forked-ssh-key = "0.6.18" (a renamed
fork), not upstream ssh-key. alknet-ssh must not add upstream ssh-key
directly — that would put two ssh-key versions in the tree and the
PublicKey/PrivateKey types wouldn't unify. The fork is re-exported through
russh::keys::ssh_key, so alknet-ssh should always reach key types via
russh::keys::* (or russh::keys::ssh_key::*) to stay on the same fork. Phase
1 should note this as an implementation constraint; it's not architecturally
interesting but a real footgun if missed.
DP-9: End-to-end over a non-TCP stream is untested upstream
(Recommended: de-risk early with a POC test)
russh's own test suite (/workspace/russh/russh/src/tests.rs and
client/test.rs) only exercises the client↔server round trip over real TCP
loopback. There is no test connecting connect_stream ↔ run_stream over
tokio::io::duplex() or any other in-memory pipe. The SshRead::read_ssh_id
unit tests feed &[u8] directly, proving the banner parser works on
non-socket streams — but a full client↔server round trip over a non-TCP stream
is unverified upstream.
The reference implementation uses this path in production (per
transport/iroh_transport.rs using tokio::io::join), which is strong
empirical evidence it works. But the alknet greenfield rewrite should close
this gap early with an integration test using tokio::io::duplex() connecting
connect_stream ↔ run_stream before going near real QUIC.
Recommendation: per sdd_process.md Phase 0, this is a candidate for a POC
Specialist task (.worktrees/research/ssh-stream-poc/). Phase 1's
architecture docs should reference the POC's outcome. If the POC surfaces
issues (half-open stream handling, poll_shutdown semantics, etc.), they feed
back into the spec as constraints.
DP-10: Bare-TCP SSH listener — in-v1 for git-over-SSH forward-compat
(Recommended: one-way door on the config shape, two-way door on the listener itself — user-clarified)
ADR-010 already establishes that bare-TCP SSH is a handler concern, not an
endpoint concern — the SSH handler can listen on a TCP socket independently of
the alknet/ssh ALPN path. The user added a forward-looking constraint: "We
need to be able to have that TCP handler so we can later support git over ssh."
Standard git-over-SSH (ssh git@host ...) runs on TCP port 22, not over QUIC,
not over the alknet/ssh ALPN — git clients (git, libgit2, gix) dial a TCP
socket and expect the SSH-2 protocol directly. To make alknet-ssh a viable
git-over-SSH target, the bare-TCP listener must be a first-class path, not just
a future two-way-door add-on.
The two paths (ALPN/QUIC vs bare-TCP) share the same russh::server::Config and
the same server::Handler implementation; they differ only in how the duplex
stream is obtained:
- ALPN path: handle() receives the QUIC Connection, calls accept_bi(), tokio::io::joins the halves, hands to run_stream.
- TCP path: a tokio::net::TcpListener accept loop hands each accepted TcpStream directly to run_stream (russh accepts TcpStream natively via run_on_socket, or we use run_stream with the raw stream to keep config/handler identical across both paths).
Default-deny baseline (user-stated): "the configuration needs to be such that it's kind of 'default deny', which russh does by default." This applies to both paths: the same ACL gating, the same channel policy, the same default-reject for forwarding destinations. A TCP-listener client gets exactly the same policy treatment as an ALPN client; the only difference is the transport. The TCP listener is off by default (must be explicitly configured to bind), consistent with the default-deny posture: an operator who doesn't configure a TCP bind address gets no TCP listener, only the ALPN path.
Recommendation: Phase 1 records the dual-path model in the ssh spec — ALPN/QUIC primary, bare-TCP as a co-equal first-class path (off by default, explicit config to enable) — so that the configuration shape accommodates both from v1 even if the TCP listener implementation lands slightly later. Crucially, the config schema should reserve the TCP-listener fields now (one-way door — adding a config field later is non-breaking but designing the config around only-ALPN-then-retrofitting-TCP is messier than reserving the shape up front). The listener implementation itself is a two-way door. This avoids the trap where git-over-SSH becomes a painful retrofit because the config only modeled the ALPN path.
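A minimal sketch of what "reserve the TCP-listener fields now, off by default" could look like. The struct and field names (SshTransportConfig, tcp_listener, bind) are hypothetical placeholders for whatever the Phase 1 config schema settles on.

```rust
use std::net::SocketAddr;

/// Reserved config for the bare-TCP listener (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TcpListenerConfig {
    pub bind: SocketAddr,
}

/// Transport section of the SSH config. The ALPN/QUIC path is always
/// present; the TCP listener exists only if explicitly configured.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct SshTransportConfig {
    /// `None` (the default) means no TCP socket is ever bound.
    pub tcp_listener: Option<TcpListenerConfig>,
}

/// Where SSH byte streams can come from under a given config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamSource {
    AlpnQuic,
    BareTcp(SocketAddr),
}

impl SshTransportConfig {
    pub fn stream_sources(&self) -> Vec<StreamSource> {
        let mut sources = vec![StreamSource::AlpnQuic];
        if let Some(tcp) = &self.tcp_listener {
            sources.push(StreamSource::BareTcp(tcp.bind));
        }
        sources
    }
}
```

With this shape, enabling git-over-SSH later means populating an existing optional field rather than restructuring the config.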
Tentative Recommended Approach (Convergence)
Based on the above, the recommended approach to take into Phase 1:
- Crate: alknet-ssh, depends on alknet-core and russh = "0.60" (default features, i.e. aws-lc-rs). Implements ProtocolHandler for b"alknet/ssh". Owns both the SSH server and the SSH client (the client is the shared primitive dispatch and the VPN-like topology both consume).
- Stream wiring: handle() accepts the QUIC Connection, calls connection.accept_bi() once to get (SendStream, RecvStream), joins them with tokio::io::join(recv, send), and hands the resulting duplex stream to russh::server::run_stream(Arc::clone(&config), stream, handler). One QUIC bistream ↔ one SSH connection; russh multiplexes SSH channels inside it.
- Auth: constructor-injected Arc<dyn IdentityProvider> (per auth.md's SshAdapter example). Inside handle(), if auth.identity is None, russh's server::Handler::auth_publickey resolves the offered key's fingerprint through the provider; on success, store the resolved Identity on the Connection via set_identity() (OQ-11). Start publickey-only (plus OpenSSH cert, which rides the same fingerprint path).
- Host keys (DP-1): vault-derived Ed25519 by default (derived from the seed at startup by the assembly layer and injected into SshAdapter's config), with an optional config-supplied key file override. Symmetric with TlsIdentity::RawKey (ADR-027). Needs an ADR.
- Channel policy: default-deny, exec + bidirectional forwarding in v1 (DP-5): v1 supports exec (gated) + port forwarding in both directions (direct-tcpip local→remote, forwarded-tcpip/tcpip_forward remote→local, both gated). shell/PTY/X11/agent forwarding default-reject (opt-in later, two-way door). Default-deny baseline inherited from russh: every channel type, every forwarding destination, every exec command must be explicitly permitted by config + ACL scopes; never implicitly allowed. Forwarding destinations + exec commands gated by scopes on the resolved Identity (the resources field is composition-only per ADR-022, so inbound-SSH per-destination ACLs live in scopes). Needs an ADR defining the v1 surface + the scope vocabulary (latter likely stays an OQ).
- Client + SOCKS5, in v1, both in alknet-ssh (DP-4): alknet-ssh owns the SSH server (the ProtocolHandler) and the SSH client (outbound dialing, the primitives dispatch and the VPN-like topology both consume). Port forwarding in both directions is a client-side feature too (the client opens direct-tcpip channels; dispatch does exactly this). SOCKS5 server (what an SSH connection's client dials through the alknet node) is in v1 within alknet-ssh, because the VPN-like use case requires it. The SOCKS5 protocol codec may or may not factor into a tiny reusable alknet-socks5 crate (consuming a byte stream); recommend starting with the codec inside alknet-ssh and extracting only if a second consumer appears (two-way door; the stream-agnostic philosophy makes extraction cheap). Needs an ADR confirming this scope.
- De-risk POC (DP-9): a Phase 0 POC validating connect_stream ↔ run_stream over tokio::io::duplex() before Phase 1 finalizes the stream wiring spec. Strong empirical evidence from the reference implementation suggests it will pass, but the upstream test gap is real.
- Bare-TCP SSH listener: first-class path, config shape reserved in v1, listener off-by-default (DP-10): the alknet/ssh ALPN/QUIC path is primary; a bare-TCP listener is a co-equal first-class path needed for future git-over-SSH support. Reserve the TCP-listener config fields in v1 (one-way door on the config schema; retrofitting is messier than reserving the shape up front). The listener is off by default (explicit config to bind), consistent with the default-deny posture. Both paths share the same server::Config + Handler + ACL policy; only the stream source differs. The listener implementation itself is a two-way door, but the config shape is locked in v1.
Open Questions to Carry into Phase 1
The following should become OQs in docs/architecture/open-questions.md
(numbering will be assigned by the Architect — likely OQ-25 onwards, since
OQ-01–OQ-24 exist):
- OQ-SSH-01 (host key sourcing): vault-derived default + config override; resolved by the DP-1 ADR.
- OQ-SSH-02 (channel policy v1 surface + default-deny scope vocabulary): the set of allowed channel types / request types is resolved by the DP-5 ADR; the exact scope vocabulary for forwarding destinations + exec commands (e.g., ssh:forward:127.0.0.1:5432 vs a resources-style shape) stays open. It interacts with how operators express allow-lists in DynamicConfig and with the fact that Identity.resources is composition-only (ADR-022).
- OQ-SSH-03 (client + SOCKS5 scope): confirm alknet-ssh owns both server + client + SOCKS5-server in v1, and whether the SOCKS5 codec extracts to a separate crate now or later; resolved (in favor of in-alknet-ssh-now, extract-later) by the DP-4 ADR.
- OQ-SSH-04 (POC outcome): did the duplex()-based round-trip POC pass, and did it surface any stream-handling constraints (half-open, poll_shutdown, maximum packet size) that constrain the spec? Resolved by POC Specialist results.
- OQ-SSH-05 (crypto backend): confirm aws-lc-rs default aligns with the rest of the workspace; defer flipping to ring unless binary-size pressure arises. Two-way door.
- OQ-SSH-06 (bare-TCP listener enablement timeline): the config shape is reserved in v1 (DP-10); whether the TCP listener implementation lands in v1 or as a fast-follow is a two-way door. Git-over-SSH is the forcing function: decide based on whether v1 needs to be a git-over-SSH target out of the box.
Next Steps (Phase 0 → Phase 1)
- You decide on the DP-1, DP-4, DP-5, DP-10 recommendations (or amend them). These are the load-bearing architectural choices, and DP-4/DP-5/DP-10 now reflect your clarifications (SOCKS5 + bidirectional forwarding + TCP listener for git-over-SSH are all in-scope; default-deny baseline). DP-2, DP-3, DP-6, DP-7, DP-8 are defaults I recommend accepting as-is; DP-9 is a POC task.
- Optional POC (DP-9): spawn a POC Specialist to validate connect_stream ↔ run_stream over tokio::io::duplex(). Timeboxed; if it passes, the stream-wiring spec is straightforward; if it surfaces constraints, they fold into the spec.
- Phase 1 (Architect): produce docs/architecture/crates/ssh/README.md + component specs (e.g., ssh-handler.md, ssh-stream.md, ssh-channels.md, ssh-auth.md, ssh-forwarding.md, ssh-socks5.md, ssh-client.md, ssh-tcp-listener.md), ADRs for the accepted DPs (likely ADR-028 host-key sourcing, ADR-029 channel policy + default-deny, ADR-030 ssh server+client+socks5+forwarding scope, ADR-031 bare-TCP listener config shape), and the OQs above in open-questions.md. Update docs/architecture/README.md index and ADR table.
References
- docs/sdd_process.md: Phase 0 process definition
- docs/architecture/overview.md: ALPN-as-service, crate graph, ProtocolHandler
- docs/architecture/crates/core/core-types.md: ProtocolHandler, Connection, BiStream
- docs/architecture/crates/core/auth.md: AuthContext, IdentityProvider, SshAdapter example
- docs/architecture/decisions/001-alpn-protocol-dispatch.md: ALPN dispatch
- docs/architecture/decisions/002-protocol-handler-trait.md: ProtocolHandler trait
- docs/architecture/decisions/004-auth-as-shared-core.md: hybrid auth
- docs/architecture/decisions/007-bistream-type-definition.md: BiStream trait
- docs/architecture/decisions/010-alpn-router-and-endpoint.md: endpoint, TCP-is-handler-concern
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md: Capabilities
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md: registration bundle
- docs/architecture/decisions/025-vault-local-only-dispatch.md: vault local-only
- docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md: TLS identity model (symmetry reference for DP-1)
- docs/research/references/ssh/russh/01-06: existing russh deep-dives
- /workspace/russh/: russh 0.60.2 source (authoritative; cargo cache has 0.49.2 only)
- /workspace/@alkdev/alknet-main/crates/alknet-core/src/: reference implementation (transport/iroh_transport.rs:94 shows the tokio::io::join adapter; server/, interface/ssh.rs, client/, socks5/ for prior art)
- /workspace/@alkdev/dispatch/: concrete downstream consumer the user wants to replace with this stack: axum + russh = "0.60" SSH client for "reverse git runner" over Docker/vast.ai. src/ssh.rs (russh client wrapper, 143 lines), src/handlers.rs::start_forward (channel_open_direct_tcpip local→remote forwarding), src/sftp.rs (russh-sftp client). AGENTS.md and docs/architecture.md describe the architecture. No SOCKS5; that's the alknet-original feature preserved here. Dispatch is a textbook consumer of the alknet-ssh client + forwarding primitives, which is why those live in alknet-ssh rather than being duplicated per-consumer.