alknet/docs/research/alknet-ssh/phase-0-findings.md
glm-5.2 db1dcd362f docs(research): revise alknet-ssh phase-0 — SOCKS5+forwarding in v1, TCP listener for git-over-ssh
Incorporates user clarifications: SOCKS5 and bidirectional port forwarding are
core non-negotiable v1 features (the VPN-like use case + the 3.5k-clones
demand). Adds DP-10 for the bare-TCP SSH listener as a first-class path needed
for future git-over-SSH, with config shape reserved in v1 (off-by-default,
default-deny). Grounds the client/forwarding recommendations in the dispatch
downstream consumer at /workspace/@alkdev/dispatch, which is a textbook russh
SSH client + direct-tcpip forwarder the user wants to replace with this stack.

alknet-ssh now owns both server and client + SOCKS5-server in v1; the SOCKS5
codec may extract to a separate crate later (two-way door).
2026-06-25 08:46:35 +00:00

---
status: draft
last_updated: 2026-06-25
---
# alknet-ssh — Phase 0 Research Findings
This document captures Phase 0 (Exploration) findings for the `alknet-ssh`
crate. The objective of Phase 0 per `docs/sdd_process.md` is: *"Capture vision
and guiding principles; research options; validate approaches; converge on a
recommended approach."* It is the input to Phase 1 (Architecture), where the
Architect will produce `docs/architecture/crates/ssh/*.md` specs, ADRs, and open
questions.
## Vision Recap
`alknet-ssh` is the SSH protocol handler for the ALPN-as-service architecture
(ADR-001). It registers the `alknet/ssh` ALPN on the shared `AlknetEndpoint`
and implements the `ProtocolHandler` trait (ADR-002, ADR-007).
The guiding insight, carried over from the reference implementation at
`/workspace/@alkdev/alknet-main/`, is:
> **SSH does not care where its underlying byte stream comes from.**
The reference implementation built on this — it ran the russh SSH-2 state
machine over a `Transport`-produced duplex stream (`AsyncRead + AsyncWrite +
Unpin + Send`) rather than over its own TCP sockets. The greenfield rebuild
keeps the insight and drops the messy transport-abstraction layer that grew
around it: in the new model the `AlknetEndpoint` hands the handler a `Connection`
(quinn/iroh QUIC), and the handler is responsible for opening/accepting the
bidirectional QUIC stream that carries the SSH-2 protocol.
The reference implementation reportedly has 3.5k clones in the past 14 days, so
there is real-world demand for the "SSH-over-arbitrary-stream" capability. The
greenfield rebuild is a total rewrite, with one exception: most of the vault was
initially copied over, and it too has since been rewritten.
## Sources Investigated
| Source | Path | Note |
|--------|------|------|
| Existing arch docs (core) | `docs/architecture/crates/core/*` | ProtocolHandler, Connection, BiStream, AuthContext, IdentityProvider, Endpoint |
| Existing ADRs 001–027 | `docs/architecture/decisions/*` | All Accepted; ADR-002/007/010/004/011 most relevant to SSH |
| russh reference deep-dives | `docs/research/references/ssh/russh/01-06` | Already authored; covered overview, keys, protocol, crypto, internals, usage |
| russh source (authoritative) | `/workspace/russh/` | Checked out at `Cargo.toml` version `0.60.2`. The cargo registry cache only contains `russh-0.49.2` — older and NOT the intended version. **Use `/workspace/russh/` as the canonical 0.60.2 reference.** |
| alknet Cargo.lock | `Cargo.lock` | Does **not** yet contain a russh entry — russh is not wired into the workspace dependency graph yet |
| Reference implementation | `/workspace/@alkdev/alknet-main/` | `crates/alknet-core/src/{interface/ssh.rs, server/handler.rs, server/serve.rs, transport/*, client/*}` |
> **Note on the russh clone**: the `/workspace/russh` checkout was inspected and
> its `russh/Cargo.toml` declares `version = "0.60.2"` with `edition = "2024"`
> and MSRV 1.85 — matching the research references. The agent flagged the
> cargo-cache mismatch; verifying against the checkout rather than the cache is
> the safe choice since 0.49.2 → 0.60.2 spans major API changes
> (`server::run_stream` generic signature, `Auth` enum shape, `server::Handler`
> method set all differ). When alknet-ssh's `Cargo.toml` pins `russh = "0.60"`,
> Cargo will fetch the matching 0.60.x into the cache, at which point the cache
> becomes authoritative for *future* investigations.
## Straightforward Parts
These are settled by existing ADRs and the reference implementation; Phase 1
should document them as spec rather than re-litigate them.
### 1. SSH is a `ProtocolHandler` on `alknet/ssh`
Confirmed by overview.md's ALPN Registry and core-types.md. `SshAdapter`
implements `ProtocolHandler::handle(&self, connection: Connection, auth:
&AuthContext) -> Result<(), HandlerError>` with `alpn() = b"alknet/ssh"`. The
handler owns the entire `Connection` lifecycle (ADR-006: one ALPN, one
connection, one handler) and may open/accept multiple QUIC streams because it
multiplexes SSH channels.
### 2. SSH runs over a single QUIC bidirectional stream
The reference implementation's `transport/iroh_transport.rs` proves the
approach: open a QUIC bistream, then **join the two halves into a single duplex
type with `tokio::io::join(recv, send)`** and feed that to russh. This is the
key adapter — it is already a one-liner in tokio:
```rust
// from alknet-main/.../iroh_transport.rs:94
let conn = self.endpoint.connect(self.node_id, ALPN).await?;
let (send, recv) = conn.open_bi().await?;
Ok(io::join(recv, send)) // produces: AsyncRead + AsyncWrite + Unpin + Send
```
The Phase 0 research subagent initially speculated that a custom `QuicSshStream`
adapter struct would be needed. Verifying against the reference implementation
revealed that `tokio::io::join` already produces the `AsyncRead + AsyncWrite`
combo russh requires (russh internally re-splits via `tokio::io::split`). **No
custom adapter struct is required** — the `Connection::accept_bi()` /
`open_bi()` pair plus `tokio::io::join` is sufficient. This is a meaningful
simplification over the speculative approach.
### 3. russh accepts a generic stream on both client and server side
Verified from `/workspace/russh/russh/src/`:
- `server::run_stream<H, R>(config: Arc<Config>, stream: R, handler: H)` where
`R: AsyncRead + AsyncWrite + Unpin + Send + 'static` (`server/mod.rs:997`).
- `client::connect_stream<H, R>(config: Arc<Config>, stream: R, handler: H)`
with the same bound (`client/mod.rs:982`).
Neither path assumes TCP — TCP-specific code (`set_nodelay`, `TcpListener`) is
confined to `run_on_socket` / `connect` / `run_on_address`. The generic stream
path is clean of TCP assumptions. russh writes its own SSH identification banner
first, then reads the peer's — no caller-side banner pre-work is needed.
### 4. SSH channels multiplex *inside* the QUIC bistream
`ChannelId(u32)` identifies channels; all channel traffic
(`CHANNEL_OPEN`/`DATA`/`EOF`/`CLOSE`/...) is interleaved on the single
underlying SSH transport stream that russh owns. **This is independent of
QUIC's own stream multiplexing** — one QUIC bistream ↔ one SSH connection ↔ many
SSH channels riding inside it. Port forwarding (`direct-tcpip`,
`forwarded-tcpip`) is ordinary channel traffic — each forwarded TCP connection
is a channel, not a separate QUIC stream.
This is the cleanest mapping and the right default: alknet-ssh does not try to
map SSH channels onto QUIC streams (which would require bypassing russh's own
multiplexer). It hands russh one bistream and lets russh multiplex inside it.
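
As a rough illustration of what "channels riding inside one stream" means on the wire, the sketch below encodes and demultiplexes `SSH_MSG_CHANNEL_DATA` payloads as laid out in RFC 4254 (message number 94, a `uint32` recipient channel, then the data as a length-prefixed `string`). It is a std-only toy, not russh's implementation: it ignores the encrypted binary-packet layer, window accounting, and every other message type, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Message number of SSH_MSG_CHANNEL_DATA (RFC 4254).
const SSH_MSG_CHANNEL_DATA: u8 = 94;

/// Encode one channel-data payload: type byte, recipient channel (big-endian
/// u32), then the data as an SSH `string` (u32 length prefix + bytes).
fn encode_channel_data(channel: u32, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(9 + data.len());
    out.push(SSH_MSG_CHANNEL_DATA);
    out.extend_from_slice(&channel.to_be_bytes());
    out.extend_from_slice(&(data.len() as u32).to_be_bytes());
    out.extend_from_slice(data);
    out
}

/// Decode one payload back into (channel, data); `None` if it is malformed.
fn decode_channel_data(payload: &[u8]) -> Option<(u32, &[u8])> {
    if payload.len() < 9 || payload[0] != SSH_MSG_CHANNEL_DATA {
        return None;
    }
    let channel = u32::from_be_bytes([payload[1], payload[2], payload[3], payload[4]]);
    let len = u32::from_be_bytes([payload[5], payload[6], payload[7], payload[8]]) as usize;
    let data = payload.get(9..)?;
    if data.len() != len {
        return None;
    }
    Some((channel, data))
}

/// Demultiplex an interleaved sequence of payloads into per-channel buffers.
/// Malformed payloads are skipped.
fn demux(payloads: &[Vec<u8>]) -> HashMap<u32, Vec<u8>> {
    let mut per_channel: HashMap<u32, Vec<u8>> = HashMap::new();
    for payload in payloads {
        if let Some((channel, data)) = decode_channel_data(payload) {
            per_channel.entry(channel).or_default().extend_from_slice(data);
        }
    }
    per_channel
}
```

Feeding `demux` three payloads for channels 0, 1, 0 yields two independent byte buffers, which is the sense in which each forwarded TCP connection is "a channel, not a separate QUIC stream".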
### 5. Auth routes through the shared `IdentityProvider`
ADR-004 establishes the hybrid auth model: the endpoint resolves what it can
(TLS client cert → fingerprint), the handler resolves what it must (SSH key
fingerprint). `auth.md` shows the `SshAdapter` pattern exactly — constructor-
inject `Arc<dyn IdentityProvider>`, call `resolve_from_fingerprint()` inside
`handle()` when `auth.identity` is `None`, store the resolved `Identity` on the
`Connection` via `set_identity()` for observability (OQ-11). The
`ConfigIdentityProvider` already resolves SSH key fingerprints against
`DynamicConfig::auth::authorized_keys_fingerprints`. No new auth machinery is
needed for SSH.
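
The resolution order described above can be sketched in a few lines. Everything here is a simplified, hypothetical stand-in: the real `Identity` and `IdentityProvider` live in alknet-core and their signatures may differ, and `FingerprintDirectory` is an invented name used only to keep the example std-only.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for alknet's resolved identity.
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    name: String,
    scopes: Vec<String>,
}

/// Hypothetical stand-in for the provider: a fingerprint -> identity lookup.
struct FingerprintDirectory {
    authorized: HashMap<String, Identity>,
}

impl FingerprintDirectory {
    /// Returns the identity bound to this key fingerprint, if any.
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).cloned()
    }
}

/// Prefer an identity the endpoint already resolved; otherwise resolve the
/// SSH key fingerprint; otherwise return `None`, which the caller treats as
/// an authentication failure.
fn resolve_identity(
    endpoint_identity: Option<Identity>,
    ssh_key_fingerprint: &str,
    directory: &FingerprintDirectory,
) -> Option<Identity> {
    endpoint_identity.or_else(|| directory.resolve_from_fingerprint(ssh_key_fingerprint))
}
```

An unknown fingerprint with no endpoint-resolved identity produces `None`, so the absence of a match never falls through to an implicit allow.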
### 6. Outbound credentials (if any) come from `Capabilities`
ADR-014 / ADR-022 establish that handlers get outbound credentials through the
registration bundle's `capabilities` field, populated by the assembly layer
from the vault. SSH itself typically needs no outbound credentials (the SSH host
key is a network-identity concern, the SSH *client* key for auth comes from the
peer), but if alknet-ssh ever needs an outbound secret (e.g., to dial an upstream
SOCKS proxy), it comes from `Capabilities`, not from env vars or vault-on-wire.
### 7. TCP SSH is a handler concern, not an endpoint concern
ADR-010 is explicit: "TCP is NOT an endpoint concern... the SSH handler can
listen on a TCP socket independently." This means alknet-ssh may optionally bind
a plain TCP listener (port 22-style) and accept raw SSH connections *outside*
the ALPN endpoint. The `alknet/ssh` ALPN path and the bare-TCP path can coexist;
they share the same `russh::server::Config` and the same `server::Handler`
implementation, differing only in how the stream is obtained. This is a
two-way-door additive capability — the TCP listener can be added later without
touching the ALPN path.
## Less Straightforward Parts (Decision Points)
These are the points where Phase 0 surfaced genuine choices that affect the
architecture. Each is tagged with a recommended door type per ADR-009. The
Architect should turn the *accepted* recommendations into ADRs, and the
*deferred* ones into open questions.
### DP-1: Host key sourcing — vault-derived vs config-loaded vs both
*(Recommended: one-way door — needs an ADR)*
russh's `server::Config.keys: Vec<PrivateKey>` holds the SSH host keys the
server presents during key exchange. The host key is the SSH layer's analogue
of the TLS layer's network identity — it is what the *SSH client* verifies
against `known_hosts`. Three sourcing paths exist:
- **(a) Vault-derived**: derive an Ed25519 key from the alknet-vault seed (HD
path) and use it as the SSH host key. Aligns with the project's "everything
keys-from-seed" philosophy (ADR-020, ADR-026) and means the SSH host key is
deterministic from the mnemonic — a node restored from mnemonic gets the same
SSH host key fingerprint.
- **(b) Config-loaded**: operator provides SSH host key file path(s) in
`StaticConfig`/`DynamicConfig`. Matches how OpenSSH works
(`/etc/ssh/ssh_host_ed25519_key`). Simplest, decoupled from the vault.
- **(c) Both**: vault-derived by default, config override for operators who
bring their own keys. Mirrors the TLS identity model (ADR-027's
`TlsIdentity::RawKey` default + `X509`/`Acme` for domain-hosted).
**Recommendation**: **(c) both**, with vault-derived as the default. This
matches the symmetry with `TlsIdentity` in endpoint.md and respects the
"fingerprint-based, keys-from-seed" identity model. The vault is local-only by
construction (ADR-025) and assembly-layer-only access (ADR-019), so the SSH host
key is derived at startup and injected into `SshAdapter::Config` the same way
TLS RawKey identity is. Operators who want stable host keys independent of the
mnemonic can supply a key file. Phase 1 should write an ADR for this (likely
ADR-028) and a corresponding OQ if the exact config-field shape is unresolved.
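
One possible shape for option (c), offered only as a sketch: the enum and function names below are hypothetical, and the actual config-field shape is exactly what the ADR/OQ would settle. The point is that the vault-derived key is the fallback and an operator-supplied file is an explicit override.

```rust
use std::path::PathBuf;

/// Hypothetical: where the SSH host key comes from.
#[derive(Debug, Clone, PartialEq)]
enum HostKeySource {
    /// Derive an Ed25519 host key from the vault seed (the default).
    VaultDerived,
    /// Load an operator-supplied private key file instead.
    KeyFile(PathBuf),
}

/// Option (c): vault-derived unless the operator configured a key file.
fn select_host_key_source(configured_key_file: Option<PathBuf>) -> HostKeySource {
    match configured_key_file {
        Some(path) => HostKeySource::KeyFile(path),
        None => HostKeySource::VaultDerived,
    }
}
```

With no key file configured, `select_host_key_source(None)` yields `HostKeySource::VaultDerived`, mirroring how `TlsIdentity::RawKey` is the default on the TLS side.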
### DP-2: Per-connection host key selection
*(Recommended: one-way door — needs an ADR, ties to DP-1)*
When supporting multiple host keys (e.g., an Ed25519 default + an RSA key for
legacy clients), russh's `server::Config.keys` is a `Vec` and russh negotiates
which to use based on the client's offered algorithms. The selection is
deterministic per-russh-version but not configurable per-connection. Question:
do we need per-peer host key selection (e.g., present different host keys to
different peer networks)? Almost certainly **no** for v1 — one host key set per
node, advertised uniformly. Phase 1 should record this as the simple model and
leave per-connection selection as a future two-way-door if a use case arises.
### DP-3: Crypto backend — `aws-lc-rs` (default) vs `ring`
*(Recommended: two-way door — decide at implementation time, but pin the choice
in an ADR if it has cross-crate consequences)*
russh 0.60.2 requires exactly one of `aws-lc-rs` (default) or `ring` enabled;
enabling both silently picks `aws-lc-rs`. Both produce AES-GCM / ChaCha20-Poly1305.
Considerations:
- `aws-lc-rs` is the russh default, has broader algorithm coverage, but brings
NIST build machinery (a heavier build, requires a C compiler + cmake for the
AWSLC build).
- `ring` is lighter-weight, smaller binary, simpler build.
- **Cross-crate consequence**: alknet-core already depends on `rustls-acme =
"0.12"` with `features = ["aws-lc-rs"]` (see `crates/alknet-core/Cargo.toml`),
so `aws-lc-rs` is already in the workspace's build. Choosing `ring` for russh
while alknet-core uses `aws-lc-rs` would put *both* crypto backends in the
final binary — wasteful but not incorrect.
**Recommendation**: **default to `aws-lc-rs`** (aligns with the rest of the
workspace and avoids a duplicate crypto backend), but treat the choice as a
two-way door — it can be flipped by changing `default-features = false` on
russh. Phase 1 should note this and *not* spend an ADR on it unless the
duplicate-backend concern turns out to matter for binary size.
### DP-4: Client side — full `russh::client` vs SSH-only-server
*(Recommended: one-way door — needs an ADR; user-clarified)*
alknet-ssh as described in the README is the *SSH handler* (server side of the
`alknet/ssh` ALPN). But the reference implementation also ships a substantial
**client** (`crates/alknet-core/src/client/*`: SOCKS5 client, connect logic,
channel manager, ~1900 lines) and a **SOCKS5** implementation
(`src/socks5/*`, ~800 lines) that turns the SSH server into a SOCKS5 *proxy
endpoint* clients can dial. The README lists alknet-ssh's purpose as "SSH
handler (russh), SOCKS5, port forwarding" — so the client/proxy functionality is
intended.
**User clarification (necessary context)**: SOCKS5 and port forwarding in
*both* directions are **core, non-negotiable features** for v1 — they are "the
basic features that made the first version gain interest" (3.5k clones/14 days).
The user runs an actual VPN-like topology (WireGuard + Postgres + Redis today)
over this, and explicitly wants the port-forwarding-in-both-directions
capability to unlock the VPN-like functionality in the new stack. The growing
worldwide trend of banning/blocking "VPNs" (which most users treat as a proxy /
location-hiding tool) makes a self-hostable, stream-agnostic SSH-with-forwarding
stack strategically valuable beyond alknet itself.
A concrete downstream consumer that the user wants to *replace* with this stack
is `/workspace/@alkdev/dispatch` — a single-crate axum service that uses
`russh = "0.60"` as an SSH **client** to act as a "reverse git runner" for
Docker containers and remote GPU instances (vast.ai, and eventually runpod /
ubicloud / others). Dispatch's `src/ssh.rs` is a textbook russh client wrapper
(connect + auth + `channel_open_session().exec()` + `disconnect`), and its
`src/handlers.rs::start_forward` does `channel_open_direct_tcpip` local→remote
forwarding (the VPN-like pattern). Dispatch has no SOCKS5 — that's the
alknet-original feature the user wants preserved. Dispatch also factors into a
future "abstract container service" — both it and alknet-ssh share the SSH
client + forwarding primitives, which argues strongly for those primitives living
in alknet-ssh (not duplicated in each consumer).
This reframes the questions:
- Does alknet-ssh own *both* the SSH server (handling `alknet/ssh` connections)
*and* the SSH *client* (for outbound SSH dialing)? — **Yes** (recommended
strongly; dispatch and the VPN-like use case both need it, and factoring it
into alknet-ssh avoids primitive duplication).
- Is the SOCKS5 *server* (what an SSH connection's client dials *through* the
alknet node) a feature of alknet-ssh, or a separate crate? The SOCKS5 protocol
itself is transport-independent (it just needs a byte stream), so it *could*
factor out — but it's tightly coupled to the SSH-forwarding feature and to the
VPN-like use case. The user explicitly abstracts *some* things out to optional
crates but stresses that "some is pretty foundational stuff to ssh."
**Recommendation**: alknet-ssh owns **both** the SSH server (`ProtocolHandler`
for `alknet/ssh`) **and** the SSH client (outbound dialing, the primitives
dispatch and the VPN-like topology both consume). Port forwarding in both
directions (`direct-tcpip` local→remote, `forwarded-tcpip`/`tcpip_forward`
remote→local) is **in v1 scope**, not deferred. SOCKS5 is **in v1 scope within
alknet-ssh** (the VPN-like use case needs the node to expose a SOCKS5 *server*
that forwards over the SSH connection); the question of whether the SOCKS5
*protocol codec* factors into a tiny reusable `alknet-socks5` crate (consuming a
byte stream, reusable over other transports) is left as a two-way-door
implementation detail — recommend starting with the codec inside alknet-ssh and
extracting only if a second consumer appears (the "stream-agnostic" philosophy
says this extraction, if done, is cheap). Phase 1 writes an ADR recording this
scope: server + client + bidirectional forwarding + SOCKS5-server-all-in-v1.
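
To show how small and transport-independent the SOCKS5 codec is, here is a minimal sketch of parsing a CONNECT request as specified in RFC 1928 (`VER`, `CMD`, `RSV`, `ATYP`, `DST.ADDR`, `DST.PORT`). It operates on a plain byte slice, so it does not care whether the bytes arrived over TCP, a QUIC stream, or an SSH channel. The type and function names are hypothetical, the method-negotiation handshake that precedes the request is omitted, and this is not the reference implementation's code.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical: the destination a SOCKS5 client asked to connect to.
#[derive(Debug, PartialEq)]
enum Socks5Target {
    V4(Ipv4Addr, u16),
    V6(Ipv6Addr, u16),
    Domain(String, u16),
}

/// Parse a SOCKS5 CONNECT request (RFC 1928, section 4) from a buffer.
/// Returns `None` for anything malformed, truncated, or not a CONNECT.
/// Bytes after the port are ignored.
fn parse_connect_request(buf: &[u8]) -> Option<Socks5Target> {
    // VER = 5, CMD = 1 (CONNECT), RSV = 0, followed by ATYP.
    if buf.len() < 4 || buf[0] != 0x05 || buf[1] != 0x01 || buf[2] != 0x00 {
        return None;
    }
    let rest = &buf[4..];
    // Read the big-endian port that follows an address of `addr_len` bytes.
    let port_at = |addr_len: usize| -> Option<u16> {
        let p = rest.get(addr_len..addr_len + 2)?;
        Some(u16::from_be_bytes([p[0], p[1]]))
    };
    match buf[3] {
        // ATYP 1: IPv4, four octets.
        0x01 => {
            let port = port_at(4)?;
            Some(Socks5Target::V4(Ipv4Addr::new(rest[0], rest[1], rest[2], rest[3]), port))
        }
        // ATYP 4: IPv6, sixteen octets.
        0x04 => {
            let port = port_at(16)?;
            let mut octets = [0u8; 16];
            octets.copy_from_slice(&rest[..16]);
            Some(Socks5Target::V6(Ipv6Addr::from(octets), port))
        }
        // ATYP 3: domain name, one length octet then the name.
        0x03 => {
            let len = *rest.first()? as usize;
            let port = port_at(1 + len)?;
            let name = std::str::from_utf8(&rest[1..1 + len]).ok()?;
            Some(Socks5Target::Domain(name.to_string(), port))
        }
        _ => None,
    }
}
```

In the forwarding path, the parsed target would presumably become the destination of a `direct-tcpip` channel, subject to the ACL checks discussed under DP-5.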
### DP-5: Channel-policy surface — which SSH services does alknet-ssh expose?
*(Recommended: one-way door — needs an ADR, at least the default policy;
user-clarified)*
russh's `server::Handler` defaults every channel-request method to reject/no-op
(or, for `auth_publickey_offered`, accept the offer through to signature
verification). alknet-ssh must decide its default channel policy. The user's
clarification sharpens this:
- **session channels**: the dispatch use case uses `channel_open_session().exec()`
heavily — that's the "reverse git runner" pattern (run a command on the remote
instance, capture stdout/stderr/exit). For the **server side** of
`alknet/ssh`, though, the question is whether alknet-ssh *runs a real shell*
on its own node. Given the VPN-like / forwarding use case is primary and the
"shell server" use case is secondary, the default should be **exec-only**:
`shell_request` and `pty_request` default-reject; `exec_request` permitted
(gated by ACL — see forwarding below). This keeps alknet-ssh a focused
forwarding/exec appliance rather than a general-purpose interactive login
server. Interactive shell can be an explicit opt-in later (two-way door).
- **port forwarding in both directions** (`direct-tcpip` in, `tcpip_forward` /
`forwarded-tcpip` out): **in v1 scope, both directions**, per user
clarification. The *policy* (which destinations are allowed, whether to
restrict by ACL/scope) still needs specifying.
- **PTY/X11/agent forwarding**: default-reject for security; explicit opt-in.
(Consistent with the exec-only session stance.)
**Default-deny baseline**: the user explicitly called out that "the configuration
needs to be such that it's kind of 'default deny', which russh does by default."
russh's `server::Handler` already defaults every channel/auth/forwarding callback
to reject or no-op — so alknet-ssh gets default-deny for free by overriding
only the methods it wants to enable. Phase 1 must record this as the explicit
baseline: every forwarding destination, every exec command, every channel type
must be *explicitly permitted* by config + ACL, never implicitly allowed.
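
The v1 surface described above can be written down as a small decision table. This is a sketch under assumptions: `ChannelRequest`, `Decision`, and `v1_policy` are hypothetical names, not russh types, and the real handler would implement this by overriding individual `server::Handler` callbacks. The catch-all arm is the important part: anything not named is rejected, including request kinds added later.

```rust
/// Hypothetical: the kinds of request an SSH client can make.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChannelRequest {
    Exec,
    Shell,
    Pty,
    X11,
    AgentForward,
    DirectTcpip,
    TcpipForward,
}

/// Hypothetical: the outcome of the static policy, before ACL evaluation.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Rejected outright, whatever the identity's scopes say.
    Reject,
    /// Eligible in v1, but only if the identity holds a matching scope.
    AllowIfScoped,
}

/// v1 surface: exec and both forwarding directions are eligible (gated);
/// everything else is default-rejected.
fn v1_policy(request: ChannelRequest) -> Decision {
    match request {
        ChannelRequest::Exec
        | ChannelRequest::DirectTcpip
        | ChannelRequest::TcpipForward => Decision::AllowIfScoped,
        // Shell, PTY, X11, agent forwarding, and any future variant.
        _ => Decision::Reject,
    }
}
```

Note that `AllowIfScoped` is not an allow: it only means the request proceeds to the scope check.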
**ACL gating**: forwarding destinations and exec commands are gated by scopes on
the resolved `Identity`. The exact scope vocabulary (e.g., `ssh:forward:*`,
`ssh:forward:127.0.0.1:5432`, `ssh:exec:git-upload-pack`) is a design choice the
Architect makes — likely a small, capability-shaped scope set with wildcards,
consistent with `Identity.scopes` / `Identity.resources` (auth.md). The
"resources" field on `Identity` (populated only by composition per
`CompositionAuthority::as_identity`, ADR-022) is *not* available to
fingerprint/token-resolved external identities, so per-destination ACLs for
inbound SSH must live in `scopes`, not `resources`.
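
A minimal sketch of how such scopes might be matched, using the example strings above. The matching rules are an assumption for illustration (exact match, plus a trailing `:*` wildcard on a segment boundary); the real vocabulary and semantics are the Architect's call, and the function names are hypothetical.

```rust
/// Does one granted scope permit the requested scope?
/// Assumed rules: exact match, or a trailing `:*` that matches any non-empty
/// remainder. A bare `*` grants nothing.
fn scope_permits(granted: &str, requested: &str) -> bool {
    match granted.strip_suffix('*') {
        // `ssh:forward:*` becomes the prefix `ssh:forward:`.
        Some(prefix) if prefix.ends_with(':') => {
            requested.starts_with(prefix) && requested.len() > prefix.len()
        }
        _ => granted == requested,
    }
}

/// Default-deny: permitted only if at least one granted scope matches.
fn is_permitted(scopes: &[&str], requested: &str) -> bool {
    scopes.iter().any(|granted| scope_permits(granted, requested))
}
```

For example, an identity holding `ssh:forward:127.0.0.1:5432` and `ssh:exec:*` may forward to the Postgres port and run any exec command, but a forward to `127.0.0.1:6379` is refused because no scope names it.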
**Recommendation**: Phase 1 writes an ADR defining the v1 channel-policy
surface: exec (gated) + bidirectional port forwarding (gated), with
shell/PTY/X11/agent forwarding default-rejected. Default-deny baseline is
inherited from russh. Forwarding destinations + exec commands gated by ACL
scopes. The exact scope vocabulary is an OQ for Phase 1 (it interacts with how
operators express "allow forwarding to 127.0.0.1:5432" in `DynamicConfig`).
### DP-6: Auth method coverage — publickey-only vs password/kbdint too
*(Recommended: two-way door — start publickey-only, extend later)*
russh supports `none`, `password`, `publickey`, `keyboard-interactive`, and
OpenSSH certificate auth server-side. alknet's identity model (auth.md) is
*fingerprint-based* — SSH key fingerprint → `IdentityProvider` → `Identity`.
This maps naturally onto **publickey** (the fingerprint is the SHA-256 of the
presented public key) and **OpenSSH certificate** auth (cert fingerprint).
Password / keyboard-interactive don't fit the fingerprint model as cleanly
(there's no `resolve_from_password` on `IdentityProvider`).
**Recommendation**: **start publickey-only** (and certificate auth, which is a
superset of publickey from the fingerprint POV). Treat password /
keyboard-interactive as a two-way door — can be added later if a use case
arises, but the natural alknet identity story is key-based. Phase 1 should note
this; likely not a full ADR (it's a default, not a structural decision) but at
least a documented design choice in the ssh spec.
### DP-7: tokio as a hard transitive dependency
*(Recommended: acknowledged constraint, not a decision)*
russh 0.60.2 transitively requires tokio (no "no-tokio" feature; only WASM swaps
the spawner). The server loop uses `tokio::time::sleep` for keepalive/inactivity
timers, so the tokio runtime must have its time driver enabled. **alknet-ssh
must run inside a tokio runtime** — which it will, because alknet-core's endpoint
already runs on tokio (`tokio = { version = "1", features = ["full"] }`). This
is consistent with the rest of the workspace and not a constraint to fight.
Phase 1 should record it as a known constraint; OQ-09 (WASM boundaries) already
documents that the *server-side* dispatch path is a one-way door away from WASM
— alknet-ssh inherits that.
### DP-8: The `ssh-key` crate is forked
*(Recommended: acknowledged constraint — use the russh re-export)*
russh 0.60.2 depends on `internal-russh-forked-ssh-key = "0.6.18"` (a renamed
fork), **not** upstream `ssh-key`. alknet-ssh must not add upstream `ssh-key`
directly — that would put two `ssh-key` versions in the tree and the
`PublicKey`/`PrivateKey` types wouldn't unify. The fork is re-exported through
`russh::keys::ssh_key`, so alknet-ssh should always reach key types via
`russh::keys::*` (or `russh::keys::ssh_key::*`) to stay on the same fork. Phase
1 should note this as an implementation constraint; it's not architecturally
interesting but a real footgun if missed.
### DP-9: End-to-end over a non-TCP stream is untested upstream
*(Recommended: de-risk early with a POC test)*
russh's own test suite (`/workspace/russh/russh/src/tests.rs` and
`client/test.rs`) only exercises the client↔server round trip over real TCP
loopback. There is **no** test connecting `connect_stream` ↔ `run_stream` over
`tokio::io::duplex()` or any other in-memory pipe. The `SshRead::read_ssh_id`
unit tests feed `&[u8]` directly, proving the banner parser works on
non-socket streams — but a full client↔server round trip over a non-TCP stream
is unverified upstream.
The reference implementation uses this path in production (per
`transport/iroh_transport.rs` using `tokio::io::join`), which is strong
empirical evidence it works. But the alknet greenfield rewrite should **close
this gap early** with an integration test using `tokio::io::duplex()` connecting
`connect_stream` ↔ `run_stream` *before* going near real QUIC.
**Recommendation**: per `sdd_process.md` Phase 0, this is a candidate for a POC
Specialist task (`.worktrees/research/ssh-stream-poc/`). Phase 1's
architecture docs should reference the POC's outcome. If the POC surfaces
issues (half-open stream handling, `poll_shutdown` semantics, etc.), they feed
back into the spec as constraints.
### DP-10: Bare-TCP SSH listener — in-v1 for git-over-SSH forward-compat
*(Recommended: one-way door on the *config shape*, two-way door on the *listener
itself* — user-clarified)*
ADR-010 already establishes that bare-TCP SSH is a handler concern, not an
endpoint concern — the SSH handler can listen on a TCP socket independently of
the `alknet/ssh` ALPN path. The user added a forward-looking constraint: **"We
need to be able to have that TCP handler so we can later support git over ssh."**
Standard git-over-SSH (`ssh git@host ...`) runs on TCP port 22, not over QUIC,
not over the `alknet/ssh` ALPN — git clients (`git`, libgit2, `gix`) dial a TCP
socket and expect the SSH-2 protocol directly. To make alknet-ssh a viable
git-over-SSH target, the bare-TCP listener must be a first-class path, not just
a future two-way-door add-on.
The two paths (ALPN/QUIC vs bare-TCP) share the same `russh::server::Config` and
the same `server::Handler` implementation; they differ only in how the duplex
stream is obtained:
- **ALPN path**: `handle()` receives the QUIC `Connection`, calls
`accept_bi()`, `tokio::io::join`s the halves, hands to `run_stream`.
- **TCP path**: a `tokio::net::TcpListener` accept loop hands each accepted
`TcpStream` directly to `run_stream` (russh accepts `TcpStream` natively via
`run_on_socket`, or we use `run_stream` with the raw stream to keep config/
handler identical across both paths).
**Default-deny baseline (user-stated)**: "the configuration needs to be such
that it's kind of 'default deny', which russh does by default." This
applies to *both* paths — the same ACL gating, the same channel policy, the
same default-reject for forwarding destinations. A TCP-listener client gets
*exactly* the same policy treatment as an ALPN client; the only difference is
the transport. The TCP listener is **off by default** (must be explicitly
configured to bind), consistent with the default-deny posture — an operator
who doesn't configure a TCP bind address gets no TCP listener, only the ALPN
path.
**Recommendation**: Phase 1 records the dual-path model in the ssh spec —
ALPN/QUIC primary, bare-TCP as a co-equal first-class path (off by default,
explicit config to enable) — so that the **configuration shape** accommodates
both from v1 even if the TCP listener implementation lands slightly later.
Crucially, the **config schema** should reserve the TCP-listener fields now
(one-way door — adding a config field later is non-breaking but designing the
config *around* only-ALPN-then-retrofitting-TCP is messier than reserving the
shape up front). The listener implementation itself is a two-way door. This
avoids the trap where git-over-SSH becomes a painful retrofit because the
config only modeled the ALPN path.
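
A sketch of what "reserving the shape" could look like, assuming a single optional bind address is enough for v1. The struct, field, and enum names are hypothetical and the real schema belongs to the Phase 1 spec. The property worth preserving is that the derived `Default` leaves the TCP listener off.

```rust
use std::net::SocketAddr;

/// Hypothetical config fragment for alknet-ssh's stream sources.
#[derive(Debug, Clone, Default, PartialEq)]
struct SshTransportConfig {
    /// Bare-TCP bind address. `None` (the default) means no TCP listener.
    tcp_bind: Option<SocketAddr>,
}

/// Hypothetical: where an SSH byte stream can come from.
#[derive(Debug, PartialEq)]
enum StreamSource {
    /// The `alknet/ssh` ALPN path over QUIC, always present.
    AlpnQuic,
    /// A plain TCP listener, present only when explicitly configured.
    BareTcp(SocketAddr),
}

/// List the stream sources this config enables.
fn enabled_sources(cfg: &SshTransportConfig) -> Vec<StreamSource> {
    let mut sources = vec![StreamSource::AlpnQuic];
    if let Some(addr) = cfg.tcp_bind {
        sources.push(StreamSource::BareTcp(addr));
    }
    sources
}
```

An operator who never sets `tcp_bind` gets only the ALPN path, which matches the off-by-default posture described above.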
## Tentative Recommended Approach (Convergence)
Based on the above, the recommended approach to take into Phase 1:
1. **Crate**: `alknet-ssh`, depends on `alknet-core` and `russh = "0.60"`
(default features, i.e. `aws-lc-rs`). Implements `ProtocolHandler` for
`b"alknet/ssh"`. **Owns both the SSH server and the SSH client** (the client
is the shared primitive dispatch and the VPN-like topology both consume).
2. **Stream wiring**: `handle()` accepts the QUIC `Connection`, calls
`connection.accept_bi()` once to get `(SendStream, RecvStream)`, joins them
with `tokio::io::join(recv, send)`, and hands the resulting duplex stream to
`russh::server::run_stream(Arc::clone(&config), stream, handler)`. One QUIC
bistream ↔ one SSH connection; russh multiplexes SSH channels inside it.
3. **Auth**: constructor-injected `Arc<dyn IdentityProvider>` (per auth.md's
`SshAdapter` example). Inside `handle()`, if `auth.identity` is `None`,
russh's `server::Handler::auth_publickey` resolves the offered key's
fingerprint through the provider; on success, store the resolved `Identity`
on the `Connection` via `set_identity()` (OQ-11). Start **publickey-only**
(plus OpenSSH cert, which rides the same fingerprint path).
4. **Host keys** (DP-1): vault-derived Ed25519 by default (derived from the
seed at startup by the assembly layer and injected into `SshAdapter`'s
config), with an optional config-supplied key file override. Symmetric with
`TlsIdentity::RawKey` (ADR-027). Needs an ADR.
5. **Channel policy — default-deny, exec + bidirectional forwarding in v1**
(DP-5): v1 supports `exec` (gated) + port forwarding in **both** directions
(`direct-tcpip` local→remote, `forwarded-tcpip`/`tcpip_forward`
remote→local, both gated). `shell`/PTY/X11/agent forwarding default-reject
(opt-in later, two-way door). **Default-deny baseline inherited from
russh** — every channel type, every forwarding destination, every exec
command must be explicitly permitted by config + ACL scopes; never
implicitly allowed. Forwarding destinations + exec commands gated by scopes
on the resolved `Identity` (the `resources` field is composition-only per
ADR-022, so inbound-SSH per-destination ACLs live in `scopes`). Needs an ADR
defining the v1 surface + the scope vocabulary (latter likely stays an OQ).
6. **Client + SOCKS5 — in v1, both in alknet-ssh** (DP-4): alknet-ssh owns the
SSH *server* (the `ProtocolHandler`) **and** the SSH *client* (outbound
dialing, the primitives dispatch and the VPN-like topology both consume).
Port forwarding in both directions is a *client-side* feature too (the
client opens `direct-tcpip` channels; dispatch does exactly this). SOCKS5
*server* (what an SSH connection's client dials *through* the alknet node)
is **in v1 within alknet-ssh** — the VPN-like use case requires it. The
SOCKS5 protocol codec may or may not factor into a tiny reusable
`alknet-socks5` crate (consuming a byte stream); recommend starting with the
codec inside alknet-ssh and extracting only if a second consumer appears
(two-way door — the stream-agnostic philosophy makes extraction cheap).
Needs an ADR confirming this scope.
7. **De-risk POC** (DP-9): a Phase 0 POC validating `connect_stream` ↔
`run_stream` over `tokio::io::duplex()` before Phase 1 finalizes the stream
wiring spec. Strong empirical evidence from the reference implementation
suggests it will pass, but the upstream test gap is real.
8. **Bare-TCP SSH listener — first-class path, config shape reserved in v1,
listener off-by-default** (DP-10): the `alknet/ssh` ALPN/QUIC path is
primary; a bare-TCP listener is a co-equal first-class path needed for
future git-over-SSH support. **Reserve the TCP-listener config fields in v1**
(one-way door on the config schema — retrofitting is messier than reserving
the shape up front). The listener is **off by default** (explicit config to
bind), consistent with the default-deny posture. Both paths share the same
`server::Config` + `Handler` + ACL policy — only the stream source differs.
The listener implementation itself is a two-way door, but the config shape is
locked in v1.
## Open Questions to Carry into Phase 1
The following should become OQs in `docs/architecture/open-questions.md`
(numbering will be assigned by the Architect — likely OQ-25 onwards, since
OQ-01–OQ-24 exist):
- **OQ-SSH-01 (host key sourcing)**: vault-derived default + config override —
resolved by the DP-1 ADR.
- **OQ-SSH-02 (channel policy v1 surface + default-deny scope vocabulary)**:
the set of allowed channel types / request types is resolved by the DP-5
ADR; the exact scope vocabulary for forwarding destinations + exec commands
(e.g., `ssh:forward:127.0.0.1:5432` vs a resources-style shape) stays open —
it interacts with how operators express allow-lists in `DynamicConfig` and
with the fact that `Identity.resources` is composition-only (ADR-022).
- **OQ-SSH-03 (client + SOCKS5 scope)**: confirm alknet-ssh owns both server +
client + SOCKS5-server in v1, and whether the SOCKS5 codec extracts to a
separate crate now or later — resolved (in favor of in-alknet-ssh-now,
extract-later) by the DP-4 ADR.
- **OQ-SSH-04 (POC outcome)**: did the `duplex()`-based round-trip POC pass, and
did it surface any stream-handling constraints (half-open, `poll_shutdown`,
maximum packet size) that constrain the spec? Resolved by POC Specialist
results.
- **OQ-SSH-05 (crypto backend)**: confirm `aws-lc-rs` default aligns with the
rest of the workspace; defer flipping to `ring` unless binary-size pressure
arises. Two-way door.
- **OQ-SSH-06 (bare-TCP listener enablement timeline)**: the config shape is
reserved in v1 (DP-10); whether the TCP listener *implementation* lands in v1
or as a fast-follow is a two-way door. Git-over-SSH is the forcing function —
decide based on whether v1 needs to be a git-over-SSH target out of the box.
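
OQ-SSH-03 turns partly on how cheap a later extraction of the SOCKS5 codec would be. As a rough illustration, a codec that only consumes bytes has no dependency on SSH, QUIC, or any async runtime. The sketch below parses a SOCKS5 CONNECT request as laid out in RFC 1928 (`VER CMD RSV ATYP DST.ADDR DST.PORT`). The function and type names are hypothetical, the method-negotiation greeting is omitted, and this is not the reference implementation's code.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

#[derive(Debug, PartialEq)]
pub enum TargetAddr {
    Ip4(Ipv4Addr),
    Ip6(Ipv6Addr),
    Domain(String),
}

#[derive(Debug, PartialEq)]
pub struct ConnectRequest {
    pub addr: TargetAddr,
    pub port: u16,
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// More bytes are needed before the request can be decoded.
    Incomplete,
    BadVersion(u8),
    UnsupportedCommand(u8),
    BadAddressType(u8),
    BadDomain,
}

/// Parses one CONNECT request from `buf`.
/// Returns the request and the number of bytes consumed.
pub fn parse_connect_request(buf: &[u8]) -> Result<(ConnectRequest, usize), ParseError> {
    if buf.len() < 4 {
        return Err(ParseError::Incomplete);
    }
    if buf[0] != 0x05 {
        return Err(ParseError::BadVersion(buf[0]));
    }
    if buf[1] != 0x01 {
        // Only CONNECT (0x01) is handled in this sketch.
        return Err(ParseError::UnsupportedCommand(buf[1]));
    }
    // buf[2] is the reserved byte and is ignored here.
    let (addr, addr_end) = match buf[3] {
        0x01 => {
            if buf.len() < 4 + 4 + 2 {
                return Err(ParseError::Incomplete);
            }
            let octets = [buf[4], buf[5], buf[6], buf[7]];
            (TargetAddr::Ip4(Ipv4Addr::from(octets)), 8)
        }
        0x03 => {
            if buf.len() < 5 {
                return Err(ParseError::Incomplete);
            }
            // Domain names are prefixed by a one-byte length.
            let end = 5 + buf[4] as usize;
            if buf.len() < end + 2 {
                return Err(ParseError::Incomplete);
            }
            let name = std::str::from_utf8(&buf[5..end]).map_err(|_| ParseError::BadDomain)?;
            (TargetAddr::Domain(name.to_owned()), end)
        }
        0x04 => {
            if buf.len() < 4 + 16 + 2 {
                return Err(ParseError::Incomplete);
            }
            let mut octets = [0u8; 16];
            octets.copy_from_slice(&buf[4..20]);
            (TargetAddr::Ip6(Ipv6Addr::from(octets)), 20)
        }
        other => return Err(ParseError::BadAddressType(other)),
    };
    // The port is two bytes in network (big-endian) order.
    let port = u16::from_be_bytes([buf[addr_end], buf[addr_end + 1]]);
    Ok((ConnectRequest { addr, port }, addr_end + 2))
}
```

Because the parser takes a byte slice and reports `Incomplete` rather than reading from a socket, the caller decides where the bytes come from. That property is what would keep a move into a separate crate mechanical.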
## Next Steps (Phase 0 → Phase 1)
1. **You decide** on the DP-1, DP-4, DP-5, DP-10 recommendations (or amend
them) — these are the load-bearing architectural choices, and DP-4/DP-5/DP-10
now reflect your clarifications (SOCKS5 + bidirectional forwarding + TCP
listener for git-over-SSH are all in-scope; default-deny baseline). DP-2,
DP-3, DP-6, DP-7, DP-8 are defaults I recommend accepting as-is; DP-9 is a
POC task.
2. **Optional POC** (DP-9): spawn a POC Specialist to validate
`connect_stream` ↔ `run_stream` over `tokio::io::duplex()`. Timeboxed; if it
passes, the stream-wiring spec is straightforward; if it surfaces
constraints, they fold into the spec.
3. **Phase 1 (Architect)**: produce `docs/architecture/crates/ssh/README.md` +
component specs (e.g., `ssh-handler.md`, `ssh-stream.md`, `ssh-channels.md`,
`ssh-auth.md`, `ssh-forwarding.md`, `ssh-socks5.md`, `ssh-client.md`,
`ssh-tcp-listener.md`), ADRs for the accepted DPs (likely ADR-028 host-key
sourcing, ADR-029 channel policy + default-deny, ADR-030 ssh server+client+
socks5+forwarding scope, ADR-031 bare-TCP listener config shape), and the
OQs above in `open-questions.md`. Update `docs/architecture/README.md` index
and ADR table.
## References
- `docs/sdd_process.md` — Phase 0 process definition
- `docs/architecture/overview.md` — ALPN-as-service, crate graph, ProtocolHandler
- `docs/architecture/crates/core/core-types.md` — ProtocolHandler, Connection, BiStream
- `docs/architecture/crates/core/auth.md` — AuthContext, IdentityProvider, SshAdapter example
- `docs/architecture/decisions/001-alpn-protocol-dispatch.md` — ALPN dispatch
- `docs/architecture/decisions/002-protocol-handler-trait.md` — ProtocolHandler trait
- `docs/architecture/decisions/004-auth-as-shared-core.md` — hybrid auth
- `docs/architecture/decisions/007-bistream-type-definition.md` — BiStream trait
- `docs/architecture/decisions/010-alpn-router-and-endpoint.md` — endpoint, TCP-is-handler-concern
- `docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md` — Capabilities
- `docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md` — registration bundle
- `docs/architecture/decisions/025-vault-local-only-dispatch.md` — vault local-only
- `docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md` — TLS identity model (symmetry reference for DP-1)
- `docs/research/references/ssh/russh/01-06` — existing russh deep-dives
- `/workspace/russh/` — russh 0.60.2 source (authoritative; cargo cache has 0.49.2 only)
- `/workspace/@alkdev/alknet-main/crates/alknet-core/src/` — reference implementation
(`transport/iroh_transport.rs:94` shows the `tokio::io::join` adapter; `server/`,
`interface/ssh.rs`, `client/`, `socks5/` for prior art)
- `/workspace/@alkdev/dispatch/` — concrete downstream consumer the user wants to
replace with this stack: axum + `russh = "0.60"` SSH **client** for "reverse git
runner" over Docker/vast.ai. `src/ssh.rs` (russh client wrapper, 143 lines),
`src/handlers.rs::start_forward` (`channel_open_direct_tcpip` local→remote
forwarding), `src/sftp.rs` (russh-sftp client). AGENTS.md and
`docs/architecture.md` describe the architecture. No SOCKS5 — that's the
alknet-original feature preserved here. Dispatch is a textbook consumer of the
alknet-ssh **client** + **forwarding** primitives, which is why those live in
alknet-ssh rather than being duplicated per-consumer.