alknet/docs/research/alknet-ssh/phase-0-findings.md
glm-5.2 db1dcd362f docs(research): revise alknet-ssh phase-0 — SOCKS5+forwarding in v1, TCP listener for git-over-ssh
Incorporates user clarifications: SOCKS5 and bidirectional port forwarding are
core non-negotiable v1 features (the VPN-like use case + the 3.5k-clones
demand). Adds DP-10 for the bare-TCP SSH listener as a first-class path needed
for future git-over-SSH, with config shape reserved in v1 (off-by-default,
default-deny). Grounds the client/forwarding recommendations in the dispatch
downstream consumer at /workspace/@alkdev/dispatch, which is a textbook russh
SSH client + direct-tcpip forwarder the user wants to replace with this stack.

alknet-ssh now owns both server and client + SOCKS5-server in v1; the SOCKS5
codec may extract to a separate crate later (two-way door).
2026-06-25 08:46:35 +00:00


status: draft
last_updated: 2026-06-25

alknet-ssh — Phase 0 Research Findings

This document captures Phase 0 (Exploration) findings for the alknet-ssh crate. The objective of Phase 0 per docs/sdd_process.md is: "Capture vision and guiding principles; research options; validate approaches; converge on a recommended approach." It is the input to Phase 1 (Architecture), where the Architect will produce docs/architecture/crates/ssh/*.md specs, ADRs, and open questions.

Vision Recap

alknet-ssh is the SSH protocol handler for the ALPN-as-service architecture (ADR-001). It registers the alknet/ssh ALPN on the shared AlknetEndpoint and implements the ProtocolHandler trait (ADR-002, ADR-007).

The guiding insight, carried over from the reference implementation at /workspace/@alkdev/alknet-main/, is:

SSH does not care where its underlying byte stream comes from.

The reference implementation built on this — it ran the russh SSH-2 state machine over a Transport-produced duplex stream (AsyncRead + AsyncWrite + Unpin + Send) rather than over its own TCP sockets. The greenfield rebuild keeps the insight and drops the messy transport-abstraction layer that grew around it: in the new model the AlknetEndpoint hands the handler a Connection (quinn/iroh QUIC), and the handler is responsible for opening/accepting the bidirectional QUIC stream that carries the SSH-2 protocol.

The reference implementation reportedly has 3.5k clones in the past 14 days, so there is real-world demand for the "SSH-over-arbitrary-stream" capability. The greenfield rebuild is a total rewrite, except that most of the vault was initially copied from the reference implementation (and has since been rewritten as well).

Sources Investigated

| Source | Path | Note |
|---|---|---|
| Existing arch docs (core) | docs/architecture/crates/core/* | ProtocolHandler, Connection, BiStream, AuthContext, IdentityProvider, Endpoint |
| Existing ADRs 001–027 | docs/architecture/decisions/* | All Accepted; ADR-002/007/010/004/011 most relevant to SSH |
| russh reference deep-dives | docs/research/references/ssh/russh/01-06 | Already authored; covered overview, keys, protocol, crypto, internals, usage |
| russh source (authoritative) | /workspace/russh/ | Checked out at Cargo.toml version 0.60.2. The cargo registry cache only contains russh-0.49.2, which is older and NOT the intended version. Use /workspace/russh/ as the canonical 0.60.2 reference. |
| alknet Cargo.lock | Cargo.lock | Does not yet contain a russh entry; russh is not wired into the workspace dependency graph yet |
| Reference implementation | /workspace/@alkdev/alknet-main/ | crates/alknet-core/src/{interface/ssh.rs, server/handler.rs, server/serve.rs, transport/*, client/*} |

Note on the russh clone: the /workspace/russh checkout was inspected and its russh/Cargo.toml declares version = "0.60.2" with edition = "2024" and MSRV 1.85 — matching the research references. The agent flagged the cargo-cache mismatch; verifying against the checkout rather than the cache is the safe choice since 0.49.2 → 0.60.2 spans major API changes (server::run_stream generic signature, Auth enum shape, server::Handler method set all differ). When alknet-ssh's Cargo.toml pins russh = "0.60", Cargo will fetch the matching 0.60.x into the cache, at which point the cache becomes authoritative for future investigations.

Straightforward Parts

These are settled by existing ADRs and the reference implementation; Phase 1 should document them as spec rather than re-litigate them.

1. SSH is a ProtocolHandler on alknet/ssh

Confirmed by overview.md's ALPN Registry and core-types.md. SshAdapter implements ProtocolHandler::handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> with alpn() = b"alknet/ssh". The handler owns the entire Connection lifecycle (ADR-006: one ALPN, one connection, one handler) and may open/accept multiple QUIC streams because it multiplexes SSH channels.

2. SSH runs over a single QUIC bidirectional stream

The reference implementation's transport/iroh_transport.rs proves the approach: open a QUIC bistream, then join the two halves into a single duplex type with tokio::io::join(recv, send) and feed that to russh. This is the key adapter — it is already a one-liner in tokio:

// from alknet-main/.../iroh_transport.rs:94
let conn = self.endpoint.connect(self.node_id, ALPN).await?;
let (send, recv) = conn.open_bi().await?;
Ok(io::join(recv, send))   // produces: AsyncRead + AsyncWrite + Unpin + Send

The Phase 0 research subagent initially speculated a custom QuicSshStream adapter struct would be needed. Verifying against the reference implementation revealed that tokio::io::join already produces the AsyncRead + AsyncWrite combo russh requires (russh internally re-splits via tokio::io::split). No custom adapter struct is required — the Connection::accept_bi() / open_bi() pair plus tokio::io::join is sufficient. This is a meaningful simplification over the speculative approach.
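
To make the "join two halves into one duplex value" idea concrete without pulling in tokio, here is a minimal synchronous sketch built on std::io only. The `Joined` type and its methods are hypothetical names for illustration; this is not tokio's implementation, and the real handler would keep using tokio::io::join on the async QUIC halves.

```rust
use std::io::{self, Read, Write};

/// Hypothetical std-only stand-in for the idea behind `tokio::io::join`:
/// pair an independent reader and writer behind one duplex value.
pub struct Joined<R, W> {
    reader: R,
    writer: W,
}

impl<R, W> Joined<R, W> {
    pub fn new(reader: R, writer: W) -> Self {
        Joined { reader: reader, writer: writer }
    }

    /// Hand the two halves back, e.g. to inspect what was written.
    pub fn into_parts(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

// Reads are served by the receive half only.
impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

// Writes go to the send half only.
impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

A caller that only knows about `Read + Write` can then treat the pair as a single stream, which is the same property russh relies on in the async case.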

3. russh accepts a generic stream on both client and server side

Verified from /workspace/russh/russh/src/:

  • server::run_stream<H, R>(config: Arc<Config>, stream: R, handler: H) where R: AsyncRead + AsyncWrite + Unpin + Send + 'static (server/mod.rs:997).
  • client::connect_stream<H, R>(config: Arc<Config>, stream: R, handler: H) with the same bound (client/mod.rs:982).

Neither path assumes TCP — TCP-specific code (set_nodelay, TcpListener) is confined to run_on_socket / connect / run_on_address. The generic stream path is clean of TCP assumptions. russh writes its own SSH identification banner first, then reads the peer's — no caller-side banner pre-work is needed.

4. SSH channels multiplex inside the QUIC bistream

ChannelId(u32) identifies channels; all channel traffic (CHANNEL_OPEN/DATA/EOF/CLOSE/...) is interleaved on the single underlying SSH transport stream that russh owns. This is independent of QUIC's own stream multiplexing — one QUIC bistream ↔ one SSH connection ↔ many SSH channels riding inside it. Port forwarding (direct-tcpip, forwarded-tcpip) is ordinary channel traffic — each forwarded TCP connection is a channel, not a separate QUIC stream.

This is the cleanest mapping and the right default: alknet-ssh does not try to map SSH channels onto QUIC streams (which would require bypassing russh's own multiplexer). It hands russh one bistream and lets russh multiplex inside it.
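
The "many channels riding inside one stream" relationship can be illustrated with a toy demultiplexer. The frame layout below (channel id, length, payload) is an assumption made up for this sketch and is NOT the SSH wire format; russh performs the real framing and multiplexing internally.

```rust
use std::collections::BTreeMap;

/// Hypothetical frame layout for illustration only (NOT the SSH wire format):
/// 4-byte big-endian channel id, 4-byte big-endian payload length, payload.
pub fn encode_frame(channel: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&channel.to_be_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Split one interleaved byte stream back into per-channel byte sequences.
/// Returns `None` if the stream ends in the middle of a frame.
pub fn demux(mut stream: &[u8]) -> Option<BTreeMap<u32, Vec<u8>>> {
    let mut channels: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    while !stream.is_empty() {
        if stream.len() < 8 {
            return None;
        }
        let channel = u32::from_be_bytes([stream[0], stream[1], stream[2], stream[3]]);
        let len = u32::from_be_bytes([stream[4], stream[5], stream[6], stream[7]]) as usize;
        let rest = &stream[8..];
        if rest.len() < len {
            return None;
        }
        channels
            .entry(channel)
            .or_insert_with(Vec::new)
            .extend_from_slice(&rest[..len]);
        stream = &rest[len..];
    }
    Some(channels)
}
```

The point of the sketch is only that the underlying stream sees one ordered byte sequence, while each logical channel (an exec session, a forwarded TCP connection) recovers its own ordered bytes.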

5. Auth routes through the shared IdentityProvider

ADR-004 establishes the hybrid auth model: the endpoint resolves what it can (TLS client cert → fingerprint), the handler resolves what it must (SSH key fingerprint). auth.md shows the SshAdapter pattern exactly: constructor-inject Arc<dyn IdentityProvider>, call resolve_from_fingerprint() inside handle() when auth.identity is None, and store the resolved Identity on the Connection via set_identity() for observability (OQ-11). The ConfigIdentityProvider already resolves SSH key fingerprints against DynamicConfig::auth::authorized_keys_fingerprints. No new auth machinery is needed for SSH.
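
A minimal sketch of that resolution order, using std only. The trait and method names follow auth.md as quoted above, but the signatures, the `Identity` fields, and the `TableProvider` type are assumptions for illustration; the real types live in alknet-core.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the resolved identity (assumed fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub name: String,
    pub scopes: Vec<String>,
}

/// Assumed shape of the provider trait; only the method name comes from auth.md.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical provider backed by an in-memory fingerprint table.
pub struct TableProvider {
    pub authorized: HashMap<String, Identity>,
}

impl IdentityProvider for TableProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).cloned()
    }
}

/// Handler-side resolution: keep an identity the endpoint already resolved,
/// otherwise fall back to the SSH key fingerprint. `None` means reject.
pub fn resolve_identity(
    endpoint_identity: Option<Identity>,
    ssh_key_fingerprint: &str,
    provider: &dyn IdentityProvider,
) -> Option<Identity> {
    match endpoint_identity {
        Some(identity) => Some(identity),
        None => provider.resolve_from_fingerprint(ssh_key_fingerprint),
    }
}
```

An unknown fingerprint yields `None`, which the handler would translate into an auth rejection.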

6. Outbound credentials (if any) come from Capabilities

ADR-014 / ADR-022 establish that handlers get outbound credentials through the registration bundle's capabilities field, populated by the assembly layer from the vault. SSH itself typically needs no outbound credentials (the SSH host key is a network-identity concern, the SSH client key for auth comes from the peer), but if alknet-ssh ever needs an outbound secret (e.g., to dial an upstream SOCKS proxy), it comes from Capabilities, not from env vars or vault-on-wire.

7. TCP SSH is a handler concern, not an endpoint concern

ADR-010 is explicit: "TCP is NOT an endpoint concern... the SSH handler can listen on a TCP socket independently." This means alknet-ssh may optionally bind a plain TCP listener (port 22-style) and accept raw SSH connections outside the ALPN endpoint. The alknet/ssh ALPN path and the bare-TCP path can coexist; they share the same russh::server::Config and the same server::Handler implementation, differing only in how the stream is obtained. This is a two-way-door additive capability — the TCP listener can be added later without touching the ALPN path.

Less Straightforward Parts (Decision Points)

These are the points where Phase 0 surfaced genuine choices that affect the architecture. Each is tagged with a recommended door type per ADR-009. The Architect should turn the accepted recommendations into ADRs, and the deferred ones into open questions.

DP-1: Host key sourcing — vault-derived vs config-loaded vs both

(Recommended: one-way door — needs an ADR)

russh's server::Config.keys: Vec<PrivateKey> holds the SSH host keys the server presents during key exchange. The host key is the SSH layer's analogue of the TLS layer's network identity — it is what the SSH client verifies against known_hosts. Three sourcing paths exist:

  • (a) Vault-derived: derive an Ed25519 key from the alknet-vault seed (HD path) and use it as the SSH host key. Aligns with the project's "everything keys-from-seed" philosophy (ADR-020, ADR-026) and means the SSH host key is deterministic from the mnemonic — a node restored from mnemonic gets the same SSH host key fingerprint.
  • (b) Config-loaded: operator provides SSH host key file path(s) in StaticConfig/DynamicConfig. Matches how OpenSSH works (/etc/ssh/ssh_host_ed25519_key). Simplest, decoupled from the vault.
  • (c) Both: vault-derived by default, config override for operators who bring their own keys. Mirrors the TLS identity model (ADR-027's TlsIdentity::RawKey default + X509/Acme for domain-hosted).

Recommendation: (c) both, with vault-derived as the default. This matches the symmetry with TlsIdentity in endpoint.md and respects the "fingerprint-based, keys-from-seed" identity model. The vault is local-only by construction (ADR-025) and assembly-layer-only access (ADR-019), so the SSH host key is derived at startup and injected into SshAdapter::Config the same way TLS RawKey identity is. Operators who want stable host keys independent of the mnemonic can supply a key file. Phase 1 should write an ADR for this (likely ADR-028) and a corresponding OQ if the exact config-field shape is unresolved.
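
Option (c) reduces to a small selection rule. The sketch below assumes a hypothetical `HostKeySource` enum and a single optional key-file path; the actual config-field shape is exactly what the ADR and OQ would settle.

```rust
use std::path::PathBuf;

/// Hypothetical config shape for where the SSH host key comes from.
#[derive(Debug, Clone, PartialEq)]
pub enum HostKeySource {
    /// Derive the key from the vault seed (the proposed default).
    VaultDerived,
    /// Load an operator-supplied key file instead.
    KeyFile(PathBuf),
}

/// Option (c): a configured key file overrides the vault-derived default.
pub fn select_host_key_source(configured_key_file: Option<PathBuf>) -> HostKeySource {
    match configured_key_file {
        Some(path) => HostKeySource::KeyFile(path),
        None => HostKeySource::VaultDerived,
    }
}
```

Keeping the decision in one function at the assembly layer means SshAdapter only ever receives an already-loaded key, whichever source produced it.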

DP-2: Per-connection host key selection

(Recommended: one-way door — needs an ADR, ties to DP-1)

When supporting multiple host keys (e.g., an Ed25519 default + an RSA key for legacy clients), russh's server::Config.keys is a Vec and russh negotiates which to use based on the client's offered algorithms. The selection is deterministic per-russh-version but not configurable per-connection. Question: do we need per-peer host key selection (e.g., present different host keys to different peer networks)? Almost certainly no for v1 — one host key set per node, advertised uniformly. Phase 1 should record this as the simple model and leave per-connection selection as a future two-way-door if a use case arises.
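
As a rough illustration of "negotiated from the client's offered algorithms", the sketch below picks the first algorithm in the client's preference list for which the server holds a key. This is a simplification of the SSH negotiation rule (the full rule also depends on what the chosen key exchange requires) and does not reproduce russh's selection code; the function name is hypothetical.

```rust
/// Pick the first algorithm in the client's preference list that the server
/// also has a host key for. Simplified; not russh's actual implementation.
pub fn negotiate_host_key<'a>(client_prefs: &[&'a str], server_keys: &[&str]) -> Option<&'a str> {
    client_prefs
        .iter()
        .cloned()
        .find(|alg| server_keys.iter().any(|k| *k == *alg))
}
```

With an Ed25519 default plus an RSA key for legacy clients, a modern client lands on ssh-ed25519 while a client that only offers RSA variants still finds a match, all without any per-peer configuration.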

DP-3: Crypto backend — aws-lc-rs (default) vs ring

(Recommended: two-way door — decide at implementation time, but pin the choice in an ADR if it has cross-crate consequences)

russh 0.60.2 requires exactly one of aws-lc-rs (default) or ring enabled; enabling both silently picks aws-lc-rs. Both produce AES-GCM / ChaCha20-Poly1305. Considerations:

  • aws-lc-rs is the russh default, has broader algorithm coverage, but brings NIST build machinery (a heavier build, requires a C compiler + cmake for the AWS-LC build).
  • ring is lighter-weight, smaller binary, simpler build.
  • Cross-crate consequence: alknet-core already depends on rustls-acme = "0.12" with features = ["aws-lc-rs"] (see crates/alknet-core/Cargo.toml), so aws-lc-rs is already in the workspace's build. Choosing ring for russh while alknet-core uses aws-lc-rs would put both crypto backends in the final binary — wasteful but not incorrect.

Recommendation: default to aws-lc-rs (aligns with the rest of the workspace and avoids a duplicate crypto backend), but treat the choice as a two-way door — it can be flipped by changing default-features = false on russh. Phase 1 should note this and not spend an ADR on it unless the duplicate-backend concern turns out to matter for binary size.

DP-4: Client side — full russh::client vs SSH-only-server

(Recommended: one-way door — needs an ADR; user-clarified)

alknet-ssh as described in the README is the SSH handler (server side of the alknet/ssh ALPN). But the reference implementation also ships a substantial client (crates/alknet-core/src/client/*: SOCKS5 client, connect logic, channel manager, ~1900 lines) and a SOCKS5 implementation (src/socks5/*, ~800 lines) that turns the SSH server into a SOCKS5 proxy endpoint clients can dial. The README lists alknet-ssh's purpose as "SSH handler (russh), SOCKS5, port forwarding" — so the client/proxy functionality is intended.

User clarification (necessary context): SOCKS5 and port forwarding in both directions are core, non-negotiable features for v1 — they are "the basic features that made the first version gain interest" (3.5k clones/14 days). The user runs an actual VPN-like topology (WireGuard + Postgres + Redis today) over this, and explicitly wants the port-forwarding-in-both-directions capability to unlock the VPN-like functionality in the new stack. The growing world-wide trend of banning/blocking "VPNs" (most users use it as a proxy / location-hiding tool) makes a self-hostable, stream-agnostic SSH-with-forwarding stack strategically valuable beyond alknet itself.

A concrete downstream consumer that the user wants to replace with this stack is /workspace/@alkdev/dispatch — a single-crate axum service that uses russh = "0.60" as an SSH client to act as a "reverse git runner" for Docker containers and remote GPU instances (vast.ai, and eventually runpod / ubicloud / others). Dispatch's src/ssh.rs is a textbook russh client wrapper (connect + auth + channel_open_session().exec() + disconnect), and its src/handlers.rs::start_forward does channel_open_direct_tcpip local→remote forwarding (the VPN-like pattern). Dispatch has no SOCKS5 — that's the alknet-original feature the user wants preserved. Dispatch also factors into a future "abstract container service" — both it and alknet-ssh share the SSH client + forwarding primitives, which argues strongly for those primitives living in alknet-ssh (not duplicated in each consumer).

This reframes the questions:

  • Does alknet-ssh own both the SSH server (handling alknet/ssh connections) and the SSH client (for outbound SSH dialing)? — Yes (recommended strongly; dispatch and the VPN-like use case both need it, and factoring it into alknet-ssh avoids primitive duplication).
  • Is the SOCKS5 server (what an SSH connection's client dials through the alknet node) a feature of alknet-ssh, or a separate crate? The SOCKS5 protocol itself is transport-independent (it just needs a byte stream), so it could factor out — but it's tightly coupled to the SSH-forwarding feature and to the VPN-like use case. The user explicitly abstracts some things out to optional crates but stresses that "some is pretty foundational stuff to ssh."

Recommendation: alknet-ssh owns both the SSH server (ProtocolHandler for alknet/ssh) and the SSH client (outbound dialing, the primitives dispatch and the VPN-like topology both consume). Port forwarding in both directions (direct-tcpip local→remote, forwarded-tcpip/tcpip_forward remote→local) is in v1 scope, not deferred. SOCKS5 is in v1 scope within alknet-ssh (the VPN-like use case needs the node to expose a SOCKS5 server that forwards over the SSH connection); the question of whether the SOCKS5 protocol codec factors into a tiny reusable alknet-socks5 crate (consuming a byte stream, reusable over other transports) is left as a two-way-door implementation detail — recommend starting with the codec inside alknet-ssh and extracting only if a second consumer appears (the "stream-agnostic" philosophy says this extraction, if done, is cheap). Phase 1 writes an ADR recording this scope: server + client + bidirectional forwarding + SOCKS5-server-all-in-v1.
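
To show why the SOCKS5 codec is transport-independent and cheap to extract later, here is a minimal sketch of parsing a SOCKS5 CONNECT request (RFC 1928, section 4) from a byte buffer. The type and function names are hypothetical, only the CONNECT command is handled, and the method-negotiation handshake that precedes the request is omitted.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

#[derive(Debug, PartialEq)]
pub enum Target {
    V4(Ipv4Addr, u16),
    V6(Ipv6Addr, u16),
    Domain(String, u16),
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    Truncated,
    BadVersion(u8),
    UnsupportedCommand(u8),
    BadAddressType(u8),
    BadDomain,
}

/// Read the big-endian port that follows the address bytes.
fn port_at(body: &[u8], i: usize) -> u16 {
    u16::from_be_bytes([body[i], body[i + 1]])
}

/// Parse VER | CMD | RSV | ATYP | DST.ADDR | DST.PORT from a complete buffer.
pub fn parse_connect_request(buf: &[u8]) -> Result<Target, ParseError> {
    if buf.len() < 4 {
        return Err(ParseError::Truncated);
    }
    if buf[0] != 0x05 {
        return Err(ParseError::BadVersion(buf[0]));
    }
    if buf[1] != 0x01 {
        // Only CONNECT (0x01) is handled; BIND and UDP ASSOCIATE are not.
        return Err(ParseError::UnsupportedCommand(buf[1]));
    }
    // buf[2] is the reserved byte and is ignored in this sketch.
    let body = &buf[4..];
    match buf[3] {
        0x01 => {
            if body.len() < 6 {
                return Err(ParseError::Truncated);
            }
            let ip = Ipv4Addr::new(body[0], body[1], body[2], body[3]);
            Ok(Target::V4(ip, port_at(body, 4)))
        }
        0x03 => {
            // Domain names carry a one-byte length prefix.
            let len = *body.first().ok_or(ParseError::Truncated)? as usize;
            if body.len() < 1 + len + 2 {
                return Err(ParseError::Truncated);
            }
            let name = std::str::from_utf8(&body[1..1 + len]).map_err(|_| ParseError::BadDomain)?;
            Ok(Target::Domain(name.to_string(), port_at(body, 1 + len)))
        }
        0x04 => {
            if body.len() < 18 {
                return Err(ParseError::Truncated);
            }
            let mut octets = [0u8; 16];
            octets.copy_from_slice(&body[..16]);
            Ok(Target::V6(Ipv6Addr::from(octets), port_at(body, 16)))
        }
        other => Err(ParseError::BadAddressType(other)),
    }
}
```

The parsed `Target` is what the node would then check against the forwarding ACL before opening a direct-tcpip channel; nothing in the parser depends on whether the bytes arrived over TCP, QUIC, or an in-memory pipe.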

DP-5: Channel-policy surface — which SSH services does alknet-ssh expose?

(Recommended: one-way door — needs an ADR, at least the default policy; user-clarified)

russh's server::Handler defaults every channel-request method to reject/no-op (or, for auth_publickey_offered, accept the offer through to signature verification). alknet-ssh must decide its default channel policy. The user's clarification sharpens this:

  • session channels: the dispatch use case uses channel_open_session().exec() heavily — that's the "reverse git runner" pattern (run a command on the remote instance, capture stdout/stderr/exit). For the server side of alknet/ssh, though, the question is whether alknet-ssh runs a real shell on its own node. Given the VPN-like / forwarding use case is primary and the "shell server" use case is secondary, the default should be exec-only: shell_request and pty_request default-reject; exec_request permitted (gated by ACL — see forwarding below). This keeps alknet-ssh a focused forwarding/exec appliance rather than a general-purpose interactive login server. Interactive shell can be an explicit opt-in later (two-way door).
  • port forwarding in both directions (direct-tcpip in, tcpip_forward / forwarded-tcpip out): in v1 scope, both directions, per user clarification. The policy (which destinations are allowed, whether to restrict by ACL/scope) still needs specifying.
  • PTY/X11/agent forwarding: default-reject for security; explicit opt-in. (Consistent with the exec-only session stance.)

Default-deny baseline: the user explicitly called out that "the configuration needs to be such that it's kind of 'default deny', which russh does by default." russh's server::Handler already defaults every channel/auth/forwarding callback to reject or no-op — so alknet-ssh gets default-deny for free by overriding only the methods it wants to enable. Phase 1 must record this as the explicit baseline: every forwarding destination, every exec command, every channel type must be explicitly permitted by config + ACL, never implicitly allowed.
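
The baseline can be sketched as a policy whose switches all default to off. The `Request`, `Decision`, and `ChannelPolicy` names are illustrative and are not russh types; in the real handler the same effect comes from overriding only the server::Handler methods that should be enabled.

```rust
/// Request kinds discussed above; names are illustrative, not russh types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Request {
    Exec,
    DirectTcpip,
    TcpipForward,
    Shell,
    Pty,
    X11,
    AgentForward,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Proceed to the per-identity ACL check; this is not yet "allowed".
    CheckAcl,
    Reject,
}

/// Hypothetical operator switches; everything is off unless explicitly enabled.
#[derive(Debug, Default)]
pub struct ChannelPolicy {
    pub exec: bool,
    pub forward_local_to_remote: bool,
    pub forward_remote_to_local: bool,
}

impl ChannelPolicy {
    pub fn decide(&self, request: Request) -> Decision {
        let enabled = match request {
            Request::Exec => self.exec,
            Request::DirectTcpip => self.forward_local_to_remote,
            Request::TcpipForward => self.forward_remote_to_local,
            // No switch exists for these in this sketch, so they are always rejected.
            Request::Shell | Request::Pty | Request::X11 | Request::AgentForward => false,
        };
        if enabled {
            Decision::CheckAcl
        } else {
            Decision::Reject
        }
    }
}
```

Note that an enabled request type only reaches the ACL check; config alone never grants access.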

ACL gating: forwarding destinations and exec commands are gated by scopes on the resolved Identity. The exact scope vocabulary (e.g., ssh:forward:*, ssh:forward:127.0.0.1:5432, ssh:exec:git-upload-pack) is a design choice the Architect makes — likely a small, capability-shaped scope set with wildcards, consistent with Identity.scopes / Identity.resources (auth.md). The "resources" field on Identity (populated only by composition per CompositionAuthority::as_identity, ADR-022) is not available to fingerprint/token-resolved external identities, so per-destination ACLs for inbound SSH must live in scopes, not resources.
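
One possible reading of "a small scope set with wildcards" is sketched below. The grammar (exact match, or a trailing `*` that matches any suffix) and the function names are assumptions for illustration only; the scope vocabulary itself is still an open design choice.

```rust
/// Does a single granted scope cover the requested scope?
/// Assumed grammar: exact match, or a trailing `*` that matches any suffix.
pub fn scope_covers(granted: &str, requested: &str) -> bool {
    match granted.strip_suffix('*') {
        Some(prefix) => requested.starts_with(prefix),
        None => granted == requested,
    }
}

/// Default-deny: with no matching scope the request is refused.
pub fn is_permitted(granted_scopes: &[&str], requested: &str) -> bool {
    granted_scopes.iter().any(|g| scope_covers(g, requested))
}

/// Build the scope string a forwarding request would be checked against.
pub fn forward_scope(host: &str, port: u16) -> String {
    format!("ssh:forward:{}:{}", host, port)
}
```

A plain suffix wildcard is deliberately crude: `ssh:forward:127.0.0.1:5*` would also match port 5000, so a real vocabulary may want segment-aware matching instead.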

Recommendation: Phase 1 writes an ADR defining the v1 channel-policy surface: exec (gated) + bidirectional port forwarding (gated), with shell/PTY/X11/agent forwarding default-rejected. Default-deny baseline is inherited from russh. Forwarding destinations + exec commands gated by ACL scopes. The exact scope vocabulary is an OQ for Phase 1 (it interacts with how operators express "allow forwarding to 127.0.0.1:5432" in DynamicConfig).

DP-6: Auth method coverage — publickey-only vs password/kbdint too

(Recommended: two-way door — start publickey-only, extend later)

russh supports none, password, publickey, keyboard-interactive, and OpenSSH certificate auth server-side. alknet's identity model (auth.md) is fingerprint-based: SSH key fingerprint → IdentityProvider → Identity. This maps naturally onto publickey (the fingerprint is the SHA-256 of the presented public key) and OpenSSH certificate auth (cert fingerprint). Password / keyboard-interactive don't fit the fingerprint model as cleanly (there's no resolve_from_password on IdentityProvider).

Recommendation: start publickey-only (and certificate auth, which is a superset of publickey from the fingerprint POV). Treat password / keyboard-interactive as a two-way door — can be added later if a use case arises, but the natural alknet identity story is key-based. Phase 1 should note this; likely not a full ADR (it's a default, not a structural decision) but at least a documented design choice in the ssh spec.
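
Expressed as code, the v1 default is a two-entry allow-list over the auth methods listed above. The enum and function names are hypothetical and do not mirror russh's types.

```rust
/// Auth methods russh can serve, per the list above (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AuthMethod {
    NoAuth,
    Password,
    PublicKey,
    KeyboardInteractive,
    OpenSshCertificate,
}

/// v1 default: only the methods that map onto a key or cert fingerprint.
pub fn accepted_in_v1(method: AuthMethod) -> bool {
    match method {
        AuthMethod::PublicKey | AuthMethod::OpenSshCertificate => true,
        AuthMethod::NoAuth | AuthMethod::Password | AuthMethod::KeyboardInteractive => false,
    }
}
```

Listing the rejected variants explicitly (rather than using a catch-all arm) means adding a new method later forces a conscious decision at this match.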

DP-7: tokio as a hard transitive dependency

(Recommended: acknowledged constraint, not a decision)

russh 0.60.2 transitively requires tokio (no "no-tokio" feature; only WASM swaps the spawner). The server loop uses tokio::time::sleep for keepalive/inactivity timers, so the tokio runtime must have its time driver enabled. alknet-ssh must run inside a tokio runtime — which it will, because alknet-core's endpoint already runs on tokio (tokio = { version = "1", features = ["full"] }). This is consistent with the rest of the workspace and not a constraint to fight. Phase 1 should record it as a known constraint; OQ-09 (WASM boundaries) already documents that the server-side dispatch path is a one-way door away from WASM — alknet-ssh inherits that.

DP-8: The ssh-key crate is forked

(Recommended: acknowledged constraint — use the russh re-export)

russh 0.60.2 depends on internal-russh-forked-ssh-key = "0.6.18" (a renamed fork), not upstream ssh-key. alknet-ssh must not add upstream ssh-key directly — that would put two ssh-key versions in the tree and the PublicKey/PrivateKey types wouldn't unify. The fork is re-exported through russh::keys::ssh_key, so alknet-ssh should always reach key types via russh::keys::* (or russh::keys::ssh_key::*) to stay on the same fork. Phase 1 should note this as an implementation constraint; it's not architecturally interesting but a real footgun if missed.

DP-9: End-to-end over a non-TCP stream is untested upstream

(Recommended: de-risk early with a POC test)

russh's own test suite (/workspace/russh/russh/src/tests.rs and client/test.rs) only exercises the client↔server round trip over real TCP loopback. There is no test connecting connect_stream ↔ run_stream over tokio::io::duplex() or any other in-memory pipe. The SshRead::read_ssh_id unit tests feed &[u8] directly, proving the banner parser works on non-socket streams, but a full client↔server round trip over a non-TCP stream is unverified upstream.

The reference implementation uses this path in production (per transport/iroh_transport.rs using tokio::io::join), which is strong empirical evidence it works. But the alknet greenfield rewrite should close this gap early with an integration test using tokio::io::duplex() connecting connect_stream ↔ run_stream before going near real QUIC.

Recommendation: per sdd_process.md Phase 0, this is a candidate for a POC Specialist task (.worktrees/research/ssh-stream-poc/). Phase 1's architecture docs should reference the POC's outcome. If the POC surfaces issues (half-open stream handling, poll_shutdown semantics, etc.), they feed back into the spec as constraints.

DP-10: Bare-TCP SSH listener — in-v1 for git-over-SSH forward-compat

(Recommended: one-way door on the config shape, two-way door on the listener itself — user-clarified)

ADR-010 already establishes that bare-TCP SSH is a handler concern, not an endpoint concern — the SSH handler can listen on a TCP socket independently of the alknet/ssh ALPN path. The user added a forward-looking constraint: "We need to be able to have that TCP handler so we can later support git over ssh."

Standard git-over-SSH (ssh git@host ...) runs on TCP port 22, not over QUIC, not over the alknet/ssh ALPN — git clients (git, libgit2, gix) dial a TCP socket and expect the SSH-2 protocol directly. To make alknet-ssh a viable git-over-SSH target, the bare-TCP listener must be a first-class path, not just a future two-way-door add-on.

The two paths (ALPN/QUIC vs bare-TCP) share the same russh::server::Config and the same server::Handler implementation; they differ only in how the duplex stream is obtained:

  • ALPN path: handle() receives the QUIC Connection, calls accept_bi(), tokio::io::joins the halves, hands to run_stream.
  • TCP path: a tokio::net::TcpListener accept loop hands each accepted TcpStream directly to run_stream (russh accepts TcpStream natively via run_on_socket, or we use run_stream with the raw stream to keep config/handler identical across both paths).

Default-deny baseline (user-stated): "the configuration needs to be such that it's kind of 'default deny', which russh does by default." This applies to both paths: the same ACL gating, the same channel policy, the same default-reject for forwarding destinations. A TCP-listener client gets exactly the same policy treatment as an ALPN client; the only difference is the transport. The TCP listener is off by default (must be explicitly configured to bind), consistent with the default-deny posture: an operator who doesn't configure a TCP bind address gets no TCP listener, only the ALPN path.

Recommendation: Phase 1 records the dual-path model in the ssh spec — ALPN/QUIC primary, bare-TCP as a co-equal first-class path (off by default, explicit config to enable) — so that the configuration shape accommodates both from v1 even if the TCP listener implementation lands slightly later. Crucially, the config schema should reserve the TCP-listener fields now (one-way door — adding a config field later is non-breaking but designing the config around only-ALPN-then-retrofitting-TCP is messier than reserving the shape up front). The listener implementation itself is a two-way door. This avoids the trap where git-over-SSH becomes a painful retrofit because the config only modeled the ALPN path.
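
A minimal sketch of what "reserving the shape up front" could look like. All type and field names here (`SshConfig`, `tcp_listener`, `bind`, `StreamSource`) are hypothetical; the real schema is for the Phase 1 spec to define.

```rust
use std::net::SocketAddr;

/// Hypothetical bare-TCP listener settings.
#[derive(Debug, Clone, PartialEq)]
pub struct TcpListenerConfig {
    pub bind: SocketAddr,
}

/// Hypothetical SSH config with the TCP fields reserved from v1.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SshConfig {
    /// `None` (the default) means no bare-TCP listener is started;
    /// only the ALPN path is served.
    pub tcp_listener: Option<TcpListenerConfig>,
}

#[derive(Debug, PartialEq)]
pub enum StreamSource {
    AlpnQuic,
    BareTcp(SocketAddr),
}

/// Which stream sources the handler should serve for a given config.
pub fn enabled_sources(config: &SshConfig) -> Vec<StreamSource> {
    let mut sources = vec![StreamSource::AlpnQuic];
    if let Some(tcp) = &config.tcp_listener {
        sources.push(StreamSource::BareTcp(tcp.bind));
    }
    sources
}
```

Because the field is an `Option` that defaults to `None`, an operator who never mentions the TCP listener gets the ALPN-only behavior, and a later listener implementation needs no schema change.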

Based on the above, the recommended approach to take into Phase 1:

  1. Crate: alknet-ssh, depends on alknet-core and russh = "0.60" (default features, i.e. aws-lc-rs). Implements ProtocolHandler for b"alknet/ssh". Owns both the SSH server and the SSH client (the client is the shared primitive dispatch and the VPN-like topology both consume).

  2. Stream wiring: handle() accepts the QUIC Connection, calls connection.accept_bi() once to get (SendStream, RecvStream), joins them with tokio::io::join(recv, send), and hands the resulting duplex stream to russh::server::run_stream(Arc::clone(&config), stream, handler). One QUIC bistream ↔ one SSH connection; russh multiplexes SSH channels inside it.

  3. Auth: constructor-injected Arc<dyn IdentityProvider> (per auth.md's SshAdapter example). Inside handle(), if auth.identity is None, russh's server::Handler::auth_publickey resolves the offered key's fingerprint through the provider; on success, store the resolved Identity on the Connection via set_identity() (OQ-11). Start publickey-only (plus OpenSSH cert, which rides the same fingerprint path).

  4. Host keys (DP-1): vault-derived Ed25519 by default (derived from the seed at startup by the assembly layer and injected into SshAdapter's config), with an optional config-supplied key file override. Symmetric with TlsIdentity::RawKey (ADR-027). Needs an ADR.

  5. Channel policy — default-deny, exec + bidirectional forwarding in v1 (DP-5): v1 supports exec (gated) + port forwarding in both directions (direct-tcpip local→remote, forwarded-tcpip/tcpip_forward remote→local, both gated). shell/PTY/X11/agent forwarding default-reject (opt-in later, two-way door). Default-deny baseline inherited from russh — every channel type, every forwarding destination, every exec command must be explicitly permitted by config + ACL scopes; never implicitly allowed. Forwarding destinations + exec commands gated by scopes on the resolved Identity (the resources field is composition-only per ADR-022, so inbound-SSH per-destination ACLs live in scopes). Needs an ADR defining the v1 surface + the scope vocabulary (latter likely stays an OQ).

  6. Client + SOCKS5 — in v1, both in alknet-ssh (DP-4): alknet-ssh owns the SSH server (the ProtocolHandler) and the SSH client (outbound dialing, the primitives dispatch and the VPN-like topology both consume). Port forwarding in both directions is a client-side feature too (the client opens direct-tcpip channels; dispatch does exactly this). SOCKS5 server (what an SSH connection's client dials through the alknet node) is in v1 within alknet-ssh — the VPN-like use case requires it. The SOCKS5 protocol codec may or may not factor into a tiny reusable alknet-socks5 crate (consuming a byte stream); recommend starting with the codec inside alknet-ssh and extracting only if a second consumer appears (two-way door — the stream-agnostic philosophy makes extraction cheap). Needs an ADR confirming this scope.

  7. De-risk POC (DP-9): a Phase 0 POC validating connect_stream ↔ run_stream over tokio::io::duplex() before Phase 1 finalizes the stream wiring spec. Strong empirical evidence from the reference implementation suggests it will pass, but the upstream test gap is real.

  8. Bare-TCP SSH listener — first-class path, config shape reserved in v1, listener off-by-default (DP-10): the alknet/ssh ALPN/QUIC path is primary; a bare-TCP listener is a co-equal first-class path needed for future git-over-SSH support. Reserve the TCP-listener config fields in v1 (one-way door on the config schema — retrofitting is messier than reserving the shape up front). The listener is off by default (explicit config to bind), consistent with the default-deny posture. Both paths share the same server::Config + Handler + ACL policy — only the stream source differs. The listener implementation itself is a two-way door, but the config shape is locked in v1.

Open Questions to Carry into Phase 1

The following should become OQs in docs/architecture/open-questions.md (numbering will be assigned by the Architect; likely OQ-25 onwards, since OQ-01 through OQ-24 exist):

  • OQ-SSH-01 (host key sourcing): vault-derived default + config override — resolved by the DP-1 ADR.
  • OQ-SSH-02 (channel policy v1 surface + default-deny scope vocabulary): the set of allowed channel types / request types is resolved by the DP-5 ADR; the exact scope vocabulary for forwarding destinations + exec commands (e.g., ssh:forward:127.0.0.1:5432 vs a resources-style shape) stays open — it interacts with how operators express allow-lists in DynamicConfig and with the fact that Identity.resources is composition-only (ADR-022).
  • OQ-SSH-03 (client + SOCKS5 scope): confirm alknet-ssh owns both server + client + SOCKS5-server in v1, and whether the SOCKS5 codec extracts to a separate crate now or later — resolved (in favor of in-alknet-ssh-now, extract-later) by the DP-4 ADR.
  • OQ-SSH-04 (POC outcome): did the duplex()-based round-trip POC pass, and did it surface any stream-handling constraints (half-open, poll_shutdown, maximum packet size) that constrain the spec? Resolved by POC Specialist results.
  • OQ-SSH-05 (crypto backend): confirm aws-lc-rs default aligns with the rest of the workspace; defer flipping to ring unless binary-size pressure arises. Two-way door.
  • OQ-SSH-06 (bare-TCP listener enablement timeline): the config shape is reserved in v1 (DP-10); whether the TCP listener implementation lands in v1 or as a fast-follow is a two-way door. Git-over-SSH is the forcing function — decide based on whether v1 needs to be a git-over-SSH target out of the box.

Next Steps (Phase 0 → Phase 1)

  1. You decide on the DP-1, DP-4, DP-5, DP-10 recommendations (or amend them) — these are the load-bearing architectural choices, and DP-4/DP-5/DP-10 now reflect your clarifications (SOCKS5 + bidirectional forwarding + TCP listener for git-over-SSH are all in-scope; default-deny baseline). DP-2, DP-3, DP-6, DP-7, DP-8 are defaults I recommend accepting as-is; DP-9 is a POC task.
  2. Optional POC (DP-9): spawn a POC Specialist to validate connect_stream ↔ run_stream over tokio::io::duplex(). Timeboxed; if it passes, the stream-wiring spec is straightforward; if it surfaces constraints, they fold into the spec.
  3. Phase 1 (Architect): produce docs/architecture/crates/ssh/README.md + component specs (e.g., ssh-handler.md, ssh-stream.md, ssh-channels.md, ssh-auth.md, ssh-forwarding.md, ssh-socks5.md, ssh-client.md, ssh-tcp-listener.md), ADRs for the accepted DPs (likely ADR-028 host-key sourcing, ADR-029 channel policy + default-deny, ADR-030 ssh server+client+socks5+forwarding scope, ADR-031 bare-TCP listener config shape), and the OQs above in open-questions.md. Update docs/architecture/README.md index and ADR table.

References

  • docs/sdd_process.md — Phase 0 process definition
  • docs/architecture/overview.md — ALPN-as-service, crate graph, ProtocolHandler
  • docs/architecture/crates/core/core-types.md — ProtocolHandler, Connection, BiStream
  • docs/architecture/crates/core/auth.md — AuthContext, IdentityProvider, SshAdapter example
  • docs/architecture/decisions/001-alpn-protocol-dispatch.md — ALPN dispatch
  • docs/architecture/decisions/002-protocol-handler-trait.md — ProtocolHandler trait
  • docs/architecture/decisions/004-auth-as-shared-core.md — hybrid auth
  • docs/architecture/decisions/007-bistream-type-definition.md — BiStream trait
  • docs/architecture/decisions/010-alpn-router-and-endpoint.md — endpoint, TCP-is-handler-concern
  • docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md — Capabilities
  • docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — registration bundle
  • docs/architecture/decisions/025-vault-local-only-dispatch.md — vault local-only
  • docs/architecture/decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md — TLS identity model (symmetry reference for DP-1)
  • docs/research/references/ssh/russh/01-06 — existing russh deep-dives
  • /workspace/russh/ — russh 0.60.2 source (authoritative; cargo cache has 0.49.2 only)
  • /workspace/@alkdev/alknet-main/crates/alknet-core/src/ — reference implementation (transport/iroh_transport.rs:94 shows the tokio::io::join adapter; server/, interface/ssh.rs, client/, socks5/ for prior art)
  • /workspace/@alkdev/dispatch/ — concrete downstream consumer the user wants to replace with this stack: axum + russh = "0.60" SSH client for "reverse git runner" over Docker/vast.ai. src/ssh.rs (russh client wrapper, 143 lines), src/handlers.rs::start_forward (channel_open_direct_tcpip local→remote forwarding), src/sftp.rs (russh-sftp client). AGENTS.md and docs/architecture.md describe the architecture. No SOCKS5 — that's the alknet-original feature preserved here. Dispatch is a textbook consumer of the alknet-ssh client + forwarding primitives, which is why those live in alknet-ssh rather than being duplicated per-consumer.