greenfield: clean slate for ALPN-as-service pivot

Delete old source crates (alknet-core, alknet, alknet-napi), old architecture docs (ADRs, specs, open questions), old research docs (phase2, event-sourcing, feasibility, etc.), old tasks, and obsolete reference material (gitserver/MPL, honker, nats, rustfs, polyglot, keystone, distributed-identity). Keep: alknet-secret (standalone, compiles), pivot docs, iroh and ssh references, rudolfs reference (MIT/Apache, fork candidate), ops docs, sdd_process.md, and licenses. Previous implementation preserved at /workspace/@alkdev/alknet-main/ for reference during porting. Workspace compiles: cargo check + 14 tests pass for alknet-secret.
2026-06-15 12:08:08 +00:00
parent d003a4f4ec
commit b5a4600d74
261 changed files with 138 additions and 53794 deletions
--- a/docs/architecture/decisions/001-pluggable-transport.md
+++ b/docs/architecture/decisions/001-pluggable-transport.md
@@ -1,26 +0,0 @@
-# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
-
-## Status
-Accepted
-
-## Context
-Alknet needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
-
-russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in alknet's own type system or handle each transport case-by-case.
-
-## Decision
-Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
-
-On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
-
-This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
-
-## Consequences
-- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
-- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
-- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
-- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch: `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`.
-
-## References
-- [transport.md](../transport.md)
-- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
--- a/docs/architecture/decisions/002-tun-separate-process.md
+++ b/docs/architecture/decisions/002-tun-separate-process.md
@@ -1,30 +0,0 @@
-# ADR-002: TUN Shim as Separate Process
-
-## Status
-Superseded by ADR-014
-
-## Context
-TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core alknet binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
-
-The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
-
-## Decision
-The TUN functionality is a separate `alknet-tun` binary that:
-1. Creates a TUN device (requires root / CAP_NET_ADMIN)
-2. Reads IP packets from it
-3. Forwards each connection to the core alknet's SOCKS5 port (127.0.0.1:1080)
-4. Proxies bytes between TUN packets and SOCKS5 connections
-
-The core `alknet connect` binary never needs root. The `alknet-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
-
-## Consequences
-- **Positive**: Root-required code surface is tiny and auditable.
-- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
-- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
-- **Positive**: Matches the proven tun2proxy architecture.
-- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
-- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
-
-## References
-- [tun-shim.md](../tun-shim.md)
-- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy
--- a/docs/architecture/decisions/003-iroh-stream-join.md
+++ b/docs/architecture/decisions/003-iroh-stream-join.md
@@ -1,31 +0,0 @@
-# ADR-003: iroh Stream via tokio::io::join
-
-## Status
-Accepted
-
-## Context
-iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
-
-Options considered:
-1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
-2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
-3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
-
-## Decision
-Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
-
-One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
-
-If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
-
-Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
-
-## Consequences
-- **Positive**: Minimal code. One line to bridge iroh and russh.
-- **Positive**: No custom types to maintain.
-- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
-- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
-
-## References
-- [transport.md](../transport.md)
-- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
--- a/docs/architecture/decisions/004-ssh-over-transport.md
+++ b/docs/architecture/decisions/004-ssh-over-transport.md
@@ -1,28 +0,0 @@
-# ADR-004: SSH Runs Over Transport, Not Alongside
-
-## Status
-Accepted
-
-## Context
-There are two ways to structure the relationship between SSH and the transport layer:
-
-1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
-
-2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
-
-## Decision
-SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
-
-This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
-
-## Consequences
-- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
-- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
-- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
-- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through alknet) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
-- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
-- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)
-
-## References
-- [transport.md](../transport.md)
-- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
--- a/docs/architecture/decisions/005-socks5-before-tun.md
+++ b/docs/architecture/decisions/005-socks5-before-tun.md
@@ -1,39 +0,0 @@
-# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
-
-## Status
-Accepted
-
-## Context
-A "VPN-like" tool needs to route traffic. There are three approaches:
-
-1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
-2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
-3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
-
-## Decision
-SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
-
-SOCKS5 is the core because:
-- It requires no privileges
-- `curl --socks5-hostname` works everywhere
-- Browsers, most CLI tools, and many applications support SOCKS5
-- SOCKS5h prevents DNS leaks by resolving names server-side
-- It's the interface that the NAPI wrapper and pubsub adapter build on
-- TUN is only needed for "route all traffic" use cases, which are a subset of users
-
-TUN forwards to SOCKS5 rather than directly to SSH because:
-- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
-- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
-- The `alknet-tun` binary stays minimal (~200-500 lines)
-- No root code in the core binary
-
-## Consequences
-- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
-- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
-- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
-- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `alknet connect` — two processes instead of one integrated binary.
-- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
-
-## References
-- [client.md](../client.md)
-- [tun-shim.md](../tun-shim.md)
--- a/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
+++ b/docs/architecture/decisions/006-no-logging-of-tunnel-destinations.md
@@ -1,38 +0,0 @@
-# ADR-006: No Logging of Tunnel Destinations
-
-## Status
-Accepted
-
-## Context
-An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
-
-- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
-- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
-- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
-
-However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
-
-## Decision
-The server does NOT log:
-- `channel_open_direct_tcpip` destinations (host, port)
-- DNS resolutions performed by the server on behalf of clients
-- Bytes transferred through tunnel channels
-- Connection duration or throughput
-
-The server DOES log (ADR-013):
-- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
-- Connection opened (remote_addr, transport kind)
-- Connection closed (remote_addr, duration)
-
-This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
-
-## Consequences
-- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
-- **Positive**: Reduces legal and privacy exposure for server operators.
-- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
-- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside alknet (e.g., network-level logging at the target host).
-- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
-
-## References
-- [server.md](../server.md)
-- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
--- a/docs/architecture/decisions/007-napi-single-stream.md
+++ b/docs/architecture/decisions/007-napi-single-stream.md
@@ -1,26 +0,0 @@
-# ADR-007: NAPI Exposes Single Duplex Stream
-
-## Status
-Accepted
-
-## Context
-The NAPI wrapper for alknet could expose different granularity levels:
-
-1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
-2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
-
-## Decision
-Option 2: NAPI exposes a single duplex stream.
-
-The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
-
-If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
-
-## Consequences
-- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
-- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
-- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
-- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
-
-## References
-- [napi-and-pubsub.md](../napi-and-pubsub.md)
--- a/docs/architecture/decisions/008-acme-lets-encrypt.md
+++ b/docs/architecture/decisions/008-acme-lets-encrypt.md
@@ -1,38 +0,0 @@
-# ADR-008: ACME/Let's Encrypt Certificate Provisioning
-
-## Status
-Accepted
-
-## Context
-TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.
-
-There are two ACME flows:
-1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
-2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
-
-Both flows are important for alknet's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
-
-## Decision
-Support both ACME certificate provisioning paths:
-
-1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
-
-2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
-
-3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
-
-The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps alknet self-contained as a single binary.
-
-## Consequences
-- **Positive**: Users can run `alknet serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
-- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
-- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
-- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
-- **Negative**: IP-based short-lived certs require more frequent renewal handling.
-- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
-
-## References
-- [server.md](../server.md)
-- [OQ-01](../open-questions.md) — resolved by this ADR
-- [OQ-07](../open-questions.md) — resolved by this ADR
-- Production certbot setup: [certbot.md](../../research/ops/certbot.md)
--- a/docs/architecture/decisions/009-default-iroh-relay.md
+++ b/docs/architecture/decisions/009-default-iroh-relay.md
@@ -1,28 +0,0 @@
-# ADR-009: Default iroh Relay with Override
-
-## Status
-Accepted
-
-## Context
-iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
-
-- n0's relay could change terms, rate-limit, or go down
-- Production deployments may want self-hosted relays for reliability and privacy
-- The relay URL is a configuration point that should be explicit
-
-Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
-
-## Decision
-Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
-
-This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
-
-## Consequences
-- **Positive**: Zero-config iroh transport for testing and development. `alknet serve --transport iroh` just works.
-- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
-- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
-- **Negative**: Privacy-conscious users must remember to `--iroh-relay` to avoid n0. Mitigated by documentation.
-
-## References
-- [transport.md](../transport.md)
-- [OQ-02](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/010-transport-chaining-cli.md
+++ b/docs/architecture/decisions/010-transport-chaining-cli.md
@@ -1,33 +0,0 @@
-# ADR-010: Transport Chaining in CLI
-
-## Status
-Accepted
-
-## Context
-Transport chaining allows combining iroh with an upstream proxy, e.g.:
-
-```bash
-alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
-```
-
-This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another alknet instance. This is important for:
-- Nested tunnel topologies
-- Environments where iroh needs to go through an existing proxy
-- Composing transports in flexible ways
-
-iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
-
-## Decision
-Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
-
-For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
-
-## Consequences
-- **Positive**: Flexible transport composition without requiring separate manual configuration.
-- **Positive**: Matches user expectation from the overview doc's transport chaining example.
-- **Positive**: Implementation is minimal — iroh already supports proxy config.
-- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
-
-## References
-- [transport.md](../transport.md)
-- [OQ-05](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
+++ b/docs/architecture/decisions/011-no-ssh-config-programmatic-api.md
@@ -1,38 +0,0 @@
-# ADR-011: Programmatic-First API, No File-Based Config
-
-## Status
-Accepted
-
-## Context
-The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
-
-1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
-2. **Custom config file**: Alknet-specific config file (TOML/YAML) with host definitions.
-3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
-4. **Hybrid**: `--config` flag pointing to an alknet-specific config file, but no OpenSSH config parsing.
-
-## Decision
-Option 3: Programmatic-first API. Configuration is provided via:
-- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
-- **Library API**: `alknet_core::client::ConnectOptions` and `alknet_core::server::ServeOptions` structs, constructable programmatically
-- **Environment variables**: for a few convenience defaults (e.g., `ALKNET_SERVER`, `ALKNET_IDENTITY`)
-
-No `~/.ssh/config` parsing, no alknet-specific config files. This approach:
-- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
-- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
-- Keeps the CLI simple and explicit — no hidden behavior from config files
-- Matches the design principle that the library crate (`alknet-core`) is the primary interface
-
-If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
-
-## Consequences
-- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
-- **Positive**: No cross-platform path issues in the core library.
-- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
-- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
-- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
-- **Negative**: No config file convenience. Users coming from `ssh config` may find this inconvenient.
-
-## References
-- [client.md](../client.md)
-- [OQ-06](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
+++ b/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md
@@ -1,42 +0,0 @@
-# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
-
-## Status
-Accepted
-
-## Context
-SSH authentication has several options:
-- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
-- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
-- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
-
-The question is which auth methods to support and prioritize.
-
-## Decision
-
-**Primary: Ed25519 public key** (already specified, no change).
-
-**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
-
-**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
-- It's less secure than key-based auth
-- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
-- It's not needed when cert-authority provides easy multi-user management
-- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
-
-The server's `authorized_keys` file format follows OpenSSH conventions:
-- Regular keys: `ssh-ed25519 AAAA... user@host`
-- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
-- Principals: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
-
-## Consequences
-- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
-- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
-- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
-- **Positive**: `russh` supports OpenSSH certificate verification natively.
-- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
-- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
-
-## References
-- [client.md](../client.md)
-- [server.md](../server.md)
-- [OQ-04](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/013-fail2ban-friendly-logging.md
+++ b/docs/architecture/decisions/013-fail2ban-friendly-logging.md
@@ -1,39 +0,0 @@
-# ADR-013: Fail2ban-Friendly Server Logging
-
-## Status
-Accepted
-
-## Context
-The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
-
-However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
-
-## Decision
-The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
-
-**Logging** (for fail2ban integration on Linux):
-- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
-- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
-- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
-- Do NOT log: channel open targets, DNS resolutions, bytes transferred
-
-This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
-
-**Built-in rate limiting** (for all platforms):
-- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
-- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
-- Rate limiting happens at the SSH layer, before channels are opened
-
-This ensures that even without fail2ban, the server rejects obviously abusive connections.
-
-## Consequences
-- **Positive**: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
-- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
-- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
-- **Negative**: Slightly more code in the server for connection tracking per IP.
-- **Negative**: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).
-
-## References
-- [server.md](../server.md)
-- [OQ-08](../open-questions.md) — resolved by this ADR
-- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)
--- a/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
+++ b/docs/architecture/decisions/014-defer-tun-recommend-socks5-proxy.md
@@ -1,41 +0,0 @@
-# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
-
-## Status
-Accepted
-
-## Context
-The original plan included a TUN shim (`alknet-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through alknet's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
-
-However, TUN implementation has significant complexities:
-- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
-- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
-- Virtual DNS handling
-- Root/CAP_NET_ADMIN requirements
-- TUN is easy to get wrong and hard to debug
-
-The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
-
-## Decision
-Defer TUN implementation entirely. Remove `alknet-tun` from the architecture. Instead:
-
-1. **Core interface**: alknet's local SOCKS5 proxy (always available, no root required)
-2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `alknet connect`
-3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
-
-This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `alknet-tun` can be added as a thin wrapper around tun2proxy's pattern.
-
-The `tun` feature flag and `alknet-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
-
-## Consequences
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
- **Positive**: tun2proxy is already well-tested for this exact use case.
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
- **Positive**: Cleaner architecture — alknet only does SSH tunneling + SOCKS5. tun2proxy does TUN.
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
-
-## References
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
--- a/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
+++ b/docs/architecture/decisions/015-napi-rs-for-ffi-bridge.md
@@ -1,27 +0,0 @@
-# ADR-015: napi-rs for FFI Bridge
-
-## Status
-Accepted
-
-## Context
-The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
-
-1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
-
-2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
-
-The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
-
-## Decision
-Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
-
-## Consequences
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
-
-## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [OQ-11](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
+++ b/docs/architecture/decisions/016-napi-expose-connect-and-serve.md
@@ -1,40 +0,0 @@
-# ADR-016: NAPI Exposes Both connect() and serve()
-
-## Status
-Accepted
-
-## Context
-The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to alknet's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
-
-1. **connect()**: Establish a client connection to an alknet server. Used by workers/spokes that need to tunnel events through an alknet server.
-2. **serve()**: Start an alknet server from Node.js. Used by hubs that want to accept alknet connections and route events.
-
-The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `alknet serve` process.
-
-More importantly, both `connect()` and `serve()` are fundamental operations of the alknet library. Since the NAPI wrapper is a thin layer over `alknet-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
-
-## Decision
-The NAPI wrapper exposes both `connect()` and `serve()` from the start:
-
-```typescript
-// @alkdev/alknet
-function connect(options: AlknetConnectOptions): Promise<Duplex>;
-function serve(options: AlknetServeOptions): Promise<AlknetServer>;
-```
-
- `connect()` returns a `Duplex` stream (as per ADR-007)
- `serve()` returns an `AlknetServer` object with a `close()` method and events for new connections
-
-The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
-
-## Consequences
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
- **Positive**: Implementation is still minimal — `serve()` is just `alknet_core::server::run()` behind `#[napi]`.
- **Negative**: Slightly larger API surface (two functions + `AlknetServer` type instead of just `connect()`).
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `AlknetServer`.
-
-## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
- [OQ-10](../open-questions.md) — resolved by this ADR
--- a/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
+++ b/docs/architecture/decisions/017-stealth-mode-protocol-multiplexing.md
@@ -1,30 +0,0 @@
-# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
-
-## Status
-Accepted
-
-## Context
-When running an alknet server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention — if SSH traffic on port 443 is detectable, it can be blocked.
-
-After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
-
-## Decision
-When `--stealth` is enabled with TLS transport:
-
-1. After completing the TLS handshake, peek at the first few bytes of the connection
-2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
-3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
-
-This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
-
-The fake response uses `Server: nginx` headers to match the most common web server profile.
-
-## Consequences
- **Positive**: TLS+alknet servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
- **Positive**: Simple implementation — just peek at the first bytes and branch.
- **Positive**: Consistent with censorship circumvention best practices.
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
-
-## References
- [server.md](../server.md)
--- a/docs/architecture/decisions/018-control-channel-for-pubsub.md
+++ b/docs/architecture/decisions/018-control-channel-for-pubsub.md
@@ -1,38 +0,0 @@
-# ADR-018: Control Channel for PubSub over SSH
-
-## Status
-Accepted
-
-## Context
-The NAPI wrapper and pubsub integration need a way to use alknet's SSH channel as a data plane for event routing. When an `alknet connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
-
-For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
-
-1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `alknet-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
-2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
-3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
-
-## Decision
-Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `alknet-control:0`:
-
-1. The `channel_open_direct_tcpip` handler detects the special target via string matching
-2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
-3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
-
-The destination string `alknet-control` is reserved. Regular TCP targets are hostnames or IP addresses, so there is no collision risk.
-
-Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
-
-Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
-
-## Consequences
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
- **Positive**: No separate port or service needs to run on the server. The control channel is built into alknet.
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
- **Positive**: Port forwarding to a specific port is still available as an alternative.
- **Negative**: The string `alknet-control` is a magic constant. It should be defined as a constant in the crate.
- **Negative**: Regular TCP destinations accidentally matching `alknet-control` would be misrouted. Mitigated by reserving the entire `alknet-` prefix namespace.
-
-## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [server.md](../server.md)
--- a/docs/architecture/decisions/019-proxy-dual-semantics.md
+++ b/docs/architecture/decisions/019-proxy-dual-semantics.md
@@ -1,42 +0,0 @@
-# ADR-019: `--proxy` Has Different Semantics on Client vs Server
-
-## Status
-Accepted
-
-## Context
-The `--proxy` CLI flag appears on both `alknet connect` (client) and `alknet serve` (server), but the two sides proxy fundamentally different things:
-
- **Client**: `--proxy` routes the *transport connection* through the proxy. For example, `alknet connect --transport iroh --proxy socks5://127.0.0.1:1080` means the iroh endpoint's outbound TCP connections go through the specified SOCKS5 proxy before reaching the iroh relay. The proxy wraps the transport layer.
-
- **Server**: `--proxy` routes *outbound target connections* through the proxy. For example, `alknet serve --proxy socks5://127.0.0.1:9050` means when an SSH client opens a `direct_tcpip` channel to `db.internal:5432`, the server connects to that target through the specified proxy. The proxy wraps the data-plane connections.
-
-Using the same flag name for both is intentional — from the user's perspective, both mean "route traffic through a proxy." But the layer at which the proxy operates differs, and this needs to be explicit so implementers don't confuse the two.
-
-ADR-010 addressed transport chaining for the client side only. The server-side outbound proxy behavior has no ADR. This ADR documents both semantics and the rationale for sharing the flag name.
-
-## Decision
-The `--proxy` flag uses the same name on client and server, with documented different semantics:
-
-| Side | Flag | What gets proxied | Example |
-|------|------|-------------------|---------|
-| Client | `--proxy` | Transport connection (outbound to server/relay) | `--transport iroh --proxy socks5://...` → iroh endpoint connects through proxy |
-| Server | `--proxy` | Outbound target connections (data plane) | `--proxy socks5://...` → direct_tcpip targets reached through proxy |
-
-On the **client**, `--proxy` affects the transport layer. It only applies to transports that make outbound TCP connections (iroh through a proxy, TLS through a proxy). For plain TCP transport, `--proxy` has no meaningful effect since the transport is already a direct TCP connection — use the SOCKS5 server instead.
-
-On the **server**, `--proxy` affects the data plane. All `channel_open_direct_tcpip` outbound connections are routed through the proxy, regardless of transport mode.
-
-This is not a naming collision — it's the same conceptual operation ("route through a proxy") at different layers. The shared name avoids forcing users to learn two proxy flags.
-
-## Consequences
- **Positive**: One flag name (`--proxy`) instead of two. Users already understand "proxy" as "route through this."
- **Positive**: Client-side proxy is minimal implementation — iroh's endpoint builder accepts proxy config natively.
- **Positive**: Server-side proxy is straightforward — all outbound TCP from channel handlers goes through the proxy.
- **Negative**: Implementers must read the correct spec (client vs server) to understand what `--proxy` does for their side. This is mitigated by CLI help text that clearly describes the behavior per side.
- **Negative**: On the client, `--proxy` with `--transport tcp` is effectively a no-op (the transport is already a direct TCP connection to the server). The CLI should handle this case gracefully.
-
-## References
- [ADR-010](010-transport-chaining-cli.md) — client-side transport chaining
- [transport.md](../transport.md) — transport layer spec
- [client.md](../client.md) — client CLI
- [server.md](../server.md) — server outbound proxy
--- a/docs/architecture/decisions/023-unified-auth-shared-key-material.md
+++ b/docs/architecture/decisions/023-unified-auth-shared-key-material.md
@@ -1,85 +0,0 @@
-# ADR-023: Unified Authentication with Shared Key Material
-
-## Status
-Accepted
-
-## Context
-
-Alknet currently authenticates connections exclusively through SSH public key
-auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
-iroh) because SSH carries its own auth protocol. But WebTransport and other
-HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
-not SSH.
-
-Without unification, non-SSH transports would need a completely separate
-identity system (API keys, JWTs, session tokens). This creates two problems:
-(1) operators manage two key sets with two rotation mechanisms, and (2) the
-same person connecting via SSH and WebTransport appears as two different
-identities.
-
-The `IdentityProvider` trait is needed to decouple alknet-core from any
-specific identity storage (config file vs. database). Without it, alknet-core
-would either hardcode config-file-based auth or take a database dependency —
-neither is acceptable for a library crate.
-
-## Decision
-
-**Unified authentication**: The same Ed25519 key material (`authorized_keys`
-and `cert_authorities`) is shared across both SSH auth and token auth. The
-presentation differs per transport, but the verification result (an
-`Identity` with scopes) is the same.
-
-**Token auth for non-SSH transports**: WebTransport clients present a signed
-timestamp token in the CONNECT request URL:
-
-```
-AuthToken = base64url(key_id || timestamp || signature)
-  key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
-  timestamp = Unix seconds, big-endian u64 (8 bytes)
-  signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
-```
-
-Server extracts the fingerprint, looks it up in the same `authorized_keys`
-set, verifies the signature, and checks the timestamp window (default ±300s).
-
-**`IdentityProvider` trait**: Decouples alknet-core from identity storage. The
-trait resolves a fingerprint or token to an `Identity`. Default implementation
-loads from `DynamicConfig.auth` (no database). Hub implementation can back it
-with `@alkdev/storage`.
-
-**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
-as SSH auth by default. Deployments that want separate access control can use
-`TokenKeySource::Separate` with a distinct key set.
-
-**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
-Zero-replay can be added later via a nonce challenge-response without changing
-the key material.
-
-## Consequences
-
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
-  key to `authorized_keys` immediately grants access via both SSH and
-  WebTransport.
- **Positive**: `IdentityProvider` trait makes alknet-core independent of any
-  specific database. Default: config file. Hub: `@alkdev/storage`.
- **Positive**: Browser clients can authenticate using Ed25519 keys via
-  SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
-  natively.
- **Positive**: No JWT library dependency. The token is a simple Ed25519
-  signature over a fixed structure — same primitives SSH already uses.
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
-  QUIC packet can replay the token within the window. Acceptable because QUIC
-  interception is the same threat level as connection hijacking.
- **Negative**: Certificate authority tokens are not supported in v1. CA
-  verification requires the full OpenSSH certificate structure, which doesn't
-  fit in a signed timestamp.
- **Negative**: Browser-side key management is less ergonomic than SSH key
-  files. The private key must be imported into SubtleCrypto. This is a UI/UX
-  concern, not a protocol concern.
-
-## References
-
- [auth.md](../auth.md) — Full auth architecture spec
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)
--- a/docs/architecture/decisions/024-bidirectional-call-protocol.md
+++ b/docs/architecture/decisions/024-bidirectional-call-protocol.md
@@ -1,63 +0,0 @@
-# ADR-024: Bidirectional Call Protocol
-
-## Status
-Accepted
-
-## Context
-
-The alknet control channel (ADR-018) routes from client → server's event bus.
-This is unidirectional: clients can send events to the server, but the server
-cannot call operations on the client. In the hub/spoke model, spokes (dev env
-containers) connect to a hub and expose operations (fs, bash, search) that the
-hub invokes. The hub needs to call *spoke* operations.
-
-Additionally, the current control channel provides no request/response semantics.
-Every consumer that needs call/response reinvents the pending-request correlation.
-
-## Decision
-
-The call protocol is bidirectional. Both sides can send `call.requested` and
-receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
-BE length prefix + JSON) — the same as `@alkdev/pubsub`.
-
-Five event types: `call.requested`, `call.responded`, `call.completed`,
-`call.aborted`, `call.error`.
-
-A call is a subscribe that resolves after one event. Both use `call.requested`
-with correlated `requestId`. `PendingRequestMap` in core provides correlation.
-
-Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
-path segment routes the call to the correct connected node. The hub's registry
-maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
-first segment is the routing key, remaining path dispatches within the node.
-
-Core-provided operations use short paths without a spoke prefix
-(`/services/list`, `/services/schema`). Spoke operations are prefixed
-(`/dev1/fs/readFile`).
-
-This generalizes ADR-018's control channel: the `alknet-*` destination becomes
-a transport for `EventEnvelope` frames with call protocol semantics, instead of
-raw pubsub dispatch.
-
-## Consequences
-
- **Positive**: Hub can invoke operations on spokes. Dev env containers
-  expose fs, bash, search — the hub calls them as needed.
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
-  can both call and serve operations.
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
-  in core serves all consumers.
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
-  iroh's ALPN dispatch. First segment = routing key.
- **Positive**: Multiple spokes exposing the same service (two dev envs both
-  exposing `/fs/*`) are naturally differentiated by the spoke prefix.
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
-  cleaned up on timeout or connection close.
- **Negative**: The hub must maintain a routing table mapping spoke identities
-  to connections, with registration on connect and cleanup on disconnect.
-
-## References
-
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
--- a/docs/architecture/decisions/025-handler-spec-separation.md
+++ b/docs/architecture/decisions/025-handler-spec-separation.md
@@ -1,73 +0,0 @@
-# ADR-025: Handler/Spec Separation for Downstream Service Registration
-
-## Status
-Accepted
-
-## Context
-
-The current control channel (ADR-018) is hardcoded: `alknet-control:0` bridges
-to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
-`bash.exec` as callable operations, it has no way to register these with core's
-channel routing. The NAPI handler would need to intercept channel data outside
-of core.
-
-For the hub/spoke model, spokes register their operations with the hub when
-they connect. The hub's registry must include both hub-local operations and
-remote operations exposed by spokes.
-
-## Decision
-
-Operation specs and handlers are separated from core. Core provides:
-
-1. `OperationSpec` — describes what an operation does (name, type, input/output
-   schemas, access control)
-2. `OperationHandler` — implements the operation logic
-3. `OperationRegistry` — maps paths to specs + handlers
-4. Built-in operations: `/services/list`, `/services/schema`
-
-Downstream consumers register their own operations:
-
-```rust
-// NAPI layer registers dev env tools
-registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
-registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
-
-// Browser client registers a custom UDF
-registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
-```
-
-Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
-segment routes to the node. The `namespace` field on `OperationSpec` is
-derived from the second path segment (`service`).
-
-When spoke operations are registered with the hub, the hub adds the spoke
-prefix: a spoke that registers `/fs/readFile` as "dev1" becomes addressable as
-`/dev1/fs/readFile` in the hub's routing table.
-
-The `/services/list` operation returns all registered specs. The
-`/services/schema` operation returns the spec for a specific operation. These
-are read-only — no admin operations.
-
-## Consequences
-
- **Positive**: NAPI, Python, and any downstream consumer can register
-  operations without modifying core.
- **Positive**: Service discovery is built in. Clients query `/services/list`
-  to learn what operations a hub offers.
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
-  the same service (dev1 vs dev2).
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
-  authorization. Higher-risk operations (shell, filesystem write) can require
-  tighter scopes.
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
-  maps directly to MCP tool definitions.
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
-  `OperationRegistry`, and `PendingRequestMap`.
- **Negative**: Namespace collisions between downstream consumers are possible.
-  The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
-
-## References
-
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry
--- a/docs/architecture/decisions/026-transport-interface-separation.md
+++ b/docs/architecture/decisions/026-transport-interface-separation.md
@@ -1,162 +0,0 @@
-# ADR-026: Transport/Interface Separation (Three-Layer Model)
-
-## Status
-
-Accepted
-
-## Context
-
-In the current architecture, SSH is deeply embedded in the server handler. The
-`ServerHandler` owns auth, channel management, and proxy logic — all mixed
-together. This makes it impossible to run the call protocol over any transport
-that doesn't speak SSH, such as:
-
- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
-  censorship resistance
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
-  wrapping, for local service mesh or browser-to-head direct communication
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
-  SSH key exchange)
-
-The DNS control channel concept from research (`core.md`) currently conflates
-"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
-SSH is not a transport — it's a protocol layer that sits *on top of* a
-transport. Separating them enables the DNS control channel to carry call
-protocol events directly, without wrapping SSH inside DNS queries.
-
-The same separation enables raw framing (no SSH overhead) for trusted local
-networks, and WebTransport direct call protocol for browser clients.
-
-## Decision
-
-**Establish a three-layer model:**
-
-### Layer 1: Transport
-
-Produces byte streams. A `Transport` still produces
-`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.
-
-```rust
-#[async_trait]
-pub trait Transport: Send + Sync + 'static {
-    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
-    async fn connect(&self) -> Result<Self::Stream>;
-    fn describe(&self) -> String;
-}
-```
-
-Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).
-
-### Layer 2: Interface
-
-Consumes a `Transport::Stream` and produces call protocol sessions. An
-interface is what SSH currently does: wrap a byte stream in session semantics.
-
-```rust
-#[async_trait]
-pub trait Interface: Send + Sync + 'static {
-    type Session;
-    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
-}
-```
-
-Interfaces:
-
- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
-  channel multiplexing. The call protocol runs over a reserved SSH channel
-  (`alknet-control:0`).
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
-  `EventEnvelope`. No SSH overhead. Direct call protocol over the transport
-  stream.
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
-  encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
-
-### Layer 3: Protocol
-
-Carries semantics. Call protocol events, operation registry, service calls.
-The protocol is agnostic to both the transport and the interface below it. It
-receives `EventEnvelope` frames from whatever interface produced them.
-
-### Connection Model
-
-A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:
-
-| Transport | Interface | Use case |
-|-----------|-----------|----------|
-| TLS | SSH | Standard alknet tunnel |
-| TCP | SSH | Plain SSH tunnel |
-| iroh | SSH | P2P SSH tunnel |
-| DNS | raw framing | DNS control channel |
-| WebTransport | SSH | Browser SSH tunnel (future) |
-| WebTransport | raw framing | Browser call protocol (future) |
-| TCP | raw framing | Direct call protocol, local mesh |
-
-**The DNS control channel carries call protocol frames directly — it does NOT
-wrap SSH inside DNS.** This is explicit because the research originally
-conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
-The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
-queries/responses — no SSH involved.
-
-### `TransportKind` Enum
-
-The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
-`WebTransport` variants. Initially these are tags only — no acceptor
-implementation. The full DNS and WebTransport implementations are Phase 4 work
-per the integration plan.
-
-```rust
-pub enum TransportKind {
-    Tcp,
-    Tls { server_name: Option<String> },
-    Iroh { endpoint_id: String },
-    Dns { domain: String },
-    WebTransport { host: String },
-}
-```
-
-### ServerHandler Refactor
-
-The existing `ServerHandler` is refactored into `SshInterface`. The interface
-abstraction means the server's accept loop becomes:
-
-```rust
-// Pseudocode
-let (transport, interface) = listener_config;
-let stream = transport.accept().await?;
-let session = interface.accept(stream, &config).await?;
-// session produces call protocol events
-```
-
-The call protocol handler is interface-agnostic — it receives `EventEnvelope`
-frames from any interface. Auth, forwarding policy, and operation routing happen
-at Layer 3, not inside the SSH handler.
-
-## Consequences
-
- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
-  raw framing) pair is a clean (Transport, Interface) combination.
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
-  trusted networks.
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
-  any interface.
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
-  more focused component that only handles SSH session management.
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
-  implement the `Interface` trait without touching SSH code.
- **Negative**: This is the most invasive code change in Phase 1
-  (integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
-  are currently tangled in `ServerHandler`. Extracting them requires careful
-  refactoring to maintain existing behavior.
- **Negative**: The `Interface` trait is new and untested. The design must
-  accommodate both SSH's channel multiplexing and raw framing's single-stream
-  model through the same abstraction.
-
-## References
-
- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)
--- a/docs/architecture/decisions/027-crate-decomposition.md
+++ b/docs/architecture/decisions/027-crate-decomposition.md
@@ -1,164 +0,0 @@
-# ADR-027: Crate Decomposition
-
-## Status
-
-Accepted
-
-## Context
-
-alknet-core currently contains everything: transport, SSH, auth, config, the
-call protocol handler, and the server accept loop. As the project grows to
-include SQLite-backed identity, HD key derivation, and metagraph storage, core
-would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
-— unacceptable for a library crate that CLI users embed.
-
-Different deployment topologies need different subsets:
-- A minimal CLI tunnel only needs core, transport, and auth types
-- A head node needs SQLite-backed identity and the secret service
-- A flowgraph visualization tool only needs petgraph operations
-
-Circular dependencies must be avoided. alknet-storage implements
-alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
-alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
-format, but not as a crate dependency.
-
-## Decision
-
-**Decompose the project into six crates with a strict acyclic dependency graph.**
-
-### Crate Structure
-
-1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
-   `OperationSpec`, `Interface` trait. The foundational crate that everything
-   else builds on (in some cases by type shape rather than by crate dep).
-   - *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
-   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
-
-2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
-   derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
-   - *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
-     irpc
-   - *Does NOT depend on*: alknet-core, alknet-storage
-
-3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
-   honker integration, `StorageProtocol` irpc service.
-   - *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
-   - *Does NOT depend on alknet-core* (but provides an `IdentityProvider`
-     implementation matching alknet-core's trait signature, without a crate dep)
-   - *Does NOT depend on alknet-secret* (but references `EncryptedData` type
-     format for wire compatibility)
-
-4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
-   graph, type compatibility checking.
-   - *Depends on*: petgraph, serde, jsonschema, thiserror
-   - *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret
-
-5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
-   - *Depends on*: alknet-core
-   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
-
-6. **alknet** (CLI binary) — Assembles everything.
-   - *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
-     alknet-flowgraph (feature), toml
-
-### Dependency Graph
-
-```
-alknet-secret       alknet-storage      alknet-flowgraph
-   (standalone)        (standalone)        (standalone)
-        │                   │                  │
-        │  (feature flags   │   (trait impl    │  (type compat
-        │   in CLI binary)  │    via CLI wire)  │   via JSON)
-        ▼                   ▼                  ▼
-                 ┌─────────────────────┐
-                 │    alknet-core      │
-                 │  (transport, SSH,   │
-                 │   call protocol,    │
-                 │   Identity, Config) │
-                 └─────────┬───────────┘
-                           │
-              ┌────────────┼────────────┐
-              ▼            ▼            ▼
-        alknet-napi    alknet (CLI binary — assembles everything)
-```
-
-All four library crates (core, secret, storage, flowgraph) are independent of
-each other. Dependencies flow **upward** only. The CLI binary sits at the top
-and wires concrete implementations together. alknet-storage implements
-alknet-core's `IdentityProvider` trait without a crate dependency — the CLI
-binary provides the bridge.
-
-### Narrow Interface Points
-
-Three types serve as the narrow interface points between crates:
-
-1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
-   forwarding policy, and call protocol. alknet-storage implements
-   `IdentityProvider` to produce instances.
-
-2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
-   `ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
-   alknet-storage). The CLI/NAPI layer wires the concrete implementation.
-
-3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
-   registry and by alknet-flowgraph for type compatibility checking. The bridge
-   is serialization — flowgraph serializes to JSON, storage persists it.
-
-### irpc Feature Flag
-
-irpc is a feature flag in alknet-core. When disabled, auth and config go through
-`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
-that only do SSH tunneling don't need the service layer.
-
-In alknet-secret and alknet-storage, irpc is an independent dependency, not
-feature-gated. These crates always define irpc service protocols because they
-are used in production deployments where the service layer is active.
-
-### alknet-storage's Relationship to alknet-core
-
-alknet-storage does NOT depend on alknet-core as a crate. Instead:
-
-- alknet-storage defines its own `IdentityProvider` impl that matches
-  alknet-core's trait signature. The trait is re-exported or defined locally
-  with `#[cfg(feature = "alknet-core")]` interop.
-- In practice, the CLI binary crate depends on both and wires them together.
-  alknet-storage provides `StorageIdentityProvider`; alknet-core takes
-  `impl IdentityProvider`.
-
-### alknet-storage's Relationship to alknet-secret
-
-alknet-storage does NOT depend on alknet-secret as a crate. Instead:
-
-- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
-  version, salt, IV, ciphertext). This is a type-level compatibility, not a
-  crate dependency.
-- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
-  `SecretNode` in the metagraph. The bridge is serialization.
-
-## Consequences
-
-- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
-  get a small binary.
-- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
-  swapped for alternative implementations.
-- **Positive**: No circular dependencies. The dependency graph is a DAG.
-- **Positive**: Deployment topology determines which crates to include. A CLI
-  tunnel uses only alknet-core. A head node uses everything.
-- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
-  service layer overhead.
-- **Negative**: `IdentityProvider` trait interop between alknet-core and
-  alknet-storage requires careful versioning. If the trait signature changes,
-  both crates must update.
-- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
-  and alknet-storage is implicit (not enforced by the type system). A shared
-  types crate could be extracted if needed, but adds another crate dependency.
-
-## References
-
-- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
-- [research/core.md](../../research/core.md) — alknet-core contents
-- [research/services.md](../../research/services.md) — Service protocols
-- [research/storage.md](../../research/storage.md) — alknet-storage contents
-- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
-- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
-- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)
--- a/docs/architecture/decisions/028-auth-irpc-service.md
+++ b/docs/architecture/decisions/028-auth-irpc-service.md
@@ -1,147 +0,0 @@
-# ADR-028: Auth as irpc Service
-
-## Status
-
-Accepted
-
-## Context
-
-For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
-doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
-entire set on each reload works for small deployments, but it means every key
-stays resident in memory. For production deployments with hundreds or
-thousands of users, auth verification should instead query a database on
-demand.
-
-The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
-setups. What's needed is an async boundary that allows auth verification to go
-through a service — locally via channels for minimal deployments, or via irpc
-for production deployments where auth runs on a separate process or node.
-
-The critical design point: callers go through the `IdentityProvider` trait
-(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
-the same result — an `Identity` or rejection. The trait is the contract; the
-service is an implementation path.
-
-## Decision
-
-**Auth verification is provided via an irpc service protocol, with
-`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
-(ArcSwap-backed) as the default implementation.**
-
-### IdentityProvider Trait (ADR-029) — The Contract
-
-Callers depend on `IdentityProvider`, not on any concrete implementation:
-
-```rust
-pub trait IdentityProvider: Send + Sync + 'static {
-    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
-    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
-}
-```
-
-### ConfigIdentityProvider — Default Implementation
-
-Reads from `ArcSwap<DynamicConfig.auth>`. No database needed. Every authorized
-key gets a default scope set. This is the default for CLI and single-node
-deployments.
-
-### AuthProtocol irpc Service — Behind Feature Flag
-
-```rust
-#[rpc_requests(message = AuthMessage)]
-#[derive(Debug, Serialize, Deserialize)]
-enum AuthProtocol {
-    #[rpc(tx=oneshot::Sender<AuthResult>)]
-    #[wrap(VerifyPubkey)]
-    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
-
-    #[rpc(tx=oneshot::Sender<AuthResult>)]
-    #[wrap(VerifyToken)]
-    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
-
-    #[rpc(tx=oneshot::Sender<()>)]
-    #[wrap(ReloadKeys)]
-    ReloadKeys,
-
-    #[rpc(tx=oneshot::Sender<bool>)]
-    #[wrap(CheckAccess)]
-    CheckAccess { identity: Identity, operation: String },
-}
-
-enum AuthResult {
-    Ok(Identity),
-    Denied(String),
-}
-```
-
-The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
-that only do SSH tunneling don't need the service layer overhead. When the
-feature is disabled, auth goes through `IdentityProvider` directly.
-
-### AuthServiceImpl
-
-Two implementations are defined (the second is planned for a later phase):
-
-- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
-  Wraps the trait in an irpc service for deployments that use the service layer
-  but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
-- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
-  tables (in alknet-storage, not yet built). Queries on demand. Can maintain an
-  LRU cache for hot fingerprints. This is a Phase 2+ implementation — the
-  contract is defined here so alknet-storage can implement it later.
-
-Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
-know or care which backend is running.
-
-### Integration with IdentityProvider
-
-The irpc service and the trait compose. A caller goes through `IdentityProvider`,
-which may internally delegate to the irpc service, or may satisfy the request
-locally via `ConfigIdentityProvider`. The deployment topology determines the
-path:
-
-- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
-  `ArcSwap<DynamicConfig>`. No irpc overhead.
-- **Production with local auth**: `AuthServiceImpl` wraps
-  `StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
-  routes to the local irpc service.
-- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
-  routes to a remote auth irpc service over QUIC.
-
-### ConfigService Integration
-
-`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
-For the `ConfigIdentityProvider` path, this is equivalent to
-`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
-refreshes the LRU cache. Both update atomically — ongoing connections are
-unaffected, new connections pick up changes.
-
-## Consequences
-
-- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
-  database dependency for CLI users.
-- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
-  irpc service. Auth scales to thousands of users without loading all keys into
-  memory.
-- **Positive**: The `IdentityProvider` trait is the only contract callers depend
-  on. This keeps alknet-core lean and testable.
-- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
-  need the service layer.
-- **Positive**: Both paths produce identical `Identity` results. Behavioral
-  parity is enforced by the shared `Identity` type.
-- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
-  and `StorageIdentityProvider` must produce the same `Identity` for the same
-  input. Integration tests should verify this.
-- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
-  The core must compile and work without it, and the service layer must work
-  with it enabled.
-
-## References
-
-- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
-- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
-- [research/configuration.md](../../research/configuration.md) — Auth service approach
-- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
-- [ADR-029](029-identity-core-type.md) — Identity as core type
-- [ADR-027](027-crate-decomposition.md) — Crate decomposition
--- a/docs/architecture/decisions/029-identity-core-type.md
+++ b/docs/architecture/decisions/029-identity-core-type.md
@@ -1,107 +0,0 @@
-# ADR-029: Identity as Core Type
-
-## Status
-
-Accepted
-
-## Context
-
-The `Identity` struct and `IdentityProvider` trait are needed by auth,
-forwarding policy, and call protocol — three different subsystems in
-alknet-core. Without placing them in core, these subsystems would each define
-their own identity type, leading to duplication and conversion boilerplate.
-
-The constraint: alknet-core must not depend on alknet-storage or any database.
-The `IdentityProvider` trait must be in core so that the handler can resolve
-identities without knowing whether the backing store is a config file or a
-SQLite database. External crates provide implementations.
-
-Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
-scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
-model uses `{id, scopes, resources}` where `id` holds either the fingerprint (for
-key-based auth from config) or the account UUID (for database-backed auth).
-
-## Decision
-
-**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
-
-### Identity Struct
-
-```rust
-pub struct Identity {
-    pub id: String,                               // Fingerprint (config auth) or account UUID (database auth)
-    pub scopes: Vec<String>,                      // e.g., ["relay:connect", "service:gitea:read"]
-    pub resources: HashMap<String, Vec<String>>,   // e.g., {"service": ["gitea", "registry"]}
-}
-```
-
-The `id` field serves a dual purpose: when using config-based authentication
-(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
-database-backed authentication (`StorageIdentityProvider`), it holds the account
-UUID from the `accounts` table. This keeps the type simple while accommodating
-both auth paths.
-
-The `scopes` field provides authorization scope strings used by
-`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
-field provides resource-level authorization beyond what scopes offer (e.g., which
-services this identity can access).
-
-### IdentityProvider Trait
-
-```rust
-pub trait IdentityProvider: Send + Sync + 'static {
-    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
-    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
-}
-```
-
-The trait is the contract. Callers (auth handler, forwarding policy, call
-protocol) depend on `IdentityProvider` — not on any concrete implementation.
-
-### Default and Production Implementations
-
-- **`ConfigIdentityProvider`** (in alknet-core) — reads from
-  `ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
-  No database needed. This is the default for minimal deployments.
-- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
-  `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
-  fingerprint → account → organization membership → effective scopes. This is
-  the production implementation for head nodes.
-
-alknet-core never depends on alknet-storage. The trait relationship is:
-alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
-NAPI assembly layer wires the concrete implementation.
-
-### Why Not in alknet-storage?
-
-If `Identity` lived in alknet-storage, alknet-core would need to depend on
-alknet-storage to use the type — creating a circular dependency (since
-alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
-type and trait in core breaks the cycle.
-
-## Consequences
-
-- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
-  call protocol all use the same `Identity` type.
-- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
-  wires the concrete implementation. Deployment topology determines which impl
-  to use.
-- **Positive**: The `id` field serves a dual purpose (fingerprint or UUID),
-  avoiding separate types for config-based and database-based auth.
-- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
-  `Identity` without knowing where they came from.
-- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
-  `Storage`. Both must produce identical `Identity` results for the same input.
-  Tests should verify behavioral parity.
-- **Negative**: The trait abstraction adds a level of indirection for the
-  minimal (config-only) deployment path. The cost is negligible — the
-  `ConfigIdentityProvider` is a simple `ArcSwap` dereference.
-
-## References
-
-- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
-- [research/services.md](../../research/services.md) — AuthService, Identity section
-- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
-- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
-- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
-- [OQ-18](../open-questions.md) — IdentityProvider owns scopes
--- a/docs/architecture/decisions/030-static-dynamic-config-split.md
+++ b/docs/architecture/decisions/030-static-dynamic-config-split.md
@@ -1,159 +0,0 @@
-# ADR-030: Static/Dynamic Configuration Split
-
-## Status
-
-Accepted
-
-## Context
-
-Alknet's configuration is loaded once at startup and never changes. This causes
-three specific failures:
-
-1. **No hot reload of authentication credentials.** Adding or removing an
-   authorized key requires restarting the server process. In head/worker
-   deployments where keys are managed via a database, the process must be
-   restarted every time a key is added, revoked, or rotated. This is
-   operationally unacceptable.
-
-2. **No port forwarding access control.** Any authenticated client can open a
-   `direct-tcpip` channel to any destination. There is no policy governing
-   which hosts, ports, or alknet control channels a client may access. A
-   compromised key grants unrestricted network access through the tunnel.
-
-3. **No structured configuration beyond CLI flags.** ADR-011 chose
-   programmatic-first configuration for the alpha — correct at the time. But as
-   alknet moves toward publishable releases, operators need config files for
-   reproducible deployments, and the NAPI layer needs programmatic reload
-   capability that `ServeOptions` doesn't currently support.
-
-Not all configuration should be reloadable. Transport-level settings (listen
-address, TLS certificates, host key) require socket/TLS renegotiation to change
-at runtime — effectively a restart. Auth and forwarding policy can change
-atomically without disrupting existing connections.
-
-## Decision
-
-**Split configuration into `StaticConfig` and `DynamicConfig`.**
-
-### StaticConfig
-
-Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
-preserved). Contains everything that affects socket binding, TLS handshakes, or
-SSH session negotiation:
-
-- Transport mode, listen address
-- TLS config (cert, key)
-- iroh config (relay URL)
-- Stealth mode flag
-- Host key, host key algorithm
-- Max auth attempts, max connections per IP
-- Proxy config
-
-Changing any of these requires a restart.
-
-### DynamicConfig
-
-Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
-checked per-connection or per-channel:
-
-- `AuthPolicy` — authorized keys, certificate authorities, token config
-- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
-- `RateLimitConfig` — rate limiting parameters
-
-`ArcSwap` provides lock-free reads on the hot path (every `auth_publickey()` and
-every `channel_open_direct_tcpip()` call does an `Arc` dereference, adding
-negligible cost over the current approach). Writes are atomic: `store()` swaps the
-pointer. Existing connections finish with their current config; new connections
-get the new config.
-
-### ConfigReloadHandle
-
-```rust
-pub struct ConfigReloadHandle {
-    dynamic: Arc<ArcSwap<DynamicConfig>>,
-}
-
-impl ConfigReloadHandle {
-    pub fn reload(&self, new_config: DynamicConfig) { ... }
-}
-```
-
-The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
-
-### ConfigService
-
-The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
-protocol (behind the `irpc` feature flag) for production deployments that use
-the service layer. For minimal deployments (CLI, single-node), direct
-`ConfigReloadHandle::reload()` is sufficient.
-
-### TOML Config File
-
-An optional TOML config file covers static config plus initial auth/forwarding
-paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
-API remains primary. The config file is a convenience input format:
-
-```toml
-[server]
-transport = "tls"
-listen = "0.0.0.0:443"
-stealth = false
-max_connections_per_ip = 5
-max_auth_attempts = 3
-
-[server.tls]
-cert = "/etc/alknet/tls/cert.pem"
-key = "/etc/alknet/tls/key.pem"
-
-[auth]
-host_key = "/etc/alknet/ssh/host_key"
-
-[forwarding]
-default = "deny"
-```
-
-### NAPI Reload API
-
-```typescript
-interface AlknetServer {
-  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
-  reloadForwarding(policy: ForwardingPolicyConfig): void;
-  reloadAll(config: DynamicConfig): void;
-}
-```
-
-The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
-`ConfigReloadHandle::reload()`.
-
-### Client Configuration
-
-Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
-config is almost entirely static (which server to connect to, which key to use).
-
-## Consequences
-
-- **Positive**: Auth credentials and forwarding policy can be reloaded without
-  restarting the server. Adding a key via `reloadAuth()` takes effect on the
-  next connection attempt.
-- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
-  config file is an optional convenience layer, not a replacement for
-  `ServeOptions`.
-- **Positive**: `ArcSwap` provides near-zero-cost reads on the hot path. Every
-  auth check and every channel open is a single `Arc` dereference.
-- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
-  production deployments to integrate config reload into their service mesh
-  without taking a direct dependency on `DynamicConfig` internals.
-- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
-  restrict access per identity, per destination, per transport (ADR-031).
-- **Negative**: Two config structs where there was one. The split is clean
-  (transport vs. policy) but adds surface area.
-- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
-  This is acceptable for a CLI binary.
-
-## References
-
-- [research/configuration.md](../../research/configuration.md) — Full analysis
-- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
-- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
-- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
-- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1
--- a/docs/architecture/decisions/031-forwarding-policy.md
+++ b/docs/architecture/decisions/031-forwarding-policy.md
@@ -1,138 +0,0 @@
-# ADR-031: Forwarding Policy
-
-## Status
-
-Accepted
-
-## Context
-
-Currently, any authenticated client can open a `direct-tcpip` SSH channel to
-any destination. The only gate is authentication — once authenticated, a client
-has unrestricted network access through the tunnel. This is a security gap: a
-compromised key grants unrestricted access.
-
-Operators need the ability to:
-- Restrict which hosts and ports authenticated clients can access
-- Apply different rules to different principals (key fingerprints, accounts)
-- Restrict WebTransport clients to alknet control channels only
-- Set a default policy (allow-all for migration compatibility, deny-all for
-  production)
-
-## Decision
-
-**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
-restart).**
-
-### Type Definitions
-
-```rust
-pub struct ForwardingPolicy {
-    pub default: ForwardingAction,
-    pub rules: Vec<ForwardingRule>,
-}
-
-pub struct ForwardingRule {
-    pub target: TargetPattern,
-    pub action: ForwardingAction,
-    pub principals: Vec<String>,   // Empty = matches all
-    pub transports: Vec<TransportKind>,  // Empty = matches all
-}
-
-pub enum ForwardingAction {
-    Allow,
-    Deny,
-}
-
-pub enum TargetPattern {
-    Any,
-    Host(String),          // "localhost", "*.example.com"
-    Cidr(IpNetwork),       // "10.0.0.0/8"
-    PortRange(String, Range<u16>),  // "localhost", ports 8080-8090
-    AlknetPrefix,          // Matches alknet-* control channels
-}
-```
-
-### Rule Evaluation
-
-Rules are evaluated in order. First match wins. If no rule matches, the default
-applies. This supports both allowlist and blocklist semantics:
-
-- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
-  destinations.
-- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
-  destinations.
-
-### Principals
-
-Each rule can specify which principals it applies to. A principal is an
-`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
-rule's `principals` field is empty, it matches all identities.
-
-This connects to the `IdentityProvider` trait (ADR-029): when a client
-authenticates, the `Identity` is resolved, and the forwarding policy checks
-rules against `Identity.id` and `Identity.scopes`.
-
-### TransportKind-Aware Rules
-
-Each rule can specify which `TransportKind` it applies to. This enables
-transport-specific restrictions — for example, WebTransport clients can be
-restricted to `alknet-*` control channels only:
-
-```rust
-ForwardingRule {
-    target: TargetPattern::AlknetPrefix,
-    action: ForwardingAction::Allow,
-    principals: vec![],
-    transports: vec![TransportKind::WebTransport { host: "*".into() }],
-}
-```
-
-### Where the Policy Check Happens
-
-The forwarding policy check occurs in `channel_open_direct_tcpip` before the
-proxy task is spawned. The current behavior (no check) is equivalent to
-`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
-backward compatibility during migration.
-
-### DynamicConfig Integration
-
-`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
-`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
-effect on the next channel open — existing connections continue with their
-current policy.
-
-### OQ Resolutions
-
-- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
-  global rules + principal matching from `Identity.scopes`. Per-user scopes
-  come from `peer_credentials.metadata.scopes` via `IdentityProvider`.
-- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
-  match in `ForwardingRule`. WebTransport clients can be restricted.
-- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
-  `IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
-
-## Consequences
-
-- **Positive**: Operators can restrict access per identity, per destination, per
-  transport. A compromised key no longer grants unrestricted network access.
-- **Positive**: Default-allow preserves current behavior during migration. Switch
-  to default-deny for production deployments.
-- **Positive**: Policy is reloadable without restart. Adding a rule via
-  `reloadForwarding()` takes effect on the next channel open.
-- **Positive**: `TransportKind`-aware rules enable transport-specific
-  restrictions (e.g., WebTransport clients restricted to alknet-* channels).
-- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
-  call). The cost is a linear scan of rules — acceptable for small rule sets.
-  Large rule sets should use compiled matchers (future optimization).
-- **Negative**: `TargetPattern` string matching is lenient. Host patterns like
-  `*.example.com` require careful implementation to prevent bypasses. The
-  `glob` or `globset` crate can handle this correctly.
-
-## References
-
-- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
-- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
-- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
-- [ADR-029](029-identity-core-type.md) — Identity as core type
-- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
-- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3
--- a/docs/architecture/decisions/032-event-boundary-discipline.md
+++ b/docs/architecture/decisions/032-event-boundary-discipline.md
@@ -1,96 +0,0 @@
-# ADR-032: Event Boundary Discipline
-
-## Status
-
-Accepted
-
-## Context
-
-The research identified three distinct communication patterns in the system, and
-conflating them is a known anti-pattern in event-driven architectures:
-
-1. **Domain events** (Honker streams) — Internal to the service that owns that
-   data. Used for state reconstruction within the service's own boundaries.
-   Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
-
-2. **irpc service calls** — Synchronous request-response within a node or
-   cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
-   `SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
-
-3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
-   that cross node boundaries. External to the system. Examples:
-   `call.requested`, `call.responded`, `call.completed`, `call.aborted`.
-
-Without a hard constraint, it's tempting to have one service subscribe directly
-to another service's Honker streams. This leads to:
-
- **Leaky event store**: Service A reads Service B's domain events directly,
-  coupling A to B's internal state representation. When B changes its schema, A
-  breaks.
- **Boomerang coupling**: An integration event is too thin, causing the
-  consumer to call back to the source service synchronously to get details. This
-  negates the benefit of async communication.
- **Fat notification trap**: A notification event carries full entity state,
-  when it should use state transfer instead.
-
-## Decision
-
-**Event boundary discipline is a hard architectural constraint, not a
-suggestion.**
-
-1. **Domain events stay within the owning service.** A Honker stream published
-   by the storage service (`nodes:created`) is for the storage service's own
-   state reconstruction. No other service reads these stream events directly.
-
-2. **irpc service calls are synchronous and internal.** They never cross node
-   boundaries. They are request-response, not events. They should not be used
-   as a substitute for integration events.
-
-3. **Call protocol events are the only events that cross node boundaries.**
-   `EventEnvelope` frames are the integration boundary. When a domain event
-   needs to be communicated to another node, it must be projected into a call
-   protocol event.
-
-4. **Projection from domain events to integration events is required when
-   crossing boundaries.** A service that owns a Honker stream must project
-   relevant state changes into `EventEnvelope` frames before they leave the
-   node. The projection strips internal details and produces a versioned,
-   stable integration event.
-
-This discipline applies at three levels:
-
-```
-Call Protocol (Layer 3, external, JSON)
-    └── irpc Service (Layer 3, internal, postcard)
-            └── Honker Streams (Domain events, within service boundary)
-```
-
-A call protocol handler MAY call an irpc service internally (e.g.,
-`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
-use Honker streams for its own state management. But domain events never
-propagate beyond the service boundary without projection.
-
-## Consequences
-
- **Positive**: Prevents leaky event stores. Services are independently
-  deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
-  contracts. The call protocol's `EventEnvelope` is the only stable, versioned
-  contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
-  change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
-  mock domain events. Integration events are tested against the `EventEnvelope`
-  schema.
- **Negative**: Projection code is required. Every domain event that needs to
-  cross a boundary must be explicitly projected. This is deliberate — the
-  overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
-  Honker streams across services. Code review should catch this pattern.
-
-## References
-
- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
- [event_source_types.md](../../research/event-sourcing/event_source_types.md) — Event-driven architecture patterns
--- a/docs/architecture/decisions/033-operationenv-irpc-call-protocol.md
+++ b/docs/architecture/decisions/033-operationenv-irpc-call-protocol.md
@@ -1,132 +0,0 @@
-# ADR-033: OperationEnv as Universal Composition Mechanism
-
-## Status
-
-Accepted
-
-## Context
-
-The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
-universal composition mechanism. A handler receives `context.env[namespace][op](input)`
-and can invoke any registered operation regardless of whether it runs locally, in
-an irpc service on the same cluster, or on a remote node via call protocol.
-
-The research documents define three dispatch paths:
-1. **Local dispatch** — direct function call through the operation registry
-2. **Service dispatch** — irpc protocol call to a service backend
-3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node
-
-Without a formal decision, irpc services could be seen as a replacement for
-OperationEnv or for the call protocol. They are not — irpc is one dispatch
-backend for OperationEnv, not a replacement for anything. The call protocol is
-another dispatch backend. OperationEnv unifies them from the handler's
-perspective.
-
-The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams) — internal to the owning service
- irpc service calls — synchronous, in-cluster
- Call protocol events — asynchronous, cross-node
-
-irpc services and call protocol operations serve different scopes but must
-compose cleanly through OperationEnv.
-
-## Decision
-
-**OperationEnv is the universal composition mechanism that all operation
-handlers receive. It provides namespace + operation name → invoke with input,
-return output, regardless of dispatch path.**
-
-### OperationEnv Behavioral Contract
-
-```rust
-// The behavioral contract: given a namespace and operation name, invoke the
-// operation with the given input and return the output. The handler neither
-// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
-pub trait OperationEnv: Send + Sync {
-    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
-}
-```
-
-The Rust implementation may use typed method dispatch or a registry behind the
-scenes, but the handler-facing API must preserve this contract.
-
-### Three Dispatch Paths
-
-OperationEnv resolves each call to one of three dispatch backends:
-
-| Path | Mechanism | Serialization | Scope |
-|------|-----------|---------------|-------|
-| Local | Direct function call through registry | None (in-process) | Same process |
-| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
-| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |
-
-All three produce the same `ResponseEnvelope`. The handler always calls
-`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
-back.
-
-### Service Assembly
-
-The deployment topology determines which dispatch path each operation uses:
-
-```rust
-// Minimal deployment (single node, all local)
-let env = OperationEnv::local(local_registry);
-
-// Production deployment (mix of local and remote)
-let env = OperationEnv::new()
-    .local("auth", auth_registry)           // Auth runs locally
-    .local("config", config_registry)       // Config runs locally
-    .service("secrets", secret_irpc_client) // Secret service via irpc
-    .remote("worker-1", call_protocol_conn); // Worker-1 operations via call protocol
-```
-
-### irpc Services Are One Dispatch Backend
-
-irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
-wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
-and efficient. But they are not a replacement for OperationEnv or for the call
-protocol. They are one dispatch backend.
-
-An irpc service can be exposed as a call protocol operation:
-`/head/auth/verify` receives a call protocol event and internally calls
-`AuthProtocol::VerifyPubkey` via irpc. The layers compose:
-
-```
-Call Protocol (Layer 3, external, JSON)
-    └── irpc Service (Layer 3, internal, postcard)
-            └── Honker Streams (Domain events, within service boundary)
-```
-
-### Adapters Map to OperationEnv
-
-HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
-(`{op}.{namespace}.alk.dev TXT?`), and call protocol
-(`/call.requested`) all resolve through OperationEnv. This is what makes
-operations universally composable across all interfaces.
-
-## Consequences
-
- **Positive**: Handlers compose through a single interface. Adding a new
-  dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
-  know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
-  OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
-  works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
-  existing call protocol handler pattern. The registry currently maps paths to
-  handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The `@alkdev/operations` TypeScript nested-map model
-  (conceptually `HashMap<String, HashMap<String, fn>>`) needs idiomatic Rust
-  translation. The behavioral contract must match; the implementation can differ.
-
-## References
-
- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md) — Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation
--- a/docs/architecture/decisions/034-head-worker-terminology.md
+++ b/docs/architecture/decisions/034-head-worker-terminology.md
@@ -1,55 +0,0 @@
-# ADR-034: Head/Worker Terminology
-
-## Status
-
-Accepted
-
-## Context
-
-The project previously used hub/spoke terminology for describing node
-relationships: a hub node that coordinates connections and spokes that connect to
-it. This terminology implies a strict star topology where the hub is
-fundamentally different from spokes.
-
-In practice, a coordinating node can also execute operations (run services,
-forward traffic). Any node can become a coordinator. The architecture supports
-mesh topologies where nodes coordinate in a peer-to-peer fashion.
-
-The research documents (`core.md`, `services.md`) and updated architecture
-specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
-already use head/worker consistently. Existing ADRs (024, 025) retain their
-original hub/spoke language because ADRs are historical records.
-
-## Decision
-
-**Use head/worker terminology throughout the project.**
-
- **Head node**: A node that coordinates — accepts connections, routes
-  operations, manages cluster state. A head is also a worker (it can execute
-  operations).
- **Worker node**: A node that connects to a head, registers its services, and
-  executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.
-
-The terms hub and spoke are deprecated in all new specs, code, and
-documentation. Existing ADRs retain their original language as historical
-records — ADRs document what was decided at the time, not what the current
-terminology is.
-
-## Consequences
-
- **Positive**: Natural mesh formation. A head that is also a worker enables
-  multi-hop routing, redundancy, and distributed topologies without a
-  centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture — there is no
-  single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
-  intentional — ADRs are historical records.
-
-## References
-
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
- [research/core.md](../../research/core.md) — Head/worker terminology
--- a/docs/architecture/decisions/035-streaminterface-messageinterface-split.md
+++ b/docs/architecture/decisions/035-streaminterface-messageinterface-split.md
@@ -1,65 +0,0 @@
-# ADR-035: StreamInterface and MessageInterface Split
-
-## Status
-Accepted
-
-## Context
-
-The `Interface` trait (ADR-026) assumes a persistent byte stream from a `Transport`. It produces a `Session` that yields `InterfaceEvent` frames. This works for SSH and raw framing — both run over duplex streams.
-
-However, HTTP and DNS do not fit this model. They handle individual request/response pairs, not persistent sessions. HTTP runs over a TLS connection after byte-peek protocol detection (extending the existing stealth mode pattern). DNS runs its own server on port 53. Both are stateless per-request, not session-oriented.
-
-The three-layer model (Transport, Interface, Protocol) remains correct. The issue is that Layer 2 has two distinct patterns: stream-based (SSH, raw framing) where the transport provides a continuous byte stream, and message-based (HTTP, DNS) where the interface manages its own transport and handles discrete requests.
-
-## Decision
-
-Split the `Interface` trait into two independent traits:
-
-1. **`StreamInterface`** — consumes a `TransportStream`, produces a long-lived `Session` that yields `InterfaceEvent` frames. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations.
-
-2. **`MessageInterface`** — handles individual `InterfaceRequest` → `InterfaceResponse` pairs. Manages its own transport (HTTP server, DNS server). `HttpInterface` and `DnsInterface` are `MessageInterface` implementations.
-
-The traits are independent. They have different signatures (`accept(stream)` vs `handle_request(req)`), different lifecycles (long-lived session vs stateless per-request), and different transport ownership (provided by caller vs self-managed).
-
-`ListenerConfig` gains variants for both:
-
-```rust
-pub enum ListenerConfig {
-    Stream {
-        transport: TransportKind,
-        interface: StreamInterfaceKind,
-    },
-    Http {
-        bind_addr: SocketAddr,
-        tls: bool,
-        stealth: bool,
-    },
-    Dns {
-        bind_addr: SocketAddr,
-        tls: bool,
-    },
-}
-```
-
-`TransportKind::Dns` is removed. DNS is a `MessageInterface` that manages its own transport (UDP/TCP port 53), not a transport variant.
-
-The call protocol handler (Layer 3) is interface-agnostic: it processes `InterfaceEvent` frames from `StreamInterface` sessions and `InterfaceRequest` → `InterfaceResponse` from `MessageInterface` handlers. The dispatch logic is the same — only the framing differs.
-
-## Consequences
-
-**Positive**: HTTP and DNS are first-class interfaces with proper type signatures. No forcing stateless protocols into a session model. The existing stealth mode byte-peek pattern naturally extends to `HttpInterface`. The `InterfaceRequest` / `InterfaceResponse` types normalize calls across message-based interfaces.
-
-**Positive**: Removing `TransportKind::Dns` prevents a breaking change later — code should never depend on DNS as a transport variant.
-
-**Positive**: `ListenerConfig` correctly models the server's accept loop: stream listeners spawn one accept loop per (transport, interface) pair, while HTTP and DNS listeners each manage their own server.
-
-**Negative**: Two traits where there was one. But they serve fundamentally different purposes. A common super-trait would add complexity (`accept_stream` + `handle_request` + `transport_kind`) without practical benefit — implementations satisfy one trait or the other, never both.
-
-**Negative**: The `accept()` method on the current `Interface` trait needs to be renamed. This is a rename of an existing method signature, not a semantic change — `SshInterface` and `RawFramingInterface` implementations become `StreamInterface` implementations with the same `accept()` logic.
-
-## References
-
- ADR-026 (transport/interface separation — updated by this ADR)
- [interface.md](../interface.md) — Interface layer spec
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — Full analysis
- [research/phase2/tls-transport.md](../../research/phase2/tls-transport.md) — HTTP interface, ListenerConfig
--- a/docs/architecture/decisions/036-credentialprovider-core-type.md
+++ b/docs/architecture/decisions/036-credentialprovider-core-type.md
@@ -1,82 +0,0 @@
-# ADR-036: CredentialProvider as Core Type
-
-## Status
-Accepted
-
-## Context
-
-Alknet's `IdentityProvider` resolves **inbound** authentication: given a
-credential (fingerprint or token), produce an `Identity`. But there is no
-corresponding abstraction for **outbound** credentials: how does alknet
-authenticate _to_ external services (vast.ai, rustfs, gitea)?
-
-Without `CredentialProvider`, each service wrapper would independently solve
-credential retrieval, caching, and lifecycle management. This leads to
-duplicated effort and inconsistent security practices across service wrappers.
-
-The pattern mirrors the existing `IdentityProvider` pattern: trait in core,
-default impl using simple storage, production impl using the secret service
-and database.
-
-## Decision
-
-Define `CredentialProvider` trait and `CredentialSet` enum in
-`alknet_core::credentials`.
-
-```rust
-pub trait CredentialProvider: Send + Sync + 'static {
-    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
-    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
-}
-
-pub enum CredentialSet {
-    ApiKey { header_name: String, token: String },
-    Basic { username: String, password: String },
-    Bearer { token: String },
-    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
-    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
-    Custom { scheme: String, params: HashMap<String, String> },
-}
-```
-
-The trait is intentionally narrow. It returns credentials for a named service.
-It does not try to abstract the auth mechanism itself — that stays with the
-service wrapper that knows the protocol (S3 signing, OAuth2 refresh, etc.).
-
-Phase 1 provides `SecretStoreCredentialProvider` (reads from
-`SecretProtocol::Decrypt`, holds in RAM). Phase 2+ adds
-`ManagedCredentialProvider` (with `CredentialManager` for lifecycle management:
-refresh, expiration, provisioning).
-
-`CredentialProvider` does not depend on `IdentityProvider`, though
-`ManagedCredentialProvider` may use `Identity.id` for identity-bound credential
-lookups.
-
-## Consequences
-
-**Positive**: Outbound auth has a unified abstraction, just as inbound auth
-has `IdentityProvider`. Service wrappers retrieve credentials through one
-interface. `OperationEnv` can expose credentials through `context.env`.
-
-**Positive**: The `CredentialSet` enum covers all identified credential types
-(API keys, bearer tokens, S3 access keys, OIDC tokens, basic auth, custom).
-This is sufficient for Phases A-C. Phase D (alknet as OIDC provider) is additive.
-
-**Positive**: The trait in core, impl in service crate pattern is consistent
-with `IdentityProvider` (trait in core, `ConfigIdentityProvider` in core,
-`StorageIdentityProvider` in alknet-storage).
-
-**Negative**: Adds a new core type and a new module (`credentials`). But this
-is the same pattern as `IdentityProvider` and `auth` — a small, narrow trait
-with a clear contract.
-
-**Negative**: `ManagedCredentialProvider` and `CredentialManager` are Phase C
-concepts. The spec should define them as future extensions, not implement them
-now.
-
-## References
-
- ADR-029 (Identity as core type — same pattern)
- [credentials.md](../credentials.md) — CredentialProvider spec
- [research/phase2/credential-provider.md](../../research/phase2/credential-provider.md) — Full analysis
- [identity.md](../identity.md) — IdentityProvider (inbound, opposite direction)
--- a/docs/architecture/decisions/037-api-keys-dynamic-config.md
+++ b/docs/architecture/decisions/037-api-keys-dynamic-config.md
@@ -1,83 +0,0 @@
-# ADR-037: API Keys as DynamicConfig Auth
-
-## Status
-Accepted
-
-## Context
-
-Alknet's token auth uses Ed25519-signed `AuthToken`s — the same key material
-used for SSH auth. This is appropriate for interactive clients (browsers, CLI)
-that can generate and sign Ed25519 key pairs.
-
-But for service accounts, automation, and simple integrations, Ed25519 key
-pairs are inconvenient. A dashboard backend, a CI/CD pipeline, or a monitoring
-script needs a simple bearer token that can be stored in an environment variable
-or config file without managing cryptographic key pairs.
-
-The HTTP interface (Phase 2+) requires bearer token auth for `Authorization:
-Bearer <token>` headers. `AuthToken` works but requires client-side Ed25519
-signing. API keys offer a simpler alternative: short bearer tokens verified by
-SHA-256 hash lookup, with optional scope restrictions and TTL.
-
-## Decision
-
-Add `[[auth.api_keys]]` section to `DynamicConfig`:
-
-```toml
-[[auth.api_keys]]
-prefix = "alk_"
-hash = "sha256:abc..."
-scopes = ["relay:connect", "secrets:derive"]
-description = "dashboard service account"
-ttl = "30d"  # optional
-```
-
-`ConfigIdentityProvider::resolve_from_token()` handles both token types:
- If the input starts with the configured prefix (default `alk_`), treat it as
-  an API key: hash it with SHA-256 and look up the hash in the `api_keys` table.
- Otherwise, treat it as an `AuthToken`: decode, verify Ed25519 signature,
-  check timestamp, resolve from `authorized_keys`.
-
-Both paths produce the same `Identity` result. In database-backed deployments,
-both resolve to the same account UUID.
-
-API keys are stored as SHA-256 hashes (like password hashing — the cleartext
-key is never stored, only its hash). The prefix enables O(1) routing between
-AuthToken and API key verification without trying both paths.
-
-The full key is provided to the client exactly once (at creation time). Subsequent
-verifications only compare hashes.
-
-## Consequences
-
-**Positive**: Simple bearer token auth for HTTP and other non-SSH interfaces.
-No cryptographic key management for service accounts. Consistent with industry
-practice (Stripe, GitHub, AWS all use prefixed API keys).
-
-**Positive**: Both AuthTokens and API keys go through `resolve_from_token()`.
-The caller doesn't need to know which type they're using. This keeps the
-authentication layer unified.
-
-**Positive**: Scoped API keys enable fine-grained access control for service
-accounts. A monitoring tool gets `["monitoring:read"]`, not full access.
-
-**Negative**: API keys are bearer tokens — anyone who obtains the key has the
-associated permissions. The hash storage and optional TTL mitigate but do not
-eliminate this risk. Ed25519 AuthTokens remain the preferred auth method for
-interactive clients.
-
-**Negative**: API key rotation requires updating `DynamicConfig` (or the
-`api_keys` database table). The `ConfigReloadHandle` / `ConfigService` reload
-mechanism handles this, but it's a deliberate operation, not automatic.
-
-**Negative**: No rate limiting on API key verification is built into this ADR.
-Rate limiting on the HTTP interface is a separate concern.
-
-## References
-
- ADR-023 (unified auth, shared key material)
- ADR-029 (Identity as core type)
- ADR-030 (static/dynamic config split)
- [auth.md](../auth.md) — Token auth, AuthPolicy, API keys
- [configuration.md](../configuration.md) — DynamicConfig, AuthPolicy
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — API keys in config
--- a/docs/architecture/decisions/038-seed-lifecycle-memory-security.md
+++ b/docs/architecture/decisions/038-seed-lifecycle-memory-security.md
@@ -1,137 +0,0 @@
-# ADR-038: Seed Lifecycle and Memory Security
-
-## Status
-
-Accepted
-
-## Context
-
-The alknet-secret crate holds the master BIP39 seed phrase in RAM. This seed is
-the root of trust for all derived keys (identity, encryption, signing). If the
-seed is leaked — through memory dumps, swap files, or core dumps — an attacker
-can derive every key in the system.
-
-Security-conscious key management systems typically employ three defenses:
-
-1. **Zeroize**: Overwrite sensitive memory before deallocating. Prevents
-   stale-data reads from freed memory.
-
-2. **Memory locking** (`mlock`/`VirtualLock`): Prevent the OS from paging
-   sensitive RAM to disk. Prevents swap-file leakage.
-
-3. **Constant-time comparison**: Prevent timing side-channels when comparing
-   keys or tokens.
-
-The question is: which of these should alknet-secret adopt in v1, and which
-should be deferred?
-
-## Decision
-
-**Phase 3 (v1): Zeroize only. Defer mlock and constant-time comparison to
-Phase B.**
-
- All sensitive types (seed bytes, derived private keys, passphrase strings)
-  derive `Zeroize` and implement `Drop` to call `zeroize()` before deallocation.
- The `Lock` operation calls `zeroize()` on the seed and all cached derived
-  keys, then drops them.
- `mlock`/`VirtualLock` and constant-time comparison are not included in v1.
-
-### Rationale for deferring mlock
-
-1. **Complexity**: On Linux, `mlock` beyond the `RLIMIT_MEMLOCK` allowance
-   needs root/`CAP_IPC_LOCK`; on Windows, `VirtualLock` is capped by the
-   working-set size. The crate should work in unprivileged contexts
-   (development, testing, single-user nodes) without system configuration changes.
-
-2. **Performance**: `mlock` locks physical pages, which are typically 4KB.
-   Locking many small buffers wastes physical memory. The seed (64 bytes) and
-   derived keys (32–64 bytes each) are tiny — the real risk is swap-file
-   leakage, which `zeroize` partially mitigates by wiping before free.
-
-3. **Deployment flexibility**: Production head nodes running as root or with
-   `CAP_IPC_LOCK` can add `mlock` in Phase B. Development and CLI nodes
-   shouldn't need it.
-
-4. **Audit surface**: `mlock` introduces platform-specific code paths (Linux
-   vs macOS vs Windows) that should be audited together, not bolted on
-   incrementally.
-
-### Rationale for deferring constant-time comparison
-
-The `SecretProtocol` service receives requests over irpc (local mpsc or remote
-QUIC). Comparison timing is not observable by callers — they send a message and
-wait for a response. The comparison that matters (auth token verification) is
-in alknet-core's `IdentityProvider`, not in alknet-secret. Key derivation
-results (DerivedKey) are not compared against attacker-controlled input within
-this crate.
-
-### Zeroize implementation
-
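-```rust
-use std::collections::HashMap;
-use zeroize::Zeroize;
-
-#[derive(Zeroize)]
-#[zeroize(drop)]
-struct SeedHolder { seed: Vec<u8> }
-
-// HashMap has no Zeroize impl, so wipe each cached key by hand on drop.
-struct DerivedKeyCache { keys: HashMap<String, Vec<u8>> }
-
-impl Drop for DerivedKeyCache {
-    fn drop(&mut self) { self.keys.values_mut().for_each(|k| k.zeroize()); }
-}
-```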
-
-`#[zeroize(drop)]` generates a `Drop` impl that calls `zeroize()` on every
-field before deallocation. A field whose type lacks a `Zeroize` impl (such as
-`HashMap`) fails to compile, so it needs a hand-written `Drop` instead.
-
-### Lock lifecycle
-
-```
-Unlock(passphrase)
-  → validate mnemonic (if restoring) or generate new
-  → derive master key from seed
-  → store seed in SeedHolder (Zeroize-protected)
-  → cache empty (keys derived on demand)
-
-DeriveEd25519/DeriveEncryptionKey/Encrypt/Decrypt
-  → require unlocked state (error if locked)
-  → derive key, return result
-  → optionally cache derived key
-
-Lock
-  → zeroize all cached derived keys
-  → zeroize seed
-  → drop all sensitive material
-  → service returns to locked state
-```
-
-## Consequences
-
- **Positive**: Zeroize has negligible runtime cost, is a minimal dependency
-  (the `zeroize` crate is small and has no required dependencies), and provides
-  meaningful protection against stale-memory reads.
- **Positive**: Lock effectively purges all sensitive material. After Lock,
-  the process memory contains no useful secret data.
- **Positive**: No platform-specific code paths in v1. The crate compiles and
-  runs everywhere without privilege requirements.
- **Negative**: Without `mlock`, the OS can page the seed to swap before
-  zeroization occurs. This is a window of vulnerability that Phase B closes.
-  The risk is acceptable for v1 because swap-file extraction requires root
-  access or physical access to the machine — the same threat model as reading
-  process memory directly.
- **Negative**: Without constant-time comparison, timing side-channels exist
-  in theory. In practice, no comparison in alknet-secret operates on
-  attacker-controlled input, so the risk is nil within this crate.
- **Negative**: `zeroize` adds a dependency. The `zeroize` crate is widely
-  used in Rust crypto (ring, ed25519-dalek, x25519-dalek) and is a de facto
-  standard.
-
-## References
-
- [secret-service.md](../secret-service.md) — Security model, Lock/Unlock lifecycle
- [ADR-027](027-crate-decomposition.md) — Crate decomposition (alknet-secret is independent)
- [credentials.md](../credentials.md) — SecretStoreCredentialProvider integration
- `zeroize` crate — https://crates.io/crates/zeroize