greenfield: clean slate for ALPN-as-service pivot
Delete old source crates (alknet-core, alknet, alknet-napi), old architecture docs (ADRs, specs, open questions), old research docs (phase2, event-sourcing, feasibility, etc.), old tasks, and obsolete reference material (gitserver/MPL, honker, nats, rustfs, polyglot, keystone, distributed-identity).

Keep: alknet-secret (standalone, compiles), pivot docs, iroh and ssh references, rudolfs reference (MIT/Apache, fork candidate), ops docs, sdd_process.md, and licenses.

Previous implementation preserved at /workspace/@alkdev/alknet-main/ for reference during porting.

Workspace compiles: cargo check + 14 tests pass for alknet-secret.
@@ -1,26 +0,0 @@
# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait

## Status
Accepted

## Context
Alknet needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.

russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in alknet's own type system or handle each transport case-by-case.

## Decision
Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.

On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.

This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
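
The shape of this abstraction can be sketched with the standard library alone. The block below is a synchronous analogue, not the real design: it assumes `std::io::Read + Write` in place of tokio's `AsyncRead + AsyncWrite`, and the names `Stream`, `MockTransport`, and `read_banner` are hypothetical. It also shows the in-memory mock transport idea mentioned under testing.

```rust
use std::io::{self, Cursor, Read, Write};

/// Stand-in for the `AsyncRead + AsyncWrite + Unpin + Send` bound.
/// Assumption: synchronous std traits are used so the sketch has no dependencies.
pub trait Stream: Read + Write + Send {}
impl<T: Read + Write + Send> Stream for T {}

/// A transport only knows how to produce a duplex byte stream.
pub trait Transport {
    fn connect(&self) -> io::Result<Box<dyn Stream>>;
}

/// Hypothetical mock transport: yields an in-memory stream preloaded with `incoming`.
pub struct MockTransport {
    pub incoming: Vec<u8>,
}

impl Transport for MockTransport {
    fn connect(&self) -> io::Result<Box<dyn Stream>> {
        Ok(Box::new(Cursor::new(self.incoming.clone())))
    }
}

/// Protocol-side code: it sees only `dyn Transport`, never TCP, TLS, or iroh.
pub fn read_banner(transport: &dyn Transport) -> io::Result<String> {
    let mut stream = transport.connect()?;
    let mut banner = String::new();
    stream.read_to_string(&mut banner)?;
    Ok(banner)
}
```

Because `read_banner` takes `&dyn Transport`, a transport chosen at runtime (for example from a CLI flag) can be passed in without changing the protocol code.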

## Consequences
- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
- **Positive**: Testing is simplified: mock transports can produce in-memory streams.
- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch: `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`.

## References
- [transport.md](../transport.md)
- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
@@ -1,30 +0,0 @@
# ADR-002: TUN Shim as Separate Process

## Status
Superseded by ADR-014

## Context
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core alknet binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.

The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.

## Decision
The TUN functionality is a separate `alknet-tun` binary that:
1. Creates a TUN device (requires root / CAP_NET_ADMIN)
2. Reads IP packets from it
3. Forwards each connection to the core alknet's SOCKS5 port (127.0.0.1:1080)
4. Proxies bytes between TUN packets and SOCKS5 connections

The core `alknet connect` binary never needs root. The `alknet-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.

## Consequences
- **Positive**: Root-required code surface is tiny and auditable.
- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
- **Positive**: Matches the proven tun2proxy architecture.
- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.

## References
- [tun-shim.md](../tun-shim.md)
- [tun2proxy](https://github.com/tun2proxy/tun2proxy): proven architecture for TUN → SOCKS5 proxy
@@ -1,31 +0,0 @@
# ADR-003: iroh Stream via tokio::io::join

## Status
Accepted

## Context
iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.

Options considered:
1. `tokio::io::join(recv, send)`: Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
2. Custom `IrohStream` wrapper: A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
3. Using iroh's `Connection` directly: Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.

## Decision
Use `tokio::io::join(recv_stream, send_stream)` (Option 1).

One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.

If profiling later shows overhead (unlikely, since it is just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.

Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
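
To make the delegation concrete, here is a synchronous analogue of what a join combinator does, written against `std::io::Read` and `std::io::Write` rather than tokio's async traits. It is an illustration only: `Joined` and this `join` function are hypothetical names, not tokio's implementation, and the real code would simply call `tokio::io::join`.

```rust
use std::io::{self, Read, Write};

/// Combines a read half and a write half into one duplex value.
/// Hypothetical sync stand-in for tokio's `Join<A, B>`.
pub struct Joined<R, W> {
    reader: R,
    writer: W,
}

pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Joined<R, W> {
    Joined { reader, writer }
}

impl<R, W> Joined<R, W> {
    /// Splits the value back into its two halves.
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

/// Reads are served by the read half only.
impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

/// Writes and flushes are served by the write half only.
impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

The two halves never touch each other's state, which is why the combined type behaves like a single duplex stream while keeping iroh's split model intact.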

## Consequences
- **Positive**: Minimal code. One line to bridge iroh and russh.
- **Positive**: No custom types to maintain.
- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior: `Poll::Pending` on one half doesn't affect the other.
- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.

## References
- [transport.md](../transport.md)
- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
@@ -1,28 +0,0 @@
# ADR-004: SSH Runs Over Transport, Not Alongside

## Status
Accepted

## Context
There are two ways to structure the relationship between SSH and the transport layer:

1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.

2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.

## Decision
SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.

This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.

## Consequences
- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
- **Positive**: Testing is straightforward: mock transports produce in-memory streams.
- **Positive**: Security audit is clean: the SSH implementation has no network-facing code.
- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through alknet) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport"; you get a new session.
- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)

## References
- [transport.md](../transport.md)
- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
@@ -1,39 +0,0 @@
# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on

## Status
Accepted

## Context
A "VPN-like" tool needs to route traffic. There are three approaches:

1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.

## Decision
SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).

SOCKS5 is the core because:
- It requires no privileges
- `curl --socks5-hostname` works everywhere
- Browsers, most CLI tools, and many applications support SOCKS5
- SOCKS5h prevents DNS leaks by resolving names server-side
- It's the interface that the NAPI wrapper and pubsub adapter build on
- TUN is only needed for "route all traffic" use cases, which are a subset of users

TUN forwards to SOCKS5 rather than directly to SSH because:
- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
- The `alknet-tun` binary stays minimal (~200-500 lines)
- No root code in the core binary
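
The DNS-leak point is easiest to see in the bytes on the wire. The sketch below builds a SOCKS5 CONNECT request as laid out in RFC 1928, using the domain-name address type so that the hostname travels to the proxy unresolved. The function name `socks5_connect_request` is hypothetical, and the method-negotiation greeting that precedes this request is omitted.

```rust
/// Builds a SOCKS5 CONNECT request (RFC 1928, section 4) that carries the
/// hostname itself, so name resolution happens on the proxy side.
/// Returns `None` if the hostname is empty or longer than 255 bytes.
pub fn socks5_connect_request(host: &str, port: u16) -> Option<Vec<u8>> {
    let host_bytes = host.as_bytes();
    if host_bytes.is_empty() || host_bytes.len() > 255 {
        return None;
    }

    let mut request = Vec::with_capacity(7 + host_bytes.len());
    request.push(0x05); // VER: SOCKS version 5
    request.push(0x01); // CMD: CONNECT
    request.push(0x00); // RSV: reserved, must be zero
    request.push(0x03); // ATYP: domain name (not a resolved IP)
    request.push(host_bytes.len() as u8); // one-byte length prefix
    request.extend_from_slice(host_bytes);
    request.extend_from_slice(&port.to_be_bytes()); // port in network byte order
    Some(request)
}
```

A client that resolved the name locally would send address type `0x01` (IPv4) or `0x04` (IPv6) instead, which is the case `curl --socks5-hostname` avoids.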

## Consequences
- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
- **Positive**: SOCKS5 is testable without TUN: just `curl` against it.
- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `alknet connect`: two processes instead of one integrated binary.
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.

## References
- [client.md](../client.md)
- [tun-shim.md](../tun-shim.md)
@@ -1,38 +0,0 @@
# ADR-006: No Logging of Tunnel Destinations

## Status
Accepted

## Context
An SSH tunnel server sees every destination that clients connect to: hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:

- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)

However, the server does need to log some information for operational purposes, particularly for fail2ban integration to detect and block abusive connections.

## Decision
The server does NOT log:
- `channel_open_direct_tcpip` destinations (host, port)
- DNS resolutions performed by the server on behalf of clients
- Bytes transferred through tunnel channels
- Connection duration or throughput

The server DOES log (ADR-013):
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
- Connection opened (remote_addr, transport kind)
- Connection closed (remote_addr, duration)

This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
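
One way to enforce this split in code is to route every loggable event through a single type whose formatter has no output path for tunnel data. The sketch below assumes a hypothetical `ServerEvent` enum and a `log_line` method; the field layout follows the lists above and the format in ADR-013, but the names are illustrative only.

```rust
use std::net::IpAddr;

/// Hypothetical event type: everything the server might want to log.
pub enum ServerEvent<'a> {
    AuthAttempt {
        remote_addr: IpAddr,
        user: &'a str,
        key_fingerprint: &'a str,
        accepted: bool,
    },
    ConnectionOpened { remote_addr: IpAddr, transport: &'a str },
    ConnectionClosed { remote_addr: IpAddr, duration_secs: u64 },
    /// A tunnel channel was opened; its destination must never be logged.
    TunnelOpened { host: &'a str, port: u16 },
}

impl ServerEvent<'_> {
    /// Returns the log line for this event, or `None` for events that
    /// carry tunnel destinations.
    pub fn log_line(&self) -> Option<String> {
        match self {
            ServerEvent::AuthAttempt { remote_addr, user, key_fingerprint, accepted } => {
                let result = if *accepted { "accept" } else { "reject" };
                Some(format!(
                    "level=INFO, msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
                    remote_addr, user, key_fingerprint, result
                ))
            }
            ServerEvent::ConnectionOpened { remote_addr, transport } => Some(format!(
                "level=INFO, msg=\"connection opened\", remote_addr={}, transport={}",
                remote_addr, transport
            )),
            ServerEvent::ConnectionClosed { remote_addr, duration_secs } => Some(format!(
                "level=INFO, msg=\"connection closed\", remote_addr={}, duration={}",
                remote_addr, duration_secs
            )),
            // Destination privacy: there is deliberately no formatter here.
            ServerEvent::TunnelOpened { .. } => None,
        }
    }
}
```

Keeping the `None` arm explicit means a future contributor has to change this match, in review, before a destination can reach any log.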

## Consequences
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
- **Positive**: Reduces legal and privacy exposure for server operators.
- **Positive**: fail2ban can still work: it needs source IPs and auth failures, not destinations.
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside alknet (e.g., network-level logging at the target host).
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).

## References
- [server.md](../server.md)
- [ADR-013](013-fail2ban-friendly-logging.md): what the server does log
@@ -1,26 +0,0 @@
# ADR-007: NAPI Exposes Single Duplex Stream

## Status
Accepted

## Context
The NAPI wrapper for alknet could expose different granularity levels:

1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.

## Decision
Option 2: NAPI exposes a single duplex stream.

The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.

If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.

## Consequences
- **Positive**: Minimal NAPI surface: one function, one return type. Small binary, small FFI boundary.
- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).

## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
@@ -1,38 +0,0 @@
# ADR-008: ACME/Let's Encrypt Certificate Provisioning

## Status
Accepted

## Context
TLS transport mode requires certificates. Manual certificate management is error-prone: users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.

There are two ACME flows:
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.

Both flows are important for alknet's usability. Without ACME, TLS mode requires manual cert setup, a significant barrier for users who want "SSH over port 443" for censorship resistance.

## Decision
Support both ACME certificate provisioning paths:

1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.

2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.

3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.

The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps alknet self-contained as a single binary.

## Consequences
- **Positive**: Users can run `alknet serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).

## References
- [server.md](../server.md)
- [OQ-01](../open-questions.md): resolved by this ADR
- [OQ-07](../open-questions.md): resolved by this ADR
- Production certbot setup: [certbot.md](../../research/ops/certbot.md)
@@ -1,28 +0,0 @@
# ADR-009: Default iroh Relay with Override

## Status
Accepted

## Context
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:

- n0's relay could change terms, rate-limit, or go down
- Production deployments may want self-hosted relays for reliability and privacy
- The relay URL is a configuration point that should be explicit

Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.

## Decision
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.

This matches iroh's own defaults: n0's relay is the standard starting point. Users who need production reliability self-host.

## Consequences
- **Positive**: Zero-config iroh transport for testing and development. `alknet serve --transport iroh` just works.
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
- **Negative**: Privacy-conscious users must remember to `--iroh-relay` to avoid n0. Mitigated by documentation.

## References
- [transport.md](../transport.md)
- [OQ-02](../open-questions.md): resolved by this ADR
@@ -1,33 +0,0 @@
# ADR-010: Transport Chaining in CLI

## Status
Accepted

## Context
Transport chaining allows combining iroh with an upstream proxy, e.g.:

```bash
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
```

This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another alknet instance. This is important for:
- Nested tunnel topologies
- Environments where iroh needs to go through an existing proxy
- Composing transports in flexible ways

iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward: pass the proxy URL to iroh's builder.

## Decision
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.

For other transport combinations (TCP+TLS is already implicit, since TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
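
The "parse the proxy URL" step could look roughly like the following. This is a deliberately narrow sketch using only the standard library: it assumes the `socks5://` scheme and an IP-literal `host:port` authority, and the names `parse_socks5_proxy` and `ProxyError` are hypothetical. Hostname authorities, credentials, and the hand-off to iroh's builder are out of scope here.

```rust
use std::net::SocketAddr;

/// Hypothetical error type for `--proxy` parsing.
#[derive(Debug, PartialEq)]
pub enum ProxyError {
    /// The URL did not start with `socks5://`.
    UnsupportedScheme,
    /// The part after the scheme was not a valid `ip:port` pair.
    InvalidAddress,
}

/// Parses a `--proxy` value such as `socks5://127.0.0.1:1080`.
/// Assumption: only IP literals are accepted; hostnames are rejected.
pub fn parse_socks5_proxy(url: &str) -> Result<SocketAddr, ProxyError> {
    let rest = url
        .strip_prefix("socks5://")
        .ok_or(ProxyError::UnsupportedScheme)?;
    // Tolerate a trailing slash, as in `socks5://127.0.0.1:1080/`.
    let authority = rest.trim_end_matches('/');
    authority
        .parse::<SocketAddr>()
        .map_err(|_| ProxyError::InvalidAddress)
}
```

Validating the flag up front lets the CLI reject a malformed proxy before any endpoint is built.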

## Consequences
- **Positive**: Flexible transport composition without requiring separate manual configuration.
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
- **Positive**: Implementation is minimal: iroh already supports proxy config.
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).

## References
- [transport.md](../transport.md)
- [OQ-05](../open-questions.md): resolved by this ADR
@@ -1,38 +0,0 @@
# ADR-011: Programmatic-First API, No File-Based Config

## Status
Accepted

## Context
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:

1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
2. **Custom config file**: Alknet-specific config file (TOML/YAML) with host definitions.
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
4. **Hybrid**: `--config` flag pointing to an alknet-specific config file, but no OpenSSH config parsing.

## Decision
Option 3: Programmatic-first API. Configuration is provided via:
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
- **Library API**: `alknet_core::client::ConnectOptions` and `alknet_core::server::ServeOptions` structs, constructable programmatically
- **Environment variables**: for a few convenience defaults (e.g., `ALKNET_SERVER`, `ALKNET_IDENTITY`)

No `~/.ssh/config` parsing, no alknet-specific config files. This approach:
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
- Keeps the CLI simple and explicit, with no hidden behavior from config files
- Matches the design principle that the library crate (`alknet-core`) is the primary interface

If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
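
A rough sketch of how flags and environment defaults could feed a plain options struct is shown below. The field names, the `resolve` constructor, and the `"tcp"` default are assumptions made for illustration; only the struct name `ConnectOptions` and the variables `ALKNET_SERVER` and `ALKNET_IDENTITY` come from the text above. The environment is passed in as a lookup closure so the logic stays free of global state.

```rust
/// Illustrative options struct; the real `ConnectOptions` fields are not
/// specified in this ADR, so these are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Option<String>,
    pub transport: String,
}

impl ConnectOptions {
    /// Resolves options with CLI values taking precedence over environment
    /// defaults. `env` is a lookup such as `|k| std::env::var(k).ok()`.
    pub fn resolve(
        cli_server: Option<String>,
        cli_identity: Option<String>,
        cli_transport: Option<String>,
        env: impl Fn(&str) -> Option<String>,
    ) -> Result<Self, String> {
        let server = cli_server
            .or_else(|| env("ALKNET_SERVER"))
            .ok_or_else(|| "missing --server or ALKNET_SERVER".to_string())?;
        let identity = cli_identity.or_else(|| env("ALKNET_IDENTITY"));
        // Assumption: plain TCP is the default transport when none is given.
        let transport = cli_transport.unwrap_or_else(|| "tcp".to_string());
        Ok(ConnectOptions { server, identity, transport })
    }
}
```

A config-file layer, if ever added, would sit outside this function and simply supply the same `Option` values.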

## Consequences
- **Positive**: Clean library API: `ConnectOptions` and `ServeOptions` are plain Rust structs.
- **Positive**: No cross-platform path issues in the core library.
- **Positive**: Explicit CLI: no hidden settings from a config file the user forgot about.
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
- **Negative**: No config file convenience. Users coming from `ssh config` may find this inconvenient.

## References
- [client.md](../client.md)
- [OQ-06](../open-questions.md): resolved by this ADR
@@ -1,42 +0,0 @@
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth

## Status
Accepted

## Context
SSH authentication has several options:
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups: add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.

The question is which auth methods to support and prioritize.

## Decision

**Primary: Ed25519 public key** (already specified, no change).

**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.

**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
- It's less secure than key-based auth
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
- It's not needed when cert-authority provides easy multi-user management
- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern

The server's `authorized_keys` file format follows OpenSSH conventions:
- Regular keys: `ssh-ed25519 AAAA... user@host`
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
- CA trusts with additional options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
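
To show how the three line shapes above differ structurally, here is a simplified parser sketch. It is not a full OpenSSH `authorized_keys` parser: it assumes key types begin with `ssh-`, that the options field contains no quoted values with spaces, and the names `AuthorizedEntry` and `parse_authorized_key` are hypothetical.

```rust
/// One parsed `authorized_keys` line (simplified, hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct AuthorizedEntry {
    pub cert_authority: bool,
    pub options: Vec<String>,
    pub key_type: String,
    pub key_data: String,
    pub comment: String,
}

/// Parses a single line; returns `None` for blank lines, comments, and
/// lines that lack a key type or key data.
pub fn parse_authorized_key(line: &str) -> Option<AuthorizedEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }

    let mut fields = line.split_whitespace();
    let first = fields.next()?;

    // If the first field is not a key type, treat it as a comma-separated
    // options list and read the key type from the next field.
    let (options, key_type) = if first.starts_with("ssh-") {
        (Vec::new(), first.to_string())
    } else {
        let opts: Vec<String> = first.split(',').map(str::to_string).collect();
        (opts, fields.next()?.to_string())
    };

    let key_data = fields.next()?.to_string();
    // Everything after the key data is the free-form comment.
    let comment = fields.collect::<Vec<_>>().join(" ");
    let cert_authority = options.iter().any(|o| o == "cert-authority");

    Some(AuthorizedEntry { cert_authority, options, key_type, key_data, comment })
}
```

The `cert_authority` flag is what would switch the server from comparing the presented key directly to validating a certificate signed by that key.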

## Consequences
- **Positive**: Multi-user deployments are manageable: one CA entry instead of N key entries.
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
- **Positive**: `russh` supports OpenSSH certificate verification natively.
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).

## References
- [client.md](../client.md)
- [server.md](../server.md)
- [OQ-04](../open-questions.md): resolved by this ADR
@@ -1,39 +0,0 @@
# ADR-013: Fail2ban-Friendly Server Logging

## Status
Accepted

## Context
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.

However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.

## Decision
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.

**Logging** (for fail2ban integration on Linux):
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
- Do NOT log: channel open targets, DNS resolutions, bytes transferred

This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.

**Built-in rate limiting** (for all platforms):
- `--max-connections-per-ip <n>` (default: 0 = unlimited): reject new connections from an IP that already has N active connections
- `--max-auth-attempts <n>` (default: 10): disconnect after N failed auth attempts from one connection
- Rate limiting happens at the SSH layer, before channels are opened

This ensures that even without fail2ban, the server rejects obviously abusive connections.
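
The per-IP connection cap can be sketched as a small counter map. The type `ConnectionLimiter` and its methods are hypothetical; the only behavior taken from the text is that a limit of `0` means unlimited and that an IP with N active connections is refused a further one. Thread safety (for example wrapping this in a mutex) is left out.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP (hypothetical sketch).
pub struct ConnectionLimiter {
    /// Maximum active connections per IP; 0 means unlimited.
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        ConnectionLimiter { max_per_ip, active: HashMap::new() }
    }

    /// Returns `true` and records the connection if the IP is under its
    /// limit; returns `false` without recording it otherwise.
    pub fn try_accept(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a previously accepted connection closes.
    pub fn release(&mut self, ip: IpAddr) {
        if let Entry::Occupied(mut entry) = self.active.entry(ip) {
            *entry.get_mut() -= 1;
            // Drop the entry so idle IPs do not accumulate in the map.
            if *entry.get() == 0 {
                entry.remove();
            }
        }
    }
}
```

The check would run when a transport connection is accepted, before the SSH handshake, so a rejected peer costs the server almost nothing.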

## Consequences
- **Positive**: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
- **Negative**: Slightly more code in the server for connection tracking per IP.
- **Negative**: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).

## References
- [server.md](../server.md)
- [OQ-08](../open-questions.md): resolved by this ADR
- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)
@@ -1,41 +0,0 @@
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy

## Status
Accepted

## Context
The original plan included a TUN shim (`alknet-tun`) as Phase 3: a separate root-requiring process that creates a TUN device and forwards IP packets through alknet's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.

However, TUN implementation has significant complexities:
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
- Virtual DNS handling
- Root/CAP_NET_ADMIN requirements
- TUN is easy to get wrong and hard to debug

The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.

## Decision
Defer TUN implementation entirely. Remove `alknet-tun` from the architecture. Instead:

1. **Core interface**: alknet's local SOCKS5 proxy (always available, no root required)
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `alknet connect`
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases

This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `alknet-tun` can be added as a thin wrapper around tun2proxy's pattern.

The `tun` feature flag and `alknet-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.

## Consequences
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
- **Positive**: tun2proxy is already well-tested for this exact use case.
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
- **Positive**: Cleaner architecture: alknet only does SSH tunneling + SOCKS5. tun2proxy does TUN.
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.

## References
- [tun-shim.md](../tun-shim.md): this spec is now deprecated
- [ADR-002](002-tun-separate-process.md): superseded; TUN is no longer in scope
|
||||
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
|
||||
@@ -1,27 +0,0 @@
|
||||
# ADR-015: napi-rs for FFI Bridge
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
|
||||
|
||||
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
|
||||
|
||||
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
|
||||
|
||||
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
|
||||
|
||||
## Decision
|
||||
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
|
||||
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
|
||||
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
|
||||
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
|
||||
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [OQ-11](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,40 +0,0 @@
|
||||
# ADR-016: NAPI Exposes Both connect() and serve()
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to alknet's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
|
||||
|
||||
1. **connect()**: Establish a client connection to an alknet server. Used by workers/spokes that need to tunnel events through an alknet server.
|
||||
2. **serve()**: Start an alknet server from Node.js. Used by hubs that want to accept alknet connections and route events.
|
||||
|
||||
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `alknet serve` process.
|
||||
|
||||
More importantly, both `connect()` and `serve()` are fundamental operations of the alknet library. Since the NAPI wrapper is a thin layer over `alknet-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
|
||||
|
||||
## Decision
|
||||
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
|
||||
|
||||
```typescript
// @alkdev/alknet
function connect(options: AlknetConnectOptions): Promise<Duplex>;
function serve(options: AlknetServeOptions): Promise<AlknetServer>;
```
|
||||
|
||||
- `connect()` returns a `Duplex` stream (as per ADR-007)
|
||||
- `serve()` returns an `AlknetServer` object with a `close()` method and events for new connections
|
||||
|
||||
The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
|
||||
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
|
||||
- **Positive**: Implementation is still minimal — `serve()` is just `alknet_core::server::run()` behind `#[napi]`.
|
||||
- **Negative**: Slightly larger API surface (two functions + `AlknetServer` type instead of just `connect()`).
|
||||
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `AlknetServer`.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
|
||||
- [OQ-10](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,30 +0,0 @@
|
||||
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
When running an alknet server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This matters for censorship circumvention: if SSH traffic on port 443 is detectable, it can be blocked.
|
||||
|
||||
After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
|
||||
|
||||
## Decision
|
||||
When `--stealth` is enabled with TLS transport:
|
||||
|
||||
1. After completing the TLS handshake, peek at the first few bytes of the connection
|
||||
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
|
||||
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
|
||||
|
||||
This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
|
||||
|
||||
The fake response uses `Server: nginx` headers to match the most common web server profile.
|
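The peek-and-branch step can be sketched as a pure function over the bytes read so far. This is an illustrative sketch only: `Sniffed`, `sniff`, and `DECOY_RESPONSE` are hypothetical names, and the `NeedMore` case is an assumption about how a short first read might be handled (the ADR only says "peek at the first few bytes").

```rust
/// SSH-2.0 identification strings begin with this prefix.
const SSH_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy response quoted in the decision above.
pub const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

#[derive(Debug, PartialEq)]
pub enum Sniffed {
    /// Hand the stream to the SSH server.
    Ssh,
    /// Write DECOY_RESPONSE and close.
    NotSsh,
    /// Fewer bytes than the prefix so far, all of them matching; read more.
    NeedMore,
}

/// Classifies the bytes peeked from a freshly accepted TLS stream.
pub fn sniff(peeked: &[u8]) -> Sniffed {
    if peeked.len() >= SSH_PREFIX.len() {
        if peeked.starts_with(SSH_PREFIX) {
            Sniffed::Ssh
        } else {
            Sniffed::NotSsh
        }
    } else if SSH_PREFIX.starts_with(peeked) {
        Sniffed::NeedMore
    } else {
        Sniffed::NotSsh
    }
}
```

Keeping the classifier free of I/O makes it easy to test; the caller decides how long to wait for more bytes before treating the connection as non-SSH.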
||||
|
||||
## Consequences
|
||||
- **Positive**: TLS+alknet servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
|
||||
- **Positive**: Simple implementation — just peek at the first bytes and branch.
|
||||
- **Positive**: Consistent with censorship circumvention best practices.
|
||||
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
|
||||
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-018: Control Channel for PubSub over SSH
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper and pubsub integration need a way to use alknet's SSH channel as a data plane for event routing. When an `alknet connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
|
||||
|
||||
For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
|
||||
|
||||
1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `alknet-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
|
||||
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
|
||||
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
|
||||
|
||||
## Decision
|
||||
Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `alknet-control:0`:
|
||||
|
||||
1. The `channel_open_direct_tcpip` handler detects the special target via string matching
|
||||
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
|
||||
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
|
||||
|
||||
The destination string `alknet-control` is reserved. Regular TCP targets are real hostnames or IP addresses, so a collision is unlikely in practice.
|
||||
|
||||
Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
|
||||
|
||||
Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
|
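The string-matching dispatch can be sketched as follows. This is a hedged illustration: `ChannelRoute` and `route_direct_tcpip` are hypothetical names, and rejecting every other `alknet-` host is an assumption based on the reserved-prefix mitigation listed under Consequences. The port is a `u32` because SSH encodes it as a uint32.

```rust
/// Reserved destination host from this ADR.
pub const CONTROL_HOST: &str = "alknet-control";
/// Reserved namespace prefix (assumed from the mitigation under Consequences).
pub const RESERVED_PREFIX: &str = "alknet-";

#[derive(Debug, PartialEq)]
pub enum ChannelRoute {
    /// Bridge the channel to the local pubsub event bus.
    ControlBus,
    /// Reserved name that is not a known internal target.
    RejectReserved,
    /// Ordinary direct_tcpip target.
    Tcp { host: String, port: u32 },
}

/// Decides what a direct_tcpip open request for `host:port` should do.
pub fn route_direct_tcpip(host: &str, port: u32) -> ChannelRoute {
    if host == CONTROL_HOST && port == 0 {
        ChannelRoute::ControlBus
    } else if host.starts_with(RESERVED_PREFIX) {
        ChannelRoute::RejectReserved
    } else {
        ChannelRoute::Tcp { host: host.to_string(), port }
    }
}
```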
||||
|
||||
## Consequences
|
||||
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
|
||||
- **Positive**: No separate port or service needs to run on the server. The control channel is built into alknet.
|
||||
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
|
||||
- **Positive**: Port forwarding to a specific port is still available as an alternative.
|
||||
- **Negative**: The string `alknet-control` is a magic constant. It should be defined as a constant in the crate.
|
||||
- **Negative**: Regular TCP destinations accidentally matching `alknet-control` would be misrouted. Mitigated by reserving the entire `alknet-` prefix namespace.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [server.md](../server.md)
|
||||
@@ -1,42 +0,0 @@
|
||||
# ADR-019: `--proxy` Has Different Semantics on Client vs Server
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The `--proxy` CLI flag appears on both `alknet connect` (client) and `alknet serve` (server), but the two sides proxy fundamentally different things:
|
||||
|
||||
- **Client**: `--proxy` routes the *transport connection* through the proxy. For example, `alknet connect --transport iroh --proxy socks5://127.0.0.1:1080` means the iroh endpoint's outbound TCP connections go through the specified SOCKS5 proxy before reaching the iroh relay. The proxy wraps the transport layer.
|
||||
|
||||
- **Server**: `--proxy` routes *outbound target connections* through the proxy. For example, `alknet serve --proxy socks5://127.0.0.1:9050` means when an SSH client opens a `direct_tcpip` channel to `db.internal:5432`, the server connects to that target through the specified proxy. The proxy wraps the data-plane connections.
|
||||
|
||||
Using the same flag name for both is intentional — from the user's perspective, both mean "route traffic through a proxy." But the layer at which the proxy operates differs, and this needs to be explicit so implementers don't confuse the two.
|
||||
|
||||
ADR-010 addressed transport chaining for the client side only; the server-side outbound proxy behavior had no ADR. This ADR documents both semantics and the rationale for sharing the flag name.
|
||||
|
||||
## Decision
|
||||
The `--proxy` flag uses the same name on client and server, with documented different semantics:
|
||||
|
||||
| Side | Flag | What gets proxied | Example |
|------|------|-------------------|---------|
| Client | `--proxy` | Transport connection (outbound to server/relay) | `--transport iroh --proxy socks5://...` → iroh endpoint connects through proxy |
| Server | `--proxy` | Outbound target connections (data plane) | `--proxy socks5://...` → direct_tcpip targets reached through proxy |
|
||||
|
||||
On the **client**, `--proxy` affects the transport layer. It only applies to transports that make outbound TCP connections (iroh through a proxy, TLS through a proxy). For plain TCP transport, `--proxy` has no meaningful effect since the transport is already a direct TCP connection — use the SOCKS5 server instead.
|
||||
|
||||
On the **server**, `--proxy` affects the data plane. All `channel_open_direct_tcpip` outbound connections are routed through the proxy, regardless of transport mode.
|
||||
|
||||
This is not a naming collision — it's the same conceptual operation ("route through a proxy") at different layers. The shared name avoids forcing users to learn two proxy flags.
|
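The per-side semantics can be summarized as a small decision function. This is a sketch for illustration only; `Side`, `TransportMode`, `ProxyEffect`, and `proxy_effect` are hypothetical names, not part of the alknet CLI or core.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Side {
    Client,
    Server,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TransportMode {
    Tcp,
    Tls,
    Iroh,
}

#[derive(Debug, PartialEq)]
pub enum ProxyEffect {
    /// The transport connection itself is dialed through the proxy.
    TransportLayer,
    /// Outbound direct_tcpip targets are dialed through the proxy.
    DataPlane,
    /// The flag is accepted but changes nothing.
    NoOp,
}

/// Maps (side, transport) to the layer at which `--proxy` applies.
pub fn proxy_effect(side: Side, transport: TransportMode) -> ProxyEffect {
    match (side, transport) {
        // Server: always the data plane, regardless of transport mode.
        (Side::Server, _) => ProxyEffect::DataPlane,
        // Client with plain TCP: effectively a no-op per this ADR.
        (Side::Client, TransportMode::Tcp) => ProxyEffect::NoOp,
        // Client with TLS or iroh: the transport goes through the proxy.
        (Side::Client, _) => ProxyEffect::TransportLayer,
    }
}
```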
||||
|
||||
## Consequences
|
||||
- **Positive**: One flag name (`--proxy`) instead of two. Users already understand "proxy" as "route through this."
|
||||
- **Positive**: Client-side proxy is minimal implementation — iroh's endpoint builder accepts proxy config natively.
|
||||
- **Positive**: Server-side proxy is straightforward — all outbound TCP from channel handlers goes through the proxy.
|
||||
- **Negative**: Implementers must read the correct spec (client vs server) to understand what `--proxy` does for their side. This is mitigated by CLI help text that clearly describes the behavior per side.
|
||||
- **Negative**: On the client, `--proxy` with `--transport tcp` is effectively a no-op (the transport is already a direct TCP connection to the server). The CLI should handle this case gracefully.
|
||||
|
||||
## References
|
||||
- [ADR-010](010-transport-chaining-cli.md) — client-side transport chaining
|
||||
- [transport.md](../transport.md) — transport layer spec
|
||||
- [client.md](../client.md) — client CLI
|
||||
- [server.md](../server.md) — server outbound proxy
|
||||
@@ -1,85 +0,0 @@
|
||||
# ADR-023: Unified Authentication with Shared Key Material
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Alknet currently authenticates connections exclusively through SSH public key
|
||||
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
|
||||
iroh) because SSH carries its own auth protocol. But WebTransport and other
|
||||
HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
|
||||
not SSH.
|
||||
|
||||
Without unification, non-SSH transports would need a completely separate
|
||||
identity system (API keys, JWTs, session tokens). This creates two problems:
|
||||
(1) operators manage two key sets with two rotation mechanisms, and (2) the
|
||||
same person connecting via SSH and WebTransport appears as two different
|
||||
identities.
|
||||
|
||||
The `IdentityProvider` trait is needed to decouple alknet-core from any
|
||||
specific identity storage (config file vs. database). Without it, alknet-core
|
||||
would either hardcode config-file-based auth or take a database dependency —
|
||||
neither is acceptable for a library crate.
|
||||
|
||||
## Decision
|
||||
|
||||
**Unified authentication**: The same Ed25519 key material (`authorized_keys`
|
||||
and `cert_authorities`) is shared across both SSH auth and token auth. The
|
||||
presentation differs per transport, but the verification result (an
|
||||
`Identity` with scopes) is the same.
|
||||
|
||||
**Token auth for non-SSH transports**: WebTransport clients present a signed
|
||||
timestamp token in the CONNECT request URL:
|
||||
|
||||
```
AuthToken = base64url(key_id || timestamp || signature)
key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
|
||||
|
||||
Server extracts the fingerprint, looks it up in the same `authorized_keys`
|
||||
set, verifies the signature, and checks the timestamp window (default ±300s).
|
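The token layout and timestamp check can be sketched with the standard library alone. This is a partial illustration under stated assumptions: the input is already base64url-decoded, the key lookup and Ed25519 signature verification are left out (they need a crypto crate), and `AuthToken`, `parse`, `signed_message`, and `is_fresh` are hypothetical names.

```rust
pub const KEY_ID_LEN: usize = 32;
pub const TIMESTAMP_LEN: usize = 8;
/// Ed25519 signatures are 64 bytes.
pub const SIGNATURE_LEN: usize = 64;
pub const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

#[derive(Debug, PartialEq)]
pub struct AuthToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

impl AuthToken {
    /// Splits decoded token bytes into key_id, timestamp, and signature.
    pub fn parse(raw: &[u8]) -> Option<Self> {
        if raw.len() != TOKEN_LEN {
            return None;
        }
        let mut key_id = [0u8; KEY_ID_LEN];
        key_id.copy_from_slice(&raw[..KEY_ID_LEN]);
        let mut ts = [0u8; TIMESTAMP_LEN];
        ts.copy_from_slice(&raw[KEY_ID_LEN..KEY_ID_LEN + TIMESTAMP_LEN]);
        let mut signature = [0u8; SIGNATURE_LEN];
        signature.copy_from_slice(&raw[KEY_ID_LEN + TIMESTAMP_LEN..]);
        Some(Self { key_id, timestamp: u64::from_be_bytes(ts), signature })
    }

    /// Bytes covered by the signature: key_id || timestamp (big-endian).
    pub fn signed_message(&self) -> Vec<u8> {
        let mut msg = self.key_id.to_vec();
        msg.extend_from_slice(&self.timestamp.to_be_bytes());
        msg
    }

    /// True if the timestamp is within `window_secs` of `now`, in either direction.
    pub fn is_fresh(&self, now: u64, window_secs: u64) -> bool {
        self.timestamp.abs_diff(now) <= window_secs
    }
}
```

A verifier would call `parse`, look up `key_id` in the authorized keys set, verify `signature` over `signed_message()`, and only then apply `is_fresh` with the configured window.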
||||
|
||||
**`IdentityProvider` trait**: Decouples alknet-core from identity storage. The
|
||||
trait resolves a fingerprint or token to an `Identity`. Default implementation
|
||||
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
|
||||
with `@alkdev/storage`.
|
||||
|
||||
**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
|
||||
as SSH auth by default. Deployments that want separate access control can use
|
||||
`TokenKeySource::Separate` with a distinct key set.
|
||||
|
||||
**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
|
||||
Zero-replay can be added later via a nonce challenge-response without changing
|
||||
the key material.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
|
||||
key to `authorized_keys` immediately grants access via both SSH and
|
||||
WebTransport.
|
||||
- **Positive**: `IdentityProvider` trait makes alknet-core independent of any
|
||||
specific database. Default: config file. Hub: `@alkdev/storage`.
|
||||
- **Positive**: Browser clients can authenticate using Ed25519 keys via
|
||||
SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
|
||||
natively.
|
||||
- **Positive**: No JWT library dependency. The token is a simple Ed25519
|
||||
signature over a fixed structure — same primitives SSH already uses.
|
||||
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
|
||||
QUIC packet can replay the token within the window. Acceptable because QUIC
|
||||
interception is the same threat level as connection hijacking.
|
||||
- **Negative**: Certificate authority tokens are not supported in v1. CA
|
||||
verification requires the full OpenSSH certificate structure, which doesn't
|
||||
fit in a signed timestamp.
|
||||
- **Negative**: Browser-side key management is less ergonomic than SSH key
|
||||
files. The private key must be imported into SubtleCrypto. This is a UI/UX
|
||||
concern, not a protocol concern.
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](../auth.md) — Full auth architecture spec
|
||||
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
|
||||
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
|
||||
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)
|
||||
@@ -1,63 +0,0 @@
|
||||
# ADR-024: Bidirectional Call Protocol
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The alknet control channel (ADR-018) routes from client → server's event bus.
|
||||
This is unidirectional: clients can send events to the server, but the server
|
||||
cannot call operations on the client. In the hub/spoke model, spokes (dev env
|
||||
containers) connect to a hub and expose operations (fs, bash, search) that the
|
||||
hub invokes. The hub needs to call *spoke* operations.
|
||||
|
||||
Additionally, the current control channel provides no request/response semantics.
|
||||
Every consumer that needs call/response reinvents the pending-request correlation.
|
||||
|
||||
## Decision
|
||||
|
||||
The call protocol is bidirectional. Both sides can send `call.requested` and
|
||||
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
|
||||
BE length prefix + JSON) — the same as `@alkdev/pubsub`.
|
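The wire framing (4-byte big-endian length prefix followed by a JSON payload) can be sketched as two small functions. This is a minimal illustration that treats the JSON as opaque bytes; `encode_frame` and `decode_frame` are hypothetical names, not the `@alkdev/pubsub` API.

```rust
/// Prefixes `payload` with its length as a 4-byte big-endian integer.
/// Returns None if the payload does not fit in a u32 length.
pub fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    Some(frame)
}

/// Splits one complete frame off the front of `buf`.
/// Returns (payload, remaining bytes), or None if the frame is incomplete.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

A production decoder would also cap the accepted length so that a hostile peer cannot force a large allocation.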
||||
|
||||
Five event types: `call.requested`, `call.responded`, `call.completed`,
|
||||
`call.aborted`, `call.error`.
|
||||
|
||||
A call is a subscribe that resolves after one event. Both use `call.requested`
|
||||
with correlated `requestId`. `PendingRequestMap` in core provides correlation.
|
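Request/response correlation by `requestId` can be sketched with a map of one-shot senders. This is an assumption-laden illustration: the real `PendingRequestMap` is not shown in this ADR, the method names are hypothetical, payloads are plain strings, and a synchronous `std::sync::mpsc` channel stands in for whatever async primitive core uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Correlates outgoing calls with their responses by request id (sketch).
pub struct PendingRequestMap {
    pending: HashMap<String, Sender<String>>,
}

impl PendingRequestMap {
    pub fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    /// Registers an outgoing call and returns the receiver for its response.
    pub fn register(&mut self, request_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(request_id.to_string(), tx);
        rx
    }

    /// Delivers a response payload; returns false for unknown request ids.
    pub fn resolve(&mut self, request_id: &str, payload: String) -> bool {
        match self.pending.remove(request_id) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }

    /// Drops every pending entry, for example when the connection closes.
    /// Waiting receivers then observe a disconnected channel.
    pub fn clear(&mut self) {
        self.pending.clear();
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```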
||||
|
||||
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
|
||||
path segment routes the call to the correct connected node. The hub's registry
|
||||
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
|
||||
first segment is the routing key, remaining path dispatches within the node.
|
||||
|
||||
Core-provided operations use short paths without a spoke prefix
|
||||
(`/services/list`, `/services/schema`). Spoke operations are prefixed
|
||||
(`/dev1/fs/readFile`).
|
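First-segment routing can be sketched as a lookup against the set of connected spokes. This is illustrative only: `Target` and `route` are hypothetical names, and treating any path whose first segment is not a connected spoke as a local operation is an assumption about how core paths such as `/services/list` are told apart from spoke paths.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum Target<'a> {
    /// Handled by this node, e.g. "/services/list".
    Local { path: &'a str },
    /// Forwarded to a connected spoke; `path` is the remainder, e.g. "/fs/readFile".
    Spoke { spoke: &'a str, path: &'a str },
}

/// Routes a slash-based operation path using its first segment.
/// Returns None if the path does not start with '/'.
pub fn route<'a>(path: &'a str, spokes: &HashSet<String>) -> Option<Target<'a>> {
    let rest = path.strip_prefix('/')?;
    let split = rest.find('/').unwrap_or(rest.len());
    let first = &rest[..split];
    if spokes.contains(first) {
        // The remainder keeps its leading slash so the spoke sees a normal path.
        Some(Target::Spoke { spoke: first, path: &rest[split..] })
    } else {
        Some(Target::Local { path })
    }
}
```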
||||
|
||||
This generalizes ADR-018's control channel: the `alknet-*` destination becomes
|
||||
a transport for `EventEnvelope` frames with call protocol semantics, instead of
|
||||
raw pubsub dispatch.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Hub can invoke operations on spokes. Dev env containers
|
||||
expose fs, bash, search — the hub calls them as needed.
|
||||
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
|
||||
can both call and serve operations.
|
||||
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
|
||||
in core serves all consumers.
|
||||
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
|
||||
iroh's ALPN dispatch. First segment = routing key.
|
||||
- **Positive**: Multiple spokes exposing the same service (two dev envs both
|
||||
exposing `/fs/*`) are naturally differentiated by the spoke prefix.
|
||||
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
|
||||
cleaned up on timeout or connection close.
|
||||
- **Negative**: The hub must maintain a routing table mapping spoke identities
|
||||
to connections, with registration on connect and cleanup on disconnect.
|
||||
|
||||
## References
|
||||
|
||||
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
|
||||
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
|
||||
@@ -1,73 +0,0 @@
|
||||
# ADR-025: Handler/Spec Separation for Downstream Service Registration
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The current control channel (ADR-018) is hardcoded: `alknet-control:0` bridges
|
||||
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
|
||||
`bash.exec` as callable operations, it has no way to register these with core's
|
||||
channel routing. The NAPI handler would need to intercept channel data outside
|
||||
of core.
|
||||
|
||||
For the hub/spoke model, spokes register their operations with the hub when
|
||||
they connect. The hub's registry must include both hub-local operations and
|
||||
remote operations exposed by spokes.
|
||||
|
||||
## Decision
|
||||
|
||||
Operation specs and handlers are separated from core. Core provides:
|
||||
|
||||
1. `OperationSpec` — describes what an operation does (name, type, input/output
|
||||
schemas, access control)
|
||||
2. `OperationHandler` — implements the operation logic
|
||||
3. `OperationRegistry` — maps paths to specs + handlers
|
||||
4. Built-in operations: `/services/list`, `/services/schema`
|
||||
|
||||
Downstream consumers register their own operations:
|
||||
|
||||
```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
|
||||
|
||||
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
|
||||
segment routes to the node. The `namespace` field on `OperationSpec` is
|
||||
derived from the second path segment (`service`).
|
||||
|
||||
When spoke operations are registered with the hub, the hub adds the spoke
|
||||
prefix: an operation that spoke "dev1" registers as `/fs/readFile` becomes addressable as
|
||||
`/dev1/fs/readFile` in the hub's routing table.
|
||||
|
||||
The `/services/list` operation returns all registered specs. The
|
||||
`/services/schema` operation returns the spec for a specific operation. These
|
||||
are read-only — no admin operations.
|
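Hub-side prefixing and the `/services/list` view can be sketched as below. This is a simplified illustration, not the actual registry: handlers, schemas, and access control are omitted, the `OperationSpec` here carries only two assumed fields, and `HubRegistry` with its methods is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Trimmed-down spec for illustration; the real one also carries schemas and access control.
#[derive(Clone, Debug, PartialEq)]
pub struct OperationSpec {
    pub name: String,
    /// Derived from the service segment of the path.
    pub namespace: String,
}

/// Hub-side view of registered operations, keyed by full path (sketch).
pub struct HubRegistry {
    specs: BTreeMap<String, OperationSpec>,
}

impl HubRegistry {
    pub fn new() -> Self {
        Self { specs: BTreeMap::new() }
    }

    /// Registers a spoke-local path such as "/fs/readFile" under "/{spoke}".
    /// Returns the hub-visible path, or None if `local_path` is malformed.
    pub fn register_spoke_op(&mut self, spoke: &str, local_path: &str) -> Option<String> {
        let mut segments = local_path.strip_prefix('/')?.split('/');
        let service = segments.next().filter(|s| !s.is_empty())?;
        let full = format!("/{}{}", spoke, local_path);
        let spec = OperationSpec { name: full.clone(), namespace: service.to_string() };
        self.specs.insert(full.clone(), spec);
        Some(full)
    }

    /// Removes every operation of a spoke, e.g. when it disconnects.
    pub fn remove_spoke(&mut self, spoke: &str) {
        let prefix = format!("/{}/", spoke);
        self.specs.retain(|name, _| !name.starts_with(&prefix));
    }

    /// What "/services/list" could return: registered names in sorted order.
    pub fn list(&self) -> Vec<&str> {
        self.specs.keys().map(|k| k.as_str()).collect()
    }
}
```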
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: NAPI, Python, and any downstream consumer can register
|
||||
operations without modifying core.
|
||||
- **Positive**: Service discovery is built in. Clients query `/services/list`
|
||||
to learn what operations a hub offers.
|
||||
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
|
||||
the same service (dev1 vs dev2).
|
||||
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
|
||||
authorization. Higher-risk operations (shell, filesystem write) can require
|
||||
tighter scopes.
|
||||
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
|
||||
maps directly to MCP tool definitions.
|
||||
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
|
||||
`OperationRegistry`, and `PendingRequestMap`.
|
||||
- **Negative**: Namespace collisions between downstream consumers are possible.
|
||||
The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
|
||||
|
||||
## References
|
||||
|
||||
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
|
||||
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
|
||||
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry
|
||||
@@ -1,162 +0,0 @@
|
||||
# ADR-026: Transport/Interface Separation (Three-Layer Model)
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
In the current architecture, SSH is deeply embedded in the server handler. The
|
||||
`ServerHandler` owns auth, channel management, and proxy logic — all mixed
|
||||
together. This makes it impossible to run the call protocol over any transport
|
||||
that doesn't speak SSH, such as:
|
||||
|
||||
- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
|
||||
censorship resistance
|
||||
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
|
||||
wrapping, for local service mesh or browser-to-head direct communication
|
||||
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
|
||||
SSH key exchange)
|
||||
|
||||
The DNS control channel concept from research (`core.md`) currently conflates
|
||||
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
|
||||
SSH is not a transport — it's a protocol layer that sits *on top of* a
|
||||
transport. Separating them enables the DNS control channel to carry call
|
||||
protocol events directly, without wrapping SSH inside DNS queries.
|
||||
|
||||
The same separation enables raw framing (no SSH overhead) for trusted local
|
||||
networks, and WebTransport direct call protocol for browser clients.
|
||||
|
||||
## Decision
|
||||
|
||||
**Establish a three-layer model:**
|
||||
|
||||
### Layer 1: Transport
|
||||
|
||||
Produces byte streams. A `Transport` still produces
|
||||
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.
|
||||
|
||||
```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```
|
||||
|
||||
Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).
|
||||
|
||||
### Layer 2: Interface
|
||||
|
||||
Consumes a `Transport::Stream` and produces call protocol sessions. An
|
||||
interface is what SSH currently does: wrap a byte stream in session semantics.
|
||||
|
||||
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    // Takes `&self` so that an accept loop can call `interface.accept(...)`.
    async fn accept(&self, stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```
|
||||
|
||||
Interfaces:
|
||||
|
||||
- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
|
||||
channel multiplexing. The call protocol runs over a reserved SSH channel
|
||||
(`alknet-control:0`).
|
||||
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
|
||||
`EventEnvelope`. No SSH overhead. Direct call protocol over the transport
|
||||
stream.
|
||||
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
|
||||
encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
|
||||
|
||||
### Layer 3: Protocol
|
||||
|
||||
Carries semantics. Call protocol events, operation registry, service calls.
|
||||
The protocol is agnostic to both the transport and the interface below it. It
|
||||
receives `EventEnvelope` frames from whatever interface produced them.
|
||||
|
||||
### Connection Model
|
||||
|
||||
A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:
|
||||
|
||||
| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
|
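The enumerated (Transport, Interface) pairs can be expressed as a validation function that a listener configuration might run at startup. This is a sketch under assumptions: `TransportTag`, `InterfaceTag`, and `is_valid_pair` are hypothetical names, and any pair not listed in the table is treated as invalid.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum InterfaceTag {
    Ssh,
    RawFraming,
}

/// Returns true for the (Transport, Interface) pairs listed in the table.
pub fn is_valid_pair(transport: TransportTag, interface: InterfaceTag) -> bool {
    match (transport, interface) {
        // SSH runs over every stream transport except DNS.
        (TransportTag::Tls, InterfaceTag::Ssh)
        | (TransportTag::Tcp, InterfaceTag::Ssh)
        | (TransportTag::Iroh, InterfaceTag::Ssh)
        | (TransportTag::WebTransport, InterfaceTag::Ssh) => true,
        // Raw framing pairs: DNS control channel, browser, and local mesh.
        (TransportTag::Dns, InterfaceTag::RawFraming)
        | (TransportTag::WebTransport, InterfaceTag::RawFraming)
        | (TransportTag::Tcp, InterfaceTag::RawFraming) => true,
        _ => false,
    }
}
```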
||||
|
||||
**The DNS control channel carries call protocol frames directly — it does NOT
|
||||
wrap SSH inside DNS.** This is explicit because the research originally
|
||||
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
|
||||
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
|
||||
queries/responses — no SSH involved.
|
||||
|
||||
### `TransportKind` Enum
|
||||
|
||||
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
|
||||
`WebTransport` variants. Initially these are tags only — no acceptor
|
||||
implementation. The full DNS and WebTransport implementations are Phase 4 work
|
||||
per the integration plan.
|
||||
|
||||
```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```
|
||||
|
||||
### ServerHandler Refactor
|
||||
|
||||
The existing `ServerHandler` is refactored into `SshInterface`. The interface
|
||||
abstraction means the server's accept loop becomes:
|
||||
|
||||
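```rust
// Pseudocode: one accept iteration for a configured listener.
let (transport, interface) = listener_config;
// Layer 1: the transport yields a raw byte stream.
let stream = transport.accept().await?;
// Layer 2: the interface turns the stream into a session.
let session = interface.accept(stream, &config).await?;
// Layer 3: the session produces call protocol events.
```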
|
||||
|
||||
The call protocol handler is interface-agnostic — it receives `EventEnvelope`
|
||||
frames from any interface. Auth, forwarding policy, and operation routing happen
|
||||
at Layer 3, not inside the SSH handler.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
|
||||
raw framing) pair is a clean (Transport, Interface) combination.
|
||||
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
|
||||
trusted networks.
|
||||
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
|
||||
any interface.
|
||||
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
|
||||
more focused component that only handles SSH session management.
|
||||
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
|
||||
implement the `Interface` trait without touching SSH code.
|
||||
- **Negative**: This is the most invasive code change in Phase 1
|
||||
(integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
|
||||
are currently tangled in `ServerHandler`. Extracting them requires careful
|
||||
refactoring to maintain existing behavior.
|
||||
- **Negative**: The `Interface` trait is new and untested. The design must
|
||||
accommodate both SSH's channel multiplexing and raw framing's single-stream
|
||||
model through the same abstraction.
|
||||
|
||||
## References
|
||||
|
||||
- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
|
||||
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
|
||||
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
|
||||
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
|
||||
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
|
||||
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)
|
||||
@@ -1,164 +0,0 @@
|
||||
# ADR-027: Crate Decomposition
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
alknet-core currently contains everything: transport, SSH, auth, config, the
|
||||
call protocol handler, and the server accept loop. As the project grows to
|
||||
include SQLite-backed identity, HD key derivation, and metagraph storage, core
|
||||
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
|
||||
— unacceptable for a library crate that CLI users embed.
|
||||
|
||||
Different deployment topologies need different subsets:
|
||||
- A minimal CLI tunnel only needs core, transport, and auth types
|
||||
- A head node needs SQLite-backed identity and the secret service
|
||||
- A flowgraph visualization tool only needs petgraph operations
|
||||
|
||||
Circular dependencies must be avoided. alknet-storage implements
|
||||
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
|
||||
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
|
||||
format, but not as a crate dependency.
|
||||
|
||||
## Decision
|
||||
|
||||
**Decompose the project into six crates with a strict acyclic dependency graph.**
|
||||
|
||||
### Crate Structure
|
||||
|
||||
1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
|
||||
`OperationSpec`, `Interface` trait. The foundational crate: alknet-napi and the
CLI binary depend on it directly, while the other library crates relate to it,
if at all, only through shared types and serialized formats.
|
||||
- *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
|
||||
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
|
||||
|
||||
2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
|
||||
derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
|
||||
- *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
|
||||
irpc
|
||||
- *Does NOT depend on*: alknet-core, alknet-storage
|
||||
|
||||
3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
|
||||
honker integration, `StorageProtocol` irpc service.
|
||||
- *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
|
||||
- *Does NOT depend on*: alknet-core. It provides an `IdentityProvider`
implementation that matches alknet-core's trait signature; the CLI binary
bridges the two.
- *Does NOT depend on*: alknet-secret. It only follows the `EncryptedData`
wire format for compatibility.
|
||||
|
||||
4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
|
||||
graph, type compatibility checking.
|
||||
- *Depends on*: petgraph, serde, jsonschema, thiserror
|
||||
- *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret
|
||||
|
||||
5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
|
||||
- *Depends on*: alknet-core
|
||||
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
|
||||
|
||||
6. **alknet** (CLI binary) — Assembles everything.
|
||||
- *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
|
||||
alknet-flowgraph (feature), toml
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```
alknet-secret       alknet-storage      alknet-flowgraph
(standalone)        (standalone)        (standalone)
     │                   │                   │
     │ (feature flags    │ (trait impl       │ (type compat
     │  in CLI binary)   │  via CLI wire)    │  via JSON)
     ▼                   ▼                   ▼
              ┌─────────────────────┐
              │     alknet-core     │
              │  (transport, SSH,   │
              │   call protocol,    │
              │  Identity, Config)  │
              └─────────┬───────────┘
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      alknet-napi    alknet (CLI binary, assembles everything)
```
|
||||
|
||||
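All four library crates (core, secret, storage, flowgraph) are independent of
each other: none lists another as a crate dependency. Crate dependencies point
only from the assembling crates (alknet-napi and the CLI binary) to the
libraries they use, so the CLI binary sits at the top of the dependency graph
even though the diagram draws it at the bottom. The arrows from secret, storage,
and flowgraph into core mark type-level relationships that the CLI binary wires
together, not crate dependencies. For example, alknet-storage implements
alknet-core's `IdentityProvider` trait without a crate dependency; the CLI
binary provides the bridge.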
|
||||
|
||||
### Narrow Interface Points
|
||||
|
||||
Two types and one trait serve as the narrow interface points between crates:
|
||||
|
||||
1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
|
||||
forwarding policy, and call protocol. alknet-storage implements
|
||||
`IdentityProvider` to produce instances.
|
||||
|
||||
2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
|
||||
`ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
|
||||
alknet-storage). The CLI/NAPI layer wires the concrete implementation.
|
||||
|
||||
3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
|
||||
registry and by alknet-flowgraph for type compatibility checking. The bridge
|
||||
is serialization — flowgraph serializes to JSON, storage persists it.
|
||||
|
||||
### irpc Feature Flag
|
||||
|
||||
irpc is a feature flag in alknet-core. When disabled, auth and config go through
|
||||
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
|
||||
that only do SSH tunneling don't need the service layer.
|
||||
|
||||
In alknet-secret and alknet-storage, irpc is an independent dependency, not
|
||||
feature-gated. These crates always define irpc service protocols because they
|
||||
are used in production deployments where the service layer is active.
|
||||
|
||||
### alknet-storage's Relationship to alknet-core
|
||||
|
||||
alknet-storage does NOT depend on alknet-core as a crate. Instead:
|
||||
|
||||
- alknet-storage defines its own `IdentityProvider` impl that matches
|
||||
alknet-core's trait signature. The trait is re-exported or defined locally
|
||||
with `#[cfg(feature = "alknet-core")]` interop.
|
||||
- In practice, the CLI binary crate depends on both and wires them together.
|
||||
alknet-storage provides `StorageIdentityProvider`; alknet-core takes
|
||||
`impl IdentityProvider`.
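The bridging arrangement described above could be sketched as follows. This is one possible reading, not the project's code: the modules `core_like` and `storage_like` stand in for the two crates, and `AccountRow`, `lookup`, and `StorageBridge` are hypothetical names. The storage side never names the core trait; only the assembling layer does.

```rust
/// Stand-in for alknet-core: defines the type and the trait.
mod core_like {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Identity {
        pub id: String,
        pub scopes: Vec<String>,
    }

    pub trait IdentityProvider: Send + Sync + 'static {
        fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    }
}

/// Stand-in for alknet-storage: knows nothing about `core_like`.
mod storage_like {
    pub struct AccountRow {
        pub account_id: String,
        pub scopes: Vec<String>,
    }

    pub struct StorageIdentityProvider;

    impl StorageIdentityProvider {
        /// Hypothetical lookup; a real implementation would query SQLite.
        pub fn lookup(&self, fingerprint: &str) -> Option<AccountRow> {
            (fingerprint == "SHA256:known").then(|| AccountRow {
                account_id: "acct-1".to_string(),
                scopes: vec!["relay:connect".to_string()],
            })
        }
    }
}

/// Stand-in for the CLI binary: depends on both sides and bridges them.
struct StorageBridge(storage_like::StorageIdentityProvider);

impl core_like::IdentityProvider for StorageBridge {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<core_like::Identity> {
        // Convert the storage-side row into the core-side type.
        self.0.lookup(fingerprint).map(|row| core_like::Identity {
            id: row.account_id,
            scopes: row.scopes,
        })
    }
}
```

The cost of this arrangement is the conversion code in the assembling layer, which has to track both sides whenever either shape changes.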
|
||||
|
||||
### alknet-storage's Relationship to alknet-secret
|
||||
|
||||
alknet-storage does NOT depend on alknet-secret as a crate. Instead:
|
||||
|
||||
- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
|
||||
version, salt, IV, ciphertext). This is a type-level compatibility, not a
|
||||
crate dependency.
|
||||
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
|
||||
`SecretNode` in the metagraph. The bridge is serialization.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
|
||||
get a small binary.
|
||||
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
|
||||
swapped for alternative implementations.
|
||||
- **Positive**: No circular dependencies. The dependency graph is a DAG.
|
||||
- **Positive**: Deployment topology determines which crates to include. A CLI
|
||||
tunnel uses only alknet-core. A head node uses everything.
|
||||
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
|
||||
service layer overhead.
|
||||
- **Negative**: `IdentityProvider` trait interop between alknet-core and
|
||||
alknet-storage requires careful versioning. If the trait signature changes,
|
||||
both crates must update.
|
||||
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
|
||||
and alknet-storage is implicit (not enforced by the type system). A shared
|
||||
types crate could be extracted if needed, but adds another crate dependency.
|
||||
|
||||
## References
|
||||
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
|
||||
- [research/core.md](../../research/core.md) — alknet-core contents
|
||||
- [research/services.md](../../research/services.md) — Service protocols
|
||||
- [research/storage.md](../../research/storage.md) — alknet-storage contents
|
||||
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)
|
||||
@@ -1,147 +0,0 @@
|
||||
# ADR-028: Auth as irpc Service
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
|
||||
doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
|
||||
entire set on each reload works for small deployments but requires holding every
|
||||
key in memory. For production deployments with hundreds or thousands of users,
|
||||
auth verification should query a database on demand rather than holding all keys
|
||||
in memory.
|
||||
|
||||
The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
|
||||
setups. What's needed is an async boundary that allows auth verification to go
|
||||
through a service — locally via channels for minimal deployments, or via irpc
|
||||
for production deployments where auth runs on a separate process or node.
|
||||
|
||||
The critical design point: callers go through the `IdentityProvider` trait
|
||||
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
|
||||
the same result — an `Identity` or rejection. The trait is the contract; the
|
||||
service is an implementation path.
|
||||
|
||||
## Decision
|
||||
|
||||
**Auth verification is provided via an irpc service protocol, with
|
||||
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
|
||||
(ArcSwap-backed) as the default implementation.**
|
||||
|
||||
### IdentityProvider Trait (ADR-029) — The Contract
|
||||
|
||||
Callers depend on `IdentityProvider`, not on any concrete implementation:
|
||||
|
||||
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
|
||||
|
||||
### ConfigIdentityProvider — Default Implementation
|
||||
|
||||
Reads from `ArcSwap<DynamicConfig.auth>`. No database needed. Every authorized
|
||||
key gets a default scope set. This is the default for CLI and single-node
|
||||
deployments.
|
||||
|
||||
### AuthProtocol irpc Service — Behind Feature Flag
|
||||
|
||||
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

// Sent back over the oneshot channel, so it needs the same serde derives
// (and `Identity` must be serializable as well).
#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
|
||||
|
||||
The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
|
||||
that only do SSH tunneling don't need the service layer overhead. When the
|
||||
feature is disabled, auth goes through `IdentityProvider` directly.
|
||||
|
||||
### AuthServiceImpl
|
||||
|
||||
Two implementations exist (the second is a future phase):
|
||||
|
||||
- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
|
||||
Wraps the trait in an irpc service for deployments that use the service layer
|
||||
but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
|
||||
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
|
||||
tables (in alknet-storage, not yet built). Queries on demand. Can maintain an
|
||||
LRU cache for hot fingerprints. This is a Phase 2+ implementation — the
|
||||
contract is defined here so alknet-storage can implement it later.
|
||||
|
||||
Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
|
||||
know or care which backend is running.
|
||||
|
||||
### Integration with IdentityProvider
|
||||
|
||||
The irpc service and the trait compose. A caller goes through `IdentityProvider`,
|
||||
which may internally delegate to the irpc service, or may satisfy the request
|
||||
locally via `ConfigIdentityProvider`. The deployment topology determines the
|
||||
path:
|
||||
|
||||
- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
|
||||
`ArcSwap<DynamicConfig>`. No irpc overhead.
|
||||
- **Production with local auth**: `AuthServiceImpl` wraps
|
||||
`StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
|
||||
routes to the local irpc service.
|
||||
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
|
||||
routes to a remote auth irpc service over QUIC.
|
||||
|
||||
### ConfigService Integration
|
||||
|
||||
`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
|
||||
For the `ConfigIdentityProvider` path, this is equivalent to
|
||||
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
|
||||
refreshes the LRU cache. Both update atomically — ongoing connections are
|
||||
unaffected, new connections pick up changes.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
|
||||
database dependency for CLI users.
|
||||
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
|
||||
irpc service. Auth scales to thousands of users without loading all keys into
|
||||
memory.
|
||||
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
|
||||
on. This keeps alknet-core lean and testable.
|
||||
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
|
||||
need the service layer.
|
||||
- **Positive**: Both paths produce identical `Identity` results. Behavioral
|
||||
parity is enforced by the shared `Identity` type.
|
||||
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
|
||||
and `StorageIdentityProvider` must produce the same `Identity` for the same
|
||||
input. Integration tests should verify this.
|
||||
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
|
||||
The core must compile and work without it, and the service layer must work
|
||||
with it enabled.
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
|
||||
- [research/configuration.md](../../research/configuration.md) — Auth service approach
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-027](027-crate-decomposition.md) — Crate decomposition
|
||||
@@ -1,107 +0,0 @@
|
||||
# ADR-029: Identity as Core Type
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The `Identity` struct and `IdentityProvider` trait are needed by auth,
|
||||
forwarding policy, and call protocol — three different subsystems in
|
||||
alknet-core. Without placing them in core, these subsystems would each define
|
||||
their own identity type, leading to duplication and conversion boilerplate.
|
||||
|
||||
The constraint: alknet-core must not depend on alknet-storage or any database.
|
||||
The `IdentityProvider` trait must be in core so that the handler can resolve
|
||||
identities without knowing whether the backing store is a config file or a
|
||||
SQLite database. External crates provide implementations.
|
||||
|
||||
Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
|
||||
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
|
||||
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
|
||||
key-based auth from config) and account UUID (for database-backed auth).
|
||||
|
||||
## Decision
|
||||
|
||||
**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
|
||||
|
||||
### Identity Struct
|
||||
|
||||
```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,          // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
|
||||
|
||||
The `id` field serves dual purpose: when using config-based authentication
|
||||
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
|
||||
database-backed authentication (`StorageIdentityProvider`), it holds the account
|
||||
UUID from the `accounts` table. This keeps the type simple while accommodating
|
||||
both auth paths.
|
||||
|
||||
The `scopes` field provides authorization scope strings used by
|
||||
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
|
||||
field provides resource-level authorization beyond what scopes offer (e.g., which
|
||||
services this identity can access).
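To illustrate how `scopes` and `resources` might be consulted, the sketch below adds two helper methods to a local copy of `Identity`. The helpers `has_scope` and `can_access` are hypothetical (the ADR defines only the fields), and exact string matching is an assumption; no wildcard or hierarchy semantics are implied.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// Exact-match scope check (hypothetical helper).
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// True if `name` is listed under the resource `kind`, e.g. ("service", "gitea").
    pub fn can_access(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n == name))
    }
}
```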
|
||||
|
||||
### IdentityProvider Trait
|
||||
|
||||
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
|
||||
|
||||
The trait is the contract. Callers (auth handler, forwarding policy, call
|
||||
protocol) depend on `IdentityProvider` — not on any concrete implementation.
|
||||
|
||||
### Default and Production Implementations
|
||||
|
||||
- **`ConfigIdentityProvider`** (in alknet-core) — reads from
|
||||
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
|
||||
No database needed. This is the default for minimal deployments.
|
||||
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
|
||||
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
|
||||
fingerprint → account → organization membership → effective scopes. This is
|
||||
the production implementation for head nodes.
|
||||
|
||||
alknet-core never depends on alknet-storage. The trait relationship is:
|
||||
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
|
||||
NAPI assembly layer wires the concrete implementation.
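A config-backed provider along the lines of `ConfigIdentityProvider` might look like the sketch below. Several things here are assumptions made to keep it self-contained: `RwLock<Arc<_>>` stands in for `ArcSwap<_>`, `AuthConfig` and its fields are a hypothetical shape for the auth section of `DynamicConfig`, and the trait is trimmed to the fingerprint path because `AuthToken` is not defined in this ADR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical shape of the auth section of `DynamicConfig`.
pub struct AuthConfig {
    pub authorized_fingerprints: HashSet<String>,
    pub default_scopes: Vec<String>,
}

/// `RwLock<Arc<_>>` stands in for `ArcSwap<_>` to keep the sketch std-only.
pub struct ConfigIdentityProvider {
    auth: RwLock<Arc<AuthConfig>>,
}

impl ConfigIdentityProvider {
    pub fn new(auth: AuthConfig) -> Self {
        Self { auth: RwLock::new(Arc::new(auth)) }
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // Clone the Arc (cheap), releasing the read lock before the lookup.
        let auth: Arc<AuthConfig> = self.auth.read().unwrap().clone();
        // Every authorized key gets the default scope set.
        auth.authorized_fingerprints.contains(fingerprint).then(|| Identity {
            id: fingerprint.to_string(),
            scopes: auth.default_scopes.clone(),
        })
    }
}
```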
|
||||
|
||||
### Why Not in alknet-storage?
|
||||
|
||||
If `Identity` lived in alknet-storage, alknet-core would need to depend on
|
||||
alknet-storage to use the type — creating a circular dependency (since
|
||||
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
|
||||
type and trait in core breaks the cycle.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
|
||||
call protocol all use the same `Identity` type.
|
||||
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
|
||||
wires the concrete implementation. Deployment topology determines which impl
|
||||
to use.
|
||||
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
|
||||
avoiding separate types for config-based and database-based auth.
|
||||
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
|
||||
`Identity` without knowing where they came from.
|
||||
- **Negative**: Two implementations of `IdentityProvider` exist:
`ConfigIdentityProvider` and `StorageIdentityProvider`. Both must produce
identical `Identity` results for the same input. Tests should verify
behavioral parity.
|
||||
- **Negative**: The trait abstraction adds a level of indirection for the
|
||||
minimal (config-only) deployment path. The cost is negligible — the
|
||||
`ConfigIdentityProvider` is a simple `ArcSwap` dereference.
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
|
||||
- [research/services.md](../../research/services.md) — AuthService, Identity section
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
|
||||
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
|
||||
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes
|
||||
@@ -1,159 +0,0 @@
|
||||
# ADR-030: Static/Dynamic Configuration Split
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Alknet's configuration is loaded once at startup and never changes. This causes
|
||||
three specific failures:
|
||||
|
||||
1. **No hot reload of authentication credentials.** Adding or removing an
|
||||
authorized key requires restarting the server process. In head/worker
|
||||
deployments where keys are managed via a database, the process must be
|
||||
restarted every time a key is added, revoked, or rotated. This is
|
||||
operationally unacceptable.
|
||||
|
||||
2. **No port forwarding access control.** Any authenticated client can open a
|
||||
`direct-tcpip` channel to any destination. There is no policy governing
|
||||
which hosts, ports, or alknet control channels a client may access. A
|
||||
compromised key grants unrestricted network access through the tunnel.
|
||||
|
||||
3. **No structured configuration beyond CLI flags.** ADR-011 chose
|
||||
programmatic-first configuration for the alpha — correct at the time. But as
|
||||
alknet moves toward publishable releases, operators need config files for
|
||||
reproducible deployments, and the NAPI layer needs programmatic reload
|
||||
capability that `ServeOptions` doesn't currently support.
|
||||
|
||||
Not all configuration should be reloadable. Transport-level settings (listen
|
||||
address, TLS certificates, host key) require socket/TLS renegotiation to change
|
||||
at runtime — effectively a restart. Auth and forwarding policy can change
|
||||
atomically without disrupting existing connections.
|
||||
|
||||
## Decision
|
||||
|
||||
**Split configuration into `StaticConfig` and `DynamicConfig`.**
|
||||
|
||||
### StaticConfig
|
||||
|
||||
Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
|
||||
preserved). Contains everything that affects socket binding, TLS handshakes, or
|
||||
SSH session negotiation:
|
||||
|
||||
- Transport mode, listen address
|
||||
- TLS config (cert, key)
|
||||
- iroh config (relay URL)
|
||||
- Stealth mode flag
|
||||
- Host key, host key algorithm
|
||||
- Max auth attempts, max connections per IP
|
||||
- Proxy config
|
||||
|
||||
Changing any of these requires a restart.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
|
||||
checked per-connection or per-channel:
|
||||
|
||||
- `AuthPolicy` — authorized keys, certificate authorities, token config
|
||||
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
|
||||
- `RateLimitConfig` — rate limiting parameters
|
||||
|
||||
`ArcSwap` provides lock-free reads on the hot path: every `auth_publickey()` and
every `channel_open_direct_tcpip()` call loads the current `Arc<DynamicConfig>`
and dereferences it, which adds negligible overhead compared to the current
approach. Writes are atomic: `store()` swaps the pointer. Existing connections
finish with the config they already loaded; new connections get the new config.
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig) { ... }
}
```
|
||||
|
||||
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
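The reload semantics (in-flight connections keep their snapshot, new connections see the new config) can be demonstrated with a std-only stand-in. In this sketch `RwLock<Arc<_>>` replaces `ArcSwap<_>`, the `load` method and the `authorized_keys` field are hypothetical, and the body of `reload` is one plausible implementation rather than the project's actual code.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// `RwLock<Arc<_>>` stands in for `ArcSwap<_>` so the sketch needs no external crate.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    dynamic: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { dynamic: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Snapshot taken at the start of a connection; later reloads do not affect it.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let snapshot: Arc<DynamicConfig> = self.dynamic.read().unwrap().clone();
        snapshot
    }

    /// Swaps in the new config as a whole; readers see either the old or the new value.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().unwrap() = Arc::new(new_config);
    }
}
```

Because each reader holds its own `Arc`, the old config is freed only after the last connection using it finishes.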
|
||||
|
||||
### ConfigService
|
||||
|
||||
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
|
||||
protocol (behind the `irpc` feature flag) for production deployments that use
|
||||
the service layer. For minimal deployments (CLI, single-node), direct
|
||||
`ConfigReloadHandle::reload()` is sufficient.
|
||||
|
||||
### TOML Config File
|
||||
|
||||
An optional TOML config file covers static config plus initial auth/forwarding
|
||||
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
|
||||
API remains primary. The config file is a convenience input format:
|
||||
|
||||
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
stealth = false
max_connections_per_ip = 5
max_auth_attempts = 3

[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"

[auth]
host_key = "/etc/alknet/ssh/host_key"

[forwarding]
default = "deny"
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
|
||||
|
||||
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
|
||||
`ConfigReloadHandle::reload()`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
|
||||
config is almost entirely static (which server to connect to, which key to use).
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Auth credentials and forwarding policy can be reloaded without
|
||||
restarting the server. Adding a key via `reloadAuth()` takes effect on the
|
||||
next connection attempt.
|
||||
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
|
||||
config file is an optional convenience layer, not a replacement for
|
||||
`ServeOptions`.
|
||||
- **Positive**: `ArcSwap` keeps reads on the hot path lock-free. Every auth
check and every channel open is a cheap pointer load followed by an `Arc`
dereference.
|
||||
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
|
||||
production deployments to integrate config reload into their service mesh
|
||||
without taking a direct dependency on `DynamicConfig` internals.
|
||||
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
|
||||
restrict access per identity, per destination, per transport (ADR-031).
|
||||
- **Negative**: Two config structs where there was one. The split is clean
|
||||
(transport vs. policy) but adds surface area.
|
||||
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
|
||||
This is acceptable for a CLI binary.
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — Full analysis
|
||||
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
|
||||
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1
|
||||
@@ -1,138 +0,0 @@
|
||||
# ADR-031: Forwarding Policy
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
|
||||
any destination. The only gate is authentication — once authenticated, a client
|
||||
has unrestricted network access through the tunnel. This is a security gap: a
|
||||
compromised key grants unrestricted access.
|
||||
|
||||
Operators need the ability to:
|
||||
- Restrict which hosts and ports authenticated clients can access
|
||||
- Apply different rules to different principals (key fingerprints, accounts)
|
||||
- Restrict WebTransport clients to alknet control channels only
|
||||
- Set a default policy (allow-all for migration compatibility, deny-all for
|
||||
production)
|
||||
|
||||
## Decision
|
||||
|
||||
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
|
||||
restart).**
|
||||
|
||||
### Type Definitions
|
||||
|
||||
```rust
pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>,        // Empty = matches all
    pub transports: Vec<TransportKind>, // Empty = matches all
}

pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // "localhost", "*.example.com"
    Cidr(IpNetwork),               // "10.0.0.0/8"
    PortRange(String, Range<u16>), // "localhost", ports 8080-8090
    AlknetPrefix,                  // Matches alknet-* control channels
}
```
|
||||
|
||||
### Rule Evaluation
|
||||
|
||||
Rules are evaluated in order. First match wins. If no rule matches, the default
|
||||
applies. This supports both allowlist and blocklist semantics:
|
||||
|
||||
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
|
||||
destinations.
|
||||
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
|
||||
destinations.
|
||||
|
||||
### Principals
|
||||
|
||||
Each rule can specify which principals it applies to. A principal is an
|
||||
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
|
||||
rule's `principals` field is empty, it matches all identities.
|
||||
|
||||
This connects to the `IdentityProvider` trait (ADR-029): when a client
|
||||
authenticates, the `Identity` is resolved, and the forwarding policy checks
|
||||
rules against `Identity.id` and `Identity.scopes`.
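
As a rough illustration of the first-match-wins order and the empty-`principals` rule described above, the sketch below uses trimmed-down stand-ins for the ADR's types (no `Cidr`, `AlknetPrefix`, or `transports`). The `evaluate` and `matches` helpers are hypothetical names introduced here, not part of the ADR, and exact string comparison stands in for real host matching.

```rust
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

// Simplified stand-in for the ADR's `TargetPattern`.
pub enum TargetPattern {
    Any,
    Host(String),
    PortRange(String, Range<u16>), // half-open, as with any Rust `Range`
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>, // Empty = matches all
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl TargetPattern {
    // Hypothetical helper: exact host comparison only, no wildcards.
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h == host,
            TargetPattern::PortRange(h, ports) => h == host && ports.contains(&port),
        }
    }
}

impl ForwardingPolicy {
    /// Hypothetical helper: scan rules in order, first match wins,
    /// fall back to the default when nothing matches.
    pub fn evaluate(&self, principal: &str, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|r| {
                let principal_ok =
                    r.principals.is_empty() || r.principals.iter().any(|p| p == principal);
                principal_ok && r.target.matches(host, port)
            })
            .map(|r| r.action)
            .unwrap_or(self.default)
    }
}
```

With `default: Deny` this behaves as an allowlist: a destination is reachable only if some rule allows it for that principal.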

### TransportKind-Aware Rules

Each rule can specify which `TransportKind` it applies to. This enables
transport-specific restrictions — for example, WebTransport clients can be
restricted to `alknet-*` control channels only:

```rust
ForwardingRule {
    target: TargetPattern::AlknetPrefix,
    action: ForwardingAction::Allow,
    principals: vec![],
    transports: vec![TransportKind::WebTransport { host: "*".into() }],
}
```

### Where the Policy Check Happens

The forwarding policy check occurs in `channel_open_direct_tcpip` before the
proxy task is spawned. The current behavior (no check) is equivalent to
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
backward compatibility during migration.

### DynamicConfig Integration

`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
effect on the next channel open — existing connections continue with their
current policy.
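
One way to get the "next channel open sees the new policy, existing work keeps the old one" behavior is to hand out immutable snapshots. The sketch below is an assumption about how that could be done with the standard library only; `PolicyHandle`, `snapshot`, and the one-field `ForwardingPolicy` are hypothetical stand-ins, since the ADR names `ConfigReloadHandle` but does not describe its internals.

```rust
use std::sync::{Arc, RwLock};

// Placeholder policy; the real type carries a default action and rules.
#[derive(Debug, PartialEq)]
pub struct ForwardingPolicy {
    pub default_allow: bool,
}

/// Hypothetical holder for the currently active policy.
#[derive(Clone)]
pub struct PolicyHandle {
    current: Arc<RwLock<Arc<ForwardingPolicy>>>,
}

impl PolicyHandle {
    pub fn new(policy: ForwardingPolicy) -> Self {
        Self { current: Arc::new(RwLock::new(Arc::new(policy))) }
    }

    /// Swap in a new policy. Snapshots taken earlier are unaffected.
    pub fn reload(&self, policy: ForwardingPolicy) {
        *self.current.write().unwrap() = Arc::new(policy);
    }

    /// Take a snapshot at channel-open time and check against it.
    pub fn snapshot(&self) -> Arc<ForwardingPolicy> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }
}
```

A caller would take `snapshot()` once per channel open. The write lock is held only for the pointer swap, so a reload does not block on checks that already hold a snapshot.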

### OQ Resolutions

- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
  global rules + principal matching from `Identity.scopes`. Per-user scope
  comes from `peer_credentials.metadata.scopes` via `IdentityProvider`.
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
  match in `ForwardingRule`. WebTransport clients can be restricted.
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
  `IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.

## Consequences

- **Positive**: Operators can restrict access per identity, per destination, per
  transport. A compromised key no longer grants unrestricted network access.
- **Positive**: Default-allow preserves current behavior during migration. Switch
  to default-deny for production deployments.
- **Positive**: Policy is reloadable without restart. Adding a rule via
  `reloadForwarding()` takes effect on the next channel open.
- **Positive**: `TransportKind`-aware rules enable transport-specific
  restrictions (e.g., WebTransport clients restricted to alknet-* channels).
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
  call). The cost is a linear scan of rules — acceptable for small rule sets.
  Large rule sets should use compiled matchers (future optimization).
- **Negative**: `TargetPattern` string matching is lenient. Host patterns like
  `*.example.com` require careful implementation to prevent bypasses. The
  `glob` or `globset` crate can handle this correctly.
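
To make the bypass concern concrete, here is a hand-rolled sketch of a wildcard host check. It is not the `glob` or `globset` approach the ADR suggests, only an illustration of the pitfalls; `host_matches` is a hypothetical name, and it assumes a leading `*.` may cover one or more labels.

```rust
/// Hypothetical matcher for patterns such as "*.example.com".
/// Comparison is case-insensitive and ignores a trailing root dot.
pub fn host_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.trim_end_matches('.').to_ascii_lowercase();
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require at least one character of label, then a dot, then the
            // suffix. Checking the dot rejects "evilexample.com", which a
            // plain `ends_with` test would accept.
            host.len() > suffix.len() + 1
                && host.ends_with(suffix)
                && host.as_bytes()[host.len() - suffix.len() - 1] == b'.'
        }
        None => pattern == host,
    }
}
```

Under these assumptions the bare apex `example.com` does not match `*.example.com`; whether it should is a policy decision the ADR leaves open.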

## References

- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3
@@ -1,96 +0,0 @@
# ADR-032: Event Boundary Discipline

## Status

Accepted

## Context

The research identified three distinct communication patterns in the system, and
conflating them is a known anti-pattern in event-driven architectures:

1. **Domain events** (Honker streams) — Internal to the service that owns that
   data. Used for state reconstruction within the service's own boundaries.
   Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.

2. **irpc service calls** — Synchronous request-response within a node or
   cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
   `SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.

3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
   that cross node boundaries. External to the system. Examples:
   `call.requested`, `call.responded`, `call.completed`, `call.aborted`.

Without a hard constraint, it's tempting to have one service subscribe directly
to another service's Honker streams. This leads to:

- **Leaky event store**: Service A reads Service B's domain events directly,
  coupling A to B's internal state representation. When B changes its schema, A
  breaks.
- **Boomerang coupling**: An integration event is too thin, causing the
  consumer to call back to the source service synchronously to get details. This
  negates the benefit of async communication.
- **Fat notification trap**: A notification event carries full entity state,
  when it should use state transfer instead.

## Decision

**Event boundary discipline is a hard architectural constraint, not a
suggestion.**

1. **Domain events stay within the owning service.** A Honker stream published
   by the storage service (`nodes:created`) is for the storage service's own
   state reconstruction. No other service reads these stream events directly.

2. **irpc service calls are synchronous and internal.** They never cross node
   boundaries. They are request-response, not events. They should not be used
   as a substitute for integration events.

3. **Call protocol events are the only events that cross node boundaries.**
   `EventEnvelope` frames are the integration boundary. When a domain event
   needs to be communicated to another node, it must be projected into a call
   protocol event.

4. **Projection from domain events to integration events is required when
   crossing boundaries.** A service that owns a Honker stream must project
   relevant state changes into `EventEnvelope` frames before they leave the
   node. The projection strips internal details and produces a versioned,
   stable integration event.

This discipline applies at three levels:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
       └── Honker Streams (Domain events, within service boundary)
```

A call protocol handler MAY call an irpc service internally (e.g.,
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
use Honker streams for its own state management. But domain events never
propagate beyond the service boundary without projection.
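
The projection rule (item 4) can be pictured as a plain function from an internal event to an integration event. Everything in this sketch is hypothetical: the ADR does not define the fields of `EventEnvelope` or of any domain event, so the struct shapes, the `node.created` type string, and `project_node_created` are invented purely to show internal details being dropped and a schema version being attached.

```rust
/// Internal domain event with hypothetical fields. `shard` and
/// `row_version` stand in for storage details that must not leak.
pub struct NodeCreated {
    pub node_id: String,
    pub shard: u32,
    pub row_version: u64,
}

/// Hypothetical integration-event shape, not the real `EventEnvelope`.
#[derive(Debug, PartialEq)]
pub struct EventEnvelope {
    pub event_type: String,
    pub schema_version: u32,
    pub payload: Vec<(String, String)>,
}

/// Project the domain event: keep only what consumers may depend on,
/// and version the result so the contract can evolve deliberately.
pub fn project_node_created(ev: &NodeCreated) -> EventEnvelope {
    EventEnvelope {
        event_type: "node.created".to_string(),
        schema_version: 1,
        payload: vec![("node_id".to_string(), ev.node_id.clone())],
    }
}
```

Because consumers only ever see the projected shape, the owning service can rename or drop `shard` and `row_version` without breaking anyone.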

## Consequences

- **Positive**: Prevents leaky event stores. Services are independently
  deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
  contracts. The call protocol's `EventEnvelope` is the only stable, versioned
  contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
  change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
  mock domain events. Integration events are tested against the `EventEnvelope`
  schema.
- **Negative**: Projection code is required. Every domain event that needs to
  cross a boundary must be explicitly projected. This is deliberate — the
  overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
  Honker streams across services. Code review should catch this pattern.

## References

- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
- [event_source_types.md](../../research/event-sourcing/event_source_types.md) — Event-driven architecture patterns
@@ -1,132 +0,0 @@
# ADR-033: OperationEnv as Universal Composition Mechanism

## Status

Accepted

## Context

The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
and can invoke any registered operation regardless of whether it runs locally, in
an irpc service on the same cluster, or on a remote node via call protocol.

The research documents define three dispatch paths:
1. **Local dispatch** — direct function call through the operation registry
2. **Service dispatch** — irpc protocol call to a service backend
3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node

Without a formal decision, irpc services could be seen as a replacement for
OperationEnv or for the call protocol. They are not — irpc is one dispatch
backend for OperationEnv, not a replacement for anything. The call protocol is
another dispatch backend. OperationEnv unifies them from the handler's
perspective.

The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams) — internal to the owning service
- irpc service calls — synchronous, in-cluster
- Call protocol events — asynchronous, cross-node

irpc services and call protocol operations serve different scopes but must
compose cleanly through OperationEnv.

## Decision

**OperationEnv is the universal composition mechanism that all operation
handlers receive. It provides namespace + operation name → invoke with input,
return output, regardless of dispatch path.**

### OperationEnv Behavioral Contract

```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```

The Rust implementation may use typed method dispatch or a registry behind the
scenes, but the handler-facing API must preserve this contract.

### Three Dispatch Paths

OperationEnv resolves each call to one of three dispatch backends:

| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |

All three produce the same `ResponseEnvelope`. The handler always calls
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
back.

### Service Assembly

The deployment topology determines which dispatch path each operation uses:

```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)           // Auth runs locally
    .local("config", config_registry)       // Config runs locally
    .service("secrets", secret_irpc_client) // Secret service via irpc
    .remote("worker-1", call_protocol_conn); // Worker-1 operations via call protocol
```
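
To show how a namespace-keyed environment could hide the dispatch path from the handler, the sketch below wires up only the local backend. It is an assumption-laden illustration: `Backend`, `LocalRegistry`, `RoutedEnv`, and `mount` are hypothetical names, `Value` is reduced to a `String`, and `ResponseEnvelope` is given two invented fields because the ADR does not define it. An irpc client or call-protocol connection would implement the same `Backend` trait.

```rust
use std::collections::HashMap;

// Placeholder payload type; the real `Value` is presumably a JSON value.
pub type Value = String;

// Hypothetical shape; the ADR does not define `ResponseEnvelope`.
#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub ok: bool,
    pub body: Value,
}

/// One dispatch backend: local registry, irpc client, or remote connection.
pub trait Backend: Send + Sync {
    fn call(&self, operation: &str, input: Value) -> ResponseEnvelope;
}

/// Local backend: operations are plain functions looked up by name.
pub struct LocalRegistry {
    ops: HashMap<String, fn(Value) -> Value>,
}

impl LocalRegistry {
    pub fn new() -> Self {
        Self { ops: HashMap::new() }
    }

    pub fn register(mut self, name: &str, op: fn(Value) -> Value) -> Self {
        self.ops.insert(name.to_string(), op);
        self
    }
}

impl Backend for LocalRegistry {
    fn call(&self, operation: &str, input: Value) -> ResponseEnvelope {
        match self.ops.get(operation) {
            Some(op) => ResponseEnvelope { ok: true, body: op(input) },
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown operation: {operation}"),
            },
        }
    }
}

/// Routes each namespace to whichever backend was mounted for it.
pub struct RoutedEnv {
    backends: HashMap<String, Box<dyn Backend>>,
}

impl RoutedEnv {
    pub fn new() -> Self {
        Self { backends: HashMap::new() }
    }

    pub fn mount(mut self, namespace: &str, backend: Box<dyn Backend>) -> Self {
        self.backends.insert(namespace.to_string(), backend);
        self
    }

    /// Same call shape for every backend; the caller never sees the path taken.
    pub fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        match self.backends.get(namespace) {
            Some(backend) => backend.call(operation, input),
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown namespace: {namespace}"),
            },
        }
    }
}
```

Swapping a namespace from local to remote then means mounting a different `Backend`; handler code that calls `invoke` stays unchanged.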

### irpc Services Are One Dispatch Backend

irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
and efficient. But they are not a replacement for OperationEnv or for the call
protocol. They are one dispatch backend.

An irpc service can be exposed as a call protocol operation:
`/head/auth/verify` receives a call protocol event and internally calls
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
       └── Honker Streams (Domain events, within service boundary)
```

### Adapters Map to OperationEnv

HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
(`/call.requested`) all resolve through OperationEnv. This is what makes
operations universally composable across all interfaces.
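
As a small illustration of adapters normalizing to the same `(namespace, operation)` pair, the sketch below parses the HTTP path and DNS name templates quoted above. The function names are hypothetical, and the strictness (no extra path segments, single-label namespace) is an assumption rather than something the ADR specifies.

```rust
/// Parse "/v1/{namespace}/{op}" into (namespace, op).
pub fn parse_http_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace, op))
}

/// Parse "{op}.{namespace}.alk.dev" (optional trailing dot) into (namespace, op).
pub fn parse_dns_name(name: &str) -> Option<(&str, &str)> {
    let rest = name.trim_end_matches('.').strip_suffix(".alk.dev")?;
    let (op, namespace) = rest.split_once('.')?;
    if op.is_empty() || namespace.is_empty() || namespace.contains('.') {
        return None;
    }
    Some((namespace, op))
}
```

Both adapters produce the same pair, which could then be passed to an `invoke(namespace, operation, input)` call.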

## Consequences

- **Positive**: Handlers compose through a single interface. Adding a new
  dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
  know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
  OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
  works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
  existing call protocol handler pattern. The registry currently maps paths to
  handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The `@alkdev/operations` TypeScript
  `HashMap<String, HashMap<String, fn>>` model needs idiomatic Rust translation.
  The behavioral contract must match, but the implementation can differ.

## References

- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md) — Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation
@@ -1,55 +0,0 @@
# ADR-034: Head/Worker Terminology

## Status

Accepted

## Context

The project previously used hub/spoke terminology for describing node
relationships: a hub node that coordinates connections and spokes that connect to
it. This terminology implies a strict star topology where the hub is
fundamentally different from spokes.

In practice, a coordinating node can also execute operations (run services,
forward traffic). Any node can become a coordinator. The architecture supports
mesh topologies where nodes coordinate in a peer-to-peer fashion.

The research documents (`core.md`, `services.md`) and updated architecture
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
already use head/worker consistently. Existing ADRs (024, 025) retain their
original hub/spoke language because ADRs are historical records.

## Decision

**Use head/worker terminology throughout the project.**

- **Head node**: A node that coordinates — accepts connections, routes
  operations, manages cluster state. A head is also a worker (it can execute
  operations).
- **Worker node**: A node that connects to a head, registers its services, and
  executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.

The terms hub and spoke are deprecated in all new specs, code, and
documentation. Existing ADRs retain their original language as historical
records — ADRs document what was decided at the time, not what the current
terminology is.

## Consequences

- **Positive**: Natural mesh formation. A head that is also a worker enables
  multi-hop routing, redundancy, and distributed topologies without a
  centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture — there is no
  single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
  intentional — ADRs are historical records.

## References

- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
- [research/core.md](../../research/core.md) — Head/worker terminology
@@ -1,65 +0,0 @@
# ADR-035: StreamInterface and MessageInterface Split

## Status
Accepted

## Context

The `Interface` trait (ADR-026) assumes a persistent byte stream from a `Transport`. It produces a `Session` that yields `InterfaceEvent` frames. This works for SSH and raw framing — both run over duplex streams.

However, HTTP and DNS do not fit this model. They handle individual request/response pairs, not persistent sessions. HTTP runs over a TLS connection after byte-peek protocol detection (extending the existing stealth mode pattern). DNS runs its own server on port 53. Both are stateless per-request, not session-oriented.

The three-layer model (Transport, Interface, Protocol) remains correct. The issue is that Layer 2 has two distinct patterns: stream-based (SSH, raw framing) where the transport provides a continuous byte stream, and message-based (HTTP, DNS) where the interface manages its own transport and handles discrete requests.

## Decision

Split the `Interface` trait into two independent traits:

1. **`StreamInterface`** — consumes a `TransportStream`, produces a long-lived `Session` that yields `InterfaceEvent` frames. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations.

2. **`MessageInterface`** — handles individual `InterfaceRequest` → `InterfaceResponse` pairs. Manages its own transport (HTTP server, DNS server). `HttpInterface` and `DnsInterface` are `MessageInterface` implementations.

The traits are independent. They have different signatures (`accept(stream)` vs `handle_request(req)`), different lifecycles (long-lived session vs stateless per-request), and different transport ownership (provided by caller vs self-managed).

`ListenerConfig` gains variants for both:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool,
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}
```

`TransportKind::Dns` is removed. DNS is a `MessageInterface` that manages its own transport (UDP/TCP port 53), not a transport variant.

The call protocol handler (Layer 3) is interface-agnostic: it processes `InterfaceEvent` frames from `StreamInterface` sessions and `InterfaceRequest` → `InterfaceResponse` from `MessageInterface` handlers. The dispatch logic is the same — only the framing differs.
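
The contrast between the two traits can be sketched in a few lines. This is a synchronous, standard-library-only approximation, whereas the real traits are presumably async; the trait and type names follow the ADR, but every signature, field, and the two toy implementations (`ChunkInterface`, `EchoInterface`) are assumptions made for illustration.

```rust
use std::io::{Read, Write};

// Stand-in for the ADR's `TransportStream`: any duplex byte stream.
pub trait TransportStream: Read + Write {}
impl<T: Read + Write> TransportStream for T {}

#[derive(Debug, PartialEq)]
pub enum InterfaceEvent {
    Frame(Vec<u8>),
    Closed,
}

/// Long-lived session that yields events until the stream ends.
pub trait Session {
    fn next_event(&mut self) -> InterfaceEvent;
}

/// Stream-based: the caller supplies the transport stream.
pub trait StreamInterface {
    fn accept(&self, stream: Box<dyn TransportStream>) -> Box<dyn Session>;
}

pub struct InterfaceRequest {
    pub path: String,
    pub body: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct InterfaceResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// Message-based: one request in, one response out, no session state.
pub trait MessageInterface {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse;
}

// Toy stream interface: emits whatever arrives, up to 4 bytes per frame.
pub struct ChunkInterface;

struct ChunkSession {
    stream: Box<dyn TransportStream>,
}

impl Session for ChunkSession {
    fn next_event(&mut self) -> InterfaceEvent {
        let mut buf = [0u8; 4];
        match self.stream.read(&mut buf) {
            Ok(n) if n > 0 => InterfaceEvent::Frame(buf[..n].to_vec()),
            _ => InterfaceEvent::Closed,
        }
    }
}

impl StreamInterface for ChunkInterface {
    fn accept(&self, stream: Box<dyn TransportStream>) -> Box<dyn Session> {
        Box::new(ChunkSession { stream })
    }
}

// Toy message interface: echoes the request body.
pub struct EchoInterface;

impl MessageInterface for EchoInterface {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse {
        InterfaceResponse { status: 200, body: req.body }
    }
}
```

The session owns its stream for its whole lifetime, while the message handler borrows nothing between calls, which is the lifecycle difference the decision relies on.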

## Consequences

**Positive**: HTTP and DNS are first-class interfaces with proper type signatures. No forcing stateless protocols into a session model. The existing stealth mode byte-peek pattern naturally extends to `HttpInterface`. The `InterfaceRequest` / `InterfaceResponse` types normalize calls across message-based interfaces.

**Positive**: Removing `TransportKind::Dns` prevents a breaking change later — code should never depend on DNS as a transport variant.

**Positive**: `ListenerConfig` correctly models the server's accept loop: stream listeners spawn one accept loop per (transport, interface) pair, while HTTP and DNS listeners each manage their own server.

**Negative**: Two traits where there was one. But they serve fundamentally different purposes. A common super-trait would add complexity (`accept_stream` + `handle_request` + `transport_kind`) without practical benefit — implementations satisfy one trait or the other, never both.

**Negative**: The `accept()` method on the current `Interface` trait needs to be renamed. This is a rename of an existing method signature, not a semantic change — `SshInterface` and `RawFramingInterface` implementations become `StreamInterface` implementations with the same `accept()` logic.

## References

- ADR-026 (transport/interface separation — updated by this ADR)
- [interface.md](../interface.md) — Interface layer spec
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — Full analysis
- [research/phase2/tls-transport.md](../../research/phase2/tls-transport.md) — HTTP interface, ListenerConfig
@@ -1,82 +0,0 @@
# ADR-036: CredentialProvider as Core Type

## Status
Accepted

## Context

Alknet's `IdentityProvider` resolves **inbound** authentication: given a
credential (fingerprint or token), produce an `Identity`. But there is no
corresponding abstraction for **outbound** credentials: how does alknet
authenticate _to_ external services (vast.ai, rustfs, gitea)?

Without `CredentialProvider`, each service wrapper would independently solve
credential retrieval, caching, and lifecycle management. This leads to
duplicated effort and inconsistent security practices across service wrappers.

The pattern mirrors the existing `IdentityProvider` pattern: trait in core,
default impl using simple storage, production impl using the secret service
and database.

## Decision

Define `CredentialProvider` trait and `CredentialSet` enum in
`alknet_core::credentials`.

```rust
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}
```

The trait is intentionally narrow. It returns credentials for a named service.
It does not try to abstract the auth mechanism itself — that stays with the
service wrapper that knows the protocol (S3 signing, OAuth2 refresh, etc.).

Phase 1 provides `SecretStoreCredentialProvider` (reads from
`SecretProtocol::Decrypt`, holds in RAM). Phase 2+ adds
`ManagedCredentialProvider` (with `CredentialManager` for lifecycle management:
refresh, expiration, provisioning).

`CredentialProvider` does not depend on `IdentityProvider`, though
`ManagedCredentialProvider` may use `Identity.id` for identity-bound credential
lookups.
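
The split of responsibilities (provider returns credentials, wrapper knows the protocol) might look like the sketch below. The trait is copied from the ADR; `StaticCredentialProvider`, its `with` builder, and `auth_header` are hypothetical, and `CredentialSet` is cut down to three variants to keep the example short.

```rust
use std::collections::HashMap;

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

// Reduced subset of the ADR's enum.
#[derive(Clone, Debug, PartialEq)]
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
}

/// Hypothetical in-memory provider, useful for tests or a single node.
pub struct StaticCredentialProvider {
    entries: HashMap<String, CredentialSet>,
}

impl StaticCredentialProvider {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    pub fn with(mut self, service: &str, creds: CredentialSet) -> Self {
        self.entries.insert(service.to_string(), creds);
        self
    }
}

impl CredentialProvider for StaticCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.entries.get(service).cloned()
    }

    // Static entries have nothing to refresh; return the stored value.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.get_credentials(service)
    }
}

/// Wrapper-side helper: turn a credential into a single HTTP header when
/// the scheme allows it. S3 keys need per-request signing, so there is no
/// static header to return for them.
pub fn auth_header(creds: &CredentialSet) -> Option<(String, String)> {
    match creds {
        CredentialSet::ApiKey { header_name, token } => {
            Some((header_name.clone(), token.clone()))
        }
        CredentialSet::Bearer { token } => {
            Some(("Authorization".to_string(), format!("Bearer {token}")))
        }
        CredentialSet::S3AccessKey { .. } => None,
    }
}
```

Keeping `auth_header` outside the trait mirrors the ADR's point that the provider stays narrow and protocol knowledge lives in the service wrapper.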

## Consequences

**Positive**: Outbound auth has a unified abstraction, just as inbound auth
has `IdentityProvider`. Service wrappers retrieve credentials through one
interface. `OperationEnv` can expose credentials through `context.env`.

**Positive**: The `CredentialSet` enum covers all identified credential types
(API keys, bearer tokens, S3 access keys, OIDC tokens, basic auth, custom).
This is sufficient for Phases A-C. Phase D (alknet as OIDC provider) is additive.

**Positive**: The trait in core, impl in service crate pattern is consistent
with `IdentityProvider` (trait in core, `ConfigIdentityProvider` in core,
`StorageIdentityProvider` in alknet-storage).

**Negative**: Adds a new core type and a new module (`credentials`). But this
is the same pattern as `IdentityProvider` and `auth` — a small, narrow trait
with a clear contract.

**Negative**: `ManagedCredentialProvider` and `CredentialManager` are Phase C
concepts. The spec should define them as future extensions, not implement them
now.

## References

- ADR-029 (Identity as core type — same pattern)
- [credentials.md](../credentials.md) — CredentialProvider spec
- [research/phase2/credential-provider.md](../../research/phase2/credential-provider.md) — Full analysis
- [identity.md](../identity.md) — IdentityProvider (inbound, opposite direction)
@@ -1,83 +0,0 @@
# ADR-037: API Keys as DynamicConfig Auth

## Status
Accepted

## Context

Alknet's token auth uses Ed25519-signed `AuthToken`s — the same key material
used for SSH auth. This is appropriate for interactive clients (browsers, CLI)
that can generate and sign Ed25519 key pairs.

But for service accounts, automation, and simple integrations, Ed25519 key
pairs are inconvenient. A dashboard backend, a CI/CD pipeline, or a monitoring
script needs a simple bearer token that can be stored in an environment variable
or config file without managing cryptographic key pairs.

The HTTP interface (Phase 2+) requires bearer token auth for
`Authorization: Bearer <token>` headers. `AuthToken` works but requires
client-side Ed25519 signing. API keys offer a simpler alternative: short bearer
tokens verified by SHA-256 hash lookup, with optional scope restrictions and TTL.

## Decision

Add `[[auth.api_keys]]` section to `DynamicConfig`:

```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect", "secrets:derive"]
description = "dashboard service account"
ttl = "30d" # optional
```

`ConfigIdentityProvider::resolve_from_token()` handles both token types:
- If the input starts with the configured prefix (default `alk_`), treat it as
  an API key: hash it with SHA-256 and look up the hash in the `api_keys` table.
- Otherwise, treat it as an `AuthToken`: decode, verify Ed25519 signature,
  check timestamp, resolve from `authorized_keys`.

Both paths produce the same `Identity` result. In database-backed deployments,
both resolve to the same account UUID.

API keys are stored as SHA-256 hashes (like password hashing — the cleartext
key is never stored, only its hash). The prefix enables O(1) routing between
AuthToken and API key verification without trying both paths.

The full key is provided to the client exactly once (at creation time). Subsequent
verifications only compare hashes.
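
The prefix routing and hash-only lookup can be sketched as follows. To stay within the standard library, the hash function is passed in as a parameter; a real deployment would supply SHA-256 (for example from a crate such as `sha2`), which is an assumption here. `TokenKind`, `classify`, `ApiKeyEntry`, and `ApiKeyTable` are hypothetical names, and TTL handling is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    ApiKey(&'a str),
    AuthToken(&'a str),
}

/// Route by prefix so only one verification path is attempted.
pub fn classify<'a>(token: &'a str, prefix: &str) -> TokenKind<'a> {
    if token.starts_with(prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}

pub struct ApiKeyEntry {
    pub scopes: Vec<String>,
    pub description: String,
}

/// Table keyed by hash. The cleartext key is never stored.
/// `H` stands in for SHA-256; the caller supplies the actual hash function.
pub struct ApiKeyTable<H: Fn(&[u8]) -> String> {
    hash: H,
    by_hash: HashMap<String, ApiKeyEntry>,
}

impl<H: Fn(&[u8]) -> String> ApiKeyTable<H> {
    pub fn new(hash: H) -> Self {
        Self { hash, by_hash: HashMap::new() }
    }

    /// Register an entry by its precomputed hash, as read from config.
    pub fn insert_hash(&mut self, hash: String, entry: ApiKeyEntry) {
        self.by_hash.insert(hash, entry);
    }

    /// Hash the presented key and look the hash up.
    pub fn lookup(&self, presented: &str) -> Option<&ApiKeyEntry> {
        let digest = (self.hash)(presented.as_bytes());
        self.by_hash.get(&digest)
    }
}
```

A caller would run `classify` first and only consult the table for the `ApiKey` case; the `AuthToken` case would go to signature verification instead.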

## Consequences

**Positive**: Simple bearer token auth for HTTP and other non-SSH interfaces.
No cryptographic key management for service accounts. Consistent with industry
practice (Stripe, GitHub, AWS all use prefixed API keys).

**Positive**: Both AuthTokens and API keys go through `resolve_from_token()`.
The caller doesn't need to know which type they're using. This keeps the
authentication layer unified.

**Positive**: Scoped API keys enable fine-grained access control for service
accounts. A monitoring tool gets `["monitoring:read"]`, not full access.

**Negative**: API keys are bearer tokens — anyone who obtains the key has the
associated permissions. The hash storage and optional TTL mitigate but do not
eliminate this risk. Ed25519 AuthTokens remain the preferred auth method for
interactive clients.

**Negative**: API key rotation requires updating `DynamicConfig` (or the
`api_keys` database table). The `ConfigReloadHandle` / `ConfigService` reload
mechanism handles this, but it's a deliberate operation, not automatic.

**Negative**: No rate limiting on API key verification is built into this ADR.
Rate limiting on the HTTP interface is a separate concern.

## References

- ADR-023 (unified auth, shared key material)
- ADR-029 (Identity as core type)
- ADR-030 (static/dynamic config split)
- [auth.md](../auth.md) — Token auth, AuthPolicy, API keys
- [configuration.md](../configuration.md) — DynamicConfig, AuthPolicy
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — API keys in config
@@ -1,137 +0,0 @@
|
||||
# ADR-038: Seed Lifecycle and Memory Security
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context

The alknet-secret crate holds the master BIP39 seed phrase in RAM. This seed is
the root of trust for all derived keys (identity, encryption, signing). If the
seed is leaked (through memory dumps, swap files, or core dumps), an attacker
can derive every key in the system.

Security-conscious key management systems typically employ three defenses:

1. **Zeroize**: Overwrite sensitive memory before deallocating. Prevents
   stale-data reads from freed memory.

2. **Memory locking** (`mlock`/`VirtualLock`): Prevent the OS from paging
   sensitive RAM to disk. Prevents swap-file leakage.

3. **Constant-time comparison**: Prevent timing side-channels when comparing
   keys or tokens.

The question is: which of these should alknet-secret adopt in v1, and which
should be deferred?

## Decision

**Phase 3 (v1): Zeroize only. Defer mlock and constant-time comparison to
Phase B.**

- All sensitive types (seed bytes, derived private keys, passphrase strings)
  derive `Zeroize` and implement `Drop` to call `zeroize()` before deallocation.
- The `Lock` operation calls `zeroize()` on the seed and all cached derived
  keys, then drops them.
- `mlock`/`VirtualLock` and constant-time comparison are not included in v1.

### Rationale for deferring mlock

1. **Complexity**: `mlock` can require root/`CAP_IPC_LOCK` on Linux (once the
   `RLIMIT_MEMLOCK` limit is exceeded) or a raised working-set quota for
   `VirtualLock` on Windows. The crate should work in unprivileged contexts
   (development, testing, single-user nodes) without requiring system
   configuration changes.

2. **Performance**: `mlock` locks whole physical pages, which are typically
   4 KiB. Locking many small buffers wastes physical memory. The seed
   (64 bytes) and derived keys (32–64 bytes each) are tiny; the real risk is
   swap-file leakage, which `zeroize` partially mitigates by wiping before free.

3. **Deployment flexibility**: Production head nodes running as root or with
   `CAP_IPC_LOCK` can add `mlock` in Phase B. Development and CLI nodes
   shouldn't need it.

4. **Audit surface**: `mlock` introduces platform-specific code paths (Linux
   vs macOS vs Windows) that should be audited together, not bolted on
   incrementally.

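The page-granularity cost in point 2 is easy to quantify. The helper below is a hypothetical illustration (not part of alknet-secret) that counts how many pages a buffer would pin if it were locked, given a page size such as 4096 bytes.

```rust
/// Number of pages touched by a buffer of `len` bytes starting at `addr`.
/// `page_size` must be non-zero; a zero-length buffer touches no pages.
fn pages_spanned(addr: usize, len: usize, page_size: usize) -> usize {
    if len == 0 {
        return 0;
    }
    let first = addr / page_size;
    let last = (addr + len - 1) / page_size;
    last - first + 1
}
```

A 64-byte seed at a page-aligned address pins one full 4096-byte page, and the same seed straddling a page boundary pins two. Many separately allocated small keys can therefore pin far more physical memory than their combined size suggests.
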
### Rationale for deferring constant-time comparison

The `SecretProtocol` service receives requests over irpc (local mpsc or remote
QUIC). Comparison timing is not observable by callers: they send a message and
wait for a response. The comparison that matters (auth token verification) is
in alknet-core's `IdentityProvider`, not in alknet-secret. Key derivation
results (DerivedKey) are not compared against attacker-controlled input within
this crate.

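For reference, the deferred technique looks roughly like the sketch below. `ct_eq` is a hypothetical helper, not code from alknet-secret or alknet-core. It only illustrates the idea of visiting every byte instead of returning at the first mismatch. An optimizing compiler is free to undo this, so production code would more likely rely on an audited crate such as `subtle`.

```rust
/// Compares two byte slices without an early exit on the first mismatch.
/// Length is not treated as secret: unequal lengths return immediately.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate differences so every byte pair is always visited.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```
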
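### Zeroize implementation

```rust
use std::collections::HashMap;
use zeroize::Zeroize;

#[derive(Zeroize)]
#[zeroize(drop)]
struct SeedHolder {
    seed: Vec<u8>,
}

struct DerivedKeyCache {
    keys: HashMap<String, Vec<u8>>,
}

// `HashMap` does not implement `Zeroize`, so the derive cannot be used here;
// wipe each cached key by hand before the map is freed.
impl Drop for DerivedKeyCache {
    fn drop(&mut self) {
        for key in self.keys.values_mut() {
            key.zeroize();
        }
    }
}
```

`#[zeroize(drop)]` ensures that `Drop` calls `zeroize()` on all fields,
overwriting memory before deallocation. The derive requires every field to
implement `Zeroize` (unless it is explicitly marked `#[zeroize(skip)]`), so a
field that cannot be wiped is a compile error rather than a silent omission.
`HashMap` has no `Zeroize` impl, which is why `DerivedKeyCache` wipes its
values in a manual `Drop` instead of using the derive.
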
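For readers unfamiliar with why a dedicated crate is needed, the following std-only sketch shows roughly the technique involved: volatile writes followed by a compiler fence, so the optimizer cannot discard the stores as dead writes. `wipe` and `Secret` are hypothetical names for illustration. This is not the `zeroize` crate's actual source and should not replace it.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `buf` with zeros using volatile writes so the compiler
/// cannot elide the stores as dead writes.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Keep later code from being reordered before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical stand-in for a secret buffer that wipes itself on drop.
struct Secret(Vec<u8>);

impl Drop for Secret {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

A plain `buf.fill(0)` immediately before deallocation may be removed by the optimizer because nothing reads the buffer afterwards. The volatile write is what makes the overwrite observable.
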
### Lock lifecycle

```
Unlock(passphrase)
  → validate mnemonic (if restoring) or generate new
  → derive master key from seed
  → store seed in SeedHolder (Zeroize-protected)
  → cache empty (keys derived on demand)

DeriveEd25519/DeriveEncryptionKey/Encrypt/Decrypt
  → require unlocked state (error if locked)
  → derive key, return result
  → optionally cache derived key

Lock
  → zeroize all cached derived keys
  → zeroize seed
  → drop all sensitive material
  → service returns to locked state
```

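The lifecycle above can be sketched as a two-state machine. Everything in this block is an assumption made for illustration: `SecretState`, `SecretError`, and `placeholder_derive` are hypothetical names, the derivation is a stand-in rather than a real KDF, and `fill(0)` stands where real code would call `zeroize()`.

```rust
use std::collections::HashMap;

/// Hypothetical service state mirroring the Locked/Unlocked lifecycle.
enum SecretState {
    Locked,
    Unlocked { seed: Vec<u8>, cache: HashMap<String, Vec<u8>> },
}

#[derive(Debug, PartialEq)]
enum SecretError {
    Locked,
}

impl SecretState {
    /// Unlock: store the seed and start with an empty cache.
    fn unlock(seed: Vec<u8>) -> Self {
        SecretState::Unlocked { seed, cache: HashMap::new() }
    }

    /// Derive a key for `path`, caching the result. Errors while locked.
    fn derive(&mut self, path: &str) -> Result<Vec<u8>, SecretError> {
        match self {
            SecretState::Locked => Err(SecretError::Locked),
            SecretState::Unlocked { seed, cache } => {
                let key = cache
                    .entry(path.to_string())
                    .or_insert_with(|| placeholder_derive(seed, path));
                Ok(key.clone())
            }
        }
    }

    /// Lock: overwrite cached keys and the seed, then drop them.
    fn lock(&mut self) {
        if let SecretState::Unlocked { seed, cache } = self {
            for key in cache.values_mut() {
                key.fill(0); // real code would call `zeroize()` here
            }
            seed.fill(0);
        }
        *self = SecretState::Locked;
    }
}

/// PLACEHOLDER, not a real KDF: mixes seed and path bytes so the sketch runs.
fn placeholder_derive(seed: &[u8], path: &str) -> Vec<u8> {
    seed.iter()
        .zip(path.bytes().cycle())
        .map(|(s, p)| s ^ p)
        .collect()
}
```

Modelling the locked state as a variant with no fields means a derive call cannot reach seed material while locked: there is nothing in that variant to read.
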
## Consequences

- **Positive**: Zeroize has negligible runtime cost, is a minimal dependency
  (the `zeroize` crate is ~500 lines, with no FFI or inline assembly), and
  provides meaningful protection against stale-memory reads.
- **Positive**: Lock effectively purges all sensitive material. After Lock,
  the buffers owned by the service no longer contain useful secret data.
- **Positive**: No platform-specific code paths in v1. The crate compiles and
  runs everywhere without privilege requirements.
- **Negative**: Without `mlock`, the OS can page the seed to swap before
  zeroization occurs. This is a window of vulnerability that Phase B closes.
  The risk is acceptable for v1 because swap-file extraction requires root
  access or physical access to the machine, which is the same threat model as
  reading process memory directly.
- **Negative**: Without constant-time comparison, timing side-channels exist
  in theory. In practice, no comparison in alknet-secret operates on
  attacker-controlled input, so the risk is nil within this crate.
- **Negative**: `zeroize` adds a dependency. The `zeroize` crate is widely
  used in Rust crypto (ring, ed25519-dalek, x25519-dalek) and is a de facto
  standard.

## References

- [secret-service.md](../secret-service.md): Security model, Lock/Unlock lifecycle
- [ADR-027](027-crate-decomposition.md): Crate decomposition (alknet-secret is independent)
- [credentials.md](../credentials.md): SecretStoreCredentialProvider integration
- `zeroize` crate: https://crates.io/crates/zeroize