greenfield: clean slate for ALPN-as-service pivot

Delete old source crates (alknet-core, alknet, alknet-napi), old
architecture docs (ADRs, specs, open questions), old research docs
(phase2, event-sourcing, feasibility, etc.), old tasks, and obsolete
reference material (gitserver/MPL, honker, nats, rustfs, polyglot,
keystone, distributed-identity).

Keep: alknet-secret (standalone, compiles), pivot docs, iroh and ssh
references, rudolfs reference (MIT/Apache, fork candidate), ops docs,
sdd_process.md, and licenses.

Previous implementation preserved at /workspace/@alkdev/alknet-main/
for reference during porting.

Workspace compiles: cargo check + 14 tests pass for alknet-secret.
2026-06-15 12:08:08 +00:00
parent d003a4f4ec
commit b5a4600d74
261 changed files with 138 additions and 53794 deletions


@@ -1,122 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Alknet Architecture
## Current State
Architecture spec sync in progress. Phase 0 foundation complete (ADRs 001–037).
Phase 1 core modifications partially implemented (interface trait, config split,
identity provider, forwarding policy). Phase 2 core bridge research complete;
spec documents updated to reflect StreamInterface/MessageInterface split,
CredentialProvider as core type, and API keys in DynamicConfig.
Remaining open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
OQ-20 (worker registration), OQ-CP-01 (per-identity credentials), OQ-CP-02
(OIDC provider location), OQ-CP-03 (credential rotation). See
[open-questions.md](open-questions.md).
## Architecture Documents
| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | reviewed | Package purpose, crate structure, three-layer model, exports, dependencies |
| [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh |
| [auth.md](auth.md) | draft | Unified auth: SSH + token + API keys, credential presentation per interface |
| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, OperationEnv, three dispatch paths |
| [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding |
| [server.md](server.md) | reviewed | Server acceptance, IdentityProvider, ForwardingPolicy, channel handling |
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper, reload API, pubsub event target adapter |
| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows |
| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths |
| [interface.md](interface.md) | draft | StreamInterface, MessageInterface, credential presentation, ListenerConfig |
| [configuration.md](configuration.md) | draft | StaticConfig, DynamicConfig, API keys, forwarding policy, reload |
| [storage.md](storage.md) | draft | alknet-storage: metagraph, identity, ACL, honker |
| [flowgraph.md](flowgraph.md) | draft | alknet-flowgraph: call graph, operation graph, petgraph |
| [secret-service.md](secret-service.md) | reviewed | alknet-secret: BIP39, SLIP-0010, AES-GCM, SecretProtocol |
| [credentials.md](credentials.md) | draft | CredentialProvider, CredentialSet (outbound auth) |
| [definitions.md](definitions.md) | draft | Terminology disambiguation and concept mapping |
## Research Documents
| Document | Status | Description |
|----------|--------|-------------|
| [configuration.md](../research/configuration.md) | draft | Configuration architecture (source for promoted spec) |
| [core.md](../research/core.md) | draft | Core overview, transport, call protocol, DNS |
| [services.md](../research/services.md) | draft | irpc service protocols, OperationContext, application services |
| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker |
| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping |
| [integration-plan.md](../research/integration-plan.md) | draft | Phased integration plan for services, pubsub, and operations |
| [feasibility/](../research/feasibility/) | — | SSH tunnel feasibility assessment and related analyses |
| [event-sourcing/](../research/event-sourcing/) | — | Event sourcing patterns and event-driven architecture reference |
| [ops/](../research/ops/) | — | Production ops reference: certbot, fail2ban |
| [phase2/definitions.md](../research/phase2/definitions.md) | draft | Terminology disambiguation (promoted to architecture/definitions.md) |
| [phase2/interface-model.md](../research/phase2/interface-model.md) | draft | StreamInterface/MessageInterface analysis (promoted to interface.md) |
| [phase2/credential-provider.md](../research/phase2/credential-provider.md) | draft | CredentialProvider research (promoted to credentials.md) |
| [phase2/tls-transport.md](../research/phase2/tls-transport.md) | draft | HTTP interface, stealth handoff, ListenerConfig (promoted to interface.md, auth.md) |
## ADR Table
| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport via `AsyncRead+AsyncWrite` trait | Accepted |
| [002](decisions/002-tun-separate-process.md) | TUN shim as separate process | Superseded by ADR-014 |
| [003](decisions/003-iroh-stream-join.md) | iroh stream via `tokio::io::join` | Accepted |
| [004](decisions/004-ssh-over-transport.md) | SSH runs over transport, not alongside | Accepted |
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 as primary interface, TUN as add-on | Accepted |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Accepted |
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | Accepted |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt certificate provisioning | Accepted |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay with override | Accepted |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining in CLI | Accepted |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API, no file-based config | Accepted |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 keys + OpenSSH cert-authority, no password auth | Accepted |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging + built-in rate limiting | Accepted |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN, recommend local SOCKS5 + tun2proxy | Accepted |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI bridge | Accepted |
| [016](decisions/016-napi-expose-connect-and-serve.md) | NAPI exposes both connect() and serve() | Accepted |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted |
| [019](decisions/019-proxy-dual-semantics.md) | `--proxy` dual semantics (client vs server) | Accepted |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted |
| [026](decisions/026-transport-interface-separation.md) | Transport/interface separation (three-layer model) | Accepted |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition (core, secret, storage, flowgraph) | Accepted |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service behind feature flag | Accepted |
| [029](decisions/029-identity-core-type.md) | Identity as core type in alknet-core | Accepted |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split with ArcSwap | Accepted |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy with rule-based allow/deny | Accepted |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary discipline (domain, irpc, call protocol) | Accepted |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted |
| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface split | Accepted |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type (outbound auth) | Accepted |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys as DynamicConfig auth | Accepted |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security (zeroize for v1) | Accepted |
> ADR numbers 020–022 were allocated to proposals that were withdrawn before
> acceptance and are not listed.
## Open Questions
See [open-questions.md](open-questions.md) for all open and resolved questions.
Key resolved questions from Phase 0: OQ-12, OQ-16, OQ-18 (forwarding policy
and identity scopes), OQ-17 (transport-aware auth), OQ-23 (irpc feature flag),
OQ-24 (DNS control channel scope), OQ-25 (crate irpc dependencies), OQ-IF-01
(Interface session / EventEnvelope relationship), OQ-IF-02 (ForwardingPolicy
placement). Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport
TLS), OQ-20 (worker registration).
## Lifecycle Definitions
| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions resolved |
| `reviewed` | Architecture final. Implementation may begin. Changes require review. | → `stable` when implementation is complete and verified |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |


@@ -1,339 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Authentication
## What
A unified authentication layer that works across all transports — SSH-over-any-
transport and WebTransport (non-SSH HTTP-level transports). The same key
material (Ed25519 authorized keys and certificate authorities) is shared across
both auth paths. Identity resolution produces a transport-agnostic `Identity`
that carries scopes and resources for downstream authorization.
## Why
Alknet currently authenticates connections exclusively through SSH public key
auth. Non-SSH transports (WebTransport) cannot perform SSH key exchange — they
need a different auth presentation that shares the same key material. The
unified auth layer ensures one key set, one identity, one rotation mechanism
across all transports. See ADR-023 for the decision context.
The canonical definitions of `Identity` and `IdentityProvider` are in
[identity.md](identity.md). This document covers auth-specific behavior:
auth presentation per transport, `AuthPolicy` structure, and the auth service
relationship.
## Architecture
### Identity and IdentityProvider
See [identity.md](identity.md) for the canonical definitions of:
- `Identity` struct (`{ id, scopes, resources }`)
- `IdentityProvider` trait (`resolve_from_fingerprint()`, `resolve_from_token()`)
- `ConfigIdentityProvider` (default, ArcSwap-backed)
- `StorageIdentityProvider` (production, SQLite-backed, in alknet-storage)
- `AuthProtocol` irpc service (behind `irpc` feature flag)
The key relationship: `IdentityProvider` is the contract. `ConfigIdentityProvider`
is the default implementation (reads from `DynamicConfig.auth`). `AuthProtocol`
irpc service is one way to satisfy the trait, behind a feature flag. Both paths
produce the same `Identity` result. See ADR-028 and ADR-029.
### Credential Presentation Per Interface
Each (Transport, Interface) pair presents credentials differently, but all
resolve to the same `Identity` through `IdentityProvider`. See
[definitions.md](definitions.md) for the full terminology rules.
| (Transport, Interface) | Credential presentation | Resolves via |
|------------------------|------------------------|-------------|
| (TLS, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TCP, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (iroh, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TLS, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, RawFramingInterface) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HttpInterface) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DnsInterface) | AuthToken in query labels | `resolve_from_token()` |
The **key material is shared**. The **credential presentation** differs per
(Transport, Interface) pair. The **verification result is the same**: an
authenticated `Identity` with scopes.
`resolve_from_token()` handles both AuthTokens (Ed25519-signed) and API keys
(hash-verified bearer tokens). The implementation discriminates by prefix or
format — see ADR-037.
### Token Authentication
For non-SSH transports, the client constructs an authentication token:
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
Wire format when passed in a WebTransport CONNECT request:
```
CONNECT https://server:443/alknet?token=<AuthToken>
```
Server verification:
1. Base64url-decode the token
2. Extract `key_id` (first 32 bytes)
3. Look up `key_id` in the same `authorized_keys` set that SSH auth uses
4. Verify the Ed25519 `signature` against `(key_id || timestamp_bytes)` using
the matching public key
5. Check `timestamp` is within the acceptable window (configurable, default
±300 seconds)
6. Resolve to the same `Identity` that SSH pubkey auth would produce
The key fingerprint in the token serves double duty: it identifies which key
to verify against, and it ties the signature to a specific key (swapping
`key_id` invalidates the signature).
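Verification steps 2 and 5 above can be sketched with the standard library alone. This is a minimal sketch, not the project's implementation: `TokenParts`, `split_token`, and `timestamp_in_window` are hypothetical names, the input is assumed to be already base64url-decoded, and the 64-byte signature length is the standard Ed25519 size rather than something the token spec states. Key lookup and signature verification (steps 3 and 4) are deliberately left out.

```rust
// Byte lengths from the token layout; 64 bytes is the standard Ed25519
// signature size (an assumption, the spec above does not state it).
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
const SIGNATURE_LEN: usize = 64;

/// Hypothetical view over a base64url-decoded token.
#[derive(Debug, PartialEq)]
pub struct TokenParts<'a> {
    pub key_id: &'a [u8],
    pub timestamp: u64,
    pub signature: &'a [u8],
}

/// Splits decoded token bytes into key_id || timestamp || signature.
/// Returns None when the length does not match the expected layout.
pub fn split_token(raw: &[u8]) -> Option<TokenParts<'_>> {
    if raw.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let (key_id, rest) = raw.split_at(KEY_ID_LEN);
    let (ts_bytes, signature) = rest.split_at(TIMESTAMP_LEN);
    // The timestamp is Unix seconds as a big-endian u64.
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(ts_bytes);
    Some(TokenParts {
        key_id,
        timestamp: u64::from_be_bytes(ts),
        signature,
    })
}

/// Accepts timestamps within plus or minus `window_secs` of the server clock.
pub fn timestamp_in_window(timestamp: u64, now: u64, window_secs: u64) -> bool {
    timestamp.abs_diff(now) <= window_secs
}
```

Using `abs_diff` keeps the window symmetric, so a client clock that runs slightly ahead of the server is treated the same as one that runs slightly behind.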
### Replay Protection
V1 uses timestamp-only (±300s window, no server state). The replay trade-offs
and future zero-replay options (nonce challenge-response) are documented in
ADR-023.
### IdentityProvider and Auth Service Relationship
The `IdentityProvider` trait (defined in [identity.md](identity.md)) decouples
alknet-core from any specific identity storage. Two implementations exist:
- **ConfigIdentityProvider** (in alknet-core) — reads from
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
No database required. This is the default for minimal deployments.
- **StorageIdentityProvider** (in alknet-storage) — backed by SQLite
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes.
The `AuthProtocol` irpc service (behind the `irpc` feature flag, per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it. Both the trait path and the
irpc path produce the same `Identity` result.
The trait is the contract. The backing store is pluggable. Alknet-core never
depends on Honker, SQLite, or any specific database.
### API Keys
For service accounts, automation, and HTTP interface auth, Ed25519 AuthTokens
are inconvenient — they require client-side key generation and signing. API keys
provide a simpler bearer token format (ADR-037):
```
API key: "alk_dGhlX3NlY3JldA" (~20 chars, configurable prefix)
Storage: SHA-256 hash of the full key
Lookup: prefix match → hash verification → Identity
```
API keys are configured in `DynamicConfig.auth.api_keys`:
```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect"]
description = "dashboard service account"
ttl = "30d" # optional
```
Both AuthTokens and API keys go through `IdentityProvider::resolve_from_token()`.
The implementation discriminates by prefix (default `alk_`): if the token starts
with the API key prefix, it's verified by SHA-256 hash lookup; otherwise, it's
verified as an Ed25519 AuthToken. Both paths produce the same `Identity`.
See [configuration.md](configuration.md) for the full `DynamicConfig.auth`
structure and ADR-037 for the decision context.
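The prefix discrimination described above might look roughly like the following. `TokenKind` and `classify_token` are hypothetical names used only for illustration; the real `resolve_from_token()` would continue from here into hash lookup or signature verification.

```rust
/// Hypothetical classification of a presented bearer string.
#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    /// Starts with the configured API key prefix; verify by SHA-256 hash lookup.
    ApiKey(&'a str),
    /// Anything else; verify as an Ed25519-signed AuthToken.
    AuthToken(&'a str),
}

/// Decides which verification path a token takes, based on its prefix.
/// An empty prefix is treated as "API keys disabled" (an assumption of
/// this sketch), so every token falls through to the AuthToken path.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```

One caveat with prefix-only discrimination: the base64url alphabet includes lowercase letters and `_`, so an AuthToken could in principle begin with `alk_` by chance. An implementation may want to fall back to the AuthToken path when the hash lookup finds no match.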
### AuthPolicy Structure
`AuthPolicy` in `DynamicConfig` holds all auth paths, sharing key material:
```rust
pub struct AuthPolicy {
    pub ssh: SshAuthConfig,
    pub token: TokenAuthConfig,
    pub api_keys: Vec<ApiKeyEntry>,
}

pub struct SshAuthConfig {
    pub authorized_keys: HashSet<PublicKey>,
    pub cert_authorities: Vec<CertAuthorityEntry>,
    // Existing fields from current ServerAuthConfig
}

pub struct TokenAuthConfig {
    pub enabled: bool,
    pub max_token_age: Duration, // Timestamp window (default: 300s)
    pub key_source: TokenKeySource,
}

pub enum TokenKeySource {
    /// Share the same authorized_keys set with SshAuthConfig.
    /// Default and recommended for v1.
    Shared,
    /// Separate key set for non-SSH transports.
    /// For deployments that want distinct access control per transport.
    Separate(HashSet<PublicKey>),
}

pub struct ApiKeyEntry {
    pub prefix: String,              // e.g., "alk_"
    pub hash: String,                // e.g., "sha256:abc..."
    pub scopes: Vec<String>,         // e.g., ["relay:connect", "secrets:derive"]
    pub description: Option<String>, // e.g., "dashboard service account"
    pub expires_at: Option<u64>,     // Unix timestamp, optional TTL
}
```
When `TokenKeySource::Shared` (the default), adding a key to
`authorized_keys` immediately grants access via both SSH and WebTransport.
One key set, one `reloadAuth()` call, one rotation.
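The effect of `TokenKeySource` on key lookup can be sketched as a single selection function. This is illustrative only: `PublicKey` is replaced by a `String` stand-in so the snippet compiles on its own, and `token_keys` is a hypothetical helper name.

```rust
use std::collections::HashSet;

// Stand-in for the real public key type (assumption for this sketch).
type PublicKey = String;

pub enum TokenKeySource {
    Shared,
    Separate(HashSet<PublicKey>),
}

/// Returns the key set that token verification should consult.
/// With `Shared`, this is the same set SSH auth uses, so a key added
/// there is immediately valid for token auth as well.
pub fn token_keys<'a>(
    ssh_keys: &'a HashSet<PublicKey>,
    source: &'a TokenKeySource,
) -> &'a HashSet<PublicKey> {
    match source {
        TokenKeySource::Shared => ssh_keys,
        TokenKeySource::Separate(keys) => keys,
    }
}
```

Returning a reference rather than a copy means the `Shared` case never duplicates key material, which matches the "one key set, one rotation" goal.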
### Auth Flow in the Server
**SSH transport (existing, unchanged):**
```
Client connects → SSH handshake → auth_publickey() callback
→ ServerAuthConfig::authenticate_publickey() or authenticate_certificate()
→ Auth::Accept or Auth::Reject
```
**WebTransport transport (new):**
```
Browser connects → WebTransport CONNECT request
→ SessionRequest inspection: extract token from URL path or header
→ TokenAuthConfig verification: decode token → lookup key_id → verify signature → check timestamp
→ session_request.accept() or session_request.forbidden()
```
After auth, both paths produce an `Identity`. The `Identity` is attached to the
connection and used by `ForwardingPolicy` and the call protocol to make
authorization decisions.
### WebTransport SessionRequest Inspection
The wtransport library's `SessionRequest` provides:
- `path()` — URL path (e.g., `/alknet?token=...`)
- `headers()` — HTTP headers (for `Authorization: Bearer ...`)
- `origin()` — Browser origin (for CORS-like restrictions)
- `remote_address()` — Client UDP address
Token extraction from URL path is preferred for browser WebTransport because
the W3C API (`new WebTransport(url)`) naturally includes query parameters. For
native clients (Deno, CLI), the `Authorization` header is also supported.
### Browser-Side Token Construction
```javascript
// Illustrative — see client SDK for production implementation
async function createAuthToken(keyPair) {
  const publicKey = await crypto.subtle.exportKey('raw', keyPair.publicKey);
  const keyId = new Uint8Array(await crypto.subtle.digest('SHA-256', publicKey));
  const timestamp = new ArrayBuffer(8);
  new DataView(timestamp).setBigUint64(0, BigInt(Math.floor(Date.now() / 1000)));
  const message = new Uint8Array([...keyId, ...new Uint8Array(timestamp)]);
  const signature = await crypto.subtle.sign('Ed25519', keyPair.privateKey, message);
  const token = new Uint8Array([...keyId, ...new Uint8Array(timestamp), ...new Uint8Array(signature)]);
  return btoa(String.fromCharCode(...token))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```
Browsers support Ed25519 key generation and signing via `SubtleCrypto` (Chrome
105+, Firefox 130+, Safari 17+). Deno supports it natively. No external
dependencies needed.
## Constraints
- Auth tokens are Ed25519-signed with the same key pair used for SSH auth. No
separate key management for non-SSH transports.
- `IdentityProvider` is the only interface between alknet-core and identity
storage. No database dependency at the core level.
- The SSH auth path is unchanged. `auth_publickey()` continues to work exactly
as it does today. Token auth is additive.
- Certificate authority tokens are not supported for token auth in v1. CA
verification requires the full OpenSSH certificate structure, which doesn't
fit in a simple signed timestamp. This can be added later if needed.
- Token auth is only available on transports that carry HTTP metadata (URL
path, headers). SSH-over-TCP/TLS/iroh continues to use SSH native auth
exclusively.
- API keys are bearer tokens — anyone who obtains the key has the associated
permissions. The hash storage and optional TTL mitigate but do not eliminate
this risk. Ed25519 AuthTokens remain the preferred auth method for interactive
clients. See ADR-037.
- API keys are verified by SHA-256 hash lookup in `DynamicConfig.auth.api_keys`
(or the `api_keys` database table in production). The full key is provided to
the client exactly once at creation time.
### Security Considerations
**Token in URL**: The auth token is passed as a URL query parameter
(`?token=...`) for browser WebTransport compatibility. This is a known web
security consideration:
- **Server logs**: The token may appear in HTTP access logs. Servers MUST
strip or redact the `token` query parameter before logging the request URL.
- **Browser history**: The token may appear in browser history. Timestamps
limit exposure to the token window (±300s).
- **Referrer headers**: WebTransport does not send referrer headers, so the
token does not leak via HTTP Referer.
- **Native clients**: Deno and native clients SHOULD prefer the `Authorization:
Bearer` header over URL parameters when the client supports custom headers.
## Open Questions
- **OQ-18**: ~~Source of Identity.scopes~~ Resolved per ADR-029 and ADR-031.
`IdentityProvider` owns scopes, `ForwardingPolicy` uses scopes from `Identity`.
See [open-questions.md](open-questions.md).
- **OQ-19**: Should the WebTransport listener require its own TLS identity
(separate from the SSH-over-TLS listener), or can they share the same
certificate? Deferred to Phase 4. See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag; IdentityProvider is the contract |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Credential presentation differs per (Transport, Interface) pair |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |
## Phase 2 Implementation Notes
- `ConfigIdentityProvider::resolve_from_token()` now handles API keys (`alk_` prefix) via SHA-256 hash verification with expiry checking
- `ApiKeyEntry` struct added to `AuthPolicy` with prefix, hash, scopes, description, expires_at fields
- API keys produce `Identity { id: prefix, scopes: from_entry, resources: {} }`
- Both AuthTokens (Ed25519 signed) and API keys (hash-verified bearer) go through `resolve_from_token()`, discriminated by format/prefix
## References
- [identity.md](identity.md) — Canonical Identity and IdentityProvider definitions
- [server.md](server.md) — Current SSH auth handler
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](configuration.md) — DynamicConfig, AuthPolicy, ConfigReloadHandle
- [interface.md](interface.md) — Credential presentation per (Transport, Interface) pair
- [definitions.md](definitions.md) — Terminology disambiguation (IdentityProvider vs CredentialProvider, AuthToken vs API key)
- [services.md](services.md) — AuthProtocol irpc service
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18 (resolved), OQ-19
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API


@@ -1,551 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Call Protocol
## What
A bidirectional, transport-agnostic call and event protocol that runs over
authenticated pipes. It supports request/response calls, streaming
subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, head/worker) register their own operations without modifying core.
OperationEnv extends the call protocol with a universal composition mechanism
that unifies local dispatch, irpc service dispatch, and remote dispatch. A
handler receives `context.env.invoke(namespace, op, input)` and doesn't know
whether the operation runs locally, in-cluster, or on a remote node.
## Why
The current control channel (ADR-018) is unidirectional (client → server) and
provides fire-and-forget event dispatch without request/response semantics.
The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the head/worker model where
workers expose operations the head invokes.
Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away — one handler-facing API, three dispatch backends (ADR-033).
## Architecture
### Operation Paths
Operation names use slash-based paths aligned with URL routing conventions:
```
/{node}/{service}/{op}
```
- **node** — identity prefix of the node that exposes the operation. The head
uses this segment to route calls to the correct connected node.
- **service** — the logical service namespace. Groups related operations
under one handler prefix.
- **op** — the specific operation within that service.
Examples:
| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Node `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Node `dev1`, service `bash`, operation `exec` |
| `/head/agent/chat` | Head's own `agent` service, operation `chat` |
| `/head/sessions/list` | Head's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Worker `browser-1`, `notify` service |
This three-level routing mirrors iroh's ALPN dispatch: the first segment
routes to a connected node (like ALPN routes to a protocol handler), the
remaining path dispatches within that node's registry. See ADR-025 for the
handler/spec separation decision.
The `namespace` field on `OperationSpec` is derived from the path (`namespace`
= second path segment). It's a convenience accessor for ACL matching and
service grouping.
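A minimal sketch of splitting a fully qualified path into its three segments, assuming the `/{node}/{service}/{op}` form exactly. `OperationPath` and `parse_operation_path` are hypothetical names; node-less paths such as `/fs/readFile` are outside the scope of this sketch and are rejected.

```rust
/// Hypothetical parsed form of `/{node}/{service}/{op}`.
#[derive(Debug, PartialEq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    /// The second segment; this is what `namespace` is derived from.
    pub service: &'a str,
    pub op: &'a str,
}

/// Parses a fully qualified operation path.
/// Returns None unless there are exactly three non-empty segments
/// after the leading slash.
pub fn parse_operation_path(path: &str) -> Option<OperationPath<'_>> {
    let rest = path.strip_prefix('/')?;
    let mut segments = rest.split('/');
    let node = segments.next()?;
    let service = segments.next()?;
    let op = segments.next()?;
    if segments.next().is_some() || node.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { node, service, op })
}
```

For `/dev1/fs/readFile` this yields node `dev1`, service `fs`, and op `readFile`, matching the first row of the examples table.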
### Wire Format: EventEnvelope
Every message on the wire is a length-prefixed JSON `EventEnvelope`:
```rust
pub struct EventEnvelope {
    pub r#type: String, // Event type (e.g., "call.requested", "call.responded")
    pub id: String,     // Correlation key (requestId, topic, or "" for broadcasts)
    pub payload: Value, // JSON payload; schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
This is the same format used by `@alkdev/pubsub` adapters. It is JSON because
it must be consumable from JavaScript, Python, and any language. The envelope
is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh
bidirectional streams, WebSocket, or Worker postMessage.
Binary payloads (postcard, protobuf, etc.) are base64-encoded in the `payload`
field. The envelope itself stays JSON for cross-language compatibility.
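The framing itself (4-byte big-endian length prefix followed by the UTF-8 JSON body) can be sketched over any `Read`/`Write` pair using only the standard library. `write_frame` and `read_frame` are hypothetical helper names, and the `max_len` guard is an assumption added here to avoid allocating for an attacker-chosen length; the spec above does not define a frame size limit. JSON serialization is left to the caller.

```rust
use std::io::{self, Read, Write};

/// Writes one frame: big-endian u32 length, then the JSON body bytes.
pub fn write_frame<W: Write>(writer: &mut W, json_body: &str) -> io::Result<()> {
    if json_body.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json_body.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json_body.as_bytes())
}

/// Reads one frame and returns the body as a UTF-8 string.
/// `max_len` is a hypothetical safety limit, not part of the wire format.
pub fn read_frame<R: Read>(reader: &mut R, max_len: usize) -> io::Result<String> {
    let mut len_bytes = [0u8; 4];
    reader.read_exact(&mut len_bytes)?;
    let len = u32::from_be_bytes(len_bytes) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds max_len"));
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both functions are generic over `Read` and `Write`, the same code would apply to any blocking byte stream; an async transport would need the equivalent async read and write calls.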
### Call Protocol Events
Five event types carry request/response and subscription semantics:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
**`call.error` payload**:
```json
{
  "code": "string",
  "message": "string",
  "retryable": false
}
```
**A call is just a subscribe that resolves after one event.** Both `call()` and
`subscribe()` send the same `call.requested` event. The difference is
consumption pattern:
- **`call()`**: Sends `call.requested`, resolves `Promise` on first `call.responded`
- **`subscribe()`**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`
The `id` field carries the `requestId` for correlation.
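The "same request, different consumption" idea can be illustrated with a synchronous stand-in for the event stream. This is a sketch under stated assumptions: `CallEvent`, `consume_call`, and `consume_subscription` are hypothetical names, payloads are plain strings instead of JSON values, and a stream that ends without a terminal event is treated as aborted, which the spec does not specify.

```rust
/// Hypothetical decoded form of the events a caller receives for one requestId.
#[derive(Debug, Clone, PartialEq)]
pub enum CallEvent {
    Responded(String),
    Completed,
    Aborted,
    Error { code: String, message: String, retryable: bool },
}

/// call(): resolves on the first `call.responded`; any other first event fails.
pub fn consume_call<I>(events: I) -> Result<String, CallEvent>
where
    I: IntoIterator<Item = CallEvent>,
{
    match events.into_iter().next() {
        Some(CallEvent::Responded(payload)) => Ok(payload),
        Some(other) => Err(other),
        // Assumption: an empty stream counts as aborted.
        None => Err(CallEvent::Aborted),
    }
}

/// subscribe(): collects every `call.responded` until `call.completed`.
/// `call.aborted` and `call.error` end the stream with an error.
pub fn consume_subscription<I>(events: I) -> Result<Vec<String>, CallEvent>
where
    I: IntoIterator<Item = CallEvent>,
{
    let mut items = Vec::new();
    for event in events {
        match event {
            CallEvent::Responded(payload) => items.push(payload),
            CallEvent::Completed => return Ok(items),
            other => return Err(other),
        }
    }
    // Assumption: a stream that ends without `call.completed` is aborted.
    Err(CallEvent::Aborted)
}
```

Fed the same event sequence, `consume_call` returns only the first payload while `consume_subscription` returns all of them, which is the only difference between the two client-side APIs.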
### Bidirectional Calls and Routing
Both sides of a connection can initiate calls. The head routes calls to workers
using the first path segment:
```
Head (server) Worker: "dev1" (client)
│ │
│ call.requested │
│ name: "/dev1/fs/readFile" │
│ payload: { path: "/src/main.rs" } │
│──────────────────────────────────────────▶│
│ │
│ call.responded │
│ id: <requestId> │
│ payload: { content: "fn main()..." } │
│◀──────────────────────────────────────────│
│ │
│ Worker exposes /dev1/fs/*, │
│ /dev1/bash/* to head │
│ │
│◀─ call.requested ────────────────────────│
│ name: "/head/agent/chat" │
│ payload: { provider: "anthropic", ... } │
│ │
│── call.responded ──────────────────────▶ │
│ id: <requestId> │
│ payload: { completion: "..." } │
```
The head's registry includes:
- **Head-local operations** (`/head/*`) — handled directly
- **Remote operations** (`/{node}/*`) — forwarded to the worker connection
When the head routes `/dev1/fs/readFile` to worker `dev1`, it strips the node
prefix and delivers the call to the worker's local registry as `/fs/readFile`.
The worker doesn't need to know its own alias.
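The head's routing decision can be sketched as a pure function over the path. `Route` and `route_call` are hypothetical names, and the literal `head` node name is taken from the examples above rather than from a stated rule; a real head would also check that the node is actually connected.

```rust
/// Hypothetical routing outcome for a call arriving at the head.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// `/head/*`: handled by the head's own registry, path left intact.
    Local(&'a str),
    /// `/{node}/*`: forwarded to `node` with the node prefix stripped.
    Forward { node: &'a str, local_path: &'a str },
}

/// Splits off the first path segment and decides where the call goes.
/// Returns None for paths without a node segment and a remainder.
pub fn route_call(path: &str) -> Option<Route<'_>> {
    let rest = path.strip_prefix('/')?;
    let idx = rest.find('/')?;
    let (node, local_path) = rest.split_at(idx);
    // `local_path` keeps its leading slash, e.g. "/fs/readFile".
    if node.is_empty() || local_path.len() <= 1 {
        return None;
    }
    if node == "head" {
        Some(Route::Local(path))
    } else {
        Some(Route::Forward { node, local_path })
    }
}
```

For `/dev1/fs/readFile` this produces a forward to `dev1` with the local path `/fs/readFile`, so the worker's registry never sees its own alias.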
### Head/Worker Architecture
```
┌─────────────────────────────────┐
│ Head Node │
│ │
│ Head-local services: │
│ /head/agent/chat (LLM coord) │
│ /head/agent/complete │
│ /head/sessions/list │
│ /head/sessions/history │
│ │
│ Worker registry (discovered): │
│ /dev1/fs/* → dev1 connection │
│ /dev1/bash/* → dev1 connection │
│ /dev2/fs/* → dev2 connection │
│ /browser-1/notify/* → WT conn │
└──────┬───────┬───────┬──────────┘
│ │ │
┌─────────▼┐ ┌───▼────┐ ┌▼───────────┐
│ Worker │ │Worker │ │Browser Worker│
│ "dev1" │ │"dev2" │ │"browser-1" │
│ /fs/* │ │/fs/* │ │/notify/* │
│ /bash/* │ │/bash/* │ │ │
│ /search/*│ │ │ │ │
└──────────┘ └────────┘ └─────────────┘
```
When a worker connects, it registers its operations with the head:
```
worker → head: call.requested { name: "/head/services/register", payload: {
node: "dev1",
operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```
The head adds these to its routing table with the node prefix. Other workers
and browser clients can then call `/dev1/fs/readFile` without knowing how
the head routes it internally.
### Operation Registry
The operation registry maps paths to specs and handlers. **Specs and handlers
are separate** — downstream consumers register both (ADR-025).
```rust
pub struct OperationSpec {
    pub name: String,                  // e.g., "/fs/readFile", "/agent/chat"
    pub namespace: String,             // e.g., "fs", "agent"
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl, // Required scopes/resources
}

pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "/fs/readFile", "/search/query")
    Mutation,     // Side effects (e.g., "/bash/exec", "/sessions/create")
    Subscription, // Streaming (e.g., "/events/subscribe")
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked
    pub required_scopes_any: Option<Vec<String>>, // OR-checked
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```
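The AND/OR semantics of the two scope lists can be illustrated with a short check. This is a sketch under assumptions: `AccessControlSketch` mirrors only the two scope fields, and it assumes that a missing or empty `required_scopes_any` imposes no constraint.

```rust
/// Reduced copy of the two scope fields from `AccessControl` (illustrative).
pub struct AccessControlSketch {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
}

/// True when every AND-checked scope is granted and, if an OR-checked list
/// is present and non-empty, at least one of its scopes is granted.
pub fn scopes_satisfied(ac: &AccessControlSketch, granted: &[String]) -> bool {
    let all_ok = ac.required_scopes.iter().all(|s| granted.contains(s));
    let any_ok = match &ac.required_scopes_any {
        Some(list) if !list.is_empty() => list.iter().any(|s| granted.contains(s)),
        // Assumption: no OR-list means no additional requirement.
        _ => true,
    };
    all_ok && any_ok
}
```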
**Registration is separated from implementation:**
```rust
// Core registers discovery operations
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);
// A dev env worker registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
// A browser client registers notification UDFs
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
Core-provided operations use short paths without a node prefix
(`/services/list`, `/services/schema`). They live on whatever node the
caller is connected to. Worker-prefixed operations (`/dev1/fs/readFile`)
are routed by the head.
### ACL Per Operation Path
Access control rules are path patterns matched against the operation path, with `*` acting as a wildcard:
| Pattern | Matches | Purpose |
|---------|---------|---------|
| `/dev1/*` | All operations on node `dev1` | Full access to a worker |
| `/*/fs/*` | `fs` service on any node | File access across dev envs |
| `/*/bash/*` | `bash` service on any node | Shell access (higher risk) |
| `/head/agent/*` | Head LLM agent | LLM calls |
| `/head/sessions/*` | Head session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific node | One UI notification |
Higher-risk operations (shell, filesystem write) can require tighter scopes
than read-only operations. The ACL evaluates against the caller's
`Identity.scopes` and `Identity.resources` from the auth layer (see auth.md).
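One possible reading of the patterns in the table is sketched below. The exact wildcard semantics are not specified in this document, so the sketch assumes that `*` matches exactly one segment, except as the final segment, where it matches one or more remaining segments. `path_matches` is a hypothetical name.

```rust
/// Match an ACL pattern such as "/*/fs/*" against an operation path.
/// Assumed semantics: "*" matches one segment; a trailing "*" matches
/// one or more remaining segments.
pub fn path_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let is_last = i + 1 == pat.len();
        if *p == "*" && is_last {
            // Trailing wildcard: at least one segment must remain.
            return segs.len() > i;
        }
        match segs.get(i) {
            Some(s) if *p == "*" || p == s => continue,
            _ => return false,
        }
    }
    // No trailing wildcard: the path must have exactly as many segments.
    pat.len() == segs.len()
}
```

A pattern match would only select which rule applies; the scope and resource checks against `Identity` still decide whether the call is allowed.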
### Service Discovery
The `/services/list` and `/services/schema` operations expose what a node
offers. Read-only — no admin operations:
| Operation | Type | Description |
|-----------|------|-------------|
| `/services/list` | Query | List registered operation paths + metadata |
| `/services/schema` | Query | Get `OperationSpec` for a specific operation |
These tell the caller: "here's what you can call." They are not a control
panel. Access control is enforced at the operation level.
### PendingRequestMap
Manages in-flight calls and subscriptions. Correlates `call.responded` events
back to the original `call.requested`:
```rust
pub struct PendingRequestMap {
pending: HashMap<String, PendingEntry>,
}
enum PendingEntry {
Call {
tx: oneshot::Sender<Result<Value>>,
timeout: Instant,
},
Subscribe {
tx: mpsc::Sender<Result<Value>>,
timeout: Option<Instant>,
},
}
```
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription, the mpsc channel is closed and
the entry deleted. When `call.aborted` arrives, the entry is cancelled and
dropped regardless of which side initiated the abort. A `call.aborted` for an
unknown `requestId` is silently discarded; no error response is generated.
Timeouts prevent dangling entries. A background task sweeps expired entries
periodically.
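The lifecycle above can be sketched with standard-library channels standing in for the tokio `oneshot`/`mpsc` types. This is a simplified illustration, not the real `PendingRequestMap`: payloads are plain `String`s, and all names (`PendingSketch`, `on_responded`, `on_finished`, `sweep`) are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::time::{Duration, Instant};

enum Entry {
    Call { tx: Sender<String>, deadline: Instant },
    Subscribe { tx: Sender<String>, deadline: Option<Instant> },
}

#[derive(Default)]
pub struct PendingSketch {
    pending: HashMap<String, Entry>,
}

impl PendingSketch {
    pub fn add_call(&mut self, id: &str, timeout: Duration) -> Receiver<String> {
        let (tx, rx) = channel();
        let deadline = Instant::now() + timeout;
        self.pending.insert(id.to_string(), Entry::Call { tx, deadline });
        rx
    }

    pub fn add_subscription(&mut self, id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), Entry::Subscribe { tx, deadline: None });
        rx
    }

    /// `call.responded`: a call entry is resolved and removed, a
    /// subscription entry receives the value and stays registered.
    pub fn on_responded(&mut self, id: &str, payload: String) {
        match self.pending.remove(id) {
            Some(Entry::Call { tx, .. }) => {
                let _ = tx.send(payload);
            }
            Some(Entry::Subscribe { tx, deadline }) => {
                let _ = tx.send(payload);
                self.pending.insert(id.to_string(), Entry::Subscribe { tx, deadline });
            }
            None => {}
        }
    }

    /// `call.completed` or `call.aborted`: dropping the sender closes the
    /// channel. Unknown ids are ignored silently.
    pub fn on_finished(&mut self, id: &str) {
        self.pending.remove(id);
    }

    /// Remove entries whose deadline has passed.
    pub fn sweep(&mut self, now: Instant) {
        self.pending.retain(|_, e| match e {
            Entry::Call { deadline, .. } => *deadline > now,
            Entry::Subscribe { deadline, .. } => deadline.map_or(true, |d| d > now),
        });
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

Dropping the sender is what signals completion to the receiving side, so no separate "closed" flag is needed in this sketch.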
### Protocol Adapter Layer
The call protocol is transport-agnostic and interface-agnostic by design. It
receives input from two interface categories (ADR-035):
**StreamInterface** produces `InterfaceEvent` frames from a continuous byte
stream (SSH channel, raw framing). The call protocol handler calls `recv()`
on the session to get events.
**MessageInterface** handles individual `InterfaceRequest`/`InterfaceResponse`
pairs (HTTP, DNS). The call protocol handler constructs an `OperationContext`
from the request and invokes the registry directly.
Both paths resolve to the same `OperationRegistry` and `OperationEnv`. Stream-based transports carry the call protocol as follows:
| Transport | Channel mechanism | Direction |
|-----------|-------------------|-----------|
| SSH | Reserved `direct_tcpip` destination (ADR-018) | Bidirectional over SSH channel |
| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream |
| iroh QUIC | Bidirectional `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream |
| WebSocket | Single WS connection | Bidirectional over WS frames |
| Worker | `postMessage` | Bidirectional over structured clone |
The framing is always: 4-byte BE length prefix + JSON. The envelope shape is
the same regardless of transport.
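A minimal blocking sketch of this framing, using only `std::io`, is shown below. It treats the JSON body as opaque bytes. The function names and the `MAX_FRAME_LEN` limit are assumptions for illustration; the document does not state a maximum frame size.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound to avoid allocating on a corrupt length prefix.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json)
}

/// Read one frame and return the JSON bytes without the length prefix.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(body)
}
```

Because the functions are generic over `Read` and `Write`, the same code would apply to any byte stream a transport exposes.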
### OperationEnv — Universal Composition Mechanism
OperationEnv provides the handler-facing API for composing operations. A handler
receives `context.env.invoke(namespace, operation, input)` and gets back a
`ResponseEnvelope` — regardless of which dispatch path the operation takes
(ADR-033).
Three dispatch paths, one API:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`. Service assembly determines
which path each operation uses:
```rust
// Minimal deployment (Phase 1: single node, all local)
let env = OperationEnv::local(local_registry);
// Production deployment (Phase 2+: mix of local and remote)
let env = OperationEnv::new()
.local("auth", auth_registry)
.local("config", config_registry)
.service("secrets", secret_irpc_client)
.remote("worker-1", call_protocol_conn);
```
**Phase boundary**: Phase 1 ships with local dispatch only (direct function
calls through the operation registry). The irpc service dispatch and remote
dispatch paths are contracted here but not built yet. irpc service protocols
(`AuthProtocol`, `SecretProtocol`, etc.) are defined in the specs but the
implementations are Phase 2+ work.
**irpc is one dispatch backend for OperationEnv, not a replacement for the
call protocol or for OperationEnv.** A call protocol handler can call an irpc
service internally (e.g., `/head/auth/verify` calls
`AuthProtocol::VerifyPubkey`) — the layers compose. irpc is behind a feature
flag in alknet-core. See [services.md](services.md) for full OperationEnv and
irpc service details.
### OperationContext
Every handler receives an `OperationContext`:
```rust
pub struct OperationContext {
pub request_id: String,
pub parent_request_id: Option<String>,
pub identity: Option<Identity>,
pub metadata: HashMap<String, Value>,
pub env: OperationEnv,
pub trusted: bool, // set by buildEnv(), not by callers
}
```
- **`identity`**: The authenticated identity making the call. Populated by
`IdentityProvider` from the interface layer ([identity.md](identity.md)).
- **`env`**: The operation environment — namespaced access to other operations.
- **`trusted`**: When a handler calls another operation through `env`, the
nested call is marked `trusted` and skips ACL checks. This avoids checking the
same caller twice: if the caller is allowed to invoke `/head/agent/chat`, and
that handler internally calls `/head/auth/verify`, the nested call is not
re-evaluated against the ACL.
Handler signature:
```rust
fn handle(input: Value, context: OperationContext) -> ResponseEnvelope;
```
### ResponseEnvelope
The universal return type from all three dispatch paths:
```rust
pub struct ResponseEnvelope {
pub request_id: String,
pub result: Result<Value, CallError>,
}
pub struct CallError {
pub code: String,
pub message: String,
pub retryable: bool,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization. irpc service
dispatch produces postcard-encoded results that are decoded into
`ResponseEnvelope`. Remote dispatch receives `call.responded` EventEnvelope
frames and maps them to `ResponseEnvelope`. The handler always gets the same
type back.
### Relationship to @alkdev/pubsub and @alkdev/operations
The call protocol in core is a Rust reimplementation of the same protocol
defined in `@alkdev/operations`. The TypeScript implementation provides:
- `PendingRequestMap` — request/response correlation
- `CallHandler` — bridges pubsub events to operation registry
- `OperationSpec`, `AccessControl`, `Identity` — type definitions
The Rust implementation mirrors these types and behaviors. TypeScript consumers
continue using `@alkdev/operations` over `@alkdev/pubsub` adapters (including
the `event-target-alknet` adapter). Rust consumers use core's registry directly.
Both speak the same wire protocol and can interoperate.
The key principle: **the same `EventEnvelope` can flow from a Rust handler
through core, out over SSH channel, into a JavaScript pubsub adapter, and
be dispatched through `@alkdev/operations`'s call handler** — with zero
translation at the wire level.
### Agent Service Pattern (Downstream Application Concern)
An agent service — coordinating between LLM providers and tool calls — is a
primary downstream use case for the call protocol. It would be just another set
of registered operations with no special treatment:
- `/head/agent/chat` — send a message, get a completion. Routes to the
appropriate LLM provider based on available workers and configuration.
- `/head/agent/complete` — streaming completion. Yields tokens as they arrive.
- `/head/sessions/list` — list session histories (backed by Honker or other
durable storage).
- `/head/sessions/history` — retrieve a specific session's message history.
The agent service uses OperationEnv to invoke tools on workers. **This is a
downstream application concern, not a core requirement.** The call protocol
enables it by providing the universal composition mechanism (ADR-033), but the
agent service itself is built on top, not into the core.
## Constraints
- The call protocol does not depend on Honker, SQLite, or any database. The
`PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf)
can be carried as base64-encoded blobs in the `payload`, but the envelope
itself is always JSON.
- Service discovery (`/services/list`, `/services/schema`) is read-only. No
admin operations are exposed through the call protocol itself.
- Batch is not a protocol primitive. Multiple `call.requested` events with
correlated `requestId`s provide equivalent semantics.
- The node prefix in the operation path is a routing mechanism, not a security
boundary. ACL is enforced at the `AccessControl` level, not by path prefix
alone. A worker that exposes `/dev1/bash/exec` can restrict access via
`required_scopes` — not every authenticated identity should have shell access.
- **OperationEnv composition model matches the `@alkdev/operations` behavioral
contract**: namespace + operation name → invoke with input, return output.
The Rust implementation may differ in structure but must preserve this
contract (ADR-033).
- **irpc is explicitly positioned as one dispatch backend for OperationEnv**
(ADR-033, ADR-028). It is not a replacement for the call protocol or for
OperationEnv.
- **Phase 1 is local dispatch only.** irpc service dispatch and remote dispatch
are contracted in this spec but not built yet. The `OperationEnv::local()`
path is the Phase 1 implementation.
## Open Questions
- **OQ-20**: How does the head track which workers expose which operations when
workers connect and disconnect? Registration on connect and cleanup on
disconnect, or heartbeat-based discovery? See
[open-questions.md](open-questions.md).
- **OQ-22**: ~~Should the call protocol support streaming inputs (client streaming
in gRPC terms)?~~ Resolved — deferred. Current model covers all identified use
cases. See [open-questions.md](open-questions.md).
- **~~OQ-IF-01~~**: ~~How does the `Interface` session type relate to the call
protocol's `EventEnvelope` stream?~~ Resolved — `InterfaceSession::recv()`
returns `Option<InterfaceEvent>` where `InterfaceEvent` carries
`EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`.
The `SshSession` bridge implements this over the `alknet-control:0` channel.
For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize
request/response pairs. See [interface.md](interface.md) and ADR-035.
- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
trait? See [interface.md](interface.md) and [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | irpc is one dispatch backend for OperationEnv |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition with three dispatch paths |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Call protocol accepts events from both interface categories |
## Phase 2 Implementation Notes
- `SshSession::recv()` and `SshSession::send()` now functional — bridged to call protocol via `alknet-control:0` SSH channel using `ControlChannelBridge` with mpsc channels
- `FrameFramedReader`/`FrameFramedWriter` added to `call::frame` for async length-prefixed EventEnvelope I/O
- `RawFramingSession` implemented with first-frame auth: first frame's payload extracted as AuthToken, resolved via `IdentityProvider::resolve_from_token()`, session transitions to authenticated state on success
- `OperationEnv.credentials(service)` method added for outbound credential resolution (ADR-036)
- `CredentialProvider` trait and `CredentialSet` enum defined in `alknet_core::credentials`
## References
- [auth.md](auth.md) — Identity and `IdentityProvider` trait
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [server.md](server.md) — Channel handling and control channel routing
- [transport.md](transport.md) — Transport abstraction
- [identity.md](identity.md) — Identity struct, IdentityProvider trait
- [interface.md](interface.md) — Interface layer, EventEnvelope stream from interfaces
- [configuration.md](configuration.md) — ForwardingPolicy, service metadata
- [services.md](services.md) — OperationEnv, OperationContext, irpc service layer
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
- `@alkdev/storage` (`peer_credentials` table, ACL graph, `Identity`)
- [irpc](/workspace/irpc) — iroh streaming RPC (postcard-only, Rust-to-Rust)
- [iroh](/workspace/iroh) — P2P QUIC transport


@@ -1,209 +0,0 @@
---
status: reviewed
last_updated: 2026-06-02
---
# Client
## What
The alknet client establishes an SSH session to a server (via pluggable transport) and exposes a local SOCKS5 proxy for routing traffic through that session. Port forwarding (`-L` / `-R` style) covers specific service access like Postgres or Redis.
## Why
Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary interface — it's standard, well-supported by browsers and CLI tools, and needs no privileges. Port forwarding covers specific service access. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet (ADR-014).
## Architecture
### Client Components
```
┌────────────────────────────────────────────────────────┐
│ alknet connect │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SOCKS5 │ │ Port │ │ Remote │ │
│ │ Server │ │ Forward │ │ Forward │ │
│ │ :1080 │ │ -L spec │ │ -R spec │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Channel Manager │ │
│ │ (opens direct-tcpip, │ │
│ │ forwarded-tcpip streams) │ │
│ └──────────────┬──────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────┐ │
│ │ SSH Client (russh) │ │
│ │ Handle<ClientHandler> │ │
│ └──────────────┬──────────────────┘ │
│ │ │
│ ┌──────────────▼──────────────────┐ │
│ │ Transport │ │
│ │ (Tcp / Tls / Iroh) │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
```
### SOCKS5 Server
The primary client interface. Listens on a local port (default `127.0.0.1:1080`), accepts SOCKS5 connections, and for each connection:
1. Reads the SOCKS5 handshake (auth method negotiation, target address)
2. Opens a `channel_open_direct_tcpip(target_host, target_port, originator_addr, originator_port)` on the SSH session
3. Converts the SSH channel to a stream via `channel.into_stream()`
4. Runs `tokio::io::copy_bidirectional(&mut local_socket, &mut ssh_stream)` to proxy data
Supports SOCKS5h (domain names resolved server-side) by default. This prevents DNS leaks — the client never resolves target hostnames locally, sending them to the server for resolution instead. This is consistent with the project's privacy design (ADR-006).
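Step 1 includes reading the target address from the SOCKS5 request. The sketch below parses a complete CONNECT request buffer following the RFC 1928 layout (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT). `parse_connect_request` is a hypothetical name, and the method-negotiation exchange that precedes the request is left out.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Parse a SOCKS5 CONNECT request and return (host, port).
/// Domain names are returned unresolved, which is what SOCKS5h relies on.
pub fn parse_connect_request(buf: &[u8]) -> Option<(String, u16)> {
    // VER must be 5 and CMD must be CONNECT (1); buf[2] is reserved.
    if buf.len() < 4 || buf[0] != 0x05 || buf[1] != 0x01 {
        return None;
    }
    let (host, rest) = match buf[3] {
        // ATYP 1: IPv4 address, 4 bytes.
        0x01 => {
            let a = buf.get(4..8)?;
            (Ipv4Addr::new(a[0], a[1], a[2], a[3]).to_string(), buf.get(8..)?)
        }
        // ATYP 3: domain name, 1 length byte followed by the name.
        0x03 => {
            let len = *buf.get(4)? as usize;
            let name = buf.get(5..5 + len)?;
            (String::from_utf8(name.to_vec()).ok()?, buf.get(5 + len..)?)
        }
        // ATYP 4: IPv6 address, 16 bytes.
        0x04 => {
            let a = buf.get(4..20)?;
            let mut octets = [0u8; 16];
            octets.copy_from_slice(a);
            (Ipv6Addr::from(octets).to_string(), buf.get(20..)?)
        }
        _ => return None,
    };
    // The port is the final two bytes, big-endian.
    if rest.len() != 2 {
        return None;
    }
    Some((host, u16::from_be_bytes([rest[0], rest[1]])))
}
```

The returned host and port would be the arguments passed to the `direct-tcpip` channel open in step 2.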
### Port Forwarding
Local port forwards (`-L local_addr:local_port:remote_host:remote_port`):
1. Bind `TcpListener` on `local_addr:local_port`
2. For each accepted connection, open `channel_open_direct_tcpip(remote_host, remote_port, ...)`
3. Proxy bytes bidirectionally via `copy_bidirectional`
Remote port forwards (`-R remote_addr:remote_port:local_host:local_port`):
1. Send `tcpip_forward(remote_addr, remote_port)` to request the server listen on a port
2. When the handler receives `server_channel_open_forwarded_tcpip`, connect to `local_host:local_port`
3. Proxy bytes bidirectionally
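Both forward styles start from a colon-separated spec. A possible parser is sketched below; it is an assumption-heavy illustration. `ForwardSpecSketch` and `parse_forward_spec` are hypothetical names, the three-part form is assumed to bind `127.0.0.1`, and bracketed IPv6 literals are not handled.

```rust
/// Parsed form of "[bind_addr:]bind_port:target_host:target_port".
#[derive(Debug, PartialEq)]
pub struct ForwardSpecSketch {
    pub bind_addr: String,
    pub bind_port: u16,
    pub target_host: String,
    pub target_port: u16,
}

pub fn parse_forward_spec(spec: &str) -> Option<ForwardSpecSketch> {
    let parts: Vec<&str> = spec.split(':').collect();
    // Three parts: bind address omitted (assumed default 127.0.0.1).
    // Four parts: explicit bind address first.
    let (bind_addr, rest) = match parts.len() {
        3 => ("127.0.0.1", &parts[..]),
        4 => (parts[0], &parts[1..]),
        _ => return None,
    };
    if rest[1].is_empty() {
        return None;
    }
    Some(ForwardSpecSketch {
        bind_addr: bind_addr.to_string(),
        bind_port: rest[0].parse().ok()?,
        target_host: rest[1].to_string(),
        target_port: rest[2].parse().ok()?,
    })
}
```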
### Channel Manager
The channel manager owns the `Arc<client::Handle<ClientHandler>>` and provides methods:
- `open_direct_tcpip(host, port)` — open a tunnel channel to a remote host
- `open_streamlocal(socket_path)` — open a tunnel to a Unix socket
- `request_tcpip_forward(addr, port)` — request remote listening
- `cancel_tcpip_forward(addr, port)` — cancel remote listening
It also handles reconnection: if `handle.is_closed()` returns true, attempt reconnection with exponential backoff.
### Reconnection
On transport failure:
1. Detect via `handle.is_closed()` or transport read error
2. Exponential backoff reconnect (1s, 2s, 4s, ... max 30s)
3. Re-establish transport connection
4. Re-authenticate SSH session
5. Notify SOCKS5 server and port forwards (in-flight connections fail, new connections work)
Reconnection is always enabled. The backoff caps at 30 seconds and continues indefinitely until the user terminates the process. Existing TCP connections through the tunnel are lost on reconnect — this is acceptable and consistent with how VPN connections behave.
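The backoff schedule (1s, 2s, 4s, ... capped at 30s) can be expressed as a small function of the attempt number. `backoff_delay` is a hypothetical name, and jitter is not included because the document does not mention it.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based): 2^attempt seconds,
/// capped at 30 seconds.
pub fn backoff_delay(attempt: u32) -> Duration {
    // checked_shl returns None once the shift would exceed the width of u64,
    // in which case the cap applies anyway.
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(30);
    Duration::from_secs(secs)
}
```

A reconnect loop would sleep for `backoff_delay(n)` after the n-th failed attempt and reset `n` to zero once a session is re-established.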
The channel manager orchestrates reconnection: it creates a new transport stream (by calling `transport.connect()` again) and establishes a new SSH session over it (ADR-004). This is a full reconnect — there is no "SSH reconnects over the same transport." Port forward listeners (`-L`, `-R`) are re-registered with the new session after reconnection.
### Programmatic Configuration (ADR-011)
The client uses programmatic configuration — no `~/.ssh/config` parsing, no custom config files. Configuration comes from:
1. **CLI flags**: `--server`, `--identity`, `--transport`, etc.
2. **Library API**: `ConnectOptions` and `ServeOptions` structs in `alknet-core`, constructable programmatically
3. **Environment variables**: `ALKNET_SERVER`, `ALKNET_IDENTITY` as convenience defaults
This approach avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`) and makes the library API clean for programmatic consumers like the NAPI wrapper. Keys can be provided as file paths or in-memory data.
### Key Material Format
Key inputs (`--identity`, `--authorized-keys`, `--cert-authority`, `--key`) accept either:
- **File path**: A filesystem path to a key file (e.g., `~/.ssh/id_ed25519`, `/etc/alknet/ca.pub`)
- **In-memory data**: Raw key bytes provided programmatically via the library API or NAPI wrapper (as `Vec<u8>` in Rust, `Buffer` in Node.js)
The accepted format is **OpenSSH key format** (the format used by `ssh-keygen` and OpenSSH's `~/.ssh/` files). This includes:
- Private keys: OpenSSH format (begins with `-----BEGIN OPENSSH PRIVATE KEY-----`)
- Public keys: OpenSSH format (e.g., `ssh-ed25519 AAAA... user@host`)
- Certificate authority keys: OpenSSH public key format
- Authorized keys files: Standard OpenSSH `authorized_keys` format
PEM-encoded keys (PKCS#1, PKCS#8) are not supported. Use OpenSSH format keys throughout.
### CLI Interface
```bash
# Basic connection (TCP, default port 22)
alknet connect --server example.com --identity ~/.ssh/id_ed25519
# With TLS
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519
# With TLS + insecure (self-signed certs)
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519 --insecure
# With iroh (no public IP needed)
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519
# With iroh + custom relay
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --iroh-relay https://relay.example.com
# With iroh + proxy (transport chaining)
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --proxy socks5://127.0.0.1:1080
# SOCKS5 on custom port
alknet connect --server example.com --socks5 127.0.0.1:1080 --identity ~/.ssh/id_ed25519
# With port forwards
alknet connect --server example.com --forward 5432:db.internal:5432 --forward 6379:redis.internal:6379
# All options
alknet connect \
--server <addr> \ # TCP/TLS server address (required for tcp/tls)
--peer <endpoint-id> \ # iroh endpoint ID, base58-encoded (required for iroh)
--transport tcp|tls|iroh \ # Transport mode
--identity <path-or-buffer> \ # SSH private key (path or in-memory)
--socks5 <addr:port> \ # SOCKS5 listen address (default: 127.0.0.1:1080)
--forward <spec> \ # Port forward spec (repeatable)
--remote-forward <spec> \ # Remote port forward spec (repeatable)
--proxy <url> \ # Upstream proxy (socks5:// or http://)
--iroh-relay <url> \ # iroh relay URL (default: n0 relay)
--tls-server-name <host> \ # SNI hostname for TLS
--insecure # Accept self-signed TLS certs
```
## Constraints
- SOCKS5 is always enabled when `alknet connect` runs (it's the primary interface). Port forwards are optional.
- The client does not log tunnel destinations. The SOCKS5 server connects and proxies — no logging of SOCKS5 request targets.
- Authentication is Ed25519 public key or OpenSSH certificate (ADR-012). No password authentication over SSH.
- Only one SSH session per `alknet connect` process. Multiple sessions = multiple processes (or a future multiplexer).
- No `~/.ssh/config` parsing. Configuration is programmatic via CLI flags, env vars, or library API structs (ADR-011).
- VPN-like "route all traffic" behavior is provided by running `tun2proxy --proxy socks5://127.0.0.1:1080` alongside the client, not by a built-in TUN interface (ADR-014).
- The CLI `alknet connect` command manages a full SSH session with SOCKS5 and port forwarding. The NAPI `connect()` function is a different operation — it opens a single SSH channel as a Duplex stream for programmatic use, with no SOCKS5 server or port forwarding. See napi-and-pubsub.md for details.
## Graceful Shutdown
On SIGTERM or SIGINT:
1. Stop accepting new SOCKS5 connections and port forward connections
2. Send an SSH disconnect message to the server
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds)
4. Close the transport stream
5. Exit
In-flight connections are not preserved across shutdown — they receive a connection reset. This matches the behavior of standard SSH tunnel tools.
## Error Handling
Error handling follows the project's layered pattern (see overview.md):
- **Transport errors**: Trigger reconnection with exponential backoff (see Reconnection section above). If reconnection fails indefinitely, the process continues retrying until the user terminates it.
- **Auth errors**: Cause reconnection retry. After repeated auth failures, the SOCKS5 server and port-forward listeners remain active but new channel opens fail until reconnection succeeds.
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels.
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found) exit immediately.
## Open Questions
None — all resolved.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Client does not log SOCKS5 request targets |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs, env vars, or CLI flags |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | No password auth; OpenSSH cert-authority for multi-user |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |


@@ -1,329 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Configuration
## What
Alknet's configuration is split into `StaticConfig` (immutable after startup) and
`DynamicConfig` (hot-reloadable at runtime), with `ArcSwap` providing lock-free
reads on the hot path. `ConfigService` wraps reloads behind an irpc protocol
for production deployments.
## Why
Three specific failures motivated the split (ADR-030):
1. No hot reload of authentication credentials — adding a key requires a restart.
2. No port forwarding access control — any authenticated client has unrestricted
access (ADR-031).
3. No structured configuration beyond CLI flags — operators need config files
and the NAPI layer needs programmatic reload.
The split is clean: anything that affects SSH handshake or socket binding is
static; anything checked per-connection or per-channel is dynamic.
## Architecture
### StaticConfig
Immutable after startup. Constructed from `ServeOptions` (the builder pattern
is preserved per ADR-011). Contains:
- Transport mode, listen address
- TLS config (cert, key)
- iroh config (relay URL)
- Stealth mode flag
- Host key, host key algorithm
- Max auth attempts, max connections per IP
- Proxy config
Changing any of these requires a restart.
### DynamicConfig
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains:
- `AuthPolicy` — authorized keys, certificate authorities, token config
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
- `RateLimitConfig` — rate limiting parameters
`ArcSwap` provides lock-free reads. Every `auth_publickey()` and
`channel_open_direct_tcpip()` call loads the current `Arc<DynamicConfig>`
without taking a lock, so the hot path pays no meaningful overhead relative to
the current approach. Writes are atomic: `store()` swaps the pointer.
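The snapshot behaviour of a pointer swap can be illustrated with the standard library alone. Note that this sketch deliberately swaps in `RwLock<Arc<T>>` as a stand-in for `ArcSwap`, so it is not lock-free; it only shows that a reader holding an earlier `Arc` keeps the old config after a reload. `SwappableSketch` and `DynamicConfigSketch` are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Reduced stand-in for `DynamicConfig` with a single illustrative field.
#[derive(Debug, PartialEq)]
pub struct DynamicConfigSketch {
    pub max_per_minute: u32,
}

/// std-only stand-in for `ArcSwap<DynamicConfig>` (uses a lock, unlike ArcSwap).
pub struct SwappableSketch {
    inner: RwLock<Arc<DynamicConfigSketch>>,
}

impl SwappableSketch {
    pub fn new(cfg: DynamicConfigSketch) -> Self {
        SwappableSketch { inner: RwLock::new(Arc::new(cfg)) }
    }

    /// Take a cheap snapshot: clones the Arc, not the config.
    pub fn load(&self) -> Arc<DynamicConfigSketch> {
        let guard = self.inner.read().unwrap();
        Arc::clone(&guard)
    }

    /// Replace the config; existing snapshots are unaffected.
    pub fn store(&self, cfg: DynamicConfigSketch) {
        *self.inner.write().unwrap() = Arc::new(cfg);
    }
}
```

A connection that called `load()` before a reload keeps evaluating against its snapshot, while later `load()` calls see the new value.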
### API Keys
`DynamicConfig.auth` also includes API keys for service accounts and HTTP
interface auth (ADR-037):
```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect"]
description = "dashboard service account"
ttl = "30d" # optional
```
API keys are verified by `ConfigIdentityProvider::resolve_from_token()` — if
the token starts with the configured prefix, it's treated as an API key and
verified by SHA-256 hash lookup. Otherwise, it's treated as an Ed25519 AuthToken.
Both paths produce the same `Identity` result.
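The prefix-based branching can be sketched as a classification step. `TokenKind` and `classify_token` are hypothetical names; the SHA-256 lookup and the Ed25519 verification that follow each branch are omitted, and an empty prefix is assumed never to match.

```rust
/// Which verification path a presented token should take.
#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    ApiKey(&'a str),
    AuthToken(&'a str),
}

/// Tokens starting with the configured prefix are treated as API keys;
/// everything else is treated as an Ed25519 AuthToken.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```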
### ConfigReloadHandle
```rust
pub struct ConfigReloadHandle {
dynamic: Arc<ArcSwap<DynamicConfig>>,
}
impl ConfigReloadHandle {
pub fn reload(&self, new_config: DynamicConfig) { ... }
}
```
Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload.
### ConfigServiceImpl
The Phase 1 implementation of config service logic, backed by
`ArcSwap<DynamicConfig>`. Where `ConfigIdentityProvider` wraps the auth section
of `DynamicConfig`, `ConfigServiceImpl` wraps the forwarding and rate-limit
sections. Both are ArcSwap-backed and share the same `DynamicConfig` instance.
```rust
pub struct ConfigServiceImpl {
dynamic: Arc<ArcSwap<DynamicConfig>>,
}
impl ConfigServiceImpl {
pub fn forwarding_policy(&self) -> Arc<ForwardingPolicy> {
self.dynamic.load().forwarding.clone()
}
pub fn rate_limits(&self) -> Arc<RateLimitConfig> {
self.dynamic.load().rate_limits.clone()
}
pub fn reload(&self, new_config: DynamicConfig) {
self.dynamic.store(Arc::new(new_config));
}
}
```
Phase 1 deploys `ConfigServiceImpl` directly — no irpc service boundary. The
`ConfigProtocol` irpc service (behind feature flag) wraps `ConfigServiceImpl`
for production deployments that use the service layer. This mirrors the
`ConfigIdentityProvider` / `AuthProtocol` pattern from [identity.md](identity.md)
and ADR-028.
### ConfigService irpc Service
```rust
enum ConfigProtocol {
GetForwardingPolicy,
GetRateLimits,
ReloadForwarding { policy: ForwardingPolicy },
ReloadRateLimits { limits: RateLimitConfig },
}
```
Behind the `irpc` feature flag. For production deployments that use the service
layer. For minimal deployments, direct `ConfigReloadHandle::reload()` is
sufficient.
### ForwardingPolicy
Part of DynamicConfig (ADR-031). Evaluated per-channel-open, matched against
the authenticated `Identity`. Rules are evaluated in order; first match wins.
Default determines fallback.
```rust
pub struct ForwardingPolicy {
pub default: ForwardingAction,
pub rules: Vec<ForwardingRule>,
}
```
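First-match-wins evaluation can be sketched as follows. The rule shape is an assumption based on the `target = "localhost:*"` example in the TOML config: a `host:port` pattern where `*` matches any value in that position. Matching against the authenticated `Identity` is left out, and all type names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ActionSketch {
    Allow,
    Deny,
}

pub struct RuleSketch {
    pub target: String, // "host:port", either part may be "*"
    pub action: ActionSketch,
}

pub struct PolicySketch {
    pub default: ActionSketch,
    pub rules: Vec<RuleSketch>,
}

fn part_matches(pattern: &str, value: &str) -> bool {
    pattern == "*" || pattern == value
}

impl PolicySketch {
    /// Walk the rules in order and return the first matching action,
    /// falling back to the default when nothing matches.
    pub fn evaluate(&self, host: &str, port: u16) -> ActionSketch {
        let port = port.to_string();
        for rule in &self.rules {
            if let Some((h, p)) = rule.target.rsplit_once(':') {
                if part_matches(h, host) && part_matches(p, &port) {
                    return rule.action;
                }
            }
        }
        self.default
    }
}
```

Because order matters, a narrow deny rule placed before a broad allow rule takes precedence for the targets it covers.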
### TOML Config File
Optional convenience input format (amends ADR-011, does not replace
programmatic API). Covers static config plus initial auth/forwarding paths.
```toml
[server]
# Stream-based listener: TLS + SSH on port 443
[[listeners]]
type = "stream"
transport = "tls"
interface = "ssh"
listen = "0.0.0.0:443"
[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"
# Stream-based listener: TCP + SSH on port 22
[[listeners]]
type = "stream"
transport = "tcp"
interface = "ssh"
listen = "0.0.0.0:22"
# Stream-based listener: iroh P2P
[[listeners]]
type = "stream"
transport = "iroh"
iroh_relay = "https://relay.alk.dev"
# Message-based listener: HTTP on port 443 (with stealth)
[[listeners]]
type = "http"
listen = "0.0.0.0:443"
tls = true
stealth = true
# Message-based listener: HTTP on port 8080 (separate, no stealth)
# [[listeners]]
# type = "http"
# listen = "0.0.0.0:8080"
# tls = false
# stealth = false
# Message-based listener: DNS on port 53
# [[listeners]]
# type = "dns"
# listen = "0.0.0.0:53"
# tls = false
[auth]
host_key = "/etc/alknet/ssh/host_key"
[auth.ssh]
authorized_keys = [...]
[auth.token]
enabled = true
max_token_age = "5m"
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect"]
description = "dashboard service account"
ttl = "30d"
[forwarding]
default = "deny"
[[forwarding.rules]]
target = "localhost:*"
action = "allow"
```
### NAPI Reload API
```typescript
interface AlknetServer {
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
reloadForwarding(policy: ForwardingPolicyConfig): void;
reloadAll(config: DynamicConfig): void;
}
```
### Multi-Transport Listeners
A head node may accept connections on multiple transports and interfaces simultaneously.
Listeners come in two categories: stream-based (Transport + StreamInterface pairs) and
message-based (self-contained HTTP or DNS servers).
```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool, // byte-peek protocol detection on shared port
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}
```
For stream-based listeners, `Server::run()` spawns one accept loop per listener.
For HTTP listeners, it spawns an axum server. For DNS listeners, it spawns a DNS
server. All share `DynamicConfig`, `ConnectionRateLimiter`, sessions, and
shutdown signal.
```toml
[[listeners]]
transport = "tls"
listen = "0.0.0.0:443"
stealth = true
[[listeners]]
transport = "tcp"
listen = "0.0.0.0:22"
[[listeners]]
transport = "iroh"
iroh_relay = "https://relay.alk.dev"
```
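The per-listener dispatch described for `Server::run()` can be sketched as a plain `match` over `ListenerConfig`. This is a planning-only sketch that spawns nothing: `ListenerTask` and `plan_listeners` are hypothetical names, and the `TransportKind` and `StreamInterfaceKind` variants are reduced to the ones mentioned in this section.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind {
    Tcp,
    Tls,
    Iroh,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamInterfaceKind {
    Ssh,
}

pub enum ListenerConfig {
    Stream { transport: TransportKind, interface: StreamInterfaceKind },
    Http { bind_addr: SocketAddr, tls: bool, stealth: bool },
    Dns { bind_addr: SocketAddr, tls: bool },
}

/// Hypothetical description of the task the server would start for a listener.
#[derive(Debug, PartialEq, Eq)]
pub enum ListenerTask {
    AcceptLoop(TransportKind, StreamInterfaceKind),
    HttpServer(SocketAddr),
    DnsServer(SocketAddr),
}

/// Maps each configured listener to the kind of task it needs.
pub fn plan_listeners(configs: &[ListenerConfig]) -> Vec<ListenerTask> {
    configs
        .iter()
        .map(|config| match config {
            // Stream listeners get one accept loop each.
            ListenerConfig::Stream { transport, interface } => {
                ListenerTask::AcceptLoop(*transport, *interface)
            }
            // Message-based listeners are self-contained servers.
            ListenerConfig::Http { bind_addr, .. } => ListenerTask::HttpServer(*bind_addr),
            ListenerConfig::Dns { bind_addr, .. } => ListenerTask::DnsServer(*bind_addr),
        })
        .collect()
}
```

In a real server each planned task would additionally receive handles to the shared `DynamicConfig`, rate limiter, and shutdown signal.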
### CLI vs Programmatic Behavior
| Interface | Static config | Dynamic config | Reload mechanism |
|-----------|--------------|----------------|------------------|
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
| Core Rust | `StaticConfig` struct | `AuthProtocol` (irpc) or `ConfigIdentityProvider` (ArcSwap) | `ConfigProtocol::ReloadDynamicConfig` or `ConfigReloadHandle::reload()` |
| NAPI | `serve()` options | Same as Core Rust | `server.reloadAuth()`, `server.reloadForwarding()` |
## Constraints
- `StaticConfig` cannot be changed after startup. Changing transport mode,
listen address, TLS config, or host key requires a restart.
- `DynamicConfig` is reloaded atomically via `ArcSwap`. Existing connections
continue with their current config; new connections get the new config.
- Config file is optional. `ServeOptions` builder pattern remains the primary
API (amends ADR-011, does not supersede it).
- No file watching (OQ-13 resolved: potential attack vector, unnecessary
complexity).
- Client configuration stays as `ConnectOptions` — no `ArcSwap` needed.
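The snapshot behaviour in the second constraint (existing connections keep their config, new connections see the reloaded one) can be illustrated with a std-only stand-in. The sketch below uses `RwLock<Arc<_>>` in place of `ArcSwap` purely so that it has no external dependencies; `ConfigHandle` and the `authorized_keys` field are hypothetical.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DynamicConfig {
    /// Hypothetical field, present only to make the example observable.
    pub authorized_keys: Vec<String>,
}

/// Std-only stand-in for an `ArcSwap<DynamicConfig>`.
pub struct ConfigHandle {
    current: RwLock<Arc<DynamicConfig>>,
}

impl ConfigHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Taken once per new connection: a cheap, immutable snapshot.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().expect("config lock poisoned");
        Arc::clone(&*guard)
    }

    /// Swaps in a whole new config; snapshots already handed out are unaffected.
    pub fn reload(&self, next: DynamicConfig) {
        *self.current.write().expect("config lock poisoned") = Arc::new(next);
    }
}
```

A connection that called `load()` before a reload keeps reading the old `Arc`, which is dropped once the last such connection ends.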
## Open Questions
- None. All configuration-related questions are resolved per ADR-030, ADR-031,
and the resolved OQs in [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | Immutable transport vs. reloadable auth/forwarding |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | Amended, not superseded — TOML is convenience layer |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Rule-based allow/deny, TransportKind-aware |
| [029](decisions/029-identity-core-type.md) | Identity as core type | DynamicConfig.auth consumed by IdentityProvider |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | ConfigService wraps DynamicConfig reloads |
## Phase 2 Implementation Notes
- `DynamicConfig.auth` now includes `api_keys: Vec<ApiKeyEntry>` (ADR-037)
- `DynamicConfig.credentials: HashMap<String, CredentialSet>` added for static outbound credentials (ADR-036)
- `ListenerConfig` restructured from flat struct to enum: `Stream { transport, interface }`, `Http { config: HttpListenerConfig }`, `Dns { config: DnsListenerConfig }` (ADR-035)
- `HttpListenerConfig` and `DnsListenerConfig` builder-pattern structs added
- `ListenerConfig::validate()` now validates all three variants
## References
- [research/configuration.md](../research/configuration.md) — Full analysis and proposed solution
- [identity.md](identity.md) — IdentityProvider trait, DynamicConfig.auth
- [ADR-013](decisions/013-fail2ban-friendly-logging.md) — Rate limiting parameters


@@ -1,263 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Credentials (Outbound Auth)
## What
The `CredentialProvider` trait and `CredentialSet` enum handle **outbound**
authentication: how alknet authenticates _to_ external and self-hosted services.
This is the complement to `IdentityProvider`, which handles **inbound**
authentication (who is calling alknet).
## Why
Without `CredentialProvider`, each service wrapper would independently solve
credential retrieval, caching, and lifecycle management. Cloud API integrations
(vast.ai, runpod) need API keys. Self-hosted services (rustfs, gitea) need
S3 access keys or OIDC tokens. The secret service can store these at rest, but
the wiring between "decrypt a credential from storage" and "use it in an HTTP
request" doesn't exist yet.
`CredentialProvider` provides a unified abstraction — just as `IdentityProvider`
unifies inbound auth, `CredentialProvider` unifies outbound auth. Handlers
access credentials through `OperationEnv`, not by reaching into storage directly.
## Architecture
### Direction: Inbound vs Outbound
| | IdentityProvider | CredentialProvider |
|---|---|---|
| **Direction** | Inbound (who is calling alknet) | Outbound (how alknet calls others) |
| **Resolves** | Fingerprint/token → `Identity` | Service name → `CredentialSet` |
| **Storage** | `peer_credentials`, `api_keys` | Encrypted nodes in metagraph |
| **Lifecycle** | Stateless lookup | May need refresh (OIDC tokens, S3 sessions) |
| **Location** | `alknet_core::auth` | `alknet_core::credentials` |
Both live at the same architectural layer. A handler receives an
`OperationContext` with `identity` (who called us) and can access credentials
through `context.env` (how we call out).
### CredentialProvider Trait
```rust
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}
```
The trait is intentionally narrow. It returns credentials for a named service.
It does not abstract the auth mechanism — that stays with the service wrapper
that knows the protocol (S3 signing, OAuth2 refresh, etc.).
### CredentialSet
```rust
pub enum CredentialSet {
    ApiKey {
        header_name: String,
        token: String,
    },
    Basic {
        username: String,
        password: String,
    },
    Bearer {
        token: String,
    },
    S3AccessKey {
        access_key: String,
        secret_key: String,
        session_token: Option<String>,
    },
    OidcToken {
        access_token: String,
        refresh_token: Option<String>,
        expires_at: Option<u64>,
    },
    Custom {
        scheme: String,
        params: HashMap<String, String>,
    },
}
```
Each variant carries the data needed for a specific auth mechanism. The service
wrapper that requested the credentials knows what variant it expects and how to
use it.
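As an illustration of "the service wrapper knows how to use it", the sketch below maps the header-based variants to an HTTP header pair. `auth_header` is a hypothetical helper, not part of the spec; variants that need more than a static header (Basic needs Base64 encoding, S3 needs request signing, Custom is scheme-specific) are deliberately left to the wrapper and return `None` here.

```rust
use std::collections::HashMap;

pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}

/// Hypothetical helper: returns `(header name, header value)` for variants
/// that can be expressed as a single static header.
pub fn auth_header(creds: &CredentialSet) -> Option<(String, String)> {
    match creds {
        CredentialSet::ApiKey { header_name, token } => {
            Some((header_name.clone(), token.clone()))
        }
        CredentialSet::Bearer { token } => {
            Some(("Authorization".to_string(), format!("Bearer {token}")))
        }
        // An OIDC access token is presented as a bearer token.
        CredentialSet::OidcToken { access_token, .. } => {
            Some(("Authorization".to_string(), format!("Bearer {access_token}")))
        }
        // These need encoding or signing logic that belongs to the wrapper.
        CredentialSet::Basic { .. }
        | CredentialSet::S3AccessKey { .. }
        | CredentialSet::Custom { .. } => None,
    }
}
```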
### CredentialProvider vs IdentityProvider
These are opposite-direction abstractions that compose through `OperationEnv`:
```
Incoming Request
  │
  ▼
IdentityProvider (credential → Identity)
  ├── SSH fingerprint → Identity.id, .scopes, .resources
  ├── Bearer AuthToken → Identity.id, .scopes, .resources
  └── API key → Identity.id, .scopes, .resources
  │
  ▼
OperationContext { identity, env, ... }
  ├── context.env.invoke("git", "push", input)
  │     └── GitService handler
  │           └── CredentialProvider (outbound)
  │                 └── get_credentials("rustfs")
  │                       └── S3AccessKey { access_key, secret_key }
  └── context.env.invoke("secrets", "derive", input)
        └── local dispatch to SecretProtocol

Two directions: Inbound  (who is calling us)
                Outbound (how we call others)
```
### SecretStoreCredentialProvider (Phase 1 Default)
The default `CredentialProvider` implementation. Decrypts credentials via
`SecretProtocol::Decrypt` and holds them in RAM:
```rust
pub struct SecretStoreCredentialProvider {
    credentials: ArcSwap<HashMap<String, CredentialSet>>,
}
```
At startup, the CLI or NAPI assembly loads credentials from the secret service
and populates the `ArcSwap`. The `refresh_credentials()` method re-decrypts
after a `Lock`/`Unlock` cycle on the secret service.
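A minimal sketch of this load-then-refresh lifecycle, under several assumptions: `RwLock<Arc<_>>` stands in for `ArcSwap` to keep the example std-only, the `Loader` closure stands in for a `SecretProtocol::Decrypt` round trip, and `CredentialSet` is reduced to one variant. `InMemoryCredentialProvider` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSet {
    Bearer { token: String },
}

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

/// Hypothetical loader standing in for decrypting all configured credentials.
pub type Loader = Box<dyn Fn() -> HashMap<String, CredentialSet> + Send + Sync>;

pub struct InMemoryCredentialProvider {
    credentials: RwLock<Arc<HashMap<String, CredentialSet>>>,
    loader: Loader,
}

impl InMemoryCredentialProvider {
    /// Populates the in-RAM map once at startup.
    pub fn new(loader: Loader) -> Self {
        let initial = Arc::new(loader());
        Self { credentials: RwLock::new(initial), loader }
    }
}

impl CredentialProvider for InMemoryCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        let snapshot = self.credentials.read().ok()?;
        snapshot.get(service).cloned()
    }

    /// Re-runs the loader and replaces the whole map before answering.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        let fresh = Arc::new((self.loader)());
        *self.credentials.write().ok()? = Arc::clone(&fresh);
        fresh.get(service).cloned()
    }
}
```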
### ManagedCredentialProvider (Phase C Future)
For self-hosted services that need active lifecycle management (S3 session
token rotation, OIDC token refresh). Wraps `SecretStoreCredentialProvider`
with per-service `CredentialManager` instances:
```rust
pub struct ManagedCredentialProvider {
    base: SecretStoreCredentialProvider,
    managers: HashMap<String, Arc<dyn CredentialManager>>,
}

pub trait CredentialManager: Send + Sync + 'static {
    fn refresh(&self, current: &CredentialSet) -> Option<CredentialSet>;
    fn is_expired(&self, current: &CredentialSet) -> bool;
    fn provision(&self, identity: &Identity) -> Option<CredentialSet>;
}
```
- `refresh`: OIDC token refresh, S3 session token rotation
- `is_expired`: Check TTL before use
- `provision`: Create credentials on a self-hosted service for a given identity
This is a Phase C concept. The spec defines the extension point but defers
implementation.
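Since implementation is deferred, the following is only a sketch of what an `is_expired`-style check could look like for `OidcToken`. It assumes `expires_at` holds Unix seconds (the spec does not state the unit), takes the current time as a parameter so the logic stays testable, and adds a hypothetical clock-skew margin. `CredentialSet` is reduced to two variants.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSet {
    Bearer { token: String },
    OidcToken {
        access_token: String,
        refresh_token: Option<String>,
        expires_at: Option<u64>,
    },
}

/// Returns true if the credential should be refreshed before use.
/// Assumption: `expires_at` is a Unix timestamp in seconds.
pub fn is_expired_at(current: &CredentialSet, now_unix_secs: u64, skew_secs: u64) -> bool {
    match current {
        // Treat the token as expired slightly early to absorb clock skew.
        CredentialSet::OidcToken { expires_at: Some(deadline), .. } => {
            now_unix_secs.saturating_add(skew_secs) >= *deadline
        }
        // No expiry information: nothing to check at this layer.
        _ => false,
    }
}
```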
### Integration with OperationEnv
Handlers access credentials through `OperationEnv`:
```rust
// Handler needs outbound credentials for a service
let creds = context.env.get_credentials("rustfs");
```
This is analogous to how `context.env.invoke(namespace, op, input)` works for
operation dispatch — the handler doesn't know whether the credential comes from
config, the secret service, or a managed provider.
### Integration with SecretProtocol
Credentials are stored encrypted in the metagraph via `SecretProtocol`:
1. Operator configures credentials: `alknet credential add vast-ai --type bearer --token-file ./key.txt`
2. CLI encrypts via `SecretProtocol::Encrypt` (AES-256-GCM, key at path `m/74'/2'/0'/0'`)
3. Encrypted credential stored as `EncryptedData` node in metagraph, tagged with service name
4. At startup, `SecretStoreCredentialProvider` calls `SecretProtocol::Decrypt` for each configured service
5. Decrypted credentials held in RAM with same lifecycle as the seed (purged on `Lock`)
The `EncryptedData` wire format is shared with alknet-storage by type-level
compatibility, not a crate dependency.
### Identity-Bound Credentials (Phase B+ Future)
For multi-tenant setups where different alknet users have different access levels
on the same external service:
```rust
// Service-level credential (all users share one key):
credential_provider.get_credentials("rustfs")
// Identity-bound credential (per-user key):
credential_provider.get_credentials_for("rustfs", &identity.id)
```
The trait-level method is service-level. The identity-bound method is an
extension in alknet-storage that uses `Identity.id` (the account UUID in
database-backed deployments) as the lookup key. No separate `account_id` field
needed — `Identity.id` IS the account identifier.
## Constraints
- `CredentialProvider` and `CredentialSet` live in `alknet_core::credentials`.
No database dependency at the core level.
- `CredentialProvider` does not depend on `IdentityProvider`. They compose
through `OperationEnv`, not through dependency.
- `ManagedCredentialProvider` and `CredentialManager` are Phase C concepts.
They are defined as extension points but not implemented yet.
- Identity-bound credentials use `Identity.id` as the account key. In
config-backed deployments, this is the fingerprint or key prefix. In
database-backed deployments, this is the account UUID.
- `SecretStoreCredentialProvider` depends on `SecretProtocol::Decrypt`, which
requires the alknet-secret crate. A stub impl that reads from config is
sufficient for Phase 2 when alknet-secret isn't available.
- The `CredentialSet` variants cover all identified credential types (Phases
A–C). Phase D (alknet as OIDC provider) is additive.
## Phase Progression
| Phase | CredentialProvider Scope | Notes |
|-------|-------------------------|-------|
| Phase 2 (now) | Trait + `CredentialSet` in core. `SecretStoreCredentialProvider` stub reads from config. | Enables Phase 2 HTTP auth |
| Phase A | `SecretStoreCredentialProvider` backed by `SecretProtocol::Decrypt`. CLI command for credential management. | Full secret service integration |
| Phase B | `FromOpenAPI` integration. `CredentialProvider` populates `HttpServiceConfig.auth`. | Auto-registration of external services |
| Phase C | `ManagedCredentialProvider` + `CredentialManager`. S3 signing, OIDC refresh, identity-bound credentials. | Production self-hosted services |
| Phase D | Alknet as OIDC provider. Eliminates stored credentials for OIDC-compatible services. | Long-term goal |
## Open Questions
- **OQ-CP-01**: Should `CredentialProvider` support per-identity credentials
(`get_credentials(service, identity)`)? See [open-questions.md](open-questions.md).
- **OQ-CP-02**: Where should OIDC provider operations live if alknet becomes
an OIDC provider (Phase D)? See [open-questions.md](open-questions.md).
- **OQ-CP-03**: How do credential rotations propagate across a cluster? See
[open-questions.md](open-questions.md).
- **OQ-CP-04**: Should `CredentialSet` include request-signing capability?
See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials`, parallel to IdentityProvider |
| [029](decisions/029-identity-core-type.md) | Identity as core type | Inbound auth — the opposite direction |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
## References
- [identity.md](identity.md) — IdentityProvider (inbound auth, opposite direction)
- [secret-service.md](secret-service.md) — SecretProtocol, EncryptedData
- [services.md](services.md) — OperationEnv, OperationContext
- [definitions.md](definitions.md) — IdentityProvider vs CredentialProvider disambiguation
- [research/phase2/credential-provider.md](../research/phase2/credential-provider.md) — Full analysis with rustfs/gitea integration


@@ -1,26 +0,0 @@
# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
## Status
Accepted
## Context
Alknet needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in alknet's own type system or handle each transport case-by-case.
## Decision
Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
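The shape of this abstraction can be sketched with a synchronous, std-only analogue. The real trait is described as producing `AsyncRead + AsyncWrite` streams; the version below substitutes `std::io::Read + Write` so it runs without an async runtime. `MockTransport` and `read_banner` are hypothetical names used to show protocol code that never learns which transport it is on.

```rust
use std::io::{self, Cursor, Read, Write};

/// Synchronous stand-in for the `Transport` trait.
pub trait Transport {
    type Stream: Read + Write;
    fn connect(&self) -> io::Result<Self::Stream>;
}

/// Mock transport that produces an in-memory stream pre-filled with peer bytes.
pub struct MockTransport {
    pub incoming: Vec<u8>,
}

impl Transport for MockTransport {
    type Stream = Cursor<Vec<u8>>;

    fn connect(&self) -> io::Result<Self::Stream> {
        Ok(Cursor::new(self.incoming.clone()))
    }
}

/// Protocol-side code: generic over the transport, it only sees a byte stream.
pub fn read_banner<T: Transport>(transport: &T) -> io::Result<String> {
    let mut stream = transport.connect()?;
    let mut text = String::new();
    stream.read_to_string(&mut text)?;
    Ok(text.trim_end().to_string())
}
```

Adding a transport in this model means adding another `impl Transport`; `read_banner` is untouched.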
## Consequences
- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch: `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`.
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)


@@ -1,30 +0,0 @@
# ADR-002: TUN Shim as Separate Process
## Status
Superseded by ADR-014
## Context
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core alknet binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
## Decision
The TUN functionality is a separate `alknet-tun` binary that:
1. Creates a TUN device (requires root / CAP_NET_ADMIN)
2. Reads IP packets from it
3. Forwards each connection to the core alknet's SOCKS5 port (127.0.0.1:1080)
4. Proxies bytes between TUN packets and SOCKS5 connections
The core `alknet connect` binary never needs root. The `alknet-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
## Consequences
- **Positive**: Root-required code surface is tiny and auditable.
- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
- **Positive**: Matches the proven tun2proxy architecture.
- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
## References
- [tun-shim.md](../tun-shim.md)
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy


@@ -1,31 +0,0 @@
# ADR-003: iroh Stream via tokio::io::join
## Status
Accepted
## Context
iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
Options considered:
1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
## Decision
Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
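To make the join idea concrete without pulling in tokio or iroh, here is a synchronous, std-only analogue: a `Joined<R, W>` type that reads from one half and writes to the other. It is not `tokio::io::Join` itself, only an illustration of the same delegation pattern; all names in it are hypothetical.

```rust
use std::io::{self, Read, Write};

/// Combines an independent read half and write half into one duplex value.
pub struct Joined<R, W> {
    reader: R,
    writer: W,
}

impl<R, W> Joined<R, W> {
    pub fn new(reader: R, writer: W) -> Self {
        Self { reader, writer }
    }

    /// Splits the value back into its two halves.
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

// Reads are delegated to the read half only.
impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

// Writes and flushes are delegated to the write half only.
impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

Because each trait touches only its own field, progress on one direction never depends on the state of the other, which is the property the decision relies on.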
## Consequences
- **Positive**: Minimal code. One line to bridge iroh and russh.
- **Positive**: No custom types to maintain.
- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
## References
- [transport.md](../transport.md)
- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)


@@ -1,28 +0,0 @@
# ADR-004: SSH Runs Over Transport, Not Alongside
## Status
Accepted
## Context
There are two ways to structure the relationship between SSH and the transport layer:
1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
## Decision
SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
## Consequences
- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through alknet) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)


@@ -1,39 +0,0 @@
# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
## Status
Accepted
## Context
A "VPN-like" tool needs to route traffic. There are three approaches:
1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
## Decision
SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
SOCKS5 is the core because:
- It requires no privileges
- `curl --socks5-hostname` works everywhere
- Browsers, most CLI tools, and many applications support SOCKS5
- SOCKS5h prevents DNS leaks by resolving names server-side
- It's the interface that the NAPI wrapper and pubsub adapter build on
- TUN is only needed for "route all traffic" use cases, which are a subset of users
TUN forwards to SOCKS5 rather than directly to SSH because:
- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
- The `alknet-tun` binary stays minimal (~200-500 lines)
- No root code in the core binary
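The DNS-leak point can be made concrete at the wire level. In SOCKS5 (RFC 1928), a CONNECT request may carry the destination as a domain name (address type `0x03`) instead of an IP address, so name resolution happens on the proxy side. The helper below is a hypothetical illustration that only builds the request bytes and performs no I/O.

```rust
/// Builds a SOCKS5 CONNECT request that carries the destination as a domain
/// name, leaving DNS resolution to the proxy.
/// Returns `None` if the host is empty or longer than 255 bytes.
pub fn socks5_connect_request(host: &str, port: u16) -> Option<Vec<u8>> {
    if host.is_empty() || host.len() > 255 {
        return None;
    }
    let mut request = Vec::with_capacity(7 + host.len());
    request.extend_from_slice(&[
        0x05,             // VER: SOCKS version 5
        0x01,             // CMD: CONNECT
        0x00,             // RSV: reserved
        0x03,             // ATYP: domain name
        host.len() as u8, // length prefix of the domain name
    ]);
    request.extend_from_slice(host.as_bytes());
    request.extend_from_slice(&port.to_be_bytes()); // DST.PORT in network byte order
    Some(request)
}
```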
## Consequences
- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `alknet connect` — two processes instead of one integrated binary.
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
## References
- [client.md](../client.md)
- [tun-shim.md](../tun-shim.md)


@@ -1,38 +0,0 @@
# ADR-006: No Logging of Tunnel Destinations
## Status
Accepted
## Context
An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
## Decision
The server does NOT log:
- `channel_open_direct_tcpip` destinations (host, port)
- DNS resolutions performed by the server on behalf of clients
- Bytes transferred through tunnel channels
- Connection duration or throughput
The server DOES log (ADR-013):
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
- Connection opened (remote_addr, transport kind)
- Connection closed (remote_addr, duration)
This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
## Consequences
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
- **Positive**: Reduces legal and privacy exposure for server operators.
- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside alknet (e.g., network-level logging at the target host).
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
## References
- [server.md](../server.md)
- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log


@@ -1,26 +0,0 @@
# ADR-007: NAPI Exposes Single Duplex Stream
## Status
Accepted
## Context
The NAPI wrapper for alknet could expose different granularity levels:
1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
## Decision
Option 2: NAPI exposes a single duplex stream.
The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
## Consequences
- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)


@@ -1,38 +0,0 @@
# ADR-008: ACME/Let's Encrypt Certificate Provisioning
## Status
Accepted
## Context
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.
There are two ACME flows:
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
Both flows are important for alknet's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
## Decision
Support both ACME certificate provisioning paths:
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps alknet self-contained as a single binary.
## Consequences
- **Positive**: Users can run `alknet serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
## References
- [server.md](../server.md)
- [OQ-01](../open-questions.md) — resolved by this ADR
- [OQ-07](../open-questions.md) — resolved by this ADR
- Production certbot setup: [certbot.md](../../research/ops/certbot.md)


@@ -1,28 +0,0 @@
# ADR-009: Default iroh Relay with Override
## Status
Accepted
## Context
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
- n0's relay could change terms, rate-limit, or go down
- Production deployments may want self-hosted relays for reliability and privacy
- The relay URL is a configuration point that should be explicit
Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
## Decision
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
## Consequences
- **Positive**: Zero-config iroh transport for testing and development. `alknet serve --transport iroh` just works.
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
- **Negative**: Privacy-conscious users must remember to `--iroh-relay` to avoid n0. Mitigated by documentation.
## References
- [transport.md](../transport.md)
- [OQ-02](../open-questions.md) — resolved by this ADR


@@ -1,33 +0,0 @@
# ADR-010: Transport Chaining in CLI
## Status
Accepted
## Context
Transport chaining allows combining iroh with an upstream proxy, e.g.:
```bash
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
```
This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another alknet instance. This is important for:
- Nested tunnel topologies
- Environments where iroh needs to go through an existing proxy
- Composing transports in flexible ways
iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
## Decision
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
## Consequences
- **Positive**: Flexible transport composition without requiring separate manual configuration.
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
- **Positive**: Implementation is minimal — iroh already supports proxy config.
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
## References
- [transport.md](../transport.md)
- [OQ-05](../open-questions.md) — resolved by this ADR

View File

@@ -1,38 +0,0 @@
# ADR-011: Programmatic-First API, No File-Based Config
## Status
Accepted
## Context
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
2. **Custom config file**: Alknet-specific config file (TOML/YAML) with host definitions.
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
4. **Hybrid**: `--config` flag pointing to an alknet-specific config file, but no OpenSSH config parsing.
## Decision
Option 3: Programmatic-first API. Configuration is provided via:
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
- **Library API**: `alknet_core::client::ConnectOptions` and `alknet_core::server::ServeOptions` structs, constructable programmatically
- **Environment variables**: for a few convenience defaults (e.g., `ALKNET_SERVER`, `ALKNET_IDENTITY`)
No `~/.ssh/config` parsing, no alknet-specific config files. This approach:
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
- Keeps the CLI simple and explicit — no hidden behavior from config files
- Matches the design principle that the library crate (`alknet-core`) is the primary interface
If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
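A rough sketch of what "flags first, environment second, no files" could look like. The struct below is an assumed shape for illustration only: this ADR names `ConnectOptions` but does not list its fields, and `flag_or_env` and `build_options` are hypothetical helpers.

```rust
use std::env;

/// Assumed shape; the real `ConnectOptions` fields are not given in this ADR.
#[derive(Debug, Clone, PartialEq)]
struct ConnectOptions {
    server: String,
    identity: Option<String>,
    transport: String,
}

/// Prefers the explicit flag value, then falls back to an environment variable.
fn flag_or_env(flag: Option<String>, env_key: &str) -> Option<String> {
    flag.or_else(|| env::var(env_key).ok())
}

/// Builds options purely from in-memory values and the environment.
/// No file is read at any point.
fn build_options(
    server_flag: Option<String>,
    identity_flag: Option<String>,
    transport: &str,
) -> Result<ConnectOptions, String> {
    let server = flag_or_env(server_flag, "ALKNET_SERVER")
        .ok_or_else(|| "no server: pass --server or set ALKNET_SERVER".to_string())?;
    Ok(ConnectOptions {
        server,
        identity: flag_or_env(identity_flag, "ALKNET_IDENTITY"),
        transport: transport.to_string(),
    })
}
```

A config-file layer, if ever added, would only need to produce the same arguments that `build_options` takes here.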
## Consequences
- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
- **Positive**: No cross-platform path issues in the core library.
- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
- **Negative**: No config file convenience. Users coming from `ssh config` may find this inconvenient.
## References
- [client.md](../client.md)
- [OQ-06](../open-questions.md) — resolved by this ADR

View File

@@ -1,42 +0,0 @@
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
## Status
Accepted
## Context
SSH authentication has several options:
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
The question is which auth methods to support and prioritize.
## Decision
**Primary: Ed25519 public key** (already specified, no change).
**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
- It's less secure than key-based auth
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
- It's not needed when cert-authority provides easy multi-user management
- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
The server's `authorized_keys` file format follows OpenSSH conventions:
- Regular keys: `ssh-ed25519 AAAA... user@host`
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
- CA trusts with options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
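The three line shapes above can be told apart with a small classifier. This is a simplified sketch under stated assumptions: it only recognizes `ssh-ed25519` keys, it does not handle quoted option values containing spaces or commas, and `Entry` and `parse_line` are hypothetical names rather than anything from russh or alknet.

```rust
#[derive(Debug, PartialEq)]
enum Entry {
    /// A plain public key line.
    Key { key_b64: String },
    /// A `cert-authority` line, with any extra options that preceded the key.
    CertAuthority { options: Vec<String>, key_b64: String },
}

/// Classifies one `authorized_keys` line. Returns `None` for blank lines,
/// comments, and anything this sketch does not understand.
fn parse_line(line: &str) -> Option<Entry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut parts = line.split_whitespace();
    let first = parts.next()?;
    if first == "ssh-ed25519" {
        let key_b64 = parts.next()?.to_string();
        return Some(Entry::Key { key_b64 });
    }
    // Otherwise the first field is a comma-separated option list.
    let mut options: Vec<String> = first.split(',').map(str::to_string).collect();
    let pos = options.iter().position(|o| o == "cert-authority")?;
    options.remove(pos);
    if parts.next()? != "ssh-ed25519" {
        return None;
    }
    let key_b64 = parts.next()?.to_string();
    Some(Entry::CertAuthority { options, key_b64 })
}
```

Classification is only the first step: certificate signature, expiry, and restriction checks would still happen at authentication time.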
## Consequences
- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
- **Positive**: `russh` supports OpenSSH certificate verification natively.
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
## References
- [client.md](../client.md)
- [server.md](../server.md)
- [OQ-04](../open-questions.md) — resolved by this ADR

View File

@@ -1,39 +0,0 @@
# ADR-013: Fail2ban-Friendly Server Logging
## Status
Accepted
## Context
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
## Decision
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
**Logging** (for fail2ban integration on Linux):
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
**Built-in rate limiting** (for all platforms):
- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
- Rate limiting happens at the SSH layer, before channels are opened
This ensures that even without fail2ban, the server rejects obviously abusive connections.
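The `--max-connections-per-ip` behavior could be tracked with a per-address counter. The sketch below is an assumption about how such tracking might look, not the server's actual code; `ConnLimiter` is a hypothetical type, and `0` is treated as unlimited to match the flag's documented default.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per remote IP (hypothetical helper).
struct ConnLimiter {
    max_per_ip: usize, // 0 means unlimited, as with the CLI default
    active: HashMap<IpAddr, usize>,
}

impl ConnLimiter {
    fn new(max_per_ip: usize) -> Self {
        ConnLimiter { max_per_ip, active: HashMap::new() }
    }

    /// Returns true and counts the connection if it is allowed.
    fn try_open(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes; drops the entry at zero.
    fn close(&mut self, ip: IpAddr) {
        let now_empty = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if now_empty {
            self.active.remove(&ip);
        }
    }
}
```

Removing entries once they reach zero keeps the map bounded by the number of currently connected addresses.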
## Consequences
- **Positive**: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
- **Negative**: Slightly more code in the server for connection tracking per IP.
- **Negative**: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).
## References
- [server.md](../server.md)
- [OQ-08](../open-questions.md) — resolved by this ADR
- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)

View File

@@ -1,41 +0,0 @@
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
## Status
Accepted
## Context
The original plan included a TUN shim (`alknet-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through alknet's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
However, TUN implementation has significant complexities:
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
- Virtual DNS handling
- Root/CAP_NET_ADMIN requirements
- TUN is easy to get wrong and hard to debug
The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
## Decision
Defer TUN implementation entirely. Remove `alknet-tun` from the architecture. Instead:
1. **Core interface**: alknet's local SOCKS5 proxy (always available, no root required)
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `alknet connect`
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `alknet-tun` can be added as a thin wrapper around tun2proxy's pattern.
The `tun` feature flag and `alknet-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
## Consequences
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
- **Positive**: tun2proxy is already well-tested for this exact use case.
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
- **Positive**: Cleaner architecture — alknet only does SSH tunneling + SOCKS5. tun2proxy does TUN.
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
## References
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external

View File

@@ -1,27 +0,0 @@
# ADR-015: napi-rs for FFI Bridge
## Status
Accepted
## Context
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
## Decision
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
## Consequences
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [OQ-11](../open-questions.md) — resolved by this ADR

View File

@@ -1,40 +0,0 @@
# ADR-016: NAPI Exposes Both connect() and serve()
## Status
Accepted
## Context
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to alknet's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
1. **connect()**: Establish a client connection to an alknet server. Used by workers/spokes that need to tunnel events through an alknet server.
2. **serve()**: Start an alknet server from Node.js. Used by hubs that want to accept alknet connections and route events.
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `alknet serve` process.
More importantly, both `connect()` and `serve()` are fundamental operations of the alknet library. Since the NAPI wrapper is a thin layer over `alknet-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
## Decision
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
```typescript
// @alkdev/alknet
function connect(options: AlknetConnectOptions): Promise<Duplex>;
function serve(options: AlknetServeOptions): Promise<AlknetServer>;
```
- `connect()` returns a `Duplex` stream (as per ADR-007)
- `serve()` returns an `AlknetServer` object with a `close()` method and events for new connections
The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
## Consequences
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
- **Positive**: Implementation is still minimal — `serve()` is just `alknet_core::server::run()` behind `#[napi]`.
- **Negative**: Slightly larger API surface (two functions + `AlknetServer` type instead of just `connect()`).
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `AlknetServer`.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
- [OQ-10](../open-questions.md) — resolved by this ADR

View File

@@ -1,30 +0,0 @@
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
## Status
Accepted
## Context
When running an alknet server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention: if SSH traffic on port 443 is detectable, it can be blocked.
After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
## Decision
When `--stealth` is enabled with TLS transport:
1. After completing the TLS handshake, peek at the first few bytes of the connection
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
The fake response uses a `Server: nginx` header to match the most common web server profile.
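The peek-and-branch step can be sketched as a pure function over the bytes seen so far. The names here (`Sniffed`, `sniff`, `decoy_response`) are illustrative, and the `NeedMore` case is an added assumption for short reads that the ADR does not spell out.

```rust
const SSH_PREFIX: &[u8] = b"SSH-2.0-";
const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

#[derive(Debug, PartialEq)]
enum Sniffed {
    /// Hand the stream to the SSH session.
    Ssh,
    /// Send the decoy 404 and close.
    Decoy,
    /// Fewer than 8 bytes so far, and they still match the SSH prefix.
    NeedMore,
}

/// Decides what to do based on the bytes peeked after the TLS handshake.
fn sniff(peeked: &[u8]) -> Sniffed {
    if peeked.len() >= SSH_PREFIX.len() {
        if peeked.starts_with(SSH_PREFIX) {
            Sniffed::Ssh
        } else {
            Sniffed::Decoy
        }
    } else if SSH_PREFIX.starts_with(peeked) {
        Sniffed::NeedMore
    } else {
        Sniffed::Decoy
    }
}

/// The bytes written to non-SSH clients before closing.
fn decoy_response() -> &'static [u8] {
    DECOY_RESPONSE
}
```

A real server would presumably also bound how long it waits in the `NeedMore` state, so that an idle connection does not stay open indefinitely.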
## Consequences
- **Positive**: TLS+alknet servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
- **Positive**: Simple implementation — just peek at the first bytes and branch.
- **Positive**: Consistent with censorship circumvention best practices.
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
## References
- [server.md](../server.md)

View File

@@ -1,38 +0,0 @@
# ADR-018: Control Channel for PubSub over SSH
## Status
Accepted
## Context
The NAPI wrapper and pubsub integration need a way to use alknet's SSH channel as a data plane for event routing. When an `alknet connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `alknet-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
## Decision
Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `alknet-control:0`:
1. The `channel_open_direct_tcpip` handler detects the special target via string matching
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
The destination string `alknet-control` is reserved. Regular TCP targets are hostnames or IP addresses, so a collision is unlikely in practice.
Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
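The string-matching decision for approach 1 might look like the following. This is a sketch, not russh's or alknet's API: `Route` and `route` are hypothetical, the port is `u32` because SSH `direct-tcpip` requests carry a 32-bit port, and rejecting other `alknet-` hosts is one reading of the prefix reservation mentioned under Consequences.

```rust
/// Reserved control destination from this ADR.
const CONTROL_HOST: &str = "alknet-control";
/// Prefix namespace reserved for future internal destinations.
const RESERVED_PREFIX: &str = "alknet-";

#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Bridge the channel to the local event bus.
    Control,
    /// Looks like an internal destination but is not one we know: refuse it.
    Reserved,
    /// Ordinary outbound TCP target.
    Tcp { host: &'a str, port: u32 },
}

/// Classifies the destination of a `direct_tcpip` channel open request.
fn route(host: &str, port: u32) -> Route<'_> {
    if host == CONTROL_HOST && port == 0 {
        Route::Control
    } else if host.starts_with(RESERVED_PREFIX) {
        Route::Reserved
    } else {
        Route::Tcp { host, port }
    }
}
```

Defining the constants once keeps the magic string out of the handler body, as the Consequences section recommends.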
## Consequences
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
- **Positive**: No separate port or service needs to run on the server. The control channel is built into alknet.
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
- **Positive**: Port forwarding to a specific port is still available as an alternative.
- **Negative**: The string `alknet-control` is a magic constant. It should be defined as a constant in the crate.
- **Negative**: Regular TCP destinations accidentally matching `alknet-control` would be misrouted. Mitigated by reserving the entire `alknet-` prefix namespace.
## References
- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [server.md](../server.md)

View File

@@ -1,42 +0,0 @@
# ADR-019: `--proxy` Has Different Semantics on Client vs Server
## Status
Accepted
## Context
The `--proxy` CLI flag appears on both `alknet connect` (client) and `alknet serve` (server), but the two sides proxy fundamentally different things:
- **Client**: `--proxy` routes the *transport connection* through the proxy. For example, `alknet connect --transport iroh --proxy socks5://127.0.0.1:1080` means the iroh endpoint's outbound TCP connections go through the specified SOCKS5 proxy before reaching the iroh relay. The proxy wraps the transport layer.
- **Server**: `--proxy` routes *outbound target connections* through the proxy. For example, `alknet serve --proxy socks5://127.0.0.1:9050` means when an SSH client opens a `direct_tcpip` channel to `db.internal:5432`, the server connects to that target through the specified proxy. The proxy wraps the data-plane connections.
Using the same flag name for both is intentional — from the user's perspective, both mean "route traffic through a proxy." But the layer at which the proxy operates differs, and this needs to be explicit so implementers don't confuse the two.
ADR-010 addressed transport chaining for the client side only. The server-side outbound proxy behavior has no ADR. This ADR documents both semantics and the rationale for sharing the flag name.
## Decision
The `--proxy` flag uses the same name on client and server, with documented different semantics:
| Side | Flag | What gets proxied | Example |
|------|------|-------------------|---------|
| Client | `--proxy` | Transport connection (outbound to server/relay) | `--transport iroh --proxy socks5://...` → iroh endpoint connects through proxy |
| Server | `--proxy` | Outbound target connections (data plane) | `--proxy socks5://...` → direct_tcpip targets reached through proxy |
On the **client**, `--proxy` affects the transport layer. It only applies to transports that make outbound TCP connections (iroh through a proxy, TLS through a proxy). For plain TCP transport, `--proxy` has no meaningful effect since the transport is already a direct TCP connection — use the SOCKS5 server instead.
On the **server**, `--proxy` affects the data plane. All `channel_open_direct_tcpip` outbound connections are routed through the proxy, regardless of transport mode.
This is not a naming collision — it's the same conceptual operation ("route through a proxy") at different layers. The shared name avoids forcing users to learn two proxy flags.
## Consequences
- **Positive**: One flag name (`--proxy`) instead of two. Users already understand "proxy" as "route through this."
- **Positive**: Client-side proxy is minimal implementation — iroh's endpoint builder accepts proxy config natively.
- **Positive**: Server-side proxy is straightforward — all outbound TCP from channel handlers goes through the proxy.
- **Negative**: Implementers must read the correct spec (client vs server) to understand what `--proxy` does for their side. This is mitigated by CLI help text that clearly describes the behavior per side.
- **Negative**: On the client, `--proxy` with `--transport tcp` is effectively a no-op (the transport is already a direct TCP connection to the server). The CLI should handle this case gracefully.
## References
- [ADR-010](010-transport-chaining-cli.md) — client-side transport chaining
- [transport.md](../transport.md) — transport layer spec
- [client.md](../client.md) — client CLI
- [server.md](../server.md) — server outbound proxy

View File

@@ -1,85 +0,0 @@
# ADR-023: Unified Authentication with Shared Key Material
## Status
Accepted
## Context
Alknet currently authenticates connections exclusively through SSH public key
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
iroh) because SSH carries its own auth protocol. But WebTransport and other
HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
not SSH.
Without unification, non-SSH transports would need a completely separate
identity system (API keys, JWTs, session tokens). This creates two problems:
(1) operators manage two key sets with two rotation mechanisms, and (2) the
same person connecting via SSH and WebTransport appears as two different
identities.
The `IdentityProvider` trait is needed to decouple alknet-core from any
specific identity storage (config file vs. database). Without it, alknet-core
would either hardcode config-file-based auth or take a database dependency —
neither is acceptable for a library crate.
## Decision
**Unified authentication**: The same Ed25519 key material (`authorized_keys`
and `cert_authorities`) is shared across both SSH auth and token auth. The
presentation differs per transport, but the verification result (an
`Identity` with scopes) is the same.
**Token auth for non-SSH transports**: WebTransport clients present a signed
timestamp token in the CONNECT request URL:
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
Server extracts the fingerprint, looks it up in the same `authorized_keys`
set, verifies the signature, and checks the timestamp window (default ±300s).
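The fixed layout above implies a 104-byte token once base64url is decoded (32 + 8 + 64, since Ed25519 signatures are 64 bytes). The sketch below splits those bytes and checks the timestamp window. It deliberately leaves out base64url decoding and signature verification, which would need a crypto crate; `AuthToken`, `parse_token`, `signed_message`, and `within_window` are hypothetical names.

```rust
const KEY_ID_LEN: usize = 32; // SHA-256 fingerprint
const TS_LEN: usize = 8; // big-endian u64, Unix seconds
const SIG_LEN: usize = 64; // Ed25519 signature
const TOKEN_LEN: usize = KEY_ID_LEN + TS_LEN + SIG_LEN;

#[derive(Debug, PartialEq)]
struct AuthToken {
    key_id: [u8; KEY_ID_LEN],
    timestamp: u64,
    signature: [u8; SIG_LEN],
}

/// Splits already-decoded token bytes into their three fields.
fn parse_token(raw: &[u8]) -> Result<AuthToken, String> {
    if raw.len() != TOKEN_LEN {
        return Err(format!("expected {} bytes, got {}", TOKEN_LEN, raw.len()));
    }
    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(&raw[..KEY_ID_LEN]);
    let mut ts = [0u8; TS_LEN];
    ts.copy_from_slice(&raw[KEY_ID_LEN..KEY_ID_LEN + TS_LEN]);
    let mut signature = [0u8; SIG_LEN];
    signature.copy_from_slice(&raw[KEY_ID_LEN + TS_LEN..]);
    Ok(AuthToken { key_id, timestamp: u64::from_be_bytes(ts), signature })
}

/// Rebuilds the message the signature covers: key_id || timestamp_bytes.
fn signed_message(token: &AuthToken) -> Vec<u8> {
    let mut msg = token.key_id.to_vec();
    msg.extend_from_slice(&token.timestamp.to_be_bytes());
    msg
}

/// True if the token timestamp is within `window_secs` of `now`, either side.
fn within_window(token_ts: u64, now: u64, window_secs: u64) -> bool {
    token_ts.abs_diff(now) <= window_secs
}
```

A verifier would look up `key_id` in the authorized set, verify `signature` over `signed_message`, and only then trust the timestamp check.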
**`IdentityProvider` trait**: Decouples alknet-core from identity storage. The
trait resolves a fingerprint or token to an `Identity`. Default implementation
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
with `@alkdev/storage`.
**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
as SSH auth by default. Deployments that want separate access control can use
`TokenKeySource::Separate` with a distinct key set.
**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
Zero-replay can be added later via a nonce challenge-response without changing
the key material.
## Consequences
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
key to `authorized_keys` immediately grants access via both SSH and
WebTransport.
- **Positive**: `IdentityProvider` trait makes alknet-core independent of any
specific database. Default: config file. Hub: `@alkdev/storage`.
- **Positive**: Browser clients can authenticate using Ed25519 keys via
SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
natively.
- **Positive**: No JWT library dependency. The token is a simple Ed25519
signature over a fixed structure — same primitives SSH already uses.
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
QUIC packet can replay the token within the window. Acceptable because QUIC
interception is the same threat level as connection hijacking.
- **Negative**: Certificate authority tokens are not supported in v1. CA
verification requires the full OpenSSH certificate structure, which doesn't
fit in a signed timestamp.
- **Negative**: Browser-side key management is less ergonomic than SSH key
files. The private key must be imported into SubtleCrypto. This is a UI/UX
concern, not a protocol concern.
## References
- [auth.md](../auth.md) — Full auth architecture spec
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)

View File

@@ -1,63 +0,0 @@
# ADR-024: Bidirectional Call Protocol
## Status
Accepted
## Context
The alknet control channel (ADR-018) routes from client → server's event bus.
This is unidirectional: clients can send events to the server, but the server
cannot call operations on the client. In the hub/spoke model, spokes (dev env
containers) connect to a hub and expose operations (fs, bash, search) that the
hub invokes. The hub needs to call *spoke* operations.
Additionally, the current control channel provides no request/response semantics.
Every consumer that needs call/response reinvents the pending-request correlation.
## Decision
The call protocol is bidirectional. Both sides can send `call.requested` and
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
BE length prefix + JSON) — the same as `@alkdev/pubsub`.
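The wire framing described here (4-byte big-endian length, then the JSON bytes) can be sketched independently of the envelope's contents. `encode_frame` and `decode_frame` are hypothetical helper names, and the JSON payload is treated as opaque bytes.

```rust
/// Prepends a 4-byte big-endian length to an already-serialized JSON payload.
fn encode_frame(json: &[u8]) -> Result<Vec<u8>, String> {
    if json.len() > u32::MAX as usize {
        return Err("payload too large for a 4-byte length prefix".to_string());
    }
    let mut frame = Vec::with_capacity(4 + json.len());
    frame.extend_from_slice(&(json.len() as u32).to_be_bytes());
    frame.extend_from_slice(json);
    Ok(frame)
}

/// Tries to read one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed,
/// or `None` if the buffer does not yet hold a complete frame.
fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((&buf[4..end], end))
}
```

Returning the consumed byte count lets a caller keep any trailing bytes as the start of the next frame.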
Five event types: `call.requested`, `call.responded`, `call.completed`,
`call.aborted`, `call.error`.
A call is a subscribe that resolves after one event. Both use `call.requested`
with correlated `requestId`. `PendingRequestMap` in core provides correlation.
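Request/response correlation by `requestId` can be illustrated with a map from ID to a one-shot reply channel. This is a simplified stand-in, not the actual `PendingRequestMap`: it uses `std::sync::mpsc` and string payloads, and `PendingSketch` with its methods is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Minimal stand-in for a pending-request table keyed by request ID.
struct PendingSketch {
    waiting: HashMap<String, Sender<String>>,
}

impl PendingSketch {
    fn new() -> Self {
        PendingSketch { waiting: HashMap::new() }
    }

    /// Records an outgoing request and returns where its response will arrive.
    fn register(&mut self, request_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiting.insert(request_id.to_string(), tx);
        rx
    }

    /// Delivers a response to the matching caller.
    /// Returns false if the ID is unknown or the caller has gone away.
    fn resolve(&mut self, request_id: &str, payload: String) -> bool {
        match self.waiting.remove(request_id) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }

    /// Drops an entry on timeout or connection close.
    fn cancel(&mut self, request_id: &str) -> bool {
        self.waiting.remove(request_id).is_some()
    }

    fn len(&self) -> usize {
        self.waiting.len()
    }
}
```

Because `resolve` removes the entry before sending, a duplicate response for the same ID is ignored rather than delivered twice.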
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
path segment routes the call to the correct connected node. The hub's registry
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
first segment is the routing key, remaining path dispatches within the node.
Core-provided operations use short paths without a spoke prefix
(`/services/list`, `/services/schema`). Spoke operations are prefixed
(`/dev1/fs/readFile`).
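First-segment routing could be sketched as follows. How core distinguishes its own roots from spoke names is not specified here, so the `core_roots` list is an assumption made for the example; `Target` and `route_path` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Target<'a> {
    /// Handled locally by a core-provided operation.
    Core { path: &'a str },
    /// Forwarded to the connection registered for `spoke`.
    Spoke { spoke: &'a str, rest: &'a str },
}

/// Splits an operation path on its first segment and picks a target.
/// `core_roots` lists first segments owned by core (assumed mechanism).
fn route_path<'a>(path: &'a str, core_roots: &[&str]) -> Option<Target<'a>> {
    let trimmed = path.strip_prefix('/')?;
    let slash = trimmed.find('/')?;
    let (first, rest) = (&trimmed[..slash], &trimmed[slash..]);
    if first.is_empty() || rest.len() <= 1 {
        return None; // nothing before or after the separator
    }
    if core_roots.contains(&first) {
        Some(Target::Core { path })
    } else {
        Some(Target::Spoke { spoke: first, rest })
    }
}
```

With `core_roots = ["services"]`, `/services/list` stays local while `/dev1/fs/readFile` is forwarded to spoke `dev1` as `/fs/readFile`.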
This generalizes ADR-018's control channel: the `alknet-*` destination becomes
a transport for `EventEnvelope` frames with call protocol semantics, instead of
raw pubsub dispatch.
## Consequences
- **Positive**: Hub can invoke operations on spokes. Dev env containers
expose fs, bash, search — the hub calls them as needed.
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
can both call and serve operations.
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
in core serves all consumers.
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
iroh's ALPN dispatch. First segment = routing key.
- **Positive**: Multiple spokes exposing the same service (two dev envs both
exposing `/fs/*`) are naturally differentiated by the spoke prefix.
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
cleaned up on timeout or connection close.
- **Negative**: The hub must maintain a routing table mapping spoke identities
to connections, with registration on connect and cleanup on disconnect.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter

View File

@@ -1,73 +0,0 @@
# ADR-025: Handler/Spec Separation for Downstream Service Registration
## Status
Accepted
## Context
The current control channel (ADR-018) is hardcoded: `alknet-control:0` bridges
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
`bash.exec` as callable operations, it has no way to register these with core's
channel routing. The NAPI handler would need to intercept channel data outside
of core.
For the hub/spoke model, spokes register their operations with the hub when
they connect. The hub's registry must include both hub-local operations and
remote operations exposed by spokes.
## Decision
Operation specs and handlers are separated from core. Core provides:
1. `OperationSpec` — describes what an operation does (name, type, input/output
schemas, access control)
2. `OperationHandler` — implements the operation logic
3. `OperationRegistry` — maps paths to specs + handlers
4. Built-in operations: `/services/list`, `/services/schema`
Downstream consumers register their own operations:
```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
segment routes to the node. The `namespace` field on `OperationSpec` is
derived from the second path segment (`service`).
When spoke operations are registered with the hub, the hub adds the spoke
prefix: a spoke that registers `/fs/readFile` as "dev1" becomes addressable as
`/dev1/fs/readFile` in the hub's routing table.
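
The prefixing rule above can be illustrated with a small sketch. It assumes a simplified `OperationSpec` (name and namespace only), a plain function pointer as the handler type, and a hypothetical `register_spoke` method; the real registry's API is not shown in this ADR.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct OperationSpec {
    pub name: String,      // full hub path, e.g. "/dev1/fs/readFile"
    pub namespace: String, // service segment, e.g. "fs"
}

/// Assumed handler shape for the sketch: string in, string out.
pub type Handler = fn(&str) -> String;

#[derive(Default)]
pub struct OperationRegistry {
    ops: BTreeMap<String, (OperationSpec, Handler)>,
}

impl OperationRegistry {
    /// Register an operation exposed by a spoke. The hub prepends the spoke
    /// name, so "/fs/readFile" from "dev1" becomes "/dev1/fs/readFile".
    pub fn register_spoke(&mut self, spoke: &str, local_path: &str, handler: Handler) -> String {
        let full = format!("/{}{}", spoke, local_path);
        // The namespace is the service segment: the first segment of the
        // spoke-local path, i.e. the second segment of the hub path.
        let namespace = local_path
            .trim_start_matches('/')
            .split('/')
            .next()
            .unwrap_or("")
            .to_string();
        let spec = OperationSpec { name: full.clone(), namespace };
        self.ops.insert(full.clone(), (spec, handler));
        full
    }

    /// Dispatch a call by full hub path.
    pub fn call(&self, path: &str, input: &str) -> Option<String> {
        self.ops.get(path).map(|(_, handler)| handler(input))
    }

    /// Roughly what a `/services/list` handler would return.
    pub fn list(&self) -> Vec<&OperationSpec> {
        self.ops.values().map(|(spec, _)| spec).collect()
    }
}

/// Example handlers standing in for two dev environments.
pub fn dev1_read(path: &str) -> String {
    format!("dev1 read {}", path)
}

pub fn dev2_read(path: &str) -> String {
    format!("dev2 read {}", path)
}
```

Two spokes registering the same local path end up under distinct hub paths, which is the collision mitigation this ADR relies on.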
The `/services/list` operation returns all registered specs. The
`/services/schema` operation returns the spec for a specific operation. These
are read-only — no admin operations.
## Consequences
- **Positive**: NAPI, Python, and any downstream consumer can register
operations without modifying core.
- **Positive**: Service discovery is built in. Clients query `/services/list`
to learn what operations a hub offers.
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
the same service (dev1 vs dev2).
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
authorization. Higher-risk operations (shell, filesystem write) can require
tighter scopes.
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
maps directly to MCP tool definitions.
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
`OperationRegistry`, and `PendingRequestMap`.
- **Negative**: Namespace collisions between downstream consumers are possible.
The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
## References
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry


@@ -1,162 +0,0 @@
# ADR-026: Transport/Interface Separation (Three-Layer Model)
## Status
Accepted
## Context
In the current architecture, SSH is deeply embedded in the server handler. The
`ServerHandler` owns auth, channel management, and proxy logic — all mixed
together. This makes it impossible to run the call protocol over any transport
that doesn't speak SSH, such as:
- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
censorship resistance
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
wrapping, for local service mesh or browser-to-head direct communication
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
SSH key exchange)
The DNS control channel concept from research (`core.md`) currently conflates
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
SSH is not a transport — it's a protocol layer that sits *on top of* a
transport. Separating them enables the DNS control channel to carry call
protocol events directly, without wrapping SSH inside DNS queries.
The same separation enables raw framing (no SSH overhead) for trusted local
networks, and WebTransport direct call protocol for browser clients.
## Decision
**Establish a three-layer model:**
### Layer 1: Transport
Produces byte streams. A `Transport` still produces
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.
```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```
Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).
### Layer 2: Interface
Consumes a `Transport::Stream` and produces call protocol sessions. An
interface is what SSH currently does: wrap a byte stream in session semantics.
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;

    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```
Interfaces:
- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
channel multiplexing. The call protocol runs over a reserved SSH channel
(`alknet-control:0`).
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
`EventEnvelope`. No SSH overhead. Direct call protocol over the transport
stream.
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
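
The raw framing described above (4-byte big-endian length prefix followed by the JSON payload) can be sketched over any `Read`/`Write` stream. The function names and the `MAX_FRAME_LEN` limit are assumptions for illustration; the ADR does not specify a maximum frame size.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a frame; not specified by the ADR.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame exceeds limit"));
    }
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one frame, rejecting lengths above the limit before allocating.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

The payload is treated as opaque bytes here; parsing it into an `EventEnvelope` would happen one layer up.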
### Layer 3: Protocol
Carries semantics. Call protocol events, operation registry, service calls.
The protocol is agnostic to both the transport and the interface below it. It
receives `EventEnvelope` frames from whatever interface produced them.
### Connection Model
A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:
| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
**The DNS control channel carries call protocol frames directly — it does NOT
wrap SSH inside DNS.** This is explicit because the research originally
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
queries/responses — no SSH involved.
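
One way to keep the enumerated combinations checkable in code is a lookup over the table. This is a sketch only: `TransportTag` and `InterfaceKind` are hypothetical field-less enums introduced for the example, not the `TransportKind` type from this ADR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterfaceKind {
    Ssh,
    RawFraming,
}

/// Returns the documented use case for a (Transport, Interface) pair,
/// or `None` when the pair is not in the table.
pub fn use_case(transport: TransportTag, interface: InterfaceKind) -> Option<&'static str> {
    match (transport, interface) {
        (TransportTag::Tls, InterfaceKind::Ssh) => Some("Standard alknet tunnel"),
        (TransportTag::Tcp, InterfaceKind::Ssh) => Some("Plain SSH tunnel"),
        (TransportTag::Iroh, InterfaceKind::Ssh) => Some("P2P SSH tunnel"),
        (TransportTag::Dns, InterfaceKind::RawFraming) => Some("DNS control channel"),
        (TransportTag::WebTransport, InterfaceKind::Ssh) => Some("Browser SSH tunnel (future)"),
        (TransportTag::WebTransport, InterfaceKind::RawFraming) => {
            Some("Browser call protocol (future)")
        }
        (TransportTag::Tcp, InterfaceKind::RawFraming) => {
            Some("Direct call protocol, local mesh")
        }
        // Everything else, including (Dns, Ssh), is not a listed combination.
        _ => None,
    }
}

pub fn is_valid_pair(transport: TransportTag, interface: InterfaceKind) -> bool {
    use_case(transport, interface).is_some()
}
```

Rejecting `(Dns, Ssh)` explicitly encodes the rule that the DNS control channel never wraps SSH.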
### `TransportKind` Enum
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
`WebTransport` variants. Initially these are tags only — no acceptor
implementation. The full DNS and WebTransport implementations are Phase 4 work
per the integration plan.
```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```
### ServerHandler Refactor
The existing `ServerHandler` is refactored into `SshInterface`. The interface
abstraction means the server's accept loop becomes:
```rust
// Pseudocode: one accept-loop iteration for a configured listener
let (transport, interface) = listener_config;
let stream = transport.accept().await?;
let session = interface.accept(stream, &config).await?;
// session produces call protocol events
```
The call protocol handler is interface-agnostic — it receives `EventEnvelope`
frames from any interface. Auth, forwarding policy, and operation routing happen
at Layer 3, not inside the SSH handler.
## Consequences
- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
raw framing) pair is a clean (Transport, Interface) combination.
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
trusted networks.
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
any interface.
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
more focused component that only handles SSH session management.
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
implement the `Interface` trait without touching SSH code.
- **Negative**: This is the most invasive code change in Phase 1
(integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
are currently tangled in `ServerHandler`. Extracting them requires careful
refactoring to maintain existing behavior.
- **Negative**: The `Interface` trait is new and untested. The design must
accommodate both SSH's channel multiplexing and raw framing's single-stream
model through the same abstraction.
## References
- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)


@@ -1,164 +0,0 @@
# ADR-027: Crate Decomposition
## Status
Accepted
## Context
alknet-core currently contains everything: transport, SSH, auth, config, the
call protocol handler, and the server accept loop. As the project grows to
include SQLite-backed identity, HD key derivation, and metagraph storage, core
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
— unacceptable for a library crate that CLI users embed.
Different deployment topologies need different subsets:
- A minimal CLI tunnel only needs core, transport, and auth types
- A head node needs SQLite-backed identity and the secret service
- A flowgraph visualization tool only needs petgraph operations
Circular dependencies must be avoided. alknet-storage implements
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
format, but not as a crate dependency.
## Decision
**Decompose the project into six crates with a strict acyclic dependency graph.**
### Crate Structure
1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
`OperationSpec`, `Interface` trait. The foundational crate that everything
else depends on (by type, not by crate dep in some cases).
- *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
- *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
irpc
- *Does NOT depend on*: alknet-core, alknet-storage
3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
honker integration, `StorageProtocol` irpc service.
- *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
- *Does NOT depend on alknet-core* (but implements alknet-core's
`IdentityProvider` trait via the trait, not a crate dep)
- *Does NOT depend on alknet-secret* (but references `EncryptedData` type
format for wire compatibility)
4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
graph, type compatibility checking.
- *Depends on*: petgraph, serde, jsonschema, thiserror
- *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret
5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
- *Depends on*: alknet-core
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
6. **alknet** (CLI binary) — Assembles everything.
- *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
alknet-flowgraph (feature), toml
### Dependency Graph
```
alknet-secret     alknet-storage     alknet-flowgraph
 (standalone)      (standalone)        (standalone)
      │                  │                   │
      │ (feature flags   │ (trait impl       │ (type compat
      │  in CLI binary)  │  via CLI wire)    │  via JSON)
      ▼                  ▼                   ▼
              ┌─────────────────────┐
              │     alknet-core     │
              │  (transport, SSH,   │
              │   call protocol,    │
              │   Identity, Config) │
              └─────────┬───────────┘
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      alknet-napi    alknet (CLI binary — assembles everything)
```
All four library crates (core, secret, storage, flowgraph) are independent of
each other. Crate dependencies point only from the assembly crates (alknet-napi
and the CLI binary) to the library crates, never between library crates. The CLI
binary wires concrete implementations together. alknet-storage implements
alknet-core's `IdentityProvider` trait without a crate dependency; the CLI
binary provides the bridge.
### Narrow Interface Points
Three types serve as the narrow interface points between crates:
1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
forwarding policy, and call protocol. alknet-storage implements
`IdentityProvider` to produce instances.
2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
`ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
alknet-storage). The CLI/NAPI layer wires the concrete implementation.
3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
registry and by alknet-flowgraph for type compatibility checking. The bridge
is serialization — flowgraph serializes to JSON, storage persists it.
### irpc Feature Flag
irpc is a feature flag in alknet-core. When disabled, auth and config go through
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
that only do SSH tunneling don't need the service layer.
In alknet-secret and alknet-storage, irpc is an independent dependency, not
feature-gated. These crates always define irpc service protocols because they
are used in production deployments where the service layer is active.
### alknet-storage's Relationship to alknet-core
alknet-storage does NOT depend on alknet-core as a crate. Instead:
- alknet-storage defines its own `IdentityProvider` impl that matches
alknet-core's trait signature. The trait is re-exported or defined locally
with `#[cfg(feature = "alknet-core")]` interop.
- In practice, the CLI binary crate depends on both and wires them together.
alknet-storage provides `StorageIdentityProvider`; alknet-core takes
`impl IdentityProvider`.
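
One reading of "the CLI binary wires them together" is an adapter type owned by the binary, since implementing a trait normally requires depending on the crate that defines it. The sketch below puts all three roles in one file for brevity; `AccountRow`, `StorageIdentityStore`, `StorageBridge`, and `is_authorized` are hypothetical names, and `Identity` is reduced to two fields.

```rust
use std::collections::HashMap;

// --- Would live in alknet-core (`alknet_core::auth`) ---
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Core-side caller: only sees the trait, never the storage types.
pub fn is_authorized(provider: &dyn IdentityProvider, fingerprint: &str) -> bool {
    provider.resolve_from_fingerprint(fingerprint).is_some()
}

// --- Would live in alknet-storage; references no core types ---
pub struct AccountRow {
    pub account_id: String,
    pub scopes: Vec<String>,
}

#[derive(Default)]
pub struct StorageIdentityStore {
    by_fingerprint: HashMap<String, AccountRow>,
}

impl StorageIdentityStore {
    pub fn insert(&mut self, fingerprint: &str, row: AccountRow) {
        self.by_fingerprint.insert(fingerprint.to_string(), row);
    }

    pub fn lookup(&self, fingerprint: &str) -> Option<&AccountRow> {
        self.by_fingerprint.get(fingerprint)
    }
}

// --- Would live in the CLI binary, which depends on both crates ---
pub struct StorageBridge(pub StorageIdentityStore);

impl IdentityProvider for StorageBridge {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // Translate the storage row into the core type at the boundary.
        self.0.lookup(fingerprint).map(|row| Identity {
            id: row.account_id.clone(),
            scopes: row.scopes.clone(),
        })
    }
}
```

With this shape, a change to the trait signature touches core and the adapter in the binary, while the storage crate only has to keep its lookup API stable.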
### alknet-storage's Relationship to alknet-secret
alknet-storage does NOT depend on alknet-secret as a crate. Instead:
- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
version, salt, IV, ciphertext). This is a type-level compatibility, not a
crate dependency.
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
`SecretNode` in the metagraph. The bridge is serialization.
## Consequences
- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
get a small binary.
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
swapped for alternative implementations.
- **Positive**: No circular dependencies. The dependency graph is a DAG.
- **Positive**: Deployment topology determines which crates to include. A CLI
tunnel uses only alknet-core. A head node uses everything.
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
service layer overhead.
- **Negative**: `IdentityProvider` trait interop between alknet-core and
alknet-storage requires careful versioning. If the trait signature changes,
both crates must update.
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
and alknet-storage is implicit (not enforced by the type system). A shared
types crate could be extracted if needed, but adds another crate dependency.
## References
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
- [research/core.md](../../research/core.md) — alknet-core contents
- [research/services.md](../../research/services.md) — Service protocols
- [research/storage.md](../../research/storage.md) — alknet-storage contents
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)


@@ -1,147 +0,0 @@
# ADR-028: Auth as irpc Service
## Status
Accepted
## Context
For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
entire set on each reload works for small deployments but requires holding every
key in memory. For production deployments with hundreds or thousands of users,
auth verification should query a database on demand rather than holding all keys
in memory.
The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
setups. What's needed is an async boundary that allows auth verification to go
through a service — locally via channels for minimal deployments, or via irpc
for production deployments where auth runs on a separate process or node.
The critical design point: callers go through the `IdentityProvider` trait
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
the same result — an `Identity` or rejection. The trait is the contract; the
service is an implementation path.
## Decision
**Auth verification is provided via an irpc service protocol, with
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
(ArcSwap-backed) as the default implementation.**
### IdentityProvider Trait (ADR-029) — The Contract
Callers depend on `IdentityProvider`, not on any concrete implementation:
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
### ConfigIdentityProvider — Default Implementation
Reads from `ArcSwap<DynamicConfig.auth>`. No database needed. Every authorized
key gets a default scope set. This is the default for CLI and single-node
deployments.
### AuthProtocol irpc Service — Behind Feature Flag
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
that only do SSH tunneling don't need the service layer overhead. When the
feature is disabled, auth goes through `IdentityProvider` directly.
### AuthServiceImpl
Two implementations exist (the second is a future phase):
- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
Wraps the trait in an irpc service for deployments that use the service layer
but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
tables (in alknet-storage, not yet built). Queries on demand. Can maintain an
LRU cache for hot fingerprints. This is a Phase 2+ implementation — the
contract is defined here so alknet-storage can implement it later.
Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
know or care which backend is running.
### Integration with IdentityProvider
The irpc service and the trait compose. A caller goes through `IdentityProvider`,
which may internally delegate to the irpc service, or may satisfy the request
locally via `ConfigIdentityProvider`. The deployment topology determines the
path:
- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
`ArcSwap<DynamicConfig>`. No irpc overhead.
- **Production with local auth**: `AuthServiceImpl` wraps
`StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
routes to the local irpc service.
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
routes to a remote auth irpc service over QUIC.
### ConfigService Integration
`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
For the `ConfigIdentityProvider` path, this is equivalent to
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
refreshes the LRU cache. Both update atomically — ongoing connections are
unaffected, new connections pick up changes.
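
The storage-backed path (query on demand, cache hot fingerprints, refresh on `ReloadKeys`) might look roughly like the wrapper below. It is a sketch under several assumptions: `SlowBackend` stands in for the SQLite-backed provider, the cache is an unbounded map rather than a true LRU, negative results are cached too, and `reload_keys` is a hypothetical method name.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical stand-in for a database-backed provider; counts its queries.
pub struct SlowBackend {
    pub rows: HashMap<String, Identity>,
    pub queries: AtomicUsize,
}

impl IdentityProvider for SlowBackend {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.queries.fetch_add(1, Ordering::SeqCst);
        self.rows.get(fingerprint).cloned()
    }
}

/// Caches lookups in front of any provider. Unbounded for brevity; a real
/// implementation would evict (for example LRU) and bound memory use.
pub struct CachingProvider<P> {
    inner: P,
    cache: Mutex<HashMap<String, Option<Identity>>>,
}

impl<P: IdentityProvider> CachingProvider<P> {
    pub fn new(inner: P) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }

    /// Drop all cached entries so later lookups hit the backend again.
    pub fn reload_keys(&self) {
        self.cache.lock().unwrap().clear();
    }

    pub fn inner(&self) -> &P {
        &self.inner
    }
}

impl<P: IdentityProvider> IdentityProvider for CachingProvider<P> {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // The lock is held across the backend call to keep the sketch short.
        let mut cache = self.cache.lock().unwrap();
        if let Some(hit) = cache.get(fingerprint) {
            return hit.clone();
        }
        let resolved = self.inner.resolve_from_fingerprint(fingerprint);
        cache.insert(fingerprint.to_string(), resolved.clone());
        resolved
    }
}
```

Because the wrapper itself implements `IdentityProvider`, callers cannot tell whether a result came from the cache or the backend, which matches the "callers don't know or care" goal stated earlier.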
## Consequences
- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
database dependency for CLI users.
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
irpc service. Auth scales to thousands of users without loading all keys into
memory.
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
on. This keeps alknet-core lean and testable.
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
need the service layer.
- **Positive**: Both paths produce identical `Identity` results. Behavioral
parity is enforced by the shared `Identity` type.
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
and `StorageIdentityProvider` must produce the same `Identity` for the same
input. Integration tests should verify this.
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
The core must compile and work without it, and the service layer must work
with it enabled.
## References
- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
- [research/configuration.md](../../research/configuration.md) — Auth service approach
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-027](027-crate-decomposition.md) — Crate decomposition


@@ -1,107 +0,0 @@
# ADR-029: Identity as Core Type
## Status
Accepted
## Context
The `Identity` struct and `IdentityProvider` trait are needed by auth,
forwarding policy, and call protocol — three different subsystems in
alknet-core. Without placing them in core, these subsystems would each define
their own identity type, leading to duplication and conversion boilerplate.
The constraint: alknet-core must not depend on alknet-storage or any database.
The `IdentityProvider` trait must be in core so that the handler can resolve
identities without knowing whether the backing store is a config file or a
SQLite database. External crates provide implementations.
Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
key-based auth from config) and account UUID (for database-backed auth).
## Decision
**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
### Identity Struct
```rust
pub struct Identity {
    pub id: String,                              // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
The `id` field serves dual purpose: when using config-based authentication
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
database-backed authentication (`StorageIdentityProvider`), it holds the account
UUID from the `accounts` table. This keeps the type simple while accommodating
both auth paths.
The `scopes` field provides authorization scope strings used by
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
field provides resource-level authorization beyond what scopes offer (e.g., which
services this identity can access).
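
As a rough illustration of how a caller might consult the two fields, the sketch below adds two helper methods. `has_scope` and `can_access` are hypothetical names, and exact string matching is an assumption; the ADR does not define wildcard or hierarchy semantics for scopes.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// True if the identity carries exactly this scope string (assumed semantics).
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// True if `name` is listed under the resource kind, e.g. ("service", "gitea").
    pub fn can_access(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map(|names| names.iter().any(|n| n == name))
            .unwrap_or(false)
    }
}

/// Builds the example identity from the field comments in this ADR.
pub fn example_identity() -> Identity {
    let mut resources = HashMap::new();
    resources.insert(
        "service".to_string(),
        vec!["gitea".to_string(), "registry".to_string()],
    );
    Identity {
        id: "SHA256:example".to_string(), // placeholder fingerprint
        scopes: vec!["relay:connect".to_string(), "service:gitea:read".to_string()],
        resources,
    }
}
```

Scopes answer "what kind of action is allowed", while resources narrow that to specific named targets.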
### IdentityProvider Trait
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
The trait is the contract. Callers (auth handler, forwarding policy, call
protocol) depend on `IdentityProvider` — not on any concrete implementation.
### Default and Production Implementations
- **`ConfigIdentityProvider`** (in alknet-core) — reads from
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
No database needed. This is the default for minimal deployments.
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes. This is
the production implementation for head nodes.
alknet-core never depends on alknet-storage. The trait relationship is:
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
NAPI assembly layer wires the concrete implementation.
### Why Not in alknet-storage?
If `Identity` lived in alknet-storage, alknet-core would need to depend on
alknet-storage to use the type — creating a circular dependency (since
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
type and trait in core breaks the cycle.
## Consequences
- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
call protocol all use the same `Identity` type.
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
wires the concrete implementation. Deployment topology determines which impl
to use.
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
avoiding separate types for config-based and database-based auth.
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
`Identity` without knowing where they came from.
- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
`Storage`. Both must produce identical `Identity` results for the same input.
Tests should verify behavioral parity.
- **Negative**: The trait abstraction adds a level of indirection for the
minimal (config-only) deployment path. The cost is negligible — the
`ConfigIdentityProvider` is a simple `ArcSwap` dereference.
## References
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
- [research/services.md](../../research/services.md) — AuthService, Identity section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes


@@ -1,159 +0,0 @@
# ADR-030: Static/Dynamic Configuration Split
## Status
Accepted
## Context
Alknet's configuration is loaded once at startup and never changes. This causes
three specific failures:
1. **No hot reload of authentication credentials.** Adding or removing an
authorized key requires restarting the server process. In head/worker
deployments where keys are managed via a database, the process must be
restarted every time a key is added, revoked, or rotated. This is
operationally unacceptable.
2. **No port forwarding access control.** Any authenticated client can open a
`direct-tcpip` channel to any destination. There is no policy governing
which hosts, ports, or alknet control channels a client may access. A
compromised key grants unrestricted network access through the tunnel.
3. **No structured configuration beyond CLI flags.** ADR-011 chose
programmatic-first configuration for the alpha — correct at the time. But as
alknet moves toward publishable releases, operators need config files for
reproducible deployments, and the NAPI layer needs programmatic reload
capability that `ServeOptions` doesn't currently support.
Not all configuration should be reloadable. Transport-level settings (listen
address, TLS certificates, host key) require socket/TLS renegotiation to change
at runtime — effectively a restart. Auth and forwarding policy can change
atomically without disrupting existing connections.
## Decision
**Split configuration into `StaticConfig` and `DynamicConfig`.**
### StaticConfig
Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
preserved). Contains everything that affects socket binding, TLS handshakes, or
SSH session negotiation:
- Transport mode, listen address
- TLS config (cert, key)
- iroh config (relay URL)
- Stealth mode flag
- Host key, host key algorithm
- Max auth attempts, max connections per IP
- Proxy config
Changing any of these requires a restart.
### DynamicConfig
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
checked per-connection or per-channel:
- `AuthPolicy` — authorized keys, certificate authorities, token config
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
- `RateLimitConfig` — rate limiting parameters
`ArcSwap` provides lock-free reads on the hot path: every `auth_publickey()` and
every `channel_open_direct_tcpip()` call loads the current `Arc<DynamicConfig>`
without taking a lock, at negligible cost. Writes are atomic: `store()` swaps the
pointer. Existing connections finish with their current config; new connections
get the new config.
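
The snapshot behaviour can be demonstrated with a std-only stand-in. This sketch uses `RwLock<Arc<_>>` instead of `ArcSwap` so that it compiles without external crates; it is not lock-free, and `SwappableConfig` plus the single-field `DynamicConfig` are simplifications made for the example.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>, // simplified stand-in for `AuthPolicy`
}

/// Std-only approximation of `ArcSwap<DynamicConfig>`.
pub struct SwappableConfig {
    current: RwLock<Arc<DynamicConfig>>,
}

impl SwappableConfig {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Take a snapshot; the caller keeps it for the life of its connection.
    pub fn load(&self) -> Arc<DynamicConfig> {
        self.current.read().unwrap().clone()
    }

    /// Replace the shared pointer; existing snapshots are unaffected.
    pub fn store(&self, next: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A connection that called `load()` before a reload keeps reading its old `Arc`, which is dropped once the last such connection finishes.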
### ConfigReloadHandle
```rust
pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig) { ... }
}
```
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
### ConfigService
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
protocol (behind the `irpc` feature flag) for production deployments that use
the service layer. For minimal deployments (CLI, single-node), direct
`ConfigReloadHandle::reload()` is sufficient.
### TOML Config File
An optional TOML config file covers static config plus initial auth/forwarding
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
API remains primary. The config file is a convenience input format:
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
stealth = false
max_connections_per_ip = 5
max_auth_attempts = 3
[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"
[auth]
host_key = "/etc/alknet/ssh/host_key"
[forwarding]
default = "deny"
```
### NAPI Reload API
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
`ConfigReloadHandle::reload()`.
### Client Configuration
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
config is almost entirely static (which server to connect to, which key to use).
## Consequences
- **Positive**: Auth credentials and forwarding policy can be reloaded without
restarting the server. Adding a key via `reloadAuth()` takes effect on the
next connection attempt.
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
config file is an optional convenience layer, not a replacement for
`ServeOptions`.
- **Positive**: `ArcSwap` keeps reads on the hot path lock-free. Every auth
check and every channel open loads the current `Arc` without taking a lock.
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
production deployments to integrate config reload into their service mesh
without taking a direct dependency on `DynamicConfig` internals.
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
restrict access per identity, per destination, per transport (ADR-031).
- **Negative**: Two config structs where there was one. The split is clean
(transport vs. policy) but adds surface area.
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
This is acceptable for a CLI binary.
## References
- [research/configuration.md](../../research/configuration.md) — Full analysis
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1

@@ -1,138 +0,0 @@
# ADR-031: Forwarding Policy
## Status
Accepted
## Context
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
any destination. The only gate is authentication — once authenticated, a client
has unrestricted network access through the tunnel. This is a security gap: a
compromised key grants unrestricted access.
Operators need the ability to:
- Restrict which hosts and ports authenticated clients can access
- Apply different rules to different principals (key fingerprints, accounts)
- Restrict WebTransport clients to alknet control channels only
- Set a default policy (allow-all for migration compatibility, deny-all for
production)
## Decision
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
restart).**
### Type Definitions
```rust
pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>,        // Empty = matches all
    pub transports: Vec<TransportKind>, // Empty = matches all
}

pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // "localhost", "*.example.com"
    Cidr(IpNetwork),               // "10.0.0.0/8"
    PortRange(String, Range<u16>), // "localhost", ports 8080-8090
    AlknetPrefix,                  // Matches alknet-* control channels
}
```
### Rule Evaluation
Rules are evaluated in order. First match wins. If no rule matches, the default
applies. This supports both allowlist and blocklist semantics:
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
destinations.
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
destinations.
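The first-match-wins evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `TargetPattern` is reduced to `Any` and an exact host match, the `transports` field is omitted, and the `matches` and `evaluate` method names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Simplified stand-in for `TargetPattern`: only `Any` and an exact host match.
pub enum TargetPattern {
    Any,
    Host(String),
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>, // Empty = matches all
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl ForwardingRule {
    /// Hypothetical helper: a rule matches when both target and principal match.
    fn matches(&self, principal: &str, host: &str) -> bool {
        let target_ok = match &self.target {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h.as_str() == host,
        };
        let principal_ok = self.principals.is_empty()
            || self.principals.iter().any(|p| p.as_str() == principal);
        target_ok && principal_ok
    }
}

impl ForwardingPolicy {
    /// Hypothetical helper: rules are scanned in order, the first match wins,
    /// and the default applies when no rule matches.
    pub fn evaluate(&self, principal: &str, host: &str) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.matches(principal, host))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because order decides the outcome, a narrow `Deny` placed before a broad `Allow` blocks that destination even for principals the later rule would admit.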
### Principals
Each rule can specify which principals it applies to. A principal is an
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
rule's `principals` field is empty, it matches all identities.
This connects to the `IdentityProvider` trait (ADR-029): when a client
authenticates, the `Identity` is resolved, and the forwarding policy checks
rules against `Identity.id` and `Identity.scopes`.
### TransportKind-Aware Rules
Each rule can specify which `TransportKind` it applies to. This enables
transport-specific restrictions — for example, WebTransport clients can be
restricted to `alknet-*` control channels only:
```rust
ForwardingRule {
    target: TargetPattern::AlknetPrefix,
    action: ForwardingAction::Allow,
    principals: vec![],
    transports: vec![TransportKind::WebTransport { host: "*".into() }],
}
```
### Where the Policy Check Happens
The forwarding policy check occurs in `channel_open_direct_tcpip` before the
proxy task is spawned. The current behavior (no check) is equivalent to
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
backward compatibility during migration.
### DynamicConfig Integration
`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
effect on the next channel open — existing connections continue with their
current policy.
### OQ Resolutions
- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
global rules + principal matching from `Identity.scopes`. Per-user scope
from `peer_credentials.metadata.scopes` via `IdentityProvider`.
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
match in `ForwardingRule`. WebTransport clients can be restricted.
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
`IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
## Consequences
- **Positive**: Operators can restrict access per identity, per destination, per
transport. A compromised key no longer grants unrestricted network access.
- **Positive**: Default-allow preserves current behavior during migration. Switch
to default-deny for production deployments.
- **Positive**: Policy is reloadable without restart. Adding a rule via
`reloadForwarding()` takes effect on the next channel open.
- **Positive**: `TransportKind`-aware rules enable transport-specific
restrictions (e.g., WebTransport clients restricted to alknet-* channels).
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
call). The cost is a linear scan of rules — acceptable for small rule sets.
Large rule sets should use compiled matchers (future optimization).
- **Negative**: `TargetPattern` string matching is lenient. Host patterns like
`*.example.com` require careful implementation to prevent bypasses. The
`glob` or `globset` crate can handle this correctly.
## References
- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3

@@ -1,96 +0,0 @@
# ADR-032: Event Boundary Discipline
## Status
Accepted
## Context
The research identified three distinct communication patterns in the system, and
conflating them is a known anti-pattern in event-driven architectures:
1. **Domain events** (Honker streams) — Internal to the service that owns that
data. Used for state reconstruction within the service's own boundaries.
Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
2. **irpc service calls** — Synchronous request-response within a node or
cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
`SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
that cross node boundaries. External to the system. Examples:
`call.requested`, `call.responded`, `call.completed`, `call.aborted`.
Without a hard constraint, it's tempting to have one service subscribe directly
to another service's Honker streams. This leads to:
- **Leaky event store**: Service A reads Service B's domain events directly,
coupling A to B's internal state representation. When B changes its schema, A
breaks.
- **Boomerang coupling**: An integration event is too thin, causing the
consumer to call back to the source service synchronously to get details. This
negates the benefit of async communication.
- **Fat notification trap**: A notification event carries full entity state,
when it should use state transfer instead.
## Decision
**Event boundary discipline is a hard architectural constraint, not a
suggestion.**
1. **Domain events stay within the owning service.** A Honker stream published
by the storage service (`nodes:created`) is for the storage service's own
state reconstruction. No other service reads these stream events directly.
2. **irpc service calls are synchronous and internal.** They never cross node
boundaries. They are request-response, not events. They should not be used
as a substitute for integration events.
3. **Call protocol events are the only events that cross node boundaries.**
`EventEnvelope` frames are the integration boundary. When a domain event
needs to be communicated to another node, it must be projected into a call
protocol event.
4. **Projection from domain events to integration events is required when
crossing boundaries.** A service that owns a Honker stream must project
relevant state changes into `EventEnvelope` frames before they leave the
node. The projection strips internal details and produces a versioned,
stable integration event.
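The projection step in rule 4 might look like the sketch below. All names here are assumptions for illustration: `DomainEvent`, `IntegrationEvent` (a stand-in for `EventEnvelope`, whose real fields are not shown in this ADR), the `node.created` kind string, and the `project` function are hypothetical.

```rust
/// Hypothetical internal domain event, as a service might keep on its own stream.
pub enum DomainEvent {
    NodeCreated { node_id: String, internal_row_id: u64 },
    EdgeDeleted { edge_id: String },
}

/// Hypothetical stand-in for `EventEnvelope`: a small, versioned, stable shape.
#[derive(Debug, PartialEq)]
pub struct IntegrationEvent {
    pub kind: &'static str,
    pub version: u32,
    pub subject: String,
}

/// Projects a domain event into an integration event, or `None` when the
/// change is internal and must not leave the service boundary.
pub fn project(event: &DomainEvent) -> Option<IntegrationEvent> {
    match event {
        // Internal details such as `internal_row_id` are deliberately dropped.
        DomainEvent::NodeCreated { node_id, .. } => Some(IntegrationEvent {
            kind: "node.created",
            version: 1,
            subject: node_id.clone(),
        }),
        // In this sketch, edge deletions are treated as internal-only.
        DomainEvent::EdgeDeleted { .. } => None,
    }
}
```

Returning `Option` keeps the decision explicit: every domain event either has a reviewed external shape or stays inside the owning service.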
This discipline applies at three levels:
```
Call Protocol (Layer 3, external, JSON)
└── irpc Service (Layer 3, internal, postcard)
└── Honker Streams (Domain events, within service boundary)
```
A call protocol handler MAY call an irpc service internally (e.g.,
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
use Honker streams for its own state management. But domain events never
propagate beyond the service boundary without projection.
## Consequences
- **Positive**: Prevents leaky event stores. Services are independently
deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
contracts. The call protocol's `EventEnvelope` is the only stable, versioned
contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
mock domain events. Integration events are tested against the `EventEnvelope`
schema.
- **Negative**: Projection code is required. Every domain event that needs to
cross a boundary must be explicitly projected. This is deliberate — the
overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
Honker streams across services. Code review should catch this pattern.
## References
- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
- [event_source_types.md](../../research/event-sourcing/event_source_types.md) — Event-driven architecture patterns

@@ -1,132 +0,0 @@
# ADR-033: OperationEnv as Universal Composition Mechanism
## Status
Accepted
## Context
The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
and can invoke any registered operation regardless of whether it runs locally, in
an irpc service on the same cluster, or on a remote node via call protocol.
The research documents define three dispatch paths:
1. **Local dispatch** — direct function call through the operation registry
2. **Service dispatch** — irpc protocol call to a service backend
3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node
Without a formal decision, irpc services could be seen as a replacement for
OperationEnv or for the call protocol. They are not — irpc is one dispatch
backend for OperationEnv, not a replacement for anything. The call protocol is
another dispatch backend. OperationEnv unifies them from the handler's
perspective.
The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams) — internal to the owning service
- irpc service calls — synchronous, in-cluster
- Call protocol events — asynchronous, cross-node
irpc services and call protocol operations serve different scopes but must
compose cleanly through OperationEnv.
## Decision
**OperationEnv is the universal composition mechanism that all operation
handlers receive. It provides namespace + operation name → invoke with input,
return output, regardless of dispatch path.**
### OperationEnv Behavioral Contract
```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```
The Rust implementation may use typed method dispatch or a registry behind the
scenes, but the handler-facing API must preserve this contract.
### Three Dispatch Paths
OperationEnv resolves each call to one of three dispatch backends:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`. The handler always calls
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
back.
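As a rough sketch of the local dispatch path only, a registry-backed implementation could look like this. The `Value` and `ResponseEnvelope` definitions are simplified stand-ins (the real types are not shown in this ADR), and `LocalEnv`, `Handler`, and `register` are hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for the real input type, which this ADR does not define.
pub type Value = String;

/// Stand-in for the real `ResponseEnvelope`.
#[derive(Debug, PartialEq)]
pub enum ResponseEnvelope {
    Ok(Value),
    Err(String),
}

pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}

type Handler = Box<dyn Fn(Value) -> ResponseEnvelope + Send + Sync>;

/// Hypothetical local-only backend: handlers keyed by (namespace, operation).
#[derive(Default)]
pub struct LocalEnv {
    handlers: HashMap<(String, String), Handler>,
}

impl LocalEnv {
    pub fn register<F>(&mut self, namespace: &str, operation: &str, handler: F)
    where
        F: Fn(Value) -> ResponseEnvelope + Send + Sync + 'static,
    {
        self.handlers
            .insert((namespace.to_string(), operation.to_string()), Box::new(handler));
    }
}

impl OperationEnv for LocalEnv {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        let key = (namespace.to_string(), operation.to_string());
        match self.handlers.get(&key) {
            // Local dispatch is a direct in-process function call.
            Some(handler) => handler(input),
            None => ResponseEnvelope::Err(format!(
                "unknown operation {}/{}",
                namespace, operation
            )),
        }
    }
}
```

A service or remote backend would implement the same trait and serialize the input instead of calling a closure, which is why the handler-facing call does not change between deployments.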
### Service Assembly
The deployment topology determines which dispatch path each operation uses:
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)             // Auth runs locally
    .local("config", config_registry)         // Config runs locally
    .service("secrets", secret_irpc_client)   // Secret service via irpc
    .remote("worker-1", call_protocol_conn);  // Worker-1 operations via call protocol
```
### irpc Services Are One Dispatch Backend
irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
and efficient. But they are not a replacement for OperationEnv or for the call
protocol. They are one dispatch backend.
An irpc service can be exposed as a call protocol operation:
`/head/auth/verify` receives a call protocol event and internally calls
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:
```
Call Protocol (Layer 3, external, JSON)
└── irpc Service (Layer 3, internal, postcard)
└── Honker Streams (Domain events, within service boundary)
```
### Adapters Map to OperationEnv
HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
(`/call.requested`) all resolve through OperationEnv. This is what makes
operations universally composable across all interfaces.
## Consequences
- **Positive**: Handlers compose through a single interface. Adding a new
dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
existing call protocol handler pattern. The registry currently maps paths to
handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The `@alkdev/operations` TypeScript `HashMap<String,
HashMap<String, fn>>` model needs idiomatic Rust translation. The behavioral
contract must match, but the implementation can differ.
## References
- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md) — Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation

@@ -1,55 +0,0 @@
# ADR-034: Head/Worker Terminology
## Status
Accepted
## Context
The project previously used hub/spoke terminology for describing node
relationships: a hub node that coordinates connections and spokes that connect to
it. This terminology implies a strict star topology where the hub is
fundamentally different from spokes.
In practice, a coordinating node can also execute operations (run services,
forward traffic). Any node can become a coordinator. The architecture supports
mesh topologies where nodes coordinate in a peer-to-peer fashion.
The research documents (`core.md`, `services.md`) and updated architecture
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
already use head/worker consistently. Existing ADRs (024, 025) retain their
original hub/spoke language because ADRs are historical records.
## Decision
**Use head/worker terminology throughout the project.**
- **Head node**: A node that coordinates — accepts connections, routes
operations, manages cluster state. A head is also a worker (it can execute
operations).
- **Worker node**: A node that connects to a head, registers its services, and
executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.
The terms hub and spoke are deprecated in all new specs, code, and
documentation. Existing ADRs retain their original language as historical
records — ADRs document what was decided at the time, not what the current
terminology is.
## Consequences
- **Positive**: Natural mesh formation. A head that is also a worker enables
multi-hop routing, redundancy, and distributed topologies without a
centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture — there is no
single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
intentional — ADRs are historical records.
## References
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
- [research/core.md](../../research/core.md) — Head/worker terminology

@@ -1,65 +0,0 @@
# ADR-035: StreamInterface and MessageInterface Split
## Status
Accepted
## Context
The `Interface` trait (ADR-026) assumes a persistent byte stream from a `Transport`. It produces a `Session` that yields `InterfaceEvent` frames. This works for SSH and raw framing — both run over duplex streams.
However, HTTP and DNS do not fit this model. They handle individual request/response pairs, not persistent sessions. HTTP runs over a TLS connection after byte-peek protocol detection (extending the existing stealth mode pattern). DNS runs its own server on port 53. Both are stateless per-request, not session-oriented.
The three-layer model (Transport, Interface, Protocol) remains correct. The issue is that Layer 2 has two distinct patterns: stream-based (SSH, raw framing) where the transport provides a continuous byte stream, and message-based (HTTP, DNS) where the interface manages its own transport and handles discrete requests.
## Decision
Split the `Interface` trait into two independent traits:
1. **`StreamInterface`** — consumes a `TransportStream`, produces a long-lived `Session` that yields `InterfaceEvent` frames. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations.
2. **`MessageInterface`** — handles individual `InterfaceRequest` → `InterfaceResponse` pairs. Manages its own transport (HTTP server, DNS server). `HttpInterface` and `DnsInterface` are `MessageInterface` implementations.
The traits are independent. They have different signatures (`accept(stream)` vs `handle_request(req)`), different lifecycles (long-lived session vs stateless per-request), and different transport ownership (provided by caller vs self-managed).
`ListenerConfig` gains variants for both:
```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool,
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}
```
`TransportKind::Dns` is removed. DNS is a `MessageInterface` that manages its own transport (UDP/TCP port 53), not a transport variant.
The call protocol handler (Layer 3) is interface-agnostic: it processes `InterfaceEvent` frames from `StreamInterface` sessions and `InterfaceRequest` → `InterfaceResponse` pairs from `MessageInterface` handlers. The dispatch logic is the same — only the framing differs.
## Consequences
**Positive**: HTTP and DNS are first-class interfaces with proper type signatures. No forcing stateless protocols into a session model. The existing stealth mode byte-peek pattern naturally extends to `HttpInterface`. The `InterfaceRequest` / `InterfaceResponse` types normalize calls across message-based interfaces.
**Positive**: Removing `TransportKind::Dns` prevents a breaking change later — code should never depend on DNS as a transport variant.
**Positive**: `ListenerConfig` correctly models the server's accept loop: stream listeners spawn one accept loop per (transport, interface) pair, while HTTP and DNS listeners each manage their own server.
**Negative**: Two traits where there was one. But they serve fundamentally different purposes. A common super-trait would add complexity (`accept_stream` + `handle_request` + `transport_kind`) without practical benefit — implementations satisfy one trait or the other, never both.
**Negative**: The `accept()` method on the current `Interface` trait needs to be renamed. This is a rename of an existing method signature, not a semantic change — `SshInterface` and `RawFramingInterface` implementations become `StreamInterface` implementations with the same `accept()` logic.
## References
- ADR-026 (transport/interface separation — updated by this ADR)
- [interface.md](../interface.md) — Interface layer spec
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — Full analysis
- [research/phase2/tls-transport.md](../../research/phase2/tls-transport.md) — HTTP interface, ListenerConfig

@@ -1,82 +0,0 @@
# ADR-036: CredentialProvider as Core Type
## Status
Accepted
## Context
Alknet's `IdentityProvider` resolves **inbound** authentication: given a
credential (fingerprint or token), produce an `Identity`. But there is no
corresponding abstraction for **outbound** credentials: how does alknet
authenticate _to_ external services (vast.ai, rustfs, gitea)?
Without `CredentialProvider`, each service wrapper would independently solve
credential retrieval, caching, and lifecycle management. This leads to
duplicated effort and inconsistent security practices across service wrappers.
The pattern mirrors the existing `IdentityProvider` pattern: trait in core,
default impl using simple storage, production impl using the secret service
and database.
## Decision
Define `CredentialProvider` trait and `CredentialSet` enum in
`alknet_core::credentials`.
```rust
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}
```
The trait is intentionally narrow. It returns credentials for a named service.
It does not try to abstract the auth mechanism itself — that stays with the
service wrapper that knows the protocol (S3 signing, OAuth2 refresh, etc.).
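To illustrate the division of labor, here is a minimal in-memory provider together with a wrapper-side helper. It is a sketch under assumptions: `CredentialSet` is trimmed to two variants for brevity, and `StaticCredentialProvider` and `auth_header` are hypothetical names that do not appear in the spec.

```rust
use std::collections::HashMap;

/// Trimmed copy of `CredentialSet` with only two variants, for illustration.
#[derive(Clone, Debug, PartialEq)]
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Bearer { token: String },
}

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

/// Hypothetical provider that serves credentials from a fixed in-memory map.
pub struct StaticCredentialProvider {
    entries: HashMap<String, CredentialSet>,
}

impl StaticCredentialProvider {
    pub fn new(entries: HashMap<String, CredentialSet>) -> Self {
        Self { entries }
    }
}

impl CredentialProvider for StaticCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.entries.get(service).cloned()
    }

    // Static credentials cannot be renewed, so refresh returns the same value.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.get_credentials(service)
    }
}

/// Wrapper-side logic: the service wrapper, not the provider, decides how a
/// credential is mapped onto a request header.
pub fn auth_header(creds: &CredentialSet) -> (String, String) {
    match creds {
        CredentialSet::ApiKey { header_name, token } => (header_name.clone(), token.clone()),
        CredentialSet::Bearer { token } => {
            ("Authorization".to_string(), format!("Bearer {}", token))
        }
    }
}
```

The provider only answers "which credentials belong to this service name"; protocol-specific work such as request signing would live next to `auth_header` in the wrapper.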
Phase 1 provides `SecretStoreCredentialProvider` (reads from
`SecretProtocol::Decrypt`, holds in RAM). Phase 2+ adds
`ManagedCredentialProvider` (with `CredentialManager` for lifecycle management:
refresh, expiration, provisioning).
`CredentialProvider` does not depend on `IdentityProvider`, though
`ManagedCredentialProvider` may use `Identity.id` for identity-bound credential
lookups.
## Consequences
**Positive**: Outbound auth has a unified abstraction, just as inbound auth
has `IdentityProvider`. Service wrappers retrieve credentials through one
interface. `OperationEnv` can expose credentials through `context.env`.
**Positive**: The `CredentialSet` enum covers all identified credential types
(API keys, bearer tokens, S3 access keys, OIDC tokens, basic auth, custom).
This is sufficient for Phases A-C. Phase D (alknet as OIDC provider) is additive.
**Positive**: The trait in core, impl in service crate pattern is consistent
with `IdentityProvider` (trait in core, `ConfigIdentityProvider` in core,
`StorageIdentityProvider` in alknet-storage).
**Negative**: Adds a new core type and a new module (`credentials`). But this
is the same pattern as `IdentityProvider` and `auth` — a small, narrow trait
with a clear contract.
**Negative**: `ManagedCredentialProvider` and `CredentialManager` are Phase C
concepts. The spec should define them as future extensions, not implement them
now.
## References
- ADR-029 (Identity as core type — same pattern)
- [credentials.md](../credentials.md) — CredentialProvider spec
- [research/phase2/credential-provider.md](../../research/phase2/credential-provider.md) — Full analysis
- [identity.md](../identity.md) — IdentityProvider (inbound, opposite direction)

@@ -1,83 +0,0 @@
# ADR-037: API Keys as DynamicConfig Auth
## Status
Accepted
## Context
Alknet's token auth uses Ed25519-signed `AuthToken`s — the same key material
used for SSH auth. This is appropriate for interactive clients (browsers, CLI)
that can generate Ed25519 key pairs and sign tokens with them.
But for service accounts, automation, and simple integrations, Ed25519 key
pairs are inconvenient. A dashboard backend, a CI/CD pipeline, or a monitoring
script needs a simple bearer token that can be stored in an environment variable
or config file without managing cryptographic key pairs.
The HTTP interface (Phase 2+) requires bearer token auth for `Authorization:
Bearer <token>` headers. `AuthToken` works but requires client-side Ed25519
signing. API keys offer a simpler alternative: short bearer tokens verified by
SHA-256 hash lookup, with optional scope restrictions and TTL.
## Decision
Add `[[auth.api_keys]]` section to `DynamicConfig`:
```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect", "secrets:derive"]
description = "dashboard service account"
ttl = "30d" # optional
```
`ConfigIdentityProvider::resolve_from_token()` handles both token types:
- If the input starts with the configured prefix (default `alk_`), treat it as
an API key: hash it with SHA-256 and look up the hash in the `api_keys` table.
- Otherwise, treat it as an `AuthToken`: decode, verify Ed25519 signature,
check timestamp, resolve from `authorized_keys`.
Both paths produce the same `Identity` result. In database-backed deployments,
both resolve to the same account UUID.
API keys are stored as SHA-256 hashes (like password hashing — the cleartext
key is never stored, only its hash). The prefix enables O(1) routing between
AuthToken and API key verification without trying both paths.
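The prefix-based routing can be sketched as below. This is an illustration under assumptions, not the `ConfigIdentityProvider` code: `ApiKeyTable`, `Resolved`, and `resolve` are hypothetical names, and the hash function is passed in as a parameter because SHA-256 is not part of the Rust standard library (a real deployment would supply a SHA-256 implementation here).

```rust
use std::collections::HashMap;

/// Hypothetical outcome of routing a bearer string.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// The token carried the API-key prefix and its hash was found.
    ApiKey { scopes: Vec<String> },
    /// No prefix: hand over to the Ed25519 `AuthToken` verification path.
    AuthToken,
    /// The token carried the prefix but its hash is unknown.
    Rejected,
}

/// Hypothetical table mirroring `[[auth.api_keys]]`: hash string -> scopes.
pub struct ApiKeyTable {
    pub prefix: String,
    pub by_hash: HashMap<String, Vec<String>>,
}

impl ApiKeyTable {
    /// `hash` is an assumption of this sketch: any function producing the
    /// stored hash string (for example "sha256:...") from the cleartext key.
    pub fn resolve<H>(&self, token: &str, hash: H) -> Resolved
    where
        H: Fn(&str) -> String,
    {
        // A single prefix check picks the path; the other path is never tried.
        if !token.starts_with(self.prefix.as_str()) {
            return Resolved::AuthToken;
        }
        match self.by_hash.get(&hash(token)) {
            Some(scopes) => Resolved::ApiKey { scopes: scopes.clone() },
            None => Resolved::Rejected,
        }
    }
}
```

Only the hash is ever looked up, so the table never needs to hold the cleartext key.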
The full key is provided to the client exactly once (at creation time). Subsequent
verifications only compare hashes.
## Consequences
**Positive**: Simple bearer token auth for HTTP and other non-SSH interfaces.
No cryptographic key management for service accounts. Consistent with industry
practice (Stripe, GitHub, AWS all use prefixed API keys).
**Positive**: Both AuthTokens and API keys go through `resolve_from_token()`.
The caller doesn't need to know which type they're using. This keeps the
authentication layer unified.
**Positive**: Scoped API keys enable fine-grained access control for service
accounts. A monitoring tool gets `["monitoring:read"]`, not full access.
**Negative**: API keys are bearer tokens — anyone who obtains the key has the
associated permissions. The hash storage and optional TTL mitigate but do not
eliminate this risk. Ed25519 AuthTokens remain the preferred auth method for
interactive clients.
**Negative**: API key rotation requires updating `DynamicConfig` (or the
`api_keys` database table). The `ConfigReloadHandle` / `ConfigService` reload
mechanism handles this, but it's a deliberate operation, not automatic.
**Negative**: No rate limiting on API key verification is built into this ADR.
Rate limiting on the HTTP interface is a separate concern.
## References
- ADR-023 (unified auth, shared key material)
- ADR-029 (Identity as core type)
- ADR-030 (static/dynamic config split)
- [auth.md](../auth.md) — Token auth, AuthPolicy, API keys
- [configuration.md](../configuration.md) — DynamicConfig, AuthPolicy
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — API keys in config

@@ -1,137 +0,0 @@
# ADR-038: Seed Lifecycle and Memory Security
## Status
Accepted
## Context
The alknet-secret crate holds the master BIP39 seed phrase in RAM. This seed is
the root of trust for all derived keys (identity, encryption, signing). If the
seed is leaked — through memory dumps, swap files, or core dumps — an attacker
can derive every key in the system.
Security-conscious key management systems typically employ three defenses:
1. **Zeroize**: Overwrite sensitive memory before deallocating. Prevents
stale-data reads from freed memory.
2. **Memory locking** (`mlock`/`VirtualLock`): Prevent the OS from paging
sensitive RAM to disk. Prevents swap-file leakage.
3. **Constant-time comparison**: Prevent timing side-channels when comparing
keys or tokens.
The question is: which of these should alknet-secret adopt in v1, and which
should be deferred?
## Decision
**Phase 3 (v1): Zeroize only. Defer mlock and constant-time comparison to
Phase B.**
- All sensitive types (seed bytes, derived private keys, passphrase strings)
derive `Zeroize` and implement `Drop` to call `zeroize()` before deallocation.
- The `Lock` operation calls `zeroize()` on the seed and all cached derived
keys, then drops them.
- `mlock`/`VirtualLock` and constant-time comparison are not included in v1.
### Rationale for deferring mlock
1. **Complexity**: `mlock` requires root/CAP_IPC_LOCK on Linux or
`SeLockMemory` on Windows. The crate should work in unprivileged contexts
(development, testing, single-user nodes) without requiring system
configuration changes.
2. **Performance**: `mlock` locks physical pages, which are typically 4KB.
Locking many small buffers wastes physical memory. The seed (64 bytes) and
derived keys (32–64 bytes each) are tiny — the real risk is swap-file
leakage, which `zeroize` partially mitigates by wiping before free.
3. **Deployment flexibility**: Production head nodes running as root or with
`CAP_IPC_LOCK` can add `mlock` in Phase B. Development and CLI nodes
shouldn't need it.
4. **Audit surface**: `mlock` introduces platform-specific code paths (Linux
vs macOS vs Windows) that should be audited together, not bolted on
incrementally.
### Rationale for deferring constant-time comparison
The `SecretProtocol` service receives requests over irpc (local mpsc or remote
QUIC). Comparison timing is not observable by callers — they send a message and
wait for a response. The comparison that matters (auth token verification) is
in alknet-core's `IdentityProvider`, not in alknet-secret. Key derivation
results (DerivedKey) are not compared against attacker-controlled input within
this crate.
### Zeroize implementation
```rust
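use std::collections::HashMap;
use zeroize::Zeroize;

#[derive(Zeroize)]
#[zeroize(drop)]
struct SeedHolder {
    seed: Vec<u8>,
}

// `zeroize` provides no `Zeroize` impl for `HashMap`, so the derive cannot be
// used here; each cached key is wiped by hand when the cache is dropped.
struct DerivedKeyCache {
    keys: HashMap<String, Vec<u8>>,
}

impl Drop for DerivedKeyCache {
    fn drop(&mut self) {
        for key in self.keys.values_mut() {
            key.zeroize();
        }
    }
}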
```
`#[zeroize(drop)]` ensures that `Drop` calls `zeroize()` on all fields,
overwriting memory before deallocation. This is a compile-time guarantee —
forgetting to zeroize a field is a compile error.
### Lock lifecycle
```
Unlock(passphrase)
  → validate mnemonic (if restoring) or generate new
  → derive master key from seed
  → store seed in SeedHolder (Zeroize-protected)
  → cache empty (keys derived on demand)

DeriveEd25519/DeriveEncryptionKey/Encrypt/Decrypt
  → require unlocked state (error if locked)
  → derive key, return result
  → optionally cache derived key

Lock
  → zeroize all cached derived keys
  → zeroize seed
  → drop all sensitive material
  → service returns to locked state
```
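
The lifecycle above can be sketched as a small state machine. This is a minimal, std-only illustration, not the alknet-secret implementation: `SecretState`, `SecretError`, and `wipe` are hypothetical names, the "derivation" is a non-cryptographic placeholder, and the plain byte overwrite stands in for `zeroize` (which uses volatile writes so the compiler cannot elide them).

```rust
use std::collections::HashMap;

/// Hypothetical error type: the only failure modelled is "service is locked".
#[derive(Debug, PartialEq)]
pub enum SecretError {
    Locked,
}

/// Hypothetical in-memory state. `seed == None` means locked.
pub struct SecretState {
    seed: Option<Vec<u8>>,
    cache: HashMap<String, Vec<u8>>, // derived keys, filled on demand
}

/// Stand-in for `zeroize`: overwrite the buffer with zeros.
/// Real code should use `zeroize`, whose writes cannot be optimized away.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        *b = 0;
    }
}

impl SecretState {
    pub fn new() -> Self {
        SecretState { seed: None, cache: HashMap::new() }
    }

    pub fn is_locked(&self) -> bool {
        self.seed.is_none()
    }

    /// Unlock: wipe any previous material, store the seed, start with an empty cache.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.lock();
        self.seed = Some(seed);
    }

    /// Derive a key for `path`, erroring if locked.
    /// PLACEHOLDER derivation (XOR with the path bytes), not cryptographic.
    pub fn derive(&mut self, path: &str) -> Result<Vec<u8>, SecretError> {
        let seed = self.seed.as_ref().ok_or(SecretError::Locked)?;
        if let Some(cached) = self.cache.get(path) {
            return Ok(cached.clone());
        }
        let key: Vec<u8> = seed
            .iter()
            .zip(path.bytes().cycle())
            .map(|(s, p)| s ^ p)
            .collect();
        self.cache.insert(path.to_string(), key.clone());
        Ok(key)
    }

    /// Lock: wipe cached keys, wipe the seed, return to the locked state.
    pub fn lock(&mut self) {
        for key in self.cache.values_mut() {
            wipe(key);
        }
        self.cache.clear();
        if let Some(mut seed) = self.seed.take() {
            wipe(&mut seed);
        }
    }
}
```

Calling `derive` before `unlock` or after `lock` yields `Err(SecretError::Locked)`, which mirrors the "require unlocked state" step.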
## Consequences
- **Positive**: Zeroize has negligible runtime cost, is a minimal dependency
  (the `zeroize` crate is ~500 lines of pure Rust with no FFI or assembly),
  and provides meaningful protection against stale-memory reads.
- **Positive**: Lock effectively purges all sensitive material. After Lock,
the process memory contains no useful secret data.
- **Positive**: No platform-specific code paths in v1. The crate compiles and
runs everywhere without privilege requirements.
- **Negative**: Without `mlock`, the OS can page the seed to swap before
zeroization occurs. This is a window of vulnerability that Phase B closes.
The risk is acceptable for v1 because swap-file extraction requires root
access or physical access to the machine — the same threat model as reading
process memory directly.
- **Negative**: Without constant-time comparison, timing side-channels exist
in theory. In practice, no comparison in alknet-secret operates on
attacker-controlled input, so the risk is nil within this crate.
- **Negative**: `zeroize` adds a dependency. The `zeroize` crate is widely
  used in Rust crypto (ed25519-dalek, x25519-dalek, the RustCrypto crates) and
  is a de facto standard.
## References
- [secret-service.md](../secret-service.md) — Security model, Lock/Unlock lifecycle
- [ADR-027](027-crate-decomposition.md) — Crate decomposition (alknet-secret is independent)
- [credentials.md](../credentials.md) — SecretStoreCredentialProvider integration
- `zeroize` crate — https://crates.io/crates/zeroize

@@ -1,226 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Definitions: Terminology and Concept Disambiguation
## Purpose
Several terms are overloaded across alknet's architecture. This document defines
each term precisely and states the rule for using it in architecture specs. When
ambiguity is possible, specs must use the full qualifier.
This is a normative reference — other architecture documents link here rather
than repeating definitions inline.
## Term Definitions
### Interface (Layer 2)
An **Interface** consumes a Transport stream (Layer 1) or manages its own
transport, and produces call protocol sessions or handles discrete requests.
It is a _protocol parser_, not a network service.
Two subtypes:
| Subtype | Trait | Lifecycle | Transport ownership | Examples |
|---------|-------|-----------|---------------------|----------|
| `StreamInterface` | `accept(stream) → Session` | Long-lived session | Provided by caller | SshInterface, RawFramingInterface |
| `MessageInterface` | `handle_request(req) → Response` | Stateless per-request | Self-managed | HttpInterface, DnsInterface |
**Rule**: In alknet architecture docs, "Interface" (capitalized) refers to
Layer 2. Rust trait definitions use "trait" or "contract." Network URLs use
"endpoint." When discussing auth mechanisms per transport/interface pair, use
"credential presentation" (not "auth interface").
See: [interface.md](interface.md), ADR-035.
### Transport (Layer 1)
A **Transport** produces a byte stream (`AsyncRead + AsyncWrite + Unpin + Send`).
It is a _wire mechanism_, not a protocol. `TransportKind` enumerates:
`Tcp`, `Tls`, `Iroh`, `WebTransport`.
DNS is **not** a transport — it is a `MessageInterface` that manages its own
transport (UDP/TCP port 53).
**Rule**: Never use "transport" to refer to HTTP, DNS, or any protocol that
doesn't produce a `TransportStream`. Use "MessageInterface" instead.
See: [transport.md](transport.md), ADR-026, ADR-035.
### Service (irpc service)
An **irpc service** is an in-cluster, Rust-to-Rust service defined by an irpc
protocol enum. Dispatched by enum variant with postcard serialization. Examples:
`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`.
**Rule**: Always qualify: "irpc service" (in-cluster, enum-dispatched),
"application service" (operation-registered handler), or "external service"
(third-party endpoint). Never use bare "service" in architecture docs.
See: [services.md](services.md), ADR-028, ADR-033.
### Operation (call protocol)
An **operation** is a path-based handler registered in `OperationRegistry`,
dispatched by `namespace + name`. Cross-node, cross-language, JSON
`EventEnvelope` framing.
**Rule**: Use "operation" for call protocol handlers. Use "irpc service method"
for enum-dispatched calls. These are different dispatch mechanisms unified by
OperationEnv.
See: [call-protocol.md](call-protocol.md), ADR-033.
### Identity (core type)
The `Identity` struct `{ id, scopes, resources }` represents an authenticated
principal. Produced by `IdentityProvider` (inbound auth resolution).
| Identity field | Config-backed auth | Database-backed auth |
|---------------|-------------------|---------------------|
| `id` | SSH key fingerprint | Account UUID |
| `scopes` | From authorized_keys entry | From peer_credentials + ACL |
| `resources` | From authorized_keys entry | From organization membership |
**Rule**: "Identity" (capitalized, code font) = the alknet struct. "identity
service" = a full identity management system (Keystone, etc.). Never conflate
the two.
See: [identity.md](identity.md), ADR-029.
### IdentityProvider (inbound auth)
`IdentityProvider` resolves **inbound** authentication: given a credential
(fingerprint or token), produce an `Identity`.
**Direction**: Inbound (who is calling alknet).
**Rule**: Never use "IdentityProvider" to describe outbound auth. That is
`CredentialProvider`.
See: [identity.md](identity.md), ADR-029.
### CredentialProvider (outbound auth)
`CredentialProvider` resolves **outbound** credentials: given a service name,
produce a `CredentialSet` for authenticating _to_ that service.
**Direction**: Outbound (how alknet calls others).
**Rule**: Never use "CredentialProvider" for inbound auth. That is
`IdentityProvider`.
See: [credentials.md](credentials.md), ADR-036.
### AuthToken
`AuthToken = base64url(key_id || timestamp || signature)` — an Ed25519-signed
timestamp token used for non-SSH auth. Self-signed by the client, verified
server-side.
**Rule**: Use "AuthToken" (capitalized) for this specific format. Use "API key"
for hash-verified bearer tokens. Never use bare "token" in architecture docs.
See: [auth.md](auth.md), ADR-023.
### API Key
A hash-verified bearer token with a prefix like `alk_...`. Simpler than
AuthToken (no Ed25519 key pair needed). Stored as a SHA-256 hash in
`DynamicConfig.auth.api_keys` or the `api_keys` table.
**Rule**: Always "API key" (two words) for hash-verified bearer tokens.
"AuthToken" for Ed25519-signed tokens.
See: [auth.md](auth.md), ADR-037.
### Domain Event vs Integration Event
| Type | Scope | Serialization | Example |
|------|-------|---------------|---------|
| Domain event | Within a service boundary | Any format (Honker streams) | `KeyRotated`, `InventoryAdjusted` |
| Integration event | Across service or node boundaries | JSON `EventEnvelope` | `call.requested`, `UserCreated` |
irpc service calls are synchronous request-response, not events.
**Rule**: "Domain event" for internal Honker streams. "Integration event" for
call protocol `EventEnvelope`. "irpc call" for synchronous in-cluster calls.
Per ADR-032, domain events never cross service boundaries without projection.
See: ADR-032, [services.md](services.md).
### Scope
A permission string attached to an `Identity`. Flat strings like
`"relay:connect"`, `"secrets:derive"`. Used by `ForwardingPolicy` and
operation-level ACL.
**Rule**: Use "scope" for `Identity.scopes` flat strings. Use "resource" for
`Identity.resources` entries. Do not conflate with hierarchical role models
unless explicitly noting a comparison to Keystone.
See: [identity.md](identity.md), ADR-031.
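
A flat scope check is a plain membership test. The sketch below assumes the `Identity` field types given in identity.md; `has_scope` and `has_resource` are hypothetical helper names used only for illustration.

```rust
use std::collections::HashMap;

/// Mirrors the `Identity { id, scopes, resources }` shape described in the specs.
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical helper: flat scopes match by exact string equality,
/// with no hierarchy or implied scopes.
pub fn has_scope(identity: &Identity, required: &str) -> bool {
    identity.scopes.iter().any(|s| s == required)
}

/// Hypothetical helper: a resource entry is a (kind, name) pair,
/// e.g. ("service", "gitea").
pub fn has_resource(identity: &Identity, kind: &str, name: &str) -> bool {
    identity
        .resources
        .get(kind)
        .map_or(false, |names| names.iter().any(|n| n == name))
}
```

Because matching is exact, `"relay:connect"` does not imply any other scope; that is the "stay flat" position recommended in OQ-DEF-03.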
### OperationRegistry
The central registry mapping `(namespace, operation_name)` to handlers and
specs. All interfaces resolve to the same registry.
**Rule**: "OperationRegistry" for this specific data structure. "Service
catalog" only when explicitly comparing to Keystone or similar external systems.
See: [call-protocol.md](call-protocol.md), ADR-025.
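
As a rough illustration of the `(namespace, operation_name)` lookup, the following std-only sketch keys handlers by a tuple. `RegistrySketch` and its method names are hypothetical, and plain strings stand in for the JSON values and error types that the real registry would carry.

```rust
use std::collections::HashMap;

/// Hypothetical handler signature: string in, string out.
/// The real registry would use JSON values and a structured error.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Hypothetical stand-in for the registry described above.
pub struct RegistrySketch {
    handlers: HashMap<(String, String), Handler>,
}

impl RegistrySketch {
    pub fn new() -> Self {
        RegistrySketch { handlers: HashMap::new() }
    }

    /// Register a handler under (namespace, name); a later registration
    /// with the same key replaces the earlier one.
    pub fn register<F>(&mut self, namespace: &str, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers
            .insert((namespace.to_string(), name.to_string()), Box::new(handler));
    }

    /// Look up the handler and call it; unknown operations are an error.
    pub fn invoke(&self, namespace: &str, name: &str, input: &str) -> Result<String, String> {
        match self.handlers.get(&(namespace.to_string(), name.to_string())) {
            Some(handler) => handler(input),
            None => Err(format!("unknown operation {}/{}", namespace, name)),
        }
    }
}
```

Every interface resolving to one such map is what lets SSH, raw framing, HTTP, and DNS share the same set of operations.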
### Credential Presentation
The mechanism by which credentials are presented on each (Transport, Interface)
pair:
| (Transport, Interface) | Credential presentation | Resolves via |
|----------------------|----------------------|-------------|
| (TLS, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TCP, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (iroh, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TLS, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, raw framing) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HTTP) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DNS) | AuthToken in query labels | `resolve_from_token()` |
**Rule**: Use "credential presentation" for the mechanism of presenting
credentials on a specific (Transport, Interface) pair. Not "auth interface"
(which overloads "Interface").
See: [auth.md](auth.md), [interface.md](interface.md).
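
The table can also be read as a total function over (Transport, Interface) pairs. The sketch below encodes exactly the rows listed; all type and function names are hypothetical, and pairs that do not appear in the table return `None`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transport { Tcp, Tls, Iroh, WebTransport }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Interface { Ssh, RawFraming, Http, Dns }

#[derive(Debug, PartialEq)]
pub enum Presentation {
    SshKeyHandshake,
    TokenInFrameHeader,
    TokenInConnectRequest,
    BearerHeader,
    TokenInQueryLabels,
}

/// Which `IdentityProvider` method resolves the credential.
#[derive(Debug, PartialEq)]
pub enum Resolver { Fingerprint, Token }

/// Hypothetical lookup mirroring the table row by row.
/// `transport` is `None` for message interfaces, which own their transport.
pub fn presentation(
    transport: Option<Transport>,
    interface: Interface,
) -> Option<(Presentation, Resolver)> {
    match (transport, interface) {
        (Some(Transport::Tls), Interface::Ssh)
        | (Some(Transport::Tcp), Interface::Ssh)
        | (Some(Transport::Iroh), Interface::Ssh) => {
            Some((Presentation::SshKeyHandshake, Resolver::Fingerprint))
        }
        (Some(Transport::Tls), Interface::RawFraming)
        | (Some(Transport::Tcp), Interface::RawFraming) => {
            Some((Presentation::TokenInFrameHeader, Resolver::Token))
        }
        (Some(Transport::WebTransport), Interface::RawFraming) => {
            Some((Presentation::TokenInConnectRequest, Resolver::Token))
        }
        (None, Interface::Http) => Some((Presentation::BearerHeader, Resolver::Token)),
        (None, Interface::Dns) => Some((Presentation::TokenInQueryLabels, Resolver::Token)),
        _ => None, // pair not listed in the table
    }
}
```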
## Cross-cutting Open Questions
These questions affect multiple specs and need resolution before or during
Phase 2 implementation:
- **OQ-DEF-03**: Should `Identity.scopes` be hierarchical (Keystone implied roles)
or stay flat? Recommendation: Stay flat. Add implied scope resolution in
alknet-storage when multi-tenant deployment requires it.
- **OQ-DEF-07**: Should the on-chain `IdentityProvider` be a separate impl or a
`CredentialProvider` extension? Recommendation: Separate `IdentityProvider`
impl (`OnChainIdentityProvider`). `IdentityProvider` resolves inbound auth,
not outbound credentials.
- **OQ-DEF-08**: Should "credential presentation" replace overloaded "interface" in
auth contexts? Recommendation: Yes. Adopted in this document.
See: [open-questions.md](open-questions.md) for tracking.
## References
- [interface.md](interface.md) — StreamInterface / MessageInterface
- [auth.md](auth.md) — AuthToken, credential presentation per interface
- [identity.md](identity.md) — Identity, IdentityProvider
- [credentials.md](credentials.md) — CredentialProvider, CredentialSet
- [services.md](services.md) — irpc services vs application services
- [call-protocol.md](call-protocol.md) — Operations, OperationEnv
- [research/phase2/definitions.md](../research/phase2/definitions.md) — Full research with cross-domain mappings

@@ -1,186 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---
# FlowGraph
## What
The `alknet-flowgraph` crate provides graph data structures and operations,
mapping the TypeScript `@alkdev/flowgraph` package's call-graph and
operation-graph concepts to `petgraph::DiGraph`.
## Why
Call graphs and operation graphs are core observability and type-safety
constructs. Call graphs track request flow across services; operation graphs
validate type compatibility between composed operations. The crate is pure
computation (no I/O, no external state), making it safe to include in any
deployment topology.
## Architecture
### Core Abstraction
`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the
operations used:
| TypeScript (graphology) | Rust (petgraph) |
|------------------------|-----------------|
| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` + key_to_index |
| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` |
| `hasCycle()` | `is_cyclic_directed(&graph)` |
| `topologicalSort()` | `toposort(&graph, None)` |
A `HashMap<String, NodeIndex>` provides node-key-to-index lookups, mirroring
the `key` column in the SQLite `nodes` table.
### FlowGraph<N, E>
```rust
pub struct FlowGraph<N, E>
where
    N: NodeAttributes,
    E: EdgeAttributes,
{
    graph: DiGraph<N, E>,
    key_to_index: HashMap<String, NodeIndex>,
}

pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn key(&self) -> &str;
    fn set_key(&mut self, key: String);
}

pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn edge_type(&self) -> &str;
}
```
### Operation Graph (Static)
Built from `OperationSpec`s at startup. Answers structural questions: type
compatibility, cycle detection, reachability.
```rust
pub struct OperationNodeAttrs {
    pub name: String,
    pub namespace: String,
    pub op_type: OperationType,
    pub input_schema: Value,
    pub output_schema: Value,
}

pub enum OperationType { Query, Mutation, Subscription }
```
Type compatibility compares `output_schema` (source) against `input_schema`
(target) using `jsonschema::validate()`. Exact match or subtype = compatible
edge. Structural mismatch = incompatible edge.
### Call Graph (Dynamic)
Populated at runtime from call protocol events. Every `call.requested` adds a
node; `call.responded`/`call.error`/`call.aborted` update status.
```rust
pub struct CallNodeAttrs {
    pub request_id: String,
    pub operation_id: String,
    pub status: CallStatus,
    pub parent_request_id: Option<String>,
    pub input: Value,
    pub output: Option<Value>,
    pub error: Option<CallErrorInfo>,
    pub identity: Option<Identity>,
    pub started_at: Option<String>,
    pub completed_at: Option<String>,
}

pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }
```
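
The event-to-status mapping can be sketched as follows. This is a std-only illustration under stated assumptions: `CallGraphSketch` and the `CallEvent` variant shapes are hypothetical, a new node is assumed to start as `Pending`, and `call.responded`, `call.error`, and `call.aborted` are assumed to map to `Completed`, `Failed`, and `Aborted` respectively.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }

/// Hypothetical event shapes; the real events carry full envelopes.
pub enum CallEvent {
    Requested { request_id: String, parent_request_id: Option<String> },
    Responded { request_id: String },
    Error { request_id: String },
    Aborted { request_id: String },
}

/// Hypothetical call graph reduced to status and parent links.
#[derive(Default)]
pub struct CallGraphSketch {
    status: HashMap<String, CallStatus>,
    parent: HashMap<String, String>,
}

impl CallGraphSketch {
    /// `call.requested` adds a node; the other events update its status.
    pub fn update_from_event(&mut self, event: &CallEvent) {
        match event {
            CallEvent::Requested { request_id, parent_request_id } => {
                self.status.insert(request_id.clone(), CallStatus::Pending);
                if let Some(parent) = parent_request_id {
                    self.parent.insert(request_id.clone(), parent.clone());
                }
            }
            CallEvent::Responded { request_id } => self.set(request_id, CallStatus::Completed),
            CallEvent::Error { request_id } => self.set(request_id, CallStatus::Failed),
            CallEvent::Aborted { request_id } => self.set(request_id, CallStatus::Aborted),
        }
    }

    /// Update the status of a known node; events for unknown ids are ignored.
    fn set(&mut self, request_id: &str, status: CallStatus) {
        if let Some(slot) = self.status.get_mut(request_id) {
            *slot = status;
        }
    }

    pub fn status_of(&self, request_id: &str) -> Option<CallStatus> {
        self.status.get(request_id).copied()
    }

    /// Walk `parent_request_id` links from nearest to farthest ancestor.
    pub fn ancestors(&self, request_id: &str) -> Vec<String> {
        let mut out = Vec::new();
        let mut current = request_id;
        while let Some(parent) = self.parent.get(current) {
            if out.len() >= self.parent.len() {
                break; // defensive bound in case the links are malformed
            }
            out.push(parent.clone());
            current = parent.as_str();
        }
        out
    }
}
```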
### Key Operations
| Query | Method | Returns |
|-------|--------|---------|
| Topological order | `topological_order()` | `Result<Vec<String>, CycleError>` |
| Cycle detection | `has_cycles()` | `bool` |
| Ancestors/descendants | `ancestors()`, `descendants()` | `Vec<String>` |
| Status filtering | `filter_by_status()` | Keys with matching status |
| Duration | `duration()` | `completed_at - started_at` |
### DAG Invariants
- **Operation graph**: DAG-only, enforced at construction. Adding an edge that
  would create a cycle returns `CycleError`.
- **Call graph**: DAG by design. `parent_request_id` cannot create ancestor
cycles.
- **No parallel edges**: `multi: false`.
- **No self-loops**: `allow_self_loops: false`.
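
The invariants above can be illustrated without petgraph. The sketch below is a std-only stand-in (Kahn's algorithm instead of `petgraph::algo::toposort`); `KeyedDag` and `GraphError` are hypothetical names, and the real crate would wrap `DiGraph` rather than store edges in a set.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq)]
pub enum GraphError {
    UnknownNode(String),
    SelfLoop(String),
    ParallelEdge(String, String),
    Cycle,
}

/// Hypothetical keyed DAG: string keys map to dense indices,
/// mirroring the `key_to_index` idea.
#[derive(Default)]
pub struct KeyedDag {
    key_to_index: HashMap<String, usize>,
    keys: Vec<String>,
    edges: HashSet<(usize, usize)>,
}

impl KeyedDag {
    /// Add a node, or return the existing index if the key is already present.
    pub fn add_node(&mut self, key: &str) -> usize {
        if let Some(&index) = self.key_to_index.get(key) {
            return index;
        }
        let index = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), index);
        index
    }

    fn index_of(&self, key: &str) -> Result<usize, GraphError> {
        self.key_to_index
            .get(key)
            .copied()
            .ok_or_else(|| GraphError::UnknownNode(key.to_string()))
    }

    /// Add an edge, rejecting self-loops, parallel edges, and cycles.
    pub fn add_edge(&mut self, source: &str, target: &str) -> Result<(), GraphError> {
        let s = self.index_of(source)?;
        let t = self.index_of(target)?;
        if s == t {
            return Err(GraphError::SelfLoop(source.to_string()));
        }
        if !self.edges.insert((s, t)) {
            return Err(GraphError::ParallelEdge(source.to_string(), target.to_string()));
        }
        if self.has_cycles() {
            self.edges.remove(&(s, t)); // roll back so the graph stays a DAG
            return Err(GraphError::Cycle);
        }
        Ok(())
    }

    /// Kahn's algorithm: repeatedly emit nodes whose in-degree has dropped to zero.
    pub fn topological_order(&self) -> Result<Vec<String>, GraphError> {
        let n = self.keys.len();
        let mut indegree = vec![0usize; n];
        for &(_, t) in &self.edges {
            indegree[t] += 1;
        }
        let mut ready: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
        let mut order = Vec::with_capacity(n);
        while let Some(i) = ready.pop_front() {
            order.push(self.keys[i].clone());
            for &(s, t) in &self.edges {
                if s == i {
                    indegree[t] -= 1;
                    if indegree[t] == 0 {
                        ready.push_back(t);
                    }
                }
            }
        }
        if order.len() == n { Ok(order) } else { Err(GraphError::Cycle) }
    }

    pub fn has_cycles(&self) -> bool {
        self.topological_order().is_err()
    }
}
```

Checking for a cycle on every `add_edge` is quadratic in the worst case; it is used here only because it keeps the "enforced at construction" rule easy to see.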
### Integration with alknet-storage
Call graphs and operation graphs are stored as metagraph instances in
alknet-storage. The bridge is serialization: `FlowGraph` serializes to
`serde_json::Value`, which storage persists in the `nodes.attributes` and
`edges.attributes` columns.
### Integration with alknet-core (Call Protocol)
The call protocol's `EventEnvelope` drives call graph construction:
```rust
call_map.on_requested(|event| {
    call_graph.update_from_event(&CallEvent::Requested(event));
});
```
### Crate Dependencies
```toml
[dependencies]
petgraph = "0.x"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
thiserror = "1"
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.x", features = ["serde"] }
```
Does NOT depend on alknet-core, alknet-storage, or alknet-secret.
### Interface Back to Core
`OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions.
The bridge is serialization — flowgraph serializes to JSON, storage persists it.
alknet-flowgraph does not depend on alknet-core as a crate; it conforms to the
`OperationSpec` schema independently.
## Constraints
- Pure computation crate — no I/O, no database, no external state.
- No dependency on alknet-core, alknet-storage, or alknet-secret.
- Type compatibility with alknet-core's `OperationSpec` is via serialization
conformance, not a crate dependency.
## Open Questions
- None specific to this spec. See [open-questions.md](open-questions.md) for
general questions.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-flowgraph is independent of core, storage, secret |
## References
- [research/flow.md](../research/flow.md) — Full FlowGraph, operation graph, call graph design
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.3
- [call-protocol.md](call-protocol.md) — EventEnvelope, PendingRequestMap
- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph implementation
- `@alkdev/operations` — OperationSpec, CallHandler, registry

@@ -1,193 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---
# Identity
## What
The `Identity` type and `IdentityProvider` trait are the core abstractions for
authentication and authorization in alknet. `Identity` is the unified result of
auth verification — whether via SSH public key, signed timestamp token, or
database lookup. `IdentityProvider` is the trait that resolves credentials to an
`Identity`, decoupling alknet-core from any specific identity storage.
## Why
Auth, forwarding policy, and call protocol all need to know who is making a
request and what they are authorized to do. Without `Identity` in core, each
subsystem would define its own identity type, leading to duplication and
conversion boilerplate. Without `IdentityProvider` as a trait, alknet-core
would either hardcode config-file-based auth or take a database dependency —
neither acceptable for a library crate.
The `IdentityProvider` trait exists because the same auth verification concept
needs two implementations: `ConfigIdentityProvider` for minimal deployments (all
keys in memory via ArcSwap) and `StorageIdentityProvider` for production (SQLite
lookup via `peer_credentials` and ACL graph). The trait is the contract; the
backing store is pluggable.
## Architecture
### Identity Struct
```rust
pub struct Identity {
    pub id: String,                              // Fingerprint or account UUID
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
The `id` field serves dual purpose:
- **Config-based auth** (`ConfigIdentityProvider`): holds the Ed25519 key
fingerprint (e.g., `SHA256:abc123...`)
- **Database-backed auth** (`StorageIdentityProvider`): holds the account UUID
from the `accounts` table
This keeps the type simple while accommodating both auth paths. Downstream
consumers (forwarding policy, call protocol ACL checks) use `scopes` and
`resources` without knowing whether the identity came from a config file or a
database.
### IdentityProvider Trait
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    /// Resolve an SSH public key fingerprint to an identity.
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;

    /// Resolve an auth token to an identity.
    /// Returns None if the token is invalid, expired, or the key is not authorized.
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
Both SSH key auth and token auth resolve to the same `Identity` type. The trait
lives in `alknet_core::auth`.
### ConfigIdentityProvider (Default)
Reads from `ArcSwap<DynamicConfig.auth>` per ADR-030. Every authorized key gets
a default scope set. No database dependency. This is the default for CLI and
single-node deployments.
```rust
pub struct ConfigIdentityProvider {
    auth_config: Arc<ArcSwap<DynamicConfig>>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let config = self.auth_config.load();
        config.auth.ssh.authorized_keys.get(fingerprint)
            .map(|key_entry| Identity {
                id: fingerprint.to_string(),
                scopes: key_entry.scopes.clone(),
                resources: key_entry.resources.clone(),
            })
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // Verify Ed25519 signature against the same authorized_keys set
        // Resolve to the same Identity as SSH auth would produce
        todo!("token verification is elided in this spec")
    }
}
```
### StorageIdentityProvider (Future — Phase 2+)
To be implemented in `alknet-storage` (a crate that doesn't exist yet). Backed by
SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes.
This implementation is defined here so the contract is clear, but alknet-storage
hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` exclusively. When
alknet-storage is built, it implements alknet-core's `IdentityProvider` trait,
and the CLI/NAPI assembly layer wires the concrete implementation.
### AuthProtocol irpc Service
The `AuthProtocol` irpc service (behind the `irpc` feature flag per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it:
```rust
enum AuthProtocol {
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
    ReloadKeys,
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
The relationship:
- **Trait-based path**: Handler calls `identity_provider.resolve_from_fingerprint()`
directly. Zero overhead. Used when irpc is disabled or when the
implementation is local.
- **irpc path**: Handler calls `identity_provider.resolve_from_fingerprint()`,
which internally delegates to `AuthProtocol::VerifyPubkey` via an irpc client.
Used in production deployments with SQLite-backed auth.
Both paths produce the same `Identity` result. Note: the irpc path requires the
service layer to be built (Phase 2+). Phase 1 uses the trait path exclusively.
### Auth Flows
**SSH key auth** (existing, unchanged):
```
Client connects → SSH handshake → auth_publickey() callback
  → IdentityProvider::resolve_from_fingerprint(fingerprint)
  → Some(Identity) or None
```
**Token auth** (new, for non-SSH transports):
```
Browser connects → WebTransport CONNECT request
  → Extract token from URL path or Authorization header
  → IdentityProvider::resolve_from_token(token)
  → Some(Identity) or None
```
Both paths produce an `Identity`. The `Identity` is attached to the connection
and used by `ForwardingPolicy` and call protocol for authorization decisions.
## Constraints
- `Identity` and `IdentityProvider` live in `alknet_core::auth`. No database
dependency at the core level (ADR-029).
- alknet-storage implements the core trait — the dependency goes from storage
to core, not the other way.
- The `id` field in `Identity` serves dual purpose (fingerprint or UUID). This
is a deliberate simplification — downstream consumers don't need to know the
source.
- Certificate authority tokens are not supported for token auth in v1 (ADR-023).
- The irpc feature flag means nodes that only do SSH tunneling don't need the
service layer overhead.
## Open Questions
- None specific to this spec. See [open-questions.md](open-questions.md) for
general auth questions (OQ-15, OQ-19).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` live in alknet-core, not storage |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | `AuthProtocol` behind feature flag; `IdentityProvider` is the contract |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth; same `Identity` result |
## References
- [auth.md](auth.md) — Token authentication, AuthPolicy, WebTransport session handling
- [research/services.md](../research/services.md) — AuthService, AuthProtocol definition
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.2
- [ADR-030](decisions/030-static-dynamic-config-split.md) — DynamicConfig (ConfigIdentityProvider reads from it)
- [ADR-031](decisions/031-forwarding-policy.md) — ForwardingPolicy consumes Identity.scopes

@@ -1,390 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---
# Interface (Layer 2)
## What
The Interface layer sits between Transport (Layer 1) and Protocol (Layer 3).
Interfaces consume byte streams from Transports or manage their own transports,
and produce call protocol sessions or handle discrete requests. SSH is an
interface, not a transport — it wraps a byte stream in session semantics. Raw
framing (4-byte length prefix + JSON `EventEnvelope`) is another interface.
HTTP and DNS are message-based interfaces that handle individual request/response
pairs without persistent sessions.
## Why
In the original architecture, SSH was deeply embedded in `ServerHandler`. This
tangling of transport, interface, and protocol made it impossible to:
- Run the call protocol over DNS queries without wrapping SSH inside DNS
- Use raw framing for local service mesh (no SSH overhead)
- Support WebTransport direct call protocol for browsers
- Separate auth mechanics from channel management
- Accept HTTP requests and map them to call protocol operations
The three-layer model (ADR-026) cleanly separates these concerns. Transport
produces bytes. Interface parses bytes into sessions or handles requests.
Protocol carries semantics. A connection is always a (Transport, Interface)
pair for stream-based interfaces, or a standalone message-based interface.
Phase 2 research identified that HTTP and DNS don't fit the persistent session
model — they're stateless per-request. This led to the StreamInterface /
MessageInterface split (ADR-035), which gives each interface category its own
trait with the right lifecycle and ownership model.
## Architecture
### Three-Layer Model
```
Layer 3: Protocol (Call protocol, Operations, OperationEnv)
Layer 2: Interface (StreamInterface: SSH, raw framing | MessageInterface: HTTP, DNS)
Layer 1: Transport (TCP, TLS, iroh, WebTransport)
```
- **Layer 1: Transport** produces byte streams
  (`AsyncRead + AsyncWrite + Unpin + Send`). Unchanged per ADR-001. DNS is NOT
  a transport.
- **Layer 2: Interface** has two categories:
  - **StreamInterface**: consumes a `TransportStream` and produces a long-lived
    session that yields `InterfaceEvent` frames.
  - **MessageInterface**: handles individual `InterfaceRequest` →
    `InterfaceResponse` pairs. Manages its own transport.
- **Layer 3: Protocol** carries semantics: call protocol events, operation
  registry, service calls. Agnostic to both Transport and Interface below it.
### StreamInterface Trait
```rust
#[async_trait]
pub trait StreamInterface: Send + Sync + 'static {
    type Session: InterfaceSession;

    async fn accept(
        &self,
        stream: Box<dyn TransportStream>,
        config: &InterfaceConfig,
    ) -> Result<Self::Session>;
}
```
The session produced by a `StreamInterface` is consumed by the call protocol
handler. Different stream interfaces produce different session types, but the
call protocol handler receives `InterfaceEvent` frames from any stream
interface.
### MessageInterface Trait
```rust
#[async_trait]
pub trait MessageInterface: Send + Sync + 'static {
    async fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>;
}
```
Message-based interfaces handle individual requests without persistent sessions.
They manage their own transport (HTTP server, DNS server) and normalize requests
into `InterfaceRequest` / `InterfaceResponse`.
### InterfaceRequest / InterfaceResponse
```rust
pub struct InterfaceRequest {
    pub operation_path: String,        // e.g., "/head/auth/verify"
    pub input: Value,                  // JSON input payload
    pub auth_token: Option<AuthToken>, // Extracted from wire format
    pub metadata: HashMap<String, String>,
}

pub struct InterfaceResponse {
    pub result: Result<Value, CallError>,
    pub status: u16,                   // HTTP status, DNS result code, etc.
    pub headers: HashMap<String, String>,
}
```
The call protocol handler processes `InterfaceRequest` the same way it processes
`InterfaceEvent` frames — both resolve to operation invocations through
`OperationEnv`. The difference is framing: stream interfaces produce `InterfaceEvent`
frames from a continuous byte stream, message interfaces construct `InterfaceRequest`
from their wire format.
### InterfaceSession
Every stream interface session implements `InterfaceSession`:
```rust
pub struct InterfaceEvent {
    pub envelope: EventEnvelope,
    pub identity: Option<Identity>,
}

#[async_trait]
pub trait InterfaceSession: Send {
    async fn recv(&mut self) -> Option<InterfaceEvent>;
    async fn send(&mut self, envelope: EventEnvelope) -> Result<()>;
}
```
`InterfaceEvent` carries an `EventEnvelope` and the authenticated `Identity`.
The call protocol handler (Layer 3) receives `InterfaceEvent` frames and
processes them uniformly, regardless of whether they arrived over SSH or raw
framing.
### SshInterface (StreamInterface)
Wraps the existing `ServerHandler` logic. This is the most complex stream
interface because SSH provides channel multiplexing, auth negotiation, and
proxy management within a single session.
What stays in SshInterface (Layer 2):
- SSH handshake and session management
- Auth delegation to `IdentityProvider` (via `auth_publickey()` callback)
- Channel multiplexing (multiple channels per session)
- `alknet-control:0` channel routing to call protocol
What moves to Layer 3 (call protocol handler):
- Operation registry and dispatch
- Forwarding policy checks (per ADR-031)
- Operation context construction (Identity, scopes)
What moves to per-connection state:
- Port forwarding proxy logic
**Current implementation note**: `SshSession::recv()` and `SshSession::send()`
are stubs. The bridge from SSH channels to `InterfaceEvent` frames is
scheduled for Phase 2 implementation (see integration-plan.md Phase 2.1).
### RawFramingInterface (StreamInterface)
Reads 4-byte big-endian length prefix + JSON `EventEnvelope` frames directly
from the transport stream. No SSH wrapping. No channel multiplexing — the
entire stream is a single call protocol channel.
```rust
pub struct RawFramingInterface;

impl StreamInterface for RawFramingInterface {
    type Session = RawFramingSession;
    // Reads length-prefixed EventEnvelope frames from the stream
}
```
Used for:
- Local service mesh (TCP + raw framing, no SSH overhead)
- Secure mesh (TLS + raw framing)
- WebTransport direct call protocol (future: WebTransport + raw framing)
Auth for raw framing: `AuthToken` in frame header, resolved via
`IdentityProvider::resolve_from_token()`.
**Current implementation note**: `RawFramingInterface::accept()` returns an
error. Frame reading/writing is scheduled for Phase 2 implementation (see
integration-plan.md Phase 2.2).
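
Since frame reading and writing are not implemented yet, the following is only a sketch of the wire format described above: a 4-byte big-endian length followed by the JSON payload. It is synchronous and std-only for brevity (the real interface would be async), the function names are hypothetical, and the 1 MiB `MAX_FRAME_LEN` cap is an assumption that does not come from the spec.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a single frame; not specified in this document.
const MAX_FRAME_LEN: usize = 1 << 20;

/// Write one frame: 4-byte big-endian length prefix, then the payload bytes.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one frame and return its payload (the serialized EventEnvelope JSON).
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        // Reject before allocating, so a hostile prefix cannot force a huge buffer.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length before allocating matters on unauthenticated streams, because the prefix arrives before any `AuthToken` has been verified.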
### HttpInterface (MessageInterface)
Accepts standard HTTP requests and maps them to call protocol operations:
```
POST /v1/{namespace}/{op} → registry.invoke(namespace, op, input) (mutation)
GET /v1/{namespace}/{op} → registry.invoke(namespace, op, input) (query)
GET /v1/{namespace}/{op} SSE → registry.subscribe(namespace, op, input) (subscription)
GET /v1/schema → registry.list_operations()
```
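
The mapping above could be expressed as a small routing function. This sketch is illustrative only, since specific routes are deferred to a later phase: `Dispatch` and `route` are hypothetical names, the namespace is assumed to be a single path segment, and `wants_sse` is assumed to be derived by the caller from the request (for example from an `Accept: text/event-stream` header).

```rust
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    /// POST (mutation) or plain GET (query): both go through `registry.invoke`.
    Invoke { namespace: String, op: String },
    /// GET with SSE: `registry.subscribe`.
    Subscribe { namespace: String, op: String },
    /// GET /v1/schema: `registry.list_operations`.
    ListOperations,
}

/// Hypothetical router for `/v1/{namespace}/{op}` paths.
pub fn route(method: &str, path: &str, wants_sse: bool) -> Option<Dispatch> {
    let rest = path.strip_prefix("/v1/")?;
    if rest == "schema" {
        return if method == "GET" { Some(Dispatch::ListOperations) } else { None };
    }
    let mut parts = rest.split('/');
    let namespace = parts.next()?;
    let op = parts.next()?;
    // Exactly two non-empty segments are accepted in this sketch.
    if namespace.is_empty() || op.is_empty() || parts.next().is_some() {
        return None;
    }
    let (namespace, op) = (namespace.to_string(), op.to_string());
    match (method, wants_sse) {
        ("POST", _) | ("GET", false) => Some(Dispatch::Invoke { namespace, op }),
        ("GET", true) => Some(Dispatch::Subscribe { namespace, op }),
        _ => None,
    }
}
```

Returning `None` corresponds to the default 404 handler mentioned in the Phase 2 scope.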
Auth: `Authorization: Bearer <token>` header, resolved via
`IdentityProvider::resolve_from_token()`. Both AuthTokens and API keys are
accepted.
The HTTP interface runs inside the existing stealth mode byte-peek architecture:
after a TLS handshake, the server peeks at the first bytes. If they're
`SSH-2.0-`, the stream goes to `SshInterface`. Otherwise, the stream goes to
the axum HTTP router.
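
The byte-peek decision can be sketched as a pure function over the bytes seen so far. `Route` and `classify` are hypothetical names; returning `None` for an incomplete banner prefix is an assumption about how a listener might wait for more bytes, not documented behavior.

```rust
#[derive(Debug, PartialEq)]
pub enum Route {
    Ssh,
    Http,
}

/// The SSH identification string prefix the server looks for after TLS.
const SSH_BANNER: &[u8] = b"SSH-2.0-";

/// Classify a connection from its first peeked bytes.
/// Returns `None` when the bytes are still a proper prefix of the SSH banner,
/// meaning the caller should peek again once more data has arrived.
pub fn classify(peeked: &[u8]) -> Option<Route> {
    if peeked.len() < SSH_BANNER.len() {
        if SSH_BANNER.starts_with(peeked) {
            None // could still become "SSH-2.0-"; undecided
        } else {
            Some(Route::Http) // already diverged from the banner
        }
    } else if peeked.starts_with(SSH_BANNER) {
        Some(Route::Ssh)
    } else {
        Some(Route::Http)
    }
}
```

Because the function only inspects bytes and never consumes them, the chosen handler still sees the stream from its first byte.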
**Phase 2 scope**: Auth middleware, stealth handoff, and default 404 handler
only. Specific operation routes and path conventions are Phase 5+. The
`ListenerConfig::Http` variant spawns an axum router that reaches auth context;
routing inside axum is a later concern.
### DnsInterface (MessageInterface)
A DNS server that encodes/decodes `EventEnvelope` frames as DNS query/response
pairs. AuthToken is embedded in DNS query labels. Resolution via
`IdentityProvider::resolve_from_token()`.
This is a `MessageInterface` — it manages its own transport (UDP/TCP port 53)
and handles individual DNS queries as request/response pairs. DNS is NOT a
transport.
**Phase**: DNS interface implementation is Phase 5+. The `ListenerConfig::Dns`
variant and `DnsInterface` stub are defined now; implementation is deferred.
### Stream-Based Interface Pairs
| Transport | StreamInterface | Credential Presentation | Use case |
|-----------|---------------|------------------------|----------|
| TLS | SshInterface | SSH key handshake | Standard alknet tunnel |
| TCP | SshInterface | SSH key handshake | Plain SSH tunnel |
| iroh | SshInterface | SSH key handshake | P2P SSH tunnel |
| TCP | RawFramingInterface | AuthToken in frame header | Local service mesh |
| TLS | RawFramingInterface | AuthToken in frame header | Secure mesh |
| WebTransport | RawFramingInterface | AuthToken in CONNECT request | Browser call protocol (future) |
### Message-Based Interface Pairs
| MessageInterface | Credential Presentation | Owns transport? | Use case |
|-----------------|------------------------|----------------|----------|
| HttpInterface | `Authorization: Bearer` header | Yes (axum) | REST API, dashboard, integrations |
| DnsInterface | AuthToken in query labels | Yes (DNS server) | Censorship-resistant control channel |
| WebSocketInterface | AuthToken in handshake | Yes (WS server) | Browser persistent connection (future) |
Message-based interfaces manage their own transport. They don't need a
`Transport` from Layer 1 — they ARE the transport+interface combined.
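To make the request/response shape concrete, here is a simplified, synchronous sketch of a `MessageInterface` with a stub HTTP implementation that checks the token and then falls through to a default 404, mirroring the Phase 2 HTTP scope. The struct fields, the `String` error type, and `StubHttpInterface` are illustrative assumptions; the real trait in alknet-core may be async and carry different fields.

```rust
use std::collections::HashSet;

/// Normalized request handed to a message-based interface.
/// Field names are assumptions for illustration only.
#[derive(Debug, Clone)]
pub struct InterfaceRequest {
    pub operation: String,
    pub bearer_token: Option<String>,
    pub body: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InterfaceResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// One request in, one response out; no long-lived session.
pub trait MessageInterface {
    fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse, String>;
}

/// Hypothetical stub: auth check followed by a default 404.
pub struct StubHttpInterface {
    accepted_tokens: HashSet<String>,
}

impl StubHttpInterface {
    pub fn new(tokens: &[&str]) -> Self {
        Self {
            accepted_tokens: tokens.iter().map(|t| t.to_string()).collect(),
        }
    }
}

impl MessageInterface for StubHttpInterface {
    fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse, String> {
        let authorized = request
            .bearer_token
            .as_deref()
            .map_or(false, |token| self.accepted_tokens.contains(token));
        if !authorized {
            return Ok(InterfaceResponse { status: 401, body: b"unauthorized".to_vec() });
        }
        // Auth passed, but no operation routes are registered yet.
        Ok(InterfaceResponse { status: 404, body: b"not found".to_vec() })
    }
}
```

Note that nothing here takes a `Transport` argument, which is the point of the split: the interface owns whatever listener feeds it requests.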
### ListenerConfig
The server's accept loop configuration covers both stream and message interfaces:
```rust
pub enum ListenerConfig {
Stream {
transport: TransportKind,
interface: StreamInterfaceKind,
},
Http {
bind_addr: SocketAddr,
tls: bool,
stealth: bool, // byte-peek protocol detection on shared port
},
Dns {
bind_addr: SocketAddr,
tls: bool,
},
}
pub enum StreamInterfaceKind {
Ssh,
RawFraming,
}
pub enum TransportKind {
Tcp,
Tls { server_name: Option<String> },
Iroh { endpoint_id: String },
WebTransport, // Phase 5+: tag only, no acceptor yet
}
```
Note: `TransportKind::Dns` does NOT exist. DNS is a `MessageInterface`, not a
transport. The `ListenerConfig::Dns` variant handles DNS listener configuration
directly.
### Credential Presentation Across Interfaces
Every interface resolves to the same `Identity` through `IdentityProvider`:
```
SSH fingerprint → IdentityProvider::resolve_from_fingerprint → Identity
AuthToken (Bearer) → IdentityProvider::resolve_from_token → Identity
API key (Bearer) → IdentityProvider::resolve_from_token → Identity
DNS embedded token → IdentityProvider::resolve_from_token → Identity
```
The credential presentation differs per (Transport, Interface) pair, but the
resolution result is always an `Identity`. See [definitions.md](definitions.md)
for the full table and terminology rules.
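The convergence on a single `Identity` can be sketched with an in-memory provider. The two method names come from the table above; their signatures, the `Identity` fields shown, and `InMemoryProvider` itself are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Reduced `Identity` for illustration; the real type carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Assumed signatures; the real trait may be async or return a `Result`.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Hypothetical provider backed by two lookup tables.
#[derive(Default)]
pub struct InMemoryProvider {
    by_fingerprint: HashMap<String, Identity>,
    by_token: HashMap<String, Identity>,
}

impl InMemoryProvider {
    /// Registers one identity under both credential presentations.
    pub fn register(&mut self, identity: Identity, fingerprint: &str, token: &str) {
        self.by_fingerprint.insert(fingerprint.to_string(), identity.clone());
        self.by_token.insert(token.to_string(), identity);
    }
}

impl IdentityProvider for InMemoryProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.by_fingerprint.get(fingerprint).cloned()
    }

    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        self.by_token.get(token).cloned()
    }
}
```

Whichever path a connection takes, downstream code such as the forwarding policy only ever sees the resulting `Identity`.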
### Server Accept Loop
With both stream and message interfaces, the accept loop becomes:
```rust
for listener in listeners {
match listener {
ListenerConfig::Stream { transport, interface } => {
// Spawn accept loop: transport.accept() → interface.accept(stream)
}
ListenerConfig::Http { bind_addr, tls, stealth } => {
// Spawn axum HTTP server on bind_addr
// If stealth: byte-peek after TLS, route SSH vs HTTP
}
ListenerConfig::Dns { bind_addr, tls } => {
// Spawn DNS server on bind_addr
}
}
}
```
## Constraints
- `StreamInterface` and `MessageInterface` are independent traits with different
signatures, lifecycles, and transport ownership. No common super-trait (ADR-035).
- `SshInterface` is the most invasive refactoring. The existing `SshHandler`
owns auth, channel management, and proxy logic — extracting these cleanly
requires careful design (integration-plan Phase 1.8, completed in Phase 1).
- DNS interface implementation is Phase 5 work. `DnsInterface` is defined as a
`MessageInterface` stub; implementation is deferred.
- HTTP interface Phase 2 scope is limited to auth middleware and stealth handoff.
Specific operation routes are Phase 5+.
- WebTransport is Phase 5 work. `TransportKind::WebTransport` is a tag only for
now (no acceptor yet). `StreamInterfaceKind` has no WebTransport variant;
WebTransport pairs with `RawFramingInterface`.
- `TransportKind::Dns` does not exist. DNS is a `MessageInterface`, not a
transport. This was `TransportKind` enum pollution from an earlier design.
- The `Interface` trait (singular) is renamed to `StreamInterface` (see Phase 2
Implementation Notes). This is a rename, not a semantic change.
## Open Questions
- **OQ-IF-02**: ~~Should `SshInterface` own the `ForwardingPolicy` check for
`channel_open_direct_tcpip`, or should that move to Layer 3?~~ **Resolved**:
ForwardingPolicy is Layer 3, but channel open/close lifecycle is Layer 2.
SshInterface reports channel requests to Layer 3; Layer 3 applies policy.
- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
trait? **Recommendation**: No. Independent traits with different signatures,
lifecycles, and transport ownership. A common super-trait adds complexity
without clear benefit. (See ADR-035.)
- **OQ-P2-02**: Should the HTTP interface share a port with the SSH listener?
**Recommendation**: Start with separate ports. ALPN multiplexing on port 443
is a future optimization that doesn't change the interface abstraction.
Stealth mode byte-peek already handles shared-port detection for the common
case.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface | Two trait categories at Layer 2 |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Protocol is interface-agnostic |
| [029](decisions/029-identity-core-type.md) | Identity as core type | Auth resolution across interfaces |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Layer 3 policy applied to Layer 2 channel requests |
## Phase 2 Implementation Notes
- `Interface` trait renamed to `StreamInterface` throughout alknet-core (ADR-035 implemented)
- `MessageInterface` trait added with `handle_request(InterfaceRequest) -> Result<InterfaceResponse>` (ADR-035 implemented)
- `InterfaceRequest` and `InterfaceResponse` types implemented
- `HttpInterface` and `DnsInterface` stub structs added (Phase 5 for full implementation)
- `InterfaceConfig` split into `StreamInterfaceConfig` and `MessageInterfaceConfig`
- `StreamInterfaceKind` and `MessageInterfaceKind` enums added
- `ListenerConfig` restructured from flat struct to enum with `Stream`, `Http`, `Dns` variants
- `TransportKind::Dns` removed from the enum (DNS is a MessageInterface, not a transport)
- `TransportKind::WebTransport` updated from `{ host: String }` to `{ server_name: Option<String> }`
- `RawFramingInterface` fully implemented with first-frame auth
- `SshSession::recv()`/`send()` bridge to call protocol via `alknet-control:0` channel implemented, using `ControlChannelBridge` with mpsc channels
## References
- [definitions.md](definitions.md) — Terminology disambiguation, credential presentation
- [research/phase2/interface-model.md](../research/phase2/interface-model.md) — Full StreamInterface/MessageInterface analysis
- [research/phase2/tls-transport.md](../research/phase2/tls-transport.md) — HTTP interface, stealth handoff, ListenerConfig
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.8, Phase 2.1-2.7
- [transport.md](transport.md) — Transport trait (unchanged at Layer 1)
- [auth.md](auth.md) — Credential presentation per (Transport, Interface) pair
- [identity.md](identity.md) — IdentityProvider, auth across interfaces


@@ -1,189 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---
# NAPI Wrapper & PubSub Event Target
## What
Two integration layers that enable TypeScript/JavaScript consumers to use alknet as a transport:
1. **NAPI wrapper** (`@alkdev/alknet`) — A Node.js native addon (via napi-rs) exposing `connect()` and `serve()` that return duplex streams
2. **PubSub event target** (`@alkdev/pubsub` adapter) — An implementation of the `TypedEventTarget` interface that routes events over alknet's SSH channel
## Why
The alknet Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent workers) is TypeScript-first. These integration layers let TypeScript code use alknet's transport without reimplementing SSH.
The NAPI surface is intentionally minimal — it exposes transport connections as duplex streams, not the full SSH protocol. The pubsub adapter wraps those streams with `EventEnvelope` serialization.
## Architecture
### NAPI Wrapper (napi-rs)
The wrapper uses napi-rs (ADR-015) and exposes two functions (ADR-016):
```typescript
// @alkdev/alknet (TypeScript side)
interface AlknetConnectOptions {
// TCP/TLS mode
server?: string; // e.g., "example.com:443"
// iroh mode
peer?: string; // iroh endpoint ID (base58-encoded)
// Transport
transport: 'tcp' | 'tls' | 'iroh';
// Auth
identity?: string | Buffer; // path to SSH key, or Buffer with key data
// TLS
tlsServerName?: string; // SNI hostname
insecure?: boolean; // accept self-signed certs
// iroh
irohRelay?: string; // relay URL (default: n0)
// Proxy
proxy?: string; // upstream SOCKS5/HTTP proxy URL
}
interface AlknetServeOptions {
// Transport
transport: 'tcp' | 'tls' | 'iroh';
// Auth
hostKey?: string | Buffer; // path to SSH host key, or Buffer with key data
authorizedKeys?: string | Buffer; // path to authorized_keys, or Buffer with key data
certAuthority?: string; // path to CA public key for cert-authority auth
// TLS
tlsCert?: string; // path to TLS cert
tlsKey?: string; // path to TLS key
acmeDomain?: string; // ACME domain for auto-cert (ADR-008)
// Listen
listen?: string; // listen address (default: 0.0.0.0:22)
// iroh
irohRelay?: string; // relay URL (default: n0)
}
// Returns a Duplex stream for the SSH channel
function connect(options: AlknetConnectOptions): Promise<Duplex>;
// Returns a server object with close() and connection events
function serve(options: AlknetServeOptions): Promise<AlknetServer>;
interface AlknetServer {
close(): Promise<void>;
onConnection(callback: (stream: Duplex, info: ConnectionInfo) => void): void;
// Dynamic config reload (ADR-030)
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
reloadForwarding(policy: ForwardingPolicyConfig): void;
reloadAll(config: DynamicConfig): void;
}
interface ForwardingPolicyConfig {
default: 'allow' | 'deny';
rules: ForwardingRuleConfig[];
}
interface ForwardingRuleConfig {
target: string; // "localhost:*", "10.0.0.0/8:80", "alknet-*"
action: 'allow' | 'deny';
principals?: string[]; // default ["*"]
}
```
The NAPI layer is **transport-agnostic** — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
### NAPI Call Protocol Integration
NAPI consumers can register operation handlers to participate in the call protocol. The `Duplex` stream from `connect()` or `serve()` carries `EventEnvelope` frames (4-byte BE length prefix + JSON). A TypeScript consumer can implement a call protocol handler that reads these frames and dispatches to registered operations — the same wire protocol used by `@alkdev/operations`.
See [call-protocol.md](call-protocol.md) for the call protocol spec and [services.md](services.md) for OperationEnv and dispatch paths.
### NAPI irpc Service Creation
Behind the `irpc` feature flag, NAPI consumers can create irpc service instances for in-cluster communication. This is a Phase 2+ capability — Phase 1 uses `ConfigIdentityProvider` and direct `ConfigReloadHandle` calls. See [services.md](services.md) for the irpc service layer and ADR-027 for crate decomposition.
### NAPI `connect()` vs CLI `alknet connect`
The NAPI `connect()` function and the CLI `alknet connect` command are fundamentally different operations despite sharing the same name:
- **CLI `alknet connect`**: Starts a full SSH client session with a local SOCKS5 server and optional port forwards. It manages multiple SSH channels over a single session — the user routes traffic through it via SOCKS5 or forwarded ports.
- **NAPI `connect()`**: Opens a single SSH channel and returns it as a `Duplex` stream. No SOCKS5 server, no port forwarding. The caller reads and writes bytes directly. This is designed for the pubsub/programmatic use case where a single bidirectional byte stream is needed.
For SOCKS5 proxy functionality, use the CLI binary (`alknet connect`). The NAPI wrapper is for programmatic consumers that need a raw stream.
### Programmatic Configuration (ADR-011)
Both `connect()` and `serve()` accept options as plain objects. No file paths are mandatory — keys can be provided as `Buffer` data directly, making programmatic usage straightforward. Environment variables (`ALKNET_SERVER`, `ALKNET_IDENTITY`) provide convenience defaults.
Key material provided as `Buffer` must be in **OpenSSH key format** (the format used by `ssh-keygen`). Private keys: OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`). Public keys: OpenSSH format (`ssh-ed25519 AAAA...`). PEM-encoded keys (PKCS#1, PKCS#8) are not supported.
### PubSub Event Target Adapter
This implements `TypedEventTarget` from `@alkdev/pubsub`:
```typescript
// @alkdev/pubsub (new adapter: event-target-alknet.ts)
export interface AlknetEventTargetOptions {
stream: Duplex; // from @alkdev/alknet.connect() or serve()
}
export interface AlknetEventTarget<TEvent extends TypedEvent>
extends TypedEventTarget<TEvent> {
close(): void;
}
export function createAlknetEventTarget<TEvent extends TypedEvent>(
options: AlknetEventTargetOptions
): AlknetEventTarget<TEvent>;
```
Wire protocol (same as other pubsub adapters):
- **Framing**: 4-byte big-endian length prefix + JSON payload
- **Payload**: `EventEnvelope` JSON (`{ type, id, payload }`)
- **Control**: `__subscribe` / `__unsubscribe` messages for topic-based routing
- **Direction**: Bidirectional — `dispatchEvent` sends, `addEventListener` subscribes and receives
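The framing rule above (4-byte big-endian length prefix followed by the JSON payload) can be sketched in Rust, the language of the server side of the wire. The function names `encode_frame` and `decode_frame` are hypothetical, and the payload is treated as opaque bytes rather than parsed JSON.

```rust
use std::convert::TryFrom;

/// Prepends a 4-byte big-endian length to an already-serialized JSON payload.
/// Fails if the payload does not fit in a `u32` length field.
pub fn encode_frame(json: &[u8]) -> Result<Vec<u8>, String> {
    let len = u32::try_from(json.len()).map_err(|_| "payload too large".to_string())?;
    let mut frame = Vec::with_capacity(4 + json.len());
    frame.extend_from_slice(&len.to_be_bytes());
    frame.extend_from_slice(json);
    Ok(frame)
}

/// Tries to read one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed, or `None`
/// if the buffer does not yet hold a complete frame.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = len.checked_add(4)?;
    if buf.len() < end {
        return None;
    }
    Some((&buf[4..end], end))
}
```

Returning the consumed byte count lets a caller drain a receive buffer that holds several frames, or a frame plus the start of the next one.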
### On the Server Side
The alknet server uses a reserved `direct_tcpip` destination (`alknet-control:0`) for the pubsub control channel (ADR-018). When a client connects to this destination:
1. The server's `channel_open_direct_tcpip` handler detects the reserved `alknet-control` target
2. Instead of opening a TCP connection, it bridges the channel to its local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
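The steps above amount to a routing decision on the requested destination. The sketch below models only that decision; `ChannelRoute` and `route_direct_tcpip` are hypothetical names, and matching on the host alone (ignoring the port) is an assumption, since the spec names the destination as `alknet-control:0`.

```rust
use std::convert::TryFrom;

/// Where a `direct_tcpip` channel request should go.
#[derive(Debug, PartialEq, Eq)]
pub enum ChannelRoute {
    /// Bridge the channel to the local pubsub event bus.
    ControlBus,
    /// Open an ordinary outbound TCP connection.
    TcpForward { host: String, port: u16 },
    /// The requested port is not a valid TCP port.
    Rejected,
}

pub const CONTROL_HOST: &str = "alknet-control";

/// Decides how to serve a channel request (hypothetical helper).
/// SSH carries the port as a 32-bit integer, so it is range-checked here.
pub fn route_direct_tcpip(host: &str, port: u32) -> ChannelRoute {
    // Assumption: any port on the reserved host selects the control bus.
    if host == CONTROL_HOST {
        return ChannelRoute::ControlBus;
    }
    match u16::try_from(port) {
        Ok(port) => ChannelRoute::TcpForward { host: host.to_string(), port },
        Err(_) => ChannelRoute::Rejected,
    }
}
```

In a real handler the forwarding policy would still be consulted before either branch is taken; that check is omitted here.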
Users who prefer not to use the control channel can alternatively run a pubsub service on a specific port and use standard port forwarding: `alknet connect --forward 9736:head:9736`. This is a deployment choice, not a separate implementation — alknet's port forwarding works normally for any TCP service.
- **Worker connects to head**: `alknet connect --forward 9736:head:9736`, then create a WebSocket event target pointing at `ws://localhost:9736`
- **Head connects to worker**: `alknet connect --remote-forward 9736:worker:9736` — same result, opposite initiator
The pubsub adapter doesn't care which side initiated the SSH session. It just needs a byte stream.
## Constraints
- The NAPI wrapper exposes duplex streams, not the full SSH channel API. Multiplexing is done at the pubsub layer.
- The pubsub wire protocol is length-prefixed JSON, matching the existing adapter pattern. Binary payloads should be base64-encoded in the `EventEnvelope.payload`.
- The NAPI binary size will be ~5-10MB (includes russh + tokio + cryptography). The `iroh` feature adds significant size; it should be an optional feature.
- Keys can be provided as file paths or `Buffer` data, supporting both CLI and programmatic usage patterns (ADR-011).
## Open Questions
None — all resolved.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | No SSH multiplexing in JS, pubsub handles it |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs or env vars |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | Both connect() and serve() | NAPI exposes client and server sides from the start |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `alknet-control` destination for event bus |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | NAPI reload methods for auth, forwarding, and all dynamic config |
## References
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, reload mechanism
- [services.md](services.md) — OperationEnv, irpc service layer
- [call-protocol.md](call-protocol.md) — Call protocol wire format and operation registry


@@ -1,340 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---
# Open Questions
## Transport
### OQ-01: TLS certificate management strategy
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-008 — Support both domain-based and IP-based ACME/Let's Encrypt auto-provisioning, plus manual certs. Domain-based uses standard certbot-style flow with HTTP-01/TLS-ALPN-01 challenges. IP-based uses short-lived certs via TLS-ALPN-01 on port 443. Manual certs via `--tls-cert`/`--tls-key` always supported. Implementation uses `rustls-acme` or similar pure-Rust ACME client.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport implementation
### OQ-02: iroh relay configuration defaults
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-009 — Default to n0's free relay servers. Allow override via `--iroh-relay <url>`. Document self-hosted relay setup. This matches iroh's own defaults and minimizes friction for testing/development.
- **Cross-references**: [ADR-009](decisions/009-default-iroh-relay.md), Transport spec
### OQ-05: Transport chaining support in CLI
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-010 — Support `--transport iroh --proxy socks5://...` natively in the CLI. iroh's endpoint builder accepts proxy configuration directly, so the implementation is minimal. Other transport combinations (TCP+TLS) are already implicit.
- **Cross-references**: [ADR-010](decisions/010-transport-chaining-cli.md), Transport spec
## Client
### OQ-06: SSH config file parsing
- **Origin**: [client.md](client.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-011 — No `~/.ssh/config` parsing, no custom config file. Configuration is programmatic-first: CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. Cross-platform path issues (`~` expansion) are avoided. The library API is the primary interface; if config files are needed later, they can be a separate layer.
- **Cross-references**: [ADR-011](decisions/011-no-ssh-config-programmatic-api.md), Client spec
## Server
### OQ-07: ACME/Let's Encrypt support
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-008 — Same resolution as OQ-01. Both domain-based (standard, domain-bound, auto-renewing) and IP-based (short-lived, no domain required) ACME flows are supported. The domain-based path requires port 80 or DNS access for challenges. The IP-based path uses TLS-ALPN-01 on port 443 and requires the ACME client to run continuously.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport
### OQ-08: Connection limits and rate limiting
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-013 — Two-layer approach: (1) Structured logging of auth attempts and connections at INFO level for fail2ban integration on Linux — matches our production fail2ban setup with nftables and systemd journal. (2) Built-in rate limiting: `--max-connections-per-ip` and `--max-auth-attempts` flags providing platform-independent abuse protection.
- **Cross-references**: [ADR-013](decisions/013-fail2ban-friendly-logging.md), Server spec, Production fail2ban docs
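A per-IP connection cap of the kind `--max-connections-per-ip` implies could be tracked with a simple counter map. This is a sketch under assumptions: `ConnectionLimiter` and its methods are hypothetical, and it covers only concurrent connections, not auth-attempt limits or logging.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source address (hypothetical type).
pub struct ConnectionLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns `true` and counts the connection if the address is under its cap.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        let emptied = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        // Drop idle entries so the map does not grow without bound.
        if emptied {
            self.active.remove(&ip);
        }
    }
}
```

A server would call `try_acquire` in the accept loop and `release` when the session ends, logging rejections at INFO level so fail2ban can see them.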
### OQ-04: Authentication beyond Ed25519 keys
- **Origin**: [client.md](client.md), [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-012 — Ed25519 public key (default, unchanged) + OpenSSH certificate authority support (new, important for multi-user). No password authentication over SSH channels. If a local SOCKS5 proxy needs its own auth, that's a separate concern. Cert-authority makes multi-user management practical: one CA entry in `authorized_keys` instead of N individual keys. Certificates support expiry and restrictions.
- **Cross-references**: [ADR-012](decisions/012-auth-ed25519-and-cert-authority.md), Client spec, Server spec
## TUN
### OQ-03: Windows TUN support scope
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-014 — TUN is deferred entirely from the alknet project. For VPN-like behavior, users run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside alknet. This eliminates all TUN-related scope questions (Windows, TCP reconstruction, etc.).
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)
### OQ-09: TCP reconstruction approach for TUN
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-014 — TUN is deferred from alknet. tun2proxy (external tool) handles this if users need VPN-like behavior.
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)
## NAPI / PubSub
### OQ-10: NAPI wrapper API surface
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-016 — Expose both `connect()` and `serve()` from the start. Both are fundamental operations needed by the pubsub event target system (spokes use `connect()`, hubs could use `serve()`). The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream. This ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
- **Cross-references**: [ADR-016](decisions/016-napi-expose-connect-and-serve.md), napi-and-pubsub.md
### OQ-11: napi-rs vs uniffi for FFI bridge
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-015 — Use napi-rs. It's the standard for Node.js native addons, matches our primary consumer (TypeScript/Node.js), and has the best ecosystem and documentation. If future Python or mobile consumers are needed, a separate uniffi layer can be added — the Rust core doesn't change.
- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md
## Configuration
### OQ-12: Per-user forwarding scope vs global rules
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-031 — Start with global rules + principal matching from `Identity.scopes`. Per-user scopes come from `peer_credentials.metadata.scopes` via `IdentityProvider`. The `ForwardingPolicy` evaluates rules against `Identity.id` and `Identity.scopes` from the authenticated identity.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)
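One way to picture global rules with principal matching is the sketch below. It rests on several assumptions that ADR-031 may not share: the first matching rule wins, a principal matches either `Identity.id` or any scope, `"*"` matches everyone, and target patterns support only a trailing `*` (so `localhost:*` and `alknet-*` work, but CIDR targets such as `10.0.0.0/8:80` are not handled). All type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    Deny,
}

pub struct Rule {
    pub target: String,
    pub action: Action,
    pub principals: Vec<String>,
}

pub struct Policy {
    pub default: Action,
    pub rules: Vec<Rule>,
}

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Matches a literal pattern, or a prefix when the pattern ends in `*`.
fn target_matches(pattern: &str, target: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => target.starts_with(prefix),
        None => pattern == target,
    }
}

fn principal_matches(rule: &Rule, identity: &Identity) -> bool {
    rule.principals
        .iter()
        .any(|p| p == "*" || *p == identity.id || identity.scopes.contains(p))
}

impl Policy {
    /// Assumption: rules are evaluated in order and the first match decides.
    pub fn evaluate(&self, identity: &Identity, target: &str) -> Action {
        self.rules
            .iter()
            .find(|rule| target_matches(&rule.target, target) && principal_matches(rule, identity))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

The policy only reads scopes from the `Identity`; it never assigns them, which keeps scope ownership with the `IdentityProvider`.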
### OQ-13: Config file auto-reload via file watching
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: No file watching. CLI loads once at startup; NAPI/head reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
- **Cross-references**: configuration.md
### OQ-14: ArcSwap vs RwLock for dynamic config
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained.
- **Cross-references**: configuration.md
### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (deferred to Phase 4 — needs R&D in WebTransport transport session)
- **Cross-references**: [auth.md](auth.md), OQ-19, [interface.md](interface.md)
### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to alknet-* channels)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-031 — Add `TransportKind` match in `ForwardingRule`. WebTransport clients can be restricted to `alknet-*` channels via `TargetPattern::AlknetPrefix` combined with a `TransportKind::WebTransport` filter.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)
### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples alknet-core from identity storage.
- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [identity.md](identity.md), OQ-15
### OQ-23: irpc dependency — always or behind feature flag?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-027 — Feature flag. Nodes that only do SSH tunneling don't need the service layer. irpc is behind a feature flag in alknet-core and an independent dependency in alknet-secret and alknet-storage.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
### OQ-24: DNS control channel scope for initial implementation?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-026 — DNS control channel carries call protocol frames only (no SSH tunneling over DNS). `DnsInterface` (a `MessageInterface`; DNS is not a transport) sends `EventEnvelope` directly. SSH-over-DNS is a future possibility but out of scope.
- **Cross-references**: [ADR-026](decisions/026-transport-interface-separation.md), [interface.md](interface.md)
### OQ-25: alknet-storage and alknet-secret irpc dependency
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-027 — Independently. They're separate crates. irpc is a shared library they both use as an independent dependency.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
## Auth
### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both?
- **Origin**: [auth.md](auth.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-029 and ADR-031 — `IdentityProvider` owns scopes. The `Identity` struct includes `scopes` and `resources` fields populated by the `IdentityProvider` implementation (config-based or database-backed). `ForwardingPolicy` uses scopes from `Identity` — it consumes them, it doesn't produce them.
- **Cross-references**: [ADR-029](decisions/029-identity-core-type.md), [ADR-031](decisions/031-forwarding-policy.md), [identity.md](identity.md)
### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS?
- **Origin**: [auth.md](auth.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (deferred to Phase 4; QUIC runs over UDP and TLS-over-TCP runs over TCP, so both can bind port 443 without conflict)
- **Cross-references**: OQ-15, [interface.md](interface.md)
## Call Protocol
### OQ-20: Worker registration and discovery on connect/disconnect
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach but needs spec in call-protocol.md)
- **Cross-references**: ADR-024, ADR-025
### OQ-21: Routing calls to specific workers with same-service operations
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection.
- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
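The `/{node}/{service}/{op}` convention can be sketched as a parser plus a routing-table lookup. `OperationPath`, `parse_operation_path`, `route_call`, and the `u64` connection handle are hypothetical; rejecting paths with extra or empty segments is an assumption, not something the ADRs state.

```rust
use std::collections::HashMap;

/// The three segments of an operation path.
#[derive(Debug, PartialEq, Eq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

/// Parses `/{node}/{service}/{op}`; any other shape yields `None`.
pub fn parse_operation_path(path: &str) -> Option<OperationPath<'_>> {
    let mut segments = path.strip_prefix('/')?.split('/');
    let node = segments.next()?;
    let service = segments.next()?;
    let op = segments.next()?;
    let well_formed = segments.next().is_none()
        && !node.is_empty()
        && !service.is_empty()
        && !op.is_empty();
    if well_formed {
        Some(OperationPath { node, service, op })
    } else {
        None
    }
}

/// Stand-in for whatever handle the head keeps per connected node.
pub type ConnectionId = u64;

/// Looks up the connection for the node named in the first segment.
pub fn route_call(table: &HashMap<String, ConnectionId>, path: &str) -> Option<ConnectionId> {
    let parsed = parse_operation_path(path)?;
    table.get(parsed.node).copied()
}
```

With this shape, `/dev1/fs/readFile` and `/dev2/fs/readFile` resolve to different connections even though both workers expose the same `fs` service.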
### OQ-22: Client streaming (streaming inputs) in the call protocol?
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: Deferred. Current model (single request, optional streaming response) covers all identified use cases. Client streaming can be added later if needed.
- **Cross-references**: ADR-024
## Services
### OQ-SVC-01: Should the secret service support multiple seed phrases (one per tenant)?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (deferred — one seed per node is simplest; multi-seed can be added later by indexing `Unlock` with a tenant ID)
- **Cross-references**: [secret-service.md](secret-service.md)
### OQ-SVC-02: Should service protocols use postcard (binary) or JSON for remote calls?
- **Origin**: [research/services.md](../research/services.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: Postcard for irpc (Rust-to-Rust, efficient). JSON for call protocol (cross-language, universal). The irpc remote path naturally uses postcard.
- **Cross-references**: [services.md](services.md)
### OQ-SVC-03: How does the secret service integrate with the existing EncryptedDataSchema from @alkdev/storage?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — Rust implementation replaces PBKDF2 password-based encryption with derived AES-256-GCM keys; EncryptedData format is a superset; migration by re-encrypting)
- **Cross-references**: [secret-service.md](secret-service.md), [storage.md](storage.md)
### OQ-SVC-04: Should workers cache derived keys locally?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session.
- **Cross-references**: [secret-service.md](secret-service.md)
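A worker-side cache with a TTL might look like the sketch below. `DerivedKeyCache` and its methods are hypothetical, keys are indexed by an opaque string such as a derivation path, and the current time is passed in explicitly so expiry can be tested without sleeping. Session invalidation by the head is modeled as clearing the whole cache, which is an assumption.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Caches derived key bytes for a bounded time (hypothetical type).
pub struct DerivedKeyCache {
    ttl: Duration,
    entries: HashMap<String, (Vec<u8>, Instant)>,
}

impl DerivedKeyCache {
    /// The resolution's default TTL would be `Duration::from_secs(3600)`.
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    pub fn insert_at(&mut self, id: &str, key: Vec<u8>, now: Instant) {
        self.entries.insert(id.to_string(), (key, now));
    }

    /// Returns the key if present and younger than the TTL; evicts it otherwise.
    pub fn get_at(&mut self, id: &str, now: Instant) -> Option<Vec<u8>> {
        let fresh = match self.entries.get(id) {
            Some((_, stored)) => now.saturating_duration_since(*stored) < self.ttl,
            None => return None,
        };
        if !fresh {
            self.entries.remove(id);
            return None;
        }
        self.entries.get(id).map(|(key, _)| key.clone())
    }

    /// Drops every cached key, e.g. when the head invalidates the session.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }
}
```

A production version would likely also zeroize key bytes on eviction; that is omitted here to keep the sketch dependency-free.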
### OQ-SVC-05: How does the NFT-based ACL smart contract interact with the secret service?
- **Origin**: [storage.md](storage.md)
- **Status**: open
- **Priority**: low
- **Resolution**: The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from the same seed as the secret service. The smart contract is a separate concern — it reads on-chain ACL state, it doesn't call the secret service.
- **Cross-references**: [storage.md](storage.md), [secret-service.md](secret-service.md)
## Interface
### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream?
- **Origin**: [interface.md](interface.md)
- **Status**: ~~resolved~~
- **Priority**: ~~high~~
- **Resolution**: `InterfaceSession::recv()` returns `Option<InterfaceEvent>` where `InterfaceEvent` carries `EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`. The `SshSession` bridge implements this over the `alknet-control:0` channel. For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize request/response pairs. See [interface.md](interface.md) and ADR-035.
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
### OQ-IF-02: Should SshInterface own ForwardingPolicy checks or should they move to Layer 3?
- **Origin**: [interface.md](interface.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ForwardingPolicy is Layer 3 (it's policy, not session mechanics). Channel open/close lifecycle is Layer 2. The Interface reports channel open requests to Layer 3; Layer 3 applies ForwardingPolicy. The current `SshHandler` implementation checks policy in `channel_open_direct_tcpip`, which already delegates to `Identity.scopes` from the authenticated identity — this is consistent with the resolution.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [interface.md](interface.md)
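The split described in this resolution can be sketched as follows. This is a minimal illustration under stated assumptions, not the ADR-031 design: `Rule`, `ChannelOpenRequest`, the suffix-matching behavior, and the first-match-wins ordering are all hypothetical. Layer 2 only builds the request; the Layer 3 policy makes the decision from the identity's scopes.

```rust
/// Hypothetical rule shape: identities holding `scope` may (or may not)
/// open channels to hosts ending in `host_suffix` on `port`.
struct Rule {
    scope: String,
    host_suffix: String,
    port: Option<u16>, // None matches any port
    allow: bool,
}

struct ForwardingPolicy {
    rules: Vec<Rule>,
    default_allow: bool,
}

/// What Layer 2 reports upward when a channel open is requested.
struct ChannelOpenRequest<'a> {
    scopes: &'a [String],
    host: &'a str,
    port: u16,
}

impl ForwardingPolicy {
    /// First matching rule wins; otherwise fall back to the default.
    fn is_allowed(&self, req: &ChannelOpenRequest<'_>) -> bool {
        for rule in &self.rules {
            let scope_ok = req.scopes.iter().any(|s| s == &rule.scope);
            let host_ok = req.host.ends_with(rule.host_suffix.as_str());
            let port_ok = rule.port.map_or(true, |p| p == req.port);
            if scope_ok && host_ok && port_ok {
                return rule.allow;
            }
        }
        self.default_allow
    }
}
```

Keeping the check as a pure function of the request makes it testable without an SSH session, which is the practical benefit of treating policy as a Layer 3 concern.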
### OQ-P2-01: Should MessageInterface and StreamInterface share a common trait?
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
- **Status**: resolved
- **Priority**: medium
- **Resolution**: Independent traits. Different signatures (`handle_request` vs `accept` + session lifecycle), different transport ownership (self-managed vs provided), different lifecycles (stateless per-request vs long-lived session). A common super-trait adds complexity without benefit. See ADR-035.
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
### OQ-P2-02: Should the HTTP interface share a port with the SSH listener?
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: Start with separate ports. Stealth mode byte-peek on a shared port is already implemented for SSH vs HTTP detection. `ListenerConfig::Http { stealth: true }` enables the existing peek pattern. ALPN multiplexing on port 443 is a future optimization that doesn't change the interface abstraction.
- **Cross-references**: [interface.md](interface.md), [research/phase2/tls-transport.md](../research/phase2/tls-transport.md)
### OQ-P2-03: Should the HTTP interface auto-generate OpenAPI specs from OperationRegistry?
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: Yes, but Phase 5+. The HTTP interface needs to exist first (Phase 5.3 in the integration plan). `GET /v1/schema` producing an OpenAPI spec from registered `OperationSpec`s is the natural end state. This creates symmetry with `FromOpenAPI` (inbound spec consumption).
- **Cross-references**: [call-protocol.md](call-protocol.md), [interface.md](interface.md)
### OQ-P2-04: How do self-hosted services authenticate via alknet?
- **Origin**: [research/phase2/credential-provider.md](../research/phase2/credential-provider.md), [research/phase2/definitions.md](../research/phase2/definitions.md)
- **Status**: resolved
- **Priority**: medium
- **Resolution**: Three-phase approach. Phase A: shared secret (`CredentialSet::Bearer` or `S3AccessKey`). Phase C: identity-bound credentials via `ManagedCredentialProvider`. Phase D: alknet as OIDC provider. The `CredentialProvider` trait in core enables Phase A immediately; Phases C and D are additive.
- **Cross-references**: [ADR-036](decisions/036-credentialprovider-core-type.md), [credentials.md](credentials.md)
## Credentials
### OQ-CP-01: Should CredentialProvider support per-identity credentials?
- **Origin**: [credentials.md](credentials.md)
- **Status**: open
- **Priority**: low
- **Resolution**: Start with service-level credentials (`get_credentials(service)`). Add identity-level resolution (`get_credentials_for(service, identity_id)`) when the need is concrete. `Identity.id` already serves as the account UUID in database-backed mode.
- **Cross-references**: [credentials.md](credentials.md), [ADR-036](decisions/036-credentialprovider-core-type.md)
### OQ-CP-02: Where should OIDC provider operations live?
- **Origin**: [credentials.md](credentials.md)
- **Status**: open
- **Priority**: low
- **Resolution**: Application service (Phase D). OIDC is an application concern, not a core concern. The call protocol and OperationRegistry provide the transport; OIDC is just another set of operations.
- **Cross-references**: [credentials.md](credentials.md)
### OQ-CP-03: How do credential rotations propagate across a cluster?
- **Origin**: [credentials.md](credentials.md)
- **Status**: open
- **Priority**: low
- **Resolution**: TBD. Likely TTL-based caching with a refresh threshold. Workers call `CredentialProvider::get_credentials()` which checks `is_expired()` and calls `refresh_credentials()` if needed.
- **Cross-references**: [credentials.md](credentials.md)
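Since the resolution is still TBD, the following is only a sketch of the "likely" option named above: a TTL check on every `get_credentials()` call that triggers a refresh when the cached value is stale. `CachedCredential`, `CachingProvider`, and the closure standing in for `refresh_credentials()` are hypothetical names; the clock is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cached record; the real `CredentialSet` lives in alknet-core.
struct CachedCredential {
    token: String,
    fetched_at: Instant,
    ttl: Duration,
}

impl CachedCredential {
    fn is_expired(&self, now: Instant) -> bool {
        now.duration_since(self.fetched_at) >= self.ttl
    }
}

struct CachingProvider<F: FnMut() -> String> {
    cached: Option<CachedCredential>,
    ttl: Duration,
    refresh: F, // stands in for `refresh_credentials()`
    refresh_count: u32,
}

impl<F: FnMut() -> String> CachingProvider<F> {
    /// Return the cached token, refreshing first if it is missing or expired.
    fn get_credentials(&mut self, now: Instant) -> String {
        let stale = self.cached.as_ref().map_or(true, |c| c.is_expired(now));
        if stale {
            let token = (self.refresh)();
            self.refresh_count += 1;
            self.cached = Some(CachedCredential { token, fetched_at: now, ttl: self.ttl });
        }
        self.cached.as_ref().expect("populated above").token.clone()
    }
}
```

With this shape a rotation reaches each worker within one TTL at the latest, without any push channel; a refresh threshold shorter than the TTL would be a small extension.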
### OQ-CP-04: Should CredentialSet include request-signing capability?
- **Origin**: [credentials.md](credentials.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: No. `CredentialSet` is pure data. Request signing (e.g., AWS Signature V4) is a separate utility function in the service wrapper or a shared `alknet-s3` crate. Credentials are data; signing is protocol behavior.
- **Cross-references**: [credentials.md](credentials.md)
## Definitions
### OQ-DEF-01: Should alknet adopt a "Service Catalog" concept like Keystone?
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: Keep `OperationRegistry` global, check scope at invocation time. Add scope-filtered discovery (`GET /v1/schema?scope=...`) when multi-tenant deployment requires it. The unfiltered registry is sufficient for current needs.
- **Cross-references**: [call-protocol.md](call-protocol.md)
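A minimal sketch of "global registry, scope checked at invocation time". The registry layout (one required scope per operation name) and the `authorize` method are assumptions made for illustration; the real `OperationRegistry` stores `OperationSpec`s.

```rust
use std::collections::HashMap;

/// Hypothetical registry: operation name mapped to the scope it requires.
struct OperationRegistry {
    required_scope: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum InvokeError {
    UnknownOperation,
    MissingScope(String),
}

impl OperationRegistry {
    /// Every caller sees the same registry; authorization happens per call.
    fn authorize(&self, op: &str, caller_scopes: &[String]) -> Result<(), InvokeError> {
        let needed = self
            .required_scope
            .get(op)
            .ok_or(InvokeError::UnknownOperation)?;
        if caller_scopes.iter().any(|s| s == needed) {
            Ok(())
        } else {
            Err(InvokeError::MissingScope(needed.clone()))
        }
    }
}
```

Scope-filtered discovery could later be layered on top by filtering the same map, without changing the invocation path.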
### OQ-DEF-03: Should Identity.scopes be hierarchical or stay flat?
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: Stay flat. Add implied scope resolution in alknet-storage when multi-tenant deployment requires it. A full policy language (like Rustfs IAM JSON policies) is Phase D territory.
- **Cross-references**: [identity.md](identity.md)
### OQ-DEF-08: Should "credential presentation" replace "auth interface" in terminology?
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
- **Status**: resolved
- **Priority**: medium
- **Resolution**: Yes. Adopted in [definitions.md](definitions.md). Use "credential presentation" for the mechanism of presenting credentials on a (Transport, Interface) pair. Never use "auth interface" (overloads "Interface").
- **Cross-references**: [definitions.md](definitions.md), [auth.md](auth.md)
## Secret Service
### OQ-SEC-01: Should alknet-secret use mlock/VirtualLock to prevent seed RAM from being paged to disk?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (deferred to Phase B — zeroize is sufficient for v1; mlock requires root/CAP_IPC_LOCK on Linux and SeLockMemory on Windows, adding platform complexity that should be audited together)
- **Cross-references**: [ADR-038](decisions/038-seed-lifecycle-memory-security.md), [secret-service.md](secret-service.md)


@@ -1,242 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---
# Alknet Overview
## Purpose
Alknet is a self-hostable SSH-based tunnel tool that provides VPN-like functionality without being a VPN protocol. It enables:
- **Private tunneling** of services (Postgres, Redis, internal APIs) over SSH
- **Censorship circumvention** — SSH over TLS on port 443 looks like HTTPS to DPI
- **NAT traversal** — iroh transport allows peer-to-peer connections without public IPs or port forwarding
- **Service mesh connectivity** — a lightweight transport layer for the pubsub/operations event system
The core insight: SSH tunnels work because SSH is fundamental infrastructure. Blocking it breaks the internet. Alknet makes SSH tunneling accessible through a simple CLI with pluggable transports.
## Crate Structure
Alknet is decomposed into six crates with a strict acyclic dependency graph (ADR-027):
| Crate | Purpose | Exists Now? |
|-------|---------|-------------|
| **alknet-core** | Transport, SSH, call protocol, config, auth types, `OperationSpec`, `Interface` trait | Yes |
| **alknet-napi** | Node.js native addon via napi-rs | Yes |
| **alknet-secret** | BIP39, SLIP-0010 HD key derivation, AES-256-GCM, `SecretProtocol` irpc service | Phase 2+ |
| **alknet-storage** | SQLite-backed metagraph, identity tables, ACL graph, honker, `StorageProtocol` | Phase 2+ |
| **alknet-flowgraph** | `FlowGraph<N,E>` over petgraph, operation graph, call graph | Phase 2+ |
| **alknet** (CLI) | Binary that assembles everything with feature flags | Yes |
The four library crates (core, secret, storage, flowgraph) are independent of each other. Dependencies flow upward only: the CLI binary sits at the top and wires concrete implementations together. alknet-storage implements alknet-core's `IdentityProvider` trait without a crate dependency — the CLI binary provides the bridge.
irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling don't need the service layer overhead.
## Three-Layer Model
Alknet uses a three-layer model (ADR-026, ADR-035):
| Layer | Responsibility | Examples |
|-------|---------------|----------|
| **Layer 1: Transport** | Produces byte streams (`AsyncRead + AsyncWrite + Unpin + Send`) | TCP, TLS, iroh, WebTransport (future) |
| **Layer 2: Interface** | Two categories: StreamInterface (consumes transport stream, produces session) and MessageInterface (handles discrete requests, manages own transport) | Stream: SSH, raw framing. Message: HTTP, DNS |
| **Layer 3: Protocol** | Carries semantics — operation registry, service calls, events | Call protocol, OperationEnv, operation dispatch |
SSH is an interface, not a transport. DNS is a message interface, not a transport.
The three-layer model enables HTTP interfaces (stealth mode byte-peek),
DNS control channels, and local service mesh (raw framing) without wrapping SSH
inside those transports.
A stream-based connection is always a (Transport, StreamInterface) pair.
Message-based interfaces manage their own transport. The protocol layer is
agnostic to both.
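The layering can be illustrated with three small traits. This is a sketch only: the real traits are async (`AsyncRead + AsyncWrite + Unpin + Send`) and their signatures are defined in transport.md and interface.md; blocking `std::io` traits are used here purely to keep the example dependency-free, and `InMemoryTransport`, `GreetingInterface`, and `EchoInterface` are toy types invented for the example.

```rust
use std::io::{Cursor, Read, Write};

/// Layer 1 (sketch): a transport yields a duplex byte stream.
trait Transport {
    type Stream: Read + Write;
    fn connect(&self) -> std::io::Result<Self::Stream>;
}

/// Layer 2, stream category: consumes a transport stream, produces a session.
trait StreamInterface {
    type Session;
    fn accept<S: Read + Write>(&self, stream: S) -> std::io::Result<Self::Session>;
}

/// Layer 2, message category: handles one discrete request, owns its transport.
trait MessageInterface {
    fn handle_request(&self, request: &[u8]) -> Vec<u8>;
}

/// Toy transport backed by an in-memory buffer.
struct InMemoryTransport(Vec<u8>);

impl Transport for InMemoryTransport {
    type Stream = Cursor<Vec<u8>>;
    fn connect(&self) -> std::io::Result<Self::Stream> {
        Ok(Cursor::new(self.0.clone()))
    }
}

/// Toy stream interface whose "session" is just the bytes it read.
struct GreetingInterface;

impl StreamInterface for GreetingInterface {
    type Session = String;
    fn accept<S: Read + Write>(&self, mut stream: S) -> std::io::Result<String> {
        let mut greeting = String::new();
        stream.read_to_string(&mut greeting)?;
        Ok(greeting)
    }
}

/// Toy message interface that echoes each request.
struct EchoInterface;

impl MessageInterface for EchoInterface {
    fn handle_request(&self, request: &[u8]) -> Vec<u8> {
        request.to_vec()
    }
}
```

Note that `GreetingInterface::accept` is generic over the stream type, so it never learns which transport produced the bytes, while `EchoInterface` never receives a transport stream at all.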
## Service Layer
The irpc service layer decomposes alknet's core responsibilities into independently testable, deployable, and replaceable components (ADR-033, [services.md](services.md)):
- **Auth** (`AuthProtocol`) — verify identities, check credentials
- **Secret** (`SecretProtocol`) — derive keys, encrypt/decrypt
- **Config** (`ConfigProtocol`) — dynamic config reload
- **Storage** (`StorageProtocol`) — graph CRUD, metagraph operations
**OperationEnv** is the universal composition mechanism. A handler receives `context.env.invoke("secrets", "derive", input)` and doesn't know whether the dispatch is local (direct function call), in-cluster (irpc service), or cross-node (call protocol `EventEnvelope`). Three dispatch paths, one handler-facing API.
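The "three dispatch paths, one handler-facing API" idea might look roughly like this. It is a hedged sketch: the `Route` enum, the string payloads, and the routing table are hypothetical, and the irpc and remote arms return tagged strings where the real implementation would issue an irpc call or send an `EventEnvelope`.

```rust
use std::collections::HashMap;

/// The three dispatch paths. Only `Local` does real work in this sketch.
enum Route {
    Local(fn(&str) -> String),
    Irpc { service: String },
    Remote { node: String },
}

struct OperationEnv {
    routes: HashMap<(String, String), Route>,
}

impl OperationEnv {
    /// Handler-facing API: identical regardless of where the operation runs.
    fn invoke(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), op.to_string());
        match self.routes.get(&key) {
            Some(Route::Local(handler)) => Ok(handler(input)),
            Some(Route::Irpc { service }) => Ok(format!("irpc:{service}:{input}")),
            Some(Route::Remote { node }) => Ok(format!("remote:{node}:{input}")),
            None => Err(format!("unknown operation {namespace}.{op}")),
        }
    }
}

/// Example local handler used to populate a `Route::Local` entry.
fn shout(input: &str) -> String {
    input.to_uppercase()
}
```

A caller writes `env.invoke("secrets", "derive", input)` in all three cases; moving an operation from local to remote changes only the routing table entry.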
**Phase boundary**: Phase 1 ships `ConfigIdentityProvider` (ArcSwap-backed) and `ConfigServiceImpl` (ArcSwap-backed) as the only auth and config implementations. The irpc service protocols (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment topology (multi-node with `StorageIdentityProvider`) are contracted in the specs but will be implemented in Phase 2+. Application services (DockerService, NodeService, agent services) are downstream concerns that build on top of the call protocol and OperationEnv.
## Identity
`Identity` struct and `IdentityProvider` trait are core types in alknet-core (ADR-029, [identity.md](identity.md)):
```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,                              // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>,                     // Authorization scope strings
    pub resources: HashMap<String, Vec<String>>, // Resource-level authorization
}
```
`IdentityProvider` decouples alknet-core from identity storage. Phase 1 ships `ConfigIdentityProvider` (reads from `ArcSwap<DynamicConfig.auth>`). `StorageIdentityProvider` (Phase 2+, backed by SQLite) replaces it for production deployments. Both produce the same `Identity` result.
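As an illustration of that decoupling, here is a sketch of a config-backed provider. The trait method name `resolve` and the fingerprint-to-scopes map are assumptions; the actual trait signature is specified in identity.md and ADR-029, and the real `ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig.auth>` rather than a plain map.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
}

/// Hypothetical trait shape: map a presented key fingerprint to an identity.
trait IdentityProvider {
    fn resolve(&self, fingerprint: &str) -> Option<Identity>;
}

/// Config-backed provider: a static map from fingerprint to scopes.
struct ConfigIdentityProvider {
    authorized: HashMap<String, Vec<String>>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).map(|scopes| Identity {
            id: fingerprint.to_string(),
            scopes: scopes.clone(),
            resources: HashMap::new(),
        })
    }
}
```

A storage-backed provider would implement the same trait and return the same `Identity` type, so callers in alknet-core would not need to change.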
## Exports
### Binary: `alknet`
A single binary with subcommands:
```
alknet serve — Start the server (accepts SSH connections)
alknet connect — Start the client (opens SSH session, exposes SOCKS5/port-forwards)
```
### Library: `alknet-core`
The `alknet-core` crate exports the pluggable components for embedding or programmatic use:
- `Transport` trait — produces a duplex stream for SSH to run over
- `TcpTransport` — direct TCP connection
- `TlsTransport` — TCP + tokio-rustls TLS
- `IrohTransport` — iroh QUIC P2P connection
- `Interface` trait → `StreamInterface` trait and `MessageInterface` trait (ADR-035)
- `InterfaceSession` trait — `recv()`/`send()` producing/consuming `InterfaceEvent` frames
- `InterfaceRequest` / `InterfaceResponse` — normalized request/response for message interfaces
- `Socks5Server` — local SOCKS5 proxy that forwards through SSH channels
- `PortForwarder` — manages local/remote port forwards
- `ServerHandler` → `SshInterface` — russh server handler with configurable auth and channel policies
- `Identity` / `IdentityProvider` — core identity types (ADR-029)
- `CredentialProvider` / `CredentialSet` — outbound credential types (ADR-036)
- `OperationSpec` — operation registration for call protocol (ADR-025)
- `OperationEnv` / `OperationContext` — universal composition and operation context
- `ConnectOptions` / `ServeOptions` — programmatic configuration structs
- `StaticConfig` / `DynamicConfig` — static/immutable vs. hot-reloadable config (ADR-030)
- `ConfigReloadHandle` — programmatic reload of dynamic config
- `ForwardingPolicy` — rule-based allow/deny for channel targets (ADR-031)
- `ListenerConfig` — stream and message listener configuration
## Dependencies
| Dependency | Purpose | Crate | Feature-gated |
|------------|---------|-------|---------------|
| `russh` | SSH client & server | core | No (core) |
| `tokio` | Async runtime | core | No (core) |
| `tokio-rustls` | TLS wrapping | core | Yes (`tls`) |
| `rustls` | TLS implementation | core | Yes (`tls`) |
| `rustls-acme` | ACME/Let's Encrypt auto-cert | core | Yes (`acme`) |
| `iroh` | P2P QUIC transport | core | Yes (`iroh`) |
| `irpc` | Streaming RPC service layer | core | Yes (`irpc`) |
| `arc-swap` | Lock-free dynamic config | core | No (core) |
| `serde` | Serialization | core | No (core) |
| `clap` | CLI argument parsing | CLI | No (CLI) |
| `toml` | TOML config file | CLI | No (CLI) |
| `tracing` | Structured logging | core | No (core) |
| `anyhow` / `thiserror` | Error handling | core | No (core) |
| `bip39` | Mnemonic generation | secret | No (secret) |
| `ed25519-bip32` | HD key derivation | secret | No (secret) |
| `aes-gcm` | AES-256-GCM encryption | secret | No (secret) |
| `rusqlite` | SQLite (via honker) | storage | No (storage) |
| `honker` | Event-sourced storage | storage | No (storage) |
| `petgraph` | Graph data structure | storage, flowgraph | No |
| `jsonschema` | JSON Schema validation | storage, flowgraph | No |
> Note: `tun-rs` is no longer a dependency. TUN support is deferred in favor of the external `tun2proxy` tool (ADR-014).
## Architecture Constraints
1. **SSH runs over transport, not alongside** — The transport layer produces a single `AsyncRead+AsyncWrite+Unpin+Send` stream. SSH runs over that stream via `russh::client::connect_stream()` / `russh::server::run_stream()`. The SSH layer never knows what transport it's on. (ADR-001, ADR-004)
2. **Three-layer model: Transport, Interface, Protocol** — SSH is a StreamInterface (Layer 2), not a transport (Layer 1). HTTP and DNS are MessageInterfaces (Layer 2). A connection is always a (Transport, StreamInterface) pair for stream-based interfaces, or a standalone MessageInterface for message-based ones. The call protocol (Layer 3) is agnostic to both. This enables HTTP interfaces, DNS control channels, and local service mesh without wrapping SSH. (ADR-026, ADR-035)
3. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)
4. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)
5. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing. Optional `--config` TOML file for reproducible deployments. (ADR-011, ADR-030)
6. **Feature flags control transport inclusion** — `tls`, `iroh`, `acme`, `irpc` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
7. **Authentication is key-based and unified** — Ed25519 public key (default) and OpenSSH certificate authority. Same key material for SSH and token auth. Identity resolves through `IdentityProvider` trait, decoupling core from identity storage. (ADR-012, ADR-023, ADR-029)
8. **NAPI exposes both connect() and serve()** — The napi-rs wrapper provides client and server functionality, using napi-rs as the FFI bridge. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)
9. **Static/dynamic config split** — Transport-level settings (listen address, TLS certs) are immutable after startup. Auth, forwarding policy, and rate limits are hot-reloadable via `ArcSwap<DynamicConfig>`. (ADR-030)
10. **Forwarding policy enforced before proxy spawn** — Each `channel_open_direct_tcpip` is checked against `ForwardingPolicy` before a TCP connection is made. Default-allow preserves current behavior. (ADR-031)
11. **OperationEnv as universal composition mechanism** — Handlers call `context.env.invoke(namespace, op, input)` regardless of dispatch path (local, irpc service, remote call protocol). (ADR-033)
12. **Event boundary discipline** — Domain events (Honker streams) stay within the owning service. irpc calls are synchronous and in-cluster. Call protocol `EventEnvelope` is the only thing that crosses node boundaries. (ADR-032)
13. **Error handling follows a consistent layered pattern** — Transport and auth errors cause reconnection (client, with exponential backoff) or connection rejection (server). Channel-level errors (target unreachable, proxy failure) close the individual channel without killing the session. Library API errors propagate via `anyhow::Result` / `thiserror` types. CLI reports errors to stderr with appropriate exit codes. NAPI errors are marshalled as JavaScript exceptions.
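The client-side exponential backoff mentioned in constraint 13 can be sketched as a pure delay function. The doubling factor, the base delay, and the cap are assumptions chosen for the example; the document does not specify them.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`. Overflow saturates to `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 1 second base and a 60 second cap, attempts 0 through 5 wait 1, 2, 4, 8, 16, and 32 seconds, and every later attempt waits 60 seconds. Production code would usually add random jitter so that many clients do not reconnect in lockstep.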
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces `AsyncRead+AsyncWrite+Unpin+Send`, SSH consumes it |
| [002](decisions/002-tun-separate-process.md) | TUN shim separate | Superseded — TUN is deferred, use tun2proxy (ADR-014) |
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join(recv, send)` combines QUIC halves |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never accesses TCP/iroh/TLS directly |
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Server logs auth and connections, not destinations |
| [007](decisions/007-napi-single-stream.md) | NAPI single stream | NAPI exposes duplex streams, not SSH multiplexing |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No SSH config files; options are structs, env vars, CLI flags (amended by ADR-030 for optional TOML) |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | Ed25519 keys + OpenSSH CA; no password auth |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly | Structured auth logs + built-in rate limiting |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN | Use tun2proxy for VPN-like behavior; no alknet-tun binary |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | connect + serve | NAPI exposes both client and server from the start |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Both sides can initiate calls |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Six crates, acyclic deps, feature-gated irpc |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract, irpc is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config | ArcSwap for hot-reloadable auth and forwarding |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Per-identity, per-destination, per-transport rules |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition, three dispatch paths |
| [034](decisions/034-head-worker-terminology.md) | Head/worker | Replaces hub/spoke terminology |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Two Layer 2 trait categories for stream vs message |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials` |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |
## Open Questions
See [open-questions.md](open-questions.md) for all open and resolved questions.
Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS), and
OQ-20 (worker registration).
## References
- [transport.md](transport.md) — Transport abstraction (Layer 1)
- [interface.md](interface.md) — StreamInterface and MessageInterface (Layer 2)
- [call-protocol.md](call-protocol.md) — Call protocol (Layer 3)
- [auth.md](auth.md) — Unified authentication, API keys, credential presentation
- [identity.md](identity.md) — Identity and IdentityProvider
- [credentials.md](credentials.md) — CredentialProvider and CredentialSet (outbound auth)
- [definitions.md](definitions.md) — Terminology disambiguation
- [configuration.md](configuration.md) — StaticConfig, DynamicConfig, ForwardingPolicy
- [services.md](services.md) — irpc service layer, OperationEnv
- [server.md](server.md) — Server acceptance, channel handling
- [client.md](client.md) — Client connection, SOCKS5, port forwarding
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [storage.md](storage.md) — alknet-storage: metagraph, identity, ACL
- [flowgraph.md](flowgraph.md) — alknet-flowgraph: call graph, operation graph
- [secret-service.md](secret-service.md) — alknet-secret: BIP39, SLIP-0010, AES-GCM
- [Feasibility Assessment](../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
- [russh API](/workspace/russh) — SSH client/server library
- [Dispatch](/workspace/@alkdev/dispatch) — Reference implementation of russh port forwarding
- [iroh](/workspace/iroh) — P2P QUIC connections
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — Recommended external TUN-to-SOCKS5 tool
- [irpc](/workspace/irpc) — iroh streaming RPC
- [Production certbot setup](../research/ops/certbot.md) — Let's Encrypt on our infrastructure
- [Production fail2ban setup](../research/ops/fail2ban.md) — fail2ban with nftables on our infrastructure


@@ -1,519 +0,0 @@
---
status: reviewed
last_updated: 2026-06-10
---
# Secret Service (alknet-secret)
## What
The `alknet-secret` crate provides BIP39 mnemonic generation, SLIP-0010 Ed25519
HD key derivation, AES-256-GCM encryption for external credentials, and the
`SecretProtocol` irpc service. It is the only component that holds the master
seed phrase.
## Why
Operations like SSH key generation, API key storage, and Ethereum transaction
signing all need deterministic key derivation from a single root of trust. The
seed phrase is the single recovery mechanism — from it, all self-generated
secrets can be derived on demand. External credentials (third-party API keys,
OAuth tokens) cannot be derived and must be stored encrypted, with the
encryption key itself derived from the seed.
The secret service isolates this responsibility: no other crate sees the seed,
and derived keys are provided on demand through an irpc service interface. This
follows ADR-027 (crate decomposition) — alknet-secret is fully independent of
alknet-core and alknet-storage.
## Architecture
### Crate Structure
```
alknet-secret/
├── Cargo.toml
├── src/
│ ├── lib.rs # Crate root, re-exports
│ ├── mnemonic.rs # BIP39: phrase generation, validation, seed derivation
│ ├── derivation.rs # SLIP-0010: HD key derivation, path constants
│ ├── encryption.rs # AES-256-GCM: encrypt/decrypt, EncryptedData type
│ ├── protocol.rs # SecretProtocol irpc service enum, DerivedKey, KeyType
│ ├── service.rs # SecretService, SecretServiceHandle, SecretServiceActor
│ ├── cache.rs # Key caching: LRU cache with TTL, derivation path as key
│ └── ethereum.rs # BIP-0032 secp256k1 HD key derivation (behind feature flag)
└── tests/
├── derivation_tests.rs # Path derivation, coin type 74' consistency
├── encryption_tests.rs # Round-trip encrypt/decrypt, key version
├── service_tests.rs # Unlock/Lock lifecycle, derive on locked = error
└── test_vectors.rs # Known-answer tests: BIP39, SLIP-0010, AES-256-GCM
```
### Dependencies
```toml
[dependencies]
bip39 = { version = "2", features = ["rand"] }
ed25519-bip32 = "0.4" # IOHK SLIP-0010 Ed25519 HD derivation
aes-gcm = "0.10" # AES-256-GCM
sha2 = "0.10" # SHA-256, plus SHA-512 as the digest behind HMAC-SHA512
hmac = "0.12" # HMAC-SHA512 for key derivation
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
irpc = { workspace = true } # Always-on, not feature-gated (ADR-027)
irpc-derive = { workspace = true } # Proc-macro for #[rpc_requests]
tokio = { version = "1", features = ["sync", "rt", "macros"] } # Async runtime for SecretServiceActor
zeroize = { version = "1", features = ["derive"] } # Secure memory wiping (ADR-038)
base64 = "0.22" # Base64url encoding for derived passwords
rand = "0.8" # Random IV/salt generation for AES-256-GCM
[dependencies.secp256k1]
version = "0.29"
optional = true # BIP-0032 secp256k1 derivation (behind feature flag)
[features]
default = []
secp256k1 = ["dep:secp256k1"] # Enable Ethereum/secp256k1 key derivation
# Future (Phase B): key rotation via KDF
# hkdf = "0.12" # HKDF for salt-based key stretching (deferred)
# pbkdf2 = "0.12" # PBKDF2 for password-based key derivation (deferred)
```
irpc is always a dependency (not behind a feature flag). Per ADR-027, irpc
in alknet-secret and alknet-storage is not feature-gated because these crates
are used in production deployments where the service layer is always active.
`irpc-derive` provides the `#[rpc_requests]` proc-macro that generates
`SecretMessage` and channel plumbing. `tokio` is needed for the
`SecretServiceActor` message loop (async channel receivers and task spawning).
The `secp256k1` crate is feature-gated behind the `secp256k1` feature because
Ethereum/BIP-0032 derivation is not needed in minimal deployments. Only
deployments that require `DeriveEthereumKey` should enable this feature. Note
that the crate name is `secp256k1` (the Rust library), not `libsecp256k1`
(the C library that the Rust crate wraps).
The `hkdf` and `pbkdf2` crates are deferred to Phase B. They will be needed for
salt-based key stretching when key rotation is implemented (see
[EncryptedData.salt](#aes-256-gcm-encryption-for-external-credentials)).
### Crate Interface (Public API)
The crate exposes these types as its stable public interface:
```rust
// Core types (always available)
pub use mnemonic::{Mnemonic, Language, Seed};
pub use derivation::{ExtendedPrivKey, DerivationError, PATHS};
pub use encryption::{EncryptedData, EncryptionError};
pub use protocol::{SecretProtocol, DerivedKey, KeyType, SecretMessage};
pub use service::{SecretService, SecretServiceHandle, SecretServiceActor, SecretServiceError};
pub use cache::CacheConfig;
// secp256k1 types (behind feature flag)
#[cfg(feature = "secp256k1")]
pub use ethereum::Secp256k1ExtendedPrivKey;
```
Other crates consume this interface:
- **alknet-storage** references `EncryptedData` for wire format compatibility
(type-level, not a crate dependency)
- **alknet** (CLI binary) assembles `SecretService` and wires it to the
`OperationEnv`
- **alknet-core** never depends on alknet-secret; `CredentialProvider` stub
returns `None` until Phase A wiring
### Security Model
Per ADR-038 (seed lifecycle and memory security):
| State | What's in memory | What's on disk |
|-------|-----------------|---------------|
| Locked | Nothing | Encrypted database, derivation path metadata |
| Unlocked | Master seed in zeroize-protected RAM | Same (seed is never persisted) |
| After use | Derived keys cached in zeroize-protected RAM | Derivation paths only |
The seed phrase is entered once (at node startup or via `Unlock`), held only in
RAM, and never written to disk. `Lock` calls `zeroize()` on the seed and all
cached derived keys. The `SecretService` uses `Zeroize`-derived types for all
sensitive material.
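The Locked/Unlocked lifecycle in the table can be sketched as a small state machine. This is illustrative only: `SeedHolder` is a hypothetical name, `derive` returns the seed length as a placeholder for real key derivation, and the manual overwrite stands in for `zeroize()`, which the real crate uses because a plain overwrite may be optimized away.

```rust
/// Sketch of the seed lifecycle; not the real `SecretService`.
enum SeedState {
    Locked,
    Unlocked { seed: Vec<u8> },
}

#[derive(Debug, PartialEq)]
enum SeedError {
    Locked,
}

struct SeedHolder {
    state: SeedState,
}

impl SeedHolder {
    fn new() -> Self {
        Self { state: SeedState::Locked }
    }

    fn unlock(&mut self, seed: Vec<u8>) {
        self.lock(); // wipe any previously held seed first
        self.state = SeedState::Unlocked { seed };
    }

    fn lock(&mut self) {
        if let SeedState::Unlocked { seed } = &mut self.state {
            // Stand-in for `zeroize()`.
            seed.iter_mut().for_each(|b| *b = 0);
        }
        self.state = SeedState::Locked;
    }

    /// Placeholder for derivation: fails while locked.
    fn derive(&self) -> Result<usize, SeedError> {
        match &self.state {
            SeedState::Unlocked { seed } => Ok(seed.len()),
            SeedState::Locked => Err(SeedError::Locked),
        }
    }
}
```

Modeling the seed as data inside the `Unlocked` variant means there is no code path that can reach the seed while the service is locked.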
#### Key Caching
Per OQ-SVC-04 (resolved), derived keys are cached in RAM with the following
properties:
- **Cache key**: The derivation path string (e.g., `m/74'/0'/0'/0'`). This
uniquely identifies a derived key — the same path always produces the same
key from the same seed.
- **TTL**: 1 hour (configurable). Cached entries expire after the TTL elapses,
forcing re-derivation from the seed on next access.
- **Eviction policy**: LRU (least recently used). When the cache exceeds its
maximum size, the least recently accessed entry is evicted.
- **Clearing**: The entire cache is cleared on `Lock`, and all entries are
zeroized before removal per ADR-038.
- **Implementation**: The cache lives in `cache.rs` as an LRU map from
derivation path to `Zeroize`-protected key bytes.
The cache avoids redundant derivation for frequently used keys (identity,
encryption) while ensuring that `Lock` purges all sensitive material.
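The caching properties listed above (path as key, TTL, LRU eviction, wipe on clear) can be sketched with the standard library alone. `KeyCache`, its linear-scan eviction, and the `wipe` helper are assumptions for illustration; the actual `cache.rs` uses `Zeroize`-protected types, and the clock is passed in explicitly here to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    key_bytes: Vec<u8>,
    inserted: Instant,
    last_used: u64, // logical clock for LRU ordering
}

struct KeyCache {
    entries: HashMap<String, Entry>,
    ttl: Duration,
    max_entries: usize,
    clock: u64,
}

/// Stand-in for `zeroize()`; a plain overwrite is not a hardened wipe.
fn wipe(bytes: &mut [u8]) {
    bytes.iter_mut().for_each(|b| *b = 0);
}

impl KeyCache {
    fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { entries: HashMap::new(), ttl, max_entries, clock: 0 }
    }

    fn get(&mut self, path: &str, now: Instant) -> Option<Vec<u8>> {
        let expired = match self.entries.get(path) {
            Some(e) => now.duration_since(e.inserted) >= self.ttl,
            None => return None,
        };
        if expired {
            self.remove(path); // forces re-derivation on next access
            return None;
        }
        self.clock += 1;
        let entry = self.entries.get_mut(path)?;
        entry.last_used = self.clock;
        Some(entry.key_bytes.clone())
    }

    fn insert(&mut self, path: &str, key_bytes: Vec<u8>, now: Instant) {
        self.remove(path);
        if self.entries.len() >= self.max_entries {
            // Evict the least recently used entry.
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.remove(&k);
            }
        }
        self.clock += 1;
        let entry = Entry { key_bytes, inserted: now, last_used: self.clock };
        self.entries.insert(path.to_string(), entry);
    }

    fn remove(&mut self, path: &str) {
        if let Some(mut e) = self.entries.remove(path) {
            wipe(&mut e.key_bytes);
        }
    }

    /// Called on `Lock`: wipe and drop every entry.
    fn clear(&mut self) {
        for (_, mut e) in self.entries.drain() {
            wipe(&mut e.key_bytes);
        }
    }
}
```

The linear scan in `insert` is acceptable for a cache of a few dozen derivation paths; a larger cache would want an ordered structure instead.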
### Key Derivation
#### BIP39 Mnemonic and Seed Derivation
```rust
let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
let seed = mnemonic.to_seed(None); // or Some("passphrase")
let key = derive_path_from_seed(seed.as_bytes(), PATHS::IDENTITY)?;
```
#### SLIP-0010 Ed25519 HD Key Derivation
The `74'` coin type is unallocated per SLIP-0044 and reserved for alknet.
#### Derivation Path Constants
| Path | Purpose | Curve/Algorithm |
|------|---------|----------------|
| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) |
| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 |
| `m/74'/0'/1'/0'` | SSH host key | Ed25519 |
| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic (HMAC-SHA512) |
| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM |
| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 |
These constants are defined in `derivation::PATHS` for programmatic access.
#### Password Derivation
`DerivePassword` produces a deterministic password from the seed using the
following algorithm:
1. Derive the extended private key at path `m/74'/1'/0'/{hash}'` using
SLIP-0010 (HMAC-SHA512 with key "ed25519 seed"), where `{hash}'` is a
site-specific hardened index derived from the site identifier.
2. Take the HMAC-SHA512 output (64 bytes) at that derivation level.
3. Truncate to the requested `length` bytes.
4. Encode as Base64url (RFC 4648 §5, no padding).
This produces a URL-safe, deterministic password of the requested length. v1
does not impose a special character set — the Base64url alphabet (`A-Z`,
`a-z`, `0-9`, `-`, `_`) provides sufficient entropy. If a specific character
set is required in the future, a versioned path can be introduced
(e.g., `m/74'/1'/1'/{hash}'`).
The `SecretServiceHandle` provides two methods for password derivation:
- `derive_password(path, length)` → `Vec<u8>` (raw truncated bytes)
- `derive_password_string(path, length)` → `String` (Base64url-encoded)
The irpc `DerivePassword` variant returns raw bytes (`Vec<u8>`). Consumers
who need a string representation can Base64url-encode the result.
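Steps 3 and 4 of the algorithm above (truncate, then Base64url-encode without padding) can be sketched without external crates. The 64-byte HMAC-SHA512 output is taken as an input here because computing it needs a crypto library; the function names and the handling of an out-of-range `length` are assumptions for illustration.

```rust
// Base64url alphabet from RFC 4648 §5.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Encode bytes as Base64url without `=` padding.
pub fn base64url_no_pad(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 4 + 2) / 3);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        }
    }
    out
}

// Hypothetical helper: truncate the HMAC output to `length` bytes and encode.
// Returning `None` for a zero or oversized length is an assumption; the spec
// does not say how such requests are handled.
pub fn password_from_hmac_output(hmac_output: &[u8; 64], length: usize) -> Option<String> {
    if length == 0 || length > hmac_output.len() {
        return None;
    }
    Some(base64url_no_pad(&hmac_output[..length]))
}
```

Note that `length` counts bytes before encoding, so the resulting string is longer than `length` characters (16 bytes become 22 characters).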
#### secp256k1 Derivation (Ethereum)
`DeriveEthereumKey` uses **BIP-0032** (not SLIP-0010) at path
`m/44'/60'/0'/0/0`. This is a fundamentally different derivation algorithm from
Ed25519:
- SLIP-0010 (Ed25519) uses HMAC-SHA512 with key "ed25519 seed" and only
supports hardened child derivation.
- BIP-0032 (secp256k1) uses HMAC-SHA512 with key "Bitcoin seed" and supports
both hardened and unhardened child derivation.
The Ethereum path contains unhardened indices (`0/0`), which are invalid under
SLIP-0010. The `alknet-secret` crate gates secp256k1 derivation behind a
`secp256k1` feature flag, which pulls in the `secp256k1` crate. Deployments
that do not need Ethereum signing can omit this feature to avoid the
dependency.
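The hardened-only restriction can be checked mechanically from the path string. The sketch below is illustrative and not taken from the `derivation` module; `parse_path` and `is_slip10_ed25519_compatible` are hypothetical names. It relies only on the convention, shared by BIP-0032 and SLIP-0010, that a trailing `'` sets the top bit of the child index.

```rust
// Bit that marks a hardened child index in BIP-0032 and SLIP-0010.
const HARDENED: u32 = 0x8000_0000;

// Parse a path such as `m/74'/0'/0'/0'` into raw child indices.
pub fn parse_path(path: &str) -> Result<Vec<u32>, String> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(format!("path must start with m: {}", path));
    }
    let mut indices = Vec::new();
    for part in parts {
        let (digits, hardened) = match part.strip_suffix('\'') {
            Some(d) => (d, true),
            None => (part, false),
        };
        let index: u32 = digits
            .parse()
            .map_err(|_| format!("bad index: {}", part))?;
        if index >= HARDENED {
            return Err(format!("index out of range: {}", part));
        }
        indices.push(if hardened { index | HARDENED } else { index });
    }
    Ok(indices)
}

// SLIP-0010 Ed25519 accepts a path only if every child index is hardened.
pub fn is_slip10_ed25519_compatible(path: &str) -> bool {
    match parse_path(path) {
        Ok(indices) => indices.iter().all(|i| (i & HARDENED) != 0),
        Err(_) => false,
    }
}
```

With this check, `m/74'/0'/0'/0'` passes and `m/44'/60'/0'/0/0` is rejected, which is the reason the Ethereum path needs the BIP-0032 code path.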
#### DerivedKey Security Properties
Per ADR-038, the `private_key` field of `DerivedKey` must derive `Zeroize` and
use `#[zeroize(drop)]` to ensure sensitive key material is overwritten before
deallocation:
```rust
#[derive(Zeroize, Deserialize)]
#[zeroize(drop)]
pub struct DerivedKey {
    #[zeroize(skip)]
    pub key_type: KeyType,
    #[zeroize]
    #[serde(deserialize_with = "deserialize_private_key")]
    pub private_key: Vec<u8>,
    #[zeroize(skip)]
    pub public_key: Vec<u8>,
}
```
`DerivedKey` is **move-only** — it does not implement `Clone`. This is a
stronger security property than manual `Clone` with zeroization of the source:
a move-only type cannot be accidentally duplicated, and the `#[zeroize(drop)]`
annotation ensures the `private_key` is zeroized when the key goes out of scope.
There is no risk of use-after-zeroize from a manual `clone()` that destroys
the source.
Serialization redacts `private_key` in human-readable formats (JSON shows
`"[REDACTED]"`) but preserves the actual bytes in binary formats (postcard) so
that irpc remote communication works correctly. Deserialization always reads
the full bytes.
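The move-only and wipe-on-drop properties can be illustrated with standard-library Rust. This is a sketch under assumptions, not the crate's code: `SketchKey` is a hypothetical stand-in for `DerivedKey`, a volatile byte overwrite stands in for `#[zeroize(drop)]`, and a redacting `Debug` impl stands in for the redacting JSON serializer.

```rust
use std::fmt;

// Hypothetical stand-in for `DerivedKey`; deliberately does not derive `Clone`,
// so the only way to hand the key to someone else is to move it.
pub struct SketchKey {
    private_key: Vec<u8>,
    pub public_key: Vec<u8>,
}

impl SketchKey {
    pub fn new(private_key: Vec<u8>, public_key: Vec<u8>) -> Self {
        Self { private_key, public_key }
    }

    // Length accessor so callers never need to print or copy the bytes.
    pub fn private_len(&self) -> usize {
        self.private_key.len()
    }
}

// Human-readable output never shows the private key bytes.
impl fmt::Debug for SketchKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("SketchKey")
            .field("private_key", &"[REDACTED]")
            .field("public_key", &self.public_key)
            .finish()
    }
}

// Overwrite the private key when the value goes out of scope.
impl Drop for SketchKey {
    fn drop(&mut self) {
        for byte in self.private_key.iter_mut() {
            // Volatile write so the compiler does not elide the wipe.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}
```

Because the type has no `Clone` impl, `let moved = key;` transfers ownership and any later use of `key` is a compile error, so there is exactly one copy to wipe.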
### AES-256-GCM Encryption for External Credentials
External credentials (API keys, OAuth tokens) that cannot be derived are
encrypted using a key derived from the seed at path `m/74'/2'/0'/0'`. The
`EncryptedData` type stores the key version, salt, IV, and ciphertext.
1. The secret service derives an AES-256-GCM key via path `m/74'/2'/0'/0'`
2. External credentials are encrypted with this key
3. The encrypted data is stored as a `SecretNode` in the metagraph
4. Only the derivation path and key version are stored in plain attributes
5. The seed phrase (or derived encryption key) is held only by the secret
service — never in the database
#### EncryptedData.salt — Reserved for Future KDF-Based Key Rotation
In v1, the encryption key is derived directly from the seed at path
`m/74'/2'/0'/0'` without any salt-based key derivation. The `salt` field in
`EncryptedData` is **reserved for future KDF-based key rotation** (Phase B):
- The salt is generated randomly (32 bytes) and stored in `EncryptedData.salt`
for forward compatibility, but it is **not used** in the v1 key derivation
process.
- When key rotation is implemented, the salt will be used as input to HKDF or
PBKDF2 for stretch-based key derivation, allowing the same seed to produce
different encryption keys without changing the derivation path.
- This design ensures that the wire format does not need to change when key
rotation is introduced — the `salt` field is already present and populated.
The `hkdf` and `pbkdf2` crates are listed as future dependencies in the
`Dependencies` section but are not included in v1.
### SecretProtocol irpc Service
```rust
#[rpc_requests(message = SecretMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum SecretProtocol {
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEd25519)]
    DeriveEd25519 { path: String },
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEncryptionKey)]
    DeriveEncryptionKey { path: String },
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEthereumKey)]
    DeriveEthereumKey { path: String },
    #[rpc(tx=oneshot::Sender<Vec<u8>>)]
    #[wrap(DerivePassword)]
    DerivePassword { path: String, length: usize },
    #[rpc(tx=oneshot::Sender<EncryptedData>)]
    #[wrap(Encrypt)]
    Encrypt { plaintext: String, key_version: u32 },
    #[rpc(tx=oneshot::Sender<String>)]
    #[wrap(Decrypt)]
    Decrypt { encrypted: EncryptedData },
    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Lock)]
    Lock,
    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Unlock)]
    Unlock { mnemonic: String, passphrase: Option<String> },
}
```
**Note**: The `Unlock` variant carries both the mnemonic phrase and an optional
BIP39 passphrase. The `mnemonic` field is the space-separated BIP39 word list.
The `passphrase` field is the optional BIP39 password extension (sometimes
called the "25th word"). Most deployments use `passphrase: None`, but the field
is available for users who need additional security beyond the mnemonic alone.
> **Implementation gap**: The current code has `Unlock { passphrase: String }`
> with only a single field (the mnemonic), and the actor handler passes `None`
> for the BIP39 passphrase. This needs to be updated to match the spec above.
> See the `unlock-passphrase-gap` task.
#### irpc Integration Model
The `SecretProtocol` enum defines the **wire protocol** — the set of operations
the secret service supports. The `#[rpc_requests(message = SecretMessage)]`
macro generates `SecretMessage` as the irpc wire type, which comes in two
variants:
- `SecretMessage::Request`: serialized form for remote (QUIC) communication,
using postcard encoding.
- `SecretMessage::RequestWithChannels`: local form with `oneshot::Sender`
channels for in-process communication.
There are two dispatch paths for consuming the secret service:
1. **Local (in-process)**: `SecretServiceHandle` wraps `SecretServiceInner`
behind `Arc<RwLock<>>` and provides direct method calls
(`derive_ed25519()`, `encrypt()`, etc.) without any serialization overhead.
This is the path used by the CLI binary and single-node deployments. No irpc
message passing is involved — the handle calls the implementation directly.
2. **Remote (in-cluster)**: `Client<SecretProtocol>` connects to the secret
service node via irpc over QUIC. The client sends `SecretMessage::Request`
messages (postcard-serialized) and receives responses. Workers on remote
nodes use this path. The seed never leaves the secret service node — only
derived keys are transmitted.
The `SecretServiceActor` processes incoming `SecretMessage` variants by
dispatching to the corresponding `SecretServiceHandle` methods. It provides
a `spawn(handle)` convenience method that creates an mpsc channel, spawns the
actor on a tokio task, and returns a `(Client<SecretProtocol>, SecretServiceActor)`
tuple for immediate use.
The `SecretService` type owns the irpc service handler and a
`SecretServiceHandle`. It dispatches incoming `SecretMessage` variants to the
handle's methods. For call protocol exposure (e.g., `/head/secrets/derive`),
the service is wrapped in an operation that serializes to JSON.
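The local dispatch path can be sketched as a cloneable handle around shared state. This is an assumption-laden illustration rather than the real `SecretServiceHandle`: `Handle`, `Inner`, `SecretError`, and `with_seed` are hypothetical names, and the closure stands in for an actual derivation call.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug, PartialEq)]
pub enum SecretError {
    Locked,
}

// Shared state: the seed is present only while the service is unlocked.
struct Inner {
    seed: Option<Vec<u8>>,
}

// Hypothetical handle: cheap to clone, every clone sees the same state.
#[derive(Clone)]
pub struct Handle {
    inner: Arc<RwLock<Inner>>,
}

impl Handle {
    pub fn new() -> Self {
        Self { inner: Arc::new(RwLock::new(Inner { seed: None })) }
    }

    pub fn unlock(&self, seed: Vec<u8>) {
        let mut guard = self.inner.write().unwrap();
        guard.seed = Some(seed);
    }

    // Overwrite the seed before dropping it; the real crate uses `Zeroize`.
    pub fn lock(&self) {
        let mut guard = self.inner.write().unwrap();
        if let Some(mut seed) = guard.seed.take() {
            for b in seed.iter_mut() {
                *b = 0;
            }
        }
    }

    // Run `f` against the seed under a read lock, or fail if locked.
    pub fn with_seed<T>(&self, f: impl FnOnce(&[u8]) -> T) -> Result<T, SecretError> {
        let guard = self.inner.read().unwrap();
        match guard.seed.as_deref() {
            Some(seed) => Ok(f(seed)),
            None => Err(SecretError::Locked),
        }
    }
}
```

Calls go straight through the lock with no serialization step, which is the property the local path is described as having; the remote path would instead serialize the request and send it to an actor.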
### Wire Format Compatibility with alknet-storage
The `EncryptedData` type (`key_version`, `salt`, `iv`, `data`) is the stable
wire format shared with alknet-storage. This is type-level compatibility — not a
crate dependency. alknet-storage stores encrypted nodes using this format;
alknet-secret encrypts and decrypts using this format.
The Rust `EncryptedData` struct in alknet-secret is a superset of the TypeScript
`EncryptedDataSchema` from `@alkdev/storage`. Migration path: re-encrypt
TypeScript-encrypted data using the Rust secret service with a new key version.
The wire format is stable — future key rotation will use the existing `salt`
field rather than adding new fields (see OQ-SVC-03).
### Deployment Topologies
**Minimal (single node, CLI)**: Secret service runs in the same process. Seed
phrase entered at startup. All keys derived locally via `SecretServiceHandle`.
No irpc overhead.
**Production (head node)**: Secret service runs on a dedicated node or as a
local irpc service. Workers request derived keys via `Client<SecretProtocol>`
over QUIC. The seed never leaves the secret service node.
### Test Vectors
Known-answer tests are required against published test vectors to verify
correctness of the cryptographic implementations:
#### BIP39 Test Vectors
The `mnemonic` module must produce identical output to the BIP39 reference
test vectors:
- Given a known mnemonic phrase and passphrase, the derived seed must match
the reference output byte-for-byte.
- Test vectors from
[BIP39 reference](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki)
and the `bip39` crate's own test suite.
#### SLIP-0010 Test Vectors
The `derivation` module must produce identical output to the SLIP-0010 reference
test vectors:
- Given a known seed, the derived master key (private key + chain code) must
match the SLIP-0010 reference output.
- Given a known master key, the derived child key at path `m/74'/0'/0'/0'`
must match the reference output.
- Test vectors from
[SLIP-0010 reference](https://github.com/satoshilabs/slips/blob/master/slip-0010.md).
#### AES-256-GCM Test Vectors
The `encryption` module must produce identical results to published AES-256-GCM
test vectors:
- Given a known key, IV, and plaintext, the ciphertext must match the reference
output.
- Use IEEE P802.1ASck or NIST SP 800-38D test vectors.
- Round-trip encryption/decryption must always succeed for valid inputs.
These tests ensure that the implementation is correct and compatible with
other BIP39/SLIP-0010/AES-256-GCM implementations. They are placed in
`tests/test_vectors.rs`.
## Constraints
- The seed phrase is never persisted to disk. It is entered at startup or via
`Unlock` and held only in `Zeroize`-protected RAM (ADR-038).
- `Lock` calls `zeroize()` on the seed and all cached derived keys. The key
cache is fully cleared and zeroized on `Lock` (OQ-SVC-04, resolved).
- alknet-secret does not depend on alknet-core or alknet-storage. It is fully
independent (ADR-027).
- The `EncryptedData` wire format is shared with alknet-storage for type-level
compatibility, not a crate dependency.
- Per ADR-032, secret service domain events (key derivation notifications) stay
within the service boundary. External consumers use irpc calls or call
protocol operations projected to integration events.
- irpc is always a dependency (not feature-gated) per ADR-027.
- `SecretProtocol` defines the wire format for in-cluster communication
(postcard serialization). For call protocol exposure (e.g.,
`/head/secrets/derive`), the service is wrapped in an operation that
serializes to JSON.
- `DerivedKey.private_key` must derive `Zeroize` per ADR-038. `DerivedKey`
is move-only (not `Clone`) — this is stronger than manual Clone with
zeroization of the source, as it prevents accidental duplication.
- secp256k1 (Ethereum) derivation is gated behind the `secp256k1` feature flag
because it requires a different derivation algorithm (BIP-0032) and an
additional dependency (`secp256k1`).
## Phase Progression
| Phase | Scope | Notes |
|-------|-------|-------|
| Phase 3 (now) | Basic crate: mnemonic, derivation, encryption, irpc protocol, service lifecycle, key caching | Core key management |
| Phase A | Integration with alknet-storage via `EncryptedData` wire format. CLI commands for unlock/lock/derive. `SecretStoreCredentialProvider` wiring. | Full service integration |
| Phase B | Memory hardening: `mlock`/`VirtualLock` for seed RAM, constant-time comparison, audit logging of derivation requests. Key rotation: KDF-based key derivation using `EncryptedData.salt` with HKDF/PBKDF2. | Security hardening |
| Phase C | Multi-seed support (tenant isolation): indexed `Unlock` with tenant ID. | Multi-tenancy |
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
per tenant)? See [open-questions.md](open-questions.md).
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? **Resolution**: The wire format
is stable. `EncryptedData` (`key_version`, `salt`, `iv`, `data`) is shared
type-level between alknet-secret and alknet-storage. The migration path is
re-encryption with a new key version. The `salt` field is reserved for future
KDF-based key rotation (see Phase B). See [open-questions.md](open-questions.md).
- **OQ-SVC-04**: Should workers cache derived keys locally? **Resolution**: Yes.
Derived keys are cached in RAM using an LRU cache keyed by derivation path,
with a TTL of 1 hour (configurable). The cache is fully cleared and zeroized
on `Lock`. This avoids redundant derivation for frequently used keys while
ensuring that `Lock` purges all sensitive material. See [open-questions.md](open-questions.md).
- **OQ-SEC-01**: Should alknet-secret use `mlock`/`VirtualLock` to prevent seed
RAM from being paged to disk? See [open-questions.md](open-questions.md).
Deferred to Phase B per ADR-038.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-secret is independent of core and storage |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security | Zeroize for sensitive material, mlock deferred to Phase B |
## References
- [research/services.md](../research/services.md) — SecretProtocol definition, DerivedKey, KeyType
- [research/storage.md](../research/storage.md) — Secrets section, derivation paths, EncryptedData
- [research/integration-plan.md](../research/integration-plan.md) — Phase 3.1
- [credentials.md](credentials.md) — CredentialProvider (outbound auth, consumes SecretProtocol::Decrypt)
- SLIP-0010 — https://github.com/satoshilabs/slips/blob/master/slip-0010.md
- BIP39 — https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
- BIP-0032 — https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
- NIST SP 800-38D — AES-GCM test vectors

@@ -1,325 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---
# Server
## What
The alknet server accepts SSH connections (via pluggable transport) and handles `channel_open_direct_tcpip` requests by connecting to the requested target — either directly or through an outbound proxy.
## Why
The server is the tunnel endpoint. It receives SSH channels requesting TCP connections to specific hosts and ports, and makes those connections on behalf of the client. It's the same role as an SSH server with `AllowTcpForwarding yes`, but self-contained and transport-agnostic.
## Architecture
### Server Components
```
┌──────────────────────────────────────────────────┐
│ alknet serve │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ SSH Server (russh) │ │
│ │ ServerHandler per connection │ │
│ │ - auth_publickey() → Accept/Reject │ │
│ │ - channel_open_direct_tcpip() → connect │ │
│ │ - channel_open_forwarded_tcpip() → proxy │ │
│ └──────────────────┬──────────────────────────┘ │
│ │ │
│ ┌──────────────────▼──────────────────────────┐ │
│ │ Transport Acceptor │ │
│ │ (TcpListener / TlsListener / IrohEndpoint) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Outbound Proxy (optional) │ │
│ │ - Direct TCP │ │
│ │ - SOCKS5 proxy │ │
│ │ - HTTP CONNECT proxy │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Rate Limiter │ │
│ │ - max-connections-per-ip │ │
│ │ - max-auth-attempts │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
```
### Authentication
The server authenticates connections through the `IdentityProvider` trait (ADR-029, [identity.md](identity.md)). `IdentityProvider` decouples the server from any specific identity storage — the server resolves an identity, it doesn't manage keys.
**Phase 1 implementation**: `ConfigIdentityProvider` (in alknet-core) reads from `ArcSwap<DynamicConfig.auth>` (ADR-030). Every authorized key gets a default scope set. No database required. This is the default for CLI and single-node deployments.
**Future implementation**: `StorageIdentityProvider` (in alknet-storage, not yet built) backed by SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. The server doesn't need to know which implementation is active — it goes through the trait.
The server supports two auth presentation paths (ADR-023, [auth.md](auth.md)):
**SSH public key auth** (SSH transports):
1. `auth_publickey()` callback receives the presented key
2. Delegates to `IdentityProvider::resolve_from_fingerprint()` with the key fingerprint
3. Returns `Accept` (with `Identity` attached) or `Reject`
**Ed25519 + OpenSSH certificate authority** (ADR-012):
1. If no direct key match, validate the presented certificate against trusted cert-authorities
2. Check CA signature, expiry, and principal restrictions
3. Certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
**Token auth** (non-SSH transports, WebTransport):
1. Extract token from URL path or `Authorization` header
2. Delegate to `IdentityProvider::resolve_from_token()`
3. Same verification: same authorized keys set, same `Identity` result (ADR-023)
**No password authentication over SSH channels.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
### Key Material Format
Key inputs (`--key`, `--authorized-keys`, `--cert-authority`) accept either file paths or in-memory data (via library API or NAPI wrapper). The accepted format is **OpenSSH key format** throughout — private keys in OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`), public keys in OpenSSH format (`ssh-ed25519 AAAA... user@host`), and authorized keys files in standard OpenSSH `authorized_keys` format. PEM-encoded keys (PKCS#1, PKCS#8) are not supported.
### TLS Certificate Provisioning
The server supports three TLS certificate modes (ADR-008):
1. **Manual certs** (`--tls-cert` / `--tls-key`): User provides certificate and key files. For users with existing PKI.
2. **Domain-based ACME** (`--acme-domain <domain>`): Auto-provisions certificates from Let's Encrypt using HTTP-01 or TLS-ALPN-01 challenges. Certificate is domain-bound and auto-renews. Requires port 80 or DNS access for challenges.
3. **IP-based ACME**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain name needed, but certificates expire frequently. The ACME client runs continuously.
ACME support is feature-gated behind the `acme` feature flag to keep the base binary lean. Implementation uses `rustls-acme` or a similar pure-Rust ACME client to avoid an external `certbot` dependency.
### Channel Handling
When a client opens a `channel_open_direct_tcpip(host, port, originator_addr, originator_port)`:
**Reserved destination** — If `host` starts with `alknet-` (e.g., `alknet-control`), the server routes the channel internally instead of connecting to a TCP target. The primary reserved destination is `alknet-control:0`, which bridges the channel to the local pubsub event bus (ADR-018).
**Forwarding policy check** — Before the proxy task is spawned for any non-reserved destination, the server evaluates `ForwardingPolicy` against the authenticated `Identity` (ADR-031, [configuration.md](configuration.md)). The policy check uses `Identity.id` and `Identity.scopes` from the identity resolved during auth. If the policy denies the destination, the channel open is rejected — no TCP connection is attempted. The default policy (`ForwardingPolicy::allow_all()`) preserves current behavior.
**Regular destination** — For targets that pass the forwarding policy check:
1. **Outbound connection** — connect to `host:port`, either directly or via the configured outbound proxy
2. **Bidirectional proxy** — `tokio::io::copy_bidirectional` between the SSH channel stream and the outbound TCP stream
3. **Cleanup** — close the channel and TCP stream when either side disconnects
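The decision order described in this section (reserved prefix first, then forwarding policy, then proxying) can be sketched as a pure function. The names `Route` and `route_channel` are hypothetical, and the policy is modelled as a closure rather than the real `ForwardingPolicy` type, whose API is not shown in this document.

```rust
#[derive(Debug, PartialEq)]
pub enum Route {
    // `alknet-*` destinations are handled inside the server.
    Internal,
    // The forwarding policy rejected the destination; no TCP connect happens.
    Denied,
    // The destination may be connected to and proxied.
    Proxy { target: String },
}

// Decide what to do with a `direct-tcpip` channel open request.
// `policy_allows` stands in for evaluating the policy against an identity.
pub fn route_channel(
    host: &str,
    port: u16,
    policy_allows: impl Fn(&str, u16) -> bool,
) -> Route {
    if host.starts_with("alknet-") {
        return Route::Internal;
    }
    if !policy_allows(host, port) {
        return Route::Denied;
    }
    Route::Proxy { target: format!("{}:{}", host, port) }
}
```

Checking the reserved prefix before the policy means `alknet-control:0` never reaches the policy closure in this sketch.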
### Outbound Proxy Modes
| Mode | CLI Flag | Behavior |
|------|----------|----------|
| **Direct** | (default) | `TcpStream::connect(target)` |
| **SOCKS5** | `--proxy socks5://addr:port` | Connect through SOCKS5 proxy |
| **HTTP CONNECT** | `--proxy http://addr:port` | Connect through HTTP CONNECT proxy |
The proxy setting applies globally to all outbound connections from the server.
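Mapping the `--proxy` value onto the three modes in the table could look roughly like the following. `OutboundProxy` and `parse_proxy` are hypothetical names, and rejecting an empty address or an unknown scheme is an assumption about error handling that the spec does not spell out.

```rust
#[derive(Debug, PartialEq)]
pub enum OutboundProxy {
    Direct,
    Socks5(String),
    HttpConnect(String),
}

// Interpret the optional `--proxy` flag value.
pub fn parse_proxy(flag: Option<&str>) -> Result<OutboundProxy, String> {
    let url = match flag {
        None => return Ok(OutboundProxy::Direct), // no flag: connect directly
        Some(u) => u,
    };
    let (addr, is_socks) = if let Some(rest) = url.strip_prefix("socks5://") {
        (rest, true)
    } else if let Some(rest) = url.strip_prefix("http://") {
        (rest, false)
    } else {
        return Err(format!("unsupported proxy URL: {}", url));
    };
    if addr.is_empty() {
        return Err(format!("missing proxy address: {}", url));
    }
    Ok(if is_socks {
        OutboundProxy::Socks5(addr.to_string())
    } else {
        OutboundProxy::HttpConnect(addr.to_string())
    })
}
```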
### Stealth Mode
When `--stealth` is enabled on the server alongside TLS transport:
1. Non-SSH connections (normal web browsers, scanners) receive a fake nginx 404 response
2. The server detects whether the connecting client is speaking SSH or HTTP after the TLS handshake
3. If SSH: proceed with `server::run_stream()`
4. If HTTP: respond with `HTTP/1.1 404 Not Found` + `Server: nginx` headers, then close
This makes the server appear as an ordinary web server to port scanners and DPI systems.
**Stealth mode requires TLS transport (`--transport tls`).** It has no effect with TCP or iroh transports — in those cases, there is no TLS handshake to peek behind, and protocol multiplexing is impossible. The CLI should reject or warn if `--stealth` is used without `--transport tls`.
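One way to tell the two protocols apart after the TLS handshake is to look at the first decrypted bytes: an SSH peer opens with an identification string beginning `SSH-` (RFC 4253). The sketch below assumes that approach; the names `Peeked`, `classify`, and `decoy_response` are hypothetical, and the `Content-Length` and `Connection` headers are additions for a well-formed response, not something the spec lists.

```rust
#[derive(Debug, PartialEq)]
pub enum Peeked {
    Ssh,
    Other,
}

// Classify a connection from the first bytes read after the TLS handshake.
pub fn classify(first_bytes: &[u8]) -> Peeked {
    if first_bytes.starts_with(b"SSH-") {
        Peeked::Ssh
    } else {
        Peeked::Other
    }
}

// Response sent to anything that is not SSH before closing the connection.
pub fn decoy_response() -> &'static str {
    "HTTP/1.1 404 Not Found\r\nServer: nginx\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
}
```

A real implementation would also need a read timeout for peers that send nothing, since the classification depends on the client speaking first.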
### Server Handler Behavior
The server handler implements `russh::server::Handler` with two primary responsibilities:
**Authentication (`auth_publickey`)**:
- Delegate to `IdentityProvider::resolve_from_fingerprint()` with the presented key fingerprint
- If identity resolved, return `Accept` with the `Identity` attached to the session
- If no identity, check certificate authority: validate CA signature, expiry, principals
- Return `Accept` or `Reject`
**Channel handling (`channel_open_direct_tcpip`)**:
- If the destination host starts with `alknet-`, route internally (control channel, ADR-018)
- Otherwise, evaluate `ForwardingPolicy` against the session's `Identity` (ADR-031)
- If denied, reject the channel open
- If allowed, connect to `host:port` (directly or via the configured outbound proxy)
- Spawn a bidirectional proxy task between the SSH channel and the outbound TCP stream
- Return the channel for data flow
### Interface Abstraction
SSH is one interface at Layer 2 in the three-layer model (ADR-026, [interface.md](interface.md)). The current `ServerHandler` will be refactored into `SshInterface` — it manages SSH session concerns (handshake, auth delegation, channel multiplexing). Forwarding policy, operation routing, and call protocol handling are Layer 3 concerns that live outside the interface. This refactoring is the most invasive code change in Phase 1 (integration-plan, Phase 1.8).
### Logging and Rate Limiting
**Logging** (for fail2ban integration on Linux):
- `INFO` level: auth attempts (remote_addr, user, key_fingerprint, accept/reject)
- `INFO` level: connection opened (remote_addr, transport kind)
- `INFO` level: connection closed (remote_addr, duration)
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
This matches our production fail2ban setup which filters on source IP + failure indicators. Example log lines:
```
INFO auth attempt remote_addr=203.0.113.50 user=root key_fingerprint=SHA256:abc... result=reject
INFO connection opened remote_addr=203.0.113.50 transport=tls
```
**Built-in rate limiting** (platform-independent):
| Flag | Default | Purpose |
|------|---------|---------|
| `--max-connections-per-ip` | 0 (unlimited) | Reject new connections from IPs with N active connections |
| `--max-auth-attempts` | 10 | Disconnect after N failed auth attempts per connection |
These provide abuse protection on platforms without fail2ban (macOS, Windows, BSD) and complement fail2ban on Linux.
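The per-IP limit can be sketched as a counter map, with `0` meaning unlimited as in the table. `ConnectionLimiter` and its methods are hypothetical names; in a server the struct would sit behind shared state (the Constraints section mentions `Arc` for this), which is omitted here.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

// Hypothetical tracker for `--max-connections-per-ip`.
pub struct ConnectionLimiter {
    max_per_ip: usize, // 0 disables the limit
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    // Returns true and counts the connection if the IP is under its limit.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    // Call when a connection closes; forgets IPs with no active connections.
    pub fn release(&mut self, ip: IpAddr) {
        let remaining = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        if remaining == 0 {
            self.active.remove(&ip);
        }
    }
}
```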
### CLI Interface
Configuration sources (in priority order): CLI flags, environment variables, optional `--config` TOML file (ADR-030). The TOML config file is a convenience input for reproducible deployments; it does not replace `ServeOptions` (ADR-011).
Multi-transport listeners use `[[listeners]]` in the TOML config (ADR-030):
```toml
[[listeners]]
transport = "tls"
listen = "0.0.0.0:443"
[listeners.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"
[[listeners]]
transport = "iroh"
```
Currently, the server binds to a single transport at a time. Multi-transport via `[[listeners]]` is coming per ADR-030.
```bash
# Basic server (SSH on port 22)
alknet serve --key ~/.ssh/ssh_host_ed25519_key
# With TLS (manual certs)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--transport tls \
--tls-cert /etc/ssl/cert.pem \
--tls-key /etc/ssl/key.pem
# With TLS (auto ACME, domain-based)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--transport tls \
--acme-domain example.com
# With TLS + stealth (fake nginx 404 to scanners)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--transport tls \
--acme-domain example.com \
--stealth
# With iroh transport (no public IP needed)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--transport iroh
# With outbound proxy
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--proxy socks5://127.0.0.1:9050
# With certificate authority authentication
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--cert-authority /etc/alknet/ca.pub
# With rate limiting
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
--max-connections-per-ip 5 \
--max-auth-attempts 3
# All options
alknet serve \
--key <path-or-buffer> \ # SSH host key (required)
--authorized-keys <path> \ # Authorized keys file
--cert-authority <path> \ # CA public key for cert-auth
--transport tcp|tls|iroh \ # Transport mode
--listen <addr:port> \ # Listen address for TCP/TLS (default: 0.0.0.0:22)
--tls-cert <path> \ # TLS certificate (manual)
--tls-key <path> \ # TLS private key (manual)
--acme-domain <domain> \ # ACME auto-cert domain
--stealth \ # Serve fake nginx 404 to non-SSH connections
--proxy <url> \ # Outbound proxy URL (socks5:// or http://)
--iroh-relay <url> \ # iroh relay server URL (default: n0 relay)
--max-connections-per-ip <n> \ # Max concurrent connections per IP (default: unlimited)
--max-auth-attempts <n> # Max auth failures before disconnect (default: 10)
```
### iroh Server Mode
When running with `--transport iroh`, the server:
1. Creates an iroh endpoint with ALPN value `b"alknet-ssh"`
2. Prints its endpoint ID (base58-encoded Ed25519 public key) — this is what clients use as the `--peer` value
3. Accepts incoming connections on the endpoint
4. For each connection, accepts a bidirectional stream and passes it to `server::run_stream()`
No listening port is needed. The server connects outbound to the iroh relay (default: n0, override with `--iroh-relay`) and awaits connections from clients who know its endpoint ID (base58-encoded, printed on startup).
## Constraints
- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
- Destination strings beginning with `alknet-` are reserved for internal use (ADR-018). The server must not attempt TCP connections to `alknet-*` destinations — these are intercepted for control channel routing.
- One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
- The server currently binds to a single transport at a time. Multi-transport via `[[listeners]]` is coming per ADR-030.
- Forwarding policy is evaluated before every channel proxy spawn. Denied channels are rejected immediately (ADR-031).
- Auth resolves through `IdentityProvider` (ADR-029). Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>` (ADR-030). `StorageIdentityProvider` (Phase 2+) replaces it for production deployments with SQLite.
- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).
- Stealth mode (`--stealth`) requires TLS transport. It has no effect on TCP or iroh transports (ADR-017).
## Graceful Shutdown
On SIGTERM or SIGINT:
1. Stop accepting new connections on the transport listener
2. Send SSH disconnect messages to all active sessions
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds per session)
4. Close all transport listeners
5. Exit
The server does not wait indefinitely for idle connections to close. After the drain timeout, remaining connections are forcibly terminated, so a slow or stuck client cannot block shutdown.
## Error Handling
Error handling follows the project's layered pattern (see overview.md):
- **Transport errors**: Cause connection rejection. The listener remains active — a failed TLS handshake or iroh connection attempt does not affect other incoming connections.
- **Auth errors**: Result in connection rejection with a logged auth failure event (for fail2ban, ADR-013). Repeated failures from one connection trigger disconnect after `--max-auth-attempts`.
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels. The client receives a channel open failure message.
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found, bind failure) exit immediately.
## Open Questions
None — all resolved.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait, SSH consumes stream |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Server logs auth and connections, not destinations |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority auth | No password auth; support OpenSSH cert-authority |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2 interface, ServerHandler → SshInterface |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract; irpc service is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | IdentityProvider trait in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | ArcSwap for dynamic config, ConfigReloadHandle |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Evaluated before channel proxy spawn |
## References
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, ConfigReloadHandle
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
- [auth.md](auth.md) — Unified auth, AuthPolicy, token auth
- [interface.md](interface.md) — Interface trait, SshInterface, three-layer model


@@ -1,233 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---
# Services
> **Phase note**: This spec defines the contracts for the service layer — the
> protocol enums, OperationEnv, and deployment topologies. Phase 1 ships
> `ConfigIdentityProvider` (ArcSwap-based) and `ConfigServiceImpl` (ArcSwap-based)
> as the only auth and config implementations. The irpc service protocols
> (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment
> topology (multi-node with `StorageIdentityProvider`) are contracted here but
> will be implemented in Phase 2+. Application services (DockerService,
> NodeService, agent services) are downstream concerns that build on top of
> the call protocol and OperationEnv — they are not core requirements.
## What
The irpc service layer decomposes alknet's core responsibilities into
independently testable, deployable, and replaceable components. Auth, Secret,
Config, and Storage are irpc protocol enums that work both as in-process async
boundaries (tokio channels) and cross-process/cross-network (irpc over iroh
QUIC streams). OperationEnv is the universal composition mechanism that unifies local
dispatch, irpc service dispatch, and remote call protocol dispatch.
## Why
Without the service layer, auth verification, key derivation, and config reload
are scattered across the codebase with no async boundary. For head nodes serving
many users, in-memory key lookup doesn't scale — auth needs to query a database
on demand. For secret management, the seed must be isolated in its own process
boundary.
Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away: `context.env.invoke("secrets", "derive", input)` works
regardless of dispatch path.
## Architecture
### Service Definition Pattern
Services are defined as irpc protocol enums:
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    // ...
}
```
The `#[rpc_requests]` macro generates two versions:
- **Serializable** (`Request`): for remote communication (postcard encoding)
- **With channels** (`RequestWithChannels`): for local communication (tokio channels)
Both use the same `Client<S>` type. The local/remote distinction is transparent
at the call site.
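To make the "with channels" form concrete, here is a hand-written analogue using only `std::sync::mpsc`. It is not what the irpc macro generates; `AuthMessage`, `spawn_auth_service`, and `verify_pubkey` are assumptions chosen to show the shape: each request carries its own reply channel, and the caller sees a plain function call.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq, Eq)]
pub enum AuthResult {
    Accepted,
    Rejected,
}

/// Local form of a request: the payload plus a channel for the reply.
pub enum AuthMessage {
    VerifyPubkey {
        fingerprint: String,
        reply: mpsc::Sender<AuthResult>,
    },
}

/// Spawns a toy auth service that accepts exactly one known fingerprint.
pub fn spawn_auth_service(known: String) -> mpsc::Sender<AuthMessage> {
    let (tx, rx) = mpsc::channel::<AuthMessage>();
    thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for msg in rx {
            match msg {
                AuthMessage::VerifyPubkey { fingerprint, reply } => {
                    let result = if fingerprint == known {
                        AuthResult::Accepted
                    } else {
                        AuthResult::Rejected
                    };
                    // The caller may have gone away; ignore send errors.
                    let _ = reply.send(result);
                }
            }
        }
    });
    tx
}

/// Call-site helper: looks like a function call, runs across a channel.
pub fn verify_pubkey(
    service: &mpsc::Sender<AuthMessage>,
    fingerprint: &str,
) -> Option<AuthResult> {
    let (reply, response) = mpsc::channel();
    service
        .send(AuthMessage::VerifyPubkey {
            fingerprint: fingerprint.to_string(),
            reply,
        })
        .ok()?;
    response.recv().ok()
}
```

A remote variant would serialize the payload and send it over a stream instead of a channel, while `verify_pubkey` keeps the same signature.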
### Core Services
| Service | Protocol | Purpose | Always Local? |
|---------|----------|---------|---------------|
| **Auth** | `AuthProtocol` | Verify identities, check credentials | Can be remote |
| **Secret** | `SecretProtocol` | Derive keys, encrypt/decrypt | Local or remote |
| **Config** | `ConfigProtocol` | Dynamic config reload | Local |
| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations | Local or remote |
### OperationContext
Every handler receives an `OperationContext`:
```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool, // set by buildEnv(), not by callers
}
```
- **`identity`**: The authenticated identity making the call. Populated by
`IdentityProvider` from the interface layer.
- **`env`**: The operation environment — namespaced access to other operations.
- **`trusted`**: When a handler calls another operation through `env`, the
nested call is `trusted` (skips ACL checks).
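A reduced sketch of how a nested call's context might be derived from its parent. `Ctx` is a trimmed-down, hypothetical stand-in for `OperationContext` (no `metadata` or `env`), and propagating the caller's identity to the nested call is an assumption, not something the spec states.

```rust
/// Placeholder for the real Identity type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub id: String,
}

/// Trimmed-down stand-in for OperationContext.
#[derive(Debug, Clone)]
pub struct Ctx {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub trusted: bool,
}

impl Ctx {
    /// Context for a call arriving from an interface: never trusted.
    pub fn root(request_id: &str, identity: Option<Identity>) -> Self {
        Ctx {
            request_id: request_id.to_string(),
            parent_request_id: None,
            identity,
            trusted: false,
        }
    }

    /// Context for a call made by a handler through `env`.
    /// Only this constructor sets `trusted`; callers cannot pass it in.
    pub fn child(&self, request_id: &str) -> Self {
        Ctx {
            request_id: request_id.to_string(),
            parent_request_id: Some(self.request_id.clone()),
            identity: self.identity.clone(), // assumption: identity propagates
            trusted: true,
        }
    }
}
```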
### OperationEnv — Universal Composition Mechanism
OperationEnv provides namespace + operation name → invoke with input, return
output. The handler doesn't know or care whether the dispatch is local, irpc,
or remote.
Three dispatch paths:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`.
Service assembly determines which path each operation uses:
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)
    .local("config", config_registry)
    .service("secrets", secret_irpc_client)
    .remote("worker-1", call_protocol_conn);
```
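The routing idea behind the builder above can be sketched with a namespace table. This is a toy model: `Env`, `DispatchPath`, and the string-in, string-out handler signature are assumptions for illustration, standing in for the real registries, irpc clients, and `ResponseEnvelope`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchPath {
    Local,
    Service,
    Remote,
}

/// Handler takes (operation name, input) and returns an output or an error.
type Handler = Box<dyn Fn(&str, &str) -> Result<String, String>>;

/// Toy OperationEnv: one dispatch path and handler per namespace.
pub struct Env {
    routes: HashMap<String, (DispatchPath, Handler)>,
}

impl Env {
    pub fn new() -> Self {
        Env { routes: HashMap::new() }
    }

    fn route<F>(mut self, ns: &str, path: DispatchPath, handler: F) -> Self
    where
        F: Fn(&str, &str) -> Result<String, String> + 'static,
    {
        let boxed: Handler = Box::new(handler);
        self.routes.insert(ns.to_string(), (path, boxed));
        self
    }

    pub fn local<F>(self, ns: &str, handler: F) -> Self
    where
        F: Fn(&str, &str) -> Result<String, String> + 'static,
    {
        self.route(ns, DispatchPath::Local, handler)
    }

    pub fn service<F>(self, ns: &str, handler: F) -> Self
    where
        F: Fn(&str, &str) -> Result<String, String> + 'static,
    {
        self.route(ns, DispatchPath::Service, handler)
    }

    pub fn path_of(&self, ns: &str) -> Option<DispatchPath> {
        self.routes.get(ns).map(|(path, _)| *path)
    }

    /// The caller names a namespace and operation; the path stays hidden.
    pub fn invoke(&self, ns: &str, op: &str, input: &str) -> Result<String, String> {
        match self.routes.get(ns) {
            Some((_, handler)) => handler(op, input),
            None => Err(format!("unknown namespace: {}", ns)),
        }
    }
}
```

The call site, `env.invoke("secrets", "derive", input)`, is identical whichever path the namespace was registered under.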
### Service vs Call Protocol vs External Service
These are different concepts that compose through OperationEnv:
- **irpc service**: In-cluster, Rust-to-Rust, type-safe, postcard serialization.
Dispatched by enum variant. Example: `AuthProtocol::VerifyPubkey`.
- **Call protocol operation**: Cross-node, cross-language, path-based, JSON
`EventEnvelope`. Dispatched by namespace + name. Example:
`/head/auth/verify`.
- **External service**: Any endpoint reachable via the call protocol.
Example: a vast.ai instance, an HTTP API, another head node.
An irpc service can back a call protocol operation. The OperationEnv routes to
the appropriate dispatch path:
```
Call Protocol (Layer 3, external, JSON)
└── irpc Service (Layer 3, internal, postcard)
└── Honker Streams (Domain events, within service boundary)
```
### Adapters
HTTP, MCP, DNS, and WebSocket adapters all resolve through OperationEnv:
- HTTP: `POST /v1/{namespace}/{op}` → `context.env.invoke(namespace, op, input)`
- MCP: `tools/call` with tool name → `context.env.invoke(namespace, op, input)`
- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env.invoke(namespace, op, input)`
- Call protocol: `call.requested` with `operationId` → `context.env.invoke(namespace, op, input)`
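Two of these mappings are simple enough to sketch as parsers that yield a `(namespace, op)` pair. The function names are hypothetical, and the strictness rules (no extra path segments, single-label namespace) are assumptions rather than specified behavior.

```rust
/// Parses `/v1/{namespace}/{op}` into (namespace, op).
pub fn from_http_path(path: &str) -> Option<(String, String)> {
    let rest = path.strip_prefix("/v1/")?;
    let (ns, op) = rest.split_once('/')?;
    if ns.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((ns.to_string(), op.to_string()))
}

/// Parses `{op}.{namespace}.alk.dev` (optional trailing dot) into (namespace, op).
pub fn from_dns_name(name: &str) -> Option<(String, String)> {
    let name = name.strip_suffix('.').unwrap_or(name);
    let rest = name.strip_suffix(".alk.dev")?;
    let (op, ns) = rest.split_once('.')?;
    if op.is_empty() || ns.is_empty() || ns.contains('.') {
        return None;
    }
    Some((ns.to_string(), op.to_string()))
}
```

Both adapters converge on the same pair, which is then handed to `context.env.invoke`.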
### Deployment Topologies
**Current (Phase 1, single node, CLI)**: This is what exists and ships today.
Auth uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>`. Config
uses `ConfigServiceImpl` backed by `ArcSwap<DynamicConfig>`. There is no
database dependency.
```
┌──────────────────────────────────────────────┐
│ Single Process │
│ ConfigIdentityProvider (ArcSwap) │
│ ConfigServiceImpl (ArcSwap) │
│ alknet-core Server │
└──────────────────────────────────────────────┘
```
The irpc service layer (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`,
`StorageProtocol`) and the application services (DockerService, NodeService,
WalletService, agent services) are downstream concerns that will be built in
later phases. The architecture defines the contracts (`IdentityProvider` trait,
`OperationEnv`, service protocol enums) so that implementations can plug in
without modifying core, but the implementations don't exist yet.
**Future (multi-node, production)**: Auth and secrets on dedicated nodes;
workers access them remotely via irpc over QUIC. StorageIdentityProvider
backed by SQLite replaces ConfigIdentityProvider for auth.
```
Auth Node (SQLite) Secret Node (seed in RAM)
↑ ↑
│ QUIC (irpc) │ QUIC (irpc)
│ │
Head Node (Config, Storage, alknet-core Server)
│ SSH / iroh / TLS
Worker Node (alknet-core Client)
```
This topology requires alknet-storage, alknet-secret, and the irpc service
layer to be built — they are Phase 2+ concerns.
## Constraints
- Services are **internal** — they run within a node or cluster.
- The call protocol is **external** — it's how nodes talk to each other.
- Per ADR-032, domain events (Honker streams) stay within the owning service.
irpc calls are synchronous request-response within a node. Call protocol
`EventEnvelope` is the integration boundary between nodes.
- OperationEnv is a hard constraint: the handler-facing API must match the
behavioral contract from `@alkdev/operations`. Namespace + operation name →
invoke with input, return output.
- irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling
don't need the service layer overhead.
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
per tenant)? See [open-questions.md](open-questions.md).
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
remote calls? See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Service crates are independent of core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition mechanism with three dispatch paths |
## References
- [research/services.md](../research/services.md) — Service protocol definitions, OperationContext, deployment topologies
- [research/integration-plan.md](../research/integration-plan.md) — OperationEnv, three dispatch paths, adapter patterns
- [secret-service.md](secret-service.md) — SecretProtocol definition
- [identity.md](identity.md) — IdentityProvider, AuthProtocol
- [configuration.md](configuration.md) — ConfigProtocol, DynamicConfig reload
- [interface.md](interface.md) — Interface layer, auth across interfaces


@@ -1,221 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---
# Storage
> **Phase note**: `alknet-storage` is a future crate (Phase 2+). This spec
> defines its contract — the data model, the `IdentityProvider` impl, the
> irpc service protocol — so that alknet-core can define the traits
> (`IdentityProvider`) that storage will later implement. The crate itself
> hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` backed by
> `ArcSwap<DynamicConfig>`.
## What
The `alknet-storage` crate will provide SQLite-backed graph storage, identity
management, access control, and reactivity via honker. It mirrors the
TypeScript `@alkdev/storage` package's design while leveraging Rust's type
system and honker's built-in pub/sub.
## Why
alknet-core needs persistent identity data (authorized keys, accounts, ACLs)
and a way to store and query graph-structured data (call graphs, operation
graphs, metagraph). But alknet-core cannot take a database dependency. The
solution: alknet-storage implements alknet-core's `IdentityProvider` trait,
providing SQLite-backed identity resolution without core knowing about SQLite.
The metagraph (three-level type system: GraphType → NodeType → EdgeType → Graph
→ Node → Edge) is the foundation for ACL, flowgraph persistence, and any
future graph-structured data.
## Architecture
### Crate Structure
```
alknet-storage/
├── metagraph/ — GraphType, NodeType, EdgeType persistence
├── identity/ — accounts, organizations, peer_credentials, api_keys, audit_logs
├── acl/ — PrincipalNode, DelegatesEdge, access control graph
├── secrets/ — Encrypted node type, encrypt/decrypt bridge
├── honker/ — honker integration: notify, stream, queue
├── graph/ — GraphInstance, Node, Edge CRUD with schema validation
└── schema/ — JSON Schema definitions (serde + jsonschema)
```
### Metagraph Data Model
Three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints.
2. **NodeType** — A category of node within a graph type. Each has a JSON Schema
for attribute validation.
3. **EdgeType** — A category of edge within a graph type. Each has a JSON Schema
and optional source/target constraints.
Graph instances belong to a graph type and contain nodes and edges conforming
to those type definitions.
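As a rough illustration of the type level constraining the instance level, the sketch below checks an edge's source and target node types against its edge type. The struct layout is an assumption (it is not the planned crate API), and JSON Schema validation of attributes is left out.

```rust
use std::collections::HashMap;

/// Edge category with optional source/target constraints (None = any node type).
pub struct EdgeType {
    pub allowed_source: Option<Vec<String>>,
    pub allowed_target: Option<Vec<String>>,
}

/// A class of graphs: the node and edge types instances may use.
pub struct GraphType {
    pub name: String,
    pub node_types: Vec<String>,
    pub edge_types: HashMap<String, EdgeType>,
}

fn permits(allowed: &Option<Vec<String>>, node_type: &str) -> bool {
    match allowed {
        None => true,
        Some(list) => list.iter().any(|t| t == node_type),
    }
}

impl GraphType {
    /// Checks that an edge of `edge_type` may connect the given node types.
    pub fn check_edge(&self, edge_type: &str, source: &str, target: &str) -> Result<(), String> {
        let et = self
            .edge_types
            .get(edge_type)
            .ok_or_else(|| format!("unknown edge type: {}", edge_type))?;
        for nt in [source, target] {
            if !self.node_types.iter().any(|t| t == nt) {
                return Err(format!("unknown node type: {}", nt));
            }
        }
        if !permits(&et.allowed_source, source) {
            return Err(format!("{} may not start at {}", edge_type, source));
        }
        if !permits(&et.allowed_target, target) {
            return Err(format!("{} may not end at {}", edge_type, target));
        }
        Ok(())
    }
}
```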
### SQLite Table Schema
Common columns: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`,
`created_at INTEGER TIMESTAMP`, `updated_at INTEGER TIMESTAMP`.
| Table | Key columns |
|-------|------------|
| `graph_types` | id, name (UNIQUE), config JSON, version, scope |
| `node_types` | id, graph_type_id FK, name, schema JSON |
| `edge_types` | id, graph_type_id FK, name, schema JSON, allowed_source/target types |
| `graphs` | id, graph_type_id FK, name, description, status, owner_id, project_id |
| `nodes` | id, graph_id FK, key (UNIQUE per graph), attributes JSON |
| `edges` | id, graph_id FK, key, source_node_key, target_node_key, attributes JSON, undirected |
No FK constraints across database files. Referential integrity is enforced at
the application layer.
### System DB vs Tenant DB
- **System DB** (`system.db`): Identity tables (accounts, organizations,
peer_credentials, api_keys, audit_logs) + system-scoped graph types.
- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables + tenant-scoped graph
types.
### Identity Tables
| Table | Key columns |
|-------|------------|
| `accounts` | email (UNIQUE), display_name, access_level (admin/user/service), status |
| `organizations` | name (UNIQUE), slug (UNIQUE), owner_id FK → accounts |
| `organization_members` | org_id FK, account_id FK, membership_level (owner/admin/member) |
| `api_keys` | owner_id FK, key_hash (UNIQUE), name, enabled, expires_at, revoked_at |
| `peer_credentials` | owner_id FK, credential_type (ssh_key/cert_authority), fingerprint (UNIQUE), public_key_data |
| `audit_logs` | action, owner_id FK, credential_id, org_id FK, details JSON |
### ACL as Metagraph
The ACL graph is a directed, non-multi metagraph:
- **PrincipalNode**: IdentityType (Account, Org, Service, Role) + identity_id + scopes + resources
- **ResourceNode**: The thing being accessed
- **Edges**: can_read, can_write, can_execute, belongs_to, delegates
Delegation edges carry `narrowed_scopes` — the delegate can only exercise scopes
that are a subset of the delegator's.
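The subset rule for `narrowed_scopes` can be expressed as a set difference. A minimal sketch, assuming scopes are plain strings; `narrow_scopes` and `scopes` are hypothetical helper names.

```rust
use std::collections::BTreeSet;

/// Builds a scope set from string literals.
pub fn scopes(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Accepts a delegation only if every requested scope is held by the delegator.
/// On failure, returns the scopes that exceed the delegator's own.
pub fn narrow_scopes(
    delegator: &BTreeSet<String>,
    requested: &BTreeSet<String>,
) -> Result<BTreeSet<String>, Vec<String>> {
    let excess: Vec<String> = requested.difference(delegator).cloned().collect();
    if excess.is_empty() {
        Ok(requested.clone())
    } else {
        Err(excess)
    }
}
```

Because each hop can only narrow, a chain of delegations never ends up with more authority than its origin.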
### StorageIdentityProvider (Future — Phase 2+)
Implements alknet-core's `IdentityProvider` trait (ADR-029). This is defined
here as a contract. When alknet-storage is built, it will provide this
implementation. Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap`.
```rust
impl IdentityProvider for StorageIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // 1. Find peer_credentials row by fingerprint
        // 2. Resolve to account → organization membership → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // 1. Verify Ed25519 signature against api_keys or peer_credentials
        // 2. Resolve to account → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
    }
}
```
### StorageProtocol irpc Service
```rust
#[rpc_requests(message = StorageMessage)]
enum StorageProtocol {
    #[rpc(tx=oneshot::Sender<Graph>)]
    #[wrap(CreateGraph)]
    CreateGraph { graph_type_id: String, name: String },

    #[rpc(tx=oneshot::Sender<Node>)]
    #[wrap(AddNode)]
    AddNode { graph_id: String, key: String, attributes: Value },
    // ... (full protocol in research/services.md)
}
```
### Honker Integration
| Feature | Use case |
|---------|----------|
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
| `queue` / `claim` / `ack` | Task queue for async operations |
Per ADR-032, honker streams are domain events internal to the storage service.
They are projected to call protocol `EventEnvelope` events when crossing service
boundaries.
### Encrypted Data
alknet-storage references alknet-secret's `EncryptedData` wire format for
storing encrypted nodes (API keys, OAuth tokens). The format (key_version,
salt, iv, ciphertext) is shared by type-level compatibility, not a crate
dependency. alknet-secret encrypts; alknet-storage stores the blob.
### Crate Dependencies
```toml
[dependencies]
honker = "0.x"
rusqlite = { version = "0.x", features = ["bundled"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
petgraph = "0.x"
irpc = "0.x"
```
alknet-storage does NOT depend on alknet-core or alknet-secret. It implements
alknet-core's `IdentityProvider` trait by conforming to its signature, not
through a direct crate dependency.
## Constraints
- alknet-storage does NOT depend on alknet-core as a crate. It implements the
`IdentityProvider` trait by conforming to the signature. The CLI binary
wires them together.
- alknet-storage does NOT depend on alknet-secret. They share the `EncryptedData`
wire format by type-level compatibility, not a crate dependency.
- WAL mode for concurrent reads during writes. Single writer per `.db` file.
- JSON Schema validation uses the `jsonschema` crate at runtime (replaces
TypeBox from TypeScript).
- Per ADR-032, honker stream events never cross service boundaries without
projection to `EventEnvelope`.
## Open Questions
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? See [open-questions.md](open-questions.md).
- **OQ-SVC-04**: Should workers cache derived keys locally? See [open-questions.md](open-questions.md).
- **OQ-SVC-05**: How does the NFT-based ACL smart contract interact with the
secret service? See [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-storage is independent of core and secret |
| [029](decisions/029-identity-core-type.md) | Identity as core type | alknet-storage implements IdentityProvider trait |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Honker streams stay internal; projection to EventEnvelope at boundaries |
## References
- [research/storage.md](../research/storage.md) — Full metagraph, identity, ACL, honker definitions
- [research/services.md](../research/services.md) — StorageProtocol, StorageIdentityProvider
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.2
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
- [secret-service.md](secret-service.md) — EncryptedData format, derivation paths


@@ -1,152 +0,0 @@
---
status: reviewed
last_updated: 2026-06-02
---
# Transport Layer
## What
The transport layer produces a duplex byte stream (`AsyncRead + AsyncWrite + Unpin + Send`) that the SSH layer consumes via `russh::client::connect_stream()` or `russh::server::run_stream()`. The SSH layer is completely unaware of what transport it runs over.
## Why
Pluggable transports are the core architectural insight. They enable:
- **Simple deployment**: TCP on port 22 for basic use
- **Censorship resistance**: TLS on port 443 looks like HTTPS
- **NAT traversal**: iroh QUIC allows connections without public IPs
- **Composability**: transports can be layered (iroh through SOCKS5 through SSH through TLS)
Without this abstraction, each transport mode would need its own SSH connection logic. With it, there's one SSH implementation and N transport implementations.
## Architecture
### Transport Trait
```rust
// The core abstraction. Each transport produces ONE duplex stream.
// The SSH session runs over this stream for its entire lifetime.
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Connect to the remote endpoint and return a duplex stream.
    /// For client-side transports.
    async fn connect(&self) -> Result<Self::Stream>;

    /// Return a human-readable description of this transport for logging.
    fn describe(&self) -> String;
}
```
### Server-Side Transport Acceptor
```rust
#[async_trait]
pub trait TransportAcceptor: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Accept an incoming connection and return a duplex stream.
    async fn accept(&self) -> Result<(Self::Stream, TransportInfo)>;
}

/// Metadata about the incoming connection.
pub struct TransportInfo {
    pub remote_addr: Option<SocketAddr>,
    pub transport_kind: TransportKind,
}

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
}
```
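One plausible use of `TransportInfo` is a compact connection description for logs. The sketch below redeclares the two types so it stands alone; the output format is an assumption for illustration and is not taken from the alknet source.

```rust
use std::fmt;
use std::net::SocketAddr;

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
}

pub struct TransportInfo {
    pub remote_addr: Option<SocketAddr>,
    pub transport_kind: TransportKind,
}

impl fmt::Display for TransportKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransportKind::Tcp => write!(f, "tcp"),
            TransportKind::Tls { server_name: Some(name) } => write!(f, "tls(sni={})", name),
            TransportKind::Tls { server_name: None } => write!(f, "tls"),
            TransportKind::Iroh { endpoint_id } => write!(f, "iroh({})", endpoint_id),
        }
    }
}

impl fmt::Display for TransportInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Some transports have no socket address to report, hence the Option.
        match self.remote_addr {
            Some(addr) => write!(f, "{} from {}", self.transport_kind, addr),
            None => write!(f, "{} from <unknown>", self.transport_kind),
        }
    }
}
```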
### Transport Implementations
| Transport | Client | Server | Stream Type |
|-----------|--------|--------|-------------|
| **TcpTransport** | `TcpStream::connect(addr)` | `TcpListener::accept()` | `TcpStream` |
| **TlsTransport** | `TlsStream<TcpStream>` (client TLS) | `TlsStream<TcpStream>` (server TLS) | `tokio_rustls::client::TlsStream<TcpStream>` (client), `tokio_rustls::server::TlsStream<TcpStream>` (server) |
| **IrohTransport** | `endpoint.connect(peer, alpn)` then `conn.open_bi()` then `join(recv, send)` | `endpoint.accept()` then `conn.accept_bi()` then `join(recv, send)` | `tokio::io::Join<RecvStream, SendStream>` |
### Iroh Stream Join
QUIC exposes each bidirectional stream as two halves, a `RecvStream` (implements `AsyncRead`) and a `SendStream` (implements `AsyncWrite`), but russh expects a single duplex stream. `tokio::io::join(recv_stream, send_stream)` combines the halves into a `Join<RecvStream, SendStream>` that implements both traits.
See ADR-003 for the decision to use `tokio::io::join` over a custom wrapper.
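The idea behind the join is small enough to show with a synchronous analogue built on `std::io`. This is not `tokio::io::join` itself, only a sketch of the same shape: reads delegate to one half, writes to the other.

```rust
use std::io::{self, Read, Write};

/// Synchronous analogue of a joined stream: one read half, one write half.
pub struct Join<R, W> {
    reader: R,
    writer: W,
}

/// Combines a reader and a writer into a single duplex value.
pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Join<R, W> {
    Join { reader, writer }
}

impl<R, W> Join<R, W> {
    /// Splits the duplex value back into its halves.
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

impl<R: Read, W> Read for Join<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

impl<R, W: Write> Write for Join<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

The wrapper adds no buffering or state of its own, so the two halves keep behaving exactly as they did before being joined.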
### iroh Relay Configuration
By default, iroh transport uses n0's free relay servers (`https://relay.iroh.network/`). This provides zero-config NAT traversal for testing and development. For production deployments, users override with `--iroh-relay <url>` to point to a self-hosted relay.
The relay URL is passed to iroh's `Endpoint::builder()` configuration. Self-hosted relay setup is documented in the project wiki.
See ADR-009 for the decision to default to n0's relay with override.
### Transport Chaining
Transports can be nested. The CLI supports `--transport iroh --proxy socks5://...` natively (ADR-010):
```bash
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
```
This routes iroh's outbound TCP connections through the specified SOCKS5 proxy. The iroh transport supports SOCKS5 and HTTP proxy configuration for its outbound connections — the proxy URL is applied during transport initialization.
For other combinations:
- TCP + TLS is already implicit (TLS wraps TCP in `TlsTransport`)
- TLS + SOCKS5 proxy is also supported via `--proxy` with `--transport tls`
**Note**: `--proxy` has different semantics on the client vs the server (ADR-019):
- **Client**: `--proxy` routes the *transport connection* through the proxy (e.g., iroh endpoint → SOCKS5 → iroh relay)
- **Server**: `--proxy` routes *outbound target connections* through the proxy (e.g., SSH channel request → SOCKS5 → target host)
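The dual semantics can be written down as a small lookup next to a parser for the proxy URL. Everything here is a hypothetical sketch: the names are illustrative, and the parser handles only the `scheme://host:port` form with the two schemes mentioned above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProxyScheme {
    Socks5,
    Http,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ProxyUrl {
    pub scheme: ProxyScheme,
    pub host: String,
    pub port: u16,
}

/// Parses `socks5://host:port` or `http://host:port`.
pub fn parse_proxy(url: &str) -> Option<ProxyUrl> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = match scheme {
        "socks5" => ProxyScheme::Socks5,
        "http" => ProxyScheme::Http,
        _ => return None,
    };
    let (host, port) = rest.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some(ProxyUrl {
        scheme,
        host: host.to_string(),
        port: port.parse().ok()?,
    })
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Client,
    Server,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProxiedTraffic {
    /// Client side: the transport connection itself goes through the proxy.
    TransportConnection,
    /// Server side: outbound connections to tunnel targets go through the proxy.
    OutboundTargets,
}

/// What `--proxy` applies to, depending on which side is running.
pub fn proxied_traffic(role: Role) -> ProxiedTraffic {
    match role {
        Role::Client => ProxiedTraffic::TransportConnection,
        Role::Server => ProxiedTraffic::OutboundTargets,
    }
}
```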
### Connection Lifecycle
```
Client Server
│ │
│ transport.connect() │ transport_acceptor.accept()
│ ─────────────────────────────────────────────▶│
│ (duplex byte stream established) │
│ │
│ russh::client::connect_stream(config, │ russh::server::run_stream(config,
│ stream, handler) │ stream, handler)
│ │
│ ═══════ SSH session over stream ═════════════ │
│ ═════════════════════════════════════════════ │
│ │
│ channel_open_direct_tcpip(host, port, ...) │
│ ─────────────────────────────────────────────▶│
│ │
│ ┌─────── TCP proxy ──────────────────┐ │
│ │ SSH channel ←→ TcpStream::connect │ │
│ └────────────────────────────────────┘ │
```
## Constraints
- SSH sees only the stream. It never opens its own TCP connections. (ADR-004)
- Each transport produces exactly one stream per SSH session. Multiple sessions need multiple `connect()` calls.
- The iroh transport reuses a single `Endpoint` across multiple sessions (one QUIC connection per peer, multiple `open_bi()` streams). The endpoint is created once and shared.
- TLS transport requires certificate configuration on the server side. The client can accept any certificate (self-signed) or verify against a CA. Server-side ACME is supported (ADR-008).
## Open Questions
None — all resolved.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces stream, SSH consumes it |
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join` combines QUIC halves |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches TCP/iroh/TLS directly |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |


@@ -1,28 +0,0 @@
---
status: deprecated
last_updated: 2026-06-01
---
# TUN Shim (Deprecated)
> **Note**: TUN functionality has been deferred from the alknet project. For VPN-like "route all traffic" behavior, use `tun2proxy` alongside alknet's SOCKS5 proxy. See ADR-014 for the rationale.
## What Changed
The `alknet-tun` separate process and all TUN-related code is out of scope. The recommended approach for VPN-like behavior is:
```bash
# Terminal 1: alknet SOCKS5 proxy (no root required)
alknet connect --server example.com --identity ~/.ssh/id_ed25519
# Terminal 2: tun2proxy routes all traffic through alknet's SOCKS5
sudo tun2proxy --proxy socks5://127.0.0.1:1080
```
This keeps the core alknet binary free of TUN complexity and leverages an existing, well-tested tool for TUN-to-SOCKS5 bridging.
## References
- [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md) — decision to defer TUN
- [ADR-005](decisions/005-socks5-before-tun.md) — SOCKS5 is still the primary interface
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — recommended external tool for TUN support