greenfield: clean slate for ALPN-as-service pivot
Delete old source crates (alknet-core, alknet, alknet-napi), old architecture docs (ADRs, specs, open questions), old research docs (phase2, event-sourcing, feasibility, etc.), old tasks, and obsolete reference material (gitserver/MPL, honker, nats, rustfs, polyglot, keystone, distributed-identity).

Keep: alknet-secret (standalone, compiles), pivot docs, iroh and ssh references, rudolfs reference (MIT/Apache, fork candidate), ops docs, sdd_process.md, and licenses.

Previous implementation preserved at /workspace/@alkdev/alknet-main/ for reference during porting.

Workspace compiles: cargo check + 14 tests pass for alknet-secret.
@@ -1,122 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Alknet Architecture

## Current State

Architecture spec sync in progress. Phase 0 foundation complete (ADRs 001–037).
Phase 1 core modifications partially implemented (interface trait, config split,
identity provider, forwarding policy). Phase 2 core bridge research complete;
spec documents updated to reflect StreamInterface/MessageInterface split,
CredentialProvider as core type, and API keys in DynamicConfig.

Remaining open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
OQ-20 (worker registration), OQ-CP-01 (per-identity credentials), OQ-CP-02
(OIDC provider location), OQ-CP-03 (credential rotation). See
[open-questions.md](open-questions.md).

## Architecture Documents

| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | reviewed | Package purpose, crate structure, three-layer model, exports, dependencies |
| [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh |
| [auth.md](auth.md) | draft | Unified auth: SSH + token + API keys, credential presentation per interface |
| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, OperationEnv, three dispatch paths |
| [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding |
| [server.md](server.md) | reviewed | Server acceptance, IdentityProvider, ForwardingPolicy, channel handling |
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper, reload API, pubsub event target adapter |
| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows |
| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths |
| [interface.md](interface.md) | draft | StreamInterface, MessageInterface, credential presentation, ListenerConfig |
| [configuration.md](configuration.md) | draft | StaticConfig, DynamicConfig, API keys, forwarding policy, reload |
| [storage.md](storage.md) | draft | alknet-storage: metagraph, identity, ACL, honker |
| [flowgraph.md](flowgraph.md) | draft | alknet-flowgraph: call graph, operation graph, petgraph |
| [secret-service.md](secret-service.md) | reviewed | alknet-secret: BIP39, SLIP-0010, AES-GCM, SecretProtocol |
| [credentials.md](credentials.md) | draft | CredentialProvider, CredentialSet (outbound auth) |
| [definitions.md](definitions.md) | draft | Terminology disambiguation and concept mapping |

## Research Documents

| Document | Status | Description |
|----------|--------|-------------|
| [configuration.md](../research/configuration.md) | draft | Configuration architecture (source for promoted spec) |
| [core.md](../research/core.md) | draft | Core overview, transport, call protocol, DNS |
| [services.md](../research/services.md) | draft | irpc service protocols, OperationContext, application services |
| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker |
| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping |
| [integration-plan.md](../research/integration-plan.md) | draft | Phased integration plan for services, pubsub, and operations |
| [feasibility/](../research/feasibility/) | — | SSH tunnel feasibility assessment and related analyses |
| [event-sourcing/](../research/event-sourcing/) | — | Event sourcing patterns and event-driven architecture reference |
| [ops/](../research/ops/) | — | Production ops reference: certbot, fail2ban |
| [phase2/definitions.md](../research/phase2/definitions.md) | draft | Terminology disambiguation (promoted to architecture/definitions.md) |
| [phase2/interface-model.md](../research/phase2/interface-model.md) | draft | StreamInterface/MessageInterface analysis (promoted to interface.md) |
| [phase2/credential-provider.md](../research/phase2/credential-provider.md) | draft | CredentialProvider research (promoted to credentials.md) |
| [phase2/tls-transport.md](../research/phase2/tls-transport.md) | draft | HTTP interface, stealth handoff, ListenerConfig (promoted to interface.md, auth.md) |

## ADR Table

| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport via `AsyncRead+AsyncWrite` trait | Accepted |
| [002](decisions/002-tun-separate-process.md) | TUN shim as separate process | Superseded by ADR-014 |
| [003](decisions/003-iroh-stream-join.md) | iroh stream via `tokio::io::join` | Accepted |
| [004](decisions/004-ssh-over-transport.md) | SSH runs over transport, not alongside | Accepted |
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 as primary interface, TUN as add-on | Accepted |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Accepted |
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | Accepted |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt certificate provisioning | Accepted |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay with override | Accepted |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining in CLI | Accepted |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API, no file-based config | Accepted |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 keys + OpenSSH cert-authority, no password auth | Accepted |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging + built-in rate limiting | Accepted |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN, recommend local SOCKS5 + tun2proxy | Accepted |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI bridge | Accepted |
| [016](decisions/016-napi-expose-connect-and-serve.md) | NAPI exposes both connect() and serve() | Accepted |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted |
| [019](decisions/019-proxy-dual-semantics.md) | `--proxy` dual semantics (client vs server) | Accepted |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted |
| [026](decisions/026-transport-interface-separation.md) | Transport/interface separation (three-layer model) | Accepted |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition (core, secret, storage, flowgraph) | Accepted |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service behind feature flag | Accepted |
| [029](decisions/029-identity-core-type.md) | Identity as core type in alknet-core | Accepted |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split with ArcSwap | Accepted |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy with rule-based allow/deny | Accepted |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary discipline (domain, irpc, call protocol) | Accepted |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted |
| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface split | Accepted |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type (outbound auth) | Accepted |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys as DynamicConfig auth | Accepted |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security (zeroize for v1) | Accepted |

> ADR numbers 020–022 were allocated to proposals that were withdrawn before
> acceptance and are not listed.

## Open Questions

See [open-questions.md](open-questions.md) for all open and resolved questions.
Key resolved questions from Phase 0: OQ-12, OQ-16, OQ-18 (forwarding policy
and identity scopes), OQ-17 (transport-aware auth), OQ-23 (irpc feature flag),
OQ-24 (DNS control channel scope), OQ-25 (crate irpc dependencies), OQ-IF-01
(Interface session / EventEnvelope relationship), OQ-IF-02 (ForwardingPolicy
placement). Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport
TLS), OQ-20 (worker registration).

## Lifecycle Definitions

| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions resolved |
| `reviewed` | Architecture final. Implementation may begin. Changes require review. | → `stable` when implementation is complete and verified |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
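
The lifecycle table reads as a small linear state machine. The sketch below is only an illustration of that reading; the `Status` enum and its `next` method are hypothetical names and are not part of any alknet crate.

```rust
/// Hypothetical model of the document lifecycle table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Draft,
    Reviewed,
    Stable,
    Deprecated,
}

impl Status {
    /// The next status in the lifecycle, or `None` once a document is
    /// deprecated (deprecated documents are removed, not promoted).
    pub fn next(self) -> Option<Status> {
        match self {
            Status::Draft => Some(Status::Reviewed),
            Status::Reviewed => Some(Status::Stable),
            Status::Stable => Some(Status::Deprecated),
            Status::Deprecated => None,
        }
    }
}
```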
@@ -1,339 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Authentication

## What

A unified authentication layer that works across all transports —
SSH-over-any-transport and WebTransport (non-SSH HTTP-level transports). The same
key material (Ed25519 authorized keys and certificate authorities) is shared across
both auth paths. Identity resolution produces a transport-agnostic `Identity`
that carries scopes and resources for downstream authorization.

## Why

Alknet currently authenticates connections exclusively through SSH public key
auth. Non-SSH transports (WebTransport) cannot perform SSH key exchange — they
need a different auth presentation that shares the same key material. The
unified auth layer ensures one key set, one identity, one rotation mechanism
across all transports. See ADR-023 for the decision context.

The canonical definitions of `Identity` and `IdentityProvider` are in
[identity.md](identity.md). This document covers auth-specific behavior:
auth presentation per transport, `AuthPolicy` structure, and the auth service
relationship.

## Architecture

### Identity and IdentityProvider

See [identity.md](identity.md) for the canonical definitions of:
- `Identity` struct (`{ id, scopes, resources }`)
- `IdentityProvider` trait (`resolve_from_fingerprint()`, `resolve_from_token()`)
- `ConfigIdentityProvider` (default, ArcSwap-backed)
- `StorageIdentityProvider` (production, SQLite-backed, in alknet-storage)
- `AuthProtocol` irpc service (behind `irpc` feature flag)

The key relationship: `IdentityProvider` is the contract. `ConfigIdentityProvider`
is the default implementation (reads from `DynamicConfig.auth`). `AuthProtocol`
irpc service is one way to satisfy the trait, behind a feature flag. Both paths
produce the same `Identity` result. See ADR-028 and ADR-029.

### Credential Presentation Per Interface

Each (Transport, Interface) pair presents credentials differently, but all
resolve to the same `Identity` through `IdentityProvider`. See
[definitions.md](definitions.md) for the full terminology rules.

| (Transport, Interface) | Credential presentation | Resolves via |
|------------------------|------------------------|-------------|
| (TLS, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TCP, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (iroh, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TLS, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, RawFramingInterface) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HttpInterface) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DnsInterface) | AuthToken in query labels | `resolve_from_token()` |

The **key material is shared**. The **credential presentation** differs per
(Transport, Interface) pair. The **verification result is the same**: an
authenticated `Identity` with scopes.
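
In the table above, the resolver column depends only on the interface, not on the transport. A minimal sketch of that observation follows; the `Interface` and `Resolver` enums and the `resolver_for` function are hypothetical names used for illustration, not types from alknet-core.

```rust
/// Hypothetical stand-ins for the interface kinds listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Interface {
    Ssh,
    RawFraming,
    Http,
    Dns,
}

/// Which `IdentityProvider` method a connection would resolve through.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolver {
    /// Corresponds to `resolve_from_fingerprint()`.
    Fingerprint,
    /// Corresponds to `resolve_from_token()`.
    Token,
}

/// Maps an interface to its resolver, following the table row by row.
/// The transport is deliberately not a parameter: no row in the table
/// changes resolver when only the transport changes.
pub fn resolver_for(interface: Interface) -> Resolver {
    match interface {
        Interface::Ssh => Resolver::Fingerprint,
        Interface::RawFraming | Interface::Http | Interface::Dns => Resolver::Token,
    }
}
```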

`resolve_from_token()` handles both AuthTokens (Ed25519-signed) and API keys
(hash-verified bearer tokens). The implementation discriminates by prefix or
format — see ADR-037.

### Token Authentication

For non-SSH transports, the client constructs an authentication token:

```
AuthToken = base64url(key_id || timestamp || signature)

key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```

Wire format when passed in a WebTransport CONNECT request:
```
CONNECT https://server:443/alknet?token=<AuthToken>
```

Server verification:

1. Base64url-decode the token
2. Extract `key_id` (first 32 bytes)
3. Look up `key_id` in the same `authorized_keys` set that SSH auth uses
4. Verify the Ed25519 `signature` against `(key_id || timestamp_bytes)` using
   the matching public key
5. Check `timestamp` is within the acceptable window (configurable, default
   ±300 seconds)
6. Resolve to the same `Identity` that SSH pubkey auth would produce

The key fingerprint in the token serves double duty: it identifies which key
to verify against, and it ties the signature to a specific key (swapping
`key_id` invalidates the signature).
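
The byte layout above can be sketched as a small parser over the already-decoded token. This is an illustration only: `ParsedToken`, `parse_token`, and `signed_message` are hypothetical names, base64url decoding and Ed25519 verification are left out because they need external crates, and the 64-byte signature length is the standard Ed25519 signature size rather than something the layout states.

```rust
/// Field lengths from the token layout; 64 bytes is the Ed25519 signature size.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
const SIGNATURE_LEN: usize = 64;
const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Hypothetical borrowed view over a decoded AuthToken.
#[derive(Debug, PartialEq)]
pub struct ParsedToken<'a> {
    pub key_id: &'a [u8],
    pub timestamp: u64,
    pub signature: &'a [u8],
}

/// Splits raw (already base64url-decoded) bytes into the three fields.
/// Returns `None` when the length does not match the layout exactly.
pub fn parse_token(raw: &[u8]) -> Option<ParsedToken<'_>> {
    if raw.len() != TOKEN_LEN {
        return None;
    }
    let (key_id, rest) = raw.split_at(KEY_ID_LEN);
    let (ts_bytes, signature) = rest.split_at(TIMESTAMP_LEN);
    let mut buf = [0u8; TIMESTAMP_LEN];
    buf.copy_from_slice(ts_bytes);
    Some(ParsedToken {
        key_id,
        timestamp: u64::from_be_bytes(buf), // big-endian u64, Unix seconds
        signature,
    })
}

/// The bytes the signature covers: `key_id || timestamp_bytes`.
pub fn signed_message(raw: &[u8]) -> Option<&[u8]> {
    if raw.len() != TOKEN_LEN {
        return None;
    }
    Some(&raw[..KEY_ID_LEN + TIMESTAMP_LEN])
}
```

A verifier would look up `key_id`, check the signature over `signed_message`, and only then trust `timestamp`.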

### Replay Protection

V1 uses timestamp-only (±300s window, no server state). The replay trade-offs
and future zero-replay options (nonce challenge-response) are documented in
ADR-023.
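
A stateless window check of this kind could look like the sketch below. The function names are hypothetical; the only inputs taken from the text are the symmetric window and the 300-second default. Note that a captured token stays replayable until it falls outside the window, which is the trade-off ADR-023 is said to discuss.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns true when `token_ts` is within `max_skew_secs` of `now`,
/// in either direction (past or future).
pub fn timestamp_in_window(token_ts: u64, now: u64, max_skew_secs: u64) -> bool {
    now.abs_diff(token_ts) <= max_skew_secs
}

/// Current Unix time in seconds; falls back to 0 if the system clock
/// reports a time before the epoch.
pub fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```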

### IdentityProvider and Auth Service Relationship

The `IdentityProvider` trait (defined in [identity.md](identity.md)) decouples
alknet-core from any specific identity storage. Two implementations exist:

- **ConfigIdentityProvider** (in alknet-core) — reads from
  `ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
  No database required. This is the default for minimal deployments.

- **StorageIdentityProvider** (in alknet-storage) — backed by SQLite
  `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
  fingerprint → account → organization membership → effective scopes.

The `AuthProtocol` irpc service (behind the `irpc` feature flag, per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it. Both the trait path and the
irpc path produce the same `Identity` result.

The trait is the contract. The backing store is pluggable. Alknet-core never
depends on Honker, SQLite, or any specific database.

### API Keys

For service accounts, automation, and HTTP interface auth, Ed25519 AuthTokens
are inconvenient — they require client-side key generation and signing. API keys
provide a simpler bearer token format (ADR-037):

```
API key: "alk_dGhlX3NlY3JldA" (~20 chars, configurable prefix)
Storage: SHA-256 hash of the full key
Lookup:  prefix match → hash verification → Identity
```

API keys are configured in `DynamicConfig.auth.api_keys`:

```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect"]
description = "dashboard service account"
ttl = "30d" # optional
```

Both AuthTokens and API keys go through `IdentityProvider::resolve_from_token()`.
The implementation discriminates by prefix (default `alk_`): if the token starts
with the API key prefix, it's verified by SHA-256 hash lookup; otherwise, it's
verified as an Ed25519 AuthToken. Both paths produce the same `Identity`.
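
The prefix discrimination described above can be sketched as a pure classification step that runs before any verification. `TokenKind` and `classify_token` are hypothetical names; the SHA-256 lookup and the Ed25519 check are left out because they would need external crates.

```rust
/// Hypothetical classification of a presented bearer string.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenKind<'a> {
    /// Starts with the configured API key prefix; verify by hash lookup.
    ApiKey(&'a str),
    /// Anything else; verify as a base64url Ed25519 AuthToken.
    AuthToken(&'a str),
}

/// Routes a token to one of the two verification paths by prefix.
/// An empty prefix never matches, so every token is then treated as an AuthToken.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```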

See [configuration.md](configuration.md) for the full `DynamicConfig.auth`
structure and ADR-037 for the decision context.

### AuthPolicy Structure

`AuthPolicy` in `DynamicConfig` holds all auth paths, sharing key material:

```rust
pub struct AuthPolicy {
    pub ssh: SshAuthConfig,
    pub token: TokenAuthConfig,
    pub api_keys: Vec<ApiKeyEntry>,
}

pub struct SshAuthConfig {
    pub authorized_keys: HashSet<PublicKey>,
    pub cert_authorities: Vec<CertAuthorityEntry>,
    // Existing fields from current ServerAuthConfig
}

pub struct TokenAuthConfig {
    pub enabled: bool,
    pub max_token_age: Duration, // Timestamp window (default: 300s)
    pub key_source: TokenKeySource,
}

pub enum TokenKeySource {
    /// Share the same authorized_keys set with SshAuthConfig.
    /// Default and recommended for v1.
    Shared,
    /// Separate key set for non-SSH transports.
    /// For deployments that want distinct access control per transport.
    Separate(HashSet<PublicKey>),
}

pub struct ApiKeyEntry {
    pub prefix: String,              // e.g., "alk_"
    pub hash: String,                // e.g., "sha256:abc..."
    pub scopes: Vec<String>,         // e.g., ["relay:connect", "secrets:derive"]
    pub description: Option<String>, // e.g., "dashboard service account"
    pub expires_at: Option<u64>,     // Unix timestamp, optional TTL
}
```

When `TokenKeySource::Shared` (the default), adding a key to
`authorized_keys` immediately grants access via both SSH and WebTransport.
One key set, one `reloadAuth()` call, one rotation.
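
One way to picture the `Shared` versus `Separate` behavior is a helper that picks the key set token auth should consult. This is a sketch under assumptions: `PublicKey` is replaced by a `String` alias so the example stays self-contained, and `token_keys` is a hypothetical helper, not a function from alknet-core.

```rust
use std::collections::HashSet;

/// Stand-in for the real `PublicKey` type (assumption for this sketch).
pub type PublicKey = String;

pub enum TokenKeySource {
    /// Token auth reuses the SSH `authorized_keys` set.
    Shared,
    /// Token auth uses its own, independent key set.
    Separate(HashSet<PublicKey>),
}

/// Returns the key set that token verification should look keys up in.
pub fn token_keys<'a>(
    ssh_authorized_keys: &'a HashSet<PublicKey>,
    source: &'a TokenKeySource,
) -> &'a HashSet<PublicKey> {
    match source {
        TokenKeySource::Shared => ssh_authorized_keys,
        TokenKeySource::Separate(keys) => keys,
    }
}
```

With `Shared`, a key added to the SSH set is visible to token auth without any further step, which matches the single-rotation property described above.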

### Auth Flow in the Server

**SSH transport (existing, unchanged):**
```
Client connects → SSH handshake → auth_publickey() callback
  → ServerAuthConfig::authenticate_publickey() or authenticate_certificate()
  → Auth::Accept or Auth::Reject
```

**WebTransport transport (new):**
```
Browser connects → WebTransport CONNECT request
  → SessionRequest inspection: extract token from URL path or header
  → TokenAuthConfig verification: decode token → lookup key_id → verify signature → check timestamp
  → session_request.accept() or session_request.forbidden()
```

After auth, both paths produce an `Identity`. The `Identity` is attached to the
connection and used by `ForwardingPolicy` and the call protocol to make
authorization decisions.

### WebTransport SessionRequest Inspection

The wtransport library's `SessionRequest` provides:

- `path()` — URL path (e.g., `/alknet?token=...`)
- `headers()` — HTTP headers (for `Authorization: Bearer ...`)
- `origin()` — Browser origin (for CORS-like restrictions)
- `remote_address()` — Client UDP address

Token extraction from URL path is preferred for browser WebTransport because
the W3C API (`new WebTransport(url)`) naturally includes query parameters. For
native clients (Deno, CLI), the `Authorization` header is also supported.
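
The extraction step can be sketched independently of wtransport by working on plain strings. Everything here is an assumption for illustration: `extract_token` is a hypothetical helper, the caller is expected to pass in the request path and the optional `Authorization` header value, and checking the header before the query string is a choice made for this sketch rather than documented behavior. Because base64url output contains only URL-safe characters, no percent-decoding is attempted.

```rust
/// Pulls a token from an `Authorization: Bearer ...` header value if one is
/// present and non-empty, otherwise from a `token=` query parameter in `path`.
pub fn extract_token<'a>(path: &'a str, authorization: Option<&'a str>) -> Option<&'a str> {
    // Header form, typically used by native clients.
    if let Some(token) = authorization.and_then(|v| v.strip_prefix("Bearer ")) {
        let token = token.trim();
        if !token.is_empty() {
            return Some(token);
        }
    }
    // Query-string form, typically used by browser clients.
    let (_, query) = path.split_once('?')?;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, value)| *key == "token" && !value.is_empty())
        .map(|(_, value)| value)
}
```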

### Browser-Side Token Construction

```javascript
// Illustrative — see client SDK for production implementation
async function createAuthToken(keyPair) {
  const publicKey = await crypto.subtle.exportKey('raw', keyPair.publicKey);
  const keyId = new Uint8Array(await crypto.subtle.digest('SHA-256', publicKey));

  const timestamp = new ArrayBuffer(8);
  new DataView(timestamp).setBigUint64(0, BigInt(Math.floor(Date.now() / 1000)));

  const message = new Uint8Array([...keyId, ...new Uint8Array(timestamp)]);
  const signature = await crypto.subtle.sign('Ed25519', keyPair.privateKey, message);

  const token = new Uint8Array([...keyId, ...new Uint8Array(timestamp), ...new Uint8Array(signature)]);
  return btoa(String.fromCharCode(...token))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```

Browsers support Ed25519 key generation and signing via `SubtleCrypto` (Chrome
105+, Firefox 130+, Safari 17+). Deno supports it natively. No external
dependencies needed.

## Constraints

- Auth tokens are Ed25519-signed with the same key pair used for SSH auth. No
  separate key management for non-SSH transports.
- `IdentityProvider` is the only interface between alknet-core and identity
  storage. No database dependency at the core level.
- The SSH auth path is unchanged. `auth_publickey()` continues to work exactly
  as it does today. Token auth is additive.
- Certificate authority tokens are not supported for token auth in v1. CA
  verification requires the full OpenSSH certificate structure, which doesn't
  fit in a simple signed timestamp. This can be added later if needed.
- Token auth is only available on transports that carry HTTP metadata (URL
  path, headers). SSH-over-TCP/TLS/iroh continues to use SSH native auth
  exclusively.
- API keys are bearer tokens — anyone who obtains the key has the associated
  permissions. The hash storage and optional TTL mitigate but do not eliminate
  this risk. Ed25519 AuthTokens remain the preferred auth method for interactive
  clients. See ADR-037.
- API keys are verified by SHA-256 hash lookup in `DynamicConfig.auth.api_keys`
  (or the `api_keys` database table in production). The full key is provided to
  the client exactly once at creation time.

### Security Considerations

**Token in URL**: The auth token is passed as a URL query parameter
(`?token=...`) for browser WebTransport compatibility. This is a known web
security consideration:

- **Server logs**: The token may appear in HTTP access logs. Servers MUST
  strip or redact the `token` query parameter before logging the request URL.
- **Browser history**: The token may appear in browser history. Timestamps
  limit exposure to the token window (±300s).
- **Referrer headers**: WebTransport does not send referrer headers, so the
  token does not leak via HTTP Referer.
- **Native clients**: Deno and native clients SHOULD prefer the
  `Authorization: Bearer` header over URL parameters when the client supports
  custom headers.
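
The log-redaction requirement can be met with a small string transform applied before a request URL reaches the logger. The helper below is a hypothetical sketch, not code from alknet; the `REDACTED` placeholder is an arbitrary choice.

```rust
/// Replaces the value of every `token` query parameter with a placeholder
/// and leaves the path and all other parameters untouched.
pub fn redact_token(url: &str) -> String {
    let (path, query) = match url.split_once('?') {
        Some(parts) => parts,
        None => return url.to_string(), // no query string, nothing to redact
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("token", _)) => "token=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{}?{}", path, redacted.join("&"))
}
```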

## Open Questions

- **OQ-18**: ~~Source of Identity.scopes~~ Resolved per ADR-029 and ADR-031.
  `IdentityProvider` owns scopes, `ForwardingPolicy` uses scopes from `Identity`.
  See [open-questions.md](open-questions.md).

- **OQ-19**: Should the WebTransport listener require its own TLS identity
  (separate from the SSH-over-TLS listener), or can they share the same
  certificate? Deferred to Phase 4. See [open-questions.md](open-questions.md).

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag; IdentityProvider is the contract |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Credential presentation differs per (Transport, Interface) pair |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |

## Phase 2 Implementation Notes

- `ConfigIdentityProvider::resolve_from_token()` now handles API keys (`alk_` prefix) via SHA-256 hash verification with expiry checking
- `ApiKeyEntry` struct added to `AuthPolicy` with prefix, hash, scopes, description, expires_at fields
- API keys produce `Identity { id: prefix, scopes: from_entry, resources: {} }`
- Both AuthTokens (Ed25519 signed) and API keys (hash-verified bearer) go through `resolve_from_token()`, discriminated by format/prefix

## References

- [identity.md](identity.md) — Canonical Identity and IdentityProvider definitions
- [server.md](server.md) — Current SSH auth handler
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](configuration.md) — DynamicConfig, AuthPolicy, ConfigReloadHandle
- [interface.md](interface.md) — Credential presentation per (Transport, Interface) pair
- [definitions.md](definitions.md) — Terminology disambiguation (IdentityProvider vs CredentialProvider, AuthToken vs API key)
- [services.md](services.md) — AuthProtocol irpc service
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18 (resolved), OQ-19
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API
@@ -1,551 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Call Protocol

## What

A bidirectional, transport-agnostic call and event protocol that runs over
authenticated pipes. It supports request/response calls, streaming
subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, head/worker) register their own operations without modifying core.

OperationEnv extends the call protocol with a universal composition mechanism
that unifies local dispatch, irpc service dispatch, and remote dispatch. A
handler receives `context.env.invoke(namespace, op, input)` and doesn't know
whether the operation runs locally, in-cluster, or on a remote node.

## Why

The current control channel (ADR-018) is unidirectional (client → server) and
provides fire-and-forget event dispatch without request/response semantics.
The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the head/worker model where
workers expose operations the head invokes.

Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away — one handler-facing API, three dispatch backends (ADR-033).

## Architecture

### Operation Paths

Operation names use slash-based paths aligned with URL routing conventions:

```
/{node}/{service}/{op}
```

- **node** — identity prefix of the node that exposes the operation. The head
  uses this segment to route calls to the correct connected node.
- **service** — the logical service namespace. Groups related operations
  under one handler prefix.
- **op** — the specific operation within that service.

Examples:

| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Node `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Node `dev1`, service `bash`, operation `exec` |
| `/head/agent/chat` | Head's own `agent` service, operation `chat` |
| `/head/sessions/list` | Head's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Worker `browser-1`, `notify` service |

This three-level routing mirrors iroh's ALPN dispatch: the first segment
routes to a connected node (like ALPN routes to a protocol handler), the
remaining path dispatches within that node's registry. See ADR-025 for the
handler/spec separation decision.

The `namespace` field on `OperationSpec` is derived from the path (`namespace`
= second path segment). It's a convenience accessor for ACL matching and
service grouping.
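
The three-segment path and the derived `namespace` can be sketched as a small borrowed parser. `OperationPath` is a hypothetical type for illustration; the strictness chosen here (exactly three non-empty segments and a leading slash) is an assumption, since the text does not say how malformed paths are handled.

```rust
/// Hypothetical parsed form of `/{node}/{service}/{op}`.
#[derive(Debug, PartialEq, Eq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

impl<'a> OperationPath<'a> {
    /// Parses a path with exactly three non-empty segments after the leading slash.
    pub fn parse(path: &'a str) -> Option<Self> {
        let mut parts = path.strip_prefix('/')?.split('/');
        let node = parts.next()?;
        let service = parts.next()?;
        let op = parts.next()?;
        if parts.next().is_some() || node.is_empty() || service.is_empty() || op.is_empty() {
            return None;
        }
        Some(Self { node, service, op })
    }

    /// The namespace is the second path segment, i.e. the service name.
    pub fn namespace(&self) -> &'a str {
        self.service
    }
}
```

A head could route on `node` first and hand the remaining `service` and `op` to that node's registry.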

### Wire Format: EventEnvelope

Every message on the wire is a length-prefixed JSON `EventEnvelope`:

```rust
pub struct EventEnvelope {
    pub r#type: String, // Event type (e.g., "call.requested", "call.responded")
    pub id: String,     // Correlation key (requestId, topic, or "" for broadcasts)
    pub payload: Value, // JSON payload — schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```

This is the same format used by `@alkdev/pubsub` adapters. It is JSON because
it must be consumable from JavaScript, Python, and any language. The envelope
is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh
bidirectional streams, WebSocket, or Worker postMessage.

Binary payloads (postcard, protobuf, etc.) are base64-encoded in the `payload`
field. The envelope itself stays JSON for cross-language compatibility.
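
The framing rule (4-byte big-endian length, then the UTF-8 JSON body) can be sketched with the standard library alone. The function names are hypothetical, the JSON is treated as an opaque string, and the length is assumed to count the body bytes only, which the text does not state explicitly.

```rust
/// Prepends a 4-byte big-endian length to a JSON body.
/// Returns `None` if the body is too large for a u32 length prefix.
pub fn encode_frame(json_body: &str) -> Option<Vec<u8>> {
    if json_body.len() > u32::MAX as usize {
        return None;
    }
    let mut frame = Vec::with_capacity(4 + json_body.len());
    frame.extend_from_slice(&(json_body.len() as u32).to_be_bytes());
    frame.extend_from_slice(json_body.as_bytes());
    Some(frame)
}

/// Tries to read one frame from the front of `buf`.
/// Returns the body and the unread remainder, or `None` when the buffer
/// does not yet hold a complete frame or the body is not valid UTF-8.
pub fn decode_frame(buf: &[u8]) -> Option<(&str, &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[..4]);
    let len = u32::from_be_bytes(len_bytes) as usize;
    let end = 4usize.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    let body = std::str::from_utf8(&buf[4..end]).ok()?;
    Some((body, &buf[end..]))
}
```

A production reader would likely distinguish "need more bytes" from "invalid frame" and cap the accepted length; both are collapsed into `None` here to keep the sketch short.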
|
||||
|
||||
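
The framing rule (4-byte big-endian length, then the UTF-8 JSON body) can be sketched with the standard library alone. This is not the `call::frame` implementation; `write_frame`, `read_frame`, and the `max_len` cap are assumptions made for illustration, and the JSON body is treated as an opaque string.

```rust
use std::io::{self, Cursor, Read, Write};

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame and return its body as a String.
/// `max_len` is a hypothetical safety cap; the spec does not define one.
fn read_frame<R: Read>(r: &mut R, max_len: u32) -> io::Result<String> {
    let mut prefix = [0u8; 4];
    r.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let json = r#"{"type":"call.requested","id":"req-1","payload":{}}"#;
    let mut wire = Vec::new();
    write_frame(&mut wire, json)?;
    // Round-trip through an in-memory buffer standing in for a transport stream.
    let decoded = read_frame(&mut Cursor::new(wire), 1 << 20)?;
    println!("{}", decoded);
    Ok(())
}
```

Capping the length before allocating keeps a malformed or hostile prefix from forcing a multi-gigabyte allocation; whether the real reader applies such a cap is not stated in this spec.
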
### Call Protocol Events

Five event types carry request/response and subscription semantics:

| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |

**`call.error` payload**:
```json
{
  "code": "string",
  "message": "string",
  "retryable": false
}
```

**A call is just a subscribe that resolves after one event.** Both `call()` and
`subscribe()` send the same `call.requested` event. The difference is
consumption pattern:

- **`call()`**: Sends `call.requested`, resolves `Promise` on first `call.responded`
- **`subscribe()`**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`

The `id` field carries the `requestId` for correlation.

### Bidirectional Calls and Routing

Both sides of a connection can initiate calls. The head routes calls to workers
using the first path segment:

```
Head (server)                       Worker: "dev1" (client)
│                                           │
│  call.requested                           │
│  name: "/dev1/fs/readFile"                │
│  payload: { path: "/src/main.rs" }        │
│──────────────────────────────────────────▶│
│                                           │
│  call.responded                           │
│  id: <requestId>                          │
│  payload: { content: "fn main()..." }     │
│◀──────────────────────────────────────────│
│                                           │
│  Worker exposes /dev1/fs/*,               │
│  /dev1/bash/* to head                     │
│                                           │
│◀─ call.requested ─────────────────────────│
│  name: "/head/agent/chat"                 │
│  payload: { provider: "anthropic", ... }  │
│                                           │
│── call.responded ────────────────────────▶│
│  id: <requestId>                          │
│  payload: { completion: "..." }           │
```

The head's registry includes:
- **Head-local operations** (`/head/*`) — handled directly
- **Remote operations** (`/{node}/*`) — forwarded to the worker connection

When the head routes `/dev1/fs/readFile` to worker `dev1`, it strips the node
prefix and delivers the call to the worker's local registry as `/fs/readFile`.
The worker doesn't need to know its own alias.

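
The prefix-stripping rule described above can be sketched as a pure routing function. `Route` and `route` are hypothetical names; keeping the full path for head-local operations is an assumption, since the spec does not say whether the head's own registry stores paths with or without the `/head` prefix.

```rust
/// Where the head sends a call, decided by the first path segment.
#[derive(Debug, PartialEq)]
enum Route {
    /// `/head/...` is handled by the head itself (path kept as-is: an assumption).
    Local(String),
    /// Any other node: forward to that worker with the node prefix removed.
    Forward { node: String, local_path: String },
}

/// Returns None when the path has no node segment or nothing after it.
fn route(path: &str) -> Option<Route> {
    let rest = path.strip_prefix('/')?;
    let (node, tail) = rest.split_once('/')?;
    if node.is_empty() || tail.is_empty() {
        return None;
    }
    if node == "head" {
        Some(Route::Local(path.to_string()))
    } else {
        Some(Route::Forward {
            node: node.to_string(),
            local_path: format!("/{}", tail),
        })
    }
}

fn main() {
    // The worker sees "/fs/readFile" and never needs to know it is called "dev1".
    println!("{:?}", route("/dev1/fs/readFile"));
    println!("{:?}", route("/head/agent/chat"));
}
```
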
### Head/Worker Architecture

```
   ┌─────────────────────────────────┐
   │            Head Node            │
   │                                 │
   │  Head-local services:           │
   │  /head/agent/chat (LLM coord)   │
   │  /head/agent/complete           │
   │  /head/sessions/list            │
   │  /head/sessions/history         │
   │                                 │
   │  Worker registry (discovered):  │
   │  /dev1/fs/* → dev1 connection   │
   │  /dev1/bash/* → dev1 connection │
   │  /dev2/fs/* → dev2 connection   │
   │  /browser-1/notify/* → WT conn  │
   └──────┬───────┬───────┬──────────┘
          │       │       │
┌─────────▼┐ ┌────▼───┐ ┌─▼────────────┐
│ Worker   │ │Worker  │ │Browser Worker│
│ "dev1"   │ │"dev2"  │ │"browser-1"   │
│ /fs/*    │ │/fs/*   │ │/notify/*     │
│ /bash/*  │ │/bash/* │ │              │
│ /search/*│ │        │ │              │
└──────────┘ └────────┘ └──────────────┘
```

When a worker connects, it registers its operations with the head:

```
worker → head: call.requested { name: "/head/services/register", payload: {
  node: "dev1",
  operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```

The head adds these to its routing table with the node prefix. Other workers
and browser clients can then call `/dev1/fs/readFile` without knowing how
the head routes it internally.

### Operation Registry

The operation registry maps paths to specs and handlers. **Specs and handlers
are separate** — downstream consumers register both (ADR-025).

```rust
pub struct OperationSpec {
    pub name: String,                  // e.g., "/fs/readFile", "/agent/chat"
    pub namespace: String,             // e.g., "fs", "agent"
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl, // Required scopes/resources
}

pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "/fs/readFile", "/search/query")
    Mutation,     // Side effects (e.g., "/bash/exec", "/sessions/create")
    Subscription, // Streaming (e.g., "/events/subscribe")
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked
    pub required_scopes_any: Option<Vec<String>>, // OR-checked
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```

**Registration is separated from implementation:**

```rust
// Core registers discovery operations
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);

// A dev env worker registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// A browser client registers notification UDFs
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```

Core-provided operations use short paths without a node prefix
(`/services/list`, `/services/schema`). They live on whatever node the
caller is connected to. Worker-prefixed operations (`/dev1/fs/readFile`)
are routed by the head.

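
A minimal, self-contained sketch of the spec/handler pairing follows. It is not the alknet-core registry: `Spec`, `Handler`, and `Registry` are stand-ins invented here, inputs and outputs are plain strings instead of JSON `Value`, and access control is left out entirely.

```rust
use std::collections::HashMap;

// Stand-in handler type: the real handlers take a JSON Value and a context.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

// Stand-in for OperationSpec with only the fields this sketch needs.
struct Spec {
    name: String,
    namespace: String,
}

#[derive(Default)]
struct Registry {
    ops: HashMap<String, (Spec, Handler)>,
}

impl Registry {
    /// Register a spec together with the handler that implements it.
    fn register(&mut self, spec: Spec, handler: Handler) {
        self.ops.insert(spec.name.clone(), (spec, handler));
    }

    /// Look up the spec only (roughly what a schema query would return).
    fn describe(&self, name: &str) -> Option<&Spec> {
        self.ops.get(name).map(|(spec, _)| spec)
    }

    /// Dispatch an input to the handler registered under `name`.
    fn invoke(&self, name: &str, input: &str) -> Result<String, String> {
        match self.ops.get(name) {
            Some((_, handler)) => handler(input),
            None => Err(format!("unknown operation: {}", name)),
        }
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.register(
        Spec { name: "/fs/readFile".to_string(), namespace: "fs".to_string() },
        Box::new(|input: &str| -> Result<String, String> { Ok(format!("read {}", input)) }),
    );
    let ns = reg.describe("/fs/readFile").map(|s| s.namespace.as_str());
    println!("{:?} {:?}", ns, reg.invoke("/fs/readFile", "a.txt"));
}
```

Because the spec is stored beside the handler rather than inside it, discovery can answer from specs alone without ever calling a handler.
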
### ACL Per Operation Path

Access control rules are path patterns, matched segment by segment with `*` as
a wildcard:

| Pattern | Matches | Purpose |
|---------|---------|---------|
| `/dev1/*` | All operations on node `dev1` | Full access to a worker |
| `/*/fs/*` | `fs` service on any node | File access across dev envs |
| `/*/bash/*` | `bash` service on any node | Shell access (higher risk) |
| `/head/agent/*` | Head LLM agent | LLM calls |
| `/head/sessions/*` | Head session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific node | One UI notification |

Higher-risk operations (shell, filesystem write) can require tighter scopes
than read-only operations. The ACL evaluates against the caller's
`Identity.scopes` and `Identity.resources` from the auth layer (see auth.md).

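
The pattern table can be read as the matcher sketched below. The exact semantics are assumptions inferred from the table: a `*` in the middle matches exactly one segment, while a trailing `*` matches everything that remains. `scopes_allow` shows the AND/OR scope fields; requiring both checks to pass together, and the scope names in the example, are likewise assumptions.

```rust
/// Segment-wise match. `*` matches one segment; a trailing `*` matches one or
/// more remaining segments (so `/dev1/*` covers `/dev1/fs/readFile`).
fn pattern_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let last = i + 1 == pat.len();
        match segs.get(i) {
            None => return false,
            Some(_) if *p == "*" && last => return true,
            Some(s) if *p == "*" || p == s => continue,
            Some(_) => return false,
        }
    }
    pat.len() == segs.len()
}

/// `required_all` is AND-checked, `required_any` (if present) is OR-checked.
fn scopes_allow(required_all: &[&str], required_any: Option<&[&str]>, held: &[&str]) -> bool {
    let all_ok = required_all.iter().all(|s| held.contains(s));
    let any_ok = match required_any {
        Some(any) => any.iter().any(|s| held.contains(s)),
        None => true,
    };
    all_ok && any_ok
}

fn main() {
    println!("{}", pattern_matches("/*/fs/*", "/dev2/fs/writeFile"));
    println!("{}", pattern_matches("/*/fs/*", "/dev1/bash/exec"));
    // Hypothetical scope names, for illustration only.
    println!("{}", scopes_allow(&["fs:read"], None, &["fs:read", "dev"]));
}
```
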
### Service Discovery

The `/services/list` and `/services/schema` operations expose what a node
offers. Read-only — no admin operations:

| Operation | Type | Description |
|-----------|------|-------------|
| `/services/list` | Query | List registered operation paths + metadata |
| `/services/schema` | Query | Get `OperationSpec` for a specific operation |

These tell the caller: "here's what you can call." They are not a control
panel. Access control is enforced at the operation level.

### PendingRequestMap

Manages in-flight calls and subscriptions. Correlates `call.responded` events
back to the original `call.requested`:

```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value>>,
        timeout: Option<Instant>,
    },
}
```

When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive

When `call.completed` arrives on a subscription → close the mpsc channel, delete
entry. When `call.aborted` arrives → cancel and drop the matching entry,
whichever side initiated the abort. A `call.aborted` for an unknown `requestId`
is silently discarded — no error response is generated.

Timeouts prevent dangling entries. A background task sweeps expired entries
periodically.

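
The correlation rules above can be exercised with a small std-only sketch. It swaps in `std::sync::mpsc` for tokio's `oneshot`/`mpsc`, uses `String` where the spec has `Result<Value>`, and omits timeouts and the sweeper task; `register_call`, `register_subscribe`, and `on_event` are hypothetical method names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Stand-in entry type: no timeouts, std channels instead of tokio channels.
enum PendingEntry {
    Call(Sender<String>),
    Subscribe(Sender<String>),
}

#[derive(Default)]
struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    fn register_call(&mut self, id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Call(tx));
        rx
    }

    fn register_subscribe(&mut self, id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Subscribe(tx));
        rx
    }

    /// Handle one incoming event. Returns false when the id is unknown,
    /// in which case the event is silently discarded.
    fn on_event(&mut self, event_type: &str, id: &str, payload: String) -> bool {
        match event_type {
            "call.responded" => match self.pending.remove(id) {
                // A call resolves once; the entry stays removed.
                Some(PendingEntry::Call(tx)) => {
                    let _ = tx.send(payload);
                    true
                }
                // A subscription keeps its entry alive for further events.
                Some(PendingEntry::Subscribe(tx)) => {
                    let _ = tx.send(payload);
                    self.pending.insert(id.to_string(), PendingEntry::Subscribe(tx));
                    true
                }
                None => false,
            },
            // Removing the entry drops the sender, which closes the channel.
            "call.completed" | "call.aborted" => self.pending.remove(id).is_some(),
            _ => false,
        }
    }
}

fn main() {
    let mut map = PendingRequestMap::default();
    let rx = map.register_subscribe("s1");
    map.on_event("call.responded", "s1", "tick".to_string());
    map.on_event("call.completed", "s1", String::new());
    for item in rx.iter() {
        println!("{}", item);
    }
}
```
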
### Protocol Adapter Layer

The call protocol is transport-agnostic and interface-agnostic by design. It
receives input from two interface categories (ADR-035):

**StreamInterface** produces `InterfaceEvent` frames from a continuous byte
stream (SSH channel, raw framing). The call protocol handler calls `recv()`
on the session to get events.

**MessageInterface** handles individual `InterfaceRequest` → `InterfaceResponse`
pairs (HTTP, DNS). The call protocol handler constructs an `OperationContext`
from the request and invokes the registry directly.

Both paths resolve to the same `OperationRegistry` and `OperationEnv`. The
transports that carry the call protocol are:

| Transport | Channel mechanism | Direction |
|-----------|-------------------|-----------|
| SSH | Reserved `direct_tcpip` destination (ADR-018) | Bidirectional over SSH channel |
| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream |
| iroh QUIC | Bidirectional `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream |
| WebSocket | Single WS connection | Bidirectional over WS frames |
| Worker | `postMessage` | Bidirectional over structured clone |

The framing is always: 4-byte BE length prefix + JSON. The envelope shape is
the same regardless of transport.

### OperationEnv — Universal Composition Mechanism

OperationEnv provides the handler-facing API for composing operations. A handler
receives `context.env.invoke(namespace, operation, input)` and gets back a
`ResponseEnvelope` — regardless of which dispatch path the operation takes
(ADR-033).

Three dispatch paths, one API:

| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |

All three produce the same `ResponseEnvelope`. Service assembly determines
which path each operation uses:

```rust
// Minimal deployment (Phase 1: single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (Phase 2+: mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)
    .local("config", config_registry)
    .service("secrets", secret_irpc_client)
    .remote("worker-1", call_protocol_conn);
```

**Phase boundary**: Phase 1 ships with local dispatch only (direct function
calls through the operation registry). The irpc service dispatch and remote
dispatch paths are contracted here but not built yet. irpc service protocols
(`AuthProtocol`, `SecretProtocol`, etc.) are defined in the specs but the
implementations are Phase 2+ work.

**irpc is one dispatch backend for OperationEnv, not a replacement for the
call protocol or for OperationEnv.** A call protocol handler can call an irpc
service internally (e.g., `/head/auth/verify` calls
`AuthProtocol::VerifyPubkey`) — the layers compose. irpc is behind a feature
flag in alknet-core. See [services.md](services.md) for full OperationEnv and
irpc service details.

### OperationContext

Every handler receives an `OperationContext`:

```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool, // set by buildEnv(), not by callers
}
```

- **`identity`**: The authenticated identity making the call. Populated by
  `IdentityProvider` from the interface layer ([identity.md](identity.md)).
- **`env`**: The operation environment — namespaced access to other operations.
- **`trusted`**: When a handler calls another operation through `env`, the
  nested call is `trusted` (skips ACL checks). This prevents double-checking:
  if `/head/agent/chat` is allowed and it internally calls
  `/head/auth/verify`, the nested `/head/auth/verify` call is trusted and is
  not checked against the ACL again.

Handler signature:

```rust
fn handle(input: Value, context: OperationContext) -> ResponseEnvelope;
```

### ResponseEnvelope

The universal return type from all three dispatch paths:

```rust
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}
```

Local dispatch produces `ResponseEnvelope` with no serialization. irpc service
dispatch produces postcard-encoded results that are decoded into
`ResponseEnvelope`. Remote dispatch receives `call.responded` EventEnvelope
frames and maps them to `ResponseEnvelope`. The handler always gets the same
type back.

### Relationship to @alkdev/pubsub and @alkdev/operations

The call protocol in core is a Rust reimplementation of the same protocol
defined in `@alkdev/operations`. The TypeScript implementation provides:

- `PendingRequestMap` — request/response correlation
- `CallHandler` — bridges pubsub events to operation registry
- `OperationSpec`, `AccessControl`, `Identity` — type definitions

The Rust implementation mirrors these types and behaviors. TypeScript consumers
continue using `@alkdev/operations` over `@alkdev/pubsub` adapters (including
the `event-target-alknet` adapter). Rust consumers use core's registry directly.
Both speak the same wire protocol and can interoperate.

The key principle: **the same `EventEnvelope` can flow from a Rust handler
through core, out over SSH channel, into a JavaScript pubsub adapter, and
be dispatched through `@alkdev/operations`'s call handler** — with zero
translation at the wire level.

### Agent Service Pattern (Downstream Application Concern)

An agent service — coordinating between LLM providers and tool calls — is a
primary downstream use case for the call protocol. It would be just another set
of registered operations with no special treatment:

- `/head/agent/chat` — send a message, get a completion. Routes to the
  appropriate LLM provider based on available workers and configuration.
- `/head/agent/complete` — streaming completion. Yields tokens as they arrive.
- `/head/sessions/list` — list session histories (backed by Honker or other
  durable storage).
- `/head/sessions/history` — retrieve a specific session's message history.

The agent service uses OperationEnv to invoke tools on workers. **This is a
downstream application concern, not a core requirement.** The call protocol
enables it by providing the universal composition mechanism (ADR-033), but the
agent service itself is built on top, not into the core.

## Constraints

- The call protocol does not depend on Honker, SQLite, or any database. The
  `PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
- Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf)
  can be carried as base64-encoded blobs in the `payload`, but the envelope
  itself is always JSON.
- Service discovery (`/services/list`, `/services/schema`) is read-only. No
  admin operations are exposed through the call protocol itself.
- Batch is not a protocol primitive. Multiple `call.requested` events with
  correlated `requestId`s provide equivalent semantics.
- The node prefix in the operation path is a routing mechanism, not a security
  boundary. ACL is enforced at the `AccessControl` level, not by path prefix
  alone. A worker that exposes `/dev1/bash/exec` can restrict access via
  `required_scopes` — not every authenticated identity should have shell access.
- **OperationEnv composition model matches the `@alkdev/operations` behavioral
  contract**: namespace + operation name → invoke with input, return output.
  The Rust implementation may differ in structure but must preserve this
  contract (ADR-033).
- **irpc is explicitly positioned as one dispatch backend for OperationEnv**
  (ADR-033, ADR-028). It is not a replacement for the call protocol or for
  OperationEnv.
- **Phase 1 is local dispatch only.** irpc service dispatch and remote dispatch
  are contracted in this spec but not built yet. The `OperationEnv::local()`
  path is the Phase 1 implementation.

## Open Questions

- **OQ-20**: How does the head track which workers expose which operations when
  workers connect and disconnect? Registration on connect and cleanup on
  disconnect, or heartbeat-based discovery? See
  [open-questions.md](open-questions.md).

- **~~OQ-22~~**: ~~Should the call protocol support streaming inputs (client streaming
  in gRPC terms)?~~ Resolved — deferred. Current model covers all identified use
  cases. See [open-questions.md](open-questions.md).

- **~~OQ-IF-01~~**: ~~How does the `Interface` session type relate to the call
  protocol's `EventEnvelope` stream?~~ Resolved — `InterfaceSession::recv()`
  returns `Option<InterfaceEvent>` where `InterfaceEvent` carries
  `EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`.
  The `SshSession` bridge implements this over the `alknet-control:0` channel.
  For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize
  request/response pairs. See [interface.md](interface.md) and ADR-035.

- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
  trait? See [interface.md](interface.md) and [open-questions.md](open-questions.md).

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | irpc is one dispatch backend for OperationEnv |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition with three dispatch paths |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Call protocol accepts events from both interface categories |

## Phase 2 Implementation Notes

- `SshSession::recv()` and `SshSession::send()` are now functional, bridged to the call protocol via the `alknet-control:0` SSH channel using `ControlChannelBridge` with mpsc channels
- `FrameFramedReader`/`FrameFramedWriter` added to `call::frame` for async length-prefixed EventEnvelope I/O
- `RawFramingSession` implemented with first-frame auth: the first frame's payload is extracted as an AuthToken and resolved via `IdentityProvider::resolve_from_token()`; the session transitions to the authenticated state on success
- `OperationEnv.credentials(service)` method added for outbound credential resolution (ADR-036)
- `CredentialProvider` trait and `CredentialSet` enum defined in `alknet_core::credentials`

## References

- [auth.md](auth.md) — Identity and `IdentityProvider` trait
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [server.md](server.md) — Channel handling and control channel routing
- [transport.md](transport.md) — Transport abstraction
- [identity.md](identity.md) — Identity struct, IdentityProvider trait
- [interface.md](interface.md) — Interface layer, EventEnvelope stream from interfaces
- [configuration.md](configuration.md) — ForwardingPolicy, service metadata
- [services.md](services.md) — OperationEnv, OperationContext, irpc service layer
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
- `@alkdev/storage` — `peer_credentials` table, ACL graph, `Identity`
- [irpc](/workspace/irpc) — iroh streaming RPC (postcard-only, Rust-to-Rust)
- [iroh](/workspace/iroh) — P2P QUIC transport
@@ -1,209 +0,0 @@
---
status: reviewed
last_updated: 2026-06-02
---

# Client

## What

The alknet client establishes an SSH session to a server (via pluggable transport) and exposes a local SOCKS5 proxy for routing traffic through that session. Port forwarding (`-L` / `-R` style) covers specific service access like Postgres or Redis.

## Why

Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary interface — it's standard, well-supported by browsers and CLI tools, and needs no privileges. Port forwarding covers specific service access. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet (ADR-014).

## Architecture

### Client Components

```
┌────────────────────────────────────────────────────────┐
│                     alknet connect                     │
│                                                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐              │
│  │ SOCKS5   │  │ Port     │  │ Remote   │              │
│  │ Server   │  │ Forward  │  │ Forward  │              │
│  │ :1080    │  │ -L spec  │  │ -R spec  │              │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘              │
│       │             │             │                    │
│       ▼             ▼             ▼                    │
│  ┌─────────────────────────────────┐                   │
│  │         Channel Manager         │                   │
│  │      (opens direct-tcpip,       │                   │
│  │    forwarded-tcpip streams)     │                   │
│  └──────────────┬──────────────────┘                   │
│                 │                                      │
│  ┌──────────────▼──────────────────┐                   │
│  │       SSH Client (russh)        │                   │
│  │      Handle<ClientHandler>      │                   │
│  └──────────────┬──────────────────┘                   │
│                 │                                      │
│  ┌──────────────▼──────────────────┐                   │
│  │            Transport            │                   │
│  │       (Tcp / Tls / Iroh)        │                   │
│  └─────────────────────────────────┘                   │
└────────────────────────────────────────────────────────┘
```

### SOCKS5 Server

The primary client interface. Listens on a local port (default `127.0.0.1:1080`), accepts SOCKS5 connections, and for each connection:

1. Reads the SOCKS5 handshake (auth method negotiation, target address)
2. Opens a `channel_open_direct_tcpip(target_host, target_port, originator_addr, originator_port)` on the SSH session
3. Converts the SSH channel to a stream via `channel.into_stream()`
4. Runs `tokio::io::copy_bidirectional(&mut local_socket, &mut ssh_stream)` to proxy data

Supports SOCKS5h (domain names resolved server-side) by default. This prevents DNS leaks — the client never resolves target hostnames locally, sending them to the server for resolution instead. This is consistent with the project's privacy design (ADR-006).

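
Step 1 of the list above ends with a CONNECT request whose layout is fixed by RFC 1928: `VER CMD RSV ATYP DST.ADDR DST.PORT`. The sketch below parses that request from a byte slice. `Target` and `parse_connect_request` are hypothetical names, and this is not the client's actual parser; it only shows why a domain target (ATYP `0x03`) can be passed to the server unresolved.

```rust
use std::net::IpAddr;

/// Target of a SOCKS5 CONNECT request (RFC 1928, section 4).
#[derive(Debug, PartialEq)]
enum Target {
    Ip(IpAddr, u16),
    /// Domain name left unresolved so the server side performs DNS (SOCKS5h).
    Domain(String, u16),
}

fn parse_connect_request(buf: &[u8]) -> Option<Target> {
    // VER must be 0x05 and CMD must be 0x01 (CONNECT); buf[2] is reserved.
    if buf.len() < 4 || buf[0] != 0x05 || buf[1] != 0x01 {
        return None;
    }
    let atyp = buf[3];
    let (addr_len, offset) = match atyp {
        0x01 => (4, 4),                        // IPv4
        0x03 => (*buf.get(4)? as usize, 5),    // domain: one length byte first
        0x04 => (16, 4),                       // IPv6
        _ => return None,
    };
    let addr = buf.get(offset..offset + addr_len)?;
    let port_bytes = buf.get(offset + addr_len..offset + addr_len + 2)?;
    let port = u16::from_be_bytes([port_bytes[0], port_bytes[1]]);
    match atyp {
        0x01 => {
            let mut a = [0u8; 4];
            a.copy_from_slice(addr);
            Some(Target::Ip(IpAddr::from(a), port))
        }
        0x03 => Some(Target::Domain(String::from_utf8(addr.to_vec()).ok()?, port)),
        _ => {
            let mut a = [0u8; 16];
            a.copy_from_slice(addr);
            Some(Target::Ip(IpAddr::from(a), port))
        }
    }
}

fn main() {
    let mut req = vec![0x05, 0x01, 0x00, 0x03, 11];
    req.extend_from_slice(b"example.com");
    req.extend_from_slice(&443u16.to_be_bytes());
    println!("{:?}", parse_connect_request(&req));
}
```
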
### Port Forwarding

Local port forwards (`-L local_addr:local_port:remote_host:remote_port`):

1. Bind `TcpListener` on `local_addr:local_port`
2. For each accepted connection, open `channel_open_direct_tcpip(remote_host, remote_port, ...)`
3. Proxy bytes bidirectionally via `copy_bidirectional`

Remote port forwards (`-R remote_addr:remote_port:local_host:local_port`):

1. Send `tcpip_forward(remote_addr, remote_port)` to request the server listen on a port
2. When the handler receives `server_channel_open_forwarded_tcpip`, connect to `local_host:local_port`
3. Proxy bytes bidirectionally

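
A local forward spec has to be split into its fields before the listener can be bound. The sketch below is one way to do that and is not the CLI's actual parser: `ForwardSpec` and `parse_forward` are hypothetical, the three-field form defaulting to `127.0.0.1` is an assumption, and bracketed IPv6 addresses are not handled.

```rust
#[derive(Debug, PartialEq)]
struct ForwardSpec {
    local_addr: String,
    local_port: u16,
    remote_host: String,
    remote_port: u16,
}

/// Parse "[local_addr:]local_port:remote_host:remote_port".
fn parse_forward(spec: &str) -> Result<ForwardSpec, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    let (local_addr, rest) = match parts.len() {
        // Assumed default bind address when local_addr is omitted.
        3 => ("127.0.0.1", &parts[..]),
        4 => (parts[0], &parts[1..]),
        n => return Err(format!("expected 3 or 4 ':'-separated fields, got {}", n)),
    };
    let port = |s: &str| s.parse::<u16>().map_err(|e| format!("bad port {:?}: {}", s, e));
    if rest[1].is_empty() {
        return Err("remote host is empty".to_string());
    }
    Ok(ForwardSpec {
        local_addr: local_addr.to_string(),
        local_port: port(rest[0])?,
        remote_host: rest[1].to_string(),
        remote_port: port(rest[2])?,
    })
}

fn main() {
    println!("{:?}", parse_forward("5432:db.internal:5432"));
    println!("{:?}", parse_forward("0.0.0.0:8080:web.internal:80"));
}
```
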
### Channel Manager

The channel manager owns the `Arc<client::Handle<ClientHandler>>` and provides methods:

- `open_direct_tcpip(host, port)` — open a tunnel channel to a remote host
- `open_streamlocal(socket_path)` — open a tunnel to a Unix socket
- `request_tcpip_forward(addr, port)` — request remote listening
- `cancel_tcpip_forward(addr, port)` — cancel remote listening

It also handles reconnection: if `handle.is_closed()` returns true, it attempts to reconnect with exponential backoff.

### Reconnection

On transport failure:

1. Detect via `handle.is_closed()` or transport read error
2. Exponential backoff reconnect (1s, 2s, 4s, ... max 30s)
3. Re-establish transport connection
4. Re-authenticate SSH session
5. Notify SOCKS5 server and port forwards (in-flight connections fail, new connections work)

Reconnection is always enabled. The backoff caps at 30 seconds and continues indefinitely until the user terminates the process. Existing TCP connections through the tunnel are lost on reconnect — this is acceptable and consistent with how VPN connections behave.

The channel manager orchestrates reconnection: it creates a new transport stream (by calling `transport.connect()` again) and establishes a new SSH session over it (ADR-004). This is always a full reconnect; the SSH session is never re-established over the old transport stream. Port forward listeners (`-L`, `-R`) are re-registered with the new session after reconnection.

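
The backoff schedule in step 2 (1s, 2s, 4s, ... capped at 30s) can be written as a pure function of the attempt number. `backoff_delay` is a hypothetical helper; the spec gives the schedule, not this code, and jitter is not mentioned so none is added.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based): doubles each time,
/// capped at 30 seconds.
fn backoff_delay(attempt: u32) -> Duration {
    const CAP_SECS: u64 = 30;
    // checked_shl returns None once the shift would overflow the u64 width;
    // treat that the same as "past the cap".
    let secs = 1u64.checked_shl(attempt).unwrap_or(CAP_SECS).min(CAP_SECS);
    Duration::from_secs(secs)
}

fn main() {
    for attempt in 0..7 {
        println!("attempt {} -> {:?}", attempt, backoff_delay(attempt));
    }
}
```

A reconnect loop would sleep for `backoff_delay(n)` after the n-th failed attempt and reset `n` to zero once a session is established.
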
### Programmatic Configuration (ADR-011)

The client uses programmatic configuration — no `~/.ssh/config` parsing, no custom config files. Configuration comes from:

1. **CLI flags**: `--server`, `--identity`, `--transport`, etc.
2. **Library API**: `ConnectOptions` and `ServeOptions` structs in `alknet-core`, constructable programmatically
3. **Environment variables**: `ALKNET_SERVER`, `ALKNET_IDENTITY` as convenience defaults

This approach avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`) and makes the library API clean for programmatic consumers like the NAPI wrapper. Keys can be provided as file paths or in-memory data.

### Key Material Format

Key inputs (`--identity`, `--authorized-keys`, `--cert-authority`, `--key`) accept either:

- **File path**: A filesystem path to a key file (e.g., `~/.ssh/id_ed25519`, `/etc/alknet/ca.pub`)
- **In-memory data**: Raw key bytes provided programmatically via the library API or NAPI wrapper (as `Vec<u8>` in Rust, `Buffer` in Node.js)

The accepted format is **OpenSSH key format** (the format used by `ssh-keygen` and OpenSSH's `~/.ssh/` files). This includes:
- Private keys: OpenSSH format (begins with `-----BEGIN OPENSSH PRIVATE KEY-----`)
- Public keys: OpenSSH format (e.g., `ssh-ed25519 AAAA... user@host`)
- Certificate authority keys: OpenSSH public key format
- Authorized keys files: Standard OpenSSH `authorized_keys` format

PEM-encoded keys (PKCS#1, PKCS#8) are not supported. Use OpenSSH format keys throughout.

### CLI Interface

```bash
# Basic connection (TCP, default port 22)
alknet connect --server example.com --identity ~/.ssh/id_ed25519

# With TLS
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519

# With TLS + insecure (self-signed certs)
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519 --insecure

# With iroh (no public IP needed)
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519

# With iroh + custom relay
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --iroh-relay https://relay.example.com

# With iroh + proxy (transport chaining)
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --proxy socks5://127.0.0.1:1080

# SOCKS5 on custom port
alknet connect --server example.com --socks5 127.0.0.1:1080 --identity ~/.ssh/id_ed25519

# With port forwards
alknet connect --server example.com --forward 5432:db.internal:5432 --forward 6379:redis.internal:6379

# All options (reference listing for `alknet connect`; kept as comments
# because a comment after a trailing `\` would break the line continuation)
#   --server <addr>              TCP/TLS server address (required for tcp/tls)
#   --peer <endpoint-id>         iroh endpoint ID, base58-encoded (required for iroh)
#   --transport tcp|tls|iroh     Transport mode
#   --identity <path-or-buffer>  SSH private key (path or in-memory)
#   --socks5 <addr:port>         SOCKS5 listen address (default: 127.0.0.1:1080)
#   --forward <spec>             Port forward spec (repeatable)
#   --remote-forward <spec>      Remote port forward spec (repeatable)
#   --proxy <url>                Upstream proxy (socks5:// or http://)
#   --iroh-relay <url>           iroh relay URL (default: n0 relay)
#   --tls-server-name <host>     SNI hostname for TLS
#   --insecure                   Accept self-signed TLS certs
```

## Constraints

- SOCKS5 is always enabled when `alknet connect` runs (it's the primary interface). Port forwards are optional.
- The client does not log tunnel destinations. The SOCKS5 server connects and proxies — no logging of SOCKS5 request targets.
- Authentication is Ed25519 public key or OpenSSH certificate (ADR-012). No password authentication over SSH.
- Only one SSH session per `alknet connect` process. Multiple sessions = multiple processes (or a future multiplexer).
- No `~/.ssh/config` parsing. Configuration is programmatic via CLI flags, env vars, or library API structs (ADR-011).
- VPN-like "route all traffic" behavior is provided by running `tun2proxy --proxy socks5://127.0.0.1:1080` alongside the client, not by a built-in TUN interface (ADR-014).
- The CLI `alknet connect` command manages a full SSH session with SOCKS5 and port forwarding. The NAPI `connect()` function is a different operation — it opens a single SSH channel as a Duplex stream for programmatic use, with no SOCKS5 server or port forwarding. See napi-and-pubsub.md for details.

## Graceful Shutdown

On SIGTERM or SIGINT:

1. Stop accepting new SOCKS5 connections and port forward connections
2. Send an SSH disconnect message to the server
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds)
4. Close the transport stream
5. Exit

In-flight connections are not preserved across shutdown — they receive a connection reset. This matches the behavior of standard SSH tunnel tools.

## Error Handling
|
||||
|
||||
Error handling follows the project's layered pattern (see overview.md):
|
||||
|
||||
- **Transport errors**: Trigger reconnection with exponential backoff (see Reconnection section above). There is no retry limit: if reconnection keeps failing, the process continues retrying until the user terminates it.
|
||||
- **Auth errors**: Cause reconnection retry. After repeated auth failures, the SOCKS5 server and port-forward listeners remain active but new channel opens fail until reconnection succeeds.
|
||||
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels.
|
||||
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found) exit immediately.
|
||||
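The exponential backoff mentioned for transport errors can be sketched as below. This is a minimal illustration, not the project's reconnection code: the function name `backoff_delay` and the `base` and `cap` parameters are assumptions, and the actual delays are defined in the Reconnection section.

```rust
use std::time::Duration;

/// Delay before reconnection attempt `attempt` (0-based): `base * 2^attempt`,
/// never exceeding `cap`. `base` and `cap` are hypothetical parameters.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // `checked_shl` returns None once the shift reaches 32 bits; saturate instead.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // `checked_mul` returns None on overflow; fall back to the cap in that case.
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

Because the delay saturates at `cap`, a process that retries until terminated settles into a steady retry interval rather than growing without bound.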
|
||||
## Open Questions
|
||||
|
||||
None — all resolved.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
|
||||
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Client does not log SOCKS5 request targets |
|
||||
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs, env vars, or CLI flags |
|
||||
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | No password auth; OpenSSH cert-authority for multi-user |
|
||||
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
|
||||
@@ -1,329 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-09
|
||||
---
|
||||
|
||||
# Configuration
|
||||
|
||||
## What
|
||||
|
||||
Alknet's configuration is split into `StaticConfig` (immutable after startup) and
|
||||
`DynamicConfig` (hot-reloadable at runtime), with `ArcSwap` providing lock-free
|
||||
reads on the hot path. `ConfigService` wraps reloads behind an irpc protocol
|
||||
for production deployments.
|
||||
|
||||
## Why
|
||||
|
||||
Three specific failures motivated the split (ADR-030):
|
||||
|
||||
1. No hot reload of authentication credentials — adding a key requires a restart.
|
||||
2. No port forwarding access control — any authenticated client has unrestricted
|
||||
access (ADR-031).
|
||||
3. No structured configuration beyond CLI flags — operators need config files
|
||||
and the NAPI layer needs programmatic reload.
|
||||
|
||||
The split is clean: anything that affects SSH handshake or socket binding is
|
||||
static; anything checked per-connection or per-channel is dynamic.
|
||||
|
||||
## Architecture
|
||||
|
||||
### StaticConfig
|
||||
|
||||
Immutable after startup. Constructed from `ServeOptions` (the builder pattern
|
||||
is preserved per ADR-011). Contains:
|
||||
|
||||
- Transport mode, listen address
|
||||
- TLS config (cert, key)
|
||||
- iroh config (relay URL)
|
||||
- Stealth mode flag
|
||||
- Host key, host key algorithm
|
||||
- Max auth attempts, max connections per IP
|
||||
- Proxy config
|
||||
|
||||
Changing any of these requires a restart.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains:
|
||||
|
||||
- `AuthPolicy` — authorized keys, certificate authorities, token config
|
||||
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
|
||||
- `RateLimitConfig` — rate limiting parameters
|
||||
|
||||
`ArcSwap` provides lock-free reads. Every `auth_publickey()` and
`channel_open_direct_tcpip()` call loads the current `Arc<DynamicConfig>` without
taking a lock, so checking policy on the hot path adds negligible overhead.
Writes are atomic: `store()` swaps the pointer, and readers see either the old
config or the new one, never a mix.
|
||||
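The snapshot-and-swap behavior can be illustrated with the standard library alone. This sketch assumes a trimmed-down `DynamicConfig` with a single hypothetical field and uses `RwLock<Arc<_>>` as a stand-in for `ArcSwap`. Unlike `ArcSwap`, the stand-in takes a lock on reads, so it shows the semantics only, not the performance.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, trimmed-down config; the real `DynamicConfig` has more sections.
#[derive(Debug)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// Std-only stand-in for `ArcSwap<DynamicConfig>` (not lock-free).
pub struct Swappable {
    inner: RwLock<Arc<DynamicConfig>>,
}

impl Swappable {
    pub fn new(cfg: DynamicConfig) -> Self {
        Self { inner: RwLock::new(Arc::new(cfg)) }
    }

    /// Take a snapshot. The caller keeps this `Arc` even if a reload happens later.
    pub fn load(&self) -> Arc<DynamicConfig> {
        Arc::clone(&self.inner.read().unwrap())
    }

    /// Replace the whole config in one step; earlier snapshots are unaffected.
    pub fn store(&self, cfg: DynamicConfig) {
        *self.inner.write().unwrap() = Arc::new(cfg);
    }
}
```

A snapshot taken before `store()` still sees the old key list, while the next `load()` sees the new one.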
|
||||
### API Keys
|
||||
|
||||
`DynamicConfig.auth` also includes API keys for service accounts and HTTP
|
||||
interface auth (ADR-037):
|
||||
|
||||
```toml
|
||||
[[auth.api_keys]]
|
||||
prefix = "alk_"
|
||||
hash = "sha256:abc..."
|
||||
scopes = ["relay:connect"]
|
||||
description = "dashboard service account"
|
||||
ttl = "30d" # optional
|
||||
```
|
||||
|
||||
API keys are verified by `ConfigIdentityProvider::resolve_from_token()` — if
|
||||
the token starts with the configured prefix, it's treated as an API key and
|
||||
verified by SHA-256 hash lookup. Otherwise, it's treated as an Ed25519 AuthToken.
|
||||
Both paths produce the same `Identity` result.
|
||||
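The prefix-based dispatch described above might look like the following sketch. `TokenKind` and `classify_token` are illustrative names, not part of the spec, and the SHA-256 lookup and Ed25519 verification that follow classification are omitted. Treating an empty prefix as "no API keys configured" is an assumption of this sketch.

```rust
/// Which verification path a presented token takes.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenKind<'a> {
    /// Token starts with the configured API-key prefix.
    ApiKey(&'a str),
    /// Anything else is handled as an Ed25519 AuthToken.
    AuthToken(&'a str),
}

/// Classify a token by prefix only; no cryptographic check happens here.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    // Assumption: an empty prefix must not match every token.
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```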
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
|
||||
pub struct ConfigReloadHandle {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigReloadHandle {
|
||||
pub fn reload(&self, new_config: DynamicConfig) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload.
|
||||
|
||||
### ConfigServiceImpl
|
||||
|
||||
The Phase 1 implementation of config service logic, backed by
|
||||
`ArcSwap<DynamicConfig>`. Where `ConfigIdentityProvider` wraps the auth section
|
||||
of `DynamicConfig`, `ConfigServiceImpl` wraps the forwarding and rate-limit
|
||||
sections. Both are ArcSwap-backed and share the same `DynamicConfig` instance.
|
||||
|
||||
```rust
|
||||
pub struct ConfigServiceImpl {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigServiceImpl {
|
||||
pub fn forwarding_policy(&self) -> Arc<ForwardingPolicy> {
|
||||
self.dynamic.load().forwarding.clone()
|
||||
}
|
||||
|
||||
pub fn rate_limits(&self) -> Arc<RateLimitConfig> {
|
||||
self.dynamic.load().rate_limits.clone()
|
||||
}
|
||||
|
||||
pub fn reload(&self, new_config: DynamicConfig) {
|
||||
self.dynamic.store(Arc::new(new_config));
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Phase 1 deploys `ConfigServiceImpl` directly — no irpc service boundary. The
|
||||
`ConfigProtocol` irpc service (behind feature flag) wraps `ConfigServiceImpl`
|
||||
for production deployments that use the service layer. This mirrors the
|
||||
`ConfigIdentityProvider` / `AuthProtocol` pattern from [identity.md](identity.md)
|
||||
and ADR-028.
|
||||
|
||||
### ConfigService irpc Service
|
||||
|
||||
```rust
|
||||
enum ConfigProtocol {
|
||||
GetForwardingPolicy,
|
||||
GetRateLimits,
|
||||
ReloadForwarding { policy: ForwardingPolicy },
|
||||
ReloadRateLimits { limits: RateLimitConfig },
|
||||
}
|
||||
```
|
||||
|
||||
Behind the `irpc` feature flag. For production deployments that use the service
|
||||
layer. For minimal deployments, direct `ConfigReloadHandle::reload()` is
|
||||
sufficient.
|
||||
|
||||
### ForwardingPolicy
|
||||
|
||||
Part of DynamicConfig (ADR-031). Evaluated per-channel-open, matched against
|
||||
the authenticated `Identity`. Rules are evaluated in order; first match wins.
|
||||
Default determines fallback.
|
||||
|
||||
```rust
|
||||
pub struct ForwardingPolicy {
|
||||
pub default: ForwardingAction,
|
||||
pub rules: Vec<ForwardingRule>,
|
||||
}
|
||||
```
|
||||
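First-match-wins evaluation with a default fallback can be sketched as follows. The `ForwardingRule` fields and the `evaluate` method are assumptions made for illustration (the spec above only fixes `default` and `rules`), and matching against the authenticated `Identity` is left out to keep the sketch short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Hypothetical rule shape: exact host match, `None` port means any port.
pub struct ForwardingRule {
    pub host: String,
    pub port: Option<u16>,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl ForwardingPolicy {
    /// Walk the rules in order and return the action of the first match;
    /// fall back to `default` when no rule matches.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|r| r.host == host && r.port.map_or(true, |p| p == port))
            .map_or(self.default, |r| r.action)
    }
}
```

Because the first match wins, a broad allow rule placed before a narrower deny rule shadows the deny rule, so rule order matters.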
|
||||
### TOML Config File
|
||||
|
||||
Optional convenience input format (amends ADR-011, does not replace
|
||||
programmatic API). Covers static config plus initial auth/forwarding paths.
|
||||
|
||||
```toml
|
||||
[server]
|
||||
# Stream-based listener: TLS + SSH on port 443
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "tls"
|
||||
interface = "ssh"
|
||||
listen = "0.0.0.0:443"
|
||||
|
||||
[server.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
# Stream-based listener: TCP + SSH on port 22
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "tcp"
|
||||
interface = "ssh"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
# Stream-based listener: iroh P2P
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
|
||||
# Message-based listener: HTTP on port 443 (with stealth)
|
||||
[[listeners]]
|
||||
type = "http"
|
||||
listen = "0.0.0.0:443"
|
||||
tls = true
|
||||
stealth = true
|
||||
|
||||
# Message-based listener: HTTP on port 8080 (separate, no stealth)
|
||||
# [[listeners]]
|
||||
# type = "http"
|
||||
# listen = "0.0.0.0:8080"
|
||||
# tls = false
|
||||
# stealth = false
|
||||
|
||||
# Message-based listener: DNS on port 53
|
||||
# [[listeners]]
|
||||
# type = "dns"
|
||||
# listen = "0.0.0.0:53"
|
||||
# tls = false
|
||||
|
||||
[auth]
|
||||
host_key = "/etc/alknet/ssh/host_key"
|
||||
|
||||
[auth.ssh]
|
||||
authorized_keys = [...]
|
||||
|
||||
[auth.token]
|
||||
enabled = true
|
||||
max_token_age = "5m"
|
||||
|
||||
[[auth.api_keys]]
|
||||
prefix = "alk_"
|
||||
hash = "sha256:abc..."
|
||||
scopes = ["relay:connect"]
|
||||
description = "dashboard service account"
|
||||
ttl = "30d"
|
||||
|
||||
[forwarding]
|
||||
default = "deny"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "localhost:*"
|
||||
action = "allow"
|
||||
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
|
||||
interface AlknetServer {
|
||||
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
|
||||
reloadForwarding(policy: ForwardingPolicyConfig): void;
|
||||
reloadAll(config: DynamicConfig): void;
|
||||
}
|
||||
```
|
||||
|
||||
### Multi-Transport Listeners
|
||||
|
||||
A head node may accept connections on multiple transports and interfaces simultaneously.
|
||||
Listeners come in two categories: stream-based (Transport + StreamInterface pairs) and
|
||||
message-based (self-contained HTTP or DNS servers).
|
||||
|
||||
```rust
|
||||
pub enum ListenerConfig {
|
||||
Stream {
|
||||
transport: TransportKind,
|
||||
interface: StreamInterfaceKind,
|
||||
},
|
||||
Http {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool,
|
||||
stealth: bool, // byte-peek protocol detection on shared port
|
||||
},
|
||||
Dns {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
For stream-based listeners, `Server::run()` spawns one accept loop per listener.
|
||||
For HTTP listeners, it spawns an axum server. For DNS listeners, it spawns a DNS
|
||||
server. All share `DynamicConfig`, `ConnectionRateLimiter`, sessions, and
|
||||
shutdown signal.
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = true
|
||||
|
||||
[[listeners]]
|
||||
transport = "tcp"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
[[listeners]]
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
```
|
||||
|
||||
### CLI vs Programmatic Behavior
|
||||
|
||||
| Interface | Static config | Dynamic config | Reload mechanism |
|
||||
|-----------|--------------|----------------|------------------|
|
||||
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
|
||||
| Core Rust | `StaticConfig` struct | `AuthProtocol` (irpc) or `ConfigIdentityProvider` (ArcSwap) | `ConfigProtocol::ReloadForwarding` / `ConfigProtocol::ReloadRateLimits` (irpc) or `ConfigReloadHandle::reload()` |
|
||||
| NAPI | `serve()` options | Same | `server.reloadAuth()`, `server.reloadForwarding()` |
|
||||
|
||||
## Constraints
|
||||
|
||||
- `StaticConfig` cannot be changed after startup. Changing transport mode,
|
||||
listen address, TLS config, or host key requires a restart.
|
||||
- `DynamicConfig` is reloaded atomically via `ArcSwap`. Existing connections
|
||||
continue with their current config; new connections get the new config.
|
||||
- Config file is optional. `ServeOptions` builder pattern remains the primary
|
||||
API (amends ADR-011, does not supersede it).
|
||||
- No file watching (OQ-13 resolved: potential attack vector, unnecessary
|
||||
complexity).
|
||||
- Client configuration stays as `ConnectOptions` — no `ArcSwap` needed.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- None. All configuration-related questions are resolved per ADR-030, ADR-031,
|
||||
and the resolved OQs in [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | Immutable transport vs. reloadable auth/forwarding |
|
||||
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | Amended, not superseded — TOML is convenience layer |
|
||||
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Rule-based allow/deny, TransportKind-aware |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | DynamicConfig.auth consumed by IdentityProvider |
|
||||
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | ConfigService wraps DynamicConfig reloads |
|
||||
|
||||
## Phase 2 Implementation Notes
|
||||
|
||||
- `DynamicConfig.auth` now includes `api_keys: Vec<ApiKeyEntry>` (ADR-037)
|
||||
- `DynamicConfig.credentials: HashMap<String, CredentialSet>` added for static outbound credentials (ADR-036)
|
||||
- `ListenerConfig` restructured from flat struct to enum: `Stream { transport, interface }`, `Http { config: HttpListenerConfig }`, `Dns { config: DnsListenerConfig }` (ADR-035)
|
||||
- `HttpListenerConfig` and `DnsListenerConfig` builder-pattern structs added
|
||||
- `ListenerConfig::validate()` now validates all three variants
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../research/configuration.md) — Full analysis and proposed solution
|
||||
- [identity.md](identity.md) — IdentityProvider trait, DynamicConfig.auth
|
||||
- [ADR-013](decisions/013-fail2ban-friendly-logging.md) — Rate limiting parameters
|
||||
@@ -1,263 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-09
|
||||
---
|
||||
|
||||
# Credentials (Outbound Auth)
|
||||
|
||||
## What
|
||||
|
||||
The `CredentialProvider` trait and `CredentialSet` enum handle **outbound**
|
||||
authentication: how alknet authenticates _to_ external and self-hosted services.
|
||||
This is the complement to `IdentityProvider`, which handles **inbound**
|
||||
authentication (who is calling alknet).
|
||||
|
||||
## Why
|
||||
|
||||
Without `CredentialProvider`, each service wrapper would independently solve
|
||||
credential retrieval, caching, and lifecycle management. Cloud API integrations
|
||||
(vast.ai, runpod) need API keys. Self-hosted services (rustfs, gitea) need
|
||||
S3 access keys or OIDC tokens. The secret service can store these at rest, but
|
||||
the wiring between "decrypt a credential from storage" and "use it in an HTTP
|
||||
request" doesn't exist yet.
|
||||
|
||||
`CredentialProvider` provides a unified abstraction — just as `IdentityProvider`
|
||||
unifies inbound auth, `CredentialProvider` unifies outbound auth. Handlers
|
||||
access credentials through `OperationEnv`, not by reaching into storage directly.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Direction: Inbound vs Outbound
|
||||
|
||||
| | IdentityProvider | CredentialProvider |
|
||||
|---|---|---|
|
||||
| **Direction** | Inbound (who is calling alknet) | Outbound (how alknet calls others) |
|
||||
| **Resolves** | Fingerprint/token → `Identity` | Service name → `CredentialSet` |
|
||||
| **Storage** | `peer_credentials`, `api_keys` | Encrypted nodes in metagraph |
|
||||
| **Lifecycle** | Stateless lookup | May need refresh (OIDC tokens, S3 sessions) |
|
||||
| **Location** | `alknet_core::auth` | `alknet_core::credentials` |
|
||||
|
||||
Both live at the same architectural layer. A handler receives an
|
||||
`OperationContext` with `identity` (who called us) and can access credentials
|
||||
through `context.env` (how we call out).
|
||||
|
||||
### CredentialProvider Trait
|
||||
|
||||
```rust
|
||||
pub trait CredentialProvider: Send + Sync + 'static {
|
||||
fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
The trait is intentionally narrow. It returns credentials for a named service.
|
||||
It does not abstract the auth mechanism — that stays with the service wrapper
|
||||
that knows the protocol (S3 signing, OAuth2 refresh, etc.).
|
||||
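To show how narrow the trait is, here is a minimal in-memory implementation. `StaticCredentialProvider` is a hypothetical name, and `CredentialSet` is trimmed to a single variant so the sketch stays self-contained.

```rust
use std::collections::HashMap;

/// Trimmed to one variant for this sketch; the full enum is defined in the spec.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    Bearer { token: String },
}

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

/// Hypothetical provider backed by a fixed map from service name to credentials.
pub struct StaticCredentialProvider {
    by_service: HashMap<String, CredentialSet>,
}

impl StaticCredentialProvider {
    pub fn new(by_service: HashMap<String, CredentialSet>) -> Self {
        Self { by_service }
    }
}

impl CredentialProvider for StaticCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.by_service.get(service).cloned()
    }

    /// Static credentials have nothing to refresh, so this returns the stored value.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.get_credentials(service)
    }
}
```

A service wrapper would hold the provider as `Box<dyn CredentialProvider>` or `Arc<dyn CredentialProvider>` and ask for credentials by service name only.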
|
||||
### CredentialSet
|
||||
|
||||
```rust
|
||||
pub enum CredentialSet {
|
||||
ApiKey {
|
||||
header_name: String,
|
||||
token: String,
|
||||
},
|
||||
Basic {
|
||||
username: String,
|
||||
password: String,
|
||||
},
|
||||
Bearer {
|
||||
token: String,
|
||||
},
|
||||
S3AccessKey {
|
||||
access_key: String,
|
||||
secret_key: String,
|
||||
session_token: Option<String>,
|
||||
},
|
||||
OidcToken {
|
||||
access_token: String,
|
||||
refresh_token: Option<String>,
|
||||
expires_at: Option<u64>,
|
||||
},
|
||||
Custom {
|
||||
scheme: String,
|
||||
params: HashMap<String, String>,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Each variant carries the data needed for a specific auth mechanism. The service
|
||||
wrapper that requested the credentials knows what variant it expects and how to
|
||||
use it.
|
||||
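As an illustration of a service wrapper consuming a variant, the sketch below maps the variants that fit a single static HTTP header to a `(name, value)` pair. The helper `static_header` is hypothetical. Sending an OIDC access token as a Bearer header is an assumption of this sketch, and variants that need encoding or request signing are deliberately left to their protocol-specific wrappers.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}

/// Return `(header name, header value)` for variants that map to one static header.
pub fn static_header(creds: &CredentialSet) -> Option<(String, String)> {
    match creds {
        CredentialSet::ApiKey { header_name, token } => {
            Some((header_name.clone(), token.clone()))
        }
        CredentialSet::Bearer { token } => {
            Some(("Authorization".to_string(), format!("Bearer {}", token)))
        }
        // Assumption: the access token is presented as a Bearer token.
        CredentialSet::OidcToken { access_token, .. } => {
            Some(("Authorization".to_string(), format!("Bearer {}", access_token)))
        }
        // Basic needs base64 encoding, S3 needs per-request signing, and Custom
        // is scheme-specific, so none of them is handled here.
        CredentialSet::Basic { .. }
        | CredentialSet::S3AccessKey { .. }
        | CredentialSet::Custom { .. } => None,
    }
}
```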
|
||||
### CredentialProvider vs IdentityProvider
|
||||
|
||||
These are opposite-direction abstractions that compose through `OperationEnv`:
|
||||
|
||||
```
|
||||
Incoming Request
|
||||
│
|
||||
▼
|
||||
IdentityProvider (credential → Identity)
|
||||
│
|
||||
├── SSH fingerprint → Identity.id, .scopes, .resources
|
||||
├── Bearer AuthToken → Identity.id, .scopes, .resources
|
||||
└── API key → Identity.id, .scopes, .resources
|
||||
│
|
||||
▼
|
||||
OperationContext { identity, env, ... }
|
||||
│
|
||||
├── context.env.invoke("git", "push", input)
|
||||
│ └── GitService handler
|
||||
│ └── CredentialProvider (outbound)
|
||||
│ └── get_credentials("rustfs")
|
||||
│ └── S3AccessKey { access_key, secret_key }
|
||||
│
|
||||
└── context.env.invoke("secrets", "derive", input)
|
||||
└── local dispatch to SecretProtocol
|
||||
|
||||
Two directions: Inbound (who is calling us)
|
||||
Outbound (how we call others)
|
||||
```
|
||||
|
||||
### SecretStoreCredentialProvider (Phase 1 Default)
|
||||
|
||||
The default `CredentialProvider` implementation. Decrypts credentials via
|
||||
`SecretProtocol::Decrypt` and holds them in RAM:
|
||||
|
||||
```rust
|
||||
pub struct SecretStoreCredentialProvider {
|
||||
credentials: ArcSwap<HashMap<String, CredentialSet>>,
|
||||
}
|
||||
```
|
||||
|
||||
At startup, the CLI or NAPI assembly loads credentials from the secret service
|
||||
and populates the `ArcSwap`. The `refresh_credentials()` method re-decrypts
|
||||
after a `Lock`/`Unlock` cycle on the secret service.
|
||||
|
||||
### ManagedCredentialProvider (Phase C Future)
|
||||
|
||||
For self-hosted services that need active lifecycle management (S3 session
|
||||
token rotation, OIDC token refresh). Wraps `SecretStoreCredentialProvider`
|
||||
with per-service `CredentialManager` instances:
|
||||
|
||||
```rust
|
||||
pub struct ManagedCredentialProvider {
|
||||
base: SecretStoreCredentialProvider,
|
||||
managers: HashMap<String, Arc<dyn CredentialManager>>,
|
||||
}
|
||||
|
||||
pub trait CredentialManager: Send + Sync + 'static {
|
||||
fn refresh(&self, current: &CredentialSet) -> Option<CredentialSet>;
|
||||
fn is_expired(&self, current: &CredentialSet) -> bool;
|
||||
fn provision(&self, identity: &Identity) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
- `refresh`: OIDC token refresh, S3 session token rotation
|
||||
- `is_expired`: Check TTL before use
|
||||
- `provision`: Create credentials on a self-hosted service for a given identity
|
||||
|
||||
This is a Phase C concept. The spec defines the extension point but defers
|
||||
implementation.
|
||||
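Although the implementation is deferred, the `is_expired` check is easy to picture. The sketch below assumes `expires_at` holds Unix seconds (the spec does not state the unit) and adds a hypothetical `skew_secs` margin so a token is refreshed slightly before it lapses.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// True when a credential with the given expiry should be treated as expired.
/// Assumes `expires_at` is in Unix seconds; `None` means "never expires".
pub fn is_expired_at(expires_at: Option<u64>, now_unix_secs: u64, skew_secs: u64) -> bool {
    match expires_at {
        // Expire `skew_secs` early; saturating_add avoids overflow near u64::MAX.
        Some(t) => now_unix_secs.saturating_add(skew_secs) >= t,
        None => false,
    }
}

/// Current time in Unix seconds, or 0 if the system clock is before the epoch.
pub fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

Passing the current time as a parameter keeps the check deterministic in tests; production code would call `is_expired_at(expires_at, now_unix_secs(), skew)`.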
|
||||
### Integration with OperationEnv
|
||||
|
||||
Handlers access credentials through `OperationEnv`:
|
||||
|
||||
```rust
|
||||
// Handler needs outbound credentials for a service
|
||||
let creds = context.env.get_credentials("rustfs");
|
||||
```
|
||||
|
||||
This is analogous to how `context.env.invoke(namespace, op, input)` works for
|
||||
operation dispatch — the handler doesn't know whether the credential comes from
|
||||
config, the secret service, or a managed provider.
|
||||
|
||||
### Integration with SecretProtocol
|
||||
|
||||
Credentials are stored encrypted in the metagraph via `SecretProtocol`:
|
||||
|
||||
1. Operator configures credentials: `alknet credential add vast-ai --type bearer --token-file ./key.txt`
|
||||
2. CLI encrypts via `SecretProtocol::Encrypt` (AES-256-GCM, key at path `m/74'/2'/0'/0'`)
|
||||
3. Encrypted credential stored as `EncryptedData` node in metagraph, tagged with service name
|
||||
4. At startup, `SecretStoreCredentialProvider` calls `SecretProtocol::Decrypt` for each configured service
|
||||
5. Decrypted credentials held in RAM with same lifecycle as the seed (purged on `Lock`)
|
||||
|
||||
The `EncryptedData` wire format is shared with alknet-storage by type-level
|
||||
compatibility, not a crate dependency.
|
||||
|
||||
### Identity-Bound Credentials (Phase B+ Future)
|
||||
|
||||
For multi-tenant setups where different alknet users have different access levels
|
||||
on the same external service:
|
||||
|
||||
```rust
|
||||
// Service-level credential (all users share one key):
|
||||
credential_provider.get_credentials("rustfs")
|
||||
|
||||
// Identity-bound credential (per-user key):
|
||||
credential_provider.get_credentials_for("rustfs", &identity.id)
|
||||
```
|
||||
|
||||
The trait-level method is service-level. The identity-bound method is an
|
||||
extension in alknet-storage that uses `Identity.id` (the account UUID in
|
||||
database-backed deployments) as the lookup key. No separate `account_id` field
|
||||
needed — `Identity.id` IS the account identifier.
|
||||
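One way to picture the two lookups side by side is the sketch below. Everything here is hypothetical: `Credential` stands in for `CredentialSet`, `IdentityBoundStore` is an illustrative name, and falling back to the service-level credential when no per-identity entry exists is an assumption, not something the spec decides.

```rust
use std::collections::HashMap;

/// Placeholder standing in for `CredentialSet`.
#[derive(Debug, Clone, PartialEq)]
pub struct Credential(pub String);

pub struct IdentityBoundStore {
    /// Service name -> shared credential.
    pub per_service: HashMap<String, Credential>,
    /// (service name, identity id) -> per-user credential.
    pub per_identity: HashMap<(String, String), Credential>,
}

impl IdentityBoundStore {
    /// Service-level lookup: all users share one credential.
    pub fn get_credentials(&self, service: &str) -> Option<Credential> {
        self.per_service.get(service).cloned()
    }

    /// Identity-bound lookup keyed by `Identity.id`.
    /// Assumption: fall back to the shared credential when no per-user entry exists.
    pub fn get_credentials_for(&self, service: &str, identity_id: &str) -> Option<Credential> {
        self.per_identity
            .get(&(service.to_string(), identity_id.to_string()))
            .cloned()
            .or_else(|| self.get_credentials(service))
    }
}
```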
|
||||
## Constraints
|
||||
|
||||
- `CredentialProvider` and `CredentialSet` live in `alknet_core::credentials`.
|
||||
No database dependency at the core level.
|
||||
- `CredentialProvider` does not depend on `IdentityProvider`. They compose
|
||||
through `OperationEnv`, not through dependency.
|
||||
- `ManagedCredentialProvider` and `CredentialManager` are Phase C concepts.
|
||||
They are defined as extension points but not implemented yet.
|
||||
- Identity-bound credentials use `Identity.id` as the account key. In
|
||||
config-backed deployments, this is the fingerprint or key prefix. In
|
||||
database-backed deployments, this is the account UUID.
|
||||
- `SecretStoreCredentialProvider` depends on `SecretProtocol::Decrypt`, which
|
||||
requires the alknet-secret crate. A stub impl that reads from config is
|
||||
sufficient for Phase 2 when alknet-secret isn't available.
|
||||
- The `CredentialSet` variants cover all identified credential types (Phases
|
||||
A–C). Phase D (alknet as OIDC provider) is additive.
|
||||
|
||||
## Phase Progression
|
||||
|
||||
| Phase | CredentialProvider Scope | Notes |
|
||||
|-------|-------------------------|-------|
|
||||
| Phase 2 (now) | Trait + `CredentialSet` in core. `SecretStoreCredentialProvider` stub reads from config. | Enables Phase 2 HTTP auth |
|
||||
| Phase A | `SecretStoreCredentialProvider` backed by `SecretProtocol::Decrypt`. CLI command for credential management. | Full secret service integration |
|
||||
| Phase B | `FromOpenAPI` integration. `CredentialProvider` populates `HttpServiceConfig.auth`. | Auto-registration of external services |
|
||||
| Phase C | `ManagedCredentialProvider` + `CredentialManager`. S3 signing, OIDC refresh, identity-bound credentials. | Production self-hosted services |
|
||||
| Phase D | Alknet as OIDC provider. Eliminates stored credentials for OIDC-compatible services. | Long-term goal |
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-CP-01**: Should `CredentialProvider` support per-identity credentials
|
||||
(`get_credentials(service, identity)`)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-02**: Where should OIDC provider operations live if alknet becomes
|
||||
an OIDC provider (Phase D)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-03**: How do credential rotations propagate across a cluster? See
|
||||
[open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-04**: Should `CredentialSet` include request-signing capability?
|
||||
See [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials`, parallel to IdentityProvider |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | Inbound auth — the opposite direction |
|
||||
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
|
||||
|
||||
## References
|
||||
|
||||
- [identity.md](identity.md) — IdentityProvider (inbound auth, opposite direction)
|
||||
- [secret-service.md](secret-service.md) — SecretProtocol, EncryptedData
|
||||
- [services.md](services.md) — OperationEnv, OperationContext
|
||||
- [definitions.md](definitions.md) — IdentityProvider vs CredentialProvider disambiguation
|
||||
- [research/phase2/credential-provider.md](../research/phase2/credential-provider.md) — Full analysis with rustfs/gitea integration
|
||||
@@ -1,26 +0,0 @@
|
||||
# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
Alknet needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
|
||||
|
||||
russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in alknet's own type system or handle each transport case-by-case.
|
||||
|
||||
## Decision
|
||||
Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
|
||||
|
||||
On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
|
||||
|
||||
This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
|
||||
- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
|
||||
- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
|
||||
- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch: `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,30 +0,0 @@
|
||||
# ADR-002: TUN Shim as Separate Process
|
||||
|
||||
## Status
|
||||
Superseded by ADR-014
|
||||
|
||||
## Context
|
||||
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core alknet binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
|
||||
|
||||
The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
|
||||
|
||||
## Decision
|
||||
The TUN functionality is a separate `alknet-tun` binary that:
|
||||
1. Creates a TUN device (requires root / CAP_NET_ADMIN)
|
||||
2. Reads IP packets from it
|
||||
3. Forwards each connection to the core alknet's SOCKS5 port (127.0.0.1:1080)
|
||||
4. Proxies bytes between TUN packets and SOCKS5 connections
|
||||
|
||||
The core `alknet connect` binary never needs root. The `alknet-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Root-required code surface is tiny and auditable.
|
||||
- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
|
||||
- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
|
||||
- **Positive**: Matches the proven tun2proxy architecture.
|
||||
- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
|
||||
- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
|
||||
|
||||
## References
|
||||
- [tun-shim.md](../tun-shim.md)
|
||||
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy
|
||||
@@ -1,31 +0,0 @@
|
||||
# ADR-003: iroh Stream via tokio::io::join
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
|
||||
|
||||
Options considered:
|
||||
1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
|
||||
2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
|
||||
3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
|
||||
|
||||
## Decision
|
||||
Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
|
||||
|
||||
One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
|
||||
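To make the combinator's behavior concrete without pulling in tokio, here is a synchronous analog built on `std::io::Read` and `std::io::Write`. It is not the tokio implementation, only a sketch of the same idea: reads are delegated to one half and writes to the other. The `into_parts` helper is an addition for illustration.

```rust
use std::io::{self, Read, Write};

/// Synchronous analog of `tokio::io::Join`: one readable half, one writable half.
pub struct Join<R, W> {
    reader: R,
    writer: W,
}

/// Combine a reader and a writer into a single duplex-looking value.
pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Join<R, W> {
    Join { reader, writer }
}

impl<R, W> Join<R, W> {
    /// Split the value back into its two halves.
    pub fn into_parts(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

impl<R: Read, W> Read for Join<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf) // reads only ever touch the reader half
    }
}

impl<R, W: Write> Write for Join<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf) // writes only ever touch the writer half
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

The async version has the same shape, with `AsyncRead` delegated to `RecvStream` and `AsyncWrite` delegated to `SendStream`.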
|
||||
If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
|
||||
|
||||
Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Minimal code. One line to bridge iroh and russh.
|
||||
- **Positive**: No custom types to maintain.
|
||||
- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
|
||||
- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,28 +0,0 @@
|
||||
# ADR-004: SSH Runs Over Transport, Not Alongside
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
There are two ways to structure the relationship between SSH and the transport layer:
|
||||
|
||||
1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
|
||||
|
||||
2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
|
||||
|
||||
## Decision
|
||||
SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
|
||||
|
||||
This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any `AsyncRead+AsyncWrite+Unpin+Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
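The layering can be sketched with blocking std types. Everything named here is hypothetical: the trait below is a simplified stand-in for the project's `Transport` trait (whose real signature is not given in this ADR), and `exchange_banners` stands in for the SSH layer, which in the real code is russh running over an async stream.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical, blocking stand-in for the `Transport` trait: each
/// `connect()` yields one duplex stream and nothing else.
trait Transport {
    type Stream: Read + Write;
    fn connect(&self) -> io::Result<Self::Stream>;
}

/// In-memory stream: reads replay `inbound`, writes land in `outbound`.
struct MemStream {
    inbound: Cursor<Vec<u8>>,
    outbound: Vec<u8>,
}

impl Read for MemStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inbound.read(buf)
    }
}

impl Write for MemStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.outbound.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Mock transport that never touches the network.
struct MockTransport {
    peer_banner: &'static [u8],
}

impl Transport for MockTransport {
    type Stream = MemStream;

    fn connect(&self) -> io::Result<MemStream> {
        Ok(MemStream {
            inbound: Cursor::new(self.peer_banner.to_vec()),
            outbound: Vec::new(),
        })
    }
}

/// Stand-in for the SSH layer: it sees only `Read + Write`, never a socket.
/// Sends a local identification line and returns the peer's.
fn exchange_banners<S: Read + Write>(stream: &mut S) -> io::Result<String> {
    stream.write_all(b"SSH-2.0-alknet\r\n")?;
    let mut peer = String::new();
    stream.read_to_string(&mut peer)?;
    Ok(peer.trim_end().to_string())
}

fn main() -> io::Result<()> {
    let transport = MockTransport { peer_banner: b"SSH-2.0-mock\r\n" };
    let mut stream = transport.connect()?;
    let peer = exchange_banners(&mut stream)?;
    println!("peer banner: {}", peer);
    Ok(())
}
```

The protocol function compiles against any `Read + Write`, so swapping the mock for a TCP, TLS, or iroh-backed stream changes only the `Transport` implementation.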
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
|
||||
- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
|
||||
- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
|
||||
- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through alknet) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
|
||||
- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
|
||||
- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,39 +0,0 @@
|
||||
# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
A "VPN-like" tool needs to route traffic. There are three approaches:
|
||||
|
||||
1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
|
||||
2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
|
||||
3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
|
||||
|
||||
## Decision
|
||||
SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
|
||||
|
||||
SOCKS5 is the core because:
|
||||
- It requires no privileges
|
||||
- `curl --socks5-hostname` works everywhere
|
||||
- Browsers, most CLI tools, and many applications support SOCKS5
|
||||
- SOCKS5h prevents DNS leaks by resolving names server-side
|
||||
- It's the interface that the NAPI wrapper and pubsub adapter build on
|
||||
- TUN is only needed for "route all traffic" use cases, which are a subset of users
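The SOCKS5h point can be made concrete with the wire format. The sketch below builds a SOCKS5 CONNECT request (RFC 1928) that carries the hostname itself (address type `0x03`) rather than a locally resolved IP, which is what keeps the DNS lookup on the far side of the tunnel. The function name is hypothetical, and the method-negotiation handshake that precedes this request is omitted.

```rust
/// Builds a SOCKS5 CONNECT request that carries the hostname itself,
/// so name resolution happens at the proxy (the "h" in SOCKS5h).
/// Returns `None` if the name is empty or longer than 255 bytes.
fn socks5_connect_by_name(host: &str, port: u16) -> Option<Vec<u8>> {
    let name = host.as_bytes();
    if name.is_empty() || name.len() > 255 {
        return None;
    }
    let mut req = Vec::with_capacity(7 + name.len());
    req.push(0x05); // VER: SOCKS version 5
    req.push(0x01); // CMD: CONNECT
    req.push(0x00); // RSV: reserved
    req.push(0x03); // ATYP: domain name
    req.push(name.len() as u8); // one-byte length prefix
    req.extend_from_slice(name);
    req.extend_from_slice(&port.to_be_bytes()); // port in network byte order
    Some(req)
}

fn main() {
    let req = socks5_connect_by_name("example.com", 443).expect("valid hostname");
    println!("{:02x?}", req);
}
```

A client that resolved the name locally would instead send address type `0x01` or `0x04` with raw IP bytes, and the lookup would be visible on the local network.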
|
||||
|
||||
TUN forwards to SOCKS5 rather than directly to SSH because:
|
||||
- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
|
||||
- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
|
||||
- The `alknet-tun` binary stays minimal (~200-500 lines)
|
||||
- No root code in the core binary
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
|
||||
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
|
||||
- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
|
||||
- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `alknet connect` — two processes instead of one integrated binary.
|
||||
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [tun-shim.md](../tun-shim.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-006: No Logging of Tunnel Destinations
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
|
||||
|
||||
- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
|
||||
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
|
||||
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
|
||||
|
||||
However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
|
||||
|
||||
## Decision
|
||||
The server does NOT log:
|
||||
- `channel_open_direct_tcpip` destinations (host, port)
|
||||
- DNS resolutions performed by the server on behalf of clients
|
||||
- Bytes transferred through tunnel channels
|
||||
- Connection duration or throughput
|
||||
|
||||
The server DOES log (ADR-013):
|
||||
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
|
||||
- Connection opened (remote_addr, transport kind)
|
||||
- Connection closed (remote_addr, duration)
|
||||
|
||||
This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
|
||||
- **Positive**: Reduces legal and privacy exposure for server operators.
|
||||
- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
|
||||
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside alknet (e.g., network-level logging at the target host).
|
||||
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
|
||||
@@ -1,26 +0,0 @@
|
||||
# ADR-007: NAPI Exposes Single Duplex Stream
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper for alknet could expose different granularity levels:
|
||||
|
||||
1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
|
||||
2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
|
||||
|
||||
## Decision
|
||||
Option 2: NAPI exposes a single duplex stream.
|
||||
|
||||
The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
|
||||
|
||||
If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
|
||||
- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
|
||||
- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
|
||||
- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-008: ACME/Let's Encrypt Certificate Provisioning
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.
|
||||
|
||||
There are two ACME flows:
|
||||
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
|
||||
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
|
||||
|
||||
Both flows are important for alknet's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
|
||||
|
||||
## Decision
|
||||
Support both ACME certificate provisioning paths:
|
||||
|
||||
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
|
||||
|
||||
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
|
||||
|
||||
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
|
||||
|
||||
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps alknet self-contained as a single binary.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Users can run `alknet serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
|
||||
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
|
||||
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
|
||||
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
|
||||
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
|
||||
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-01](../open-questions.md) — resolved by this ADR
|
||||
- [OQ-07](../open-questions.md) — resolved by this ADR
|
||||
- Production certbot setup: [certbot.md](../../research/ops/certbot.md)
|
||||
@@ -1,28 +0,0 @@
|
||||
# ADR-009: Default iroh Relay with Override
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
|
||||
|
||||
- n0's relay could change terms, rate-limit, or go down
|
||||
- Production deployments may want self-hosted relays for reliability and privacy
|
||||
- The relay URL is a configuration point that should be explicit
|
||||
|
||||
Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
|
||||
|
||||
## Decision
|
||||
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
|
||||
|
||||
This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Zero-config iroh transport for testing and development. `alknet serve --transport iroh` just works.
|
||||
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
|
||||
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
|
||||
- **Negative**: Privacy-conscious users must remember to `--iroh-relay` to avoid n0. Mitigated by documentation.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-02](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,33 +0,0 @@
|
||||
# ADR-010: Transport Chaining in CLI
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
Transport chaining allows combining iroh with an upstream proxy, e.g.:
|
||||
|
||||
```bash
|
||||
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another alknet instance. This is important for:
|
||||
- Nested tunnel topologies
|
||||
- Environments where iroh needs to go through an existing proxy
|
||||
- Composing transports in flexible ways
|
||||
|
||||
iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
|
||||
|
||||
## Decision
|
||||
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
|
||||
|
||||
For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
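The "parse the proxy URL" step might look like the following sketch. The function and error names are hypothetical, and it accepts only a literal `ip:port` after the scheme, whereas a full implementation would presumably also accept hostnames. How the parsed address is handed to iroh's endpoint builder is not shown, since this ADR does not name the builder method.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum ProxyError {
    /// The URL does not start with `socks5://`.
    UnsupportedScheme,
    /// The part after the scheme is not a literal `ip:port`.
    BadAddress,
}

/// Parses a `--proxy` value such as `socks5://127.0.0.1:1080`.
fn parse_socks5_proxy(url: &str) -> Result<SocketAddr, ProxyError> {
    let rest = url
        .strip_prefix("socks5://")
        .ok_or(ProxyError::UnsupportedScheme)?;
    rest.trim_end_matches('/')
        .parse()
        .map_err(|_| ProxyError::BadAddress)
}

fn main() {
    match parse_socks5_proxy("socks5://127.0.0.1:1080") {
        Ok(addr) => println!("proxy at {}", addr),
        Err(e) => eprintln!("invalid --proxy value: {:?}", e),
    }
}
```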
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Flexible transport composition without requiring separate manual configuration.
|
||||
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
|
||||
- **Positive**: Implementation is minimal — iroh already supports proxy config.
|
||||
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-05](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-011: Programmatic-First API, No File-Based Config
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
|
||||
|
||||
1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
|
||||
2. **Custom config file**: Alknet-specific config file (TOML/YAML) with host definitions.
|
||||
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
|
||||
4. **Hybrid**: `--config` flag pointing to an alknet-specific config file, but no OpenSSH config parsing.
|
||||
|
||||
## Decision
|
||||
Option 3: Programmatic-first API. Configuration is provided via:
|
||||
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
|
||||
- **Library API**: `alknet_core::client::ConnectOptions` and `alknet_core::server::ServeOptions` structs, constructable programmatically
|
||||
- **Environment variables**: for a few convenience defaults (e.g., `ALKNET_SERVER`, `ALKNET_IDENTITY`)
|
||||
|
||||
No `~/.ssh/config` parsing, no alknet-specific config files. This approach:
|
||||
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
|
||||
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
|
||||
- Keeps the CLI simple and explicit — no hidden behavior from config files
|
||||
- Matches the design principle that the library crate (`alknet-core`) is the primary interface
|
||||
|
||||
If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
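A minimal sketch of that layering, under stated assumptions: the struct below is a hypothetical shape (the real `ConnectOptions` fields are not given in this ADR), the `tcp` default is assumed, and the environment lookup is passed in as a closure so the core logic never reads process state or files itself.

```rust
/// Hypothetical shape; the real `ConnectOptions` fields are not given here.
#[derive(Debug, Clone, PartialEq)]
struct ConnectOptions {
    server: String,
    identity: Option<String>,
    transport: String,
}

/// Resolves options with the precedence: explicit flag, then environment
/// variable, then (for the transport only) an assumed default of "tcp".
fn resolve(
    server_flag: Option<&str>,
    identity_flag: Option<&str>,
    transport_flag: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<ConnectOptions, String> {
    let server = server_flag
        .map(str::to_string)
        .or_else(|| env("ALKNET_SERVER"))
        .ok_or_else(|| "no server: pass --server or set ALKNET_SERVER".to_string())?;
    let identity = identity_flag
        .map(str::to_string)
        .or_else(|| env("ALKNET_IDENTITY"));
    let transport = transport_flag.unwrap_or("tcp").to_string();
    Ok(ConnectOptions { server, identity, transport })
}

fn main() {
    // The CLI layer supplies the real environment; tests can supply a stub.
    let opts = resolve(None, None, Some("tls"), |key| std::env::var(key).ok());
    println!("{:?}", opts);
}
```

A future config-file layer would be one more caller that fills in the same arguments, which is the separation the paragraph above describes.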
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
|
||||
- **Positive**: No cross-platform path issues in the core library.
|
||||
- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
|
||||
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
|
||||
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
|
||||
- **Negative**: No config file convenience. Users coming from `ssh config` may find this inconvenient.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [OQ-06](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,42 +0,0 @@
|
||||
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
SSH authentication has several options:
|
||||
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
|
||||
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
|
||||
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
|
||||
|
||||
The question is which auth methods to support and prioritize.
|
||||
|
||||
## Decision
|
||||
|
||||
**Primary: Ed25519 public key** (already specified, no change).
|
||||
|
||||
**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
|
||||
|
||||
**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
|
||||
- It's less secure than key-based auth
|
||||
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
|
||||
- It's not needed when cert-authority provides easy multi-user management
|
||||
- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
|
||||
|
||||
The server's `authorized_keys` file format follows OpenSSH conventions:
|
||||
- Regular keys: `ssh-ed25519 AAAA... user@host`
|
||||
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
|
||||
- CA trusts with options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
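The three line shapes above can be told apart with a small parser. This is a simplified sketch with hypothetical names: it assumes key types start with `ssh-` (true for the Ed25519 keys this ADR covers) and it does not handle quoted option values containing spaces or commas, which the full OpenSSH format allows.

```rust
#[derive(Debug, PartialEq)]
struct KeyEntry {
    options: Vec<String>,
    key_type: String,
    key_data: String,
    comment: String,
}

impl KeyEntry {
    /// True when the entry trusts a CA rather than a single user key.
    fn is_cert_authority(&self) -> bool {
        self.options.iter().any(|o| o == "cert-authority")
    }
}

/// Parses one `authorized_keys` line; returns `None` for blank lines,
/// comments, and lines missing the key type or key data.
fn parse_line(line: &str) -> Option<KeyEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut tokens = line.split_whitespace();
    let first = tokens.next()?;
    // If the first token is not a key type, treat it as the option list.
    let (options, key_type) = if first.starts_with("ssh-") {
        (Vec::new(), first.to_string())
    } else {
        let opts: Vec<String> = first.split(',').map(str::to_string).collect();
        (opts, tokens.next()?.to_string())
    };
    let key_data = tokens.next()?.to_string();
    // Everything after the key data is a free-form comment.
    let comment = tokens.collect::<Vec<_>>().join(" ");
    Some(KeyEntry { options, key_type, key_data, comment })
}

fn main() {
    let lines = [
        "ssh-ed25519 AAAA... user@host",
        "cert-authority ssh-ed25519 AAAA... CA name",
        "cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name",
    ];
    for line in lines {
        if let Some(entry) = parse_line(line) {
            println!("ca={} options={:?}", entry.is_cert_authority(), entry.options);
        }
    }
}
```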
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
|
||||
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
|
||||
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
|
||||
- **Positive**: `russh` supports OpenSSH certificate verification natively.
|
||||
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
|
||||
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [server.md](../server.md)
|
||||
- [OQ-04](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,39 +0,0 @@
|
||||
# ADR-013: Fail2ban-Friendly Server Logging
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
|
||||
|
||||
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
|
||||
|
||||
## Decision
|
||||
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
|
||||
|
||||
**Logging** (for fail2ban integration on Linux):
|
||||
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
|
||||
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
|
||||
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
|
||||
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
|
||||
|
||||
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
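As an illustration, the auth-attempt line could be produced by a formatter like the one below. The function and type names are hypothetical. The sanitizing step is an assumption that goes beyond this ADR: the username is attacker-controlled, so characters that could forge extra fields or lines are replaced before logging.

```rust
use std::net::IpAddr;

#[derive(Clone, Copy, Debug)]
enum AuthResult {
    Accept,
    Reject,
}

/// Replaces characters that could break the `key=value, ` layout.
/// (Assumption: the ADR does not specify escaping rules.)
fn sanitize(field: &str) -> String {
    field
        .chars()
        .map(|c| if c.is_control() || c.is_whitespace() || c == ',' { '_' } else { c })
        .collect()
}

/// Formats one auth-attempt line using the field order from the ADR.
/// No tunnel destination appears anywhere in the signature.
fn auth_attempt_line(
    remote_addr: IpAddr,
    user: &str,
    key_fingerprint: &str,
    result: AuthResult,
) -> String {
    let result = match result {
        AuthResult::Accept => "accept",
        AuthResult::Reject => "reject",
    };
    format!(
        "level=INFO, msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
        remote_addr,
        sanitize(user),
        key_fingerprint,
        result
    )
}

fn main() {
    let ip: IpAddr = "203.0.113.7".parse().expect("valid IP literal");
    println!("{}", auth_attempt_line(ip, "root", "SHA256:abc", AuthResult::Reject));
}
```

Keeping the destination out of the function signature, rather than filtering it later, makes the no-destination rule hard to violate by accident.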
|
||||
|
||||
**Built-in rate limiting** (for all platforms):
|
||||
- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
|
||||
- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
|
||||
- Rate limiting happens at the SSH layer, before channels are opened
|
||||
|
||||
This ensures that even without fail2ban, the server rejects obviously abusive connections.
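The per-IP limit can be sketched as a counter map. The type and method names are hypothetical; a limit of `0` means unlimited, matching the `--max-connections-per-ip` default described above. A real server would share this state behind a lock or an actor, which is omitted here.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP.
struct ConnectionLimiter {
    max_per_ip: usize, // 0 = unlimited
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns true and counts the connection if the IP is under its limit.
    fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection closes; drops the entry once it reaches zero.
    fn release(&mut self, ip: IpAddr) {
        let now_idle = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if now_idle {
            self.active.remove(&ip);
        }
    }
}

fn main() {
    let ip: IpAddr = "198.51.100.4".parse().expect("valid IP literal");
    let mut limiter = ConnectionLimiter::new(2);
    for attempt in 1..=3 {
        println!("attempt {}: accepted={}", attempt, limiter.try_acquire(ip));
    }
}
```

Removing idle entries in `release` keeps the map from growing with every address that has ever connected.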
|
||||
|
||||
## Consequences
|
||||
- **Positive**: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
|
||||
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
|
||||
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
|
||||
- **Negative**: Slightly more code in the server for connection tracking per IP.
|
||||
- **Negative**: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-08](../open-questions.md) — resolved by this ADR
|
||||
- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)
|
||||
@@ -1,41 +0,0 @@
|
||||
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The original plan included a TUN shim (`alknet-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through alknet's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
|
||||
|
||||
However, TUN implementation has significant complexities:
|
||||
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
|
||||
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
|
||||
- Virtual DNS handling
|
||||
- Root/CAP_NET_ADMIN requirements
|
||||
- TUN is easy to get wrong and hard to debug
|
||||
|
||||
The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
|
||||
|
||||
## Decision
|
||||
Defer TUN implementation entirely. Remove `alknet-tun` from the architecture. Instead:
|
||||
|
||||
1. **Core interface**: alknet's local SOCKS5 proxy (always available, no root required)
|
||||
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `alknet connect`
|
||||
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
|
||||
|
||||
This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `alknet-tun` can be added as a thin wrapper around tun2proxy's pattern.
|
||||
|
||||
The `tun` feature flag and `alknet-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
|
||||
- **Positive**: tun2proxy is already well-tested for this exact use case.
|
||||
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
|
||||
- **Positive**: Cleaner architecture — alknet only does SSH tunneling + SOCKS5. tun2proxy does TUN.
|
||||
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
|
||||
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
|
||||
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
|
||||
|
||||
## References
|
||||
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
|
||||
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
|
||||
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
|
||||
@@ -1,27 +0,0 @@
|
||||
# ADR-015: napi-rs for FFI Bridge
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
|
||||
|
||||
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
|
||||
|
||||
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
|
||||
|
||||
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
|
||||
|
||||
## Decision
|
||||
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
|
||||
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
|
||||
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
|
||||
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
|
||||
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [OQ-11](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,40 +0,0 @@
|
||||
# ADR-016: NAPI Exposes Both connect() and serve()
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to alknet's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
|
||||
|
||||
1. **connect()**: Establish a client connection to an alknet server. Used by workers/spokes that need to tunnel events through an alknet server.
|
||||
2. **serve()**: Start an alknet server from Node.js. Used by hubs that want to accept alknet connections and route events.
|
||||
|
||||
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `alknet serve` process.
|
||||
|
||||
More importantly, both `connect()` and `serve()` are fundamental operations of the alknet library. Since the NAPI wrapper is a thin layer over `alknet-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
|
||||
|
||||
## Decision
|
||||
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
|
||||
|
||||
```typescript
|
||||
// @alkdev/alknet
|
||||
function connect(options: AlknetConnectOptions): Promise<Duplex>;
|
||||
function serve(options: AlknetServeOptions): Promise<AlknetServer>;
|
||||
```
|
||||
|
||||
- `connect()` returns a `Duplex` stream (as per ADR-007)
|
||||
- `serve()` returns an `AlknetServer` object with a `close()` method and events for new connections
|
||||
|
||||
The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
|
||||
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
|
||||
- **Positive**: Implementation is still minimal — `serve()` is just `alknet_core::server::run()` behind `#[napi]`.
|
||||
- **Negative**: Slightly larger API surface (two functions + `AlknetServer` type instead of just `connect()`).
|
||||
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `AlknetServer`.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
|
||||
- [OQ-10](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,30 +0,0 @@
|
||||
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
When running an alknet server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This matters for censorship circumvention: if SSH traffic on port 443 is detectable, it can be blocked.
|
||||
|
||||
After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.
|
||||
|
||||
## Decision
|
||||
When `--stealth` is enabled with TLS transport:
|
||||
|
||||
1. After completing the TLS handshake, peek at the first few bytes of the connection
|
||||
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
|
||||
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection
|
||||
|
||||
This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.
|
||||
|
||||
The fake response uses `Server: nginx` headers to match the most common web server profile.
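
The peek-and-branch step can be sketched as a pure function over the bytes peeked after the TLS handshake. This is a minimal illustration, not the deleted implementation: the names `Sniffed`, `sniff`, and `decoy_response` are hypothetical, and the `NeedMoreData` case (fewer than eight bytes available so far) is an assumption about how a short read might be handled.

```rust
/// Prefix of the identification string an SSH-2.0 client sends first.
const SSH_BANNER_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy response quoted in the decision above.
const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

/// Outcome of inspecting the first bytes of a connection (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Sniffed {
    /// Bytes start with `SSH-2.0-`: hand the stream to the SSH session.
    Ssh,
    /// Anything else: send the decoy response and close.
    Decoy,
    /// Too few bytes to decide, but still consistent with an SSH banner.
    NeedMoreData,
}

/// Classifies the bytes peeked so far without consuming them.
pub fn sniff(peeked: &[u8]) -> Sniffed {
    if peeked.len() >= SSH_BANNER_PREFIX.len() {
        if peeked.starts_with(SSH_BANNER_PREFIX) {
            Sniffed::Ssh
        } else {
            Sniffed::Decoy
        }
    } else if SSH_BANNER_PREFIX.starts_with(peeked) {
        // A partial read such as b"SSH-" could still become a valid banner.
        Sniffed::NeedMoreData
    } else {
        Sniffed::Decoy
    }
}

/// Bytes written to any connection classified as `Decoy`.
pub fn decoy_response() -> &'static [u8] {
    DECOY_RESPONSE
}
```

Keeping the classification separate from the socket I/O makes the branch easy to test; a real server would still need a read timeout so that a silent client cannot hold the connection in `NeedMoreData` indefinitely.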
|
||||
|
||||
## Consequences
|
||||
- **Positive**: TLS+alknet servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
|
||||
- **Positive**: Simple implementation — just peek at the first bytes and branch.
|
||||
- **Positive**: Consistent with censorship circumvention best practices.
|
||||
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
|
||||
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-018: Control Channel for PubSub over SSH
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper and pubsub integration need a way to use alknet's SSH channel as a data plane for event routing. When an `alknet connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).
|
||||
|
||||
For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus — not a TCP connection to a random host. There are several approaches:
|
||||
|
||||
1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `alknet-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
|
||||
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
|
||||
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.
|
||||
|
||||
## Decision
|
||||
Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `alknet-control:0`:
|
||||
|
||||
1. The `channel_open_direct_tcpip` handler detects the special target via string matching
|
||||
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
|
||||
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
|
||||
|
||||
The destination string `alknet-control` is reserved. Regular TCP targets are hostnames or IP addresses, so there is no collision risk.
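
A rough sketch of the string-matching step, assuming the handler receives the requested host and port. The `Route` enum and `route_direct_tcpip` function are hypothetical names. Rejecting every other `alknet-`-prefixed host is one possible reading of the prefix reservation mentioned under Consequences, not something this ADR specifies.

```rust
/// Reserved destination host for the control channel (from this ADR).
pub const CONTROL_HOST: &str = "alknet-control";

/// Prefix namespace reserved for internal destinations.
pub const RESERVED_PREFIX: &str = "alknet-";

/// Where a `direct_tcpip` request should go (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Bridge the channel to the local event bus.
    ControlChannel,
    /// Ordinary outbound TCP connection.
    Tcp { host: String, port: u32 },
    /// Reserved name that is not a known internal destination (assumption).
    Rejected,
}

/// Decides how to handle a `direct_tcpip` channel-open request.
pub fn route_direct_tcpip(host: &str, port: u32) -> Route {
    if host == CONTROL_HOST && port == 0 {
        Route::ControlChannel
    } else if host.starts_with(RESERVED_PREFIX) {
        // Never dial out to a reserved name, even if DNS would resolve it.
        Route::Rejected
    } else {
        Route::Tcp { host: host.to_string(), port }
    }
}
```

Defining the reserved strings as constants addresses the "magic constant" concern listed under Consequences.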
|
||||
|
||||
Approach 2 (port forwarding to a specific port) is still supported as an alternative — the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.
|
||||
|
||||
Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Simple implementation — just string matching in the server's `channel_open_direct_tcpip` handler.
|
||||
- **Positive**: No separate port or service needs to run on the server. The control channel is built into alknet.
|
||||
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
|
||||
- **Positive**: Port forwarding to a specific port is still available as an alternative.
|
||||
- **Negative**: The string `alknet-control` is a magic constant. It should be defined as a constant in the crate.
|
||||
- **Negative**: Regular TCP destinations accidentally matching `alknet-control` would be misrouted. Mitigated by reserving the entire `alknet-` prefix namespace.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [server.md](../server.md)
|
||||
@@ -1,42 +0,0 @@
|
||||
# ADR-019: `--proxy` Has Different Semantics on Client vs Server
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The `--proxy` CLI flag appears on both `alknet connect` (client) and `alknet serve` (server), but the two sides proxy fundamentally different things:
|
||||
|
||||
- **Client**: `--proxy` routes the *transport connection* through the proxy. For example, `alknet connect --transport iroh --proxy socks5://127.0.0.1:1080` means the iroh endpoint's outbound TCP connections go through the specified SOCKS5 proxy before reaching the iroh relay. The proxy wraps the transport layer.
|
||||
|
||||
- **Server**: `--proxy` routes *outbound target connections* through the proxy. For example, `alknet serve --proxy socks5://127.0.0.1:9050` means when an SSH client opens a `direct_tcpip` channel to `db.internal:5432`, the server connects to that target through the specified proxy. The proxy wraps the data-plane connections.
|
||||
|
||||
Using the same flag name for both is intentional — from the user's perspective, both mean "route traffic through a proxy." But the layer at which the proxy operates differs, and this needs to be explicit so implementers don't confuse the two.
|
||||
|
||||
ADR-010 addressed transport chaining for the client side only. The server-side outbound proxy behavior has no ADR. This ADR documents both semantics and the rationale for sharing the flag name.
|
||||
|
||||
## Decision
|
||||
The `--proxy` flag uses the same name on client and server, with documented different semantics:
|
||||
|
||||
| Side | Flag | What gets proxied | Example |
|------|------|-------------------|---------|
| Client | `--proxy` | Transport connection (outbound to server/relay) | `--transport iroh --proxy socks5://...` → iroh endpoint connects through proxy |
| Server | `--proxy` | Outbound target connections (data plane) | `--proxy socks5://...` → direct_tcpip targets reached through proxy |
|
||||
|
||||
On the **client**, `--proxy` affects the transport layer. It only applies to transports that make outbound TCP connections (iroh through a proxy, TLS through a proxy). For plain TCP transport, `--proxy` has no meaningful effect since the transport is already a direct TCP connection — use the SOCKS5 server instead.
|
||||
|
||||
On the **server**, `--proxy` affects the data plane. All `channel_open_direct_tcpip` outbound connections are routed through the proxy, regardless of transport mode.
|
||||
|
||||
This is not a naming collision — it's the same conceptual operation ("route through a proxy") at different layers. The shared name avoids forcing users to learn two proxy flags.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: One flag name (`--proxy`) instead of two. Users already understand "proxy" as "route through this."
|
||||
- **Positive**: Client-side proxy is minimal implementation — iroh's endpoint builder accepts proxy config natively.
|
||||
- **Positive**: Server-side proxy is straightforward — all outbound TCP from channel handlers goes through the proxy.
|
||||
- **Negative**: Implementers must read the correct spec (client vs server) to understand what `--proxy` does for their side. This is mitigated by CLI help text that clearly describes the behavior per side.
|
||||
- **Negative**: On the client, `--proxy` with `--transport tcp` is effectively a no-op (the transport is already a direct TCP connection to the server). The CLI should handle this case gracefully.
|
||||
|
||||
## References
|
||||
- [ADR-010](010-transport-chaining-cli.md) — client-side transport chaining
|
||||
- [transport.md](../transport.md) — transport layer spec
|
||||
- [client.md](../client.md) — client CLI
|
||||
- [server.md](../server.md) — server outbound proxy
|
||||
@@ -1,85 +0,0 @@
|
||||
# ADR-023: Unified Authentication with Shared Key Material
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Alknet currently authenticates connections exclusively through SSH public key
|
||||
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
|
||||
iroh) because SSH carries its own auth protocol. But WebTransport and other
|
||||
HTTP-level transports cannot perform SSH key exchange — browsers speak HTTP/3,
|
||||
not SSH.
|
||||
|
||||
Without unification, non-SSH transports would need a completely separate
|
||||
identity system (API keys, JWTs, session tokens). This creates two problems:
|
||||
(1) operators manage two key sets with two rotation mechanisms, and (2) the
|
||||
same person connecting via SSH and WebTransport appears as two different
|
||||
identities.
|
||||
|
||||
The `IdentityProvider` trait is needed to decouple alknet-core from any
|
||||
specific identity storage (config file vs. database). Without it, alknet-core
|
||||
would either hardcode config-file-based auth or take a database dependency —
|
||||
neither is acceptable for a library crate.
|
||||
|
||||
## Decision
|
||||
|
||||
**Unified authentication**: The same Ed25519 key material (`authorized_keys`
|
||||
and `cert_authorities`) is shared across both SSH auth and token auth. The
|
||||
presentation differs per transport, but the verification result (an
|
||||
`Identity` with scopes) is the same.
|
||||
|
||||
**Token auth for non-SSH transports**: WebTransport clients present a signed
|
||||
timestamp token in the CONNECT request URL:
|
||||
|
||||
```
AuthToken = base64url(key_id || timestamp || signature)
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
|
||||
|
||||
Server extracts the fingerprint, looks it up in the same `authorized_keys`
|
||||
set, verifies the signature, and checks the timestamp window (default ±300s).
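
The byte layout and the timestamp check can be sketched with the standard library alone. This is an illustration under stated assumptions: it operates on the token after base64url decoding, it relies on Ed25519 signatures being 64 bytes long, and it leaves out the fingerprint lookup and signature verification, which need a crypto crate. The names `RawToken`, `parse_token`, `signed_message`, and `within_window` are hypothetical.

```rust
pub const KEY_ID_LEN: usize = 32; // SHA-256 fingerprint
pub const TIMESTAMP_LEN: usize = 8; // big-endian u64, Unix seconds
pub const SIGNATURE_LEN: usize = 64; // Ed25519 signature size
pub const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Decoded token fields, before any cryptographic check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct RawToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

/// Splits the decoded bytes into `key_id || timestamp || signature`.
pub fn parse_token(bytes: &[u8]) -> Option<RawToken> {
    if bytes.len() != TOKEN_LEN {
        return None;
    }
    let mut key_id = [0u8; KEY_ID_LEN];
    key_id.copy_from_slice(&bytes[..KEY_ID_LEN]);
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(&bytes[KEY_ID_LEN..KEY_ID_LEN + TIMESTAMP_LEN]);
    let mut signature = [0u8; SIGNATURE_LEN];
    signature.copy_from_slice(&bytes[KEY_ID_LEN + TIMESTAMP_LEN..]);
    Some(RawToken { key_id, timestamp: u64::from_be_bytes(ts), signature })
}

/// Rebuilds the message the signature covers: `key_id || timestamp_bytes`.
pub fn signed_message(token: &RawToken) -> Vec<u8> {
    let mut msg = Vec::with_capacity(KEY_ID_LEN + TIMESTAMP_LEN);
    msg.extend_from_slice(&token.key_id);
    msg.extend_from_slice(&token.timestamp.to_be_bytes());
    msg
}

/// True when the token timestamp is within `window_secs` of `now`, either way.
pub fn within_window(token_ts: u64, now: u64, window_secs: u64) -> bool {
    token_ts.abs_diff(now) <= window_secs
}
```

A verifier would call `parse_token`, look up `key_id` in the authorized key set, verify `signature` over `signed_message`, and only then apply `within_window` with the configured window.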
|
||||
|
||||
**`IdentityProvider` trait**: Decouples alknet-core from identity storage. The
|
||||
trait resolves a fingerprint or token to an `Identity`. Default implementation
|
||||
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
|
||||
with `@alkdev/storage`.
|
||||
|
||||
**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
|
||||
as SSH auth by default. Deployments that want separate access control can use
|
||||
`TokenKeySource::Separate` with a distinct key set.
|
||||
|
||||
**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
|
||||
Zero-replay can be added later via a nonce challenge-response without changing
|
||||
the key material.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
|
||||
key to `authorized_keys` immediately grants access via both SSH and
|
||||
WebTransport.
|
||||
- **Positive**: `IdentityProvider` trait makes alknet-core independent of any
|
||||
specific database. Default: config file. Hub: `@alkdev/storage`.
|
||||
- **Positive**: Browser clients can authenticate using Ed25519 keys via
|
||||
SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
|
||||
natively.
|
||||
- **Positive**: No JWT library dependency. The token is a simple Ed25519
|
||||
signature over a fixed structure — same primitives SSH already uses.
|
||||
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
|
||||
QUIC packet can replay the token within the window. Acceptable because QUIC
|
||||
interception is the same threat level as connection hijacking.
|
||||
- **Negative**: Certificate authority tokens are not supported in v1. CA
|
||||
verification requires the full OpenSSH certificate structure, which doesn't
|
||||
fit in a signed timestamp.
|
||||
- **Negative**: Browser-side key management is less ergonomic than SSH key
|
||||
files. The private key must be imported into SubtleCrypto. This is a UI/UX
|
||||
concern, not a protocol concern.
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](../auth.md) — Full auth architecture spec
|
||||
- [ADR-012](012-auth-ed25519-and-cert-authority.md) — Ed25519 + cert-authority auth
|
||||
- [OQ-17](../open-questions.md) — Transport-aware auth (resolved by this ADR)
|
||||
- [configuration.md](../../research/configuration.md) — OQ-CFG-04, OQ-CFG-06 (resolved)
|
||||
@@ -1,63 +0,0 @@
|
||||
# ADR-024: Bidirectional Call Protocol
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The alknet control channel (ADR-018) routes from client → server's event bus.
|
||||
This is unidirectional: clients can send events to the server, but the server
|
||||
cannot call operations on the client. In the hub/spoke model, spokes (dev env
|
||||
containers) connect to a hub and expose operations (fs, bash, search) that the
|
||||
hub invokes. The hub needs to call *spoke* operations.
|
||||
|
||||
Additionally, the current control channel provides no request/response semantics.
|
||||
Every consumer that needs call/response reinvents the pending-request correlation.
|
||||
|
||||
## Decision
|
||||
|
||||
The call protocol is bidirectional. Both sides can send `call.requested` and
|
||||
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
|
||||
BE length prefix + JSON) — the same as `@alkdev/pubsub`.
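
The wire framing named above (4-byte big-endian length prefix followed by a JSON body) can be sketched as follows. The function names are hypothetical, and the JSON body is treated as opaque bytes because the `EventEnvelope` schema is not reproduced in this ADR.

```rust
use std::convert::TryFrom;

/// Prepends a 4-byte big-endian length to a serialized envelope.
/// Returns `None` if the payload does not fit in a `u32` length.
pub fn encode_frame(json: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(json.len()).ok()?;
    let mut out = Vec::with_capacity(4 + json.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(json);
    Some(out)
}

/// Tries to split one complete frame off the front of `buf`.
/// Returns `(payload, remaining_bytes)`, or `None` if more data is needed.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((&buf[4..end], &buf[end..]))
}
```

A production reader would likely also cap the accepted length so that a peer cannot force a multi-gigabyte allocation with a single header.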
|
||||
|
||||
Five event types: `call.requested`, `call.responded`, `call.completed`,
|
||||
`call.aborted`, `call.error`.
|
||||
|
||||
A call is a subscribe that resolves after one event. Both use `call.requested`
|
||||
with correlated `requestId`. `PendingRequestMap` in core provides correlation.
|
||||
|
||||
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
|
||||
path segment routes the call to the correct connected node. The hub's registry
|
||||
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
|
||||
first segment is the routing key, remaining path dispatches within the node.
|
||||
|
||||
Core-provided operations use short paths without a spoke prefix
|
||||
(`/services/list`, `/services/schema`). Spoke operations are prefixed
|
||||
(`/dev1/fs/readFile`).
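
First-segment routing might look like the sketch below. It assumes `services` is the only core-provided prefix, which matches the two built-in paths named here but is not stated as a rule; `Target` and `route_path` are hypothetical names.

```rust
/// Routing decision for an operation path (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Target<'a> {
    /// Core-provided operation, handled locally (e.g. `/services/list`).
    Core { path: &'a str },
    /// Operation on a connected spoke; `path` is what the spoke sees.
    Spoke { spoke: &'a str, path: &'a str },
}

/// Splits off the first path segment and uses it as the routing key.
pub fn route_path(path: &str) -> Option<Target<'_>> {
    let rest = path.strip_prefix('/')?;
    let (first, tail) = rest.split_once('/')?;
    if first.is_empty() || tail.is_empty() {
        return None;
    }
    if first == "services" {
        // Assumption: `services` is reserved for core operations.
        Some(Target::Core { path })
    } else {
        // Keep the leading slash so the spoke receives `/fs/readFile`.
        Some(Target::Spoke { spoke: first, path: &rest[first.len()..] })
    }
}
```

With this shape, a hub would need to refuse a spoke that tries to register under the name `services`, since that name would shadow the core operations.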
|
||||
|
||||
This generalizes ADR-018's control channel: the `alknet-*` destination becomes
|
||||
a transport for `EventEnvelope` frames with call protocol semantics, instead of
|
||||
raw pubsub dispatch.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Hub can invoke operations on spokes. Dev env containers
|
||||
expose fs, bash, search — the hub calls them as needed.
|
||||
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
|
||||
can both call and serve operations.
|
||||
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
|
||||
in core serves all consumers.
|
||||
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
|
||||
iroh's ALPN dispatch. First segment = routing key.
|
||||
- **Positive**: Multiple spokes exposing the same service (two dev envs both
|
||||
exposing `/fs/*`) are naturally differentiated by the spoke prefix.
|
||||
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
|
||||
cleaned up on timeout or connection close.
|
||||
- **Negative**: The hub must maintain a routing table mapping spoke identities
|
||||
to connections, with registration on connect and cleanup on disconnect.
|
||||
|
||||
## References
|
||||
|
||||
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
|
||||
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
|
||||
@@ -1,73 +0,0 @@
|
||||
# ADR-025: Handler/Spec Separation for Downstream Service Registration
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The current control channel (ADR-018) is hardcoded: `alknet-control:0` bridges
|
||||
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
|
||||
`bash.exec` as callable operations, it has no way to register these with core's
|
||||
channel routing. The NAPI handler would need to intercept channel data outside
|
||||
of core.
|
||||
|
||||
For the hub/spoke model, spokes register their operations with the hub when
|
||||
they connect. The hub's registry must include both hub-local operations and
|
||||
remote operations exposed by spokes.
|
||||
|
||||
## Decision
|
||||
|
||||
Operation specs and handlers are separated from core. Core provides:
|
||||
|
||||
1. `OperationSpec` — describes what an operation does (name, type, input/output
|
||||
schemas, access control)
|
||||
2. `OperationHandler` — implements the operation logic
|
||||
3. `OperationRegistry` — maps paths to specs + handlers
|
||||
4. Built-in operations: `/services/list`, `/services/schema`
|
||||
|
||||
Downstream consumers register their own operations:
|
||||
|
||||
```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
|
||||
|
||||
Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
|
||||
segment routes to the node. The `namespace` field on `OperationSpec` is
|
||||
derived from the second path segment (`service`).
|
||||
|
||||
When spoke operations are registered with the hub, the hub adds the spoke
|
||||
prefix: a spoke that registers `/fs/readFile` as "dev1" becomes addressable as
|
||||
`/dev1/fs/readFile` in the hub's routing table.
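
The prefixing step can be illustrated with a toy hub-side table. This is a sketch under assumptions, not the `OperationRegistry` API: `HubRegistry`, `register_spoke_op`, `list`, and `call` are hypothetical names, handlers are reduced to string-to-string closures, and the spec (schemas, access control) is omitted.

```rust
use std::collections::BTreeMap;

/// Simplified handler: takes a JSON input string, returns a JSON output string.
pub type Handler = Box<dyn Fn(&str) -> String + Send + Sync>;

/// Hub-side table mapping full paths to handlers (hypothetical type).
pub struct HubRegistry {
    ops: BTreeMap<String, Handler>,
}

impl HubRegistry {
    pub fn new() -> Self {
        Self { ops: BTreeMap::new() }
    }

    /// Registers a spoke-local path such as `/fs/readFile` under `/{spoke}`.
    /// Returns the hub-visible path, or `None` on invalid input or collision.
    pub fn register_spoke_op(
        &mut self,
        spoke: &str,
        local_name: &str,
        handler: Handler,
    ) -> Option<String> {
        if spoke.is_empty() || spoke.contains('/') || !local_name.starts_with('/') {
            return None;
        }
        let hub_path = format!("/{spoke}{local_name}");
        if self.ops.contains_key(&hub_path) {
            return None;
        }
        self.ops.insert(hub_path.clone(), handler);
        Some(hub_path)
    }

    /// Sorted list of registered paths, roughly what `/services/list` exposes.
    pub fn list(&self) -> Vec<&str> {
        self.ops.keys().map(String::as_str).collect()
    }

    /// Invokes the handler registered at `path`, if any.
    pub fn call(&self, path: &str, input: &str) -> Option<String> {
        self.ops.get(path).map(|handler| handler(input))
    }
}
```

Because the spoke name becomes part of the key, two spokes can both register `/fs/readFile` without colliding.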
|
||||
|
||||
The `/services/list` operation returns all registered specs. The
|
||||
`/services/schema` operation returns the spec for a specific operation. These
|
||||
are read-only — no admin operations.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: NAPI, Python, and any downstream consumer can register
|
||||
operations without modifying core.
|
||||
- **Positive**: Service discovery is built in. Clients query `/services/list`
|
||||
to learn what operations a hub offers.
|
||||
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
|
||||
the same service (dev1 vs dev2).
|
||||
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
|
||||
authorization. Higher-risk operations (shell, filesystem write) can require
|
||||
tighter scopes.
|
||||
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
|
||||
maps directly to MCP tool definitions.
|
||||
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
|
||||
`OperationRegistry`, and `PendingRequestMap`.
|
||||
- **Negative**: Namespace collisions between downstream consumers are possible.
|
||||
The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.
|
||||
|
||||
## References
|
||||
|
||||
- [call-protocol.md](../call-protocol.md) — Full call protocol spec
|
||||
- [ADR-018](018-control-channel-for-pubsub.md) — Control channel (generalized)
|
||||
- `@alkdev/operations` — TypeScript `OperationSpec`, `CallHandler`, registry
|
||||
@@ -1,162 +0,0 @@
|
||||
# ADR-026: Transport/Interface Separation (Three-Layer Model)
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
In the current architecture, SSH is deeply embedded in the server handler. The
|
||||
`ServerHandler` owns auth, channel management, and proxy logic — all mixed
|
||||
together. This makes it impossible to run the call protocol over any transport
|
||||
that doesn't speak SSH, such as:
|
||||
|
||||
- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
|
||||
censorship resistance
|
||||
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
|
||||
wrapping, for local service mesh or browser-to-head direct communication
|
||||
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
|
||||
SSH key exchange)
|
||||
|
||||
The DNS control channel concept from research (`core.md`) currently conflates
|
||||
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
|
||||
SSH is not a transport — it's a protocol layer that sits *on top of* a
|
||||
transport. Separating them enables the DNS control channel to carry call
|
||||
protocol events directly, without wrapping SSH inside DNS queries.
|
||||
|
||||
The same separation enables raw framing (no SSH overhead) for trusted local
|
||||
networks, and WebTransport direct call protocol for browser clients.
|
||||
|
||||
## Decision
|
||||
|
||||
**Establish a three-layer model:**
|
||||
|
||||
### Layer 1: Transport
|
||||
|
||||
Produces byte streams. A `Transport` still produces
|
||||
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.
|
||||
|
||||
```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```
|
||||
|
||||
Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).
|
||||
|
||||
### Layer 2: Interface
|
||||
|
||||
Consumes a `Transport::Stream` and produces call protocol sessions. An
|
||||
interface is what SSH currently does: wrap a byte stream in session semantics.
|
||||
|
||||
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```
|
||||
|
||||
Interfaces:
|
||||
|
||||
- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
|
||||
channel multiplexing. The call protocol runs over a reserved SSH channel
|
||||
(`alknet-control:0`).
|
||||
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
|
||||
`EventEnvelope`. No SSH overhead. Direct call protocol over the transport
|
||||
stream.
|
||||
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
|
||||
encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
|
||||
|
||||
### Layer 3: Protocol
|
||||
|
||||
Carries semantics. Call protocol events, operation registry, service calls.
|
||||
The protocol is agnostic to both the transport and the interface below it. It
|
||||
receives `EventEnvelope` frames from whatever interface produced them.
|
||||
|
||||
### Connection Model
|
||||
|
||||
A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:
|
||||
|
||||
| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
|
||||
|
||||
**The DNS control channel carries call protocol frames directly — it does NOT
|
||||
wrap SSH inside DNS.** This is explicit because the research originally
|
||||
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
|
||||
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
|
||||
queries/responses — no SSH involved.
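
Since the valid combinations are enumerated, a listener configuration could be validated against them up front. The sketch below encodes exactly the seven rows of the table. The tag enums are hypothetical, payload-free stand-ins for `TransportKind` and the `Interface` implementations, and treating unlisted pairs (for example DNS with SSH) as invalid is an assumption drawn from the word "enumerated".

```rust
/// Payload-free transport tag (hypothetical; `TransportKind` carries data).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

/// Interface tag (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterfaceTag {
    Ssh,
    RawFraming,
}

/// True for the (Transport, Interface) pairs listed in the table above.
pub fn is_valid_pair(transport: TransportTag, interface: InterfaceTag) -> bool {
    use InterfaceTag::{RawFraming, Ssh};
    use TransportTag::{Dns, Iroh, Tcp, Tls, WebTransport};
    matches!(
        (transport, interface),
        (Tls, Ssh)
            | (Tcp, Ssh)
            | (Iroh, Ssh)
            | (Dns, RawFraming)
            | (WebTransport, Ssh)
            | (WebTransport, RawFraming)
            | (Tcp, RawFraming)
    )
}
```

Rejecting `(Dns, Ssh)` at configuration time makes the "DNS does not wrap SSH" rule mechanically enforceable rather than a documentation convention.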
|
||||
|
||||
### `TransportKind` Enum
|
||||
|
||||
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
|
||||
`WebTransport` variants. Initially these are tags only — no acceptor
|
||||
implementation. The full DNS and WebTransport implementations are Phase 4 work
|
||||
per the integration plan.
|
||||
|
||||
```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```
|
||||
|
||||
### ServerHandler Refactor
|
||||
|
||||
The existing `ServerHandler` is refactored into `SshInterface`. The interface
|
||||
abstraction means the server's accept loop becomes:
|
||||
|
||||
```rust
// Pseudocode
let (transport, interface) = listener_config;
let stream = transport.accept().await?;
let session = interface.accept(stream, &config).await?;
// session produces call protocol events
```
|
||||
|
||||
The call protocol handler is interface-agnostic — it receives `EventEnvelope`
|
||||
frames from any interface. Auth, forwarding policy, and operation routing happen
|
||||
at Layer 3, not inside the SSH handler.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
|
||||
raw framing) pair is a clean (Transport, Interface) combination.
|
||||
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
|
||||
trusted networks.
|
||||
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
|
||||
any interface.
|
||||
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
|
||||
more focused component that only handles SSH session management.
|
||||
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
|
||||
implement the `Interface` trait without touching SSH code.
|
||||
- **Negative**: This is the most invasive code change in Phase 1
|
||||
(integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
|
||||
are currently tangled in `ServerHandler`. Extracting them requires careful
|
||||
refactoring to maintain existing behavior.
|
||||
- **Negative**: The `Interface` trait is new and untested. The design must
|
||||
accommodate both SSH's channel multiplexing and raw framing's single-stream
|
||||
model through the same abstraction.
|
||||
|
||||
## References
|
||||
|
||||
- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
|
||||
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
|
||||
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
|
||||
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
|
||||
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
|
||||
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)
|
||||
@@ -1,164 +0,0 @@
|
||||
# ADR-027: Crate Decomposition
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
alknet-core currently contains everything: transport, SSH, auth, config, the
|
||||
call protocol handler, and the server accept loop. As the project grows to
|
||||
include SQLite-backed identity, HD key derivation, and metagraph storage, core
|
||||
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
|
||||
— unacceptable for a library crate that CLI users embed.
|
||||
|
||||
Different deployment topologies need different subsets:
|
||||
- A minimal CLI tunnel only needs core, transport, and auth types
|
||||
- A head node needs SQLite-backed identity and the secret service
|
||||
- A flowgraph visualization tool only needs petgraph operations
|
||||
|
||||
Circular dependencies must be avoided. alknet-storage implements
|
||||
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
|
||||
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
|
||||
format, but not as a crate dependency.
|
||||
|
||||
## Decision
|
||||
|
||||
**Decompose the project into six crates with a strict acyclic dependency graph.**
|
||||
|
||||
### Crate Structure
|
||||
|
||||
1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
|
||||
`OperationSpec`, `Interface` trait. The foundational crate that everything
|
||||
else depends on (by type, not by crate dep in some cases).
|
||||
- *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
|
||||
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
|
||||
|
||||
2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
|
||||
derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
|
||||
- *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
|
||||
irpc
|
||||
- *Does NOT depend on*: alknet-core, alknet-storage
|
||||
|
||||
3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
|
||||
honker integration, `StorageProtocol` irpc service.
|
||||
- *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
|
||||
- *Does NOT depend on alknet-core* (but implements alknet-core's
|
||||
`IdentityProvider` trait via the trait, not a crate dep)
|
||||
- *Does NOT depend on alknet-secret* (but references `EncryptedData` type
|
||||
format for wire compatibility)
|
||||
|
||||
4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
|
||||
graph, type compatibility checking.
|
||||
- *Depends on*: petgraph, serde, jsonschema, thiserror
|
||||
- *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret
|
||||
|
||||
5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
|
||||
- *Depends on*: alknet-core
|
||||
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
|
||||
|
||||
6. **alknet** (CLI binary) — Assembles everything.
|
||||
- *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
|
||||
alknet-flowgraph (feature), toml
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```
alknet-secret      alknet-storage     alknet-flowgraph
(standalone)       (standalone)       (standalone)
     │                  │                   │
     │ (feature flags   │ (trait impl       │ (type compat
     │  in CLI binary)  │  via CLI wire)    │  via JSON)
     ▼                  ▼                   ▼
              ┌─────────────────────┐
              │     alknet-core     │
              │  (transport, SSH,   │
              │   call protocol,    │
              │   Identity, Config) │
              └─────────┬───────────┘
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      alknet-napi    alknet (CLI binary — assembles everything)
```
|
||||
|
||||
All four library crates (core, secret, storage, flowgraph) are independent of
|
||||
each other. Dependencies flow **upward** only. The CLI binary sits at the top
|
||||
and wires concrete implementations together. alknet-storage implements
|
||||
alknet-core's `IdentityProvider` trait without a crate dependency — the CLI
|
||||
binary provides the bridge.
|
||||
|
||||
### Narrow Interface Points
|
||||
|
||||
Two types and one trait serve as the narrow interface points between crates:
|
||||
|
||||
1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
|
||||
forwarding policy, and call protocol. alknet-storage implements
|
||||
`IdentityProvider` to produce instances.
|
||||
|
||||
2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
|
||||
`ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
|
||||
alknet-storage). The CLI/NAPI layer wires the concrete implementation.
|
||||
|
||||
3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
|
||||
registry and by alknet-flowgraph for type compatibility checking. The bridge
|
||||
is serialization — flowgraph serializes to JSON, storage persists it.
|
||||
|
||||
### irpc Feature Flag
|
||||
|
||||
irpc is a feature flag in alknet-core. When disabled, auth and config go through
|
||||
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
|
||||
that only do SSH tunneling don't need the service layer.
|
||||
|
||||
In alknet-secret and alknet-storage, irpc is an independent dependency, not
|
||||
feature-gated. These crates always define irpc service protocols because they
|
||||
are used in production deployments where the service layer is active.
|
||||
|
||||
### alknet-storage's Relationship to alknet-core
|
||||
|
||||
alknet-storage does NOT depend on alknet-core as a crate. Instead:
|
||||
|
||||
- alknet-storage defines its own `IdentityProvider` impl that matches
|
||||
alknet-core's trait signature. The trait is re-exported or defined locally
|
||||
with `#[cfg(feature = "alknet-core")]` interop.
|
||||
- In practice, the CLI binary crate depends on both and wires them together.
|
||||
alknet-storage provides `StorageIdentityProvider`; alknet-core takes
|
||||
`impl IdentityProvider`.
|
||||
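The wiring described above can be sketched as follows. This is one possible reading, not the project's actual code: the three groups of items stand in for alknet-core, alknet-storage, and the CLI binary, and `AccountStore`, `AccountRecord`, `StorageBridge`, and `authenticate` are hypothetical names. The storage items never mention the core items; only the bridge, which would live in the CLI binary, sees both.

```rust
use std::collections::HashMap;

// --- what alknet-core would own (trimmed, hypothetical shapes) ---
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Core only ever sees "something that implements the trait".
pub fn authenticate(provider: &impl IdentityProvider, fingerprint: &str) -> Option<Identity> {
    provider.resolve_from_fingerprint(fingerprint)
}

// --- what alknet-storage would own: no reference to the core items above ---
pub struct AccountRecord {
    pub account_id: String,
    pub scopes: Vec<String>,
}

#[derive(Default)]
pub struct AccountStore {
    by_fingerprint: HashMap<String, AccountRecord>,
}

impl AccountStore {
    pub fn insert(&mut self, fingerprint: &str, record: AccountRecord) {
        self.by_fingerprint.insert(fingerprint.to_string(), record);
    }

    pub fn lookup(&self, fingerprint: &str) -> Option<&AccountRecord> {
        self.by_fingerprint.get(fingerprint)
    }
}

// --- what the CLI binary would own: the bridge between the two ---
pub struct StorageBridge(pub AccountStore);

impl IdentityProvider for StorageBridge {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // Translate the storage-side record into the core-side type.
        self.0.lookup(fingerprint).map(|r| Identity {
            id: r.account_id.clone(),
            scopes: r.scopes.clone(),
        })
    }
}
```

With this shape, a change to the trait signature breaks only the bridge at compile time, which is where the versioning risk noted under Consequences would surface.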
|
||||
### alknet-storage's Relationship to alknet-secret
|
||||
|
||||
alknet-storage does NOT depend on alknet-secret as a crate. Instead:
|
||||
|
||||
- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
|
||||
version, salt, IV, ciphertext). This is a type-level compatibility, not a
|
||||
crate dependency.
|
||||
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
|
||||
`SecretNode` in the metagraph. The bridge is serialization.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
|
||||
get a small binary.
|
||||
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
|
||||
swapped for alternative implementations.
|
||||
- **Positive**: No circular dependencies. The dependency graph is a DAG.
|
||||
- **Positive**: Deployment topology determines which crates to include. A CLI
|
||||
tunnel uses only alknet-core. A head node uses everything.
|
||||
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
|
||||
service layer overhead.
|
||||
- **Negative**: `IdentityProvider` trait interop between alknet-core and
|
||||
alknet-storage requires careful versioning. If the trait signature changes,
|
||||
both crates must update.
|
||||
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
|
||||
and alknet-storage is implicit (not enforced by the type system). A shared
|
||||
types crate could be extracted if needed, but adds another crate dependency.
|
||||
|
||||
## References
|
||||
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
|
||||
- [research/core.md](../../research/core.md) — alknet-core contents
|
||||
- [research/services.md](../../research/services.md) — Service protocols
|
||||
- [research/storage.md](../../research/storage.md) — alknet-storage contents
|
||||
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)
|
||||
@@ -1,147 +0,0 @@
|
||||
# ADR-028: Auth as irpc Service
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
|
||||
doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
|
||||
entire set on each reload works for small deployments but requires holding every
|
||||
key in memory. For production deployments with hundreds or thousands of users,
|
||||
auth verification should query a database on demand rather than holding all keys
|
||||
in memory.
|
||||
|
||||
The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
|
||||
setups. What's needed is an async boundary that allows auth verification to go
|
||||
through a service — locally via channels for minimal deployments, or via irpc
|
||||
for production deployments where auth runs on a separate process or node.
|
||||
|
||||
The critical design point: callers go through the `IdentityProvider` trait
|
||||
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
|
||||
the same result — an `Identity` or rejection. The trait is the contract; the
|
||||
service is an implementation path.
|
||||
|
||||
## Decision
|
||||
|
||||
**Auth verification is provided via an irpc service protocol, with
|
||||
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
|
||||
(ArcSwap-backed) as the default implementation.**
|
||||
|
||||
### IdentityProvider Trait (ADR-029) — The Contract
|
||||
|
||||
Callers depend on `IdentityProvider`, not on any concrete implementation:
|
||||
|
||||
```rust
|
||||
pub trait IdentityProvider: Send + Sync + 'static {
|
||||
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
|
||||
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
|
||||
}
|
||||
```
|
||||
|
||||
### ConfigIdentityProvider — Default Implementation
|
||||
|
||||
Reads from the `auth` section of `ArcSwap<DynamicConfig>`. No database needed. Every authorized
|
||||
key gets a default scope set. This is the default for CLI and single-node
|
||||
deployments.
|
||||
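A minimal sketch of the default implementation, under stated assumptions: `AuthSection`, its fields, and the tuple-struct `AuthToken` are hypothetical, and `RwLock<Arc<_>>` stands in for `ArcSwap` so the example needs only the standard library. Token resolution is left unimplemented because its rules are not described here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

// Hypothetical token type; the real `AuthToken` is not shown in this document.
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// Hypothetical shape of the auth section of `DynamicConfig`.
pub struct AuthSection {
    pub authorized_fingerprints: Vec<String>,
    pub default_scopes: Vec<String>,
}

pub struct ConfigIdentityProvider {
    // Stand-in for `ArcSwap<_>`: readers clone the inner `Arc` and release the lock.
    auth: RwLock<Arc<AuthSection>>,
}

impl ConfigIdentityProvider {
    pub fn new(auth: AuthSection) -> Self {
        Self { auth: RwLock::new(Arc::new(auth)) }
    }

    fn snapshot(&self) -> Arc<AuthSection> {
        let guard = self.auth.read().unwrap();
        Arc::clone(&*guard)
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let auth = self.snapshot();
        if auth.authorized_fingerprints.iter().any(|f| f.as_str() == fingerprint) {
            // Every authorized key receives the same default scope set.
            Some(Identity {
                id: fingerprint.to_string(),
                scopes: auth.default_scopes.clone(),
                resources: HashMap::new(),
            })
        } else {
            None
        }
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        // Token validation is out of scope for this sketch.
        None
    }
}
```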
|
||||
### AuthProtocol irpc Service — Behind Feature Flag
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = AuthMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum AuthProtocol {
|
||||
#[rpc(tx=oneshot::Sender<AuthResult>)]
|
||||
#[wrap(VerifyPubkey)]
|
||||
VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<AuthResult>)]
|
||||
#[wrap(VerifyToken)]
|
||||
VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(ReloadKeys)]
|
||||
ReloadKeys,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<bool>)]
|
||||
#[wrap(CheckAccess)]
|
||||
CheckAccess { identity: Identity, operation: String },
|
||||
}
|
||||
|
||||
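// The reply type must be serializable so it can be sent back to remote callers.
#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {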
|
||||
Ok(Identity),
|
||||
Denied(String),
|
||||
}
|
||||
```
|
||||
|
||||
The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
|
||||
that only do SSH tunneling don't need the service layer overhead. When the
|
||||
feature is disabled, auth goes through `IdentityProvider` directly.
|
||||
|
||||
### AuthServiceImpl
|
||||
|
||||
Two implementations exist (the second is a future phase):
|
||||
|
||||
- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
|
||||
Wraps the trait in an irpc service for deployments that use the service layer
|
||||
but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
|
||||
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
|
||||
tables (in alknet-storage, not yet built). Queries on demand. Can maintain an
|
||||
LRU cache for hot fingerprints. This is a Phase 2+ implementation — the
|
||||
contract is defined here so alknet-storage can implement it later.
|
||||
|
||||
Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
|
||||
know or care which backend is running.
|
||||
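The backend-agnostic behaviour can be illustrated with a small wrapper that maps any `IdentityProvider` onto `AuthResult`. This is a sketch only: `AuthService`, `SingleKey`, and the denial message are hypothetical, the types are trimmed, and the irpc plumbing is omitted entirely.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Wraps any provider; callers only ever see `AuthResult`.
pub struct AuthService<P: IdentityProvider> {
    provider: P,
}

impl<P: IdentityProvider> AuthService<P> {
    pub fn new(provider: P) -> Self {
        Self { provider }
    }

    pub fn verify_pubkey(&self, fingerprint: &str) -> AuthResult {
        match self.provider.resolve_from_fingerprint(fingerprint) {
            Some(identity) => AuthResult::Ok(identity),
            None => AuthResult::Denied(format!("unknown key {fingerprint}")),
        }
    }
}

/// Toy backend that knows exactly one fingerprint (illustration only).
pub struct SingleKey(pub String);

impl IdentityProvider for SingleKey {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        if self.0 == fingerprint {
            Some(Identity { id: self.0.clone(), scopes: vec![] })
        } else {
            None
        }
    }
}
```

Swapping `SingleKey` for a config-backed or storage-backed provider would leave `verify_pubkey` and its callers unchanged.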
|
||||
### Integration with IdentityProvider
|
||||
|
||||
The irpc service and the trait compose. A caller goes through `IdentityProvider`,
|
||||
which may internally delegate to the irpc service, or may satisfy the request
|
||||
locally via `ConfigIdentityProvider`. The deployment topology determines the
|
||||
path:
|
||||
|
||||
- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
|
||||
`ArcSwap<DynamicConfig>`. No irpc overhead.
|
||||
- **Production with local auth**: `AuthServiceImpl` wraps
|
||||
`StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
|
||||
routes to the local irpc service.
|
||||
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
|
||||
routes to a remote auth irpc service over QUIC.
|
||||
|
||||
### ConfigService Integration
|
||||
|
||||
`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
|
||||
For the `ConfigIdentityProvider` path, this is equivalent to
|
||||
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
|
||||
refreshes the LRU cache. Both update atomically — ongoing connections are
|
||||
unaffected, new connections pick up changes.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
|
||||
database dependency for CLI users.
|
||||
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
|
||||
irpc service. Auth scales to thousands of users without loading all keys into
|
||||
memory.
|
||||
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
|
||||
on. This keeps alknet-core lean and testable.
|
||||
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
|
||||
need the service layer.
|
||||
- **Positive**: Both paths return the same `Identity` type, so callers handle a
single result shape regardless of backend.
|
||||
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
|
||||
and `StorageIdentityProvider` must produce the same `Identity` for the same
|
||||
input. Integration tests should verify this.
|
||||
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
|
||||
The core must compile and work without it, and the service layer must work
|
||||
with it enabled.
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
|
||||
- [research/configuration.md](../../research/configuration.md) — Auth service approach
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-027](027-crate-decomposition.md) — Crate decomposition
|
||||
@@ -1,107 +0,0 @@
|
||||
# ADR-029: Identity as Core Type
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The `Identity` struct and `IdentityProvider` trait are needed by auth,
|
||||
forwarding policy, and call protocol — three different subsystems in
|
||||
alknet-core. Without placing them in core, these subsystems would each define
|
||||
their own identity type, leading to duplication and conversion boilerplate.
|
||||
|
||||
The constraint: alknet-core must not depend on alknet-storage or any database.
|
||||
The `IdentityProvider` trait must be in core so that the handler can resolve
|
||||
identities without knowing whether the backing store is a config file or a
|
||||
SQLite database. External crates provide implementations.
|
||||
|
||||
Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
|
||||
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
|
||||
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
|
||||
key-based auth from config) and account UUID (for database-backed auth).
|
||||
|
||||
## Decision
|
||||
|
||||
**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
|
||||
|
||||
### Identity Struct
|
||||
|
||||
```rust
|
||||
use std::collections::HashMap;

pub struct Identity {
|
||||
pub id: String, // Fingerprint (config auth) or account UUID (database auth)
|
||||
pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"]
|
||||
pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
|
||||
}
|
||||
```
|
||||
|
||||
The `id` field serves dual purpose: when using config-based authentication
|
||||
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
|
||||
database-backed authentication (`StorageIdentityProvider`), it holds the account
|
||||
UUID from the `accounts` table. This keeps the type simple while accommodating
|
||||
both auth paths.
|
||||
|
||||
The `scopes` field provides authorization scope strings used by
|
||||
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
|
||||
field provides resource-level authorization beyond what scopes offer (e.g., which
|
||||
services this identity can access).
|
||||
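How a consumer might query the two fields can be sketched as below. The helper names `has_scope` and `can_access` are hypothetical, and exact string matching is an assumption; this document does not say whether scopes are hierarchical or support wildcards.

```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// True if the identity carries exactly this scope string.
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s.as_str() == scope)
    }

    /// True if `name` is listed under the resource kind `kind`.
    pub fn can_access(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n.as_str() == name))
    }
}
```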
|
||||
### IdentityProvider Trait
|
||||
|
||||
```rust
|
||||
pub trait IdentityProvider: Send + Sync + 'static {
|
||||
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
|
||||
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
|
||||
}
|
||||
```
|
||||
|
||||
The trait is the contract. Callers (auth handler, forwarding policy, call
|
||||
protocol) depend on `IdentityProvider` — not on any concrete implementation.
|
||||
|
||||
### Default and Production Implementations
|
||||
|
||||
- **`ConfigIdentityProvider`** (in alknet-core) — reads from
|
||||
the `auth` section of `ArcSwap<DynamicConfig>`. Every authorized key gets a default scope set.
|
||||
No database needed. This is the default for minimal deployments.
|
||||
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
|
||||
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
|
||||
fingerprint → account → organization membership → effective scopes. This is
|
||||
the production implementation for head nodes.
|
||||
|
||||
alknet-core never depends on alknet-storage. The trait relationship is:
|
||||
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
|
||||
NAPI assembly layer wires the concrete implementation.
|
||||
|
||||
### Why Not in alknet-storage?
|
||||
|
||||
If `Identity` lived in alknet-storage, alknet-core would need to depend on
|
||||
alknet-storage to use the type — creating a circular dependency (since
|
||||
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
|
||||
type and trait in core breaks the cycle.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
|
||||
call protocol all use the same `Identity` type.
|
||||
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
|
||||
wires the concrete implementation. Deployment topology determines which impl
|
||||
to use.
|
||||
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
|
||||
avoiding separate types for config-based and database-based auth.
|
||||
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
|
||||
`Identity` without knowing where they came from.
|
||||
- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
|
||||
`Storage`. Both must produce consistent `Identity` results for the same input
(the `id` value differs by design: fingerprint vs. account UUID).
Tests should verify behavioral parity.
|
||||
- **Negative**: The trait abstraction adds a level of indirection for the
|
||||
minimal (config-only) deployment path. The cost is negligible — the
|
||||
`ConfigIdentityProvider` is a simple `ArcSwap` dereference.
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
|
||||
- [research/services.md](../../research/services.md) — AuthService, Identity section
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
|
||||
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
|
||||
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes
|
||||
@@ -1,159 +0,0 @@
|
||||
# ADR-030: Static/Dynamic Configuration Split
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Alknet's configuration is loaded once at startup and never changes. This causes
|
||||
three specific failures:
|
||||
|
||||
1. **No hot reload of authentication credentials.** Adding or removing an
|
||||
authorized key requires restarting the server process. In head/worker
|
||||
deployments where keys are managed via a database, the process must be
|
||||
restarted every time a key is added, revoked, or rotated. This is
|
||||
operationally unacceptable.
|
||||
|
||||
2. **No port forwarding access control.** Any authenticated client can open a
|
||||
`direct-tcpip` channel to any destination. There is no policy governing
|
||||
which hosts, ports, or alknet control channels a client may access. A
|
||||
compromised key grants unrestricted network access through the tunnel.
|
||||
|
||||
3. **No structured configuration beyond CLI flags.** ADR-011 chose
|
||||
programmatic-first configuration for the alpha — correct at the time. But as
|
||||
alknet moves toward publishable releases, operators need config files for
|
||||
reproducible deployments, and the NAPI layer needs programmatic reload
|
||||
capability that `ServeOptions` doesn't currently support.
|
||||
|
||||
Not all configuration should be reloadable. Transport-level settings (listen
|
||||
address, TLS certificates, host key) require socket/TLS renegotiation to change
|
||||
at runtime — effectively a restart. Auth and forwarding policy can change
|
||||
atomically without disrupting existing connections.
|
||||
|
||||
## Decision
|
||||
|
||||
**Split configuration into `StaticConfig` and `DynamicConfig`.**
|
||||
|
||||
### StaticConfig
|
||||
|
||||
Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
|
||||
preserved). Contains everything that affects socket binding, TLS handshakes, or
|
||||
SSH session negotiation:
|
||||
|
||||
- Transport mode, listen address
|
||||
- TLS config (cert, key)
|
||||
- iroh config (relay URL)
|
||||
- Stealth mode flag
|
||||
- Host key, host key algorithm
|
||||
- Max auth attempts, max connections per IP
|
||||
- Proxy config
|
||||
|
||||
Changing any of these requires a restart.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
|
||||
checked per-connection or per-channel:
|
||||
|
||||
- `AuthPolicy` — authorized keys, certificate authorities, token config
|
||||
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
|
||||
- `RateLimitConfig` — rate limiting parameters
|
||||
|
||||
`ArcSwap` provides lock-free reads on the hot path (every `auth_publickey()` and
|
||||
every `channel_open_direct_tcpip()` call does an `Arc` dereference, adding
negligible cost over the current approach). Writes are atomic: `store()` swaps the
|
||||
pointer. Existing connections finish with their current config; new connections
|
||||
get the new config.
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
|
||||
pub struct ConfigReloadHandle {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigReloadHandle {
|
||||
pub fn reload(&self, new_config: DynamicConfig) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
|
||||
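The reload semantics (in-flight work keeps its snapshot, new work sees the new config) can be sketched with the standard library alone. Assumptions: `RwLock<Arc<_>>` stands in for `ArcSwap`, `DynamicConfig` is trimmed to a single hypothetical field, and `new` and `current` are illustrative helpers that this document does not define.

```rust
use std::sync::{Arc, RwLock};

// Trimmed, hypothetical stand-in for the real `DynamicConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

#[derive(Clone)]
pub struct ConfigReloadHandle {
    // Stand-in for `Arc<ArcSwap<DynamicConfig>>`.
    dynamic: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { dynamic: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Take a snapshot; the caller keeps it for the life of a connection.
    pub fn current(&self) -> Arc<DynamicConfig> {
        let guard = self.dynamic.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swap in a new config; existing snapshots are left untouched.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().unwrap() = Arc::new(new_config);
    }
}
```

A connection that called `current()` before a reload keeps reading the old `Arc`, which is freed once the last such connection drops it.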
|
||||
### ConfigService
|
||||
|
||||
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
|
||||
protocol (behind the `irpc` feature flag) for production deployments that use
|
||||
the service layer. For minimal deployments (CLI, single-node), direct
|
||||
`ConfigReloadHandle::reload()` is sufficient.
|
||||
|
||||
### TOML Config File
|
||||
|
||||
An optional TOML config file covers static config plus initial auth/forwarding
|
||||
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
|
||||
API remains primary. The config file is a convenience input format:
|
||||
|
||||
```toml
|
||||
[server]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = false
|
||||
max_connections_per_ip = 5
|
||||
max_auth_attempts = 3
|
||||
|
||||
[server.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
[auth]
|
||||
host_key = "/etc/alknet/ssh/host_key"
|
||||
|
||||
[forwarding]
|
||||
default = "deny"
|
||||
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
|
||||
interface AlknetServer {
|
||||
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
|
||||
reloadForwarding(policy: ForwardingPolicyConfig): void;
|
||||
reloadAll(config: DynamicConfig): void;
|
||||
}
|
||||
```
|
||||
|
||||
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
|
||||
`ConfigReloadHandle::reload()`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
|
||||
config is almost entirely static (which server to connect to, which key to use).
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Auth credentials and forwarding policy can be reloaded without
|
||||
restarting the server. Adding a key via `reloadAuth()` takes effect on the
|
||||
next connection attempt.
|
||||
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
|
||||
config file is an optional convenience layer, not a replacement for
|
||||
`ServeOptions`.
|
||||
- **Positive**: `ArcSwap` provides lock-free, near-zero-cost reads on the hot path. Every auth
|
||||
check and every channel open is a single `Arc` dereference.
|
||||
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
|
||||
production deployments to integrate config reload into their service mesh
|
||||
without taking a direct dependency on `DynamicConfig` internals.
|
||||
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
|
||||
restrict access per identity, per destination, per transport (ADR-031).
|
||||
- **Negative**: Two config structs where there was one. The split is clean
|
||||
(transport vs. policy) but adds surface area.
|
||||
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
|
||||
This is acceptable for a CLI binary.
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — Full analysis
|
||||
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
|
||||
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1
|
||||
@@ -1,138 +0,0 @@
|
||||
# ADR-031: Forwarding Policy
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
|
||||
any destination. The only gate is authentication — once authenticated, a client
|
||||
has unrestricted network access through the tunnel. This is a security gap: a
|
||||
compromised key grants unrestricted access.
|
||||
|
||||
Operators need the ability to:
|
||||
- Restrict which hosts and ports authenticated clients can access
|
||||
- Apply different rules to different principals (key fingerprints, accounts)
|
||||
- Restrict WebTransport clients to alknet control channels only
|
||||
- Set a default policy (allow-all for migration compatibility, deny-all for
|
||||
production)
|
||||
|
||||
## Decision
|
||||
|
||||
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
|
||||
restart).**
|
||||
|
||||
### Type Definitions
|
||||
|
||||
```rust
|
||||
pub struct ForwardingPolicy {
|
||||
pub default: ForwardingAction,
|
||||
pub rules: Vec<ForwardingRule>,
|
||||
}
|
||||
|
||||
pub struct ForwardingRule {
|
||||
pub target: TargetPattern,
|
||||
pub action: ForwardingAction,
|
||||
pub principals: Vec<String>, // Empty = matches all
|
||||
pub transports: Vec<TransportKind>, // Empty = matches all
|
||||
}
|
||||
|
||||
pub enum ForwardingAction {
|
||||
Allow,
|
||||
Deny,
|
||||
}
|
||||
|
||||
pub enum TargetPattern {
|
||||
Any,
|
||||
Host(String), // "localhost", "*.example.com"
|
||||
Cidr(IpNetwork), // "10.0.0.0/8"
|
||||
PortRange(String, Range<u16>), // "localhost", 8080..8091 (end exclusive) covers ports 8080-8090
|
||||
AlknetPrefix, // Matches alknet-* control channels
|
||||
}
|
||||
```
|
||||
|
||||
### Rule Evaluation
|
||||
|
||||
Rules are evaluated in order. First match wins. If no rule matches, the default
|
||||
applies. This supports both allowlist and blocklist semantics:
|
||||
|
||||
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
|
||||
destinations.
|
||||
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
|
||||
destinations.
|
||||
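First-match-wins evaluation can be sketched as follows. The types are trimmed to the parts needed here (no principals, transports, CIDR, or wildcard hosts), and the `matches` and `evaluate` method names are hypothetical.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // exact host match only in this sketch
    PortRange(String, Range<u16>), // end of the range is exclusive
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl TargetPattern {
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h.as_str() == host,
            TargetPattern::PortRange(h, ports) => h.as_str() == host && ports.contains(&port),
        }
    }
}

impl ForwardingPolicy {
    /// Rules are checked in order; the first match decides, else the default.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.target.matches(host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because order decides the outcome, a narrow Deny must be listed before a broader Allow that would otherwise shadow it.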
|
||||
### Principals
|
||||
|
||||
Each rule can specify which principals it applies to. A principal is an
|
||||
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
|
||||
rule's `principals` field is empty, it matches all identities.
|
||||
|
||||
This connects to the `IdentityProvider` trait (ADR-029): when a client
|
||||
authenticates, the `Identity` is resolved, and the forwarding policy checks
|
||||
rules against `Identity.id` and `Identity.scopes`.
|
||||
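The principal check described above might look like the following sketch. `principal_matches` is a hypothetical helper, `Identity` is trimmed to the two fields involved, and exact string comparison is an assumption.

```rust
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// An empty list matches everyone; otherwise any entry may match the
/// identity's id or one of its scopes.
pub fn principal_matches(principals: &[String], identity: &Identity) -> bool {
    principals.is_empty()
        || principals
            .iter()
            .any(|p| *p == identity.id || identity.scopes.contains(p))
}
```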
|
||||
### TransportKind-Aware Rules
|
||||
|
||||
Each rule can specify which `TransportKind` it applies to. This enables
|
||||
transport-specific restrictions — for example, WebTransport clients can be
|
||||
restricted to `alknet-*` control channels only:
|
||||
|
||||
```rust
|
||||
ForwardingRule {
|
||||
target: TargetPattern::AlknetPrefix,
|
||||
action: ForwardingAction::Allow,
|
||||
principals: vec![],
|
||||
transports: vec![TransportKind::WebTransport { host: "*".into() }],
|
||||
}
|
||||
```
|
||||
|
||||
### Where the Policy Check Happens
|
||||
|
||||
The forwarding policy check occurs in `channel_open_direct_tcpip` before the
|
||||
proxy task is spawned. The current behavior (no check) is equivalent to
|
||||
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
|
||||
backward compatibility during migration.
|
||||
|
||||
### DynamicConfig Integration
|
||||
|
||||
`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
|
||||
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
|
||||
effect on the next channel open — existing connections continue with their
|
||||
current policy.
|
||||
|
||||
### OQ Resolutions
|
||||
|
||||
- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
|
||||
global rules plus principal matching on `Identity.scopes`. Per-user scopes
come from `peer_credentials.metadata.scopes` via `IdentityProvider`.
|
||||
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
|
||||
match in `ForwardingRule`. WebTransport clients can be restricted.
|
||||
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
|
||||
`IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Operators can restrict access per identity, per destination, per
|
||||
transport. A compromised key no longer grants unrestricted network access.
|
||||
- **Positive**: Default-allow preserves current behavior during migration. Switch
|
||||
to default-deny for production deployments.
|
||||
- **Positive**: Policy is reloadable without restart. Adding a rule via
|
||||
`reloadForwarding()` takes effect on the next channel open.
|
||||
- **Positive**: `TransportKind`-aware rules enable transport-specific
|
||||
restrictions (e.g., WebTransport clients restricted to alknet-* channels).
|
||||
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
|
||||
call). The cost is a linear scan of rules — acceptable for small rule sets.
|
||||
Large rule sets should use compiled matchers (future optimization).
|
||||
- **Negative**: `TargetPattern` string matching is easy to get wrong. Host patterns like
|
||||
`*.example.com` require careful implementation to prevent bypasses. The
|
||||
`glob` or `globset` crate can handle this correctly.
|
||||
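To make the bypass risk concrete, here is a hand-rolled sketch of suffix matching for a leading `*.` wildcard. It is an illustration under assumptions, not a substitute for a vetted crate: `host_matches` is a hypothetical helper, matching is ASCII case-insensitive, a trailing dot on the host is ignored, and the wildcard is allowed to span several labels (whether it should is a policy choice this document does not settle).

```rust
/// Returns true if `host` matches `pattern`.
/// Supports exact names and a single leading "*." wildcard.
pub fn host_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require at least one character plus a '.' before the suffix, so
            // "evilexample.com" and bare "example.com" do not match.
            host.len() > suffix.len() + 1
                && host.ends_with(suffix)
                && host.as_bytes()[host.len() - suffix.len() - 1] == b'.'
        }
        None => pattern == host,
    }
}
```

A naive `host.ends_with("example.com")` check would accept `evilexample.com`; the explicit dot check on the label boundary is what closes that gap.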
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
|
||||
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
|
||||
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3
|
||||
@@ -1,96 +0,0 @@
|
||||
# ADR-032: Event Boundary Discipline
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The research identified three distinct communication patterns in the system, and
|
||||
conflating them is a known anti-pattern in event-driven architectures:
|
||||
|
||||
1. **Domain events** (Honker streams) — Internal to the service that owns that
|
||||
data. Used for state reconstruction within the service's own boundaries.
|
||||
Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
|
||||
|
||||
2. **irpc service calls** — Synchronous request-response within a node or
|
||||
cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
|
||||
`SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
|
||||
|
||||
3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
|
||||
that cross node boundaries. External to the system. Examples:
|
||||
`call.requested`, `call.responded`, `call.completed`, `call.aborted`.
|
||||
|
||||
Without a hard constraint, it's tempting to have one service subscribe directly
|
||||
to another service's Honker streams. This leads to:
|
||||
|
||||
- **Leaky event store**: Service A reads Service B's domain events directly,
|
||||
coupling A to B's internal state representation. When B changes its schema, A
|
||||
breaks.
|
||||
- **Boomerang coupling**: An integration event is too thin, causing the
|
||||
consumer to call back to the source service synchronously to get details. This
|
||||
negates the benefit of async communication.
|
||||
- **Fat notification trap**: A notification event carries full entity state,
|
||||
when it should use state transfer instead.
|
||||
|
||||
## Decision

**Event boundary discipline is a hard architectural constraint, not a
suggestion.**

1. **Domain events stay within the owning service.** A Honker stream published
   by the storage service (`nodes:created`) is for the storage service's own
   state reconstruction. No other service reads these stream events directly.

2. **irpc service calls are synchronous and internal.** They never cross node
   boundaries. They are request-response, not events. They should not be used
   as a substitute for integration events.

3. **Call protocol events are the only events that cross node boundaries.**
   `EventEnvelope` frames are the integration boundary. When a domain event
   needs to be communicated to another node, it must be projected into a call
   protocol event.

4. **Projection from domain events to integration events is required when
   crossing boundaries.** A service that owns a Honker stream must project
   relevant state changes into `EventEnvelope` frames before they leave the
   node. The projection strips internal details and produces a versioned,
   stable integration event.

This discipline applies at three levels:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
       └── Honker Streams (Domain events, within service boundary)
```

A call protocol handler MAY call an irpc service internally (e.g.,
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
use Honker streams for its own state management. But domain events never
propagate beyond the service boundary without projection.

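The projection rule in the Decision can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `DomainEvent` variants, the `EventEnvelope` fields, and the `node.created` / `node.deleted` event names are hypothetical and are not taken from the alknet source.

```rust
// Hypothetical internal domain event, as a storage service might record it.
// Field names such as `row_id` and `internal_shard` are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum DomainEvent {
    NodeCreated { row_id: u64, node_id: String, internal_shard: u8 },
    NodeDeleted { row_id: u64, node_id: String },
}

// Hypothetical versioned integration event that is allowed to leave the node.
#[derive(Debug, Clone, PartialEq)]
struct EventEnvelope {
    event_type: String,
    schema_version: u32,
    payload: Vec<(String, String)>,
}

// Project a domain event into an integration event, dropping internal
// details (row ids, shard placement) that consumers must not depend on.
fn project(event: &DomainEvent) -> EventEnvelope {
    let (event_type, node_id) = match event {
        DomainEvent::NodeCreated { node_id, .. } => ("node.created", node_id),
        DomainEvent::NodeDeleted { node_id, .. } => ("node.deleted", node_id),
    };
    EventEnvelope {
        event_type: event_type.to_string(),
        schema_version: 1,
        payload: vec![("node_id".to_string(), node_id.clone())],
    }
}
```

The point of the sketch is that the storage schema (`row_id`, `internal_shard`) can change freely, while consumers only ever see the versioned envelope.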
## Consequences

- **Positive**: Prevents leaky event stores. Services are independently
  deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
  contracts. The call protocol's `EventEnvelope` is the only stable, versioned
  contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
  change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
  mock domain events. Integration events are tested against the `EventEnvelope`
  schema.
- **Negative**: Projection code is required. Every domain event that needs to
  cross a boundary must be explicitly projected. This is deliberate: the
  overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
  Honker streams across services. Code review should catch this pattern.

## References

- [research/services.md](../../research/services.md): Event boundary discipline section
- [research/storage.md](../../research/storage.md): Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md): ADR 032 entry
- [event_source_types.md](../../research/event-sourcing/event_source_types.md): Event-driven architecture patterns
@@ -1,132 +0,0 @@
# ADR-033: OperationEnv as Universal Composition Mechanism

## Status

Accepted

## Context

The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
and can invoke any registered operation regardless of whether it runs locally, in
an irpc service on the same cluster, or on a remote node via call protocol.

The research documents define three dispatch paths:
1. **Local dispatch**: direct function call through the operation registry
2. **Service dispatch**: irpc protocol call to a service backend
3. **Remote dispatch**: call protocol `EventEnvelope` to a remote node

Without a formal decision, irpc services could be seen as a replacement for
OperationEnv or for the call protocol. They are not: irpc is one dispatch
backend for OperationEnv, not a replacement for anything. The call protocol is
another dispatch backend. OperationEnv unifies them from the handler's
perspective.

The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams): internal to the owning service
- irpc service calls: synchronous, in-cluster
- Call protocol events: asynchronous, cross-node

irpc services and call protocol operations serve different scopes but must
compose cleanly through OperationEnv.

## Decision

**OperationEnv is the universal composition mechanism that all operation
handlers receive. Given a namespace and an operation name, it invokes the
operation with the input and returns the output, regardless of dispatch path.**

### OperationEnv Behavioral Contract

```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```

The Rust implementation may use typed method dispatch or a registry behind the
scenes, but the handler-facing API must preserve this contract.

### Three Dispatch Paths

OperationEnv resolves each call to one of three dispatch backends:

| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |

All three produce the same `ResponseEnvelope`. The handler always calls
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
back.

### Service Assembly

The deployment topology determines which dispatch path each operation uses:

```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)            // Auth runs locally
    .local("config", config_registry)        // Config runs locally
    .service("secrets", secret_irpc_client)  // Secret service via irpc
    .remote("worker-1", call_protocol_conn); // Worker-1 operations via call protocol
```

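A minimal, std-only sketch of how one `invoke` call could be routed to a per-namespace backend. This is not the alknet implementation: `RoutedEnv`, `Backend`, `LocalHandler`, and `echo` are hypothetical names, `Value` is reduced to a `String`, and the `Service` and `Remote` arms only return placeholder responses instead of performing irpc or call protocol dispatch.

```rust
use std::collections::HashMap;

// Simplified stand-ins: the ADR's `Value` and `ResponseEnvelope` are richer types.
type Value = String;

#[derive(Debug, PartialEq)]
struct ResponseEnvelope {
    ok: bool,
    body: String,
}

// Handler-facing contract, mirroring the trait in the ADR.
trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}

type LocalHandler = fn(Value) -> ResponseEnvelope;

// One backend per namespace. Only `Local` does real work in this sketch;
// `Service` and `Remote` hold a label and return a placeholder response.
#[allow(dead_code)]
enum Backend {
    Local(HashMap<String, LocalHandler>),
    Service(String), // hypothetical irpc client handle
    Remote(String),  // hypothetical call protocol connection
}

struct RoutedEnv {
    backends: HashMap<String, Backend>,
}

impl OperationEnv for RoutedEnv {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        let fail = |msg: String| ResponseEnvelope { ok: false, body: msg };
        match self.backends.get(namespace) {
            Some(Backend::Local(ops)) => match ops.get(operation) {
                Some(handler) => handler(input),
                None => fail(format!("unknown operation {namespace}/{operation}")),
            },
            Some(Backend::Service(client)) => ResponseEnvelope {
                ok: true,
                body: format!("irpc:{client}:{operation}"),
            },
            Some(Backend::Remote(node)) => ResponseEnvelope {
                ok: true,
                body: format!("remote:{node}:{operation}"),
            },
            None => fail(format!("unknown namespace {namespace}")),
        }
    }
}

// Example local handler: echoes its input.
fn echo(input: Value) -> ResponseEnvelope {
    ResponseEnvelope { ok: true, body: input }
}
```

The caller sees the same `ResponseEnvelope` whichever arm is taken, which is the property the table above describes.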
### irpc Services Are One Dispatch Backend

irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
and efficient. But they are not a replacement for OperationEnv or for the call
protocol. They are one dispatch backend.

An irpc service can be exposed as a call protocol operation:
`/head/auth/verify` receives a call protocol event and internally calls
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
       └── Honker Streams (Domain events, within service boundary)
```

### Adapters Map to OperationEnv

HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
(`/call.requested`) all resolve through OperationEnv. This is what makes
operations universally composable across all interfaces.

## Consequences

- **Positive**: Handlers compose through a single interface. Adding a new
  dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
  know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
  OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
  works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
  existing call protocol handler pattern. The registry currently maps paths to
  handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The nested namespace-to-operation map used by the
  `@alkdev/operations` TypeScript package (`HashMap<String, HashMap<String, fn>>`
  in Rust terms) needs an idiomatic Rust translation. The behavioral
  contract must match, but the implementation can differ.

## References

- [research/services.md](../../research/services.md): OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md): Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md): Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md): Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md): Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md): Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md): Handler/spec separation
@@ -1,55 +0,0 @@
# ADR-034: Head/Worker Terminology

## Status

Accepted

## Context

The project previously used hub/spoke terminology for describing node
relationships: a hub node that coordinates connections and spokes that connect to
it. This terminology implies a strict star topology where the hub is
fundamentally different from spokes.

In practice, a coordinating node can also execute operations (run services,
forward traffic). Any node can become a coordinator. The architecture supports
mesh topologies where nodes coordinate in a peer-to-peer fashion.

The research documents (`core.md`, `services.md`) and updated architecture
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
already use head/worker consistently. Existing ADRs (024, 025) retain their
original hub/spoke language because ADRs are historical records.

## Decision

**Use head/worker terminology throughout the project.**

- **Head node**: A node that coordinates: it accepts connections, routes
  operations, and manages cluster state. A head is also a worker (it can execute
  operations).
- **Worker node**: A node that connects to a head, registers its services, and
  executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.

The terms hub and spoke are deprecated in all new specs, code, and
documentation. Existing ADRs retain their original language as historical
records; ADRs document what was decided at the time, not what the current
terminology is.

## Consequences

- **Positive**: Natural mesh formation. A head that is also a worker enables
  multi-hop routing, redundancy, and distributed topologies without a
  centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture: there is no
  single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
  intentional: ADRs are historical records.

## References

- [research/integration-plan.md](../../research/integration-plan.md): Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md): Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md): Uses hub/spoke historically
- [research/core.md](../../research/core.md): Head/worker terminology
@@ -1,65 +0,0 @@
# ADR-035: StreamInterface and MessageInterface Split

## Status
Accepted

## Context

The `Interface` trait (ADR-026) assumes a persistent byte stream from a `Transport`. It produces a `Session` that yields `InterfaceEvent` frames. This works for SSH and raw framing, since both run over duplex streams.

However, HTTP and DNS do not fit this model. They handle individual request/response pairs, not persistent sessions. HTTP runs over a TLS connection after byte-peek protocol detection (extending the existing stealth mode pattern). DNS runs its own server on port 53. Both are stateless per-request, not session-oriented.

The three-layer model (Transport, Interface, Protocol) remains correct. The issue is that Layer 2 has two distinct patterns: stream-based (SSH, raw framing) where the transport provides a continuous byte stream, and message-based (HTTP, DNS) where the interface manages its own transport and handles discrete requests.

## Decision

Split the `Interface` trait into two independent traits:

1. **`StreamInterface`**: consumes a `TransportStream`, produces a long-lived `Session` that yields `InterfaceEvent` frames. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations.

2. **`MessageInterface`**: handles individual `InterfaceRequest` → `InterfaceResponse` pairs. Manages its own transport (HTTP server, DNS server). `HttpInterface` and `DnsInterface` are `MessageInterface` implementations.

The traits are independent. They have different signatures (`accept(stream)` vs `handle_request(req)`), different lifecycles (long-lived session vs stateless per-request), and different transport ownership (provided by caller vs self-managed).

`ListenerConfig` gains variants for both:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool,
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}
```

`TransportKind::Dns` is removed. DNS is a `MessageInterface` that manages its own transport (UDP/TCP port 53), not a transport variant.

The call protocol handler (Layer 3) is interface-agnostic: it processes `InterfaceEvent` frames from `StreamInterface` sessions and `InterfaceRequest` → `InterfaceResponse` from `MessageInterface` handlers. The dispatch logic is the same; only the framing differs.

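The two trait shapes from the Decision can be contrasted in a small sketch. It makes several simplifying assumptions that are not in the ADR: I/O is synchronous (`std::io::Read`) rather than async, the session is modelled as a boxed iterator, and `LineFraming`, `EchoEndpoint`, and the `/v1/echo` path are hypothetical toy implementations.

```rust
use std::io::{BufRead, BufReader, Read};

#[derive(Debug, PartialEq)]
enum InterfaceEvent {
    Frame(Vec<u8>),
}

// Stream-based: the caller provides the byte stream; the interface returns a
// long-lived session that yields events until the stream ends.
trait StreamInterface {
    fn accept(&self, stream: Box<dyn Read>) -> Box<dyn Iterator<Item = InterfaceEvent>>;
}

struct InterfaceRequest {
    path: String,
    body: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct InterfaceResponse {
    status: u16,
    body: Vec<u8>,
}

// Message-based: one request in, one response out, no session state.
trait MessageInterface {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse;
}

// Toy stream interface: one frame per newline-terminated line.
struct LineFraming;

impl StreamInterface for LineFraming {
    fn accept(&self, stream: Box<dyn Read>) -> Box<dyn Iterator<Item = InterfaceEvent>> {
        let lines = BufReader::new(stream).lines();
        Box::new(
            lines
                .map_while(Result::ok)
                .map(|line| InterfaceEvent::Frame(line.into_bytes())),
        )
    }
}

// Toy message interface: echoes the body for one known path.
struct EchoEndpoint;

impl MessageInterface for EchoEndpoint {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse {
        if req.path == "/v1/echo" {
            InterfaceResponse { status: 200, body: req.body }
        } else {
            InterfaceResponse { status: 404, body: Vec::new() }
        }
    }
}
```

Neither toy type could sensibly implement the other trait, which is the argument the ADR makes against a common super-trait.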
## Consequences

**Positive**: HTTP and DNS are first-class interfaces with proper type signatures. No forcing stateless protocols into a session model. The existing stealth mode byte-peek pattern naturally extends to `HttpInterface`. The `InterfaceRequest` / `InterfaceResponse` types normalize calls across message-based interfaces.

**Positive**: Removing `TransportKind::Dns` prevents a breaking change later: code should never depend on DNS as a transport variant.

**Positive**: `ListenerConfig` correctly models the server's accept loop: stream listeners spawn one accept loop per (transport, interface) pair, while HTTP and DNS listeners each manage their own server.

**Negative**: Two traits where there was one. But they serve fundamentally different purposes. A common super-trait would add complexity (`accept_stream` + `handle_request` + `transport_kind`) without practical benefit, since implementations satisfy one trait or the other, never both.

**Negative**: The current `Interface` trait needs to be renamed to `StreamInterface`. This is a rename, not a semantic change: `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations with the same `accept()` logic.

## References

- ADR-026 (transport/interface separation, updated by this ADR)
- [interface.md](../interface.md): Interface layer spec
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md): Full analysis
- [research/phase2/tls-transport.md](../../research/phase2/tls-transport.md): HTTP interface, ListenerConfig
@@ -1,82 +0,0 @@
# ADR-036: CredentialProvider as Core Type

## Status
Accepted

## Context

Alknet's `IdentityProvider` resolves **inbound** authentication: given a
credential (fingerprint or token), produce an `Identity`. But there is no
corresponding abstraction for **outbound** credentials: how does alknet
authenticate _to_ external services (vast.ai, rustfs, gitea)?

Without `CredentialProvider`, each service wrapper would independently solve
credential retrieval, caching, and lifecycle management. This leads to
duplicated effort and inconsistent security practices across service wrappers.

The pattern mirrors the existing `IdentityProvider` pattern: trait in core,
default impl using simple storage, production impl using the secret service
and database.

## Decision

Define `CredentialProvider` trait and `CredentialSet` enum in
`alknet_core::credentials`.

```rust
use std::collections::HashMap;

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}
```

The trait is intentionally narrow. It returns credentials for a named service.
It does not try to abstract the auth mechanism itself; that stays with the
service wrapper that knows the protocol (S3 signing, OAuth2 refresh, etc.).

Phase 1 provides `SecretStoreCredentialProvider` (reads from
`SecretProtocol::Decrypt`, holds in RAM). Phase 2+ adds
`ManagedCredentialProvider` (with `CredentialManager` for lifecycle management:
refresh, expiration, provisioning).

`CredentialProvider` does not depend on `IdentityProvider`, though
`ManagedCredentialProvider` may use `Identity.id` for identity-bound credential
lookups.

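To show how narrow the trait is in practice, here is a sketch of an in-memory provider and a service-wrapper helper. `StaticCredentialProvider` and `auth_header` are hypothetical names invented for this example, and the enum is trimmed to two variants to keep the block short; the trait signature is copied from the Decision.

```rust
use std::collections::HashMap;

// Trimmed copy of the ADR's enum: only the two variants this sketch uses.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Bearer { token: String },
}

trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

// Hypothetical provider that serves fixed credentials from a map.
struct StaticCredentialProvider {
    by_service: HashMap<String, CredentialSet>,
}

impl CredentialProvider for StaticCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.by_service.get(service).cloned()
    }

    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        // Static credentials have nothing to refresh; return the stored value.
        self.get_credentials(service)
    }
}

// Service-wrapper side: the wrapper, not the provider, knows how a
// credential is presented on the wire (here, as an HTTP header pair).
fn auth_header(cred: &CredentialSet) -> (String, String) {
    match cred {
        CredentialSet::ApiKey { header_name, token } => (header_name.clone(), token.clone()),
        CredentialSet::Bearer { token } => {
            ("Authorization".to_string(), format!("Bearer {token}"))
        }
    }
}
```

The split mirrors the ADR's intent: the provider only answers "which credentials for this service", and the protocol-specific presentation lives in the wrapper.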
## Consequences

**Positive**: Outbound auth has a unified abstraction, just as inbound auth
has `IdentityProvider`. Service wrappers retrieve credentials through one
interface. `OperationEnv` can expose credentials through `context.env`.

**Positive**: The `CredentialSet` enum covers all identified credential types
(API keys, bearer tokens, S3 access keys, OIDC tokens, basic auth, custom).
This is sufficient for Phases A-C. Phase D (alknet as OIDC provider) is additive.

**Positive**: The trait in core, impl in service crate pattern is consistent
with `IdentityProvider` (trait in core, `ConfigIdentityProvider` in core,
`StorageIdentityProvider` in alknet-storage).

**Negative**: Adds a new core type and a new module (`credentials`). But this
is the same pattern as `IdentityProvider` and `auth`: a small, narrow trait
with a clear contract.

**Negative**: `ManagedCredentialProvider` and `CredentialManager` are Phase C
concepts. The spec should define them as future extensions, not implement them
now.

## References

- ADR-029 (Identity as core type, same pattern)
- [credentials.md](../credentials.md): CredentialProvider spec
- [research/phase2/credential-provider.md](../../research/phase2/credential-provider.md): Full analysis
- [identity.md](../identity.md): IdentityProvider (inbound, opposite direction)
@@ -1,83 +0,0 @@
# ADR-037: API Keys as DynamicConfig Auth

## Status
Accepted

## Context

Alknet's token auth uses Ed25519-signed `AuthToken`s, the same key material
used for SSH auth. This is appropriate for interactive clients (browsers, CLI)
that can generate and sign Ed25519 key pairs.

But for service accounts, automation, and simple integrations, Ed25519 key
pairs are inconvenient. A dashboard backend, a CI/CD pipeline, or a monitoring
script needs a simple bearer token that can be stored in an environment variable
or config file without managing cryptographic key pairs.

The HTTP interface (Phase 2+) requires bearer token auth for
`Authorization: Bearer <token>` headers. `AuthToken` works but requires
client-side Ed25519 signing. API keys offer a simpler alternative: short bearer
tokens verified by SHA-256 hash lookup, with optional scope restrictions and TTL.

## Decision

Add `[[auth.api_keys]]` section to `DynamicConfig`:

```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect", "secrets:derive"]
description = "dashboard service account"
ttl = "30d"  # optional
```

`ConfigIdentityProvider::resolve_from_token()` handles both token types:
- If the input starts with the configured prefix (default `alk_`), treat it as
  an API key: hash it with SHA-256 and look up the hash in the `api_keys` table.
- Otherwise, treat it as an `AuthToken`: decode, verify Ed25519 signature,
  check timestamp, resolve from `authorized_keys`.

Both paths produce the same `Identity` result. In database-backed deployments,
both resolve to the same account UUID.

API keys are stored as SHA-256 hashes (like password hashing: the cleartext
key is never stored, only its hash). The prefix enables O(1) routing between
AuthToken and API key verification without trying both paths.

The full key is provided to the client exactly once (at creation time). Subsequent
verifications only compare hashes.

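The prefix routing described in the Decision can be sketched as below. This is an illustration under stated assumptions, not the `ConfigIdentityProvider` code: `classify`, `resolve_api_key`, and `ApiKeyEntry` are hypothetical names, the hash function is injected as a parameter so the block needs no external crate (real code would pass a SHA-256 implementation), and hashing the full presented token including its prefix is an assumption the ADR does not spell out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TokenKind {
    ApiKey,
    AuthToken,
}

// Prefix check: a single string comparison decides the verification path,
// so the two verifiers are never both attempted.
fn classify(input: &str, prefix: &str) -> TokenKind {
    if input.starts_with(prefix) {
        TokenKind::ApiKey
    } else {
        TokenKind::AuthToken
    }
}

// Hypothetical config entry, keyed in the table by the hash of the key.
struct ApiKeyEntry {
    scopes: Vec<String>,
}

// Look up an API key by the hash of the presented token. `hash` is injected
// to keep this sketch std-only; production code would use SHA-256 here.
fn resolve_api_key<'t>(
    input: &str,
    prefix: &str,
    hash: impl Fn(&str) -> String,
    table: &'t HashMap<String, ApiKeyEntry>,
) -> Option<&'t ApiKeyEntry> {
    match classify(input, prefix) {
        TokenKind::ApiKey => table.get(&hash(input)),
        // Non-prefixed input belongs to the Ed25519 AuthToken path instead.
        TokenKind::AuthToken => None,
    }
}
```

Because only hashes are stored in `table`, a leaked config file does not reveal usable keys, which matches the storage rule stated above.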
## Consequences

**Positive**: Simple bearer token auth for HTTP and other non-SSH interfaces.
No cryptographic key management for service accounts. Consistent with industry
practice (Stripe, GitHub, AWS all use prefixed API keys).

**Positive**: Both AuthTokens and API keys go through `resolve_from_token()`.
The caller doesn't need to know which type they're using. This keeps the
authentication layer unified.

**Positive**: Scoped API keys enable fine-grained access control for service
accounts. A monitoring tool gets `["monitoring:read"]`, not full access.

**Negative**: API keys are bearer tokens: anyone who obtains the key has the
associated permissions. The hash storage and optional TTL mitigate but do not
eliminate this risk. Ed25519 AuthTokens remain the preferred auth method for
interactive clients.

**Negative**: API key rotation requires updating `DynamicConfig` (or the
`api_keys` database table). The `ConfigReloadHandle` / `ConfigService` reload
mechanism handles this, but it's a deliberate operation, not automatic.

**Negative**: No rate limiting on API key verification is built into this ADR.
Rate limiting on the HTTP interface is a separate concern.

## References

- ADR-023 (unified auth, shared key material)
- ADR-029 (Identity as core type)
- ADR-030 (static/dynamic config split)
- [auth.md](../auth.md): Token auth, AuthPolicy, API keys
- [configuration.md](../configuration.md): DynamicConfig, AuthPolicy
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md): API keys in config
@@ -1,137 +0,0 @@
# ADR-038: Seed Lifecycle and Memory Security

## Status

Accepted

## Context

The alknet-secret crate holds the master BIP39 seed phrase in RAM. This seed is
the root of trust for all derived keys (identity, encryption, signing). If the
seed is leaked (through memory dumps, swap files, or core dumps), an attacker
can derive every key in the system.

Security-conscious key management systems typically employ three defenses:

1. **Zeroize**: Overwrite sensitive memory before deallocating. Prevents
   stale-data reads from freed memory.

2. **Memory locking** (`mlock`/`VirtualLock`): Prevent the OS from paging
   sensitive RAM to disk. Prevents swap-file leakage.

3. **Constant-time comparison**: Prevent timing side-channels when comparing
   keys or tokens.

The question is: which of these should alknet-secret adopt in v1, and which
should be deferred?

## Decision

**Phase 3 (v1): Zeroize only. Defer mlock and constant-time comparison to
Phase B.**

- All sensitive types (seed bytes, derived private keys, passphrase strings)
  derive `Zeroize` and implement `Drop` to call `zeroize()` before deallocation.
- The `Lock` operation calls `zeroize()` on the seed and all cached derived
  keys, then drops them.
- `mlock`/`VirtualLock` and constant-time comparison are not included in v1.

### Rationale for deferring mlock

1. **Complexity**: `mlock` requires root/CAP_IPC_LOCK on Linux or
   `SeLockMemory` on Windows. The crate should work in unprivileged contexts
   (development, testing, single-user nodes) without requiring system
   configuration changes.

2. **Performance**: `mlock` locks physical pages, which are typically 4KB.
   Locking many small buffers wastes physical memory. The seed (64 bytes) and
   derived keys (32–64 bytes each) are tiny; the real risk is swap-file
   leakage, which `zeroize` partially mitigates by wiping before free.

3. **Deployment flexibility**: Production head nodes running as root or with
   `CAP_IPC_LOCK` can add `mlock` in Phase B. Development and CLI nodes
   shouldn't need it.

4. **Audit surface**: `mlock` introduces platform-specific code paths (Linux
   vs macOS vs Windows) that should be audited together, not bolted on
   incrementally.

### Rationale for deferring constant-time comparison

The `SecretProtocol` service receives requests over irpc (local mpsc or remote
QUIC). Comparison timing is not observable by callers: they send a message and
wait for a response. The comparison that matters (auth token verification) is
in alknet-core's `IdentityProvider`, not in alknet-secret. Key derivation
results (DerivedKey) are not compared against attacker-controlled input within
this crate.

### Zeroize implementation

```rust
use std::collections::HashMap;
use zeroize::Zeroize;

#[derive(Zeroize)]
#[zeroize(drop)]
struct SeedHolder {
    seed: Vec<u8>,
}

// `HashMap` does not implement `Zeroize`, so the cache wipes its values by hand.
struct DerivedKeyCache {
    keys: HashMap<String, Vec<u8>>,
}

impl Drop for DerivedKeyCache {
    fn drop(&mut self) {
        self.keys.values_mut().for_each(|key| key.zeroize());
    }
}
```

`#[zeroize(drop)]` generates a `Drop` implementation that calls `zeroize()` on
all fields, overwriting memory before deallocation. The derive is checked at
compile time: a field whose type does not implement `Zeroize` is a compile
error unless it is explicitly skipped. `HashMap` is such a type, so
`DerivedKeyCache` zeroizes its values in a hand-written `Drop` instead.

### Lock lifecycle

```
Unlock(passphrase)
  → validate mnemonic (if restoring) or generate new
  → derive master key from seed
  → store seed in SeedHolder (Zeroize-protected)
  → cache empty (keys derived on demand)

DeriveEd25519/DeriveEncryptionKey/Encrypt/Decrypt
  → require unlocked state (error if locked)
  → derive key, return result
  → optionally cache derived key

Lock
  → zeroize all cached derived keys
  → zeroize seed
  → drop all sensitive material
  → service returns to locked state
```

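The lifecycle above maps naturally onto a two-state type. The following is a std-only sketch under stated assumptions: `SecretService`, `SeedState`, `SecretError`, and `require_unlocked` are hypothetical names, passphrase handling and key derivation are omitted, and plain byte writes stand in for `zeroize()` (unlike the real crate, the optimizer is free to elide them, so this is not a secure wipe).

```rust
#[derive(Debug, PartialEq)]
enum SecretError {
    Locked,
}

enum SeedState {
    Locked,
    Unlocked { seed: Vec<u8> },
}

struct SecretService {
    state: SeedState,
}

impl SecretService {
    fn new() -> Self {
        Self { state: SeedState::Locked }
    }

    // Store the seed and enter the unlocked state.
    fn unlock(&mut self, seed: Vec<u8>) {
        self.lock(); // wipe any previous seed before replacing it
        self.state = SeedState::Unlocked { seed };
    }

    // Gate used by every derive/encrypt/decrypt operation: error if locked.
    fn require_unlocked(&self) -> Result<&[u8], SecretError> {
        match &self.state {
            SeedState::Unlocked { seed } => Ok(seed.as_slice()),
            SeedState::Locked => Err(SecretError::Locked),
        }
    }

    // Overwrite the seed, then drop it and return to the locked state.
    fn lock(&mut self) {
        if let SeedState::Unlocked { seed } = &mut self.state {
            // Plain writes stand in for `zeroize()`; they may be optimized away.
            seed.iter_mut().for_each(|byte| *byte = 0);
        }
        self.state = SeedState::Locked;
    }

    fn is_locked(&self) -> bool {
        matches!(self.state, SeedState::Locked)
    }
}
```

Modelling the locked state as an enum variant with no seed field means a derive operation cannot reach seed bytes without first passing the `require_unlocked` check.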
## Consequences

- **Positive**: Zeroize is zero-cost at compile time, minimal dependency
  (`zeroize` crate is ~500 lines, no `unsafe` on stable), and provides
  meaningful protection against stale-memory reads.
- **Positive**: Lock effectively purges all sensitive material. After Lock,
  the process memory contains no useful secret data.
- **Positive**: No platform-specific code paths in v1. The crate compiles and
  runs everywhere without privilege requirements.
- **Negative**: Without `mlock`, the OS can page the seed to swap before
  zeroization occurs. This is a window of vulnerability that Phase B closes.
  The risk is acceptable for v1 because swap-file extraction requires root
  access or physical access to the machine, which is the same threat model as
  reading process memory directly.
- **Negative**: Without constant-time comparison, timing side-channels exist
  in theory. In practice, no comparison in alknet-secret operates on
  attacker-controlled input, so the risk is nil within this crate.
- **Negative**: `zeroize` adds a dependency. The `zeroize` crate is widely
  used in Rust crypto (ring, ed25519-dalek, x25519-dalek) and is a de facto
  standard.

## References

- [secret-service.md](../secret-service.md): Security model, Lock/Unlock lifecycle
- [ADR-027](027-crate-decomposition.md): Crate decomposition (alknet-secret is independent)
- [credentials.md](../credentials.md): SecretStoreCredentialProvider integration
- `zeroize` crate: https://crates.io/crates/zeroize
@@ -1,226 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Definitions: Terminology and Concept Disambiguation

## Purpose

Several terms are overloaded across alknet's architecture. This document defines
each term precisely and states the rule for using it in architecture specs. When
ambiguity is possible, specs must use the full qualifier.

This is a normative reference: other architecture documents link here rather
than repeating definitions inline.

## Term Definitions

### Interface (Layer 2)

An **Interface** consumes a Transport stream (Layer 1) or manages its own
transport, and produces call protocol sessions or handles discrete requests.
It is a _protocol parser_, not a network service.

Two subtypes:

| Subtype | Trait | Lifecycle | Transport ownership | Examples |
|---------|-------|-----------|---------------------|----------|
| `StreamInterface` | `accept(stream) → Session` | Long-lived session | Provided by caller | SshInterface, RawFramingInterface |
| `MessageInterface` | `handle_request(req) → Response` | Stateless per-request | Self-managed | HttpInterface, DnsInterface |

**Rule**: In alknet architecture docs, "Interface" (capitalized) refers to
Layer 2. Rust trait definitions use "trait" or "contract." Network URLs use
"endpoint." When discussing auth mechanisms per transport/interface pair, use
"credential presentation" (not "auth interface").

See: [interface.md](interface.md), ADR-035.

### Transport (Layer 1)
|
||||
|
||||
A **Transport** produces a byte stream (`AsyncRead + AsyncWrite + Unpin + Send`).
|
||||
It is a _wire mechanism_, not a protocol. `TransportKind` enumerates:
|
||||
`Tcp`, `Tls`, `Iroh`, `WebTransport`.
|
||||
|
||||
DNS is **not** a transport — it is a `MessageInterface` that manages its own
|
||||
transport (UDP/TCP port 53).
|
||||
|
||||
**Rule**: Never use "transport" to refer to HTTP, DNS, or any protocol that
|
||||
doesn't produce a `TransportStream`. Use "MessageInterface" instead.
|
||||
|
||||
See: [transport.md](transport.md), ADR-026, ADR-035.
|
||||
|
||||
### Service (irpc service)
|
||||
|
||||
An **irpc service** is an in-cluster, Rust-to-Rust service defined by an irpc
|
||||
protocol enum. Dispatched by enum variant with postcard serialization. Examples:
|
||||
`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`.
|
||||
|
||||
**Rule**: Always qualify: "irpc service" (in-cluster, enum-dispatched),
|
||||
"application service" (operation-registered handler), or "external service"
|
||||
(third-party endpoint). Never use bare "service" in architecture docs.
|
||||
|
||||
See: [services.md](services.md), ADR-028, ADR-033.
|
||||
|
||||
### Operation (call protocol)
|
||||
|
||||
An **operation** is a path-based handler registered in `OperationRegistry`,
|
||||
dispatched by `namespace + name`. Cross-node, cross-language, JSON
|
||||
`EventEnvelope` framing.
|
||||
|
||||
**Rule**: Use "operation" for call protocol handlers. Use "irpc service method"
|
||||
for enum-dispatched calls. These are different dispatch mechanisms unified by
|
||||
OperationEnv.
|
||||
|
||||
See: [call-protocol.md](call-protocol.md), ADR-033.
|
||||
|
||||
### Identity (core type)
|
||||
|
||||
The `Identity` struct `{ id, scopes, resources }` represents an authenticated
|
||||
principal. Produced by `IdentityProvider` (inbound auth resolution).
|
||||
|
||||
| Identity field | Config-backed auth | Database-backed auth |
|---------------|-------------------|---------------------|
| `id` | SSH key fingerprint | Account UUID |
| `scopes` | From authorized_keys entry | From peer_credentials + ACL |
| `resources` | From authorized_keys entry | From organization membership |
|
||||
|
||||
**Rule**: "Identity" (capitalized, code font) = the alknet struct. "identity
|
||||
service" = a full identity management system (Keystone, etc.). Never conflate
|
||||
the two.
|
||||
|
||||
See: [identity.md](identity.md), ADR-029.
|
||||
|
||||
### IdentityProvider (inbound auth)
|
||||
|
||||
`IdentityProvider` resolves **inbound** authentication: given a credential
|
||||
(fingerprint or token), produce an `Identity`.
|
||||
|
||||
**Direction**: Inbound (who is calling alknet).
|
||||
|
||||
**Rule**: Never use "IdentityProvider" to describe outbound auth. That is
|
||||
`CredentialProvider`.
|
||||
|
||||
See: [identity.md](identity.md), ADR-029.
|
||||
|
||||
### CredentialProvider (outbound auth)
|
||||
|
||||
`CredentialProvider` resolves **outbound** credentials: given a service name,
|
||||
produce a `CredentialSet` for authenticating _to_ that service.
|
||||
|
||||
**Direction**: Outbound (how alknet calls others).
|
||||
|
||||
**Rule**: Never use "CredentialProvider" for inbound auth. That is
|
||||
`IdentityProvider`.
|
||||
|
||||
See: [credentials.md](credentials.md), ADR-036.
|
||||
|
||||
### AuthToken
|
||||
|
||||
`AuthToken = base64url(key_id || timestamp || signature)` — an Ed25519-signed
|
||||
timestamp token used for non-SSH auth. Self-signed by the client, verified
|
||||
server-side.
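
The `key_id || timestamp || signature` layout can be illustrated with a minimal sketch. The field widths for `key_id` and `timestamp`, the big-endian timestamp encoding, and the names `RawAuthToken`, `split_auth_token`, and `is_fresh` are assumptions made for illustration; only the 64-byte Ed25519 signature length is fixed by the algorithm. Base64url decoding and signature verification are deliberately left out.

```rust
/// Assumed width of the key identifier; the spec text does not state it.
const KEY_ID_LEN: usize = 32;
/// Assumed: timestamp is a big-endian u64 (seconds since the Unix epoch).
const TIMESTAMP_LEN: usize = 8;
/// Ed25519 signatures are always 64 bytes.
const SIGNATURE_LEN: usize = 64;

#[derive(Debug, PartialEq)]
pub struct RawAuthToken<'a> {
    pub key_id: &'a [u8],
    pub timestamp: u64,
    pub signature: &'a [u8],
}

/// Split already base64url-decoded bytes into their three concatenated fields.
/// Returns `None` when the total length does not match the assumed layout.
pub fn split_auth_token(bytes: &[u8]) -> Option<RawAuthToken<'_>> {
    if bytes.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let (key_id, rest) = bytes.split_at(KEY_ID_LEN);
    let (ts, signature) = rest.split_at(TIMESTAMP_LEN);
    let mut ts_buf = [0u8; TIMESTAMP_LEN];
    ts_buf.copy_from_slice(ts);
    Some(RawAuthToken {
        key_id,
        timestamp: u64::from_be_bytes(ts_buf),
        signature,
    })
}

/// A signed-timestamp token is only useful if stale timestamps are rejected.
/// `max_skew_secs` is a hypothetical policy knob.
pub fn is_fresh(token_ts: u64, now: u64, max_skew_secs: u64) -> bool {
    now.abs_diff(token_ts) <= max_skew_secs
}
```

A verifier would run the freshness check before the more expensive signature check, and would look up the public key by `key_id` in the same authorized-key set used for SSH.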
|
||||
|
||||
**Rule**: Use "AuthToken" (capitalized) for this specific format. Use "API key"
|
||||
for hash-verified bearer tokens. Never use bare "token" in architecture docs.
|
||||
|
||||
See: [auth.md](auth.md), ADR-023.
|
||||
|
||||
### API Key
|
||||
|
||||
A hash-verified bearer token with a prefix like `alk_...`. Simpler than
|
||||
AuthToken (no Ed25519 key pair needed). Stored as SHA-256 hash in
|
||||
`DynamicConfig.auth.api_keys` or `api_keys` table.
|
||||
|
||||
**Rule**: Always "API key" (two words) for hash-verified bearer tokens.
|
||||
"AuthToken" for Ed25519-signed tokens.
|
||||
|
||||
See: [auth.md](auth.md), ADR-037.
|
||||
|
||||
### Domain Event vs Integration Event
|
||||
|
||||
| Type | Scope | Serialization | Example |
|------|-------|---------------|---------|
| Domain event | Within a service boundary | Any format (Honker streams) | `KeyRotated`, `InventoryAdjusted` |
| Integration event | Across service or node boundaries | JSON `EventEnvelope` | `call.requested`, `UserCreated` |
|
||||
|
||||
irpc service calls are synchronous request-response, not events.
|
||||
|
||||
**Rule**: "Domain event" for internal Honker streams. "Integration event" for
|
||||
call protocol `EventEnvelope`. "irpc call" for synchronous in-cluster calls.
|
||||
Per ADR-032, domain events never cross service boundaries without projection.
|
||||
|
||||
See: ADR-032, [services.md](services.md).
|
||||
|
||||
### Scope
|
||||
|
||||
A permission string attached to an `Identity`. Flat strings like
|
||||
`"relay:connect"`, `"secrets:derive"`. Used by `ForwardingPolicy` and
|
||||
operation-level ACL.
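
As a rough illustration of what "flat" means here, the sketch below checks scopes by exact string match, with no hierarchy or wildcard expansion. The struct mirrors the `{ id, scopes, resources }` shape described in this document; the helper methods `has_scope` and `has_resource` are hypothetical names, not part of the documented API.

```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// Flat scopes: "relay:connect" does not imply "relay:*" or vice versa.
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// Resources are grouped by kind, e.g. "service" -> ["gitea", "registry"].
    pub fn has_resource(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n == name))
    }
}
```

Keeping the check this simple is what lets implied-scope resolution be added later in a storage-backed provider without changing the consumers.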
|
||||
|
||||
**Rule**: Use "scope" for `Identity.scopes` flat strings. Use "resource" for
|
||||
`Identity.resources` entries. Do not conflate with hierarchical role models
|
||||
unless explicitly noting a comparison to Keystone.
|
||||
|
||||
See: [identity.md](identity.md), ADR-031.
|
||||
|
||||
### OperationRegistry
|
||||
|
||||
The central registry mapping `(namespace, operation_name)` to handlers and
|
||||
specs. All interfaces resolve to the same registry.
|
||||
|
||||
**Rule**: "OperationRegistry" for this specific data structure. "Service
|
||||
catalog" only when explicitly comparing to Keystone or similar external systems.
|
||||
|
||||
See: [call-protocol.md](call-protocol.md), ADR-025.
|
||||
|
||||
### Credential Presentation
|
||||
|
||||
The mechanism by which credentials are presented on each (Transport, Interface)
|
||||
pair:
|
||||
|
||||
| (Transport, Interface) | Credential presentation | Resolves via |
|----------------------|----------------------|-------------|
| (TLS, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TCP, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (iroh, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TLS, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, raw framing) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HTTP) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DNS) | AuthToken in query labels | `resolve_from_token()` |
|
||||
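
The table above can be read as a lookup from a (Transport, Interface) pair to a resolver method. The sketch below encodes exactly the listed rows and returns `None` for any pair the table does not mention. The enum and function names (`InterfaceKind`, `Resolver`, `resolver_for`) are hypothetical; only the `TransportKind` variants come from this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransportKind { Tcp, Tls, Iroh, WebTransport }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InterfaceKind { Ssh, RawFraming, Http, Dns }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Resolver { FromFingerprint, FromToken }

/// `transport` is `None` for message interfaces, which manage their own transport.
pub fn resolver_for(
    transport: Option<TransportKind>,
    interface: InterfaceKind,
) -> Option<Resolver> {
    use InterfaceKind::*;
    use TransportKind::*;
    match (transport, interface) {
        // SSH key handshake rows.
        (Some(Tls | Tcp | Iroh), Ssh) => Some(Resolver::FromFingerprint),
        // AuthToken in frame header or CONNECT request.
        (Some(Tls | Tcp | WebTransport), RawFraming) => Some(Resolver::FromToken),
        // Bearer header (HTTP) or query labels (DNS).
        (None, Http | Dns) => Some(Resolver::FromToken),
        // Pairs not listed in the table are treated as unsupported here.
        _ => None,
    }
}
```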
|
||||
**Rule**: Use "credential presentation" for the mechanism of presenting
|
||||
credentials on a specific (Transport, Interface) pair. Not "auth interface"
|
||||
(which overloads "Interface").
|
||||
|
||||
See: [auth.md](auth.md), [interface.md](interface.md).
|
||||
|
||||
## Cross-cutting Open Questions
|
||||
|
||||
These questions affect multiple specs and need resolution before or during
|
||||
Phase 2 implementation:
|
||||
|
||||
- **OQ-DEF-03**: Should `Identity.scopes` be hierarchical (Keystone implied roles)
|
||||
or stay flat? Recommendation: Stay flat. Add implied scope resolution in
|
||||
alknet-storage when multi-tenant deployment requires it.
|
||||
|
||||
- **OQ-DEF-07**: Should the on-chain `IdentityProvider` be a separate impl or a
|
||||
`CredentialProvider` extension? Recommendation: Separate `IdentityProvider`
|
||||
impl (`OnChainIdentityProvider`). `IdentityProvider` resolves inbound auth,
|
||||
not outbound credentials.
|
||||
|
||||
- **OQ-DEF-08**: Should "credential presentation" replace overloaded "interface" in
|
||||
auth contexts? Recommendation: Yes. Adopted in this document.
|
||||
|
||||
See: [open-questions.md](open-questions.md) for tracking.
|
||||
|
||||
## References
|
||||
|
||||
- [interface.md](interface.md) — StreamInterface / MessageInterface
|
||||
- [auth.md](auth.md) — AuthToken, credential presentation per interface
|
||||
- [identity.md](identity.md) — Identity, IdentityProvider
|
||||
- [credentials.md](credentials.md) — CredentialProvider, CredentialSet
|
||||
- [services.md](services.md) — irpc services vs application services
|
||||
- [call-protocol.md](call-protocol.md) — Operations, OperationEnv
|
||||
- [research/phase2/definitions.md](../research/phase2/definitions.md) — Full research with cross-domain mappings
|
||||
@@ -1,186 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# FlowGraph
|
||||
|
||||
## What
|
||||
|
||||
The `alknet-flowgraph` crate provides graph data structures and operations,
|
||||
mapping the TypeScript `@alkdev/flowgraph` package's call-graph and
|
||||
operation-graph concepts to `petgraph::DiGraph`.
|
||||
|
||||
## Why
|
||||
|
||||
Call graphs and operation graphs are core observability and type-safety
|
||||
constructs. Call graphs track request flow across services; operation graphs
|
||||
validate type compatibility between composed operations. The crate is pure
|
||||
computation (no I/O, no external state), making it safe to include in any
|
||||
deployment topology.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Core Abstraction
|
||||
|
||||
`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the
|
||||
operations used:
|
||||
|
||||
| TypeScript (graphology) | Rust (petgraph) |
|------------------------|-----------------|
| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` + key_to_index |
| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` |
| `hasCycle()` | `is_cyclic_directed(&graph)` |
| `topologicalSort()` | `toposort(&graph, None)` |
|
||||
|
||||
A `HashMap<String, NodeIndex>` provides node-key-to-index lookups, mirroring
|
||||
the `key` column in the SQLite `nodes` table.
|
||||
|
||||
### FlowGraph<N, E>
|
||||
|
||||
```rust
use std::collections::HashMap;
use std::fmt::Debug;

use petgraph::graph::{DiGraph, NodeIndex};
use serde::{de::DeserializeOwned, Serialize};

pub struct FlowGraph<N, E>
where
    N: NodeAttributes,
    E: EdgeAttributes,
{
    graph: DiGraph<N, E>,
    key_to_index: HashMap<String, NodeIndex>,
}

pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn key(&self) -> &str;
    fn set_key(&mut self, key: String);
}

pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn edge_type(&self) -> &str;
}
```
|
||||
|
||||
### Operation Graph (Static)
|
||||
|
||||
Built from `OperationSpec`s at startup. Answers structural questions: type
|
||||
compatibility, cycle detection, reachability.
|
||||
|
||||
```rust
pub struct OperationNodeAttrs {
    pub name: String,
    pub namespace: String,
    pub op_type: OperationType,
    pub input_schema: Value,
    pub output_schema: Value,
}

pub enum OperationType { Query, Mutation, Subscription }
```
|
||||
|
||||
Type compatibility compares `output_schema` (source) against `input_schema`
|
||||
(target) using `jsonschema::validate()`. Exact match or subtype = compatible
|
||||
edge. Structural mismatch = incompatible edge.
|
||||
|
||||
### Call Graph (Dynamic)
|
||||
|
||||
Populated at runtime from call protocol events. Every `call.requested` adds a
|
||||
node; `call.responded`/`call.error`/`call.aborted` update status.
|
||||
|
||||
```rust
pub struct CallNodeAttrs {
    pub request_id: String,
    pub operation_id: String,
    pub status: CallStatus,
    pub parent_request_id: Option<String>,
    pub input: Value,
    pub output: Option<Value>,
    pub error: Option<CallErrorInfo>,
    pub identity: Option<Identity>,
    pub started_at: Option<String>,
    pub completed_at: Option<String>,
}

pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }
```
|
||||
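
The event-driven status updates described above can be sketched with plain maps. This is a simplified stand-in, not the crate's implementation: the `CallEvent` shape and the `CallStatusIndex` type are hypothetical, and the mapping of `call.requested` to `Pending`, `call.responded` to `Completed`, `call.error` to `Failed`, and `call.aborted` to `Aborted` is an assumption (the text does not say when `Running` is entered).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }

/// Hypothetical event shape; only the event names come from the text above.
pub enum CallEvent {
    Requested { request_id: String, parent_request_id: Option<String> },
    Responded { request_id: String },
    Error { request_id: String },
    Aborted { request_id: String },
}

#[derive(Default)]
pub struct CallStatusIndex {
    status: HashMap<String, CallStatus>,
    parent: HashMap<String, String>,
}

impl CallStatusIndex {
    pub fn update_from_event(&mut self, event: &CallEvent) {
        match event {
            // Every `call.requested` adds a node.
            CallEvent::Requested { request_id, parent_request_id } => {
                self.status.insert(request_id.clone(), CallStatus::Pending);
                if let Some(parent) = parent_request_id {
                    self.parent.insert(request_id.clone(), parent.clone());
                }
            }
            // The remaining events only update an existing node.
            CallEvent::Responded { request_id } => self.finish(request_id, CallStatus::Completed),
            CallEvent::Error { request_id } => self.finish(request_id, CallStatus::Failed),
            CallEvent::Aborted { request_id } => self.finish(request_id, CallStatus::Aborted),
        }
    }

    fn finish(&mut self, request_id: &str, status: CallStatus) {
        // Events for unknown requests are ignored rather than creating nodes.
        if let Some(current) = self.status.get_mut(request_id) {
            *current = status;
        }
    }

    pub fn parent_of(&self, request_id: &str) -> Option<&str> {
        self.parent.get(request_id).map(|p| p.as_str())
    }

    /// Keys with a matching status, sorted for deterministic output.
    pub fn filter_by_status(&self, wanted: CallStatus) -> Vec<String> {
        let mut keys: Vec<String> = self
            .status
            .iter()
            .filter(|(_, s)| **s == wanted)
            .map(|(k, _)| k.clone())
            .collect();
        keys.sort();
        keys
    }
}
```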
|
||||
### Key Operations
|
||||
|
||||
| Query | Method | Returns |
|-------|--------|---------|
| Topological order | `topological_order()` | `Result<Vec<String>, CycleError>` |
| Cycle detection | `has_cycles()` | `bool` |
| Ancestors/descendants | `ancestors()`, `descendants()` | `Vec<String>` |
| Status filtering | `filter_by_status()` | Keys with matching status |
| Duration | `duration()` | `completed_at - started_at` |
|
||||
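
To make the `topological_order()` contract concrete, here is a standalone sketch using Kahn's algorithm over string keys. It is not the crate's implementation, which would presumably delegate to petgraph; the free-function signature and the unit-struct `CycleError` are assumptions chosen to match the return type in the table.

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Debug, PartialEq)]
pub struct CycleError;

/// Kahn's algorithm: repeatedly emit nodes whose remaining in-degree is zero.
/// If some nodes are never emitted, the graph contains a cycle.
pub fn topological_order<'a>(
    nodes: &[&'a str],
    edges: &[(&'a str, &'a str)],
) -> Result<Vec<String>, CycleError> {
    // BTreeMap keeps iteration order deterministic.
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut outgoing: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (from, to) in edges {
        outgoing.entry(*from).or_default().push(*to);
        indegree.entry(*from).or_insert(0);
        *indegree.entry(*to).or_insert(0) += 1;
    }

    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(node, _)| *node)
        .collect();

    let mut order = Vec::new();
    while let Some(node) = ready.pop_front() {
        order.push(node.to_string());
        for next in outgoing.get(node).into_iter().flatten() {
            let degree = indegree.get_mut(next).unwrap();
            *degree -= 1;
            if *degree == 0 {
                ready.push_back(*next);
            }
        }
    }

    if order.len() == indegree.len() {
        Ok(order)
    } else {
        Err(CycleError)
    }
}
```

With this shape, `has_cycles()` can be expressed as `topological_order(..).is_err()`.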
|
||||
### DAG Invariants
|
||||
|
||||
- **Operation graph**: DAG-only, enforced at construction. An edge that would
create a cycle is rejected with `CycleError`.
|
||||
- **Call graph**: DAG by design. `parent_request_id` cannot create ancestor
|
||||
cycles.
|
||||
- **No parallel edges**: `multi: false`.
|
||||
- **No self-loops**: `allow_self_loops: false`.
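
petgraph's `add_edge` accepts both parallel edges and self-loops, so the last two invariants would have to be enforced by the wrapper itself. The sketch below shows one way to do that alongside the key-to-index lookup. It uses a plain `usize` index instead of petgraph's `NodeIndex` to stay self-contained, and the names `KeyedGraph` and `EdgeError` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum EdgeError {
    UnknownNode(String),
    SelfLoop,
    ParallelEdge,
}

#[derive(Default)]
pub struct KeyedGraph {
    keys: Vec<String>,                    // index -> key
    key_to_index: HashMap<String, usize>, // key -> index
    edges: HashSet<(usize, usize)>,       // directed (source, target) pairs
}

impl KeyedGraph {
    /// Insert a node, or return the existing index if the key is already known.
    pub fn add_node(&mut self, key: &str) -> usize {
        if let Some(&index) = self.key_to_index.get(key) {
            return index;
        }
        let index = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), index);
        index
    }

    fn index_of(&self, key: &str) -> Result<usize, EdgeError> {
        self.key_to_index
            .get(key)
            .copied()
            .ok_or_else(|| EdgeError::UnknownNode(key.to_string()))
    }

    /// Reject self-loops and duplicate (source, target) pairs before inserting.
    pub fn add_edge(&mut self, source: &str, target: &str) -> Result<(), EdgeError> {
        let s = self.index_of(source)?;
        let t = self.index_of(target)?;
        if s == t {
            return Err(EdgeError::SelfLoop);
        }
        if !self.edges.insert((s, t)) {
            return Err(EdgeError::ParallelEdge);
        }
        Ok(())
    }
}
```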
|
||||
|
||||
### Integration with alknet-storage
|
||||
|
||||
Call graphs and operation graphs are stored as metagraph instances in
|
||||
alknet-storage. The bridge is serialization: `FlowGraph` serializes to
|
||||
`serde_json::Value`, which storage persists in the `nodes.attributes` and
|
||||
`edges.attributes` columns.
|
||||
|
||||
### Integration with alknet-core (Call Protocol)
|
||||
|
||||
The call protocol's `EventEnvelope` drives call graph construction:
|
||||
|
||||
```rust
call_map.on_requested(|event| {
    call_graph.update_from_event(&CallEvent::Requested(event));
});
```
|
||||
|
||||
### Crate Dependencies
|
||||
|
||||
```toml
[dependencies]
petgraph = "0.x"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
thiserror = "1"
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.x", features = ["serde"] }
```
|
||||
|
||||
Does NOT depend on alknet-core, alknet-storage, or alknet-secret.
|
||||
|
||||
### Interface Back to Core
|
||||
|
||||
`OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions.
|
||||
The bridge is serialization — flowgraph serializes to JSON, storage persists it.
|
||||
alknet-flowgraph does not depend on alknet-core as a crate; it conforms to the
|
||||
`OperationSpec` schema independently.
|
||||
|
||||
## Constraints
|
||||
|
||||
- Pure computation crate — no I/O, no database, no external state.
|
||||
- No dependency on alknet-core, alknet-storage, or alknet-secret.
|
||||
- Type compatibility with alknet-core's `OperationSpec` is via serialization
|
||||
conformance, not a crate dependency.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- None specific to this spec. See [open-questions.md](open-questions.md) for
|
||||
general questions.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-flowgraph is independent of core, storage, secret |
|
||||
|
||||
## References
|
||||
|
||||
- [research/flow.md](../research/flow.md) — Full FlowGraph, operation graph, call graph design
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.3
|
||||
- [call-protocol.md](call-protocol.md) — EventEnvelope, PendingRequestMap
|
||||
- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph implementation
|
||||
- `@alkdev/operations` — OperationSpec, CallHandler, registry
|
||||
@@ -1,193 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Identity
|
||||
|
||||
## What
|
||||
|
||||
The `Identity` type and `IdentityProvider` trait are the core abstractions for
|
||||
authentication and authorization in alknet. `Identity` is the unified result of
|
||||
auth verification — whether via SSH public key, signed timestamp token, or
|
||||
database lookup. `IdentityProvider` is the trait that resolves credentials to an
|
||||
`Identity`, decoupling alknet-core from any specific identity storage.
|
||||
|
||||
## Why
|
||||
|
||||
Auth, forwarding policy, and call protocol all need to know who is making a
|
||||
request and what they are authorized to do. Without `Identity` in core, each
|
||||
subsystem would define its own identity type, leading to duplication and
|
||||
conversion boilerplate. Without `IdentityProvider` as a trait, alknet-core
|
||||
would either hardcode config-file-based auth or take a database dependency —
|
||||
neither acceptable for a library crate.
|
||||
|
||||
The `IdentityProvider` trait exists because the same auth verification concept
|
||||
needs two implementations: `ConfigIdentityProvider` for minimal deployments (all
|
||||
keys in memory via ArcSwap) and `StorageIdentityProvider` for production (SQLite
|
||||
lookup via `peer_credentials` and ACL graph). The trait is the contract; the
|
||||
backing store is pluggable.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Identity Struct
|
||||
|
||||
```rust
pub struct Identity {
    pub id: String,                              // Fingerprint or account UUID
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
|
||||
|
||||
The `id` field serves dual purpose:
|
||||
- **Config-based auth** (`ConfigIdentityProvider`): holds the Ed25519 key
|
||||
fingerprint (e.g., `SHA256:abc123...`)
|
||||
- **Database-backed auth** (`StorageIdentityProvider`): holds the account UUID
|
||||
from the `accounts` table
|
||||
|
||||
This keeps the type simple while accommodating both auth paths. Downstream
|
||||
consumers (forwarding policy, call protocol ACL checks) use `scopes` and
|
||||
`resources` without knowing whether the identity came from a config file or a
|
||||
database.
|
||||
|
||||
### IdentityProvider Trait
|
||||
|
||||
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    /// Resolve an SSH public key fingerprint to an identity.
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;

    /// Resolve an auth token to an identity.
    /// Returns None if the token is invalid, expired, or the key is not authorized.
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
|
||||
|
||||
Both SSH key auth and token auth resolve to the same `Identity` type. The trait
|
||||
lives in `alknet_core::auth`.
|
||||
|
||||
### ConfigIdentityProvider (Default)
|
||||
|
||||
Reads from `ArcSwap<DynamicConfig.auth>` per ADR-030. Every authorized key gets
|
||||
a default scope set. No database dependency. This is the default for CLI and
|
||||
single-node deployments.
|
||||
|
||||
```rust
pub struct ConfigIdentityProvider {
    auth_config: Arc<ArcSwap<DynamicConfig>>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let config = self.auth_config.load();
        config.auth.ssh.authorized_keys.get(fingerprint)
            .map(|key_entry| Identity {
                id: fingerprint.to_string(),
                scopes: key_entry.scopes.clone(),
                resources: key_entry.resources.clone(),
            })
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // Verify Ed25519 signature against the same authorized_keys set
        // Resolve to the same Identity as SSH auth would produce
        todo!("body elided in this spec; see the two steps above")
    }
}
```
|
||||
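
The pluggability argument can be seen in a reduced form. The sketch below is a simplified stand-in for the types above: it keeps only the fingerprint path, replaces `ArcSwap<DynamicConfig>` with a plain `HashMap`, and uses hypothetical names (`FingerprintResolver`, `StaticIdentityProvider`, `KeyEntry`, `authorize`). The point it illustrates is that a consumer depends only on the trait object, so a storage-backed provider could be swapped in without touching it.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

#[derive(Clone)]
pub struct KeyEntry {
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Reduced version of the provider contract: fingerprint resolution only.
pub trait FingerprintResolver: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// In-memory provider; stands in for the config-backed implementation.
pub struct StaticIdentityProvider {
    authorized_keys: HashMap<String, KeyEntry>,
}

impl StaticIdentityProvider {
    pub fn new(authorized_keys: HashMap<String, KeyEntry>) -> Self {
        Self { authorized_keys }
    }
}

impl FingerprintResolver for StaticIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized_keys.get(fingerprint).map(|entry| Identity {
            id: fingerprint.to_string(),
            scopes: entry.scopes.clone(),
            resources: entry.resources.clone(),
        })
    }
}

/// A consumer that only knows the trait, not the backing store.
pub fn authorize(provider: &dyn FingerprintResolver, fingerprint: &str, scope: &str) -> bool {
    provider
        .resolve_from_fingerprint(fingerprint)
        .map_or(false, |identity| identity.scopes.iter().any(|s| s == scope))
}
```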
|
||||
### StorageIdentityProvider (Future — Phase 2+)
|
||||
|
||||
Implemented in `alknet-storage` (a crate that doesn't exist yet). Backed by
|
||||
SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
|
||||
fingerprint → account → organization membership → effective scopes.
|
||||
|
||||
This implementation is defined here so the contract is clear, but alknet-storage
|
||||
hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` exclusively. When
|
||||
alknet-storage is built, it implements alknet-core's `IdentityProvider` trait,
|
||||
and the CLI/NAPI assembly layer wires the concrete implementation.
|
||||
|
||||
### AuthProtocol irpc Service
|
||||
|
||||
The `AuthProtocol` irpc service (behind the `irpc` feature flag per ADR-028)
|
||||
provides an async boundary for auth verification. It is one way to satisfy the
|
||||
`IdentityProvider` trait, not a replacement for it:
|
||||
|
||||
```rust
enum AuthProtocol {
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
    ReloadKeys,
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
|
||||
|
||||
The relationship:
|
||||
- **Trait-based path**: Handler calls `identity_provider.resolve_from_fingerprint()`
|
||||
directly. Zero overhead. Used when irpc is disabled or when the
|
||||
implementation is local.
|
||||
- **irpc path**: Handler calls `identity_provider.resolve_from_fingerprint()`,
|
||||
which internally delegates to `AuthProtocol::VerifyPubkey` via an irpc client.
|
||||
Used in production deployments with SQLite-backed auth.
|
||||
|
||||
Both paths produce the same `Identity` result. Note: the irpc path requires the
|
||||
service layer to be built (Phase 2+). Phase 1 uses the trait path exclusively.
|
||||
|
||||
### Auth Flows
|
||||
|
||||
**SSH key auth** (existing, unchanged):
|
||||
```
Client connects → SSH handshake → auth_publickey() callback
  → IdentityProvider::resolve_from_fingerprint(fingerprint)
  → Some(Identity) or None
```
|
||||
|
||||
**Token auth** (new, for non-SSH transports):
|
||||
```
Browser connects → WebTransport CONNECT request
  → Extract token from URL path or Authorization header
  → IdentityProvider::resolve_from_token(token)
  → Some(Identity) or None
```
|
||||
|
||||
Both paths produce an `Identity`. The `Identity` is attached to the connection
|
||||
and used by `ForwardingPolicy` and call protocol for authorization decisions.
|
||||
|
||||
## Constraints
|
||||
|
||||
- `Identity` and `IdentityProvider` live in `alknet_core::auth`. No database
|
||||
dependency at the core level (ADR-029).
|
||||
- alknet-storage implements the core trait — the dependency goes from storage
|
||||
to core, not the other way.
|
||||
- The `id` field in `Identity` serves dual purpose (fingerprint or UUID). This
|
||||
is a deliberate simplification — downstream consumers don't need to know the
|
||||
source.
|
||||
- Certificate authority tokens are not supported for token auth in v1 (ADR-023).
|
||||
- The irpc feature flag means nodes that only do SSH tunneling don't need the
|
||||
service layer overhead.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- None specific to this spec. See [open-questions.md](open-questions.md) for
|
||||
general auth questions (OQ-15, OQ-19).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` live in alknet-core, not storage |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | `AuthProtocol` behind feature flag; `IdentityProvider` is the contract |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth; same `Identity` result |
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](auth.md) — Token authentication, AuthPolicy, WebTransport session handling
|
||||
- [research/services.md](../research/services.md) — AuthService, AuthProtocol definition
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.2
|
||||
- [ADR-030](decisions/030-static-dynamic-config-split.md) — DynamicConfig (ConfigIdentityProvider reads from it)
|
||||
- [ADR-031](decisions/031-forwarding-policy.md) — ForwardingPolicy consumes Identity.scopes
|
||||
@@ -1,390 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-09
|
||||
---
|
||||
|
||||
# Interface (Layer 2)
|
||||
|
||||
## What
|
||||
|
||||
The Interface layer sits between Transport (Layer 1) and Protocol (Layer 3).
|
||||
Interfaces consume byte streams from Transports or manage their own transports,
|
||||
and produce call protocol sessions or handle discrete requests. SSH is an
|
||||
interface, not a transport — it wraps a byte stream in session semantics. Raw
|
||||
framing (4-byte length prefix + JSON `EventEnvelope`) is another interface.
|
||||
HTTP and DNS are message-based interfaces that handle individual request/response
|
||||
pairs without persistent sessions.
|
||||
|
||||
## Why
|
||||
|
||||
In the original architecture, SSH was deeply embedded in `ServerHandler`. This
|
||||
tangling of transport, interface, and protocol made it impossible to:
|
||||
|
||||
- Run the call protocol over DNS queries without wrapping SSH inside DNS
|
||||
- Use raw framing for local service mesh (no SSH overhead)
|
||||
- Support WebTransport direct call protocol for browsers
|
||||
- Separate auth mechanics from channel management
|
||||
- Accept HTTP requests and map them to call protocol operations
|
||||
|
||||
The three-layer model (ADR-026) cleanly separates these concerns. Transport
|
||||
produces bytes. Interface parses bytes into sessions or handles requests.
|
||||
Protocol carries semantics. A connection is always a (Transport, Interface)
|
||||
pair for stream-based interfaces, or a standalone message-based interface.
|
||||
|
||||
Phase 2 research identified that HTTP and DNS don't fit the persistent session
|
||||
model — they're stateless per-request. This led to the StreamInterface /
|
||||
MessageInterface split (ADR-035), which gives each interface category its own
|
||||
trait with the right lifecycle and ownership model.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Three-Layer Model
|
||||
|
||||
```
Layer 3: Protocol   (Call protocol, Operations, OperationEnv)
Layer 2: Interface  (StreamInterface: SSH, raw framing | MessageInterface: HTTP, DNS)
Layer 1: Transport  (TCP, TLS, iroh, WebTransport)
```
|
||||
|
||||
- **Layer 1: Transport** — produces byte streams
  (`AsyncRead + AsyncWrite + Unpin + Send`). Unchanged per ADR-001. DNS is NOT
  a transport.
- **Layer 2: Interface** — two categories:
  - **StreamInterface**: consumes a `TransportStream` and produces a long-lived
    session that yields `InterfaceEvent` frames.
  - **MessageInterface**: handles individual `InterfaceRequest` →
    `InterfaceResponse` pairs. Manages its own transport.
- **Layer 3: Protocol** — carries semantics. Call protocol events, operation
  registry, service calls. Agnostic to both Transport and Interface below it.
|
||||
|
||||
### StreamInterface Trait
|
||||
|
||||
```rust
#[async_trait]
pub trait StreamInterface: Send + Sync + 'static {
    type Session: InterfaceSession;

    async fn accept(
        &self,
        stream: Box<dyn TransportStream>,
        config: &InterfaceConfig,
    ) -> Result<Self::Session>;
}
```
|
||||
|
||||
The session produced by a `StreamInterface` is consumed by the call protocol
|
||||
handler. Different stream interfaces produce different session types, but the
|
||||
call protocol handler receives `InterfaceEvent` frames from any stream
|
||||
interface.
|
||||
|
||||
### MessageInterface Trait
|
||||
|
||||
```rust
#[async_trait]
pub trait MessageInterface: Send + Sync + 'static {
    async fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>;
}
```
|
||||
|
||||
Message-based interfaces handle individual requests without persistent sessions.
|
||||
They manage their own transport (HTTP server, DNS server) and normalize requests
|
||||
into `InterfaceRequest` / `InterfaceResponse`.
|
||||
|
||||
### InterfaceRequest / InterfaceResponse
|
||||
|
||||
```rust
pub struct InterfaceRequest {
    pub operation_path: String,        // e.g., "/head/auth/verify"
    pub input: Value,                  // JSON input payload
    pub auth_token: Option<AuthToken>, // Extracted from wire format
    pub metadata: HashMap<String, String>,
}

pub struct InterfaceResponse {
    pub result: Result<Value, CallError>,
    pub status: u16,                   // HTTP status, DNS result code, etc.
    pub headers: HashMap<String, String>,
}
```
|
||||
|
||||
The call protocol handler processes `InterfaceRequest` the same way it processes
|
||||
`InterfaceEvent` frames — both resolve to operation invocations through
|
||||
`OperationEnv`. The difference is framing: stream interfaces produce `InterfaceEvent`
|
||||
frames from a continuous byte stream, message interfaces construct `InterfaceRequest`
|
||||
from their wire format.
|
||||
|
||||
### InterfaceSession
|
||||
|
||||
Every stream interface session implements `InterfaceSession`:
|
||||
|
||||
```rust
pub struct InterfaceEvent {
    pub envelope: EventEnvelope,
    pub identity: Option<Identity>,
}

#[async_trait]
pub trait InterfaceSession: Send {
    async fn recv(&mut self) -> Option<InterfaceEvent>;
    async fn send(&mut self, envelope: EventEnvelope) -> Result<()>;
}
```
|
||||
|
||||
`InterfaceEvent` carries an `EventEnvelope` and the authenticated `Identity`.
|
||||
The call protocol handler (Layer 3) receives `InterfaceEvent` frames and
|
||||
processes them uniformly, regardless of whether they arrived over SSH or raw
|
||||
framing.
|
||||
|
||||
### SshInterface (StreamInterface)
|
||||
|
||||
Wraps the existing `ServerHandler` logic. This is the most complex stream
|
||||
interface because SSH provides channel multiplexing, auth negotiation, and
|
||||
proxy management within a single session.
|
||||
|
||||
What stays in SshInterface (Layer 2):
|
||||
- SSH handshake and session management
|
||||
- Auth delegation to `IdentityProvider` (via `auth_publickey()` callback)
|
||||
- Channel multiplexing (multiple channels per session)
|
||||
- `alknet-control:0` channel routing to call protocol
|
||||
|
||||
What moves to Layer 3 (call protocol handler):
|
||||
- Operation registry and dispatch
|
||||
- Forwarding policy checks (per ADR-031)
|
||||
- Operation context construction (Identity, scopes)
|
||||
|
||||
What moves to per-connection state:
|
||||
- Port forwarding proxy logic
|
||||
|
||||
**Current implementation note**: `SshSession::recv()` and `SshSession::send()`
|
||||
are stubs. The bridge from SSH channels to `InterfaceEvent` frames is
|
||||
scheduled for Phase 2 implementation (see integration-plan.md Phase 2.1).
|
||||
|
||||
### RawFramingInterface (StreamInterface)

Reads frames, each a 4-byte big-endian length prefix followed by a JSON
`EventEnvelope`, directly from the transport stream. No SSH wrapping. No
channel multiplexing — the entire stream is a single call protocol channel.

```rust
pub struct RawFramingInterface;

impl StreamInterface for RawFramingInterface {
    type Session = RawFramingSession;
    // Reads length-prefixed EventEnvelope frames from the stream
}
```

Used for:
- Local service mesh (TCP + raw framing, no SSH overhead)
- Secure mesh (TLS + raw framing)
- WebTransport direct call protocol (future: WebTransport + raw framing)

Auth for raw framing: `AuthToken` in frame header, resolved via
`IdentityProvider::resolve_from_token()`.

**Current implementation note**: `RawFramingInterface::accept()` returns an
error. Frame reading/writing is scheduled for Phase 2 implementation (see
integration-plan.md Phase 2.2).

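As an illustration of the framing described in this section, here is a minimal sketch that uses only the Rust standard library. It treats the payload as opaque bytes (the JSON `EventEnvelope` encoding is left out), and `write_frame`, `read_frame`, and the `MAX_FRAME_LEN` cap are hypothetical names and values chosen for the example, not part of alknet-core.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a single frame. The 16 MiB figure is an assumption for
/// this sketch; the spec does not state a limit.
pub const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian length followed by the payload.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one frame. Returns `Ok(None)` when the stream ends cleanly before
/// any header byte arrives, and an error when a frame is cut short.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    let mut filled = 0;
    // Fill the header by hand so a clean EOF can be told apart from a
    // truncated header.
    while filled < header.len() {
        let n = reader.read(&mut header[filled..])?;
        if n == 0 {
            if filled == 0 {
                return Ok(None);
            }
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "truncated frame header",
            ));
        }
        filled += n;
    }
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Checking the declared length before allocating keeps a malformed or hostile header from forcing a large allocation, which matters on a listener that accepts unauthenticated bytes before the first frame is parsed.
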
### HttpInterface (MessageInterface)

Accepts standard HTTP requests and maps them to call protocol operations:

```
POST /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (mutation)
GET  /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (query)
GET  /v1/{namespace}/{op} SSE  → registry.subscribe(namespace, op, input)  (subscription)
GET  /v1/schema                → registry.list_operations()
```

Auth: `Authorization: Bearer <token>` header, resolved via
`IdentityProvider::resolve_from_token()`. Both AuthTokens and API keys are
accepted.

The HTTP interface runs inside the existing stealth mode byte-peek architecture:
after a TLS handshake, the server peeks at the first bytes. If they're
`SSH-2.0-`, the stream goes to `SshInterface`. Otherwise, the stream goes to
the axum HTTP router.

**Phase 2 scope**: Auth middleware, stealth handoff, and default 404 handler
only. Specific operation routes and path conventions are Phase 5+. The
`ListenerConfig::Http` variant spawns an axum router that reaches auth context;
routing inside axum is a later concern.

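The stealth-mode byte peek described in this section comes down to a prefix check on the first bytes of the decrypted stream. The sketch below shows one way to express that decision; `Route`, `classify_peeked`, and the `NeedMoreBytes` case are hypothetical and only illustrate the rule, they are not taken from the alknet codebase.

```rust
/// Where a freshly accepted (post-TLS) stream should be routed.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// The peeked bytes start with the SSH identification prefix.
    Ssh,
    /// Anything else is handed to the HTTP router.
    Http,
    /// Too few bytes were peeked to decide, and all of them match so far.
    NeedMoreBytes,
}

/// SSH identification strings begin with this prefix (RFC 4253, section 4.2).
const SSH_BANNER_PREFIX: &[u8] = b"SSH-2.0-";

/// Classifies the bytes obtained by peeking at the start of the stream.
pub fn classify_peeked(peeked: &[u8]) -> Route {
    if peeked.len() >= SSH_BANNER_PREFIX.len() {
        if peeked.starts_with(SSH_BANNER_PREFIX) {
            Route::Ssh
        } else {
            Route::Http
        }
    } else if SSH_BANNER_PREFIX.starts_with(peeked) {
        // A short read that is still a prefix of "SSH-2.0-": keep peeking.
        Route::NeedMoreBytes
    } else {
        Route::Http
    }
}
```

Handling the short-read case explicitly avoids misrouting an SSH client whose banner arrives split across TCP segments; a real implementation would presumably also bound how long it waits for the remaining bytes.
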
### DnsInterface (MessageInterface)

A DNS server that encodes/decodes `EventEnvelope` frames as DNS query/response
pairs. AuthToken is embedded in DNS query labels. Resolution via
`IdentityProvider::resolve_from_token()`.

This is a `MessageInterface` — it manages its own transport (UDP/TCP port 53)
and handles individual DNS queries as request/response pairs. DNS is NOT a
transport.

**Phase**: DNS interface implementation is Phase 5+. The `ListenerConfig::Dns`
variant and `DnsInterface` stub are defined now; implementation is deferred.

### Stream-Based Interface Pairs

| Transport | StreamInterface | Credential Presentation | Use case |
|-----------|---------------|------------------------|----------|
| TLS | SshInterface | SSH key handshake | Standard alknet tunnel |
| TCP | SshInterface | SSH key handshake | Plain SSH tunnel |
| iroh | SshInterface | SSH key handshake | P2P SSH tunnel |
| TCP | RawFramingInterface | AuthToken in frame header | Local service mesh |
| TLS | RawFramingInterface | AuthToken in frame header | Secure mesh |
| WebTransport | RawFramingInterface | AuthToken in CONNECT request | Browser call protocol (future) |

### Message-Based Interface Pairs

| MessageInterface | Credential Presentation | Owns transport? | Use case |
|-----------------|------------------------|----------------|----------|
| HttpInterface | `Authorization: Bearer` header | Yes (axum) | REST API, dashboard, integrations |
| DnsInterface | AuthToken in query labels | Yes (DNS server) | Censorship-resistant control channel |
| WebSocketInterface | AuthToken in handshake | Yes (WS server) | Browser persistent connection (future) |

Message-based interfaces manage their own transport. They don't need a
`Transport` from Layer 1 — they ARE the transport+interface combined.

### ListenerConfig

The server's accept loop configuration covers both stream and message interfaces:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool, // byte-peek protocol detection on shared port
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}

pub enum StreamInterfaceKind {
    Ssh,
    RawFraming,
}

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    WebTransport, // Phase 5+: tag only, no acceptor yet
}
```

Note: `TransportKind::Dns` does NOT exist. DNS is a `MessageInterface`, not a
transport. The `ListenerConfig::Dns` variant handles DNS listener configuration
directly.

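To make the shape of `ListenerConfig` concrete, the following sketch re-declares the enums so that it compiles on its own and builds a small listener set. The `validate` helper and both of its rules are assumptions drawn from the surrounding prose (WebTransport being a tag only, and the byte peek running after the TLS handshake); the addresses are placeholders.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, PartialEq)]
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    WebTransport,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StreamInterfaceKind {
    Ssh,
    RawFraming,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ListenerConfig {
    Stream { transport: TransportKind, interface: StreamInterfaceKind },
    Http { bind_addr: SocketAddr, tls: bool, stealth: bool },
    Dns { bind_addr: SocketAddr, tls: bool },
}

/// Hypothetical startup check: rejects combinations the spec describes as
/// unavailable or inconsistent.
pub fn validate(listener: &ListenerConfig) -> Result<(), String> {
    match listener {
        ListenerConfig::Stream { transport: TransportKind::WebTransport, .. } => {
            Err("WebTransport is a tag only; no acceptor exists yet".to_string())
        }
        ListenerConfig::Http { tls: false, stealth: true, .. } => {
            Err("stealth byte-peek runs after the TLS handshake; set tls = true".to_string())
        }
        _ => Ok(()),
    }
}

/// A plausible listener set: SSH over TLS, raw framing over TCP, and HTTP.
pub fn example_listeners() -> Vec<ListenerConfig> {
    vec![
        ListenerConfig::Stream {
            transport: TransportKind::Tls { server_name: None },
            interface: StreamInterfaceKind::Ssh,
        },
        ListenerConfig::Stream {
            transport: TransportKind::Tcp,
            interface: StreamInterfaceKind::RawFraming,
        },
        ListenerConfig::Http {
            bind_addr: "127.0.0.1:8443".parse().expect("valid socket address"),
            tls: true,
            stealth: false,
        },
    ]
}
```
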
### Credential Presentation Across Interfaces

Every interface resolves to the same `Identity` through `IdentityProvider`:

```
SSH fingerprint     → IdentityProvider::resolve_from_fingerprint → Identity
AuthToken (Bearer)  → IdentityProvider::resolve_from_token       → Identity
API key (Bearer)    → IdentityProvider::resolve_from_token       → Identity
DNS embedded token  → IdentityProvider::resolve_from_token       → Identity
```

The credential presentation differs per (Transport, Interface) pair, but the
resolution result is always an `Identity`. See [definitions.md](definitions.md)
for the full table and terminology rules.

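The mapping above can be sketched as a single dispatch over the presented credential. This is a simplified, synchronous stand-in: the real `IdentityProvider` signatures are not shown in this document, so the trait below, the `Credential` enum, and `InMemoryProvider` are all hypothetical. Only the method names `resolve_from_fingerprint` and `resolve_from_token` and the `id` and `scopes` fields come from the spec.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Simplified stand-in for the real trait (assumed shape).
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// How a credential was presented, depending on the interface.
pub enum Credential<'a> {
    SshFingerprint(&'a str),
    BearerToken(&'a str),   // AuthToken or API key in an Authorization header
    DnsLabelToken(&'a str), // token embedded in DNS query labels
}

/// Every presentation funnels into one of the two provider methods.
pub fn resolve(provider: &dyn IdentityProvider, credential: Credential<'_>) -> Option<Identity> {
    match credential {
        Credential::SshFingerprint(fp) => provider.resolve_from_fingerprint(fp),
        Credential::BearerToken(t) | Credential::DnsLabelToken(t) => {
            provider.resolve_from_token(t)
        }
    }
}

/// Toy provider backed by two maps, for illustration only.
#[derive(Default)]
pub struct InMemoryProvider {
    by_fingerprint: HashMap<String, Identity>,
    by_token: HashMap<String, Identity>,
}

impl InMemoryProvider {
    pub fn with_identity(mut self, fingerprint: &str, token: &str, identity: Identity) -> Self {
        self.by_fingerprint.insert(fingerprint.to_string(), identity.clone());
        self.by_token.insert(token.to_string(), identity);
        self
    }
}

impl IdentityProvider for InMemoryProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.by_fingerprint.get(fingerprint).cloned()
    }

    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        self.by_token.get(token).cloned()
    }
}
```
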
### Server Accept Loop

With both stream and message interfaces, the accept loop becomes:

```rust
for listener in listeners {
    match listener {
        ListenerConfig::Stream { transport, interface } => {
            // Spawn accept loop: transport.accept() → interface.accept(stream)
        }
        ListenerConfig::Http { bind_addr, tls, stealth } => {
            // Spawn axum HTTP server on bind_addr
            // If stealth: byte-peek after TLS, route SSH vs HTTP
        }
        ListenerConfig::Dns { bind_addr, tls } => {
            // Spawn DNS server on bind_addr
        }
    }
}
```

## Constraints

- `StreamInterface` and `MessageInterface` are independent traits with different
  signatures, lifecycles, and transport ownership. No common super-trait (ADR-035).
- `SshInterface` is the most invasive refactoring. The existing `SshHandler`
  owns auth, channel management, and proxy logic — extracting these cleanly
  requires careful design (integration-plan Phase 1.8, completed in Phase 1).
- DNS interface implementation is Phase 5 work. `DnsInterface` is defined as a
  `MessageInterface` stub; implementation is deferred.
- HTTP interface Phase 2 scope is limited to auth middleware and stealth handoff.
  Specific operation routes are Phase 5+.
- WebTransport is Phase 5 work. `TransportKind::WebTransport` and
  `StreamInterfaceKind::WebTransport` are tags only for now.
- `TransportKind::Dns` does not exist. DNS is a `MessageInterface`, not a
  transport. This was `TransportKind` enum pollution from an earlier design.
- The `Interface` trait (singular) in the current codebase needs to be renamed
  to `StreamInterface`. This is a rename, not a semantic change.

## Open Questions

- **OQ-IF-02**: ~~Should `SshInterface` own the `ForwardingPolicy` check for
  `channel_open_direct_tcpip`, or should that move to Layer 3?~~ **Resolved**:
  ForwardingPolicy is Layer 3, but channel open/close lifecycle is Layer 2.
  SshInterface reports channel requests to Layer 3; Layer 3 applies policy.

- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
  trait? **Recommendation**: No. Independent traits with different signatures,
  lifecycles, and transport ownership. A common super-trait adds complexity
  without clear benefit. (See ADR-035.)

- **OQ-P2-02**: Should the HTTP interface share a port with the SSH listener?
  **Recommendation**: Start with separate ports. ALPN multiplexing on port 443
  is a future optimization that doesn't change the interface abstraction.
  Stealth mode byte-peek already handles shared-port detection for the common
  case.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface | Two trait categories at Layer 2 |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Protocol is interface-agnostic |
| [029](decisions/029-identity-core-type.md) | Identity as core type | Auth resolution across interfaces |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Layer 3 policy applied to Layer 2 channel requests |

## Phase 2 Implementation Notes

- `Interface` trait renamed to `StreamInterface` throughout alknet-core (ADR-035 implemented)
- `MessageInterface` trait added with `handle_request(InterfaceRequest) -> Result<InterfaceResponse>` (ADR-035 implemented)
- `InterfaceRequest` and `InterfaceResponse` types implemented
- `HttpInterface` and `DnsInterface` stub structs added (Phase 5 for full implementation)
- `InterfaceConfig` split into `StreamInterfaceConfig` and `MessageInterfaceConfig`
- `StreamInterfaceKind` and `MessageInterfaceKind` enums added
- `ListenerConfig` restructured from flat struct to enum with `Stream`, `Http`, `Dns` variants
- `TransportKind::Dns` removed from the enum (DNS is a MessageInterface, not a transport)
- `TransportKind::WebTransport` updated from `{ host: String }` to `{ server_name: Option<String> }`
- `RawFramingInterface` fully implemented with first-frame auth
- `SshSession::recv()`/`send()` bridge to call protocol via `alknet-control:0` channel implemented, using `ControlChannelBridge` with mpsc channels

## References

- [definitions.md](definitions.md) — Terminology disambiguation, credential presentation
- [research/phase2/interface-model.md](../research/phase2/interface-model.md) — Full StreamInterface/MessageInterface analysis
- [research/phase2/tls-transport.md](../research/phase2/tls-transport.md) — HTTP interface, stealth handoff, ListenerConfig
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.8, Phase 2.1-2.7
- [transport.md](transport.md) — Transport trait (unchanged at Layer 1)
- [auth.md](auth.md) — Credential presentation per (Transport, Interface) pair
- [identity.md](identity.md) — IdentityProvider, auth across interfaces
@@ -1,189 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---

# NAPI Wrapper & PubSub Event Target

## What

Two integration layers that enable TypeScript/JavaScript consumers to use alknet as a transport:

1. **NAPI wrapper** (`@alkdev/alknet`) — A Node.js native addon (via napi-rs) exposing `connect()` and `serve()` that return duplex streams
2. **PubSub event target** (`@alkdev/pubsub` adapter) — An implementation of the `TypedEventTarget` interface that routes events over alknet's SSH channel

## Why

The alknet Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent workers) is TypeScript-first. These integration layers let TypeScript code use alknet's transport without reimplementing SSH.

The NAPI surface is intentionally minimal — it exposes transport connections as duplex streams, not the full SSH protocol. The pubsub adapter wraps those streams with `EventEnvelope` serialization.

## Architecture

### NAPI Wrapper (napi-rs)

The wrapper uses napi-rs (ADR-015) and exposes two functions (ADR-016):

```typescript
// @alkdev/alknet (TypeScript side)
import type { Duplex } from 'node:stream';

interface AlknetConnectOptions {
  // TCP/TLS mode
  server?: string;                  // e.g., "example.com:443"
  // iroh mode
  peer?: string;                    // iroh endpoint ID (base58-encoded)
  // Transport
  transport: 'tcp' | 'tls' | 'iroh';
  // Auth
  identity?: string | Buffer;       // path to SSH key, or Buffer with key data
  // TLS
  tlsServerName?: string;           // SNI hostname
  insecure?: boolean;               // accept self-signed certs
  // iroh
  irohRelay?: string;               // relay URL (default: n0)
  // Proxy
  proxy?: string;                   // upstream SOCKS5/HTTP proxy URL
}

interface AlknetServeOptions {
  // Transport
  transport: 'tcp' | 'tls' | 'iroh';
  // Auth
  hostKey?: string | Buffer;        // path to SSH host key, or Buffer with key data
  authorizedKeys?: string | Buffer; // path to authorized_keys, or Buffer with key data
  certAuthority?: string;           // path to CA public key for cert-authority auth
  // TLS
  tlsCert?: string;                 // path to TLS cert
  tlsKey?: string;                  // path to TLS key
  acmeDomain?: string;              // ACME domain for auto-cert (ADR-008)
  // Listen
  listen?: string;                  // listen address (default: 0.0.0.0:22)
  // iroh
  irohRelay?: string;               // relay URL (default: n0)
}

// Returns a Duplex stream for the SSH channel
function connect(options: AlknetConnectOptions): Promise<Duplex>;

// Returns a server object with close() and connection events
function serve(options: AlknetServeOptions): Promise<AlknetServer>;

interface AlknetServer {
  close(): Promise<void>;
  onConnection(callback: (stream: Duplex, info: ConnectionInfo) => void): void;
  // Dynamic config reload (ADR-030)
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}

interface ForwardingPolicyConfig {
  default: 'allow' | 'deny';
  rules: ForwardingRuleConfig[];
}

interface ForwardingRuleConfig {
  target: string;                   // "localhost:*", "10.0.0.0/8:80", "alknet-*"
  action: 'allow' | 'deny';
  principals?: string[];            // default ["*"]
}
```

The NAPI layer is **transport-agnostic** — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.

### NAPI Call Protocol Integration

NAPI consumers can register operation handlers to participate in the call protocol. The `Duplex` stream from `connect()` or `serve()` carries `EventEnvelope` frames (4-byte BE length prefix + JSON). A TypeScript consumer can implement a call protocol handler that reads these frames and dispatches to registered operations — the same wire protocol used by `@alkdev/operations`.

See [call-protocol.md](call-protocol.md) for the call protocol spec and [services.md](services.md) for OperationEnv and dispatch paths.

### NAPI irpc Service Creation

Behind the `irpc` feature flag, NAPI consumers can create irpc service instances for in-cluster communication. This is a Phase 2+ capability — Phase 1 uses `ConfigIdentityProvider` and direct `ConfigReloadHandle` calls. See [services.md](services.md) for the irpc service layer and ADR-027 for crate decomposition.

### NAPI `connect()` vs CLI `alknet connect`

The NAPI `connect()` function and the CLI `alknet connect` command are fundamentally different operations despite sharing the same name:

- **CLI `alknet connect`**: Starts a full SSH client session with a local SOCKS5 server and optional port forwards. It manages multiple SSH channels over a single session — the user routes traffic through it via SOCKS5 or forwarded ports.
- **NAPI `connect()`**: Opens a single SSH channel and returns it as a `Duplex` stream. No SOCKS5 server, no port forwarding. The caller reads and writes bytes directly. This is designed for the pubsub/programmatic use case where a single bidirectional byte stream is needed.

For SOCKS5 proxy functionality, use the CLI binary (`alknet connect`). The NAPI wrapper is for programmatic consumers that need a raw stream.

### Programmatic Configuration (ADR-011)

Both `connect()` and `serve()` accept options as plain objects. No file paths are mandatory — keys can be provided as `Buffer` data directly, making programmatic usage straightforward. Environment variables (`ALKNET_SERVER`, `ALKNET_IDENTITY`) provide convenience defaults.

Key material provided as `Buffer` must be in **OpenSSH key format** (the format used by `ssh-keygen`). Private keys: OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`). Public keys: OpenSSH format (`ssh-ed25519 AAAA...`). PEM-encoded keys (PKCS#1, PKCS#8) are not supported.

### PubSub Event Target Adapter

This implements `TypedEventTarget` from `@alkdev/pubsub`:

```typescript
// @alkdev/pubsub (new adapter: event-target-alknet.ts)

export interface AlknetEventTargetOptions {
  stream: Duplex; // from @alkdev/alknet.connect() or serve()
}

export interface AlknetEventTarget<TEvent extends TypedEvent>
  extends TypedEventTarget<TEvent> {
  close(): void;
}

export function createAlknetEventTarget<TEvent extends TypedEvent>(
  options: AlknetEventTargetOptions
): AlknetEventTarget<TEvent>;
```

Wire protocol (same as other pubsub adapters):

- **Framing**: 4-byte big-endian length prefix + JSON payload
- **Payload**: `EventEnvelope` JSON (`{ type, id, payload }`)
- **Control**: `__subscribe` / `__unsubscribe` messages for topic-based routing
- **Direction**: Bidirectional — `dispatchEvent` sends, `addEventListener` subscribes and receives

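Because a duplex stream delivers bytes in arbitrary chunks, a receiver of this wire protocol has to buffer until a whole frame is available. The sketch below shows that accumulation step in Rust, to match the Rust side of the project; `FrameDecoder` is a hypothetical name, the payload is left as raw bytes rather than parsed JSON, and no maximum frame size is enforced here, which a production decoder would likely want.

```rust
/// Accumulates stream chunks and yields complete frame payloads.
#[derive(Default)]
pub struct FrameDecoder {
    buffer: Vec<u8>,
}

impl FrameDecoder {
    /// Appends a chunk exactly as it arrived from the stream.
    pub fn push(&mut self, chunk: &[u8]) {
        self.buffer.extend_from_slice(chunk);
    }

    /// Returns the next complete payload, or `None` if more bytes are needed.
    pub fn next_frame(&mut self) -> Option<Vec<u8>> {
        if self.buffer.len() < 4 {
            return None;
        }
        // 4-byte big-endian length prefix.
        let len = u32::from_be_bytes([
            self.buffer[0],
            self.buffer[1],
            self.buffer[2],
            self.buffer[3],
        ]) as usize;
        if self.buffer.len() < 4 + len {
            return None;
        }
        let payload = self.buffer[4..4 + len].to_vec();
        // Drop the consumed header and payload, keeping any trailing bytes.
        self.buffer.drain(..4 + len);
        Some(payload)
    }
}
```

Calling `next_frame` in a loop after each `push` drains every frame that has fully arrived, so several small envelopes delivered in one chunk are all surfaced.
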
### On the Server Side

The alknet server uses a reserved `direct_tcpip` destination (`alknet-control:0`) for the pubsub control channel (ADR-018). When a client connects to this destination:

1. The server's `channel_open_direct_tcpip` handler detects the reserved `alknet-control` target
2. Instead of opening a TCP connection, it bridges the channel to its local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel

Users who prefer not to use the control channel can alternatively run a pubsub service on a specific port and use standard port forwarding: `alknet connect --forward 9736:head:9736`. This is a deployment choice, not a separate implementation — alknet's port forwarding works normally for any TCP service.

- **Worker connects to head**: `alknet connect --forward 9736:head:9736` then create WebSocket event target pointing at `ws://localhost:9736`
- **Head connects to worker**: `alknet connect --remote-forward 9736:worker:9736` — same result, opposite initiator

The pubsub adapter doesn't care which side initiated the SSH session. It just needs a byte stream.

## Constraints

- The NAPI wrapper exposes duplex streams, not the full SSH channel API. Multiplexing is done at the pubsub layer.
- The pubsub wire protocol is length-prefixed JSON, matching the existing adapter pattern. Binary payloads should be base64-encoded in the `EventEnvelope.payload`.
- The NAPI binary size will be ~5-10MB (includes russh + tokio + cryptography). The `iroh` feature adds significant size; it should be an optional feature.
- Keys can be provided as file paths or `Buffer` data, supporting both CLI and programmatic usage patterns (ADR-011).

## Open Questions

None — all resolved.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | No SSH multiplexing in JS, pubsub handles it |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs or env vars |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | Both connect() and serve() | NAPI exposes client and server sides from the start |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `alknet-control` destination for event bus |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | NAPI reload methods for auth, forwarding, and all dynamic config |

## References

- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, reload mechanism
- [services.md](services.md) — OperationEnv, irpc service layer
- [call-protocol.md](call-protocol.md) — Call protocol wire format and operation registry
@@ -1,340 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---

# Open Questions

## Transport

### OQ-01: TLS certificate management strategy
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-008 — Support both domain-based and IP-based ACME/Let's Encrypt auto-provisioning, plus manual certs. Domain-based uses standard certbot-style flow with HTTP-01/TLS-ALPN-01 challenges. IP-based uses short-lived certs via TLS-ALPN-01 on port 443. Manual certs via `--tls-cert`/`--tls-key` always supported. Implementation uses `rustls-acme` or similar pure-Rust ACME client.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport implementation

### OQ-02: iroh relay configuration defaults
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-009 — Default to n0's free relay servers. Allow override via `--iroh-relay <url>`. Document self-hosted relay setup. This matches iroh's own defaults and minimizes friction for testing/development.
- **Cross-references**: [ADR-009](decisions/009-default-iroh-relay.md), Transport spec

### OQ-05: Transport chaining support in CLI
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-010 — Support `--transport iroh --proxy socks5://...` natively in the CLI. iroh's endpoint builder accepts proxy configuration directly, so the implementation is minimal. Other transport combinations (TCP+TLS) are already implicit.
- **Cross-references**: [ADR-010](decisions/010-transport-chaining-cli.md), Transport spec

## Client

### OQ-06: SSH config file parsing
- **Origin**: [client.md](client.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-011 — No `~/.ssh/config` parsing, no custom config file. Configuration is programmatic-first: CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. Cross-platform path issues (`~` expansion) are avoided. The library API is the primary interface; if config files are needed later, they can be a separate layer.
- **Cross-references**: [ADR-011](decisions/011-no-ssh-config-programmatic-api.md), Client spec

## Server

### OQ-07: ACME/Let's Encrypt support
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-008 — Same resolution as OQ-01. Both domain-based (standard, domain-bound, auto-renewing) and IP-based (short-lived, no domain required) ACME flows are supported. The domain-based path requires port 80 or DNS access for challenges. The IP-based path uses TLS-ALPN-01 on port 443 and requires the ACME client to run continuously.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport

### OQ-08: Connection limits and rate limiting
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-013 — Two-layer approach: (1) Structured logging of auth attempts and connections at INFO level for fail2ban integration on Linux — matches our production fail2ban setup with nftables and systemd journal. (2) Built-in rate limiting: `--max-connections-per-ip` and `--max-auth-attempts` flags providing platform-independent abuse protection.
- **Cross-references**: [ADR-013](decisions/013-fail2ban-friendly-logging.md), Server spec, Production fail2ban docs

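For the built-in half of the OQ-08 resolution, a per-IP connection cap can be sketched with a counter map. This is only an illustration of what a flag like `--max-connections-per-ip` might drive; `ConnectionLimiter` and its methods are hypothetical, and a real server would need to share the state across tasks (for example behind a mutex) and release the slot when a connection closes.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source address and enforces a cap.
pub struct ConnectionLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns `true` and records the connection if the address is under
    /// its cap; returns `false` if the connection should be refused.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.get(&ip).copied().unwrap_or(0);
        if count >= self.max_per_ip {
            return false;
        }
        self.active.insert(ip, count + 1);
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        let emptied = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        // Remove idle entries so the map does not grow without bound.
        if emptied {
            self.active.remove(&ip);
        }
    }
}
```
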
### OQ-04: Authentication beyond Ed25519 keys
- **Origin**: [client.md](client.md), [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-012 — Ed25519 public key (default, unchanged) + OpenSSH certificate authority support (new, important for multi-user). No password authentication over SSH channels. If a local SOCKS5 proxy needs its own auth, that's a separate concern. Cert-authority makes multi-user management practical: one CA entry in `authorized_keys` instead of N individual keys. Certificates support expiry and restrictions.
- **Cross-references**: [ADR-012](decisions/012-auth-ed25519-and-cert-authority.md), Client spec, Server spec

## TUN

### OQ-03: Windows TUN support scope
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-014 — TUN is deferred entirely from the alknet project. For VPN-like behavior, users run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside alknet. This eliminates all TUN-related scope questions (Windows, TCP reconstruction, etc.).
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

### OQ-09: TCP reconstruction approach for TUN
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-014 — TUN is deferred from alknet. tun2proxy (external tool) handles this if users need VPN-like behavior.
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

## NAPI / PubSub

### OQ-10: NAPI wrapper API surface
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-016 — Expose both `connect()` and `serve()` from the start. Both are fundamental operations needed by the pubsub event target system (spokes use `connect()`, hubs could use `serve()`). The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream. This ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
- **Cross-references**: [ADR-016](decisions/016-napi-expose-connect-and-serve.md), napi-and-pubsub.md

### OQ-11: napi-rs vs uniffi for FFI bridge
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-015 — Use napi-rs. It's the standard for Node.js native addons, matches our primary consumer (TypeScript/Node.js), and has the best ecosystem and documentation. If future Python or mobile consumers are needed, a separate uniffi layer can be added — the Rust core doesn't change.
- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md

## Configuration

### OQ-12: Per-user forwarding scope vs global rules
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-031 — Start with global rules + principal matching from `Identity.scopes`. Per-user scope from `peer_credentials.metadata.scopes` via `IdentityProvider`. The `ForwardingPolicy` evaluates rules against `Identity.id` and `Identity.scopes` from the authenticated identity.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)

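The rule evaluation described in the OQ-12 resolution might look roughly like the sketch below. Several details are assumptions made for illustration and are not confirmed by ADR-031 as quoted here: rules are checked in order with the first match winning, a principal matches either the identity's id or one of its scopes, and only a trailing `*` wildcard is supported (CIDR targets such as `10.0.0.0/8:80` are left out).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    Deny,
}

pub struct Rule {
    pub target: String,          // e.g. "localhost:*" or "alknet-*"
    pub action: Action,
    pub principals: Vec<String>, // ["*"] matches everyone
}

pub struct Policy {
    pub default: Action,
    pub rules: Vec<Rule>,
}

/// A trailing `*` means "any suffix"; everything else is compared literally.
fn glob_matches(pattern: &str, value: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => value.starts_with(prefix),
        None => pattern == value,
    }
}

impl Policy {
    /// First matching rule wins (assumed); otherwise the default applies.
    pub fn evaluate(&self, identity_id: &str, scopes: &[&str], target: &str) -> Action {
        for rule in &self.rules {
            let principal_ok = rule
                .principals
                .iter()
                .any(|p| p == "*" || p == identity_id || scopes.contains(&p.as_str()));
            if principal_ok && glob_matches(&rule.target, target) {
                return rule.action;
            }
        }
        self.default
    }
}
```

With a default of `Deny`, a rule set like the one in the test below lets any identity open `alknet-*` channels while restricting `localhost:*` forwards to identities that carry an `admin` scope.
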
### OQ-13: Config file auto-reload via file watching
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: No file watching. CLI loads once at startup; NAPI/head reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
- **Cross-references**: configuration.md

### OQ-14: ArcSwap vs RwLock for dynamic config
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained.
- **Cross-references**: configuration.md

### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (deferred to Phase 4 — needs R&D in WebTransport transport session)
- **Cross-references**: [auth.md](auth.md), OQ-19, [interface.md](interface.md)

### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to alknet-* channels)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-031 — Add `TransportKind` match in `ForwardingRule`. WebTransport clients can be restricted to `alknet-*` channels via `TargetPattern::AlknetPrefix` combined with a `TransportKind::WebTransport` filter.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)

### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples alknet-core from identity storage.
- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [identity.md](identity.md), OQ-15

### OQ-23: irpc dependency — always or behind feature flag?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-027 — Feature flag. Nodes that only do SSH tunneling don't need the service layer. irpc is behind a feature flag in alknet-core and an independent dependency in alknet-secret and alknet-storage.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)

### OQ-24: DNS control channel scope for initial implementation?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-026 — DNS control channel carries call protocol frames only (no SSH tunneling over DNS). The (DNS transport, raw framing interface) pair sends `EventEnvelope` directly. SSH-over-DNS is a future possibility but out of scope.
- **Cross-references**: [ADR-026](decisions/026-transport-interface-separation.md), [interface.md](interface.md)

### OQ-25: alknet-storage and alknet-secret irpc dependency
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-027 — Independently. They're separate crates. irpc is a shared library they both use as an independent dependency.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)

## Auth
|
||||
|
||||
### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both?
|
||||
- **Origin**: [auth.md](auth.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-029 and ADR-031 — `IdentityProvider` owns scopes. The `Identity` struct includes `scopes` and `resources` fields populated by the `IdentityProvider` implementation (config-based or database-backed). `ForwardingPolicy` uses scopes from `Identity` — it consumes them, it doesn't produce them.
|
||||
- **Cross-references**: [ADR-029](decisions/029-identity-core-type.md), [ADR-031](decisions/031-forwarding-policy.md), [identity.md](identity.md)
|
||||
|
||||
### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS?
|
||||
- **Origin**: [auth.md](auth.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (deferred to Phase 4 — QUIC is UDP, TLS-over-TCP is TCP, they can share port 443 without conflict)
|
||||
- **Cross-references**: OQ-15, [interface.md](interface.md)
|
||||
|
||||
## Call Protocol
|
||||
|
||||
### OQ-20: Worker registration and discovery on connect/disconnect
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach but needs spec in call-protocol.md)
|
||||
- **Cross-references**: ADR-024, ADR-025
|
||||
|
||||
### OQ-21: Routing calls to specific workers with same-service operations
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection.
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
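A rough sketch of the node-prefix routing described in OQ-21, assuming the head keeps a plain map from node name to connection. `OperationPath`, `RoutingTable`, and `ConnectionId` are hypothetical names for this example; the source only specifies the `/{node}/{service}/{op}` path format and the existence of a routing table.

```rust
use std::collections::HashMap;

/// Parsed form of `/{node}/{service}/{op}`.
#[derive(Debug, PartialEq, Eq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

/// Parse an operation path; returns `None` unless it has exactly three
/// non-empty segments after the leading slash.
pub fn parse_operation_path(path: &str) -> Option<OperationPath<'_>> {
    let mut parts = path.strip_prefix('/')?.split('/');
    let (node, service, op) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || node.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { node, service, op })
}

/// Stand-in for a connection handle; a real head would hold a session here.
pub type ConnectionId = u64;

/// Maps node identity to the connection that serves it.
#[derive(Default)]
pub struct RoutingTable {
    by_node: HashMap<String, ConnectionId>,
}

impl RoutingTable {
    /// Record a node when it connects.
    pub fn register(&mut self, node: &str, conn: ConnectionId) {
        self.by_node.insert(node.to_string(), conn);
    }

    /// Forget a node when it disconnects.
    pub fn unregister(&mut self, node: &str) {
        self.by_node.remove(node);
    }

    /// Pick the connection for a call by its first path segment.
    pub fn route(&self, path: &str) -> Option<ConnectionId> {
        let parsed = parse_operation_path(path)?;
        self.by_node.get(parsed.node).copied()
    }
}
```

Two workers exposing the same `fs` service stay distinguishable because `/dev1/fs/readFile` and `/dev2/fs/readFile` resolve to different connections.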
|
||||
|
||||
### OQ-22: Client streaming (streaming inputs) in the call protocol?
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: Deferred. Current model (single request, optional streaming response) covers all identified use cases. Client streaming can be added later if needed.
|
||||
- **Cross-references**: ADR-024
|
||||
|
||||
## Services
|
||||
|
||||
### OQ-SVC-01: Should the secret service support multiple seed phrases (one per tenant)?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (deferred — one seed per node is simplest; multi-seed can be added later by indexing `Unlock` with a tenant ID)
|
||||
- **Cross-references**: [secret-service.md](secret-service.md)
|
||||
|
||||
### OQ-SVC-02: Should service protocols use postcard (binary) or JSON for remote calls?
|
||||
- **Origin**: [research/services.md](../research/services.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: low —
|
||||
- **Resolution**: Postcard for irpc (Rust-to-Rust, efficient). JSON for call protocol (cross-language, universal). The irpc remote path naturally uses postcard.
|
||||
- **Cross-references**: [services.md](services.md)
|
||||
|
||||
### OQ-SVC-03: How does the secret service integrate with the existing EncryptedDataSchema from @alkdev/storage?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — Rust implementation replaces PBKDF2 password-based encryption with derived AES-256-GCM keys; EncryptedData format is a superset; migration by re-encrypting)
|
||||
- **Cross-references**: [secret-service.md](secret-service.md), [storage.md](storage.md)
|
||||
|
||||
### OQ-SVC-04: Should workers cache derived keys locally?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: low —
|
||||
- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session.
|
||||
- **Cross-references**: [secret-service.md](secret-service.md)
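One way a worker-side cache with a TTL might look, as a sketch only. `DerivedKeyCache` and its methods are hypothetical; the source states just the 1-hour default and that the head can revoke by invalidating the session. The clock is passed in explicitly so the expiry logic is easy to test, and the sketch does not zeroize key bytes on eviction, which a real implementation would need to consider.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Cache of derived keys, indexed by derivation path (hypothetical type).
pub struct DerivedKeyCache {
    ttl: Duration,
    entries: HashMap<String, (Vec<u8>, Instant)>,
}

impl DerivedKeyCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Default TTL of one hour, as stated in the resolution.
    pub fn with_default_ttl() -> Self {
        Self::new(Duration::from_secs(3600))
    }

    /// Store a key together with the time it was derived.
    pub fn insert_at(&mut self, path: &str, key: Vec<u8>, now: Instant) {
        self.entries.insert(path.to_string(), (key, now));
    }

    /// Return the key if it is still fresh; evict it once the TTL has elapsed.
    pub fn get_at(&mut self, path: &str, now: Instant) -> Option<&[u8]> {
        let expired = match self.entries.get(path) {
            Some((_, stored)) => now.duration_since(*stored) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(path);
            return None;
        }
        self.entries.get(path).map(|(key, _)| key.as_slice())
    }

    /// Drop everything, e.g. when the head invalidates the session.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }
}
```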
|
||||
|
||||
### OQ-SVC-05: How does the NFT-based ACL smart contract interact with the secret service?
|
||||
- **Origin**: [storage.md](storage.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from the same seed as the secret service. The smart contract is a separate concern — it reads on-chain ACL state, it doesn't call the secret service.
|
||||
- **Cross-references**: [storage.md](storage.md), [secret-service.md](secret-service.md)
|
||||
|
||||
## Interface
|
||||
|
||||
### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream?
|
||||
- **Origin**: [interface.md](interface.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~high~~ —
|
||||
- **Resolution**: `InterfaceSession::recv()` returns `Option<InterfaceEvent>` where `InterfaceEvent` carries `EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`. The `SshSession` bridge implements this over the `alknet-control:0` channel. For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize request/response pairs. See [interface.md](interface.md) and ADR-035.
|
||||
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-IF-02: Should SshInterface own ForwardingPolicy checks or should they move to Layer 3?
|
||||
- **Origin**: [interface.md](interface.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ForwardingPolicy is Layer 3 (it's policy, not session mechanics). Channel open/close lifecycle is Layer 2. The Interface reports channel open requests to Layer 3; Layer 3 applies ForwardingPolicy. The current `SshHandler` implementation checks policy in `channel_open_direct_tcpip`, which already delegates to `Identity.scopes` from the authenticated identity — this is consistent with the resolution.
|
||||
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-01: Should MessageInterface and StreamInterface share a common trait?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Independent traits. Different signatures (`handle_request` vs `accept` + session lifecycle), different transport ownership (self-managed vs provided), different lifecycles (stateless per-request vs long-lived session). A common super-trait adds complexity without benefit. See ADR-035.
|
||||
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-02: Should the HTTP interface share a port with the SSH listener?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Start with separate ports. Stealth mode byte-peek on a shared port is already implemented for SSH vs HTTP detection. `ListenerConfig::Http { stealth: true }` enables the existing peek pattern. ALPN multiplexing on port 443 is a future optimization that doesn't change the interface abstraction.
|
||||
- **Cross-references**: [interface.md](interface.md), [research/phase2/tls-transport.md](../research/phase2/tls-transport.md)
|
||||
|
||||
### OQ-P2-03: Should the HTTP interface auto-generate OpenAPI specs from OperationRegistry?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Yes, but Phase 5+. The HTTP interface needs to exist first (Phase 5.3 in the integration plan). `GET /v1/schema` producing an OpenAPI spec from registered `OperationSpec`s is the natural end state. This creates symmetry with `FromOpenAPI` (inbound spec consumption).
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-04: How do self-hosted services authenticate via alknet?
|
||||
- **Origin**: [research/phase2/credential-provider.md](../research/phase2/credential-provider.md), [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Three-phase approach. Phase A: shared secret (`CredentialSet::Bearer` or `S3AccessKey`). Phase C: identity-bound credentials via `ManagedCredentialProvider`. Phase D: alknet as OIDC provider. The `CredentialProvider` trait in core enables Phase A immediately; Phases C and D are additive.
|
||||
- **Cross-references**: [ADR-036](decisions/036-credentialprovider-core-type.md), [credentials.md](credentials.md)
|
||||
|
||||
## Credentials
|
||||
|
||||
### OQ-CP-01: Should CredentialProvider support per-identity credentials?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: Start with service-level credentials (`get_credentials(service)`). Add identity-level resolution (`get_credentials_for(service, identity_id)`) when the need is concrete. `Identity.id` already serves as the account UUID in database-backed mode.
|
||||
- **Cross-references**: [credentials.md](credentials.md), [ADR-036](decisions/036-credentialprovider-core-type.md)
|
||||
|
||||
### OQ-CP-02: Where should OIDC provider operations live?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: Application service (Phase D). OIDC is an application concern, not a core concern. The call protocol and OperationRegistry provide the transport; OIDC is just another set of operations.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
|
||||
|
||||
### OQ-CP-03: How do credential rotations propagate across a cluster?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: TBD. Likely TTL-based caching with a refresh threshold. Workers call `CredentialProvider::get_credentials()` which checks `is_expired()` and calls `refresh_credentials()` if needed.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
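Since the resolution is still TBD, the following is only a sketch of the "TTL-based caching with a refresh threshold" idea. `CachedCredential`, `RefreshingProvider`, and the closure-based refresh hook are assumptions for illustration; the real `CredentialProvider` trait and `CredentialSet` type may look quite different.

```rust
use std::time::{Duration, Instant};

/// Simplified stand-in for a credential with an expiry time.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CachedCredential {
    pub token: String,
    pub expires_at: Instant,
}

impl CachedCredential {
    /// Treat the credential as expired once `now` is within `threshold` of expiry,
    /// so callers refresh slightly early instead of racing the deadline.
    pub fn is_expired(&self, now: Instant, threshold: Duration) -> bool {
        now + threshold >= self.expires_at
    }
}

/// Wraps a refresh function and caches its latest result.
pub struct RefreshingProvider<F> {
    current: Option<CachedCredential>,
    refresh_threshold: Duration,
    refresh: F,
}

impl<F: FnMut(Instant) -> CachedCredential> RefreshingProvider<F> {
    pub fn new(refresh_threshold: Duration, refresh: F) -> Self {
        Self { current: None, refresh_threshold, refresh }
    }

    /// Return the cached credential, refreshing first if it is missing or stale.
    pub fn get_credentials(&mut self, now: Instant) -> &CachedCredential {
        let stale = self
            .current
            .as_ref()
            .map_or(true, |c| c.is_expired(now, self.refresh_threshold));
        if stale {
            self.current = Some((self.refresh)(now));
        }
        self.current.as_ref().expect("credential was set above")
    }
}
```

Under this scheme each worker converges on a rotated credential within one TTL without any cluster-wide notification, which is the trade-off the open question would need to accept or reject.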
|
||||
|
||||
### OQ-CP-04: Should CredentialSet include request-signing capability?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: No. `CredentialSet` is pure data. Request signing (e.g., AWS Signature V4) is a separate utility function in the service wrapper or a shared `alknet-s3` crate. Credentials are data; signing is protocol behavior.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
|
||||
|
||||
## Definitions
|
||||
|
||||
### OQ-DEF-01: Should alknet adopt a "Service Catalog" concept like Keystone?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Keep `OperationRegistry` global, check scope at invocation time. Add scope-filtered discovery (`GET /v1/schema?scope=...`) when multi-tenant deployment requires it. The unfiltered registry is sufficient for current needs.
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md)
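To make "global registry, scope check at invocation time" concrete, here is a small sketch. The `required_scope` field, the `InvokeError` enum, and the `authorize` and `discover` methods are hypothetical; the source names only `OperationRegistry`, `OperationSpec`, and the idea of scope-filtered discovery.

```rust
use std::collections::HashMap;

/// Simplified operation spec; `required_scope` is an assumed field.
#[derive(Debug, PartialEq, Eq)]
pub struct OperationSpec {
    pub name: String,
    pub required_scope: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InvokeError {
    UnknownOperation,
    MissingScope(String),
}

/// One global registry; nothing is hidden per tenant at registration time.
#[derive(Default)]
pub struct OperationRegistry {
    ops: HashMap<String, OperationSpec>,
}

impl OperationRegistry {
    pub fn register(&mut self, name: &str, required_scope: &str) {
        let spec = OperationSpec {
            name: name.to_string(),
            required_scope: required_scope.to_string(),
        };
        self.ops.insert(name.to_string(), spec);
    }

    /// Scope check performed at invocation time against the caller's scopes.
    pub fn authorize(&self, name: &str, scopes: &[String]) -> Result<&OperationSpec, InvokeError> {
        let spec = self.ops.get(name).ok_or(InvokeError::UnknownOperation)?;
        if scopes.contains(&spec.required_scope) {
            Ok(spec)
        } else {
            Err(InvokeError::MissingScope(spec.required_scope.clone()))
        }
    }

    /// Scope-filtered discovery: names of operations the caller could invoke, sorted.
    pub fn discover(&self, scopes: &[String]) -> Vec<&str> {
        let mut names: Vec<&str> = self
            .ops
            .values()
            .filter(|spec| scopes.contains(&spec.required_scope))
            .map(|spec| spec.name.as_str())
            .collect();
        names.sort_unstable();
        names
    }
}
```

The filtered `discover` is the part that could later back something like `GET /v1/schema?scope=...` without changing how operations are registered.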
|
||||
|
||||
### OQ-DEF-03: Should Identity.scopes be hierarchical or stay flat?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Stay flat. Add implied scope resolution in alknet-storage when multi-tenant deployment requires it. A full policy language (like Rustfs IAM JSON policies) is Phase D territory.
|
||||
- **Cross-references**: [identity.md](identity.md)
|
||||
|
||||
### OQ-DEF-08: Should "credential presentation" replace "auth interface" in terminology?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Yes. Adopted in [definitions.md](definitions.md). Use "credential presentation" for the mechanism of presenting credentials on a (Transport, Interface) pair. Never use "auth interface" (overloads "Interface").
|
||||
- **Cross-references**: [definitions.md](definitions.md), [auth.md](auth.md)
|
||||
|
||||
## Secret Service
|
||||
|
||||
### OQ-SEC-01: Should alknet-secret use mlock/VirtualLock to prevent seed RAM from being paged to disk?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (deferred to Phase B — zeroize is sufficient for v1; mlock requires root/CAP_IPC_LOCK on Linux and SeLockMemory on Windows, adding platform complexity that should be audited together)
|
||||
- **Cross-references**: [ADR-038](decisions/038-seed-lifecycle-memory-security.md), [secret-service.md](secret-service.md)
|
||||
@@ -1,242 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Alknet Overview
|
||||
|
||||
## Purpose
|
||||
|
||||
Alknet is a self-hostable SSH-based tunnel tool that provides VPN-like functionality without being a VPN protocol. It enables:
|
||||
|
||||
- **Private tunneling** of services (Postgres, Redis, internal APIs) over SSH
|
||||
- **Censorship circumvention** — SSH over TLS on port 443 looks like HTTPS to DPI
|
||||
- **NAT traversal** — iroh transport allows peer-to-peer connections without public IPs or port forwarding
|
||||
- **Service mesh connectivity** — a lightweight transport layer for the pubsub/operations event system
|
||||
|
||||
The core insight: SSH tunnels work because SSH is fundamental infrastructure. Blocking it breaks the internet. Alknet makes SSH tunneling accessible through a simple CLI with pluggable transports.
|
||||
|
||||
## Crate Structure
|
||||
|
||||
Alknet is decomposed into six crates with a strict acyclic dependency graph (ADR-027):
|
||||
|
||||
| Crate | Purpose | Exists Now? |
|-------|---------|-------------|
| **alknet-core** | Transport, SSH, call protocol, config, auth types, `OperationSpec`, `Interface` trait | Yes |
| **alknet-napi** | Node.js native addon via napi-rs | Yes |
| **alknet-secret** | BIP39, SLIP-0010 HD key derivation, AES-256-GCM, `SecretProtocol` irpc service | Phase 2+ |
| **alknet-storage** | SQLite-backed metagraph, identity tables, ACL graph, honker, `StorageProtocol` | Phase 2+ |
| **alknet-flowgraph** | `FlowGraph<N,E>` over petgraph, operation graph, call graph | Phase 2+ |
| **alknet** (CLI) | Binary that assembles everything with feature flags | Yes |
|
||||
|
||||
The four library crates (core, secret, storage, flowgraph) are independent of each other. Dependencies flow upward only: the CLI binary sits at the top and wires concrete implementations together. alknet-storage implements alknet-core's `IdentityProvider` trait without a crate dependency — the CLI binary provides the bridge.
|
||||
|
||||
irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling don't need the service layer overhead.
|
||||
|
||||
## Three-Layer Model
|
||||
|
||||
Alknet uses a three-layer model (ADR-026, ADR-035):
|
||||
|
||||
| Layer | Responsibility | Examples |
|-------|---------------|----------|
| **Layer 1: Transport** | Produces byte streams (`AsyncRead + AsyncWrite + Unpin + Send`) | TCP, TLS, iroh, WebTransport (future) |
| **Layer 2: Interface** | Two categories: StreamInterface (consumes transport stream, produces session) and MessageInterface (handles discrete requests, manages own transport) | Stream: SSH, raw framing. Message: HTTP, DNS |
| **Layer 3: Protocol** | Carries semantics — operation registry, service calls, events | Call protocol, OperationEnv, operation dispatch |
|
||||
|
||||
SSH is an interface, not a transport. DNS is a message interface, not a transport.
|
||||
The three-layer model enables HTTP interfaces (stealth mode byte-peek),
|
||||
DNS control channels, and local service mesh (raw framing) without wrapping SSH
|
||||
inside those transports.
|
||||
|
||||
A stream-based connection is always a (Transport, StreamInterface) pair.
|
||||
Message-based interfaces manage their own transport. The protocol layer is
|
||||
agnostic to both.
|
||||
|
||||
## Service Layer
|
||||
|
||||
The irpc service layer decomposes alknet's core responsibilities into independently testable, deployable, and replaceable components (ADR-033, [services.md](services.md)):
|
||||
|
||||
- **Auth** (`AuthProtocol`) — verify identities, check credentials
|
||||
- **Secret** (`SecretProtocol`) — derive keys, encrypt/decrypt
|
||||
- **Config** (`ConfigProtocol`) — dynamic config reload
|
||||
- **Storage** (`StorageProtocol`) — graph CRUD, metagraph operations
|
||||
|
||||
**OperationEnv** is the universal composition mechanism. A handler receives `context.env.invoke("secrets", "derive", input)` and doesn't know whether the dispatch is local (direct function call), in-cluster (irpc service), or cross-node (call protocol `EventEnvelope`). Three dispatch paths, one handler-facing API.
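The "three dispatch paths, one handler-facing API" idea can be sketched roughly as follows. This is not the alknet-core implementation: `Dispatch`, `LocalHandler`, `InvokeResult`, and `mount` are hypothetical, strings stand in for the JSON payloads, and the irpc and remote arms only format a description of what would be sent rather than performing I/O.

```rust
use std::collections::HashMap;

/// Handler-facing result; strings stand in for real payload and error types.
pub type InvokeResult = Result<String, String>;

/// Local handler signature (assumed): receives the operation name and input.
pub type LocalHandler = Box<dyn Fn(&str, &str) -> InvokeResult>;

/// Where a namespace is served from. The handler never sees this choice.
pub enum Dispatch {
    /// Direct function call in the same process.
    Local(LocalHandler),
    /// In-cluster irpc service (stand-in: just records the service name).
    Irpc { service: String },
    /// Cross-node call protocol (stand-in: just records the target node).
    Remote { node: String },
}

#[derive(Default)]
pub struct OperationEnv {
    routes: HashMap<String, Dispatch>,
}

impl OperationEnv {
    /// Bind a namespace to a dispatch path.
    pub fn mount(&mut self, namespace: &str, dispatch: Dispatch) {
        self.routes.insert(namespace.to_string(), dispatch);
    }

    /// Single entry point used by handlers, regardless of dispatch path.
    pub fn invoke(&self, namespace: &str, op: &str, input: &str) -> InvokeResult {
        match self.routes.get(namespace) {
            Some(Dispatch::Local(handler)) => handler(op, input),
            Some(Dispatch::Irpc { service }) => {
                Ok(format!("irpc {}::{} {}", service, op, input))
            }
            Some(Dispatch::Remote { node }) => {
                Ok(format!("envelope /{}/{}/{} {}", node, namespace, op, input))
            }
            None => Err(format!("no dispatch registered for namespace `{}`", namespace)),
        }
    }
}
```

Because callers only ever use `invoke`, remounting `secrets` from `Dispatch::Local` to `Dispatch::Remote` would not require touching any handler code.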
|
||||
|
||||
**Phase boundary**: Phase 1 ships `ConfigIdentityProvider` (ArcSwap-backed) and `ConfigServiceImpl` (ArcSwap-backed) as the only auth and config implementations. The irpc service protocols (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment topology (multi-node with `StorageIdentityProvider`) are contracted in the specs but will be implemented in Phase 2+. Application services (DockerService, NodeService, agent services) are downstream concerns that build on top of the call protocol and OperationEnv.
|
||||
|
||||
## Identity
|
||||
|
||||
`Identity` struct and `IdentityProvider` trait are core types in alknet-core (ADR-029, [identity.md](identity.md)):
|
||||
|
||||
```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,                              // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>,                     // Authorization scope strings
    pub resources: HashMap<String, Vec<String>>, // Resource-level authorization
}
```
|
||||
|
||||
`IdentityProvider` decouples alknet-core from identity storage. Phase 1 ships `ConfigIdentityProvider` (reads from `ArcSwap<DynamicConfig.auth>`). `StorageIdentityProvider` (Phase 2+, backed by SQLite) replaces it for production deployments. Both produce the same `Identity` result.
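A minimal sketch of how the trait boundary might decouple callers from the backing store. The `resolve` method signature, the synchronous shape, and the plain `HashMap` inside `ConfigIdentityProvider` are all assumptions; the source says the Phase 1 provider reads from `ArcSwap<DynamicConfig.auth>`, which is omitted here to keep the example dependency-free.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Assumed trait shape: map a presented key fingerprint to an identity.
pub trait IdentityProvider {
    fn resolve(&self, key_fingerprint: &str) -> Option<Identity>;
}

/// Config-backed provider (simplified): fingerprint to scope list.
#[derive(Default)]
pub struct ConfigIdentityProvider {
    authorized: HashMap<String, Vec<String>>,
}

impl ConfigIdentityProvider {
    /// Hypothetical helper that registers a key and its scopes.
    pub fn authorize(&mut self, fingerprint: &str, scopes: &[&str]) {
        let scopes = scopes.iter().map(|s| s.to_string()).collect();
        self.authorized.insert(fingerprint.to_string(), scopes);
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve(&self, key_fingerprint: &str) -> Option<Identity> {
        self.authorized.get(key_fingerprint).map(|scopes| Identity {
            // In config auth, the fingerprint doubles as the identity id.
            id: key_fingerprint.to_string(),
            scopes: scopes.clone(),
            resources: HashMap::new(),
        })
    }
}

/// Callers depend only on the trait, so a storage-backed provider can be swapped in.
pub fn has_scope(provider: &dyn IdentityProvider, fingerprint: &str, scope: &str) -> bool {
    provider
        .resolve(fingerprint)
        .map_or(false, |id| id.scopes.iter().any(|s| s == scope))
}
```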
|
||||
|
||||
## Exports
|
||||
|
||||
### Binary: `alknet`
|
||||
|
||||
A single binary with subcommands:
|
||||
|
||||
```
alknet serve    — Start the server (accepts SSH connections)
alknet connect  — Start the client (opens SSH session, exposes SOCKS5/port-forwards)
```
|
||||
|
||||
### Library: `alknet-core`
|
||||
|
||||
The `alknet-core` crate exports the pluggable components for embedding or programmatic use:
|
||||
|
||||
- `Transport` trait — produces a duplex stream for SSH to run over
|
||||
- `TcpTransport` — direct TCP connection
|
||||
- `TlsTransport` — TCP + tokio-rustls TLS
|
||||
- `IrohTransport` — iroh QUIC P2P connection
|
||||
- `Interface` trait → `StreamInterface` trait and `MessageInterface` trait (ADR-035)
|
||||
- `InterfaceSession` trait — `recv()`/`send()` producing/consuming `InterfaceEvent` frames
|
||||
- `InterfaceRequest` / `InterfaceResponse` — normalized request/response for message interfaces
|
||||
- `Socks5Server` — local SOCKS5 proxy that forwards through SSH channels
|
||||
- `PortForwarder` — manages local/remote port forwards
|
||||
- `ServerHandler` → `SshInterface` — russh server handler with configurable auth and channel policies
|
||||
- `Identity` / `IdentityProvider` — core identity types (ADR-029)
|
||||
- `CredentialProvider` / `CredentialSet` — outbound credential types (ADR-036)
|
||||
- `OperationSpec` — operation registration for call protocol (ADR-025)
|
||||
- `OperationEnv` / `OperationContext` — universal composition and operation context
|
||||
- `ConnectOptions` / `ServeOptions` — programmatic configuration structs
|
||||
- `StaticConfig` / `DynamicConfig` — immutable vs. hot-reloadable config (ADR-030)
|
||||
- `ConfigReloadHandle` — programmatic reload of dynamic config
|
||||
- `ForwardingPolicy` — rule-based allow/deny for channel targets (ADR-031)
|
||||
- `ListenerConfig` — stream and message listener configuration
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Dependency | Purpose | Crate | Feature-gated |
|------------|---------|-------|---------------|
| `russh` | SSH client & server | core | No (core) |
| `tokio` | Async runtime | core | No (core) |
| `tokio-rustls` | TLS wrapping | core | Yes (`tls`) |
| `rustls` | TLS implementation | core | Yes (`tls`) |
| `rustls-acme` | ACME/Let's Encrypt auto-cert | core | Yes (`acme`) |
| `iroh` | P2P QUIC transport | core | Yes (`iroh`) |
| `irpc` | Streaming RPC service layer | core | Yes (`irpc`) |
| `arc-swap` | Lock-free dynamic config | core | No (core) |
| `serde` | Serialization | core | No (core) |
| `clap` | CLI argument parsing | CLI | No (CLI) |
| `toml` | TOML config file | CLI | No (CLI) |
| `tracing` | Structured logging | core | No (core) |
| `anyhow` / `thiserror` | Error handling | core | No (core) |
| `bip39` | Mnemonic generation | secret | No (secret) |
| `ed25519-bip32` | HD key derivation | secret | No (secret) |
| `aes-gcm` | AES-256-GCM encryption | secret | No (secret) |
| `rusqlite` | SQLite (via honker) | storage | No (storage) |
| `honker` | Event-sourced storage | storage | No (storage) |
| `petgraph` | Graph data structure | storage, flowgraph | No |
| `jsonschema` | JSON Schema validation | storage, flowgraph | No |
|
||||
|
||||
> Note: `tun-rs` is no longer a dependency. TUN support is deferred in favor of the external `tun2proxy` tool (ADR-014).
|
||||
|
||||
## Architecture Constraints
|
||||
|
||||
1. **SSH runs over transport, not alongside** — The transport layer produces a single `AsyncRead+AsyncWrite+Unpin+Send` stream. SSH runs over that stream via `russh::client::connect_stream()` / `russh::server::run_stream()`. The SSH layer never knows what transport it's on. (ADR-001, ADR-004)
|
||||
|
||||
2. **Three-layer model: Transport, Interface, Protocol** — SSH is a StreamInterface (Layer 2), not a transport (Layer 1). HTTP and DNS are MessageInterfaces (Layer 2). A connection is always a (Transport, StreamInterface) pair for stream-based interfaces, or a standalone MessageInterface for message-based ones. The call protocol (Layer 3) is agnostic to both. This enables HTTP interfaces, DNS control channels, and local service mesh without wrapping SSH. (ADR-026, ADR-035)
|
||||
|
||||
3. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)
|
||||
|
||||
4. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)
|
||||
|
||||
5. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing. Optional `--config` TOML file for reproducible deployments. (ADR-011, ADR-030)
|
||||
|
||||
6. **Feature flags control transport inclusion** — `tls`, `iroh`, `acme`, `irpc` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
|
||||
|
||||
7. **Authentication is key-based and unified** — Ed25519 public key (default) and OpenSSH certificate authority. Same key material for SSH and token auth. Identity resolves through `IdentityProvider` trait, decoupling core from identity storage. (ADR-012, ADR-023, ADR-029)
|
||||
|
||||
8. **NAPI exposes both connect() and serve()** — The napi-rs wrapper provides client and server functionality, using napi-rs as the FFI bridge. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)
|
||||
|
||||
9. **Static/dynamic config split** — Transport-level settings (listen address, TLS certs) are immutable after startup. Auth, forwarding policy, and rate limits are hot-reloadable via `ArcSwap<DynamicConfig>`. (ADR-030)
|
||||
|
||||
10. **Forwarding policy enforced before proxy spawn** — Each `channel_open_direct_tcpip` is checked against `ForwardingPolicy` before a TCP connection is made. Default-allow preserves current behavior. (ADR-031)
|
||||
|
||||
11. **OperationEnv as universal composition mechanism** — Handlers call `context.env.invoke(namespace, op, input)` regardless of dispatch path (local, irpc service, remote call protocol). (ADR-033)
|
||||
|
||||
12. **Event boundary discipline** — Domain events (Honker streams) stay within the owning service. irpc calls are synchronous and in-cluster. Call protocol `EventEnvelope` is the only thing that crosses node boundaries. (ADR-032)
|
||||
|
||||
13. **Error handling follows a consistent layered pattern** — Transport and auth errors cause reconnection (client, with exponential backoff) or connection rejection (server). Channel-level errors (target unreachable, proxy failure) close the individual channel without killing the session. Library API errors propagate via `anyhow::Result` / `thiserror` types. CLI reports errors to stderr with appropriate exit codes. NAPI errors are marshalled as JavaScript exceptions.
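Constraint 13 mentions client reconnection with exponential backoff. A minimal sketch of the delay calculation follows; the function name and the base and cap values used in the usage note are assumptions, since the source does not state the actual parameters or whether jitter is applied.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Overflow at any step falls back to the cap.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt overflows u32 for attempt >= 32; saturate instead of panicking.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a hypothetical base of 500 ms and a cap of 30 s, attempts 0 through 3 wait 0.5 s, 1 s, 2 s, and 4 s, and the delay stays at 30 s from attempt 6 onward. Channel-level errors would not go through this path, since they close only the affected channel.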
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces `AsyncRead+AsyncWrite+Unpin+Send`, SSH consumes it |
| [002](decisions/002-tun-separate-process.md) | TUN shim separate | Superseded — TUN is deferred, use tun2proxy (ADR-014) |
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join(recv, send)` combines QUIC halves |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never accesses TCP/iroh/TLS directly |
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Server logs auth and connections, not destinations |
| [007](decisions/007-napi-single-stream.md) | NAPI single stream | NAPI exposes duplex streams, not SSH multiplexing |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No SSH config files; options are structs, env vars, CLI flags (amended by ADR-030 for optional TOML) |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | Ed25519 keys + OpenSSH CA; no password auth |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly | Structured auth logs + built-in rate limiting |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN | Use tun2proxy for VPN-like behavior; no alknet-tun binary |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | connect + serve | NAPI exposes both client and server from the start |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Both sides can initiate calls |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Six crates, acyclic deps, feature-gated irpc |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract, irpc is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config | ArcSwap for hot-reloadable auth and forwarding |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Per-identity, per-destination, per-transport rules |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition, three dispatch paths |
| [034](decisions/034-head-worker-terminology.md) | Head/worker | Replaces hub/spoke terminology |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Two Layer 2 trait categories for stream vs message |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials` |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |
|
||||
|
||||
## Open Questions
|
||||
|
||||
See [open-questions.md](open-questions.md) for all open and resolved questions.
|
||||
Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
|
||||
OQ-20 (worker registration), OQ-IF-01 (Interface session / EventEnvelope
|
||||
relationship).
|
||||
|
||||
## References
|
||||
|
||||
- [transport.md](transport.md) — Transport abstraction (Layer 1)
|
||||
- [interface.md](interface.md) — StreamInterface and MessageInterface (Layer 2)
|
||||
- [call-protocol.md](call-protocol.md) — Call protocol (Layer 3)
|
||||
- [auth.md](auth.md) — Unified authentication, API keys, credential presentation
|
||||
- [identity.md](identity.md) — Identity and IdentityProvider
|
||||
- [credentials.md](credentials.md) — CredentialProvider and CredentialSet (outbound auth)
|
||||
- [definitions.md](definitions.md) — Terminology disambiguation
|
||||
- [configuration.md](configuration.md) — StaticConfig, DynamicConfig, ForwardingPolicy
|
||||
- [services.md](services.md) — irpc service layer, OperationEnv
|
||||
- [server.md](server.md) — Server acceptance, channel handling
|
||||
- [client.md](client.md) — Client connection, SOCKS5, port forwarding
|
||||
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
|
||||
- [storage.md](storage.md) — alknet-storage: metagraph, identity, ACL
|
||||
- [flowgraph.md](flowgraph.md) — alknet-flowgraph: call graph, operation graph
|
||||
- [secret-service.md](secret-service.md) — alknet-secret: BIP39, SLIP-0010, AES-GCM
|
||||
- [Feasibility Assessment](../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
- [russh API](/workspace/russh) — SSH client/server library
|
||||
- [Dispatch](/workspace/@alkdev/dispatch) — Reference implementation of russh port forwarding
|
||||
- [iroh](/workspace/iroh) — P2P QUIC connections
|
||||
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — Recommended external TUN-to-SOCKS5 tool
|
||||
- [irpc](/workspace/irpc) — iroh streaming RPC
|
||||
- [Production certbot setup](../research/ops/certbot.md) — Let's Encrypt on our infrastructure
|
||||
- [Production fail2ban setup](../research/ops/fail2ban.md) — fail2ban with nftables on our infrastructure
|
||||
@@ -1,519 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-10
|
||||
---
|
||||
|
||||
# Secret Service (alknet-secret)
|
||||
|
||||
## What
|
||||
|
||||
The `alknet-secret` crate provides BIP39 mnemonic generation, SLIP-0010 Ed25519
|
||||
HD key derivation, AES-256-GCM encryption for external credentials, and the
|
||||
`SecretProtocol` irpc service. It is the only component that holds the master
|
||||
seed phrase.
|
||||
|
||||
## Why
|
||||
|
||||
Operations like SSH key generation, API key storage, and Ethereum transaction
|
||||
signing all need deterministic key derivation from a single root of trust. The
|
||||
seed phrase is the single recovery mechanism — from it, all self-generated
|
||||
secrets can be derived on demand. External credentials (third-party API keys,
|
||||
OAuth tokens) cannot be derived and must be stored encrypted, with the
|
||||
encryption key itself derived from the seed.
|
||||
|
||||
The secret service isolates this responsibility: no other crate sees the seed,
|
||||
and derived keys are provided on demand through an irpc service interface. This
|
||||
follows ADR-027 (crate decomposition) — alknet-secret is fully independent of
|
||||
alknet-core and alknet-storage.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Crate Structure
|
||||
|
||||
```
alknet-secret/
├── Cargo.toml
├── src/
│   ├── lib.rs                # Crate root, re-exports
│   ├── mnemonic.rs           # BIP39: phrase generation, validation, seed derivation
│   ├── derivation.rs         # SLIP-0010: HD key derivation, path constants
│   ├── encryption.rs         # AES-256-GCM: encrypt/decrypt, EncryptedData type
│   ├── protocol.rs           # SecretProtocol irpc service enum, DerivedKey, KeyType
│   ├── service.rs            # SecretService, SecretServiceHandle, SecretServiceActor
│   ├── cache.rs              # Key caching: LRU cache with TTL, derivation path as key
│   └── ethereum.rs           # BIP-0032 secp256k1 HD key derivation (behind feature flag)
└── tests/
    ├── derivation_tests.rs   # Path derivation, coin type 74' consistency
    ├── encryption_tests.rs   # Round-trip encrypt/decrypt, key version
    ├── service_tests.rs      # Unlock/Lock lifecycle, derive on locked = error
    └── test_vectors.rs       # Known-answer tests: BIP39, SLIP-0010, AES-256-GCM
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
```toml
[dependencies]
bip39 = { version = "2", features = ["rand"] }
ed25519-bip32 = "0.4"                # IOHK SLIP-0010 Ed25519 HD derivation
aes-gcm = "0.10"                     # AES-256-GCM
sha2 = "0.10"                        # SHA-2 family (SHA-512 is the hash behind HMAC-SHA512)
hmac = "0.12"                        # HMAC-SHA512 for key derivation
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
irpc = { workspace = true }          # Always-on, not feature-gated (ADR-027)
irpc-derive = { workspace = true }   # Proc-macro for #[rpc_requests]
tokio = { version = "1", features = ["sync", "rt", "macros"] }  # Async runtime for SecretServiceActor
zeroize = { version = "1", features = ["derive"] }              # Secure memory wiping (ADR-038)
base64 = "0.22"                      # Base64url encoding for derived passwords
rand = "0.8"                         # Random IV/salt generation for AES-256-GCM

[dependencies.secp256k1]
version = "0.29"
optional = true                      # BIP-0032 secp256k1 derivation (behind feature flag)

[features]
default = []
secp256k1 = ["dep:secp256k1"]        # Enable Ethereum/secp256k1 key derivation

# Future (Phase B): key rotation via KDF
# hkdf = "0.12"                      # HKDF for salt-based key stretching (deferred)
# pbkdf2 = "0.12"                    # PBKDF2 for password-based key derivation (deferred)
```
|
||||
|
||||
irpc is always a dependency (not behind a feature flag). Per ADR-027, irpc
|
||||
in alknet-secret and alknet-storage is not feature-gated because these crates
|
||||
are used in production deployments where the service layer is always active.
|
||||
`irpc-derive` provides the `#[rpc_requests]` proc-macro that generates
|
||||
`SecretMessage` and channel plumbing. `tokio` is needed for the
|
||||
`SecretServiceActor` message loop (async channel receivers and task spawning).
|
||||
|
||||
The `secp256k1` crate is feature-gated behind the `secp256k1` feature because
|
||||
Ethereum/BIP-0032 derivation is not needed in minimal deployments. Only
|
||||
deployments that require `DeriveEthereumKey` should enable this feature. Note
|
||||
that the crate name is `secp256k1` (the Rust library), not `libsecp256k1`
|
||||
(the C library that the Rust crate wraps).
|
||||
|
||||
The `hkdf` and `pbkdf2` crates are deferred to Phase B. They will be needed for
|
||||
salt-based key stretching when key rotation is implemented (see
|
||||
[EncryptedData.salt](#aes-256-gcm-encryption-for-external-credentials)).
|
||||
|
||||
### Crate Interface (Public API)
|
||||
|
||||
The crate exposes these types as its stable public interface:
|
||||
|
||||
```rust
// Core types (always available)
pub use mnemonic::{Mnemonic, Language, Seed};
pub use derivation::{ExtendedPrivKey, DerivationError, PATHS};
pub use encryption::{EncryptedData, EncryptionError};
pub use protocol::{SecretProtocol, DerivedKey, KeyType, SecretMessage};
pub use service::{SecretService, SecretServiceHandle, SecretServiceActor, SecretServiceError};
pub use cache::CacheConfig;

// secp256k1 types (behind feature flag)
#[cfg(feature = "secp256k1")]
pub use ethereum::Secp256k1ExtendedPrivKey;
```
|
||||
|
||||
Other crates consume this interface:
|
||||
- **alknet-storage** references `EncryptedData` for wire format compatibility
|
||||
(type-level, not a crate dependency)
|
||||
- **alknet** (CLI binary) assembles `SecretService` and wires it to the
|
||||
`OperationEnv`
|
||||
- **alknet-core** never depends on alknet-secret; `CredentialProvider` stub
|
||||
returns `None` until Phase A wiring
|
||||
|
||||
### Security Model
|
||||
|
||||
Per ADR-038 (seed lifecycle and memory security):
|
||||
|
||||
| State | What's in memory | What's on disk |
|-------|------------------|----------------|
| Locked | Nothing | Encrypted database, derivation path metadata |
| Unlocked | Master seed in zeroize-protected RAM | Same (seed is never persisted) |
| After use | Derived keys cached in zeroize-protected RAM | Derivation paths only |
|
||||
|
||||
The seed phrase is entered once (at node startup or via `Unlock`), held only in
|
||||
RAM, and never written to disk. `Lock` calls `zeroize()` on the seed and all
|
||||
cached derived keys. The `SecretService` uses `Zeroize`-derived types for all
|
||||
sensitive material.
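
The locked/unlocked lifecycle described above can be pictured with a small state holder. This is a minimal sketch under stated assumptions, not the crate's `SecretService`: `SeedState`, `with_seed`, and `SecretError::Locked` are hypothetical names, and the manual byte wipe only stands in for the `zeroize` crate, which also prevents the compiler from eliding the writes.

```rust
/// Error returned when the seed is requested while the service is locked.
#[derive(Debug, PartialEq)]
pub enum SecretError {
    Locked,
}

/// Hypothetical holder for the master seed; `None` means locked.
pub struct SeedState {
    seed: Option<Vec<u8>>,
}

impl SeedState {
    pub fn new() -> Self {
        Self { seed: None }
    }

    /// Store the seed in RAM, wiping any seed that was already present.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.lock();
        self.seed = Some(seed);
    }

    /// Overwrite the seed bytes, then drop the buffer.
    /// (Placeholder for `zeroize()`; a plain loop is not a hardened wipe.)
    pub fn lock(&mut self) {
        if let Some(mut seed) = self.seed.take() {
            for byte in seed.iter_mut() {
                *byte = 0;
            }
        }
    }

    /// Run `f` against the seed, or fail with `Locked` if no seed is held.
    pub fn with_seed<T>(&self, f: impl FnOnce(&[u8]) -> T) -> Result<T, SecretError> {
        match &self.seed {
            Some(seed) => Ok(f(seed.as_slice())),
            None => Err(SecretError::Locked),
        }
    }
}

impl Drop for SeedState {
    /// Wipe the seed even if the owner forgets to call `lock`.
    fn drop(&mut self) {
        self.lock();
    }
}
```

Passing the seed to a closure rather than returning it keeps the bytes from escaping the holder, which mirrors the "derive on locked = error" behavior listed for `service_tests.rs`.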
|
||||
|
||||
#### Key Caching
|
||||
|
||||
Per OQ-SVC-04 (resolved), derived keys are cached in RAM with the following
|
||||
properties:
|
||||
|
||||
- **Cache key**: The derivation path string (e.g., `m/74'/0'/0'/0'`). This
|
||||
uniquely identifies a derived key — the same path always produces the same
|
||||
key from the same seed.
|
||||
- **TTL**: 1 hour (configurable). Cached entries expire after the TTL elapses,
|
||||
forcing re-derivation from the seed on next access.
|
||||
- **Eviction policy**: LRU (least recently used). When the cache exceeds its
|
||||
maximum size, the least recently accessed entry is evicted.
|
||||
- **Clearing**: The entire cache is cleared on `Lock`, and all entries are
|
||||
zeroized before removal per ADR-038.
|
||||
- **Implementation**: The cache lives in `cache.rs` as an LRU map from
|
||||
derivation path to `Zeroize`-protected key bytes.
|
||||
|
||||
The cache avoids redundant derivation for frequently used keys (identity,
|
||||
encryption) while ensuring that `Lock` purges all sensitive material.
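
The caching properties listed above (path as key, TTL, LRU eviction, wipe on clear) can be sketched with the standard library alone. The names `KeyCache`, `Entry`, and `wipe` are hypothetical and are not taken from `cache.rs`; the real crate would rely on `zeroize` rather than a manual loop.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One cached key plus the bookkeeping needed for TTL and LRU decisions.
struct Entry {
    key_bytes: Vec<u8>,
    inserted: Instant,
    last_used: u64,
}

/// Hypothetical cache: derivation path -> derived key bytes.
pub struct KeyCache {
    entries: HashMap<String, Entry>,
    ttl: Duration,
    max_entries: usize,
    clock: u64, // logical counter that orders accesses for LRU
}

/// Overwrite key bytes before the buffer is freed (stand-in for `zeroize`).
fn wipe(bytes: &mut [u8]) {
    for b in bytes.iter_mut() {
        *b = 0;
    }
}

impl KeyCache {
    pub fn new(ttl: Duration, max_entries: usize) -> Self {
        Self { entries: HashMap::new(), ttl, max_entries: max_entries.max(1), clock: 0 }
    }

    /// Borrow the cached key, or return `None` if it is absent or expired.
    pub fn get(&mut self, path: &str) -> Option<&[u8]> {
        self.clock += 1;
        let expired = self.entries.get(path)?.inserted.elapsed() >= self.ttl;
        if expired {
            self.remove(path);
            return None;
        }
        let now = self.clock;
        let entry = self.entries.get_mut(path)?;
        entry.last_used = now;
        Some(entry.key_bytes.as_slice())
    }

    /// Insert a key, evicting the least recently used entry when full.
    pub fn insert(&mut self, path: &str, key_bytes: Vec<u8>) {
        self.clock += 1;
        self.remove(path); // wipe any previous value for this path
        if self.entries.len() >= self.max_entries {
            let lru = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(p, _)| p.clone());
            if let Some(p) = lru {
                self.remove(&p);
            }
        }
        let entry = Entry { key_bytes, inserted: Instant::now(), last_used: self.clock };
        self.entries.insert(path.to_string(), entry);
    }

    fn remove(&mut self, path: &str) {
        if let Some(mut e) = self.entries.remove(path) {
            wipe(&mut e.key_bytes);
        }
    }

    /// Called on `Lock`: wipe every entry, then drop it.
    pub fn clear(&mut self) {
        for (_, mut e) in self.entries.drain() {
            wipe(&mut e.key_bytes);
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A linear scan for the LRU entry is fine for a handful of keys; a production cache would more likely use an ordered structure or an existing LRU crate.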
|
||||
|
||||
### Key Derivation
|
||||
|
||||
#### BIP39 Mnemonic and Seed Derivation
|
||||
|
||||
```rust
let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
let seed = mnemonic.to_seed(None); // or Some("passphrase")
let key = derive_path_from_seed(seed.as_bytes(), PATHS::IDENTITY)?;
```
|
||||
|
||||
#### SLIP-0010 Ed25519 HD Key Derivation
|
||||
|
||||
The `74'` coin type is unallocated per SLIP-0044 and reserved for alknet.
|
||||
|
||||
#### Derivation Path Constants
|
||||
|
||||
| Path | Purpose | Curve/Algorithm |
|------|---------|-----------------|
| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) |
| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 |
| `m/74'/0'/1'/0'` | SSH host key | Ed25519 |
| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic (HMAC-SHA512) |
| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM |
| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 |
|
||||
|
||||
These constants are defined in `derivation::PATHS` for programmatic access.
|
||||
|
||||
#### Password Derivation
|
||||
|
||||
`DerivePassword` produces a deterministic password from the seed using the
|
||||
following algorithm:
|
||||
|
||||
1. Derive the extended private key at path `m/74'/1'/0'/{hash}'` using
|
||||
SLIP-0010 (HMAC-SHA512 with key "ed25519 seed"), where `{hash}'` is a
|
||||
site-specific hardened index derived from the site identifier.
|
||||
2. Take the HMAC-SHA512 output (64 bytes) at that derivation level.
|
||||
3. Truncate to the requested `length` bytes.
|
||||
4. Encode as Base64url (RFC 4648 §5, no padding).
|
||||
|
||||
This produces a URL-safe, deterministic password of the requested length. v1
|
||||
does not impose a special character set — the Base64url alphabet (`A-Z`,
|
||||
`a-z`, `0-9`, `-`, `_`) provides sufficient entropy. If a specific character
|
||||
set is required in the future, a versioned path can be introduced
|
||||
(e.g., `m/74'/1'/1'/{hash}'`).
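
Steps 3 and 4 of the algorithm (truncate, then Base64url-encode without padding) can be sketched as follows. The HMAC-SHA512 derivation itself is outside this sketch: the 64-byte output is taken as an input, and `password_from_hmac_output` is a hypothetical name, not the crate's API. Note that, read literally, `length` counts raw bytes before encoding, so the encoded string is longer than `length` characters.

```rust
/// Base64url alphabet from RFC 4648 §5.
const B64URL: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as Base64url without `=` padding.
pub fn base64url_nopad(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 4 + 2) / 3);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64URL[((n >> 18) & 63) as usize] as char);
        out.push(B64URL[((n >> 12) & 63) as usize] as char);
        // Emit only the sextets that carry real input bits (no padding).
        if chunk.len() > 1 {
            out.push(B64URL[((n >> 6) & 63) as usize] as char);
        }
        if chunk.len() > 2 {
            out.push(B64URL[(n & 63) as usize] as char);
        }
    }
    out
}

/// Truncate the 64-byte HMAC-SHA512 output to `length` bytes and encode it.
/// Returns `None` when `length` is zero or exceeds the available bytes.
pub fn password_from_hmac_output(hmac_output: &[u8; 64], length: usize) -> Option<String> {
    if length == 0 || length > hmac_output.len() {
        return None;
    }
    Some(base64url_nopad(&hmac_output[..length]))
}
```

In the crate itself the `base64` dependency would do the encoding; the hand-written encoder here only keeps the sketch free of external crates.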
|
||||
|
||||
The `SecretServiceHandle` provides two methods for password derivation:
|
||||
- `derive_password(path, length)` → `Vec<u8>` (raw truncated bytes)
|
||||
- `derive_password_string(path, length)` → `String` (Base64url-encoded)
|
||||
|
||||
The irpc `DerivePassword` variant returns raw bytes (`Vec<u8>`). Consumers
|
||||
who need a string representation can Base64url-encode the result.
|
||||
|
||||
#### secp256k1 Derivation (Ethereum)
|
||||
|
||||
`DeriveEthereumKey` uses **BIP-0032** (not SLIP-0010) at path
|
||||
`m/44'/60'/0'/0/0`. This is a fundamentally different derivation algorithm from
|
||||
Ed25519:
|
||||
|
||||
- SLIP-0010 (Ed25519) uses HMAC-SHA512 with key "ed25519 seed" and only
|
||||
supports hardened child derivation.
|
||||
- BIP-0032 (secp256k1) uses HMAC-SHA512 with key "Bitcoin seed" and supports
|
||||
both hardened and unhardened child derivation.
|
||||
|
||||
The Ethereum path contains unhardened indices (`0/0`), which are invalid under
SLIP-0010 Ed25519 derivation. The `alknet-secret` crate gates secp256k1
derivation behind the `secp256k1` feature flag, which pulls in the `secp256k1`
crate. Deployments that do not need Ethereum signing can omit this feature to
avoid the dependency.
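
The hardened-only restriction can be checked mechanically from the path string. The sketch below assumes the usual convention that a trailing `'` marks a hardened index (bit 31 set); `parse_path`, `PathError`, and `is_hardened_only` are hypothetical names, not part of the `derivation` module.

```rust
/// Hardened indices have bit 31 set (BIP-0032 convention, reused by SLIP-0010).
const HARDENED: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
pub enum PathError {
    /// The path does not start with the `m` root segment.
    MissingRoot,
    /// A segment is not a valid index below 2^31.
    BadIndex(String),
}

/// Parse a path such as `m/74'/0'/0'/0'` into raw child indices.
pub fn parse_path(path: &str) -> Result<Vec<u32>, PathError> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(PathError::MissingRoot);
    }
    parts
        .map(|segment| {
            // A trailing apostrophe marks a hardened index.
            let (digits, hardened) = match segment.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (segment, false),
            };
            let index: u32 = digits
                .parse()
                .map_err(|_| PathError::BadIndex(segment.to_string()))?;
            if index >= HARDENED {
                return Err(PathError::BadIndex(segment.to_string()));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// True when every index is hardened, as SLIP-0010 Ed25519 requires.
pub fn is_hardened_only(indices: &[u32]) -> bool {
    indices.iter().all(|i| (*i & HARDENED) != 0)
}
```

With this check, `m/74'/0'/0'/0'` passes while `m/44'/60'/0'/0/0` is rejected for the Ed25519 path and has to go through the BIP-0032 secp256k1 derivation instead.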
|
||||
|
||||
#### DerivedKey Security Properties
|
||||
|
||||
Per ADR-038, `DerivedKey` must derive `Zeroize` and use `#[zeroize(drop)]` so
that the sensitive `private_key` bytes are overwritten before deallocation:
|
||||
|
||||
```rust
#[derive(Zeroize, Deserialize)]
#[zeroize(drop)]
pub struct DerivedKey {
    #[zeroize(skip)]
    pub key_type: KeyType,
    #[zeroize]
    #[serde(deserialize_with = "deserialize_private_key")]
    pub private_key: Vec<u8>,
    #[zeroize(skip)]
    pub public_key: Vec<u8>,
}
```
|
||||
|
||||
`DerivedKey` is **move-only** — it does not implement `Clone`. This is a
|
||||
stronger security property than manual `Clone` with zeroization of the source:
|
||||
a move-only type cannot be accidentally duplicated, and the `#[zeroize(drop)]`
|
||||
annotation ensures the `private_key` is zeroized when the key goes out of scope.
|
||||
There is no risk of use-after-zeroize from a manual `clone()` that destroys
|
||||
the source.
|
||||
|
||||
Serialization redacts `private_key` in human-readable formats (JSON shows
|
||||
`"[REDACTED]"`) but preserves the actual bytes in binary formats (postcard) so
|
||||
that irpc remote communication works correctly. Deserialization always reads
|
||||
the full bytes.
|
||||
|
||||
### AES-256-GCM Encryption for External Credentials
|
||||
|
||||
External credentials (API keys, OAuth tokens) that cannot be derived are
|
||||
encrypted using a key derived from the seed at path `m/74'/2'/0'/0'`. The
|
||||
`EncryptedData` type stores the key version, salt, IV, and ciphertext.
|
||||
|
||||
1. The secret service derives an AES-256-GCM key via path `m/74'/2'/0'/0'`
|
||||
2. External credentials are encrypted with this key
|
||||
3. The encrypted data is stored as a `SecretNode` in the metagraph
|
||||
4. Only the derivation path and key version are stored in plain attributes
|
||||
5. The seed phrase (or derived encryption key) is held only by the secret
|
||||
service — never in the database
|
||||
|
||||
#### EncryptedData.salt — Reserved for Future KDF-Based Key Rotation
|
||||
|
||||
In v1, the encryption key is derived directly from the seed at path
|
||||
`m/74'/2'/0'/0'` without any salt-based key derivation. The `salt` field in
|
||||
`EncryptedData` is **reserved for future KDF-based key rotation** (Phase B):
|
||||
|
||||
- The salt is generated randomly (32 bytes) and stored in `EncryptedData.salt`
|
||||
for forward compatibility, but it is **not used** in the v1 key derivation
|
||||
process.
|
||||
- When key rotation is implemented, the salt will be used as input to HKDF or
|
||||
PBKDF2 for stretch-based key derivation, allowing the same seed to produce
|
||||
different encryption keys without changing the derivation path.
|
||||
- This design ensures that the wire format does not need to change when key
|
||||
rotation is introduced — the `salt` field is already present and populated.
|
||||
|
||||
The `hkdf` and `pbkdf2` crates are listed as future dependencies in the
|
||||
`Dependencies` section but are not included in v1.
|
||||
|
||||
### SecretProtocol irpc Service
|
||||
|
||||
```rust
#[rpc_requests(message = SecretMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum SecretProtocol {
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEd25519)]
    DeriveEd25519 { path: String },

    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEncryptionKey)]
    DeriveEncryptionKey { path: String },

    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEthereumKey)]
    DeriveEthereumKey { path: String },

    #[rpc(tx=oneshot::Sender<Vec<u8>>)]
    #[wrap(DerivePassword)]
    DerivePassword { path: String, length: usize },

    #[rpc(tx=oneshot::Sender<EncryptedData>)]
    #[wrap(Encrypt)]
    Encrypt { plaintext: String, key_version: u32 },

    #[rpc(tx=oneshot::Sender<String>)]
    #[wrap(Decrypt)]
    Decrypt { encrypted: EncryptedData },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Lock)]
    Lock,

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Unlock)]
    Unlock { mnemonic: String, passphrase: Option<String> },
}
```
|
||||
|
||||
**Note**: The `Unlock` variant carries both the mnemonic phrase and an optional
|
||||
BIP39 passphrase. The `mnemonic` field is the space-separated BIP39 word list.
|
||||
The `passphrase` field is the optional BIP39 password extension (sometimes
|
||||
called the "25th word"). Most deployments use `passphrase: None`, but the field
|
||||
is available for users who need additional security beyond the mnemonic alone.
|
||||
|
||||
> **Implementation gap**: The current code has `Unlock { passphrase: String }`
|
||||
> with only a single field (the mnemonic), and the actor handler passes `None`
|
||||
> for the BIP39 passphrase. This needs to be updated to match the spec above.
|
||||
> See the `unlock-passphrase-gap` task.
|
||||
|
||||
#### irpc Integration Model
|
||||
|
||||
The `SecretProtocol` enum defines the **wire protocol** — the set of operations
|
||||
the secret service supports. The `#[rpc_requests(message = SecretMessage)]`
|
||||
macro generates `SecretMessage` as the irpc wire type, which comes in two
|
||||
variants:
|
||||
|
||||
- `SecretMessage::Request`: serialized form for remote (QUIC) communication,
|
||||
using postcard encoding.
|
||||
- `SecretMessage::RequestWithChannels`: local form with `oneshot::Sender`
|
||||
channels for in-process communication.
|
||||
|
||||
There are two dispatch paths for consuming the secret service:
|
||||
|
||||
1. **Local (in-process)**: `SecretServiceHandle` wraps `SecretServiceInner`
|
||||
behind `Arc<RwLock<>>` and provides direct method calls
|
||||
(`derive_ed25519()`, `encrypt()`, etc.) without any serialization overhead.
|
||||
This is the path used by the CLI binary and single-node deployments. No irpc
|
||||
message passing is involved — the handle calls the implementation directly.
|
||||
|
||||
2. **Remote (in-cluster)**: `Client<SecretProtocol>` connects to the secret
|
||||
service node via irpc over QUIC. The client sends `SecretMessage::Request`
|
||||
messages (postcard-serialized) and receives responses. Workers on remote
|
||||
nodes use this path. The seed never leaves the secret service node — only
|
||||
derived keys are transmitted.
|
||||
|
||||
The `SecretServiceActor` processes incoming `SecretMessage` variants by
|
||||
dispatching to the corresponding `SecretServiceHandle` methods. It provides
|
||||
a `spawn(handle)` convenience method that creates an mpsc channel, spawns the
|
||||
actor on a tokio task, and returns a `(Client<SecretProtocol>, SecretServiceActor)`
|
||||
tuple for immediate use.
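
The relationship between the two dispatch paths (direct calls on a shared handle versus messages processed by an actor) can be illustrated with standard-library channels. This is only an analogy under stated assumptions: `std::sync::mpsc` and a thread stand in for irpc and a tokio task, `Handle`, `Message`, and `spawn_actor` are hypothetical names, and `derive` returns placeholder bytes rather than performing any real key derivation.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, RwLock};
use std::thread;

struct Inner {
    unlocked: bool,
}

/// Cloneable handle over shared state, mirroring the `Arc<RwLock<..>>` design.
#[derive(Clone)]
pub struct Handle {
    inner: Arc<RwLock<Inner>>,
}

impl Handle {
    pub fn new() -> Self {
        Self { inner: Arc::new(RwLock::new(Inner { unlocked: false })) }
    }

    pub fn unlock(&self) {
        self.inner.write().unwrap().unlocked = true;
    }

    pub fn lock(&self) {
        self.inner.write().unwrap().unlocked = false;
    }

    /// Placeholder for a derivation call; fails while locked.
    pub fn derive(&self, path: &str) -> Result<Vec<u8>, String> {
        if !self.inner.read().unwrap().unlocked {
            return Err("locked".to_string());
        }
        Ok(path.as_bytes().to_vec()) // not a real key, illustration only
    }
}

/// Requests carry their own reply channel, like the oneshot senders above.
pub enum Message {
    Derive { path: String, reply: Sender<Result<Vec<u8>, String>> },
    Lock { reply: Sender<()> },
}

/// Spawn the actor loop and return the sending side of its mailbox.
pub fn spawn_actor(handle: Handle) -> Sender<Message> {
    let (tx, rx) = channel::<Message>();
    thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for msg in rx {
            match msg {
                Message::Derive { path, reply } => {
                    let _ = reply.send(handle.derive(&path));
                }
                Message::Lock { reply } => {
                    handle.lock();
                    let _ = reply.send(());
                }
            }
        }
    });
    tx
}
```

Both paths end in the same handle methods, so a `Lock` sent through the actor is immediately visible to callers that hold the handle directly.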
|
||||
|
||||
The `SecretService` type owns the irpc service handler and a
|
||||
`SecretServiceHandle`. It dispatches incoming `SecretMessage` variants to the
|
||||
handle's methods. For call protocol exposure (e.g., `/head/secrets/derive`),
|
||||
the service is wrapped in an operation that serializes to JSON.
|
||||
|
||||
### Wire Format Compatibility with alknet-storage
|
||||
|
||||
The `EncryptedData` type (`key_version`, `salt`, `iv`, `data`) is the stable
|
||||
wire format shared with alknet-storage. This is type-level compatibility — not a
|
||||
crate dependency. alknet-storage stores encrypted nodes using this format;
|
||||
alknet-secret encrypts and decrypts using this format.
|
||||
|
||||
The Rust `EncryptedData` struct in alknet-secret is a superset of the TypeScript
|
||||
`EncryptedDataSchema` from `@alkdev/storage`. Migration path: re-encrypt
|
||||
TypeScript-encrypted data using the Rust secret service with a new key version.
|
||||
The wire format is stable — future key rotation will use the existing `salt`
|
||||
field rather than adding new fields (see OQ-SVC-03).
|
||||
|
||||
### Deployment Topologies
|
||||
|
||||
**Minimal (single node, CLI)**: Secret service runs in the same process. Seed
|
||||
phrase entered at startup. All keys derived locally via `SecretServiceHandle`.
|
||||
No irpc overhead.
|
||||
|
||||
**Production (head node)**: Secret service runs on a dedicated node or as a
|
||||
local irpc service. Workers request derived keys via `Client<SecretProtocol>`
|
||||
over QUIC. The seed never leaves the secret service node.
|
||||
|
||||
### Test Vectors
|
||||
|
||||
Known-answer tests are required against published test vectors to verify
|
||||
correctness of the cryptographic implementations:
|
||||
|
||||
#### BIP39 Test Vectors
|
||||
|
||||
The `mnemonic` module must produce identical output to the BIP39 reference
|
||||
test vectors:
|
||||
|
||||
- Given a known mnemonic phrase and passphrase, the derived seed must match
|
||||
the reference output byte-for-byte.
|
||||
- Test vectors from
|
||||
[BIP39 reference](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki)
|
||||
and the `bip39` crate's own test suite.
|
||||
|
||||
#### SLIP-0010 Test Vectors
|
||||
|
||||
The `derivation` module must produce identical output to the SLIP-0010 reference
|
||||
test vectors:
|
||||
|
||||
- Given a known seed, the derived master key (private key + chain code) must
|
||||
match the SLIP-0010 reference output.
|
||||
- Given a known master key, the derived child key at path `m/74'/0'/0'/0'`
|
||||
must match the reference output.
|
||||
- Test vectors from
|
||||
[SLIP-0010 reference](https://github.com/satoshilabs/slips/blob/master/slip-0010.md).
|
||||
|
||||
#### AES-256-GCM Test Vectors
|
||||
|
||||
The `encryption` module must produce identical results to published AES-256-GCM
|
||||
test vectors:
|
||||
|
||||
- Given a known key, IV, and plaintext, the ciphertext must match the reference
|
||||
output.
|
||||
- Use IEEE P802.1ASck or NIST SP 800-38D test vectors.
|
||||
- Round-trip encryption/decryption must always succeed for valid inputs.
|
||||
|
||||
These tests ensure that the implementation is correct and compatible with
|
||||
other BIP39/SLIP-0010/AES-256-GCM implementations. They are placed in
|
||||
`tests/test_vectors.rs`.
|
||||
|
||||
## Constraints
|
||||
|
||||
- The seed phrase is never persisted to disk. It is entered at startup or via
|
||||
`Unlock` and held only in `Zeroize`-protected RAM (ADR-038).
|
||||
- `Lock` calls `zeroize()` on the seed and all cached derived keys. The key
|
||||
cache is fully cleared and zeroized on `Lock` (OQ-SVC-04, resolved).
|
||||
- alknet-secret does not depend on alknet-core or alknet-storage. It is fully
|
||||
independent (ADR-027).
|
||||
- The `EncryptedData` wire format is shared with alknet-storage for type-level
|
||||
compatibility, not a crate dependency.
|
||||
- Per ADR-032, secret service domain events (key derivation notifications) stay
|
||||
within the service boundary. External consumers use irpc calls or call
|
||||
protocol operations projected to integration events.
|
||||
- irpc is always a dependency (not feature-gated) per ADR-027.
|
||||
- `SecretProtocol` defines the wire format for in-cluster communication
|
||||
(postcard serialization). For call protocol exposure (e.g.,
|
||||
`/head/secrets/derive`), the service is wrapped in an operation that
|
||||
serializes to JSON.
|
||||
- `DerivedKey.private_key` must derive `Zeroize` per ADR-038. `DerivedKey`
|
||||
is move-only (not `Clone`) — this is stronger than manual Clone with
|
||||
zeroization of the source, as it prevents accidental duplication.
|
||||
- secp256k1 (Ethereum) derivation is gated behind the `secp256k1` feature flag
|
||||
because it requires a different derivation algorithm (BIP-0032) and an
|
||||
additional dependency (`secp256k1`).
|
||||
|
||||
## Phase Progression
|
||||
|
||||
| Phase | Scope | Notes |
|-------|-------|-------|
| Phase 3 (now) | Basic crate: mnemonic, derivation, encryption, irpc protocol, service lifecycle, key caching | Core key management |
| Phase A | Integration with alknet-storage via `EncryptedData` wire format. CLI commands for unlock/lock/derive. `SecretStoreCredentialProvider` wiring. | Full service integration |
| Phase B | Memory hardening: `mlock`/`VirtualLock` for seed RAM, constant-time comparison, audit logging of derivation requests. Key rotation: KDF-based key derivation using `EncryptedData.salt` with HKDF/PBKDF2. | Security hardening |
| Phase C | Multi-seed support (tenant isolation): indexed `Unlock` with tenant ID. | Multi-tenancy |
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
|
||||
per tenant)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-03**: How does the secret service integrate with the existing
|
||||
`EncryptedDataSchema` from `@alkdev/storage`? **Resolution**: The wire format
|
||||
is stable. `EncryptedData` (`key_version`, `salt`, `iv`, `data`) is shared
|
||||
type-level between alknet-secret and alknet-storage. The migration path is
|
||||
re-encryption with a new key version. The `salt` field is reserved for future
|
||||
KDF-based key rotation (see Phase B). See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-04**: Should workers cache derived keys locally? **Resolution**: Yes.
|
||||
Derived keys are cached in RAM using an LRU cache keyed by derivation path,
|
||||
with a TTL of 1 hour (configurable). The cache is fully cleared and zeroized
|
||||
on `Lock`. This avoids redundant derivation for frequently used keys while
|
||||
ensuring that `Lock` purges all sensitive material. See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SEC-01**: Should alknet-secret use `mlock`/`VirtualLock` to prevent seed
|
||||
RAM from being paged to disk? See [open-questions.md](open-questions.md).
|
||||
Deferred to Phase B per ADR-038.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-secret is independent of core and storage |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security | Zeroize for sensitive material, mlock deferred to Phase B |
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../research/services.md) — SecretProtocol definition, DerivedKey, KeyType
|
||||
- [research/storage.md](../research/storage.md) — Secrets section, derivation paths, EncryptedData
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — Phase 3.1
|
||||
- [credentials.md](credentials.md) — CredentialProvider (outbound auth, consumes SecretProtocol::Decrypt)
|
||||
- SLIP-0010 — https://github.com/satoshilabs/slips/blob/master/slip-0010.md
|
||||
- BIP39 — https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
|
||||
- BIP-0032 — https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
|
||||
- NIST SP 800-38D — AES-GCM test vectors
|
||||
@@ -1,325 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Server
|
||||
|
||||
## What
|
||||
|
||||
The alknet server accepts SSH connections (via pluggable transport) and handles `channel_open_direct_tcpip` requests by connecting to the requested target — either directly or through an outbound proxy.
|
||||
|
||||
## Why
|
||||
|
||||
The server is the tunnel endpoint. It receives SSH channels requesting TCP connections to specific hosts and ports, and makes those connections on behalf of the client. It's the same role as an SSH server with `AllowTcpForwarding yes`, but self-contained and transport-agnostic.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Server Components
|
||||
|
||||
```
┌──────────────────────────────────────────────────┐
│                   alknet serve                   │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  SSH Server (russh)                        │  │
│  │  ServerHandler per connection              │  │
│  │  - auth_publickey() → Accept/Reject        │  │
│  │  - channel_open_direct_tcpip() → connect   │  │
│  │  - channel_open_forwarded_tcpip() → proxy  │  │
│  └──────────────────┬─────────────────────────┘  │
│                     │                            │
│  ┌──────────────────▼─────────────────────────┐  │
│  │  Transport Acceptor                        │  │
│  │  (TcpListener / TlsListener / IrohEndpoint)│  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Outbound Proxy (optional)                 │  │
│  │  - Direct TCP                              │  │
│  │  - SOCKS5 proxy                            │  │
│  │  - HTTP CONNECT proxy                      │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Rate Limiter                              │  │
│  │  - max-connections-per-ip                  │  │
│  │  - max-auth-attempts                       │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
```
|
||||
|
||||
### Authentication
|
||||
|
||||
The server authenticates connections through the `IdentityProvider` trait (ADR-029, [identity.md](identity.md)). `IdentityProvider` decouples the server from any specific identity storage — the server resolves an identity, it doesn't manage keys.
|
||||
|
||||
**Phase 1 implementation**: `ConfigIdentityProvider` (in alknet-core) reads from `ArcSwap<DynamicConfig.auth>` (ADR-030). Every authorized key gets a default scope set. No database required. This is the default for CLI and single-node deployments.
|
||||
|
||||
**Future implementation**: `StorageIdentityProvider` (in alknet-storage, not yet built) backed by SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. The server doesn't need to know which implementation is active — it goes through the trait.
|
||||
|
||||
The server supports two auth presentation paths (ADR-023, [auth.md](auth.md)):
|
||||
|
||||
**SSH public key auth** (SSH transports):
|
||||
1. `auth_publickey()` callback receives the presented key
|
||||
2. Delegates to `IdentityProvider::resolve_from_fingerprint()` with the key fingerprint
|
||||
3. Returns `Accept` (with `Identity` attached) or `Reject`
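
The public-key path above can be sketched roughly as follows. This is a minimal illustration, not the alknet-core implementation: `StaticIdentityProvider`, `AuthOutcome`, and the two-field `Identity` are hypothetical stand-ins, and only the `resolve_from_fingerprint` method name and its `Option<Identity>` return shape are taken from the specs.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the `Identity` type (the field set is an assumption).
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Only the fingerprint path of the trait is sketched here.
trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical in-memory provider mapping key fingerprint -> identity id.
struct StaticIdentityProvider {
    authorized: HashMap<String, String>,
    default_scopes: Vec<String>,
}

impl IdentityProvider for StaticIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // Every authorized key receives the same default scope set.
        self.authorized.get(fingerprint).map(|id| Identity {
            id: id.clone(),
            scopes: self.default_scopes.clone(),
        })
    }
}

/// Hypothetical result type; the real handler returns russh's auth result.
#[derive(Debug, PartialEq)]
enum AuthOutcome {
    Accept(Identity),
    Reject,
}

/// The handler only delegates: it never inspects key storage itself.
fn auth_publickey(provider: &dyn IdentityProvider, fingerprint: &str) -> AuthOutcome {
    match provider.resolve_from_fingerprint(fingerprint) {
        Some(identity) => AuthOutcome::Accept(identity),
        None => AuthOutcome::Reject,
    }
}
```

Because the handler takes `&dyn IdentityProvider`, a storage-backed provider could later be swapped in without touching `auth_publickey`.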
|
||||
|
||||
**Ed25519 + OpenSSH certificate authority** (ADR-012):
|
||||
1. If no direct key match, validate the presented certificate against trusted cert-authorities
|
||||
2. Check CA signature, expiry, and principal restrictions
|
||||
3. Certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
|
||||
|
||||
**Token auth** (non-SSH transports, WebTransport):
|
||||
1. Extract token from URL path or `Authorization` header
|
||||
2. Delegate to `IdentityProvider::resolve_from_token()`
|
||||
3. Same verification: same authorized keys set, same `Identity` result (ADR-023)
|
||||
|
||||
**No password authentication over SSH channels.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
|
||||
|
||||
### Key Material Format
|
||||
|
||||
Key inputs (`--key`, `--authorized-keys`, `--cert-authority`) accept either file paths or in-memory data (via library API or NAPI wrapper). The accepted format is **OpenSSH key format** throughout — private keys in OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`), public keys in OpenSSH format (`ssh-ed25519 AAAA... user@host`), and authorized keys files in standard OpenSSH `authorized_keys` format. PEM-encoded keys (PKCS#1, PKCS#8) are not supported.
|
||||
|
||||
### TLS Certificate Provisioning
|
||||
|
||||
The server supports three TLS certificate modes (ADR-008):
|
||||
|
||||
1. **Manual certs** (`--tls-cert` / `--tls-key`): User provides certificate and key files. For users with existing PKI.
|
||||
2. **Domain-based ACME** (`--acme-domain <domain>`): Auto-provisions certificates from Let's Encrypt using HTTP-01 or TLS-ALPN-01 challenges. Certificate is domain-bound and auto-renews. Requires the domain to resolve to the server and the challenge port to be reachable (80 for HTTP-01, 443 for TLS-ALPN-01).
|
||||
3. **IP-based ACME**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain name needed, but certificates expire frequently. The ACME client runs continuously.
|
||||
|
||||
ACME support is feature-gated behind the `acme` feature flag to keep the base binary lean. Implementation uses `rustls-acme` or a similar pure-Rust ACME client to avoid an external `certbot` dependency.
|
||||
|
||||
### Channel Handling
|
||||
|
||||
When a client opens a `channel_open_direct_tcpip(host, port, originator_addr, originator_port)`:
|
||||
|
||||
**Reserved destination** — If `host` starts with `alknet-` (e.g., `alknet-control`), the server routes the channel internally instead of connecting to a TCP target. The primary reserved destination is `alknet-control:0`, which bridges the channel to the local pubsub event bus (ADR-018).
|
||||
|
||||
**Forwarding policy check** — Before the proxy task is spawned for any non-reserved destination, the server evaluates `ForwardingPolicy` against the authenticated `Identity` (ADR-031, [configuration.md](configuration.md)). The policy check uses `Identity.id` and `Identity.scopes` from the identity resolved during auth. If the policy denies the destination, the channel open is rejected — no TCP connection is attempted. The default policy (`ForwardingPolicy::allow_all()`) preserves current behavior.
|
||||
|
||||
**Regular destination** — For targets that pass the forwarding policy check:
|
||||
|
||||
1. **Outbound connection**: connect to `host:port`, either directly or via the configured outbound proxy
2. **Bidirectional proxy**: `tokio::io::copy_bidirectional` between the SSH channel stream and the outbound TCP stream
3. **Cleanup**: close the channel and TCP stream when either side disconnects
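
The ordering of the checks described in this section (reserved destination first, then forwarding policy, then the outbound connection) can be sketched as a pure routing decision. This is only an illustration under assumptions: `ChannelRoute`, `route_channel`, and the shape of `ForwardingPolicy` (an optional host allow-list gated on a `forward` scope) are hypothetical; only the `alknet-` prefix rule and `ForwardingPolicy::allow_all()` come from the text.

```rust
/// Simplified identity; the policy reads `id` and `scopes` as described above.
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Hypothetical policy shape: `None` allows everything; `Some(hosts)` allows
/// only the listed hosts, and only for identities holding the "forward" scope.
struct ForwardingPolicy {
    allowed_hosts: Option<Vec<String>>,
}

impl ForwardingPolicy {
    fn allow_all() -> Self {
        Self { allowed_hosts: None }
    }

    fn permits(&self, identity: &Identity, host: &str) -> bool {
        match &self.allowed_hosts {
            None => true,
            Some(hosts) => {
                identity.scopes.iter().any(|s| s == "forward")
                    && hosts.iter().any(|h| h == host)
            }
        }
    }
}

/// What the server should do with a direct-tcpip channel open.
#[derive(Debug, PartialEq)]
enum ChannelRoute {
    /// Reserved `alknet-*` destination, handled in-process.
    Internal(String),
    /// Policy denied the destination; no TCP connection is attempted.
    Denied { identity_id: String },
    /// Connect outbound, then proxy bytes in both directions.
    Forward { host: String, port: u16 },
}

fn route_channel(
    identity: &Identity,
    policy: &ForwardingPolicy,
    host: &str,
    port: u16,
) -> ChannelRoute {
    if host.starts_with("alknet-") {
        return ChannelRoute::Internal(host.to_string());
    }
    if !policy.permits(identity, host) {
        return ChannelRoute::Denied { identity_id: identity.id.clone() };
    }
    ChannelRoute::Forward { host: host.to_string(), port }
}
```

Keeping the decision separate from the I/O makes the "no TCP connection on deny" property easy to test without a network.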
|
||||
|
||||
### Outbound Proxy Modes
|
||||
|
||||
| Mode | CLI Flag | Behavior |
|------|----------|----------|
| **Direct** | (default) | `TcpStream::connect(target)` |
| **SOCKS5** | `--proxy socks5://addr:port` | Connect through SOCKS5 proxy |
| **HTTP CONNECT** | `--proxy http://addr:port` | Connect through HTTP CONNECT proxy |
|
||||
|
||||
The proxy setting applies globally to all outbound connections from the server.
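
As a rough sketch of how the `--proxy` value could map onto the three modes in the table, the following parses the two URL schemes shown there. `ProxyMode`, `parse_proxy`, and `check_addr` are hypothetical names for illustration and are not taken from the alknet source.

```rust
/// Hypothetical representation of the three outbound modes.
#[derive(Debug, PartialEq)]
enum ProxyMode {
    Direct,
    /// `addr:port` of the SOCKS5 proxy.
    Socks5(String),
    /// `addr:port` of the HTTP CONNECT proxy.
    HttpConnect(String),
}

/// Accept only a non-empty host followed by a valid TCP port.
fn check_addr(addr: &str) -> Result<String, String> {
    match addr.rsplit_once(':') {
        Some((host, port)) if !host.is_empty() && port.parse::<u16>().is_ok() => {
            Ok(addr.to_string())
        }
        _ => Err(format!("expected addr:port, got {addr:?}")),
    }
}

/// `None` means `--proxy` was not given, which selects direct connections.
fn parse_proxy(flag: Option<&str>) -> Result<ProxyMode, String> {
    let url = match flag {
        None => return Ok(ProxyMode::Direct),
        Some(url) => url,
    };
    if let Some(addr) = url.strip_prefix("socks5://") {
        check_addr(addr).map(ProxyMode::Socks5)
    } else if let Some(addr) = url.strip_prefix("http://") {
        check_addr(addr).map(ProxyMode::HttpConnect)
    } else {
        Err(format!("unsupported proxy scheme: {url}"))
    }
}
```

Rejecting unknown schemes at parse time fits the error-handling rule that invalid flags are fatal CLI errors.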
|
||||
|
||||
### Stealth Mode
|
||||
|
||||
When `--stealth` is enabled on the server alongside TLS transport:
|
||||
|
||||
1. After the TLS handshake, the server detects whether the connecting client is speaking SSH or HTTP
2. If SSH: proceed with `server::run_stream()`
3. If HTTP (normal web browsers, scanners): respond with a fake nginx 404 (`HTTP/1.1 404 Not Found` + `Server: nginx` headers), then close
|
||||
|
||||
This makes the server appear as an ordinary web server to port scanners and DPI systems.
|
||||
|
||||
**Stealth mode requires TLS transport (`--transport tls`).** It has no effect with TCP or iroh transports — in those cases, there is no TLS handshake to peek behind, and protocol multiplexing is impossible. The CLI should reject or warn if `--stealth` is used without `--transport tls`.
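
One plausible way to implement the SSH-versus-HTTP detection is to look at the first decrypted bytes: an SSH client opens with an identification string beginning `SSH-` (RFC 4253, section 4.2). The sketch below assumes the server waits for the client to speak first; `Sniffed`, `sniff`, and `decoy_response` are hypothetical names, and the decoy body is a placeholder rather than a byte-exact nginx page.

```rust
#[derive(Debug, PartialEq)]
enum Sniffed {
    /// The client sent an SSH identification string.
    Ssh,
    /// Anything else: treat as HTTP or a scanner and serve the decoy.
    Other,
    /// Too few bytes to decide yet; read more before classifying.
    NeedMore,
}

/// Classify the first bytes received after the TLS handshake.
fn sniff(first_bytes: &[u8]) -> Sniffed {
    const SSH_PREFIX: &[u8] = b"SSH-";
    if first_bytes.starts_with(SSH_PREFIX) {
        Sniffed::Ssh
    } else if SSH_PREFIX.starts_with(first_bytes) {
        // A strict prefix of "SSH-" (including the empty slice) is undecided.
        Sniffed::NeedMore
    } else {
        Sniffed::Other
    }
}

/// Build a 404 response that advertises `Server: nginx`.
fn decoy_response() -> String {
    let body = "<html><head><title>404 Not Found</title></head>\
                <body><center><h1>404 Not Found</h1></center>\
                <hr><center>nginx</center></body></html>";
    format!(
        "HTTP/1.1 404 Not Found\r\nServer: nginx\r\nContent-Type: text/html\r\n\
         Content-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}
```

The `NeedMore` case matters in practice because a TLS record may deliver fewer than four bytes on the first read.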
|
||||
|
||||
### Server Handler Behavior
|
||||
|
||||
The server handler implements `russh::server::Handler` with two primary responsibilities:
|
||||
|
||||
**Authentication (`auth_publickey`)**:
|
||||
- Delegate to `IdentityProvider::resolve_from_fingerprint()` with the presented key fingerprint
|
||||
- If identity resolved, return `Accept` with the `Identity` attached to the session
|
||||
- If no identity, check certificate authority: validate CA signature, expiry, principals
|
||||
- Return `Accept` or `Reject`
|
||||
|
||||
**Channel handling (`channel_open_direct_tcpip`)**:
|
||||
- If the destination host starts with `alknet-`, route internally (control channel, ADR-018)
|
||||
- Otherwise, evaluate `ForwardingPolicy` against the session's `Identity` (ADR-031)
|
||||
- If denied, reject the channel open
|
||||
- If allowed, connect to `host:port` (directly or via the configured outbound proxy)
|
||||
- Spawn a bidirectional proxy task between the SSH channel and the outbound TCP stream
|
||||
- Return the channel for data flow
|
||||
|
||||
### Interface Abstraction
|
||||
|
||||
SSH is one interface at Layer 2 in the three-layer model (ADR-026, [interface.md](interface.md)). The current `ServerHandler` will be refactored into `SshInterface` — it manages SSH session concerns (handshake, auth delegation, channel multiplexing). Forwarding policy, operation routing, and call protocol handling are Layer 3 concerns that live outside the interface. This refactoring is the most invasive code change in Phase 1 (integration-plan, Phase 1.8).
|
||||
|
||||
### Logging and Rate Limiting
|
||||
|
||||
**Logging** (for fail2ban integration on Linux):
|
||||
|
||||
- `INFO` level: auth attempts (remote_addr, user, key_fingerprint, accept/reject)
|
||||
- `INFO` level: connection opened (remote_addr, transport kind)
|
||||
- `INFO` level: connection closed (remote_addr, duration)
|
||||
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
|
||||
|
||||
This matches our production fail2ban setup, which filters on source IP plus failure indicators. Example log lines:
|
||||
```
INFO auth attempt remote_addr=203.0.113.50 user=root key_fingerprint=SHA256:abc... result=reject
INFO connection opened remote_addr=203.0.113.50 transport=tls
```
|
||||
|
||||
**Built-in rate limiting** (platform-independent):
|
||||
|
||||
| Flag | Default | Purpose |
|------|---------|---------|
| `--max-connections-per-ip` | 0 (unlimited) | Reject new connections from IPs with N active connections |
| `--max-auth-attempts` | 10 | Disconnect after N failed auth attempts per connection |
|
||||
|
||||
These provide abuse protection on platforms without fail2ban (macOS, Windows, BSD) and complement fail2ban on Linux.
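
A minimal sketch of the per-IP connection limit, assuming a simple in-memory counter, might look like the following. `ConnectionLimiter` and its methods are hypothetical; the only behavior taken from the table is that a limit of `0` means unlimited and that an IP already holding N active connections is rejected.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP.
struct ConnectionLimiter {
    /// 0 means unlimited, matching the flag's default.
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns true and counts the connection if `ip` is under its limit.
    fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let current = self.active.get(&ip).copied().unwrap_or(0);
        if self.max_per_ip != 0 && current >= self.max_per_ip {
            return false;
        }
        self.active.insert(ip, current + 1);
        true
    }

    /// Call when a connection from `ip` closes.
    fn release(&mut self, ip: IpAddr) {
        let remaining = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        // Drop idle entries so the map does not grow without bound.
        if remaining == 0 {
            self.active.remove(&ip);
        }
    }
}
```

In a real server this state would be shared across connection handlers (for example behind an `Arc<Mutex<..>>`), which is the kind of explicitly shared state the constraints section allows for connection limits.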
|
||||
|
||||
### CLI Interface
|
||||
|
||||
Configuration sources (in priority order): CLI flags, environment variables, optional `--config` TOML file (ADR-030). The TOML config file is a convenience input for reproducible deployments; it does not replace `ServeOptions` (ADR-011).
|
||||
|
||||
Multi-transport listeners use `[[listeners]]` in the TOML config (ADR-030):
|
||||
|
||||
```toml
[[listeners]]
transport = "tls"
listen = "0.0.0.0:443"

[listeners.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"

[[listeners]]
transport = "iroh"
```
|
||||
|
||||
Currently, the server binds to a single transport at a time. Multi-transport via `[[listeners]]` is coming per ADR-030.
|
||||
|
||||
```bash
# Basic server (SSH on port 22)
alknet serve --key ~/.ssh/ssh_host_ed25519_key

# With TLS (manual certs)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
    --tls-cert /etc/ssl/cert.pem \
    --tls-key /etc/ssl/key.pem

# With TLS (auto ACME, domain-based)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
    --acme-domain example.com

# With TLS + stealth (fake nginx 404 to scanners)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport tls \
    --acme-domain example.com \
    --stealth

# With iroh transport (no public IP needed)
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --transport iroh

# With outbound proxy
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --proxy socks5://127.0.0.1:9050

# With certificate authority authentication
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --cert-authority /etc/alknet/ca.pub

# With rate limiting
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
    --max-connections-per-ip 5 \
    --max-auth-attempts 3

# All options
alknet serve \
    --key <path-or-buffer> \            # SSH host key (required)
    --authorized-keys <path> \          # Authorized keys file
    --cert-authority <path> \           # CA public key for cert-auth
    --transport tcp|tls|iroh \          # Transport mode
    --listen <addr:port> \              # Listen address for TCP/TLS (default: 0.0.0.0:22)
    --tls-cert <path> \                 # TLS certificate (manual)
    --tls-key <path> \                  # TLS private key (manual)
    --acme-domain <domain> \            # ACME auto-cert domain
    --stealth \                         # Serve fake nginx 404 to non-SSH connections
    --proxy <url> \                     # Outbound proxy URL (socks5:// or http://)
    --iroh-relay <url> \                # iroh relay server URL (default: n0 relay)
    --max-connections-per-ip <n> \      # Max concurrent connections per IP (default: unlimited)
    --max-auth-attempts <n>             # Max auth failures before disconnect (default: 10)
```
|
||||
|
||||
### iroh Server Mode
|
||||
|
||||
When running with `--transport iroh`, the server:
|
||||
|
||||
1. Creates an iroh endpoint with ALPN value `b"alknet-ssh"`
|
||||
2. Prints its endpoint ID (base58-encoded Ed25519 public key) — this is what clients use as the `--peer` value
|
||||
3. Accepts incoming connections on the endpoint
|
||||
4. For each connection, accepts a bidirectional stream and passes it to `server::run_stream()`
|
||||
|
||||
No listening port is needed. The server connects outbound to the iroh relay (default: n0, override with `--iroh-relay`) and awaits connections from clients that know its endpoint ID.
|
||||
|
||||
## Constraints
|
||||
|
||||
- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
|
||||
- Destination strings beginning with `alknet-` are reserved for internal use (ADR-018). The server must not attempt TCP connections to `alknet-*` destinations — these are intercepted for control channel routing.
|
||||
- One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
|
||||
- The server currently binds to a single transport at a time. Multi-transport via `[[listeners]]` is coming per ADR-030.
|
||||
- Forwarding policy is evaluated before every channel proxy spawn. Denied channels are rejected immediately (ADR-031).
|
||||
- Auth resolves through `IdentityProvider` (ADR-029). Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>` (ADR-030). `StorageIdentityProvider` (Phase 2+) replaces it for production deployments with SQLite.
|
||||
- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
|
||||
- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).
|
||||
- Stealth mode (`--stealth`) requires TLS transport. It has no effect on TCP or iroh transports (ADR-017).
|
||||
|
||||
## Graceful Shutdown
|
||||
|
||||
On SIGTERM or SIGINT:
|
||||
|
||||
1. Stop accepting new connections on the transport listener
|
||||
2. Send SSH disconnect messages to all active sessions
|
||||
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds per session)
|
||||
4. Close all transport listeners
|
||||
5. Exit
|
||||
|
||||
The server does not wait indefinitely for idle connections to close. After the drain timeout, remaining connections are forcibly terminated. This prevents a slow or stuck client from blocking shutdown indefinitely.
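
The drain-then-force behavior can be sketched with a deadline and a completion channel. This is a simplification under assumptions: it uses a single overall deadline instead of a per-session timeout, uses `std::sync::mpsc` instead of the async runtime the server runs on, and `drain_sessions` is a hypothetical helper name.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Wait until `expected` sessions have reported completion on `done`, or
/// until `timeout` elapses. Returns how many sessions were still open at the
/// deadline and would therefore be forcibly terminated.
fn drain_sessions(done: &Receiver<()>, expected: usize, timeout: Duration) -> usize {
    let deadline = Instant::now() + timeout;
    let mut finished = 0;
    while finished < expected {
        // Time left until the deadline; zero once the deadline has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match done.recv_timeout(remaining) {
            Ok(()) => finished += 1,
            // Deadline hit, or every session handle is gone: stop waiting.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    expected - finished
}
```

Bounding the wait by a deadline rather than by session state is what keeps a stuck client from blocking shutdown.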
|
||||
|
||||
## Error Handling
|
||||
|
||||
Error handling follows the project's layered pattern (see overview.md):
|
||||
|
||||
- **Transport errors**: Cause connection rejection. The listener remains active — a failed TLS handshake or iroh connection attempt does not affect other incoming connections.
|
||||
- **Auth errors**: Result in connection rejection with a logged auth failure event (for fail2ban, ADR-013). Repeated failures from one connection trigger disconnect after `--max-auth-attempts`.
|
||||
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels. The client receives a channel open failure message.
|
||||
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found, bind failure) exit immediately.
|
||||
|
||||
## Open Questions
|
||||
|
||||
None — all resolved.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait, SSH consumes stream |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Server logs auth and connections, not destinations |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority auth | No password auth; support OpenSSH cert-authority |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2 interface, ServerHandler → SshInterface |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract; irpc service is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | IdentityProvider trait in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | ArcSwap for dynamic config, ConfigReloadHandle |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Evaluated before channel proxy spawn |
|
||||
|
||||
## References
|
||||
|
||||
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, ConfigReloadHandle
|
||||
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
|
||||
- [auth.md](auth.md) — Unified auth, AuthPolicy, token auth
|
||||
- [interface.md](interface.md) — Interface trait, SshInterface, three-layer model
|
||||
@@ -1,233 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Services
|
||||
|
||||
> **Phase note**: This spec defines the contracts for the service layer — the
|
||||
> protocol enums, OperationEnv, and deployment topologies. Phase 1 ships
|
||||
> `ConfigIdentityProvider` (ArcSwap-based) and `ConfigServiceImpl` (ArcSwap-based)
|
||||
> as the only auth and config implementations. The irpc service protocols
|
||||
> (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment
|
||||
> topology (multi-node with `StorageIdentityProvider`) are contracted here but
|
||||
> will be implemented in Phase 2+. Application services (DockerService,
|
||||
> NodeService, agent services) are downstream concerns that build on top of
|
||||
> the call protocol and OperationEnv — they are not core requirements.
|
||||
|
||||
## What
|
||||
|
||||
The irpc service layer decomposes alknet's core responsibilities into
independently testable, deployable, and replaceable components. Auth, Secret,
Config, and Storage are irpc protocol enums that work both as in-process async
boundaries (tokio channels) and across processes or the network (irpc over iroh
QUIC streams). OperationEnv is the universal composition mechanism that unifies
local dispatch, irpc service dispatch, and remote call protocol dispatch.
|
||||
|
||||
## Why
|
||||
|
||||
Without the service layer, auth verification, key derivation, and config reload
are scattered across the codebase with no async boundary. For head nodes serving
many users, in-memory key lookup doesn't scale; auth needs to query a database
on demand. For secret management, the seed must be isolated in its own process
boundary.

Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away: `context.env.invoke("secrets", "derive", input)` works
regardless of dispatch path.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Service Definition Pattern
|
||||
|
||||
Services are defined as irpc protocol enums:
|
||||
|
||||
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    // ...
}
```
|
||||
|
||||
The `#[rpc_requests]` macro generates two versions:
|
||||
- **Serializable** (`Request`): for remote communication (postcard encoding)
|
||||
- **With channels** (`RequestWithChannels`): for local communication (tokio channels)
|
||||
|
||||
Both use the same `Client<S>` type. The local/remote distinction is transparent
|
||||
at the call site.
|
||||
|
||||
### Core Services
|
||||
|
||||
| Service | Protocol | Purpose | Locality |
|---------|----------|---------|----------|
| **Auth** | `AuthProtocol` | Verify identities, check credentials | Can be remote |
| **Secret** | `SecretProtocol` | Derive keys, encrypt/decrypt | Local or remote |
| **Config** | `ConfigProtocol` | Dynamic config reload | Local |
| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations | Local or remote |
|
||||
|
||||
### OperationContext
|
||||
|
||||
Every handler receives an `OperationContext`:
|
||||
|
||||
```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool, // set by buildEnv(), not by callers
}
```
|
||||
|
||||
- **`identity`**: The authenticated identity making the call. Populated by
  `IdentityProvider` from the interface layer.
- **`env`**: The operation environment, which gives namespaced access to other
  operations.
- **`trusted`**: When a handler calls another operation through `env`, the
  nested call is marked `trusted` and skips ACL checks.
|
||||
|
||||
### OperationEnv — Universal Composition Mechanism
|
||||
|
||||
OperationEnv provides namespace + operation name → invoke with input, return
|
||||
output. The handler doesn't know or care whether the dispatch is local, irpc,
|
||||
or remote.
|
||||
|
||||
Three dispatch paths:
|
||||
|
||||
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
|
||||
|
||||
All three produce the same `ResponseEnvelope`.
|
||||
|
||||
Service assembly determines which path each operation uses:
|
||||
|
||||
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)
    .local("config", config_registry)
    .service("secrets", secret_irpc_client)
    .remote("worker-1", call_protocol_conn);
```
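
To make the three dispatch paths concrete, here is a self-contained toy version of the builder and `invoke`. It is a sketch under heavy assumptions: inputs and outputs are plain strings rather than envelopes, the service and remote paths only format a marker string instead of performing irpc or call protocol I/O, and `Dispatch`, `Handler`, and `upper` are hypothetical names.

```rust
use std::collections::HashMap;

/// Toy handler signature: string in, string out.
type Handler = fn(&str) -> String;

/// One dispatch path per namespace.
enum Dispatch {
    /// Direct function call through an in-process registry.
    Local(HashMap<String, Handler>),
    /// Stand-in for an irpc client (endpoint label only).
    Service(String),
    /// Stand-in for a call protocol connection (endpoint label only).
    Remote(String),
}

struct OperationEnv {
    namespaces: HashMap<String, Dispatch>,
}

impl OperationEnv {
    fn new() -> Self {
        Self { namespaces: HashMap::new() }
    }

    fn local(mut self, ns: &str, registry: HashMap<String, Handler>) -> Self {
        self.namespaces.insert(ns.to_string(), Dispatch::Local(registry));
        self
    }

    fn service(mut self, ns: &str, endpoint: &str) -> Self {
        self.namespaces.insert(ns.to_string(), Dispatch::Service(endpoint.to_string()));
        self
    }

    fn remote(mut self, ns: &str, endpoint: &str) -> Self {
        self.namespaces.insert(ns.to_string(), Dispatch::Remote(endpoint.to_string()));
        self
    }

    /// The caller names a namespace and operation; the path is chosen here.
    fn invoke(&self, ns: &str, op: &str, input: &str) -> Result<String, String> {
        match self.namespaces.get(ns) {
            None => Err(format!("unknown namespace: {ns}")),
            Some(Dispatch::Local(registry)) => registry
                .get(op)
                .map(|handler| handler(input))
                .ok_or_else(|| format!("unknown operation: {ns}/{op}")),
            Some(Dispatch::Service(endpoint)) => Ok(format!("[irpc {endpoint}] {op}({input})")),
            Some(Dispatch::Remote(endpoint)) => Ok(format!("[call {endpoint}] {op}({input})")),
        }
    }
}

/// Example local operation.
fn upper(input: &str) -> String {
    input.to_uppercase()
}
```

The call site is identical for all three paths, which is the property the spec asks of the handler-facing API.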
|
||||
|
||||
### Service vs Call Protocol vs External Service
|
||||
|
||||
These are different concepts that compose through OperationEnv:
|
||||
|
||||
- **irpc service**: In-cluster, Rust-to-Rust, type-safe, postcard serialization.
|
||||
Dispatched by enum variant. Example: `AuthProtocol::VerifyPubkey`.
|
||||
- **Call protocol operation**: Cross-node, cross-language, path-based, JSON
|
||||
`EventEnvelope`. Dispatched by namespace + name. Example:
|
||||
`/head/auth/verify`.
|
||||
- **External service**: Any endpoint reachable via the call protocol.
|
||||
Example: a vast.ai instance, an HTTP API, another head node.
|
||||
|
||||
An irpc service can back a call protocol operation. The OperationEnv routes to
|
||||
the appropriate dispatch path:
|
||||
|
||||
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
|
||||
|
||||
### Adapters
|
||||
|
||||
HTTP, MCP, DNS, and call protocol adapters all resolve through OperationEnv:
|
||||
|
||||
- HTTP: `POST /v1/{namespace}/{op}` → `context.env.invoke(namespace, op, input)`
- MCP: `tools/call` with tool name → `context.env.invoke(namespace, op, input)`
- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env.invoke(namespace, op, input)`
- Call protocol: `call.requested` with `operationId` → `context.env.invoke(namespace, op, input)`
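
The HTTP and DNS mappings above both reduce to extracting a `(namespace, op)` pair before calling `invoke`. A small sketch of that extraction, assuming single-label namespace and operation names, could look like this; `parse_http_path` and `parse_dns_name` are hypothetical helper names.

```rust
/// Parse `/v1/{namespace}/{op}` into `(namespace, op)`.
fn parse_http_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    // Reject empty segments and extra path components.
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace, op))
}

/// Parse `{op}.{namespace}.alk.dev` into `(namespace, op)`.
fn parse_dns_name(name: &str) -> Option<(&str, &str)> {
    // Tolerate a fully qualified name with a trailing root dot.
    let name = name.strip_suffix('.').unwrap_or(name);
    let rest = name.strip_suffix(".alk.dev")?;
    let (op, namespace) = rest.split_once('.')?;
    // Reject empty labels and names with more than two labels.
    if namespace.is_empty() || op.is_empty() || namespace.contains('.') {
        return None;
    }
    Some((namespace, op))
}
```

Both functions return the pair in the same order, so an adapter can hand the result straight to `context.env.invoke(namespace, op, input)`.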
|
||||
|
||||
### Deployment Topologies
|
||||
|
||||
**Current (Phase 1, single node, CLI)**: This is what exists and ships today.
|
||||
Auth uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>`. Config
|
||||
uses `ConfigServiceImpl` backed by `ArcSwap<DynamicConfig>`. There is no
|
||||
database dependency.
|
||||
|
||||
```
┌──────────────────────────────────────────────┐
│ Single Process                               │
│ ConfigIdentityProvider (ArcSwap)             │
│ ConfigServiceImpl (ArcSwap)                  │
│ alknet-core Server                           │
└──────────────────────────────────────────────┘
```
|
||||
|
||||
The irpc service layer (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`,
|
||||
`StorageProtocol`) and the application services (DockerService, NodeService,
|
||||
WalletService, agent services) are downstream concerns that will be built in
|
||||
later phases. The architecture defines the contracts (`IdentityProvider` trait,
|
||||
`OperationEnv`, service protocol enums) so that implementations can plug in
|
||||
without modifying core, but the implementations don't exist yet.
|
||||
|
||||
**Future (multi-node, production)**: Auth and secrets on dedicated nodes;
|
||||
workers access them remotely via irpc over QUIC. StorageIdentityProvider
|
||||
backed by SQLite replaces ConfigIdentityProvider for auth.
|
||||
|
||||
```
Auth Node (SQLite)        Secret Node (seed in RAM)
       ↑                          ↑
       │ QUIC (irpc)              │ QUIC (irpc)
       │                          │
Head Node (Config, Storage, alknet-core Server)
       │
       │ SSH / iroh / TLS
       │
Worker Node (alknet-core Client)
```
|
||||
|
||||
This topology requires alknet-storage, alknet-secret, and the irpc service
|
||||
layer to be built — they are Phase 2+ concerns.
|
||||
|
||||
## Constraints
|
||||
|
||||
- Services are **internal** — they run within a node or cluster.
|
||||
- The call protocol is **external** — it's how nodes talk to each other.
|
||||
- Per ADR-032, domain events (Honker streams) stay within the owning service.
  irpc calls are synchronous request-response within a node or cluster. Call
  protocol `EventEnvelope` is the integration boundary between nodes.
|
||||
- OperationEnv is a hard constraint: the handler-facing API must match the
|
||||
behavioral contract from `@alkdev/operations`. Namespace + operation name →
|
||||
invoke with input, return output.
|
||||
- irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling
|
||||
don't need the service layer overhead.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
|
||||
per tenant)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
|
||||
remote calls? See [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Service crates are independent of core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition mechanism with three dispatch paths |
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../research/services.md) — Service protocol definitions, OperationContext, deployment topologies
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — OperationEnv, three dispatch paths, adapter patterns
|
||||
- [secret-service.md](secret-service.md) — SecretProtocol definition
|
||||
- [identity.md](identity.md) — IdentityProvider, AuthProtocol
|
||||
- [configuration.md](configuration.md) — ConfigProtocol, DynamicConfig reload
|
||||
- [interface.md](interface.md) — Interface layer, auth across interfaces
|
||||
@@ -1,221 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Storage
|
||||
|
||||
> **Phase note**: `alknet-storage` is a future crate (Phase 2+). This spec
|
||||
> defines its contract — the data model, the `IdentityProvider` impl, the
|
||||
> irpc service protocol — so that alknet-core can define the traits
|
||||
> (`IdentityProvider`) that storage will later implement. The crate itself
|
||||
> hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` backed by
|
||||
> `ArcSwap<DynamicConfig>`.
|
||||
|
||||
## What
|
||||
|
||||
The `alknet-storage` crate will provide SQLite-backed graph storage, identity
|
||||
management, access control, and reactivity via honker. It mirrors the
|
||||
TypeScript `@alkdev/storage` package's design while leveraging Rust's type
|
||||
system and honker's built-in pub/sub.
|
||||
|
||||
## Why
|
||||
|
||||
alknet-core needs persistent identity data (authorized keys, accounts, ACLs)
|
||||
and a way to store and query graph-structured data (call graphs, operation
|
||||
graphs, metagraph). But alknet-core cannot take a database dependency. The
|
||||
solution: alknet-storage implements alknet-core's `IdentityProvider` trait,
|
||||
providing SQLite-backed identity resolution without core knowing about SQLite.
|
||||
|
||||
The metagraph (three-level type system: GraphType → NodeType → EdgeType → Graph
|
||||
→ Node → Edge) is the foundation for ACL, flowgraph persistence, and any
|
||||
future graph-structured data.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Crate Structure
|
||||
|
||||
```
alknet-storage/
├── metagraph/   - GraphType, NodeType, EdgeType persistence
├── identity/    - accounts, organizations, peer_credentials, api_keys, audit_logs
├── acl/         - PrincipalNode, DelegatesEdge, access control graph
├── secrets/     - Encrypted node type, encrypt/decrypt bridge
├── honker/      - honker integration: notify, stream, queue
├── graph/       - GraphInstance, Node, Edge CRUD with schema validation
└── schema/      - JSON Schema definitions (serde + jsonschema)
```
|
||||
|
||||
### Metagraph Data Model
|
||||
|
||||
Three-level type system:
|
||||
|
||||
1. **GraphType**: A class of graphs (e.g., "call-graph", "acl",
   "task-dependencies"). Defines structural constraints.
2. **NodeType**: A category of node within a graph type. Each has a JSON Schema
   for attribute validation.
3. **EdgeType**: A category of edge within a graph type. Each has a JSON Schema
   and optional source/target constraints.

Graph instances belong to a graph type and contain nodes and edges conforming
to those type definitions.
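
The relationship between type definitions and instances can be illustrated with a small in-memory model. This sketch covers only node-type membership, per-graph key uniqueness, and edge source/target constraints; JSON Schema validation of attributes is left out, and every struct and method name here is a hypothetical illustration rather than the planned `alknet-storage` API.

```rust
use std::collections::HashMap;

/// An edge category with optional source/target node-type constraints.
struct EdgeType {
    name: String,
    /// `None` means any node type is allowed.
    allowed_source: Option<Vec<String>>,
    allowed_target: Option<Vec<String>>,
}

/// A class of graphs: the node and edge types its instances may use.
struct GraphType {
    node_types: Vec<String>,
    edge_types: Vec<EdgeType>,
}

/// A graph instance bound to its graph type.
struct Graph<'a> {
    graph_type: &'a GraphType,
    /// Node key -> node type name.
    nodes: HashMap<String, String>,
}

impl<'a> Graph<'a> {
    fn new(graph_type: &'a GraphType) -> Self {
        Self { graph_type, nodes: HashMap::new() }
    }

    /// Add a node if its type exists and its key is unique in this graph.
    fn add_node(&mut self, key: &str, node_type: &str) -> Result<(), String> {
        if !self.graph_type.node_types.iter().any(|t| t == node_type) {
            return Err(format!("unknown node type: {node_type}"));
        }
        if self.nodes.contains_key(key) {
            return Err(format!("duplicate node key: {key}"));
        }
        self.nodes.insert(key.to_string(), node_type.to_string());
        Ok(())
    }

    /// Check that an edge of `edge_type` may connect `source` to `target`.
    fn check_edge(&self, edge_type: &str, source: &str, target: &str) -> Result<(), String> {
        let edge = self
            .graph_type
            .edge_types
            .iter()
            .find(|e| e.name == edge_type)
            .ok_or_else(|| format!("unknown edge type: {edge_type}"))?;
        let source_type = self.nodes.get(source).ok_or_else(|| format!("unknown node: {source}"))?;
        let target_type = self.nodes.get(target).ok_or_else(|| format!("unknown node: {target}"))?;
        let permitted = |allowed: &Option<Vec<String>>, node_type: &String| {
            allowed.as_ref().map_or(true, |list| list.contains(node_type))
        };
        if permitted(&edge.allowed_source, source_type)
            && permitted(&edge.allowed_target, target_type)
        {
            Ok(())
        } else {
            Err(format!("{edge_type} does not allow {source_type} -> {target_type}"))
        }
    }
}
```

Validating against the type definitions at write time is what lets higher layers such as the ACL graph trust the shape of stored data.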
|
||||
|
||||
### SQLite Table Schema
|
||||
|
||||
Common columns: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`,
|
||||
`created_at INTEGER TIMESTAMP`, `updated_at INTEGER TIMESTAMP`.
|
||||
|
||||
| Table | Key columns |
|-------|------------|
| `graph_types` | id, name (UNIQUE), config JSON, version, scope |
| `node_types` | id, graph_type_id FK, name, schema JSON |
| `edge_types` | id, graph_type_id FK, name, schema JSON, allowed_source/target types |
| `graphs` | id, graph_type_id FK, name, description, status, owner_id, project_id |
| `nodes` | id, graph_id FK, key (UNIQUE per graph), attributes JSON |
| `edges` | id, graph_id FK, key, source_node_key, target_node_key, attributes JSON, undirected |
|
||||
|
||||
No FK constraints across database files. Referential integrity is enforced at
|
||||
the application layer.
|
||||
|
||||
### System DB vs Tenant DB
|
||||
|
||||
- **System DB** (`system.db`): Identity tables (accounts, organizations,
|
||||
peer_credentials, api_keys, audit_logs) + system-scoped graph types.
|
||||
- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables + tenant-scoped graph
|
||||
types.
|
||||
|
||||
### Identity Tables
|
||||
|
||||
| Table | Key columns |
|-------|------------|
| `accounts` | email (UNIQUE), display_name, access_level (admin/user/service), status |
| `organizations` | name (UNIQUE), slug (UNIQUE), owner_id FK → accounts |
| `organization_members` | org_id FK, account_id FK, membership_level (owner/admin/member) |
| `api_keys` | owner_id FK, key_hash (UNIQUE), name, enabled, expires_at, revoked_at |
| `peer_credentials` | owner_id FK, credential_type (ssh_key/cert_authority), fingerprint (UNIQUE), public_key_data |
| `audit_logs` | action, owner_id FK, credential_id, org_id FK, details JSON |
|
||||
|
||||
### ACL as Metagraph

The ACL graph is a directed, non-multi metagraph:

- **PrincipalNode**: IdentityType (Account, Org, Service, Role) + identity_id + scopes + resources
- **ResourceNode**: The thing being accessed
- **Edges**: can_read, can_write, can_execute, belongs_to, delegates

Delegation edges carry `narrowed_scopes`: the delegate can only exercise scopes that are a subset of the delegator's.

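The subset rule for `narrowed_scopes` can be illustrated with a small check. This is a sketch under stated assumptions: scopes are modelled as plain strings, the function names are hypothetical, and a request that exceeds the delegator's scopes is rejected outright. The source does not say whether the real behavior is to reject or to intersect.

```rust
use std::collections::HashSet;

/// Build a scope set from string literals (helper for the example).
pub fn scopes(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Returns the scopes to store on a delegation edge, or `None` if the
/// requested scopes are not a subset of the delegator's scopes.
pub fn narrow_scopes(
    delegator: &HashSet<String>,
    requested: &HashSet<String>,
) -> Option<HashSet<String>> {
    if requested.is_subset(delegator) {
        Some(requested.clone())
    } else {
        None
    }
}
```

Because the check runs when the edge is created, a chain of delegations can only ever narrow the set: each hop is validated against the scopes of the hop before it.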
### StorageIdentityProvider (Future — Phase 2+)

Implements alknet-core's `IdentityProvider` trait (ADR-029). It is defined here as a contract only; alknet-storage will provide the implementation when it is built. Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap`.

```rust
impl IdentityProvider for StorageIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // 1. Find peer_credentials row by fingerprint
        // 2. Resolve to account → organization membership → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("contract only; implemented when alknet-storage is built")
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // 1. Verify Ed25519 signature against api_keys or peer_credentials
        // 2. Resolve to account → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("contract only; implemented when alknet-storage is built")
    }
}
```

### StorageProtocol irpc Service

```rust
#[rpc_requests(message = StorageMessage)]
enum StorageProtocol {
    #[rpc(tx=oneshot::Sender<Graph>)]
    #[wrap(CreateGraph)]
    CreateGraph { graph_type_id: String, name: String },

    #[rpc(tx=oneshot::Sender<Node>)]
    #[wrap(AddNode)]
    AddNode { graph_id: String, key: String, attributes: Value },

    // ... (full protocol in research/services.md)
}
```

### Honker Integration

| Feature | Use case |
|---------|----------|
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
| `queue` / `claim` / `ack` | Task queue for async operations |

Per ADR-032, honker streams are domain events internal to the storage service. They are projected to call protocol `EventEnvelope` events when crossing service boundaries.

### Encrypted Data

alknet-storage references alknet-secret's `EncryptedData` wire format for storing encrypted nodes (API keys, OAuth tokens). The format (key_version, salt, iv, ciphertext) is shared by type-level compatibility, not a crate dependency. alknet-secret encrypts; alknet-storage stores the blob.

### Crate Dependencies

```toml
[dependencies]
honker = "0.x"
rusqlite = { version = "0.x", features = ["bundled"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
petgraph = "0.x"
irpc = "0.x"
```

Does NOT depend on alknet-core or alknet-secret. Implements alknet-core's `IdentityProvider` trait by conforming to its signature, not by direct crate dependency.

## Constraints

- alknet-storage does NOT depend on alknet-core as a crate. It implements the `IdentityProvider` trait by conforming to the signature. The CLI binary wires them together.
- alknet-storage does NOT depend on alknet-secret. They share the `EncryptedData` wire format by type-level compatibility, not a crate dependency.
- WAL mode for concurrent reads during writes. Single writer per `.db` file.
- JSON Schema validation uses the `jsonschema` crate at runtime (replaces TypeBox from TypeScript).
- Per ADR-032, honker stream events never cross service boundaries without projection to `EventEnvelope`.

## Open Questions

- **OQ-SVC-03**: How does the secret service integrate with the existing `EncryptedDataSchema` from `@alkdev/storage`? See [open-questions.md](open-questions.md).
- **OQ-SVC-04**: Should workers cache derived keys locally? See [open-questions.md](open-questions.md).
- **OQ-SVC-05**: How does the NFT-based ACL smart contract interact with the secret service? See [open-questions.md](open-questions.md).

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-storage is independent of core and secret |
| [029](decisions/029-identity-core-type.md) | Identity as core type | alknet-storage implements IdentityProvider trait |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Honker streams stay internal; projection to EventEnvelope at boundaries |

## References

- [research/storage.md](../research/storage.md): Full metagraph, identity, ACL, honker definitions
- [research/services.md](../research/services.md): StorageProtocol, StorageIdentityProvider
- [research/integration-plan.md](../research/integration-plan.md): Phase 2.2
- [identity.md](identity.md): IdentityProvider trait, Identity struct
- [secret-service.md](secret-service.md): EncryptedData format, derivation paths

@@ -1,152 +0,0 @@
---
status: reviewed
last_updated: 2026-06-02
---

# Transport Layer

## What

The transport layer produces a duplex byte stream (`AsyncRead + AsyncWrite + Unpin + Send`) that the SSH layer consumes via `russh::client::connect_stream()` or `russh::server::run_stream()`. The SSH layer is completely unaware of what transport it runs over.

## Why

Pluggable transports are the core architectural insight. They enable:

- **Simple deployment**: TCP on port 22 for basic use
- **Censorship resistance**: TLS on port 443 looks like HTTPS
- **NAT traversal**: iroh QUIC allows connections without public IPs
- **Composability**: transports can be layered (iroh through SOCKS5 through SSH through TLS)

Without this abstraction, each transport mode would need its own SSH connection logic. With it, there's one SSH implementation and N transport implementations.

## Architecture

### Transport Trait

```rust
// The core abstraction. Each transport produces ONE duplex stream.
// The SSH session runs over this stream for its entire lifetime.

#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Connect to the remote endpoint and return a duplex stream.
    /// For client-side transports.
    async fn connect(&self) -> Result<Self::Stream>;

    /// Return a human-readable description of this transport for logging.
    fn describe(&self) -> String;
}
```

### Server-Side Transport Acceptor

```rust
#[async_trait]
pub trait TransportAcceptor: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Accept an incoming connection and return a duplex stream.
    async fn accept(&self) -> Result<(Self::Stream, TransportInfo)>;
}

/// Metadata about the incoming connection.
pub struct TransportInfo {
    pub remote_addr: Option<SocketAddr>,
    pub transport_kind: TransportKind,
}

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
}
```

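Since `TransportKind` is the per-connection metadata that ends up in logs, one plausible way to render it is a `Display` implementation. The sketch below repeats the enum so that it compiles on its own with only the standard library; the derives and the exact output strings are assumptions chosen for illustration, not a format the project defines.

```rust
use std::fmt;

/// Copy of the enum above, repeated so the example is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
}

impl fmt::Display for TransportKind {
    /// Render a short, log-friendly description of the transport.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransportKind::Tcp => write!(f, "tcp"),
            TransportKind::Tls { server_name: Some(name) } => write!(f, "tls (sni={name})"),
            TransportKind::Tls { server_name: None } => write!(f, "tls (no sni)"),
            TransportKind::Iroh { endpoint_id } => write!(f, "iroh (endpoint={endpoint_id})"),
        }
    }
}
```

Keeping the formatting on the enum means an acceptor loop can log `info.transport_kind` directly without matching on the variant at every call site.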
### Transport Implementations

| Transport | Client | Server | Stream Type |
|-----------|--------|--------|-------------|
| **TcpTransport** | `TcpStream::connect(addr)` | `TcpListener::accept()` | `TcpStream` |
| **TlsTransport** | `TlsStream<TcpStream>` (client TLS) | `TlsStream<TcpStream>` (server TLS) | `tokio_rustls::client::TlsStream<TcpStream>` (client), `tokio_rustls::server::TlsStream<TcpStream>` (server) |
| **IrohTransport** | `endpoint.connect(peer, alpn)` then `conn.open_bi()` then `join(recv, send)` | `endpoint.accept()` then `conn.accept_bi()` then `join(recv, send)` | `tokio::io::Join<RecvStream, SendStream>` |

### Iroh Stream Join

QUIC splits each stream into a separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`), while russh expects a single duplex stream. The two halves are combined using `tokio::io::join(recv_stream, send_stream)`, which produces a `Join<RecvStream, SendStream>` implementing both traits.

See ADR-003 for the decision to use `tokio::io::join` over a custom wrapper.

### iroh Relay Configuration

By default, iroh transport uses n0's free relay servers (`https://relay.iroh.network/`). This provides zero-config NAT traversal for testing and development. For production deployments, users override with `--iroh-relay <url>` to point to a self-hosted relay.

The relay URL is passed to iroh's `Endpoint::builder()` configuration. Self-hosted relay setup is documented in the project wiki.

See ADR-009 for the decision to default to n0's relay with override.

### Transport Chaining

Transports can be nested. The CLI supports `--transport iroh --proxy socks5://...` natively (ADR-010):

```bash
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
```

This routes iroh's outbound TCP connections through the specified SOCKS5 proxy. The iroh transport supports SOCKS5 and HTTP proxy configuration for its outbound connections; the proxy URL is applied during transport initialization.

For other combinations:

- TCP + TLS is already implicit (TLS wraps TCP in `TlsTransport`)
- TLS + SOCKS5 proxy is also supported via `--proxy` with `--transport tls`

**Note**: `--proxy` has different semantics on the client vs the server (ADR-019):

- **Client**: `--proxy` routes the *transport connection* through the proxy (e.g., iroh endpoint → SOCKS5 → iroh relay)
- **Server**: `--proxy` routes *outbound target connections* through the proxy (e.g., SSH channel request → SOCKS5 → target host)

### Connection Lifecycle

```
Client                                          Server
  │                                               │
  │  transport.connect()                          │  transport_acceptor.accept()
  │ ─────────────────────────────────────────────▶│
  │  (duplex byte stream established)             │
  │                                               │
  │  russh::client::connect_stream(config,        │  russh::server::run_stream(config,
  │    stream, handler)                           │    stream, handler)
  │                                               │
  │ ═══════ SSH session over stream ═════════════ │
  │ ═════════════════════════════════════════════ │
  │                                               │
  │  channel_open_direct_tcpip(host, port, ...)   │
  │ ─────────────────────────────────────────────▶│
  │                                               │
  │  ┌─────── TCP proxy ──────────────────┐       │
  │  │ SSH channel ←→ TcpStream::connect  │       │
  │  └────────────────────────────────────┘       │
```

## Constraints

- SSH sees only the stream. It never opens its own TCP connections. (ADR-004)
- Each transport produces exactly one stream per SSH session. Multiple sessions need multiple `connect()` calls.
- The iroh transport reuses a single `Endpoint` across multiple sessions (one QUIC connection per peer, multiple `open_bi()` streams). The endpoint is created once and shared.
- TLS transport requires certificate configuration on the server side. The client can accept any certificate (self-signed) or verify against a CA. Server-side ACME is supported (ADR-008).

## Open Questions

None — all resolved.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces stream, SSH consumes it |
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join` combines QUIC halves |
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches TCP/iroh/TLS directly |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |

@@ -1,28 +0,0 @@
---
status: deprecated
last_updated: 2026-06-01
---

# TUN Shim (Deprecated)

> **Note**: TUN functionality has been deferred from the alknet project. For VPN-like "route all traffic" behavior, use `tun2proxy` alongside alknet's SOCKS5 proxy. See ADR-014 for the rationale.

## What Changed

The separate `alknet-tun` process and all TUN-related code are out of scope. The recommended approach for VPN-like behavior is:

```bash
# Terminal 1: alknet SOCKS5 proxy (no root required)
alknet connect --server example.com --identity ~/.ssh/id_ed25519

# Terminal 2: tun2proxy routes all traffic through alknet's SOCKS5
sudo tun2proxy --proxy socks5://127.0.0.1:1080
```

This keeps the core alknet binary free of TUN complexity and leverages an existing, well-tested tool for TUN-to-SOCKS5 bridging.

## References

- [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md): decision to defer TUN
- [ADR-005](decisions/005-socks5-before-tun.md): SOCKS5 is still the primary interface
- [tun2proxy](https://github.com/tun2proxy/tun2proxy): recommended external tool for TUN support