greenfield: clean slate for ALPN-as-service pivot
Delete old source crates (alknet-core, alknet, alknet-napi), old architecture docs (ADRs, specs, open questions), old research docs (phase2, event-sourcing, feasibility, etc.), old tasks, and obsolete reference material (gitserver/MPL, honker, nats, rustfs, polyglot, keystone, distributed-identity). Keep: alknet-secret (standalone, compiles), pivot docs, iroh and ssh references, rudolfs reference (MIT/Apache, fork candidate), ops docs, sdd_process.md, and licenses. Previous implementation preserved at /workspace/@alkdev/alknet-main/ for reference during porting. Workspace compiles: cargo check + 14 tests pass for alknet-secret.
@@ -1,122 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Alknet Architecture

## Current State

Architecture spec sync in progress. Phase 0 foundation complete (ADRs 001–037).
Phase 1 core modifications partially implemented (interface trait, config split,
identity provider, forwarding policy). Phase 2 core bridge research complete;
spec documents updated to reflect StreamInterface/MessageInterface split,
CredentialProvider as core type, and API keys in DynamicConfig.

Remaining open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
OQ-20 (worker registration), OQ-CP-01 (per-identity credentials), OQ-CP-02
(OIDC provider location), OQ-CP-03 (credential rotation). See
[open-questions.md](open-questions.md).

## Architecture Documents

| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | reviewed | Package purpose, crate structure, three-layer model, exports, dependencies |
| [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh |
| [auth.md](auth.md) | draft | Unified auth: SSH + token + API keys, credential presentation per interface |
| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, OperationEnv, three dispatch paths |
| [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding |
| [server.md](server.md) | reviewed | Server acceptance, IdentityProvider, ForwardingPolicy, channel handling |
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper, reload API, pubsub event target adapter |
| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows |
| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths |
| [interface.md](interface.md) | draft | StreamInterface, MessageInterface, credential presentation, ListenerConfig |
| [configuration.md](configuration.md) | draft | StaticConfig, DynamicConfig, API keys, forwarding policy, reload |
| [storage.md](storage.md) | draft | alknet-storage: metagraph, identity, ACL, honker |
| [flowgraph.md](flowgraph.md) | draft | alknet-flowgraph: call graph, operation graph, petgraph |
| [secret-service.md](secret-service.md) | reviewed | alknet-secret: BIP39, SLIP-0010, AES-GCM, SecretProtocol |
| [credentials.md](credentials.md) | draft | CredentialProvider, CredentialSet (outbound auth) |
| [definitions.md](definitions.md) | draft | Terminology disambiguation and concept mapping |

## Research Documents

| Document | Status | Description |
|----------|--------|-------------|
| [configuration.md](../research/configuration.md) | draft | Configuration architecture (source for promoted spec) |
| [core.md](../research/core.md) | draft | Core overview, transport, call protocol, DNS |
| [services.md](../research/services.md) | draft | irpc service protocols, OperationContext, application services |
| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker |
| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping |
| [integration-plan.md](../research/integration-plan.md) | draft | Phased integration plan for services, pubsub, and operations |
| [feasibility/](../research/feasibility/) | — | SSH tunnel feasibility assessment and related analyses |
| [event-sourcing/](../research/event-sourcing/) | — | Event sourcing patterns and event-driven architecture reference |
| [ops/](../research/ops/) | — | Production ops reference: certbot, fail2ban |
| [phase2/definitions.md](../research/phase2/definitions.md) | draft | Terminology disambiguation (promoted to architecture/definitions.md) |
| [phase2/interface-model.md](../research/phase2/interface-model.md) | draft | StreamInterface/MessageInterface analysis (promoted to interface.md) |
| [phase2/credential-provider.md](../research/phase2/credential-provider.md) | draft | CredentialProvider research (promoted to credentials.md) |
| [phase2/tls-transport.md](../research/phase2/tls-transport.md) | draft | HTTP interface, stealth handoff, ListenerConfig (promoted to interface.md, auth.md) |

## ADR Table

| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-pluggable-transport.md) | Pluggable transport via `AsyncRead+AsyncWrite` trait | Accepted |
| [002](decisions/002-tun-separate-process.md) | TUN shim as separate process | Superseded by ADR-014 |
| [003](decisions/003-iroh-stream-join.md) | iroh stream via `tokio::io::join` | Accepted |
| [004](decisions/004-ssh-over-transport.md) | SSH runs over transport, not alongside | Accepted |
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 as primary interface, TUN as add-on | Accepted |
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Accepted |
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | Accepted |
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt certificate provisioning | Accepted |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay with override | Accepted |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining in CLI | Accepted |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API, no file-based config | Accepted |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 keys + OpenSSH cert-authority, no password auth | Accepted |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging + built-in rate limiting | Accepted |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN, recommend local SOCKS5 + tun2proxy | Accepted |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI bridge | Accepted |
| [016](decisions/016-napi-expose-connect-and-serve.md) | NAPI exposes both connect() and serve() | Accepted |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode — protocol multiplexing on port 443 | Accepted |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub over SSH | Accepted |
| [019](decisions/019-proxy-dual-semantics.md) | `--proxy` dual semantics (client vs server) | Accepted |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted |
| [026](decisions/026-transport-interface-separation.md) | Transport/interface separation (three-layer model) | Accepted |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition (core, secret, storage, flowgraph) | Accepted |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service behind feature flag | Accepted |
| [029](decisions/029-identity-core-type.md) | Identity as core type in alknet-core | Accepted |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split with ArcSwap | Accepted |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy with rule-based allow/deny | Accepted |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary discipline (domain, irpc, call protocol) | Accepted |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted |
| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface split | Accepted |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type (outbound auth) | Accepted |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys as DynamicConfig auth | Accepted |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security (zeroize for v1) | Accepted |

> ADR numbers 020–022 were allocated to proposals that were withdrawn before
> acceptance and are not listed.

## Open Questions

See [open-questions.md](open-questions.md) for all open and resolved questions.
Key resolved questions from Phase 0: OQ-12, OQ-16, OQ-18 (forwarding policy
and identity scopes), OQ-17 (transport-aware auth), OQ-23 (irpc feature flag),
OQ-24 (DNS control channel scope), OQ-25 (crate irpc dependencies), OQ-IF-01
(Interface session / EventEnvelope relationship), OQ-IF-02 (ForwardingPolicy
placement). Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport
TLS), OQ-20 (worker registration).

## Lifecycle Definitions

| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions resolved |
| `reviewed` | Architecture final. Implementation may begin. Changes require review. | → `stable` when implementation is complete and verified |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
@@ -1,339 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Authentication

## What

A unified authentication layer that works across all transports —
SSH-over-any-transport and WebTransport (non-SSH HTTP-level transports). The
same key material (Ed25519 authorized keys and certificate authorities) is
shared across both auth paths. Identity resolution produces a
transport-agnostic `Identity` that carries scopes and resources for downstream
authorization.

## Why

Alknet currently authenticates connections exclusively through SSH public key
auth. Non-SSH transports (WebTransport) cannot perform SSH key exchange — they
need a different auth presentation that shares the same key material. The
unified auth layer ensures one key set, one identity, one rotation mechanism
across all transports. See ADR-023 for the decision context.

The canonical definitions of `Identity` and `IdentityProvider` are in
[identity.md](identity.md). This document covers auth-specific behavior:
auth presentation per transport, `AuthPolicy` structure, and the auth service
relationship.

## Architecture

### Identity and IdentityProvider

See [identity.md](identity.md) for the canonical definitions of:
- `Identity` struct (`{ id, scopes, resources }`)
- `IdentityProvider` trait (`resolve_from_fingerprint()`, `resolve_from_token()`)
- `ConfigIdentityProvider` (default, ArcSwap-backed)
- `StorageIdentityProvider` (production, SQLite-backed, in alknet-storage)
- `AuthProtocol` irpc service (behind `irpc` feature flag)

The key relationship: `IdentityProvider` is the contract. `ConfigIdentityProvider`
is the default implementation (reads from `DynamicConfig.auth`). `AuthProtocol`
irpc service is one way to satisfy the trait, behind a feature flag. Both paths
produce the same `Identity` result. See ADR-028 and ADR-029.

### Credential Presentation Per Interface

Each (Transport, Interface) pair presents credentials differently, but all
resolve to the same `Identity` through `IdentityProvider`. See
[definitions.md](definitions.md) for the full terminology rules.

| (Transport, Interface) | Credential presentation | Resolves via |
|------------------------|------------------------|-------------|
| (TLS, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TCP, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (iroh, SshInterface) | SSH public key handshake | `resolve_from_fingerprint()` |
| (TLS, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, RawFramingInterface) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, RawFramingInterface) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HttpInterface) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DnsInterface) | AuthToken in query labels | `resolve_from_token()` |

The **key material is shared**. The **credential presentation** differs per
(Transport, Interface) pair. The **verification result is the same**: an
authenticated `Identity` with scopes.

`resolve_from_token()` handles both AuthTokens (Ed25519-signed) and API keys
(hash-verified bearer tokens). The implementation discriminates by prefix or
format — see ADR-037.
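
The table above reduces to a two-way choice per interface. The following is a minimal sketch of that mapping, assuming hypothetical `Interface` and `ResolvePath` enums that exist only for illustration; the real code calls the corresponding `IdentityProvider` methods directly.

```rust
/// Hypothetical stand-in for the interface kinds named in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Interface {
    Ssh,
    RawFraming,
    Http,
    Dns,
}

/// Hypothetical tag for which `IdentityProvider` method applies.
#[derive(Debug, PartialEq)]
pub enum ResolvePath {
    /// Corresponds to `resolve_from_fingerprint()`.
    Fingerprint,
    /// Corresponds to `resolve_from_token()`.
    Token,
}

/// Mirrors the "Resolves via" column: only SSH uses the fingerprint path.
pub fn resolve_path(interface: Interface) -> ResolvePath {
    match interface {
        Interface::Ssh => ResolvePath::Fingerprint,
        Interface::RawFraming | Interface::Http | Interface::Dns => ResolvePath::Token,
    }
}
```

Note that the transport does not appear in the function signature: per the table, the resolution path depends on the interface alone.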

### Token Authentication

For non-SSH transports, the client constructs an authentication token:

```
AuthToken = base64url(key_id || timestamp || signature)

key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```
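
The layout above is fixed-width, so the decoded token can be split by offset. The sketch below assumes the bytes have already been base64url-decoded and that the signature is the standard 64-byte Ed25519 signature; `RawAuthToken` and `split_auth_token` are hypothetical names, and no signature verification is performed here.

```rust
use std::convert::TryInto;

/// Field widths taken from the token layout above.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
/// Assumption: a standard Ed25519 signature, which is 64 bytes.
const SIGNATURE_LEN: usize = 64;
const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Hypothetical view over a decoded token (not an alknet-core type).
#[derive(Debug, PartialEq)]
pub struct RawAuthToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

/// Splits decoded token bytes into key_id, timestamp, and signature.
/// Returns `None` unless the input is exactly 104 bytes long.
pub fn split_auth_token(bytes: &[u8]) -> Option<RawAuthToken> {
    if bytes.len() != TOKEN_LEN {
        return None;
    }
    let (key_id, rest) = bytes.split_at(KEY_ID_LEN);
    let (timestamp, signature) = rest.split_at(TIMESTAMP_LEN);
    Some(RawAuthToken {
        key_id: key_id.try_into().ok()?,
        // The timestamp is big-endian, as stated in the layout.
        timestamp: u64::from_be_bytes(timestamp.try_into().ok()?),
        signature: signature.try_into().ok()?,
    })
}
```

Rejecting any length other than 104 bytes up front keeps the later lookup and signature steps from ever seeing a truncated token.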

Wire format when passed in a WebTransport CONNECT request:
```
CONNECT https://server:443/alknet?token=<AuthToken>
```

Server verification:

1. Base64url-decode the token
2. Extract `key_id` (first 32 bytes)
3. Look up `key_id` in the same `authorized_keys` set that SSH auth uses
4. Verify the Ed25519 `signature` against `(key_id || timestamp_bytes)` using
   the matching public key
5. Check `timestamp` is within the acceptable window (configurable, default
   ±300 seconds)
6. Resolve to the same `Identity` that SSH pubkey auth would produce

The key fingerprint in the token serves double duty: it identifies which key
to verify against, and it ties the signature to a specific key (swapping
`key_id` invalidates the signature).

### Replay Protection

V1 uses timestamp-only (±300s window, no server state). The replay trade-offs
and future zero-replay options (nonce challenge-response) are documented in
ADR-023.
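
The ±300s window is a symmetric check around the server clock. As a minimal sketch (the function names are hypothetical, and the window is passed in because the text describes it as configurable):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Returns true when the token timestamp lies within `window` of `now`,
/// in either direction (clock skew can put the token in the future).
pub fn timestamp_is_fresh(token_ts: u64, now: u64, window: Duration) -> bool {
    now.abs_diff(token_ts) <= window.as_secs()
}

/// Current Unix time in seconds; falls back to 0 if the clock is before 1970.
pub fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

Because no server state is kept, a captured token stays replayable until it ages out of the window; this check only bounds that exposure.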

### IdentityProvider and Auth Service Relationship

The `IdentityProvider` trait (defined in [identity.md](identity.md)) decouples
alknet-core from any specific identity storage. Two implementations exist:

- **ConfigIdentityProvider** (in alknet-core) — reads from
  `ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
  No database required. This is the default for minimal deployments.

- **StorageIdentityProvider** (in alknet-storage) — backed by SQLite
  `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
  fingerprint → account → organization membership → effective scopes.

The `AuthProtocol` irpc service (behind the `irpc` feature flag, per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it. Both the trait path and the
irpc path produce the same `Identity` result.

The trait is the contract. The backing store is pluggable. Alknet-core never
depends on Honker, SQLite, or any specific database.

### API Keys

For service accounts, automation, and HTTP interface auth, Ed25519 AuthTokens
are inconvenient — they require client-side key generation and signing. API keys
provide a simpler bearer token format (ADR-037):

```
API key:  "alk_dGhlX3NlY3JldA" (~20 chars, configurable prefix)
Storage:  SHA-256 hash of the full key
Lookup:   prefix match → hash verification → Identity
```

API keys are configured in `DynamicConfig.auth.api_keys`:

```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect"]
description = "dashboard service account"
ttl = "30d" # optional
```

Both AuthTokens and API keys go through `IdentityProvider::resolve_from_token()`.
The implementation discriminates by prefix (default `alk_`): if the token starts
with the API key prefix, it's verified by SHA-256 hash lookup; otherwise, it's
verified as an Ed25519 AuthToken. Both paths produce the same `Identity`.
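
The prefix discrimination described above can be sketched as a small classifier. `TokenKind` and `classify_token` are hypothetical names, and the verification steps themselves (hash lookup, signature check) are left out:

```rust
/// Hypothetical result of inspecting a presented bearer string.
#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    /// Starts with the configured API key prefix; verify by SHA-256 hash lookup.
    ApiKey(&'a str),
    /// Anything else; verify as a base64url-encoded Ed25519 AuthToken.
    AuthToken(&'a str),
}

/// Classifies a token by prefix only. An empty prefix never matches,
/// so a misconfigured entry cannot capture every AuthToken.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```

The empty-prefix guard is an assumption of this sketch, not something the source specifies.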

See [configuration.md](configuration.md) for the full `DynamicConfig.auth`
structure and ADR-037 for the decision context.

### AuthPolicy Structure

`AuthPolicy` in `DynamicConfig` holds all auth paths, sharing key material:

```rust
pub struct AuthPolicy {
    pub ssh: SshAuthConfig,
    pub token: TokenAuthConfig,
    pub api_keys: Vec<ApiKeyEntry>,
}

pub struct SshAuthConfig {
    pub authorized_keys: HashSet<PublicKey>,
    pub cert_authorities: Vec<CertAuthorityEntry>,
    // Existing fields from current ServerAuthConfig
}

pub struct TokenAuthConfig {
    pub enabled: bool,
    pub max_token_age: Duration, // Timestamp window (default: 300s)
    pub key_source: TokenKeySource,
}

pub enum TokenKeySource {
    /// Share the same authorized_keys set with SshAuthConfig.
    /// Default and recommended for v1.
    Shared,
    /// Separate key set for non-SSH transports.
    /// For deployments that want distinct access control per transport.
    Separate(HashSet<PublicKey>),
}

pub struct ApiKeyEntry {
    pub prefix: String,              // e.g., "alk_"
    pub hash: String,                // e.g., "sha256:abc..."
    pub scopes: Vec<String>,         // e.g., ["relay:connect", "secrets:derive"]
    pub description: Option<String>, // e.g., "dashboard service account"
    pub expires_at: Option<u64>,     // Unix timestamp, optional TTL
}
```

When `TokenKeySource::Shared` (the default), adding a key to
`authorized_keys` immediately grants access via both SSH and WebTransport.
One key set, one `reloadAuth()` call, one rotation.
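
To make the `Shared` versus `Separate` behavior concrete, here is a self-contained sketch of selecting the key set used for token verification. `PublicKey` is replaced by a 32-byte array stand-in, and `token_keys` is a hypothetical helper, not part of the documented API.

```rust
use std::collections::HashSet;

/// Stand-in for the real public key type (assumption for this sketch).
pub type PublicKey = [u8; 32];

/// Simplified copy of the enum shown above.
pub enum TokenKeySource {
    Shared,
    Separate(HashSet<PublicKey>),
}

/// Returns the key set that token auth should verify against.
/// With `Shared`, this is the SSH `authorized_keys` set itself, so a key
/// added there is visible to token auth without further configuration.
pub fn token_keys<'a>(
    ssh_authorized_keys: &'a HashSet<PublicKey>,
    source: &'a TokenKeySource,
) -> &'a HashSet<PublicKey> {
    match source {
        TokenKeySource::Shared => ssh_authorized_keys,
        TokenKeySource::Separate(keys) => keys,
    }
}
```

Returning a reference rather than a copy is what makes "one key set, one rotation" hold in this sketch: there is no second collection to keep in sync.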

### Auth Flow in the Server

**SSH transport (existing, unchanged):**
```
Client connects → SSH handshake → auth_publickey() callback
  → ServerAuthConfig::authenticate_publickey() or authenticate_certificate()
  → Auth::Accept or Auth::Reject
```

**WebTransport transport (new):**
```
Browser connects → WebTransport CONNECT request
  → SessionRequest inspection: extract token from URL path or header
  → TokenAuthConfig verification: decode token → lookup key_id → verify signature → check timestamp
  → session_request.accept() or session_request.forbidden()
```

After auth, both paths produce an `Identity`. The `Identity` is attached to the
connection and used by `ForwardingPolicy` and the call protocol to make
authorization decisions.

### WebTransport SessionRequest Inspection

The wtransport library's `SessionRequest` provides:

- `path()` — URL path (e.g., `/alknet?token=...`)
- `headers()` — HTTP headers (for `Authorization: Bearer ...`)
- `origin()` — Browser origin (for CORS-like restrictions)
- `remote_address()` — Client UDP address

Token extraction from URL path is preferred for browser WebTransport because
the W3C API (`new WebTransport(url)`) naturally includes query parameters. For
native clients (Deno, CLI), the `Authorization` header is also supported.

### Browser-Side Token Construction

```javascript
// Illustrative — see client SDK for production implementation
async function createAuthToken(keyPair) {
  const publicKey = await crypto.subtle.exportKey('raw', keyPair.publicKey);
  const keyId = new Uint8Array(await crypto.subtle.digest('SHA-256', publicKey));

  const timestamp = new ArrayBuffer(8);
  new DataView(timestamp).setBigUint64(0, BigInt(Math.floor(Date.now() / 1000)));

  const message = new Uint8Array([...keyId, ...new Uint8Array(timestamp)]);
  const signature = await crypto.subtle.sign('Ed25519', keyPair.privateKey, message);

  const token = new Uint8Array([...keyId, ...new Uint8Array(timestamp), ...new Uint8Array(signature)]);
  return btoa(String.fromCharCode(...token))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```

Current browsers support Ed25519 key generation and signing via `SubtleCrypto`
(Chrome 137+, Firefox 129+, Safari 17+). Deno supports it natively. No external
dependencies needed.

## Constraints

- Auth tokens are Ed25519-signed with the same key pair used for SSH auth. No
  separate key management for non-SSH transports.
- `IdentityProvider` is the only interface between alknet-core and identity
  storage. No database dependency at the core level.
- The SSH auth path is unchanged. `auth_publickey()` continues to work exactly
  as it does today. Token auth is additive.
- Certificate authority tokens are not supported for token auth in v1. CA
  verification requires the full OpenSSH certificate structure, which doesn't
  fit in a simple signed timestamp. This can be added later if needed.
- Token auth is only available on transports that carry HTTP metadata (URL
  path, headers). SSH-over-TCP/TLS/iroh continues to use SSH native auth
  exclusively.
- API keys are bearer tokens — anyone who obtains the key has the associated
  permissions. The hash storage and optional TTL mitigate but do not eliminate
  this risk. Ed25519 AuthTokens remain the preferred auth method for interactive
  clients. See ADR-037.
- API keys are verified by SHA-256 hash lookup in `DynamicConfig.auth.api_keys`
  (or the `api_keys` database table in production). The full key is provided to
  the client exactly once at creation time.

### Security Considerations

**Token in URL**: The auth token is passed as a URL query parameter
(`?token=...`) for browser WebTransport compatibility. This is a known web
security consideration:

- **Server logs**: The token may appear in HTTP access logs. Servers MUST
  strip or redact the `token` query parameter before logging the request URL.
- **Browser history**: The token may appear in browser history. Timestamps
  limit exposure to the token window (±300s).
- **Referrer headers**: WebTransport does not send referrer headers, so the
  token does not leak via HTTP Referer.
- **Native clients**: Deno and native clients SHOULD prefer the
  `Authorization: Bearer` header over URL parameters when the client supports
  custom headers.
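
The log-redaction requirement above can be met with a small string transform applied before the request URL reaches the logger. This is a sketch under stated assumptions: `redact_token` is a hypothetical helper, the parameter is named exactly `token`, and the input is a path-plus-query string without a fragment.

```rust
/// Replaces the value of every `token` query parameter with a placeholder.
/// Other parameters and the path are returned unchanged.
pub fn redact_token(url: &str) -> String {
    let (base, query) = match url.split_once('?') {
        Some(parts) => parts,
        // No query string, so there is nothing to redact.
        None => return url.to_string(),
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("token", _)) => "token=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{}?{}", base, redacted.join("&"))
}
```

A production version would more likely parse the URL with a dedicated URL library; the plain string split here only keeps the sketch dependency-free.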

## Open Questions

- **OQ-18**: ~~Source of Identity.scopes~~ Resolved per ADR-029 and ADR-031.
  `IdentityProvider` owns scopes, `ForwardingPolicy` uses scopes from `Identity`.
  See [open-questions.md](open-questions.md).

- **OQ-19**: Should the WebTransport listener require its own TLS identity
  (separate from the SSH-over-TLS listener), or can they share the same
  certificate? Deferred to Phase 4. See [open-questions.md](open-questions.md).

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag; IdentityProvider is the contract |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Credential presentation differs per (Transport, Interface) pair |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |

## Phase 2 Implementation Notes

- `ConfigIdentityProvider::resolve_from_token()` now handles API keys (`alk_` prefix) via SHA-256 hash verification with expiry checking
- `ApiKeyEntry` struct added to `AuthPolicy` with prefix, hash, scopes, description, expires_at fields
- API keys produce `Identity { id: prefix, scopes: from_entry, resources: {} }`
- Both AuthTokens (Ed25519 signed) and API keys (hash-verified bearer) go through `resolve_from_token()`, discriminated by format/prefix
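
The expiry check mentioned in the first note operates on the optional `expires_at` field. A minimal sketch follows; `api_key_is_live` is a hypothetical name, and treating the key as expired at the exact `expires_at` second is an assumption, since the source does not state the boundary behavior.

```rust
/// Decides whether an API key entry is still usable at `now_unix`.
/// `None` means the entry has no TTL and never expires.
pub fn api_key_is_live(expires_at: Option<u64>, now_unix: u64) -> bool {
    match expires_at {
        None => true,
        // Assumption: the key stops working at the deadline itself.
        Some(deadline) => now_unix < deadline,
    }
}
```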

## References

- [identity.md](identity.md) — Canonical Identity and IdentityProvider definitions
- [server.md](server.md) — Current SSH auth handler
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](configuration.md) — DynamicConfig, AuthPolicy, ConfigReloadHandle
- [interface.md](interface.md) — Credential presentation per (Transport, Interface) pair
- [definitions.md](definitions.md) — Terminology disambiguation (IdentityProvider vs CredentialProvider, AuthToken vs API key)
- [services.md](services.md) — AuthProtocol irpc service
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18 (resolved), OQ-19
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API
@@ -1,551 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Call Protocol

## What

A bidirectional, transport-agnostic call and event protocol that runs over
authenticated pipes. It supports request/response calls, streaming
subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, head/worker) register their own operations without modifying core.

OperationEnv extends the call protocol with a universal composition mechanism
that unifies local dispatch, irpc service dispatch, and remote dispatch. A
handler receives `context.env.invoke(namespace, op, input)` and doesn't know
whether the operation runs locally, in-cluster, or on a remote node.

## Why

The current control channel (ADR-018) is unidirectional (client → server) and
provides fire-and-forget event dispatch without request/response semantics.
The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the head/worker model where
workers expose operations the head invokes.

Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away — one handler-facing API, three dispatch backends (ADR-033).

## Architecture

### Operation Paths

Operation names use slash-based paths aligned with URL routing conventions:

```
/{node}/{service}/{op}
```

- **node** — identity prefix of the node that exposes the operation. The head
  uses this segment to route calls to the correct connected node.
- **service** — the logical service namespace. Groups related operations
  under one handler prefix.
- **op** — the specific operation within that service.

Examples:

| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Node `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Node `dev1`, service `bash`, operation `exec` |
| `/head/agent/chat` | Head's own `agent` service, operation `chat` |
| `/head/sessions/list` | Head's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Worker `browser-1`, `notify` service |

This three-level routing mirrors iroh's ALPN dispatch: the first segment
routes to a connected node (like ALPN routes to a protocol handler), the
remaining path dispatches within that node's registry. See ADR-025 for the
handler/spec separation decision.

The `namespace` field on `OperationSpec` is derived from the path (`namespace`
= second path segment). It's a convenience accessor for ACL matching and
service grouping.
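
The path convention and the derived `namespace` can be illustrated with a small parser. `OperationPath` is a hypothetical type for this sketch, and requiring exactly three non-empty segments is an assumption; the source does not say whether `op` may itself contain slashes.

```rust
/// Hypothetical parsed form of `/{node}/{service}/{op}`.
#[derive(Debug, PartialEq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

impl<'a> OperationPath<'a> {
    /// Parses a path with a leading slash and exactly three non-empty segments.
    pub fn parse(path: &'a str) -> Option<Self> {
        let mut segments = path.strip_prefix('/')?.split('/');
        let node = segments.next()?;
        let service = segments.next()?;
        let op = segments.next()?;
        let has_extra = segments.next().is_some();
        if has_extra || node.is_empty() || service.is_empty() || op.is_empty() {
            return None;
        }
        Some(OperationPath { node, service, op })
    }

    /// The namespace is the second path segment, as described above.
    pub fn namespace(&self) -> &'a str {
        self.service
    }
}
```

In this sketch the head would route on `node` and the receiving node would dispatch on `service` and `op`, matching the two-stage routing described above.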

### Wire Format: EventEnvelope

Every message on the wire is a length-prefixed JSON `EventEnvelope`:

```rust
pub struct EventEnvelope {
    pub r#type: String, // Event type (e.g., "call.requested", "call.responded")
    pub id: String,     // Correlation key (requestId, topic, or "" for broadcasts)
    pub payload: Value, // JSON payload — schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
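
The framing rule in the comment above can be sketched with the standard library alone. `write_frame` and `read_frame` are hypothetical helpers; the sketch assumes the length prefix counts only the JSON body (not the four prefix bytes) and treats the body as an opaque string rather than parsing it.

```rust
use std::convert::TryFrom;
use std::io::{self, Read, Write};

/// Writes one frame: 4-byte big-endian body length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(writer: &mut W, json: &str) -> io::Result<()> {
    let len = u32::try_from(json.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json.as_bytes())
}

/// Reads one frame and returns its body as a string.
/// A real reader should also cap `len` to avoid unbounded allocation.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both functions are generic over `Read` and `Write`, the same framing applies to any byte stream, which fits the transport-agnostic intent of the envelope.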
|
||||
|
||||
This is the same format used by `@alkdev/pubsub` adapters. It is JSON because
|
||||
it must be consumable from JavaScript, Python, and any language. The envelope
|
||||
is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh
|
||||
bidirectional streams, WebSocket, or Worker postMessage.
|
||||
|
||||
Binary payloads (postcard, protobuf, etc.) are base64-encoded in the `payload`
|
||||
field. The envelope itself stays JSON for cross-language compatibility.
|
||||
|
||||
### Call Protocol Events
|
||||
|
||||
Five event types carry request/response and subscription semantics:
|
||||
|
||||
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
|
||||
|
||||
**`call.error` payload**:
|
||||
```json
{
  "code": "string",
  "message": "string",
  "retryable": false
}
```
|
||||
|
||||
**A call is just a subscribe that resolves after one event.** Both `call()` and
|
||||
`subscribe()` send the same `call.requested` event. The difference is
|
||||
consumption pattern:
|
||||
|
||||
- **`call()`**: Sends `call.requested`, resolves `Promise` on first `call.responded`
|
||||
- **`subscribe()`**: Sends `call.requested`, yields each `call.responded` until `call.completed` or `call.aborted`
|
||||
|
||||
The `id` field carries the `requestId` for correlation.
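
The difference in consumption pattern can be illustrated with a standard-library channel standing in for the event stream of one `requestId`. This is only a sketch: `CallEvent`, `consume_call`, and `consume_subscription` are hypothetical names, payloads are plain strings instead of JSON values, and `call.error` is left out for brevity.

```rust
use std::sync::mpsc::Receiver;

/// Events a caller can observe for one `requestId` (simplified).
pub enum CallEvent {
    Responded(String), // JSON payload kept as a string for brevity
    Completed,
    Aborted,
}

/// `call()` semantics: resolve on the first `call.responded`.
pub fn consume_call(rx: &Receiver<CallEvent>) -> Option<String> {
    match rx.recv().ok()? {
        CallEvent::Responded(payload) => Some(payload),
        CallEvent::Completed | CallEvent::Aborted => None,
    }
}

/// `subscribe()` semantics: collect every `call.responded` until the
/// stream ends with `call.completed` or `call.aborted`.
pub fn consume_subscription(rx: &Receiver<CallEvent>) -> Vec<String> {
    let mut items = Vec::new();
    while let Ok(event) = rx.recv() {
        match event {
            CallEvent::Responded(payload) => items.push(payload),
            CallEvent::Completed | CallEvent::Aborted => break,
        }
    }
    items
}
```

Both functions read from the same kind of stream; only the stopping condition differs, which is the point of treating a call as a one-event subscription.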
|
||||
|
||||
### Bidirectional Calls and Routing
|
||||
|
||||
Both sides of a connection can initiate calls. The head routes calls to workers
|
||||
using the first path segment:
|
||||
|
||||
```
Head (server)         Worker: "dev1" (client)
│                                           │
│  call.requested                           │
│  name: "/dev1/fs/readFile"                │
│  payload: { path: "/src/main.rs" }        │
│──────────────────────────────────────────▶│
│                                           │
│  call.responded                           │
│  id: <requestId>                          │
│  payload: { content: "fn main()..." }     │
│◀──────────────────────────────────────────│
│                                           │
│  Worker exposes /dev1/fs/*,               │
│  /dev1/bash/* to head                     │
│                                           │
│◀─ call.requested ─────────────────────────│
│  name: "/head/agent/chat"                 │
│  payload: { provider: "anthropic", ... }  │
│                                           │
│── call.responded ────────────────────────▶│
│  id: <requestId>                          │
│  payload: { completion: "..." }           │
```
|
||||
|
||||
The head's registry includes:
|
||||
- **Head-local operations** (`/head/*`) — handled directly
|
||||
- **Remote operations** (`/{node}/*`) — forwarded to the worker connection
|
||||
|
||||
When the head routes `/dev1/fs/readFile` to worker `dev1`, it strips the node
|
||||
prefix and delivers the call to the worker's local registry as `/fs/readFile`.
|
||||
The worker doesn't need to know its own alias.
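
The routing decision and prefix stripping described above can be sketched as follows. The `Route` enum and `route` function are hypothetical, and two details are assumptions rather than statements from the spec: the literal node name `head` marks head-local operations, and head-local paths are passed through unchanged.

```rust
/// Where the head sends a call (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the head itself; the path is kept as-is (assumption).
    Local { path: &'a str },
    /// Forwarded to a worker; `local_path` has the node prefix removed.
    Forward { node: &'a str, local_path: &'a str },
}

/// Splits off the first path segment and decides how to route the call.
pub fn route(path: &str) -> Option<Route<'_>> {
    let rest = path.strip_prefix('/')?;
    let slash = rest.find('/')?;
    let (node, local_path) = rest.split_at(slash);
    // Require a node name and something after the separating slash.
    if node.is_empty() || local_path.len() < 2 {
        return None;
    }
    if node == "head" {
        Some(Route::Local { path })
    } else {
        Some(Route::Forward { node, local_path })
    }
}
```

For `/dev1/fs/readFile` this produces a forward to `dev1` with the worker-local path `/fs/readFile`, so the worker's registry never sees its own alias.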
|
||||
|
||||
### Head/Worker Architecture
|
||||
|
||||
```
   ┌─────────────────────────────────┐
   │            Head Node            │
   │                                 │
   │  Head-local services:           │
   │  /head/agent/chat (LLM coord)   │
   │  /head/agent/complete           │
   │  /head/sessions/list            │
   │  /head/sessions/history         │
   │                                 │
   │  Worker registry (discovered):  │
   │  /dev1/fs/* → dev1 connection   │
   │  /dev1/bash/* → dev1 connection │
   │  /dev2/fs/* → dev2 connection   │
   │  /browser-1/notify/* → WT conn  │
   └──────┬───────┬───────┬──────────┘
          │       │       │
┌─────────▼┐  ┌───▼────┐ ┌▼─────────────┐
│ Worker   │  │Worker  │ │Browser Worker│
│ "dev1"   │  │"dev2"  │ │"browser-1"   │
│ /fs/*    │  │/fs/*   │ │/notify/*     │
│ /bash/*  │  │/bash/* │ │              │
│ /search/*│  │        │ │              │
└──────────┘  └────────┘ └──────────────┘
```
|
||||
|
||||
When a worker connects, it registers its operations with the head:
|
||||
|
||||
```
worker → head: call.requested { name: "/head/services/register", payload: {
  node: "dev1",
  operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```
|
||||
|
||||
The head adds these to its routing table with the node prefix. Other workers
|
||||
and browser clients can then call `/dev1/fs/readFile` without knowing how
|
||||
the head routes it internally.
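
A minimal sketch of how the head might turn a registration payload into routing-table entries is shown below. `RoutingTable` and its methods are hypothetical names, and removing all of a node's routes on disconnect is only one of the options still open under OQ-20.

```rust
use std::collections::HashMap;

/// Maps full, node-prefixed paths to the node alias that serves them.
/// Hypothetical structure for illustration.
#[derive(Default)]
pub struct RoutingTable {
    routes: HashMap<String, String>,
}

impl RoutingTable {
    /// Adds each worker-local operation under the node's prefix,
    /// e.g. ("dev1", "/fs/readFile") becomes "/dev1/fs/readFile".
    pub fn register(&mut self, node: &str, operations: &[&str]) {
        for op in operations {
            self.routes.insert(format!("/{node}{op}"), node.to_string());
        }
    }

    /// Removes every route owned by `node` (assumed disconnect behavior).
    pub fn unregister(&mut self, node: &str) {
        self.routes.retain(|_, owner| owner.as_str() != node);
    }

    /// Looks up which node serves a full path, if any.
    pub fn resolve(&self, full_path: &str) -> Option<&str> {
        self.routes.get(full_path).map(String::as_str)
    }
}
```

Callers only ever see the prefixed paths; the mapping back to a connection stays inside the head.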
|
||||
|
||||
### Operation Registry
|
||||
|
||||
The operation registry maps paths to specs and handlers. **Specs and handlers
|
||||
are separate** — downstream consumers register both (ADR-025).
|
||||
|
||||
```rust
pub struct OperationSpec {
    pub name: String,                  // e.g., "/fs/readFile", "/agent/chat"
    pub namespace: String,             // e.g., "fs", "agent"
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl, // Required scopes/resources
}

pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "/fs/readFile", "/search/query")
    Mutation,     // Side effects (e.g., "/bash/exec", "/sessions/create")
    Subscription, // Streaming (e.g., "/events/subscribe")
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked
    pub required_scopes_any: Option<Vec<String>>, // OR-checked
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```
|
||||
|
||||
**Registration is separated from implementation:**
|
||||
|
||||
```rust
// Core registers discovery operations
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);

// A dev env worker registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// A browser client registers notification UDFs
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
|
||||
|
||||
Core-provided operations use short paths without a node prefix
|
||||
(`/services/list`, `/services/schema`). They live on whatever node the
|
||||
caller is connected to. Worker-prefixed operations (`/dev1/fs/readFile`)
|
||||
are routed by the head.
|
||||
|
||||
### ACL Per Operation Path
|
||||
|
||||
Access control maps to path prefixes using standard URL-like matching:
|
||||
|
||||
| Pattern | Matches | Purpose |
|---------|---------|---------|
| `/dev1/*` | All operations on node `dev1` | Full access to a worker |
| `/*/fs/*` | `fs` service on any node | Filesystem access across dev envs |
| `/*/bash/*` | `bash` service on any node | Shell access (higher risk) |
| `/head/agent/*` | Head LLM agent | LLM calls |
| `/head/sessions/*` | Head session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific node | One UI notification |
|
||||
|
||||
Higher-risk operations (shell, filesystem write) can require tighter scopes
|
||||
than read-only operations. The ACL evaluates against the caller's
|
||||
`Identity.scopes` and `Identity.resources` from the auth layer (see auth.md).
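
The patterns in the table can be matched segment by segment. The sketch below assumes a particular reading of the wildcard that the spec does not spell out: a `*` in a non-final position matches exactly one segment, and a trailing `*` matches one or more remaining segments. The function name `acl_pattern_matches` is hypothetical.

```rust
/// Returns true if `path` matches `pattern` under the assumed wildcard rules:
/// a non-final `*` matches one segment, a trailing `*` matches the rest.
pub fn acl_pattern_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let is_last = i + 1 == pat.len();
        if *p == "*" && is_last {
            // Trailing wildcard: at least one non-empty segment must remain.
            return segs.len() > i && !segs[i].is_empty();
        }
        match segs.get(i) {
            Some(s) if *p == "*" || p == s => continue,
            _ => return false,
        }
    }
    // No trailing wildcard: the path must have exactly as many segments.
    pat.len() == segs.len()
}
```

With these rules, `/*/fs/*` matches `/dev2/fs/writeFile` but not `/dev2/bash/exec`, and an exact pattern such as `/browser-1/notify/alert` matches only itself. A pattern match alone does not grant access; the scope checks in `AccessControl` still apply.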
|
||||
|
||||
### Service Discovery
|
||||
|
||||
The `/services/list` and `/services/schema` operations expose what a node
|
||||
offers. Read-only — no admin operations:
|
||||
|
||||
| Operation | Type | Description |
|-----------|------|-------------|
| `/services/list` | Query | List registered operation paths + metadata |
| `/services/schema` | Query | Get `OperationSpec` for a specific operation |
|
||||
|
||||
These tell the caller: "here's what you can call." They are not a control
|
||||
panel. Access control is enforced at the operation level.
|
||||
|
||||
### PendingRequestMap
|
||||
|
||||
Manages in-flight calls and subscriptions. Correlates `call.responded` events
|
||||
back to the original `call.requested`:
|
||||
|
||||
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value>>,
        timeout: Option<Instant>,
    },
}
```
|
||||
|
||||
When a `call.responded` event arrives:
|
||||
- If `PendingEntry::Call` → resolve the oneshot, delete entry
|
||||
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
|
||||
|
||||
When `call.completed` arrives on a subscription, close the mpsc channel and
delete the entry. When `call.aborted` arrives, drop the matching entry and
cancel any associated work, whichever side sent the abort. A `call.aborted`
for an unknown `requestId` is silently discarded: no error response is
generated.
|
||||
|
||||
Timeouts prevent dangling entries. A background task sweeps expired entries
|
||||
periodically.
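
The lifecycle rules above can be sketched with standard-library types. This is not the core implementation: it swaps the `oneshot`/`mpsc` async channels for `std::sync::mpsc`, carries payloads as strings, and uses hypothetical method names (`on_responded`, `on_finished`, `sweep`).

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::time::Instant;

pub enum PendingEntry {
    Call { tx: Sender<String>, deadline: Instant },
    Subscribe { tx: Sender<String>, deadline: Option<Instant> },
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn insert(&mut self, request_id: &str, entry: PendingEntry) {
        self.pending.insert(request_id.to_string(), entry);
    }

    /// Handles `call.responded`. Returns false for unknown request ids.
    pub fn on_responded(&mut self, request_id: &str, payload: String) -> bool {
        let is_call = match self.pending.get(request_id) {
            Some(PendingEntry::Call { tx, .. }) => { let _ = tx.send(payload); true }
            Some(PendingEntry::Subscribe { tx, .. }) => { let _ = tx.send(payload); false }
            None => return false,
        };
        if is_call {
            self.pending.remove(request_id); // a call resolves once
        }
        true
    }

    /// Handles `call.completed` / `call.aborted`: dropping the sender closes
    /// the channel. Unknown ids are ignored silently.
    pub fn on_finished(&mut self, request_id: &str) {
        self.pending.remove(request_id);
    }

    /// Drops entries whose deadline has passed; returns how many were removed.
    pub fn sweep(&mut self, now: Instant) -> usize {
        let before = self.pending.len();
        self.pending.retain(|_, entry| match entry {
            PendingEntry::Call { deadline, .. } => *deadline > now,
            PendingEntry::Subscribe { deadline, .. } => deadline.map_or(true, |d| d > now),
        });
        before - self.pending.len()
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

A background task would call `sweep(Instant::now())` on an interval; subscriptions without a deadline are never swept and end only through `call.completed` or `call.aborted`.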
|
||||
|
||||
### Protocol Adapter Layer
|
||||
|
||||
The call protocol is transport-agnostic and interface-agnostic by design. It
|
||||
receives input from two interface categories (ADR-035):
|
||||
|
||||
**StreamInterface** produces `InterfaceEvent` frames from a continuous byte
|
||||
stream (SSH channel, raw framing). The call protocol handler calls `recv()`
|
||||
on the session to get events.
|
||||
|
||||
**MessageInterface** handles individual `InterfaceRequest` → `InterfaceResponse`
|
||||
pairs (HTTP, DNS). The call protocol handler constructs an `OperationContext`
|
||||
from the request and invokes the registry directly.
|
||||
|
||||
Both paths resolve to the same `OperationRegistry` and `OperationEnv`:
|
||||
|
||||
| Transport | Channel mechanism | Direction |
|-----------|-------------------|-----------|
| SSH | Reserved `direct_tcpip` destination (ADR-018) | Bidirectional over SSH channel |
| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream |
| iroh QUIC | Bidirectional `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream |
| WebSocket | Single WS connection | Bidirectional over WS frames |
| Worker | `postMessage` | Bidirectional over structured clone |
|
||||
|
||||
The framing is always: 4-byte BE length prefix + JSON. The envelope shape is
|
||||
the same regardless of transport.
|
||||
|
||||
### OperationEnv — Universal Composition Mechanism
|
||||
|
||||
OperationEnv provides the handler-facing API for composing operations. A handler
calls `context.env.invoke(namespace, operation, input)` and gets back a
`ResponseEnvelope`, regardless of which dispatch path the operation takes
(ADR-033).
|
||||
|
||||
Three dispatch paths, one API:
|
||||
|
||||
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
|
||||
|
||||
All three produce the same `ResponseEnvelope`. Service assembly determines
|
||||
which path each operation uses:
|
||||
|
||||
```rust
// Minimal deployment (Phase 1: single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (Phase 2+: mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)
    .local("config", config_registry)
    .service("secrets", secret_irpc_client)
    .remote("worker-1", call_protocol_conn);
```
|
||||
|
||||
**Phase boundary**: Phase 1 ships with local dispatch only (direct function
|
||||
calls through the operation registry). The irpc service dispatch and remote
|
||||
dispatch paths are contracted here but not built yet. irpc service protocols
|
||||
(`AuthProtocol`, `SecretProtocol`, etc.) are defined in the specs but the
|
||||
implementations are Phase 2+ work.
|
||||
|
||||
**irpc is one dispatch backend for OperationEnv, not a replacement for the
|
||||
call protocol or for OperationEnv.** A call protocol handler can call an irpc
|
||||
service internally (e.g., `/head/auth/verify` calls
|
||||
`AuthProtocol::VerifyPubkey`) — the layers compose. irpc is behind a feature
|
||||
flag in alknet-core. See [services.md](services.md) for full OperationEnv and
|
||||
irpc service details.
|
||||
|
||||
### OperationContext
|
||||
|
||||
Every handler receives an `OperationContext`:
|
||||
|
||||
```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool, // set by buildEnv(), not by callers
}
```
|
||||
|
||||
- **`identity`**: The authenticated identity making the call. Populated by
|
||||
`IdentityProvider` from the interface layer ([identity.md](identity.md)).
|
||||
- **`env`**: The operation environment — namespaced access to other operations.
|
||||
- **`trusted`**: When a handler calls another operation through `env`, the
|
||||
nested call is `trusted` (skips ACL checks). This prevents double-checking:
|
||||
if `/head/agent/chat` is allowed, and it internally calls
|
||||
`/head/auth/verify`, the auth check is trusted.
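
How the `trusted` flag interacts with the scope checks can be sketched as below. The struct is a trimmed mirror of the scope fields of `AccessControl`, the function name `is_authorized` is hypothetical, and the scope strings in the usage note are made up; resource checks are omitted.

```rust
/// Trimmed mirror of the scope fields of `AccessControl`.
pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked
    pub required_scopes_any: Option<Vec<String>>, // OR-checked
}

/// Decides whether a call may proceed.
/// Trusted (nested) calls skip the ACL because the entry-point operation
/// was already authorized.
pub fn is_authorized(trusted: bool, caller_scopes: &[String], acl: &AccessControl) -> bool {
    if trusted {
        return true;
    }
    // Every scope in `required_scopes` must be present.
    let all_ok = acl
        .required_scopes
        .iter()
        .all(|s| caller_scopes.contains(s));
    // If `required_scopes_any` is set, at least one of its scopes must be present.
    let any_ok = acl
        .required_scopes_any
        .as_ref()
        .map_or(true, |any| any.iter().any(|s| caller_scopes.contains(s)));
    all_ok && any_ok
}
```

For example, an ACL requiring `fs:read` plus any of `dev` or `admin` admits a caller holding `fs:read` and `dev`, rejects a caller holding only `fs:read`, and admits any nested call made with `trusted` set.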
|
||||
|
||||
Handler signature:
|
||||
|
||||
```rust
|
||||
fn handle(input: Value, context: OperationContext) -> ResponseEnvelope;
|
||||
```
|
||||
|
||||
### ResponseEnvelope
|
||||
|
||||
The universal return type from all three dispatch paths:
|
||||
|
||||
```rust
pub struct ResponseEnvelope {
    pub request_id: String,
    pub result: Result<Value, CallError>,
}

pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}
```
|
||||
|
||||
Local dispatch produces `ResponseEnvelope` with no serialization. irpc service
|
||||
dispatch produces postcard-encoded results that are decoded into
|
||||
`ResponseEnvelope`. Remote dispatch receives `call.responded` EventEnvelope
|
||||
frames and maps them to `ResponseEnvelope`. The handler always gets the same
|
||||
type back.
|
||||
|
||||
### Relationship to @alkdev/pubsub and @alkdev/operations
|
||||
|
||||
The call protocol in core is a Rust reimplementation of the same protocol
|
||||
defined in `@alkdev/operations`. The TypeScript implementation provides:
|
||||
|
||||
- `PendingRequestMap` — request/response correlation
|
||||
- `CallHandler` — bridges pubsub events to operation registry
|
||||
- `OperationSpec`, `AccessControl`, `Identity` — type definitions
|
||||
|
||||
The Rust implementation mirrors these types and behaviors. TypeScript consumers
|
||||
continue using `@alkdev/operations` over `@alkdev/pubsub` adapters (including
|
||||
the `event-target-alknet` adapter). Rust consumers use core's registry directly.
|
||||
Both speak the same wire protocol and can interoperate.
|
||||
|
||||
The key principle: **the same `EventEnvelope` can flow from a Rust handler
|
||||
through core, out over SSH channel, into a JavaScript pubsub adapter, and
|
||||
be dispatched through `@alkdev/operations`'s call handler** — with zero
|
||||
translation at the wire level.
|
||||
|
||||
### Agent Service Pattern (Downstream Application Concern)
|
||||
|
||||
An agent service — coordinating between LLM providers and tool calls — is a
|
||||
primary downstream use case for the call protocol. It would be just another set
|
||||
of registered operations with no special treatment:
|
||||
|
||||
- `/head/agent/chat` — send a message, get a completion. Routes to the
|
||||
appropriate LLM provider based on available workers and configuration.
|
||||
- `/head/agent/complete` — streaming completion. Yields tokens as they arrive.
|
||||
- `/head/sessions/list` — list session histories (backed by Honker or other
|
||||
durable storage).
|
||||
- `/head/sessions/history` — retrieve a specific session's message history.
|
||||
|
||||
The agent service uses OperationEnv to invoke tools on workers. **This is a
|
||||
downstream application concern, not a core requirement.** The call protocol
|
||||
enables it by providing the universal composition mechanism (ADR-033), but the
|
||||
agent service itself is built on top, not into the core.
|
||||
|
||||
## Constraints
|
||||
|
||||
- The call protocol does not depend on Honker, SQLite, or any database. The
|
||||
`PendingRequestMap` is in-memory. Durable session storage is a consumer concern.
|
||||
- Operation specs use JSON Schema. Complex sub-structures (postcard, protobuf)
|
||||
can be carried as base64-encoded blobs in the `payload`, but the envelope
|
||||
itself is always JSON.
|
||||
- Service discovery (`/services/list`, `/services/schema`) is read-only. No
|
||||
admin operations are exposed through the call protocol itself.
|
||||
- Batch is not a protocol primitive. Multiple `call.requested` events with
|
||||
correlated `requestId`s provide equivalent semantics.
|
||||
- The node prefix in the operation path is a routing mechanism, not a security
|
||||
boundary. ACL is enforced at the `AccessControl` level, not by path prefix
|
||||
alone. A worker that exposes `/dev1/bash/exec` can restrict access via
|
||||
`required_scopes` — not every authenticated identity should have shell access.
|
||||
- **OperationEnv composition model matches the `@alkdev/operations` behavioral
|
||||
contract**: namespace + operation name → invoke with input, return output.
|
||||
The Rust implementation may differ in structure but must preserve this
|
||||
contract (ADR-033).
|
||||
- **irpc is explicitly positioned as one dispatch backend for OperationEnv**
|
||||
(ADR-033, ADR-028). It is not a replacement for the call protocol or for
|
||||
OperationEnv.
|
||||
- **Phase 1 is local dispatch only.** irpc service dispatch and remote dispatch
|
||||
are contracted in this spec but not built yet. The `OperationEnv::local()`
|
||||
path is the Phase 1 implementation.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-20**: How does the head track which workers expose which operations when
|
||||
workers connect and disconnect? Registration on connect and cleanup on
|
||||
disconnect, or heartbeat-based discovery? See
|
||||
[open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-22**: ~~Should the call protocol support streaming inputs (client streaming
|
||||
in gRPC terms)?~~ Resolved — deferred. Current model covers all identified use
|
||||
cases. See [open-questions.md](open-questions.md).
|
||||
|
||||
- **~~OQ-IF-01~~**: ~~How does the `Interface` session type relate to the call
|
||||
protocol's `EventEnvelope` stream?~~ Resolved — `InterfaceSession::recv()`
|
||||
returns `Option<InterfaceEvent>` where `InterfaceEvent` carries
|
||||
`EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`.
|
||||
The `SshSession` bridge implements this over the `alknet-control:0` channel.
|
||||
For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize
|
||||
request/response pairs. See [interface.md](interface.md) and ADR-035.
|
||||
|
||||
- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
|
||||
trait? See [interface.md](interface.md) and [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | irpc is one dispatch backend for OperationEnv |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition with three dispatch paths |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Call protocol accepts events from both interface categories |
|
||||
|
||||
## Phase 2 Implementation Notes
|
||||
|
||||
- `SshSession::recv()` and `SshSession::send()` now functional — bridged to call protocol via `alknet-control:0` SSH channel using `ControlChannelBridge` with mpsc channels
|
||||
- `FrameFramedReader`/`FrameFramedWriter` added to `call::frame` for async length-prefixed EventEnvelope I/O
|
||||
- `RawFramingSession` implemented with first-frame auth: first frame's payload extracted as AuthToken, resolved via `IdentityProvider::resolve_from_token()`, session transitions to authenticated state on success
|
||||
- `OperationEnv.credentials(service)` method added for outbound credential resolution (ADR-036)
|
||||
- `CredentialProvider` trait and `CredentialSet` enum defined in `alknet_core::credentials`
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](auth.md) — Identity and `IdentityProvider` trait
|
||||
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
|
||||
- [server.md](server.md) — Channel handling and control channel routing
|
||||
- [transport.md](transport.md) — Transport abstraction
|
||||
- [identity.md](identity.md) — Identity struct, IdentityProvider trait
|
||||
- [interface.md](interface.md) — Interface layer, EventEnvelope stream from interfaces
|
||||
- [configuration.md](configuration.md) — ForwardingPolicy, service metadata
|
||||
- [services.md](services.md) — OperationEnv, OperationContext, irpc service layer
|
||||
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
|
||||
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
|
||||
- `@alkdev/storage` — `peer_credentials` table, ACL graph, `Identity`
|
||||
- [irpc](/workspace/irpc) — iroh streaming RPC (postcard-only, Rust-to-Rust)
|
||||
- [iroh](/workspace/iroh) — P2P QUIC transport
|
||||
@@ -1,209 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-02
|
||||
---
|
||||
|
||||
# Client
|
||||
|
||||
## What
|
||||
|
||||
The alknet client establishes an SSH session to a server (via pluggable transport) and exposes a local SOCKS5 proxy for routing traffic through that session. Port forwarding (`-L` / `-R` style) covers specific service access like Postgres or Redis.
|
||||
|
||||
## Why
|
||||
|
||||
Users need a way to route traffic through the SSH tunnel. SOCKS5 is the primary interface — it's standard, well-supported by browsers and CLI tools, and needs no privileges. Port forwarding covers specific service access. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet (ADR-014).
|
||||
|
||||
## Architecture
|
||||
|
||||
### Client Components
|
||||
|
||||
```
┌────────────────────────────────────────────────────────┐
│                     alknet connect                     │
│                                                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐              │
│  │ SOCKS5   │  │ Port     │  │ Remote   │              │
│  │ Server   │  │ Forward  │  │ Forward  │              │
│  │ :1080    │  │ -L spec  │  │ -R spec  │              │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘              │
│       │             │             │                    │
│       ▼             ▼             ▼                    │
│  ┌─────────────────────────────────┐                   │
│  │         Channel Manager         │                   │
│  │   (opens direct-tcpip,          │                   │
│  │    forwarded-tcpip streams)     │                   │
│  └──────────────┬──────────────────┘                   │
│                 │                                      │
│  ┌──────────────▼──────────────────┐                   │
│  │       SSH Client (russh)        │                   │
│  │      Handle<ClientHandler>      │                   │
│  └──────────────┬──────────────────┘                   │
│                 │                                      │
│  ┌──────────────▼──────────────────┐                   │
│  │            Transport            │                   │
│  │       (Tcp / Tls / Iroh)        │                   │
│  └─────────────────────────────────┘                   │
└────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### SOCKS5 Server
|
||||
|
||||
The primary client interface. Listens on a local port (default `127.0.0.1:1080`), accepts SOCKS5 connections, and for each connection:
|
||||
|
||||
1. Reads the SOCKS5 handshake (auth method negotiation, target address)
|
||||
2. Opens a `channel_open_direct_tcpip(target_host, target_port, originator_addr, originator_port)` on the SSH session
|
||||
3. Converts the SSH channel to a stream via `channel.into_stream()`
|
||||
4. Runs `tokio::io::copy_bidirectional(&mut local_socket, &mut ssh_stream)` to proxy data
|
||||
|
||||
Supports SOCKS5h (domain names resolved server-side) by default. This prevents DNS leaks — the client never resolves target hostnames locally, sending them to the server for resolution instead. This is consistent with the project's privacy design (ADR-006).
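
Step 1 of the flow above ends with the client's CONNECT request, whose layout is fixed by RFC 1928: `VER CMD RSV ATYP DST.ADDR DST.PORT`. The sketch below parses that request with the standard library; `Target` and `parse_connect_request` are hypothetical names, and only the CONNECT command is accepted. The `Domain` case is what makes SOCKS5h work: the hostname is forwarded to the server unresolved.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Destination requested by the SOCKS5 client (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Target {
    Ip4(Ipv4Addr, u16),
    Ip6(Ipv6Addr, u16),
    Domain(String, u16), // hostname is resolved on the server side
}

/// Parses a SOCKS5 CONNECT request. Returns `None` if the buffer is
/// incomplete, malformed, or uses a command other than CONNECT.
pub fn parse_connect_request(buf: &[u8]) -> Option<Target> {
    // VER = 0x05, CMD = 0x01 (CONNECT); buf[2] is the reserved byte.
    if buf.len() < 4 || buf[0] != 0x05 || buf[1] != 0x01 {
        return None;
    }
    let (addr_len, addr_start) = match buf[3] {
        0x01 => (4, 4),                         // IPv4
        0x04 => (16, 4),                        // IPv6
        0x03 => (*buf.get(4)? as usize, 5),     // domain: 1-byte length prefix
        _ => return None,
    };
    let addr = buf.get(addr_start..addr_start + addr_len)?;
    let port_bytes = buf.get(addr_start + addr_len..addr_start + addr_len + 2)?;
    let port = u16::from_be_bytes([port_bytes[0], port_bytes[1]]);
    match buf[3] {
        0x01 => Some(Target::Ip4(Ipv4Addr::new(addr[0], addr[1], addr[2], addr[3]), port)),
        0x04 => {
            let mut octets = [0u8; 16];
            octets.copy_from_slice(addr);
            Some(Target::Ip6(Ipv6Addr::from(octets), port))
        }
        _ => Some(Target::Domain(String::from_utf8(addr.to_vec()).ok()?, port)),
    }
}
```

The parsed host and port are what would be passed on to `channel_open_direct_tcpip` in step 2.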
|
||||
|
||||
### Port Forwarding
|
||||
|
||||
Local port forwards (`-L local_addr:local_port:remote_host:remote_port`):
|
||||
|
||||
1. Bind `TcpListener` on `local_addr:local_port`
|
||||
2. For each accepted connection, open `channel_open_direct_tcpip(remote_host, remote_port, ...)`
|
||||
3. Proxy bytes bidirectionally via `copy_bidirectional`
|
||||
|
||||
Remote port forwards (`-R remote_addr:remote_port:local_host:local_port`):
|
||||
|
||||
1. Send `tcpip_forward(remote_addr, remote_port)` to request the server listen on a port
|
||||
2. When the handler receives `server_channel_open_forwarded_tcpip`, connect to `local_host:local_port`
|
||||
3. Proxy bytes bidirectionally
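
Both forward styles take a colon-separated spec, so a small parser is a natural first step. The sketch below is an illustration under stated assumptions: `ForwardSpec` and `parse_forward_spec` are hypothetical names, the three-field form (as in `5432:db.internal:5432`) is assumed to bind to `127.0.0.1`, and bracketed IPv6 addresses are not handled.

```rust
/// A parsed forward spec: where to listen and where to connect.
/// For `-L` the bind side is local; for `-R` it is on the server.
#[derive(Debug, PartialEq)]
pub struct ForwardSpec {
    pub bind_addr: String,
    pub bind_port: u16,
    pub target_host: String,
    pub target_port: u16,
}

/// Parses `addr:port:host:port` or the shorter `port:host:port`.
pub fn parse_forward_spec(spec: &str) -> Option<ForwardSpec> {
    let parts: Vec<&str> = spec.split(':').collect();
    let (bind_addr, rest) = match parts.as_slice() {
        [addr, port, host, target_port] => (*addr, [*port, *host, *target_port]),
        // Assumed default: loopback when no bind address is given.
        [port, host, target_port] => ("127.0.0.1", [*port, *host, *target_port]),
        _ => return None,
    };
    if bind_addr.is_empty() || rest[1].is_empty() {
        return None;
    }
    Some(ForwardSpec {
        bind_addr: bind_addr.to_string(),
        bind_port: rest[0].parse().ok()?,
        target_host: rest[1].to_string(),
        target_port: rest[2].parse().ok()?,
    })
}
```

Rejecting a spec at parse time fits the error-handling rule that invalid flags are fatal CLI errors.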
|
||||
|
||||
### Channel Manager
|
||||
|
||||
The channel manager owns the `Arc<client::Handle<ClientHandler>>` and provides methods:
|
||||
|
||||
- `open_direct_tcpip(host, port)` — open a tunnel channel to a remote host
|
||||
- `open_streamlocal(socket_path)` — open a tunnel to a Unix socket
|
||||
- `request_tcpip_forward(addr, port)` — request remote listening
|
||||
- `cancel_tcpip_forward(addr, port)` — cancel remote listening
|
||||
|
||||
It also handles reconnection: if `handle.is_closed()` returns true, attempt reconnection with exponential backoff.
|
||||
|
||||
### Reconnection
|
||||
|
||||
On transport failure:
|
||||
|
||||
1. Detect via `handle.is_closed()` or transport read error
|
||||
2. Exponential backoff reconnect (1s, 2s, 4s, ... max 30s)
|
||||
3. Re-establish transport connection
|
||||
4. Re-authenticate SSH session
|
||||
5. Notify SOCKS5 server and port forwards (in-flight connections fail, new connections work)
|
||||
|
||||
Reconnection is always enabled. The backoff caps at 30 seconds and continues indefinitely until the user terminates the process. Existing TCP connections through the tunnel are lost on reconnect — this is acceptable and consistent with how VPN connections behave.
|
||||
|
||||
The channel manager orchestrates reconnection: it creates a new transport stream (by calling `transport.connect()` again) and establishes a new SSH session over it (ADR-004). This is a full reconnect — there is no "SSH reconnects over the same transport." Port forward listeners (`-L`, `-R`) are re-registered with the new session after reconnection.
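
The backoff schedule (1s, 2s, 4s, and so on, capped at 30s) reduces to a small pure function. The name `backoff_delay` and the zero-based `attempt` counter are assumptions for illustration; no jitter is applied because the spec does not mention any.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// 1s, 2s, 4s, 8s, 16s, then 30s for every later attempt.
pub fn backoff_delay(attempt: u32) -> Duration {
    const CAP_SECS: u64 = 30;
    // 2^attempt seconds; checked_shl avoids overflow for very large attempts.
    let secs = 1u64
        .checked_shl(attempt)
        .unwrap_or(CAP_SECS)
        .min(CAP_SECS);
    Duration::from_secs(secs)
}
```

Because the function saturates at the cap instead of failing, the reconnect loop can keep incrementing `attempt` indefinitely, matching the rule that retries continue until the user terminates the process.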
|
||||
|
||||
### Programmatic Configuration (ADR-011)
|
||||
|
||||
The client uses programmatic configuration — no `~/.ssh/config` parsing, no custom config files. Configuration comes from:
|
||||
|
||||
1. **CLI flags**: `--server`, `--identity`, `--transport`, etc.
|
||||
2. **Library API**: `ConnectOptions` and `ServeOptions` structs in `alknet-core`, constructable programmatically
|
||||
3. **Environment variables**: `ALKNET_SERVER`, `ALKNET_IDENTITY` as convenience defaults
|
||||
|
||||
This approach avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`) and makes the library API clean for programmatic consumers like the NAPI wrapper. Keys can be provided as file paths or in-memory data.
|
||||
|
||||
### Key Material Format
|
||||
|
||||
Key inputs (`--identity`, `--authorized-keys`, `--cert-authority`, `--key`) accept either:
|
||||
|
||||
- **File path**: A filesystem path to a key file (e.g., `~/.ssh/id_ed25519`, `/etc/alknet/ca.pub`)
|
||||
- **In-memory data**: Raw key bytes provided programmatically via the library API or NAPI wrapper (as `Vec<u8>` in Rust, `Buffer` in Node.js)
|
||||
|
||||
The accepted format is **OpenSSH key format** (the format used by `ssh-keygen` and OpenSSH's `~/.ssh/` files). This includes:
|
||||
- Private keys: OpenSSH format (begins with `-----BEGIN OPENSSH PRIVATE KEY-----`)
|
||||
- Public keys: OpenSSH format (e.g., `ssh-ed25519 AAAA... user@host`)
|
||||
- Certificate authority keys: OpenSSH public key format
|
||||
- Authorized keys files: Standard OpenSSH `authorized_keys` format
|
||||
|
||||
PEM-encoded keys (PKCS#1, PKCS#8) are not supported. Use OpenSSH format keys throughout.
|
||||
|
||||
### CLI Interface
|
||||
|
||||
```bash
|
||||
# Basic connection (TCP, default port 22)
|
||||
alknet connect --server example.com --identity ~/.ssh/id_ed25519
|
||||
|
||||
# With TLS
|
||||
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519
|
||||
|
||||
# With TLS + insecure (self-signed certs)
|
||||
alknet connect --server example.com:443 --transport tls --identity ~/.ssh/id_ed25519 --insecure
|
||||
|
||||
# With iroh (no public IP needed)
|
||||
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519
|
||||
|
||||
# With iroh + custom relay
|
||||
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --iroh-relay https://relay.example.com
|
||||
|
||||
# With iroh + proxy (transport chaining)
|
||||
alknet connect --peer <endpoint-id> --transport iroh --identity ~/.ssh/id_ed25519 --proxy socks5://127.0.0.1:1080
|
||||
|
||||
# SOCKS5 on custom port
|
||||
alknet connect --server example.com --socks5 127.0.0.1:1080 --identity ~/.ssh/id_ed25519
|
||||
|
||||
# With port forwards
|
||||
alknet connect --server example.com --forward 5432:db.internal:5432 --forward 6379:redis.internal:6379
|
||||
|
||||
# All options
|
||||
alknet connect \
|
||||
--server <addr> \ # TCP/TLS server address (required for tcp/tls)
|
||||
--peer <endpoint-id> \ # iroh endpoint ID, base58-encoded (required for iroh)
|
||||
--transport tcp|tls|iroh \ # Transport mode
|
||||
--identity <path-or-buffer> \ # SSH private key (path or in-memory)
|
||||
--socks5 <addr:port> \ # SOCKS5 listen address (default: 127.0.0.1:1080)
|
||||
--forward <spec> \ # Port forward spec (repeatable)
|
||||
--remote-forward <spec> \ # Remote port forward spec (repeatable)
|
||||
--proxy <url> \ # Upstream proxy (socks5:// or http://)
|
||||
--iroh-relay <url> \ # iroh relay URL (default: n0 relay)
|
||||
--tls-server-name <host> \ # SNI hostname for TLS
|
||||
--insecure # Accept self-signed TLS certs
|
||||
```
|
||||
|
||||
## Constraints
|
||||
|
||||
- SOCKS5 is always enabled when `alknet connect` runs (it's the primary interface). Port forwards are optional.
|
||||
- The client does not log tunnel destinations. The SOCKS5 server connects and proxies — no logging of SOCKS5 request targets.
|
||||
- Authentication is Ed25519 public key or OpenSSH certificate (ADR-012). No password authentication over SSH.
|
||||
- Only one SSH session per `alknet connect` process. Multiple sessions = multiple processes (or a future multiplexer).
|
||||
- No `~/.ssh/config` parsing. Configuration is programmatic via CLI flags, env vars, or library API structs (ADR-011).
|
||||
- VPN-like "route all traffic" behavior is provided by running `tun2proxy --proxy socks5://127.0.0.1:1080` alongside the client, not by a built-in TUN interface (ADR-014).
|
||||
- The CLI `alknet connect` command manages a full SSH session with SOCKS5 and port forwarding. The NAPI `connect()` function is a different operation — it opens a single SSH channel as a Duplex stream for programmatic use, with no SOCKS5 server or port forwarding. See napi-and-pubsub.md for details.
|
||||
|
||||
## Graceful Shutdown
|
||||
|
||||
On SIGTERM or SIGINT:
|
||||
|
||||
1. Stop accepting new SOCKS5 connections and port forward connections
|
||||
2. Send an SSH disconnect message to the server
|
||||
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds)
|
||||
4. Close the transport stream
|
||||
5. Exit
|
||||
|
||||
In-flight connections are not preserved across shutdown — they receive a connection reset. This matches the behavior of standard SSH tunnel tools.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Error handling follows the project's layered pattern (see overview.md):
|
||||
|
||||
- **Transport errors**: Trigger reconnection with exponential backoff (see Reconnection section above). If reconnection fails indefinitely, the process continues retrying until the user terminates it.
|
||||
- **Auth errors**: Trigger a reconnection retry. After repeated auth failures, the SOCKS5 server and port-forward listeners remain active, but new channel opens fail until reconnection succeeds.
|
||||
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels.
|
||||
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found) exit immediately.
|
||||
|
||||
## Open Questions
|
||||
|
||||
None — all resolved.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
|
||||
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Client does not log SOCKS5 request targets |
|
||||
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs, env vars, or CLI flags |
|
||||
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | No password auth; OpenSSH cert-authority for multi-user |
|
||||
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
|
||||
@@ -1,329 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-09
|
||||
---
|
||||
|
||||
# Configuration
|
||||
|
||||
## What
|
||||
|
||||
Alknet's configuration is split into `StaticConfig` (immutable after startup) and
|
||||
`DynamicConfig` (hot-reloadable at runtime), with `ArcSwap` providing lock-free
|
||||
reads on the hot path. `ConfigService` wraps reloads behind an irpc protocol
|
||||
for production deployments.
|
||||
|
||||
## Why
|
||||
|
||||
Three specific failures motivated the split (ADR-030):
|
||||
|
||||
1. No hot reload of authentication credentials — adding a key requires a restart.
|
||||
2. No port forwarding access control — any authenticated client has unrestricted
|
||||
access (ADR-031).
|
||||
3. No structured configuration beyond CLI flags — operators need config files
|
||||
and the NAPI layer needs programmatic reload.
|
||||
|
||||
The split is clean: anything that affects SSH handshake or socket binding is
|
||||
static; anything checked per-connection or per-channel is dynamic.
|
||||
|
||||
## Architecture
|
||||
|
||||
### StaticConfig
|
||||
|
||||
Immutable after startup. Constructed from `ServeOptions` (the builder pattern
|
||||
is preserved per ADR-011). Contains:
|
||||
|
||||
- Transport mode, listen address
|
||||
- TLS config (cert, key)
|
||||
- iroh config (relay URL)
|
||||
- Stealth mode flag
|
||||
- Host key, host key algorithm
|
||||
- Max auth attempts, max connections per IP
|
||||
- Proxy config
|
||||
|
||||
Changing any of these requires a restart.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains:
|
||||
|
||||
- `AuthPolicy` — authorized keys, certificate authorities, token config
|
||||
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
|
||||
- `RateLimitConfig` — rate limiting parameters
|
||||
|
||||
`ArcSwap` provides lock-free reads. Every `auth_publickey()` and
|
||||
`channel_open_direct_tcpip()` call does a single `Arc` dereference — zero cost
|
||||
compared to the current approach. Writes are atomic: `store()` swaps the
|
||||
pointer.
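
The snapshot-and-swap behavior described above can be sketched with the standard library alone. This is an approximation under stated assumptions: `ConfigCell` is a hypothetical name, the `DynamicConfig` here is a trimmed stand-in with a single field, and `RwLock<Arc<_>>` takes the place of `ArcSwap`, so it illustrates the semantics (old snapshots stay valid, new loads see the new value) rather than the lock-free implementation.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, trimmed-down stand-in for the real `DynamicConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// Std-only approximation of `ArcSwap<DynamicConfig>`.
/// Readers clone the `Arc` under a short read lock; writers replace it
/// under a write lock. The real `ArcSwap` avoids the lock entirely.
pub struct ConfigCell {
    current: RwLock<Arc<DynamicConfig>>,
}

impl ConfigCell {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Take a snapshot. The caller may hold it for as long as it needs.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Publish a new config. Snapshots taken earlier are unaffected.
    pub fn store(&self, next: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A connection that called `load()` before a `store()` keeps reading its old snapshot, while any later `load()` returns the new value.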
|
||||
|
||||
### API Keys
|
||||
|
||||
`DynamicConfig.auth` also includes API keys for service accounts and HTTP
|
||||
interface auth (ADR-037):
|
||||
|
||||
```toml
|
||||
[[auth.api_keys]]
|
||||
prefix = "alk_"
|
||||
hash = "sha256:abc..."
|
||||
scopes = ["relay:connect"]
|
||||
description = "dashboard service account"
|
||||
ttl = "30d" # optional
|
||||
```
|
||||
|
||||
API keys are verified by `ConfigIdentityProvider::resolve_from_token()` — if
|
||||
the token starts with the configured prefix, it's treated as an API key and
|
||||
verified by SHA-256 hash lookup. Otherwise, it's treated as an Ed25519 AuthToken.
|
||||
Both paths produce the same `Identity` result.
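
The prefix-based dispatch described above might look roughly like the following. This is a sketch, not the actual `resolve_from_token()` body: `TokenKind` and `classify_token` are hypothetical names, the hash lookup and signature verification are left out, and treating an empty prefix as "no API keys configured" is an assumption.

```rust
/// Which verification path a presented token should take.
#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    /// Starts with the configured prefix: verify by SHA-256 hash lookup.
    ApiKey(&'a str),
    /// Anything else: verify as an Ed25519 AuthToken.
    AuthToken(&'a str),
}

/// Decide the verification path from the token's prefix only.
/// Assumption: an empty prefix never matches, so every token is then
/// treated as an AuthToken.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```

With the prefix `alk_` from the TOML example, `alk_live_123` would take the API key path and any other string would fall through to AuthToken verification.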
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
|
||||
pub struct ConfigReloadHandle {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigReloadHandle {
|
||||
pub fn reload(&self, new_config: DynamicConfig) { ... }
|
||||
}
|
||||
```
|
||||
|
||||
Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload.
|
||||
|
||||
### ConfigServiceImpl
|
||||
|
||||
The Phase 1 implementation of config service logic, backed by
|
||||
`ArcSwap<DynamicConfig>`. Where `ConfigIdentityProvider` wraps the auth section
|
||||
of `DynamicConfig`, `ConfigServiceImpl` wraps the forwarding and rate-limit
|
||||
sections. Both are ArcSwap-backed and share the same `DynamicConfig` instance.
|
||||
|
||||
```rust
|
||||
pub struct ConfigServiceImpl {
|
||||
dynamic: Arc<ArcSwap<DynamicConfig>>,
|
||||
}
|
||||
|
||||
impl ConfigServiceImpl {
|
||||
pub fn forwarding_policy(&self) -> Arc<ForwardingPolicy> {
|
||||
self.dynamic.load().forwarding.clone()
|
||||
}
|
||||
|
||||
pub fn rate_limits(&self) -> Arc<RateLimitConfig> {
|
||||
self.dynamic.load().rate_limits.clone()
|
||||
}
|
||||
|
||||
pub fn reload(&self, new_config: DynamicConfig) {
|
||||
self.dynamic.store(Arc::new(new_config));
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Phase 1 deploys `ConfigServiceImpl` directly — no irpc service boundary. The
|
||||
`ConfigProtocol` irpc service (behind feature flag) wraps `ConfigServiceImpl`
|
||||
for production deployments that use the service layer. This mirrors the
|
||||
`ConfigIdentityProvider` / `AuthProtocol` pattern from [identity.md](identity.md)
|
||||
and ADR-028.
|
||||
|
||||
### ConfigService irpc Service
|
||||
|
||||
```rust
|
||||
enum ConfigProtocol {
|
||||
GetForwardingPolicy,
|
||||
GetRateLimits,
|
||||
ReloadForwarding { policy: ForwardingPolicy },
|
||||
ReloadRateLimits { limits: RateLimitConfig },
|
||||
}
|
||||
```
|
||||
|
||||
Behind the `irpc` feature flag. For production deployments that use the service
|
||||
layer. For minimal deployments, direct `ConfigReloadHandle::reload()` is
|
||||
sufficient.
|
||||
|
||||
### ForwardingPolicy
|
||||
|
||||
Part of DynamicConfig (ADR-031). Evaluated per-channel-open, matched against
|
||||
the authenticated `Identity`. Rules are evaluated in order; first match wins.
|
||||
Default determines fallback.
|
||||
|
||||
```rust
|
||||
pub struct ForwardingPolicy {
|
||||
pub default: ForwardingAction,
|
||||
pub rules: Vec<ForwardingRule>,
|
||||
}
|
||||
```
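
As an illustration of "first match wins, default as fallback", here is a minimal evaluation sketch. The `ForwardingRule` shape (a `host:port` pattern where either side may be `*`), the `matches` helper, and the `evaluate` method are assumptions made for the example; the source does not define them, and matching against the authenticated `Identity` is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Hypothetical rule shape: `target` is `host:port`, either part may be `*`.
pub struct ForwardingRule {
    pub target: String,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

/// A pattern part matches if it is the wildcard or equals the value.
fn part_matches(pattern: &str, value: &str) -> bool {
    pattern == "*" || pattern.eq_ignore_ascii_case(value)
}

impl ForwardingRule {
    fn matches(&self, host: &str, port: u16) -> bool {
        // Split on the last ':' so the port is always the final component.
        match self.target.rsplit_once(':') {
            Some((h, p)) => part_matches(h, host) && part_matches(p, &port.to_string()),
            None => false,
        }
    }
}

impl ForwardingPolicy {
    /// Rules are checked in order; the first match decides.
    /// If nothing matches, the policy default applies.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.matches(host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because evaluation stops at the first match, a narrow deny rule placed before a broad allow rule takes precedence over it.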
|
||||
|
||||
### TOML Config File
|
||||
|
||||
Optional convenience input format (amends ADR-011, does not replace
|
||||
programmatic API). Covers static config plus initial auth/forwarding paths.
|
||||
|
||||
```toml
|
||||
[server]
|
||||
# Stream-based listener: TLS + SSH on port 443
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "tls"
|
||||
interface = "ssh"
|
||||
listen = "0.0.0.0:443"
|
||||
|
||||
[server.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
# Stream-based listener: TCP + SSH on port 22
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "tcp"
|
||||
interface = "ssh"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
# Stream-based listener: iroh P2P
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
|
||||
# Message-based listener: HTTP on port 443 (with stealth)
|
||||
[[listeners]]
|
||||
type = "http"
|
||||
listen = "0.0.0.0:443"
|
||||
tls = true
|
||||
stealth = true
|
||||
|
||||
# Message-based listener: HTTP on port 8080 (separate, no stealth)
|
||||
# [[listeners]]
|
||||
# type = "http"
|
||||
# listen = "0.0.0.0:8080"
|
||||
# tls = false
|
||||
# stealth = false
|
||||
|
||||
# Message-based listener: DNS on port 53
|
||||
# [[listeners]]
|
||||
# type = "dns"
|
||||
# listen = "0.0.0.0:53"
|
||||
# tls = false
|
||||
|
||||
[auth]
|
||||
host_key = "/etc/alknet/ssh/host_key"
|
||||
|
||||
[auth.ssh]
|
||||
authorized_keys = [...]
|
||||
|
||||
[auth.token]
|
||||
enabled = true
|
||||
max_token_age = "5m"
|
||||
|
||||
[[auth.api_keys]]
|
||||
prefix = "alk_"
|
||||
hash = "sha256:abc..."
|
||||
scopes = ["relay:connect"]
|
||||
description = "dashboard service account"
|
||||
ttl = "30d"
|
||||
|
||||
[forwarding]
|
||||
default = "deny"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "localhost:*"
|
||||
action = "allow"
|
||||
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
|
||||
interface AlknetServer {
|
||||
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
|
||||
reloadForwarding(policy: ForwardingPolicyConfig): void;
|
||||
reloadAll(config: DynamicConfig): void;
|
||||
}
|
||||
```
|
||||
|
||||
### Multi-Transport Listeners
|
||||
|
||||
A head node may accept connections on multiple transports and interfaces simultaneously.
|
||||
Listeners come in two categories: stream-based (Transport + StreamInterface pairs) and
|
||||
message-based (self-contained HTTP or DNS servers).
|
||||
|
||||
```rust
|
||||
pub enum ListenerConfig {
|
||||
Stream {
|
||||
transport: TransportKind,
|
||||
interface: StreamInterfaceKind,
|
||||
},
|
||||
Http {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool,
|
||||
stealth: bool, // byte-peek protocol detection on shared port
|
||||
},
|
||||
Dns {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
For stream-based listeners, `Server::run()` spawns one accept loop per listener.
|
||||
For HTTP listeners, it spawns an axum server. For DNS listeners, it spawns a DNS
|
||||
server. All share `DynamicConfig`, `ConnectionRateLimiter`, sessions, and
|
||||
shutdown signal.
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = true
|
||||
|
||||
[[listeners]]
|
||||
transport = "tcp"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
[[listeners]]
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
```
|
||||
|
||||
### CLI vs Programmatic Behavior
|
||||
|
||||
| Interface | Static config | Dynamic config | Reload mechanism |
|
||||
|-----------|--------------|----------------|------------------|
|
||||
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
|
||||
| Core Rust | `StaticConfig` struct | `AuthProtocol` (irpc) or `ConfigIdentityProvider` (ArcSwap) | `ConfigProtocol::ReloadForwarding` / `ConfigProtocol::ReloadRateLimits` (irpc) or `ConfigReloadHandle::reload()` |
|
||||
| NAPI | `serve()` options | Same as Core Rust | `server.reloadAuth()`, `server.reloadForwarding()` |
|
||||
|
||||
## Constraints
|
||||
|
||||
- `StaticConfig` cannot be changed after startup. Changing transport mode,
|
||||
listen address, TLS config, or host key requires a restart.
|
||||
- `DynamicConfig` is reloaded atomically via `ArcSwap`. Existing connections
|
||||
continue with their current config; new connections get the new config.
|
||||
- Config file is optional. `ServeOptions` builder pattern remains the primary
|
||||
API (amends ADR-011, does not supersede it).
|
||||
- No file watching (OQ-13 resolved: potential attack vector, unnecessary
|
||||
complexity).
|
||||
- Client configuration stays as `ConnectOptions` — no `ArcSwap` needed.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- None. All configuration-related questions are resolved per ADR-030, ADR-031,
|
||||
and the resolved OQs in [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | Immutable transport vs. reloadable auth/forwarding |
|
||||
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | Amended, not superseded — TOML is convenience layer |
|
||||
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Rule-based allow/deny, TransportKind-aware |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | DynamicConfig.auth consumed by IdentityProvider |
|
||||
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | ConfigService wraps DynamicConfig reloads |
|
||||
|
||||
## Phase 2 Implementation Notes
|
||||
|
||||
- `DynamicConfig.auth` now includes `api_keys: Vec<ApiKeyEntry>` (ADR-037)
|
||||
- `DynamicConfig.credentials: HashMap<String, CredentialSet>` added for static outbound credentials (ADR-036)
|
||||
- `ListenerConfig` restructured from flat struct to enum: `Stream { transport, interface }`, `Http { config: HttpListenerConfig }`, `Dns { config: DnsListenerConfig }` (ADR-035)
|
||||
- `HttpListenerConfig` and `DnsListenerConfig` builder-pattern structs added
|
||||
- `ListenerConfig::validate()` now validates all three variants
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../research/configuration.md) — Full analysis and proposed solution
|
||||
- [identity.md](identity.md) — IdentityProvider trait, DynamicConfig.auth
|
||||
- [ADR-013](decisions/013-fail2ban-friendly-logging.md) — Rate limiting parameters
|
||||
@@ -1,263 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-09
|
||||
---
|
||||
|
||||
# Credentials (Outbound Auth)
|
||||
|
||||
## What
|
||||
|
||||
The `CredentialProvider` trait and `CredentialSet` enum handle **outbound**
|
||||
authentication: how alknet authenticates _to_ external and self-hosted services.
|
||||
This is the complement to `IdentityProvider`, which handles **inbound**
|
||||
authentication (who is calling alknet).
|
||||
|
||||
## Why
|
||||
|
||||
Without `CredentialProvider`, each service wrapper would independently solve
|
||||
credential retrieval, caching, and lifecycle management. Cloud API integrations
|
||||
(vast.ai, runpod) need API keys. Self-hosted services (rustfs, gitea) need
|
||||
S3 access keys or OIDC tokens. The secret service can store these at rest, but
|
||||
the wiring between "decrypt a credential from storage" and "use it in an HTTP
|
||||
request" doesn't exist yet.
|
||||
|
||||
`CredentialProvider` provides a unified abstraction — just as `IdentityProvider`
|
||||
unifies inbound auth, `CredentialProvider` unifies outbound auth. Handlers
|
||||
access credentials through `OperationEnv`, not by reaching into storage directly.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Direction: Inbound vs Outbound
|
||||
|
||||
| | IdentityProvider | CredentialProvider |
|
||||
|---|---|---|
|
||||
| **Direction** | Inbound (who is calling alknet) | Outbound (how alknet calls others) |
|
||||
| **Resolves** | Fingerprint/token → `Identity` | Service name → `CredentialSet` |
|
||||
| **Storage** | `peer_credentials`, `api_keys` | Encrypted nodes in metagraph |
|
||||
| **Lifecycle** | Stateless lookup | May need refresh (OIDC tokens, S3 sessions) |
|
||||
| **Location** | `alknet_core::auth` | `alknet_core::credentials` |
|
||||
|
||||
Both live at the same architectural layer. A handler receives an
|
||||
`OperationContext` with `identity` (who called us) and can access credentials
|
||||
through `context.env` (how we call out).
|
||||
|
||||
### CredentialProvider Trait
|
||||
|
||||
```rust
|
||||
pub trait CredentialProvider: Send + Sync + 'static {
|
||||
fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
The trait is intentionally narrow. It returns credentials for a named service.
|
||||
It does not abstract the auth mechanism — that stays with the service wrapper
|
||||
that knows the protocol (S3 signing, OAuth2 refresh, etc.).
|
||||
|
||||
### CredentialSet
|
||||
|
||||
```rust
|
||||
pub enum CredentialSet {
|
||||
ApiKey {
|
||||
header_name: String,
|
||||
token: String,
|
||||
},
|
||||
Basic {
|
||||
username: String,
|
||||
password: String,
|
||||
},
|
||||
Bearer {
|
||||
token: String,
|
||||
},
|
||||
S3AccessKey {
|
||||
access_key: String,
|
||||
secret_key: String,
|
||||
session_token: Option<String>,
|
||||
},
|
||||
OidcToken {
|
||||
access_token: String,
|
||||
refresh_token: Option<String>,
|
||||
expires_at: Option<u64>,
|
||||
},
|
||||
Custom {
|
||||
scheme: String,
|
||||
params: HashMap<String, String>,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Each variant carries the data needed for a specific auth mechanism. The service
|
||||
wrapper that requested the credentials knows what variant it expects and how to
|
||||
use it.
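
To make "the service wrapper knows what variant it expects" concrete, here is a sketch of a wrapper turning a `CredentialSet` into an HTTP header. `auth_header` is a hypothetical helper, not part of the spec. It handles only the variants that map to a single header without extra dependencies: `Basic` would need base64 encoding and `S3AccessKey` needs request signing, so both are left to protocol-specific wrappers here.

```rust
use std::collections::HashMap;

/// Copy of the enum above, repeated so the sketch is self-contained.
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}

/// Hypothetical helper: map a credential to a single `(name, value)` header.
/// Returns `None` for variants that need more than a static header.
pub fn auth_header(creds: &CredentialSet) -> Option<(String, String)> {
    match creds {
        CredentialSet::ApiKey { header_name, token } => {
            Some((header_name.clone(), token.clone()))
        }
        CredentialSet::Bearer { token } => {
            Some(("Authorization".to_string(), format!("Bearer {}", token)))
        }
        // Assumption: the OIDC access token is presented as a bearer token.
        CredentialSet::OidcToken { access_token, .. } => {
            Some(("Authorization".to_string(), format!("Bearer {}", access_token)))
        }
        _ => None,
    }
}
```

A wrapper for a bearer-token API would call `auth_header` and attach the result to its request, while an S3 wrapper would match on `S3AccessKey` directly and sign the request itself.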
|
||||
|
||||
### CredentialProvider vs IdentityProvider
|
||||
|
||||
These are opposite-direction abstractions that compose through `OperationEnv`:
|
||||
|
||||
```
|
||||
Incoming Request
|
||||
│
|
||||
▼
|
||||
IdentityProvider (credential → Identity)
|
||||
│
|
||||
├── SSH fingerprint → Identity.id, .scopes, .resources
|
||||
├── Bearer AuthToken → Identity.id, .scopes, .resources
|
||||
└── API key → Identity.id, .scopes, .resources
|
||||
│
|
||||
▼
|
||||
OperationContext { identity, env, ... }
|
||||
│
|
||||
├── context.env.invoke("git", "push", input)
|
||||
│ └── GitService handler
|
||||
│ └── CredentialProvider (outbound)
|
||||
│ └── get_credentials("rustfs")
|
||||
│ └── S3AccessKey { access_key, secret_key }
|
||||
│
|
||||
└── context.env.invoke("secrets", "derive", input)
|
||||
└── local dispatch to SecretProtocol
|
||||
|
||||
Two directions: Inbound (who is calling us)
|
||||
Outbound (how we call others)
|
||||
```
|
||||
|
||||
### SecretStoreCredentialProvider (Phase 1 Default)
|
||||
|
||||
The default `CredentialProvider` implementation. Decrypts credentials via
|
||||
`SecretProtocol::Decrypt` and holds them in RAM:
|
||||
|
||||
```rust
|
||||
pub struct SecretStoreCredentialProvider {
|
||||
credentials: ArcSwap<HashMap<String, CredentialSet>>,
|
||||
}
|
||||
```
|
||||
|
||||
At startup, the CLI or NAPI assembly loads credentials from the secret service
|
||||
and populates the `ArcSwap`. The `refresh_credentials()` method re-decrypts
|
||||
after a `Lock`/`Unlock` cycle on the secret service.
|
||||
|
||||
### ManagedCredentialProvider (Phase C Future)
|
||||
|
||||
For self-hosted services that need active lifecycle management (S3 session
|
||||
token rotation, OIDC token refresh). Wraps `SecretStoreCredentialProvider`
|
||||
with per-service `CredentialManager` instances:
|
||||
|
||||
```rust
|
||||
pub struct ManagedCredentialProvider {
|
||||
base: SecretStoreCredentialProvider,
|
||||
managers: HashMap<String, Arc<dyn CredentialManager>>,
|
||||
}
|
||||
|
||||
pub trait CredentialManager: Send + Sync + 'static {
|
||||
fn refresh(&self, current: &CredentialSet) -> Option<CredentialSet>;
|
||||
fn is_expired(&self, current: &CredentialSet) -> bool;
|
||||
fn provision(&self, identity: &Identity) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
- `refresh`: OIDC token refresh, S3 session token rotation
|
||||
- `is_expired`: Check TTL before use
|
||||
- `provision`: Create credentials on a self-hosted service for a given identity
|
||||
|
||||
This is a Phase C concept. The spec defines the extension point but defers
|
||||
implementation.
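
Since implementation is deferred, the following is only a sketch of what an `is_expired` check could look like for `OidcToken`. Several things are assumptions: that `expires_at` holds Unix seconds, that the caller supplies the current time (which keeps the function testable), and that a small skew margin is wanted. `is_expired_at` is a hypothetical free function, and the enum is a trimmed subset of `CredentialSet`.

```rust
/// Trimmed-down, hypothetical subset of `CredentialSet`.
pub enum CredentialSet {
    Bearer {
        token: String,
    },
    OidcToken {
        access_token: String,
        refresh_token: Option<String>,
        expires_at: Option<u64>,
    },
}

/// Returns true when the credential should be refreshed before use.
/// Assumption: `expires_at` is a Unix timestamp in seconds.
/// Credentials without an expiry are treated as never expiring.
pub fn is_expired_at(current: &CredentialSet, now_unix_secs: u64, skew_secs: u64) -> bool {
    match current {
        CredentialSet::OidcToken { expires_at: Some(exp), .. } => {
            // Refresh slightly early so a token does not expire mid-request.
            now_unix_secs.saturating_add(skew_secs) >= *exp
        }
        _ => false,
    }
}
```

A `CredentialManager` implementation could call such a check from `is_expired` and fall through to `refresh` when it returns true.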
|
||||
|
||||
### Integration with OperationEnv
|
||||
|
||||
Handlers access credentials through `OperationEnv`:
|
||||
|
||||
```rust
|
||||
// Handler needs outbound credentials for a service
|
||||
let creds = context.env.get_credentials("rustfs");
|
||||
```
|
||||
|
||||
This is analogous to how `context.env.invoke(namespace, op, input)` works for
|
||||
operation dispatch — the handler doesn't know whether the credential comes from
|
||||
config, the secret service, or a managed provider.
|
||||
|
||||
### Integration with SecretProtocol
|
||||
|
||||
Credentials are stored encrypted in the metagraph via `SecretProtocol`:
|
||||
|
||||
1. Operator configures credentials: `alknet credential add vast-ai --type bearer --token-file ./key.txt`
|
||||
2. CLI encrypts via `SecretProtocol::Encrypt` (AES-256-GCM, key at path `m/74'/2'/0'/0'`)
|
||||
3. Encrypted credential stored as `EncryptedData` node in metagraph, tagged with service name
|
||||
4. At startup, `SecretStoreCredentialProvider` calls `SecretProtocol::Decrypt` for each configured service
|
||||
5. Decrypted credentials held in RAM with same lifecycle as the seed (purged on `Lock`)
|
||||
|
||||
The `EncryptedData` wire format is shared with alknet-storage by type-level
|
||||
compatibility, not a crate dependency.
|
||||
|
||||
### Identity-Bound Credentials (Phase B+ Future)
|
||||
|
||||
For multi-tenant setups where different alknet users have different access levels
|
||||
on the same external service:
|
||||
|
||||
```rust
|
||||
// Service-level credential (all users share one key):
|
||||
credential_provider.get_credentials("rustfs")
|
||||
|
||||
// Identity-bound credential (per-user key):
|
||||
credential_provider.get_credentials_for("rustfs", &identity.id)
|
||||
```
|
||||
|
||||
The trait-level method is service-level. The identity-bound method is an
|
||||
extension in alknet-storage that uses `Identity.id` (the account UUID in
|
||||
database-backed deployments) as the lookup key. No separate `account_id` field
|
||||
needed — `Identity.id` IS the account identifier.
|
||||
|
||||
## Constraints
|
||||
|
||||
- `CredentialProvider` and `CredentialSet` live in `alknet_core::credentials`.
|
||||
No database dependency at the core level.
|
||||
- `CredentialProvider` does not depend on `IdentityProvider`. They compose
|
||||
through `OperationEnv`, not through dependency.
|
||||
- `ManagedCredentialProvider` and `CredentialManager` are Phase C concepts.
|
||||
They are defined as extension points but not implemented yet.
|
||||
- Identity-bound credentials use `Identity.id` as the account key. In
|
||||
config-backed deployments, this is the fingerprint or key prefix. In
|
||||
database-backed deployments, this is the account UUID.
|
||||
- `SecretStoreCredentialProvider` depends on `SecretProtocol::Decrypt`, which
|
||||
requires the alknet-secret crate. A stub impl that reads from config is
|
||||
sufficient for Phase 2 when alknet-secret isn't available.
|
||||
- The `CredentialSet` variants cover all identified credential types (Phases
|
||||
A–C). Phase D (alknet as OIDC provider) is additive.
|
||||
|
||||
## Phase Progression
|
||||
|
||||
| Phase | CredentialProvider Scope | Notes |
|
||||
|-------|-------------------------|-------|
|
||||
| Phase 2 (now) | Trait + `CredentialSet` in core. `SecretStoreCredentialProvider` stub reads from config. | Enables Phase 2 HTTP auth |
|
||||
| Phase A | `SecretStoreCredentialProvider` backed by `SecretProtocol::Decrypt`. CLI command for credential management. | Full secret service integration |
|
||||
| Phase B | `FromOpenAPI` integration. `CredentialProvider` populates `HttpServiceConfig.auth`. | Auto-registration of external services |
|
||||
| Phase C | `ManagedCredentialProvider` + `CredentialManager`. S3 signing, OIDC refresh, identity-bound credentials. | Production self-hosted services |
|
||||
| Phase D | Alknet as OIDC provider. Eliminates stored credentials for OIDC-compatible services. | Long-term goal |
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-CP-01**: Should `CredentialProvider` support per-identity credentials
|
||||
(`get_credentials(service, identity)`)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-02**: Where should OIDC provider operations live if alknet becomes
|
||||
an OIDC provider (Phase D)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-03**: How do credential rotations propagate across a cluster? See
|
||||
[open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-CP-04**: Should `CredentialSet` include request-signing capability?
|
||||
See [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials`, parallel to IdentityProvider |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | Inbound auth — the opposite direction |
|
||||
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
|
||||
|
||||
## References
|
||||
|
||||
- [identity.md](identity.md) — IdentityProvider (inbound auth, opposite direction)
|
||||
- [secret-service.md](secret-service.md) — SecretProtocol, EncryptedData
|
||||
- [services.md](services.md) — OperationEnv, OperationContext
|
||||
- [definitions.md](definitions.md) — IdentityProvider vs CredentialProvider disambiguation
|
||||
- [research/phase2/credential-provider.md](../research/phase2/credential-provider.md) — Full analysis with rustfs/gitea integration
|
||||
@@ -1,26 +0,0 @@
|
||||
# ADR-001: Pluggable Transport via AsyncRead+AsyncWrite Trait
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
Alknet needs to support multiple transport modes (TCP, TLS, iroh) for SSH sessions. Each mode has different connection establishment logic but produces the same result: a bidirectional byte stream. Without an abstraction, each transport would need its own SSH connection code path.
|
||||
|
||||
russh's `client::connect_stream()` and `server::run_stream()` both accept `AsyncRead + AsyncWrite + Unpin + Send`, meaning SSH is already transport-agnostic at the API level. The design question is whether to enshrine this in alknet's own type system or handle each transport case-by-case.
|
||||
|
||||
## Decision
|
||||
Define a `Transport` trait that produces `AsyncRead + AsyncWrite + Unpin + Send` streams. Each transport (TCP, TLS, iroh) implements this trait. The SSH layer calls `transport.connect()` and passes the result to `russh::client::connect_stream()`.
|
||||
|
||||
On the server side, define a `TransportAcceptor` trait that produces incoming streams. Each acceptor (TCP listener, TLS listener, iroh endpoint) implements this trait. The server calls `acceptor.accept()` and passes the result to `russh::server::run_stream()`.
|
||||
|
||||
This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of implementing the trait, not modifying SSH code.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Clean separation between transport and protocol. Adding transports is additive. SSH code is transport-agnostic.
|
||||
- **Positive**: Testing is simplified — mock transports can produce in-memory streams.
|
||||
- **Negative**: Slight indirection for the single-transport case (just TCP). The trait boilerplate is minimal though.
|
||||
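- **Negative**: The trait must be object-safe if we want dynamic dispatch. Using `impl Trait` in function signatures avoids this but limits runtime transport selection. CLI-selected transport needs dynamic dispatch, conceptually `Box<dyn Transport<Stream = Box<dyn AsyncRead+AsyncWrite+Unpin+Send>>>`. Because a Rust trait object can name only one non-auto trait, the boxed stream type in practice needs a combined helper trait (a supertrait over `AsyncRead + AsyncWrite`) rather than the literal `dyn AsyncRead+AsyncWrite`.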
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,30 +0,0 @@
|
||||
# ADR-002: TUN Shim as Separate Process
|
||||
|
||||
## Status
|
||||
Superseded by ADR-014
|
||||
|
||||
## Context
|
||||
TUN interface creation requires root privileges or `CAP_NET_ADMIN` on Linux, Administrator on Windows, or platform-specific VPN APIs on macOS/iOS/Android. If the core alknet binary required these privileges, the attack surface of root-required code would include the entire SSH implementation, key handling, and transport negotiation.
|
||||
|
||||
The primary use cases (SOCKS5 proxy, port forwarding) need no privileges at all. Only the "route all traffic through TUN" use case needs root.
|
||||
|
||||
## Decision
|
||||
The TUN functionality is a separate `alknet-tun` binary that:
|
||||
1. Creates a TUN device (requires root / CAP_NET_ADMIN)
|
||||
2. Reads IP packets from it
|
||||
3. Forwards each connection to the core alknet's SOCKS5 port (127.0.0.1:1080)
|
||||
4. Proxies bytes between TUN packets and SOCKS5 connections
|
||||
|
||||
The core `alknet connect` binary never needs root. The `alknet-tun` binary is ~200-500 lines and does nothing except TUN ↔ SOCKS5 forwarding.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Root-required code surface is tiny and auditable.
|
||||
- **Positive**: Core binary runs unprivileged. SOCKS5 and port forwarding work without any special permissions.
|
||||
- **Positive**: TUN process can crash without affecting the SSH session (it just reconnects to SOCKS5).
|
||||
- **Positive**: Matches the proven tun2proxy architecture.
|
||||
- **Negative**: Two processes to manage instead of one. Requires process supervision (systemd, etc.).
|
||||
- **Negative**: SOCKS5 adds a small latency overhead vs. direct TUN → SSH packet routing. This is acceptable for the security benefit.
|
||||
|
||||
## References
|
||||
- [tun-shim.md](../tun-shim.md)
|
||||
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — proven architecture for TUN → SOCKS5 proxy
|
||||
@@ -1,31 +0,0 @@
|
||||
# ADR-003: iroh Stream via tokio::io::join
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
iroh's QUIC implementation provides separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`) for each bidirectional channel opened via `open_bi()` / `accept_bi()`. russh's `connect_stream()` and `run_stream()` require a single type implementing both `AsyncRead` and `AsyncWrite`.
|
||||
|
||||
Options considered:
|
||||
1. `tokio::io::join(recv, send)` — Combines the two halves into `Join<RecvStream, SendStream>` which implements both traits.
|
||||
2. Custom `IrohStream` wrapper — A struct with `recv` and `send` fields that delegates `AsyncRead` to `recv` and `AsyncWrite` to `send`.
|
||||
3. Using iroh's `Connection` directly — Opening a new `open_bi()` for each SSH channel instead of running SSH over a single stream.
|
||||
|
||||
## Decision
|
||||
Use `tokio::io::join(recv_stream, send_stream)` (Option 1).
|
||||
|
||||
One line of code, correct trait implementations, no custom types needed. The `Join<A, B>` type implements `AsyncRead` using `A` and `AsyncWrite` using `B`, which maps directly to iroh's split stream model.
|
||||
|
||||
If profiling later shows overhead (unlikely — it's just method dispatch), we can switch to a custom wrapper. But YAGNI until demonstrated.
|
||||
|
||||
Option 3 was rejected because it would require modifying russh to understand iroh connections. The whole point of the transport trait is that SSH doesn't know about iroh.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Minimal code. One line to bridge iroh and russh.
|
||||
- **Positive**: No custom types to maintain.
|
||||
- **Positive**: Correct `AsyncRead` + `AsyncWrite` behavior — `Poll::Pending` on one half doesn't affect the other.
|
||||
- **Negative**: None identified. The `Join` type is a standard tokio combinator with well-tested semantics.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,28 +0,0 @@
|
||||
# ADR-004: SSH Runs Over Transport, Not Alongside
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
There are two ways to structure the relationship between SSH and the transport layer:
|
||||
|
||||
1. **SSH over transport**: The transport produces one duplex stream. The entire SSH session (handshake, key exchange, channel multiplexing) runs over that single stream via `connect_stream()` / `run_stream()`. SSH has no direct network access.
|
||||
|
||||
2. **Transport alongside SSH**: SSH manages its own TCP connections via `connect()` / `run()`. The transport layer is an additional feature that wraps outgoing connections. SSH knows about the network.
|
||||
|
||||
## Decision
|
||||
SSH runs over the transport (Option 1). The SSH layer never opens its own sockets or knows what transport it's on.
|
||||
|
||||
This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs, which accept any type implementing `AsyncRead + AsyncWrite + Unpin + Send`. SSH's entire interaction with the network goes through the single stream produced by the transport.
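The layering can be sketched in synchronous form with only the standard library. Everything below is an assumption made for illustration: the shape of the `Transport` trait, the `MemoryStream` and `MockTransport` types, and the `exchange_ident` function, which stands in for the SSH layer and only exchanges identification lines. The real trait is async and russh does far more.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical synchronous stand-in for the `Transport` trait:
/// each `connect()` call yields one duplex byte stream.
pub trait Transport {
    type Stream: Read + Write;
    fn connect(&self) -> io::Result<Self::Stream>;
}

/// In-memory duplex stream: reads come from `incoming`, writes land in `outgoing`.
pub struct MemoryStream {
    pub incoming: Cursor<Vec<u8>>,
    pub outgoing: Vec<u8>,
}

impl Read for MemoryStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.incoming.read(buf)
    }
}

impl Write for MemoryStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.outgoing.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Mock transport that replays canned peer bytes on every connection.
pub struct MockTransport {
    pub peer_bytes: Vec<u8>,
}

impl Transport for MockTransport {
    type Stream = MemoryStream;

    fn connect(&self) -> io::Result<MemoryStream> {
        Ok(MemoryStream {
            incoming: Cursor::new(self.peer_bytes.clone()),
            outgoing: Vec::new(),
        })
    }
}

/// Stand-in for the SSH layer: it sees only a stream, never a socket.
/// Sends our identification line and returns the peer's.
pub fn exchange_ident<S: Read + Write>(stream: &mut S, ours: &str) -> io::Result<String> {
    stream.write_all(ours.as_bytes())?;
    stream.write_all(b"\r\n")?;
    let mut line = Vec::new();
    let mut byte = [0u8; 1];
    // Read byte by byte until LF so nothing past the line is consumed.
    while stream.read(&mut byte)? == 1 {
        if byte[0] == b'\n' {
            break;
        }
        line.push(byte[0]);
    }
    if line.last() == Some(&b'\r') {
        line.pop();
    }
    String::from_utf8(line).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

`exchange_ident` is generic over the stream type, so swapping `MockTransport` for a TCP, TLS, or iroh transport would not change it.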
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Adding a new transport requires implementing the `Transport` trait, not modifying SSH code.
|
||||
- **Positive**: Testing is straightforward — mock transports produce in-memory streams.
|
||||
- **Positive**: Security audit is clean — the SSH implementation has no network-facing code.
|
||||
- **Positive**: The transport can be layered. Iroh connecting through a SOCKS5 proxy (which itself tunnels through alknet) is just a transport that calls out to a SOCKS5 library before establishing the QUIC connection.
|
||||
- **Negative**: SSH keepalive and reconnection must be handled at the transport level. If the transport stream dies, the SSH session dies. Reconnection means establishing a new transport + new SSH session. There's no "SSH reconnects over the same transport" — you get a new session.
|
||||
- **Negative**: Multiple SSH sessions over the same iroh connection require the iroh `Endpoint` (not stream) to be shared between sessions. The transport trait produces one stream per `connect()` call. The iroh `Endpoint` must be created externally and shared. (The `IrohTransport` struct holds an `Arc<Endpoint>`.)
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
|
||||
@@ -1,39 +0,0 @@
|
||||
# ADR-005: SOCKS5 as Primary Interface, TUN as Add-on
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
A "VPN-like" tool needs to route traffic. There are three approaches:
|
||||
|
||||
1. **TUN only**: Create a TUN interface, route all OS traffic through it. Full VPN experience but requires root.
|
||||
2. **SOCKS5 only**: Local SOCKS5 proxy. Applications configure proxy settings. No root needed but application support varies.
|
||||
3. **SOCKS5 primary, TUN add-on**: SOCKS5 is the core interface. TUN forwards to SOCKS5.
|
||||
|
||||
## Decision
|
||||
SOCKS5 is the primary interface. TUN is a separate process that forwards to SOCKS5 (Option 3).
|
||||
|
||||
SOCKS5 is the core because:
|
||||
- It requires no privileges
|
||||
- `curl --socks5-hostname` works everywhere
|
||||
- Browsers, most CLI tools, and many applications support SOCKS5
|
||||
- SOCKS5h prevents DNS leaks by resolving names server-side
|
||||
- It's the interface that the NAPI wrapper and pubsub adapter build on
|
||||
- TUN is only needed for "route all traffic" use cases, which are a subset of users
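The DNS-leak point comes down to one byte in the SOCKS5 CONNECT request. As a rough sketch, the request below follows the RFC 1928 layout with address type `0x03` (domain name), so the hostname travels to the proxy unresolved. The function name `socks5_connect_request` is hypothetical and is not taken from the alknet code.

```rust
/// Build a SOCKS5 CONNECT request (RFC 1928) that carries the hostname itself
/// (address type 0x03), so name resolution happens on the proxy side.
/// Returns `None` if the hostname cannot be encoded.
pub fn socks5_connect_request(host: &str, port: u16) -> Option<Vec<u8>> {
    // The domain length is a single byte, so the name must be 1..=255 bytes long.
    if host.is_empty() || host.len() > 255 {
        return None;
    }
    let mut req = Vec::with_capacity(7 + host.len());
    req.push(0x05); // VER: SOCKS version 5
    req.push(0x01); // CMD: CONNECT
    req.push(0x00); // RSV: reserved, always zero
    req.push(0x03); // ATYP: domain name (0x01 would be an IPv4 address)
    req.push(host.len() as u8); // length prefix of the domain name
    req.extend_from_slice(host.as_bytes());
    req.extend_from_slice(&port.to_be_bytes()); // port in network byte order
    Some(req)
}
```

A client that resolved the name locally would send `ATYP` `0x01` or `0x04` with an IP address instead, and the lookup would be visible on the local network.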
|
||||
|
||||
TUN forwards to SOCKS5 rather than directly to SSH because:
|
||||
- The SOCKS5 code already handles TCP connection establishment and bidirectional proxying
|
||||
- TUN's job is just IP packet → SOCKS5 connection, not IP packet → SSH channel
|
||||
- The `alknet-tun` binary stays minimal (~200-500 lines)
|
||||
- No root code in the core binary
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Core binary is root-free. TUN functionality is provided by the external `tun2proxy` tool (ADR-014).
|
||||
- **Positive**: SOCKS5 is testable without TUN — just `curl` against it.
|
||||
- **Positive**: The TUN approach is validated by tun2proxy, a well-tested existing tool. No custom TUN code to maintain.
|
||||
- **Negative**: VPN-like behavior requires running `tun2proxy` alongside `alknet connect` — two processes instead of one integrated binary.
|
||||
- **Negative**: SOCKS5 doesn't capture UDP (except DNS via SOCKS5h). TUN mode via tun2proxy handles this separately.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [tun-shim.md](../tun-shim.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-006: No Logging of Tunnel Destinations
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
An SSH tunnel server sees every destination that clients connect to — hostnames, IP addresses, port numbers. This is extremely sensitive information. Logging it creates:
|
||||
|
||||
- **Privacy risks**: Tunnel destinations reveal what services users access (internal databases, APIs, etc.)
|
||||
- **Legal concerns**: Server operators may be pressured to produce logs showing what clients accessed
|
||||
- **Data retention liability**: Stored destination logs are an attack surface (data breaches, subpoenas)
|
||||
|
||||
However, the server does need to log some information for operational purposes — particularly for fail2ban integration to detect and block abusive connections.
|
||||
|
||||
## Decision
|
||||
The server does NOT log:
|
||||
- `channel_open_direct_tcpip` destinations (host, port)
|
||||
- DNS resolutions performed by the server on behalf of clients
|
||||
- Bytes transferred through tunnel channels
|
||||
- Connection duration or throughput
|
||||
|
||||
The server DOES log (ADR-013):
|
||||
- Auth attempts (remote_addr, user, key_fingerprint, accept/reject)
|
||||
- Connection opened (remote_addr, transport kind)
|
||||
- Connection closed (remote_addr, duration)
|
||||
|
||||
This separation ensures fail2ban has enough data to detect abusive IPs while destination privacy is maintained.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Tunnel destinations are never written to disk or any observable log. This is the same guarantee OpenSSH makes with `LogLevel VERBOSE` or below.
|
||||
- **Positive**: Reduces legal and privacy exposure for server operators.
|
||||
- **Positive**: fail2ban can still work — it needs source IPs and auth failures, not destinations.
|
||||
- **Negative**: Server operators cannot audit what destinations clients are accessing. If an operator needs this for compliance, they must implement it outside alknet (e.g., network-level logging at the target host).
|
||||
- **Negative**: Debugging connectivity issues is harder without destination logs. Mitigated by client-side logging (the client knows what it's connecting to).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [ADR-013](013-fail2ban-friendly-logging.md) — what the server does log
|
||||
@@ -1,26 +0,0 @@
|
||||
# ADR-007: NAPI Exposes Single Duplex Stream
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper for alknet could expose different granularity levels:
|
||||
|
||||
1. **Full SSH API**: Expose channel multiplexing, `open_direct_tcpip`, `tcpip_forward`, session management. The TypeScript layer would manage channels.
|
||||
2. **Single duplex stream**: The NAPI wrapper establishes one SSH channel and returns it as a Node.js `Duplex` stream. TypeScript multiplexing (if needed) happens at the pubsub layer.
|
||||
|
||||
## Decision
|
||||
Option 2: NAPI exposes a single duplex stream.
|
||||
|
||||
The NAPI wrapper's job is to get a reliable, authenticated byte stream from A to B. It handles transport (TCP/TLS/iroh), SSH authentication, and channel setup, then hands the caller a single `Duplex` stream that just works.
|
||||
|
||||
If the TypeScript consumer needs multiplexing (e.g., multiple concurrent tool calls over operations), pubsub handles that at the `EventEnvelope` level. Multiple `call.requested` / `call.responded` events flow over the same stream, distinguished by their `id` fields. This is how the existing WebSocket adapter works.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Minimal NAPI surface — one function, one return type. Small binary, small FFI boundary.
|
||||
- **Positive**: The TypeScript side doesn't need to understand SSH at all. It gets a stream and sends/receives `EventEnvelope` JSON.
|
||||
- **Positive**: No need to expose russh types in NAPI. The SSH complexity stays in Rust.
|
||||
- **Negative**: If a consumer wants multiple isolated channels (e.g., one for events, one for file transfer), they'd need multiple `connect()` calls (multiple SSH sessions). This is acceptable for the expected use case (pubsub events over a single stream).
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-008: ACME/Let's Encrypt Certificate Provisioning
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.
|
||||
|
||||
There are two ACME flows:
|
||||
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
|
||||
2. **IP-based**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain needed, but cert is short-lived (days, not months). Simpler for quick setups but requires the ACME client to run continuously.
|
||||
|
||||
Both flows are important for alknet's usability. Without ACME, TLS mode requires manual cert setup — a significant barrier for users who want "SSH over port 443" for censorship resistance.
|
||||
|
||||
## Decision
|
||||
Support both ACME certificate provisioning paths:
|
||||
|
||||
1. **Domain-based ACME** (`--acme-domain <domain>`): Standard certbot-style flow. Certificate is domain-bound, auto-renews. The server runs a challenge responder (HTTP-01 on port 80 or TLS-ALPN-01 on port 443) during certificate issuance/renewal.
|
||||
|
||||
2. **IP-based ACME**: Short-lived certs for servers without a domain. Uses TLS-ALPN-01 challenge on port 443. Lower burden but certs expire frequently.
|
||||
|
||||
3. **Manual certs** (`--tls-cert` / `--tls-key`): Always supported for users with existing certificates or specific PKI setups.
|
||||
|
||||
The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME client) to avoid an external certbot dependency. This keeps alknet self-contained as a single binary.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Users can run `alknet serve --transport tls --acme-domain example.com` and get working TLS with zero manual cert management.
|
||||
- **Positive**: IP-based ACME covers the quick-setup case without requiring a domain.
|
||||
- **Positive**: Consistent with our production infrastructure (certbot + Let's Encrypt is already our standard).
|
||||
- **Negative**: ACME adds complexity to the server binary (challenge responder, cert store, renewal timer).
|
||||
- **Negative**: IP-based short-lived certs require more frequent renewal handling.
|
||||
- **Negative**: Binary size increases with ACME support (rustls-acme dependency). Consider making ACME a feature flag (`acme`).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-01](../open-questions.md) — resolved by this ADR
|
||||
- [OQ-07](../open-questions.md) — resolved by this ADR
|
||||
- Production certbot setup: [certbot.md](../../research/ops/certbot.md)
|
||||
@@ -1,28 +0,0 @@
|
||||
# ADR-009: Default iroh Relay with Override
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
iroh requires a relay server for NAT traversal and initial connection establishment. The n0 project provides free relay servers (`https://relay.iroh.network/`) that work out of the box. However, relying on a third-party service creates a dependency:
|
||||
|
||||
- n0's relay could change terms, rate-limit, or go down
|
||||
- Production deployments may want self-hosted relays for reliability and privacy
|
||||
- The relay URL is a configuration point that should be explicit
|
||||
|
||||
Conversely, requiring users to set up a relay server before they can use iroh transport is a significant friction point for testing and quick starts.
|
||||
|
||||
## Decision
|
||||
Default to n0's relay servers. Allow override via `--iroh-relay <url>` CLI flag. Document self-hosted relay setup in project documentation.
|
||||
|
||||
This matches iroh's own defaults — n0's relay is the standard starting point. Users who need production reliability self-host.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Zero-config iroh transport for testing and development. `alknet serve --transport iroh` just works.
|
||||
- **Positive**: Self-hosting is a single flag override, not a complex setup requirement.
|
||||
- **Negative**: Default depends on n0's infrastructure. If n0's relay is down, default iroh connections fail (but this is the same experience as every iroh user).
|
||||
- **Negative**: Privacy-conscious users must remember to `--iroh-relay` to avoid n0. Mitigated by documentation.
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-02](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,33 +0,0 @@
|
||||
# ADR-010: Transport Chaining in CLI
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
Transport chaining allows combining iroh with an upstream proxy, e.g.:
|
||||
|
||||
```bash
|
||||
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
This routes iroh's outbound TCP connections through a SOCKS5 proxy, which could itself be another alknet instance. This is important for:
|
||||
- Nested tunnel topologies
|
||||
- Environments where iroh needs to go through an existing proxy
|
||||
- Composing transports in flexible ways
|
||||
|
||||
iroh's `Endpoint::builder` supports proxy configuration natively. The implementation is straightforward — pass the proxy URL to iroh's builder.
|
||||
|
||||
## Decision
|
||||
Support `--transport iroh --proxy socks5://...` natively in the CLI. This works because iroh's endpoint builder accepts a proxy configuration, so the implementation is minimal: parse the proxy URL and pass it to the endpoint builder.
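The "parse the proxy URL" step might look like the sketch below. `ProxyAddr` and `parse_socks5_proxy` are hypothetical names, and the sketch makes simplifying assumptions: only the `socks5://` scheme is accepted, credentials in the URL are not handled, and a bracketed IPv6 host is returned with its brackets intact. How the parsed value is handed to iroh's endpoint builder is not shown.

```rust
/// Hypothetical parsed form of a `--proxy` value.
#[derive(Debug, PartialEq)]
pub struct ProxyAddr {
    pub host: String,
    pub port: u16,
}

/// Parse `socks5://host:port` into its parts.
pub fn parse_socks5_proxy(url: &str) -> Result<ProxyAddr, String> {
    let rest = url
        .strip_prefix("socks5://")
        .ok_or_else(|| format!("unsupported proxy scheme in {:?}", url))?;
    // Split on the last colon so that the port is always the final component.
    let (host, port) = rest
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port in {:?}", url))?;
    if host.is_empty() {
        return Err(format!("missing host in {:?}", url));
    }
    let port: u16 = port
        .parse()
        .map_err(|_| format!("invalid port in {:?}", url))?;
    Ok(ProxyAddr {
        host: host.to_string(),
        port,
    })
}
```

Rejecting unknown schemes up front keeps a typo such as `sock5://` from silently producing a direct connection.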
|
||||
|
||||
For other transport combinations (TCP+TLS is already implicit — TLS wraps TCP), the `--proxy` flag applies to outbound connections from the SSH client or iroh endpoint.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Flexible transport composition without requiring separate manual configuration.
|
||||
- **Positive**: Matches user expectation from the overview doc's transport chaining example.
|
||||
- **Positive**: Implementation is minimal — iroh already supports proxy config.
|
||||
- **Negative**: Slightly more CLI surface area (`--proxy` interaction with `--transport`).
|
||||
|
||||
## References
|
||||
- [transport.md](../transport.md)
|
||||
- [OQ-05](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,38 +0,0 @@
|
||||
# ADR-011: Programmatic-First API, No File-Based Config
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The client and server both need configuration (host addresses, keys, transport options, etc.). There are several approaches:
|
||||
|
||||
1. **Read `~/.ssh/config`**: Parse OpenSSH config for default host/key/port. Reduces CLI verbosity for frequent connections.
|
||||
2. **Custom config file**: Alknet-specific config file (TOML/YAML) with host definitions.
|
||||
3. **Programmatic API only**: Configuration comes from CLI flags or the library API. No file parsing. `~/.ssh/` path conventions are cross-platform trouble (`~` expansion, Windows paths, etc.).
|
||||
4. **Hybrid**: `--config` flag pointing to an alknet-specific config file, but no OpenSSH config parsing.
|
||||
|
||||
## Decision
|
||||
Option 3: Programmatic-first API. Configuration is provided via:
|
||||
- **CLI**: explicit flags (`--server`, `--identity`, `--transport`, etc.)
|
||||
- **Library API**: `alknet_core::client::ConnectOptions` and `alknet_core::server::ServeOptions` structs, constructable programmatically
|
||||
- **Environment variables**: for a few convenience defaults (e.g., `ALKNET_SERVER`, `ALKNET_IDENTITY`)
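The three sources above could be combined roughly as follows. This is a sketch under stated assumptions: the field set of `ConnectOptions` shown here is invented for illustration (the real struct lives in `alknet_core::client` and has more fields), `resolve_connect_options` is a hypothetical helper, and "explicit flag wins over environment variable" is the assumed precedence.

```rust
/// Illustrative subset of connection options; the real struct has more fields.
#[derive(Debug, PartialEq)]
pub struct ConnectOptions {
    pub server: String,
    pub identity: Option<String>,
}

/// Resolve options from explicit values (CLI flags or library callers),
/// falling back to environment variables. `env` is injected so the lookup
/// can be replaced in tests.
pub fn resolve_connect_options<F>(
    explicit_server: Option<String>,
    explicit_identity: Option<String>,
    env: F,
) -> Result<ConnectOptions, String>
where
    F: Fn(&str) -> Option<String>,
{
    // An explicit value always wins; the environment only supplies defaults.
    let server = explicit_server
        .or_else(|| env("ALKNET_SERVER"))
        .ok_or_else(|| "no server given: pass --server or set ALKNET_SERVER".to_string())?;
    let identity = explicit_identity.or_else(|| env("ALKNET_IDENTITY"));
    Ok(ConnectOptions { server, identity })
}
```

A CLI would pass `|key| std::env::var(key).ok()` as `env`; a library consumer such as the NAPI wrapper can skip this helper and build the struct directly, with no file I/O involved.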
|
||||
|
||||
No `~/.ssh/config` parsing, no alknet-specific config files. This approach:
|
||||
- Avoids cross-platform path issues (`~` expansion, Windows `USERPROFILE`, etc.)
|
||||
- Makes the library API clean and straightforward for programmatic consumers (NAPI wrapper, pubsub)
|
||||
- Keeps the CLI simple and explicit — no hidden behavior from config files
|
||||
- Matches the design principle that the library crate (`alknet-core`) is the primary interface
|
||||
|
||||
If users want config-file behavior in the future, it can be added as a separate layer that populates the options structs. But the core doesn't need to know about files.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Clean library API — `ConnectOptions` and `ServeOptions` are plain Rust structs.
|
||||
- **Positive**: No cross-platform path issues in the core library.
|
||||
- **Positive**: Explicit CLI — no hidden settings from a config file the user forgot about.
|
||||
- **Positive**: NAPI wrapper can construct options programmatically without file I/O.
|
||||
- **Negative**: Users must type full connection flags each time. Mitigated by shell aliases or environment variables.
|
||||
- **Negative**: No config file convenience. Users accustomed to `~/.ssh/config` may find this inconvenient.
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [OQ-06](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,42 +0,0 @@
|
||||
# ADR-012: Ed25519 Keys + OpenSSH Certificate Authority, No Password Auth
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
SSH authentication has several options:
|
||||
- **Ed25519 public key**: The default, already specified. Each user has a keypair; the server has an `authorized_keys` file.
|
||||
- **Password authentication**: Convenient for quick setups but less secure (susceptible to brute force, credential reuse).
|
||||
- **OpenSSH certificate authority (cert-authority)**: A CA signs user certificates. The server trusts the CA instead of individual keys. Much easier for multi-user setups — add one CA line to `authorized_keys` instead of every user's public key. Also supports certificate expiry and restrictions.
|
||||
|
||||
The question is which auth methods to support and prioritize.
|
||||
|
||||
## Decision
|
||||
|
||||
**Primary: Ed25519 public key** (already specified, no change).
|
||||
|
||||
**Important: OpenSSH certificate authority**. Support `cert-authority` entries in `authorized_keys` files. When a user presents a certificate signed by a trusted CA, the server validates the certificate (signature, expiry, permissions) and accepts it. This is critical for multi-user deployments where managing individual keys is impractical.
|
||||
|
||||
**Not supported: Password authentication over SSH channels.** Password auth over an SSH tunnel (i.e., the SOCKS5 proxy requiring a password) is not in scope. Password auth over SSH itself is rejected because:
|
||||
- It's less secure than key-based auth
|
||||
- It's susceptible to brute force (fail2ban can mitigate, but keys eliminate the problem)
|
||||
- It's not needed when cert-authority provides easy multi-user management
|
||||
- If a local SOCKS5 proxy is desired with its own auth, that's a separate concern
|
||||
|
||||
The server's `authorized_keys` file format follows OpenSSH conventions:
|
||||
- Regular keys: `ssh-ed25519 AAAA... user@host`
|
||||
- CA trusts: `cert-authority ssh-ed25519 AAAA... CA name`
|
||||
- CA trusts with additional options: `cert-authority,permit-port-forwarding ssh-ed25519 AAAA... CA name`
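Telling the entry kinds above apart is mostly a matter of looking at the first field. The sketch below is a simplified, hypothetical parser (`Entry` and `parse_entry` are not alknet names). It assumes key types start with `ssh-`, and it does not handle quoted option values that contain spaces or commas; it performs no certificate validation.

```rust
/// One parsed `authorized_keys` line (simplified).
#[derive(Debug, PartialEq)]
pub struct Entry {
    /// True when the options include `cert-authority`.
    pub is_cert_authority: bool,
    /// Remaining options, with `cert-authority` removed.
    pub options: Vec<String>,
    pub key_type: String,
    pub key_data: String,
}

/// Parse a single line. Returns `None` for blank lines, comments,
/// and lines that are missing the key type or key data.
pub fn parse_entry(line: &str) -> Option<Entry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    let first = fields.next()?;
    // If the first field is not a key type, treat it as a comma-separated option list.
    let (raw_options, key_type) = if first.starts_with("ssh-") {
        (Vec::new(), first)
    } else {
        let opts: Vec<String> = first.split(',').map(String::from).collect();
        (opts, fields.next()?)
    };
    let key_data = fields.next()?.to_string();
    let is_cert_authority = raw_options.iter().any(|o| o == "cert-authority");
    let options = raw_options
        .into_iter()
        .filter(|o| o != "cert-authority")
        .collect();
    Some(Entry {
        is_cert_authority,
        options,
        key_type: key_type.to_string(),
        key_data,
    })
}
```

Anything after the key data (such as `user@host` or `CA name`) is a free-form comment and is ignored here.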
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Multi-user deployments are manageable — one CA entry instead of N key entries.
|
||||
- **Positive**: Certificates can carry expiry dates and restrictions (permit-port-forwarding, no-pty, source-address).
|
||||
- **Positive**: No password brute force risk. fail2ban still needed for connection-level abuse, but not for auth-level password guessing.
|
||||
- **Positive**: `russh` supports OpenSSH certificate verification natively.
|
||||
- **Negative**: Setting up a CA requires initial key management tooling (`ssh-keygen -s`).
|
||||
- **Negative**: Users who want a quick "just let me in" experience need to generate keys first. Not a significant barrier for the target audience (self-hosting, ops).
|
||||
|
||||
## References
|
||||
- [client.md](../client.md)
|
||||
- [server.md](../server.md)
|
||||
- [OQ-04](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,39 +0,0 @@
|
||||
# ADR-013: Fail2ban-Friendly Server Logging
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
|
||||
|
||||
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
|
||||
|
||||
## Decision
|
||||
The server logs connection and authentication events at `INFO` level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
|
||||
|
||||
**Logging** (for fail2ban integration on Linux):
|
||||
- Log auth attempts: `level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>`
|
||||
- Log new connections: `level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>`
|
||||
- Log disconnections: `level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>`
|
||||
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
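One way to make the "do NOT log" rule hard to violate is to give the logging code no field in which a destination could be stored. The sketch below assumes the comma-separated layout shown in the bullets is the literal line format; `LogEvent`, `AuthResult`, and `format_event` are hypothetical names used only for illustration.

```rust
use std::net::IpAddr;

pub enum AuthResult {
    Accept,
    Reject,
}

/// Every event the server may log. No variant has a field for tunnel
/// destinations, DNS lookups, or byte counts, so they cannot be logged here.
pub enum LogEvent {
    AuthAttempt {
        remote_addr: IpAddr,
        user: String,
        key_fingerprint: String,
        result: AuthResult,
    },
    ConnectionOpened {
        remote_addr: IpAddr,
        transport: String,
    },
    ConnectionClosed {
        remote_addr: IpAddr,
        duration_secs: u64,
    },
}

/// Render an event as one log line with the fields listed above.
pub fn format_event(event: &LogEvent) -> String {
    match event {
        LogEvent::AuthAttempt { remote_addr, user, key_fingerprint, result } => {
            let result = match result {
                AuthResult::Accept => "accept",
                AuthResult::Reject => "reject",
            };
            format!(
                "level=INFO, msg=\"auth attempt\", remote_addr={}, user={}, key_fingerprint={}, result={}",
                remote_addr, user, key_fingerprint, result
            )
        }
        LogEvent::ConnectionOpened { remote_addr, transport } => format!(
            "level=INFO, msg=\"connection opened\", remote_addr={}, transport={}",
            remote_addr, transport
        ),
        LogEvent::ConnectionClosed { remote_addr, duration_secs } => format!(
            "level=INFO, msg=\"connection closed\", remote_addr={}, duration={}",
            remote_addr, duration_secs
        ),
    }
}
```

A fail2ban filter would then match on `msg="auth attempt"` with `result=reject` and capture `remote_addr`.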
|
||||
|
||||
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
|
||||
|
||||
**Built-in rate limiting** (for all platforms):
|
||||
- `--max-connections-per-ip <n>` (default: 0 = unlimited) — reject new connections from an IP that already has N active connections
|
||||
- `--max-auth-attempts <n>` (default: 10) — disconnect after N failed auth attempts from one connection
|
||||
- Rate limiting happens at the SSH layer, before channels are opened
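The per-IP limit behind `--max-connections-per-ip` amounts to a counter per source address. The following is a minimal single-threaded sketch; `ConnectionLimiter` is a hypothetical type, and a real server would need to share it across tasks (for example behind a mutex) and call `release` when a connection closes.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP.
pub struct ConnectionLimiter {
    /// Maximum active connections per IP; 0 means unlimited.
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        ConnectionLimiter {
            max_per_ip,
            active: HashMap::new(),
        }
    }

    /// Returns true and records the connection if it is allowed,
    /// or false if this IP is already at its limit.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Record that one connection from `ip` has closed.
    pub fn release(&mut self, ip: IpAddr) {
        let remaining = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        // Drop idle entries so the map does not grow without bound.
        if remaining == 0 {
            self.active.remove(&ip);
        }
    }
}
```

The `--max-auth-attempts` limit is per connection rather than per IP, so it would be a plain counter on the session and is not shown.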
|
||||
|
||||
This ensures that even without fail2ban, the server rejects obviously abusive connections.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
|
||||
- **Positive**: Built-in rate limiting provides protection on platforms without fail2ban.
|
||||
- **Positive**: No privacy-sensitive data in logs (no tunnel destinations).
|
||||
- **Negative**: Slightly more code in the server for connection tracking per IP.
|
||||
- **Negative**: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).
|
||||
|
||||
## References
|
||||
- [server.md](../server.md)
|
||||
- [OQ-08](../open-questions.md) — resolved by this ADR
|
||||
- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)
|
||||
@@ -1,41 +0,0 @@
|
||||
# ADR-014: Defer TUN Implementation, Recommend Local SOCKS5 + tun2proxy
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The original plan included a TUN shim (`alknet-tun`) as Phase 3 — a separate root-requiring process that creates a TUN device and forwards IP packets through alknet's SOCKS5 port. This would provide VPN-like "route all traffic" behavior.
|
||||
|
||||
However, TUN implementation has significant complexities:
|
||||
- Platform differences (Linux TUN, macOS utun, Windows wintun.dll)
|
||||
- TCP reconstruction in userspace (smoltcp or tun2proxy's ip-stack)
|
||||
- Virtual DNS handling
|
||||
- Root/CAP_NET_ADMIN requirements
|
||||
- TUN is easy to get wrong and hard to debug
|
||||
|
||||
The core SOCKS5 interface already works for the vast majority of use cases. For users who truly need VPN-like "route all traffic" behavior, `tun2proxy` is an existing, well-tested tool that does exactly this: creates a TUN device and routes traffic through a SOCKS5 proxy.
|
||||
|
||||
## Decision
|
||||
Defer TUN implementation entirely. Remove `alknet-tun` from the architecture. Instead:
|
||||
|
||||
1. **Core interface**: alknet's local SOCKS5 proxy (always available, no root required)
|
||||
2. **VPN-like behavior**: Users who need it run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside `alknet connect`
|
||||
3. **Documentation**: Recommend tun2proxy in the README/wiki for "route all traffic" use cases
|
||||
|
||||
This removes TUN from the project scope while still providing a path to VPN-like behavior. If demand justifies it later, `alknet-tun` can be added as a thin wrapper around tun2proxy's pattern.
|
||||
|
||||
The `tun` feature flag and `alknet-tun` binary are removed from the architecture. The `tun-rs` dependency is removed.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Significantly reduces project scope and complexity. No TUN code to write, test, or maintain across platforms.
|
||||
- **Positive**: tun2proxy is already well-tested for this exact use case.
|
||||
- **Positive**: Core binary remains unprivileged. No root code anywhere in the project.
|
||||
- **Positive**: Cleaner architecture — alknet only does SSH tunneling + SOCKS5. tun2proxy does TUN.
|
||||
- **Negative**: Users need two tools instead of one for VPN-like behavior. Mitigated by documentation.
|
||||
- **Negative**: tun2proxy is an external dependency in practice, though it's widely available in package managers.
|
||||
- **Negative**: No first-class Windows/macOS TUN story. tun2proxy handles these platforms but users need to install it separately.
|
||||
|
||||
## References
|
||||
- [tun-shim.md](../tun-shim.md) — this spec is now deprecated
|
||||
- [ADR-002](002-tun-separate-process.md) — superseded; TUN is no longer in scope
|
||||
- [ADR-005](005-socks5-before-tun.md) — SOCKS5 is still the primary interface; TUN forwarding is now external
|
||||
@@ -1,27 +0,0 @@
|
||||
# ADR-015: napi-rs for FFI Bridge
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs a Rust-to-Node.js bridge. Two main options:
|
||||
|
||||
1. **napi-rs**: The standard for Rust → Node.js native addons. Mature, well-documented, large ecosystem. Produces `.node` binaries for specific platforms. Good build tooling (`napi` CLI). Used by major projects (swc, rspack, biome).
|
||||
|
||||
2. **uniffi**: Mozilla's FFI bridge supporting multiple targets (Python, Swift, Kotlin, Node.js). Broader target reach but less mature for Node.js specifically. The Node.js binding is relatively new.
|
||||
|
||||
The primary consumer is TypeScript/Node.js — specifically the `@alkdev/pubsub` event target system. The broader alkdev ecosystem (pubsub, operations) is TypeScript-first. While future Python or mobile consumers are imaginable, they are not in scope.
|
||||
|
||||
## Decision
|
||||
Use napi-rs. It's the standard for Node.js native addons, has the best documentation and tooling, and matches our primary consumer (TypeScript/Node.js). If future Python or mobile consumers are needed, uniffi can be added as a separate FFI layer — the Rust core library doesn't change, only the binding layer does.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Best-in-class Node.js native addon support. Mature, well-documented, widely used.
|
||||
- **Positive**: `napi` CLI handles building, cross-compilation, and npm package publishing.
|
||||
- **Positive**: Async support via `napi-rs`'s `AsyncTask` and thread-safe functions.
|
||||
- **Negative**: Only targets Node.js. Python/Swift/Kotlin require a separate FFI bridge (uniffi or similar).
|
||||
- **Negative**: `.node` binaries are platform-specific. Need CI matrix for linux-x64, linux-arm64, macos-x64, macos-arm64, win32-x64.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [OQ-11](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,40 +0,0 @@
|
||||
# ADR-016: NAPI Exposes Both connect() and serve()
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
The NAPI wrapper needs to provide TypeScript/Node.js consumers with access to alknet's functionality. The primary use case is `@alkdev/pubsub`'s event target system, which needs both directions:
|
||||
|
||||
1. **connect()**: Establish a client connection to an alknet server. Used by workers/spokes that need to tunnel events through an alknet server.
|
||||
2. **serve()**: Start an alknet server from Node.js. Used by hubs that want to accept alknet connections and route events.
|
||||
|
||||
The previous decision (ADR-007) was to expose only `connect()` for MVP, deferring `serve()`. However, the pubsub integration requires both: a spoke needs `connect()` to reach a hub, and a hub could use `serve()` to accept connections without running a separate `alknet serve` process.
|
||||
|
||||
More importantly, both `connect()` and `serve()` are fundamental operations of the alknet library. Since the NAPI wrapper is a thin layer over `alknet-core`, exposing both is straightforward — they're just Rust functions behind `#[napi]` attributes.
|
||||
|
||||
## Decision
|
||||
The NAPI wrapper exposes both `connect()` and `serve()` from the start:
|
||||
|
||||
```typescript
|
||||
// @alkdev/alknet
|
||||
function connect(options: AlknetConnectOptions): Promise<Duplex>;
|
||||
function serve(options: AlknetServeOptions): Promise<AlknetServer>;
|
||||
```
|
||||
|
||||
- `connect()` returns a `Duplex` stream (as per ADR-007)
|
||||
- `serve()` returns an `AlknetServer` object with a `close()` method and events for new connections
|
||||
|
||||
The NAPI layer is payload-agnostic: it doesn't know about pubsub's `EventEnvelope`. The pubsub event target adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not just pubsub.
|
||||
|
||||
## Consequences
|
||||
- **Positive**: Pubsub can use both directions without running a separate binary for the server side.
|
||||
- **Positive**: The NAPI wrapper becomes a complete bridge — any Node.js process can be either a client or server.
|
||||
- **Positive**: Implementation is still minimal — `serve()` is just `alknet_core::server::run()` behind `#[napi]`.
|
||||
- **Negative**: Slightly larger API surface (two functions + `AlknetServer` type instead of just `connect()`).
|
||||
- **Negative**: Server-side NAPI needs to handle multiple concurrent connections, which adds complexity to `AlknetServer`.
|
||||
|
||||
## References
|
||||
- [napi-and-pubsub.md](../napi-and-pubsub.md)
|
||||
- [ADR-007](007-napi-single-stream.md) — still valid; NAPI exposes single streams, but now from both sides
|
||||
- [OQ-10](../open-questions.md) — resolved by this ADR
|
||||
@@ -1,30 +0,0 @@
# ADR-017: Stealth Mode — Protocol Multiplexing on Port 443

## Status

Accepted

## Context

When running an alknet server with TLS transport on port 443, the server should be indistinguishable from a regular HTTPS web server to port scanners and deep packet inspection (DPI) systems. This is important for censorship circumvention: if SSH traffic on port 443 is detectable, it can be blocked.

After the TLS handshake completes, the server sees a raw byte stream. SSH protocol identification starts with `SSH-2.0-`, while HTTP starts with HTTP method verbs (GET, POST, etc.). The server can inspect the first bytes to determine the protocol.

## Decision

When `--stealth` is enabled with TLS transport:

1. After completing the TLS handshake, peek at the first few bytes of the connection
2. If the connection starts with `SSH-2.0-`, proceed with SSH session via `server::run_stream()`
3. If the connection starts with anything else (HTTP, random data), respond with `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and close the connection

This makes the server appear as an nginx web server returning 404 errors to all non-SSH connections. Scanners and DPI systems see a typical HTTPS site with no SSH exposure.

The fake response uses `Server: nginx` headers to match the most common web server profile.
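
The branch in steps 1 to 3 can be sketched as a pure function over the peeked bytes. This is a minimal illustration, not the code behind `server::run_stream()`: the names `Sniffed`, `sniff`, and `DECOY_RESPONSE` are hypothetical, and the `NeedMore` state is an assumption added to cover the case where fewer than eight bytes have arrived.

```rust
/// Result of inspecting the first bytes of the decrypted TLS stream.
/// (Hypothetical type, not part of alknet-core.)
#[derive(Debug, PartialEq)]
pub enum Sniffed {
    /// The peer sent the full SSH identification prefix.
    Ssh,
    /// Fewer than 8 bytes so far, and all of them match the prefix.
    NeedMore,
    /// Anything else (HTTP, random probes): send the decoy and close.
    Decoy,
}

const SSH_PREFIX: &[u8] = b"SSH-2.0-";

/// The decoy response quoted in the decision above.
pub const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

/// Classify a connection from the bytes peeked so far.
pub fn sniff(peeked: &[u8]) -> Sniffed {
    let n = peeked.len().min(SSH_PREFIX.len());
    if peeked[..n] != SSH_PREFIX[..n] {
        Sniffed::Decoy
    } else if n < SSH_PREFIX.len() {
        Sniffed::NeedMore
    } else {
        Sniffed::Ssh
    }
}
```

A caller would keep peeking while `sniff` returns `NeedMore`, hand the stream to the SSH session on `Ssh`, and write `DECOY_RESPONSE` before closing on `Decoy`.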

## Consequences

- **Positive**: TLS+alknet servers on port 443 are indistinguishable from ordinary HTTPS sites to automated scanners.
- **Positive**: Simple implementation: just peek at the first bytes and branch.
- **Positive**: Consistent with censorship circumvention best practices.
- **Negative**: Legitimate HTTPS traffic to the same port gets a 404. If the same IP needs to serve real web content, use a reverse proxy (nginx/haproxy) in front that routes by SNI or path.
- **Negative**: The `--stealth` flag only applies to TLS transport. It has no effect on TCP or iroh transports.

## References

- [server.md](../server.md)
@@ -1,38 +0,0 @@
# ADR-018: Control Channel for PubSub over SSH

## Status

Accepted

## Context

The NAPI wrapper and pubsub integration need a way to use alknet's SSH channel as a data plane for event routing. When an `alknet connect` client opens an SSH session to a server, the `direct_tcpip` channel type is used to reach specific TCP targets (host:port).

For the pubsub use case, the client needs a dedicated bidirectional stream to the server's event bus, not a TCP connection to an arbitrary host. There are several approaches:

1. **Special destination**: Use `direct_tcpip` with a reserved destination (e.g., `alknet-control:0`) that the server recognizes and routes internally instead of connecting to a TCP target.
2. **Port forwarding**: The server runs a pubsub hub on a specific port (e.g., 9736) and the client uses normal port forwarding (`-L 9736:hub:9736`).
3. **Custom channel type**: Define a new SSH channel type beyond `direct_tcpip` and `forwarded_tcpip`.

## Decision

Use approach 1: a reserved `direct_tcpip` destination string. When the server receives a `channel_open_direct_tcpip` request for `alknet-control:0`:

1. The `channel_open_direct_tcpip` handler detects the special target via string matching
2. Instead of connecting to a TCP target, it bridges the channel to the local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel

The destination string `alknet-control` is reserved. Regular TCP targets are real hostnames or IP addresses, so a collision is unlikely in practice.

Approach 2 (port forwarding to a specific port) is still supported as an alternative: the client can use `--forward 9736:localhost:9736` if the server runs a pubsub hub on that port. But the control channel approach is simpler and doesn't require a separate listening port.

Approach 3 (custom channel type) was rejected because russh's `direct_tcpip` handler is well-understood and adding custom channel types requires modifying russh.
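
The string match in the handler can be sketched as below. This is an illustrative assumption rather than the actual russh handler: `Destination` and `classify` are hypothetical names, the match ignores the port (the ADR only uses `0` by convention), and treating every other `alknet-` host as reserved follows the prefix reservation mentioned under Consequences.

```rust
/// Reserved control-channel host from this ADR.
pub const CONTROL_HOST: &str = "alknet-control";
/// Prefix namespace reserved for future internal destinations.
pub const RESERVED_PREFIX: &str = "alknet-";

/// What the server should do with a `direct_tcpip` open request.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
pub enum Destination<'a> {
    /// Bridge the channel to the local pubsub event bus.
    Control,
    /// Reserved name with no handler yet: reject instead of dialing out.
    Reserved(&'a str),
    /// Ordinary TCP target: connect as usual.
    Tcp { host: &'a str, port: u32 },
}

/// Classify the destination of a `direct_tcpip` channel request.
pub fn classify(host: &str, port: u32) -> Destination<'_> {
    if host == CONTROL_HOST {
        Destination::Control
    } else if host.starts_with(RESERVED_PREFIX) {
        Destination::Reserved(host)
    } else {
        Destination::Tcp { host, port }
    }
}
```

Keeping the two strings as named constants addresses the "magic constant" concern, since the client and server can share the same definitions.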

## Consequences

- **Positive**: Simple implementation: just string matching in the server's `channel_open_direct_tcpip` handler.
- **Positive**: No separate port or service needs to run on the server. The control channel is built into alknet.
- **Positive**: Compatible with the NAPI wrapper's single-duplex-stream model.
- **Positive**: Port forwarding to a specific port is still available as an alternative.
- **Negative**: The string `alknet-control` is a magic constant. It should be defined as a constant in the crate.
- **Negative**: A regular TCP destination whose hostname happens to be `alknet-control` would be misrouted. Mitigated by reserving the entire `alknet-` prefix namespace.

## References

- [napi-and-pubsub.md](../napi-and-pubsub.md)
- [server.md](../server.md)
@@ -1,42 +0,0 @@
# ADR-019: `--proxy` Has Different Semantics on Client vs Server

## Status

Accepted

## Context

The `--proxy` CLI flag appears on both `alknet connect` (client) and `alknet serve` (server), but the two sides proxy fundamentally different things:

- **Client**: `--proxy` routes the *transport connection* through the proxy. For example, `alknet connect --transport iroh --proxy socks5://127.0.0.1:1080` means the iroh endpoint's outbound TCP connections go through the specified SOCKS5 proxy before reaching the iroh relay. The proxy wraps the transport layer.

- **Server**: `--proxy` routes *outbound target connections* through the proxy. For example, `alknet serve --proxy socks5://127.0.0.1:9050` means when an SSH client opens a `direct_tcpip` channel to `db.internal:5432`, the server connects to that target through the specified proxy. The proxy wraps the data-plane connections.

Using the same flag name for both is intentional: from the user's perspective, both mean "route traffic through a proxy." But the layer at which the proxy operates differs, and this needs to be explicit so implementers don't confuse the two.

ADR-010 addressed transport chaining for the client side only. The server-side outbound proxy behavior has no ADR. This ADR documents both semantics and the rationale for sharing the flag name.

## Decision

The `--proxy` flag uses the same name on client and server, with documented different semantics:

| Side | Flag | What gets proxied | Example |
|------|------|-------------------|---------|
| Client | `--proxy` | Transport connection (outbound to server/relay) | `--transport iroh --proxy socks5://...` → iroh endpoint connects through proxy |
| Server | `--proxy` | Outbound target connections (data plane) | `--proxy socks5://...` → direct_tcpip targets reached through proxy |

On the **client**, `--proxy` affects the transport layer. It only applies to transports that make outbound TCP connections (iroh through a proxy, TLS through a proxy). For plain TCP transport, `--proxy` has no meaningful effect since the transport is already a direct TCP connection; use the SOCKS5 server instead.

On the **server**, `--proxy` affects the data plane. All `channel_open_direct_tcpip` outbound connections are routed through the proxy, regardless of transport mode.

This is not a naming collision; it's the same conceptual operation ("route through a proxy") at different layers. The shared name avoids forcing users to learn two proxy flags.
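
The per-side semantics can be summarized as a small lookup. This is only a sketch of the rule stated above: `Side`, `TransportMode`, `ProxyScope`, and `proxy_scope` are hypothetical names, not types from the CLI.

```rust
/// Which binary is interpreting `--proxy`. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Side {
    Client,
    Server,
}

/// Transport selected with `--transport`. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransportMode {
    Tcp,
    Tls,
    Iroh,
}

/// The layer at which the proxy takes effect.
#[derive(Debug, PartialEq)]
pub enum ProxyScope {
    /// Client: the connection to the server or relay goes through the proxy.
    TransportConnection,
    /// Server: every outbound `direct_tcpip` target goes through the proxy.
    TargetConnections,
    /// Client with plain TCP: the flag is effectively a no-op.
    NoEffect,
}

/// Resolve what `--proxy` means for a given side and transport.
pub fn proxy_scope(side: Side, transport: TransportMode) -> ProxyScope {
    match (side, transport) {
        // The server proxies the data plane regardless of transport mode.
        (Side::Server, _) => ProxyScope::TargetConnections,
        (Side::Client, TransportMode::Tcp) => ProxyScope::NoEffect,
        (Side::Client, _) => ProxyScope::TransportConnection,
    }
}
```

A CLI could use the `NoEffect` result to print a warning rather than silently ignoring the flag.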

## Consequences

- **Positive**: One flag name (`--proxy`) instead of two. Users already understand "proxy" as "route through this."
- **Positive**: Client-side proxy is a minimal implementation: iroh's endpoint builder accepts proxy config natively.
- **Positive**: Server-side proxy is straightforward: all outbound TCP from channel handlers goes through the proxy.
- **Negative**: Implementers must read the correct spec (client vs server) to understand what `--proxy` does for their side. This is mitigated by CLI help text that clearly describes the behavior per side.
- **Negative**: On the client, `--proxy` with `--transport tcp` is effectively a no-op (the transport is already a direct TCP connection to the server). The CLI should handle this case gracefully.

## References

- [ADR-010](010-transport-chaining-cli.md): client-side transport chaining
- [transport.md](../transport.md): transport layer spec
- [client.md](../client.md): client CLI
- [server.md](../server.md): server outbound proxy
@@ -1,85 +0,0 @@
# ADR-023: Unified Authentication with Shared Key Material

## Status

Accepted

## Context

Alknet currently authenticates connections exclusively through SSH public key
auth in the SSH handshake. This works for SSH-over-any-transport (TCP, TLS,
iroh) because SSH carries its own auth protocol. But WebTransport and other
HTTP-level transports cannot perform SSH key exchange: browsers speak HTTP/3,
not SSH.

Without unification, non-SSH transports would need a completely separate
identity system (API keys, JWTs, session tokens). This creates two problems:
(1) operators manage two key sets with two rotation mechanisms, and (2) the
same person connecting via SSH and WebTransport appears as two different
identities.

The `IdentityProvider` trait is needed to decouple alknet-core from any
specific identity storage (config file vs. database). Without it, alknet-core
would either hardcode config-file-based auth or take a database dependency;
neither is acceptable for a library crate.

## Decision

**Unified authentication**: The same Ed25519 key material (`authorized_keys`
and `cert_authorities`) is shared across both SSH auth and token auth. The
presentation differs per transport, but the verification result (an
`Identity` with scopes) is the same.

**Token auth for non-SSH transports**: WebTransport clients present a signed
timestamp token in the CONNECT request URL:

```
AuthToken = base64url(key_id || timestamp || signature)
key_id    = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
timestamp = Unix seconds, big-endian u64 (8 bytes)
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
```

Server extracts the fingerprint, looks it up in the same `authorized_keys`
set, verifies the signature, and checks the timestamp window (default ±300s).
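
The byte layout above can be checked with a short parsing sketch. It assumes the token has already been base64url-decoded and stops before signature verification, which would need an Ed25519 library; `RawToken`, `TokenError`, `parse_token`, and `check_window` are hypothetical names. The 64-byte signature length is the standard Ed25519 signature size, giving a 104-byte token.

```rust
pub const KEY_ID_LEN: usize = 32; // SHA-256 fingerprint
pub const TIMESTAMP_LEN: usize = 8; // big-endian u64, Unix seconds
pub const SIGNATURE_LEN: usize = 64; // Ed25519 signature
pub const TOKEN_LEN: usize = KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN;

/// Fields of a decoded token, borrowed from the input buffer.
#[derive(Debug, PartialEq)]
pub struct RawToken<'a> {
    pub key_id: &'a [u8],
    pub timestamp: u64,
    pub signature: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    BadLength(usize),
    OutsideWindow,
}

/// Split `key_id || timestamp || signature` into its three fields.
pub fn parse_token(decoded: &[u8]) -> Result<RawToken<'_>, TokenError> {
    if decoded.len() != TOKEN_LEN {
        return Err(TokenError::BadLength(decoded.len()));
    }
    let (key_id, rest) = decoded.split_at(KEY_ID_LEN);
    let (ts_bytes, signature) = rest.split_at(TIMESTAMP_LEN);
    let mut buf = [0u8; TIMESTAMP_LEN];
    buf.copy_from_slice(ts_bytes);
    Ok(RawToken { key_id, timestamp: u64::from_be_bytes(buf), signature })
}

/// Accept the token only if it is within `window_secs` of `now`, in either direction.
pub fn check_window(token_ts: u64, now: u64, window_secs: u64) -> Result<(), TokenError> {
    if token_ts.abs_diff(now) <= window_secs {
        Ok(())
    } else {
        Err(TokenError::OutsideWindow)
    }
}
```

In a full verifier, the signed message would be the first 40 bytes of the decoded token (`key_id || timestamp_bytes`), checked against the public key found via `key_id`.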

**`IdentityProvider` trait**: Decouples alknet-core from identity storage. The
trait resolves a fingerprint or token to an `Identity`. Default implementation
loads from `DynamicConfig.auth` (no database). Hub implementation can back it
with `@alkdev/storage`.

**`TokenKeySource::Shared`**: The token auth uses the same authorized keys set
as SSH auth by default. Deployments that want separate access control can use
`TokenKeySource::Separate` with a distinct key set.

**Replay protection via timestamps**: V1 uses timestamp-only (no server state).
Zero-replay can be added later via a nonce challenge-response without changing
the key material.

## Consequences

- **Positive**: One key set, one rotation, one `reloadAuth()` call. Adding a
  key to `authorized_keys` immediately grants access via both SSH and
  WebTransport.
- **Positive**: `IdentityProvider` trait makes alknet-core independent of any
  specific database. Default: config file. Hub: `@alkdev/storage`.
- **Positive**: Browser clients can authenticate using Ed25519 keys via
  SubtleCrypto (Chrome 105+, Firefox 130+, Safari 17+). Deno supports it
  natively.
- **Positive**: No JWT library dependency. The token is a simple Ed25519
  signature over a fixed structure, using the same primitives SSH already uses.
- **Negative**: V1 has a replay window (±300s). An attacker who intercepts a
  QUIC packet can replay the token within the window. Acceptable because QUIC
  interception is the same threat level as connection hijacking.
- **Negative**: Certificate authority tokens are not supported in v1. CA
  verification requires the full OpenSSH certificate structure, which doesn't
  fit in a signed timestamp.
- **Negative**: Browser-side key management is less ergonomic than SSH key
  files. The private key must be imported into SubtleCrypto. This is a UI/UX
  concern, not a protocol concern.

## References

- [auth.md](../auth.md): Full auth architecture spec
- [ADR-012](012-auth-ed25519-and-cert-authority.md): Ed25519 + cert-authority auth
- [OQ-17](../open-questions.md): Transport-aware auth (resolved by this ADR)
- [configuration.md](../../research/configuration.md): OQ-CFG-04, OQ-CFG-06 (resolved)
@@ -1,63 +0,0 @@
# ADR-024: Bidirectional Call Protocol

## Status

Accepted

## Context

The alknet control channel (ADR-018) routes from client → server's event bus.
This is unidirectional: clients can send events to the server, but the server
cannot call operations on the client. In the hub/spoke model, spokes (dev env
containers) connect to a hub and expose operations (fs, bash, search) that the
hub invokes. The hub needs to call *spoke* operations.

Additionally, the current control channel provides no request/response semantics.
Every consumer that needs call/response reinvents the pending-request correlation.

## Decision

The call protocol is bidirectional. Both sides can send `call.requested` and
receive `call.responded`. The protocol uses `EventEnvelope` wire format (4-byte
BE length prefix + JSON), the same as `@alkdev/pubsub`.
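
As a rough illustration of that wire format, the sketch below frames an opaque JSON payload with a 4-byte big-endian length prefix and reads it back. The function names are hypothetical, and the JSON body is treated as raw bytes because the `EventEnvelope` field layout is not given here.

```rust
/// Prefix a JSON payload with its length as a 4-byte big-endian integer.
pub fn encode_frame(json: &[u8]) -> Vec<u8> {
    assert!(json.len() <= u32::MAX as usize, "payload too large for a u32 length prefix");
    let mut out = Vec::with_capacity(4 + json.len());
    out.extend_from_slice(&(json.len() as u32).to_be_bytes());
    out.extend_from_slice(json);
    out
}

/// Try to take one frame from the front of `buf`.
/// Returns `(payload, remaining_bytes)`, or `None` if the frame is incomplete.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Returning the remaining bytes lets a reader loop over a buffer that holds several frames, or wait for more input when `None` comes back. A production decoder would likely also cap the accepted length to bound memory use.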

Five event types: `call.requested`, `call.responded`, `call.completed`,
`call.aborted`, `call.error`.

A call is a subscribe that resolves after one event. Both use `call.requested`
with correlated `requestId`. `PendingRequestMap` in core provides correlation.
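
One way to picture the correlation is a map from `requestId` to a waiting channel. The sketch below is an assumption about the shape, not the real `PendingRequestMap`: it uses `std::sync::mpsc` for simplicity where the actual crate would presumably use async primitives, and the payload is a plain `String`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for core's `PendingRequestMap`.
#[derive(Default)]
pub struct PendingRequests {
    waiting: HashMap<String, Sender<String>>,
}

impl PendingRequests {
    /// Record an outgoing `call.requested` and return where the reply will arrive.
    pub fn register(&mut self, request_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiting.insert(request_id.to_owned(), tx);
        rx
    }

    /// Deliver a `call.responded` payload. Returns false for unknown or stale ids.
    pub fn resolve(&mut self, request_id: &str, payload: String) -> bool {
        match self.waiting.remove(request_id) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }

    /// Drop every pending entry, e.g. when the connection closes.
    /// Waiting receivers then observe a disconnected channel.
    pub fn clear(&mut self) -> usize {
        let dropped = self.waiting.len();
        self.waiting.clear();
        dropped
    }
}
```

Removing the entry on `resolve` and dropping everything in `clear` covers connection close; timeouts would need a similar per-entry removal.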

Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
path segment routes the call to the correct connected node. The hub's registry
maps spoke prefixes to connections. This mirrors iroh's ALPN dispatch: the
first segment is the routing key, remaining path dispatches within the node.

Core-provided operations use short paths without a spoke prefix
(`/services/list`, `/services/schema`). Spoke operations are prefixed
(`/dev1/fs/readFile`).
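
The first-segment dispatch can be sketched as a small parser. It rests on one assumption not stated in the ADR: that core operations are recognized by the literal first segment `services`. `Route` and `route` are hypothetical names.

```rust
/// First segment used by core-provided operations (assumed from the examples above).
const CORE_SEGMENT: &str = "services";

/// Where a call should be dispatched. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the local core registry; carries the full path.
    Core(&'a str),
    /// Forwarded to a connected spoke, with the path as the spoke knows it.
    Spoke { spoke: &'a str, local_path: &'a str },
}

/// Split an operation path into its routing key and the remaining path.
pub fn route(path: &str) -> Option<Route<'_>> {
    let trimmed = path.strip_prefix('/')?;
    let (first, tail) = trimmed.split_once('/')?;
    if first.is_empty() || tail.is_empty() {
        return None;
    }
    if first == CORE_SEGMENT {
        Some(Route::Core(path))
    } else {
        // Keep the leading '/' so the spoke sees its own registered name.
        Some(Route::Spoke { spoke: first, local_path: &trimmed[first.len()..] })
    }
}
```

With this shape, `/dev1/fs/readFile` reaches spoke `dev1` as `/fs/readFile`, which matches how the spoke registered the operation locally.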

This generalizes ADR-018's control channel: the `alknet-*` destination becomes
a transport for `EventEnvelope` frames with call protocol semantics, instead of
raw pubsub dispatch.

## Consequences

- **Positive**: Hub can invoke operations on spokes. Dev env containers
  expose fs, bash, search; the hub calls them as needed.
- **Positive**: Browser clients can expose custom UDFs. Any connected participant
  can both call and serve operations.
- **Positive**: Built-in request/response correlation. One `PendingRequestMap`
  in core serves all consumers.
- **Positive**: Slash-based paths align with URL routing, OpenAPI, MCP, and
  iroh's ALPN dispatch. First segment = routing key.
- **Positive**: Multiple spokes exposing the same service (two dev envs both
  exposing `/fs/*`) are naturally differentiated by the spoke prefix.
- **Negative**: The `PendingRequestMap` adds in-memory state. Entries must be
  cleaned up on timeout or connection close.
- **Negative**: The hub must maintain a routing table mapping spoke identities
  to connections, with registration on connect and cleanup on disconnect.

## References

- [call-protocol.md](../call-protocol.md): Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md): Control channel (generalized)
- [napi-and-pubsub.md](../napi-and-pubsub.md): NAPI wrapper and pubsub adapter
@@ -1,73 +0,0 @@
# ADR-025: Handler/Spec Separation for Downstream Service Registration

## Status

Accepted

## Context

The current control channel (ADR-018) is hardcoded: `alknet-control:0` bridges
to the local pubsub event bus. If NAPI wants to expose `fs.readFile` or
`bash.exec` as callable operations, it has no way to register these with core's
channel routing. The NAPI handler would need to intercept channel data outside
of core.

For the hub/spoke model, spokes register their operations with the hub when
they connect. The hub's registry must include both hub-local operations and
remote operations exposed by spokes.

## Decision

Operation specs and handlers are separated from core. Core provides:

1. `OperationSpec`: describes what an operation does (name, type, input/output
   schemas, access control)
2. `OperationHandler`: implements the operation logic
3. `OperationRegistry`: maps paths to specs + handlers
4. Built-in operations: `/services/list`, `/services/schema`

Downstream consumers register their own operations:

```rust
// NAPI layer registers dev env tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);

// Browser client registers a custom UDF
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```

Operation names use slash-based paths: `/{spoke}/{service}/{op}`. The first
segment routes to the node. The `namespace` field on `OperationSpec` is
derived from the second path segment (`service`).

When spoke operations are registered with the hub, the hub adds the spoke
prefix: a spoke that registers `/fs/readFile` as "dev1" becomes addressable as
`/dev1/fs/readFile` in the hub's routing table.
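
The prefixing step can be sketched with an ordered map from hub-visible path to owning spoke. This is a simplified assumption of what the hub's routing table might look like: `HubRoutes` is a hypothetical type that stores only names, not specs, handlers, or connections.

```rust
use std::collections::BTreeMap;

/// Hypothetical hub-side routing table: hub-visible path -> spoke id.
#[derive(Default)]
pub struct HubRoutes {
    routes: BTreeMap<String, String>,
}

impl HubRoutes {
    /// Register a spoke's local paths under its prefix (done on connect).
    pub fn register_spoke(&mut self, spoke: &str, local_paths: &[&str]) {
        for local in local_paths {
            // "/fs/readFile" registered by "dev1" becomes "/dev1/fs/readFile".
            let full = format!("/{}{}", spoke, local);
            self.routes.insert(full, spoke.to_owned());
        }
    }

    /// Remove everything a spoke registered (done on disconnect).
    pub fn unregister_spoke(&mut self, spoke: &str) {
        self.routes.retain(|_, owner| owner.as_str() != spoke);
    }

    /// All hub-visible paths in sorted order, as `/services/list` might report them.
    pub fn list(&self) -> Vec<&str> {
        self.routes.keys().map(String::as_str).collect()
    }
}
```

Because the prefix is part of the key, two spokes can register the same local path without colliding.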

The `/services/list` operation returns all registered specs. The
`/services/schema` operation returns the spec for a specific operation. These
are read-only, with no admin operations.

## Consequences

- **Positive**: NAPI, Python, and any downstream consumer can register
  operations without modifying core.
- **Positive**: Service discovery is built in. Clients query `/services/list`
  to learn what operations a hub offers.
- **Positive**: Spoke prefix naturally differentiates multiple spokes exposing
  the same service (dev1 vs dev2).
- **Positive**: `AccessControl` on each `OperationSpec` enables per-operation
  authorization. Higher-risk operations (shell, filesystem write) can require
  tighter scopes.
- **Positive**: Schema exposure enables MCP adapter generation. OperationSpec
  maps directly to MCP tool definitions.
- **Negative**: The registry adds complexity. Core now owns `OperationSpec`,
  `OperationRegistry`, and `PendingRequestMap`.
- **Negative**: Namespace collisions between downstream consumers are possible.
  The spoke prefix mitigates this: `/dev1/fs/readFile` vs `/dev2/fs/readFile`.

## References

- [call-protocol.md](../call-protocol.md): Full call protocol spec
- [ADR-018](018-control-channel-for-pubsub.md): Control channel (generalized)
- `@alkdev/operations`: TypeScript `OperationSpec`, `CallHandler`, registry
@@ -1,162 +0,0 @@
# ADR-026: Transport/Interface Separation (Three-Layer Model)

## Status

Accepted

## Context

In the current architecture, SSH is deeply embedded in the server handler. The
`ServerHandler` owns auth, channel management, and proxy logic, all mixed
together. This makes it impossible to run the call protocol over any transport
that doesn't speak SSH, such as:

- **DNS**: encoding call protocol frames as DNS TXT queries/responses for
  censorship resistance
- **Raw framing**: 4-byte length prefix + JSON `EventEnvelope` without SSH
  wrapping, for local service mesh or browser-to-head direct communication
- **WebTransport**: running call protocol over QUIC streams (browsers can't do
  SSH key exchange)

The DNS control channel concept from research (`core.md`) currently conflates
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
SSH is not a transport; it's a protocol layer that sits *on top of* a
transport. Separating them enables the DNS control channel to carry call
protocol events directly, without wrapping SSH inside DNS queries.

The same separation enables raw framing (no SSH overhead) for trusted local
networks, and WebTransport direct call protocol for browser clients.

## Decision

**Establish a three-layer model:**

### Layer 1: Transport

Produces byte streams. A `Transport` still produces
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.

```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```

Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).

### Layer 2: Interface

Consumes a `Transport::Stream` and produces call protocol sessions. An
interface is what SSH currently does: wrap a byte stream in session semantics.

```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```

Interfaces:

- **SSH interface**: wraps existing `ServerHandler` logic. SSH handshake, auth,
  channel multiplexing. The call protocol runs over a reserved SSH channel
  (`alknet-control:0`).
- **Raw framing interface**: 4-byte big-endian length prefix + JSON
  `EventEnvelope`. No SSH overhead. Direct call protocol over the transport
  stream.
- **DNS control channel**: a (DNS transport, raw framing interface) pair that
  encodes/decodes `EventEnvelope` frames as DNS query/response pairs.

### Layer 3: Protocol

Carries semantics. Call protocol events, operation registry, service calls.
The protocol is agnostic to both the transport and the interface below it. It
receives `EventEnvelope` frames from whatever interface produced them.

### Connection Model

A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:

| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
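
Since the valid pairs are a closed list, a listener configuration could be validated up front. The following is a sketch under that assumption: `TransportTag`, `InterfaceTag`, and `is_valid_pair` are hypothetical names, and the tags are payload-free stand-ins rather than the real `TransportKind`.

```rust
/// Payload-free transport tag. (Hypothetical, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

/// The two interfaces named in this ADR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InterfaceTag {
    Ssh,
    RawFraming,
}

/// Returns true only for the (Transport, Interface) pairs listed in the table above.
pub fn is_valid_pair(transport: TransportTag, interface: InterfaceTag) -> bool {
    matches!(
        (transport, interface),
        (TransportTag::Tls, InterfaceTag::Ssh)
            | (TransportTag::Tcp, InterfaceTag::Ssh)
            | (TransportTag::Iroh, InterfaceTag::Ssh)
            | (TransportTag::Dns, InterfaceTag::RawFraming)
            | (TransportTag::WebTransport, InterfaceTag::Ssh)
            | (TransportTag::WebTransport, InterfaceTag::RawFraming)
            | (TransportTag::Tcp, InterfaceTag::RawFraming)
    )
}
```

Pairs absent from the table, such as DNS with SSH, are rejected, which encodes the rule that the DNS control channel never wraps SSH.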

**The DNS control channel carries call protocol frames directly; it does NOT
wrap SSH inside DNS.** This is explicit because the research originally
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
queries/responses, with no SSH involved.

### `TransportKind` Enum

The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
`WebTransport` variants. Initially these are tags only, with no acceptor
implementation. The full DNS and WebTransport implementations are Phase 4 work
per the integration plan.

```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```

### ServerHandler Refactor

The existing `ServerHandler` is refactored into `SshInterface`. The interface
abstraction means the server's accept loop becomes:

```rust
// Pseudocode. `transport.accept()` stands for the server-side listener
// counterpart of `Transport::connect()`; the trait shown above does not define it.
let (transport, interface) = listener_config;
let stream = transport.accept().await?;
let session = interface.accept(stream, &config).await?;
// session produces call protocol events
```

The call protocol handler is interface-agnostic: it receives `EventEnvelope`
frames from any interface. Auth, forwarding policy, and operation routing happen
at Layer 3, not inside the SSH handler.

## Consequences

- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
  raw framing) pair is a clean (Transport, Interface) combination.
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
  trusted networks.
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
  any interface.
- **Positive**: `ServerHandler` is refactored into `SshInterface`, a smaller,
  more focused component that only handles SSH session management.
- **Positive**: Future WebTransport and WebSocket interfaces are additive: they
  implement the `Interface` trait without touching SSH code.
- **Negative**: This is the most invasive code change in Phase 1
  (integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
  are currently tangled in `ServerHandler`. Extracting them requires careful
  refactoring to maintain existing behavior.
- **Negative**: The `Interface` trait is new and untested. The design must
  accommodate both SSH's channel multiplexing and raw framing's single-stream
  model through the same abstraction.

## References

- [research/core.md](../../research/core.md): Transport layer, DNS transport section
- [research/integration-plan.md](../../research/integration-plan.md): Phase 1.8, three-layer model
- [transport.md](../transport.md): Current Transport trait (unchanged at Layer 1)
- [server.md](../server.md): Current ServerHandler (will become SshInterface)
- [ADR-001](001-pluggable-transport.md): Transport trait produces stream (unchanged)
- [ADR-004](004-ssh-over-transport.md): SSH runs over transport (reinforced by Layer 2)
- [ADR-024](024-bidirectional-call-protocol.md): Bidirectional call protocol (Layer 3)
@@ -1,164 +0,0 @@
# ADR-027: Crate Decomposition

## Status

Accepted

## Context

alknet-core currently contains everything: transport, SSH, auth, config, the
call protocol handler, and the server accept loop. As the project grows to
include SQLite-backed identity, HD key derivation, and metagraph storage, core
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies,
which is unacceptable for a library crate that CLI users embed.

Different deployment topologies need different subsets:
- A minimal CLI tunnel only needs core, transport, and auth types
- A head node needs SQLite-backed identity and the secret service
- A flowgraph visualization tool only needs petgraph operations

Circular dependencies must be avoided. alknet-storage implements
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
format, but not as a crate dependency.

## Decision

**Decompose the project into six crates with a strict acyclic dependency graph.**

### Crate Structure

1. **alknet-core**: Transport, SSH, call protocol, config, auth types, identity,
   `OperationSpec`, `Interface` trait. The foundational crate that everything
   else depends on (by type, not by crate dep in some cases).
   - *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph

2. **alknet-secret**: BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
   derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
   - *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
     irpc
   - *Does NOT depend on*: alknet-core, alknet-storage

3. **alknet-storage**: SQLite-backed metagraph, identity tables, ACL graph,
   honker integration, `StorageProtocol` irpc service.
   - *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
   - *Does NOT depend on alknet-core* (but implements alknet-core's
     `IdentityProvider` trait via the trait, not a crate dep)
   - *Does NOT depend on alknet-secret* (but references `EncryptedData` type
     format for wire compatibility)

4. **alknet-flowgraph**: `FlowGraph<N,E>` over petgraph, operation graph, call
   graph, type compatibility checking.
   - *Depends on*: petgraph, serde, jsonschema, thiserror
   - *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret

5. **alknet-napi**: Node.js native addon. Exposes alknet-core to Node.js.
   - *Depends on*: alknet-core
   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph

6. **alknet** (CLI binary): Assembles everything.
   - *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
     alknet-flowgraph (feature), toml

### Dependency Graph

```
alknet-secret       alknet-storage      alknet-flowgraph
(standalone)        (standalone)        (standalone)
     │                   │                   │
     │ (feature flags    │ (trait impl       │ (type compat
     │  in CLI binary)   │  via CLI wire)    │  via JSON)
     ▼                   ▼                   ▼
              ┌─────────────────────┐
              │     alknet-core     │
              │  (transport, SSH,   │
              │   call protocol,    │
              │   Identity, Config) │
              └─────────┬───────────┘
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      alknet-napi    alknet (CLI binary, assembles everything)
```

All four library crates (core, secret, storage, flowgraph) are independent of
each other. Dependencies flow **upward** only. The CLI binary sits at the top
and wires concrete implementations together. alknet-storage implements
alknet-core's `IdentityProvider` trait without a crate dependency; the CLI
binary provides the bridge.
|
||||
|
||||
### Narrow Interface Points
|
||||
|
||||
Three types serve as the narrow interface points between crates:
|
||||
|
||||
1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
|
||||
forwarding policy, and call protocol. alknet-storage implements
|
||||
`IdentityProvider` to produce instances.
|
||||
|
||||
2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
|
||||
`ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
|
||||
alknet-storage). The CLI/NAPI layer wires the concrete implementation.
|
||||
|
||||
3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
|
||||
registry and by alknet-flowgraph for type compatibility checking. The bridge
|
||||
is serialization — flowgraph serializes to JSON, storage persists it.
|
||||
|
||||
### irpc Feature Flag
|
||||
|
||||
irpc is a feature flag in alknet-core. When disabled, auth and config go through
|
||||
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
|
||||
that only do SSH tunneling don't need the service layer.
|
||||
|
||||
In alknet-secret and alknet-storage, irpc is an independent dependency, not
|
||||
feature-gated. These crates always define irpc service protocols because they
|
||||
are used in production deployments where the service layer is active.
|
||||
|
||||
### alknet-storage's Relationship to alknet-core
|
||||
|
||||
alknet-storage does NOT depend on alknet-core as a crate. Instead:
|
||||
|
||||
- alknet-storage defines its own `IdentityProvider` impl that matches
|
||||
alknet-core's trait signature. The trait is re-exported or defined locally
|
||||
with `#[cfg(feature = "alknet-core")]` interop.
|
||||
- In practice, the CLI binary crate depends on both and wires them together.
|
||||
alknet-storage provides `StorageIdentityProvider`; alknet-core takes
|
||||
`impl IdentityProvider`.
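
One way the CLI bridge described above could look is sketched below. This is an illustration under stated assumptions, not the project's actual code: the three modules stand in for three separate crates, and `AuthHandler`, `AccountStore`, `AccountRecord`, and `StorageBridge` are hypothetical names. The storage side exposes only its own types, and the adapter in the CLI module is the only place that names both sides.

```rust
// Sketch only: three modules stand in for three separate crates.
mod core_sketch {
    /// Simplified stand-in for `alknet_core::auth::Identity`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Identity {
        pub id: String,
        pub scopes: Vec<String>,
    }

    /// The contract lives in core; token resolution is omitted for brevity.
    pub trait IdentityProvider: Send + Sync + 'static {
        fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    }

    /// Core code sees only the trait object, never a concrete backend.
    pub struct AuthHandler {
        provider: Box<dyn IdentityProvider>,
    }

    impl AuthHandler {
        pub fn new(provider: Box<dyn IdentityProvider>) -> Self {
            Self { provider }
        }

        pub fn authenticate(&self, fingerprint: &str) -> Result<Identity, String> {
            self.provider
                .resolve_from_fingerprint(fingerprint)
                .ok_or_else(|| format!("unknown key: {}", fingerprint))
        }
    }
}

mod storage_sketch {
    use std::collections::HashMap;

    /// Hypothetical row shape; the real schema is not specified here.
    #[derive(Debug, Clone)]
    pub struct AccountRecord {
        pub account_id: String,
        pub scopes: Vec<String>,
    }

    /// In-memory stand-in for the SQLite-backed store. No core types appear.
    #[derive(Default)]
    pub struct AccountStore {
        by_fingerprint: HashMap<String, AccountRecord>,
    }

    impl AccountStore {
        pub fn insert(&mut self, fingerprint: &str, record: AccountRecord) {
            self.by_fingerprint.insert(fingerprint.to_string(), record);
        }

        pub fn lookup(&self, fingerprint: &str) -> Option<&AccountRecord> {
            self.by_fingerprint.get(fingerprint)
        }
    }
}

mod cli_sketch {
    use super::core_sketch::{Identity, IdentityProvider};
    use super::storage_sketch::AccountStore;

    /// The bridge: the only place that names both sides.
    pub struct StorageBridge(pub AccountStore);

    impl IdentityProvider for StorageBridge {
        fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
            self.0.lookup(fingerprint).map(|record| Identity {
                id: record.account_id.clone(),
                scopes: record.scopes.clone(),
            })
        }
    }
}
```

With this shape, a change to the trait signature only forces updates in the core module and the adapter, which is the versioning concern the Consequences section raises.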
|
||||
|
||||
### alknet-storage's Relationship to alknet-secret
|
||||
|
||||
alknet-storage does NOT depend on alknet-secret as a crate. Instead:
|
||||
|
||||
- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
|
||||
version, salt, IV, ciphertext). This is a type-level compatibility, not a
|
||||
crate dependency.
|
||||
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
|
||||
`SecretNode` in the metagraph. The bridge is serialization.
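
To make "the bridge is serialization" concrete, here is a minimal sketch of an `EncryptedData` value with the four fields named above and a length-prefixed byte encoding. The field types, the big-endian length prefixes, and the `to_bytes` / `from_bytes` names are assumptions for illustration; the actual wire format is not specified in this ADR.

```rust
/// Sketch of the shared wire shape: key version, salt, IV, ciphertext.
#[derive(Debug, Clone, PartialEq)]
pub struct EncryptedData {
    pub key_version: u32,
    pub salt: Vec<u8>,
    pub iv: Vec<u8>,
    pub ciphertext: Vec<u8>,
}

impl EncryptedData {
    /// Encode as: key_version (u32 BE), then each field as (len u32 BE, bytes).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.key_version.to_be_bytes());
        for field in [&self.salt, &self.iv, &self.ciphertext] {
            out.extend_from_slice(&(field.len() as u32).to_be_bytes());
            out.extend_from_slice(field);
        }
        out
    }

    /// Decode the layout above; returns None on truncated or trailing data.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let mut pos = 0usize;
        let key_version = read_u32(bytes, &mut pos)?;
        let salt = read_field(bytes, &mut pos)?;
        let iv = read_field(bytes, &mut pos)?;
        let ciphertext = read_field(bytes, &mut pos)?;
        if pos != bytes.len() {
            return None;
        }
        Some(Self { key_version, salt, iv, ciphertext })
    }
}

fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let mut chunk = [0u8; 4];
    chunk.copy_from_slice(bytes.get(*pos..end)?);
    *pos = end;
    Some(u32::from_be_bytes(chunk))
}

fn read_field(bytes: &[u8], pos: &mut usize) -> Option<Vec<u8>> {
    let len = read_u32(bytes, pos)? as usize;
    let end = pos.checked_add(len)?;
    let field = bytes.get(*pos..end)?.to_vec();
    *pos = end;
    Some(field)
}
```

In a layout like this, the storing side only needs to treat the bytes as an opaque blob; it never has to link against the encrypting crate.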
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
|
||||
get a small binary.
|
||||
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
|
||||
swapped for alternative implementations.
|
||||
- **Positive**: No circular dependencies. The dependency graph is a DAG.
|
||||
- **Positive**: Deployment topology determines which crates to include. A CLI
|
||||
tunnel uses only alknet-core. A head node uses everything.
|
||||
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
|
||||
service layer overhead.
|
||||
- **Negative**: `IdentityProvider` trait interop between alknet-core and
|
||||
alknet-storage requires careful versioning. If the trait signature changes,
|
||||
both crates must update.
|
||||
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
|
||||
and alknet-storage is implicit (not enforced by the type system). A shared
|
||||
types crate could be extracted if needed, but adds another crate dependency.
|
||||
|
||||
## References
|
||||
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
|
||||
- [research/core.md](../../research/core.md) — alknet-core contents
|
||||
- [research/services.md](../../research/services.md) — Service protocols
|
||||
- [research/storage.md](../../research/storage.md) — alknet-storage contents
|
||||
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)
|
||||
@@ -1,147 +0,0 @@
|
||||
# ADR-028: Auth as irpc Service
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
|
||||
doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
|
||||
entire set on each reload works for small deployments but requires holding every
|
||||
key in memory. For production deployments with hundreds or thousands of users,
|
||||
auth verification should query a database on demand rather than holding all keys
|
||||
in memory.
|
||||
|
||||
The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
|
||||
setups. What's needed is an async boundary that allows auth verification to go
|
||||
through a service — locally via channels for minimal deployments, or via irpc
|
||||
for production deployments where auth runs on a separate process or node.
|
||||
|
||||
The critical design point: callers go through the `IdentityProvider` trait
|
||||
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
|
||||
the same result — an `Identity` or rejection. The trait is the contract; the
|
||||
service is an implementation path.
|
||||
|
||||
## Decision
|
||||
|
||||
**Auth verification is provided via an irpc service protocol, with
|
||||
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
|
||||
(ArcSwap-backed) as the default implementation.**
|
||||
|
||||
### IdentityProvider Trait (ADR-029) — The Contract
|
||||
|
||||
Callers depend on `IdentityProvider`, not on any concrete implementation:
|
||||
|
||||
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
|
||||
|
||||
### ConfigIdentityProvider — Default Implementation
|
||||
|
||||
Reads from `ArcSwap<DynamicConfig.auth>`. No database needed. Every authorized
|
||||
key gets a default scope set. This is the default for CLI and single-node
|
||||
deployments.
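
A minimal sketch of such a config-backed provider follows. It uses only the standard library, so `RwLock<Arc<_>>` stands in for `ArcSwap`; `AuthPolicy`'s fields, the `AuthToken` newtype, and the `default_scopes` list are assumptions made for illustration, and token resolution is left unimplemented.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical token type; the real `AuthToken` is defined elsewhere.
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

/// Assumed shape of the auth section of the dynamic config.
#[derive(Default)]
pub struct AuthPolicy {
    pub authorized_fingerprints: HashSet<String>,
    pub default_scopes: Vec<String>,
}

/// `RwLock<Arc<_>>` is a std-only stand-in for `ArcSwap` in this sketch.
pub struct ConfigIdentityProvider {
    auth: Arc<RwLock<Arc<AuthPolicy>>>,
}

impl ConfigIdentityProvider {
    pub fn new(auth: Arc<RwLock<Arc<AuthPolicy>>>) -> Self {
        Self { auth }
    }

    /// Take a cheap snapshot so the lock is not held during the lookup.
    fn snapshot(&self) -> Arc<AuthPolicy> {
        self.auth.read().unwrap().clone()
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let policy = self.snapshot();
        if !policy.authorized_fingerprints.contains(fingerprint) {
            return None;
        }
        // Every authorized key gets the same default scope set.
        Some(Identity {
            id: fingerprint.to_string(),
            scopes: policy.default_scopes.clone(),
            resources: HashMap::new(),
        })
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // token validation is out of scope for this sketch
    }
}
```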
|
||||
|
||||
### AuthProtocol irpc Service — Behind Feature Flag
|
||||
|
||||
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
|
||||
|
||||
The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
|
||||
that only do SSH tunneling don't need the service layer overhead. When the
|
||||
feature is disabled, auth goes through `IdentityProvider` directly.
|
||||
|
||||
### AuthServiceImpl
|
||||
|
||||
Two implementations exist (the second is a future phase):
|
||||
|
||||
- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
|
||||
Wraps the trait in an irpc service for deployments that use the service layer
|
||||
but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
|
||||
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
|
||||
tables (in alknet-storage, not yet built). Queries on demand. Can maintain an
|
||||
LRU cache for hot fingerprints. This is a Phase 2+ implementation — the
|
||||
contract is defined here so alknet-storage can implement it later.
|
||||
|
||||
Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
|
||||
know or care which backend is running.
|
||||
|
||||
### Integration with IdentityProvider
|
||||
|
||||
The irpc service and the trait compose. A caller goes through `IdentityProvider`,
|
||||
which may internally delegate to the irpc service, or may satisfy the request
|
||||
locally via `ConfigIdentityProvider`. The deployment topology determines the
|
||||
path:
|
||||
|
||||
- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
|
||||
`ArcSwap<DynamicConfig>`. No irpc overhead.
|
||||
- **Production with local auth**: `AuthServiceImpl` wraps
|
||||
`StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
|
||||
routes to the local irpc service.
|
||||
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
|
||||
routes to a remote auth irpc service over QUIC.
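
The "provider that routes to a local service" path can be sketched with plain standard-library channels. This is not irpc: `std::sync::mpsc` and a per-request reply channel stand in for the irpc request and `oneshot` reply, and `spawn_auth_service` and `ServiceIdentityProvider` are hypothetical names. The point is only that callers keep using the synchronous trait while the lookup runs behind a message boundary.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

/// Request message; the reply sender plays the role of a oneshot channel.
pub enum AuthMessage {
    VerifyPubkey { fingerprint: String, reply: Sender<AuthResult> },
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Spawn a service thread that answers requests from an in-memory key table.
pub fn spawn_auth_service(keys: HashMap<String, Identity>) -> Sender<AuthMessage> {
    let (tx, rx) = mpsc::channel::<AuthMessage>();
    thread::spawn(move || {
        for msg in rx {
            match msg {
                AuthMessage::VerifyPubkey { fingerprint, reply } => {
                    let result = match keys.get(&fingerprint) {
                        Some(identity) => AuthResult::Ok(identity.clone()),
                        None => AuthResult::Denied(format!("unknown fingerprint {}", fingerprint)),
                    };
                    let _ = reply.send(result); // caller may have gone away
                }
            }
        }
    });
    tx
}

/// Trait implementation that delegates every lookup to the service.
pub struct ServiceIdentityProvider {
    tx: Mutex<Sender<AuthMessage>>,
}

impl ServiceIdentityProvider {
    pub fn new(tx: Sender<AuthMessage>) -> Self {
        Self { tx: Mutex::new(tx) }
    }
}

impl IdentityProvider for ServiceIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let (reply_tx, reply_rx) = mpsc::channel();
        let msg = AuthMessage::VerifyPubkey {
            fingerprint: fingerprint.to_string(),
            reply: reply_tx,
        };
        self.tx.lock().ok()?.send(msg).ok()?;
        match reply_rx.recv().ok()? {
            AuthResult::Ok(identity) => Some(identity),
            AuthResult::Denied(_) => None,
        }
    }
}
```

Swapping the channel for a remote transport would change only `ServiceIdentityProvider`; the handler that holds a `dyn IdentityProvider` would not need to change.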
|
||||
|
||||
### ConfigService Integration
|
||||
|
||||
`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
|
||||
For the `ConfigIdentityProvider` path, this is equivalent to
|
||||
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
|
||||
refreshes the LRU cache. Both update atomically — ongoing connections are
|
||||
unaffected, new connections pick up changes.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
|
||||
database dependency for CLI users.
|
||||
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
|
||||
irpc service. Auth scales to thousands of users without loading all keys into
|
||||
memory.
|
||||
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
|
||||
on. This keeps alknet-core lean and testable.
|
||||
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
|
||||
need the service layer.
|
||||
- **Positive**: Both paths produce identical `Identity` results. Behavioral
|
||||
parity is enforced by the shared `Identity` type.
|
||||
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
|
||||
and `StorageIdentityProvider` must produce the same `Identity` for the same
|
||||
input. Integration tests should verify this.
|
||||
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
|
||||
The core must compile and work without it, and the service layer must work
|
||||
with it enabled.
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
|
||||
- [research/configuration.md](../../research/configuration.md) — Auth service approach
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-027](027-crate-decomposition.md) — Crate decomposition
|
||||
@@ -1,107 +0,0 @@
|
||||
# ADR-029: Identity as Core Type
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The `Identity` struct and `IdentityProvider` trait are needed by auth,
|
||||
forwarding policy, and call protocol — three different subsystems in
|
||||
alknet-core. Without placing them in core, these subsystems would each define
|
||||
their own identity type, leading to duplication and conversion boilerplate.
|
||||
|
||||
The constraint: alknet-core must not depend on alknet-storage or any database.
|
||||
The `IdentityProvider` trait must be in core so that the handler can resolve
|
||||
identities without knowing whether the backing store is a config file or a
|
||||
SQLite database. External crates provide implementations.
|
||||
|
||||
Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
|
||||
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
|
||||
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
|
||||
key-based auth from config) and account UUID (for database-backed auth).
|
||||
|
||||
## Decision
|
||||
|
||||
**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
|
||||
|
||||
### Identity Struct
|
||||
|
||||
```rust
pub struct Identity {
    pub id: String,                              // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
|
||||
|
||||
The `id` field serves dual purpose: when using config-based authentication
|
||||
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
|
||||
database-backed authentication (`StorageIdentityProvider`), it holds the account
|
||||
UUID from the `accounts` table. This keeps the type simple while accommodating
|
||||
both auth paths.
|
||||
|
||||
The `scopes` field provides authorization scope strings used by
|
||||
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
|
||||
field provides resource-level authorization beyond what scopes offer (e.g., which
|
||||
services this identity can access).
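
As an illustration of how a consumer might read these two fields, the sketch below adds two helper methods. The names `has_scope` and `can_access` are hypothetical, and exact string matching (no wildcards or scope hierarchy) is an assumption; the ADR does not define matching semantics.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// True if the identity carries exactly this scope string.
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// True if `name` is listed under the resource kind `kind`.
    pub fn can_access(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n == name))
    }
}
```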
|
||||
|
||||
### IdentityProvider Trait
|
||||
|
||||
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
|
||||
|
||||
The trait is the contract. Callers (auth handler, forwarding policy, call
|
||||
protocol) depend on `IdentityProvider` — not on any concrete implementation.
|
||||
|
||||
### Default and Production Implementations
|
||||
|
||||
- **`ConfigIdentityProvider`** (in alknet-core) — reads from
|
||||
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
|
||||
No database needed. This is the default for minimal deployments.
|
||||
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
|
||||
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
|
||||
fingerprint → account → organization membership → effective scopes. This is
|
||||
the production implementation for head nodes.
|
||||
|
||||
alknet-core never depends on alknet-storage. The trait relationship is:
|
||||
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
|
||||
NAPI assembly layer wires the concrete implementation.
|
||||
|
||||
### Why Not in alknet-storage?
|
||||
|
||||
If `Identity` lived in alknet-storage, alknet-core would need to depend on
|
||||
alknet-storage to use the type — creating a circular dependency (since
|
||||
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
|
||||
type and trait in core breaks the cycle.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
|
||||
call protocol all use the same `Identity` type.
|
||||
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
|
||||
wires the concrete implementation. Deployment topology determines which impl
|
||||
to use.
|
||||
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
|
||||
avoiding separate types for config-based and database-based auth.
|
||||
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
|
||||
`Identity` without knowing where they came from.
|
||||
- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
|
||||
`Storage`. Both must produce identical `Identity` results for the same input.
|
||||
Tests should verify behavioral parity.
|
||||
- **Negative**: The trait abstraction adds a level of indirection for the
|
||||
minimal (config-only) deployment path. The cost is negligible — the
|
||||
`ConfigIdentityProvider` is a simple `ArcSwap` dereference.
|
||||
|
||||
## References
|
||||
|
||||
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
|
||||
- [research/services.md](../../research/services.md) — AuthService, Identity section
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
|
||||
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
|
||||
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
|
||||
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes
|
||||
@@ -1,159 +0,0 @@
|
||||
# ADR-030: Static/Dynamic Configuration Split
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Alknet's configuration is loaded once at startup and never changes. This causes
|
||||
three specific failures:
|
||||
|
||||
1. **No hot reload of authentication credentials.** Adding or removing an
|
||||
authorized key requires restarting the server process. In head/worker
|
||||
deployments where keys are managed via a database, the process must be
|
||||
restarted every time a key is added, revoked, or rotated. This is
|
||||
operationally unacceptable.
|
||||
|
||||
2. **No port forwarding access control.** Any authenticated client can open a
|
||||
`direct-tcpip` channel to any destination. There is no policy governing
|
||||
which hosts, ports, or alknet control channels a client may access. A
|
||||
compromised key grants unrestricted network access through the tunnel.
|
||||
|
||||
3. **No structured configuration beyond CLI flags.** ADR-011 chose
|
||||
programmatic-first configuration for the alpha — correct at the time. But as
|
||||
alknet moves toward publishable releases, operators need config files for
|
||||
reproducible deployments, and the NAPI layer needs programmatic reload
|
||||
capability that `ServeOptions` doesn't currently support.
|
||||
|
||||
Not all configuration should be reloadable. Transport-level settings (listen
|
||||
address, TLS certificates, host key) require socket/TLS renegotiation to change
|
||||
at runtime — effectively a restart. Auth and forwarding policy can change
|
||||
atomically without disrupting existing connections.
|
||||
|
||||
## Decision
|
||||
|
||||
**Split configuration into `StaticConfig` and `DynamicConfig`.**
|
||||
|
||||
### StaticConfig
|
||||
|
||||
Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
|
||||
preserved). Contains everything that affects socket binding, TLS handshakes, or
|
||||
SSH session negotiation:
|
||||
|
||||
- Transport mode, listen address
|
||||
- TLS config (cert, key)
|
||||
- iroh config (relay URL)
|
||||
- Stealth mode flag
|
||||
- Host key, host key algorithm
|
||||
- Max auth attempts, max connections per IP
|
||||
- Proxy config
|
||||
|
||||
Changing any of these requires a restart.
|
||||
|
||||
### DynamicConfig
|
||||
|
||||
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
|
||||
checked per-connection or per-channel:
|
||||
|
||||
- `AuthPolicy` — authorized keys, certificate authorities, token config
|
||||
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
|
||||
- `RateLimitConfig` — rate limiting parameters
|
||||
|
||||
`ArcSwap` provides lock-free reads on the hot path: every `auth_publickey()` and
every `channel_open_direct_tcpip()` call performs one atomic load followed by an
`Arc` dereference, with no lock to contend on. Writes are atomic: `store()` swaps
the pointer. Existing connections finish with their current config; new
connections get the new config.
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig) { ... }
}
```
|
||||
|
||||
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
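
The swap semantics can be sketched with the standard library alone. Here `RwLock<Arc<DynamicConfig>>` stands in for `ArcSwap<DynamicConfig>` (so reads take a brief read lock rather than being lock-free), and the `load` method, the `new` constructor, and the single `authorized_keys` field are assumptions added for the example.

```rust
use std::sync::{Arc, RwLock};

/// Reduced stand-in for the real `DynamicConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// Cloneable handle; every clone points at the same shared slot.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    dynamic: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { dynamic: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Snapshot the current config; the snapshot stays valid after a reload.
    pub fn load(&self) -> Arc<DynamicConfig> {
        self.dynamic.read().unwrap().clone()
    }

    /// Replace the whole config in one step by swapping the inner pointer.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().unwrap() = Arc::new(new_config);
    }
}
```

A connection that took its snapshot before `reload()` keeps reading the old value, while any later `load()` sees the new one, which matches the behavior described for existing and new connections.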
|
||||
|
||||
### ConfigService
|
||||
|
||||
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
|
||||
protocol (behind the `irpc` feature flag) for production deployments that use
|
||||
the service layer. For minimal deployments (CLI, single-node), direct
|
||||
`ConfigReloadHandle::reload()` is sufficient.
|
||||
|
||||
### TOML Config File
|
||||
|
||||
An optional TOML config file covers static config plus initial auth/forwarding
|
||||
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
|
||||
API remains primary. The config file is a convenience input format:
|
||||
|
||||
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
stealth = false
max_connections_per_ip = 5
max_auth_attempts = 3

[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"

[auth]
host_key = "/etc/alknet/ssh/host_key"

[forwarding]
default = "deny"
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
|
||||
|
||||
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
|
||||
`ConfigReloadHandle::reload()`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
|
||||
config is almost entirely static (which server to connect to, which key to use).
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Auth credentials and forwarding policy can be reloaded without
|
||||
restarting the server. Adding a key via `reloadAuth()` takes effect on the
|
||||
next connection attempt.
|
||||
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
|
||||
config file is an optional convenience layer, not a replacement for
|
||||
`ServeOptions`.
|
||||
- **Positive**: `ArcSwap` provides lock-free reads on the hot path. Every auth
check and every channel open costs one atomic load plus an `Arc` dereference.
|
||||
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
|
||||
production deployments to integrate config reload into their service mesh
|
||||
without taking a direct dependency on `DynamicConfig` internals.
|
||||
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
|
||||
restrict access per identity, per destination, per transport (ADR-031).
|
||||
- **Negative**: Two config structs where there was one. The split is clean
|
||||
(transport vs. policy) but adds surface area.
|
||||
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
|
||||
This is acceptable for a CLI binary.
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — Full analysis
|
||||
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
|
||||
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1
|
||||
@@ -1,138 +0,0 @@
|
||||
# ADR-031: Forwarding Policy
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
|
||||
any destination. The only gate is authentication — once authenticated, a client
|
||||
has unrestricted network access through the tunnel. This is a security gap: a
|
||||
compromised key grants unrestricted access.
|
||||
|
||||
Operators need the ability to:
|
||||
- Restrict which hosts and ports authenticated clients can access
|
||||
- Apply different rules to different principals (key fingerprints, accounts)
|
||||
- Restrict WebTransport clients to alknet control channels only
|
||||
- Set a default policy (allow-all for migration compatibility, deny-all for
|
||||
production)
|
||||
|
||||
## Decision
|
||||
|
||||
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
|
||||
restart).**
|
||||
|
||||
### Type Definitions
|
||||
|
||||
```rust
pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>,        // Empty = matches all
    pub transports: Vec<TransportKind>, // Empty = matches all
}

pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // "localhost", "*.example.com"
    Cidr(IpNetwork),               // "10.0.0.0/8"
    PortRange(String, Range<u16>), // "localhost", ports 8080-8090
    AlknetPrefix,                  // Matches alknet-* control channels
}
```
|
||||
|
||||
### Rule Evaluation
|
||||
|
||||
Rules are evaluated in order. First match wins. If no rule matches, the default
|
||||
applies. This supports both allowlist and blocklist semantics:
|
||||
|
||||
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
|
||||
destinations.
|
||||
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
|
||||
destinations.
|
||||
|
||||
### Principals
|
||||
|
||||
Each rule can specify which principals it applies to. A principal is an
|
||||
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
|
||||
rule's `principals` field is empty, it matches all identities.
|
||||
|
||||
This connects to the `IdentityProvider` trait (ADR-029): when a client
|
||||
authenticates, the `Identity` is resolved, and the forwarding policy checks
|
||||
rules against `Identity.id` and `Identity.scopes`.
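
The first-match-wins evaluation and the principal check described above can be sketched as follows. This is a reduced illustration, not the real matcher: `TargetPattern` is cut down to three variants, `Host` matches exactly (no wildcards), transports and ports are ignored, and the `evaluate` signature is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

#[derive(Debug, Clone)]
pub enum TargetPattern {
    Any,
    Host(String), // exact, case-insensitive match only in this sketch
    AlknetPrefix,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>, // empty = matches all
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl ForwardingRule {
    fn matches(&self, id: &str, scopes: &[String], host: &str) -> bool {
        let target_ok = match &self.target {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h.eq_ignore_ascii_case(host),
            TargetPattern::AlknetPrefix => host.starts_with("alknet-"),
        };
        // A principal is either the identity id or one of its scopes.
        let principal_ok = self.principals.is_empty()
            || self
                .principals
                .iter()
                .any(|p| p == id || scopes.iter().any(|s| s == p));
        target_ok && principal_ok
    }
}

impl ForwardingPolicy {
    /// Default Allow with no rules: equivalent to having no check at all.
    pub fn allow_all() -> Self {
        Self { default: ForwardingAction::Allow, rules: Vec::new() }
    }

    /// Rules are scanned in order; the first match decides, else the default.
    pub fn evaluate(&self, id: &str, scopes: &[String], host: &str) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.matches(id, scopes, host))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because order decides the outcome, a narrow Deny rule placed before a broader Allow rule acts as an exception to it.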
|
||||
|
||||
### TransportKind-Aware Rules
|
||||
|
||||
Each rule can specify which `TransportKind` it applies to. This enables
|
||||
transport-specific restrictions — for example, WebTransport clients can be
|
||||
restricted to `alknet-*` control channels only:
|
||||
|
||||
```rust
ForwardingRule {
    target: TargetPattern::AlknetPrefix,
    action: ForwardingAction::Allow,
    principals: vec![],
    transports: vec![TransportKind::WebTransport { host: "*".into() }],
}
```
|
||||
|
||||
### Where the Policy Check Happens
|
||||
|
||||
The forwarding policy check occurs in `channel_open_direct_tcpip` before the
|
||||
proxy task is spawned. The current behavior (no check) is equivalent to
|
||||
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
|
||||
backward compatibility during migration.
|
||||
|
||||
### DynamicConfig Integration
|
||||
|
||||
`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
|
||||
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
|
||||
effect on the next channel open — existing connections continue with their
|
||||
current policy.
|
||||
|
||||
### OQ Resolutions
|
||||
|
||||
- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
|
||||
global rules + principal matching from `Identity.scopes`. Per-user scope
|
||||
from `peer_credentials.metadata.scopes` via `IdentityProvider`.
|
||||
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
|
||||
match in `ForwardingRule`. WebTransport clients can be restricted.
|
||||
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
|
||||
`IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Operators can restrict access per identity, per destination, per
|
||||
transport. A compromised key no longer grants unrestricted network access.
|
||||
- **Positive**: Default-allow preserves current behavior during migration. Switch
|
||||
to default-deny for production deployments.
|
||||
- **Positive**: Policy is reloadable without restart. Adding a rule via
|
||||
`reloadForwarding()` takes effect on the next channel open.
|
||||
- **Positive**: `TransportKind`-aware rules enable transport-specific
|
||||
restrictions (e.g., WebTransport clients restricted to alknet-* channels).
|
||||
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
|
||||
call). The cost is a linear scan of rules — acceptable for small rule sets.
|
||||
Large rule sets should use compiled matchers (future optimization).
|
||||
- **Negative**: `TargetPattern` string matching is lenient. Host patterns like
|
||||
`*.example.com` require careful implementation to prevent bypasses. The
|
||||
`glob` or `globset` crate can handle this correctly.
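
To show the kind of bypass the last point warns about, here is a hand-rolled sketch of a leading-wildcard host matcher. It does not use the `glob` or `globset` crates, `host_matches` is a hypothetical name, and it makes three assumptions: only a leading `*.` is treated as a wildcard, the wildcard may cover several labels, and the bare apex domain does not match.

```rust
/// Match `host` against an exact name or a leading-wildcard pattern.
pub fn host_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    // Ignore a trailing root dot so "a.example.com." matches too.
    let host = host.trim_end_matches('.').to_ascii_lowercase();

    match pattern.strip_prefix("*.") {
        Some(suffix) => {
            // Require a '.' immediately before the suffix. A plain
            // `ends_with` check would let "evilexample.com" through.
            host.len() > suffix.len() + 1
                && host.ends_with(suffix)
                && host.as_bytes()[host.len() - suffix.len() - 1] == b'.'
        }
        None => pattern == host,
    }
}
```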
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
|
||||
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
|
||||
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3
|
||||
@@ -1,96 +0,0 @@
|
||||
# ADR-032: Event Boundary Discipline
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The research identified three distinct communication patterns in the system, and
|
||||
conflating them is a known anti-pattern in event-driven architectures:
|
||||
|
||||
1. **Domain events** (Honker streams) — Internal to the service that owns that
|
||||
data. Used for state reconstruction within the service's own boundaries.
|
||||
Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
|
||||
|
||||
2. **irpc service calls** — Synchronous request-response within a node or
|
||||
cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
|
||||
`SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
|
||||
|
||||
3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
|
||||
that cross node boundaries. External to the system. Examples:
|
||||
`call.requested`, `call.responded`, `call.completed`, `call.aborted`.
|
||||
|
||||
Without a hard constraint, it's tempting to have one service subscribe directly
|
||||
to another service's Honker streams. This leads to:
|
||||
|
||||
- **Leaky event store**: Service A reads Service B's domain events directly,
|
||||
coupling A to B's internal state representation. When B changes its schema, A
|
||||
breaks.
|
||||
- **Boomerang coupling**: An integration event is too thin, causing the
|
||||
consumer to call back to the source service synchronously to get details. This
|
||||
negates the benefit of async communication.
|
||||
- **Fat notification trap**: A notification event carries full entity state,
|
||||
when it should use state transfer instead.
|
||||
|
||||
## Decision

**Event boundary discipline is a hard architectural constraint, not a
suggestion.**

1. **Domain events stay within the owning service.** A Honker stream published
   by the storage service (`nodes:created`) is for the storage service's own
   state reconstruction. No other service reads these stream events directly.

2. **irpc service calls are synchronous and internal.** They never cross node
   boundaries. They are request-response, not events. They should not be used
   as a substitute for integration events.

3. **Call protocol events are the only events that cross node boundaries.**
   `EventEnvelope` frames are the integration boundary. When a domain event
   needs to be communicated to another node, it must be projected into a call
   protocol event.

4. **Projection from domain events to integration events is required when
   crossing boundaries.** A service that owns a Honker stream must project
   relevant state changes into `EventEnvelope` frames before they leave the
   node. The projection strips internal details and produces a versioned,
   stable integration event.

This discipline applies at three levels:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```

A call protocol handler MAY call an irpc service internally (e.g.,
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
use Honker streams for its own state management. But domain events never
propagate beyond the service boundary without projection.

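The projection rule in point 4 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `StorageDomainEvent`, the simplified `EventEnvelope` fields, the `storage.node.created` event type, and the `project` function are all hypothetical names chosen for the example, since the ADR does not specify the envelope layout.

```rust
/// Hypothetical domain event owned by the storage service. It carries
/// internal details (`shard`, `raw_row`) that must never leave the service.
#[derive(Debug, Clone)]
pub enum StorageDomainEvent {
    NodeCreated { node_id: u64, shard: u32, raw_row: Vec<u8> },
    EdgeDeleted { edge_id: u64 },
}

/// Hypothetical, simplified integration event. The real `EventEnvelope`
/// fields are not specified in this ADR.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub event_type: String,
    pub schema_version: u32,
    pub payload: Vec<(String, String)>,
}

/// Project a domain event into an integration event, or return `None` when
/// the change is internal and must stay inside the service boundary.
pub fn project(event: &StorageDomainEvent) -> Option<EventEnvelope> {
    match event {
        // Only the stable identifier crosses the boundary; `shard` and
        // `raw_row` are internal representation details and are dropped.
        StorageDomainEvent::NodeCreated { node_id, .. } => Some(EventEnvelope {
            event_type: "storage.node.created".to_string(),
            schema_version: 1,
            payload: vec![("node_id".to_string(), node_id.to_string())],
        }),
        // Assumed here to be of no interest to other nodes.
        StorageDomainEvent::EdgeDeleted { .. } => None,
    }
}
```

Keeping the projection in one explicit function gives code review a single place to check what leaves the node, which is the point of treating the integration event as a deliberate contract.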
## Consequences

- **Positive**: Prevents leaky event stores. Services are independently
  deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
  contracts. The call protocol's `EventEnvelope` is the only stable, versioned
  contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
  change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
  mock domain events. Integration events are tested against the `EventEnvelope`
  schema.
- **Negative**: Projection code is required. Every domain event that needs to
  cross a boundary must be explicitly projected. This is deliberate — the
  overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
  Honker streams across services. Code review should catch this pattern.

## References

- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
- [event_source_types.md](../../research/event-sourcing/event_source_types.md) — Event-driven architecture patterns
@@ -1,132 +0,0 @@
# ADR-033: OperationEnv as Universal Composition Mechanism

## Status

Accepted

## Context

The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
and can invoke any registered operation regardless of whether it runs locally, in
an irpc service on the same cluster, or on a remote node via call protocol.

The research documents define three dispatch paths:
1. **Local dispatch** — direct function call through the operation registry
2. **Service dispatch** — irpc protocol call to a service backend
3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node

Without a formal decision, irpc services could be seen as a replacement for
OperationEnv or for the call protocol. They are not — irpc is one dispatch
backend for OperationEnv, not a replacement for anything. The call protocol is
another dispatch backend. OperationEnv unifies them from the handler's
perspective.

The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams) — internal to the owning service
- irpc service calls — synchronous, in-cluster
- Call protocol events — asynchronous, cross-node

irpc services and call protocol operations serve different scopes but must
compose cleanly through OperationEnv.

## Decision

**OperationEnv is the universal composition mechanism that all operation
handlers receive. It provides namespace + operation name → invoke with input,
return output, regardless of dispatch path.**

### OperationEnv Behavioral Contract

```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```

The Rust implementation may use typed method dispatch or a registry behind the
scenes, but the handler-facing API must preserve this contract.

### Three Dispatch Paths

OperationEnv resolves each call to one of three dispatch backends:

| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |

All three produce the same `ResponseEnvelope`. The handler always calls
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
back.

### Service Assembly

The deployment topology determines which dispatch path each operation uses:

```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)             // Auth runs locally
    .local("config", config_registry)         // Config runs locally
    .service("secrets", secret_irpc_client)   // Secret service via irpc
    .remote("worker-1", call_protocol_conn);  // Worker-1 operations via call protocol
```

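One way to picture the namespace-to-backend routing described above is the sketch below. It is an illustration under stated assumptions, not the project's implementation: `DispatchBackend`, `LocalBackend`, `RoutedEnv`, `mount`, and `to_upper` are hypothetical names, `Value` is reduced to a `String`, and `ResponseEnvelope` is reduced to two fields. Only a local backend is shown; irpc and call protocol backends would implement the same trait.

```rust
use std::collections::HashMap;

/// Stand-ins for the real `Value` and `ResponseEnvelope` (assumed shapes).
pub type Value = String;

#[derive(Debug, Clone, PartialEq)]
pub struct ResponseEnvelope {
    pub ok: bool,
    pub body: String,
}

/// One dispatch backend. Local, irpc, and call protocol backends would each
/// implement this trait; only the local one is sketched here.
pub trait DispatchBackend: Send + Sync {
    fn dispatch(&self, operation: &str, input: Value) -> ResponseEnvelope;
}

/// Local backend: a direct function call through an in-process registry.
pub struct LocalBackend {
    handlers: HashMap<String, fn(Value) -> Value>,
}

impl LocalBackend {
    pub fn new() -> Self {
        Self { handlers: HashMap::new() }
    }

    pub fn register(mut self, operation: &str, handler: fn(Value) -> Value) -> Self {
        self.handlers.insert(operation.to_string(), handler);
        self
    }
}

impl DispatchBackend for LocalBackend {
    fn dispatch(&self, operation: &str, input: Value) -> ResponseEnvelope {
        match self.handlers.get(operation) {
            Some(handler) => ResponseEnvelope { ok: true, body: handler(input) },
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown operation: {}", operation),
            },
        }
    }
}

/// Namespace router: callers of `invoke` never learn which backend ran.
pub struct RoutedEnv {
    backends: HashMap<String, Box<dyn DispatchBackend>>,
}

impl RoutedEnv {
    pub fn new() -> Self {
        Self { backends: HashMap::new() }
    }

    pub fn mount(mut self, namespace: &str, backend: Box<dyn DispatchBackend>) -> Self {
        self.backends.insert(namespace.to_string(), backend);
        self
    }

    pub fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        match self.backends.get(namespace) {
            Some(backend) => backend.dispatch(operation, input),
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown namespace: {}", namespace),
            },
        }
    }
}

/// Example handler used to exercise the router.
pub fn to_upper(input: Value) -> Value {
    input.to_uppercase()
}
```

With this shape, moving a namespace from local to service or remote dispatch means mounting a different backend; handler code that calls `invoke` stays unchanged.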
### irpc Services Are One Dispatch Backend

irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
and efficient. But they are not a replacement for OperationEnv or for the call
protocol. They are one dispatch backend.

An irpc service can be exposed as a call protocol operation:
`/head/auth/verify` receives a call protocol event and internally calls
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:

```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```

### Adapters Map to OperationEnv

HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
(`/call.requested`) all resolve through OperationEnv. This is what makes
operations universally composable across all interfaces.

## Consequences

- **Positive**: Handlers compose through a single interface. Adding a new
  dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
  know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
  OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
  works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
  existing call protocol handler pattern. The registry currently maps paths to
  handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The `@alkdev/operations` TypeScript model (a nested map of
  namespace to operation name to function, roughly
  `HashMap<String, HashMap<String, fn>>` in Rust terms) needs an idiomatic Rust
  translation. The behavioral contract must match, but the implementation can
  differ.

## References

- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md) — Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation
@@ -1,55 +0,0 @@
# ADR-034: Head/Worker Terminology

## Status

Accepted

## Context

The project previously used hub/spoke terminology for describing node
relationships: a hub node that coordinates connections and spokes that connect to
it. This terminology implies a strict star topology where the hub is
fundamentally different from spokes.

In practice, a coordinating node can also execute operations (run services,
forward traffic). Any node can become a coordinator. The architecture supports
mesh topologies where nodes coordinate in a peer-to-peer fashion.

The research documents (`core.md`, `services.md`) and updated architecture
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
already use head/worker consistently. Existing ADRs (024, 025) retain their
original hub/spoke language because ADRs are historical records.

## Decision

**Use head/worker terminology throughout the project.**

- **Head node**: A node that coordinates — accepts connections, routes
  operations, manages cluster state. A head is also a worker (it can execute
  operations).
- **Worker node**: A node that connects to a head, registers its services, and
  executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.

The terms hub and spoke are deprecated in all new specs, code, and
documentation. Existing ADRs retain their original language as historical
records — ADRs document what was decided at the time, not what the current
terminology is.

## Consequences

- **Positive**: Natural mesh formation. A head that is also a worker enables
  multi-hop routing, redundancy, and distributed topologies without a
  centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture — there is no
  single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
  intentional — ADRs are historical records.

## References

- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
- [research/core.md](../../research/core.md) — Head/worker terminology
@@ -1,65 +0,0 @@
# ADR-035: StreamInterface and MessageInterface Split

## Status
Accepted

## Context

The `Interface` trait (ADR-026) assumes a persistent byte stream from a `Transport`. It produces a `Session` that yields `InterfaceEvent` frames. This works for SSH and raw framing — both run over duplex streams.

However, HTTP and DNS do not fit this model. They handle individual request/response pairs, not persistent sessions. HTTP runs over a TLS connection after byte-peek protocol detection (extending the existing stealth mode pattern). DNS runs its own server on port 53. Both are stateless per-request, not session-oriented.

The three-layer model (Transport, Interface, Protocol) remains correct. The issue is that Layer 2 has two distinct patterns: stream-based (SSH, raw framing) where the transport provides a continuous byte stream, and message-based (HTTP, DNS) where the interface manages its own transport and handles discrete requests.

## Decision

Split the `Interface` trait into two independent traits:

1. **`StreamInterface`** — consumes a `TransportStream`, produces a long-lived `Session` that yields `InterfaceEvent` frames. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations.

2. **`MessageInterface`** — handles individual `InterfaceRequest` → `InterfaceResponse` pairs. Manages its own transport (HTTP server, DNS server). `HttpInterface` and `DnsInterface` are `MessageInterface` implementations.

The traits are independent. They have different signatures (`accept(stream)` vs `handle_request(req)`), different lifecycles (long-lived session vs stateless per-request), and different transport ownership (provided by caller vs self-managed).

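The difference in signatures and lifecycles can be sketched as below. This is a deliberately simplified, synchronous illustration: the real traits are presumably async, and the type shapes here (`InterfaceEvent` as a byte vector, the fields of `InterfaceRequest` and `InterfaceResponse`) as well as `LineFraming`, `LineSession`, and `EchoInterface` are assumptions made for the example only.

```rust
use std::io::{BufRead, BufReader, Read, Write};

/// Simplified stand-ins (assumed shapes, not the project's definitions).
pub struct InterfaceEvent(pub Vec<u8>);
pub struct InterfaceRequest {
    pub path: String,
    pub body: Vec<u8>,
}
pub struct InterfaceResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// Stream-based: the caller provides the byte stream and receives a
/// long-lived session that yields frames until the stream ends.
pub trait StreamInterface {
    type Session: Iterator<Item = InterfaceEvent>;
    fn accept<S: Read + Write + 'static>(&self, stream: S) -> Self::Session;
}

/// Message-based: one request in, one response out, no session state.
pub trait MessageInterface {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse;
}

/// Hypothetical stream interface using newline-delimited frames.
pub struct LineFraming;

pub struct LineSession {
    reader: BufReader<Box<dyn Read>>,
}

impl Iterator for LineSession {
    type Item = InterfaceEvent;

    fn next(&mut self) -> Option<InterfaceEvent> {
        let mut line = Vec::new();
        match self.reader.read_until(b'\n', &mut line) {
            // End of stream or I/O error ends the session.
            Ok(0) | Err(_) => None,
            Ok(_) => {
                if line.last() == Some(&b'\n') {
                    line.pop();
                }
                Some(InterfaceEvent(line))
            }
        }
    }
}

impl StreamInterface for LineFraming {
    type Session = LineSession;

    fn accept<S: Read + Write + 'static>(&self, stream: S) -> LineSession {
        let boxed: Box<dyn Read> = Box::new(stream);
        LineSession { reader: BufReader::new(boxed) }
    }
}

/// Hypothetical message interface that echoes the request body.
pub struct EchoInterface;

impl MessageInterface for EchoInterface {
    fn handle_request(&self, req: InterfaceRequest) -> InterfaceResponse {
        InterfaceResponse { status: 200, body: req.body }
    }
}
```

Neither implementation needs anything from the other trait, which is the practical argument against a shared super-trait.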
`ListenerConfig` gains variants for both:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool,
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}
```

`TransportKind::Dns` is removed. DNS is a `MessageInterface` that manages its own transport (UDP/TCP port 53), not a transport variant.

The call protocol handler (Layer 3) is interface-agnostic: it processes `InterfaceEvent` frames from `StreamInterface` sessions and `InterfaceRequest` → `InterfaceResponse` from `MessageInterface` handlers. The dispatch logic is the same — only the framing differs.

## Consequences

**Positive**: HTTP and DNS are first-class interfaces with proper type signatures. No forcing stateless protocols into a session model. The existing stealth mode byte-peek pattern naturally extends to `HttpInterface`. The `InterfaceRequest` / `InterfaceResponse` types normalize calls across message-based interfaces.

**Positive**: Removing `TransportKind::Dns` prevents a breaking change later — code should never depend on DNS as a transport variant.

**Positive**: `ListenerConfig` correctly models the server's accept loop: stream listeners spawn one accept loop per (transport, interface) pair, while HTTP and DNS listeners each manage their own server.

**Negative**: Two traits where there was one. But they serve fundamentally different purposes. A common super-trait would add complexity (`accept_stream` + `handle_request` + `transport_kind`) without practical benefit — implementations satisfy one trait or the other, never both.

**Negative**: The `accept()` method on the current `Interface` trait needs to be renamed. This is a rename of an existing method signature, not a semantic change — `SshInterface` and `RawFramingInterface` implementations become `StreamInterface` implementations with the same `accept()` logic.

## References

- ADR-026 (transport/interface separation — updated by this ADR)
- [interface.md](../interface.md) — Interface layer spec
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — Full analysis
- [research/phase2/tls-transport.md](../../research/phase2/tls-transport.md) — HTTP interface, ListenerConfig
@@ -1,82 +0,0 @@
# ADR-036: CredentialProvider as Core Type

## Status
Accepted

## Context

Alknet's `IdentityProvider` resolves **inbound** authentication: given a
credential (fingerprint or token), produce an `Identity`. But there is no
corresponding abstraction for **outbound** credentials: how does alknet
authenticate _to_ external services (vast.ai, rustfs, gitea)?

Without `CredentialProvider`, each service wrapper would independently solve
credential retrieval, caching, and lifecycle management. This leads to
duplicated effort and inconsistent security practices across service wrappers.

The pattern mirrors the existing `IdentityProvider` pattern: trait in core,
default impl using simple storage, production impl using the secret service
and database.

## Decision

Define `CredentialProvider` trait and `CredentialSet` enum in
`alknet_core::credentials`.

```rust
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}
```

The trait is intentionally narrow. It returns credentials for a named service.
It does not try to abstract the auth mechanism itself — that stays with the
service wrapper that knows the protocol (S3 signing, OAuth2 refresh, etc.).

Phase 1 provides `SecretStoreCredentialProvider` (reads from
`SecretProtocol::Decrypt`, holds in RAM). Phase 2+ adds
`ManagedCredentialProvider` (with `CredentialManager` for lifecycle management:
refresh, expiration, provisioning).

`CredentialProvider` does not depend on `IdentityProvider`, though
`ManagedCredentialProvider` may use `Identity.id` for identity-bound credential
lookups.

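As a rough illustration of how narrow the trait is, the sketch below implements it with a plain in-memory map and shows a service wrapper turning a credential into a request header. `InMemoryCredentialProvider`, its `insert` method, and `auth_header` are hypothetical names for this example, and the `CredentialSet` enum is cut down to two variants to keep the sketch short; this is not the `SecretStoreCredentialProvider` described above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Reduced copy of `CredentialSet` with only two variants (for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Bearer { token: String },
}

pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

/// Hypothetical provider that keeps credentials in RAM, keyed by service name.
pub struct InMemoryCredentialProvider {
    store: RwLock<HashMap<String, CredentialSet>>,
}

impl InMemoryCredentialProvider {
    pub fn new() -> Self {
        Self { store: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, service: &str, creds: CredentialSet) {
        self.store.write().unwrap().insert(service.to_string(), creds);
    }
}

impl CredentialProvider for InMemoryCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.store.read().unwrap().get(service).cloned()
    }

    // Static credentials have nothing to refresh, so this re-reads the store.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.get_credentials(service)
    }
}

/// The service wrapper, not the provider, knows how a credential is
/// presented on the wire. Here that is a single HTTP header.
pub fn auth_header(creds: &CredentialSet) -> (String, String) {
    match creds {
        CredentialSet::ApiKey { header_name, token } => (header_name.clone(), token.clone()),
        CredentialSet::Bearer { token } => {
            ("Authorization".to_string(), format!("Bearer {}", token))
        }
    }
}
```

A lifecycle-aware provider could replace the in-memory one without touching `auth_header`, since the wrapper depends only on the trait.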
## Consequences

**Positive**: Outbound auth has a unified abstraction, just as inbound auth
has `IdentityProvider`. Service wrappers retrieve credentials through one
interface. `OperationEnv` can expose credentials through `context.env`.

**Positive**: The `CredentialSet` enum covers all identified credential types
(API keys, bearer tokens, S3 access keys, OIDC tokens, basic auth, custom).
This is sufficient for Phases A-C. Phase D (alknet as OIDC provider) is additive.

**Positive**: The trait in core, impl in service crate pattern is consistent
with `IdentityProvider` (trait in core, `ConfigIdentityProvider` in core,
`StorageIdentityProvider` in alknet-storage).

**Negative**: Adds a new core type and a new module (`credentials`). But this
is the same pattern as `IdentityProvider` and `auth` — a small, narrow trait
with a clear contract.

**Negative**: `ManagedCredentialProvider` and `CredentialManager` are Phase C
concepts. The spec should define them as future extensions, not implement them
now.

## References

- ADR-029 (Identity as core type — same pattern)
- [credentials.md](../credentials.md) — CredentialProvider spec
- [research/phase2/credential-provider.md](../../research/phase2/credential-provider.md) — Full analysis
- [identity.md](../identity.md) — IdentityProvider (inbound, opposite direction)
@@ -1,83 +0,0 @@
# ADR-037: API Keys as DynamicConfig Auth

## Status
Accepted

## Context

Alknet's token auth uses Ed25519-signed `AuthToken`s — the same key material
used for SSH auth. This is appropriate for interactive clients (browsers, CLI)
that can generate and sign Ed25519 key pairs.

But for service accounts, automation, and simple integrations, Ed25519 key
pairs are inconvenient. A dashboard backend, a CI/CD pipeline, or a monitoring
script needs a simple bearer token that can be stored in an environment variable
or config file without managing cryptographic key pairs.

The HTTP interface (Phase 2+) requires bearer token auth for `Authorization:
Bearer <token>` headers. `AuthToken` works but requires client-side Ed25519
signing. API keys offer a simpler alternative: short bearer tokens verified by
SHA-256 hash lookup, with optional scope restrictions and TTL.

## Decision

Add `[[auth.api_keys]]` section to `DynamicConfig`:

```toml
[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:abc..."
scopes = ["relay:connect", "secrets:derive"]
description = "dashboard service account"
ttl = "30d" # optional
```

`ConfigIdentityProvider::resolve_from_token()` handles both token types:
- If the input starts with the configured prefix (default `alk_`), treat it as
  an API key: hash it with SHA-256 and look up the hash in the `api_keys` table.
- Otherwise, treat it as an `AuthToken`: decode, verify Ed25519 signature,
  check timestamp, resolve from `authorized_keys`.

Both paths produce the same `Identity` result. In database-backed deployments,
both resolve to the same account UUID.

API keys are stored as SHA-256 hashes (like password hashing — the cleartext
key is never stored, only its hash). The prefix enables O(1) routing between
AuthToken and API key verification without trying both paths.

The full key is provided to the client exactly once (at creation time). Subsequent
verifications only compare hashes.

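The prefix-based routing step can be sketched as follows. This covers only the decision of which verification path to take; the SHA-256 lookup and the Ed25519 verification are left out. `TokenKind` and `classify_token` are hypothetical names for illustration, not part of `ConfigIdentityProvider`.

```rust
/// Which verification path a presented token should take.
#[derive(Debug, PartialEq)]
pub enum TokenKind<'a> {
    /// Hash with SHA-256, then look the hash up in the `api_keys` table.
    ApiKey(&'a str),
    /// Decode, verify the Ed25519 signature, check the timestamp.
    AuthToken(&'a str),
}

/// Route a token by prefix with a single string comparison, so only one
/// verification path is ever attempted.
pub fn classify_token<'a>(token: &'a str, api_key_prefix: &str) -> TokenKind<'a> {
    // Guard against an empty configured prefix: every string starts with "",
    // which would otherwise route all tokens to the API key path.
    if !api_key_prefix.is_empty() && token.starts_with(api_key_prefix) {
        TokenKind::ApiKey(token)
    } else {
        TokenKind::AuthToken(token)
    }
}
```

The empty-prefix guard is a design choice of this sketch; whether the real configuration permits an empty prefix is not stated in the ADR.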
## Consequences

**Positive**: Simple bearer token auth for HTTP and other non-SSH interfaces.
No cryptographic key management for service accounts. Consistent with industry
practice (Stripe, GitHub, AWS all use prefixed API keys).

**Positive**: Both AuthTokens and API keys go through `resolve_from_token()`.
The caller doesn't need to know which type they're using. This keeps the
authentication layer unified.

**Positive**: Scoped API keys enable fine-grained access control for service
accounts. A monitoring tool gets `["monitoring:read"]`, not full access.

**Negative**: API keys are bearer tokens — anyone who obtains the key has the
associated permissions. The hash storage and optional TTL mitigate but do not
eliminate this risk. Ed25519 AuthTokens remain the preferred auth method for
interactive clients.

**Negative**: API key rotation requires updating `DynamicConfig` (or the
`api_keys` database table). The `ConfigReloadHandle` / `ConfigService` reload
mechanism handles this, but it's a deliberate operation, not automatic.

**Negative**: No rate limiting on API key verification is built into this ADR.
Rate limiting on the HTTP interface is a separate concern.

## References

- ADR-023 (unified auth, shared key material)
- ADR-029 (Identity as core type)
- ADR-030 (static/dynamic config split)
- [auth.md](../auth.md) — Token auth, AuthPolicy, API keys
- [configuration.md](../configuration.md) — DynamicConfig, AuthPolicy
- [research/phase2/interface-model.md](../../research/phase2/interface-model.md) — API keys in config
@@ -1,137 +0,0 @@
# ADR-038: Seed Lifecycle and Memory Security

## Status

Accepted

## Context

The alknet-secret crate holds the master BIP39 seed phrase in RAM. This seed is
the root of trust for all derived keys (identity, encryption, signing). If the
seed is leaked — through memory dumps, swap files, or core dumps — an attacker
can derive every key in the system.

Security-conscious key management systems typically employ three defenses:

1. **Zeroize**: Overwrite sensitive memory before deallocating. Prevents
   stale-data reads from freed memory.

2. **Memory locking** (`mlock`/`VirtualLock`): Prevent the OS from paging
   sensitive RAM to disk. Prevents swap-file leakage.

3. **Constant-time comparison**: Prevent timing side-channels when comparing
   keys or tokens.

The question is: which of these should alknet-secret adopt in v1, and which
should be deferred?

## Decision

**Phase 3 (v1): Zeroize only. Defer mlock and constant-time comparison to
Phase B.**

- All sensitive types (seed bytes, derived private keys, passphrase strings)
  derive `Zeroize` and implement `Drop` to call `zeroize()` before deallocation.
- The `Lock` operation calls `zeroize()` on the seed and all cached derived
  keys, then drops them.
- `mlock`/`VirtualLock` and constant-time comparison are not included in v1.

### Rationale for deferring mlock

1. **Complexity**: `mlock` requires root/CAP_IPC_LOCK on Linux or
   `SeLockMemory` on Windows. The crate should work in unprivileged contexts
   (development, testing, single-user nodes) without requiring system
   configuration changes.

2. **Performance**: `mlock` locks physical pages, which are typically 4KB.
   Locking many small buffers wastes physical memory. The seed (64 bytes) and
   derived keys (32–64 bytes each) are tiny — the real risk is swap-file
   leakage, which `zeroize` partially mitigates by wiping before free.

3. **Deployment flexibility**: Production head nodes running as root or with
   `CAP_IPC_LOCK` can add `mlock` in Phase B. Development and CLI nodes
   shouldn't need it.

4. **Audit surface**: `mlock` introduces platform-specific code paths (Linux
   vs macOS vs Windows) that should be audited together, not bolted on
   incrementally.

### Rationale for deferring constant-time comparison

The `SecretProtocol` service receives requests over irpc (local mpsc or remote
QUIC). Comparison timing is not observable by callers — they send a message and
wait for a response. The comparison that matters (auth token verification) is
in alknet-core's `IdentityProvider`, not in alknet-secret. Key derivation
results (DerivedKey) are not compared against attacker-controlled input within
this crate.

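### Zeroize implementation

```rust
use std::collections::HashMap;
use zeroize::Zeroize;

// Seed bytes are wiped when the holder is dropped.
#[derive(Zeroize)]
#[zeroize(drop)]
struct SeedHolder {
    seed: Vec<u8>,
}

// `HashMap` does not implement `Zeroize`, so the cache cannot derive it;
// each cached key is wiped by hand when the cache is dropped.
struct DerivedKeyCache {
    keys: HashMap<String, Vec<u8>>,
}

impl Drop for DerivedKeyCache {
    fn drop(&mut self) {
        for key in self.keys.values_mut() {
            key.zeroize();
        }
    }
}
```

`#[zeroize(drop)]` ensures that `Drop` calls `zeroize()` on all fields of
`SeedHolder`, overwriting memory before deallocation. For derived types this is
a compile-time guarantee: a field whose type does not implement `Zeroize` is a
compile error unless it is explicitly skipped. `DerivedKeyCache` wipes its
entries in a hand-written `Drop` because `HashMap` does not implement `Zeroize`.
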
### Lock lifecycle

```
Unlock(passphrase)
  → validate mnemonic (if restoring) or generate new
  → derive master key from seed
  → store seed in SeedHolder (Zeroize-protected)
  → cache empty (keys derived on demand)

DeriveEd25519/DeriveEncryptionKey/Encrypt/Decrypt
  → require unlocked state (error if locked)
  → derive key, return result
  → optionally cache derived key

Lock
  → zeroize all cached derived keys
  → zeroize seed
  → drop all sensitive material
  → service returns to locked state
```

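The locked/unlocked gating in this lifecycle can be sketched as a small state holder. This is a std-only illustration under stated assumptions: `SecretService`, `SecretError`, and `with_seed` are hypothetical names, no key derivation is performed, and the plain `fill(0)` merely stands in for `zeroize()`. Unlike `zeroize`, a plain overwrite may be elided by the optimizer, so it should not be relied on for real secrets.

```rust
#[derive(Debug, PartialEq)]
pub enum SecretError {
    /// An operation was attempted while the service was locked.
    Locked,
}

/// Holds the seed only while unlocked. `None` is the locked state.
pub struct SecretService {
    seed: Option<Vec<u8>>,
}

impl SecretService {
    /// The service starts out locked.
    pub fn new() -> Self {
        Self { seed: None }
    }

    /// Unlock with seed bytes. Any previously held seed is wiped first.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.lock();
        self.seed = Some(seed);
    }

    /// Run an operation that needs the seed. Fails if the service is locked,
    /// mirroring "require unlocked state (error if locked)".
    pub fn with_seed<R>(&self, op: impl FnOnce(&[u8]) -> R) -> Result<R, SecretError> {
        match &self.seed {
            Some(seed) => Ok(op(seed.as_slice())),
            None => Err(SecretError::Locked),
        }
    }

    /// Wipe and drop the seed, returning to the locked state.
    pub fn lock(&mut self) {
        if let Some(mut seed) = self.seed.take() {
            // Stand-in for `zeroize()`; see the caveat in the lead-in.
            seed.fill(0);
        }
    }
}
```

Modelling the locked state as `None` means no operation can reach seed bytes without passing through the `with_seed` check.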
## Consequences

- **Positive**: Zeroize is zero-cost at compile time, minimal dependency
  (`zeroize` crate is ~500 lines, no FFI or inline assembly), and provides
  meaningful protection against stale-memory reads.
- **Positive**: Lock effectively purges all sensitive material. After Lock,
  the process memory contains no useful secret data.
- **Positive**: No platform-specific code paths in v1. The crate compiles and
  runs everywhere without privilege requirements.
- **Negative**: Without `mlock`, the OS can page the seed to swap before
  zeroization occurs. This is a window of vulnerability that Phase B closes.
  The risk is acceptable for v1 because swap-file extraction requires root
  access or physical access to the machine — the same threat model as reading
  process memory directly.
- **Negative**: Without constant-time comparison, timing side-channels exist
  in theory. In practice, no comparison in alknet-secret operates on
  attacker-controlled input, so the risk is nil within this crate.
- **Negative**: `zeroize` adds a dependency. The `zeroize` crate is widely
  used in Rust crypto (ring, ed25519-dalek, x25519-dalek) and is a de facto
  standard.

## References

- [secret-service.md](../secret-service.md) — Security model, Lock/Unlock lifecycle
- [ADR-027](027-crate-decomposition.md) — Crate decomposition (alknet-secret is independent)
- [credentials.md](../credentials.md) — SecretStoreCredentialProvider integration
- `zeroize` crate — https://crates.io/crates/zeroize
@@ -1,226 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Definitions: Terminology and Concept Disambiguation

## Purpose

Several terms are overloaded across alknet's architecture. This document defines
each term precisely and states the rule for using it in architecture specs. When
ambiguity is possible, specs must use the full qualifier.

This is a normative reference — other architecture documents link here rather
than repeating definitions inline.

## Term Definitions

### Interface (Layer 2)

An **Interface** consumes a Transport stream (Layer 1) or manages its own
transport, and produces call protocol sessions or handles discrete requests.
It is a _protocol parser_, not a network service.

Two subtypes:

| Subtype | Trait | Lifecycle | Transport ownership | Examples |
|---------|-------|-----------|---------------------|----------|
| `StreamInterface` | `accept(stream) → Session` | Long-lived session | Provided by caller | SshInterface, RawFramingInterface |
| `MessageInterface` | `handle_request(req) → Response` | Stateless per-request | Self-managed | HttpInterface, DnsInterface |

**Rule**: In alknet architecture docs, "Interface" (capitalized) refers to
Layer 2. Rust trait definitions use "trait" or "contract." Network URLs use
"endpoint." When discussing auth mechanisms per transport/interface pair, use
"credential presentation" (not "auth interface").

See: [interface.md](interface.md), ADR-035.

### Transport (Layer 1)

A **Transport** produces a byte stream (`AsyncRead + AsyncWrite + Unpin + Send`).
It is a _wire mechanism_, not a protocol. `TransportKind` enumerates:
`Tcp`, `Tls`, `Iroh`, `WebTransport`.

DNS is **not** a transport — it is a `MessageInterface` that manages its own
transport (UDP/TCP port 53).

**Rule**: Never use "transport" to refer to HTTP, DNS, or any protocol that
doesn't produce a `TransportStream`. Use "MessageInterface" instead.

See: [transport.md](transport.md), ADR-026, ADR-035.

### Service (irpc service)

An **irpc service** is an in-cluster, Rust-to-Rust service defined by an irpc
protocol enum. Dispatched by enum variant with postcard serialization. Examples:
`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`.

**Rule**: Always qualify: "irpc service" (in-cluster, enum-dispatched),
"application service" (operation-registered handler), or "external service"
(third-party endpoint). Never use bare "service" in architecture docs.

See: [services.md](services.md), ADR-028, ADR-033.

### Operation (call protocol)

An **operation** is a path-based handler registered in `OperationRegistry`,
dispatched by `namespace + name`. Cross-node, cross-language, JSON
`EventEnvelope` framing.

**Rule**: Use "operation" for call protocol handlers. Use "irpc service method"
for enum-dispatched calls. These are different dispatch mechanisms unified by
OperationEnv.

See: [call-protocol.md](call-protocol.md), ADR-033.

### Identity (core type)

The `Identity` struct `{ id, scopes, resources }` represents an authenticated
principal. Produced by `IdentityProvider` (inbound auth resolution).

| Identity field | Config-backed auth | Database-backed auth |
|---------------|-------------------|---------------------|
| `id` | SSH key fingerprint | Account UUID |
| `scopes` | From authorized_keys entry | From peer_credentials + ACL |
| `resources` | From authorized_keys entry | From organization membership |

**Rule**: "Identity" (capitalized, code font) = the alknet struct. "identity
service" = a full identity management system (Keystone, etc.). Never conflate
the two.

See: [identity.md](identity.md), ADR-029.

### IdentityProvider (inbound auth)

`IdentityProvider` resolves **inbound** authentication: given a credential
(fingerprint or token), produce an `Identity`.

**Direction**: Inbound (who is calling alknet).

**Rule**: Never use "IdentityProvider" to describe outbound auth. That is
`CredentialProvider`.

See: [identity.md](identity.md), ADR-029.

### CredentialProvider (outbound auth)

`CredentialProvider` resolves **outbound** credentials: given a service name,
produce a `CredentialSet` for authenticating _to_ that service.

**Direction**: Outbound (how alknet calls others).

**Rule**: Never use "CredentialProvider" for inbound auth. That is
`IdentityProvider`.

See: [credentials.md](credentials.md), ADR-036.

### AuthToken

`AuthToken = base64url(key_id || timestamp || signature)` — an Ed25519-signed
timestamp token used for non-SSH auth. Self-signed by the client, verified
server-side.

**Rule**: Use "AuthToken" (capitalized) for this specific format. Use "API key"
for hash-verified bearer tokens. Never use bare "token" in architecture docs.

See: [auth.md](auth.md), ADR-023.

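
The `key_id || timestamp || signature` layout can be illustrated with a small splitter that works on the already base64url-decoded bytes. This is only a sketch under stated assumptions: the 32-byte `key_id` and the 8-byte big-endian timestamp are widths chosen for illustration (this section does not specify them), and `RawAuthToken` and `split_token` are hypothetical names. Only the 64-byte signature length follows from Ed25519 itself. Signature verification is out of scope here.

```rust
use std::convert::TryInto;

const KEY_ID_LEN: usize = 32; // assumption: not specified in this document
const TIMESTAMP_LEN: usize = 8; // assumption: u64, big-endian
const SIGNATURE_LEN: usize = 64; // Ed25519 signatures are 64 bytes

/// Hypothetical decoded form of an AuthToken.
#[derive(Debug, PartialEq)]
pub struct RawAuthToken {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

/// Splits decoded token bytes into their three fields.
/// Returns `None` when the total length does not match the assumed layout.
pub fn split_token(bytes: &[u8]) -> Option<RawAuthToken> {
    if bytes.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let (key_id, rest) = bytes.split_at(KEY_ID_LEN);
    let (timestamp, signature) = rest.split_at(TIMESTAMP_LEN);
    Some(RawAuthToken {
        key_id: key_id.try_into().ok()?,
        timestamp: u64::from_be_bytes(timestamp.try_into().ok()?),
        signature: signature.try_into().ok()?,
    })
}
```

Rejecting any length mismatch up front keeps the later slicing infallible, so a malformed token never reaches the verification step.
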
### API Key

A hash-verified bearer token with a prefix like `alk_...`. Simpler than
AuthToken (no Ed25519 key pair needed). Stored as SHA-256 hash in
`DynamicConfig.auth.api_keys` or `api_keys` table.

**Rule**: Always "API key" (two words) for hash-verified bearer tokens.
"AuthToken" for Ed25519-signed tokens.

See: [auth.md](auth.md), ADR-037.

### Domain Event vs Integration Event

| Type | Scope | Serialization | Example |
|------|-------|---------------|---------|
| Domain event | Within a service boundary | Any format (Honker streams) | `KeyRotated`, `InventoryAdjusted` |
| Integration event | Across service or node boundaries | JSON `EventEnvelope` | `call.requested`, `UserCreated` |

irpc service calls are synchronous request-response, not events.

**Rule**: "Domain event" for internal Honker streams. "Integration event" for
call protocol `EventEnvelope`. "irpc call" for synchronous in-cluster calls.
Per ADR-032, domain events never cross service boundaries without projection.

See: ADR-032, [services.md](services.md).

### Scope

A permission string attached to an `Identity`. Flat strings like
`"relay:connect"`, `"secrets:derive"`. Used by `ForwardingPolicy` and
operation-level ACL.

**Rule**: Use "scope" for `Identity.scopes` flat strings. Use "resource" for
`Identity.resources` entries. Do not conflate with hierarchical role models
unless explicitly noting a comparison to Keystone.

See: [identity.md](identity.md), ADR-031.

### OperationRegistry

The central registry mapping `(namespace, operation_name)` to handlers and
specs. All interfaces resolve to the same registry.

**Rule**: "OperationRegistry" for this specific data structure. "Service
catalog" only when explicitly comparing to Keystone or similar external systems.

See: [call-protocol.md](call-protocol.md), ADR-025.

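
The `(namespace, operation_name)` lookup can be pictured as a map keyed by that pair. The following is a minimal std-only sketch, not the alknet-core type: the `Handler` signature (string in, string out) and the `register`/`invoke` methods are assumptions made for illustration, and specs are omitted entirely.

```rust
use std::collections::HashMap;

/// Hypothetical handler signature: takes a serialized input, returns a serialized output.
type Handler = fn(&str) -> String;

/// Illustrative registry keyed by `(namespace, operation_name)`.
#[derive(Default)]
pub struct OperationRegistry {
    handlers: HashMap<(String, String), Handler>,
}

impl OperationRegistry {
    pub fn register(&mut self, namespace: &str, name: &str, handler: Handler) {
        self.handlers
            .insert((namespace.to_string(), name.to_string()), handler);
    }

    /// Dispatches by `(namespace, name)`; `None` when nothing is registered under that key.
    pub fn invoke(&self, namespace: &str, name: &str, input: &str) -> Option<String> {
        let key = (namespace.to_string(), name.to_string());
        self.handlers.get(&key).map(|handler| handler(input))
    }
}

/// Example handler used to exercise the registry.
pub fn echo(input: &str) -> String {
    format!("echo:{}", input)
}
```

Because every interface resolves to the same map, an operation registered once is reachable regardless of which interface received the request.
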
### Credential Presentation

The mechanism by which credentials are presented on each (Transport, Interface)
pair:

| (Transport, Interface) | Credential presentation | Resolves via |
|----------------------|----------------------|-------------|
| (TLS, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TCP, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (iroh, SSH) | SSH key handshake | `resolve_from_fingerprint()` |
| (TLS, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (TCP, raw framing) | AuthToken in frame header | `resolve_from_token()` |
| (WebTransport, raw framing) | AuthToken in CONNECT request | `resolve_from_token()` |
| (—, HTTP) | `Authorization: Bearer` header | `resolve_from_token()` |
| (—, DNS) | AuthToken in query labels | `resolve_from_token()` |

**Rule**: Use "credential presentation" for the mechanism of presenting
credentials on a specific (Transport, Interface) pair. Not "auth interface"
(which overloads "Interface").

See: [auth.md](auth.md), [interface.md](interface.md).

## Cross-cutting Open Questions

These questions affect multiple specs and need resolution before or during
Phase 2 implementation:

- **OQ-DEF-03**: Should `Identity.scopes` be hierarchical (Keystone implied roles)
  or stay flat? Recommendation: Stay flat. Add implied scope resolution in
  alknet-storage when multi-tenant deployment requires it.

- **OQ-DEF-07**: Should the on-chain `IdentityProvider` be a separate impl or a
  `CredentialProvider` extension? Recommendation: Separate `IdentityProvider`
  impl (`OnChainIdentityProvider`). `IdentityProvider` resolves inbound auth,
  not outbound credentials.

- **OQ-DEF-08**: Should "credential presentation" replace overloaded "interface" in
  auth contexts? Recommendation: Yes. Adopted in this document.

See: [open-questions.md](open-questions.md) for tracking.

## References

- [interface.md](interface.md) — StreamInterface / MessageInterface
- [auth.md](auth.md) — AuthToken, credential presentation per interface
- [identity.md](identity.md) — Identity, IdentityProvider
- [credentials.md](credentials.md) — CredentialProvider, CredentialSet
- [services.md](services.md) — irpc services vs application services
- [call-protocol.md](call-protocol.md) — Operations, OperationEnv
- [research/phase2/definitions.md](../research/phase2/definitions.md) — Full research with cross-domain mappings
@@ -1,186 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---

# FlowGraph

## What

The `alknet-flowgraph` crate provides graph data structures and operations,
mapping the TypeScript `@alkdev/flowgraph` package's call-graph and
operation-graph concepts to `petgraph::DiGraph`.

## Why

Call graphs and operation graphs are core observability and type-safety
constructs. Call graphs track request flow across services; operation graphs
validate type compatibility between composed operations. The crate is pure
computation (no I/O, no external state), making it safe to include in any
deployment topology.

## Architecture

### Core Abstraction

`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the
operations used:

| TypeScript (graphology) | Rust (petgraph) |
|------------------------|-----------------|
| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` + key_to_index |
| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` |
| `hasCycle()` | `is_cyclic_directed(&graph)` |
| `topologicalSort()` | `toposort(&graph, None)` |

A `HashMap<String, NodeIndex>` provides node-key-to-index lookups, mirroring
the `key` column in the SQLite `nodes` table.

### `FlowGraph<N, E>`

```rust
use std::collections::HashMap;
use std::fmt::Debug;

use petgraph::graph::{DiGraph, NodeIndex};
use serde::{de::DeserializeOwned, Serialize};

pub struct FlowGraph<N, E>
where
    N: NodeAttributes,
    E: EdgeAttributes,
{
    graph: DiGraph<N, E>,
    key_to_index: HashMap<String, NodeIndex>,
}

pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn key(&self) -> &str;
    fn set_key(&mut self, key: String);
}

pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn edge_type(&self) -> &str;
}
```

### Operation Graph (Static)

Built from `OperationSpec`s at startup. Answers structural questions: type
compatibility, cycle detection, reachability.

```rust
pub struct OperationNodeAttrs {
    pub name: String,
    pub namespace: String,
    pub op_type: OperationType,
    pub input_schema: Value,
    pub output_schema: Value,
}

pub enum OperationType { Query, Mutation, Subscription }
```

Type compatibility compares `output_schema` (source) against `input_schema`
(target) using `jsonschema::validate()`. Exact match or subtype = compatible
edge. Structural mismatch = incompatible edge.

### Call Graph (Dynamic)

Populated at runtime from call protocol events. Every `call.requested` adds a
node; `call.responded`/`call.error`/`call.aborted` update status.

```rust
pub struct CallNodeAttrs {
    pub request_id: String,
    pub operation_id: String,
    pub status: CallStatus,
    pub parent_request_id: Option<String>,
    pub input: Value,
    pub output: Option<Value>,
    pub error: Option<CallErrorInfo>,
    pub identity: Option<Identity>,
    pub started_at: Option<String>,
    pub completed_at: Option<String>,
}

pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }
```

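
The event-to-status rule described above can be sketched with a plain map from `request_id` to status. This is a hedged illustration rather than the crate's implementation: the `CallEvent` shape and the `CallStatuses` type are hypothetical, the initial `Pending` status for `call.requested` is an assumption, and terminal events for unknown requests are simply ignored here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }

/// Hypothetical event shape; the real events arrive as `EventEnvelope`s.
pub enum CallEvent {
    Requested { request_id: String },
    Responded { request_id: String },
    Error { request_id: String },
    Aborted { request_id: String },
}

#[derive(Default)]
pub struct CallStatuses {
    by_request: HashMap<String, CallStatus>,
}

impl CallStatuses {
    /// `Requested` adds a node; the other events update an existing node's status.
    pub fn update_from_event(&mut self, event: &CallEvent) {
        let (id, status) = match event {
            CallEvent::Requested { request_id } => {
                // Assumption: new calls start out as Pending.
                self.by_request.insert(request_id.clone(), CallStatus::Pending);
                return;
            }
            CallEvent::Responded { request_id } => (request_id, CallStatus::Completed),
            CallEvent::Error { request_id } => (request_id, CallStatus::Failed),
            CallEvent::Aborted { request_id } => (request_id, CallStatus::Aborted),
        };
        // Assumption: events for unknown requests are dropped silently.
        if let Some(slot) = self.by_request.get_mut(id) {
            *slot = status;
        }
    }

    pub fn status(&self, request_id: &str) -> Option<CallStatus> {
        self.by_request.get(request_id).copied()
    }
}
```
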
### Key Operations

| Query | Method | Returns |
|-------|--------|---------|
| Topological order | `topological_order()` | `Result<Vec<String>, CycleError>` |
| Cycle detection | `has_cycles()` | `bool` |
| Ancestors/descendants | `ancestors()`, `descendants()` | `Vec<String>` |
| Status filtering | `filter_by_status()` | Keys with matching status |
| Duration | `duration()` | `completed_at - started_at` |

### DAG Invariants

- **Operation graph**: DAG-only, enforced at construction. Cycles are rejected
  with `CycleError`.
- **Call graph**: DAG by design. `parent_request_id` cannot create ancestor
  cycles.
- **No parallel edges**: `multi: false`.
- **No self-loops**: `allow_self_loops: false`.

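
The key-to-index lookup, the `topological_order()` contract, and the edge invariants can be sketched without petgraph. The block below is a std-only stand-in using Kahn's algorithm, not the crate's code: `KeyedDag` is a hypothetical name, and unlike the operation graph described above it reports cycles at query time rather than rejecting them at construction.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, PartialEq)]
pub struct CycleError;

/// Std-only stand-in for the petgraph-backed graph: string keys map to dense indices.
#[derive(Default)]
pub struct KeyedDag {
    keys: Vec<String>,
    key_to_index: HashMap<String, usize>,
    edges: Vec<Vec<usize>>, // adjacency list: edges[source] holds target indices
}

impl KeyedDag {
    /// Returns the index for `key`, creating the node if it does not exist yet.
    pub fn add_node(&mut self, key: &str) -> usize {
        if let Some(&index) = self.key_to_index.get(key) {
            return index;
        }
        let index = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), index);
        self.edges.push(Vec::new());
        index
    }

    /// Adds an edge; self-loops and parallel edges are skipped (an assumption).
    pub fn add_edge(&mut self, source: &str, target: &str) {
        let s = self.add_node(source);
        let t = self.add_node(target);
        if s != t && !self.edges[s].contains(&t) {
            self.edges[s].push(t);
        }
    }

    /// Kahn's algorithm: if some node never reaches in-degree zero, a cycle exists.
    pub fn topological_order(&self) -> Result<Vec<String>, CycleError> {
        let mut indegree = vec![0usize; self.keys.len()];
        for targets in &self.edges {
            for &t in targets {
                indegree[t] += 1;
            }
        }
        let mut ready: VecDeque<usize> =
            (0..self.keys.len()).filter(|&i| indegree[i] == 0).collect();
        let mut order = Vec::with_capacity(self.keys.len());
        while let Some(n) = ready.pop_front() {
            order.push(self.keys[n].clone());
            for &t in &self.edges[n] {
                indegree[t] -= 1;
                if indegree[t] == 0 {
                    ready.push_back(t);
                }
            }
        }
        if order.len() == self.keys.len() { Ok(order) } else { Err(CycleError) }
    }

    pub fn has_cycles(&self) -> bool {
        self.topological_order().is_err()
    }
}
```
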
### Integration with alknet-storage

Call graphs and operation graphs are stored as metagraph instances in
alknet-storage. The bridge is serialization: `FlowGraph` serializes to
`serde_json::Value`, which storage persists in the `nodes.attributes` and
`edges.attributes` columns.

### Integration with alknet-core (Call Protocol)

The call protocol's `EventEnvelope` drives call graph construction:

```rust
call_map.on_requested(|event| {
    call_graph.update_from_event(&CallEvent::Requested(event));
});
```

### Crate Dependencies

```toml
[dependencies]
petgraph = "0.x"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
thiserror = "1"
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.x", features = ["serde"] }
```

Does NOT depend on alknet-core, alknet-storage, or alknet-secret.

### Interface Back to Core

`OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions.
The bridge is serialization — flowgraph serializes to JSON, storage persists it.
alknet-flowgraph does not depend on alknet-core as a crate; it conforms to the
`OperationSpec` schema independently.

## Constraints

- Pure computation crate — no I/O, no database, no external state.
- No dependency on alknet-core, alknet-storage, or alknet-secret.
- Type compatibility with alknet-core's `OperationSpec` is via serialization
  conformance, not a crate dependency.

## Open Questions

- None specific to this spec. See [open-questions.md](open-questions.md) for
  general questions.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-flowgraph is independent of core, storage, secret |

## References

- [research/flow.md](../research/flow.md) — Full FlowGraph, operation graph, call graph design
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.3
- [call-protocol.md](call-protocol.md) — EventEnvelope, PendingRequestMap
- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph implementation
- `@alkdev/operations` — OperationSpec, CallHandler, registry
@@ -1,193 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---

# Identity

## What

The `Identity` type and `IdentityProvider` trait are the core abstractions for
authentication and authorization in alknet. `Identity` is the unified result of
auth verification — whether via SSH public key, signed timestamp token, or
database lookup. `IdentityProvider` is the trait that resolves credentials to an
`Identity`, decoupling alknet-core from any specific identity storage.

## Why

Auth, forwarding policy, and call protocol all need to know who is making a
request and what they are authorized to do. Without `Identity` in core, each
subsystem would define its own identity type, leading to duplication and
conversion boilerplate. Without `IdentityProvider` as a trait, alknet-core
would either hardcode config-file-based auth or take a database dependency —
neither acceptable for a library crate.

The `IdentityProvider` trait exists because the same auth verification concept
needs two implementations: `ConfigIdentityProvider` for minimal deployments (all
keys in memory via ArcSwap) and `StorageIdentityProvider` for production (SQLite
lookup via `peer_credentials` and ACL graph). The trait is the contract; the
backing store is pluggable.

## Architecture

### Identity Struct

```rust
pub struct Identity {
    pub id: String,                              // Fingerprint or account UUID
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```

The `id` field serves a dual purpose:
- **Config-based auth** (`ConfigIdentityProvider`): holds the Ed25519 key
  fingerprint (e.g., `SHA256:abc123...`)
- **Database-backed auth** (`StorageIdentityProvider`): holds the account UUID
  from the `accounts` table

This keeps the type simple while accommodating both auth paths. Downstream
consumers (forwarding policy, call protocol ACL checks) use `scopes` and
`resources` without knowing whether the identity came from a config file or a
database.

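
How a downstream consumer might use `scopes` and `resources` without caring where the identity came from can be sketched as follows. The helper methods `has_scope` and `has_resource` and the `example_identity` constructor are hypothetical, added only for illustration; the checks are exact string matches, consistent with flat scopes.

```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// Exact-match check against the flat scope list (no hierarchy, no wildcards).
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// True when `name` is listed under the resource `kind`.
    pub fn has_resource(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n == name))
    }
}

/// Builds an identity using the example values from the struct comments.
pub fn example_identity() -> Identity {
    let mut resources = HashMap::new();
    resources.insert(
        "service".to_string(),
        vec!["gitea".to_string(), "registry".to_string()],
    );
    Identity {
        id: "SHA256:abc123".to_string(), // placeholder fingerprint
        scopes: vec!["relay:connect".to_string(), "service:gitea:read".to_string()],
        resources,
    }
}
```

The same checks apply unchanged if `id` holds an account UUID instead of a fingerprint, which is the point of the dual-purpose field.
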
### IdentityProvider Trait

```rust
pub trait IdentityProvider: Send + Sync + 'static {
    /// Resolve an SSH public key fingerprint to an identity.
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;

    /// Resolve an auth token to an identity.
    /// Returns None if the token is invalid, expired, or the key is not authorized.
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```

Both SSH key auth and token auth resolve to the same `Identity` type. The trait
lives in `alknet_core::auth`.

### ConfigIdentityProvider (Default)

Reads from `ArcSwap<DynamicConfig.auth>` per ADR-030. Every authorized key gets
a default scope set. No database dependency. This is the default for CLI and
single-node deployments.

```rust
pub struct ConfigIdentityProvider {
    auth_config: Arc<ArcSwap<DynamicConfig>>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let config = self.auth_config.load();
        config.auth.ssh.authorized_keys.get(fingerprint)
            .map(|key_entry| Identity {
                id: fingerprint.to_string(),
                scopes: key_entry.scopes.clone(),
                resources: key_entry.resources.clone(),
            })
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // Verify Ed25519 signature against the same authorized_keys set
        // Resolve to the same Identity as SSH auth would produce
        todo!("token verification is elided in this spec")
    }
}
```

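
To show that callers depend only on the trait contract and not on the backing store, here is a std-only sketch of a provider backed by a plain `HashMap`. It is an illustration under assumptions, not part of alknet-core: `FingerprintResolver` is a hypothetical, fingerprint-only subset of the trait (token resolution is left out), and `KeyEntry`, `InMemoryProvider`, and `authorize` are likewise made-up names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical fingerprint-only subset of the provider contract.
pub trait FingerprintResolver: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical per-key entry carrying the fields copied into `Identity`.
#[derive(Clone, Default)]
pub struct KeyEntry {
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// In-memory provider, e.g. for tests; no ArcSwap, no database.
#[derive(Default)]
pub struct InMemoryProvider {
    pub authorized_keys: HashMap<String, KeyEntry>,
}

impl FingerprintResolver for InMemoryProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized_keys.get(fingerprint).map(|entry| Identity {
            id: fingerprint.to_string(),
            scopes: entry.scopes.clone(),
            resources: entry.resources.clone(),
        })
    }
}

/// A caller that only sees the trait object: unknown keys and missing scopes are denied.
pub fn authorize(provider: &dyn FingerprintResolver, fingerprint: &str, scope: &str) -> bool {
    provider
        .resolve_from_fingerprint(fingerprint)
        .map_or(false, |identity| identity.scopes.iter().any(|s| s == scope))
}
```
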
### StorageIdentityProvider (Future — Phase 2+)

Implemented in `alknet-storage` (a crate that doesn't exist yet). Backed by
SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes.

This implementation is defined here so the contract is clear, but alknet-storage
hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` exclusively. When
alknet-storage is built, it implements alknet-core's `IdentityProvider` trait,
and the CLI/NAPI assembly layer wires the concrete implementation.

### AuthProtocol irpc Service

The `AuthProtocol` irpc service (behind the `irpc` feature flag per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it:

```rust
enum AuthProtocol {
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
    ReloadKeys,
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```

The relationship:
- **Trait-based path**: Handler calls `identity_provider.resolve_from_fingerprint()`
  directly. Zero overhead. Used when irpc is disabled or when the
  implementation is local.
- **irpc path**: Handler calls `identity_provider.resolve_from_fingerprint()`,
  which internally delegates to `AuthProtocol::VerifyPubkey` via an irpc client.
  Used in production deployments with SQLite-backed auth.

Both paths produce the same `Identity` result. Note: the irpc path requires the
service layer to be built (Phase 2+). Phase 1 uses the trait path exclusively.

### Auth Flows

**SSH key auth** (existing, unchanged):
```
Client connects → SSH handshake → auth_publickey() callback
  → IdentityProvider::resolve_from_fingerprint(fingerprint)
    → Some(Identity) or None
```

**Token auth** (new, for non-SSH transports):
```
Browser connects → WebTransport CONNECT request
  → Extract token from URL path or Authorization header
    → IdentityProvider::resolve_from_token(token)
      → Some(Identity) or None
```

Both paths produce an `Identity`. The `Identity` is attached to the connection
and used by `ForwardingPolicy` and call protocol for authorization decisions.

## Constraints

- `Identity` and `IdentityProvider` live in `alknet_core::auth`. No database
  dependency at the core level (ADR-029).
- alknet-storage implements the core trait — the dependency goes from storage
  to core, not the other way.
- The `id` field in `Identity` serves dual purpose (fingerprint or UUID). This
  is a deliberate simplification — downstream consumers don't need to know the
  source.
- Certificate authority tokens are not supported for token auth in v1 (ADR-023).
- The irpc feature flag means nodes that only do SSH tunneling don't need the
  service layer overhead.

## Open Questions

- None specific to this spec. See [open-questions.md](open-questions.md) for
  general auth questions (OQ-15, OQ-19).

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` live in alknet-core, not storage |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | `AuthProtocol` behind feature flag; `IdentityProvider` is the contract |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth; same `Identity` result |

## References

- [auth.md](auth.md) — Token authentication, AuthPolicy, WebTransport session handling
- [research/services.md](../research/services.md) — AuthService, AuthProtocol definition
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.2
- [ADR-030](decisions/030-static-dynamic-config-split.md) — DynamicConfig (ConfigIdentityProvider reads from it)
- [ADR-031](decisions/031-forwarding-policy.md) — ForwardingPolicy consumes Identity.scopes
@@ -1,390 +0,0 @@
---
status: draft
last_updated: 2026-06-09
---

# Interface (Layer 2)

## What

The Interface layer sits between Transport (Layer 1) and Protocol (Layer 3).
Interfaces consume byte streams from Transports or manage their own transports,
and produce call protocol sessions or handle discrete requests. SSH is an
interface, not a transport — it wraps a byte stream in session semantics. Raw
framing (4-byte length prefix + JSON `EventEnvelope`) is another interface.
HTTP and DNS are message-based interfaces that handle individual request/response
pairs without persistent sessions.

## Why

In the original architecture, SSH was deeply embedded in `ServerHandler`. This
tangling of transport, interface, and protocol made it impossible to:

- Run the call protocol over DNS queries without wrapping SSH inside DNS
- Use raw framing for local service mesh (no SSH overhead)
- Support WebTransport direct call protocol for browsers
- Separate auth mechanics from channel management
- Accept HTTP requests and map them to call protocol operations

The three-layer model (ADR-026) cleanly separates these concerns. Transport
produces bytes. Interface parses bytes into sessions or handles requests.
Protocol carries semantics. A connection is always a (Transport, Interface)
pair for stream-based interfaces, or a standalone message-based interface.

Phase 2 research identified that HTTP and DNS don't fit the persistent session
model — they're stateless per-request. This led to the StreamInterface /
MessageInterface split (ADR-035), which gives each interface category its own
trait with the right lifecycle and ownership model.

## Architecture

### Three-Layer Model

```
Layer 3: Protocol    (Call protocol, Operations, OperationEnv)
Layer 2: Interface   (StreamInterface: SSH, raw framing | MessageInterface: HTTP, DNS)
Layer 1: Transport   (TCP, TLS, iroh, WebTransport)
```

- **Layer 1: Transport** — produces byte streams
  (`AsyncRead + AsyncWrite + Unpin + Send`). Unchanged per ADR-001. DNS is NOT
  a transport.
- **Layer 2: Interface** — two categories:
  - **StreamInterface**: consumes a `TransportStream` and produces a long-lived
    session that yields `InterfaceEvent` frames.
  - **MessageInterface**: handles individual `InterfaceRequest` →
    `InterfaceResponse` pairs. Manages its own transport.
- **Layer 3: Protocol** — carries semantics. Call protocol events, operation
  registry, service calls. Agnostic to both Transport and Interface below it.

### StreamInterface Trait

```rust
#[async_trait]
pub trait StreamInterface: Send + Sync + 'static {
    type Session: InterfaceSession;

    async fn accept(
        &self,
        stream: Box<dyn TransportStream>,
        config: &InterfaceConfig,
    ) -> Result<Self::Session>;
}
```

The session produced by a `StreamInterface` is consumed by the call protocol
handler. Different stream interfaces produce different session types, but the
call protocol handler receives `InterfaceEvent` frames from any stream
interface.

### MessageInterface Trait

```rust
#[async_trait]
pub trait MessageInterface: Send + Sync + 'static {
    async fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>;
}
```

Message-based interfaces handle individual requests without persistent sessions.
They manage their own transport (HTTP server, DNS server) and normalize requests
into `InterfaceRequest` / `InterfaceResponse`.

### InterfaceRequest / InterfaceResponse

```rust
pub struct InterfaceRequest {
    pub operation_path: String,          // e.g., "/head/auth/verify"
    pub input: Value,                    // JSON input payload
    pub auth_token: Option<AuthToken>,   // Extracted from wire format
    pub metadata: HashMap<String, String>,
}

pub struct InterfaceResponse {
    pub result: Result<Value, CallError>,
    pub status: u16,                     // HTTP status, DNS result code, etc.
    pub headers: HashMap<String, String>,
}
```

The call protocol handler processes `InterfaceRequest` the same way it processes
`InterfaceEvent` frames — both resolve to operation invocations through
`OperationEnv`. The difference is framing: stream interfaces produce `InterfaceEvent`
frames from a continuous byte stream, message interfaces construct `InterfaceRequest`
from their wire format.

### InterfaceSession

Every stream interface session implements `InterfaceSession`:

```rust
pub struct InterfaceEvent {
    pub envelope: EventEnvelope,
    pub identity: Option<Identity>,
}

#[async_trait]
pub trait InterfaceSession: Send {
    async fn recv(&mut self) -> Option<InterfaceEvent>;
    async fn send(&mut self, envelope: EventEnvelope) -> Result<()>;
}
```

`InterfaceEvent` carries an `EventEnvelope` and the authenticated `Identity`.
The call protocol handler (Layer 3) receives `InterfaceEvent` frames and
processes them uniformly, regardless of whether they arrived over SSH or raw
framing.

### SshInterface (StreamInterface)

Wraps the existing `ServerHandler` logic. This is the most complex stream
interface because SSH provides channel multiplexing, auth negotiation, and
proxy management within a single session.

What stays in SshInterface (Layer 2):
- SSH handshake and session management
- Auth delegation to `IdentityProvider` (via `auth_publickey()` callback)
- Channel multiplexing (multiple channels per session)
- `alknet-control:0` channel routing to call protocol

What moves to Layer 3 (call protocol handler):
- Operation registry and dispatch
- Forwarding policy checks (per ADR-031)
- Operation context construction (Identity, scopes)

What moves to per-connection state:
- Port forwarding proxy logic

**Current implementation note**: `SshSession::recv()` and `SshSession::send()`
are stubs. The bridge from SSH channels to `InterfaceEvent` frames is
scheduled for Phase 2 implementation (see integration-plan.md Phase 2.1).

### RawFramingInterface (StreamInterface)

Reads 4-byte big-endian length prefix + JSON `EventEnvelope` frames directly
from the transport stream. No SSH wrapping. No channel multiplexing — the
entire stream is a single call protocol channel.

```rust
pub struct RawFramingInterface;

impl StreamInterface for RawFramingInterface {
    type Session = RawFramingSession;
    // Reads length-prefixed EventEnvelope frames from the stream
}
```

Used for:
- Local service mesh (TCP + raw framing, no SSH overhead)
- Secure mesh (TLS + raw framing)
- WebTransport direct call protocol (future: WebTransport + raw framing)

Auth for raw framing: `AuthToken` in frame header, resolved via
`IdentityProvider::resolve_from_token()`.

**Current implementation note**: `RawFramingInterface::accept()` returns an
error. Frame reading/writing is scheduled for Phase 2 implementation (see
integration-plan.md Phase 2.2).

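The framing described in this section (4-byte big-endian length prefix followed by a JSON body) can be sketched with the standard library alone. This is an illustration, not the alknet-core implementation: `write_frame`, `read_frame`, and the `MAX_FRAME_LEN` cap are hypothetical names and values, and the body is treated as opaque bytes rather than a parsed `EventEnvelope`.

```rust
use std::io::{self, Read, Write};

/// Assumed safety cap on one frame body; the spec above does not state a limit.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian length, then the JSON bytes.
pub fn write_frame<W: Write>(writer: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(json)
}

/// Reads one frame body. Returns `Ok(None)` if the stream ends cleanly
/// before a new length prefix starts.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    let mut filled = 0;
    while filled < header.len() {
        let n = reader.read(&mut header[filled..])?;
        if n == 0 {
            if filled == 0 {
                return Ok(None); // clean end of stream between frames
            }
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "truncated length prefix",
            ));
        }
        filled += n;
    }
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Rejecting oversized lengths before allocating is a design choice of this sketch: a peer that has not yet authenticated should not be able to make the reader allocate an arbitrary buffer.
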
### HttpInterface (MessageInterface)

Accepts standard HTTP requests and maps them to call protocol operations:

```
POST /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (mutation)
GET  /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (query)
GET  /v1/{namespace}/{op} SSE  → registry.subscribe(namespace, op, input)  (subscription)
GET  /v1/schema                → registry.list_operations()
```

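The route table above can be read as a small classification function. The sketch below is hypothetical: the `Route` enum, the `route` function, and the `wants_sse` flag (standing in for however a subscription request is recognised, which the table does not specify) are illustrative names, not part of the alknet HTTP interface.

```rust
/// What a request maps to in the call protocol (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    /// POST (mutation) or GET (query).
    Invoke { namespace: &'a str, op: &'a str },
    /// GET with a server-sent-events response.
    Subscribe { namespace: &'a str, op: &'a str },
    /// GET /v1/schema.
    Schema,
    NotFound,
}

/// Classifies a request by method and path. `wants_sse` is an assumed input
/// that says whether the client asked for an event stream.
pub fn route<'a>(method: &str, path: &'a str, wants_sse: bool) -> Route<'a> {
    let segments: Vec<&'a str> = path.trim_matches('/').split('/').collect();
    // Reject paths with empty segments such as "/v1//op".
    if segments.iter().any(|s| s.is_empty()) {
        return Route::NotFound;
    }
    match (method, segments.as_slice()) {
        ("GET", ["v1", "schema"]) => Route::Schema,
        ("GET", ["v1", ns, op]) if wants_sse => Route::Subscribe { namespace: *ns, op: *op },
        ("GET" | "POST", ["v1", ns, op]) => Route::Invoke { namespace: *ns, op: *op },
        _ => Route::NotFound,
    }
}
```

Anything that does not match falls through to `NotFound`, which lines up with the default 404 handler mentioned in the Phase 2 scope.
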
Auth: `Authorization: Bearer <token>` header, resolved via
`IdentityProvider::resolve_from_token()`. Both AuthTokens and API keys are
accepted.

The HTTP interface runs inside the existing stealth mode byte-peek architecture:
after a TLS handshake, the server peeks at the first bytes. If they're
`SSH-2.0-`, the stream goes to `SshInterface`. Otherwise, the stream goes to
the axum HTTP router.

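The byte-peek decision described above reduces to a prefix check on the first bytes after the TLS handshake. The following is a minimal sketch under stated assumptions: `Detected` and `detect_protocol` are hypothetical names, and the `NeedMore` case (fewer than eight bytes peeked so far) is an addition of this sketch that the text above does not discuss.

```rust
/// Where a freshly accepted, post-TLS stream should be routed (illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum Detected {
    Ssh,
    Http,
    /// The peeked bytes are a strict prefix of the SSH banner; peek again.
    NeedMore,
}

const SSH_BANNER: &[u8] = b"SSH-2.0-";

/// Classifies a stream from the bytes peeked so far, without consuming them.
pub fn detect_protocol(peeked: &[u8]) -> Detected {
    if peeked.len() >= SSH_BANNER.len() {
        if peeked.starts_with(SSH_BANNER) {
            Detected::Ssh
        } else {
            Detected::Http
        }
    } else if SSH_BANNER.starts_with(peeked) {
        // Could still turn out to be SSH once more bytes arrive.
        Detected::NeedMore
    } else {
        Detected::Http
    }
}
```

Because the check only inspects peeked bytes, whichever handler receives the stream still sees the banner or the HTTP request line from the first byte.
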
**Phase 2 scope**: Auth middleware, stealth handoff, and default 404 handler
only. Specific operation routes and path conventions are Phase 5+. The
`ListenerConfig::Http` variant spawns an axum router that reaches auth context;
routing inside axum is a later concern.

### DnsInterface (MessageInterface)

A DNS server that encodes/decodes `EventEnvelope` frames as DNS query/response
pairs. AuthToken is embedded in DNS query labels. Resolution via
`IdentityProvider::resolve_from_token()`.

This is a `MessageInterface` — it manages its own transport (UDP/TCP port 53)
and handles individual DNS queries as request/response pairs. DNS is NOT a
transport.

**Phase**: DNS interface implementation is Phase 5+. The `ListenerConfig::Dns`
variant and `DnsInterface` stub are defined now; implementation is deferred.

### Stream-Based Interface Pairs

| Transport | StreamInterface | Credential Presentation | Use case |
|-----------|-----------------|-------------------------|----------|
| TLS | SshInterface | SSH key handshake | Standard alknet tunnel |
| TCP | SshInterface | SSH key handshake | Plain SSH tunnel |
| iroh | SshInterface | SSH key handshake | P2P SSH tunnel |
| TCP | RawFramingInterface | AuthToken in frame header | Local service mesh |
| TLS | RawFramingInterface | AuthToken in frame header | Secure mesh |
| WebTransport | RawFramingInterface | AuthToken in CONNECT request | Browser call protocol (future) |

### Message-Based Interface Pairs

| MessageInterface | Credential Presentation | Owns transport? | Use case |
|------------------|-------------------------|-----------------|----------|
| HttpInterface | `Authorization: Bearer` header | Yes (axum) | REST API, dashboard, integrations |
| DnsInterface | AuthToken in query labels | Yes (DNS server) | Censorship-resistant control channel |
| WebSocketInterface | AuthToken in handshake | Yes (WS server) | Browser persistent connection (future) |

Message-based interfaces manage their own transport. They don't need a
`Transport` from Layer 1 — they ARE the transport+interface combined.

### ListenerConfig

The server's accept loop configuration covers both stream and message interfaces:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Http {
        bind_addr: SocketAddr,
        tls: bool,
        stealth: bool, // byte-peek protocol detection on shared port
    },
    Dns {
        bind_addr: SocketAddr,
        tls: bool,
    },
}

pub enum StreamInterfaceKind {
    Ssh,
    RawFraming,
}

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    WebTransport, // Phase 5+: tag only, no acceptor yet
}
```

Note: `TransportKind::Dns` does NOT exist. DNS is a `MessageInterface`, not a
transport. The `ListenerConfig::Dns` variant handles DNS listener configuration
directly.

### Credential Presentation Across Interfaces

Every interface resolves to the same `Identity` through `IdentityProvider`:

```
SSH fingerprint     → IdentityProvider::resolve_from_fingerprint → Identity
AuthToken (Bearer)  → IdentityProvider::resolve_from_token       → Identity
API key (Bearer)    → IdentityProvider::resolve_from_token       → Identity
DNS embedded token  → IdentityProvider::resolve_from_token       → Identity
```

The credential presentation differs per (Transport, Interface) pair, but the
resolution result is always an `Identity`. See [definitions.md](definitions.md)
for the full table and terminology rules.

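The mapping above can be sketched as a single resolution function over two provider methods. This is a simplified, synchronous illustration: the real `IdentityProvider` trait's signatures are not shown in this document, so the `Option`-returning methods, the `Credential` enum, the `resolve` helper, and the in-memory `StaticProvider` are all assumptions made for the example. Only the two method names come from the text.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the core `Identity` type (fields are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// How a credential was presented (illustrative type).
pub enum Credential<'a> {
    /// SSH public key fingerprint from the SSH handshake.
    SshFingerprint(&'a str),
    /// AuthToken, API key, or DNS-embedded token.
    Token(&'a str),
}

/// Assumed synchronous shape of the provider; the real trait may differ.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Every presentation ends in the same `Identity` type.
pub fn resolve(provider: &dyn IdentityProvider, credential: Credential<'_>) -> Option<Identity> {
    match credential {
        Credential::SshFingerprint(fp) => provider.resolve_from_fingerprint(fp),
        Credential::Token(token) => provider.resolve_from_token(token),
    }
}

/// Hypothetical in-memory provider, for illustration only.
#[derive(Default)]
pub struct StaticProvider {
    pub by_fingerprint: HashMap<String, Identity>,
    pub by_token: HashMap<String, Identity>,
}

impl IdentityProvider for StaticProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.by_fingerprint.get(fingerprint).cloned()
    }

    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        self.by_token.get(token).cloned()
    }
}
```

Keeping the presentation-specific parsing outside `resolve` is what lets Layer 3 treat an SSH session and a Bearer-authenticated request identically once an `Identity` exists.
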
### Server Accept Loop

With both stream and message interfaces, the accept loop becomes:

```rust
for listener in listeners {
    match listener {
        ListenerConfig::Stream { transport, interface } => {
            // Spawn accept loop: transport.accept() → interface.accept(stream)
        }
        ListenerConfig::Http { bind_addr, tls, stealth } => {
            // Spawn axum HTTP server on bind_addr
            // If stealth: byte-peek after TLS, route SSH vs HTTP
        }
        ListenerConfig::Dns { bind_addr, tls } => {
            // Spawn DNS server on bind_addr
        }
    }
}
```

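The dispatch in the accept loop can be exercised without any networking by replacing each spawn with a description of what would be started. The enums below mirror the `ListenerConfig` definitions in this document; the `describe` function and the derives are additions made for this sketch and are not part of the server.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone)]
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    WebTransport,
}

#[derive(Debug, Clone, Copy)]
pub enum StreamInterfaceKind {
    Ssh,
    RawFraming,
}

#[derive(Debug, Clone)]
pub enum ListenerConfig {
    Stream { transport: TransportKind, interface: StreamInterfaceKind },
    Http { bind_addr: SocketAddr, tls: bool, stealth: bool },
    Dns { bind_addr: SocketAddr, tls: bool },
}

/// Describes what the accept loop would spawn for one listener
/// (hypothetical helper; the real loop spawns tasks instead).
pub fn describe(listener: &ListenerConfig) -> String {
    match listener {
        ListenerConfig::Stream { transport, interface } => {
            format!("stream listener: {:?} accepted by {:?}", transport, interface)
        }
        ListenerConfig::Http { bind_addr, tls, stealth } => format!(
            "http listener on {} (tls: {}, stealth byte-peek: {})",
            bind_addr, tls, stealth
        ),
        ListenerConfig::Dns { bind_addr, tls } => {
            format!("dns listener on {} (tls: {})", bind_addr, tls)
        }
    }
}
```

Because the `match` is exhaustive, adding a new `ListenerConfig` variant forces every such dispatch site to be updated at compile time.
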
## Constraints

- `StreamInterface` and `MessageInterface` are independent traits with different
  signatures, lifecycles, and transport ownership. No common super-trait (ADR-035).
- `SshInterface` is the most invasive refactoring. The existing `SshHandler`
  owns auth, channel management, and proxy logic — extracting these cleanly
  requires careful design (integration-plan Phase 1.8, completed in Phase 1).
- DNS interface implementation is Phase 5 work. `DnsInterface` is defined as a
  `MessageInterface` stub; implementation is deferred.
- HTTP interface Phase 2 scope is limited to auth middleware and stealth handoff.
  Specific operation routes are Phase 5+.
- WebTransport is Phase 5 work. `TransportKind::WebTransport` and
  `StreamInterfaceKind::WebTransport` are tags only for now.
- `TransportKind::Dns` does not exist. DNS is a `MessageInterface`, not a
  transport. This was `TransportKind` enum pollution from an earlier design.
- The `Interface` trait (singular) in the current codebase needs to be renamed
  to `StreamInterface`. This is a rename, not a semantic change.

## Open Questions

- **OQ-IF-02**: ~~Should `SshInterface` own the `ForwardingPolicy` check for
  `channel_open_direct_tcpip`, or should that move to Layer 3?~~ **Resolved**:
  ForwardingPolicy is Layer 3, but channel open/close lifecycle is Layer 2.
  SshInterface reports channel requests to Layer 3; Layer 3 applies policy.

- **OQ-P2-01**: Should `MessageInterface` and `StreamInterface` share a common
  trait? **Recommendation**: No. Independent traits with different signatures,
  lifecycles, and transport ownership. A common super-trait adds complexity
  without clear benefit. (See ADR-035.)

- **OQ-P2-02**: Should the HTTP interface share a port with the SSH listener?
  **Recommendation**: Start with separate ports. ALPN multiplexing on port 443
  is a future optimization that doesn't change the interface abstraction.
  Stealth mode byte-peek already handles shared-port detection for the common
  case.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface / MessageInterface | Two trait categories at Layer 2 |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Protocol is interface-agnostic |
| [029](decisions/029-identity-core-type.md) | Identity as core type | Auth resolution across interfaces |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Layer 3 policy applied to Layer 2 channel requests |

## Phase 2 Implementation Notes

- `Interface` trait renamed to `StreamInterface` throughout alknet-core (ADR-035 implemented)
- `MessageInterface` trait added with `handle_request(InterfaceRequest) -> Result<InterfaceResponse>` (ADR-035 implemented)
- `InterfaceRequest` and `InterfaceResponse` types implemented
- `HttpInterface` and `DnsInterface` stub structs added (Phase 5 for full implementation)
- `InterfaceConfig` split into `StreamInterfaceConfig` and `MessageInterfaceConfig`
- `StreamInterfaceKind` and `MessageInterfaceKind` enums added
- `ListenerConfig` restructured from flat struct to enum with `Stream`, `Http`, `Dns` variants
- `TransportKind::Dns` removed from the enum (DNS is a MessageInterface, not a transport)
- `TransportKind::WebTransport` updated from `{ host: String }` to `{ server_name: Option<String> }`
- `RawFramingInterface` fully implemented with first-frame auth
- `SshSession::recv()`/`send()` bridge to call protocol via `alknet-control:0` channel implemented, using `ControlChannelBridge` with mpsc channels

## References

- [definitions.md](definitions.md) — Terminology disambiguation, credential presentation
- [research/phase2/interface-model.md](../research/phase2/interface-model.md) — Full StreamInterface/MessageInterface analysis
- [research/phase2/tls-transport.md](../research/phase2/tls-transport.md) — HTTP interface, stealth handoff, ListenerConfig
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.8, Phase 2.1-2.7
- [transport.md](transport.md) — Transport trait (unchanged at Layer 1)
- [auth.md](auth.md) — Credential presentation per (Transport, Interface) pair
- [identity.md](identity.md) — IdentityProvider, auth across interfaces
@@ -1,189 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---

# NAPI Wrapper & PubSub Event Target

## What

Two integration layers that enable TypeScript/JavaScript consumers to use alknet as a transport:

1. **NAPI wrapper** (`@alkdev/alknet`) — A Node.js native addon (via napi-rs) exposing `connect()` and `serve()` that return duplex streams
2. **PubSub event target** (`@alkdev/pubsub` adapter) — An implementation of the `TypedEventTarget` interface that routes events over alknet's SSH channel

## Why

The alknet Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent workers) is TypeScript-first. These integration layers let TypeScript code use alknet's transport without reimplementing SSH.

The NAPI surface is intentionally minimal — it exposes transport connections as duplex streams, not the full SSH protocol. The pubsub adapter wraps those streams with `EventEnvelope` serialization.

## Architecture

### NAPI Wrapper (napi-rs)

The wrapper uses napi-rs (ADR-015) and exposes two functions (ADR-016):

```typescript
// @alkdev/alknet (TypeScript side)

interface AlknetConnectOptions {
  // TCP/TLS mode
  server?: string;           // e.g., "example.com:443"
  // iroh mode
  peer?: string;             // iroh endpoint ID (base58-encoded)
  // Transport
  transport: 'tcp' | 'tls' | 'iroh';
  // Auth
  identity?: string;         // path to SSH key, or Buffer with key data
  // TLS
  tlsServerName?: string;    // SNI hostname
  insecure?: boolean;        // accept self-signed certs
  // iroh
  irohRelay?: string;        // relay URL (default: n0)
  // Proxy
  proxy?: string;            // upstream SOCKS5/HTTP proxy URL
}

interface AlknetServeOptions {
  // Transport
  transport: 'tcp' | 'tls' | 'iroh';
  // Auth
  hostKey?: string;          // path to SSH host key, or Buffer with key data
  authorizedKeys?: string;   // path to authorized_keys, or Buffer with key data
  certAuthority?: string;    // path to CA public key for cert-authority auth
  // TLS
  tlsCert?: string;          // path to TLS cert
  tlsKey?: string;           // path to TLS key
  acmeDomain?: string;       // ACME domain for auto-cert (ADR-008)
  // Listen
  listen?: string;           // listen address (default: 0.0.0.0:22)
  // iroh
  irohRelay?: string;        // relay URL (default: n0)
}

// Returns a Duplex stream for the SSH channel
function connect(options: AlknetConnectOptions): Promise<Duplex>;

// Returns a server object with close() and connection events
function serve(options: AlknetServeOptions): Promise<AlknetServer>;

interface AlknetServer {
  close(): Promise<void>;
  onConnection(callback: (stream: Duplex, info: ConnectionInfo) => void): void;
  // Dynamic config reload (ADR-030)
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}

interface ForwardingPolicyConfig {
  default: 'allow' | 'deny';
  rules: ForwardingRuleConfig[];
}

interface ForwardingRuleConfig {
  target: string;            // "localhost:*", "10.0.0.0/8:80", "alknet-*"
  action: 'allow' | 'deny';
  principals?: string[];     // default ["*"]
}
```

The NAPI layer is **transport-agnostic** — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.

### NAPI Call Protocol Integration

NAPI consumers can register operation handlers to participate in the call protocol. The `Duplex` stream from `connect()` or `serve()` carries `EventEnvelope` frames (4-byte BE length prefix + JSON). A TypeScript consumer can implement a call protocol handler that reads these frames and dispatches to registered operations — the same wire protocol used by `@alkdev/operations`.

See [call-protocol.md](call-protocol.md) for the call protocol spec and [services.md](services.md) for OperationEnv and dispatch paths.

### NAPI irpc Service Creation

Behind the `irpc` feature flag, NAPI consumers can create irpc service instances for in-cluster communication. This is a Phase 2+ capability — Phase 1 uses `ConfigIdentityProvider` and direct `ConfigReloadHandle` calls. See [services.md](services.md) for the irpc service layer and ADR-027 for crate decomposition.

### NAPI `connect()` vs CLI `alknet connect`

The NAPI `connect()` function and the CLI `alknet connect` command are fundamentally different operations despite sharing the same name:

- **CLI `alknet connect`**: Starts a full SSH client session with a local SOCKS5 server and optional port forwards. It manages multiple SSH channels over a single session — the user routes traffic through it via SOCKS5 or forwarded ports.
- **NAPI `connect()`**: Opens a single SSH channel and returns it as a `Duplex` stream. No SOCKS5 server, no port forwarding. The caller reads and writes bytes directly. This is designed for the pubsub/programmatic use case where a single bidirectional byte stream is needed.

For SOCKS5 proxy functionality, use the CLI binary (`alknet connect`). The NAPI wrapper is for programmatic consumers that need a raw stream.

### Programmatic Configuration (ADR-011)

Both `connect()` and `serve()` accept options as plain objects. No file paths are mandatory — keys can be provided as `Buffer` data directly, making programmatic usage straightforward. Environment variables (`ALKNET_SERVER`, `ALKNET_IDENTITY`) provide convenience defaults.

Key material provided as `Buffer` must be in **OpenSSH key format** (the format used by `ssh-keygen`). Private keys: OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`). Public keys: OpenSSH format (`ssh-ed25519 AAAA...`). PEM-encoded keys (PKCS#1, PKCS#8) are not supported.

### PubSub Event Target Adapter

This implements `TypedEventTarget` from `@alkdev/pubsub`:

```typescript
// @alkdev/pubsub (new adapter: event-target-alknet.ts)

export interface AlknetEventTargetOptions {
  stream: Duplex;  // from @alkdev/alknet.connect() or serve()
}

export interface AlknetEventTarget<TEvent extends TypedEvent>
  extends TypedEventTarget<TEvent> {
  close(): void;
}

export function createAlknetEventTarget<TEvent extends TypedEvent>(
  options: AlknetEventTargetOptions
): AlknetEventTarget<TEvent>;
```

Wire protocol (same as other pubsub adapters):

- **Framing**: 4-byte big-endian length prefix + JSON payload
- **Payload**: `EventEnvelope` JSON (`{ type, id, payload }`)
- **Control**: `__subscribe` / `__unsubscribe` messages for topic-based routing
- **Direction**: Bidirectional — `dispatchEvent` sends, `addEventListener` subscribes and receives

### On the Server Side

The alknet server uses a reserved `direct_tcpip` destination (`alknet-control:0`) for the pubsub control channel (ADR-018). When a client connects to this destination:

1. The server's `channel_open_direct_tcpip` handler detects the reserved `alknet-control` target
2. Instead of opening a TCP connection, it bridges the channel to its local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel

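The detection in step 1 amounts to comparing the requested destination against the reserved `alknet-control:0` pair before any socket is opened. A minimal sketch, assuming hypothetical names (`ChannelRoute`, `route_direct_tcpip`) and a `u32` port as SSH `direct-tcpip` requests carry; the actual handler and bridge types are not shown in this document.

```rust
/// What the server does with a `direct-tcpip` channel request (illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum ChannelRoute {
    /// Bridge the channel to the local pubsub event bus.
    ControlChannel,
    /// Open an outbound TCP connection and proxy bytes.
    TcpForward { host: String, port: u32 },
}

const CONTROL_HOST: &str = "alknet-control";
const CONTROL_PORT: u32 = 0;

/// Decides how to handle a requested destination. Forwarding policy checks
/// are out of scope for this sketch.
pub fn route_direct_tcpip(host: &str, port: u32) -> ChannelRoute {
    if host == CONTROL_HOST && port == CONTROL_PORT {
        ChannelRoute::ControlChannel
    } else {
        ChannelRoute::TcpForward { host: host.to_string(), port }
    }
}
```

Port 0 is never a connectable TCP destination, so reserving `alknet-control:0` cannot shadow a real forward target.
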
Users who prefer not to use the control channel can alternatively run a pubsub service on a specific port and use standard port forwarding: `alknet connect --forward 9736:head:9736`. This is a deployment choice, not a separate implementation — alknet's port forwarding works normally for any TCP service.

- **Worker connects to head**: `alknet connect --forward 9736:head:9736` then create WebSocket event target pointing at `ws://localhost:9736`

- **Head connects to worker**: `alknet connect --remote-forward 9736:worker:9736` — same result, opposite initiator

The pubsub adapter doesn't care which side initiated the SSH session. It just needs a byte stream.

## Constraints

- The NAPI wrapper exposes duplex streams, not the full SSH channel API. Multiplexing is done at the pubsub layer.
- The pubsub wire protocol is length-prefixed JSON, matching the existing adapter pattern. Binary payloads should be base64-encoded in the `EventEnvelope.payload`.
- The NAPI binary size will be ~5-10MB (includes russh + tokio + cryptography). The `iroh` feature adds significant size; it should be an optional feature.
- Keys can be provided as file paths or `Buffer` data, supporting both CLI and programmatic usage patterns (ADR-011).

## Open Questions

None — all resolved.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [007](decisions/007-napi-single-stream.md) | NAPI exposes single duplex stream | No SSH multiplexing in JS, pubsub handles it |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs or env vars |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | Both connect() and serve() | NAPI exposes client and server sides from the start |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `alknet-control` destination for event bus |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | NAPI reload methods for auth, forwarding, and all dynamic config |

## References

- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, reload mechanism
- [services.md](services.md) — OperationEnv, irpc service layer
- [call-protocol.md](call-protocol.md) — Call protocol wire format and operation registry
@@ -1,340 +0,0 @@
---
status: draft
last_updated: 2026-06-07
---

# Open Questions

## Transport

### OQ-01: TLS certificate management strategy
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-008 — Support both domain-based and IP-based ACME/Let's Encrypt auto-provisioning, plus manual certs. Domain-based uses standard certbot-style flow with HTTP-01/TLS-ALPN-01 challenges. IP-based uses short-lived certs via TLS-ALPN-01 on port 443. Manual certs via `--tls-cert`/`--tls-key` always supported. Implementation uses `rustls-acme` or similar pure-Rust ACME client.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport implementation

### OQ-02: iroh relay configuration defaults
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-009 — Default to n0's free relay servers. Allow override via `--iroh-relay <url>`. Document self-hosted relay setup. This matches iroh's own defaults and minimizes friction for testing/development.
- **Cross-references**: [ADR-009](decisions/009-default-iroh-relay.md), Transport spec

### OQ-05: Transport chaining support in CLI
- **Origin**: [transport.md](transport.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-010 — Support `--transport iroh --proxy socks5://...` natively in the CLI. iroh's endpoint builder accepts proxy configuration directly, so the implementation is minimal. Other transport combinations (TCP+TLS) are already implicit.
- **Cross-references**: [ADR-010](decisions/010-transport-chaining-cli.md), Transport spec

## Client

### OQ-06: SSH config file parsing
- **Origin**: [client.md](client.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-011 — No `~/.ssh/config` parsing, no custom config file. Configuration is programmatic-first: CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. Cross-platform path issues (`~` expansion) are avoided. The library API is the primary interface; if config files are needed later, they can be a separate layer.
- **Cross-references**: [ADR-011](decisions/011-no-ssh-config-programmatic-api.md), Client spec

## Server

### OQ-07: ACME/Let's Encrypt support
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-008 — Same resolution as OQ-01. Both domain-based (standard, domain-bound, auto-renewing) and IP-based (short-lived, no domain required) ACME flows are supported. The domain-based path requires port 80 or DNS access for challenges. The IP-based path uses TLS-ALPN-01 on port 443 and requires the ACME client to run continuously.
- **Cross-references**: [ADR-008](decisions/008-acme-lets-encrypt.md), Server spec, TlsTransport

### OQ-08: Connection limits and rate limiting
- **Origin**: [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-013 — Two-layer approach: (1) Structured logging of auth attempts and connections at INFO level for fail2ban integration on Linux — matches our production fail2ban setup with nftables and systemd journal. (2) Built-in rate limiting: `--max-connections-per-ip` and `--max-auth-attempts` flags providing platform-independent abuse protection.
- **Cross-references**: [ADR-013](decisions/013-fail2ban-friendly-logging.md), Server spec, Production fail2ban docs

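The built-in half of the OQ-08 resolution, a cap in the spirit of `--max-connections-per-ip`, can be sketched as a counter keyed by peer address. `ConnectionLimiter` and its methods are hypothetical names for illustration; ADR-013's actual data structure and its handling of `--max-auth-attempts` are not described in this document.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per peer IP against a fixed cap (illustrative).
pub struct ConnectionLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns `true` and counts the connection if the peer is under its cap.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        let now_zero = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if now_zero {
            // Drop idle entries so the map does not grow without bound.
            self.active.remove(&ip);
        }
    }
}
```

In a real server this state would sit behind a lock or in the accept task, and each rejected attempt would be logged at INFO level so the fail2ban layer sees it too.
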
### OQ-04: Authentication beyond Ed25519 keys
- **Origin**: [client.md](client.md), [server.md](server.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-012 — Ed25519 public key (default, unchanged) + OpenSSH certificate authority support (new, important for multi-user). No password authentication over SSH channels. If a local SOCKS5 proxy needs its own auth, that's a separate concern. Cert-authority makes multi-user management practical: one CA entry in `authorized_keys` instead of N individual keys. Certificates support expiry and restrictions.
- **Cross-references**: [ADR-012](decisions/012-auth-ed25519-and-cert-authority.md), Client spec, Server spec

## TUN

### OQ-03: Windows TUN support scope
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-014 — TUN is deferred entirely from the alknet project. For VPN-like behavior, users run `tun2proxy --proxy socks5://127.0.0.1:1080` alongside alknet. This eliminates all TUN-related scope questions (Windows, TCP reconstruction, etc.).
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

### OQ-09: TCP reconstruction approach for TUN
- **Origin**: [tun-shim.md](tun-shim.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-014 — TUN is deferred from alknet. tun2proxy (external tool) handles this if users need VPN-like behavior.
- **Cross-references**: [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md)

## NAPI / PubSub

### OQ-10: NAPI wrapper API surface
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-016 — Expose both `connect()` and `serve()` from the start. Both are fundamental operations needed by the pubsub event target system (spokes use `connect()`, hubs could use `serve()`). The NAPI layer is transport-agnostic — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream. This ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
- **Cross-references**: [ADR-016](decisions/016-napi-expose-connect-and-serve.md), napi-and-pubsub.md

### OQ-11: napi-rs vs uniffi for FFI bridge
- **Origin**: [napi-and-pubsub.md](napi-and-pubsub.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~ —
- **Resolution**: ADR-015 — Use napi-rs. It's the standard for Node.js native addons, matches our primary consumer (TypeScript/Node.js), and has the best ecosystem and documentation. If future Python or mobile consumers are needed, a separate uniffi layer can be added — the Rust core doesn't change.
- **Cross-references**: [ADR-015](decisions/015-napi-rs-for-ffi-bridge.md), napi-and-pubsub.md

## Configuration

### OQ-12: Per-user forwarding scope vs global rules
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~ —
- **Resolution**: ADR-031 — Start with global rules + principal matching from `Identity.scopes`. Per-user scope from `peer_credentials.metadata.scopes` via `IdentityProvider`. The `ForwardingPolicy` evaluates rules against `Identity.id` and `Identity.scopes` from the authenticated identity.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)

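The OQ-12 resolution (global rules plus principal matching against `Identity.id` and `Identity.scopes`) can be sketched as follows. Several points are assumptions of this example rather than statements from ADR-031: rules are evaluated in order and the first match wins, a principal entry matches either the identity's id or one of its scopes, `"*"` matches everyone, and only exact `host:port`, `host:*`, and `prefix*` target patterns are handled (CIDR targets are omitted for brevity). All type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    Deny,
}

pub struct Rule {
    pub target: String,          // e.g. "localhost:*" or "alknet-*"
    pub action: Action,
    pub principals: Vec<String>, // "*" matches every identity
}

pub struct Policy {
    pub default_action: Action,
    pub rules: Vec<Rule>,
}

/// Matches a target pattern against a requested host and port.
fn target_matches(pattern: &str, host: &str, port: u16) -> bool {
    // "alknet-*" style: prefix match on the host name, any port.
    if let Some(prefix) = pattern.strip_suffix('*') {
        if !pattern.contains(':') {
            return host.starts_with(prefix);
        }
    }
    match pattern.rsplit_once(':') {
        Some((h, "*")) => h == host,
        Some((h, p)) => h == host && p.parse::<u16>().map_or(false, |p| p == port),
        None => pattern == host,
    }
}

impl Policy {
    /// Assumed semantics: first matching rule wins, otherwise the default.
    pub fn evaluate(&self, id: &str, scopes: &[&str], host: &str, port: u16) -> Action {
        for rule in &self.rules {
            let principal_ok = rule
                .principals
                .iter()
                .any(|p| p == "*" || p == id || scopes.contains(&p.as_str()));
            if principal_ok && target_matches(&rule.target, host, port) {
                return rule.action;
            }
        }
        self.default_action
    }
}
```

With a default of `Deny`, an identity only reaches a target that some rule explicitly allows for it, which is the conservative reading of "global rules + principal matching".
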
### OQ-13: Config file auto-reload via file watching
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: No file watching. CLI loads once at startup; NAPI/head reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
- **Cross-references**: configuration.md

### OQ-14: ArcSwap vs RwLock for dynamic config
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: ArcSwap. Lock-free reads on the hot path (every auth check, every channel open). `RwLock` adds contention. `arc-swap` is small (~500 lines) and well-maintained.
- **Cross-references**: configuration.md

### OQ-15: TLS + WebTransport + iroh QUIC listener coexistence
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (deferred to Phase 4 — needs R&D in WebTransport transport session)
- **Cross-references**: [auth.md](auth.md), OQ-19, [interface.md](interface.md)

### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to alknet-* channels)
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: ADR-031 — Add `TransportKind` match in `ForwardingRule`. WebTransport clients can be restricted to `alknet-*` channels via `TargetPattern::AlknetPrefix` combined with a `TransportKind::WebTransport` filter.
|
||||
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)
|
||||
|
||||
### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
|
||||
- **Origin**: [research/configuration.md](../research/configuration.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples alknet-core from identity storage.
|
||||
- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [identity.md](identity.md), OQ-15
|
||||
|
||||
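The shared-key idea in the OQ-17 resolution can be sketched with plain data types. `AuthPolicy`, `SshAuthConfig`, `TokenAuthConfig`, and `TokenKeySource::Shared` are named above; the fields, the `Separate` variant, and the `token_keys` helper are assumptions for illustration, and signature verification is deliberately left out.

```rust
/// SSH-side auth settings; keys are kept as opaque strings in this sketch.
pub struct SshAuthConfig {
    pub authorized_keys: Vec<String>,
}

/// Where token auth takes its verification keys from.
pub enum TokenKeySource {
    /// Reuse the SSH `authorized_keys` set (the default per the resolution).
    Shared,
    /// Hypothetical alternative: a dedicated key list for token auth.
    Separate(Vec<String>),
}

pub struct TokenAuthConfig {
    pub key_source: TokenKeySource,
}

pub struct AuthPolicy {
    pub ssh: SshAuthConfig,
    pub token: TokenAuthConfig,
}

impl AuthPolicy {
    /// Keys that a signed timestamp token would be verified against.
    pub fn token_keys(&self) -> &[String] {
        match &self.token.key_source {
            TokenKeySource::Shared => self.ssh.authorized_keys.as_slice(),
            TokenKeySource::Separate(keys) => keys.as_slice(),
        }
    }
}
```

With `TokenKeySource::Shared`, both credential presentations resolve to the same key set, which is what makes the identity unified across transports.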
### OQ-23: irpc dependency — always or behind feature flag?
|
||||
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-027 — Feature flag. Nodes that only do SSH tunneling don't need the service layer. irpc is behind a feature flag in alknet-core and an independent dependency in alknet-secret and alknet-storage.
|
||||
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
|
||||
|
||||
### OQ-24: DNS control channel scope for initial implementation?
|
||||
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-026 — DNS control channel carries call protocol frames only (no SSH tunneling over DNS). The (DNS transport, raw framing interface) pair sends `EventEnvelope` directly. SSH-over-DNS is a future possibility but out of scope.
|
||||
- **Cross-references**: [ADR-026](decisions/026-transport-interface-separation.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-25: alknet-storage and alknet-secret irpc dependency
|
||||
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: ADR-027 — Independently. They're separate crates. irpc is a shared library they both use as an independent dependency.
|
||||
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
|
||||
|
||||
## Auth
|
||||
|
||||
### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both?
|
||||
- **Origin**: [auth.md](auth.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-029 and ADR-031 — `IdentityProvider` owns scopes. The `Identity` struct includes `scopes` and `resources` fields populated by the `IdentityProvider` implementation (config-based or database-backed). `ForwardingPolicy` uses scopes from `Identity` — it consumes them, it doesn't produce them.
|
||||
- **Cross-references**: [ADR-029](decisions/029-identity-core-type.md), [ADR-031](decisions/031-forwarding-policy.md), [identity.md](identity.md)
|
||||
|
||||
### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS?
|
||||
- **Origin**: [auth.md](auth.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (deferred to Phase 4 — QUIC is UDP, TLS-over-TCP is TCP, they can share port 443 without conflict)
|
||||
- **Cross-references**: OQ-15, [interface.md](interface.md)
|
||||
|
||||
## Call Protocol
|
||||
|
||||
### OQ-20: Worker registration and discovery on connect/disconnect
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach but needs spec in call-protocol.md)
|
||||
- **Cross-references**: ADR-024, ADR-025
|
||||
|
||||
### OQ-21: Routing calls to specific workers with same-service operations
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection.
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
|
||||
|
||||
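The node-prefix routing described in OQ-21 can be sketched as a path parser plus a lookup table. The `/{node}/{service}/{op}` format comes from the resolution; `OperationPath`, `RoutingTable`, and `ConnectionId` are hypothetical names, and a real head would store a session handle rather than an integer.

```rust
use std::collections::HashMap;

/// An operation path split into its three segments.
#[derive(Debug, PartialEq, Eq)]
pub struct OperationPath<'a> {
    pub node: &'a str,
    pub service: &'a str,
    pub op: &'a str,
}

/// Parse `/{node}/{service}/{op}`; returns `None` for any other shape.
pub fn parse_path(path: &str) -> Option<OperationPath<'_>> {
    let mut parts = path.strip_prefix('/')?.split('/');
    let (node, service, op) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || node.is_empty() || service.is_empty() || op.is_empty() {
        return None;
    }
    Some(OperationPath { node, service, op })
}

/// Hypothetical connection handle.
pub type ConnectionId = u64;

/// Maps node identity to the connection that node is reachable on.
#[derive(Default)]
pub struct RoutingTable {
    by_node: HashMap<String, ConnectionId>,
}

impl RoutingTable {
    /// Record a node when it connects.
    pub fn register(&mut self, node: &str, conn: ConnectionId) {
        self.by_node.insert(node.to_string(), conn);
    }

    /// Forget a node when it disconnects.
    pub fn unregister(&mut self, node: &str) {
        self.by_node.remove(node);
    }

    /// Resolve a full operation path to the connection of the node it names.
    pub fn route(&self, path: &str) -> Option<ConnectionId> {
        let parsed = parse_path(path)?;
        self.by_node.get(parsed.node).copied()
    }
}
```

Two workers exposing the same `fs` service stay distinguishable because `/dev1/fs/readFile` and `/dev2/fs/readFile` resolve to different connections.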
### OQ-22: Client streaming (streaming inputs) in the call protocol?
|
||||
- **Origin**: [call-protocol.md](call-protocol.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: Deferred. Current model (single request, optional streaming response) covers all identified use cases. Client streaming can be added later if needed.
|
||||
- **Cross-references**: ADR-024
|
||||
|
||||
## Services
|
||||
|
||||
### OQ-SVC-01: Should the secret service support multiple seed phrases (one per tenant)?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: (deferred — one seed per node is simplest; multi-seed can be added later by indexing `Unlock` with a tenant ID)
|
||||
- **Cross-references**: [secret-service.md](secret-service.md)
|
||||
|
||||
### OQ-SVC-02: Should service protocols use postcard (binary) or JSON for remote calls?
|
||||
- **Origin**: [research/services.md](../research/services.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: Postcard for irpc (Rust-to-Rust, efficient). JSON for call protocol (cross-language, universal). The irpc remote path naturally uses postcard.
|
||||
- **Cross-references**: [services.md](services.md)
|
||||
|
||||
### OQ-SVC-03: How does the secret service integrate with the existing EncryptedDataSchema from @alkdev/storage?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: medium
|
||||
- **Resolution**: (pending — Rust implementation replaces PBKDF2 password-based encryption with derived AES-256-GCM keys; EncryptedData format is a superset; migration by re-encrypting)
|
||||
- **Cross-references**: [secret-service.md](secret-service.md), [storage.md](storage.md)
|
||||
|
||||
### OQ-SVC-04: Should workers cache derived keys locally?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~low~~ —
|
||||
- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session.
|
||||
- **Cross-references**: [secret-service.md](secret-service.md)
|
||||
|
||||
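A worker-side cache matching the OQ-SVC-04 resolution might look like the following. Only the one-hour default TTL and the revoke-by-invalidation behavior come from the resolution; `DerivedKeyCache` and its methods are hypothetical, and a production version would also zeroize key bytes on eviction.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Default TTL from the resolution: one hour.
pub const DEFAULT_TTL: Duration = Duration::from_secs(60 * 60);

struct Entry {
    key: Vec<u8>,
    inserted_at: Instant,
}

/// Cache of derived keys, indexed by derivation path.
pub struct DerivedKeyCache {
    ttl: Duration,
    entries: HashMap<String, Entry>,
}

impl DerivedKeyCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a key; `now` is passed in so the cache is easy to test.
    pub fn insert(&mut self, path: &str, key: Vec<u8>, now: Instant) {
        self.entries.insert(path.to_string(), Entry { key, inserted_at: now });
    }

    /// Return the cached key if it is younger than the TTL; evict it otherwise.
    pub fn get(&mut self, path: &str, now: Instant) -> Option<&[u8]> {
        let expired = match self.entries.get(path) {
            Some(entry) => now.duration_since(entry.inserted_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(path);
            return None;
        }
        self.entries.get(path).map(|entry| entry.key.as_slice())
    }

    /// Drop every key, for example when the head invalidates the session.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }
}
```

Passing the clock in explicitly keeps expiry deterministic under test; a worker would typically call `get` first and fall back to a remote derive on `None`.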
### OQ-SVC-05: How does the NFT-based ACL smart contract interact with the secret service?
|
||||
- **Origin**: [storage.md](storage.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from the same seed as the secret service. The smart contract is a separate concern — it reads on-chain ACL state, it doesn't call the secret service.
|
||||
- **Cross-references**: [storage.md](storage.md), [secret-service.md](secret-service.md)
|
||||
|
||||
## Interface
|
||||
|
||||
### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream?
|
||||
- **Origin**: [interface.md](interface.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~high~~ —
|
||||
- **Resolution**: `InterfaceSession::recv()` returns `Option<InterfaceEvent>` where `InterfaceEvent` carries `EventEnvelope` + `Identity`. `InterfaceSession::send()` accepts `EventEnvelope`. The `SshSession` bridge implements this over the `alknet-control:0` channel. For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize request/response pairs. See [interface.md](interface.md) and ADR-035.
|
||||
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-IF-02: Should SshInterface own ForwardingPolicy checks or should they move to Layer 3?
|
||||
- **Origin**: [interface.md](interface.md)
|
||||
- **Status**: ~~resolved~~
|
||||
- **Priority**: ~~medium~~ —
|
||||
- **Resolution**: ForwardingPolicy is Layer 3 (it's policy, not session mechanics). Channel open/close lifecycle is Layer 2. The Interface reports channel open requests to Layer 3; Layer 3 applies ForwardingPolicy. The current `SshHandler` implementation checks policy in `channel_open_direct_tcpip`, which already delegates to `Identity.scopes` from the authenticated identity — this is consistent with the resolution.
|
||||
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-01: Should MessageInterface and StreamInterface share a common trait?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Independent traits. Different signatures (`handle_request` vs `accept` + session lifecycle), different transport ownership (self-managed vs provided), different lifecycles (stateless per-request vs long-lived session). A common super-trait adds complexity without benefit. See ADR-035.
|
||||
- **Cross-references**: [ADR-035](decisions/035-streaminterface-messageinterface-split.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-02: Should the HTTP interface share a port with the SSH listener?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Start with separate ports. Stealth mode byte-peek on a shared port is already implemented for SSH vs HTTP detection. `ListenerConfig::Http { stealth: true }` enables the existing peek pattern. ALPN multiplexing on port 443 is a future optimization that doesn't change the interface abstraction.
|
||||
- **Cross-references**: [interface.md](interface.md), [research/phase2/tls-transport.md](../research/phase2/tls-transport.md)
|
||||
|
||||
### OQ-P2-03: Should the HTTP interface auto-generate OpenAPI specs from OperationRegistry?
|
||||
- **Origin**: [research/phase2/interface-model.md](../research/phase2/interface-model.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Yes, but Phase 5+. The HTTP interface needs to exist first (Phase 5.3 in the integration plan). `GET /v1/schema` producing an OpenAPI spec from registered `OperationSpec`s is the natural end state. This creates symmetry with `FromOpenAPI` (inbound spec consumption).
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md), [interface.md](interface.md)
|
||||
|
||||
### OQ-P2-04: How do self-hosted services authenticate via alknet?
|
||||
- **Origin**: [research/phase2/credential-provider.md](../research/phase2/credential-provider.md), [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Three-phase approach. Phase A: shared secret (`CredentialSet::Bearer` or `S3AccessKey`). Phase C: identity-bound credentials via `ManagedCredentialProvider`. Phase D: alknet as OIDC provider. The `CredentialProvider` trait in core enables Phase A immediately; Phases C and D are additive.
|
||||
- **Cross-references**: [ADR-036](decisions/036-credentialprovider-core-type.md), [credentials.md](credentials.md)
|
||||
|
||||
## Credentials
|
||||
|
||||
### OQ-CP-01: Should CredentialProvider support per-identity credentials?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: Start with service-level credentials (`get_credentials(service)`). Add identity-level resolution (`get_credentials_for(service, identity_id)`) when the need is concrete. `Identity.id` already serves as the account UUID in database-backed mode.
|
||||
- **Cross-references**: [credentials.md](credentials.md), [ADR-036](decisions/036-credentialprovider-core-type.md)
|
||||
|
||||
### OQ-CP-02: Where should OIDC provider operations live?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: Application service (Phase D). OIDC is an application concern, not a core concern. The call protocol and OperationRegistry provide the transport; OIDC is just another set of operations.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
|
||||
|
||||
### OQ-CP-03: How do credential rotations propagate across a cluster?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
- **Resolution**: TBD. Likely TTL-based caching with a refresh threshold. Workers call `CredentialProvider::get_credentials()` which checks `is_expired()` and calls `refresh_credentials()` if needed.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
|
||||
|
||||
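Since OQ-CP-03 is still marked TBD, the following is only one possible shape for the TTL-plus-refresh-threshold idea. `get_credentials`, `is_expired`, and `CredentialSet::Bearer` echo names used in this document; `CachedCredentials`, `RefreshingProvider`, and the closure-based refresh hook are assumptions for the sketch.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Pure credential data; only a bearer variant is modeled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSet {
    Bearer { token: String },
}

/// Credentials plus the instant at which they stop being valid.
#[derive(Debug, Clone)]
pub struct CachedCredentials {
    pub set: CredentialSet,
    pub expires_at: Instant,
}

impl CachedCredentials {
    /// Treat credentials as expired once `now` is inside the refresh threshold.
    pub fn is_expired(&self, now: Instant, refresh_threshold: Duration) -> bool {
        now + refresh_threshold >= self.expires_at
    }
}

/// Caches credentials per service and refreshes them shortly before expiry.
pub struct RefreshingProvider<F: FnMut(&str) -> CachedCredentials> {
    cache: HashMap<String, CachedCredentials>,
    refresh_threshold: Duration,
    refresh: F,
}

impl<F: FnMut(&str) -> CachedCredentials> RefreshingProvider<F> {
    pub fn new(refresh_threshold: Duration, refresh: F) -> Self {
        Self { cache: HashMap::new(), refresh_threshold, refresh }
    }

    /// Return cached credentials, calling the refresh hook when they are
    /// missing or about to expire.
    pub fn get_credentials(&mut self, service: &str, now: Instant) -> CredentialSet {
        let stale = self
            .cache
            .get(service)
            .map_or(true, |c| c.is_expired(now, self.refresh_threshold));
        if stale {
            let fresh = (self.refresh)(service);
            self.cache.insert(service.to_string(), fresh);
        }
        self.cache[service].set.clone()
    }
}
```

Each worker refreshes independently in this design, so a rotation propagates within at most one TTL without any cluster-wide push.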
### OQ-CP-04: Should CredentialSet include request-signing capability?
|
||||
- **Origin**: [credentials.md](credentials.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: No. `CredentialSet` is pure data. Request signing (e.g., AWS Signature V4) is a separate utility function in the service wrapper or a shared `alknet-s3` crate. Credentials are data; signing is protocol behavior.
|
||||
- **Cross-references**: [credentials.md](credentials.md)
|
||||
|
||||
## Definitions
|
||||
|
||||
### OQ-DEF-01: Should alknet adopt a "Service Catalog" concept like Keystone?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Keep `OperationRegistry` global, check scope at invocation time. Add scope-filtered discovery (`GET /v1/schema?scope=...`) when multi-tenant deployment requires it. The unfiltered registry is sufficient for current needs.
|
||||
- **Cross-references**: [call-protocol.md](call-protocol.md)
|
||||
|
||||
### OQ-DEF-03: Should Identity.scopes be hierarchical or stay flat?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: low
|
||||
- **Resolution**: Stay flat. Add implied scope resolution in alknet-storage when multi-tenant deployment requires it. A full policy language (like Rustfs IAM JSON policies) is Phase D territory.
|
||||
- **Cross-references**: [identity.md](identity.md)
|
||||
|
||||
### OQ-DEF-08: Should "credential presentation" replace "auth interface" in terminology?
|
||||
- **Origin**: [research/phase2/definitions.md](../research/phase2/definitions.md)
|
||||
- **Status**: resolved
|
||||
- **Priority**: medium
|
||||
- **Resolution**: Yes. Adopted in [definitions.md](definitions.md). Use "credential presentation" for the mechanism of presenting credentials on a (Transport, Interface) pair. Never use "auth interface" (overloads "Interface").
|
||||
- **Cross-references**: [definitions.md](definitions.md), [auth.md](auth.md)
|
||||
|
||||
## Secret Service
|
||||
|
||||
### OQ-SEC-01: Should alknet-secret use mlock/VirtualLock to prevent seed RAM from being paged to disk?
|
||||
- **Origin**: [secret-service.md](secret-service.md)
|
||||
- **Status**: open
|
||||
- **Priority**: low
|
||||
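- **Resolution**: (deferred to Phase B: zeroize is sufficient for v1; on Linux, `mlock` is capped by `RLIMIT_MEMLOCK` unless the process has `CAP_IPC_LOCK`, and on Windows, `VirtualLock` is capped by the process working-set size, so both add platform-specific complexity that should be audited together)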
|
||||
- **Cross-references**: [ADR-038](decisions/038-seed-lifecycle-memory-security.md), [secret-service.md](secret-service.md)
|
||||
@@ -1,242 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Alknet Overview
|
||||
|
||||
## Purpose
|
||||
|
||||
Alknet is a self-hostable SSH-based tunnel tool that provides VPN-like functionality without being a VPN protocol. It enables:
|
||||
|
||||
- **Private tunneling** of services (Postgres, Redis, internal APIs) over SSH
|
||||
- **Censorship circumvention** — SSH over TLS on port 443 looks like HTTPS to DPI
|
||||
- **NAT traversal** — iroh transport allows peer-to-peer connections without public IPs or port forwarding
|
||||
- **Service mesh connectivity** — a lightweight transport layer for the pubsub/operations event system
|
||||
|
||||
The core insight: SSH tunnels work because SSH is fundamental infrastructure. Blocking it breaks the internet. Alknet makes SSH tunneling accessible through a simple CLI with pluggable transports.
|
||||
|
||||
## Crate Structure
|
||||
|
||||
Alknet is decomposed into six crates with a strict acyclic dependency graph (ADR-027):
|
||||
|
||||
| Crate | Purpose | Exists Now? |
|
||||
|-------|---------|-------------|
|
||||
| **alknet-core** | Transport, SSH, call protocol, config, auth types, `OperationSpec`, `Interface` trait | Yes |
|
||||
| **alknet-napi** | Node.js native addon via napi-rs | Yes |
|
||||
| **alknet-secret** | BIP39, SLIP-0010 HD key derivation, AES-256-GCM, `SecretProtocol` irpc service | Phase 2+ |
|
||||
| **alknet-storage** | SQLite-backed metagraph, identity tables, ACL graph, honker, `StorageProtocol` | Phase 2+ |
|
||||
| **alknet-flowgraph** | `FlowGraph<N,E>` over petgraph, operation graph, call graph | Phase 2+ |
|
||||
| **alknet** (CLI) | Binary that assembles everything with feature flags | Yes |
|
||||
|
||||
The four library crates (core, secret, storage, flowgraph) are independent of each other. Dependencies flow upward only: the CLI binary sits at the top and wires concrete implementations together. alknet-storage implements alknet-core's `IdentityProvider` trait without a crate dependency — the CLI binary provides the bridge.
|
||||
|
||||
irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling don't need the service layer overhead.
|
||||
|
||||
## Three-Layer Model
|
||||
|
||||
Alknet uses a three-layer model (ADR-026, ADR-035):
|
||||
|
||||
| Layer | Responsibility | Examples |
|
||||
|-------|---------------|----------|
|
||||
| **Layer 1: Transport** | Produces byte streams (`AsyncRead + AsyncWrite + Unpin + Send`) | TCP, TLS, iroh, WebTransport (future) |
|
||||
| **Layer 2: Interface** | Two categories: StreamInterface (consumes transport stream, produces session) and MessageInterface (handles discrete requests, manages own transport) | Stream: SSH, raw framing. Message: HTTP, DNS |
|
||||
| **Layer 3: Protocol** | Carries semantics — operation registry, service calls, events | Call protocol, OperationEnv, operation dispatch |
|
||||
|
||||
SSH is an interface, not a transport. DNS is a message interface, not a transport.
|
||||
The three-layer model enables HTTP interfaces (stealth mode byte-peek),
|
||||
DNS control channels, and local service mesh (raw framing) without wrapping SSH
|
||||
inside those transports.
|
||||
|
||||
A stream-based connection is always a (Transport, StreamInterface) pair.
|
||||
Message-based interfaces manage their own transport. The protocol layer is
|
||||
agnostic to both.
|
||||
|
||||
## Service Layer
|
||||
|
||||
The irpc service layer decomposes alknet's core responsibilities into independently testable, deployable, and replaceable components (ADR-033, [services.md](services.md)):
|
||||
|
||||
- **Auth** (`AuthProtocol`) — verify identities, check credentials
|
||||
- **Secret** (`SecretProtocol`) — derive keys, encrypt/decrypt
|
||||
- **Config** (`ConfigProtocol`) — dynamic config reload
|
||||
- **Storage** (`StorageProtocol`) — graph CRUD, metagraph operations
|
||||
|
||||
**OperationEnv** is the universal composition mechanism. A handler calls `context.env.invoke("secrets", "derive", input)` without knowing whether the dispatch is local (direct function call), in-cluster (irpc service), or cross-node (call protocol `EventEnvelope`). Three dispatch paths, one handler-facing API.
|
||||
|
||||
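The "three dispatch paths, one handler-facing API" idea can be illustrated with a small stand-in. `OperationEnv` and `invoke(namespace, op, input)` are named in this document; the `Dispatch` enum, the `register` method, and the use of plain strings for input and output are simplifications assumed for the sketch, and the irpc and call-protocol paths are stubbed rather than implemented.

```rust
use std::collections::HashMap;

/// A local handler: takes serialized input, returns serialized output.
pub type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// The three dispatch paths. Only `Local` is functional in this sketch.
pub enum Dispatch {
    /// Direct function call in the same process.
    Local(Handler),
    /// In-cluster irpc service (stubbed).
    Service { endpoint: String },
    /// Cross-node call protocol (stubbed).
    Remote { node: String },
}

#[derive(Default)]
pub struct OperationEnv {
    routes: HashMap<(String, String), Dispatch>,
}

impl OperationEnv {
    pub fn register(&mut self, namespace: &str, op: &str, dispatch: Dispatch) {
        self.routes.insert((namespace.to_string(), op.to_string()), dispatch);
    }

    /// Handler-facing entry point: the caller never sees which path is taken.
    pub fn invoke(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), op.to_string());
        match self.routes.get(&key) {
            Some(Dispatch::Local(handler)) => handler(input),
            Some(Dispatch::Service { endpoint }) => {
                Err(format!("irpc dispatch to {endpoint} is stubbed in this sketch"))
            }
            Some(Dispatch::Remote { node }) => {
                Err(format!("call-protocol dispatch to {node} is stubbed in this sketch"))
            }
            None => Err(format!("unknown operation {namespace}/{op}")),
        }
    }
}
```

Swapping a `Local` registration for a `Service` or `Remote` one changes where the work runs without touching any handler that calls `invoke`.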
**Phase boundary**: Phase 1 ships `ConfigIdentityProvider` (ArcSwap-backed) and `ConfigServiceImpl` (ArcSwap-backed) as the only auth and config implementations. The irpc service protocols (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment topology (multi-node with `StorageIdentityProvider`) are contracted in the specs but will be implemented in Phase 2+. Application services (DockerService, NodeService, agent services) are downstream concerns that build on top of the call protocol and OperationEnv.
|
||||
|
||||
## Identity
|
||||
|
||||
`Identity` struct and `IdentityProvider` trait are core types in alknet-core (ADR-029, [identity.md](identity.md)):
|
||||
|
||||
```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,                              // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>,                     // Authorization scope strings
    pub resources: HashMap<String, Vec<String>>, // Resource-level authorization
}
```
|
||||
|
||||
`IdentityProvider` decouples alknet-core from identity storage. Phase 1 ships `ConfigIdentityProvider` (reads from `ArcSwap<DynamicConfig.auth>`). `StorageIdentityProvider` (Phase 2+, backed by SQLite) replaces it for production deployments. Both produce the same `Identity` result.
|
||||
|
||||
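To make the decoupling concrete, here is a minimal sketch of a provider trait with an in-memory implementation. The `Identity` fields match the struct above; the `resolve` method signature, `MapIdentityProvider`, and `has_scope` are hypothetical, and the real `ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig.auth>` rather than a plain map.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical trait shape: map a presented key fingerprint to an identity.
pub trait IdentityProvider {
    fn resolve(&self, fingerprint: &str) -> Option<Identity>;
}

/// In-memory stand-in: fingerprint -> granted scopes.
pub struct MapIdentityProvider {
    scopes_by_fingerprint: HashMap<String, Vec<String>>,
}

impl MapIdentityProvider {
    pub fn new(scopes_by_fingerprint: HashMap<String, Vec<String>>) -> Self {
        Self { scopes_by_fingerprint }
    }
}

impl IdentityProvider for MapIdentityProvider {
    fn resolve(&self, fingerprint: &str) -> Option<Identity> {
        let scopes = self.scopes_by_fingerprint.get(fingerprint)?.clone();
        Some(Identity {
            // In config auth, the fingerprint doubles as the identity id.
            id: fingerprint.to_string(),
            scopes,
            resources: HashMap::new(),
        })
    }
}

/// Flat scope check: an exact string match, with no hierarchy.
pub fn has_scope(identity: &Identity, scope: &str) -> bool {
    identity.scopes.iter().any(|s| s == scope)
}
```

Callers depend only on the trait, so a SQLite-backed provider could replace the map without changing any code that consumes `Identity`.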
## Exports
|
||||
|
||||
### Binary: `alknet`
|
||||
|
||||
A single binary with subcommands:
|
||||
|
||||
```
|
||||
alknet serve — Start the server (accepts SSH connections)
|
||||
alknet connect — Start the client (opens SSH session, exposes SOCKS5/port-forwards)
|
||||
```
|
||||
|
||||
### Library: `alknet-core`
|
||||
|
||||
The `alknet-core` crate exports the pluggable components for embedding or programmatic use:
|
||||
|
||||
- `Transport` trait — produces a duplex stream for SSH to run over
|
||||
- `TcpTransport` — direct TCP connection
|
||||
- `TlsTransport` — TCP + tokio-rustls TLS
|
||||
- `IrohTransport` — iroh QUIC P2P connection
|
||||
- `Interface` trait → `StreamInterface` trait and `MessageInterface` trait (ADR-035)
|
||||
- `InterfaceSession` trait — `recv()`/`send()` producing/consuming `InterfaceEvent` frames
|
||||
- `InterfaceRequest` / `InterfaceResponse` — normalized request/response for message interfaces
|
||||
- `Socks5Server` — local SOCKS5 proxy that forwards through SSH channels
|
||||
- `PortForwarder` — manages local/remote port forwards
|
||||
- `ServerHandler` → `SshInterface` — russh server handler with configurable auth and channel policies
|
||||
- `Identity` / `IdentityProvider` — core identity types (ADR-029)
|
||||
- `CredentialProvider` / `CredentialSet` — outbound credential types (ADR-036)
|
||||
- `OperationSpec` — operation registration for call protocol (ADR-025)
|
||||
- `OperationEnv` / `OperationContext` — universal composition and operation context
|
||||
- `ConnectOptions` / `ServeOptions` — programmatic configuration structs
|
||||
- `StaticConfig` / `DynamicConfig` — static (immutable) vs. hot-reloadable config (ADR-030)
|
||||
- `ConfigReloadHandle` — programmatic reload of dynamic config
|
||||
- `ForwardingPolicy` — rule-based allow/deny for channel targets (ADR-031)
|
||||
- `ListenerConfig` — stream and message listener configuration
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Dependency | Purpose | Crate | Feature-gated |
|
||||
|------------|---------|-------|---------------|
|
||||
| `russh` | SSH client & server | core | No (core) |
|
||||
| `tokio` | Async runtime | core | No (core) |
|
||||
| `tokio-rustls` | TLS wrapping | core | Yes (`tls`) |
|
||||
| `rustls` | TLS implementation | core | Yes (`tls`) |
|
||||
| `rustls-acme` | ACME/Let's Encrypt auto-cert | core | Yes (`acme`) |
|
||||
| `iroh` | P2P QUIC transport | core | Yes (`iroh`) |
|
||||
| `irpc` | Streaming RPC service layer | core | Yes (`irpc`) |
|
||||
| `arc-swap` | Lock-free dynamic config | core | No (core) |
|
||||
| `serde` | Serialization | core | No (core) |
|
||||
| `clap` | CLI argument parsing | CLI | No (CLI) |
|
||||
| `toml` | TOML config file | CLI | No (CLI) |
|
||||
| `tracing` | Structured logging | core | No (core) |
|
||||
| `anyhow` / `thiserror` | Error handling | core | No (core) |
|
||||
| `bip39` | Mnemonic generation | secret | No (secret) |
|
||||
| `ed25519-bip32` | HD key derivation | secret | No (secret) |
|
||||
| `aes-gcm` | AES-256-GCM encryption | secret | No (secret) |
|
||||
| `rusqlite` | SQLite (via honker) | storage | No (storage) |
|
||||
| `honker` | Event-sourced storage | storage | No (storage) |
|
||||
| `petgraph` | Graph data structure | storage, flowgraph | No |
|
||||
| `jsonschema` | JSON Schema validation | storage, flowgraph | No |
|
||||
|
||||
> Note: `tun-rs` is no longer a dependency. TUN support is deferred in favor of the external `tun2proxy` tool (ADR-014).
|
||||
|
||||
## Architecture Constraints
|
||||
|
||||
1. **SSH runs over transport, not alongside** — The transport layer produces a single `AsyncRead+AsyncWrite+Unpin+Send` stream. SSH runs over that stream via `russh::client::connect_stream()` / `russh::server::run_stream()`. The SSH layer never knows what transport it's on. (ADR-001, ADR-004)
|
||||
|
||||
2. **Three-layer model: Transport, Interface, Protocol** — SSH is a StreamInterface (Layer 2), not a transport (Layer 1). HTTP and DNS are MessageInterfaces (Layer 2). A connection is always a (Transport, StreamInterface) pair for stream-based interfaces, or a standalone MessageInterface for message-based ones. The call protocol (Layer 3) is agnostic to both. This enables HTTP interfaces, DNS control channels, and local service mesh without wrapping SSH. (ADR-026, ADR-035)
|
||||
|
||||
3. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)
|
||||
|
||||
4. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)
|
||||
|
||||
5. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing. Optional `--config` TOML file for reproducible deployments. (ADR-011, ADR-030)
|
||||
|
||||
6. **Feature flags control transport inclusion** — `tls`, `iroh`, `acme`, `irpc` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
|
||||
|
||||
7. **Authentication is key-based and unified** — Ed25519 public key (default) and OpenSSH certificate authority. Same key material for SSH and token auth. Identity resolves through `IdentityProvider` trait, decoupling core from identity storage. (ADR-012, ADR-023, ADR-029)
|
||||
|
||||
8. **NAPI exposes both connect() and serve()** — The napi-rs wrapper provides client and server functionality, using napi-rs as the FFI bridge. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)
|
||||
|
||||
9. **Static/dynamic config split** — Transport-level settings (listen address, TLS certs) are immutable after startup. Auth, forwarding policy, and rate limits are hot-reloadable via `ArcSwap<DynamicConfig>`. (ADR-030)
|
||||
|
||||
10. **Forwarding policy enforced before proxy spawn** — Each `channel_open_direct_tcpip` is checked against `ForwardingPolicy` before a TCP connection is made. Default-allow preserves current behavior. (ADR-031)
|
||||
|
||||
11. **OperationEnv as universal composition mechanism** — Handlers call `context.env.invoke(namespace, op, input)` regardless of dispatch path (local, irpc service, remote call protocol). (ADR-033)
|
||||
|
||||
12. **Event boundary discipline** — Domain events (Honker streams) stay within the owning service. irpc calls are synchronous and in-cluster. Call protocol `EventEnvelope` is the only thing that crosses node boundaries. (ADR-032)
|
||||
|
||||
13. **Error handling follows a consistent layered pattern** — Transport and auth errors cause reconnection (client, with exponential backoff) or connection rejection (server). Channel-level errors (target unreachable, proxy failure) close the individual channel without killing the session. Library API errors propagate via `anyhow::Result` / `thiserror` types. CLI reports errors to stderr with appropriate exit codes. NAPI errors are marshalled as JavaScript exceptions.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces `AsyncRead+AsyncWrite+Unpin+Send`, SSH consumes it |
|
||||
| [002](decisions/002-tun-separate-process.md) | TUN shim separate | Superseded — TUN is deferred, use tun2proxy (ADR-014) |
|
||||
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join(recv, send)` combines QUIC halves |
|
||||
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never accesses TCP/iroh/TLS directly |
|
||||
| [005](decisions/005-socks5-before-tun.md) | SOCKS5 first | SOCKS5 is the primary interface; TUN is external (tun2proxy) |
|
||||
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of tunnel destinations | Server logs auth and connections, not destinations |
|
||||
| [007](decisions/007-napi-single-stream.md) | NAPI single stream | NAPI exposes duplex streams, not SSH multiplexing |
|
||||
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
|
||||
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
|
||||
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
|
||||
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No SSH config files; options are structs, env vars, CLI flags (amended by ADR-030 for optional TOML) |
|
||||
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | Ed25519 keys + OpenSSH CA; no password auth |
|
||||
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly | Structured auth logs + built-in rate limiting |
|
||||
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN | Use tun2proxy for VPN-like behavior; no alknet-tun binary |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | connect + serve | NAPI exposes both client and server from the start |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Both sides can initiate calls |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Six crates, acyclic deps, feature-gated irpc |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract, irpc is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config | ArcSwap for hot-reloadable auth and forwarding |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Per-identity, per-destination, per-transport rules |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition, three dispatch paths |
| [034](decisions/034-head-worker-terminology.md) | Head/worker | Replaces hub/spoke terminology |
| [035](decisions/035-streaminterface-messageinterface-split.md) | StreamInterface/MessageInterface | Two Layer 2 trait categories for stream vs message |
| [036](decisions/036-credentialprovider-core-type.md) | CredentialProvider as core type | Outbound credentials in `alknet_core::credentials` |
| [037](decisions/037-api-keys-dynamic-config.md) | API keys in DynamicConfig | Hash-verified bearer tokens for service accounts |

## Open Questions

See [open-questions.md](open-questions.md) for all open and resolved questions.
Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
OQ-20 (worker registration), OQ-IF-01 (Interface session / EventEnvelope
relationship).

## References

- [transport.md](transport.md): Transport abstraction (Layer 1)
- [interface.md](interface.md): StreamInterface and MessageInterface (Layer 2)
- [call-protocol.md](call-protocol.md): Call protocol (Layer 3)
- [auth.md](auth.md): Unified authentication, API keys, credential presentation
- [identity.md](identity.md): Identity and IdentityProvider
- [credentials.md](credentials.md): CredentialProvider and CredentialSet (outbound auth)
- [definitions.md](definitions.md): Terminology disambiguation
- [configuration.md](configuration.md): StaticConfig, DynamicConfig, ForwardingPolicy
- [services.md](services.md): irpc service layer, OperationEnv
- [server.md](server.md): Server acceptance, channel handling
- [client.md](client.md): Client connection, SOCKS5, port forwarding
- [napi-and-pubsub.md](napi-and-pubsub.md): NAPI wrapper and pubsub adapter
- [storage.md](storage.md): alknet-storage: metagraph, identity, ACL
- [flowgraph.md](flowgraph.md): alknet-flowgraph: call graph, operation graph
- [secret-service.md](secret-service.md): alknet-secret: BIP39, SLIP-0010, AES-GCM
- [Feasibility Assessment](../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
- [russh API](/workspace/russh): SSH client/server library
- [Dispatch](/workspace/@alkdev/dispatch): Reference implementation of russh port forwarding
- [iroh](/workspace/iroh): P2P QUIC connections
- [tun2proxy](https://github.com/tun2proxy/tun2proxy): Recommended external TUN-to-SOCKS5 tool
- [irpc](/workspace/irpc): iroh streaming RPC
- [Production certbot setup](../research/ops/certbot.md): Let's Encrypt on our infrastructure
- [Production fail2ban setup](../research/ops/fail2ban.md): fail2ban with nftables on our infrastructure
@@ -1,519 +0,0 @@
---
status: reviewed
last_updated: 2026-06-10
---

# Secret Service (alknet-secret)

## What

The `alknet-secret` crate provides BIP39 mnemonic generation, SLIP-0010 Ed25519
HD key derivation, AES-256-GCM encryption for external credentials, and the
`SecretProtocol` irpc service. It is the only component that holds the master
seed phrase.

## Why

Operations like SSH key generation, API key storage, and Ethereum transaction
signing all need deterministic key derivation from a single root of trust. The
seed phrase is the single recovery mechanism: from it, all self-generated
secrets can be derived on demand. External credentials (third-party API keys,
OAuth tokens) cannot be derived and must be stored encrypted, with the
encryption key itself derived from the seed.

The secret service isolates this responsibility: no other crate sees the seed,
and derived keys are provided on demand through an irpc service interface. This
follows ADR-027 (crate decomposition): alknet-secret is fully independent of
alknet-core and alknet-storage.

## Architecture

### Crate Structure

```
alknet-secret/
├── Cargo.toml
├── src/
│   ├── lib.rs          # Crate root, re-exports
│   ├── mnemonic.rs     # BIP39: phrase generation, validation, seed derivation
│   ├── derivation.rs   # SLIP-0010: HD key derivation, path constants
│   ├── encryption.rs   # AES-256-GCM: encrypt/decrypt, EncryptedData type
│   ├── protocol.rs     # SecretProtocol irpc service enum, DerivedKey, KeyType
│   ├── service.rs      # SecretService, SecretServiceHandle, SecretServiceActor
│   ├── cache.rs        # Key caching: LRU cache with TTL, derivation path as key
│   └── ethereum.rs     # BIP-0032 secp256k1 HD key derivation (behind feature flag)
└── tests/
    ├── derivation_tests.rs   # Path derivation, coin type 74' consistency
    ├── encryption_tests.rs   # Round-trip encrypt/decrypt, key version
    ├── service_tests.rs      # Unlock/Lock lifecycle, derive on locked = error
    └── test_vectors.rs       # Known-answer tests: BIP39, SLIP-0010, AES-256-GCM
```

### Dependencies

```toml
[dependencies]
bip39 = { version = "2", features = ["rand"] }
ed25519-bip32 = "0.4" # IOHK SLIP-0010 Ed25519 HD derivation
aes-gcm = "0.10" # AES-256-GCM
sha2 = "0.10" # SHA-2 family; SHA-512 backs HMAC-SHA512 in key and password derivation
hmac = "0.12" # HMAC-SHA512 for key derivation
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
irpc = { workspace = true } # Always-on, not feature-gated (ADR-027)
irpc-derive = { workspace = true } # Proc-macro for #[rpc_requests]
tokio = { version = "1", features = ["sync", "rt", "macros"] } # Async runtime for SecretServiceActor
zeroize = { version = "1", features = ["derive"] } # Secure memory wiping (ADR-038)
base64 = "0.22" # Base64url encoding for derived passwords
rand = "0.8" # Random IV/salt generation for AES-256-GCM

[dependencies.secp256k1]
version = "0.29"
optional = true # BIP-0032 secp256k1 derivation (behind feature flag)

[features]
default = []
secp256k1 = ["dep:secp256k1"] # Enable Ethereum/secp256k1 key derivation

# Future (Phase B): key rotation via KDF
# hkdf = "0.12" # HKDF for salt-based key stretching (deferred)
# pbkdf2 = "0.12" # PBKDF2 for password-based key derivation (deferred)
```

irpc is always a dependency (not behind a feature flag). Per ADR-027, irpc
in alknet-secret and alknet-storage is not feature-gated because these crates
are used in production deployments where the service layer is always active.
`irpc-derive` provides the `#[rpc_requests]` proc-macro that generates
`SecretMessage` and channel plumbing. `tokio` is needed for the
`SecretServiceActor` message loop (async channel receivers and task spawning).

The `secp256k1` crate is feature-gated behind the `secp256k1` feature because
Ethereum/BIP-0032 derivation is not needed in minimal deployments. Only
deployments that require `DeriveEthereumKey` should enable this feature. Note
that the crate name is `secp256k1` (the Rust library), not `libsecp256k1`
(the C library that the Rust crate wraps).

The `hkdf` and `pbkdf2` crates are deferred to Phase B. They will be needed for
salt-based key stretching when key rotation is implemented (see
[EncryptedData.salt](#aes-256-gcm-encryption-for-external-credentials)).

### Crate Interface (Public API)

The crate exposes these types as its stable public interface:

```rust
// Core types (always available)
pub use mnemonic::{Mnemonic, Language, Seed};
pub use derivation::{ExtendedPrivKey, DerivationError, PATHS};
pub use encryption::{EncryptedData, EncryptionError};
pub use protocol::{SecretProtocol, DerivedKey, KeyType, SecretMessage};
pub use service::{SecretService, SecretServiceHandle, SecretServiceActor, SecretServiceError};
pub use cache::CacheConfig;

// secp256k1 types (behind feature flag)
#[cfg(feature = "secp256k1")]
pub use ethereum::Secp256k1ExtendedPrivKey;
```

Other crates consume this interface:
- **alknet-storage** references `EncryptedData` for wire format compatibility
  (type-level, not a crate dependency)
- **alknet** (CLI binary) assembles `SecretService` and wires it to the
  `OperationEnv`
- **alknet-core** never depends on alknet-secret; `CredentialProvider` stub
  returns `None` until Phase A wiring

### Security Model

Per ADR-038 (seed lifecycle and memory security):

| State | What's in memory | What's on disk |
|-------|-----------------|---------------|
| Locked | Nothing | Encrypted database, derivation path metadata |
| Unlocked | Master seed in zeroize-protected RAM | Same (seed is never persisted) |
| After use | Derived keys cached in zeroize-protected RAM | Derivation paths only |

The seed phrase is entered once (at node startup or via `Unlock`), held only in
RAM, and never written to disk. `Lock` calls `zeroize()` on the seed and all
cached derived keys. The `SecretService` uses `Zeroize`-derived types for all
sensitive material.
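
The lock/unlock lifecycle described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: `SeedSlot`, `Locked`, and `with_seed` are hypothetical names, and the manual byte wipe is a stand-in for `zeroize()`.

```rust
/// Error returned when an operation needs the seed but the service is locked.
#[derive(Debug, PartialEq)]
pub struct Locked;

/// Hypothetical holder for the master seed; the seed exists only while unlocked.
pub struct SeedSlot {
    seed: Option<Vec<u8>>,
}

impl SeedSlot {
    pub fn new() -> Self {
        SeedSlot { seed: None }
    }

    /// Keep the seed in RAM only; nothing is written to disk.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.lock(); // wipe any previously loaded seed first
        self.seed = Some(seed);
    }

    /// Overwrite the seed bytes, then drop them.
    pub fn lock(&mut self) {
        if let Some(mut seed) = self.seed.take() {
            // Simplification: a plain loop may be optimised away by the compiler.
            // The zeroize crate exists to make this wipe reliable.
            for byte in seed.iter_mut() {
                *byte = 0;
            }
        }
    }

    pub fn is_unlocked(&self) -> bool {
        self.seed.is_some()
    }

    /// Run `f` with the seed, or fail with `Locked` when no seed is loaded.
    pub fn with_seed<T>(&self, f: impl FnOnce(&[u8]) -> T) -> Result<T, Locked> {
        match &self.seed {
            Some(seed) => Ok(f(seed.as_slice())),
            None => Err(Locked),
        }
    }
}
```

Routing every seed access through one closure-taking method keeps the "derive on locked = error" rule in a single place, which is the behavior `service_tests.rs` is described as covering.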

#### Key Caching

Per OQ-SVC-04 (resolved), derived keys are cached in RAM with the following
properties:

- **Cache key**: The derivation path string (e.g., `m/74'/0'/0'/0'`). This
  uniquely identifies a derived key: the same path always produces the same
  key from the same seed.
- **TTL**: 1 hour (configurable). Cached entries expire after the TTL elapses,
  forcing re-derivation from the seed on next access.
- **Eviction policy**: LRU (least recently used). When the cache exceeds its
  maximum size, the least recently accessed entry is evicted.
- **Clearing**: The entire cache is cleared on `Lock`, and all entries are
  zeroized before removal per ADR-038.
- **Implementation**: The cache lives in `cache.rs` as an LRU map from
  derivation path to `Zeroize`-protected key bytes.

The cache avoids redundant derivation for frequently used keys (identity,
encryption) while ensuring that `Lock` purges all sensitive material.
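
The caching rules listed above (derivation path as key, TTL expiry, LRU eviction, wipe on `Lock`) can be sketched as follows. `KeyCache`, `Entry`, and `wipe` are hypothetical names chosen for illustration, the sketch assumes a maximum size of at least one entry, and the manual wipe stands in for `zeroize()`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    key_bytes: Vec<u8>,
    inserted_at: Instant,
    last_used: u64,
}

/// Hypothetical cache: derivation path -> derived key bytes.
pub struct KeyCache {
    max_entries: usize,
    ttl: Duration,
    tick: u64, // logical clock used for LRU ordering
    entries: HashMap<String, Entry>,
}

/// Overwrite key bytes before they are freed (stand-in for `zeroize()`).
fn wipe(bytes: &mut [u8]) {
    for b in bytes.iter_mut() {
        *b = 0;
    }
}

impl KeyCache {
    pub fn new(max_entries: usize, ttl: Duration) -> Self {
        KeyCache { max_entries, ttl, tick: 0, entries: HashMap::new() }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    fn remove(&mut self, path: &str) {
        if let Some(mut entry) = self.entries.remove(path) {
            wipe(&mut entry.key_bytes);
        }
    }

    /// Look up a key, refreshing its LRU position. Expired entries are wiped
    /// and reported as a miss so the caller re-derives from the seed.
    pub fn get(&mut self, path: &str) -> Option<&[u8]> {
        let expired = self.entries.get(path)?.inserted_at.elapsed() >= self.ttl;
        if expired {
            self.remove(path);
            return None;
        }
        self.tick += 1;
        let tick = self.tick;
        let entry = self.entries.get_mut(path)?;
        entry.last_used = tick;
        Some(entry.key_bytes.as_slice())
    }

    pub fn insert(&mut self, path: &str, key_bytes: Vec<u8>) {
        self.remove(path); // wipe an older value stored under the same path
        if self.entries.len() >= self.max_entries {
            // Evict the least recently used entry.
            let lru = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(p, _)| p.clone());
            if let Some(p) = lru {
                self.remove(&p);
            }
        }
        self.tick += 1;
        let entry = Entry { key_bytes, inserted_at: Instant::now(), last_used: self.tick };
        self.entries.insert(path.to_string(), entry);
    }

    /// Called on `Lock`: wipe every entry, then drop it.
    pub fn clear(&mut self) {
        for (_, mut entry) in self.entries.drain() {
            wipe(&mut entry.key_bytes);
        }
    }
}
```

Returning a borrowed slice from `get` rather than a cloned `Vec<u8>` avoids creating extra copies of key material that the cache could not wipe later.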

### Key Derivation

#### BIP39 Mnemonic and Seed Derivation

```rust
let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
let seed = mnemonic.to_seed(None); // or Some("passphrase")
let key = derive_path_from_seed(seed.as_bytes(), PATHS::IDENTITY)?;
```

#### SLIP-0010 Ed25519 HD Key Derivation

The `74'` coin type is unallocated per SLIP-0044 and reserved for alknet.

#### Derivation Path Constants

| Path | Purpose | Curve/Algorithm |
|------|---------|----------------|
| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) |
| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 |
| `m/74'/0'/1'/0'` | SSH host key | Ed25519 |
| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic (HMAC-SHA512) |
| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM |
| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 |

These constants are defined in `derivation::PATHS` for programmatic access.

#### Password Derivation

`DerivePassword` produces a deterministic password from the seed using the
following algorithm:

1. Derive the extended private key at path `m/74'/1'/0'/{hash}'` using
   SLIP-0010 (HMAC-SHA512 with key "ed25519 seed"), where `{hash}'` is a
   site-specific hardened index derived from the site identifier.
2. Take the HMAC-SHA512 output (64 bytes) at that derivation level.
3. Truncate to the requested `length` bytes.
4. Encode as Base64url (RFC 4648 §5, no padding).

This produces a URL-safe, deterministic password that encodes `length` bytes of
key material. Because Base64url emits four characters for every three bytes,
the resulting string is about one third longer than `length`, and `length`
cannot exceed the 64 bytes produced in step 2. v1 does not impose a special
character set: the Base64url alphabet (`A-Z`, `a-z`, `0-9`, `-`, `_`) is
considered sufficient. If a specific character set is required in the future, a
versioned path can be introduced (e.g., `m/74'/1'/1'/{hash}'`).

The `SecretServiceHandle` provides two methods for password derivation:
- `derive_password(path, length)` → `Vec<u8>` (raw truncated bytes)
- `derive_password_string(path, length)` → `String` (Base64url-encoded)

The irpc `DerivePassword` variant returns raw bytes (`Vec<u8>`). Consumers
who need a string representation can Base64url-encode the result.
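
Steps 3 and 4 of the password algorithm (truncate, then Base64url-encode without padding) can be sketched with the standard library alone. The 64-byte HMAC-SHA512 output is taken as an input here rather than computed, and `base64url_no_pad` and `password_from_digest` are hypothetical helper names, not the crate's API; the crate lists the `base64` dependency for this step.

```rust
/// Base64url alphabet from RFC 4648 §5.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as Base64url without `=` padding.
pub fn base64url_no_pad(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // A chunk of n bytes yields n + 1 characters (4 for a full chunk).
        for i in 0..chunk.len() + 1 {
            let index = (group >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Truncate a digest to `length` bytes and encode it.
/// Returns `None` when `length` exceeds the digest size.
pub fn password_from_digest(digest: &[u8], length: usize) -> Option<String> {
    let truncated = digest.get(..length)?;
    Some(base64url_no_pad(truncated))
}
```

With a 64-byte digest, a request for 32 bytes produces a 43-character string, which is worth keeping in mind when a site enforces a maximum password length.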

#### secp256k1 Derivation (Ethereum)

`DeriveEthereumKey` uses **BIP-0032** (not SLIP-0010) at path
`m/44'/60'/0'/0/0`. This is a fundamentally different derivation algorithm from
Ed25519:

- SLIP-0010 (Ed25519) uses HMAC-SHA512 with key "ed25519 seed" and only
  supports hardened child derivation.
- BIP-0032 (secp256k1) uses HMAC-SHA512 with key "Bitcoin seed" and supports
  both hardened and unhardened child derivation.

The Ethereum path contains unhardened indices (`0/0`), which are invalid under
SLIP-0010. The `alknet-secret` crate gates secp256k1 derivation behind a
`secp256k1` feature flag, which pulls in the `secp256k1` crate. Deployments
that do not need Ethereum signing can omit this feature to avoid the
dependency.
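
The hardened-only restriction can be illustrated with a small path parser. This is a sketch, not the crate's `derivation` module: `parse_path`, `parse_ed25519_path`, and `PathError` are hypothetical names, and only the index encoding (hardened children carry the 2^31 offset) comes from BIP-0032 and SLIP-0010.

```rust
/// Offset that marks a hardened child index (2^31).
pub const HARDENED: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
pub enum PathError {
    MissingRoot,
    BadIndex(String),
    UnhardenedNotAllowed(u32),
}

/// Parse a path such as "m/44'/60'/0'/0/0" into child indices,
/// setting the hardened bit on components that end in `'`.
pub fn parse_path(path: &str) -> Result<Vec<u32>, PathError> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(PathError::MissingRoot);
    }
    parts
        .map(|part| -> Result<u32, PathError> {
            let (digits, hardened) = match part.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (part, false),
            };
            let index: u32 = digits
                .parse()
                .map_err(|_| PathError::BadIndex(part.to_string()))?;
            if index >= HARDENED {
                return Err(PathError::BadIndex(part.to_string()));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// SLIP-0010 Ed25519 accepts hardened children only, so any
/// unhardened component makes the path invalid for that curve.
pub fn parse_ed25519_path(path: &str) -> Result<Vec<u32>, PathError> {
    let indices = parse_path(path)?;
    if let Some(i) = indices.iter().copied().find(|i| *i < HARDENED) {
        return Err(PathError::UnhardenedNotAllowed(i));
    }
    Ok(indices)
}
```

Under this sketch the alknet paths such as `m/74'/0'/0'/0'` pass `parse_ed25519_path`, while the Ethereum path parses only with the general `parse_path`, matching the split between the two derivation algorithms.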

#### DerivedKey Security Properties

Per ADR-038, the `private_key` field of `DerivedKey` must derive `Zeroize` and
use `#[zeroize(drop)]` to ensure sensitive key material is overwritten before
deallocation:

```rust
#[derive(Zeroize, Deserialize)]
#[zeroize(drop)]
pub struct DerivedKey {
    #[zeroize(skip)]
    pub key_type: KeyType,
    #[zeroize]
    #[serde(deserialize_with = "deserialize_private_key")]
    pub private_key: Vec<u8>,
    #[zeroize(skip)]
    pub public_key: Vec<u8>,
}
```

`DerivedKey` is **move-only**: it does not implement `Clone`. This is a
stronger security property than manual `Clone` with zeroization of the source:
a move-only type cannot be accidentally duplicated, and the `#[zeroize(drop)]`
annotation ensures the `private_key` is zeroized when the key goes out of scope.
There is no risk of use-after-zeroize from a manual `clone()` that destroys
the source.

Serialization redacts `private_key` in human-readable formats (JSON shows
`"[REDACTED]"`) but preserves the actual bytes in binary formats (postcard) so
that irpc remote communication works correctly. Deserialization always reads
the full bytes.

### AES-256-GCM Encryption for External Credentials

External credentials (API keys, OAuth tokens) that cannot be derived are
encrypted using a key derived from the seed at path `m/74'/2'/0'/0'`. The
`EncryptedData` type stores the key version, salt, IV, and ciphertext.

1. The secret service derives an AES-256-GCM key via path `m/74'/2'/0'/0'`
2. External credentials are encrypted with this key
3. The encrypted data is stored as a `SecretNode` in the metagraph
4. Only the derivation path and key version are stored in plain attributes
5. The seed phrase (or derived encryption key) is held only by the secret
   service, never in the database

#### EncryptedData.salt: Reserved for Future KDF-Based Key Rotation

In v1, the encryption key is derived directly from the seed at path
`m/74'/2'/0'/0'` without any salt-based key derivation. The `salt` field in
`EncryptedData` is **reserved for future KDF-based key rotation** (Phase B):

- The salt is generated randomly (32 bytes) and stored in `EncryptedData.salt`
  for forward compatibility, but it is **not used** in the v1 key derivation
  process.
- When key rotation is implemented, the salt will be used as input to HKDF or
  PBKDF2 for stretch-based key derivation, allowing the same seed to produce
  different encryption keys without changing the derivation path.
- This design ensures that the wire format does not need to change when key
  rotation is introduced: the `salt` field is already present and populated.

The `hkdf` and `pbkdf2` crates are listed as future dependencies in the
`Dependencies` section but are not included in v1.

### SecretProtocol irpc Service

```rust
#[rpc_requests(message = SecretMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum SecretProtocol {
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEd25519)]
    DeriveEd25519 { path: String },

    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEncryptionKey)]
    DeriveEncryptionKey { path: String },

    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEthereumKey)]
    DeriveEthereumKey { path: String },

    #[rpc(tx=oneshot::Sender<Vec<u8>>)]
    #[wrap(DerivePassword)]
    DerivePassword { path: String, length: usize },

    #[rpc(tx=oneshot::Sender<EncryptedData>)]
    #[wrap(Encrypt)]
    Encrypt { plaintext: String, key_version: u32 },

    #[rpc(tx=oneshot::Sender<String>)]
    #[wrap(Decrypt)]
    Decrypt { encrypted: EncryptedData },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Lock)]
    Lock,

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Unlock)]
    Unlock { mnemonic: String, passphrase: Option<String> },
}
```

**Note**: The `Unlock` variant carries both the mnemonic phrase and an optional
BIP39 passphrase. The `mnemonic` field is the space-separated BIP39 word list.
The `passphrase` field is the optional BIP39 password extension (sometimes
called the "25th word"). Most deployments use `passphrase: None`, but the field
is available for users who need additional security beyond the mnemonic alone.

> **Implementation gap**: The current code has `Unlock { passphrase: String }`
> with only a single field (the mnemonic), and the actor handler passes `None`
> for the BIP39 passphrase. This needs to be updated to match the spec above.
> See the `unlock-passphrase-gap` task.

#### irpc Integration Model

The `SecretProtocol` enum defines the **wire protocol**: the set of operations
the secret service supports. The `#[rpc_requests(message = SecretMessage)]`
macro generates `SecretMessage` as the irpc wire type, which comes in two
variants:

- `SecretMessage::Request`: serialized form for remote (QUIC) communication,
  using postcard encoding.
- `SecretMessage::RequestWithChannels`: local form with `oneshot::Sender`
  channels for in-process communication.

There are two dispatch paths for consuming the secret service:

1. **Local (in-process)**: `SecretServiceHandle` wraps `SecretServiceInner`
   behind `Arc<RwLock<>>` and provides direct method calls
   (`derive_ed25519()`, `encrypt()`, etc.) without any serialization overhead.
   This is the path used by the CLI binary and single-node deployments. No irpc
   message passing is involved; the handle calls the implementation directly.

2. **Remote (in-cluster)**: `Client<SecretProtocol>` connects to the secret
   service node via irpc over QUIC. The client sends `SecretMessage::Request`
   messages (postcard-serialized) and receives responses. Workers on remote
   nodes use this path. The seed never leaves the secret service node; only
   derived keys are transmitted.

The `SecretServiceActor` processes incoming `SecretMessage` variants by
dispatching to the corresponding `SecretServiceHandle` methods. It provides
a `spawn(handle)` convenience method that creates an mpsc channel, spawns the
actor on a tokio task, and returns a `(Client<SecretProtocol>, SecretServiceActor)`
tuple for immediate use.

The `SecretService` type owns the irpc service handler and a
`SecretServiceHandle`. It dispatches incoming `SecretMessage` variants to the
handle's methods. For call protocol exposure (e.g., `/head/secrets/derive`),
the service is wrapped in an operation that serializes to JSON.
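
The actor arrangement described above (one task owns the state, callers send requests that carry their own reply channel) can be approximated with standard-library channels. This sketch assumes nothing about irpc's API: `std::sync::mpsc` stands in for the tokio mpsc and oneshot channels, a thread stands in for the tokio task, and `Request`, `spawn_actor`, and `call` are hypothetical names. `Status` is a placeholder for the derive operations.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Requests understood by the actor; each carries its own reply channel.
pub enum Request {
    Unlock { seed: Vec<u8>, reply: Sender<()> },
    Lock { reply: Sender<()> },
    /// Placeholder for the derive operations: reports whether a seed is loaded.
    Status { reply: Sender<bool> },
}

/// Start the actor on its own thread and return the sending half of its mailbox.
pub fn spawn_actor() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        // The seed is owned by this thread only.
        let mut seed: Option<Vec<u8>> = None;
        // The loop ends once every sender has been dropped.
        for request in rx {
            match request {
                Request::Unlock { seed: new_seed, reply } => {
                    seed = Some(new_seed);
                    let _ = reply.send(());
                }
                Request::Lock { reply } => {
                    if let Some(mut old) = seed.take() {
                        // Stand-in for `zeroize()`.
                        old.iter_mut().for_each(|b| *b = 0);
                    }
                    let _ = reply.send(());
                }
                Request::Status { reply } => {
                    let _ = reply.send(seed.is_some());
                }
            }
        }
    });
    tx
}

/// Client-side helper: build a request around a fresh reply channel,
/// send it, and block until the actor answers.
pub fn call<T>(actor: &Sender<Request>, make: impl FnOnce(Sender<T>) -> Request) -> Option<T> {
    let (reply_tx, reply_rx) = channel();
    actor.send(make(reply_tx)).ok()?;
    reply_rx.recv().ok()
}
```

Because only the actor thread ever touches the seed, callers need no locking of their own; this is the property the remote dispatch path relies on when it states that the seed never leaves the secret service node.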

### Wire Format Compatibility with alknet-storage

The `EncryptedData` type (`key_version`, `salt`, `iv`, `data`) is the stable
wire format shared with alknet-storage. This is type-level compatibility, not a
crate dependency. alknet-storage stores encrypted nodes using this format;
alknet-secret encrypts and decrypts using this format.

The Rust `EncryptedData` struct in alknet-secret is a superset of the TypeScript
`EncryptedDataSchema` from `@alkdev/storage`. Migration path: re-encrypt
TypeScript-encrypted data using the Rust secret service with a new key version.
The wire format is stable: future key rotation will use the existing `salt`
field rather than adding new fields (see OQ-SVC-03).

### Deployment Topologies

**Minimal (single node, CLI)**: Secret service runs in the same process. Seed
phrase entered at startup. All keys derived locally via `SecretServiceHandle`.
No irpc overhead.

**Production (head node)**: Secret service runs on a dedicated node or as a
local irpc service. Workers request derived keys via `Client<SecretProtocol>`
over QUIC. The seed never leaves the secret service node.

### Test Vectors

Known-answer tests are required against published test vectors to verify
correctness of the cryptographic implementations:

#### BIP39 Test Vectors

The `mnemonic` module must produce identical output to the BIP39 reference
test vectors:

- Given a known mnemonic phrase and passphrase, the derived seed must match
  the reference output byte-for-byte.
- Test vectors from
  [BIP39 reference](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki)
  and the `bip39` crate's own test suite.

#### SLIP-0010 Test Vectors

The `derivation` module must produce identical output to the SLIP-0010 reference
test vectors:

- Given a known seed, the derived master key (private key + chain code) must
  match the SLIP-0010 reference output.
- Given a known master key, the derived child key at path `m/74'/0'/0'/0'`
  must match the reference output.
- Test vectors from
  [SLIP-0010 reference](https://github.com/satoshilabs/slips/blob/master/slip-0010.md).

#### AES-256-GCM Test Vectors

The `encryption` module must produce identical results to published AES-256-GCM
test vectors:

- Given a known key, IV, and plaintext, the ciphertext must match the reference
  output.
- Use IEEE P802.1ASck or NIST SP 800-38D test vectors.
- Round-trip encryption/decryption must always succeed for valid inputs.

These tests ensure that the implementation is correct and compatible with
other BIP39/SLIP-0010/AES-256-GCM implementations. They are placed in
`tests/test_vectors.rs`.

## Constraints

- The seed phrase is never persisted to disk. It is entered at startup or via
  `Unlock` and held only in `Zeroize`-protected RAM (ADR-038).
- `Lock` calls `zeroize()` on the seed and all cached derived keys. The key
  cache is fully cleared and zeroized on `Lock` (OQ-SVC-04, resolved).
- alknet-secret does not depend on alknet-core or alknet-storage. It is fully
  independent (ADR-027).
- The `EncryptedData` wire format is shared with alknet-storage for type-level
  compatibility, not a crate dependency.
- Per ADR-032, secret service domain events (key derivation notifications) stay
  within the service boundary. External consumers use irpc calls or call
  protocol operations projected to integration events.
- irpc is always a dependency (not feature-gated) per ADR-027.
- `SecretProtocol` defines the wire format for in-cluster communication
  (postcard serialization). For call protocol exposure (e.g.,
  `/head/secrets/derive`), the service is wrapped in an operation that
  serializes to JSON.
- `DerivedKey.private_key` must derive `Zeroize` per ADR-038. `DerivedKey`
  is move-only (not `Clone`); this is stronger than manual Clone with
  zeroization of the source, as it prevents accidental duplication.
- secp256k1 (Ethereum) derivation is gated behind the `secp256k1` feature flag
  because it requires a different derivation algorithm (BIP-0032) and an
  additional dependency (`secp256k1`).

## Phase Progression

| Phase | Scope | Notes |
|-------|-------|-------|
| Phase 3 (now) | Basic crate: mnemonic, derivation, encryption, irpc protocol, service lifecycle, key caching | Core key management |
| Phase A | Integration with alknet-storage via `EncryptedData` wire format. CLI commands for unlock/lock/derive. `SecretStoreCredentialProvider` wiring. | Full service integration |
| Phase B | Memory hardening: `mlock`/`VirtualLock` for seed RAM, constant-time comparison, audit logging of derivation requests. Key rotation: KDF-based key derivation using `EncryptedData.salt` with HKDF/PBKDF2. | Security hardening |
| Phase C | Multi-seed support (tenant isolation): indexed `Unlock` with tenant ID. | Multi-tenancy |

## Open Questions

- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
  per tenant)? See [open-questions.md](open-questions.md).

- **OQ-SVC-03**: How does the secret service integrate with the existing
  `EncryptedDataSchema` from `@alkdev/storage`? **Resolution**: The wire format
  is stable. `EncryptedData` (`key_version`, `salt`, `iv`, `data`) is shared
  type-level between alknet-secret and alknet-storage. The migration path is
  re-encryption with a new key version. The `salt` field is reserved for future
  KDF-based key rotation (see Phase B). See [open-questions.md](open-questions.md).

- **OQ-SVC-04**: Should workers cache derived keys locally? **Resolution**: Yes.
  Derived keys are cached in RAM using an LRU cache keyed by derivation path,
  with a TTL of 1 hour (configurable). The cache is fully cleared and zeroized
  on `Lock`. This avoids redundant derivation for frequently used keys while
  ensuring that `Lock` purges all sensitive material. See [open-questions.md](open-questions.md).

- **OQ-SEC-01**: Should alknet-secret use `mlock`/`VirtualLock` to prevent seed
  RAM from being paged to disk? See [open-questions.md](open-questions.md).
  Deferred to Phase B per ADR-038.

## Design Decisions

| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-secret is independent of core and storage |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
| [038](decisions/038-seed-lifecycle-memory-security.md) | Seed lifecycle and memory security | Zeroize for sensitive material, mlock deferred to Phase B |

## References

- [research/services.md](../research/services.md): SecretProtocol definition, DerivedKey, KeyType
- [research/storage.md](../research/storage.md): Secrets section, derivation paths, EncryptedData
- [research/integration-plan.md](../research/integration-plan.md): Phase 3.1
- [credentials.md](credentials.md): CredentialProvider (outbound auth, consumes SecretProtocol::Decrypt)
- SLIP-0010: https://github.com/satoshilabs/slips/blob/master/slip-0010.md
- BIP39: https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
- BIP-0032: https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
- NIST SP 800-38D: AES-GCM test vectors
@@ -1,325 +0,0 @@
---
status: reviewed
last_updated: 2026-06-07
---

# Server

## What

The alknet server accepts SSH connections (via pluggable transport) and handles `channel_open_direct_tcpip` requests by connecting to the requested target, either directly or through an outbound proxy.

## Why

The server is the tunnel endpoint. It receives SSH channels requesting TCP connections to specific hosts and ports, and makes those connections on behalf of the client. It's the same role as an SSH server with `AllowTcpForwarding yes`, but self-contained and transport-agnostic.

## Architecture

### Server Components

```
┌──────────────────────────────────────────────────┐
│                   alknet serve                   │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ SSH Server (russh)                         │  │
│  │ ServerHandler per connection               │  │
│  │ - auth_publickey() → Accept/Reject         │  │
│  │ - channel_open_direct_tcpip() → connect    │  │
│  │ - channel_open_forwarded_tcpip() → proxy   │  │
│  └──────────────────┬─────────────────────────┘  │
│                     │                            │
│  ┌──────────────────▼─────────────────────────┐  │
│  │ Transport Acceptor                         │  │
│  │ (TcpListener / TlsListener / IrohEndpoint) │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ Outbound Proxy (optional)                  │  │
│  │ - Direct TCP                               │  │
│  │ - SOCKS5 proxy                             │  │
│  │ - HTTP CONNECT proxy                       │  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ Rate Limiter                               │  │
│  │ - max-connections-per-ip                   │  │
│  │ - max-auth-attempts                        │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘
```

### Authentication
|
||||
|
||||
The server authenticates connections through the `IdentityProvider` trait (ADR-029, [identity.md](identity.md)). `IdentityProvider` decouples the server from any specific identity storage — the server resolves an identity, it doesn't manage keys.
|
||||
|
||||
**Phase 1 implementation**: `ConfigIdentityProvider` (in alknet-core) reads from `ArcSwap<DynamicConfig.auth>` (ADR-030). Every authorized key gets a default scope set. No database required. This is the default for CLI and single-node deployments.
|
||||
|
||||
**Future implementation**: `StorageIdentityProvider` (in alknet-storage, not yet built) backed by SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. The server doesn't need to know which implementation is active — it goes through the trait.
|
||||
|
||||
The server supports two auth presentation paths (ADR-023, [auth.md](auth.md)):
|
||||
|
||||
**SSH public key auth** (SSH transports):
|
||||
1. `auth_publickey()` callback receives the presented key
|
||||
2. Delegates to `IdentityProvider::resolve_from_fingerprint()` with the key fingerprint
|
||||
3. Returns `Accept` (with `Identity` attached) or `Reject`
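
The three steps above can be sketched as follows. This is a minimal illustration, not the alknet-core implementation: `StaticIdentityProvider`, `AuthOutcome`, and the use of the fingerprint as the identity `id` are assumptions made for the example; only the `resolve_from_fingerprint` name and its `Option<Identity>` result come from the spec.

```rust
use std::collections::HashMap;

/// Resolved caller. Only `id` and `scopes` are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// What the `auth_publickey()` callback reports back (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AuthOutcome {
    Accept(Identity),
    Reject,
}

/// The server depends on this trait only, never on key storage.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical in-memory provider: every authorized key gets the same scopes.
pub struct StaticIdentityProvider {
    authorized: HashMap<String, Identity>,
}

impl StaticIdentityProvider {
    pub fn new(fingerprints: &[&str], default_scopes: &[&str]) -> Self {
        let scopes: Vec<String> = default_scopes.iter().map(|s| s.to_string()).collect();
        let authorized = fingerprints
            .iter()
            .map(|fp| {
                // Assumption: the fingerprint doubles as the identity id.
                let identity = Identity { id: fp.to_string(), scopes: scopes.clone() };
                (fp.to_string(), identity)
            })
            .collect();
        Self { authorized }
    }
}

impl IdentityProvider for StaticIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.get(fingerprint).cloned()
    }
}

/// Steps 1-3: receive the fingerprint, delegate, map to Accept/Reject.
pub fn auth_publickey(provider: &dyn IdentityProvider, fingerprint: &str) -> AuthOutcome {
    match provider.resolve_from_fingerprint(fingerprint) {
        Some(identity) => AuthOutcome::Accept(identity),
        None => AuthOutcome::Reject,
    }
}
```

Because the callback only sees `&dyn IdentityProvider`, swapping the in-memory provider for a storage-backed one would not change the call site.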
|
||||
|
||||
**Ed25519 + OpenSSH certificate authority** (ADR-012):
|
||||
1. If no direct key match, validate the presented certificate against trusted cert-authorities
|
||||
2. Check CA signature, expiry, and principal restrictions
|
||||
3. Certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
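
A rough sketch of the certificate checks listed above, under stated assumptions: `Certificate` is a hypothetical, already-parsed view of an OpenSSH certificate, the CA signature check is reduced to a lookup of the signing CA's fingerprint in a trusted set, and a certificate must name the user explicitly in its principals. Real signature verification and certificate options are out of scope here.

```rust
/// Hypothetical, already-parsed view of an OpenSSH user certificate.
pub struct Certificate {
    pub signing_ca: String,      // fingerprint of the CA that signed it
    pub valid_after: u64,        // Unix seconds, inclusive
    pub valid_before: u64,       // Unix seconds, exclusive
    pub principals: Vec<String>, // users the certificate is valid for
}

#[derive(Debug, PartialEq)]
pub enum CertError {
    UntrustedCa,
    NotYetValid,
    Expired,
    PrincipalNotAllowed,
}

/// Checks CA trust, validity window, and principal restriction, in that order.
pub fn validate_certificate(
    cert: &Certificate,
    trusted_cas: &[&str],
    user: &str,
    now: u64,
) -> Result<(), CertError> {
    // Stand-in for CA signature verification (assumption of this sketch).
    if !trusted_cas.iter().any(|ca| *ca == cert.signing_ca.as_str()) {
        return Err(CertError::UntrustedCa);
    }
    if now < cert.valid_after {
        return Err(CertError::NotYetValid);
    }
    if now >= cert.valid_before {
        return Err(CertError::Expired);
    }
    if !cert.principals.iter().any(|p| p.as_str() == user) {
        return Err(CertError::PrincipalNotAllowed);
    }
    Ok(())
}
```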
|
||||
|
||||
**Token auth** (non-SSH transports, WebTransport):
|
||||
1. Extract token from URL path or `Authorization` header
|
||||
2. Delegate to `IdentityProvider::resolve_from_token()`
|
||||
3. Verification matches the SSH path: the same authorized-key set is consulted and the same `Identity` is returned (ADR-023)
|
||||
|
||||
**No password authentication over SSH channels.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
|
||||
|
||||
### Key Material Format
|
||||
|
||||
Key inputs (`--key`, `--authorized-keys`, `--cert-authority`) accept either file paths or in-memory data (via library API or NAPI wrapper). The accepted format is **OpenSSH key format** throughout — private keys in OpenSSH format (`-----BEGIN OPENSSH PRIVATE KEY-----`), public keys in OpenSSH format (`ssh-ed25519 AAAA... user@host`), and authorized keys files in standard OpenSSH `authorized_keys` format. PEM-encoded keys (PKCS#1, PKCS#8) are not supported.
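
As an illustration of the format rule, the sketch below classifies key text by its leading marker. `KeyFormat` and `classify_key` are hypothetical names for this example, and it only recognises `ssh-ed25519` public keys; how the real loader detects and reports unsupported input is not specified in this document.

```rust
#[derive(Debug, PartialEq)]
pub enum KeyFormat {
    OpenSshPrivate,
    OpenSshPublic,
    UnsupportedPem,
    Unknown,
}

/// Classifies key material by its leading marker (file contents or in-memory data).
pub fn classify_key(text: &str) -> KeyFormat {
    let trimmed = text.trim_start();
    if trimmed.starts_with("-----BEGIN OPENSSH PRIVATE KEY-----") {
        KeyFormat::OpenSshPrivate
    } else if trimmed.starts_with("-----BEGIN ") {
        // Any other armored block (PKCS#1, PKCS#8, ...) is rejected.
        KeyFormat::UnsupportedPem
    } else if trimmed.starts_with("ssh-ed25519 ") {
        KeyFormat::OpenSshPublic
    } else {
        KeyFormat::Unknown
    }
}
```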
|
||||
|
||||
### TLS Certificate Provisioning
|
||||
|
||||
The server supports three TLS certificate modes (ADR-008):
|
||||
|
||||
1. **Manual certs** (`--tls-cert` / `--tls-key`): User provides certificate and key files. For users with existing PKI.
|
||||
2. **Domain-based ACME** (`--acme-domain <domain>`): Auto-provisions certificates from Let's Encrypt using HTTP-01 or TLS-ALPN-01 challenges. Certificate is domain-bound and auto-renews. HTTP-01 requires port 80 to be reachable; TLS-ALPN-01 requires port 443.
|
||||
3. **IP-based ACME**: Short-lived certificates via TLS-ALPN-01 challenge on port 443. No domain name needed, but certificates expire frequently. The ACME client runs continuously.
|
||||
|
||||
ACME support is feature-gated behind the `acme` feature flag to keep the base binary lean. Implementation uses `rustls-acme` or a similar pure-Rust ACME client to avoid an external `certbot` dependency.
|
||||
|
||||
### Channel Handling
|
||||
|
||||
When a client opens a `direct-tcpip` channel, the handler's `channel_open_direct_tcpip(host, port, originator_addr, originator_port)` callback processes it as follows:
|
||||
|
||||
**Reserved destination** — If `host` starts with `alknet-` (e.g., `alknet-control`), the server routes the channel internally instead of connecting to a TCP target. The primary reserved destination is `alknet-control:0`, which bridges the channel to the local pubsub event bus (ADR-018).
|
||||
|
||||
**Forwarding policy check** — Before the proxy task is spawned for any non-reserved destination, the server evaluates `ForwardingPolicy` against the authenticated `Identity` (ADR-031, [configuration.md](configuration.md)). The policy check uses `Identity.id` and `Identity.scopes` from the identity resolved during auth. If the policy denies the destination, the channel open is rejected — no TCP connection is attempted. The default policy (`ForwardingPolicy::allow_all()`) preserves current behavior.
|
||||
|
||||
**Regular destination** — For targets that pass the forwarding policy check:
|
||||
|
||||
1. **Outbound connection**: connect to `host:port`, either directly or via the configured outbound proxy
2. **Bidirectional proxy**: `tokio::io::copy_bidirectional` between the SSH channel stream and the outbound TCP stream
3. **Cleanup**: close the channel and TCP stream when either side disconnects
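
The routing decision described in this section (reserved destination, then policy check, then forward) can be sketched as a pure function. This is an illustration under assumptions: `ChannelRoute`, `allow_hosts`, and `permits` are hypothetical, and the policy here only filters on host name; only `ForwardingPolicy::allow_all()` and the `alknet-` prefix rule are taken from the text.

```rust
/// Decision taken when a `direct-tcpip` channel is opened (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ChannelRoute {
    /// Reserved `alknet-*` destination, handled inside the server.
    Internal(String),
    /// Policy denied the target; no TCP connection is attempted.
    Denied,
    /// Target may be connected to and proxied.
    Forward { host: String, port: u16 },
}

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub struct ForwardingPolicy {
    allowed_hosts: Option<Vec<String>>, // None = allow everything
}

impl ForwardingPolicy {
    pub fn allow_all() -> Self {
        Self { allowed_hosts: None }
    }

    /// Hypothetical constructor: allow only the listed hosts.
    pub fn allow_hosts(hosts: &[&str]) -> Self {
        Self { allowed_hosts: Some(hosts.iter().map(|h| h.to_string()).collect()) }
    }

    /// A real policy would also consult `identity.id` and `identity.scopes`.
    pub fn permits(&self, _identity: &Identity, host: &str) -> bool {
        match &self.allowed_hosts {
            None => true,
            Some(hosts) => hosts.iter().any(|h| h.as_str() == host),
        }
    }
}

pub fn route_channel(
    policy: &ForwardingPolicy,
    identity: &Identity,
    host: &str,
    port: u16,
) -> ChannelRoute {
    // Reserved destinations are intercepted before any policy or TCP work.
    if host.starts_with("alknet-") {
        return ChannelRoute::Internal(host.to_string());
    }
    if !policy.permits(identity, host) {
        return ChannelRoute::Denied;
    }
    ChannelRoute::Forward { host: host.to_string(), port }
}
```

Keeping the decision separate from the I/O means the policy can be tested without opening sockets; the proxy task would only be spawned for the `Forward` case.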
|
||||
|
||||
### Outbound Proxy Modes
|
||||
|
||||
| Mode | CLI Flag | Behavior |
|
||||
|------|----------|----------|
|
||||
| **Direct** | (default) | `TcpStream::connect(target)` |
|
||||
| **SOCKS5** | `--proxy socks5://addr:port` | Connect through SOCKS5 proxy |
|
||||
| **HTTP CONNECT** | `--proxy http://addr:port` | Connect through HTTP CONNECT proxy |
|
||||
|
||||
The proxy setting applies globally to all outbound connections from the server.
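
A small sketch of how the `--proxy` value could map onto the three modes in the table. `OutboundProxy` and `parse_proxy` are hypothetical names, and the parsing is deliberately naive (scheme prefix plus a non-empty address); the actual CLI parsing may differ.

```rust
#[derive(Debug, PartialEq)]
pub enum OutboundProxy {
    Direct,
    Socks5(String),      // proxy address as "addr:port"
    HttpConnect(String), // proxy address as "addr:port"
}

fn non_empty(addr: &str) -> Result<String, String> {
    if addr.is_empty() {
        Err("proxy address is empty".to_string())
    } else {
        Ok(addr.to_string())
    }
}

/// Maps the optional `--proxy` value to an outbound mode.
pub fn parse_proxy(flag: Option<&str>) -> Result<OutboundProxy, String> {
    let Some(url) = flag else {
        // No flag given: default to direct connections.
        return Ok(OutboundProxy::Direct);
    };
    if let Some(addr) = url.strip_prefix("socks5://") {
        non_empty(addr).map(OutboundProxy::Socks5)
    } else if let Some(addr) = url.strip_prefix("http://") {
        non_empty(addr).map(OutboundProxy::HttpConnect)
    } else {
        Err(format!("unsupported proxy scheme in {url:?}"))
    }
}
```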
|
||||
|
||||
### Stealth Mode
|
||||
|
||||
When `--stealth` is enabled on the server alongside TLS transport:
|
||||
|
||||
1. After the TLS handshake, the server detects whether the connecting client is speaking SSH or HTTP
2. If SSH: proceed with `server::run_stream()`
3. If HTTP (normal web browsers, scanners): respond with a fake nginx 404 (`HTTP/1.1 404 Not Found` + `Server: nginx` headers), then close
|
||||
|
||||
This makes the server appear as an ordinary web server to port scanners and DPI systems.
|
||||
|
||||
**Stealth mode requires TLS transport (`--transport tls`).** It has no effect with TCP or iroh transports — in those cases, there is no TLS handshake to peek behind, and protocol multiplexing is impossible. The CLI should reject or warn if `--stealth` is used without `--transport tls`.
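
One way the SSH-versus-HTTP detection could work is by looking at the first decrypted bytes: SSH clients open with an identification string starting with `SSH-`, while HTTP clients open with a request method. The sketch below assumes that approach; `Sniffed`, `sniff`, and `fake_nginx_404` are hypothetical names, and the HTML body and extra headers are placeholders rather than the server's actual response.

```rust
#[derive(Debug, PartialEq)]
pub enum Sniffed {
    Ssh,
    Http,
    Unknown,
}

/// Classifies a connection from the first bytes read after the TLS handshake.
pub fn sniff(first_bytes: &[u8]) -> Sniffed {
    const HTTP_METHODS: [&[u8]; 7] = [
        b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ", b"CONNECT ",
    ];
    if first_bytes.starts_with(b"SSH-") {
        Sniffed::Ssh
    } else if HTTP_METHODS.iter().any(|m| first_bytes.starts_with(m)) {
        Sniffed::Http
    } else {
        Sniffed::Unknown
    }
}

/// Builds the decoy response sent to HTTP clients before closing.
pub fn fake_nginx_404() -> String {
    // Placeholder body; a real decoy would mirror nginx's default page.
    let body = "<html><head><title>404 Not Found</title></head><body>404 Not Found</body></html>";
    format!(
        "HTTP/1.1 404 Not Found\r\nServer: nginx\r\nContent-Type: text/html\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}
```

How `Unknown` traffic is treated (decoy response or silent close) is a policy choice this document does not settle.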
|
||||
|
||||
### Server Handler Behavior
|
||||
|
||||
The server handler implements `russh::server::Handler` with two primary responsibilities:
|
||||
|
||||
**Authentication (`auth_publickey`)**:
|
||||
- Delegate to `IdentityProvider::resolve_from_fingerprint()` with the presented key fingerprint
|
||||
- If identity resolved, return `Accept` with the `Identity` attached to the session
|
||||
- If no identity, check certificate authority: validate CA signature, expiry, principals
|
||||
- Return `Accept` or `Reject`
|
||||
|
||||
**Channel handling (`channel_open_direct_tcpip`)**:
|
||||
- If the destination host starts with `alknet-`, route internally (control channel, ADR-018)
|
||||
- Otherwise, evaluate `ForwardingPolicy` against the session's `Identity` (ADR-031)
|
||||
- If denied, reject the channel open
|
||||
- If allowed, connect to `host:port` (directly or via the configured outbound proxy)
|
||||
- Spawn a bidirectional proxy task between the SSH channel and the outbound TCP stream
|
||||
- Return the channel for data flow
|
||||
|
||||
### Interface Abstraction
|
||||
|
||||
SSH is one interface at Layer 2 in the three-layer model (ADR-026, [interface.md](interface.md)). The current `ServerHandler` will be refactored into `SshInterface` — it manages SSH session concerns (handshake, auth delegation, channel multiplexing). Forwarding policy, operation routing, and call protocol handling are Layer 3 concerns that live outside the interface. This refactoring is the most invasive code change in Phase 1 (integration-plan, Phase 1.8).
|
||||
|
||||
### Logging and Rate Limiting
|
||||
|
||||
**Logging** (for fail2ban integration on Linux):
|
||||
|
||||
- `INFO` level: auth attempts (remote_addr, user, key_fingerprint, accept/reject)
|
||||
- `INFO` level: connection opened (remote_addr, transport kind)
|
||||
- `INFO` level: connection closed (remote_addr, duration)
|
||||
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
|
||||
|
||||
This matches our production fail2ban setup, which filters on source IP + failure indicators. Example log lines:
|
||||
```
|
||||
INFO auth attempt remote_addr=203.0.113.50 user=root key_fingerprint=SHA256:abc... result=reject
|
||||
INFO connection opened remote_addr=203.0.113.50 transport=tls
|
||||
```
|
||||
|
||||
**Built-in rate limiting** (platform-independent):
|
||||
|
||||
| Flag | Default | Purpose |
|
||||
|------|---------|---------|
|
||||
| `--max-connections-per-ip` | 0 (unlimited) | Reject new connections from IPs with N active connections |
|
||||
| `--max-auth-attempts` | 10 | Disconnect after N failed auth attempts per connection |
|
||||
|
||||
These provide abuse protection on platforms without fail2ban (macOS, Windows, BSD) and complement fail2ban on Linux.
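
The two limits in the table can be sketched with plain counters. This is an illustrative model, not the server's implementation: `ConnectionLimiter` and `AuthAttempts` are hypothetical types, and a real server would keep the per-IP map behind shared state such as an `Arc<Mutex<..>>`.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks active connections per source IP. `max_per_ip == 0` means unlimited.
pub struct ConnectionLimiter {
    max_per_ip: usize,
    active: HashMap<IpAddr, usize>,
}

impl ConnectionLimiter {
    pub fn new(max_per_ip: usize) -> Self {
        Self { max_per_ip, active: HashMap::new() }
    }

    /// Returns false when the IP already holds `max_per_ip` connections.
    pub fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.active.entry(ip).or_insert(0);
        if self.max_per_ip != 0 && *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Call when a connection from `ip` closes.
    pub fn release(&mut self, ip: IpAddr) {
        let emptied = match self.active.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count == 0
            }
            None => false,
        };
        if emptied {
            self.active.remove(&ip);
        }
    }
}

/// Per-connection failed-auth counter.
pub struct AuthAttempts {
    max: u32,
    failed: u32,
}

impl AuthAttempts {
    pub fn new(max: u32) -> Self {
        Self { max, failed: 0 }
    }

    /// Records a failure; returns true when the connection should be dropped.
    pub fn record_failure(&mut self) -> bool {
        self.failed += 1;
        self.failed >= self.max
    }
}
```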
|
||||
|
||||
### CLI Interface
|
||||
|
||||
Configuration sources, highest priority first: CLI flags, environment variables, then the optional `--config` TOML file (ADR-030). The TOML config file is a convenience input for reproducible deployments; it does not replace `ServeOptions` (ADR-011).
|
||||
|
||||
Multi-transport listeners use `[[listeners]]` in the TOML config (ADR-030):
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
|
||||
[listeners.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
[[listeners]]
|
||||
transport = "iroh"
|
||||
```
|
||||
|
||||
Currently, the server binds to a single transport at a time; the multi-transport `[[listeners]]` form above is planned under ADR-030.
|
||||
|
||||
```bash
|
||||
# Basic server (SSH on port 22)
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key
|
||||
|
||||
# With TLS (manual certs)
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--transport tls \
|
||||
--tls-cert /etc/ssl/cert.pem \
|
||||
--tls-key /etc/ssl/key.pem
|
||||
|
||||
# With TLS (auto ACME, domain-based)
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--transport tls \
|
||||
--acme-domain example.com
|
||||
|
||||
# With TLS + stealth (fake nginx 404 to scanners)
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--transport tls \
|
||||
--acme-domain example.com \
|
||||
--stealth
|
||||
|
||||
# With iroh transport (no public IP needed)
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--transport iroh
|
||||
|
||||
# With outbound proxy
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--proxy socks5://127.0.0.1:9050
|
||||
|
||||
# With certificate authority authentication
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--cert-authority /etc/alknet/ca.pub
|
||||
|
||||
# With rate limiting
|
||||
alknet serve --key ~/.ssh/ssh_host_ed25519_key \
|
||||
--max-connections-per-ip 5 \
|
||||
--max-auth-attempts 3
|
||||
|
||||
# All options
|
||||
alknet serve \
|
||||
--key <path-or-buffer> \ # SSH host key (required)
|
||||
--authorized-keys <path> \ # Authorized keys file
|
||||
--cert-authority <path> \ # CA public key for cert-auth
|
||||
--transport tcp|tls|iroh \ # Transport mode
|
||||
--listen <addr:port> \ # Listen address for TCP/TLS (default: 0.0.0.0:22)
|
||||
--tls-cert <path> \ # TLS certificate (manual)
|
||||
--tls-key <path> \ # TLS private key (manual)
|
||||
--acme-domain <domain> \ # ACME auto-cert domain
|
||||
--stealth \ # Serve fake nginx 404 to non-SSH connections
|
||||
--proxy <url> \ # Outbound proxy URL (socks5:// or http://)
|
||||
--iroh-relay <url> \ # iroh relay server URL (default: n0 relay)
|
||||
--max-connections-per-ip <n> \ # Max concurrent connections per IP (default: unlimited)
|
||||
--max-auth-attempts <n> # Max auth failures before disconnect (default: 10)
|
||||
```
|
||||
|
||||
### iroh Server Mode
|
||||
|
||||
When running with `--transport iroh`, the server:
|
||||
|
||||
1. Creates an iroh endpoint with ALPN value `b"alknet-ssh"`
|
||||
2. Prints its endpoint ID (base58-encoded Ed25519 public key) — this is what clients use as the `--peer` value
|
||||
3. Accepts incoming connections on the endpoint
|
||||
4. For each connection, accepts a bidirectional stream and passes it to `server::run_stream()`
|
||||
|
||||
No listening port is needed. The server connects outbound to the iroh relay (default: n0, override with `--iroh-relay`) and awaits connections from clients who know its endpoint ID.
|
||||
|
||||
## Constraints
|
||||
|
||||
- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
|
||||
- Destination strings beginning with `alknet-` are reserved for internal use (ADR-018). The server must not attempt TCP connections to `alknet-*` destinations — these are intercepted for control channel routing.
|
||||
- One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
|
||||
- The server currently binds to a single transport at a time. Multi-transport via `[[listeners]]` is coming per ADR-030.
|
||||
- Forwarding policy is evaluated before every channel proxy spawn. Denied channels are rejected immediately (ADR-031).
|
||||
- Auth resolves through `IdentityProvider` (ADR-029). Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>` (ADR-030). `StorageIdentityProvider` (Phase 2+) replaces it for production deployments with SQLite.
|
||||
- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
|
||||
- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).
|
||||
- Stealth mode (`--stealth`) requires TLS transport. It has no effect on TCP or iroh transports (ADR-017).
|
||||
|
||||
## Graceful Shutdown
|
||||
|
||||
On SIGTERM or SIGINT:
|
||||
|
||||
1. Stop accepting new connections on the transport listener
|
||||
2. Send SSH disconnect messages to all active sessions
|
||||
3. Wait for in-flight channel data to drain (brief timeout, ~2 seconds per session)
|
||||
4. Close all transport listeners
|
||||
5. Exit
|
||||
|
||||
The server does not wait indefinitely for idle connections to close. After the drain timeout, remaining connections are forcibly terminated, so a slow or stuck client cannot block shutdown.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Error handling follows the project's layered pattern (see [overview.md](overview.md)):
|
||||
|
||||
- **Transport errors**: Cause connection rejection. The listener remains active — a failed TLS handshake or iroh connection attempt does not affect other incoming connections.
|
||||
- **Auth errors**: Result in connection rejection with a logged auth failure event (for fail2ban, ADR-013). Repeated failures on one connection trigger a disconnect once `--max-auth-attempts` failures are reached.
|
||||
- **Channel-level errors**: Individual channel failures (target unreachable, proxy failure) close that channel without affecting the SSH session or other channels. The client receives a channel open failure message.
|
||||
- **CLI errors**: Reported to stderr with a non-zero exit code. Fatal errors (invalid flags, key file not found, bind failure) exit immediately.
|
||||
|
||||
## Open Questions
|
||||
|
||||
None — all resolved.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait, SSH consumes stream |
|
||||
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches network directly |
|
||||
| [006](decisions/006-no-logging-of-tunnel-destinations.md) | No logging of destinations | Server logs auth and connections, not destinations |
|
||||
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
|
||||
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority auth | No password auth; support OpenSSH cert-authority |
|
||||
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
|
||||
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
|
||||
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
|
||||
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
|
||||
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2 interface, ServerHandler → SshInterface |
|
||||
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract; irpc service is one backend |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | IdentityProvider trait in alknet-core |
|
||||
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | ArcSwap for dynamic config, ConfigReloadHandle |
|
||||
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Evaluated before channel proxy spawn |
|
||||
|
||||
## References
|
||||
|
||||
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, ConfigReloadHandle
|
||||
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
|
||||
- [auth.md](auth.md) — Unified auth, AuthPolicy, token auth
|
||||
- [interface.md](interface.md) — Interface trait, SshInterface, three-layer model
|
||||
@@ -1,233 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Services
|
||||
|
||||
> **Phase note**: This spec defines the contracts for the service layer — the
|
||||
> protocol enums, OperationEnv, and deployment topologies. Phase 1 ships
|
||||
> `ConfigIdentityProvider` (ArcSwap-based) and `ConfigServiceImpl` (ArcSwap-based)
|
||||
> as the only auth and config implementations. The irpc service protocols
|
||||
> (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment
|
||||
> topology (multi-node with `StorageIdentityProvider`) are contracted here but
|
||||
> will be implemented in Phase 2+. Application services (DockerService,
|
||||
> NodeService, agent services) are downstream concerns that build on top of
|
||||
> the call protocol and OperationEnv — they are not core requirements.
|
||||
|
||||
## What
|
||||
|
||||
The irpc service layer decomposes alknet's core responsibilities into
|
||||
independently testable, deployable, and replaceable components. Auth, Secret,
|
||||
Config, and Storage are irpc protocol enums that work both as in-process async
|
||||
boundaries (tokio channels) and cross-process/cross-network (irpc over iroh
|
||||
QUIC streams). OperationEnv is the universal composition mechanism that unifies local
|
||||
dispatch, irpc service dispatch, and remote call protocol dispatch.
|
||||
|
||||
## Why
|
||||
|
||||
Without the service layer, auth verification, key derivation, and config reload
|
||||
are scattered across the codebase with no async boundary. For head nodes serving
|
||||
many users, in-memory key lookup doesn't scale — auth needs to query a database
|
||||
on demand. For secret management, the seed must be isolated in its own process
|
||||
boundary.
|
||||
|
||||
Without OperationEnv, handlers calling other operations would need to know
|
||||
whether the target is local, in-cluster, or on a remote node. OperationEnv
|
||||
abstracts this away: `context.env.invoke("secrets", "derive", input)` works
|
||||
regardless of dispatch path.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Service Definition Pattern
|
||||
|
||||
Services are defined as irpc protocol enums:
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = AuthMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum AuthProtocol {
|
||||
#[rpc(tx=oneshot::Sender<AuthResult>)]
|
||||
#[wrap(VerifyPubkey)]
|
||||
VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
The `#[rpc_requests]` macro generates two versions:
|
||||
- **Serializable** (`Request`): for remote communication (postcard encoding)
|
||||
- **With channels** (`RequestWithChannels`): for local communication (tokio channels)
|
||||
|
||||
Both use the same `Client<S>` type. The local/remote distinction is transparent
|
||||
at the call site.
|
||||
|
||||
### Core Services
|
||||
|
||||
| Service | Protocol | Purpose | Locality |
|
||||
|---------|----------|---------|---------------|
|
||||
| **Auth** | `AuthProtocol` | Verify identities, check credentials | Can be remote |
|
||||
| **Secret** | `SecretProtocol` | Derive keys, encrypt/decrypt | Local or remote |
|
||||
| **Config** | `ConfigProtocol` | Dynamic config reload | Local |
|
||||
| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations | Local or remote |
|
||||
|
||||
### OperationContext
|
||||
|
||||
Every handler receives an `OperationContext`:
|
||||
|
||||
```rust
|
||||
pub struct OperationContext {
|
||||
pub request_id: String,
|
||||
pub parent_request_id: Option<String>,
|
||||
pub identity: Option<Identity>,
|
||||
pub metadata: HashMap<String, Value>,
|
||||
pub env: OperationEnv,
|
||||
pub trusted: bool, // set by buildEnv(), not by callers
|
||||
}
|
||||
```
|
||||
|
||||
- **`identity`**: The authenticated identity making the call. Populated by
|
||||
`IdentityProvider` from the interface layer.
|
||||
- **`env`**: The operation environment — namespaced access to other operations.
|
||||
- **`trusted`**: When a handler calls another operation through `env`, the
|
||||
nested call is `trusted` (skips ACL checks).
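
A sketch of how a nested call's context might be derived from its parent. This is an assumption-laden illustration: `from_interface`, `nested`, and `requires_acl_check` are hypothetical helpers, the identity is reduced to a string, and the `metadata` and `env` fields are omitted. Propagating the caller's identity into the nested context is also an assumption, not something this spec states.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<String>, // stand-in for `Option<Identity>`
    pub trusted: bool,
}

impl OperationContext {
    /// Context for a call arriving from the interface layer: never trusted.
    pub fn from_interface(request_id: &str, identity: Option<&str>) -> Self {
        Self {
            request_id: request_id.to_string(),
            parent_request_id: None,
            identity: identity.map(|s| s.to_string()),
            trusted: false,
        }
    }

    /// Context for a call made through `env` by a running handler.
    pub fn nested(&self, request_id: &str) -> Self {
        Self {
            request_id: request_id.to_string(),
            parent_request_id: Some(self.request_id.clone()),
            identity: self.identity.clone(), // assumption: identity propagates
            trusted: true,                   // set here, not by the caller
        }
    }

    /// Trusted (nested) calls skip ACL checks.
    pub fn requires_acl_check(&self) -> bool {
        !self.trusted
    }
}
```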
|
||||
|
||||
### OperationEnv — Universal Composition Mechanism
|
||||
|
||||
OperationEnv provides namespace + operation name → invoke with input, return
|
||||
output. The handler doesn't know or care whether the dispatch is local, irpc,
|
||||
or remote.
|
||||
|
||||
Three dispatch paths:
|
||||
|
||||
| Path | Mechanism | Serialization | Scope |
|
||||
|------|-----------|---------------|-------|
|
||||
| **Local** | Direct function call through registry | None (in-process) | Same process |
|
||||
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
|
||||
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
|
||||
|
||||
All three produce the same `ResponseEnvelope`.
|
||||
|
||||
Service assembly determines which path each operation uses:
|
||||
|
||||
```rust
|
||||
// Minimal deployment (single node, all local)
|
||||
let env = OperationEnv::local(local_registry);
|
||||
|
||||
// Production deployment (mix of local and remote)
|
||||
let env = OperationEnv::new()
|
||||
.local("auth", auth_registry)
|
||||
.local("config", config_registry)
|
||||
.service("secrets", secret_irpc_client)
|
||||
.remote("worker-1", call_protocol_conn);
|
||||
```
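
To make the "handler does not know the dispatch path" idea concrete, here is a self-contained sketch in which all three paths are modelled as boxed closures over string input and output. It is not the real `OperationEnv`: the `Dispatch` enum, the `path_of` helper, and the string-based handler signature are assumptions, and real service and remote paths would go through irpc clients and call-protocol connections.

```rust
use std::collections::HashMap;

/// Handler signature for this sketch: (operation name, input) -> output.
type Handler = Box<dyn Fn(&str, &str) -> Result<String, String>>;

/// The three dispatch paths; each wraps a closure here for simplicity.
pub enum Dispatch {
    Local(Handler),
    Service(Handler),
    Remote(Handler),
}

#[derive(Default)]
pub struct OperationEnv {
    routes: HashMap<String, Dispatch>,
}

impl OperationEnv {
    pub fn new() -> Self {
        Self::default()
    }

    fn route(mut self, namespace: &str, dispatch: Dispatch) -> Self {
        self.routes.insert(namespace.to_string(), dispatch);
        self
    }

    pub fn local(self, ns: &str, h: impl Fn(&str, &str) -> Result<String, String> + 'static) -> Self {
        self.route(ns, Dispatch::Local(Box::new(h)))
    }

    pub fn service(self, ns: &str, h: impl Fn(&str, &str) -> Result<String, String> + 'static) -> Self {
        self.route(ns, Dispatch::Service(Box::new(h)))
    }

    pub fn remote(self, ns: &str, h: impl Fn(&str, &str) -> Result<String, String> + 'static) -> Self {
        self.route(ns, Dispatch::Remote(Box::new(h)))
    }

    /// Which path a namespace is wired to (for diagnostics only).
    pub fn path_of(&self, namespace: &str) -> Option<&'static str> {
        self.routes.get(namespace).map(|d| match d {
            Dispatch::Local(_) => "local",
            Dispatch::Service(_) => "service",
            Dispatch::Remote(_) => "remote",
        })
    }

    /// Same call shape regardless of where the operation actually runs.
    pub fn invoke(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        match self.routes.get(namespace) {
            Some(Dispatch::Local(h)) | Some(Dispatch::Service(h)) | Some(Dispatch::Remote(h)) => {
                h(op, input)
            }
            None => Err(format!("unknown namespace: {namespace}")),
        }
    }
}
```

The call site `env.invoke("secrets", "derive", input)` stays identical if `secrets` is later rewired from `service` to `local`; only the assembly code changes.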
|
||||
|
||||
### Service vs Call Protocol vs External Service
|
||||
|
||||
These are different concepts that compose through OperationEnv:
|
||||
|
||||
- **irpc service**: In-cluster, Rust-to-Rust, type-safe, postcard serialization.
|
||||
Dispatched by enum variant. Example: `AuthProtocol::VerifyPubkey`.
|
||||
- **Call protocol operation**: Cross-node, cross-language, path-based, JSON
|
||||
`EventEnvelope`. Dispatched by namespace + name. Example:
|
||||
`/head/auth/verify`.
|
||||
- **External service**: Any endpoint reachable via the call protocol.
|
||||
Example: a vast.ai instance, an HTTP API, another head node.
|
||||
|
||||
An irpc service can back a call protocol operation. The OperationEnv routes to
|
||||
the appropriate dispatch path:
|
||||
|
||||
```
|
||||
Call Protocol (Layer 3, external, JSON)
|
||||
└── irpc Service (Layer 3, internal, postcard)
|
||||
└── Honker Streams (Domain events, within service boundary)
|
||||
```
|
||||
|
||||
### Adapters
|
||||
|
||||
HTTP, MCP, DNS, and WebSocket adapters all resolve through OperationEnv:
|
||||
|
||||
- HTTP: `POST /v1/{namespace}/{op}` → `context.env.invoke(namespace, op, input)`
|
||||
- MCP: `tools/call` with tool name → `context.env.invoke(namespace, op, input)`
|
||||
- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env.invoke(namespace, op, input)`
|
||||
- Call protocol: `call.requested` with `operationId` → `context.env.invoke(namespace, op, input)`
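
As a sketch of the adapter mappings above, the two functions below extract `(namespace, op)` from an HTTP path and from a DNS query name. The function names and the strictness of the parsing (exactly two segments, no nesting) are assumptions for illustration; the `/v1/` prefix and the `alk.dev` suffix come from the patterns listed.

```rust
/// `/v1/{namespace}/{op}` -> (namespace, op)
pub fn from_http_path(path: &str) -> Option<(String, String)> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace.to_string(), op.to_string()))
}

/// `{op}.{namespace}.alk.dev` -> (namespace, op)
pub fn from_dns_name(name: &str) -> Option<(String, String)> {
    // Tolerate a fully qualified name with a trailing dot.
    let rest = name.trim_end_matches('.').strip_suffix(".alk.dev")?;
    let (op, namespace) = rest.split_once('.')?;
    if op.is_empty() || namespace.is_empty() || namespace.contains('.') {
        return None;
    }
    Some((namespace.to_string(), op.to_string()))
}
```

Both adapters produce the same pair, which is then passed to `context.env.invoke(namespace, op, input)`.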
|
||||
|
||||
### Deployment Topologies
|
||||
|
||||
**Current (Phase 1, single node, CLI)**: This is what exists and ships today.
|
||||
Auth uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>`. Config
|
||||
uses `ConfigServiceImpl` backed by `ArcSwap<DynamicConfig>`. There is no
|
||||
database dependency.
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────┐
|
||||
│ Single Process │
|
||||
│ ConfigIdentityProvider (ArcSwap) │
|
||||
│ ConfigServiceImpl (ArcSwap) │
|
||||
│ alknet-core Server │
|
||||
└──────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
The irpc service layer (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`,
|
||||
`StorageProtocol`) and the application services (DockerService, NodeService,
|
||||
WalletService, agent services) are downstream concerns that will be built in
|
||||
later phases. The architecture defines the contracts (`IdentityProvider` trait,
|
||||
`OperationEnv`, service protocol enums) so that implementations can plug in
|
||||
without modifying core, but the implementations don't exist yet.
|
||||
|
||||
**Future (multi-node, production)**: Auth and secrets on dedicated nodes;
|
||||
workers access them remotely via irpc over QUIC. StorageIdentityProvider
|
||||
backed by SQLite replaces ConfigIdentityProvider for auth.
|
||||
|
||||
```
|
||||
Auth Node (SQLite) Secret Node (seed in RAM)
|
||||
↑ ↑
|
||||
│ QUIC (irpc) │ QUIC (irpc)
|
||||
│ │
|
||||
Head Node (Config, Storage, alknet-core Server)
|
||||
│
|
||||
│ SSH / iroh / TLS
|
||||
│
|
||||
Worker Node (alknet-core Client)
|
||||
```
|
||||
|
||||
This topology requires alknet-storage, alknet-secret, and the irpc service
|
||||
layer to be built — they are Phase 2+ concerns.
|
||||
|
||||
## Constraints
|
||||
|
||||
- Services are **internal** — they run within a node or cluster.
|
||||
- The call protocol is **external** — it's how nodes talk to each other.
|
||||
- Per ADR-032, domain events (Honker streams) stay within the owning service.
|
||||
  irpc calls are synchronous request-response within a node or cluster. Call protocol
|
||||
`EventEnvelope` is the integration boundary between nodes.
|
||||
- OperationEnv is a hard constraint: the handler-facing API must match the
|
||||
behavioral contract from `@alkdev/operations`. Namespace + operation name →
|
||||
invoke with input, return output.
|
||||
- irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling
|
||||
don't need the service layer overhead.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
|
||||
per tenant)? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
|
||||
remote calls? See [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Service crates are independent of core |
|
||||
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag |
|
||||
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
|
||||
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition mechanism with three dispatch paths |
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../research/services.md) — Service protocol definitions, OperationContext, deployment topologies
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — OperationEnv, three dispatch paths, adapter patterns
|
||||
- [secret-service.md](secret-service.md) — SecretProtocol definition
|
||||
- [identity.md](identity.md) — IdentityProvider, AuthProtocol
|
||||
- [configuration.md](configuration.md) — ConfigProtocol, DynamicConfig reload
|
||||
- [interface.md](interface.md) — Interface layer, auth across interfaces
|
||||
@@ -1,221 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-07
|
||||
---
|
||||
|
||||
# Storage
|
||||
|
||||
> **Phase note**: `alknet-storage` is a future crate (Phase 2+). This spec
|
||||
> defines its contract — the data model, the `IdentityProvider` impl, the
|
||||
> irpc service protocol — so that alknet-core can define the traits
|
||||
> (`IdentityProvider`) that storage will later implement. The crate itself
|
||||
> hasn't been built yet. Phase 1 uses `ConfigIdentityProvider` backed by
|
||||
> `ArcSwap<DynamicConfig>`.
|
||||
|
||||
## What
|
||||
|
||||
The `alknet-storage` crate will provide SQLite-backed graph storage, identity
|
||||
management, access control, and reactivity via honker. It mirrors the
|
||||
TypeScript `@alkdev/storage` package's design while leveraging Rust's type
|
||||
system and honker's built-in pub/sub.
|
||||
|
||||
## Why
|
||||
|
||||
alknet-core needs persistent identity data (authorized keys, accounts, ACLs)
|
||||
and a way to store and query graph-structured data (call graphs, operation
|
||||
graphs, metagraph). But alknet-core cannot take a database dependency. The
|
||||
solution: alknet-storage implements alknet-core's `IdentityProvider` trait,
|
||||
providing SQLite-backed identity resolution without core knowing about SQLite.
|
||||
|
||||
The metagraph (a three-level type system of GraphType, NodeType, and EdgeType,
instantiated as Graph, Node, and Edge) is the foundation for ACL, flowgraph
persistence, and any future graph-structured data.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Crate Structure
|
||||
|
||||
```
|
||||
alknet-storage/
|
||||
├── metagraph/ — GraphType, NodeType, EdgeType persistence
|
||||
├── identity/ — accounts, organizations, peer_credentials, api_keys, audit_logs
|
||||
├── acl/ — PrincipalNode, DelegatesEdge, access control graph
|
||||
├── secrets/ — Encrypted node type, encrypt/decrypt bridge
|
||||
├── honker/ — honker integration: notify, stream, queue
|
||||
├── graph/ — GraphInstance, Node, Edge CRUD with schema validation
|
||||
└── schema/ — JSON Schema definitions (serde + jsonschema)
|
||||
```
|
||||
|
||||
### Metagraph Data Model
|
||||
|
||||
Three-level type system:
|
||||
|
||||
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
|
||||
"task-dependencies"). Defines structural constraints.
|
||||
2. **NodeType** — A category of node within a graph type. Each has a JSON Schema
|
||||
for attribute validation.
|
||||
3. **EdgeType** — A category of edge within a graph type. Each has a JSON Schema
|
||||
and optional source/target constraints.
|
||||
|
||||
Graph instances belong to a graph type and contain nodes and edges conforming
|
||||
to those type definitions.
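
A minimal sketch of how an edge could be checked against its `EdgeType`'s source and target constraints. The struct fields and `check_edge` are hypothetical names for this example, and JSON Schema validation of attributes is left out.

```rust
/// Edge category within a graph type; `None` means unconstrained.
pub struct EdgeType {
    pub name: String,
    pub allowed_source: Option<Vec<String>>, // permitted source node types
    pub allowed_target: Option<Vec<String>>, // permitted target node types
}

pub struct Node {
    pub key: String,
    pub node_type: String,
}

fn permitted(allowed: &Option<Vec<String>>, node_type: &str) -> bool {
    match allowed {
        None => true,
        Some(types) => types.iter().any(|t| t.as_str() == node_type),
    }
}

/// Rejects an edge whose endpoints violate the edge type's constraints.
pub fn check_edge(edge_type: &EdgeType, source: &Node, target: &Node) -> Result<(), String> {
    if !permitted(&edge_type.allowed_source, &source.node_type) {
        return Err(format!(
            "edge type {} does not allow source node type {}",
            edge_type.name, source.node_type
        ));
    }
    if !permitted(&edge_type.allowed_target, &target.node_type) {
        return Err(format!(
            "edge type {} does not allow target node type {}",
            edge_type.name, target.node_type
        ));
    }
    Ok(())
}
```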
|
||||
|
||||
### SQLite Table Schema
|
||||
|
||||
Common columns: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`,
|
||||
`created_at INTEGER TIMESTAMP`, `updated_at INTEGER TIMESTAMP`.
|
||||
|
||||
| Table | Key columns |
|
||||
|-------|------------|
|
||||
| `graph_types` | id, name (UNIQUE), config JSON, version, scope |
|
||||
| `node_types` | id, graph_type_id FK, name, schema JSON |
|
||||
| `edge_types` | id, graph_type_id FK, name, schema JSON, allowed_source/target types |
|
||||
| `graphs` | id, graph_type_id FK, name, description, status, owner_id, project_id |
|
||||
| `nodes` | id, graph_id FK, key (UNIQUE per graph), attributes JSON |
|
||||
| `edges` | id, graph_id FK, key, source_node_key, target_node_key, attributes JSON, undirected |
|
||||
|
||||
No FK constraints across database files. Referential integrity is enforced at
|
||||
the application layer.
|
||||
|
||||
### System DB vs Tenant DB
|
||||
|
||||
- **System DB** (`system.db`): Identity tables (accounts, organizations,
|
||||
peer_credentials, api_keys, audit_logs) + system-scoped graph types.
|
||||
- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables + tenant-scoped graph
|
||||
types.
|
||||
|
||||
### Identity Tables
|
||||
|
||||
| Table | Key columns |
|-------|------------|
| `accounts` | email (UNIQUE), display_name, access_level (admin/user/service), status |
| `organizations` | name (UNIQUE), slug (UNIQUE), owner_id FK → accounts |
| `organization_members` | org_id FK, account_id FK, membership_level (owner/admin/member) |
| `api_keys` | owner_id FK, key_hash (UNIQUE), name, enabled, expires_at, revoked_at |
| `peer_credentials` | owner_id FK, credential_type (ssh_key/cert_authority), fingerprint (UNIQUE), public_key_data |
| `audit_logs` | action, owner_id FK, credential_id, org_id FK, details JSON |
|
||||
|
||||
### ACL as Metagraph
|
||||
|
||||
The ACL graph is a directed, non-multi metagraph:
|
||||
|
||||
- **PrincipalNode**: IdentityType (Account, Org, Service, Role) + identity_id + scopes + resources
|
||||
- **ResourceNode**: The thing being accessed
|
||||
- **Edges**: can_read, can_write, can_execute, belongs_to, delegates
|
||||
|
||||
Delegation edges carry `narrowed_scopes` — the delegate can only exercise scopes
|
||||
that are a subset of the delegator's.
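
The subset rule for `narrowed_scopes` can be sketched as follows. This is an illustrative sketch only: the `narrow_scopes` helper and the use of plain strings for scopes are assumptions, not the ACL module's actual types.

```rust
use std::collections::BTreeSet;

/// Hypothetical check applied when a delegation edge is created.
/// Returns the scopes the delegate may exercise, or the list of requested
/// scopes that the delegator does not hold.
pub fn narrow_scopes(
    delegator: &BTreeSet<String>,
    requested: &BTreeSet<String>,
) -> Result<BTreeSet<String>, Vec<String>> {
    // Anything requested but not held by the delegator violates the subset rule.
    let excess: Vec<String> = requested.difference(delegator).cloned().collect();
    if excess.is_empty() {
        Ok(requested.clone())
    } else {
        Err(excess)
    }
}

/// Convenience constructor used in examples and tests.
pub fn scopes(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}
```

Rejecting the whole request, rather than silently dropping the excess scopes, keeps a misconfigured delegation visible to the caller. Either behaviour satisfies the subset rule.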
|
||||
|
||||
### StorageIdentityProvider (Future — Phase 2+)
|
||||
|
||||
Implements alknet-core's `IdentityProvider` trait (ADR-029). This is defined
|
||||
here as a contract. When alknet-storage is built, it will provide this
|
||||
implementation. Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap`.
|
||||
|
||||
```rust
impl IdentityProvider for StorageIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // 1. Find peer_credentials row by fingerprint
        // 2. Resolve to account → organization membership → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("provided when alknet-storage is built")
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // 1. Verify Ed25519 signature against api_keys or peer_credentials
        // 2. Resolve to account → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("provided when alknet-storage is built")
    }
}
```
|
||||
|
||||
### StorageProtocol irpc Service
|
||||
|
||||
```rust
#[rpc_requests(message = StorageMessage)]
enum StorageProtocol {
    #[rpc(tx=oneshot::Sender<Graph>)]
    #[wrap(CreateGraph)]
    CreateGraph { graph_type_id: String, name: String },

    #[rpc(tx=oneshot::Sender<Node>)]
    #[wrap(AddNode)]
    AddNode { graph_id: String, key: String, attributes: Value },

    // ... (full protocol in research/services.md)
}
```
|
||||
|
||||
### Honker Integration
|
||||
|
||||
| Feature | Use case |
|
||||
|---------|----------|
|
||||
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
|
||||
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
|
||||
| `queue` / `claim` / `ack` | Task queue for async operations |
|
||||
|
||||
Per ADR-032, honker streams are domain events internal to the storage service.
|
||||
They are projected to call protocol `EventEnvelope` events when crossing service
|
||||
boundaries.
|
||||
|
||||
### Encrypted Data
|
||||
|
||||
alknet-storage references alknet-secret's `EncryptedData` wire format for
|
||||
storing encrypted nodes (API keys, OAuth tokens). The format (key_version,
|
||||
salt, iv, ciphertext) is shared by type-level compatibility, not a crate
|
||||
dependency. alknet-secret encrypts; alknet-storage stores the blob.
|
||||
|
||||
### Crate Dependencies
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
honker = "0.x"
|
||||
rusqlite = { version = "0.x", features = ["bundled"] }
|
||||
serde = { version = "1", features = ["derive"] }
|
||||
serde_json = "1"
|
||||
jsonschema = "0.x"
|
||||
petgraph = "0.x"
|
||||
irpc = "0.x"
|
||||
```
|
||||
|
||||
Does NOT depend on alknet-core or alknet-secret. Implements alknet-core's
|
||||
`IdentityProvider` trait by conforming to its signature, not by direct crate
|
||||
dependency.
|
||||
|
||||
## Constraints
|
||||
|
||||
- alknet-storage does NOT depend on alknet-core as a crate. It implements the
|
||||
`IdentityProvider` trait by conforming to the signature. The CLI binary
|
||||
wires them together.
|
||||
- alknet-storage does NOT depend on alknet-secret. They share the `EncryptedData`
|
||||
wire format by type-level compatibility, not a crate dependency.
|
||||
- WAL mode for concurrent reads during writes. Single writer per `.db` file.
|
||||
- JSON Schema validation uses the `jsonschema` crate at runtime (replaces
|
||||
TypeBox from TypeScript).
|
||||
- Per ADR-032, honker stream events never cross service boundaries without
|
||||
projection to `EventEnvelope`.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-SVC-03**: How does the secret service integrate with the existing
|
||||
`EncryptedDataSchema` from `@alkdev/storage`? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-04**: Should workers cache derived keys locally? See [open-questions.md](open-questions.md).
|
||||
|
||||
- **OQ-SVC-05**: How does the NFT-based ACL smart contract interact with the
|
||||
secret service? See [open-questions.md](open-questions.md).
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-storage is independent of core and secret |
|
||||
| [029](decisions/029-identity-core-type.md) | Identity as core type | alknet-storage implements IdentityProvider trait |
|
||||
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Honker streams stay internal; projection to EventEnvelope at boundaries |
|
||||
|
||||
## References
|
||||
|
||||
- [research/storage.md](../research/storage.md) — Full metagraph, identity, ACL, honker definitions
|
||||
- [research/services.md](../research/services.md) — StorageProtocol, StorageIdentityProvider
|
||||
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.2
|
||||
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
|
||||
- [secret-service.md](secret-service.md) — EncryptedData format, derivation paths
|
||||
@@ -1,152 +0,0 @@
|
||||
---
|
||||
status: reviewed
|
||||
last_updated: 2026-06-02
|
||||
---
|
||||
|
||||
# Transport Layer
|
||||
|
||||
## What
|
||||
|
||||
The transport layer produces a duplex byte stream (`AsyncRead + AsyncWrite + Unpin + Send`) that the SSH layer consumes via `russh::client::connect_stream()` or `russh::server::run_stream()`. The SSH layer is completely unaware of what transport it runs over.
|
||||
|
||||
## Why
|
||||
|
||||
Pluggable transports are the core architectural insight. They enable:
|
||||
|
||||
- **Simple deployment**: TCP on port 22 for basic use
|
||||
- **Censorship resistance**: TLS on port 443 looks like HTTPS
|
||||
- **NAT traversal**: iroh QUIC allows connections without public IPs
|
||||
- **Composability**: transports can be layered (iroh through SOCKS5 through SSH through TLS)
|
||||
|
||||
Without this abstraction, each transport mode would need its own SSH connection logic. With it, there's one SSH implementation and N transport implementations.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Transport Trait
|
||||
|
||||
```rust
// The core abstraction. Each transport produces ONE duplex stream.
// The SSH session runs over this stream for its entire lifetime.

#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Connect to the remote endpoint and return a duplex stream.
    /// For client-side transports.
    async fn connect(&self) -> Result<Self::Stream>;

    /// Return a human-readable description of this transport for logging.
    fn describe(&self) -> String;
}
```
|
||||
|
||||
### Server-Side Transport Acceptor
|
||||
|
||||
```rust
#[async_trait]
pub trait TransportAcceptor: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;

    /// Accept an incoming connection and return a duplex stream.
    async fn accept(&self) -> Result<(Self::Stream, TransportInfo)>;
}

/// Metadata about the incoming connection.
pub struct TransportInfo {
    pub remote_addr: Option<SocketAddr>,
    pub transport_kind: TransportKind,
}

pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
}
```
|
||||
|
||||
### Transport Implementations
|
||||
|
||||
| Transport | Client | Server | Stream Type |
|-----------|--------|--------|-------------|
| **TcpTransport** | `TcpStream::connect(addr)` | `TcpListener::accept()` | `TcpStream` |
| **TlsTransport** | `TlsStream<TcpStream>` (client TLS) | `TlsStream<TcpStream>` (server TLS) | `tokio_rustls::client::TlsStream<TcpStream>` (client), `tokio_rustls::server::TlsStream<TcpStream>` (server) |
| **IrohTransport** | `endpoint.connect(peer, alpn)` then `conn.open_bi()` then `join(recv, send)` | `endpoint.accept()` then `conn.accept_bi()` then `join(recv, send)` | `tokio::io::Join<RecvStream, SendStream>` |
|
||||
|
||||
### Iroh Stream Join
|
||||
|
||||
Since QUIC splits streams into separate `RecvStream` (implements `AsyncRead`) and `SendStream` (implements `AsyncWrite`), while russh expects a single duplex stream, they are combined using `tokio::io::join(recv_stream, send_stream)` which produces a `Join<RecvStream, SendStream>` implementing both traits.
|
||||
|
||||
See ADR-003 for the decision to use `tokio::io::join` over a custom wrapper.
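
To make the idea concrete without pulling in an async runtime, here is a synchronous `std::io` analogue of what `tokio::io::join` provides: two one-directional halves exposed as one value that is both readable and writable. The `Joined` type and `join` function below are hypothetical and exist only for illustration. The transport itself uses `tokio::io::join`, per ADR-003, not a custom wrapper.

```rust
use std::io::{self, Read, Write};

/// Illustrative duplex wrapper over a separate read half and write half.
pub struct Joined<R, W> {
    reader: R,
    writer: W,
}

/// Combine a reader and a writer into one duplex value.
pub fn join<R: Read, W: Write>(reader: R, writer: W) -> Joined<R, W> {
    Joined { reader, writer }
}

impl<R, W> Joined<R, W> {
    /// Split the wrapper back into its two halves.
    pub fn into_inner(self) -> (R, W) {
        (self.reader, self.writer)
    }
}

// Reads are forwarded to the read half only.
impl<R: Read, W> Read for Joined<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reader.read(buf)
    }
}

// Writes and flushes are forwarded to the write half only.
impl<R, W: Write> Write for Joined<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writer.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}
```

The consumer sees a single stream and never learns that two independent halves sit behind it, which is the property the SSH layer relies on.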
|
||||
|
||||
### iroh Relay Configuration
|
||||
|
||||
By default, iroh transport uses n0's free relay servers (`https://relay.iroh.network/`). This provides zero-config NAT traversal for testing and development. For production deployments, users override with `--iroh-relay <url>` to point to a self-hosted relay.
|
||||
|
||||
The relay URL is passed to iroh's `Endpoint::builder()` configuration. Self-hosted relay setup is documented in the project wiki.
|
||||
|
||||
See ADR-009 for the decision to default to n0's relay with override.
|
||||
|
||||
### Transport Chaining
|
||||
|
||||
Transports can be nested. The CLI supports `--transport iroh --proxy socks5://...` natively (ADR-010):
|
||||
|
||||
```bash
|
||||
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
This routes iroh's outbound TCP connections through the specified SOCKS5 proxy. The iroh transport supports SOCKS5 and HTTP proxy configuration for its outbound connections — the proxy URL is applied during transport initialization.
|
||||
|
||||
For other combinations:
|
||||
- TCP + TLS is already implicit (TLS wraps TCP in `TlsTransport`)
|
||||
- TLS + SOCKS5 proxy is also supported via `--proxy` with `--transport tls`
|
||||
|
||||
**Note**: `--proxy` has different semantics on the client vs the server (ADR-019):
|
||||
- **Client**: `--proxy` routes the *transport connection* through the proxy (e.g., iroh endpoint → SOCKS5 → iroh relay)
|
||||
- **Server**: `--proxy` routes *outbound target connections* through the proxy (e.g., SSH channel request → SOCKS5 → target host)
|
||||
|
||||
### Connection Lifecycle
|
||||
|
||||
```
|
||||
Client Server
|
||||
│ │
|
||||
│ transport.connect() │ transport_acceptor.accept()
|
||||
│ ─────────────────────────────────────────────▶│
|
||||
│ (duplex byte stream established) │
|
||||
│ │
|
||||
│ russh::client::connect_stream(config, │ russh::server::run_stream(config,
|
||||
│ stream, handler) │ stream, handler)
|
||||
│ │
|
||||
│ ═══════ SSH session over stream ═════════════ │
|
||||
│ ═════════════════════════════════════════════ │
|
||||
│ │
|
||||
│ channel_open_direct_tcpip(host, port, ...) │
|
||||
│ ─────────────────────────────────────────────▶│
|
||||
│ │
|
||||
│ ┌─────── TCP proxy ──────────────────┐ │
|
||||
│ │ SSH channel ←→ TcpStream::connect │ │
|
||||
│ └────────────────────────────────────┘ │
|
||||
```
|
||||
|
||||
## Constraints
|
||||
|
||||
- SSH sees only the stream. It never opens its own TCP connections. (ADR-004)
|
||||
- Each transport produces exactly one stream per SSH session. Multiple sessions need multiple `connect()` calls.
|
||||
- The iroh transport reuses a single `Endpoint` across multiple sessions (one QUIC connection per peer, multiple `open_bi()` streams). The endpoint is created once and shared.
|
||||
- TLS transport requires certificate configuration on the server side. The client can accept any certificate (self-signed) or verify against a CA. Server-side ACME is supported (ADR-008).
|
||||
|
||||
## Open Questions
|
||||
|
||||
None — all resolved.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|
||||
|-----|----------|---------|
|
||||
| [001](decisions/001-pluggable-transport.md) | Pluggable transport | Transport trait produces stream, SSH consumes it |
|
||||
| [003](decisions/003-iroh-stream-join.md) | iroh stream join | `tokio::io::join` combines QUIC halves |
|
||||
| [004](decisions/004-ssh-over-transport.md) | SSH over transport | SSH never touches TCP/iroh/TLS directly |
|
||||
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
|
||||
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
|
||||
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
|
||||
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
|
||||
@@ -1,28 +0,0 @@
|
||||
---
|
||||
status: deprecated
|
||||
last_updated: 2026-06-01
|
||||
---
|
||||
|
||||
# TUN Shim (Deprecated)
|
||||
|
||||
> **Note**: TUN functionality has been deferred from the alknet project. For VPN-like "route all traffic" behavior, use `tun2proxy` alongside alknet's SOCKS5 proxy. See ADR-014 for the rationale.
|
||||
|
||||
## What Changed
|
||||
|
||||
The `alknet-tun` separate process and all TUN-related code is out of scope. The recommended approach for VPN-like behavior is:
|
||||
|
||||
```bash
|
||||
# Terminal 1: alknet SOCKS5 proxy (no root required)
|
||||
alknet connect --server example.com --identity ~/.ssh/id_ed25519
|
||||
|
||||
# Terminal 2: tun2proxy routes all traffic through alknet's SOCKS5
|
||||
sudo tun2proxy --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
This keeps the core alknet binary free of TUN complexity and leverages an existing, well-tested tool for TUN-to-SOCKS5 bridging.
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-014](decisions/014-defer-tun-recommend-socks5-proxy.md) — decision to defer TUN
|
||||
- [ADR-005](decisions/005-socks5-before-tun.md) — SOCKS5 is still the primary interface
|
||||
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — recommended external tool for TUN support
|
||||
@@ -1,651 +0,0 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-06-04
|
||||
phase: exploration
|
||||
---
|
||||
|
||||
# Configuration Architecture
|
||||
|
||||
## Terminology Change: Head/Worker
|
||||
|
||||
This document previously used **hub/spoke** terminology. It has been updated to **head/worker**:
|
||||
|
||||
- **Head node**: The coordinating node (formerly "hub"). A head can also be a worker.
|
||||
- **Worker node**: A node that connects to a head and registers services (formerly "spoke").
|
||||
- **Node**: Any participant in the network. Every node has an identity.
|
||||
|
||||
This better reflects that a head is also a worker, enabling mesh topologies.
|
||||
|
||||
## Problem
|
||||
|
||||
Alknet's configuration is loaded once at startup and never changes. This has
|
||||
three specific failures:
|
||||
|
||||
1. **No hot reload of authentication credentials.** Adding or removing an
|
||||
authorized key requires restarting the server process. In a head/worker
|
||||
deployment where keys are managed via a database (see
|
||||
`@alkdev/storage`'s `peer_credentials` table), the alknet process must be
|
||||
restarted every time a key is added, revoked, or rotated. This is
|
||||
operationally unacceptable for a production service.
|
||||
|
||||
2. **No port forwarding access control.** Any authenticated client can open a
|
||||
`direct-tcpip` channel to any destination. There is no policy governing
|
||||
which hosts, ports, or `alknet-*` control channels a client may access. This
|
||||
is a security gap — a compromised key grants unrestricted network access
|
||||
through the tunnel.
|
||||
|
||||
3. **No structured configuration beyond CLI flags.** ADR-011 chose
|
||||
programmatic-first configuration for the alpha. This was correct — it
|
||||
avoided cross-platform path issues and kept the API surface small. But as
|
||||
alknet moves toward publishable releases, operators need config files for
|
||||
reproducible deployments, and the NAPI layer needs programmatic reload
|
||||
capability that the current `ServeOptions` builder pattern doesn't support.
|
||||
|
||||
### What's Not The Problem
|
||||
|
||||
- This does not propose depending on Honker, SQLite, or any specific data
|
||||
source at the `alknet-core` level. The core provides a reload mechanism;
|
||||
data sources plug in from outside.
|
||||
- This does not propose file-watching (potential attack vector, unnecessary
|
||||
complexity). CLI usage loads config once at startup. Programmatic usage
|
||||
(NAPI, head node) calls reload explicitly.
|
||||
- This does not replace the existing `ServeOptions` builder pattern. It
|
||||
generalizes it.
|
||||
|
||||
## Analysis
|
||||
|
||||
### Static vs Dynamic Configuration
|
||||
|
||||
Not all configuration should be reloadable. Transport-level settings (listen
|
||||
address, TLS certificates, host key) require socket/TLS renegotiation to change
|
||||
at runtime — effectively a restart. Auth and forwarding policy can change
|
||||
atomically without disrupting existing connections.
|
||||
|
||||
| Category | Examples | Reloadable? |
|---|---|---|
| Transport | listen addr, TLS cert/key, iroh relay, stealth mode | No (requires bind change) |
| Identity | host key, host key algorithm | No (requires SSH re-negotiation) |
| Auth | authorized keys, cert authorities | **Yes** (next auth check picks up changes) |
| Forwarding | allowed destinations, per-principal rules | **Yes** (next channel open picks up changes) |
| Rate limits | max connections per IP, max auth attempts | **Yes** (next check picks up changes) |
|
||||
|
||||
The split is clean: anything that affects the SSH handshake or socket binding
|
||||
is static. Anything that's checked per-connection or per-channel is dynamic.
|
||||
|
||||
### Auth Reload: Service Approach
|
||||
|
||||
The original design held all authorized keys in memory via `ArcSwap<DynamicConfig>`. For small deployments this works, but for nodes serving many users it requires loading every key into RAM and atomic-swapping the entire set on each reload.
|
||||
|
||||
The improved approach is to make auth an **irpc service** (see [core.md](core.md) and [services.md](services.md)). Auth verification becomes a service call: `VerifyPubkey { fingerprint, key_data }` → `oneshot::Sender<AuthResult>`. The service can:
|
||||
|
||||
- Query SQLite on demand (no need to hold all keys in memory)
|
||||
- Maintain an LRU cache for hot keys
|
||||
- Subscribe to honker streams for key invalidation
|
||||
- Run locally (in-process mpsc) or remotely (QUIC stream)
|
||||
|
||||
`ArcSwap<DynamicConfig>` remains as a fallback for minimal deployments (CLI usage, single-node setups) where SQLite overhead isn't warranted. The service approach is the primary path for production deployments.
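
The request/reply shape of such a service can be sketched with standard-library channels. This is a stand-in, not irpc: `std::sync::mpsc` replaces the irpc transport and the oneshot reply, an in-memory map replaces SQLite, and `AuthRequest`, `AuthResult`, `spawn_auth_service`, and `verify_pubkey` are hypothetical names. The `key_data` field is omitted for brevity.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
pub enum AuthResult {
    Accepted { account_id: String },
    Rejected,
}

/// One request variant; the reply channel plays the role of the oneshot sender.
pub enum AuthRequest {
    VerifyPubkey {
        fingerprint: String,
        reply: mpsc::Sender<AuthResult>,
    },
}

/// Start the service loop on its own thread and return the request handle.
/// `keys` maps fingerprint to account id (stand-in for a database lookup).
pub fn spawn_auth_service(keys: HashMap<String, String>) -> mpsc::Sender<AuthRequest> {
    let (tx, rx) = mpsc::channel::<AuthRequest>();
    thread::spawn(move || {
        // The loop ends once every request sender has been dropped.
        for request in rx {
            match request {
                AuthRequest::VerifyPubkey { fingerprint, reply } => {
                    let result = match keys.get(&fingerprint) {
                        Some(account) => AuthResult::Accepted {
                            account_id: account.clone(),
                        },
                        None => AuthResult::Rejected,
                    };
                    // The caller may have gone away; ignore a failed send.
                    let _ = reply.send(result);
                }
            }
        }
    });
    tx
}

/// Client-side call: send the request, then block on the reply.
/// Fails closed: any channel error is treated as a rejection.
pub fn verify_pubkey(service: &mpsc::Sender<AuthRequest>, fingerprint: &str) -> AuthResult {
    let (reply_tx, reply_rx) = mpsc::channel();
    let request = AuthRequest::VerifyPubkey {
        fingerprint: fingerprint.to_string(),
        reply: reply_tx,
    };
    if service.send(request).is_err() {
        return AuthResult::Rejected;
    }
    reply_rx.recv().unwrap_or(AuthResult::Rejected)
}
```

Because callers only see the request handle, the lookup behind it can change (in-memory map, SQLite with a cache, remote service) without touching the SSH handler.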
|
||||
|
||||
### Current Architecture
|
||||
|
||||
```
|
||||
ServeOptions (builder) → Server::new()
|
||||
├─ Arc<server::Config> (russh config, immutable)
|
||||
├─ Arc<ServerAuthConfig> (keys + CAs, immutable after load)
|
||||
├─ Arc<ConnectionRateLimiter> (mutable but not reloadable)
|
||||
└─ ServerHandler::new(auth_config, ...)
|
||||
|
||||
ServerHandler
|
||||
├─ auth_config: Arc<ServerAuthConfig> ← shared, immutable
|
||||
├─ connection_limiter: Arc<ConnectionRateLimiter>
|
||||
├─ outbound_proxy: Option<ProxyConfig>
|
||||
└─ (no forwarding policy field)
|
||||
```
|
||||
|
||||
`auth_publickey()` reads from `self.auth_config` via `Arc` dereference. No
|
||||
path to update it.
|
||||
|
||||
### Proposed Architecture
|
||||
|
||||
Replace `Arc<ServerAuthConfig>` with a service-based approach:
|
||||
|
||||
```
|
||||
StaticConfig (Arc, loaded once)
|
||||
├─ transport mode, listen addr, TLS config, iroh config
|
||||
├─ stealth, proxy
|
||||
├─ host key
|
||||
└─ max_auth_attempts, max_connections_per_ip
|
||||
|
||||
AuthService (irpc service, local or remote)
|
||||
├─ VerifyPubkey(fingerprint, key_data) → AuthResult
|
||||
├─ VerifyToken(token_bytes) → AuthResult
|
||||
└─ ReloadKeys() → ()
|
||||
Backed by: SQLite (peer_credentials, api_keys)
|
||||
Optional: ArcSwap<DynamicConfig> for minimal deployments
|
||||
|
||||
ConfigService (irpc service, always local)
|
||||
├─ ReloadDynamicConfig(DynamicConfig)
|
||||
└─ GetForwardingPolicy() → ForwardingPolicy
|
||||
|
||||
DynamicConfig (Arc<ArcSwap<DynamicConfig>>, reloadable)
|
||||
├─ forwarding: ForwardingPolicy
|
||||
└─ rate_limits: RateLimitConfig
|
||||
```
|
||||
|
||||
For production: auth verification goes through the auth service, which queries SQLite. The `DynamicConfig` only holds forwarding policy and rate limits — not the full key set. For minimal deployments: auth falls back to `ArcSwap<DynamicConfig>` with all keys in memory, wrapped by the same service interface.
|
||||
|
||||
`ArcSwap` provides lock-free reads on the hot path. Every `auth_publickey()`
|
||||
and `channel_open_direct_tcpip()` call does an `Arc` dereference — zero cost
|
||||
compared to the current approach. Writes are atomic: `store()` swaps the
|
||||
pointer. Existing connections finish with their current config, new connections
|
||||
get the new config.
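
The swap semantics can be illustrated with the standard library alone. The sketch below uses `RwLock<Arc<DynamicConfig>>` as a stand-in for `ArcSwap<DynamicConfig>`. It is not lock-free like the real thing, but it shows the same snapshot behaviour. `ConfigCell` and the single `forwarding_default_allow` field are hypothetical simplifications.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for the reloadable part of the configuration.
#[derive(Debug)]
pub struct DynamicConfig {
    pub forwarding_default_allow: bool,
}

/// Holds the current config and allows replacing it as a whole.
pub struct ConfigCell {
    current: RwLock<Arc<DynamicConfig>>,
}

impl ConfigCell {
    pub fn new(initial: DynamicConfig) -> Self {
        Self {
            current: RwLock::new(Arc::new(initial)),
        }
    }

    /// Take a snapshot. The returned `Arc` stays valid and unchanged
    /// even if `reload` is called afterwards.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&guard)
    }

    /// Replace the whole config; later `load` calls see the new value.
    pub fn reload(&self, next: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A check that took its snapshot before a reload finishes against the old value, and the next `load` observes the new one. Readers never see a half-updated config.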
|
||||
|
||||
### Forwarding Policy
|
||||
|
||||
Currently, `channel_open_direct_tcpip` in `handler.rs` spawns a proxy task for
|
||||
any destination. The only gate is authentication. A forwarding policy adds a
|
||||
check before the proxy spawn:
|
||||
|
||||
```rust
pub struct ForwardingPolicy {
    default: ForwardingAction,
    rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    target: TargetPattern,
    action: ForwardingAction,
    principals: Vec<String>,
}

pub enum ForwardingAction { Allow, Deny }

pub enum TargetPattern {
    Any,
    Host(String),
    Cidr(IpNetwork),
    PortRange(String, Range<u16>),
    AlknetPrefix,
}
```
|
||||
|
||||
Rule evaluation: first match wins, default applies if no rule matches. This
|
||||
model maps to OpenSSH's `AllowTcpForwarding` + `PermitOpen` but is more
|
||||
expressive. It also maps to `peer_credentials.metadata.scopes` in `@alkdev/storage`
|
||||
— the head node can generate forwarding rules from stored scopes.
|
||||
|
||||
Rule ordering matters. A deny-then-allow pattern gives blocklist semantics. An
|
||||
allow-then-deny pattern gives allowlist semantics. Both are useful. The
|
||||
default determines the fallback.
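
First-match-wins evaluation can be sketched as below. The types are simplified for the example: CIDR and port-range patterns are left out, a hypothetical `Port` variant is added, `"*"` is assumed to match every principal, and the `evaluate` method is an illustration, not the handler's actual code.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Reduced pattern set for the sketch.
pub enum TargetPattern {
    Any,
    Host(String),
    Port(u16),
    AlknetPrefix,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    /// Assumption: the entry "*" matches every principal.
    pub principals: Vec<String>,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl TargetPattern {
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h == host,
            TargetPattern::Port(p) => *p == port,
            TargetPattern::AlknetPrefix => host.starts_with("alknet-"),
        }
    }
}

impl ForwardingPolicy {
    /// Return the action of the first rule matching both target and
    /// principal, or the policy default when no rule matches.
    pub fn evaluate(&self, principal: &str, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| {
                rule.target.matches(host, port)
                    && rule.principals.iter().any(|p| p == "*" || p == principal)
            })
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because evaluation stops at the first match, a narrow deny has to appear before a broader allow that would otherwise shadow it.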
|
||||
|
||||
### Configuration File Format
|
||||
|
||||
ADR-011 chose "programmatic-first, no config file." This was correct for alpha.
|
||||
For publishable releases, a config file enables:
|
||||
|
||||
- Reproducible deployments (version-controlled config)
|
||||
- Less verbose CLI invocations
|
||||
- Separate files for static and dynamic config (only static needs to be in the
|
||||
config file; dynamic comes from the reload mechanism)
|
||||
|
||||
TOML is the idiomatic Rust choice. The config file covers static config only —
|
||||
the same fields as `ServeOptions`. Dynamic config (auth, forwarding) comes from
|
||||
the reload mechanism, not from the file. This preserves ADR-011's intent: the
|
||||
core doesn't know about the data source for auth keys, it just provides a place
|
||||
to put them.
|
||||
|
||||
```toml
|
||||
[server]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = false
|
||||
max_connections_per_ip = 5
|
||||
max_auth_attempts = 3
|
||||
|
||||
[server.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
[server.iroh]
|
||||
relay = "https://relay.alk.dev"
|
||||
|
||||
[auth]
|
||||
host_key = "/etc/alknet/ssh/host_key"
|
||||
|
||||
[forwarding]
|
||||
default = "deny"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "localhost:*"
|
||||
action = "allow"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "alknet-*"
|
||||
action = "allow"
|
||||
|
||||
[[forwarding.rules]]
|
||||
target = "*:22"
|
||||
action = "deny"
|
||||
```
|
||||
|
||||
The `[[forwarding.rules]]` array syntax is TOML's array-of-tables pattern.
|
||||
Rules are evaluated in order; first match wins.
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
The NAPI layer exposes the reload handle:
|
||||
|
||||
```typescript
|
||||
interface AlknetServer {
|
||||
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
|
||||
reloadForwarding(policy: ForwardingPolicyConfig): void;
|
||||
reloadAll(config: DynamicConfig): void;
|
||||
}
|
||||
|
||||
interface ForwardingPolicyConfig {
|
||||
default: 'allow' | 'deny';
|
||||
rules: ForwardingRuleConfig[];
|
||||
}
|
||||
|
||||
interface ForwardingRuleConfig {
|
||||
target: string; // "localhost:*", "10.0.0.0/8:80", "alknet-*"
|
||||
action: 'allow' | 'deny';
|
||||
principals?: string[]; // default ["*"]
|
||||
}
|
||||
```
|
||||
|
||||
The head node calls `server.reloadAuth(...)` after writing to `peer_credentials`.
|
||||
The NAPI layer parses the key data and constructs a new `DynamicConfig`, then
|
||||
calls the `ConfigReloadHandle`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration is almost entirely static (which server to connect to,
|
||||
which key to use). The only potential dynamic config is key rotation, which is
|
||||
less urgent because clients don't serve. For now, client configuration stays
|
||||
as `ConnectOptions` — no `ArcSwap` needed.
|
||||
|
||||
A config file for client connections could define named profiles:
|
||||
|
||||
```toml
|
||||
[profiles.production]
|
||||
server = "head.alk.dev:443"
|
||||
transport = "tls"
|
||||
identity = "/home/user/.ssh/id_ed25519"
|
||||
|
||||
[profiles.staging]
|
||||
server = "staging.alk.dev:22"
|
||||
transport = "tcp"
|
||||
identity = "/home/user/.ssh/staging_key"
|
||||
```
|
||||
|
||||
This is a convenience layer on top of `ConnectOptions`, not a replacement.
|
||||
|
||||
### CLI vs Programmatic Behavior
|
||||
|
||||
| Interface | Static config | Dynamic config | Reload mechanism |
|
||||
|---|---|---|---|
|
||||
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
|
||||
| Core Rust | `StaticConfig` struct | `AuthService` (irpc) or `ArcSwap<DynamicConfig>` (minimal) | `ConfigService::reload()` or `ConfigReloadHandle::reload()` |
|
||||
| NAPI | `serve()` options | Same | `server.reloadAuth()`, `server.reloadForwarding()` |
|
||||
|
||||
The CLI doesn't need a reload mechanism. When you're running alknet from the
|
||||
command line, restarting is fine. The reload mechanism exists for programmatic
|
||||
consumers and for the auth service pattern where keys are queried on demand from
|
||||
a database.
|
||||
|
||||
### Multi-Transport Listeners
|
||||
|
||||
A head node may want to accept connections on multiple transports simultaneously:
|
||||
|
||||
- TCP on port 22 (simple, direct SSH)
|
||||
- TLS on port 443 (stealth mode, corporate firewalls)
|
||||
- iroh QUIC (P2P, no port forwarding needed)
|
||||
- WebTransport on port 443 (browser clients, shares the HTTP/3 listener)
|
||||
|
||||
Currently `ServeTransportMode` is a single enum and `Server::run()` takes one
|
||||
acceptor. To serve multiple transports, the architecture needs to change.
|
||||
|
||||
**Option A: `Server` manages multiple listeners internally.**
|
||||
|
||||
```rust
pub struct Server {
    // Shared state (one copy, shared across all listeners)
    config: Arc<server::Config>,
    dynamic_config: Arc<ArcSwap<DynamicConfig>>,
    connection_limiter: Arc<ConnectionRateLimiter>,
    outbound_proxy: Option<ProxyConfig>,
    sessions: Arc<tokio::sync::Mutex<Vec<ActiveSession>>>,
    shutdown_tx: tokio::sync::watch::Sender<bool>,
    shutdown_rx: tokio::sync::watch::Receiver<bool>,

    // Per-listener state
    listeners: Vec<ListenerConfig>,
}

pub struct ListenerConfig {
    transport: ServeTransportMode,
    listen_addr: SocketAddr,
    stealth: bool,
    // Transport-specific config (TLS cert, iroh relay, etc.)
    tls: Option<TlsConfig>,
    iroh: Option<IrohConfig>,
}
```
|
||||
|
||||
`Server::run()` spawns one accept loop per `ListenerConfig`. Each loop
|
||||
constructs its own acceptor and `ServerHandler` (with the appropriate
|
||||
`TransportKind` tag), but shares the auth config, connection limiter, and
|
||||
session list. Shutdown signal goes to all loops.
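
The "one loop per listener, one shared shutdown signal" shape can be sketched with threads. This is a rough analogue under stated assumptions: OS threads and an `AtomicBool` stand in for tokio tasks and the `watch` channel, the loop body only sleeps where a real loop would accept a connection, and `Shared` and `run_listeners` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Reduced listener description for the sketch.
pub struct ListenerConfig {
    pub name: &'static str,
}

/// State shared by every accept loop.
pub struct Shared {
    pub shutdown: AtomicBool,
}

/// Spawn one loop per listener. Each loop returns its listener name on exit.
pub fn run_listeners(
    listeners: Vec<ListenerConfig>,
    shared: Arc<Shared>,
) -> Vec<JoinHandle<&'static str>> {
    listeners
        .into_iter()
        .map(|listener| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Every loop polls the same flag, so one store stops them all.
                while !shared.shutdown.load(Ordering::Acquire) {
                    // A real loop would accept a connection here.
                    thread::sleep(Duration::from_millis(1));
                }
                listener.name
            })
        })
        .collect()
}
```

Setting the flag once and joining every handle is the whole shutdown procedure, which is the coordination advantage Option A claims over N independent servers.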
|
||||
|
||||
**Option B: Caller manages multiple `Server` instances.**
|
||||
|
||||
The caller creates N `Server` objects, each with its own transport. They share
|
||||
`Arc<ArcSwap<DynamicConfig>>` and `Arc<ConnectionRateLimiter>` explicitly.
|
||||
|
||||
Option A is better because: shared shutdown, shared session tracking, single
|
||||
point for config reload. Option B puts coordination burden on the caller and
|
||||
makes graceful shutdown harder (N independent shutdown channels).
|
||||
|
||||
**The TLS + WebTransport coexistence question.** Both TLS and WebTransport
|
||||
use port 443. WebTransport is HTTP/3 (QUIC), TLS on port 443 is typically
|
||||
TCP+TLS. They can share the port because they're different protocols — QUIC
|
||||
is UDP, TLS-over-TCP is TCP. The kernel routes by protocol. But if both are
|
||||
on 443, the stealth mode protocol detector needs to handle HTTP/3 as well:
|
||||
|
||||
```
|
||||
Port 443:
|
||||
TCP connection → TLS handshake → SSH (existing)
|
||||
UDP "connection" → QUIC handshake → WebTransport → stream proxy
|
||||
```
|
||||
|
||||
This is similar to how iroh-live-relay works: HTTP/3 listener accepts
|
||||
WebTransport sessions, each session opens bidirectional streams that map to
|
||||
internal services.
|
||||
|
||||
**Config file for multi-transport:**
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
transport = "tls"
|
||||
listen = "0.0.0.0:443"
|
||||
stealth = true
|
||||
|
||||
[listeners.tls]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
|
||||
[[listeners]]
|
||||
transport = "tcp"
|
||||
listen = "0.0.0.0:22"
|
||||
|
||||
[[listeners]]
|
||||
transport = "iroh"
|
||||
iroh_relay = "https://relay.alk.dev"
|
||||
|
||||
[[listeners]]
|
||||
transport = "webtransport"
|
||||
listen = "0.0.0.0:443"
|
||||
# WebTransport shares port 443 with TLS because QUIC is UDP, TLS is TCP
|
||||
|
||||
[listeners.webtransport]
|
||||
cert = "/etc/alknet/tls/cert.pem"
|
||||
key = "/etc/alknet/tls/key.pem"
|
||||
```
|
||||
|
||||
The `[[listeners]]` array-of-tables pattern means each listener is an
|
||||
independent config block. The `[auth]`, `[forwarding]`, and `[server]`
|
||||
sections at the top level are shared — they apply to all listeners.
|
||||
|
||||
**NAPI multi-transport:**
|
||||
|
||||
```typescript
const server = await serve({
  listeners: [
    { transport: 'tls', listen: '0.0.0.0:443', stealth: true, tlsCert: '...', tlsKey: '...' },
    { transport: 'tcp', listen: '0.0.0.0:22' },
    { transport: 'iroh', irohRelay: 'https://relay.alk.dev' },
  ],
  hostKey: hostKeyBuffer,
  authorizedKeys: keysBuffer,
});
```
|
||||
|
||||
Single `AlknetServer` object, single `reloadAuth()` call affects all
|
||||
listeners.
|
||||
|
||||
### Transport Kind and WebTransport
|
||||
|
||||
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) tags each connection
|
||||
so the handler can behave differently per transport. Adding `WebTransport` to
|
||||
this enum is straightforward — WebTransport connections are identifiable at
|
||||
accept time. The handler behavior is the same (port forwarding only), but
|
||||
the tag enables transport-specific logging and future policy differences
|
||||
(e.g., WebTransport clients can only access `alknet-*` control channels).
|
||||
|
||||
## Proposed Solution
|
||||
|
||||
### Phase 1: Static/Dynamic Split
|
||||
|
||||
1. Introduce `StaticConfig` and `DynamicConfig` structs
|
||||
2. Replace `Arc<ServerAuthConfig>` in `ServerHandler` with
|
||||
`Arc<ArcSwap<DynamicConfig>>`
|
||||
3. Add `ConfigReloadHandle` with `reload(DynamicConfig)` method
|
||||
4. Expose `reloadAuth()` on the NAPI `AlknetServer` object
|
||||
|
||||
**Scope**: `alknet-core` auth module + `alknet-napi` serve module
|
||||
|
||||
**Risk**: Low — internal refactor, no protocol changes
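
The Phase 1 steps can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the proposed implementation: it swaps in std's `RwLock<Arc<_>>` where the plan calls for `ArcSwap`, and the `authorized_fingerprints` field and the `load` method are hypothetical names chosen for the example.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical reloadable settings; the field is illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_fingerprints: Vec<String>,
}

/// Shared slot holding the current config snapshot.
/// Stand-in for `Arc<ArcSwap<DynamicConfig>>`, built on std's `RwLock`.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    current: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        ConfigReloadHandle {
            current: Arc::new(RwLock::new(Arc::new(initial))),
        }
    }

    /// Hot path: clone the `Arc` under a brief read lock, then use the
    /// snapshot without holding any lock.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().expect("config lock poisoned");
        Arc::clone(&*guard)
    }

    /// Replace the whole config. Callers holding an older snapshot keep it
    /// until they call `load` again.
    pub fn reload(&self, next: DynamicConfig) {
        let mut guard = self.current.write().expect("config lock poisoned");
        *guard = Arc::new(next);
    }
}
```

A handler would call `load()` once per authentication attempt, so a reload never changes the key set in the middle of a single check.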
|
||||
|
||||
### Phase 2: Forwarding Policy
|
||||
|
||||
1. Add `ForwardingPolicy` to `DynamicConfig`
|
||||
2. Add policy check to `channel_open_direct_tcpip` before proxy spawn
|
||||
3. Expose `reloadForwarding()` on NAPI `AlknetServer`
|
||||
|
||||
**Scope**: `alknet-core` handler + `alknet-napi`
|
||||
|
||||
**Risk**: Low — new check, default-allow preserves current behavior
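
As an illustration of the policy check in step 2, here is a small sketch of a rule table with a configurable default. The rule shape (`host`/`port` with `None` as wildcard) and the first-match-wins ordering are assumptions made for this example; the document does not fix either.

```rust
/// What to do with a forwarding request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    Deny,
}

/// One rule; `None` in a field matches anything. Shape is hypothetical.
#[derive(Debug, Clone)]
pub struct ForwardRule {
    pub host: Option<String>,
    pub port: Option<u16>,
    pub action: Action,
}

#[derive(Debug, Clone)]
pub struct ForwardingPolicy {
    pub rules: Vec<ForwardRule>,
    pub default: Action,
}

impl ForwardingPolicy {
    /// Default-allow policy with no rules: behaves like having no policy.
    pub fn allow_all() -> Self {
        ForwardingPolicy { rules: Vec::new(), default: Action::Allow }
    }

    /// First matching rule wins (an assumed ordering); otherwise the
    /// default applies.
    pub fn check(&self, host: &str, port: u16) -> Action {
        self.rules
            .iter()
            .find(|r| {
                r.host.as_deref().map_or(true, |h| h == host)
                    && r.port.map_or(true, |p| p == port)
            })
            .map_or(self.default, |r| r.action)
    }
}
```

With this shape, `channel_open_direct_tcpip` would call `check(host, port)` and refuse the channel on `Action::Deny` before spawning the proxy.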
|
||||
|
||||
### Phase 3: Config File
|
||||
|
||||
1. Add `--config <path>` CLI flag parsing TOML
|
||||
2. CLI flags override config file values (same precedence as cargo)
|
||||
3. Config file only covers static config + initial auth config path
|
||||
4. Add `serde` derive to `StaticConfig`
|
||||
|
||||
**Scope**: `alknet-cli` (new binary crate) + `alknet-core` config module
|
||||
|
||||
**Risk**: Medium — new dependency (`toml` crate), new CLI surface to validate
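
The precedence rule in step 2 (CLI flags override config file values) can be expressed as a field-by-field merge of optional values. The sketch below assumes a hypothetical `StaticOverrides` struct with three illustrative fields; the real `StaticConfig` layout is not specified here.

```rust
/// Subset of static settings; every field is optional so that "unset" is
/// distinguishable from an explicit value. Field names are illustrative.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct StaticOverrides {
    pub listen: Option<String>,
    pub transport: Option<String>,
    pub stealth: Option<bool>,
}

impl StaticOverrides {
    /// Values in `self` (CLI flags) win; gaps are filled from `file`.
    pub fn or(self, file: StaticOverrides) -> StaticOverrides {
        StaticOverrides {
            listen: self.listen.or(file.listen),
            transport: self.transport.or(file.transport),
            stealth: self.stealth.or(file.stealth),
        }
    }
}
```

Keeping the merge on `Option` fields means an omitted flag never silently overwrites a value from the file.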
|
||||
|
||||
### Phase 4: Client Profiles
|
||||
|
||||
1. Add `[profiles]` section to client config file
|
||||
2. `--profile production` loads named profile
|
||||
3. CLI flags override profile values
|
||||
|
||||
**Scope**: `alknet-cli`
|
||||
|
||||
**Risk**: Low — convenience layer only
|
||||
|
||||
### Phase 5: Multi-Transport Listeners
|
||||
|
||||
1. Change `ServeTransportMode` from single enum to `Vec<ListenerConfig>`
|
||||
2. `Server::run()` spawns one accept loop per listener, sharing `DynamicConfig`
|
||||
3. Single shutdown signal drains all listeners
|
||||
4. Add `[[listeners]]` to config file format
|
||||
5. NAPI `serve()` accepts `listeners` array instead of single `transport`
|
||||
6. Add `WebTransport` to `TransportKind` enum (initially as a tag only;
|
||||
actual WebTransport acceptor is a separate R&D phase)
|
||||
|
||||
**Scope**: `alknet-core` serve.rs + `alknet-napi` + `alknet-cli`
|
||||
|
||||
**Risk**: Medium — changes the primary API surface of `serve()`. Backwards
|
||||
compat via accepting both `transport: string` (single) and
|
||||
`listeners: array` (multi) in NAPI.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-CFG-01**: Should forwarding rules support per-user scope derived from
|
||||
the authenticated key's metadata (e.g., `peer_credentials.metadata.scopes`)?
|
||||
Or is a global rules table with principal matching sufficient?
|
||||
|
||||
Global rules with principal matching is simpler and covers most cases. Per-user
|
||||
scope derived from certificates is more granular but requires the server to
|
||||
maintain a mapping from key fingerprint to scope. This mapping comes from the
|
||||
head node's database, not from the SSH protocol. Phase 2 starts with global rules;
|
||||
per-user scope can be added as an extension.
|
||||
|
||||
- **OQ-CFG-02**: Should the config file watch for changes and auto-reload?
|
||||
|
||||
No. File watching is a potential attack vector (symlink races, inotify
|
||||
limitations on network filesystems). The CLI loads once at startup. The NAPI
|
||||
layer reloads explicitly. This is the right model for a security-sensitive
|
||||
tool.
|
||||
|
||||
- **OQ-CFG-03**: Should `ArcSwap` be the reload primitive, or is `RwLock`
|
||||
sufficient?
|
||||
|
||||
`ArcSwap` is the standard pattern for this in Rust network services
|
||||
(`arc-swap` crate). It provides lock-free reads (the hot path) and atomic
|
||||
writes. `RwLock` would also work but adds lock contention on reads. The
|
||||
`arc-swap` dependency is small (~500 lines) and well-maintained. Prefer it.
|
||||
|
||||
- **OQ-CFG-04**: Should TLS and WebTransport on the same port share a single
|
||||
QUIC listener (like iroh Router's ALPN dispatch), or run as separate
|
||||
listeners on the same port?
|
||||
|
||||
They can't conflict because QUIC is UDP and TLS-over-TCP is TCP: the two
protocols have separate port spaces in the kernel. They're naturally separate
|
||||
listeners even on the same port. However, if iroh is also running on the
|
||||
same host, the iroh endpoint already owns a QUIC listener. The WebTransport
|
||||
listener needs its own. Options: (a) share the iroh endpoint's QUIC listener
|
||||
with ALPN dispatch (reuses `from_endpoint` pattern), (b) separate QUIC
|
||||
listeners on different ports, (c) bind both to 443/UDP — possible if
|
||||
`SO_REUSEPORT` is used. Needs R&D; defer to WebTransport transport design
|
||||
session.
|
||||
|
||||
~~**Update**: WebTransport is out of scope for the current configuration
|
||||
work. It requires a fundamentally different authentication model (HTTP-level
|
||||
API keys/session tokens vs SSH key-based auth). The `ServerHandler` only
|
||||
knows SSH `auth_publickey`. WebTransport auth would need its own handler
|
||||
path. This connects to the broader question of whether `DynamicConfig.auth`
|
||||
should be transport-aware (see OQ-CFG-06). WebTransport transport design
|
||||
is a separate R&D session.~~
|
||||
|
||||
**Update 2**: Auth concern is resolved by ADR-023. The same authorized_keys
|
||||
set verifies both SSH pubkey auth and token auth (Ed25519-signed timestamp
|
||||
for WebTransport). One key material, two presentations. The remaining
|
||||
question is purely about QUIC listener coexistence — which is a transport
|
||||
implementation detail, not an auth question. See [auth.md](../architecture/auth.md)
|
||||
and [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md).
|
||||
|
||||
- **OQ-CFG-05**: Does `TransportKind::WebTransport` need any handler behavior
|
||||
different from other transports?
|
||||
|
||||
Initially no — all transports get the same port-forwarding-only handler.
|
||||
But WebTransport connections come from browsers, which have different trust
|
||||
assumptions. A future forwarding policy might restrict WebTransport clients
|
||||
to `alknet-*` control channels only (no arbitrary host:port forwarding).
|
||||
This is a policy question, not a transport question. The `TransportKind` tag
|
||||
on the handler enables transport-aware policy rules in `ForwardingPolicy`
|
||||
without changing the handler. Defer to Phase 2 (forwarding policy design).
|
||||
|
||||
- **OQ-CFG-06**: Should the auth layer be transport-aware?
|
||||
|
||||
Currently `DynamicConfig.auth` is `ServerAuthConfig` — SSH keys and CAs
|
||||
only. This works for SSH over any transport (TCP, TLS, iroh) because SSH
|
||||
carries its own auth protocol. But non-SSH transports (WebTransport,
|
||||
WebSocket) use HTTP-level authentication (API keys, session tokens in
|
||||
headers/query params). The auth question is: does the same `DynamicConfig`
|
||||
serve both models, or does each transport carry its own auth config?
|
||||
|
||||
~~Option A: `AuthPolicy` contains both SSH auth and API key auth:
|
||||
```rust
|
||||
pub struct AuthPolicy {
|
||||
ssh: SshAuthConfig, // for SSH-over-any-transport
|
||||
api_keys: Option<ApiKeysConfig>, // for non-SSH transports
|
||||
}
|
||||
```
|
||||
|
||||
Option B: Auth is per-listener. Each `ListenerConfig` carries its own auth
|
||||
config appropriate to its transport.
|
||||
|
||||
Option A is simpler for the initial implementation — the SSH auth path is
|
||||
unchanged, and API key auth is additive. Option B is more flexible but
|
||||
duplicates the shared auth state (keys should be reloadable once, not per
|
||||
listener).
|
||||
|
||||
For now, the config architecture should accommodate Option A as a future
|
||||
extension. Phase 1 implements `DynamicConfig` with SSH auth only. API key
|
||||
auth is added when a non-SSH transport is implemented.~~
|
||||
|
||||
**Resolved by ADR-023**: The auth layer is transport-aware in its
|
||||
*presentation*, not its *material*. `AuthPolicy` holds `SshAuthConfig` and
|
||||
`TokenAuthConfig`, where `TokenAuthConfig.key_source` defaults to
|
||||
`Shared` (same `authorized_keys` set as SSH auth). The same Ed25519 keys
|
||||
serve both paths: SSH presents the public key in the handshake; WebTransport
|
||||
presents an Ed25519-signed timestamp token. Verification produces the same
|
||||
`Identity` type via the `IdentityProvider` trait. One `reloadAuth()` call
|
||||
updates both. See [auth.md](../architecture/auth.md) and
|
||||
[ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md).
|
||||
|
||||
- **OQ-CFG-07**: Should auth and secret services share a single irpc endpoint
|
||||
or be separate services?
|
||||
|
||||
Separate services are better. Auth (verify credentials) and Secret (derive/store
|
||||
keys) have different security boundaries. The secret service holds the master
|
||||
seed; the auth service only needs public key fingerprints. They may run on
|
||||
different machines. See [services.md](services.md) for protocol definitions.
|
||||
|
||||
- **OQ-CFG-08**: How do external credentials (API keys, OAuth tokens) relate
|
||||
to the secret service's HD key derivation?
|
||||
|
||||
HD-derived keys (from SLIP-0010/BIP39) cover self-generated secrets (identity
|
||||
keys, encryption keys, SSH keys). External credentials (third-party API keys,
|
||||
OAuth tokens) can't be derived — they must be stored encrypted. The secret
|
||||
service handles both: derived keys are regenerated on demand; stored secrets
|
||||
are encrypted with a key that is itself derived from the seed. See
|
||||
[services.md](services.md) for the `SecretProtocol` definition.
|
||||
|
||||
## Decisions Required
|
||||
|
||||
These decisions will be extracted into ADRs when the architecture is finalized:
|
||||
|
||||
1. **ADR-020**: Static/dynamic config split. Auth delegated to `AuthService` (irpc)
|
||||
for production; `ArcSwap<DynamicConfig>` for minimal deployments. Supersedes
|
||||
ADR-011's "no config file" — adds optional config file while preserving
|
||||
programmatic-first API.
|
||||
|
||||
2. **ADR-021**: Forwarding policy with rule-based allow/deny. Default-allow
|
||||
preserves current behavior during migration; default-deny for production
|
||||
deployments.
|
||||
|
||||
3. **ADR-022**: Multi-transport listeners. `Server` spawns multiple accept
|
||||
loops sharing auth config, session state, and shutdown. Replaces single
|
||||
`ServeTransportMode` with `Vec<ListenerConfig>`.
|
||||
|
||||
4. **ADR-026**: Head/worker terminology. Replace hub/spoke with head/worker
|
||||
throughout all documentation and APIs. A head is also a worker.
|
||||
|
||||
5. **ADR-028**: Auth as service. Auth verification via irpc `AuthProtocol`
|
||||
service, not in-memory key set. Enables SQLite-backed auth for production,
|
||||
`ArcSwap` fallback for minimal deployments.
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-011](../architecture/decisions/011-no-ssh-config-programmatic-api.md) — Programmatic-first API (superseded by ADR-020)
|
||||
- [ADR-012](../architecture/decisions/012-auth-ed25519-and-cert-authority.md) — Auth key format
|
||||
- [ADR-018](../architecture/decisions/018-control-channel-for-pubsub.md) — Control channel routing
|
||||
- `server/handler.rs` — Current `Arc<ServerAuthConfig>` usage
|
||||
- `server/serve.rs` — Current single-transport `Server::run()` accept loop
|
||||
- `auth/server_auth.rs` — `ServerAuthConfig` struct
|
||||
- `auth/keys.rs` — `KeySource` and key loading
|
||||
- `@alkdev/storage/docs/architecture/sqlite-host.md` — `peer_credentials` table schema
|
||||
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library (in `/workspace/wtransport`)
|
||||
- [arc-swap crate](https://docs.rs/arc-swap) — Lock-free read, atomic write for shared state
|
||||
- [ADR-023](../architecture/decisions/023-unified-auth-shared-key-material.md) — Unified auth with shared key material
|
||||
- [auth.md](../architecture/auth.md) — Unified auth architecture spec
|
||||
- [call-protocol.md](../architecture/call-protocol.md) — Bidirectional call protocol spec
|
||||
- [services.md](services.md) — Service layer architecture (irpc services)
|
||||
- [core.md](core.md) — Core overview, head/worker terminology, service layer
|
||||
@@ -1,426 +0,0 @@
|
||||
# Alknet Core: Transport, Call Protocol, Auth, Services, and DNS
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-06
|
||||
|
||||
## Overview
|
||||
|
||||
`alknet-core` is the foundational crate providing pluggable transports, the bidirectional call protocol, Ed25519 authentication, a service layer (via irpc), and (future) DNS transport + naming. Everything else (storage, flowgraph, relay) builds on top of this.
|
||||
|
||||
### Terminology: Nodes, Heads, and Workers
|
||||
|
||||
Alknet uses a **head/worker** model instead of hub/spoke:
|
||||
|
||||
- **Node**: Any participant in the network. Every node has an Ed25519 identity.
|
||||
- **Head node**: A node that coordinates — accepts connections, routes operations, manages cluster state. A head is also a worker (it can execute operations).
|
||||
- **Worker node**: A node that connects to a head, registers its services, and executes operations. Any worker can become a head.
|
||||
- **Service**: A named collection of operations exposed by a node (e.g., `fs`, `bash`, `compute`, `agent`). Services register via the call protocol.
|
||||
|
||||
This model allows natural mesh formation: a head can also be a worker for another head, enabling multi-hop routing, redundancy, and distributed topologies without a centralized authority.
|
||||
|
||||
## Transport Layer
|
||||
|
||||
### Architecture
|
||||
|
||||
The transport layer produces a duplex byte stream (`AsyncRead + AsyncWrite + Unpin + Send`) that the SSH layer consumes via `russh::client::connect_stream()` or `russh::server::run_stream()`. SSH is completely unaware of what transport it runs over.
|
||||
|
||||
### Transport Trait
|
||||
|
||||
```rust
use std::net::SocketAddr;

use async_trait::async_trait;
use tokio::io::{AsyncRead, AsyncWrite};

// The error type is not specified in this document; a boxed error is assumed.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

/// Client side: dials out and yields a duplex byte stream.
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}

/// Server side: accepts inbound streams and reports how they arrived.
#[async_trait]
pub trait TransportAcceptor: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn accept(&self) -> Result<(Self::Stream, TransportInfo)>;
}

#[derive(Debug, Clone)]
pub struct TransportInfo {
    pub remote_addr: Option<SocketAddr>,
    pub transport_kind: TransportKind,
}

/// Tag attached to each connection so handlers can log or apply policy per transport.
#[derive(Debug, Clone)]
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },          // NEW
    WebTransport { host: String },   // NEW (planned)
}
```
|
||||
|
||||
### Existing Transports
|
||||
|
||||
| Transport | Client | Server | Stream Type |
|-----------|--------|--------|-------------|
| TcpTransport | `TcpStream::connect(addr)` | `TcpListener::accept()` | `TcpStream` |
| TlsTransport | `TlsStream<TcpStream>` | `TlsStream<TcpStream>` | tokio_rustls |
| IrohTransport | `endpoint.connect(peer, alpn)` then `conn.open_bi()` then `join(recv, send)` | `endpoint.accept()` then `conn.accept_bi()` then `join(recv, send)` | `tokio::io::Join<RecvStream, SendStream>` |
| AcmeTlsAcceptor | Auto-provision via Let's Encrypt | ACME cert provision + TLS accept | TlsStream |
|
||||
|
||||
### Transport Chaining
|
||||
|
||||
```bash
|
||||
alknet connect --transport iroh --proxy socks5://127.0.0.1:1080
|
||||
alknet connect --transport tls --proxy socks5://127.0.0.1:1080
|
||||
```
|
||||
|
||||
`--proxy` routes outbound connections. Client: routes transport connection. Server: routes data-channel TCP targets.
|
||||
|
||||
### Stealth Mode
|
||||
|
||||
When `--stealth` is enabled with the TLS transport on port 443, the server peeks at the first bytes after the TLS handshake. If they begin with `SSH-2.0-`, it runs SSH. Otherwise it returns `HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n` and closes the connection, so that a probe sees what looks like an ordinary HTTPS site.
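
The peek decision can be sketched as a pure function over the bytes seen so far. This is only an illustration: `Peeked`, `classify`, and `DECOY_RESPONSE` are hypothetical names, and the `NeedMoreBytes` case (a short read that is still a valid prefix of the banner) is an assumption about how a real detector would handle partial reads.

```rust
/// Prefix of the identification string an SSH-2 client sends first.
const SSH_BANNER_PREFIX: &[u8] = b"SSH-2.0-";

/// Decoy reply quoted in the text above.
pub const DECOY_RESPONSE: &[u8] = b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n";

#[derive(Debug, PartialEq, Eq)]
pub enum Peeked {
    /// Hand the stream to the SSH server.
    Ssh,
    /// Write `DECOY_RESPONSE` and close.
    NotSsh,
    /// Too few bytes to decide, but nothing rules SSH out yet.
    NeedMoreBytes,
}

/// Classify the bytes peeked after the TLS handshake.
pub fn classify(peeked: &[u8]) -> Peeked {
    if peeked.len() >= SSH_BANNER_PREFIX.len() {
        if peeked.starts_with(SSH_BANNER_PREFIX) {
            Peeked::Ssh
        } else {
            Peeked::NotSsh
        }
    } else if SSH_BANNER_PREFIX.starts_with(peeked) {
        Peeked::NeedMoreBytes
    } else {
        Peeked::NotSsh
    }
}
```

Keeping the decision free of I/O makes it easy to unit-test separately from the TLS acceptor.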
|
||||
|
||||
## Call Protocol
|
||||
|
||||
### Wire Format
|
||||
|
||||
Every message is a length-prefixed JSON `EventEnvelope`:
|
||||
|
||||
```rust
pub struct EventEnvelope {
    pub r#type: String,  // "call.requested", "call.responded", etc.
    pub id: String,      // Correlation key (requestId, topic, or "" for broadcasts)
    pub payload: Value,  // JSON payload — schema depends on event type
}

// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
|
||||
|
||||
This is the same format used by `@alkdev/pubsub` adapters. The envelope is transport-agnostic — it runs over SSH channels, WebTransport streams, iroh bidirectional streams, WebSocket, Worker postMessage, or DNS queries.
|
||||
|
||||
Binary payloads are base64-encoded in the `payload` field. The envelope itself stays JSON for cross-language compatibility.
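
The framing described above (4-byte big-endian length prefix, then the UTF-8 JSON body) could look roughly like this. It is a sketch over `std::io` only; the function names and the `MAX_FRAME_LEN` cap are assumptions for the example, and the JSON body is treated as an opaque string rather than parsed.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a frame body; an assumed limit, not taken from the spec.
pub const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON body.
pub fn write_frame<W: Write>(mut out: W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = json.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    out.write_all(&len.to_be_bytes())?;
    out.write_all(json.as_bytes())
}

/// Read one frame and return its body as a string.
pub fn read_frame<R: Read>(mut input: R) -> io::Result<String> {
    let mut prefix = [0u8; 4];
    input.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        // Reject before allocating, so a hostile prefix cannot exhaust memory.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    input.read_exact(&mut body)?;
    String::from_utf8(body).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because both functions are generic over `Read`/`Write`, the same code can be exercised against an in-memory buffer in tests and against a blocking stream in a prototype.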
|
||||
|
||||
### Call Protocol Events
|
||||
|
||||
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
|
||||
|
||||
A call is just a subscribe that resolves after one event. Both `call()` and `subscribe()` send the same `call.requested` event.
|
||||
|
||||
### Operation Paths
|
||||
|
||||
```
|
||||
/{node}/{service}/{op}
|
||||
```
|
||||
|
||||
- **node** — identity prefix of the node that exposes the operation
|
||||
- **service** — logical service namespace (e.g., `fs`, `bash`, `agent`)
|
||||
- **op** — specific operation (e.g., `readFile`, `exec`, `chat`)
|
||||
|
||||
Examples:
|
||||
|
||||
| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Node `dev1`, service `fs`, op `readFile` |
| `/head/agent/chat` | Head's own `agent` service, op `chat` |
| `/head/sessions/list` | Head's `sessions` service, op `list` |
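
A parser for the `/{node}/{service}/{op}` shape might look like the sketch below. The `OperationPath` struct and the strictness rules (leading slash required, exactly three non-empty segments) are assumptions for illustration; the document does not say how malformed paths are handled.

```rust
/// Parsed form of `/{node}/{service}/{op}`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OperationPath {
    pub node: String,
    pub service: String,
    pub op: String,
}

/// Parse a path with a leading slash and exactly three non-empty segments.
pub fn parse_operation_path(path: &str) -> Option<OperationPath> {
    let rest = path.strip_prefix('/')?;
    let mut parts = rest.split('/');
    let (node, service, op) = (parts.next()?, parts.next()?, parts.next()?);
    // Reject extra segments and empty ones such as "/a//b".
    if parts.next().is_some() || [node, service, op].iter().any(|s| s.is_empty()) {
        return None;
    }
    Some(OperationPath {
        node: node.to_string(),
        service: service.to_string(),
        op: op.to_string(),
    })
}
```

A head node could use the `node` field to choose between a head-local handler and a worker connection.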
|
||||
|
||||
### PendingRequestMap
|
||||
|
||||
Manages in-flight calls and subscriptions. Correlates `call.responded` events back to the original `call.requested`:
|
||||
|
||||
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call { tx: oneshot::Sender<Result<Value>>, timeout: Instant },
    Subscribe { tx: mpsc::Sender<Result<Value>>, timeout: Option<Instant> },
}
```
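
To make the correlation behaviour concrete, here is a simplified, std-only sketch. It is not the struct above: it uses `std::sync::mpsc` for both entry kinds, represents payloads as strings, and omits timeouts. The method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// JSON text on success, or an error message. Simplified for the sketch.
type Reply = Result<String, String>;

enum PendingEntry {
    Call(Sender<Reply>),
    Subscribe(Sender<Reply>),
}

#[derive(Default)]
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    pub fn register_call(&mut self, id: &str) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Call(tx));
        rx
    }

    pub fn register_subscription(&mut self, id: &str) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), PendingEntry::Subscribe(tx));
        rx
    }

    /// Route a `call.responded` payload. Returns false for unknown ids.
    pub fn on_responded(&mut self, id: &str, payload: Reply) -> bool {
        match self.pending.remove(id) {
            // A call resolves after one event, so its entry stays removed.
            Some(PendingEntry::Call(tx)) => {
                let _ = tx.send(payload);
                true
            }
            // A subscription stays registered while its receiver is alive.
            Some(PendingEntry::Subscribe(tx)) => {
                if tx.send(payload).is_ok() {
                    self.pending.insert(id.to_string(), PendingEntry::Subscribe(tx));
                }
                true
            }
            None => false,
        }
    }

    /// Handle `call.completed` or `call.aborted`: dropping the entry closes
    /// the channel, which ends the receiver's iteration.
    pub fn finish(&mut self, id: &str) -> bool {
        self.pending.remove(id).is_some()
    }
}
```

The only difference between the two entry kinds is whether the entry survives the first `call.responded`, which mirrors the statement that a call is a subscription that resolves after one event.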
|
||||
|
||||
### Operation Registry
|
||||
|
||||
```rust
pub struct OperationSpec {
    pub name: String,                  // "/fs/readFile", "/agent/chat"
    pub namespace: String,             // "fs", "agent"
    pub op_type: OperationType,        // Query, Mutation, Subscription
    pub input_schema: Value,           // JSON Schema for input
    pub output_schema: Value,          // JSON Schema for output
    pub access_control: AccessControl, // Required scopes/resources
}

pub enum OperationType {
    Query,        // Read-only, idempotent
    Mutation,     // Side effects
    Subscription, // Streaming
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
}
```
|
||||
|
||||
Specs and handlers are separated — downstream consumers register both without modifying core:
|
||||
|
||||
```rust
|
||||
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
|
||||
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
|
||||
```
|
||||
|
||||
### Protocol Adapter Layer
|
||||
|
||||
| Transport | Channel mechanism | Direction |
|-----------|-------------------|-----------|
| SSH | Reserved `direct_tcpip` destination `alknet-control:0` | Bidirectional over SSH channel |
| WebTransport | Bidirectional stream after CONNECT | Bidirectional over WT stream |
| iroh QUIC | `open_bi()` / `accept_bi()` | Bidirectional over QUIC stream |
| WebSocket | Single WS connection | Bidirectional over WS frames |
| Worker | `postMessage` | Bidirectional over structured clone |
| DNS | Query TXT records (client) / serve TXT records (server) | Request/response over DNS |
|
||||
|
||||
### Head/Worker Architecture
|
||||
|
||||
```
   ┌─────────────────────────────────┐
   │           Head Node             │
   │                                 │
   │  Head-local services:           │
   │   /head/agent/chat              │
   │   /head/agent/complete          │
   │   /head/sessions/list           │
   │                                 │
   │  Worker registry:               │
   │   /dev1/fs/* → dev1 connection  │
   │   /browser-1/notify/* → WT conn │
   └──────┬───────┬──────────────────┘
          │       │
┌─────────▼┐  ┌───▼────────────┐
│ Worker   │  │ Browser Worker │
│ "dev1"   │  │ "browser-1"    │
│ /fs/*    │  │ /notify/*      │
└──────────┘  └────────────────┘
```
|
||||
|
||||
A head node is also a worker. Any worker can become a head. This enables mesh topologies where nodes coordinate in a peer-to-peer fashion rather than through a single centralized authority.
|
||||
|
||||
Workers register operations on connect:
|
||||
|
||||
```json
{
  "type": "call.requested",
  "id": "uuid-123",
  "payload": {
    "operationId": "/head/services/register",
    "input": {
      "node": "dev1",
      "operations": ["/fs/readFile", "/bash/exec"]
    }
  }
}
```
|
||||
|
||||
## Authentication
|
||||
|
||||
SSH connections authenticate with Ed25519 keys. Browsers use a separate mechanism in which they sign a token with the same Ed25519 keys.
|
||||
|
||||
Authentication is provided by the **auth service** — an irpc-based service that verifies credentials on demand rather than holding all keys in memory. This replaces the earlier `ArcSwap<DynamicConfig>` approach and scales to large user populations without requiring full key set reloads.
|
||||
|
||||
Peer credentials are stored in the `peer_credentials` table (fingerprint-based lookup). Account credentials are stored in the `api_keys` table (SHA-256 hash for high-entropy keys).
|
||||
|
||||
See [services.md](services.md) for the auth service protocol definition.
|
||||
|
||||
## Service Layer
|
||||
|
||||
### Architecture
|
||||
|
||||
Alknet uses an **irpc-based service layer** to decompose core responsibilities into independently testable, deployable, and replaceable components. irpc provides lightweight RPC that works both as an in-process async boundary (tokio channels) and cross-process/cross-network (QUIC streams via noq).
|
||||
|
||||
A **service** is an irpc protocol enum that defines the operations a component supports. Services run as async actors — locally they communicate via `mpsc` channels, remotely via QUIC streams. The `Client<S>` abstracts over both.
|
||||
|
||||
### Core Services
|
||||
|
||||
| Service | irpc Protocol | Purpose | Always Local? |
|---------|--------------|---------|---------------|
| **Auth** | `AuthProtocol` | Verify identities, check credentials, issue tokens | Can be remote for large-scale auth |
| **Secret** | `SecretProtocol` | Derive keys from seed, encrypt/decrypt stored secrets, key versioning | Local in single-node, remote in clustered |
| **Config** | `ConfigProtocol` | Dynamic config reload (auth keys, forwarding policy) | Local |
| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations, honker event bridge | Local or remote |
|
||||
|
||||
### Service Definition Pattern
|
||||
|
||||
Services are defined as irpc protocol enums:
|
||||
|
||||
```rust
use irpc::{rpc_requests, channel::{mpsc, oneshot}};

#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token: Vec<u8> },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,
}
```
|
||||
|
||||
### Local vs Remote
|
||||
|
||||
```rust
enum AuthClient {
    // In-process: zero-copy tokio channels
    Local(Client<AuthProtocol>),
    // Cross-process/cross-network: QUIC stream
    Remote(irpc::rpc::Client<AuthProtocol>),
}
```
|
||||
|
||||
A node that runs all services locally uses `Client::local(mpsc::channel)`. A node that delegates auth to a separate service uses `Client::remote(quinn::Connection)`. The call sites are identical — the client abstracts over both.
|
||||
|
||||
### Relationship to Call Protocol
|
||||
|
||||
Services are **internal** to a node or cluster. The call protocol is **external** — it's how nodes talk to each other over SSH/WebSocket/QUIC/DNS transports. Services handle concerns like auth and secrets that should not be part of the wire protocol but are needed by every node.
|
||||
|
||||
A service can also be exposed as a call protocol operation. For example, the secret service's `DeriveKey` could be exposed as `/head/secrets/derive` for remote workers that need key derivation but shouldn't hold the master seed.
|
||||
|
||||
### Event Boundary Discipline
|
||||
|
||||
Following the event sourcing patterns in [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md):
|
||||
|
||||
- **Honker streams** (`stream_publish`/`subscribe`) are **internal event sourcing** for the service that owns that data. They are domain events, not integration events.
|
||||
- **Call protocol `EventEnvelope`** is the **integration boundary** between nodes. Cross-node notifications are projected from domain events, not published directly.
|
||||
- **irpc service calls** are **synchronous request-response** within a node or cluster. They are not events and should not be used as such.
|
||||
|
||||
This prevents the conflation of internal state management (event sourcing), cross-service notification (integration events), and service calls (request-response).
|
||||
|
||||
## DNS Transport (Planned)
|
||||
|
||||
### Two DNS Concepts
|
||||
|
||||
1. **DNS as Transport** — Encode `EventEnvelope` frames as DNS queries/responses. Censorship resistance. Request/response maps to `call.requested`/`call.responded` naturally.
|
||||
|
||||
2. **DNS as Naming/Discovery** — Publish/resolve endpoint information via DNS TXT records (iroh-dns style). Smart contract provides on-chain `name → namespaceId + relays`. DNS transport carries the data flow when other transports are blocked.
|
||||
|
||||
### DNS as Call Protocol Transport
|
||||
|
||||
The call protocol is transport-agnostic. DNS becomes another adapter:
|
||||
|
||||
```
Transport Layer:
  SSH channel        → EventEnvelope frames → CallHandler
  WebTransport       → EventEnvelope frames → CallHandler
  iroh QUIC stream   → EventEnvelope frames → CallHandler
  DNS query/response → EventEnvelope frames → CallHandler  ← NEW
```
|
||||
|
||||
**Upstream (client → server)**: Encode `EventEnvelope` JSON as base32 DNS query labels.
|
||||
**Downstream (server → client)**: Return `EventEnvelope` JSON in TXT record responses.
|
||||
**Polling**: For `call.responded` after `call.requested`, client polls `requestId.alk.dev TXT?`.
|
||||
|
||||
The `DnsTransportAdapter` implements the same adapter pattern as `@alkdev/pubsub`'s event targets, making DNS a first-class transport for control channel operations.
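
The upstream encoding can be sketched as follows: base32-encode the envelope bytes, split the result into DNS labels, and append the tunnel domain. This is an illustrative sketch, not the adapter itself. The function names are hypothetical, the lowercase unpadded alphabet is an assumption, and the limits used (63 bytes per label, 253 characters per name) are the standard DNS limits rather than anything stated in this document.

```rust
/// RFC 4648 base32 alphabet, lowercased because DNS names are case-insensitive.
const ALPHABET: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";

/// Base32-encode without padding.
pub fn base32_nopad(data: &[u8]) -> String {
    let mut out = String::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for &byte in data {
        acc = (acc << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((acc >> bits) & 0x1f) as usize] as char);
        }
    }
    if bits > 0 {
        // Left-align the remaining bits in a final 5-bit group.
        out.push(ALPHABET[((acc << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// Build a query name from an envelope. Returns `None` when the input is
/// empty or the resulting name would exceed 253 characters.
pub fn encode_query_name(envelope_json: &str, domain: &str) -> Option<String> {
    let encoded = base32_nopad(envelope_json.as_bytes());
    if encoded.is_empty() {
        return None;
    }
    let labels: Vec<&str> = encoded
        .as_bytes()
        .chunks(63)
        .map(|c| std::str::from_utf8(c).expect("base32 output is ASCII"))
        .collect();
    let name = format!("{}.{}", labels.join("."), domain);
    if name.len() <= 253 {
        Some(name)
    } else {
        None
    }
}
```

The `None` case for long inputs shows why a single query can only carry a small envelope: larger messages would need to be split across several queries.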
|
||||
|
||||
### DNS as Full Transport (SSH Tunneling)
|
||||
|
||||
Full-duplex SSH tunneling over DNS requires a framing protocol:
|
||||
- Chunk SSH data into fixed-size frames (e.g., 220-byte frames with 4-byte header for seq/ack)
|
||||
- Encode upstream in base32 subdomain labels
|
||||
- Encode downstream in TXT records or CNAME targets
|
||||
- Handle resequencing and retransmission
|
||||
|
||||
This has higher latency and low throughput (~1-50 KB/s), but it works when all other transports are blocked. That is fine for interactive SSH. Log a warning at connect time.
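
The chunking and resequencing steps listed above might be prototyped like this. The header layout (a 16-bit sequence number followed by a 16-bit ack, both big-endian) is an assumption; the text only says the header is 4 bytes carrying seq/ack. Retransmission is left out, and the last frame of a message is allowed to be shorter than 220 bytes.

```rust
pub const FRAME_LEN: usize = 220;
pub const HEADER_LEN: usize = 4;
pub const MAX_PAYLOAD: usize = FRAME_LEN - HEADER_LEN;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Frame {
    pub seq: u16,
    pub ack: u16,
    pub payload: Vec<u8>,
}

impl Frame {
    /// Serialize as seq (2 bytes), ack (2 bytes), then the payload.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN + self.payload.len());
        out.extend_from_slice(&self.seq.to_be_bytes());
        out.extend_from_slice(&self.ack.to_be_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    pub fn decode(bytes: &[u8]) -> Option<Frame> {
        if bytes.len() < HEADER_LEN || bytes.len() > FRAME_LEN {
            return None;
        }
        Some(Frame {
            seq: u16::from_be_bytes([bytes[0], bytes[1]]),
            ack: u16::from_be_bytes([bytes[2], bytes[3]]),
            payload: bytes[HEADER_LEN..].to_vec(),
        })
    }
}

/// Split a byte run into frames with consecutive, wrapping sequence numbers.
pub fn chunk(data: &[u8], first_seq: u16, ack: u16) -> Vec<Frame> {
    data.chunks(MAX_PAYLOAD)
        .enumerate()
        .map(|(i, part)| Frame {
            seq: first_seq.wrapping_add(i as u16),
            ack,
            payload: part.to_vec(),
        })
        .collect()
}

/// Put frames back in order relative to `first_seq` and join the payloads.
pub fn reassemble(mut frames: Vec<Frame>, first_seq: u16) -> Vec<u8> {
    frames.sort_by_key(|f| f.seq.wrapping_sub(first_seq));
    frames.into_iter().flat_map(|f| f.payload).collect()
}
```

Sorting by distance from `first_seq` rather than by raw `seq` keeps the order correct when the 16-bit counter wraps around.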
|
||||
|
||||
### iroh-dns Relationship
|
||||
|
||||
iroh-dns publishes `EndpointInfo` via `_iroh.<z32-endpoint-id>.<origin> TXT` records. alknet can extend this:
|
||||
|
||||
- Add `tunnel=dnst.example.com` attribute to indicate DNS transport availability
|
||||
- Use iroh-dns `DnsResolver` for endpoint discovery
|
||||
- When a client sees the `tunnel` attribute and QUIC is blocked, fall back to DNS transport
|
||||
|
||||
### DnsTransport Implementation Sketch
|
||||
|
||||
```rust
#[cfg(feature = "dns")]
mod dns;

pub struct DnsTransport {
    domain: String,           // e.g. "t.alk.dev"
    resolver_addr: SocketAddr,
    protocol: DnsProtocol,    // Udp, Tcp, Tls, Https
    auth_token: Option<String>,
}

pub struct DnsAcceptor {
    domain: String,
    listen_addr: SocketAddr,
    protocol: DnsProtocol,
}

// DnsStream: virtual duplex backed by DNS poll/push
// Uses tokio::io::duplex() internally with a background task that:
// - Chunks outgoing bytes into DNS queries (client) or response records (server)
// - Reassembles incoming DNS payloads into the read buffer
// - Handles ACK/NACK for reliability
```
|
||||
|
||||
### DnsProtocol in iroh-dns
|
||||
|
||||
iroh-dns already supports multiple DNS protocols:
|
||||
|
||||
```rust
pub enum DnsProtocol {
    Udp,   // Classic DNS
    Tcp,   // DNS over TCP
    Tls,   // DNS over TLS (DoT) — RFC 7858
    Https, // DNS over HTTPS (DoH) — RFC 8484
}
```
|
||||
|
||||
alknet's DNS transport should support all of these. DoH (port 443, looks like HTTPS) is particularly valuable for censorship resistance since it's indistinguishable from normal web traffic.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| 001 | Pluggable transport | Transport trait produces stream, SSH consumes it |
| 003 | iroh stream join | `tokio::io::join` combines QUIC halves |
| 004 | SSH over transport | SSH never touches TCP/iroh/TLS directly |
| 008 | ACME/Let's Encrypt | Auto-provision TLS certs |
| 009 | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| 010 | Transport chaining | `--proxy` works with all transports natively |
| 017 | Stealth mode | Peek first bytes, return 404 for non-SSH on port 443 |
| 018 | Control channel for pubsub | Reserved destination for event bus |
| 019 | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| 023 | Unified auth | Shared Ed25519 key material across auth mechanisms |
| 024 | Bidirectional call protocol | Both sides can call, generalized from ADR-018 |
| 025 | Handler/spec separation | Downstream registers operations without modifying core |
| 026 | Head/worker terminology | Replace hub/spoke with head/worker; any node can be a head |
| 027 | Service layer via irpc | Core responsibilities decomposed into irpc service protocols |
| 028 | Auth as service | Auth verification via irpc service, not in-memory key set |
|
||||
|
||||
## References
|
||||
|
||||
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
|
||||
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
|
||||
- `@alkdev/flowgraph` — TypeScript operation graph and call graph (planned Rust port)
|
||||
- `@alkdev/storage` — TypeScript metagraph, identity, ACL (planned Rust port as `alknet-storage`)
|
||||
- `@alkdev/dispatch` — Instance management service (head+worker architecture reference)
|
||||
- iroh-dns — DNS resolver and endpoint info (naming/discovery)
|
||||
- iroh-live-relay — WebTransport relay (planned transport reference)
|
||||
- irpc — iroh streaming RPC (service layer, async boundaries)
|
||||
- [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md) — Event-driven architecture patterns and anti-patterns
|
||||
@@ -1,91 +0,0 @@
|
||||
|
||||
|
||||
|
||||
Here is an article tailored specifically to untangle these concepts. It is structured not just as a conceptual guide, but as a **diagnostic tool**—perfect for feeding into an AI coding CLI to sniff out architectural smells and "spaghetti concepts" in a codebase.
|
||||
|
||||
***
|
||||
|
||||
# Deconstructing Event-Driven Architecture: Untangling "Spaghetti Concepts"
|
||||
|
||||
In modern software architecture, the term "Event" has fallen victim to *semantic diffusion*—a concept popularized by Martin Fowler where a term becomes so widely used that it loses its original, specific meaning. When developers use the same word to describe state persistence, data distribution, and asynchronous notifications, the result is "Spaghetti Concepts."
|
||||
|
||||
Just like spaghetti code, spaghetti concepts lead to tight coupling, brittle systems, and unpredictable side effects. To fix an Event-Driven Architecture (EDA), we must draw hard boundaries around what an "event" is actually doing in any given context.
|
||||
|
||||
This guide breaks down the distinct types of events, their proper use cases, and the structural anti-patterns (Conflation Points) that occur when they are mixed up.
|
||||
|
||||
---
|
||||
|
||||
## 1. Event Sourcing (State Persistence)
|
||||
**The Concept:** Event Sourcing is a method of persisting state. Instead of saving the *current* state of an entity (e.g., `Quantity: 27`) in a database row, you save the *history of facts* that led to that state (e.g., `Received 30`, `Shipped 5`, `Adjusted +2`). The current state is derived by replaying these facts.
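
As a small illustration of "state derived by replaying facts", the inventory example above can be written as a fold over the event history. The type and function names (`InventoryEvent`, `replay`) are made up for this sketch.

```rust
/// The facts recorded for one inventory item (illustrative names).
#[derive(Debug, Clone, Copy)]
pub enum InventoryEvent {
    Received(u32),
    Shipped(u32),
    Adjusted(i32),
}

/// Rebuild the current quantity by replaying every fact in order.
pub fn replay(events: &[InventoryEvent]) -> i64 {
    events.iter().fold(0i64, |qty, event| match event {
        InventoryEvent::Received(n) => qty + i64::from(*n),
        InventoryEvent::Shipped(n) => qty - i64::from(*n),
        InventoryEvent::Adjusted(delta) => qty + i64::from(*delta),
    })
}
```

Replaying `Received(30)`, `Shipped(5)`, `Adjusted(2)` yields 27, the quantity a conventional row would have stored directly.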
|
||||
|
||||
**The Golden Rule:** Event Sourcing is an **internal implementation detail** of a specific service or Aggregate. It is highly specific to the domain logic.
|
||||
|
||||
**How to Identify It:**
|
||||
* Uses a specialized stream database (like EventStoreDB).
|
||||
* Events are named in the past tense representing highly specific domain actions (`InventoryAdjusted`, `OrderPlaced`).
|
||||
* The system reads a stream of these events to reconstruct an object in memory before applying new business rules.
|
||||
|
||||
### 🚨 Conflation Point: Leaking the Event Store (The Database Reach-In)
|
||||
**The Smell:** Service B connects directly to Service A’s event store to read its events and react to them.
|
||||
**Why it’s bad:** Because Event Sourcing events are internal state, exposing them externally completely shatters Service A's encapsulation. If Service A refactors how it calculates inventory, Service B breaks.
|
||||
**The Fix:** Service A should project its internal Event Sourcing events into generalized **Integration Events** (see below) and publish those to a message broker (like RabbitMQ or Kafka) for Service B to consume.
|
||||
|
||||
---
|
||||
|
||||
## 2. Event-Carried State Transfer (Data Distribution)
|
||||
**The Concept:** Also known as "Fat Events," this pattern is used to distribute data across services to avoid synchronous API calls (temporal coupling). If Service B needs to know about a Product's price to calculate a shopping cart total, Service A publishes an event containing the *entire* current state of that product. Service B listens to this event and builds a local, read-only cache (a projection).
|
||||
|
||||
**The Golden Rule:** These events exist to answer the question, *"What does the data look like now?"* without requiring a synchronous HTTP callback.
|
||||
|
||||
**How to Identify It:**
|
||||
* Events often have generic CRUD-like names (`ProductUpdated`, `CustomerCreated`).
|
||||
* Payloads are "fat"—they contain a lot of data (ID, Name, Price, Category, etc.).
|
||||
* Often implemented using Change Data Capture (CDC) tools like Debezium reading from a primary database and publishing to Kafka.
|
||||
|
||||
### 🚨 Conflation Point: Event Sourcing vs. State Transfer
|
||||
**The Smell:** Using a state transfer tool (like Debezium publishing `RowUpdated` events) as a makeshift Event Sourcing log to derive business logic.
|
||||
**Why it’s bad:** A database row update doesn't tell you *why* the data changed. Was a user's address updated because they moved, or because there was a typo? Business intent is lost.
|
||||
**The Fix:** Keep CDC and state transfer events strictly for updating local read-caches in downstream services. Do not use them to drive complex business workflows that rely on "intent."
|
||||
|
||||
---
|
||||
|
||||
## 3. Notification Events (Behavioral Triggers)
|
||||
**The Concept:** Also known as "Thin Events," these are lean messages broadcast to notify the system that a business milestone has occurred. They usually contain minimal data, often just an entity ID and an action.
|
||||
|
||||
**The Golden Rule:** They act as an asynchronous "tap on the shoulder" to tell downstream services to trigger their own workflows (Choreography).
|
||||
|
||||
**How to Identify It:**
|
||||
* Payloads are "thin" (e.g., `{ "Event": "OrderShipped", "OrderId": "123" }`).
|
||||
* Used heavily in integrations (e.g., triggering an email via AWS SES, or notifying a shipping warehouse).
|
||||
|
||||
### 🚨 Conflation Point: The Synchronous Callback Trap (Boomerang Coupling)
|
||||
**The Smell:** Service A publishes a thin `OrderPlaced` event. Service B receives it, but to do its job, it must immediately make a synchronous HTTP REST call back to Service A to fetch the order details.
|
||||
**Why it’s bad:** If Service A goes down, Service B fails. You have adopted Event-Driven Architecture but kept the exact synchronous temporal coupling you were trying to eliminate. Furthermore, a burst of events becomes a burst of callbacks, which can amount to a self-inflicted denial-of-service on Service A.
|
||||
**The Fix:** If downstream services *always* need the data to process the event, upgrade the Notification Event to an Event-Carried State Transfer ("Fat Event") by including the required data in the payload.
|
||||
|
||||
---
|
||||
|
||||
## 4. Domain Events vs. Integration Events (The Boundary Rule)
|
||||
*Own Insight / DDD Integration*
|
||||
|
||||
A massive source of spaghetti concepts is failing to differentiate between events meant for *inside* the house and events meant for *outside* the house.
|
||||
|
||||
* **Domain Events:** Fired and consumed *within the same service boundary*. They can contain rich, complex, internal domain models because the producer and consumer share the same codebase/ubiquitous language.
|
||||
* **Integration Events:** Fired *across service boundaries*. They should be simple, generalized, and stripped of internal jargon or complex objects.
|
||||
|
||||
### 🚨 Conflation Point: The Leaky Domain Model
|
||||
**The Smell:** A microservice publishes an event to a global Kafka topic, and the payload contains internal database IDs, complex nested objects, or serialized language-specific data types (like Java/C# specific enums).
|
||||
**Why it’s bad:** Downstream services are now strictly coupled to the internal data structure of the upstream service.
|
||||
**The Fix:** Implement an Anti-Corruption Layer. The producing service should catch its own Domain Event, map the data to a simplified, standardized, versioned Integration Event, and publish *that* to the wider system.
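
A minimal sketch of such a mapping step follows. All types here (`OrderPlaced`, `InternalStatus`, `OrderPlacedV1`) and the function `to_integration` are hypothetical; the point is only that the published shape is chosen deliberately rather than serialized from the internal model.

```rust
/// Internal domain event: may carry storage IDs and internal enums.
pub struct OrderPlaced {
    pub db_row_id: u64,
    pub order_number: String,
    pub status: InternalStatus,
    pub total_cents: u64,
}

/// Internal workflow state that downstream services should never see.
pub enum InternalStatus {
    PendingFraudCheck,
    Cleared,
}

/// Versioned integration event: only what other services need.
#[derive(Debug, PartialEq)]
pub struct OrderPlacedV1 {
    pub schema_version: u32,
    pub order_id: String,
    pub total_cents: u64,
}

/// Map the domain event to the public contract, dropping internal details.
pub fn to_integration(event: &OrderPlaced) -> OrderPlacedV1 {
    OrderPlacedV1 {
        schema_version: 1,
        order_id: event.order_number.clone(),
        total_cents: event.total_cents,
    }
}
```

Because `db_row_id` and `status` never leave the service, they can be refactored without breaking consumers.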
|
||||
|
||||
---
|
||||
|
||||
## Code Review CLI Prompt: "The Conflation Detector"
|
||||
*(Note: Feed the following heuristics to your AI CLI alongside this article to review your codebase).*
|
||||
|
||||
**AI CLI Instructions:** Scan the provided codebase for Event-Driven Architecture anti-patterns. Flag code that violates the conceptual boundaries described in the article. Look specifically for:
|
||||
1. **Shared Event Stores:** Are multiple distinct microservices connecting to the same EventStoreDB or reading the exact same raw Event Sourcing stream?
|
||||
2. **Boomerang Callbacks:** Is an event consumer receiving a message from a broker (RabbitMQ/Kafka/Azure Service Bus), extracting an ID, and immediately making an HTTP request to the service that originated the event?
|
||||
3. **Leaky Domain Models:** Are internal entity objects (e.g., classes mapped directly to ORMs like Entity Framework or Hibernate) being serialized directly into event payloads sent to external message brokers?
|
||||
4. **Misused CDC:** Are Debezium/database-trigger events being used to trigger business logic workflows, rather than simply updating read-models/caches?
|
||||
5. **Fat Notification Trap:** Are Notification events carrying massive payloads just to trigger an email, when a thin event would suffice? Or conversely, are thin events starving consumers of necessary data?
|
||||
@@ -1,773 +0,0 @@
|
||||
# SSH Tunnel VPN Alternative — Feasibility Assessment
|
||||
|
||||
**Date**: 2026-06-01
|
||||
**Status**: Feasibility assessment / architecture sketch
|
||||
**Updated**: 2026-06-01 — Added iroh transport analysis (§11)
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
Countries in the "developed west" (UK, CA, etc.) are increasingly banning or restricting VPNs at the protocol level. The valid use case of a VPN — a *virtual private network* for securing traffic on hostile networks, accessing private infrastructure, and tunneling between trusted endpoints — gets caught in the crossfire when VPNs are treated primarily as location-spoofing tools.
|
||||
|
||||
SSH-based tunnels cover the same functional ground without being a VPN protocol. Blocking SSH would break the internet in critical ways (infrastructure management, CI/CD, development workflows). The goal is to build a dead-simple, self-hostable Rust client/server that provides VPN-like functionality over SSH, with optional TLS wrapping for traffic obfuscation.
|
||||
|
||||
## 2. Reference Codebase Analysis
|
||||
|
||||
### 2.1 Dispatch (`/workspace/@alkdev/dispatch`)
|
||||
|
||||
Dispatch demonstrates that the russh usage this project needs is well within scope. Key takeaways:
|
||||
|
||||
- **Pure SSH client** — `client::Handler` is a zero-sized type, auto-accepts server keys. Minimal boilerplate.
|
||||
- **Arc-wrapped Handle pattern** — `Arc<client::Handle<Client>>` enables sharing across concurrent tasks (port forwarding, SFTP, exec).
|
||||
- **Port forwarding via `channel_open_direct_tcpip`** — Already implemented. Local TCP listener → `direct-tcpip` SSH channel → `tokio::io::copy_bidirectional`. This is the standard SSH `-L` pattern, implemented programmatically.
|
||||
- **Channel-per-operation model** — Each operation opens its own SSH channel on a shared session. Multiplexing is handled by russh internally.
|
||||
- **Channel.into_stream()** — Converts SSH channels to `AsyncRead + AsyncWrite` streams, enabling use with any tokio I/O combinator.
|
||||
|
||||
The dispatch codebase is clean and demonstrates that the core SSH mechanics are straightforward. The new project would need both client **and** server sides, but russh's server API mirrors the client API closely.
|
||||
|
||||
### 2.2 russh (`/workspace/russh`)
|
||||
|
||||
Critical capabilities confirmed:
|
||||
|
||||
| Feature | API | Status |
|---------|-----|--------|
| Local port forwarding (client → server → remote) | `Handle::channel_open_direct_tcpip()` | Available, no feature flag |
| Remote port forwarding (server listens, client gets channels) | `Handle::tcpip_forward()` / Handler callback `server_channel_open_forwarded_tcpip()` | Available, no feature flag |
| Unix socket forwarding | `Handle::channel_open_direct_streamlocal()` / `Handle::streamlocal_forward()` | Available, no feature flag |
| Server-side reverse forwarding | `server::Handler::tcpip_forward()` / `server::Handle::forward_tcpip()` | Available, no feature flag |
| Arbitrary stream transport | `client::connect_stream()` / `server::run_stream()` | **Both accept `AsyncRead+AsyncWrite+Unpin+Send`** |
| Channel as bidirectional stream | `Channel::into_stream()` / `split()` | Available |
|
||||
|
||||
**The `connect_stream()` and `run_stream()` APIs are the key enabler for TLS wrapping.** They accept any async byte stream, meaning we can layer TLS (via `tokio-rustls`) underneath russh without modifying russh itself. The SSH session runs over a TLS stream, which looks like HTTPS to DPI.
|
||||
|
||||
## 3. Architecture Sketch
|
||||
|
||||
### 3.1 Components
|
||||
|
||||
```
|
||||
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
|
||||
│ CLIENT │ │ SERVER │
|
||||
│ │ │ │
|
||||
│ ┌──────────┐ ┌───────────┐ │ │ ┌───────────┐ ┌──────────┐ │
|
||||
│ │ TUN │ │ SSH │ │ SSH │ │ SSH │ │ Proxy │ │
|
||||
│ │ Interface│───▶│ Client │──┼─ over ──▶│ Server │───▶│ Handler │ │
|
||||
│ │ (tun-rs)│◀───│ (russh) │ │ TLS │ (russh) │◀───│ │ │
|
||||
│ └──────────┘ └─────┬─────┘ │ opt. │ └─────┬─────┘ └────┬─────┘ │
|
||||
│ │ │ │ │ │ │
|
||||
│ ┌─────▼─────┐ │ │ ┌─────▼─────┐ ┌────▼─────┐ │
|
||||
│ │ TLS Layer │ │ │ │ TLS Layer │ │ Outbound │ │
|
||||
│ │(tokio- │ │ │ │(tokio- │ │ Proxy │ │
|
||||
│ │ rustls) │ │ │ │ rustls) │ │(SOCKS5/ │ │
|
||||
│ └─────┬─────┘ │ │ └─────┬─────┘ │ HTTP) │ │
|
||||
│ │ │ │ │ └────┬─────┘ │
|
||||
│ ┌─────▼─────┐ │ │ ┌─────▼─────┐ │ │
|
||||
│ │ TCP │ │ │ │ TCP │ ┌────▼─────┐ │
|
||||
│ │ Connect │◀─┼────────▶│ │ Listener │ │ Direct │ │
|
||||
│ └───────────┘ │ │ └───────────┘ │ Forward │ │
|
||||
│ │ │ └────┬─────┘ │
|
||||
└─────────────────────────────────┘ └─────────────────────────────────┘
|
||||
│ │
|
||||
Proxy Mode Direct Mode
|
||||
(outbound via (outbound
|
||||
SOCKS5/HTTP) direct TCP)
|
||||
```
|
||||
|
||||
### 3.2 Data Flow — Client TUN Mode
|
||||
|
||||
1. **TUN interface** (created via `tun-rs`) captures IP packets from the OS routing table
|
||||
2. **Client reads IP packets** from the TUN device, determines destination IP:port
|
||||
3. **Client opens `direct-tcpip` SSH channel** to destination via `handle.channel_open_direct_tcpip(dest_ip, dest_port, ...)`
|
||||
4. **Client writes packet payload** to the SSH channel, reads response
|
||||
5. **Client writes response** back to TUN interface
|
||||
|
||||
This is essentially what tun2proxy does, except instead of SOCKS5 upstream, it's an SSH channel.
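
Step 2 ("determines destination IP:port") can be sketched with nothing but the standard library. The example below is a simplified illustration that handles IPv4 + TCP only and ignores fragmentation, IPv6, and UDP; `FlowKey` and `parse_ipv4_tcp` are hypothetical names. The resulting 4-tuple is what a client would use to map packets to SSH channels.

```rust
use std::net::Ipv4Addr;

/// Connection 4-tuple used to look up the SSH channel for a packet.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FlowKey {
    pub src: Ipv4Addr,
    pub src_port: u16,
    pub dst: Ipv4Addr,
    pub dst_port: u16,
}

/// Extract the flow key from a raw IPv4 packet carrying TCP.
/// Returns `None` for anything else (IPv6, UDP, ICMP, truncated packets).
pub fn parse_ipv4_tcp(pkt: &[u8]) -> Option<FlowKey> {
    // Minimum IPv4 header is 20 bytes; the version lives in the high nibble.
    if pkt.len() < 20 || pkt[0] >> 4 != 4 {
        return None;
    }
    // IHL is the header length in 32-bit words.
    let ihl = usize::from(pkt[0] & 0x0f) * 4;
    // Protocol number 6 is TCP; we need at least the two port fields.
    if ihl < 20 || pkt[9] != 6 || pkt.len() < ihl + 4 {
        return None;
    }
    Some(FlowKey {
        src: Ipv4Addr::new(pkt[12], pkt[13], pkt[14], pkt[15]),
        dst: Ipv4Addr::new(pkt[16], pkt[17], pkt[18], pkt[19]),
        src_port: u16::from_be_bytes([pkt[ihl], pkt[ihl + 1]]),
        dst_port: u16::from_be_bytes([pkt[ihl + 2], pkt[ihl + 3]]),
    })
}
```

Since `FlowKey` derives `Hash` and `Eq`, it can serve directly as the key of a `HashMap` from flows to open channels.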
|
||||
|
||||
### 3.3 Data Flow — TLS Obfuscation Mode
|
||||
|
||||
When `--tls` or `--https` is specified:
|
||||
|
||||
1. **Client establishes TLS connection** to `server:443` using `tokio-rustls::TlsStream`
|
||||
2. **SSH session runs over the TLS stream** via `client::connect_stream(Arc::new(config), tls_stream, handler)`
|
||||
3. **Server accepts TLS connection**, then runs `server::run_stream(server_config, tls_stream, handler)`
|
||||
4. **To DPI, the traffic looks like HTTPS** — standard TLS handshake, then encrypted application data
|
||||
5. Optional: Server can present a legitimate-looking certificate and serve a fake nginx 404 to non-SSH probes (similar to https_proxy's stealth approach)
|
||||
|
||||
### 3.4 Data Flow — Server-Side Proxy Mode
|
||||
|
||||
When `--proxy` is specified on the server:
|
||||
|
||||
1. Client requests `channel_open_direct_tcpip(target_host, target_port, ...)`
|
||||
2. Server's `channel_open_direct_tcpip` handler checks ACLs
|
||||
3. Instead of connecting directly, server routes through a local SOCKS5/HTTP proxy
|
||||
4. This provides an additional hop for privacy — the SSH server's IP isn't exposed to the destination
|
||||
|
||||
### 3.5 CLI Interface Sketch
|
||||
|
||||
```bash
|
||||
# Server — simplest mode (SSH only, port 22)
|
||||
ghost serve --key /etc/ssh/ssh_host_ed25519_key
|
||||
|
||||
# Server — with TLS on port 443
|
||||
ghost serve --key /etc/ssh/ssh_host_ed25519_key --tls --tls-cert /etc/ssl/cert.pem --tls-key /etc/ssl/key.pem
|
||||
|
||||
# Server — with TLS + outbound proxy
|
||||
ghost serve --key /etc/ssh/ssh_host_ed25519_key --tls --tls-cert /etc/ssl/cert.pem --tls-key /etc/ssl/key.pem --proxy socks5://127.0.0.1:9050
|
||||
|
||||
# Client — TUN mode (routes all traffic through SSH tunnel)
|
||||
ghost connect --server example.com:443 --tls --identity ~/.ssh/id_ed25519 --tun
|
||||
|
||||
# Client — Single port forward (like SSH -L)
|
||||
ghost connect --server example.com:443 --tls --identity ~/.ssh/id_ed25519 --forward 5432:db.internal:5432
|
||||
|
||||
# Client — SOCKS5 proxy mode (local SOCKS5 that tunnels through SSH)
|
||||
ghost connect --server example.com:443 --tls --identity ~/.ssh/id_ed25519 --socks5 1080
|
||||
```
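
For reference, parsing the `--forward` argument shown above (`5432:db.internal:5432`) could look like the sketch below. The `ForwardSpec` type and `parse_forward` function are hypothetical, and the format is assumed to be `local_port:remote_host:remote_port`; IPv6 literals are not specially handled.

```rust
/// Parsed form of `--forward local_port:remote_host:remote_port`.
#[derive(Debug, PartialEq)]
pub struct ForwardSpec {
    pub local_port: u16,
    pub remote_host: String,
    pub remote_port: u16,
}

/// Parse a forward spec; returns `None` on any malformed part.
pub fn parse_forward(spec: &str) -> Option<ForwardSpec> {
    // The first colon ends the local port, the last colon starts the remote port.
    let (local, rest) = spec.split_once(':')?;
    let (host, remote) = rest.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some(ForwardSpec {
        local_port: local.parse().ok()?,
        remote_host: host.to_string(),
        remote_port: remote.parse().ok()?,
    })
}
```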
|
||||
|
||||
**Working name: `ghost`** (as in "ghost in the shell" — it's SSH, it's stealthy, it passes through walls). Or `shade`, `wraith`, `spectre`. Pick anything.
|
||||
|
||||
## 4. Key Technical Decisions & Unknowns Analysis
|
||||
|
||||
### 4.1 TUN Interface — SOLVED
|
||||
|
||||
**Library: `tun-rs` (v2, formerly `tun` crate)**
|
||||
|
||||
- Supports Linux, macOS, Windows (via wintun.dll), FreeBSD, OpenBSD, NetBSD, Android, iOS
|
||||
- Async API with `tokio` feature: `DeviceBuilder::new().build_async()`
|
||||
- Clean `recv()` / `send()` API — read IP packets, write IP packets
|
||||
- Already used in production by tun2proxy and similar projects
|
||||
- Supports hardware offload (TSO/GSO) on Linux for performance
|
||||
- No `CAP_NET_ADMIN` needed on some platforms when using `--unshare` namespace approach (tun2proxy pattern)
|
||||
|
||||
**This is a solved problem.** The `tun-rs` crate is mature, cross-platform, and async-native with tokio. The implementation is straightforward:
|
||||
|
||||
```rust
|
||||
let dev = DeviceBuilder::new()
|
||||
.ipv4("10.0.0.1", 24, None)
|
||||
.mtu(1400)
|
||||
.build_async()?;
|
||||
|
||||
let mut buf = vec![0u8; 65536];
|
||||
loop {
|
||||
let len = dev.recv(&mut buf).await?;
|
||||
// Parse IP header, determine destination
|
||||
// Open SSH channel to destination
|
||||
// Write response back to TUN
|
||||
}
|
||||
```
|
||||
|
||||
**Key consideration**: On Linux requires `CAP_NET_ADMIN` or root. The tun2proxy approach of using network namespaces (`--unshare`) is worth adopting for unprivileged operation.
|
||||
|
||||
### 4.2 SSH over TLS — SOLVED (architecturally)
|
||||
|
||||
**Approach: Layer TLS beneath SSH using russh's `connect_stream` / `run_stream`**
|
||||
|
||||
This is the critical insight. russh already decouples transport from protocol:
|
||||
|
||||
- `client::connect_stream(config, stream, handler)` — accepts any `AsyncRead + AsyncWrite + Unpin + Send`
|
||||
- `server::run_stream(config, stream, handler)` — same for server
|
||||
|
||||
This means:
|
||||
|
||||
```rust
// Client side
let tcp_stream = TcpStream::connect((server_addr, server_port)).await?;
// `tls_connector` is a tokio_rustls::TlsConnector; `server_domain` is a rustls ServerName
let tls_stream = tls_connector.connect(server_domain, tcp_stream).await?;
let handle = client::connect_stream(config, tls_stream, handler).await?;

// Server side
let (tcp_stream, _addr) = tcp_listener.accept().await?;
// `tls_acceptor` is a tokio_rustls::TlsAcceptor
let tls_stream = tls_acceptor.accept(tcp_stream).await?;
server::run_stream(config, tls_stream, handler).await?;
```
|
||||
|
||||
**No modification to russh is needed.** This is a clean layering.
|
||||
|
||||
**For HTTPS stealth**: The server can:
|
||||
1. Accept connections on port 443
|
||||
2. Present a valid TLS certificate (self-signed or Let's Encrypt via ACME)
|
||||
3. Non-SSH clients making HTTP requests get a normal-looking 404 response
|
||||
4. SSH clients speak SSH protocol directly after TLS handshake
|
||||
5. DPI sees standard HTTPS traffic since the TLS handshake is normal
|
||||
|
||||
The https_proxy project demonstrates this pattern well — stealth proxy returning fake nginx 404s to probes.
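
The "peek first bytes" decision behind the stealth behavior can be sketched as a pure function over the first decrypted bytes. This is an illustration under stated assumptions: `Probe`, `classify`, and `FAKE_404` are hypothetical names, the HTTP method list is not exhaustive, and the canned response only imitates a generic nginx-style 404. The `SSH-2.0-` prefix is the identification string defined by the SSH transport protocol.

```rust
/// What the first bytes after the TLS handshake look like.
#[derive(Debug, PartialEq)]
pub enum Probe {
    Ssh,
    Http,
    Unknown,
}

/// Canned reply for HTTP probes (assumed shape, imitating a bare web server).
pub const FAKE_404: &str =
    "HTTP/1.1 404 Not Found\r\nServer: nginx\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";

/// Classify a connection from the bytes peeked at its start.
pub fn classify(first_bytes: &[u8]) -> Probe {
    // Common request-line prefixes; extend as needed.
    const HTTP_METHODS: [&[u8]; 5] = [b"GET ", b"POST ", b"HEAD ", b"PUT ", b"OPTIONS "];

    if first_bytes.starts_with(b"SSH-2.0-") {
        Probe::Ssh
    } else if HTTP_METHODS.iter().any(|m| first_bytes.starts_with(m)) {
        Probe::Http
    } else {
        Probe::Unknown
    }
}
```

An `Ssh` result would hand the stream to `server::run_stream`, an `Http` result would be answered with `FAKE_404`, and `Unknown` connections can simply be closed.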
|
||||
|
||||
### 4.3 IP Packet Handling — NEEDS DESIGN
|
||||
|
||||
When using TUN mode, we're receiving raw IP packets. We need to:
|
||||
|
||||
1. **Parse IP headers** to determine destination IP and port
|
||||
2. **Track connection state** — map `(src_ip, src_port, dst_ip, dst_port)` to SSH channels
|
||||
3. **TCP reassembly** — handle segmentation, retransmission, etc.
|
||||
4. **ICMP handling** — respond to pings, handle unreachable destinations
|
||||
5. **DNS interception** — handle DNS queries that arrive at the TUN interface
|
||||
|
||||
This is the most complex part. Options:
|
||||
|
||||
**Option A: Use a userspace TCP/IP stack (smoltcp)**
|
||||
- Parse packets, but let a userspace stack handle TCP
|
||||
- Heavier dependency, but proven approach (what tun2proxy does with its own stack)
|
||||
- `smoltcp` is well-maintained, used in embedded and networking projects
|
||||
|
||||
**Option B: Raw packet forwarding with NAT**
|
||||
- Simpler conceptually — just NAT the packets, forward them through the SSH channel
|
||||
- Requires handling TCP state at the IP level (seq/ack manipulation, checksum recalculation)
|
||||
- More error-prone
|
||||
|
||||
**Option C: SOCKS5 proxy mode only (no TUN)**
|
||||
- Simplest to implement — just a local SOCKS5 server that forwards through SSH
|
||||
- Browsers, curl, and most apps can use SOCKS5
|
||||
- No root/CAP_NET_ADMIN needed
|
||||
- But: doesn't capture all traffic (UDP, DNS leaks, etc.)
|
||||
|
||||
**Recommendation**: Start with Option C (SOCKS5 proxy mode) as the minimal viable product. Add TUN mode (Option A with smoltcp) as an advanced feature. This matches how tun2proxy structures their project and is the pragmatic path.
|
||||
|
||||
### 4.4 SSH Server Authentication — STRAIGHTFORWARD
|
||||
|
||||
The server implementation needs:
|
||||
|
||||
- **Public key authentication** — primary method, matching standard SSH practices
|
||||
- **`authorized_keys` file support** — read `~/.ssh/authorized_keys` or a custom path
|
||||
- **Optional password authentication** — for convenience, but not recommended for production
|
||||
|
||||
russh's `server::Handler` trait provides `auth_publickey` and `auth_password` callbacks. Implementation is trivial:
|
||||
|
||||
```rust
async fn auth_publickey(&mut self, _user: &str, public_key: &PublicKey) -> Result<Auth, Self::Error> {
    // Accept only keys that appear in the loaded authorized_keys set
    if self.authorized_keys.iter().any(|k| k == public_key) {
        Ok(Auth::Accept)
    } else {
        Ok(Auth::Reject { proceed_with_methods: None, partial_success: false })
    }
}
```
|
||||
|
||||
### 4.5 DNS Handling — DESIGN DECISION NEEDED
|
||||
|
||||
In TUN mode, DNS queries need to be routed through the tunnel. Options:
|
||||
|
||||
1. **Virtual DNS** (tun2proxy approach) — intercept DNS packets, map query names to fake IPs from a reserved range (198.18.0.0/15), resolve via the SSH tunnel
|
||||
2. **DNS-over-TCP** — Force DNS through the SSH tunnel
|
||||
3. **Direct DNS** — Don't handle DNS in the tunnel, rely on system resolver
|
||||
4. **SOCKS5 mode** — SOCKS5 supports DOMAIN names natively (SOCKS5h), so DNS resolution happens server-side
|
||||
|
||||
**Recommendation**: SOCKS5 mode handles DNS naturally via SOCKS5h. For TUN mode, adopt the virtual DNS approach from tun2proxy (their `ip-stack` crate handles this).
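
The virtual DNS idea (option 1) boils down to a two-way table between hostnames and fake addresses from 198.18.0.0/15. The sketch below shows only that bookkeeping; `VirtualDns` and its methods are hypothetical names, there is no eviction, and this is not tun2proxy's actual implementation.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Maps hostnames to fake IPs from 198.18.0.0/15 and back again.
pub struct VirtualDns {
    next: u32,
    by_name: HashMap<String, Ipv4Addr>,
    by_ip: HashMap<Ipv4Addr, String>,
}

impl VirtualDns {
    const BASE: u32 = u32::from_be_bytes([198, 18, 0, 0]);
    const SIZE: u32 = 1 << 17; // a /15 holds 2^17 addresses

    pub fn new() -> Self {
        // Start at offset 1 so the network address itself is never handed out.
        Self { next: 1, by_name: HashMap::new(), by_ip: HashMap::new() }
    }

    /// Answer a DNS query: reuse the existing fake IP or allocate a new one.
    pub fn assign(&mut self, name: &str) -> Option<Ipv4Addr> {
        if let Some(ip) = self.by_name.get(name) {
            return Some(*ip);
        }
        if self.next >= Self::SIZE {
            return None; // pool exhausted (no eviction in this sketch)
        }
        let ip = Ipv4Addr::from(Self::BASE + self.next);
        self.next += 1;
        self.by_name.insert(name.to_string(), ip);
        self.by_ip.insert(ip, name.to_string());
        Some(ip)
    }

    /// When a packet targets a fake IP, recover the hostname to send upstream.
    pub fn lookup(&self, ip: Ipv4Addr) -> Option<&str> {
        self.by_ip.get(&ip).map(String::as_str)
    }
}
```

The recovered hostname is what the client would pass to `channel_open_direct_tcpip`, so resolution happens on the server side of the tunnel.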
|
||||
|
||||
### 4.6 Connection Multiplexing — ALREADY SOLVED
|
||||
|
||||
russh multiplexes channels over a single SSH connection. No need to manage multiple TCP connections per tunnel. One SSH connection, many channels. This is exactly what we want.
|
||||
|
||||
### 4.7 Keep-Alive and Reconnection — NEEDS DESIGN
|
||||
|
||||
- **SSH keepalive**: russh `Config` has `keepalive_interval` and `keepalive_max`
|
||||
- **Auto-reconnect**: Client should detect disconnection (`is_closed()`) and reconnect with exponential backoff
|
||||
- **TUN continuity**: When SSH reconnects, existing TCP connections through the tunnel will fail, but new ones will work. This is acceptable behavior (same as any VPN).
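
The reconnect delay schedule can be sketched as a small pure function. The name `backoff_delay` and the base/cap parameters are assumptions for illustration; a production client would normally add random jitter, which is omitted here to keep the example dependency-free.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// `base * 2^attempt`, never exceeding `cap`.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |delay| delay.min(cap))
}
```

With a 1-second base and a 60-second cap, the delays run 1s, 2s, 4s, 8s and so on until they level off at 60s.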
|
||||
|
||||
### 4.8 Server-Side Proxy (Outbound) — STRAIGHTFORWARD
|
||||
|
||||
When `--proxy` is specified, the server's `channel_open_direct_tcpip` handler forwards through a local proxy:
|
||||
|
||||
```rust
|
||||
async fn channel_open_direct_tcpip(
|
||||
&mut self,
|
||||
host: &str,
|
||||
port: u32,
|
||||
...
|
||||
) -> Result<Channel<Msg>, Self::Error> {
|
||||
// Option 1: Connect directly
|
||||
let stream = TcpStream::connect((host, port as u16)).await?;
|
||||
|
||||
// Option 2: Connect through SOCKS5 proxy
|
||||
let stream = connect_socks5(proxy_addr, host, port).await?;
|
||||
|
||||
// Option 3: Connect through HTTP CONNECT proxy
|
||||
let stream = connect_http_proxy(proxy_addr, host, port).await?;
|
||||
|
||||
// Then bidirectional copy between SSH channel and stream
|
||||
Ok(channel)
|
||||
}
|
||||
```
|
||||
|
||||
SOCKS5 client implementation is simple (5-byte handshake, variable-length connect). HTTP CONNECT is also straightforward. Both can be implemented in a few hundred lines.
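
To give a sense of how small the SOCKS5 client side is, the sketch below builds the two messages a client sends, following RFC 1928: the no-auth greeting and a CONNECT request that carries a domain name (so resolution happens at the proxy). The names `GREETING_NO_AUTH`, `connect_request`, and `reply_ok` are hypothetical; socket I/O and full reply parsing are left out.

```rust
/// Greeting: version 5, one method offered, method 0x00 (no authentication).
pub const GREETING_NO_AUTH: [u8; 3] = [0x05, 0x01, 0x00];

/// Build a CONNECT request using the domain-name address type (ATYP = 0x03).
/// Returns `None` if the host is empty or longer than 255 bytes.
pub fn connect_request(host: &str, port: u16) -> Option<Vec<u8>> {
    if host.is_empty() || host.len() > 255 {
        return None;
    }
    // VER, CMD = CONNECT, RSV, ATYP = domain, then the length-prefixed name.
    let mut req = vec![0x05, 0x01, 0x00, 0x03, host.len() as u8];
    req.extend_from_slice(host.as_bytes());
    // Port is sent in network byte order.
    req.extend_from_slice(&port.to_be_bytes());
    Some(req)
}

/// Check the first two bytes of the server reply: version 5, status 0 (succeeded).
pub fn reply_ok(reply: &[u8]) -> bool {
    reply.len() >= 2 && reply[0] == 0x05 && reply[1] == 0x00
}
```

A `connect_socks5` helper would write the greeting, read the 2-byte method selection, write the request, and then check the reply before handing the stream to the bidirectional copy.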
|
||||
|
||||
## 5. Dependency Assessment
|
||||
|
||||
| Dependency | Purpose | Maturity | Risk |
|------------|---------|----------|------|
| `russh` | SSH client & server | High (used in dispatch, well-maintained) | Low — already proven |
| `tun-rs` (v2) | TUN/TAP interface | High (cross-platform, prod-tested, bench'd at 70Gbps) | Low — well-maintained |
| `tokio-rustls` | TLS layer | High (standard Rust TLS) | Low — widely used |
| `rustls` | TLS implementation | High | Low — no ring dependency needed with aws-lc-rs |
| `smoltcp` | Userspace TCP/IP stack (TUN mode) | Medium-High | Medium — complex but well-proven |
| `clap` | CLI args | High | None |
| `tracing` | Structured logging | High | None |
| `anyhow/thiserror` | Error handling | High | None |
| `tokio` | Async runtime | High | None |
|
||||
|
||||
**No immature or risky dependencies.** Every crate is well-established with active maintenance.
|
||||
|
||||
## 6. Risk Assessment
|
||||
|
||||
### 6.1 Technical Risks
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| TUN mode complexity (TCP state, IP parsing) | Medium | Medium | Start with SOCKS5 mode; TUN is advanced feature |
| Cross-platform TUN differences | Medium | Medium | tun-rs handles most; `--unshare` for Linux privilege separation |
| TLS + SSH interaction edge cases | Low | Low | Both are well-tested; russh's `connect_stream` / `run_stream` abstracts transport |
| Performance under load | Low | Medium | russh multiplexes channels; tun-rs has benchmarked 35+ Gbps async |
| DPI detecting SSH banner over TLS | Medium | High | After TLS, the SSH banner ("SSH-2.0-...") is encrypted. But SNI reveals domain. Use `Config { anonymous: true }` to minimize fingerprint, or configure `client_id` to look like a web server. |
|
||||
|
||||
### 6.2 Protocol-Level Risks
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| SSH protocol fingerprinting (packet sizes, timing) | Medium | Medium | Pad messages, add random delays. russh doesn't do this natively — would need custom channel wrapping. |
| SNI leaks domain in TLS handshake | High | Low | Use an innocuous domain. Could also explore ECH (Encrypted Client Hello) in rustls if available. |
| Deep packet inspection identifying SSH patterns even over TLS | Low-Medium | Medium | The TLS layer prevents payload inspection. Only traffic analysis (sizes, timing) is possible. Padding and traffic shaping could help. |
| Countries blocking SSH traffic on port 22 | Already happening | N/A | That's the whole point — we run SSH over TLS on port 443 |
|
||||
|
||||
### 6.3 Usability Risks
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Requires self-hosted server | By design | Medium | Document simple deployment. Provide Docker image. Consider one-command install script. |
| Root/CAP_NET_ADMIN needed for TUN on Linux | High | Medium | Provide `--unshare` mode. SOCKS5 mode needs no privileges. |
| Certificate management for TLS mode | Medium | Low | Support self-signed certs, ACME (Let's Encrypt), or manual cert paths. |
|
||||
|
||||
## 7. Implementation Plan
|
||||
|
||||
### Phase 1: MVP (2-3 days)
|
||||
|
||||
**SOCKS5 proxy mode only. No TUN. Client + server.**
|
||||
|
||||
1. **Server binary** (`ghost serve`)
|
||||
- russh server implementation with public key auth
|
||||
- `channel_open_direct_tcpip` handler: connect to target directly or via outbound proxy
|
||||
- Optional TLS wrapping via `tokio-rustls` + `server::run_stream`
|
||||
- Config: listen address, host key path, authorized keys, TLS options, proxy options
|
||||
|
||||
2. **Client binary** (`ghost connect`)
|
||||
- russh client with public key auth
|
||||
- Local SOCKS5 server that forwards connections through SSH `channel_open_direct_tcpip`
|
||||
- Optional TLS wrapping via `tokio-rustls` + `client::connect_stream`
|
||||
- Config: server address, identity key, TLS options, SOCKS5 listen address
|
||||
|
||||
3. **Testing**
|
||||
- Integration test: client → server → HTTP target
|
||||
- Test with: `curl --socks5-hostname 127.0.0.1:1080 https://example.com`
|
||||
- Test TLS mode against DPI-like inspection
|
||||
|
||||
### Phase 2: Port Forwarding (1 day)
|
||||
|
||||
4. **Client: explicit port forwards** (`--forward local:remote:port`)
|
||||
- Direct reimplementation of SSH `-L` and `-R`
|
||||
- Uses `channel_open_direct_tcpip` for local forwards
|
||||
- Uses `tcpip_forward` / handler callback for remote forwards
|
||||
|
||||
5. **Client: SOCKS5 with DNS** (SOCKS5h)
|
||||
- Domain names resolved server-side, not client-side
|
||||
|
||||
### Phase 3: TUN Mode (2-3 days)
|
||||
|
||||
6. **Client: TUN interface mode** (`--tun`)
|
||||
- Create TUN device via `tun-rs`
|
||||
- IP packet routing through SSH channels
|
||||
- Either: raw packet forwarding (simpler, but fragile) or smoltcp integration (robust, but more code)
|
||||
- Recommend: use tun2proxy's `ip-stack` crate or similar for TCP reconstruction
|
||||
- Virtual DNS for TUN mode
|
||||
|
||||
7. **Privilege separation**
|
||||
- `--unshare` mode for Linux (create network namespace, unshare)
|
||||
- Document CAP_NET_ADMIN requirement
|
||||
|
||||
### Phase 4: Hardening & Polish (1-2 days)
|
||||
|
||||
8. **Obfuscation improvements**
|
||||
- SSH banner customization (`client_id` config)
|
||||
- Random padding in channel data
|
||||
- Traffic shaping / constant-rate padding (optional, advanced)
|
||||
|
||||
9. **Server stealth**
|
||||
- Non-SSH connection detection: serve fake nginx 404 on TLS port
|
||||
- Dual-protocol listener: HTTPS for browsers, SSH for ghost clients
|
||||
|
||||
10. **Auto-reconnect**
|
||||
- Exponential backoff reconnect on SSH session drop
|
||||
- TUN interface survives reconnect (new connections work, in-flight connections fail gracefully)
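The reconnect schedule could look roughly like this. It is a std-only sketch of capped exponential backoff; the function name and the `base` and `cap` parameters are illustrative assumptions, not values fixed by this plan.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based): base * 2^attempt, capped at `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // checked_shl returns None once the shift reaches 32 bits; saturate instead
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // checked_mul returns None on overflow, in which case the cap applies
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

A reconnect loop would sleep for `backoff_delay(n, ...)` after the n-th failed attempt and reset `n` to zero once a session is established. Adding random jitter is a common refinement that is left out here.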
|
||||
|
||||
### Phase 5: Distribution (1 day)
|
||||
|
||||
11. **Build & packaging**
|
||||
- Static musl binary for Linux
|
||||
- Docker image
|
||||
- systemd unit file
|
||||
- One-line install script
|
||||
|
||||
## 8. Estimated Timeline
|
||||
|
||||
| Phase | Duration | Cumulative |
|-------|----------|------------|
| Phase 1: SOCKS5 MVP | 2-3 days | 2-3 days |
| Phase 2: Port Forwarding | 1 day | 3-4 days |
| Phase 3: TUN Mode | 2-3 days | 5-7 days |
| Phase 4: Hardening & Polish | 1-2 days | 6-9 days |
| Phase 5: Distribution | 1 day | 7-10 days |
|
||||
|
||||
With LLM-assisted development, the MVP (Phase 1) could realistically be done in 1-2 focused sessions, and the full feature set in under a week.
|
||||
|
||||
## 9. Open Questions
|
||||
|
||||
1. **Project name** — `ghost`, `wraith`, `shade`, `spectre`, something else? Needs to be catchy, not conflict with existing Rust crates, and suggest stealth/mobility.
|
||||
|
||||
2. **TUN vs smoltcp** — Should TUN mode integrate smoltcp for a userspace TCP stack, or try the simpler "just forward packets and let the OS handle TCP" approach? Smoltcp is more work but more robust. tun2proxy's approach (which uses their own `ip-stack`) suggests userspace TCP is the way to go for reliability.
|
||||
|
||||
3. **TLS certificate story** — Should the server support ACME/Let's Encrypt auto-provisioning (like https_proxy does), or is manual cert management sufficient? Auto-provisioning is more user-friendly but adds significant complexity and a dependency on the ACME protocol.
|
||||
|
||||
4. **Mobile support** — Should we target iOS/Android eventually? tun-rs supports both via platform APIs, but mobile is a much bigger scope. Probably Phase 6+.
|
||||
|
||||
5. **Multi-user server** — Should the server support multiple simultaneous clients? russh's server model handles this naturally (each connection gets its own Handler instance), but access control (per-user ACLs, bandwidth limits) would add complexity.
|
||||
|
||||
6. **Crates structure** — Single binary with subcommands (`ghost serve`, `ghost connect`), or separate binaries? Single crate with `#[tokio::main]` dispatch seems cleanest for MVP.
|
||||
|
||||
## 10. Conclusion
|
||||
|
||||
**This is feasible and straightforward.** The core mechanics — SSH tunnel via russh, TLS wrapping via tokio-rustls, TUN interface via tun-rs — are all solved problems with mature Rust libraries. The dispatch codebase proves russh is production-ready for this kind of work. The `connect_stream` / `run_stream` API in russh makes TLS wrapping a clean layering, not a hack.
|
||||
|
||||
The biggest design decision is TUN mode approach (raw packets vs. userspace TCP), and the recommendation is to start with SOCKS5 mode and add TUN later. This gives a working tool in 2-3 days that covers the primary use case (private tunneling that doesn't look like VPN traffic).
|
||||
|
||||
The project is well-scoped, the risk profile is low, and the existing tooling (russh, tun-rs, tokio-rustls) handles the hard parts. This is a "few days of focused work" estimate, not a "few weeks."
|
||||
|
||||
## 11. iroh Transport — Feasibility Addendum
|
||||
|
||||
### 11.1 The Insight
|
||||
|
||||
russh's `connect_stream()` and `server::run_stream()` accept **any** `AsyncRead + AsyncWrite + Unpin + Send` stream. The iroh project provides exactly such a stream — a QUIC bidirectional stream (`open_bi()` / `accept_bi()`) where both `SendStream` and `RecvStream` implement `tokio::io::AsyncWrite` and `tokio::io::AsyncRead` respectively.
|
||||
|
||||
This means **iroh can serve as a transport layer beneath SSH**, the same way TLS can. The architecture becomes:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────┐
|
||||
│ APPLICATION │
|
||||
│ (SOCKS5 / TUN / port-forward) │
|
||||
├──────────────────────────────────────────────────┤
|
||||
│ SSH (russh) │
|
||||
│ channel_open_direct_tcpip/etc. │
|
||||
├──────────────────────────────────────────────────┤
|
||||
│ Transport Layer (SWAPPABLE) │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
|
||||
│ │ TCP │ │ TLS │ │ iroh │ │
|
||||
│ │(direct) │ │(obfusc) │ │ (P2P QUIC) │ │
|
||||
│ └──────────┘ └──────────┘ └──────────────┘ │
|
||||
└──────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 11.2 Why iroh is Compelling
|
||||
|
||||
iroh solves the **biggest deployment problem** with SSH tunnels: the server needs a public IP and open port.
|
||||
|
||||
With iroh as transport:
|
||||
|
||||
1. **No public IP needed** — Server and client both connect outbound to iroh's relay servers. Hole-punching attempts direct UDP in the background.
|
||||
2. **No open firewall ports** — The server only needs outbound HTTPS to the relay. No inbound 22 or 443 required.
|
||||
3. **NAT traversal for free** — iroh's relay + hole-punching means peers behind CGNAT or strict firewalls can still connect.
|
||||
4. **Ed25519-based addressing** — Peers are identified by public key (EndpointId), no DNS or IP addresses needed.
|
||||
5. **Built-in address discovery** — pkarr DNS records let you find a peer knowing only their public key.
|
||||
6. **Still SSH underneath** — All the channel multiplexing, port forwarding, SOCKS5 logic still works. iroh is just the wire.
|
||||
|
||||
The use cases multiply:
|
||||
|
||||
- **Home server behind NAT**: No reverse proxy, no dynamic DNS, no port forwarding. Just run the server, share the EndpointId.
|
||||
- **Temporary infrastructure**: Spin up a server anywhere (even behind corporate NAT), connect by public key.
|
||||
- **Internal services**: Expose Postgres/Redis etc. over an SSH connection that traverses any NAT, no VPN required.
|
||||
- **Censorship circumvention**: When traffic is relayed, SSH rides inside iroh QUIC to a relay reached over standard HTTPS, so a deep packet inspector sees HTTPS traffic to a relay server, not SSH. Direct hole-punched paths appear as QUIC over UDP instead.
|
||||
|
||||
### 11.3 How It Works — The Code
|
||||
|
||||
The integration is trivially clean because both primitives implement the right traits:
|
||||
|
||||
**Client side:**
|
||||
```rust
// Create iroh endpoint
let endpoint = Endpoint::builder(presets::N0)
    .alpns(vec![b"ghost-ssh/1".to_vec()])
    .bind()
    .await?;

// Connect to peer (no IP needed, just the public key)
let addr = EndpointAddr::from_bytes(peer_id_bytes);
let conn = endpoint.connect(addr, b"ghost-ssh/1").await?;

// Open a bidirectional QUIC stream
let (send_stream, recv_stream) = conn.open_bi().await?;

// Combine into a single AsyncRead+AsyncWrite
let iroh_stream = tokio::io::join(recv_stream, send_stream);
// OR use a custom wrapper that implements AsyncRead+AsyncWrite

// Run SSH client over the iroh stream
let handle = client::connect_stream(
    Arc::new(client_config),
    iroh_stream,
    client_handler,
).await?;
```
|
||||
|
||||
**Server side:**
|
||||
```rust
// Create iroh endpoint
let endpoint = Endpoint::builder(presets::N0)
    .alpns(vec![b"ghost-ssh/1".to_vec()])
    .bind()
    .await?;

let server_config = Arc::new(server_config);

// Accept incoming connections
while let Some(incoming) = endpoint.accept().await {
    let config = server_config.clone();
    // Assumption: the handler type is Clone, so each peer gets its own copy
    let handler = server_handler.clone();

    // Handle each peer on its own task so one long-lived SSH session
    // does not block the accept loop
    tokio::spawn(async move {
        let conn = incoming.await?;

        // For each connection, accept a bidirectional stream
        let (send_stream, recv_stream) = conn.accept_bi().await?;
        let iroh_stream = tokio::io::join(recv_stream, send_stream);

        // Run SSH server over the iroh stream; the returned session
        // future is awaited to drive the connection to completion
        let session = server::run_stream(config, iroh_stream, handler).await?;
        session.await?;
        Ok::<_, anyhow::Error>(()) // anyhow is an assumed error type for this sketch
    });
}
```
|
||||
|
||||
**Or using iroh's Router + ProtocolHandler pattern:**
|
||||
```rust
struct GhostSshProtocol {
    // Shared russh server configuration
    config: Arc<server::Config>,
}

impl ProtocolHandler for GhostSshProtocol {
    async fn accept(&self, connection: Connection) -> Result<(), AcceptError> {
        // iroh already handled connection acceptance
        // We can accept bi streams on the connection directly
        // Or: each SSH session could be a new bi stream on the same connection

        let (send, recv) = connection.accept_bi().await
            .map_err(AcceptError::from_err)?;
        let stream = tokio::io::join(recv, send);

        // run_stream yields a session future; awaiting it drives the SSH session
        let session = server::run_stream(self.config.clone(), stream, GhostHandler).await
            .map_err(AcceptError::from_err)?;
        session.await.map_err(AcceptError::from_err)
    }
}

let endpoint = Endpoint::builder(presets::N0).bind().await?;
let router = Router::builder(endpoint)
    .accept(b"ghost-ssh/1", GhostSshProtocol { config: Arc::new(server_config) })
    .spawn();
```
|
||||
|
||||
### 11.4 Design Decision: One Stream per Session vs. One Connection with Multiple Streams
|
||||
|
||||
There are two ways to layer SSH over iroh:
|
||||
|
||||
**Option A: One QUIC bi-stream per SSH session**
|
||||
- Each SSH session opens a new `open_bi()` stream under a single iroh `Connection`
|
||||
- The iroh Connection itself persists (one QUIC connection per peer pair)
|
||||
- Simpler: `open_bi()` gives you a stream, you feed it to `connect_stream()`
|
||||
- Pro: Connection setup cost amortized. If SSH disconnects, `open_bi()` again is cheap.
|
||||
- Con: Need to combine `RecvStream` + `SendStream` into a single `AsyncRead+AsyncWrite`
|
||||
|
||||
**Option B: One iroh Connection per SSH session (new QUIC connection each time)**
|
||||
- Each SSH session = one `endpoint.connect()` + the whole connection
|
||||
- Wasteful: QUIC handshake + iroh relay discovery each time
|
||||
- Not recommended
|
||||
|
||||
**Recommendation: Option A.** One iroh `Connection` per peer pair, one `open_bi()` stream per SSH session. The connection is long-lived; SSH sessions can be re-established cheaply on the same QUIC connection.
|
||||
|
||||
### 11.5 Combining `RecvStream + SendStream` into `AsyncRead + AsyncWrite`
|
||||
|
||||
QUIC splits streams into separate send and receive halves. russh needs a single duplex stream. Two approaches:
|
||||
|
||||
**Approach 1: `tokio::io::join()` (simplest)**
|
||||
```rust
use tokio::io::{self, AsyncRead, AsyncWrite};

// Join the two QUIC halves into one duplex stream for russh
fn join_iroh_stream(
    recv: iroh::endpoint::RecvStream,
    send: iroh::endpoint::SendStream,
) -> impl AsyncRead + AsyncWrite + Unpin + Send {
    io::join(recv, send)
}
```
|
||||
`tokio::io::join` returns a `Join<A, B>` that implements both `AsyncRead` (from the first) and `AsyncWrite` (from the second). Since `RecvStream: AsyncRead` and `SendStream: AsyncWrite`, this works directly.
|
||||
|
||||
**Approach 2: Custom wrapper (more control)**
|
||||
```rust
|
||||
struct IrohStream {
|
||||
recv: iroh::endpoint::RecvStream,
|
||||
send: iroh::endpoint::SendStream,
|
||||
}
|
||||
|
||||
impl AsyncRead for IrohStream { /* delegate to recv */ }
|
||||
impl AsyncWrite for IrohStream { /* delegate to send */ }
|
||||
```
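The delegation pattern behind such a wrapper can be shown with a synchronous, std-only analogue. This is not the async implementation: `JoinedStream` is a hypothetical name, and the real wrapper would implement `AsyncRead`/`AsyncWrite` with `poll_*` methods rather than `Read`/`Write`. The shape is the same, though: reads go to the receive half, writes and flushes go to the send half.

```rust
use std::io::{self, Read, Write};

/// Synchronous stand-in for the wrapper: one value, two independent halves.
struct JoinedStream<R, W> {
    recv: R,
    send: W,
}

impl<R: Read, W> Read for JoinedStream<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // All reads are served by the receive half
        self.recv.read(buf)
    }
}

impl<R, W: Write> Write for JoinedStream<R, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // All writes are forwarded to the send half
        self.send.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.send.flush()
    }
}
```

The two halves never touch each other's state, which is why a generic join is enough unless the wrapper needs extra behavior such as logging or custom shutdown handling.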
|
||||
|
||||
**Recommendation: Start with `tokio::io::join`.** It's one line and has the right trait implementations. Only switch to a custom wrapper if profiling shows overhead (unlikely).
|
||||
|
||||
### 11.6 Relay Considerations
|
||||
|
||||
iroh provides two relay options:
|
||||
|
||||
1. **Default n0 relay servers** (`https://use1-1.relay.n0.iroh.network.`) — free, operated by n0. Good for getting started and testing.
|
||||
2. **Self-hosted relay** (`iroh-relay` crate) — The relay server is part of the iroh project. Can be self-hosted for complete independence.
|
||||
|
||||
For this project:
|
||||
|
||||
- **Development/quick start**: Use n0 relays (they're free and reliable)
|
||||
- **Production/privacy**: Self-host the relay server. It's a single binary (`iroh-relay`) that can run on any VPS. The relay sees only encrypted QUIC packets — it cannot read SSH traffic.
|
||||
- **Paranoid**: Disable relay entirely. Both peers must have direct network connectivity. No third-party dependency.
|
||||
|
||||
The `RelayMode` enum handles this:
|
||||
```rust
|
||||
// Default n0 relays
|
||||
let endpoint = Endpoint::builder(presets::N0).bind().await?;
|
||||
|
||||
// Self-hosted relay
|
||||
let relay_map = RelayMap::from([(relay_url, Some(direct_addr))]);
|
||||
let endpoint = Endpoint::builder(presets::Custom(relay_map)).bind().await?;
|
||||
|
||||
// No relay (direct only)
|
||||
let endpoint = Endpoint::builder(presets::RelayDisabled).bind().await?;
|
||||
```
|
||||
|
||||
### 11.7 Updated Architecture with iroh Transport
|
||||
|
||||
```
|
||||
┌───────────────────────────────────────────────────────────┐
|
||||
│ CLIENT │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌───────────┐ ┌────────────────────┐ │
|
||||
│ │ TUN / │ │ SSH │ │ Transport │ │
|
||||
│ │ SOCKS5 / │───▶│ Client │───▶│ (selectable) │ │
|
||||
│ │ Port- │ │ (russh) │ │ │ │
|
||||
│ │ Forward │ │ │ │ ┌────────────────┐ │ │
|
||||
│ └──────────┘ └───────────┘ │ │ TCP direct │ │ │
|
||||
│ │ │ TLS (rustls) │ │ │
|
||||
│ │ │ iroh (QUIC) │ │ │
|
||||
│ │ └────────────────┘ │ │
|
||||
│ └────────────────────┘ │
|
||||
└───────────────────────────────────────────────────────────┘
|
||||
|
||||
┌───────────────────────────────────────────────────────────┐
|
||||
│ SERVER │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌───────────┐ ┌────────────────────┐ │
|
||||
│ │ Outbound │ │ SSH │ │ Transport │ │
|
||||
│ │ Proxy / │◀───│ Server │◀───│ (selectable) │ │
|
||||
│ │ Direct │ │ (russh) │ │ │ │
|
||||
│ │ Forward │ │ │ │ ┌────────────────┐ │ │
|
||||
│ └──────────┘ └───────────┘ │ │ TCP listener │ │ │
|
||||
│ │ │ TLS (rustls) │ │ │
|
||||
│ │ │ iroh (QUIC) │ │ │
|
||||
│ │ └────────────────┘ │ │
|
||||
│ └────────────────────┘ │
|
||||
└───────────────────────────────────────────────────────────┘
|
||||
|
||||
┌──────────────┐
|
||||
│ iroh Relay │ (optional, for NAT)
|
||||
│ (self-host │
|
||||
│ or n0) │
|
||||
└──────────────┘
|
||||
|
||||
Transport modes:
|
||||
--transport tcp Direct TCP (default, simplest)
|
||||
--transport tls TCP + TLS (obfuscation)
|
||||
--transport iroh iroh QUIC (NAT traversal, no public IP)
|
||||
--transport iroh+tls iroh QUIC + TLS (NAT traversal + obfuscation)
|
||||
```
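The four `--transport` values listed above could be modeled as a small enum. This is a std-only sketch; the `TransportMode` type and the `needs_public_address` helper are illustrative names rather than part of an existing CLI.

```rust
use std::str::FromStr;

/// Transport choices accepted by the `--transport` flag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TransportMode {
    Tcp,
    Tls,
    Iroh,
    IrohTls,
}

impl FromStr for TransportMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "tcp" => Ok(TransportMode::Tcp),
            "tls" => Ok(TransportMode::Tls),
            "iroh" => Ok(TransportMode::Iroh),
            "iroh+tls" => Ok(TransportMode::IrohTls),
            other => Err(format!("unknown transport: {}", other)),
        }
    }
}

impl TransportMode {
    /// TCP and TLS need a reachable server address; the iroh modes dial by public key.
    fn needs_public_address(self) -> bool {
        matches!(self, TransportMode::Tcp | TransportMode::Tls)
    }
}
```

A CLI layer could use `needs_public_address` to decide whether `--server` or `--peer` is the required argument.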
|
||||
|
||||
### 11.8 iroh Transport — Risk Assessment
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| iroh API instability (it's v0.x) | Medium | Medium | Pin version; iroh's core stream API is stable (it's just QUIC) |
| Relay dependency for initial connectivity | Low | Low | Self-host relay; or direct-only mode for LAN |
| QUIC stream vs TCP semantics differences | Low | Medium | QUIC streams are reliable ordered byte streams, same semantics as TCP. russh won't know the difference. |
| Performance overhead of QUIC + SSH | Low | Low | QUIC is fast, but the SSH session rides on a single QUIC stream, so it sees the same in-order delivery (and head-of-line blocking) as over TCP. Any speedup should be measured rather than assumed. |
| iroh crate size / compile time | Low | Low | iroh pulls in quinn + rustls + lots of networking. But we already need rustls for TLS mode. The incremental cost is the QUIC stack. |
|
||||
|
||||
**Key observation**: QUIC streams have identical reliability and ordering guarantees to TCP. russh's `connect_stream()` / `run_stream()` will work correctly over iroh QUIC streams with no modifications.
|
||||
|
||||
### 11.9 Updated CLI Sketch with iroh
|
||||
|
||||
```bash
|
||||
# Server — iroh mode (no public IP needed!)
|
||||
ghost serve --key ~/.ssh/id_ed25519 --transport iroh
|
||||
# Prints endpoint ID: e.g., "abc123..."
|
||||
# Clients connect using this ID
|
||||
|
||||
# Server — iroh mode with self-hosted relay
|
||||
ghost serve --key ~/.ssh/id_ed25519 --transport iroh \
|
||||
--iroh-relay https://my-relay.example.com
|
||||
|
||||
# Client — connect via iroh (no IP needed!)
|
||||
ghost connect --peer abc123def456... --transport iroh --socks5 1080
|
||||
|
||||
# Client — connect via iroh with TUN
|
||||
ghost connect --peer abc123def456... --transport iroh --tun
|
||||
|
||||
# Client — traditional TCP mode (still works)
|
||||
ghost connect --server 1.2.3.4:443 --transport tls --socks5 1080
|
||||
```
|
||||
|
||||
### 11.10 Implementation Impact
|
||||
|
||||
Adding iroh as a transport option is **incremental** — it doesn't change the SSH layer at all:
|
||||
|
||||
1. **Transport trait**: Define a `Transport` trait that produces a boxed duplex stream. A trait object can name only one non-auto trait, so `AsyncRead + AsyncWrite` is folded into a helper trait first:

```rust
// Helper trait: `dyn AsyncRead + AsyncWrite` is not a valid trait object
trait Duplex: AsyncRead + AsyncWrite + Unpin + Send {}
impl<T: AsyncRead + AsyncWrite + Unpin + Send> Duplex for T {}

trait Transport {
    async fn connect(&self) -> Result<Box<dyn Duplex>>;
}
```

2. **Three implementations**:
   - `TcpTransport`: plain TCP
   - `TlsTransport`: TCP + tokio-rustls
   - `IrohTransport`: iroh endpoint + `open_bi()` + `tokio::io::join(recv, send)`

3. **Server side**: Same shape, opposite direction:

```rust
trait TransportAcceptor {
    async fn accept(&self) -> Result<Box<dyn Duplex>>;
}
```
|
||||
|
||||
4. **The SSH layer never changes.** russh's `connect_stream()` / `run_stream()` takes the transport stream, and everything else stays the same.
|
||||
|
||||
### 11.11 Dependency Impact
|
||||
|
||||
| Dependency | Added? | Size concern |
|------------|--------|-------------|
| `iroh` (includes iroh-base) | Yes, feature-gated | Yes, pulls in QUIC stack, DNS, relay client |
| `n0-error` | Yes (small) | No |
| `tokio` | Already present | No |
| `rustls` | Already present (for TLS mode) | No |
|
||||
|
||||
**Recommendation**: Make iroh a feature flag (`--features iroh`) so the base install stays lean. Users who want P2P capability opt in:
|
||||
|
||||
```toml
|
||||
[features]
|
||||
default = ["tls"]
|
||||
tls = ["tokio-rustls", "rustls-pemfile"]
|
||||
iroh = ["dep:iroh"]
|
||||
tun = ["dep:tun-rs", "dep:smoltcp"]
|
||||
```
|
||||
|
||||
### 11.12 The Compelling Narrative
|
||||
|
||||
With iroh as a transport option, this tool becomes something genuinely new:
|
||||
|
||||
- **Not just a VPN alternative** — it's a VPN alternative that doesn't need port forwarding, public IPs, or DNS records.
|
||||
- **Not just SSH tunneling** — it's SSH tunneling that works between any two machines on the internet, regardless of NAT configuration.
|
||||
- **Not just for censorship circumvention** — it's how you securely expose internal services (Postgres, Redis, admin panels) from machines behind corporate firewalls or home networks.
|
||||
|
||||
The "ghetto VPN" becomes a **zero-config mesh VPN**. Spin up `ghost serve` on any machine, share the public key, connect from anywhere. The relay server is optional (self-host or n0's free tier). And underneath it's just SSH, doing what SSH does best.
|
||||
|
||||
This isn't theoretical: the API compatibility is exact. iroh's `RecvStream` and `SendStream` implement `AsyncRead` and `AsyncWrite` respectively, and russh's `connect_stream` / `run_stream` accept any `AsyncRead + AsyncWrite` stream. One call to `tokio::io::join(recv, send)` and you have a transport stream that russh can use.
|
||||
@@ -1,472 +0,0 @@
|
||||
# Alknet Flowgraph: Operation Graph, Call Graph, and Graph Operations
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-06
|
||||
|
||||
## Overview
|
||||
|
||||
`alknet-flowgraph` is a Rust crate providing graph data structures and operations, mapping the TypeScript `@alkdev/flowgraph` package's call-graph and operation-graph concepts to `petgraph::DiGraph`. It works with `alknet-storage` for persistence and `alknet-core` for call protocol event processing.
|
||||
|
||||
## Core Abstraction
|
||||
|
||||
`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the operations used:
|
||||
|
||||
| TypeScript (graphology) | Rust (petgraph) |
|------------------------|-----------------|
| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` returns `NodeIndex` |
| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` returns `EdgeIndex` |
| `graph.getNodeAttributes(key)` | `graph[node]` |
| `graph.forEachNode()` | `graph.node_indices().for_each()` |
| `graph.inNeighbors(node)` | `graph.neighbors_directed(node, Direction::Incoming)` |
| `graph.outNeighbors(node)` | `graph.neighbors_directed(node, Direction::Outgoing)` |
| `graph.hasCycle()` | `petgraph::algo::is_cyclic_directed(&graph)` |
| `graph.topologicalSort()` | `petgraph::algo::toposort(&graph, None)` returns `Result<Vec<NodeIndex>, Cycle<NodeIndex>>` |
| `graph.export()` | serde serialization |
| `FlowGraph.fromJSON(data)` | serde deserialization |
|
||||
|
||||
### Key Difference: Node Keys
|
||||
|
||||
graphology uses string node keys (`"call-001"`). petgraph uses `NodeIndex` (u32). We maintain a `HashMap<String, NodeIndex>` for node-key-to-index lookups, mirroring the `key` column in the `nodes` SQLite table.
|
||||
|
||||
```rust
|
||||
pub struct FlowGraph<N, E>
|
||||
where
|
||||
N: NodeAttributes,
|
||||
E: EdgeAttributes,
|
||||
{
|
||||
graph: DiGraph<N, E>,
|
||||
key_to_index: HashMap<String, NodeIndex>,
|
||||
}
|
||||
|
||||
pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
|
||||
fn key(&self) -> &str;
|
||||
fn set_key(&mut self, key: String);
|
||||
}
|
||||
|
||||
pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
|
||||
fn edge_type(&self) -> &str;
|
||||
}
|
||||
```
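The key-to-index bookkeeping can be illustrated without petgraph. The sketch below is a std-only stand-in, not the crate's implementation: `KeyedGraph` is a hypothetical type that uses `usize` positions where `FlowGraph` would hold `NodeIndex` values, and its topological sort is a plain Kahn's algorithm instead of a call into `petgraph::algo`.

```rust
use std::collections::HashMap;

/// Std-only stand-in for the wrapper: string keys map to dense indices.
#[derive(Default)]
struct KeyedGraph {
    keys: Vec<String>,                    // index -> key
    key_to_index: HashMap<String, usize>, // key -> index
    out_edges: Vec<Vec<usize>>,           // adjacency list by index
}

impl KeyedGraph {
    /// Insert a node, or return the existing index when the key is already known.
    fn add_node(&mut self, key: &str) -> usize {
        if let Some(&idx) = self.key_to_index.get(key) {
            return idx;
        }
        let idx = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), idx);
        self.out_edges.push(Vec::new());
        idx
    }

    /// Add a directed edge, creating either endpoint on demand.
    fn add_edge(&mut self, source: &str, target: &str) {
        let (s, t) = (self.add_node(source), self.add_node(target));
        self.out_edges[s].push(t);
    }

    /// Kahn's algorithm; returns None when the graph contains a cycle.
    fn topological_order(&self) -> Option<Vec<String>> {
        let mut in_degree = vec![0usize; self.keys.len()];
        for targets in &self.out_edges {
            for &t in targets {
                in_degree[t] += 1;
            }
        }
        let mut ready: Vec<usize> =
            (0..self.keys.len()).filter(|&i| in_degree[i] == 0).collect();
        let mut order = Vec::new();
        while let Some(n) = ready.pop() {
            // Results are translated back from indices to string keys
            order.push(self.keys[n].clone());
            for &t in &self.out_edges[n] {
                in_degree[t] -= 1;
                if in_degree[t] == 0 {
                    ready.push(t);
                }
            }
        }
        if order.len() == self.keys.len() { Some(order) } else { None }
    }
}
```

The public API speaks in string keys throughout; indices stay internal, which is the same boundary the `key_to_index` map draws around `NodeIndex`.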
|
||||
|
||||
## Operation Graph (Static)
|
||||
|
||||
Built from `OperationSpec`s at startup. Answers structural questions about the operation space: type compatibility, cycle detection, reachability.
|
||||
|
||||
### Construction
|
||||
|
||||
```rust
|
||||
impl FlowGraph<OperationNodeAttrs, OperationEdgeAttrs> {
|
||||
pub fn from_specs(specs: &[OperationSpec]) -> Result<Self, CycleError> {
|
||||
let mut graph = Self::new();
|
||||
for spec in specs {
|
||||
graph.add_operation(spec.clone());
|
||||
}
|
||||
for (source, target) in graph.compute_type_edges(specs) {
|
||||
graph.add_typed_edge(&source, &target, TypeCompat::compatible(/*...*/))?;
|
||||
}
|
||||
if graph.has_cycles() {
|
||||
return Err(CycleError);
|
||||
}
|
||||
Ok(graph)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Type Compatibility
|
||||
|
||||
Compare `output_schema` (source) against `input_schema` (target) using `jsonschema`:
|
||||
|
||||
```rust
|
||||
pub fn type_compat(
|
||||
output_schema: &Value,
|
||||
input_schema: &Value,
|
||||
) -> TypeCompatResult {
|
||||
// 1. Exact match → compatible
|
||||
// 2. Subtype match (output has extra fields) → compatible
|
||||
// 3. Unknown on either side → skip (no edge)
|
||||
// 4. Structural mismatch → incompatible edge (added with compatible: false)
|
||||
}
|
||||
```
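The four outcomes listed in the comments can be made concrete with a deliberately simplified model. In the sketch below a schema is just a map from field name to type name, `None` stands for an unknown schema, and every input field is treated as required. These are assumptions for illustration; real JSON Schema comparison (nested objects, `required`, unions) is considerably more involved, and `Compat` and `Fields` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Compat {
    Compatible,
    Skip,                 // unknown schema on either side: add no edge
    Incompatible(String), // edge is added with compatible: false and this detail
}

/// Simplified schema: field name -> type name.
type Fields = BTreeMap<&'static str, &'static str>;

fn type_compat(output: Option<&Fields>, input: Option<&Fields>) -> Compat {
    let (out, inp) = match (output, input) {
        (Some(o), Some(i)) => (o, i),
        _ => return Compat::Skip,
    };
    // Every field the input expects must exist in the output with the same type.
    // Extra output fields are ignored, which covers the subtype case.
    for (field, ty) in inp {
        match out.get(field) {
            Some(t) if t == ty => {}
            Some(t) => return Compat::Incompatible(format!("{}: {} != {}", field, t, ty)),
            None => return Compat::Incompatible(format!("missing field {}", field)),
        }
    }
    Compat::Compatible
}
```

An exact match is the special case where the loop finds every field and the output has nothing extra.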
|
||||
|
||||
### Node Attributes
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Serialize, Deserialize, Debug)]
|
||||
pub struct OperationNodeAttrs {
|
||||
pub name: String,
|
||||
pub namespace: String,
|
||||
pub op_type: OperationType,
|
||||
pub input_schema: Value,
|
||||
pub output_schema: Value,
|
||||
}
|
||||
|
||||
#[derive(Clone, Serialize, Deserialize, Debug)]
|
||||
pub enum OperationType {
|
||||
Query,
|
||||
Mutation,
|
||||
Subscription,
|
||||
}
|
||||
|
||||
#[derive(Clone, Serialize, Deserialize, Debug)]
|
||||
pub struct OperationEdgeAttrs {
|
||||
pub edge_type: String, // "typed"
|
||||
pub compatible: bool,
|
||||
pub detail: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
### Queries
|
||||
|
||||
```rust
|
||||
// petgraph delegations
|
||||
pub fn topological_order(&self) -> Result<Vec<String>, CycleError>
|
||||
pub fn has_cycles(&self) -> bool
|
||||
pub fn find_cycles(&self) -> Vec<Vec<String>>
|
||||
pub fn ancestors(&self, node_key: &str) -> Vec<String>
|
||||
pub fn descendants(&self, node_key: &str) -> Vec<String>
|
||||
pub fn predecessors(&self, node_key: &str) -> Vec<String>
|
||||
pub fn successors(&self, node_key: &str) -> Vec<String>
|
||||
pub fn reachable_from(&self, node_keys: &[String]) -> HashSet<String>
|
||||
```
|
||||
|
||||
## Call Graph (Dynamic)
|
||||
|
||||
Populated at runtime from call protocol events. Every `call.requested` adds a node, every `call.responded`/`call.error`/`call.aborted` updates its status.
|
||||
|
||||
### Construction from Events
|
||||
|
||||
```rust
impl FlowGraph<CallNodeAttrs, CallEdgeAttrs> {
    // `CallEvent` is the enum matched below; it plays the role of the
    // TypeScript `CallEventMapValue` union
    pub fn from_call_events(events: &[CallEvent]) -> Self {
        let mut graph = Self::new();
        for event in events {
            graph.update_from_event(event);
        }
        graph
    }

    pub fn update_from_event(&mut self, event: &CallEvent) {
        match event {
            CallEvent::Requested(e) => {
                // add_call returns Err(CycleError) for a malformed parent link;
                // this sketch ignores that call instead of stopping the replay
                let _ = self.add_call(CallNodeAttrs {
                    request_id: e.request_id.clone(),
                    operation_id: e.operation_id.clone(),
                    status: CallStatus::Pending,
                    parent_request_id: e.parent_request_id.clone(),
                    input: e.input.clone(),
                    ..Default::default()
                });
            }
            CallEvent::Responded(e) => {
                self.update_status(&e.request_id, CallStatus::Completed, None);
            }
            CallEvent::Error(e) => {
                self.update_status(&e.request_id, CallStatus::Failed, Some(e.clone()));
            }
            CallEvent::Aborted(e) => {
                self.update_status(&e.request_id, CallStatus::Aborted, None);
            }
            CallEvent::Completed(e) => {
                self.update_status(&e.request_id, CallStatus::Completed, None);
            }
        }
    }
}
```
|
||||
|
||||
### Real-time Population
|
||||
|
||||
```rust
use std::sync::{Arc, Mutex};

// Subscribe to call protocol events for live graph construction.
// The graph sits behind Arc<Mutex<..>> because every handler closure
// needs mutable access to the same instance.
let call_graph = Arc::new(Mutex::new(FlowGraph::<CallNodeAttrs, CallEdgeAttrs>::new()));

let graph = Arc::clone(&call_graph);
pubsub.subscribe("call.requested", move |event| {
    graph.lock().unwrap().update_from_event(&event);
});
let graph = Arc::clone(&call_graph);
pubsub.subscribe("call.responded", move |event| {
    graph.lock().unwrap().update_from_event(&event);
});
// ... etc for all call event types
```
|
||||
|
||||
### Node Attributes
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Serialize, Deserialize, Debug, Default)]
|
||||
pub struct CallNodeAttrs {
|
||||
pub request_id: String,
|
||||
pub operation_id: String,
|
||||
pub status: CallStatus,
|
||||
pub parent_request_id: Option<String>,
|
||||
pub input: Value,
|
||||
pub output: Option<Value>,
|
||||
pub error: Option<CallErrorInfo>,
|
||||
pub identity: Option<Identity>,
|
||||
pub started_at: Option<String>,
|
||||
pub completed_at: Option<String>,
|
||||
}
|
||||
|
||||
// Default is needed because CallNodeAttrs derives Default and holds a CallStatus
#[derive(Clone, Serialize, Deserialize, Debug, Default, PartialEq)]
pub enum CallStatus {
    #[default]
    Pending,
    Running,
    Completed,
    Failed,
    Aborted,
}
|
||||
|
||||
#[derive(Clone, Serialize, Deserialize, Debug)]
|
||||
pub struct CallEdgeAttrs {
|
||||
pub edge_type: EdgeType,
|
||||
}
|
||||
|
||||
#[derive(Clone, Serialize, Deserialize, Debug)]
|
||||
pub enum EdgeType {
|
||||
Triggered,
|
||||
DependsOn,
|
||||
}
|
||||
```
|
||||
|
||||
### Status Lifecycle
|
||||
|
||||
```
|
||||
call.requested
|
||||
│
|
||||
▼
|
||||
┌─────────┐
|
||||
│ pending │
|
||||
└────┬────┘
|
||||
│
|
||||
handler starts
|
||||
│
|
||||
▼
|
||||
┌─────────┐
|
||||
┌────│ running │────┐
|
||||
│ └────┬────┘ │
|
||||
call.aborted │ call.aborted
|
||||
│ │ │
|
||||
▼ │ ▼
|
||||
┌─────────┐ │ ┌─────────┐
|
||||
│ aborted │ │ │ aborted │
|
||||
└─────────┘ │ └─────────┘
|
||||
│
|
||||
┌─────────┼─────────┐
|
||||
│ │ │
|
||||
call.responded │ call.error
|
||||
│ │ │
|
||||
▼ │ ▼
|
||||
┌───────────┐ │ ┌────────┐
|
||||
│ completed │ │ │ failed │
|
||||
└───────────┘ │ └────────┘
|
||||
│
|
||||
call.completed
|
||||
│
|
||||
▼
|
||||
┌───────────┐
|
||||
│ completed │
|
||||
└───────────┘
|
||||
```
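Read as a state machine, the diagram allows only a handful of transitions. The sketch below encodes exactly the arrows drawn above and nothing more; `Status`, `Event`, and `transition` are hypothetical names, and whether a pending call may be aborted before its handler starts is left open because the diagram does not show that arrow.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Running,
    Completed,
    Failed,
    Aborted,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    HandlerStarted,
    Responded,
    Error,
    Aborted,
    Completed,
}

/// Next status for (current, event), or None when the diagram has no such arrow.
fn transition(current: Status, event: Event) -> Option<Status> {
    use Status::*;
    match (current, event) {
        (Pending, Event::HandlerStarted) => Some(Running),
        (Running, Event::Responded) | (Running, Event::Completed) => Some(Completed),
        (Running, Event::Error) => Some(Failed),
        (Running, Event::Aborted) => Some(Aborted),
        // Completed, Failed, and Aborted are terminal
        _ => None,
    }
}
```

A status update that returns `None` would indicate an out-of-order or duplicate event, which is a useful signal when replaying an event log.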
|
||||
|
||||
### Abort Cascading
|
||||
|
||||
```rust
|
||||
// Abort cascade: get all descendants of a call
|
||||
let descendants = call_graph.descendants(&request_id);
|
||||
// The protocol handler aborts each descendant via PendingRequestMap::abort()
|
||||
```
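How `descendants` might gather the calls to abort is sketched below as a breadth-first walk. This is a std-only illustration under stated assumptions: the `children` map (request id to the ids it triggered) is a hypothetical stand-in for the graph's outgoing `triggered` edges, and the walk relies on the call graph being acyclic.

```rust
use std::collections::{HashMap, VecDeque};

/// Collect every transitive child of `root` in breadth-first order.
/// Assumes the parent-child structure has no cycles.
fn descendants(children: &HashMap<String, Vec<String>>, root: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut queue: VecDeque<&str> = VecDeque::from([root]);
    while let Some(id) = queue.pop_front() {
        // A call with no entry in the map has no children
        for child in children.get(id).into_iter().flatten() {
            out.push(child.clone());
            queue.push_back(child);
        }
    }
    out
}
```

Breadth-first order means direct children are aborted before grandchildren; a caller that prefers leaves-first teardown could iterate the result in reverse.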
|
||||
|
||||
### Observability Queries
|
||||
|
||||
| Query | Method | Returns |
|-------|--------|---------|
| Get running calls | `filter_by_status(CallStatus::Running)` | Node keys with running status |
| Get failed calls | `filter_by_status(CallStatus::Failed)` | Node keys with failed status |
| Get top-level calls | `get_roots()` | Nodes with no `parent_request_id` |
| Get children of call | `children(&request_id)` | Direct children via `triggered` edges |
| Get call duration | `duration(&request_id)` | `completed_at - started_at` |
| Get call lineage | `lineage(&request_id)` | Ancestor chain from root to this call |
|
||||
|
||||
### Serialization and Persistence
|
||||
|
||||
```rust
|
||||
// Serialize via serde
|
||||
let json = serde_json::to_value(&call_graph)?;
|
||||
let restored: FlowGraph<CallNodeAttrs, CallEdgeAttrs> = serde_json::from_value(json)?;
|
||||
|
||||
// Persist via alknet-storage
|
||||
storage.insert_call_graph("session-abc", &call_graph)?;
|
||||
storage.load_call_graph("session-abc")?;
|
||||
```
|
||||
|
||||
## Graph Operations (petgraph mapping)
|
||||
|
||||
All graph operations used in `@alkdev/flowgraph` map directly to petgraph:
|
||||
|
||||
| Flowgraph method | petgraph function |
|------------------|-------------------|
| `addNode(key, attrs)` | `add_node(attrs)` + `key_to_index.insert(key, idx)` |
| `addEdge(source, target, attrs)` | `add_edge(source_idx, target_idx, attrs)` |
| `addDirectedEdge(source, target, attrs)` | `add_edge(source_idx, target_idx, attrs)` |
| `getNodeAttributes(key)` | `graph[NodeIndex]` |
| `getEdgeAttributes(key)` | `graph[EdgeIndex]` |
| `getSource(key)` / `getTarget(key)` | `graph.edge_endpoints(EdgeIndex)` |
| `inDegree(key)` | `graph.neighbors_directed(idx, Incoming).count()` |
| `outDegree(key)` | `graph.neighbors_directed(idx, Outgoing).count()` |
| `inNeighbors(key)` | `graph.neighbors_directed(idx, Incoming)` |
| `outNeighbors(key)` | `graph.neighbors_directed(idx, Outgoing)` |
| `hasEdge(source, target)` | `graph.contains_edge(source_idx, target_idx)` |
| `forEachNode(callback)` | `graph.node_indices().for_each()` |
| `forEachEdge(callback)` | `graph.edge_indices().for_each()` |
| `findCycle()` | `is_cyclic_directed(&graph)` (reports only whether a cycle exists, not its nodes) |
| `topologicalSort()` | `toposort(&graph, None)` |
| `export()` / `toJSON()` | `serde_json::to_value(&graph)` |
| `fromJSON()` | `serde_json::from_value(json)` |
|
||||
|
||||
### Cycle Detection
|
||||
|
||||
The operation graph rejects cycles at construction time. The call graph is also expected to stay acyclic: its parent-child hierarchy is built only from `parent_request_id`, and a call cannot be its own ancestor, so `add_call` rejects any parent link that would close a cycle.
|
||||
|
||||
```rust
|
||||
pub fn add_call(&mut self, attrs: CallNodeAttrs) -> Result<(), CycleError> {
|
||||
if let Some(parent) = &attrs.parent_request_id {
|
||||
// Check if adding triggered edge would create a cycle
|
||||
if self.would_create_cycle(parent, &attrs.request_id) {
|
||||
return Err(CycleError);
|
||||
}
|
||||
}
|
||||
// ...
|
||||
}
|
||||
```
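One way `would_create_cycle` could be implemented is a reachability check: adding the edge parent -> child closes a cycle exactly when `parent` is already reachable from `child`. The sketch below is std-only and uses a plain adjacency map as a hypothetical stand-in for the petgraph structure.

```rust
use std::collections::{HashMap, HashSet};

/// True when adding parent -> child would close a cycle, i.e. when `parent`
/// is already reachable from `child` (or the two are the same node).
fn would_create_cycle(
    edges: &HashMap<String, Vec<String>>,
    parent: &str,
    child: &str,
) -> bool {
    // Depth-first search starting at the prospective child
    let mut stack = vec![child];
    let mut seen = HashSet::new();
    while let Some(node) = stack.pop() {
        if node == parent {
            return true;
        }
        // Expand each node once, so the search terminates even on cyclic input
        if seen.insert(node) {
            stack.extend(edges.get(node).into_iter().flatten().map(String::as_str));
        }
    }
    false
}
```

For a freshly created call that has no outgoing edges yet, the search ends immediately, so the check is cheap in the common case.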
|
||||
|
||||
### DAG Invariants
|
||||
|
||||
- **Operation graph**: DAG-only, enforced at construction. `add_typed_edge` returns `CycleError` if a cycle would result.
- **Call graph**: DAG-only by design (a call cannot be its own ancestor). `add_call` with a `parentRequestId` that would create a cycle returns `CycleError`.
- **No parallel edges**: `multi: false` means at most one edge per (source, target) pair.
- **No self-loops**: `allow_self_loops: false` means an operation cannot depend on its own output.
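
The three operation-graph invariants above can be checked together at edge insertion. This is a minimal std-only sketch, not the crate's API: `OpGraph`, `EdgeError`, and `add_edge` are hypothetical names, and a real implementation would sit on top of petgraph.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum EdgeError {
    SelfLoop,
    ParallelEdge,
    Cycle,
}

/// Hypothetical adjacency-set graph keyed by operation name.
#[derive(Default)]
pub struct OpGraph {
    out_edges: HashMap<String, HashSet<String>>,
}

impl OpGraph {
    /// True if `to` is reachable from `from` along existing edges (iterative DFS).
    fn reaches(&self, from: &str, to: &str) -> bool {
        let mut stack = vec![from.to_string()];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == to {
                return true;
            }
            if !seen.insert(node.clone()) {
                continue;
            }
            if let Some(next) = self.out_edges.get(&node) {
                stack.extend(next.iter().cloned());
            }
        }
        false
    }

    pub fn add_edge(&mut self, source: &str, target: &str) -> Result<(), EdgeError> {
        if source == target {
            return Err(EdgeError::SelfLoop);
        }
        if self.out_edges.get(source).map_or(false, |t| t.contains(target)) {
            return Err(EdgeError::ParallelEdge);
        }
        // source -> target closes a cycle exactly when target already reaches source.
        if self.reaches(target, source) {
            return Err(EdgeError::Cycle);
        }
        self.out_edges
            .entry(source.to_string())
            .or_default()
            .insert(target.to_string());
        Ok(())
    }
}
```

Checking reachability before inserting keeps the graph valid at every step, so no rollback is needed when an edge is rejected.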
|
||||
|
||||
## Integration with alknet-storage
|
||||
|
||||
Call graphs and operation graphs are stored as metagraph instances:
|
||||
|
||||
```rust
// Create call-graph type definition
let call_graph_type = GraphType {
    name: "call-graph".to_string(),
    config: GraphConfig { graph_type: Directed, multi: false, allow_self_loops: false },
    scope: Scope::System,
    ..Default::default()
};

// Store a call graph instance
let graph = storage.create_graph("call-graph", "session-abc")?;

// Add call nodes
storage.add_node(graph.id, "call-001", &call_attrs)?;

// Query via petgraph
let pg: FlowGraph<CallNodeAttrs, CallEdgeAttrs> = storage.load_call_graph("session-abc")?;
let running = pg.filter_by_status(CallStatus::Running);
```
|
||||
|
||||
The `alknet-storage` crate handles persistence (SQLite via honker). The `alknet-flowgraph` crate handles in-memory graph operations (petgraph). The bridge is serialization: `FlowGraph` serializes to/from `serde_json::Value`, which `alknet-storage` stores in the `nodes.attributes` and `edges.attributes` columns.
|
||||
|
||||
## Integration with alknet-core (Call Protocol)
|
||||
|
||||
```rust
// The call protocol's EventEnvelope drives call graph construction
use alknet_core::call::PendingRequestMap;
use alknet_flowgraph::FlowGraph;

let mut call_graph = FlowGraph::<CallNodeAttrs, CallEdgeAttrs>::new();

// Wire up call protocol events to graph updates
call_map.on_requested(|event| {
    call_graph.update_from_event(&CallEvent::Requested(event));
});

call_map.on_responded(|event| {
    call_graph.update_from_event(&CallEvent::Responded(event));
    // Persist incrementally to storage
    storage.update_node(event.request_id, &call_graph)?;
});

call_map.on_error(|event| {
    call_graph.update_from_event(&CallEvent::Error(event));
});

call_map.on_completed(|event| {
    call_graph.update_from_event(&CallEvent::Completed(event));
});

// Abort cascading
call_map.on_aborted(|event| {
    let descendants = call_graph.descendants(&event.request_id);
    for desc in descendants {
        call_map.abort(&desc);
    }
    call_graph.update_from_event(&CallEvent::Aborted(event));
});
```
|
||||
|
||||
## Type Compatibility Between TS and Rust
|
||||
|
||||
| TypeScript (flowgraph) | Rust (alknet-flowgraph) |
|------------------------|-------------------------|
| `graphology.DirectedGraph` | `petgraph::DiGraph<N, E>` |
| `CallNodeAttrs` (TypeBox) | `CallNodeAttrs` (serde struct) |
| `CallEdgeAttrs` (TypeBox) | `CallEdgeAttrs` (serde struct) |
| `CallStatus` (enum) | `CallStatus` (Rust enum) |
| `EdgeType` (enum) | `EdgeType` (Rust enum) |
| `OperationNodeAttrs` | `OperationNodeAttrs` (serde struct) |
| `OperationEdgeAttrs` | `OperationEdgeAttrs` (serde struct) |
| `OperationType` (enum) | `OperationType` (Rust enum) |
| `Identity` | `Identity` (serde struct) |
| `AccessControl` | `AccessControl` (serde struct) |
| `typeCompat()` | `type_compat()` using jsonschema |
| `Value.Check()` (TypeBox) | `jsonschema::validate()` |
| `addCall()` | `add_call()` |
| `updateStatus()` with state machine | `update_status()` with `is_valid_transition()` |
| `addDependency()` | `add_dependency()` |
| `descendants()` | petgraph DFS |
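
The `update_status()` / `is_valid_transition()` pairing in the table can be sketched as a small state machine. The status variants other than `Running` and the transition set below are assumptions made for illustration; the source does not enumerate them.

```rust
/// Assumed variants: only `Running` appears in the document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Aborted,
}

/// Assumed transition table: terminal states never change,
/// and a call can be aborted before or while it runs.
pub fn is_valid_transition(from: CallStatus, to: CallStatus) -> bool {
    match (from, to) {
        (CallStatus::Pending, CallStatus::Running)
        | (CallStatus::Pending, CallStatus::Aborted)
        | (CallStatus::Running, CallStatus::Completed)
        | (CallStatus::Running, CallStatus::Failed)
        | (CallStatus::Running, CallStatus::Aborted) => true,
        _ => false,
    }
}

/// Applies the transition only if the state machine allows it.
pub fn update_status(current: &mut CallStatus, next: CallStatus) -> Result<(), String> {
    if !is_valid_transition(*current, next) {
        return Err(format!("invalid transition {:?} -> {:?}", current, next));
    }
    *current = next;
    Ok(())
}
```

Keeping the table in one pure function makes it easy to compare against the TypeScript state machine during the port.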
|
||||
|
||||
## Crate Dependency Map
|
||||
|
||||
```toml
[dependencies]
petgraph = "0.x"    # Core graph data structure
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"  # Type compatibility checks
thiserror = "1"
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.x", features = ["serde"] }

[dev-dependencies]
tokio = { version = "1", features = ["full"] }
```
|
||||
|
||||
## Design Decisions
|
||||
|
||||
| Decision | Rationale |
|----------|-----------|
| petgraph over custom graph | Nearly 1:1 mapping to graphology operations; well-maintained; fast |
| `HashMap<String, NodeIndex>` for key lookups | Matches SQLite `key` column pattern; O(1) lookup by string key |
| serde_json for attributes | Matches SQLite `attributes TEXT JSON` column; dynamic validation via jsonschema |
| Separate crates for flowgraph and storage | Flowgraph is pure in-memory graph ops; storage is SQLite persistence; different dependency sets |
| `NodeAttributes` / `EdgeAttributes` traits | Generic over attribute types, matching flowgraph's type parameter pattern |
| DAG enforcement at construction | Matches TypeScript flowgraph: `fromSpecs()` throws `CycleError` |
| `filter_by_status` is O(n) | Matches TypeScript: small graphs (tens to hundreds of nodes), no index needed |
| Call protocol as integration boundary | Call protocol `EventEnvelope` is the cross-node integration boundary; domain events stay within services |
|
||||
|
||||
## References
|
||||
|
||||
- `@alkdev/flowgraph` — TypeScript implementation (call-graph, operation-graph)
|
||||
- `@alkdev/operations` — `OperationSpec`, `CallHandler`, `PendingRequestMap`
|
||||
- `/workspace/petgraph` — Graph data structure crate
|
||||
- `/workspace/jsonschema` — JSON Schema validation crate
|
||||
- `/workspace/@alkdev/storage/docs/architecture/metagraph-module.md` — TypeBox Module pattern
|
||||
- `/workspace/@alkdev/storage/docs/architecture/sqlite-host.md` — SQLite table definitions
|
||||
- `/workspace/@alkdev/storage/docs/architecture/acl.md` — ACL as metagraph
|
||||
- [services.md](services.md) — Service layer architecture (irpc protocols)
|
||||
- [core.md](core.md) — Core overview, head/worker terminology
|
||||
@@ -1,850 +0,0 @@
|
||||
# Integration Plan: Services, PubSub, and Operations
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-09
|
||||
|
||||
## Purpose
|
||||
|
||||
This document organizes the findings from the research phase (core.md, services.md, configuration.md, storage.md, flow.md) into an actionable integration plan. It identifies what requires changes to the core, what becomes new crates, what can be carried over from existing research specs, and what needs further specification before implementation.
|
||||
|
||||
The plan is organized into phases because not everything can be front-loaded. Earlier phases change the core architecture; later phases build on top. Things learned during implementation may adjust later phases.
|
||||
|
||||
## Key Clarifications
|
||||
|
||||
### Transport / Interface / Protocol — Three Layers
|
||||
|
||||
Carrying forward the distinction raised during review, the architecture has three distinct layers:
|
||||
|
||||
```
Layer 3: Application Protocol (Call Protocol, Operations, Service Calls)
Layer 2: Interface (SSH, raw EventEnvelope framing, HTTP/WS, DNS control channel)
Layer 1: Transport (TCP, TLS, iroh, WebTransport, DNS)
```
|
||||
|
||||
A **connection** is always a (Transport, Interface) pair. The call protocol runs at Layer 3 and is agnostic to both layers below it.
|
||||
|
||||
This means:
|
||||
|
||||
| Combination | What it does | Example |
|---|---|---|
| (TLS, SSH) | Standard alknet tunnel | `alknet connect --transport tls` |
| (TCP, SSH) | Plain SSH tunnel | `alknet connect --transport tcp` |
| (iroh, SSH) | P2P SSH tunnel | `alknet connect --transport iroh` |
| (DNS, raw framing) | DNS control channel | Call protocol frames as DNS TXT queries |
| (WebTransport, SSH) | Browser SSH tunnel | Future: browser client |
| (WebTransport, raw framing) | Browser call protocol | Future: browser-to-head direct |
| (TCP, raw framing) | Direct call protocol | Local service mesh, no SSH overhead |
|
||||
|
||||
"Raw framing" means the 4-byte length prefix + JSON EventEnvelope format without SSH wrapping. The DNS "control channel" concept from the research is a (DNS transport, raw framing interface) pair. It carries call protocol events directly — it does NOT wrap SSH inside DNS.
|
||||
|
||||
### Services vs Call Protocol — Two Different Layers
|
||||
|
||||
From services.md:
|
||||
|
||||
> Services are internal — they run within a node or cluster. The call protocol is external — it's how nodes communicate with each other over SSH/QUIC/WebSocket/DNS transports.
|
||||
|
||||
- **irpc service calls**: Internal, synchronous request-response. Rust-to-Rust, postcard serialization, over tokio channels (local) or QUIC streams (remote). Domain-level.
|
||||
- **Call protocol events**: External, cross-node, cross-language. JSON EventEnvelope frames, over any (Transport, Interface) pair. Integration-level.
|
||||
|
||||
A call protocol handler MAY call an irpc service internally. For example, `/head/auth/verify` receives a call protocol `call.requested` event, then calls the local `AuthProtocol::VerifyPubkey` irpc service to actually perform the check. The layers compose:
|
||||
|
||||
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
|
||||
|
||||
Future work on binary encoding (replacing JSON with postcard or similar for Rust-to-Rust cross-node communication) is possible but deferred — JSON works well across platforms and the performance characteristics are acceptable for control-plane traffic.
|
||||
|
||||
### OperationEnv — The Universal Composition Mechanism
|
||||
|
||||
The `OperationEnv` pattern from `@alkdev/operations` is not a TypeScript implementation detail. It is the **universal composition mechanism** that all operation handlers receive. It maps identically across every interface boundary:
|
||||
|
||||
- HTTP: `POST /v1/{namespace}/{op}` → `context.env[namespace][op](input)`
|
||||
- MCP: `tools/call` with tool name `{namespace}_{op}` → `context.env[namespace][op](input)`
|
||||
- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env[namespace][op](input)`
|
||||
- Call protocol: `call.requested` with `operationId: "/{node}/{namespace}/{op}"` → `context.env[namespace][op](input)`
|
||||
- irpc: service enum dispatch → wraps the same handler → `context.env[namespace][op](input)`
|
||||
|
||||
The handler always sees the same interface: given a namespace and operation name, invoke it with input. The OperationEnv implements the routing. The three dispatch paths are:
|
||||
|
||||
```
OperationEnv (handler-facing composition)
    │
    ├── Local dispatch (in-process, direct function call through registry)
    ├── Service dispatch (in-cluster, irpc protocol enum to service backend)
    └── Remote dispatch (cross-node, call protocol EventEnvelope to head)
```
|
||||
|
||||
All three resolve the same way from the handler's perspective. A handler calling `context.env.secrets.derive(input)` doesn't know or care whether it becomes a local function call, an irpc protocol message, or a cross-node call protocol event. The OperationEnv chooses the routing based on where the operation is registered.
|
||||
|
||||
This means:
|
||||
- **irpc services are one dispatch backend for OperationEnv**, not a replacement for it.
|
||||
- **irpc protocol enums** (`AuthProtocol::VerifyPubkey`, `SecretProtocol::DeriveEd25519`) define the wire format for in-cluster communication. They're the Rust-to-Rust optimization path.
|
||||
- **Call protocol operations** define the cross-node, cross-language wire format. They use path-based routing (`/head/auth/verify`).
|
||||
- **An irpc service can be exposed as a call protocol operation** — the registry maps the path to a handler that internally calls the irpc service.
|
||||
- **Both coexist** and both are needed. irpc gives you type-safe, efficient in-cluster calls. Call protocol gives you universal, cross-language, cross-node calls. OperationEnv unifies them from the handler's perspective.
|
||||
|
||||
The Rust implementation of OperationEnv doesn't have to be a literal `HashMap<String, HashMap<String, fn(...)>>` — it can be a struct with typed method dispatch or a registry that resolves to irpc clients — but the **behavioral contract** must match: namespace + operation name → invoke with input, return output. Handlers compose through this interface. Adapters (MCP, OpenAPI, HTTP, DNS) map to operations through this interface.
|
||||
|
||||
This is a hard constraint: the OperationEnv composition model must survive the Rust port intact. It's what makes operations universally composable across all interfaces.
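
The behavioral contract described above (namespace + operation name, invoke with input, return output) can be sketched as a registry that hides the dispatch path from the caller. Everything here is hypothetical and simplified: `Dispatch`, `register`, and `call` are illustrative names, inputs and outputs are plain strings rather than JSON values, and the service and remote arms only return placeholder strings where a real backend would send an irpc message or a call protocol event.

```rust
use std::collections::HashMap;

/// Where an operation is served from. Handlers never see this.
pub enum Dispatch {
    Local(Box<dyn Fn(&str) -> String>),
    Service { name: String },
    Remote { node: String },
}

#[derive(Default)]
pub struct OperationEnv {
    ops: HashMap<(String, String), Dispatch>,
}

impl OperationEnv {
    pub fn register(&mut self, namespace: &str, op: &str, dispatch: Dispatch) {
        self.ops
            .insert((namespace.to_string(), op.to_string()), dispatch);
    }

    /// Same call shape for every backend: the routing is chosen by
    /// where the operation was registered, not by the caller.
    pub fn call(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), op.to_string());
        match self.ops.get(&key) {
            Some(Dispatch::Local(handler)) => Ok(handler(input)),
            // Placeholder for an in-cluster irpc call.
            Some(Dispatch::Service { name }) => Ok(format!("irpc:{}:{}", name, input)),
            // Placeholder for a cross-node call.requested event.
            Some(Dispatch::Remote { node }) => {
                Ok(format!("call:/{}/{}/{}:{}", node, namespace, op, input))
            }
            None => Err(format!("unknown operation {}.{}", namespace, op)),
        }
    }
}
```

A handler that calls `env.call("secrets", "derive", input)` reads the same whichever of the three backends is registered, which is the property the hard constraint protects.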
|
||||
|
||||
---
|
||||
|
||||
## What Exists Already
|
||||
|
||||
### Existing Architecture Specs (reviewed/stable)
|
||||
|
||||
| Doc | Status | Carries Over? |
|---|---|---|
| overview.md | reviewed | Yes; needs updates for expanded scope (services, identity, interface layer) |
| transport.md | reviewed | Yes; transport trait is unchanged |
| client.md | reviewed | Yes; client behavior unchanged |
| server.md | reviewed | Yes; server handler needs minor updates for DynamicConfig/AuthService |
| tun-shim.md | deprecated | No; remains deprecated |
| napi-and-pubsub.md | reviewed | Yes; NAPI layer needs call protocol additions |
|
||||
|
||||
### Existing Architecture Specs (draft)
|
||||
|
||||
| Doc | Status | Needs |
|---|---|---|
| auth.md | draft | Promote Identity to a first-class concern. Add IdentityProvider vs AuthService relationship. |
| call-protocol.md | draft | Add OperationEnv as universal composition mechanism. Update hub/spoke → head/worker. Clarify Layer 3 position. Show three dispatch paths (local, irpc, remote). |
|
||||
|
||||
### Research Documents (source material)
|
||||
|
||||
| Doc | Content | Spec Readiness |
|---|---|---|
| core.md | Transport, call protocol, auth, services, DNS | High for most parts. DNS section needs rewrite for transport/interface separation. |
| services.md | irpc service protocols, operation context, application services | High for core services. Application services are sketches; defer to phase 4+. |
| configuration.md | Static/dynamic split, forwarding policy, multi-transport | High; this was nearly spec-ready already. Needs ADR extraction. |
| storage.md | Metagraph, identity, ACL, secrets, honker | High for data model. Integration points with core need spec work. |
| flow.md | FlowGraph, petgraph mapping, call/operation graphs | High; straightforward port of TypeScript design. |
|
||||
|
||||
### Existing ADRs (25 accepted)
|
||||
|
||||
ADR-001 through ADR-025 are accepted. Several new ADRs are needed (see Phase 0). Existing ADRs to update:
|
||||
- ADR-018 (control channel for pubsub) — superseded/extended by bidirectional call protocol (ADR-024) and the Layer 2/3 model
|
||||
- ADR-024, ADR-025 — update terminology from hub/spoke to head/worker
|
||||
|
||||
---
|
||||
|
||||
## Phase 0: Architecture Foundation
|
||||
|
||||
**Goal**: Establish the structural decisions that everything else depends on. Write ADRs, create new spec documents, adjust existing specs for the three-layer model and crate decomposition.
|
||||
|
||||
**Why first**: Every subsequent phase depends on knowing where types live, what the layer boundaries are, and which crates depend on which. These decisions are architectural and cheap to change now but expensive to change later.
|
||||
|
||||
### ADRs to Write
|
||||
|
||||
| ADR | Title | Key Decision |
|---|---|---|
| 026 | Transport-interface separation | Three-layer model: Transport (Layer 1) produces byte streams, Interface (Layer 2) parses them into sessions, Protocol (Layer 3) carries semantics. Valid (Transport, Interface) pairs are enumerated. SSH is an interface, not a transport. DNS control channel is a (DNS transport, raw framing interface) pair. |
| 027 | Crate decomposition | alknet-core (transport, SSH, call protocol, config, auth types, identity), alknet-secret (BIP39, SLIP-0010, AES-GCM), alknet-storage (SQLite, honker, metagraph, ACL, identity tables), alknet-flowgraph (petgraph, type compatibility). Core depends on no heavy service crates. |
| 028 | Auth as irpc service | Auth verification via IdentityProvider trait (in core). Default impl: ArcSwap<DynamicConfig>. Production impl: irpc AuthService backed by SQLite. Callers don't know the difference. |
| 029 | Identity as core type | `Identity` struct (id, scopes, resources) and `IdentityProvider` trait live in alknet-core. Derivation and storage are external concerns. |
| 030 | Static/dynamic config split | StaticConfig (transport binding, TLS, host key) vs DynamicConfig (auth, forwarding, rate limits). ArcSwap for hot reload. ConfigService wraps reloads. Promoted from research/configuration.md. |
| 031 | Forwarding policy | Rule-based allow/deny for channel_open_direct_tcpip. Default-allow for migration, default-deny for production. TransportKind-aware rules. |
| 032 | Event boundary discipline | Domain events (honker streams) stay within the owning service. Integration events (call protocol EventEnvelope) cross node boundaries. Service calls (irpc) are synchronous and internal. Never conflate the three. |
| 033 | Call protocol / irpc relationship / OperationEnv | OperationEnv is the universal composition mechanism. irpc services are one dispatch backend for OperationEnv (in-cluster, postcard). Call protocol operations are another backend (cross-node, JSON). Handlers compose through `context.env[namespace][op](input)` regardless of dispatch path. Both are Layer 3, at different scope boundaries. |
| 034 | Head/worker terminology | Replace hub/spoke with head/worker throughout. A head is also a worker. Mesh topologies are natural. |
|
||||
|
||||
### Spec Documents to Create or Update
|
||||
|
||||
| Document | Action | Source |
|---|---|---|
| `interface.md` | **Create new** | Defines Layer 2. SSH as interface. Raw framing as interface. DNS control channel as (DNS transport, raw framing interface). |
| `services.md` | **Create new** | Defines irpc service layer. Auth, Secret, Config, Storage service protocols. How irpc services relate to call protocol operations and OperationEnv. Carries from research/services.md and research/core.md service layer section. |
| `identity.md` | **Create new** | `Identity` type, `IdentityProvider` trait, auth flow for SSH and token. Carries from architecture/auth.md + research/services.md Identity section. |
| `configuration.md` | **Promote from research** | StaticConfig, DynamicConfig, ConfigService, forwarding policy, auth service relationship. Needs cleanup: remove duplicate "Problem" heading, resolve open questions per ADRs. |
| `secret-service.md` | **Create new** | Carries from research/services.md SecretProtocol definition. BIP39/SLIP-0010, key derivation paths, encryption model, lock/unlock lifecycle. |
| `storage.md` | **Create new** (or reference alknet-storage's own docs) | Metagraph data model, identity tables, ACL graph, honker integration. Carries from research/storage.md. |
| `flowgraph.md` | **Create new** (or reference alknet-flowgraph's own docs) | FlowGraph<N,E>, operation graph, call graph, petgraph mapping. Carries from research/flow.md. |
| `overview.md` | **Update** | Add crate structure, Layer 3 description, service layer concept, updated dependency list. |
| `auth.md` | **Update** | Add IdentityProvider vs AuthService relationship. Update for irpc AuthProtocol. Note: this is mostly a rename/reorg since the current auth.md already defines IdentityProvider. |
| `call-protocol.md` | **Update** | Add OperationEnv as universal composition mechanism with three dispatch paths (local, irpc service, remote). Update hub/spoke → head/worker. Show how irpc is one backend for OperationEnv, not a replacement for it. |
| `README.md` | **Update** | Add new docs and ADRs to the tables. |
|
||||
|
||||
### Review Checklist (Phase 0)
|
||||
|
||||
After writing specs and ADRs:
|
||||
|
||||
1. **No inline decision rationale** — all "why" decisions are in ADRs, specs reference ADR numbers
|
||||
2. **No inline open questions** — all OQs are in open-questions.md, specs reference OQ numbers
|
||||
3. **Terminology is consistent** — head/worker everywhere (no hub/spoke remaining)
|
||||
4. **Layer boundaries are clear** — every component belongs to exactly one layer
|
||||
5. **Crate dependencies are acyclic** — core doesn't depend on secret, storage, or flowgraph
|
||||
6. **Every spec has YAML frontmatter** with status and last_updated
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Core Modifications
|
||||
|
||||
**Goal**: Modify alknet-core to support the architectural changes. This is the "adjust the foundation" phase.
|
||||
|
||||
**Why second**: The core changes (config split, auth service, identity type, forwarding policy) are prerequisites for the service layer and the external crates. Implementation can begin after Phase 0 ADRs and specs are reviewed and stable.
|
||||
|
||||
### 1.1 Configuration: Static/Dynamic Split
|
||||
|
||||
**Source**: research/configuration.md (nearly spec-ready)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Introduce `StaticConfig` struct (transport mode, listen addr, TLS config, iroh config, host key, stealth, max_auth_attempts, max_connections_per_ip)
|
||||
- Introduce `DynamicConfig` struct (auth policy, forwarding policy, rate limits)
|
||||
- Replace `Arc<ServerAuthConfig>` with `Arc<ArcSwap<DynamicConfig>>` in ServerHandler
|
||||
- Add `ConfigReloadHandle` with `reload(DynamicConfig)` method
|
||||
- Expose `reloadAuth()` / `reloadForwarding()` on the NAPI AlknetServer object
|
||||
|
||||
**What stays the same**: `ServeOptions` builder pattern is preserved. `StaticConfig` is constructed from `ServeOptions`. `DynamicConfig` starts with what was in `ServerAuthConfig` and gains `ForwardingPolicy`.
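
The reload path can be pictured as swapping a shared pointer while readers keep the snapshot they already loaded. The sketch below uses `RwLock<Arc<_>>` from the standard library as a stand-in for `ArcSwap`, so it shows the shape of the idea rather than the planned implementation; the `DynamicConfig` fields and the `load` method are assumptions for illustration.

```rust
use std::sync::{Arc, RwLock};

/// Illustrative fields only; the real struct carries auth policy,
/// forwarding policy, and rate limits.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
    pub default_allow_forwarding: bool,
}

/// Std-only stand-in for `Arc<ArcSwap<DynamicConfig>>`.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    current: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        ConfigReloadHandle {
            current: Arc::new(RwLock::new(Arc::new(initial))),
        }
    }

    /// Returns a snapshot. A request that loaded the config keeps a
    /// consistent view even if a reload happens while it is in flight.
    pub fn load(&self) -> Arc<DynamicConfig> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Replaces the whole config atomically from the readers' point of view.
    pub fn reload(&self, next: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

Cloning the handle shares the same underlying config, so every accept loop observes a reload without being restarted.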
|
||||
|
||||
**New crate**: None. This is all in alknet-core.
|
||||
|
||||
**ADR**: 030 (static/dynamic split)
|
||||
|
||||
**Risk**: Low — internal refactor, no protocol changes. Default-allow forwarding preserves current behavior.
|
||||
|
||||
### 1.2 Identity Type and IdentityProvider Trait
|
||||
|
||||
**Source**: architecture/auth.md (already defines IdentityProvider), research/services.md (Identity struct)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `Identity` struct in `alknet_core::auth` (id, scopes, resources)
|
||||
- Define `IdentityProvider` trait in `alknet_core::auth`
|
||||
- Implement `ConfigIdentityProvider` (reads from DynamicConfig's authorized_keys)
|
||||
- Wire `IdentityProvider` into `ServerHandler::auth_publickey()` — currently reads from `ServerAuthConfig`, now goes through trait
|
||||
- Wire `IdentityProvider` into token auth (WebTransport path) when that lands
|
||||
|
||||
**What stays the same**: SSH key verification logic. The `auth_publickey()` callback just delegates to the trait instead of reading directly.
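
A rough sketch of that delegation follows, assuming string-typed keys, scopes, and resources. The method name `identify_pubkey`, the `new` constructor, and the simplified `auth_publickey` signature are hypothetical; the real callback belongs to the SSH server handler.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: Vec<String>,
}

pub trait IdentityProvider: Send + Sync {
    /// Returns the identity bound to `public_key`, or None if it is not authorized.
    fn identify_pubkey(&self, public_key: &str) -> Option<Identity>;
}

/// Config-backed provider: a flat list of (key, identity) pairs,
/// standing in for DynamicConfig's authorized_keys.
pub struct ConfigIdentityProvider {
    authorized_keys: Vec<(String, Identity)>,
}

impl ConfigIdentityProvider {
    pub fn new(authorized_keys: Vec<(String, Identity)>) -> Self {
        ConfigIdentityProvider { authorized_keys }
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn identify_pubkey(&self, public_key: &str) -> Option<Identity> {
        self.authorized_keys
            .iter()
            .find(|(key, _)| key.as_str() == public_key)
            .map(|(_, identity)| identity.clone())
    }
}

/// The auth callback no longer reads config directly; it asks the trait.
pub fn auth_publickey(provider: &dyn IdentityProvider, public_key: &str) -> bool {
    provider.identify_pubkey(public_key).is_some()
}
```

Because the callback only depends on the trait object, a SQLite-backed provider could later replace the config-backed one without touching the handler.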
|
||||
|
||||
**New crate**: None. Identity is core.
|
||||
|
||||
**ADR**: 029 (identity as core type)
|
||||
|
||||
**Risk**: Low — adding a trait abstraction over existing behavior.
|
||||
|
||||
### 1.3 Forwarding Policy
|
||||
|
||||
**Source**: research/configuration.md (ForwardingPolicy section)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `ForwardingPolicy`, `ForwardingRule`, `TargetPattern` structs
|
||||
- Add policy check in `channel_open_direct_tcpip` before proxy spawn
|
||||
- Default: `ForwardingPolicy::allow_all()` (preserves current behavior)
|
||||
- Policy is part of `DynamicConfig` and reloadable
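
The list above could translate into types along these lines. This is a sketch under assumptions: the variants of `TargetPattern`, the `Action` enum, the `default_action` field, and first-match-wins evaluation are all illustrative choices, not decisions recorded in the source.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Allow,
    Deny,
}

#[derive(Debug, Clone)]
pub enum TargetPattern {
    Any,
    Host(String),
    HostPort(String, u16),
}

impl TargetPattern {
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h.as_str() == host,
            TargetPattern::HostPort(h, p) => h.as_str() == host && *p == port,
        }
    }
}

#[derive(Debug, Clone)]
pub struct ForwardingRule {
    pub action: Action,
    pub target: TargetPattern,
}

#[derive(Debug, Clone)]
pub struct ForwardingPolicy {
    pub rules: Vec<ForwardingRule>,
    pub default_action: Action,
}

impl ForwardingPolicy {
    /// No rules, default allow: preserves the pre-policy behavior.
    pub fn allow_all() -> Self {
        ForwardingPolicy { rules: Vec::new(), default_action: Action::Allow }
    }

    /// First matching rule wins (an assumption); otherwise the default applies.
    /// This is the check that would run before the proxy is spawned.
    pub fn check(&self, host: &str, port: u16) -> Action {
        self.rules
            .iter()
            .find(|rule| rule.target.matches(host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default_action)
    }
}
```

With first-match-wins ordering, a narrow allow rule placed ahead of a broad deny rule opens a single port on an otherwise blocked host.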
|
||||
|
||||
**New crate**: None. This is in alknet-core.
|
||||
|
||||
**ADR**: 031 (forwarding policy)
|
||||
|
||||
**Risk**: Low — new check, default-allow preserves current behavior.
|
||||
|
||||
### 1.4 Auth Service (irpc Protocol)
|
||||
|
||||
**Source**: research/services.md (AuthProtocol definition), research/configuration.md (auth service approach)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `AuthProtocol` enum with `#[rpc_requests]` (behind `irpc` feature flag)
|
||||
- Define `AuthResult` and `Identity` types shared between SSH auth path and irpc auth path
|
||||
- Implement `AuthServiceImpl` backed by `ConfigIdentityProvider` (ArcSwap path) — the default for minimal deployments
|
||||
- Future: `AuthServiceImpl` backed by SQLite (in alknet-storage) — not in this phase
|
||||
|
||||
**What stays the same**: The `IdentityProvider` trait is the contract. Default impl uses ArcSwap. SQL impl is additive.
|
||||
|
||||
**New crate**: None. Auth service types live in alknet-core.
|
||||
|
||||
**Feature flag**: `irpc` feature in alknet-core. When disabled, auth goes through `IdentityProvider` directly (no irpc overhead).
|
||||
|
||||
**ADR**: 028 (auth as irpc service), 029 (identity as core type)
|
||||
|
||||
**Risk**: Medium — introduces irpc dependency behind feature flag. Needs careful API design so the trait-based path and the irpc path produce identical results.
|
||||
|
||||
### 1.5 OperationEnv and OperationRegistry
|
||||
|
||||
**Source**: research/services.md (OperationContext, OperationEnv), existing call-protocol.md (OperationSpec, OperationRegistry)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `OperationContext` struct (request_id, parent_request_id, identity, metadata, env, trusted)
- Define `OperationEnv`, the universal composition mechanism, with three dispatch backends:
  - **Local dispatch**: Direct function call through the operation registry
  - **Service dispatch**: irpc protocol call to a service backend
  - **Remote dispatch**: Call protocol EventEnvelope to a remote node
- Extend the existing `OperationRegistry` to support all three dispatch paths
- Define `ResponseEnvelope` as the universal return type (matching `@alkdev/operations`)
- Operation handlers receive `(input: Value, context: OperationContext) -> ResponseEnvelope`
- The `env` field on `OperationContext` allows handlers to call other operations without knowing the dispatch path
|
||||
|
||||
**Hard constraint**: The OperationEnv composition model must match the behavioral contract from `@alkdev/operations`. Namespace + operation name → invoke with input, return output. This is what makes operations universally composable across HTTP, MCP, DNS, call protocol, and irpc. The Rust implementation can differ in its internal dispatch mechanism, but the handler-facing API must preserve this contract.
|
||||
|
||||
**New crate**: None. OperationEnv, OperationContext, and OperationRegistry are core concepts in `alknet_core::call`.
|
||||
|
||||
**ADR**: 033 (call protocol / irpc relationship)
|
||||
|
||||
**Risk**: Medium — OperationEnv is a new abstraction that must coexist with the existing call protocol handler pattern. The registry currently maps paths to handlers; OperationEnv adds namespace-aware composition on top. Need to ensure the two models compose cleanly.
|
||||
|
||||
### 1.6 Config Service (irpc Protocol)
|
||||
|
||||
**Source**: research/configuration.md, research/services.md (ConfigProtocol definition)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `ConfigProtocol` enum with `#[rpc_requests]` (behind `irpc` feature flag)
|
||||
- Implement `ConfigServiceImpl` backed by `ArcSwap<DynamicConfig>`
|
||||
- Expose reload methods through the service
|
||||
|
||||
**New crate**: None. Config is core.
|
||||
|
||||
**Feature flag**: `irpc` feature.
|
||||
|
||||
**ADR**: 030 (static/dynamic split)
|
||||
|
||||
**Risk**: Low — thin wrapper over ArcSwap.
|
||||
|
||||
### 1.7 Multi-Transport Listeners
|
||||
|
||||
**Source**: research/configuration.md (multi-transport section)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Change `ServeTransportMode` from single enum to `Vec<ListenerConfig>`
|
||||
- `Server::run()` spawns one accept loop per listener, sharing `DynamicConfig`, `ConnectionRateLimiter`, sessions, and shutdown signal
|
||||
- Add `TransportKind::WebTransport` and `TransportKind::Dns` variants (initially tags only — no acceptor implementation)
|
||||
- TOML config file support: `[[listeners]]` array-of-tables syntax
|
||||
|
||||
**New crate**: None. This is alknet-core server logic.
|
||||
|
||||
**ADR**: 026 (transport-interface separation) — TransportKind enum includes all Layer 1 types
|
||||
|
||||
**Risk**: Medium — changes the primary API surface of `serve()`. Backwards compat via accepting both single `transport` and `listeners` array.
|
||||
|
||||
### 1.8 Interface Abstraction
|
||||
|
||||
**Source**: New concept from review (not in research docs explicitly)
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `Interface` trait that consumes a `Transport::Stream` and produces call protocol events
|
||||
- `SshInterface` — wraps existing russh handler, produces SSH channels + control channel
|
||||
- `RawFramingInterface` — reads length-prefixed JSON EventEnvelope frames, produces call protocol events directly (no SSH)
|
||||
- The call protocol is interface-agnostic — it receives `EventEnvelope` frames from any interface
|
||||
|
||||
This is the most architecturally significant change in Phase 1. Currently, SSH is deeply embedded in the server handler. Extracting it into an Interface trait means:
|
||||
|
||||
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
    // The session produces call protocol events and handles responses
}
```
|
||||
|
||||
The existing `ServerHandler` logic (auth, channel open, proxy) becomes `SshInterface`. The raw framing interface becomes a simple length-prefix reader. DNS control channel becomes (DNS transport + raw framing interface).
|
||||
|
||||
**This requires careful design review**. The SSH handler currently owns auth, channel management, and proxy logic. Much of that moves to Layer 3 (call protocol) or stays in the interface. The split needs to be clean.
|
||||
|
||||
**ADR**: 026 (transport-interface separation)
|
||||
|
||||
**Risk**: High — refactoring the core server handler. This is the most invasive change in Phase 1. May need to be split into sub-phases or deferred partially.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Core Bridge
|
||||
|
||||
**Goal**: Complete the interface-to-protocol bridge and add the core types that external crates and HTTP interfaces depend on. Phase 1 established the interface trait and SSH extraction but left the call protocol bridge (SshSession recv/send) as stubs and deferred key interface model refinements. Phase 2 closes those gaps so that Phase 3 crates can reference a stable, functional core.
|
||||
|
||||
**Why before external crates**: The external crates (alknet-secret, alknet-storage) depend on a core where the Layer 2→3 bridge actually works. Without `SshSession::recv()`/`send()` producing and consuming `InterfaceEvent` frames, the call protocol is inert for SSH sessions. Without `RawFramingInterface` implemented, there's no non-SSH path either. And without `StreamInterface`/`MessageInterface` split and `CredentialProvider`, the phase 2 research docs (interface-model, credential-provider, tls-transport) describe a target architecture that doesn't exist in code yet. These must exist before crates can wire against them.
|
||||
|
||||
### 2.1 SshSession Call Protocol Bridge
|
||||
|
||||
**Source**: interface.md (OQ-IF-01, resolved), ssh-interface-extraction task, control_channel.rs
|
||||
|
||||
**Current state**: `SshSession::recv()` always returns `None` and `SshSession::send()` silently discards. The `ControlChannelRouter` exists but has no handler wired. The `alknet-control:0` SSH channel is detected in `channel_open_direct_tcpip` but not bridged to `InterfaceEvent` frames.
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Implement `SshSession::recv()`: read `EventEnvelope` frames from the `alknet-control:0` channel stream, wrap in `InterfaceEvent` with the session's `Identity`
- Implement `SshSession::send()`: write `EventEnvelope` frames to the `alknet-control:0` channel stream
- Wire `ControlChannelRouter` to bridge SSH channel data to the call protocol handler
- The session's `Identity` (from SSH auth) is attached to every `InterfaceEvent`
|
||||
|
||||
**Prerequisites**: Verify that `call::frame::{encode, decode}` exists and produces/consumes frames compatible with the SSH channel data stream. The `ControlChannelRouter` in `control_channel.rs` needs a handler wired — check its current API for how to register a call protocol handler.
|
||||
|
||||
**Why this is Phase 2 not Phase 4**: This is the duct work that connects Layer 2 (interface) to Layer 3 (protocol). Without it, SSH sessions can only forward ports — they cannot invoke call protocol operations. This is core functionality, not an advanced feature.
|
||||
|
||||
**New crate**: None. This is alknet-core.
|
||||
|
||||
**Risk**: Medium — the SSH channel → call protocol bridge needs careful framing (4-byte length prefix over the SSH channel data stream, matching `RawFramingInterface`'s wire format). The `SshHandler` already detects `alknet-*` destinations; the bridge is connecting that detection to the channel stream.
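
The 4-byte length-prefix framing mentioned above can be sketched with plain `std::io` traits. This is a minimal illustration, not the actual `call::frame::{encode, decode}` code: the big-endian byte order, the `MAX_FRAME_LEN` cap, and the function names `write_frame` / `read_frame` are all assumptions made for the example.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Hypothetical upper bound on a frame body; not taken from the spec.
const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Write one frame: a 4-byte length prefix (big-endian is assumed here)
/// followed by the body bytes (for example a JSON `EventEnvelope`).
pub fn write_frame<W: Write>(w: &mut W, body: &[u8]) -> io::Result<()> {
    if body.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(ErrorKind::InvalidInput, "frame too large"));
    }
    let len = body.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(body)
}

/// Read one frame. Returns `Ok(None)` on a clean EOF before any prefix byte,
/// and an error if the stream ends in the middle of a prefix or body.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut prefix = [0u8; 4];
    let mut filled = 0;
    while filled < prefix.len() {
        match r.read(&mut prefix[filled..]) {
            Ok(0) if filled == 0 => return Ok(None),
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "truncated length prefix"))
            }
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Because both functions are generic over `Read` / `Write`, the same codec could in principle sit on top of an SSH channel stream or a raw transport stream, which is the property the wire-format match between `SshSession` and `RawFramingInterface` relies on.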
|
||||
|
||||
### 2.2 RawFramingInterface Implementation
|
||||
|
||||
**Source**: interface.md, integration-plan Phase 1.8
|
||||
|
||||
**Current state**: `RawFramingInterface` and `RawFramingSession` are stub types. `accept()` returns an error, `recv()` returns `None`, `send()` returns an error.
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Implement `RawFramingInterface::accept()`: read the 4-byte length prefix + JSON `EventEnvelope` frame from the transport stream, return a `RawFramingSession` that wraps the stream
- Implement `RawFramingSession::recv()`: read length-prefixed `EventEnvelope` frames from the stream, produce `InterfaceEvent`
- Implement `RawFramingSession::send()`: write length-prefixed `EventEnvelope` frames to the stream
- Auth for raw framing: first frame on the session is an auth event carrying token data, resolved via `IdentityProvider::resolve_from_token()`. After auth succeeds, subsequent frames are call protocol `EventEnvelope` data. The `RawFramingSession` is not considered authenticated until the auth frame is processed.
|
||||
|
||||
**Auth design decision**: Raw framing sessions use a first-frame auth pattern. The first `InterfaceEvent` on a `RawFramingSession` carries an auth token (in the `InterfaceEvent.identity` field or a dedicated auth event type). After authentication, all subsequent frames are call protocol events. This is simpler and more secure than per-frame auth — the session has a clear auth state transition, and the token is only transmitted once. For sessions that fail auth, the session is terminated immediately.
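
The first-frame auth pattern described above amounts to a small state machine. The sketch below is illustrative only: `Identity`, `Frame`, `Outcome`, and `RawSession` are hypothetical stand-ins for the real alknet-core types, the resolver closure stands in for `IdentityProvider::resolve_from_token()`, and treating a second auth frame as a protocol error is an assumption of this example.

```rust
/// Hypothetical stand-in for alknet-core's `Identity`.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub principal: String,
}

/// Hypothetical decoded frame kinds on a raw framing session.
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    Auth { token: String },
    Call { payload: Vec<u8> },
}

#[derive(Debug, PartialEq)]
enum SessionState {
    AwaitingAuth,
    Authenticated(Identity),
    Terminated,
}

/// What the session reports back for each incoming frame.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    AuthAccepted,
    Event { identity: Identity, payload: Vec<u8> },
    Closed,
}

pub struct RawSession<F: Fn(&str) -> Option<Identity>> {
    state: SessionState,
    resolve_from_token: F,
}

impl<F: Fn(&str) -> Option<Identity>> RawSession<F> {
    pub fn new(resolve_from_token: F) -> Self {
        RawSession { state: SessionState::AwaitingAuth, resolve_from_token }
    }

    /// Advance the session by one frame. Any frame that does not fit the
    /// current state terminates the session.
    pub fn on_frame(&mut self, frame: Frame) -> Outcome {
        // Take the current state; it stays Terminated unless a branch restores it.
        let state = std::mem::replace(&mut self.state, SessionState::Terminated);
        match (state, frame) {
            (SessionState::AwaitingAuth, Frame::Auth { token }) => {
                match (self.resolve_from_token)(&token) {
                    Some(identity) => {
                        self.state = SessionState::Authenticated(identity);
                        Outcome::AuthAccepted
                    }
                    None => Outcome::Closed, // failed auth: terminate immediately
                }
            }
            (SessionState::Authenticated(identity), Frame::Call { payload }) => {
                self.state = SessionState::Authenticated(identity.clone());
                Outcome::Event { identity, payload }
            }
            // Call before auth, repeated auth, or any frame after termination.
            _ => Outcome::Closed,
        }
    }
}
```

The token is seen exactly once, and every later event carries the identity resolved from it, which mirrors the "clear auth state transition" argument in the design decision.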
|
||||
|
||||
**Why this is Phase 2**: Raw framing is the simplest interface and the foundation for all non-SSH paths (TCP mesh, WebTransport, DNS). Without it, no `MessageInterface` or `StreamInterface` other than SSH can carry call protocol traffic. The HTTP interface (scaffolded in 2.7, completed in Phase 5) builds on the framing logic established here.
|
||||
|
||||
**New crate**: None. This is alknet-core.
|
||||
|
||||
**Risk**: Low — straightforward length-prefixed frame reader/writer. The frame format already exists in `call::frame::{encode, decode}`. The auth design (first-frame auth) is simple and matches the `InterfaceEvent` model where `identity: Option<Identity>` is set on auth and carried forward.
|
||||
|
||||
### 2.3 StreamInterface / MessageInterface Split
|
||||
|
||||
**Source**: research/phase2/interface-model.md
|
||||
|
||||
**Current state**: The `Interface` trait has one form (`accept(stream) → Session`). Phase 2 research identifies that HTTP and DNS are not stream-based — they're message-based (individual request/response pairs, no persistent session). The research proposes splitting into `StreamInterface` and `MessageInterface`.
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Rename `Interface` → `StreamInterface` (the current trait becomes the stream-specific variant)
- Rename `InterfaceSession` → `StreamInterfaceSession` (or keep as `InterfaceSession`, since it is already specific to stream sessions)
- Add `MessageInterface` trait: `handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>`
- Add `InterfaceRequest` and `InterfaceResponse` types
- Add `HttpInterface` stub (struct and impl signature, axum not wired yet)
- Add `DnsInterface` stub (struct definition only)
- Restructure `InterfaceConfig` enum: current `InterfaceConfig::Ssh(SshInterfaceConfig)` and `InterfaceConfig::RawFraming(RawFramingConfig)` become `StreamInterfaceConfig::Ssh` and `StreamInterfaceConfig::RawFraming`. Add `MessageInterfaceConfig` variants for HTTP and DNS.
- Update `ListenerConfig` to include `Stream`, `Http`, and `Dns` variants (per ADR-035 and updated interface.md)
- Add `TransportKind::WebTransport` as a tag-only variant (no acceptor implementation). This was planned for Phase 1 but never added. It's a trivial addition that prevents a breaking change later.
- Note: `TransportKind::Dns` was never added to the code, so no removal is needed. The updated specs correctly show DNS as a `MessageInterface` with its own `ListenerConfig::Dns` variant, not a transport.
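
To make the shape of the split concrete, here is a simplified, synchronous sketch of the two traits. It is an assumption-heavy illustration: the real traits are async, and every type here (`EventEnvelope`, `InterfaceRequest`, `InterfaceResponse`, `ReadWrite`, `WholeStreamInterface`, `EchoInterface`) is a hypothetical stand-in with invented fields.

```rust
use std::io::{Read, Write};

/// Hypothetical stand-ins for the real alknet-core types.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope { pub payload: Vec<u8> }
pub struct InterfaceRequest { pub path: String, pub body: Vec<u8> }
pub struct InterfaceResponse { pub status: u16, pub body: Vec<u8> }

/// Helper so a bidirectional byte stream can be boxed as one trait object.
pub trait ReadWrite: Read + Write {}
impl<T: Read + Write> ReadWrite for T {}

/// Stream-based: the interface is handed a transport stream and returns a
/// long-lived session.
pub trait StreamInterface {
    type Session: InterfaceSession;
    fn accept(&self, stream: Box<dyn ReadWrite>) -> Result<Self::Session, String>;
}

pub trait InterfaceSession {
    fn recv(&mut self) -> Option<EventEnvelope>;
    fn send(&mut self, event: EventEnvelope) -> Result<(), String>;
}

/// Message-based: one request in, one response out, no persistent session.
pub trait MessageInterface {
    fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse, String>;
}

/// Toy stream interface: the whole stream content becomes a single event.
pub struct WholeStreamInterface;
pub struct WholeStreamSession { stream: Box<dyn ReadWrite>, done: bool }

impl StreamInterface for WholeStreamInterface {
    type Session = WholeStreamSession;
    fn accept(&self, stream: Box<dyn ReadWrite>) -> Result<Self::Session, String> {
        Ok(WholeStreamSession { stream, done: false })
    }
}

impl InterfaceSession for WholeStreamSession {
    fn recv(&mut self) -> Option<EventEnvelope> {
        if self.done { return None; }
        self.done = true;
        let mut payload = Vec::new();
        self.stream.read_to_end(&mut payload).ok()?;
        if payload.is_empty() { None } else { Some(EventEnvelope { payload }) }
    }
    fn send(&mut self, event: EventEnvelope) -> Result<(), String> {
        self.stream.write_all(&event.payload).map_err(|e| e.to_string())
    }
}

/// Toy message interface: echoes the request body back with status 200.
pub struct EchoInterface;
impl MessageInterface for EchoInterface {
    fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse, String> {
        Ok(InterfaceResponse { status: 200, body: request.body })
    }
}
```

The two traits share no supertrait here, which matches the resolution recorded for OQ-P2-01: their signatures and transport ownership differ too much for a common abstraction to pay off.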
|
||||
|
||||
**Why this is Phase 2**: This is a type-system change that affects how all future interfaces are implemented. If we build HTTP on top of `Interface` (singular) and then need to split later, we'd refactor HTTP, DNS, WebSocket, and any other interface added in Phases 4+. Doing the split now is cheap — it's a rename + new trait + two stubs — and prevents a larger refactor later.
|
||||
|
||||
**New crate**: None. This is alknet-core.
|
||||
|
||||
**ADR**: 035 (StreamInterface/MessageInterface split — supersedes the Layer 2 aspects of ADR-026)
|
||||
|
||||
**Risk**: Low — rename and new trait. Existing `SshInterface` and `RawFramingInterface` become `StreamInterface` implementations. No behavior change for stream-based interfaces. The `InterfaceConfig` enum restructuring and `TransportKind::WebTransport` addition are mechanical changes.
|
||||
|
||||
**Scheduling note**: This task should be done early in Phase 2 because the other Phase 2 tasks (2.1, 2.2, 2.4, 2.5, 2.6, 2.7) reference the new trait names. It can proceed in parallel with 2.1 and 2.2, since those are mostly additive.
|
||||
|
||||
### 2.4 CredentialProvider Trait and CredentialSet
|
||||
|
||||
**Source**: research/phase2/credential-provider.md
|
||||
|
||||
**Current state**: No outbound credential resolution exists. Each service wrapper would need to independently retrieve and manage credentials.
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Define `CredentialProvider` trait in `alknet_core::credentials`
- Define `CredentialSet` enum: `ApiKey`, `Basic`, `Bearer`, `S3AccessKey`, `OidcToken`, `Custom`
- Implement `ConfigCredentialProvider`, a config-backed stub that reads API keys and static credentials from `DynamicConfig`. This is the Phase 2 default: simple, no secret service dependency, sufficient for testing and single-node deployments.
- Wire into `OperationEnv` so handlers can access credentials through `context.env` (or a separate `CredentialProvider` field on `OperationContext`; implementation detail)
- Define the `SecretStoreCredentialProvider` type and its interface (reads from `SecretProtocol::Decrypt`, holds in RAM) but **do not implement the body**. Leave it as a stub that returns `None`. Full implementation requires alknet-secret (Phase 3).
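
A rough sketch of how these pieces could fit together follows. The variant names come from the list above; everything else is assumed for illustration, including the variant payloads, the `resolve(service)` method signature, and the use of a plain `HashMap` in place of `DynamicConfig`.

```rust
use std::collections::HashMap;

/// Variant names follow the plan; the payload shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    ApiKey(String),
    Basic { username: String, password: String },
    Bearer(String),
    S3AccessKey { access_key_id: String, secret_access_key: String },
    OidcToken(String),
    Custom(HashMap<String, String>),
}

/// Hypothetical trait shape: resolve outbound credentials for a named service.
pub trait CredentialProvider {
    fn resolve(&self, service: &str) -> Option<CredentialSet>;
}

/// Config-backed provider; the map stands in for a `DynamicConfig` section.
pub struct ConfigCredentialProvider {
    entries: HashMap<String, CredentialSet>,
}

impl ConfigCredentialProvider {
    pub fn new(entries: HashMap<String, CredentialSet>) -> Self {
        ConfigCredentialProvider { entries }
    }
}

impl CredentialProvider for ConfigCredentialProvider {
    fn resolve(&self, service: &str) -> Option<CredentialSet> {
        self.entries.get(service).cloned()
    }
}

/// Phase 2 stub: the type exists, but resolution always returns `None`
/// until the secret service is available in Phase 3.
pub struct SecretStoreCredentialProvider;

impl CredentialProvider for SecretStoreCredentialProvider {
    fn resolve(&self, _service: &str) -> Option<CredentialSet> {
        None
    }
}
```

Handlers that hold only a `&dyn CredentialProvider` would not need to change when the stub is later replaced by a secret-store-backed implementation.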
|
||||
|
||||
**Why this is Phase 2**: The secret crate (Phase 3) needs `CredentialProvider` as a consumer of `SecretProtocol::Decrypt`. The trait and enum must exist in core before the secret crate can wire against them. This is the same pattern as `IdentityProvider` — trait in core, default impl uses simple storage, production impl uses the secret service.
|
||||
|
||||
**New crate**: None. Trait and enum in alknet-core.
|
||||
|
||||
**Risk**: Low — new trait and enum, no existing code changes. `ConfigCredentialProvider` is a simple config-backed lookup. `SecretStoreCredentialProvider` stub returns `None` until Phase 3 provides the secret service dependency.
|
||||
|
||||
**Split note**: This task is naturally split into:
|
||||
- **2.4a** (this phase): Define `CredentialProvider` trait, `CredentialSet` enum, `ConfigCredentialProvider` impl, wire into `OperationEnv`/`OperationContext`. This is self-contained and testable.
- **2.4b** (Phase 3, after alknet-secret exists): Implement `SecretStoreCredentialProvider` backed by `SecretProtocol::Decrypt`. This requires alknet-secret as a dependency.
|
||||
|
||||
### 2.5 ListenerConfig Update and HTTP Listener Stub
|
||||
|
||||
**Source**: research/phase2/tls-transport.md
|
||||
|
||||
**Current state**: Phase 1 added `ListenerConfig` with a `Stream` variant (transport + interface pair). Phase 2 research adds `Http` and `Dns` listener variants for message-based interfaces. Earlier planning assumed that `TransportKind::Dns` would need to be removed, but that variant was never added to the code (DNS is a `MessageInterface`, not a transport).
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- `TransportKind::Dns` removal: **No-op**. `TransportKind` in the current code has `Tcp`, `Tls`, and `Iroh` only. `Dns` was never added to the enum. The updated specs correctly show DNS as a `MessageInterface` with its own `ListenerConfig::Dns` variant (per ADR-035), not as a transport variant.
- Add `ListenerConfig::Http` variant: `{ bind_addr, tls, stealth }`
- Add `ListenerConfig::Dns` variant: `{ bind_addr, tls }` (DNS as a MessageInterface with its own listener)
- Extend the server accept loop to handle `ListenerConfig::Http` by spawning an axum router when `stealth` mode detects HTTP traffic (replacing `send_fake_nginx_404`)
- `HttpInterface` stub defined in 2.3 gets its structural types but no route implementations yet
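
The stealth-mode decision in the accept loop can be pictured as a classification of the first peeked bytes. The sketch below is not the existing detection code; `PeekedProtocol`, `Handoff`, `classify`, and `handoff` are hypothetical names, and closing on unknown traffic is an assumption. It relies on two protocol facts: an SSH peer opens with an identification string starting with `SSH-`, and an HTTP/1.x request starts with a method token followed by a space.

```rust
#[derive(Debug, PartialEq)]
pub enum PeekedProtocol {
    Ssh,
    Http,
    Unknown,
}

/// Where the accept loop would send the connection (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Handoff {
    SshInterface,
    HttpRouter,
    Close,
}

/// Common HTTP/1.x request-line prefixes.
const HTTP_METHODS: [&[u8]; 9] = [
    b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ",
    b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ",
];

/// Classify a connection from its first peeked bytes without consuming them.
pub fn classify(peeked: &[u8]) -> PeekedProtocol {
    if peeked.starts_with(b"SSH-") {
        return PeekedProtocol::Ssh;
    }
    if HTTP_METHODS.iter().any(|method| peeked.starts_with(method)) {
        return PeekedProtocol::Http;
    }
    PeekedProtocol::Unknown
}

/// Map the classification to a handoff target. Routing HTTP to the router is
/// the step that would replace the fake 404 response.
pub fn handoff(peeked: &[u8]) -> Handoff {
    match classify(peeked) {
        PeekedProtocol::Ssh => Handoff::SshInterface,
        PeekedProtocol::Http => Handoff::HttpRouter,
        PeekedProtocol::Unknown => Handoff::Close,
    }
}
```

In a real server the peek would have to happen on a buffered reader so that the bytes remain available to whichever handler takes over the stream.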
|
||||
|
||||
**Why this is Phase 2**: `ListenerConfig` is the server's primary configuration type. Adding HTTP and DNS listener variants now means Phase 3+ crates and the Phase 5 HTTP implementation can reference the right type from the start. Because `TransportKind::Dns` was never added, keeping DNS out of `TransportKind` from the start avoids a breaking removal later.
|
||||
|
||||
**New crate**: None. This is alknet-core. New dependency: `axum` (behind `http` feature flag).
|
||||
|
||||
**Risk**: Low. The work is type changes plus a stub axum router. The `send_fake_nginx_404` → axum handoff is a small change to the existing stealth detection code. Full HTTP route implementations are Phase 5 (section 5.3).
|
||||
|
||||
### 2.6 API Keys in DynamicConfig
|
||||
|
||||
**Source**: research/phase2/interface-model.md (Config section), research/phase2/credential-provider.md
|
||||
|
||||
**Current state**: `DynamicConfig.auth` has `authorized_keys` for SSH auth and `token` settings but no simple bearer API keys for service accounts or automation.
|
||||
|
||||
**Changes to alknet-core**:
|
||||
- Add `[[auth.api_keys]]` section to `DynamicConfig`: prefix, hash (SHA-256), scopes, description, optional TTL
- Extend `ConfigIdentityProvider::resolve_from_token()` to verify API keys in addition to AuthTokens
- API keys are shorter and simpler than AuthTokens: no Ed25519 key pair needed, just a hash-verified bearer string
- `SecretStoreCredentialProvider` can also resolve API keys when database-backed storage is available
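
A possible shape for the API key lookup is sketched below. It is hedged in several ways: `ApiKeyEntry` and `verify_api_key` are hypothetical names, the optional TTL is modelled as an absolute `expires_at` timestamp, and the digest is passed in as a closure because the Rust standard library has no SHA-256. A real implementation would supply a SHA-256 hex digest from a crypto crate.

```rust
/// Hypothetical in-memory form of one `[[auth.api_keys]]` entry.
pub struct ApiKeyEntry {
    pub prefix: String,
    pub hash_hex: String,
    pub scopes: Vec<String>,
    pub description: String,
    /// Optional expiry as seconds since the Unix epoch (assumed TTL model).
    pub expires_at: Option<u64>,
}

/// Compare two byte strings without returning early on the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Find the entry matching a presented bearer string, if any.
/// `digest_hex` is expected to return the hex-encoded digest of its input.
pub fn verify_api_key<'a, D>(
    entries: &'a [ApiKeyEntry],
    presented: &str,
    now: u64,
    digest_hex: D,
) -> Option<&'a ApiKeyEntry>
where
    D: Fn(&[u8]) -> String,
{
    let computed = digest_hex(presented.as_bytes());
    entries.iter().find(|entry| {
        presented.starts_with(entry.prefix.as_str())
            && entry.expires_at.map_or(true, |deadline| now < deadline)
            && constant_time_eq(entry.hash_hex.as_bytes(), computed.as_bytes())
    })
}
```

Only the hash is stored in config, so a leaked config file does not directly reveal usable keys; the prefix lets an operator identify which key was presented without storing the secret itself.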
|
||||
|
||||
**Why this is Phase 2**: The HTTP interface (scaffolded in 2.7, completed in Phase 5) needs bearer token auth, and the simplest path is API keys that already work with `IdentityProvider::resolve_from_token()`. Without this, HTTP auth has no config-based mechanism.
|
||||
|
||||
**New crate**: None. This is alknet-core.
|
||||
|
||||
**Risk**: Low — additive config section and an additional lookup path in an existing trait method.
|
||||
|
||||
### 2.7 Axum HTTP Router Scaffold
|
||||
|
||||
**Source**: research/phase2/tls-transport.md
|
||||
|
||||
**Changes to alknet-core** (behind `http` feature flag):
|
||||
- Add `axum` dependency (behind feature flag)
- Create `alknet_core::http` module with an axum `Router` scaffold:
  - Auth middleware that extracts `Authorization: Bearer <token>` and calls `IdentityProvider::resolve_from_token()`, attaching the resolved `Identity` to the request extensions
  - Stealth handoff: replace `send_fake_nginx_404` with axum router serving the `BufReader<TlsStream>`
  - A default 404 handler for any unmatched routes (no hardcoded operation paths)
- No operational routes yet. The question of how HTTP paths map to operation invocations depends on the from_openapi / spec-generation work and is deferred to Phase 5. Custom routes (git, S3, OpenAI proxy) will register directly with the axum router at their own paths, sharing the auth middleware but with their own routing logic.
- The `ListenerConfig::Http` variant and stealth mode handoff are established here so that HTTP traffic reaches axum with auth context. Routing *inside* axum is a later concern.
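
The core of the auth middleware is independent of axum and can be sketched with the standard library alone. In this illustration `Identity`, `bearer_token`, and `authenticate` are hypothetical names, the resolver closure stands in for `IdentityProvider::resolve_from_token()`, and mapping every failure to a 401 status code is an assumption.

```rust
/// Hypothetical stand-in for alknet-core's `Identity`.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub principal: String,
}

/// Extract the token from an `Authorization` header value of the form
/// `Bearer <token>`. The scheme name is matched case-insensitively.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    let mut parts = header_value.trim().splitn(2, ' ');
    let scheme = parts.next()?;
    let token = parts.next()?.trim();
    if scheme.eq_ignore_ascii_case("Bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Resolve an identity from an optional header value, or return the HTTP
/// status code the middleware would answer with.
pub fn authenticate<F>(authorization: Option<&str>, resolve_from_token: F) -> Result<Identity, u16>
where
    F: Fn(&str) -> Option<Identity>,
{
    authorization
        .and_then(bearer_token)
        .and_then(|token| resolve_from_token(token))
        .ok_or(401)
}
```

In the real scaffold, the `Ok` branch would attach the `Identity` to the request extensions so that later handlers can read it.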
|
||||
|
||||
**Why this is Phase 2**: The auth middleware and stealth handoff are prerequisites for any HTTP endpoint. Without this, the only way to reach call protocol operations is via SSH. The scaffold gets HTTP traffic to axum with identity — the specific routes and path conventions are intentionally not specified here.
|
||||
|
||||
**New crate**: None. In alknet-core behind `http` feature flag.
|
||||
|
||||
**Risk**: Low — structural scaffold with auth middleware and stealth handoff only. No operational routes or path conventions.
|
||||
|
||||
**Open question**: How should external HTTP paths map to alknet operations? The internal path convention (`/{namespace}/{op}` over call protocol channels) is one design; external HTTP paths are determined by the API being exposed (OpenAI `/v1/chat/completions`, S3 `/{bucket}/{key}`, git `/{repo}.git/info/refs`). The inverse of `from_openapi` — generating an OpenAPI spec from registered operations and mapping those to HTTP routes — will determine the answer. This is deferred to Phase 5.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: External Crates
|
||||
|
||||
**Goal**: Create the new crates that core depends on by type but not by implementation.
|
||||
|
||||
**Why after Phase 2**: The core types and bridges must be stable before building crates that reference them. Phase 2 ensures that the `InterfaceSession` bridge works, `CredentialProvider` exists, and `ListenerConfig` has its final shape. The external crates can then wire against a functional core.
|
||||
|
||||
### 3.1 alknet-secret
|
||||
|
||||
**Source**: research/services.md (SecretProtocol), research/storage.md (secrets section, key derivation)
|
||||
|
||||
**Contents**:
|
||||
- BIP39 mnemonic generation and seed derivation
- SLIP-0010 Ed25519 HD key derivation (SLIP-0044 coin type 74')
- AES-256-GCM encryption/decryption for external credentials
- `SecretProtocol` irpc service implementation (Unlock, Lock, DeriveEd25519, DeriveEncryptionKey, Encrypt, Decrypt)
- `EncryptedData` type (key_version, salt, iv, ciphertext)
- Derivation path constants
|
||||
|
||||
**Dependencies**: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2, irpc
|
||||
|
||||
**Does NOT depend on**: alknet-core, alknet-storage
|
||||
|
||||
**Interface back to core**: alknet-secret types (EncryptedData, derivation paths) are referenced by alknet-storage when storing encrypted nodes. The wire format is stable; core never sees the seed or derived keys.
|
||||
|
||||
**ADR**: 027 (crate decomposition)
|
||||
|
||||
**Risk**: Low — new crate, no existing code to refactor. Crypto dependencies are well-understood.
|
||||
|
||||
### 3.2 alknet-storage
|
||||
|
||||
**Source**: research/storage.md (entire document)
|
||||
|
||||
**Contents**:
|
||||
- SQLite-backed metagraph (GraphType, NodeType, EdgeType, Graph, Node, Edge)
- Identity tables (accounts, organizations, peer_credentials, api_keys, audit_logs)
- ACL as metagraph (PrincipalNode, DelegatesEdge, access control graph)
- Encrypted node type (bridges to alknet-secret's EncryptedData format)
- Honker integration (stream_publish/subscribe, notify/listen, queue/claim)
- System DB vs Tenant DB separation
- `StorageProtocol` irpc service
|
||||
|
||||
**Dependencies**: rusqlite (via honker or direct), honker, serde_json, jsonschema, petgraph, irpc
|
||||
|
||||
**Does NOT depend on**: alknet-core, alknet-secret (but references EncryptedData type format)
|
||||
|
||||
**Interface back to core**:
|
||||
- `StorageIdentityProvider` implements alknet-core's `IdentityProvider` trait (queries peer_credentials + ACL graph)
- `StorageProtocol` is called via irpc from alknet-core's service layer
|
||||
|
||||
**ADR**: 027 (crate decomposition), 032 (event boundary discipline)
|
||||
|
||||
**Risk**: Medium — honker integration is new. SQLite schema needs to match the TypeScript version for compatibility.
|
||||
|
||||
### 3.3 alknet-flowgraph
|
||||
|
||||
**Source**: research/flow.md (entire document)
|
||||
|
||||
**Contents**:
|
||||
- `FlowGraph<N, E>` generic graph over `petgraph::DiGraph`
- `NodeAttributes` / `EdgeAttributes` traits
- Operation graph construction from `OperationSpec`s
- Call graph population from `EventEnvelope` events
- Type compatibility checking (jsonschema)
- Cycle detection, topological sort, reachability queries
- Serde serialization/deserialization
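
As an illustration of the cycle detection and topological sort listed above, here is a standard-library-only sketch using Kahn's algorithm. The planned crate builds on `petgraph`, so this is not how `FlowGraph` would be implemented; the function name `topological_order` and the index-based edge list are assumptions chosen to keep the example self-contained.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm over nodes `0..node_count` and directed `(from, to)` edges.
/// Returns a topological order, or `None` if the graph contains a cycle.
/// Panics if an edge refers to a node index outside `0..node_count`.
pub fn topological_order(node_count: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; node_count];
    let mut adjacency: Vec<Vec<usize>> = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        adjacency[from].push(to);
        indegree[to] += 1;
    }

    // Start with every node that has no incoming edges.
    let mut ready: VecDeque<usize> = (0..node_count).filter(|&n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);

    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in &adjacency[node] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Nodes left unvisited are on or behind a cycle.
    if order.len() == node_count {
        Some(order)
    } else {
        None
    }
}
```

For an operation graph, a `None` result would indicate that the composed operations depend on each other in a loop and cannot be scheduled in dependency order.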
|
||||
|
||||
**Dependencies**: petgraph, serde, serde_json, jsonschema, thiserror
|
||||
|
||||
**Does NOT depend on**: alknet-core, alknet-storage, alknet-secret
|
||||
|
||||
**Interface back to core**: `OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions. Bridge is serialization — flowgraph serializes to JSON, storage persists it.
|
||||
|
||||
**ADR**: 027 (crate decomposition)
|
||||
|
||||
**Risk**: Low — pure computation crate, no I/O, no external state. Straight port of TypeScript design.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Integration and Wiring
|
||||
|
||||
**Goal**: Wire the crates together. The CLI binary and NAPI layer assemble everything.
|
||||
|
||||
**Why after Phase 3**: Integration requires all pieces to exist. Phase 1 defines the interfaces; Phase 2 completes the core bridge; Phase 3 builds the crate implementations; Phase 4 connects them.
|
||||
|
||||
### 4.1 CLI Binary (alknet crate)
|
||||
|
||||
**Source**: research/configuration.md (CLI config, --config flag)
|
||||
|
||||
**Contents**:
|
||||
- `alknet serve`: parse TOML config, assemble StaticConfig + initial DynamicConfig, create services, run multi-transport server
- `alknet connect`: parse CLI flags or TOML profile, create ConnectOptions, run client
- Service assembly: for minimal deployments, use ArcSwap-backed services. For production, wire in SQLite-backed services.
- TOML config file parsing (`alknet serve --config stack.toml`)
|
||||
|
||||
**New dependency**: `toml` crate (for config file parsing)
|
||||
|
||||
### 4.2 Service Assembly
|
||||
|
||||
The CLI or NAPI layer is responsible for wiring services together:
|
||||
|
||||
```rust
// Minimal deployment (single-node, CLI)
let auth = ConfigIdentityProvider::new(dynamic_config.clone());
let config = ConfigServiceImpl::new(dynamic_config.clone());
let secret = None; // No secret service in minimal mode

// Production deployment (head node)
let auth = StorageIdentityProvider::new(storage_db);
let config = ConfigServiceImpl::new(dynamic_config.clone());
let secret = SecretServiceImpl::new(storage_db); // Holds seed in memory
```
|
||||
|
||||
Core doesn't know about this assembly — it receives `IdentityProvider` and `DynamicConfig` through its public API.
|
||||
|
||||
### 4.3 OperationEnv Wiring — Three Dispatch Paths
|
||||
|
||||
The OperationEnv is the universal composition mechanism. When a handler calls `context.env.secrets.derive(input)`, the runtime resolves which dispatch path to take:
|
||||
|
||||
**Local dispatch** (in-process):
|
||||
```
handler calls context.env[namespace][op](input)
  → OperationEnv resolves the handler function from the local registry
  → Direct function call, zero serialization
  → Returns ResponseEnvelope
```
|
||||
|
||||
**Service dispatch** (in-cluster, irpc):
|
||||
```
handler calls context.env[namespace][op](input)
  → OperationEnv resolves that this operation is backed by an irpc service
  → Serializes input via postcard, sends to AuthProtocol::VerifyPubkey via mpsc channel (local) or QUIC stream (remote)
  → Receives AuthResult, wraps in ResponseEnvelope
```
|
||||
|
||||
**Remote dispatch** (cross-node, call protocol):
|
||||
```
handler calls context.env[namespace][op](input)
  → OperationEnv resolves that this operation lives on a remote node
  → Sends call.requested EventEnvelope via the interface (SSH channel, raw framing, DNS, etc.)
  → Receives call.responded EventEnvelope, deserializes payload
```
|
||||
|
||||
All three paths produce the same `ResponseEnvelope`. The handler neither knows nor cares which path was taken. The OperationEnv is wired at startup based on deployment topology:
|
||||
|
||||
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)           // Auth runs locally
    .local("config", config_registry)       // Config runs locally
    .service("secrets", secret_irpc_client) // Secret service via irpc
    .remote("worker-1", call_protocol_conn) // Worker-1 operations via call protocol
    ;
```
|
||||
|
||||
The irpc service layer is thus **one dispatch backend** for OperationEnv — the path chosen when an operation is registered as backed by an in-cluster service. It is not a replacement for OperationEnv or for the call protocol.
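
The idea that all three dispatch paths look identical to the caller can be shown with a toy model. This is only a sketch under loud assumptions: the real `OperationEnv` API is not specified here, so `Backend`, `Handler`, `invoke`, and the string-based `ResponseEnvelope` are invented for the example, and boxed closures stand in for the local registry, the irpc client, and the call protocol connection.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real response type.
#[derive(Debug, Clone, PartialEq)]
pub struct ResponseEnvelope {
    pub payload: String,
}

/// A handler takes an operation name and an input, and returns a response.
pub type Handler = Box<dyn Fn(&str, &str) -> ResponseEnvelope>;

/// One backend per namespace; each variant wraps a stand-in closure.
pub enum Backend {
    Local(Handler),   // direct function call
    Service(Handler), // would wrap an irpc client
    Remote(Handler),  // would wrap a call protocol connection
}

pub struct OperationEnv {
    backends: HashMap<String, Backend>,
}

impl OperationEnv {
    pub fn new() -> Self {
        OperationEnv { backends: HashMap::new() }
    }

    pub fn local(mut self, namespace: &str, handler: Handler) -> Self {
        self.backends.insert(namespace.to_string(), Backend::Local(handler));
        self
    }

    pub fn service(mut self, namespace: &str, handler: Handler) -> Self {
        self.backends.insert(namespace.to_string(), Backend::Service(handler));
        self
    }

    pub fn remote(mut self, namespace: &str, handler: Handler) -> Self {
        self.backends.insert(namespace.to_string(), Backend::Remote(handler));
        self
    }

    /// The caller sees the same signature whichever backend is registered.
    pub fn invoke(&self, namespace: &str, op: &str, input: &str) -> Option<ResponseEnvelope> {
        match self.backends.get(namespace)? {
            Backend::Local(handler) => Some(handler(op, input)),
            Backend::Service(handler) => Some(handler(op, input)),
            Backend::Remote(handler) => Some(handler(op, input)),
        }
    }
}
```

In this model the backend choice is made once, at wiring time, and `invoke` never exposes it, which is the property that lets a deployment move an operation from local to remote without touching handler code.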
|
||||
|
||||
### 4.4 NAPI Layer Updates
|
||||
|
||||
**Changes to alknet-napi**:
|
||||
- Expose `reloadAuth()`, `reloadForwarding()`, `reloadAll()` on the AlknetServer object
- Call protocol integration: expose operation registry for NAPI consumers to register handlers
- Service layer: expose irpc service creation for NAPI consumers
|
||||
|
||||
### 4.5 Architecture Doc Sync
|
||||
|
||||
After Phase 2 core bridge changes are implemented and before Phase 3 crate development begins, the architecture docs should be updated to reflect the implementation state. The first round of doc sync has already been completed (commit `cfc4400`) based on Phase 2 research findings — this covered:
|
||||
|
||||
- StreamInterface/MessageInterface split in interface.md
- CredentialProvider/CredentialSet in credentials.md
- API keys in auth.md and configuration.md
- ListenerConfig variants for HTTP and DNS
- Resolved open questions (OQ-IF-01, OQ-IF-02, etc.)
- New ADRs (035, 036, 037)
|
||||
|
||||
A **second doc sync** will be needed after Phase 2 implementation is complete to capture any deviations between the spec and the actual implementation (e.g., if `InterfaceConfig` was restructured differently, or if the raw framing auth design differs from the first-frame approach specified here). This second sync should be done before Phase 3 crate development begins.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Application Services and Advanced Features
|
||||
|
||||
**Goal**: Build services that register with the operation registry but don't change core.
|
||||
|
||||
**Why last**: These are pluggable. They depend on the core being stable (Phases 1-4) but don't affect core's architecture.
|
||||
|
||||
### 5.1 DNS Transport + Control Channel Interface
|
||||
|
||||
**Source**: research/core.md (DNS transport section)
|
||||
|
||||
**Scope**:
|
||||
- `DnsInterface` (already defined as a `MessageInterface` stub in Phase 2) gets full implementation
- DNS server that encodes/decodes `EventEnvelope` frames as DNS TXT query/response pairs
- Call protocol over DNS (not SSH over DNS; that is a separate, future goal)
- AuthToken embedded in DNS query labels
|
||||
|
||||
**Crate**: `alknet-core` (behind `dns` feature flag)
|
||||
|
||||
**ADR**: 026 (transport-interface separation) — DNS is a `MessageInterface`, not a (DNS transport, raw framing) pair
|
||||
|
||||
**Risk**: Medium — DNS protocol implementation is non-trivial. Framing, chunking, and retransmission need R&D.
|
||||
|
||||
### 5.2 WebTransport Transport
|
||||
|
||||
**Source**: architecture/auth.md (WebTransport section), research/phase2/tls-transport.md
|
||||
|
||||
**Scope**:
|
||||
- `WebTransportAcceptor` implements `TransportAcceptor` trait
- Token auth for WebTransport sessions (AuthToken in CONNECT URL, `IdentityProvider::resolve_from_token()`)
- `TransportKind::WebTransport` variant
- QUIC listener coexistence with iroh on UDP 443
|
||||
- QUIC listener coexistence with iroh on UDP 443
|
||||
|
||||
**Crate**: `alknet-core` (behind `webtransport` feature flag)
|
||||
|
||||
**Risk**: Medium — requires wtransport crate dependency, QUIC listener coexistence questions (OQ-15).
|
||||
|
||||
### 5.3 Full HTTP Interface Implementation
|
||||
|
||||
**Source**: research/phase2/tls-transport.md
|
||||
|
||||
**Scope**:
|
||||
- Replace stub handlers in the Phase 2 axum scaffold with actual operation dispatch
- `POST /v1/{namespace}/{op}` → `registry.invoke(namespace, op, input)` (mutation)
- `GET /v1/{namespace}/{op}` → `registry.invoke(namespace, op, input)` (query, params as input)
- `GET /v1/{namespace}/{op}` SSE → `registry.subscribe(namespace, op, input)` (subscription)
- `GET /v1/schema` → `registry.list_operations()`
- OpenAPI spec generation from `OperationRegistry`
- WebSocket upgrade handler for persistent browser connections
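
The method and path mapping listed above can be sketched as a small pure function. This is illustrative only: the path convention itself is still an open question in this plan, `Action` and `route` are hypothetical names, and the SSE subscription case is left out because the list does not say how it is distinguished from a plain `GET` query.

```rust
/// What a request resolves to (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Action<'a> {
    Mutation { namespace: &'a str, op: &'a str },
    Query { namespace: &'a str, op: &'a str },
    ListOperations,
}

/// Map an HTTP method and path onto an action, or `None` for anything that
/// should fall through to the default 404 handler.
pub fn route<'a>(method: &str, path: &'a str) -> Option<Action<'a>> {
    let rest = path.strip_prefix("/v1/")?;

    if rest == "schema" {
        return if method == "GET" { Some(Action::ListOperations) } else { None };
    }

    // Expect exactly two non-empty segments: {namespace}/{op}.
    let mut parts = rest.split('/');
    let namespace = parts.next().filter(|segment| !segment.is_empty())?;
    let op = parts.next().filter(|segment| !segment.is_empty())?;
    if parts.next().is_some() {
        return None;
    }

    match method {
        "POST" => Some(Action::Mutation { namespace, op }),
        "GET" => Some(Action::Query { namespace, op }),
        _ => None,
    }
}
```

Both `Mutation` and `Query` would end up calling `registry.invoke(namespace, op, input)`; keeping them as separate variants only records which HTTP method was used.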
|
||||
|
||||
**Crate**: `alknet-core` (behind `http` feature flag)
|
||||
|
||||
**Risk**: Medium — full HTTP routing, SSE streaming, auth middleware integration with OperationEnv.
|
||||
|
||||
### 5.4 Docker Service, Node Service, Git Service, etc.
|
||||
|
||||
**Source**: research/services.md (application services section), research/references/gitserver/
|
||||
|
||||
These are all pluggable services that register operations with the core's `OperationRegistry`. They don't require core changes. They're candidates for a `alknet-services` crate or individual crates.
|
||||
|
||||
**Git Service** path (see research/references/gitserver/ and research/references/gitlfs/):
|
||||
- Use `gitserver-core` as the git protocol engine (transport-agnostic, library-first design)
- `gitserver-http` nested in alknet's axum router for HTTPS git
- `rudolfs` (or a fork) as the LFS layer, backed by rustfs S3 storage
- Auth via `IdentityProvider` → gitserver's `AuthConfig`
- Operations: `git.clone`, `git.push`, `git.pull` registered in OperationRegistry
|
||||
|
||||
**Crate**: New crate(s) per service, or a consolidated `alknet-services` crate
|
||||
|
||||
**Risk**: Low — purely additive, no core changes needed.
|
||||
|
||||
### 5.5 Flow Graph Real-time Construction
|
||||
|
||||
**Source**: research/flow.md
|
||||
|
||||
Wire call protocol events (call.requested, call.responded, etc.) to `FlowGraph::update_from_event()`. This is application-level wiring, not a core concern.
|
||||
|
||||
**Crate**: Application code in `alknet` binary or a `alknet-head` crate.
|
||||
|
||||
**Risk**: Low — event subscription pattern is well-established.
|
||||
|
||||
---
|
||||
|
||||
## Phase Summary
|
||||
|
||||
| Phase | What | Core Changes? | New Crates? | ADR Dependency |
|---|---|---|---|---|
| 0 | Architecture: ADRs, specs, review | No | No | Write all |
| 1 | Core: config split, identity, forwarding, auth service, OperationEnv, interface abstraction | Yes | No | 026-034 |
| 2 | Core bridge: SshSession recv/send, RawFramingInterface, StreamInterface/MessageInterface split, CredentialProvider (trait+stub), HTTP listener stub, API keys | Yes | No | 035, 036, 037, phase2 research |
| 3 | External crates: secret, storage, flowgraph | No | Yes (3) | 027 |
| 4 | Integration: CLI assembly, NAPI, service wiring, doc sync | Minor (exports) | No | 027 |
| 5 | Advanced: DNS, WebTransport, full HTTP, application services | Minimal (feature flags) | Maybe | 026 |
|
||||
|
||||
## Dependency Graph
|
||||
|
||||
```
                 alknet-secret
                /             \
               /               \
alknet-core ←────           ←── alknet-storage
     ↑         \               /
     │          alknet-flowgraph
     │
alknet-napi
alknet (CLI binary, assembles everything)
```
|
||||
|
||||
alknet-core depends on: russh, tokio, irpc (feature flag), serde, axum (feature flag)
|
||||
alknet-secret depends on: bip39, ed25519-bip32, aes-gcm, sha2, irpc
|
||||
alknet-storage depends on: honker, rusqlite, petgraph, jsonschema, irpc
|
||||
alknet-flowgraph depends on: petgraph, serde, jsonschema
|
||||
alknet-napi depends on: alknet-core
|
||||
alknet (CLI) depends on: alknet-core, alknet-secret (feature), alknet-storage (feature), alknet-flowgraph (feature), toml
|
||||
|
||||
No crate depends on alknet-core's internal types through a circular path. The `Identity` type, `IdentityProvider` trait, and `OperationSpec` are the narrow interface points.
|
||||
|
||||
---
|
||||
|
||||
## Open Questions to Resolve Before Phase 2
|
||||
|
||||
These must have answers before Phase 2 implementation begins. Phase 0/1 questions are resolved.
|
||||
|
||||
| OQ | Question | Proposed Resolution | Phase | ADR |
|
||||
|---|---|---|---|---|
|
||||
| ~~OQ-12~~ | Per-user forwarding scope vs global rules | **Resolved**: Start with global rules + principal matching. Per-user scope from peer_credentials.metadata.scopes via IdentityProvider. | 1 | 031 |
|
||||
| ~~OQ-16~~ | Transport-specific forwarding policy | **Resolved**: Add `TransportKind` match in ForwardingRule. | 1 | 031 |
|
||||
| ~~OQ-18~~ | Source of Identity.scopes | **Resolved**: IdentityProvider owns scopes. ForwardingPolicy uses scopes from Identity. | 1 | 029 |
|
||||
| ~~OQ-22~~ | Client streaming in call protocol | **Resolved**: Defer. Single request + optional streaming response covers all identified use cases. | — | — |
|
||||
| ~~OQ-IF-01~~ | How does InterfaceSession relate to EventEnvelope? | **Resolved**: `InterfaceSession::recv()` returns `Option<InterfaceEvent>` where `InterfaceEvent` carries `EventEnvelope` + `Identity`. `send()` accepts `EventEnvelope`. The SshSession bridge implements this over `alknet-control:0`. For `MessageInterface`, `InterfaceRequest`/`InterfaceResponse` normalize request/response pairs. See interface.md, ADR-035. | 2 | 035 |
|
||||
| ~~OQ-IF-02~~ | Should SshInterface own ForwardingPolicy checks? | **Resolved**: ForwardingPolicy is Layer 3 (policy), channel open/close lifecycle is Layer 2. SshInterface reports channel requests to Layer 3; Layer 3 applies policy. Current implementation already does this. | 2 | 031 |
|
||||
| OQ-15 | TLS + WebTransport + iroh QUIC coexistence | Defer WebTransport to Phase 5. TLS and iroh already coexist (TCP vs UDP). | 5 | — |
|
||||
| OQ-19 | Separate TLS identity for WebTransport vs shared | Share certificates. QUIC runs over UDP and TLS over TCP, so both can listen on the same port number. Different subject alt names are possible but not required. | 5 | — |
|
||||
| OQ-20 | Worker registration and discovery on connect/disconnect | Register on connect, cleanup on disconnect. Heartbeat for liveness. Spec in call-protocol.md. | 2+ | — |
|
||||
| ~~OQ-P2-01~~ | Should MessageInterface and StreamInterface share a common trait? | **Resolved**: Independent traits. Different signatures (`handle_request` vs `accept` + session lifecycle), different transport ownership (self-managed vs provided). A common super-trait adds complexity without benefit. ADR-035 accepted. | 2 | 035 |
|
||||
| ~~OQ-P2-02~~ | Should HTTP share a port with the SSH listener? | **Resolved**: Start with separate ports. Stealth mode byte-peek on shared port 443 already detects SSH vs HTTP. ALPN multiplexing is a future optimization that doesn't change the interface abstraction. | 2 | — |
|
||||
| ~~OQ-P2-03~~ | Should the HTTP interface auto-generate OpenAPI specs from OperationRegistry? | **Resolved**: Yes, but Phase 5+. The HTTP interface needs to exist first (Phase 5.3). | 5 | — |
|
||||
| ~~OQ-P2-04~~ | How do self-hosted services authenticate via alknet? | **Resolved**: Three-phase approach. Phase A: shared secret (`CredentialSet::Bearer` or `S3AccessKey`). Phase C: identity-bound credentials via `ManagedCredentialProvider`. Phase D: alknet as OIDC provider. `CredentialProvider` trait in core enables Phase A immediately. ADR-036 accepted. | 2-5 | 036 |
|
||||
|
||||
---
|
||||
|
||||
## Inconsistencies and Conflations to Clean Up
|
||||
|
||||
The research documents have a few areas that need reconciliation:
|
||||
|
||||
1. **Hub/spoke vs head/worker**~~: core.md and services.md use head/worker. call-protocol.md still uses hub/spoke in several places. All docs need to be updated consistently. ADR-034 formalizes this.~~ **Fixed**: call-protocol.md, auth.md, open-questions.md, and napi-and-pubsub.md updated to head/worker terminology. ADRs are historical records and retain original terminology. ADR-034 still needed to formalize the decision.
|
||||
|
||||
2. **DNS as transport vs interface**: core.md conflates "DNS as transport" (encoding bytes as DNS queries) with "DNS as naming/discovery" (TXT records). The three-layer model cleanly separates these: DNS is a `MessageInterface`, not a transport. **Phase 2 removes `TransportKind::Dns`** and adds `ListenerConfig::Dns`.
|
||||
|
||||
3. **Service naming collision — irpc service vs call protocol operation vs external service**: The research uses "service" for both irpc protocol enums and call protocol path-based handlers. See research/phase2/definitions.md for full disambiguation. The architecture should consistently use: **irpc service** (in-cluster, Rust-to-Rust), **operation** (path-based call protocol handler), **external service** (third-party endpoint), and **application service** (handler registered in OperationRegistry).
|
||||
|
||||
4. **Identity model divergence**~~: auth.md defines `Identity` with `{id, scopes, resources}`. services.md defines `Identity` with `{node_id, fingerprint, scopes}`.~~ **Fixed**: auth.md has the correct unified definition `{id, scopes, resources}`.
|
||||
|
||||
5. **OperationEnv is a universal composition mechanism, not an implementation detail**~~: services.md defines `OperationEnv` as `HashMap<String, HashMap<String, fn(...)>>`.~~ **Acknowledged**: The behavioral contract (namespace + operation name → invoke) must match. The Rust implementation can use typed dispatch behind the scenes.
|
||||
|
||||
6. **Event boundary discipline needs to be a hard constraint, not a suggestion**~~: storage.md and services.md both call this out, but it's presented as a pattern rather than a rule.~~ **Formalized**: ADR-032 makes it a hard architectural constraint. See also research/phase2/definitions.md (Domain Events vs Integration Events).
|
||||
|
||||
7. **Config file vs programmatic API**: configuration.md proposes TOML config files. ADR-011 says "no config file, programmatic-first." **Proposed**: TOML is an optional convenience layer that builds `StaticConfig`/`DynamicConfig`. `ServeOptions` builder pattern remains the primary API. ADR-011 is amended, not superseded.
|
||||
|
||||
8. **Interface model needs StreamInterface/MessageInterface split**: The current `Interface` trait assumes persistent byte streams. HTTP and DNS don't fit (they handle individual requests, not sessions). **Phase 2 addresses this** — rename `Interface` → `StreamInterface`, add `MessageInterface`, add `HttpInterface` stub. See research/phase2/interface-model.md.
|
||||
|
||||
9. **SshSession recv/send stubs are core, not "Phase 4"**: The Phase 1 implementation left `SshSession::recv()` and `SshSession::send()` as stubs returning `None` / silently discarding. This makes the interface model inert for call protocol operations. The bridge between SSH channels and `InterfaceEvent`/`EventEnvelope` frames is a **Phase 2** concern, not a future feature. See Phase 2.1.
|
||||
|
||||
10. **CredentialProvider is missing from core**: Outbound auth (how alknet authenticates to external services) has no trait or implementation. This is needed before any HTTP API integration work. **Phase 2.4** adds the trait and enum to core; Phase 3 (alknet-secret) provides the storage-backed implementation. See research/phase2/credential-provider.md.
|
||||
|
||||
11. **Architecture docs need sync after Phase 2**: The current architecture docs (interface.md, auth.md, services.md, call-protocol.md) reflect the pre-Phase-0/1 state. After Phase 2 core bridge changes land, these must be updated to reflect StreamInterface/MessageInterface, CredentialProvider, HTTP listener, and the functional call protocol bridge. **Phase 4.5** is the doc sync point.
|
||||
@@ -1,466 +0,0 @@
|
||||
# Credential Provider: Outbound Service Authentication
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-08
|
||||
> Part of: Phase 2 planning
|
||||
|
||||
## Overview
|
||||
|
||||
Alknet's `IdentityProvider` resolves **inbound** authentication: who is making a request _to_ alknet. The `CredentialProvider` resolves **outbound** authentication: how alknet authenticates _to_ external and self-hosted services. This is a distinct and currently unaddressed concern that affects nearly every application service — from cloud API integrations (vast.ai, runpod, ubicloud) to self-hosted infrastructure (rustfs, gitea, postgres).
|
||||
|
||||
## Problem Statement
|
||||
|
||||
### External API credentials
|
||||
|
||||
Cloud providers use simple auth patterns — API keys, bearer tokens, basic auth. The existing `SecretProtocol` (encrypt/decrypt via derived AES-256-GCM keys, defined in [secret-service.md](../../architecture/secret-service.md)) can store and retrieve these at rest. But the wiring between "decrypt a credential from storage" and "use it in an HTTP request" doesn't exist yet. As things stand, each service wrapper would have to solve credential retrieval, caching, and lifecycle on its own.
|
||||
|
||||
### Self-hosted service auth
|
||||
|
||||
Self-hosted services use more complex auth mechanisms that go beyond static tokens:
|
||||
|
||||
- **rustfs** uses S3-style access key + secret key pairs with AWS Signature V4 request signing. It also supports OIDC (OpenID Connect with PKCE). The access key and secret key are not sent as a bearer header; they are inputs to a per-request HMAC-SHA256 signature computation.
|
||||
- **gitea** supports OAuth2, OIDC, and reverse proxy authentication (SSO via headers). Its internal user/token system is separate from alknet's identity model.
|
||||
- Other self-hosted services (postgres, redis) may use their own auth schemes.
|
||||
|
||||
These services are **inside the operational domain** — their credential lifecycle (provisioning, rotation, revocation, token refresh) is part of running the stack, not a one-time configuration step.
|
||||
|
||||
### The gap
|
||||
|
||||
Currently:
|
||||
|
||||
```
|
||||
User → alknet → IdentityProvider (resolves who the user is) ✅ exists
|
||||
alknet → external service → ??? (resolves how alknet authenticates) ❌ missing
|
||||
```
|
||||
|
||||
Without `CredentialProvider`, each service wrapper would:
|
||||
1. Independently retrieve and decrypt credentials from the secret service
|
||||
2. Independently implement auth mechanism specifics (bearer, S3 signing, OIDC refresh)
|
||||
3. Have no shared infrastructure for credential lifecycle management
|
||||
|
||||
This leads to duplicated effort and inconsistent security practices across service wrappers.
|
||||
|
||||
## Design
|
||||
|
||||
### CredentialProvider Trait
|
||||
|
||||
```rust
|
||||
pub trait CredentialProvider: Send + Sync + 'static {
|
||||
fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
This is intentionally narrow. It returns credentials for a named service. It does not try to abstract the auth mechanism itself — that stays with the service wrapper that knows the protocol.
|
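To make the narrow contract concrete, the following is a minimal sketch of an in-memory implementation, the kind of thing a test double might look like. `StaticCredentialProvider` is a hypothetical name that does not appear elsewhere in this document, and the `CredentialSet` here is a reduced two-variant copy of the enum defined in the next section, included only so the block compiles on its own.

```rust
use std::collections::HashMap;

// Reduced copy of the document's CredentialSet (two variants only),
// so that this sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    Bearer { token: String },
    ApiKey { header_name: String, token: String },
}

// The trait as defined above.
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet>;
}

// Hypothetical test double: a fixed map from service name to credentials.
pub struct StaticCredentialProvider {
    entries: HashMap<String, CredentialSet>,
}

impl StaticCredentialProvider {
    pub fn new(entries: Vec<(String, CredentialSet)>) -> Self {
        Self { entries: entries.into_iter().collect() }
    }
}

impl CredentialProvider for StaticCredentialProvider {
    // Look up by service name and hand back an owned copy.
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.entries.get(service).cloned()
    }

    // Static credentials have nothing to refresh, so this returns the stored value.
    fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.get_credentials(service)
    }
}
```

Note that the provider never touches HTTP or signing; it only answers "which credentials belong to this service name", which is the separation the trait is meant to enforce.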
||||
|
||||
### CredentialSet
|
||||
|
||||
```rust
|
||||
pub enum CredentialSet {
|
||||
ApiKey {
|
||||
header_name: String,
|
||||
token: String,
|
||||
},
|
||||
Basic {
|
||||
username: String,
|
||||
password: String,
|
||||
},
|
||||
Bearer {
|
||||
token: String,
|
||||
},
|
||||
S3AccessKey {
|
||||
access_key: String,
|
||||
secret_key: String,
|
||||
session_token: Option<String>,
|
||||
},
|
||||
OidcToken {
|
||||
access_token: String,
|
||||
refresh_token: Option<String>,
|
||||
expires_at: Option<u64>,
|
||||
},
|
||||
Custom {
|
||||
scheme: String,
|
||||
params: HashMap<String, String>,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Each variant carries the data needed for a specific auth mechanism. The service wrapper that requested the credentials knows what variant it expects and how to use it — the `OpenAPIServiceRegistry` knows it needs a `Bearer` or `ApiKey`, the rustfs S3 wrapper knows it needs `S3AccessKey` for request signing.
|
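As an illustration of "the wrapper knows which variant it expects", here is a sketch of a helper that an HTTP-oriented wrapper might use. `static_auth_header` is a hypothetical function, not part of any spec in this repository. It assumes that bearer-style tokens go in the standard `Authorization: Bearer …` header and that an `ApiKey` token is sent verbatim in its named header; variants that need more than one static header return `None`.

```rust
use std::collections::HashMap;

// Copy of the CredentialSet enum above, with derives added for the example.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    ApiKey { header_name: String, token: String },
    Basic { username: String, password: String },
    Bearer { token: String },
    S3AccessKey { access_key: String, secret_key: String, session_token: Option<String> },
    OidcToken { access_token: String, refresh_token: Option<String>, expires_at: Option<u64> },
    Custom { scheme: String, params: HashMap<String, String> },
}

/// Hypothetical helper: returns a (header name, header value) pair for the
/// variants that map onto a single static header, and None otherwise.
pub fn static_auth_header(creds: &CredentialSet) -> Option<(String, String)> {
    match creds {
        CredentialSet::Bearer { token } => {
            Some(("Authorization".to_string(), format!("Bearer {}", token)))
        }
        // Assumption: an OIDC access token is presented as a bearer token.
        CredentialSet::OidcToken { access_token, .. } => {
            Some(("Authorization".to_string(), format!("Bearer {}", access_token)))
        }
        CredentialSet::ApiKey { header_name, token } => {
            Some((header_name.clone(), token.clone()))
        }
        // Basic needs base64 encoding, S3 needs per-request signing, and
        // Custom is wrapper-defined, so none of them are handled here.
        CredentialSet::Basic { .. }
        | CredentialSet::S3AccessKey { .. }
        | CredentialSet::Custom { .. } => None,
    }
}
```

The exhaustive `match` is deliberate: adding a variant to `CredentialSet` later forces every wrapper using this pattern to decide how to treat it.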
||||
|
||||
### CredentialProvider vs IdentityProvider
|
||||
|
||||
These are opposite-direction abstractions:
|
||||
|
||||
| | IdentityProvider | CredentialProvider |
|
||||
|---|---|---|
|
||||
| Direction | Inbound (who is calling alknet) | Outbound (how alknet calls others) |
|
||||
| Resolves | Fingerprint/token → Identity | Service name → CredentialSet |
|
||||
| Storage | `peer_credentials`, `api_keys` tables (alknet-storage) | Encrypted nodes in metagraph (via SecretProtocol) |
|
||||
| Lifecycle | Stateless lookup | May need refresh (OIDC tokens, S3 sessions) |
|
||||
| Location | `alknet_core::auth` | `alknet_core::credentials` |
|
||||
|
||||
Both live at the same architectural layer. A service handler receives an `OperationContext` with `identity` (who called us) and access to credentials through `context.env`. The handler doesn't interact with `CredentialProvider` directly in the common case — the service initialization code does, when setting up the HTTP client or SDK wrapper.
|
||||
|
||||
### Accounts: Storage-Layer Concern, Not Core
|
||||
|
||||
The `Identity` struct in core (`{ id, scopes, resources }`) does not need an explicit `account_id` field. In config-based auth (`ConfigIdentityProvider`), `id` is the SSH key fingerprint. In database-backed auth (`StorageIdentityProvider`), `id` is the account UUID. The account concept is an implementation detail of `StorageIdentityProvider` — it resolves `peer_credentials.fingerprint → account_id → Identity { id: account_uuid, ... }`. The same person authenticating via SSH key or bearer token gets the same `Identity { id: account_uuid, ... }` because both credential presentations map to the same account UUID in storage.
|
||||
|
||||
This means identity-bound credential lookups (e.g., "Alice's rustfs access key") use `Identity.id` (which is the account UUID in database-backed deployments) as the key — not a separate field. The call pattern is:
|
||||
|
||||
```rust
|
||||
// Service-level credential (no identity needed):
|
||||
credential_provider.get_credentials("rustfs") // shared admin key
|
||||
|
||||
// Identity-bound credential (uses id as account identifier):
|
||||
credential_provider.get_credentials_for("rustfs", &identity.id) // per-user key
|
||||
```
|
||||
|
||||
The `CredentialProvider` trait at core only needs the service-level method. Identity-bound lookups are an extension in alknet-storage that uses the same `Identity.id`.
|
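One way the split between the core trait and the storage-layer extension could look is sketched below. `IdentityBoundCredentials` and `InMemoryCredentials` are hypothetical names introduced for illustration, and the core trait is reduced to a single method. The sketch assumes that a per-account lookup does not silently fall back to the shared service-level credential; whether it should is a policy question this document leaves open.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    Bearer { token: String },
}

// Core trait, reduced to the service-level lookup for this sketch.
pub trait CredentialProvider: Send + Sync + 'static {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
}

// Hypothetical extension trait that would live outside core.
pub trait IdentityBoundCredentials: CredentialProvider {
    // `account_id` is `Identity.id` in database-backed deployments.
    fn get_credentials_for(&self, service: &str, account_id: &str) -> Option<CredentialSet>;
}

// Hypothetical in-memory stand-in for the storage-backed provider.
#[derive(Default)]
pub struct InMemoryCredentials {
    pub shared: HashMap<String, CredentialSet>,
    pub per_account: HashMap<(String, String), CredentialSet>,
}

impl CredentialProvider for InMemoryCredentials {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.shared.get(service).cloned()
    }
}

impl IdentityBoundCredentials for InMemoryCredentials {
    // Assumption: no fallback to the shared credential when the account has none.
    fn get_credentials_for(&self, service: &str, account_id: &str) -> Option<CredentialSet> {
        self.per_account
            .get(&(service.to_string(), account_id.to_string()))
            .cloned()
    }
}
```

Because the extension is a supertrait bound rather than a change to `CredentialProvider`, code in core that only needs shared credentials never sees the identity-bound method.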
||||
|
||||
### Interaction with SecretProtocol
|
||||
|
||||
Credentials are stored encrypted in the metagraph via the existing `SecretProtocol`:
|
||||
|
||||
1. At setup time, an operator configures credentials for a service (e.g., `alknet credential add vast-ai --type bearer --token-file ./key.txt`)
|
||||
2. The CLI encrypts the credential via `SecretProtocol::Encrypt` (using the derived encryption key at `m/74'/2'/0'/0'`)
|
||||
3. The encrypted credential is stored as an `EncryptedData` node in the metagraph, tagged with the service name
|
||||
4. At startup, `SecretStoreCredentialProvider` (the default `CredentialProvider` impl) calls `SecretProtocol::Decrypt` for each configured service
|
||||
5. The decrypted credentials are held in RAM with the same lifecycle as the secret service (purged on `Lock`)
|
||||
|
||||
```rust
|
||||
pub struct SecretStoreCredentialProvider {
|
||||
credentials: ArcSwap<HashMap<String, CredentialSet>>,
|
||||
secret_client: Client<SecretProtocol>,
|
||||
}
|
||||
|
||||
impl CredentialProvider for SecretStoreCredentialProvider {
|
||||
fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
|
||||
let cache = self.credentials.load();
|
||||
cache.get(service).cloned()
|
||||
}
|
||||
|
||||
fn refresh_credentials(&self, service: &str) -> Option<CredentialSet> {
|
||||
// Re-decrypt from storage — used after Lock/Unlock cycle
|
||||
// Calls secret_client.decrypt() and updates cache
|
||||
None // simplified
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Interaction with OpenAPIServiceRegistry
|
||||
|
||||
The TypeScript `@alkdev/operations` `from_openapi.ts` defines `HTTPServiceConfig.auth`:
|
||||
|
||||
```typescript
|
||||
auth?: {
|
||||
type: "bearer" | "apiKey" | "basic";
|
||||
token?: string;
|
||||
headerName?: string;
|
||||
prefix?: string;
|
||||
};
|
||||
```
|
||||
|
||||
The Rust port would populate this from `CredentialProvider`:
|
||||
|
||||
```rust
|
||||
let creds = credential_provider.get_credentials("vast-ai");
|
||||
let auth = match creds {
|
||||
    Some(CredentialSet::Bearer { token }) => Some(AuthConfig::Bearer { token }),
|
||||
    Some(CredentialSet::ApiKey { header_name, token }) => Some(AuthConfig::ApiKey { header_name, token }),
|
||||
    Some(CredentialSet::Basic { username, password }) => Some(AuthConfig::Basic { username, password }),
|
||||
    _ => None, // missing credentials, or a variant with no static header form (e.g. S3AccessKey)
|
||||
};
|
||||
let config = HttpServiceConfig {
|
||||
namespace: "vast-ai",
|
||||
base_url: "https://cloud.vast.ai/api/v1",
|
||||
auth,
|
||||
..Default::default() // remaining fields elided; assumes HttpServiceConfig implements Default
|
||||
};
|
||||
let ops = FromOpenAPI(spec, config);
|
||||
registry.register_all(ops);
|
||||
```
|
||||
|
||||
### Self-Hosted Services: ManagedCredentialProvider
|
||||
|
||||
For self-hosted services, credentials may need active lifecycle management:
|
||||
|
||||
**rustfs (S3)**:
|
||||
- Access key + secret key are created inside rustfs IAM
|
||||
- The alknet rustfs service wrapper holds the `S3AccessKey` credential set
|
||||
- Each S3 request is signed using AWS Signature V4 (computed from access_key + secret_key + request details)
|
||||
- Session tokens from STS-style calls have a TTL and need rotation
|
||||
- Provisioning: alknet could create the rustfs access key via the rustfs admin API at first setup, then store the resulting credentials
|
||||
|
||||
**rustfs (OIDC)**:
|
||||
- rustfs supports OIDC providers — alknet's identity system _could_ act as an OIDC provider
|
||||
- This would allow alknet identities to authenticate directly to rustfs without stored credentials
|
||||
- Requires: alknet running an OIDC authorization server endpoint (potentially exposed via the call protocol)
|
||||
|
||||
**gitea (OAuth2/OIDC)**:
|
||||
- Similar to rustfs OIDC — alknet could act as the OAuth2/OIDC provider
|
||||
- Gitea supports reverse proxy auth (SSO via headers) — if alknet sits in front as a reverse proxy, it can inject auth headers
|
||||
- Gitea also has its own API token system — simpler case, just store the token
|
||||
|
||||
**ManagedCredentialProvider** wraps these cases:
|
||||
|
||||
```rust
|
||||
pub struct ManagedCredentialProvider {
|
||||
base: SecretStoreCredentialProvider,
|
||||
managers: HashMap<String, Arc<dyn CredentialManager>>,
|
||||
}
|
||||
|
||||
pub trait CredentialManager: Send + Sync + 'static {
|
||||
fn refresh(&self, current: &CredentialSet) -> Option<CredentialSet>;
|
||||
fn is_expired(&self, current: &CredentialSet) -> bool;
|
||||
fn provision(&self, identity: &Identity) -> Option<CredentialSet>;
|
||||
}
|
||||
```
|
||||
|
||||
- `refresh`: For OIDC token refresh, S3 session token rotation
|
||||
- `is_expired`: Check TTL on tokens before use
|
||||
- `provision`: Create credentials on a self-hosted service for a given alknet identity (e.g., create a rustfs access key for a new user)
|
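As a sketch of what an `is_expired` check might do for the OIDC case, the function below compares `expires_at` against an explicit clock value. It assumes `expires_at` is expressed in seconds since the Unix epoch, which this document does not specify. `is_expired_at`, `unix_now_secs`, and the `skew_secs` safety margin are hypothetical names introduced here; taking the current time as a parameter keeps the check deterministic in tests.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Reduced copy of CredentialSet with the two variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    Bearer { token: String },
    OidcToken {
        access_token: String,
        refresh_token: Option<String>,
        expires_at: Option<u64>,
    },
}

/// Hypothetical expiry check. Assumes `expires_at` is Unix seconds.
/// A credential counts as expired once `now + skew` reaches `expires_at`,
/// so that it is refreshed slightly before the real deadline.
pub fn is_expired_at(current: &CredentialSet, now_secs: u64, skew_secs: u64) -> bool {
    match current {
        CredentialSet::OidcToken { expires_at: Some(exp), .. } => {
            now_secs.saturating_add(skew_secs) >= *exp
        }
        // No expiry recorded, or a static credential: never treated as expired.
        _ => false,
    }
}

/// Current time as Unix seconds; falls back to 0 if the clock is before the epoch.
pub fn unix_now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A `CredentialManager::is_expired` implementation could then delegate to `is_expired_at(current, unix_now_secs(), skew)` with a margin chosen per service.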
||||
|
||||
### Identity-Bound Credentials
|
||||
|
||||
For self-hosted services where alknet manages the user accounts, there's a higher-order pattern:
|
||||
|
||||
1. An alknet `Identity` (resolved by `IdentityProvider`) needs access to a self-hosted service
|
||||
2. `ManagedCredentialProvider` calls the service's `CredentialManager::provision(identity)`, which creates the corresponding account on the external service
|
||||
3. The resulting credentials are stored and associated with the alknet identity in the metagraph
|
||||
4. When the identity makes a call through the operation registry, the handler can resolve their service-specific credentials using `Identity.id` as the account key
|
||||
|
||||
This bridges `IdentityProvider` and `CredentialProvider`:
|
||||
|
||||
```
|
||||
IdentityProvider: who is this user? → Identity
|
||||
CredentialProvider: how do we talk to service X? → CredentialSet
|
||||
Identity-bound: how does THIS user talk to service X? → CredentialSet (scoped to Identity.id)
|
||||
```
|
||||
|
||||
The identity-bound case is important for multi-tenant self-hosted setups where different alknet users should have different access levels on rustfs or gitea. It can be deferred initially — Phase A only needs service-level credentials.
|
||||
|
||||
## Architectural Position
|
||||
|
||||
### Where CredentialProvider lives
|
||||
|
||||
`CredentialProvider` and `CredentialSet` are core types, analogous to `IdentityProvider` and `Identity`. They live in `alknet_core::credentials`.
|
||||
|
||||
Like `IdentityProvider`:
|
||||
- The trait is in alknet-core
|
||||
- The default impl (`SecretStoreCredentialProvider`) uses the secret service + metagraph
|
||||
- Production impls (`ManagedCredentialProvider`) may live in alknet-storage or application crates
|
||||
- The CLI/NAPI assembly wires the concrete impl
|
||||
- Core does not depend on any storage system
|
||||
|
||||
### Dependencies
|
||||
|
||||
```
|
||||
alknet-core (CredentialProvider trait, CredentialSet enum)
|
||||
↑
|
||||
alknet-secret (SecretStoreCredentialProvider reads from SecretProtocol::Decrypt)
|
||||
↑
|
||||
Application crates (rustfs wrapper, gitea wrapper, etc.)
|
||||
```
|
||||
|
||||
`CredentialProvider` does not depend on `IdentityProvider`, but `ManagedCredentialProvider` may use `Identity.id` to resolve identity-bound credentials.
|
||||
|
||||
### Relationship to existing specs
|
||||
|
||||
| Existing concept | Relationship |
|
||||
|---|---|
|
||||
| `IdentityProvider` | Opposite direction. Identity is inbound auth. Credential is outbound auth. |
|
||||
| `SecretProtocol` | Stores and retrieves encrypted credentials. `SecretStoreCredentialProvider` is a consumer of `SecretProtocol::Decrypt`. |
|
||||
| `OperationEnv` | Service init code uses `CredentialProvider` to configure `HTTPServiceConfig.auth`. Handlers call operations through `env`. |
|
||||
| `OpenAPIServiceRegistry` | Consumer of `CredentialProvider` — populates `auth` config from credential lookup. |
|
||||
| `EncryptedData` | Wire format for stored credentials. Compatible with existing `EncryptedDataSchema` from `@alkdev/storage`. |
|
||||
| `Identity.id` | In database-backed deployments, serves as the account UUID for identity-bound credential lookups. No separate `account_id` field needed — `id` IS the account identifier. |
|
||||
|
||||
### Account management is storage-layer, not core
|
||||
|
||||
The `AccountService` irpc protocol (CRUD for accounts and credential associations) lives in alknet-storage, not core. This follows the same pattern as `ConfigService`:
|
||||
- Core has the read trait (`IdentityProvider`, `CredentialProvider`)
|
||||
- Storage has the management service (`AccountProtocol`, `CredentialProtocol`)
|
||||
- The CLI/NAPI assembly wires them together
|
||||
|
||||
The storage model for accounts:
|
||||
|
||||
```
|
||||
accounts
|
||||
├── id (UUID, primary key)
|
||||
├── display_name
|
||||
├── status (active, disabled)
|
||||
└── default_scopes (JSON)
|
||||
|
||||
peer_credentials (inbound — SSH keys)
|
||||
├── account_id → accounts.id
|
||||
├── fingerprint (SHA-256 of public key)
|
||||
├── public_key_data
|
||||
└── scopes_override (JSON, null = use account default)
|
||||
|
||||
api_keys (inbound — bearer tokens)
|
||||
├── account_id → accounts.id
|
||||
├── key_prefix (first 8 chars, for lookup)
|
||||
├── key_hash (SHA-256 of full key)
|
||||
├── scopes (JSON)
|
||||
└── expires_at
|
||||
|
||||
service_credentials (outbound — for external services)
|
||||
├── id (UUID)
|
||||
├── account_id → accounts.id (NULL = shared/service-level)
|
||||
├── service_name
|
||||
├── credential_type
|
||||
├── encrypted_data → EncryptedData
|
||||
├── metadata (JSON)
|
||||
└── expires_at
|
||||
```
|
||||
|
||||
`StorageIdentityProvider` queries `peer_credentials` → `accounts` and `api_keys` → `accounts` to resolve any inbound credential to the same `Identity { id: account_uuid, ... }`. `StorageCredentialProvider` queries `service_credentials` and decrypts via `SecretProtocol` to resolve outbound credentials.
|
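The resolution path described above can be sketched with in-memory maps standing in for the tables. Everything here is illustrative: `Tables` and the two `resolve_*` methods are hypothetical names, `Identity` omits the `resources` field, and the account `status` check is left out. The point is only that both inbound credential kinds land on the same account id, and that a null `scopes_override` falls back to the account's default scopes.

```rust
use std::collections::HashMap;

// Reduced Identity: the `resources` field is omitted in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub struct Account {
    pub default_scopes: Vec<String>,
}

pub struct PeerCredential {
    pub account_id: String,
    pub scopes_override: Option<Vec<String>>, // None = use account default
}

pub struct ApiKey {
    pub account_id: String,
    pub scopes: Vec<String>,
}

// Hypothetical in-memory stand-in for the storage tables.
#[derive(Default)]
pub struct Tables {
    pub accounts: HashMap<String, Account>,                // keyed by account id
    pub peer_credentials: HashMap<String, PeerCredential>, // keyed by fingerprint
    pub api_keys: HashMap<String, ApiKey>,                 // keyed by key hash
}

impl Tables {
    // peer_credentials -> accounts
    pub fn resolve_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let cred = self.peer_credentials.get(fingerprint)?;
        let account = self.accounts.get(&cred.account_id)?;
        let scopes = cred
            .scopes_override
            .clone()
            .unwrap_or_else(|| account.default_scopes.clone());
        Some(Identity { id: cred.account_id.clone(), scopes })
    }

    // api_keys -> accounts
    pub fn resolve_api_key_hash(&self, key_hash: &str) -> Option<Identity> {
        let key = self.api_keys.get(key_hash)?;
        if !self.accounts.contains_key(&key.account_id) {
            return None; // dangling reference: no such account
        }
        Some(Identity { id: key.account_id.clone(), scopes: key.scopes.clone() })
    }
}
```

Scopes may differ between the two presentations of the same account, but `Identity.id` does not, which is what makes it usable as the key for identity-bound outbound credentials.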
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase A: Core types and simple credential storage
|
||||
|
||||
Define the trait and enum in alknet-core. Implement `SecretStoreCredentialProvider` that decrypts stored credentials at startup. Wire into the service assembly (CLI). This enables static API key / bearer token patterns — sufficient for cloud API integrations.
|
||||
|
||||
Deliverables:
|
||||
- `CredentialProvider` trait + `CredentialSet` enum in `alknet_core::credentials`
|
||||
- `SecretStoreCredentialProvider` impl (reads from `SecretProtocol::Decrypt`)
|
||||
- CLI command: `alknet credential add <service> --type bearer --token-file <path>`
|
||||
- Credential storage in metagraph as encrypted nodes tagged by service name
|
||||
|
||||
Depends on: Phase 1 (OperationEnv, SecretProtocol) + alknet-secret crate existing
|
||||
|
||||
### Phase B: OpenAPI/JSON Schema auto-registration
|
||||
|
||||
Port `FromOpenAPI` and `OpenAPIServiceRegistry` from the TypeScript `@alkdev/operations` to Rust. Integrate with `CredentialProvider` for auth config. This enables any OpenAPI-described service to be auto-registered as a set of operations.
|
||||
|
||||
Deliverables:
|
||||
- `alknet-openapi` or `alknet-operations-adapter` crate with `from_openapi` module
|
||||
- `FromOpenAPI(spec, config) -> Vec<(OperationSpec, Handler)>`
|
||||
- `HttpServiceConfig` with auth populated from `CredentialProvider`
|
||||
- `OpenAPIServiceRegistry::register_all(registry)` port
|
||||
|
||||
Depends on: Phase A + existing `OperationRegistry`
|
||||
|
||||
### Phase C: Managed credentials and self-hosted auth
|
||||
|
||||
Add `ManagedCredentialProvider` with `CredentialManager` trait. Implement S3 signing for rustfs. Implement OIDC token refresh. Enable identity-bound credential provisioning.
|
||||
|
||||
Deliverables:
|
||||
- `CredentialManager` trait
|
||||
- `ManagedCredentialProvider` impl
|
||||
- `S3CredentialManager` (request signing, session token rotation)
|
||||
- `OidcCredentialManager` (token refresh, PKCE flow)
|
||||
- Identity-bound credential resolution (uses `Identity.id` as account key)
|
||||
|
||||
Depends on: Phase A + alknet-storage + application-specific knowledge
|
||||
|
||||
### Phase D: Alknet as OIDC/OAuth2 provider
|
||||
|
||||
Alknet's identity system could expose an OIDC authorization server endpoint. Self-hosted services (rustfs, gitea) would be configured to use alknet as their OIDC provider. This eliminates stored credential management entirely for the OIDC path — users authenticate directly through alknet's existing identity.
|
||||
|
||||
This is the most complex but also the most elegant path for self-hosted services. It makes alknet the identity backbone of the entire self-hosted stack.
|
||||
|
||||
Deliverables:
|
||||
- OIDC authorization server operations (authorize, token, userinfo, jwks)
|
||||
- Exposed via call protocol and/or HTTP adapter
|
||||
- Configuration for rustfs/gitea to use alknet as OIDC provider
|
||||
- Identity mapping: alknet Identity scopes → rustfs/gitea policies
|
||||
|
||||
Depends on: Phase C + call protocol HTTP or web adapter + significant R&D
|
||||
|
||||
## Analysis of Self-Hosted Auth Mechanisms
|
||||
|
||||
### rustfs
|
||||
|
||||
**S3 access key/secret key**:
|
||||
- rustfs IAM manages users, groups, policies, and service accounts
|
||||
- Credentials are `access_key` + `secret_key` pairs (S3 standard)
|
||||
- Auth uses AWS Signature V4: HMAC-SHA256 of request details using the secret key
|
||||
- Session tokens (from STS AssumeRole-style flows) are JWTs with claims including policy
|
||||
- Access keys are created via the rustfs admin API or UI
|
||||
|
||||
**OIDC**:
|
||||
- Full OpenID Connect support with PKCE
|
||||
- Uses the `openidconnect` Rust crate for standards compliance
|
||||
- Supports discovery, token exchange, ID token verification
|
||||
- OIDC users are mapped to rustfs policies via claims
|
||||
|
||||
**Integration path**:
|
||||
- Minimal: Store access key + secret key as `CredentialSet::S3AccessKey`, use for request signing
|
||||
- Better: alknet as OIDC provider → no stored credentials, direct identity mapping
|
||||
- Best: Phase D path where rustfs trusts alknet as its identity provider
|
||||
|
||||
### gitea
|
||||
|
||||
**Auth options**:
|
||||
- OAuth2 provider (gitea can act as OAuth2 provider for other apps)
|
||||
- OIDC client (gitea can delegate auth to an external OIDC provider — alknet in Phase D)
|
||||
- Reverse proxy auth (SSO via HTTP headers — alknet injects `X-WebAuth-User` as a reverse proxy)
|
||||
- API tokens (personal access tokens, scoped, with TTL)
|
||||
- SSH keys (for git operations, separate from API auth)
|
||||
|
||||
**Integration path**:
|
||||
- Minimal: Store gitea API token as `CredentialSet::Bearer`
|
||||
- Intermediate: If alknet runs as a reverse proxy in front of gitea, inject auth headers
|
||||
- Best: alknet as OIDC provider for gitea
|
||||
|
||||
### General pattern
|
||||
|
||||
For both rustfs and gitea, the auth integration follows the same progression:
|
||||
|
||||
1. **Static credentials** (Phase A): Store API keys/tokens, decrypt at startup. Simple, works for single-user or admin-only access.
|
||||
2. **Dynamic credentials** (Phase C): Managed credential lifecycle — token refresh, session rotation. Needed for production.
|
||||
3. **Identity federation** (Phase D): Alknet acts as the identity provider. No stored service credentials. Users authenticate through alknet and their identity (scopes, resources) maps to the external service's policy model. Most secure, most complex.
|
||||
|
||||
Phase D is not required to start building service wrappers. Phases A and C are sufficient for functional integrations. Phase D is a quality-of-life and security improvement that becomes important in multi-user self-hosted deployments.
|
||||
|
||||
## Open Questions
|
||||
|
||||
### OQ-CP-01: Should CredentialProvider support per-identity credentials?
|
||||
|
||||
That is, should the trait be `get_credentials(service, identity)` instead of `get_credentials(service)`?
|
||||
|
||||
Pro: Enables multi-tenant self-hosted services where different alknet users have different access.
|
||||
Con: More complex, and the identity resolution can be done by the service wrapper itself by looking up identity-bound credentials from the metagraph.
|
||||
|
||||
Recommendation: Start with service-level credentials. Add identity-level resolution as a second method (`get_credentials_for(service, account_id)`) when the need is concrete. Since `Identity.id` already serves as the account UUID in database-backed mode, there's no need for a separate `account_id` field.
|
||||
|
||||
### OQ-CP-02: Where should the OIDC provider operations live?
|
||||
|
||||
If alknet becomes an OIDC provider (Phase D), the authorization server endpoints need to live somewhere. Options:
|
||||
|
||||
1. In alknet-core behind a feature flag (like auth service)
|
||||
2. In a new `alknet-oidc` crate
|
||||
3. As an application service registered in the operation registry
|
||||
|
||||
Recommendation: Application service (option 3). OIDC is an application concern, not a core concern. The call protocol and `OperationRegistry` provide the transport; OIDC is just another set of operations.
|
||||
|
||||
### OQ-CP-03: How do credential rotations propagate across a cluster?
|
||||
|
||||
If a credential is rotated (e.g., S3 session token refreshed on the head node), how do worker nodes get the updated credential? Options:
|
||||
|
||||
1. Workers request fresh credentials on each use (always current, more secret service calls)
|
||||
2. Push notification via honker stream (efficient, but adds cross-service event coupling)
|
||||
3. Workers cache with TTL (simple, may briefly use stale credentials)
|
||||
|
||||
Recommendation: TTL-based caching with a refresh threshold. Workers call `CredentialProvider::get_credentials()` which checks `is_expired()` and calls `refresh_credentials()` if needed. The TTL is per-credential-type (e.g., 1 hour for S3 session tokens, no TTL for static API keys).
|
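The recommended flow can be sketched as a single function over a worker-local cache slot. This is a simplified model under stated assumptions: `CachedCredential` and `resolve_with_refresh` are hypothetical names, the credential is reduced to a token plus an optional Unix-seconds expiry, and the refresh call is passed in as a closure rather than going through `CredentialProvider`. The handling of a failed refresh (keep the old value until it is actually past its expiry, then drop it) is one possible policy, not something this document prescribes.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CachedCredential {
    pub token: String,
    pub expires_at: Option<u64>, // None = static credential, no TTL
}

/// Hypothetical worker-side lookup: returns the cached credential, refreshing
/// it first when it is missing or within `threshold` seconds of expiry.
pub fn resolve_with_refresh<F>(
    cached: &mut Option<CachedCredential>,
    now: u64,
    threshold: u64,
    refresh: F,
) -> Option<CachedCredential>
where
    F: FnOnce() -> Option<CachedCredential>,
{
    let needs_refresh = match cached.as_ref() {
        None => true,
        Some(c) => c
            .expires_at
            .map_or(false, |exp| now.saturating_add(threshold) >= exp),
    };

    if needs_refresh {
        match refresh() {
            Some(fresh) => *cached = Some(fresh),
            None => {
                // Refresh failed. Keep the old value only while it is still
                // inside its real validity window.
                let hard_expired = cached
                    .as_ref()
                    .and_then(|c| c.expires_at)
                    .map_or(false, |exp| now >= exp);
                if hard_expired {
                    *cached = None;
                }
            }
        }
    }

    cached.clone()
}
```

Static API keys carry no expiry, so they never trigger the refresh path, while short-lived tokens are renewed ahead of their deadline by the chosen threshold.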
||||
|
||||
### OQ-CP-04: Should CredentialSet include request-signing capability?
|
||||
|
||||
For S3 auth, the credential set contains `access_key + secret_key`, but the actual HTTP request signing (AWS Signature V4) is a separate computation. Should `CredentialSet::S3AccessKey` include a signing method?
|
||||
|
||||
Recommendation: Keep `CredentialSet` as pure data. Add a separate `s3_sign(credential: &S3AccessKey, request: &HttpRequest) -> SignedRequest` utility function in the service wrapper or a shared `alknet-s3` utility crate. The `OpenAPIServiceRegistry` pattern already separates credentials from HTTP client behavior; signing is client behavior.
|
||||
|
||||
### OQ-CP-05: How does this relate to the HTTP service / AI SDK port?
|
||||
|
||||
The AI SDK port provides HTTP infrastructure (streaming, retries, SSE parsing, error handling). The `CredentialProvider` provides the auth config that the HTTP client consumes. They're separate concerns that compose: the HTTP service uses `CredentialProvider` to populate `auth` headers/tokens on outgoing requests, just as `OpenAPIServiceRegistry` does. The AI SDK's provider codegen (which would be replaced with the operation pattern) currently hardcodes auth per provider; `CredentialProvider` makes it dynamic and centrally managed instead.
|
||||
|
||||
## References
|
||||
|
||||
- [identity.md](../../architecture/identity.md) — IdentityProvider trait, Identity struct
- [secret-service.md](../../architecture/secret-service.md) — SecretProtocol, EncryptedData
- [services.md](../../architecture/services.md) — OperationEnv, OperationRegistry, service composition
- [call-protocol.md](../../architecture/call-protocol.md) — OperationEnv three dispatch paths
- [integration-plan.md](../integration-plan.md) — Phase structure, OperationEnv wiring
- [@alkdev/operations/src/from_openapi.ts](../../../@alkdev/operations/src/from_openapi.ts) — OpenAPIServiceRegistry, HTTPServiceConfig.auth
|
||||
@@ -1,549 +0,0 @@
|
||||
# Definitions: Terminology Disambiguation and Concept Mapping
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-08
|
||||
> Part of: Phase 2 planning
|
||||
|
||||
## Purpose
|
||||
|
||||
Multiple terms are overloaded across alknet's architecture, OpenStack's identity model, and the distributed systems/git space. This document disambiguates each term, maps equivalent concepts across domains, and identifies open questions that need resolution before updating architecture specs.
|
||||
|
||||
The architecture docs (interface.md, auth.md, services.md) reflect a pre-Phase-0/1 state. This document exists to untangle conceptual knots before editing those specs.
|
||||
|
||||
---
|
||||
|
||||
## Term Definitions
|
||||
|
||||
### Interface (alknet Layer 2)
|
||||
|
||||
**Definition**: An interface consumes a byte stream from a Transport (Layer 1) and produces call protocol sessions or handles discrete requests. It is a _protocol parser_, not a network service.
|
||||
|
||||
**Subtypes**:
|
||||
|
||||
| Subtype | Trait | Lifecycle | Transport ownership | Examples |
|---|---|---|---|---|
| `StreamInterface` | `StreamInterface::accept(stream) -> Session` | Long-lived session | Provided by caller | SshInterface, RawFramingInterface |
| `MessageInterface` | `MessageInterface::handle_request(req) -> Response` | Stateless per-request | Self-managed | HttpInterface, DnsInterface, WebSocketInterface |
|
||||
|
||||
**Not to be confused with**: A "service interface" (API surface of a service), a Rust trait (also called an interface generically), or an "interface" in the OpenStack sense (a network endpoint).
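The split between the two subtypes can be sketched with a pair of traits. This is a simplified, synchronous illustration: the real stream bound is `AsyncRead + AsyncWrite + Unpin + Send`, and the `ByteStream` helper trait, the associated types, and the `RawFraming` and `Echo` implementations are assumptions introduced only for this example.

```rust
use std::io::{Read, Write};

/// Stand-in for a Layer 1 byte stream (the real bound is async).
pub trait ByteStream: Read + Write + Send {}
impl<T: Read + Write + Send> ByteStream for T {}

/// Long-lived session over a stream supplied by the caller.
pub trait StreamInterface {
    type Session;
    fn accept(&self, stream: Box<dyn ByteStream>) -> std::io::Result<Self::Session>;
}

/// Stateless per-request handler that manages its own transport.
pub trait MessageInterface {
    type Request;
    type Response;
    fn handle_request(&self, req: Self::Request) -> Self::Response;
}

/// Hypothetical framing interface: reads one length-prefixed greeting
/// (a single length byte followed by that many payload bytes).
pub struct RawFraming;

pub struct RawSession {
    pub greeting: Vec<u8>,
}

impl StreamInterface for RawFraming {
    type Session = RawSession;

    fn accept(&self, mut stream: Box<dyn ByteStream>) -> std::io::Result<RawSession> {
        let mut len = [0u8; 1];
        stream.read_exact(&mut len)?;
        let mut greeting = vec![0u8; len[0] as usize];
        stream.read_exact(&mut greeting)?;
        Ok(RawSession { greeting })
    }
}

/// Hypothetical message interface that echoes its request.
pub struct Echo;

impl MessageInterface for Echo {
    type Request = String;
    type Response = String;

    fn handle_request(&self, req: String) -> String {
        format!("echo:{}", req)
    }
}
```

The point of the sketch is the ownership difference: `accept` receives its stream from the caller, while `handle_request` never sees a stream at all.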
|
||||
|
||||
**Source**: [interface-model.md](interface-model.md)
|
||||
|
||||
---
|
||||
|
||||
### Transport (alknet Layer 1)
|
||||
|
||||
**Definition**: A transport produces a byte stream (`AsyncRead + AsyncWrite + Unpin + Send`) or a datagram channel. It is a _wire mechanism_, not a protocol. Transports are listed in `TransportKind`: TCP, TLS, iroh (QUIC), WebTransport.
|
||||
|
||||
**Not to be confused with**: The HTTP transport (which is a transport+interface combined in a `MessageInterface`), or the DNS "transport" (which was removed from `TransportKind` because DNS is a `MessageInterface`).
|
||||
|
||||
**Key constraint**: A connection is always a (Transport, StreamInterface) pair for stream-based connections. `MessageInterface` implementations manage their own transport internally.
|
||||
|
||||
**Source**: [tls-transport.md](tls-transport.md), [interface-model.md](interface-model.md)
|
||||
|
||||
---
|
||||
|
||||
### Service (irpc service)
|
||||
|
||||
**Definition**: An in-cluster Rust-to-Rust service defined by an irpc protocol enum. Services are dispatched by enum variant and use postcard serialization. They run within a node or cluster and are synchronous request-response.
|
||||
|
||||
**Examples**: `AuthProtocol`, `SecretProtocol`, `ConfigProtocol`, `StorageProtocol`.
|
||||
|
||||
**Not to be confused with**: A call protocol operation (path-based, JSON, cross-node), an external service (a third-party endpoint reachable via HTTP/call protocol), or an application service (DockerService, GitService — an operation-registered handler).
|
||||
|
||||
**Architecture position**: irpc services are _one dispatch backend_ for OperationEnv, not a replacement for it.
|
||||
|
||||
**Source**: [integration-plan.md](../integration-plan.md), Inconsistencies section item 3.
|
||||
|
||||
---
|
||||
|
||||
### Operation (call protocol)
|
||||
|
||||
**Definition**: A path-based handler registered in the `OperationRegistry`, dispatched by namespace + name (e.g., `/head/auth/verify`). Operations are cross-node, cross-language, and use JSON `EventEnvelope` frames.
|
||||
|
||||
**Not to be confused with**: An irpc service method (which is dispatched by enum variant, not path), or an OpenStack operation (which is a REST API verb).
|
||||
|
||||
**Architecture position**: Operations are the universal composition unit. All interfaces (SSH, HTTP, DNS, WebSocket, MCP) resolve to the same operation invocations through `OperationEnv`.
|
||||
|
||||
**Source**: [integration-plan.md](../integration-plan.md), ADR-033.
|
||||
|
||||
---
|
||||
|
||||
### External Service
|
||||
|
||||
**Definition**: Any endpoint reachable via the call protocol from another node or over an interface — an HTTP API (vast.ai), another alknet head node, rustfs, gitea. External services are _consumed_ by alknet, not part of it.
|
||||
|
||||
**Examples**: vast.ai cloud API, runpod API, any OpenAPI-described endpoint consumed by `OpenAPIServiceRegistry`.
|
||||
|
||||
**Not to be confused with**: An irpc service (internal), or an application service (handler within alknet).
|
||||
|
||||
---
|
||||
|
||||
### Application Service
|
||||
|
||||
**Definition**: A handler registered with the `OperationRegistry` that provides application-level functionality. Application services are pluggable, don't change core, and register operations like any other handler.
|
||||
|
||||
**Examples**: DockerService, NodeService, GitService, RustfsService.
|
||||
|
||||
**Not to be confused with**: An irpc service (which is a dispatch mechanism, not a handler), or an external service (outside the cluster).
|
||||
|
||||
---
|
||||
|
||||
### Identity (alknet core type)
|
||||
|
||||
**Definition**: A struct `{ id, scopes, resources }` that represents an authenticated principal. Produced by `IdentityProvider::resolve_from_fingerprint()` or `IdentityProvider::resolve_from_token()`. The same person connecting via SSH key or API token resolves to the same `Identity` (same `id` in database-backed deployments).
|
||||
|
||||
**Mapping to other domains**:
|
||||
|
||||
| alknet Concept | OpenStack Keystone | Distributed Git |
|---|---|---|
| `Identity.id` (fingerprint or UUID) | User ID | Radicle DID / on-chain address |
| `Identity.scopes` | Role assignments on a project/domain | Repository ACL entries |
| `Identity.resources` | Service catalog endpoints | Repositories accessible |
| `IdentityProvider` | Keystone identity service | On-chain registry + local cache |
|
||||
|
||||
**Not to be confused with**: A "user" (which is an account concept in storage), a "principal" (similar but not identical — an Identity can represent a service account or API key).
|
||||
|
||||
**Source**: [identity.md](../../architecture/identity.md), [auth.md](../../architecture/auth.md), ADR-029.
|
||||
|
||||
---
|
||||
|
||||
### IdentityProvider (alknet trait)
|
||||
|
||||
**Definition**: A trait in `alknet_core::auth` with two methods: `resolve_from_fingerprint()` (SSH key auth) and `resolve_from_token()` (bearer token auth). It resolves an inbound credential to an `Identity`.
|
||||
|
||||
**Implementations**: `ConfigIdentityProvider` (ArcSwap-backed, minimal), `StorageIdentityProvider` (SQLite-backed, production). Future possibility: `OnChainIdentityProvider` (smart contract + local cache).
|
||||
|
||||
**Direction**: Inbound (who is calling alknet).
|
||||
|
||||
**Not to be confused with**: `CredentialProvider` (outbound — how alknet authenticates TO external services), or an OpenStack Keystone "identity provider" which is a federation concept.
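The inbound direction can be sketched as below. The method names come from the definition above; the return types, the string-typed fields, and the `InMemoryIdentityProvider` are assumptions for illustration. In particular, a real `resolve_from_token()` would verify the token rather than look it up verbatim.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: Vec<String>,
}

/// Inbound resolution: who is calling alknet?
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Hypothetical in-memory provider in which both credential kinds
/// map to the same identity id.
#[derive(Default)]
pub struct InMemoryIdentityProvider {
    identities: HashMap<String, Identity>, // id -> Identity
    fingerprints: HashMap<String, String>, // fingerprint -> id
    tokens: HashMap<String, String>,       // token -> id
}

impl InMemoryIdentityProvider {
    pub fn register(&mut self, identity: Identity, fingerprint: &str, token: &str) {
        self.fingerprints
            .insert(fingerprint.to_string(), identity.id.clone());
        self.tokens.insert(token.to_string(), identity.id.clone());
        self.identities.insert(identity.id.clone(), identity);
    }
}

impl IdentityProvider for InMemoryIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let id = self.fingerprints.get(fingerprint)?;
        self.identities.get(id).cloned()
    }

    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        let id = self.tokens.get(token)?;
        self.identities.get(id).cloned()
    }
}
```

Keying both lookup tables on the identity id is what lets an SSH key and a token for the same person resolve to the same `Identity`.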
|
||||
|
||||
---
|
||||
|
||||
### CredentialProvider (alknet trait)
|
||||
|
||||
**Definition**: A trait in `alknet_core::credentials` that resolves outbound credentials. `get_credentials(service) -> Option<CredentialSet>`. It answers: "how does alknet authenticate to service X?"
|
||||
|
||||
**Direction**: Outbound (how alknet calls external services).
|
||||
|
||||
**Mapping**: Rustfs credentials (S3AccessKey), gitea tokens (Bearer), OIDC tokens (OidcToken), API keys (ApiKey).
|
||||
|
||||
**Not to be confused with**: `IdentityProvider` (inbound auth resolution).
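For contrast with the inbound trait, the outbound lookup can be sketched as follows. The `get_credentials(service) -> Option<CredentialSet>` shape and the variant names follow the text above; the variant fields and the `StaticCredentialProvider` are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Pure-data credential variants; the field layout is assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum CredentialSet {
    S3AccessKey { access_key: String, secret_key: String },
    Bearer { token: String },
    OidcToken { token: String },
    ApiKey { key: String },
}

/// Outbound resolution: how does alknet authenticate to service X?
pub trait CredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet>;
}

/// Hypothetical provider backed by a fixed map of service name to credentials.
pub struct StaticCredentialProvider {
    entries: HashMap<String, CredentialSet>,
}

impl StaticCredentialProvider {
    pub fn new() -> Self {
        StaticCredentialProvider {
            entries: HashMap::new(),
        }
    }

    /// Builder-style registration of credentials for one service.
    pub fn with(mut self, service: &str, creds: CredentialSet) -> Self {
        self.entries.insert(service.to_string(), creds);
        self
    }
}

impl CredentialProvider for StaticCredentialProvider {
    fn get_credentials(&self, service: &str) -> Option<CredentialSet> {
        self.entries.get(service).cloned()
    }
}
```

Returning `None` for an unknown service leaves it to the caller to decide whether a missing credential is an error.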
|
||||
|
||||
**Source**: [credential-provider.md](credential-provider.md)
|
||||
|
||||
---
|
||||
|
||||
### AuthToken (alknet wire format)
|
||||
|
||||
**Definition**: `base64url(key_id || timestamp || signature)` — an Ed25519-signed timestamp token used for non-SSH auth (HTTP, DNS, WebTransport, WebSocket).
|
||||
|
||||
**Mapping to other domains**:
|
||||
|
||||
| alknet | OpenStack Keystone | Description |
|---|---|---|
| AuthToken | Keystone token (X-Auth-Token) | Proof of identity carried in a request |
| AuthToken (Ed25519 signed) | Keystone token (scoped, with catalog) | Keystone tokens carry more metadata (catalog, scope); alknet tokens are minimal |
| API key (`alk_...`) | Application Credential | Password-less auth with restricted scope |
| `resolve_from_token()` | Token validation endpoint | Verify token → resolve identity |
|
||||
|
||||
**Key difference**: Keystone tokens are server-issued and carry scope/catalog. alknet AuthTokens are self-signed (client-generated) and carry only key_id + timestamp — scope is resolved server-side by `IdentityProvider`. This is intentional: alknet doesn't need a token issuance endpoint because tokens are self-proving.
|
||||
|
||||
**Source**: [auth.md](../../architecture/auth.md), ADR-023.
|
||||
|
||||
---
|
||||
|
||||
### Domain Event vs Integration Event
|
||||
|
||||
**Definition** (from event-sourcing/event_source_types.md):
|
||||
|
||||
| Type | Scope | Consumers | Serialization | Example |
|---|---|---|---|---|
| Domain Event | Within a single service boundary | Internal handlers only | Can be rich, domain-specific | `InventoryAdjusted`, `KeyRotated` |
| Integration Event | Across service boundaries | External services, other nodes | Simple, versioned, stripped of internals | `call.requested` (EventEnvelope), `UserCreated` (projected) |
|
||||
|
||||
**alknet mapping**:
|
||||
|
||||
| Boundary | Mechanism | Serialization | Scope |
|---|---|---|---|
| Within a service (e.g., AuthProtocol) | Honker streams (domain events) | Internal | Same service |
| Between services in a cluster | irpc protocol enum | postcard (binary) | Same cluster |
| Between nodes or over interfaces | Call protocol EventEnvelope | JSON | Cross-node |
|
||||
|
||||
**Hard constraint** (ADR-032): Domain events never cross service boundaries without projection. Integration events are the boundary contract.
|
||||
|
||||
**Not to be confused with**: A "call protocol event" (which IS an integration event), or a "service call" (which is synchronous, not event-based).
|
||||
|
||||
---
|
||||
|
||||
### Scope (alknet)
|
||||
|
||||
**Definition**: A permission or claim attached to an `Identity`. Used by `ForwardingPolicy` and operation-level ACL. Defined as part of the `Identity` struct.
|
||||
|
||||
**Mapping to other domains**:
|
||||
|
||||
| alknet `Scope` | OpenStack Keystone | Distributed Git |
|---|---|---|
| `scopes: ["relay:connect", "secrets:derive"]` | Role assignments on a project ("member", "admin") | Write/push access to repository X |
| `resources: [...]` | Project/domain scope targets | Which repositories are accessible |
|
||||
|
||||
**Open question**: Should alknet adopt a richer scope model (hierarchical, like Keystone's implied roles), or keep the flat string model? See OQ-DEF-03.
|
||||
|
||||
---
|
||||
|
||||
### OperationRegistry (alknet)
|
||||
|
||||
**Definition**: The central registry that maps `(namespace, operation_name)` to handlers. All interfaces resolve to the same registry. The HTTP interface maps `POST /v1/{namespace}/{op}` to `registry.invoke()`. The call protocol maps `call.requested` with `operationId` to `registry.invoke()`.
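A minimal sketch of the registry idea, assuming string inputs and outputs in place of the real JSON `EventEnvelope` payloads. The `Handler` alias, the error type, and `parse_http_path` are hypothetical names for this example; only the `(namespace, operation_name)` keying and the `/v1/{namespace}/{op}` mapping come from the definition above.

```rust
use std::collections::HashMap;

/// Handler signature assumed for illustration: string in, string out.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

#[derive(Default)]
pub struct OperationRegistry {
    handlers: HashMap<(String, String), Handler>,
}

impl OperationRegistry {
    pub fn register<F>(&mut self, namespace: &str, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers
            .insert((namespace.to_string(), name.to_string()), Box::new(handler));
    }

    /// Single entry point shared by every interface.
    pub fn invoke(&self, namespace: &str, name: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), name.to_string());
        match self.handlers.get(&key) {
            Some(handler) => handler(input),
            None => Err(format!("unknown operation: {}/{}", namespace, name)),
        }
    }
}

/// Maps an HTTP-style path such as "/v1/auth/verify" to (namespace, op).
pub fn parse_http_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace, op))
}
```

An HTTP interface would call `parse_http_path` and then `invoke`; a call-protocol interface would derive the same key from `operationId` and reach the same handler.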
|
||||
|
||||
**Mapping to other domains**:
|
||||
|
||||
| alknet | OpenStack Keystone | Description |
|---|---|---|
| OperationRegistry | Service Catalog | Both map names to endpoints; registry is programmatic, catalog is runtime-discovered |
| `FromOpenAPI` | — | Consumes an external API spec and registers operations |
| `GET /v1/schema` (proposed) | `GET /v3/auth/catalog` | Produces a spec of available operations |
|
||||
|
||||
**Key difference**: Keystone's catalog is per-token (scoped to the user's project). alknet's OperationRegistry is global — scope checking happens at invocation time, not discovery time.
|
||||
|
||||
---
|
||||
|
||||
### Call Protocol (alknet Layer 3)
|
||||
|
||||
**Definition**: The application-level protocol that carries operations, events, and responses between nodes. Uses JSON `EventEnvelope` frames. Interface-agnostic: runs over any (Transport, StreamInterface) pair or any `MessageInterface`.
|
||||
|
||||
**Not to be confused with**: irpc service calls (synchronous, in-cluster, postcard), or HTTP (which is an interface that maps to call protocol operations).
|
||||
|
||||
---
|
||||
|
||||
## Concept Mapping Table
|
||||
|
||||
### Alknet ↔ OpenStack Keystone
|
||||
|
||||
| Alknet Concept | Keystone Equivalent | Notes |
|---|---|---|
| `Identity` | User + Role Assignment + Project scope | alknet is simpler; Keystone separates user/role/project |
| `Identity.id` | User ID | In storage-backed: UUID. In config-backed: key fingerprint |
| `Identity.scopes` | Role assignments | alknet uses flat strings; Keystone uses hierarchical roles |
| `Identity.resources` | Project scope + Service Catalog | Both limit what a token can access |
| `IdentityProvider` | Keystone identity service | Both resolve credentials → identity + scope |
| `AuthToken` | Keystone token (X-Auth-Token) | alknet tokens are self-signed (no issuance endpoint); Keystone tokens are server-issued |
| API key (`alk_...`) | Application Credential | Nearly identical pattern |
| `CredentialProvider` | — (no direct equivalent) | Keystone doesn't authenticate outbound; each service manages its own credentials |
| `OperationRegistry` | Service Catalog | Registry is programmatic; catalog is runtime-discovered per-token scope |
| `CredentialSet::S3AccessKey` | S3 credential (access key + secret) | Directly maps to rustfs IAM model |
| `CredentialSet::OidcToken` | Federated token | alknet Phase D: becomes OIDC provider |
| Domain events (Honker) | — | Internal event bus, no Keystone equivalent |
| Integration events (call protocol) | Keystone notifications | Both are cross-boundary, but call protocol is request/response, not pub/sub |
| Token scoping | Unscoped → scoped token flow | alknet resolves scope server-side; Keystone requires explicit scope request |
|
||||
|
||||
### Alknet ↔ Distributed Git / Smart Contracts
|
||||
|
||||
| Alknet Concept | Distributed Git Equivalent | Notes |
|---|---|---|
| `Identity.id` (Ed25519 fingerprint) | Radicle DID (Ed25519 pubkey hash) | Both use Ed25519; alknet uses SLIP-0010 derivation |
| `Identity.scopes` | Repository ACL entries | Smart contract: NFT ownership → write permission |
| `IdentityProvider` | On-chain identity registry | alknet: local/DB lookup. Distributed: on-chain verification + local cache |
| `CredentialSet` | Git push credentials | ssh-key for SSH git, token for HTTPS git |
| Call protocol (integration events) | Gossip protocol (Radicle) | Both are cross-node; call protocol is point-to-point, gossip is epidemic |
| `OperationRegistry` | Replicator registry (on-chain) | Both map names to endpoints/operations |
| Domain events (Honker) | Git ref updates (internal) | Internal to the git service boundary |
| Seed derivation (BIP39) | Ethereum private key | Both derive multiple keys from one seed; different curves (Ed25519 vs secp256k1) |
| SecretProtocol key paths | — | alknet's `m/74'/0'/0'/0'` for Ed25519 identity; `m/44'/60'/0'/0/0` for Ethereum signing |
|
||||
|
||||
### Alknet ↔ Rustfs Auth Integration
|
||||
|
||||
| Alknet Concept | Rustfs Equivalent | Integration Path |
|---|---|---|
| `IdentityProvider` (inbound) | Rustfs IAM / Keystone auth | Phase D: alknet as OIDC provider → rustfs accepts alknet tokens |
| `CredentialSet::S3AccessKey` | Rustfs access key + secret key | Phase A: static credentials; Phase C: per-identity provisioned keys |
| `CredentialProvider` (outbound) | Rustfs admin API (key provisioning) | Phase C: `ManagedCredentialProvider` provisions rustfs keys |
| `Identity.scopes` | Rustfs IAM policy | Phase D: scope → OIDC claim → policy mapping |
| HTTP MessageInterface | Rustfs S3 API (port 9000) | Rustfs sits behind alknet's HTTP router or sidecar |
| OperationRegistry | — | Git service maps `git.clone`, `git.push`, etc. to operations |
|
||||
|
||||
---
|
||||
|
||||
## Overloaded Terms: Disambiguation
|
||||
|
||||
### "Service" — Three Meanings
|
||||
|
||||
| Context | Meaning | Example | Architecture Layer |
|---|---|---|---|
| alknet irpc service | In-cluster Rust-to-Rust protocol enum | `AuthProtocol`, `SecretProtocol` | Layer 3 (internal) |
| alknet application service | Operation-registered handler | `GitService`, `RustfsService` | Layer 3 (handler) |
| External service | Third-party endpoint consumed by alknet | `vast.ai`, `rustfs` instance | Outside alknet (consumed via OperationEnv) |
|
||||
|
||||
**Rule**: When ambiguity is possible, use the full qualifier: "irpc service", "application service", or "external service". The bare word "service" should be avoided in architecture docs.
|
||||
|
||||
### "Interface" — Three Meanings
|
||||
|
||||
| Context | Meaning | Example |
|---|---|---|
| alknet Layer 2 | A protocol parser that consumes Transport streams or handles discrete requests | `SshInterface`, `HttpInterface`, `DnsInterface` |
| Rust/generic | A trait definition | `IdentityProvider`, `CredentialProvider` |
| OpenStack/generic | A network endpoint (URL) for a service | Keystone's public/internal/admin interfaces |
|
||||
|
||||
**Rule**: In alknet architecture docs, "Interface" (capitalized) refers to Layer 2. "trait" or "contract" should be used for Rust trait definitions. "endpoint" should be used for network URLs.
|
||||
|
||||
### "Token" — Three Meanings
|
||||
|
||||
| Context | Meaning | Structure |
|---|---|---|
| AuthToken (alknet) | Self-signed Ed25519 timestamp | `base64url(key_id \|\| timestamp \|\| sig)` |
| API key (alknet) | Hash-verified bearer string | `alk_...` prefix, SHA-256 hash verification |
| Keystone token | Server-issued scoped token | UUID or JWT, carries catalog and scope |
|
||||
|
||||
**Rule**: "AuthToken" refers to alknet's self-signed token. "API key" refers to the hash-verified bearer format. "Keystone token" when referring to OpenStack. Never use bare "token" in architecture docs.
|
||||
|
||||
### "Identity" — Two Meanings
|
||||
|
||||
| Context | Meaning |
|---|---|
| alknet `Identity` struct | `{ id, scopes, resources }` — the authenticated principal |
| OpenStack Identity (Keystone) | The entire identity management SERVICE, including users, projects, roles, tokens, catalog |
|
||||
|
||||
**Rule**: "Identity" (capitalized, code font) = alknet struct. "Keystone" or "identity service" = OpenStack concept.
|
||||
|
||||
### "Domain" — Two Meanings in Event Sourcing
|
||||
|
||||
| Context | Meaning |
|---|---|
| Domain Event | An event within a single service boundary (e.g., `KeyRotated` within AuthProtocol) |
| DNS Domain | A domain name in DNS queries/records |
|
||||
|
||||
These are unrelated. "Domain event" is from DDD. "DNS domain" is from networking. Context should always make it clear, but if there's any chance of confusion, use "bounded-context event" instead of "domain event".
|
||||
|
||||
---
|
||||
|
||||
## Architectural Patterns: Cross-Domain Comparison
|
||||
|
||||
### Pattern: Inbound Auth → Outbound Credentials
|
||||
|
||||
```
┌──────────────────────────────────────────────────────────────────┐
│  Incoming Request                                                │
│       │                                                          │
│       ▼                                                          │
│  IdentityProvider                                                │
│  (credential → Identity)                                         │
│       │                                                          │
│       ├── SSH fingerprint  → Identity.id, .scopes, .resources    │
│       ├── Bearer AuthToken → Identity.id, .scopes, .resources    │
│       └── API key          → Identity.id, .scopes, .resources    │
│       │                                                          │
│       ▼                                                          │
│  OperationContext { identity, env, ... }                         │
│       │                                                          │
│       ├── context.env.invoke("git", "push", input)               │
│       │    └── GitService handler                                │
│       │         └── CredentialProvider                           │
│       │              └── get_credentials("rustfs")               │
│       │                   └── S3AccessKey { access_key,          │
│       │                                     secret_key }         │
│       │                                                          │
│       └── context.env.invoke("secrets", "derive", input)         │
│            └── local dispatch to SecretProtocol                  │
│                                                                  │
│  Two directions:  Inbound (who is calling us)                    │
│                   Outbound (how we call others)                  │
└──────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### Pattern: Scope Resolution Across Systems
|
||||
|
||||
| System | Scope Source | Scope Shape | Scope Check Location |
|---|---|---|---|
| alknet (current) | `IdentityProvider` | Flat strings `["relay:connect"]` | Handler invocation |
| Keystone | Role assignment on project | Hierarchical roles with implied roles | Policy engine per service |
| Rustfs IAM | Policy document attached to user | JSON policy with actions/resources | Request evaluation |
| Smart contract ACL | NFT ownership + on-chain mapping | Address → repo → permission level | On-chain verification + local cache |
| Radicle | Local config | Pubkey → repo → permission | Pre-receive hook |
|
||||
|
||||
**Open question**: Should alknet adopt hierarchical implied roles (Keystone pattern) or stay with flat scopes and let individual services interpret them?
|
||||
|
||||
### Pattern: Token Self-Proving vs Server-Issued
|
||||
|
||||
| Property | alknet AuthToken | Keystone Token | API Key |
|---|---|---|---|
| Issued by | Client (self-signed) | Server (Keystone) | Admin (config or DB) |
| Carries | key_id + timestamp + signature | User ID, scope, catalog, expiry | Prefix + hash |
| Verified by | Ed25519 signature check | Server lookup (database or JWT) | SHA-256 hash check |
| Revocation | Key removal from `authorized_keys` | Token revocation list or JWT `jti` | DB deletion |
| Scope resolution | Server-side (IdentityProvider) | Embedded in token | Server-side (DB lookup) |
| Replay protection | Timestamp window (±300s) | Token TTL + server validation | N/A (stateless) |
|
||||
|
||||
alknet's self-proving model avoids the need for a token issuance endpoint. This is a deliberate trade-off: simpler at the cost of no server-side session state. For replay protection beyond the timestamp window, future work could add nonce challenge-response (ADR-023).
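The server-side handling of a self-signed token can be sketched in two steps: split the decoded bytes into `key_id || timestamp || signature`, then apply the ±300 s window. The field widths here are assumptions, not taken from auth.md: an 8-byte big-endian timestamp and a 64-byte signature (the size of an Ed25519 signature), with the remaining prefix treated as `key_id`. Base64url decoding and the Ed25519 signature check are omitted.

```rust
/// Replay window from the comparison table (±300 s).
pub const REPLAY_WINDOW_SECS: u64 = 300;

/// Assumed layout: key_id (variable) || timestamp (8 bytes, BE) || signature (64 bytes).
const TS_LEN: usize = 8;
const SIG_LEN: usize = 64;

#[derive(Debug)]
pub struct TokenParts<'a> {
    pub key_id: &'a [u8],
    pub timestamp: u64,
    pub signature: &'a [u8],
}

/// Splits already-decoded token bytes; returns `None` if no room is left for a key id.
pub fn split_token(raw: &[u8]) -> Option<TokenParts<'_>> {
    if raw.len() <= TS_LEN + SIG_LEN {
        return None;
    }
    let (key_id, rest) = raw.split_at(raw.len() - TS_LEN - SIG_LEN);
    let (ts_bytes, signature) = rest.split_at(TS_LEN);
    let mut buf = [0u8; TS_LEN];
    buf.copy_from_slice(ts_bytes);
    Some(TokenParts {
        key_id,
        timestamp: u64::from_be_bytes(buf),
        signature,
    })
}

/// True when the token timestamp is within the window on either side of `now`.
pub fn within_replay_window(token_ts: u64, now: u64) -> bool {
    now.abs_diff(token_ts) <= REPLAY_WINDOW_SECS
}
```

The window check only bounds how long a captured token stays usable; it does not prevent replay inside the window, which is the gap the nonce challenge-response idea would address.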
|
||||
|
||||
---
|
||||
|
||||
## Service Classification
|
||||
|
||||
Services within alknet's ecosystem are classified by their relationship to the core:
|
||||
|
||||
### Core Services (irpc, always present when feature flag enabled)
|
||||
|
||||
| Service | Protocol | Location | Purpose |
|---|---|---|---|
| Auth | `AuthProtocol` | alknet-core (`irpc` feature) | Identity resolution, credential verification |
| Config | `ConfigProtocol` | alknet-core (`irpc` feature) | Dynamic config reload |
| Secret | `SecretProtocol` | alknet-secret | Key derivation, encryption, decryption |
| Storage | `StorageProtocol` | alknet-storage | Metagraph CRUD, ACL, accounts |
|
||||
|
||||
### Application Services (operation-registered, pluggable)
|
||||
|
||||
| Service | Interface | Core dependency | Purpose |
|---|---|---|---|
| GitService | HTTP (MessageInterface) + SSH (StreamInterface) | IdentityProvider, CredentialProvider | Git clone/push/pull over HTTPS and SSH |
| RustfsService | HTTP (MessageInterface) | CredentialProvider | S3-compatible object storage proxy |
| DockerService | HTTP (MessageInterface) | CredentialProvider | Container management |
| NodeService | HTTP (MessageInterface) | IdentityProvider | Node management |
|
||||
|
||||
### External Services (consumed, not hosted)
|
||||
|
||||
| Service | Integration | Auth |
|---|---|---|
| vast.ai | `OpenAPIServiceRegistry` + `CredentialProvider` | API key |
| runpod | `OpenAPIServiceRegistry` + `CredentialProvider` | API key |
| ubicloud | `OpenAPIServiceRegistry` + `CredentialProvider` | API key |
|
||||
|
||||
**Key distinction**: Rustfs and gitea are "self-hosted external services" — they run inside the same deployment boundary but are managed independently. alknet acts as a gateway (identity provider, credential provider) and reverse proxy (HTTP interface) for them, but they are NOT part of alknet-core.
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
### OQ-DEF-01: Should alknet adopt a "Service Catalog" concept like Keystone?
|
||||
|
||||
Keystone's service catalog lets a token carry information about which services and endpoints are available to the authenticated user. alknet's `OperationRegistry` is global — every authenticated identity sees the same operations. Should there be a scope-filtered operation discovery mechanism?
|
||||
|
||||
**Options**:
|
||||
1. Keep `OperationRegistry` global, check scope at invocation time (current design)
2. Add `GET /v1/catalog` or `GET /v1/schema?scope=<scope>` that returns only operations the identity can invoke
3. Add a "service catalog" field to `Identity.resources` that lists available namespaces
|
||||
|
||||
**Recommendation**: Start with option 1 (current design). Add option 2 when multi-tenant deployment requires it. The `GET /v1/schema` endpoint (from tls-transport.md) already provides operation discovery — adding scope filtering is additive.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-02: Should "application service" and "irpc service" be renamed to avoid "service" overloading?
|
||||
|
||||
The word "service" has three meanings in the architecture (irpc, application, external). Should we adopt different terms?
|
||||
|
||||
**Options**:
|
||||
1. Keep current names, always qualify with "irpc service", "application service", "external service"
2. Rename: "irpc service" → "irpc protocol" or "backend handler"; "application service" → "adapter" or "integration"
3. Adopt the call-protocol terminology exclusively: everything that registers in `OperationRegistry` is an "operation handler", and "service" refers only to external endpoints
|
||||
|
||||
**Recommendation**: Option 1 for now. The qualifiers are sufficient, and renaming would require changing ADRs and multiple specs. Revisit if confusion persists in practice.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-03: Should `Identity.scopes` be hierarchical (like Keystone implied roles) or stay flat?
|
||||
|
||||
Current design: `scopes: Vec<String>` with flat strings like `"relay:connect"`, `"secrets:derive"`.
|
||||
Keystone pattern: Roles can imply other roles (admin implies member). Policies are per-service, not global strings.
|
||||
|
||||
**Options**:
|
||||
1. Keep flat scopes, let individual services interpret them (current)
2. Add implied scope resolution: `"admin"` → `["relay:connect", "secrets:derive", ...]`
3. Adopt a policy language (JSON policy documents like Rustfs IAM)
|
||||
|
||||
**Recommendation**: Start with option 1. Add implied scope resolution in alknet-storage when multi-tenant deployment requires it. A full policy language is Phase D territory and should follow what Rustfs already uses (MinIO-style JSON policies) rather than inventing something new.
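If implied scope resolution (option 2) is added later, it could be as small as a transitive expansion over an implication table. The sketch below is one possible shape under that assumption; `expand_scopes` and the table contents are hypothetical, and the flat `Vec<String>` model stays unchanged because expansion happens before the scope check.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands granted scopes through an implication table, e.g.
/// "admin" -> ["member", "secrets:derive"]. Cycles are tolerated because
/// each scope is expanded at most once.
pub fn expand_scopes(
    granted: &[&str],
    implied: &HashMap<&str, Vec<&str>>,
) -> BTreeSet<String> {
    let mut expanded = BTreeSet::new();
    let mut pending: Vec<&str> = granted.to_vec();

    while let Some(scope) = pending.pop() {
        // `insert` returns false if the scope was already expanded.
        if expanded.insert(scope.to_string()) {
            if let Some(children) = implied.get(scope) {
                pending.extend(children.iter().copied());
            }
        }
    }
    expanded
}
```

Because the result is still a flat set of strings, handlers that check for `"relay:connect"` today would not need to change.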
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-04: How should the GitService adapter work across HTTP and SSH?
|
||||
|
||||
gitserver provides `gitserver-core` (transport-agnostic git protocol logic) and `gitserver-http` (Axum HTTP layer). alknet's architecture supports two paths, plus a third option that combines them:
|
||||
|
||||
**Path A — HTTP MessageInterface**: Git operations over HTTPS, with alknet's HTTP interface authenticating the request and passing Identity to the GitService handler. The GitService handler uses `Identity` to determine repo access and calls `gitserver-core` directly.
|
||||
|
||||
**Path B — SSH StreamInterface**: Git operations over SSH, where the SSH interface already authenticates the user. Git commands are dispatched through SSH channels (similar to how `channel_open_direct_tcpip` works for port forwarding, but with a `git-upload-pack` / `git-receive-pack` channel type).
|
||||
|
||||
**Path C — Both**: `gitserver-core` as the protocol engine, `gitserver-http` for HTTPS, and a custom `SshGitInterface` for SSH-git channels.
|
||||
|
||||
**Recommendation**: Path A (HTTP only) in Phase 1, then Path C (both) in Phase 2. Git's smart HTTP protocol is well understood, and `gitserver-http` can be nested into alknet's Axum router. SSH git requires designing a new channel type in `SshInterface`.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-05: Should alknet act as an OIDC provider (Phase D of credential-provider.md)?
|
||||
|
||||
This is the most cross-cutting question. If alknet becomes an OIDC provider, it becomes the identity backbone for all self-hosted services (rustfs, gitea, etc.). This maps to OpenStack Keystone's role but with a different scope model.
|
||||
|
||||
**Benefits**:
|
||||
- Eliminates stored credential management for OIDC-compatible services
- Users authenticate once via alknet (SSH key or token) and get scoped access to all services
- Maps directly to `Identity.scopes → OIDC claims → service policies`
|
||||
|
||||
**Complexity**:
|
||||
- Requires OIDC authorization server endpoints (authorize, token, userinfo, jwks)
- Requires PKCE flow for browser-based auth
- Requires claim → policy mapping per service
- alknet is not currently designed to be an OIDC server
|
||||
|
||||
**Recommendation**: Phase D (long-term). Phases A-C use static credentials and managed credentials, which are sufficient for most deployments. OIDC provider is a quality-of-life improvement that becomes important in multi-user self-hosted setups.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-06: How does Domain Event vs Integration Event discipline apply to self-hosted services?
|
||||
|
||||
ADR-032 says domain events stay within the service boundary and integration events cross it. Rustfs and gitea are outside alknet's boundary but inside the deployment boundary. Where do their events fall?
|
||||
|
||||
**Options**:
|
||||
1. Self-hosted services are external: their events are integration events, consumed via callback/webhook
2. Self-hosted services are part of alknet's boundary: use Honker streams internally, project to integration events for cross-node
3. Hybrid: alknet projects rustfs/gitea state changes into `EventEnvelope` integration events, but rustfs/gitea internal events stay in their own boundary
|
||||
|
||||
**Recommendation**: Option 3. Self-hosted services have their own internal event systems. alknet projects state changes (bucket created, repo pushed) into `EventEnvelope` integration events for cross-node communication. Honker streams are for events within alknet-core services only.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-07: How should the smart-contract / on-chain identity model relate to alknet's IdentityProvider?
|
||||
|
||||
The distributed git concept (NFT-based org/repo tokens) introduces a third `IdentityProvider` implementation that validates identity on-chain. How does this relate to the existing two implementations?
|
||||
|
||||
**Option 1 — OnChainIdentityProvider**: A new implementation of the `IdentityProvider` trait that checks on-chain ownership. Slow path: on-chain verification (0.5-5s on L2). Fast path: local ACL metagraph cache validated against on-chain state periodically.
|
||||
|
||||
**Option 2 — Separate verification layer**: On-chain verification is a separate step, not an IdentityProvider. After normal auth (SSH key or token), a second check verifies on-chain ownership for specific operations (e.g., write to a distributed repo).
|
||||
|
||||
**Option 3 — CredentialProvider extension**: On-chain verification is outbound — alknet authenticates TO the smart contract to verify repo permissions. This would be a new `CredentialSet` variant.
|
||||
|
||||
**Recommendation**: Option 1 for the long term. The `IdentityProvider` trait is designed to be pluggable. An `OnChainIdentityProvider` with local cache is additive. It resolves on-chain identity to an `Identity` struct just like `ConfigIdentityProvider` and `StorageIdentityProvider`. The seed derivation path `m/44'/60'/0'/0/0` (Ethereum) alongside `m/74'/0'/0'/0'` (Ed25519 identity) provides a cryptographic link between the two key types.
|
||||
|
||||
---
|
||||
|
||||
### OQ-DEF-08: Should the "interface" concept in auth.md (which distinguishes auth "presentation" per transport/interface pair) be renamed to avoid confusion with Layer 2 "Interface"?
|
||||
|
||||
In auth.md, "auth presentation" is the mechanism by which credentials are presented on each interface:
|
||||
- SSH: key handshake
|
||||
- HTTP: Bearer header
|
||||
- DNS: token in query labels
|
||||
- WebTransport: token in CONNECT request
|
||||
|
||||
This is NOT the same as "Interface" (Layer 2), but uses the same word. Should we adopt a distinct term?
|
||||
|
||||
**Options**:
|
||||
1. Keep "auth presentation" — it's already distinct from "Interface" (Layer 2)
|
||||
2. Rename to "auth mechanism" or "credential presentation" to be more precise
|
||||
3. Use the term from the interface-model.md table: "(Transport, Interface) → Auth mechanism"
|
||||
|
||||
**Recommendation**: Option 2. "Credential presentation" is precise and doesn't overload "interface". Update auth.md to use "credential presentation per (Transport, Interface) pair" consistently.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [interface-model.md](interface-model.md) — StreamInterface / MessageInterface trait design
|
||||
- [credential-provider.md](credential-provider.md) — CredentialProvider, CredentialSet (outbound auth)
|
||||
- [tls-transport.md](tls-transport.md) — Unified multi-interface architecture
|
||||
- [integration-plan.md](../integration-plan.md) — Phase structure, OperationEnv, event boundary discipline
|
||||
- [identity.md](../../architecture/identity.md) — Identity struct, IdentityProvider trait
|
||||
- [auth.md](../../architecture/auth.md) — Unified auth, AuthToken format
|
||||
- [services.md](../../architecture/services.md) — irpc services, OperationEnv
|
||||
- [event-source-types.md](../event-sourcing/event_source_types.md) — Domain events vs integration events
|
||||
- [ADR-032](../../architecture/decisions/032-event-boundary-discipline.md) — Event boundary rule
|
||||
- [ADR-033](../../architecture/decisions/033-operationenv-irpc-call-protocol.md) — OperationEnv, three dispatch paths
|
||||
- [references/rustfs/](../references/rustfs/) — Rustfs research and reference
|
||||
- [references/gitserver/](../references/gitserver/) — Gitserver research and reference
|
||||
- [references/openstack-keystone/](../references/openstack-keystone/) — OpenStack Keystone concepts
|
||||
- [references/distributed-identity/](../references/distributed-identity/) — Distributed identity and smart contract ACL
|
||||
@@ -1,367 +0,0 @@
|
||||
# Interface Model: Stream and Message Interfaces
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-08
|
||||
> Part of: Phase 2 planning
|
||||
|
||||
## Overview
|
||||
|
||||
The current three-layer model (ADR-026, [interface.md](../../architecture/interface.md)) defines Transport (Layer 1), Interface (Layer 2), and Protocol (Layer 3). The `Interface` trait assumes a persistent byte stream from a `Transport`, which works for SSH and raw framing. However, two important interface types — HTTP and DNS — don't fit this model: they handle individual requests, not persistent sessions. This document proposes splitting the interface model into `StreamInterface` and `MessageInterface`, adding HTTP as a first-class interface, and reclassifying DNS from a transport to a message-based interface.
|
||||
|
||||
## Problem Statement
|
||||
|
||||
### DNS is not a transport
|
||||
|
||||
The current `TransportKind` enum includes `Dns { domain: String }` alongside `Tcp`, `Tls`, and `Iroh`. But DNS doesn't produce an `AsyncRead + AsyncWrite + Unpin + Send` byte stream. It's a request/response protocol, and listing it as a transport conflates two different abstractions. DNS encodes and decodes `EventEnvelope` frames as DNS query/response pairs, which is interface behavior, not transport behavior.
|
||||
|
||||
### HTTP is missing as an interface
|
||||
|
||||
The current valid (Transport, Interface) pairs are all stream-based:

| Transport | Interface |
|---|---|
| TLS | SSH |
| TCP | SSH |
| iroh | SSH |
| DNS | raw framing |
| WebTransport | SSH |
| WebTransport | raw framing |
| TCP | raw framing |

But there's no HTTP interface — the (TCP/TLS, HTTP) pair that accepts standard HTTP requests and maps them to call protocol operations. This is the **server-side** equivalent of `OpenAPIServiceRegistry` (which does client-side: consuming OpenAPI specs to make outbound HTTP calls). Without it, external clients (browsers, curl, monitoring) can only reach alknet through SSH.
|
||||
|
||||
### Auth across all interfaces
|
||||
|
||||
Different interfaces authenticate differently, but all resolve to the same `Identity` through `IdentityProvider`:

| (Transport, Interface) | Auth mechanism | Resolves via |
|---|---|---|
| (TLS, SSH) | SSH public key handshake | `IdentityProvider::resolve_from_fingerprint()` |
| (TCP, SSH) | SSH public key handshake | `IdentityProvider::resolve_from_fingerprint()` |
| (iroh, SSH) | SSH public key handshake | `IdentityProvider::resolve_from_fingerprint()` |
| (TLS, raw framing) | Token in frame header | `IdentityProvider::resolve_from_token()` |
| (TCP, raw framing) | Token in frame header | `IdentityProvider::resolve_from_token()` |
| (WebTransport, raw framing) | Token in CONNECT request | `IdentityProvider::resolve_from_token()` |
| (TLS, HTTP) | HTTP Authorization header | `IdentityProvider::resolve_from_token()` |
| (—, DNS) | Token embedded in DNS query | `IdentityProvider::resolve_from_token()` |

All token-based paths use the same `AuthToken` format (Ed25519-signed timestamp, defined in [auth.md](../../architecture/auth.md)). The `IdentityProvider` trait doesn't change — `resolve_from_token()` already covers all of these. The difference is just how the token gets extracted from the wire format.
|
||||
|
||||
## Design
|
||||
|
||||
### StreamInterface and MessageInterface
|
||||
|
||||
The current `Interface` trait has this signature:

```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```

This works for SSH and raw framing — both run over a duplex stream. But HTTP and DNS are **message-based**: they receive isolated requests, not persistent sessions. The interface model needs to accommodate both patterns.
|
||||
|
||||
**Rename `Interface` to `StreamInterface`** for stream-based connections:

```rust
#[async_trait]
pub trait StreamInterface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```

**Add `MessageInterface`** for message-based request/response interfaces:

```rust
#[async_trait]
pub trait MessageInterface: Send + Sync + 'static {
    async fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>;
}
```

Why separate traits instead of one:
|
||||
- Different signatures: `StreamInterface` produces a session from a stream. `MessageInterface` handles an individual request.
|
||||
- Different lifecycles: Stream sessions are long-lived (SSH channels persist). Message handlers are stateless per-request (each HTTP request is independent).
|
||||
- Different transport ownership: `StreamInterface` receives a `TransportStream` from elsewhere. `MessageInterface` manages its own transport (HTTP server, DNS server).
|
||||
|
||||
### InterfaceRequest / InterfaceResponse

```rust
pub struct InterfaceRequest {
    pub operation_path: String,         // e.g., "/head/auth/verify"
    pub input: Value,                   // JSON input payload
    pub auth_token: Option<AuthToken>,  // Extracted from wire format
    pub metadata: HashMap<String, String>,
}

pub struct InterfaceResponse {
    pub result: Result<Value, CallError>,
    pub status: u16,                    // HTTP status, DNS result code, etc.
    pub headers: HashMap<String, String>,
}
```

This is a normalized interface-agnostic request/response. The `MessageInterface` implementation extracts the operation path, input, and auth token from its wire format (HTTP, DNS, etc.) and constructs an `InterfaceRequest`. The call protocol handler processes it and returns an `InterfaceResponse` that the implementation serializes back to its wire format.
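
The normalization step can be sketched for the HTTP case. This is a minimal illustration, not the project's implementation: `from_http_parts` is a hypothetical helper, `input` and `auth_token` are plain strings standing in for `Value` and `AuthToken`, header names are assumed to be lowercased by the caller, and the mapping from `/v1/{namespace}/{op}` to an operation path is an assumption.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the document's `InterfaceRequest`.
/// `input` and `auth_token` are plain strings here (assumption for the sketch).
#[derive(Debug, PartialEq)]
pub struct InterfaceRequest {
    pub operation_path: String,
    pub input: String,
    pub auth_token: Option<String>,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical helper: normalize the parts of an HTTP request.
/// Returns `None` when the path is not of the form `/v1/{namespace}/{op}`.
pub fn from_http_parts(
    path: &str,
    headers: &HashMap<String, String>,
    body: &str,
) -> Option<InterfaceRequest> {
    // Only paths under the versioned prefix map to operations.
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }

    // Credential presentation for HTTP: `Authorization: Bearer <token>`.
    let auth_token = headers
        .get("authorization")
        .and_then(|value| value.strip_prefix("Bearer "))
        .map(|token| token.to_string());

    // Record which wire format produced the request.
    let mut metadata = HashMap::new();
    metadata.insert("interface".to_string(), "http".to_string());

    Some(InterfaceRequest {
        operation_path: format!("/{namespace}/{op}"),
        input: body.to_string(),
        auth_token,
        metadata,
    })
}
```

A DNS implementation would fill the same struct from query labels instead of a path and headers, which is the point of keeping the type free of wire-format details.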
|
||||
|
||||
### HTTP Interface
|
||||
|
||||
The HTTP interface accepts standard HTTP requests and maps them to call protocol operations:

```
POST /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (mutation)
GET  /v1/{namespace}/{op}      → registry.invoke(namespace, op, input)     (query, params as input)
GET  /v1/{namespace}/{op} SSE  → registry.subscribe(namespace, op, input)  (subscription)
```

This is how external clients invoke alknet operations without SSH. Use cases:
|
||||
- Dashboard UI calling operations via fetch()
|
||||
- Third-party service integration via REST API
|
||||
- Health checks and monitoring endpoints
|
||||
- Other alknet nodes using `OpenAPIServiceRegistry` to register against this API

```rust
pub struct HttpInterface {
    identity_provider: Arc<dyn IdentityProvider>,
    registry: Arc<OperationRegistry>,
    env: OperationEnv,
}
```

Auth: Extract `Authorization: Bearer <token>` header, pass to `IdentityProvider::resolve_from_token()`. The token is the same `AuthToken` format used by WebTransport and raw framing.
|
||||
|
||||
The HTTP interface manages its own transport layer (hyper/axum/actix). It doesn't need a `Transport` from Layer 1 — HTTP IS the transport. This is the same pattern as the DNS interface.
|
||||
|
||||
### DNS Interface
|
||||
|
||||
DNS is not a transport. It's a **message-based interface** that encodes `EventEnvelope` frames as DNS query/response pairs:

```
DNS query: "_alknet.request.{base64url(payload)}.alk.dev TXT?"
    → decoded as EventEnvelope (call.requested)
    → call protocol handler processes it
    → encoded as EventEnvelope (call.responded)
    → returned as DNS TXT record response
```

```rust
pub struct DnsInterface {
    domain: String,
    identity_provider: Arc<dyn IdentityProvider>,
    registry: Arc<OperationRegistry>,
    env: OperationEnv,
}
```

Auth: Token embedded in the DNS query. Same `AuthToken` format.
|
||||
|
||||
The DNS interface runs its own DNS server. It doesn't need a separate `Transport` — DNS is both the transport and the interface combined.
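
One practical constraint on the query format above is that DNS limits each label to 63 octets and a full name to roughly 253 characters, so a base64url payload longer than one label has to be split. The sketch below assumes the payload is spread across consecutive labels; that splitting scheme, the `build_query_name` helper, and its constants are illustrative assumptions, not part of the documented format.

```rust
/// Per-label limit from the DNS specification (RFC 1035).
const MAX_LABEL_LEN: usize = 63;
/// Practical limit for a dotted name in presentation form.
const MAX_NAME_LEN: usize = 253;

/// Hypothetical helper: build `_alknet.request.<labels>.<domain>` from an
/// already base64url-encoded payload, splitting it into labels of at most
/// 63 characters. Returns `None` if the payload is empty, not ASCII, or the
/// resulting name would exceed the overall length limit.
pub fn build_query_name(encoded_payload: &str, domain: &str) -> Option<String> {
    if encoded_payload.is_empty() || !encoded_payload.is_ascii() {
        return None;
    }

    // ASCII guarantees that byte chunks are valid UTF-8 on their own.
    let labels: Vec<&str> = encoded_payload
        .as_bytes()
        .chunks(MAX_LABEL_LEN)
        .map(|chunk| std::str::from_utf8(chunk).expect("ASCII checked above"))
        .collect();

    let name = format!("_alknet.request.{}.{}", labels.join("."), domain);
    if name.len() <= MAX_NAME_LEN {
        Some(name)
    } else {
        None
    }
}
```

The length ceiling is small, so a real request would likely need either very compact envelopes or a multi-query fragmentation scheme; the source does not say which.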
|
||||
|
||||
### Remove TransportKind::Dns
|
||||
|
||||
Since DNS is a `MessageInterface` (not a transport), `TransportKind::Dns` should be removed from the enum. The `ListenerConfig` enum should be updated to cover both stream and message listeners:

```rust
pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Message {
        interface: MessageInterfaceKind,
        bind_addr: SocketAddr,
    },
}
```

This cleanly separates "listen for byte streams" from "listen for messages."
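
To illustrate how a server might branch on the two variants, here is a small sketch. The `TransportKind`, `StreamInterfaceKind`, and `MessageInterfaceKind` enums are simplified stand-ins with assumed variant names (the real types may carry fields), and `describe` is a hypothetical helper that only reports what each listener would do.

```rust
use std::net::SocketAddr;

// Simplified stand-ins; variant names are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum TransportKind { Tcp, Tls, Iroh }

#[derive(Debug, Clone, PartialEq)]
pub enum StreamInterfaceKind { Ssh, RawFraming }

#[derive(Debug, Clone, PartialEq)]
pub enum MessageInterfaceKind { Http, Dns }

pub enum ListenerConfig {
    Stream {
        transport: TransportKind,
        interface: StreamInterfaceKind,
    },
    Message {
        interface: MessageInterfaceKind,
        bind_addr: SocketAddr,
    },
}

/// Hypothetical helper: summarize what the accept loop would do per listener.
pub fn describe(listener: &ListenerConfig) -> String {
    match listener {
        // Stream listeners: Layer 1 produces the byte stream, Layer 2 consumes it.
        ListenerConfig::Stream { transport, interface } => {
            format!("stream: accept on {transport:?}, hand each stream to {interface:?}")
        }
        // Message listeners: the interface owns its socket, so only an address is needed.
        ListenerConfig::Message { interface, bind_addr } => {
            format!("message: {interface:?} binds {bind_addr} itself")
        }
    }
}
```

Note that the `Message` variant has no `transport` field at all, so a (DNS transport, raw framing) style pairing cannot be expressed by accident.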
|
||||
|
||||
### Revised Interface Pairs
|
||||
|
||||
**Stream-based connections** (persistent session, `StreamInterface`):

| Transport | StreamInterface | Auth | Use case |
|---|---|---|---|
| TLS | SshInterface | SSH pubkey handshake | Standard alknet tunnel |
| TCP | SshInterface | SSH pubkey handshake | Plain SSH tunnel |
| iroh | SshInterface | SSH pubkey handshake | P2P SSH tunnel |
| TCP | RawFramingInterface | Token in frame header | Local service mesh |
| TLS | RawFramingInterface | Token in frame header | Secure mesh |
| WebTransport | SshInterface | SSH pubkey handshake | Browser SSH tunnel (future) |
| WebTransport | RawFramingInterface | Token in CONNECT request | Browser call protocol (future) |

**Message-based interfaces** (stateless per-request, `MessageInterface`):

| MessageInterface | Auth | Owns transport? | Use case |
|---|---|---|---|
| HttpInterface | Authorization header (Bearer token) | Yes (hyper/axum) | REST API, dashboard, integrations |
| DnsInterface | Token embedded in query labels | Yes (DNS server) | Censorship-resistant control channel |
| WebSocketInterface | Token in handshake or first message | Yes (WS server) | Browser persistent connection (future) |

The `MessageInterface` implementations manage their own transport. They don't need the `Transport` trait because they're not wrapping a generic byte stream — they ARE the transport+interface combined.
|
||||
|
||||
### Unified auth across all interfaces
|
||||
|
||||
Every interface resolves to the same `Identity` through `IdentityProvider`:

```
SSH fingerprint     → IdentityProvider::resolve_from_fingerprint → Identity
Bearer token        → IdentityProvider::resolve_from_token       → Identity
HTTP Authorization  → IdentityProvider::resolve_from_token       → Identity
DNS embedded token  → IdentityProvider::resolve_from_token       → Identity
WebSocket token     → IdentityProvider::resolve_from_token       → Identity
```

The token format is the same `AuthToken = base64url(key_id || timestamp || signature)` defined in [auth.md](../../architecture/auth.md). The interface just extracts the credential from its wire format. `IdentityProvider` resolves it to an `Identity`. The call protocol handler receives `OperationContext` with that identity.
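
The token layout `key_id || timestamp || signature` can be sketched as a parser over the already base64url-decoded bytes. The source gives only the field order, so the field widths (32-byte key id, 8-byte big-endian Unix timestamp, 64-byte signature) are assumptions, as are the `parse_token` and `is_fresh` helpers. Signature verification is deliberately left out; a real implementation would verify the Ed25519 signature before trusting any field.

```rust
use std::time::Duration;

// Assumed field widths; the document specifies only the field order.
const KEY_ID_LEN: usize = 32;
const TIMESTAMP_LEN: usize = 8;
const SIGNATURE_LEN: usize = 64; // Ed25519 signatures are 64 bytes

#[derive(Debug, PartialEq)]
pub struct ParsedToken<'a> {
    pub key_id: &'a [u8],
    pub timestamp_secs: u64,
    pub signature: &'a [u8],
}

/// Split decoded token bytes into their three fields.
/// Returns `None` if the length does not match the assumed layout.
pub fn parse_token(raw: &[u8]) -> Option<ParsedToken<'_>> {
    if raw.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let (key_id, rest) = raw.split_at(KEY_ID_LEN);
    let (ts_bytes, signature) = rest.split_at(TIMESTAMP_LEN);

    // Big-endian encoding of the timestamp is an assumption of this sketch.
    let mut ts = [0u8; TIMESTAMP_LEN];
    ts.copy_from_slice(ts_bytes);

    Some(ParsedToken {
        key_id,
        timestamp_secs: u64::from_be_bytes(ts),
        signature,
    })
}

/// Age check in the spirit of a `max_token_age` setting: reject tokens that
/// are older than `max_age` or that claim a timestamp in the future.
pub fn is_fresh(token: &ParsedToken<'_>, now_secs: u64, max_age: Duration) -> bool {
    now_secs >= token.timestamp_secs
        && now_secs - token.timestamp_secs <= max_age.as_secs()
}
```

Because every interface hands the same bytes to `resolve_from_token()`, a check like this would live in one place regardless of whether the token arrived in a header, a frame, or a DNS label.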
|
||||
|
||||
In database-backed deployments (`StorageIdentityProvider`), `Identity.id` is the account UUID — so the same person connecting via SSH, HTTP, or DNS resolves to the same identity. No separate `account_id` field needed.
|
||||
|
||||
### ConfigIdentityProvider: Token auth without a database
|
||||
|
||||
The config-based (minimal) deployment gains API key / bearer token support through `DynamicConfig.auth`:

```toml
[auth.ssh]
authorized_keys = [...]

[auth.token]
enabled = true
max_token_age = "5m"
# key_source = "shared" (default: same keys as SSH)

[[auth.api_keys]]
prefix = "alk_"
hash = "sha256:xyz..."
scopes = ["relay:connect"]
description = "dashboard service account"
```

`ConfigIdentityProvider::resolve_from_token()` already exists in the current spec. It verifies the `AuthToken` format (Ed25519 signed timestamp) against the same `authorized_keys` set used for SSH. The `api_keys` section adds an alternative: simple bearer tokens (hash-verified, with optional TTL) that don't require Ed25519 key pairs. This is useful for service accounts and automation.
|
||||
|
||||
Both token types produce the same `Identity`. Config-based `Identity.id` is the key fingerprint (for `AuthToken`) or the key prefix (for simple bearer tokens). In database-backed deployments, both resolve to the account UUID.
|
||||
|
||||
## Service Decomposition
|
||||
|
||||
### AuthService (existing — ADR-028)
|
||||
|
||||
Resolves **inbound** credentials to an `Identity`. Already defined. Works across all interfaces — SSH interface calls `resolve_from_fingerprint()`, HTTP/DNS interfaces call `resolve_from_token()`. No changes needed.
|
||||
|
||||
### CredentialService (new — see credential-provider.md)
|
||||
|
||||
Resolves **outbound** credentials for external service access. Defined in [credential-provider.md](credential-provider.md).
|
||||
|
||||
### AccountService (new — storage layer)
|
||||
|
||||
Manages accounts and credential associations. This is a storage-layer irpc service, not a core concern:
|
||||
|
||||
- `AccountProtocol::CreateAccount { display_name, default_scopes }`
|
||||
- `AccountProtocol::GetAccount { account_id }`
|
||||
- `AccountProtocol::AddCredential { account_id, credential }` (SSH key, API key)
|
||||
- `AccountProtocol::RemoveCredential { account_id, credential_id }`
|
||||
- `AccountProtocol::ListCredentials { account_id }`
|
||||
|
||||
This is the CRUD layer. `StorageIdentityProvider` uses it internally. External management (admin UI) goes through `AccountService`. Analogous to how `ConfigService` provides `ConfigReloadHandle` — core has the read trait, storage has the management service.
|
||||
|
||||
Core doesn't need `AccountService` for operation. `IdentityProvider` is the read-only contract. Account management is additive.
|
||||
|
||||
## Impact on Existing Specs
|
||||
|
||||
### interface.md
|
||||
|
||||
Needs revision:
|
||||
|
||||
1. **Rename `Interface` to `StreamInterface`** — the current trait becomes the stream-specific variant.
|
||||
2. **Add `MessageInterface` trait** — for HTTP, DNS, WebSocket.
|
||||
3. **Add `HttpInterface`** as a `MessageInterface` implementation.
|
||||
4. **Clarify DNS** — DNS is a `MessageInterface`, not a (DNS transport, raw framing) pair. Remove `TransportKind::Dns` from the transport enum.
|
||||
5. **Add valid message-based interface pairs** table alongside the stream-based pairs table.
|
||||
6. **Add `InterfaceRequest` / `InterfaceResponse`** types that normalize calls across message interfaces.
|
||||
|
||||
### auth.md
|
||||
|
||||
Needs revision:
|
||||
|
||||
1. **Add HTTP interface auth** — `Authorization: Bearer <token>` extraction.
|
||||
2. **Add DNS interface auth** — token embedded in DNS query labels.
|
||||
3. **Add auth presentation table** showing all interface/auth combos.
|
||||
4. **Add simple API keys** — bearer tokens (hash-verified, with optional TTL) for service accounts. Not all token auth needs Ed25519 key pairs.
|
||||
|
||||
### transport.md
|
||||
|
||||
Minor: **Remove `TransportKind::Dns`** from the enum. Add note that DNS is handled as a `MessageInterface`.
|
||||
|
||||
### call-protocol.md
|
||||
|
||||
Minor update: the call protocol handler should accept `EventEnvelope` frames from both `StreamInterface::Session` and `MessageInterface::handle_request()`. The dispatch logic is the same — only the framing differs.
|
||||
|
||||
### ADR-026
|
||||
|
||||
Needs update: the three-layer model is correct, but the (Transport, Interface) pair enumeration in ADR-026 lists DNS as a transport. This should be revised to show `StreamInterface` and `MessageInterface` as two interface categories at Layer 2.
|
||||
|
||||
## Phasing Considerations

| Work | Suggested Phase | Notes |
|---|---|---|
| Rename `Interface` → `StreamInterface` | Phase 1 (now) | Rename only, no behavior change. Existing code already implements the stream pattern. |
| Define `MessageInterface` trait | Phase 1 (now) | Cheap, forward-compatible. Define the trait and `InterfaceRequest`/`InterfaceResponse` types. |
| Define `HttpInterface` stub | Phase 1 (now) | Define the struct and impl signature. Full HTTP server wiring can wait. |
| `TransportKind::Dns` removal | Phase 1 (now) | Clean up the enum before code depends on `TransportKind::Dns`. |
| `ListenerConfig` with Stream/Message variants | Phase 1 (now) | Update the server accept loop to support both interface types. |
| `HttpInterface` implementation | Phase 2 | Full HTTP server with router, auth middleware, SSE. Depends on core being stable. |
| `DnsInterface` implementation | Phase 3+ | DNS protocol is non-trivial. Deferring is fine. |
| `AccountService` irpc protocol | Phase 2 | CRUD for accounts. Lives in alknet-storage. |
| `ApiKeys` in `DynamicConfig.auth` | Phase 1 (now) | Enable bearer token auth in config-based deployments. |

The key observation: defining the traits (`MessageInterface`, `InterfaceRequest`, `HttpInterface` stub) now is cheap and prevents refactoring later. The actual HTTP server implementation can wait for Phase 2. But the trait surface needs to exist in Phase 1 so downstream code can target it.
|
||||
|
||||
## Open Questions
|
||||
|
||||
### OQ-IF-03: Should `MessageInterface` and `StreamInterface` share a common trait?
|
||||
|
||||
Recommendation: Independent traits. Different signatures (`handle_request` vs `accept` + `next_event/send_event`), different lifecycles (stateless vs session-stateful), different transport ownership (self-managed vs provided). A common super-trait adds complexity without clear benefit.
|
||||
|
||||
### OQ-IF-04: Should `TransportKind::Dns` be removed from the enum?
|
||||
|
||||
Recommendation: Yes. DNS doesn't produce byte streams. Remove it and add `ListenerConfig::Message` variant. This is a cleanup, not a breaking change — `TransportKind::Dns` is currently a tag with no acceptor implementation.
|
||||
|
||||
### OQ-IF-05: Should the HTTP interface share a port with the SSH listener?
|
||||
|
||||
In production, alknet might run SSH on port 22 and HTTP on port 443. Or both on 443 (TLS with ALPN). The `HttpInterface` could share a TLS listener with `SshInterface` if ALPN negotiation selects SSH vs. HTTP.
|
||||
|
||||
Recommendation: Start simple — separate ports. HTTP on its own port (default 8080 or configured via `[[listeners]]`). ALPN multiplexing is a future optimization that doesn't change the interface abstraction.
|
||||
|
||||
### OQ-IF-06: Should the HTTP interface auto-generate OpenAPI specs from the OperationRegistry?
|
||||
|
||||
If alknet exposes operations as `POST /v1/{namespace}/{op}`, the HTTP interface could auto-generate an OpenAPI spec from the registered `OperationSpec`s. This would provide:
|
||||
- Interactive API documentation
|
||||
- Automatic client SDK generation
|
||||
- Compatibility with `OpenAPIServiceRegistry` (another alknet node's `FromOpenAPI` could register against this spec)
|
||||
|
||||
This is the reverse of `OpenAPIServiceRegistry` — instead of consuming an OpenAPI spec to register operations, it produces an OpenAPI spec from registered operations. The `OperationSpec` already has `input_schema`, `output_schema`, `description`, and `tags`.
|
||||
|
||||
Recommendation: Yes, but Phase 4+. The HTTP interface needs to exist first.
|
||||
|
||||
### OQ-IF-07: How do self-hosted services (rustfs, gitea) authenticate requests from alknet users?
|
||||
|
||||
When alknet sits in front of rustfs or gitea (e.g., as a reverse proxy or HTTP interface gateway), how does it map alknet identities to external service identities?
|
||||
|
||||
Options:
|
||||
1. **Shared secret / API key**: Alknet holds a service-level credential. All proxied requests use it. Simple but loses per-user identity on the external service.
|
||||
2. **Identity-bound credentials**: Each alknet account has a corresponding rustfs/gitea credential, looked up via `Identity.id`. Per-user ACL on the external service.
|
||||
3. **Alknet as OIDC provider**: Rustfs/gitea trust alknet as their identity provider. No stored credentials — users authenticate directly via OIDC.
|
||||
|
||||
Recommendation: Start with Option 1. Add Option 2 when multi-tenant access is needed. Option 3 is the long-term goal (Phase D in [credential-provider.md](credential-provider.md)).
|
||||
|
||||
## References
|
||||
|
||||
- [interface.md](../../architecture/interface.md) — Current Interface layer spec (needs update for `StreamInterface`/`MessageInterface`)
|
||||
- [auth.md](../../architecture/auth.md) — Unified auth, IdentityProvider, AuthToken format
|
||||
- [identity.md](../../architecture/identity.md) — Identity struct, IdentityProvider trait
|
||||
- [call-protocol.md](../../architecture/call-protocol.md) — Call protocol, OperationEnv
|
||||
- [services.md](../../architecture/services.md) — irpc service definitions
|
||||
- [credential-provider.md](credential-provider.md) — CredentialProvider, CredentialSet (Phase 2)
|
||||
- [ADR-026](../../architecture/decisions/026-transport-interface-separation.md) — Three-layer model (needs update for `MessageInterface`)
|
||||
- [ADR-023](../../architecture/decisions/023-unified-auth-shared-key-material.md) — Unified auth with shared key material
|
||||
- [ADR-029](../../architecture/decisions/029-identity-core-type.md) — Identity as core type
|
||||
@@ -1,401 +0,0 @@
|
||||
# TLS Transport: Unified Multi-Interface Architecture
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-08
|
||||
> Part of: Phase 2 planning
|
||||
|
||||
## Overview
|
||||
|
||||
Alknet's existing stealth mode already does protocol detection: after a TLS handshake, the server peeks at the first bytes and routes SSH connections one way and HTTP connections another. This document extends that pattern into a unified architecture where a single TLS port supports SSH, REST, WebSocket, SSE, and gRPC — all routed by the first bytes after the TLS handshake. Alongside this, QUIC (UDP) supports WebTransport and iroh P2P, and DNS runs on its own port. Every interface resolves to the same call protocol operations through the `OperationRegistry`.
|
||||
|
||||
This replaces the earlier `(Transport, Interface)` pair model for TCP/TLS connections with a clearer distinction: persistent stream interfaces go through the peek-based router, message-based interfaces manage their own transports, and axum serves as the multiplexer for everything HTTP.
|
||||
|
||||
## Current State
|
||||
|
||||
The stealth mode implementation in `crates/alknet-core/src/server/stealth.rs` does byte-peeking after TLS handshake:

```rust
pub enum ProtocolDetection {
    Ssh,
    Http,
}

pub async fn detect_protocol<S>(stream: S) -> (ProtocolDetection, BufReader<S>) {
    // Peek first bytes: "SSH-2.0-" → Ssh, anything else → Http
}

pub async fn send_fake_nginx_404<S>(reader: &mut BufReader<S>) {
    // Currently: non-SSH gets a fake 404 and connection closed
}
```

This is almost exactly what we need. The `Http` detection currently sends a fake nginx 404. Instead, it should route to a real HTTP server.
|
||||
|
||||
## New Architecture
|
||||
|
||||
### TCP TLS Port 443: Peek-Based Routing

```
Client connects to port 443
        │
TLS handshake completes
        │
Peek first bytes
        │
        ├─ "SSH-2.0-" → SshInterface (russh, existing path)
        │
        └─ (anything else) → axum HTTP router
                │
                ├─ POST /v1/{namespace}/{op}         → registry.invoke()
                ├─ GET /v1/{namespace}/{op}          → registry.invoke()
                ├─ GET /v1/{namespace}/{op} (SSE)    → registry.subscribe()
                ├─ POST /v1/batch                    → batch invoke
                ├─ GET /v1/schema                    → registry.list_operations()
                ├─ WebSocket upgrade /ws             → WebSocketInterface
                ├─ gRPC via tonic routes             → tonic services
                ├─ GET /.well-known/alknet/schema    → OpenAPI spec generation
                └─ (anything else)                   → 404
```

The peek happens after TLS, so the client sees a valid HTTPS server. The `send_fake_nginx_404` function becomes `hand_to_axum(stream)`. axum handles everything that isn't SSH.
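
The peek itself can be sketched with the standard library. This is a synchronous analogue of the async `detect_protocol` described in this document, written against `std::io::BufRead` so it runs without tokio; it is an illustration, not the code in `stealth.rs`. It relies on `fill_buf` returning buffered bytes without consuming them, so whichever handler takes over still sees the full stream.

```rust
use std::io::{self, BufRead};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolDetection {
    Ssh,
    Http,
}

/// SSH identification strings start with this prefix.
const SSH_PREFIX: &[u8] = b"SSH-2.0-";

/// Classify a connection by its first bytes without consuming them.
///
/// Simplification: a single `fill_buf` call may return fewer than eight
/// bytes on a real socket. A production version would keep reading until
/// it has enough bytes to decide, or until a timeout expires.
pub fn detect_protocol<R: BufRead>(reader: &mut R) -> io::Result<ProtocolDetection> {
    // `fill_buf` exposes the buffered bytes; nothing is consumed because
    // `consume` is never called.
    let buf = reader.fill_buf()?;
    if buf.starts_with(SSH_PREFIX) {
        Ok(ProtocolDetection::Ssh)
    } else {
        // Anything that is not SSH is handed to the HTTP router.
        Ok(ProtocolDetection::Http)
    }
}
```

Treating everything that is not SSH as HTTP keeps the stealth property: a probe that sends garbage gets an ordinary HTTP error from the router rather than a protocol-specific rejection.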
|
||||
|
||||
### UDP Port 443: QUIC with ALPN Routing

```
Client sends QUIC Initial to port 443 UDP
        │
TLS 1.3 handshake with ALPN negotiation
        │
        ├─ ALPN "h3" (WebTransport) → wtransport → RawFramingInterface
        │       │
        │       └─ SessionRequest → validate AuthToken
        │          from URL path or headers
        │          → OperationContext → call protocol
        │
        └─ ALPN "alknet" (iroh P2P) → iroh endpoint → RawFramingInterface
                │
                └─ existing iroh accept loop
                   → SshInterface or RawFramingInterface
```

wtransport and iroh both listen on UDP 443. Quinn supports multiple ALPN protocols — the QUIC handshake negotiates which handler gets the connection.
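
The ALPN decision reduces to a match on the negotiated protocol bytes. The sketch below shows only that dispatch step; `QuicHandler` and `route_by_alpn` are hypothetical names, and obtaining the negotiated ALPN from the QUIC library is left out because it depends on the library version in use.

```rust
/// Which handler takes over a QUIC connection (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuicHandler {
    WebTransport,
    Iroh,
}

/// Map the negotiated ALPN protocol to a handler.
/// Returns `None` for protocols the server did not offer, in which case
/// the connection should be closed.
pub fn route_by_alpn(alpn: &[u8]) -> Option<QuicHandler> {
    match alpn {
        // HTTP/3, which WebTransport runs over.
        b"h3" => Some(QuicHandler::WebTransport),
        // ALPN used for iroh P2P connections in this document.
        b"alknet" => Some(QuicHandler::Iroh),
        _ => None,
    }
}
```

Unlike the TCP path, no byte peeking is needed here: the TLS handshake has already settled the protocol before any application data arrives.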
|
||||
|
||||
### DNS Port 53: MessageInterface

```
DNS query arrives on port 53 (UDP or TCP)
        │
        ├─ UDP query → DnsInterface (MessageInterface)
        └─ TCP query → DnsInterface over DoT (TLS on port 853)
                │
                └─ Encode EventEnvelope as DNS TXT query
                   Decode response from DNS TXT record
                   AuthToken embedded in query labels
                   → IdentityProvider::resolve_from_token()
                   → OperationContext → call protocol
```

DNS is a `MessageInterface` — it manages its own transport and handles individual request/response pairs. It doesn't sit on top of the TLS peek router.
|
||||
|
||||
### Revised Routing Table

| Protocol | Transport | Detection | Interface | Auth |
|---|---|---|---|---|
| SSH | TCP/TLS | Byte peek: `SSH-2.0-` prefix | SshInterface | SSH key fingerprint |
| HTTP REST | TCP/TLS | Byte peek: not SSH → axum | axum handler → registry | `Authorization: Bearer <AuthToken>` |
| WebSocket | TCP/TLS | Axum upgrade: `Upgrade: websocket` | axum upgrade handler | AuthToken in handshake |
| SSE | TCP/TLS | Axum route: `Accept: text/event-stream` | axum handler → registry.subscribe() | AuthToken in header |
| gRPC | TCP/TLS | Axum route: `content-type: application/grpc` | tonic via axum router | AuthToken in header/metadata |
| WebTransport | QUIC (UDP) | ALPN `h3` | wtransport → RawFramingInterface | AuthToken in CONNECT URL |
| iroh P2P | QUIC (UDP) | ALPN `alknet` | iroh → RawFramingInterface | iroh's existing auth |
| DNS | UDP/TCP | Own listener | DnsInterface (MessageInterface) | AuthToken in query labels |

## Implementation
|
||||
|
||||
### Extending ProtocolDetection
|
||||
|
||||
The `ProtocolDetection` enum keeps its two variants; HTTP sub-protocols do not get variants of their own:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProtocolDetection {
    Ssh,
    Http, // Any HTTP; axum handles sub-routing
}
```

This stays simple. SSH vs. not-SSH is the only peek-level decision. Everything else is HTTP-content routing inside axum. We don't need to detect WebSocket, SSE, or gRPC at the byte level — axum routes those by HTTP headers and paths.
|
||||
|
||||
The accept loop becomes:
|
||||
|
||||
```rust
|
||||
// After TLS handshake and peek:
|
||||
match detect_protocol(tls_stream).await {
|
||||
(ProtocolDetection::Ssh, reader) => {
|
||||
// Existing SSH path: hand to SshInterface
|
||||
handle_ssh(reader, config).await;
|
||||
}
|
||||
(ProtocolDetection::Http, reader) => {
|
||||
// Hand to axum HTTP server
|
||||
handle_http(reader, config).await;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Axum Integration
|
||||
|
||||
The axum server is an HTTP `Service` that receives the TLS stream after the peek. Since the TLS handshake is already complete, axum receives a plaintext stream:
|
||||
|
||||
```rust
|
||||
async fn handle_http(stream: BufReader<TlsStream>, config: ServerConfig) {
    let app = Router::new()
        .route("/v1/{namespace}/{op}", post(invoke_operation))
        .route("/v1/{namespace}/{op}", get(invoke_operation))
        .route("/v1/batch", post(invoke_batch))
        .route("/v1/schema", get(list_operations))
        .route("/ws", get(websocket_upgrade))
        // gRPC via tonic::Routes merged into axum router
        .layer(ExtractorLayer::new(config.identity_provider, config.registry))
        .layer(middleware::from_fn(auth_middleware));

    // hyper's connection builder takes a single hyper `Service`, not a
    // `MakeService`, so the tower-based router is wrapped with hyper-util's adapter.
    let service = hyper_util::service::TowerToHyperService::new(app);

    // Serve the axum app on the TLS stream
    let served = hyper::server::conn::http1::Builder::new()
        .serve_connection(TokioIo::new(stream), service)
        .with_upgrades() // Enables WebSocket upgrades
        .await;
    if let Err(err) = served {
        eprintln!("HTTP connection error: {err}");
    }
}
|
||||
```
|
||||
|
||||
The auth middleware extracts the `Authorization: Bearer <token>` header and calls `IdentityProvider::resolve_from_token()`. The operation handler constructs an `OperationContext` and calls `registry.invoke(namespace, op, input)`.
|
||||
|
||||
### WebTransport (QUIC/UDP)
|
||||
|
||||
WebTransport runs on UDP alongside iroh. The routing is by ALPN during the QUIC handshake:
|
||||
|
||||
```rust
|
||||
// ALPN protocols are advertised on the rustls config before it is handed to
// Quinn (depending on the Quinn version, a conversion into its QUIC crypto
// config may be required at this point):
tls_config.alpn_protocols = vec![
    WEBTRANSPORT_ALPN.to_vec(), // b"h3"
    IROH_ALPN.to_vec(),         // existing iroh ALPN
];
let server_config = quinn::ServerConfig::with_crypto(Arc::new(tls_config));
|
||||
|
||||
// Accept loop:
|
||||
loop {
|
||||
let incoming = quic_endpoint.accept().await;
|
||||
match incoming.alpn() {
|
||||
b"h3" => {
|
||||
// Hand to wtransport
|
||||
let session_request = IncomingSession::with_quic_incoming(incoming).await;
|
||||
// Validate AuthToken from URL path/headers
|
||||
// Create OperationContext
|
||||
// Route to call protocol via RawFramingInterface or HTTP-like handler
|
||||
}
|
||||
b"alknet" | IROH_ALPN => {
|
||||
// Hand to existing iroh accept loop
|
||||
handle_iroh(incoming).await;
|
||||
}
|
||||
_ => { /* reject unknown ALPN */ }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
wtransport's `with_quic_incoming()` escape hatch allows integrating with an externally managed Quinn endpoint, so alknet owns the Quinn `Endpoint` and routes WebTransport sessions to wtransport.
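The ALPN dispatch itself is small enough to isolate from Quinn and wtransport. The following is a sketch under stated assumptions: `QuicRoute` and `route_alpn` are hypothetical names, the `alknet` ALPN value is taken from the routing table in this document, and the negotiated protocol is assumed to be available as raw bytes once the handshake has progressed far enough.

```rust
/// Where an accepted QUIC connection should be handed off.
#[derive(Debug, PartialEq, Eq)]
pub enum QuicRoute {
    WebTransport,
    Iroh,
    Reject,
}

const WEBTRANSPORT_ALPN: &[u8] = b"h3";
/// Assumed iroh-side ALPN, as listed in the routing table.
const ALKNET_ALPN: &[u8] = b"alknet";

/// Map the negotiated ALPN (if any) to a handler.
/// Unknown or missing ALPN values are rejected rather than guessed at.
pub fn route_alpn(negotiated: Option<&[u8]>) -> QuicRoute {
    match negotiated {
        Some(p) if p == WEBTRANSPORT_ALPN => QuicRoute::WebTransport,
        Some(p) if p == ALKNET_ALPN => QuicRoute::Iroh,
        _ => QuicRoute::Reject,
    }
}
```

Keeping this decision in a pure function makes it testable without standing up a QUIC endpoint.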
|
||||
|
||||
### Auth: Single Token Mechanism
|
||||
|
||||
Every token-based interface uses the same `AuthToken` format defined in auth.md. SSH authenticates inside its own handshake, and iroh P2P keeps its existing auth:
|
||||
|
||||
```
|
||||
AuthToken = base64url(key_id || timestamp || signature)
|
||||
key_id = SHA-256 fingerprint of the Ed25519 public key (32 bytes)
|
||||
timestamp = Unix seconds, big-endian u64 (8 bytes)
|
||||
signature = Ed25519 sign(key_id || timestamp_bytes, private_key)
|
||||
```
|
||||
|
||||
| Interface | Auth mechanism | Token location |
|---|---|---|
| SSH | SSH key handshake | In SSH protocol (not a token) |
| HTTP REST | `Authorization: Bearer <AuthToken>` | HTTP header |
| WebSocket | AuthToken in first message or query param | After upgrade |
| SSE | `Authorization: Bearer <AuthToken>` | HTTP header |
| gRPC | `Authorization: Bearer <AuthToken>` | HTTP/2 metadata |
| WebTransport | AuthToken in CONNECT URL or header | WebTransport session request |
| DNS | AuthToken embedded in DNS query labels | Encoded in domain name |
|
||||
|
||||
All token-based paths call `IdentityProvider::resolve_from_token()`. The `resolve_from_token()` implementation handles Ed25519 signature verification (for AuthTokens) and will also handle hash-verified API keys (shorter tokens for simpler integrations).
|
||||
|
||||
For services and automation where Ed25519 key pairs are inconvenient, short API keys work:
|
||||
|
||||
```
|
||||
API key: "alk_dGhlX3NlY3JldA" (~20 chars)
|
||||
Storage: SHA-256 hash of the full key
|
||||
Lookup: prefix match → hash verification → Identity
|
||||
```
|
||||
|
||||
API keys are specified in `DynamicConfig.auth` or stored in `api_keys` tables (database-backed). Both AuthTokens and API keys go through the same `resolve_from_token()` method — the implementation discriminates by prefix or format.
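The byte layout of the `AuthToken` and the prefix-based discrimination can be sketched as follows. This is an illustration only: it works on bytes that have already been base64url-decoded, it does not verify the Ed25519 signature, and `split_auth_token`, `is_fresh`, `classify_credential`, and the `max_skew_secs` policy are hypothetical names. Treating the `alk_` prefix as the API-key marker is an assumption drawn from the example key above.

```rust
use std::convert::TryInto;

const KEY_ID_LEN: usize = 32; // SHA-256 fingerprint of the public key
const TIMESTAMP_LEN: usize = 8; // Unix seconds, big-endian u64
const SIGNATURE_LEN: usize = 64; // Ed25519 signatures are 64 bytes

#[derive(Debug, PartialEq, Eq)]
pub struct AuthTokenParts {
    pub key_id: [u8; KEY_ID_LEN],
    pub timestamp: u64,
    pub signature: [u8; SIGNATURE_LEN],
}

/// Split decoded token bytes into key_id || timestamp || signature.
/// Returns None when the length does not match the fixed layout.
pub fn split_auth_token(raw: &[u8]) -> Option<AuthTokenParts> {
    if raw.len() != KEY_ID_LEN + TIMESTAMP_LEN + SIGNATURE_LEN {
        return None;
    }
    let (key_id, rest) = raw.split_at(KEY_ID_LEN);
    let (timestamp, signature) = rest.split_at(TIMESTAMP_LEN);
    Some(AuthTokenParts {
        key_id: key_id.try_into().ok()?,
        timestamp: u64::from_be_bytes(timestamp.try_into().ok()?),
        signature: signature.try_into().ok()?,
    })
}

/// Hypothetical freshness policy: accept tokens within a symmetric skew window.
pub fn is_fresh(token_ts: u64, now: u64, max_skew_secs: u64) -> bool {
    token_ts.abs_diff(now) <= max_skew_secs
}

#[derive(Debug, PartialEq, Eq)]
pub enum CredentialKind {
    ApiKey,
    AuthToken,
}

/// Discriminate by prefix (assumed convention: API keys start with "alk_").
pub fn classify_credential(presented: &str) -> CredentialKind {
    if presented.starts_with("alk_") {
        CredentialKind::ApiKey
    } else {
        CredentialKind::AuthToken
    }
}
```

A real `resolve_from_token()` would go on to verify the signature over `key_id || timestamp_bytes` for AuthTokens, or hash and compare the full key for API keys.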
|
||||
|
||||
### Contract Pattern: call / batch / schema / subscribe
|
||||
|
||||
Every interface exposes the same four primitive operations through `OperationRegistry`:
|
||||
|
||||
| Primitive | HTTP | MCP | DNS | Call protocol |
|---|---|---|---|---|
| `call(namespace, op, input)` | `POST /v1/{ns}/{op}` | `tools/call` | `{op}.{ns}.alk.dev TXT?` | `call.requested` |
| `batch([{ns, op, input}, ...])` | `POST /v1/batch` | (multiple `tools/call`) | (multiple queries) | (multiple `call.requested`) |
| `schema(namespace?)` | `GET /v1/schema` | `tools/list` | (not typically) | `call.requested` with special op |
| `subscribe(namespace, op, input)` | `GET /v1/{ns}/{op} SSE` | (future) | (not applicable) | `call.requested` with stream flag |
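The DNS column encodes a `call` as a query name of the form `{op}.{ns}.alk.dev`. A minimal sketch of recovering the namespace and operation from such a name is shown here. `parse_query_name` is a hypothetical helper, and the sketch deliberately ignores any extra labels that would carry an AuthToken, since this document does not specify their encoding.

```rust
/// Parse "{op}.{ns}.<zone>" into (namespace, operation).
/// Hypothetical helper: names with fewer or more than two labels in front
/// of the zone are rejected, so token-carrying labels are out of scope.
pub fn parse_query_name(qname: &str, zone: &str) -> Option<(String, String)> {
    // DNS names are case-insensitive and may carry a trailing root dot.
    let name = qname.trim_end_matches('.').to_ascii_lowercase();
    let zone = zone.trim_end_matches('.').to_ascii_lowercase();

    // Strip the zone and the dot that separates it from the leading labels.
    let prefix = name.strip_suffix(zone.as_str())?.strip_suffix('.')?;

    let mut labels = prefix.split('.');
    let op = labels.next()?;
    let ns = labels.next()?;
    if labels.next().is_some() || op.is_empty() || ns.is_empty() {
        return None;
    }
    Some((ns.to_string(), op.to_string()))
}
```

The returned pair lines up with the `(namespace, op)` arguments of `registry.invoke()`.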
|
||||
|
||||
MCP's four core operations map directly:
|
||||
- `tools/list` → `schema()`
|
||||
- `tools/call` → `call()`
|
||||
- `prompts/list` → `schema("prompts")`
|
||||
- `prompts/get` → `call("prompts", "get", input)`
|
||||
|
||||
The `memory` tool pattern (one namespace gate dispatching to many operations behind it) is exactly `OperationRegistry` with `OperationSpec.access_control`:
|
||||
|
||||
```
|
||||
memory({tool:"help"}) → registry.invoke("memory", "help", {})
|
||||
memory({tool:"search"}) → registry.invoke("memory", "search", {query: "..."})
|
||||
memory({tool:"store"}) → registry.invoke("memory", "store", {key: "...", value: "..."})
|
||||
```
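The namespace-gate dispatch above can be sketched as a map from `(namespace, op)` to a handler. This is a simplified stand-in, not the actual `OperationRegistry`: inputs and outputs are plain strings rather than JSON values, `OperationSpec.access_control` is not modeled, and the `register` and `schema` signatures are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Handler signature used by this sketch: string in, string or error out.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

#[derive(Default)]
pub struct OperationRegistry {
    handlers: HashMap<(String, String), Handler>,
}

impl OperationRegistry {
    /// Register a handler under a namespace and operation name.
    pub fn register<F>(&mut self, ns: &str, op: &str, f: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.handlers
            .insert((ns.to_string(), op.to_string()), Box::new(f));
    }

    /// Dispatch a call; unknown operations produce an error.
    pub fn invoke(&self, ns: &str, op: &str, input: &str) -> Result<String, String> {
        match self.handlers.get(&(ns.to_string(), op.to_string())) {
            Some(handler) => handler(input),
            None => Err(format!("unknown operation {ns}/{op}")),
        }
    }

    /// List registered operations, optionally filtered by namespace.
    pub fn schema(&self, ns: Option<&str>) -> Vec<String> {
        let mut names: Vec<String> = self
            .handlers
            .keys()
            .filter(|(n, _)| ns.map_or(true, |want| n.as_str() == want))
            .map(|(n, o)| format!("{n}/{o}"))
            .collect();
        names.sort();
        names
    }
}
```

Every interface (HTTP, MCP, DNS, call protocol) would end up calling the same `invoke` and `schema` entry points, which is the point of the contract pattern.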
|
||||
|
||||
### Reverse: OpenAPI Spec Generation
|
||||
|
||||
The HTTP interface's `GET /v1/schema` endpoint (or `GET /.well-known/alknet/schema`) auto-generates an OpenAPI spec from the registered `OperationSpec`s. This creates a symmetry with `FromOpenAPI`:
|
||||
|
||||
```
|
||||
Inbound: HTTP request → axum handler → registry.invoke(namespace, op, input) → ResponseEnvelope → HTTP response
|
||||
Outbound: OpenAPI spec → FromOpenAPI(spec, config) → registry.register_all(operations) → HTTP client → external service
|
||||
```
|
||||
|
||||
Node A's HTTP interface produces an OpenAPI spec. Node B's `FromOpenAPI` consumes it. Alknet nodes can discover each other's capabilities via the schema endpoint.
|
||||
|
||||
## Relationship to StreamInterface / MessageInterface
|
||||
|
||||
The earlier `interface-model.md` research defined `StreamInterface` and `MessageInterface` traits. This doc refines the architecture:
|
||||
|
||||
**StreamInterface** — persistent byte stream, used for SSH and raw framing:
|
||||
- `SshInterface`: (TLS, SSH) — existing path, unchanged
|
||||
- `RawFramingInterface`: (TCP/TLS, raw framing) — for local mesh
|
||||
- `RawFramingInterface`: (iroh/QUIC, raw framing) — for P2P mesh
|
||||
|
||||
**MessageInterface** — manages its own transport, handles individual requests:
|
||||
- `DnsInterface`: Runs its own DNS server on port 53
|
||||
|
||||
**The HTTP case** is special. The axum router is not a `MessageInterface` in the same sense as DNS. It receives a stream (the TLS connection after peek), but it handles individual requests within that stream. It's better modeled as:
|
||||
|
||||
- A `StreamInterface` that internally routes to axum
|
||||
- Axum is the implementation detail, not a trait boundary
|
||||
- The call protocol handler receives `InterfaceRequest` and returns `InterfaceResponse` regardless of whether the request came from HTTP, DNS, SSH, or raw framing
|
||||
|
||||
The `InterfaceRequest` / `InterfaceResponse` types from `interface-model.md` still make sense as the normalized interface-agnostic request/response that all interfaces produce:
|
||||
|
||||
```rust
|
||||
pub struct InterfaceRequest {
|
||||
pub operation_path: String, // e.g., "/head/auth/verify"
|
||||
pub input: Value, // JSON input payload
|
||||
pub auth_token: Option<AuthToken>, // Extracted from wire format
|
||||
pub metadata: HashMap<String, String>,
|
||||
}
|
||||
|
||||
pub struct InterfaceResponse {
|
||||
pub result: Result<Value, CallError>,
|
||||
pub status: u16, // HTTP status, DNS result code, etc.
|
||||
pub headers: HashMap<String, String>,
|
||||
}
|
||||
```
|
||||
|
||||
But the HTTP implementation doesn't need to construct `InterfaceRequest` explicitly — it constructs `OperationContext` directly from the axum request and calls `registry.invoke()`. The `InterfaceRequest` abstraction is more useful for DNS where there's no framework doing routing for you.
|
||||
|
||||
## ListenerConfig Update
|
||||
|
||||
The `ListenerConfig` enum from the integration plan gains `Http` and `Dns` variants alongside the existing `Stream`:
|
||||
|
||||
```rust
|
||||
pub enum ListenerConfig {
|
||||
Stream {
|
||||
transport: TransportKind,
|
||||
interface: StreamInterfaceKind,
|
||||
},
|
||||
Http {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool, // true = TLS, false = plain TCP
|
||||
stealth: bool, // true = byte-peek protocol detection
|
||||
},
|
||||
Dns {
|
||||
bind_addr: SocketAddr,
|
||||
tls: bool, // true = DoT, false = plain DNS
|
||||
},
|
||||
}
|
||||
|
||||
pub enum StreamInterfaceKind {
|
||||
Ssh,
|
||||
RawFraming,
|
||||
}
|
||||
|
||||
pub enum TransportKind {
|
||||
Tcp,
|
||||
Tls { server_name: Option<String> },
|
||||
Iroh { endpoint_id: String },
|
||||
// NO Dns variant — DNS is a MessageInterface, not a Transport
|
||||
}
|
||||
```
|
||||
|
||||
For the common production deployment on port 443:
|
||||
|
||||
```toml
|
||||
[[listeners]]
|
||||
type = "stream"
|
||||
transport = { tls = {} }
|
||||
interface = "ssh"
|
||||
bind = "0.0.0.0:443"
|
||||
|
||||
[[listeners]]
|
||||
type = "http"
|
||||
bind = "0.0.0.0:443"
|
||||
tls = true
|
||||
stealth = true
|
||||
|
||||
# If separate ports are preferred:
|
||||
[[listeners]]
|
||||
type = "http"
|
||||
bind = "0.0.0.0:8080"
|
||||
tls = false
|
||||
stealth = false
|
||||
```
|
||||
|
||||
When `stealth = true` on an HTTP listener sharing a port with an SSH listener, the accept loop uses the byte-peek pattern to route connections to the correct handler.
|
||||
|
||||
When the HTTP listener is on its own port, no peeking is needed — everything is HTTP.
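The shared-port rule can be expressed as a small check over the configured listeners. The sketch below uses simplified stand-ins (`Listener`, `ListenerKind`, and `peek_addrs` are hypothetical, not the `ListenerConfig` enum above) and assumes that peeking is needed exactly when an SSH listener and a stealth HTTP listener bind the same address.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ListenerKind {
    Ssh,
    Http { stealth: bool },
}

pub struct Listener {
    pub bind: SocketAddr,
    pub kind: ListenerKind,
}

/// Return the addresses on which the accept loop must byte-peek:
/// those shared by an SSH listener and a stealth HTTP listener.
pub fn peek_addrs(listeners: &[Listener]) -> Vec<SocketAddr> {
    // Per address: (has SSH listener, has stealth HTTP listener).
    let mut by_addr: HashMap<SocketAddr, (bool, bool)> = HashMap::new();
    for listener in listeners {
        let entry = by_addr.entry(listener.bind).or_insert((false, false));
        match listener.kind {
            ListenerKind::Ssh => entry.0 = true,
            ListenerKind::Http { stealth: true } => entry.1 = true,
            ListenerKind::Http { stealth: false } => {}
        }
    }
    let mut out: Vec<SocketAddr> = by_addr
        .into_iter()
        .filter(|(_, (ssh, http))| *ssh && *http)
        .map(|(addr, _)| addr)
        .collect();
    out.sort();
    out
}
```

For the TOML example above, only `0.0.0.0:443` would be returned; the plain HTTP listener on `0.0.0.0:8080` is served without peeking.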
|
||||
|
||||
## Phasing
|
||||
|
||||
| Work | Phase | Notes |
|---|---|---|
| Extend `ProtocolDetection` to route `Http` to axum | Phase 1 (now) | Replace `send_fake_nginx_404` with axum handoff |
| Axum HTTP server with `/v1/{ns}/{op}` routes | Phase 1 (now) | Core REST API for call protocol operations |
| Auth middleware (`Authorization: Bearer`) | Phase 1 (now) | Uses existing `IdentityProvider::resolve_from_token()` |
| `ListenerConfig::Http` variant | Phase 1 (now) | Define alongside existing `Stream` variant |
| Remove `TransportKind::Dns` | Phase 1 (now) | Cleanup before code depends on it |
| WebSocket upgrade handler | Phase 2 | axum `.with_upgrades()` is already available |
| SSE streaming handler | Phase 2 | axum + `axum-streams` or `tokio-stream` |
| gRPC via tonic integration | Phase 3 | `tonic::Routes` merges into axum router |
| WebTransport (QUIC/UDP) | Phase 3 | wtransport integration, ALPN routing |
| DNS interface | Phase 3+ | Uses `MessageInterface` trait, own listener |
| OpenAPI spec generation from registry | Phase 3+ | `GET /v1/schema` or `GET /.well-known/alknet/schema` |
| ALPN multiplexing on UDP 443 | Phase 3+ | Quinn ALPN routing between iroh and wtransport |
|
||||
|
||||
## References
|
||||
|
||||
- [stealth.rs](../../../crates/alknet-core/src/server/stealth.rs) — Current protocol detection implementation
|
||||
- [auth.md](../../architecture/auth.md) — AuthToken format, IdentityProvider, unified auth
|
||||
- [interface-model.md](interface-model.md) — StreamInterface / MessageInterface trait design
|
||||
- [credential-provider.md](credential-provider.md) — CredentialProvider, outbound auth
|
||||
- [call-protocol.md](../../architecture/call-protocol.md) — OperationRegistry, OperationEnv
|
||||
- [services.md](../../architecture/services.md) — irpc service definitions, OperationContext
|
||||
- [ADR-026](../../architecture/decisions/026-transport-interface-separation.md) — Three-layer model
|
||||
- [wtransport](/workspace/wtransport/) — WebTransport server implementation (QUIC/HTTP3, ALPN h3)
|
||||
- [iroh-relay](/workspace/iroh/iroh-relay/) — HTTP + WebSocket relay (hyper, MaybeTlsStream)
|
||||
- [hickory-dns](/workspace/hickory-dns/) — DNS server with DoT/DoH/DoQ/DoH3
|
||||
- [tonic](/workspace/tonic/) — gRPC framework (axum + hyper integration, ALPN h2)
|
||||
@@ -154,52 +154,70 @@ These docs describe concepts that carry forward but need updating to reflect the
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Clean Up Code
|
||||
## Phase 4: Greenfield Workspace
|
||||
|
||||
Not a rewrite — just remove dead weight so agents don't pattern-match to it.
|
||||
**Decision: Greenfield rather than in-place migration.** The old codebase is preserved at `/workspace/@alkdev/alknet-main/` as a reference implementation. The new workspace starts clean with only `alknet-secret` carried over (it's standalone with no alknet-core dependency).
|
||||
|
||||
### Delete from `alknet-core`
|
||||
### What was deleted
|
||||
|
||||
These modules/files implement concepts that the pivot replaces entirely. They'll be re-implemented in new crates:
|
||||
| What | Reason |
|
||||
|------|--------|
|
||||
| `crates/alknet-core/` | Replaced by new `alknet-core` v2 with ALPN router |
|
||||
| `crates/alknet/` | CLI will be rebuilt for new model |
|
||||
| `crates/alknet-napi/` | NAPI will be rebuilt as call protocol client |
|
||||
| `docs/architecture/` | Old model specs — will be replaced by SDD process |
|
||||
| `docs/research/core.md` | Three-layer model — superseded |
|
||||
| `docs/research/services.md` | irpc service layer — superseded |
|
||||
| `docs/research/storage.md` | Metagraph — deferred |
|
||||
| `docs/research/flow.md` | FlowGraph — deferred |
|
||||
| `docs/research/configuration.md` | Promoted to architecture already |
|
||||
| `docs/research/integration-plan.md` | Old model integration — superseded |
|
||||
| `docs/research/phase2/` | StreamInterface/MessageInterface, CredentialProvider — superseded |
|
||||
| `docs/research/event-sourcing/` | Not currently needed |
|
||||
| `docs/research/references/gitserver/` | MPL-2.0 licensed — licensing risk |
|
||||
| `docs/research/references/gitlfs/` | MIT/Apache — kept as fork candidate, moved to references |
|
||||
| `docs/research/references/honker/` | Biased toward old irpc model |
|
||||
| `docs/research/references/nats.rs/` | Not directly used |
|
||||
| `docs/research/references/distributed-identity/` | Deferred |
|
||||
| `docs/research/references/openstack-keystone/` | Not directly used |
|
||||
| `docs/research/references/polyglot/` | Not directly used |
|
||||
| `docs/research/references/rustfs/` | Not directly used (may return for alknet-fs) |
|
||||
| `docs/references/` | Stray duplicate directory |
|
||||
| `tasks/` | Old task graph — will be regenerated by SDD process |
|
||||
|
||||
| What | Lines | Reason |
|
||||
|------|-------|--------|
|
||||
| `src/interface/mod.rs` | 140 | `StreamInterface` / `MessageInterface` — replaced by `ProtocolHandler` |
|
||||
| `src/interface/pairs.rs` | 122 | Transport/interface validation — no longer needed |
|
||||
| `src/interface/config.rs` | 270 | `ListenerConfig` variants — replaced by ALPN advertisement |
|
||||
| `src/interface/session.rs` | 62 | `InterfaceSession` / `InterfaceEvent` — old model |
|
||||
| `src/interface/http.rs` | 66 | Old HTTP interface — becomes `alknet-http` handler |
|
||||
| `src/interface/dns.rs` | 47 | Old DNS interface — becomes `alknet-dns` handler |
|
||||
| `src/interface/raw_framing.rs` | 399 | Stealth mode byte-peek — replaced by ALPN negotiation |
|
||||
| `src/server/stealth.rs` | 316 | Stealth mode — replaced by ALPN negotiation |
|
||||
| `src/server/control_channel.rs` | 196 | SSH control channel for pubsub — old model |
|
||||
### What was kept
|
||||
|
||||
**Keep as-is (port later):**
|
||||
| What | Reason |
|
||||
|------|--------|
|
||||
| `crates/alknet-secret/` | Standalone crate, no alknet-core dependency, fully working |
|
||||
| `docs/research/pivot/` | The pivot proposal and this cleanup plan |
|
||||
| `docs/research/references/iroh/` | ALPN dispatch, QUIC endpoints — directly relevant |
|
||||
| `docs/research/references/ssh/` | russh, russh-sftp — directly relevant for alknet-ssh |
|
||||
| `docs/research/ops/` | fail2ban, certbot — production reference |
|
||||
| `docs/sdd_process.md` | The development process we follow |
|
||||
| `Cargo.toml` (workspace) | Updated to only include alknet-secret |
|
||||
| `Cargo.lock` | Preserved for alknet-secret dependencies |
|
||||
| `LICENSE-MIT`, `LICENSE-APACHE` | License files |
|
||||
| `README.md` | Updated for greenfield state |
|
||||
|
||||
| What | Lines | Destination |
|
||||
|------|-------|-------------|
|
||||
| `src/interface/ssh.rs` | 982 | → `alknet-ssh` (largest single extraction) |
|
||||
| `src/server/handler.rs` | 974 | → `alknet-ssh` (SSH server handler) |
|
||||
| `src/server/channel_proxy.rs` | 555 | → `alknet-ssh` (port forwarding proxy) |
|
||||
| `src/server/serve.rs` | 1526 | → rewrite as ALPN router (keep for reference, rewrite later) |
|
||||
| `src/call/*` | ~1200 | → `alknet-call` (relatively clean extraction) |
|
||||
| `src/auth/*` | ~1450 | → `alknet-core` (shared auth/identity) |
|
||||
| `src/config/*` | ~950 | → `alknet-core` (static/dynamic config) |
|
||||
| `src/transport/*` | ~1500 | → `alknet-core` (endpoint acceptors) |
|
||||
| `src/client/*` | ~1900 | → `alknet-ssh` (client session, SOCKS5, forwarding) |
|
||||
| `src/socks5/*` | ~800 | → `alknet-ssh` (SOCKS5 server) |
|
||||
| `src/credentials/*` | ~250 | → simplify into `alknet-core` auth |
|
||||
| `src/http/*` | ~340 | → `alknet-http` |
|
||||
| `src/error.rs` | ~240 | → `alknet-core` |
|
||||
| `src/testutil.rs` | ~140 | → `alknet-core` test utilities |
|
||||
### Reference implementation
|
||||
|
||||
### Delete entire crate
|
||||
The previous codebase is preserved at `/workspace/@alkdev/alknet-main/`. When spec'ing and implementing new crates, the architect and implementation specialists can reference the old code to understand what worked and what didn't. Key modules to port:
|
||||
|
||||
| Crate | Reason |
|
||||
|-------|--------|
|
||||
| (none yet — `alknet-storage` and `alknet-flowgraph` don't exist as crates) |
|
||||
| Old module | Lines | Port destination |
|
||||
|------------|-------|-----------------|
|
||||
| `src/interface/ssh.rs` | 982 | → `alknet-ssh` |
|
||||
| `src/server/handler.rs` | 974 | → `alknet-ssh` |
|
||||
| `src/server/channel_proxy.rs` | 555 | → `alknet-ssh` |
|
||||
| `src/server/serve.rs` | 1526 | → reference for ALPN router rewrite |
|
||||
| `src/call/*` | ~1200 | → `alknet-call` |
|
||||
| `src/auth/*` | ~1450 | → `alknet-core` |
|
||||
| `src/config/*` | ~950 | → `alknet-core` |
|
||||
| `src/transport/*` | ~1500 | → `alknet-core` |
|
||||
| `src/client/*` | ~1900 | → `alknet-ssh` |
|
||||
| `src/socks5/*` | ~800 | → `alknet-ssh` |
|
||||
|
||||
The current workspace only has `alknet-core`, `alknet-secret`, `alknet-napi`, and `alknet` (CLI). No storage or flowgraph crates exist to delete.
|
||||
**The old code is reference, not constraint.** Agents should understand what it did and why, then implement against the new ProtocolHandler trait and ALPN router — not copy-paste the old architecture.
|
||||
|
||||
---
|
||||
|
||||
@@ -243,17 +261,20 @@ Key architecture docs the architect will need to produce or rewrite:
|
||||
|
||||
## Execution Order
|
||||
|
||||
1. **Create `docs/_archived/` directory** and move files there (preserves git history)
|
||||
2. **Mark superseded ADRs** with `Superseded` status and pivot reference
|
||||
3. **Move obsolete research docs** to `docs/_archived/research/`
|
||||
4. **Annotate stale-but-keeping architecture docs** with `status: needs-update` frontmatter and pivot reference note
|
||||
5. **Delete replaced code modules** from `alknet-core` (interface layer, stealth, control channel)
|
||||
6. **Fix compilation** — removing modules will break imports. Fix them minimally (comment out, stub, or remove call sites) so the project compiles. This is temporary scaffolding, not the refactor.
|
||||
7. **Architect produces proper SDD architecture specs** per Phase 1 of the SDD process
|
||||
1. ~~Create `docs/_archived/` directory~~ → **Greenfield instead.** Old code preserved at `/workspace/@alkdev/alknet-main/`.
|
||||
2. ~~Mark superseded ADRs~~ → **Deleted.** Old architecture docs removed entirely. New ADRs will be created by the architect per SDD process.
|
||||
3. ~~Move obsolete research docs~~ → **Deleted.** Only kept directly relevant references (iroh, ssh, ops, pivot).
|
||||
4. ~~Annotate stale-but-keeping architecture docs~~ → **Deleted.** No stale docs remain. Architect will produce fresh specs.
|
||||
5. **Delete old source crates** (alknet-core, alknet, alknet-napi) — done
|
||||
6. **Update workspace Cargo.toml** to only include alknet-secret — done
|
||||
7. **Update README.md** for greenfield state — done
|
||||
8. **Verify compilation** — `cargo check` and `cargo test -p alknet-secret` both pass — done
|
||||
9. **Architect produces proper SDD architecture specs** per Phase 1 of the SDD process
|
||||
|
||||
After this cleanup, the repo should:
|
||||
- Compile (possibly with reduced functionality)
|
||||
- Have no references to `StreamInterface`, `MessageInterface`, `ListenerConfig`, or stealth mode in active docs
|
||||
- Have superseded ADRs clearly marked so agents don't implement the old model
|
||||
- Have all obsolete material in `docs/_archived/` where it won't bias agents
|
||||
- Be ready for the architect role to produce proper Phase 1 architecture specs following the SDD process
|
||||
After this cleanup, the repo:
|
||||
- Compiles cleanly (alknet-secret passes all 14 tests)
|
||||
- Has no old architecture docs, ADRs, or task graph
|
||||
- Has only directly relevant reference material (iroh, ssh, ops)
|
||||
- Has the pivot proposal and cleanup plan as the starting point
|
||||
- Has a clean workspace ready for the architect to produce Phase 1 specs
|
||||
- Has the reference implementation at `/workspace/@alkdev/alknet-main/`
|
||||
@@ -1,771 +0,0 @@
|
||||
# Research: Distributed Identity, Smart Contract ACL, and Decentralized Git
|
||||
|
||||
> Status: Research Reference
|
||||
> Created: 2026-06-08
|
||||
> Scope: Decentralized git hosting, distributed identity, smart contract-based access control, and their relevance to alknet
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Executive Summary](#1-executive-summary)
|
||||
2. [Source Concept: NFT-Based Decentralized Git](#2-source-concept-nft-based-decentralized-git)
|
||||
3. [Existing Projects](#3-existing-projects)
|
||||
4. [Identity on the Blockchain](#4-identity-on-the-blockchain)
|
||||
5. [Access Control Models for Distributed Git](#5-access-control-models-for-distributed-git)
|
||||
6. [Cryptographic Identity Mapping](#6-cryptographic-identity-mapping)
|
||||
7. [Gossip Protocols for Repo Synchronization](#7-gossip-protocols-for-repo-synchronization)
|
||||
8. [Relevance to Alknet](#8-relevance-to-alknet)
|
||||
9. [References](#9-references)
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
This document researches distributed identity systems, smart contract-based access control, and decentralized git platforms to inform alknet's architecture. The source concept — a decentralized, censorship-resistant git hosting platform using NFTs (ERC-721) for identity and smart contracts for ACL — directly inspired some of alknet's cryptographic identity and key derivation ideas. The research reveals several key findings:
|
||||
|
||||
**Key Findings:**
|
||||
|
||||
1. **Radicle is the most mature decentralized git system** and provides the closest production reference for alknet's architecture, particularly in Ed25519 identity, gossip-based replication, and self-certifying repositories. However, Radicle lacks the smart contract/on-chain ACL layer that the source concept envisions.
|
||||
|
||||
2. **Smart contract ACL is feasible but introduces latency trade-offs.** On-chain identity verification costs 0.5-5 seconds per look-up on L2s, making it unsuitable as a hot path. The correct pattern is on-chain registration + local cache, which aligns with alknet's `StorageIdentityProvider` approach.
|
||||
|
||||
3. **alknet's BIP39/SLIP-0010 key derivation already spans both worlds.** The `m/74'/0'/0'/0'` path for Ed25519 identity and `m/44'/60'/0'/0/0` for Ethereum signing means the same seed phrase that governs alknet authentication can also sign on-chain transactions — no separate wallet needed.
|
||||
|
||||
4. **The Identity + IdentityProvider model maps directly to decentralized identity.** `ConfigIdentityProvider` is the local-only mode (Radicle-like); `StorageIdentityProvider` is the cached mode (on-chain ACL mirrored to SQLite); a future `OnChainIdentityProvider` could verify against smart contracts.
|
||||
|
||||
5. **Domain events vs. integration events (from alknet's event sourcing research) is the correct pattern** for synchronizing on-chain state to local nodes. On-chain events are the source of truth; honker streams carry the projected local state.
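The two derivation paths named in finding 3 differ in one structural way: SLIP-0010 Ed25519 derivation supports only hardened child indices, while the BIP-44 Ethereum path ends in two non-hardened levels. A minimal path parser makes that visible. `parse_path` and `all_hardened` are hypothetical helpers, and no key material is derived here.

```rust
/// Hardened indices have the top bit set (BIP-32 convention).
const HARDENED: u32 = 0x8000_0000;

/// Parse a path such as "m/74'/0'/0'/0'" into raw child indices.
/// Returns None for a missing "m" root or a malformed segment.
pub fn parse_path(path: &str) -> Option<Vec<u32>> {
    let mut parts = path.split('/');
    if parts.next()? != "m" {
        return None;
    }
    parts
        .map(|segment| {
            // A trailing apostrophe marks a hardened index.
            let (digits, hardened) = match segment.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (segment, false),
            };
            let index: u32 = digits.parse().ok()?;
            if index >= HARDENED {
                return None;
            }
            Some(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// True when every level is hardened, as SLIP-0010 Ed25519 requires.
pub fn all_hardened(indices: &[u32]) -> bool {
    indices.iter().all(|i| (i & HARDENED) != 0)
}
```

Checking `all_hardened` before attempting an Ed25519 derivation is one way to reject a path that was meant for the secp256k1 (Ethereum) branch of the same seed.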
|
||||
|
||||
---
|
||||
|
||||
## 2. Source Concept: NFT-Based Decentralized Git
|
||||
|
||||
The originating concept for this research is a decentralized, censorship-resistant git hosting platform built on the following principles:
|
||||
|
||||
### 2.1 Core Architecture
|
||||
|
||||
| Component | Mechanism | Purpose |
|-----------|----------|---------|
| **Org/User Identity** | Transferable ERC-721 tokens | Organizations and users are NFTs; ownership is on-chain and transferable |
| **Repository Identity** | ERC-721 tokens owned by org/user tokens | Repos are NFTs with a `mapping(address => Role)` ACL |
| **Replicators** | User/org nodes listing replicated repos + public endpoints | Decentralized hosting; replicators choose what to mirror |
| **Gossip Protocol** | Push/pull notifications about repo updates | Replicators learn about new commits from tracked repos |
| **Push Authorization** | Identity's on-chain ACL verified by replicator | No central authority can ban; replicators individually verify write privileges |
| **Funding Model** | After-the-fact Patreon-like contributions | Replicators receive donations; no paywall for access |
|
||||
|
||||
### 2.2 Key Design Properties
|
||||
|
||||
- **No central authority**: No single entity can ban an org, user, or repo
|
||||
- **Individual replicator choice**: Each replicator independently decides what to replicate and whose pushes to accept
|
||||
- **Transferable identity**: Selling the org NFT transfers all repos and access permissions
|
||||
- **Self-certifying data**: Git content addresses + on-chain identity = verifiable data provenance
|
||||
|
||||
### 2.3 Critical Gaps in the Source Concept
|
||||
|
||||
| Gap | Issue | Solution Pattern |
|-----|-------|-----------------|
| **Hot path latency** | On-chain ACL lookup per push is too slow | Cache ACL locally; sync from chain events |
| **Key rotation** | If the private key controlling the NFT is lost, the identity is lost | Multi-delegate thresholds (like Radicle) + social recovery |
| **Fork/namespace collisions** | Multiple repos with same name under different orgs | Use on-chain IDs (token IDs) not human-readable names as the authoritative identifier |
| **Gas costs** | Every ACL change costs gas | Batch updates; use L2s (Base, Arbitrum); delegate to replicator-level local ACL |
| **Revocation propagation** | Revoking write access must propagate to all replicators | Event-driven: on-chain Revoked event → gossip notification → local ACL update |
|
||||
|
||||
---
|
||||
|
||||
## 3. Existing Projects
|
||||
|
||||
### 3.1 Radicle (radicle.xyz)
|
||||
|
||||
**Overview**: Radicle is an open-source, peer-to-peer code collaboration stack built on Git. It is the most mature decentralized git system currently in production (v1.x, Heartwood release).
|
||||
|
||||
#### Identity System
|
||||
|
||||
| Feature | Implementation |
|---------|---------------|
| **Node ID (NID)** | Ed25519 public key encoded as a DID (`did:key:z6Mk...`) |
| **Key format** | Ed25519 (same curve as alknet) |
| **Storage** | SSH-format key files; `MemorySigner` holds decrypted key in RAM |
| **Multi-device** | Currently one key per device (per RIP-0002); multi-device via threshold delegates is in development |
| **Identity Document** | JSON document stored in Git, listing delegates (DIDs) and a threshold for canonical updates |
|
||||
|
||||
**Relevance to alknet**: Radicle's NID system is architecturally very close to alknet's Ed25519-based identity. Both use:
|
||||
- Ed25519 as the primary key type
|
||||
- A single seed/identity as the root of trust
|
||||
- DID-like identifiers for inter-node communication
|
||||
- Cryptographic signatures for data verification
|
||||
|
||||
**Key difference**: Radicle uses pure Ed25519 keypairs directly (no hierarchical derivation), while alknet derives Ed25519 keys from a BIP39 seed phrase via SLIP-0010. This gives alknet the ability to derive multiple keys from a single root and to derive Ethereum signing keys from the same seed.
|
||||
|
||||
#### Gossip Protocol
|
||||
|
||||
Radicle uses a custom gossip protocol with three message types:
|
||||
|
||||
| Message Type | Purpose | Content |
|-------------|---------|---------|
| **Node Announcement** | Peer discovery | Node ID, alias, addresses, capabilities, timestamp |
| **Inventory Announcement** | Repo discovery | List of RepoIDs being seeded, timestamp |
| **Reference Announcement** | Repo update notification | RepoID + updated signed refs, timestamp |
|
||||
|
||||
Each announcement includes a cryptographic signature and timestamp, enabling verification before relay. Messages are dropped on re-encounter (epidemic-style deduplication). Bootstrap nodes seed peer discovery.
|
||||
|
||||
**Comparison with alknet's call protocol**: Radicle's gossip is metadata-only; actual data transfer uses Git protocol. alknet's approach uses a call protocol (`EventEnvelope`) for both metadata and operation invocation. The gossip pattern could be layered on top of alknet's call protocol as a subscription-based integration event mechanism.
|
||||
|
||||
#### Self-Certifying Repositories
|
||||
|
||||
Radicle repositories are **self-certifying**:
|
||||
- The Repository ID (RID) is derived from the initial identity document hash
|
||||
- All actions (commits, issue comments, patches) are cryptographically signed
|
||||
- **Delegates** are public keys authorized to update the identity document
|
||||
- A **threshold** defines how many delegates must sign for an update to be canonical
|
||||
- Canonical branches are established dynamically based on signature thresholds
|
||||
|
||||
This eliminates the need for a central authority to determine "which version is correct."
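
As a rough sketch of the threshold rule, the check below counts distinct delegates among the keys whose signatures have already been verified. The function name and the use of string keys are assumptions for illustration; Radicle's actual canonical-ref computation is more involved.

```rust
use std::collections::HashSet;

/// Returns true when at least `threshold` distinct delegates have signed.
/// `signers` holds public keys whose signatures were already verified;
/// duplicates and keys outside the delegate set are ignored.
pub fn is_canonical(delegates: &HashSet<&str>, signers: &[&str], threshold: usize) -> bool {
    let valid: HashSet<&str> = signers
        .iter()
        .copied()
        .filter(|s| delegates.contains(s))
        .collect();
    // A threshold of zero would make every update canonical, so reject it.
    threshold > 0 && valid.len() >= threshold
}
```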
|
||||
|
||||
**Relevance**: alknet's on-chain ACL concept (from the source) can use this threshold model. Instead of a single NFT owner dictating the canonical branch, a threshold of delegates can be required — this mirrors the `narrowed_scopes` / `DelegatesEdge` model in alknet's ACL graph.
|
||||
|
||||
#### Collaborative Objects (COBs)
|
||||
|
||||
COBs are Radicle's mechanism for distributed social artifacts (issues, patches, code review):
|
||||
|
||||
- Stored as Git objects in `refs/cobs/<type>/<object-id>` namespace
|
||||
- Use a CRDT DAG (directed acyclic graph) for conflict-free merging
|
||||
- All operations are Ed25519-signed by their author
|
||||
- SQLite cache (`cobs.db`) provides indexed queries without traversing Git history
|
||||
|
||||
**Relevance**: COBs demonstrate that complex social data can be stored in Git with CRDT semantics. alknet's `alknet-storage` metagraph + honker streams could serve a similar role for distributed state, with the key difference being that alknet's state store is SQLite-backed rather than Git-backed, making it more efficient for real-time operations.
|
||||
|
||||
#### Summary Assessment
|
||||
|
||||
| Dimension | Radicle | alknet (proposed) |
|-----------|---------|-------------------|
| **Identity** | Ed25519 keypair (DID) | Ed25519 from SLIP-0010 + Ethereum key from same seed |
| **Naming** | No global naming; NID is identifier | On-chain NFT ID + human-readable name (via ENS or custom) |
| **Access Control** | Threshold delegates in identity doc | Smart contract ACL + local graph cache |
| **Replication** | Gossip for metadata, Git for data | Call protocol + (future) gossip subscriptions |
| **Data Storage** | Git objects + SQLite cache | SQLite (metagraph/honker) + Git-compatible |
| **Censorship Resistance** | P2P, no authority | P2P + on-chain identity (uncensorable registration) |
| **Funding Model** | Community-funded seed nodes | After-the-fact contributions (replicators) |
|
||||
|
||||
### 3.2 ForgeFed (Forgejo Federation)
|
||||
|
||||
**Overview**: ForgeFed is an ActivityPub-based federation protocol for software forges. It enables Gitea/Forgejo instances to interoperate — users on one instance can open issues and submit PRs on another without creating separate accounts.
|
||||
|
||||
| Feature | Details |
|---------|---------|
| **Protocol** | ActivityPub (same as Mastodon, PeerTube) |
| **Identity** | Web-based (user@example.com format, like email) |
| **ACL** | Per-instance ACL; no on-chain verification |
| **Censorship Resistance** | Limited; instances can block each other |
| **Status** | Forgejo implementing; Vervis is reference implementation |
|
||||
|
||||
**Relevance to alknet**: ForgeFed shows how federation works without blockchain. It uses ActivityPub for cross-instance communication, which is analogous to alknet's call protocol for cross-node communication. However, ForgeFed relies on instance-level trust (each Forgejo admin controls their instance), while alknet's concept uses on-chain identity for trust.
|
||||
|
||||
**Key takeaway**: ForgeFed's federation model is complementary, not competitive, with blockchain identity. An alknet node could expose a ForgeFed-compatible interface for interop with existing forges while using on-chain identity for internal trust decisions.
|
||||
|
||||
### 3.3 Git-Based Smart Contract Projects
|
||||
|
||||
| Project | Chain | Approach | Status |
|---------|-------|----------|--------|
| **GitBross** | Solana/Arbitrum + IPFS | Repos backed up to IPFS; smart contracts for metadata | Active |
| **GitLike** | Ethereum + IPFS | Browser-based decentralized VCS | Experimental |
| **Statik** | IPFS | Version control on IPFS with content-addressed storage | Experimental |
| **PineSU** | Ethereum | Git repos + blockchain for integrity/timestamping | Research paper |
|
||||
|
||||
**Common patterns**:
|
||||
- IPFS for content-addressed storage of git objects
|
||||
- Smart contracts for metadata (ownership, ACL, provenance)
|
||||
- Ethereum or L2 for on-chain verification
|
||||
- Git bridge tools that push to both IPFS and traditional remotes
|
||||
|
||||
**Key insight**: None of these projects have achieved widespread adoption. The main challenges are:
|
||||
1. **Performance**: IPFS retrieval is slower than centralized git hosting
|
||||
2. **UX**: Browser-based git clients lack feature parity with CLI tools
|
||||
3. **Incentives**: No sustainable funding model for replicators
|
||||
|
||||
alknet's approach of using traditional git remotes with a smart contract ACL overlay avoids the IPFS performance trap while still providing censorship resistance.
|
||||
|
||||
### 3.4 NFT-Based Access Control Systems
|
||||
|
||||
Several projects use NFTs (ERC-721) for access gating:
|
||||
|
||||
| Pattern | Mechanism | Example |
|---------|-----------|---------|
| **Token-gated content** | Wallet verification proves NFT ownership before granting access | NFT-gated websites, Discord roles |
| **Role-based ACL via NFT** | NFTs represent roles; smart contract checks `balanceOf(address) > 0` | Token-gated DAOs, access-controlled channels |
| **Namespace NFTs** | Each NFT represents a namespace/org; sub-rights derive from ownership | ENS domains, NFT-based guild systems |
|
||||
|
||||
**Solidity Pattern for Repository ACL**:
|
||||
|
||||
```solidity
// SPDX-License-Identifier: MIT
// Simplified example: NFT-based org/repo with on-chain ACL.
// Assumptions for illustration: the OpenZeppelin import, the enum members other
// than Role.Member and Permission.Write, and the `orgs`/`repos` storage layout.
pragma solidity ^0.8.20;

import "@openzeppelin/contracts/token/ERC721/ERC721.sol";

contract OrgToken is ERC721 {
    enum Role { None, Member, Admin }
    enum Permission { None, Read, Write }

    struct Org {
        address owner;
        mapping(address => Role) members; // ACL mapping
    }

    struct Repo {
        uint256 orgTokenId; // Owning org
        mapping(address => Permission) collaborators;
    }

    mapping(uint256 => Org) private orgs;   // org token ID => org
    mapping(uint256 => Repo) private repos; // repo ID => repo

    constructor() ERC721("OrgToken", "ORG") {}

    function canPush(uint256 repoId, address user) external view returns (bool) {
        Repo storage repo = repos[repoId];
        // Check direct permission
        if (repo.collaborators[user] >= Permission.Write) return true;
        // Check org membership
        Org storage org = orgs[repo.orgTokenId];
        if (org.members[user] >= Role.Member) return true;
        return false;
    }
}
```
|
||||
|
||||
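**Performance considerations**: On an L2 (Base, Arbitrum), an on-chain ACL operation costs roughly 0.001-0.01 USD and takes 0.5-2 seconds. Because `canPush()` is a `view` function, calling it over RPC consumes no gas, but it still adds a network round trip to every check. This is acceptable for occasional operations (repo creation, ACL changes) but not for per-push verification, so caching is essential.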
|
||||
|
||||
**Relevance to alknet**: The mapping from on-chain ACL to alknet's local ACL graph is direct:
|
||||
- ERC-721 token ID → `PrincipalNode` in alknet's ACL metagraph
|
||||
- `collaborators` mapping → `DelegatesEdge` with `narrowed_scopes`
|
||||
- `canPush()` → alknet's `check_access()` function
|
||||
|
||||
---
|
||||
|
||||
## 4. Identity on the Blockchain
|
||||
|
||||
### 4.1 ERC-721 as Identity/Namespace Tokens
|
||||
|
||||
**How it works**: Each unique identity (org, user, namespace) is an ERC-721 NFT. The token ID is the on-chain identifier; metadata (display name, avatar, public key) is stored off-chain (IPFS or DNS).
|
||||
|
||||
**Advantages**:
|
||||
- Inherent transferability (sell/gift an org identity)
|
||||
- On-chain ownership verification
|
||||
- Metadata can include cryptographic public keys for off-chain verification
|
||||
- Composable with other on-chain protocols (DAO governance, treasury)
|
||||
|
||||
**Disadvantages**:
|
||||
- Gas costs for every state change
|
||||
- Key rotation requires a transaction (can't just change a local file)
|
||||
- Metadata availability depends on off-chain storage
|
||||
- Privacy: all ACL changes are public on-chain
|
||||
|
||||
**Resolution pattern**: Use on-chain registration as the root of trust, but resolve identity locally via cached data. This is analogous to how DNS works: the zone file is authoritative, but resolvers cache it.
|
||||
|
||||
### 4.2 ENS (Ethereum Name Service) as a Naming Layer
|
||||
|
||||
**Overview**: ENS maps human-readable names (e.g., `alice.eth`) to machine-readable identifiers (Ethereum addresses, content hashes, text records).
|
||||
|
||||
| Feature | Implementation |
|---------|---------------|
| **Name resolution** | `alice.eth` → Ethereum address (NFT owner) |
| **Text records** | Store arbitrary key-value data (avatar, email, public key, SSH key) |
| **Subdomains** | `git.alice.eth` can point to a replicator endpoint |
| **Resolver** | Smart contract that returns records for a name |
| **Off-chain look-up** | CCIP-read (EIP-3668) allows resolving names via external data |
|
||||
|
||||
**Relevance to alknet**: ENS text records can store alknet node identifiers:
|
||||
- `alk.id` text record → alknet Node ID (Ed25519 public key fingerprint)
|
||||
- `alk.pubkey` text record → Ed25519 public key (for SSH authentication)
|
||||
- `alk.replicator` text record → endpoint URL (for repo discovery)
|
||||
|
||||
This creates a human-friendly naming overlay on top of alknet's cryptographic identifiers. Combined with DNS TXT records (alknet's planned DNS naming layer), it provides multiple resolution paths.
|
||||
|
||||
**Limitation**: ENS resolution requires an Ethereum RPC call, which adds latency. For production use, ENS data should be cached locally and refreshed periodically, similar to DNS TTLs.
|
||||
|
||||
### 4.3 Smart Contracts as ACL/Naming Services
|
||||
|
||||
**Pattern**: A smart contract stores the ACL mapping and provides a view function for verification. This is the "source of truth" that local caches sync from.
|
||||
|
||||
```
On-chain ACL contract (source of truth)
        │
        │  events: RoleGranted, RoleRevoked, RepoCreated, etc.
        │
        ▼
alknet-storage (local cache)
  ├── ACL metagraph (PrincipalNode + DelegatesEdge)
  ├── Synced from on-chain events
  └── Used for hot-path access checks
```
|
||||
|
||||
**Event-driven sync pattern** (critical for alknet):
|
||||
|
||||
1. Smart contract emits `RoleGranted(address, repoId, role)` event
|
||||
2. alknet head node listens to these events (via Ethereum log subscription)
|
||||
3. Event is projected into the ACL metagraph as a `DelegatesEdge` with `narrowed_scopes`
|
||||
4. Local access checks use the metagraph (fast, SQLite)
|
||||
5. Periodic consistency check ensures local cache matches on-chain state
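
Steps 1 to 4 amount to folding a stream of contract events into a local lookup table. The sketch below shows that projection with a plain `HashMap` standing in for the ACL metagraph; the `AclEvent` and `AclProjection` types, and the use of a role string instead of `narrowed_scopes`, are simplifying assumptions rather than alknet's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical decoded contract events.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AclEvent {
    RoleGranted { address: String, repo_id: u64, role: String },
    RoleRevoked { address: String, repo_id: u64 },
}

/// Local projection: (address, repo_id) -> role.
/// Stands in for DelegatesEdge entries in the ACL metagraph.
#[derive(Default)]
pub struct AclProjection {
    edges: HashMap<(String, u64), String>,
}

impl AclProjection {
    /// Applies one on-chain event to the local cache.
    pub fn apply(&mut self, event: AclEvent) {
        match event {
            AclEvent::RoleGranted { address, repo_id, role } => {
                self.edges.insert((address, repo_id), role);
            }
            AclEvent::RoleRevoked { address, repo_id } => {
                self.edges.remove(&(address, repo_id));
            }
        }
    }

    /// Hot-path lookup that never touches the chain.
    pub fn role_of(&self, address: &str, repo_id: u64) -> Option<&str> {
        self.edges
            .get(&(address.to_string(), repo_id))
            .map(String::as_str)
    }
}
```

Events must be applied in log order for the projection to converge on the on-chain state, which is one reason the periodic consistency check in step 5 is still needed.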
|
||||
|
||||
This maps directly to alknet's **event boundary discipline**:
|
||||
- On-chain events = external source of truth (like domain events from another service)
|
||||
- ACL metagraph = local projection (like an integration event or read model)
|
||||
- Honker stream `acl:updated` = notification that the local cache changed (integration event)
|
||||
|
||||
### 4.4 Decentralized Identity Standards
|
||||
|
||||
#### W3C DIDs (Decentralized Identifiers)
|
||||
|
||||
**Overview**: DIDs are a W3C standard for verifiable, self-sovereign digital identifiers. A DID is a URI that resolves to a DID Document describing how to interact with the identity holder.
|
||||
|
||||
| DID Method | Resolution | Key Type | Use Case |
|-----------|-----------|----------|----------|
| `did:key` | Static (no registry) | Ed25519, secp256k1, etc. | Radicle uses this; self-certifying |
| `did:ethr` | Ethereum registry | secp256k1 | Blockchain-verifiable identity |
| `did:web` | DNS/web server | Any | Traditional web PKI bridge |
| `did:ion` | Bitcoin Sidetree | secp256k1 | Microsoft's DID system |
|
||||
|
||||
**Relevance**: Radicle uses `did:key` with Ed25519 keys. alknet could use `did:key` for local identity (same key type!) and extend to `did:ethr` for on-chain identity, using the same seed phrase to derive both keys.
|
||||
|
||||
#### Verifiable Credentials (VCs)
|
||||
|
||||
**Overview**: VCs are tamper-evident, cryptographically secure attestations issued by a trusted authority. Think of them as digital certificates (driver's license, degree) that the holder presents to a verifier.
|
||||
|
||||
**Application to git access**: A VC could attest that "this Ed25519 public key has write access to repo X." The issuer is the org's NFT contract (or a delegate). VCs can be verified off-chain, reducing on-chain transaction costs.
|
||||
|
||||
**alknet mapping**: VCs are analogous to alknet's `Identity` struct with `scopes` and `resources`. A VC issuance maps to the creation of a `DelegatesEdge` in the ACL graph. The key difference is that a VC is a portable, signed document that its holder presents to a verifier, while alknet's ACL is graph-based (the principal must be connected to the resource via edges).
|
||||
|
||||
---
|
||||
|
||||
## 5. Access Control Models for Distributed Git
|
||||
|
||||
### 5.1 Git's Own ACL Model
|
||||
|
||||
Git has limited built-in ACL. Access control is typically enforced at the transport layer:
|
||||
|
||||
| Mechanism | Layer | Scope |
|-----------|-------|-------|
| **`pre-receive` hook** | Server-side | Reject pushes based on branch, author, file patterns |
| **`update` hook** | Server-side | Per-ref checks (branch-level protection) |
| **`post-receive` hook** | Server-side | Post-push actions (notifications, CI triggers) |
| **SSH key mapping** | Transport | `authorized_keys` → system user → filesystem permissions |
| **HTTP basic auth** | Transport | Username/password → Git smart HTTP |
| **Gitolite** | Server-side | Config-file-based ACL mapping SSH keys to repos and permissions |
|
||||
|
||||
**Gitolite pattern** (most relevant for distributed git):
|
||||
- `~/.ssh/authorized_keys` maps SSH keys to Gitolite users
|
||||
- `~/.gitolite/conf/gitolite.conf` defines repos and permissions
|
||||
- Permission levels: `R` (read), `RW` (read+write), `RW+` (read+write+force-push)
|
||||
- Wildcard repos: `CREATOR/..*` — users can create repos matching patterns
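
The three permission levels form a simple ordering, which a derived `Ord` on an enum can capture. This is a minimal sketch, assuming a hypothetical `Perm` type; it does not reproduce Gitolite's rule evaluation (ref patterns, deny rules, and so on).

```rust
/// Gitolite-style permission levels, ordered from weakest to strongest.
/// The derived ordering follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Perm {
    Read,           // R
    ReadWrite,      // RW
    ReadWriteForce, // RW+
}

impl Perm {
    /// Parses the short form used in the config file.
    pub fn parse(s: &str) -> Option<Perm> {
        match s {
            "R" => Some(Perm::Read),
            "RW" => Some(Perm::ReadWrite),
            "RW+" => Some(Perm::ReadWriteForce),
            _ => None,
        }
    }

    /// True when this level is at least as strong as `required`.
    pub fn allows(self, required: Perm) -> bool {
        self >= required
    }
}
```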
|
||||
|
||||
**alknet mapping**: Gitolite's config file is the analog of alknet's ACL metagraph. The key difference is that Gitolite is centralized (one config file), while alknet's ACL can be distributed (synced from on-chain events).
|
||||
|
||||
### 5.2 Decentralized Write Permission Without Central Authority
|
||||
|
||||
In a truly decentralized system, no single node controls access. Several patterns exist:
|
||||
|
||||
#### Pattern 1: Self-Certifying Repositories (Radicle)
|
||||
|
||||
- The repo creator defines an identity document listing delegates
|
||||
- Delegates are Ed25519 public keys with a threshold
|
||||
- Only delegate signatures on refs are considered canonical
|
||||
- Replicators accept any push but only replicate refs signed by sufficient delegates
|
||||
|
||||
**Trade-off**: Simple, no on-chain costs, but no mechanism for human-readable names or transferable ownership.
|
||||
|
||||
#### Pattern 2: On-Chain ACL (Source Concept)
|
||||
|
||||
- Smart contract stores `mapping(address => Role)` for each repo
|
||||
- Replicators verify pusher's address against the contract before accepting
|
||||
- Ownership is transferable (the NFT can be sold)
|
||||
- Gas costs for setup and ACL changes
|
||||
|
||||
**Trade-off**: Transferable ownership and verifiable ACL, but requires Ethereum interaction and introduces latency.
|
||||
|
||||
#### Pattern 3: Hybrid — On-Chain Root + Local Cache
|
||||
|
||||
- On-chain contract defines who owns each org/repo NFT
|
||||
- Local ACL graph caches on-chain state and adds local rules
|
||||
- Hot-path checks use local cache (SQLite, fast)
|
||||
- Cold-path operations (ACL changes, ownership transfers) go on-chain
|
||||
- Local cache is periodically verified against on-chain state
|
||||
|
||||
**This is the recommended pattern for alknet.** It combines:
|
||||
- On-chain censorship resistance (no single authority can revoke identity)
|
||||
- Local performance (ACL checks are SQLite-fast)
|
||||
- Transferable ownership (NFT can be sold/transferred on-chain)
|
||||
- Graceful degradation (local ACL still works when chain is unavailable)
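
A minimal sketch of the hybrid pattern follows, assuming a hypothetical `CachedAcl` type with a time-to-live per entry. The `on_chain` closure stands in for a contract query and returns `None` when the chain is unreachable. Serving a stale grant in that case is one possible reading of "graceful degradation"; a stricter deployment might deny instead.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Local cache of push permissions keyed by (key fingerprint, repo).
pub struct CachedAcl {
    ttl: Duration,
    entries: HashMap<(String, String), (bool, Instant)>,
}

impl CachedAcl {
    pub fn new(ttl: Duration) -> Self {
        CachedAcl { ttl, entries: HashMap::new() }
    }

    /// Checks the cache first; consults `on_chain` only on a miss or stale entry.
    pub fn can_push<F>(&mut self, key: &str, repo: &str, now: Instant, on_chain: F) -> bool
    where
        F: FnOnce() -> Option<bool>,
    {
        let k = (key.to_string(), repo.to_string());
        let cached = self.entries.get(&k).copied();
        if let Some((allowed, fetched_at)) = cached {
            if now.saturating_duration_since(fetched_at) < self.ttl {
                return allowed; // fresh entry: hot path, no chain access
            }
        }
        match on_chain() {
            Some(allowed) => {
                self.entries.insert(k, (allowed, now));
                allowed
            }
            // Chain unavailable: fall back to the stale entry, deny if none exists.
            None => cached.map(|(allowed, _)| allowed).unwrap_or(false),
        }
    }
}
```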
|
||||
|
||||
### 5.3 Radicle's Approach to Identity and Verification
|
||||
|
||||
Radicle's identity model has specific properties worth detailed comparison:
|
||||
|
||||
| Property | Radicle | alknet (proposed) |
|----------|---------|-------------------|
| **Identity root** | Ed25519 keypair (generated locally) | BIP39 seed phrase → SLIP-0010 derivation |
| **Identity document** | JSON in Git, signed by delegates | On-chain NFT + local ACL metagraph |
| **Delegate model** | Threshold of N public keys | Threshold of N delegates (on-chain or local) |
| **Key rotation** | Add/remove delegates via identity doc update | Transfer NFT to new address; update local keys |
| **Multi-device** | One key per device (RIP-0002) | One key per device derived from same seed (`m/74'/0'/0'/{n}'`) |
| **Namespace collision** | RID is content-hash, collision-free | NFT token ID is unique; human names via ENS |
| **Revocation** | Remove delegate from identity doc | On-chain ACL change + local cache update |
| **Verification** | Signature verification against delegate list | Signature verification + on-chain ACL check |
|
||||
|
||||
**alknet advantage**: Deriving multiple keys from one seed means:
|
||||
- Multi-device support is built-in (derive a key per device)
|
||||
- No "one key per identity" limitation
|
||||
- The same seed provides identity keys, encryption keys, SSH keys, and Ethereum signing keys
|
||||
- Key rotation for a single device means deriving a new key at the next index and updating it locally
|
||||
|
||||
**alknet challenge**: If the seed phrase is lost, all derived keys are lost. Mitigation strategies:
|
||||
- Social recovery (N-of-M threshold: trusted contacts hold shards)
|
||||
- Hardware security module (HSM) protection for the seed
|
||||
- Multi-sig on key operations (require threshold of devices to authorize)
|
||||
|
||||
---
|
||||
|
||||
## 6. Cryptographic Identity Mapping
|
||||
|
||||
### 6.1 Ed25519 Keys (alknet's Key Type)
|
||||
|
||||
alknet uses Ed25519 as the primary key type for:
|
||||
- SSH authentication (fingerprint-based verification)
|
||||
- Node identity (Node IDs are Ed25519 public keys)
|
||||
- Channel signing (call protocol event signatures)
|
||||
|
||||
**Relevant properties of Ed25519**:
|
||||
- 32-byte public key; the private key is a 32-byte seed, commonly stored as a 64-byte value (seed + public key)
|
||||
- Deterministic signatures (same message, same key → same signature)
|
||||
- Fast verification (~3x faster than secp256k1)
|
||||
- Used in SSH (since OpenSSH 6.5), Tor onion services, Signal
|
||||
|
||||
**SLIP-0010 derivation** (what alknet uses):
|
||||
- SLIP-0010 generalizes BIP-32 to non-secp256k1 curves
|
||||
- Ed25519 derivation uses **hardened keys only** (cannot derive child public keys from parent public key)
|
||||
- This means: the master seed must be available to derive any child key
|
||||
- alknet's secret service holds the seed in RAM and derives keys on demand
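
The hardened-only rule can be enforced when a derivation path is parsed. The sketch below converts a path such as `m/74'/0'/0'/0'` into hardened child indices and rejects any non-hardened component. The function name is hypothetical, and the actual key derivation (HMAC-SHA512 per SLIP-0010) is out of scope here.

```rust
/// Offset that marks a child index as hardened (BIP-32 / SLIP-0010).
pub const HARDENED: u32 = 0x8000_0000;

/// Parses a path like "m/74'/0'/0'/0'" into hardened child indices.
/// Returns an error if any component is missing the hardened marker.
pub fn parse_hardened_path(path: &str) -> Result<Vec<u32>, String> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err("path must start with 'm'".to_string());
    }
    parts
        .map(|part| -> Result<u32, String> {
            let digits = part
                .strip_suffix('\'')
                .ok_or_else(|| format!("component '{part}' is not hardened"))?;
            let index: u32 = digits
                .parse()
                .map_err(|_| format!("component '{part}' is not a number"))?;
            if index >= HARDENED {
                return Err(format!("index {index} is out of range"));
            }
            Ok(index + HARDENED)
        })
        .collect()
}
```

With this check, the Ethereum path `m/44'/60'/0'/0/0` is rejected, which is expected: it belongs to secp256k1 BIP-32 derivation, not to Ed25519 under SLIP-0010.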
|
||||
|
||||
### 6.2 Blockchain Private Keys vs SSH Keys
|
||||
|
||||
The key question for mapping blockchain identity to git access is: **how does an Ed25519 SSH key relate to a secp256k1 Ethereum key?**
|
||||
|
||||
| Key Type | Curve / Algorithm | Use Case | alknet Derivation Path |
|----------|-------|----------|----------------------|
| Identity key | Ed25519 | SSH auth, node identity | `m/74'/0'/0'/0'` |
| Device key | Ed25519 | Per-device identity | `m/74'/0'/0'/{n}'` |
| SSH host key | Ed25519 | Server identity | `m/74'/0'/1'/0'` |
| Encryption key | AES-256-GCM | External credential encryption | `m/74'/2'/0'/0'` |
| Ethereum key | secp256k1 | Smart contract signing | `m/44'/60'/0'/0/0` |
|
||||
|
||||
**The bridge**: Both keys derive from the **same BIP39 seed phrase**. The secret service can sign an Ethereum transaction using the secp256k1 key and also authenticate SSH using the Ed25519 key. This creates a cryptographically linked identity pair:
|
||||
- On-chain identity (Ethereum address derived from `m/44'/60'/0'/0/0`)
|
||||
- Off-chain identity (Ed25519 key derived from `m/74'/0'/0'/0'`)
|
||||
|
||||
**Binding them**: To prove that the Ed25519 key and the Ethereum key belong to the same entity:
|
||||
1. Sign a message with the Ed25519 key: `"I, <Ed25519-pubkey>, attest that my on-chain identity is <Ethereum-address>"`
|
||||
2. Store this attestation on-chain (in the org/user NFT metadata)
|
||||
3. Anyone can verify: the on-chain address owns the NFT, and the attestation links the SSH key to that address
|
||||
|
||||
This is the **key binding mechanism** that connects alknet's SSH-based authentication to on-chain identity.
|
||||
|
||||
### 6.3 Deriving Repository Access from On-Chain Identity
|
||||
|
||||
The complete flow for a push operation in a decentralized git system with on-chain ACL:
|
||||
|
||||
```
1. Client connects to replicator via SSH
2. SSH auth succeeds (Ed25519 key verified by alknet IdentityProvider)
3. Client pushes to repo X
4. Replicator checks:
   a. Local ACL metagraph: does this Ed25519 key have write access to repo X?
   b. If local ACL is stale, re-verify against on-chain contract
5. If authorized: accept push, gossip update to other replicators
6. If not: reject with "access denied"
```
|
||||
|
||||
**Optimization**: Step 4b is rarely needed if the local ACL cache is kept fresh via event subscriptions. The on-chain contract emits events on ACL changes, and the head node's sync process projects these into the local ACL metagraph.
|
||||
|
||||
**alknet's existing support for this flow**:
|
||||
|
||||
| Component | Role |
|-----------|------|
| `IdentityProvider` trait | Resolves Ed25519 fingerprint → `Identity` with scopes/resources |
| `ConfigIdentityProvider` | Local-only: reads from `authorized_keys` config |
| `StorageIdentityProvider` | SQLite-backed: queries `peer_credentials` + ACL metagraph |
| `OnChainIdentityProvider` (future) | Verifies against on-chain ACL, falls back to local cache |
| `AuthProtocol` (irpc) | `VerifyPubkey` → `Identity` resolution |
| `CheckAccess` (irpc) | `Identity` + operation → access verification using ACL graph |
| `OperationSpec.access_control` | Declarative access requirements per operation |
|
||||
|
||||
---
|
||||
|
||||
## 7. Gossip Protocols for Repo Synchronization
|
||||
|
||||
### 7.1 Epidemic/Gossip Protocol Fundamentals
|
||||
|
||||
Gossip protocols are decentralized dissemination mechanisms inspired by how rumors spread in social networks. Key properties:
|
||||
- **Eventual consistency**: All nodes eventually receive all updates
|
||||
- **Fault tolerance**: Works even when nodes join/leave randomly
|
||||
- **Scalability**: O(log N) gossip rounds to reach all nodes in a network of N nodes
|
||||
- **No single point of failure**: No coordinator node
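
The logarithmic bound can be checked with a small calculation. The sketch below assumes an idealized push model in which every informed node reaches `fanout` previously uninformed peers per round, with no duplicates or message loss, so real networks will need somewhat more rounds. The function name is hypothetical.

```rust
/// Rounds needed until `n` nodes are informed, starting from one node.
/// Idealized: each informed node informs `fanout` new peers per round,
/// so the informed set grows by a factor of (fanout + 1) each round.
pub fn rounds_to_cover(n: u64, fanout: u64) -> u32 {
    assert!(fanout >= 1, "fanout must be at least 1");
    let growth = fanout.saturating_add(1);
    let mut informed: u64 = 1;
    let mut rounds: u32 = 0;
    while informed < n {
        informed = informed.saturating_mul(growth);
        rounds += 1;
    }
    rounds
}
```

Under these assumptions, 1,000 nodes are covered in 10 rounds with a fanout of 1, and 1,000,000 nodes in 20 rounds: doubling the exponent of the network size only doubles the round count.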
|
||||
|
||||
### 7.2 Radicle's Gossip Protocol
|
||||
|
||||
Radicle uses three message types (detailed in Section 3.1):
|
||||
- **Node Announcements**: Peer discovery (who's online, where to reach them)
|
||||
- **Inventory Announcements**: Repo discovery (what repos each node seeds)
|
||||
- **Reference Announcements**: Update notifications (new commits, new COB operations)
|
||||
|
||||
**Anti-entropy mechanism**: Nodes periodically exchange state summaries to ensure they haven't missed any updates. This is similar to Merkle tree-based reconciliation in distributed databases.
|
||||
|
||||
**Relevance to alknet**: alknet's call protocol subscription model (`call.requested` with `OperationType::Subscription`) can serve as the transport for gossip messages. The key difference is that alknet's call protocol is request-response oriented, while gossip is push-based. A gossip layer on top of the call protocol would work as follows:
|
||||
|
||||
```
alknet gossip layer:
  1. Subscribe to `/{node}/gossip/announce` on known peers
  2. Receive NodeAnnouncement, InventoryAnnouncement, RefAnnouncement events
  3. Forward announcements to other connected peers (with deduplication)
  4. For RefAnnouncements of tracked repos, trigger git fetch
```
|
||||
|
||||
### 7.3 Alternative: CRDT-Based Sync
|
||||
|
||||
Instead of gossip + git fetch, some systems use CRDTs for repository synchronization:
|
||||
- **Advantages**: No merge conflicts, automatic convergence
|
||||
- **Disadvantages**: Large metadata overhead, complex implementation, doesn't map directly to git's object model
|
||||
|
||||
**Recommendation for alknet**: Start with gossip + git fetch (as Radicle does) and consider CRDT-based sync for specific metadata (e.g., ACL state, org metadata) while keeping git data as-is. The ACL metagraph changes can propagate via honker streams (which are effectively a form of CRDT merge).
|
||||
|
||||
---
|
||||
|
||||
## 8. Relevance to Alknet
|
||||
|
||||
### 8.1 Identity + IdentityProvider Model
|
||||
|
||||
alknet's existing `Identity` struct and `IdentityProvider` trait are already designed for this use case:
|
||||
|
||||
```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String,                                      // Fingerprint or UUID
    pub scopes: Vec<String>,                             // Permission scopes
    pub resources: Option<HashMap<String, Vec<String>>>, // Resource-level access
}
```
|
||||
|
||||
The `id` field serves a dual purpose:
|
||||
- **Config-based auth**: SSH fingerprint (e.g., `SHA256:abc123...`)
|
||||
- **Storage-based auth**: Account UUID (e.g., `acc_0123456789`)
|
||||
|
||||
**Extended for on-chain identity**, the `id` field could also be:
|
||||
- **On-chain auth**: Ethereum address (e.g., `0x1234...`) or NFT token ID (e.g., `token_42`)
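
If one string field carries several identifier formats, consumers need a way to tell them apart. The sketch below classifies an `id` by the prefixes used in the examples above (`SHA256:`, `acc_`, `0x`, `token_`). The `IdKind` enum and `classify_id` function are hypothetical, and the prefix conventions are taken from the examples rather than from a defined alknet format.

```rust
/// Hypothetical classification of the `Identity::id` string.
#[derive(Debug, PartialEq, Eq)]
pub enum IdKind<'a> {
    SshFingerprint(&'a str), // digest part after "SHA256:"
    Account(&'a str),        // part after "acc_"
    EthAddress(&'a str),     // full "0x..." address, 40 hex digits
    NftToken(u64),           // numeric part after "token_"
    Unknown,
}

pub fn classify_id(id: &str) -> IdKind<'_> {
    if let Some(digest) = id.strip_prefix("SHA256:") {
        IdKind::SshFingerprint(digest)
    } else if let Some(account) = id.strip_prefix("acc_") {
        IdKind::Account(account)
    } else if let Some(hex) = id.strip_prefix("0x") {
        // An Ethereum address is 20 bytes, i.e. 40 hex digits.
        if hex.len() == 40 && hex.chars().all(|c| c.is_ascii_hexdigit()) {
            IdKind::EthAddress(id)
        } else {
            IdKind::Unknown
        }
    } else if let Some(n) = id.strip_prefix("token_") {
        n.parse::<u64>().map(IdKind::NftToken).unwrap_or(IdKind::Unknown)
    } else {
        IdKind::Unknown
    }
}
```

An alternative design would replace the string with a typed enum in `Identity` itself, at the cost of a breaking change to existing providers.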
|
||||
|
||||
The `IdentityProvider` trait naturally extends:
|
||||
|
||||
```rust
trait IdentityProvider: Send + Sync {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &[u8]) -> Option<Identity>;
}

// Future extension:
// OnChainIdentityProvider resolves Ethereum address + Ed25519 binding
// from on-chain ACL contract, with local metagraph cache
```
|
||||
|
||||
### 8.2 OperationRegistry Extension with On-Chain Verification
|
||||
|
||||
alknet's `OperationSpec` includes `access_control` fields:
|
||||
|
||||
```rust
pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
}
```
|
||||
|
||||
For on-chain verification, a new `access_control` mode could be added:
|
||||
|
||||
```rust
pub enum AccessControlMode {
    Local,         // Check against local ACL metagraph (current)
    OnChain,       // Verify against on-chain contract (future)
    CachedOnChain, // Check local cache first, verify on-chain on miss/stale (recommended)
}
```
|
||||
|
||||
The `AccessControl` struct gains a `mode` field defaulting to `Local`. This is additive and doesn't change existing behavior.
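
One way to make the new field additive is to derive `Default` on both types and mark `Local` as the default variant, so existing call sites that build an `AccessControl` with struct-update syntax keep their current behavior. The sketch below assumes the field is named `mode`, as stated above; the derives are an illustration, not the existing alknet definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AccessControlMode {
    #[default]
    Local,         // Check against local ACL metagraph (current)
    OnChain,       // Verify against on-chain contract (future)
    CachedOnChain, // Check local cache first, verify on-chain on miss/stale
}

#[derive(Debug, Clone, Default)]
pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
    pub mode: AccessControlMode, // new field, defaults to Local
}
```

A spec that never mentions `mode` can then be written as `AccessControl { required_scopes: vec![...], ..Default::default() }` and still resolves to `AccessControlMode::Local`.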
|
||||
|
||||
### 8.3 Git Service Adapter for Decentralized Replication
|
||||
|
||||
alknet's application service pattern (from services.md) can accommodate a `GitService`:
|
||||
|
||||
```rust
#[rpc_requests(message = GitMessage)]
enum GitProtocol {
    #[rpc(tx=oneshot::Sender<RepoInfo>)]
    #[wrap(GetRepo)]
    GetRepo { repo_id: String },

    #[rpc(tx=oneshot::Sender<Vec<RepoInfo>>)]
    #[wrap(ListRepos)]
    ListRepos { org: Option<String> },

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CanPush)]
    CanPush { repo_id: String, identity: Identity },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(UpdateMirror)]
    UpdateMirror { repo_id: String, refs: Vec<RefUpdate> },

    #[rpc(tx=mpsc::Sender<RefAnnouncement>)]
    #[wrap(SubscribeRefs)]
    SubscribeRefs { repo_ids: Vec<String> },
}
```
|
||||
|
||||
This service:
|
||||
- **Registers with the call protocol** as `/head/git/*`
|
||||
- **Uses `StorageIdentityProvider`** for `CanPush` checks (with ACL metagraph)
|
||||
- **Manages git mirrors** (git bare repos on the local filesystem)
|
||||
- **Propagates updates** via `SubscribeRefs` (which maps to honker stream subscriptions → call protocol integration events)
|
||||
|
||||
### 8.4 CredentialProvider Role
|
||||
|
||||
The existing `CredentialProvider` pattern in alknet (used for outbound authentication TO external services) maps to:
|
||||
|
||||
| Use Case | CredentialProvider Implementation |
|----------|----------------------------------|
| Push to GitHub/GitLab | SSH key from alknet identity, or OAuth token from external source |
| Push to on-chain repo | Ed25519 key derived from seed (signs the push) + Ethereum key (signs on-chain attestation) |
| Authenticate to replicator | Ed25519 key (SSH auth via `IdentityProvider`) |
| Decrypt stored credentials | AES-256-GCM key derived from seed via `SecretProtocol` |
|
||||
|
||||
### 8.5 Domain Events vs. Integration Events (Distributed Git Context)
|
||||
|
||||
alknet's event boundary discipline (from event sourcing research and ADR-032) is critical for the distributed git scenario:
|
||||
|
||||
| Event Type | Source | Consumer | Boundary | Git Analog |
|-----------|--------|----------|----------|------------|
| **Domain events** (honker) | Local service | Same service | Internal | Git object creation/update in local repo |
| **Integration events** (call protocol) | Projected from domain events | Other nodes/services | Cross-node | Push notification, gossip announcement |
| **On-chain events** (smart contract) | Ethereum log | Head node sync process | External source | ACL change on blockchain |
| **Notifications** (honker) | Service | Any subscriber | Cross-service | "Repo X was updated" (thin, ID-only) |
|
||||
|
||||
**The flow for a decentralized git push**:
|
||||
|
||||
```
1. Client pushes to replicator
2. Replicator's GitService receives push
3. GitService publishes domain event: "repo:refs-updated" (honker stream)
4. Integration event projected: "call.responded" with repo update (call protocol)
5. Replicator gossips "RefAnnouncement" to tracked peers (call protocol subscription)
6. On-chain: if this push creates a new branch, optionally emit on-chain attestation
7. Peer replicators fetch updated refs (git protocol) and update their mirrors
```
|
||||
|
||||
**The flow for an ACL change**:
|
||||
|
||||
```
1. Org admin calls smart contract: grantWrite(repoId, newUserAddress)
2. Smart contract emits RoleGranted event
3. Head node's sync process detects the event (Ethereum log subscription)
4. Sync process calls StorageService: add DelegatesEdge to ACL metagraph
5. StorageService publishes domain event: "acl:updated" (honker stream)
6. Integration event projected: notify replicators of ACL change (call protocol)
7. Replicators update their local ACL cache
```
|
||||
|
||||
This cleanly separates:
|
||||
- **On-chain events** (smart contract logs) = external source of truth
|
||||
- **Local projections** (ACL metagraph) = cached view for fast access checks
|
||||
- **Integration events** (call protocol) = cross-node notification mechanism
|
||||
- **Domain events** (honker streams) = internal state management
|
||||
|
||||
### 8.6 Practical Integration Path
|
||||
|
||||
For alknet to support the decentralized git concept, the integration path is:
|
||||
|
||||
#### Phase 1: Foundation (Current Architecture)
|
||||
|
||||
- `IdentityProvider` trait supports multiple backends ✓
|
||||
- `StorageIdentityProvider` queries `peer_credentials` + ACL graph ✓
|
||||
- `SecretProtocol` derives Ed25519 and secp256k1 keys from same seed ✓
|
||||
- `OperationSpec.access_control` supports scope-based checks ✓
|
||||
|
||||
#### Phase 2: Git Service (Additive)
|
||||
|
||||
- Add `GitProtocol` irpc service for repo management
|
||||
- Implement `GitService` as an application service (like DockerService, NodeService)
|
||||
- Map `CanPush` to ACL metagraph traversal
|
||||
- Implement `pre-receive` hook that calls alknet's `CheckAccess` irpc
|
||||
|
||||
#### Phase 3: On-Chain ACL (Additive, Requires External Dependencies)
|
||||
|
||||
- Add `OnChainIdentityProvider` that:
|
||||
1. Resolves Ed25519 fingerprint → Ethereum address (via attestation stored in NFT metadata)
|
||||
2. Checks on-chain ACL contract for access rights
|
||||
3. Caches results in local ACL metagraph
|
||||
4. Subscribes to on-chain events for ACL changes
|
||||
- Add `AccessControlMode::CachedOnChain` to `OperationSpec`
|
||||
- Add `WalletProtocol` irpc service for signing on-chain transactions
|
||||
|
||||
#### Phase 4: Gossip and Replication (Additive)
|
||||
|
||||
- Add gossip message types to call protocol (`NodeAnnouncement`, `RepoAnnouncement`, `RefAnnouncement`)
|
||||
- Implement `SubscribeRefs` streaming operation for repo update subscriptions
|
||||
- Add replicator service that seeds repos and responds to gossip
|
||||
|
||||
Each phase is additive and doesn't require changes to earlier phases. The architecture supports this incremental extension because:
|
||||
1. `IdentityProvider` is a trait — new implementations are additive
|
||||
2. `OperationSpec.access_control` is a struct — new fields are additive
|
||||
3. Application services register with the call protocol — new services don't change core
|
||||
4. Honker streams are internal — new streams are additive
|
||||
|
||||
---
|
||||
|
||||
## 9. References
|
||||
|
||||
### Decentralized Git Platforms
|
||||
|
||||
- **Radicle Protocol Guide**: https://radicle.dev/guides/protocol — Comprehensive documentation of Radicle's identity system, gossip protocol, replication, and self-certifying repositories
|
||||
- **Radicle Heartwood (source)**: https://github.com/radicle-dev/heartwood — Reference implementation in Rust
|
||||
- **RIP-0002 Identity**: Radicle Improvement Proposal for identity documents and delegate thresholds
|
||||
- **radicle-crypto crate**: Ed25519 key types, SSH encoding, keystore (DeepWiki analysis: https://deepwiki.com/radicle-dev/heartwood/7.1-radicle-crypto)
|
||||
- **ForgeFed**: https://forgefed.org/ — ActivityPub-based federation protocol for forges (Forgejo, Gitea integration)
|
||||
- **GitLike**: https://gitlike.dev/ — Browser-based decentralized VCS using IPFS and Ethereum
|
||||
- **GitBross**: https://gitbross.com/ — Decentralized Git platform using Solana, Arbitrum, and IPFS
|
||||
- **PineSU**: IEEE paper on Git + Ethereum integration for trusted information sharing
|
||||
|
||||
### Blockchain Identity and Naming
|
||||
|
||||
- **ERC-721 Standard**: https://ethereum.org/developers/docs/standards/tokens/erc-721 — Non-fungible token standard
|
||||
- **ENS (Ethereum Name Service)**: https://docs.ens.domains/ — Decentralized naming on Ethereum
|
||||
- **W3C DID Primer**: https://w3c-ccg.github.io/did-primer/ — Decentralized Identifiers overview
|
||||
- **W3C Verifiable Credentials**: https://www.w3.org/TR/vc-data-model/ — VC specification
|
||||
- **EIP-3668 (CCIP-Read)**: Off-chain data lookup for ENS, enabling smart contracts to verify off-chain data
|
||||
|
||||
### Access Control
|
||||
|
||||
- **Git Hooks**: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks — Server-side hooks for git access control
|
||||
- **Gitolite**: Config-file-based SSH key → repo permission mapping
|
||||
- **Token-Gated Access Control**: https://chainscorelabs.com/guides/ — Patterns for ERC-721/ERC-1155 token-gated access
|
||||
- **ChainGuard**: Blockchain-based authentication and access control (academic paper)
|
||||
|
||||
### Cryptographic Key Management
|
||||
|
||||
- **SLIP-0010**: https://slips.readthedocs.io/en/latest/slip-0010/ — Universal private key derivation from master private key (Ed25519, secp256k1, NIST P-256)
|
||||
- **BIP-0032**: Hierarchical Deterministic Wallets
|
||||
- **BIP-0039**: Mnemonic code for generating deterministic keys
|
||||
- **SLIP-0044**: Registered coin types for BIP-0044 (alknet uses unallocated `74'`)
|
||||
- **Ed25519**: Bernstein's Edwards-curve Digital Signature Algorithm
|
||||
|
||||
### Gossip Protocols
|
||||
|
||||
- **Gossip Protocol Fundamentals**: https://www.geeksforgeeks.org/distributed-systems/gossip-protocol-in-disrtibuted-systems/ — Epidemic-style information dissemination
|
||||
- **libgossip**: C++17 implementation for decentralized node discovery and metadata propagation
|
||||
- **Bitcoin Gossip**: Used in Bitcoin for transaction and block propagation
|
||||
- **Secure Scuttlebutt (SSB)**: Inspiration for Radicle's gossip model
|
||||
|
||||
### Alknet Architecture Documents (Internal)
|
||||
|
||||
- **core.md**: Transport, call protocol, auth, services, DNS
|
||||
- **services.md**: irpc service architecture, OperationEnv, Identity, auth/secret/config protocols
|
||||
- **storage.md**: Metagraph data model, ACL as metagraph, identity tables, honker integration
|
||||
- **integration-plan.md**: Phase 0-4 integration plan, ADRs 026-034
|
||||
- **ADR-029**: Identity as core type (`Identity { id, scopes, resources }` + `IdentityProvider` trait)
|
||||
- **ADR-032**: Event boundary discipline (domain events vs. integration events vs. service calls)
|
||||
|
||||
### Radicle-Specific Documentation
|
||||
|
||||
- **Radicle COBs (Collaborative Objects)**: CRDT-based distributed issues/patches stored as Git objects — https://deepwiki.com/radicle-dev/heartwood/6.1-collaborative-objects-(cobs)
|
||||
- **Radicle Identity Documents**: Delegates, thresholds, and self-certifying repo identity — RIP-0002
|
||||
- **Radicle Signed Refs**: Vulnerability disclosure (2026-03) on replay attacks in signed references
|
||||
@@ -1,716 +0,0 @@
|
||||
# Gitserver Reference Document
|
||||
|
||||
> **Source**: <https://github.com/WJQSERVER/gitserver> (cloned at `/workspace/gitserver/`)
|
||||
> **Version**: 0.0.3 (workspace Cargo.toml)
|
||||
> **License**: MPL-2.0 (primary); upstream portions MIT (preserved in UPSTREAM-LICENSE)
|
||||
> **Upstream origin**: <https://github.com/ggueret/git-server>
|
||||
> **Date researched**: 2026-06-08
|
||||
> **Purpose**: Evaluate gitserver as a basis for a git service adapter within alknet
|
||||
|
||||
---
|
||||
|
||||
## 1. Architecture Overview
|
||||
|
||||
### 1.1 What is gitserver?
|
||||
|
||||
Gitserver is a **Rust-native Git Smart HTTP server** that does not require an installed `git` binary at runtime. All Git operations (ref advertisement, pack generation, receive-pack) are implemented via the [gitoxide](https://github.com/GitoxideLabs/gitoxide) (`gix`) crate. It supports both Git protocol v1 and v2, including shallow clones and multi-ack negotiation.
|
||||
|
||||
The project follows a **library-first design**: `gitserver-core` and `gitserver-http` are reusable libraries, while the `gitserver` binary is a thin CLI wrapper for standalone deployment.
|
||||
|
||||
### 1.2 Crate Structure
|
||||
|
||||
```
crates/
├── gitserver-core/           # Git protocol operations (no HTTP dependency)
│   ├── backend.rs            # GitBackend: unified interface for refs/pack/receive-pack
│   ├── discovery.rs          # RepoStore: filesystem-based repo discovery
│   ├── dynamic_registry.rs   # DynamicRepoRegistry, RepoResolver, MutableRepoRegistry traits
│   ├── error.rs              # Error types (RepoNotFound, PathTraversal, Protocol, Git, Io)
│   ├── pack.rs               # UploadPackRequest parsing, pack generation with side-band-64k
│   ├── path.rs               # Path safety: resolve_repo_path (normalize + canonicalize)
│   ├── pktline.rs            # pkt-line encoding/decoding utilities
│   ├── protocol_v2.rs        # Git protocol v2: ls-refs, fetch, shallow, stateless-rpc
│   ├── receive_pack.rs       # receive-pack: ref advertisement, pack reception, fast-forward validation
│   └── refs.rs               # Protocol v1 ref advertisement
├── gitserver-http/           # Axum HTTP layer
│   ├── error.rs              # AppError enum → HTTP status codes
│   ├── handlers.rs           # Route handlers: info_refs, upload_pack, receive_pack, healthz, list
│   ├── lib.rs                # router() function + public re-exports
│   └── state.rs              # SharedState (RepoMode, AuthConfig, ServicePolicy, draining flag)
├── gitserver/                # CLI binary (thin wrapper)
│   └── main.rs               # CLI args, RepoStore discovery, Axum server, graceful shutdown
└── gitserver-bench/          # Performance benchmarks (not published)
```
|
||||
|
||||
### 1.3 Key Dependencies
|
||||
|
||||
| Dependency | Version | Purpose |
|---|---|---|
| `gix` | 0.80.0 | Native Git repository operations (open refs, object store, rev-walk) |
| `gix-pack` | 0.67.0 | Pack file writing (receive-pack) |
| `axum` | 0.8.8 | HTTP routing and handlers |
| `tokio` | 1.50.0 | Async runtime, channels, IO |
| `miniz_oxide` | 0.8 | Zlib compression for pack objects |
| `sha1` | 0.10 | Pack checksum |
| `flate2` | 1 | Gzip response compression |
| `zstd` | 0.13 | Zstd response compression |
| `base64` | 0.22 | HTTP Basic auth decoding |
| `subtle` | 2 | Constant-time comparison (auth) |
| `clap` | 4.6.0 | CLI argument parsing |
|
||||
|
||||
### 1.4 Request Flow
|
||||
|
||||
#### Clone/Fetch (Protocol v1)
|
||||
|
||||
```
Client → GET /{repo}/info/refs?service=git-upload-pack
       → Server: resolve repo, verify auth, advertise_refs()
       ← Ref advertisement response

Client → POST /{repo}/git-upload-pack
       → Server: parse UploadPackRequest, generate_pack()
       ← Streamed side-band-64k pack response
```
|
||||
|
||||
#### Clone/Fetch (Protocol v2)
|
||||
|
||||
```
Client → GET /{repo}/info/refs (git-protocol: version=2)
       ← Capabilities advertisement

Client → POST /{repo}/git-upload-pack (git-protocol: version=2)
       → Server: parse_command_request() → ls-refs or fetch
       ← ls-refs result or streamed packfile
```
|
||||
|
||||
#### Push (receive-pack, must be enabled)
|
||||
|
||||
```
Client → GET /{repo}/info/refs?service=git-receive-pack
       ← Ref advertisement

Client → POST /{repo}/git-receive-pack
       → Server: parse commands, write pack, validate fast-forward, update refs
       ← Status report (ok/ng per ref)
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Protocol Support
|
||||
|
||||
### 2.1 Smart HTTP Git Protocol
|
||||
|
||||
Gitserver implements the **Git Smart HTTP protocol**, a de facto standard specified in Git's own protocol documentation rather than in an RFC. It is the protocol that `git clone http://...`, `git fetch`, and `git push` use over HTTP.
|
||||
|
||||
**Supported endpoints:**
|
||||
|
||||
| Method | Endpoint | Protocol Version | Description |
|---|---|---|---|
| GET | `/healthz` | — | Health check (no auth) |
| GET | `/` | — | JSON repository listing (auth required if configured) |
| GET | `/{repo}/info/refs?service=git-upload-pack` | v1 | Ref advertisement for clone/fetch |
| GET | `/{repo}/info/refs?service=git-receive-pack` | v1 | Ref advertisement for push (disabled by default) |
| POST | `/{repo}/git-upload-pack` | v1 | Pack negotiation and transfer |
| POST | `/{repo}/git-receive-pack` | v1 | Push operations (disabled by default) |
| GET | `/{repo}/info/refs` with `git-protocol: version=2` | v2 | Capabilities advertisement |
| POST | `/{repo}/git-upload-pack` with `git-protocol: version=2` | v2 | `ls-refs` and `fetch` commands |
|
||||
|
||||
### 2.2 Git Operations
|
||||
|
||||
| Operation | Supported | Notes |
|---|---|---|
| `git clone` | ✓ | Both v1 and v2 |
| `git fetch` | ✓ | Multi-ack, multi-ack-detailed negotiation |
| `git push` | ✓ (opt-in) | Via `--enable-receive-pack` or `ServicePolicy.receive_pack: true` |
| Shallow clone | ✓ | Protocol v2 `fetch` with `deepen` |
| OFS_DELTA | ✓ | Offset delta compression in packs |
| Side-band-64k | ✓ | Multiplexed progress/pack data |
| Response compression | ✓ | Gzip and Zstd on ref advertisement |
|
||||
|
||||
### 2.3 Push Restrictions
|
||||
|
||||
When receive-pack is enabled, the following restrictions apply:
|
||||
|
||||
- **Fast-forward only**: Branch updates under `refs/heads/*` must be fast-forward (old commit is ancestor of new)
|
||||
- **No ref deletion**: New OID cannot be the zero OID
|
||||
- **No tag overwrite**: Updating an existing tag is rejected
|
||||
- **Commits only**: Branch tips must point to commit objects
|
||||
- **Timeouts**: 300s total, 30s idle
|
||||
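The restrictions above can be sketched as a single validation function. This is an illustration under stated assumptions, not gitserver's actual code: `RefUpdate`, `Rejection`, `validate_update`, and the `is_commit` / `is_ancestor` callbacks are hypothetical names, and object IDs are modeled as plain strings.

```rust
/// Hypothetical model of one ref update command sent by a push.
pub struct RefUpdate<'a> {
    pub name: &'a str,
    pub old: &'a str, // all-zero OID: the ref does not exist yet
    pub new: &'a str, // all-zero OID: the client asks to delete the ref
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    Deletion,
    TagOverwrite,
    NotACommit,
    NonFastForward,
}

fn is_zero(oid: &str) -> bool {
    !oid.is_empty() && oid.bytes().all(|b| b == b'0')
}

/// Applies the documented rules in order. `is_commit` and `is_ancestor`
/// stand in for object-database lookups (assumed, not gitserver's API).
pub fn validate_update(
    u: &RefUpdate<'_>,
    is_commit: impl Fn(&str) -> bool,
    is_ancestor: impl Fn(&str, &str) -> bool,
) -> Result<(), Rejection> {
    // No ref deletion: the new OID must not be the zero OID.
    if is_zero(u.new) {
        return Err(Rejection::Deletion);
    }
    let exists = !is_zero(u.old);
    // No tag overwrite: an existing tag may not be moved.
    if u.name.starts_with("refs/tags/") && exists {
        return Err(Rejection::TagOverwrite);
    }
    if u.name.starts_with("refs/heads/") {
        // Commits only: branch tips must point at commit objects.
        if !is_commit(u.new) {
            return Err(Rejection::NotACommit);
        }
        // Fast-forward only: the old tip must be an ancestor of the new one.
        if exists && !is_ancestor(u.old, u.new) {
            return Err(Rejection::NonFastForward);
        }
    }
    Ok(())
}
```

Passing the lookups as closures keeps the rule ordering testable without a real repository. The timeouts from the list are a transport concern and are left out of this sketch.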
|
||||
### 2.4 SSH Git Protocol
|
||||
|
||||
Gitserver does **not** support SSH Git protocol. It is HTTP-only. SSH git access would require a separate implementation or integration layer (see Section 6).
|
||||
|
||||
---
|
||||
|
||||
## 3. Interface Pattern Analysis
|
||||
|
||||
### 3.1 HTTP Handler Architecture
|
||||
|
||||
Gitserver's HTTP layer follows a clean handler pattern:
|
||||
|
||||
```rust
// gitserver-http/src/lib.rs
pub fn router(state: SharedState) -> Router {
    Router::new()
        .route("/healthz", get(handlers::healthz))
        .route("/", get(handlers::list_repos))
        .route("/{*path}", get(handlers::info_refs_dispatch))
        .route("/{*path}", post(handlers::rpc_dispatch))
        .with_state(state)
}
```
|
||||
|
||||
The `SharedState` is an Axum state object containing:
|
||||
- `RepoMode` — either `Discovered(Arc<RwLock<RepoStore>>)` or `Dynamic { resolver, registry }`
|
||||
- `AuthConfig` — optional Basic and/or Bearer authentication
|
||||
- `ServicePolicy` — toggle for upload_pack, upload_pack_v2, receive_pack
|
||||
- `draining: Arc<AtomicBool>` — graceful shutdown flag
|
||||
|
||||
Each handler follows this pattern:
|
||||
1. Check `draining` flag → 503 if shutting down
|
||||
2. Check `ServicePolicy` → 404 if service disabled
|
||||
3. Authenticate request via `require_auth()` → 401 if credentials missing/invalid
|
||||
4. Resolve repository via `SharedState::resolve()` → 404 if not found
|
||||
5. Execute git operation via `GitBackend`
|
||||
6. Return streaming or buffered response
|
||||
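The ordering of the first four checks matters because it decides which status code a client sees. A minimal sketch of that ordering follows; `Gate` and `gate_status` are hypothetical names that flatten the real checks into booleans, so this is not the handler code itself.

```rust
/// Hypothetical, flattened view of the state each handler consults.
pub struct Gate {
    pub draining: bool,
    pub service_enabled: bool,
    pub authorized: bool,
    pub repo_found: bool,
}

/// Returns the HTTP status the gate checks would produce, in the order
/// described above, or `None` when the request may proceed to the git
/// operation (steps 5 and 6).
pub fn gate_status(g: &Gate) -> Option<u16> {
    if g.draining {
        return Some(503); // shutting down
    }
    if !g.service_enabled {
        return Some(404); // service disabled by policy
    }
    if !g.authorized {
        return Some(401); // credentials missing or invalid
    }
    if !g.repo_found {
        return Some(404); // repository could not be resolved
    }
    None
}
```

One consequence of this order is that a disabled service answers 404 even to unauthenticated callers, while an unknown repository is only reported after authentication succeeds.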
|
||||
### 3.2 Mapping to alknet's MessageInterface
|
||||
|
||||
Gitserver's `SharedState` + handler pattern maps closely to alknet's proposed `MessageInterface` trait:
|
||||
|
||||
```rust
// alknet's proposed MessageInterface
async fn handle_request(&self, request: InterfaceRequest) -> Result<InterfaceResponse>;
```
|
||||
|
||||
Gitserver's handler flow is essentially:
|
||||
1. Receive HTTP request (analogous to `InterfaceRequest`)
|
||||
2. Extract operation path, auth, and body
|
||||
3. Dispatch to the appropriate Git operation
|
||||
4. Return HTTP response (analogous to `InterfaceResponse`)
|
||||
|
||||
### 3.3 Low-Level Handler API
|
||||
|
||||
Gitserver also exposes handler functions that can be called directly without going through the Axum router:
|
||||
|
||||
```rust
use gitserver_http::handlers::{info_refs_endpoint, ServiceKind};

let response = info_refs_endpoint(
    &state,
    "my-project.git",
    ServiceKind::UploadPack,
    HeaderMap::new(),
).await?;
```
|
||||
|
||||
This is significant for alknet integration — it means the git logic can be invoked programmatically without HTTP routing.
|
||||
|
||||
---
|
||||
|
||||
## 4. Authentication
|
||||
|
||||
### 4.1 Current Auth Model
|
||||
|
||||
Gitserver supports two HTTP authentication mechanisms, both optional:
|
||||
|
||||
```rust
pub struct AuthConfig {
    pub basic: Option<BasicAuthConfig>,
    pub bearer_token: Option<String>,
}

pub struct BasicAuthConfig {
    pub username: String,
    pub password: String,
}
```
|
||||
|
||||
**Key characteristics:**
|
||||
- Both can be configured simultaneously; **either one passing is sufficient**
|
||||
- Basic auth uses **constant-time comparison** (`subtle` crate) to prevent timing attacks
|
||||
- Bearer token is compared directly (suitable for generated tokens)
|
||||
- Failed auth returns `401 Unauthorized` with `WWW-Authenticate: Basic realm="gitserver", Bearer`
|
||||
- `GET /healthz` is **unauthenticated** (always accessible)
|
||||
- Auth is **global** (same credentials for all repositories) — no per-repo or per-user ACL
|
||||
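The "either one passing is sufficient" rule and the constant-time comparison can be sketched with the standard library alone. This is an assumption-laden illustration: `Creds`, `Presented`, `ct_eq`, and `is_authorized` are hypothetical names, the hand-rolled `ct_eq` only stands in for the `subtle` crate mentioned above, and unlike the description above it applies the same comparison to the bearer token.

```rust
/// Compares two byte strings without returning early on the first
/// mismatching byte. Illustration only; not a vetted crypto primitive.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // the length is not treated as secret here
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Hypothetical stand-in for the configured credentials.
pub struct Creds<'a> {
    pub basic: Option<(&'a str, &'a str)>,
    pub bearer: Option<&'a str>,
}

/// Hypothetical stand-in for a parsed `Authorization` header.
pub enum Presented<'a> {
    Basic(&'a str, &'a str),
    Bearer(&'a str),
    None,
}

pub fn is_authorized(cfg: &Creds<'_>, p: &Presented<'_>) -> bool {
    // No auth configured: allow all requests.
    if cfg.basic.is_none() && cfg.bearer.is_none() {
        return true;
    }
    match p {
        Presented::Bearer(t) => cfg
            .bearer
            .map_or(false, |want| ct_eq(want.as_bytes(), t.as_bytes())),
        Presented::Basic(user, pass) => cfg.basic.map_or(false, |(wu, wp)| {
            // Bitwise `&` so both comparisons always run.
            ct_eq(wu.as_bytes(), user.as_bytes()) & ct_eq(wp.as_bytes(), pass.as_bytes())
        }),
        Presented::None => false,
    }
}
```

Because the configuration holds one credential of each kind, a successful check says nothing about who the caller is, which is the gap discussed in Section 4.3.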
|
||||
### 4.2 Auth Flow in Handlers
|
||||
|
||||
```rust
fn require_auth(store: &SharedState, headers: &HeaderMap) -> Result<(), AppError> {
    let auth = store.auth();
    if auth.basic.is_none() && auth.bearer_token.is_none() {
        return Ok(()); // No auth configured → allow all
    }
    let value = headers.get(AUTHORIZATION)...;
    // Try Bearer first, then Basic
    // Constant-time comparison for Basic
}
```
|
||||
|
||||
### 4.3 Mapping to alknet Identity
|
||||
|
||||
alknet's `IdentityProvider` resolves credentials to an `Identity`. The mapping would be:
|
||||
|
||||
| gitserver auth | alknet equivalent | Resolution path |
|---|---|---|
| No auth | `Identity::anonymous()` or reject | Configurable policy |
| Basic auth (username/password) | `IdentityProvider::resolve_from_token()` | Map to AuthToken or direct lookup |
| Bearer token | `IdentityProvider::resolve_from_token()` | Token is already in the right format |
|
||||
|
||||
The key gap is that gitserver's auth is **single-credential, global**, while alknet needs **per-identity, per-repository** access control. Integration would require:
|
||||
|
||||
1. Replacing `AuthConfig` with alknet's `IdentityProvider`
|
||||
2. Extracting identity from the `Authorization` header
|
||||
3. Checking per-repo ACL based on resolved `Identity`
|
||||
|
||||
---
|
||||
|
||||
## 5. Storage
|
||||
|
||||
### 5.1 Filesystem-Based Storage
|
||||
|
||||
Gitserver currently stores repositories as **bare Git repositories on the local filesystem**. The storage model is:
|
||||
|
||||
```
ROOT/
├── project-a.git/        # bare repository
│   ├── HEAD
│   ├── objects/
│   ├── refs/
│   └── description
├── org/
│   └── project-b.git/    # nested repository (up to max_depth)
└── ...
```
|
||||
|
||||
The `RepoStore::discover(root, max_depth)` function:
|
||||
1. Canonicalizes the root path
|
||||
2. Recursively walks subdirectories up to `max_depth`
|
||||
3. Attempts `gix::open(path)` on each directory
|
||||
4. If `repo.is_bare()`, adds it as a `RepoInfo`
|
||||
5. Path traversal protection via lexical normalization + `canonicalize()` double-check
|
||||
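The lexical half of the path traversal protection in step 5 can be sketched with `std::path::Component`. This is a hypothetical helper (`normalize_repo_path` is not gitserver's `resolve_repo_path`), and it deliberately omits the `canonicalize()` double-check, which needs a real filesystem to catch symlink escapes.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalizes a client-supplied relative repo path and refuses
/// anything that could point outside the repository root.
pub fn normalize_repo_path(requested: &str) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in Path::new(requested).components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {} // `.` contributes nothing
            // `..`, absolute roots, and Windows prefixes are rejected outright
            // rather than resolved, which keeps the check easy to audit.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    if out.as_os_str().is_empty() {
        None
    } else {
        Some(out)
    }
}
```

A caller would join the returned path onto the canonical root and then compare the canonicalized result against that root before opening the repository.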
|
||||
The `DynamicRepoRegistry` allows programmatic registration/unregistration of repos at runtime, validated by `gix::open()` confirming the path is a bare repo.
|
||||
|
||||
### 5.2 Storage Abstraction Points
|
||||
|
||||
The key storage interaction points in the codebase are:
|
||||
|
||||
| Component | Storage Pattern |
|---|---|
| `RepoStore::discover()` | Filesystem scan (local directory tree) |
| `DynamicRepoRegistry` | In-memory registry with filesystem-backed paths |
| `GitBackend::new(repo_path)` | Opens a local bare repo via `gix::open()` |
| `receive_pack::write_pack()` | Writes pack to `objects/pack/` via `gix_pack::Bundle::write_to_directory()` |
| `path::resolve_repo_path()` | Canonical path resolution + traversal protection |
|
||||
|
||||
**All storage operations assume a local filesystem path.** There is no abstraction for remote or object storage backends.
|
||||
|
||||
### 5.3 Rustfs (S3-Compatible) Integration Feasibility
|
||||
|
||||
As gitserver uses it, gitoxide requires **a local filesystem**: `gix::open()` expects a directory with the standard Git repository layout (`objects/`, `refs/`, `HEAD`, etc.; a bare repository has no `.git` subdirectory). Rustfs (S3-compatible) cannot serve as a **direct** storage backend for gitoxide's repository operations because:
|
||||
|
||||
1. `gix::open()` requires a local path — it reads `HEAD`, refs, and object packs from the filesystem
|
||||
2. Pack generation (`generate_pack()`) streams objects from the local ODB
|
||||
3. Receive-pack writes pack files to the local `objects/pack/` directory
|
||||
4. Reference updates use `gix::Repository::edit_references()` which operates on the local refstore
|
||||
|
||||
However, rustfs **could** be used in several supporting roles:
|
||||
|
||||
| Integration Approach | Description | Feasibility |
|---|---|---|
| **Repo sync backend** | Store bare repo tarballs in rustfs; sync to local disk on demand | High — sync from S3 to local FS before serving |
| **Backup/archive** | Push repo backups to rustfs buckets | High — out-of-band backup |
| **Git LFS storage** | Store large file objects in rustfs via Git LFS | Medium — requires LFS server implementation |
| **Object store proxy** | Cache layer: serve from local FS, sync to/from rustfs | Medium — needs repo lifecycle management |
| **Direct S3 repo** | Custom `gix` object backend reading from S3 | Low — would require deep gitoxide customization |
|
||||
|
||||
The most practical approach: **use rustfs as a backing store for repository synchronization**. Gitserver would always operate on local filesystem paths, but a separate component would manage syncing repos to/from rustfs buckets.
|
||||
|
||||
---
|
||||
|
||||
## 6. SSH Support
|
||||
|
||||
### 6.1 Current State
|
||||
|
||||
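Gitserver has **no SSH transport capability**. It implements only the Smart HTTP protocol. Adding SSH support would require implementing Git's SSH transport, which carries the same pkt-line framing but differs in how the service is selected, how the client is authenticated, and how long the connection stays open: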
|
||||
|
||||
| Aspect | Smart HTTP | SSH |
|---|---|---|
| Transport | HTTP (request/response) | Persistent SSH channel |
| Service discovery | `GET /{repo}/info/refs?service=git-upload-pack` | SSH exec request `git-upload-pack '/path/to/repo.git'` |
| Protocol framing | pkt-line over HTTP | pkt-line over SSH channel |
| Authentication | HTTP Authorization header | SSH key-based |
| Multiplexing | HTTP/2 or separate connections | Multiple SSH channels |
|
||||
|
||||
### 6.2 How Git over SSH Works
|
||||
|
||||
The Git SSH protocol uses SSH as a transport for the same `git-upload-pack` and `git-receive-pack` commands:
|
||||
|
||||
```
|
||||
Client connects via SSH → server executes git-upload-pack or git-receive-pack
|
||||
Client ← SSH channel → Server (bidirectional pkt-line stream)
|
||||
```
|
||||
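Both transports carry the same pkt-line framing: each packet starts with a four-digit hexadecimal length that counts the four length bytes themselves, and `0000` is the flush packet. A minimal sketch of that framing follows; `pkt_line`, `read_pkt_line`, and `FLUSH_PKT` are hypothetical names, not the API of `gitserver-core`'s `pktline.rs`, and the special `0001` / `0002` packets of protocol v2 are simply rejected here.

```rust
/// The flush packet, which marks the end of a section of the stream.
pub const FLUSH_PKT: &[u8] = b"0000";

/// Largest total pkt-line size (length header plus payload) Git allows.
const MAX_PKT_LEN: usize = 65520;

/// Encodes one pkt-line: 4 lowercase hex digits giving the total length,
/// followed by the payload.
pub fn pkt_line(payload: &[u8]) -> Option<Vec<u8>> {
    let total = payload.len() + 4;
    if total > MAX_PKT_LEN {
        return None;
    }
    let mut out = format!("{:04x}", total).into_bytes();
    out.extend_from_slice(payload);
    Some(out)
}

/// Decodes the first pkt-line in `buf`, returning `(payload, rest)`.
/// A flush packet yields an empty payload.
pub fn read_pkt_line(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let header = buf.get(..4)?;
    if !header.iter().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let len = usize::from_str_radix(std::str::from_utf8(header).ok()?, 16).ok()?;
    if len == 0 {
        return Some((&buf[4..4], &buf[4..]));
    }
    if len < 4 {
        return None; // 0001..0003 are not ordinary data packets
    }
    let payload = buf.get(4..len)?;
    Some((payload, &buf[len..]))
}
```

Because the framing is identical, the same encoder and decoder could sit behind either an HTTP body or an SSH channel, which is what makes a transport-agnostic core plausible.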
|
||||
### 6.3 Integration with alknet's SSH Interface
|
||||
|
||||
alknet's SSH interface (`SshInterface`) is a `StreamInterface` — it accepts a persistent byte stream and multiplexes it into channels. This maps naturally to Git over SSH:
|
||||
|
||||
**Approach: Git as an alknet operation over SSH**
|
||||
|
||||
```
alknet SSH session
  │
  ├─ Channel: call protocol (operations)
  │
  └─ Channel: git-upload-pack
              OR git-receive-pack
                │
                ▼
      gitserver-core protocol logic
      (ref advertisement, pack generation, receive-pack)
```
|
||||
|
||||
This would work by:
|
||||
|
||||
1. The SSH interface receives a connection with a request like `git-upload-pack '/repos/project.git'`
|
||||
2. alknet resolves the identity from the SSH key fingerprint
|
||||
3. Checks ACL: does this identity have read/write access to this repo?
|
||||
4. Invokes `gitserver-core` functions directly (no HTTP needed):
|
||||
- `refs::advertise_refs()` → send over SSH channel
|
||||
- `pack::generate_pack()` → stream over SSH channel
|
||||
- `receive_pack::receive_pack()` → read/write over SSH channel
|
||||
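Step 1 of this flow depends on parsing the exec request string. A minimal sketch, assuming the request arrives as a single command line with an optionally single-quoted path; `GitService` and `parse_exec` are hypothetical names and do not come from alknet or gitserver.

```rust
#[derive(Debug, PartialEq)]
pub enum GitService {
    UploadPack,  // clone/fetch: needs read access
    ReceivePack, // push: needs write access
}

/// Parses an exec request such as `git-upload-pack '/repos/project.git'`.
/// Returns the requested service and the unquoted repository path.
pub fn parse_exec(command: &str) -> Option<(GitService, &str)> {
    let (name, arg) = command.trim().split_once(' ')?;
    let service = match name {
        "git-upload-pack" => GitService::UploadPack,
        "git-receive-pack" => GitService::ReceivePack,
        _ => return None, // anything else is not a git request
    };
    let arg = arg.trim();
    // Strip one pair of surrounding single quotes if present.
    let repo = arg
        .strip_prefix('\'')
        .and_then(|s| s.strip_suffix('\''))
        .unwrap_or(arg);
    // Refuse empty paths and stray quotes instead of guessing.
    if repo.is_empty() || repo.contains('\'') {
        return None;
    }
    Some((service, repo))
}
```

The returned service tells the ACL check in step 3 whether to ask for read or write access, and the returned path should still be normalized before it is resolved to a repository.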
|
||||
**Key advantage**: Since `gitserver-core` has no HTTP dependency, it can be used directly over SSH channels without the HTTP overhead. The `GitBackend` API is transport-agnostic.
|
||||
|
||||
### 6.4 Alternative: Dedicated Git SSH Adapter
|
||||
|
||||
A simpler approach that doesn't require modifying the SSH channel multiplexing:
|
||||
|
||||
```
|
||||
alknet SSH session → call protocol → operation "git/upload-pack" →
|
||||
→ GitAdapter::upload_pack(repo, wants, haves) → streaming response
|
||||
```
|
||||
|
||||
This treats Git operations as alknet call operations, where the SSH interface is the transport but Git operations are invoked via the call protocol rather than raw SSH channels. This is more aligned with alknet's architecture but requires adapting the Git protocol to the call protocol's request/response model (potentially with streaming).
|
||||
|
||||
---
|
||||
|
||||
## 7. Relevance to alknet
|
||||
|
||||
### 7.1 Mapping to alknet's Interface Model
|
||||
|
||||
Gitserver is a textbook **`MessageInterface`** implementation:
|
||||
|
||||
| alknet MessageInterface | Gitserver Equivalent |
|---|---|
| `handle_request(InterfaceRequest)` | `info_refs_dispatch()` / `rpc_dispatch()` |
| `InterfaceRequest.operation_path` | URL path (`/{repo}/info/refs`, `/{repo}/git-upload-pack`) |
| `InterfaceRequest.auth_token` | `Authorization` header → `require_auth()` |
| `InterfaceRequest.input` | Request body (pack negotiation data) |
| `InterfaceResponse.result` | HTTP response body (ref advertisement, pack data) |
| `InterfaceResponse.status` | HTTP status code |
| `InterfaceResponse.headers` | Content-Type, Cache-Control, etc. |
|
||||
|
||||
Gitserver also **manages its own transport** (the Axum HTTP server), which matches the `MessageInterface` pattern described in alknet's interface model: "MessageInterface implementations manage their own transport. They don't need the Transport trait because they're not wrapping a generic byte stream — they ARE the transport+interface combined."
|
||||
|
||||
### 7.2 Git as an alknet Operation
|
||||
|
||||
Git operations could be mapped to alknet's call protocol namespace:
|
||||
|
||||
```
Namespace: "git"
Operations:
  - git/list          → List available repositories
  - git/info-refs     → Get ref advertisement for a repo
  - git/upload-pack   → Clone/fetch (streaming response)
  - git/receive-pack  → Push (streaming request+response)
  - git/ls-refs       → Protocol v2 ls-refs
  - git/fetch         → Protocol v2 fetch
```
|
||||
|
||||
**Challenge**: Git operations are **streaming and bidirectional** (especially fetch negotiation and receive-pack), while alknet's call protocol is currently defined as request→response. This needs design consideration:
|
||||
|
||||
| Operation | Direction | Stream Duration | alknet Fit |
|---|---|---|---|
| `git/list` | Request → Response | Short | Direct fit |
| `git/info-refs` | Request → Response | Short | Direct fit |
| `git/upload-pack` | Request → Streaming Response | Long | Needs streaming response support |
| `git/receive-pack` | Streaming Request → Streaming Response | Long | Needs bidirectional streaming |
|
||||
|
||||
### 7.3 Proposed GitAdapter Architecture
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────┐
│                       alknet node                       │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ HttpInterface│  │ SshInterface │  │ DNS/other    │   │
│  │ (Message)    │  │ (Stream)     │  │ (Message)    │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ OperationRegistry                                │   │
│  │   "git/list"         → GitAdapter::list_repos()  │   │
│  │   "git/upload-pack"  → GitAdapter::upload_pack() │   │
│  │   "git/receive-pack" → GitAdapter::receive_pack()│   │
│  └──────────────┬───────────────────────────────────┘   │
│                 │                                       │
│                 ▼                                       │
│  ┌──────────────────────────────┐                       │
│  │ GitAdapter                   │                       │
│  │ - SharedState (repos, auth)  │                       │
│  │ - GitBackend (protocol ops)  │                       │
│  │ - IdentityProvider (auth)    │                       │
│  │ - RepoResolver (filesystem)  │                       │
│  └──────────────┬───────────────┘                       │
│                 │                                       │
│                 ▼                                       │
│  ┌──────────────────────────────┐  ┌────────────────┐   │
│  │ Local filesystem             │  │ Rustfs sync    │   │
│  │ (bare git repos)             │  │ (S3 backend)   │   │
│  └──────────────────────────────┘  └────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### 7.4 Auth Integration: alknet Identity → Gitserver Auth
|
||||
|
||||
**Current gitserver auth** (single global credential):
|
||||
```rust
AuthConfig {
    basic: Option<BasicAuthConfig>, // one username/password
    bearer_token: Option<String>,   // one token
}
```
|
||||
|
||||
**Proposed alknet integration** (per-identity, per-repo):
|
||||
```rust
struct GitAdapter {
    identity_provider: Arc<dyn IdentityProvider>,
    repo_resolver: Arc<dyn RepoResolver>,
    backend_factory: Arc<dyn GitBackendFactory>,
    acl: Arc<dyn GitAcl>,
}

impl GitAdapter {
    async fn handle_request(
        &self,
        request: InterfaceRequest,
    ) -> Result<InterfaceResponse> {
        // 1. Resolve identity from auth token
        let identity = self.identity_provider
            .resolve_from_token(request.auth_token)?;

        // 2. Parse git operation from path
        let operation = parse_git_operation(&request.operation_path)?;

        // 3. Check ACL
        self.acl.check_access(&identity, &operation.repo, operation.access_type)?;

        // 4. Dispatch to gitserver-core logic
        // ...
    }
}
```
|
||||
|
||||
**ACL design** (per-repo, per-operation):
|
||||
```rust
enum GitAccess {
    Read,  // clone, fetch
    Write, // push
}

trait GitAcl: Send + Sync {
    fn check_access(
        &self,
        identity: &Identity,
        repo: &str,
        access: GitAccess,
    ) -> Result<()>;
}
```
|
||||
|
||||
### 7.5 Storage Integration with Rustfs
|
||||
|
||||
**Recommended approach**: Rustfs as a sync backend:
|
||||
|
||||
```rust
trait RepoStorage: Send + Sync {
    /// Ensure a local working copy exists for the given repo.
    /// May involve syncing from S3 (rustfs) to local disk.
    async fn ensure_local(&self, repo: &str) -> Result<PathBuf>;

    /// Sync local changes back to S3 (rustfs) after a push.
    async fn sync_to_remote(&self, repo: &str) -> Result<()>;

    /// List available repos (may consult S3 bucket listing).
    async fn list_repos(&self) -> Result<Vec<RepoInfo>>;
}
```
|
||||
|
||||
The flow would be:
|
||||
1. `GitAdapter` receives a request for repo `X`
|
||||
2. `RepoStorage::ensure_local("X")` checks if the repo exists on local disk; if not, syncs from rustfs
|
||||
3. Git operations run on the local filesystem (using `gitserver-core` directly)
|
||||
4. After push operations, `RepoStorage::sync_to_remote("X")` pushes updates to rustfs
|
||||
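The four steps above can be exercised against an in-memory stand-in before any S3 code exists. In this sketch `MockStorage` and `serve` are hypothetical names, the methods are synchronous for brevity, and two `HashSet`s model the rustfs bucket and the local disk; none of this is taken from alknet or gitserver.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the sync layer: `remote` models the rustfs
/// bucket, `local` models repositories already present on disk.
#[derive(Default)]
pub struct MockStorage {
    pub remote: HashSet<String>,
    pub local: HashSet<String>,
    pub downloads: usize,
    pub uploads: usize,
}

impl MockStorage {
    /// Step 2: make sure the repo is on local disk, syncing if needed.
    pub fn ensure_local(&mut self, repo: &str) -> Result<(), String> {
        if self.local.contains(repo) {
            return Ok(()); // already local, no transfer needed
        }
        if !self.remote.contains(repo) {
            return Err(format!("unknown repo: {}", repo));
        }
        self.downloads += 1;
        self.local.insert(repo.to_string());
        Ok(())
    }

    /// Step 4: push local state back to the bucket.
    pub fn sync_to_remote(&mut self, repo: &str) -> Result<(), String> {
        if !self.local.contains(repo) {
            return Err(format!("repo not local: {}", repo));
        }
        self.uploads += 1;
        self.remote.insert(repo.to_string());
        Ok(())
    }
}

/// Steps 1 to 4 in order: localize, run the git operation, and sync back
/// only when the operation was a push.
pub fn serve(storage: &mut MockStorage, repo: &str, is_push: bool) -> Result<(), String> {
    storage.ensure_local(repo)?;
    // Step 3 would run the git operation against the local path here.
    if is_push {
        storage.sync_to_remote(repo)?;
    }
    Ok(())
}
```

The counters make the intended behavior visible in tests: reads trigger at most one download per repo, and only pushes trigger uploads. Concurrent pushes and partial sync failures are the parts this sketch does not address.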
|
||||
This maintains gitserver's requirement for a local filesystem while leveraging rustfs for durability and distribution.
|
||||
|
||||
### 7.6 Operation Mapping
|
||||
|
||||
| Git Operation | alknet Namespace | alknet Op | Input | Output | Stream? |
|---|---|---|---|---|---|
| List repos | `git` | `list` | `{}` | `[RepoInfo]` | No |
| Ref advertisement (v1) | `git` | `info-refs` | `{repo, service: "upload-pack" \| "receive-pack"}` | Binary ref advertisement | No |
| Ref capabilities (v2) | `git` | `capabilities` | `{repo}` | Binary capabilities | No |
| Ls-refs (v2) | `git` | `ls-refs` | `{repo, peel, symrefs, ref_prefixes}` | Binary ref listing | No |
| Clone/Fetch | `git` | `upload-pack` | `{repo, wants, haves, done, ...}` | Streamed pack data | Yes (response) |
| Push | `git` | `receive-pack` | `{repo, commands, pack_data}` | Status report | Yes (both) |
|
||||
|
||||
### 7.7 What gitserver-core Provides Directly
|
||||
|
||||
The most valuable integration point is `gitserver-core` — the HTTP-free protocol library:
|
||||
|
||||
```rust
|
||||
// Direct usage without HTTP
|
||||
use gitserver_core::backend::GitBackend;
|
||||
use gitserver_core::discovery::RepoStore;
|
||||
use gitserver_core::pack::{UploadPackRequest, UploadPackCapabilities, ShallowRequest};
|
||||
use gitserver_core::protocol_v2;
|
||||
|
||||
// Repository discovery
|
||||
let store = RepoStore::discover("./repos".into(), 3)?;
|
||||
let repo = store.resolve("my-project.git")?;
|
||||
|
||||
// Protocol v1 ref advertisement
|
||||
let backend = GitBackend::new(repo.absolute_path.clone());
|
||||
let refs = backend.advertise_refs()?;
|
||||
|
||||
// Pack generation (streaming)
|
||||
let request = UploadPackRequest { wants, haves, done, ... };
|
||||
let pack_stream = backend.upload_pack(&request).await?;
|
||||
|
||||
// Receive-pack (push)
|
||||
let result = backend.receive_pack(request_stream).await?;
|
||||
|
||||
// Protocol v2
|
||||
let capabilities = protocol_v2::advertise_capabilities();
|
||||
let ls_refs_output = protocol_v2::ls_refs(&repo_path, &ls_refs_request)?;
|
||||
let fetch_output = backend.upload_pack(&fetch_request.upload_request).await?;
|
||||
```
|
||||
|
||||
These functions can be called from any async context — SSH channel handler, alknet operation handler, HTTP handler — without going through the Axum HTTP layer.
|
||||
|
||||
---
|
||||
|
||||
## 8. Integration Recommendations
|
||||
|
||||
### 8.1 Recommended Integration Strategy
|
||||
|
||||
**Phase 1: HTTP Gateway (MessageInterface)**
|
||||
|
||||
Embed gitserver-http's Axum router into alknet's HTTP interface. This provides immediate Git-over-HTTP capability:
|
||||
|
||||
```rust
|
||||
// In alknet's HttpInterface::handle_request()
|
||||
// Route: /git/* → gitserver router
|
||||
let git_app = gitserver_http::router(git_state);
|
||||
let app = Router::new()
|
||||
.nest("/git", git_app) // Mount git under /git
|
||||
.route("/v1/{namespace}/{op}", post(operation_handler));
|
||||
```
|
||||
|
||||
This works because gitserver is designed to be nested into existing Axum apps. Auth integration would replace `AuthConfig` with alknet's `IdentityProvider`.
|
||||
|
||||
**Phase 2: SSH Git Adapter (StreamInterface)**
|
||||
|
||||
Use `gitserver-core` directly within alknet's SSH interface for Git-over-SSH:
|
||||
|
||||
```rust
|
||||
// In alknet's SshInterface channel handler
|
||||
// SSH channel request: "git-upload-pack '/repos/project.git'"
|
||||
let backend = GitBackend::new(repo_path);
|
||||
let refs = backend.advertise_refs()?;
|
||||
// Send refs over SSH channel
|
||||
// Stream pack data over SSH channel
|
||||
```
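
Before any `GitBackend` call, the SSH channel handler has to turn the exec request string into a service and a repository path. The sketch below shows one way to do that for the `git-upload-pack '/repos/project.git'` form quoted above. The names `GitService` and `parse_git_exec` are hypothetical, and the `..` rejection is an extra safety assumption, not something gitserver or alknet is known to do.

```rust
/// Which Git service the SSH client asked for (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GitService {
    UploadPack,  // clone / fetch
    ReceivePack, // push
}

/// Parse an SSH exec command such as `git-upload-pack '/repos/project.git'`.
/// Returns `None` for anything that is not a recognised Git service call.
pub fn parse_git_exec(command: &str) -> Option<(GitService, String)> {
    let (verb, rest) = command.trim().split_once(' ')?;
    let service = match verb {
        "git-upload-pack" => GitService::UploadPack,
        "git-receive-pack" => GitService::ReceivePack,
        _ => return None,
    };

    // Git clients wrap the path in single quotes; strip one matching pair.
    let raw = rest.trim();
    let path = raw
        .strip_prefix('\'')
        .and_then(|p| p.strip_suffix('\''))
        .unwrap_or(raw);

    // Assumption: reject empty paths and parent-directory traversal outright.
    if path.is_empty() || path.split('/').any(|segment| segment == "..") {
        return None;
    }
    Some((service, path.to_string()))
}
```

The parsed service maps naturally onto the ACL check (`UploadPack` needs read access, `ReceivePack` needs write access) before the repository is resolved and handed to `gitserver-core`.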
|
||||
|
||||
**Phase 3: Call Protocol Operations (OperationRegistry)**
|
||||
|
||||
Register Git operations in the operation registry for access via any interface:
|
||||
|
||||
```rust
|
||||
registry.register(GitListRepos::new(adapter.clone()));
|
||||
registry.register(GitUploadPack::new(adapter.clone()));
|
||||
registry.register(GitReceivePack::new(adapter.clone()));
|
||||
```
|
||||
|
||||
### 8.2 Key Modifications Needed
|
||||
|
||||
1. **Auth replacement**: Replace `AuthConfig` with `IdentityProvider`-based auth in `handlers.rs`'s `require_auth()` function
|
||||
2. **ACL addition**: Add per-repo, per-identity access control (gitserver currently has none)
|
||||
3. **RepoResolver abstraction**: Replace `RepoStore`/`DynamicRepoRegistry` with alknet's `RepoResolver` that integrates with rustfs sync
|
||||
4. **Streaming response support**: Adapt alknet's call protocol for streaming (large pack files)
|
||||
5. **Bidirectional streaming**: For receive-pack, the call protocol needs to support bidirectional streaming
|
||||
|
||||
### 8.3 Risks and Mitigations
|
||||
|
||||
| Risk | Mitigation |
|---|---|
| gitserver requires local filesystem | Use rustfs as sync backend; maintain local working copies |
| Auth is global (single credential) | Fork/modify `require_auth()` to use `IdentityProvider` |
| No per-repo ACL | Add `GitAcl` trait in the adapter layer |
| MPL-2.0 license requires modifications to be under MPL-2.0 | Acceptable for alknet (MPL-2.0 is file-level copyleft) |
| Large pack files may not fit alknet's message size limits | Implement streaming response in the call protocol |
| gitoxide version coupling | Pin `gix = "0.80.0"` as gitserver does |
|
||||
|
||||
### 8.4 License Considerations
|
||||
|
||||
- **Primary license**: MPL-2.0 (file-level copyleft)
|
||||
- **Upstream portions**: MIT (preserved in UPSTREAM-LICENSE)
|
||||
- **Implication**: Modifications to gitserver's `.rs` files must remain under MPL-2.0. Linking from alknet code is unrestricted.
|
||||
- **Recommendation**: Use gitserver as a library dependency. If alknet-specific auth/ACL modifications are needed, contribute them upstream or maintain them as separate files under MPL-2.0.
|
||||
|
||||
---
|
||||
|
||||
## 9. Summary
|
||||
|
||||
### 9.1 Key Findings
|
||||
|
||||
1. **gitserver is a well-structured, library-first Rust Git Smart HTTP server** with clean separation between protocol logic (`gitserver-core`) and HTTP transport (`gitserver-http`).
|
||||
2. **Protocol support is comprehensive**: Git Smart HTTP v1 and v2, clone, fetch, push (opt-in), shallow clones, delta compression, streaming pack generation.
|
||||
3. **No SSH support exists**, but `gitserver-core` is transport-agnostic and can serve Git operations over any channel.
|
||||
4. **Auth is simple but limited**: single global Basic/Bearer credential, no per-repo or per-user ACL.
|
||||
5. **Storage is local-filesystem only**: `gix::open()` requires a local path. S3/rustfs integration requires a sync-to-local approach.
|
||||
6. **The library design enables direct integration**: `GitBackend` and protocol functions can be called without HTTP.
|
||||
|
||||
### 9.2 Recommendation
|
||||
|
||||
**Use `gitserver-core` as alknet's Git protocol engine.** The core crate provides all Git protocol operations (ref advertisement, pack generation, receive-pack, protocol v2) without any HTTP dependency. This allows alknet to expose Git services through any interface (HTTP, SSH, call protocol) while maintaining a single protocol implementation.
|
||||
|
||||
**Use `gitserver-http` as alknet's Git HTTP interface** by nesting its Axum router under alknet's HTTP interface, with auth replaced by `IdentityProvider`.
|
||||
|
||||
**Design a `GitAdapter`** that wraps `gitserver-core` and integrates with alknet's `OperationRegistry`, `IdentityProvider`, and rustfs-backed storage.
|
||||
|
||||
### 9.3 Next Steps
|
||||
|
||||
1. Fork or vendor `gitserver-core` and `gitserver-http` into alknet's dependency tree
|
||||
2. Design the `GitAdapter` trait with `IdentityProvider` auth and `GitAcl` access control
|
||||
3. Implement Phase 1: HTTP gateway with nested Axum router and `IdentityProvider` auth
|
||||
4. Implement `RepoStorage` trait with rustfs sync-to-local strategy
|
||||
5. Design streaming extensions to alknet's call protocol for pack file transfer
|
||||
6. Evaluate Phase 2: SSH Git adapter using `gitserver-core` directly over SSH channels
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [gitserver README](https://github.com/WJQSERVER/gitserver) — project overview, quick start, CLI usage
|
||||
- [gitserver Architecture docs](docs/en/architecture.md) — crate responsibilities, request flows
|
||||
- [gitserver Library docs](docs/en/library.md) — embedding, dynamic registration, auth config
|
||||
- [gitserver API Reference](docs/en/api.md) — REST endpoints, protocol details, error codes
|
||||
- [alknet Interface Model](../../phase2/interface-model.md) — StreamInterface/MessageInterface design
|
||||
- [gitoxide](https://github.com/GitoxideLabs/gitoxide) — underlying Git implementation library
|
||||
@@ -1,857 +0,0 @@
|
||||
# Research: Honker — SQLite Pub/Sub, Queue, and Notification Extension
|
||||
|
||||
## Key Findings
|
||||
|
||||
- **Honker is a Rust-based SQLite extension** that adds Postgres-style NOTIFY/LISTEN semantics plus durable pub/sub, task queues, and event streams entirely within SQLite. It eliminates the need for a separate message broker (Redis, Kafka) when SQLite is the primary datastore.
|
||||
- **Three core primitives**: `notify/listen` (ephemeral pub/sub), `stream` (durable pub/sub with per-consumer offsets), and `queue` (at-least-once work queue with retries, priority, delayed jobs, and dead-letter handling). All three are SQL INSERTs inside your transaction — business write and side-effect commit or roll back together.
|
||||
- **Wake mechanism**: Uses `PRAGMA data_version` polling at 1ms granularity to detect commits, achieving ~1-2ms median cross-process wake latency without requiring a daemon or broker. A single thread per database fans out to N subscribers via bounded channels.
|
||||
- **Single-machine, single-writer model**: Designed for self-hosted deployments. Not distributed — no multi-node replication. This maps perfectly to alknet's per-node architecture where domain events are internal to a service boundary (ADR-032).
|
||||
- **Comprehensive SQL API**: 30+ SQL scalar functions (`honker_enqueue`, `honker_claim_batch`, `honker_ack_batch`, `honker_stream_publish`, `honker_stream_read_since`, `honker_stream_save_offset`, `notify`, `honker_lock_acquire`, `honker_rate_limit_try`, `honker_scheduler_register`, etc.) registered as a loadable SQLite extension. Any language that can `SELECT load_extension('honker')` gets the same features.
|
||||
- **Rust core (`honker-core`)**: All SQL implementations live in a shared Rust crate consumed by the loadable extension, PyO3 Python binding, napi-rs Node binding, and other language wrappers. One source of truth for the SQL — no behavioral drift across bindings.
|
||||
- **License**: Apache 2.0 / MIT dual-license. Fully permissive for integration.
|
||||
|
||||
**Recommendation**: Adopt honker's patterns directly in `alknet-storage`. The `honker` crate (or `honker-core` for a Rust-native integration) should be a dependency of `alknet-storage`. Honker's single-node model aligns with alknet's event boundary discipline — domain events stay within the service boundary, and cross-node events go through the call protocol. For production deployments that use Postgres instead of SQLite, the same patterns (queue/claim, stream/subscribe, notify/listen) can be replicated using Postgres features, but honker's built-in retry, visibility timeout, and scheduling would need to be reimplemented.
|
||||
|
||||
---
|
||||
|
||||
## 1. Architecture
|
||||
|
||||
### What Is Honker?
|
||||
|
||||
Honker is a **SQLite extension + language bindings** that adds Postgres-style `NOTIFY`/`LISTEN` semantics to SQLite, with built-in durable pub/sub, task queues, and event streams — without requiring a client-polling loop, a daemon, or a separate broker.
|
||||
|
||||
**Core idea**: If SQLite is your primary datastore, your queue should live in the same file. `INSERT INTO orders` and `queue.enqueue(...)` commit in the same transaction. Rollback drops both.
|
||||
|
||||
**Implementation language**: Rust. The shared engine is `honker-core`, a plain Rust `rlib` crate. Language bindings (Python via PyO3, Node via napi-rs, Go via CGo, Ruby via C extension, .NET via P/Invoke, JVM via JNI, Kotlin wrapper, Elixir via NIF, C++ via header-only wrapper) are thin wrappers around the loadable extension's SQL functions.
|
||||
|
||||
**How it works as a SQLite extension**: The `honker-extension` crate compiles to `libhonker_ext.{so,dylib,dll}`. Any SQLite 3.9+ client loads it:
|
||||
|
||||
```sql
|
||||
.load ./libhonker_ext
|
||||
SELECT honker_bootstrap();
|
||||
```
|
||||
|
||||
This creates the schema tables (`_honker_live`, `_honker_dead`, `_honker_notifications`, `_honker_stream`, `_honker_stream_consumers`, `_honker_locks`, `_honker_rate_limits`, `_honker_scheduler_tasks`, `_honker_results`) and registers all SQL scalar functions. The extension and Python/binding tables are shared, so a Python worker can claim jobs any other language pushed via the extension.
|
||||
|
||||
### Crate Structure
|
||||
|
||||
```
|
||||
honker-core/ # Rust rlib shared across all bindings (published on crates.io)
|
||||
honker-extension/ # SQLite loadable extension (cdylib, published on crates.io)
|
||||
packages/
|
||||
honker/ # Python package (PyO3 cdylib + Queue/Stream/Outbox/Scheduler)
|
||||
honker-node/ # napi-rs Node.js binding
|
||||
honker-rs/ # Ergonomic Rust wrapper
|
||||
honker-go/ # Go binding
|
||||
honker-ruby/ # Ruby binding
|
||||
honker-bun/ # Bun binding
|
||||
honker-ex/ # Elixir binding
|
||||
honker-cpp/ # C++ binding
|
||||
honker-dotnet/ # .NET / C# binding
|
||||
honker-jvm/ # JVM / Java-compatible binding
|
||||
honker-kotlin/ # Kotlin convenience wrapper
|
||||
```
|
||||
|
||||
### Wake Path Architecture
|
||||
|
||||
The fundamental challenge for any SQLite-based pub/sub system: SQLite has no wire protocol or server-push. Consumers must initiate reads. Honker solves this with a **single-digit-microsecond `PRAGMA data_version` read**:
|
||||
|
||||
1. **One PRAGMA-poll thread per `Database`** queries `data_version` every 1ms
2. Counter change → fan out a tick to each subscriber's bounded channel (capacity 1 — coalesces redundant wakes)
3. Each subscriber runs `SELECT … WHERE id > last_seen` against a partial index, yields rows, returns to wait
4. 100 subscribers = 1 poll thread. Idle listeners run zero SQL queries.
|
||||
|
||||
Idle cost: ~3.5µs per `PRAGMA data_version` query, ~3.5ms/sec total at 1kHz. A 5-second paranoia poll exists as a fallback only if the update watcher cannot fire.
|
||||
|
||||
**Three backend options** (controlled by `WatcherBackend` enum):
|
||||
- **Polling** (default, stable): `PRAGMA data_version` every 1ms. Correct on all platforms.
|
||||
- **Kernel** (experimental, `kernel-watcher` Cargo feature): Uses `notify-rs` filesystem events. Fires on every filesystem write. May produce spurious/missed wakes. Dead-man's switch for file replacement.
|
||||
- **SHM fast path** (experimental, `shm-fast-path` Cargo feature): Memory-maps the `-shm` WAL index file and reads `iChange` at offset 8 at ~100µs cadence. WAL-mode only. Dead-man's switch for file replacement.
|
||||
|
||||
**Dead-man's switch**: All backends check file identity `(dev, ino)` / `(volume_serial, file_index)` every ~100ms. If the database file is replaced (atomic rename, litestream restore, volume remount), the watcher panics with a clear error message. Subscribers see an error from `update_events()` instead of hanging silently.
|
||||
|
||||
### SharedUpdateWatcher
|
||||
|
||||
```rust
pub struct SharedUpdateWatcher {
    watcher: Mutex<Option<UpdateWatcher>>,             // background poll thread
    senders: Arc<Mutex<HashMap<u64, SyncSender<()>>>>, // fan-out channels
    next_id: AtomicU64,
}
```
|
||||
|
||||
- `subscribe()` → `(u64, Receiver<()>)` — register a channel; capacity 1
|
||||
- `unsubscribe(id)` — remove channel; receiver sees `Err(RecvError)`
|
||||
- `close()` — join the poll thread, clear all subscribers
|
||||
- Wakes are idempotent "go re-read state" signals. Dropped redundant wakes never lose data.
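
The coalescing behaviour described in these bullets can be sketched with the standard library's bounded channels. This is an illustration of the pattern only, not honker's code: the `FanOut` type and its `tick` method are hypothetical, and the real watcher drives the tick from its poll thread rather than from the caller.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Mutex;

/// Illustrative fan-out registry (hypothetical, modelled on the struct above).
#[derive(Default)]
pub struct FanOut {
    senders: Mutex<HashMap<u64, SyncSender<()>>>,
    next_id: AtomicU64,
}

impl FanOut {
    /// Register a subscriber with a capacity-1 wake channel.
    pub fn subscribe(&self) -> (u64, Receiver<()>) {
        let (tx, rx) = sync_channel(1);
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.senders.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Drop the sender; the receiver sees a disconnect once drained.
    pub fn unsubscribe(&self, id: u64) {
        self.senders.lock().unwrap().remove(&id);
    }

    /// Called once per observed commit. Returns how many wakes were queued.
    pub fn tick(&self) -> usize {
        let mut queued = 0;
        self.senders.lock().unwrap().retain(|_, tx| match tx.try_send(()) {
            Ok(()) => {
                queued += 1;
                true
            }
            // A wake is already pending: dropping this one loses nothing,
            // because the subscriber re-reads state from the database anyway.
            Err(TrySendError::Full(())) => true,
            // Receiver was dropped: forget the subscriber.
            Err(TrySendError::Disconnected(())) => false,
        });
        queued
    }
}
```

Because a wake carries no payload, a full channel is not an error condition. The subscriber's next `SELECT … WHERE id > last_seen` picks up every row committed since its last read, however many ticks were coalesced.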
|
||||
|
||||
---
|
||||
|
||||
## 2. Core Capabilities
|
||||
|
||||
### 2.1 Notify/Listen — Ephemeral Pub/Sub
|
||||
|
||||
**What it is**: Fire-and-forget notifications to channel subscribers. Like `pg_notify` but with table-backed persistence until explicitly pruned.
|
||||
|
||||
**How it works**:
|
||||
- `notify(channel, payload)` is a SQL scalar function that INSERTs into `_honker_notifications` and returns the row id. Runs inside the caller's open transaction — rollbacks drop the notification.
|
||||
- `db.listen(channel)` or `db.updateEvents()` in Node — registers a subscriber that wakes on any database commit, then filters by channel in the `SELECT` path.
|
||||
- Listeners attach at current `MAX(id)`; **history is not replayed**. This is the key distinction from streams.
|
||||
|
||||
**Schema**:
|
||||
```sql
|
||||
CREATE TABLE _honker_notifications (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
channel TEXT NOT NULL,
|
||||
payload TEXT NOT NULL,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch())
|
||||
);
|
||||
CREATE INDEX _honker_notifications_recent ON _honker_notifications(channel, id);
|
||||
```
|
||||
|
||||
**Key characteristics**:
|
||||
- Not auto-pruned. Call `db.prune_notifications(older_than_s=…, max_keep=…)` from a scheduled task.
|
||||
- Over-triggering is by design: a `data_version` change wakes every subscriber on that database, not just the matching channel. Each wasted wake = one indexed SELECT (microseconds). A missed wake = a silent correctness bug.
|
||||
- Payload must be valid JSON for cross-language compatibility.
|
||||
|
||||
### 2.2 Queue — At-Least-Once Work Queue
|
||||
|
||||
**What it is**: Durable, at-least-once delivery work queue with retries, priority, delayed jobs, task expiration, dead-letter handling, named locks, and rate-limiting.
|
||||
|
||||
**Schema (single-table hybrid)**:
|
||||
|
||||
```sql
|
||||
CREATE TABLE _honker_live (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
queue TEXT NOT NULL,
|
||||
payload TEXT NOT NULL,
|
||||
state TEXT NOT NULL DEFAULT 'pending', -- 'pending' | 'processing'
|
||||
priority INTEGER NOT NULL DEFAULT 0,
|
||||
run_at INTEGER NOT NULL DEFAULT (unixepoch()), -- for delayed jobs
|
||||
worker_id TEXT,
|
||||
claim_expires_at INTEGER, -- visibility timeout
|
||||
attempts INTEGER NOT NULL DEFAULT 0,
|
||||
max_attempts INTEGER NOT NULL DEFAULT 3,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
expires_at INTEGER -- job expiration
|
||||
);
|
||||
|
||||
CREATE INDEX _honker_live_claim
|
||||
ON _honker_live(queue, priority DESC, run_at, id)
|
||||
WHERE state IN ('pending', 'processing');
|
||||
|
||||
CREATE TABLE _honker_dead (
|
||||
id INTEGER PRIMARY KEY,
|
||||
queue TEXT NOT NULL,
|
||||
payload TEXT NOT NULL,
|
||||
priority INTEGER NOT NULL DEFAULT 0,
|
||||
run_at INTEGER NOT NULL DEFAULT 0,
|
||||
attempts INTEGER NOT NULL DEFAULT 0,
|
||||
max_attempts INTEGER NOT NULL DEFAULT 0,
|
||||
last_error TEXT,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch()),
|
||||
died_at INTEGER NOT NULL DEFAULT (unixepoch())
|
||||
);
|
||||
```
|
||||
|
||||
**Claim/ack/nack model**:
|
||||
|
||||
| Operation | SQL | Notes |
|-----------|-----|-------|
| Enqueue | `INSERT INTO _honker_live (queue, payload, run_at, priority, max_attempts, expires_at) VALUES (…)` | Returns auto-increment id |
| Claim | `UPDATE _honker_live SET state='processing', worker_id=?, claim_expires_at=unixepoch()+?, attempts=attempts+1 WHERE id IN (SELECT id FROM _honker_live WHERE queue=? AND state IN ('pending','processing') AND (expires_at IS NULL OR expires_at > unixepoch()) AND ((state='pending' AND run_at <= unixepoch()) OR (state='processing' AND claim_expires_at < unixepoch())) ORDER BY priority DESC, run_at ASC, id ASC LIMIT ?) RETURNING …` | One `UPDATE … RETURNING` via partial index |
| Ack | `DELETE FROM _honker_live WHERE id=? AND worker_id=? AND claim_expires_at >= unixepoch() RETURNING id` | Returns 1 if claim still valid, 0 if expired |
| Retry | `UPDATE _honker_live SET state='pending', run_at=unixepoch()+?, worker_id=NULL, claim_expires_at=NULL WHERE id=?` + notify on queue channel | If `attempts >= max_attempts`, DELETE from `_honker_live` and INSERT into `_honker_dead` |
| Fail | `DELETE FROM _honker_live WHERE id=? AND worker_id=? AND claim_expires_at >= unixepoch() RETURNING …` + `INSERT INTO _honker_dead` | Unconditionally move to dead letter |
| Heartbeat | `UPDATE _honker_live SET claim_expires_at=unixepoch()+? WHERE id=? AND worker_id=? AND state='processing'` | Extend claim for long-running handlers |
| Cancel | `DELETE FROM _honker_live WHERE id=? AND state IN ('pending', 'processing')` | Idempotent |
|
||||
|
||||
**Visibility timeout**: Default 300 seconds (`claim_expires_at = unixepoch() + 300`). If a worker crashes mid-job, the claim expires and another worker reclaims. `attempts` increments. After `max_attempts` (default 3), the row moves to `_honker_dead`.
|
||||
|
||||
**Priority**: Higher `priority` value = claimed first. The partial index on `(queue, priority DESC, run_at, id)` ensures claim path is bounded by working-set size, not history size.
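
The claim rule is easier to follow outside SQL. The sketch below restates the `WHERE` and `ORDER BY` clauses of the claim statement as plain Rust over an in-memory slice. The `Job`, `State`, `claimable`, and `claim` names are hypothetical, time is passed in explicitly instead of calling `unixepoch()`, and the dead-letter move after `max_attempts` is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Pending,
    Processing,
}

/// In-memory stand-in for a `_honker_live` row (illustrative subset of columns).
#[derive(Debug, Clone)]
pub struct Job {
    pub id: i64,
    pub state: State,
    pub priority: i64,
    pub run_at: i64,
    pub claim_expires_at: Option<i64>,
    pub expires_at: Option<i64>,
    pub attempts: i64,
}

impl Job {
    pub fn pending(id: i64, priority: i64, run_at: i64) -> Job {
        Job {
            id,
            state: State::Pending,
            priority,
            run_at,
            claim_expires_at: None,
            expires_at: None,
            attempts: 0,
        }
    }
}

/// Mirrors the claim WHERE clause: not expired, and either due or abandoned.
pub fn claimable(job: &Job, now: i64) -> bool {
    let not_expired = job.expires_at.map_or(true, |t| t > now);
    let ready = match job.state {
        State::Pending => job.run_at <= now,
        State::Processing => job.claim_expires_at.map_or(false, |t| t < now),
    };
    not_expired && ready
}

/// Claim up to `limit` jobs, ordered by priority DESC, run_at ASC, id ASC.
pub fn claim(jobs: &mut [Job], now: i64, limit: usize, timeout_s: i64) -> Vec<i64> {
    let mut picked: Vec<usize> = (0..jobs.len())
        .filter(|&i| claimable(&jobs[i], now))
        .collect();
    picked.sort_by_key(|&i| (std::cmp::Reverse(jobs[i].priority), jobs[i].run_at, jobs[i].id));
    picked.truncate(limit);
    for &i in &picked {
        jobs[i].state = State::Processing;
        jobs[i].claim_expires_at = Some(now + timeout_s); // visibility timeout
        jobs[i].attempts += 1;
    }
    picked.iter().map(|&i| jobs[i].id).collect()
}
```

Note that a job stuck in `processing` becomes claimable again only once its `claim_expires_at` has passed, which is how a crashed worker's job gets picked up by another worker.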
|
||||
|
||||
**Delayed jobs**: Set `run_at` to a future timestamp. Workers only claim rows where `run_at <= unixepoch()`. The `run_at` deadline also wakes sleeping workers through `honker_queue_next_claim_at()`.
|
||||
|
||||
**Task expiration**: Set `expires_at` on enqueue. Expired jobs are filtered from the claim path. Call `queue.sweep_expired()` to move them to `_honker_dead` with `last_error='expired'`.
|
||||
|
||||
**Named locks**: `honker_lock_acquire(name, owner, ttl_s)` → 1 (got it) or 0 (held). `honker_lock_release(name, owner)` → 1 (released) or 0 (not yours). Uses `_honker_locks` table with TTL-based expiration. Primary use case: cron tasks that shouldn't overlap (leader election).
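
As a rough model of the acquire/release contract, the following sketch keeps the lock table in memory. The `NamedLocks` type is hypothetical, time is passed in explicitly, and two behaviours are assumptions rather than documented honker semantics: re-acquiring a lock you already hold refreshes its TTL, and release does not check expiry.

```rust
use std::collections::HashMap;

/// In-memory model of a TTL lock table: name -> (owner, expires_at).
#[derive(Default)]
pub struct NamedLocks {
    held: HashMap<String, (String, i64)>,
}

impl NamedLocks {
    /// Returns true if `owner` now holds the lock (like the SQL function's 1).
    pub fn acquire(&mut self, name: &str, owner: &str, ttl_s: i64, now: i64) -> bool {
        let held_by_other = matches!(
            self.held.get(name),
            Some((current, expires_at)) if *expires_at > now && current.as_str() != owner
        );
        if held_by_other {
            return false;
        }
        // Free, expired, or already ours: take it and (re)start the TTL.
        self.held
            .insert(name.to_string(), (owner.to_string(), now + ttl_s));
        true
    }

    /// Returns true only if `owner` was the holder (like the SQL function's 1).
    pub fn release(&mut self, name: &str, owner: &str) -> bool {
        let is_owner = matches!(
            self.held.get(name),
            Some((current, _)) if current.as_str() == owner
        );
        if is_owner {
            self.held.remove(name);
        }
        is_owner
    }
}
```

The TTL is what makes this usable for leader election: a scheduler that crashes without releasing stops blocking its peers once the lock expires.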
|
||||
|
||||
**Rate limiting**: `honker_rate_limit_try(name, limit, per)` → 1 (under limit) or 0 (at limit). Fixed-window counter. Rejected calls don't inflate the count.
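
A fixed-window counter with the "rejected calls don't inflate the count" property can be sketched as follows. The `FixedWindowLimiter` type is hypothetical, time is passed in explicitly, and aligning windows to multiples of `per_s` is an assumption about how the window boundary is chosen.

```rust
use std::collections::HashMap;

/// In-memory fixed-window limiter: name -> (window index, accepted count).
#[derive(Default)]
pub struct FixedWindowLimiter {
    windows: HashMap<String, (i64, u32)>,
}

impl FixedWindowLimiter {
    /// Returns true if the call is under `limit` for the current window.
    /// `per_s` must be positive.
    pub fn try_acquire(&mut self, name: &str, limit: u32, per_s: i64, now: i64) -> bool {
        let window = now.div_euclid(per_s);
        let entry = self.windows.entry(name.to_string()).or_insert((window, 0));
        if entry.0 != window {
            // A new window has started: reset the counter.
            *entry = (window, 0);
        }
        if entry.1 < limit {
            entry.1 += 1;
            true
        } else {
            // Rejected: the count is deliberately left untouched.
            false
        }
    }
}
```

Leaving the counter alone on rejection means a burst of denied calls cannot push the limiter further over its limit; the counter simply resets when the next window begins.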
|
||||
|
||||
**Batch operations**: `honker_claim_batch(queue, worker_id, n, timeout_s)` returns a JSON array of claimed jobs. `honker_ack_batch('[1,2,3]', worker_id)` acks multiple jobs. Ack is per-transaction for batch — honest bool return.
|
||||
|
||||
**Task result storage**: `honker_enqueue()` returns the job id. Workers can persist return values via `honker_result_save(id, value, ttl_s)`. Callers await results with `queue.wait_result(id, timeout)`. Opt-in (default `save_result=False`).
|
||||
|
||||
**Claim iterator pattern**:
|
||||
```python
|
||||
async for job in q.claim("worker-1"):
|
||||
try:
|
||||
send(job.payload)
|
||||
job.ack()
|
||||
except Exception as e:
|
||||
job.retry(delay_s=60, error=str(e))
|
||||
```
|
||||
|
||||
Each iteration is `claim_batch(worker_id, 1)`. Wakes on database update from any process, or when the next `run_at` / reclaim deadline arrives. 5-second paranoia poll is the only fallback.
|
||||
|
||||
**Queue notifications**: Each enqueue also fires a notification on `honker:<queue>` channel so workers wake immediately without waiting for the next poll cycle.
|
||||
|
||||
### 2.3 Stream — Durable Pub/Sub with Per-Consumer Offsets
|
||||
|
||||
**What it is**: Durable event stream where each named consumer tracks its own offset. Events persist until explicitly pruned. At-least-once delivery with configurable offset flush cadence.
|
||||
|
||||
**Schema**:
|
||||
```sql
|
||||
CREATE TABLE _honker_stream (
|
||||
offset INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
topic TEXT NOT NULL,
|
||||
key TEXT,
|
||||
payload TEXT NOT NULL,
|
||||
created_at INTEGER NOT NULL DEFAULT (unixepoch())
|
||||
);
|
||||
|
||||
CREATE INDEX _honker_stream_topic
|
||||
ON _honker_stream(topic, offset);
|
||||
|
||||
CREATE TABLE _honker_stream_consumers (
|
||||
name TEXT NOT NULL,
|
||||
topic TEXT NOT NULL,
|
||||
offset INTEGER NOT NULL DEFAULT 0,
|
||||
PRIMARY KEY (name, topic)
|
||||
);
|
||||
```
|
||||
|
||||
**API**:
|
||||
|
||||
| Function | Returns | Notes |
|----------|---------|-------|
| `honker_stream_publish(topic, key_or_null, payload_json)` | `offset` | INSERTs into `_honker_stream` + fires notification on `honker:stream:<topic>` |
| `honker_stream_read_since(topic, offset, limit)` | JSON array | Reads rows where `offset > ?` ordered by offset |
| `honker_stream_save_offset(consumer, topic, offset)` | 1 or 0 | Monotonic upsert — never rewinds. 1 = advanced, 0 = existing offset ≥ new |
| `honker_stream_get_offset(consumer, topic)` | offset or 0 | Returns saved offset for consumer/topic pair |
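
The monotonic behaviour of `honker_stream_save_offset` is the part most worth internalising, so here is a small in-memory model of it. The `ConsumerOffsets` type is hypothetical and stands in for the `_honker_stream_consumers` table; the return values follow the table above (advance or refuse, default 0).

```rust
use std::collections::HashMap;

/// In-memory model of per-consumer offsets: (consumer, topic) -> offset.
#[derive(Default)]
pub struct ConsumerOffsets {
    offsets: HashMap<(String, String), i64>,
}

impl ConsumerOffsets {
    /// Monotonic upsert: returns true if the offset advanced, false otherwise.
    pub fn save(&mut self, consumer: &str, topic: &str, offset: i64) -> bool {
        let slot = self
            .offsets
            .entry((consumer.to_string(), topic.to_string()))
            .or_insert(0);
        if offset > *slot {
            *slot = offset;
            true
        } else {
            // Existing offset is already >= the new one: never rewind.
            false
        }
    }

    /// Saved offset for the pair, or 0 if the consumer has never saved one.
    pub fn get(&self, consumer: &str, topic: &str) -> i64 {
        self.offsets
            .get(&(consumer.to_string(), topic.to_string()))
            .copied()
            .unwrap_or(0)
    }
}
```

Refusing to rewind makes late or duplicate offset flushes harmless: a slow writer that saves an older offset after a newer one has landed cannot cause already-acknowledged events to be replayed.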
|
||||
|
||||
**Python binding**:
|
||||
```python
|
||||
stream = db.stream("user-events")
|
||||
stream.publish({"user_id": uid, "change": "name"}, tx=tx)
|
||||
async for event in stream.subscribe(consumer="dashboard"):
|
||||
await push_to_browser(event)
|
||||
```
|
||||
|
||||
**Subscribe behavior**:
|
||||
1. Replay rows with `offset > saved_offset` in batches (default 1000 rows)
2. Transition to live delivery on commit wake
3. Auto-save offset at most every 1000 events or every 1 second (whichever comes first)
4. At-least-once: after a crash, events delivered since the last flushed offset are re-delivered
5. Override auto-save with `save_every_n=` / `save_every_s=`; set both to 0 for manual control
|
||||
|
||||
**Transaction coupling**: `stream.publish(payload, tx=tx)` inserts into `_honker_stream` inside the caller's transaction. Rollback drops the event. This is the transactional outbox pattern without a separate dispatch table.
|
||||
|
||||
### 2.4 Scheduler — Time-Triggered Cron Tasks
|
||||
|
||||
**Schema**:
|
||||
```sql
|
||||
CREATE TABLE _honker_scheduler_tasks (
|
||||
name TEXT PRIMARY KEY,
|
||||
queue TEXT NOT NULL,
|
||||
cron_expr TEXT NOT NULL,
|
||||
payload TEXT NOT NULL,
|
||||
priority INTEGER NOT NULL DEFAULT 0,
|
||||
expires_s INTEGER,
|
||||
next_fire_at INTEGER NOT NULL,
|
||||
enabled INTEGER NOT NULL DEFAULT 1
|
||||
);
|
||||
```
|
||||
|
||||
**API**:
|
||||
```sql
|
||||
SELECT honker_scheduler_register('nightly', 'backups', '0 3 * * *', '"go"', 0, NULL);
|
||||
SELECT honker_scheduler_tick(unixepoch()); -- JSON: fires due
|
||||
SELECT honker_scheduler_soonest(); -- min next_fire_at
|
||||
SELECT honker_scheduler_unregister('nightly'); -- 1 = deleted
|
||||
SELECT honker_scheduler_pause('nightly'); -- 1 = paused
|
||||
SELECT honker_scheduler_resume('nightly'); -- 1 = resumed
|
||||
SELECT honker_scheduler_list(); -- JSON array of all schedules
|
||||
SELECT honker_scheduler_update('nightly', '0 4 * * *', NULL, NULL, NULL, 0);
|
||||
```
|
||||
|
||||
Supports: 5-field cron, 6-field cron (with seconds), `@every <n><unit>` interval expressions.
|
||||
|
||||
**Leader election via named lock**: `db.lock('honker-scheduler', ttl=60)`. Two scheduler processes can't both fire. The lock is heartbeat-refreshed every 30s.
|
||||
|
||||
**Missed-fire catch-up**: If the scheduler was down for 4 hours with an hourly schedule, the first iteration fires all 4 missed boundaries (with `expires=` to drop stale ones).
|
||||
|
||||
**Fires = enqueue**: The scheduler never runs handlers. It enqueues into the task queue. Regular workers consume.
|
||||
|
||||
### 2.5 Outbox Pattern
|
||||
|
||||
The `outbox` is a convenience wrapper around the `Queue` primitive:
|
||||
|
||||
```python
|
||||
db.outbox("emails", delivery=send_email)
|
||||
db.outbox("emails").enqueue({"to": "alice@example.com"}, tx=tx)
|
||||
db.outbox("emails").run_worker("worker-1")
|
||||
```
|
||||
|
||||
Failures retry with exponential backoff (`base_backoff_s * 2^(attempts-1)`) up to `max_attempts`, then land in `_honker_dead`.
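
The backoff formula works out as follows; with a base of 30 seconds the first three retries wait 30s, 60s, and 120s. The function name `backoff_delay_s` is hypothetical, and the saturating arithmetic is an added safeguard rather than something the source describes.

```rust
/// Delay before the next retry: base_backoff_s * 2^(attempts - 1).
/// `attempts` is the number of attempts made so far (1 after the first failure).
pub fn backoff_delay_s(base_backoff_s: u64, attempts: u32) -> u64 {
    // Clamp the exponent so the shift itself cannot overflow a u64.
    let exponent = attempts.saturating_sub(1).min(62);
    base_backoff_s.saturating_mul(1u64 << exponent)
}
```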
|
||||
|
||||
---
|
||||
|
||||
## 3. Persistence and Reliability
|
||||
|
||||
### Durability Guarantees
|
||||
|
||||
- **Atomic commit**: Business write + side-effect enqueue/event/notify commit together or roll back together. This is SQLite ACID — the transactional outbox pattern is built into the primitives, not bolted on.
|
||||
- **SIGKILL safety**: Verified in `tests/test_crash_recovery.py`. Subprocess killed pre-COMMIT → `PRAGMA integrity_check == 'ok'`, zero in-flight rows, no stale write lock, queue round-trip works post-crash.
|
||||
- **Worker crash recovery**: If a worker crashes mid-job, the claim expires after `visibility_timeout_s` (default 300s) and another worker reclaims. `attempts` increments on each claim. After `max_attempts` (default 3), the row moves to `_honker_dead`.
|
||||
- **Stream at-least-once**: Offsets auto-flush every 1000 events or 1 second. After a crash, events delivered since the last flushed offset are re-delivered, so the re-delivery window is bounded by the flush thresholds.
|
||||
- **Notify has no replay**: Listeners attach at `MAX(id)`. Pruned events are gone. For durable replay, use streams.
|
||||
|
||||
### WAL Mode
|
||||
|
||||
Recommended default (`journal_mode = WAL`). Gives concurrent readers with one writer and efficient fsync batching (`wal_autocheckpoint = 10000`). Other journal modes work but lose WAL's concurrent-read-while-writing property. Wake detection (`PRAGMA data_version`) works in all journal modes.
|
||||
|
||||
### What Happens on Crash
|
||||
|
||||
| Scenario | Result |
|----------|--------|
| Process SIGKILL mid-TRANSACTION | SQLite atomic-commit rollback. In-flight write did not land. Fresh process can acquire write lock immediately. |
| Worker process crash mid-job | Claim expires after visibility_timeout. Another worker reclaims. `attempts` increments. |
| Stream consumer crash | Resumes from last auto-saved offset (at-least-once). Pending offset is lost. |
| Database file replaced (litestream restore) | Watcher panics with clear error message. All subscribers see error from update_events(). Must reopen database. |
|
||||
|
||||
### What Honker Does NOT Provide
|
||||
|
||||
- **Multi-writer replication**: SQLite's locking is for single-host. Two servers writing one `.db` over NFS will corrupt it. Shard by file or switch to Postgres.
|
||||
- **In-memory database support**: `:memory:` creates a separate database per connection, splitting writer/readers/watchers. Use temp file-backed `.db` for tests.
|
||||
- **Cross-node distribution**: Honker is single-machine. No built-in mechanism for distributing events across nodes. (This is intentional — see alknet relevance below.)
|
||||
- **Task pipelines/chains/groups/chords**. Deliberately not built.
|
||||
- **Workflow orchestration with DAGs**. Deliberately not built.
|
||||
- **Ordering guarantees across queues**. Each queue is independent.
|
||||
- **Exactly-once delivery**. Honker provides at-least-once. Idempotent handlers are the user's responsibility.
|
||||
|
||||
---
|
||||
|
||||
## 4. Performance
|
||||
|
||||
### Benchmarks (M-series, release build, median of 3)
|
||||
|
||||
| Operation | Throughput / latency |
|-----------|----------------------|
| enqueue (1/tx) | ~8,000/sec |
| enqueue (100/tx) | ~110,000/sec |
| claim + ack (individual) | ~4,500/sec |
| claim_batch + ack_batch (32) | ~75,000/sec |
| claim_batch + ack_batch (128) | ~110,000/sec |
| async iter end-to-end | ~6,500/sec |
| stream replay | ~1,000,000/sec |
| stream live e2e p50 | 0.24ms |
| stream live e2e p99 | 8ms |
|
||||
|
||||
### Cross-Process Wake Latency
|
||||
|
||||
Median ~1-2ms on M-series, bounded by the 1ms `PRAGMA data_version` poll cadence. 600-second soak test under sustained ~75 commits/sec showed zero missed wakes, zero drift, `PRAGMA integrity_check = ok`.
|
||||
|
||||
### Claim Performance at Scale
|
||||
|
||||
With 100,000 dead rows in `_honker_dead`:
|
||||
|
||||
| Rows in `_honker_dead` | Claim + ack throughput |
|------------------------|------------------------|
| 0 dead rows (fresh DB) | ~4,000/sec |
| 100k dead rows | ~3,500/sec |
|
||||
|
||||
The partial index `(queue, priority DESC, run_at, id) WHERE state IN ('pending','processing')` keeps the claim hot path bounded by working-set size, not history size.
|
||||
|
||||
### How It Compares to Polling
|
||||
|
||||
Without honker's wake mechanism, the alternative is application-level polling (e.g., `SELECT … WHERE id > last_seen` every N seconds). Honker replaces this with a `PRAGMA data_version` read that costs single-digit microseconds. The read is shared: 100 subscribers are still served by one poll thread. The trade-off is over-triggering: every subscriber wakes on any commit, which honker explicitly prefers to the risk of missing a wake.
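A minimal sketch of the shared-poll idea, assuming nothing about honker's internals: one connection reads `PRAGMA data_version` and fans a wake-up out to every subscriber when the value changes. The class `SharedWatcher` and its `poll_once` method are hypothetical names; a real watcher would run `poll_once` on a background thread at the poll cadence.

```python
import os
import queue
import sqlite3
import tempfile

class SharedWatcher:
    """One poller, many subscribers (illustrative only)."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._last = self._version()
        self._subscribers = {}
        self._next_id = 0

    def _version(self):
        # Changes whenever another connection commits to the same database.
        return self._conn.execute("PRAGMA data_version").fetchone()[0]

    def subscribe(self):
        sub_id = self._next_id
        self._next_id += 1
        self._subscribers[sub_id] = queue.Queue()
        return sub_id, self._subscribers[sub_id]

    def unsubscribe(self, sub_id):
        self._subscribers.pop(sub_id, None)

    def poll_once(self):
        current = self._version()
        if current == self._last:
            return False
        self._last = current
        # Over-trigger on purpose: every subscriber wakes on any commit.
        for q in self._subscribers.values():
            q.put(None)
        return True

path = os.path.join(tempfile.mkdtemp(), "fanout_demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

watcher = SharedWatcher(path)
subs = [watcher.subscribe() for _ in range(100)]

idle = watcher.poll_once()            # nothing committed since the watcher opened
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()
fired = watcher.poll_once()           # one PRAGMA read wakes all 100 subscribers
woken = sum(1 for _, q in subs if not q.empty())
```

Subscribers only learn that something changed; each one re-reads its own state from SQLite after waking.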
|
||||
|
||||
---
|
||||
|
||||
## 5. SQLite Integration
|
||||
|
||||
### Loading the Extension
|
||||
|
||||
```sql
|
||||
-- Any SQLite 3.9+ client
|
||||
.load ./libhonker_ext
|
||||
SELECT honker_bootstrap();
|
||||
```
|
||||
|
||||
`honker_bootstrap()` is idempotent — it runs `CREATE TABLE IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` for all schema tables.
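The idempotency pattern can be illustrated in a few lines of Python. The table and index names below (`demo_live`, `demo_live_queue`) and the `bootstrap` helper are hypothetical stand-ins, not honker's actual schema.

```python
import sqlite3

def bootstrap(conn):
    """Create schema objects only if they are missing; safe to call repeatedly."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS demo_live (
            id      INTEGER PRIMARY KEY,
            queue   TEXT NOT NULL,
            payload TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS demo_live_queue ON demo_live (queue, id);
    """)
    return 1

conn = sqlite3.connect(":memory:")
bootstrap(conn)
conn.execute("INSERT INTO demo_live (queue, payload) VALUES ('emails', '{}')")
bootstrap(conn)  # second call changes nothing; the existing row survives
count = conn.execute("SELECT COUNT(*) FROM demo_live").fetchone()[0]
```

Because a repeated call is harmless, it is reasonable to run the bootstrap on every new connection rather than tracking whether it has already happened.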
|
||||
|
||||
### Compile/Load Flags
|
||||
|
||||
For Rust integration via `rusqlite`:
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
honker-core = "0.2.3"
|
||||
rusqlite = { version = "0.39.0", features = ["functions", "hooks"] }
|
||||
```
|
||||
|
||||
Then in Rust:
|
||||
```rust
|
||||
use honker_core::{attach_notify, attach_honker_functions, bootstrap_honker_schema, open_conn};
|
||||
|
||||
let conn = open_conn("app.db", true)?; // true = install notify
|
||||
attach_honker_functions(&conn)?;
|
||||
bootstrap_honker_schema(&conn)?;
|
||||
```
|
||||
|
||||
For the loadable extension:
|
||||
```bash
|
||||
cargo build --release -p honker-extension
|
||||
# Produces: target/release/libhonker_ext.so (or .dylib, .dll)
|
||||
```
|
||||
|
||||
### Rust Crate Usage
|
||||
|
||||
```rust
|
||||
use honker_core::SharedUpdateWatcher;
use std::sync::mpsc::RecvTimeoutError;
use std::time::Duration;
|
||||
|
||||
let watcher = SharedUpdateWatcher::new(db_path.clone());
|
||||
let (sub_id, rx) = watcher.subscribe();
|
||||
|
||||
// In a loop:
|
||||
match rx.recv_timeout(Duration::from_secs(5)) {
|
||||
Ok(()) => { /* re-read state from SQLite */ },
|
||||
Err(RecvTimeoutError::Timeout) => { /* paranoia poll */ },
|
||||
Err(RecvTimeoutError::Disconnected) => { /* watcher died, reopen */ },
|
||||
}
|
||||
|
||||
watcher.unsubscribe(sub_id);
|
||||
watcher.close()?;
|
||||
```
|
||||
|
||||
### Using with ORM Connections
|
||||
|
||||
Load `libhonker_ext` on each ORM connection and call `honker_bootstrap()` there, then call `honker_enqueue()` inside the ORM's transaction so the job commits with the business write:
|
||||
|
||||
```python
|
||||
# SQLAlchemy
import honker
from sqlalchemy import event, text
from sqlalchemy.orm import Session

# `engine` and the `Order` model are defined elsewhere in the application.
@event.listens_for(engine, "connect")
def _load_honker(conn, _):
    honker.load_extension(conn)
    conn.execute("SELECT honker_bootstrap()")

# The business row and the job commit (or roll back) together.
with Session(engine) as s, s.begin():
    s.add(Order(user_id=42))
    s.execute(text("SELECT honker_enqueue(:q, :p, NULL, NULL, 0, 3, NULL)"),
              {"q": "emails", "p": '{"to":"alice"}'})
|
||||
```
|
||||
|
||||
### PRAGMA Defaults
|
||||
|
||||
Applied on every connection opened via `open_conn`:
|
||||
|
||||
```sql
|
||||
PRAGMA journal_mode = WAL;
|
||||
PRAGMA synchronous = NORMAL; -- fsync WAL at checkpoint, not every commit
|
||||
PRAGMA busy_timeout = 5000; -- wait up to 5s for writer lock
|
||||
PRAGMA foreign_keys = ON;
|
||||
PRAGMA cache_size = -32000; -- 32MB page cache (default was 2MB)
|
||||
PRAGMA temp_store = MEMORY; -- temp B-trees in RAM
|
||||
PRAGMA wal_autocheckpoint = 10000; -- auto-checkpoint once the WAL reaches 10k pages
|
||||
```
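To check that such settings actually took effect on a connection, they can be read back. The sketch below uses Python's `sqlite3` on a temporary file; the `PRAGMAS` table and the helpers `open_with_defaults` and `read_pragma` are hypothetical and only mirror the values listed above.

```python
import os
import sqlite3
import tempfile

PRAGMAS = {
    "journal_mode": "WAL",
    "synchronous": "NORMAL",
    "busy_timeout": 5000,
    "foreign_keys": "ON",
    "cache_size": -32000,       # negative means KiB, so roughly 32 MB
    "temp_store": "MEMORY",
    "wal_autocheckpoint": 10000,
}

def open_with_defaults(path):
    """Open a connection and apply the PRAGMA defaults in order."""
    conn = sqlite3.connect(path)
    for name, value in PRAGMAS.items():
        conn.execute(f"PRAGMA {name} = {value}")
    return conn

def read_pragma(conn, name):
    row = conn.execute(f"PRAGMA {name}").fetchone()
    return row[0] if row else None

conn = open_with_defaults(os.path.join(tempfile.mkdtemp(), "pragma_demo.db"))
effective = {name: read_pragma(conn, name) for name in PRAGMAS}
```

WAL mode needs a file-backed database, which is one more reason to use a temporary file instead of `:memory:` in tests.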
|
||||
|
||||
---
|
||||
|
||||
## 6. Complete API Surface
|
||||
|
||||
### Notification Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `notify(channel, payload)` | Scalar, 2 args | `rowid` | INSERTs into `_honker_notifications`, returns auto-generated id |
|
||||
|
||||
### Queue Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_bootstrap()` | 0 args | `1` | Creates all schema tables/indexes. Idempotent. |
|
||||
| `honker_enqueue(queue, payload, run_at_or_null, delay_or_null, priority, max_attempts, expires_or_null)` | 7 args | `id` | INSERTs job. Delay overrides run_at. |
|
||||
| `honker_claim_batch(queue, worker_id, n, timeout_s)` | 4 args | JSON array | Claims up to `n` jobs. Each gets `claim_expires_at = now + timeout_s`. |
|
||||
| `honker_ack_batch(ids_json, worker_id)` | 2 args | `count` | ACKs (DELETEs) claimed jobs. `ids_json` is `[1,2,3]`. |
|
||||
| `honker_ack(job_id, worker_id)` | 2 args | `1` or `0` | Single-job ack. Returns 0 if claim expired. |
|
||||
| `honker_retry(job_id, worker_id, delay_s, error)` | 4 args | `1` or `0` | Retries (flips back to pending) or fails to dead if `attempts >= max_attempts`. |
|
||||
| `honker_fail(job_id, worker_id, error)` | 3 args | `1` or `0` | Unconditionally moves to `_honker_dead`. |
|
||||
| `honker_heartbeat(job_id, worker_id, extend_s)` | 3 args | `1` or `0` | Extends claim for long-running handlers. |
|
||||
| `honker_cancel(job_id)` | 1 arg | `1` or `0` | Removes pending/processing row. Idempotent. |
|
||||
| `honker_get_job(job_id)` | 1 arg | JSON or `""` | Read job state. Pure read. |
|
||||
| `honker_sweep_expired(queue)` | 1 arg | `count` | Moves expired pending jobs to `_honker_dead`. |
|
||||
| `honker_queue_next_claim_at(queue)` | 1 arg | `unix_ts` or `0` | Earliest future deadline (run_at or claim_expires_at + 1). |
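The claim, retry, and dead-letter flow described in the table can be sketched as follows. This is a simplified model under stated assumptions, not honker's implementation: the tables `live` and `dead`, the explicit `now` parameter, and the choice to increment `attempts` at claim time are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE live (
    id INTEGER PRIMARY KEY, payload TEXT, state TEXT DEFAULT 'pending',
    worker_id TEXT, attempts INTEGER DEFAULT 0, max_attempts INTEGER,
    run_at INTEGER DEFAULT 0, claim_expires_at INTEGER
);
CREATE TABLE dead (id INTEGER PRIMARY KEY, payload TEXT, last_error TEXT);
""")

def claim(conn, worker_id, now, timeout_s):
    """Claim one due job; expired claims become claimable again."""
    with conn:
        row = conn.execute(
            """SELECT id FROM live
               WHERE run_at <= ?
                 AND (state = 'pending'
                      OR (state = 'processing' AND claim_expires_at < ?))
               ORDER BY id LIMIT 1""",
            (now, now),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            """UPDATE live SET state = 'processing', worker_id = ?,
                   attempts = attempts + 1, claim_expires_at = ?
               WHERE id = ?""",
            (worker_id, now + timeout_s, row[0]),
        )
        return row[0]

def retry(conn, job_id, worker_id, now, delay_s, error):
    """Flip the job back to pending, or dead-letter it once attempts run out."""
    with conn:
        row = conn.execute(
            """SELECT payload, attempts, max_attempts FROM live
               WHERE id = ? AND worker_id = ? AND state = 'processing'""",
            (job_id, worker_id),
        ).fetchone()
        if row is None:
            return 0
        payload, attempts, max_attempts = row
        if attempts >= max_attempts:
            conn.execute("INSERT INTO dead (id, payload, last_error) VALUES (?, ?, ?)",
                         (job_id, payload, error))
            conn.execute("DELETE FROM live WHERE id = ?", (job_id,))
        else:
            conn.execute(
                "UPDATE live SET state = 'pending', worker_id = NULL, run_at = ? WHERE id = ?",
                (now + delay_s, job_id),
            )
        return 1

conn.execute("INSERT INTO live (payload, max_attempts) VALUES (?, ?)", ('{"to":"alice"}', 2))
job = claim(conn, "w1", now=0, timeout_s=30)
retry(conn, job, "w1", now=1, delay_s=5, error="smtp down")   # back to pending, due at 6
too_early = claim(conn, "w1", now=3, timeout_s=30)
job_again = claim(conn, "w1", now=6, timeout_s=30)
retry(conn, job_again, "w1", now=7, delay_s=5, error="smtp down")  # attempts exhausted
dead_rows = conn.execute("SELECT COUNT(*) FROM dead").fetchone()[0]
live_rows = conn.execute("SELECT COUNT(*) FROM live").fetchone()[0]
```

The sketch uses a single connection, so it does not show the write locking a multi-process claim would need.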
|
||||
|
||||
### Stream Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_stream_publish(topic, key_or_null, payload_json)` | 3 args | `offset` | INSERTs event + fires notification |
|
||||
| `honker_stream_read_since(topic, offset, limit)` | 3 args | JSON array | Reads events with `offset > ?` |
|
||||
| `honker_stream_save_offset(consumer, topic, offset)` | 3 args | `1` or `0` | Monotonic upsert. 0 = existing offset ≥ new |
|
||||
| `honker_stream_get_offset(consumer, topic)` | 2 args | `offset` or `0` | Returns saved offset |
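A monotonic upsert of the kind described for `honker_stream_save_offset` might look like the sketch below. The table `stream_consumers`, the column `last_offset`, and both helpers are assumptions made for illustration; the return value is derived by comparing the stored offset before and after the write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE stream_consumers (
    consumer    TEXT NOT NULL,
    topic       TEXT NOT NULL,
    last_offset INTEGER NOT NULL,
    PRIMARY KEY (consumer, topic)
)""")

def get_offset(conn, consumer, topic):
    row = conn.execute(
        "SELECT last_offset FROM stream_consumers WHERE consumer = ? AND topic = ?",
        (consumer, topic),
    ).fetchone()
    return row[0] if row else 0

def save_offset(conn, consumer, topic, offset):
    """Store `offset` only if it moves forward; return 1 if it advanced, else 0."""
    before = get_offset(conn, consumer, topic)
    conn.execute(
        """INSERT INTO stream_consumers (consumer, topic, last_offset)
           VALUES (?, ?, ?)
           ON CONFLICT (consumer, topic) DO UPDATE
           SET last_offset = excluded.last_offset
           WHERE excluded.last_offset > stream_consumers.last_offset""",
        (consumer, topic, offset),
    )
    return 1 if get_offset(conn, consumer, topic) > before else 0

advanced = save_offset(conn, "projector", "nodes:created", 5)
stale = save_offset(conn, "projector", "nodes:created", 3)    # ignored, offset stays 5
current = get_offset(conn, "projector", "nodes:created")
```

The `WHERE` clause on the `DO UPDATE` branch is what makes a late or duplicated save harmless, which suits at-least-once consumers.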
|
||||
|
||||
### Lock Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_lock_acquire(name, owner, ttl_s)` | 3 args | `1` or `0` | 1 = acquired, 0 = held |
|
||||
| `honker_lock_release(name, owner)` | 2 args | `1` or `0` | 1 = released, 0 = not yours |
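A TTL-based named lock with these return codes can be sketched as follows. The `locks` table, the explicit `now` parameter, and the helper names are illustrative assumptions rather than honker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locks (name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at INTEGER NOT NULL)"
)

def lock_acquire(conn, name, owner, ttl_s, now):
    """Return 1 if acquired, 0 if an unexpired lock is already held."""
    with conn:
        held = conn.execute(
            "SELECT 1 FROM locks WHERE name = ? AND expires_at > ?", (name, now)
        ).fetchone()
        if held:
            return 0
        # Either no row exists or the old one has expired: take it over.
        conn.execute(
            "INSERT OR REPLACE INTO locks (name, owner, expires_at) VALUES (?, ?, ?)",
            (name, owner, now + ttl_s),
        )
        return 1

def lock_release(conn, name, owner):
    """Return 1 if the caller held the lock and released it, 0 otherwise."""
    with conn:
        mine = conn.execute(
            "SELECT 1 FROM locks WHERE name = ? AND owner = ?", (name, owner)
        ).fetchone()
        conn.execute("DELETE FROM locks WHERE name = ? AND owner = ?", (name, owner))
        return 1 if mine else 0

a_first = lock_acquire(conn, "scheduler", "node-a", ttl_s=30, now=0)
b_blocked = lock_acquire(conn, "scheduler", "node-b", ttl_s=30, now=10)
b_release = lock_release(conn, "scheduler", "node-b")                      # not the owner
b_takeover = lock_acquire(conn, "scheduler", "node-b", ttl_s=30, now=31)   # TTL elapsed
```

The TTL is what keeps a crashed owner from holding the lock forever.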
|
||||
|
||||
### Rate Limit Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_rate_limit_try(name, limit, per)` | 3 args | `1` or `0` | 1 = under limit, 0 = at limit |
|
||||
| `honker_rate_limit_sweep(older_than_s)` | 1 arg | `count` | Prunes expired windows |
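A fixed-window counter matching these semantics could be sketched like this. The `rate_limits` table, the `hits` column, and the explicit `now` argument are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rate_limits (
    name         TEXT NOT NULL,
    window_start INTEGER NOT NULL,
    hits         INTEGER NOT NULL,
    PRIMARY KEY (name, window_start)
)""")

def rate_limit_try(conn, name, limit, per, now):
    """Return 1 and count the hit if under `limit` in the current window, else 0."""
    window_start = now - (now % per)   # windows are aligned to multiples of `per`
    with conn:
        row = conn.execute(
            "SELECT hits FROM rate_limits WHERE name = ? AND window_start = ?",
            (name, window_start),
        ).fetchone()
        used = row[0] if row else 0
        if used >= limit:
            return 0
        conn.execute(
            "INSERT OR REPLACE INTO rate_limits (name, window_start, hits) VALUES (?, ?, ?)",
            (name, window_start, used + 1),
        )
        return 1

def rate_limit_sweep(conn, older_than_s, now):
    """Delete windows that started more than `older_than_s` seconds ago."""
    with conn:
        cur = conn.execute(
            "DELETE FROM rate_limits WHERE window_start < ?", (now - older_than_s,)
        )
        return cur.rowcount

results = [rate_limit_try(conn, "api", limit=2, per=60, now=t) for t in (0, 10, 59, 60)]
swept = rate_limit_sweep(conn, older_than_s=60, now=120)
```

Fixed windows allow a burst across a window boundary (the calls at 59 and 60 fall into different windows), which is the usual trade-off against a sliding window.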
|
||||
|
||||
### Scheduler Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_scheduler_register(name, queue, cron_expr, payload, priority, expires_s_or_null)` | 6 args | `1` | Upserts task. Computes next_fire_at. |
|
||||
| `honker_scheduler_unregister(name)` | 1 arg | `0` or `1` | Deletes task. |
|
||||
| `honker_scheduler_tick(now_unix)` | 1 arg | JSON array | Fires due tasks, enqueues payloads, advances next_fire_at. |
|
||||
| `honker_scheduler_soonest()` | 0 args | `unix_ts` or `0` | Earliest next_fire_at for sleep duration calculation. |
|
||||
| `honker_scheduler_pause(name)` | 1 arg | `0` or `1` | Sets `enabled = 0`. |
|
||||
| `honker_scheduler_resume(name)` | 1 arg | `0` or `1` | Sets `enabled = 1`. |
|
||||
| `honker_scheduler_list()` | 0 args | JSON array | Returns all schedules with state. |
|
||||
| `honker_scheduler_update(name, cron_expr_or_null, payload_or_null, priority_or_null, expires_s_or_null, touch_expires)` | 6 args | `0` or `1` | Mutates schedule fields. Recomputes next_fire_at if cron_expr changed. |
|
||||
| `honker_cron_next_after(expr, from_unix)` | 2 args | `unix_ts` | Pure deterministic function. 5-field, 6-field, or `@every <n><unit>`. |
|
||||
|
||||
### Result Functions
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_result_save(job_id, value_json, ttl_s)` | 3 args | `1` | UPSERTs result. `ttl_s=0` = no expiration. |
|
||||
| `honker_result_get(job_id)` | 1 arg | `value` or `NULL` | Returns result or NULL if expired/missing. |
|
||||
| `honker_result_sweep()` | 0 args | `count` | Prunes expired result rows. |
|
||||
|
||||
### Watcher Functions (Extension ABI)
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_update_watcher_open(db_path, backend)` | 2 SQL args | `id` | Opens a watcher handle. For Elixir and extension consumers. |
|
||||
| `honker_update_watcher_wait(id, timeout_ms)` | 2 SQL args | `1`/`0`/`-1` | 1 = update observed, 0 = timeout, -1 = disconnected |
|
||||
| `honker_update_watcher_close(id)` | 1 SQL arg | `1` | Closes watcher handle. |
|
||||
|
||||
C ABI (for Go, .NET, C++, Ruby bindings that route through the extension):
|
||||
|
||||
| Function | Signature | Returns | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| `honker_watcher_open(db_path, backend, err_buf, err_buf_len)` | C ABI | `*mut HonkerWatcherHandle` | Opens a core-backed update watcher. |
|
||||
| `honker_watcher_wait(handle, timeout_ms)` | C ABI | `1`/`0`/`-1`/`-2` | 1 = update, 0 = timeout, -1 = closed, -2 = panic |
|
||||
| `honker_watcher_close(handle)` | C ABI | void | Closes and frees the handle. |
|
||||
|
||||
### Tables
|
||||
|
||||
| Table | Purpose |
|
||||
|-------|---------|
|
||||
| `_honker_live` | Pending + processing jobs. Partial index for fast claims. |
|
||||
| `_honker_dead` | Terminal jobs (retry-exhausted or explicitly failed). Never scanned by claim path. |
|
||||
| `_honker_notifications` | Ephemeral notify/listen messages. Not auto-pruned. |
|
||||
| `_honker_stream` | Durable stream events with auto-incrementing offsets. |
|
||||
| `_honker_stream_consumers` | Per-consumer stream offsets. Monotonic upsert. |
|
||||
| `_honker_locks` | Named advisory locks with TTL expiration. |
|
||||
| `_honker_rate_limits` | Fixed-window rate limit counters. |
|
||||
| `_honker_scheduler_tasks` | Cron/schedule task definitions with next_fire_at. |
|
||||
| `_honker_results` | Task result storage with TTL expiration. |
|
||||
|
||||
---
|
||||
|
||||
## 7. Comparison to Postgres (pg_notify)
|
||||
|
||||
| Feature | Honker | pg_notify |
|
||||
|---------|--------|-----------|
|
||||
| **Delivery model** | Table-backed `INSERT` in transaction | In-memory NOTIFY with LISTEN callback |
|
||||
| **Persistence** | Rows survive restart. Not auto-pruned. | Ephemeral — lost on restart, not replayed. |
|
||||
| **Transactional coupling** | `notify(channel, payload)` inside `BEGIN IMMEDIATE; INSERT; COMMIT` — atomic with business write | NOTIFY fires at COMMIT inside the same transaction. Atomic with business write. |
|
||||
| **Retry / visibility timeout** | Queue has `claim_expires_at`, `attempts`, `max_attempts`, dead-letter. | No retry. No visibility timeout. |
|
||||
| **Delayed delivery** | `run_at` for scheduled delivery. Jobs only claimable after deadline. | No scheduling. |
|
||||
| **Cross-process wake** | `PRAGMA data_version` polling at ~1ms cadence. SharedUpdateWatcher fans out to N subscribers. | Postgres notifies listeners via its inter-process communication. |
|
||||
| **Priority** | Queue priority via partial index `(queue, priority DESC, run_at, id)`. | No priority. |
|
||||
| **Rate limiting** | Built-in fixed-window `rate_limit_try`. | No rate limiting. |
|
||||
| **Named locks** | TTL-based advisory locks in `_honker_locks`. | `pg_advisory_lock` (similar concept, different implementation). |
|
||||
| **Cron scheduling** | Built-in scheduler with 5-field/6-field cron + `@every` intervals. | Needs pg-boss/Oban/cron extension. |
|
||||
| **Stream offsets** | Per-consumer tracked offsets with monotonic upsert. | No built-in stream offsets. |
|
||||
| **Multi-process** | Single-machine, single-writer. | Multi-process, multi-writer natively. |
|
||||
| **Durability** | SQLite ACID. WAL mode for concurrent readers. | Postgres ACID. Full write-ahead logging. |
|
||||
|
||||
**What honker gives you that pg_notify alone doesn't**:
|
||||
1. **Retry with exponential backoff** — automatic re-delivery on failure
|
||||
2. **Visibility timeout** — crashed workers don't permanently lose messages
|
||||
3. **Dead-letter queue** — exhausted retries land in `_honker_dead` for inspection
|
||||
4. **Delayed jobs** — `run_at` for future delivery
|
||||
5. **Prioritization** — `priority` column in claim index
|
||||
6. **Transactional outbox** — business write + enqueue/event in one transaction, without adding Redis/Celery
|
||||
7. **Task result storage** — workers can persist return values; callers can await results
|
||||
8. **Durable streams** — per-consumer offsets with at-least-once delivery
|
||||
9. **Cron scheduling** — built-in periodic tasks with leader election
|
||||
10. **Named locks and rate limiting** — built-in coordination primitives
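The transactional outbox in item 6 is the property most of the others build on. A minimal sketch with plain tables standing in for honker's queue (the `orders` and `jobs` tables and the `place_order` helper are hypothetical):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE jobs   (id INTEGER PRIMARY KEY, queue TEXT NOT NULL, payload TEXT NOT NULL);
""")

def place_order(conn, user_id, fail=False):
    # `with conn` commits on success and rolls back if the body raises.
    with conn:
        conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))
        conn.execute(
            "INSERT INTO jobs (queue, payload) VALUES ('emails', ?)",
            (json.dumps({"user_id": user_id}),),
        )
        if fail:
            raise RuntimeError("simulated failure after enqueue")

place_order(conn, 42)
try:
    place_order(conn, 43, fail=True)
except RuntimeError:
    pass  # both the order and its job were rolled back

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
```

There is never an order without its job or a job without its order, which is the guarantee a separate broker cannot give without extra machinery.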
|
||||
|
||||
**What you'd need to add if you used Postgres instead**: pg-boss, Oban, or similar packages provide many of these features, but they require Postgres as the database. Honker exists for the case where SQLite is already the primary datastore.
|
||||
|
||||
---
|
||||
|
||||
## 8. Comparison to Other Message Systems
|
||||
|
||||
| Feature | Honker | Redis Pub/Sub | NATS | Kafka |
|
||||
|---------|--------|--------------|------|-------|
|
||||
| **Persistence** | SQLite tables (disk) | In-memory only (unless RDB/AOF) | In-memory (JetStream adds persistence) | Persistent log |
|
||||
| **Transactional coupling** | Business write + enqueue in one tx | Not atomic with business data | Not atomic with business data | Not atomic with business data |
|
||||
| **Delivery guarantee** | At-least-once | At-most-once (fire-and-forget) | At-most-once (core); at-least-once (JetStream) | At-least-once with consumer offsets |
|
||||
| **Retry/visibility** | Built-in (claim timeout, retry, dead-letter) | None (messages disappear if no consumer) | None (core); redelivery (JetStream) | Consumer group offsets |
|
||||
| **Priority** | Yes (partial index) | No | No | No |
|
||||
| **Delayed delivery** | Yes (`run_at`) | No (requires sorted sets hack) | No | No (requires time-based logic) |
|
||||
| **Single-node complexity** | Zero — just a `.db` file | Requires Redis server | Requires NATS server | Requires Kafka cluster |
|
||||
| **Cross-process wake latency** | 1-2ms | ~0.1ms | ~0.1ms | ~1-5ms |
|
||||
| **Cross-node distribution** | None (single machine) | Pub/Sub is fan-out to connected clients | JetStream supports clustering | Built for distributed |
|
||||
| **Dependency** | SQLite (already in your stack) | Additional server | Additional server | Additional cluster |
|
||||
| **Schema coupling** | Same file as business data — dual-write impossible | Separate system — dual-write risk | Separate system — dual-write risk | Separate system — dual-write risk |
|
||||
| **Language support** | Python, Node, Rust, Go, Ruby, Bun, Elixir, C++, .NET, JVM, Kotlin | Many (but protocol, not SQL) | 40+ client libraries | Many client libraries |
|
||||
| **Dead-letter queue** | Built-in `_honker_dead` | None | JetStream has DLQ | DLQ via configuration |
|
||||
|
||||
**When honker is the right choice**: SQLite is already your primary datastore, and you need pub/sub + queue + scheduling without introducing Redis/Celery/NATS. The dual-write problem between your business tables and the queue disappears.
|
||||
|
||||
**When honker is NOT the right choice**: Multi-node deployments, multi-writer sharding, cross-datacenter replication, or workloads exceeding single-machine throughput.
|
||||
|
||||
---
|
||||
|
||||
## 9. Relevance to Alknet
|
||||
|
||||
### 9.1 Alignment with Event Boundary Discipline (ADR-032)
|
||||
|
||||
ADR-032 defines three communication layers:
|
||||
|
||||
```
|
||||
Call Protocol (Layer 3, external, JSON)
|
||||
└── irpc Service (Layer 3, internal, postcard)
|
||||
└── Honker Streams (Domain events, within service boundary)
|
||||
```
|
||||
|
||||
**Honker's single-machine model is exactly right for the bottom layer.** Domain events in alknet are internal to the service that owns that data — `nodes:created`, `edges:deleted`, `accounts:updated`. These never cross the service boundary without projection into a call protocol `EventEnvelope`.
|
||||
|
||||
The integration plan (Phase 2.2) explicitly lists honker integration patterns for alknet-storage:
|
||||
|
||||
| Feature | Use Case |
|
||||
|---------|----------|
|
||||
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
|
||||
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
|
||||
| `queue` / `claim` / `ack` | Task queue for async operations |
|
||||
|
||||
### 9.2 Patterns from Honker for alknet-storage Adoption
|
||||
|
||||
**Map honker's primitives to alknet-storage's internal events**:
|
||||
|
||||
| Alknet Domain Event | Honker Primitive | Stream Name |
|
||||
|---------------------|------------------|-------------|
|
||||
| Node created | `stream.publish("nodes:created", ...)` | `nodes:created` |
|
||||
| Node updated | `stream.publish("nodes:updated", ...)` | `nodes:updated` |
|
||||
| Node deleted | `stream.publish("nodes:deleted", ...)` | `nodes:deleted` |
|
||||
| Edge created | `stream.publish("edges:created", ...)` | `edges:created` |
|
||||
| Account updated | `stream.publish("accounts:updated", ...)` | `accounts:updated` |
|
||||
| ACL rule changed | `stream.publish("acl:changed", ...)` | `acl:changed` |
|
||||
|
||||
**Map honker's task queue to alknet's async operations**:
|
||||
|
||||
| Alknet Async Task | Honker Queue |
|
||||
|-------------------|-------------|
|
||||
| Key rotation | `queue("key-rotation")` |
|
||||
| Certificate renewal | `queue("cert-renewal")` |
|
||||
| Audit log archival | `queue("audit-archival")` |
|
||||
| Node encryption/decryption | `queue("node-crypto")` |
|
||||
|
||||
**Map honker's notify/listen to real-time events**:
|
||||
|
||||
| Alknet Real-Time Event | Honker Channel |
|
||||
|------------------------|---------------|
|
||||
| SSH connection opened | `notify("ssh:connected", ...)` |
|
||||
| Config reload triggered | `notify("config:reload", ...)` |
|
||||
| Forwarding rule activated | `notify("forwarding:activated", ...)` |
|
||||
|
||||
### 9.3 Replicating Honker Patterns with Postgres for Production
|
||||
|
||||
If alknet-storage is backed by Postgres in production deployments (the storage spec mentions `rusqlite` but leaves room for alternative backends), the following Postgres equivalents would be needed:
|
||||
|
||||
| Honker Primitive | Postgres Equivalent | What's Lost |
|
||||
|-----------------|---------------------|-------------|
|
||||
| `notify/listen` | `pg_notify` + `LISTEN` | Postgres NOTIFY is ephemeral (lost on restart). Honker's table-backed notifications persist. Need to add a `_notifications` table and polling. |
|
||||
| `stream_publish/subscribe` | `pg_notify` + consumer offset table | No built-in per-consumer offset tracking. Would need a `_stream_consumers` table and polling/cursor logic. |
|
||||
| `queue/claim/ack` | pg-boss / Oban | These exist and are production-quality. Honker's simplicity (one table, partial index) is lost. Need a dependency on Oban or pg-boss. |
|
||||
| `run_at` (delayed jobs) | Oban's `scheduled_at` / pg-boss's `startAfter` | Available in both. |
|
||||
| `claim_expires_at` (visibility timeout) | Oban's `attempted_at` + `max_attempts` | Available in both. |
|
||||
| `honker_lock_acquire/release` | `pg_advisory_lock` | Built-in, similar concept. |
|
||||
| `honker_rate_limit_try` | Custom table or Redis | Postgres has no built-in rate limiting. |
|
||||
| Transactional coupling | Same tx | Naturally available: `INSERT INTO orders ...; INSERT INTO _honker_live ...;` both in the same Postgres tx. |
|
||||
| Scheduler | pg-boss `schedule()` or Oban's `Oban.insert(CronWorker, ...)` | Available in both. |
|
||||
|
||||
**What would be lost switching to Postgres + pg-boss/Oban**:
|
||||
- **Schema simplicity**: Honker uses 2 tables for 90% of queue operations. pg-boss and Oban each bring their own schema and migrations.
|
||||
- **Zero-dependency**: Honker is a SQLite extension. No Redis, no Celery, no broker. pg-boss requires Postgres. Oban requires Postgres + Elixir.
|
||||
- **Cross-language transparency**: Any SQLite client can `SELECT load_extension('honker')` and get the same features. Postgres requires language-specific client libraries.
|
||||
- **File-based deployment**: Copy the `.db` file. Done. Postgres requires a server.
|
||||
|
||||
**Recommendation for alknet-storage**: Start with honker on SQLite for self-hosted/edge deployments. For production Postgres deployments, create an abstraction layer in `alknet-storage` that implements the same `EventStream`, `TaskQueue`, and `NotificationChannel` traits against both backends. The honker-on-SQLite implementation is the reference; the Postgres implementation uses `pg_notify` + offset tables + Oban/pg-boss.
|
||||
|
||||
### 9.4 Honker's Queue/Claim Model and alknet's Call Protocol
|
||||
|
||||
The call protocol's `EventEnvelope` frames are the integration boundary (ADR-033). When a domain event needs to cross node boundaries, it must be projected:
|
||||
|
||||
```
|
||||
Honker stream event (internal)
|
||||
→ Projection function
|
||||
→ EventEnvelope frame (external, call protocol)
|
||||
→ Transported over SSH/QUIC/DNS
|
||||
→ Received by remote node
|
||||
→ May trigger local Honker stream event on remote node
|
||||
```
|
||||
|
||||
The **queue/claim model maps to async call protocol operations**:
|
||||
|
||||
1. **call.requested** → Honker `queue.enqueue({"operation": "/head/auth/verify", "input": {...}})`
|
||||
2. **Worker claims the job** → Like a worker process picking up a call request
|
||||
3. **job.ack()** → call.responded with the result
|
||||
4. **job.retry()** → Call timeout / retry logic (but this is at the transport layer, not the queue)
|
||||
5. **job fails → _honker_dead** → Dead letter equivalent for failed call protocol operations
|
||||
|
||||
The **key difference**: alknet's call protocol is synchronous request-response at the transport layer, while honker's queue is async at-least-once. They serve different purposes:
|
||||
- **Call protocol**: "I need you to verify this pubkey NOW" (synchronous, cross-node)
|
||||
- **Honker queue**: "Process this key rotation in the background" (asynchronous, within-node)
|
||||
|
||||
For **cross-node task distribution**, honker's queue should NOT be the transport. Instead:
|
||||
1. A domain event (honker stream) in the storage service triggers a projection
|
||||
2. The projection creates an `EventEnvelope` frame
|
||||
3. The call protocol delivers it to remote nodes
|
||||
4. Remote nodes may enqueue it into their own honker queues for local processing
|
||||
|
||||
### 9.5 Cross-Node Event Distribution
|
||||
|
||||
**Honker is single-node by design.** This is correct for alknet's architecture because:
|
||||
|
||||
1. **Domain events stay within the service boundary** (ADR-032). Honker streams are for internal state reconstruction, not cross-node distribution.
|
||||
2. **Integration events cross boundaries via the call protocol.** When a domain event in the storage service needs to be communicated to another node, it's projected into an `EventEnvelope` frame and sent over the wire.
|
||||
3. **Each node has its own `.db` file** with its own honker streams. This is a feature, not a limitation — it enforces the event boundary discipline.
|
||||
|
||||
The bridge pattern:
|
||||
|
||||
```
|
||||
Node A (storage service):
|
||||
1. Business write INSERTs into SQLite
|
||||
2. stream.publish("nodes:created", {node_id: 42}) in same tx
|
||||
3. A local subscriber detects the event
|
||||
4. Projects it into EventEnvelope {operation: "/head/nodes/created", data: {node_id: 42}}
|
||||
5. Sends via call protocol over SSH/QUIC/DNS to Node B
|
||||
|
||||
Node B (receiver):
|
||||
1. Receives EventEnvelope via call protocol
|
||||
2. Enqueues locally: queue("incoming-events").enqueue({source: "node-A", event: ...})
|
||||
3. Or publishes locally: stream.publish("remote:nodes:created", {node_id: 42})
|
||||
```
|
||||
|
||||
This preserves the three-layer model while respecting honker's single-machine design.
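The projection step in the bridge can be sketched as a pure function. Everything here is illustrative: the `StreamEvent` and `EventEnvelope` dataclasses, the `EXPORTED_TOPICS` allow-list, and the `remote:` topic naming follow the example above but are not a defined alknet or honker API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamEvent:
    """A domain event as read from a local stream (internal)."""
    topic: str
    offset: int
    payload: dict

@dataclass(frozen=True)
class EventEnvelope:
    """An integration event as sent over the call protocol (external)."""
    operation: str
    data: dict

# Only topics listed here ever leave the service boundary.
EXPORTED_TOPICS = {"nodes:created": "/head/nodes/created"}

def project(event):
    """Node A: map a domain event to an envelope, or None if it stays internal."""
    operation = EXPORTED_TOPICS.get(event.topic)
    if operation is None:
        return None
    return EventEnvelope(operation=operation, data=dict(event.payload))

def local_topic(envelope):
    """Node B: derive the local stream topic for a received envelope."""
    parts = envelope.operation.strip("/").split("/")[1:]   # drop the "head" segment
    return "remote:" + ":".join(parts)

envelope = project(StreamEvent("nodes:created", offset=7, payload={"node_id": 42}))
internal_only = project(StreamEvent("accounts:updated", offset=8, payload={"id": 1}))
topic_on_b = local_topic(envelope)
```

Keeping the allow-list explicit means a new domain event stays internal by default until someone decides it should cross the boundary.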
|
||||
|
||||
### 9.6 Honker Patterns and Integration Plan Mapping
|
||||
|
||||
The integration plan (Phase 2.2, alknet-storage) references these honker patterns. Here's the direct mapping:
|
||||
|
||||
| Plan Reference | Honker Primitive | Implementation Notes |
|
||||
|---------------|-----------------|---------------------|
|
||||
| `stream_publish/subscribe` | `db.stream("topic").publish(data, tx=tx)` + `async for event in stream.subscribe(consumer="name")` | Used for domain events within alknet-storage. Each metagraph change (node/edge created/updated/deleted) publishes to a stream. Consumers (local reactive logic, SSE endpoints) subscribe. |
|
||||
| `notify/listen` | `tx.notify("channel", data)` + `async for n in db.listen("channel")` | Used for ephemeral real-time signals. SSH connection events, config reload triggers, forwarding rule activation. No persistence needed. |
|
||||
| `queue/claim` | `queue.enqueue(data, tx=tx)` + `async for job in queue.claim(worker_id)` | Used for background tasks. Key rotation, certificate renewal, audit log archival, batch operations. The `tx=tx` parameter ensures atomicity with business writes. |
|
||||
|
||||
**Implementation approach for alknet-storage (Rust)**:
|
||||
|
||||
Use `honker-core` directly (not the Python binding). The Rust crate exposes:
|
||||
- `open_conn(path, install_notify)` — open a connection with PRAGMA defaults
|
||||
- `attach_honker_functions(&conn)` — register all SQL functions
|
||||
- `bootstrap_honker_schema(&conn)` — create tables
|
||||
- `SharedUpdateWatcher::new(db_path)` — the wake listener
|
||||
|
||||
Or load the extension via `rusqlite`:
|
||||
```rust
|
||||
use rusqlite::Connection;
|
||||
|
||||
let conn = Connection::open("alknet.db")?;
|
||||
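// Needs rusqlite's `load_extension` feature; extension loading is `unsafe` in current rusqlite releases.
unsafe {
    conn.load_extension_enable()?;
    conn.load_extension("libhonker_ext", None::<&str>)?;
}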
|
||||
conn.execute_batch("SELECT honker_bootstrap()")?;
|
||||
```
|
||||
|
||||
### 9.7 Rust Integration: honker-core vs honker-rs
|
||||
|
||||
Two options for Rust integration in alknet-storage:
|
||||
|
||||
**Option A: Use `honker-core` directly**
|
||||
|
||||
The `honker-core` crate provides:
|
||||
- `attach_notify(&conn)` — `_honker_notifications` table + `notify()` SQL function
|
||||
- `attach_honker_functions(&conn)` — all `honker_*` SQL functions
|
||||
- `bootstrap_honker_schema(&conn)` — all table/index creation
|
||||
- `SharedUpdateWatcher` — the wake mechanism
|
||||
- `open_conn(path, install_notify)` — connection factory with PRAGMA defaults
|
||||
|
||||
This gives you raw SQL access. You call `conn.query_row("SELECT honker_enqueue(…)")` etc. Maximum control, minimum abstraction.
|
||||
|
||||
**Option B: Use `honker-rs` ergonomic wrapper**
|
||||
|
||||
The `packages/honker-rs` crate provides:
|
||||
- `Database::open(path)` — opens `system.db`
|
||||
- `db.queue("name")` — `Queue` handle with `.enqueue()`, `.claim_batch()`, `.ack_batch()`
|
||||
- `db.stream("name")` — `Stream` handle with `.publish()`, `.subscribe()`
|
||||
- `db.listen("channel")` — async listener
|
||||
- `db.outbox("name", delivery_fn)` — outbox pattern
|
||||
- `db.lock("name", owner, ttl)` — named lock
|
||||
- `db.scheduler()` — cron scheduler
|
||||
|
||||
**Recommendation**: Start with `honker-core` + direct SQL. The schema and functions are stable and well-tested. Wrap in application-level methods as needed. `honker-rs` may not expose all features (e.g., the scheduler pause/resume/list/update functions added in Phase Mantle). Using `honker-core` gives maximum flexibility while maintaining a single source of truth for SQL behavior.
|
||||
|
||||
---
|
||||
|
||||
## 10. Open Questions for Alknet
|
||||
|
||||
1. **Should alknet-storage bundle honker as a Rust crate dependency, or load the extension at runtime?**
|
||||
- Bundling `honker-core` gives compile-time verification. Loading the extension requires shipping `libhonker_ext.so/.dylib/.dll` alongside the binary.
|
||||
- Recommendation: Bundle `honker-core` as a crate dependency for the Rust implementation. Extension loading is for language bindings that can't link Rust code directly.
|
||||
|
||||
2. **Should the `alknet-storage` crate depend on `honker` (the Python package) or `honker-core` (the Rust rlib)?**
|
||||
- `honker-core` (Rust rlib) — correct choice for a Rust crate. `honker` is the Python binding.
|
||||
- The crate dependency in storage.md currently lists `honker = "0.x"`. It should be `honker-core = "0.2"`.
|
||||
|
||||
3. **How does the Rust `SharedUpdateWatcher` integrate with tokio?**
|
||||
- `SharedUpdateWatcher::subscribe()` returns a `std::sync::mpsc::Receiver<()>`, which is blocking. For tokio integration, wrap in `tokio::task::spawn_blocking` or use `tokio::sync::mpsc` as a bridge.
|
||||
- Alternatively, use `UpdateWatcher::spawn()` directly and convert ticks to tokio notifications.
|
||||
|
||||
4. **Should alknet-storage abstract over honker-specific table names?**
|
||||
- Honker prefixes all internal tables with `_honker_` (e.g., `_honker_live`, `_honker_stream`). Alknet-storage should treat these as honker's internal schema and not directly query them for application logic.
|
||||
- Application-level tables (like `nodes`, `edges`, `accounts`) should use their own namespacing convention. Honker's tables coexist in the same `.db` file.
|
||||
|
||||
5. **Multi-tenant support**: Honker queues and streams are identified by name strings (e.g., `"emails"`, `"user-events"`). For alknet's multi-tenant model (system DB vs tenant DB), each tenant gets its own `.db` file with its own honker tables. Cross-tenant events must go through the call protocol — never by direct honker stream subscription across database files.
|
||||
|
||||
6. **Database file management**: Alknet-storage's system DB (`system.db`) and tenant DBs (`tenant-{orgId}.db`) should each have their own honker instance. The `SharedUpdateWatcher` is per-database, so 100 active tenants = 100 poll threads. This is fine for the expected alknet deployment size, but worth monitoring thread count in large deployments.
|
||||
|
||||
---
|
||||
|
||||
## 11. License and Maturity
|
||||
|
||||
- **License**: Apache 2.0 OR MIT (dual-licensed). Fully permissive for integration.
|
||||
- **Maturity**: Alpha software (noted in README). Better than experimental but not beta-quality yet.
|
||||
- **Status**: Active development. Regular commits. Cross-language interop tests. 180+ Python tests, 12+ Rust tests. Crash recovery verified. 600-second soak test under sustained writes.
|
||||
- **Breaking changes risk**: The project is pre-1.0. Some table names still reference "joblite" and "litenotify" in the CHANGELOG (historical names). Current names use `_honker_` prefix. The API surface is stabilizing but may change.
|
||||
- **Recommendation**: Pin to a specific `honker-core` version in `alknet-storage`'s `Cargo.toml`. The schema migration path (seen in `bootstrap_honker_schema`'s ALTER TABLE for `enabled` column) shows the project handles migrations.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [Honker GitHub Repository](https://github.com/russellromney/honker) — Primary source for all code and documentation
|
||||
- [Honker README](https://github.com/russellromney/honker/blob/main/README.md) — Feature overview, quick start, architecture, performance
|
||||
- [Honker BINDINGS.md](https://github.com/russellromney/honker/blob/main/BINDINGS.md) — Language binding support matrix
|
||||
- [Honker ROADMAP.md](https://github.com/russellromney/honker/blob/main/ROADMAP.md) — Future work phases, planned features (singleton/dedup, state events, queue stats, per-queue config)
|
||||
- [Honker CHANGELOG.md](https://github.com/russellromney/honker/blob/main/CHANGELOG.md) — Detailed history of all changes, performance passes, and architecture decisions
|
||||
- [Honker honker-core/src/lib.rs](https://github.com/russellromney/honker/blob/main/honker-core/src/lib.rs) — Core Rust implementation: Writer, Readers, UpdateWatcher, SharedUpdateWatcher, schema, PRAGMAs
|
||||
- [Honker honker-core/src/honker_ops.rs](https://github.com/russellromney/honker/blob/main/honker-core/src/honker_ops.rs) — All SQL function implementations: enqueue, claim, ack, retry, stream, lock, rate limit, scheduler
|
||||
- [Honker honker-extension/src/lib.rs](https://github.com/russellromney/honker/blob/main/honker-extension/src/lib.rs) — Loadable extension entry point and C ABI for watcher
|
||||
- [alknet ADR-032: Event Boundary Discipline](../../architecture/decisions/032-event-boundary-discipline.md) — Domain events stay within service boundary
|
||||
- [alknet Integration Plan](../../research/integration-plan.md) — Phase 2.2: alknet-storage honker integration
|
||||
- [alknet Storage Spec](../../architecture/storage.md) — alknet-storage crate design and honker integration table
|
||||
@@ -1,170 +0,0 @@
|
||||
# async-nats: Overview & Architecture
|
||||
|
||||
**Crate**: `async-nats`
|
||||
**Version**: 0.49.1
|
||||
**Repository**: https://github.com/nats-io/nats.rs
|
||||
**License**: Apache-2.0
|
||||
**Rust Edition**: 2021
|
||||
**MSRV**: 1.88.0
|
||||
**Async Runtime**: Tokio
|
||||
|
||||
## What is async-nats?
|
||||
|
||||
`async-nats` is the official async Rust client for the [NATS messaging system](https://nats.io). It provides a Tokio-based asynchronous interface to NATS server features including:
|
||||
|
||||
- **Core NATS** — publish/subscribe, request/reply, queue groups
|
||||
- **JetStream** — persistent stream-based messaging with at-least-once and exactly-once semantics
|
||||
- **Key-Value Store** — KV abstraction built on JetStream streams
|
||||
- **Object Store** — large-object storage built on JetStream streams
|
||||
- **Service API** — microservice request/reply pattern with built-in PING/INFO/STATS verbs
|
||||
|
||||
The crate is positioned as the **core client** in the NATS Rust ecosystem. A separate project, [Orbit](https://github.com/synadia-io/orbit.rs), provides higher-level opinionated abstractions on top.
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ Application code │
|
||||
└──────────────┬───────────────────────────┬───────────┘
|
||||
│ │
|
||||
▼ ▼
|
||||
┌───────────────────┐ ┌───────────────────┐
|
||||
│ Orbit crates │ uses │ async-nats (core) │
|
||||
│ (opinionated, │──────▶│ (parity, stable, │
|
||||
│ per-crate semver) │ │ protocol-level) │
|
||||
└───────────────────┘ └─────────┬─────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ nats-server │
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
## Feature Flags
|
||||
|
||||
Features are extensive and control which subsystems are compiled:
|
||||
|
||||
| Feature | Default | Description |
|
||||
|---------|---------|-------------|
|
||||
| `jetstream` | ✅ | JetStream API (streams, consumers, publish) |
|
||||
| `kv` | ✅ | Key-Value store (depends on `jetstream`) |
|
||||
| `object-store` | ✅ | Object store (depends on `jetstream` + `crypto`) |
|
||||
| `service` | ✅ | Service API (microservice pattern) |
|
||||
| `nkeys` | ✅ | NKey/JWT authentication |
|
||||
| `nuid` | ✅ | NUID-based unique ID generation |
|
||||
| `crypto` | ✅ | Cryptographic primitives (SHA-256 for object store) |
|
||||
| `websockets` | ✅ | WebSocket transport (`ws://`/`wss://`) |
|
||||
| `ring` | ✅ | Use `ring` as TLS crypto backend |
|
||||
| `aws-lc-rs` | ❌ | Use `aws-lc-rs` as TLS crypto backend |
|
||||
| `fips` | ❌ | FIPS 140-2 compliant via `aws-lc-rs` |
|
||||
| `chrono` | ❌ | Use `chrono` instead of `time` for datetime types |
|
||||
| `server_2_10` | ✅ | Server 2.10+ features |
|
||||
| `server_2_11` | ✅ | Server 2.11+ features |
|
||||
| `server_2_12` | ✅ | Server 2.12+ features |
|
||||
| `server_2_14` | ✅ | Server 2.14+ features |
|
||||
| `experimental` | ❌ | Experimental features |
|
||||
|
||||
## Source Structure
|
||||
|
||||
```
|
||||
async-nats/src/
|
||||
├── lib.rs # Entry point: connect(), ServerInfo, Command, ClientOp, ServerOp,
|
||||
│ ConnectionHandler, Subscriber, Event, ServerAddr, ConnectInfo
|
||||
├── client.rs # Client struct, publish/subscribe/request/drain/flush APIs,
|
||||
│ Request builder, Statistics, trait definitions
|
||||
├── connection.rs # Framed connection: NATS protocol parser/serializer,
|
||||
│ read/write buffer management, WebSocket adapter
|
||||
├── connector.rs # Server pool, reconnection logic, TLS setup, DNS resolution,
|
||||
│ authentication handshake
|
||||
├── options.rs # ConnectOptions builder, auth methods, TLS config, callbacks
|
||||
├── auth.rs # Auth struct (username, password, token, JWT, nkey, signature)
|
||||
├── auth_utils.rs # Credentials file parsing (JWT + NKey seed)
|
||||
├── message.rs # Message (inbound), OutboundMessage (outbound)
|
||||
├── header.rs # HeaderMap, HeaderName, HeaderValue (NATS headers)
|
||||
├── subject.rs # Subject type, ToSubject trait, SubjectError
|
||||
├── status.rs # StatusCode enum (NATS status codes)
|
||||
├── error.rs # Generic Error<K> type used throughout
|
||||
├── datetime.rs # DateTime type (time or chrono backend)
|
||||
├── id_generator.rs # Unique ID generation (NUID or rand fallback)
|
||||
├── tls.rs # TLS configuration helper
|
||||
├── crypto.rs # SHA-256 for object store integrity
|
||||
├── jetstream/
|
||||
│ ├── mod.rs # Module entry: new(), with_domain(), with_prefix()
|
||||
│ ├── context.rs # Context: JetStream API (streams, consumers, KV, OS, publish)
|
||||
│ ├── stream.rs # Stream handle, Config, Info, purge/delete/message ops
|
||||
│ ├── consumer/
|
||||
│ │ ├── mod.rs # Consumer trait, Info, Config base
|
||||
│ │ ├── pull.rs # PullConsumer: batch fetch, sequence, messages stream
|
||||
│ │ └── push.rs # PushConsumer: Ordered push consumer with auto-recreate
|
||||
│ ├── publish.rs # PublishAck, PublishAckFuture, PublishMessage builder
|
||||
│ ├── message.rs # JetStream Message (with ack methods), AckKind
|
||||
│ ├── response.rs # Response<T> (Ok/Err) for JetStream API calls
|
||||
│ ├── errors.rs # ErrorCode, Error for JetStream
|
||||
│ ├── account.rs # Account info
|
||||
│ ├── kv/
|
||||
│ │ ├── mod.rs # Store: put/get/delete/purge/watch/history/keys
|
||||
│ │ └── bucket.rs # Bucket Status
|
||||
│ └── object_store/
|
||||
│ └── mod.rs # ObjectStore: put/get/delete/watch/list/seal, Object (AsyncRead)
|
||||
└── service/
|
||||
├── mod.rs # Service, ServiceBuilder, Group, EndpointBuilder, Request
|
||||
└── endpoint.rs # Endpoint stream, Stats, Info
|
||||
```
|
||||
|
||||
## Architecture: Core Connection Model
|
||||
|
||||
The client uses a **single-connection, actor-model** design:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────┐
|
||||
Client (clone) ──▶│ mpsc::Sender<Command> │
|
||||
(many handles) │ (bounded channel) │
|
||||
└────────────┬────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌────────────────────────────────────────┐
|
||||
│ ConnectionHandler (tokio::task) │
|
||||
│ - Receives Command from channel │
|
||||
│ - Converts to ClientOp │
|
||||
│ - Manages subscriptions map │
|
||||
│ - Manages multiplexer (request/reply)│
|
||||
│ - Pings server on interval │
|
||||
│ - Handles reconnection │
|
||||
└────────────┬──────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌────────────────────────────────────────┐
|
||||
│ Connection (framed TCP/TLS/WS) │
|
||||
│ - Protocol parser (try_read_op) │
|
||||
│ - Write buffer (VecDeque<Bytes>) │
|
||||
│ - Vectored I/O support │
|
||||
│ - Read buffer (BytesMut) │
|
||||
└────────────┬──────────────────────────┘
|
||||
│
|
||||
▼
|
||||
nats-server
|
||||
```
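The handle/handler split in the diagram can be sketched with the standard library alone. This is an illustrative model, not the crate's code: it swaps Tokio's `mpsc` and task for `std::sync::mpsc::sync_channel` and a thread, and `spawn_handler` and the string "wire" log are hypothetical names introduced for the example.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

// Commands that cloned handles send to the single background handler.
enum Command {
    Publish { subject: String, payload: Vec<u8> },
}

// Cheap to clone: it only wraps the sending half of a bounded channel.
#[derive(Clone)]
struct Client {
    sender: SyncSender<Command>,
}

impl Client {
    fn publish(&self, subject: &str, payload: &[u8]) -> Result<(), String> {
        self.sender
            .send(Command::Publish {
                subject: subject.to_string(),
                payload: payload.to_vec(),
            })
            .map_err(|_| "connection handler has stopped".to_string())
    }
}

// Hypothetical helper: starts the handler and returns a client handle.
// The handler records the control line it would write for each command.
fn spawn_handler(capacity: usize) -> (Client, JoinHandle<Vec<String>>) {
    let (sender, receiver) = sync_channel(capacity);
    let handle = thread::spawn(move || {
        let mut wire = Vec::new();
        // The loop ends once every Client clone has been dropped.
        for command in receiver {
            match command {
                Command::Publish { subject, payload } => {
                    wire.push(format!("PUB {} {}", subject, payload.len()));
                }
            }
        }
        wire
    });
    (Client { sender }, handle)
}
```

Because every clone feeds the same channel, ordering on the connection is decided in one place (the handler), which is the property the single-connection design relies on.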
|
||||
|
||||
### Key Design Decisions
|
||||
|
||||
1. **Cloneable Client**: `Client` is `Clone` (via `mpsc::Sender` clone), enabling shared use across tasks
|
||||
2. **Single TCP connection**: All traffic (Core NATS, JetStream API, etc.) multiplexes over one connection
|
||||
3. **Background task**: `ConnectionHandler` runs as a spawned Tokio task, bridging the mpsc channel to the TCP stream
|
||||
4. **Automatic reconnection**: On disconnect, `Connector` retries servers from the pool with exponential backoff
|
||||
5. **Subscription rehydration**: On reconnect, all active subscriptions are re-subscribed with adjusted `max` counts
|
||||
6. **Multiplexer for request/reply**: A single wildcard subscription (`_INBOX.<id>.*`) multiplexes all pending request/reply correlations
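Decision 6 can be illustrated with a small std-only sketch. It assumes replies are correlated by the last token of the reply subject; the `Multiplexer` struct, its counter-based tokens, and the use of `std::sync::mpsc` are assumptions for the example, not the crate's internals.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// One wildcard subscription serves every in-flight request.
struct Multiplexer {
    prefix: String, // e.g. "_INBOX.abc"
    next_token: u64,
    pending: HashMap<String, Sender<Vec<u8>>>,
}

impl Multiplexer {
    fn new(prefix: &str) -> Self {
        Multiplexer {
            prefix: prefix.to_string(),
            next_token: 0,
            pending: HashMap::new(),
        }
    }

    // Subject of the single subscription, e.g. "_INBOX.abc.*".
    fn wildcard_subject(&self) -> String {
        format!("{}.*", self.prefix)
    }

    // Registers a pending request and returns its reply-to subject
    // together with the receiver the caller waits on.
    fn register(&mut self) -> (String, Receiver<Vec<u8>>) {
        let token = self.next_token.to_string();
        self.next_token += 1;
        let (tx, rx) = channel();
        let reply = format!("{}.{}", self.prefix, token);
        self.pending.insert(token, tx);
        (reply, rx)
    }

    // Routes an inbound message by its token; returns false when no
    // request is waiting for that subject.
    fn dispatch(&mut self, subject: &str, payload: Vec<u8>) -> bool {
        let Some(token) = subject
            .strip_prefix(self.prefix.as_str())
            .and_then(|rest| rest.strip_prefix('.'))
        else {
            return false;
        };
        match self.pending.remove(token) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}
```

The benefit of this shape is that a request costs one map entry rather than one `SUB`/`UNSUB` round trip.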
|
||||
|
||||
## Dependencies (Key)
|
||||
|
||||
| Crate | Purpose |
|
||||
|-------|---------|
|
||||
| `tokio` | Async runtime, TCP, time, sync, io-util |
|
||||
| `bytes` | Efficient byte buffer (`Bytes`, `BytesMut`) |
|
||||
| `tokio-rustls` | TLS via rustls |
|
||||
| `rustls-native-certs` | Load system root certificates |
|
||||
| `serde` / `serde_json` | JSON serialization for JetStream API |
|
||||
| `futures-util` | Stream trait, Sink trait, StreamExt |
|
||||
| `tracing` | Structured logging |
|
||||
| `thiserror` | Error derive macros |
|
||||
| `memchr` | Fast substring search for protocol parsing |
|
||||
| `portable-atomic` | Atomic types (e.g. 64-bit atomics) on targets without native support |
|
||||
| `tokio-util` | `PollSender` for Sink implementation |
|
||||
| `tokio-stream` | `ReceiverStream` adapter |
|
||||
@@ -1,404 +0,0 @@
|
||||
# async-nats: Key Types & Traits
|
||||
|
||||
## Core Types
|
||||
|
||||
### `Client`
|
||||
|
||||
The primary handle to a NATS connection. Cheaply cloneable (wraps `mpsc::Sender<Command>`).
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct Client {
|
||||
info: tokio::sync::watch::Receiver<Option<ServerInfo>>,
|
||||
state: tokio::sync::watch::Receiver<State>,
|
||||
sender: mpsc::Sender<Command>,
|
||||
poll_sender: PollSender<Command>,
|
||||
next_subscription_id: Arc<AtomicU64>,
|
||||
subscription_capacity: usize,
|
||||
inbox_prefix: Arc<str>,
|
||||
request_timeout: Option<Duration>,
|
||||
max_payload: Arc<AtomicUsize>,
|
||||
connection_stats: Arc<Statistics>,
|
||||
skip_subject_validation: bool,
|
||||
}
|
||||
```
|
||||
|
||||
**Key methods**:
|
||||
- `publish(subject, payload)` — fire-and-forget publish
|
||||
- `publish_with_headers(subject, headers, payload)` — publish with NATS headers
|
||||
- `publish_with_reply(subject, reply, payload)` — publish with reply-to subject
|
||||
- `subscribe(subject)` → `Subscriber` — subscribe to a subject
|
||||
- `queue_subscribe(subject, queue_group)` → `Subscriber` — queue group subscription
|
||||
- `request(subject, payload)` → `Message` — request/reply with default timeout
|
||||
- `send_request(subject, request)` → `Message` — request with custom `Request` builder
|
||||
- `flush()` — wait until all buffered writes are flushed to the server
|
||||
- `drain()` — drain all subscriptions, flush, then close
|
||||
- `force_reconnect()` — force a reconnection (e.g., to re-trigger auth)
|
||||
- `new_inbox()` — generate a unique inbox subject (`_INBOX.<id>`)
|
||||
- `server_info()` → `ServerInfo` — last known server info
|
||||
- `connection_state()` → `State` — `Pending`/`Connected`/`Disconnected`
|
||||
- `statistics()` → `Arc<Statistics>` — connection statistics (bytes, messages, connects)
|
||||
- `max_payload()` → `usize` — server's max payload size
|
||||
- `set_server_pool(addrs)` — replace the server pool for reconnection
|
||||
- `server_pool()` — snapshot of current server pool
|
||||
|
||||
### `Subscriber`
|
||||
|
||||
A `Stream` yielding `Message` values from a subscription.
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Subscriber {
|
||||
sid: u64,
|
||||
receiver: mpsc::Receiver<Message>,
|
||||
sender: mpsc::Sender<Command>,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `futures_util::Stream<Item = Message>`. Methods:
|
||||
- `unsubscribe()` — immediately unsubscribe
|
||||
- `unsubscribe_after(n)` — unsubscribe after `n` total delivered messages
|
||||
- `drain()` — unsubscribe after in-flight messages are delivered
|
||||
|
||||
**Drop behavior**: When a `Subscriber` is dropped, it spawns a task to send `Command::Unsubscribe` to the connection handler, ensuring the server is always notified.
|
||||
|
||||
### `Message`
|
||||
|
||||
An inbound NATS message:
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct Message {
|
||||
pub subject: Subject,
|
||||
pub reply: Option<Subject>,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub status: Option<StatusCode>,
|
||||
pub description: Option<String>,
|
||||
pub length: usize,
|
||||
}
|
||||
```
|
||||
|
||||
### `OutboundMessage`
|
||||
|
||||
An outbound message for publishing (no status/description):
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct OutboundMessage {
|
||||
pub subject: Subject,
|
||||
pub reply: Option<Subject>,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
}
|
||||
```
|
||||
|
||||
### `Request`
|
||||
|
||||
Builder for request/reply calls:
|
||||
|
||||
```rust
|
||||
#[derive(Default)]
|
||||
pub struct Request {
|
||||
pub payload: Option<Bytes>,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub timeout: Option<Option<Duration>>,
|
||||
pub inbox: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
Builder methods: `payload()`, `headers()`, `timeout()`, `inbox()`. The `inbox` field, when set, bypasses the multiplexer and uses a dedicated subscription instead.
|
||||
|
||||
### `ServerInfo`
|
||||
|
||||
Server metadata received during connection handshake:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Deserialize, Default, Clone, Eq, PartialEq)]
|
||||
pub struct ServerInfo {
|
||||
pub server_id: String,
|
||||
pub server_name: String,
|
||||
pub host: String,
|
||||
pub port: u16,
|
||||
pub version: String,
|
||||
pub auth_required: bool,
|
||||
pub tls_required: bool,
|
||||
pub max_payload: usize,
|
||||
pub proto: i8,
|
||||
pub client_id: u64,
|
||||
pub go: String,
|
||||
pub nonce: String,
|
||||
pub connect_urls: Vec<String>,
|
||||
pub client_ip: String,
|
||||
pub headers: bool,
|
||||
pub lame_duck_mode: bool,
|
||||
pub cluster: Option<String>,
|
||||
pub domain: Option<String>,
|
||||
pub jetstream: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### `ConnectInfo`
|
||||
|
||||
Client → server `CONNECT` message payload:
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug, Serialize)]
|
||||
pub struct ConnectInfo {
|
||||
pub verbose: bool,
|
||||
pub pedantic: bool,
|
||||
pub user_jwt: Option<String>,
|
||||
pub nkey: Option<String>,
|
||||
pub signature: Option<String>,
|
||||
pub name: Option<String>,
|
||||
pub echo: bool,
|
||||
pub lang: String,
|
||||
pub version: String,
|
||||
pub protocol: Protocol, // Original(0) or Dynamic(1)
|
||||
pub tls_required: bool,
|
||||
pub user: Option<String>,
|
||||
pub pass: Option<String>,
|
||||
pub auth_token: Option<String>,
|
||||
pub headers: bool,
|
||||
pub no_responders: bool,
|
||||
}
|
||||
```
|
||||
|
||||
The client always sets: `verbose=false`, `pedantic=false`, `lang="rust"`, `protocol=Dynamic`, `headers=true`, `no_responders=true`.
|
||||
|
||||
### `Statistics`
|
||||
|
||||
Atomic connection statistics (shared via `Arc`):
|
||||
|
||||
```rust
|
||||
#[derive(Default, Debug)]
|
||||
pub struct Statistics {
|
||||
pub in_bytes: AtomicU64,
|
||||
pub out_bytes: AtomicU64,
|
||||
pub in_messages: AtomicU64,
|
||||
pub out_messages: AtomicU64,
|
||||
pub connects: AtomicU64,
|
||||
}
|
||||
```
|
||||
|
||||
## Subject Types
|
||||
|
||||
### `Subject`
|
||||
|
||||
A validated NATS subject string (newtype over `String`):
|
||||
|
||||
```rust
|
||||
// Usage:
|
||||
let subject: Subject = "foo.bar.baz".into();
|
||||
```
|
||||
|
||||
### `ToSubject` trait
|
||||
|
||||
Conversion trait for subjects:
|
||||
|
||||
```rust
|
||||
pub trait ToSubject {
|
||||
fn to_subject(self) -> Result<Subject, SubjectError>;
|
||||
}
|
||||
```
|
||||
|
||||
Implemented for `&str`, `String`, `Subject` directly.
|
||||
|
||||
### `SubjectError`
|
||||
|
||||
```rust
|
||||
pub enum SubjectError {
|
||||
InvalidFormat,
|
||||
}
|
||||
```
|
||||
|
||||
## Header Types
|
||||
|
||||
### `HeaderMap`
|
||||
|
||||
A multimap of header name → values:
|
||||
|
||||
```rust
|
||||
pub struct HeaderMap {
|
||||
inner: VecMap<HeaderName, Vec<HeaderValue>>,
|
||||
}
|
||||
```
|
||||
|
||||
Methods: `insert()`, `append()`, `get()`, `len()`, `is_empty()`, `iter()`, `to_bytes()`.
|
||||
|
||||
### `HeaderName`
|
||||
|
||||
Case-insensitive header name. Created via `FromStr`:
|
||||
|
||||
```rust
|
||||
let name: HeaderName = "Nats-Expected-Last-Subject-Sequence".parse()?;
|
||||
```
|
||||
|
||||
### `HeaderValue`
|
||||
|
||||
Header value string. Created via `FromStr` or `From<u64>`:
|
||||
|
||||
```rust
|
||||
let val: HeaderValue = "some value".parse()?;
|
||||
let val: HeaderValue = HeaderValue::from(42u64);
|
||||
```
|
||||
|
||||
## Server Address Types
|
||||
|
||||
### `ServerAddr`
|
||||
|
||||
Wraps a `url::Url` with NATS-specific validation. Supports schemes: `nats://`, `tls://`, `ws://`, `wss://`. Default port is `4222`.
|
||||
|
||||
```rust
|
||||
let addr: ServerAddr = "demo.nats.io".parse()?;
|
||||
let addr: ServerAddr = "nats://demo.nats.io:4222".parse()?;
|
||||
let addr: ServerAddr = "tls://demo.nats.io".parse()?;
|
||||
```
|
||||
|
||||
### `ToServerAddrs` trait
|
||||
|
||||
Flexible server address input (single URL, `Vec`, slice, etc.):
|
||||
|
||||
```rust
|
||||
pub trait ToServerAddrs {
|
||||
type Iter: Iterator<Item = ServerAddr>;
|
||||
fn to_server_addrs(&self) -> io::Result<Self::Iter>;
|
||||
}
|
||||
```
|
||||
|
||||
### `Server`
|
||||
|
||||
Metadata about a server in the pool:
|
||||
|
||||
```rust
|
||||
pub struct Server {
|
||||
pub addr: ServerAddr,
|
||||
pub failed_attempts: usize,
|
||||
pub did_connect: bool,
|
||||
pub is_discovered: bool,
|
||||
pub last_error: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
## Event & State Types
|
||||
|
||||
### `Event`
|
||||
|
||||
Asynchronous notifications from the connection:
|
||||
|
||||
```rust
|
||||
pub enum Event {
|
||||
Connected,
|
||||
Disconnected,
|
||||
LameDuckMode,
|
||||
Draining,
|
||||
Closed,
|
||||
SlowConsumer(u64), // subscription sid
|
||||
ServerError(ServerError),
|
||||
ClientError(ClientError),
|
||||
}
|
||||
```
|
||||
|
||||
Received via `ConnectOptions::event_callback()`.
|
||||
|
||||
### `State`
|
||||
|
||||
Connection state observable via `watch::Receiver`:
|
||||
|
||||
```rust
|
||||
pub enum State {
|
||||
Pending,
|
||||
Connected,
|
||||
Disconnected,
|
||||
}
|
||||
```
|
||||
|
||||
### `StatusCode`
|
||||
|
||||
NATS protocol status codes (e.g., `NOT_FOUND = 404`, `TIMEOUT = 408`, `NO_RESPONDERS = 503`).
|
||||
|
||||
## Error Types
|
||||
|
||||
All error types follow the pattern `Error<Kind>` from `crate::error`:
|
||||
|
||||
| Error Type | Kind | Used By |
|
||||
|------------|------|---------|
|
||||
| `ConnectError` | `ConnectErrorKind` | Connection establishment |
|
||||
| `PublishError` | `PublishErrorKind` | Publish operations |
|
||||
| `RequestError` | `RequestErrorKind` | Request/reply |
|
||||
| `SubscribeError` | `SubscribeErrorKind` | Subscribe |
|
||||
| `FlushError` | `FlushErrorKind` | Flush |
|
||||
| `DrainError` | — | Drain |
|
||||
|
||||
### `ConnectErrorKind`
|
||||
|
||||
```rust
|
||||
pub enum ConnectErrorKind {
|
||||
ServerParse, // URL parsing failed
|
||||
Dns, // DNS resolution failed
|
||||
Authentication, // Auth signing failed
|
||||
AuthorizationViolation, // Server rejected auth
|
||||
TimedOut, // Connection handshake timeout
|
||||
Tls, // TLS error
|
||||
Io, // Other I/O error
|
||||
MaxReconnects, // Exceeded max reconnect attempts
|
||||
}
|
||||
```
|
||||
|
||||
## Trait Definitions
|
||||
|
||||
The `client::traits` module defines abstract interfaces:
|
||||
|
||||
```rust
|
||||
pub trait Publisher {
|
||||
fn publish_with_reply(&self, subject, reply, payload) -> Future<Output = Result<(), PublishError>>;
|
||||
fn publish_message(&self, msg: OutboundMessage) -> Future<Output = Result<(), PublishError>>;
|
||||
}
|
||||
|
||||
pub trait Subscriber {
|
||||
fn subscribe(&self, subject) -> Future<Output = Result<crate::Subscriber, SubscribeError>>;
|
||||
}
|
||||
|
||||
pub trait Requester {
|
||||
fn send_request(&self, subject, request: Request) -> Future<Output = Result<Message, RequestError>>;
|
||||
}
|
||||
|
||||
pub trait TimeoutProvider {
|
||||
fn timeout(&self) -> Option<Duration>;
|
||||
}
|
||||
```
|
||||
|
||||
`Client` implements all of these. The JetStream `Context` also implements them via delegation.
|
||||
|
||||
## Authentication Types
|
||||
|
||||
### `Auth`
|
||||
|
||||
Container for all authentication methods:
|
||||
|
||||
```rust
|
||||
pub struct Auth {
|
||||
pub jwt: Option<String>,
|
||||
pub nkey: Option<String>,
|
||||
pub signature_callback: Option<CallbackArg1<String, Result<String, AuthError>>>,
|
||||
pub signature: Option<Vec<u8>>,
|
||||
pub username: Option<String>,
|
||||
pub password: Option<String>,
|
||||
pub token: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
### `AuthError`
|
||||
|
||||
Simple string error for auth callback failures.
|
||||
|
||||
### `ReconnectToServer`
|
||||
|
||||
Returned by `reconnect_to_server_callback` to select a server and delay:
|
||||
|
||||
```rust
|
||||
pub struct ReconnectToServer {
|
||||
pub addr: ServerAddr,
|
||||
pub delay: Option<Duration>,
|
||||
}
|
||||
```
|
||||
@@ -1,278 +0,0 @@
|
||||
# async-nats: NATS Protocol & Wire Format
|
||||
|
||||
## Protocol Overview
|
||||
|
||||
NATS uses a simple, text-based protocol over TCP. Control lines are terminated with `\r\n`, and client and server operations share the same line-oriented framing.
|
||||
|
||||
### Client → Server Operations (`ClientOp`)
|
||||
|
||||
```rust
|
||||
pub(crate) enum ClientOp {
|
||||
Publish { subject, payload, respond, headers },
|
||||
Subscribe { sid, subject, queue_group },
|
||||
Unsubscribe { sid, max },
|
||||
Ping,
|
||||
Pong,
|
||||
Connect(ConnectInfo),
|
||||
}
|
||||
```
|
||||
|
||||
### Server → Client Operations (`ServerOp`)
|
||||
|
||||
```rust
|
||||
pub(crate) enum ServerOp {
|
||||
Ok,
|
||||
Info(Box<ServerInfo>),
|
||||
Ping,
|
||||
Pong,
|
||||
Error(ServerError),
|
||||
Message { sid, subject, reply, payload, headers, status, description, length },
|
||||
}
|
||||
```
|
||||
|
||||
## Wire Format: Client Operations
|
||||
|
||||
### CONNECT
|
||||
|
||||
Sent immediately after receiving the first `INFO` from the server:
|
||||
|
||||
```
|
||||
CONNECT {"verbose":false,"pedantic":false,...}\r\n
|
||||
```
|
||||
|
||||
The JSON payload is `ConnectInfo` serialized inline on the same line.
|
||||
|
||||
### PUB (Publish without headers)
|
||||
|
||||
```
|
||||
PUB <subject> [reply-to] <payload-size>\r\n
|
||||
<payload>\r\n
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
PUB events.data INBOX.67 11\r\n
|
||||
Hello World\r\n
|
||||
```
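As a minimal sketch of the `PUB` framing above, the following std-only function builds the bytes for one publish. The name `encode_pub` is hypothetical and is not part of the crate's API.

```rust
// Builds a PUB frame: control line, payload, trailing CRLF.
// `encode_pub` is an illustrative helper, not an async-nats function.
fn encode_pub(subject: &str, reply: Option<&str>, payload: &[u8]) -> Vec<u8> {
    let mut control = format!("PUB {subject}");
    if let Some(reply) = reply {
        control.push(' ');
        control.push_str(reply);
    }
    // The size on the control line counts payload bytes only.
    control.push_str(&format!(" {}\r\n", payload.len()));

    let mut out = control.into_bytes();
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}
```

Calling `encode_pub("events.data", Some("INBOX.67"), b"Hello World")` yields the same bytes as the example frame.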
|
||||
|
||||
### HPUB (Publish with headers)
|
||||
|
||||
When headers are present and non-empty:
|
||||
|
||||
```
|
||||
HPUB <subject> [reply-to] <header-size> <total-size>\r\n
|
||||
<headers>\r\n
|
||||
<payload>\r\n
|
||||
```
|
||||
|
||||
The `<total-size>` = `<header-size>` + `<payload-size>`.
|
||||
|
||||
Header block format:
|
||||
```
|
||||
NATS/1.0\r\n
|
||||
Header-Name: Header-Value\r\n
|
||||
Another-Header: Another-Value\r\n
|
||||
\r\n
|
||||
```
|
||||
|
||||
The version line (`NATS/1.0`) may include a status code and description:
|
||||
```
|
||||
NATS/1.0 404 No Messages\r\n
|
||||
\r\n
|
||||
```
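The size arithmetic for `HPUB` is easy to get wrong, so here is a hedged std-only sketch. It assumes the header size covers the whole header block, including the `NATS/1.0` version line and the terminating blank line; `encode_hpub` is a hypothetical helper name.

```rust
// Builds an HPUB frame. Illustrative only; not the crate's serializer.
fn encode_hpub(
    subject: &str,
    reply: Option<&str>,
    headers: &[(&str, &str)],
    payload: &[u8],
) -> Vec<u8> {
    // Header block: version line, one line per header, then a blank line.
    let mut block = String::from("NATS/1.0\r\n");
    for (name, value) in headers {
        block.push_str(&format!("{name}: {value}\r\n"));
    }
    block.push_str("\r\n");

    let header_size = block.len();
    let total_size = header_size + payload.len();

    let mut control = format!("HPUB {subject}");
    if let Some(reply) = reply {
        control.push(' ');
        control.push_str(reply);
    }
    control.push_str(&format!(" {header_size} {total_size}\r\n"));

    let mut out = control.into_bytes();
    out.extend_from_slice(block.as_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}
```

With a single `Foo: Bar` header and the payload `hi`, the header block is 22 bytes and the total is 24.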
|
||||
|
||||
### SUB (Subscribe)
|
||||
|
||||
```
|
||||
SUB <subject> [queue-group] <sid>\r\n
|
||||
```
|
||||
|
||||
The `sid` (subscription ID) is a client-assigned u64, unique per connection.
|
||||
|
||||
### UNSUB (Unsubscribe)
|
||||
|
||||
```
|
||||
UNSUB <sid> [max]\r\n
|
||||
```
|
||||
|
||||
The optional `max` tells the server to auto-unsubscribe after `max` messages are delivered.
|
||||
|
||||
### PING / PONG
|
||||
|
||||
```
|
||||
PING\r\n
|
||||
PONG\r\n
|
||||
```
|
||||
|
||||
Client sends PING periodically (default every 60s). If more than 2 pings are pending without a PONG, the connection is considered dead.
|
||||
|
||||
## Wire Format: Server Operations
|
||||
|
||||
### INFO
|
||||
|
||||
First message sent by the server on connection:
|
||||
|
||||
```
|
||||
INFO {"server_id":"NATSxxx","version":"2.10"...}\r\n
|
||||
```
|
||||
|
||||
Also sent asynchronously when cluster topology changes.
|
||||
|
||||
### MSG (Message without headers)
|
||||
|
||||
```
|
||||
MSG <subject> <sid> [reply-to] <payload-size>\r\n
|
||||
<payload>\r\n
|
||||
```
|
||||
|
||||
### HMSG (Message with headers)
|
||||
|
||||
```
|
||||
HMSG <subject> <sid> [reply-to] <header-size> <total-size>\r\n
|
||||
<headers + payload>\r\n
|
||||
```
|
||||
|
||||
### +OK / -ERR
|
||||
|
||||
```
|
||||
+OK\r\n
|
||||
-ERR <description>\r\n
|
||||
```
|
||||
|
||||
Sent only when `verbose=true` in `CONNECT`. The client always sets `verbose=false`, so `+OK` is not expected.
|
||||
|
||||
## Protocol Parser
|
||||
|
||||
The `Connection` struct handles all protocol parsing and serialization:
|
||||
|
||||
### Read Path (`try_read_op`)
|
||||
|
||||
1. Search for `\r\n` in `read_buf` using `memchr::memmem::find`
|
||||
2. Inspect the first bytes to determine the operation type:
|
||||
- `+OK` → `ServerOp::Ok`
|
||||
- `PING` → `ServerOp::Ping`
|
||||
- `PONG` → `ServerOp::Pong`
|
||||
- `-ERR` → `ServerOp::Error(...)` (description is `trim_matches('\'')`)
|
||||
- `INFO ` → `ServerOp::Info(...)` (serde_json deserialization)
|
||||
- `MSG ` → Parse subject/sid/reply/size, then read payload
|
||||
- `HMSG ` → Parse subject/sid/reply/header_len/total_len, then read headers + payload
|
||||
3. For `MSG`/`HMSG`: if the full message body hasn't been read yet, return `None` (wait for more data)
|
||||
4. For `HMSG`: parse the header block — extract version line (`NATS/1.0[ <status>[ <description>]]`), then key-value pairs (supports folded/multi-line header values)
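The read-path steps for a plain `MSG` can be sketched as follows. This is a simplified std-only model that assumes a byte-slice input and returns `Ok(None)` for "wait for more data"; the `Msg` struct and `try_parse_msg` are names made up for the example, and the real parser uses `memchr` and `BytesMut` instead.

```rust
#[derive(Debug, PartialEq)]
struct Msg {
    subject: String,
    sid: u64,
    reply: Option<String>,
    payload: Vec<u8>,
}

// Returns the parsed message and the number of bytes consumed,
// or Ok(None) when the buffer does not yet hold a complete frame.
fn try_parse_msg(buf: &[u8]) -> Result<Option<(Msg, usize)>, String> {
    // Step 1: find the end of the control line.
    let Some(line_end) = buf.windows(2).position(|w| w == b"\r\n") else {
        return Ok(None);
    };
    let line = std::str::from_utf8(&buf[..line_end]).map_err(|e| e.to_string())?;

    // Step 2: check the operation and split its arguments.
    let mut parts = line.split_ascii_whitespace();
    if parts.next() != Some("MSG") {
        return Err(format!("not a MSG line: {line}"));
    }
    let args: Vec<&str> = parts.collect();
    let (subject, sid, reply, size) = match args.as_slice() {
        [subject, sid, size] => (*subject, *sid, None, *size),
        [subject, sid, reply, size] => (*subject, *sid, Some(*reply), *size),
        _ => return Err(format!("malformed MSG line: {line}")),
    };
    let sid: u64 = sid.parse().map_err(|_| "invalid sid".to_string())?;
    let size: usize = size.parse().map_err(|_| "invalid size".to_string())?;

    // Step 3: wait until the payload and its trailing CRLF have arrived.
    let body_start = line_end + 2;
    let frame_end = size
        .checked_add(body_start + 2)
        .ok_or_else(|| "payload size overflow".to_string())?;
    if buf.len() < frame_end {
        return Ok(None);
    }
    if &buf[body_start + size..frame_end] != b"\r\n" {
        return Err("payload not terminated by CRLF".to_string());
    }

    let msg = Msg {
        subject: subject.to_string(),
        sid,
        reply: reply.map(str::to_string),
        payload: buf[body_start..body_start + size].to_vec(),
    };
    Ok(Some((msg, frame_end)))
}
```

Returning the consumed length lets the caller advance its read buffer and try again on whatever bytes remain.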
|
||||
|
||||
### Write Path (`enqueue_write_op`)
|
||||
|
||||
Writes use a two-tier buffering strategy:
|
||||
- **Small writes** (< 4096 bytes): flattened into `flattened_writes: BytesMut`
|
||||
- **Large writes** (≥ 4096 bytes): appended as separate `Bytes` chunks in `write_buf: VecDeque<Bytes>`
|
||||
|
||||
This enables efficient vectored I/O when the underlying stream supports it.
|
||||
|
||||
### Write Flush Strategy
|
||||
|
||||
The `should_flush()` method returns:
|
||||
- `Yes` — buffers empty but haven't flushed yet
|
||||
- `May` — buffers not empty and haven't flushed
|
||||
- `No` — already flushed or nothing to flush
|
||||
|
||||
The `ConnectionHandler` calls `poll_flush()` after processing commands, ensuring data is actually sent to the server.
|
||||
|
||||
## Vectored I/O
|
||||
|
||||
When `stream.is_write_vectored()` returns true, the connection uses `poll_write_vectored()` to write up to 64 `IoSlice`s at once. This is significantly more efficient for bursty publish patterns.
|
||||
|
||||
```rust
|
||||
const WRITE_VECTORED_CHUNKS: usize = 64;
|
||||
```
|
||||
|
||||
## WebSocket Transport
|
||||
|
||||
When the `websockets` feature is enabled, `WebSocketAdapter<T>` wraps `tokio_websockets::WebSocketStream<T>` to implement `AsyncRead + AsyncWrite`, making WebSocket connections transparent to the protocol layer.
|
||||
|
||||
```rust
|
||||
#[cfg(feature = "websockets")]
|
||||
pub(crate) struct WebSocketAdapter<T> {
|
||||
pub(crate) inner: WebSocketStream<T>,
|
||||
pub(crate) read_buf: BytesMut,
|
||||
}
|
||||
```
|
||||
|
||||
WebSocket connections use `ws://` or `wss://` scheme in the server URL. TLS for `wss://` is handled by the WebSocket library's built-in TLS support.
|
||||
|
||||
## Connection Lifecycle
|
||||
|
||||
### Initial Connection Flow
|
||||
|
||||
```
|
||||
Client Server
|
||||
│ │
|
||||
│──── TCP connect ────────────────────▶ │
|
||||
│◀──── INFO {server_id, nonce, ...} ─── │
|
||||
│──── CONNECT {auth, ...} ──────────▶ │
|
||||
│──── PING ─────────────────────────▶ │
|
||||
│◀──── PONG (or -ERR) ─────────────── │
|
||||
│ │
|
||||
│ [connected, ConnectionHandler runs] │
|
||||
```
|
||||
|
||||
If `tls_first` is enabled, TLS is established before reading INFO:
|
||||
|
||||
```
|
||||
Client Server
|
||||
│ │
|
||||
│──── TCP connect ────────────────────▶ │
|
||||
│──── TLS handshake ─────────────────▶ │
|
||||
│◀──── TLS handshake ──────────────── │
|
||||
│◀──── INFO {...} ──────────────────── │
|
||||
│──── CONNECT + PING ────────────────▶ │
|
||||
│◀──── PONG ────────────────────────── │
|
||||
```
|
||||
|
||||
### Ping/Pong Keepalive
|
||||
|
||||
- Client sends PING every `ping_interval` (default 60s)
|
||||
- Server responds with PONG
|
||||
- If `pending_pings > MAX_PENDING_PINGS (2)`, connection is considered dead
|
||||
- Any server operation resets the ping interval timer
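The pending-ping rule can be modelled as a tiny counter. This sketch assumes the counter is incremented on each interval tick before the comparison and decremented on each PONG; that ordering, and the `Keepalive` and `Tick` names, are assumptions rather than details confirmed by the notes above.

```rust
const MAX_PENDING_PINGS: usize = 2;

#[derive(Debug, PartialEq)]
enum Tick {
    SendPing, // write PING and keep the connection
    Dead,     // too many unanswered pings: treat as disconnected
}

struct Keepalive {
    pending_pings: usize,
}

impl Keepalive {
    fn new() -> Self {
        Keepalive { pending_pings: 0 }
    }

    // Called every `ping_interval`.
    fn on_interval(&mut self) -> Tick {
        self.pending_pings += 1;
        if self.pending_pings > MAX_PENDING_PINGS {
            Tick::Dead
        } else {
            Tick::SendPing
        }
    }

    // Called when the server answers with PONG.
    fn on_pong(&mut self) {
        self.pending_pings = self.pending_pings.saturating_sub(1);
    }
}
```

Under these assumptions a silent server is detected on the third interval tick, so roughly three ping intervals after the last PONG.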
|
||||
|
||||
### Reconnection Flow
|
||||
|
||||
On disconnect:
|
||||
1. `handle_disconnect()` sends `Event::Disconnected` and sets state to `Disconnected`
|
||||
2. `handle_reconnect()` calls `connector.connect()` which:
|
||||
- Shuffles servers (unless `retain_servers_order`)
|
||||
- Sorts by `failed_attempts` (ascending)
|
||||
- Iterates through servers with exponential backoff delay
|
||||
- On each server: DNS resolve → TCP connect → INFO → TLS (if needed) → CONNECT+PING → PONG
|
||||
3. On success:
|
||||
- Sends `Event::Connected`, sets state to `Connected`
|
||||
- Removes closed subscriptions
|
||||
- Re-subscribes all active subscriptions (with adjusted `max = max - delivered`)
|
||||
- Re-subscribes the multiplexer (if active)
|
||||
4. On failure with `MaxReconnects` reached, the handler loop exits
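The re-subscription step with the adjusted `max` can be sketched like this. The `Subscription` struct and `rehydrate` function are hypothetical, and skipping subscriptions whose limit has already been reached is an assumption made for the example.

```rust
struct Subscription {
    sid: u64,
    subject: String,
    queue_group: Option<String>,
    delivered: u64,
    max: Option<u64>,
}

// Produces the protocol lines to replay after a reconnect.
fn rehydrate(subs: &[Subscription]) -> Vec<String> {
    let mut ops = Vec::new();
    for sub in subs {
        // Remaining budget: max - delivered (None means unlimited).
        let remaining = sub.max.map(|max| max.saturating_sub(sub.delivered));
        if remaining == Some(0) {
            continue; // assumed: nothing left to deliver, do not re-subscribe
        }
        ops.push(match &sub.queue_group {
            Some(group) => format!("SUB {} {} {}\r\n", sub.subject, group, sub.sid),
            None => format!("SUB {} {}\r\n", sub.subject, sub.sid),
        });
        if let Some(left) = remaining {
            // Re-arm auto-unsubscribe with the reduced count.
            ops.push(format!("UNSUB {} {}\r\n", sub.sid, left));
        }
    }
    ops
}
```

Reusing the original `sid` means the client-side subscription map stays valid across the reconnect.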
|
||||
|
||||
### Default Reconnect Delay
|
||||
|
||||
Exponential backoff capped at 4 seconds:
|
||||
|
||||
```rust
|
||||
fn reconnect_delay_callback_default(attempts: usize) -> Duration {
|
||||
if attempts <= 1 {
|
||||
Duration::from_millis(0)
|
||||
} else {
|
||||
let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
|
||||
cmp::min(Duration::from_millis(2_u64.saturating_pow(exp)), Duration::from_secs(4))
|
||||
}
|
||||
}
|
||||
```
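To see the schedule the function produces, it can be run over a range of attempt numbers. The sketch below repeats the function with its imports under the shorter name `reconnect_delay`; `delay_schedule` is a hypothetical helper added for the example.

```rust
use std::cmp;
use std::time::Duration;

// Same logic as the default callback shown above.
fn reconnect_delay(attempts: usize) -> Duration {
    if attempts <= 1 {
        Duration::from_millis(0)
    } else {
        let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
        cmp::min(
            Duration::from_millis(2_u64.saturating_pow(exp)),
            Duration::from_secs(4),
        )
    }
}

// Hypothetical helper: (attempt, delay) pairs for attempts 1..=max_attempt.
fn delay_schedule(max_attempt: usize) -> Vec<(usize, Duration)> {
    (1..=max_attempt)
        .map(|attempt| (attempt, reconnect_delay(attempt)))
        .collect()
}
```

The delay doubles from attempt 2 onward (2 ms, 4 ms, 8 ms, ...) and reaches the 4-second cap at attempt 13, where 2^12 ms = 4096 ms exceeds it.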
|
||||
|
||||
| Attempt | Delay |
|---------|-------|
| 1 | 0ms |
| 2 | 2ms |
| 3 | 4ms |
| 4 | 8ms |
| 5 | 16ms |
| 6 | 32ms |
| 7 | 64ms |
| 8 | 128ms |
| 9 | 256ms |
| 10 | 512ms |
| 11 | 1024ms |
| 12 | 2048ms |
| 13+ | 4000ms (cap) |
|
||||
@@ -1,221 +0,0 @@
|
||||
# async-nats: Connection Management & Configuration
|
||||
|
||||
## ConnectOptions Builder
|
||||
|
||||
`ConnectOptions` provides a builder for all connection configuration:
|
||||
|
||||
```rust
|
||||
let client = ConnectOptions::new()
|
||||
.require_tls(true)
|
||||
.ping_interval(Duration::from_secs(10))
|
||||
.name("my-service")
|
||||
.connect("demo.nats.io")
|
||||
.await?;
|
||||
```
|
||||
|
||||
### Authentication Methods
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `with_token(token)` | Token-based auth |
|
||||
| `with_user_and_password(user, pass)` | Username/password auth |
|
||||
| `with_nkey(seed)` | NKey auth (requires `nkeys` feature) |
|
||||
| `with_jwt(jwt, sign_cb)` | JWT + signing callback (requires `nkeys`) |
|
||||
| `with_credentials_file(path)` | Load from `.creds` file (requires `nkeys`) |
|
||||
| `with_credentials(creds_str)` | Parse credentials string (requires `nkeys`) |
|
||||
| `with_auth_callback(cb)` | Dynamic auth callback receiving nonce, returning `Auth` |
|
||||
|
||||
The auth callback is the most flexible — it receives the server nonce and can return any combination of auth fields:
|
||||
|
||||
```rust
|
||||
ConnectOptions::with_auth_callback(move |nonce| async move {
|
||||
let mut auth = Auth::new();
|
||||
auth.username = Some("user".to_string());
|
||||
auth.password = Some("pass".to_string());
|
||||
Ok(auth)
|
||||
})
|
||||
```
|
||||
|
||||
### TLS Configuration
|
||||
|
||||
| Option | Description |
|
||||
|-------|-------------|
|
||||
| `require_tls(bool)` | Require TLS for the connection |
|
||||
| `tls_first()` | Establish TLS before INFO (requires server `handshake_first`) |
|
||||
| `add_root_certificates(path)` | Load root CA certificates from PEM file |
|
||||
| `add_client_certificate(cert, key)` | Load client certificate for mTLS |
|
||||
| `tls_client_config(config)` | Pass a custom `rustls::ClientConfig` |
|
||||
|
||||
Two TLS crypto backends: `ring` (default) or `aws-lc-rs` (via feature flags). FIPS mode available via `aws-lc-rs` + `fips` features.
|
||||
|
||||
### Connection Behavior
|
||||
|
||||
| Option | Default | Description |
|
||||
|--------|---------|-------------|
|
||||
| `connection_timeout` | 5s | Timeout for full connection establishment |
|
||||
| `request_timeout` | 10s | Default timeout for `Client::request` |
|
||||
| `ping_interval` | 60s | How often client sends PING |
|
||||
| `retry_on_initial_connect` | false | Return client immediately, connect in background |
|
||||
| `max_reconnects` | None (unlimited) | Max consecutive reconnect attempts |
|
||||
| `ignore_discovered_servers` | false | Ignore servers advertised in INFO |
|
||||
| `retain_servers_order` | false | Don't shuffle server list on reconnect |
|
||||
| `skip_subject_validation` | false | Skip whitespace validation on publish subjects |
|
||||
| `subscription_capacity` | 65536 | mpsc channel capacity per subscription |
|
||||
| `client_capacity` | 2048 | mpsc channel capacity for command sender |
|
||||
| `custom_inbox_prefix` | `_INBOX` | Custom prefix for inbox subjects |
|
||||
| `read_buffer_capacity` | 65535 | Initial size of the protocol read buffer |
|
||||
| `local_address` | None | Local socket address to bind to |
|
||||
| `no_echo` | false | Don't deliver messages published by this connection |
|
||||
|
||||
### Reconnection Callbacks
|
||||
|
||||
**`reconnect_delay_callback`**: Custom backoff strategy:
|
||||
|
||||
```rust
|
||||
.reconnect_delay_callback(|attempts| {
|
||||
Duration::from_millis(std::cmp::min((attempts * 100) as u64, 8000))
|
||||
})
|
||||
```
|
||||
|
||||
**`reconnect_to_server_callback`**: Select which server to connect to on each reconnect attempt:
|
||||
|
||||
```rust
|
||||
.reconnect_to_server_callback(|servers, _info| async move {
|
||||
servers.first().map(|s| ReconnectToServer {
|
||||
addr: s.addr.clone(),
|
||||
delay: Some(Duration::ZERO),
|
||||
})
|
||||
})
|
||||
```
|
||||
|
||||
Receives `(Vec<Server>, ServerInfo)`, returns `Option<ReconnectToServer>`. If the returned server isn't in the pool, falls back to default selection.
|
||||
|
||||
**`event_callback`**: Receive async notifications:
|
||||
|
||||
```rust
|
||||
.event_callback(|event| async move {
|
||||
match event {
|
||||
Event::Disconnected => println!("disconnected"),
|
||||
Event::Connected => println!("connected"),
|
||||
Event::SlowConsumer(sid) => eprintln!("slow consumer: {sid}"),
|
||||
_ => {}
|
||||
}
|
||||
})
|
||||
```
|
||||
|
||||
## Connection Handler Internals
|
||||
|
||||
### ProcessFut — The Core Event Loop
|
||||
|
||||
The `ConnectionHandler::process()` method creates a custom `Future` (`ProcessFut`) that drives the connection forward. Each `poll()` call:
|
||||
|
||||
1. **Check ping interval** — if timer ticked, send PING; if too many pending pings, disconnect
|
||||
2. **Read server operations** — drain all available `ServerOp`s from `Connection::poll_read_op()`
|
||||
3. **Process drain completions** — remove subscriptions that finished draining
|
||||
4. **Handle commands** — receive up to 16 `Command`s from the mpsc channel and process them
|
||||
5. **Write to socket** — flush the write buffer via `Connection::poll_write()`
|
||||
6. **Flush** — call `poll_flush()` on the underlying stream when needed
|
||||
7. **Check reconnect flag** — if `should_reconnect` is set, shut down and reconnect
|
||||
|
||||
```rust
|
||||
const RECV_CHUNK_SIZE: usize = 16;
|
||||
```
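
Step 4 (bounded command handling) can be illustrated without any async machinery. This is only a sketch under stated assumptions: it uses `std::sync::mpsc` as a stand-in for the real tokio channel, and `drain_commands` is a hypothetical name, not a function in async-nats.

```rust
use std::sync::mpsc::Receiver;

const RECV_CHUNK_SIZE: usize = 16;

// Hypothetical helper: take at most RECV_CHUNK_SIZE commands that are
// already queued, then return so other work (reads, writes, pings)
// gets a turn before the next batch.
fn drain_commands<T>(rx: &Receiver<T>) -> Vec<T> {
    let mut batch = Vec::with_capacity(RECV_CHUNK_SIZE);
    while batch.len() < RECV_CHUNK_SIZE {
        match rx.try_recv() {
            Ok(command) => batch.push(command),
            // Channel is empty or closed: stop for this round.
            Err(_) => break,
        }
    }
    batch
}
```

Capping the batch keeps a busy publisher from starving the read and flush steps of the same poll cycle.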
|
||||
|
||||
### Exit Reasons
|
||||
|
||||
The event loop exits with one of:
|
||||
|
||||
| Reason | Action |
|
||||
|--------|--------|
|
||||
| `Disconnected(Option<io::Error>)` | Attempt reconnection |
|
||||
| `ReconnectRequested` | Shut down stream, attempt reconnection |
|
||||
| `Closed` | Send `Event::Closed`, exit loop |
|
||||
|
||||
### Handle Disconnect & Reconnect
|
||||
|
||||
```rust
|
||||
async fn handle_disconnect(&mut self) -> Result<(), ConnectError> {
|
||||
self.pending_pings = 0;
|
||||
self.connector.events_tx.try_send(Event::Disconnected).ok();
|
||||
self.connector.state_tx.send(State::Disconnected).ok();
|
||||
self.handle_reconnect().await
|
||||
}
|
||||
|
||||
async fn handle_reconnect(&mut self) -> Result<(), ConnectError> {
|
||||
let (info, connection) = self.connector.connect().await?;
|
||||
self.connection = connection;
|
||||
let _ = self.info_sender.send(Some(info));
|
||||
|
||||
// Remove closed subscriptions
|
||||
self.subscriptions.retain(|_, sub| !sub.sender.is_closed());
|
||||
|
||||
// Re-subscribe all active subscriptions
|
||||
for (sid, subscription) in &self.subscriptions {
|
||||
self.connection.enqueue_write_op(&ClientOp::Subscribe {
|
||||
sid: *sid,
|
||||
subject: subscription.subject.to_owned(),
|
||||
queue_group: subscription.queue_group.to_owned(),
|
||||
});
|
||||
if let Some(max) = subscription.max {
|
||||
self.connection.enqueue_write_op(&ClientOp::Unsubscribe {
|
||||
sid: *sid,
|
||||
max: Some(max.saturating_sub(subscription.delivered)),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Re-subscribe multiplexer if active
|
||||
if let Some(multiplexer) = &self.multiplexer {
|
||||
self.connection.enqueue_write_op(&ClientOp::Subscribe {
|
||||
sid: MULTIPLEXER_SID,
|
||||
subject: multiplexer.subject.to_owned(),
|
||||
queue_group: None,
|
||||
});
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
```
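
The auto-unsubscribe adjustment in `handle_reconnect` is worth isolating: after a reconnect the server has no memory of earlier deliveries, so the client re-sends the limit minus what it already received. A tiny sketch, where `remaining_after_reconnect` is a hypothetical helper name:

```rust
// Hypothetical helper mirroring `max.saturating_sub(subscription.delivered)`
// from the listing above. Saturating subtraction avoids underflow if more
// messages were delivered than the limit allowed.
fn remaining_after_reconnect(max: u64, delivered: u64) -> u64 {
    max.saturating_sub(delivered)
}
```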
|
||||
|
||||
## Request/Reply Multiplexer
|
||||
|
||||
The client uses a **multiplexer** pattern for request/reply to avoid creating a separate subscription per request:
|
||||
|
||||
1. A single wildcard subscription is created on first request: `_INBOX.<random_id>.*`
|
||||
2. Each request gets a unique token appended to the inbox: `_INBOX.<random_id>.<token>`
|
||||
3. When a response arrives, the token is extracted from the subject and used to look up the `oneshot::Sender` in `multiplexer.senders`
|
||||
4. The response is forwarded through the oneshot channel to the waiting `send_request()` future
|
||||
|
||||
```rust
|
||||
struct Multiplexer {
|
||||
subject: Subject, // _INBOX.<id>.*
|
||||
prefix: Subject, // _INBOX.<id>.
|
||||
senders: HashMap<String, oneshot::Sender<Message>>, // token → sender
|
||||
}
|
||||
```
|
||||
|
||||
The multiplexer subscription uses `sid = 0` (`MULTIPLEXER_SID`), which is separate from regular subscription IDs (which start at 1).
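
The token routing described above can be sketched with the standard library alone. This is an illustrative model, not the async-nats implementation: `InboxMultiplexer` and its methods are hypothetical, `std::sync::mpsc` stands in for the oneshot channel, and tokens are a simple counter rather than random IDs.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical, simplified model of the multiplexer.
struct InboxMultiplexer {
    prefix: String,                           // "_INBOX.<id>."
    senders: HashMap<String, Sender<String>>, // token -> waiting requester
    next_token: u64,
}

impl InboxMultiplexer {
    fn new(id: &str) -> Self {
        InboxMultiplexer {
            prefix: format!("_INBOX.{}.", id),
            senders: HashMap::new(),
            next_token: 0,
        }
    }

    // The single wildcard subject subscribed on first request.
    fn wildcard_subject(&self) -> String {
        format!("{}*", self.prefix)
    }

    // Register a pending request: returns its reply subject and the
    // receiving end on which the response will arrive.
    fn register(&mut self) -> (String, Receiver<String>) {
        let token = self.next_token.to_string();
        self.next_token += 1;
        let (tx, rx) = channel();
        let reply = format!("{}{}", self.prefix, token);
        self.senders.insert(token, tx);
        (reply, rx)
    }

    // Route an incoming response by the token at the end of its subject.
    // Returns true if a waiting requester received the payload.
    fn dispatch(&mut self, subject: &str, payload: &str) -> bool {
        let token = match subject.strip_prefix(self.prefix.as_str()) {
            Some(token) => token,
            None => return false,
        };
        match self.senders.remove(token) {
            Some(tx) => tx.send(payload.to_string()).is_ok(),
            None => false,
        }
    }
}
```

Removing the sender on dispatch gives each request one-shot semantics: a second response for the same token finds no entry and is dropped.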
|
||||
|
||||
### Custom Inbox Bypass
|
||||
|
||||
If a `Request` has a custom `inbox` set, the multiplexer is bypassed — a dedicated subscription is created for that specific request, and the timeout/response logic is handled locally within `send_request()`.
|
||||
|
||||
## Server Pool Management
|
||||
|
||||
The `Connector` maintains a `Vec<Server>` pool. Servers can come from:
|
||||
1. **Explicit URLs** — provided by the user at connect time
|
||||
2. **Discovered servers** — advertised in `INFO.connect_urls` (unless `ignore_discovered_servers` is set)
|
||||
|
||||
On reconnection:
|
||||
- Servers are shuffled (unless `retain_servers_order`)
|
||||
- Sorted by `failed_attempts` (ascending) — prefer servers that haven't failed recently
|
||||
- Each server is tried with exponential backoff delay
|
||||
- On success: `failed_attempts` reset to 0, `did_connect` set to true
|
||||
- On failure: `failed_attempts` incremented, `last_error` updated
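
A rough sketch of the ordering and bookkeeping rules in the list above, using only the standard library. `PoolServer`, `order_for_reconnect`, and `record_attempt` are hypothetical names; the shuffle step is omitted because std has no random number generator, and the backoff delay is not modelled.

```rust
// Hypothetical, reduced view of a pool entry.
#[derive(Debug, Clone, PartialEq)]
struct PoolServer {
    addr: String,
    failed_attempts: usize,
    did_connect: bool,
}

// Prefer servers with the fewest recorded failures. The sort is stable,
// so any earlier (shuffled) order is preserved among ties.
fn order_for_reconnect(pool: &mut [PoolServer]) {
    pool.sort_by_key(|server| server.failed_attempts);
}

// Apply the success/failure bookkeeping for one connection attempt.
fn record_attempt(server: &mut PoolServer, connected: bool) {
    if connected {
        server.failed_attempts = 0;
        server.did_connect = true;
    } else {
        server.failed_attempts += 1;
    }
}
```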
|
||||
|
||||
### Dynamic Server Pool Updates
|
||||
|
||||
`Client::set_server_pool()` replaces the pool at runtime:
|
||||
- Per-server state is preserved for servers that appear in both old and new pools
|
||||
- The global reconnection attempt counter is reset
|
||||
- Cannot mix WebSocket and non-WebSocket URLs
|
||||
- Pool cannot be empty
|
||||
@@ -1,373 +0,0 @@
|
||||
# async-nats: JetStream
|
||||
|
||||
## Overview
|
||||
|
||||
JetStream is NATS' built-in persistence layer, providing stream-based messaging with at-least-once and exactly-once delivery semantics. The `async-nats` JetStream API is accessed through a `Context` object.
|
||||
|
||||
### Creating a Context
|
||||
|
||||
```rust
|
||||
// Default context (prefix: $JS.API)
|
||||
let jetstream = async_nats::jetstream::new(client);
|
||||
|
||||
// With domain (prefix: $JS.<domain>.API)
|
||||
let jetstream = async_nats::jetstream::with_domain(client, "hub");
|
||||
|
||||
// With custom prefix
|
||||
let jetstream = async_nats::jetstream::with_prefix(client, "JS.acc@hub.API");
|
||||
|
||||
// Builder with fine-grained control
|
||||
let context = ContextBuilder::new()
|
||||
.timeout(Duration::from_secs(5))
|
||||
.api_prefix("MY.JS.API")
|
||||
.max_ack_inflight(1000)
|
||||
.backpressure_on_inflight(true)
|
||||
.ack_timeout(Duration::from_secs(30))
|
||||
.build(client);
|
||||
```
|
||||
|
||||
## Context
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Context {
|
||||
pub(crate) client: Client,
|
||||
pub(crate) prefix: String,
|
||||
pub(crate) timeout: Duration,
|
||||
pub(crate) max_ack_semaphore: Arc<tokio::sync::Semaphore>,
|
||||
pub(crate) ack_sender: mpsc::Sender<(oneshot::Receiver<Message>, OwnedSemaphorePermit)>,
|
||||
pub(crate) backpressure_on_inflight: bool,
|
||||
pub(crate) semaphore_capacity: usize,
|
||||
}
|
||||
```
|
||||
|
||||
### Publish Backpressure
|
||||
|
||||
The context uses a semaphore to limit the number of pending publish acknowledgments:
|
||||
|
||||
- `max_ack_inflight(n)` — sets semaphore capacity (default 5000)
|
||||
- `backpressure_on_inflight(true)` — `publish()` waits for a permit when limit is reached
|
||||
- `backpressure_on_inflight(false)` — `publish()` returns `MaxAckPending` error immediately when limit is reached
|
||||
|
||||
A background **acker task** monitors pending acks with a timeout (`ack_timeout`, default 30s), releasing permits when acks arrive or time out.
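
The fail-fast mode (`backpressure_on_inflight(false)`) amounts to a bounded permit counter. Below is a minimal sketch assuming an atomic counter in place of the tokio semaphore; `InflightLimiter` and the unit struct `MaxAckPending` are hypothetical stand-ins, and the waiting mode and the ack timeout are not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical error type standing in for the `MaxAckPending` error.
#[derive(Debug, PartialEq)]
struct MaxAckPending;

// Hypothetical permit counter for pending publish acknowledgments.
struct InflightLimiter {
    capacity: usize,
    in_flight: AtomicUsize,
}

impl InflightLimiter {
    fn new(capacity: usize) -> Self {
        InflightLimiter { capacity, in_flight: AtomicUsize::new(0) }
    }

    // Take one permit, or fail immediately when the limit is reached.
    fn try_acquire(&self) -> Result<(), MaxAckPending> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.capacity {
                return Err(MaxAckPending);
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(()),
                Err(actual) => current = actual,
            }
        }
    }

    // Return one permit when an ack arrives or times out.
    // A release without a matching acquire is ignored.
    fn release(&self) {
        let _ = self
            .in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
    }
}
```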
|
||||
|
||||
### JetStream API Request Pattern
|
||||
|
||||
All JetStream API calls follow the same pattern:
|
||||
|
||||
1. Build a subject from the prefix: `format!("{}.STREAM.CREATE.<name>", self.prefix)`
|
||||
2. Serialize the request payload as JSON
|
||||
3. Send a request via `client.send_request()` with the API subject
|
||||
4. Deserialize the response as `Response<T>` (either `Ok(T)` or `Err { error: ErrorCode }`)
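
Step 1 is plain string formatting. A small sketch, assuming the prefix forms listed under "Creating a Context"; `domain_prefix` and `stream_create_subject` are hypothetical helper names, not async-nats functions.

```rust
// Hypothetical helper: API prefix for a JetStream domain.
fn domain_prefix(domain: &str) -> String {
    format!("$JS.{}.API", domain)
}

// Hypothetical helper: subject for the stream-create API call.
fn stream_create_subject(prefix: &str, stream: &str) -> String {
    format!("{}.STREAM.CREATE.{}", prefix, stream)
}
```

With the default prefix, creating a stream named `events` targets `$JS.API.STREAM.CREATE.events`; with domain `hub` it targets `$JS.hub.API.STREAM.CREATE.events`.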
|
||||
|
||||
## Streams
|
||||
|
||||
### Stream Handle
|
||||
|
||||
```rust
|
||||
pub struct Stream<I = Info> {
|
||||
context: Context,
|
||||
info: I,
|
||||
name: String,
|
||||
}
|
||||
```
|
||||
|
||||
`Stream<Info>` carries server-side info. `Stream<()>` is a lightweight handle that skips the INFO fetch. `Stream` (no generic) defaults to `Stream<Info>`.
|
||||
|
||||
### Stream Config
|
||||
|
||||
```rust
|
||||
pub struct Config {
|
||||
pub name: String,
|
||||
pub description: Option<String>,
|
||||
pub subjects: Vec<String>,
|
||||
pub retention: RetentionPolicy,
|
||||
pub max_consumers: i64,
|
||||
pub max_messages: i64,
|
||||
pub max_messages_per_subject: i64,
|
||||
pub max_bytes: i64,
|
||||
pub max_age: Duration,
|
||||
pub max_messages_per_stream: i64,
|
||||
pub max_msg_size: i32,
|
||||
pub discard: DiscardPolicy,
|
||||
pub discard_new_per_subject: bool,
|
||||
pub storage: StorageType,
|
||||
pub num_replicas: usize,
|
||||
pub no_ack: bool,
|
||||
pub duplicate_window: Duration,
|
||||
pub placement: Option<Placement>,
|
||||
pub mirror: Option<Source>,
|
||||
pub sources: Option<Vec<Source>>,
|
||||
pub sealed: bool,
|
||||
pub allow_direct: bool,
|
||||
pub allow_rollup_hdrs: bool,
|
||||
// server_2_10 features:
|
||||
pub compression: Option<Compression>,
|
||||
pub first_sequence: Option<u64>,
|
||||
pub subject_transform: Option<SubjectTransform>,
|
||||
pub republish: Option<Republish>,
|
||||
pub metadata: Option<HashMap<String, String>>,
|
||||
}
|
||||
```
|
||||
|
||||
### Stream Operations
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `create_stream(config)` | Create a new stream |
|
||||
| `get_stream(name)` | Get stream handle (with INFO) |
|
||||
| `get_stream_no_info(name)` | Get lightweight handle (no server round-trip) |
|
||||
| `get_or_create_stream(config)` | Get existing or create new |
|
||||
| `delete_stream(name)` | Delete a stream |
|
||||
| `update_stream(config)` | Update stream configuration |
|
||||
| `create_or_update_stream(config)` | Update or create if not found |
|
||||
| `stream_names()` | `Stream` of stream names (paginated) |
|
||||
| `streams()` | `Stream` of stream info (paginated) |
|
||||
| `stream_by_subject(subject)` | Find stream name containing subject |
|
||||
|
||||
### Stream Handle Methods
|
||||
|
||||
```rust
|
||||
let stream: Stream = jetstream.get_stream("events").await?;
|
||||
|
||||
// Info
|
||||
let info: Info = stream.info().await?; // Fresh info from server
|
||||
let info: &Info = stream.cached_info(); // Cached info from last fetch
|
||||
|
||||
// Message operations
|
||||
stream.get_raw_message(seq).await?; // Get raw message by sequence
|
||||
stream.get_last_raw_message_by_subject(subj).await?; // Get last message for subject
|
||||
stream.direct_get(seq).await?; // Direct get (if allow_direct)
|
||||
stream.direct_get_last_for_subject(subj).await?; // Direct last by subject
|
||||
stream.delete_message(seq).await?; // Delete a specific message
|
||||
stream.purge().await?; // Purge all messages
|
||||
stream.purge().filter(subj).await?; // Purge messages for subject
|
||||
|
||||
// Consumers
|
||||
stream.create_consumer(config).await?; // Create consumer bound to stream
|
||||
stream.get_consumer(name).await?; // Get existing consumer
|
||||
stream.delete_consumer(name).await?; // Delete consumer
|
||||
```
|
||||
|
||||
## Consumers
|
||||
|
||||
### Consumer Types
|
||||
|
||||
Two consumer types, each with distinct delivery models:
|
||||
|
||||
1. **Pull Consumer** (`pull::Config` / `PullConsumer`) — Client explicitly requests batches of messages
|
||||
2. **Push Consumer** (`push::Config` / `PushConsumer`) — Server pushes messages to a deliver subject
|
||||
|
||||
### Pull Consumer
|
||||
|
||||
```rust
|
||||
let consumer: PullConsumer = stream
|
||||
.get_or_create_consumer("my-consumer", pull::Config {
|
||||
durable_name: Some("my-consumer".to_string()),
|
||||
..Default::default()
|
||||
})
|
||||
.await?;
|
||||
```
|
||||
|
||||
**Key methods**:
|
||||
- `consumer.batch(n).await?` — Fetch up to `n` messages (one-shot batch)
|
||||
- `consumer.messages().await?` — Continuous `Stream` of messages
|
||||
- `consumer.sequence(n).await?` — Continuous `Stream` of batches of `n` messages
|
||||
- `consumer.fetch().max(n).expires(dur).await?` — Configurable fetch
|
||||
|
||||
Each message from a pull consumer is a `jetstream::Message` which has `ack()` methods.
|
||||
|
||||
### Push Consumer
|
||||
|
||||
Two push consumer variants:
|
||||
|
||||
1. **Standard** (`push::Config`) — messages delivered to a specific subject
|
||||
2. **Ordered** (`push::OrderedConfig`) — auto-recreated on failure, with flow control
|
||||
|
||||
```rust
|
||||
// Standard push
|
||||
let consumer = stream.create_consumer(push::Config {
|
||||
deliver_subject: "deliver.subject".to_string(),
|
||||
durable_name: Some("push-consumer".to_string()),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// Ordered push (no durable name, auto-recreates on failure)
|
||||
let consumer = stream.create_consumer(push::OrderedConfig {
|
||||
deliver_subject: client.new_inbox(),
|
||||
filter_subject: "events.>".to_string(),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
```
|
||||
|
||||
### Consumer Config (Shared Fields)
|
||||
|
||||
```rust
|
||||
pub struct Config {
|
||||
// Pull fields
|
||||
pub durable_name: Option<String>,
|
||||
pub name: Option<String>,
|
||||
|
||||
// Push fields
|
||||
pub deliver_subject: Option<String>,
|
||||
pub deliver_group: Option<String>,
|
||||
pub deliver_policy: DeliverPolicy,
|
||||
pub opt_start_time: Option<DateTime>,
|
||||
pub opt_start_sequence: Option<u64>,
|
||||
pub ack_policy: AckPolicy,
|
||||
pub ack_wait: Duration,
|
||||
pub max_deliver: i64,
|
||||
pub backoff: Vec<Duration>,
|
||||
pub filter_subject: String,
|
||||
pub filter_subjects: Vec<String>, // server_2_10+
|
||||
pub replay_policy: ReplayPolicy,
|
||||
pub rate_limit_bps: Option<u64>,
|
||||
pub max_waiting: i64, // pull: max outstanding pull requests
|
||||
pub max_ack_pending: i64,
|
||||
pub flow_control: bool,
|
||||
pub idle_heartbeat: Duration,
|
||||
pub headers_only: bool,
|
||||
pub num_replicas: usize,
|
||||
pub mem_storage: bool,
|
||||
pub description: Option<String>,
|
||||
pub metadata: Option<HashMap<String, String>>,
|
||||
pub inactive_threshold: Option<Duration>, // for ephemeral consumers
|
||||
}
|
||||
```
|
||||
|
||||
### Deliver Policy
|
||||
|
||||
```rust
|
||||
pub enum DeliverPolicy {
|
||||
All, // Deliver all messages
|
||||
Last, // Deliver last message only
|
||||
New, // Deliver only new messages
|
||||
ByStartSequence { start_sequence: u64 },
|
||||
ByStartTime { start_time: DateTime },
|
||||
LastPerSubject, // Deliver last message per subject
|
||||
}
|
||||
```
|
||||
|
||||
### Ack Policy
|
||||
|
||||
```rust
|
||||
pub enum AckPolicy {
|
||||
None, // No acknowledgment needed
|
||||
All, // Ack all messages up to this one
|
||||
Explicit, // Ack each message individually
|
||||
}
|
||||
```
|
||||
|
||||
## JetStream Messages
|
||||
|
||||
### `jetstream::Message`
|
||||
|
||||
Wraps a core `Message` with JetStream-specific metadata:
|
||||
|
||||
```rust
|
||||
pub struct Message {
|
||||
pub message: crate::Message, // The underlying NATS message
|
||||
pub ack_subject: Subject, // Subject for sending acks
|
||||
pub stream: String, // Stream name
|
||||
pub consumer: String, // Consumer name
|
||||
pub stream_sequence: u64, // Sequence in stream
|
||||
pub consumer_sequence: u64, // Sequence for this consumer
|
||||
pub delivered: u64, // Delivery count
|
||||
pub pending: u64, // Pending message count
|
||||
pub published: DateTime, // Original publish time
|
||||
}
|
||||
```
|
||||
|
||||
### Ack Methods
|
||||
|
||||
```rust
|
||||
// Acknowledge the message (sends +ACK without waiting for server confirmation)
message.ack().await?;

// Ack with a specific kind
message.ack_with(AckKind::Nak).await?;
message.ack_with(AckKind::Progress).await?;
message.ack_with(AckKind::Term).await?;
message.ack_with(AckKind::NakWithDelay(duration)).await?;
message.ack_with(AckKind::TermWithReason("reason".to_string())).await?;
|
||||
```
|
||||
|
||||
### `AckKind`
|
||||
|
||||
```rust
|
||||
pub enum AckKind {
    Ack,                    // +ACK: message processed
    Nak,                    // -NAK: redeliver
    Progress,               // +WPI: still working
    Term,                   // +TERM: do not redeliver
    NakWithDelay(Duration), // -NAK with a redelivery delay
    TermWithReason(String), // +TERM with a reason
}
|
||||
```
|
||||
|
||||
## JetStream Publish
|
||||
|
||||
### `Context::publish()`
|
||||
|
||||
JetStream publish returns a `PublishAckFuture` — a future that resolves to a `PublishAck`:
|
||||
|
||||
```rust
|
||||
let ack_future = jetstream.publish("events", "data".into()).await?;
|
||||
let ack: PublishAck = ack_future.await?; // Wait for server acknowledgment
|
||||
```
|
||||
|
||||
### `PublishAck`
|
||||
|
||||
```rust
|
||||
pub struct PublishAck {
|
||||
pub stream: String,
|
||||
pub sequence: u64,
|
||||
pub domain: String,
|
||||
pub duplicate: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### `PublishMessage` Builder
|
||||
|
||||
```rust
|
||||
let ack = jetstream.send_publish(
|
||||
"events",
|
||||
PublishMessage::build()
|
||||
.payload("data".into())
|
||||
.message_id("uuid-123") // Deduplication ID
|
||||
.expected_stream("events") // Fail if wrong stream
|
||||
.expected_last_msg_id("prev-id")
|
||||
.expected_last_sequence(42)
|
||||
.headers(headers),
|
||||
).await?;
|
||||
```
|
||||
|
||||
## Pagination
|
||||
|
||||
Stream and consumer listing uses pagination internally:
|
||||
|
||||
```rust
|
||||
pub struct StreamNames {
|
||||
context: Context,
|
||||
offset: usize,
|
||||
page_request: Option<Request>,
|
||||
streams: Vec<String>,
|
||||
subject: Option<String>,
|
||||
done: bool,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `futures_util::Stream<Item = Result<String, Error>>`, lazily fetching pages as needed.
|
||||
|
||||
## Error Handling
|
||||
|
||||
JetStream errors follow the `Response<T>` pattern:
|
||||
|
||||
```rust
|
||||
pub enum Response<T> {
|
||||
Ok(T),
|
||||
Err { error: ErrorCode },
|
||||
}
|
||||
```
|
||||
|
||||
`ErrorCode` carries the server's error code and description. Most JetStream-specific errors map to typed error enums (e.g., `CreateStreamError`, `ConsumerError`, etc.).
|
||||
@@ -1,237 +0,0 @@
|
||||
# async-nats: Key-Value Store
|
||||
|
||||
## Overview
|
||||
|
||||
The Key-Value (KV) store is an abstraction built on top of JetStream streams. Each KV bucket is backed by a JetStream stream with the naming convention `KV_<bucket_name>`. Keys are mapped to subjects under the `$KV.<bucket>.<key>` prefix.
|
||||
|
||||
The KV API requires the `kv` Cargo feature (which implies `jetstream`).
|
||||
|
||||
## Store Handle
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Store {
|
||||
pub name: String,
|
||||
pub stream_name: String,
|
||||
pub prefix: String, // $KV.<bucket>.
|
||||
pub put_prefix: Option<String>, // For mirrored buckets
|
||||
pub use_jetstream_prefix: bool, // Whether to prepend JS API prefix
|
||||
pub stream: Stream,
|
||||
}
|
||||
```
|
||||
|
||||
## Bucket Config
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct Config {
|
||||
pub bucket: String,
|
||||
pub description: String,
|
||||
pub max_value_size: i32,
|
||||
pub history: i64, // Max historical entries per key (1-64)
|
||||
pub max_age: Duration, // Max age of any entry
|
||||
pub max_bytes: i64, // Total bucket size limit
|
||||
pub storage: StorageType, // File or Memory
|
||||
pub num_replicas: usize,
|
||||
pub republish: Option<Republish>,
|
||||
pub mirror: Option<Source>, // Mirror another bucket
|
||||
pub sources: Option<Vec<Source>>,
|
||||
pub mirror_direct: bool,
|
||||
pub compression: bool, // server_2_10+
|
||||
pub placement: Option<Placement>,
|
||||
pub limit_markers: Option<Duration>, // server_2_11+
|
||||
}
|
||||
```
|
||||
|
||||
## Creating/Accessing Buckets
|
||||
|
||||
```rust
|
||||
// Create a new bucket
|
||||
let kv = jetstream.create_key_value(kv::Config {
|
||||
bucket: "my-bucket".to_string(),
|
||||
history: 10,
|
||||
max_age: Duration::from_secs(3600),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// Get an existing bucket
|
||||
let kv = jetstream.get_key_value("my-bucket").await?;
|
||||
|
||||
// Create or update
|
||||
let kv = jetstream.create_or_update_key_value(kv::Config { ... }).await?;
|
||||
|
||||
// Delete a bucket
|
||||
jetstream.delete_key_value("my-bucket").await?;
|
||||
```
|
||||
|
||||
## KV Operations
|
||||
|
||||
### Put
|
||||
|
||||
```rust
|
||||
let revision: u64 = kv.put("key", "value".into()).await?;
|
||||
```
|
||||
|
||||
Publishes to `$KV.<bucket>.<key>` (or with JS prefix). The JetStream stream stores it, and the returned sequence number serves as the revision.
|
||||
|
||||
### Get
|
||||
|
||||
```rust
|
||||
let value: Option<Bytes> = kv.get("key").await?;
|
||||
```
|
||||
|
||||
Returns `None` if the key doesn't exist or was deleted/purged. Uses either direct get (if `allow_direct`) or the standard message API.
|
||||
|
||||
### Entry
|
||||
|
||||
```rust
|
||||
let entry: Option<Entry> = kv.entry("key").await?;
|
||||
let entry: Option<Entry> = kv.entry_for_revision("key", 2).await?;
|
||||
```
|
||||
|
||||
Returns full entry metadata:
|
||||
|
||||
```rust
|
||||
pub struct Entry {
|
||||
pub bucket: String,
|
||||
pub key: String,
|
||||
pub value: Bytes,
|
||||
pub revision: u64,
|
||||
pub created: DateTime,
|
||||
pub delta: u64,
|
||||
pub operation: Operation,
|
||||
pub seen_current: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### Create (Put if not exists)
|
||||
|
||||
```rust
|
||||
let revision: u64 = kv.create("key", "value".into()).await?;
|
||||
```
|
||||
|
||||
Uses `update` with `expected_last_subject_sequence = 0` (create-only). If the key exists and is deleted/purged, it's re-created.
|
||||
|
||||
### Update (Conditional Put)
|
||||
|
||||
```rust
|
||||
let revision: u64 = kv.update("key", "value".into(), last_revision).await?;
|
||||
```
|
||||
|
||||
Uses the `Nats-Expected-Last-Subject-Sequence` header for optimistic concurrency control. Only succeeds if the key's current revision matches.
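
The compare-and-set behaviour of `update` (and of `create`, which expects revision 0) can be modelled in memory. This is only a sketch of the semantics: `MemoryBucket` and `WrongLastRevision` are hypothetical, a single counter stands in for the stream sequence, and delete/purge markers are not modelled.

```rust
use std::collections::HashMap;

// Hypothetical error: the caller's expected revision was stale.
#[derive(Debug, PartialEq)]
struct WrongLastRevision {
    expected: u64,
    actual: u64,
}

// Hypothetical in-memory stand-in for a KV bucket.
#[derive(Default)]
struct MemoryBucket {
    sequence: u64,                             // last assigned revision
    entries: HashMap<String, (u64, Vec<u8>)>,  // key -> (revision, value)
}

impl MemoryBucket {
    // Unconditional write; the new sequence number is the revision.
    fn put(&mut self, key: &str, value: &[u8]) -> u64 {
        self.sequence += 1;
        self.entries
            .insert(key.to_string(), (self.sequence, value.to_vec()));
        self.sequence
    }

    // Conditional write: succeeds only if the key's current revision
    // equals `expected` (0 means "key must not exist yet").
    fn update(&mut self, key: &str, value: &[u8], expected: u64) -> Result<u64, WrongLastRevision> {
        let actual = self.entries.get(key).map_or(0, |(revision, _)| *revision);
        if actual != expected {
            return Err(WrongLastRevision { expected, actual });
        }
        Ok(self.put(key, value))
    }

    fn create(&mut self, key: &str, value: &[u8]) -> Result<u64, WrongLastRevision> {
        self.update(key, value, 0)
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|(_, value)| value.as_slice())
    }
}
```

Two writers that both read revision 1 and then call `update(.., 1)` cannot both succeed: the first bumps the revision, so the second sees a mismatch and must re-read.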
|
||||
|
||||
### Delete
|
||||
|
||||
```rust
|
||||
kv.delete("key").await?;
|
||||
kv.delete_expect_revision("key", Some(revision)).await?;
|
||||
```
|
||||
|
||||
Non-destructive — publishes a `DEL` marker message. The key appears deleted to `get()`, but history is preserved (up to `history` limit).
|
||||
|
||||
### Purge
|
||||
|
||||
```rust
|
||||
kv.purge("key").await?;
|
||||
kv.purge_with_ttl("key", Duration::from_secs(10)).await?;
|
||||
kv.purge_expect_revision("key", Some(revision)).await?;
|
||||
```
|
||||
|
||||
Destructive — publishes a `PURGE` marker with rollup header, removing all previous revisions of the key. Leaves a single purge entry.
|
||||
|
||||
### Watch
|
||||
|
||||
```rust
|
||||
// Watch for new changes
|
||||
let mut watch = kv.watch("key").await?;
|
||||
// Watch with initial value
|
||||
let mut watch = kv.watch_with_history("key").await?;
|
||||
// Watch from specific revision
|
||||
let mut watch = kv.watch_from_revision("key", 5).await?;
|
||||
// Watch all keys
|
||||
let mut watch = kv.watch_all().await?;
|
||||
// Watch multiple keys (server_2_10+)
|
||||
let mut watch = kv.watch_many(["foo", "bar"]).await?;
|
||||
```
|
||||
|
||||
`Watch` implements `futures_util::Stream<Item = Result<Entry, WatcherError>>`.
|
||||
|
||||
Under the hood, each watch creates an **ordered push consumer** on the KV stream with:
|
||||
- `filter_subject` matching `$KV.<bucket>.<key>`
|
||||
- `replay_policy: Instant`
|
||||
- Appropriate `deliver_policy`
|
||||
|
||||
### History
|
||||
|
||||
```rust
|
||||
let mut history = kv.history("key").await?;
|
||||
```
|
||||
|
||||
Returns a `Stream` of all past `Entry` values for a key (including deletes/purges).
|
||||
|
||||
### Keys
|
||||
|
||||
```rust
|
||||
let mut keys = kv.keys().await?;
|
||||
```
|
||||
|
||||
Returns a `Stream<String>` of all current keys. Uses a headers-only consumer with `LastPerSubject` deliver policy to efficiently scan the bucket.
|
||||
|
||||
## Entry Operations
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
|
||||
pub enum Operation {
|
||||
Put, // Value was put
|
||||
Delete, // Value was deleted (DEL marker)
|
||||
Purge, // Value was purged (PURGE marker with rollup)
|
||||
}
|
||||
```
|
||||
|
||||
The operation type is determined from the `KV-Operation` header (`PUT`, `DEL`, `PURGE`) or the `Nats-Marker-Reason` header (fallback for server-generated markers like `MaxAge`, `Purge`, `Remove`).
|
||||
|
||||
## Key and Bucket Name Validation
|
||||
|
||||
```rust
|
||||
// Bucket: alphanumeric, dash, underscore only
|
||||
VALID_BUCKET_RE: \A[a-zA-Z0-9_-]+\z
|
||||
|
||||
// Key: alphanumeric, dash, slash, underscore, equals, dot; no leading/trailing dots
|
||||
VALID_KEY_RE: \A[-/_=\.a-zA-Z0-9]+\z
|
||||
```
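
The same checks can be written without a regex dependency. A sketch under stated assumptions: the function names are hypothetical, and the leading/trailing dot rule is taken from the comment above rather than from the pattern itself, which does not enforce it.

```rust
// Bucket names: one or more of [a-zA-Z0-9_-].
fn is_valid_bucket_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

// Keys: one or more of [-/_=.a-zA-Z0-9], with no leading or trailing dot.
fn is_valid_key(key: &str) -> bool {
    !key.is_empty()
        && !key.starts_with('.')
        && !key.ends_with('.')
        && key.bytes().all(|b| {
            b.is_ascii_alphanumeric() || b == b'-' || b == b'/' || b == b'_' || b == b'=' || b == b'.'
        })
}

// Subject a key maps to, per the `$KV.<bucket>.<key>` convention.
fn kv_subject(bucket: &str, key: &str) -> String {
    format!("$KV.{}.{}", bucket, key)
}
```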
|
||||
|
||||
## Bucket Status
|
||||
|
||||
```rust
|
||||
let status: Status = kv.status().await?;
|
||||
```
|
||||
|
||||
Wraps stream info to provide bucket-level statistics (bucket name, message count, byte count, etc.).
|
||||
|
||||
## Mirrored Buckets
|
||||
|
||||
When a bucket is configured as a mirror of another (potentially in a different account/domain):
|
||||
|
||||
- `prefix` is set to `$KV.<mirror_bucket>.`
|
||||
- `put_prefix` may be set to the source bucket's API prefix for cross-domain writes
|
||||
- `use_jetstream_prefix` is adjusted based on whether the mirror is in the same domain
|
||||
|
||||
## KV → Stream Config Mapping
|
||||
|
||||
When creating a KV bucket, the `Config` is converted to a JetStream `stream::Config`:
|
||||
|
||||
| Stream Config field | Value |
|---------------------|-------|
| `name` | `"KV_<bucket>"` |
| `subjects` | `["$KV.<bucket>.>"]` |
| `max_messages_per_subject` | `history` (max 64) |
| `max_age` | `max_age` |
| `max_bytes` | `max_bytes` |
| `storage` | `storage` |
| `num_replicas` | `num_replicas` |
| `republish` | `republish` |
| `mirror` | `mirror` |
| `discard` | `DiscardPolicy::New` |
| `allow_direct` | `true` |
| `allow_rollup_hdrs` | `true` |
| `max_msg_size` | `max_value_size` |
|
||||
@@ -1,245 +0,0 @@
|
||||
# async-nats: Object Store
|
||||
|
||||
## Overview
|
||||
|
||||
The Object Store provides large-object storage built on JetStream. Objects are chunked and stored as messages in a JetStream stream, with metadata stored separately. The stream is named `OBJ_<bucket_name>`.
|
||||
|
||||
The Object Store API requires the `object-store` Cargo feature (which implies `jetstream` + `crypto`).
|
||||
|
||||
## ObjectStore Handle
|
||||
|
||||
```rust
|
||||
#[derive(Clone)]
|
||||
pub struct ObjectStore {
|
||||
pub(crate) name: String,
|
||||
pub(crate) stream: Stream,
|
||||
}
|
||||
```
|
||||
|
||||
## Object Store Config
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Default, Clone, Serialize, Deserialize)]
|
||||
pub struct Config {
|
||||
pub bucket: String,
|
||||
pub description: Option<String>,
|
||||
pub max_age: Duration,
|
||||
pub max_bytes: i64,
|
||||
pub storage: StorageType,
|
||||
pub num_replicas: usize,
|
||||
pub compression: bool,
|
||||
pub placement: Option<Placement>,
|
||||
}
|
||||
```
|
||||
|
||||
## Creating/Accessing Object Stores
|
||||
|
||||
```rust
|
||||
// Create
|
||||
let bucket = jetstream.create_object_store(object_store::Config {
|
||||
bucket: "my-bucket".to_string(),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// Get existing
|
||||
let bucket = jetstream.get_object_store("my-bucket").await?;
|
||||
|
||||
// Delete
|
||||
jetstream.delete_object_store("my-bucket").await?;
|
||||
```
|
||||
|
||||
## Object Store Operations
|
||||
|
||||
### Put
|
||||
|
||||
```rust
|
||||
let info: ObjectInfo = bucket.put("file.txt", &mut async_read).await?;
|
||||
```
|
||||
|
||||
The put operation:
|
||||
1. Reads data from any `AsyncRead + Unpin` source in chunks (default 128KB)
|
||||
2. Each chunk is published to `$O.<bucket>.C.<nuid>` (chunk subject)
|
||||
3. SHA-256 digest is computed incrementally
|
||||
4. After all chunks, metadata is published to `$O.<bucket>.M.<encoded_name>` with a rollup header
|
||||
5. If the object previously existed, old chunks are purged
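
The chunking in steps 1 and 2 can be sketched on an in-memory buffer. This is an illustration only: `split_into_chunks` and `chunk_subject` are hypothetical helpers, the real client reads incrementally from an `AsyncRead`, and the SHA-256 digest is left out because the standard library has no hash for it.

```rust
// Default chunk size stated above (128KB).
const DEFAULT_CHUNK_SIZE: usize = 128 * 1024;

// Hypothetical helper: split a buffer into chunk-sized slices.
// The last slice may be shorter. `chunk_size` must be non-zero.
fn split_into_chunks(data: &[u8], chunk_size: usize) -> Vec<&[u8]> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    data.chunks(chunk_size).collect()
}

// Hypothetical helper: subject that every chunk of one object is
// published to, following the `$O.<bucket>.C.<nuid>` convention.
fn chunk_subject(bucket: &str, nuid: &str) -> String {
    format!("$O.{}.C.{}", bucket, nuid)
}
```

A 300KB object therefore produces three chunk messages, two full and one partial, followed by a single metadata message.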
|
||||
|
||||
### Get
|
||||
|
||||
```rust
|
||||
let mut object: Object = bucket.get("file.txt").await?;
|
||||
```
|
||||
|
||||
Returns an `Object` that implements `tokio::io::AsyncRead`:
|
||||
|
||||
```rust
|
||||
let mut bytes = Vec::new();
|
||||
object.read_to_end(&mut bytes).await?;
|
||||
```
|
||||
|
||||
On read, the Object:
|
||||
1. Creates an ordered push consumer on `$O.<bucket>.C.<nuid>`
|
||||
2. Streams chunk messages, feeding bytes to the reader
|
||||
3. Verifies SHA-256 digest after the last chunk
|
||||
4. If digest doesn't match, returns `io::ErrorKind::InvalidData`
|
||||
|
||||
### Delete
|
||||
|
||||
```rust
|
||||
bucket.delete("file.txt").await?;
|
||||
```
|
||||
|
||||
Marks the object as deleted in metadata (sets `deleted = true`, `chunks = 0`, `size = 0`) with a rollup, then purges all chunk messages.
|
||||
|
||||
### Info
|
||||
|
||||
```rust
|
||||
let info: ObjectInfo = bucket.info("file.txt").await?;
|
||||
```
|
||||
|
||||
Fetches the last metadata message for the object (from `$O.<bucket>.M.<encoded_name>`).
|
||||
|
||||
### Watch
|
||||
|
||||
```rust
|
||||
let mut watcher = bucket.watch().await?;
|
||||
let mut watcher = bucket.watch_with_history().await?;
|
||||
```
|
||||
|
||||
Returns a `Stream<Item = Result<ObjectInfo, WatcherError>>`. Uses an ordered push consumer on `$O.<bucket>.M.>`.
|
||||
|
||||
### List
|
||||
|
||||
```rust
|
||||
let mut list = bucket.list().await?;
|
||||
```
|
||||
|
||||
Returns a `Stream<Item = Result<ObjectInfo, ListerError>>`. Lists all non-deleted objects. Uses `DeliverPolicy::All` to replay all metadata.
|
||||
|
||||
### Seal
|
||||
|
||||
```rust
|
||||
bucket.seal().await?;
|
||||
```
|
||||
|
||||
Sets `sealed = true` on the underlying stream, preventing any further modifications.
|
||||
|
||||
### Links
|
||||
|
||||
```rust
|
||||
// Link to another object (same or different bucket)
|
||||
let info = bucket.add_link("link_name", &object).await?;
|
||||
|
||||
// Link to another bucket
|
||||
let info = bucket.add_bucket_link("link_name", "other_bucket").await?;
|
||||
```
|
||||
|
||||
Links are followed automatically when `get()` is called (one level deep). A link cannot target a deleted object, and a link cannot point to another link.
|
||||
|
||||
### Update Metadata
|
||||
|
||||
```rust
|
||||
bucket.update_metadata("object", object_store::UpdateMetadata {
|
||||
name: "new_name".to_string(),
|
||||
description: Some("updated description".to_string()),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
```
|
||||
|
||||
If the name changes, old metadata is purged and new metadata is published.
|
||||
|
||||
## Object Types
|
||||
|
||||
### ObjectInfo
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, Eq, PartialEq)]
|
||||
pub struct ObjectInfo {
|
||||
pub name: String,
|
||||
pub description: Option<String>,
|
||||
pub metadata: HashMap<String, String>,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub options: Option<ObjectOptions>,
|
||||
pub bucket: String,
|
||||
pub nuid: String,
|
||||
pub size: usize,
|
||||
pub chunks: usize,
|
||||
pub modified: Option<DateTime>,
|
||||
pub digest: Option<String>, // Format: "SHA-256=<base64url-digest>"
|
||||
pub deleted: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### ObjectMetadata
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Default, Clone, Serialize, Deserialize, Eq, PartialEq)]
|
||||
pub struct ObjectMetadata {
|
||||
pub name: String,
|
||||
pub description: Option<String>,
|
||||
pub chunk_size: Option<usize>,
|
||||
pub metadata: HashMap<String, String>,
|
||||
pub headers: Option<HeaderMap>,
|
||||
}
|
||||
```
|
||||
|
||||
### ObjectLink
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Default, Clone, Serialize, Deserialize, Eq, PartialEq)]
|
||||
pub struct ObjectLink {
|
||||
pub name: Option<String>, // None = bucket link, Some = object link
|
||||
pub bucket: String,
|
||||
}
|
||||
```
|
||||
|
||||
### Object
|
||||
|
||||
```rust
|
||||
pub struct Object {
|
||||
pub info: ObjectInfo,
|
||||
remaining_bytes: VecDeque<u8>,
|
||||
has_pending_messages: bool,
|
||||
digest: Option<Sha256>,
|
||||
subscription: Option<crate::jetstream::consumer::push::Ordered>,
|
||||
subscription_future: Option<BoxFuture<'static, Result<Ordered, StreamError>>>,
|
||||
stream: Stream,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `tokio::io::AsyncRead`. Lazy-creates the consumer on first read.
|
||||
|
||||
## Subject Naming Convention
|
||||
|
||||
| Purpose | Subject Pattern |
|
||||
|---------|----------------|
|
||||
| Chunks | `$O.<bucket>.C.<nuid>` |
|
||||
| Metadata | `$O.<bucket>.M.<base64url-encoded-name>` |
|
||||
|
||||
Object names are base64url-encoded in metadata subjects to allow arbitrary characters (the raw name might contain characters invalid in NATS subjects).
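
As a rough sketch of how a metadata subject is derived from an object name, the snippet below hand-rolls a base64url encoder so it needs no external crate. `base64url` and `metadata_subject` are hypothetical helpers, and emitting `=` padding is an assumption; check the crate source before relying on the exact encoded form.

```rust
/// URL-safe base64 alphabet (`-` and `_` replace `+` and `/`).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Minimal base64url encoder. Padding with `=` is an assumption of this sketch.
fn base64url(input: &[u8]) -> String {
    let mut out = String::new();
    for group in input.chunks(3) {
        // Pack up to three bytes into a 24-bit value, zero-filling the tail.
        let b0 = group[0] as u32;
        let b1 = *group.get(1).unwrap_or(&0) as u32;
        let b2 = *group.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        if group.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if group.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Builds the metadata subject for an object, following the table above.
fn metadata_subject(bucket: &str, object_name: &str) -> String {
    format!("$O.{bucket}.M.{}", base64url(object_name.as_bytes()))
}
```

The encoded token contains only characters from the URL-safe alphabet (plus `=`), so names with dots or slashes never introduce extra subject tokens.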
|
||||
|
||||
## Validation
|
||||
|
||||
```rust
|
||||
// Bucket: alphanumeric, dash, underscore only
|
||||
BUCKET_NAME_RE: \A[a-zA-Z0-9_-]+\z
|
||||
|
||||
// Object name: alphanumeric, dash, slash, underscore, equals, dot; leading/trailing dots are rejected by a separate check, not by this regex
|
||||
OBJECT_NAME_RE: \A[-/_=\.a-zA-Z0-9]+\z
|
||||
```
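
The two rules can be expressed without a regex engine. The sketch below is an equivalent check written against the patterns and comments above; the function names are illustrative and the crate's own helpers may differ.

```rust
/// Bucket names: one or more of ASCII letters, digits, `_`, `-`.
fn is_valid_bucket_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Object names: the character class from OBJECT_NAME_RE, plus the
/// "no leading/trailing dots" rule, which the regex alone does not enforce.
fn is_valid_object_name(name: &str) -> bool {
    if name.is_empty() || name.starts_with('.') || name.ends_with('.') {
        return false;
    }
    name.bytes()
        .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'/' | b'_' | b'=' | b'.'))
}
```

Under these rules `dir/file.txt` is an acceptable object name, while `.hidden` and `my file` are not.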
|
||||
|
||||
## Data Integrity
|
||||
|
||||
The object store uses SHA-256 hashing (from the `crypto` module) to verify data integrity:
|
||||
|
||||
1. On `put()`: SHA-256 is computed incrementally as chunks are read. The digest is stored in `ObjectInfo.digest` as `"SHA-256=<base64url>"`.
|
||||
2. On `get()` (via `AsyncRead`): SHA-256 is verified after the last chunk is read. If the computed digest doesn't match the stored digest, `io::ErrorKind::InvalidData` is returned.
|
||||
|
||||
```rust
|
||||
// crypto module
|
||||
pub(crate) struct Sha256 { ... }
|
||||
impl Sha256 {
|
||||
pub fn new() -> Self;
|
||||
pub fn update(&mut self, data: &[u8]);
|
||||
pub fn finish(self) -> [u8; 32];
|
||||
}
|
||||
```
|
||||
@@ -1,272 +0,0 @@
|
||||
# async-nats: Service API
|
||||
|
||||
## Overview
|
||||
|
||||
The Service API provides a microservice request/reply pattern with built-in service discovery, health checking, and statistics. It follows the [NATS Micro v1 specification](https://github.com/nats-io/nats-architecture-design/blob/main/adr/ADR-33.md).
|
||||
|
||||
The `service` feature is required.
|
||||
|
||||
## Service
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Service {
|
||||
endpoints_state: Arc<Mutex<Endpoints>>,
|
||||
info: Info,
|
||||
client: Client,
|
||||
handle: JoinHandle<Result<(), Error>>,
|
||||
shutdown_tx: Sender<()>,
|
||||
subjects: Arc<Mutex<Vec<String>>>,
|
||||
queue_group: String,
|
||||
}
|
||||
```
|
||||
|
||||
## Creating a Service
|
||||
|
||||
Via the `ServiceExt` trait on `Client`:
|
||||
|
||||
```rust
|
||||
use async_nats::service::ServiceExt;
|
||||
|
||||
// Builder pattern
|
||||
let mut service = client
|
||||
.service_builder()
|
||||
.description("product service")
|
||||
.stats_handler(|endpoint, stats| serde_json::json!({ "endpoint": endpoint }))
|
||||
.metadata(HashMap::from([("version".into(), "v2".into())]))
|
||||
.queue_group("products-group")
|
||||
.start("products", "1.0.0")
|
||||
.await?;
|
||||
|
||||
// Direct config
|
||||
let mut service = client
|
||||
.add_service(service::Config {
|
||||
name: "products".to_string(),
|
||||
version: "1.0.0".to_string(),
|
||||
description: Some("product service".to_string()),
|
||||
stats_handler: None,
|
||||
metadata: None,
|
||||
queue_group: None,
|
||||
})
|
||||
.await?;
|
||||
```
|
||||
|
||||
Service name must match `^[A-Za-z0-9\-_]+$`. Version must be valid SemVer.
|
||||
|
||||
## Service Verbs
|
||||
|
||||
Every service automatically subscribes to three verb subjects for discovery and monitoring:
|
||||
|
||||
| Verb | Subject Pattern | Purpose |
|
||||
|------|----------------|---------|
|
||||
| PING | `$SRV.PING`, `$SRV.PING.<name>`, `$SRV.PING.<name>.<id>` | Lightweight health check |
|
||||
| INFO | `$SRV.INFO`, `$SRV.INFO.<name>`, `$SRV.INFO.<name>.<id>` | Service metadata |
|
||||
| STATS | `$SRV.STATS`, `$SRV.STATS.<name>`, `$SRV.STATS.<name>.<id>` | Service + endpoint statistics |
|
||||
|
||||
A background task handles these verb requests and responds with JSON payloads.
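
The subject patterns in the table follow one shape: the `$SRV` prefix, the verb, then optionally the service name and instance id. A small sketch of that composition follows; `verb_subject` is a hypothetical helper, not part of the crate.

```rust
/// Builds a discovery subject from a verb and optional scoping tokens.
/// An `id` without a `name` is ignored, since the id-scoped form always
/// includes the service name.
fn verb_subject(verb: &str, name: Option<&str>, id: Option<&str>) -> String {
    let mut subject = format!("$SRV.{verb}");
    if let Some(name) = name {
        subject.push('.');
        subject.push_str(name);
        if let Some(id) = id {
            subject.push('.');
            subject.push_str(id);
        }
    }
    subject
}
```

A monitoring tool would publish a request to the broadest form to reach every service, or to the id-scoped form to reach one instance.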
|
||||
|
||||
## Service Config
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize, Debug)]
|
||||
pub struct Config {
|
||||
pub name: String,
|
||||
pub description: Option<String>,
|
||||
pub version: String,
|
||||
pub stats_handler: Option<StatsHandler>,
|
||||
pub metadata: Option<HashMap<String, String>>,
|
||||
pub queue_group: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
## Adding Endpoints
|
||||
|
||||
```rust
|
||||
// Simple endpoint
|
||||
let mut endpoint = service.endpoint("get-products").await?;
|
||||
|
||||
// Endpoint with custom name and metadata
|
||||
let endpoint = service
|
||||
.endpoint_builder()
|
||||
.name("api")
|
||||
.metadata(HashMap::from([("auth".into(), "required".into())]))
|
||||
.queue_group("custom-group")
|
||||
.add("products")
|
||||
.await?;
|
||||
|
||||
// Grouped endpoints
|
||||
let v1 = service.group("v1");
|
||||
let products = v1.endpoint("products").await?;
|
||||
let orders = v1.endpoint("orders").await?;
|
||||
|
||||
// Nested groups
|
||||
let v1_api = service.group("api").group("v1");
|
||||
```
|
||||
|
||||
## Endpoint
|
||||
|
||||
```rust
|
||||
pub struct Endpoint {
|
||||
requests: Subscriber,
|
||||
stats: Arc<Mutex<Endpoints>>,
|
||||
client: Client,
|
||||
endpoint: String,
|
||||
shutdown: Option<ShutdownRx>,
|
||||
shutdown_future: Option<ShutdownReceiverFuture>,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `futures_util::Stream<Item = Request>`.
|
||||
|
||||
```rust
|
||||
while let Some(request) = endpoint.next().await {
|
||||
request.respond(Ok("response data".into())).await?;
|
||||
}
|
||||
```
|
||||
|
||||
## Service Request
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Request {
|
||||
issued: Instant,
|
||||
client: Client,
|
||||
pub message: Message,
|
||||
endpoint: String,
|
||||
stats: Arc<Mutex<Endpoints>>,
|
||||
}
|
||||
```
|
||||
|
||||
### Responding
|
||||
|
||||
```rust
|
||||
// Success
|
||||
request.respond(Ok("result".into())).await?;
|
||||
|
||||
// Success with headers
|
||||
request.respond_with_headers(Ok("result".into()), headers).await?;
|
||||
|
||||
// Error
|
||||
request.respond(Err(service::error::Error {
|
||||
code: 500,
|
||||
status: "internal error".to_string(),
|
||||
})).await?;
|
||||
```
|
||||
|
||||
Error responses always include `Nats-Service-Error` and `Nats-Service-Error-Code` headers. If user-supplied headers contain these headers, they are overridden by the error values.
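
The override rule can be illustrated with a plain map. This sketch uses `HashMap<String, String>` as a stand-in for the crate's `HeaderMap`, and `error_headers` is a hypothetical function name.

```rust
use std::collections::HashMap;

/// Merges user-supplied headers with the two service error headers.
/// Inserting last means the error values win over any user-supplied ones.
fn error_headers(
    mut user: HashMap<String, String>,
    code: usize,
    status: &str,
) -> HashMap<String, String> {
    user.insert("Nats-Service-Error".to_string(), status.to_string());
    user.insert("Nats-Service-Error-Code".to_string(), code.to_string());
    user
}
```

Headers unrelated to the error, such as a tracing id, pass through untouched.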
|
||||
|
||||
### Stats Tracking
|
||||
|
||||
Each response updates endpoint statistics:
|
||||
- `requests` — total requests
|
||||
- `processing_time` — cumulative processing time
|
||||
- `average_processing_time` — average per request
|
||||
- `errors` — error count
|
||||
- `last_error` — last error details
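
A minimal sketch of how these counters could be maintained per response is shown below. `EndpointStats` and `record` are illustrative names; the field names mirror the list above, and `last_error` is simplified to a string.

```rust
use std::time::Duration;

/// Simplified per-endpoint counters (not the crate's `endpoint::Stats`).
#[derive(Debug, Default)]
struct EndpointStats {
    requests: u64,
    errors: u64,
    processing_time: Duration,
    average_processing_time: Duration,
    last_error: Option<String>,
}

impl EndpointStats {
    /// Records one handled request and refreshes the running average.
    fn record(&mut self, elapsed: Duration, error: Option<&str>) {
        self.requests += 1;
        self.processing_time += elapsed;
        // `Duration` divides by `u32`; saturate rather than truncate.
        let divisor = u32::try_from(self.requests).unwrap_or(u32::MAX);
        self.average_processing_time = self.processing_time / divisor;
        if let Some(message) = error {
            self.errors += 1;
            self.last_error = Some(message.to_string());
        }
    }
}
```

The average is derived from the two cumulative values, so resetting the statistics only requires zeroing the struct.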
|
||||
|
||||
## Service Info Types
|
||||
|
||||
### PingResponse
|
||||
|
||||
```rust
|
||||
pub struct PingResponse {
|
||||
pub kind: String, // "io.nats.micro.v1.ping_response"
|
||||
pub name: String,
|
||||
pub id: String,
|
||||
pub version: String,
|
||||
pub metadata: HashMap<String, String>,
|
||||
}
|
||||
```
|
||||
|
||||
### Info
|
||||
|
||||
```rust
|
||||
pub struct Info {
|
||||
pub kind: String, // "io.nats.micro.v1.info_response"
|
||||
pub name: String,
|
||||
pub id: String,
|
||||
pub description: String,
|
||||
pub version: String,
|
||||
pub metadata: HashMap<String, String>,
|
||||
pub endpoints: Vec<endpoint::Info>,
|
||||
}
|
||||
```
|
||||
|
||||
### Stats
|
||||
|
||||
```rust
|
||||
pub struct Stats {
|
||||
pub kind: String, // "io.nats.micro.v1.stats_response"
|
||||
pub name: String,
|
||||
pub id: String,
|
||||
pub version: String,
|
||||
pub started: DateTime,
|
||||
pub endpoints: Vec<endpoint::Stats>,
|
||||
}
|
||||
```
|
||||
|
||||
### Endpoint Stats
|
||||
|
||||
```rust
|
||||
pub struct Stats { // i.e. `endpoint::Stats`
|
||||
pub name: String,
|
||||
pub subject: String,
|
||||
pub queue_group: String,
|
||||
pub data: Option<serde_json::Value>, // Custom data from stats_handler
|
||||
pub errors: u64,
|
||||
pub processing_time: Duration,
|
||||
pub average_processing_time: Duration,
|
||||
pub requests: u64,
|
||||
pub last_error: Option<error::Error>,
|
||||
}
|
||||
```
|
||||
|
||||
## Service Groups
|
||||
|
||||
Groups provide subject prefixing for endpoint organization:
|
||||
|
||||
```rust
|
||||
let service = client.service_builder().start("api", "1.0.0").await?;
|
||||
|
||||
// Endpoints subscribe to "products" and "orders"
|
||||
let products = service.endpoint("products").await?;
|
||||
let orders = service.endpoint("orders").await?;
|
||||
|
||||
// Grouped: subscribe to "v1.products" and "v1.orders"
|
||||
let v1 = service.group("v1");
|
||||
let products = v1.endpoint("products").await?;
|
||||
let orders = v1.endpoint("orders").await?;
|
||||
|
||||
// Nested: subscribe to "api.v1.products"
|
||||
let api_v1 = service.group("api").group("v1");
|
||||
let products = api_v1.endpoint("products").await?;
|
||||
```
|
||||
|
||||
Each group can have its own queue group:
|
||||
|
||||
```rust
|
||||
let v1 = service.group_with_queue_group("v1", "v1-workers");
|
||||
```
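
The prefixing behaviour of groups amounts to joining tokens with dots. The following sketch models only that part; `Group` here is a hypothetical stand-in with no connection or queue-group handling.

```rust
/// Illustrative model of a service group: just an accumulated subject prefix.
struct Group {
    prefix: String,
}

impl Group {
    fn new(name: &str) -> Self {
        Group { prefix: name.to_string() }
    }

    /// Nests a group by appending another token to the prefix.
    fn group(&self, name: &str) -> Group {
        Group { prefix: format!("{}.{}", self.prefix, name) }
    }

    /// Subject an endpoint in this group would subscribe to.
    fn endpoint_subject(&self, endpoint: &str) -> String {
        format!("{}.{}", self.prefix, endpoint)
    }
}
```

With this model, `Group::new("api").group("v1").endpoint_subject("products")` produces the nested subject from the example above.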
|
||||
|
||||
## Stopping a Service
|
||||
|
||||
```rust
|
||||
service.stop().await?;
|
||||
```
|
||||
|
||||
Sends a shutdown signal and aborts the verb-handling task. Other service instances with the same name continue running.
|
||||
|
||||
## Resetting Stats
|
||||
|
||||
```rust
|
||||
service.reset().await?;
|
||||
```
|
||||
|
||||
Resets all endpoint statistics (errors, processing time, requests, average processing time) to zero.
|
||||
|
||||
## Querying Service State
|
||||
|
||||
```rust
|
||||
let stats: HashMap<String, endpoint::Stats> = service.stats().await?;
|
||||
let info: Info = service.info().await?;
|
||||
```
|
||||
@@ -1,312 +0,0 @@
|
||||
# async-nats: Quick Reference
|
||||
|
||||
## Connection
|
||||
|
||||
```rust
|
||||
// Basic connect
|
||||
let client = async_nats::connect("demo.nats.io").await?;
|
||||
|
||||
// With options
|
||||
let client = async_nats::ConnectOptions::new()
|
||||
.require_tls(true)
|
||||
.name("my-service")
|
||||
.ping_interval(Duration::from_secs(10))
|
||||
.request_timeout(Some(Duration::from_secs(5)))
|
||||
.connect("demo.nats.io")
|
||||
.await?;
|
||||
|
||||
// Multiple servers
|
||||
let client = async_nats::connect(vec![
|
||||
"nats://server1:4222".parse()?,
|
||||
"nats://server2:4222".parse()?,
|
||||
]).await?;
|
||||
|
||||
// Background connect
|
||||
let client = async_nats::ConnectOptions::new()
|
||||
.retry_on_initial_connect()
|
||||
.connect("demo.nats.io")
|
||||
.await?;
|
||||
```
|
||||
|
||||
## Core NATS: Publish
|
||||
|
||||
```rust
|
||||
// Simple publish
|
||||
client.publish("subject", "payload".into()).await?;
|
||||
|
||||
// With reply-to
|
||||
client.publish_with_reply("subject", "reply-to", "payload".into()).await?;
|
||||
|
||||
// With headers
|
||||
let mut headers = HeaderMap::new();
|
||||
headers.insert("X-Custom", "value");
|
||||
client.publish_with_headers("subject", headers.clone(), "payload".into()).await?;
|
||||
|
||||
// Full control
|
||||
client.publish_with_reply_and_headers("subject", "reply-to", headers, "payload".into()).await?;
|
||||
|
||||
// Flush (ensure all published messages are sent)
|
||||
client.flush().await?;
|
||||
```
|
||||
|
||||
## Core NATS: Subscribe
|
||||
|
||||
```rust
|
||||
use futures_util::StreamExt;
|
||||
|
||||
// Basic subscribe
|
||||
let mut subscriber = client.subscribe("subject").await?;
|
||||
|
||||
// Queue group
|
||||
let mut subscriber = client.queue_subscribe("subject", "group".into()).await?;
|
||||
|
||||
// Receive messages (Subscriber implements Stream)
|
||||
while let Some(message) = subscriber.next().await {
|
||||
println!("subject: {}, payload: {:?}", message.subject, message.payload);
|
||||
}
|
||||
|
||||
// Unsubscribe
|
||||
subscriber.unsubscribe().await?;
|
||||
|
||||
// Unsubscribe after N messages
|
||||
subscriber.unsubscribe_after(10).await?;
|
||||
|
||||
// Drain (wait for in-flight, then unsubscribe)
|
||||
subscriber.drain().await?;
|
||||
```
|
||||
|
||||
## Core NATS: Request/Reply
|
||||
|
||||
```rust
|
||||
// Simple request (uses default timeout)
|
||||
let response = client.request("subject", "data".into()).await?;
|
||||
|
||||
// With custom timeout and headers
|
||||
let request = async_nats::Request::new()
|
||||
.payload("data".into())
|
||||
.timeout(Some(Duration::from_secs(5)))
|
||||
.headers(headers);
|
||||
let response = client.send_request("subject", request).await?;
|
||||
|
||||
// Custom inbox (bypasses multiplexer)
|
||||
let request = async_nats::Request::new()
|
||||
.payload("data".into())
|
||||
.inbox("custom-inbox".into());
|
||||
let response = client.send_request("subject", request).await?;
|
||||
```
|
||||
|
||||
## Message Structure
|
||||
|
||||
```rust
|
||||
pub struct Message {
|
||||
pub subject: Subject,
|
||||
pub reply: Option<Subject>,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub status: Option<StatusCode>,
|
||||
pub description: Option<String>,
|
||||
pub length: usize,
|
||||
}
|
||||
```
|
||||
|
||||
## JetStream
|
||||
|
||||
```rust
|
||||
let jetstream = async_nats::jetstream::new(client.clone());
|
||||
|
||||
// Publish (returns ack future)
|
||||
let ack = jetstream.publish("events", "data".into()).await?;
|
||||
let publish_ack = ack.await?;
|
||||
|
||||
// Stream management
|
||||
let stream = jetstream.create_stream(stream::Config {
|
||||
name: "events".to_string(),
|
||||
subjects: vec!["events.>".to_string()],
|
||||
max_messages: 10_000,
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
let stream = jetstream.get_stream("events").await?;
|
||||
let stream = jetstream.get_or_create_stream(config).await?;
|
||||
jetstream.delete_stream("events").await?;
|
||||
jetstream.update_stream(config).await?;
|
||||
|
||||
// Consumer management
|
||||
let consumer: PullConsumer = stream.create_consumer(pull::Config {
|
||||
durable_name: Some("my-consumer".to_string()),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// Pull consumer: fetch messages
|
||||
let mut messages = consumer.messages().await?;
|
||||
while let Some(message) = messages.next().await {
|
||||
let message = message?;
|
||||
message.ack().await?;
|
||||
}
|
||||
|
||||
// Push consumer (ordered)
|
||||
let consumer = stream.create_consumer(push::OrderedConfig {
|
||||
deliver_subject: client.new_inbox(),
|
||||
filter_subject: "events.>".to_string(),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
let mut messages = consumer.messages().await?;
|
||||
```
|
||||
|
||||
## Key-Value Store
|
||||
|
||||
```rust
|
||||
let kv = jetstream.create_key_value(kv::Config {
|
||||
bucket: "my-bucket".to_string(),
|
||||
history: 10,
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// CRUD
|
||||
let revision = kv.put("key", "value".into()).await?;
|
||||
let revision = kv.create("key", "value".into()).await?;
|
||||
let value: Option<Bytes> = kv.get("key").await?;
|
||||
let entry: Option<Entry> = kv.entry("key").await?;
|
||||
let revision = kv.update("key", "new-value".into(), revision).await?;
|
||||
kv.delete("key").await?;
|
||||
kv.purge("key").await?;
|
||||
|
||||
// Watch
|
||||
let mut watch = kv.watch("key").await?;
|
||||
let mut watch_all = kv.watch_all().await?;
|
||||
|
||||
// History & Keys
|
||||
let mut history = kv.history("key").await?;
|
||||
let mut keys = kv.keys().await?;
|
||||
```
|
||||
|
||||
## Object Store
|
||||
|
||||
```rust
|
||||
let bucket = jetstream.create_object_store(object_store::Config {
|
||||
bucket: "files".to_string(),
|
||||
..Default::default()
|
||||
}).await?;
|
||||
|
||||
// Put (from any AsyncRead)
|
||||
let info = bucket.put("file.txt", &mut file).await?;
|
||||
|
||||
// Get (returns AsyncRead)
|
||||
let mut object = bucket.get("file.txt").await?;
|
||||
let mut bytes = Vec::new();
|
||||
object.read_to_end(&mut bytes).await?;
|
||||
|
||||
// Info, delete, list, watch
|
||||
let info = bucket.info("file.txt").await?;
|
||||
bucket.delete("file.txt").await?;
|
||||
let mut list = bucket.list().await?;
|
||||
let mut watch = bucket.watch().await?;
|
||||
```
|
||||
|
||||
## Service API
|
||||
|
||||
```rust
|
||||
use async_nats::service::ServiceExt;
|
||||
use futures_util::StreamExt;
|
||||
|
||||
let mut service = client
|
||||
.service_builder()
|
||||
.description("product service")
|
||||
.start("products", "1.0.0")
|
||||
.await?;
|
||||
|
||||
let mut endpoint = service.endpoint("get").await?;
|
||||
|
||||
while let Some(request) = endpoint.next().await {
|
||||
request.respond(Ok("result".into())).await?;
|
||||
}
|
||||
```
|
||||
|
||||
## Client State & Events
|
||||
|
||||
```rust
|
||||
// Check connection state
|
||||
match client.connection_state() {
|
||||
State::Connected => {},
|
||||
State::Disconnected => {},
|
||||
State::Pending => {},
|
||||
}
|
||||
|
||||
// Get server info
|
||||
let info: ServerInfo = client.server_info();
|
||||
println!("max_payload: {}", info.max_payload);
|
||||
println!("jetstream: {}", info.jetstream);
|
||||
|
||||
// Get statistics
|
||||
let stats = client.statistics();
|
||||
println!("in_messages: {}", stats.in_messages.load(Ordering::Relaxed));
|
||||
|
||||
// Force reconnect
|
||||
client.force_reconnect().await?;
|
||||
|
||||
// Server pool management
|
||||
client.set_server_pool(["nats://s1:4222".parse()?, "nats://s2:4222".parse()?].as_slice()).await?;
|
||||
let pool = client.server_pool().await?;
|
||||
|
||||
// Drain
|
||||
client.drain().await?;
|
||||
```
|
||||
|
||||
## Error Handling Patterns
|
||||
|
||||
```rust
|
||||
// Connect errors
|
||||
match async_nats::connect("server").await {
|
||||
Err(e) => match e.kind() {
|
||||
ConnectErrorKind::TimedOut => {},
|
||||
ConnectErrorKind::Authentication => {},
|
||||
ConnectErrorKind::AuthorizationViolation => {},
|
||||
_ => {},
|
||||
},
|
||||
Ok(client) => {},
|
||||
}
|
||||
|
||||
// Publish errors
|
||||
match client.publish("subject", "data".into()).await {
|
||||
Err(e) => match e.kind() {
|
||||
PublishErrorKind::MaxPayloadExceeded => {},
|
||||
PublishErrorKind::InvalidSubject => {},
|
||||
PublishErrorKind::Send => {},
|
||||
_ => {},
|
||||
},
|
||||
_ => {},
|
||||
}
|
||||
|
||||
// Request errors
|
||||
match client.request("subject", "data".into()).await {
|
||||
Err(e) => match e.kind() {
|
||||
RequestErrorKind::TimedOut => {},
|
||||
RequestErrorKind::NoResponders => {},
|
||||
RequestErrorKind::InvalidSubject => {},
|
||||
RequestErrorKind::MaxPayloadExceeded => {},
|
||||
_ => {},
|
||||
},
|
||||
Ok(message) => {},
|
||||
}
|
||||
```
|
||||
|
||||
## Feature Flag Quick Reference
|
||||
|
||||
| Feature | Enables | Default |
|
||||
|---------|---------|---------|
|
||||
| `jetstream` | JetStream streams, consumers, publish | ✅ |
|
||||
| `kv` | Key-Value store (implies `jetstream`) | ✅ |
|
||||
| `object-store` | Object store (implies `jetstream` + `crypto`) | ✅ |
|
||||
| `service` | Service API | ✅ |
|
||||
| `nkeys` | NKey/JWT authentication | ✅ |
|
||||
| `nuid` | NUID-based ID generation | ✅ |
|
||||
| `crypto` | SHA-256 (for object store) | ✅ |
|
||||
| `websockets` | WebSocket transport | ✅ |
|
||||
| `ring` | `ring` TLS crypto backend | ✅ |
|
||||
| `aws-lc-rs` | `aws-lc-rs` TLS crypto backend | ❌ |
|
||||
| `fips` | FIPS mode via `aws-lc-rs` | ❌ |
|
||||
| `chrono` | `chrono` datetime instead of `time` | ❌ |
|
||||
| `server_2_10` | Server 2.10+ features | ✅ |
|
||||
| `server_2_11` | Server 2.11+ features | ✅ |
|
||||
| `server_2_12` | Server 2.12+ features | ✅ |
|
||||
| `server_2_14` | Server 2.14+ features | ✅ |
|
||||
@@ -1,23 +0,0 @@
|
||||
# async-nats Reference Documentation
|
||||
|
||||
**Crate**: `async-nats` v0.49.1
|
||||
**Source**: https://github.com/nats-io/nats.rs (`async-nats/` directory)
|
||||
**License**: Apache-2.0
|
||||
|
||||
## Contents
|
||||
|
||||
| # | File | Topic |
|
||||
|---|------|-------|
|
||||
| 01 | [Overview & Architecture](01-overview-and-architecture.md) | Crate overview, feature flags, source structure, core connection model, dependency graph |
|
||||
| 02 | [Key Types & Traits](02-key-types-and-traits.md) | `Client`, `Subscriber`, `Message`, `Request`, `ServerInfo`, `ConnectInfo`, `Statistics`, subject/header types, event/state types, error types, trait definitions |
|
||||
| 03 | [Protocol & Wire Format](03-protocol-and-wire-format.md) | NATS wire protocol (PUB/HPUB/SUB/UNSUB/PING/PONG, MSG/HMSG/INFO/ERR), parser/serializer internals, vectored I/O, WebSocket transport, connection lifecycle, reconnection |
|
||||
| 04 | [Connection Management](04-connection-management.md) | `ConnectOptions` builder, authentication methods, TLS configuration, reconnection callbacks, event callbacks, `ConnectionHandler` internals, multiplexer, server pool management |
|
||||
| 05 | [JetStream](05-jetstream.md) | `Context` and `ContextBuilder`, streams, consumers (pull/push/ordered), JetStream messages and acks, publish with ack futures, pagination |
|
||||
| 06 | [Key-Value Store](06-key-value-store.md) | KV `Store` handle, bucket CRUD, put/get/create/update/delete/purge, watch/history/keys, entry operations, mirrored buckets, KV-to-stream mapping |
|
||||
| 07 | [Object Store](07-object-store.md) | `ObjectStore` handle, put/get/delete/watch/list/seal, links, `Object` (AsyncRead), chunking, SHA-256 integrity, subject naming |
|
||||
| 08 | [Service API](08-service-api.md) | `Service` and `ServiceBuilder`, endpoints, groups, verb subscriptions (PING/INFO/STATS), request/respond with stats tracking |
|
||||
| 09 | [Quick Reference](09-quick-reference.md) | Code examples for all major operations, feature flag reference |
|
||||
|
||||
## How This Documentation Was Produced
|
||||
|
||||
All information was derived by reading the source code of the `async-nats` crate at version 0.49.1 from the `nats.rs` repository. No external documentation was consulted — this is a ground-up reference based purely on the source.
|
||||
@@ -1,200 +0,0 @@
|
||||
# nats.rs: Overview and Architecture
|
||||
|
||||
**Version**: async-nats 0.49.1, nats-server 0.1.0
|
||||
**Repository**: https://github.com/nats-io/nats.rs
|
||||
**License**: Apache-2.0
|
||||
**Rust Edition**: 2021
|
||||
**MSRV**: 1.88.0
|
||||
**Protocol**: NATS Client Protocol (INFO/CONNECT/PUB/SUB/UNSUB/PING/PONG)
|
||||
|
||||
## What It Is
|
||||
|
||||
The `nats.rs` repository contains the **official Rust client for NATS.io**, a high-performance messaging system. The active crate is **`async-nats`** — a fully async, Tokio-based NATS client. The deprecated `nats` crate (synchronous) receives security fixes only.
|
||||
|
||||
The `nats-server` crate is **not** an implementation of the NATS server. It is a **test harness** that spawns the Go-based `nats-server` binary for integration tests. The actual NATS server is a separate Go project at `github.com/nats-io/nats-server`.
|
||||
|
||||
Core design decisions:
|
||||
- **Fully async** — all I/O is Tokio-based with async/await throughout
|
||||
- **Cloneable Client handle** — `Client` is cheap to clone (Arc internals), all protocol work happens in a single `ConnectionHandler` task
|
||||
- **Channel-based internal communication** — `Client` sends `Command` variants via `mpsc` channel to `ConnectionHandler`
|
||||
- **Multiplexed request-reply** — one internal subscription handles all request-response patterns via inbox token routing
|
||||
- **Automatic reconnection** — exponential backoff with configurable server pool rotation
|
||||
- **Feature-gated subsystems** — JetStream, KV, Object Store, Service API, NKeys, WebSockets, and crypto backends are all optional
|
||||
|
||||
## Workspace Structure
|
||||
|
||||
```
|
||||
nats.rs/
|
||||
├── async-nats/ # Primary crate — async NATS client
|
||||
│ ├── src/
|
||||
│ │ ├── lib.rs # Entry point: connect(), ServerOp, ClientOp, Command, ConnectionHandler, Subscriber
|
||||
│ │ ├── client.rs # Client handle: publish, subscribe, request, flush, drain
|
||||
│ │ ├── connection.rs # Low-level I/O: protocol parsing, read/write buffers
|
||||
│ │ ├── connector.rs # Connection establishment, reconnection, server pool
|
||||
│ │ ├── options.rs # ConnectOptions builder
|
||||
│ │ ├── auth.rs # Auth struct (credentials container)
|
||||
│ │ ├── auth_utils.rs # Credential file parsing (.creds files)
|
||||
│ │ ├── error.rs # Generic Error<Kind> type
|
||||
│ │ ├── header.rs # HeaderMap — NATS message headers
|
||||
│ │ ├── subject.rs # Subject type, ToSubject trait
|
||||
│ │ ├── status.rs # StatusCode (100-999 NATS protocol codes)
|
||||
│ │ ├── message.rs # Message and OutboundMessage types
|
||||
│ │ ├── tls.rs # TLS configuration helpers
|
||||
│ │ ├── crypto.rs # Crypto feature support
|
||||
│ │ ├── id_generator.rs # NUID/rand-based unique ID generation
|
||||
│ │ ├── datetime.rs # DateTime helpers for JetStream/Service
|
||||
│ │ ├── jetstream/ # JetStream API (feature-gated)
|
||||
│ │ │ ├── mod.rs # Module root, jetstream::new(), with_domain()
|
||||
│ │ │ ├── context.rs # JetStream Context — streams, publishing, consumers
|
||||
│ │ │ ├── stream.rs # Stream management, Config, Info, Consumer creation
|
||||
│ │ │ ├── consumer/ # Pull, Push, Ordered consumers
|
||||
│ │ │ ├── message.rs # JetStream Message with ack methods
|
||||
│ │ │ ├── publish.rs # PublishAck
|
||||
│ │ │ ├── response.rs # Response wrapper
|
||||
│ │ │ ├── errors.rs # JetStream error codes
|
||||
│ │ │ ├── account.rs # Account info
|
||||
│ │ │ ├── kv/ # Key-Value store (feature: "kv")
|
||||
│ │ │ └── object_store/ # Object store (feature: "object-store")
|
||||
│ │ └── service/ # Service API (feature-gated)
|
||||
│ │ ├── mod.rs # Service, ServiceBuilder
|
||||
│ │ ├── endpoint.rs # Endpoint handling
|
||||
│ │ └── error.rs # Service errors
|
||||
│ ├── tests/ # Integration tests (require nats-server binary)
|
||||
│ ├── examples/ # Runnable examples
|
||||
│ └── benches/ # Criterion benchmarks
|
||||
├── nats-server/ # Test harness — spawns Go nats-server for tests
|
||||
│ ├── src/lib.rs # Server struct, run_server(), run_cluster()
|
||||
│ └── configs/ # Server config files for tests
|
||||
│ └── jetstream.conf
|
||||
└── nats/ # DEPRECATED sync client — do not modify
|
||||
```
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ Application Layer │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||||
│ │ JetStream│ │ KV │ │ Object │ │ Service │ │
|
||||
│ │ Context │ │ Store │ │ Store │ │ API │ │
|
||||
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
|
||||
│ └──────────────┴─────────────┴─────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────┴──────┐ │
|
||||
│ │ Client │ Cloneable handle │
|
||||
│ │ (mpsc::Sender) │
|
||||
│ └──────┬──────┘ │
|
||||
│ │ Command channel │
|
||||
└──────────────────────────┼────────────────────────────────┘
|
||||
│
|
||||
┌──────────────────────────┼────────────────────────────────┐
|
||||
│ ConnectionHandler │
|
||||
│ (single Tokio task) │
|
||||
│ │ │
|
||||
│ ┌───────────┐ ┌───────┴───────┐ ┌──────────────────┐ │
|
||||
│ │Subscriptions│ │ Multiplexer │ │ Flush Observers │ │
|
||||
│ │ HashMap │ │ (request-reply)│ │ │ │
|
||||
│ └──────┬──────┘ └───────┬───────┘ └──────────────────┘ │
|
||||
│ └────────────────┼ │
|
||||
│ ┌──────┴──────┐ │
|
||||
│ │ Connector │ Server pool, reconnect │
|
||||
│ └──────┬──────┘ │
|
||||
│ │ │
|
||||
│ ┌──────┴──────┐ │
|
||||
│ │ Connection │ Protocol I/O │
|
||||
│ │ (read/write)│ ServerOp / ClientOp │
|
||||
│ └──────┬──────┘ │
|
||||
└──────────────────────────┼────────────────────────────────┘
|
||||
│
|
||||
┌──────┴──────┐
|
||||
│ NATS Server │ (Go binary, TCP/TLS/WS)
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Subject
|
||||
NATS uses subject strings for message addressing. A `Subject` is a validated, immutable, UTF-8 string backed by `Bytes`. Subjects use dot-delimited tokens (e.g., `events.data.sensor1`). Wildcards `*` (single token) and `>` (multi-token suffix) are supported for subscriptions.
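
The wildcard rules can be made concrete with a small matcher. This is a sketch of the matching semantics only, assuming `>` must be the final token and must cover at least one subject token; `subject_matches` is a hypothetical helper and performs no subject validation.

```rust
/// Returns true when `subject` is covered by the subscription `pattern`.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pattern_tokens = pattern.split('.');
    let mut subject_tokens = subject.split('.');
    loop {
        match (pattern_tokens.next(), subject_tokens.next()) {
            // `>` swallows the rest, but only as the last pattern token.
            (Some(">"), Some(_)) => return pattern_tokens.next().is_none(),
            // `*` matches exactly one token.
            (Some("*"), Some(_)) => continue,
            // Literal tokens must be equal.
            (Some(p), Some(s)) if p == s => continue,
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // Length mismatch or differing literal.
            _ => return false,
        }
    }
}
```

So `events.*.sensor1` and `events.>` both cover `events.data.sensor1`, while `events.*` does not because it allows only one trailing token.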
|
||||
|
||||
### ClientOp / ServerOp
|
||||
The NATS client-server protocol is text-based with binary payloads. The client sends `ClientOp` variants (CONNECT, PUB/HPUB, SUB, UNSUB, PING, PONG) and receives `ServerOp` variants (INFO, MSG/HMSG, +OK, -ERR, PING, PONG).
|
||||
|
||||
### Command
|
||||
Internal command type sent from `Client` to `ConnectionHandler` via an `mpsc` channel. Its variants are `Publish`, `Request`, `Subscribe`, `Unsubscribe`, `Flush`, `Drain`, `Reconnect`, `SetServerPool`, and `ServerPool`.
|
||||
|
||||
### Multiplexer
|
||||
A single internal subscription (SID 0) that routes all request-reply responses. When a `Request` is made, a unique inbox token is registered in the multiplexer's sender map, and the response is dispatched to the corresponding `oneshot::Sender`.
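
The routing idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `register` and `dispatch` method names are invented, and `std::sync::mpsc` stands in for the `oneshot` channel the crate uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the multiplexer: maps inbox tokens to waiting requesters.
struct Multiplexer {
    prefix: String, // e.g. "_INBOX.abc."
    senders: HashMap<String, Sender<Vec<u8>>>,
}

impl Multiplexer {
    fn new(prefix: &str) -> Self {
        Multiplexer { prefix: prefix.to_string(), senders: HashMap::new() }
    }

    /// Registers a pending request; returns the reply subject and the receiving end.
    fn register(&mut self, token: &str) -> (String, Receiver<Vec<u8>>) {
        let (tx, rx) = channel();
        self.senders.insert(token.to_string(), tx);
        (format!("{}{}", self.prefix, token), rx)
    }

    /// Routes a response by the token that follows the prefix.
    /// Returns false when no requester is waiting for that token.
    fn dispatch(&mut self, subject: &str, payload: Vec<u8>) -> bool {
        let Some(token) = subject.strip_prefix(self.prefix.as_str()) else {
            return false;
        };
        match self.senders.remove(token) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}
```

Removing the sender on dispatch mirrors one-shot semantics: a second response for the same token finds nobody waiting.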
|
||||
|
||||
### ConnectionHandler
|
||||
A single Tokio task that drives all protocol I/O. It processes server operations from `Connection`, handles client commands from the `mpsc` channel, manages subscriptions, maintains ping/pong health, and orchestrates reconnection.
|
||||
|
||||
## nats-server Test Harness
|
||||
|
||||
The `nats-server` crate provides utilities for launching real NATS server instances in tests:
|
||||
|
||||
- `run_server(cfg)` — starts a single server with optional config
|
||||
- `run_cluster(cfg)` — starts a 3-node cluster
|
||||
- `Server` struct — holds the child process, cleans up on drop
|
||||
- `Server::restart()` — kills and restarts the server process
|
||||
- `Server::client_url()` — reads the INFO from the server to get the client URL
|
||||
- `set_lame_duck_mode(server)` — sends LDM signal to the server process
|
||||
|
||||
The test harness spawns the Go `nats-server` binary via `std::process::Command`, using dynamic ports for parallel test execution. It auto-discovers the client URL by connecting to the server's TCP port and parsing the `INFO` JSON. On `Drop`, it kills the child process and cleans up JetStream storage directories.
|
||||
|
||||
## Feature Flags
|
||||
|
||||
```toml
|
||||
# Default: everything enabled
|
||||
default = ["server_2_10", "server_2_11", "server_2_12", "server_2_14",
|
||||
"service", "ring", "jetstream", "nkeys", "crypto",
|
||||
"object-store", "kv", "websockets", "nuid"]
|
||||
|
||||
# Subsystems
|
||||
jetstream # JetStream API
|
||||
kv # Key-Value store (requires jetstream)
|
||||
object-store # Object store (requires jetstream + crypto)
|
||||
service # Service API
|
||||
|
||||
# Crypto backends (pick one)
|
||||
ring # Default crypto backend
|
||||
aws-lc-rs # Alternative backend
|
||||
fips # FIPS mode (requires aws-lc-rs)
|
||||
|
||||
# Auth
|
||||
nkeys # NKey authentication
|
||||
|
||||
# Other
|
||||
nuid # NUID-based ID generation (falls back to rand)
|
||||
crypto # Encryption support
|
||||
websockets # WebSocket transport
|
||||
experimental # Experimental features
|
||||
|
||||
# Server version markers (enable version-specific API fields)
|
||||
server_2_10
|
||||
server_2_11
|
||||
server_2_12
|
||||
server_2_14
|
||||
```
|
||||
|
||||
## Dependencies (Key)
|
||||
|
||||
| Dependency | Purpose |
|-----------|---------|
| `tokio` | Async runtime (macros, rt, net, sync, time, io-util) |
| `bytes` | Zero-copy byte buffers for payloads |
| `tokio-rustls` | TLS via rustls |
| `rustls-native-certs` | Load native TLS root certificates |
| `serde` / `serde_json` | JSON serialization for protocol messages and JetStream API |
| `memchr` | Fast CRLF search for protocol parsing |
| `futures-util` | Stream trait, Sink trait, StreamExt |
| `tracing` | Structured logging |
| `thiserror` | Error type derivation |
| `url` | URL parsing for server addresses |
| `portable-atomic` | Portable atomic operations |
|
||||
|
||||
## References
|
||||
|
||||
- [NATS Protocol Specification](https://docs.nats.io/reference/reference-protocols/nats-protocol)
|
||||
- [NATS JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
|
||||
- [async-nats on docs.rs](https://docs.rs/async-nats)
|
||||
@@ -1,281 +0,0 @@
|
||||
# NATS Client Protocol and Wire Format
|
||||
|
||||
**Protocol**: NATS Client Protocol v1 (with dynamic reconfiguration)
|
||||
**Transport**: TCP (port 4222), TLS, WebSocket (ws/wss)
|
||||
|
||||
## Protocol Overview
|
||||
|
||||
The NATS client-server protocol is a simple, text-based protocol with binary payload support. All operations are terminated with `\r\n`. Messages carry their payload length, allowing efficient binary data transfer.
|
||||
|
||||
### Connection Lifecycle
|
||||
|
||||
```
|
||||
Client Server
|
||||
│ │
|
||||
│◄──────────── INFO {json} ────────────────────│ Server sends INFO first
|
||||
│ │
|
||||
│────────────── CONNECT {json} ────────────────►│ Client sends CONNECT
|
||||
│────────────── PING ──────────────────────────►│ Client sends PING
|
||||
│◄──────────── PONG ────────────────────────── │ Server confirms connection
|
||||
│ │
|
||||
│──── SUB/UNSUB/PUB/HPUB ──────────────────────►│ Normal operation
|
||||
│◄─── MSG/HMSG/+OK/-ERR/PING ─────────────────│
|
||||
│ │
|
||||
```
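
The handshake in the diagram can be sketched over any reader/writer pair, which keeps the example runnable without a server. `handshake` is a hypothetical function, and the INFO and CONNECT JSON bodies are treated as opaque strings here rather than parsed.

```rust
use std::io::{self, BufRead, Write};

/// Performs the INFO / CONNECT / PING / PONG exchange.
/// Returns the raw INFO JSON sent by the server.
fn handshake<R: BufRead, W: Write>(
    reader: &mut R,
    writer: &mut W,
    connect_json: &str,
) -> io::Result<String> {
    // The server speaks first with INFO.
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let info = line
        .strip_prefix("INFO ")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "expected INFO"))?
        .trim_end()
        .to_string();

    // The client answers with CONNECT followed by PING.
    write!(writer, "CONNECT {}\r\nPING\r\n", connect_json)?;
    writer.flush()?;

    // The PONG confirms that CONNECT was accepted.
    line.clear();
    reader.read_line(&mut line)?;
    if line.trim_end() != "PONG" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "expected PONG"));
    }
    Ok(info)
}
```

In a real client the reader and writer would be the two halves of a TCP, TLS, or WebSocket stream; an in-memory `Cursor` and `Vec<u8>` are enough to exercise the logic.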
|
||||
|
||||
## Server Operations (ServerOp)
|
||||
|
||||
These are operations received from the server. The `Connection` module parses these from the read buffer.
|
||||
|
||||
### INFO
|
||||
|
||||
Sent by the server upon connection and asynchronously when cluster topology changes.
|
||||
|
||||
```
|
||||
INFO {json}\r\n
|
||||
```
|
||||
|
||||
JSON fields (see `ServerInfo` struct):
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `server_id` | `String` | Unique server identifier |
| `server_name` | `String` | Generated server name |
| `host` | `String` | Cluster host |
| `port` | `u16` | Cluster port |
| `version` | `String` | Server version |
| `auth_required` | `bool` | Authentication required |
| `tls_required` | `bool` | TLS required |
| `max_payload` | `usize` | Maximum payload size |
| `proto` | `i8` | Protocol version (0 or 1) |
| `client_id` | `u64` | Server-assigned client ID |
| `go` | `String` | Go build version |
| `nonce` | `String` | Nonce for nkey auth |
| `connect_urls` | `Vec<String>` | Cluster server URLs |
| `client_ip` | `String` | Client IP as seen by server |
| `headers` | `bool` | Server supports headers |
| `ldm` | `bool` | Lame duck mode |
| `cluster` | `Option<String>` | Cluster name |
| `domain` | `Option<String>` | NATS domain |
| `jetstream` | `bool` | JetStream enabled |
|
||||
|
||||
### MSG
|
||||
|
||||
Delivers a message to a subscription (no headers):
|
||||
|
||||
```
|
||||
MSG <subject> <sid> [reply-to] <#bytes>\r\n
|
||||
<payload>\r\n
|
||||
```
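
As an illustration of the control line above, the following sketch parses the part before the first CRLF. `MsgHeader` and `parse_msg_line` are hypothetical names; the crate's own parser works on the byte buffer directly.

```rust
/// Fields of a `MSG` control line (the part before the first CRLF).
#[derive(Debug, PartialEq)]
struct MsgHeader {
    subject: String,
    sid: u64,
    reply_to: Option<String>,
    payload_len: usize,
}

fn parse_msg_line(line: &str) -> Option<MsgHeader> {
    let mut parts = line.strip_prefix("MSG ")?.split_whitespace();
    let subject = parts.next()?.to_string();
    let sid: u64 = parts.next()?.parse().ok()?;
    let third = parts.next()?;
    // With a reply subject there are four arguments; without it, three.
    let (reply_to, len) = match parts.next() {
        Some(len) => (Some(third.to_string()), len),
        None => (None, third),
    };
    if parts.next().is_some() {
        return None; // too many arguments
    }
    Some(MsgHeader { subject, sid, reply_to, payload_len: len.parse().ok()? })
}
```

The optional `reply-to` sits in the middle of the line, so the parser has to count arguments before it knows which token is the byte length.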
|
||||
|
||||
### HMSG
|
||||
|
||||
Delivers a message with headers:
|
||||
|
||||
```
|
||||
HMSG <subject> <sid> [reply-to] <#header-bytes> <#total-bytes>\r\n
|
||||
<NATS/1.0 [status] [description]>\r\n
|
||||
<header-name>: <header-value>\r\n
|
||||
\r\n
|
||||
<payload>\r\n
|
||||
```
|
||||
|
||||
Header format follows the NATS/1.0 header spec:
|
||||
- First line: `NATS/1.0` optionally followed by status code and description
|
||||
- Subsequent lines: `name: value` headers
|
||||
- Empty line separates headers from payload
|
||||
- Header values may span multiple lines (continuation lines start with whitespace)
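
The header rules listed above can be sketched as a small parser. `ParsedHeaders` and `parse_header_block` are hypothetical, and joining a continuation line to the previous value with a single space is an assumption of this sketch, not confirmed crate behavior.

```rust
/// Parsed header block: optional status, description, and ordered name/value pairs.
#[derive(Debug, Default, PartialEq)]
struct ParsedHeaders {
    status: Option<u16>,
    description: Option<String>,
    fields: Vec<(String, String)>,
}

fn parse_header_block(block: &str) -> Option<ParsedHeaders> {
    let mut lines = block.split("\r\n");
    // First line: `NATS/1.0`, optionally followed by a status code and description.
    let rest = lines.next()?.strip_prefix("NATS/1.0")?.trim();
    let mut parsed = ParsedHeaders::default();
    if !rest.is_empty() {
        let (code, desc) = match rest.split_once(' ') {
            Some((c, d)) => (c, Some(d.trim().to_string())),
            None => (rest, None),
        };
        parsed.status = Some(code.parse().ok()?);
        parsed.description = desc;
    }
    for line in lines {
        if line.is_empty() {
            break; // the empty line ends the header block
        }
        if line.starts_with(' ') || line.starts_with('\t') {
            // Continuation line: append to the previous header's value.
            let (_, value) = parsed.fields.last_mut()?;
            value.push(' ');
            value.push_str(line.trim());
        } else {
            let (name, value) = line.split_once(':')?;
            parsed.fields.push((name.to_string(), value.trim().to_string()));
        }
    }
    Some(parsed)
}
```

A status-only block such as `NATS/1.0 503 No Responders` yields a status and description with no fields, which is how a no-responders reply would look to this parser.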
|
||||
|
||||
### PING / PONG
|
||||
|
||||
```
|
||||
PING\r\n → Client responds with PONG
|
||||
PONG\r\n → Acknowledges client's PING
|
||||
```
|
||||
|
||||
### +OK / -ERR
|
||||
|
||||
```
|
||||
+OK\r\n → Success acknowledgment (verbose mode)
|
||||
-ERR <description>\r\n → Error from server
|
||||
```
|
||||
|
||||
Common server errors:
|
||||
- `authorization violation` → parsed as `ServerError::AuthorizationViolation`
|
||||
- Other strings → `ServerError::Other(String)`
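
A minimal sketch of this mapping follows. The enum is a reduced stand-in for the crate's `ServerError`, and the handling of surrounding single quotes and letter case is an assumption about how the server formats the description.

```rust
#[derive(Debug, PartialEq)]
enum ServerError {
    AuthorizationViolation,
    Other(String),
}

/// Maps an `-ERR` line to an error value; returns None for any other line.
fn parse_server_error(line: &str) -> Option<ServerError> {
    // Assumption: the description may be quoted and may vary in case.
    let description = line.strip_prefix("-ERR")?.trim().trim_matches('\'');
    if description.eq_ignore_ascii_case("authorization violation") {
        Some(ServerError::AuthorizationViolation)
    } else {
        Some(ServerError::Other(description.to_string()))
    }
}
```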
|
||||
|
||||
## Client Operations (ClientOp)
|
||||
|
||||
These are operations sent from the client to the server. The `Connection` module serializes these to the write buffer.
|
||||
|
||||
### CONNECT
|
||||
|
||||
Sent as the first client operation after receiving INFO. Contains authentication and capability information.
|
||||
|
||||
```
|
||||
CONNECT {json}\r\n
|
||||
```
|
||||
|
||||
JSON fields (see `ConnectInfo` struct):
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `verbose` | `bool` | Enable +OK acknowledgments (always false in this client) |
| `pedantic` | `bool` | Strict format checking (always false) |
| `jwt` | `Option<String>` | User JWT for auth |
| `nkey` | `Option<String>` | Public nkey for auth |
| `sig` | `Option<String>` | Signed nonce (Base64URL encoded) |
| `name` | `Option<String>` | Client name |
| `echo` | `bool` | Whether server should echo messages back |
| `lang` | `String` | Implementation language ("rust") |
| `version` | `String` | Client version |
| `protocol` | `u8` | Protocol version (1 = dynamic) |
| `tls_required` | `bool` | TLS required |
| `user` | `Option<String>` | Username |
| `pass` | `Option<String>` | Password |
| `auth_token` | `Option<String>` | Auth token |
| `headers` | `bool` | Client supports headers (always true) |
| `no_responders` | `bool` | Client supports no-responders (always true) |
|
||||
|
||||
### PUB / HPUB
|
||||
|
||||
Publish a message:
|
||||
|
||||
```
|
||||
PUB <subject> [reply-to] <#payload-bytes>\r\n
|
||||
<payload>\r\n
|
||||
```
|
||||
|
||||
Publish with headers:
|
||||
|
||||
```
|
||||
HPUB <subject> [reply-to] <#header-bytes> <#total-bytes>\r\n
|
||||
NATS/1.0\r\n
|
||||
<header-name>: <header-value>\r\n
|
||||
\r\n
|
||||
<payload>\r\n
|
||||
```
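
To make the two byte counts concrete, here is a sketch of both encoders. `encode_pub` and `encode_hpub` are hypothetical helpers; headers are passed as plain `(name, value)` pairs instead of a `HeaderMap`.

```rust
/// Encodes a `PUB` frame: control line, payload, trailing CRLF.
fn encode_pub(subject: &str, reply: Option<&str>, payload: &[u8]) -> Vec<u8> {
    let mut out = format!("PUB {subject} ").into_bytes();
    if let Some(reply) = reply {
        out.extend_from_slice(format!("{reply} ").as_bytes());
    }
    out.extend_from_slice(format!("{}\r\n", payload.len()).as_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}

/// Encodes an `HPUB` frame from (name, value) header pairs.
fn encode_hpub(
    subject: &str,
    reply: Option<&str>,
    headers: &[(&str, &str)],
    payload: &[u8],
) -> Vec<u8> {
    // Header block: version line, one line per header, then an empty line.
    let mut block = String::from("NATS/1.0\r\n");
    for (name, value) in headers {
        block.push_str(&format!("{name}: {value}\r\n"));
    }
    block.push_str("\r\n");

    let mut out = format!("HPUB {subject} ").into_bytes();
    if let Some(reply) = reply {
        out.extend_from_slice(format!("{reply} ").as_bytes());
    }
    // <#header-bytes> counts the whole block; <#total-bytes> adds the payload.
    let total = block.len() + payload.len();
    out.extend_from_slice(format!("{} {}\r\n", block.len(), total).as_bytes());
    out.extend_from_slice(block.as_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(b"\r\n");
    out
}
```

For a single header `Bar: Baz` and the payload `Hello NATS!`, the header block is 22 bytes and the total is 33, so the control line reads `HPUB FOO 22 33`.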
|
||||
|
||||
### SUB
|
||||
|
||||
Subscribe to a subject:
|
||||
|
||||
```
|
||||
SUB <subject> [queue-group] <sid>\r\n
|
||||
```
|
||||
|
||||
### UNSUB
|
||||
|
||||
Unsubscribe from a subscription:
|
||||
|
||||
```
|
||||
UNSUB <sid> [max]\r\n
|
||||
```
|
||||
|
||||
The optional `max` parameter tells the server to auto-unsubscribe after receiving the specified number of messages.
|
||||
|
||||
### PING / PONG
|
||||
|
||||
```
|
||||
PING\r\n → Health check / keepalive
|
||||
PONG\r\n → Response to server PING
|
||||
```
|
||||
|
||||
## Protocol Version
|
||||
|
||||
The `Protocol` enum has two variants:
|
||||
|
||||
| Value | Name | Description |
|-------|------|-------------|
| 0 | Original | Basic protocol |
| 1 | Dynamic | Supports async INFO for cluster topology changes, lame duck mode |
|
||||
|
||||
This client always sends `protocol: 1` (Dynamic), enabling:
|
||||
- Asynchronous INFO messages with updated server lists
|
||||
- Lame duck mode notifications
|
||||
- Dynamic reconfiguration of cluster topology
|
||||
|
||||
## Wire Format Details
|
||||
|
||||
### Message Length Calculation
|
||||
|
||||
For plain `MSG`:
|
||||
```
|
||||
length = subject.len() + reply.map_or(0, |r| r.len()) + payload.len()
|
||||
```
|
||||
|
||||
For `HMSG`:
|
||||
```
|
||||
length = subject.len() + reply.map_or(0, |r| r.len()) + header_len + payload.len()
|
||||
```
|
||||
|
||||
Here `header_len` is the size of the serialized header block in bytes. The `<#total-bytes>` field on the wire equals `header_len + payload.len()`.
|
||||
|
||||
### Write Buffer Architecture
|
||||
|
||||
The `Connection` uses a two-tier write buffer:
|
||||
|
||||
1. **`flattened_writes`** (`BytesMut`) — for small writes (< 4096 bytes). Protocol headers, short commands, and small messages are flattened into this buffer for efficient sequential writing.
|
||||
|
||||
2. **`write_buf`** (`VecDeque<Bytes>`) — for large writes (>= 4096 bytes). Large payloads are appended as separate `Bytes` chunks. Supports vectored writes (`write_vectored`) when the underlying stream supports it, writing up to 64 chunks at once.
|
||||
|
||||
The soft limit for the total write buffer is 65,535 bytes (`SOFT_WRITE_BUF_LIMIT`). When exceeded, the `ConnectionHandler` stops processing new commands until the buffer drains.
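
The two tiers and the soft limit can be sketched as follows. This is an illustration under stated assumptions: `Vec<u8>` stands in for `BytesMut` and `Bytes`, `SMALL_WRITE_LIMIT` is an invented name for the 4096-byte threshold, and sealing pending small writes before a large chunk is one possible way to preserve byte order, not necessarily what the crate does.

```rust
use std::collections::VecDeque;

const SMALL_WRITE_LIMIT: usize = 4096; // hypothetical name for the threshold
const SOFT_WRITE_BUF_LIMIT: usize = 65_535;

/// Simplified two-tier write buffer.
#[derive(Default)]
struct WriteBuffer {
    flattened_writes: Vec<u8>,
    write_buf: VecDeque<Vec<u8>>,
}

impl WriteBuffer {
    fn enqueue(&mut self, chunk: Vec<u8>) {
        if chunk.len() < SMALL_WRITE_LIMIT {
            // Small writes are copied into one contiguous buffer.
            self.flattened_writes.extend_from_slice(&chunk);
        } else {
            // Seal pending small writes first so byte order is preserved,
            // then queue the large chunk without copying it.
            if !self.flattened_writes.is_empty() {
                let pending = std::mem::take(&mut self.flattened_writes);
                self.write_buf.push_back(pending);
            }
            self.write_buf.push_back(chunk);
        }
    }

    /// Total number of buffered bytes across both tiers.
    fn len(&self) -> usize {
        self.flattened_writes.len() + self.write_buf.iter().map(Vec::len).sum::<usize>()
    }

    /// True once the soft limit is exceeded; the caller should pause new commands.
    fn is_over_soft_limit(&self) -> bool {
        self.len() > SOFT_WRITE_BUF_LIMIT
    }
}
```

The limit is "soft" because a single enqueue may push the buffer past it; the check only decides whether more commands are accepted afterwards.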
|
||||
|
||||
### Read Buffer Architecture
|
||||
|
||||
The `Connection` uses a single `BytesMut` read buffer with configurable initial capacity (default 65,535 bytes). Protocol parsing uses `memchr::memmem::find` to locate CRLF delimiters efficiently. If a partial message is in the buffer, the parser returns `None` and waits for more data.
|
||||
|
||||
### Header Serialization
|
||||
|
||||
Headers are serialized in NATS/1.0 format:
|
||||
|
||||
```
|
||||
NATS/1.0\r\n
|
||||
Header-Name: Header-Value\r\n
|
||||
Multi-Line-Header: value part 1\r\n
|
||||
continuation of value\r\n
|
||||
Another-Header: another value\r\n
|
||||
\r\n
|
||||
```
|
||||
|
||||
The `HeaderMap::to_bytes()` method handles this serialization, using `httparse`-compatible line folding for multi-line values.
|
||||
|
||||
### Status Codes in Headers
|
||||
|
||||
NATS status codes are embedded in the `HMSG` header version line:
|
||||
|
||||
```
|
||||
NATS/1.0 404 No Messages\r\n
|
||||
NATS/1.0 408 Request Timeout\r\n
|
||||
NATS/1.0 503 No Responders\r\n
|
||||
```
|
||||
|
||||
Common codes used by the client:
|
||||
|
||||
| Code | Constant | Meaning |
|------|----------|---------|
| 100 | `IDLE_HEARTBEAT` | JetStream idle heartbeat |
| 200 | `OK` | Success |
| 404 | `NOT_FOUND` | Message/stream not found |
| 408 | `TIMEOUT` | Request timeout |
| 409 | `REQUEST_TERMINATED` | Request terminated |
| 503 | `NO_RESPONDERS` | No responders available |
|
||||
|
||||
## Protocol Parsing Implementation
|
||||
|
||||
The `Connection::try_read_op()` method handles all protocol parsing:
|
||||
|
||||
1. Search for `\r\n` delimiter using `memchr::memmem::find`
|
||||
2. Match the operation prefix:
|
||||
- `+OK` → `ServerOp::Ok`
|
||||
- `PING` → `ServerOp::Ping`
|
||||
- `PONG` → `ServerOp::Pong`
|
||||
- `-ERR` → parse error description → `ServerOp::Error`
|
||||
- `INFO ` → parse JSON → `ServerOp::Info`
|
||||
- `MSG ` → parse subject/sid/reply/length, read payload → `ServerOp::Message`
|
||||
- `HMSG ` → parse headers + payload → `ServerOp::Message`
|
||||
3. Unknown prefix → return `io::Error` with `InvalidInput`
|
||||
|
||||
For `MSG` and `HMSG`, if the complete payload isn't yet in the read buffer (checked via `len + payload_len + 4 > remaining`), the method returns `Ok(None)` and the buffer accumulates more data before retrying.
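
The parsing steps can be sketched with the standard library. This is a simplified stand-in, not the crate's `try_read_op`: it uses a `Vec<u8>` instead of `BytesMut`, a `windows(2)` scan instead of `memchr`, leaves INFO JSON unparsed, and omits `HMSG` and the reply subject for brevity.

```rust
use std::io;

#[derive(Debug, PartialEq)]
enum ServerOp {
    Ok,
    Ping,
    Pong,
    Error(String),
    Info(String), // raw JSON, left unparsed in this sketch
    Message { subject: String, sid: u64, payload: Vec<u8> },
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, msg.to_string())
}

/// Parses one operation from the front of `buf` and drains the consumed bytes.
/// `Ok(None)` means the buffer does not yet hold a complete operation.
fn try_read_op(buf: &mut Vec<u8>) -> io::Result<Option<ServerOp>> {
    // Step 1: locate the CRLF that ends the control line.
    let Some(len) = buf.windows(2).position(|w| w == b"\r\n") else {
        return Ok(None);
    };
    let line = std::str::from_utf8(&buf[..len])
        .map_err(|_| invalid("control line is not UTF-8"))?
        .to_string();

    // Step 2: match the operation prefix.
    let op = if line == "+OK" {
        ServerOp::Ok
    } else if line == "PING" {
        ServerOp::Ping
    } else if line == "PONG" {
        ServerOp::Pong
    } else if let Some(desc) = line.strip_prefix("-ERR") {
        ServerOp::Error(desc.trim().to_string())
    } else if let Some(json) = line.strip_prefix("INFO ") {
        ServerOp::Info(json.to_string())
    } else if let Some(args) = line.strip_prefix("MSG ") {
        let args: Vec<&str> = args.split_whitespace().collect();
        if args.len() != 3 && args.len() != 4 {
            return Err(invalid("malformed MSG"));
        }
        let sid = args[1].parse::<u64>().map_err(|_| invalid("bad sid"))?;
        let payload_len = args[args.len() - 1]
            .parse::<usize>()
            .map_err(|_| invalid("bad length"))?;
        // Control line, CRLF, payload, and CRLF must all be buffered.
        if len + payload_len + 4 > buf.len() {
            return Ok(None);
        }
        let payload = buf[len + 2..len + 2 + payload_len].to_vec();
        buf.drain(..len + payload_len + 4);
        return Ok(Some(ServerOp::Message { subject: args[0].to_string(), sid, payload }));
    } else {
        // Step 3: unknown prefix.
        return Err(invalid("unknown operation"));
    };
    buf.drain(..len + 2);
    Ok(Some(op))
}
```

Nothing is drained when the payload is incomplete, so the same call can be retried after more bytes arrive.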
|
||||
|
||||
Non-UTF8 subjects in server messages are handled gracefully — the parser returns an `io::Error` rather than panicking, which is critical because the Go server does not enforce UTF-8 in subjects (regression fix for issue #1572).
|
||||
@@ -1,443 +0,0 @@
|
||||
# Key Types and Traits
|
||||
|
||||
This document covers the core data types in the `async-nats` crate that form the public API and internal plumbing.
|
||||
|
||||
## Public Types
|
||||
|
||||
### Client
|
||||
|
||||
**Location**: `client.rs`
|
||||
|
||||
`Client` is the primary user-facing type. It is a lightweight, cloneable handle to a NATS connection.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct Client {
|
||||
info: tokio::sync::watch::Receiver<Option<ServerInfo>>,
|
||||
state: tokio::sync::watch::Receiver<State>,
|
||||
sender: mpsc::Sender<Command>,
|
||||
poll_sender: PollSender<Command>,
|
||||
next_subscription_id: Arc<AtomicU64>,
|
||||
subscription_capacity: usize,
|
||||
inbox_prefix: Arc<str>,
|
||||
request_timeout: Option<Duration>,
|
||||
max_payload: Arc<AtomicUsize>,
|
||||
connection_stats: Arc<Statistics>,
|
||||
skip_subject_validation: bool,
|
||||
}
|
||||
```
|
||||
|
||||
Key methods:
|
||||
- `publish(subject, payload)` — fire-and-forget publish
|
||||
- `publish_with_headers(subject, headers, payload)` — publish with NATS headers
|
||||
- `publish_with_reply(subject, reply, payload)` — publish with reply subject
|
||||
- `request(subject, payload)` — request-response (returns `Message`)
|
||||
- `send_request(subject, request)` — request with `Request` builder
|
||||
- `subscribe(subject)` — subscribe to a subject, returns `Subscriber`
|
||||
- `queue_subscribe(subject, queue_group)` — subscribe as part of a queue group
|
||||
- `flush()` — ensure all pending messages are written to the wire
|
||||
- `drain()` — gracefully drain all subscriptions and close
|
||||
- `force_reconnect()` — trigger immediate reconnection
|
||||
- `new_inbox()` — generate a unique inbox subject for request-reply
|
||||
- `server_info()` — get last received `ServerInfo`
|
||||
- `max_payload()` — get server's maximum payload size
|
||||
- `connection_state()` — get current connection `State`
|
||||
- `statistics()` — get `Arc<Statistics>` for connection metrics
|
||||
- `is_server_compatible(major, minor, patch)` — check server version compatibility
|
||||
- `set_server_pool(addrs)` / `server_pool()` — manage server pool
|
||||
|
||||
`Client` also implements `Sink<OutboundMessage>` for backpressure-aware publishing.
|
||||
|
||||
### Subscriber
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
A `Subscriber` receives messages from a single subscription. It implements `futures::Stream`.
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Subscriber {
|
||||
sid: u64,
|
||||
receiver: mpsc::Receiver<Message>,
|
||||
sender: mpsc::Sender<Command>,
|
||||
}
|
||||
```
|
||||
|
||||
Key methods:
|
||||
- `unsubscribe()` — unsubscribe and close the stream
|
||||
- `unsubscribe_after(max)` — auto-unsubscribe after N messages
|
||||
- `drain()` — gracefully drain remaining messages then close
|
||||
|
||||
On `Drop`, `Subscriber` automatically sends an `Unsubscribe` command and closes the receiver channel.
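
The drop behavior can be sketched with a reduced `Command` enum and a synchronous channel. Both are stand-ins: the real types use Tokio's `mpsc`, and the real `Subscriber` also owns the message receiver.

```rust
use std::sync::mpsc::Sender;

/// Reduced stand-in for the internal command type.
#[derive(Debug, PartialEq)]
enum Command {
    Unsubscribe { sid: u64, max: Option<u64> },
}

struct Subscriber {
    sid: u64,
    sender: Sender<Command>,
}

impl Drop for Subscriber {
    fn drop(&mut self) {
        // Best effort: the handler may already be gone, so the error is ignored.
        let _ = self.sender.send(Command::Unsubscribe { sid: self.sid, max: None });
    }
}
```

Tying the unsubscribe to `Drop` means a subscription cannot outlive the handle that reads from it, even when the caller never calls `unsubscribe()` explicitly.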
|
||||
|
||||
### Message
|
||||
|
||||
**Location**: `message.rs`
|
||||
|
||||
Represents an inbound NATS message.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct Message {
|
||||
pub subject: Subject,
|
||||
pub reply: Option<Subject>,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub status: Option<StatusCode>,
|
||||
pub description: Option<String>,
|
||||
pub length: usize,
|
||||
}
|
||||
```
|
||||
|
||||
### OutboundMessage
|
||||
|
||||
**Location**: `message.rs`
|
||||
|
||||
Represents a message to be published. It has no `status` or `description` fields, because those exist only on inbound messages.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct OutboundMessage {
|
||||
pub subject: Subject,
|
||||
pub reply: Option<Subject>,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
}
|
||||
```
|
||||
|
||||
### Subject
|
||||
|
||||
**Location**: `subject.rs`
|
||||
|
||||
An immutable, validated UTF-8 string backed by `Bytes`. Used throughout the crate instead of raw `String` for subjects.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
|
||||
pub struct Subject {
|
||||
bytes: Bytes,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `Deref<Target = str>`, `From<&str>`, `From<String>`, `TryFrom<Bytes>`, `Serialize`, `Deserialize`.
|
||||
|
||||
Validation methods:
|
||||
- `is_valid()` — checks NATS subject rules (no leading/trailing dots, no consecutive dots, no whitespace)
|
||||
- `validated(s)` — construct with validation, returns `Result<Subject, SubjectError>`
|
||||
- `from_static_validated(s)` — validation of static strings in a `const` context (an invalid subject panics at compile time)
|
||||
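
The rules named for `is_valid()` can be expressed compactly. `is_valid_subject` is a hypothetical free function covering only the rules listed above; the crate may check more than this.

```rust
/// Checks the listed rules: no leading, trailing, or consecutive dots, and no whitespace.
/// An empty string is rejected too, because it splits into a single empty token.
fn is_valid_subject(subject: &str) -> bool {
    !subject.chars().any(char::is_whitespace)
        && subject.split('.').all(|token| !token.is_empty())
}
```

All three dot rules collapse into one check: a leading, trailing, or doubled dot each produces an empty token when the subject is split on `.`.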
|
||||
### ToSubject Trait
|
||||
|
||||
**Location**: `subject.rs`
|
||||
|
||||
```rust
|
||||
pub trait ToSubject {
|
||||
fn to_subject(&self) -> Subject;
|
||||
}
|
||||
```
|
||||
|
||||
Implemented for `Subject`, `&'static str`, `String`. All methods accepting subjects are generic over `impl ToSubject`.
|
||||
|
||||
### HeaderMap
|
||||
|
||||
**Location**: `header.rs`
|
||||
|
||||
NATS message headers, modeled after the `http::header` crate.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, PartialEq, Eq, Debug, Default)]
|
||||
pub struct HeaderMap {
|
||||
inner: HashMap<HeaderName, Vec<HeaderValue>>,
|
||||
}
|
||||
```
|
||||
|
||||
Supports multiple values per header name (like HTTP). Key methods:
|
||||
- `insert(name, value)` — replace all values for a name
|
||||
- `append(name, value)` — add a value to a name
|
||||
- `get(name)` — get the first value
|
||||
- `get_all(name)` — get all values as an iterator
|
||||
- `len()` / `is_empty()` — number of header entries
|
||||
- `to_bytes()` — serialize to NATS/1.0 wire format
|
||||
- `wire_len()` — size in wire format (for payload size checks)
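
The difference between `insert` and `append` is easiest to see in a reduced model. This sketch uses `String` keys and values instead of `HeaderName` and `HeaderValue`, and implements only four of the methods listed above.

```rust
use std::collections::HashMap;

/// Simplified multi-value header map.
#[derive(Default)]
struct HeaderMap {
    inner: HashMap<String, Vec<String>>,
}

impl HeaderMap {
    /// Replaces every existing value for `name`.
    fn insert(&mut self, name: &str, value: &str) {
        self.inner.insert(name.to_string(), vec![value.to_string()]);
    }

    /// Adds a value, keeping the ones already present.
    fn append(&mut self, name: &str, value: &str) {
        self.inner.entry(name.to_string()).or_default().push(value.to_string());
    }

    /// Returns the first value for `name`, if any.
    fn get(&self, name: &str) -> Option<&str> {
        self.inner.get(name)?.first().map(String::as_str)
    }

    /// Iterates over all values for `name` in insertion order.
    fn get_all(&self, name: &str) -> impl Iterator<Item = &str> {
        self.inner.get(name).into_iter().flatten().map(String::as_str)
    }
}
```

After two `append` calls a name holds two values; a following `insert` on the same name leaves exactly one.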
|
||||
|
||||
### StatusCode
|
||||
|
||||
**Location**: `status.rs`
|
||||
|
||||
NATS status codes (100-999), structurally similar to HTTP status codes.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
|
||||
pub struct StatusCode(NonZeroU16);
|
||||
```
|
||||
|
||||
Constants:
|
||||
| Constant | Code | Meaning |
|----------|------|---------|
| `IDLE_HEARTBEAT` | 100 | JetStream idle heartbeat |
| `OK` | 200 | Success |
| `NOT_FOUND` | 404 | Not found |
| `TIMEOUT` | 408 | Timeout |
| `REQUEST_TERMINATED` | 409 | Request terminated |
| `NO_RESPONDERS` | 503 | No responders |
|
||||
|
||||
### ServerInfo
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Deserialized from the server's `INFO` JSON message. Contains server capabilities, connection details, and cluster information.
|
||||
|
||||
### ConnectInfo
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Serialized into the client's `CONNECT` JSON message. Contains authentication credentials, client capabilities, and protocol preferences.
|
||||
|
||||
### ServerAddr
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
A validated NATS server URL, supporting schemes `nats://`, `tls://`, `ws://`, `wss://`.
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
|
||||
pub struct ServerAddr(Url);
|
||||
```
|
||||
|
||||
Methods:
|
||||
- `from_url(url)` — validate and create
|
||||
- `tls_required()` — true for `tls://` scheme
|
||||
- `is_websocket()` — true for `ws://` or `wss://`
|
||||
- `host()` / `port()` / `scheme()` — URL component accessors
|
||||
- `socket_addrs()` — async DNS resolution
|
||||
- `username()` / `password()` — embedded credentials
|
||||
|
||||
### Auth
|
||||
|
||||
**Location**: `auth.rs`
|
||||
|
||||
Container for authentication credentials.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Default)]
|
||||
pub struct Auth {
|
||||
pub jwt: Option<String>,
|
||||
pub nkey: Option<String>,
|
||||
pub signature_callback: Option<CallbackArg1<String, Result<String, AuthError>>>,
|
||||
pub signature: Option<Vec<u8>>,
|
||||
pub username: Option<String>,
|
||||
pub password: Option<String>,
|
||||
pub token: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
### Request
|
||||
|
||||
**Location**: `client.rs`
|
||||
|
||||
Builder for customized request-response operations.
|
||||
|
||||
```rust
|
||||
#[derive(Default)]
|
||||
pub struct Request {
|
||||
pub payload: Option<Bytes>,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub timeout: Option<Option<Duration>>,
|
||||
pub inbox: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
### Statistics
|
||||
|
||||
**Location**: `client.rs`
|
||||
|
||||
Atomic connection statistics shared between Client and ConnectionHandler.
|
||||
|
||||
```rust
|
||||
#[derive(Default, Debug)]
|
||||
pub struct Statistics {
|
||||
pub in_bytes: AtomicU64,
|
||||
pub out_bytes: AtomicU64,
|
||||
pub in_messages: AtomicU64,
|
||||
pub out_messages: AtomicU64,
|
||||
pub connects: AtomicU64,
|
||||
}
|
||||
```
|
||||
|
||||
### Event
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Events emitted by the client for connection lifecycle monitoring.
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub enum Event {
|
||||
Connected,
|
||||
Disconnected,
|
||||
LameDuckMode,
|
||||
Draining,
|
||||
Closed,
|
||||
SlowConsumer(u64),
|
||||
ServerError(ServerError),
|
||||
ClientError(ClientError),
|
||||
}
|
||||
```
|
||||
|
||||
## Internal Types
|
||||
|
||||
### Command
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Internal commands sent from `Client` to `ConnectionHandler` via `mpsc` channel.
|
||||
|
||||
```rust
|
||||
pub(crate) enum Command {
|
||||
Publish(OutboundMessage),
|
||||
Request { subject, payload, respond, headers, sender: oneshot::Sender<Message> },
|
||||
Subscribe { sid, subject, queue_group, sender: mpsc::Sender<Message> },
|
||||
Unsubscribe { sid, max: Option<u64> },
|
||||
Flush { observer: oneshot::Sender<()> },
|
||||
Drain { sid: Option<u64> },
|
||||
Reconnect,
|
||||
SetServerPool { servers: Vec<ServerAddr>, result: oneshot::Sender<Result<(), String>> },
|
||||
ServerPool { result: oneshot::Sender<Vec<connector::Server>> },
|
||||
}
|
||||
```
|
||||
|
||||
### ClientOp / ServerOp
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Protocol-level operation types used by `Connection` for wire format parsing and serialization.
|
||||
|
||||
### Subscription (Internal)
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
```rust
|
||||
struct Subscription {
|
||||
subject: Subject,
|
||||
sender: mpsc::Sender<Message>,
|
||||
queue_group: Option<String>,
|
||||
delivered: u64,
|
||||
max: Option<u64>,
|
||||
}
|
||||
```
|
||||
|
||||
### Multiplexer (Internal)
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
```rust
|
||||
struct Multiplexer {
|
||||
subject: Subject, // Wildcard subscription subject (e.g., "_INBOX.xxx.*")
|
||||
prefix: Subject, // Prefix for routing (e.g., "_INBOX.xxx.")
|
||||
senders: HashMap<String, oneshot::Sender<Message>>, // token → sender
|
||||
}
|
||||
```
|
||||
|
||||
### Connection State
|
||||
|
||||
**Location**: `connection.rs`
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Eq, PartialEq, Clone)]
|
||||
pub enum State {
|
||||
Pending,
|
||||
Connected,
|
||||
Disconnected,
|
||||
}
|
||||
```
|
||||
|
||||
### Protocol
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
```rust
|
||||
#[derive(Serialize_repr, Deserialize_repr, PartialEq, Eq, Debug, Clone, Copy)]
|
||||
#[repr(u8)]
|
||||
pub enum Protocol {
|
||||
Original = 0,
|
||||
Dynamic = 1,
|
||||
}
|
||||
```
|
||||
|
||||
## Error Type Pattern
|
||||
|
||||
The crate uses a generic `Error<Kind>` type throughout. Every subsystem defines its own `ErrorKind` enum and a type alias:
|
||||
|
||||
```rust
|
||||
// Define the kind enum
|
||||
#[derive(Clone, Debug, PartialEq)]
|
||||
pub enum PublishErrorKind {
|
||||
MaxPayloadExceeded,
|
||||
InvalidSubject,
|
||||
Send,
|
||||
}
|
||||
|
||||
// Define the error type alias
|
||||
pub type PublishError = Error<PublishErrorKind>;
|
||||
|
||||
// Construct errors
|
||||
PublishError::new(PublishErrorKind::MaxPayloadExceeded)
|
||||
PublishError::with_source(PublishErrorKind::Send, io_error)
|
||||
|
||||
// Match on errors
|
||||
if err.kind() == PublishErrorKind::MaxPayloadExceeded { ... }
|
||||
```
|
||||
|
||||
Error kinds in the crate:
|
||||
|
||||
| Error Type | Kind Enum | Context |
|-----------|-----------|---------|
| `ConnectError` | `ConnectErrorKind` | Initial connection failures |
| `PublishError` | `PublishErrorKind` | Publish validation failures |
| `RequestError` | `RequestErrorKind` | Request-response failures |
| `SubscribeError` | `SubscribeErrorKind` | Subscription failures |
| `FlushError` | `FlushErrorKind` | Flush failures |
| `ServerPoolError` | `ServerPoolErrorKind` | Server pool query failures |
| `SetServerPoolError` | `SetServerPoolErrorKind` | Server pool modification failures |
|
||||
|
||||
## Trait Implementations
|
||||
|
||||
### Client Trait Interfaces
|
||||
|
||||
The `Client` implements several traits defined in `client::traits`:
|
||||
|
||||
```rust
|
||||
// Publisher trait — publish with optional reply subject
|
||||
trait Publisher {
|
||||
fn publish_with_reply<S, R>(&self, subject: S, reply: R, payload: Bytes) -> impl Future<Output = Result<(), PublishError>>;
|
||||
fn publish_message(&self, msg: OutboundMessage) -> impl Future<Output = Result<(), PublishError>>;
|
||||
}
|
||||
|
||||
// Subscriber trait — subscribe to a subject
|
||||
trait Subscriber {
|
||||
fn subscribe<S>(&self, subject: S) -> impl Future<Output = Result<crate::Subscriber, SubscribeError>>;
|
||||
}
|
||||
|
||||
// Requester trait — send request-response
|
||||
trait Requester {
|
||||
fn send_request<S>(&self, subject: S, request: Request) -> impl Future<Output = Result<Message, RequestError>>;
|
||||
}
|
||||
|
||||
// TimeoutProvider trait — access request timeout
|
||||
trait TimeoutProvider {
|
||||
fn timeout(&self) -> Option<Duration>;
|
||||
}
|
||||
```
|
||||
|
||||
### ToServerAddrs Trait
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
Converts various address types into server address iterators. Implemented for `ServerAddr`, `str`, `String`, `&[T]`, `Vec<T>`, `&[ServerAddr]`, and references.
|
||||
|
||||
### Sink<OutboundMessage>
|
||||
|
||||
`Client` implements `futures::Sink<OutboundMessage>` for backpressure-aware publishing through the `PollSender` adapter.
|
||||
|
||||
### Stream for Subscriber
|
||||
|
||||
`Subscriber` implements `futures::Stream` with `Item = Message`, delegating to the internal `mpsc::Receiver`.
|
||||
@@ -1,338 +0,0 @@
|
||||
# Connection Handler and Data Flow
|
||||
|
||||
This document covers the internal `ConnectionHandler` that drives all protocol I/O, and the data flow through the system.
|
||||
|
||||
## ConnectionHandler
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
The `ConnectionHandler` is the heart of the client. It runs as a single Tokio task and manages all communication with the NATS server.
|
||||
|
||||
```rust
|
||||
pub(crate) struct ConnectionHandler {
|
||||
connection: Connection, // Low-level I/O
|
||||
connector: Connector, // Server pool, reconnection
|
||||
subscriptions: HashMap<u64, Subscription>, // Active subscriptions
|
||||
multiplexer: Option<Multiplexer>, // Request-reply multiplexer
|
||||
pending_pings: usize, // Unanswered PINGs
|
||||
info_sender: tokio::sync::watch::Sender<Option<ServerInfo>>,
|
||||
ping_interval: Interval, // Periodic PING timer
|
||||
should_reconnect: bool, // Flag for forced reconnect
|
||||
flush_observers: Vec<oneshot::Sender<()>>, // Pending flush callbacks
|
||||
is_draining: bool, // Connection is draining
|
||||
drain_pings: VecDeque<u64>, // SIDs being drained
|
||||
}
|
||||
```
|
||||
|
||||
## Data Flow: Publish
|
||||
|
||||
```
|
||||
Application
|
||||
│
|
||||
│ client.publish("events.data", payload)
|
||||
│
|
||||
▼
|
||||
Client
|
||||
│ validates subject & payload size
|
||||
│ sends Command::Publish(OutboundMessage) via mpsc channel
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_command(Command::Publish)
|
||||
│ increments out_messages, out_bytes statistics
|
||||
│ calls connection.enqueue_write_op(&ClientOp::Publish { ... })
|
||||
│
|
||||
▼
|
||||
Connection::enqueue_write_op
|
||||
│ serializes to wire format:
|
||||
│ "PUB events.data 11\r\n" or "HPUB events.data 23 34\r\n"
|
||||
│ appends to flattened_writes or write_buf
|
||||
│
|
||||
▼
|
||||
Connection::poll_write
|
||||
│ uses vectored writes (64 chunks) if supported
|
||||
│ or sequential writes otherwise
|
||||
│
|
||||
▼
|
||||
Connection::poll_flush
|
||||
│ flushes the TCP/TLS/WS stream
|
||||
│ notifies flush_observers
|
||||
│
|
||||
▼
|
||||
NATS Server (TCP/TLS/WebSocket)
|
||||
```
|
||||
|
||||
## Data Flow: Subscribe
|
||||
|
||||
```
|
||||
Application
|
||||
│
|
||||
│ client.subscribe("events.>")
|
||||
│
|
||||
▼
|
||||
Client::subscribe
|
||||
│ validates subject (always, regardless of skip_subject_validation)
|
||||
│ allocates next sid via AtomicU64
|
||||
│ creates mpsc channel for messages
|
||||
│ sends Command::Subscribe { sid, subject, sender }
|
||||
│ returns Subscriber { sid, receiver }
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_command(Command::Subscribe)
|
||||
│ creates Subscription { subject, sender, delivered: 0, max: None }
|
||||
│ inserts into subscriptions HashMap
|
||||
│ calls connection.enqueue_write_op(&ClientOp::Subscribe { sid, subject, queue_group })
|
||||
│
|
||||
▼
|
||||
Connection::enqueue_write_op
|
||||
│ serializes: "SUB events.> 42\r\n"
|
||||
│
|
||||
▼
|
||||
Server sends MSG for matching subjects:
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_server_op(ServerOp::Message { sid, subject, ... })
|
||||
│ looks up sid in subscriptions HashMap
|
||||
│ constructs Message { subject, reply, payload, headers, status, description }
|
||||
│ tries subscription.sender.try_send(message)
|
||||
│
|
||||
├── Ok → increments subscription.delivered, checks max
|
||||
├── Full → emits Event::SlowConsumer(sid)
|
||||
└── Closed → removes subscription, sends ClientOp::Unsubscribe
|
||||
│
|
||||
▼
|
||||
Subscriber::poll_next (Stream impl)
|
||||
│ receives from mpsc::Receiver
|
||||
│
|
||||
▼
|
||||
Application processes Message
|
||||
```
|
||||
|
||||
## Data Flow: Request-Response
|
||||
|
||||
The request-response pattern uses the **multiplexer** — a single wildcard subscription that routes responses to their waiting requesters.
|
||||
|
||||
```
|
||||
Application
|
||||
│
|
||||
│ client.request("service", payload)
|
||||
│
|
||||
▼
|
||||
Client::send_request
|
||||
│ validates subject & payload size
|
||||
│ creates oneshot channel for response
|
||||
│ generates unique inbox: "_INBOX.<nuid>.<token>"
|
||||
│ sends Command::Request { subject, payload, respond, sender }
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_command(Command::Request)
|
||||
│ extracts token from respond subject (after last '.')
|
||||
│ if no multiplexer exists:
|
||||
│ creates Multiplexer with wildcard sub "_INBOX.<id>.*" (SID 0)
|
||||
│ sends ClientOp::Subscribe { sid: 0, subject: "_INBOX.<id>.*" }
|
||||
│ inserts token → oneshot::Sender in multiplexer.senders
|
||||
│ sends ClientOp::Publish { subject, payload, respond: "<prefix><token>" }
|
||||
│
|
||||
▼
|
||||
Server routes request to service:
|
||||
│
|
||||
▼
|
||||
Service responds by publishing to the reply subject:
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_server_op(ServerOp::Message { sid: 0, ... })
|
||||
│ sid == MULTIPLEXER_SID (0), so enters multiplexer path
|
||||
│ extracts token by stripping prefix from subject
|
||||
│ looks up token in multiplexer.senders
|
||||
│ sends Message via oneshot::Sender
|
||||
│
|
||||
▼
|
||||
Client::send_request receives via oneshot::Receiver
|
||||
│ applies timeout (default 10s)
|
||||
│ checks for NO_RESPONDERS status (503)
|
||||
│
|
||||
▼
|
||||
Application receives Message
|
||||
```
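
The token routing performed by the multiplexer can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: the `Multiplexer` shape, the `register` and `route` method names, and the use of `std::sync::mpsc` in place of a Tokio oneshot channel are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, simplified multiplexer: one wildcard inbox, many waiters.
struct Multiplexer {
    prefix: String,                            // e.g. "_INBOX.abc."
    senders: HashMap<String, Sender<Vec<u8>>>, // token -> waiting requester
}

impl Multiplexer {
    fn new(prefix: &str) -> Self {
        Multiplexer { prefix: prefix.to_string(), senders: HashMap::new() }
    }

    /// Register a waiter for `token`; returns the reply subject to publish
    /// with and the receiver the requester waits on.
    fn register(&mut self, token: &str) -> (String, Receiver<Vec<u8>>) {
        let (tx, rx) = channel();
        self.senders.insert(token.to_string(), tx);
        (format!("{}{}", self.prefix, token), rx)
    }

    /// Route a response by stripping the prefix from its subject.
    /// Returns `false` when the subject does not match or nobody is waiting.
    fn route(&mut self, subject: &str, payload: Vec<u8>) -> bool {
        let Some(token) = subject.strip_prefix(self.prefix.as_str()) else {
            return false;
        };
        // Each token is used once, so the sender is removed on delivery.
        match self.senders.remove(token) {
            Some(tx) => tx.send(payload).is_ok(),
            None => false,
        }
    }
}
```

Keeping one wildcard subscription and a token map means a request costs a map insert rather than a `SUB`/`UNSUB` round trip per call.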
|
||||
|
||||
### Custom Inbox Request
|
||||
|
||||
If the `Request` builder specifies a custom `inbox`, the flow is different:
|
||||
- The client subscribes to the inbox directly (not via multiplexer)
|
||||
- Publishes with the inbox as the reply subject
|
||||
- Waits for the message on that subscription
|
||||
- No multiplexer involvement
|
||||
|
||||
## Data Flow: Flush
|
||||
|
||||
```
|
||||
Application
|
||||
│
|
||||
│ client.flush()
|
||||
│
|
||||
▼
|
||||
Client::flush
|
||||
│ creates oneshot channel
|
||||
│ sends Command::Flush { observer }
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_command(Command::Flush)
|
||||
│ pushes observer into flush_observers Vec
|
||||
│
|
||||
▼
|
||||
ProcessFut::poll (main loop)
|
||||
│ after writing all pending data...
|
||||
│ checks should_flush():
|
||||
│ Yes (write buffers empty, not yet flushed) → poll_flush
|
||||
│ May (write buffers not empty) → poll_flush
|
||||
│ No (already flushed) → skip
|
||||
│ on successful flush:
|
||||
│ drains flush_observers, sending () to each
|
||||
│
|
||||
▼
|
||||
Client::flush receives via oneshot::Receiver
|
||||
```
|
||||
|
||||
## Data Flow: Drain
|
||||
|
||||
```
|
||||
Application
|
||||
│
|
||||
│ client.drain() or subscriber.drain()
|
||||
│
|
||||
▼
|
||||
Client::drain / Subscriber::drain
|
||||
│ sends Command::Drain { sid: None } (whole client)
|
||||
│ or Command::Drain { sid: Some(n) } (single subscription)
|
||||
│
|
||||
▼
|
||||
ConnectionHandler::handle_command(Command::Drain)
|
||||
│ if sid is Some:
|
||||
│ pushes sid to drain_pings
|
||||
│ sends ClientOp::Unsubscribe { sid, max: None }
|
||||
│ if sid is None (whole client):
|
||||
│ sets is_draining = true
|
||||
│ emits Event::Draining
|
||||
│ for each subscription: drain_pings.push(sid), Unsubscribe
|
||||
│ sends ClientOp::Ping (to flush the UNSUB messages)
|
||||
│
|
||||
▼
|
||||
ProcessFut::poll (main loop)
|
||||
│ processes any remaining server messages
|
||||
│ removes drained subscriptions from HashMap
|
||||
│ if is_draining: returns ExitReason::Closed
|
||||
│
|
||||
▼
|
||||
ConnectionHandler exits, emits Event::Closed
|
||||
```
|
||||
|
||||
## Main Processing Loop
|
||||
|
||||
The `ConnectionHandler::process` method implements the core event loop via a custom `Future` (`ProcessFut`):
|
||||
|
||||
```rust
|
||||
impl Future for ProcessFut<'_> {
|
||||
type Output = ExitReason;
|
||||
|
||||
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
|
||||
// 1. Check ping interval — send PING if due, disconnect if too many pending
|
||||
while self.handler.ping_interval.poll_tick(cx).is_ready() {
|
||||
if let Poll::Ready(exit) = self.ping() { return Poll::Ready(exit); }
|
||||
}
|
||||
|
||||
// 2. Read all available server operations
|
||||
loop {
|
||||
match self.handler.connection.poll_read_op(cx) {
|
||||
Poll::Pending => break,
|
||||
Poll::Ready(Ok(Some(server_op))) => self.handler.handle_server_op(server_op),
|
||||
Poll::Ready(Ok(None)) => return Poll::Ready(ExitReason::Disconnected(None)),
|
||||
Poll::Ready(Err(err)) => return Poll::Ready(ExitReason::Disconnected(Some(err))),
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Clean up drained subscriptions
|
||||
while let Some(sid) = self.handler.drain_pings.pop_front() {
|
||||
self.handler.subscriptions.remove(&sid);
|
||||
}
|
||||
|
||||
// 4. If draining, exit
|
||||
if self.handler.is_draining { return Poll::Ready(ExitReason::Closed); }
|
||||
|
||||
// 5. Process client commands (batch of up to 16)
|
||||
// while write buffer not full
|
||||
loop {
|
||||
while !self.handler.connection.is_write_buf_full() {
|
||||
match receiver.poll_recv_many(cx, recv_buf, 16) {
|
||||
Poll::Pending => break,
|
||||
Poll::Ready(1..) => { for cmd in recv_buf.drain(..) { handler.handle_command(cmd); } }
|
||||
Poll::Ready(0) => return Poll::Ready(ExitReason::Closed),
|
||||
}
|
||||
}
|
||||
|
||||
// 6. Write pending data to stream
|
||||
match self.handler.connection.poll_write(cx) {
|
||||
Poll::Pending => break,
|
||||
Poll::Ready(Ok(())) => continue, // write buffer empty, try more commands
|
||||
Poll::Ready(Err(err)) => return Poll::Ready(ExitReason::Disconnected(Some(err))),
|
||||
}
|
||||
}
|
||||
|
||||
// 7. Flush stream and notify observers
|
||||
match self.handler.connection.poll_flush(cx) { ... }
|
||||
|
||||
// 8. Check for forced reconnect
|
||||
if mem::take(&mut self.handler.should_reconnect) {
|
||||
return Poll::Ready(ExitReason::ReconnectRequested);
|
||||
}
|
||||
|
||||
Poll::Pending
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Exit Reasons
|
||||
|
||||
The main loop exits for three reasons:
|
||||
|
||||
| Reason | Action |
|
||||
|--------|--------|
|
||||
| `Disconnected(Option<io::Error>)` | Attempt reconnection via `handle_disconnect()` |
|
||||
| `ReconnectRequested` | Force reconnect (user-triggered) |
|
||||
| `Closed` | Connection handler terminates, emit `Event::Closed` |
|
||||
|
||||
On disconnection, `handle_disconnect()` is called, which:
|
||||
1. Resets `pending_pings` to 0
|
||||
2. Emits `Event::Disconnected`
|
||||
3. Updates connection state to `Disconnected`
|
||||
4. Calls `handle_reconnect()` which uses `Connector::connect()`
|
||||
5. On successful reconnect, re-subscribes all active subscriptions
|
||||
6. Re-subscribes the multiplexer wildcard if present
|
||||
|
||||
## Slow Consumer Handling
|
||||
|
||||
When a subscription's `mpsc::Sender` channel is full (the application isn't consuming messages fast enough):
|
||||
|
||||
1. `try_send` returns `TrySendError::Full`
|
||||
2. The `ConnectionHandler` emits `Event::SlowConsumer(sid)`
|
||||
3. The message is **dropped** (not queued)
|
||||
4. The subscription remains active
|
||||
|
||||
When a subscription's receiver is dropped (application closed the stream):
|
||||
|
||||
1. `try_send` returns `TrySendError::Closed`
|
||||
2. The subscription is removed from the HashMap
|
||||
3. An `UNSUB` command is sent to the server
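
The two cases above can be sketched as a single dispatch function. This is an illustrative sketch only: the `Delivery` enum and `deliver` function are hypothetical names, and `std::sync::mpsc::sync_channel` stands in for the Tokio channel, so the `Disconnected` variant plays the role of Tokio's `Closed`.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Outcome of one delivery attempt, mirroring the two lists above.
#[derive(Debug, PartialEq)]
enum Delivery {
    Delivered,
    SlowConsumer, // channel full: message dropped, subscription kept
    Unsubscribe,  // receiver gone: subscription should be removed
}

/// Hypothetical dispatch step for a single incoming message.
fn deliver(sender: &SyncSender<String>, message: String) -> Delivery {
    match sender.try_send(message) {
        Ok(()) => Delivery::Delivered,
        // `try_send` never blocks; the rejected message is handed back
        // and simply dropped here.
        Err(TrySendError::Full(_dropped)) => Delivery::SlowConsumer,
        Err(TrySendError::Disconnected(_)) => Delivery::Unsubscribe,
    }
}
```

Using a non-blocking send keeps one slow subscriber from stalling the loop that serves every other subscription on the connection.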
|
||||
|
||||
## Ping/Pong Health Check
|
||||
|
||||
The `ConnectionHandler` maintains a periodic PING interval (default 60 seconds):
|
||||
|
||||
1. `ping_interval` fires every N seconds
|
||||
2. A `ClientOp::Ping` is enqueued
|
||||
3. `pending_pings` counter increments
|
||||
4. If `pending_pings > MAX_PENDING_PINGS (2)`, the connection is considered dead
|
||||
5. When `ServerOp::Pong` is received, `pending_pings` decrements
|
||||
6. Any server operation resets the ping interval timer
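
The counter logic in the steps above can be sketched as a small state machine. `PingState`, `on_tick`, and `on_pong` are hypothetical names for illustration; only `pending_pings` and `MAX_PENDING_PINGS` come from the text, and the timer reset in step 6 is left out.

```rust
const MAX_PENDING_PINGS: usize = 2;

/// Hypothetical tracker for unanswered PINGs.
#[derive(Default)]
struct PingState {
    pending_pings: usize,
}

impl PingState {
    /// Called when the ping interval fires. Returns `true` if the
    /// connection still looks healthy (a PING should be sent), or
    /// `false` if too many PINGs are unanswered.
    fn on_tick(&mut self) -> bool {
        self.pending_pings += 1;
        self.pending_pings <= MAX_PENDING_PINGS
    }

    /// Called when a PONG arrives from the server.
    fn on_pong(&mut self) {
        // saturating_sub guards against an unsolicited PONG underflowing.
        self.pending_pings = self.pending_pings.saturating_sub(1);
    }
}
```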
|
||||
|
||||
## Batched Command Processing
|
||||
|
||||
Commands from the `Client` are received in batches of up to 16 (`RECV_CHUNK_SIZE`) using `poll_recv_many`. This amortizes the cost of waking the task and enables pipelining multiple operations (e.g., publishing many messages) in a single poll cycle.
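
A rough, synchronous analogue of this batching is shown below. It is a sketch under stated assumptions: `recv_batch` is a hypothetical helper, and `std::sync::mpsc` with `try_recv` stands in for the Tokio channel and `poll_recv_many`.

```rust
use std::sync::mpsc::Receiver;

const RECV_CHUNK_SIZE: usize = 16;

/// Hypothetical helper: move up to RECV_CHUNK_SIZE queued commands into
/// `buf` without blocking, and report how many were taken.
fn recv_batch<T>(rx: &Receiver<T>, buf: &mut Vec<T>) -> usize {
    let start = buf.len();
    while buf.len() - start < RECV_CHUNK_SIZE {
        match rx.try_recv() {
            Ok(cmd) => buf.push(cmd),
            // Empty or disconnected: stop and let the caller process
            // whatever has been collected so far.
            Err(_) => break,
        }
    }
    buf.len() - start
}
```

Capping the batch size bounds how long command handling can run before the loop gets back to writing and flushing.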
|
||||
@@ -1,277 +0,0 @@
|
||||
# Connection and Reconnection
|
||||
|
||||
This document covers how connections are established, TLS handling, the server pool, and the reconnection mechanism.
|
||||
|
||||
## Connector
|
||||
|
||||
**Location**: `connector.rs`
|
||||
|
||||
The `Connector` manages the server pool and handles connection establishment and reconnection.
|
||||
|
||||
```rust
|
||||
pub(crate) struct Connector {
|
||||
servers: Vec<Server>, // Server pool with per-server metadata
|
||||
options: ConnectorOptions, // Connection configuration
|
||||
connect_stats: Arc<Statistics>, // Shared statistics
|
||||
attempts: usize, // Global reconnection attempt counter
|
||||
events_tx: mpsc::Sender<Event>, // Event channel
|
||||
state_tx: watch::Sender<State>, // Connection state watcher
|
||||
max_payload: Arc<AtomicUsize>, // Server's max payload
|
||||
last_info: ServerInfo, // Last known server info
|
||||
}
|
||||
```
|
||||
|
||||
### Server Pool
|
||||
|
||||
Each server in the pool carries metadata:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Server {
|
||||
pub addr: ServerAddr,
|
||||
pub failed_attempts: usize, // Consecutive failed attempts
|
||||
pub did_connect: bool, // Ever successfully connected?
|
||||
pub is_discovered: bool, // Discovered via INFO, not user-configured
|
||||
pub last_error: Option<String>, // Last connection error
|
||||
}
|
||||
```
|
||||
|
||||
### ConnectorOptions
|
||||
|
||||
```rust
|
||||
pub(crate) struct ConnectorOptions {
|
||||
pub tls_required: bool,
|
||||
pub certificates: Vec<PathBuf>,
|
||||
pub client_cert: Option<PathBuf>,
|
||||
pub client_key: Option<PathBuf>,
|
||||
pub tls_client_config: Option<rustls::ClientConfig>,
|
||||
pub tls_first: bool,
|
||||
pub auth: Auth,
|
||||
pub no_echo: bool,
|
||||
pub connection_timeout: Duration, // Default: 5 seconds
|
||||
pub name: Option<String>,
|
||||
pub ignore_discovered_servers: bool,
|
||||
pub retain_servers_order: bool,
|
||||
pub read_buffer_capacity: u16, // Default: 65535
|
||||
pub reconnect_delay_callback: Arc<dyn Fn(usize) -> Duration>,
|
||||
pub auth_callback: Option<CallbackArg1<Vec<u8>, Result<Auth, AuthError>>>,
|
||||
pub max_reconnects: Option<usize>,
|
||||
pub local_address: Option<SocketAddr>,
|
||||
pub reconnect_to_server_callback: Option<ReconnectToServerCallback>,
|
||||
}
|
||||
```
|
||||
|
||||
## Connection Establishment Flow
|
||||
|
||||
```
|
||||
Connector::try_connect_to_server(addr)
|
||||
│
|
||||
├── 1. DNS resolution
|
||||
│ server_addr.socket_addrs()
|
||||
│
|
||||
├── 2. For each resolved address:
|
||||
│ │
|
||||
│ ├── 2a. Connect with timeout
|
||||
│ │ tokio::time::timeout(connection_timeout, try_connect_to(socket_addr, ...))
|
||||
│ │
|
||||
│ └── 2b. try_connect_to():
|
||||
│ │
|
||||
│ ├── Select transport:
|
||||
│ │ ├── "ws" → WebSocket (tokio_websockets)
|
||||
│ │ ├── "wss" → WebSocket over TLS
|
||||
│ │ └── default → TCP (TcpStream)
|
||||
│ │
|
||||
│ ├── Optional: bind to local_address
|
||||
│ ├── Set TCP_NODELAY
|
||||
│ ├── Create Connection with read_buffer_capacity
|
||||
│ │
|
||||
│ ├── If tls_first: upgrade to TLS before INFO
|
||||
│ │
|
||||
│ ├── Read INFO from server
|
||||
│ │
|
||||
│ ├── If TLS required (by option, server, or URL scheme):
|
||||
│ │ upgrade to TLS (rustls)
|
||||
│ │
|
||||
│ ├── Discover servers from INFO.connect_urls
|
||||
│ │ (unless ignore_discovered_servers)
|
||||
│ │
|
||||
│ ├── Build ConnectInfo with auth:
|
||||
│ │ ├── username/password (from Auth or URL)
|
||||
│ │ ├── token (from Auth)
|
||||
│ │ ├── nkey + signed nonce (feature: nkeys)
|
||||
│ │ ├── JWT + signature callback (feature: nkeys)
|
||||
│ │ └── auth_callback (custom async callback)
|
||||
│ │
|
||||
│ ├── Send CONNECT + PING
|
||||
│ │
|
||||
│ └── Wait for response:
|
||||
│ ├── -ERR (authorization violation) → error
|
||||
│ ├── PONG or +OK → success
|
||||
│ └── EOF → error
|
||||
│
|
||||
└── 3. On success:
|
||||
├── Reset attempt counter
|
||||
├── Increment connects statistic
|
||||
├── Emit Event::Connected
|
||||
├── Update State::Connected
|
||||
├── Store max_payload
|
||||
├── Update per-server metadata (did_connect, failed_attempts)
|
||||
└── Return (ServerInfo, Connection)
|
||||
```
|
||||
|
||||
## TLS Handling
|
||||
|
||||
The client supports three TLS modes:
|
||||
|
||||
### 1. Standard TLS (INFO → TLS)
|
||||
Default behavior. The client receives the `INFO` message in plaintext, then upgrades to TLS if:
|
||||
- `tls_required` option is set
|
||||
- Server's `INFO.tls_required` is true
|
||||
- URL scheme is `tls://`
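
The three conditions combine as a simple disjunction. The function below is a hypothetical sketch with assumed parameter names, not the library's code:

```rust
/// Hypothetical check for the standard (INFO first) mode: upgrade to TLS
/// after INFO if any one of the three conditions holds.
fn needs_tls_upgrade(option_tls_required: bool, info_tls_required: bool, scheme: &str) -> bool {
    option_tls_required || info_tls_required || scheme == "tls"
}
```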
|
||||
|
||||
### 2. TLS First (TLS → INFO)
|
||||
When `ConnectOptions::tls_first()` is enabled, the client establishes TLS before reading INFO. This requires the server to have `handshake_first` enabled. Useful for environments where plaintext INFO is not acceptable.
|
||||
|
||||
### 3. WebSocket TLS
|
||||
For `wss://` URLs, TLS is handled by the WebSocket library (`tokio-websockets`) directly, not by the client's TLS layer.
|
||||
|
||||
### TLS Configuration
|
||||
The client uses `rustls` via `tokio-rustls`. Configuration steps:
|
||||
1. Load root certificates from system store (`rustls-native-certs`)
|
||||
2. Optionally add custom root certificates from PEM files
|
||||
3. Optionally configure client certificate and key for mTLS
|
||||
4. Optionally pass a custom `rustls::ClientConfig`
|
||||
|
||||
Crypto backend is selectable via feature flags:
|
||||
- `ring` (default)
|
||||
- `aws-lc-rs`
|
||||
- `fips` (requires aws-lc-rs)
|
||||
|
||||
## Reconnection
|
||||
|
||||
### Reconnection Trigger
|
||||
|
||||
Reconnection is triggered when:
|
||||
1. I/O error during read or write (`ExitReason::Disconnected`)
|
||||
2. Too many pending PINGs (no PONG received)
|
||||
3. User calls `Client::force_reconnect()` (`ExitReason::ReconnectRequested`)
|
||||
|
||||
### Reconnection Flow
|
||||
|
||||
```
|
||||
ConnectionHandler::handle_disconnect()
|
||||
│
|
||||
├── Reset pending_pings to 0
|
||||
├── Emit Event::Disconnected
|
||||
├── Update State::Disconnected
|
||||
│
|
||||
└── handle_reconnect()
|
||||
│
|
||||
└── Connector::connect()
|
||||
│
|
||||
└── Loop: try_connect()
|
||||
│
|
||||
├── If reconnect_to_server_callback is set:
|
||||
│ │ Call callback with (server_pool, server_info)
|
||||
│ │ If returns Some(ReconnectToServer):
|
||||
│ │ Validate server is in pool
|
||||
│ │ Use callback's delay or default backoff
|
||||
│ │ Try connecting to selected server
|
||||
│ └── If None or invalid: fall through to default
|
||||
│
|
||||
├── Default selection:
|
||||
│ ├── Shuffle servers (unless retain_servers_order)
|
||||
│ ├── Sort by failed_attempts (ascending)
|
||||
│ └── Try each server in order
|
||||
│
|
||||
├── For each server:
|
||||
│ ├── Increment attempts counter
|
||||
│ ├── Check max_reconnects limit
|
||||
│ ├── Apply reconnect delay (exponential backoff)
|
||||
│ └── try_connect_to_server(addr)
|
||||
│
|
||||
├── On success:
|
||||
│ ├── Reset attempts to 0
|
||||
│ ├── Re-subscribe all active subscriptions
|
||||
│ │ (filter out closed subscription channels)
|
||||
│ ├── Re-subscribe multiplexer wildcard
|
||||
│ └── Return (ServerInfo, Connection)
|
||||
│
|
||||
└── On failure:
|
||||
├── Update per-server metadata (failed_attempts, last_error)
|
||||
├── Auth errors → propagate immediately
|
||||
└── Other errors → continue to next server
|
||||
```
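
The default selection order can be sketched as follows. The `Server` struct here is a reduced, hypothetical version with only the fields needed, and the shuffle step is omitted because it would need a random number source; only the sort by `failed_attempts` is shown.

```rust
/// Reduced, hypothetical stand-in for the pool entry described above.
#[derive(Debug, Clone)]
struct Server {
    addr: String,
    failed_attempts: usize,
}

/// Order candidates so servers with fewer consecutive failures come first.
/// `sort_by_key` is stable, so servers with equal failure counts keep
/// their incoming (shuffled or user-configured) order.
fn order_candidates(mut servers: Vec<Server>) -> Vec<Server> {
    servers.sort_by_key(|s| s.failed_attempts);
    servers
}
```

A stable sort matters here: it lets the earlier shuffle decide among equally healthy servers instead of being undone by the sort.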
|
||||
|
||||
### Exponential Backoff
|
||||
|
||||
Default reconnect delay function:
|
||||
|
||||
```rust
|
||||
use std::{cmp, time::Duration};
|
||||
|
||||
fn reconnect_delay_callback_default(attempts: usize) -> Duration {
|
||||
if attempts <= 1 {
|
||||
Duration::from_millis(0)
|
||||
} else {
|
||||
let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
|
||||
let max = Duration::from_secs(4);
|
||||
cmp::min(Duration::from_millis(2_u64.saturating_pow(exp)), max)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Attempt | Delay |
|
||||
|---------|-------|
|
||||
| 1 | 0ms |
|
||||
| 2 | 2ms |
|
||||
| 3 | 4ms |
|
||||
| 4 | 8ms |
|
||||
| 5 | 16ms |
|
||||
| ... | ... |
|
||||
| 12 | 2048ms |
|
||||
| 13+ | 4000ms (capped) |
|
||||
|
||||
Custom delay functions can be provided via `ConnectOptions::reconnect_delay_callback()`.
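
As an illustration of what such a custom function might look like, here is a linear policy with a cap. The name `linear_reconnect_delay` and the 250 ms / 5 s constants are arbitrary assumptions chosen for the example; the only requirement taken from the text is the `usize -> Duration` shape.

```rust
use std::time::Duration;

/// Hypothetical alternative policy: 250 ms per attempt, capped at 5 s.
fn linear_reconnect_delay(attempts: usize) -> Duration {
    // saturating_mul avoids overflow for absurdly large attempt counts.
    let millis = (attempts as u64).saturating_mul(250);
    Duration::from_millis(millis.min(5_000))
}
```

A linear policy spreads early retries further apart than the default exponential one, which may suit servers that take a few seconds to restart.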
|
||||
|
||||
### Server Pool Updates
|
||||
|
||||
The server pool is dynamic:
|
||||
|
||||
1. **Initial pool**: from `connect()` / `ConnectOptions::connect()` URL(s)
|
||||
2. **Discovered servers**: added from `INFO.connect_urls` on each connection (unless `ignore_discovered_servers` is set)
|
||||
3. **Runtime updates**: via `Client::set_server_pool()` — replaces the entire pool while preserving per-server state for servers that appear in both old and new pools
|
||||
4. **Order**: servers are shuffled by default (random selection), unless `retain_servers_order` is set
|
||||
|
||||
### Max Reconnects
|
||||
|
||||
The `max_reconnects` option limits total reconnection attempts:
|
||||
- `None` or `0` → unlimited (default)
|
||||
- `Some(n)` → give up after `n` total attempts
|
||||
- Counter is reset on successful connection and when `set_server_pool()` is called
|
||||
|
||||
## ConnectOptions Defaults
|
||||
|
||||
| Option | Default |
|
||||
|--------|---------|
|
||||
| `connection_timeout` | 5 seconds |
|
||||
| `ping_interval` | 60 seconds |
|
||||
| `sender_capacity` | 2048 |
|
||||
| `subscription_capacity` | 65536 |
|
||||
| `inbox_prefix` | `"_INBOX"` |
|
||||
| `request_timeout` | 10 seconds |
|
||||
| `retry_on_initial_connect` | false |
|
||||
| `ignore_discovered_servers` | false |
|
||||
| `retain_servers_order` | false |
|
||||
| `read_buffer_capacity` | 65535 |
|
||||
| `skip_subject_validation` | false |
|
||||
| `no_echo` | false |
|
||||
| `tls_required` | false |
|
||||
| `tls_first` | false |
|
||||
| `max_reconnects` | None (unlimited) |
|
||||
|
||||
## Background Connection
|
||||
|
||||
When `ConnectOptions::retry_on_initial_connect()` is enabled, the `connect()` function returns a `Client` immediately, before the connection is established. The connection is established in a background Tokio task. This means:
|
||||
- `client.server_info()` returns `ServerInfo::default()` until connected
|
||||
- `client.connection_state()` returns `State::Pending`
|
||||
- Operations like `publish()` will queue in the command channel
|
||||
- The `Client` becomes usable once the background task connects
|
||||
@@ -1,472 +0,0 @@
|
||||
# JetStream Internals
|
||||
|
||||
This document covers the JetStream subsystem — how it provides stream-based messaging with persistence, consumer management, and higher-level APIs like KV and Object Store.
|
||||
|
||||
## JetStream Context
|
||||
|
||||
**Location**: `jetstream/context.rs`
|
||||
|
||||
The `Context` is the entry point to the JetStream API. It wraps a `Client` and provides stream management, publishing, and consumer operations.
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Context {
|
||||
pub(crate) client: Client,
|
||||
pub(crate) prefix: String, // API subject prefix (default: "$JS.API")
|
||||
pub(crate) timeout: Duration, // Default request timeout
|
||||
pub(crate) max_ack_semaphore: Arc<Semaphore>, // Limits in-flight ack waits
|
||||
pub(crate) ack_sender: mpsc::Sender<(oneshot::Receiver<Message>, OwnedSemaphorePermit)>,
|
||||
pub(crate) backpressure_on_inflight: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### Context Creation
|
||||
|
||||
```rust
|
||||
// Default context (prefix = "$JS.API")
|
||||
let jetstream = async_nats::jetstream::new(client);
|
||||
|
||||
// With domain (prefix = "$JS.hub.API")
|
||||
let jetstream = async_nats::jetstream::with_domain(client, "hub");
|
||||
|
||||
// With custom prefix
|
||||
let jetstream = async_nats::jetstream::with_prefix(client, "JS.acc@hub.API");
|
||||
|
||||
// Builder pattern for more options
|
||||
let jetstream = async_nats::jetstream::Context::builder(client)
|
||||
.domain("hub")
|
||||
.prefix("$JS.API")
|
||||
.timeout(Duration::from_secs(30))
|
||||
.max_ack_pending(256)
|
||||
.backpressure_on_inflight(true)
|
||||
.build();
|
||||
```
|
||||
|
||||
### JetStream API Subject Convention
|
||||
|
||||
All JetStream API calls are request-response messages sent to subjects following the pattern:
|
||||
|
||||
```
|
||||
$JS.API.<operation>.<stream-name>[.<consumer-name>]
|
||||
```
|
||||
|
||||
Examples:
|
||||
- `$JS.API.STREAM.CREATE.events` — create stream "events"
|
||||
- `$JS.API.STREAM.INFO.events` — get stream info
|
||||
- `$JS.API.CONSUMER.DURABLE.CREATE.events.myconsumer` — create durable consumer
|
||||
- `$JS.API.CONSUMER.MSG.NEXT.events.myconsumer` — pull next message
|
||||
|
||||
With a domain, the prefix changes to `$JS.<domain>.API`.
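
The prefix rule can be sketched with two small helpers. `api_prefix` and `api_subject` are hypothetical names used for illustration, not functions exported by the crate.

```rust
/// Hypothetical helper: compute the API prefix for an optional domain.
fn api_prefix(domain: Option<&str>) -> String {
    match domain {
        Some(d) => format!("$JS.{d}.API"),
        None => "$JS.API".to_string(),
    }
}

/// Hypothetical helper: join the prefix and an operation such as
/// "STREAM.INFO.events" into the full request subject.
fn api_subject(prefix: &str, operation: &str) -> String {
    format!("{prefix}.{operation}")
}
```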
|
||||
|
||||
## Stream Management
|
||||
|
||||
**Location**: `jetstream/stream.rs`
|
||||
|
||||
### Stream Config
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
|
||||
pub struct Config {
|
||||
pub name: String,
|
||||
pub subjects: Vec<String>, // Subject filter
|
||||
pub retention: RetentionPolicy, // Limits, Interest, WorkQueue
|
||||
pub max_consumers: i32,
|
||||
pub max_messages: i64, // Per-stream message limit
|
||||
pub max_messages_per_subject: i64,
|
||||
pub max_bytes: i64, // Per-stream byte limit
|
||||
pub max_age: Duration, // Message TTL
|
||||
pub max_message_size: Option<i32>, // Max individual message size
|
||||
pub storage: StorageType, // File or Memory
|
||||
pub num_replicas: usize,
|
||||
pub no_ack: bool, // Don't require ack
|
||||
pub discard: DiscardPolicy, // Old or New
|
||||
pub duplicate_window: Duration,
|
||||
pub allow_rollup_hdrs: bool,
|
||||
pub allow_direct: bool,
|
||||
pub mirror: Option<External>,
|
||||
pub sources: Vec<External>,
|
||||
pub sealed: bool,
|
||||
pub compression: Option<Compression>, // server_2_10+
|
||||
pub first_sequence: Option<u64>, // server_2_11+
|
||||
pub subject_transform: Option<SubjectTransform>, // server_2_12+
|
||||
pub metadata: Option<HashMap<String, String>>, // server_2_10+
|
||||
pub placement: Option<Placement>,
|
||||
pub republish: Option<RePublish>,
|
||||
}
|
||||
```
|
||||
|
||||
### Stream Operations
|
||||
|
||||
Via `Context`:
|
||||
|
||||
| Method | API Subject | Description |
|
||||
|--------|------------|-------------|
|
||||
| `create_stream(config)` | `STREAM.CREATE.<name>` | Create a new stream |
|
||||
| `get_stream(name)` | `STREAM.INFO.<name>` | Get existing stream |
|
||||
| `get_or_create_stream(config)` | `STREAM.INFO` → `STREAM.CREATE` | Get or create |
|
||||
| `delete_stream(name)` | `STREAM.DELETE.<name>` | Delete a stream |
|
||||
| `update_stream(name, config)` | `STREAM.UPDATE.<name>` | Update stream config |
|
||||
| `purge_stream(name)` | `STREAM.PURGE.<name>` | Purge all messages |
|
||||
| `streams()` | `STREAM.LIST` | List all streams (paged iterator) |
|
||||
| `stream_names()` | `STREAM.NAMES` | List stream names (paged iterator) |
|
||||
| `account_info()` | `ACCOUNT.INFO` | Get account info |
|
||||
|
||||
Via `Stream`:
|
||||
|
||||
| Method | API Subject | Description |
|
||||
|--------|------------|-------------|
|
||||
| `info()` | `STREAM.INFO.<name>` | Refresh stream info |
|
||||
| `purge()` | `STREAM.PURGE.<name>` | Purge messages |
|
||||
| `delete()` | `STREAM.DELETE.<name>` | Delete this stream |
|
||||
| `update(config)` | `STREAM.UPDATE.<name>` | Update config |
|
||||
| `get_raw_message(seq)` | `STREAM.MSG.GET.<name>` | Get message by sequence (stored mode) |
|
||||
| `get_last_message(subject)` | `STREAM.MSG.GET.<name>` | Get last message for subject (stored mode) |
|
||||
| `direct_get_last(subject)` | `DIRECT.GET.<name>` | Direct get last (bypasses RAA) |
|
||||
| `direct_get(seq)` | `DIRECT.GET.<name>` | Direct get by sequence |
|
||||
| `delete_message(seq)` | `STREAM.MSG.DELETE.<name>` | Delete a specific message |
|
||||
| `create_consumer(config)` | `CONSUMER.CREATE.<stream>` | Create consumer |
|
||||
| `get_or_create_consumer(name, config)` | `CONSUMER.DURABLE.CREATE.<stream>.<name>` | Get or create durable |
|
||||
| `get_consumer(name)` | `CONSUMER.INFO.<stream>.<name>` | Get existing consumer |
|
||||
|
||||
### Stream Info
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize, Debug, Clone)]
|
||||
pub struct Info {
|
||||
pub config: Config,
|
||||
pub created: DateTime,
|
||||
pub state: State, // Messages, bytes, first/last sequence, consumer count
|
||||
pub cluster: Option<ClusterInfo>,
|
||||
pub timestamp: DateTime,
|
||||
pub leader: Option<String>,
|
||||
pub subjects: Option<HashMap<String, u64>>, // Subject → message count
|
||||
}
|
||||
```
|
||||
|
||||
### Paged List Operations
|
||||
|
||||
Stream and consumer listing uses a paged iterator pattern:
|
||||
|
||||
```rust
|
||||
// streams() returns an iterator that automatically pages
|
||||
let mut streams = jetstream.streams();
|
||||
while let Some(stream) = streams.next().await {
|
||||
let stream = stream?;
|
||||
// process stream
|
||||
}
|
||||
|
||||
// stream_names() similarly pages
|
||||
let mut names = jetstream.stream_names();
|
||||
while let Some(name) = names.next().await {
|
||||
println!("{}", name?);
|
||||
}
|
||||
```
|
||||
|
||||
The paged iterator sends an initial request with `offset: 0` and continues fetching pages until no more results are returned.
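
The offset-based paging loop can be sketched synchronously. Everything here is an assumption made for illustration: `fetch_page` is a hypothetical stand-in for one list request, backed by an in-memory slice instead of a server.

```rust
/// Hypothetical stand-in for one paged request: return up to `page_size`
/// names starting at `offset`.
fn fetch_page(all: &[&str], offset: usize, page_size: usize) -> Vec<String> {
    all.iter()
        .skip(offset)
        .take(page_size)
        .map(|s| s.to_string())
        .collect()
}

/// Collect every name by requesting pages until one comes back empty.
fn list_all(all: &[&str], page_size: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(all, offset, page_size);
        if page.is_empty() {
            break;
        }
        // Advance by what was actually returned, not by the page size.
        offset += page.len();
        out.extend(page);
    }
    out
}
```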
|
||||
|
||||
## Publishing
|
||||
|
||||
**Location**: `jetstream/context.rs`, `jetstream/publish.rs`
|
||||
|
||||
### Publish
|
||||
|
||||
```rust
|
||||
// Basic publish (fire-and-forget)
|
||||
jetstream.publish("events.data", "payload".into()).await?;
|
||||
|
||||
// Publish with custom message builder
|
||||
jetstream.publish_message(
|
||||
jetstream::message::PublishMessage::build()
|
||||
.payload("data".into())
|
||||
.message_id("unique-id") // Nats-Msg-Id header for dedup
|
||||
.expected_last_message_id("prev") // Nats-Expected-Last-Msg-Id
|
||||
.expected_last_sequence(42) // Nats-Expected-Last-Sequence
|
||||
.expected_last_subject_sequence("events", 10) // Per-subject sequence
|
||||
.header("Custom", "Value")
|
||||
).await?;
|
||||
```
|
||||
|
||||
### PublishAck
|
||||
|
||||
When a message is published to a JetStream stream, the server responds with a `PublishAck`:
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
|
||||
pub struct PublishAck {
|
||||
pub stream: String,
|
||||
pub sequence: u64,
|
||||
pub domain: Option<String>,
|
||||
pub duplicate: bool,
|
||||
}
|
||||
```
|
||||
|
||||
### PublishAckFuture
|
||||
|
||||
Publishing returns a `PublishAckFuture` that resolves to a `PublishAck`. A semaphore (`max_ack_semaphore`) caps the number of acknowledgments being awaited at once, so outstanding publishes cannot grow without bound.
|
||||
|
||||
When `backpressure_on_inflight` is enabled, the publish operation blocks if there are too many pending acks, preventing the command channel from filling up with unbounded publish operations.
|
||||
|
||||
### Idempotent Publishing
|
||||
|
||||
Headers used for deduplication and conditional publishing, which together approximate exactly-once semantics:
|
||||
|
||||
| Header | Purpose |
|
||||
|--------|---------|
|
||||
| `Nats-Msg-Id` | Message ID for deduplication within the stream's duplicate window |
|
||||
| `Nats-Expected-Last-Msg-Id` | Expected last message ID (conditional publish) |
|
||||
| `Nats-Expected-Last-Sequence` | Expected last sequence number |
|
||||
| `Nats-Expected-Last-Subject-Sequence` | Expected last sequence for a specific subject |
|
||||
|
||||
## Consumers
|
||||
|
||||
**Location**: `jetstream/consumer/`
|
||||
|
||||
### Consumer Types
|
||||
|
||||
| Type | Description |
|
||||
|------|-------------|
|
||||
| `PullConsumer` | Client pulls messages on demand |
|
||||
| `PushConsumer` | Server pushes messages to a delivery subject |
|
||||
| `OrderedConsumer` | Push consumer with automatic re-creation on failure |
|
||||
|
||||
### Consumer Config
|
||||
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
|
||||
pub struct Config {
|
||||
pub name: Option<String>,
|
||||
pub durable_name: Option<String>,
|
||||
pub description: Option<String>,
|
||||
pub deliver_subject: Option<String>, // Push consumers only
|
||||
pub ack_policy: AckPolicy,
|
||||
pub ack_wait: Duration,
|
||||
pub max_deliver: i64,
|
||||
pub max_ack_pending: i32,
|
||||
pub max_waiting: i32, // Pull consumers only
|
||||
pub filter_subject: Option<String>,
|
||||
pub replay_policy: ReplayPolicy,
|
||||
pub sample_frequency: Option<i8>,
|
||||
pub max_batch: i32, // Pull consumers
|
||||
pub max_expires: Duration, // Pull consumers
|
||||
pub inactive_threshold: Duration,
|
||||
pub flow_control: bool, // Push consumers
|
||||
pub heartbeat: Option<Duration>, // Push consumers
|
||||
pub backoff: Vec<Duration>,
|
||||
pub deliver_group: Option<String>,
|
||||
pub num_replicas: usize,
|
||||
pub mem_storage: bool,
|
||||
pub metadata: Option<HashMap<String, String>>,
|
||||
pub ack_markers: Option<Vec<String>>, // server_2_12+
|
||||
}
|
||||
```
|
||||
|
||||
### Pull Consumer
|
||||
|
||||
**Location**: `jetstream/consumer/pull.rs`
|
||||
|
||||
Pull consumers require explicit requests for messages:
|
||||
|
||||
```rust
|
||||
// Batch request
|
||||
let mut messages = consumer.messages().await?.take(100);
|
||||
while let Some(message) = messages.next().await {
|
||||
let message = message?;
|
||||
message.ack().await?;
|
||||
}
|
||||
|
||||
// Sequence-based batch
|
||||
let mut batches = consumer.sequence(50)?.take(10);
|
||||
while let Some(mut batch) = batches.try_next().await? {
|
||||
while let Some(Ok(message)) = batch.next().await {
|
||||
message.ack().await?;
|
||||
}
|
||||
}
|
||||
|
||||
// Single message fetch
|
||||
let message = consumer.fetch().await?;
|
||||
```
|
||||
|
||||
Pull requests are sent to: `$JS.API.CONSUMER.MSG.NEXT.<stream>.<consumer>`
|
||||
|
||||
The request payload is JSON:
|
||||
```json
|
||||
{"batch": 10, "expires": 5000, "no_wait": false}
|
||||
```
|
||||
|
||||
### Push Consumer
|
||||
|
||||
**Location**: `jetstream/consumer/push.rs`
|
||||
|
||||
Push consumers receive messages automatically on a delivery subject. The client subscribes to the delivery subject and processes messages as they arrive.
|
||||
|
||||
Features:
|
||||
- **Flow control** — server sends flow control messages, client responds to maintain delivery rate
|
||||
- **Heartbeats** — idle heartbeats (status code 100) when no messages are available
|
||||
- **Ordered consumers** — automatically recreated on delivery failures with correct sequence positioning
|
||||
|
||||
### Acknowledgment
|
||||
|
||||
**Location**: `jetstream/message.rs`
|
||||
|
||||
JetStream messages support multiple acknowledgment types:
|
||||
|
||||
```rust
|
||||
pub enum AckKind {
|
||||
Ack, // Ack (message processed)
|
||||
Nack, // Nak (re-deliver)
|
||||
Progress, // Progress (still working)
|
||||
Next, // Next (ack + pull next)
|
||||
Term, // Term (don't redeliver, remove from stream)
|
||||
All, // Ack all messages up to this sequence
|
||||
}
|
||||
```
|
||||
|
||||
Methods on JetStream `Message`:
|
||||
- `ack()` — simple acknowledgment
|
||||
- `ack_with(kind)` — acknowledgment with specific type
|
||||
- `double_ack()` — sends the ack and waits for the server to confirm it received it
|
||||
- `nack()` — negative acknowledgment (request redelivery)
|
||||
- `in_progress()` — progress indicator
|
||||
- `term()` — terminate message (no redelivery)
|
||||
|
||||
## JetStream Message
|
||||
|
||||
**Location**: `jetstream/message.rs`
|
||||
|
||||
JetStream messages wrap core `Message` with metadata extracted from headers:
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Message {
|
||||
pub message: crate::Message, // The underlying NATS message
|
||||
pub context: Context, // JetStream context for acking
|
||||
pub ack_pending: Arc<AtomicU64>, // Pending ack counter
|
||||
}
|
||||
|
||||
impl Message {
|
||||
pub fn info(&self) -> Result<Info, MessageInfoError> // Parse message info from headers
|
||||
pub async fn ack(&self) -> Result<(), AckError>
|
||||
pub async fn ack_with(&self, kind: AckKind) -> Result<(), AckError>
|
||||
pub async fn double_ack(&self) -> Result<(), AckError>
|
||||
pub async fn nack(&self) -> Result<(), AckError>
|
||||
pub async fn in_progress(&self) -> Result<(), AckError>
|
||||
pub async fn term(&self) -> Result<(), AckError>
|
||||
}
|
||||
```
|
||||
|
||||
Message info is extracted from the `HMSG` headers:
|
||||
- `Nats-Stream` — stream name
|
||||
- `Nats-Consumer` — consumer name
|
||||
- `Nats-Delivered` — delivery count
|
||||
- `Nats-Sequence` — stream sequence
|
||||
- `Nats-Time-Stamp` — timestamp
|
||||
- `Nats-Subject` — original subject
|
||||
- `Nats-Pending-Messages` / `Nats-Pending-Bytes` — pending counts
|
||||
|
||||
## Key-Value Store
|
||||
|
||||
**Location**: `jetstream/kv/`
|
||||
|
||||
The KV store is a JetStream-based key-value API. Each bucket maps to a JetStream stream with specific configuration:
|
||||
|
||||
```rust
|
||||
// Create a KV store
|
||||
let kv = jetstream
|
||||
.create_key_value(async_nats::jetstream::kv::Config {
|
||||
bucket: "my_bucket".to_string(),
|
||||
history: 5, // Max history per key (1-64)
|
||||
ttl: Duration::from_secs(3600), // Key TTL
|
||||
max_bytes: 1024 * 1024, // Max bucket size
|
||||
storage: StorageType::File,
|
||||
replicas: 1,
|
||||
..Default::default()
|
||||
})
|
||||
.await?;
|
||||
```
|
||||
|
||||
Under the hood:
|
||||
- Each key is stored as a message with subject `$KV.<bucket>.<key>`
|
||||
- Keys support wildcard patterns (`$KV.bucket.prefix.*`)
|
||||
- History is managed via stream `max_messages_per_subject`
|
||||
- TTL is managed via stream `max_age`
|
||||
- `put(key, value)` publishes to the key subject
|
||||
- `get(key)` reads the last message for the key subject
|
||||
- `delete(key)` publishes an internal delete marker
|
||||
- `purge(key)` uses stream purge API
|
||||
- `watch()` subscribes to key changes and returns a `Watch` stream
|
||||
- `keys()` / `history(key)` list keys and history
|
||||
|
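The key-to-subject mapping described above can be sketched with plain string handling. Both helpers are hypothetical names used for illustration; they are not the async-nats implementation and perform no key validation.

```rust
/// Subject under which `key` in `bucket` is stored.
fn kv_subject(bucket: &str, key: &str) -> String {
    format!("$KV.{bucket}.{key}")
}

/// Recover the key from a subject, or `None` if the subject
/// does not belong to `bucket`.
fn kv_key<'a>(bucket: &str, subject: &'a str) -> Option<&'a str> {
    subject
        .strip_prefix("$KV.")?
        .strip_prefix(bucket)?
        .strip_prefix('.')
}
```

Because keys may themselves contain dots, `kv_key` returns everything after the bucket token rather than only the last token.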
||||
## Object Store
|
||||
|
||||
**Location**: `jetstream/object_store/`
|
||||
|
||||
The Object Store provides large object storage built on JetStream. Objects are chunked and stored across multiple messages in a stream.
|
||||
|
||||
```rust
|
||||
// Create an object store
|
||||
let store = jetstream
|
||||
.create_object_store(async_nats::jetstream::object_store::Config {
|
||||
bucket: "my_objects".to_string(),
|
||||
..Default::default()
|
||||
})
|
||||
.await?;
|
||||
|
||||
// Put an object
|
||||
let info = store.put("file.txt", stream).await?;
|
||||
|
||||
// Get an object
|
||||
let mut object_stream = store.get("file.txt").await?;
|
||||
```
|
||||
|
||||
Under the hood:
|
||||
- Objects are chunked into ~128KB messages
|
||||
- Metadata (object info) is stored as a separate message on `$O.<bucket>.M.<encoded-object-name>`
|
||||
- Each chunk is a message with subject `$O.<bucket>.C.<object-nuid>`
|
||||
- Metadata includes: name, description, headers, size, chunks, digest (SHA-256)
|
||||
- `get()` returns a stream of chunks
|
||||
- Links allow referencing one object from another (like symlinks)
|
||||
|
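To make the chunking concrete, here is a small sketch that splits a byte buffer into fixed-size pieces. The constant `CHUNK_SIZE` of exactly 128 KiB is an assumption based on the "~128KB" figure above, and both function names are hypothetical.

```rust
/// Assumed chunk size: 128 KiB.
const CHUNK_SIZE: usize = 128 * 1024;

/// Number of chunk messages needed for an object of `object_size` bytes.
fn chunk_count(object_size: usize) -> usize {
    (object_size + CHUNK_SIZE - 1) / CHUNK_SIZE
}

/// Split `data` into consecutive chunks; only the last one may be shorter.
fn split_into_chunks(data: &[u8]) -> Vec<&[u8]> {
    data.chunks(CHUNK_SIZE).collect()
}
```

A 300,000-byte object would produce three chunk messages under this assumption, the last one holding the remaining 37,856 bytes.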
||||
## JetStream Error Codes
|
||||
|
||||
**Location**: `jetstream/errors.rs`
|
||||
|
||||
Standard JetStream error codes returned by the server:
|
||||
|
||||
| Code | Constant | Description |
|
||||
|------|----------|-------------|
|
||||
| 10001 | `NOT_FOUND` | Resource not found |
|
||||
| 10002 | `STREAM_NOT_FOUND` | Stream not found |
|
||||
| 10003 | `CONSUMER_NOT_FOUND` | Consumer not found |
|
||||
| 10004 | `REQUEST_NOT_FOUND` | Request not found |
|
||||
| 10005 | `STREAM_WRONG_LAST_SEQ` | Wrong last sequence |
|
||||
| 10006 | `STREAM_NAME_EXISTS` | Stream already exists |
|
||||
| 10007 | `CONSUMER_NAME_EXISTS` | Consumer already exists |
|
||||
| 10008 | `INSUFFICIENT_RESOURCES` | Insufficient resources |
|
||||
| 10009 | `NO_MESSAGE_FOUND` | No message found |
|
||||
| 10013 | `CONSUMER_EXISTS` | Consumer already exists (duplicate) |
|
||||
| 10014 | `STREAM_NOT_CONFIGURED` | Stream not configured |
|
||||
| 10015 | `CLUSTER_NOT_ACTIVE` | Cluster not active |
|
||||
| 10016 | `CLUSTER_NOT_LEADER` | Not the cluster leader |
|
||||
| 10017 | `CLUSTER_NOT_ENOUGH_PEERS` | Not enough peers |
|
||||
| 10018 | `CLUSTER_INCOMPLETE` | Cluster incomplete |
|
||||
| 10019 | `CONSUMER_DELETED` | Consumer was deleted |
|
||||
| 10020 | `CONSUMER_BAD_ACK` | Bad acknowledgment |
|
||||
| 10021 | `CONSUMER_BAD_SUBJECT` | Bad consumer subject |
|
||||
| 10022 | `CONSUMER_DELETED_DRIFT` | Consumer deleted due to drift |
|
||||
| ... | ... | Additional codes |
|
||||
|
||||
## Account
|
||||
|
||||
**Location**: `jetstream/account.rs`
|
||||
|
||||
The `Account` struct provides information about the JetStream account:
|
||||
|
||||
```rust
|
||||
pub struct Account {
|
||||
pub memory: i64,
|
||||
pub storage: i64,
|
||||
pub streams: i64,
|
||||
pub consumers: i64,
|
||||
pub limits: AccountLimits,
|
||||
}
|
||||
```
|
||||
@@ -1,292 +0,0 @@
|
||||
# Authentication and Security
|
||||
|
||||
This document covers the authentication mechanisms, TLS configuration, and security-related features of the async-nats client.
|
||||
|
||||
## Authentication Methods
|
||||
|
||||
The NATS server supports multiple authentication methods. The client implements all of them.
|
||||
|
||||
### 1. Username/Password
|
||||
|
||||
The simplest authentication method.
|
||||
|
||||
```rust
|
||||
// Via ConnectOptions
|
||||
let client = ConnectOptions::with_user_and_password("user".into(), "pass".into())
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
|
||||
// Via URL
|
||||
let client = connect("nats://user:pass@localhost:4222").await?;
|
||||
```
|
||||
|
||||
These credentials are sent in the `CONNECT` message as `user` and `pass` fields.
|
||||
|
||||
### 2. Token Authentication
|
||||
|
||||
A single token used for authentication.
|
||||
|
||||
```rust
|
||||
let client = ConnectOptions::with_token("my-token".into())
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
The token is sent in the `CONNECT` message as the `auth_token` field.
|
||||
|
||||
### 3. NKey Authentication
|
||||
|
||||
NKey-based authentication using Ed25519 key pairs. Requires the `nkeys` feature.
|
||||
|
||||
```rust
|
||||
let seed = "SUANQDPB2RUOE4ETUA26CNX7FUKE5ZZKFCQIIW63OX225F2CO7UEXTM7ZY";
|
||||
let client = ConnectOptions::with_nkey(seed.into())
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
Flow:
|
||||
1. Server sends `INFO` with a `nonce` field
|
||||
2. Client creates a `KeyPair` from the seed
|
||||
3. Client signs the nonce: `key_pair.sign(nonce.as_bytes())`
|
||||
4. Client sends `CONNECT` with `nkey` (public key) and `sig` (Base64URL-encoded signature)
|
||||
5. Server verifies the signature against the public key and nonce
|
||||
|
||||
### 4. JWT Authentication
|
||||
|
||||
User JWT with a signing callback. Requires the `nkeys` feature.
|
||||
|
||||
```rust
|
||||
let key_pair = Arc::new(nkeys::KeyPair::from_seed(seed)?);
|
||||
let jwt = load_jwt().await?;
|
||||
|
||||
let client = ConnectOptions::with_jwt(jwt, move |nonce| {
|
||||
let key_pair = key_pair.clone();
|
||||
async move { key_pair.sign(&nonce).map_err(AuthError::new) }
|
||||
})
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
Flow:
|
||||
1. Server sends `INFO` with a `nonce` field
|
||||
2. Client sends `CONNECT` with `jwt` (user JWT) and `sig` (Base64URL-encoded nonce signature)
|
||||
3. The signing callback is async, allowing integration with external signing services (e.g., HSM)
|
||||
|
||||
### 5. Credentials File
|
||||
|
||||
Combines JWT and NKey from a `.creds` file. Requires the `nkeys` feature.
|
||||
|
||||
```rust
|
||||
// From file
|
||||
let client = ConnectOptions::with_credentials_file("path/to/my.creds")
|
||||
.await?
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
|
||||
// From string
|
||||
let client = ConnectOptions::with_credentials(&creds_string)?
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
Credentials file format:
|
||||
```
|
||||
-----BEGIN NATS USER JWT-----
|
||||
eyJ0eXAiOiJqd3QiLCJhbGciOiJlZDI1NTE5...
|
||||
------END NATS USER JWT------
|
||||
|
||||
************************* IMPORTANT *************************
|
||||
NKEY Seed printed below can be used to sign and prove identity.
|
||||
|
||||
-----BEGIN USER NKEY SEED-----
|
||||
SUAIO3FHUX5PNV2LQIIP7TZ3N4L7TX3W53MQGEIVYFIGA635OZCKEYHFLM
|
||||
------END USER NKEY SEED------
|
||||
```
|
||||
|
||||
**Location**: `auth_utils.rs` handles parsing:
|
||||
- `load_creds(path)` — async file read + parse
|
||||
- `parse_jwt_and_key_from_creds(creds)` — extracts JWT and KeyPair from the string
|
||||
|
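The credentials format shown above is simple enough to parse with line scanning. The following is a rough sketch of that idea, not the code in `auth_utils.rs`: `section_after` and `parse_creds` are hypothetical names, and the seed is returned as a plain string instead of being turned into a `KeyPair`.

```rust
/// Return the first non-blank line that follows a line containing `begin_marker`.
fn section_after<'a>(creds: &'a str, begin_marker: &str) -> Option<&'a str> {
    let mut lines = creds.lines();
    // Advance the iterator past the BEGIN line.
    lines.find(|line| line.contains(begin_marker))?;
    // The payload is the next non-blank line.
    lines.map(str::trim).find(|line| !line.is_empty())
}

/// Extract `(jwt, seed)` from the contents of a `.creds` file.
fn parse_creds(creds: &str) -> Option<(&str, &str)> {
    let jwt = section_after(creds, "BEGIN NATS USER JWT")?;
    let seed = section_after(creds, "BEGIN USER NKEY SEED")?;
    Some((jwt, seed))
}
```

Matching on the marker text rather than the exact dash count keeps the sketch tolerant of the uneven number of dashes on the `BEGIN` and `END` lines.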
||||
### 6. Auth Callback
|
||||
|
||||
A custom async callback that receives the server nonce and returns an `Auth` struct. This is the most flexible mechanism.
|
||||
|
||||
```rust
|
||||
let client = ConnectOptions::with_auth_callback(move |nonce| {
|
||||
async move {
|
||||
let mut auth = Auth::new();
|
||||
auth.username = Some("user".to_string());
|
||||
auth.password = Some("pass".to_string());
|
||||
// Can also set jwt, nkey, signature, token
|
||||
Ok(auth)
|
||||
}
|
||||
})
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
The callback is invoked on each connection/reconnection, allowing dynamic credential refresh (e.g., refreshing JWTs from an auth server).
|
||||
|
||||
### 7. URL-Embedded Credentials
|
||||
|
||||
```rust
|
||||
// Username and password in URL
|
||||
let client = connect("nats://user:pass@localhost:4222").await?;
|
||||
|
||||
// Token in URL (username field)
|
||||
let client = connect("nats://token@localhost:4222").await?;
|
||||
```
|
||||
|
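How the userinfo part of a URL maps to credentials can be illustrated with a small standalone parser. This is a simplified sketch under stated assumptions: `UrlAuth` and `url_auth` are hypothetical names, no percent-decoding is done, and the real client relies on a proper URL parser.

```rust
#[derive(Debug, PartialEq)]
enum UrlAuth<'a> {
    None,
    /// Userinfo without a colon is treated as a token.
    Token(&'a str),
    UserPass(&'a str, &'a str),
}

fn url_auth(url: &str) -> UrlAuth<'_> {
    // Drop the scheme, if any.
    let rest = url.split_once("://").map_or(url, |(_, rest)| rest);
    // Only the authority part (before the first '/') can carry userinfo.
    let authority = rest.split('/').next().unwrap_or(rest);
    match authority.rsplit_once('@') {
        None => UrlAuth::None,
        Some((userinfo, _host)) => match userinfo.split_once(':') {
            Some((user, pass)) => UrlAuth::UserPass(user, pass),
            None => UrlAuth::Token(userinfo),
        },
    }
}
```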
||||
## Auth Struct
|
||||
|
||||
**Location**: `auth.rs`
|
||||
|
||||
The `Auth` struct is a container for all authentication methods. Multiple fields can be set simultaneously:
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Default)]
|
||||
pub struct Auth {
|
||||
pub jwt: Option<String>,
|
||||
pub nkey: Option<String>,
|
||||
pub signature_callback: Option<CallbackArg1<String, Result<String, AuthError>>>,
|
||||
pub signature: Option<Vec<u8>>,
|
||||
pub username: Option<String>,
|
||||
pub password: Option<String>,
|
||||
pub token: Option<String>,
|
||||
}
|
||||
```
|
||||
|
||||
Priority in `Connector::try_connect_to()`:
|
||||
1. Auth callback overrides all other methods
|
||||
2. NKey authentication (if `auth.nkey` is set)
|
||||
3. JWT authentication (if `auth.jwt` is set)
|
||||
4. Username/password/token from `Auth` struct
|
||||
5. Username/password from URL
|
||||
|
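The priority order above amounts to a first-match chain. The sketch below models it with hypothetical types (`Configured`, `Method`, `select_method`) that do not exist in async-nats; it only illustrates the ordering, not the actual `Connector` logic.

```rust
/// Hypothetical summary of what has been configured on the client.
#[derive(Default)]
struct Configured {
    has_callback: bool,
    nkey: Option<String>,
    jwt: Option<String>,
    auth_user_or_token: bool,
    url_credentials: bool,
}

#[derive(Debug, PartialEq)]
enum Method {
    Callback,
    NKey,
    Jwt,
    AuthFields,
    UrlCredentials,
    Anonymous,
}

/// Pick the first applicable method, following the documented priority.
fn select_method(c: &Configured) -> Method {
    if c.has_callback {
        Method::Callback
    } else if c.nkey.is_some() {
        Method::NKey
    } else if c.jwt.is_some() {
        Method::Jwt
    } else if c.auth_user_or_token {
        Method::AuthFields
    } else if c.url_credentials {
        Method::UrlCredentials
    } else {
        Method::Anonymous
    }
}
```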
||||
## TLS Configuration
|
||||
|
||||
### TLS Modes
|
||||
|
||||
| Mode | When | Description |
|
||||
|------|------|-------------|
|
||||
| None | Default | Plaintext connection |
|
||||
| Standard | `tls_required` or server requires | TLS after INFO |
|
||||
| TLS First | `tls_first` option | TLS before INFO |
|
||||
| WebSocket | `wss://` URL | TLS handled by WebSocket library |
|
||||
|
||||
### TLS Setup
|
||||
|
||||
**Location**: `tls.rs`
|
||||
|
||||
The `config_tls()` function builds a `rustls::ClientConfig`:
|
||||
|
||||
1. Create `RootCertStore` and load native system certificates
|
||||
2. Add custom root certificates from configured PEM files
|
||||
3. Build `ClientConfig` with the chosen crypto provider:
|
||||
- `ring` (default)
|
||||
- `aws-lc-rs`
|
||||
- `fips` (aws-lc-rs in FIPS mode)
|
||||
4. If client certificate + key are configured, add them for mTLS
|
||||
5. If a custom `rustls::ClientConfig` was provided, use it directly
|
||||
|
||||
### TLS First
|
||||
|
||||
```rust
|
||||
let client = ConnectOptions::new()
|
||||
.tls_first()
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
This sets both `tls_first = true` and `tls_required = true`. The client performs TLS handshake before reading the `INFO` message. The server must have `handshake_first: true` in its configuration.
|
||||
|
||||
### Custom TLS Configuration
|
||||
|
||||
```rust
|
||||
let tls_client = rustls::ClientConfig::builder()
|
||||
.with_root_certificates(root_store)
|
||||
.with_no_client_auth();
|
||||
|
||||
let client = ConnectOptions::new()
|
||||
.require_tls(true)
|
||||
.tls_client_config(tls_client)
|
||||
.connect("nats://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
### mTLS (Mutual TLS)
|
||||
|
||||
```rust
|
||||
let client = ConnectOptions::new()
|
||||
.add_root_certificates("ca.pem".into())
|
||||
.add_client_certificate("cert.pem".into(), "key.pem".into())
|
||||
.connect("tls://localhost")
|
||||
.await?;
|
||||
```
|
||||
|
||||
## WebSocket Transport
|
||||
|
||||
Requires the `websockets` feature. Supports `ws://` and `wss://` schemes.
|
||||
|
||||
```rust
|
||||
let client = connect("ws://localhost:8080").await?;
|
||||
let client = connect("wss://localhost:443").await?;
|
||||
```
|
||||
|
||||
Implementation uses `tokio-websockets` with a `WebSocketAdapter` that wraps the WebSocket stream to implement `AsyncRead + AsyncWrite`:
|
||||
|
||||
```rust
|
||||
// WebSocketAdapter bridges WebSocket messages to byte streams
|
||||
pub(crate) struct WebSocketAdapter<T> {
|
||||
pub(crate) inner: WebSocketStream<T>,
|
||||
pub(crate) read_buf: BytesMut, // Buffered incoming WebSocket messages
|
||||
}
|
||||
```
|
||||
|
||||
For `wss://`, TLS is configured within the WebSocket connector, not via the client's TLS layer.
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Nonce Signing
|
||||
|
||||
The server's `nonce` in the `INFO` message prevents replay attacks:
|
||||
- Each connection gets a unique nonce
|
||||
- The nonce must be signed with the client's private key
|
||||
- The signature is verified server-side against the public key
|
||||
|
||||
### Authorization Violations
|
||||
|
||||
When the server sends `-ERR 'authorization violation'`:
|
||||
- The client parses this as `ServerError::AuthorizationViolation`
|
||||
- The `Connector` immediately propagates this error (does not retry)
|
||||
- The error is converted to `ConnectErrorKind::AuthorizationViolation`
|
||||
|
||||
### Subject Validation
|
||||
|
||||
By default, the client validates subjects for protocol safety:
|
||||
- **Publish subjects**: checked for emptiness and whitespace (can be disabled with `skip_subject_validation`)
|
||||
- **Subscribe subjects**: always checked for emptiness, whitespace, leading/trailing dots, consecutive dots
|
||||
- **Queue group names**: checked for emptiness and whitespace
|
||||
|
||||
The server enforces its own validation, but client-side checks prevent protocol-framing errors.
|
||||
|
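The checks listed above can be approximated with a few string predicates. This is a sketch of the rules as described here, with hypothetical function names; the client's own validation may differ in detail.

```rust
/// Publish-side check: non-empty and free of whitespace.
fn is_valid_publish_subject(subject: &str) -> bool {
    !subject.is_empty() && !subject.chars().any(char::is_whitespace)
}

/// Subscribe-side check: additionally rejects empty tokens, i.e.
/// leading dots, trailing dots, and consecutive dots.
fn is_valid_subscribe_subject(subject: &str) -> bool {
    is_valid_publish_subject(subject)
        && !subject.starts_with('.')
        && !subject.ends_with('.')
        && !subject.contains("..")
}
```

Whitespace matters because the protocol is line- and space-delimited: a subject such as `foo bar` would be parsed by the server as two separate arguments.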
||||
### Max Payload Size
|
||||
|
||||
The client checks payload size against the server's `max_payload` before publishing:
|
||||
- For plain messages: `payload.len() > max_payload`
|
||||
- For messages with headers: `headers.wire_len() + payload.len() > max_payload`
|
||||
- Returns `PublishErrorKind::MaxPayloadExceeded` if exceeded
|
||||
|
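Expressed as code, the size check is a single comparison. The function below is an illustrative stand-in (its name and signature are assumptions); the header length is passed in directly rather than computed from a `HeaderMap`.

```rust
/// Returns true if a message would exceed the server's `max_payload`.
/// `headers_wire_len` is `None` for messages without headers.
fn exceeds_max_payload(
    payload_len: usize,
    headers_wire_len: Option<usize>,
    max_payload: usize,
) -> bool {
    payload_len.saturating_add(headers_wire_len.unwrap_or(0)) > max_payload
}
```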
||||
### No Echo
|
||||
|
||||
When `no_echo` is set, the `CONNECT` message includes `echo: false`. The server will not deliver messages published by this connection back to its own subscriptions. This prevents feedback loops.
|
||||
|
||||
### Lame Duck Mode
|
||||
|
||||
When the server enters lame duck mode (draining for shutdown):
|
||||
1. Server sends `INFO` with `ldm: true`
|
||||
2. Client emits `Event::LameDuckMode`
|
||||
3. Application should gracefully close or reconnect to another server
|
||||
|
||||
The `nats-server` test harness provides `set_lame_duck_mode(server)` for testing this behavior.
|
||||
@@ -1,347 +0,0 @@
|
||||
# nats-server Test Harness
|
||||
|
||||
This document covers the `nats-server` crate — a test harness for spawning real NATS server instances in integration tests.
|
||||
|
||||
**Location**: `nats-server/src/lib.rs`
|
||||
**Version**: 0.1.0
|
||||
**License**: Apache-2.0
|
||||
**Dependencies**: `lazy_static`, `regex`, `url`, `serde_json`, `nuid`, `rand`, `tokio-retry`
|
||||
|
||||
## What It Is
|
||||
|
||||
The `nats-server` crate is **not** a NATS server implementation. It is a thin test harness that:
|
||||
- Spawns the Go-based `nats-server` binary as a child process
|
||||
- Configures it for test use (dynamic ports, temp storage, log files)
|
||||
- Discovers the client URL from the server's `INFO` protocol message
|
||||
- Cleans up resources (JetStream storage, logs, PID files) on `Drop`
|
||||
- Supports single servers and 3-node clusters
|
||||
|
||||
The actual NATS server must be installed separately (Go binary from `github.com/nats-io/nats-server`).
|
||||
|
||||
## Server Struct
|
||||
|
||||
```rust
|
||||
pub struct Server {
|
||||
inner: Inner,
|
||||
}
|
||||
|
||||
struct Inner {
|
||||
cfg: String, // Config file path
|
||||
id: String, // Unique server ID (NUID)
|
||||
port: Option<String>, // Explicit port (None = dynamic)
|
||||
child: Child, // Child process handle
|
||||
logfile: PathBuf, // Log file path in temp dir
|
||||
pidfile: PathBuf, // PID file path in temp dir
|
||||
}
|
||||
```
|
||||
|
||||
## Public API
|
||||
|
||||
### run_server
|
||||
|
||||
```rust
|
||||
pub fn run_server(cfg: &str) -> Server
|
||||
```
|
||||
|
||||
Starts a single NATS server with an optional config file.
|
||||
|
||||
- Uses dynamic port (`-1` flag) for parallel test execution
|
||||
- Stores JetStream data in temp directory
|
||||
- Writes logs to temp file: `nats-server-<id>.log`
|
||||
- Writes PID to temp file: `nats-server-<id>.pid`
|
||||
- If `cfg` is non-empty, passes `-c <cfg>` to the server
|
||||
|
||||
Example:
|
||||
```rust
|
||||
let server = nats_server::run_server("tests/configs/jetstream.conf");
|
||||
let client = async_nats::connect(server.client_url()).await.unwrap();
|
||||
```
|
||||
|
||||
### run_basic_server
|
||||
|
||||
```rust
|
||||
pub fn run_basic_server() -> Server
|
||||
```
|
||||
|
||||
Starts a server with no config (bare minimum). Equivalent to `run_server("")`.
|
||||
|
||||
### run_server_with_port
|
||||
|
||||
```rust
|
||||
pub fn run_server_with_port(cfg: &str, port: Option<&str>) -> Server
|
||||
```
|
||||
|
||||
Starts a server with an explicit port. If `port` is `None`, a dynamic port is used.
|
||||
|
||||
### run_cluster
|
||||
|
||||
```rust
|
||||
pub fn run_cluster<'a, C: IntoConfig<'a>>(cfg: C) -> Cluster
|
||||
```
|
||||
|
||||
Starts a 3-node cluster with the given config.
|
||||
|
||||
- Picks one random base port and derives the three client ports from it (base, base+100, base+200)
|
||||
- Configures cluster routes between nodes
|
||||
- Each node gets: `--cluster nats://127.0.0.1:<cluster_port>`, `--routes <other_routes>`, `--cluster_name cluster`, `-n nodeN`
|
||||
- Waits 2 seconds for cluster formation and leader election
|
||||
|
||||
The `IntoConfig` trait allows passing either a single config string (applied to all 3 nodes) or an array of 3 configs (one per node):
|
||||
|
||||
```rust
|
||||
// Same config for all nodes
|
||||
let cluster = run_cluster("configs/jetstream.conf");
|
||||
|
||||
// Different configs per node
|
||||
let cluster = run_cluster(["node1.conf", "node2.conf", "node3.conf"]);
|
||||
```
|
||||
|
||||
### Cluster Struct
|
||||
|
||||
```rust
|
||||
pub struct Cluster {
|
||||
pub servers: Vec<Server>,
|
||||
}
|
||||
|
||||
impl Cluster {
|
||||
pub fn client_url(&self) -> String {
|
||||
self.servers[0].client_url()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Server Methods
|
||||
|
||||
```rust
|
||||
impl Server {
|
||||
pub fn restart(&mut self)
|
||||
pub fn client_url(&self) -> String
|
||||
pub fn client_port(&self) -> u16
|
||||
pub fn client_url_with(&self, user: &str, pass: &str) -> String
|
||||
pub fn client_url_with_token(&self, token: &str) -> String
|
||||
pub fn client_pid(&self) -> usize
|
||||
}
|
||||
```
|
||||
|
||||
#### restart()
|
||||
|
||||
Kills the current server process, waits for it to exit, then restarts with the same config, port, and ID. Used for testing reconnection behavior.
|
||||
|
||||
#### client_url()
|
||||
|
||||
Connects to the server's TCP port, reads the `INFO` line, parses the JSON, and constructs a URL:
|
||||
- `nats://localhost:<port>` for non-TLS
|
||||
- `tls://localhost:<port>` for TLS-required servers
|
||||
|
||||
Polls the log file (up to 50 seconds: 100 attempts, 500 ms apart) to discover the client address, since the port may be dynamically allocated.
|
||||
|
||||
#### client_pid()
|
||||
|
||||
Reads the PID file and returns the server process ID. Used for sending signals.
|
||||
|
||||
### set_lame_duck_mode
|
||||
|
||||
```rust
|
||||
pub fn set_lame_duck_mode(s: &Server)
|
||||
```
|
||||
|
||||
Sends the lame duck mode signal to the server:
|
||||
|
||||
```bash
|
||||
nats-server --signal ldm=<pid>
|
||||
```
|
||||
|
||||
### is_port_available
|
||||
|
||||
```rust
|
||||
pub fn is_port_available(port: usize) -> bool
|
||||
```
|
||||
|
||||
Tests if a TCP port is available by attempting to bind to it.
|
||||
|
||||
## Server Lifecycle
|
||||
|
||||
### Spawning
|
||||
|
||||
The `do_run` function constructs and spawns the server process:
|
||||
|
||||
```rust
|
||||
fn do_run(cfg: &str, port: Option<&str>, id: Option<String>) -> Inner {
|
||||
let id = id.unwrap_or_else(|| nuid::next().to_string());
|
||||
let logfile = env::temp_dir().join(format!("nats-server-{id}.log"));
|
||||
let pidfile = env::temp_dir().join(format!("nats-server-{id}.pid"));
|
||||
let store_dir = env::temp_dir().join(format!("store-dir-{id}"));
|
||||
|
||||
let mut cmd = Command::new("nats-server");
|
||||
cmd.arg("--store_dir").arg(store_dir.as_path())
|
||||
.arg("-p");
|
||||
|
||||
match port {
|
||||
Some(port) => cmd.arg(port),
|
||||
None => cmd.arg("-1"), // Dynamic port
|
||||
};
|
||||
|
||||
cmd.arg("-l").arg(logfile.as_os_str())
|
||||
.arg("-P").arg(pidfile.as_os_str());
|
||||
|
||||
if !cfg.is_empty() {
|
||||
cmd.arg("-c").arg(cfg);
|
||||
}
|
||||
|
||||
let child = cmd.spawn().unwrap();
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
Key flags:
|
||||
- `--store_dir` — JetStream storage directory in temp
|
||||
- `-p -1` — Dynamic port allocation (or explicit port)
|
||||
- `-l` — Log file path
|
||||
- `-P` — PID file path
|
||||
- `-c` — Config file path
|
||||
|
||||
### Cleanup (Drop)
|
||||
|
||||
```rust
|
||||
impl Drop for Server {
|
||||
fn drop(&mut self) {
|
||||
self.inner.child.kill().unwrap();
|
||||
self.inner.child.wait().unwrap();
|
||||
|
||||
if let Ok(log) = fs::read_to_string(self.inner.logfile.as_os_str()) {
|
||||
// Clean up JetStream storage directory if found in log
|
||||
if let Some(caps) = SD_RE.captures(&log) {
|
||||
let sd = caps.get(1).map_or("", |m| m.as_str());
|
||||
fs::remove_dir_all(sd).ok();
|
||||
}
|
||||
// Remove log file
|
||||
fs::remove_file(self.inner.logfile.as_os_str()).ok();
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The regex `SD_RE` matches the "Store Directory" line in the server log:
|
||||
```
|
||||
.+\sStore Directory:\s+"([^"]+)"
|
||||
```
|
||||
|
||||
### Client URL Discovery
|
||||
|
||||
The `client_addr` method polls the log file to find the server's listen address:
|
||||
|
||||
```rust
|
||||
fn client_addr(&self) -> String {
|
||||
for _ in 0..100 { // 100 iterations × 500ms = 50s max
|
||||
match fs::read_to_string(self.inner.logfile.as_os_str()) {
|
||||
Ok(l) => {
|
||||
if let Some(cre) = CLIENT_RE.captures(&l) {
|
||||
return cre.get(1).unwrap().as_str()
|
||||
.replace("0.0.0.0", "localhost");
|
||||
} else {
|
||||
thread::sleep(Duration::from_millis(500));
|
||||
}
|
||||
}
|
||||
_ => thread::sleep(Duration::from_millis(500)),
|
||||
}
|
||||
}
|
||||
panic!("no client addr info");
|
||||
}
|
||||
```
|
||||
|
||||
The regex `CLIENT_RE` matches:
|
||||
```
|
||||
.+\sclient connections on\s+(\S+)
|
||||
```
|
||||
|
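For readers without the `regex` crate at hand, the same extraction can be approximated with standard-library string search. This is a simplified sketch, not the harness code: `client_addr_from_log` is a hypothetical name, and it matches the first occurrence of the marker text rather than applying the full pattern.

```rust
/// Find the address following "client connections on" in a server log
/// and rewrite the wildcard host to `localhost`.
fn client_addr_from_log(log: &str) -> Option<String> {
    const MARKER: &str = "client connections on";
    let start = log.find(MARKER)? + MARKER.len();
    // Take the next whitespace-delimited token after the marker.
    let addr = log[start..].split_whitespace().next()?;
    Some(addr.replace("0.0.0.0", "localhost"))
}
```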
||||
After finding the address, `client_url()` connects to it and parses the `INFO` JSON to get the port and TLS requirements.
|
||||
|
||||
## Cluster Setup
|
||||
|
||||
The `run_cluster_node_with_port` function spawns a single cluster node:
|
||||
|
||||
```rust
|
||||
fn run_cluster_node_with_port(
|
||||
cfg: &str,
|
||||
port: Option<&str>,
|
||||
routes: Vec<usize>,
|
||||
name: String,
|
||||
cluster_name: String,
|
||||
cluster: usize,
|
||||
) -> Server
|
||||
```
|
||||
|
||||
Additional flags for cluster nodes:
|
||||
- `--routes nats://127.0.0.1:<port1>,nats://127.0.0.1:<port2>` — routes to other cluster members
|
||||
- `--cluster nats://127.0.0.1:<cluster_port>` — cluster listen address
|
||||
- `--cluster_name <name>` — cluster name for grouping
|
||||
- `-n <name>` — server name
|
||||
|
||||
Port allocation for a cluster:
|
||||
```
|
||||
Base port: random in 3000..50000
|
||||
Node 1: client_port=base, cluster_port=base+1
|
||||
Node 2: client_port=base+100, cluster_port=base+101
|
||||
Node 3: client_port=base+200, cluster_port=base+201
|
||||
```
|
||||
|
||||
Each port is checked for availability with `is_port_available()`, including the +1 cluster port.
|
||||
|
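The port layout and route lists can be derived mechanically from the base port. The sketch below illustrates that arithmetic with hypothetical names (`NodePorts`, `cluster_ports`, `routes_for`); it skips the random selection and availability checks, and assumes the base stays within the 3000..50000 range so the `u16` additions cannot overflow.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct NodePorts {
    client: u16,
    cluster: u16,
}

/// Ports for the three nodes, spaced 100 apart, cluster port = client port + 1.
fn cluster_ports(base: u16) -> [NodePorts; 3] {
    [0u16, 100, 200].map(|offset| NodePorts {
        client: base + offset,
        cluster: base + offset + 1,
    })
}

/// Value for `--routes` on node `node`: the cluster addresses of the other nodes.
fn routes_for(node: usize, ports: &[NodePorts; 3]) -> String {
    ports
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != node)
        .map(|(_, p)| format!("nats://127.0.0.1:{}", p.cluster))
        .collect::<Vec<_>>()
        .join(",")
}
```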
||||
## JetStream Config
|
||||
|
||||
**Location**: `configs/jetstream.conf`
|
||||
|
||||
```conf
|
||||
jetstream: {
|
||||
strict: true,
|
||||
max_mem_store: 8MiB,
|
||||
max_file_store: 10GiB
|
||||
}
|
||||
```
|
||||
|
||||
This is the default test config for JetStream-enabled servers. It enables strict mode and sets memory/file storage limits suitable for testing.
|
||||
|
||||
## Test Usage Patterns
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn basic_test() {
|
||||
let server = nats_server::run_server("configs/jetstream.conf");
|
||||
let client = async_nats::connect(server.client_url()).await.unwrap();
|
||||
// ... test logic ...
|
||||
// Server cleaned up on drop
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn cluster_test() {
|
||||
let cluster = nats_server::run_cluster("configs/jetstream.conf");
|
||||
let client = async_nats::connect(cluster.client_url()).await.unwrap();
|
||||
// ... test logic ...
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn reconnect_test() {
|
||||
let mut server = nats_server::run_server("");
|
||||
let client = async_nats::connect(server.client_url()).await.unwrap();
|
||||
|
||||
// Restart the server to test reconnection
|
||||
server.restart();
|
||||
|
||||
// Client should reconnect automatically
|
||||
client.publish("test", "data".into()).await.unwrap();
|
||||
}
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
|
||||
| Dependency | Version | Purpose |
|
||||
|-----------|---------|---------|
|
||||
| `lazy_static` | 1.4.0 | Static regex initialization |
|
||||
| `regex` | 1.7.1 | Log parsing (store directory, client address) |
|
||||
| `url` | 2 | URL manipulation for client_url_with |
|
||||
| `serde_json` | 1.0.104 | INFO JSON parsing |
|
||||
| `nuid` | 0.5 | Unique server ID generation |
|
||||
| `rand` | 0.10.1 | Random port selection |
|
||||
| `tokio-retry` | 0.3.0 | Exponential backoff for cluster operations |
|
||||
|
||||
Note: `async-nats` is only a dev-dependency, used in the crate's own integration tests.
|
||||
@@ -1,307 +0,0 @@
|
||||
# Service API and Higher-Level Abstractions
|
||||
|
||||
This document covers the Service API and other higher-level abstractions built on top of the core NATS client.
|
||||
|
||||
## Service API
|
||||
|
||||
**Location**: `service/` (feature: `service`)
|
||||
|
||||
The Service API provides a framework for building NATS-based microservices with built-in monitoring, health checks, and statistics.
|
||||
|
||||
### Service
|
||||
|
||||
```rust
|
||||
#[derive(Debug)]
|
||||
pub struct Service {
|
||||
client: Client,
|
||||
info: Info,
|
||||
endpoints: HashMap<String, Endpoint>,
|
||||
started: DateTime,
|
||||
stats_handler: Arc<dyn Fn(&str, &Stats) -> serde_json::Value + Send + Sync>,
|
||||
stop_sender: mpsc::Sender<()>,
|
||||
stop_receiver: Option<mpsc::Receiver<()>>,
|
||||
}
|
||||
```
|
||||
|
||||
### Creating a Service
|
||||
|
||||
```rust
|
||||
use async_nats::service::ServiceExt;
|
||||
|
||||
let mut service = client
|
||||
.service_builder()
|
||||
.description("Product service")
|
||||
.stats_handler(|endpoint, stats| {
|
||||
serde_json::json!({
|
||||
"endpoint": endpoint,
|
||||
"requests": stats.num_requests,
|
||||
"errors": stats.num_errors,
|
||||
})
|
||||
})
|
||||
.start("products", "1.0.0")
|
||||
.await?;
|
||||
```
|
||||
|
||||
### ServiceBuilder
|
||||
|
||||
```rust
|
||||
impl ServiceBuilder {
|
||||
pub fn description(mut self, description: impl Into<String>) -> Self
|
||||
pub fn stats_handler<F>(mut self, handler: F) -> Self
|
||||
pub async fn start(self, name: impl Into<String>, version: impl Into<String>) -> Result<Service, ServiceError>
|
||||
}
|
||||
```
|
||||
|
||||
### Endpoints
|
||||
|
||||
A service exposes one or more endpoints, each handling requests on a specific subject:
|
||||
|
||||
```rust
|
||||
// Add an endpoint
|
||||
let mut endpoint = service
|
||||
.endpoint("get_product")
|
||||
.await?;
|
||||
|
||||
// Process requests
|
||||
while let Some(request) = endpoint.next().await {
|
||||
let request = request?;
|
||||
// Handle the request
|
||||
request.respond(serde_json::json!({ "id": 1, "name": "Widget" })).await?;
|
||||
}
|
||||
```
|
||||
|
||||
### Endpoint
|
||||
|
||||
**Location**: `service/endpoint.rs`
|
||||
|
||||
```rust
|
||||
pub struct Endpoint {
|
||||
subject: Subject,
|
||||
queue_group: Option<String>,
|
||||
info: EndpointInfo,
|
||||
stats: Stats,
|
||||
subscriber: Subscriber,
|
||||
}
|
||||
```
|
||||
|
||||
Implements `futures::Stream` yielding `ServiceRequest` objects.
|
||||
|
||||
### ServiceRequest
|
||||
|
||||
```rust
|
||||
pub struct ServiceRequest {
|
||||
pub subject: Subject,
|
||||
pub payload: Bytes,
|
||||
pub headers: Option<HeaderMap>,
|
||||
pub reply: Option<Subject>,
|
||||
pub client: Client,
|
||||
}
|
||||
```
|
||||
|
||||
Methods:
|
||||
- `respond(payload)` — send a response to the requester
|
||||
- `respond_with_headers(payload, headers)` — send a response with headers
|
||||
|
||||
### Monitoring Subjects
|
||||
|
||||
The Service API automatically creates monitoring endpoints:
|
||||
|
||||
| Subject | Description |
|
||||
|---------|-------------|
|
||||
| `$SRV.PING` | Ping all services (returns service info) |
|
||||
| `$SRV.PING.<name>` | Ping specific service by name |
|
||||
| `$SRV.PING.<name>.<id>` | Ping specific service instance |
|
||||
| `$SRV.INFO` | Get service info |
|
||||
| `$SRV.STATS` | Get service statistics |
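
The subject layout in the table can be sketched as a small helper. This is an illustration only: `monitoring_subject` is a hypothetical name, not part of the crate, and it assumes `INFO` and `STATS` accept the same optional `<name>` and `<id>` suffixes that the table lists for `PING`.

```rust
/// Builds a `$SRV` monitoring subject (hypothetical helper, not a crate API).
/// `verb` is one of "PING", "INFO" or "STATS"; `name` and `id` narrow the
/// target to one service or one service instance.
fn monitoring_subject(verb: &str, name: Option<&str>, id: Option<&str>) -> String {
    match (name, id) {
        (Some(name), Some(id)) => format!("$SRV.{verb}.{name}.{id}"),
        (Some(name), None) => format!("$SRV.{verb}.{name}"),
        // Assumption: an instance id without a service name falls back to the
        // broadcast subject, because an id alone does not identify a service.
        (None, _) => format!("$SRV.{verb}"),
    }
}
```

For example, `monitoring_subject("PING", Some("products"), None)` yields `$SRV.PING.products`, which reaches every instance of a service named `products`.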
|
||||
|
||||
### Service Info
|
||||
|
||||
```rust
|
||||
pub struct Info {
|
||||
pub name: String,
|
||||
pub id: String,
|
||||
pub version: String,
|
||||
pub description: String,
|
||||
pub endpoints: Vec<EndpointInfo>,
|
||||
}
|
||||
```
|
||||
|
||||
### Stats
|
||||
|
||||
```rust
|
||||
pub struct Stats {
|
||||
pub num_requests: u64,
|
||||
pub num_errors: u64,
|
||||
pub last_error: Option<String>,
|
||||
pub processing_time: Duration,
|
||||
pub average_processing_time: Duration,
|
||||
}
|
||||
```
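
How these counters relate to each other is not spelled out above. The following is a minimal sketch, assuming `average_processing_time` is simply `processing_time` divided by `num_requests`; the `record` method is hypothetical and the struct is a local copy so that the example is self-contained.

```rust
use std::time::Duration;

/// Local copy of the fields listed above, so the sketch compiles on its own.
#[derive(Debug, Default)]
struct Stats {
    num_requests: u64,
    num_errors: u64,
    last_error: Option<String>,
    processing_time: Duration,
    average_processing_time: Duration,
}

impl Stats {
    /// Hypothetical update step: fold one handled request into the counters.
    fn record(&mut self, elapsed: Duration, error: Option<String>) {
        self.num_requests += 1;
        self.processing_time += elapsed;
        if let Some(message) = error {
            self.num_errors += 1;
            self.last_error = Some(message);
        }
        // Assumption: the average is total time divided by request count.
        let avg_nanos = self.processing_time.as_nanos() / u128::from(self.num_requests);
        self.average_processing_time = Duration::from_nanos(avg_nanos as u64);
    }
}
```

Recording a 10 ms request followed by a failed 30 ms request leaves `num_requests` at 2, `num_errors` at 1, and the average at 20 ms.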
|
||||
|
||||
## ID Generation
|
||||
|
||||
**Location**: `id_generator.rs`
|
||||
|
||||
The client needs unique IDs for inbox subjects and other purposes.
|
||||
|
||||
### With `nuid` Feature (Default)
|
||||
|
||||
Uses the NUID library for high-performance, cryptographically strong, collision-resistant IDs:
|
||||
|
||||
```rust
|
||||
pub(crate) fn next() -> String {
|
||||
nuid::next().to_string()
|
||||
}
|
||||
```
|
||||
|
||||
NUID generates 22-character alphanumeric strings using a combination of a random prefix and a sequential counter.
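
The prefix-plus-counter layout can be illustrated with a simplified sketch. It is not the `nuid` crate's implementation: the 12/10 split between prefix and counter, the base-62 alphabet order, and the step of 1 are assumptions made for the example, and the prefix is passed in rather than drawn from a random source.

```rust
const DIGITS: &[u8; 62] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const PREFIX_LEN: usize = 12; // assumed prefix width
const SEQ_LEN: usize = 10; // assumed counter width

/// Hypothetical generator: fixed prefix followed by a base-62 counter.
struct IdSketch {
    prefix: [u8; PREFIX_LEN],
    seq: u64,
}

impl IdSketch {
    /// `prefix` must be ASCII alphanumeric; a real generator would randomize it.
    fn new(prefix: [u8; PREFIX_LEN], seq: u64) -> Self {
        Self { prefix, seq }
    }

    /// Returns the next 22-character ID and advances the counter by one.
    fn next(&mut self) -> String {
        let mut out = [b'0'; PREFIX_LEN + SEQ_LEN];
        out[..PREFIX_LEN].copy_from_slice(&self.prefix);
        // Write the counter right-aligned in base 62, padded with '0'.
        let mut n = self.seq;
        for slot in out[PREFIX_LEN..].iter_mut().rev() {
            *slot = DIGITS[(n % 62) as usize];
            n /= 62;
        }
        self.seq += 1;
        String::from_utf8(out.to_vec()).expect("prefix and digits are ASCII")
    }
}
```

Because only the counter changes between calls, generating an ID costs a handful of integer operations, which is the reason this layout is cheaper than sampling 22 random characters each time.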
|
||||
|
||||
### Without `nuid` Feature
|
||||
|
||||
Falls back to `rand`-based generation:
|
||||
|
||||
```rust
|
||||
pub(crate) fn next() -> String {
|
||||
rng()
|
||||
.sample_iter(Alphanumeric)
|
||||
.take(22)
|
||||
.map(char::from)
|
||||
.collect()
|
||||
}
|
||||
```
|
||||
|
||||
Both approaches produce 22-character alphanumeric strings, but NUID is more performant and has better collision resistance.
|
||||
|
||||
## Inbox Generation
|
||||
|
||||
The `Client::new_inbox()` method generates globally unique inbox subjects for request-reply:
|
||||
|
||||
```rust
|
||||
pub fn new_inbox(&self) -> String {
|
||||
format!("{}.{}", self.inbox_prefix, crate::id_generator::next())
|
||||
}
|
||||
```
|
||||
|
||||
Default prefix is `_INBOX`, producing subjects like `_INBOX.UaBG3f3q5NxX3KdNcRmF2f`.
|
||||
|
||||
Custom prefix via `ConnectOptions::custom_inbox_prefix()`:
|
||||
```rust
|
||||
let client = ConnectOptions::new()
|
||||
.custom_inbox_prefix("MYAPP")
|
||||
.connect("demo.nats.io")
|
||||
.await?;
|
||||
// Inbox subjects: MYAPP.UaBG3f3q5NxX3KdNcRmF2f
|
||||
```
|
||||
|
||||
## DateTime Helpers
|
||||
|
||||
**Location**: `datetime.rs` (feature: `jetstream` or `service` or `chrono`)
|
||||
|
||||
Provides date/time types for JetStream and Service API timestamps:
|
||||
|
||||
- Uses the `time` crate by default
|
||||
- Optionally uses `chrono` via the `chrono` feature flag
|
||||
- Supports RFC 3339 formatting and parsing
|
||||
- `DateTime` type wraps either `time::OffsetDateTime` or `chrono::DateTime<Utc>`
|
||||
|
||||
## Crypto Module
|
||||
|
||||
**Location**: `crypto.rs` (feature: `crypto`)
|
||||
|
||||
Provides encryption/decryption support used by the Object Store for server-side encryption.
|
||||
|
||||
## Subject Validation
|
||||
|
||||
**Location**: `lib.rs`
|
||||
|
||||
The client provides two levels of subject validation:
|
||||
|
||||
### is_valid_publish_subject
|
||||
|
||||
```rust
|
||||
pub(crate) fn is_valid_publish_subject<T: AsRef<str>>(subject: T) -> bool
|
||||
```
|
||||
|
||||
Checks for protocol safety only:
|
||||
- Not empty
|
||||
- No whitespace (space, tab, CR, LF) which would break protocol framing
|
||||
|
||||
Used for publish operations. Can be disabled with `skip_subject_validation`.
|
||||
|
||||
### is_valid_subject
|
||||
|
||||
```rust
|
||||
pub(crate) fn is_valid_subject<T: AsRef<str>>(subject: T) -> bool
|
||||
```
|
||||
|
||||
Checks structural validity:
|
||||
- Not empty
|
||||
- No leading/trailing dots
|
||||
- No consecutive dots (`..`)
|
||||
- No whitespace
|
||||
|
||||
Used for subscribe operations (always runs, matching Go/Java behavior).
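
The two validation levels can be sketched from the rules listed above. These are standalone re-implementations for illustration, taking `&str` instead of the generic `T: AsRef<str>`; the crate's actual function bodies may differ.

```rust
/// Protocol-safety check: non-empty and free of space, tab, CR and LF.
fn is_valid_publish_subject(subject: &str) -> bool {
    !subject.is_empty()
        && !subject
            .bytes()
            .any(|b| matches!(b, b' ' | b'\t' | b'\r' | b'\n'))
}

/// Structural check: everything above, plus the dot rules.
fn is_valid_subject(subject: &str) -> bool {
    is_valid_publish_subject(subject)
        && !subject.starts_with('.')
        && !subject.ends_with('.')
        && !subject.contains("..")
}
```

Under this sketch, `events..data` passes the publish check (no whitespace) but fails the structural check (consecutive dots), which shows why the stricter function is the one applied to subscriptions.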
|
||||
|
||||
### is_valid_queue_group
|
||||
|
||||
```rust
|
||||
pub(crate) fn is_valid_queue_group(queue_group: &str) -> bool
|
||||
```
|
||||
|
||||
Checks:
|
||||
- Not empty
|
||||
- No whitespace
|
||||
|
||||
## JetStream Name Validation
|
||||
|
||||
**Location**: `jetstream/mod.rs`
|
||||
|
||||
```rust
|
||||
pub(crate) fn is_valid_name(name: &str) -> bool {
|
||||
!name.is_empty()
|
||||
&& name.bytes().all(|c| !c.is_ascii_whitespace() && c != b'.' && c != b'*' && c != b'>')
|
||||
}
|
||||
```
|
||||
|
||||
JetStream names (stream names, consumer names) must not contain:
|
||||
- Whitespace
|
||||
- Dots (`.`) — would conflict with subject delimiters
|
||||
- Wildcards (`*`, `>`) — would conflict with subject wildcards
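
To see which rule a given name breaks, the same checks can be written so that they report the first violation. `name_violation` is a hypothetical diagnostic helper, not a crate function; it returns `None` exactly when the `is_valid_name` rules above are satisfied.

```rust
/// Returns a description of the first rule the name violates, if any.
fn name_violation(name: &str) -> Option<&'static str> {
    if name.is_empty() {
        return Some("empty");
    }
    for b in name.bytes() {
        match b {
            b'.' => return Some("contains '.'"),
            b'*' | b'>' => return Some("contains a wildcard"),
            b if b.is_ascii_whitespace() => return Some("contains whitespace"),
            _ => {}
        }
    }
    None
}
```

A stream named `ORDERS` passes, while `orders.eu` is rejected because the dot would be read as a subject token separator.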
|
||||
|
||||
## CallbackArg1
|
||||
|
||||
**Location**: `options.rs`
|
||||
|
||||
A type-erased async callback wrapper used throughout the crate:
|
||||
|
||||
```rust
|
||||
pub(crate) type AsyncCallbackArg1<A, T> =
|
||||
Arc<dyn Fn(A) -> Pin<Box<dyn Future<Output = T> + Send + Sync + 'static>> + Send + Sync>;
|
||||
|
||||
#[derive(Clone)]
|
||||
pub(crate) struct CallbackArg1<A, T>(AsyncCallbackArg1<A, T>);
|
||||
|
||||
impl<A, T> CallbackArg1<A, T> {
|
||||
pub(crate) async fn call(&self, arg: A) -> T {
|
||||
(self.0.as_ref())(arg).await
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Used for:
|
||||
- `event_callback` — `CallbackArg1<Event, ()>`
|
||||
- `auth_callback` — `CallbackArg1<Vec<u8>, Result<Auth, AuthError>>`
|
||||
- `reconnect_to_server_callback` — `CallbackArg1<(Vec<Server>, ServerInfo), Option<ReconnectToServer>>`
|
||||
- `signature_callback` — `CallbackArg1<String, Result<String, AuthError>>`
|
||||
|
||||
## Version Compatibility Checking
|
||||
|
||||
The `Client::is_server_compatible` method checks if the server version meets a minimum requirement:
|
||||
|
||||
```rust
|
||||
pub fn is_server_compatible(&self, major: i64, minor: i64, patch: i64) -> bool
|
||||
```
|
||||
|
||||
This parses the server version string from `ServerInfo::version` using a regex and compares major/minor/patch components. Note: this checks the directly-connected server, not necessarily the JetStream leader.
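
The comparison itself can be sketched without a regex. This is an illustration under stated assumptions, not the crate's code: `parse_version` and `is_compatible` are hypothetical names, pre-release and build suffixes are ignored, and an unparseable version string is treated as incompatible.

```rust
/// Parses "2.10.3", "v2.10.3" or "2.11.0-beta" into (major, minor, patch).
fn parse_version(version: &str) -> Option<(i64, i64, i64)> {
    let core = version.trim_start_matches('v');
    // Drop any pre-release or build suffix such as "-beta" or "+meta".
    let core = core.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<i64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    Some((major, minor, patch))
}

/// True when the server version is at least the requested minimum.
fn is_compatible(server_version: &str, major: i64, minor: i64, patch: i64) -> bool {
    match parse_version(server_version) {
        // Tuples compare lexicographically: major first, then minor, then patch.
        Some(found) => found >= (major, minor, patch),
        None => false,
    }
}
```

Tuple ordering does the work here: `(2, 9, 22)` is less than `(2, 10, 0)` because the minor component is compared before the patch component.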
|
||||
|
||||
The `server_2_10`, `server_2_11`, `server_2_12`, and `server_2_14` feature flags enable version-specific API fields and methods without runtime checks.
|
||||
@@ -1,215 +0,0 @@
|
||||
# Quick Reference
|
||||
|
||||
## Crate Summary
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **Crate** | `async-nats` |
|
||||
| **Version** | 0.49.1 |
|
||||
| **Edition** | 2021 |
|
||||
| **MSRV** | 1.88.0 |
|
||||
| **License** | Apache-2.0 |
|
||||
| **Runtime** | Tokio |
|
||||
| **Protocol** | NATS Client Protocol v1 (Dynamic) |
|
||||
| **TLS** | rustls (ring / aws-lc-rs / fips) |
|
||||
| **WebSocket** | tokio-websockets (feature-gated) |
|
||||
|
||||
## Quick Start
|
||||
|
||||
```rust
|
||||
use async_nats::connect;
|
||||
use futures_util::StreamExt;
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> Result<(), async_nats::Error> {
|
||||
let client = connect("demo.nats.io").await?;
|
||||
|
||||
// Publish
|
||||
client.publish("events.data", "hello".into()).await?;
|
||||
|
||||
// Subscribe
|
||||
let mut sub = client.subscribe("events.>").await?;
|
||||
while let Some(msg) = sub.next().await {
|
||||
println!("{:?}", msg);
|
||||
}
|
||||
|
||||
// Request-Response
|
||||
let response = client.request("service", "input".into()).await?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
## Architecture at a Glance
|
||||
|
||||
```
|
||||
Client (cloneable handle, mpsc::Sender<Command>)
|
||||
│
|
||||
▼
|
||||
ConnectionHandler (single Tokio task)
|
||||
├── Subscriptions HashMap<u64, Subscription>
|
||||
├── Multiplexer (request-reply, SID 0)
|
||||
├── Flush Observers
|
||||
└── Ping/Pong health check
|
||||
│
|
||||
▼
|
||||
Connection (protocol I/O, read/write buffers)
|
||||
│
|
||||
▼
|
||||
Connector (server pool, reconnection)
|
||||
│
|
||||
▼
|
||||
NATS Server (Go binary, TCP/TLS/WebSocket)
|
||||
```
|
||||
|
||||
## Key Types
|
||||
|
||||
| Type | Location | Purpose |
|
||||
|------|----------|---------|
|
||||
| `Client` | `client.rs` | Cloneable connection handle |
|
||||
| `Subscriber` | `lib.rs` | Message stream (impl `futures::Stream`) |
|
||||
| `Message` | `message.rs` | Inbound NATS message |
|
||||
| `OutboundMessage` | `message.rs` | Outbound publish message |
|
||||
| `Subject` | `subject.rs` | Validated subject string (backed by `Bytes`) |
|
||||
| `HeaderMap` | `header.rs` | NATS message headers |
|
||||
| `StatusCode` | `status.rs` | NATS protocol status codes |
|
||||
| `ServerInfo` | `lib.rs` | Server INFO data |
|
||||
| `ConnectInfo` | `lib.rs` | Client CONNECT data |
|
||||
| `ServerAddr` | `lib.rs` | Validated server URL |
|
||||
| `Auth` | `auth.rs` | Authentication credentials |
|
||||
| `ConnectOptions` | `options.rs` | Connection configuration builder |
|
||||
| `Event` | `lib.rs` | Connection lifecycle events |
|
||||
| `State` | `connection.rs` | Connection state (Pending/Connected/Disconnected) |
|
||||
| `Statistics` | `client.rs` | Atomic connection metrics |
|
||||
| `Request` | `client.rs` | Request-response builder |
|
||||
|
||||
## JetStream Types
|
||||
|
||||
| Type | Location | Purpose |
|
||||
|------|----------|---------|
|
||||
| `jetstream::Context` | `jetstream/context.rs` | JetStream API entry point |
|
||||
| `jetstream::stream::Stream` | `jetstream/stream.rs` | Stream management |
|
||||
| `jetstream::stream::Config` | `jetstream/stream.rs` | Stream configuration |
|
||||
| `jetstream::stream::Info` | `jetstream/stream.rs` | Stream info/state |
|
||||
| `jetstream::consumer::PullConsumer` | `jetstream/consumer/pull.rs` | Pull-based consumer |
|
||||
| `jetstream::consumer::PushConsumer` | `jetstream/consumer/push.rs` | Push-based consumer |
|
||||
| `jetstream::consumer::Config` | `jetstream/consumer/mod.rs` | Consumer configuration |
|
||||
| `jetstream::Message` | `jetstream/message.rs` | Message with ack methods |
|
||||
| `jetstream::PublishAck` | `jetstream/publish.rs` | Publish acknowledgment |
|
||||
| `jetstream::kv::Store` | `jetstream/kv/bucket.rs` | Key-Value store |
|
||||
| `jetstream::object_store::ObjectStore` | `jetstream/object_store/mod.rs` | Object store |
|
||||
| `jetstream::ErrorCode` | `jetstream/errors.rs` | JetStream error codes |
|
||||
|
||||
## Protocol Operations
|
||||
|
||||
### Client → Server (ClientOp)
|
||||
|
||||
| Op | Wire Format | Purpose |
|
||||
|----|-----------|---------|
|
||||
| `CONNECT` | `CONNECT {json}\r\n` | Authentication and capabilities |
|
||||
| `PUB` | `PUB <subject> [reply] <len>\r\n<payload>\r\n` | Publish message |
|
||||
| `HPUB` | `HPUB <subject> [reply] <hlen> <tlen>\r\n<hdrs><payload>\r\n` | Publish with headers |
|
||||
| `SUB` | `SUB <subject> [queue] <sid>\r\n` | Subscribe |
|
||||
| `UNSUB` | `UNSUB <sid> [max]\r\n` | Unsubscribe |
|
||||
| `PING` | `PING\r\n` | Keepalive / health check |
|
||||
| `PONG` | `PONG\r\n` | Response to server PING |
|
||||
|
||||
### Server → Client (ServerOp)
|
||||
|
||||
| Op | Wire Format | Purpose |
|
||||
|----|-----------|---------|
|
||||
| `INFO` | `INFO {json}\r\n` | Server capabilities, cluster info |
|
||||
| `MSG` | `MSG <subj> <sid> [reply] <len>\r\n<payload>\r\n` | Deliver message |
|
||||
| `HMSG` | `HMSG <subj> <sid> [reply] <hlen> <tlen>\r\n<hdrs><payload>\r\n` | Message with headers |
|
||||
| `+OK` | `+OK\r\n` | Success (verbose mode) |
|
||||
| `-ERR` | `-ERR <desc>\r\n` | Server error |
|
||||
| `PING` | `PING\r\n` | Server health check |
|
||||
| `PONG` | `PONG\r\n` | Ack client PING |
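
Going the other way, the `MSG` control line can be split into its fields. This is a minimal sketch with hypothetical names (`MsgHeader`, `parse_msg_header`); it matches the operation name case-sensitively and does not read the payload that follows the control line.

```rust
/// Fields of a `MSG <subj> <sid> [reply] <len>` control line.
#[derive(Debug, PartialEq)]
struct MsgHeader<'a> {
    subject: &'a str,
    sid: u64,
    reply: Option<&'a str>,
    len: usize,
}

/// Parses one control line; returns `None` for anything that is not a valid MSG.
fn parse_msg_header(line: &str) -> Option<MsgHeader<'_>> {
    let mut tokens = line.trim_end_matches("\r\n").split_ascii_whitespace();
    if tokens.next()? != "MSG" {
        return None;
    }
    let rest: Vec<&str> = tokens.collect();
    // Three fields means no reply subject; four means a reply is present.
    match rest.as_slice() {
        [subject, sid, len] => Some(MsgHeader {
            subject: *subject,
            sid: sid.parse().ok()?,
            reply: None,
            len: len.parse().ok()?,
        }),
        [subject, sid, reply, len] => Some(MsgHeader {
            subject: *subject,
            sid: sid.parse().ok()?,
            reply: Some(*reply),
            len: len.parse().ok()?,
        }),
        _ => None,
    }
}
```

The `sid` is what routes the message to the right entry in the subscription map, and `len` tells the reader how many payload bytes to consume before the closing CRLF.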
|
||||
|
||||
## Internal Commands (Command → ConnectionHandler)
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `Publish(OutboundMessage)` | Queue message for sending |
|
||||
| `Request { subject, payload, respond, headers, sender }` | Request-response via multiplexer |
|
||||
| `Subscribe { sid, subject, queue_group, sender }` | Create subscription |
|
||||
| `Unsubscribe { sid, max }` | Remove subscription |
|
||||
| `Flush { observer }` | Wait for write buffer flush |
|
||||
| `Drain { sid }` | Gracefully drain (sub or whole client) |
|
||||
| `Reconnect` | Force reconnection |
|
||||
| `SetServerPool { servers, result }` | Replace server pool |
|
||||
| `ServerPool { result }` | Query server pool |
|
||||
|
||||
## Feature Flags
|
||||
|
||||
| Feature | Default | Enables |
|
||||
|---------|---------|---------|
|
||||
| `jetstream` | ✓ | JetStream API (streams, consumers, publish) |
|
||||
| `kv` | ✓ | Key-Value store (requires jetstream) |
|
||||
| `object-store` | ✓ | Object store (requires jetstream + crypto) |
|
||||
| `service` | ✓ | Service API |
|
||||
| `nkeys` | ✓ | NKey/JWT authentication |
|
||||
| `crypto` | ✓ | Encryption support |
|
||||
| `websockets` | ✓ | WebSocket transport |
|
||||
| `nuid` | ✓ | NUID ID generation |
|
||||
| `ring` | ✓ | Ring crypto backend |
|
||||
| `aws-lc-rs` | ✗ | Alternative crypto backend |
|
||||
| `fips` | ✗ | FIPS mode (requires aws-lc-rs) |
|
||||
| `chrono` | ✗ | Use chrono instead of time |
|
||||
| `experimental` | ✗ | Experimental features |
|
||||
| `server_2_10` | ✓ | Server 2.10+ API fields |
|
||||
| `server_2_11` | ✓ | Server 2.11+ API fields |
|
||||
| `server_2_12` | ✓ | Server 2.12+ API fields |
|
||||
| `server_2_14` | ✓ | Server 2.14+ API fields |
|
||||
|
||||
## Connection Defaults
|
||||
|
||||
| Parameter | Default |
|
||||
|-----------|---------|
|
||||
| Connection timeout | 5 seconds |
|
||||
| Ping interval | 60 seconds |
|
||||
| Max pending pings | 2 |
|
||||
| Request timeout | 10 seconds |
|
||||
| Command channel capacity | 2048 |
|
||||
| Subscription capacity | 65536 |
|
||||
| Read buffer capacity | 65535 |
|
||||
| Inbox prefix | `_INBOX` |
|
||||
| Reconnect delay | Exponential (0ms → 4s cap) |
|
||||
| Max reconnects | Unlimited |
|
||||
| TLS required | Auto (server-dependent) |
|
||||
|
||||
## Error Hierarchy
|
||||
|
||||
```
|
||||
ConnectError (ConnectErrorKind::ServerParse | Dns | Authentication | AuthorizationViolation | TimedOut | Tls | Io | MaxReconnects)
|
||||
PublishError (PublishErrorKind::MaxPayloadExceeded | InvalidSubject | Send)
|
||||
RequestError (RequestErrorKind::TimedOut | NoResponders | InvalidSubject | MaxPayloadExceeded | Other)
|
||||
SubscribeError (SubscribeErrorKind::InvalidSubject | InvalidQueueName | Other)
|
||||
FlushError (FlushErrorKind::SendError | FlushError)
|
||||
```
|
||||
|
||||
## nats-server Test Harness
|
||||
|
||||
| Function | Description |
|
||||
|----------|-------------|
|
||||
| `run_server(cfg)` | Start single server with config |
|
||||
| `run_basic_server()` | Start bare server |
|
||||
| `run_cluster(cfg)` | Start 3-node cluster |
|
||||
| `set_lame_duck_mode(s)` | Send LDM signal |
|
||||
|
||||
## JetStream API Subjects
|
||||
|
||||
| Operation | Subject Pattern |
|
||||
|-----------|---------------|
|
||||
| Create stream | `$JS.API.STREAM.CREATE.<name>` |
|
||||
| Stream info | `$JS.API.STREAM.INFO.<name>` |
|
||||
| Update stream | `$JS.API.STREAM.UPDATE.<name>` |
|
||||
| Delete stream | `$JS.API.STREAM.DELETE.<name>` |
|
||||
| Purge stream | `$JS.API.STREAM.PURGE.<name>` |
|
||||
| List streams | `$JS.API.STREAM.LIST` |
|
||||
| Create consumer | `$JS.API.CONSUMER.CREATE.<stream>` |
|
||||
| Create durable | `$JS.API.CONSUMER.DURABLE.CREATE.<stream>.<name>` |
|
||||
| Consumer info | `$JS.API.CONSUMER.INFO.<stream>.<name>` |
|
||||
| Pull next | `$JS.API.CONSUMER.MSG.NEXT.<stream>.<name>` |
|
||||
| Account info | `$JS.API.ACCOUNT.INFO` |
|
||||
| Direct get | `$JS.API.DIRECT.GET.<name>` |
|
||||
@@ -1,963 +0,0 @@
|
||||
# OpenStack Keystone Identity Service — Reference Document
|
||||
|
||||
> Status: Research reference
|
||||
> Created: 2026-06-08
|
||||
> Context: alknet auth/identity system design; rustfs S3-compatible store with Keystone auth
|
||||
|
||||
## 1. Overview
|
||||
|
||||
OpenStack Keystone is the identity service for the OpenStack cloud platform. It
|
||||
provides authentication, authorization, and service discovery via a RESTful HTTP
|
||||
API. Every other OpenStack service (Nova, Neutron, Cinder, Swift, etc.) depends
|
||||
on Keystone for token validation and access control.
|
||||
|
||||
Key responsibilities:
|
||||
|
||||
| Responsibility | Description |
|
||||
|---|---|
|
||||
| **Authentication** | Verify identity via passwords, tokens, TOTP, SAML, OIDC, application credentials |
|
||||
| **Authorization** | Role-based access control (RBAC) across projects, domains, and system scope |
|
||||
| **Service Catalog** | Registry of available services and their endpoint URLs |
|
||||
| **Token Management** | Issue, validate, and revoke bearer tokens with scoped authorization |
|
||||
| **Federation** | Accept identity assertions from external IdPs (SAML, OIDC) |
|
||||
| **Trust Delegation** | Allow users to delegate limited authority to other users |
|
||||
|
||||
---
|
||||
|
||||
## 2. Core Concepts
|
||||
|
||||
### 2.1 Domains
|
||||
|
||||
A **domain** is a top-level namespace that contains users, groups, and projects.
|
||||
Domains provide administrative isolation: a domain administrator can manage
|
||||
users and projects within their domain but not across domains.
|
||||
|
||||
- Domains were introduced in the Identity API v3 (the "v3" API).
|
||||
- Before domains, OpenStack used "tenants" (v2 API) — projects are the v3
|
||||
equivalent, but domains add a containment boundary.
|
||||
- Every user, group, and project belongs to exactly one domain.
|
||||
- The `Default` domain is created automatically and holds all v2-compatible
|
||||
resources.
|
||||
|
||||
**Key property**: Domains are the unit of administrative delegation. A domain
|
||||
admin can create/delete users, groups, and projects within their domain.
|
||||
|
||||
### 2.2 Projects
|
||||
|
||||
A **project** is a container for resources — compute instances, storage volumes,
|
||||
networks, etc. Projects are the primary scope for authorization in OpenStack.
|
||||
|
||||
- Projects group resources: "who can see/use these VMs and volumes?"
|
||||
- Projects belong to a domain.
|
||||
- Projects are the primary unit for role assignment and token scoping.
|
||||
- Projects can be hierarchical (parent/child) with inherited role assignments.
|
||||
|
||||
**Key property**: A project-scoped token lets you operate on resources within
|
||||
that project. You cannot use a project-scoped token to access resources in a
|
||||
different project.
|
||||
|
||||
### 2.3 Users
|
||||
|
||||
A **user** represents a digital identity — a person, system account, or service
|
||||
account that can authenticate and be authorized.
|
||||
|
||||
- Users belong to a domain.
|
||||
- Users can have multiple authentication methods (password, TOTP, application
|
||||
credentials, federated identity).
|
||||
- Users can be members of groups.
|
||||
- Users receive role assignments on projects, domains, or system scope.
|
||||
|
||||
### 2.4 Groups
|
||||
|
||||
A **group** is a named collection of users. Groups simplify role management: you
|
||||
assign a role to a group on a project, and every user in the group inherits that
|
||||
role.
|
||||
|
||||
- Groups belong to a domain.
|
||||
- Groups are used for role assignment: `group:X → role:member → project:Y`.
|
||||
- Federation mappings often resolve external IdP groups to local Keystone groups.
|
||||
|
||||
### 2.5 Roles
|
||||
|
||||
A **role** is a named permission set. Roles by themselves don't define what
|
||||
operations are allowed — they are labels that policy files map to API operations.
|
||||
|
||||
- Roles are assigned by binding an actor (user or group) to a target (project,
|
||||
domain, or system) with a role.
|
||||
- Assignment format: `{actor, role, target}` — e.g., `{user:alice, member,
|
||||
project:engineering}`.
|
||||
- OpenStack defines default roles: `admin`, `member`, `reader`.
|
||||
- Custom roles can be created. Policy files (policy.yaml) map roles to API
|
||||
operations.
|
||||
- **Implied roles**: one role can imply another (e.g., `admin` implies `member`
|
||||
implies `reader`).
|
||||
- **Inherited roles**: a role assigned on a domain with `inherited_to_projects`
|
||||
flag propagates to all projects within that domain.
|
||||
|
||||
### 2.6 Endpoints
|
||||
|
||||
An **endpoint** is a network-accessible URL for an OpenStack service. Each
|
||||
service registers one or more endpoints in Keystone's service catalog.
|
||||
|
||||
- Endpoints have an **interface** type:
|
||||
- `public` — for end users (public network)
|
||||
- `internal` — for service-to-service communication (internal network)
|
||||
- `admin` — for administrative operations (restricted network)
|
||||
- Endpoints have a **region** attribute for multi-region deployments.
|
||||
- Endpoint URLs can contain template variables like `$(project_id)s` that are
|
||||
resolved at token time.
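
Template resolution amounts to string substitution. The sketch below is illustrative only: `render_endpoint_url` is a hypothetical helper, the compute URL is made up, and unknown placeholders are simply left in place.

```rust
/// Replaces each `$(name)s` placeholder with the matching value.
fn render_endpoint_url(template: &str, vars: &[(&str, &str)]) -> String {
    let mut url = template.to_string();
    for (name, value) in vars {
        url = url.replace(&format!("$({name})s"), value);
    }
    url
}
```

With `project_id` set to `bbb`, the template `https://compute.example.com/v2.1/$(project_id)s` becomes `https://compute.example.com/v2.1/bbb`, so each scoped token carries catalog URLs that are already specific to its project.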
|
||||
|
||||
### 2.7 Service Catalog
|
||||
|
||||
The **service catalog** is a registry of all services available in the
|
||||
deployment and their endpoints. It is included in token responses and is
|
||||
available via `GET /v3/auth/catalog`.
|
||||
|
||||
- A service has a `type` (e.g., `identity`, `compute`, `object-store`) and a
|
||||
`name` (e.g., `keystone`, `nova`, `swift`).
|
||||
- The `type` follows the [service-types authority][] — it identifies the API
|
||||
contract, not the implementation version.
|
||||
- The service catalog in a token is filtered by scope: a project-scoped token
|
||||
shows only endpoints relevant to that project.
|
||||
- Endpoint filtering allows administrators to restrict which endpoints are
|
||||
visible to specific projects via project-endpoint associations or endpoint
|
||||
groups.
|
||||
|
||||
[service-types authority]: https://service-types.openstack.org/
|
||||
|
||||
**Example service catalog entry:**
|
||||
|
||||
```json
|
||||
{
|
||||
"catalog": [
|
||||
{
|
||||
"name": "Keystone",
|
||||
"type": "identity",
|
||||
"endpoints": [
|
||||
{
|
||||
"interface": "public",
|
||||
"url": "https://identity.example.com:5000/"
|
||||
},
|
||||
{
|
||||
"interface": "internal",
|
||||
"url": "https://identity.internal:5000/"
|
||||
},
|
||||
{
|
||||
"interface": "admin",
|
||||
"url": "https://identity.admin:5000/"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Token Lifecycle
|
||||
|
||||
### 3.1 Token Types by Scope
|
||||
|
||||
| Token Type | Scope | Contains | Use Case |
|
||||
|---|---|---|---|
|
||||
| **Unscoped** | None | User identity only, no roles, no catalog | Prove identity for subsequent scoped auth |
|
||||
| **Project-scoped** | Project | Roles, catalog, project info | Operate on project resources (VMs, volumes) |
|
||||
| **Domain-scoped** | Domain | Roles, catalog, domain info | Manage users/projects within a domain |
|
||||
| **System-scoped** | System | Roles, catalog, system info | Cloud-wide admin operations |
|
||||
| **Trust-scoped** | Trust | Delegated roles, trust metadata | Act on behalf of another user |
|
||||
|
||||
### 3.2 Authentication Flow
|
||||
|
||||
```
|
||||
1. Client → POST /v3/auth/tokens (with credentials)
|
||||
2. Keystone validates credentials
|
||||
3. Keystone issues token:
|
||||
- Token ID returned in X-Subject-Token header
|
||||
- Token body (JSON) returned in response body
|
||||
4. Client uses token: X-Auth-Token: <token_id> on subsequent requests
|
||||
5. Services validate token:
   - Option A: Local validation (JWS: signature check against the public key)
   - Option B: Call Keystone to validate (Fernet and UUID tokens)
|
||||
```
|
||||
|
||||
### 3.3 Token Providers
|
||||
|
||||
| Provider | Format | Persistence | Size | Security |
|
||||
|---|---|---|---|---|
|
||||
| **Fernet** (default) | AES256-encrypted ciphertext + SHA256 HMAC | None (self-contained) | ~200 bytes | Symmetric keys; only Keystone can decrypt |
|
||||
| **JWS** | JSON Web Signature (ES256) | None (self-contained) | ~800 bytes | Asymmetric keys; anyone can verify signature, payload is readable |
|
||||
| **UUID** (legacy) | Random UUID string | Database (must be stored) | ~32 bytes | Requires database lookup for validation |
|
||||
|
||||
**Fernet tokens** are the recommended default. They are:
|
||||
- Self-contained: no database persistence needed.
|
||||
- Encrypted: the token payload is opaque to clients.
|
||||
- Compact: much smaller than JWS tokens.
|
||||
- Key rotation: Fernet keys are rotated using `keystone-manage fernet_rotate`.
|
||||
|
||||
**JWS tokens** are appropriate when:
|
||||
- You want asymmetric key verification (services can validate without sharing
|
||||
symmetric keys).
|
||||
- You're comfortable with the payload being readable by anyone who has the token.
|
||||
|
||||
### 3.4 Token Contents
|
||||
|
||||
A project-scoped token contains:
|
||||
|
||||
```json
|
||||
{
|
||||
"token": {
|
||||
"methods": ["password"],
|
||||
"user": {
|
||||
"id": "aaa...",
|
||||
"name": "alice",
|
||||
"domain": { "id": "default", "name": "Default" }
|
||||
},
|
||||
"project": {
|
||||
"id": "bbb...",
|
||||
"name": "engineering",
|
||||
"domain": { "id": "default", "name": "Default" }
|
||||
},
|
||||
"roles": [
|
||||
{ "id": "ccc...", "name": "member" },
|
||||
{ "id": "ddd...", "name": "reader" }
|
||||
],
|
||||
"catalog": [ ... ],
|
||||
"expires_at": "2026-06-08T12:00:00.000000Z",
|
||||
"issued_at": "2026-06-08T11:00:00.000000Z",
|
||||
"audit_ids": ["eeee..."],
|
||||
"is_domain": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Key fields:
|
||||
|
||||
- `methods`: Authentication methods used (e.g., `["password"]` or
|
||||
`["password", "totp"]` for MFA).
|
||||
- `user`: Who the token belongs to.
|
||||
- `project` / `domain` / `system`: The authorization scope.
|
||||
- `roles`: The roles assigned to the user within the scope.
|
||||
- `catalog`: Service catalog (absent in unscoped tokens).
|
||||
- `expires_at` / `issued_at`: Token validity window.
|
||||
- `audit_ids`: Chain of audit IDs for tracking token derivation.
|
||||
|
||||
### 3.5 Token Validation
|
||||
|
||||
When a service receives a request with a token:
|
||||
|
||||
1. Extract `X-Auth-Token` header.
|
||||
2. For Fernet tokens: only Keystone holds the Fernet keys, so the service asks
   Keystone to validate; Keystone decrypts the token, parses the payload,
   verifies expiration, and checks revocation events.
|
||||
3. For JWS tokens: verify signature with public key, parse payload, verify
|
||||
expiration. Check revocation events.
|
||||
4. For UUID tokens: call Keystone to validate. (Legacy; the UUID provider was removed from Keystone in the Rocky release.)
|
||||
|
||||
Keystone middleware (`keystonemiddleware`) handles this automatically for
|
||||
OpenStack services.
|
||||
|
||||
### 3.6 Token Revocation
|
||||
|
||||
Tokens can be revoked explicitly (`DELETE /v3/auth/tokens`) or implicitly via
|
||||
revocation events triggered by:
|
||||
|
||||
- User account disabled
|
||||
- Domain disabled
|
||||
- Project disabled
|
||||
- Password changed (invalidates all tokens for that user)
|
||||
- Role assignment changed (invalidates tokens for the affected scope)
|
||||
|
||||
Revocation events use pattern matching for efficiency — a single event can
|
||||
invalidate many tokens (e.g., all tokens for a user, or all tokens for a project).
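
The pattern-matching idea can be sketched as follows. This is a simplified model, not Keystone's implementation: the struct and field names are chosen for the example, timestamps are plain integers, and an unset field in an event is treated as a wildcard.

```rust
/// Minimal token view used for matching (hypothetical).
struct Token<'a> {
    user_id: &'a str,
    project_id: Option<&'a str>,
    issued_at: u64,
}

/// A revocation event; `None` fields match any token.
struct RevocationEvent<'a> {
    user_id: Option<&'a str>,
    project_id: Option<&'a str>,
    issued_before: u64,
}

/// A token is revoked if any event matches all of its set fields and the
/// token was issued before the event's cutoff.
fn is_revoked(token: &Token, events: &[RevocationEvent]) -> bool {
    events.iter().any(|e| {
        e.user_id.map_or(true, |u| u == token.user_id)
            && e.project_id.map_or(true, |p| token.project_id == Some(p))
            && token.issued_at < e.issued_before
    })
}
```

A single event with only `user_id` set covers every earlier token for that user, regardless of scope, which is how a password change can invalidate all of a user's tokens without listing them.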
|
||||
|
||||
---
|
||||
|
||||
## 4. Scoping
|
||||
|
||||
### 4.1 Unscoped → Scoped Flow
|
||||
|
||||
The typical authentication flow is two-step:
|
||||
|
||||
1. **Authenticate** → receive an **unscoped token** (proves identity, no
|
||||
authorization).
|
||||
2. **Re-authenticate with scope** → receive a **scoped token** (proves identity
|
||||
+ authorization).
|
||||
|
||||
```bash
# KEYSTONE_URL is a placeholder for the deployment's identity endpoint,
# e.g. https://identity.example.com:5000

# Step 1: Get unscoped token (the token ID is returned in the X-Subject-Token header)
curl -i -X POST "$KEYSTONE_URL/v3/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{
  "auth": {
    "identity": {
      "methods": ["password"],
      "password": {
        "user": { "name": "alice", "domain": { "name": "Default" }, "password": "..." }
      }
    }
  }
}'

# Step 2: Get project-scoped token using unscoped token
curl -i -X POST "$KEYSTONE_URL/v3/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{
  "auth": {
    "identity": {
      "methods": ["token"],
      "token": { "id": "<unscoped_token>" }
    },
    "scope": {
      "project": { "name": "engineering", "domain": { "name": "Default" } }
    }
  }
}'
```
|
||||
|
||||
### 4.2 Scope Types and Authorization
|
||||
|
||||
| Scope | Token Can Do | Token Cannot Do |
|
||||
|---|---|---|
|
||||
| **Project** | Operate on project resources (VMs, storage, networks) | Manage domain users, system-wide operations |
|
||||
| **Domain** | Manage users/projects within that domain | Operate on project resources (without project scope) |
|
||||
| **System** | Cloud-wide admin: manage endpoints, services, hypervisor info | Project-specific resource operations |
|
||||
| **None (unscoped)** | Prove identity to Keystone | Access any service resources |
|
||||
|
||||
A project-scoped token **cannot** be reused in a different project. Each scope
|
||||
is a separate token. This is a deliberate security design: token scope limits
|
||||
the blast radius of a compromised token.
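
The table above can be read as a match between the scope a token holds and the kind of operation requested. The following sketch models only that match; the type names are hypothetical, and role checks, which Keystone also applies, are left out.

```rust
/// Scope carried by a token.
#[derive(Debug, PartialEq)]
enum Scope {
    Unscoped,
    Project(String),
    Domain(String),
    System,
}

/// Kind of operation being attempted (hypothetical classification).
enum Operation<'a> {
    ProjectResource { project_id: &'a str },
    DomainAdmin { domain_id: &'a str },
    SystemAdmin,
}

/// True only when the token's scope matches the operation's target exactly.
fn scope_allows(scope: &Scope, op: &Operation) -> bool {
    match (scope, op) {
        (Scope::Project(held), Operation::ProjectResource { project_id }) => {
            held.as_str() == *project_id
        }
        (Scope::Domain(held), Operation::DomainAdmin { domain_id }) => {
            held.as_str() == *domain_id
        }
        (Scope::System, Operation::SystemAdmin) => true,
        // Unscoped tokens, and any scope/target mismatch, are refused.
        _ => false,
    }
}
```

A token scoped to project `engineering` is accepted for resources in `engineering` and refused everywhere else, including system operations, which is the blast-radius limit described above.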
|
||||
|
||||
### 4.3 Design Rationale
|
||||
|
||||
The scoping model exists because:
|
||||
|
||||
1. **Principle of least privilege**: Users authenticate once (expensive), then
|
||||
get narrowly scoped tokens (cheap) for each operation context.
|
||||
2. **Multi-tenancy**: A cloud serves many organizations; project scoping
|
||||
prevents cross-tenant access.
|
||||
3. **Administrative separation**: Domain admins manage users; system admins
|
||||
manage infrastructure. Different scopes for different jobs.
|
||||
|
||||
---
|
||||
|
||||
## 5. Role-Based Access Control (RBAC)
|
||||
|
||||
### 5.1 Role Assignments
|
||||
|
||||
A role assignment binds an **actor** (user or group) to a **role** on a
|
||||
**target** (project, domain, or system).
|
||||
|
||||
The four assignment types:
|
||||
|
||||
| Assignment | Actor | Target | Example |
|
||||
|---|---|---|---|
|
||||
| User → Project | User | Project | Alice is `member` of `engineering` |
|
||||
| Group → Project | Group | Project | `dev-team` group is `member` of `engineering` |
|
||||
| User → Domain | User | Domain | Alice is `admin` of `acme-domain` |
|
||||
| Group → Domain | Group | Domain | `ops-team` group is `admin` of `acme-domain` |
|
||||
|
||||
Plus **system** role assignments for cloud-wide operations.
|
||||
|
||||
### 5.2 Effective Role Assignments
|
||||
|
||||
When querying role assignments with `effective=True`, Keystone resolves:
|
||||
|
||||
1. **Direct assignments**: Roles explicitly granted.
|
||||
2. **Group memberships**: Roles inherited from groups the user belongs to.
|
||||
3. **Inherited roles**: Roles from parent projects or domains (via
|
||||
`inherited_to_projects` flag).
|
||||
4. **Implied roles**: Roles implied by other roles (e.g., `admin` → `member`
|
||||
→ `reader`).
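
The implied-role step of this resolution is a transitive closure over the "implies" relation. The sketch below shows only that step, with a hypothetical `expand_implied` function; group membership and inheritance are assumed to have been resolved into the `direct` list already.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands a set of directly held roles with everything they imply.
fn expand_implied<'a>(
    direct: &[&'a str],
    implies: &HashMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    let mut effective = BTreeSet::new();
    let mut stack: Vec<&'a str> = direct.to_vec();
    while let Some(role) = stack.pop() {
        // `insert` returns false for roles already seen, which also guards
        // against cycles in the implication rules.
        if effective.insert(role) {
            if let Some(next) = implies.get(role) {
                stack.extend(next.iter().copied());
            }
        }
    }
    effective
}
```

With the rules `admin` implies `member` and `member` implies `reader`, a user holding only `admin` ends up with all three roles, while a user holding `member` gets `member` and `reader`.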
|
||||
|
||||
### 5.3 Policy Enforcement
|
||||
|
||||
Keystone uses `oslo.policy` for policy enforcement. Each OpenStack service
|
||||
defines policy rules in `policy.yaml` files. A rule maps an API operation to a
|
||||
check string:
|
||||
|
||||
```yaml
|
||||
"identity:create_project": "role:admin and domain_id:%(target.domain.id)s"
|
||||
"identity:list_projects": "role:reader"
|
||||
"identity:update_project": "role:admin or project_id:%(target.project.id)s"
|
||||
```
|
||||
|
||||
Policy rules can check:
|
||||
|
||||
- Role membership (`role:admin`)
|
||||
- Scope type (`system_scope:all`, `domain_id:...`)
|
||||
- Resource ownership (`user_id:%(target.user.id)s`)
|
||||
- Arbitrary target attributes
|
||||
|
||||
### 5.4 Scope Enforcement in Policy
|
||||
|
||||
Since the Rocky release, policies can require specific token scopes:
|
||||
|
||||
```yaml
# System-scoped token required
"identity:list_projects": "role:reader and system_scope:all"

# Project-scoped token required
"nova:create_server": "role:member and project_id:%(target.project.id)s"
```
|
||||
|
||||
This prevents:
|
||||
- Using a project-scoped token for system operations.
|
||||
- Using a system-scoped token for project operations (without a project context).
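
To make the scope checks concrete, here is a deliberately small, hand-rolled evaluator for check strings like the ones above. It is a sketch under stated assumptions and does not use `oslo.policy`: the function names are hypothetical, only `and`/`or` without parentheses are supported, and the target is assumed to be a flat dictionary keyed by the placeholder name.

```python
import re

_PLACEHOLDER = re.compile(r"%\((.+)\)s")


def _check(atom, creds, target):
    """Evaluate one `kind:match` check against the token's credentials."""
    kind, _, match = atom.partition(":")
    if kind == "role":
        return match in creds.get("roles", [])
    # Resolve %(target.project.id)s-style placeholders from the target dict.
    placeholder = _PLACEHOLDER.fullmatch(match)
    expected = target.get(placeholder.group(1)) if placeholder else match
    actual = creds.get(kind)
    return actual is not None and expected is not None and str(actual) == str(expected)


def enforce(rule, creds, target):
    """True if any `or` clause has all of its `and` checks satisfied."""
    return any(
        all(_check(atom.strip(), creds, target) for atom in clause.split(" and "))
        for clause in rule.split(" or ")
    )


rule = "role:member and project_id:%(target.project.id)s"
target = {"target.project.id": "p1"}
project_token = {"roles": ["member"], "project_id": "p1"}
system_token = {"roles": ["member"], "system_scope": "all"}

print(enforce(rule, project_token, target))  # True
print(enforce(rule, system_token, target))   # False
```

The system-scoped token carries the right role but no `project_id`, so the project-scoped rule rejects it, which is the behaviour described in the list above.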
|
||||
|
||||
---
|
||||
|
||||
## 6. Trust Delegation (OS-TRUST)
|
||||
|
||||
### 6.1 Overview
|
||||
|
||||
Trusts allow one user (**trustor**) to delegate a subset of their authority to
|
||||
another user (**trustee**) for a limited scope and duration, without sharing
|
||||
credentials.
|
||||
|
||||
**Key properties of a trust:**
|
||||
|
||||
| Property | Description |
|---|---|
| `trustor_user_id` | User creating the trust (delegating authority) |
| `trustee_user_id` | User receiving the delegation |
| `project_id` | Project scope for the delegated authority |
| `roles` | Subset of trustor's roles being delegated |
| `impersonation` | If `true`, tokens appear to come from the trustor |
| `expires_at` | Optional expiration timestamp |
| `remaining_uses` | Optional limit on how many tokens can be created from this trust |
| `allow_redelegation` | Whether the trustee can create sub-trusts |
| `redelegation_count` | Maximum depth of redelegation chain |
|
||||
|
||||
### 6.2 Trust-Scoped Tokens
|
||||
|
||||
When a trustee authenticates using a trust:
|
||||
|
||||
1. The trustee authenticates with their own credentials.
|
||||
2. They specify `trust_id` in the auth request.
|
||||
3. Keystone issues a **trust-scoped token** with:
|
||||
- Roles: the intersection of the trust's roles and the trustor's current
|
||||
roles (if trustor lost a role, the trust is invalidated).
|
||||
- `OS-TRUST:trust` section in the token body containing trust metadata.
|
||||
|
||||
If `impersonation=true`, the token's `user` field shows the trustor — the
|
||||
trustee acts as the trustor. If `impersonation=false`, the token's `user`
|
||||
field shows the trustee.
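
The role check and the impersonation switch described above can be sketched as follows. This is a minimal illustration, not Keystone's code: the `Trust` dataclass, the `issue_trust_token` function, and the simplified token body are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trust:
    id: str
    trustor_user_id: str
    trustee_user_id: str
    project_id: str
    roles: frozenset
    impersonation: bool = False


def issue_trust_token(trust, trustor_current_roles):
    """Build a simplified trust-scoped token body for the trustee."""
    # Every delegated role must still be held by the trustor.
    if not trust.roles <= set(trustor_current_roles):
        raise PermissionError("trustor no longer holds a delegated role")
    # With impersonation the token's user is the trustor, otherwise the trustee.
    user = trust.trustor_user_id if trust.impersonation else trust.trustee_user_id
    return {
        "user": {"id": user},
        "project": {"id": trust.project_id},
        "roles": sorted(trust.roles),
        "OS-TRUST:trust": {"id": trust.id, "impersonation": trust.impersonation},
    }


trust = Trust("t1", "alice", "bob", "p1", frozenset({"member"}), impersonation=True)
print(issue_trust_token(trust, {"member", "reader"})["user"])  # {'id': 'alice'}
```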
|
||||
|
||||
### 6.3 Trust Delegation Chains
|
||||
|
||||
Trusts support **redelegation**: a trustee can create a new trust delegating to
|
||||
a third party. This creates a trust chain:
|
||||
|
||||
```
Trustor  → Trust(A) → Trustee1
Trustee1 → Trust(B) → Trustee2   (redelegation)
```
|
||||
|
||||
Delegation depth is controlled by:
|
||||
|
||||
- `allow_redelegation: true/false`
|
||||
- `redelegation_count: N` (decremented on each redelegation; default max is 3)
|
||||
|
||||
**Security constraints:**
|
||||
|
||||
- The redelegated trust's roles must be a subset of the original trustor's
|
||||
roles (not the intermediate trustee's).
|
||||
- If `impersonation=false` in the source trust, the redelegated trust cannot
|
||||
set `impersonation=true`.
|
||||
- Application credentials cannot create or delete trusts (prevents automated
|
||||
escalation chains).
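
A hedged sketch of how these constraints might be validated when a trustee redelegates. The `Trust` dataclass and `redelegate` function are hypothetical. The role check is made against the parent trust's roles, which is an assumption of this sketch; since those are themselves drawn from the original trustor's roles, the constraint stated above still holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trust:
    trustee: str
    roles: frozenset
    impersonation: bool
    allow_redelegation: bool
    redelegation_count: int


def redelegate(parent, trustee, roles, impersonation):
    """Create a child trust from `parent`, enforcing the redelegation rules."""
    if not parent.allow_redelegation or parent.redelegation_count < 1:
        raise PermissionError("parent trust does not permit redelegation")
    if not set(roles) <= parent.roles:
        raise PermissionError("requested roles exceed the delegated roles")
    if impersonation and not parent.impersonation:
        raise PermissionError("cannot enable impersonation on a redelegated trust")
    remaining = parent.redelegation_count - 1  # decremented on each hop
    return Trust(trustee, frozenset(roles), impersonation, remaining > 0, remaining)


trust_a = Trust("trustee1", frozenset({"member", "reader"}), False, True, 1)
trust_b = redelegate(trust_a, "trustee2", {"reader"}, impersonation=False)
print(trust_b.redelegation_count, trust_b.allow_redelegation)  # 0 False
```

With a starting count of 1, `trust_b` is the end of the chain: a further `redelegate(trust_b, ...)` call raises `PermissionError`.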
|
||||
|
||||
### 6.4 Automatic Trust Revocation
|
||||
|
||||
Trusts are automatically revoked (soft-deleted) when:
|
||||
|
||||
- The trustor is deleted.
|
||||
- The trustee is deleted.
|
||||
- The project is deleted.
|
||||
- The trust expires (`expires_at`).
|
||||
- The remaining uses are exhausted (`remaining_uses` reaches 0).
|
||||
- The trustor loses a role that was delegated in the trust.
|
||||
|
||||
---
|
||||
|
||||
## 7. Application Credentials
|
||||
|
||||
### 7.1 Overview
|
||||
|
||||
Application credentials allow users to create long-lived, restricted credentials
|
||||
for applications without exposing their password. This is especially important
|
||||
for users whose identity comes from LDAP or SSO — applications can't use their
|
||||
password.
|
||||
|
||||
**Key properties:**
|
||||
|
||||
| Property | Description |
|---|---|
| `name` | Unique name within the user's application credentials |
| `secret` | Auto-generated or user-provided secret (hashed on storage, shown once) |
| `project_id` | Project scope (always the user's current project) |
| `roles` | Subset of the user's roles on the project (cannot exceed user's roles) |
| `expires_at` | Optional expiration timestamp |
| `unrestricted` | `false` by default: restricted from creating/deleting other app creds and trusts |
|
||||
|
||||
### 7.2 Authentication with Application Credentials
|
||||
|
||||
```bash
# Auth with application credential ID + secret
curl -X POST /v3/auth/tokens -d '{
  "auth": {
    "identity": {
      "methods": ["application_credential"],
      "application_credential": {
        "id": "aa809205ed614a0e854bac92c0768bb9",
        "secret": "oKce6DOC_WcZoE13l3eX..."
      }
    }
  }
}'
```
|
||||
|
||||
Or by name + user:
|
||||
|
||||
```json
"application_credential": {
  "name": "monitoring",
  "user": { "name": "glance", "domain": { "name": "Default" } },
  "secret": "securesecret"
}
```
|
||||
|
||||
### 7.3 Restriction Model
|
||||
|
||||
By default (`unrestricted=false`), application credentials **cannot**:
|
||||
|
||||
- Create or delete other application credentials.
|
||||
- Create or delete trusts.
|
||||
- List other application credentials.
|
||||
|
||||
This prevents a compromised app credential from regenerating itself or escalating
|
||||
privileges. Setting `unrestricted=true` removes these restrictions, but adds
|
||||
risk.
|
||||
|
||||
### 7.4 Rotation
|
||||
|
||||
Application credentials support **zero-downtime rotation**:
|
||||
|
||||
1. Create a new application credential (names must be unique per user).
|
||||
2. Update the application configuration with the new ID/secret.
|
||||
3. Delete the old application credential.
|
||||
|
||||
Multiple application credentials can coexist for the same user+project,
|
||||
enabling seamless transitions.
|
||||
|
||||
### 7.5 Invalidation
|
||||
|
||||
Application credentials are automatically invalidated when:
|
||||
|
||||
- The user is deleted or disabled.
|
||||
- The user's role assignment on the project changes (roles are checked at
|
||||
auth time against the user's current roles).
|
||||
- The project is deleted or disabled.
|
||||
- The credential expires (`expires_at`).
|
||||
- The credential is explicitly deleted.
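
The "hashed on storage, shown once" property and the auth-time checks listed above can be sketched together. This is an illustration only: the function names and record layout are hypothetical, and SHA-256 is used as a stand-in for whatever password hash a real server applies to the secret.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone


def _digest(secret):
    # Assumption: SHA-256 stands in for the server's real secret-hashing scheme.
    return hashlib.sha256(secret.encode()).hexdigest()


def create_app_credential(roles, user_roles, expires_at=None):
    """Return (stored_record, plaintext_secret); the secret is shown only once."""
    if not set(roles) <= set(user_roles):
        raise ValueError("roles must be a subset of the user's roles")
    secret = secrets.token_urlsafe(32)
    record = {"secret_hash": _digest(secret), "roles": set(roles), "expires_at": expires_at}
    return record, secret


def authenticate(record, secret, user_enabled, user_roles, now=None):
    """Re-check the invalidation conditions at authentication time."""
    now = now or datetime.now(timezone.utc)
    if not hmac.compare_digest(record["secret_hash"], _digest(secret)):
        return False
    if not user_enabled:
        return False
    if record["expires_at"] is not None and now >= record["expires_at"]:
        return False
    # Roles are compared with the user's current assignments, not a snapshot.
    return record["roles"] <= set(user_roles)


expiry = datetime.now(timezone.utc) + timedelta(hours=1)
record, secret = create_app_credential({"member"}, {"member", "reader"}, expiry)
print(authenticate(record, secret, True, {"member", "reader"}))  # True
print(authenticate(record, secret, True, {"reader"}))            # False
```

Because nothing is cached in the credential beyond the role names, removing the user's `member` assignment is enough to make the next authentication fail.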
|
||||
|
||||
---
|
||||
|
||||
## 8. Federation
|
||||
|
||||
### 8.1 Overview
|
||||
|
||||
Keystone's federation module allows external Identity Providers (IdPs) to
|
||||
authenticate users, with Keystone acting as a Service Provider (SP). Keystone
|
||||
maps the external identity to local users, groups, and roles.
|
||||
|
||||
**Supported protocols:**
|
||||
|
||||
| Protocol | Module | Use Case |
|---|---|---|
| **SAML 2.0** | mod_shib / mod_auth_mellon | Enterprise SSO |
| **OpenID Connect** | mod_auth_openidc | OAuth2/OIDC providers (Google, Keycloak, Okta) |
| **Mapped** | Custom auth module | Any HTTP auth module |
| **K2K** | Keystone-to-Keystone | Multi-cloud federation between OpenStack deployments |
|
||||
|
||||
### 8.2 Federation Architecture
|
||||
|
||||
```
                                    ┌──────────────────┐
                                    │   External IdP   │
                                    │  (SAML/OIDC/...) │
                                    └────────┬─────────┘
                                             │
                                     SAML assertion or
                                        OIDC claims
                                             │
                                             ▼
┌──────────┐    HTTPD auth module    ┌───────────────┐
│ Browser  │ ───────────────────────▶│ Apache/Nginx  │
│ or CLI   │    (mod_shib /          │ + auth module │
└──────────┘    mod_auth_openidc)    └───────┬───────┘
                                             │
                                    REMOTE_USER header
                                    + other attributes
                                             │
                                             ▼
                                    ┌──────────────────┐
                                    │     Keystone     │
                                    │       (SP)       │
                                    │                  │
                                    │ 1. Lookup IdP    │
                                    │ 2. Apply mapping │
                                    │    remote attrs  │
                                    │    → local user, │
                                    │      groups,     │
                                    │      roles       │
                                    │ 3. Issue token   │
                                    └──────────────────┘
```
|
||||
|
||||
### 8.3 Key Federation Components
|
||||
|
||||
1. **Identity Provider** object — represents the external IdP in Keystone.
|
||||
Has `remote_ids` (entity IDs) that Keystone uses to match incoming
|
||||
requests.
|
||||
|
||||
2. **Mapping** — a set of rules that transform attributes from the external IdP
|
||||
into Keystone-local user properties and group memberships. Mappings can:
|
||||
- Map remote users to local users (by name, email, or other attributes).
|
||||
- Assign users to local groups (inherit group role assignments).
|
||||
- Dynamically create projects based on remote attributes.
|
||||
- Support complex condition logic.
|
||||
|
||||
3. **Protocol** — links an Identity Provider to a Mapping. Supported values:
|
||||
`saml2`, `openid`, `mapped`, or custom.
|
||||
|
||||
4. **Mapping rule example:**
|
||||
|
||||
```json
[{
  "local": [{
    "user": { "name": "{0}" },
    "group": { "domain": { "name": "Default" }, "name": "federated_users" }
  }],
  "remote": [{ "type": "REMOTE_USER" }]
}]
```
|
||||
|
||||
This maps all authenticated external users to a local user (named by the
|
||||
`REMOTE_USER` attribute) and adds them to the `federated_users` group.
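
How such a rule turns remote attributes into a local identity can be sketched as below. This is a toy evaluator, not Keystone's mapping engine: `apply_mapping` and `_fill` are hypothetical names, and only positional `{0}`, `{1}`, ... substitution is handled (no conditions, allow/deny lists, or regex matching).

```python
def _fill(template, values):
    """Recursively substitute {0}, {1}, ... placeholders in a `local` block."""
    if isinstance(template, str):
        return template.format(*values)
    if isinstance(template, dict):
        return {key: _fill(value, values) for key, value in template.items()}
    if isinstance(template, list):
        return [_fill(item, values) for item in template]
    return template


def apply_mapping(rules, attributes):
    """Return the filled `local` block of the first rule whose remotes all match."""
    for rule in rules:
        values = [attributes.get(remote["type"]) for remote in rule["remote"]]
        if all(value is not None for value in values):
            return _fill(rule["local"], values)
    return None


rules = [{
    "local": [{
        "user": {"name": "{0}"},
        "group": {"domain": {"name": "Default"}, "name": "federated_users"},
    }],
    "remote": [{"type": "REMOTE_USER"}],
}]

local = apply_mapping(rules, {"REMOTE_USER": "alice@example.com"})
print(local[0]["user"])  # {'name': 'alice@example.com'}
```

The i-th entry in `remote` supplies the value for placeholder `{i}` in `local`, which is why a single `REMOTE_USER` entry is enough to fill `{0}`.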
|
||||
|
||||
### 8.4 Federation Token Flow
|
||||
|
||||
1. User authenticates with the external IdP.
|
||||
2. The HTTPD auth module (Apache/Nginx) validates the assertion and sets
|
||||
`REMOTE_USER` and other headers.
|
||||
3. Keystone receives the request at `/v3/OS-FEDERATION/identity_providers/{idp}/protocols/{protocol}/auth`.
|
||||
4. Keystone applies the mapping rules to produce a local user + groups + roles.
|
||||
5. Keystone issues a **federated unscoped token**.
|
||||
6. The user can then exchange it for a scoped token (project, domain, or
|
||||
system) just like any other unscoped token.
|
||||
|
||||
### 8.5 Identity Provider (Keystone as IdP)
|
||||
|
||||
Keystone can also act as an **Identity Provider** (SAML IdP), allowing it to
|
||||
authenticate users from other OpenStack deployments (K2K federation) or other
|
||||
SAML SPs.
|
||||
|
||||
---
|
||||
|
||||
## 9. Service Catalog Deep Dive
|
||||
|
||||
### 9.1 Service Registration
|
||||
|
||||
Services are registered with Keystone via the API:
|
||||
|
||||
```bash
openstack service create --name nova --description "Compute" compute
openstack endpoint create --region RegionOne compute public https://nova.example.com:8774/
openstack endpoint create --region RegionOne compute internal https://nova.internal:8774/
openstack endpoint create --region RegionOne compute admin https://nova.admin:8774/
```
|
||||
|
||||
### 9.2 Catalog Filtering
|
||||
|
||||
The catalog returned in a token is filtered by:
|
||||
|
||||
1. **Scope**: A project-scoped token includes endpoints filtered by
|
||||
project-endpoint associations.
|
||||
2. **Endpoint groups**: Admins can define endpoint groups (filtered by service
|
||||
type, region, or interface) and associate them with projects.
|
||||
3. **Enabled/disabled**: Disabled services and endpoints don't appear in the
|
||||
catalog.
|
||||
4. **Interface visibility**: `public`, `internal`, and `admin` endpoints serve
|
||||
different audiences.
|
||||
|
||||
### 9.3 URL Templating
|
||||
|
||||
Endpoint URLs support template variables:
|
||||
|
||||
- `$(project_id)s` — replaced with the token's project ID
|
||||
- `$(user_id)s` — replaced with the token's user ID
|
||||
|
||||
Example:
|
||||
|
||||
```
https://object-store.example.com/v1/KEY_$(project_id)s
```
|
||||
|
||||
When a project-scoped token is issued, the catalog resolves this to:
|
||||
|
||||
```
https://object-store.example.com/v1/KEY_d12af07f4e2c4390a21acc31517ebec9
```
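
The substitution itself amounts to a plain string replacement. A minimal sketch, assuming only the two template variables listed above; `resolve_endpoint` is a hypothetical helper, not a Keystone function.

```python
def resolve_endpoint(template, project_id, user_id):
    """Fill the $(project_id)s and $(user_id)s variables in an endpoint URL."""
    return (
        template
        .replace("$(project_id)s", project_id)
        .replace("$(user_id)s", user_id)
    )


url = resolve_endpoint(
    "https://object-store.example.com/v1/KEY_$(project_id)s",
    project_id="d12af07f4e2c4390a21acc31517ebec9",
    user_id="unused-in-this-template",
)
print(url)  # https://object-store.example.com/v1/KEY_d12af07f4e2c4390a21acc31517ebec9
```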
|
||||
|
||||
### 9.4 Client Discovery
|
||||
|
||||
An OpenStack client authenticates with Keystone, receives a token (which
|
||||
includes the service catalog), and then uses the catalog to discover the URL
|
||||
for any service it needs:
|
||||
|
||||
```python
# After authentication, the catalog is in the token response body.
# `token` is the parsed "token" object from that response.
nova_url = None
for service in token['catalog']:
    if service['type'] != 'compute':
        continue
    for endpoint in service['endpoints']:
        if endpoint['interface'] == 'public':
            nova_url = endpoint['url']
            break
    if nova_url:
        break  # stop scanning once a public compute endpoint is found
```
|
||||
|
||||
This is how every OpenStack client discovers service endpoints — they never
|
||||
hardcode URLs. They authenticate once, get the catalog, and dynamically route
|
||||
to the correct endpoint.
|
||||
|
||||
---
|
||||
|
||||
## 10. Mapping to alknet Concepts
|
||||
|
||||
### 10.1 Concept Comparison Table
|
||||
|
||||
| Keystone Concept | alknet Concept | Notes |
|---|---|---|
| Domain | (Not directly mapped) | alknet is single-tenant/small-team focused; no need for domain-level admin boundaries yet |
| Project | `Identity.resources` | Projects scope resources; alknet's `resources: HashMap<String, Vec<String>>` serves a similar scoping purpose |
| User | `Identity.id` | Keystone users ↔ alknet identities (fingerprint or UUID) |
| Group | (Not directly mapped) | Could be added via `Identity.scopes` patterns or a groups concept in alknet-storage |
| Role | `Identity.scopes` | Keystone roles map to alknet scopes: `["relay:connect", "service:gitea:read"]` ≈ role assignments |
| Token (scoped) | `AuthToken` + scoped permissions | alknet's AuthToken proves identity + timestamp; scopes come from IdentityProvider lookup |
| Service Catalog | `OperationRegistry` + OpenAPI spec generation | Both solve service discovery; Keystone is runtime API catalog, alknet generates from OpenAPI |
| Trust Delegation | (Potential future model) | alknet doesn't have delegation yet; trust model could inspire future `DelegationToken` |
| Application Credentials | API keys in `api_keys` table | alknet's `api_keys` table parallels app creds: long-lived, scoped, user-bound |
| Federation (SAML/OIDC) | Phase D OIDC provider aspiration | alknet wants to *be* an OIDC provider; Keystone consumes external IdPs |
| Service Endpoint | (Implicit in OperationEnv) | alknet operations are discovered via registry, not external endpoint lookup |
| Policy (policy.yaml) | `ForwardingPolicy` + call protocol ACL | Both enforce "who can do what where"; alknet is code-based, not YAML-configured |
|
||||
|
||||
### 10.2 What to Adopt from Keystone
|
||||
|
||||
#### 10.2.1 Scoped Tokens (Strong Adopt)
|
||||
|
||||
**Keystone pattern**: Unscoped → project/domain/system scoped token flow.
|
||||
|
||||
**alknet application**: Currently, `AuthToken` proves identity with a timestamp.
|
||||
`Identity.scopes` and `Identity.resources` are resolved *after* token
|
||||
verification by `IdentityProvider`. This is analogous to Keystone's flow:
|
||||
|
||||
| Keystone | alknet |
|---|---|
| Unscoped token (identity only) | AuthToken (proves key possession + timestamp) |
| Scoped token (identity + roles + catalog) | Identity (resolved by IdentityProvider with scopes + resources) |
| Re-auth with scope | Not needed; alknet scopes come from the `IdentityProvider` lookup |
|
||||
|
||||
**Recommendation**: alknet's current model is already similar to Keystone's, but
|
||||
more streamlined. alknet doesn't need a separate "re-auth with scope" step
|
||||
because the `IdentityProvider` resolution *is* the scoping step. However,
|
||||
consider adding explicit scope fields to the token in the future for
|
||||
multi-tenant deployments.
|
||||
|
||||
#### 10.2.2 Service Catalog Pattern (Strong Adopt)
|
||||
|
||||
**Keystone pattern**: Services register endpoints; clients discover them from
|
||||
the token/catalog.
|
||||
|
||||
**alknet application**: The `OperationRegistry` + `OpenAPIServiceRegistry`
|
||||
serves a similar purpose:
|
||||
|
||||
- Keystone: `POST /v3/auth/tokens` → response includes catalog of services
|
||||
and URLs.
|
||||
- alknet: `OperationRegistry` knows all available operations; `FromOpenAPI`
|
||||
generates them from specs.
|
||||
|
||||
**Key difference**: In Keystone, the catalog is returned *with the token* and
|
||||
is dynamic (filtered by project scope). In alknet, the registry is built at
|
||||
startup from configuration, and access control is enforced per-operation in the
|
||||
call protocol.
|
||||
|
||||
**Recommendation**: Consider adding a "service discovery" operation to the
|
||||
call protocol — a way for clients to ask "what operations are available to me?"
|
||||
This would be analogous to Keystone's `GET /v3/auth/catalog`.
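
One possible shape for such a discovery operation is sketched below in Python for brevity. Everything here is an assumption for illustration: the registry contents, the operation names, and the glob-style matching of granted scopes are hypothetical and not part of alknet's `OperationRegistry`.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: operation name -> scope required to call it.
REGISTRY = {
    "relay.connect": "relay:connect",
    "gitea.repo.read": "service:gitea:read",
    "gitea.repo.write": "service:gitea:write",
}


def list_operations(granted_scopes):
    """Return the operations whose required scope matches any granted scope."""
    return sorted(
        name
        for name, required in REGISTRY.items()
        if any(fnmatchcase(required, granted) for granted in granted_scopes)
    )


print(list_operations(["relay:connect", "service:gitea:read"]))
# ['gitea.repo.read', 'relay.connect']
```

Like Keystone's catalog, the answer is filtered per caller: an identity holding only `service:gitea:*` would see both gitea operations and nothing else.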
|
||||
|
||||
#### 10.2.3 Role Hierarchies and Implied Roles (Moderate Adopt)
|
||||
|
||||
**Keystone pattern**: Roles can imply other roles (`admin` → `member` →
|
||||
`reader`). Role assignments on domains propagate to projects via inheritance.
|
||||
|
||||
**alknet application**: Currently, alknet's scopes are flat strings. Consider:
|
||||
|
||||
```
admin:service:* → implies → member:service:* → implies → reader:service:*
```
|
||||
|
||||
This would simplify scope assignment in the `IdentityProvider`: grant `admin:service:*`
|
||||
and automatically get `member` and `reader` permissions.
|
||||
|
||||
**Recommendation**: Implement implied scopes as a Phase 2+ feature when
|
||||
alknet-storage adds the ACL graph. Don't over-engineer in Phase 1.
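
As a rough illustration of the idea, shown in Python for brevity, an implied-scope expansion could look like this. The `level:rest` scope layout and the `LEVELS` ordering are assumptions taken from the example above, not an existing alknet format.

```python
# Strongest level first; each level implies every level after it.
LEVELS = ["admin", "member", "reader"]


def expand_scope(scope):
    """Expand `admin:x` into `admin:x`, `member:x`, `reader:x`, and so on."""
    level, sep, rest = scope.partition(":")
    if not sep or level not in LEVELS:
        return {scope}  # scopes outside the hierarchy stay as they are
    return {f"{lower}:{rest}" for lower in LEVELS[LEVELS.index(level):]}


print(sorted(expand_scope("admin:service:*")))
# ['admin:service:*', 'member:service:*', 'reader:service:*']
```

Expanding at lookup time keeps stored grants short (one `admin:service:*` entry) while the authorization check still sees the full set.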
|
||||
|
||||
#### 10.2.4 Application Credentials (Strong Adopt, already paralleled)
|
||||
|
||||
**Keystone pattern**: Password-less auth with restricted capabilities, tied to a
|
||||
user and project, with expiration and rotation support.
|
||||
|
||||
**alknet application**: The `api_keys` table in alknet-storage closely parallels this:
|
||||
|
||||
| Keystone App Credential | alknet API Key |
|---|---|
| `id` + `secret` | `key_prefix` + `key_hash` |
| `roles` (subset of user's roles) | `scopes` (subset of account's scopes) |
| `project_id` (scope) | Account-scoped |
| `expires_at` | `expires_at` |
| `unrestricted` | (not yet implemented) |
| Rotation via create-new-then-delete | (not yet implemented) |
|
||||
|
||||
**Recommendation**: Add the `unrestricted` concept to API keys — by default,
|
||||
API keys should NOT be able to create or delete other API keys or modify
|
||||
account settings. Also add rotation support (create new key, update config,
|
||||
delete old key).
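
A hedged sketch of what the `unrestricted` gate could look like, written in Python for brevity. The `ApiKey` fields beyond `key_prefix` and `scopes`, the operation names, and the `authorize` function are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of operations that only unrestricted keys may perform.
MANAGEMENT_OPS = {"api_key.create", "api_key.delete", "account.update"}


@dataclass(frozen=True)
class ApiKey:
    key_prefix: str
    scopes: frozenset
    unrestricted: bool = False  # restricted by default, as with Keystone app creds


def authorize(key, operation, required_scope):
    """Deny key/account management to restricted keys, then check the scope."""
    if operation in MANAGEMENT_OPS and not key.unrestricted:
        return False
    return required_scope in key.scopes


key = ApiKey("ak_1234", frozenset({"service:gitea:read", "account:admin"}))
print(authorize(key, "gitea.repo.read", "service:gitea:read"))  # True
print(authorize(key, "api_key.create", "account:admin"))        # False
```

The second call fails even though the key carries `account:admin`, so a leaked restricted key cannot mint a replacement for itself.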
|
||||
|
||||
#### 10.2.5 Trust Delegation (Future Consideration)
|
||||
|
||||
**Keystone pattern**: Trustor delegates limited authority to trustee with
|
||||
impersonation, expiration, usage limits, and redelegation chains.
|
||||
|
||||
**alknet application**: alknet doesn't have this yet, but it could be useful
|
||||
for:
|
||||
|
||||
- **Service-to-service auth**: An alknet node delegates limited authority to a
|
||||
service wrapper (e.g., "let the rustfs wrapper access S3 on my behalf for 1
|
||||
hour").
|
||||
- **Temporary access grants**: "Give Alice access to the `engineering` scope
|
||||
for 24 hours."
|
||||
- **Impersonation for audit**: Trusted services acting on behalf of a user,
|
||||
with the user's identity appearing in audit logs.
|
||||
|
||||
**Recommendation**: Design a `DelegationToken` or `Trust` model when
|
||||
alknet-storage is built. The trust model — trustor, trustee, roles, expiration,
|
||||
remaining_uses — is a good template.
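
As a starting point for that design, the fields named above could be carried by a small record like the following. This is a speculative sketch in Python, not a proposed alknet API: `DelegationToken`, its field names, and `consume` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DelegationToken:
    trustor: str
    trustee: str
    scopes: frozenset
    expires_at: datetime
    remaining_uses: Optional[int] = None  # None means unlimited

    def consume(self, now=None):
        """Return True and count one use if the delegation is still valid."""
        now = now or datetime.now(timezone.utc)
        if now >= self.expires_at:
            return False
        if self.remaining_uses is not None:
            if self.remaining_uses <= 0:
                return False
            self.remaining_uses -= 1
        return True


grant = DelegationToken(
    trustor="node-a",
    trustee="rustfs-wrapper",
    scopes=frozenset({"service:s3:read"}),
    expires_at=datetime.now(timezone.utc) + timedelta(hours=1),
    remaining_uses=2,
)
print(grant.consume(), grant.consume(), grant.consume())  # True True False
```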
|
||||
|
||||
#### 10.2.6 Federation (Phase D Alignment)
|
||||
|
||||
**Keystone pattern**: External IdPs (SAML, OIDC) authenticate users; Keystone
|
||||
maps them to local identities via mapping rules.
|
||||
|
||||
**alknet application**: Phase D of `credential-provider.md` envisions alknet
|
||||
*as* an OIDC provider for self-hosted services. This is the **inverse** of
|
||||
Keystone's federation model:
|
||||
|
||||
- Keystone: external IdP → Keystone (SP) → local identity
|
||||
- alknet Phase D: alknet (IdP) → rustfs/gitea (SP) → local identity on self-hosted service
|
||||
|
||||
**Key learning from Keystone's federation model**:
|
||||
|
||||
1. **Mapping rules** are critical. Keystone's mapping engine (`local` ← `remote`)
|
||||
is how IdP attributes become local roles. alknet will need the inverse:
|
||||
`Identity.scopes` → OIDC claims → rustfs/gitea policies.
|
||||
2. **Group membership from federation** is temporary by default (valid for
|
||||
token lifetime). alknet should consider whether federated identities are
|
||||
permanent or session-scoped.
|
||||
3. **Multiple IdP support**: Keystone can consume from multiple external IdPs.
|
||||
alknet Phase D should support multiple SPs (multiple self-hosted services)
|
||||
consuming from one alknet IdP.
|
||||
|
||||
**Recommendation**: When building Phase D, study Keystone's mapping rule
|
||||
format. alknet will need a similar concept: `alknet.scope → oidc.claim →
|
||||
service.policy`. This could be part of the `CredentialProvider` or a new
|
||||
`IdentityMappingProvider`.
|
||||
|
||||
### 10.3 What NOT to Adopt from Keystone
|
||||
|
||||
#### 10.3.1 Domains (Not Needed)
|
||||
|
||||
Keystone's domain model is designed for multi-tenant cloud hosting where
|
||||
different organizations share the same OpenStack deployment. alknet is designed
|
||||
for self-hosted, single-organization or small-team deployments. The domain
|
||||
concept adds complexity that doesn't justify itself in alknet's use case.
|
||||
|
||||
alknet's `Identity.resources` already provides a lightweight scoping mechanism
|
||||
that covers the "which resources does this identity have access to" use case
|
||||
without the overhead of a domain hierarchy.
|
||||
|
||||
#### 10.3.2 Separate Policy Engine (Over-Engineering)
|
||||
|
||||
Keystone's `oslo.policy` is a full YAML-based policy engine with complex rule
|
||||
combinations (`role:admin AND domain_id:%(target.domain.id)s OR
|
||||
project_id:%(target.project.id)s`). alknet's authorization model is
|
||||
programmatic (Rust code in `ForwardingPolicy` and call protocol handlers), not
|
||||
configured via YAML. This is appropriate for alknet's size and complexity.
|
||||
|
||||
**If** alknet needs configurable policies in the future (e.g., admin-editable
|
||||
ACL rules stored in the database), a simple rule engine would suffice — not the
|
||||
full oslo.policy model.
|
||||
|
||||
#### 10.3.3 Multiple Token/Scope Types (Unnecessary Complexity)
|
||||
|
||||
Keystone has separate token types for project/domain/system scope. alknet's
|
||||
`AuthToken` is already simpler: it proves identity + timestamp, and the
|
||||
`IdentityProvider` resolves scopes. There's no need for alknet to issue
|
||||
different token types for different scopes.
|
||||
|
||||
If multi-tenancy is added in the future, the `Identity.resources` map can
|
||||
encode project equivalents without needing a separate token type.
|
||||
|
||||
#### 10.3.4 Service Endpoint Registration (Unnecessary)
|
||||
|
||||
Keystone requires every service to register its endpoints in the catalog
|
||||
before it can be discovered. alknet services are registered programmatically
|
||||
(via `OperationRegistry::register()`) at startup, not via a central API. The
|
||||
`OperationRegistry` is built from configuration and OpenAPI specs, not from a
|
||||
catalog service.
|
||||
|
||||
This is appropriate for alknet's architecture: services are known at deploy
|
||||
time, not dynamically registered. If dynamic service discovery is needed later,
|
||||
a simple registry operation in the call protocol would suffice.
|
||||
|
||||
---
|
||||
|
||||
## 11. Summary of Recommendations
|
||||
|
||||
| Keystone Concept | Adoption Level | alknet Implementation |
|---|---|---|
| **Scoped tokens** | ✅ Strong Adopt | Already present in IdentityProvider resolution (AuthToken → Identity with scopes/resources) |
| **Service catalog** | ✅ Strong Adopt | `OperationRegistry` + `FromOpenAPI`; consider adding "list operations" discovery |
| **Application credentials** | ✅ Strong Adopt | `api_keys` table parallels exactly; add `unrestricted` flag and rotation support |
| **Role hierarchies / implied roles** | ⚡ Moderate | Implied scope hierarchies in Phase 2+ when ACL graph is built |
| **Trust delegation** | ⚡ Moderate | Design `DelegationToken` model for service-to-service and temporary access in Phase 2+ |
| **Federation mapping** | ⚡ Moderate | Phase D: adopt `scope → claim → policy` mapping pattern for OIDC provider |
| **Token revocation events** | ⚡ Moderate | Consider pattern-matching revocation for efficiency when alknet-storage supports it |
| **Domains** | ❌ Skip | alknet is self-hosted/small-team; `Identity.resources` provides lightweight scoping |
| **oslo.policy (YAML-based)** | ❌ Skip | alknet uses programmatic auth (Rust code); add simple rule engine only if needed |
| **Multiple token types** | ❌ Skip | One token type with scope resolution via `IdentityProvider` is sufficient |
| **Endpoint registration API** | ❌ Skip | `OperationRegistry` is configured at startup, not via a catalog API |
|
||||
|
||||
---
|
||||
|
||||
## 12. References
|
||||
|
||||
- [Keystone Architecture — OpenStack Docs](https://docs.openstack.org/keystone/2024.2/getting-started/architecture.html)
|
||||
- [Keystone Tokens Overview](https://docs.openstack.org/keystone/latest/admin/tokens-overview.html)
|
||||
- [Keystone Service Catalog Overview](https://docs.openstack.org/keystone/latest/contributor/service-catalog.html)
|
||||
- [Keystone Trusts Documentation](https://docs.openstack.org/keystone/latest/user/trusts.html)
|
||||
- [Keystone Application Credentials](https://docs.openstack.org/keystone/queens/user/application_credentials.html)
|
||||
- [Keystone Federation Configuration](https://docs.openstack.org/keystone/latest/admin/federation/configure_federation.html)
|
||||
- [Keystone RBAC and Authorization — DeepWiki](https://deepwiki.com/openstack/keystone/4-authorization-and-access-control)
|
||||
- [Keystone Authentication and Token Management — DeepWiki](https://deepwiki.com/openstack/keystone/3-authentication-and-token-management)
|
||||
- [Keystone Trust Delegation — DeepWiki](https://deepwiki.com/openstack/keystone/4.4-trust-delegation)
|
||||
- [Keystone Service Catalog — DeepWiki](https://deepwiki.com/openstack/keystone/5.4-service-catalog)
|
||||
- [Keystone Token Revocation — DeepWiki](https://deepwiki.com/openstack/keystone/3.4-token-revocation)
|
||||
- [Understanding OpenStack Keystone: Scoped vs. Unscoped Tokens](https://osie.io/blog/understanding-openstack-keystone-scoped-vs-unscoped-tokens)
|
||||
- [Trust Delegation in OpenStack Using Keystone Trusts](https://blog.zhaw.ch/icclab/trust-delegation-in-openstack-using-keystone-trusts/)
|
||||
- [OpenStack Knowledge: Keystone Federation](https://github.com/stackers-network/openstack-knowledge/blob/main/core/identity/federation.md)
|
||||
- [alknet identity.md](../../architecture/identity.md)
|
||||
- [alknet auth.md](../../architecture/auth.md)
|
||||
- [alknet credential-provider.md](../phase2/credential-provider.md)
|
||||
@@ -1,137 +0,0 @@
|
||||
# Polyglot: Research Overview
|
||||
|
||||
**Library**: `polyglot-sql` (Rust crate) / `@polyglot-sql/sdk` (TypeScript/WASM) / `polyglot-sql` (Python)
|
||||
**Repository**: <https://github.com/tobilg/polyglot>
|
||||
**Current Version**: 0.4.4 (as of 2026-06-03)
|
||||
**License**: MIT (+ sqlglot MIT for test fixtures)
|
||||
**Author**: Tobias G. (tobilg)
|
||||
**Inspiration**: Python [sqlglot](https://github.com/tobymao/sqlglot) by Toby Mao
|
||||
|
||||
---
|
||||
|
||||
## 1. What Is Polyglot?
|
||||
|
||||
Polyglot is a **SQL transpiler** — it parses SQL from one database dialect into an AST, and generates SQL for a different dialect. It is **not** a database driver, ORM, query executor, or connection pool. Its core purpose is **dialect-agnostic SQL manipulation**: parse, transform, validate, format, and transpile SQL across 32+ database dialects.
|
||||
|
||||
### Key Capabilities
|
||||
|
||||
| Capability | Description |
|---|---|
| **Parse** | Convert SQL string → typed AST with 200+ expression node types |
| **Generate** | Convert AST → SQL string for any supported dialect |
| **Transpile** | Convert SQL from dialect A → dialect B in one call |
| **Format** | Pretty-print SQL with configurable guard rails |
| **Build** | Construct SQL programmatically via a fluent builder API |
| **Validate** | Syntax + semantic validation with error positions |
| **Lineage** | Trace column lineage through queries; generate OpenLineage payloads |
| **Diff** | AST-aware diff between two SQL expressions |
| **Traverse** | DFS/BFS iterators, predicate queries, and transforms on the AST |
|
||||
|
||||
### Supported Dialects (32)
|
||||
|
||||
Athena, BigQuery, ClickHouse, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL
|
||||
|
||||
Plus a `Generic` dialect for standard SQL.
|
||||
|
||||
### Language Bindings
|
||||
|
||||
| Binding | Package | Delivery |
|---|---|---|
| **Rust** | `polyglot-sql` on crates.io | Native Rust crate |
| **TypeScript/WASM** | `@polyglot-sql/sdk` on npm | WASM module + JS wrapper |
| **Python** | `polyglot-sql` on PyPI | PyO3 native extension |
| **Go** | `github.com/tobilg/polyglot/packages/go` | PureGo wrapper over C FFI |
| **C FFI** | Built from `polyglot-sql-ffi` | `.so` / `.dylib` / `.dll` + `.a` / `.lib` + header |
|
||||
|
||||
---
|
||||
|
||||
## 2. Core Philosophy & Design Principles
|
||||
|
||||
1. **Pipeline architecture**: SQL → Tokenize → Parse → AST → Transform → Generate → SQL string. Each stage is independently configurable per dialect.
|
||||
|
||||
2. **Ported from Python sqlglot**: The Rust implementation is a faithful port of the Python `sqlglot` library, maintaining compatibility with its test fixtures (10,220+ fixture cases at 100% pass rate). The architecture, expression types, transformation rules, and dialect behaviors mirror the Python original.
|
||||
|
||||
3. **No runtime database connection**: Polyglot never connects to a database. It operates purely on SQL strings and ASTs. This makes it safe for sandboxed environments (WASM, serverless) and suitable for build-time / CI-time SQL analysis.
|
||||
|
||||
4. **Feature-gated compilation**: Each dialect is behind a Cargo feature flag (`dialect-postgresql`, `dialect-mysql`, etc.), so users compiling for constrained targets (WASM) can include only what they need. The `default` feature set includes everything.
|
||||
|
||||
5. **Stack safety**: The `stacker` feature (default-on for native builds) grows the stack on deeply nested inputs, preventing stack overflow from pathological SQL. WASM builds opt out since `stacker` doesn't work there.
|
||||
|
||||
6. **Guard rails**: Format/guard options limit input size (16 MiB default), token count (1M), AST node count (1M), and set-operation chain depth (256) to prevent resource exhaustion.
|
||||
|
||||
7. **Performance-first**: Built in Rust for speed. Benchmarks show an 8–19× speedup over the Python `sqlglot` for transpilation, and generation roughly 86× faster. The WASM build enables near-native performance in browsers.
|
||||
|
||||
---
|
||||
|
||||
## 3. How It Differs from Database Abstraction Layers
|
||||
|
||||
**Critical distinction**: Polyglot is a **SQL dialect transpiler**, not a database abstraction layer. It does not:
|
||||
|
||||
- Connect to databases
|
||||
- Execute queries
|
||||
- Manage connection pools
|
||||
- Handle migrations (no `CREATE TABLE` schema evolution management)
|
||||
- Map Rust types to database types
|
||||
- Provide an ORM-like interface
|
||||
- Handle async I/O
|
||||
|
||||
Instead, it focuses purely on **SQL text manipulation**: parsing, analyzing, transforming, and generating SQL strings. This makes it complementary to (not competing with) libraries like Diesel, SQLx, or SeaORM.
|
||||
|
||||
---
|
||||
|
||||
## 4. Performance Characteristics
|
||||
|
||||
From the project's benchmark suite (polyglot-sql v0.1.2 vs sqlglot v28.10.1):
|
||||
|
||||
| Operation | Speedup Range |
|---|---|
| Parse (SQL → AST) | 10–13× faster |
| Generate (AST → SQL) | 77–101× faster |
| Roundtrip (parse → generate → re-parse) | 13–15× faster |
| Transpile (full cross-dialect) | 1.6× (simple) to 19× (complex BigQuery→Snowflake) |
| Geometric mean | **8.70×** |
|
||||
|
||||
Parse benchmarks (v0.4.x, native Rust):
|
||||
|
||||
| Query | Mean |
|---|---|
| short (SELECT a, b, c) | 51.28 μs |
| medium (5 cols, JOIN, GROUP BY) | 259.61 μs |
| complex (3 CTEs, subquery) | 268.59 μs – 1.03 ms |
|
||||
|
||||
---
|
||||
|
||||
## 5. Project Maturity Indicators
|
||||
|
||||
| Indicator | Status |
|---|---|
| **Version** | 0.4.4 (pre-1.0, active development) |
| **Test coverage** | 18,745 test cases at 100% pass rate |
| **crates.io downloads** | ~4,738 total (as of mid-2026) |
| **Dependent crates** | 2 (via entdb) |
| **Release cadence** | Frequent patch releases (0.4.2, 0.4.3, 0.4.4 in quick succession) |
| **Source code size** | ~241K lines of Rust in core crate |
| **Fuzzing** | Supported via `cargo +nightly fuzz` |
| **CI** | Full test suite + FFI + Python + WASM |
| **Documentation** | Rust API docs (docs.rs), TypeScript docs, Python docs, playground |
| **Breaking changes** | Possible before 1.0; under semver, 0.x releases carry no API stability guarantee |
|
||||
|
||||
---
|
||||
|
||||
## 6. License
|
||||
|
||||
- **MIT License** for the Polyglot code itself
|
||||
- **sqlglot MIT License** for the test fixtures derived from the Python project
|
||||
- Both are permissive, suitable for commercial use
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- <https://github.com/tobilg/polyglot> — Main repository
|
||||
- <https://crates.io/crates/polyglot-sql> — Rust crate on crates.io
|
||||
- <https://www.npmjs.com/package/@polyglot-sql/sdk> — TypeScript SDK on npm
|
||||
- <https://pypi.org/project/polyglot-sql/> — Python bindings on PyPI
|
||||
- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API documentation
|
||||
- <https://polyglot-playground.gh.tobilg.com/> — Interactive playground
|
||||
- <https://github.com/tobymao/sqlglot> — Original Python inspiration
|
||||
@@ -1,720 +0,0 @@
|
||||
# Polyglot: Architecture Deep Dive
|
||||
|
||||
---
|
||||
|
||||
## 1. Workspace Structure
|
||||
|
||||
The repository is organized as a Cargo workspace with 5 crates and supporting packages:
|
||||
|
||||
```
|
||||
polyglot/
|
||||
├── crates/
|
||||
│ ├── polyglot-sql/ # Core Rust library (~241K LOC)
|
||||
│ │ └── src/
|
||||
│ │ ├── lib.rs # Public API, top-level functions
|
||||
│ │ ├── tokens.rs # Tokenizer (lexer)
|
||||
│ │ ├── parser.rs # Recursive-descent parser (~62K LOC)
|
||||
│ │ ├── expressions.rs # AST node types (~15K LOC)
|
||||
│ │ ├── generator.rs # SQL code generator (~39K LOC)
|
||||
│ │ ├── dialects/ # 33 dialect implementations
|
||||
│ │ │ ├── mod.rs # Dialect trait, Dialect struct, CustomDialectBuilder
|
||||
│ │ │ ├── generic.rs # Base/standard SQL dialect
|
||||
│ │ │ ├── postgres.rs # PostgreSQL (~1.9K LOC)
|
||||
│ │ │ ├── mysql.rs # MySQL
|
||||
│ │ │ ├── sqlite.rs # SQLite
|
||||
│ │ │ ├── bigquery.rs # BigQuery
|
||||
│ │ │ ├── ... (32 total)
|
||||
│ │ ├── builder.rs # Fluent query builder API
|
||||
│ │ ├── transforms.rs # Cross-dialect transform functions
|
||||
│ │ ├── validation.rs # Syntax + semantic validation
|
||||
│ │ ├── schema.rs # Schema representation
|
||||
│ │ ├── scope.rs # Scope analysis
|
||||
│ │ ├── resolver.rs # Column resolution
|
||||
│ │ ├── lineage.rs # Column lineage tracking
|
||||
│ │ ├── openlineage.rs # OpenLineage payload generation
|
||||
│ │ ├── diff.rs # AST diff (ChangeDistiller algorithm)
|
||||
│ │ ├── planner.rs # Logical query plan
|
||||
│ │ ├── optimizer/ # Query optimizer modules
|
||||
│ │ │ ├── annotate_types.rs # Type annotation
|
||||
│ │ │ ├── qualify_columns.rs # Column qualification
|
||||
│ │ │ ├── qualify_tables.rs # Table qualification
|
||||
│ │ │ ├── pushdown_predicates.rs
|
||||
│ │ │ ├── pushdown_projections.rs
|
||||
│ │ │ ├── eliminate_joins.rs
|
||||
│ │ │ ├── eliminate_ctes.rs
|
||||
│ │ │ ├── simplify.rs
|
||||
│ │ │ └── ...
|
||||
│ │ ├── traversal.rs # DFS/BFS visitors, AST predicates
|
||||
│ │ ├── ast_transforms.rs # AST manipulation utilities
|
||||
│ │ ├── error.rs # Error types
|
||||
│ │ └── time.rs # Time format conversion
|
||||
│ ├── polyglot-sql-function-catalogs/ # Optional dialect function catalogs
|
||||
│ ├── polyglot-sql-wasm/ # WASM bindings (wasm-pack)
|
||||
│ ├── polyglot-sql-ffi/ # C FFI bindings (cbindgen)
|
||||
│ └── polyglot-sql-python/ # Python bindings (PyO3 + maturin)
|
||||
├── packages/
|
||||
│ ├── sdk/ # TypeScript SDK (@polyglot-sql/sdk)
|
||||
│ ├── go/ # Go SDK (PureGo wrapper over FFI)
|
||||
│ ├── documentation/ # TypeScript API docs site
|
||||
│ ├── playground/ # Browser playground (React 19, Vite)
|
||||
│ └── python-docs/ # Python API docs
|
||||
├── examples/
|
||||
│ ├── rust/ # Rust usage example
|
||||
│ ├── typescript/ # TypeScript SDK example
|
||||
│ └── c/ # C FFI usage example
|
||||
└── tools/
|
||||
├── sqlglot-compare/ # Fixture extraction & comparison
|
||||
└── bench-compare/ # Performance benchmarks
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Data Flow Pipeline
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ SQL String (source dialect) │
|
||||
└──────────────────────────┬──────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Tokenizer (tokens.rs) │
|
||||
│ • Dialect-specific lexing rules (quotes, comments, keywords) │
|
||||
│ • Configurable via TokenizerConfig per dialect │
|
||||
│ • Produces Vec<Token> with type, text, and Span (line/col/offset) │
|
||||
└──────────────────────────┬──────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Parser (parser.rs, ~62K LOC) │
|
||||
│ • Recursive-descent with precedence climbing │
|
||||
│ • Dialect-aware parsing (custom keywords, syntax rules) │
|
||||
│ • Produces Expression AST tree │
|
||||
│ • Stack safety via `stacker` feature (default-on) │
|
||||
└──────────────────────────┬──────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ Expression AST (expressions.rs) │
|
||||
│ • Single tagged enum with 150+ variants │
|
||||
│ • Each variant has its own struct (Select, Insert, Function, etc.) │
|
||||
│ • Box<Variant> keeps enum size to 2 words (tag + pointer) │
|
||||
│ • Serializable via serde (derive Serialize/Deserialize) │
|
||||
│ • Optional TypeScript type generation via `ts-rs` feature flag │
|
||||
└──────────────────────────┬──────────────────────────────────────────┘
|
||||
│
|
||||
┌────┴────┐
|
||||
│ │
|
||||
┌─────────┘ └──────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
┌────────────────────────┐ ┌────────────────────────────────────┐
|
||||
│ Transform Pipeline │ │ Semantic / Analysis Modules │
|
||||
│ (transpile path) │ │ • validation.rs → syntax checks │
|
||||
│ │ │ • schema.rs → column/type lookup │
|
||||
│ 1. preprocess() │ │ • scope.rs → scope analysis │
|
||||
│ (whole-tree rewrites│ │ • resolver.rs → column resolution │
|
||||
│ like eliminate_ │ │ • lineage.rs → column lineage │
|
||||
│ qualify) │ │ • openlineage.rs → OL payloads │
|
||||
│ │ │ • optimizer/ → query optimization │
|
||||
│ 2. transform_expr() │ │ • diff.rs → AST diff │
|
||||
│ (per-node rewrites │ │ • planner.rs → logical plan DAG │
|
||||
│ per dialect) │ │ • traversal.rs → DFS/BFS visitors │
|
||||
│ │ │
|
||||
│ 3. Generator │ │
|
||||
│ (AST → SQL string) │ │
|
||||
└───────────┬────────────┘ └────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ SQL String (target dialect) │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Core Abstractions
|
||||
|
||||
### 3.1 Expression AST
|
||||
|
||||
The central type is `Expression`, a large tagged enum with one variant per SQL construct:
|
||||
|
||||
```rust
|
||||
pub enum Expression {
|
||||
// Literals
|
||||
Literal(Box<Literal>),
|
||||
Boolean(BooleanLiteral),
|
||||
Null(Null),
|
||||
|
||||
// Identifiers
|
||||
Identifier(Identifier),
|
||||
Column(Box<Column>),
|
||||
Table(Box<TableRef>),
|
||||
Star(Star),
|
||||
|
||||
// Queries
|
||||
Select(Box<Select>),
|
||||
Union(Box<Union>),
|
||||
Intersect(Box<Intersect>),
|
||||
Except(Box<Except>),
|
||||
Subquery(Box<Subquery>),
|
||||
|
||||
// DML
|
||||
Insert(Box<Insert>),
|
||||
Update(Box<Update>),
|
||||
Delete(Box<Delete>),
|
||||
Copy(Box<CopyStmt>),
|
||||
|
||||
// Binary/Unary operators
|
||||
And(Box<BinaryOp>),
|
||||
Or(Box<BinaryOp>),
|
||||
Add(Box<BinaryOp>),
|
||||
Eq(Box<BinaryOp>),
|
||||
// ... 30+ operator variants
|
||||
|
||||
// Functions
|
||||
Function(Box<Function>),
|
||||
AggregateFunction(Box<AggregateFunction>),
|
||||
WindowFunction(Box<WindowFunction>),
|
||||
|
||||
// Clauses
|
||||
From(Box<From>),
|
||||
Join(Box<Join>),
|
||||
Where(Box<Where>),
|
||||
OrderBy(Box<OrderBy>),
|
||||
// ...
|
||||
|
||||
// ~150 total variants
|
||||
}
|
||||
```
|
||||
|
||||
Key design choices:
|
||||
- **Boxed variants**: Most variants wrap their payload in `Box` to keep `size_of::<Expression>()` at 2 words (16 bytes on 64-bit).
|
||||
- **Serde support**: `#[derive(Serialize, Deserialize)]` for JSON serialization across FFI/WASM boundaries.
|
||||
- **TypeScript types**: Optional `ts-rs` feature generates TypeScript interfaces.
|
||||
- **Convenience methods**: `Expression::column()`, `Expression::number()`, `Expression::sql()`, `Expression::sql_for()`.
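
The effect of boxing on enum size can be checked directly with `std::mem::size_of`. The following is a minimal sketch, not Polyglot's actual types: `BigNode`, `Inline`, and `Boxed` are hypothetical stand-ins for a large AST payload and two ways of storing it.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a large AST payload such as a SELECT node.
#[allow(dead_code)]
struct BigNode {
    fields: [u64; 16], // 128 bytes of payload
}

/// Payload stored inline: the enum is at least as large as its biggest variant.
#[allow(dead_code)]
enum Inline {
    Select(BigNode),
    Insert(BigNode),
    Star,
}

/// Payload stored behind a pointer: every data-carrying variant is one pointer wide.
#[allow(dead_code)]
enum Boxed {
    Select(Box<BigNode>),
    Insert(Box<BigNode>),
    Star,
}

/// Size in bytes of the enum that stores payloads inline.
fn inline_size() -> usize {
    size_of::<Inline>()
}

/// Size in bytes of the enum that stores payloads behind `Box`.
fn boxed_size() -> usize {
    size_of::<Boxed>()
}
```

Comparing `inline_size()` with `boxed_size()` shows the boxed enum is far smaller than the inline one. Note that any variant whose payload is stored inline sets a lower bound on the enum's size, so the two-word figure presumably holds only if the unboxed payloads are themselves small.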
|
||||
|
||||
### 3.2 DialectType Enum
|
||||
|
||||
```rust
|
||||
pub enum DialectType {
|
||||
Generic, PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, SQLite,
|
||||
Hive, Spark, Trino, Presto, Redshift, TSQL, Oracle, ClickHouse,
|
||||
Databricks, Athena, Teradata, Doris, StarRocks, Materialize,
|
||||
RisingWave, SingleStore, CockroachDB, TiDB, Druid, Solr, Tableau,
|
||||
Dune, Fabric, Drill, Dremio, Exasol, DataFusion,
|
||||
}
|
||||
```
|
||||
|
||||
- Implements `FromStr` with aliases (e.g., `"mssql"` → `TSQL`, `"cockroach"` → `CockroachDB`)
|
||||
- Each variant maps to a feature-gated dialect module
|
||||
- Custom dialects can be registered at runtime via `CustomDialectBuilder`
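
Alias handling through `FromStr` can be sketched as follows. This is an illustration only: `MiniDialect` is a hypothetical four-variant enum, and both the case-insensitive matching and the `"postgres"` alias are assumptions. Only the `"mssql"` and `"cockroach"` aliases are taken from the text above.

```rust
use std::str::FromStr;

/// Hypothetical, cut-down version of a dialect enum.
#[allow(clippy::upper_case_acronyms)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MiniDialect {
    Generic,
    PostgreSQL,
    TSQL,
    CockroachDB,
}

impl FromStr for MiniDialect {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: names are matched case-insensitively after trimming.
        match s.trim().to_ascii_lowercase().as_str() {
            "generic" => Ok(Self::Generic),
            "postgresql" | "postgres" => Ok(Self::PostgreSQL),
            "tsql" | "mssql" => Ok(Self::TSQL),
            "cockroachdb" | "cockroach" => Ok(Self::CockroachDB),
            other => Err(format!("unknown dialect: {other}")),
        }
    }
}
```

With this in place, `"mssql".parse::<MiniDialect>()` and `"tsql".parse::<MiniDialect>()` resolve to the same variant, so configuration files can use whichever name is familiar.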
|
||||
|
||||
### 3.3 DialectImpl Trait
|
||||
|
||||
```rust
|
||||
pub trait DialectImpl {
|
||||
fn dialect_type(&self) -> DialectType;
|
||||
fn tokenizer_config(&self) -> TokenizerConfig { /* default */ }
|
||||
fn generator_config(&self) -> GeneratorConfig { /* default */ }
|
||||
fn generator_config_for_expr(&self, _expr: &Expression) -> GeneratorConfig { /* default */ }
|
||||
fn transform_expr(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
|
||||
fn preprocess(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
|
||||
}
|
||||
```
|
||||
|
||||
Each dialect implements this trait to provide:
|
||||
1. **Tokenizer config**: Identifier quoting characters, string delimiters, keyword overrides, comment styles, hex number support
|
||||
2. **Generator config**: 30+ flags controlling SQL output (identifier quote style, function casing, `LIMIT` vs `TOP` vs `FETCH FIRST`, etc.)
|
||||
3. **Per-node transform**: Dialect-specific expression rewrites (e.g., PostgreSQL transforms `IFNULL` → `COALESCE`, SQLite transforms `TRY_CAST` → `CAST`)
|
||||
4. **Whole-tree preprocess**: Structural rewrites that need full-tree context (e.g., eliminating `QUALIFY` for dialects that don't support it)
|
||||
|
||||
### 3.4 Dialect Struct (High-Level API)
|
||||
|
||||
```rust
|
||||
pub struct Dialect {
|
||||
dialect_type: DialectType,
|
||||
tokenizer: Tokenizer,
|
||||
generator_config: Arc<GeneratorConfig>,
|
||||
transformer: Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>,
|
||||
generator_config_for_expr: Option<Box<dyn Fn(&Expression) -> GeneratorConfig + Send + Sync>>,
|
||||
custom_preprocess: Option<Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>>,
|
||||
}
|
||||
```
|
||||
|
||||
The `Dialect` struct bundles all dialect-specific state and provides the primary API:
|
||||
|
||||
```rust
|
||||
// Parse SQL
|
||||
let ast = dialect.parse("SELECT 1")?;
|
||||
|
||||
// Generate SQL from AST
|
||||
let sql = dialect.generate(&ast[0])?;
|
||||
|
||||
// Transpile between dialects
|
||||
let results = dialect.transpile("SELECT IFNULL(a,b) FROM t", DialectType::PostgreSQL)?;
|
||||
|
||||
// Tokenize
|
||||
let tokens = dialect.tokenize("SELECT 1")?;
|
||||
```
|
||||
|
||||
### 3.5 CustomDialectBuilder
|
||||
|
||||
For runtime-extensible dialect support:
|
||||
|
||||
```rust
|
||||
use polyglot_sql::dialects::{CustomDialectBuilder, Dialect, DialectType};
|
||||
use polyglot_sql::generator::NormalizeFunctions;
|
||||
|
||||
// Register a custom dialect inheriting from PostgreSQL
|
||||
CustomDialectBuilder::new("my_postgres")
|
||||
.based_on(DialectType::PostgreSQL)
|
||||
.generator_config_modifier(|gc| {
|
||||
gc.normalize_functions = NormalizeFunctions::Lower;
|
||||
})
|
||||
.register()?;
|
||||
|
||||
let d = Dialect::get_by_name("my_postgres").unwrap();
|
||||
// Use like any built-in dialect
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Dialect Implementation Details
|
||||
|
||||
### 4.1 PostgreSQL (`postgres.rs`, ~1,879 LOC)
|
||||
|
||||
**Tokenizer:**
|
||||
- `$$` string literals (dollar-quoting)
|
||||
- Double-quote identifier quoting
|
||||
- Nested block comments
|
||||
- `EXEC` treated as generic command
|
||||
|
||||
**Generator config highlights:**
|
||||
- `identifier_quote: '"'` (double quotes)
|
||||
- `single_string_interval: true` (`INTERVAL '1 day'`)
|
||||
- `parameter_token: "$"` (`$1`, `$2` placeholders)
|
||||
- `supports_select_into: true`
|
||||
- `supports_window_exclude: true`
|
||||
- `can_implement_array_any: true`
|
||||
|
||||
**Transform examples:**
|
||||
- `IFNULL(a, b)` → `COALESCE(a, b)`
|
||||
- `RAND()` → `RANDOM()`
|
||||
- `DATEDIFF(day, a, b)` → `CAST(b - a AS INT)` (date subtraction)
|
||||
- `JSON_EXTRACT(a, '$.x')` → `a #> '{x}'` (arrow syntax)
|
||||
- `JSON_EXTRACT_SCALAR(a, '$.x')` → `a #>> '{x}'`
|
||||
- `DATE_ADD` / `DATE_SUB` → `+` / `-` interval arithmetic
|
||||
- Type mappings: `TINYINT` → `SMALLINT`, `FLOAT` → `REAL`, `DOUBLE` → `DOUBLE PRECISION`
|
||||
- `ILIKE` preserved (native PostgreSQL)
|
||||
- `RegexpLike` → `~` operator, `RegexpILike` → `~*` operator
|
||||
|
||||
### 4.2 SQLite (`sqlite.rs`, ~750 LOC)
|
||||
|
||||
**Tokenizer:**
|
||||
- Supports `"`, `[`, `` ` `` as identifier quote characters
|
||||
- No nested comments
|
||||
- Hex number literals (`0xCC`)
|
||||
|
||||
**Generator config:**
|
||||
- `identifier_quote: '"'` (double quotes)
|
||||
- `supports_table_alias_columns: false`
|
||||
- `json_key_value_pair_sep: ","` (comma-style `JSON_OBJECT`)
|
||||
|
||||
**Transform examples:**
|
||||
- `NVL(a, b)` → `IFNULL(a, b)`
|
||||
- `TRY_CAST(x AS t)` → `CAST(x AS t)` (no try-cast)
|
||||
- `RANDOM()` → function
|
||||
- `ILIKE` → `LOWER(left) LIKE LOWER(right)` (no native ILIKE)
|
||||
- `CountIf(cond)` → `SUM(IIF(cond, 1, 0))`
|
||||
- `CEIL(x)` → function form
|
||||
- `DATE_TRUNC(unit, col)` → various strftime patterns
|
||||
- `DATE_DIFF` → `julianday()` difference patterns
|
||||
|
||||
### 4.3 MySQL (`mysql.rs`)
|
||||
|
||||
**Tokenizer:** Backtick identifiers, `#` comments
|
||||
**Generator:** Backtick quoting, `LIMIT` syntax, `CONCAT()` instead of `||`
|
||||
**Transforms:** `IFNULL(a,b)` → `COALESCE(a,b)`, `||` → `CONCAT()` (string concat), etc.
|
||||
|
||||
### 4.4 BigQuery (`bigquery.rs`)
|
||||
|
||||
**Tokenizer:** Backtick identifiers, `QUALIFY` keyword
|
||||
**Generator:** Backtick quoting, `STRUCT` types, `QUALIFY` clause, `DATE_DIFF` syntax
|
||||
**Transforms:** Complex date/timestamp function mappings, `UNNEST` handling, `APPROX_COUNT_DISTINCT` preserved (native BigQuery function)
|
||||
|
||||
### 4.5 How Transpilation Works
|
||||
|
||||
The full transpilation pipeline:
|
||||
|
||||
```
|
||||
Input SQL (source dialect)
|
||||
│
|
||||
▼
|
||||
Source Dialect Tokenizer
|
||||
│
|
||||
▼
|
||||
Parser (dialect-aware)
|
||||
│
|
||||
▼
|
||||
Expression AST
|
||||
│
|
||||
▼
|
||||
Source Dialect::preprocess() ← whole-tree rewrites
|
||||
│
|
||||
▼
|
||||
Source Dialect::transform_expr() ← per-node rewrites (recursive, bottom-up)
|
||||
│
|
||||
▼
|
||||
Normalized AST
|
||||
│
|
||||
▼
|
||||
Target Dialect Generator
|
||||
│
|
||||
▼
|
||||
Output SQL (target dialect)
|
||||
```
|
||||
|
||||
The transform pipeline uses an explicit task stack (not recursive calls) for the hot paths to avoid stack overflow. The `stacker` crate provides additional stack-growth protection.
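
A bottom-up rewrite driven by an explicit task stack might look like the sketch below. It is not Polyglot's implementation: `Expr`, `Task`, `rewrite_bottom_up`, and `ifnull_to_coalesce` are hypothetical names over a two-variant toy AST, chosen only to show how pending work can live on the heap instead of the call stack.

```rust
/// Hypothetical toy AST with two node kinds.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Func { name: String, args: Vec<Expr> },
}

/// Work items kept on a heap-allocated stack instead of the call stack.
enum Task {
    /// Descend into this node.
    Visit(Expr),
    /// Reassemble a function node once its `argc` arguments are rewritten.
    Build { name: String, argc: usize },
}

/// Applies `rule` to every node, children before parents, without recursion.
fn rewrite_bottom_up(root: Expr, rule: impl Fn(Expr) -> Expr) -> Expr {
    let mut tasks = vec![Task::Visit(root)];
    let mut done: Vec<Expr> = Vec::new();
    while let Some(task) = tasks.pop() {
        match task {
            Task::Visit(Expr::Func { name, args }) => {
                tasks.push(Task::Build { name, argc: args.len() });
                // Push children in reverse so the first argument is processed first.
                for arg in args.into_iter().rev() {
                    tasks.push(Task::Visit(arg));
                }
            }
            Task::Visit(leaf) => done.push(rule(leaf)),
            Task::Build { name, argc } => {
                // The last `argc` finished nodes are this function's arguments, in order.
                let args = done.split_off(done.len() - argc);
                done.push(rule(Expr::Func { name, args }));
            }
        }
    }
    done.pop().expect("exactly one rewritten root remains")
}

/// Example per-node rule: rename IFNULL to COALESCE, leave everything else alone.
fn ifnull_to_coalesce(e: Expr) -> Expr {
    match e {
        Expr::Func { name, args } if name.eq_ignore_ascii_case("IFNULL") => {
            Expr::Func { name: "COALESCE".to_string(), args }
        }
        other => other,
    }
}
```

Because each parent is rebuilt only after its arguments have been pushed onto `done`, the per-node rule always sees already-rewritten children, which matches the bottom-up order described for `transform_expr()`.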
|
||||
|
||||
Key cross-dialect transforms include:
|
||||
- Function renaming: `IFNULL` ↔ `COALESCE` ↔ `NVL`, `DATEDIFF` ↔ date arithmetic, `STRING_AGG` ↔ `GROUP_CONCAT`
|
||||
- Type mapping: `TINYINT` ↔ `SMALLINT`, `FLOAT` ↔ `REAL`, `JSON` ↔ `JSONB`
|
||||
- Syntax conversion: `LIMIT` ↔ `TOP` ↔ `FETCH FIRST`, `||` (concat) ↔ `CONCAT()`, `SELECT INTO` ↔ `CREATE TABLE AS`
|
||||
- Boolean handling: `BOOL_AND`/`BOOL_OR` ↔ `MIN`/`MAX`-over-`CASE`
|
||||
- JSON operators: `JSON_EXTRACT` ↔ `#>`/`#>>` ↔ `->`/`->>` (PostgreSQL arrow syntax)
|
||||
|
||||
---
|
||||
|
||||
## 5. Fluent Builder API
|
||||
|
||||
The builder module (`builder.rs`, ~3.3K LOC) provides a type-safe, ergonomic way to construct SQL expressions without string interpolation:
|
||||
|
||||
```rust
|
||||
use polyglot_sql::builder::*;
|
||||
|
||||
// SELECT id, name FROM users WHERE age > 18 ORDER BY name LIMIT 10
|
||||
let expr = select(["id", "name"])
|
||||
.from("users")
|
||||
.where_(col("age").gt(lit(18)))
|
||||
.order_by(["name"])
|
||||
.limit(10)
|
||||
.build();
|
||||
|
||||
// INSERT
|
||||
let ins = insert_into("users")
|
||||
.columns(["id", "name"])
|
||||
.values([lit(1), lit("Alice")])
|
||||
.build();
|
||||
|
||||
// CASE expression
|
||||
let expr = case()
|
||||
.when(col("x").gt(lit(0)), lit("positive"))
|
||||
.else_(lit("non-positive"))
|
||||
.build();
|
||||
|
||||
// Set operations
|
||||
let expr = union_all(
|
||||
select(["id"]).from("a"),
|
||||
select(["id"]).from("b"),
|
||||
).order_by(["id"]).limit(5).build();
|
||||
```
|
||||
|
||||
Expression helpers:
|
||||
- `col("users.id")` — column reference (splits on last `.`)
|
||||
- `lit(42)`, `lit("hello")`, `lit(3.14)`, `lit(true)` — literals
|
||||
- `func("COALESCE", [col("a"), col("b")])` — function calls
|
||||
- Operator chain: `col("age").gte(lit(18)).and(col("status").eq(lit("active")))`
|
||||
|
||||
The builder generates an `Expression` AST that can then be serialized to any dialect via `generate()`.
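
The "splits on last `.`" behavior noted for `col()` can be illustrated with the standard library's `rsplit_once`. This is a sketch under the assumption that everything before the final dot is treated as the qualifier; `split_column_ref` is a hypothetical helper, not part of the builder API.

```rust
/// Splits a dotted column reference into an optional qualifier and a column name,
/// cutting at the last '.' only.
fn split_column_ref(reference: &str) -> (Option<&str>, &str) {
    match reference.rsplit_once('.') {
        Some((qualifier, name)) => (Some(qualifier), name),
        None => (None, reference),
    }
}
```

Under this rule `"users.id"` yields the qualifier `users` and the name `id`, while `"db.users.id"` keeps `db.users` together as the qualifier.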
|
||||
|
||||
---
|
||||
|
||||
## 6. Validation and Schema-Aware Analysis
|
||||
|
||||
### 6.1 Syntax Validation
|
||||
|
||||
```rust
|
||||
use polyglot_sql::{validate, DialectType};
|
||||
|
||||
let result = validate("SELECT * FORM users", DialectType::Generic);
|
||||
// result.valid == false
|
||||
// result.errors contain line/column/message/error codes
|
||||
```
|
||||
|
||||
Error codes:
|
||||
- `E001` — Syntax error
|
||||
- `E002` — Tokenization error
|
||||
- `E003` — Parse error
|
||||
- `E004` — Invalid expression (not a valid statement)
|
||||
- `E005` — Trailing comma in strict mode
|
||||
|
||||
### 6.2 Schema-Aware Validation
|
||||
|
||||
```rust
|
||||
use polyglot_sql::{
|
||||
validate_with_schema, DialectType, SchemaColumn, SchemaTable,
|
||||
SchemaValidationOptions, ValidationSchema,
|
||||
};
|
||||
|
||||
let schema = ValidationSchema {
|
||||
strict: Some(true),
|
||||
tables: vec![
|
||||
SchemaTable {
|
||||
name: "users".into(),
|
||||
columns: vec![
|
||||
SchemaColumn { name: "id".into(), data_type: "integer".into(), nullable: Some(false), primary_key: true, unique: false, references: None },
|
||||
SchemaColumn { name: "email".into(), data_type: "varchar".into(), nullable: Some(false), primary_key: false, unique: true, references: None },
|
||||
],
|
||||
// ...
|
||||
},
|
||||
],
|
||||
};
|
||||
|
||||
let opts = SchemaValidationOptions { check_types: true, check_references: true, strict: None, semantic: true };
|
||||
let result = validate_with_schema("SELECT id FROM users WHERE email = 1", DialectType::Generic, &schema, &opts);
|
||||
// result.valid == false (type mismatch: email is varchar, compared to integer)
|
||||
```
|
||||
|
||||
Schema-aware error codes:
|
||||
- `E200`/`E201` — Unknown table/column
|
||||
- `E210`–`E217`, `W210`–`W216` — Type checks
|
||||
- `E220`, `E221`, `W220`, `W221`, `W222` — Reference/FK checks
|
||||
|
||||
### 6.3 Function Catalogs
|
||||
|
||||
Optional feature-gated function catalogs (currently ClickHouse and DuckDB) provide known function signatures for semantic type checking:
|
||||
|
||||
```toml
|
||||
polyglot-sql = { version = "0.4", features = ["function-catalog-clickhouse"] }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Column Lineage & OpenLineage
|
||||
|
||||
### 7.1 Column Lineage
|
||||
|
||||
Trace how columns flow through a query:
|
||||
|
||||
```rust
|
||||
use polyglot_sql::{parse, DialectType};
|
||||
use polyglot_sql::lineage::get_column_lineage;
|
||||
|
||||
let ast = parse("SELECT a + b AS total FROM t", DialectType::Generic).unwrap();
|
||||
let lineage = get_column_lineage(&ast[0], /* schema */ None, DialectType::Generic);
|
||||
// lineage tells you that "total" depends on columns "a" and "b" from table "t"
|
||||
```
|
||||
|
||||
### 7.2 OpenLineage Payload Generation
|
||||
|
||||
```rust
|
||||
use polyglot_sql::openlineage::{generate_run_event, OpenLineageOptions, OpenLineageDatasetId};
|
||||
|
||||
let opts = OpenLineageOptions {
|
||||
dialect: DialectType::PostgreSQL,
|
||||
producer: "my-app".into(),
|
||||
dataset_namespace: Some("mydb".into()),
|
||||
// ...
|
||||
};
|
||||
let event = generate_run_event("SELECT * FROM users", &opts)?;
|
||||
// event is a JSON-serializable OpenLineage RunEvent with columnLineage facets
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Error Handling
|
||||
|
||||
### 8.1 Error Types
|
||||
|
||||
```rust
|
||||
pub enum Error {
|
||||
Tokenize { message: String, line: usize, column: usize, start: usize, end: usize },
|
||||
Parse { message: String, line: usize, column: usize, start: usize, end: usize },
|
||||
Generate(String),
|
||||
Unsupported { feature: String, dialect: String },
|
||||
Syntax { message: String, line: usize, column: usize, start: usize, end: usize },
|
||||
Internal(String),
|
||||
}
|
||||
```
|
||||
|
||||
All position-bearing errors include:
|
||||
- `line` — 1-based line number
|
||||
- `column` — 1-based column number
|
||||
- `start` / `end` — byte offsets (0-based, end exclusive)
|
||||
|
||||
```rust
|
||||
let err = Error::parse("Unexpected token", 3, 15, 42, 44);
|
||||
assert_eq!(err.line(), Some(3));
|
||||
assert_eq!(err.column(), Some(15));
|
||||
assert_eq!(err.start(), Some(42));
|
||||
```
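
The relationship between the byte offsets and the 1-based line and column fields can be made concrete with a small helper. This is a sketch, not library code: `line_col_at` is a hypothetical function, and it assumes columns are counted in bytes, which the text above does not specify.

```rust
/// Converts a 0-based byte offset into a 1-based (line, column) pair.
/// Assumption: columns are counted in bytes from the start of the line.
fn line_col_at(source: &str, offset: usize) -> (usize, usize) {
    // Clamp so an offset at or past the end still yields a position.
    let offset = offset.min(source.len());
    let before = &source.as_bytes()[..offset];
    // Each newline before the offset starts a new line.
    let line = 1 + before.iter().filter(|&&b| b == b'\n').count();
    // The current line starts just after the last newline, or at byte 0.
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    (line, 1 + (offset - line_start))
}
```

A helper like this is useful when only `start` and `end` are stored and an editor integration needs line and column positions for highlighting.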
|
||||
|
||||
### 8.2 Validation Errors
|
||||
|
||||
```rust
|
||||
pub struct ValidationError {
|
||||
pub message: String,
|
||||
pub line: Option<usize>,
|
||||
pub column: Option<usize>,
|
||||
pub severity: ValidationSeverity, // Error or Warning
|
||||
pub code: String, // e.g., "E001", "E200"
|
||||
pub start: Option<usize>,
|
||||
pub end: Option<usize>,
|
||||
}
|
||||
|
||||
pub struct ValidationResult {
|
||||
pub valid: bool,
|
||||
pub errors: Vec<ValidationError>,
|
||||
}
|
||||
```
|
||||
|
||||
### 8.3 Guard Rail Errors
|
||||
|
||||
Format operations have configurable guard limits that return structured errors:
|
||||
|
||||
- `E_GUARD_INPUT_TOO_LARGE` — input exceeds `max_input_bytes`
|
||||
- `E_GUARD_TOKEN_BUDGET_EXCEEDED` — token count exceeds `max_tokens`
|
||||
- `E_GUARD_AST_BUDGET_EXCEEDED` — AST node count exceeds `max_ast_nodes`
|
||||
- `E_GUARD_SET_OP_CHAIN_EXCEEDED` — UNION/INTERSECT/EXCEPT chain exceeds `max_set_op_chain`
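
A pre-parse budget check in the spirit of these guard rails could be sketched as follows. `GuardLimits` and `check_budget` are hypothetical; only the error code strings and the option names `max_input_bytes` and `max_tokens` come from the list above, and the default values are the 16 MiB and 1M figures quoted in the project overview. Whitespace splitting stands in for a real tokenizer.

```rust
/// Hypothetical limits struct; field names follow the guard options named above.
struct GuardLimits {
    max_input_bytes: usize,
    max_tokens: usize,
}

impl Default for GuardLimits {
    fn default() -> Self {
        Self {
            max_input_bytes: 16 * 1024 * 1024, // 16 MiB
            max_tokens: 1_000_000,
        }
    }
}

/// Rejects inputs that exceed the configured budgets before any parsing happens.
fn check_budget(sql: &str, limits: &GuardLimits) -> Result<(), &'static str> {
    if sql.len() > limits.max_input_bytes {
        return Err("E_GUARD_INPUT_TOO_LARGE");
    }
    // Assumption: whitespace-separated words approximate the token count.
    if sql.split_whitespace().count() > limits.max_tokens {
        return Err("E_GUARD_TOKEN_BUDGET_EXCEEDED");
    }
    Ok(())
}
```

Checking the cheapest limit first (input size is an O(1) lookup) means oversized inputs are rejected without being scanned at all.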
|
||||
|
||||
---
|
||||
|
||||
## 9. AST Traversal & Analysis
|
||||
|
||||
### 9.1 Traversal
|
||||
|
||||
```rust
|
||||
use polyglot_sql::{parse, DialectType};
|
||||
use polyglot_sql::traversal::*;
|
||||
|
||||
let ast = parse("SELECT a, b FROM t WHERE x > 1", DialectType::Generic).unwrap();
|
||||
let columns = get_columns(&ast[0]); // ["a", "b", "x"]
|
||||
let tables = get_tables(&ast[0]); // ["t"]
|
||||
```
|
||||
|
||||
Available predicates (70+):
|
||||
- `is_select`, `is_insert`, `is_update`, `is_delete`, `is_ddl`
|
||||
- `is_join`, `is_where`, `is_group_by`, `is_order_by`, `is_limit`
|
||||
- `is_function`, `is_aggregate`, `is_subquery`, `is_cte`
|
||||
- `is_comparison`, `is_logical`, `is_arithmetic`
|
||||
- `contains_subquery`, `contains_aggregate`, `contains_window_function`
|
||||
|
||||
Iterators: `DfsIter`, `BfsIter` for depth-first and breadth-first traversal.
|
||||
|
||||
### 9.2 AST Transforms
|
||||
|
||||
```rust
|
||||
use polyglot_sql::ast_transforms::*;
|
||||
|
||||
// Rename tables
|
||||
let renamed = rename_tables(expr, &[("old_name", "new_name")]);
|
||||
|
||||
// Add WHERE condition
|
||||
let filtered = add_where(expr, col("active").eq(lit(true)));
|
||||
|
||||
// Remove LIMIT/OFFSET
|
||||
let unlimited = remove_limit_offset(expr);
|
||||
```
|
||||
|
||||
### 9.3 AST Diff
|
||||
|
||||
```rust
|
||||
use polyglot_sql::diff::{diff, diff_with_config, DiffConfig};
|
||||
|
||||
let edits = diff(&source_expr, &target_expr, true);
|
||||
for edit in &edits {
|
||||
if edit.is_change() {
|
||||
println!("{:?}", edit);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Uses the ChangeDistiller algorithm with Dice coefficient matching for structural comparison.
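
For reference, the Dice coefficient of two strings is twice the number of shared bigrams divided by the total number of bigrams in both. The sketch below computes it over character bigrams; that unit is an assumption, since the text does not say what Polyglot's diff compares, and `bigrams` and `dice_coefficient` are hypothetical helpers.

```rust
use std::collections::HashMap;

/// Counts the adjacent character pairs in a string.
fn bigrams(s: &str) -> HashMap<(char, char), usize> {
    let chars: Vec<char> = s.chars().collect();
    let mut counts = HashMap::new();
    for pair in chars.windows(2) {
        *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
    }
    counts
}

/// Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|).
fn dice_coefficient(a: &str, b: &str) -> f64 {
    let (ba, bb) = (bigrams(a), bigrams(b));
    let total: usize = ba.values().sum::<usize>() + bb.values().sum::<usize>();
    if total == 0 {
        // Strings too short to have bigrams: fall back to exact equality.
        return if a == b { 1.0 } else { 0.0 };
    }
    // A bigram counts as shared as many times as it occurs in both strings.
    let shared: usize = ba
        .iter()
        .map(|(gram, &n)| n.min(bb.get(gram).copied().unwrap_or(0)))
        .sum();
    2.0 * shared as f64 / total as f64
}
```

For "night" and "nacht" only the bigram `ht` is shared out of eight in total, giving 2 × 1 / 8 = 0.25. A matcher would typically pair two nodes when this score exceeds a chosen threshold.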
|
||||
|
||||
### 9.4 Logical Planner
|
||||
|
||||
```rust
|
||||
use polyglot_sql::planner::Plan;
|
||||
|
||||
let plan = Plan::from_expression(&expr);
|
||||
// plan.root is a Step DAG
|
||||
// plan.leaves() returns leaf steps
|
||||
// plan.dag() returns the dependency graph
|
||||
```
|
||||
|
||||
Step kinds: Scan, Filter, Project, Aggregate, Join, Sort, Limit, etc.
|
||||
|
||||
---
|
||||
|
||||
## 10. Optimizer Modules
|
||||
|
||||
The optimizer is available behind the `semantic` feature flag:
|
||||
|
||||
| Module | Purpose |
|---|---|
| `qualify_columns.rs` | Resolve unqualified column references to table.column |
| `qualify_tables.rs` | Expand table names with schema/catalog |
| `annotate_types.rs` | Infer and annotate expression types |
| `pushdown_predicates.rs` | Push WHERE conditions into JOINs |
| `pushdown_projections.rs` | Reduce columns to only what's needed |
| `eliminate_joins.rs` | Remove unnecessary JOINs |
| `eliminate_ctes.rs` | Inline single-use CTEs |
| `simplify.rs` | Simplify boolean expressions, constant folding |
| `normalize.rs` | Expression normalization |
| `canonicalize.rs` | Query canonicalization |
| `subquery.rs` | Subquery analysis |
|
||||
|
||||
---
|
||||
|
||||
## 11. Async Support
|
||||
|
||||
**Polyglot does not use async I/O** — it is a pure computational library. All operations are synchronous and CPU-bound:
|
||||
|
||||
- `parse()` — synchronous
|
||||
- `generate()` — synchronous
|
||||
- `transpile()` — synchronous
|
||||
- `validate()` — synchronous
|
||||
- `format()` — synchronous
|
||||
|
||||
This is by design: Polyglot operates on SQL strings in memory, with no network or filesystem I/O. For use in async contexts (Tokio, async-std), callers should use `tokio::task::spawn_blocking()` or similar to offload CPU-heavy parsing/transpilation to a blocking thread pool.
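
The offloading pattern can be sketched without any async runtime. In a Tokio application `tokio::task::spawn_blocking()` would be the usual choice; to stay dependency-free this example swaps in `std::thread::spawn`, and `expensive_transpile` is a hypothetical placeholder for a CPU-bound call such as transpilation, not a Polyglot function.

```rust
use std::thread;

/// Hypothetical stand-in for a synchronous, CPU-bound transpile call.
fn expensive_transpile(sql: &str) -> String {
    sql.replace("IFNULL", "COALESCE")
}

/// Runs the CPU-bound work on a separate thread and returns a handle to the result,
/// so the calling thread is not blocked while the work runs.
fn transpile_off_thread(sql: String) -> thread::JoinHandle<String> {
    thread::spawn(move || expensive_transpile(&sql))
}
```

The caller keeps the returned handle, continues with other work, and calls `join()` when the result is needed. With Tokio the shape is the same, except that the handle is awaited instead of joined.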
|
||||
|
||||
---
|
||||
|
||||
## 12. Feature Flags
|
||||
|
||||
| Flag | Description | Default |
|---|---|---|
| `all-dialects` | Enable all 32 dialect parsers | ✅ |
| `generate` | SQL generation from AST | ✅ |
| `transpile` | Cross-dialect transpilation (implies `generate`) | ✅ |
| `builder` | Fluent query builder API (implies `generate`) | ✅ |
| `ast-tools` | AST inspection & transform utilities | ✅ |
| `semantic` | Schema, resolver, lineage, optimizer, validation | ✅ |
| `openlineage` | OpenLineage payload generation (implies `semantic`) | ✅ |
| `diff` | AST diff support (implies `generate`) | ✅ |
| `planner` | Logical planning helpers | ✅ |
| `time` | Time-format conversion helpers | ✅ |
| `stacker` | Stack-growth protection for native builds | ✅ |
| `bindings` | TypeScript type generation via `ts-rs` | ❌ |
| `dialect-postgresql` | PostgreSQL dialect only | — |
| `dialect-mysql` | MySQL dialect only | — |
| ... (one per dialect) | Individual dialect selector | — |
| `function-catalog-clickhouse` | ClickHouse function catalog | ❌ |
| `function-catalog-duckdb` | DuckDB function catalog | ❌ |
| `function-catalog-all-dialects` | All function catalogs | ❌ |
|
||||
|
||||
Minimal WASM build (for constrained targets):
|
||||
```toml
|
||||
polyglot-sql = { version = "0.4", default-features = false, features = ["generate", "transpile", "dialect-postgresql", "dialect-mysql"] }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Source code examined: `/workspace/polyglot/crates/polyglot-sql/src/` (~241K LOC)
|
||||
- Architecture documentation: `/workspace/polyglot/docs/sqlglot-architecture.md`
|
||||
- Benchmark results: `/workspace/polyglot/docs/benchmark.md`
|
||||
- README: `/workspace/polyglot/README.md`, `/workspace/polyglot/crates/polyglot-sql/README.md`
|
||||
- CHANGELOG: `/workspace/polyglot/CHANGELOG.md`
|
||||
@@ -1,294 +0,0 @@
|
||||
# Polyglot: Suitability Analysis & Comparisons
|
||||
|
||||
---
|
||||
|
||||
## 1. What Polyglot Is NOT
|
||||
|
||||
Before evaluating suitability, it's essential to understand what Polyglot **does not** do:
|
||||
|
||||
| NOT a... | Because |
|---|---|
| **Database driver** | No connection management, no query execution, no result set handling |
| **ORM** | No object-relational mapping, no model definitions, no active record pattern |
| **Migration tool** | No `CREATE TABLE` evolution management, no up/down migrations framework |
| **Type mapper** | No Rust type → SQL type mapping, no `FromRow` derives |
| **Connection pool** | No async I/O, no TCP connections, no TLS |
| **Query executor** | Never connects to a database; operates purely on SQL text |
|
||||
|
||||
**Polyglot is a SQL dialect transpiler.** It converts SQL strings between database dialects. Period.
|
||||
|
||||
---
|
||||
|
||||
## 2. Suitability Assessment for Multi-Database Storage Layer
|
||||
|
||||
### 2.1 What Polyglot CAN Do for a Multi-DB Project
|
||||
|
||||
| Use Case | Polyglot Support | Maturity |
|---|---|---|
| **SQL dialect translation** | ✅ Core purpose; 32 dialects with 100% test pass rate | Mature |
| **SQL pretty-printing** | ✅ Built-in format with guard rails | Mature |
| **SQL syntax validation** | ✅ Line/column error positions, error codes | Mature |
| **Schema-aware validation** | ✅ Table/column/type checking with `ValidationSchema` | Moderate |
| **Column lineage tracing** | ✅ `get_column_lineage()` for data lineage | Moderate |
| **OpenLineage payloads** | ✅ `RunEvent` and `DatasetFacet` generation | Early but functional |
| **Query builder** | ✅ Fluent API for SELECT/INSERT/UPDATE/DELETE | Usable but not as rich as query-builder-first libraries |
| **AST diff** | ✅ ChangeDistiller-based structural diff | Functional |
| **Logical planning** | ✅ Basic DAG plan extraction | Early stage |
| **Query optimization** | ✅ Column qualification, predicate pushdown, join elimination | Moderate |
| **Custom dialect registration** | ✅ `CustomDialectBuilder` for runtime extension | Functional |
|
||||
|
||||
### 2.2 What Polyglot CANNOT Do for a Multi-DB Project
|
||||
|
||||
| Need | Polyglot Support | Alternative |
|
||||
|---|---|---|
|
||||
| **Execute queries** | ❌ No | Use sqlx, diesel, or sea-orm |
|
||||
| **Connection pooling** | ❌ No | Use deadpool, bb8, or sqlx built-in |
|
||||
| **Async I/O** | ❌ Synchronous only | Wrap in `spawn_blocking()` |
|
||||
| **Type-safe query building** | ⚠️ Partial (builder takes string identifiers and yields an `Expression` AST; no compile-time checks) | Use diesel or sea-orm for compile-time checks |
|
||||
| **Schema migration management** | ❌ No | Use diesel migrations, sqlx migrations, or refinery |
|
||||
| **Row mapping / deserialization** | ❌ No | Use sqlx `FromRow`, diesel `Queryable` |
|
||||
| **Runtime type mapping** | ⚠️ Limited (DataType enum, no Rust type bridge) | Build your own layer |
|
||||
| **Database-specific DDL generation** | ⚠️ Parses/generates DDL but no migration framework | Use as a building block |
|
||||
| **Transaction management** | ❌ No | Use sqlx or diesel |
|
||||
|
||||
### 2.3 Integration Pattern: Polyglot as a SQL Dialect Layer
|
||||
|
||||
The most natural integration pattern for a multi-database storage layer:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────┐
|
||||
│ Application Logic │
|
||||
├──────────────────────────────────────────────┤
|
||||
│ Query Builder / ORM Layer │
|
||||
│ (diesel / sea-orm / custom) │
|
||||
├──────────────────────┬───────────────────────┤
|
||||
│ │ │
|
||||
│ Polyglot Layer │ Direct SQL │
|
||||
│ (transpile, │ (no translation │
|
||||
│ validate, │ needed) │
|
||||
│ format) │ │
|
||||
├──────────────────────┴───────────────────────┤
|
||||
│ Database Driver Layer │
|
||||
│ (sqlx / diesel / rusqlite) │
|
||||
├──────────────────────────────────────────────┤
|
||||
│ PostgreSQL │ MySQL │ SQLite │
|
||||
└──────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
In this pattern, Polyglot sits **above** the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does **not** replace the drivers.
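
The layering above can be sketched as a thin wrapper that always translates before it executes. This is a minimal sketch under stated assumptions: `Backend`, `Transpile`, `Execute`, `StorageLayer`, `QuoteSwap`, and `Recorder` are hypothetical names invented for illustration, not Polyglot or sqlx APIs. The toy transpiler only swaps identifier quotes for MySQL, and the toy executor records statements instead of connecting to a database.

```rust
/// Database families targeted by this sketch (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LayerError {
    Transpile(String),
    Execute(String),
}

/// Stand-in for the Polyglot step; hypothetical, not Polyglot's API.
pub trait Transpile {
    fn transpile(&self, canonical_sql: &str, target: Backend) -> Result<String, LayerError>;
}

/// Stand-in for the driver step (sqlx, diesel, ...); also hypothetical.
pub trait Execute {
    fn execute(&mut self, sql: &str) -> Result<u64, LayerError>;
}

pub struct StorageLayer<T, E> {
    transpiler: T,
    executor: E,
    backend: Backend,
}

impl<T: Transpile, E: Execute> StorageLayer<T, E> {
    pub fn new(transpiler: T, executor: E, backend: Backend) -> Self {
        StorageLayer { transpiler, executor, backend }
    }

    /// Translate first, then hand the translated text to the driver.
    pub fn run(&mut self, canonical_sql: &str) -> Result<u64, LayerError> {
        let sql = self.transpiler.transpile(canonical_sql, self.backend)?;
        self.executor.execute(&sql)
    }

    pub fn executor(&self) -> &E {
        &self.executor
    }
}

/// Toy transpiler: only swaps identifier quotes for MySQL.
pub struct QuoteSwap;

impl Transpile for QuoteSwap {
    fn transpile(&self, canonical_sql: &str, target: Backend) -> Result<String, LayerError> {
        Ok(match target {
            Backend::MySql => canonical_sql.replace('"', "`"),
            Backend::Postgres | Backend::Sqlite => canonical_sql.to_string(),
        })
    }
}

/// Toy executor: records statements instead of touching a database.
#[derive(Default)]
pub struct Recorder {
    pub statements: Vec<String>,
}

impl Execute for Recorder {
    fn execute(&mut self, sql: &str) -> Result<u64, LayerError> {
        self.statements.push(sql.to_string());
        Ok(0) // no rows are affected in this sketch
    }
}
```

Keeping both steps behind traits means the "Direct SQL" path in the diagram can be modelled as a pass-through `Transpile` implementation rather than a second code path.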
|
||||
|
||||
---
|
||||
|
||||
## 3. Comparison with Other Rust SQL Libraries
|
||||
|
||||
### 3.1 Feature Comparison Matrix
|
||||
|
||||
| Feature | **Polyglot** | **Diesel** | **SQLx** | **SeaORM** | **sqlparser-rs** |
|
||||
|---|---|---|---|---|---|
|
||||
| **Primary Purpose** | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
|
||||
| **SQL Parsing** | ✅ Full AST (200+ node types) | ❌ No (builds SQL from a DSL) | ❌ No | ❌ No | ✅ Full AST |
|
||||
| **SQL Generation** | ✅ Multi-dialect | ✅ Via DSL | ❌ No | ❌ No | ⚠️ Limited |
|
||||
| **Cross-dialect Transpilation** | ✅ 32 dialects | ❌ No | ❌ No | ❌ No | ❌ No |
|
||||
| **Query Builder** | ⚠️ Fluent, string-based | ✅ Type-safe DSL | ❌ No | ✅ Type-safe | ❌ No |
|
||||
| **Async I/O** | ❌ No (sync only) | ❌ Sync (async via the separate `diesel-async` crate) | ✅ Native async | ✅ Native async | ❌ No |
|
||||
| **Type-safe Queries** | ❌ No (runtime) | ✅ Compile-time | ⚠️ Compile-time checked via `query!()` macro | ✅ Compile-time | ❌ No |
|
||||
| **Connection Pool** | ❌ No | ⚠️ Via optional `r2d2` feature | ✅ Built-in | ✅ Built-in | ❌ No |
|
||||
| **Migration Support** | ❌ No | ✅ Built-in | ✅ Built-in (`sqlx::migrate!`) | ✅ Built-in | ❌ No |
|
||||
| **Database Execution** | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
|
||||
| **Schema Validation** | ✅ Via ValidationSchema | ✅ Compile-time | ❌ No | ⚠️ Limited | ❌ No |
|
||||
| **Column Lineage** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
|
||||
| **AST Diff** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
|
||||
| **Dialects Supported** | 32 | 3 (PostgreSQL, MySQL, SQLite) | N/A | N/A | ANSI SQL + some dialects (parsing only) |
|
||||
| **License** | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT/Apache-2.0 | Apache-2.0 |
|
||||
| **Maturity** | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |
|
||||
|
||||
### 3.2 Polyglot vs Diesel
|
||||
|
||||
| Aspect | Polyglot | Diesel |
|
||||
|---|---|---|
|
||||
| **Philosophy** | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
|
||||
| **Type Safety** | Runtime (string-based) | Compile-time (macro-based) |
|
||||
| **Query Building** | `select(["col"]).from("t").where_(...)` → `Expression` AST | `table.filter(col.eq(...))` → SQL |
|
||||
| **Dialect Breadth** | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
|
||||
| **Database Execution** | None (SQL text only) | Full CRUD with connection management |
|
||||
| **Migrations** | None | Built-in migration framework |
|
||||
| **When to use** | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |
|
||||
|
||||
**Verdict**: Polyglot and Diesel are **complementary**, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.
|
||||
|
||||
### 3.3 Polyglot vs SQLx
|
||||
|
||||
| Aspect | Polyglot | SQLx |
|
||||
|---|---|---|
|
||||
| **Philosophy** | SQL manipulation without execution | Async database driver with compile-time query checking |
|
||||
| **Async** | Synchronous only | Fully async |
|
||||
| **Query Checking** | Runtime validation against schema | Compile-time `query!()` macro |
|
||||
| **Database Support** | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
|
||||
| **When to use** | SQL transformation/analysis | Database interaction with async Rust |
|
||||
|
||||
**Verdict**: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.
|
||||
|
||||
### 3.4 Polyglot vs SeaORM
|
||||
|
||||
| Aspect | Polyglot | SeaORM |
|
||||
|---|---|---|
|
||||
| **Philosophy** | SQL transpilation | Async ORM built on SQLx |
|
||||
| **Async** | No | Yes |
|
||||
| **Model Definition** | None | Entity models via macros |
|
||||
| **Relationships** | None | Has-one, has-many, many-to-many |
|
||||
| **When to use** | SQL dialect conversion | Database CRUD with relationships |
|
||||
|
||||
**Verdict**: Same as SQLx — complementary, not competing.
|
||||
|
||||
### 3.5 Polyglot vs sqlparser-rs
|
||||
|
||||
| Aspect | Polyglot | sqlparser-rs |
|
||||
|---|---|---|
|
||||
| **Parsing** | ✅ Full (200+ node types) | ✅ Full (ANSI SQL + some dialects) |
|
||||
| **Generation** | ✅ Multi-dialect generation | ⚠️ Limited round-trip |
|
||||
| **Transpilation** | ✅ Cross-dialect transforms | ❌ No |
|
||||
| **Dialects** | 32 | Primarily ANSI SQL |
|
||||
| **Validation** | ✅ With error positions | ❌ Parse errors only |
|
||||
| **Builder** | ✅ Fluent API | ❌ No |
|
||||
| **Lineage** | ✅ Built-in | ❌ No |
|
||||
| **Diff** | ✅ Built-in | ❌ No |
|
||||
| **Maturity** | v0.4.4 | v0.49 (more established) |
|
||||
|
||||
**Verdict**: sqlparser-rs is a mature parser for ANSI SQL. Polyglot offers significantly more: transpilation, 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, Polyglot is the clear choice. If you only need ANSI SQL parsing and don't need generation/transpilation, sqlparser-rs may suffice with less overhead.
|
||||
|
||||
### 3.6 Polyglot vs Python sqlglot
|
||||
|
||||
| Aspect | Polyglot (Rust) | sqlglot (Python) |
|
||||
|---|---|---|
|
||||
| **Performance** | 8–19× faster (transpile), ~86× faster (generate) | Baseline |
|
||||
| **Language** | Rust | Python |
|
||||
| **Feature Parity** | ~95% of sqlglot's transpilation | Full feature set |
|
||||
| **Optimizer** | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
|
||||
| **Execution** | ❌ No | ⚠️ Limited (can execute against some engines) |
|
||||
| **Test Compatibility** | 10,220+ sqlglot fixture cases at 100% | Original test suite |
|
||||
| **Deployment** | Native binary / WASM / Python / Go | Python package |
|
||||
|
||||
**Verdict**: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at near-full feature parity. The Python sqlglot has a more mature optimizer and some execution capabilities, but Polyglot is catching up rapidly (0.4.x adds lineage, OpenLineage, schema validation, and more).
|
||||
|
||||
---
|
||||
|
||||
## 4. Limitations and Gotchas
|
||||
|
||||
### 4.1 Current Limitations
|
||||
|
||||
| Limitation | Impact | Mitigation |
|
||||
|---|---|---|
|
||||
| **Pre-1.0 API** | Breaking changes possible between minor versions | Pin exact version in Cargo.toml |
|
||||
| **No query execution** | Cannot run SQL against databases | Use alongside sqlx/diesel |
|
||||
| **No async** | Blocking in async contexts | Wrap in `spawn_blocking()` |
|
||||
| **No migration framework** | Cannot manage schema evolution | Use diesel migrations or refinery |
|
||||
| **No Rust type mapping** | `DataType` enum doesn't map to Rust types | Build your own type bridge |
|
||||
| **Builder returns Expression** | Builder doesn't produce type-safe queries | Accept runtime nature; pair with runtime validation |
|
||||
| **Optimizer is early** | Limited optimization passes vs Python sqlglot | Most useful passes exist (qualify_columns, pushdown_predicates) |
|
||||
| **WASM lacks `stacker`** | Deeply nested SQL may overflow stack in browser | Set format guard limits; consider web workers |
|
||||
| **Custom dialects are global** | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
|
||||
| **No prepared statement support** | Cannot generate `?` placeholders for parameterized queries | Build queries as strings; use sqlx for parameterization |
|
||||
|
||||
### 4.2 Gotchas
|
||||
|
||||
1. **`Dialect::get()` creates a new instance each call**: The `Dialect` struct bundles tokenizer + generator config + transformer. For hot loops, cache the `Dialect` instance rather than calling `Dialect::get()` repeatedly. (The overhead is minimal but non-zero.)
|
||||
|
||||
2. **Transpilation is not always invertible**: Some dialects have features that don't exist in others (e.g., BigQuery's `QUALIFY`, PostgreSQL's `ILIKE`, TSQL's `TOP`). Transpiling `A → B` and then `B → A` may lose information.
|
||||
|
||||
3. **Function transformation depth**: The transform pipeline processes per-node bottom-up. Some transformations require multi-pass processing (handled by `preprocess()`), but edge cases may require manual intervention.
|
||||
|
||||
4. **AST is not a stable serialization format**: The `Expression` enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.
|
||||
|
||||
5. **Feature flags are cumulative**: `transpile` implies `generate`, `openlineage` implies `semantic`, etc. For minimal builds, use `default-features = false` and select only what you need.
|
||||
|
||||
6. **Global custom dialect registry**: Custom dialects registered via `CustomDialectBuilder::register()` are stored in a global `RwLock<HashMap>`. This means they persist for the lifetime of the process and are visible across threads. Call `unregister_custom_dialect()` to remove them.
|
||||
|
||||
7. **Parser is permissive**: The parser accepts many SQL constructs that some databases reject. Validation (via `validate()` or `validate_with_schema()`) can catch some issues, but it's not a substitute for database-level error checking.
|
||||
|
||||
8. **No `?` placeholder generation**: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.
|
||||
|
||||
9. **Schema validation requires manual schema definition**: The `ValidationSchema` struct must be populated manually — there's no automatic schema introspection from a live database.
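
Gotcha 6 (the global custom dialect registry) is easier to reason about with a small model. The following is a minimal sketch, assuming a registry shaped like the `RwLock<HashMap>` described above; `DialectSpec`, `register`, `lookup`, `unregister`, and `ScopedDialect` are hypothetical names, not Polyglot's API. It shows that a registration is visible from other threads, and how an RAII guard can limit its lifetime.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Hypothetical stand-in for a custom dialect definition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DialectSpec {
    pub identifier_quote: char,
}

/// Process-wide registry, created lazily on first use.
fn registry() -> &'static RwLock<HashMap<String, DialectSpec>> {
    static REGISTRY: OnceLock<RwLock<HashMap<String, DialectSpec>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Insert or replace a dialect; returns the previous entry, if any.
pub fn register(name: &str, spec: DialectSpec) -> Option<DialectSpec> {
    registry()
        .write()
        .expect("registry lock poisoned")
        .insert(name.to_string(), spec)
}

pub fn lookup(name: &str) -> Option<DialectSpec> {
    registry()
        .read()
        .expect("registry lock poisoned")
        .get(name)
        .cloned()
}

pub fn unregister(name: &str) -> Option<DialectSpec> {
    registry()
        .write()
        .expect("registry lock poisoned")
        .remove(name)
}

/// Guard that removes its dialect when dropped, so a registration
/// cannot outlive the scope that created it.
pub struct ScopedDialect {
    name: String,
}

impl ScopedDialect {
    pub fn register(name: &str, spec: DialectSpec) -> Self {
        register(name, spec);
        ScopedDialect { name: name.to_string() }
    }
}

impl Drop for ScopedDialect {
    fn drop(&mut self) {
        unregister(&self.name);
    }
}
```

A guard like this does not give per-request isolation (two requests registering the same name still collide); it only prevents stale entries from accumulating.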
|
||||
|
||||
---
|
||||
|
||||
## 5. Production-Readiness Assessment
|
||||
|
||||
### 5.1 Strengths
|
||||
|
||||
| Area | Rating | Notes |
|
||||
|---|---|---|
|
||||
| **Transpilation accuracy** | ⭐⭐⭐⭐⭐ | 10,220+ fixture cases at 100% pass rate |
|
||||
| **Performance** | ⭐⭐⭐⭐⭐ | 8–19× faster than Python sqlglot |
|
||||
| **Dialect coverage** | ⭐⭐⭐⭐⭐ | 32 dialects covering all major databases |
|
||||
| **API ergonomics** | ⭐⭐⭐⭐ | Clean public API; builder is pleasant |
|
||||
| **Error reporting** | ⭐⭐⭐⭐ | Line/column/byte-offset positions |
|
||||
| **WASM support** | ⭐⭐⭐⭐ | Full feature set in browser |
|
||||
| **Multi-language bindings** | ⭐⭐⭐⭐⭐ | Rust, TypeScript, Python, Go, C FFI |
|
||||
| **Documentation** | ⭐⭐⭐ | Rust API docs exist; could use more guides |
|
||||
| **Test coverage** | ⭐⭐⭐⭐⭐ | 18,745 test cases |
|
||||
| **Fuzzing** | ⭐⭐⭐⭐ | Supported via `cargo fuzz` |
|
||||
|
||||
### 5.2 Risks
|
||||
|
||||
| Risk | Severity | Mitigation |
|
||||
|---|---|---|
|
||||
| **Pre-1.0 breaking changes** | Medium | Pin version; monitor CHANGELOG |
|
||||
| **Single maintainer** | Medium | Code is well-structured; community could fork |
|
||||
| **Limited optimizer** | Low | Core passes exist; Python sqlglot is reference |
|
||||
| **No query execution** | Low (by design) | Combine with sqlx/diesel |
|
||||
| **WASM stack limits** | Low | Set guard rails; use web workers |
|
||||
|
||||
### 5.3 Overall Assessment
|
||||
|
||||
**Polyglot is production-viable for SQL transpilation and analysis tasks**, with caveats:
|
||||
|
||||
- ✅ **Use for**: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
|
||||
- ⚠️ **Use with caution for**: Query building (no type safety), optimization (partial coverage)
|
||||
- ❌ **Don't use for**: Database execution, connection management, migrations, type-safe queries
|
||||
|
||||
For a multi-database storage layer, the recommended pattern is:
|
||||
|
||||
```
|
||||
Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Recommendation
|
||||
|
||||
### When to Adopt Polyglot
|
||||
|
||||
1. **You need to support multiple database backends with different SQL dialects** and want to write queries once in a canonical dialect, then transpile to the target at runtime.
|
||||
2. **You need SQL validation or analysis** (lineage, schema checking) without executing queries.
|
||||
3. **You need SQL pretty-printing or formatting** with configurable guard rails.
|
||||
4. **You need column lineage tracking** for data governance or OpenLineage integration.
|
||||
5. **You need to parse and analyze SQL** in a Rust/WASM/Python/Go context without connecting to a database.
|
||||
|
||||
### When NOT to Adopt Polyglot
|
||||
|
||||
1. **You need type-safe query building** — use Diesel or SeaORM instead.
|
||||
2. **You need async database execution** — use SQLx or SeaORM instead.
|
||||
3. **You need schema migrations** — use Diesel migrations, sqlx migrations, or Refinery instead.
|
||||
4. **You only need PostgreSQL** (or a single dialect) — a simpler parser may suffice.
|
||||
5. **You need Rust type → SQL type mapping** — Polyglot doesn't provide this.
|
||||
|
||||
### Suggested Adoption Strategy
|
||||
|
||||
For a multi-database storage layer:
|
||||
|
||||
1. **Use Polyglot for SQL transpilation**: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
|
||||
2. **Use SQLx for database execution**: Handle connections, pooling, and async I/O.
|
||||
3. **Use Polyglot for validation**: Validate user-provided SQL before execution.
|
||||
4. **Use Polyglot for lineage**: Trace column flow for data governance.
|
||||
5. **Build a thin integration layer** that combines Polyglot's transpilation with SQLx's execution.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- <https://github.com/tobilg/polyglot> — Main repository
|
||||
- <https://crates.io/crates/polyglot-sql> — Rust crate (v0.4.4)
|
||||
- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API docs
|
||||
- <https://github.com/tobymao/sqlglot> — Python inspiration
|
||||
- <https://lib.rs/crates/polyglot-sql> — Package metadata
|
||||
- Local source: `/workspace/polyglot/`
|
||||
@@ -1,765 +0,0 @@
|
||||
# RustFS Event Notification System & S3 Select Reference
|
||||
|
||||
> **Companion document**: This extends [rustfs-reference.md](./rustfs-reference.md) which covers auth, architecture, and credential mapping. This document focuses on the **event notification system** and **S3 Select** feature.
|
||||
|
||||
**Date**: 2026-06-08
|
||||
**RustFS version**: Based on source at `/workspace/rustfs/` (commit-level snapshot)
|
||||
**Purpose**: Evaluate rustfs event notification and S3 Select for alknet integration
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Event Notification System](#1-event-notification-system)
|
||||
2. [Event Types & Structure](#2-event-types--structure)
|
||||
3. [Notification Targets](#3-notification-targets)
|
||||
4. [Configuration & Rule Engine](#4-configuration--rule-engine)
|
||||
5. [Pipeline & Delivery](#5-pipeline--delivery)
|
||||
6. [Live Event Stream](#6-live-event-stream)
|
||||
7. [S3 Select](#7-s3-select)
|
||||
8. [Mapping to alknet](#8-mapping-to-alknet)
|
||||
9. [References](#9-references)
|
||||
|
||||
---
|
||||
|
||||
## 1. Event Notification System
|
||||
|
||||
### 1.1 Architecture Overview
|
||||
|
||||
RustFS implements a full S3-compatible bucket notification system. The architecture follows a layered pattern:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ S3 API Layer │
|
||||
│ (PutObject, DeleteObject, CopyObject, etc.) │
|
||||
└─────────────┬────────────────────────────────────────────┘
|
||||
│ emits EventArgs
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ ECStore (event_notification.rs) │
|
||||
│ - send_event() hook (global OnceLock dispatch) │
|
||||
│ - registers dispatch callback during init │
|
||||
└─────────────┬────────────────────────────────────────────┘
|
||||
│ converts EventArgs → NotifyEventArgs
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ rustfs_notify (NotificationSystem) │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │
|
||||
│ │ NotifyPipeline│──▶│ NotifyRuleEngine│─▶│ EventNotifier │ │
|
||||
│ │ (broadcast │ │ (match rules) │ │ (send to │ │
|
||||
│ │ + history) │ │ │ │ targets) │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────┬────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────▼────────┐ │
|
||||
│ │ BucketConfig │ │ NotifyConfig │ │ TargetList │ │
|
||||
│ │ Manager │ │ Manager │ │ (Webhook, │ │
|
||||
│ └──────────────┘ └──────────────┘ │ Kafka, AMQP, │ │
|
||||
│ │ NATS, Redis, │ │
|
||||
│ │ MQTT, MySQL, │ │
|
||||
│ │ Postgres, │ │
|
||||
│ │ Pulsar) │ │
|
||||
│ └───────────────┘ │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 1.2 Key Crates
|
||||
|
||||
| Crate | Purpose |
|
||||
|-------|---------|
|
||||
| `rustfs_notify` | Core notification orchestration: `Event`, `EventArgs`, `EventNotifier`, `NotifyPipeline`, `NotificationSystem`, rule engine, bucket config management |
|
||||
| `rustfs_targets` | Target implementations (Webhook, Kafka, AMQP, NATS, Redis, MQTT, MySQL, PostgreSQL, Pulsar) + `Target` trait, `QueueStore`, TLS hot-reload |
|
||||
| `rustfs_s3_types` | `EventName` enum with all S3 event type definitions, serialization, mask/bitfield support |
|
||||
| `rustfs_ecstore` | Storage layer; `event_notification.rs` provides the dispatch hook that bridges ecstore events to the notify system |
|
||||
| `rustfs_config` | Configuration for each target type (Env vars, KVS parsing, subsystem names) |
|
||||
|
||||
### 1.3 Initialization Flow
|
||||
|
||||
1. `rustfs/server/event.rs::init_event_notifier()` runs at startup
|
||||
2. If notify module is enabled (`RUSTFS_NOTIFY_ENABLE=true`), it calls `rustfs_notify::initialize(config)` which:
|
||||
- Creates a `NotificationSystem` with `EventNotifier`, `TargetRegistry`, and config
|
||||
- Loads all target configurations from the config store
|
||||
- Initializes each target (connects, health-checks, starts stream replay workers)
|
||||
3. An ECStore dispatch hook is installed via `register_event_dispatch_hook()` which:
|
||||
- Converts `ecstore::EventArgs` → `notify::EventArgs`
|
||||
- Parses `EventName` from string
|
||||
- Spawns an async task to call `notifier_global::notify(args)`
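
The dispatch hook in step 3 can be modelled with a global `OnceLock`. This is a minimal sketch, not the RustFS implementation: `StoreEvent`, `register_hook`, and `send_event` are hypothetical names, and the hook is called synchronously here, whereas the real hook is described as spawning an async task.

```rust
use std::sync::OnceLock;

/// Hypothetical, reduced stand-in for the storage layer's event arguments.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoreEvent {
    pub event_name: String,
    pub bucket: String,
    pub key: String,
}

type Hook = Box<dyn Fn(StoreEvent) + Send + Sync + 'static>;

/// Written once during initialization, read on every storage operation.
static DISPATCH_HOOK: OnceLock<Hook> = OnceLock::new();

/// Install the hook. Returns `false` if one was already installed,
/// in which case the existing hook is kept.
pub fn register_hook<F>(hook: F) -> bool
where
    F: Fn(StoreEvent) + Send + Sync + 'static,
{
    let boxed: Hook = Box::new(hook);
    DISPATCH_HOOK.set(boxed).is_ok()
}

/// Called by the storage layer. Returns `false` when no hook is
/// installed yet, so the event is dropped rather than queued.
pub fn send_event(event: StoreEvent) -> bool {
    match DISPATCH_HOOK.get() {
        Some(hook) => {
            hook(event);
            true
        }
        None => false,
    }
}
```

The design keeps the storage crate free of any dependency on the notification crate: it only knows about a function pointer that someone else fills in at startup.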
|
||||
|
||||
### 1.4 Module Toggle
|
||||
|
||||
The notification system respects a module enable/disable flag:
|
||||
- Environment variable: `RUSTFS_NOTIFY_ENABLE` (default: `DEFAULT_NOTIFY_ENABLE`)
|
||||
- When disabled, only the **live event stream** is initialized (no targets are loaded)
|
||||
- This allows in-process event subscription without external delivery
|
||||
|
||||
---
|
||||
|
||||
## 2. Event Types & Structure
|
||||
|
||||
### 2.1 EventName Enum
|
||||
|
||||
Defined in `rustfs_s3_types::EventName`. All S3-standard event types plus RustFS extensions:
|
||||
|
||||
| Category | Events |
|
||||
|----------|--------|
|
||||
| **ObjectAccessed** | `s3:ObjectAccessed:Get`, `s3:ObjectAccessed:Head`, `s3:ObjectAccessed:GetRetention`, `s3:ObjectAccessed:GetLegalHold`, `s3:ObjectAccessed:Attributes` |
|
||||
| **ObjectCreated** | `s3:ObjectCreated:Put`, `s3:ObjectCreated:Post`, `s3:ObjectCreated:Copy`, `s3:ObjectCreated:CompleteMultipartUpload`, `s3:ObjectCreated:PutRetention`, `s3:ObjectCreated:PutLegalHold` |
|
||||
| **ObjectRemoved** | `s3:ObjectRemoved:Delete`, `s3:ObjectRemoved:DeleteMarkerCreated`, `s3:ObjectRemoved:DeleteAllVersions`, `s3:ObjectRemoved:NoOP` |
|
||||
| **ObjectTagging** | `s3:ObjectTagging:Put`, `s3:ObjectTagging:Delete` |
|
||||
| **ObjectAcl** | `s3:ObjectAcl:Put` |
|
||||
| **ObjectReplication** | `s3:Replication:OperationFailedReplication`, `s3:Replication:OperationCompletedReplication`, `s3:Replication:OperationMissedThreshold`, `s3:Replication:OperationReplicatedAfterThreshold`, `s3:Replication:OperationNotTracked` |
|
||||
| **ObjectRestore** | `s3:ObjectRestore:Post`, `s3:ObjectRestore:Completed` |
|
||||
| **ObjectTransition** | `s3:ObjectTransition:Failed`, `s3:ObjectTransition:Complete` |
|
||||
| **Lifecycle** | `s3:LifecycleExpiration:Delete`, `s3:LifecycleExpiration:DeleteMarkerCreated`, `s3:LifecycleDelMarkerExpiration:Delete`, `s3:LifecycleTransition` |
|
||||
| **Bucket** | `s3:BucketCreated:*`, `s3:BucketRemoved:*` |
|
||||
| **Scanner** | `s3:Scanner:ManyVersions`, `s3:Scanner:LargeVersions`, `s3:Scanner:BigPrefix` |
|
||||
| **IntelligentTiering** | `s3:IntelligentTiering` |
|
||||
| **Compound (wildcard)** | `s3:ObjectAccessed:*`, `s3:ObjectCreated:*`, `s3:ObjectRemoved:*`, `s3:ObjectTagging:*`, `s3:Replication:*`, `s3:ObjectRestore:*`, `s3:LifecycleExpiration:*`, `s3:ObjectTransition:*`, `s3:Scanner:*`, `Everything` |
|
||||
| **Internal** | `ObjectRemovedAbortMultipartUpload`, `ObjectCreatedCreateMultipartUpload`, `ObjectRemovedDeleteObjects` |
|
||||
|
||||
### 2.2 Event Schema Versioning
|
||||
|
||||
The `event_schema_version` function returns different versions based on event type:
|
||||
|
||||
| Version | Events |
|
||||
|---------|--------|
|
||||
| `2.1` | ObjectCreated/Removed/Accessed base events |
|
||||
| `2.2` | Replication events |
|
||||
| `2.3` | Tagging, ACL, Restore, Lifecycle, IntelligentTiering events |
|
||||
|
||||
### 2.3 Event Record Structure (`rustfs_notify::Event`)
|
||||
|
||||
```rust
|
||||
pub struct Event {
|
||||
pub event_version: String, // e.g., "2.1", "2.2", "2.3"
|
||||
pub event_source: String, // "rustfs:s3"
|
||||
pub aws_region: String,
|
||||
pub event_time: DateTime<Utc>,
|
||||
pub event_name: EventName,
|
||||
pub user_identity: Identity, // { principal_id: String }
|
||||
pub request_parameters: HashMap<String, String>,
|
||||
pub response_elements: HashMap<String, String>,
|
||||
pub s3: Metadata, // See below
|
||||
pub glacier_event_data: Option<GlacierEventData>,
|
||||
pub source: Source, // { host, port, user_agent }
|
||||
}
|
||||
|
||||
pub struct Metadata {
|
||||
pub schema_version: String, // "1.0"
|
||||
pub configuration_id: String,
|
||||
pub bucket: Bucket, // { name, owner_identity, arn }
|
||||
pub object: Object, // See below
|
||||
}
|
||||
|
||||
pub struct Object {
|
||||
pub key: String, // URL-encoded object key
|
||||
pub size: Option<i64>,
|
||||
pub e_tag: Option<String>,
|
||||
pub content_type: Option<String>,
|
||||
pub user_metadata: Option<HashMap<String, String>>,
|
||||
pub version_id: Option<String>,
|
||||
pub sequencer: String, // Monotonic event sequence ID
|
||||
}
|
||||
```
|
||||
|
||||
- The `key` field is URL-encoded (form-urlencoded)
|
||||
- `sequencer` is derived from `ObjectInfo.mod_time` nanosecond timestamp, ensuring ordering
|
||||
- `user_metadata` filters out keys starting with `x-amz-meta-internal-`
|
||||
- For removed events, `size`, `e_tag`, `content_type`, and `user_metadata` are omitted
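
The last two rules above can be sketched in a few lines. This is an illustration only: `visible_user_metadata`, `describe_object`, and `ObjectSummary` are hypothetical names, and the prefix comparison is assumed to be case-sensitive.

```rust
use std::collections::HashMap;

const INTERNAL_PREFIX: &str = "x-amz-meta-internal-";

/// Hypothetical, reduced view of the `Object` record.
#[derive(Debug, PartialEq, Eq)]
pub struct ObjectSummary {
    pub size: Option<i64>,
    pub user_metadata: Option<HashMap<String, String>>,
}

/// Drop internal keys before metadata is exposed in an event.
pub fn visible_user_metadata(all: &HashMap<String, String>) -> HashMap<String, String> {
    all.iter()
        .filter(|(key, _)| !key.starts_with(INTERNAL_PREFIX))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}

/// Removed events carry no size or metadata; other events carry both.
pub fn describe_object(size: i64, all: &HashMap<String, String>, removed: bool) -> ObjectSummary {
    if removed {
        ObjectSummary { size: None, user_metadata: None }
    } else {
        ObjectSummary {
            size: Some(size),
            user_metadata: Some(visible_user_metadata(all)),
        }
    }
}
```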
|
||||
|
||||
### 2.4 EventArgs Builder
|
||||
|
||||
Events are constructed via `EventArgsBuilder`:
|
||||
|
||||
```rust
|
||||
let args = EventArgsBuilder::new(EventName::ObjectCreatedPut, "my-bucket", object_info)
|
||||
.host("10.0.0.1")
|
||||
.port(9000)
|
||||
.user_agent("alknet-storage/1.0")
|
||||
.req_param("principalId", "user-123")
|
||||
.version_id("v2")
|
||||
.build();
|
||||
let event = Event::new(args);
|
||||
```
|
||||
|
||||
The builder takes the required fields (event name, bucket, object info) in `new()` and exposes the optional ones as chained setters.
|
||||
|
||||
---
|
||||
|
||||
## 3. Notification Targets
|
||||
|
||||
### 3.1 Target Trait
|
||||
|
||||
All targets implement `rustfs_targets::Target<E>`:
|
||||
|
||||
```rust
|
||||
#[async_trait]
|
||||
pub trait Target<E>: Send + Sync + 'static
|
||||
where E: Send + Sync + 'static + Clone + Serialize + DeserializeOwned
|
||||
{
|
||||
fn id(&self) -> TargetID;
|
||||
fn name(&self) -> String;
|
||||
async fn is_active(&self) -> Result<bool, TargetError>;
|
||||
async fn save(&self, event: Arc<EntityTarget<E>>) -> Result<(), TargetError>;
|
||||
async fn send_raw_from_store(&self, key: Key, body: Vec<u8>, meta: QueuedPayloadMeta) -> Result<(), TargetError>;
|
||||
async fn send_from_store(&self, key: Key) -> Result<(), TargetError>;
|
||||
async fn close(&self) -> Result<(), TargetError>;
|
||||
fn store(&self) -> Option<&(dyn Store<QueuedPayload, ...>)>;
|
||||
fn clone_dyn(&self) -> Box<dyn Target<E> + Send + Sync>;
|
||||
async fn init(&self) -> Result<(), TargetError>;
|
||||
fn is_enabled(&self) -> bool;
|
||||
fn delivery_snapshot(&self) -> TargetDeliverySnapshot;
|
||||
fn record_final_failure(&self);
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Supported Targets
|
||||
|
||||
| Target | Crate Module | Protocol | Queue Store | TLS/mTLS | Auth / SASL | Notes |
|
||||
|--------|-------------|----------|-------------|----------|------|-------|
|
||||
| **Webhook** | `targets::webhook` | HTTP POST | Yes (file) | Yes (CA, client cert, skip_verify) | Bearer token | Health check via HEAD to `/`; TLS hot-reload |
|
||||
| **Kafka** | `targets::kafka` | Kafka Produce | Yes (file) | Yes (CA, client cert) | PLAIN, SCRAM-SHA-256, SCRAM-SHA-512 | Uses `rustfs_kafka_async`; acknowledgments configurable (-1, 0, 1) |
|
||||
| **AMQP** | `targets::amqp` | AMQP 0-9-1 | Yes (file) | Yes (CA, client cert via amqps://) | Username/password (in URL or config) | Uses `lapin`; publisher confirms; persistent delivery mode |
|
||||
| **NATS** | `targets::nats` | NATS Publish | Yes (file) | Yes (CA, client cert) | Token, username/password, credentials file | Subject-based routing |
|
||||
| **Redis** | `targets::redis` | Redis Pub/Sub | Yes (file) | Yes (CA, client cert, insecure) | Password | Channel publish; connection pooling |
|
||||
| **MQTT** | `targets::mqtt` | MQTT v5 | Yes (file) | Yes (CA, client cert) | Username/password | Uses `rumqttc`; QoS 0/1; WebSocket path allowlist |
|
||||
| **MySQL** | `targets::mysql` | MySQL INSERT | Yes (file) | Yes (CA, client cert) | Username/password | Namespace or access format; connection pooling |
|
||||
| **PostgreSQL** | `targets::postgres` | PostgreSQL INSERT/UPSERT | Yes (file) | Yes (CA, client cert) | Username/password (DSN) | Namespace (UPSERT) or access (append) format; `deadpool-postgres` pooling |
|
||||
| **Pulsar** | `targets::pulsar` | Pulsar Produce | Yes (file) | Yes (CA, client cert) | Token, OAuth2 | Topic-based; persistent or non-persistent |
|
||||
|
||||
**Note**: Elasticsearch is listed as a subsystem constant (`notify_elasticsearch`) but marked `#[allow(dead_code)]`, indicating it's planned but not yet implemented.
|
||||
|
||||
### 3.3 Target Identification (ARN)
|
||||
|
||||
Each target has a `TargetID` (format: `ID:Name`, e.g., `1:webhook`) and an `ARN` (format: `arn:rustfs:sqs:{region}:{id}:{name}`, e.g., `arn:rustfs:sqs:us-east-1:1:webhook`).
|
||||
|
||||
Default partition: `rustfs`, default service: `sqs`.
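
The two identifier formats can be round-tripped with a small parser. This is a minimal sketch based only on the formats stated above; the `TargetId` and `Arn` types here are hypothetical and are not the RustFS definitions.

```rust
use std::fmt;
use std::str::FromStr;

/// `ID:Name`, e.g. `1:webhook`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TargetId {
    pub id: String,
    pub name: String,
}

impl fmt::Display for TargetId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.id, self.name)
    }
}

/// `arn:rustfs:sqs:{region}:{id}:{name}`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Arn {
    pub region: String,
    pub target: TargetId,
}

impl fmt::Display for Arn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "arn:rustfs:sqs:{}:{}", self.region, self.target)
    }
}

impl FromStr for Arn {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split(':').collect();
        match parts.as_slice() {
            // Only the default partition and service are accepted here.
            ["arn", "rustfs", "sqs", region, id, name]
                if !id.is_empty() && !name.is_empty() =>
            {
                Ok(Arn {
                    region: region.to_string(),
                    target: TargetId {
                        id: id.to_string(),
                        name: name.to_string(),
                    },
                })
            }
            _ => Err(format!("invalid target ARN: {}", s)),
        }
    }
}
```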
|
||||
|
||||
### 3.4 Queue Store (Persistent Delivery)
|
||||
|
||||
Targets that have a `queue_dir` configured use a persistent store for at-least-once delivery:
|
||||
|
||||
- Events are first persisted to the queue store, then sent
|
||||
- If the target is unreachable, events remain in the store and are replayed when connectivity recovers
|
||||
- Queue store format: `RQP1` magic + metadata length (LE u32) + JSON metadata + raw body
|
||||
- `QueuedPayload` structure includes: event_name, bucket_name, object_name, content_type, queued_at_unix_ms, payload_len
|
||||
- Extension: `notify_store` (`.nqs`) for notification events, `audit_store` for audit logs
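
The on-disk framing described above (`RQP1` magic, little-endian `u32` metadata length, JSON metadata, raw body) can be sketched as an encode/decode pair. This is an illustration of the layout only: the function and error names are hypothetical, and the metadata is treated as opaque bytes rather than parsed JSON.

```rust
use std::convert::TryFrom;

const MAGIC: &[u8; 4] = b"RQP1";
const HEADER_LEN: usize = 8; // 4 bytes magic + 4 bytes metadata length

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    MetadataTooLarge,
    Truncated,
    BadMagic,
}

/// Build `magic | meta_len (LE u32) | metadata | body`.
pub fn encode_frame(metadata_json: &[u8], body: &[u8]) -> Result<Vec<u8>, FrameError> {
    let meta_len =
        u32::try_from(metadata_json.len()).map_err(|_| FrameError::MetadataTooLarge)?;
    let mut out = Vec::with_capacity(HEADER_LEN + metadata_json.len() + body.len());
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&meta_len.to_le_bytes());
    out.extend_from_slice(metadata_json);
    out.extend_from_slice(body);
    Ok(out)
}

/// Split a frame back into `(metadata, body)` without copying.
pub fn decode_frame(frame: &[u8]) -> Result<(&[u8], &[u8]), FrameError> {
    if frame.len() < HEADER_LEN {
        return Err(FrameError::Truncated);
    }
    let (header, rest) = frame.split_at(HEADER_LEN);
    if &header[..4] != MAGIC.as_slice() {
        return Err(FrameError::BadMagic);
    }
    let meta_len = u32::from_le_bytes([header[4], header[5], header[6], header[7]]) as usize;
    if rest.len() < meta_len {
        return Err(FrameError::Truncated);
    }
    Ok(rest.split_at(meta_len))
}
```

Because the body length is implied by the end of the file, a truncated body cannot be detected from the frame alone; the `payload_len` field in the metadata is presumably what makes that check possible.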
|
||||
|
||||
### 3.5 Delivery Payload Format (`TargetLog`)
|
||||
|
||||
```rust
|
||||
// Serialized as JSON when delivering to targets
|
||||
struct TargetLog {
|
||||
event_name: EventName,
|
||||
key: String, // "{bucket}/{decoded_object_name}"
|
||||
records: Vec<E>, // For AMQP/NATS: includes full EntityTarget records
|
||||
// For others: includes serialized Event data
|
||||
}
|
||||
```
|
||||
|
||||
For AMQP and NATS targets, `build_queued_payload_with_records()` is used, which includes cloned `EntityTarget` records. For other targets, `build_queued_payload()` serializes just the event data.
|
||||
|
||||
### 3.6 Concurrency Controls
|
||||
|
||||
| Parameter | Default | Env Var |
|
||||
|-----------|---------|---------|
|
||||
| Target stream concurrency | 20 | `RUSTFS_NOTIFY_TARGET_STREAM_CONCURRENCY` |
|
||||
| Send concurrency (inflight limit) | 64 | `RUSTFS_NOTIFY_SEND_CONCURRENCY` |
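
Reading these limits could look like the sketch below. The environment variable names and defaults come from the table; the helper names are hypothetical, and the fallback behaviour (unset, unparsable, or zero values fall back to the default) is an assumption rather than documented RustFS behaviour.

```rust
use std::env;

pub const DEFAULT_TARGET_STREAM_CONCURRENCY: usize = 20;
pub const DEFAULT_SEND_CONCURRENCY: usize = 64;

/// Parse a raw setting; anything missing, malformed, or zero
/// yields the default (assumed policy, see lead-in).
pub fn parse_limit(raw: Option<&str>, default: usize) -> usize {
    match raw.map(str::trim).and_then(|s| s.parse::<usize>().ok()) {
        Some(n) if n > 0 => n,
        _ => default,
    }
}

pub fn limit_from_env(name: &str, default: usize) -> usize {
    parse_limit(env::var(name).ok().as_deref(), default)
}

pub fn target_stream_concurrency() -> usize {
    limit_from_env(
        "RUSTFS_NOTIFY_TARGET_STREAM_CONCURRENCY",
        DEFAULT_TARGET_STREAM_CONCURRENCY,
    )
}

pub fn send_concurrency() -> usize {
    limit_from_env("RUSTFS_NOTIFY_SEND_CONCURRENCY", DEFAULT_SEND_CONCURRENCY)
}
```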
|
||||
|
||||
### 3.7 TLS Hot-Reload
|
||||
|
||||
TLS hot-reload is implemented through `ReloadableTargetTls` for the webhook, Kafka, AMQP, NATS, MySQL, PostgreSQL, and MQTT targets:
|
||||
|
||||
- A background coordinator polls TLS files for changes
|
||||
- When fingerprint changes are detected, new material (HTTP client, producer, connection) is built
|
||||
- Applied via `apply_tls_material()` without requiring a restart
|
||||
- Supports CA certificates, client certificates, and client keys
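
The poll-and-compare loop above reduces to a fingerprint check. This is a minimal sketch under stated assumptions: `fingerprint` and `TlsWatcher` are hypothetical names, file reading and the actual rebuild are left out, and `DefaultHasher` stands in for whatever digest RustFS actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint of the three TLS inputs. Each slice is hashed with its
/// length, so moving bytes between files changes the result.
pub fn fingerprint(ca: &[u8], cert: &[u8], key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    ca.hash(&mut hasher);
    cert.hash(&mut hasher);
    key.hash(&mut hasher);
    hasher.finish()
}

pub struct TlsWatcher {
    last: Option<u64>,
    pub reloads: u32,
}

impl TlsWatcher {
    pub fn new() -> Self {
        TlsWatcher { last: None, reloads: 0 }
    }

    /// One poll tick. Returns `true` when the material changed since the
    /// previous tick, which is where new clients would be built and applied.
    pub fn poll(&mut self, ca: &[u8], cert: &[u8], key: &[u8]) -> bool {
        let current = fingerprint(ca, cert, key);
        if self.last == Some(current) {
            return false;
        }
        let is_baseline = self.last.is_none();
        self.last = Some(current);
        if is_baseline {
            false // first observation only records the baseline
        } else {
            self.reloads += 1;
            true
        }
    }
}
```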
|
||||
|
||||
---
|
||||
|
||||
## 4. Configuration & Rule Engine
|
||||
|
||||
### 4.1 Bucket Notification Configuration (XML)
|
||||
|
||||
Configuration follows the S3 `NotificationConfiguration` XML schema:
|
||||
|
||||
```xml
|
||||
<NotificationConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
|
||||
<QueueConfiguration>
|
||||
<Id>my-notification</Id>
|
||||
<Queue>arn:rustfs:sqs:us-east-1:1:webhook</Queue>
|
||||
<Event>s3:ObjectCreated:*</Event>
|
||||
<Event>s3:ObjectRemoved:Delete</Event>
|
||||
<Filter>
|
||||
<S3Key>
|
||||
<FilterRule>
|
||||
<Name>prefix</Name>
|
||||
<Value>uploads/</Value>
|
||||
</FilterRule>
|
||||
<FilterRule>
|
||||
<Name>suffix</Name>
|
||||
<Value>.csv</Value>
|
||||
</FilterRule>
|
||||
</S3Key>
|
||||
</Filter>
|
||||
</QueueConfiguration>
|
||||
</NotificationConfiguration>
|
||||
```
|
||||
|
||||
The XML is parsed via `quick_xml` into `NotificationConfiguration` → `QueueConfig` → validated → converted to `BucketNotificationConfig` → `RulesMap`.
|
||||
|
||||
Key validation rules:
|
||||
- Lambda and Topic configurations are **not supported** (return `UnsupportedConfiguration` error)
|
||||
- Only `QueueConfiguration` is supported (maps to all target types, not just SQS)
|
||||
- One prefix filter and one suffix filter maximum
|
||||
- Filter values: ≤1024 chars, no `.` or `..` segments, no `\`, valid UTF-8
|
||||
- No duplicate event names within a queue config
|
||||
- ARN must exist in the configured target list
|
||||
|
||||
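The filter-value rules above can be sketched as a small validator. This is an illustration, not the RustFS implementation: the error enum and function name are hypothetical, and counting the 1024 limit in characters rather than bytes is an assumption. UTF-8 validity is already guaranteed by `&str`.

```rust
/// Hypothetical error type for the filter-value rules listed above.
#[derive(Debug, PartialEq)]
enum FilterError {
    TooLong,
    Backslash,
    DotSegment,
}

/// Check a prefix or suffix filter value against the documented rules.
fn validate_filter_value(value: &str) -> Result<(), FilterError> {
    // At most 1024 characters (assumption: characters, not bytes).
    if value.chars().count() > 1024 {
        return Err(FilterError::TooLong);
    }
    // Backslashes are rejected outright.
    if value.contains('\\') {
        return Err(FilterError::Backslash);
    }
    // No path segment may be exactly "." or "..".
    if value.split('/').any(|seg| seg == "." || seg == "..") {
        return Err(FilterError::DotSegment);
    }
    Ok(())
}
```

Note that a suffix such as `.csv` passes: only a whole segment equal to `.` or `..` is rejected.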
### 4.2 RulesMap
|
||||
|
||||
`RulesMap` maps `EventName` → `PatternRules` → `TargetIdSet`:
|
||||
|
||||
- Compound events (like `ObjectCreatedAll`) are **expanded** into specific events on insertion
|
||||
- Pattern matching: prefix/suffix wildcards (e.g., `uploads/*.csv`)
|
||||
- URL-encoded keys are matched against both encoded and decoded patterns
|
||||
- Bitmask-based fast path: `total_events_mask` enables O(1) `has_subscriber()` checks
|
||||
|
||||
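To illustrate the bitmask fast path and compound-event expansion, here is a reduced sketch. `EventKind`, `MiniRulesMap`, and the four-event subset are hypothetical simplifications; only the `total_events_mask` and `has_subscriber()` names come from the description above.

```rust
/// Reduced, hypothetical subset of event names; each variant owns one bit.
#[derive(Clone, Copy)]
#[repr(u8)]
enum EventKind {
    ObjectCreatedPut = 0,
    ObjectCreatedPost = 1,
    ObjectCreatedCopy = 2,
    ObjectRemovedDelete = 3,
}

impl EventKind {
    fn mask(self) -> u64 {
        1u64 << (self as u8)
    }
}

/// A compound event expands into its specific members on insertion.
const OBJECT_CREATED_ALL: [EventKind; 3] = [
    EventKind::ObjectCreatedPut,
    EventKind::ObjectCreatedPost,
    EventKind::ObjectCreatedCopy,
];

#[derive(Default)]
struct MiniRulesMap {
    total_events_mask: u64,
}

impl MiniRulesMap {
    /// Register interest in a set of (already expanded) events.
    fn add(&mut self, events: &[EventKind]) {
        for e in events {
            self.total_events_mask |= e.mask();
        }
    }

    /// Constant-time check: is any rule subscribed to this event?
    fn has_subscriber(&self, event: EventKind) -> bool {
        (self.total_events_mask & event.mask()) != 0
    }
}
```

The mask only answers "could anything match"; the per-pattern prefix/suffix lookup still has to run when the bit is set.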
### 4.3 Dynamically Reconfigurable
|
||||
|
||||
- `NotificationSystem::set_target_config()` — add/update a target
|
||||
- `NotificationSystem::remove_target_config()` — remove a target
|
||||
- `NotificationSystem::load_bucket_notification_config()` — load per-bucket rules
|
||||
- `NotificationSystem::remove_bucket_notification_config()` — remove per-bucket rules
|
||||
- `NotificationSystem::reload_config()` — reload from a new `Config` object
|
||||
- All changes trigger automatic re-initialization of affected targets
|
||||
|
||||
---
|
||||
|
||||
## 5. Pipeline & Delivery
|
||||
|
||||
### 5.1 Event Flow
|
||||
|
||||
```
|
||||
ECStore operation
|
||||
↓
|
||||
ecstore::event_notification::send_event(EventArgs)
|
||||
↓ (OnceLock dispatch hook)
|
||||
convert EventArgs → notify::EventArgs
|
||||
↓ spawn
|
||||
notifier_global::notify(EventArgs)
|
||||
↓
|
||||
NotificationSystem::send_event(Arc<Event>)
|
||||
↓
|
||||
NotifyPipeline::send_event()
|
||||
├── LiveEventHistory::record() (in-memory, last 1024 events)
|
||||
├── broadcast::send() (tokio broadcast channel, capacity 1024)
|
||||
└── EventNotifier::send() (async, rule-matched delivery)
|
||||
├── RuleEngine::match_targets(bucket, event_name, object_key)
|
||||
└── For each matched target:
|
||||
├── EntityTarget construction
|
||||
├── If queue_store: persist then async send
|
||||
└── If no queue_store: immediate async send
|
||||
```
|
||||
|
||||
### 5.2 Live Event Stream
|
||||
|
||||
The `NotifyPipeline` provides an in-process event stream via `tokio::sync::broadcast`:
|
||||
|
||||
```rust
|
||||
// Subscribe to live events
|
||||
let rx = system.subscribe_live_events();
|
||||
|
||||
// Check if there are live listeners
|
||||
system.has_live_listeners();
|
||||
|
||||
// Get recent events since a sequence number
|
||||
let batch: LiveEventBatch = system.recent_live_events_since(after_sequence, limit).await;
|
||||
```
|
||||
|
||||
- Broadcast channel capacity: 1024
|
||||
- `LiveEventHistory` stores last 1024 events with monotonic sequence numbers
|
||||
- `LiveEventBatch` includes `events: Vec<Arc<Event>>`, `next_sequence: u64`, `truncated: bool`
|
||||
|
||||
### 5.3 Metrics
|
||||
|
||||
`NotificationMetrics` tracks:
|
||||
- Processing count (in-flight)
|
||||
- Processed count (completed)
|
||||
- Failed count
|
||||
- Skipped count (no matching targets)
|
||||
|
||||
Per-target `TargetDeliverySnapshot`:
|
||||
- `total_messages`
|
||||
- `failed_messages`
|
||||
- `queue_length`
|
||||
|
||||
---
|
||||
|
||||
## 6. Live Event Stream
|
||||
|
||||
### 6.1 In-Process Subscription
|
||||
|
||||
The live event stream is useful for alknet because it provides a **push-based** event feed without requiring external message brokers:
|
||||
|
||||
```rust
|
||||
// This can be used from within the same process
|
||||
let mut rx = notification_system.subscribe_live_events();
|
||||
while let Ok(event) = rx.recv().await {
|
||||
// event: Arc<Event> — full S3 event record
|
||||
println!("Event: {} on {}/{}", event.event_name, event.s3.bucket.name, event.s3.object.key);
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Event History Replay
|
||||
|
||||
The `LiveEventHistory` supports catch-up subscriptions:
|
||||
|
||||
```rust
|
||||
// Get events since sequence number 42
|
||||
let batch = system.recent_live_events_since(42, 100).await;
|
||||
// batch.next_sequence → next sequence to request
|
||||
// batch.truncated → whether there are more events
|
||||
// batch.events → Vec<Arc<Event>>
|
||||
```
|
||||
|
||||
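The catch-up behaviour can be pictured as a bounded buffer with monotonic sequence numbers. The sketch below is not the `LiveEventHistory` implementation: `History` and `Batch` are hypothetical names, sequences are assumed to start at 1, and `next_sequence` is assumed to be the sequence of the last returned event (the value to pass as `after` on the next call).

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for `LiveEventBatch`.
struct Batch<T> {
    events: Vec<T>,
    next_sequence: u64,
    truncated: bool,
}

/// Bounded history that drops the oldest entry once full.
struct History<T> {
    capacity: usize,
    next_seq: u64,
    buf: VecDeque<(u64, T)>,
}

impl<T: Clone> History<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, next_seq: 1, buf: VecDeque::with_capacity(capacity) }
    }

    /// Store an event and return the sequence number assigned to it.
    fn record(&mut self, event: T) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        if self.buf.len() == self.capacity {
            self.buf.pop_front();
        }
        self.buf.push_back((seq, event));
        seq
    }

    /// Return up to `limit` events with a sequence greater than `after`.
    fn since(&self, after: u64, limit: usize) -> Batch<T> {
        let mut events = Vec::new();
        let mut next_sequence = after;
        let mut truncated = false;
        for (seq, event) in self.buf.iter().filter(|entry| entry.0 > after) {
            if events.len() == limit {
                // More matching events exist than the caller asked for.
                truncated = true;
                break;
            }
            events.push(event.clone());
            next_sequence = *seq;
        }
        Batch { events, next_sequence, truncated }
    }
}
```

A caller would loop on `since(batch.next_sequence, limit)` while `truncated` is true. This sketch does not report events that were already evicted from the buffer.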
---
|
||||
|
||||
## 7. S3 Select
|
||||
|
||||
### 7.1 Architecture Overview
|
||||
|
||||
RustFS implements S3 Select using **Apache DataFusion** as the SQL engine:
|
||||
|
||||
```
|
||||
SelectObjectContentRequest
|
||||
↓ validation (expression type, input/output format, scan range)
|
||||
↓ preflight (get object info, validate SSE headers)
|
||||
↓ create EcObjectStore (DataFusion ObjectStore adapter)
|
||||
↓ get_global_db(input) → QueryDispatcher
|
||||
↓ Query::new(Context, expression) → execute
|
||||
↓ DataFusion SQL parser → logical plan → optimized → physical plan → RecordBatch stream
|
||||
↓ SelectOutputEncoder → CSV or JSON → chunked (128KB) → event stream
|
||||
```
|
||||
|
||||
### 7.2 Key Crates
|
||||
|
||||
| Crate | Purpose |
|
||||
|-------|---------|
|
||||
| `rustfs_s3select_api` | Query error types, `Context`, `Query`, `QueryResult`, `DatabaseManagerSystem` trait, object store |
|
||||
| `rustfs_s3select_query` | SQL implementation: parser, analyzer, optimizer, function manager, execution, dispatcher |
|
||||
|
||||
### 7.3 SQL Engine
|
||||
|
||||
- **Parser**: Custom `RustFsDialect` + `ExtParser` extending DataFusion's SQL parser
|
||||
- **Supports**: Single SELECT statements only (multi-statement is rejected)
|
||||
- **Optimizer**: `CascadeOptimizerBuilder` (DataFusion's default rule set)
|
||||
- **Scheduler**: `LocalScheduler` (single-node execution)
|
||||
- **Functions**: All of DataFusion's built-in scalar, aggregate, and window functions
|
||||
|
||||
### 7.4 Input Formats
|
||||
|
||||
| Format | Support | Notes |
|
||||
|--------|---------|-------|
|
||||
| **CSV** | ✅ Full | `FileHeaderInfo` (NONE, USE, IGNORE), custom delimiters, quote chars, comment chars, record delimiters |
|
||||
| **JSON (LINES)** | ✅ Full | NDJSON line-by-line streaming |
|
||||
| **JSON (DOCUMENT)** | ✅ Limited | Max 128 MiB (OOM guard); no scan range support |
|
||||
| **Parquet** | ✅ Full | Columnar format |
|
||||
| **Compression** | ❌ Not supported | Only `NONE` compression currently accepted |
|
||||
|
||||
### 7.5 Output Formats
|
||||
|
||||
| Format | Options |
|
||||
|--------|---------|
|
||||
| **CSV** | Custom field delimiter, quote character, quote escape, record delimiter, quote fields (ALWAYS/ASNEEDED) |
|
||||
| **JSON** | Line-delimited (NDJSON); custom record delimiter |
|
||||
|
||||
### 7.6 Expression Limitations
|
||||
|
||||
- Max expression size: 256 KiB (`MAX_SELECT_EXPRESSION_BYTES`)
|
||||
- Expression type must be `SQL`
|
||||
- No `AllowQuotedRecordDelimiter` support for CSV
|
||||
- Scan ranges:
|
||||
- CSV: supported
|
||||
- JSON LINES: supported
|
||||
- JSON DOCUMENT: **not supported**
|
||||
- Parquet: supported
|
||||
- Range must be valid (start < end, start < object size)
|
||||
|
||||
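The scan-range rule above (start < end, start < object size) can be sketched as follows. The function and error names are hypothetical, and clamping `end` to the object size is an assumption rather than documented RustFS behaviour.

```rust
/// Hypothetical error type for scan-range validation.
#[derive(Debug, PartialEq)]
enum ScanRangeError {
    EmptyOrInverted,
    StartBeyondObject,
}

/// Validate a scan range and return the effective `(start, end)` byte range.
fn validate_scan_range(
    start: u64,
    end: u64,
    object_size: u64,
) -> Result<(u64, u64), ScanRangeError> {
    if start >= end {
        return Err(ScanRangeError::EmptyOrInverted);
    }
    if start >= object_size {
        return Err(ScanRangeError::StartBeyondObject);
    }
    // Assumption: an end past the object is clamped instead of rejected.
    Ok((start, end.min(object_size)))
}
```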
### 7.7 Object Store Integration
|
||||
|
||||
`EcObjectStore` implements DataFusion's `ObjectStore` trait, adapting rustfs's ECStore for query execution:
|
||||
- Handles `GET` with optional byte ranges (scan range)
|
||||
- JSON DOCUMENT mode: entire file buffered for DOM parsing, then flattened to NDJSON
|
||||
- JSON sub-path extraction: `FROM s3object.some.path` navigates to the key before flattening
|
||||
- Respects SSE-C headers for encrypted objects
|
||||
|
||||
### 7.8 Streaming Response
|
||||
|
||||
Results are streamed as S3 event types:
|
||||
1. `Cont` event (continuation marker)
|
||||
2. `Records` events (128KB chunks)
|
||||
3. `Progress` events (if `RequestProgress.Enabled=true`) — currently only `BytesReturned` populated
|
||||
4. `Stats` event (final)
|
||||
5. `End` event
|
||||
|
||||
### 7.9 Error Mapping
|
||||
|
||||
| QueryError | S3 Error |
|
||||
|-----------|----------|
|
||||
| `Parser` | `ParseSelectFailure` (400) |
|
||||
| `MultiStatement` | `UnsupportedSqlStructure` |
|
||||
| `NotImplemented` | `NotImplemented` |
|
||||
| `Datafusion` (scan range) | `InvalidRequestParameter` |
|
||||
| `Datafusion` (missing binding) | `EvaluatorBindingDoesNotExist` |
|
||||
| `Datafusion` (other) | `UnsupportedSqlOperation` |
|
||||
| `StoreError` (bucket not found) | `NoSuchBucket` |
|
||||
| `StoreError` (object not found) | `NoSuchKey` |
|
||||
| `StoreError` (other) | `InternalError` |
|
||||
|
||||
---
|
||||
|
||||
## 8. Mapping to alknet
|
||||
|
||||
### 8.1 rustfs Events → alknet Integration Events
|
||||
|
||||
rustfs events are **integration events from rustfs's perspective** and remain **integration events from alknet's perspective**. This is the correct cross-boundary classification per ADR-032.
|
||||
|
||||
#### Event Projection: `rustfs::BucketNotificationEvent` → `alknet::EventEnvelope`
|
||||
|
||||
Suggested namespace and operation mapping:
|
||||
|
||||
| rustfs EventName | alknet Namespace | alknet Operation |
|
||||
|------------------|-----------------|-----------------|
|
||||
| `s3:ObjectCreated:Put` | `storage.object` | `created.put` |
|
||||
| `s3:ObjectCreated:Post` | `storage.object` | `created.post` |
|
||||
| `s3:ObjectCreated:Copy` | `storage.object` | `created.copy` |
|
||||
| `s3:ObjectCreated:CompleteMultipartUpload` | `storage.object` | `created.multipart-complete` |
|
||||
| `s3:ObjectRemoved:Delete` | `storage.object` | `removed.delete` |
|
||||
| `s3:ObjectRemoved:DeleteMarkerCreated` | `storage.object` | `removed.delete-marker-created` |
|
||||
| `s3:ObjectAccessed:Get` | `storage.object` | `accessed.get` |
|
||||
| `s3:ObjectAccessed:Head` | `storage.object` | `accessed.head` |
|
||||
| `s3:BucketCreated:*` | `storage.bucket` | `created` |
|
||||
| `s3:BucketRemoved:*` | `storage.bucket` | `removed` |
|
||||
|
||||
The full `Event` record from rustfs should be preserved in the `EventEnvelope.payload` field for traceability, while a normalized `metadata` extraction provides fast-path access:
|
||||
|
||||
```rust
|
||||
// Pseudocode for mapping
|
||||
fn project_rustfs_event(event: &rustfs_notify::Event) -> alknet::EventEnvelope {
|
||||
let namespace = if event.event_name == EventName::BucketCreated || event.event_name == EventName::BucketRemoved {
|
||||
"storage.bucket"
|
||||
} else {
|
||||
"storage.object"
|
||||
};
|
||||
|
||||
let operation = event.event_name.as_str() // "s3:ObjectCreated:Put"
|
||||
.strip_prefix("s3:") // "ObjectCreated:Put"
|
||||
.unwrap_or("unknown")
|
||||
.to_lowercase()
|
||||
.replace(':', ".");
|
||||
|
||||
EventEnvelope {
|
||||
id: uuid::Uuid::new_v4(),
|
||||
namespace: namespace.into(),
|
||||
operation: operation.into(), // e.g., "objectcreated.put"
|
||||
timestamp: event.event_time,
|
||||
source: "rustfs".into(),
|
||||
metadata: json!({
|
||||
"bucket": event.s3.bucket.name,
|
||||
"key": event.s3.object.key,
|
||||
"size": event.s3.object.size,
|
||||
"eTag": event.s3.object.e_tag,
|
||||
"versionId": event.s3.object.version_id,
|
||||
"sequencer": event.s3.object.sequencer,
|
||||
"principalId": event.user_identity.principal_id,
|
||||
}),
|
||||
payload: serde_json::to_value(event).ok(),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
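The pseudocode above yields operations such as `objectcreated.put`, while the mapping table uses `created.put`. A sketch that follows the table instead is shown here, working on the event-name string only. The function names are hypothetical, and the `CompleteMultipartUpload` special case exists solely because the table spells it `multipart-complete`.

```rust
/// Convert "DeleteMarkerCreated" into "delete-marker-created".
fn kebab(s: &str) -> String {
    let mut out = String::new();
    for (i, c) in s.chars().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            out.push('-');
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Map an S3 event name to `(namespace, operation)` per the table above.
/// Returns `None` for names outside the table's categories.
fn map_event_name(name: &str) -> Option<(&'static str, String)> {
    let rest = name.strip_prefix("s3:")?;
    let (category, detail) = rest.split_once(':')?;
    let verb = match category {
        // Bucket events carry no detail in the operation name.
        "BucketCreated" => return Some(("storage.bucket", "created".to_string())),
        "BucketRemoved" => return Some(("storage.bucket", "removed".to_string())),
        "ObjectCreated" => "created",
        "ObjectRemoved" => "removed",
        "ObjectAccessed" => "accessed",
        _ => return None,
    };
    let suffix = match detail {
        "CompleteMultipartUpload" => "multipart-complete".to_string(),
        other => kebab(other),
    };
    Some(("storage.object", format!("{verb}.{suffix}")))
}
```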
### 8.2 Subscription Architecture
|
||||
|
||||
#### Option A: In-Process Live Event Stream (Recommended)
|
||||
|
||||
Since alknet and rustfs share the same process, alknet can subscribe to the live event stream directly:
|
||||
|
||||
```rust
|
||||
use tokio::sync::broadcast::error::RecvError;

// In alknet's initialization
let notification_system = rustfs_notify::notification_system().unwrap();
let mut event_rx = notification_system.subscribe_live_events();

// In alknet's event loop
tokio::spawn(async move {
    loop {
        match event_rx.recv().await {
            Ok(event) => {
                let envelope = project_rustfs_event(&event);
                alknet::honker::publish(envelope).await;
            }
            // Slow consumer: the oldest events were dropped; keep receiving.
            Err(RecvError::Lagged(_)) => continue,
            // All senders are gone; stop the task.
            Err(RecvError::Closed) => break,
        }
    }
});
|
||||
```
|
||||
|
||||
**Advantages**:
|
||||
- No serialization overhead and negligible delivery latency
|
||||
- No network hop
|
||||
- Direct access to `Arc<Event>` in-process
|
||||
- alknet's Honker streams get events immediately
|
||||
|
||||
**Considerations**:
|
||||
- `has_live_listeners()` can be checked before performing expensive event construction
|
||||
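- The broadcast channel capacity is 1024; a consumer that falls further behind receives `RecvError::Lagged` and loses the oldest events (acceptable for integration events), so the receive loop should treat that error as non-fatal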
|
||||
- `recent_live_events_since()` allows catch-up after reconnection
|
||||
|
||||
#### Option B: External Target via Webhook/Kafka/etc.
|
||||
|
||||
If alknet runs as a separate process, configure a webhook or Kafka target pointing to alknet's event ingestion endpoint:
|
||||
|
||||
```json
|
||||
{
|
||||
"notify_webhook": {
|
||||
"1": {
|
||||
"enable": true,
|
||||
"endpoint": "https://alknet.internal/events/rustfs",
|
||||
"auth_token": "Bearer alknet-secret"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Advantages**:
|
||||
- Decoupled deployment
|
||||
- RustFS's queue store provides at-least-once delivery
|
||||
|
||||
**Considerations**:
|
||||
- Network latency and serialization overhead
|
||||
- Need to handle deduplication (at-least-once means possible duplicates)
|
||||
- Queue store provides durability if alknet is temporarily unavailable
|
||||
|
||||
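Because at-least-once delivery can produce duplicates, the receiving side needs an idempotency check. A minimal sketch follows, assuming that `(bucket, key, sequencer)` uniquely identifies an object change. The `Deduplicator` type is hypothetical, and a real consumer would bound or expire the set.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory deduplicator keyed on (bucket, key, sequencer).
#[derive(Default)]
struct Deduplicator {
    seen: HashSet<(String, String, String)>,
}

impl Deduplicator {
    /// Returns `true` the first time an event identity is observed and
    /// `false` for every redelivery of the same identity.
    fn first_seen(&mut self, bucket: &str, key: &str, sequencer: &str) -> bool {
        self.seen
            .insert((bucket.to_owned(), key.to_owned(), sequencer.to_owned()))
    }
}
```

An ingestion handler would call `first_seen` before publishing and acknowledge duplicates without re-emitting them.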
#### Option C: Hybrid — Live Stream + Webhook Fallback
|
||||
|
||||
For maximum reliability:
|
||||
1. In-process live stream for low-latency event propagation
|
||||
2. Webhook/Kafka target as a fallback for events missed during restarts
|
||||
3. Use `sequencer` ordering to detect gaps
|
||||
|
||||
### 8.3 S3 Select → alknet Operations
|
||||
|
||||
S3 Select can be exposed as an alknet operation:
|
||||
|
||||
| alknet Operation | Description |
|
||||
|-----------------|-------------|
|
||||
| `storage.select` | Run an S3 Select SQL query on an object |
|
||||
| `storage.select-status` | Check Select availability (optional) |
|
||||
|
||||
```rust
|
||||
// Example alknet call protocol operation
|
||||
fn handle_storage_select(params: StorageSelectParams) -> Result<StorageSelectResult, Error> {
|
||||
// 1. Construct SelectObjectContentInput
|
||||
// 2. Call existing rustfs SelectObjectContent handler
|
||||
// 3. Stream results back through alknet call protocol
|
||||
}
|
||||
```
|
||||
|
||||
#### Use Cases for alknet
|
||||
|
||||
1. **Metagraph Queries**: Query stored metagraph JSON/CSV objects without downloading them entirely
|
||||
```sql
|
||||
SELECT s.name, s.version FROM S3Object s WHERE s.type = 'service'
|
||||
```
|
||||
|
||||
2. **Log Analytics**: Query structured log data stored in S3
|
||||
```sql
|
||||
SELECT COUNT(*) as cnt, s.level FROM S3Object s WHERE s.timestamp > '2026-01-01' GROUP BY s.level
|
||||
```
|
||||
|
||||
3. **Ad-hoc Data Exploration**: Quick data inspection without full downloads
|
||||
```sql
|
||||
SELECT * FROM S3Object s LIMIT 100
|
||||
```
|
||||
|
||||
4. **Aggregation Pipelines**: Pre-process data before moving to alknet's internal stores
|
||||
|
||||
### 8.4 ADR-032 Implications: Cross-Boundary Event Flow
|
||||
|
||||
Per ADR-032, rustfs events are **integration events** — they represent facts about state changes that have already happened in the storage system boundary. When alknet consumes them:
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌─────────────┐
|
||||
│ rustfs │ │ alknet │
|
||||
│ (bounded │ integration │ (bounded │
|
||||
│ context) │───── event ─────────▶│ context) │
|
||||
│ │ │ │
|
||||
│ S3 Object │ EventEnvelope │ Honker │
|
||||
│ Created/ │ namespace: │ Stream │
|
||||
│ Removed/ │ "storage.object" │ Subscriber │
|
||||
│ Accessed │ operation: │ │
|
||||
│ │ "created.put" │ Call │
|
||||
│ │ │ Protocol │
|
||||
│ S3 Select │ storage.select │ Operation │
|
||||
│ Results │◀──── call ──────────│ │
|
||||
└─────────────┘ └─────────────┘
|
||||
```
|
||||
|
||||
Key points:
|
||||
1. **Events flow inward**: rustfs → alknet (integration events entering alknet's boundary)
|
||||
2. **Calls flow outward**: alknet → rustfs (alknet initiates S3 Select as a call)
|
||||
3. **No shared domain model**: alknet shouldn't reference rustfs's `Event` struct directly in its domain; it projects into its own `EventEnvelope` format
|
||||
4. **Eventual consistency**: rustfs notifications may arrive out of order; the `sequencer` field provides ordering within a bucket
|
||||
5. **At-least-once delivery**: If using webhook/Kafka targets, duplicate events are possible; alknet must be idempotent
|
||||
6. **No orchestration across boundaries**: alknet doesn't tell rustfs to emit events; it subscribes to events rustfs naturally produces
|
||||
|
||||
### 8.5 Implementation Recommendations
|
||||
|
||||
1. **Short-term**: Use the **in-process live event stream** to subscribe to rustfs events and re-emit them through alknet's Honker system. This gives immediate value with minimal integration work.
|
||||
|
||||
2. **Medium-term**: Add a **webhook notification target** pointing at an alknet HTTP endpoint for redundancy. Configure bucket notification rules via the S3 API (PutBucketNotificationConfiguration).
|
||||
|
||||
3. **Long-term**: Consider implementing an **alknet NATS target** that directly publishes events into alknet's NATS infrastructure, bypassing the HTTP layer entirely for lower latency.
|
||||
|
||||
4. **S3 Select**: Expose via alknet's call protocol as `storage.select`. The existing `execute_select_object_content` function can be called directly as a library function since alknet and rustfs share the same process.
|
||||
|
||||
5. **Event schema versioning**: Store the `event_version` field from rustfs events in alknet's `EventEnvelope.metadata` to handle future schema evolution.
|
||||
|
||||
---
|
||||
|
||||
## 9. References
|
||||
|
||||
### Source Code Locations
|
||||
|
||||
| Component | Path |
|
||||
|-----------|------|
|
||||
| Event structure | `/crates/notify/src/event.rs` |
|
||||
| EventName enum | `/crates/s3-types/src/event_name.rs` |
|
||||
| NotifyPipeline + LiveEventHistory | `/crates/notify/src/pipeline.rs` |
|
||||
| EventNotifier + TargetList | `/crates/notify/src/notifier.rs` |
|
||||
| NotificationSystem | `/crates/notify/src/integration.rs` |
|
||||
| Rule engine | `/crates/notify/src/rule_engine.rs` |
|
||||
| RulesMap | `/crates/notify/src/rules/rules_map.rs` |
|
||||
| Bucket notification config | `/crates/notify/src/rules/config.rs` |
|
||||
| XML notification config | `/crates/notify/src/rules/xml_config.rs` |
|
||||
| Target trait + QueuedPayload | `/crates/targets/src/target/mod.rs` |
|
||||
| Webhook target | `/crates/targets/src/target/webhook.rs` |
|
||||
| Kafka target | `/crates/targets/src/target/kafka.rs` |
|
||||
| AMQP target | `/crates/targets/src/target/amqp.rs` |
|
||||
| NATS target | `/crates/targets/src/target/nats.rs` |
|
||||
| Redis target | `/crates/targets/src/target/redis.rs` |
|
||||
| MQTT target | `/crates/targets/src/target/mqtt.rs` |
|
||||
| MySQL target | `/crates/targets/src/target/mysql.rs` |
|
||||
| PostgreSQL target | `/crates/targets/src/target/postgres.rs` |
|
||||
| Pulsar target | `/crates/targets/src/target/pulsar.rs` |
|
||||
| ARN + TargetID | `/crates/targets/src/arn.rs` |
|
||||
| ECStore event dispatch | `/crates/ecstore/src/event_notification.rs` |
|
||||
| Server event init | `/rustfs/src/server/event.rs` |
|
||||
| S3 Select handler | `/rustfs/src/app/select_object.rs` |
|
||||
| S3 Select query engine | `/crates/s3select-query/src/` |
|
||||
| S3 Select API | `/crates/s3select-api/src/` |
|
||||
| S3 Select object store | `/crates/s3select-api/src/object_store.rs` |
|
||||
| Config subsystem names | `/crates/config/src/notify/mod.rs` |
|
||||
|
||||
### AWS S3 Documentation
|
||||
|
||||
- [S3 Event Notification Configuration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html)
|
||||
- [S3 Select Documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html)
|
||||
|
||||
### Internal References
|
||||
|
||||
- `/workspace/@alkdev/alknet/docs/research/references/rustfs/rustfs-reference.md` — Companion document covering auth, architecture, and credential mapping
|
||||
@@ -1,732 +0,0 @@
|
||||
# RustFS Reference Document
|
||||
|
||||
> Status: Research Complete
|
||||
> Last updated: 2026-06-08
|
||||
> Source: /workspace/rustfs/ (cloned repository, v1.0.0-beta.7)
|
||||
> Context: alknet internal service integration research
|
||||
|
||||
---
|
||||
|
||||
## 1. Architecture Overview
|
||||
|
||||
### What is RustFS?
|
||||
|
||||
RustFS is a high-performance, distributed, S3-compatible object storage system written in Rust. It is an Apache 2.0-licensed alternative to MinIO that combines S3 API compatibility with OpenStack Swift/Keystone support, designed for data lake, AI, and big data workloads.
|
||||
|
||||
**Key characteristics:**
|
||||
- Language: Rust (edition 2024, MSRV 1.95.0)
|
||||
- License: Apache 2.0 (no AGPL restrictions)
|
||||
- Workspace: 57 crates in a flat `crates/` layout
|
||||
- Main binary: `rustfs/` (75K lines); core engine: `crates/ecstore/` (87K lines)
|
||||
- Version: 1.0.0-beta.7
|
||||
|
||||
### Ports and Endpoints
|
||||
|
||||
| Port | Purpose |
|
||||
|------|---------|
|
||||
| 9000 | S3 API (primary data path) + Admin API (`/minio/` prefix) |
|
||||
| 9001 | Web Console UI |
|
||||
|
||||
### Request Flow
|
||||
|
||||
```
|
||||
HTTP request
|
||||
→ server (TLS, auth, routing, compression)
|
||||
→ app/object_usecase (validation, policy, lifecycle)
|
||||
→ storage/ecfs (erasure coding, encryption, checksums)
|
||||
→ ecstore (disk pool selection, data distribution)
|
||||
→ rio (reader pipeline: encrypt → compress → hash → write)
|
||||
→ io-core (zero-copy I/O, buffer pool, direct I/O)
|
||||
→ local disk / remote disk via RPC
|
||||
```
|
||||
|
||||
### Key Crate Map (Security & Auth Focus)
|
||||
|
||||
| Crate | Lines | Purpose |
|
||||
|-------|-------|---------|
|
||||
| `credentials` | 713 | Credential types (access key / secret key), global credentials |
|
||||
| `signer` | 1.4K | AWS Signature V4 request signing |
|
||||
| `iam` | 9.0K | Identity and Access Management (users, groups, policies, OIDC) |
|
||||
| `policy` | 8.8K | S3 bucket/IAM policy engine |
|
||||
| `keystone` | 1.9K | OpenStack Keystone auth integration |
|
||||
| `appauth` | 143 | Application-level auth tokens |
|
||||
| `crypto` | 1.6K | Encryption primitives |
|
||||
| `kms` | 8.1K | Key management service integration |
|
||||
| `protocols` | 18K | FTP/FTPS, WebDAV, Swift API support |
|
||||
| `s3-ops` | — | S3 operation definitions and mapping |
|
||||
| `s3-types` | — | S3 event type definitions |
|
||||
|
||||
### Startup Sequence (Auth-Relevant Steps)
|
||||
|
||||
1. Environment variable compatibility (`MINIO_*` → `RUSTFS_*`)
|
||||
2. Tokio runtime construction
|
||||
3. CLI argument parsing
|
||||
4. Config parsing, credentials/endpoints initialization
|
||||
5. HTTP server start (S3 API + optional console)
|
||||
6. ECStore initialization
|
||||
7. **Steps 13: Bucket metadata, IAM, Keystone, OIDC** initialization
|
||||
8. FullReady → serving requests
|
||||
|
||||
---
|
||||
|
||||
## 2. S3 API Compatibility
|
||||
|
||||
### Supported S3 Operations
|
||||
|
||||
RustFS implements a substantial subset of the S3 API via the `s3s` crate (a fork/custom build at `https://github.com/rustfs/s3s`). Based on the feature status table and crate structure:
|
||||
|
||||
| Category | Status | Details |
|
||||
|----------|--------|---------|
|
||||
| Core Object Ops (GET/PUT/DELETE/HEAD) | ✅ Available | Primary data path |
|
||||
| Multipart Upload | ✅ Available | Upload, download, multipart |
|
||||
| Versioning | ✅ Available | Object versioning |
|
||||
| Bucket Operations | ✅ Available | Create, list, delete, metadata |
|
||||
| Logging | ✅ Available | Access logging |
|
||||
| Event Notifications | ✅ Available | Webhook, Kafka, AMQP, MQTT, NATS targets |
|
||||
| Bitrot Protection | ✅ Available | Checksums at storage layer |
|
||||
| Single Node Mode | ✅ Available | Single-node deployment |
|
||||
| Bucket Replication | ✅ Available | Cross-region replication |
|
||||
| KMS | 🚧 Under Testing | Key management service |
|
||||
| Lifecycle Management | 🚧 Under Testing | Object lifecycle rules |
|
||||
| Distributed Mode | 🚧 Under Testing | Multi-node erasure coding |
|
||||
| Admin API | ✅ Available | `/minio/` prefix, 30+ handler modules |
|
||||
| Console | ✅ Available | Web UI on port 9001 |
|
||||
| S3 Select | ✅ Available | `s3select-api` + `s3select-query` crates |
|
||||
| WebDAV | ✅ Available | `protocols` crate, `dav-server` |
|
||||
| FTP/FTPS | ✅ Available | `libunftp`, `suppaftp` |
|
||||
| SFTP | — | `russh` + `russh-sftp` crate deps |
|
||||
|
||||
### Authentication Methods
|
||||
|
||||
RustFS supports multiple authentication methods (derived from `auth.rs`):
|
||||
|
||||
| Auth Type | Constant | Detection |
|
||||
|-----------|----------|-----------|
|
||||
| AWS Signature V4 (header) | `Signed` | `Authorization: AWS4-HMAC-SHA256 ...` |
|
||||
| AWS Signature V4 (query) | `Presigned` | `X-Amz-Credential` in query |
|
||||
| AWS Signature V2 (header) | `SignedV2` | `Authorization: AWS ...` |
|
||||
| AWS Signature V2 (query) | `PresignedV2` | `AWSAccessKeyId` in query |
|
||||
| Streaming V4 | `StreamingSigned` | `x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD` |
|
||||
| Streaming V4 (trailer) | `StreamingSignedTrailer` | `STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER` |
|
||||
| Unsigned payload (trailer) | `StreamingUnsignedTrailer` | `STREAMING-UNSIGNED-PAYLOAD-TRAILER` |
|
||||
| POST policy | `PostPolicy` | `multipart/form-data` content type |
|
||||
| Bearer JWT | `JWT` | `Authorization: Bearer ...` |
|
||||
| STS | `STS` | `Action` header presence |
|
||||
| Anonymous | `Anonymous` | No `Authorization` header |
|
||||
| Keystone token | — | `X-Auth-Token` header (via middleware) |
|
||||
|
||||
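A reduced sketch of how some of the detection rules in the table could be applied. It covers only the header- and query-based rows, the order of checks is an assumption, and the `Unknown` variant plus both function names are hypothetical; this is not the logic in `auth.rs`.

```rust
/// Subset of the auth types from the table, plus a hypothetical `Unknown`.
#[derive(Debug, PartialEq)]
enum AuthType {
    Signed,
    SignedV2,
    Presigned,
    PresignedV2,
    Jwt,
    Anonymous,
    Unknown,
}

/// True when the raw query string contains a parameter named `key`.
fn query_has(query: &str, key: &str) -> bool {
    query.split('&').any(|pair| pair.split('=').next() == Some(key))
}

/// Classify a request from its `Authorization` header and raw query string.
fn detect_auth_type(authorization: Option<&str>, query: &str) -> AuthType {
    if let Some(header) = authorization {
        if header.starts_with("AWS4-HMAC-SHA256") {
            return AuthType::Signed;
        }
        if header.starts_with("AWS ") {
            return AuthType::SignedV2;
        }
        if header.starts_with("Bearer ") {
            return AuthType::Jwt;
        }
        return AuthType::Unknown;
    }
    if query_has(query, "X-Amz-Credential") {
        AuthType::Presigned
    } else if query_has(query, "AWSAccessKeyId") {
        AuthType::PresignedV2
    } else {
        AuthType::Anonymous
    }
}
```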
### S3 Request Signing
|
||||
|
||||
The `rustfs-signer` crate implements AWS Signature V4. The general flow:
|
||||
|
||||
1. Client computes a canonical request (method + path + query + headers + payload hash)
|
||||
2. Client creates a string to sign (algorithm + timestamp + credential scope + canonical request hash)
|
||||
3. Client computes HMAC-SHA256 signature using the secret key
|
||||
4. Client sends the `Authorization` header with the signature
|
||||
|
||||
---
|
||||
|
||||
## 3. OpenStack Swift and Keystone Integration
|
||||
|
||||
### Swift API
|
||||
|
||||
RustFS provides an **OpenStack Swift-compatible API** as an opt-in feature (behind the `swift` cargo feature flag). This is implemented in `crates/protocols/src/swift/`.
|
||||
|
||||
**Swift API endpoint pattern:** `/v1/AUTH_{project_id}/...`
|
||||
|
||||
**Supported Swift operations:**
|
||||
- Container CRUD (create, list, delete, metadata)
|
||||
- Object CRUD with streaming downloads
|
||||
- Keystone token authentication
|
||||
- Multi-tenant isolation with SHA256-based bucket prefixing
|
||||
- Server-side object copy (COPY method)
|
||||
- HTTP Range requests (206/416 responses)
|
||||
- Custom metadata (X-Object-Meta-*, X-Container-Meta-*)
|
||||
|
||||
**Not yet implemented:** Account-level ops, large object support (>5GB), object versioning, container ACLs/CORS, TempURL, XML/plain-text response formats.
|
||||
|
||||
**Tenant isolation:** Swift containers are mapped to S3 buckets with a secure hash prefix:
|
||||
```
|
||||
Swift: /v1/AUTH_abc123/mycontainer
|
||||
→ S3 Bucket: {sha256(abc123)[0:16]}-mycontainer
|
||||
```
|
||||
|
||||
### Keystone Authentication — Complete Flow
|
||||
|
||||
This is the most auth-relevant subsystem for alknet integration.
|
||||
|
||||
#### Configuration (Environment Variables)
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `RUSTFS_KEYSTONE_ENABLE` | Enable Keystone auth | `false` |
|
||||
| `RUSTFS_KEYSTONE_AUTH_URL` | Keystone endpoint URL | (required) |
|
||||
| `RUSTFS_KEYSTONE_VERSION` | API version (`v3` or `v2.0`) | `v3` |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_USER` | Admin username | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_PASSWORD` | Admin password | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_PROJECT` | Admin project/tenant | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_DOMAIN` | Admin domain | `Default` |
|
||||
| `RUSTFS_KEYSTONE_VERIFY_SSL` | Verify TLS certificates | `true` |
|
||||
| `RUSTFS_KEYSTONE_ENABLE_CACHE` | Enable token caching | `true` |
|
||||
| `RUSTFS_KEYSTONE_CACHE_SIZE` | Token cache capacity | `10000` |
|
||||
| `RUSTFS_KEYSTONE_CACHE_TTL` | Token cache TTL (seconds) | `300` |
|
||||
| `RUSTFS_KEYSTONE_TENANT_PREFIX` | Enable tenant project prefixing | `true` |
|
||||
| `RUSTFS_KEYSTONE_IMPLICIT_TENANTS` | Auto-create tenants | `true` |
|
||||
| `RUSTFS_KEYSTONE_TIMEOUT` | Request timeout (seconds) | `30` |
|
||||
|
||||
#### Architecture: Component Stack
|
||||
|
||||
```
|
||||
KeystoneClient (HTTP calls to Keystone v3 API)
|
||||
↓
|
||||
KeystoneAuthProvider (Authentication + Caching via moka::future::Cache)
|
||||
↓
|
||||
KeystoneAuthMiddleware (Tower layer, intercepts HTTP requests)
|
||||
↓ (task-local: KEYSTONE_CREDENTIALS)
|
||||
IAMAuth → check_key_valid (Authorization)
|
||||
↓
|
||||
RustFS Credentials (access_key starts with "keystone:")
|
||||
```
|
||||
|
||||
#### Authentication Flow
|
||||
|
||||
**Request with `X-Auth-Token` header:**
|
||||
|
||||
1. **Middleware intercepts:** `KeystoneAuthMiddleware` extracts `X-Auth-Token` header
|
||||
2. **Cache check:** Token cache hit → return cached credentials (~1-2ms)
|
||||
3. **Token validation:** Cache miss → `KeystoneClient.validate_token()` → `GET /v3/auth/tokens` with `X-Auth-Token` and `X-Subject-Token` headers
|
||||
4. **Token parsing:** Parse `KeystoneToken` (user_id, username, project_id, project_name, domain, roles, expires_at)
|
||||
5. **Credential mapping:** Convert to `Credentials` struct:
|
||||
- `access_key`: `keystone:<user_id>` (special prefix identifies Keystone users)
|
||||
- `secret_key`: `""` (empty — bypasses AWS SigV4 verification)
|
||||
- `session_token`: the Keystone token string
|
||||
- `parent_user`: Keystone username
|
||||
- `groups`: roles list
|
||||
- `claims`: JSON map with `keystone_user_id`, `keystone_project_id`, `keystone_roles`, `auth_source: "keystone"`
|
||||
6. **Task-local storage:** Store credentials in `KEYSTONE_CREDENTIALS` task-local (async-scoped to request)
|
||||
7. **Auth bypass:** IAMAuth detects `keystone:` prefix → returns empty secret key, bypassing SigV4
|
||||
8. **Authorization:** `check_key_valid()` retrieves credentials from task-local storage
|
||||
9. **Role check:** `admin` or `reseller_admin` roles → `is_owner=true`; other roles → `is_owner=false`
|
||||
|
||||
**Request without `X-Auth-Token`:**
|
||||
1. Middleware passes through unchanged
|
||||
2. Standard AWS SigV4 authentication proceeds
|
||||
3. IAM validation as normal
|
||||
|
||||
**Invalid token:**
|
||||
1. Middleware returns `401 Unauthorized` immediately with XML error body
|
||||
2. **No fallback** to standard S3 auth
|
||||
|
||||
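Steps 5 and 9 of the token flow (credential mapping and the owner check) can be sketched as below. The structs are simplified stand-ins rather than the RustFS types: claims are stored as plain strings instead of JSON values, roles are joined with commas, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Subset of the parsed Keystone token fields listed in step 4.
struct KeystoneToken {
    user_id: String,
    username: String,
    project_id: String,
    roles: Vec<String>,
}

/// Simplified stand-in for the RustFS `Credentials` struct.
struct Credentials {
    access_key: String,
    secret_key: String,
    session_token: String,
    parent_user: String,
    groups: Option<Vec<String>>,
    claims: HashMap<String, String>,
}

/// Step 5: convert a validated token into credentials.
fn map_keystone_token(token: &KeystoneToken, raw_token: &str) -> Credentials {
    let mut claims = HashMap::new();
    claims.insert("keystone_user_id".to_string(), token.user_id.clone());
    claims.insert("keystone_project_id".to_string(), token.project_id.clone());
    claims.insert("keystone_roles".to_string(), token.roles.join(","));
    claims.insert("auth_source".to_string(), "keystone".to_string());
    Credentials {
        // The "keystone:" prefix marks the identity as Keystone-backed.
        access_key: format!("keystone:{}", token.user_id),
        // Empty secret: SigV4 verification is bypassed for these users.
        secret_key: String::new(),
        session_token: raw_token.to_string(),
        parent_user: token.username.clone(),
        groups: Some(token.roles.clone()),
        claims,
    }
}

/// Step 9: `admin` or `reseller_admin` roles make the caller an owner.
fn is_owner(creds: &Credentials) -> bool {
    creds
        .groups
        .iter()
        .flatten()
        .any(|role| role == "admin" || role == "reseller_admin")
}
```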
#### EC2 Credentials
|
||||
|
||||
RustFS also supports Keystone EC2 credentials for S3 API compatibility:
|
||||
|
||||
- `POST /v3/ec2tokens` with `{access, signature, data}` validates EC2-style credentials
|
||||
- `GET /v3/users/{user_id}/credentials/OS-EC2` lists EC2 credentials for a user
|
||||
- Access key format: `user_id:project_id` or `user_id`
|
||||
|
||||
#### Role Mapping (Keystone → RustFS)
|
||||
|
||||
| Keystone Role | RustFS Policy | Permissions |
|---------------|---------------|-------------|
| `admin` | AdminPolicy | Full access (`s3:*`) |
| `Admin` | AdminPolicy | Full access |
| `Member` | ReadWritePolicy | Read/write |
| `_member_` | ReadOnlyPolicy | Read-only |
| `ResellerAdmin` | AdminPolicy | Full access |
| `SwiftOperator` | ReadWritePolicy | Read/write |
| `objectstore:admin` | AdminPolicy | Full access |
| `objectstore:creator` | ReadWritePolicy | Read/write |
|
||||
|
||||
Custom role mappings can be added programmatically via `KeystoneIdentityMapper::add_role_mapping()`.
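
As a rough illustration of the table and the custom-mapping hook, the sketch below models the lookup with a plain `HashMap`. `RoleMapper`, `Policy`, and the `add_role_mapping` signature shown here are assumptions made for the example; only the method name comes from the text above, and the real `KeystoneIdentityMapper` may differ.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    Admin,
    ReadWrite,
    ReadOnly,
}

/// Hypothetical stand-in for `KeystoneIdentityMapper`.
pub struct RoleMapper {
    mappings: HashMap<String, Policy>,
}

impl RoleMapper {
    /// Default mappings, copied from the role table.
    pub fn with_defaults() -> Self {
        let defaults = [
            ("admin", Policy::Admin),
            ("Admin", Policy::Admin),
            ("Member", Policy::ReadWrite),
            ("_member_", Policy::ReadOnly),
            ("ResellerAdmin", Policy::Admin),
            ("SwiftOperator", Policy::ReadWrite),
            ("objectstore:admin", Policy::Admin),
            ("objectstore:creator", Policy::ReadWrite),
        ];
        let mappings = defaults
            .iter()
            .map(|(role, policy)| (role.to_string(), *policy))
            .collect();
        RoleMapper { mappings }
    }

    /// Register or override a mapping (signature is an assumption).
    pub fn add_role_mapping(&mut self, role: &str, policy: Policy) {
        self.mappings.insert(role.to_string(), policy);
    }

    /// Case-sensitive lookup; unmapped roles yield `None`.
    pub fn policy_for(&self, role: &str) -> Option<Policy> {
        self.mappings.get(role).copied()
    }
}
```

The lookup is case-sensitive because the table lists `admin` and `Admin` as separate entries.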
|
||||
|
||||
#### Multi-Tenancy
|
||||
|
||||
When `RUSTFS_KEYSTONE_TENANT_PREFIX=true`:
|
||||
- Bucket creation: `mybucket` → stored as `project_id:mybucket`
|
||||
- Bucket listing: filtered by project_id
|
||||
- Access control: users can only access their project's buckets
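
The prefixing rule can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes the stored name is exactly `project_id:bucket` as shown in the first bullet.

```rust
/// Name under which a bucket is stored when tenant prefixing is on.
pub fn stored_bucket_name(project_id: &str, bucket: &str, tenant_prefix: bool) -> String {
    if tenant_prefix {
        format!("{}:{}", project_id, bucket)
    } else {
        bucket.to_string()
    }
}

/// Bucket listing filtered to one project, with the prefix stripped again.
pub fn visible_buckets<'a>(stored: &'a [String], project_id: &str) -> Vec<&'a str> {
    let prefix = format!("{}:", project_id);
    stored
        .iter()
        .filter_map(|name| name.strip_prefix(prefix.as_str()))
        .collect()
}
```

Matching on `project_id` plus the colon, rather than on `project_id` alone, keeps a project named `proj1` from seeing buckets that belong to `proj10`.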
|
||||
|
||||
---
|
||||
|
||||
## 4. Authentication Model — Complete Reference
|
||||
|
||||
### Credentials Struct
|
||||
|
||||
The core `Credentials` struct (in `rustfs-credentials`):
|
||||
|
||||
```rust
|
||||
pub struct Credentials {
    pub access_key: String,                     // S3 access key (or "keystone:<user_id>")
    pub secret_key: String,                     // S3 secret key (empty for Keystone)
    pub session_token: String,                  // STS session token / Keystone token
    pub expiration: Option<OffsetDateTime>,     // Token expiration
    pub status: String,                         // "active" or "off"
    pub parent_user: String,                    // Parent user for STS/service accounts
    pub groups: Option<Vec<String>>,            // Group membership
    pub claims: Option<HashMap<String, Value>>, // JWT/Keystone claims
    pub name: Option<String>,                   // Human-readable name
    pub description: Option<String>,
}
|
||||
```
|
||||
|
||||
Key methods:
|
||||
- `is_expired()` — checks if the credential's expiration has passed
|
||||
- `is_temp()` — true if `session_token` is non-empty and not expired
|
||||
- `is_service_account()` — true if claims contain `sa-policy` key and `parent_user` is non-empty
|
||||
- `is_valid()` — access_key >= 3 chars, secret_key >= 8 chars, not expired, status != "off"
|
||||
- Default credentials: `rustfsadmin` / `rustfsadmin` (env vars: `RUSTFS_ACCESS_KEY` / `RUSTFS_SECRET_KEY`)
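
The validity rules listed above can be restated as code. This is a simplified sketch, not the `rustfs-credentials` source: `Creds` is a trimmed hypothetical struct, expiration is modelled as Unix seconds instead of `OffsetDateTime`, and the current time is passed in explicitly so the logic stays deterministic.

```rust
pub struct Creds {
    pub access_key: String,
    pub secret_key: String,
    pub session_token: String,
    /// Unix seconds (assumption; the real struct uses `OffsetDateTime`).
    pub expiration: Option<u64>,
    pub status: String,
}

impl Creds {
    /// True once the expiration, if any, lies in the past.
    pub fn is_expired(&self, now: u64) -> bool {
        matches!(self.expiration, Some(exp) if exp < now)
    }

    /// Temporary credentials carry a session token and are not yet expired.
    pub fn is_temp(&self, now: u64) -> bool {
        !self.session_token.is_empty() && !self.is_expired(now)
    }

    /// Length checks, expiry check, and status check from the list above.
    pub fn is_valid(&self, now: u64) -> bool {
        self.access_key.len() >= 3
            && self.secret_key.len() >= 8
            && !self.is_expired(now)
            && self.status != "off"
    }
}
```

Credentials without an expiration never expire under this reading, which matches the static root key pair.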
|
||||
|
||||
### IAM System
|
||||
|
||||
The IAM system (`rustfs-iam`) manages:
|
||||
|
||||
- **Users and groups** with RBAC
|
||||
- **Service accounts** and API key authentication
|
||||
- **Policy engine** with fine-grained S3-style permissions
|
||||
- **LDAP/Active Directory** integration
|
||||
- **Session management** and token validation
|
||||
- **OIDC integration** (full OpenID Connect with PKCE)
|
||||
|
||||
The IAM system is initialized as a singleton (`IAM_SYS`) backed by an `ObjectStore` (persisted in the S3 storage itself). Lookups go through `IamSys::check_key(access_key)` which loads from cache or disk.
|
||||
|
||||
### OIDC Support
|
||||
|
||||
RustFS has comprehensive OIDC support (`rustfs-iam` → `oidc.rs`):
|
||||
|
||||
**Configuration (environment variables):**
|
||||
- `RUSTFS_IDENTITY_OPENID_ENABLE=on`
|
||||
- `RUSTFS_IDENTITY_OPENID_CONFIG_URL` — OIDC discovery URL
|
||||
- `RUSTFS_IDENTITY_OPENID_CLIENT_ID` — OAuth2 client ID
|
||||
- `RUSTFS_IDENTITY_OPENID_CLIENT_SECRET` — OAuth2 client secret
|
||||
- `RUSTFS_IDENTITY_OPENID_SCOPES` — comma-separated scopes (default: `openid,profile,email`)
|
||||
- `RUSTFS_IDENTITY_OPENID_GROUPS_CLAIM` — claim for group membership
|
||||
- `RUSTFS_IDENTITY_OPENID_ROLES_CLAIM` — claim for role mapping (Microsoft Entra ID app roles)
|
||||
- `RUSTFS_IDENTITY_OPENID_CLAIM_NAME` — primary claim for policy mapping
|
||||
- `RUSTFS_IDENTITY_OPENID_CLAIM_PREFIX` — prefix for claim-to-policy mapping
|
||||
- `RUSTFS_IDENTITY_OPENID_REDIRECT_URI` — callback URL
|
||||
- `RUSTFS_IDENTITY_OPENID_REDIRECT_URI_DYNAMIC` — allow dynamic redirect URIs
|
||||
|
||||
**Features:**
|
||||
- Authorization Code flow with PKCE
|
||||
- OIDC discovery and JWKS auto-refresh
|
||||
- Multiple OIDC providers (suffixed env vars like `_PRIMARY`, `_SECONDARY`)
|
||||
- ID token verification (signature, issuer, audience, expiry)
|
||||
- `AssumeRoleWithWebIdentity` flow (JWT directly, no browser)
|
||||
- Roles and groups claim mapping to RustFS IAM policies
|
||||
- Provider-specific configuration (Microsoft Entra ID roles claim support)
|
||||
|
||||
**OIDC Claims → RustFS Policy Mapping:**
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [{
|
||||
"Effect": "Allow",
|
||||
"Action": ["admin:*"],
|
||||
"Resource": ["arn:aws:s3:::*"],
|
||||
"Condition": {
|
||||
"ForAnyValue:StringEquals": {
|
||||
"jwt:roles": ["RustFS.ConsoleAdmin"]
|
||||
}
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
### RPC Authentication
|
||||
|
||||
RustFS uses a derived RPC secret for inter-node communication:
|
||||
- Environment variable: `RUSTFS_RPC_SECRET` (explicit) or derived from `access_key + secret_key` via HMAC-SHA256
|
||||
- Uses a `0xFFFFFFFFFFFFFFFF` mask for the signing context
|
||||
- Base64url-encoded (no padding) output
|
||||
|
||||
---
|
||||
|
||||
## 5. Docker Deployment
|
||||
|
||||
### Simple Deployment
|
||||
|
||||
```yaml
|
||||
# docker-compose-simple.yml
services:
  rustfs:
    image: rustfs/rustfs:latest
    ports:
      - "9000:9000"  # S3 API
      - "9001:9001"  # Console
    environment:
      - RUSTFS_VOLUMES=/data/rustfs{0...3}
      - RUSTFS_ADDRESS=0.0.0.0:9000
      - RUSTFS_CONSOLE_ADDRESS=0.0.0.0:9001
      - RUSTFS_ACCESS_KEY=rustfsadmin
      - RUSTFS_SECRET_KEY=rustfsadmin
      - RUSTFS_OBS_LOGGER_LEVEL=info
    volumes:
      - rustfs_data_0:/data/rustfs0
      - rustfs_data_1:/data/rustfs1
      - rustfs_data_2:/data/rustfs2
      - rustfs_data_3:/data/rustfs3
|
||||
```
|
||||
|
||||
### Full Deployment (with Observability)
|
||||
|
||||
```yaml
|
||||
# docker-compose.yml (with --profile observability)
services:
  rustfs:
    # ... same as above, plus:
    environment:
      - RUSTFS_OBS_ENDPOINT=http://otel-collector:4318
  otel-collector:  # OpenTelemetry collector
  tempo:           # Distributed tracing
  jaeger:          # Jaeger UI
  prometheus:      # Metrics
  loki:            # Logs
  grafana:         # Dashboards
  nginx:           # Reverse proxy (optional, --profile proxy)
|
||||
```
|
||||
|
||||
### Dockerfile
|
||||
|
||||
- Base: Alpine 3.23.4
|
||||
- Runs as non-root user `rustfs` (UID/GID 10001:10001)
|
||||
- Single binary: `/usr/bin/rustfs`
|
||||
- Entrypoint: `/entrypoint.sh` (processes volumes, log dirs, default credential warnings)
|
||||
- Health check: HTTP/HTTPS `/health` on port 9000, `/rustfs/console/health` on 9001
|
||||
- Supports TLS via `RUSTFS_TLS_PATH=/opt/tls` with `rustfs_cert.pem` + `rustfs_key.pem` + optional `ca.crt`
|
||||
|
||||
### Keystone-Enabled Deployment
|
||||
|
||||
```bash
|
||||
docker run -d \
|
||||
-p 9000:9000 -p 9001:9001 \
|
||||
-e RUSTFS_ACCESS_KEY=admin \
|
||||
-e RUSTFS_SECRET_KEY=adminsecret \
|
||||
-e RUSTFS_KEYSTONE_ENABLE=true \
|
||||
-e RUSTFS_KEYSTONE_AUTH_URL=http://keystone:5000 \
|
||||
-e RUSTFS_KEYSTONE_VERSION=v3 \
|
||||
-e RUSTFS_KEYSTONE_ADMIN_USER=admin \
|
||||
-e RUSTFS_KEYSTONE_ADMIN_PASSWORD=secret \
|
||||
-e RUSTFS_KEYSTONE_ADMIN_PROJECT=admin \
|
||||
-e RUSTFS_KEYSTONE_ADMIN_DOMAIN=Default \
|
||||
-v /data:/data \
|
||||
rustfs/rustfs:latest
|
||||
```
|
||||
|
||||
### Webhook Notification
|
||||
|
||||
```bash
|
||||
docker run -d --name rustfs -p 9000:9000 \
|
||||
-e RUSTFS_NOTIFY_ENABLE=true \
|
||||
-e RUSTFS_NOTIFY_WEBHOOK_ENABLE_PRIMARY=on \
|
||||
-e RUSTFS_NOTIFY_WEBHOOK_ENDPOINT_PRIMARY=http://host:3020/webhook \
|
||||
-e RUSTFS_NOTIFY_WEBHOOK_QUEUE_DIR_PRIMARY=/tmp/rustfs-events \
|
||||
rustfs/rustfs:latest
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. SDK/Client Libraries — Rust S3 Clients
|
||||
|
||||
### aws-sdk-s3 (Official AWS SDK for Rust)
|
||||
|
||||
RustFS itself uses `aws-sdk-s3` (v1.135.0) as a dependency — this is the most mature Rust S3 client:
|
||||
|
||||
```toml
|
||||
aws-sdk-s3 = { version = "1.135.0", default-features = false, features = ["sigv4a", "default-https-client", "rt-tokio"] }
|
||||
aws-config = { version = "1.8.18" }
|
||||
aws-credential-types = { version = "1.2.14" }
|
||||
```
|
||||
|
||||
**Pros:** Full S3 API coverage, SigV4/SigV4a signing, async, production-tested
|
||||
**Cons:** Heavy dependency (pulls in significant AWS SDK surface area), AWS-centric abstractions
|
||||
|
||||
### s3s (RustFS's own S3 framework)
|
||||
|
||||
RustFS uses a custom `s3s` crate (`https://github.com/rustfs/s3s`, with `minio` feature):
|
||||
```toml
|
||||
s3s = { git = "https://github.com/rustfs/s3s", rev = "507e1312b211c3ddc214b03875d6fabd15d22ed5", features = ["minio"] }
|
||||
```
|
||||
|
||||
This provides S3 request/response types, routing, and the `S3Auth` trait used by RustFS's `IAMAuth`.
|
||||
|
||||
### rust-s3 (Community)
|
||||
|
||||
Not used by RustFS, but worth noting as an alternative:
|
||||
- Crate: `rust-s3` / `s3`
|
||||
- Simpler API than aws-sdk-s3
|
||||
- Supports MinIO-compatible endpoints
|
||||
- Less complete S3 operation coverage
|
||||
|
||||
### Recommendation for alknet
|
||||
|
||||
For alknet's S3 adapter:
|
||||
- **Internal use**: aws-sdk-s3, configured with custom endpoint pointing to rustfs
|
||||
- **Request signing**: If building a lightweight adapter, extract just the signing logic from `rustfs-signer` or use `aws-smithy-runtime` directly
|
||||
- **The CredentialSet::S3AccessKey variant** (from alknet's credential-provider.md) maps directly to RustFS's `access_key + secret_key` pair; no additional transformation needed
|
||||
|
||||
---
|
||||
|
||||
## 7. Relevance to Alknet
|
||||
|
||||
### 7.1 RustFS as an Internal Object Store Behind Alknet's HTTP Interface
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
Client (any S3 SDK)
  → Alknet HTTP adapter (port 443/80 with HTTPS termination)
    → RustFS (port 9000, Docker network, not exposed externally)
      → Disk storage (/data volumes)
|
||||
```
|
||||
|
||||
**Deployment pattern:** RustFS runs as a Docker container on the same Docker network as alknet, listening only on the internal network. Alknet's HTTP interface reverse-proxies S3 API calls to rustfs.
|
||||
|
||||
**Reverse proxy considerations:**
|
||||
- Alknet would forward `Host`, `Authorization`, `X-Auth-Token`, `X-Amz-*` headers unchanged
|
||||
- RustFS needs the real client IP for S3 policy `SourceIp` conditions; alknet should set `X-Forwarded-For` and configure `RUSTFS_TRUSTED_PROXIES` or use rustfs's `trusted-proxies` crate
|
||||
- Health check: Alknet proxies `/health` → rustfs:9000
|
||||
- RustFS supports `X-Forwarded-Proto` for TLS offloading via its `trusted-proxies` crate
|
||||
|
||||
**Why behind alknet rather than standalone:**
|
||||
1. Unified TLS termination at alknet
|
||||
2. alknet can inject auth headers (e.g., OIDC tokens) before forwarding
|
||||
3. alknet can enforce rate limiting and access control
|
||||
4. Network isolation — rustfs only accessible via alknet
|
||||
|
||||
**Webhook integration:** RustFS can POST events to alknet via its notification system:
|
||||
```bash
|
||||
RUSTFS_NOTIFY_WEBHOOK_ENDPOINT_PRIMARY=http://alknet:3020/webhook
|
||||
```
|
||||
|
||||
### 7.2 Mapping S3 Auth to Alknet's CredentialProvider/CredentialSet
|
||||
|
||||
The alknet `CredentialSet` enum directly models the S3 auth pattern:
|
||||
|
||||
| RustFS Auth Method | Alknet CredentialSet Variant | Mapping |
|---|---|---|
| Access key + secret key (SigV4) | `S3AccessKey { access_key, secret_key, session_token }` | Direct 1:1 mapping; access_key and secret_key are the S3 credential pair |
| Keystone X-Auth-Token | `OidcToken { access_token, ... }` | Keystone token → OIDC access_token; expires_at maps to Keystone token expiration |
| STS AssumeRole session | `S3AccessKey { ..., session_token: Some(...) }` | STS temporary credentials with session token |
| OIDC (browser flow) | `OidcToken { access_token, refresh_token, expires_at }` | Direct mapping |
| Admin default credentials | `S3AccessKey { access_key: "rustfsadmin", secret_key: "rustfsadmin" }` | Service-level credential |
|
||||
|
||||
**S3 Request Signing (Phase C in credential-provider.md):**
|
||||
The `S3AccessKey` variant contains the raw credential data. The signing computation itself is separate — it's a utility function `s3_sign(credential: &S3AccessKey, request: &HttpRequest) -> SignedRequest` that should live in a shared `alknet-s3` utility crate, not in `CredentialSet`. This matches OpenQ-04 in the credential-provider doc.
|
||||
|
||||
**For alknet's `S3CredentialManager`:**
|
||||
```rust
|
||||
impl CredentialManager for S3CredentialManager {
    fn refresh(&self, current: &CredentialSet) -> Option<CredentialSet> {
        // If we have an STS session token, check expiration
        // and re-AssumeRole if needed
        todo!("re-AssumeRole when the STS session has expired")
    }

    fn is_expired(&self, current: &CredentialSet) -> bool {
        match current {
            // STS credentials: expiry is derived from the session token
            CredentialSet::S3AccessKey { session_token: Some(t), .. }
                if !t.is_empty() => check_sts_expiration(t),
            // OIDC tokens: compare the recorded expiry against the clock
            CredentialSet::OidcToken { expires_at: Some(ts), .. }
                => *ts < now(),
            _ => false, // Static keys don't expire
        }
    }

    fn provision(&self, identity: &Identity) -> Option<CredentialSet> {
        // Create a rustfs IAM access key for this alknet identity
        // via the rustfs admin API
        todo!("call the rustfs admin API to create a per-identity access key")
    }
}
|
||||
```
|
||||
|
||||
### 7.3 Alknet as an OIDC Provider for RustFS (Phase D)
|
||||
|
||||
This is the most strategically important integration point. RustFS already has complete OIDC support — it just needs an OIDC provider to trust.
|
||||
|
||||
**How it would work:**
|
||||
|
||||
1. **alknet exposes OIDC endpoints** (via call protocol HTTP adapter or a dedicated `/oidc/` path):
|
||||
- `GET /.well-known/openid-configuration` — discovery document
|
||||
- `GET /oidc/authorize` — authorization endpoint
|
||||
- `POST /oidc/token` — token exchange
|
||||
- `GET /oidc/userinfo` — user info
|
||||
- `GET /oidc/jwks` — JSON Web Key Set
|
||||
- `GET /oidc/logout` — RP-initiated logout
|
||||
|
||||
2. **alknet's Identity maps to OIDC claims:**
|
||||
- `sub` → `Identity.id` (SSH fingerprint or account UUID)
|
||||
- `email` → from account metadata (if available)
|
||||
- `username` → display name or `Identity.id`
|
||||
- `groups` → `Identity.scopes` (e.g., `["s3:admin", "s3:readwrite"]`)
|
||||
- `roles` → derived from scopes (e.g., `scope "s3:admin"` → role `"admin"`)
|
||||
|
||||
3. **RustFS configuration** (pointing at alknet):
|
||||
```bash
|
||||
RUSTFS_IDENTITY_OPENID_ENABLE=on
|
||||
RUSTFS_IDENTITY_OPENID_CONFIG_URL=https://alknet:443/.well-known/openid-configuration
|
||||
RUSTFS_IDENTITY_OPENID_CLIENT_ID=alknet-rustfs-client
|
||||
RUSTFS_IDENTITY_OPENID_CLIENT_SECRET=<auto-generated>
|
||||
RUSTFS_IDENTITY_OPENID_SCOPES=openid,profile,email,groups
|
||||
RUSTFS_IDENTITY_OPENID_GROUPS_CLAIM=groups
|
||||
RUSTFS_IDENTITY_OPENID_ROLES_CLAIM=roles
|
||||
```
|
||||
|
||||
4. **Authentication flow:**
|
||||
- User connects to alknet (via SSH/WebTransport/HTTP)
|
||||
- alknet resolves identity → `Identity { id, scopes, resources }`
|
||||
- User requests access to rustfs console
|
||||
- Browser redirects to alknet's OIDC authorize endpoint
|
||||
- alknet issues authorization code → token exchange → ID token
|
||||
- RustFS verifies the ID token using alknet's JWKS endpoint
|
||||
- RustFS maps `groups` and `roles` claims to IAM policies
|
||||
|
||||
5. **For `AssumeRoleWithWebIdentity` (programmatic access):**
|
||||
- alknet issues a JWT directly to the client
|
||||
- Client presents JWT to RustFS via `Action=AssumeRoleWithWebIdentity`
|
||||
- RustFS calls `OidcSys::verify_web_identity_token()` which:
|
||||
- Decodes JWT payload to get `iss` claim
|
||||
- Finds matching OIDC provider (alknet)
|
||||
- Verifies signature, issuer, audience, expiry
|
||||
- Extracts claims → maps to RustFS policies
|
||||
|
||||
**This eliminates stored credentials entirely** — alknet identities authenticate directly to rustfs via OIDC, no `S3AccessKey` needed.
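
The claim mapping in step 2 can be sketched as follows. This is an assumption-heavy illustration: `Identity` is reduced to the two fields used here, `OidcClaims` and `claims_for` are hypothetical names, and deriving a role by stripping the `s3:` prefix generalizes the single `s3:admin` → `admin` example given above.

```rust
/// Reduced view of alknet's `Identity` (only the fields used here).
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Claims placed in the ID token (hypothetical shape).
pub struct OidcClaims {
    pub sub: String,
    pub username: String,
    pub groups: Vec<String>,
    pub roles: Vec<String>,
}

pub fn claims_for(identity: &Identity) -> OidcClaims {
    // Only scopes in the `s3:` namespace become roles; other scopes
    // stay visible through `groups` but grant no role.
    let roles = identity
        .scopes
        .iter()
        .filter_map(|scope| scope.strip_prefix("s3:"))
        .map(str::to_string)
        .collect();
    OidcClaims {
        sub: identity.id.clone(),
        // No display name in this reduced model, so fall back to the id.
        username: identity.id.clone(),
        groups: identity.scopes.clone(),
        roles,
    }
}
```

On the RustFS side, `RUSTFS_IDENTITY_OPENID_GROUPS_CLAIM=groups` and `RUSTFS_IDENTITY_OPENID_ROLES_CLAIM=roles` would then select these two claims for policy mapping.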
|
||||
|
||||
### 7.4 Alknet RustFS Adapter Architecture
|
||||
|
||||
An alknet HTTP/HTTPS adapter for the S3 API would look like:
|
||||
|
||||
```
|
||||
alknet HTTP adapter
|
||||
├── Route: /s3/* → reverse proxy to rustfs:9000
|
||||
│ ├── Preserve all S3 headers (Authorization, X-Amz-*, X-Auth-Token, Content-*)
|
||||
│ ├── Set X-Forwarded-For, X-Forwarded-Proto
|
||||
│ ├── Optionally inject X-Auth-Token from alknet Identity
|
||||
│ └── Response streaming (for large object downloads)
|
||||
├── Route: /s3/health → rustfs:9000/health (health check)
|
||||
└── Route: /s3/admin/* → rustfs:9000/minio/* (admin API)
|
||||
```
|
||||
|
||||
**Key considerations:**
|
||||
- S3 requests can be very large (multipart uploads, 5TB+ objects). The adapter must support streaming both request and response bodies without buffering.
|
||||
- `X-Forwarded-For` must be set so rustfs can evaluate `SourceIp` condition keys in bucket policies.
|
||||
- RustFS already handles `X-Forwarded-Proto` for HTTPS offloading via its `trusted-proxies` crate.
|
||||
- For OIDC integration, the adapter doesn't need to modify auth headers — rustfs handles OIDC token validation itself when pointed at alknet's OIDC endpoint.
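
The forwarding-header handling can be sketched independently of any HTTP framework. `forwarded_headers` is a hypothetical helper; it follows the usual convention of appending the connecting client's address to an existing `X-Forwarded-For` chain.

```rust
/// Build the proxy headers alknet would add before forwarding to rustfs.
pub fn forwarded_headers(
    existing_xff: Option<&str>,
    client_ip: &str,
    client_used_tls: bool,
) -> Vec<(&'static str, String)> {
    // Append to an existing chain, or start a new one.
    let xff = match existing_xff {
        Some(chain) if !chain.trim().is_empty() => {
            format!("{}, {}", chain.trim(), client_ip)
        }
        _ => client_ip.to_string(),
    };
    let proto = if client_used_tls { "https" } else { "http" };
    vec![
        ("X-Forwarded-For", xff),
        ("X-Forwarded-Proto", proto.to_string()),
    ]
}
```

If alknet is the outermost proxy, it may be safer to discard any client-supplied `X-Forwarded-For` value (pass `None`) so that `SourceIp` conditions cannot be spoofed; which behavior is right depends on the deployment.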
|
||||
|
||||
**Alknet's `OpenAPIServiceRegistry` integration:**
|
||||
|
||||
Since rustfs exposes an S3 API, alknet could auto-register S3 operations via an OpenAPI spec or hardcoded operation specs:
|
||||
|
||||
```rust
|
||||
// In alknet's service registry:
|
||||
let s3_ops = FromOpenAPI(s3_openapi_spec, config);
|
||||
// Where config.auth = CredentialSet::S3AccessKey { access_key, secret_key, session_token: None }
|
||||
// Or: config.auth = CredentialSet::OidcToken { access_token, refresh_token, expires_at }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Key RustFS Source Files for Reference
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `crates/credentials/src/credentials.rs` | `Credentials` struct, global credentials, key generation |
|
||||
| `crates/credentials/src/constants.rs` | Default access/secret keys, IAM policy constants |
|
||||
| `crates/signer/` | AWS Signature V4 implementation |
|
||||
| `crates/keystone/src/config.rs` | Keystone configuration from env vars |
|
||||
| `crates/keystone/src/client.rs` | Keystone v3 API client (token validation, EC2 creds, admin auth) |
|
||||
| `crates/keystone/src/auth.rs` | `KeystoneAuthProvider` (token → `Credentials` mapping) |
|
||||
| `crates/keystone/src/middleware.rs` | Tower middleware extracting `X-Auth-Token`, task-local storage |
|
||||
| `crates/keystone/src/identity.rs` | `KeystoneIdentityMapper` (role → policy, tenant prefix) |
|
||||
| `crates/iam/src/oidc.rs` | Complete OIDC system (discovery, PKCE, token exchange, JWT verification) |
|
||||
| `crates/iam/src/sys.rs` | `IamSys` (IAM singleton, user/key management) |
|
||||
| `crates/policy/` | S3 bucket/IAM policy evaluation engine |
|
||||
| `rustfs/src/auth.rs` | `IAMAuth`, `check_key_valid`, auth type detection, condition values |
|
||||
| `rustfs/src/server/` | HTTP server, TLS, routing, middleware stack |
|
||||
| `crates/protocols/src/swift/` | OpenStack Swift API implementation |
|
||||
| `Dockerfile` / `docker-compose-simple.yml` | Deployment configuration |
|
||||
|
||||
---
|
||||
|
||||
## 9. Configuration Quick Reference
|
||||
|
||||
### RustFS Docker Environment Variables (Auth-Relevant)
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `RUSTFS_ACCESS_KEY` | Root access key | `rustfsadmin` |
|
||||
| `RUSTFS_SECRET_KEY` | Root secret key | `rustfsadmin` |
|
||||
| `RUSTFS_ADDRESS` | S3 API listen address | `0.0.0.0:9000` |
|
||||
| `RUSTFS_CONSOLE_ADDRESS` | Console listen address | `0.0.0.0:9001` |
|
||||
| `RUSTFS_CONSOLE_ENABLE` | Enable web console | `true` |
|
||||
| `RUSTFS_TLS_PATH` | TLS certificate directory | (none, HTTP) |
|
||||
| `RUSTFS_KEYSTONE_ENABLE` | Enable Keystone auth | `false` |
|
||||
| `RUSTFS_KEYSTONE_AUTH_URL` | Keystone v3 endpoint | (required if enabled) |
|
||||
| `RUSTFS_KEYSTONE_VERSION` | Keystone API version | `v3` |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_USER` | Keystone admin user | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_PASSWORD` | Keystone admin password | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_PROJECT` | Keystone admin project | (optional) |
|
||||
| `RUSTFS_KEYSTONE_ADMIN_DOMAIN` | Keystone admin domain | `Default` |
|
||||
| `RUSTFS_KEYSTONE_VERIFY_SSL` | Verify Keystone TLS | `true` |
|
||||
| `RUSTFS_KEYSTONE_CACHE_SIZE` | Token cache size | `10000` |
|
||||
| `RUSTFS_KEYSTONE_CACHE_TTL` | Token cache TTL (sec) | `300` |
|
||||
| `RUSTFS_KEYSTONE_TENANT_PREFIX` | Enable tenant prefixing | `true` |
|
||||
| `RUSTFS_IDENTITY_OPENID_ENABLE` | Enable OIDC | `off` |
|
||||
| `RUSTFS_IDENTITY_OPENID_CONFIG_URL` | OIDC discovery URL | (required) |
|
||||
| `RUSTFS_IDENTITY_OPENID_CLIENT_ID` | OIDC client ID | (required) |
|
||||
| `RUSTFS_IDENTITY_OPENID_CLIENT_SECRET` | OIDC client secret | (optional) |
|
||||
| `RUSTFS_IDENTITY_OPENID_SCOPES` | OIDC scopes | `openid,profile,email` |
|
||||
| `RUSTFS_IDENTITY_OPENID_GROUPS_CLAIM` | Groups claim name | `groups` |
|
||||
| `RUSTFS_IDENTITY_OPENID_ROLES_CLAIM` | Roles claim name | (empty, opt-in) |
|
||||
| `RUSTFS_RPC_SECRET` | Inter-node RPC auth secret | (derived from keys) |
|
||||
| `RUSTFS_NOTIFY_WEBHOOK_ENABLE_PRIMARY` | Enable webhook notifications | `off` |
|
||||
| `RUSTFS_NOTIFY_WEBHOOK_ENDPOINT_PRIMARY` | Webhook URL | (required) |
|
||||
|
||||
---
|
||||
|
||||
## 10. Summary of Integration Paths
|
||||
|
||||
### Phase A (Immediate): Static S3 Credentials
|
||||
|
||||
- Deploy rustfs as a Docker service next to alknet
|
||||
- Configure `RUSTFS_ACCESS_KEY` and `RUSTFS_SECRET_KEY`
|
||||
- alknet stores these as `CredentialSet::S3AccessKey`
|
||||
- alknet's HTTP adapter reverse-proxies S3 calls to rustfs
|
||||
- Use `aws-sdk-s3` or `rust-s3` as the client library
|
||||
|
||||
**Effort:** Low. No auth changes in either system.
|
||||
|
||||
### Phase B: OIDC via External Provider
|
||||
|
||||
- Configure rustfs `RUSTFS_IDENTITY_OPENID_*` to point at an external OIDC provider (e.g., Keycloak, Authentik, Microsoft Entra ID)
|
||||
- alknet can still manage its own auth independently
|
||||
- Both systems trust the same OIDC provider
|
||||
|
||||
**Effort:** Low. Configuration-only change in rustfs.
|
||||
|
||||
### Phase C: Managed Credentials
|
||||
|
||||
- alknet provisions rustfs access keys via admin API (`/minio/` endpoints)
|
||||
- `S3CredentialManager` handles session token rotation
|
||||
- Identity-bound credentials: alknet creates per-user access keys in rustfs IAM
|
||||
|
||||
**Effort:** Medium. Requires admin API client, credential lifecycle management.
|
||||
|
||||
### Phase D: Alknet as OIDC Provider (Target State)
|
||||
|
||||
- alknet exposes OIDC endpoints (`.well-known/openid-configuration`, `/oidc/authorize`, `/oidc/token`, `/oidc/jwks`)
|
||||
- rustfs trusts alknet as its OIDC provider
|
||||
- `Identity.scopes` maps to rustfs IAM policies (e.g., `s3:admin` → admin policy)
|
||||
- No stored S3 credentials — users authenticate directly via alknet identity
|
||||
- `AssumeRoleWithWebIdentity` for programmatic access
|
||||
|
||||
**Effort:** High. Requires building OIDC authorization server in alknet. This is the most elegant but most complex path.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [RustFS GitHub](https://github.com/rustfs/rustfs) — v1.0.0-beta.7
|
||||
- [RustFS Documentation](https://docs.rustfs.com)
|
||||
- [RustFS Keystone README](file:///workspace/rustfs/crates/keystone/README.md) — comprehensive Keystone integration docs
|
||||
- [RustFS OIDC implementation](file:///workspace/rustfs/crates/iam/src/oidc.rs) — full OIDC client with PKCE, discovery, JWKS refresh
|
||||
- [RustFS auth.rs](file:///workspace/rustfs/rustfs/src/auth.rs) — IAMAuth, check_key_valid, auth type detection
|
||||
- [alknet credential-provider.md](file:///workspace/@alkdev/alknet/docs/research/phase2/credential-provider.md) — alknet's outbound auth design
|
||||
- [alknet identity.md](file:///workspace/@alkdev/alknet/docs/architecture/identity.md) — alknet's inbound auth design
|
||||
@@ -1,808 +0,0 @@
|
||||
# Alknet Services: irpc Service Architecture
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-06
|
||||
|
||||
## Overview
|
||||
|
||||
Alknet uses an **irpc-based service layer** to decompose core responsibilities into independently testable, deployable, and replaceable components. Services communicate via irpc protocol enums that work both as in-process async boundaries (tokio channels) and cross-process/cross-network (QUIC streams via noq).
|
||||
|
||||
This document defines the service protocols and their relationships, following the head/worker terminology established in [core.md](core.md).
|
||||
|
||||
## Design Principles
|
||||
|
||||
### 1. Services are protocol enums
|
||||
|
||||
An irpc service is defined as a Rust enum annotated with `#[rpc_requests]`. The macro generates two versions:
|
||||
- **Serializable** (`Request`): safe to encode with postcard, for remote communication
|
||||
- **With channels** (`RequestWithChannels`): includes `oneshot::Sender` and `mpsc` channels, for local communication
|
||||
|
||||
Both versions use the same `Client<S>` type — the local/remote distinction is transparent at the call site.
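
For readers unfamiliar with the pattern, here is a std-only analogue of the "with channels" form: a request enum whose variants carry their own reply channel, served by an actor thread. It does not use irpc or `#[rpc_requests]`; `AuthRequest`, `spawn_auth_actor`, and `verify_pubkey` are hypothetical names chosen for the example.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
pub enum AuthResult {
    Ok(String),
    Denied(String),
}

/// Local form of the protocol: each request carries its reply sender.
pub enum AuthRequest {
    VerifyPubkey {
        fingerprint: String,
        reply: mpsc::Sender<AuthResult>,
    },
}

/// Spawn the service loop and return the handle callers send requests to.
pub fn spawn_auth_actor(allowed: Vec<String>) -> mpsc::Sender<AuthRequest> {
    let (tx, rx) = mpsc::channel::<AuthRequest>();
    thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for request in rx {
            match request {
                AuthRequest::VerifyPubkey { fingerprint, reply } => {
                    let result = if allowed.contains(&fingerprint) {
                        AuthResult::Ok(fingerprint)
                    } else {
                        AuthResult::Denied("unknown fingerprint".to_string())
                    };
                    // The caller may have gone away; ignore send errors.
                    let _ = reply.send(result);
                }
            }
        }
    });
    tx
}

/// Call-site helper: send one request and wait for its reply.
pub fn verify_pubkey(client: &mpsc::Sender<AuthRequest>, fingerprint: &str) -> Option<AuthResult> {
    let (reply, response) = mpsc::channel();
    client
        .send(AuthRequest::VerifyPubkey { fingerprint: fingerprint.to_string(), reply })
        .ok()?;
    response.recv().ok()
}
```

The serializable form is the same enum without the `reply` field, which is why a macro can generate both versions from one definition.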
|
||||
|
||||
### 2. Services are the async boundary
|
||||
|
||||
The irpc documentation describes a common anti-pattern: a single giant `mpsc` message enum shared by every component. Alknet avoids this by giving each service its own focused protocol, which keeps responsibilities clear and prevents the "god enum" problem.
|
||||
|
||||
### 3. Local-first, remote-capable
|
||||
|
||||
Every service can run locally (mpsc channels, zero serialization overhead) or remotely (QUIC streams, postcard serialization). The deployment choice doesn't affect the call sites. A single-node setup runs everything locally. A distributed setup runs auth and secrets on dedicated nodes.
|
||||
|
||||
### 4. Event boundary discipline
|
||||
|
||||
Per [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md):
|
||||
|
||||
- **Honker streams** = domain events (internal to the owning service, for state reconstruction)
|
||||
- **irpc service calls** = request-response between services (synchronous boundary within a node)
|
||||
- **Call protocol EventEnvelope** = integration events (cross-node asynchronous boundary)
|
||||
|
||||
Domain events are projected to integration events when crossing service or node boundaries. Never publish domain events directly to other services.
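
A projection of this kind might look like the sketch below. All type and event names are hypothetical; the point is only that the projection chooses which domain events leave the service and which fields they expose.

```rust
/// Internal events of a hypothetical auth service (never published as-is).
pub enum DomainEvent {
    KeyAdded { fingerprint: String, key_data: Vec<u8> },
    KeyRevoked { fingerprint: String },
    CacheWarmed { entries: usize },
}

/// What other services and nodes are allowed to see.
#[derive(Debug, PartialEq)]
pub struct IntegrationEvent {
    pub kind: &'static str,
    pub subject: String,
}

/// Project a domain event to an integration event, or drop it.
pub fn project(event: &DomainEvent) -> Option<IntegrationEvent> {
    match event {
        // Raw key material stays inside the owning service.
        DomainEvent::KeyAdded { fingerprint, .. } => Some(IntegrationEvent {
            kind: "auth.key_added",
            subject: fingerprint.clone(),
        }),
        DomainEvent::KeyRevoked { fingerprint } => Some(IntegrationEvent {
            kind: "auth.key_revoked",
            subject: fingerprint.clone(),
        }),
        // Purely internal bookkeeping has no integration counterpart.
        DomainEvent::CacheWarmed { .. } => None,
    }
}
```

Returning `Option` makes the "not every domain event is published" rule explicit in the type.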
|
||||
|
||||
## Service Definitions
|
||||
|
||||
### AuthService
|
||||
|
||||
Verifies identities without holding all keys in memory.
|
||||
|
||||
```rust
|
||||
use irpc::{rpc_requests, channel::{mpsc, oneshot}};
|
||||
use serde::{Serialize, Deserialize};
|
||||
|
||||
#[rpc_requests(message = AuthMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum AuthProtocol {
|
||||
#[rpc(tx=oneshot::Sender<AuthResult>)]
|
||||
#[wrap(VerifyPubkey)]
|
||||
VerifyPubkey {
|
||||
fingerprint: String,
|
||||
key_data: Vec<u8>,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<AuthResult>)]
|
||||
#[wrap(VerifyToken)]
|
||||
VerifyToken {
|
||||
token_bytes: Vec<u8>,
|
||||
timestamp: u64,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(ReloadKeys)]
|
||||
ReloadKeys,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<bool>)]
|
||||
#[wrap(CheckAccess)]
|
||||
CheckAccess {
|
||||
identity: Identity,
|
||||
operation: String,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum AuthResult {
|
||||
Ok(Identity),
|
||||
Denied(String),
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
struct Identity {
|
||||
node_id: String,
|
||||
fingerprint: String,
|
||||
scopes: Vec<String>,
|
||||
}
|
||||
```
|
||||
|
||||
**Backends:**
|
||||
|
||||
| Mode | Backend | When to use |
|------|---------|-------------|
| Minimal | `ArcSwap<DynamicConfig>` with all keys in memory | CLI, single-node, few users |
| SQLite | Query `peer_credentials` / `api_keys` on demand | Production, multi-user head nodes |
| Remote | Forward to dedicated auth service | Multi-head clusters, auth federation |
|
||||
|
||||
**Why this solves the scaling problem:** Instead of loading all keys into memory and swapping them atomically, the auth service queries SQLite per request. An LRU cache on hot fingerprints avoids repeated DB hits. Key revocations are propagated via honker stream notifications.
|
||||
|
||||
### SecretService
|
||||
|
||||
Derives keys from a master seed, encrypts/decrypts external credentials. The **only** component that holds the master seed phrase.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = SecretMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum SecretProtocol {
|
||||
#[rpc(tx=oneshot::Sender<DerivedKey>)]
|
||||
#[wrap(DeriveEd25519)]
|
||||
DeriveEd25519 {
|
||||
path: String, // e.g. "m/74'/0'/0'/0'"
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<DerivedKey>)]
|
||||
#[wrap(DeriveEncryptionKey)]
|
||||
DeriveEncryptionKey {
|
||||
path: String, // e.g. "m/74'/2'/0'/0'"
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<DerivedKey>)]
|
||||
#[wrap(DeriveEthereumKey)]
|
||||
DeriveEthereumKey {
|
||||
path: String, // e.g. "m/44'/60'/0'/0/0"
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Vec<u8>>)]
|
||||
#[wrap(DerivePassword)]
|
||||
DerivePassword {
|
||||
path: String,
|
||||
length: usize,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<EncryptedData>)]
|
||||
#[wrap(Encrypt)]
|
||||
Encrypt {
|
||||
plaintext: String,
|
||||
key_version: u32,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<String>)]
|
||||
#[wrap(Decrypt)]
|
||||
Decrypt {
|
||||
encrypted: EncryptedData,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(Lock)]
|
||||
Lock,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(Unlock)]
|
||||
Unlock {
|
||||
passphrase: String,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
struct DerivedKey {
|
||||
key_type: KeyType,
|
||||
private_key: Vec<u8>,
|
||||
public_key: Vec<u8>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum KeyType {
|
||||
Ed25519,
|
||||
Aes256Gcm,
|
||||
Secp256k1,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
struct EncryptedData {
|
||||
key_version: u32,
|
||||
salt: String, // Base64-encoded
|
||||
iv: String, // Base64-encoded
|
||||
data: String, // Base64-encoded
|
||||
}
|
||||
```
|
||||
|
||||
**Security model:**
|
||||
|
||||
| State | What's in memory | What's on disk |
|-------|-----------------|---------------|
| Locked | Nothing | Encrypted database, derivation path metadata |
| Unlocked | Master seed in RAM | Same (seed is never persisted) |
| After use | Derived keys cached in RAM | Derivation paths only |
|
||||
|
||||
The seed phrase is entered once (at node startup or via `Unlock` call), held in memory, and never written to disk. Derived keys are computed on demand. The `Lock` call purges the seed and all cached derived keys from memory.
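
The lock/unlock lifecycle can be sketched as a small state holder. `SecretStore` and its methods are hypothetical, and the actual key derivation (SLIP-0010 or BIP32) is deliberately left out: it is passed in as a closure so the sketch shows only the memory-lifecycle rules.

```rust
use std::collections::HashMap;

pub struct SecretStore {
    seed: Option<Vec<u8>>,
    derived_cache: HashMap<String, Vec<u8>>,
}

impl SecretStore {
    /// A new store starts locked: nothing sensitive in memory.
    pub fn new() -> Self {
        SecretStore { seed: None, derived_cache: HashMap::new() }
    }

    pub fn is_locked(&self) -> bool {
        self.seed.is_none()
    }

    /// Hold the seed in memory only; it is never written to disk.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.seed = Some(seed);
    }

    /// Purge the seed and every cached derived key.
    pub fn lock(&mut self) {
        if let Some(seed) = self.seed.as_mut() {
            seed.iter_mut().for_each(|b| *b = 0);
        }
        self.seed = None;
        for key in self.derived_cache.values_mut() {
            key.iter_mut().for_each(|b| *b = 0);
        }
        self.derived_cache.clear();
    }

    /// Derive on demand and cache by path; `None` while locked.
    /// `derive` is a placeholder for the real derivation function.
    pub fn derive_with<F>(&mut self, path: &str, derive: F) -> Option<Vec<u8>>
    where
        F: Fn(&[u8], &str) -> Vec<u8>,
    {
        let seed = self.seed.as_deref()?;
        let key = self
            .derived_cache
            .entry(path.to_string())
            .or_insert_with(|| derive(seed, path));
        Some(key.clone())
    }
}
```

The manual zeroing here is best-effort, since a compiler is free to optimize such writes away; a real implementation would more likely rely on a dedicated crate such as `zeroize`.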
|
||||
|
||||
**Derived key patterns (see [storage.md](storage.md) for derivation path conventions):**
|
||||
|
||||
- Identity keys: SLIP-0010 `m/74'/0'/0'/0'` → Ed25519 keypair for alknet authentication
|
||||
- Encryption keys: SLIP-0010 `m/74'/2'/0'/0'` → AES-256-GCM key for external credential encryption
|
||||
- Ethereum keys: BIP32 `m/44'/60'/0'/0/0` → secp256k1 keypair for smart contract signing
|
||||
- Site passwords: BIP32 `m/74'/1'/0'/{hash}'` → deterministic password derivation (orbit-db-wallet pattern)
|
||||
|
||||
### ConfigService
|
||||
|
||||
Dynamic configuration reload. Wraps `ArcSwap<DynamicConfig>` for minimal deployments, or delegates to SQLite-backed storage for production.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = ConfigMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum ConfigProtocol {
|
||||
#[rpc(tx=oneshot::Sender<ForwardingPolicy>)]
|
||||
#[wrap(GetForwardingPolicy)]
|
||||
GetForwardingPolicy,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<RateLimitConfig>)]
|
||||
#[wrap(GetRateLimits)]
|
||||
GetRateLimits,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(ReloadForwarding)]
|
||||
ReloadForwarding {
|
||||
policy: ForwardingPolicy,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(ReloadRateLimits)]
|
||||
ReloadRateLimits {
|
||||
limits: RateLimitConfig,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### StorageService
|
||||
|
||||
Graph CRUD operations, metagraph management, and honker event bridge. Wraps the `alknet-storage` crate.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = StorageMessage)]
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
enum StorageProtocol {
|
||||
#[rpc(tx=oneshot::Sender<Graph>)]
|
||||
#[wrap(CreateGraph)]
|
||||
CreateGraph {
|
||||
graph_type_id: String,
|
||||
name: String,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Node>)]
|
||||
#[wrap(AddNode)]
|
||||
AddNode {
|
||||
graph_id: String,
|
||||
key: String,
|
||||
attributes: serde_json::Value,
|
||||
},
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Node>)]
|
||||
#[wrap(GetNode)]
|
||||
GetNode {
|
||||
graph_id: String,
|
||||
key: String,
|
||||
},
|
||||
|
||||
#[rpc(tx=mpsc::Sender<StorageEvent>)]
|
||||
#[wrap(Subscribe)]
|
||||
Subscribe {
|
||||
stream_name: String,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
The `Subscribe` variant uses server-streaming irpc — the client sends one request and receives multiple `StorageEvent` messages via `mpsc::Sender`. These are honker stream events projected into integration events.
|
||||
|
||||
## Operation Context and Handler Environment
|
||||
|
||||
The call protocol's `OperationSpec` defines *what* an operation looks like (name, namespace, input/output schemas, access control). But the handler that actually processes the call needs more than just `input` — it needs **context**: who made the call, what other operations it can invoke, and what identity it runs as.
|
||||
|
||||
This is the pattern established in `@alkdev/operations` and needs to map cleanly to the Rust implementation.
|
||||
|
||||
### OperationContext
|
||||
|
||||
Every handler receives an `OperationContext` alongside its input:
|
||||
|
||||
```rust
|
||||
pub struct OperationContext {
|
||||
pub request_id: String,
|
||||
pub parent_request_id: Option<String>,
|
||||
pub identity: Option<Identity>,
|
||||
pub metadata: HashMap<String, Value>,
|
||||
pub env: OperationEnv,
|
||||
pub trusted: bool, // set by buildEnv(), not by callers
|
||||
}
|
||||
|
||||
pub struct Identity {
|
||||
pub id: String,
|
||||
pub scopes: Vec<String>,
|
||||
pub resources: Option<HashMap<String, Vec<String>>>,
|
||||
}
|
||||
```
|
||||
|
||||
Key fields:
|
||||
|
||||
- **`request_id`** / **`parent_request_id`**: Call tracing. A mutation that triggers events carries `parent_request_id` so the call graph can link them.
|
||||
- **`identity`**: The authenticated identity making the call. Populated by the auth service from the call protocol's `call.requested` event. ACL checks use `identity.scopes` and `identity.resources` via the operation's `accessControl`.
|
||||
- **`metadata`**: Arbitrary key-value context. Used for things like trace IDs, correlation headers, or feature flags.
|
||||
- **`env`**: The **operation environment** — namespaced access to call other operations. This is the composition mechanism.
|
||||
- **`trusted`**: Internal flag set by `buildEnv()`. When a handler calls another operation through `env`, the nested call is `trusted` (skips ACL checks). This prevents handlers from having to manage auth scope escalation themselves.
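How a nested call inherits its context can be sketched as follows. This is a simplified illustration under stated assumptions: `Ctx` is a trimmed-down, hypothetical stand-in for `OperationContext` (no `metadata` or `env`), and `nested` is an invented helper name for what `buildEnv()` is described as doing.

```rust
#[derive(Clone, Debug)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Trimmed-down stand-in for OperationContext (assumption: metadata and env omitted).
#[derive(Clone, Debug)]
struct Ctx {
    request_id: String,
    parent_request_id: Option<String>,
    identity: Option<Identity>,
    trusted: bool,
}

impl Ctx {
    /// Builds the context for a call made from inside a handler.
    /// The child links back to its parent for tracing and is marked
    /// trusted, so the nested call can skip ACL checks.
    fn nested(&self, child_request_id: &str) -> Ctx {
        Ctx {
            request_id: child_request_id.to_string(),
            parent_request_id: Some(self.request_id.clone()),
            identity: self.identity.clone(),
            trusted: true,
        }
    }
}
```

Because `trusted` is only ever set inside `nested`, a caller arriving over the wire cannot claim it; the flag reflects where the call originated rather than what the caller asserts.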
|
||||
|
||||
### OperationEnv (the composition mechanism)
|
||||
|
||||
`OperationEnv` provides namespaced access to the operation registry. A handler can call other operations without knowing their transport:
|
||||
|
||||
```rust
|
||||
pub type OperationEnv = HashMap<String, HashMap<String, fn(Value, OperationContext) -> ResponseEnvelope>>;
|
||||
|
||||
// Usage inside a handler:
|
||||
let result = context.env["secrets"]["deriveKey"](derive_input, nested_context);
|
||||
```
|
||||
|
||||
In TypeScript, `buildEnv()` iterates all registered specs (excluding subscriptions), creates closure functions for each, and passes `trusted: true` in the nested context. The Rust equivalent uses irpc service calls:
|
||||
|
||||
```rust
|
||||
// Local: direct function call through handler map
|
||||
// Remote: irpc call to the service that owns that operation
|
||||
```
|
||||
|
||||
This means a handler for `/head/docker/create` can internally call `/head/secrets/derive` to get a key for the container, and the nested call is routed through the same service layer — locally if the secret service is on the same node, remotely via irpc if it's on a different node.
|
||||
|
||||
### Mapping to irpc
|
||||
|
||||
The TypeScript `OperationEnv` pattern maps to irpc as follows:
|
||||
|
||||
| TypeScript | Rust (irpc) |
|
||||
|-----------|-------------|
|
||||
| `context.env.namespace.op(input)` | `client.rpc(ProtocolMessage::OpName { ... }).await?` |
|
||||
| `buildEnv(registry, context)` | `irpc::Client::local(tx)` or `irpc::Client::remote(conn)` |
|
||||
| `registry.execute(id, input, context)` | Service handler dispatch on the enum variant |
|
||||
| `accessControl` check | `enforceAccess()` before handler dispatch |
|
||||
| Subscription handlers (`async function*`) | `mpsc::Sender<T>` streaming response |
|
||||
|
||||
### Call Protocol Events and Context
|
||||
|
||||
The call protocol's `EventEnvelope` carries the context fields:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "call.requested",
|
||||
"id": "uuid-123",
|
||||
"payload": {
|
||||
"operationId": "/head/docker/create",
|
||||
"input": { "image": "nginx", "name": "web" },
|
||||
"identity": { "id": "node-abc", "scopes": ["docker:read", "docker:write"] },
|
||||
"parentRequestId": "uuid-122",
|
||||
"deadline": 1712345678000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `CallHandler` in `call.ts` receives this event, constructs an `OperationContext` from the payload, validates access control, and dispatches to the registered handler. The same pattern applies in Rust — the `buildCallHandler` function creates the context from the event and calls `registry.execute()`.
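The access-control step before dispatch might look like the sketch below. It assumes that an operation lists required scopes and that the caller must hold all of them; that rule, the `AccessError` type, and the `secrets:derive` scope name used in the usage note are illustrative assumptions, not confirmed behaviour of `@alkdev/operations`.

```rust
#[derive(Debug, PartialEq)]
enum AccessError {
    Unauthenticated,
    MissingScope(String),
}

struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Checks a caller against the scopes an operation requires.
/// Trusted (nested) calls bypass the check entirely.
fn enforce_access(
    required_scopes: &[&str],
    identity: Option<&Identity>,
    trusted: bool,
) -> Result<(), AccessError> {
    if trusted {
        return Ok(());
    }
    let identity = identity.ok_or(AccessError::Unauthenticated)?;
    for scope in required_scopes {
        // Assumption: every required scope must be present on the identity.
        if !identity.scopes.iter().any(|s| s.as_str() == *scope) {
            return Err(AccessError::MissingScope((*scope).to_string()));
        }
    }
    Ok(())
}
```

With the identity from the JSON event above (`docker:read`, `docker:write`), a call requiring `docker:write` passes, while one requiring a hypothetical `secrets:derive` scope is rejected unless it arrives as a trusted nested call.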
|
||||
|
||||
### Mutations and Events
|
||||
|
||||
A mutation handler can trigger side effects after the main operation:
|
||||
|
||||
```
|
||||
handler(input, context) {
|
||||
// 1. Perform mutation (e.g., create a node in storage)
|
||||
let result = storage.create_node(...);
|
||||
|
||||
// 2. Trigger side effects (e.g., publish event)
|
||||
// This is an integration event, not a domain event
|
||||
pubsub.publish("call.responded", "", {
|
||||
requestId: context.request_id,
|
||||
output: result,
|
||||
});
|
||||
|
||||
return result;
|
||||
}
|
||||
```
|
||||
|
||||
Following the event boundary discipline: the mutation itself uses honker's `stream_publish` for internal state management (domain event), and the call protocol `call.responded` is the integration event that other nodes/services react to. The handler doesn't publish honker events directly — that's the storage service's internal concern. The handler calls `context.env.storage.addNode()` and the storage service internally publishes to honker before returning.
|
||||
|
||||
### Adapters: MCP and OpenAPI
|
||||
|
||||
The `from_mcp` and `from_openapi` adapters in `@alkdev/operations` demonstrate how external protocols map to the operation model:
|
||||
|
||||
- **MCP**: Each MCP tool becomes a `MUTATION` operation. The handler calls `client.callTool()` and wraps the result in a `ResponseEnvelope` with `source: "mcp"`.
|
||||
- **OpenAPI**: Each HTTP endpoint becomes a `QUERY`, `MUTATION`, or `SUBSCRIPTION` (detected from `text/event-stream` responses). The handler makes HTTP requests and wraps results with `source: "http"`.
|
||||
|
||||
These adapters will need to map to irpc in Rust. The `ResponseEnvelope` pattern (wrapping results with source metadata) carries over directly. The `OpenAPIServiceRegistry` and `MCPClientLoader` patterns become irpc service initializers that register their operations with the call protocol's `OperationRegistry`.
|
||||
|
||||
The key insight: **adapters are just like any other service** — they register operations in the registry and get an `OperationContext` with `env` access. An MCP adapter can call `/head/secrets/derive` just as easily as a local handler can.
|
||||
|
||||
## Service Composition
|
||||
|
||||
### Minimal Deployment (Single Node, CLI)
|
||||
|
||||
All services run locally as tokio actors:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────┐
|
||||
│ Single Process │
|
||||
│ │
|
||||
│ ┌─────────┐ ┌─────────┐ ┌──────────────┐ │
|
||||
│ │ Auth │ │ Secret │ │ Config │ │
|
||||
│ │ Service │ │ Service │ │ Service │ │
|
||||
│ │ (mpsc) │ │ (mpsc) │ │ (mpsc) │ │
|
||||
│ └────┬─────┘ └────┬────┘ └──────┬───────┘ │
|
||||
│ │ │ │ │
|
||||
│ ┌────▼─────────────▼───────────────▼───────┐ │
|
||||
│ │ alknet-core Server │ │
|
||||
│ │ (SSH auth, call protocol, forwarding) │ │
|
||||
│ └──────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
- Auth service uses `ArcSwap<DynamicConfig>` (all keys in memory)
|
||||
- Secret service runs unlocked (seed in memory, no external access)
|
||||
- Config service uses `ArcSwap<DynamicConfig>` directly
|
||||
|
||||
### Production Deployment (Multi-Node)
|
||||
|
||||
Auth and secrets run on dedicated nodes; workers access them remotely:
|
||||
|
||||
```
|
||||
┌────────────────────┐ ┌─────────────────────┐
|
||||
│ Auth Node │ │ Secret Node │
|
||||
│ │ │ │
|
||||
│ AuthProtocol │ │ SecretProtocol │
|
||||
│ (SQLite-backed) │ │ (seed in RAM) │
|
||||
│ │ │ │
|
||||
└────────┬───────────┘ └──────────┬──────────┘
|
||||
│ QUIC (irpc) │ QUIC (irpc)
|
||||
│ │
|
||||
┌────────▼────────────────────────────▼─────────┐
|
||||
│ Head Node │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
|
||||
│ │ Config │ │ Storage │ │ alknet-core │ │
|
||||
│ │ Service │ │ Service │ │ Server │ │
|
||||
│ │ (local) │ │ (local) │ │ │ │
|
||||
│ └──────────┘ └──────────┘ └──────────────┘ │
|
||||
└───────────────────────────────────────────────┘
|
||||
│
|
||||
│ SSH / iroh / TLS
|
||||
│
|
||||
┌────────▼──────────────────────────────────────┐
|
||||
│ Worker Node │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────────┐ │
|
||||
│ │ Storage │ │ alknet-core │ │
|
||||
│ │ Client │ │ Client │ │
|
||||
│ │ (remote) │ │ │ │
|
||||
│ └──────────┘ └──────────────┘ │
|
||||
└───────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Workers don't hold the seed or the auth database. They request derived keys and auth verification via irpc over QUIC.
|
||||
|
||||
## Service and Call Protocol Relationship
|
||||
|
||||
Services are **internal** — they run within a node or cluster. The call protocol is **external** — it's how nodes communicate with each other over SSH/QUIC/WebSocket/DNS transports.
|
||||
|
||||
A service can be exposed as a call protocol operation:
|
||||
|
||||
| Internal Service | Call Protocol Path | Direction |
|
||||
|-----------------|-------------------|-----------|
|
||||
| AuthProtocol::VerifyPubkey | `/head/auth/verify` | Worker → Head |
|
||||
| SecretProtocol::DeriveEd25519 | `/head/secrets/derive` | Worker → Head (restricted) |
|
||||
| StorageProtocol::Subscribe | `/{node}/storage/watch` | Any → Any |
|
||||
| ConfigProtocol::GetForwardingPolicy | `/head/config/forwarding` | Worker → Head |
|
||||
|
||||
External workers call these through the call protocol, which routes to the service on the head node:
|
||||
|
||||
```
|
||||
Worker Head
|
||||
│ │
|
||||
│ call.requested │
|
||||
│ operation: /head/auth/verify │
|
||||
│ payload: { fingerprint, key }│
|
||||
│ ─────────────────────────────►│
|
||||
│ │ ┌─ AuthProtocol::VerifyPubkey ─┐
|
||||
│ │ │ (irpc, local mpsc channel) │
|
||||
│ │ └─ Result: AuthResult ──────────┘
|
||||
│ │
|
||||
│ call.responded │
|
||||
│ payload: { status: "ok" } │
|
||||
│ ◄─────────────────────────────│
|
||||
```
|
||||
|
||||
## Service Integration Example
|
||||
|
||||
A head/worker deployment demonstrates service integration end-to-end:
|
||||
|
||||
- **Head node**: runs Auth, Secret, and Config services locally
|
||||
- **Worker node**: connects to head via alknet call protocol
|
||||
|
||||
The worker-to-head protocol maps to call protocol operations:
|
||||
|
||||
| Worker Message | Call Protocol Path | Service |
|
||||
|----------------|-------------------|---------|
|
||||
| Auth | `/head/auth/verify` | AuthProtocol |
|
||||
| Heartbeat | `/worker/heartbeat` (subscription) | ConfigProtocol |
|
||||
| Task result | `/worker/task/submit` | StorageProtocol (persistence) |
|
||||
| Task assignment | `/head/task/template` (subscription) | StorageProtocol |
|
||||
|
||||
Worker keys are derived from the seed by the secret service. The head node's API credentials are stored encrypted and decrypted on demand by the secret service.
|
||||
|
||||
## Derived Key Conventions
|
||||
|
||||
Standardized SLIP-0010/BIP32 paths (see [storage.md](storage.md) for full table):
|
||||
|
||||
| Path | Purpose | Curve/Algorithm |
|
||||
|------|---------|----------------|
|
||||
| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) |
|
||||
| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 |
|
||||
| `m/74'/0'/1'/0'` | SSH host key | Ed25519 |
|
||||
| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic (like orbit-db-wallet) |
|
||||
| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM |
|
||||
| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 (smart contract) |
|
||||
|
||||
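The `74'` index is the alknet-specific first component of these paths. It sits in the purpose position rather than the SLIP-0044 coin-type position (SLIP-0044 applies under purpose `44'`, where index 74 is registered to ICON), so the alknet paths do not depend on a SLIP-0044 allocation.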
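The path strings in the table can be turned into numeric child indices with a small parser. The sketch below is an illustration, with `parse_path`, `is_fully_hardened`, and `PathError` as hypothetical names; it only parses and does not derive keys. It relies on two facts from the standards: hardened indices have the top bit (`0x80000000`) set, and SLIP-0010 supports only hardened derivation for Ed25519.

```rust
const HARDENED: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
enum PathError {
    MissingRoot,
    BadIndex(String),
    IndexTooLarge(u32),
}

/// Parses "m/74'/0'/0'/0'" into child indices, setting the top bit
/// on components that carry the hardened marker (').
fn parse_path(path: &str) -> Result<Vec<u32>, PathError> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(PathError::MissingRoot);
    }
    parts
        .map(|part| {
            let (digits, hardened) = match part.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (part, false),
            };
            let index: u32 = digits
                .parse()
                .map_err(|_| PathError::BadIndex(part.to_string()))?;
            if index >= HARDENED {
                return Err(PathError::IndexTooLarge(index));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// Ed25519 paths under SLIP-0010 must use hardened components only.
fn is_fully_hardened(indices: &[u32]) -> bool {
    indices.iter().all(|i| (*i & HARDENED) != 0)
}
```

Under this check the `m/74'/...` rows qualify for Ed25519 derivation, while `m/44'/60'/0'/0/0` does not, which is consistent with that row using secp256k1 via BIP32.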
|
||||
|
||||
## Application Services
|
||||
|
||||
Core services (auth, secret, config, storage) are infrastructure that every node needs. Application services are domain-specific and pluggable — they expose operations via the call protocol and are registered dynamically by the node operator.
|
||||
|
||||
### Service Tiers
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ Application Layer │
|
||||
│ DockerService · NodeService · WalletService · GitService│
|
||||
│ ProxyService · ComputeService · AgentService · ... │
|
||||
├─────────────────────────────────────────────────────────┤
|
||||
│ Core Services │
|
||||
│ AuthService · SecretService · ConfigService │
|
||||
│ StorageService │
|
||||
├─────────────────────────────────────────────────────────┤
|
||||
│ alknet-core │
|
||||
│ Transport · Call Protocol · SSH · irpc │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### DockerService
|
||||
|
||||
Container lifecycle management on a node. Wraps the Docker Engine API (via the `bollard` crate, already used in dispatch) and exposes it through the call protocol.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = DockerMessage)]
|
||||
enum DockerProtocol {
|
||||
#[rpc(tx=oneshot::Sender<ContainerInfo>)]
|
||||
#[wrap(CreateContainer)]
|
||||
CreateContainer { image: String, name: Option<String>, env: Vec<(String, String)>, ports: Vec<(u16, u16)> },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<ContainerInfo>)]
|
||||
#[wrap(InspectContainer)]
|
||||
InspectContainer { id: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Vec<ContainerInfo>>)]
|
||||
#[wrap(ListContainers)]
|
||||
ListContainers { all: bool },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(StopContainer)]
|
||||
StopContainer { id: String, timeout: u64 },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(RemoveContainer)]
|
||||
RemoveContainer { id: String, force: bool },
|
||||
|
||||
#[rpc(tx=mpsc::Sender<ContainerEvent>)]
|
||||
#[wrap(StreamEvents)]
|
||||
StreamEvents { filters: Vec<String> },
|
||||
}
|
||||
```
|
||||
|
||||
This makes container management a first-class alknet operation that can be called from any connected node, not just over SSH. The dispatch project's `InstanceProvider` trait pattern maps directly here.
|
||||
|
||||
**Self-hosting use case**: An operator deploys a "server in a box" by connecting a worker node with DockerService registered. A head node (or another authorized node) can then deploy containers remotely via call protocol: `/node/docker/create`, `/node/docker/list`, etc. This replaces manual SSH + docker-compose with automated, auditable, policy-governed deployment.
|
||||
|
||||
### NodeService
|
||||
|
||||
System health, metrics, and tiered observability. Exposes system metrics and supports tiered escalation from small models to larger models to humans.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = NodeMessage)]
|
||||
enum NodeProtocol {
|
||||
#[rpc(tx=oneshot::Sender<SystemMetrics>)]
|
||||
#[wrap(GetMetrics)]
|
||||
GetMetrics { categories: Vec<MetricCategory> },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<HealthStatus>)]
|
||||
#[wrap(HealthCheck)]
|
||||
HealthCheck,
|
||||
|
||||
#[rpc(tx=mpsc::Sender<SystemEvent>)]
|
||||
#[wrap(SubscribeMetrics)]
|
||||
SubscribeMetrics { interval_ms: u64, categories: Vec<MetricCategory> },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(Escalate)]
|
||||
Escalate { severity: Severity, message: String, context: serde_json::Value },
|
||||
}
|
||||
|
||||
#[derive(Serialize, Deserialize)]
|
||||
enum MetricCategory { Cpu, Memory, Disk, Network, Docker, Uptime }
|
||||
|
||||
#[derive(Serialize, Deserialize)]
|
||||
enum Severity { Info, Warning, Critical }
|
||||
```
|
||||
|
||||
**Tiered escalation pattern**: A small model (fast, cheap) subscribes to `/node/metrics/stream` and evaluates simple rules (disk > 90%, memory > 95%, container crashed). When a rule triggers, it calls `/node/alert/escalate` with context. The head node decides whether to notify a larger model or a human.
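The first tier of that pattern amounts to a handful of threshold rules. The sketch below uses the thresholds named above (disk > 90%, memory > 95%, crashed containers); the `MetricsSample` and `Alert` shapes and the severity assigned to each rule are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Severity {
    Info,
    Warning,
    Critical,
}

/// Hypothetical flattened view of one metrics event.
struct MetricsSample {
    disk_used_pct: f64,
    memory_used_pct: f64,
    crashed_containers: u32,
}

#[derive(Debug, PartialEq)]
struct Alert {
    severity: Severity,
    message: String,
}

/// Applies the simple rules; each returned alert would become one
/// Escalate call in the pattern described above.
fn evaluate(sample: &MetricsSample) -> Vec<Alert> {
    let mut alerts = Vec::new();
    if sample.disk_used_pct > 90.0 {
        alerts.push(Alert {
            severity: Severity::Warning,
            message: format!("disk usage at {:.1}%", sample.disk_used_pct),
        });
    }
    if sample.memory_used_pct > 95.0 {
        alerts.push(Alert {
            severity: Severity::Critical,
            message: format!("memory usage at {:.1}%", sample.memory_used_pct),
        });
    }
    if sample.crashed_containers > 0 {
        alerts.push(Alert {
            severity: Severity::Critical,
            message: format!("{} container(s) crashed", sample.crashed_containers),
        });
    }
    alerts
}
```

Keeping the rules as a pure function over a sample makes them cheap to run on every metrics event and easy to test without a live node.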
|
||||
|
||||
### WalletService
|
||||
|
||||
Multichain wallet operations using an HD derivation library (e.g., wagyu). Derives keys from the same master seed via the secret service, signs transactions, and manages addresses.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = WalletMessage)]
|
||||
enum WalletProtocol {
|
||||
#[rpc(tx=oneshot::Sender<AddressInfo>)]
|
||||
#[wrap(GetAddress)]
|
||||
GetAddress { chain: Chain, path: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<BalanceInfo>)]
|
||||
#[wrap(GetBalance)]
|
||||
GetBalance { chain: Chain, address: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<SignedTransaction>)]
|
||||
#[wrap(SignTransaction)]
|
||||
SignTransaction { chain: Chain, path: String, tx_params: serde_json::Value },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<String>)]
|
||||
#[wrap(VerifyAddress)]
|
||||
VerifyAddress { chain: Chain, address: String },
|
||||
}
|
||||
|
||||
#[derive(Serialize, Deserialize)]
|
||||
enum Chain { Bitcoin, Ethereum, Monero, Zcash }
|
||||
```
|
||||
|
||||
The WalletService delegates key derivation to the SecretService via irpc. It never sees the master seed — only derived keypairs for specific paths. This means wallet operations are available to authorized nodes without exposing the full key hierarchy.
|
||||
|
||||
### ProxyService
|
||||
|
||||
Reverse proxy and TLS certificate management. Automates nginx/certbot configuration for services deployed via DockerService.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = ProxyMessage)]
|
||||
enum ProxyProtocol {
|
||||
#[rpc(tx=oneshot::Sender<ProxyConfig>)]
|
||||
#[wrap(GetConfig)]
|
||||
GetConfig,
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(AddRoute)]
|
||||
AddRoute { domain: String, upstream: String, tls: bool },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(RemoveRoute)]
|
||||
RemoveRoute { domain: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<CertificateInfo>)]
|
||||
#[wrap(ProvisionCert)]
|
||||
ProvisionCert { domain: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Vec<CertificateInfo>>)]
|
||||
#[wrap(ListCerts)]
|
||||
ListCerts,
|
||||
}
|
||||
```
|
||||
|
||||
### ComputeService
|
||||
|
||||
Abstracts compute provider APIs (starting with dispatch's `InstanceProvider` pattern). Manages remote instances across providers.
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = ComputeMessage)]
|
||||
enum ComputeProtocol {
|
||||
#[rpc(tx=oneshot::Sender<InstanceInfo>)]
|
||||
#[wrap(CreateInstance)]
|
||||
CreateInstance { provider: String, spec: InstanceSpec },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<Vec<InstanceInfo>>)]
|
||||
#[wrap(ListInstances)]
|
||||
ListInstances { provider: Option<String> },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<()>)]
|
||||
#[wrap(DestroyInstance)]
|
||||
DestroyInstance { id: String },
|
||||
|
||||
#[rpc(tx=oneshot::Sender<InstanceInfo>)]
|
||||
#[wrap(GetInstance)]
|
||||
GetInstance { id: String },
|
||||
}
|
||||
```
|
||||
|
||||
### Registration Pattern
|
||||
|
||||
Application services register with the call protocol's `OperationRegistry` at startup:
|
||||
|
||||
```rust
|
||||
registry.register(
|
||||
OperationSpec { name: "/node/docker/create", namespace: "docker", ... },
|
||||
docker_service.create_container_handler,
|
||||
);
|
||||
registry.register(
|
||||
OperationSpec { name: "/node/metrics/stream", namespace: "node", ... },
|
||||
node_service.subscribe_metrics_handler,
|
||||
);
|
||||
```
|
||||
|
||||
A worker node that exposes Docker and Node services registers those operations when it connects to the head. The head can then route calls from any node to the appropriate worker via the call protocol.
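A registry of this kind can be approximated with a map from operation name to handler. The sketch below is a simplification under stated assumptions: `Registry`, `Handler`, and the string-in/string-out handler signature are hypothetical, and the real `OperationRegistry` would carry an `OperationSpec` and an `OperationContext` rather than plain strings.

```rust
use std::collections::HashMap;

/// Simplified handler: takes serialized input, returns serialized output.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

struct Registry {
    handlers: HashMap<String, Handler>,
}

impl Registry {
    fn new() -> Self {
        Registry { handlers: HashMap::new() }
    }

    /// Registers a handler under an operation path; duplicates are rejected
    /// (an assumption: the source does not say how conflicts are handled).
    fn register(&mut self, name: &str, handler: Handler) -> Result<(), String> {
        if self.handlers.contains_key(name) {
            return Err(format!("operation already registered: {}", name));
        }
        self.handlers.insert(name.to_string(), handler);
        Ok(())
    }

    /// Looks up the operation and runs its handler.
    fn execute(&self, name: &str, input: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(input),
            None => Err(format!("unknown operation: {}", name)),
        }
    }
}
```

Routing by path string keeps the head node agnostic of which service owns an operation; it only needs the name that the worker registered.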
|
||||
|
||||
### Self-Hosting Stack Example
|
||||
|
||||
A minimal self-hosted server with all services:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ Head Node │
|
||||
│ │
|
||||
│ Core: Auth · Secret · Config · Storage │
|
||||
│ App: Docker · Node · Proxy · Git · Wallet · Compute │
|
||||
│ │
|
||||
│ Call protocol paths: │
|
||||
│ /head/auth/* │
|
||||
│ /head/docker/* │
|
||||
│ /head/proxy/* │
|
||||
│ /head/wallet/* │
|
||||
│ /head/compute/* │
|
||||
│ /head/node/metrics/* │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
An operator deploys this by:
|
||||
1. Running `alknet serve --config stack.toml`
|
||||
2. Entering their seed phrase once (unlocks the secret service)
|
||||
3. All services come online with keys derived from the seed
|
||||
4. Docker containers for Gitea, Postgres, Redis, etc. are managed via DockerService
|
||||
5. Reverse proxy and TLS are automated via ProxyService
|
||||
6. Wallet keys are derived on demand via WalletService
|
||||
|
||||
No manual SSH, no hardcoded credentials, no separate secret management. The seed phrase is the single root of trust.
|
||||
|
||||
## Crate Structure
|
||||
|
||||
```
|
||||
alknet-core/
|
||||
├── transport/ — Transport trait, TCP, TLS, iroh, DNS
|
||||
├── call/ — Call protocol, PendingRequestMap, OperationRegistry
|
||||
├── auth/ — AuthService protocol, identity types
|
||||
├── secrets/ — SecretService protocol, BIP39, SLIP-0010, AES-GCM
|
||||
├── config/ — ConfigService protocol, StaticConfig, DynamicConfig
|
||||
├── handler/ — ServerHandler, SSH authentication hooks
|
||||
└── serve.rs — Server::run(), multi-transport listeners
|
||||
|
||||
alknet-storage/
|
||||
├── metagraph/ — GraphType, NodeType, EdgeType persistence
|
||||
├── identity/ — accounts, organizations, peer_credentials, api_keys
|
||||
├── acl/ — PrincipalNode, DelegatesEdge, access control graph
|
||||
├── secrets/ — Encrypted node type, encrypt/decrypt, key derivation bridge
|
||||
├── honker/ — honker integration: notify, stream, queue
|
||||
├── graph/ — GraphInstance, Node, Edge CRUD with schema validation
|
||||
└── schema/ — JSON Schema definitions (serde + jsonschema)
|
||||
```
|
||||
|
||||
## Security Considerations
|
||||
|
||||
1. **Seed phrase is never persisted** — it's entered at startup or via `Unlock` call and held only in RAM
|
||||
2. **Derived keys are cached in memory** — cleared on `Lock`
|
||||
3. **External credentials are encrypted at rest** — the encryption key is itself derived from the seed
|
||||
4. **Auth service never sees the seed** — it only sees public key fingerprints and verification results
|
||||
5. **irpc remote communication is over QUIC** — encrypted in transit; irpc doesn't add its own encryption layer (assumes the transport provides it)
|
||||
6. **Lock wipes all secrets** — a locked secret service returns errors for all requests until unlocked
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one per tenant or identity)?
|
||||
|
||||
The simplest approach is one seed per node. Multi-seed support (e.g., one per tenant in a multi-tenant system) can be added later by indexing the `Unlock` call with a tenant ID. Defer for now.
|
||||
|
||||
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for remote calls?
|
||||
|
||||
irpc defaults to postcard for efficiency. However, the call protocol uses JSON `EventEnvelope` for cross-language compatibility. Service-to-service calls should use postcard (Rust-to-Rust), while node-to-node calls use JSON (call protocol). The irpc remote path naturally uses postcard.
|
||||
|
||||
- **OQ-SVC-03**: How does the secret service integrate with the existing `EncryptedDataSchema` from `@alkdev/storage`?
|
||||
|
||||
The TypeScript `encrypt()`/`decrypt()` functions use PBKDF2 with a password. In Rust, the secret service replaces the password with a derived AES-256-GCM key. The `EncryptedData` schema (key_version, salt, iv, data) stays the same, but key derivation changes from PBKDF2(password) to SLIP-0010(seed, path). This is a superset — the old format can be migrated by re-encrypting with the new key.
|
||||
|
||||
- **OQ-SVC-04**: Should workers cache derived keys locally?
|
||||
|
||||
Yes, with a TTL. A worker that holds a derived Ed25519 keypair for its session can re-authenticate without calling the secret service every time. The TTL should be configurable (default: 1 hour). The head can revoke by invalidating the session, not by expiring the key.
|
||||
|
||||
- **OQ-SVC-05**: How does the smart contract (NFT-based ACL) interact with the secret service?
|
||||
|
||||
The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from the same seed. The secret service can sign transactions on behalf of the node. The smart contract is a separate concern — it's the external source of truth for identity registration. The local ACL graph (in `alknet-storage`) is a cache that's synced from the contract, not the other way around.
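For OQ-SVC-04, a worker-side cache with a TTL could be as small as the sketch below. `KeyCache` and `CachedKey` are hypothetical names, and the current time is passed in explicitly so that expiry can be exercised without sleeping; the one-hour default mentioned above would simply be the `ttl` passed to `new`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct CachedKey {
    key: Vec<u8>,
    stored_at: Instant,
}

/// Hypothetical worker-side cache of derived keys, keyed by derivation path.
struct KeyCache {
    ttl: Duration,
    entries: HashMap<String, CachedKey>,
}

impl KeyCache {
    fn new(ttl: Duration) -> Self {
        KeyCache { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, path: &str, key: Vec<u8>, now: Instant) {
        self.entries.insert(path.to_string(), CachedKey { key, stored_at: now });
    }

    /// Returns the key if it is still fresh; evicts it once the TTL has passed.
    fn get(&mut self, path: &str, now: Instant) -> Option<Vec<u8>> {
        let expired = match self.entries.get(path) {
            Some(entry) => now.duration_since(entry.stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(path);
            None
        } else {
            self.entries.get(path).map(|entry| entry.key.clone())
        }
    }
}
```

On a miss the worker would fall back to the secret service over irpc and re-insert the result. Session revocation by the head is a separate mechanism and is not modelled here.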
|
||||
|
||||
## References
|
||||
|
||||
- [core.md](core.md) — Core overview, transport, call protocol, head/worker model
|
||||
- [configuration.md](configuration.md) — Config architecture, auth service, DynamicConfig
|
||||
- [storage.md](storage.md) — Metagraph, identity, ACL, secrets, event boundaries
|
||||
- [flow.md](flow.md) — Operation graph, call graph, petgraph mapping
|
||||
- `/workspace/@alkdev/storage/docs/architecture/encrypted-data.md` — Original encrypted data design (TypeScript)
|
||||
- `/workspace/research/event_sourcing/event_source_types.md` — Event-driven architecture patterns
|
||||
- irpc crate — https://docs.rs/irpc — Service protocol definitions, local/remote abstraction
|
||||
- SLIP-0010 — https://github.com/satoshilabs/slips/blob/master/slip-0010.md — HD key derivation for Ed25519
|
||||
- BIP39 — https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki — Mnemonic code for generating deterministic keys (widely used beyond cryptocurrency)
|
||||
- `ed25519-bip32` crate — https://docs.rs/ed25519-bip32 — BIP32-Ed25519 (Cardano/IOHK approach)
|
||||
- `bip39` crate — https://docs.rs/bip39 — Mnemonic generation and seed derivation
|
||||
@@ -1,460 +0,0 @@
|
||||
# Alknet Storage: Metagraph, Identity, ACL, Secrets, and Honker Integration
|
||||
|
||||
> Status: Research / Draft
|
||||
> Last updated: 2026-06-06
|
||||
|
||||
## Overview
|
||||
|
||||
`alknet-storage` is a Rust crate providing SQLite-backed graph storage, identity management, access control, secrets management, and reactivity via honker. It mirrors the TypeScript `@alkdev/storage` package's design (`sqlite-host.md`, `metagraph-module.md`, `acl.md`) while leveraging Rust's type system and petgraph's performance.
|
||||
|
||||
## Terminology
|
||||
|
||||
This document uses **head/worker** terminology instead of hub/spoke:
|
||||
- **Head node**: Coordinating node that can also be a worker
|
||||
- **Worker node**: Node that connects to a head and registers services
|
||||
- **Node**: Any participant in the network
|
||||
|
||||
## Crate Decomposition
|
||||
|
||||
```
|
||||
alknet-storage
|
||||
├── metagraph/ — GraphType, NodeType, EdgeType definitions and persistence
|
||||
├── identity/ — accounts, organizations, peer_credentials, api_keys, audit_logs
|
||||
├── acl/ — PrincipalNode, DelegatesEdge, access control graph
|
||||
├── secrets/ — HD key derivation (BIP39/SLIP-0010), encrypted data, secret service bridge
|
||||
├── honker/ — honker integration: notify, stream, queue, event bridge
|
||||
├── graph/ — GraphInstance, Node, Edge CRUD with schema validation
|
||||
└── schema/ — JSON Schema definitions (serde + jsonschema for runtime validation)
|
||||
```
|
||||
|
||||
## Metagraph Data Model
|
||||
|
||||
The metagraph is a three-level type system (mirrors `@alkdev/storage` exactly):
|
||||
|
||||
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl", "task-dependencies"). Defines structural constraints (directed/undirected/mixed, allows self-loops, multi-edges).
|
||||
2. **NodeType** — A category of node within a graph type (e.g., "call", "account", "task"). Each node type has a JSON Schema that validates the `attributes` of nodes belonging to that type.
|
||||
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered", "can_read", "depends_on"). Each edge type has a JSON Schema for its attributes. Optionally constrains which source/target node types are valid.
|
||||
|
||||
**Graph instances** belong to a graph type and contain **Nodes** and **Edges** conforming to those type definitions.
|
||||
|
||||
### Rust Types
|
||||
|
||||
```rust
pub struct GraphType {
    pub id: String,
    pub name: String,                  // "call-graph", "acl"
    pub description: String,
    pub config: GraphConfig,           // directed/undirected/mixed, multi, self-loops
    pub version: u32,
    pub scope: Scope,                  // System, Tenant, User
    pub metadata: serde_json::Value,
}

pub struct GraphConfig {
    pub graph_type: GraphDirection,    // Directed, Undirected, Mixed
    pub multi: bool,
    pub allow_self_loops: bool,
}

// Edge direction policy referenced by GraphConfig.
pub enum GraphDirection {
    Directed,
    Undirected,
    Mixed,
}

pub enum Scope {
    System,
    Tenant,
    User,
}

pub struct NodeType {
    pub id: String,
    pub graph_type_id: String,
    pub name: String,                  // "call", "account"
    pub description: String,
    pub schema: serde_json::Value,     // JSON Schema for node attributes
}

pub struct EdgeType {
    pub id: String,
    pub graph_type_id: String,
    pub name: String,                  // "triggered", "can_read"
    pub description: String,
    pub schema: serde_json::Value,     // JSON Schema for edge attributes
    pub allowed_source_types: Vec<String>, // [] = no restriction
    pub allowed_target_types: Vec<String>, // [] = no restriction
}

pub struct Graph {
    pub id: String,
    pub graph_type_id: String,
    pub name: String,
    pub description: String,
    pub status: GraphStatus,           // Active, Archived, Draft
    pub owner_id: Option<String>,
    pub project_id: Option<String>,
    pub metadata: serde_json::Value,
}

pub enum GraphStatus {
    Active,
    Archived,
    Draft,
}

pub struct Node {
    pub id: String,
    pub graph_id: String,
    pub key: String,                   // Consumer-defined identity within the graph
    pub attributes: serde_json::Value, // Validated by node type schema
    pub metadata: serde_json::Value,
}

pub struct Edge {
    pub id: String,
    pub graph_id: String,
    pub key: Option<String>,           // None for anonymous edges
    pub source_node_key: String,
    pub target_node_key: String,
    pub attributes: serde_json::Value, // Validated by edge type schema
    pub undirected: bool,
    pub metadata: serde_json::Value,
}
```

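The `allowed_source_types` / `allowed_target_types` rule (empty list means no restriction) could be checked roughly as follows. This is a minimal sketch, not the crate's actual API: `EdgeTypeRule`, `endpoint_allowed`, and `edge_permitted` are hypothetical names, and node type names are passed in as plain strings because the node type is resolved at the application layer.

```rust
/// Hypothetical, trimmed-down view of an EdgeType: only the endpoint constraints.
struct EdgeTypeRule {
    allowed_source_types: Vec<String>,
    allowed_target_types: Vec<String>,
}

/// An empty allow-list means "no restriction"; otherwise the type must be listed.
fn endpoint_allowed(allowed: &[String], node_type: &str) -> bool {
    allowed.is_empty() || allowed.iter().any(|t| t == node_type)
}

/// Both endpoints must pass their respective allow-lists.
fn edge_permitted(rule: &EdgeTypeRule, source_type: &str, target_type: &str) -> bool {
    endpoint_allowed(&rule.allowed_source_types, source_type)
        && endpoint_allowed(&rule.allowed_target_types, target_type)
}

fn main() {
    // "can_read" edges may only start at a principal; any target is accepted.
    let can_read = EdgeTypeRule {
        allowed_source_types: vec!["principal".to_string()],
        allowed_target_types: vec![],
    };
    println!("{}", edge_permitted(&can_read, "principal", "resource"));
    println!("{}", edge_permitted(&can_read, "resource", "resource"));
}
```
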
### SQLite Tables (mirrors `sqlite-host.md`)

Common columns on all tables: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`, `created_at INTEGER TIMESTAMP DEFAULT (strftime('%s','now'))`, `updated_at INTEGER TIMESTAMP DEFAULT (strftime('%s','now'))`.

**graph_types**: `id`, `name TEXT UNIQUE`, `description TEXT DEFAULT ''`, `config TEXT JSON NOT NULL`, `version INTEGER NOT NULL DEFAULT 1`, `scope TEXT NOT NULL DEFAULT 'system'`

**node_types**: `id`, `graph_type_id TEXT FK → graph_types.id CASCADE`, `name TEXT NOT NULL`, `description TEXT DEFAULT ''`, `schema TEXT JSON NOT NULL`. Unique constraint: `(graph_type_id, name)`.

**edge_types**: `id`, `graph_type_id TEXT FK → graph_types.id CASCADE`, `name TEXT NOT NULL`, `description TEXT DEFAULT ''`, `schema TEXT JSON NOT NULL`, `allowed_source_types TEXT JSON DEFAULT '[]'`, `allowed_target_types TEXT JSON DEFAULT '[]'`. Unique constraint: `(graph_type_id, name)`.

**graphs**: `id`, `graph_type_id TEXT FK → graph_types.id SET NULL`, `name TEXT NOT NULL`, `description TEXT DEFAULT ''`, `status TEXT NOT NULL DEFAULT 'draft'`, `owner_id TEXT`, `project_id TEXT`. Indexes on `(owner_id)`, `(project_id)`, `(owner_id, project_id)`.

**nodes**: `id`, `graph_id TEXT FK → graphs.id CASCADE`, `key TEXT NOT NULL`, `attributes TEXT JSON NOT NULL DEFAULT '{}'`. Unique constraint: `(graph_id, key)`. No `node_type_id` column (ADR-020).

**edges**: `id`, `graph_id TEXT FK → graphs.id CASCADE`, `key TEXT`, `source_node_key TEXT NOT NULL`, `target_node_key TEXT NOT NULL`, `attributes TEXT JSON NOT NULL DEFAULT '{}'`, `undirected INTEGER DEFAULT 0`. Unique constraint: `(graph_id, key)`. Composite FKs: `(graph_id, source_node_key)` and `(graph_id, target_node_key)` each reference `nodes(graph_id, key)` with CASCADE delete (ADR-022).

### System DB vs Tenant DB (ADR-040)

- **System DB** (`system.db`): Identity tables (accounts, organizations, peer_credentials, api_keys, audit_logs) + system-scoped graph types.
- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables (graph_types, node_types, edge_types, graphs, nodes, edges) + tenant-scoped graph types.

No FK constraints across database files. Consumer enforces referential integrity at application layer.

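The file naming convention above can be captured in a small resolver. The sketch below is illustrative only: `DbTarget` and `db_path` are hypothetical names, and the restriction of `org_id` to ASCII alphanumerics, `-`, and `_` is an added assumption (a guard against path traversal), not something the ADR states.

```rust
use std::path::PathBuf;

/// Hypothetical selector for which database file an operation targets.
enum DbTarget<'a> {
    System,
    Tenant { org_id: &'a str },
}

/// Resolve the `.db` file for a target, following `system.db` / `tenant-{orgId}.db`.
/// Returns None when the org id is empty or contains unexpected characters
/// (an assumed safety rule, so an id like "../x" cannot escape the data directory).
fn db_path(data_dir: &str, target: &DbTarget<'_>) -> Option<PathBuf> {
    let file = match target {
        DbTarget::System => "system.db".to_string(),
        DbTarget::Tenant { org_id } => {
            let safe = !org_id.is_empty()
                && org_id
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
            if !safe {
                return None;
            }
            format!("tenant-{org_id}.db")
        }
    };
    Some(PathBuf::from(data_dir).join(file))
}

fn main() {
    println!("{:?}", db_path("/data", &DbTarget::System));
    println!("{:?}", db_path("/data", &DbTarget::Tenant { org_id: "acme" }));
}
```
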
## Identity Tables

Mirrors `sqlite-host.md` identity tables with the same column definitions and FK cascades:

**accounts**: `email TEXT UNIQUE NOT NULL`, `display_name TEXT`, `access_level TEXT NOT NULL DEFAULT 'user'` (admin/user/service), `status TEXT NOT NULL DEFAULT 'active'` (active/suspended/deactivated).

**organizations**: `name TEXT UNIQUE NOT NULL`, `slug TEXT UNIQUE NOT NULL`, `owner_id TEXT FK → accounts.id RESTRICT`.

**organization_members**: `org_id TEXT FK → organizations.id CASCADE`, `account_id TEXT FK → accounts.id CASCADE`, `membership_level TEXT NOT NULL` (owner/admin/member). Unique constraint: `(org_id, account_id)`.

**api_keys**: `owner_id TEXT FK → accounts.id CASCADE`, `key_hash TEXT UNIQUE NOT NULL`, `name TEXT`, `enabled INTEGER NOT NULL DEFAULT 1`, `expires_at INTEGER TIMESTAMP`, `revoked_at INTEGER TIMESTAMP`, `rotated_to_id TEXT`, `last_used_at INTEGER TIMESTAMP`.

**peer_credentials**: `owner_id TEXT FK → accounts.id CASCADE`, `credential_type TEXT NOT NULL` (ssh_key/cert_authority), `fingerprint TEXT UNIQUE NOT NULL`, `public_key_data TEXT NOT NULL`, `name TEXT`, `enabled INTEGER NOT NULL DEFAULT 1`, `expires_at INTEGER TIMESTAMP`, `revoked_at INTEGER TIMESTAMP`.

**audit_logs**: `action TEXT NOT NULL`, `owner_id TEXT FK → accounts.id RESTRICT`, `credential_id TEXT`, `credential_type TEXT`, `org_id TEXT FK → organizations.id SET NULL`, `details TEXT JSON`.

## Access Control (ACL) as Metagraph

Mirrors `acl.md` in `@alkdev/storage`:

### AclGraph Module

```rust
use std::collections::HashMap;

// Graph config: directed, multi = false, allow_self_loops = false
pub const ACL_GRAPH_CONFIG: GraphConfig = GraphConfig {
    graph_type: GraphDirection::Directed,
    multi: false,
    allow_self_loops: false,
};

// Node types
pub const PRINCIPAL_NODE: &str = "principal";
pub const RESOURCE_NODE: &str = "resource";

// Edge types
pub const CAN_READ_EDGE: &str = "can_read";
pub const CAN_WRITE_EDGE: &str = "can_write";
pub const CAN_EXECUTE_EDGE: &str = "can_execute";
pub const BELONGS_TO_EDGE: &str = "belongs_to";
pub const DELEGATES_EDGE: &str = "delegates";

// PrincipalNode attributes
pub struct PrincipalNodeAttrs {
    pub identity_type: IdentityType, // Account, Org, Service, Role
    pub identity_id: String,         // FK to accounts.id or organizations.id
    pub scopes: Vec<String>,
    pub resources: Option<HashMap<String, Vec<String>>>,
}

pub enum IdentityType {
    Account,
    Org,
    Service,
    Role,
}

// DelegatesEdge attributes
pub struct DelegatesEdgeAttrs {
    pub narrowed_scopes: Vec<String>, // Subset of delegator's scopes
    pub narrowable: bool,             // Can the delegate further narrow?
}
```

### Principal-Agent Hierarchy

- **Account** nodes represent individual users
- **Org** nodes represent organizations
- **Service** nodes represent automated agents (LLM workers, node credentials)
- **Role** nodes represent named permission sets

Delegation edges (`delegates`) carry `narrowed_scopes` — the delegate can only exercise scopes that are a subset of the delegator's. Liability flows upward; permissions flow downward with narrowing.

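The subset rule for delegation can be illustrated with plain sets. This is a minimal sketch under stated assumptions: `delegate` and `effective_scopes` are hypothetical helpers, scopes are modelled as strings in a `BTreeSet`, and a chain is treated as invalid as soon as any hop asks for a scope its delegator does not hold.

```rust
use std::collections::BTreeSet;

type Scopes = BTreeSet<String>;

/// Convenience constructor for examples.
fn scopes(items: &[&str]) -> Scopes {
    items.iter().map(|s| s.to_string()).collect()
}

/// One delegation hop: valid only if `narrowed` is a subset of what the delegator holds.
fn delegate(delegator: &Scopes, narrowed: &Scopes) -> Option<Scopes> {
    if narrowed.is_subset(delegator) {
        Some(narrowed.clone())
    } else {
        None
    }
}

/// Walk a chain of `narrowed_scopes` from the root principal downward.
/// Returns the scopes held at the end of the chain, or None if any hop widens.
fn effective_scopes(root: &Scopes, chain: &[Scopes]) -> Option<Scopes> {
    chain
        .iter()
        .try_fold(root.clone(), |held, narrowed| delegate(&held, narrowed))
}

fn main() {
    let root = scopes(&["read", "write", "execute"]);
    let chain = vec![scopes(&["read", "write"]), scopes(&["read"])];
    println!("{:?}", effective_scopes(&root, &chain));
}
```

Because each hop must be a subset of the previous one, permissions can only shrink as they flow down the chain.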
### BelongsToEdge (Derived from org_members)

ADR-045: The `organization_members` SQL table is the authoritative source. When membership changes, the consumer writes the SQL row first, then creates or removes the ACL `belongs_to` edge. The edge is derived, not the source of truth.

### Operation-Level ACL

`OperationSpec.access_control` maps to ACL graph traversal at runtime:

```rust
pub fn check_access(
    acl_graph: &Graph,
    principal_key: &str,
    operation_spec: &OperationSpec,
) -> bool {
    // Traverse from PrincipalNode to ResourceNode
    // Check if any path satisfies required_scopes (AND) and required_scopes_any (OR)
    // Honor delegation chains with scope narrowing
    todo!("traversal is specified above but not yet implemented")
}
```

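The scope test applied at the end of a traversal might look like the sketch below. It covers only the AND/OR evaluation, not the graph walk. `AccessControl` is a hypothetical stand-in for `OperationSpec.access_control` (whose real shape is not shown here), and treating an empty `required_scopes_any` as "no constraint" is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the access-control part of an OperationSpec.
struct AccessControl {
    required_scopes: Vec<String>,     // all must be held (AND)
    required_scopes_any: Vec<String>, // at least one must be held (OR)
}

/// True when `held` satisfies both requirement lists.
/// Assumption: an empty `required_scopes_any` imposes no constraint.
fn scopes_satisfy(held: &HashSet<String>, ac: &AccessControl) -> bool {
    let all_ok = ac.required_scopes.iter().all(|s| held.contains(s));
    let any_ok = ac.required_scopes_any.is_empty()
        || ac.required_scopes_any.iter().any(|s| held.contains(s));
    all_ok && any_ok
}

fn main() {
    let held: HashSet<String> = ["graph:read", "graph:write"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let ac = AccessControl {
        required_scopes: vec!["graph:read".to_string()],
        required_scopes_any: vec!["graph:write".to_string(), "graph:admin".to_string()],
    };
    println!("{}", scopes_satisfy(&held, &ac));
}
```
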
## Honker Integration

### Reactivity Pattern (ADR-047)

Every mutation is atomic with a notification:

```rust
// Insert a node and notify in one transaction
tx.execute(
    "INSERT INTO nodes (id, graph_id, key, attributes) VALUES (?, ?, ?, ?)",
    &[&node_id, &graph_id, &key, &attrs_json],
)?;
tx.stream_publish("nodes:created", &node_attrs_json)?;
```

This mirrors the TypeScript pattern from `sqlite-host.md` but in Rust, using honker's SQLite extension functions:

```rust
use honker::Database;

let db = Database::open("tenant.db")?;

// Transactional: business write + event stream publish commit together
let mut tx = db.transaction()?;
tx.execute("INSERT INTO nodes (id, graph_id, key, attributes) VALUES (?, ?, ?, ?)", ...)?;
tx.stream_publish("nodes:created", &attrs)?;
tx.commit()?;

// Subscribe to changes.
// Pseudocode: assumes `subscribe` returns an async stream of events
// (for example, something implementing `futures::Stream`).
let stream = db.stream("nodes:created");
let mut events = stream.subscribe("alknet-node-watcher");
while let Some(event) = events.next().await {
    // event is a serde_json::Value
}
```

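The "write and event commit together" guarantee can be modelled without honker at all. The following is a toy in-memory sketch, not honker's API: `Store`, `Tx`, `insert`, `publish`, and `commit` are all hypothetical. It only demonstrates the invariant that a row and its event become visible together, and that dropping an uncommitted transaction discards both.

```rust
/// Toy stand-in for a database: committed rows plus committed stream events.
#[derive(Default)]
struct Store {
    rows: Vec<String>,
    events: Vec<(String, String)>, // (topic, payload)
}

/// Buffers writes and events until `commit`; dropping it acts as a rollback.
struct Tx<'a> {
    store: &'a mut Store,
    rows: Vec<String>,
    events: Vec<(String, String)>,
}

impl Store {
    fn transaction(&mut self) -> Tx<'_> {
        Tx { store: self, rows: Vec::new(), events: Vec::new() }
    }
}

impl<'a> Tx<'a> {
    fn insert(&mut self, row: &str) {
        self.rows.push(row.to_string());
    }

    fn publish(&mut self, topic: &str, payload: &str) {
        self.events.push((topic.to_string(), payload.to_string()));
    }

    /// Apply buffered rows and events in one step.
    fn commit(self) {
        let Tx { store, rows, events } = self;
        store.rows.extend(rows);
        store.events.extend(events);
    }
}

fn main() {
    let mut store = Store::default();
    let mut tx = store.transaction();
    tx.insert("node-1");
    tx.publish("nodes:created", "{\"key\":\"node-1\"}");
    tx.commit();
    println!("{} row(s), {} event(s)", store.rows.len(), store.events.len());
}
```
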
### Honker Features Used

| Feature | Use case |
|---------|----------|
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes with per-consumer offsets |
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
| `queue` / `claim` / `ack` | Task queue for async operations (key rotation, ACL evaluation) |
| `scheduler` | Periodic tasks (session cleanup, audit log pruning) |

### Database Concurrency

- WAL mode (default) for concurrent reads during writes
- Single writer per `.db` file
- `busy_timeout=5000` default
- `PRAGMA data_version` polling for cross-process wake (honker pattern)
- `max_readers=4` concurrent read connections in the reader pool

## JSON Schema Validation

TypeBox from TypeScript maps to `serde_json::Value` + `jsonschema` in Rust:

| TypeScript | Rust |
|-----------|------|
| `Type.Object({...})` | `serde_json::json!({...})` as JSON Schema |
| `Value.Check(schema, data)` | `jsonschema::validate(&schema, &data)` |
| `Type.Module({...})` | JSON Schema with `$defs` stored in DB |
| `Type.Composite([A, B])` | Merge + intersect via `serde_json` merge logic |

The `jsonschema` crate provides runtime validation analogous to TypeBox's `Value.Check()`. Schema definitions are stored as `serde_json::Value` in the `schema` column of `node_types` and `edge_types` tables.

## Crate Dependency Map

```toml
[dependencies]
honker = "0.x"           # SQLite extension with pub/sub/queue
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"       # JSON Schema validation (runtime)
petgraph = "0.x"         # Graph data structure (shared with alknet-flowgraph)
rusqlite = { version = "0.x", features = ["bundled"] }  # SQLite access (via honker)
uuid = { version = "1", features = ["v4"] }
chrono = "0.x"
thiserror = "1"
tokio = { version = "1", features = ["full"] }
```

## Multi-Tenant Replication Path

For the private use case: single `.db` files, honker for reactivity, no cross-database FK constraints.

For the distributed use case (later):

1. **Smart contracts** (Base L2) own namespace identity → `owner_id` column on the `graphs` table
2. **alknet-relay** gossips namespace availability via iroh-gossip or call protocol subscriptions
3. **ACL inference** — Contract `collaborators` → ACL graph `DelegatesEdge` entries
4. **Honker streams** — `stream_subscribe("nodes:modified")` carries mutations to relay subscribers

Replication mindset from the start: **every write is atomic with a notification**. The honker stream event is the replication unit. A future replicator reads `_honker_stream_*` tables and propagates changes to subscribed relays.

### Event Boundary Discipline

Following [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md), honker streams serve different roles in different contexts. Preventing conflation is critical:

| Event Type | Source | Consumer | Boundary |
|-----------|--------|----------|----------|
| **Domain events** (Event Sourcing) | Service that owns the data | Same service, for state reconstruction | Internal — never published directly to other services |
| **Integration events** (State Transfer) | Projected from domain events | Other services/nodes, for cache updates | Cross-service — simple, versioned, stripped of internals |
| **Notifications** (Thin Events) | Service that owns the data | Any subscriber, for triggering workflows | Cross-node — just entity ID + action, consumer fetches details |

Conflation anti-patterns to avoid:

- **Leaky event store**: Don't let other services read honker stream events directly to drive business logic. Project domain events into integration events first.
- **Boomerang coupling**: If a consumer of an integration event must call back to the source service synchronously, the event payload is too thin. Upgrade to a fat event.
- **Fat notification trap**: If a notification event carries the full entity state, use state transfer instead.

The call protocol's `EventEnvelope` is the **integration boundary** between nodes. Domain events in honker streams stay within the service that owns them.

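The three event roles can be kept apart in the type system. The sketch below is one possible shape, with every type and field name hypothetical (`DomainEvent`, `IntegrationEvent`, `Notification`, `internal_rowid`, and the `graph_id/key` entity id format are illustrative, not taken from the call protocol): a domain event is projected into a versioned integration event with internals stripped, or into a thin notification carrying only an id and an action.

```rust
/// Internal event: may carry storage details that must not leave the service.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum DomainEvent {
    NodeCreated { graph_id: String, key: String, attributes_json: String, internal_rowid: i64 },
    NodeDeleted { graph_id: String, key: String, internal_rowid: i64 },
}

/// Cross-service event: versioned, no internal fields.
#[derive(Debug, PartialEq)]
struct IntegrationEvent {
    version: u32,
    kind: &'static str,
    graph_id: String,
    key: String,
    attributes_json: Option<String>,
}

/// Thin event: entity id plus action; the consumer fetches details itself.
#[derive(Debug, PartialEq)]
struct Notification {
    entity_id: String,
    action: &'static str,
}

/// Projection for state transfer: copies public state, drops `internal_rowid`.
fn to_integration(event: &DomainEvent) -> IntegrationEvent {
    match event {
        DomainEvent::NodeCreated { graph_id, key, attributes_json, .. } => IntegrationEvent {
            version: 1,
            kind: "node.created",
            graph_id: graph_id.clone(),
            key: key.clone(),
            attributes_json: Some(attributes_json.clone()),
        },
        DomainEvent::NodeDeleted { graph_id, key, .. } => IntegrationEvent {
            version: 1,
            kind: "node.deleted",
            graph_id: graph_id.clone(),
            key: key.clone(),
            attributes_json: None,
        },
    }
}

/// Projection for notifications: no entity state at all.
fn to_notification(event: &DomainEvent) -> Notification {
    match event {
        DomainEvent::NodeCreated { graph_id, key, .. } => Notification {
            entity_id: format!("{graph_id}/{key}"),
            action: "created",
        },
        DomainEvent::NodeDeleted { graph_id, key, .. } => Notification {
            entity_id: format!("{graph_id}/{key}"),
            action: "deleted",
        },
    }
}

fn main() {
    let event = DomainEvent::NodeCreated {
        graph_id: "g1".into(),
        key: "n1".into(),
        attributes_json: "{}".into(),
        internal_rowid: 42,
    };
    println!("{:?}", to_integration(&event));
    println!("{:?}", to_notification(&event));
}
```
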
## Secrets and HD Key Derivation

### Key Categories

Different categories of secrets require different storage and derivation strategies:

| Category | Example | Derived from seed? | Storage |
|-----------|---------|-------------------|---------|
| **Identity keys** | Ed25519 keypair for alknet auth | Yes — SLIP-0010 `m/74'/0'/0'/0'` | Only derivation path in DB |
| **Encryption keys** | AES-256-GCM key for encrypted nodes | Yes — SLIP-0010 `m/74'/2'/0'/0'` | Only derivation path in DB |
| **External credentials** | OpenAI API key, OAuth token | No — third-party issued | Encrypted in DB with derived key |
| **On-chain identity** | Ethereum key for contract signing | Yes — SLIP-0010 `m/44'/60'/0'/0/0` | Only derivation path in DB |
| **Service registration** | NFT token ID, replicator endpoint | No — on-chain data | Plain in DB or on-chain |

### BIP39 Seed Phrase as Root of Trust

The master seed phrase (BIP39 mnemonic) is the single recovery mechanism for the entire system. From one seed phrase, all self-generated secrets can be derived on demand:

```rust
// Illustrative pseudocode: exact types and method names depend on the
// BIP39 / SLIP-0010 crates chosen (see "Rust Crates Required").

// Seed phrase → master seed (BIP39)
let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
let seed = mnemonic.to_seed(Some(&passphrase));

// Master seed → SLIP-0010 Ed25519 master key
let master_key = ExtendedPrivKey::new_master(Network::Alknet, &seed)?;

// Derive identity keypair
let identity_key = master_key.derive_path("m/74'/0'/0'/0'")?;

// Derive encryption key material (use first 32 bytes of derived key as AES-256 key)
let encryption_key = master_key.derive_path("m/74'/2'/0'/0'")?;

// Derive Ethereum signing key (for smart contract interactions).
// Note: SLIP-0010 Ed25519 supports hardened children only, so this path
// (non-hardened `0/0` tail) needs a secp256k1 master key derived from the
// same seed rather than the Ed25519 master key above.
let eth_key = master_key.derive_path("m/44'/60'/0'/0/0")?;
```

### External Credentials: Encryption with Derived Keys

For external credentials (API keys, OAuth tokens) that can't be derived, the existing `EncryptedDataSchema` pattern from `@alkdev/storage` applies — but the encryption key is itself derived from the seed:

1. The secret service derives an AES-256-GCM key via SLIP-0010 path `m/74'/2'/0'/0'`
2. External credentials are encrypted with this derived key using the existing encrypt/decrypt functions
3. The encrypted data is stored as a `SecretNode` in the metagraph
4. Only the derivation path and key version are stored in plain attributes
5. The seed phrase (or derived encryption key) is held only by the secret service — never in the database

### Secret Service

The secret service is an irpc service (see [services.md](services.md)) that:

- Holds the master seed phrase in memory (never persisted to disk in plain text)
- Derives keys on demand via SLIP-0010/BIP39
- Encrypts/decrypts external credentials using derived keys
- Is the **only** component that ever sees the master seed

Workers request derived keys through the secret service's irpc protocol. They never see the seed or the encryption key.

### Derivation Path Conventions

| Path | Purpose |
|------|---------|
| `m/74'/0'/0'/0'` | Primary Ed25519 identity keypair (alknet auth) |
| `m/74'/0'/0'/1'` | Secondary identity keypair (device key) |
| `m/74'/0'/1'/0'` | SSH host key (for server identity) |
| `m/74'/1'/0'/{site_hash}'` | Site-specific password derivation |
| `m/74'/2'/0'/0'` | AES-256-GCM encryption key (for external credentials) |
| `m/44'/60'/0'/0/0` | Ethereum signing key (for smart contract interactions) |

The `74'` coin type is unallocated per SLIP-0044 and can be registered for alknet. The `0'`/`1'`/`2'` account levels divide identity, password, and encryption purposes.

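Path strings like those in the table can be parsed into child indices before they reach a derivation library. The sketch below is a minimal parser under stated assumptions: `parse_path`, `require_all_hardened`, and `PathError` are hypothetical names. It encodes hardened components by setting the top bit (`0x8000_0000`), and it flags non-hardened components, which matters because SLIP-0010 Ed25519 derivation only supports hardened children.

```rust
/// Hardened child indices have the top bit set.
const HARDENED: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
enum PathError {
    MissingRoot,        // path does not start with "m"
    BadIndex(String),   // component is not a valid index below 2^31
    NonHardened(usize), // zero-based depth of the first non-hardened component
}

/// Parse "m/74'/0'/0'/0'" into child indices with the hardened bit applied.
fn parse_path(path: &str) -> Result<Vec<u32>, PathError> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(PathError::MissingRoot);
    }
    parts
        .map(|p| -> Result<u32, PathError> {
            let (digits, hardened) = match p.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (p, false),
            };
            let index: u32 = digits
                .parse()
                .map_err(|_| PathError::BadIndex(p.to_string()))?;
            if index >= HARDENED {
                return Err(PathError::BadIndex(p.to_string()));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// Ed25519 paths must be fully hardened; report the first offending depth.
fn require_all_hardened(indices: &[u32]) -> Result<(), PathError> {
    match indices.iter().position(|i| (*i & HARDENED) == 0) {
        Some(depth) => Err(PathError::NonHardened(depth)),
        None => Ok(()),
    }
}

fn main() {
    let identity = parse_path("m/74'/0'/0'/0'").unwrap();
    println!("{:?}", require_all_hardened(&identity));
    let eth = parse_path("m/44'/60'/0'/0/0").unwrap();
    println!("{:?}", require_all_hardened(&eth));
}
```

Under this check the Ethereum path is rejected for Ed25519 use, which is consistent with deriving it on a secp256k1 key instead.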
### Rust Crates Required

| Crate | Purpose |
|-------|---------|
| `bip39` | Mnemonic generation and seed derivation |
| `ed25519-bip32` (IOHK) or `rust-bip32-ed25519` (BitBoxSwiss) | SLIP-0010 Ed25519 HD key derivation |
| `aes-gcm` | AES-256-GCM encryption for external credentials |
| `sha2` | SHA-256 for key hashing |
| `irpc` | Service protocol definitions |

## Design Decisions (mapped from TypeScript ADRs)

| Original ADR | Decision | Rust adaptation |
|-------------|----------|-----------------|
| 002 | Metagraph over domain tables | Same 6-table schema, same graph type/node type/edge type model |
| 008 | Common columns pattern | `id`, `metadata`, `created_at`, `updated_at` on all tables |
| 019 | JSON text for schema columns | `serde_json::Value` stored as TEXT in SQLite |
| 020 | No nodeTypeId on nodes | Node type enforced at application layer |
| 022 | Composite FKs for node refs | `source_node_key` + `target_node_key` with cascade |
| 034 | ACL as metagraph | AclGraph is a metagraph instance |
| 038 | SQLite-first, PG removed | SQLite only via honker |
| 040 | System DB + tenant DB | Two `.db` files |
| 041 | Identity tables in storage | Same tables, same constraints |
| 045 | org_members authoritative | SQL table is source of truth, BelongsToEdge is derived |
| 047 | Honker event target | honker stream/notify as pub/sub mechanism |
| 049 | Identity schema restructuring | Separate credential tables, no Gitea columns |
| 050 | SHA-256 for API key hashing | Fast hash for high-entropy machine keys |
| 051 | BIP39/SLIP-0010 for HD key derivation | Seed phrase as root of trust for identity, encryption, and signing keys |
| 052 | Secrets as irpc service | Secret service holds seed, derives keys, encrypts/decrypts external creds |
| 053 | Event boundary discipline | Honker streams are domain events; call protocol is integration boundary |

## References

- `@alkdev/storage` — TypeScript metagraph, identity, ACL, encrypted data implementation
- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph (maps to petgraph in Rust)
- `@alkdev/operations` — TypeScript OperationSpec, CallHandler, registry
- `/workspace/honker` — SQLite extension with pub/sub, streams, queues
- `/workspace/polyglot` — SQL transpiler (future: schema migration validation)
- `/workspace/petgraph` — Graph data structure library (used in alknet-flowgraph)
- `/workspace/jsonschema` — JSON Schema validation (Rust, replaces TypeBox at runtime)
- `/workspace/iroh/iroh-dns` — DNS resolver and endpoint info
- `/workspace/@alkdev/storage/docs/architecture/encrypted-data.md` — Original encrypted data design (TypeScript)
- `/workspace/research/event_sourcing/event_source_types.md` — Event-driven architecture patterns
- [services.md](services.md) — Service layer architecture (irpc protocols)
- [core.md](core.md) — Core overview, head/worker terminology