docs: write Phase 0 architecture foundation — ADRs 026-034, spec docs, and task updates

Phase 0a — ADRs (9 new):
- ADR-026: Transport/interface separation (three-layer model)
- ADR-027: Crate decomposition (core, secret, storage, flowgraph, napi, CLI)
- ADR-028: Auth as irpc service (AuthProtocol behind feature flag)
- ADR-029: Identity as core type (Identity + IdentityProvider in alknet-core)
- ADR-030: Static/dynamic config split (ArcSwap, ConfigReloadHandle)
- ADR-031: Forwarding policy (rule-based allow/deny, TransportKind-aware)
- ADR-032: Event boundary discipline (domain, irpc, call protocol boundaries)
- ADR-033: OperationEnv universal composition (three dispatch paths)
- ADR-034: Head/worker terminology (replace hub/spoke)

Phase 0b — New spec documents (7):
- identity.md, services.md, interface.md, configuration.md,
  storage.md, flowgraph.md, secret-service.md

Updated existing docs:
- auth.md: reference identity.md for canonical definitions, add AuthProtocol
- open-questions.md: resolve OQ-12, OQ-16, OQ-18, OQ-22, OQ-23-25
- README.md: add all new docs, ADRs 026-034

Marked 19 architecture tasks as completed.
2026-06-07 09:32:58 +00:00
parent 84f16d66e7
commit 19b3d3a078
38 changed files with 2750 additions and 101 deletions


@@ -1,16 +1,18 @@
---
status: draft
last_updated: 2026-06-04
last_updated: 2026-06-07
---
# Alknet Architecture
## Current State
Architecture specification in active development. 22 ADRs accepted. Unified
auth and call protocol architecture being specified — see [auth.md](auth.md)
and [call-protocol.md](call-protocol.md). Configuration architecture under
exploration — see [research/configuration.md](../research/configuration.md).
Architecture specification in active development. Phase 0 foundation ADRs
completed (026-034). New spec documents created for identity, services,
interface, configuration, storage, flowgraph, and secret service. Existing
specs updated for the three-layer model, crate decomposition, and unified
identity. See [open-questions.md](open-questions.md) for remaining open
questions.
## Architecture Documents
@@ -24,12 +26,24 @@ exploration — see [research/configuration.md](../research/configuration.md).
| [server.md](server.md) | reviewed | Server acceptance, channel handling, proxy |
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper and pubsub event target adapter |
| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows |
| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths |
| [interface.md](interface.md) | draft | Layer 2: Interface trait, SshInterface, RawFramingInterface |
| [configuration.md](configuration.md) | draft | StaticConfig, DynamicConfig, forwarding policy, reload |
| [storage.md](storage.md) | draft | alknet-storage: metagraph, identity, ACL, honker |
| [flowgraph.md](flowgraph.md) | draft | alknet-flowgraph: call graph, operation graph, petgraph |
| [secret-service.md](secret-service.md) | draft | alknet-secret: BIP39, SLIP-0010, AES-GCM, SecretProtocol |
## Research Documents
| Document | Status | Description |
|----------|--------|-------------|
| [configuration.md](../research/configuration.md) | draft | Configuration architecture: static/dynamic split, hot reload, forwarding policy |
| [configuration.md](../research/configuration.md) | draft | Configuration architecture (source for promoted spec) |
| [core.md](../research/core.md) | draft | Core overview, transport, call protocol, DNS |
| [services.md](../research/services.md) | draft | irpc service protocols, OperationContext, application services |
| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker |
| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping |
| [integration-plan.md](../research/integration-plan.md) | draft | Phased integration plan for services, pubsub, and operations |
## ADR Table
@@ -57,12 +71,24 @@ exploration — see [research/configuration.md](../research/configuration.md).
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted |
| [026](decisions/026-transport-interface-separation.md) | Transport/interface separation (three-layer model) | Accepted |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition (core, secret, storage, flowgraph) | Accepted |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service behind feature flag | Accepted |
| [029](decisions/029-identity-core-type.md) | Identity as core type in alknet-core | Accepted |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split with ArcSwap | Accepted |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy with rule-based allow/deny | Accepted |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary discipline (domain, irpc, call protocol) | Accepted |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted |
| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted |
## Open Questions
Most open questions have been resolved. Open questions remain for
configuration, auth, and call protocol — see
[open-questions.md](open-questions.md) for details.
See [open-questions.md](open-questions.md) for all open and resolved questions.
Key resolved questions from Phase 0: OQ-12, OQ-16, OQ-18 (forwarding policy
and identity scopes), OQ-17 (transport-aware auth), OQ-23 (irpc feature flag),
OQ-24 (DNS control channel scope), OQ-25 (crate irpc dependencies). Key open
questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS), OQ-20 (worker
registration).
## Lifecycle Definitions


@@ -3,15 +3,15 @@ status: draft
last_updated: 2026-06-07
---
# Authentication & Identity
# Authentication
## What
A unified authentication and identity layer that works across all transports —
SSH-over-any-transport and WebTransport (non-SSH HTTP-level transports). The
same key material (Ed25519 authorized keys and certificate authorities) is
shared across both auth paths. Identity resolution produces a transport-agnostic
`Identity` that carries scopes and resources for downstream authorization.
A unified authentication layer that works across all transports — SSH-over-any-
transport and WebTransport (non-SSH HTTP-level transports). The same key
material (Ed25519 authorized keys and certificate authorities) is shared across
both auth paths. Identity resolution produces a transport-agnostic `Identity`
that carries scopes and resources for downstream authorization.
## Why
@@ -21,8 +21,27 @@ need a different auth presentation that shares the same key material. The
unified auth layer ensures one key set, one identity, one rotation mechanism
across all transports. See ADR-023 for the decision context.
The canonical definitions of `Identity` and `IdentityProvider` are in
[identity.md](identity.md). This document covers auth-specific behavior:
auth presentation per transport, `AuthPolicy` structure, and the auth service
relationship.
## Architecture
### Identity and IdentityProvider
See [identity.md](identity.md) for the canonical definitions of:
- `Identity` struct (`{ id, scopes, resources }`)
- `IdentityProvider` trait (`resolve_from_fingerprint()`, `resolve_from_token()`)
- `ConfigIdentityProvider` (default, ArcSwap-backed)
- `StorageIdentityProvider` (production, SQLite-backed, in alknet-storage)
- `AuthProtocol` irpc service (behind `irpc` feature flag)
The key relationship: `IdentityProvider` is the contract. `ConfigIdentityProvider`
is the default implementation (reads from `DynamicConfig.auth`). `AuthProtocol`
irpc service is one way to satisfy the trait, behind a feature flag. Both paths
produce the same `Identity` result. See ADR-028 and ADR-029.
### Auth Presentation Per Transport
| Transport | Auth presentation | Verification |
@@ -72,44 +91,23 @@ V1 uses timestamp-only (±300s window, no server state). The replay trade-offs
and future zero-replay options (nonce challenge-response) are documented in
ADR-023.
### IdentityProvider Trait
### IdentityProvider and Auth Service Relationship
The `IdentityProvider` trait decouples alknet-core from any specific identity
storage. It resolves a key fingerprint or auth token to an `Identity` with
scopes and resources.
The `IdentityProvider` trait (defined in [identity.md](identity.md)) decouples
alknet-core from any specific identity storage. Two implementations exist:
```rust
pub trait IdentityProvider: Send + Sync + 'static {
/// Resolve an SSH public key fingerprint to an identity.
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
- **ConfigIdentityProvider** (in alknet-core) — reads from
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
No database required. This is the default for minimal deployments.
/// Resolve an auth token to an identity.
/// Returns None if the token is invalid, expired, or the key is not authorized.
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
- **StorageIdentityProvider** (in alknet-storage) — backed by SQLite
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes.
pub struct Identity {
pub id: String, // Unique identifier — fingerprint (config) or account UUID (database)
pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"]
pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
> **Note on identity models**: Earlier research used `{node_id, fingerprint, scopes}`.
> The unified model uses `{id, scopes, resources}` where `id` serves as both
> fingerprint (for key-based auth from config) and account UUID (for
> database-backed auth). The `resources` field provides resource-level
> authorization beyond what scopes offer. This is the canonical definition
> that all components should use.
```
**Default implementation**: `ConfigIdentityProvider` loads from
`DynamicConfig.auth` (the `authorized_keys` set). Every authorized key gets a
default scope set. No database required.
**Head implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and
`accounts` tables plus the ACL graph. Resolves fingerprint → account →
organization membership → effective scopes. Uses `ArcSwap` for hot reload.
The `AuthProtocol` irpc service (behind the `irpc` feature flag, per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it. Both the trait path and the
irpc path produce the same `Identity` result.
The trait is the contract. The backing store is pluggable. Alknet-core never
depends on Honker, SQLite, or any specific database.
@@ -240,13 +238,13 @@ security consideration:
## Open Questions
- **OQ-18**: Should `Identity.scopes` be populated from `ForwardingPolicy`
rules, from an external `IdentityProvider`, or from both? See
[open-questions.md](open-questions.md).
- **OQ-18**: ~~Source of Identity.scopes~~ Resolved per ADR-029 and ADR-031.
`IdentityProvider` owns scopes, `ForwardingPolicy` uses scopes from `Identity`.
See [open-questions.md](open-questions.md).
- **OQ-19**: Should the WebTransport listener require its own TLS identity
(separate from the SSH-over-TLS listener), or can they share the same
certificate? See [open-questions.md](open-questions.md).
certificate? Deferred to Phase 4. See [open-questions.md](open-questions.md).
## Design Decisions
@@ -254,16 +252,16 @@ security consideration:
|-----|----------|---------|
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag; IdentityProvider is the contract |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
## References
- [identity.md](identity.md) — Canonical Identity and IdentityProvider definitions
- [server.md](server.md) — Current SSH auth handler
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](../research/configuration.md) — DynamicConfig, AuthPolicy structure
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18, OQ-19
- `server/handler.rs` — Current `auth_publickey()` callback
- `auth/server_auth.rs` — Current `ServerAuthConfig` struct
- `auth/keys.rs` — `KeySource` and key loading
- [configuration.md](configuration.md) — DynamicConfig, AuthPolicy, ConfigReloadHandle
- [services.md](services.md) — AuthProtocol irpc service
- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18 (resolved), OQ-19
- [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport library
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API
- [@alkdev/storage](/workspace/@alkdev/storage) — `peer_credentials` table, ACL graph
- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API


@@ -0,0 +1,192 @@
---
status: draft
last_updated: 2026-06-07
---
# Configuration
## What
Alknet's configuration is split into `StaticConfig` (immutable after startup) and
`DynamicConfig` (hot-reloadable at runtime), with `ArcSwap` providing lock-free
reads on the hot path. `ConfigService` wraps reloads behind an irpc protocol
for production deployments.
## Why
Three specific failures motivated the split (ADR-030):
1. No hot reload of authentication credentials — adding a key requires a restart.
2. No port forwarding access control — any authenticated client has unrestricted
access (ADR-031).
3. No structured configuration beyond CLI flags — operators need config files
and the NAPI layer needs programmatic reload.
The split is clean: anything that affects SSH handshake or socket binding is
static; anything checked per-connection or per-channel is dynamic.
## Architecture
### StaticConfig
Immutable after startup. Constructed from `ServeOptions` (the builder pattern
is preserved per ADR-011). Contains:
- Transport mode, listen address
- TLS config (cert, key)
- iroh config (relay URL)
- Stealth mode flag
- Host key, host key algorithm
- Max auth attempts, max connections per IP
- Proxy config
Changing any of these requires a restart.
### DynamicConfig
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains:
- `AuthPolicy` — authorized keys, certificate authorities, token config
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
- `RateLimitConfig` — rate limiting parameters
`ArcSwap` provides lock-free reads. Every `auth_publickey()` and
`channel_open_direct_tcpip()` call loads the current `Arc` and dereferences it,
which costs effectively nothing compared to the current approach. Writes are
atomic: `store()` swaps the pointer.
### ConfigReloadHandle
```rust
pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig) { ... }
}
```
Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload.
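The snapshot-and-swap behaviour can be sketched with the standard library alone. This is a minimal sketch, not the alknet implementation: it substitutes `RwLock<Arc<_>>` for `ArcSwap` (so reads take a brief read lock rather than being lock-free), and the `authorized_keys` field, `new()`, and `load()` are assumptions made for illustration.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, trimmed-down stand-in for the real `DynamicConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// Std-only stand-in for `Arc<ArcSwap<DynamicConfig>>`.
/// Unlike `ArcSwap`, readers briefly take a read lock to clone the `Arc`.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    dynamic: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    /// Assumed constructor; the document obtains the handle from `Server::run()`.
    pub fn new(initial: DynamicConfig) -> Self {
        Self { dynamic: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Take a snapshot. The caller may hold it for the life of a connection.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.dynamic.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Replace the whole config in one step. Snapshots taken earlier stay valid.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().unwrap() = Arc::new(new_config);
    }
}
```

A snapshot taken before `reload()` keeps observing the old key set, while any later `load()` sees the new one, which matches the rule that existing connections continue with their current config.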
### ConfigService irpc Service
```rust
enum ConfigProtocol {
    GetForwardingPolicy,
    GetRateLimits,
    ReloadForwarding { policy: ForwardingPolicy },
    ReloadRateLimits { limits: RateLimitConfig },
}
```
Behind the `irpc` feature flag. For production deployments that use the service
layer. For minimal deployments, direct `ConfigReloadHandle::reload()` is
sufficient.
### ForwardingPolicy
Part of DynamicConfig (ADR-031). Evaluated per-channel-open, matched against
the authenticated `Identity`. Rules are evaluated in order; first match wins.
Default determines fallback.
```rust
pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}
```
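The ordered, first-match-wins evaluation might look like the following sketch. The shape of `ForwardingRule` (a `target` pattern plus an `action`) and the matching rule (exact host, exact or `*` port) are assumptions for illustration; the sketch also leaves out the `Identity` and `TransportKind` inputs that the real policy considers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Hypothetical rule shape: a "host:port" pattern where the port may be "*".
pub struct ForwardingRule {
    pub target: String,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

/// Assumed matching semantics: exact host, and either an exact port or "*".
fn target_matches(pattern: &str, host: &str, port: u16) -> bool {
    match pattern.rsplit_once(':') {
        Some((p_host, p_port)) => {
            p_host == host && (p_port == "*" || p_port.parse::<u16>().ok() == Some(port))
        }
        None => false,
    }
}

impl ForwardingPolicy {
    /// Walk the rules in order; the first matching rule decides.
    /// If no rule matches, fall back to `default`.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| target_matches(&rule.target, host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because evaluation stops at the first match, a broad rule placed before a narrow one shadows it, so rule order is part of the policy.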
### TOML Config File
Optional convenience input format (amends ADR-011; does not replace the
programmatic API). Covers static config plus initial auth/forwarding paths.
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
[auth]
host_key = "/etc/alknet/ssh/host_key"
[forwarding]
default = "deny"
[[forwarding.rules]]
target = "localhost:*"
action = "allow"
```
### NAPI Reload API
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
### Multi-Transport Listeners
A head node may accept connections on multiple transports simultaneously. The
architecture supports `Vec<ListenerConfig>` instead of a single
`ServeTransportMode`. `Server::run()` spawns one accept loop per listener,
sharing `DynamicConfig`, `ConnectionRateLimiter`, sessions, and shutdown signal.
```toml
[[listeners]]
transport = "tls"
listen = "0.0.0.0:443"
stealth = true
[[listeners]]
transport = "tcp"
listen = "0.0.0.0:22"
[[listeners]]
transport = "iroh"
iroh_relay = "https://relay.alk.dev"
```
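The one-loop-per-listener arrangement can be sketched as below. This is only an illustration under stated assumptions: it uses OS threads instead of async tasks, the `ListenerConfig` and `Shared` shapes are hypothetical, and the loop body does not accept real connections. It shows only that every loop observes the same shared shutdown signal.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical listener description, loosely mirroring the `[[listeners]]` keys.
#[derive(Debug, Clone)]
pub struct ListenerConfig {
    pub transport: String,
    pub listen: Option<String>,
}

/// Stand-in for the state every accept loop shares
/// (dynamic config, rate limiter, sessions, shutdown signal).
pub struct Shared {
    pub shutdown: AtomicBool,
}

/// Spawn one loop per listener. Each loop exits once `shutdown` is set
/// and returns its transport label.
pub fn run(
    listeners: Vec<ListenerConfig>,
    shared: Arc<Shared>,
) -> Vec<thread::JoinHandle<String>> {
    listeners
        .into_iter()
        .map(|listener| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                while !shared.shutdown.load(Ordering::Acquire) {
                    // A real loop would accept a connection here.
                    thread::yield_now();
                }
                listener.transport
            })
        })
        .collect()
}
```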
### CLI vs Programmatic Behavior
| Interface | Static config | Dynamic config | Reload mechanism |
|-----------|--------------|----------------|------------------|
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
| Core Rust | `StaticConfig` struct | `AuthService` (irpc) or `ArcSwap<DynamicConfig>` (minimal) | `ConfigService::reload()` or `ConfigReloadHandle::reload()` |
| NAPI | `serve()` options | Same | `server.reloadAuth()`, `server.reloadForwarding()` |
## Constraints
- `StaticConfig` cannot be changed after startup. Changing transport mode,
listen address, TLS config, or host key requires a restart.
- `DynamicConfig` is reloaded atomically via `ArcSwap`. Existing connections
continue with their current config; new connections get the new config.
- Config file is optional. `ServeOptions` builder pattern remains the primary
API (amends ADR-011, does not supersede it).
- No file watching (OQ-13 resolved: potential attack vector, unnecessary
complexity).
- Client configuration stays as `ConnectOptions` — no `ArcSwap` needed.
## Open Questions
- None. All configuration-related questions are resolved per ADR-030, ADR-031,
and the resolved OQs in [open-questions.md](open-questions.md).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | Immutable transport vs. reloadable auth/forwarding |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | Amended, not superseded — TOML is convenience layer |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Rule-based allow/deny, TransportKind-aware |
| [029](decisions/029-identity-core-type.md) | Identity as core type | DynamicConfig.auth consumed by IdentityProvider |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | ConfigService wraps DynamicConfig reloads |
## References
- [research/configuration.md](../research/configuration.md) — Full analysis and proposed solution
- [identity.md](identity.md) — IdentityProvider trait, DynamicConfig.auth
- [ADR-013](decisions/013-fail2ban-friendly-logging.md) — Rate limiting parameters


@@ -0,0 +1,162 @@
# ADR-026: Transport/Interface Separation (Three-Layer Model)
## Status
Accepted
## Context
In the current architecture, SSH is deeply embedded in the server handler. The
`ServerHandler` owns auth, channel management, and proxy logic — all mixed
together. This makes it impossible to run the call protocol over any transport
that doesn't speak SSH, such as:
- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
censorship resistance
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
wrapping, for local service mesh or browser-to-head direct communication
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
SSH key exchange)
The DNS control channel concept from research (`core.md`) currently conflates
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
SSH is not a transport — it's a protocol layer that sits *on top of* a
transport. Separating them enables the DNS control channel to carry call
protocol events directly, without wrapping SSH inside DNS queries.
The same separation enables raw framing (no SSH overhead) for trusted local
networks, and WebTransport direct call protocol for browser clients.
## Decision
**Establish a three-layer model:**
### Layer 1: Transport
Produces byte streams. A `Transport` still produces
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.
```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```
Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).
### Layer 2: Interface
Consumes a `Transport::Stream` and produces call protocol sessions. An
interface is what SSH currently does: wrap a byte stream in session semantics.
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```
Interfaces:
- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
channel multiplexing. The call protocol runs over a reserved SSH channel
(`alknet-control:0`).
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
`EventEnvelope`. No SSH overhead. Direct call protocol over the transport
stream.
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
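The raw framing described above (4-byte big-endian length prefix followed by the JSON body) can be sketched as follows. This is a minimal sketch under assumptions: it uses blocking `std::io` rather than the async stream the `Transport` layer produces, treats the body as opaque bytes instead of a parsed `EventEnvelope`, and `MAX_FRAME_LEN` is a hypothetical limit that the ADR does not specify.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on a frame body; not taken from the ADR.
const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Write one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one frame: the length prefix first, then exactly that many bytes.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(body)
}
```

Checking the declared length before allocating keeps a malformed or hostile prefix from forcing a large allocation, which matters for an interface that may run without SSH in front of it.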
### Layer 3: Protocol
Carries semantics. Call protocol events, operation registry, service calls.
The protocol is agnostic to both the transport and the interface below it. It
receives `EventEnvelope` frames from whatever interface produced them.
### Connection Model
A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:
| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
**The DNS control channel carries call protocol frames directly — it does NOT
wrap SSH inside DNS.** This is explicit because the research originally
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
queries/responses — no SSH involved.
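One way to keep the table above enforceable is a small validity check at listener construction. The sketch below is illustrative only: `TransportTag`, `InterfaceTag`, and `is_valid_pair` are hypothetical names, and it assumes that any pair missing from the table should be rejected.

```rust
/// Hypothetical field-less tags; the real `TransportKind` carries data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterfaceTag {
    Ssh,
    RawFraming,
}

/// Returns true only for the (Transport, Interface) pairs listed in the table.
pub fn is_valid_pair(transport: TransportTag, interface: InterfaceTag) -> bool {
    use InterfaceTag::*;
    use TransportTag::*;
    matches!(
        (transport, interface),
        (Tls, Ssh)
            | (Tcp, Ssh)
            | (Iroh, Ssh)
            | (Dns, RawFraming)
            | (WebTransport, Ssh)
            | (WebTransport, RawFraming)
            | (Tcp, RawFraming)
    )
}
```

Under this check, (DNS, SSH) is rejected, which encodes the rule that the DNS control channel never wraps SSH.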
### `TransportKind` Enum
The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
`WebTransport` variants. Initially these are tags only — no acceptor
implementation. The full DNS and WebTransport implementations are Phase 4 work
per the integration plan.
```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```
### ServerHandler Refactor
The existing `ServerHandler` is refactored into `SshInterface`. The interface
abstraction means the server's accept loop becomes:
```rust
// Pseudocode
let (transport, interface) = listener_config;
let stream = transport.accept().await?;
let session = interface.accept(stream, &config).await?;
// session produces call protocol events
```
The call protocol handler is interface-agnostic — it receives `EventEnvelope`
frames from any interface. Auth, forwarding policy, and operation routing happen
at Layer 3, not inside the SSH handler.
## Consequences
- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
raw framing) pair is a clean (Transport, Interface) combination.
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
trusted networks.
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
any interface.
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
more focused component that only handles SSH session management.
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
implement the `Interface` trait without touching SSH code.
- **Negative**: This is the most invasive code change in Phase 1
(integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
are currently tangled in `ServerHandler`. Extracting them requires careful
refactoring to maintain existing behavior.
- **Negative**: The `Interface` trait is new and untested. The design must
accommodate both SSH's channel multiplexing and raw framing's single-stream
model through the same abstraction.
## References
- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)


@@ -0,0 +1,150 @@
# ADR-027: Crate Decomposition
## Status
Accepted
## Context
alknet-core currently contains everything: transport, SSH, auth, config, the
call protocol handler, and the server accept loop. As the project grows to
include SQLite-backed identity, HD key derivation, and metagraph storage, core
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
— unacceptable for a library crate that CLI users embed.
Different deployment topologies need different subsets:
- A minimal CLI tunnel only needs core, transport, and auth types
- A head node needs SQLite-backed identity and the secret service
- A flowgraph visualization tool only needs petgraph operations
Circular dependencies must be avoided. alknet-storage implements
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
format, but not as a crate dependency.
## Decision
**Decompose the project into six crates with a strict acyclic dependency graph.**
### Crate Structure
1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
`OperationSpec`, `Interface` trait. The foundational crate that everything
else builds on, in some cases by sharing type definitions rather than through a
crate dependency.
- *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
- *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
irpc
- *Does NOT depend on*: alknet-core, alknet-storage
3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
honker integration, `StorageProtocol` irpc service.
- *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
- *Does NOT depend on*: alknet-core (it implements alknet-core's
`IdentityProvider` trait by matching the trait signature, not through a crate
dependency)
- *Does NOT depend on*: alknet-secret (it references the `EncryptedData` type
format for wire compatibility only)
4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
graph, type compatibility checking.
- *Depends on*: petgraph, serde, jsonschema, thiserror
- *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret
5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
- *Depends on*: alknet-core
- *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph
6. **alknet** (CLI binary) — Assembles everything.
- *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
alknet-flowgraph (feature), toml
### Dependency Graph
```
alknet-secret
/ \
/ \
alknet-core ←──── ←── alknet-storage
↑ \ /
│ alknet-flowgraph
alknet-napi
alknet (CLI binary — assembles everything)
```
### Narrow Interface Points
Three types serve as the narrow interface points between crates:
1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
forwarding policy, and call protocol. alknet-storage implements
`IdentityProvider` to produce instances.
2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
`ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
alknet-storage). The CLI/NAPI layer wires the concrete implementation.
3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
registry and by alknet-flowgraph for type compatibility checking. The bridge
is serialization — flowgraph serializes to JSON, storage persists it.
### irpc Feature Flag
irpc is a feature flag in alknet-core. When disabled, auth and config go through
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
that only do SSH tunneling don't need the service layer.
In alknet-secret and alknet-storage, irpc is an independent dependency, not
feature-gated. These crates always define irpc service protocols because they
are used in production deployments where the service layer is active.
### alknet-storage's Relationship to alknet-core
alknet-storage does NOT depend on alknet-core as a crate. Instead:
- alknet-storage defines its own `IdentityProvider` impl that matches
alknet-core's trait signature. The trait is re-exported or defined locally
with `#[cfg(feature = "alknet-core")]` interop.
- In practice, the CLI binary crate depends on both and wires them together.
alknet-storage provides `StorageIdentityProvider`; alknet-core takes
`impl IdentityProvider`.
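The wiring can be sketched in a single file as below. It is a simplified illustration, not the crate layout itself: the trait is reduced to `resolve_from_fingerprint`, the fields of `ConfigIdentityProvider` and the `authenticate` helper are hypothetical, and a storage-backed provider would plug in at the same `&dyn IdentityProvider` parameter.

```rust
use std::collections::HashMap;

/// Identity shape `{ id, scopes, resources }` as described in the spec.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Reduced contract; the token-based method is omitted in this sketch.
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical minimal provider: a fixed key set with one default scope set.
pub struct ConfigIdentityProvider {
    pub authorized: Vec<String>,
    pub default_scopes: Vec<String>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        if !self.authorized.iter().any(|key| key.as_str() == fingerprint) {
            return None;
        }
        Some(Identity {
            id: fingerprint.to_string(),
            scopes: self.default_scopes.clone(),
            resources: HashMap::new(),
        })
    }
}

/// Hypothetical caller: it sees only the trait, never the backing store.
pub fn authenticate(
    provider: &dyn IdentityProvider,
    fingerprint: &str,
) -> Result<Identity, &'static str> {
    provider
        .resolve_from_fingerprint(fingerprint)
        .ok_or("unknown key")
}
```

Because `authenticate` depends only on the trait, swapping the config-backed provider for a database-backed one changes the wiring in the binary crate and nothing in the caller.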
### alknet-storage's Relationship to alknet-secret
alknet-storage does NOT depend on alknet-secret as a crate. Instead:
- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
version, salt, IV, ciphertext). This is a type-level compatibility, not a
crate dependency.
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
`SecretNode` in the metagraph. The bridge is serialization.
## Consequences
- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
get a small binary.
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
swapped for alternative implementations.
- **Positive**: No circular dependencies. The dependency graph is a DAG.
- **Positive**: Deployment topology determines which crates to include. A CLI
tunnel uses only alknet-core. A head node uses everything.
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
service layer overhead.
- **Negative**: `IdentityProvider` trait interop between alknet-core and
alknet-storage requires careful versioning. If the trait signature changes,
both crates must update.
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
and alknet-storage is implicit (not enforced by the type system). A shared
types crate could be extracted if needed, but adds another crate dependency.
## References
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
- [research/core.md](../../research/core.md) — alknet-core contents
- [research/services.md](../../research/services.md) — Service protocols
- [research/storage.md](../../research/storage.md) — alknet-storage contents
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)


@@ -0,0 +1,146 @@
# ADR-028: Auth as irpc Service
## Status
Accepted
## Context
For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
doesn't scale. Loading all authorized keys into RAM and atomically swapping the
entire set on each reload works for small deployments. For production deployments
with hundreds or thousands of users, auth verification should instead query a
database on demand.
The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
setups. What's needed is an async boundary that allows auth verification to go
through a service — locally via channels for minimal deployments, or via irpc
for production deployments where auth runs on a separate process or node.
The critical design point: callers go through the `IdentityProvider` trait
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
the same result — an `Identity` or rejection. The trait is the contract; the
service is an implementation path.
## Decision
**Auth verification is provided via an irpc service protocol, with
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
(ArcSwap-backed) as the default implementation.**
### IdentityProvider Trait (ADR-029) — The Contract
Callers depend on `IdentityProvider`, not on any concrete implementation:
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
### ConfigIdentityProvider — Default Implementation
Reads the `auth` section of the shared `ArcSwap<DynamicConfig>`. No database
needed. Every authorized key gets a default scope set. This is the default for
CLI and single-node deployments.
### AuthProtocol irpc Service — Behind Feature Flag
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

// Replies are sent back over the service channel, so the result type
// carries the same serde derives as the request enum.
#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
that only do SSH tunneling don't need the service layer overhead. When the
feature is disabled, auth goes through `IdentityProvider` directly.
### AuthServiceImpl
Two implementations exist:
- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
Wraps the trait in an irpc service for deployments that use the service layer
but don't have SQLite.
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
tables (in alknet-storage). Queries on demand. Can maintain an LRU cache for
hot fingerprints. This is the production implementation.
Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
know or care which backend is running.
### Integration with IdentityProvider
The irpc service and the trait compose. A caller goes through `IdentityProvider`,
which may internally delegate to the irpc service, or may satisfy the request
locally via `ConfigIdentityProvider`. The deployment topology determines the
path:
- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
`ArcSwap<DynamicConfig>`. No irpc overhead.
- **Production with local auth**: `AuthServiceImpl` wraps
`StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
routes to the local irpc service.
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
routes to a remote auth irpc service over QUIC.
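
The three paths above differ only in which `IdentityProvider` gets wired in. The following is a minimal sketch of that idea, not the alknet implementation: `Identity` and `AuthToken` are reduced stand-ins, and `MapProvider`, `CachingProvider`, and `authenticate` are hypothetical names chosen for illustration. `CachingProvider` plays the role of a provider that delegates elsewhere (for example to a service) and keeps hot fingerprints locally.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Reduced stand-ins for the types described in ADR-029.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

pub struct AuthToken(pub String); // hypothetical shape

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// Hypothetical config-style provider: every key is held in memory.
pub struct MapProvider {
    pub keys: HashMap<String, Identity>,
}

impl IdentityProvider for MapProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.keys.get(fingerprint).cloned()
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // tokens are not modelled in this sketch
    }
}

// Hypothetical delegating provider: asks an inner provider and caches hits.
pub struct CachingProvider<P: IdentityProvider> {
    inner: P,
    cache: Mutex<HashMap<String, Identity>>,
}

impl<P: IdentityProvider> CachingProvider<P> {
    pub fn new(inner: P) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }
}

impl<P: IdentityProvider> IdentityProvider for CachingProvider<P> {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let mut cache = self.cache.lock().unwrap();
        if let Some(hit) = cache.get(fingerprint) {
            return Some(hit.clone());
        }
        let found = self.inner.resolve_from_fingerprint(fingerprint)?;
        cache.insert(fingerprint.to_string(), found.clone());
        Some(found)
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        self.inner.resolve_from_token(token)
    }
}

// The handler sees only the trait, never the concrete backend.
pub fn authenticate(
    provider: &dyn IdentityProvider,
    fingerprint: &str,
) -> Result<Identity, String> {
    provider
        .resolve_from_fingerprint(fingerprint)
        .ok_or_else(|| format!("unknown key: {fingerprint}"))
}
```

Because `authenticate` takes `&dyn IdentityProvider`, swapping the backend is a wiring decision in the assembly layer and requires no change to the handler.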
### ConfigService Integration
`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
For the `ConfigIdentityProvider` path, this is equivalent to
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
refreshes the LRU cache. Both update atomically — ongoing connections are
unaffected, new connections pick up changes.
## Consequences
- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
database dependency for CLI users.
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
irpc service. Auth scales to thousands of users without loading all keys into
memory.
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
on. This keeps alknet-core lean and testable.
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
need the service layer.
- **Positive**: Both paths produce identical `Identity` results. Behavioral
parity is enforced by the shared `Identity` type.
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
and `StorageIdentityProvider` must produce the same `Identity` for the same
input. Integration tests should verify this.
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
The core must compile and work without it, and the service layer must work
with it enabled.
## References
- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
- [research/configuration.md](../../research/configuration.md) — Auth service approach
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-027](027-crate-decomposition.md) — Crate decomposition


@@ -0,0 +1,107 @@
# ADR-029: Identity as Core Type
## Status
Accepted
## Context
The `Identity` struct and `IdentityProvider` trait are needed by auth,
forwarding policy, and call protocol — three different subsystems in
alknet-core. Without placing them in core, these subsystems would each define
their own identity type, leading to duplication and conversion boilerplate.
The constraint: alknet-core must not depend on alknet-storage or any database.
The `IdentityProvider` trait must be in core so that the handler can resolve
identities without knowing whether the backing store is a config file or a
SQLite database. External crates provide implementations.
Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
key-based auth from config) and account UUID (for database-backed auth).
## Decision
**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
### Identity Struct
```rust
use std::collections::HashMap;

pub struct Identity {
    /// Fingerprint (config auth) or account UUID (database auth)
    pub id: String,
    /// e.g., ["relay:connect", "service:gitea:read"]
    pub scopes: Vec<String>,
    /// e.g., {"service": ["gitea", "registry"]}
    pub resources: HashMap<String, Vec<String>>,
}
```
The `id` field serves dual purpose: when using config-based authentication
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
database-backed authentication (`StorageIdentityProvider`), it holds the account
UUID from the `accounts` table. This keeps the type simple while accommodating
both auth paths.
The `scopes` field provides authorization scope strings used by
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
field provides resource-level authorization beyond what scopes offer (e.g., which
services this identity can access).
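
As an illustration of how consumers might query these two fields, here is a small sketch. The helper methods `has_scope` and `can_access` are hypothetical and not part of the ADR; exact string matching is an assumption, since the ADR does not define scope matching semantics.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// Hypothetical helper: true if `scope` is present verbatim.
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }

    /// Hypothetical helper: true if `name` is listed under resource `kind`.
    pub fn can_access(&self, kind: &str, name: &str) -> bool {
        self.resources
            .get(kind)
            .map_or(false, |names| names.iter().any(|n| n == name))
    }
}
```

A policy check could then combine both, for example requiring `has_scope("service:gitea:read")` and `can_access("service", "gitea")` before allowing an operation.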
### IdentityProvider Trait
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
The trait is the contract. Callers (auth handler, forwarding policy, call
protocol) depend on `IdentityProvider` — not on any concrete implementation.
### Default and Production Implementations
- **`ConfigIdentityProvider`** (in alknet-core) — reads from
`ArcSwap<DynamicConfig.auth>`. Every authorized key gets a default scope set.
No database needed. This is the default for minimal deployments.
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes. This is
the production implementation for head nodes.
alknet-core never depends on alknet-storage. The trait relationship is:
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
NAPI assembly layer wires the concrete implementation.
### Why Not in alknet-storage?
If `Identity` lived in alknet-storage, alknet-core would need to depend on
alknet-storage to use the type — creating a circular dependency (since
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
type and trait in core breaks the cycle.
## Consequences
- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
call protocol all use the same `Identity` type.
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
wires the concrete implementation. Deployment topology determines which impl
to use.
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
avoiding separate types for config-based and database-based auth.
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
`Identity` without knowing where they came from.
- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
`Storage`. Both must produce identical `Identity` results for the same input.
Tests should verify behavioral parity.
- **Negative**: The trait abstraction adds a level of indirection for the
minimal (config-only) deployment path. The cost is negligible — the
`ConfigIdentityProvider` is a simple `ArcSwap` dereference.
## References
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
- [research/services.md](../../research/services.md) — AuthService, Identity section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes


@@ -0,0 +1,159 @@
# ADR-030: Static/Dynamic Configuration Split
## Status
Accepted
## Context
Alknet's configuration is loaded once at startup and never changes. This causes
three specific failures:
1. **No hot reload of authentication credentials.** Adding or removing an
authorized key requires restarting the server process. In head/worker
deployments where keys are managed via a database, the process must be
restarted every time a key is added, revoked, or rotated. This is
operationally unacceptable.
2. **No port forwarding access control.** Any authenticated client can open a
`direct-tcpip` channel to any destination. There is no policy governing
which hosts, ports, or alknet control channels a client may access. A
compromised key grants unrestricted network access through the tunnel.
3. **No structured configuration beyond CLI flags.** ADR-011 chose
programmatic-first configuration for the alpha — correct at the time. But as
alknet moves toward publishable releases, operators need config files for
reproducible deployments, and the NAPI layer needs programmatic reload
capability that `ServeOptions` doesn't currently support.
Not all configuration should be reloadable. Transport-level settings (listen
address, TLS certificates, host key) require socket/TLS renegotiation to change
at runtime — effectively a restart. Auth and forwarding policy can change
atomically without disrupting existing connections.
## Decision
**Split configuration into `StaticConfig` and `DynamicConfig`.**
### StaticConfig
Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
preserved). Contains everything that affects socket binding, TLS handshakes, or
SSH session negotiation:
- Transport mode, listen address
- TLS config (cert, key)
- iroh config (relay URL)
- Stealth mode flag
- Host key, host key algorithm
- Max auth attempts, max connections per IP
- Proxy config
Changing any of these requires a restart.
### DynamicConfig
Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
checked per-connection or per-channel:
- `AuthPolicy` — authorized keys, certificate authorities, token config
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
- `RateLimitConfig` — rate limiting parameters
`ArcSwap` provides lock-free reads on the hot path: every `auth_publickey()` and
every `channel_open_direct_tcpip()` call loads the current `Arc<DynamicConfig>`
without taking a lock, so the overhead relative to the current static config is
negligible. Writes are atomic: `store()` swaps the pointer. Existing connections
finish with their current config; new connections get the new config.
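
The snapshot behavior described here can be sketched with the standard library alone. This is an illustration under stated assumptions: `SharedConfig` is a hypothetical stand-in that uses `RwLock<Arc<_>>` so the example has no external dependencies, which means it is not lock-free the way `ArcSwap` is. The reduced `DynamicConfig` with a single `authorized_keys` field is also an assumption.

```rust
use std::sync::{Arc, RwLock};

// Reduced config for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

// std-only stand-in for ArcSwap<DynamicConfig>. Unlike ArcSwap, reads
// take a (brief) read lock; the snapshot semantics are the same.
pub struct SharedConfig {
    current: RwLock<Arc<DynamicConfig>>,
}

impl SharedConfig {
    pub fn new(initial: DynamicConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Take a snapshot. A connection keeps this Arc for its lifetime.
    pub fn load(&self) -> Arc<DynamicConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Replace the whole config in one step. Earlier snapshots stay valid.
    pub fn store(&self, next: DynamicConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A connection that called `load()` before a `store()` continues to see the old key set, while any `load()` after the swap sees the new one.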
### ConfigReloadHandle
```rust
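use arc_swap::ArcSwap;
use std::sync::Arc;

pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    pub fn reload(&self, new_config: DynamicConfig) {
        self.dynamic.store(Arc::new(new_config)); // atomic pointer swap
    }
}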
```
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
### ConfigService
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
protocol (behind the `irpc` feature flag) for production deployments that use
the service layer. For minimal deployments (CLI, single-node), direct
`ConfigReloadHandle::reload()` is sufficient.
### TOML Config File
An optional TOML config file covers static config plus initial auth/forwarding
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
API remains primary. The config file is a convenience input format:
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
stealth = false
max_connections_per_ip = 5
max_auth_attempts = 3
[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"
[auth]
host_key = "/etc/alknet/ssh/host_key"
[forwarding]
default = "deny"
```
### NAPI Reload API
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
`ConfigReloadHandle::reload()`.
### Client Configuration
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
config is almost entirely static (which server to connect to, which key to use).
## Consequences
- **Positive**: Auth credentials and forwarding policy can be reloaded without
restarting the server. Adding a key via `reloadAuth()` takes effect on the
next connection attempt.
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
config file is an optional convenience layer, not a replacement for
`ServeOptions`.
- **Positive**: `ArcSwap` keeps reads on the hot path cheap and lock-free. Every
auth check and every channel open is a single lock-free pointer load.
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
production deployments to integrate config reload into their service mesh
without taking a direct dependency on `DynamicConfig` internals.
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
restrict access per identity, per destination, per transport (ADR-031).
- **Negative**: Two config structs where there was one. The split is clean
(transport vs. policy) but adds surface area.
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
This is acceptable for a CLI binary.
## References
- [research/configuration.md](../../research/configuration.md) — Full analysis
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1


@@ -0,0 +1,138 @@
# ADR-031: Forwarding Policy
## Status
Accepted
## Context
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
any destination. The only gate is authentication — once authenticated, a client
has unrestricted network access through the tunnel. This is a security gap: a
compromised key grants unrestricted access.
Operators need the ability to:
- Restrict which hosts and ports authenticated clients can access
- Apply different rules to different principals (key fingerprints, accounts)
- Restrict WebTransport clients to alknet control channels only
- Set a default policy (allow-all for migration compatibility, deny-all for
production)
## Decision
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
restart).**
### Type Definitions
```rust
use std::ops::Range;

// `IpNetwork` is assumed to come from the `ipnetwork` crate.
use ipnetwork::IpNetwork;

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>,        // Empty = matches all
    pub transports: Vec<TransportKind>, // Empty = matches all
}

pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // "localhost", "*.example.com"
    Cidr(IpNetwork),               // "10.0.0.0/8"
    PortRange(String, Range<u16>), // "localhost", ports 8080-8090
    AlknetPrefix,                  // Matches alknet-* control channels
}
```
### Rule Evaluation
Rules are evaluated in order. First match wins. If no rule matches, the default
applies. This supports both allowlist and blocklist semantics:
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
destinations.
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
destinations.
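
First-match-wins evaluation can be sketched as below. This is a reduced illustration, not the planned implementation: CIDR, wildcard hosts, `AlknetPrefix`, principals, and transports are omitted, the `evaluate` and `matches` methods are hypothetical, and `RangeInclusive<u16>` is used instead of `Range<u16>` (an assumption made here so that "ports 8080-8090" includes port 8090).

```rust
use std::ops::RangeInclusive;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

// Reduced pattern set for illustration.
pub enum TargetPattern {
    Any,
    Host(String),                           // exact host match only
    PortRange(String, RangeInclusive<u16>), // host plus inclusive port range
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl TargetPattern {
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h == host,
            TargetPattern::PortRange(h, ports) => h == host && ports.contains(&port),
        }
    }
}

impl ForwardingPolicy {
    /// Walk the rules in order; the first matching rule decides.
    /// If nothing matches, fall back to the policy default.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.target.matches(host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Rule order matters: a narrow `Allow` placed before a broader `Deny` for the same host carves out an exception, while the reverse order would make the `Allow` unreachable.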
### Principals
Each rule can specify which principals it applies to. A principal is an
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
rule's `principals` field is empty, it matches all identities.
This connects to the `IdentityProvider` trait (ADR-029): when a client
authenticates, the `Identity` is resolved, and the forwarding policy checks
rules against `Identity.id` and `Identity.scopes`.
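
A minimal sketch of the principal check, assuming exact string comparison against both `Identity.id` and `Identity.scopes`. The function name `principals_match` and the reduced `Identity` are hypothetical.

```rust
// Reduced stand-in for the Identity type from ADR-029.
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// A rule's principal list matches when it is empty, or when any entry
/// equals the identity's id or one of its scopes.
pub fn principals_match(principals: &[String], identity: &Identity) -> bool {
    principals.is_empty()
        || principals
            .iter()
            .any(|p| *p == identity.id || identity.scopes.contains(p))
}
```

Because ids and scopes are compared in the same list, a principal string that happens to equal both a fingerprint and a scope name would match either way; keeping the two namespaces visibly distinct (for example, the `SHA256:` prefix versus `relay:connect`) avoids that ambiguity.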
### TransportKind-Aware Rules
Each rule can specify which `TransportKind` it applies to. This enables
transport-specific restrictions — for example, WebTransport clients can be
restricted to `alknet-*` control channels only:
```rust
ForwardingRule {
    target: TargetPattern::AlknetPrefix,
    action: ForwardingAction::Allow,
    principals: vec![],
    transports: vec![TransportKind::WebTransport { host: "*".into() }],
}
```
### Where the Policy Check Happens
The forwarding policy check occurs in `channel_open_direct_tcpip` before the
proxy task is spawned. The current behavior (no check) is equivalent to
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
backward compatibility during migration.
### DynamicConfig Integration
`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
effect on the next channel open — existing connections continue with their
current policy.
### OQ Resolutions
- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
global rules + principal matching from `Identity.scopes`. Per-user scope
from `peer_credentials.metadata.scopes` via `IdentityProvider`.
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
match in `ForwardingRule`. WebTransport clients can be restricted.
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
`IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
## Consequences
- **Positive**: Operators can restrict access per identity, per destination, per
transport. A compromised key no longer grants unrestricted network access.
- **Positive**: Default-allow preserves current behavior during migration. Switch
to default-deny for production deployments.
- **Positive**: Policy is reloadable without restart. Adding a rule via
`reloadForwarding()` takes effect on the next channel open.
- **Positive**: `TransportKind`-aware rules enable transport-specific
restrictions (e.g., WebTransport clients restricted to alknet-* channels).
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
call). The cost is a linear scan of rules — acceptable for small rule sets.
Large rule sets should use compiled matchers (future optimization).
- **Negative**: `TargetPattern` host matching is easy to get wrong. Wildcard
patterns like `*.example.com` require careful implementation to prevent
bypasses. A glob-matching crate such as `glob` or `globset` can handle the
wildcard matching itself.
## References
- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3


@@ -0,0 +1,96 @@
# ADR-032: Event Boundary Discipline
## Status
Accepted
## Context
The research identified three distinct communication patterns in the system, and
conflating them is a known anti-pattern in event-driven architectures:
1. **Domain events** (Honker streams) — Internal to the service that owns that
data. Used for state reconstruction within the service's own boundaries.
Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
2. **irpc service calls** — Synchronous request-response within a node or
cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
`SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
that cross node boundaries. External to the system. Examples:
`call.requested`, `call.responded`, `call.completed`, `call.aborted`.
Without a hard constraint, it's tempting to have one service subscribe directly
to another service's Honker streams. This leads to:
- **Leaky event store**: Service A reads Service B's domain events directly,
coupling A to B's internal state representation. When B changes its schema, A
breaks.
- **Boomerang coupling**: An integration event is too thin, causing the
consumer to call back to the source service synchronously to get details. This
negates the benefit of async communication.
- **Fat notification trap**: A notification event is stuffed with full entity
state when a dedicated state-transfer event should carry that state instead.
## Decision
**Event boundary discipline is a hard architectural constraint, not a
suggestion.**
1. **Domain events stay within the owning service.** A Honker stream published
by the storage service (`nodes:created`) is for the storage service's own
state reconstruction. No other service reads these stream events directly.
2. **irpc service calls are synchronous and internal.** They stay within a node
or cluster and never leave the system. They are request-response, not events.
They should not be used as a substitute for integration events.
3. **Call protocol events are the only events that cross node boundaries.**
`EventEnvelope` frames are the integration boundary. When a domain event
needs to be communicated to another node, it must be projected into a call
protocol event.
4. **Projection from domain events to integration events is required when
crossing boundaries.** A service that owns a Honker stream must project
relevant state changes into `EventEnvelope` frames before they leave the
node. The projection strips internal details and produces a versioned,
stable integration event.
This discipline applies at three levels:
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
A call protocol handler MAY call an irpc service internally (e.g.,
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
use Honker streams for its own state management. But domain events never
propagate beyond the service boundary without projection.
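
The projection step in rule 4 can be sketched as a pure function from a domain event to an optional integration event. Everything in this example is hypothetical: the `DomainEvent` variants, the reduced `EventEnvelope` fields, the `node.created` event type, and the hand-built JSON payload are assumptions for illustration, and the real `EventEnvelope` is defined by the call protocol spec.

```rust
// Hypothetical domain events, internal to a storage-like service.
pub enum DomainEvent {
    NodeCreated { row_id: u64, node_id: String, internal_shard: u8 },
    CacheCompacted { pages_freed: u32 },
}

// Hypothetical, reduced integration envelope.
#[derive(Debug, PartialEq)]
pub struct EventEnvelope {
    pub event_type: String,
    pub schema_version: u32,
    pub payload: String, // JSON text
}

/// Project a domain event into an integration event.
/// Internal fields (row ids, shard numbers) are dropped, and events
/// with no external meaning are not projected at all.
pub fn project(event: &DomainEvent) -> Option<EventEnvelope> {
    match event {
        DomainEvent::NodeCreated { node_id, .. } => Some(EventEnvelope {
            event_type: "node.created".to_string(),
            schema_version: 1,
            payload: format!("{{\"node_id\":\"{node_id}\"}}"),
        }),
        DomainEvent::CacheCompacted { .. } => None,
    }
}
```

Keeping the projection in one function gives reviewers a single place to see which internal state is allowed to leave the service.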
## Consequences
- **Positive**: Prevents leaky event stores. Services are independently
deployable and their internal schemas can evolve without breaking consumers.
- **Positive**: Honker and irpc are implementation details, not cross-boundary
contracts. The call protocol's `EventEnvelope` is the only stable, versioned
contract that other nodes depend on.
- **Positive**: Clear ownership. Each service owns its Honker streams and can
change them freely. Integration events are a deliberate, reviewed contract.
- **Positive**: Makes testing easier. Services can be tested in isolation with
mock domain events. Integration events are tested against the `EventEnvelope`
schema.
- **Negative**: Projection code is required. Every domain event that needs to
cross a boundary must be explicitly projected. This is deliberate — the
overhead ensures the integration contract is intentional.
- **Negative**: Developers must resist the temptation to subscribe directly to
Honker streams across services. Code review should catch this pattern.
## References
- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR-032 entry
- [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md) — Event-driven architecture patterns


@@ -0,0 +1,130 @@
# ADR-033: OperationEnv as Universal Composition Mechanism
## Status
Accepted
## Context
The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
and can invoke any registered operation regardless of whether it runs locally, in
an irpc service on the same cluster, or on a remote node via call protocol.
The research documents define three dispatch paths:
1. **Local dispatch** — direct function call through the operation registry
2. **Service dispatch** — irpc protocol call to a service backend
3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node
Without a formal decision, irpc services could be seen as a replacement for
OperationEnv or for the call protocol. They are not — irpc is one dispatch
backend for OperationEnv, not a replacement for anything. The call protocol is
another dispatch backend. OperationEnv unifies them from the handler's
perspective.
The three communication patterns in the system (ADR-032) are:
- Domain events (Honker streams) — internal to the owning service
- irpc service calls — synchronous, in-cluster
- Call protocol events — asynchronous, cross-node
irpc services and call protocol operations serve different scopes but must
compose cleanly through OperationEnv.
## Decision
**OperationEnv is the universal composition mechanism that all operation
handlers receive. It provides namespace + operation name → invoke with input,
return output, regardless of dispatch path.**
### OperationEnv Behavioral Contract
```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```
The Rust implementation may use typed method dispatch or a registry behind the
scenes, but the handler-facing API must preserve this contract.
### Three Dispatch Paths
OperationEnv resolves each call to one of three dispatch backends:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`. The handler always calls
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
back.
### Service Assembly
The deployment topology determines which dispatch path each operation uses:
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)            // Auth runs locally
    .local("config", config_registry)        // Config runs locally
    .service("secrets", secret_irpc_client)  // Secret service via irpc
    .remote("worker-1", call_protocol_conn); // Worker-1 operations via call protocol
```
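
One way such an assembly could work internally is a namespace-keyed table of dispatch backends. The sketch below is an assumption-heavy illustration: `Backend`, `LocalBackend`, `RoutingEnv`, and `with_backend` are hypothetical names, `Value` is reduced to `String`, and `ResponseEnvelope` is reduced to two fields. Only a local backend is shown; an irpc or call protocol backend would implement the same `Backend` trait.

```rust
use std::collections::HashMap;

pub type Value = String; // stand-in for a JSON value type

#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub ok: bool,
    pub body: Value,
}

pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}

/// One dispatch backend (local registry, irpc client, or remote connection).
pub trait Backend: Send + Sync {
    fn dispatch(&self, operation: &str, input: Value) -> ResponseEnvelope;
}

/// Hypothetical local backend: operations are plain function pointers.
pub struct LocalBackend {
    pub ops: HashMap<String, fn(Value) -> Value>,
}

impl Backend for LocalBackend {
    fn dispatch(&self, operation: &str, input: Value) -> ResponseEnvelope {
        match self.ops.get(operation) {
            Some(op) => ResponseEnvelope { ok: true, body: op(input) },
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown operation: {operation}"),
            },
        }
    }
}

/// Routes each namespace to whichever backend was registered for it.
pub struct RoutingEnv {
    backends: HashMap<String, Box<dyn Backend>>,
}

impl RoutingEnv {
    pub fn new() -> Self {
        Self { backends: HashMap::new() }
    }

    pub fn with_backend(mut self, namespace: &str, backend: Box<dyn Backend>) -> Self {
        self.backends.insert(namespace.to_string(), backend);
        self
    }
}

impl OperationEnv for RoutingEnv {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        match self.backends.get(namespace) {
            Some(backend) => backend.dispatch(operation, input),
            None => ResponseEnvelope {
                ok: false,
                body: format!("unknown namespace: {namespace}"),
            },
        }
    }
}
```

The handler-facing call stays `env.invoke(namespace, operation, input)` regardless of which backend type sits behind the namespace, which is the property the ADR asks for.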
### irpc Services Are One Dispatch Backend
irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
and efficient. But they are not a replacement for OperationEnv or for the call
protocol. They are one dispatch backend.
An irpc service can be exposed as a call protocol operation:
`/head/auth/verify` receives a call protocol event and internally calls
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
### Adapters Map to OperationEnv
HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
(`/call.requested`) all resolve through OperationEnv. This is what makes
operations universally composable across all interfaces.
## Consequences
- **Positive**: Handlers compose through a single interface. Adding a new
dispatch path (e.g., a new irpc service) doesn't change handler code.
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
know which path was taken.
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
OperationEnv interface. One handler, multiple dispatch paths.
- **Positive**: Deployment topology determines dispatch, not code. Same handler
works locally, in-cluster, or cross-node.
- **Negative**: OperationEnv is a new abstraction that must coexist with the
existing call protocol handler pattern. The registry currently maps paths to
handlers; OperationEnv adds namespace-aware composition on top.
- **Negative**: The nested-map model of the `@alkdev/operations` TypeScript
package (conceptually `HashMap<String, HashMap<String, fn>>`) needs an
idiomatic Rust translation. The behavioral contract must match, but the
implementation can differ.
## References
- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation


@@ -0,0 +1,55 @@
# ADR-034: Head/Worker Terminology
## Status
Accepted
## Context
The project previously used hub/spoke terminology for describing node
relationships: a hub node that coordinates connections and spokes that connect to
it. This terminology implies a strict star topology where the hub is
fundamentally different from spokes.
In practice, a coordinating node can also execute operations (run services,
forward traffic). Any node can become a coordinator. The architecture supports
mesh topologies where nodes coordinate in a peer-to-peer fashion.
The research documents (`core.md`, `services.md`) and updated architecture
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
already use head/worker consistently. Existing ADRs (024, 025) retain their
original hub/spoke language because ADRs are historical records.
## Decision
**Use head/worker terminology throughout the project.**
- **Head node**: A node that coordinates — accepts connections, routes
operations, manages cluster state. A head is also a worker (it can execute
operations).
- **Worker node**: A node that connects to a head, registers its services, and
executes operations. Any worker can become a head.
- **Node**: Any participant in the network. Every node has an Ed25519 identity.
The terms hub and spoke are deprecated in all new specs, code, and
documentation. Existing ADRs retain their original language as historical
records — ADRs document what was decided at the time, not what the current
terminology is.
## Consequences
- **Positive**: Natural mesh formation. A head that is also a worker enables
multi-hop routing, redundancy, and distributed topologies without a
centralized authority.
- **Positive**: Consistency with integration plan and research documents.
- **Positive**: The terminology better reflects the architecture — there is no
single "hub" that's fundamentally different from "spokes."
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
intentional — ADRs are historical records.
## References
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
- [research/core.md](../../research/core.md) — Head/worker terminology


@@ -0,0 +1,186 @@
---
status: draft
last_updated: 2026-06-07
---
# FlowGraph
## What
The `alknet-flowgraph` crate provides graph data structures and operations,
mapping the TypeScript `@alkdev/flowgraph` package's call-graph and
operation-graph concepts to `petgraph::DiGraph`.
## Why
Call graphs and operation graphs are core observability and type-safety
constructs. Call graphs track request flow across services; operation graphs
validate type compatibility between composed operations. The crate is pure
computation (no I/O, no external state), making it safe to include in any
deployment topology.
## Architecture
### Core Abstraction
`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the
operations used:
| TypeScript (graphology) | Rust (petgraph) |
|------------------------|-----------------|
| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` + key_to_index |
| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` |
| `hasCycle()` | `is_cyclic_directed(&graph)` |
| `topologicalSort()` | `toposort(&graph, None)` |
A `HashMap<String, NodeIndex>` provides node-key-to-index lookups, mirroring
the `key` column in the SQLite `nodes` table.
### FlowGraph<N, E>
```rust
pub struct FlowGraph<N, E>
where
    N: NodeAttributes,
    E: EdgeAttributes,
{
    graph: DiGraph<N, E>,
    key_to_index: HashMap<String, NodeIndex>,
}

pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn key(&self) -> &str;
    fn set_key(&mut self, key: String);
}

pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync {
    fn edge_type(&self) -> &str;
}
```
### Operation Graph (Static)
Built from `OperationSpec`s at startup. Answers structural questions: type
compatibility, cycle detection, reachability.
```rust
pub struct OperationNodeAttrs {
    pub name: String,
    pub namespace: String,
    pub op_type: OperationType,
    pub input_schema: Value,
    pub output_schema: Value,
}

pub enum OperationType { Query, Mutation, Subscription }
```
Type compatibility compares `output_schema` (source) against `input_schema`
(target) using `jsonschema::validate()`. Exact match or subtype = compatible
edge. Structural mismatch = incompatible edge.
### Call Graph (Dynamic)
Populated at runtime from call protocol events. Every `call.requested` adds a
node; `call.responded`/`call.error`/`call.aborted` update status.
```rust
pub struct CallNodeAttrs {
    pub request_id: String,
    pub operation_id: String,
    pub status: CallStatus,
    pub parent_request_id: Option<String>,
    pub input: Value,
    pub output: Option<Value>,
    pub error: Option<CallErrorInfo>,
    pub identity: Option<Identity>,
    pub started_at: Option<String>,
    pub completed_at: Option<String>,
}

pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }
```
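
The status updates described above can be sketched as a small transition function. This is an illustration under stated assumptions, not the crate's API: `CallEventKind` and `next_status` are hypothetical names, a new node is assumed to start as `Pending`, terminal states are assumed to be final, and the point at which a call becomes `Running` is left out because it is not specified here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallStatus { Pending, Running, Completed, Failed, Aborted }

/// Hypothetical tag for the four call protocol events named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallEventKind { Requested, Responded, Error, Aborted }

impl CallStatus {
    /// Assumption: once a call completes, fails, or aborts, its status is final.
    pub fn is_terminal(self) -> bool {
        matches!(self, CallStatus::Completed | CallStatus::Failed | CallStatus::Aborted)
    }
}

/// Returns the node's status after `event`, or `None` if the event does not
/// apply (unknown request, duplicate request, or already-terminal call).
/// `current` is `None` when no node exists yet for the request id.
pub fn next_status(current: Option<CallStatus>, event: CallEventKind) -> Option<CallStatus> {
    match (current, event) {
        // `call.requested` adds a node (assumed initial status: Pending).
        (None, CallEventKind::Requested) => Some(CallStatus::Pending),
        // Terminal nodes ignore further events.
        (Some(status), _) if status.is_terminal() => None,
        (Some(_), CallEventKind::Responded) => Some(CallStatus::Completed),
        (Some(_), CallEventKind::Error) => Some(CallStatus::Failed),
        (Some(_), CallEventKind::Aborted) => Some(CallStatus::Aborted),
        // Duplicate `call.requested`, or an update for an unknown request.
        _ => None,
    }
}
```

Returning `Option` keeps invalid transitions visible to the caller, which can then decide whether to log or drop the event.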
### Key Operations
| Query | Method | Returns |
|-------|--------|---------|
| Topological order | `topological_order()` | `Result<Vec<String>, CycleError>` |
| Cycle detection | `has_cycles()` | `bool` |
| Ancestors/descendants | `ancestors()`, `descendants()` | `Vec<String>` |
| Status filtering | `filter_by_status()` | Keys with matching status |
| Duration | `duration()` | `completed_at - started_at` |
### DAG Invariants
- **Operation graph**: DAG-only, enforced at construction. A cycle is reported
as a `CycleError`.
- **Call graph**: DAG by design. `parent_request_id` cannot create ancestor
cycles.
- **No parallel edges**: `multi: false`.
- **No self-loops**: `allow_self_loops: false`.
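
`petgraph::DiGraph` itself permits parallel edges and self-loops, so these invariants presumably have to be checked by the wrapper before an edge is inserted. The following std-only sketch shows one way the checks could be ordered. `KeyedDag` and `EdgeError` are hypothetical stand-ins written for this example; they are not the `FlowGraph` implementation and do not use petgraph.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, PartialEq, Eq)]
pub enum EdgeError { UnknownNode(String), SelfLoop, ParallelEdge, Cycle }

/// Hypothetical keyed DAG: string keys map to dense indices, as with
/// `key_to_index` in `FlowGraph`.
#[derive(Default)]
pub struct KeyedDag {
    keys: Vec<String>,                    // index -> key
    key_to_index: HashMap<String, usize>, // key -> index
    out_edges: Vec<HashSet<usize>>,       // adjacency sets, by index
}

impl KeyedDag {
    pub fn new() -> Self { Self::default() }

    /// Adds a node, or returns the existing index if the key is known.
    pub fn add_node(&mut self, key: &str) -> usize {
        if let Some(&index) = self.key_to_index.get(key) {
            return index;
        }
        let index = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), index);
        self.out_edges.push(HashSet::new());
        index
    }

    fn index(&self, key: &str) -> Result<usize, EdgeError> {
        self.key_to_index.get(key).copied()
            .ok_or_else(|| EdgeError::UnknownNode(key.to_string()))
    }

    /// Depth-first search: is `to` reachable from `from`?
    fn reaches(&self, from: usize, to: usize) -> bool {
        let mut seen = HashSet::new();
        let mut stack = vec![from];
        while let Some(node) = stack.pop() {
            if node == to { return true; }
            if seen.insert(node) {
                stack.extend(self.out_edges[node].iter().copied());
            }
        }
        false
    }

    /// Inserts `source -> target` only if all three invariants still hold.
    pub fn add_edge(&mut self, source: &str, target: &str) -> Result<(), EdgeError> {
        let (s, t) = (self.index(source)?, self.index(target)?);
        if s == t { return Err(EdgeError::SelfLoop); }
        if self.out_edges[s].contains(&t) { return Err(EdgeError::ParallelEdge); }
        // If `source` is already reachable from `target`, the edge closes a cycle.
        if self.reaches(t, s) { return Err(EdgeError::Cycle); }
        self.out_edges[s].insert(t);
        Ok(())
    }

    /// Kahn's algorithm. Always yields every node because cycles are rejected
    /// at insertion time.
    pub fn topological_order(&self) -> Vec<String> {
        let mut indegree = vec![0usize; self.keys.len()];
        for targets in &self.out_edges {
            for &t in targets { indegree[t] += 1; }
        }
        let mut queue: VecDeque<usize> =
            (0..self.keys.len()).filter(|&i| indegree[i] == 0).collect();
        let mut order = Vec::with_capacity(self.keys.len());
        while let Some(node) = queue.pop_front() {
            order.push(self.keys[node].clone());
            for &t in &self.out_edges[node] {
                indegree[t] -= 1;
                if indegree[t] == 0 { queue.push_back(t); }
            }
        }
        order
    }
}
```

Checking reachability on every insert costs a traversal per edge. That seems acceptable for an operation graph built once at startup, but it is a design choice of this sketch rather than something the spec mandates.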
### Integration with alknet-storage
Call graphs and operation graphs are stored as metagraph instances in
alknet-storage. The bridge is serialization: `FlowGraph` serializes to
`serde_json::Value`, which storage persists in the `nodes.attributes` and
`edges.attributes` columns.
### Integration with alknet-core (Call Protocol)
The call protocol's `EventEnvelope` drives call graph construction:
```rust
call_map.on_requested(|event| {
    call_graph.update_from_event(&CallEvent::Requested(event));
});
```
### Crate Dependencies
```toml
[dependencies]
petgraph = "0.x"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
thiserror = "1"
uuid = { version = "1", features = ["v4"] }
chrono = { version = "0.x", features = ["serde"] }
```
Does NOT depend on alknet-core, alknet-storage, or alknet-secret.
### Interface Back to Core
`OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions.
The bridge is serialization — flowgraph serializes to JSON, storage persists it.
alknet-flowgraph does not depend on alknet-core as a crate; it conforms to the
`OperationSpec` schema independently.
## Constraints
- Pure computation crate — no I/O, no database, no external state.
- No dependency on alknet-core, alknet-storage, or alknet-secret.
- Type compatibility with alknet-core's `OperationSpec` is via serialization
conformance, not a crate dependency.
## Open Questions
- None specific to this spec. See [open-questions.md](open-questions.md) for
general questions.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-flowgraph is independent of core, storage, secret |
## References
- [research/flow.md](../research/flow.md) — Full FlowGraph, operation graph, call graph design
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.3
- [call-protocol.md](call-protocol.md) — EventEnvelope, PendingRequestMap
- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph implementation
- `@alkdev/operations` — OperationSpec, CallHandler, registry


@@ -0,0 +1,189 @@
---
status: draft
last_updated: 2026-06-07
---
# Identity
## What
The `Identity` type and `IdentityProvider` trait are the core abstractions for
authentication and authorization in alknet. `Identity` is the unified result of
auth verification — whether via SSH public key, signed timestamp token, or
database lookup. `IdentityProvider` is the trait that resolves credentials to an
`Identity`, decoupling alknet-core from any specific identity storage.
## Why
Auth, forwarding policy, and call protocol all need to know who is making a
request and what they are authorized to do. Without `Identity` in core, each
subsystem would define its own identity type, leading to duplication and
conversion boilerplate. Without `IdentityProvider` as a trait, alknet-core
would either hardcode config-file-based auth or take a database dependency —
neither acceptable for a library crate.
The `IdentityProvider` trait exists because the same auth verification concept
needs two implementations: `ConfigIdentityProvider` for minimal deployments (all
keys in memory via ArcSwap) and `StorageIdentityProvider` for production (SQLite
lookup via `peer_credentials` and ACL graph). The trait is the contract; the
backing store is pluggable.
## Architecture
### Identity Struct
```rust
pub struct Identity {
    pub id: String,                              // Fingerprint or account UUID
    pub scopes: Vec<String>,                     // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```
The `id` field serves a dual purpose:
- **Config-based auth** (`ConfigIdentityProvider`): holds the Ed25519 key
fingerprint (e.g., `SHA256:abc123...`)
- **Database-backed auth** (`StorageIdentityProvider`): holds the account UUID
from the `accounts` table
This keeps the type simple while accommodating both auth paths. Downstream
consumers (forwarding policy, call protocol ACL checks) use `scopes` and
`resources` without knowing whether the identity came from a config file or a
database.
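
To make the consumer side concrete, here is a minimal sketch of how a policy check might read `scopes` and `resources`. The helpers `has_scope` and `has_resource` are hypothetical, and exact string matching is an assumption of this example; the spec does not say whether scopes support wildcards or hierarchy.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Hypothetical helper: true if the identity carries exactly this scope.
/// Assumes exact matching (no wildcard or prefix semantics).
pub fn has_scope(identity: &Identity, scope: &str) -> bool {
    identity.scopes.iter().any(|s| s == scope)
}

/// Hypothetical helper: true if `name` is listed under resource kind `kind`.
pub fn has_resource(identity: &Identity, kind: &str, name: &str) -> bool {
    identity
        .resources
        .get(kind)
        .map_or(false, |names| names.iter().any(|n| n == name))
}
```

Neither helper looks at `id`, so the same check works whether the identity came from a config file or a database.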
### IdentityProvider Trait
```rust
pub trait IdentityProvider: Send + Sync + 'static {
    /// Resolve an SSH public key fingerprint to an identity.
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;

    /// Resolve an auth token to an identity.
    /// Returns None if the token is invalid, expired, or the key is not authorized.
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
Both SSH key auth and token auth resolve to the same `Identity` type. The trait
lives in `alknet_core::auth`.
### ConfigIdentityProvider (Default)
Reads from `ArcSwap<DynamicConfig.auth>` per ADR-030. Every authorized key gets
a default scope set. No database dependency. This is the default for CLI and
single-node deployments.
```rust
pub struct ConfigIdentityProvider {
    auth_config: Arc<ArcSwap<DynamicConfig>>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let config = self.auth_config.load();
        config.auth.ssh.authorized_keys.get(fingerprint)
            .map(|key_entry| Identity {
                id: fingerprint.to_string(),
                scopes: key_entry.scopes.clone(),
                resources: key_entry.resources.clone(),
            })
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // Verify Ed25519 signature against the same authorized_keys set
        // Resolve to the same Identity as SSH auth would produce
    }
}
```
### StorageIdentityProvider (Production)
Implemented in `alknet-storage` (not in alknet-core). Backed by SQLite
`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
fingerprint → account → organization membership → effective scopes.
`alknet-storage` implements the `IdentityProvider` trait defined in
alknet-core, so the dependency points from storage to core.
### AuthProtocol irpc Service
The `AuthProtocol` irpc service (behind the `irpc` feature flag per ADR-028)
provides an async boundary for auth verification. It is one way to satisfy the
`IdentityProvider` trait, not a replacement for it:
```rust
enum AuthProtocol {
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
    ReloadKeys,
    CheckAccess { identity: Identity, operation: String },
}

enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```
The relationship:
- **Trait-based path**: Handler calls `identity_provider.resolve_from_fingerprint()`
directly. Zero overhead. Used when irpc is disabled or when the
implementation is local.
- **irpc path**: Handler calls `identity_provider.resolve_from_fingerprint()`,
which internally delegates to `AuthProtocol::VerifyPubkey` via an irpc client.
Used in production deployments with SQLite-backed auth.
Both paths produce the same `Identity` result.
### Auth Flows
**SSH key auth** (existing, unchanged):
```
Client connects → SSH handshake → auth_publickey() callback
→ IdentityProvider::resolve_from_fingerprint(fingerprint)
→ Some(Identity) or None
```
**Token auth** (new, for non-SSH transports):
```
Browser connects → WebTransport CONNECT request
→ Extract token from URL path or Authorization header
→ IdentityProvider::resolve_from_token(token)
→ Some(Identity) or None
```
Both paths produce an `Identity`. The `Identity` is attached to the connection
and used by `ForwardingPolicy` and call protocol for authorization decisions.
## Constraints
- `Identity` and `IdentityProvider` live in `alknet_core::auth`. No database
dependency at the core level (ADR-029).
- alknet-storage implements the core trait — the dependency goes from storage
to core, not the other way.
- The `id` field in `Identity` serves a dual purpose (fingerprint or UUID). This
is a deliberate simplification: downstream consumers don't need to know the
source.
- Certificate authority tokens are not supported for token auth in v1 (ADR-023).
- The irpc feature flag means nodes that only do SSH tunneling don't need the
service layer overhead.
## Open Questions
- None specific to this spec. See [open-questions.md](open-questions.md) for
general auth questions (OQ-15, OQ-19).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` live in alknet-core, not storage |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | `AuthProtocol` behind feature flag; `IdentityProvider` is the contract |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth; same `Identity` result |
## References
- [auth.md](auth.md) — Token authentication, AuthPolicy, WebTransport session handling
- [research/services.md](../research/services.md) — AuthService, AuthProtocol definition
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.2
- [ADR-030](decisions/030-static-dynamic-config-split.md) — DynamicConfig (ConfigIdentityProvider reads from it)
- [ADR-031](decisions/031-forwarding-policy.md) — ForwardingPolicy consumes Identity.scopes


@@ -0,0 +1,221 @@
---
status: draft
last_updated: 2026-06-07
---
# Interface (Layer 2)
## What
The Interface layer sits between Transport (Layer 1) and Protocol (Layer 3).
An Interface consumes a `Transport::Stream` and produces call protocol sessions.
SSH is an interface, not a transport — it wraps a byte stream in session
semantics. Raw framing (4-byte length prefix + JSON `EventEnvelope`) is another
interface, one without SSH overhead.
## Why
In the current architecture, SSH is deeply embedded in `ServerHandler`. This
tangling of transport, interface, and protocol makes it impossible to:
- Run the call protocol over DNS queries without wrapping SSH inside DNS
- Use raw framing for local service mesh (no SSH overhead)
- Support WebTransport direct call protocol for browsers
- Separate auth mechanics from channel management
The three-layer model (ADR-026) cleanly separates these concerns. Transport
produces bytes. Interface parses bytes into sessions. Protocol carries
semantics. A connection is always a (Transport, Interface) pair.
## Architecture
### Three-Layer Model
```
Layer 3: Protocol (Call protocol, Operations, OperationEnv)
Layer 2: Interface (SSH, raw framing, HTTP/WS, DNS control channel)
Layer 1: Transport (TCP, TLS, iroh, DNS, WebTransport)
```
- **Layer 1: Transport** — produces byte streams (`AsyncRead + AsyncWrite + Unpin
+ Send`). Unchanged per ADR-001.
- **Layer 2: Interface** — consumes a `Transport::Stream` and produces call
protocol sessions. SSH does handshake + auth + channel multiplexing. Raw
framing does length-prefix parsing.
- **Layer 3: Protocol** — carries semantics. Call protocol events, operation
registry, service calls. Agnostic to both Transport and Interface below it.
### Interface Trait
```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;

    // Takes `&self` so the server accept loop can call `interface.accept(...)`.
    async fn accept(&self, stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```
The session produced by an interface is consumed by the call protocol handler.
Different interfaces produce different session types, but the call protocol
handler receives `EventEnvelope` frames from any interface.
### SshInterface
Wraps the existing `ServerHandler` logic. This is the most complex interface
because SSH provides channel multiplexing, auth negotiation, and proxy
management within a single session.
What stays in SshInterface (Layer 2):
- SSH handshake and session management
- Auth delegation to `IdentityProvider` (via `auth_publickey()` callback)
- Channel multiplexing (multiple channels per session)
- `alknet-control:0` channel routing to call protocol
What moves to Layer 3 (call protocol handler):
- Operation registry and dispatch
- Forwarding policy checks (per ADR-031)
- Operation context construction (Identity, scopes)
What moves to per-connection state:
- Port forwarding proxy logic
### RawFramingInterface
Reads 4-byte big-endian length prefix + JSON `EventEnvelope` frames directly
from the transport stream. No SSH wrapping. No channel multiplexing — the
entire stream is a single call protocol channel.
```rust
pub struct RawFramingInterface;

#[async_trait]
impl Interface for RawFramingInterface {
    type Session = RawFramingSession;
    // accept() reads length-prefixed EventEnvelope frames from the stream
}
```
Used for:
- DNS control channel (DNS transport + raw framing)
- Local service mesh (TCP + raw framing, no SSH overhead)
- Browser direct call protocol (WebTransport + raw framing, future)
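
The framing itself (4-byte big-endian length prefix followed by the payload) can be sketched with blocking `std::io` traits. This is a simplified illustration: the real interface would be async, `write_frame`, `read_frame`, and `MAX_FRAME_LEN` are hypothetical names, the 1 MiB cap is an assumption added for this example, and the payload is treated as opaque bytes rather than a parsed `EventEnvelope`.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a single frame; the spec does not define one.
pub const MAX_FRAME_LEN: u32 = 1 << 20;

/// Writes one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one frame and returns its payload (the serialized JSON envelope).
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    // Reject oversized lengths before allocating, so a hostile peer cannot
    // force a large allocation with a 4-byte header.
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because the stream carries a single call protocol channel, frames can be read back to back with repeated `read_frame` calls; a truncated stream surfaces as an `UnexpectedEof` error from `read_exact`.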
### DNS Control Channel
A (DNS transport, raw framing interface) pair. The DNS transport encodes
`EventEnvelope` frames as DNS query/response pairs. The raw framing interface
parses them directly — **NOT** SSH inside DNS.
```
Client: Encode EventEnvelope as base32 DNS query labels
→ DNS Transport → DNS Server → Raw Framing Interface → Call Protocol Handler
Server: Return EventEnvelope as DNS TXT record response
← Raw Framing Interface ← DNS Transport ← Call Protocol Handler
```
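
The client-side encoding step can be sketched as follows. This is an assumption-heavy illustration: it uses the RFC 4648 base32 alphabet without padding, splits the result into labels of at most 63 characters (the DNS label limit), and appends a zone suffix. The names `base32_encode` and `to_query_name` are hypothetical, and the actual wire layout (chunking across multiple queries, sequence numbers, the overall name length limit) is not defined in this spec and is not handled here.

```rust
/// RFC 4648 base32 alphabet.
const ALPHABET: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// Maximum length of a single DNS label.
pub const MAX_LABEL_LEN: usize = 63;

/// Unpadded base32 encoding (padding characters are not valid in DNS labels).
pub fn base32_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // bit accumulator; only the low `bits` bits are live
    let mut bits: u32 = 0;
    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((buffer >> bits) & 0x1f) as usize] as char);
        }
    }
    if bits > 0 {
        // Left-align the remaining bits in a final 5-bit group.
        out.push(ALPHABET[((buffer << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// Encodes a frame as dot-separated labels under `zone`.
/// Does not enforce the overall DNS name length limit.
pub fn to_query_name(frame: &[u8], zone: &str) -> String {
    let encoded = base32_encode(frame);
    let mut labels: Vec<&str> = Vec::new();
    let mut rest = encoded.as_str();
    while !rest.is_empty() {
        // The encoded text is ASCII, so splitting at a byte offset is safe.
        let (head, tail) = rest.split_at(rest.len().min(MAX_LABEL_LEN));
        labels.push(head);
        rest = tail;
    }
    labels.push(zone);
    labels.join(".")
}
```

Base32 is a natural fit here because DNS names are case-insensitive, so a case-sensitive encoding such as base64 could be corrupted in transit.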
### Valid (Transport, Interface) Pairs
| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |
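
One way to keep this table enforceable is to encode it as a predicate checked when listeners are configured. The sketch below is illustrative only: `InterfaceKind` and `is_valid_pair` are hypothetical names, the `TransportKind` variants other than `Dns` and `WebTransport` are assumed, and combinations absent from the table are treated as invalid even though the table may not be meant as exhaustive.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportKind { Tcp, Tls, Iroh, Dns, WebTransport }

/// Hypothetical tag for the Layer 2 interface in a listener definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterfaceKind { Ssh, RawFraming }

/// True if the pair appears in the table of valid (Transport, Interface) pairs.
pub fn is_valid_pair(transport: TransportKind, interface: InterfaceKind) -> bool {
    use InterfaceKind::*;
    use TransportKind::*;
    matches!(
        (transport, interface),
        (Tls, Ssh)
            | (Tcp, Ssh)
            | (Iroh, Ssh)
            | (Dns, RawFraming)
            | (WebTransport, Ssh)
            | (WebTransport, RawFraming)
            | (Tcp, RawFraming)
    )
}
```

Rejecting `(Dns, Ssh)` at configuration time matches the decision that the DNS control channel carries call protocol frames only.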
### InterfaceConfig
Different interfaces require different configuration:
```rust
pub enum InterfaceConfig {
    Ssh(SshInterfaceConfig),
    RawFraming(RawFramingConfig),
}

pub struct SshInterfaceConfig {
    pub auth: Arc<dyn IdentityProvider>,
    pub forwarding: Arc<ArcSwap<DynamicConfig>>, // for ForwardingPolicy
    pub host_key: Arc<PrivateKey>,
}

pub struct RawFramingConfig {
    // No SSH-specific config needed
    // Auth is handled by the transport layer (e.g., token auth for WebTransport)
    // or by the call protocol layer
}
```
### Auth Across Interfaces
- **SshInterface**: Auth happens during SSH handshake via
`IdentityProvider::resolve_from_fingerprint()`. The authenticated `Identity`
is attached to the session.
- **RawFramingInterface**: Auth is handled by the transport (e.g., token auth
for WebTransport via `IdentityProvider::resolve_from_token()`) or by the call
protocol layer (operation-level ACL).
Both paths produce the same `Identity` type (ADR-029).
### Server Accept Loop
With the Interface trait, the accept loop becomes:
```rust
for listener in listeners {
    let (transport, interface) = listener;
    tokio::spawn(async move {
        loop {
            let stream = transport.accept().await?;
            let session = interface.accept(stream, &config).await?;
            // session produces call protocol events
            // call protocol handler is interface-agnostic
        }
    });
}
```
## Constraints
- The Interface trait must accommodate both SSH's channel multiplexing and raw
framing's single-stream model through the same abstraction.
- `SshInterface` is the most invasive refactoring in Phase 1. The existing
`ServerHandler` owns auth, channel management, and proxy logic — extracting
these cleanly requires careful design (integration-plan, Phase 1.8).
- DNS transport implementation is Phase 4 work. The `TransportKind::Dns` variant
and `RawFramingInterface` are defined now; implementation is deferred.
- WebTransport is Phase 4 work. The `TransportKind::WebTransport` variant is a
tag only for now.
## Open Questions
- **OQ-IF-01**: How does the `Interface` session type relate to the call
protocol's `EventEnvelope` stream? Does every session implement
`Stream<Item=EventEnvelope>`? This needs design during Phase 1.8.
- **OQ-IF-02**: Should `SshInterface` own the `ForwardingPolicy` check for
`channel_open_direct_tcpip`, or should that move to Layer 3? Current thinking:
the forwarding check is a Layer 3 concern (it's policy, not session mechanics),
but the channel open/close lifecycle is Layer 2. The Interface reports channel
open requests to Layer 3; Layer 3 applies `ForwardingPolicy` and tells
Layer 2 whether to proxy.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Protocol is interface-agnostic |
| [029](decisions/029-identity-core-type.md) | Identity as core type | Auth resolution across interfaces |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Layer 3 policy applied to Layer 2 channel requests |
## References
- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.8, valid (Transport, Interface) pairs
- [research/core.md](../research/core.md) — DNS transport, three-layer model
- [ADR-026](decisions/026-transport-interface-separation.md) — Transport/interface separation
- [transport.md](transport.md) — Transport trait (unchanged at Layer 1)
- [server.md](server.md) — Current ServerHandler (will become SshInterface)
- [identity.md](identity.md) — IdentityProvider, auth across interfaces


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-04
last_updated: 2026-06-07
---
# Open Questions
@@ -96,10 +96,10 @@ last_updated: 2026-06-04
### OQ-12: Per-user forwarding scope vs global rules
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: configuration.md
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-031 — Start with global rules + principal matching from `Identity.scopes`. Per-user scope from `peer_credentials.metadata.scopes` via `IdentityProvider`. The `ForwardingPolicy` evaluates rules against `Identity.id` and `Identity.scopes` from the authenticated identity.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)
### OQ-13: Config file auto-reload via file watching
- **Origin**: [research/configuration.md](../research/configuration.md)
@@ -119,38 +119,59 @@ last_updated: 2026-06-04
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — needs R&D in WebTransport transport session)
- **Cross-references**: [auth.md](auth.md), OQ-19
- **Resolution**: (deferred to Phase 4 — needs R&D in WebTransport transport session)
- **Cross-references**: [auth.md](auth.md), OQ-19, [interface.md](interface.md)
### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to alknet-* channels)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending — defer to forwarding policy design)
- **Cross-references**: configuration.md
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: ADR-031 — Add `TransportKind` match in `ForwardingRule`. WebTransport clients can be restricted to `alknet-*` channels via `TargetPattern::AlknetPrefix` combined with a `TransportKind::WebTransport` filter.
- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md)
### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports)
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples alknet-core from identity storage.
- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [auth.md](auth.md), OQ-15
- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [identity.md](identity.md), OQ-15
### OQ-23: irpc dependency — always or behind feature flag?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: medium —
- **Resolution**: ADR-027 — Feature flag. Nodes that only do SSH tunneling don't need the service layer. irpc is behind a feature flag in alknet-core and an independent dependency in alknet-secret and alknet-storage.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
### OQ-24: DNS control channel scope for initial implementation?
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: medium —
- **Resolution**: ADR-026 — DNS control channel carries call protocol frames only (no SSH tunneling over DNS). The (DNS transport, raw framing interface) pair sends `EventEnvelope` directly. SSH-over-DNS is a future possibility but out of scope.
- **Cross-references**: [ADR-026](decisions/026-transport-interface-separation.md), [interface.md](interface.md)
### OQ-25: alknet-storage and alknet-secret irpc dependency
- **Origin**: [research/integration-plan.md](../research/integration-plan.md)
- **Status**: ~~resolved~~
- **Priority**: low —
- **Resolution**: ADR-027 — Independently. They're separate crates. irpc is a shared library they both use as an independent dependency.
- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md)
## Auth
### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both?
- **Origin**: [auth.md](auth.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-023, [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-029 and ADR-031 — `IdentityProvider` owns scopes. The `Identity` struct includes `scopes` and `resources` fields populated by the `IdentityProvider` implementation (config-based or database-backed). `ForwardingPolicy` uses scopes from `Identity` — it consumes them, it doesn't produce them.
- **Cross-references**: [ADR-029](decisions/029-identity-core-type.md), [ADR-031](decisions/031-forwarding-policy.md), [identity.md](identity.md)
### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS?
- **Origin**: [auth.md](auth.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: OQ-15
- **Resolution**: (deferred to Phase 4 — QUIC is UDP, TLS-over-TCP is TCP, they can share port 443 without conflict)
- **Cross-references**: OQ-15, [interface.md](interface.md)
## Call Protocol
@@ -158,19 +179,65 @@ last_updated: 2026-06-04
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach)
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach but needs spec in call-protocol.md)
- **Cross-references**: ADR-024, ADR-025
### OQ-21: Routing calls to specific workers with same-service operations
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key.
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection.
- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
### OQ-22: Client streaming (streaming inputs) in the call protocol?
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~low~~
- **Resolution**: Deferred. Current model (single request, optional streaming response) covers all identified use cases. Client streaming can be added later if needed.
- **Cross-references**: ADR-024
## Services
### OQ-SVC-01: Should the secret service support multiple seed phrases (one per tenant)?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: ADR-024
- **Resolution**: (deferred — one seed per node is simplest; multi-seed can be added later by indexing `Unlock` with a tenant ID)
- **Cross-references**: [secret-service.md](secret-service.md)
### OQ-SVC-02: Should service protocols use postcard (binary) or JSON for remote calls?
- **Origin**: [research/services.md](../research/services.md)
- **Status**: ~~resolved~~
- **Priority**: low —
- **Resolution**: Postcard for irpc (Rust-to-Rust, efficient). JSON for call protocol (cross-language, universal). The irpc remote path naturally uses postcard.
- **Cross-references**: [services.md](services.md)
### OQ-SVC-03: How does the secret service integrate with the existing EncryptedDataSchema from @alkdev/storage?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — Rust implementation replaces PBKDF2 password-based encryption with derived AES-256-GCM keys; EncryptedData format is a superset; migration by re-encrypting)
- **Cross-references**: [secret-service.md](secret-service.md), [storage.md](storage.md)
### OQ-SVC-04: Should workers cache derived keys locally?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: low
- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session.
- **Cross-references**: [secret-service.md](secret-service.md)
## Interface
### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream?
- **Origin**: [interface.md](interface.md)
- **Status**: open
- **Priority**: high
- **Resolution**: (pending — needs design during Phase 1.8 implementation)
- **Cross-references**: [interface.md](interface.md), [ADR-026](decisions/026-transport-interface-separation.md)
### OQ-IF-02: Should SshInterface own ForwardingPolicy checks or should they move to Layer 3?
- **Origin**: [interface.md](interface.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — current thinking: forwarding check is Layer 3 policy, but channel open/close lifecycle is Layer 2. The Interface reports channel open requests to Layer 3; Layer 3 applies ForwardingPolicy.)
- **Cross-references**: [interface.md](interface.md), [ADR-031](decisions/031-forwarding-policy.md)


@@ -0,0 +1,197 @@
---
status: draft
last_updated: 2026-06-07
---
# Secret Service
## What
The `alknet-secret` crate provides BIP39 mnemonic generation, SLIP-0010 Ed25519
HD key derivation, AES-256-GCM encryption for external credentials, and the
`SecretProtocol` irpc service. It is the only component that holds the master
seed phrase.
## Why
Operations like SSH key generation, API key storage, and Ethereum transaction
signing all need deterministic key derivation from a single root of trust. The
seed phrase is the single recovery mechanism — from it, all self-generated
secrets can be derived on demand. External credentials (third-party API keys,
OAuth tokens) cannot be derived and must be stored encrypted, with the
encryption key itself derived from the seed.
The secret service isolates this responsibility: no other crate sees the seed,
and derived keys are provided on demand through an irpc service interface.
## Architecture
### Security Model
| State | What's in memory | What's on disk |
|-------|-----------------|---------------|
| Locked | Nothing | Encrypted database, derivation path metadata |
| Unlocked | Master seed in RAM | Same (seed is never persisted) |
| After use | Derived keys cached in RAM | Derivation paths only |
The seed phrase is entered once (at node startup or via `Unlock` call), held
only in RAM, and never written to disk. The `Lock` call purges the seed and all
cached derived keys from memory.
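
The locked/unlocked lifecycle above can be sketched as a small state type. This is an illustration under assumptions: `SecretState` and its methods are hypothetical names, and the byte-wise overwrite stands in for a hardened wipe (a real implementation would likely use a dedicated zeroing crate, since a plain overwrite may be optimized away):

```rust
use std::collections::HashMap;

/// In-memory state of the secret service (illustrative; names are hypothetical).
enum SecretState {
    Locked,
    Unlocked {
        seed: Vec<u8>,
        /// Derived keys cached by derivation path.
        cache: HashMap<String, Vec<u8>>,
    },
}

impl SecretState {
    /// Hold the seed in RAM only; nothing here touches disk.
    fn unlock(&mut self, seed: Vec<u8>) {
        *self = SecretState::Unlocked { seed, cache: HashMap::new() };
    }

    /// Overwrite the seed and every cached key, then drop them.
    fn lock(&mut self) {
        if let SecretState::Unlocked { seed, cache } = self {
            seed.iter_mut().for_each(|b| *b = 0);
            for key in cache.values_mut() {
                key.iter_mut().for_each(|b| *b = 0);
            }
            cache.clear();
        }
        *self = SecretState::Locked;
    }

    fn is_unlocked(&self) -> bool {
        matches!(self, SecretState::Unlocked { .. })
    }
}
```

Modelling the state as an enum makes it impossible to reach cached keys while locked, because the `Locked` variant has no fields to read.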
### SecretProtocol irpc Service
```rust
use irpc::{channel::oneshot, rpc_requests};
use serde::{Deserialize, Serialize};

#[rpc_requests(message = SecretMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum SecretProtocol {
    // Key derivation: `path` is a derivation path such as `m/74'/0'/0'/0'`.
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEd25519)]
    DeriveEd25519 { path: String },
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEncryptionKey)]
    DeriveEncryptionKey { path: String },
    #[rpc(tx=oneshot::Sender<DerivedKey>)]
    #[wrap(DeriveEthereumKey)]
    DeriveEthereumKey { path: String },
    #[rpc(tx=oneshot::Sender<Vec<u8>>)]
    #[wrap(DerivePassword)]
    DerivePassword { path: String, length: usize },

    // Encryption of external credentials.
    #[rpc(tx=oneshot::Sender<EncryptedData>)]
    #[wrap(Encrypt)]
    Encrypt { plaintext: String, key_version: u32 },
    #[rpc(tx=oneshot::Sender<String>)]
    #[wrap(Decrypt)]
    Decrypt { encrypted: EncryptedData },

    // Seed lifecycle.
    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Lock)]
    Lock,
    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(Unlock)]
    Unlock { passphrase: String },
}

#[derive(Debug, Serialize, Deserialize)]
struct DerivedKey {
    key_type: KeyType,
    private_key: Vec<u8>,
    public_key: Vec<u8>,
}

#[derive(Debug, Serialize, Deserialize)]
enum KeyType {
    Ed25519,
    Aes256Gcm,
    Secp256k1,
}

#[derive(Debug, Serialize, Deserialize)]
struct EncryptedData {
    key_version: u32,
    salt: String, // Base64-encoded
    iv: String,   // Base64-encoded
    data: String, // Base64-encoded
}
```
### BIP39 Mnemonic and Seed Derivation
```rust
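// Parse and validate the BIP39 phrase (English wordlist).
let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
// Stretch the mnemonic plus the optional passphrase into the BIP39 seed.
let seed = mnemonic.to_seed(Some(&passphrase));
// Derive the master key that roots the derivation paths listed below.
let master_key = ExtendedPrivKey::new_master(Network::Alknet, &seed)?;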
```
### SLIP-0010 Ed25519 HD Key Derivation
The `74'` coin type is unallocated per SLIP-0044 and reserved for alknet.
### Derivation Path Constants
| Path | Purpose | Curve/Algorithm |
|------|---------|----------------|
| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) |
| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 |
| `m/74'/0'/1'/0'` | SSH host key | Ed25519 |
| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic |
| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM |
| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 |
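
To make the path notation concrete, here is a small parser sketch. It assumes the usual BIP32 convention that a trailing `'` sets the hardened bit (`0x80000000`), and it checks the SLIP-0010 rule that Ed25519 derivation supports hardened components only. `parse_path` and `is_valid_ed25519_path` are hypothetical helper names, not part of alknet-secret:

```rust
/// Hardened-index flag from BIP32/SLIP-0010.
const HARDENED: u32 = 0x8000_0000;

/// Parse a path such as `m/74'/0'/0'/0'` into raw child indices.
/// A trailing `'` marks a hardened component.
fn parse_path(path: &str) -> Result<Vec<u32>, String> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err("path must start with 'm'".into());
    }
    parts
        .map(|part| -> Result<u32, String> {
            let (digits, hardened) = match part.strip_suffix('\'') {
                Some(d) => (d, true),
                None => (part, false),
            };
            let index: u32 = digits
                .parse()
                .map_err(|_| format!("invalid component: {part}"))?;
            if index >= HARDENED {
                return Err(format!("index out of range: {part}"));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// SLIP-0010 Ed25519 supports hardened derivation only.
fn is_valid_ed25519_path(indices: &[u32]) -> bool {
    indices.iter().all(|i| (i & HARDENED) != 0)
}
```

Under this check every `m/74'/...` row in the table passes, while the Ethereum path `m/44'/60'/0'/0/0` is rejected for Ed25519 because its last two components are not hardened, which matches its secp256k1 assignment.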
### AES-256-GCM Encryption for External Credentials
External credentials (API keys, OAuth tokens) that cannot be derived are
encrypted using a key derived from the seed at path `m/74'/2'/0'/0'`. The
`EncryptedData` type stores the key version, salt, IV, and ciphertext. This
format is compatible with the existing `@alkdev/storage` `EncryptedDataSchema`.
1. The secret service derives an AES-256-GCM key via path `m/74'/2'/0'/0'`
2. External credentials are encrypted with this key
3. The encrypted data is stored as a `SecretNode` in the metagraph
4. Only the derivation path and key version are stored in plain attributes
5. The seed phrase (or derived encryption key) is held only by the secret
service — never in the database
### Deployment Topologies
**Minimal (single node, CLI)**: Secret service runs in the same process. Seed
phrase entered at startup. All keys derived locally. No irpc overhead.
**Production (head node)**: Secret service runs on a dedicated node or as a
local irpc service. Workers request derived keys via irpc over QUIC. The seed
never leaves the secret service node.
## Constraints
- The seed phrase is never persisted to disk. It is entered at startup or via
`Unlock` and held only in RAM.
- `Lock` purges the seed and all cached derived keys from memory.
- alknet-secret does not depend on alknet-core or alknet-storage. It is fully
independent.
- The `EncryptedData` wire format (key_version, salt, iv, data) is shared with
alknet-storage for compatibility, but this is type-level compatibility — not a
crate dependency.
- Per ADR-032, the secret service's Honker streams (key derivation notifications)
stay within the service boundary. External consumers use irpc calls or call
protocol operations that project to integration events.
- The irpc service defines the wire format for in-cluster communication
(postcard serialization). For call protocol exposure (e.g.,
`/head/secrets/derive`), the service is wrapped in an operation that serializes
to JSON.
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one per
tenant)? See [open-questions.md](open-questions.md).
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
remote calls? Postcard for irpc (Rust-to-Rust), JSON for call protocol
(cross-language). See [open-questions.md](open-questions.md).
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces
PBKDF2 password-based encryption with derived AES-256-GCM keys. The
`EncryptedData` format is a superset.
- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL
(default: 1 hour). The head can revoke by invalidating the session.
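
If OQ-SVC-04 settles on TTL-based caching, a worker-side cache might look roughly like the sketch below. Everything here is an assumption for illustration: `DerivedKeyCache` is a hypothetical type, the clock is passed in explicitly to keep the example deterministic, and `revoke_all` stands in for whatever the head's session invalidation triggers:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Worker-side cache of derived keys (hypothetical; not part of alknet-secret).
struct DerivedKeyCache {
    ttl: Duration,
    entries: HashMap<String, (Vec<u8>, Instant)>,
}

impl DerivedKeyCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a key under its derivation path, stamped with `now`.
    fn insert(&mut self, path: &str, key: Vec<u8>, now: Instant) {
        self.entries.insert(path.to_string(), (key, now));
    }

    /// Return the key if it is younger than the TTL; evict it otherwise.
    fn get(&mut self, path: &str, now: Instant) -> Option<&[u8]> {
        let expired = match self.entries.get(path) {
            Some((_, stored)) => now.duration_since(*stored) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(path);
            return None;
        }
        self.entries.get(path).map(|(key, _)| key.as_slice())
    }

    /// Called when the head invalidates the session.
    fn revoke_all(&mut self) {
        self.entries.clear();
    }
}
```

Evicting on read keeps the sketch free of background timers; a real worker would also want to wipe key bytes on eviction rather than just dropping them.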
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-secret is independent of core and storage |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal |
## References
- [research/services.md](../research/services.md) — SecretProtocol definition, DerivedKey, KeyType
- [research/storage.md](../research/storage.md) — Secrets section, derivation paths, EncryptedData
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.1
- SLIP-0010 — https://github.com/satoshilabs/slips/blob/master/slip-0010.md
- BIP39 — https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki


@@ -0,0 +1,211 @@
---
status: draft
last_updated: 2026-06-07
---
# Services
## What
The irpc service layer decomposes alknet's core responsibilities into
independently testable, deployable, and replaceable components. Auth, Secret,
Config, and Storage are irpc protocol enums that work both as in-process async
boundaries (tokio channels) and cross-process/cross-network (QUIC streams via
noq). OperationEnv is the universal composition mechanism that unifies local
dispatch, irpc service dispatch, and remote call protocol dispatch.
## Why
Without the service layer, auth verification, key derivation, and config reload
are scattered across the codebase with no async boundary. For head nodes serving
many users, in-memory key lookup doesn't scale — auth needs to query a database
on demand. For secret management, the seed must be isolated in its own process
boundary.
Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away: `context.env.invoke("secrets", "derive", input)` works
regardless of dispatch path.
## Architecture
### Service Definition Pattern
Services are defined as irpc protocol enums:
```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
    // ...
}
```
The `#[rpc_requests]` macro generates two versions:
- **Serializable** (`Request`): for remote communication (postcard encoding)
- **With channels** (`RequestWithChannels`): for local communication (tokio channels)
Both use the same `Client<S>` type. The local/remote distinction is transparent
at the call site.
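
The split between plain request data and a request paired with a reply channel can be illustrated with the standard library alone. This is an analogy, not irpc's API: `WithReply`, `spawn_service`, and `verify` are hypothetical names, and `std::sync::mpsc` stands in for the tokio channels mentioned above:

```rust
use std::sync::mpsc;
use std::thread;

/// Plain request data: the part that could be serialized for remote use.
#[derive(Debug)]
struct VerifyPubkey {
    fingerprint: String,
}

/// Local form: the same request paired with a reply channel.
struct WithReply {
    request: VerifyPubkey,
    reply: mpsc::Sender<bool>,
}

/// Spawn a toy "auth service" that accepts one known fingerprint.
fn spawn_service() -> mpsc::Sender<WithReply> {
    let (tx, rx) = mpsc::channel::<WithReply>();
    thread::spawn(move || {
        for msg in rx {
            let ok = msg.request.fingerprint == "SHA256:known";
            // Ignore send errors: the caller may have gone away.
            let _ = msg.reply.send(ok);
        }
    });
    tx
}

/// Client-side helper: send the request and wait for the single reply.
fn verify(service: &mpsc::Sender<WithReply>, fingerprint: &str) -> Option<bool> {
    let (reply, rx) = mpsc::channel();
    let request = VerifyPubkey { fingerprint: fingerprint.to_string() };
    service.send(WithReply { request, reply }).ok()?;
    rx.recv().ok()
}
```

The caller only ever sees `verify`; whether the request travels over an in-process channel or is serialized onto a stream is hidden behind that one function, which is the property the text attributes to `Client<S>`.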
### Core Services
| Service | Protocol | Purpose | Always Local? |
|---------|----------|---------|---------------|
| **Auth** | `AuthProtocol` | Verify identities, check credentials | Can be remote |
| **Secret** | `SecretProtocol` | Derive keys, encrypt/decrypt | Local or remote |
| **Config** | `ConfigProtocol` | Dynamic config reload | Local |
| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations | Local or remote |
### OperationContext
Every handler receives an `OperationContext`:
```rust
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Option<Identity>,
    pub metadata: HashMap<String, Value>,
    pub env: OperationEnv,
    pub trusted: bool, // set by buildEnv(), not by callers
}
```
- **`identity`**: The authenticated identity making the call. Populated by
`IdentityProvider` from the interface layer.
- **`env`**: The operation environment — namespaced access to other operations.
- **`trusted`**: When a handler calls another operation through `env`, the
nested call is `trusted` (skips ACL checks).
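
A reduced sketch of how a nested call's context could be derived from its parent. The `Ctx` type and its constructors are hypothetical stand-ins for `OperationContext`, and linking `parent_request_id` to the caller's `request_id` is an assumption based on the field names:

```rust
/// Reduced stand-in for `OperationContext` (identity, metadata, env omitted).
#[derive(Debug, Clone)]
struct Ctx {
    request_id: String,
    parent_request_id: Option<String>,
    trusted: bool,
}

impl Ctx {
    /// Context for a call arriving from outside: never trusted.
    fn external(request_id: &str) -> Self {
        Ctx {
            request_id: request_id.to_string(),
            parent_request_id: None,
            trusted: false,
        }
    }

    /// Context for a call a handler makes through `env`.
    fn nested(&self, request_id: &str) -> Self {
        Ctx {
            request_id: request_id.to_string(),
            parent_request_id: Some(self.request_id.clone()),
            trusted: true,
        }
    }
}

/// ACL gate: trusted (nested) calls skip the scope check.
fn acl_allows(ctx: &Ctx, caller_has_scope: bool) -> bool {
    ctx.trusted || caller_has_scope
}
```

Because only `nested` ever sets `trusted`, an external caller cannot claim trust by filling in the field, which mirrors the "set by buildEnv(), not by callers" note on the struct.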
### OperationEnv — Universal Composition Mechanism
OperationEnv maps a namespace and an operation name to a single call: invoke
with input, receive output. The handler neither knows nor cares whether the
dispatch is local, irpc, or remote.
Three dispatch paths:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`.
Service assembly determines which path each operation uses:
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)
    .local("config", config_registry)
    .service("secrets", secret_irpc_client)
    .remote("worker-1", call_protocol_conn);
```
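
A toy model of the namespace routing may help here. It is a sketch under assumptions: `Env`, `Dispatch`, and `Handler` are hypothetical names, strings stand in for real inputs and `ResponseEnvelope`, and each dispatch path is reduced to a boxed closure:

```rust
use std::collections::HashMap;

/// The three dispatch paths from the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dispatch {
    Local,
    Service,
    Remote,
}

/// Handler signature: (operation name, input) -> output or error.
type Handler = Box<dyn Fn(&str, &str) -> Result<String, String>>;

/// Toy environment: namespace -> (dispatch path, handler).
#[derive(Default)]
struct Env {
    routes: HashMap<String, (Dispatch, Handler)>,
}

impl Env {
    /// Bind a namespace to one dispatch path at assembly time.
    fn bind(mut self, namespace: &str, dispatch: Dispatch, handler: Handler) -> Self {
        self.routes.insert(namespace.to_string(), (dispatch, handler));
        self
    }

    /// Same call shape regardless of which path serves the namespace.
    fn invoke(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        let (_, handler) = self
            .routes
            .get(namespace)
            .ok_or_else(|| format!("unknown namespace: {namespace}"))?;
        handler(op, input)
    }

    /// Report which path a namespace was bound to.
    fn dispatch_of(&self, namespace: &str) -> Option<Dispatch> {
        self.routes.get(namespace).map(|(dispatch, _)| *dispatch)
    }
}
```

The point of the design is visible in `invoke`: the dispatch path is decided once, when the environment is assembled, so moving `secrets` from local to a remote node changes the assembly code and nothing in the handlers.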
### Service vs Call Protocol vs External Service
These are different concepts that compose through OperationEnv:
- **irpc service**: In-cluster, Rust-to-Rust, type-safe, postcard serialization.
Dispatched by enum variant. Example: `AuthProtocol::VerifyPubkey`.
- **Call protocol operation**: Cross-node, cross-language, path-based, JSON
`EventEnvelope`. Dispatched by namespace + name. Example:
`/head/auth/verify`.
- **External service**: Any endpoint reachable via the call protocol.
Example: a vast.ai instance, an HTTP API, another head node.
An irpc service can back a call protocol operation. The OperationEnv routes to
the appropriate dispatch path:
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
### Adapters
HTTP, MCP, DNS, and WebSocket adapters all resolve through OperationEnv:
- HTTP: `POST /v1/{namespace}/{op}` → `context.env.invoke(namespace, op, input)`
- MCP: `tools/call` with tool name → `context.env.invoke(namespace, op, input)`
- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env.invoke(namespace, op, input)`
- Call protocol: `call.requested` with `operationId` → `context.env.invoke(namespace, op, input)`
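
Two of these mappings are simple enough to sketch. The helpers below are hypothetical (`from_http_path`, `from_dns_name`) and assume exactly the shapes shown in the list: a `/v1/` prefix for HTTP and an `.alk.dev` suffix for DNS. Both reduce a request to the `(namespace, op)` pair that would be passed to `invoke`:

```rust
/// Target of an adapter request: (namespace, operation).
type Target = (String, String);

/// HTTP adapter: `/v1/{namespace}/{op}`.
fn from_http_path(path: &str) -> Option<Target> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace.to_string(), op.to_string()))
}

/// DNS adapter: `{op}.{namespace}.alk.dev`.
fn from_dns_name(name: &str) -> Option<Target> {
    let rest = name.strip_suffix(".alk.dev")?;
    let (op, namespace) = rest.split_once('.')?;
    if namespace.is_empty() || op.is_empty() || namespace.contains('.') {
        return None;
    }
    Some((namespace.to_string(), op.to_string()))
}
```

Note the reversed order in DNS names: the operation comes first and the namespace second, so both adapters normalize to the same tuple before dispatch.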
### Deployment Topologies
**Minimal (single node, CLI)**: All services run locally via tokio channels.
```
┌──────────────────────────────────────────────┐
│                Single Process                │
│  Auth (ArcSwap)    |  Secret (seed in RAM)   │
│  Config (ArcSwap)  |  alknet-core Server     │
└──────────────────────────────────────────────┘
```
**Production (multi-node)**: Auth and secrets on dedicated nodes; workers
access them remotely.
```
Auth Node (SQLite)        Secret Node (seed in RAM)
        ↑                           ↑
        │ QUIC (irpc)               │ QUIC (irpc)
        │                           │
Head Node (Config, Storage, alknet-core Server)
        │ SSH / iroh / TLS
Worker Node (alknet-core Client)
```
## Constraints
- Services are **internal** — they run within a node or cluster.
- The call protocol is **external** — it's how nodes talk to each other.
- Per ADR-032, domain events (Honker streams) stay within the owning service.
irpc calls are synchronous request-response within a node. Call protocol
`EventEnvelope` is the integration boundary between nodes.
- OperationEnv is a hard constraint: the handler-facing API must match the
behavioral contract from `@alkdev/operations`. Namespace + operation name →
invoke with input, return output.
- irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling
don't need the service layer overhead.
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
per tenant)? Defer for now — one seed per node. Multi-seed can be added
later by indexing the `Unlock` call with a tenant ID.
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
remote calls? Postcard for irpc (Rust-to-Rust, efficient). JSON for call
protocol (cross-language, universal). The irpc remote path naturally uses
postcard.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Service crates are independent of core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition mechanism with three dispatch paths |
## References
- [research/services.md](../research/services.md) — Service protocol definitions, OperationContext, deployment topologies
- [research/integration-plan.md](../research/integration-plan.md) — OperationEnv, three dispatch paths, adapter patterns
- [secret-service.md](secret-service.md) — SecretProtocol definition
- [identity.md](identity.md) — IdentityProvider, AuthProtocol
- [configuration.md](configuration.md) — ConfigProtocol, DynamicConfig reload
- [interface.md](interface.md) — Interface layer, auth across interfaces


@@ -0,0 +1,219 @@
---
status: draft
last_updated: 2026-06-07
---
# Storage
## What
The `alknet-storage` crate provides SQLite-backed graph storage, identity
management, access control, and reactivity via honker. It mirrors the
TypeScript `@alkdev/storage` package's design while leveraging Rust's type
system and honker's built-in pub/sub.
## Why
alknet-core needs persistent identity data (authorized keys, accounts, ACLs)
and a way to store and query graph-structured data (call graphs, operation
graphs, metagraph). But alknet-core cannot take a database dependency. The
solution: alknet-storage implements alknet-core's `IdentityProvider` trait,
providing SQLite-backed identity resolution without core knowing about SQLite.
The metagraph is the foundation for ACL, flowgraph persistence, and any future
graph-structured data. It is a three-level type system (GraphType, NodeType,
EdgeType) whose definitions are instantiated as Graph, Node, and Edge records.
## Architecture
### Crate Structure
```
alknet-storage/
├── metagraph/ — GraphType, NodeType, EdgeType persistence
├── identity/ — accounts, organizations, peer_credentials, api_keys, audit_logs
├── acl/ — PrincipalNode, DelegatesEdge, access control graph
├── secrets/ — Encrypted node type, encrypt/decrypt bridge
├── honker/ — honker integration: notify, stream, queue
├── graph/ — GraphInstance, Node, Edge CRUD with schema validation
└── schema/ — JSON Schema definitions (serde + jsonschema)
```
### Metagraph Data Model
Three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints.
2. **NodeType** — A category of node within a graph type. Each has a JSON Schema
for attribute validation.
3. **EdgeType** — A category of edge within a graph type. Each has a JSON Schema
and optional source/target constraints.
Graph instances belong to a graph type and contain nodes and edges conforming
to those type definitions.
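
The optional source/target constraints on an edge type can be sketched as a simple membership check. This is an illustration under assumptions: the `EdgeType` struct here is a reduced, hypothetical stand-in (JSON Schema validation is omitted), and `None` is taken to mean "no constraint":

```rust
/// Reduced stand-in for an edge type definition (JSON Schema omitted).
struct EdgeType {
    name: String,
    /// `None` means any node type is accepted.
    allowed_source: Option<Vec<String>>,
    allowed_target: Option<Vec<String>>,
}

/// Check one side of an edge against its optional constraint list.
fn permits(allowed: &Option<Vec<String>>, node_type: &str) -> bool {
    match allowed {
        Some(types) => types.iter().any(|t| t == node_type),
        None => true,
    }
}

impl EdgeType {
    /// An edge conforms when both endpoints satisfy their constraints.
    fn accepts(&self, source_type: &str, target_type: &str) -> bool {
        permits(&self.allowed_source, source_type)
            && permits(&self.allowed_target, target_type)
    }
}
```

For example, a `can_read` edge type constrained to `PrincipalNode` sources and `ResourceNode` targets would reject an edge drawn in the opposite direction.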
### SQLite Table Schema
Common columns: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`,
`created_at INTEGER TIMESTAMP`, `updated_at INTEGER TIMESTAMP`.
| Table | Key columns |
|-------|------------|
| `graph_types` | id, name (UNIQUE), config JSON, version, scope |
| `node_types` | id, graph_type_id FK, name, schema JSON |
| `edge_types` | id, graph_type_id FK, name, schema JSON, allowed_source/target types |
| `graphs` | id, graph_type_id FK, name, description, status, owner_id, project_id |
| `nodes` | id, graph_id FK, key (UNIQUE per graph), attributes JSON |
| `edges` | id, graph_id FK, key, source_node_key, target_node_key, attributes JSON, undirected |
No FK constraints across database files. Referential integrity is enforced at
the application layer.
### System DB vs Tenant DB
- **System DB** (`system.db`): Identity tables (accounts, organizations,
peer_credentials, api_keys, audit_logs) + system-scoped graph types.
- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables + tenant-scoped graph
types.
### Identity Tables
| Table | Key columns |
|-------|------------|
| `accounts` | email (UNIQUE), display_name, access_level (admin/user/service), status |
| `organizations` | name (UNIQUE), slug (UNIQUE), owner_id FK → accounts |
| `organization_members` | org_id FK, account_id FK, membership_level (owner/admin/member) |
| `api_keys` | owner_id FK, key_hash (UNIQUE), name, enabled, expires_at, revoked_at |
| `peer_credentials` | owner_id FK, credential_type (ssh_key/cert_authority), fingerprint (UNIQUE), public_key_data |
| `audit_logs` | action, owner_id FK, credential_id, org_id FK, details JSON |
### ACL as Metagraph
The ACL graph is a directed metagraph that does not allow parallel edges (non-multigraph):
- **PrincipalNode**: IdentityType (Account, Org, Service, Role) + identity_id + scopes + resources
- **ResourceNode**: The thing being accessed
- **Edges**: can_read, can_write, can_execute, belongs_to, delegates
Delegation edges carry `narrowed_scopes` — the delegate can only exercise scopes
that are a subset of the delegator's.
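
The subset rule for delegation can be expressed directly with set operations. A minimal sketch, assuming scopes are plain strings; `effective_scopes` and `is_valid_delegation` are hypothetical helper names:

```rust
use std::collections::HashSet;

/// Scopes a delegate may exercise: the delegation's `narrowed_scopes`,
/// limited to what the delegator actually holds.
fn effective_scopes(
    delegator: &HashSet<String>,
    narrowed: &HashSet<String>,
) -> HashSet<String> {
    delegator.intersection(narrowed).cloned().collect()
}

/// A delegation edge is well-formed only if it never widens access.
fn is_valid_delegation(delegator: &HashSet<String>, narrowed: &HashSet<String>) -> bool {
    narrowed.is_subset(delegator)
}
```

Computing the intersection at evaluation time, rather than trusting the stored `narrowed_scopes`, means a delegate loses a scope as soon as the delegator does.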
### StorageIdentityProvider
Implements alknet-core's `IdentityProvider` trait (ADR-029). Queries
`peer_credentials` (for SSH key resolution) and `api_keys` (for token auth), then
traverses the ACL graph to compute effective scopes and resources.
```rust
impl IdentityProvider for StorageIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        // 1. Find peer_credentials row by fingerprint
        // 2. Resolve to account → organization membership → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("lookup elided in this spec")
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        // 1. Verify Ed25519 signature against api_keys or peer_credentials
        // 2. Resolve to account → effective scopes
        // 3. Return Identity { id: account_uuid, scopes, resources }
        todo!("lookup elided in this spec")
    }
}
```
### StorageProtocol irpc Service
```rust
#[rpc_requests(message = StorageMessage)]
enum StorageProtocol {
    #[rpc(tx=oneshot::Sender<Graph>)]
    #[wrap(CreateGraph)]
    CreateGraph { graph_type_id: String, name: String },
    #[rpc(tx=oneshot::Sender<Node>)]
    #[wrap(AddNode)]
    AddNode { graph_id: String, key: String, attributes: Value },
    // ... (full protocol in research/services.md)
}
```
### Honker Integration
| Feature | Use case |
|---------|----------|
| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
| `queue` / `claim` / `ack` | Task queue for async operations |
Per ADR-032, honker streams are domain events internal to the storage service.
They are projected to call protocol `EventEnvelope` events when crossing service
boundaries.
### Encrypted Data
alknet-storage references alknet-secret's `EncryptedData` wire format for
storing encrypted nodes (API keys, OAuth tokens). The format (key_version,
salt, iv, data) is shared by type-level compatibility, not a crate
dependency. alknet-secret encrypts; alknet-storage stores the blob.
### Crate Dependencies
```toml
[dependencies]
honker = "0.x"
rusqlite = { version = "0.x", features = ["bundled"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
jsonschema = "0.x"
petgraph = "0.x"
irpc = "0.x"
```
Does NOT depend on alknet-core or alknet-secret. Implements alknet-core's
`IdentityProvider` trait by conforming to its signature, not by direct crate
dependency.
## Constraints
- alknet-storage does NOT depend on alknet-core as a crate. It implements the
`IdentityProvider` trait by conforming to the signature. The CLI binary
wires them together.
- alknet-storage does NOT depend on alknet-secret. They share the `EncryptedData`
wire format by type-level compatibility, not a crate dependency.
- WAL mode for concurrent reads during writes. Single writer per `.db` file.
- JSON Schema validation uses the `jsonschema` crate at runtime (replaces
TypeBox from TypeScript).
- Per ADR-032, honker stream events never cross service boundaries without
projection to `EventEnvelope`.
## Open Questions
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces
PBKDF2 password-based encryption with derived AES-256-GCM keys. The
`EncryptedData` format is a superset — old format can be migrated by
re-encrypting with the new key.
- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL
(default: 1 hour). The head can revoke by invalidating the session.
- **OQ-SVC-05**: How does the smart contract (NFT-based ACL) interact with the
secret service? The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from
the same seed. The smart contract is a separate concern.
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-storage is independent of core and secret |
| [029](decisions/029-identity-core-type.md) | Identity as core type | alknet-storage implements IdentityProvider trait |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Honker streams stay internal; projection to EventEnvelope at boundaries |
## References
- [research/storage.md](../research/storage.md) — Full metagraph, identity, ACL, honker definitions
- [research/services.md](../research/services.md) — StorageProtocol, StorageIdentityProvider
- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.2
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
- [secret-service.md](secret-service.md) — EncryptedData format, derivation paths


@@ -1,7 +1,7 @@
---
id: architecture/adr-026-transport-interface-separation
name: Write ADR-026 — Transport/interface separation (three-layer model)
-status: pending
+status: completed
depends_on: []
scope: moderate
risk: high


@@ -1,7 +1,7 @@
---
id: architecture/adr-027-crate-decomposition
name: Write ADR-027 — Crate decomposition
-status: pending
+status: completed
depends_on:
- architecture/adr-029-identity-core-type
scope: moderate


@@ -1,7 +1,7 @@
---
id: architecture/adr-028-auth-irpc-service
name: Write ADR-028 — Auth as irpc service
-status: pending
+status: completed
depends_on:
- architecture/adr-029-identity-core-type
scope: narrow


@@ -1,7 +1,7 @@
---
id: architecture/adr-029-identity-core-type
name: Write ADR-029 — Identity as core type
-status: pending
+status: completed
depends_on: []
scope: single
risk: low


@@ -1,7 +1,7 @@
---
id: architecture/adr-030-static-dynamic-config-split
name: Write ADR-030 — Static/dynamic config split
-status: pending
+status: completed
depends_on: []
scope: narrow
risk: low


@@ -1,7 +1,7 @@
---
id: architecture/adr-031-forwarding-policy
name: Write ADR-031 — Forwarding policy
-status: pending
+status: completed
depends_on: []
scope: narrow
risk: low


@@ -1,7 +1,7 @@
---
id: architecture/adr-032-event-boundary-discipline
name: Write ADR-032 — Event boundary discipline
-status: pending
+status: completed
depends_on: []
scope: single
risk: low


@@ -1,7 +1,7 @@
---
id: architecture/adr-033-operationenv-irpc-call-protocol
name: Write ADR-033 — OperationEnv, irpc, and call protocol relationship
-status: pending
+status: completed
depends_on:
- architecture/adr-028-auth-irpc-service
- architecture/adr-027-crate-decomposition


@@ -1,7 +1,7 @@
---
id: architecture/adr-034-head-worker-terminology
name: Write ADR-034 — Head/worker terminology
-status: pending
+status: completed
depends_on: []
scope: single
risk: trivial


@@ -1,7 +1,7 @@
---
id: architecture/spec-configuration
name: Promote configuration.md from research to architecture spec
-status: pending
+status: completed
depends_on:
- architecture/adr-030-static-dynamic-config-split
- architecture/adr-031-forwarding-policy


@@ -1,7 +1,7 @@
---
id: architecture/spec-flowgraph
name: Create flowgraph.md architecture spec (or stub referencing crate docs)
-status: pending
+status: completed
depends_on:
- architecture/adr-027-crate-decomposition
scope: narrow


@@ -1,7 +1,7 @@
---
id: architecture/spec-identity
name: Create identity.md architecture spec
-status: pending
+status: completed
depends_on:
- architecture/adr-029-identity-core-type
- architecture/adr-028-auth-irpc-service


@@ -1,7 +1,7 @@
---
id: architecture/spec-interface
name: Create interface.md architecture spec (Layer 2)
-status: pending
+status: completed
depends_on:
- architecture/adr-026-transport-interface-separation
- architecture/adr-033-operationenv-irpc-call-protocol


@@ -1,7 +1,7 @@
---
id: architecture/spec-secret-service
name: Create secret-service.md architecture spec
-status: pending
+status: completed
depends_on:
- architecture/adr-027-crate-decomposition
- architecture/adr-032-event-boundary-discipline


@@ -1,7 +1,7 @@
---
id: architecture/spec-services
name: Create services.md architecture spec (irpc service layer + OperationEnv)
-status: pending
+status: completed
depends_on:
- architecture/adr-033-operationenv-irpc-call-protocol
- architecture/adr-027-crate-decomposition


@@ -1,7 +1,7 @@
---
id: architecture/spec-storage
name: Create storage.md architecture spec (or stub referencing crate docs)
-status: pending
+status: completed
depends_on:
- architecture/adr-027-crate-decomposition
- architecture/adr-029-identity-core-type


@@ -1,7 +1,7 @@
---
id: architecture/spec-update-auth
name: Update auth.md — add IdentityProvider vs AuthService relationship
-status: pending
+status: completed
depends_on:
- architecture/spec-identity
- architecture/adr-028-auth-irpc-service


@@ -1,7 +1,7 @@
---
id: architecture/spec-update-open-questions
name: Update open-questions.md — resolve questions per ADR decisions
-status: pending
+status: completed
depends_on:
- architecture/adr-031-forwarding-policy
- architecture/adr-029-identity-core-type


@@ -1,7 +1,7 @@
---
id: architecture/spec-update-readme
name: Update architecture README.md — add new docs and ADRs to tables
-status: pending
+status: completed
depends_on:
- architecture/spec-configuration
- architecture/spec-identity