docs: complete Phase 0 architecture — spec updates, review fixes, and link portability

Update four existing specs (overview, server, napi-and-pubsub, call-protocol) to
reflect Phase 0 decisions: three-layer model, IdentityProvider, ForwardingPolicy,
OperationEnv, static/dynamic config split. Review all 9 Phase 0a ADRs (026-034)
for consistency. Fix 4 critical issues from architecture review: missing OQ-SVC-05
in open-questions.md, deprecated hub terminology, undefined AuthService and noq
terms. Replace inline OQ text with cross-references per format rules. Add
ConfigServiceImpl definition to configuration.md. Port absolute workspace paths
to project-relative links by copying referenced docs (feasibility, certbot,
fail2ban, event_source_types) into docs/research/.
2026-06-07 11:27:52 +00:00
parent 835724d087
commit d3633b7839
22 changed files with 1508 additions and 115 deletions

@@ -7,25 +7,26 @@ last_updated: 2026-06-07
## Current State
Architecture specification in active development. Phase 0 foundation ADRs
completed (026–034). New spec documents created for identity, services,
interface, configuration, storage, flowgraph, and secret service. Existing
specs updated for the three-layer model, crate decomposition, and unified
identity. See [open-questions.md](open-questions.md) for remaining open
questions.
Architecture specification in active development. Phase 0 foundation complete:
ADRs 001–034 accepted, new spec documents created for all components, existing
specs updated for the three-layer model, crate decomposition, unified identity,
OperationEnv, and forwarding policy. Remaining open questions: OQ-15 (QUIC
coexistence), OQ-19 (WebTransport TLS), OQ-20 (worker registration), OQ-IF-01
(Interface session/EventEnvelope), OQ-IF-02 (ForwardingPolicy placement). See
[open-questions.md](open-questions.md).
## Architecture Documents
| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | reviewed | Package purpose, exports, dependencies |
| [overview.md](overview.md) | reviewed | Package purpose, crate structure, three-layer model, exports, dependencies |
| [transport.md](transport.md) | reviewed | Transport abstraction: TCP, TLS, iroh |
| [auth.md](auth.md) | draft | Unified auth: SSH + token, IdentityProvider trait |
| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, operation registry |
| [call-protocol.md](call-protocol.md) | draft | Bidirectional call/event protocol, OperationEnv, three dispatch paths |
| [client.md](client.md) | reviewed | Client connection, SOCKS5, port forwarding |
| [server.md](server.md) | reviewed | Server acceptance, channel handling, proxy |
| [server.md](server.md) | reviewed | Server acceptance, IdentityProvider, ForwardingPolicy, channel handling |
| [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper and pubsub event target adapter |
| [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper, reload API, pubsub event target adapter |
| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows |
| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths |
| [interface.md](interface.md) | draft | Layer 2: Interface trait, SshInterface, RawFramingInterface |
@@ -44,6 +45,9 @@ questions.
| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker |
| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping |
| [integration-plan.md](../research/integration-plan.md) | draft | Phased integration plan for services, pubsub, and operations |
| [feasibility/](../research/feasibility/) | — | SSH tunnel feasibility assessment and related analyses |
| [event-sourcing/](../research/event-sourcing/) | — | Event sourcing patterns and event-driven architecture reference |
| [ops/](../research/ops/) | — | Production ops reference: certbot, fail2ban |
## ADR Table
@@ -81,6 +85,9 @@ questions.
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted |
| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted |
> ADR numbers 020–022 were allocated to proposals that were withdrawn before
> acceptance and are not listed.
## Open Questions
See [open-questions.md](open-questions.md) for all open and resolved questions.

@@ -13,6 +13,11 @@ subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, head/worker) register their own operations without modifying core.
OperationEnv extends the call protocol with a universal composition mechanism
that unifies local dispatch, irpc service dispatch, and remote dispatch. A
handler calls `context.env.invoke(namespace, op, input)` without knowing
whether the operation runs locally, in-cluster, or on a remote node.
## Why
The current control channel (ADR-018) is unidirectional (client → server) and
@@ -21,6 +26,10 @@ The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the head/worker model where
workers expose operations the head invokes.
Without OperationEnv, handlers calling other operations would need to know
whether the target is local, in-cluster, or on a remote node. OperationEnv
abstracts this away — one handler-facing API, three dispatch backends (ADR-033).
## Architecture
### Operation Paths
@@ -316,6 +325,101 @@ that carries `EventEnvelope` frames:
The framing is always: 4-byte BE length prefix + JSON. The envelope shape is
the same regardless of transport.
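For reference, the framing rule can be sketched with the standard library alone. This illustrates the rule under some assumptions and is not the core implementation. `encode_frame` and `decode_frame` are hypothetical names. The payload is treated as opaque UTF-8, with no JSON parsing. The 16 MiB `MAX_FRAME` cap is an assumed safeguard that this spec does not define.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on one frame; the spec does not state a limit.
const MAX_FRAME: usize = 16 * 1024 * 1024;

/// Write one frame: a 4-byte big-endian length prefix, then the JSON bytes.
fn encode_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame exceeds MAX_FRAME"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Read one frame. Returns Ok(None) if the stream ends while reading the
/// length prefix (a truncated prefix is also treated as end-of-stream here).
fn decode_frame<R: Read>(r: &mut R) -> io::Result<Option<String>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds MAX_FRAME"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?; // a short payload is an error, not end-of-stream
    String::from_utf8(payload)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let mut wire = Vec::new();
    encode_frame(&mut wire, r#"{"type":"call.responded"}"#)?;
    encode_frame(&mut wire, "{}")?;
    let mut reader = io::Cursor::new(wire);
    while let Some(frame) = decode_frame(&mut reader)? {
        println!("{frame}");
    }
    Ok(())
}
```

The sketch reads the prefix and the payload with `read_exact`. That keeps the decoder correct when a frame arrives split across several reads, which is common on TCP and SSH channels.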
### OperationEnv — Universal Composition Mechanism
OperationEnv provides the handler-facing API for composing operations. A handler
calls `context.env.invoke(namespace, operation, input)` and gets back a
`ResponseEnvelope` — regardless of which dispatch path the operation takes
(ADR-033).
Three dispatch paths, one API:
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| **Local** | Direct function call through registry | None (in-process) | Same process |
| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node |
All three produce the same `ResponseEnvelope`. Service assembly determines
which path each operation uses:
```rust
// Minimal deployment (Phase 1: single node, all local)
let env = OperationEnv::local(local_registry);
// Production deployment (Phase 2+: mix of local and remote)
let env = OperationEnv::new()
.local("auth", auth_registry)
.local("config", config_registry)
.service("secrets", secret_irpc_client)
.remote("worker-1", call_protocol_conn);
```
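The routing idea behind the builder can be sketched without any dependencies. The sketch rests on several assumptions, none of which come from the spec:

- `String` stands in for a JSON `Value`.
- The `Backend` enum and its method names are hypothetical.
- `.service()` takes no irpc client.
- The error codes (`unknown_namespace`, `unknown_operation`, `not_implemented`) are invented for illustration.

Only the local path is implemented, which matches the Phase 1 boundary.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct CallError { code: String, message: String, retryable: bool }

#[derive(Debug, Clone, PartialEq)]
struct ResponseEnvelope { request_id: String, result: Result<String, CallError> }

/// A local handler; `String` stands in for a JSON `Value` (assumption).
type Handler = fn(&str) -> Result<String, CallError>;

/// Hypothetical per-namespace backend selection.
enum Backend {
    Local(HashMap<String, Handler>), // Phase 1: direct call through the registry
    Service,                         // Phase 2+: would hold an irpc client
    Remote,                          // Phase 2+: would hold a call protocol connection
}

#[derive(Default)]
struct OperationEnv { namespaces: HashMap<String, Backend> }

fn call_error(code: &str, message: String) -> CallError {
    CallError { code: code.to_string(), message, retryable: false }
}

impl OperationEnv {
    fn local(mut self, ns: &str, ops: HashMap<String, Handler>) -> Self {
        self.namespaces.insert(ns.to_string(), Backend::Local(ops));
        self
    }

    fn service(mut self, ns: &str) -> Self {
        self.namespaces.insert(ns.to_string(), Backend::Service);
        self
    }

    /// One call shape and one return type, whichever backend serves `ns`.
    fn invoke(&self, request_id: &str, ns: &str, op: &str, input: &str) -> ResponseEnvelope {
        let result = match self.namespaces.get(ns) {
            Some(Backend::Local(ops)) => match ops.get(op) {
                Some(handler) => handler(input),
                None => Err(call_error("unknown_operation", format!("{ns}/{op}"))),
            },
            Some(Backend::Service) | Some(Backend::Remote) => Err(call_error(
                "not_implemented",
                format!("{ns}: dispatch path not built in Phase 1"),
            )),
            None => Err(call_error("unknown_namespace", ns.to_string())),
        };
        ResponseEnvelope { request_id: request_id.to_string(), result }
    }
}

fn echo(input: &str) -> Result<String, CallError> {
    Ok(format!("echo:{input}"))
}

fn main() {
    let mut config_ops: HashMap<String, Handler> = HashMap::new();
    config_ops.insert("echo".to_string(), echo);
    let env = OperationEnv::default().local("config", config_ops).service("secrets");
    println!("{:?}", env.invoke("r1", "config", "echo", "hi"));
    println!("{:?}", env.invoke("r2", "secrets", "derive", "{}"));
}
```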
**Phase boundary**: Phase 1 ships with local dispatch only (direct function
calls through the operation registry). The irpc service dispatch and remote
dispatch paths are contracted here but not built yet. irpc service protocols
(`AuthProtocol`, `SecretProtocol`, etc.) are defined in the specs but the
implementations are Phase 2+ work.
**irpc is one dispatch backend for OperationEnv, not a replacement for the
call protocol or for OperationEnv.** A call protocol handler can call an irpc
service internally (e.g., `/head/auth/verify` calls
`AuthProtocol::VerifyPubkey`) — the layers compose. irpc is behind a feature
flag in alknet-core. See [services.md](services.md) for full OperationEnv and
irpc service details.
### OperationContext
Every handler receives an `OperationContext`:
```rust
pub struct OperationContext {
pub request_id: String,
pub parent_request_id: Option<String>,
pub identity: Option<Identity>,
pub metadata: HashMap<String, Value>,
pub env: OperationEnv,
pub trusted: bool, // set by buildEnv(), not by callers
}
```
- **`identity`**: The authenticated identity making the call. Populated by
`IdentityProvider` from the interface layer ([identity.md](identity.md)).
- **`env`**: The operation environment — namespaced access to other operations.
- **`trusted`**: When a handler calls another operation through `env`, the
nested call is `trusted` (skips ACL checks). This prevents double-checking:
if `/head/agent/chat` is allowed, and it internally calls
`/head/auth/verify`, the auth check is trusted.
Handler signature:
```rust
fn handle(input: Value, context: OperationContext) -> ResponseEnvelope;
```
### ResponseEnvelope
The universal return type from all three dispatch paths:
```rust
pub struct ResponseEnvelope {
pub request_id: String,
pub result: Result<Value, CallError>,
}
pub struct CallError {
pub code: String,
pub message: String,
pub retryable: bool,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization. irpc service
dispatch produces postcard-encoded results that are decoded into
`ResponseEnvelope`. Remote dispatch receives `call.responded` EventEnvelope
frames and maps them to `ResponseEnvelope`. The handler always gets the same
type back.
### Relationship to @alkdev/pubsub and @alkdev/operations
The call protocol in core is a Rust reimplementation of the same protocol
@@ -335,11 +439,11 @@ through core, out over SSH channel, into a JavaScript pubsub adapter, and
be dispatched through `@alkdev/operations`'s call handler** — with zero
translation at the wire level.
### Agent Service Pattern (Future)
### Agent Service Pattern (Downstream Application Concern)
An agent service — coordinating between LLM providers and tool calls — is a
primary use case for the call protocol. It would be just another set of
registered operations with no special treatment:
primary downstream use case for the call protocol. It would be just another set
of registered operations with no special treatment:
- `/head/agent/chat` — send a message, get a completion. Routes to the
appropriate LLM provider based on available workers and configuration.
@@ -348,12 +452,10 @@ registered operations with no special treatment:
durable storage).
- `/head/sessions/history` — retrieve a specific session's message history.
The agent service would use the same call protocol to invoke tools on workers
(e.g., `/dev1/fs/readFile` for file access, `/dev1/bash/exec` for shell
commands). This is a **downstream application concern**, not a core
requirement. The call protocol enables it by providing the universal composition
mechanism (OperationEnv, ADR-033), but the agent service itself is built on
top, not into the core.
The agent service uses OperationEnv to invoke tools on workers. **This is a
downstream application concern, not a core requirement.** The call protocol
enables it by providing the universal composition mechanism (ADR-033), but the
agent service itself is built on top, not into the core.
## Constraints
@@ -370,6 +472,16 @@ top, not into the core.
boundary. ACL is enforced at the `AccessControl` level, not by path prefix
alone. A worker that exposes `/dev1/bash/exec` can restrict access via
`required_scopes` — not every authenticated identity should have shell access.
- **OperationEnv composition model matches the `@alkdev/operations` behavioral
contract**: namespace + operation name → invoke with input, return output.
The Rust implementation may differ in structure but must preserve this
contract (ADR-033).
- **irpc is explicitly positioned as one dispatch backend for OperationEnv**
(ADR-033, ADR-028). It is not a replacement for the call protocol or for
OperationEnv.
- **Phase 1 is local dispatch only.** irpc service dispatch and remote dispatch
are contracted in this spec but not built yet. The `OperationEnv::local()`
path is the Phase 1 implementation.
## Open Questions
@@ -378,9 +490,13 @@ top, not into the core.
disconnect, or heartbeat-based discovery? See
[open-questions.md](open-questions.md).
- **OQ-22**: Should the call protocol support streaming inputs (client streaming
in gRPC terms), or is client→server always a single request payload with
streaming only server→client? See [open-questions.md](open-questions.md).
- **OQ-22**: ~~Should the call protocol support streaming inputs (client streaming
in gRPC terms)?~~ Resolved — deferred. Current model covers all identified use
cases. See [open-questions.md](open-questions.md).
- **OQ-IF-01**: How does the `Interface` session type relate to the call
protocol's `EventEnvelope` stream? This needs design during Phase 1.8
implementation. See [open-questions.md](open-questions.md).
## Design Decisions
@@ -389,6 +505,8 @@ top, not into the core.
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved destination for event bus |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Generalizes ADR-018, both sides can call |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | irpc is one dispatch backend for OperationEnv |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition with three dispatch paths |
## References
@@ -396,7 +514,10 @@ top, not into the core.
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [server.md](server.md) — Channel handling and control channel routing
- [transport.md](transport.md) — Transport abstraction
- [configuration.md](../research/configuration.md) — ForwardingPolicy, service metadata
- [identity.md](identity.md) — Identity struct, IdentityProvider trait
- [interface.md](interface.md) — Interface layer, EventEnvelope stream from interfaces
- [configuration.md](configuration.md) — ForwardingPolicy, service metadata
- [services.md](services.md) — OperationEnv, OperationContext, irpc service layer
- `@alkdev/pubsub` — TypeScript event target adapters and `EventEnvelope`
- `@alkdev/operations` — TypeScript call protocol, `OperationSpec`, registry
- `@alkdev/storage` – `peer_credentials` table, ACL graph, `Identity`

@@ -69,6 +69,39 @@ impl ConfigReloadHandle {
Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload.
### ConfigServiceImpl
The Phase 1 implementation of config service logic, backed by
`ArcSwap<DynamicConfig>`. Where `ConfigIdentityProvider` wraps the auth section
of `DynamicConfig`, `ConfigServiceImpl` wraps the forwarding and rate-limit
sections. Both are ArcSwap-backed and share the same `DynamicConfig` instance.
```rust
pub struct ConfigServiceImpl {
dynamic: Arc<ArcSwap<DynamicConfig>>,
}
impl ConfigServiceImpl {
pub fn forwarding_policy(&self) -> Arc<ForwardingPolicy> {
self.dynamic.load().forwarding.clone()
}
pub fn rate_limits(&self) -> Arc<RateLimitConfig> {
self.dynamic.load().rate_limits.clone()
}
pub fn reload(&self, new_config: DynamicConfig) {
self.dynamic.store(Arc::new(new_config));
}
}
```
Phase 1 deploys `ConfigServiceImpl` directly — no irpc service boundary. The
`ConfigProtocol` irpc service (behind feature flag) wraps `ConfigServiceImpl`
for production deployments that use the service layer. This mirrors the
`ConfigIdentityProvider` / `AuthProtocol` pattern from [identity.md](identity.md)
and ADR-028.
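For reload, what matters is that readers take a whole-config snapshot and keep it, while a reload installs a new one. The sketch below shows that behavior with `RwLock<Arc<DynamicConfig>>` as a std-only stand-in for `ArcSwap` (the choice between the two is OQ-14). The trimmed `ForwardingPolicy` and `RateLimitConfig` fields are hypothetical.

```rust
use std::sync::{Arc, RwLock};

#[derive(Debug)]
struct ForwardingPolicy { default_allow: bool }

#[derive(Debug)]
struct RateLimitConfig { max_auth_attempts: u32 }

#[derive(Debug)]
struct DynamicConfig {
    forwarding: Arc<ForwardingPolicy>,
    rate_limits: Arc<RateLimitConfig>,
}

/// Std stand-in for `Arc<ArcSwap<DynamicConfig>>`: readers clone an `Arc`
/// snapshot; `reload` swaps the pointer, so a reader sees either the old or
/// the new config, never a mix of the two.
struct ConfigServiceImpl { dynamic: Arc<RwLock<Arc<DynamicConfig>>> }

impl ConfigServiceImpl {
    fn load(&self) -> Arc<DynamicConfig> {
        self.dynamic.read().expect("config lock poisoned").clone()
    }

    fn forwarding_policy(&self) -> Arc<ForwardingPolicy> {
        self.load().forwarding.clone()
    }

    fn rate_limits(&self) -> Arc<RateLimitConfig> {
        self.load().rate_limits.clone()
    }

    fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().expect("config lock poisoned") = Arc::new(new_config);
    }
}

fn config(default_allow: bool, max_auth_attempts: u32) -> DynamicConfig {
    DynamicConfig {
        forwarding: Arc::new(ForwardingPolicy { default_allow }),
        rate_limits: Arc::new(RateLimitConfig { max_auth_attempts }),
    }
}

fn main() {
    let svc = ConfigServiceImpl { dynamic: Arc::new(RwLock::new(Arc::new(config(false, 5)))) };
    let before = svc.forwarding_policy(); // e.g. held by an existing connection
    svc.reload(config(true, 10));
    println!("held snapshot: {:?}", before);
    println!("after reload: {:?}", svc.forwarding_policy());
}
```

Here a connection holding `before` keeps the old policy until it loads again. This spec does not say whether alknet re-reads the config per connection or per request.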
### ConfigService irpc Service
```rust
@@ -155,7 +188,7 @@ iroh_relay = "https://relay.alk.dev"
| Interface | Static config | Dynamic config | Reload mechanism |
|-----------|--------------|----------------|------------------|
| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) |
| Core Rust | `StaticConfig` struct | `AuthService` (irpc) or `ArcSwap<DynamicConfig>` (minimal) | `ConfigService::reload()` or `ConfigReloadHandle::reload()` |
| Core Rust | `StaticConfig` struct | `AuthProtocol` (irpc) or `ConfigIdentityProvider` (ArcSwap) | `ConfigProtocol::ReloadDynamicConfig` or `ConfigReloadHandle::reload()` |
| NAPI | `serve()` options | Same | `server.reloadAuth()`, `server.reloadForwarding()` |
## Constraints

@@ -23,4 +23,4 @@ This makes adding a new transport (e.g., WebSocket, QUIC directly) a matter of i
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
- [Feasibility assessment §3](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)

@@ -28,4 +28,4 @@ Option 3 was rejected because it would require modifying russh to understand iro
## References
- [transport.md](../transport.md)
- [Feasibility assessment §11](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
- [Feasibility assessment §11](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)

@@ -25,4 +25,4 @@ This is directly enabled by russh's `connect_stream()` and `run_stream()` APIs,
## References
- [transport.md](../transport.md)
- [Feasibility assessment §3.4](../../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
- [Feasibility assessment §3.4](../../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)

@@ -4,7 +4,7 @@
Accepted
## Context
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in `/workspace/system/dev1/certbot.md`), which automates this via the ACME protocol.
TLS transport mode requires certificates. Manual certificate management is error-prone — users need to obtain, install, and renew certificates. Our production setup uses certbot with Let's Encrypt (documented in [certbot.md](../../research/ops/certbot.md)), which automates this via the ACME protocol.
There are two ACME flows:
1. **Domain-based**: Standard flow with DNS-01 or HTTP-01 challenge. Certificate is tied to a domain name, auto-renews via certbot/systemd timer. Requires port 80 or DNS access for challenges.
@@ -35,4 +35,4 @@ The implementation should use the `rustls-acme` crate (or similar pure-Rust ACME
- [server.md](../server.md)
- [OQ-01](../open-questions.md) — resolved by this ADR
- [OQ-07](../open-questions.md) — resolved by this ADR
- Production certbot setup: `/workspace/system/dev1/certbot.md`
- Production certbot setup: [certbot.md](../../research/ops/certbot.md)

@@ -4,7 +4,7 @@
Accepted
## Context
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in `/workspace/system/dev1/fail2ban.md`) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in [fail2ban.md](../../research/ops/fail2ban.md)) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
@@ -36,4 +36,4 @@ This ensures that even without fail2ban, the server rejects obviously abusive co
## References
- [server.md](../server.md)
- [OQ-08](../open-questions.md) — resolved by this ADR
- Production fail2ban setup: `/workspace/system/dev1/fail2ban.md`
- Production fail2ban setup: [fail2ban.md](../../research/ops/fail2ban.md)

@@ -64,17 +64,30 @@ format, but not as a crate dependency.
### Dependency Graph
```
alknet-secret
/ \
/ \
alknet-core ←──── ←── alknet-storage
\ /
alknet-flowgraph
alknet-napi
alknet (CLI binary — assembles everything)
alknet-secret alknet-storage alknet-flowgraph
(standalone) (standalone) (standalone)
(feature flags (trait impl │ (type compat
in CLI binary) via CLI wire) via JSON)
▼ ▼
┌─────────────────────┐
│ alknet-core │
│ (transport, SSH, │
│ call protocol, │
│ Identity, Config) │
└─────────┬───────────┘
┌────────────┼────────────┐
▼ ▼ ▼
alknet-napi alknet (CLI binary — assembles everything)
```
All four library crates (core, secret, storage, flowgraph) are independent of
each other. Dependencies flow **upward** only. The CLI binary sits at the top
and wires concrete implementations together. alknet-storage implements
alknet-core's `IdentityProvider` trait without a crate dependency — the CLI
binary provides the bridge.
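The bridge arrangement can be shown in one file by using modules as stand-ins for crates. In this hypothetical sketch, `verify_pubkey`, `SqliteIdentityStore`, `lookup_account`, and `StorageBridge` are illustrative names rather than the specified APIs. The storage module never names a core type, and only the CLI module sees both. In the real multi-crate layout the trait is foreign to the CLI crate, so the CLI implements it on a local newtype to satisfy Rust's orphan rule.

```rust
use std::collections::HashMap;

use alknet_core::IdentityProvider;

mod alknet_core {
    // Stand-in for alknet-core: owns `Identity` and the `IdentityProvider` trait.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Identity {
        pub id: String,
        pub scopes: Vec<String>,
    }

    pub trait IdentityProvider {
        fn verify_pubkey(&self, fingerprint: &str) -> Option<Identity>;
    }

    /// Core code depends only on the trait, never on a concrete store.
    pub fn authenticate(provider: &dyn IdentityProvider, fingerprint: &str) -> bool {
        provider.verify_pubkey(fingerprint).is_some()
    }
}

mod alknet_storage {
    // Stand-in for alknet-storage: nothing in here names `alknet_core`.
    use std::collections::HashMap;

    pub struct SqliteIdentityStore {
        /// Hypothetical rows: key fingerprint -> (account UUID, scopes).
        pub rows: HashMap<String, (String, Vec<String>)>,
    }

    impl SqliteIdentityStore {
        pub fn lookup_account(&self, fingerprint: &str) -> Option<(String, Vec<String>)> {
            self.rows.get(fingerprint).cloned()
        }
    }
}

mod alknet_cli {
    // Stand-in for the CLI binary: the only module that sees both "crates".
    use super::alknet_core::{Identity, IdentityProvider};
    use super::alknet_storage::SqliteIdentityStore;

    /// Local newtype: in the real layout this is what makes the impl legal.
    pub struct StorageBridge(pub SqliteIdentityStore);

    impl IdentityProvider for StorageBridge {
        fn verify_pubkey(&self, fingerprint: &str) -> Option<Identity> {
            self.0
                .lookup_account(fingerprint)
                .map(|(id, scopes)| Identity { id, scopes })
        }
    }
}

fn main() {
    let mut rows = HashMap::new();
    rows.insert("SHA256:abc".to_string(), ("acct-1".to_string(), vec!["agent".to_string()]));
    let bridge = alknet_cli::StorageBridge(alknet_storage::SqliteIdentityStore { rows });
    println!("{}", alknet_core::authenticate(&bridge, "SHA256:abc"));
    println!("{:?}", bridge.verify_pubkey("SHA256:abc"));
}
```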
### Narrow Interface Points
Three types serve as the narrow interface points between crates:
@@ -147,4 +160,5 @@ alknet-storage does NOT depend on alknet-secret as a crate. Instead:
- [research/services.md](../../research/services.md) — Service protocols
- [research/storage.md](../../research/storage.md) — alknet-storage contents
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (service protocol enabled by decomposition)
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)

@@ -93,4 +93,4 @@ propagate beyond the service boundary without projection.
- [research/services.md](../../research/services.md) — Event boundary discipline section
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
- [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md) — Event-driven architecture patterns
- [event_source_types.md](../../research/event-sourcing/event_source_types.md) — Event-driven architecture patterns

@@ -125,6 +125,8 @@ operations universally composable across all interfaces.
- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
- [ADR-026](026-transport-interface-separation.md) — Three-layer model (OperationEnv is Layer 3)
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service (one dispatch backend)
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation

@@ -1,6 +1,6 @@
---
status: reviewed
last_updated: 2026-06-02
last_updated: 2026-06-07
---
# NAPI Wrapper & PubSub Event Target
@@ -71,11 +71,36 @@ function serve(options: AlknetServeOptions): Promise<AlknetServer>;
interface AlknetServer {
close(): Promise<void>;
onConnection(callback: (stream: Duplex, info: ConnectionInfo) => void): void;
// Dynamic config reload (ADR-030)
reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
reloadForwarding(policy: ForwardingPolicyConfig): void;
reloadAll(config: DynamicConfig): void;
}
interface ForwardingPolicyConfig {
default: 'allow' | 'deny';
rules: ForwardingRuleConfig[];
}
interface ForwardingRuleConfig {
target: string; // "localhost:*", "10.0.0.0/8:80", "alknet-*"
action: 'allow' | 'deny';
principals?: string[]; // default ["*"]
}
```
The NAPI layer is **transport-agnostic** — it doesn't know about pubsub's `EventEnvelope`. The pubsub adapter wraps the `Duplex` stream to implement `TypedEventTarget`. This separation ensures the NAPI wrapper is reusable for any stream-based protocol, not tied specifically to pubsub.
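On the Rust side, a `ForwardingPolicyConfig` like the one above has to be evaluated for each forwarding request. The sketch below is one possible reading of the rule shape. It rests on assumptions the spec does not state:

- Rules are checked in order, and the first match wins.
- `*` is a plain wildcard in targets and principals.
- `a.b.c.d/n:port` means an IPv4 CIDR; IPv6 is not handled.
- A target without a port, such as `alknet-*`, matches a named destination.
- `principals` is defaulted to `["*"]` at parse time.

All names are hypothetical.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action { Allow, Deny }

struct ForwardingRule { target: String, action: Action, principals: Vec<String> }

struct ForwardingPolicy { default: Action, rules: Vec<ForwardingRule> }

/// Wildcard match: `*` matches any (possibly empty) run of characters.
fn glob(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((head, rest)) => match text.strip_prefix(head) {
            Some(tail) => (0..=tail.len())
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| glob(rest, &tail[i..])),
            None => false,
        },
    }
}

/// IPv4 prefix match for patterns like `10.0.0.0/8`; unparsable input never matches.
fn cidr_contains(cidr: &str, host: &str) -> bool {
    let (net, bits) = match cidr.split_once('/') {
        Some(parts) => parts,
        None => return false,
    };
    match (net.parse::<Ipv4Addr>(), bits.parse::<u32>(), host.parse::<Ipv4Addr>()) {
        (Ok(net), Ok(bits), Ok(host)) if bits <= 32 => {
            // Shifting a u32 by 32 overflows, so /0 gets an explicit all-zero mask.
            let mask = if bits == 0 { 0 } else { u32::MAX << (32 - bits) };
            (u32::from(net) & mask) == (u32::from(host) & mask)
        }
        _ => false,
    }
}

/// "host:port" patterns split on the last ':'; portless patterns match the whole name.
fn target_matches(pattern: &str, dest: &str) -> bool {
    match (pattern.rsplit_once(':'), dest.rsplit_once(':')) {
        (Some((p_host, p_port)), Some((d_host, d_port))) => {
            let host_ok = if p_host.contains('/') {
                cidr_contains(p_host, d_host)
            } else {
                glob(p_host, d_host)
            };
            host_ok && glob(p_port, d_port)
        }
        _ => glob(pattern, dest),
    }
}

impl ForwardingPolicy {
    /// First matching rule wins; if none matches, the policy default applies.
    fn evaluate(&self, principal: &str, dest: &str) -> Action {
        self.rules
            .iter()
            .find(|r| target_matches(&r.target, dest) && r.principals.iter().any(|p| glob(p, principal)))
            .map_or(self.default, |r| r.action)
    }
}

fn main() {
    let policy = ForwardingPolicy {
        default: Action::Deny,
        rules: vec![
            ForwardingRule { target: "localhost:*".into(), action: Action::Allow, principals: vec!["*".into()] },
            ForwardingRule { target: "10.0.0.0/8:80".into(), action: Action::Allow, principals: vec!["ops".into()] },
        ],
    };
    for (who, dest) in [("alice", "localhost:5432"), ("alice", "10.1.2.3:80"), ("ops", "10.1.2.3:80")] {
        println!("{who} -> {dest}: {:?}", policy.evaluate(who, dest));
    }
}
```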
### NAPI Call Protocol Integration
NAPI consumers can register operation handlers to participate in the call protocol. The `Duplex` stream from `connect()` or `serve()` carries `EventEnvelope` frames (4-byte BE length prefix + JSON). A TypeScript consumer can implement a call protocol handler that reads these frames and dispatches to registered operations — the same wire protocol used by `@alkdev/operations`.
See [call-protocol.md](call-protocol.md) for the call protocol spec and [services.md](services.md) for OperationEnv and dispatch paths.
### NAPI irpc Service Creation
Behind the `irpc` feature flag, NAPI consumers can create irpc service instances for in-cluster communication. This is a Phase 2+ capability — Phase 1 uses `ConfigIdentityProvider` and direct `ConfigReloadHandle` calls. See [services.md](services.md) for the irpc service layer and ADR-027 for crate decomposition.
### NAPI `connect()` vs CLI `alknet connect`
The NAPI `connect()` function and the CLI `alknet connect` command are fundamentally different operations despite sharing the same name:
@@ -154,4 +179,11 @@ None — all resolved.
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | No file-based config; options are structs or env vars |
| [015](decisions/015-napi-rs-for-ffi-bridge.md) | napi-rs for FFI | Standard Node.js native addon tooling |
| [016](decisions/016-napi-expose-connect-and-serve.md) | Both connect() and serve() | NAPI exposes client and server sides from the start |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `alknet-control` destination for event bus |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel for pubsub | Reserved `alknet-control` destination for event bus |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | NAPI reload methods for auth, forwarding, and all dynamic config |
## References
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, reload mechanism
- [services.md](services.md) — OperationEnv, irpc service layer
- [call-protocol.md](call-protocol.md) — Call protocol wire format and operation registry

@@ -105,7 +105,7 @@ last_updated: 2026-06-07
- **Origin**: [research/configuration.md](../research/configuration.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: No file watching. CLI loads once at startup; NAPI/hub reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
- **Resolution**: No file watching. CLI loads once at startup; NAPI/head reload explicitly. File watching is a potential attack vector and unnecessary complexity for a security tool.
- **Cross-references**: configuration.md
### OQ-14: ArcSwap vs RwLock for dynamic config
@@ -221,11 +221,18 @@ last_updated: 2026-06-07
### OQ-SVC-04: Should workers cache derived keys locally?
- **Origin**: [secret-service.md](secret-service.md)
- **Status**: open
- **Priority**: low
- **Status**: ~~resolved~~
- **Priority**: low
- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session.
- **Cross-references**: [secret-service.md](secret-service.md)
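In practice, this resolution amounts to a small expiring map on the worker. The minimal sketch below assumes keys are cached by derivation path. It also assumes head-side revocation reaches the worker as an explicit `clear()`. `DerivedKeyCache`, its methods, and the 32-byte key type are hypothetical. Passing `now` in explicitly keeps expiry testable without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Default TTL taken from the OQ-SVC-04 resolution (1 hour).
const DEFAULT_TTL: Duration = Duration::from_secs(60 * 60);

/// Hypothetical worker-side cache: derivation path -> (key, time cached).
struct DerivedKeyCache {
    ttl: Duration,
    entries: HashMap<String, ([u8; 32], Instant)>,
}

impl DerivedKeyCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, path: &str, key: [u8; 32], now: Instant) {
        self.entries.insert(path.to_string(), (key, now));
    }

    /// Returns the key while it is younger than the TTL; an expired entry is evicted.
    fn get(&mut self, path: &str, now: Instant) -> Option<[u8; 32]> {
        let (key, cached_at) = *self.entries.get(path)?;
        if now.duration_since(cached_at) < self.ttl {
            Some(key)
        } else {
            self.entries.remove(path);
            None
        }
    }

    /// Revocation: drop everything when the head invalidates the session.
    fn clear(&mut self) {
        self.entries.clear();
    }
}

fn main() {
    let t0 = Instant::now();
    let mut cache = DerivedKeyCache::new(DEFAULT_TTL);
    cache.insert("m/0'/1'", [7u8; 32], t0);
    let fresh = cache.get("m/0'/1'", t0 + Duration::from_secs(59 * 60));
    let stale = cache.get("m/0'/1'", t0 + Duration::from_secs(61 * 60));
    println!("fresh: {}, stale: {}", fresh.is_some(), stale.is_some());
    cache.clear();
}
```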
### OQ-SVC-05: How does the NFT-based ACL smart contract interact with the secret service?
- **Origin**: [storage.md](storage.md)
- **Status**: open
- **Priority**: low
- **Resolution**: The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from the same seed as the secret service. The smart contract is a separate concern — it reads on-chain ACL state, it doesn't call the secret service.
- **Cross-references**: [storage.md](storage.md), [secret-service.md](secret-service.md)
## Interface
### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream?

@@ -1,6 +1,6 @@
---
status: reviewed
last_updated: 2026-06-02
last_updated: 2026-06-07
---
# Alknet Overview
@@ -16,6 +16,64 @@ Alknet is a self-hostable SSH-based tunnel tool that provides VPN-like functiona
The core insight: SSH tunnels work because SSH is fundamental infrastructure. Blocking it breaks the internet. Alknet makes SSH tunneling accessible through a simple CLI with pluggable transports.
## Crate Structure
Alknet is decomposed into six crates with a strict acyclic dependency graph (ADR-027):
| Crate | Purpose | Exists Now? |
|-------|---------|-------------|
| **alknet-core** | Transport, SSH, call protocol, config, auth types, `OperationSpec`, `Interface` trait | Yes |
| **alknet-napi** | Node.js native addon via napi-rs | Yes |
| **alknet-secret** | BIP39, SLIP-0010 HD key derivation, AES-256-GCM, `SecretProtocol` irpc service | Phase 2+ |
| **alknet-storage** | SQLite-backed metagraph, identity tables, ACL graph, honker, `StorageProtocol` | Phase 2+ |
| **alknet-flowgraph** | `FlowGraph<N,E>` over petgraph, operation graph, call graph | Phase 2+ |
| **alknet** (CLI) | Binary that assembles everything with feature flags | Yes |
The four library crates (core, secret, storage, flowgraph) are independent of each other. Dependencies flow upward only: the CLI binary sits at the top and wires concrete implementations together. alknet-storage implements alknet-core's `IdentityProvider` trait without a crate dependency — the CLI binary provides the bridge.
irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling don't need the service layer overhead.
## Three-Layer Model
Alknet uses a three-layer model (ADR-026):
| Layer | Responsibility | Examples |
|-------|---------------|----------|
| **Layer 1: Transport** | Produces byte streams (`AsyncRead + AsyncWrite + Unpin + Send`) | TCP, TLS, iroh, DNS (future), WebTransport (future) |
| **Layer 2: Interface** | Consumes a transport stream and produces call protocol sessions | SSH (handshake + auth + channel multiplexing), raw framing (length-prefix + JSON) |
| **Layer 3: Protocol** | Carries semantics — operation registry, service calls, events | Call protocol, OperationEnv, operation dispatch |
SSH is an interface, not a transport. The three-layer model enables DNS control channels (DNS transport + raw framing), local service mesh (TCP + raw framing), and browser direct call protocol (WebTransport + raw framing) without wrapping SSH inside those transports.
A connection is always a (Transport, Interface) pair. The protocol layer is agnostic to both.
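The (Transport, Interface) pairing can be expressed as two traits that meet only at the byte-stream boundary. To stay dependency-free, this sketch uses blocking `std::io::Read + Write` where alknet uses tokio's `AsyncRead + AsyncWrite`. `LoopbackTransport`, `RawFraming`, `ProtocolSession`, and the method names are hypothetical.

```rust
use std::io::{self, Cursor, Read, Write};

/// Layer 1: produces a byte stream and nothing else.
trait Transport {
    type Stream: Read + Write;
    fn open(&self) -> io::Result<Self::Stream>;
}

/// What Layer 3 needs from a session: a way to send JSON envelopes.
trait ProtocolSession {
    fn send(&mut self, json: &str) -> io::Result<()>;
}

/// Layer 2: consumes any byte stream and yields a protocol session.
trait Interface<S: Read + Write> {
    type Session: ProtocolSession;
    fn establish(&self, stream: S) -> io::Result<Self::Session>;
}

/// In-memory transport for illustration.
struct LoopbackTransport;

impl Transport for LoopbackTransport {
    type Stream = Cursor<Vec<u8>>;
    fn open(&self) -> io::Result<Self::Stream> {
        Ok(Cursor::new(Vec::new()))
    }
}

/// Raw framing interface: no handshake, just length-prefixed JSON.
struct RawFraming;

struct FramedSession<S> {
    stream: S,
}

impl<S: Read + Write> ProtocolSession for FramedSession<S> {
    fn send(&mut self, json: &str) -> io::Result<()> {
        self.stream.write_all(&(json.len() as u32).to_be_bytes())?;
        self.stream.write_all(json.as_bytes())
    }
}

impl<S: Read + Write> Interface<S> for RawFraming {
    type Session = FramedSession<S>;
    fn establish(&self, stream: S) -> io::Result<FramedSession<S>> {
        Ok(FramedSession { stream })
    }
}

/// A connection is a (Transport, Interface) pair; callers only see the session.
fn connect<T, I>(transport: &T, interface: &I) -> io::Result<I::Session>
where
    T: Transport,
    I: Interface<T::Stream>,
{
    interface.establish(transport.open()?)
}

fn main() -> io::Result<()> {
    let mut session = connect(&LoopbackTransport, &RawFraming)?;
    session.send(r#"{"type":"call.responded"}"#)?;
    println!("{} bytes on the wire", session.stream.get_ref().len());
    Ok(())
}
```

Swapping `LoopbackTransport` for a TCP or iroh transport, or `RawFraming` for an SSH interface, would not change `connect` or the code that uses the session.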
## Service Layer
The irpc service layer decomposes alknet's core responsibilities into independently testable, deployable, and replaceable components (ADR-033, [services.md](services.md)):
- **Auth** (`AuthProtocol`) — verify identities, check credentials
- **Secret** (`SecretProtocol`) — derive keys, encrypt/decrypt
- **Config** (`ConfigProtocol`) — dynamic config reload
- **Storage** (`StorageProtocol`) — graph CRUD, metagraph operations
**OperationEnv** is the universal composition mechanism. A handler calls `context.env.invoke("secrets", "derive", input)` without knowing whether the dispatch is local (direct function call), in-cluster (irpc service), or cross-node (call protocol `EventEnvelope`). Three dispatch paths, one handler-facing API.
**Phase boundary**: Phase 1 ships `ConfigIdentityProvider` (ArcSwap-backed) and `ConfigServiceImpl` (ArcSwap-backed) as the only auth and config implementations. The irpc service protocols (`AuthProtocol`, `SecretProtocol`, etc.) and the production deployment topology (multi-node with `StorageIdentityProvider`) are contracted in the specs but will be implemented in Phase 2+. Application services (DockerService, NodeService, agent services) are downstream concerns that build on top of the call protocol and OperationEnv.
## Identity
`Identity` struct and `IdentityProvider` trait are core types in alknet-core (ADR-029, [identity.md](identity.md)):
```rust
pub struct Identity {
pub id: String, // Fingerprint (config auth) or account UUID (database auth)
pub scopes: Vec<String>, // Authorization scope strings
pub resources: HashMap<String, Vec<String>>, // Resource-level authorization
}
```
`IdentityProvider` decouples alknet-core from identity storage. Phase 1 ships `ConfigIdentityProvider`, which reads the `auth` section of `ArcSwap<DynamicConfig>`. `StorageIdentityProvider` (Phase 2+, backed by SQLite) replaces it for production deployments. Both produce the same `Identity` result.
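A config-backed provider of this kind might resolve an SSH key fingerprint against the auth section. It would return the same `Identity` shape that a storage-backed provider returns. This sketch rests on assumptions: the `verify_pubkey` method name, the `AuthConfig` shape (fingerprint mapped to scopes), and `RwLock<Arc<_>>` in place of `ArcSwap` are illustrative. None of them is the trait as specified in identity.md.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,                              // fingerprint for config auth
    pub scopes: Vec<String>,                     // authorization scope strings
    pub resources: HashMap<String, Vec<String>>, // resource-level authorization
}

/// Hypothetical trait shape for illustration.
pub trait IdentityProvider {
    fn verify_pubkey(&self, fingerprint: &str) -> Option<Identity>;
}

/// Hypothetical auth section: authorized key fingerprint -> granted scopes.
pub struct AuthConfig {
    pub authorized: HashMap<String, Vec<String>>,
}

pub struct ConfigIdentityProvider {
    auth: Arc<RwLock<Arc<AuthConfig>>>, // stand-in for the ArcSwap-backed auth section
}

impl IdentityProvider for ConfigIdentityProvider {
    fn verify_pubkey(&self, fingerprint: &str) -> Option<Identity> {
        // Take a snapshot and release the lock before doing the lookup.
        let auth = self.auth.read().ok()?.clone();
        auth.authorized.get(fingerprint).map(|scopes| Identity {
            id: fingerprint.to_string(),
            scopes: scopes.clone(),
            resources: HashMap::new(),
        })
    }
}

fn main() {
    let mut authorized = HashMap::new();
    authorized.insert("SHA256:abc".to_string(), vec!["tunnel".to_string()]);
    let provider = ConfigIdentityProvider {
        auth: Arc::new(RwLock::new(Arc::new(AuthConfig { authorized }))),
    };
    // Callers only see the trait, so a storage-backed provider could be swapped in.
    let dyn_provider: &dyn IdentityProvider = &provider;
    println!("{:?}", dyn_provider.verify_pubkey("SHA256:abc"));
    println!("{:?}", dyn_provider.verify_pubkey("SHA256:unknown"));
}
```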
## Exports
### Binary: `alknet`
@@ -35,24 +93,40 @@ The `alknet-core` crate exports the pluggable components for embedding or progra
- `TcpTransport` — direct TCP connection
- `TlsTransport` — TCP + tokio-rustls TLS
- `IrohTransport` — iroh QUIC P2P connection
- `Interface` trait — consumes transport stream, produces call protocol session
- `Socks5Server` — local SOCKS5 proxy that forwards through SSH channels
- `PortForwarder` — manages local/remote port forwards
- `ServerHandler` — russh server handler with configurable auth and channel policies
- `ConnectOptions` / `ServeOptions` — programmatic configuration structs (no file parsing)
- `Identity` / `IdentityProvider` — core identity types (ADR-029)
- `OperationSpec` — operation registration for call protocol (ADR-025)
- `ConnectOptions` / `ServeOptions` — programmatic configuration structs
- `StaticConfig` / `DynamicConfig` — static/immutable vs. hot-reloadable config (ADR-030)
- `ConfigReloadHandle` — programmatic reload of dynamic config
## Dependencies
| Dependency | Purpose | Feature-gated |
|------------|---------|---------------|
| `russh` | SSH client & server | No (core) |
| `tokio` | Async runtime | No (core) |
| `tokio-rustls` | TLS wrapping | Yes (`tls`) |
| `rustls` | TLS implementation | Yes (`tls`) |
| `rustls-acme` | ACME/Let's Encrypt auto-cert | Yes (`acme`) |
| `iroh` | P2P QUIC transport | Yes (`iroh`) |
| `clap` | CLI argument parsing | No (core) |
| `tracing` | Structured logging | No (core) |
| `anyhow` / `thiserror` | Error handling | No (core) |
| Dependency | Purpose | Crate | Feature-gated |
|------------|---------|-------|---------------|
| `russh` | SSH client & server | core | No (core) |
| `tokio` | Async runtime | core | No (core) |
| `tokio-rustls` | TLS wrapping | core | Yes (`tls`) |
| `rustls` | TLS implementation | core | Yes (`tls`) |
| `rustls-acme` | ACME/Let's Encrypt auto-cert | core | Yes (`acme`) |
| `iroh` | P2P QUIC transport | core | Yes (`iroh`) |
| `irpc` | Streaming RPC service layer | core | Yes (`irpc`) |
| `arc-swap` | Lock-free dynamic config | core | No (core) |
| `serde` | Serialization | core | No (core) |
| `clap` | CLI argument parsing | CLI | No (CLI) |
| `toml` | TOML config file | CLI | No (CLI) |
| `tracing` | Structured logging | core | No (core) |
| `anyhow` / `thiserror` | Error handling | core | No (core) |
| `bip39` | Mnemonic generation | secret | No (secret) |
| `ed25519-bip32` | HD key derivation | secret | No (secret) |
| `aes-gcm` | AES-256-GCM encryption | secret | No (secret) |
| `rusqlite` | SQLite (via honker) | storage | No (storage) |
| `honker` | Event-sourced storage | storage | No (storage) |
| `petgraph` | Graph data structure | storage, flowgraph | No |
| `jsonschema` | JSON Schema validation | storage, flowgraph | No |
> Note: `tun-rs` is no longer a dependency. TUN support is deferred in favor of the external `tun2proxy` tool (ADR-014).
@@ -60,19 +134,29 @@ The `alknet-core` crate exports the pluggable components for embedding or progra
1. **SSH runs over transport, not alongside** — The transport layer produces a single `AsyncRead+AsyncWrite+Unpin+Send` stream. SSH runs over that stream via `russh::client::connect_stream()` / `russh::server::run_stream()`. The SSH layer never knows what transport it's on. (ADR-001, ADR-004)
2. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)
2. **Three-layer model: Transport, Interface, Protocol** — SSH is an interface (Layer 2), not a transport (Layer 1). A connection is always a (Transport, Interface) pair. The call protocol (Layer 3) is agnostic to both. This enables DNS control channels, raw framing, and running the call protocol directly over WebTransport, all without wrapping SSH inside those transports. (ADR-026)
3. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)
3. **SOCKS5 is the primary client interface** — Port forwarding is built on top of SOCKS5-like channel management. For VPN-like "route all traffic" behavior, users run `tun2proxy` alongside alknet's SOCKS5 proxy. TUN is not in the project scope. (ADR-005, ADR-014)
4. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing, no custom config files. (ADR-011)
4. **No logging of tunnel destinations** — The server logs auth attempts and connections (for fail2ban) but does not log `channel_open_direct_tcpip` destinations, DNS lookups, or bytes transferred. (ADR-006, ADR-013)
5. **Feature flags control transport inclusion**`tls`, `iroh`, `acme` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
5. **Programmatic-first API** — Configuration via CLI flags, library API structs (`ConnectOptions`, `ServeOptions`), and environment variables. No `~/.ssh/config` parsing. Optional `--config` TOML file for reproducible deployments. (ADR-011, ADR-030)
6. **Authentication is key-based** — Ed25519 public key (default) and OpenSSH certificate authority. No password authentication over SSH. (ADR-012)
6. **Feature flags control transport inclusion**`tls`, `iroh`, `acme`, `irpc` are feature-gated so the base install is lean. Users opt in to heavier dependencies.
7. **NAPI exposes both connect() and serve()** — The napi-rs wrapper provides client and server functionality, using napi-rs as the FFI bridge. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)
7. **Authentication is key-based and unified** — Ed25519 public key (default) and OpenSSH certificate authority. Same key material for SSH and token auth. Identity resolves through `IdentityProvider` trait, decoupling core from identity storage. (ADR-012, ADR-023, ADR-029)
8. **Error handling follows a consistent layered pattern** — Transport and auth errors cause reconnection (client, with exponential backoff) or connection rejection (server). Channel-level errors (target unreachable, proxy failure) close the individual channel without killing the session. Library API errors propagate via `anyhow::Result` / `thiserror` types. CLI reports errors to stderr with appropriate exit codes. NAPI errors are marshalled as JavaScript exceptions.
8. **NAPI exposes both connect() and serve()** — The napi-rs wrapper provides client and server functionality, using napi-rs as the FFI bridge. The NAPI layer is transport-agnostic and not tied to pubsub. (ADR-015, ADR-016)
9. **Static/dynamic config split** — Transport-level settings (listen address, TLS certs) are immutable after startup. Auth, forwarding policy, and rate limits are hot-reloadable via `ArcSwap<DynamicConfig>`. (ADR-030)
10. **Forwarding policy enforced before proxy spawn** — Each `channel_open_direct_tcpip` is checked against `ForwardingPolicy` before a TCP connection is made. Default-allow preserves current behavior. (ADR-031)
11. **OperationEnv as universal composition mechanism** — Handlers call `context.env.invoke(namespace, op, input)` regardless of dispatch path (local, irpc service, remote call protocol). (ADR-033)
12. **Event boundary discipline** — Domain events (Honker streams) stay within the owning service. irpc calls are synchronous and in-cluster. Call protocol `EventEnvelope` is the only thing that crosses node boundaries. (ADR-032)
13. **Error handling follows a consistent layered pattern** — Transport and auth errors cause reconnection (client, with exponential backoff) or connection rejection (server). Channel-level errors (target unreachable, proxy failure) close the individual channel without killing the session. Library API errors propagate via `anyhow::Result` / `thiserror` types. CLI reports errors to stderr with appropriate exit codes. NAPI errors are marshalled as JavaScript exceptions.
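Principle 9 can be illustrated with a small std-only sketch of the static/dynamic split. It assumes hypothetical field names and a hypothetical `ConfigReloadHandle` shape, and uses `RwLock<Arc<DynamicConfig>>` as a stand-in for `ArcSwap<DynamicConfig>`, which provides the same snapshot-and-replace semantics with a lock-free read path.

```rust
use std::sync::{Arc, RwLock};

/// Fixed at startup (listen address, TLS cert paths). Fields are illustrative.
struct StaticConfig {
    listen: String,
}

/// Hot-reloadable settings (auth, forwarding policy, rate limits). Fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct DynamicConfig {
    authorized_fingerprints: Vec<String>,
    auth_attempts_per_minute: u32,
}

/// Hypothetical reload handle. `RwLock<Arc<_>>` approximates `ArcSwap<DynamicConfig>`
/// using only std; ArcSwap additionally makes the read path lock-free.
#[derive(Clone)]
struct ConfigReloadHandle {
    slot: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    fn new(initial: DynamicConfig) -> Self {
        Self { slot: Arc::new(RwLock::new(Arc::new(initial))) }
    }

    /// Take a snapshot; it stays valid and unchanged even if a reload happens.
    fn load(&self) -> Arc<DynamicConfig> {
        Arc::clone(&*self.slot.read().unwrap())
    }

    /// Publish a complete new DynamicConfig; later `load()` calls see it.
    fn reload(&self, next: DynamicConfig) {
        *self.slot.write().unwrap() = Arc::new(next);
    }
}

fn main() {
    let statics = StaticConfig { listen: "0.0.0.0:443".to_string() };
    let handle = ConfigReloadHandle::new(DynamicConfig {
        authorized_fingerprints: vec!["SHA256:abc".to_string()],
        auth_attempts_per_minute: 10,
    });

    let in_flight = handle.load(); // e.g. held by a connection mid-handshake
    handle.reload(DynamicConfig { authorized_fingerprints: vec![], auth_attempts_per_minute: 5 });

    println!(
        "listen={} old={} new={}",
        statics.listen,
        in_flight.auth_attempts_per_minute,
        handle.load().auth_attempts_per_minute
    );
}
```

Holding a snapshot `Arc` lets a connection finish against the config it started with while new connections pick up the reload; `StaticConfig` has no reload path at all.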
## Design Decisions
@@ -88,7 +172,7 @@ The `alknet-core` crate exports the pluggable components for embedding or progra
| [008](decisions/008-acme-lets-encrypt.md) | ACME/Let's Encrypt | Auto-provision TLS certs, domain and IP paths |
| [009](decisions/009-default-iroh-relay.md) | Default iroh relay | n0 relay by default, `--iroh-relay` override |
| [010](decisions/010-transport-chaining-cli.md) | Transport chaining | `--proxy` works with all transports natively |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No file-based config; options are structs, env vars, CLI flags |
| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first | No SSH config files; options are structs, env vars, CLI flags (amended by ADR-030 for optional TOML) |
| [012](decisions/012-auth-ed25519-and-cert-authority.md) | Key + cert-authority | Ed25519 keys + OpenSSH CA; no password auth |
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly | Structured auth logs + built-in rate limiting |
| [014](decisions/014-defer-tun-recommend-socks5-proxy.md) | Defer TUN | Use tun2proxy for VPN-like behavior; no alknet-tun binary |
@@ -97,17 +181,46 @@ The `alknet-core` crate exports the pluggable components for embedding or progra
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth |
| [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol | Both sides can initiate calls |
| [025](decisions/025-handler-spec-separation.md) | Handler/spec separation | Downstream registers operations without modifying core |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 |
| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Six crates, acyclic deps, feature-gated irpc |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract, irpc is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config | ArcSwap for hot-reloadable auth and forwarding |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Per-identity, per-destination, per-transport rules |
| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries |
| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition, three dispatch paths |
| [034](decisions/034-head-worker-terminology.md) | Head/worker | Replaces hub/spoke terminology |
## Open Questions
All open questions have been resolved. See [open-questions.md](open-questions.md) for resolution details.
See [open-questions.md](open-questions.md) for all open and resolved questions.
Key open questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS),
OQ-20 (worker registration), OQ-IF-01 (Interface session / EventEnvelope
relationship), OQ-IF-02 (ForwardingPolicy placement).
## References
- [Feasibility Assessment](../../../conversations/research/ssh-tunnel-vpn-alternative-feasibility.md)
- [transport.md](transport.md) — Transport abstraction (Layer 1)
- [interface.md](interface.md) — Interface layer (Layer 2)
- [call-protocol.md](call-protocol.md) — Call protocol (Layer 3)
- [auth.md](auth.md) — Unified authentication
- [identity.md](identity.md) — Identity and IdentityProvider
- [configuration.md](configuration.md) — StaticConfig, DynamicConfig, ForwardingPolicy
- [services.md](services.md) — irpc service layer, OperationEnv
- [server.md](server.md) — Server acceptance, channel handling
- [client.md](client.md) — Client connection, SOCKS5, port forwarding
- [napi-and-pubsub.md](napi-and-pubsub.md) — NAPI wrapper and pubsub adapter
- [storage.md](storage.md) — alknet-storage: metagraph, identity, ACL
- [flowgraph.md](flowgraph.md) — alknet-flowgraph: call graph, operation graph
- [secret-service.md](secret-service.md) — alknet-secret: BIP39, SLIP-0010, AES-GCM
- [Feasibility Assessment](../research/feasibility/ssh-tunnel-vpn-alternative-feasibility.md)
- [russh API](/workspace/russh) — SSH client/server library
- [Dispatch](/workspace/@alkdev/dispatch) — Reference implementation of russh port forwarding
- [iroh](/workspace/iroh) — P2P QUIC connections
- [tun2proxy](https://github.com/tun2proxy/tun2proxy) — Recommended external TUN-to-SOCKS5 tool
- [Production certbot setup](/workspace/system/dev1/certbot.md) — Let's Encrypt on our infrastructure
- [Production fail2ban setup](/workspace/system/dev1/fail2ban.md) — fail2ban with nftables on our infrastructure
- [irpc](/workspace/irpc) — iroh streaming RPC
- [Production certbot setup](../research/ops/certbot.md) — Let's Encrypt on our infrastructure
- [Production fail2ban setup](../research/ops/fail2ban.md) — fail2ban with nftables on our infrastructure

@@ -166,20 +166,16 @@ never leaves the secret service node.
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one per
tenant)? See [open-questions.md](open-questions.md).
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
remote calls? Postcard for irpc (Rust-to-Rust), JSON for call protocol
(cross-language). See [open-questions.md](open-questions.md).
remote calls? See [open-questions.md](open-questions.md).
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces
PBKDF2 password-based encryption with derived AES-256-GCM keys. The
`EncryptedData` format is a superset.
`EncryptedDataSchema` from `@alkdev/storage`? See [open-questions.md](open-questions.md).
- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL
(default: 1 hour). The head can revoke by invalidating the session.
- **OQ-SVC-04**: Should workers cache derived keys locally? See [open-questions.md](open-questions.md).
## Design Decisions

@@ -1,6 +1,6 @@
---
status: reviewed
last_updated: 2026-06-02
last_updated: 2026-06-07
---
# Server
@@ -51,21 +51,30 @@ The server is the tunnel endpoint. It receives SSH channels requesting TCP conne
### Authentication
The server supports Ed25519 public key authentication (default) and OpenSSH certificate authority authentication (ADR-012):
The server authenticates connections through the `IdentityProvider` trait (ADR-029, [identity.md](identity.md)). `IdentityProvider` decouples the server from any specific identity storage — the server resolves an identity, it doesn't manage keys.
**Ed25519 public key** (default):
1. Load authorized keys from a specified path or in-memory data
2. `auth_publickey()` checks the presented key against the authorized set
3. Uses constant-time comparison to prevent timing attacks
**Phase 1 implementation**: `ConfigIdentityProvider` (in alknet-core) reads the `auth` section of `ArcSwap<DynamicConfig>` (ADR-030). Every authorized key gets a default scope set. No database is required. This is the default for CLI and single-node deployments.
**OpenSSH certificate authority** (ADR-012):
1. Load a trusted CA public key (`--cert-authority <path>`)
2. `auth_publickey()` validates the presented certificate: checks CA signature, expiry, and principal restrictions
3. Supports certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
**Future implementation**: `StorageIdentityProvider` (in alknet-storage, not yet built) backed by SQLite `peer_credentials` and `api_keys` tables plus the ACL graph. The server doesn't need to know which implementation is active — it goes through the trait.
This enables multi-user deployments where adding one CA line to `authorized_keys` is simpler than managing individual keys for every user.
The server supports two auth presentation paths (ADR-023, [auth.md](auth.md)):
**No password authentication over SSH.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
**SSH public key auth** (SSH transports):
1. `auth_publickey()` callback receives the presented key
2. Delegates to `IdentityProvider::resolve_from_fingerprint()` with the key fingerprint
3. Returns `Accept` (with `Identity` attached) or `Reject`
**Ed25519 + OpenSSH certificate authority** (ADR-012):
1. If no direct key match, validate the presented certificate against trusted cert-authorities
2. Check CA signature, expiry, and principal restrictions
3. Certificate options: `permit-port-forwarding`, `no-pty`, `source-address`
**Token auth** (non-SSH transports, WebTransport):
1. Extract token from URL path or `Authorization` header
2. Delegate to `IdentityProvider::resolve_from_token()`
3. Verify against the same authorized key set as SSH auth, producing the same `Identity` (ADR-023)
**No password authentication over SSH channels.** Keys and certificates are sufficient and more secure. If a local SOCKS5 proxy needs its own auth layer, that's a separate concern.
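The two presentation paths converge on one provider and one `Identity`. The sketch below assumes a synchronous trait with the two resolver methods named above, a trimmed `Identity`, and a closure standing in for certificate validation. `Credential`, `AuthDecision`, `authenticate`, and the token-to-fingerprint map in `KeySet` are all hypothetical; in particular, the map is not how ADR-023 actually verifies tokens.

```rust
use std::collections::HashMap;

/// Trimmed Identity (resources omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Hypothetical synchronous shape of the two resolver methods.
trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// How the client presented itself.
enum Credential<'a> {
    SshKey(&'a str), // fingerprint of the key offered to auth_publickey()
    Token(&'a str),  // token taken from the URL path or Authorization header
}

#[derive(Debug, PartialEq)]
enum AuthDecision {
    Accept(Identity),
    Reject,
}

/// Both presentation paths end in the same provider and the same Identity.
/// `cert_authority` stands in for certificate validation (CA signature,
/// expiry, principals) and is tried only when no direct key matches.
fn authenticate(
    provider: &dyn IdentityProvider,
    credential: Credential<'_>,
    cert_authority: &dyn Fn(&str) -> Option<Identity>,
) -> AuthDecision {
    let resolved = match credential {
        Credential::SshKey(fp) => provider
            .resolve_from_fingerprint(fp)
            .or_else(|| cert_authority(fp)),
        Credential::Token(token) => provider.resolve_from_token(token),
    };
    match resolved {
        Some(identity) => AuthDecision::Accept(identity),
        None => AuthDecision::Reject,
    }
}

/// Hypothetical provider with one authorized key set for both paths. `tokens`
/// maps a token to the fingerprint of the key that issued it, standing in for
/// real token verification.
struct KeySet {
    keys: HashMap<String, Vec<String>>,
    tokens: HashMap<String, String>,
}

impl IdentityProvider for KeySet {
    fn resolve_from_fingerprint(&self, fp: &str) -> Option<Identity> {
        self.keys.get(fp).map(|scopes| Identity { id: fp.to_string(), scopes: scopes.clone() })
    }

    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        self.tokens.get(token).and_then(|fp| self.resolve_from_fingerprint(fp))
    }
}

fn main() {
    let provider = KeySet {
        keys: HashMap::from([("SHA256:abc".to_string(), vec!["tunnel".to_string()])]),
        tokens: HashMap::from([("tok-1".to_string(), "SHA256:abc".to_string())]),
    };
    let no_ca = |_: &str| -> Option<Identity> { None };
    println!("{:?}", authenticate(&provider, Credential::SshKey("SHA256:abc"), &no_ca));
    println!("{:?}", authenticate(&provider, Credential::Token("tok-1"), &no_ca));
}
```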
### Key Material Format
@@ -87,7 +96,9 @@ When a client opens a `channel_open_direct_tcpip(host, port, originator_addr, or
**Reserved destination** — If `host` starts with `alknet-` (e.g., `alknet-control`), the server routes the channel internally instead of connecting to a TCP target. The primary reserved destination is `alknet-control:0`, which bridges the channel to the local pubsub event bus (ADR-018).
**Regular destination** — For all other targets:
**Forwarding policy check** — Before the proxy task is spawned for any non-reserved destination, the server evaluates `ForwardingPolicy` against the authenticated `Identity` (ADR-031, [configuration.md](configuration.md)). The policy check uses `Identity.id` and `Identity.scopes` from the identity resolved during auth. If the policy denies the destination, the channel open is rejected — no TCP connection is attempted. The default policy (`ForwardingPolicy::allow_all()`) preserves current behavior.
**Regular destination** — For targets that pass the forwarding policy check:
1. **Connection** — connect to `host:port`, either directly or via the configured outbound proxy
2. **Outbound connection** — connect to the target, either directly or via the configured outbound proxy
@@ -122,17 +133,23 @@ This makes the server appear as an ordinary web server to port scanners and DPI
The server handler implements `russh::server::Handler` with two primary responsibilities:
**Authentication (`auth_publickey`)**:
- Check the presented key against the configured `authorized_keys` set (constant-time comparison)
- If no direct match, check whether the key is a certificate signed by a trusted cert-authority
- Validate certificate signature, expiry, and principal restrictions (e.g., `permit-port-forwarding`, `no-pty`, `source-address`)
- Delegate to `IdentityProvider::resolve_from_fingerprint()` with the presented key fingerprint
- If identity resolved, return `Accept` with the `Identity` attached to the session
- If no identity, check certificate authority: validate CA signature, expiry, principals
- Return `Accept` or `Reject`
**Channel handling (`channel_open_direct_tcpip`)**:
- If the destination host starts with `alknet-`, route internally (control channel, ADR-018)
- Otherwise, connect to `host:port` (directly or via the configured outbound proxy)
- Otherwise, evaluate `ForwardingPolicy` against the session's `Identity` (ADR-031)
- If denied, reject the channel open
- If allowed, connect to `host:port` (directly or via the configured outbound proxy)
- Spawn a bidirectional proxy task between the SSH channel and the outbound TCP stream
- Return the channel for data flow
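The order of checks in `channel_open_direct_tcpip` can be sketched as a pure routing function. The `Rule` shape, `ChannelAction`, and `route_direct_tcpip` are hypothetical, and the real `ForwardingPolicy` in configuration.md also covers per-transport rules, which are omitted here. The only behavior taken from this spec is the ordering: reserved `alknet-` names first, then the policy check, and an outbound connection only if the policy allows it.

```rust
/// Trimmed Identity (resources omitted for brevity).
#[derive(Debug, Clone)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Hypothetical rule shape: a scope may reach one host, optionally one port.
struct Rule {
    scope: String,
    host: String,
    port: Option<u16>,
}

enum ForwardingPolicy {
    AllowAll,
    Rules(Vec<Rule>),
}

impl ForwardingPolicy {
    /// Default: allow every destination, preserving pre-policy behavior.
    fn allow_all() -> Self {
        ForwardingPolicy::AllowAll
    }

    fn permits(&self, who: &Identity, host: &str, port: u16) -> bool {
        match self {
            ForwardingPolicy::AllowAll => true,
            ForwardingPolicy::Rules(rules) => rules.iter().any(|r| {
                who.scopes.contains(&r.scope)
                    && r.host == host
                    && r.port.map_or(true, |p| p == port)
            }),
        }
    }
}

#[derive(Debug, PartialEq)]
enum ChannelAction {
    Internal(String), // reserved alknet-* destination, e.g. the control channel
    Reject,           // policy denied; no TCP connection is attempted
    Proxy(String),    // connect to host:port and spawn the proxy task
}

/// Reserved names are routed internally before the policy is consulted.
fn route_direct_tcpip(policy: &ForwardingPolicy, who: &Identity, host: &str, port: u16) -> ChannelAction {
    if host.starts_with("alknet-") {
        return ChannelAction::Internal(format!("{host}:{port}"));
    }
    if !policy.permits(who, host, port) {
        return ChannelAction::Reject;
    }
    ChannelAction::Proxy(format!("{host}:{port}"))
}

fn main() {
    let alice = Identity { id: "SHA256:abc".to_string(), scopes: vec!["web".to_string()] };
    let policy = ForwardingPolicy::Rules(vec![Rule {
        scope: "web".to_string(),
        host: "example.com".to_string(),
        port: Some(443),
    }]);
    for (host, port) in [("example.com", 443), ("example.com", 22), ("alknet-control", 0)] {
        println!("{} -> {host}:{port} = {:?}", alice.id, route_direct_tcpip(&policy, &alice, host, port));
    }
}
```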
### Interface Abstraction
SSH is one interface at Layer 2 in the three-layer model (ADR-026, [interface.md](interface.md)). The current `ServerHandler` will be refactored into `SshInterface` — it manages SSH session concerns (handshake, auth delegation, channel multiplexing). Forwarding policy, operation routing, and call protocol handling are Layer 3 concerns that live outside the interface. This refactoring is the most invasive code change in Phase 1 (integration-plan, Phase 1.8).
### Logging and Rate Limiting
**Logging** (for fail2ban integration on Linux):
@@ -159,6 +176,25 @@ These provide abuse protection on platforms without fail2ban (macOS, Windows, BS
### CLI Interface
Configuration sources (in priority order): CLI flags, environment variables, optional `--config` TOML file (ADR-030). The TOML config file is a convenience input for reproducible deployments; it does not replace `ServeOptions` (ADR-011).
Multi-transport listeners use `[[listeners]]` in the TOML config (ADR-030):
```toml
[[listeners]]
transport = "tls"
listen = "0.0.0.0:443"
[listeners.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"
[[listeners]]
transport = "iroh"
```
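As a rough illustration of what each `[[listeners]]` table carries, the sketch below maps one table to a typed value. `Listener` and `build_listener` are hypothetical, and the loose `Option` arguments stand in for fields a real loader would deserialize from the TOML (for example with serde). It encodes two points from this spec: a TLS listener needs `listen` plus cert and key paths, and an iroh listener needs no listening address because it connects outbound to the relay.

```rust
use std::path::PathBuf;

/// Hypothetical in-memory form of one `[[listeners]]` entry.
#[derive(Debug)]
enum Listener {
    Tls { listen: String, cert: PathBuf, key: PathBuf },
    Iroh, // no listening port: connects outbound to the relay
}

/// Hypothetical loader step for one table; a real loader would deserialize
/// the TOML rather than take loose options.
fn build_listener(
    transport: &str,
    listen: Option<&str>,
    cert: Option<&str>,
    key: Option<&str>,
) -> Result<Listener, String> {
    match transport {
        "tls" => Ok(Listener::Tls {
            listen: listen.ok_or("tls listener requires `listen`")?.to_string(),
            cert: cert.ok_or("tls listener requires [listeners.tls] cert")?.into(),
            key: key.ok_or("tls listener requires [listeners.tls] key")?.into(),
        }),
        "iroh" => Ok(Listener::Iroh),
        other => Err(format!("unknown transport: {other}")),
    }
}

fn main() {
    // The two tables from the TOML example, passed in by hand.
    let listeners = [
        build_listener(
            "tls",
            Some("0.0.0.0:443"),
            Some("/etc/alknet/tls/cert.pem"),
            Some("/etc/alknet/tls/key.pem"),
        ),
        build_listener("iroh", None, None, None),
    ];
    for l in &listeners {
        println!("{l:?}");
    }
}
```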
Currently, the server binds to a single transport at a time. Multi-transport via `[[listeners]]` is planned (ADR-030) and not yet implemented.
```bash
# Basic server (SSH on port 22)
alknet serve --key ~/.ssh/ssh_host_ed25519_key
@@ -230,7 +266,9 @@ No listening port is needed. The server connects outbound to the iroh relay (def
- The server does not log tunnel destinations (ADR-006). Auth events and connection events are logged for fail2ban integration (ADR-013).
- Destination strings beginning with `alknet-` are reserved for internal use (ADR-018). The server must not attempt TCP connections to `alknet-*` destinations — these are intercepted for control channel routing.
- One `ServerHandler` instance per connection. Handler state is not shared between connections (unless explicitly configured via `Arc` shared state for things like connection limits).
- The server binds to a single transport at a time. Running multiple transports (e.g., TCP + iroh) simultaneously requires separate processes or a future multiplexing feature.
- The server currently binds to a single transport at a time. Multi-transport via `[[listeners]]` is planned (ADR-030) and not yet implemented.
- Forwarding policy is evaluated before every channel proxy spawn. Denied channels are rejected immediately (ADR-031).
- Auth resolves through `IdentityProvider` (ADR-029). Phase 1 uses `ConfigIdentityProvider` backed by `ArcSwap<DynamicConfig>` (ADR-030). `StorageIdentityProvider` (Phase 2+) replaces it for production deployments with SQLite.
- ACME support requires the `acme` feature flag. Without it, only manual TLS certs are supported.
- No password authentication over SSH channels. Key-based and cert-authority only (ADR-012).
- Stealth mode (`--stealth`) requires TLS transport. It has no effect on TCP or iroh transports (ADR-017).
@@ -272,4 +310,16 @@ None — all resolved.
| [013](decisions/013-fail2ban-friendly-logging.md) | Fail2ban-friendly logging | Structured auth logs + built-in rate limiting |
| [017](decisions/017-stealth-mode-protocol-multiplexing.md) | Stealth mode | Protocol multiplexing on port 443 |
| [018](decisions/018-control-channel-for-pubsub.md) | Control channel | Reserved `alknet-control` destination for pubsub |
| [019](decisions/019-proxy-dual-semantics.md) | Proxy dual semantics | `--proxy` routes transport on client, data on server |
| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2 interface, ServerHandler → SshInterface |
| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | IdentityProvider is the contract; irpc service is one backend |
| [029](decisions/029-identity-core-type.md) | Identity as core type | IdentityProvider trait in alknet-core |
| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | ArcSwap for dynamic config, ConfigReloadHandle |
| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Evaluated before channel proxy spawn |
## References
- [configuration.md](configuration.md) — DynamicConfig, ForwardingPolicy, ConfigReloadHandle
- [identity.md](identity.md) — IdentityProvider trait, Identity struct
- [auth.md](auth.md) — Unified auth, AuthPolicy, token auth
- [interface.md](interface.md) — Interface trait, SshInterface, three-layer model

@@ -20,8 +20,8 @@ last_updated: 2026-06-07
The irpc service layer decomposes alknet's core responsibilities into
independently testable, deployable, and replaceable components. Auth, Secret,
Config, and Storage are irpc protocol enums that work both as in-process async
boundaries (tokio channels) and cross-process/cross-network (QUIC streams via
noq). OperationEnv is the universal composition mechanism that unifies local
boundaries (tokio channels) and cross-process/cross-network (irpc over iroh
QUIC streams). OperationEnv is the universal composition mechanism that unifies local
dispatch, irpc service dispatch, and remote call protocol dispatch.
## Why
@@ -209,13 +209,10 @@ layer to be built — they are Phase 2+ concerns.
## Open Questions
- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one
per tenant)? Defer for now — one seed per node. Multi-seed can be added
later by indexing the `Unlock` call with a tenant ID.
per tenant)? See [open-questions.md](open-questions.md).
- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for
remote calls? Postcard for irpc (Rust-to-Rust, efficient). JSON for call
protocol (cross-language, universal). The irpc remote path naturally uses
postcard.
remote calls? See [open-questions.md](open-questions.md).
## Design Decisions

@@ -197,17 +197,12 @@ dependency.
## Open Questions
- **OQ-SVC-03**: How does the secret service integrate with the existing
`EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces
PBKDF2 password-based encryption with derived AES-256-GCM keys. The
`EncryptedData` format is a superset — old format can be migrated by
re-encrypting with the new key.
`EncryptedDataSchema` from `@alkdev/storage`? See [open-questions.md](open-questions.md).
- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL
(default: 1 hour). The head can revoke by invalidating the session.
- **OQ-SVC-04**: Should workers cache derived keys locally? See [open-questions.md](open-questions.md).
- **OQ-SVC-05**: How does the smart contract (NFT-based ACL) interact with the
secret service? The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from
the same seed. The smart contract is a separate concern.
- **OQ-SVC-05**: How does the NFT-based ACL smart contract interact with the
secret service? See [open-questions.md](open-questions.md).
## Design Decisions