From 19b3d3a078afbc1cee842f3e195b9ea09fcbdf94 Mon Sep 17 00:00:00 2001 From: "glm-5.1" Date: Sun, 7 Jun 2026 09:32:58 +0000 Subject: [PATCH] =?UTF-8?q?docs:=20write=20Phase=200=20architecture=20foun?= =?UTF-8?q?dation=20=E2=80=94=20ADRs=20026-034,=20spec=20docs,=20and=20tas?= =?UTF-8?q?k=20updates?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 0a — ADRs (9 new): - ADR-026: Transport/interface separation (three-layer model) - ADR-027: Crate decomposition (core, secret, storage, flowgraph, napi, CLI) - ADR-028: Auth as irpc service (AuthProtocol behind feature flag) - ADR-029: Identity as core type (Identity + IdentityProvider in alknet-core) - ADR-030: Static/dynamic config split (ArcSwap, ConfigReloadHandle) - ADR-031: Forwarding policy (rule-based allow/deny, TransportKind-aware) - ADR-032: Event boundary discipline (domain, irpc, call protocol boundaries) - ADR-033: OperationEnv universal composition (three dispatch paths) - ADR-034: Head/worker terminology (replace hub/spoke) Phase 0b — New spec documents (7): - identity.md, services.md, interface.md, configuration.md, storage.md, flowgraph.md, secret-service.md Updated existing docs: - auth.md: reference identity.md for canonical definitions, add AuthProtocol - open-questions.md: resolve OQ-12, OQ-16, OQ-18, OQ-22, OQ-23-25 - README.md: add all new docs, ADRs 026-034 Marked 19 architecture tasks as completed. 
--- docs/architecture/README.md | 44 +++- docs/architecture/auth.md | 100 ++++---- docs/architecture/configuration.md | 192 +++++++++++++++ .../026-transport-interface-separation.md | 162 +++++++++++++ .../decisions/027-crate-decomposition.md | 150 ++++++++++++ .../decisions/028-auth-irpc-service.md | 146 ++++++++++++ .../decisions/029-identity-core-type.md | 107 +++++++++ .../030-static-dynamic-config-split.md | 159 +++++++++++++ .../decisions/031-forwarding-policy.md | 138 +++++++++++ .../032-event-boundary-discipline.md | 96 ++++++++ .../033-operationenv-irpc-call-protocol.md | 130 +++++++++++ .../decisions/034-head-worker-terminology.md | 55 +++++ docs/architecture/flowgraph.md | 186 +++++++++++++++ docs/architecture/identity.md | 189 +++++++++++++++ docs/architecture/interface.md | 221 ++++++++++++++++++ docs/architecture/open-questions.md | 111 +++++++-- docs/architecture/secret-service.md | 197 ++++++++++++++++ docs/architecture/services.md | 211 +++++++++++++++++ docs/architecture/storage.md | 219 +++++++++++++++++ .../adr-026-transport-interface-separation.md | 2 +- .../adr-027-crate-decomposition.md | 2 +- .../architecture/adr-028-auth-irpc-service.md | 2 +- .../adr-029-identity-core-type.md | 2 +- .../adr-030-static-dynamic-config-split.md | 2 +- .../architecture/adr-031-forwarding-policy.md | 2 +- .../adr-032-event-boundary-discipline.md | 2 +- ...adr-033-operationenv-irpc-call-protocol.md | 2 +- .../adr-034-head-worker-terminology.md | 2 +- tasks/architecture/spec-configuration.md | 2 +- tasks/architecture/spec-flowgraph.md | 2 +- tasks/architecture/spec-identity.md | 2 +- tasks/architecture/spec-interface.md | 2 +- tasks/architecture/spec-secret-service.md | 2 +- tasks/architecture/spec-services.md | 2 +- tasks/architecture/spec-storage.md | 2 +- tasks/architecture/spec-update-auth.md | 2 +- .../spec-update-open-questions.md | 2 +- tasks/architecture/spec-update-readme.md | 2 +- 38 files changed, 2750 insertions(+), 101 deletions(-) create mode 100644 
docs/architecture/configuration.md create mode 100644 docs/architecture/decisions/026-transport-interface-separation.md create mode 100644 docs/architecture/decisions/027-crate-decomposition.md create mode 100644 docs/architecture/decisions/028-auth-irpc-service.md create mode 100644 docs/architecture/decisions/029-identity-core-type.md create mode 100644 docs/architecture/decisions/030-static-dynamic-config-split.md create mode 100644 docs/architecture/decisions/031-forwarding-policy.md create mode 100644 docs/architecture/decisions/032-event-boundary-discipline.md create mode 100644 docs/architecture/decisions/033-operationenv-irpc-call-protocol.md create mode 100644 docs/architecture/decisions/034-head-worker-terminology.md create mode 100644 docs/architecture/flowgraph.md create mode 100644 docs/architecture/identity.md create mode 100644 docs/architecture/interface.md create mode 100644 docs/architecture/secret-service.md create mode 100644 docs/architecture/services.md create mode 100644 docs/architecture/storage.md diff --git a/docs/architecture/README.md b/docs/architecture/README.md index 5572180..80957d6 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -1,16 +1,18 @@ --- status: draft -last_updated: 2026-06-04 +last_updated: 2026-06-07 --- # Alknet Architecture ## Current State -Architecture specification in active development. 22 ADRs accepted. Unified -auth and call protocol architecture being specified — see [auth.md](auth.md) -and [call-protocol.md](call-protocol.md). Configuration architecture under -exploration — see [research/configuration.md](../research/configuration.md). +Architecture specification in active development. Phase 0 foundation ADRs +completed (026–034). New spec documents created for identity, services, +interface, configuration, storage, flowgraph, and secret service. Existing +specs updated for the three-layer model, crate decomposition, and unified +identity. 
See [open-questions.md](open-questions.md) for remaining open +questions. ## Architecture Documents @@ -24,12 +26,24 @@ exploration — see [research/configuration.md](../research/configuration.md). | [server.md](server.md) | reviewed | Server acceptance, channel handling, proxy | | [tun-shim.md](tun-shim.md) | deprecated | TUN interface wrapper — **deferred**, use tun2proxy | | [napi-and-pubsub.md](napi-and-pubsub.md) | reviewed | NAPI wrapper and pubsub event target adapter | +| [identity.md](identity.md) | draft | Identity type, IdentityProvider trait, auth flows | +| [services.md](services.md) | draft | irpc service layer, OperationEnv, three dispatch paths | +| [interface.md](interface.md) | draft | Layer 2: Interface trait, SshInterface, RawFramingInterface | +| [configuration.md](configuration.md) | draft | StaticConfig, DynamicConfig, forwarding policy, reload | +| [storage.md](storage.md) | draft | alknet-storage: metagraph, identity, ACL, honker | +| [flowgraph.md](flowgraph.md) | draft | alknet-flowgraph: call graph, operation graph, petgraph | +| [secret-service.md](secret-service.md) | draft | alknet-secret: BIP39, SLIP-0010, AES-GCM, SecretProtocol | ## Research Documents | Document | Status | Description | |----------|--------|-------------| -| [configuration.md](../research/configuration.md) | draft | Configuration architecture: static/dynamic split, hot reload, forwarding policy | +| [configuration.md](../research/configuration.md) | draft | Configuration architecture (source for promoted spec) | +| [core.md](../research/core.md) | draft | Core overview, transport, call protocol, DNS | +| [services.md](../research/services.md) | draft | irpc service protocols, OperationContext, application services | +| [storage.md](../research/storage.md) | draft | Metagraph, identity, ACL, secrets, honker | +| [flow.md](../research/flow.md) | draft | FlowGraph, operation graph, call graph, petgraph mapping | +| [integration-plan.md](../research/integration-plan.md) 
| draft | Phased integration plan for services, pubsub, and operations | ## ADR Table @@ -57,12 +71,24 @@ exploration — see [research/configuration.md](../research/configuration.md). | [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth with shared key material + token auth | Accepted | | [024](decisions/024-bidirectional-call-protocol.md) | Bidirectional call protocol (EventEnvelope) | Accepted | | [025](decisions/025-handler-spec-separation.md) | Handler/spec separation for downstream service registration | Accepted | +| [026](decisions/026-transport-interface-separation.md) | Transport/interface separation (three-layer model) | Accepted | +| [027](decisions/027-crate-decomposition.md) | Crate decomposition (core, secret, storage, flowgraph) | Accepted | +| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service behind feature flag | Accepted | +| [029](decisions/029-identity-core-type.md) | Identity as core type in alknet-core | Accepted | +| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split with ArcSwap | Accepted | +| [031](decisions/031-forwarding-policy.md) | Forwarding policy with rule-based allow/deny | Accepted | +| [032](decisions/032-event-boundary-discipline.md) | Event boundary discipline (domain, irpc, call protocol) | Accepted | +| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv as universal composition mechanism | Accepted | +| [034](decisions/034-head-worker-terminology.md) | Head/worker terminology replacing hub/spoke | Accepted | ## Open Questions -Most open questions have been resolved. Open questions remain for -configuration, auth, and call protocol — see -[open-questions.md](open-questions.md) for details. +See [open-questions.md](open-questions.md) for all open and resolved questions. 
+Key resolved questions from Phase 0: OQ-12, OQ-16, OQ-18 (forwarding policy +and identity scopes), OQ-17 (transport-aware auth), OQ-23 (irpc feature flag), +OQ-24 (DNS control channel scope), OQ-25 (crate irpc dependencies). Key open +questions: OQ-15 (QUIC coexistence), OQ-19 (WebTransport TLS), OQ-20 (worker +registration). ## Lifecycle Definitions diff --git a/docs/architecture/auth.md b/docs/architecture/auth.md index 6b3599b..048bafa 100644 --- a/docs/architecture/auth.md +++ b/docs/architecture/auth.md @@ -3,15 +3,15 @@ status: draft last_updated: 2026-06-07 --- -# Authentication & Identity +# Authentication ## What -A unified authentication and identity layer that works across all transports — -SSH-over-any-transport and WebTransport (non-SSH HTTP-level transports). The -same key material (Ed25519 authorized keys and certificate authorities) is -shared across both auth paths. Identity resolution produces a transport-agnostic -`Identity` that carries scopes and resources for downstream authorization. +A unified authentication layer that works across all transports — SSH-over-any- +transport and WebTransport (non-SSH HTTP-level transports). The same key +material (Ed25519 authorized keys and certificate authorities) is shared across +both auth paths. Identity resolution produces a transport-agnostic `Identity` +that carries scopes and resources for downstream authorization. ## Why @@ -21,8 +21,27 @@ need a different auth presentation that shares the same key material. The unified auth layer ensures one key set, one identity, one rotation mechanism across all transports. See ADR-023 for the decision context. +The canonical definitions of `Identity` and `IdentityProvider` are in +[identity.md](identity.md). This document covers auth-specific behavior: +auth presentation per transport, `AuthPolicy` structure, and the auth service +relationship. 
+ +## Architecture +### Identity and IdentityProvider + +See [identity.md](identity.md) for the canonical definitions of: +- `Identity` struct (`{ id, scopes, resources }`) +- `IdentityProvider` trait (`resolve_from_fingerprint()`, `resolve_from_token()`) +- `ConfigIdentityProvider` (default, ArcSwap-backed) +- `StorageIdentityProvider` (production, SQLite-backed, in alknet-storage) +- `AuthProtocol` irpc service (behind `irpc` feature flag) + +The key relationship: `IdentityProvider` is the contract. `ConfigIdentityProvider` +is the default implementation (reads from `DynamicConfig.auth`). `AuthProtocol` +irpc service is one way to satisfy the trait, behind a feature flag. Both paths +produce the same `Identity` result. See ADR-028 and ADR-029. + ### Auth Presentation Per Transport | Transport | Auth presentation | Verification | @@ -72,44 +91,23 @@ V1 uses timestamp-only (±300s window, no server state). The replay trade-offs and future zero-replay options (nonce challenge-response) are documented in ADR-023. -### IdentityProvider Trait +### IdentityProvider and Auth Service Relationship -The `IdentityProvider` trait decouples alknet-core from any specific identity -storage. It resolves a key fingerprint or auth token to an `Identity` with -scopes and resources. +The `IdentityProvider` trait (defined in [identity.md](identity.md)) decouples +alknet-core from any specific identity storage. Two implementations exist: -```rust -pub trait IdentityProvider: Send + Sync + 'static { - /// Resolve an SSH public key fingerprint to an identity. - fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>; +- **ConfigIdentityProvider** (in alknet-core) — reads from + `ArcSwap`. Every authorized key gets a default scope set. + No database required. This is the default for minimal deployments. - /// Resolve an auth token to an identity. - /// Returns None if the token is invalid, expired, or the key is not authorized. 
- fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>; -} +- **StorageIdentityProvider** (in alknet-storage) — backed by SQLite + `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves + fingerprint → account → organization membership → effective scopes. -pub struct Identity { - pub id: String, // Unique identifier — fingerprint (config) or account UUID (database) - pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"] - pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]} -} -``` - -> **Note on identity models**: Earlier research used `{node_id, fingerprint, scopes}`. -> The unified model uses `{id, scopes, resources}` where `id` serves as both -> fingerprint (for key-based auth from config) and account UUID (for -> database-backed auth). The `resources` field provides resource-level -> authorization beyond what scopes offer. This is the canonical definition -> that all components should use. -``` - -**Default implementation**: `ConfigIdentityProvider` loads from -`DynamicConfig.auth` (the `authorized_keys` set). Every authorized key gets a -default scope set. No database required. - -**Head implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and -`accounts` tables plus the ACL graph. Resolves fingerprint → account → -organization membership → effective scopes. Uses `ArcSwap` for hot reload. +The `AuthProtocol` irpc service (behind the `irpc` feature flag, per ADR-028) +provides an async boundary for auth verification. It is one way to satisfy the +`IdentityProvider` trait, not a replacement for it. Both the trait path and the +irpc path produce the same `Identity` result. The trait is the contract. The backing store is pluggable. Alknet-core never depends on Honker, SQLite, or any specific database. @@ -240,13 +238,13 @@ security consideration: ## Open Questions -- **OQ-18**: Should `Identity.scopes` be populated from `ForwardingPolicy` - rules, from an external `IdentityProvider`, or from both? 
See - [open-questions.md](open-questions.md). +- **OQ-18**: ~~Source of Identity.scopes~~ Resolved per ADR-029 and ADR-031. + `IdentityProvider` owns scopes, `ForwardingPolicy` uses scopes from `Identity`. + See [open-questions.md](open-questions.md). - **OQ-19**: Should the WebTransport listener require its own TLS identity (separate from the SSH-over-TLS listener), or can they share the same - certificate? See [open-questions.md](open-questions.md). + certificate? Deferred to Phase 4. See [open-questions.md](open-questions.md). ## Design Decisions @@ -254,16 +252,16 @@ security consideration: |-----|----------|---------| | [012](decisions/012-auth-ed25519-and-cert-authority.md) | Ed25519 + cert-authority | Key-based auth, no passwords | | [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth, shared key material | Same keys for SSH and token auth | +| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag; IdentityProvider is the contract | +| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` in alknet-core | ## References +- [identity.md](identity.md) — Canonical Identity and IdentityProvider definitions - [server.md](server.md) — Current SSH auth handler - [transport.md](transport.md) — Transport abstraction -- [configuration.md](../research/configuration.md) — DynamicConfig, AuthPolicy structure -- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18, OQ-19 -- `server/handler.rs` — Current `auth_publickey()` callback -- `auth/server_auth.rs` — Current `ServerAuthConfig` struct -- `auth/keys.rs` — `KeySource` and key loading +- [configuration.md](configuration.md) — DynamicConfig, AuthPolicy, ConfigReloadHandle +- [services.md](services.md) — AuthProtocol irpc service +- [open-questions.md](open-questions.md) — OQ-17 (resolved), OQ-18 (resolved), OQ-19 - [wtransport](https://github.com/BiagioFesta/wtransport) — Rust WebTransport 
library -- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API -- [@alkdev/storage](/workspace/@alkdev/storage) — `peer_credentials` table, ACL graph \ No newline at end of file +- [WebTransport W3C Spec](https://www.w3.org/TR/webtransport/) — Browser API \ No newline at end of file diff --git a/docs/architecture/configuration.md b/docs/architecture/configuration.md new file mode 100644 index 0000000..9f14c84 --- /dev/null +++ b/docs/architecture/configuration.md @@ -0,0 +1,192 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Configuration + +## What + +Alknet's configuration is split into `StaticConfig` (immutable after startup) and +`DynamicConfig` (hot-reloadable at runtime), with `ArcSwap` providing lock-free +reads on the hot path. `ConfigService` wraps reloads behind an irpc protocol +for production deployments. + +## Why + +Three specific failures motivated the split (ADR-030): + +1. No hot reload of authentication credentials — adding a key requires a restart. +2. No port forwarding access control — any authenticated client has unrestricted + access (ADR-031). +3. No structured configuration beyond CLI flags — operators need config files + and the NAPI layer needs programmatic reload. + +The split is clean: anything that affects SSH handshake or socket binding is +static; anything checked per-connection or per-channel is dynamic. + +## Architecture + +### StaticConfig + +Immutable after startup. Constructed from `ServeOptions` (the builder pattern +is preserved per ADR-011). Contains: + +- Transport mode, listen address +- TLS config (cert, key) +- iroh config (relay URL) +- Stealth mode flag +- Host key, host key algorithm +- Max auth attempts, max connections per IP +- Proxy config + +Changing any of these requires a restart. + +### DynamicConfig + +Hot-reloadable at runtime via `ArcSwap`. 
Contains: + +- `AuthPolicy` — authorized keys, certificate authorities, token config +- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031) +- `RateLimitConfig` — rate limiting parameters + +`ArcSwap` provides lock-free reads. Every `auth_publickey()` and +`channel_open_direct_tcpip()` call does a single `Arc` dereference — zero cost +compared to the current approach. Writes are atomic: `store()` swaps the +pointer. + +### ConfigReloadHandle + +```rust +pub struct ConfigReloadHandle { + dynamic: Arc<ArcSwap<DynamicConfig>>, +} + +impl ConfigReloadHandle { + pub fn reload(&self, new_config: DynamicConfig) { ... } +} +``` + +Obtained from `Server::run()`. Passed to NAPI or CLI for explicit reload. + +### ConfigService irpc Service + +```rust +enum ConfigProtocol { + GetForwardingPolicy, + GetRateLimits, + ReloadForwarding { policy: ForwardingPolicy }, + ReloadRateLimits { limits: RateLimitConfig }, +} +``` + +Behind the `irpc` feature flag. For production deployments that use the service +layer. For minimal deployments, direct `ConfigReloadHandle::reload()` is +sufficient. + +### ForwardingPolicy + +Part of DynamicConfig (ADR-031). Evaluated per-channel-open, matched against +the authenticated `Identity`. Rules are evaluated in order; first match wins. +Default determines fallback. + +```rust +pub struct ForwardingPolicy { + pub default: ForwardingAction, + pub rules: Vec, +} +``` + +### TOML Config File + +Optional convenience input format (amends ADR-011, does not replace +programmatic API). Covers static config plus initial auth/forwarding paths. 
+ +```toml +[server] +transport = "tls" +listen = "0.0.0.0:443" + +[auth] +host_key = "/etc/alknet/ssh/host_key" + +[forwarding] +default = "deny" + +[[forwarding.rules]] +target = "localhost:*" +action = "allow" +``` + +### NAPI Reload API + +```typescript +interface AlknetServer { + reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void; + reloadForwarding(policy: ForwardingPolicyConfig): void; + reloadAll(config: DynamicConfig): void; +} +``` + +### Multi-Transport Listeners + +A head node may accept connections on multiple transports simultaneously. The +architecture supports `Vec` instead of a single +`ServeTransportMode`. `Server::run()` spawns one accept loop per listener, +sharing `DynamicConfig`, `ConnectionRateLimiter`, sessions, and shutdown signal. + +```toml +[[listeners]] +transport = "tls" +listen = "0.0.0.0:443" +stealth = true + +[[listeners]] +transport = "tcp" +listen = "0.0.0.0:22" + +[[listeners]] +transport = "iroh" +iroh_relay = "https://relay.alk.dev" +``` + +### CLI vs Programmatic Behavior + +| Interface | Static config | Dynamic config | Reload mechanism | +|-----------|--------------|----------------|------------------| +| CLI | Flags + optional `--config` file | Loaded at startup from `--authorized-keys` | None (restart to change) | +| Core Rust | `StaticConfig` struct | `AuthService` (irpc) or `ArcSwap` (minimal) | `ConfigService::reload()` or `ConfigReloadHandle::reload()` | +| NAPI | `serve()` options | Same | `server.reloadAuth()`, `server.reloadForwarding()` | + +## Constraints + +- `StaticConfig` cannot be changed after startup. Changing transport mode, + listen address, TLS config, or host key requires a restart. +- `DynamicConfig` is reloaded atomically via `ArcSwap`. Existing connections + continue with their current config; new connections get the new config. +- Config file is optional. `ServeOptions` builder pattern remains the primary + API (amends ADR-011, does not supersede it). 
+- No file watching (OQ-13 resolved: potential attack vector, unnecessary + complexity). +- Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. + +## Open Questions + +- None. All configuration-related questions are resolved per ADR-030, ADR-031, + and the resolved OQs in [open-questions.md](open-questions.md). + +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [030](decisions/030-static-dynamic-config-split.md) | Static/dynamic config split | Immutable transport vs. reloadable auth/forwarding | +| [011](decisions/011-no-ssh-config-programmatic-api.md) | Programmatic-first API | Amended, not superseded — TOML is convenience layer | +| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Rule-based allow/deny, TransportKind-aware | +| [029](decisions/029-identity-core-type.md) | Identity as core type | DynamicConfig.auth consumed by IdentityProvider | +| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | ConfigService wraps DynamicConfig reloads | + +## References + +- [research/configuration.md](../research/configuration.md) — Full analysis and proposed solution +- [identity.md](identity.md) — IdentityProvider trait, DynamicConfig.auth +- [ADR-013](decisions/013-fail2ban-friendly-logging.md) — Rate limiting parameters \ No newline at end of file diff --git a/docs/architecture/decisions/026-transport-interface-separation.md b/docs/architecture/decisions/026-transport-interface-separation.md new file mode 100644 index 0000000..8e72746 --- /dev/null +++ b/docs/architecture/decisions/026-transport-interface-separation.md @@ -0,0 +1,162 @@ +# ADR-026: Transport/Interface Separation (Three-Layer Model) + +## Status + +Accepted + +## Context + +In the current architecture, SSH is deeply embedded in the server handler. The +`ServerHandler` owns auth, channel management, and proxy logic — all mixed +together. 
This makes it impossible to run the call protocol over any transport +that doesn't speak SSH, such as: + +- **DNS** — encoding call protocol frames as DNS TXT queries/responses for + censorship resistance +- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH + wrapping, for local service mesh or browser-to-head direct communication +- **WebTransport** — running call protocol over QUIC streams (browsers can't do + SSH key exchange) + +The DNS control channel concept from research (`core.md`) currently conflates +"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But +SSH is not a transport — it's a protocol layer that sits *on top of* a +transport. Separating them enables the DNS control channel to carry call +protocol events directly, without wrapping SSH inside DNS queries. + +The same separation enables raw framing (no SSH overhead) for trusted local +networks, and WebTransport direct call protocol for browser clients. + +## Decision + +**Establish a three-layer model:** + +### Layer 1: Transport + +Produces byte streams. A `Transport` still produces +`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001. + +```rust +#[async_trait] +pub trait Transport: Send + Sync + 'static { + type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static; + async fn connect(&self) -> Result<Self::Stream>; + fn describe(&self) -> String; +} +``` + +Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future). + +### Layer 2: Interface + +Consumes a `Transport::Stream` and produces call protocol sessions. An +interface is what SSH currently does: wrap a byte stream in session semantics. + +```rust +#[async_trait] +pub trait Interface: Send + Sync + 'static { + type Session; + async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>; +} +``` + +Interfaces: + +- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth, + channel multiplexing. 
The call protocol runs over a reserved SSH channel + (`alknet-control:0`). +- **Raw framing interface** — 4-byte big-endian length prefix + JSON + `EventEnvelope`. No SSH overhead. Direct call protocol over the transport + stream. +- **DNS control channel** — a (DNS transport, raw framing interface) pair that + encodes/decodes `EventEnvelope` frames as DNS query/response pairs. + +### Layer 3: Protocol + +Carries semantics. Call protocol events, operation registry, service calls. +The protocol is agnostic to both the transport and the interface below it. It +receives `EventEnvelope` frames from whatever interface produced them. + +### Connection Model + +A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated: + +| Transport | Interface | Use case | +|-----------|-----------|----------| +| TLS | SSH | Standard alknet tunnel | +| TCP | SSH | Plain SSH tunnel | +| iroh | SSH | P2P SSH tunnel | +| DNS | raw framing | DNS control channel | +| WebTransport | SSH | Browser SSH tunnel (future) | +| WebTransport | raw framing | Browser call protocol (future) | +| TCP | raw framing | Direct call protocol, local mesh | + +**The DNS control channel carries call protocol frames directly — it does NOT +wrap SSH inside DNS.** This is explicit because the research originally +conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol." +The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT +queries/responses — no SSH involved. + +### `TransportKind` Enum + +The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and +`WebTransport` variants. Initially these are tags only — no acceptor +implementation. The full DNS and WebTransport implementations are Phase 4 work +per the integration plan. 
+ +```rust +pub enum TransportKind { + Tcp, + Tls { server_name: Option<String> }, + Iroh { endpoint_id: String }, + Dns { domain: String }, + WebTransport { host: String }, +} +``` + +### ServerHandler Refactor + +The existing `ServerHandler` is refactored into `SshInterface`. The interface +abstraction means the server's accept loop becomes: + +```rust +// Pseudocode +let (transport, interface) = listener_config; +let stream = transport.accept().await?; +let session = interface.accept(stream, &config).await?; +// session produces call protocol events +``` + +The call protocol handler is interface-agnostic — it receives `EventEnvelope` +frames from any interface. Auth, forwarding policy, and operation routing happen +at Layer 3, not inside the SSH handler. + +## Consequences + +- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS, + raw framing) pair is a clean (Transport, Interface) combination. +- **Positive**: Enables raw framing for local service mesh. No SSH overhead for + trusted networks. +- **Positive**: SSH becomes pluggable. The same call protocol handler works with + any interface. +- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller, + more focused component that only handles SSH session management. +- **Positive**: Future WebTransport and WebSocket interfaces are additive — they + implement the `Interface` trait without touching SSH code. +- **Negative**: This is the most invasive code change in Phase 1 + (integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic + are currently tangled in `ServerHandler`. Extracting them requires careful + refactoring to maintain existing behavior. +- **Negative**: The `Interface` trait is new and untested. The design must + accommodate both SSH's channel multiplexing and raw framing's single-stream + model through the same abstraction. 
+ +## References + +- [research/core.md](../../research/core.md) — Transport layer, DNS transport section +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model +- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1) +- [server.md](../server.md) — Current ServerHandler (will become SshInterface) +- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged) +- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2) +- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3) \ No newline at end of file diff --git a/docs/architecture/decisions/027-crate-decomposition.md b/docs/architecture/decisions/027-crate-decomposition.md new file mode 100644 index 0000000..cfd3ac1 --- /dev/null +++ b/docs/architecture/decisions/027-crate-decomposition.md @@ -0,0 +1,150 @@ +# ADR-027: Crate Decomposition + +## Status + +Accepted + +## Context + +alknet-core currently contains everything: transport, SSH, auth, config, the +call protocol handler, and the server accept loop. As the project grows to +include SQLite-backed identity, HD key derivation, and metagraph storage, core +would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies +— unacceptable for a library crate that CLI users embed. + +Different deployment topologies need different subsets: +- A minimal CLI tunnel only needs core, transport, and auth types +- A head node needs SQLite-backed identity and the secret service +- A flowgraph visualization tool only needs petgraph operations + +Circular dependencies must be avoided. alknet-storage implements +alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on +alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire +format, but not as a crate dependency. 
+ +## Decision + +**Decompose the project into six crates with a strict acyclic dependency graph.** + +### Crate Structure + +1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity, + `OperationSpec`, `Interface` trait. The foundational crate that everything + else depends on (by type, not by crate dep in some cases). + - *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap + - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph + +2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key + derivation, AES-256-GCM encryption, `SecretProtocol` irpc service. + - *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2, + irpc + - *Does NOT depend on*: alknet-core, alknet-storage + +3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph, + honker integration, `StorageProtocol` irpc service. + - *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc + - *Does NOT depend on alknet-core* (but implements alknet-core's + `IdentityProvider` trait via the trait, not a crate dep) + - *Does NOT depend on alknet-secret* (but references `EncryptedData` type + format for wire compatibility) + +4. **alknet-flowgraph** — `FlowGraph` over petgraph, operation graph, call + graph, type compatibility checking. + - *Depends on*: petgraph, serde, jsonschema, thiserror + - *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret + +5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js. + - *Depends on*: alknet-core + - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph + +6. **alknet** (CLI binary) — Assembles everything. 
+ - *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature), + alknet-flowgraph (feature), toml + +### Dependency Graph + +``` + alknet-secret + / \ + / \ +alknet-core ←──── ←── alknet-storage + ↑ \ / + │ alknet-flowgraph + │ +alknet-napi +alknet (CLI binary — assembles everything) +``` + +### Narrow Interface Points + +Three types serve as the narrow interface points between crates: + +1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler, + forwarding policy, and call protocol. alknet-storage implements + `IdentityProvider` to produce instances. + +2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by + `ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in + alknet-storage). The CLI/NAPI layer wires the concrete implementation. + +3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation + registry and by alknet-flowgraph for type compatibility checking. The bridge + is serialization — flowgraph serializes to JSON, storage persists it. + +### irpc Feature Flag + +irpc is a feature flag in alknet-core. When disabled, auth and config go through +`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes +that only do SSH tunneling don't need the service layer. + +In alknet-secret and alknet-storage, irpc is an independent dependency, not +feature-gated. These crates always define irpc service protocols because they +are used in production deployments where the service layer is active. + +### alknet-storage's Relationship to alknet-core + +alknet-storage does NOT depend on alknet-core as a crate. Instead: + +- alknet-storage defines its own `IdentityProvider` impl that matches + alknet-core's trait signature. The trait is re-exported or defined locally + with `#[cfg(feature = "alknet-core")]` interop. +- In practice, the CLI binary crate depends on both and wires them together. 
+ alknet-storage provides `StorageIdentityProvider`; alknet-core takes + `impl IdentityProvider`. + +### alknet-storage's Relationship to alknet-secret + +alknet-storage does NOT depend on alknet-secret as a crate. Instead: + +- alknet-storage and alknet-secret share the `EncryptedData` wire format (key + version, salt, IV, ciphertext). This is a type-level compatibility, not a + crate dependency. +- alknet-secret encrypts; alknet-storage stores the encrypted blob in a + `SecretNode` in the metagraph. The bridge is serialization. + +## Consequences + +- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users + get a small binary. +- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be + swapped for alternative implementations. +- **Positive**: No circular dependencies. The dependency graph is a DAG. +- **Positive**: Deployment topology determines which crates to include. A CLI + tunnel uses only alknet-core. A head node uses everything. +- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for + service layer overhead. +- **Negative**: `IdentityProvider` trait interop between alknet-core and + alknet-storage requires careful versioning. If the trait signature changes, + both crates must update. +- **Negative**: `EncryptedData` wire format compatibility between alknet-secret + and alknet-storage is implicit (not enforced by the type system). A shared + types crate could be extracted if needed, but adds another crate dependency. 
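The implicit `EncryptedData` contract described above can be made more tangible with a small, dependency-free sketch of a byte layout that both crates could agree on. The ADR only names the four fields (key version, salt, IV, ciphertext). The field widths, the big-endian version prefix, and the `to_bytes`/`from_bytes` names are assumptions for illustration, not the actual alknet-secret format.

```rust
use std::convert::TryInto;

/// Hypothetical wire layout (widths are assumptions, not the real format):
/// [key_version: u32 big-endian][salt: 16 bytes][iv: 12 bytes][ciphertext: rest]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EncryptedData {
    pub key_version: u32,
    pub salt: [u8; 16],
    pub iv: [u8; 12],
    pub ciphertext: Vec<u8>,
}

/// Size of the fixed-width header that precedes the ciphertext.
const HEADER_LEN: usize = 4 + 16 + 12;

impl EncryptedData {
    /// Serialize into the shared layout (the side that encrypts would do this).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(HEADER_LEN + self.ciphertext.len());
        out.extend_from_slice(&self.key_version.to_be_bytes());
        out.extend_from_slice(&self.salt);
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.ciphertext);
        out
    }

    /// Parse the shared layout; returns None if the header is truncated.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < HEADER_LEN {
            return None;
        }
        let key_version = u32::from_be_bytes(bytes[0..4].try_into().ok()?);
        let salt: [u8; 16] = bytes[4..20].try_into().ok()?;
        let iv: [u8; 12] = bytes[20..32].try_into().ok()?;
        Some(EncryptedData {
            key_version,
            salt,
            iv,
            ciphertext: bytes[HEADER_LEN..].to_vec(),
        })
    }
}
```

Because each crate would carry its own copy of such a definition, a round-trip test that runs against both copies is one plausible way to catch format drift without introducing a shared types crate.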
+ +## References + +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph +- [research/core.md](../../research/core.md) — alknet-core contents +- [research/services.md](../../research/services.md) — Service protocols +- [research/storage.md](../../research/storage.md) — alknet-storage contents +- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents +- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point) \ No newline at end of file diff --git a/docs/architecture/decisions/028-auth-irpc-service.md b/docs/architecture/decisions/028-auth-irpc-service.md new file mode 100644 index 0000000..f5d12ec --- /dev/null +++ b/docs/architecture/decisions/028-auth-irpc-service.md @@ -0,0 +1,146 @@ +# ADR-028: Auth as irpc Service + +## Status + +Accepted + +## Context + +For head nodes serving many users, in-memory key lookup via `ArcSwap` +doesn't scale. Loading all authorized keys into RAM and atomic-swapping the +entire set on each reload works for small deployments but requires holding every +key in memory. For production deployments with hundreds or thousands of users, +auth verification should query a database on demand rather than holding all keys +in memory. + +The current `ArcSwap` approach works for CLI and single-node +setups. What's needed is an async boundary that allows auth verification to go +through a service — locally via channels for minimal deployments, or via irpc +for production deployments where auth runs on a separate process or node. + +The critical design point: callers go through the `IdentityProvider` trait +(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce +the same result — an `Identity` or rejection. The trait is the contract; the +service is an implementation path. 
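The idea that a service is just one way to satisfy the trait can be sketched with standard-library channels standing in for irpc. Everything here is illustrative: `AuthRequest`, `ServiceIdentityProvider`, and `spawn_auth_service` are hypothetical names, `Identity` is trimmed to two fields, and the trait is reduced to the fingerprint path. A reply `Sender` plays the role of a oneshot channel.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;
use std::thread;

/// Trimmed identity for this sketch (the full type has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

/// Hypothetical request message; `reply` stands in for a oneshot sender.
pub enum AuthRequest {
    VerifyPubkey { fingerprint: String, reply: Sender<AuthResult> },
}

/// Reduced to the fingerprint path to keep the sketch short.
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Satisfies the trait by forwarding each lookup to a service task.
pub struct ServiceIdentityProvider {
    // Mutex keeps the provider Sync regardless of toolchain version.
    tx: Mutex<Sender<AuthRequest>>,
}

impl IdentityProvider for ServiceIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let (reply, rx) = channel();
        let req = AuthRequest::VerifyPubkey { fingerprint: fingerprint.to_string(), reply };
        self.tx.lock().ok()?.send(req).ok()?;
        // Block until the service answers; a denial maps to None.
        match rx.recv().ok()? {
            AuthResult::Ok(identity) => Some(identity),
            AuthResult::Denied(_) => None,
        }
    }
}

/// Spawn an in-process auth service backed by a fixed key list.
pub fn spawn_auth_service(authorized: Vec<String>) -> ServiceIdentityProvider {
    let (tx, rx) = channel::<AuthRequest>();
    thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for req in rx {
            match req {
                AuthRequest::VerifyPubkey { fingerprint, reply } => {
                    let result = if authorized.contains(&fingerprint) {
                        AuthResult::Ok(Identity {
                            id: fingerprint,
                            scopes: vec!["relay:connect".to_string()],
                        })
                    } else {
                        AuthResult::Denied("unknown key".to_string())
                    };
                    let _ = reply.send(result);
                }
            }
        }
    });
    ServiceIdentityProvider { tx: Mutex::new(tx) }
}
```

A caller holding an `IdentityProvider` cannot tell whether the answer came from an in-memory lookup or from the service task, which is the property the trait boundary is meant to provide.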
+
+## Decision
+
+**Auth verification is provided via an irpc service protocol, with
+`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
+(ArcSwap-backed) as the default implementation.**
+
+### IdentityProvider Trait (ADR-029) — The Contract
+
+Callers depend on `IdentityProvider`, not on any concrete implementation:
+
+```rust
+pub trait IdentityProvider: Send + Sync + 'static {
+    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
+    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
+}
+```
+
+### ConfigIdentityProvider — Default Implementation
+
+Reads from `ArcSwap`. No database needed. Every authorized
+key gets a default scope set. This is the default for CLI and single-node
+deployments.
+
+### AuthProtocol irpc Service — Behind Feature Flag
+
+```rust
+#[rpc_requests(message = AuthMessage)]
+#[derive(Debug, Serialize, Deserialize)]
+enum AuthProtocol {
+    #[rpc(tx=oneshot::Sender<AuthResult>)]
+    #[wrap(VerifyPubkey)]
+    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
+
+    #[rpc(tx=oneshot::Sender<AuthResult>)]
+    #[wrap(VerifyToken)]
+    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },
+
+    #[rpc(tx=oneshot::Sender<()>)]
+    #[wrap(ReloadKeys)]
+    ReloadKeys,
+
+    #[rpc(tx=oneshot::Sender)]
+    #[wrap(CheckAccess)]
+    CheckAccess { identity: Identity, operation: String },
+}
+
+enum AuthResult {
+    Ok(Identity),
+    Denied(String),
+}
+```
+
+The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
+that only do SSH tunneling don't need the service layer overhead. When the
+feature is disabled, auth goes through `IdentityProvider` directly.
+
+### AuthServiceImpl
+
+Two implementations exist:
+
+- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
+  Wraps the trait in an irpc service for deployments that use the service layer
+  but don't have SQLite.
+- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
+  tables (in alknet-storage). Queries on demand.
Can maintain an LRU cache for + hot fingerprints. This is the production implementation. + +Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't +know or care which backend is running. + +### Integration with IdentityProvider + +The irpc service and the trait compose. A caller goes through `IdentityProvider`, +which may internally delegate to the irpc service, or may satisfy the request +locally via `ConfigIdentityProvider`. The deployment topology determines the +path: + +- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from + `ArcSwap`. No irpc overhead. +- **Production with local auth**: `AuthServiceImpl` wraps + `StorageIdentityProvider` locally. The handler calls `IdentityProvider` which + routes to the local irpc service. +- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which + routes to a remote auth irpc service over QUIC. + +### ConfigService Integration + +`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section. +For the `ConfigIdentityProvider` path, this is equivalent to +`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this +refreshes the LRU cache. Both update atomically — ongoing connections are +unaffected, new connections pick up changes. + +## Consequences + +- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No + database dependency for CLI users. +- **Positive**: Production deployments wire `StorageIdentityProvider` behind the + irpc service. Auth scales to thousands of users without loading all keys into + memory. +- **Positive**: The `IdentityProvider` trait is the only contract callers depend + on. This keeps alknet-core lean and testable. +- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't + need the service layer. +- **Positive**: Both paths produce identical `Identity` results. Behavioral + parity is enforced by the shared `Identity` type. 
+- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider` + and `StorageIdentityProvider` must produce the same `Identity` for the same + input. Integration tests should verify this. +- **Negative**: The `irpc` feature flag adds conditional compilation complexity. + The core must compile and work without it, and the service layer must work + with it enabled. + +## References + +- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition +- [auth.md](../auth.md) — IdentityProvider trait, Identity struct +- [research/configuration.md](../../research/configuration.md) — Auth service approach +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4 +- [ADR-029](029-identity-core-type.md) — Identity as core type +- [ADR-027](027-crate-decomposition.md) — Crate decomposition \ No newline at end of file diff --git a/docs/architecture/decisions/029-identity-core-type.md b/docs/architecture/decisions/029-identity-core-type.md new file mode 100644 index 0000000..0077c0e --- /dev/null +++ b/docs/architecture/decisions/029-identity-core-type.md @@ -0,0 +1,107 @@ +# ADR-029: Identity as Core Type + +## Status + +Accepted + +## Context + +The `Identity` struct and `IdentityProvider` trait are needed by auth, +forwarding policy, and call protocol — three different subsystems in +alknet-core. Without placing them in core, these subsystems would each define +their own identity type, leading to duplication and conversion boilerplate. + +The constraint: alknet-core must not depend on alknet-storage or any database. +The `IdentityProvider` trait must be in core so that the handler can resolve +identities without knowing whether the backing store is a config file or a +SQLite database. External crates provide implementations. + +Earlier research defined `Identity` inconsistently: `{node_id, fingerprint, +scopes}` in services.md and `{id, scopes, resources}` in auth.md. 
The unified
+model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
+key-based auth from config) and account UUID (for database-backed auth).
+
+## Decision
+
+**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**
+
+### Identity Struct
+
+```rust
+pub struct Identity {
+    pub id: String,  // Fingerprint (config auth) or account UUID (database auth)
+    pub scopes: Vec<String>,  // e.g., ["relay:connect", "service:gitea:read"]
+    pub resources: HashMap<String, Vec<String>>,  // e.g., {"service": ["gitea", "registry"]}
+}
+```
+
+The `id` field serves dual purpose: when using config-based authentication
+(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
+database-backed authentication (`StorageIdentityProvider`), it holds the account
+UUID from the `accounts` table. This keeps the type simple while accommodating
+both auth paths.
+
+The `scopes` field provides authorization scope strings used by
+`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
+field provides resource-level authorization beyond what scopes offer (e.g., which
+services this identity can access).
+
+### IdentityProvider Trait
+
+```rust
+pub trait IdentityProvider: Send + Sync + 'static {
+    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
+    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
+}
+```
+
+The trait is the contract. Callers (auth handler, forwarding policy, call
+protocol) depend on `IdentityProvider` — not on any concrete implementation.
+
+### Default and Production Implementations
+
+- **`ConfigIdentityProvider`** (in alknet-core) — reads from
+  `ArcSwap`. Every authorized key gets a default scope set.
+  No database needed. This is the default for minimal deployments.
+- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
+  `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
+  fingerprint → account → organization membership → effective scopes.
This is + the production implementation for head nodes. + +alknet-core never depends on alknet-storage. The trait relationship is: +alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or +NAPI assembly layer wires the concrete implementation. + +### Why Not in alknet-storage? + +If `Identity` lived in alknet-storage, alknet-core would need to depend on +alknet-storage to use the type — creating a circular dependency (since +alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the +type and trait in core breaks the cycle. + +## Consequences + +- **Positive**: alknet-core has no database dependency. Auth, forwarding, and + call protocol all use the same `Identity` type. +- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer + wires the concrete implementation. Deployment topology determines which impl + to use. +- **Positive**: The `id` field serves dual purpose (fingerprint or UUID), + avoiding separate types for config-based and database-based auth. +- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from + `Identity` without knowing where they came from. +- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and + `Storage`. Both must produce identical `Identity` results for the same input. + Tests should verify behavioral parity. +- **Negative**: The trait abstraction adds a level of indirection for the + minimal (config-only) deployment path. The cost is negligible — the + `ConfigIdentityProvider` is a simple `ArcSwap` dereference. 
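As a rough illustration of the config-backed path, the following sketch implements the trait over an in-memory snapshot. It is a minimal sketch under stated assumptions: `RwLock<Arc<_>>` stands in for `ArcSwap` so that only the standard library is needed, `AuthSnapshot` and its fields are hypothetical, and `AuthToken` is a placeholder because its real shape is not given in this ADR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

/// Placeholder token type (assumption; not defined in this ADR).
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

/// Hypothetical snapshot of the reloadable auth data.
pub struct AuthSnapshot {
    pub authorized_fingerprints: Vec<String>,
    pub default_scopes: Vec<String>,
}

/// Config-backed provider; `RwLock<Arc<_>>` is a std-only stand-in for `ArcSwap`.
pub struct ConfigIdentityProvider {
    current: RwLock<Arc<AuthSnapshot>>,
}

impl ConfigIdentityProvider {
    pub fn new(snapshot: AuthSnapshot) -> Self {
        ConfigIdentityProvider { current: RwLock::new(Arc::new(snapshot)) }
    }

    /// Swap in a new snapshot; readers holding the old `Arc` keep using it.
    pub fn reload(&self, snapshot: AuthSnapshot) {
        *self.current.write().unwrap() = Arc::new(snapshot);
    }

    /// Take a cheap reference-counted handle to the current snapshot.
    fn load(&self) -> Arc<AuthSnapshot> {
        Arc::clone(&self.current.read().unwrap())
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let snap = self.load();
        if snap.authorized_fingerprints.iter().any(|f| f.as_str() == fingerprint) {
            // Every authorized key gets the default scope set.
            Some(Identity {
                id: fingerprint.to_string(),
                scopes: snap.default_scopes.clone(),
                resources: HashMap::new(),
            })
        } else {
            None
        }
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        // Token auth is out of scope for this sketch.
        None
    }
}
```

A parity test against a storage-backed implementation could feed both providers the same fingerprints and compare the resulting `Identity` values field by field.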
+ +## References + +- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth +- [research/services.md](../../research/services.md) — AuthService, Identity section +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2 +- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material +- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service +- [OQ-18](../open-questions.md) — IdentityProvider owns scopes \ No newline at end of file diff --git a/docs/architecture/decisions/030-static-dynamic-config-split.md b/docs/architecture/decisions/030-static-dynamic-config-split.md new file mode 100644 index 0000000..a2158db --- /dev/null +++ b/docs/architecture/decisions/030-static-dynamic-config-split.md @@ -0,0 +1,159 @@ +# ADR-030: Static/Dynamic Configuration Split + +## Status + +Accepted + +## Context + +Alknet's configuration is loaded once at startup and never changes. This causes +three specific failures: + +1. **No hot reload of authentication credentials.** Adding or removing an + authorized key requires restarting the server process. In head/worker + deployments where keys are managed via a database, the process must be + restarted every time a key is added, revoked, or rotated. This is + operationally unacceptable. + +2. **No port forwarding access control.** Any authenticated client can open a + `direct-tcpip` channel to any destination. There is no policy governing + which hosts, ports, or alknet control channels a client may access. A + compromised key grants unrestricted network access through the tunnel. + +3. **No structured configuration beyond CLI flags.** ADR-011 chose + programmatic-first configuration for the alpha — correct at the time. But as + alknet moves toward publishable releases, operators need config files for + reproducible deployments, and the NAPI layer needs programmatic reload + capability that `ServeOptions` doesn't currently support. 
+
+Not all configuration should be reloadable. Transport-level settings (listen
+address, TLS certificates, host key) require socket/TLS renegotiation to change
+at runtime — effectively a restart. Auth and forwarding policy can change
+atomically without disrupting existing connections.
+
+## Decision
+
+**Split configuration into `StaticConfig` and `DynamicConfig`.**
+
+### StaticConfig
+
+Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
+preserved). Contains everything that affects socket binding, TLS handshakes, or
+SSH session negotiation:
+
+- Transport mode, listen address
+- TLS config (cert, key)
+- iroh config (relay URL)
+- Stealth mode flag
+- Host key, host key algorithm
+- Max auth attempts, max connections per IP
+- Proxy config
+
+Changing any of these requires a restart.
+
+### DynamicConfig
+
+Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
+checked per-connection or per-channel:
+
+- `AuthPolicy` — authorized keys, certificate authorities, token config
+- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
+- `RateLimitConfig` — rate limiting parameters
+
+`ArcSwap` provides lock-free reads on the hot path (every `auth_publickey()` and
+every `channel_open_direct_tcpip()` call does an `Arc` dereference — zero cost
+compared to the current approach). Writes are atomic: `store()` swaps the
+pointer. Existing connections finish with their current config; new connections
+get the new config.
+
+### ConfigReloadHandle
+
+```rust
+pub struct ConfigReloadHandle {
+    dynamic: Arc<ArcSwap<DynamicConfig>>,
+}
+
+impl ConfigReloadHandle {
+    pub fn reload(&self, new_config: DynamicConfig) { ... }
+}
+```
+
+The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
+
+### ConfigService
+
+The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
+protocol (behind the `irpc` feature flag) for production deployments that use
+the service layer.
For minimal deployments (CLI, single-node), direct +`ConfigReloadHandle::reload()` is sufficient. + +### TOML Config File + +An optional TOML config file covers static config plus initial auth/forwarding +paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first +API remains primary. The config file is a convenience input format: + +```toml +[server] +transport = "tls" +listen = "0.0.0.0:443" +stealth = false +max_connections_per_ip = 5 +max_auth_attempts = 3 + +[server.tls] +cert = "/etc/alknet/tls/cert.pem" +key = "/etc/alknet/tls/key.pem" + +[auth] +host_key = "/etc/alknet/ssh/host_key" + +[forwarding] +default = "deny" +``` + +### NAPI Reload API + +```typescript +interface AlknetServer { + reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void; + reloadForwarding(policy: ForwardingPolicyConfig): void; + reloadAll(config: DynamicConfig): void; +} +``` + +The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls +`ConfigReloadHandle::reload()`. + +### Client Configuration + +Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client +config is almost entirely static (which server to connect to, which key to use). + +## Consequences + +- **Positive**: Auth credentials and forwarding policy can be reloaded without + restarting the server. Adding a key via `reloadAuth()` takes effect on the + next connection attempt. +- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML + config file is an optional convenience layer, not a replacement for + `ServeOptions`. +- **Positive**: `ArcSwap` provides zero-cost reads on the hot path. Every auth + check and every channel open is a single `Arc` dereference. +- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows + production deployments to integrate config reload into their service mesh + without taking a direct dependency on `DynamicConfig` internals. 
+- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can + restrict access per identity, per destination, per transport (ADR-031). +- **Negative**: Two config structs where there was one. The split is clean + (transport vs. policy) but adds surface area. +- **Negative**: Config file introduces `toml` as a dependency in the CLI crate. + This is acceptable for a CLI binary. + +## References + +- [research/configuration.md](../../research/configuration.md) — Full analysis +- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded) +- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig) +- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider) +- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1 \ No newline at end of file diff --git a/docs/architecture/decisions/031-forwarding-policy.md b/docs/architecture/decisions/031-forwarding-policy.md new file mode 100644 index 0000000..553f42e --- /dev/null +++ b/docs/architecture/decisions/031-forwarding-policy.md @@ -0,0 +1,138 @@ +# ADR-031: Forwarding Policy + +## Status + +Accepted + +## Context + +Currently, any authenticated client can open a `direct-tcpip` SSH channel to +any destination. The only gate is authentication — once authenticated, a client +has unrestricted network access through the tunnel. This is a security gap: a +compromised key grants unrestricted access. 
+
+Operators need the ability to:
+- Restrict which hosts and ports authenticated clients can access
+- Apply different rules to different principals (key fingerprints, accounts)
+- Restrict WebTransport clients to alknet control channels only
+- Set a default policy (allow-all for migration compatibility, deny-all for
+  production)
+
+## Decision
+
+**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
+restart).**
+
+### Type Definitions
+
+```rust
+pub struct ForwardingPolicy {
+    pub default: ForwardingAction,
+    pub rules: Vec<ForwardingRule>,
+}
+
+pub struct ForwardingRule {
+    pub target: TargetPattern,
+    pub action: ForwardingAction,
+    pub principals: Vec<String>,  // Empty = matches all
+    pub transports: Vec<TransportKind>,  // Empty = matches all
+}
+
+pub enum ForwardingAction {
+    Allow,
+    Deny,
+}
+
+pub enum TargetPattern {
+    Any,
+    Host(String),  // "localhost", "*.example.com"
+    Cidr(IpNetwork),  // "10.0.0.0/8"
+    PortRange(String, Range<u16>),  // "localhost", ports 8080-8090
+    AlknetPrefix,  // Matches alknet-* control channels
+}
+```
+
+### Rule Evaluation
+
+Rules are evaluated in order. First match wins. If no rule matches, the default
+applies. This supports both allowlist and blocklist semantics:
+
+- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
+  destinations.
+- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
+  destinations.
+
+### Principals
+
+Each rule can specify which principals it applies to. A principal is an
+`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
+rule's `principals` field is empty, it matches all identities.
+
+This connects to the `IdentityProvider` trait (ADR-029): when a client
+authenticates, the `Identity` is resolved, and the forwarding policy checks
+rules against `Identity.id` and `Identity.scopes`.
+
+### TransportKind-Aware Rules
+
+Each rule can specify which `TransportKind` it applies to.
This enables +transport-specific restrictions — for example, WebTransport clients can be +restricted to `alknet-*` control channels only: + +```rust +ForwardingRule { + target: TargetPattern::AlknetPrefix, + action: ForwardingAction::Allow, + principals: vec![], + transports: vec![TransportKind::WebTransport { host: "*".into() }], +} +``` + +### Where the Policy Check Happens + +The forwarding policy check occurs in `channel_open_direct_tcpip` before the +proxy task is spawned. The current behavior (no check) is equivalent to +`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves +backward compatibility during migration. + +### DynamicConfig Integration + +`ForwardingPolicy` is part of `DynamicConfig` and reloadable via +`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take +effect on the next channel open — existing connections continue with their +current policy. + +### OQ Resolutions + +- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with + global rules + principal matching from `Identity.scopes`. Per-user scope + from `peer_credentials.metadata.scopes` via `IdentityProvider`. +- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind` + match in `ForwardingRule`. WebTransport clients can be restricted. +- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR. + `IdentityProvider` owns scopes. `ForwardingPolicy` consumes them. + +## Consequences + +- **Positive**: Operators can restrict access per identity, per destination, per + transport. A compromised key no longer grants unrestricted network access. +- **Positive**: Default-allow preserves current behavior during migration. Switch + to default-deny for production deployments. +- **Positive**: Policy is reloadable without restart. Adding a rule via + `reloadForwarding()` takes effect on the next channel open. 
+- **Positive**: `TransportKind`-aware rules enable transport-specific + restrictions (e.g., WebTransport clients restricted to alknet-* channels). +- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip` + call). The cost is a linear scan of rules — acceptable for small rule sets. + Large rule sets should use compiled matchers (future optimization). +- **Negative**: `TargetPattern` string matching is lenient. Host patterns like + `*.example.com` require careful implementation to prevent bypasses. The + `glob` or `globset` crate can handle this correctly. + +## References + +- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section +- [auth.md](../auth.md) — Identity.scopes and IdentityProvider +- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18 +- [ADR-029](029-identity-core-type.md) — Identity as core type +- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it) +- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3 \ No newline at end of file diff --git a/docs/architecture/decisions/032-event-boundary-discipline.md b/docs/architecture/decisions/032-event-boundary-discipline.md new file mode 100644 index 0000000..a860776 --- /dev/null +++ b/docs/architecture/decisions/032-event-boundary-discipline.md @@ -0,0 +1,96 @@ +# ADR-032: Event Boundary Discipline + +## Status + +Accepted + +## Context + +The research identified three distinct communication patterns in the system, and +conflating them is a known anti-pattern in event-driven architectures: + +1. **Domain events** (Honker streams) — Internal to the service that owns that + data. Used for state reconstruction within the service's own boundaries. + Examples: `nodes:created`, `edges:deleted`, `accounts:updated`. + +2. **irpc service calls** — Synchronous request-response within a node or + cluster. Internal to the system. 
Examples: `AuthProtocol::VerifyPubkey`, + `SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`. + +3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events + that cross node boundaries. External to the system. Examples: + `call.requested`, `call.responded`, `call.completed`, `call.aborted`. + +Without a hard constraint, it's tempting to have one service subscribe directly +to another service's Honker streams. This leads to: + +- **Leaky event store**: Service A reads Service B's domain events directly, + coupling A to B's internal state representation. When B changes its schema, A + breaks. +- **Boomerang coupling**: An integration event is too thin, causing the + consumer to call back to the source service synchronously to get details. This + negates the benefit of async communication. +- **Fat notification trap**: A notification event carries full entity state, + when it should use state transfer instead. + +## Decision + +**Event boundary discipline is a hard architectural constraint, not a +suggestion.** + +1. **Domain events stay within the owning service.** A Honker stream published + by the storage service (`nodes:created`) is for the storage service's own + state reconstruction. No other service reads these stream events directly. + +2. **irpc service calls are synchronous and internal.** They never cross node + boundaries. They are request-response, not events. They should not be used + as a substitute for integration events. + +3. **Call protocol events are the only events that cross node boundaries.** + `EventEnvelope` frames are the integration boundary. When a domain event + needs to be communicated to another node, it must be projected into a call + protocol event. + +4. **Projection from domain events to integration events is required when + crossing boundaries.** A service that owns a Honker stream must project + relevant state changes into `EventEnvelope` frames before they leave the + node. 
The projection strips internal details and produces a versioned, + stable integration event. + +This discipline applies at three levels: + +``` +Call Protocol (Layer 3, external, JSON) + └── irpc Service (Layer 3, internal, postcard) + └── Honker Streams (Domain events, within service boundary) +``` + +A call protocol handler MAY call an irpc service internally (e.g., +`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY +use Honker streams for its own state management. But domain events never +propagate beyond the service boundary without projection. + +## Consequences + +- **Positive**: Prevents leaky event stores. Services are independently + deployable and their internal schemas can evolve without breaking consumers. +- **Positive**: Honker and irpc are implementation details, not cross-boundary + contracts. The call protocol's `EventEnvelope` is the only stable, versioned + contract that other nodes depend on. +- **Positive**: Clear ownership. Each service owns its Honker streams and can + change them freely. Integration events are a deliberate, reviewed contract. +- **Positive**: Makes testing easier. Services can be tested in isolation with + mock domain events. Integration events are tested against the `EventEnvelope` + schema. +- **Negative**: Projection code is required. Every domain event that needs to + cross a boundary must be explicitly projected. This is deliberate — the + overhead ensures the integration contract is intentional. +- **Negative**: Developers must resist the temptation to subscribe directly to + Honker streams across services. Code review should catch this pattern. 
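The projection step required by this ADR might look roughly like the sketch below. All names are hypothetical: the real Honker event shapes and the real `EventEnvelope` fields are not specified in this document, and the `storage.node.created` event name is invented for illustration. The point is only that the projection is an explicit function that drops internal fields and decides which events leave the node at all.

```rust
/// Hypothetical domain event as the owning service might see it internally.
#[derive(Debug, Clone)]
pub enum DomainEvent {
    NodeCreated { row_id: i64, node_type: String, internal_json: String },
    EdgeDeleted { row_id: i64 },
}

/// Hypothetical integration event; the real envelope fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub event_type: String,
    pub schema_version: u32,
    pub fields: Vec<(String, String)>,
}

/// Project a domain event into an integration event.
/// Returns None for events that must stay inside the service boundary.
pub fn project(event: &DomainEvent) -> Option<EventEnvelope> {
    match event {
        // Only the stable, public part of the event is carried over;
        // row ids and raw internal JSON are deliberately dropped.
        DomainEvent::NodeCreated { node_type, .. } => Some(EventEnvelope {
            event_type: "storage.node.created".to_string(),
            schema_version: 1,
            fields: vec![("node_type".to_string(), node_type.clone())],
        }),
        // Not part of the integration contract in this sketch.
        DomainEvent::EdgeDeleted { .. } => None,
    }
}
```

Keeping the projection as a single reviewed function gives code review one place to check what crosses the boundary.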
+ +## References + +- [research/services.md](../../research/services.md) — Event boundary discipline section +- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries +- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry +- [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md) — Event-driven architecture patterns \ No newline at end of file diff --git a/docs/architecture/decisions/033-operationenv-irpc-call-protocol.md b/docs/architecture/decisions/033-operationenv-irpc-call-protocol.md new file mode 100644 index 0000000..f216d37 --- /dev/null +++ b/docs/architecture/decisions/033-operationenv-irpc-call-protocol.md @@ -0,0 +1,130 @@ +# ADR-033: OperationEnv as Universal Composition Mechanism + +## Status + +Accepted + +## Context + +The `@alkdev/operations` TypeScript package defines `OperationEnv` as a +universal composition mechanism. A handler receives `context.env[namespace][op](input)` +and can invoke any registered operation regardless of whether it runs locally, in +an irpc service on the same cluster, or on a remote node via call protocol. + +The research documents define three dispatch paths: +1. **Local dispatch** — direct function call through the operation registry +2. **Service dispatch** — irpc protocol call to a service backend +3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node + +Without a formal decision, irpc services could be seen as a replacement for +OperationEnv or for the call protocol. They are not — irpc is one dispatch +backend for OperationEnv, not a replacement for anything. The call protocol is +another dispatch backend. OperationEnv unifies them from the handler's +perspective. 
+ +The three communication patterns in the system (ADR-032) are: +- Domain events (Honker streams) — internal to the owning service +- irpc service calls — synchronous, in-cluster +- Call protocol events — asynchronous, cross-node + +irpc services and call protocol operations serve different scopes but must +compose cleanly through OperationEnv. + +## Decision + +**OperationEnv is the universal composition mechanism that all operation +handlers receive. It provides namespace + operation name → invoke with input, +return output, regardless of dispatch path.** + +### OperationEnv Behavioral Contract + +```rust +// The behavioral contract: given a namespace and operation name, invoke the +// operation with the given input and return the output. The handler neither +// knows nor cares whether the dispatch is local, via irpc, or via call protocol. +pub trait OperationEnv: Send + Sync { + fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope; +} +``` + +The Rust implementation may use typed method dispatch or a registry behind the +scenes, but the handler-facing API must preserve this contract. + +### Three Dispatch Paths + +OperationEnv resolves each call to one of three dispatch backends: + +| Path | Mechanism | Serialization | Scope | +|------|-----------|---------------|-------| +| Local | Direct function call through registry | None (in-process) | Same process | +| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster | +| Remote | Call protocol `EventEnvelope` | JSON | Cross-node | + +All three produce the same `ResponseEnvelope`. The handler always calls +`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope` +back. 
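As a rough sketch of how the three dispatch paths could sit behind one `invoke` call, the following uses only the standard library. `Value` is a `String` stand-in for a JSON value, the `ResponseEnvelope` fields are invented for the example, and `RoutingEnv`, `Backend`, and `Handler` are hypothetical names; the real implementation may look quite different.

```rust
use std::collections::HashMap;

// Stand-in for a JSON value; an assumption made to keep the sketch std-only.
type Value = String;

// Hypothetical envelope shape; the real fields are not specified here.
#[derive(Debug, PartialEq)]
struct ResponseEnvelope {
    ok: bool,
    output: Value,
}

// Same behavioral contract as the trait in this ADR.
trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}

// A backend receives the operation name and input and returns an envelope.
type Handler = Box<dyn Fn(&str, Value) -> ResponseEnvelope + Send + Sync>;

// One variant per dispatch path; in this sketch all three wrap a closure.
#[allow(dead_code)]
enum Backend {
    Local(Handler),   // direct function call
    Service(Handler), // would wrap an irpc client
    Remote(Handler),  // would wrap a call protocol connection
}

struct RoutingEnv {
    backends: HashMap<String, Backend>,
}

impl OperationEnv for RoutingEnv {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope {
        match self.backends.get(namespace) {
            // The caller sees the same result type whichever path is taken.
            Some(Backend::Local(h)) | Some(Backend::Service(h)) | Some(Backend::Remote(h)) => {
                h(operation, input)
            }
            None => ResponseEnvelope {
                ok: false,
                output: format!("unknown namespace: {}", namespace),
            },
        }
    }
}
```

A handler that calls `env.invoke("secrets", "derive", input)` would not need to change if the `secrets` namespace moved from a local backend to a service backend.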
+ +### Service Assembly + +The deployment topology determines which dispatch path each operation uses: + +```rust +// Minimal deployment (single node, all local) +let env = OperationEnv::local(local_registry); + +// Production deployment (mix of local and remote) +let env = OperationEnv::new() + .local("auth", auth_registry) // Auth runs locally + .local("config", config_registry) // Config runs locally + .service("secrets", secret_irpc_client) // Secret service via irpc + .remote("worker-1", call_protocol_conn) // Worker-1 operations via call protocol +``` + +### irpc Services Are One Dispatch Backend + +irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the +wire format for in-cluster communication. They are Rust-to-Rust, type-safe, +and efficient. But they are not a replacement for OperationEnv or for the call +protocol. They are one dispatch backend. + +An irpc service can be exposed as a call protocol operation: +`/head/auth/verify` receives a call protocol event and internally calls +`AuthProtocol::VerifyPubkey` via irpc. The layers compose: + +``` +Call Protocol (Layer 3, external, JSON) + └── irpc Service (Layer 3, internal, postcard) + └── Honker Streams (Domain events, within service boundary) +``` + +### Adapters Map to OperationEnv + +HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS +(`{op}.{namespace}.alk.dev TXT?`), and call protocol +(`/call.requested`) all resolve through OperationEnv. This is what makes +operations universally composable across all interfaces. + +## Consequences + +- **Positive**: Handlers compose through a single interface. Adding a new + dispatch path (e.g., a new irpc service) doesn't change handler code. +- **Positive**: irpc and call protocol coexist naturally. The handler doesn't + know which path was taken. +- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same + OperationEnv interface. One handler, multiple dispatch paths. 
+- **Positive**: Deployment topology determines dispatch, not code. Same handler + works locally, in-cluster, or cross-node. +- **Negative**: OperationEnv is a new abstraction that must coexist with the + existing call protocol handler pattern. The registry currently maps paths to + handlers; OperationEnv adds namespace-aware composition on top. +- **Negative**: The `@alkdev/operations` TypeScript `HashMap>` model needs idiomatic Rust translation. The behavioral + contract must match, but the implementation can differ. + +## References + +- [research/services.md](../../research/services.md) — OperationContext, OperationEnv +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring +- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline +- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol +- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation \ No newline at end of file diff --git a/docs/architecture/decisions/034-head-worker-terminology.md b/docs/architecture/decisions/034-head-worker-terminology.md new file mode 100644 index 0000000..58b8873 --- /dev/null +++ b/docs/architecture/decisions/034-head-worker-terminology.md @@ -0,0 +1,55 @@ +# ADR-034: Head/Worker Terminology + +## Status + +Accepted + +## Context + +The project previously used hub/spoke terminology for describing node +relationships: a hub node that coordinates connections and spokes that connect to +it. This terminology implies a strict star topology where the hub is +fundamentally different from spokes. + +In practice, a coordinating node can also execute operations (run services, +forward traffic). Any node can become a coordinator. The architecture supports +mesh topologies where nodes coordinate in a peer-to-peer fashion. 
+ +The research documents (`core.md`, `services.md`) and updated architecture +specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`) +already use head/worker consistently. Existing ADRs (024, 025) retain their +original hub/spoke language because ADRs are historical records. + +## Decision + +**Use head/worker terminology throughout the project.** + +- **Head node**: A node that coordinates — accepts connections, routes + operations, manages cluster state. A head is also a worker (it can execute + operations). +- **Worker node**: A node that connects to a head, registers its services, and + executes operations. Any worker can become a head. +- **Node**: Any participant in the network. Every node has an Ed25519 identity. + +The terms hub and spoke are deprecated in all new specs, code, and +documentation. Existing ADRs retain their original language as historical +records — ADRs document what was decided at the time, not what the current +terminology is. + +## Consequences + +- **Positive**: Natural mesh formation. A head that is also a worker enables + multi-hop routing, redundancy, and distributed topologies without a + centralized authority. +- **Positive**: Consistency with integration plan and research documents. +- **Positive**: The terminology better reflects the architecture — there is no + single "hub" that's fundamentally different from "spokes." +- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is + intentional — ADRs are historical records. 
+ +## References + +- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section +- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically +- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically +- [research/core.md](../../research/core.md) — Head/worker terminology \ No newline at end of file diff --git a/docs/architecture/flowgraph.md b/docs/architecture/flowgraph.md new file mode 100644 index 0000000..591030e --- /dev/null +++ b/docs/architecture/flowgraph.md @@ -0,0 +1,186 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# FlowGraph + +## What + +The `alknet-flowgraph` crate provides graph data structures and operations, +mapping the TypeScript `@alkdev/flowgraph` package's call-graph and +operation-graph concepts to `petgraph::DiGraph`. + +## Why + +Call graphs and operation graphs are core observability and type-safety +constructs. Call graphs track request flow across services; operation graphs +validate type compatibility between composed operations. The crate is pure +computation (no I/O, no external state), making it safe to include in any +deployment topology. + +## Architecture + +### Core Abstraction + +`petgraph::DiGraph` replaces graphology. The mapping is nearly 1:1 for the +operations used: + +| TypeScript (graphology) | Rust (petgraph) | +|------------------------|-----------------| +| `graph.addNode(key, attrs)` | `graph.add_node(attrs)` + key_to_index | +| `graph.addEdge(source, target, attrs)` | `graph.add_edge(source, target, attrs)` | +| `hasCycle()` | `is_cyclic_directed(&graph)` | +| `topologicalSort()` | `toposort(&graph)` | + +A `HashMap` provides node-key-to-index lookups, mirroring +the `key` column in the SQLite `nodes` table. 
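The key-to-index bookkeeping and the topological ordering mentioned above can be sketched without any external crate. This is a standard-library stand-in, not the `petgraph`-based design itself: `KeyedGraph` is a hypothetical type, node indices are plain `usize` values, and the ordering uses Kahn's algorithm rather than whatever `petgraph` does internally.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, PartialEq)]
struct CycleError;

// Hypothetical std-only stand-in for a keyed directed graph.
#[derive(Default)]
struct KeyedGraph {
    keys: Vec<String>,                    // index -> key
    key_to_index: HashMap<String, usize>, // key -> index
    edges: Vec<(usize, usize)>,           // (source index, target index)
}

impl KeyedGraph {
    /// Insert a node if its key is new and return its index either way.
    fn add_node(&mut self, key: &str) -> usize {
        if let Some(&idx) = self.key_to_index.get(key) {
            return idx;
        }
        let idx = self.keys.len();
        self.keys.push(key.to_string());
        self.key_to_index.insert(key.to_string(), idx);
        idx
    }

    /// Add a directed edge, creating either endpoint on demand.
    fn add_edge(&mut self, source: &str, target: &str) {
        let s = self.add_node(source);
        let t = self.add_node(target);
        self.edges.push((s, t));
    }

    /// Kahn's algorithm: returns keys in dependency order, or an error on a cycle.
    fn topological_order(&self) -> Result<Vec<String>, CycleError> {
        let n = self.keys.len();
        let mut indegree = vec![0usize; n];
        let mut outgoing: Vec<Vec<usize>> = vec![Vec::new(); n];
        for &(s, t) in &self.edges {
            indegree[t] += 1;
            outgoing[s].push(t);
        }
        let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
        let mut order = Vec::with_capacity(n);
        while let Some(i) = queue.pop_front() {
            order.push(self.keys[i].clone());
            for &t in &outgoing[i] {
                indegree[t] -= 1;
                if indegree[t] == 0 {
                    queue.push_back(t);
                }
            }
        }
        // Any node left unvisited sits on a cycle.
        if order.len() == n { Ok(order) } else { Err(CycleError) }
    }
}
```

Callers work with string keys throughout; indices stay an internal detail, which is the role the key-to-index map plays in the design described here.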
+ +### FlowGraph + +```rust +pub struct FlowGraph<N, E> +where + N: NodeAttributes, + E: EdgeAttributes, +{ + graph: DiGraph<N, E>, + key_to_index: HashMap<String, NodeIndex>, +} + +pub trait NodeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync { + fn key(&self) -> &str; + fn set_key(&mut self, key: String); +} + +pub trait EdgeAttributes: Clone + Serialize + DeserializeOwned + Debug + Send + Sync { + fn edge_type(&self) -> &str; +} +``` + +### Operation Graph (Static) + +Built from `OperationSpec`s at startup. Answers structural questions: type +compatibility, cycle detection, reachability. + +```rust +pub struct OperationNodeAttrs { + pub name: String, + pub namespace: String, + pub op_type: OperationType, + pub input_schema: Value, + pub output_schema: Value, +} + +pub enum OperationType { Query, Mutation, Subscription } +``` + +Type compatibility compares `output_schema` (source) against `input_schema` +(target) using `jsonschema::validate()`. Exact match or subtype = compatible +edge. Structural mismatch = incompatible edge. + +### Call Graph (Dynamic) + +Populated at runtime from call protocol events. Every `call.requested` adds a +node; `call.responded`/`call.error`/`call.aborted` update status. 
+ +```rust +pub struct CallNodeAttrs { + pub request_id: String, + pub operation_id: String, + pub status: CallStatus, + pub parent_request_id: Option, + pub input: Value, + pub output: Option, + pub error: Option, + pub identity: Option, + pub started_at: Option, + pub completed_at: Option, +} + +pub enum CallStatus { Pending, Running, Completed, Failed, Aborted } +``` + +### Key Operations + +| Query | Method | Returns | +|-------|--------|---------| +| Topological order | `topological_order()` | `Result, CycleError>` | +| Cycle detection | `has_cycles()` | `bool` | +| Ancestors/descendants | `ancestors()`, `descendants()` | `Vec` | +| Status filtering | `filter_by_status()` | Keys with matching status | +| Duration | `duration()` | `completed_at - started_at` | + +### DAG Invariants + +- **Operation graph**: DAG-only enforced at construction. Cycles throw + `CycleError`. +- **Call graph**: DAG by design. `parent_request_id` cannot create ancestor + cycles. +- **No parallel edges**: `multi: false`. +- **No self-loops**: `allow_self_loops: false`. + +### Integration with alknet-storage + +Call graphs and operation graphs are stored as metagraph instances in +alknet-storage. The bridge is serialization: `FlowGraph` serializes to +`serde_json::Value`, which storage persists in the `nodes.attributes` and +`edges.attributes` columns. + +### Integration with alknet-core (Call Protocol) + +The call protocol's `EventEnvelope` drives call graph construction: + +```rust +call_map.on_requested(|event| { + call_graph.update_from_event(&CallEvent::Requested(event)); +}); +``` + +### Crate Dependencies + +```toml +[dependencies] +petgraph = "0.x" +serde = { version = "1", features = ["derive"] } +serde_json = "1" +jsonschema = "0.x" +thiserror = "1" +uuid = { version = "1", features = ["v4"] } +chrono = { version = "0.x", features = ["serde"] } +``` + +Does NOT depend on alknet-core, alknet-storage, or alknet-secret. 
+ +### Interface Back to Core + +`OperationSpec` and `CallNodeAttrs` types must match alknet-core's definitions. +The bridge is serialization — flowgraph serializes to JSON, storage persists it. +alknet-flowgraph does not depend on alknet-core as a crate; it conforms to the +`OperationSpec` schema independently. + +## Constraints + +- Pure computation crate — no I/O, no database, no external state. +- No dependency on alknet-core, alknet-storage, or alknet-secret. +- Type compatibility with alknet-core's `OperationSpec` is via serialization + conformance, not a crate dependency. + +## Open Questions + +- None specific to this spec. See [open-questions.md](open-questions.md) for + general questions. + +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-flowgraph is independent of core, storage, secret | + +## References + +- [research/flow.md](../research/flow.md) — Full FlowGraph, operation graph, call graph design +- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.3 +- [call-protocol.md](call-protocol.md) — EventEnvelope, PendingRequestMap +- `@alkdev/flowgraph` — TypeScript call-graph and operation-graph implementation +- `@alkdev/operations` — OperationSpec, CallHandler, registry \ No newline at end of file diff --git a/docs/architecture/identity.md b/docs/architecture/identity.md new file mode 100644 index 0000000..b2032f6 --- /dev/null +++ b/docs/architecture/identity.md @@ -0,0 +1,189 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Identity + +## What + +The `Identity` type and `IdentityProvider` trait are the core abstractions for +authentication and authorization in alknet. `Identity` is the unified result of +auth verification — whether via SSH public key, signed timestamp token, or +database lookup. 
`IdentityProvider` is the trait that resolves credentials to an +`Identity`, decoupling alknet-core from any specific identity storage. + +## Why + +Auth, forwarding policy, and call protocol all need to know who is making a +request and what they are authorized to do. Without `Identity` in core, each +subsystem would define its own identity type, leading to duplication and +conversion boilerplate. Without `IdentityProvider` as a trait, alknet-core +would either hardcode config-file-based auth or take a database dependency, +neither acceptable for a library crate. + +The `IdentityProvider` trait exists because the same auth verification concept +needs two implementations: `ConfigIdentityProvider` for minimal deployments (all +keys in memory via ArcSwap) and `StorageIdentityProvider` for production (SQLite +lookup via `peer_credentials` and ACL graph). The trait is the contract; the +backing store is pluggable. + +## Architecture + +### Identity Struct + +```rust +pub struct Identity { + pub id: String, // Fingerprint or account UUID + pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"] + pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]} +} +``` + +The `id` field serves a dual purpose: +- **Config-based auth** (`ConfigIdentityProvider`): holds the Ed25519 key + fingerprint (e.g., `SHA256:abc123...`) +- **Database-backed auth** (`StorageIdentityProvider`): holds the account UUID + from the `accounts` table + +This keeps the type simple while accommodating both auth paths. Downstream +consumers (forwarding policy, call protocol ACL checks) use `scopes` and +`resources` without knowing whether the identity came from a config file or a +database. + +### IdentityProvider Trait + +```rust +pub trait IdentityProvider: Send + Sync + 'static { + /// Resolve an SSH public key fingerprint to an identity. + fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>; + + /// Resolve an auth token to an identity. 
+ /// Returns None if the token is invalid, expired, or the key is not authorized. + fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>; +} +``` + +Both SSH key auth and token auth resolve to the same `Identity` type. The trait +lives in `alknet_core::auth`. + +### ConfigIdentityProvider (Default) + +Reads from `ArcSwap<DynamicConfig>` per ADR-030. Every authorized key gets +a default scope set. No database dependency. This is the default for CLI and +single-node deployments. + +```rust +pub struct ConfigIdentityProvider { + auth_config: Arc<ArcSwap<DynamicConfig>>, +} + +impl IdentityProvider for ConfigIdentityProvider { + fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> { + let config = self.auth_config.load(); + config.auth.ssh.authorized_keys.get(fingerprint) + .map(|key_entry| Identity { + id: fingerprint.to_string(), + scopes: key_entry.scopes.clone(), + resources: key_entry.resources.clone(), + }) + } + + fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> { + // Verify Ed25519 signature against the same authorized_keys set + // Resolve to the same Identity as SSH auth would produce + todo!() + } +} +``` + +### StorageIdentityProvider (Production) + +Implemented in `alknet-storage` (not in alknet-core). Backed by SQLite +`peer_credentials` and `api_keys` tables plus the ACL graph. Resolves +fingerprint → account → organization membership → effective scopes. Implements the +`IdentityProvider` trait defined in alknet-core. + +### AuthProtocol irpc Service + +The `AuthProtocol` irpc service (behind the `irpc` feature flag per ADR-028) +provides an async boundary for auth verification. 
It is one way to satisfy the +`IdentityProvider` trait, not a replacement for it: + +```rust +enum AuthProtocol { + VerifyPubkey { fingerprint: String, key_data: Vec<u8> }, + VerifyToken { token_bytes: Vec<u8>, timestamp: u64 }, + ReloadKeys, + CheckAccess { identity: Identity, operation: String }, +} + +enum AuthResult { + Ok(Identity), + Denied(String), +} +``` + +The relationship: +- **Trait-based path**: Handler calls `identity_provider.resolve_from_fingerprint()` + directly. Zero overhead. Used when irpc is disabled or when the + implementation is local. +- **irpc path**: Handler calls `identity_provider.resolve_from_fingerprint()`, + which internally delegates to `AuthProtocol::VerifyPubkey` via an irpc client. + Used in production deployments with SQLite-backed auth. + +Both paths produce the same `Identity` result. + +### Auth Flows + +**SSH key auth** (existing, unchanged): +``` +Client connects → SSH handshake → auth_publickey() callback + → IdentityProvider::resolve_from_fingerprint(fingerprint) + → Some(Identity) or None +``` + +**Token auth** (new, for non-SSH transports): +``` +Browser connects → WebTransport CONNECT request + → Extract token from URL path or Authorization header + → IdentityProvider::resolve_from_token(token) + → Some(Identity) or None +``` + +Both paths produce an `Identity`. The `Identity` is attached to the connection +and used by `ForwardingPolicy` and call protocol for authorization decisions. + +## Constraints + +- `Identity` and `IdentityProvider` live in `alknet_core::auth`. No database + dependency at the core level (ADR-029). +- alknet-storage implements the core trait: the dependency goes from storage + to core, not the other way. +- The `id` field in `Identity` serves a dual purpose (fingerprint or UUID). This + is a deliberate simplification; downstream consumers don't need to know the + source. +- Certificate authority tokens are not supported for token auth in v1 (ADR-023). 
+- The irpc feature flag means nodes that only do SSH tunneling don't need the + service layer overhead. + +## Open Questions + +- None specific to this spec. See [open-questions.md](open-questions.md) for + general auth questions (OQ-15, OQ-19). + +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [029](decisions/029-identity-core-type.md) | Identity as core type | `Identity` and `IdentityProvider` live in alknet-core, not storage | +| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | `AuthProtocol` behind feature flag; `IdentityProvider` is the contract | +| [023](decisions/023-unified-auth-shared-key-material.md) | Unified auth | Same key material for SSH and token auth; same `Identity` result | + +## References + +- [auth.md](auth.md) — Token authentication, AuthPolicy, WebTransport session handling +- [research/services.md](../research/services.md) — AuthService, AuthProtocol definition +- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.2 +- [ADR-030](decisions/030-static-dynamic-config-split.md) — DynamicConfig (ConfigIdentityProvider reads from it) +- [ADR-031](decisions/031-forwarding-policy.md) — ForwardingPolicy consumes Identity.scopes \ No newline at end of file diff --git a/docs/architecture/interface.md b/docs/architecture/interface.md new file mode 100644 index 0000000..0cde1d1 --- /dev/null +++ b/docs/architecture/interface.md @@ -0,0 +1,221 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Interface (Layer 2) + +## What + +The Interface layer sits between Transport (Layer 1) and Protocol (Layer 3). +An Interface consumes a `Transport::Stream` and produces call protocol sessions. +SSH is an interface, not a transport — it wraps a byte stream in session +semantics. Raw framing (4-byte length prefix + JSON `EventEnvelope`) is another +interface, one without SSH overhead. + +## Why + +In the current architecture, SSH is deeply embedded in `ServerHandler`. 
This +tangling of transport, interface, and protocol makes it impossible to: + +- Run the call protocol over DNS queries without wrapping SSH inside DNS +- Use raw framing for local service mesh (no SSH overhead) +- Support WebTransport direct call protocol for browsers +- Separate auth mechanics from channel management + +The three-layer model (ADR-026) cleanly separates these concerns. Transport +produces bytes. Interface parses bytes into sessions. Protocol carries +semantics. A connection is always a (Transport, Interface) pair. + +## Architecture + +### Three-Layer Model + +``` +Layer 3: Protocol (Call protocol, Operations, OperationEnv) +Layer 2: Interface (SSH, raw framing, HTTP/WS, DNS control channel) +Layer 1: Transport (TCP, TLS, iroh, DNS, WebTransport) +``` + +- **Layer 1: Transport** — produces byte streams (`AsyncRead + AsyncWrite + Unpin + + Send`). Unchanged per ADR-001. +- **Layer 2: Interface** — consumes a `Transport::Stream` and produces call + protocol sessions. SSH does handshake + auth + channel multiplexing. Raw + framing does length-prefix parsing. +- **Layer 3: Protocol** — carries semantics. Call protocol events, operation + registry, service calls. Agnostic to both Transport and Interface below it. + +### Interface Trait + +```rust +#[async_trait] +pub trait Interface: Send + Sync + 'static { + type Session; + async fn accept(stream: TransportStream, config: &InterfaceConfig) -> Result; +} +``` + +The session produced by an interface is consumed by the call protocol handler. +Different interfaces produce different session types, but the call protocol +handler receives `EventEnvelope` frames from any interface. + +### SshInterface + +Wraps the existing `ServerHandler` logic. This is the most complex interface +because SSH provides channel multiplexing, auth negotiation, and proxy +management within a single session. 
+ +What stays in SshInterface (Layer 2): +- SSH handshake and session management +- Auth delegation to `IdentityProvider` (via `auth_publickey()` callback) +- Channel multiplexing (multiple channels per session) +- `alknet-control:0` channel routing to call protocol + +What moves to Layer 3 (call protocol handler): +- Operation registry and dispatch +- Forwarding policy checks (per ADR-031) +- Operation context construction (Identity, scopes) + +What moves to per-connection state: +- Port forwarding proxy logic + +### RawFramingInterface + +Reads 4-byte big-endian length prefix + JSON `EventEnvelope` frames directly +from the transport stream. No SSH wrapping. No channel multiplexing — the +entire stream is a single call protocol channel. + +```rust +pub struct RawFramingInterface; + +impl Interface for RawFramingInterface { + type Session = RawFramingSession; + // Reads length-prefixed EventEnvelope frames from the stream +} +``` + +Used for: +- DNS control channel (DNS transport + raw framing) +- Local service mesh (TCP + raw framing, no SSH overhead) +- Browser direct call protocol (WebTransport + raw framing, future) + +### DNS Control Channel + +A (DNS transport, raw framing interface) pair. The DNS transport encodes +`EventEnvelope` frames as DNS query/response pairs. The raw framing interface +parses them directly — **NOT** SSH inside DNS. 
+ +``` +Client: Encode EventEnvelope as base32 DNS query labels + → DNS Transport → DNS Server → Raw Framing Interface → Call Protocol Handler + +Server: Return EventEnvelope as DNS TXT record response + ← Raw Framing Interface ← DNS Transport ← Call Protocol Handler +``` + +### Valid (Transport, Interface) Pairs + +| Transport | Interface | Use case | +|-----------|-----------|----------| +| TLS | SSH | Standard alknet tunnel | +| TCP | SSH | Plain SSH tunnel | +| iroh | SSH | P2P SSH tunnel | +| DNS | raw framing | DNS control channel | +| WebTransport | SSH | Browser SSH tunnel (future) | +| WebTransport | raw framing | Browser call protocol (future) | +| TCP | raw framing | Direct call protocol, local mesh | + +### InterfaceConfig + +Different interfaces require different configuration: + +```rust +pub enum InterfaceConfig { + Ssh(SshInterfaceConfig), + RawFraming(RawFramingConfig), +} + +pub struct SshInterfaceConfig { + pub auth: Arc, + pub forwarding: Arc>, // for ForwardingPolicy + pub host_key: Arc, +} + +pub struct RawFramingConfig { + // No SSH-specific config needed + // Auth is handled by the transport layer (e.g., token auth for WebTransport) + // or by the call protocol layer +} +``` + +### Auth Across Interfaces + +- **SshInterface**: Auth happens during SSH handshake via + `IdentityProvider::resolve_from_fingerprint()`. The authenticated `Identity` + is attached to the session. +- **RawFramingInterface**: Auth is handled by the transport (e.g., token auth + for WebTransport via `IdentityProvider::resolve_from_token()`) or by the call + protocol layer (operation-level ACL). + +Both paths produce the same `Identity` type (ADR-029). 
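The raw framing that `RawFramingInterface` relies on (a 4-byte big-endian length prefix followed by a JSON `EventEnvelope`) can be sketched with the standard library alone. This is a synchronous illustration under stated assumptions: the function names `write_frame` and `read_frame` and the `MAX_FRAME_LEN` cap are hypothetical, and the payload is treated as opaque bytes rather than parsed JSON. The actual interface would presumably be async.

```rust
use std::io::{self, Read, Write};

// Assumed upper bound on frame size; the spec does not state one.
const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Write one frame: 4-byte big-endian length prefix, then the payload bytes.
fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one frame, rejecting oversized or truncated input.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the declared length before allocating is a common precaution for length-prefixed protocols, since the prefix is attacker-controlled on an unauthenticated stream.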
+ +### Server Accept Loop + +With the Interface trait, the accept loop becomes: + +```rust +for listener in listeners { + let (transport, interface) = listener; + tokio::spawn(async move { + loop { + let stream = transport.accept().await?; + let session = interface.accept(stream, &config).await?; + // session produces call protocol events + // call protocol handler is interface-agnostic + } + }); +} +``` + +## Constraints + +- The Interface trait must accommodate both SSH's channel multiplexing and raw + framing's single-stream model through the same abstraction. +- `SshInterface` is the most invasive refactoring in Phase 1. The existing + `ServerHandler` owns auth, channel management, and proxy logic — extracting + these cleanly requires careful design (integration-plan, Phase 1.8). +- DNS transport implementation is Phase 4 work. The `TransportKind::Dns` variant + and `RawFramingInterface` are defined now; implementation is deferred. +- WebTransport is Phase 4 work. The `TransportKind::WebTransport` variant is a + tag only for now. + +## Open Questions + +- **OQ-IF-01**: How does the `Interface` session type relate to the call + protocol's `EventEnvelope` stream? Does every session implement + `Stream`? This needs design during Phase 1.8. + +- **OQ-IF-02**: Should `SshInterface` own the `ForwardingPolicy` check for + `channel_open_direct_tcpip`, or should that move to Layer 3? Current thinking: + the forwarding check is a Layer 3 concern (it's policy, not session mechanics), + but the channel open/close lifecycle is Layer 2. The Interface reports channel + open requests to Layer 3; Layer 3 applies `ForwardingPolicy` and tells + Layer 2 whether to proxy. 
+ +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [026](decisions/026-transport-interface-separation.md) | Three-layer model | SSH is Layer 2, not Layer 1 | +| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Protocol is interface-agnostic | +| [029](decisions/029-identity-core-type.md) | Identity as core type | Auth resolution across interfaces | +| [031](decisions/031-forwarding-policy.md) | Forwarding policy | Layer 3 policy applied to Layer 2 channel requests | + +## References + +- [research/integration-plan.md](../research/integration-plan.md) — Phase 1.8, valid (Transport, Interface) pairs +- [research/core.md](../research/core.md) — DNS transport, three-layer model +- [ADR-026](decisions/026-transport-interface-separation.md) — Transport/interface separation +- [transport.md](transport.md) — Transport trait (unchanged at Layer 1) +- [server.md](server.md) — Current ServerHandler (will become SshInterface) +- [identity.md](identity.md) — IdentityProvider, auth across interfaces \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index 7664421..563056e 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-04 +last_updated: 2026-06-07 --- # Open Questions @@ -96,10 +96,10 @@ last_updated: 2026-06-04 ### OQ-12: Per-user forwarding scope vs global rules - **Origin**: [research/configuration.md](../research/configuration.md) -- **Status**: open -- **Priority**: medium -- **Resolution**: (pending) -- **Cross-references**: configuration.md +- **Status**: ~~resolved~~ +- **Priority**: ~~medium~~ — +- **Resolution**: ADR-031 — Start with global rules + principal matching from `Identity.scopes`. Per-user scope from `peer_credentials.metadata.scopes` via `IdentityProvider`. 
The `ForwardingPolicy` evaluates rules against `Identity.id` and `Identity.scopes` from the authenticated identity. +- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md) ### OQ-13: Config file auto-reload via file watching - **Origin**: [research/configuration.md](../research/configuration.md) @@ -119,38 +119,59 @@ last_updated: 2026-06-04 - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: open - **Priority**: medium -- **Resolution**: (pending — needs R&D in WebTransport transport session) -- **Cross-references**: [auth.md](auth.md), OQ-19 +- **Resolution**: (deferred to Phase 4 — needs R&D in WebTransport transport session) +- **Cross-references**: [auth.md](auth.md), OQ-19, [interface.md](interface.md) ### OQ-16: Transport-specific forwarding policy (e.g., WebTransport clients restricted to alknet-* channels) - **Origin**: [research/configuration.md](../research/configuration.md) -- **Status**: open -- **Priority**: low -- **Resolution**: (pending — defer to forwarding policy design) -- **Cross-references**: configuration.md +- **Status**: ~~resolved~~ +- **Priority**: ~~low~~ — +- **Resolution**: ADR-031 — Add `TransportKind` match in `ForwardingRule`. WebTransport clients can be restricted to `alknet-*` channels via `TargetPattern::AlknetPrefix` combined with a `TransportKind::WebTransport` filter. +- **Cross-references**: [ADR-031](decisions/031-forwarding-policy.md), [configuration.md](configuration.md) ### OQ-17: Transport-aware auth layer (SSH keys vs API keys for non-SSH transports) - **Origin**: [research/configuration.md](../research/configuration.md) - **Status**: ~~resolved~~ - **Priority**: ~~medium~~ — - **Resolution**: ADR-023 — Unified auth with shared key material. SSH transports use SSH pubkey auth. Non-SSH transports (WebTransport) use Ed25519-signed timestamp tokens. Both verify against the same `authorized_keys` set. 
The presentation differs per transport, but the identity is unified. `AuthPolicy` holds both `SshAuthConfig` and `TokenAuthConfig`, with `TokenKeySource::Shared` as the default (same keys for both paths). `IdentityProvider` trait decouples alknet-core from identity storage. -- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [auth.md](auth.md), OQ-15 +- **Cross-references**: [ADR-023](decisions/023-unified-auth-shared-key-material.md), [identity.md](identity.md), OQ-15 + +### OQ-23: irpc dependency — always or behind feature flag? +- **Origin**: [research/integration-plan.md](../research/integration-plan.md) +- **Status**: ~~resolved~~ +- **Priority**: medium — +- **Resolution**: ADR-027 — Feature flag. Nodes that only do SSH tunneling don't need the service layer. irpc is behind a feature flag in alknet-core and an independent dependency in alknet-secret and alknet-storage. +- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md) + +### OQ-24: DNS control channel scope for initial implementation? +- **Origin**: [research/integration-plan.md](../research/integration-plan.md) +- **Status**: ~~resolved~~ +- **Priority**: medium — +- **Resolution**: ADR-026 — DNS control channel carries call protocol frames only (no SSH tunneling over DNS). The (DNS transport, raw framing interface) pair sends `EventEnvelope` directly. SSH-over-DNS is a future possibility but out of scope. +- **Cross-references**: [ADR-026](decisions/026-transport-interface-separation.md), [interface.md](interface.md) + +### OQ-25: alknet-storage and alknet-secret irpc dependency +- **Origin**: [research/integration-plan.md](../research/integration-plan.md) +- **Status**: ~~resolved~~ +- **Priority**: low — +- **Resolution**: ADR-027 — Independently. They're separate crates. irpc is a shared library they both use as an independent dependency. 
+- **Cross-references**: [ADR-027](decisions/027-crate-decomposition.md) ## Auth ### OQ-18: Source of Identity.scopes — ForwardingPolicy, IdentityProvider, or both? - **Origin**: [auth.md](auth.md) -- **Status**: open -- **Priority**: medium -- **Resolution**: (pending) -- **Cross-references**: ADR-023, [call-protocol.md](call-protocol.md) +- **Status**: ~~resolved~~ +- **Priority**: ~~medium~~ — +- **Resolution**: ADR-029 and ADR-031 — `IdentityProvider` owns scopes. The `Identity` struct includes `scopes` and `resources` fields populated by the `IdentityProvider` implementation (config-based or database-backed). `ForwardingPolicy` uses scopes from `Identity` — it consumes them, it doesn't produce them. +- **Cross-references**: [ADR-029](decisions/029-identity-core-type.md), [ADR-031](decisions/031-forwarding-policy.md), [identity.md](identity.md) ### OQ-19: Separate TLS identity for WebTransport vs shared with SSH-over-TLS? - **Origin**: [auth.md](auth.md) - **Status**: open - **Priority**: low -- **Resolution**: (pending) -- **Cross-references**: OQ-15 +- **Resolution**: (deferred to Phase 4 — QUIC is UDP, TLS-over-TCP is TCP, they can share port 443 without conflict) +- **Cross-references**: OQ-15, [interface.md](interface.md) ## Call Protocol @@ -158,19 +179,65 @@ last_updated: 2026-06-04 - **Origin**: [call-protocol.md](call-protocol.md) - **Status**: open - **Priority**: medium -- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach) +- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach but needs spec in call-protocol.md) - **Cross-references**: ADR-024, ADR-025 ### OQ-21: Routing calls to specific workers with same-service operations - **Origin**: [call-protocol.md](call-protocol.md) - **Status**: ~~resolved~~ - **Priority**: ~~medium~~ — -- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. 
The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key. +- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection. - **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025 ### OQ-22: Client streaming (streaming inputs) in the call protocol? - **Origin**: [call-protocol.md](call-protocol.md) +- **Status**: ~~resolved~~ +- **Priority**: ~~low~~ — +- **Resolution**: Deferred. Current model (single request, optional streaming response) covers all identified use cases. Client streaming can be added later if needed. +- **Cross-references**: ADR-024 + +## Services + +### OQ-SVC-01: Should the secret service support multiple seed phrases (one per tenant)? +- **Origin**: [secret-service.md](secret-service.md) - **Status**: open - **Priority**: low -- **Resolution**: (pending) -- **Cross-references**: ADR-024 \ No newline at end of file +- **Resolution**: (deferred — one seed per node is simplest; multi-seed can be added later by indexing `Unlock` with a tenant ID) +- **Cross-references**: [secret-service.md](secret-service.md) + +### OQ-SVC-02: Should service protocols use postcard (binary) or JSON for remote calls? +- **Origin**: [research/services.md](../research/services.md) +- **Status**: ~~resolved~~ +- **Priority**: low — +- **Resolution**: Postcard for irpc (Rust-to-Rust, efficient). 
JSON for call protocol (cross-language, universal). The irpc remote path naturally uses postcard. +- **Cross-references**: [services.md](services.md) + +### OQ-SVC-03: How does the secret service integrate with the existing EncryptedDataSchema from @alkdev/storage? +- **Origin**: [secret-service.md](secret-service.md) +- **Status**: open +- **Priority**: medium +- **Resolution**: (pending — Rust implementation replaces PBKDF2 password-based encryption with derived AES-256-GCM keys; EncryptedData format is a superset; migration by re-encrypting) +- **Cross-references**: [secret-service.md](secret-service.md), [storage.md](storage.md) + +### OQ-SVC-04: Should workers cache derived keys locally? +- **Origin**: [secret-service.md](secret-service.md) +- **Status**: open +- **Priority**: low +- **Resolution**: Yes, with a TTL (default: 1 hour). The head can revoke by invalidating the session. +- **Cross-references**: [secret-service.md](secret-service.md) + +## Interface + +### OQ-IF-01: How does the Interface session type relate to the call protocol's EventEnvelope stream? +- **Origin**: [interface.md](interface.md) +- **Status**: open +- **Priority**: high +- **Resolution**: (pending — needs design during Phase 1.8 implementation) +- **Cross-references**: [interface.md](interface.md), [ADR-026](decisions/026-transport-interface-separation.md) + +### OQ-IF-02: Should SshInterface own ForwardingPolicy checks or should they move to Layer 3? +- **Origin**: [interface.md](interface.md) +- **Status**: open +- **Priority**: medium +- **Resolution**: (pending — current thinking: forwarding check is Layer 3 policy, but channel open/close lifecycle is Layer 2. The Interface reports channel open requests to Layer 3; Layer 3 applies ForwardingPolicy.) 
+- **Cross-references**: [interface.md](interface.md), [ADR-031](decisions/031-forwarding-policy.md) \ No newline at end of file diff --git a/docs/architecture/secret-service.md b/docs/architecture/secret-service.md new file mode 100644 index 0000000..a4faf36 --- /dev/null +++ b/docs/architecture/secret-service.md @@ -0,0 +1,197 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Secret Service + +## What + +The `alknet-secret` crate provides BIP39 mnemonic generation, SLIP-0010 Ed25519 +HD key derivation, AES-256-GCM encryption for external credentials, and the +`SecretProtocol` irpc service. It is the only component that holds the master +seed phrase. + +## Why + +Operations like SSH key generation, API key storage, and Ethereum transaction +signing all need deterministic key derivation from a single root of trust. The +seed phrase is the single recovery mechanism — from it, all self-generated +secrets can be derived on demand. External credentials (third-party API keys, +OAuth tokens) cannot be derived and must be stored encrypted, with the +encryption key itself derived from the seed. + +The secret service isolates this responsibility: no other crate sees the seed, +and derived keys are provided on demand through an irpc service interface. + +## Architecture + +### Security Model + +| State | What's in memory | What's on disk | +|-------|-----------------|---------------| +| Locked | Nothing | Encrypted database, derivation path metadata | +| Unlocked | Master seed in RAM | Same (seed is never persisted) | +| After use | Derived keys cached in RAM | Derivation paths only | + +The seed phrase is entered once (at node startup or via `Unlock` call), held +only in RAM, and never written to disk. The `Lock` call purges the seed and all +cached derived keys from memory. 
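The locked/unlocked lifecycle in the Security Model table above can be sketched as a small in-memory state holder. This is an illustration only: `SecretState`, `unlock`, `cache_derived`, `cached_count`, and `lock` are hypothetical names, not the `alknet-secret` API, and the manual overwrite is a best-effort stand-in for whatever zeroization strategy the crate ends up using.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory holder for the master seed and the derived-key cache.
/// Nothing in this sketch is ever written to disk.
#[derive(Default)]
pub struct SecretState {
    seed: Option<Vec<u8>>,
    derived: HashMap<String, Vec<u8>>,
}

impl SecretState {
    /// Locked means no seed is held in RAM.
    pub fn is_locked(&self) -> bool {
        self.seed.is_none()
    }

    /// Hold the seed in RAM only.
    pub fn unlock(&mut self, seed: Vec<u8>) {
        self.seed = Some(seed);
    }

    /// Cache a derived key under its derivation path; refused while locked.
    pub fn cache_derived(&mut self, path: &str, key: Vec<u8>) -> bool {
        if self.is_locked() {
            return false;
        }
        self.derived.insert(path.to_string(), key);
        true
    }

    /// Number of derived keys currently cached.
    pub fn cached_count(&self) -> usize {
        self.derived.len()
    }

    /// Purge the seed and every cached derived key.
    pub fn lock(&mut self) {
        if let Some(seed) = self.seed.as_mut() {
            // Best-effort overwrite before dropping the buffer.
            seed.iter_mut().for_each(|b| *b = 0);
        }
        self.seed = None;
        for key in self.derived.values_mut() {
            key.iter_mut().for_each(|b| *b = 0);
        }
        self.derived.clear();
    }
}
```

In this sketch a locked state refuses to cache anything, which mirrors the "Locked: nothing in memory" row of the table.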
+
+### SecretProtocol irpc Service
+
+```rust
+#[rpc_requests(message = SecretMessage)]
+#[derive(Debug, Serialize, Deserialize)]
+enum SecretProtocol {
+    #[rpc(tx=oneshot::Sender<DerivedKey>)]
+    #[wrap(DeriveEd25519)]
+    DeriveEd25519 { path: String },
+
+    #[rpc(tx=oneshot::Sender<DerivedKey>)]
+    #[wrap(DeriveEncryptionKey)]
+    DeriveEncryptionKey { path: String },
+
+    #[rpc(tx=oneshot::Sender<DerivedKey>)]
+    #[wrap(DeriveEthereumKey)]
+    DeriveEthereumKey { path: String },
+
+    #[rpc(tx=oneshot::Sender>)]
+    #[wrap(DerivePassword)]
+    DerivePassword { path: String, length: usize },
+
+    #[rpc(tx=oneshot::Sender<EncryptedData>)]
+    #[wrap(Encrypt)]
+    Encrypt { plaintext: String, key_version: u32 },
+
+    #[rpc(tx=oneshot::Sender<String>)]
+    #[wrap(Decrypt)]
+    Decrypt { encrypted: EncryptedData },
+
+    #[rpc(tx=oneshot::Sender<()>)]
+    #[wrap(Lock)]
+    Lock,
+
+    #[rpc(tx=oneshot::Sender<()>)]
+    #[wrap(Unlock)]
+    Unlock { passphrase: String },
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct DerivedKey {
+    key_type: KeyType,
+    private_key: Vec<u8>,
+    public_key: Vec<u8>,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+enum KeyType {
+    Ed25519,
+    Aes256Gcm,
+    Secp256k1,
+}
+
+#[derive(Debug, Serialize, Deserialize)]
+struct EncryptedData {
+    key_version: u32,
+    salt: String,  // Base64-encoded
+    iv: String,    // Base64-encoded
+    data: String,  // Base64-encoded
+}
+```
+
+### BIP39 Mnemonic and Seed Derivation
+
+```rust
+let mnemonic = Mnemonic::from_phrase(&phrase, Language::English)?;
+let seed = mnemonic.to_seed(Some(&passphrase));
+let master_key = ExtendedPrivKey::new_master(Network::Alknet, &seed)?;
+```
+
+### SLIP-0010 Ed25519 HD Key Derivation
+
+The `74'` coin type is unallocated per SLIP-0044 and reserved for alknet.
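SLIP-0010 defines only hardened child derivation for Ed25519, so every segment of an Ed25519 path such as `m/74'/0'/0'/0'` carries the hardened marker. A minimal sketch of turning such a path into child indices follows; `parse_hardened_path` and `HARDENED_OFFSET` are hypothetical names used for illustration, not part of `alknet-secret`.

```rust
/// Hardened child indices have the top bit set (index + 2^31).
const HARDENED_OFFSET: u32 = 0x8000_0000;

/// Parse a path like `m/74'/0'/0'/0'` into hardened child indices.
/// Returns `None` if the path does not start with `m`, or if any segment
/// is unhardened, non-numeric, or not below 2^31.
pub fn parse_hardened_path(path: &str) -> Option<Vec<u32>> {
    let mut segments = path.split('/');
    if segments.next()? != "m" {
        return None;
    }
    segments
        .map(|segment| {
            // Every segment must end with the hardened marker `'`.
            let digits = segment.strip_suffix('\'')?;
            let index: u32 = digits.parse().ok()?;
            if index >= HARDENED_OFFSET {
                return None;
            }
            Some(index | HARDENED_OFFSET)
        })
        .collect()
}
```

A path with unhardened segments, such as the Ethereum path `m/44'/60'/0'/0/0`, is rejected by this helper; that shape is valid for secp256k1 BIP32 derivation but not for SLIP-0010 Ed25519.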
+ +### Derivation Path Constants + +| Path | Purpose | Curve/Algorithm | +|------|---------|----------------| +| `m/74'/0'/0'/0'` | Primary identity keypair | Ed25519 (alknet auth) | +| `m/74'/0'/0'/{n}'` | Worker/device identity | Ed25519 | +| `m/74'/0'/1'/0'` | SSH host key | Ed25519 | +| `m/74'/1'/0'/{hash}'` | Site-specific password | Deterministic | +| `m/74'/2'/0'/0'` | Encryption key for external credentials | AES-256-GCM | +| `m/44'/60'/0'/0/0` | Ethereum signing key | secp256k1 | + +### AES-256-GCM Encryption for External Credentials + +External credentials (API keys, OAuth tokens) that cannot be derived are +encrypted using a key derived from the seed at path `m/74'/2'/0'/0'`. The +`EncryptedData` type stores the key version, salt, IV, and ciphertext. This +format is compatible with the existing `@alkdev/storage` `EncryptedDataSchema`. + +1. The secret service derives an AES-256-GCM key via path `m/74'/2'/0'/0'` +2. External credentials are encrypted with this key +3. The encrypted data is stored as a `SecretNode` in the metagraph +4. Only the derivation path and key version are stored in plain attributes +5. The seed phrase (or derived encryption key) is held only by the secret + service — never in the database + +### Deployment Topologies + +**Minimal (single node, CLI)**: Secret service runs in the same process. Seed +phrase entered at startup. All keys derived locally. No irpc overhead. + +**Production (head node)**: Secret service runs on a dedicated node or as a +local irpc service. Workers request derived keys via irpc over QUIC. The seed +never leaves the secret service node. + +## Constraints + +- The seed phrase is never persisted to disk. It is entered at startup or via + `Unlock` and held only in RAM. +- `Lock` purges the seed and all cached derived keys from memory. +- alknet-secret does not depend on alknet-core or alknet-storage. It is fully + independent. 
+- The `EncryptedData` wire format (key_version, salt, iv, data) is shared with + alknet-storage for compatibility, but this is type-level compatibility — not a + crate dependency. +- Per ADR-032, the secret service's Honker streams (key derivation notifications) + stay within the service boundary. External consumers use irpc calls or call + protocol operations that project to integration events. +- The irpc service defines the wire format for in-cluster communication + (postcard serialization). For call protocol exposure (e.g., + `/head/secrets/derive`), the service is wrapped in an operation that serializes + to JSON. + +## Open Questions + +- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one per + tenant)? See [open-questions.md](open-questions.md). + +- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for + remote calls? Postcard for irpc (Rust-to-Rust), JSON for call protocol + (cross-language). See [open-questions.md](open-questions.md). + +- **OQ-SVC-03**: How does the secret service integrate with the existing + `EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces + PBKDF2 password-based encryption with derived AES-256-GCM keys. The + `EncryptedData` format is a superset. + +- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL + (default: 1 hour). The head can revoke by invalidating the session. 
+ +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-secret is independent of core and storage | +| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Secret service domain events stay internal | + +## References + +- [research/services.md](../research/services.md) — SecretProtocol definition, DerivedKey, KeyType +- [research/storage.md](../research/storage.md) — Secrets section, derivation paths, EncryptedData +- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.1 +- SLIP-0010 — https://github.com/satoshilabs/slips/blob/master/slip-0010.md +- BIP39 — https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki \ No newline at end of file diff --git a/docs/architecture/services.md b/docs/architecture/services.md new file mode 100644 index 0000000..d380d78 --- /dev/null +++ b/docs/architecture/services.md @@ -0,0 +1,211 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Services + +## What + +The irpc service layer decomposes alknet's core responsibilities into +independently testable, deployable, and replaceable components. Auth, Secret, +Config, and Storage are irpc protocol enums that work both as in-process async +boundaries (tokio channels) and cross-process/cross-network (QUIC streams via +noq). OperationEnv is the universal composition mechanism that unifies local +dispatch, irpc service dispatch, and remote call protocol dispatch. + +## Why + +Without the service layer, auth verification, key derivation, and config reload +are scattered across the codebase with no async boundary. For head nodes serving +many users, in-memory key lookup doesn't scale — auth needs to query a database +on demand. For secret management, the seed must be isolated in its own process +boundary. 
+
+Without OperationEnv, handlers calling other operations would need to know
+whether the target is local, in-cluster, or on a remote node. OperationEnv
+abstracts this away: `context.env.invoke("secrets", "derive", input)` works
+regardless of dispatch path.
+
+## Architecture
+
+### Service Definition Pattern
+
+Services are defined as irpc protocol enums:
+
+```rust
+#[rpc_requests(message = AuthMessage)]
+#[derive(Debug, Serialize, Deserialize)]
+enum AuthProtocol {
+    #[rpc(tx=oneshot::Sender)]
+    #[wrap(VerifyPubkey)]
+    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },
+    // ...
+}
+```
+
+The `#[rpc_requests]` macro generates two versions:
+- **Serializable** (`Request`): for remote communication (postcard encoding)
+- **With channels** (`RequestWithChannels`): for local communication (tokio channels)
+
+Both use the same `Client` type. The local/remote distinction is transparent
+at the call site.
+
+### Core Services
+
+| Service | Protocol | Purpose | Always Local? |
+|---------|----------|---------|---------------|
+| **Auth** | `AuthProtocol` | Verify identities, check credentials | Can be remote |
+| **Secret** | `SecretProtocol` | Derive keys, encrypt/decrypt | Local or remote |
+| **Config** | `ConfigProtocol` | Dynamic config reload | Local |
+| **Storage** | `StorageProtocol` | Graph CRUD, metagraph operations | Local or remote |
+
+### OperationContext
+
+Every handler receives an `OperationContext`:
+
+```rust
+pub struct OperationContext {
+    pub request_id: String,
+    pub parent_request_id: Option<String>,
+    pub identity: Option<Identity>,
+    pub metadata: HashMap,
+    pub env: OperationEnv,
+    pub trusted: bool,  // set by buildEnv(), not by callers
+}
+```
+
+- **`identity`**: The authenticated identity making the call. Populated by
+  `IdentityProvider` from the interface layer.
+- **`env`**: The operation environment, which gives namespaced access to other operations.
+- **`trusted`**: When a handler calls another operation through `env`, the + nested call is `trusted` (skips ACL checks). + +### OperationEnv — Universal Composition Mechanism + +OperationEnv provides namespace + operation name → invoke with input, return +output. The handler doesn't know or care whether the dispatch is local, irpc, +or remote. + +Three dispatch paths: + +| Path | Mechanism | Serialization | Scope | +|------|-----------|---------------|-------| +| **Local** | Direct function call through registry | None (in-process) | Same process | +| **Service** | irpc protocol enum dispatch | postcard (binary) | Same cluster | +| **Remote** | Call protocol `EventEnvelope` | JSON | Cross-node | + +All three produce the same `ResponseEnvelope`. + +Service assembly determines which path each operation uses: + +```rust +// Minimal deployment (single node, all local) +let env = OperationEnv::local(local_registry); + +// Production deployment (mix of local and remote) +let env = OperationEnv::new() + .local("auth", auth_registry) + .local("config", config_registry) + .service("secrets", secret_irpc_client) + .remote("worker-1", call_protocol_conn); +``` + +### Service vs Call Protocol vs External Service + +These are different concepts that compose through OperationEnv: + +- **irpc service**: In-cluster, Rust-to-Rust, type-safe, postcard serialization. + Dispatched by enum variant. Example: `AuthProtocol::VerifyPubkey`. +- **Call protocol operation**: Cross-node, cross-language, path-based, JSON + `EventEnvelope`. Dispatched by namespace + name. Example: + `/head/auth/verify`. +- **External service**: Any endpoint reachable via the call protocol. + Example: a vast.ai instance, an HTTP API, another head node. + +An irpc service can back a call protocol operation. 
The OperationEnv routes to +the appropriate dispatch path: + +``` +Call Protocol (Layer 3, external, JSON) + └── irpc Service (Layer 3, internal, postcard) + └── Honker Streams (Domain events, within service boundary) +``` + +### Adapters + +HTTP, MCP, DNS, and WebSocket adapters all resolve through OperationEnv: + +- HTTP: `POST /v1/{namespace}/{op}` → `context.env.invoke(namespace, op, input)` +- MCP: `tools/call` with tool name → `context.env.invoke(namespace, op, input)` +- DNS: `{op}.{namespace}.alk.dev TXT?` → `context.env.invoke(namespace, op, input)` +- Call protocol: `call.requested` with `operationId` → `context.env.invoke(namespace, op, input)` + +### Deployment Topologies + +**Minimal (single node, CLI)**: All services run locally via tokio channels. + +``` +┌──────────────────────────────────────────────┐ +│ Single Process │ +│ Auth (ArcSwap) | Secret (seed in RAM) | │ +│ Config (ArcSwap) | alknet-core Server │ +└──────────────────────────────────────────────┘ +``` + +**Production (multi-node)**: Auth and secrets on dedicated nodes; workers +access them remotely. + +``` +Auth Node (SQLite) Secret Node (seed in RAM) + ↑ ↑ + │ QUIC (irpc) │ QUIC (irpc) + │ │ +Head Node (Config, Storage, alknet-core Server) + │ + │ SSH / iroh / TLS + │ +Worker Node (alknet-core Client) +``` + +## Constraints + +- Services are **internal** — they run within a node or cluster. +- The call protocol is **external** — it's how nodes talk to each other. +- Per ADR-032, domain events (Honker streams) stay within the owning service. + irpc calls are synchronous request-response within a node. Call protocol + `EventEnvelope` is the integration boundary between nodes. +- OperationEnv is a hard constraint: the handler-facing API must match the + behavioral contract from `@alkdev/operations`. Namespace + operation name → + invoke with input, return output. +- irpc is behind a feature flag in alknet-core. Nodes that only do SSH tunneling + don't need the service layer overhead. 
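The OperationEnv contract named in the constraints above (namespace plus operation name in, output back) can be illustrated with a local-only sketch. `LocalEnv`, `Handler`, `register`, and the string-typed input and output are assumptions made for illustration; the real OperationEnv also covers the irpc and call protocol dispatch paths and returns its own envelope type.

```rust
use std::collections::HashMap;

/// Hypothetical handler signature: string in, string out (or an error message).
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// Local-only stand-in for OperationEnv: handlers keyed by (namespace, operation).
#[derive(Default)]
pub struct LocalEnv {
    handlers: HashMap<(String, String), Handler>,
}

impl LocalEnv {
    /// Register a handler under a namespace and operation name.
    pub fn register<F>(&mut self, namespace: &str, op: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.handlers
            .insert((namespace.to_string(), op.to_string()), Box::new(handler));
    }

    /// Callers name a namespace and operation; they never see the dispatch path.
    pub fn invoke(&self, namespace: &str, op: &str, input: &str) -> Result<String, String> {
        let key = (namespace.to_string(), op.to_string());
        match self.handlers.get(&key) {
            Some(handler) => handler(input),
            None => Err(format!("unknown operation {namespace}/{op}")),
        }
    }
}
```

Swapping the boxed closure for an irpc client or a call protocol connection would leave the `invoke` call site unchanged, which is the property the constraint is protecting.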
+ +## Open Questions + +- **OQ-SVC-01**: Should the secret service support multiple seed phrases (one + per tenant)? Defer for now — one seed per node. Multi-seed can be added + later by indexing the `Unlock` call with a tenant ID. + +- **OQ-SVC-02**: Should service protocols use postcard (binary) or JSON for + remote calls? Postcard for irpc (Rust-to-Rust, efficient). JSON for call + protocol (cross-language, universal). The irpc remote path naturally uses + postcard. + +## Design Decisions + +| ADR | Decision | Summary | +|-----|----------|---------| +| [027](decisions/027-crate-decomposition.md) | Crate decomposition | Service crates are independent of core | +| [028](decisions/028-auth-irpc-service.md) | Auth as irpc service | AuthProtocol behind feature flag | +| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Domain events never cross service boundaries | +| [033](decisions/033-operationenv-irpc-call-protocol.md) | OperationEnv | Universal composition mechanism with three dispatch paths | + +## References + +- [research/services.md](../research/services.md) — Service protocol definitions, OperationContext, deployment topologies +- [research/integration-plan.md](../research/integration-plan.md) — OperationEnv, three dispatch paths, adapter patterns +- [secret-service.md](secret-service.md) — SecretProtocol definition +- [identity.md](identity.md) — IdentityProvider, AuthProtocol +- [configuration.md](configuration.md) — ConfigProtocol, DynamicConfig reload +- [interface.md](interface.md) — Interface layer, auth across interfaces \ No newline at end of file diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md new file mode 100644 index 0000000..b6da05b --- /dev/null +++ b/docs/architecture/storage.md @@ -0,0 +1,219 @@ +--- +status: draft +last_updated: 2026-06-07 +--- + +# Storage + +## What + +The `alknet-storage` crate provides SQLite-backed graph storage, identity +management, access control, and reactivity via 
honker. It mirrors the +TypeScript `@alkdev/storage` package's design while leveraging Rust's type +system and honker's built-in pub/sub. + +## Why + +alknet-core needs persistent identity data (authorized keys, accounts, ACLs) +and a way to store and query graph-structured data (call graphs, operation +graphs, metagraph). But alknet-core cannot take a database dependency. The +solution: alknet-storage implements alknet-core's `IdentityProvider` trait, +providing SQLite-backed identity resolution without core knowing about SQLite. + +The metagraph (three-level type system: GraphType → NodeType → EdgeType → Graph +→ Node → Edge) is the foundation for ACL, flowgraph persistence, and any +future graph-structured data. + +## Architecture + +### Crate Structure + +``` +alknet-storage/ +├── metagraph/ — GraphType, NodeType, EdgeType persistence +├── identity/ — accounts, organizations, peer_credentials, api_keys, audit_logs +├── acl/ — PrincipalNode, DelegatesEdge, access control graph +├── secrets/ — Encrypted node type, encrypt/decrypt bridge +├── honker/ — honker integration: notify, stream, queue +├── graph/ — GraphInstance, Node, Edge CRUD with schema validation +└── schema/ — JSON Schema definitions (serde + jsonschema) +``` + +### Metagraph Data Model + +Three-level type system: + +1. **GraphType** — A class of graphs (e.g., "call-graph", "acl", + "task-dependencies"). Defines structural constraints. +2. **NodeType** — A category of node within a graph type. Each has a JSON Schema + for attribute validation. +3. **EdgeType** — A category of edge within a graph type. Each has a JSON Schema + and optional source/target constraints. + +Graph instances belong to a graph type and contain nodes and edges conforming +to those type definitions. + +### SQLite Table Schema + +Common columns: `id TEXT PK`, `metadata TEXT JSON DEFAULT '{}'`, +`created_at INTEGER TIMESTAMP`, `updated_at INTEGER TIMESTAMP`. 
+ +| Table | Key columns | +|-------|------------| +| `graph_types` | id, name (UNIQUE), config JSON, version, scope | +| `node_types` | id, graph_type_id FK, name, schema JSON | +| `edge_types` | id, graph_type_id FK, name, schema JSON, allowed_source/target types | +| `graphs` | id, graph_type_id FK, name, description, status, owner_id, project_id | +| `nodes` | id, graph_id FK, key (UNIQUE per graph), attributes JSON | +| `edges` | id, graph_id FK, key, source_node_key, target_node_key, attributes JSON, undirected | + +No FK constraints across database files. Referential integrity is enforced at +the application layer. + +### System DB vs Tenant DB + +- **System DB** (`system.db`): Identity tables (accounts, organizations, + peer_credentials, api_keys, audit_logs) + system-scoped graph types. +- **Tenant DB** (`tenant-{orgId}.db`): Metagraph tables + tenant-scoped graph + types. + +### Identity Tables + +| Table | Key columns | +|-------|------------| +| `accounts` | email (UNIQUE), display_name, access_level (admin/user/service), status | +| `organizations` | name (UNIQUE), slug (UNIQUE), owner_id FK → accounts | +| `organization_members` | org_id FK, account_id FK, membership_level (owner/admin/member) | +| `api_keys` | owner_id FK, key_hash (UNIQUE), name, enabled, expires_at, revoked_at | +| `peer_credentials` | owner_id FK, credential_type (ssh_key/cert_authority), fingerprint (UNIQUE), public_key_data | +| `audit_logs` | action, owner_id FK, credential_id, org_id FK, details JSON | + +### ACL as Metagraph + +The ACL graph is a directed, non-multi metagraph: + +- **PrincipalNode**: IdentityType (Account, Org, Service, Role) + identity_id + scopes + resources +- **ResourceNode**: The thing being accessed +- **Edges**: can_read, can_write, can_execute, belongs_to, delegates + +Delegation edges carry `narrowed_scopes` — the delegate can only exercise scopes +that are a subset of the delegator's. 
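The narrowing rule for delegation edges amounts to a subset check. A minimal sketch, assuming scopes are plain strings; `is_valid_delegation` and `effective_scopes` are hypothetical helper names, not the `alknet-storage` API, and scope strings such as `fs:read` are invented for the example.

```rust
use std::collections::HashSet;

/// A delegation is valid only if every narrowed scope is held by the delegator.
pub fn is_valid_delegation(delegator: &HashSet<String>, narrowed: &HashSet<String>) -> bool {
    narrowed.is_subset(delegator)
}

/// Scopes the delegate may exercise: the narrowed scopes that the delegator
/// actually holds. Anything outside the delegator's set is dropped.
pub fn effective_scopes(
    delegator: &HashSet<String>,
    narrowed: &HashSet<String>,
) -> HashSet<String> {
    narrowed.intersection(delegator).cloned().collect()
}
```

Whether an over-broad delegation should be rejected outright or silently clamped is a policy choice the text leaves open; the two helpers cover both readings.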
+
+### StorageIdentityProvider
+
+Implements alknet-core's `IdentityProvider` trait (ADR-029). Queries
+`peer_credentials` (for SSH key resolution) and `api_keys` (for token auth), then
+traverses the ACL graph to compute effective scopes and resources.
+
+```rust
+impl IdentityProvider for StorageIdentityProvider {
+    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
+        // 1. Find peer_credentials row by fingerprint
+        // 2. Resolve to account → organization membership → effective scopes
+        // 3. Return Identity { id: account_uuid, scopes, resources }
+    }
+
+    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
+        // 1. Verify Ed25519 signature against api_keys or peer_credentials
+        // 2. Resolve to account → effective scopes
+        // 3. Return Identity { id: account_uuid, scopes, resources }
+    }
+}
+```
+
+### StorageProtocol irpc Service
+
+```rust
+#[rpc_requests(message = StorageMessage)]
+enum StorageProtocol {
+    #[rpc(tx=oneshot::Sender)]
+    #[wrap(CreateGraph)]
+    CreateGraph { graph_type_id: String, name: String },
+
+    #[rpc(tx=oneshot::Sender)]
+    #[wrap(AddNode)]
+    AddNode { graph_id: String, key: String, attributes: Value },
+
+    // ... (full protocol in research/services.md)
+}
+```
+
+### Honker Integration
+
+| Feature | Use case |
+|---------|----------|
+| `stream_publish` / `subscribe` | Durable pub/sub for node/edge/membership changes |
+| `notify` / `listen` | Ephemeral pub/sub for real-time control channel events |
+| `queue` / `claim` / `ack` | Task queue for async operations |
+
+Per ADR-032, honker streams are domain events internal to the storage service.
+They are projected to call protocol `EventEnvelope` events when crossing service
+boundaries.
+
+### Encrypted Data
+
+alknet-storage references alknet-secret's `EncryptedData` wire format for
+storing encrypted nodes (API keys, OAuth tokens). The format (key_version,
+salt, iv, data) is shared by type-level compatibility, not a crate
+dependency.
alknet-secret encrypts; alknet-storage stores the blob. + +### Crate Dependencies + +```toml +[dependencies] +honker = "0.x" +rusqlite = { version = "0.x", features = ["bundled"] } +serde = { version = "1", features = ["derive"] } +serde_json = "1" +jsonschema = "0.x" +petgraph = "0.x" +irpc = "0.x" +``` + +Does NOT depend on alknet-core or alknet-secret. Implements alknet-core's +`IdentityProvider` trait by conforming to its signature, not by direct crate +dependency. + +## Constraints + +- alknet-storage does NOT depend on alknet-core as a crate. It implements the + `IdentityProvider` trait by conforming to the signature. The CLI binary + wires them together. +- alknet-storage does NOT depend on alknet-secret. They share the `EncryptedData` + wire format by type-level compatibility, not a crate dependency. +- WAL mode for concurrent reads during writes. Single writer per `.db` file. +- JSON Schema validation uses the `jsonschema` crate at runtime (replaces + TypeBox from TypeScript). +- Per ADR-032, honker stream events never cross service boundaries without + projection to `EventEnvelope`. + +## Open Questions + +- **OQ-SVC-03**: How does the secret service integrate with the existing + `EncryptedDataSchema` from `@alkdev/storage`? The Rust implementation replaces + PBKDF2 password-based encryption with derived AES-256-GCM keys. The + `EncryptedData` format is a superset — old format can be migrated by + re-encrypting with the new key. + +- **OQ-SVC-04**: Should workers cache derived keys locally? Yes, with a TTL + (default: 1 hour). The head can revoke by invalidating the session. + +- **OQ-SVC-05**: How does the smart contract (NFT-based ACL) interact with the + secret service? The Ethereum signing key (`m/44'/60'/0'/0/0`) is derived from + the same seed. The smart contract is a separate concern. 
+
+## Design Decisions
+
+| ADR | Decision | Summary |
+|-----|----------|---------|
+| [027](decisions/027-crate-decomposition.md) | Crate decomposition | alknet-storage is independent of core and secret |
+| [029](decisions/029-identity-core-type.md) | Identity as core type | alknet-storage implements IdentityProvider trait |
+| [032](decisions/032-event-boundary-discipline.md) | Event boundary | Honker streams stay internal; projection to EventEnvelope at boundaries |
+
+## References
+
+- [research/storage.md](../research/storage.md) — Full metagraph, identity, ACL, honker definitions
+- [research/services.md](../research/services.md) — StorageProtocol, StorageIdentityProvider
+- [research/integration-plan.md](../research/integration-plan.md) — Phase 2.2
+- [identity.md](identity.md) — IdentityProvider trait, Identity struct
+- [secret-service.md](secret-service.md) — EncryptedData format, derivation paths
\ No newline at end of file
diff --git a/tasks/architecture/adr-026-transport-interface-separation.md b/tasks/architecture/adr-026-transport-interface-separation.md
index 776dde8..9d779c3 100644
--- a/tasks/architecture/adr-026-transport-interface-separation.md
+++ b/tasks/architecture/adr-026-transport-interface-separation.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-026-transport-interface-separation
 name: Write ADR-026 — Transport/interface separation (three-layer model)
-status: pending
+status: completed
 depends_on: []
 scope: moderate
 risk: high
diff --git a/tasks/architecture/adr-027-crate-decomposition.md b/tasks/architecture/adr-027-crate-decomposition.md
index 7295415..3ff3f66 100644
--- a/tasks/architecture/adr-027-crate-decomposition.md
+++ b/tasks/architecture/adr-027-crate-decomposition.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-027-crate-decomposition
 name: Write ADR-027 — Crate decomposition
-status: pending
+status: completed
 depends_on:
 - architecture/adr-029-identity-core-type
 scope: moderate
diff --git a/tasks/architecture/adr-028-auth-irpc-service.md b/tasks/architecture/adr-028-auth-irpc-service.md
index 23b2c6a..28ce4e9 100644
--- a/tasks/architecture/adr-028-auth-irpc-service.md
+++ b/tasks/architecture/adr-028-auth-irpc-service.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-028-auth-irpc-service
 name: Write ADR-028 — Auth as irpc service
-status: pending
+status: completed
 depends_on:
 - architecture/adr-029-identity-core-type
 scope: narrow
diff --git a/tasks/architecture/adr-029-identity-core-type.md b/tasks/architecture/adr-029-identity-core-type.md
index 560101b..9788ed3 100644
--- a/tasks/architecture/adr-029-identity-core-type.md
+++ b/tasks/architecture/adr-029-identity-core-type.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-029-identity-core-type
 name: Write ADR-029 — Identity as core type
-status: pending
+status: completed
 depends_on: []
 scope: single
 risk: low
diff --git a/tasks/architecture/adr-030-static-dynamic-config-split.md b/tasks/architecture/adr-030-static-dynamic-config-split.md
index 15c7767..62985d8 100644
--- a/tasks/architecture/adr-030-static-dynamic-config-split.md
+++ b/tasks/architecture/adr-030-static-dynamic-config-split.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-030-static-dynamic-config-split
 name: Write ADR-030 — Static/dynamic config split
-status: pending
+status: completed
 depends_on: []
 scope: narrow
 risk: low
diff --git a/tasks/architecture/adr-031-forwarding-policy.md b/tasks/architecture/adr-031-forwarding-policy.md
index b1c2b1e..078b932 100644
--- a/tasks/architecture/adr-031-forwarding-policy.md
+++ b/tasks/architecture/adr-031-forwarding-policy.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-031-forwarding-policy
 name: Write ADR-031 — Forwarding policy
-status: pending
+status: completed
 depends_on: []
 scope: narrow
 risk: low
diff --git a/tasks/architecture/adr-032-event-boundary-discipline.md b/tasks/architecture/adr-032-event-boundary-discipline.md
index a1514f8..3c2d3fc 100644
--- a/tasks/architecture/adr-032-event-boundary-discipline.md
+++ b/tasks/architecture/adr-032-event-boundary-discipline.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-032-event-boundary-discipline
 name: Write ADR-032 — Event boundary discipline
-status: pending
+status: completed
 depends_on: []
 scope: single
 risk: low
diff --git a/tasks/architecture/adr-033-operationenv-irpc-call-protocol.md b/tasks/architecture/adr-033-operationenv-irpc-call-protocol.md
index ad369fc..e0652bf 100644
--- a/tasks/architecture/adr-033-operationenv-irpc-call-protocol.md
+++ b/tasks/architecture/adr-033-operationenv-irpc-call-protocol.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-033-operationenv-irpc-call-protocol
 name: Write ADR-033 — OperationEnv, irpc, and call protocol relationship
-status: pending
+status: completed
 depends_on:
 - architecture/adr-028-auth-irpc-service
 - architecture/adr-027-crate-decomposition
diff --git a/tasks/architecture/adr-034-head-worker-terminology.md b/tasks/architecture/adr-034-head-worker-terminology.md
index b40d8f4..207f7b5 100644
--- a/tasks/architecture/adr-034-head-worker-terminology.md
+++ b/tasks/architecture/adr-034-head-worker-terminology.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/adr-034-head-worker-terminology
 name: Write ADR-034 — Head/worker terminology
-status: pending
+status: completed
 depends_on: []
 scope: single
 risk: trivial
diff --git a/tasks/architecture/spec-configuration.md b/tasks/architecture/spec-configuration.md
index 0144ded..cd7febe 100644
--- a/tasks/architecture/spec-configuration.md
+++ b/tasks/architecture/spec-configuration.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-configuration
 name: Promote configuration.md from research to architecture spec
-status: pending
+status: completed
 depends_on:
 - architecture/adr-030-static-dynamic-config-split
 - architecture/adr-031-forwarding-policy
diff --git a/tasks/architecture/spec-flowgraph.md b/tasks/architecture/spec-flowgraph.md
index d632476..bdd153a 100644
--- a/tasks/architecture/spec-flowgraph.md
+++ b/tasks/architecture/spec-flowgraph.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-flowgraph
 name: Create flowgraph.md architecture spec (or stub referencing crate docs)
-status: pending
+status: completed
 depends_on:
 - architecture/adr-027-crate-decomposition
 scope: narrow
diff --git a/tasks/architecture/spec-identity.md b/tasks/architecture/spec-identity.md
index cb8c0f2..7e72fa0 100644
--- a/tasks/architecture/spec-identity.md
+++ b/tasks/architecture/spec-identity.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-identity
 name: Create identity.md architecture spec
-status: pending
+status: completed
 depends_on:
 - architecture/adr-029-identity-core-type
 - architecture/adr-028-auth-irpc-service
diff --git a/tasks/architecture/spec-interface.md b/tasks/architecture/spec-interface.md
index dfbdb82..d6ce802 100644
--- a/tasks/architecture/spec-interface.md
+++ b/tasks/architecture/spec-interface.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-interface
 name: Create interface.md architecture spec (Layer 2)
-status: pending
+status: completed
 depends_on:
 - architecture/adr-026-transport-interface-separation
 - architecture/adr-033-operationenv-irpc-call-protocol
diff --git a/tasks/architecture/spec-secret-service.md b/tasks/architecture/spec-secret-service.md
index 217718f..42e2e86 100644
--- a/tasks/architecture/spec-secret-service.md
+++ b/tasks/architecture/spec-secret-service.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-secret-service
 name: Create secret-service.md architecture spec
-status: pending
+status: completed
 depends_on:
 - architecture/adr-027-crate-decomposition
 - architecture/adr-032-event-boundary-discipline
diff --git a/tasks/architecture/spec-services.md b/tasks/architecture/spec-services.md
index b6425b0..a7dcd0b 100644
--- a/tasks/architecture/spec-services.md
+++ b/tasks/architecture/spec-services.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-services
 name: Create services.md architecture spec (irpc service layer + OperationEnv)
-status: pending
+status: completed
 depends_on:
 - architecture/adr-033-operationenv-irpc-call-protocol
 - architecture/adr-027-crate-decomposition
diff --git a/tasks/architecture/spec-storage.md b/tasks/architecture/spec-storage.md
index 484c1de..1130d0b 100644
--- a/tasks/architecture/spec-storage.md
+++ b/tasks/architecture/spec-storage.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-storage
 name: Create storage.md architecture spec (or stub referencing crate docs)
-status: pending
+status: completed
 depends_on:
 - architecture/adr-027-crate-decomposition
 - architecture/adr-029-identity-core-type
diff --git a/tasks/architecture/spec-update-auth.md b/tasks/architecture/spec-update-auth.md
index 4c3b20f..a15da2d 100644
--- a/tasks/architecture/spec-update-auth.md
+++ b/tasks/architecture/spec-update-auth.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-update-auth
 name: Update auth.md — add IdentityProvider vs AuthService relationship
-status: pending
+status: completed
 depends_on:
 - architecture/spec-identity
 - architecture/adr-028-auth-irpc-service
diff --git a/tasks/architecture/spec-update-open-questions.md b/tasks/architecture/spec-update-open-questions.md
index cd741bf..02f03c8 100644
--- a/tasks/architecture/spec-update-open-questions.md
+++ b/tasks/architecture/spec-update-open-questions.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-update-open-questions
 name: Update open-questions.md — resolve questions per ADR decisions
-status: pending
+status: completed
 depends_on:
 - architecture/adr-031-forwarding-policy
 - architecture/adr-029-identity-core-type
diff --git a/tasks/architecture/spec-update-readme.md b/tasks/architecture/spec-update-readme.md
index 37f95f3..40b87e7 100644
--- a/tasks/architecture/spec-update-readme.md
+++ b/tasks/architecture/spec-update-readme.md
@@ -1,7 +1,7 @@
 ---
 id: architecture/spec-update-readme
 name: Update architecture README.md — add new docs and ADRs to tables
-status: pending
+status: completed
 depends_on:
 - architecture/spec-configuration
 - architecture/spec-identity