docs: write Phase 0 architecture foundation — ADRs 026-034, spec docs, and task updates
Phase 0a — ADRs (9 new):
- ADR-026: Transport/interface separation (three-layer model)
- ADR-027: Crate decomposition (core, secret, storage, flowgraph, napi, CLI)
- ADR-028: Auth as irpc service (AuthProtocol behind feature flag)
- ADR-029: Identity as core type (Identity + IdentityProvider in alknet-core)
- ADR-030: Static/dynamic config split (ArcSwap, ConfigReloadHandle)
- ADR-031: Forwarding policy (rule-based allow/deny, TransportKind-aware)
- ADR-032: Event boundary discipline (domain, irpc, call protocol boundaries)
- ADR-033: OperationEnv universal composition (three dispatch paths)
- ADR-034: Head/worker terminology (replace hub/spoke)

Phase 0b — New spec documents (7):
- identity.md, services.md, interface.md, configuration.md, storage.md, flowgraph.md, secret-service.md

Updated existing docs:
- auth.md: reference identity.md for canonical definitions, add AuthProtocol
- open-questions.md: resolve OQ-12, OQ-16, OQ-18, OQ-22, OQ-23-25
- README.md: add all new docs, ADRs 026-034

Marked 19 architecture tasks as completed.
@@ -0,0 +1,162 @@
# ADR-026: Transport/Interface Separation (Three-Layer Model)

## Status

Accepted

## Context

In the current architecture, SSH is deeply embedded in the server handler. The
`ServerHandler` owns auth, channel management, and proxy logic — all mixed
together. This makes it impossible to run the call protocol over any transport
that doesn't speak SSH, such as:

- **DNS** — encoding call protocol frames as DNS TXT queries/responses for
  censorship resistance
- **Raw framing** — 4-byte length prefix + JSON `EventEnvelope` without SSH
  wrapping, for local service mesh or browser-to-head direct communication
- **WebTransport** — running call protocol over QUIC streams (browsers can't do
  SSH key exchange)

The DNS control channel concept from research (`core.md`) currently conflates
"DNS as a transport that moves bytes" with "SSH sessions over those bytes." But
SSH is not a transport — it's a protocol layer that sits *on top of* a
transport. Separating them enables the DNS control channel to carry call
protocol events directly, without wrapping SSH inside DNS queries.

The same separation enables raw framing (no SSH overhead) for trusted local
networks, and WebTransport direct call protocol for browser clients.

## Decision

**Establish a three-layer model:**

### Layer 1: Transport

Produces byte streams. A `Transport` still produces
`AsyncRead + AsyncWrite + Unpin + Send`. This layer is unchanged from ADR-001.

```rust
#[async_trait]
pub trait Transport: Send + Sync + 'static {
    type Stream: AsyncRead + AsyncWrite + Unpin + Send + 'static;
    async fn connect(&self) -> Result<Self::Stream>;
    fn describe(&self) -> String;
}
```

Transports: TCP, TLS, iroh, DNS (as byte carrier), WebTransport (future).

### Layer 2: Interface

Consumes a `Transport::Stream` and produces call protocol sessions. An
interface is what SSH currently does: wrap a byte stream in session semantics.

```rust
#[async_trait]
pub trait Interface: Send + Sync + 'static {
    type Session;
    // Takes `&self` so a configured instance can be called as `interface.accept(...)`.
    async fn accept(&self, stream: TransportStream, config: &InterfaceConfig) -> Result<Self::Session>;
}
```

Interfaces:

- **SSH interface** — wraps existing `ServerHandler` logic. SSH handshake, auth,
  channel multiplexing. The call protocol runs over a reserved SSH channel
  (`alknet-control:0`).
- **Raw framing interface** — 4-byte big-endian length prefix + JSON
  `EventEnvelope`. No SSH overhead. Direct call protocol over the transport
  stream.
- **DNS control channel** — a (DNS transport, raw framing interface) pair that
  encodes/decodes `EventEnvelope` frames as DNS query/response pairs.
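
The raw framing rule above (a 4-byte big-endian length followed by the JSON body) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the alknet implementation: the function names and the `MAX_FRAME_LEN` cap are hypothetical, and the payload is treated as opaque bytes rather than a parsed `EventEnvelope`.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a frame body. Hypothetical value; the ADR does not specify one.
const MAX_FRAME_LEN: u32 = 1 << 20;

/// Write one frame: 4-byte big-endian length, then the payload bytes.
fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    out.write_all(&len.to_be_bytes())?;
    out.write_all(payload)
}

/// Read one frame, rejecting oversized lengths before allocating the buffer.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because both functions are generic over `Read` and `Write`, the same framing code would apply to any Layer 1 stream. Checking the length against a cap before allocating keeps a hostile peer from requesting a multi-gigabyte buffer with a single prefix.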

### Layer 3: Protocol

Carries semantics. Call protocol events, operation registry, service calls.
The protocol is agnostic to both the transport and the interface below it. It
receives `EventEnvelope` frames from whatever interface produced them.

### Connection Model

A **connection** is always a (Transport, Interface) pair. The valid combinations are enumerated:

| Transport | Interface | Use case |
|-----------|-----------|----------|
| TLS | SSH | Standard alknet tunnel |
| TCP | SSH | Plain SSH tunnel |
| iroh | SSH | P2P SSH tunnel |
| DNS | raw framing | DNS control channel |
| WebTransport | SSH | Browser SSH tunnel (future) |
| WebTransport | raw framing | Browser call protocol (future) |
| TCP | raw framing | Direct call protocol, local mesh |

**The DNS control channel carries call protocol frames directly — it does NOT
wrap SSH inside DNS.** This is explicit because the research originally
conflated "SSH tunneling over DNS" with "DNS as a transport for call protocol."
The (DNS, raw framing) pair sends `EventEnvelope` frames as DNS TXT
queries/responses — no SSH involved.
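
One way to keep the enumerated combinations honest is to encode the table as a single predicate that listener configuration is validated against. The sketch below assumes hypothetical tag enums (`TransportTag`, `InterfaceTag`) and a hypothetical function name; only the seven pairs themselves come from the table.

```rust
/// Payload-free transport tags. Hypothetical; the real `TransportKind` carries data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransportTag {
    Tcp,
    Tls,
    Iroh,
    Dns,
    WebTransport,
}

/// The two interfaces named in this ADR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InterfaceTag {
    Ssh,
    RawFraming,
}

/// Returns true only for the (Transport, Interface) pairs listed in the table.
fn is_valid_connection(transport: TransportTag, interface: InterfaceTag) -> bool {
    use InterfaceTag::*;
    use TransportTag::*;
    matches!(
        (transport, interface),
        (Tls, Ssh)
            | (Tcp, Ssh)
            | (Iroh, Ssh)
            | (Dns, RawFraming)
            | (WebTransport, Ssh)
            | (WebTransport, RawFraming)
            | (Tcp, RawFraming)
    )
}
```

With this shape, (DNS, SSH) is rejected at configuration time, which matches the statement that the DNS control channel never wraps SSH.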

### `TransportKind` Enum

The `TransportKind` enum (currently `Tcp | Tls | Iroh`) gains `Dns` and
`WebTransport` variants. Initially these are tags only — no acceptor
implementation. The full DNS and WebTransport implementations are Phase 4 work
per the integration plan.

```rust
pub enum TransportKind {
    Tcp,
    Tls { server_name: Option<String> },
    Iroh { endpoint_id: String },
    Dns { domain: String },
    WebTransport { host: String },
}
```

### ServerHandler Refactor

The existing `ServerHandler` is refactored into `SshInterface`. The interface
abstraction means the server's accept loop becomes:

```rust
// Pseudocode. `transport.accept()` stands for the server-side acceptor;
// the `Transport` trait shown earlier only defines `connect()`.
let (transport, interface) = listener_config;
let stream = transport.accept().await?;
let session = interface.accept(stream, &config).await?;
// session produces call protocol events
```

The call protocol handler is interface-agnostic — it receives `EventEnvelope`
frames from any interface. Auth, forwarding policy, and operation routing happen
at Layer 3, not inside the SSH handler.
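
The interface-agnostic handler can be illustrated with a synchronous, standard-library-only sketch. Everything named here is hypothetical (`FrameSource`, `RawFraming`, `Preframed`, `handle_all`), frames are opaque bytes instead of `EventEnvelope` values, and the real design is async. The point is only that Layer 3 sees frames and never the interface that produced them.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Stand-in for an `EventEnvelope`; here just the raw JSON bytes.
type Frame = Vec<u8>;

/// Layer 2 in miniature: anything that yields frames.
trait FrameSource {
    fn next_frame(&mut self) -> io::Result<Option<Frame>>;
}

/// Raw framing over any byte stream: 4-byte big-endian length, then the body.
struct RawFraming<R: Read> {
    stream: R,
}

impl<R: Read> FrameSource for RawFraming<R> {
    fn next_frame(&mut self) -> io::Result<Option<Frame>> {
        let mut len = [0u8; 4];
        match self.stream.read_exact(&mut len) {
            Ok(()) => {}
            // In this sketch, EOF while reading the prefix means end of stream.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
            Err(e) => return Err(e),
        }
        let mut body = vec![0u8; u32::from_be_bytes(len) as usize];
        self.stream.read_exact(&mut body)?;
        Ok(Some(body))
    }
}

/// A source that already holds decoded frames (stand-in for an SSH channel).
struct Preframed {
    frames: VecDeque<Frame>,
}

impl FrameSource for Preframed {
    fn next_frame(&mut self) -> io::Result<Option<Frame>> {
        Ok(self.frames.pop_front())
    }
}

/// Layer 3: counts frames without knowing which interface produced them.
fn handle_all(source: &mut dyn FrameSource) -> io::Result<usize> {
    let mut handled = 0;
    while let Some(_frame) = source.next_frame()? {
        handled += 1;
    }
    Ok(handled)
}
```

A production version would cap the frame length and dispatch each frame to the operation registry; both are omitted here to keep the layering visible.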

## Consequences

- **Positive**: Enables DNS control channel without SSH wrapping. The (DNS,
  raw framing) pair is a clean (Transport, Interface) combination.
- **Positive**: Enables raw framing for local service mesh. No SSH overhead for
  trusted networks.
- **Positive**: SSH becomes pluggable. The same call protocol handler works with
  any interface.
- **Positive**: `ServerHandler` is refactored into `SshInterface` — a smaller,
  more focused component that only handles SSH session management.
- **Positive**: Future WebTransport and WebSocket interfaces are additive — they
  implement the `Interface` trait without touching SSH code.
- **Negative**: This is the most invasive code change in Phase 1
  (integration-plan, Phase 1.8). SSH auth, channel management, and proxy logic
  are currently tangled in `ServerHandler`. Extracting them requires careful
  refactoring to maintain existing behavior.
- **Negative**: The `Interface` trait is new and untested. The design must
  accommodate both SSH's channel multiplexing and raw framing's single-stream
  model through the same abstraction.

## References

- [research/core.md](../../research/core.md) — Transport layer, DNS transport section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.8, three-layer model
- [transport.md](../transport.md) — Current Transport trait (unchanged at Layer 1)
- [server.md](../server.md) — Current ServerHandler (will become SshInterface)
- [ADR-001](001-pluggable-transport.md) — Transport trait produces stream (unchanged)
- [ADR-004](004-ssh-over-transport.md) — SSH runs over transport (reinforced by Layer 2)
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol (Layer 3)
150
docs/architecture/decisions/027-crate-decomposition.md
Normal file
@@ -0,0 +1,150 @@
# ADR-027: Crate Decomposition

## Status

Accepted

## Context

alknet-core currently contains everything: transport, SSH, auth, config, the
call protocol handler, and the server accept loop. As the project grows to
include SQLite-backed identity, HD key derivation, and metagraph storage, core
would need to depend on rusqlite, bip39, petgraph, and other heavy dependencies
— unacceptable for a library crate that CLI users embed.

Different deployment topologies need different subsets:
- A minimal CLI tunnel only needs core, transport, and auth types
- A head node needs SQLite-backed identity and the secret service
- A flowgraph visualization tool only needs petgraph operations

Circular dependencies must be avoided. alknet-storage implements
alknet-core's `IdentityProvider` trait, so alknet-core cannot depend on
alknet-storage. alknet-storage references alknet-secret's `EncryptedData` wire
format, but not as a crate dependency.

## Decision

**Decompose the project into six crates with a strict acyclic dependency graph.**

### Crate Structure

1. **alknet-core** — Transport, SSH, call protocol, config, auth types, identity,
   `OperationSpec`, `Interface` trait. The foundational crate that everything
   else depends on (by type, not by crate dep in some cases).
   - *Depends on*: russh, tokio, irpc (feature-gated), serde, arc-swap
   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph

2. **alknet-secret** — BIP39 mnemonic generation, SLIP-0010 Ed25519 HD key
   derivation, AES-256-GCM encryption, `SecretProtocol` irpc service.
   - *Depends on*: bip39, ed25519-bip32 (or rust-bip32-ed25519), aes-gcm, sha2,
     irpc
   - *Does NOT depend on*: alknet-core, alknet-storage

3. **alknet-storage** — SQLite-backed metagraph, identity tables, ACL graph,
   honker integration, `StorageProtocol` irpc service.
   - *Depends on*: rusqlite (via honker), honker, petgraph, jsonschema, irpc
   - *Does NOT depend on alknet-core* (but implements alknet-core's
     `IdentityProvider` trait via the trait, not a crate dep)
   - *Does NOT depend on alknet-secret* (but references `EncryptedData` type
     format for wire compatibility)

4. **alknet-flowgraph** — `FlowGraph<N,E>` over petgraph, operation graph, call
   graph, type compatibility checking.
   - *Depends on*: petgraph, serde, jsonschema, thiserror
   - *Does NOT depend on*: alknet-core, alknet-storage, alknet-secret

5. **alknet-napi** — Node.js native addon. Exposes alknet-core to Node.js.
   - *Depends on*: alknet-core
   - *Does NOT depend on*: alknet-secret, alknet-storage, alknet-flowgraph

6. **alknet** (CLI binary) — Assembles everything.
   - *Depends on*: alknet-core, alknet-secret (feature), alknet-storage (feature),
     alknet-flowgraph (feature), toml

### Dependency Graph

```
                  alknet-secret
                 /             \
                /               \
alknet-core ←────           ←── alknet-storage
    ↑            \           /
    │          alknet-flowgraph
    │
alknet-napi

alknet (CLI binary — assembles everything)
```
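
The "strict acyclic" requirement is mechanically checkable. The sketch below records only the internal crate edges listed under Crate Structure (external dependencies such as tokio or petgraph are omitted) and runs a depth-first search for cycles. The function names are hypothetical; in a real workspace the edges would be read from `cargo metadata` rather than hard-coded.

```rust
use std::collections::HashMap;

/// Internal crate dependencies as listed under "Crate Structure".
fn crate_deps() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("alknet-core", vec![]),
        ("alknet-secret", vec![]),
        ("alknet-storage", vec![]),
        ("alknet-flowgraph", vec![]),
        ("alknet-napi", vec!["alknet-core"]),
        (
            "alknet",
            vec!["alknet-core", "alknet-secret", "alknet-storage", "alknet-flowgraph"],
        ),
    ])
}

#[derive(Clone, Copy)]
enum Mark {
    Visiting,
    Done,
}

/// Depth-first visit; returns false as soon as a back edge (cycle) is found.
fn visit<'a>(
    node: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    marks: &mut HashMap<&'a str, Mark>,
) -> bool {
    match marks.get(node) {
        Some(Mark::Done) => return true,
        Some(Mark::Visiting) => return false,
        None => {}
    }
    marks.insert(node, Mark::Visiting);
    for dep in deps.get(node).into_iter().flatten() {
        if !visit(*dep, deps, marks) {
            return false;
        }
    }
    marks.insert(node, Mark::Done);
    true
}

/// True when the dependency map contains no cycle.
fn is_acyclic<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    let mut marks = HashMap::new();
    deps.keys().all(|node| visit(*node, deps, &mut marks))
}
```

Adding an edge from alknet-core to alknet-storage while alknet-storage also depends on alknet-core makes `is_acyclic` return false, which is exactly the situation the decomposition is designed to rule out.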

### Narrow Interface Points

Three types serve as the narrow interface points between crates:

1. **`Identity`** — Defined in `alknet_core::auth`. Used by auth handler,
   forwarding policy, and call protocol. alknet-storage implements
   `IdentityProvider` to produce instances.

2. **`IdentityProvider`** — Trait defined in `alknet_core::auth`. Implemented by
   `ConfigIdentityProvider` (in core) and `StorageIdentityProvider` (in
   alknet-storage). The CLI/NAPI layer wires the concrete implementation.

3. **`OperationSpec`** — Defined in `alknet_core::call`. Used by the operation
   registry and by alknet-flowgraph for type compatibility checking. The bridge
   is serialization — flowgraph serializes to JSON, storage persists it.

### irpc Feature Flag

irpc is a feature flag in alknet-core. When disabled, auth and config go through
`IdentityProvider` and `ConfigReloadHandle` directly — no irpc overhead. Nodes
that only do SSH tunneling don't need the service layer.

In alknet-secret and alknet-storage, irpc is an independent dependency, not
feature-gated. These crates always define irpc service protocols because they
are used in production deployments where the service layer is active.

### alknet-storage's Relationship to alknet-core

alknet-storage does NOT depend on alknet-core as a crate. Instead:

- alknet-storage defines its own `IdentityProvider` impl that matches
  alknet-core's trait signature. The trait is re-exported or defined locally
  with `#[cfg(feature = "alknet-core")]` interop.
- In practice, the CLI binary crate depends on both and wires them together.
  alknet-storage provides `StorageIdentityProvider`; alknet-core takes
  `impl IdentityProvider`.

### alknet-storage's Relationship to alknet-secret

alknet-storage does NOT depend on alknet-secret as a crate. Instead:

- alknet-storage and alknet-secret share the `EncryptedData` wire format (key
  version, salt, IV, ciphertext). This is a type-level compatibility, not a
  crate dependency.
- alknet-secret encrypts; alknet-storage stores the encrypted blob in a
  `SecretNode` in the metagraph. The bridge is serialization.
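
To make "the bridge is serialization" concrete, here is one possible byte layout for `EncryptedData`. The ADR names the four fields but not their sizes or encoding, so everything below is an assumption: a `u32` key version, a 16-byte salt, a 12-byte IV (the usual 96-bit AES-GCM nonce), big-endian integers, and the ciphertext as the remainder of the blob.

```rust
/// Hypothetical layout; field sizes and ordering are assumptions, not the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EncryptedData {
    key_version: u32,
    salt: [u8; 16],
    iv: [u8; 12],
    ciphertext: Vec<u8>,
}

impl EncryptedData {
    /// Fixed header: 4 (version) + 16 (salt) + 12 (IV) bytes.
    const HEADER_LEN: usize = 4 + 16 + 12;

    /// Producer side (alknet-secret in the ADR): flatten into one blob.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::HEADER_LEN + self.ciphertext.len());
        out.extend_from_slice(&self.key_version.to_be_bytes());
        out.extend_from_slice(&self.salt);
        out.extend_from_slice(&self.iv);
        out.extend_from_slice(&self.ciphertext);
        out
    }

    /// Consumer side: parse a blob, returning None if it is too short.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < Self::HEADER_LEN {
            return None;
        }
        let (header, ciphertext) = bytes.split_at(Self::HEADER_LEN);
        Some(Self {
            key_version: u32::from_be_bytes(header[0..4].try_into().ok()?),
            salt: header[4..20].try_into().ok()?,
            iv: header[20..32].try_into().ok()?,
            ciphertext: ciphertext.to_vec(),
        })
    }
}
```

Under this arrangement the storage side could persist the output of `to_bytes` without interpreting it, so only the crate that decrypts needs the field layout. The cost is the one the Consequences section notes: nothing in the type system stops the two crates from drifting apart.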

## Consequences

- **Positive**: Core is lean. No database, no crypto, no petgraph. CLI users
  get a small binary.
- **Positive**: Services are pluggable. alknet-secret and alknet-storage can be
  swapped for alternative implementations.
- **Positive**: No circular dependencies. The dependency graph is a DAG.
- **Positive**: Deployment topology determines which crates to include. A CLI
  tunnel uses only alknet-core. A head node uses everything.
- **Positive**: irpc is feature-gated in core. Minimal deployments don't pay for
  service layer overhead.
- **Negative**: `IdentityProvider` trait interop between alknet-core and
  alknet-storage requires careful versioning. If the trait signature changes,
  both crates must update.
- **Negative**: `EncryptedData` wire format compatibility between alknet-secret
  and alknet-storage is implicit (not enforced by the type system). A shared
  types crate could be extracted if needed, but adds another crate dependency.

## References

- [research/integration-plan.md](../../research/integration-plan.md) — Phase 2, dependency graph
- [research/core.md](../../research/core.md) — alknet-core contents
- [research/services.md](../../research/services.md) — Service protocols
- [research/storage.md](../../research/storage.md) — alknet-storage contents
- [research/flow.md](../../research/flow.md) — alknet-flowgraph contents
- [ADR-029](029-identity-core-type.md) — Identity as core type (narrow interface point)
146
docs/architecture/decisions/028-auth-irpc-service.md
Normal file
@@ -0,0 +1,146 @@
# ADR-028: Auth as irpc Service

## Status

Accepted

## Context

For head nodes serving many users, in-memory key lookup via `ArcSwap<DynamicConfig>`
doesn't scale. Loading all authorized keys into RAM and atomic-swapping the
entire set on each reload works for small deployments but requires holding every
key in memory. For production deployments with hundreds or thousands of users,
auth verification should query a database on demand rather than holding all keys
in memory.

The current `ArcSwap<DynamicConfig>` approach works for CLI and single-node
setups. What's needed is an async boundary that allows auth verification to go
through a service — locally via channels for minimal deployments, or via irpc
for production deployments where auth runs on a separate process or node.

The critical design point: callers go through the `IdentityProvider` trait
(ADR-029). The irpc service is one way to satisfy the trait. Both paths produce
the same result — an `Identity` or rejection. The trait is the contract; the
service is an implementation path.

## Decision

**Auth verification is provided via an irpc service protocol, with
`IdentityProvider` as the interface contract and `ConfigIdentityProvider`
(ArcSwap-backed) as the default implementation.**

### IdentityProvider Trait (ADR-029) — The Contract

Callers depend on `IdentityProvider`, not on any concrete implementation:

```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```

### ConfigIdentityProvider — Default Implementation

Reads the `auth` section of the `DynamicConfig` held in `ArcSwap<DynamicConfig>`.
No database needed. Every authorized key gets a default scope set. This is the
default for CLI and single-node deployments.

### AuthProtocol irpc Service — Behind Feature Flag

```rust
#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

// Reply payload: derives the serde traits because it crosses the irpc boundary.
#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {
    Ok(Identity),
    Denied(String),
}
```

The `AuthProtocol` is behind the `irpc` feature flag in alknet-core. Nodes
that only do SSH tunneling don't need the service layer overhead. When the
feature is disabled, auth goes through `IdentityProvider` directly.

### AuthServiceImpl

Two implementations exist:

- **ConfigAuthService** — backed by `ConfigIdentityProvider` (ArcSwap path).
  Wraps the trait in an irpc service for deployments that use the service layer
  but don't have SQLite.
- **StorageAuthService** — backed by SQLite `peer_credentials` and `api_keys`
  tables (in alknet-storage). Queries on demand. Can maintain an LRU cache for
  hot fingerprints. This is the production implementation.

Both produce the same `AuthResult` — an `Identity` or a denial. Callers don't
know or care which backend is running.

### Integration with IdentityProvider

The irpc service and the trait compose. A caller goes through `IdentityProvider`,
which may internally delegate to the irpc service, or may satisfy the request
locally via `ConfigIdentityProvider`. The deployment topology determines the
path:

- **Minimal (CLI, single-node)**: `ConfigIdentityProvider` reads from
  `ArcSwap<DynamicConfig>`. No irpc overhead.
- **Production with local auth**: `AuthServiceImpl` wraps
  `StorageIdentityProvider` locally. The handler calls `IdentityProvider` which
  routes to the local irpc service.
- **Distributed auth**: Handler on a worker node calls `IdentityProvider` which
  routes to a remote auth irpc service over QUIC.
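
The composition described above, where the trait is the contract and the service is one way to satisfy it, can be sketched without irpc by letting a standard-library channel stand in for the service boundary. This is an analogy rather than the irpc API: `LocalProvider`, `ChannelProvider`, and `spawn_auth_service` are hypothetical names, the trait is trimmed to one method, and `Identity` is reduced to two fields.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Mutex};
use std::thread;

#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Trimmed version of the contract: callers only ever see this trait.
trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Local path: answers directly from an in-memory map.
struct LocalProvider {
    keys: HashMap<String, Identity>,
}

impl IdentityProvider for LocalProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.keys.get(fingerprint).cloned()
    }
}

/// Request message; `reply` plays the role of the oneshot sender in `AuthProtocol`.
struct VerifyPubkey {
    fingerprint: String,
    reply: mpsc::Sender<Option<Identity>>,
}

/// Service path: forwards each lookup to a service thread over a channel.
struct ChannelProvider {
    tx: Mutex<mpsc::Sender<VerifyPubkey>>,
}

impl IdentityProvider for ChannelProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let (reply, rx) = mpsc::channel();
        let request = VerifyPubkey { fingerprint: fingerprint.to_owned(), reply };
        self.tx.lock().ok()?.send(request).ok()?;
        // A dropped service or a denial both surface as `None` in this sketch.
        rx.recv().ok().flatten()
    }
}

/// Runs `backend` on its own thread and returns a provider that talks to it.
fn spawn_auth_service(backend: LocalProvider) -> ChannelProvider {
    let (tx, rx) = mpsc::channel::<VerifyPubkey>();
    thread::spawn(move || {
        for request in rx {
            let answer = backend.resolve_from_fingerprint(&request.fingerprint);
            let _ = request.reply.send(answer);
        }
    });
    ChannelProvider { tx: Mutex::new(tx) }
}
```

A handler holding a `Box<dyn IdentityProvider>` gets the same answers from either path, which is the behavioral parity the ADR asks integration tests to verify.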

### ConfigService Integration

`AuthProtocol::ReloadKeys` triggers reload of the dynamic config's auth section.
For the `ConfigIdentityProvider` path, this is equivalent to
`ConfigReloadHandle::reload()`. For the `StorageIdentityProvider` path, this
refreshes the LRU cache. Both update atomically — ongoing connections are
unaffected, new connections pick up changes.

## Consequences

- **Positive**: Minimal deployments use `ArcSwap` without irpc overhead. No
  database dependency for CLI users.
- **Positive**: Production deployments wire `StorageIdentityProvider` behind the
  irpc service. Auth scales to thousands of users without loading all keys into
  memory.
- **Positive**: The `IdentityProvider` trait is the only contract callers depend
  on. This keeps alknet-core lean and testable.
- **Positive**: Feature flag (`irpc`) keeps core lean for deployments that don't
  need the service layer.
- **Positive**: Both paths produce identical `Identity` results. Behavioral
  parity is enforced by the shared `Identity` type.
- **Negative**: Two implementations must be kept in sync. `ConfigIdentityProvider`
  and `StorageIdentityProvider` must produce the same `Identity` for the same
  input. Integration tests should verify this.
- **Negative**: The `irpc` feature flag adds conditional compilation complexity.
  The core must compile and work without it, and the service layer must work
  with it enabled.

## References

- [research/services.md](../../research/services.md) — AuthService, AuthProtocol definition
- [auth.md](../auth.md) — IdentityProvider trait, Identity struct
- [research/configuration.md](../../research/configuration.md) — Auth service approach
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.4
- [ADR-029](029-identity-core-type.md) — Identity as core type
- [ADR-027](027-crate-decomposition.md) — Crate decomposition
107
docs/architecture/decisions/029-identity-core-type.md
Normal file
@@ -0,0 +1,107 @@
# ADR-029: Identity as Core Type

## Status

Accepted

## Context

The `Identity` struct and `IdentityProvider` trait are needed by auth,
forwarding policy, and call protocol — three different subsystems in
alknet-core. Without placing them in core, these subsystems would each define
their own identity type, leading to duplication and conversion boilerplate.

The constraint: alknet-core must not depend on alknet-storage or any database.
The `IdentityProvider` trait must be in core so that the handler can resolve
identities without knowing whether the backing store is a config file or a
SQLite database. External crates provide implementations.

Earlier research defined `Identity` inconsistently: `{node_id, fingerprint,
scopes}` in services.md and `{id, scopes, resources}` in auth.md. The unified
model uses `{id, scopes, resources}` where `id` serves as both fingerprint (for
key-based auth from config) and account UUID (for database-backed auth).

## Decision

**`Identity` struct and `IdentityProvider` trait live in `alknet_core::auth`.**

### Identity Struct

```rust
use std::collections::HashMap;

pub struct Identity {
    pub id: String, // Fingerprint (config auth) or account UUID (database auth)
    pub scopes: Vec<String>, // e.g., ["relay:connect", "service:gitea:read"]
    pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["gitea", "registry"]}
}
```

The `id` field serves dual purpose: when using config-based authentication
(`ConfigIdentityProvider`), it holds the Ed25519 key fingerprint. When using
database-backed authentication (`StorageIdentityProvider`), it holds the account
UUID from the `accounts` table. This keeps the type simple while accommodating
both auth paths.

The `scopes` field provides authorization scope strings used by
`ForwardingPolicy` and `AccessControl` in `OperationSpec`. The `resources`
field provides resource-level authorization beyond what scopes offer (e.g., which
services this identity can access).

### IdentityProvider Trait

```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```

The trait is the contract. Callers (auth handler, forwarding policy, call
protocol) depend on `IdentityProvider` — not on any concrete implementation.

### Default and Production Implementations

- **`ConfigIdentityProvider`** (in alknet-core) — reads the `auth` section of
  the `DynamicConfig` held in `ArcSwap<DynamicConfig>`. Every authorized key
  gets a default scope set. No database needed. This is the default for minimal
  deployments.
- **`StorageIdentityProvider`** (in alknet-storage) — backed by SQLite
  `peer_credentials` and `api_keys` tables plus the ACL graph. Resolves
  fingerprint → account → organization membership → effective scopes. This is
  the production implementation for head nodes.

alknet-core never depends on alknet-storage. The trait relationship is:
alknet-core *defines* the trait, alknet-storage *implements* it. The CLI or
NAPI assembly layer wires the concrete implementation.
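
A minimal sketch of the config-backed path may help show how `id`, `scopes`, and `resources` get filled in. It assumes a plain `HashSet` of authorized fingerprints in place of the `ArcSwap`-held config, trims the trait to the fingerprint method, and adds a hypothetical `has_scope` helper that uses exact string matching (the ADR does not say how scopes are matched).

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
}

impl Identity {
    /// Exact-match scope check; wildcard or hierarchical matching is not assumed.
    pub fn has_scope(&self, scope: &str) -> bool {
        self.scopes.iter().any(|s| s == scope)
    }
}

/// Trimmed contract; the token method is left out of this sketch.
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

/// Config-backed provider: every authorized key gets the same default scopes.
pub struct ConfigIdentityProvider {
    authorized: HashSet<String>,
    default_scopes: Vec<String>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.authorized.contains(fingerprint).then(|| Identity {
            // Config auth: the id is the key fingerprint itself.
            id: fingerprint.to_owned(),
            scopes: self.default_scopes.clone(),
            resources: HashMap::new(),
        })
    }
}
```

A storage-backed provider would return the account UUID in `id` and derive `scopes` from the ACL graph, but callers such as `ForwardingPolicy` would still only call `has_scope`-style checks on the returned `Identity`.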

### Why Not in alknet-storage?

If `Identity` lived in alknet-storage, alknet-core would need to depend on
alknet-storage to use the type — creating a circular dependency (since
alknet-storage implements alknet-core's `IdentityProvider` trait). Placing the
type and trait in core breaks the cycle.

## Consequences

- **Positive**: alknet-core has no database dependency. Auth, forwarding, and
  call protocol all use the same `Identity` type.
- **Positive**: alknet-storage implements the core trait. The CLI/NAPI layer
  wires the concrete implementation. Deployment topology determines which impl
  to use.
- **Positive**: The `id` field serves dual purpose (fingerprint or UUID),
  avoiding separate types for config-based and database-based auth.
- **Positive**: `ForwardingPolicy` and `AccessControl` can reference scopes from
  `Identity` without knowing where they came from.
- **Negative**: Two implementations of `IdentityProvider` exist — `Config` and
  `Storage`. Both must produce identical `Identity` results for the same input.
  Tests should verify behavioral parity.
- **Negative**: The trait abstraction adds a level of indirection for the
  minimal (config-only) deployment path. The cost is negligible — the
  `ConfigIdentityProvider` is a simple `ArcSwap` dereference.

## References

- [auth.md](../auth.md) — IdentityProvider trait, Identity struct, unified auth
- [research/services.md](../../research/services.md) — AuthService, Identity section
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.2
- [ADR-023](023-unified-auth-shared-key-material.md) — Unified auth with shared key material
- [ADR-028](028-auth-irpc-service.md) — Auth as irpc service
- [OQ-18](../open-questions.md) — IdentityProvider owns scopes
159
docs/architecture/decisions/030-static-dynamic-config-split.md
Normal file
@@ -0,0 +1,159 @@
# ADR-030: Static/Dynamic Configuration Split

## Status

Accepted

## Context

Alknet's configuration is loaded once at startup and never changes. This causes
three specific failures:

1. **No hot reload of authentication credentials.** Adding or removing an
   authorized key requires restarting the server process. In head/worker
   deployments where keys are managed via a database, the process must be
   restarted every time a key is added, revoked, or rotated. This is
   operationally unacceptable.

2. **No port forwarding access control.** Any authenticated client can open a
   `direct-tcpip` channel to any destination. There is no policy governing
   which hosts, ports, or alknet control channels a client may access. A
   compromised key grants unrestricted network access through the tunnel.

3. **No structured configuration beyond CLI flags.** ADR-011 chose
   programmatic-first configuration for the alpha — correct at the time. But as
   alknet moves toward publishable releases, operators need config files for
   reproducible deployments, and the NAPI layer needs programmatic reload
   capability that `ServeOptions` doesn't currently support.

Not all configuration should be reloadable. Transport-level settings (listen
address, TLS certificates, host key) require socket/TLS renegotiation to change
at runtime — effectively a restart. Auth and forwarding policy can change
atomically without disrupting existing connections.

## Decision

**Split configuration into `StaticConfig` and `DynamicConfig`.**

### StaticConfig

Immutable after startup. Constructed from `ServeOptions` (the builder pattern is
preserved). Contains everything that affects socket binding, TLS handshakes, or
SSH session negotiation:

- Transport mode, listen address
- TLS config (cert, key)
- iroh config (relay URL)
- Stealth mode flag
- Host key, host key algorithm
- Max auth attempts, max connections per IP
- Proxy config

Changing any of these requires a restart.

### DynamicConfig

Hot-reloadable at runtime via `ArcSwap<DynamicConfig>`. Contains everything
checked per-connection or per-channel:

- `AuthPolicy` — authorized keys, certificate authorities, token config
- `ForwardingPolicy` — allow/deny rules for channel targets (ADR-031)
- `RateLimitConfig` — rate limiting parameters

`ArcSwap` provides lock-free reads on the hot path: every `auth_publickey()` and
every `channel_open_direct_tcpip()` call loads the current `Arc` and
dereferences it, which adds negligible overhead compared to the current
approach. Writes are atomic: `store()` swaps the pointer. Existing connections
finish with their current config; new connections get the new config.
|
||||
|
||||
### ConfigReloadHandle
|
||||
|
||||
```rust
use std::sync::Arc;

use arc_swap::ArcSwap;

pub struct ConfigReloadHandle {
    dynamic: Arc<ArcSwap<DynamicConfig>>,
}

impl ConfigReloadHandle {
    /// Atomically publishes `new_config`; readers see it on their next load.
    pub fn reload(&self, new_config: DynamicConfig) {
        self.dynamic.store(Arc::new(new_config));
    }
}
```
|
||||
|
||||
The handle is obtained from `Server::run()` and passed to NAPI or the CLI.
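
The reload semantics described above can be sketched with the standard library alone. This is a minimal illustration, not the actual implementation: an `RwLock<Arc<DynamicConfig>>` stands in for `ArcSwap<DynamicConfig>` (so reads take a brief read lock, unlike the real lock-free load), and the `authorized_keys` field, `new()`, and `snapshot()` are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, trimmed-down stand-in for the real `DynamicConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicConfig {
    pub authorized_keys: Vec<String>,
}

/// Std-only stand-in for a handle around `ArcSwap<DynamicConfig>`.
#[derive(Clone)]
pub struct ConfigReloadHandle {
    dynamic: Arc<RwLock<Arc<DynamicConfig>>>,
}

impl ConfigReloadHandle {
    pub fn new(initial: DynamicConfig) -> Self {
        Self {
            dynamic: Arc::new(RwLock::new(Arc::new(initial))),
        }
    }

    /// Returns the config that is current right now. A connection that keeps
    /// this `Arc` continues to see it even after a later reload.
    pub fn snapshot(&self) -> Arc<DynamicConfig> {
        let guard = self.dynamic.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swaps in a new config. Snapshots handed out earlier are untouched.
    pub fn reload(&self, new_config: DynamicConfig) {
        *self.dynamic.write().unwrap() = Arc::new(new_config);
    }
}
```

A connection accepted before `reload()` keeps the `Arc` it obtained from `snapshot()`, while a connection accepted afterwards sees the new key list. That is the behavior the `ArcSwap` design is meant to provide without the lock.
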
|
||||
|
||||
### ConfigService
|
||||
|
||||
The `ConfigService` wraps `ArcSwap<DynamicConfig>` reloads behind an irpc
|
||||
protocol (behind the `irpc` feature flag) for production deployments that use
|
||||
the service layer. For minimal deployments (CLI, single-node), direct
|
||||
`ConfigReloadHandle::reload()` is sufficient.
|
||||
|
||||
### TOML Config File
|
||||
|
||||
An optional TOML config file covers static config plus initial auth/forwarding
|
||||
paths. This **amends** ADR-011 (does not supersede it) — the programmatic-first
|
||||
API remains primary. The config file is a convenience input format:
|
||||
|
||||
```toml
[server]
transport = "tls"
listen = "0.0.0.0:443"
stealth = false
max_connections_per_ip = 5
max_auth_attempts = 3

[server.tls]
cert = "/etc/alknet/tls/cert.pem"
key = "/etc/alknet/tls/key.pem"

[auth]
host_key = "/etc/alknet/ssh/host_key"

[forwarding]
default = "deny"
```
|
||||
|
||||
### NAPI Reload API
|
||||
|
||||
```typescript
interface AlknetServer {
  reloadAuth(auth: { authorizedKeys?: Buffer, certAuthority?: Buffer }): void;
  reloadForwarding(policy: ForwardingPolicyConfig): void;
  reloadAll(config: DynamicConfig): void;
}
```
|
||||
|
||||
The NAPI layer parses key data and constructs a new `DynamicConfig`, then calls
|
||||
`ConfigReloadHandle::reload()`.
|
||||
|
||||
### Client Configuration
|
||||
|
||||
Client configuration stays as `ConnectOptions` — no `ArcSwap` needed. Client
|
||||
config is almost entirely static (which server to connect to, which key to use).
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Auth credentials and forwarding policy can be reloaded without
|
||||
restarting the server. Adding a key via `reloadAuth()` takes effect on the
|
||||
next connection attempt.
|
||||
- **Positive**: ADR-011's programmatic-first intent is preserved. The TOML
|
||||
config file is an optional convenience layer, not a replacement for
|
||||
`ServeOptions`.
|
||||
- **Positive**: `ArcSwap` keeps reads on the hot path lock-free. Every auth
  check and every channel open is a single atomic load of the current `Arc`.
|
||||
- **Positive**: The `ConfigService` irpc protocol (behind feature flag) allows
|
||||
production deployments to integrate config reload into their service mesh
|
||||
without taking a direct dependency on `DynamicConfig` internals.
|
||||
- **Positive**: Forwarding policy is now part of `DynamicConfig` — operators can
|
||||
restrict access per identity, per destination, per transport (ADR-031).
|
||||
- **Negative**: Two config structs where there was one. The split is clean
|
||||
(transport vs. policy) but adds surface area.
|
||||
- **Negative**: Config file introduces `toml` as a dependency in the CLI crate.
|
||||
This is acceptable for a CLI binary.
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — Full analysis
|
||||
- [ADR-011](011-no-ssh-config-programmatic-api.md) — Programmatic-first API (amended, not superseded)
|
||||
- [ADR-031](031-forwarding-policy.md) — Forwarding policy (part of DynamicConfig)
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type (DynamicConfig.auth uses IdentityProvider)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.1
|
||||
138
docs/architecture/decisions/031-forwarding-policy.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# ADR-031: Forwarding Policy
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
Currently, any authenticated client can open a `direct-tcpip` SSH channel to
|
||||
any destination. The only gate is authentication — once authenticated, a client
|
||||
has unrestricted network access through the tunnel. This is a security gap: a
|
||||
compromised key grants unrestricted access.
|
||||
|
||||
Operators need the ability to:
|
||||
- Restrict which hosts and ports authenticated clients can access
|
||||
- Apply different rules to different principals (key fingerprints, accounts)
|
||||
- Restrict WebTransport clients to alknet control channels only
|
||||
- Set a default policy (allow-all for migration compatibility, deny-all for
|
||||
production)
|
||||
|
||||
## Decision
|
||||
|
||||
**Add `ForwardingPolicy` as part of `DynamicConfig` (reloadable without
|
||||
restart).**
|
||||
|
||||
### Type Definitions
|
||||
|
||||
```rust
use std::ops::Range;

use ipnetwork::IpNetwork; // assumed: the `ipnetwork` crate

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
    pub principals: Vec<String>,        // Empty = matches all
    pub transports: Vec<TransportKind>, // Empty = matches all
}

pub enum ForwardingAction {
    Allow,
    Deny,
}

pub enum TargetPattern {
    Any,
    Host(String),                  // "localhost", "*.example.com"
    Cidr(IpNetwork),               // "10.0.0.0/8"
    PortRange(String, Range<u16>), // "localhost", ports 8080-8090 (8080..8091, end-exclusive)
    AlknetPrefix,                  // Matches alknet-* control channels
}
```
|
||||
|
||||
### Rule Evaluation
|
||||
|
||||
Rules are evaluated in order. First match wins. If no rule matches, the default
|
||||
applies. This supports both allowlist and blocklist semantics:
|
||||
|
||||
- **Allowlist**: `default: Deny`, then explicit Allow rules for permitted
|
||||
destinations.
|
||||
- **Blocklist**: `default: Allow`, then explicit Deny rules for blocked
|
||||
destinations.
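
The first-match-wins evaluation above can be sketched as follows. This is a simplified illustration under stated assumptions: only a subset of `TargetPattern` is modeled, `Host` does exact matching only, principals and transports are left out, and the `matches()` and `evaluate()` method names are hypothetical.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForwardingAction {
    Allow,
    Deny,
}

/// Simplified subset of `TargetPattern` for this sketch.
pub enum TargetPattern {
    Any,
    Host(String),                  // exact host match only in this sketch
    PortRange(String, Range<u16>), // end-exclusive port range on one host
}

pub struct ForwardingRule {
    pub target: TargetPattern,
    pub action: ForwardingAction,
}

pub struct ForwardingPolicy {
    pub default: ForwardingAction,
    pub rules: Vec<ForwardingRule>,
}

impl TargetPattern {
    fn matches(&self, host: &str, port: u16) -> bool {
        match self {
            TargetPattern::Any => true,
            TargetPattern::Host(h) => h.as_str() == host,
            TargetPattern::PortRange(h, ports) => h.as_str() == host && ports.contains(&port),
        }
    }
}

impl ForwardingPolicy {
    /// Checks rules in order; the first matching rule decides. If no rule
    /// matches, the policy default applies.
    pub fn evaluate(&self, host: &str, port: u16) -> ForwardingAction {
        self.rules
            .iter()
            .find(|rule| rule.target.matches(host, port))
            .map(|rule| rule.action)
            .unwrap_or(self.default)
    }
}
```

Because the first match wins, rule order matters: a broad `Allow` for `TargetPattern::Any` placed before a specific `Deny` would shadow the `Deny` entirely.
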
|
||||
|
||||
### Principals
|
||||
|
||||
Each rule can specify which principals it applies to. A principal is an
|
||||
`Identity.id` (fingerprint or UUID) or a scope from `Identity.scopes`. When the
|
||||
rule's `principals` field is empty, it matches all identities.
|
||||
|
||||
This connects to the `IdentityProvider` trait (ADR-029): when a client
|
||||
authenticates, the `Identity` is resolved, and the forwarding policy checks
|
||||
rules against `Identity.id` and `Identity.scopes`.
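
A minimal sketch of the principal check, assuming an `Identity` with just the two fields named above. The struct shape and the `principals_match` function name are hypothetical; the real `Identity` type is defined by ADR-029.

```rust
/// Hypothetical minimal `Identity`; only `id` and `scopes` are named in the text.
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// An empty `principals` list matches every identity. Otherwise a rule matches
/// if any listed principal equals the identity's id or one of its scopes.
pub fn principals_match(principals: &[String], identity: &Identity) -> bool {
    principals.is_empty()
        || principals
            .iter()
            .any(|p| *p == identity.id || identity.scopes.contains(p))
}
```

Treating ids and scopes as one flat namespace keeps rules short, but it also means a scope name that collides with a fingerprint or UUID would match both. A real implementation may want to distinguish the two.
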
|
||||
|
||||
### TransportKind-Aware Rules
|
||||
|
||||
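Each rule can specify which `TransportKind` it applies to. This enables
transport-specific restrictions. For example, the rule below allows WebTransport
clients to open `alknet-*` control channels; combined with a `Deny` default (or
a later `Deny` rule for `TargetPattern::Any` on the same transport), it
restricts them to those channels only: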
|
||||
|
||||
```rust
ForwardingRule {
    target: TargetPattern::AlknetPrefix,
    action: ForwardingAction::Allow,
    principals: vec![],
    transports: vec![TransportKind::WebTransport { host: "*".into() }],
}
```
|
||||
|
||||
### Where the Policy Check Happens
|
||||
|
||||
The forwarding policy check occurs in `channel_open_direct_tcpip` before the
|
||||
proxy task is spawned. The current behavior (no check) is equivalent to
|
||||
`ForwardingPolicy::allow_all()` — default Allow, no rules. This preserves
|
||||
backward compatibility during migration.
|
||||
|
||||
### DynamicConfig Integration
|
||||
|
||||
`ForwardingPolicy` is part of `DynamicConfig` and reloadable via
|
||||
`ConfigReloadHandle::reload()` or NAPI's `reloadForwarding()`. Changes take
|
||||
effect on the next channel open — existing connections continue with their
|
||||
current policy.
|
||||
|
||||
### OQ Resolutions
|
||||
|
||||
- **OQ-12** (Per-user forwarding scope vs global rules): Resolved. Start with
  global rules plus principal matching against `Identity.scopes`. Per-user
  scopes come from `peer_credentials.metadata.scopes`, exposed through
  `IdentityProvider`.
|
||||
- **OQ-16** (Transport-specific forwarding): Resolved. Add `TransportKind`
|
||||
match in `ForwardingRule`. WebTransport clients can be restricted.
|
||||
- **OQ-18** (Source of Identity.scopes): Resolved by ADR-029 and this ADR.
|
||||
`IdentityProvider` owns scopes. `ForwardingPolicy` consumes them.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Operators can restrict access per identity, per destination, per
|
||||
transport. A compromised key no longer grants unrestricted network access.
|
||||
- **Positive**: Default-allow preserves current behavior during migration. Switch
|
||||
to default-deny for production deployments.
|
||||
- **Positive**: Policy is reloadable without restart. Adding a rule via
|
||||
`reloadForwarding()` takes effect on the next channel open.
|
||||
- **Positive**: `TransportKind`-aware rules enable transport-specific
|
||||
restrictions (e.g., WebTransport clients restricted to alknet-* channels).
|
||||
- **Negative**: Another check in the hot path (every `channel_open_direct_tcpip`
|
||||
call). The cost is a linear scan of rules — acceptable for small rule sets.
|
||||
Large rule sets should use compiled matchers (future optimization).
|
||||
- **Negative**: `TargetPattern` string matching is easy to get wrong. Host
  patterns like `*.example.com` require careful implementation to prevent
  bypasses (a naive suffix match would also accept `evilexample.com`). The
  `glob` or `globset` crate can handle this correctly.
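
To make the wildcard bypass risk concrete, here is a hand-rolled sketch of a leading-wildcard host match. It is an illustration only, not the planned implementation: the `host_matches` name is hypothetical, only the `*.suffix` form is supported, and the wildcard is allowed to span several labels.

```rust
/// Matches `host` against `pattern`, where `pattern` is either an exact host
/// name or a single leading-wildcard form such as `*.example.com`.
/// Comparison is ASCII case-insensitive and ignores a trailing dot.
pub fn host_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.trim_end_matches('.').to_ascii_lowercase();
    let host = host.trim_end_matches('.').to_ascii_lowercase();

    match pattern.strip_prefix("*.") {
        // Wildcard: the host must end with ".suffix" and have at least one
        // character before that dot, so "evilexample.com" and the bare
        // "example.com" are both rejected for "*.example.com".
        Some(suffix) => {
            !suffix.is_empty()
                && host.len() > suffix.len() + 1
                && host.ends_with(suffix)
                && host.as_bytes()[host.len() - suffix.len() - 1] == b'.'
        }
        // No wildcard: exact match only.
        None => pattern == host,
    }
}
```

The explicit check for the `.` before the suffix is the part a plain `ends_with` would miss.
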
|
||||
|
||||
## References
|
||||
|
||||
- [research/configuration.md](../../research/configuration.md) — ForwardingPolicy section
|
||||
- [auth.md](../auth.md) — Identity.scopes and IdentityProvider
|
||||
- [open-questions.md](../open-questions.md) — OQ-12, OQ-16, OQ-18
|
||||
- [ADR-029](029-identity-core-type.md) — Identity as core type
|
||||
- [ADR-030](030-static-dynamic-config-split.md) — DynamicConfig (ForwardingPolicy is part of it)
|
||||
- [integration-plan.md](../../research/integration-plan.md) — Phase 1.3
|
||||
96
docs/architecture/decisions/032-event-boundary-discipline.md
Normal file
@@ -0,0 +1,96 @@
|
||||
# ADR-032: Event Boundary Discipline
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The research identified three distinct communication patterns in the system, and
|
||||
conflating them is a known anti-pattern in event-driven architectures:
|
||||
|
||||
1. **Domain events** (Honker streams) — Internal to the service that owns that
|
||||
data. Used for state reconstruction within the service's own boundaries.
|
||||
Examples: `nodes:created`, `edges:deleted`, `accounts:updated`.
|
||||
|
||||
2. **irpc service calls** — Synchronous request-response within a node or
|
||||
cluster. Internal to the system. Examples: `AuthProtocol::VerifyPubkey`,
|
||||
`SecretProtocol::DeriveEd25519`, `ConfigProtocol::ReloadForwarding`.
|
||||
|
||||
3. **Call protocol events** (`EventEnvelope`) — Asynchronous integration events
|
||||
that cross node boundaries. External to the system. Examples:
|
||||
`call.requested`, `call.responded`, `call.completed`, `call.aborted`.
|
||||
|
||||
Without a hard constraint, it's tempting to have one service subscribe directly
|
||||
to another service's Honker streams. This leads to:
|
||||
|
||||
- **Leaky event store**: Service A reads Service B's domain events directly,
|
||||
coupling A to B's internal state representation. When B changes its schema, A
|
||||
breaks.
|
||||
- **Boomerang coupling**: An integration event is too thin, causing the
|
||||
consumer to call back to the source service synchronously to get details. This
|
||||
negates the benefit of async communication.
|
||||
- **Fat notification trap**: A notification event carries full entity state,
|
||||
when it should use state transfer instead.
|
||||
|
||||
## Decision
|
||||
|
||||
**Event boundary discipline is a hard architectural constraint, not a
|
||||
suggestion.**
|
||||
|
||||
1. **Domain events stay within the owning service.** A Honker stream published
|
||||
by the storage service (`nodes:created`) is for the storage service's own
|
||||
state reconstruction. No other service reads these stream events directly.
|
||||
|
||||
2. **irpc service calls are synchronous and internal.** They never cross node
|
||||
boundaries. They are request-response, not events. They should not be used
|
||||
as a substitute for integration events.
|
||||
|
||||
3. **Call protocol events are the only events that cross node boundaries.**
|
||||
`EventEnvelope` frames are the integration boundary. When a domain event
|
||||
needs to be communicated to another node, it must be projected into a call
|
||||
protocol event.
|
||||
|
||||
4. **Projection from domain events to integration events is required when
|
||||
crossing boundaries.** A service that owns a Honker stream must project
|
||||
relevant state changes into `EventEnvelope` frames before they leave the
|
||||
node. The projection strips internal details and produces a versioned,
|
||||
stable integration event.
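
The projection step in rule 4 might look like the sketch below. Everything here is assumed for illustration: the `DomainEvent` variants and fields, the `IntegrationEvent` struct (the real `EventEnvelope` fields are not specified in this ADR), and the `node.created` event name.

```rust
/// Hypothetical internal domain events of a storage service.
pub enum DomainEvent {
    NodeCreated { row_id: u64, public_id: String, internal_shard: u8 },
    EdgeDeleted { row_id: u64 },
}

/// Hypothetical integration event standing in for an `EventEnvelope` payload.
#[derive(Debug, PartialEq)]
pub struct IntegrationEvent {
    pub kind: &'static str,
    pub version: u32,
    pub payload: String,
}

/// Projects a domain event into a versioned integration event, or returns
/// `None` when the event must stay inside the owning service.
pub fn project(event: &DomainEvent) -> Option<IntegrationEvent> {
    match event {
        // Only the public id leaves the service; row ids and shard
        // placement are internal details and are stripped here.
        DomainEvent::NodeCreated { public_id, .. } => Some(IntegrationEvent {
            kind: "node.created",
            version: 1,
            // Naive JSON formatting; a real projection would use a serializer.
            payload: format!("{{\"id\":\"{}\"}}", public_id),
        }),
        // Not part of the integration contract in this sketch.
        DomainEvent::EdgeDeleted { .. } => None,
    }
}
```

Making the projection an explicit function gives code review one place to check what crosses the boundary.
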
|
||||
|
||||
This discipline applies at three levels:
|
||||
|
||||
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
|
||||
|
||||
A call protocol handler MAY call an irpc service internally (e.g.,
|
||||
`/head/auth/verify` calls `AuthProtocol::VerifyPubkey`). The irpc service MAY
|
||||
use Honker streams for its own state management. But domain events never
|
||||
propagate beyond the service boundary without projection.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Prevents leaky event stores. Services are independently
|
||||
deployable and their internal schemas can evolve without breaking consumers.
|
||||
- **Positive**: Honker and irpc are implementation details, not cross-boundary
|
||||
contracts. The call protocol's `EventEnvelope` is the only stable, versioned
|
||||
contract that other nodes depend on.
|
||||
- **Positive**: Clear ownership. Each service owns its Honker streams and can
|
||||
change them freely. Integration events are a deliberate, reviewed contract.
|
||||
- **Positive**: Makes testing easier. Services can be tested in isolation with
|
||||
mock domain events. Integration events are tested against the `EventEnvelope`
|
||||
schema.
|
||||
- **Negative**: Projection code is required. Every domain event that needs to
|
||||
cross a boundary must be explicitly projected. This is deliberate — the
|
||||
overhead ensures the integration contract is intentional.
|
||||
- **Negative**: Developers must resist the temptation to subscribe directly to
|
||||
Honker streams across services. Code review should catch this pattern.
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../../research/services.md) — Event boundary discipline section
|
||||
- [research/storage.md](../../research/storage.md) — Honker integration, event boundaries
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — ADR 032 entry
|
||||
- [event_source_types.md](/workspace/research/event_sourcing/event_source_types.md) — Event-driven architecture patterns
|
||||
@@ -0,0 +1,130 @@
|
||||
# ADR-033: OperationEnv as Universal Composition Mechanism
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The `@alkdev/operations` TypeScript package defines `OperationEnv` as a
|
||||
universal composition mechanism. A handler receives `context.env[namespace][op](input)`
|
||||
and can invoke any registered operation regardless of whether it runs locally, in
|
||||
an irpc service on the same cluster, or on a remote node via call protocol.
|
||||
|
||||
The research documents define three dispatch paths:
|
||||
1. **Local dispatch** — direct function call through the operation registry
|
||||
2. **Service dispatch** — irpc protocol call to a service backend
|
||||
3. **Remote dispatch** — call protocol `EventEnvelope` to a remote node
|
||||
|
||||
Without a formal decision, irpc services could be seen as a replacement for
|
||||
OperationEnv or for the call protocol. They are not — irpc is one dispatch
|
||||
backend for OperationEnv, not a replacement for anything. The call protocol is
|
||||
another dispatch backend. OperationEnv unifies them from the handler's
|
||||
perspective.
|
||||
|
||||
The three communication patterns in the system (ADR-032) are:
|
||||
- Domain events (Honker streams) — internal to the owning service
|
||||
- irpc service calls — synchronous, in-cluster
|
||||
- Call protocol events — asynchronous, cross-node
|
||||
|
||||
irpc services and call protocol operations serve different scopes but must
|
||||
compose cleanly through OperationEnv.
|
||||
|
||||
## Decision
|
||||
|
||||
**OperationEnv is the universal composition mechanism that all operation
|
||||
handlers receive. It provides namespace + operation name → invoke with input,
|
||||
return output, regardless of dispatch path.**
|
||||
|
||||
### OperationEnv Behavioral Contract
|
||||
|
||||
```rust
// The behavioral contract: given a namespace and operation name, invoke the
// operation with the given input and return the output. The handler neither
// knows nor cares whether the dispatch is local, via irpc, or via call protocol.
pub trait OperationEnv: Send + Sync {
    fn invoke(&self, namespace: &str, operation: &str, input: Value) -> ResponseEnvelope;
}
```
|
||||
|
||||
The Rust implementation may use typed method dispatch or a registry behind the
|
||||
scenes, but the handler-facing API must preserve this contract.
|
||||
|
||||
### Three Dispatch Paths
|
||||
|
||||
OperationEnv resolves each call to one of three dispatch backends:
|
||||
|
||||
| Path | Mechanism | Serialization | Scope |
|------|-----------|---------------|-------|
| Local | Direct function call through registry | None (in-process) | Same process |
| Service | irpc protocol enum dispatch | postcard (binary) | Same cluster |
| Remote | Call protocol `EventEnvelope` | JSON | Cross-node |
|
||||
|
||||
All three produce the same `ResponseEnvelope`. The handler always calls
|
||||
`context.env.invoke("secrets", "derive", input)` and gets a `ResponseEnvelope`
|
||||
back.
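
One way to picture the routing is a map from namespace to backend, sketched below under several assumptions. `Backend`, `LocalBackend`, `RoutingEnv`, `mount()`, and the `ResponseEnvelope` fields are hypothetical names, input is a plain `&str` rather than `Value`, and only the local path is implemented. Service and remote backends would implement the same trait over irpc and the call protocol respectively.

```rust
use std::collections::HashMap;

/// Stand-in for the real `ResponseEnvelope`; its fields are assumptions.
#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub ok: bool,
    pub body: String,
}

/// One dispatch backend (local registry, irpc client, or call protocol link).
pub trait Backend: Send + Sync {
    fn call(&self, operation: &str, input: &str) -> ResponseEnvelope;
}

/// Local backend: operations are plain functions in this process.
pub struct LocalBackend {
    ops: HashMap<String, fn(&str) -> String>,
}

impl LocalBackend {
    pub fn new() -> Self {
        Self { ops: HashMap::new() }
    }

    pub fn with(mut self, name: &str, f: fn(&str) -> String) -> Self {
        self.ops.insert(name.to_string(), f);
        self
    }
}

impl Backend for LocalBackend {
    fn call(&self, operation: &str, input: &str) -> ResponseEnvelope {
        match self.ops.get(operation) {
            Some(f) => ResponseEnvelope { ok: true, body: f(input) },
            None => ResponseEnvelope { ok: false, body: format!("unknown operation: {}", operation) },
        }
    }
}

/// Routes each namespace to whichever backend the deployment assigned to it.
pub struct RoutingEnv {
    backends: HashMap<String, Box<dyn Backend>>,
}

impl RoutingEnv {
    pub fn new() -> Self {
        Self { backends: HashMap::new() }
    }

    pub fn mount(mut self, namespace: &str, backend: Box<dyn Backend>) -> Self {
        self.backends.insert(namespace.to_string(), backend);
        self
    }

    /// The handler-facing call: same shape regardless of dispatch path.
    pub fn invoke(&self, namespace: &str, operation: &str, input: &str) -> ResponseEnvelope {
        match self.backends.get(namespace) {
            Some(backend) => backend.call(operation, input),
            None => ResponseEnvelope { ok: false, body: format!("unknown namespace: {}", namespace) },
        }
    }
}

/// Example operation used to exercise the local path.
pub fn shout(input: &str) -> String {
    input.to_uppercase()
}
```

Mounting `LocalBackend::new().with("shout", shout)` under a `"text"` namespace and calling `invoke("text", "shout", "hi")` returns an envelope whose body is the upper-cased input. Swapping that backend for an irpc or remote one would not change the call site.
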
|
||||
|
||||
### Service Assembly
|
||||
|
||||
The deployment topology determines which dispatch path each operation uses:
|
||||
|
||||
```rust
// Minimal deployment (single node, all local)
let env = OperationEnv::local(local_registry);

// Production deployment (mix of local and remote)
let env = OperationEnv::new()
    .local("auth", auth_registry)            // Auth runs locally
    .local("config", config_registry)        // Config runs locally
    .service("secrets", secret_irpc_client)  // Secret service via irpc
    .remote("worker-1", call_protocol_conn); // Worker-1 operations via call protocol
```
|
||||
|
||||
### irpc Services Are One Dispatch Backend
|
||||
|
||||
irpc services (`AuthProtocol`, `SecretProtocol`, `ConfigProtocol`) define the
|
||||
wire format for in-cluster communication. They are Rust-to-Rust, type-safe,
|
||||
and efficient. But they are not a replacement for OperationEnv or for the call
|
||||
protocol. They are one dispatch backend.
|
||||
|
||||
An irpc service can be exposed as a call protocol operation:
|
||||
`/head/auth/verify` receives a call protocol event and internally calls
|
||||
`AuthProtocol::VerifyPubkey` via irpc. The layers compose:
|
||||
|
||||
```
Call Protocol (Layer 3, external, JSON)
  └── irpc Service (Layer 3, internal, postcard)
        └── Honker Streams (Domain events, within service boundary)
```
|
||||
|
||||
### Adapters Map to OperationEnv
|
||||
|
||||
HTTP (`POST /v1/{namespace}/{op}`), MCP (`tools/call`), DNS
|
||||
(`{op}.{namespace}.alk.dev TXT?`), and call protocol
|
||||
(`/call.requested`) all resolve through OperationEnv. This is what makes
|
||||
operations universally composable across all interfaces.
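
As an illustration of how two adapters could resolve to the same `(namespace, operation)` pair, the sketch below parses the HTTP path and DNS name shapes quoted above. The function names are hypothetical and the validation is deliberately minimal; real adapters would also handle methods, query types, and encoding.

```rust
/// Parses an HTTP path of the form `/v1/{namespace}/{op}`.
/// Returns `(namespace, op)` or `None` if the path has another shape.
pub fn parse_http_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/v1/")?;
    let (namespace, op) = rest.split_once('/')?;
    if namespace.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((namespace, op))
}

/// Parses a DNS query name of the form `{op}.{namespace}.alk.dev`
/// (a trailing root dot is tolerated). Returns `(namespace, op)`.
pub fn parse_dns_name(name: &str) -> Option<(&str, &str)> {
    let rest = name.trim_end_matches('.').strip_suffix(".alk.dev")?;
    let (op, namespace) = rest.split_once('.')?;
    if op.is_empty() || namespace.is_empty() || namespace.contains('.') {
        return None;
    }
    Some((namespace, op))
}
```

Both functions return the pair in the same order, so the result can be handed to `invoke` without the handler knowing which interface the request arrived on.
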
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Handlers compose through a single interface. Adding a new
|
||||
dispatch path (e.g., a new irpc service) doesn't change handler code.
|
||||
- **Positive**: irpc and call protocol coexist naturally. The handler doesn't
|
||||
know which path was taken.
|
||||
- **Positive**: Adapters (MCP, HTTP, DNS) map to operations through the same
|
||||
OperationEnv interface. One handler, multiple dispatch paths.
|
||||
- **Positive**: Deployment topology determines dispatch, not code. Same handler
|
||||
works locally, in-cluster, or cross-node.
|
||||
- **Negative**: OperationEnv is a new abstraction that must coexist with the
|
||||
existing call protocol handler pattern. The registry currently maps paths to
|
||||
handlers; OperationEnv adds namespace-aware composition on top.
|
||||
- **Negative**: The nested namespace-to-operation map of the TypeScript
  `@alkdev/operations` package (conceptually
  `HashMap<String, HashMap<String, fn>>` in Rust terms) needs an idiomatic Rust
  translation. The behavioral contract must match, but the implementation can
  differ.
|
||||
|
||||
## References
|
||||
|
||||
- [research/services.md](../../research/services.md) — OperationContext, OperationEnv
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 1.5, OperationEnv wiring
|
||||
- [ADR-032](032-event-boundary-discipline.md) — Event boundary discipline
|
||||
- [ADR-024](024-bidirectional-call-protocol.md) — Bidirectional call protocol
|
||||
- [ADR-025](025-handler-spec-separation.md) — Handler/spec separation
|
||||
55
docs/architecture/decisions/034-head-worker-terminology.md
Normal file
@@ -0,0 +1,55 @@
|
||||
# ADR-034: Head/Worker Terminology
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The project previously used hub/spoke terminology for describing node
|
||||
relationships: a hub node that coordinates connections and spokes that connect to
|
||||
it. This terminology implies a strict star topology where the hub is
|
||||
fundamentally different from spokes.
|
||||
|
||||
In practice, a coordinating node can also execute operations (run services,
|
||||
forward traffic). Any node can become a coordinator. The architecture supports
|
||||
mesh topologies where nodes coordinate in a peer-to-peer fashion.
|
||||
|
||||
The research documents (`core.md`, `services.md`) and updated architecture
|
||||
specs (`call-protocol.md`, `auth.md`, `napi-and-pubsub.md`, `open-questions.md`)
|
||||
already use head/worker consistently. Existing ADRs (024, 025) retain their
|
||||
original hub/spoke language because ADRs are historical records.
|
||||
|
||||
## Decision
|
||||
|
||||
**Use head/worker terminology throughout the project.**
|
||||
|
||||
- **Head node**: A node that coordinates — accepts connections, routes
|
||||
operations, manages cluster state. A head is also a worker (it can execute
|
||||
operations).
|
||||
- **Worker node**: A node that connects to a head, registers its services, and
|
||||
executes operations. Any worker can become a head.
|
||||
- **Node**: Any participant in the network. Every node has an Ed25519 identity.
|
||||
|
||||
The terms hub and spoke are deprecated in all new specs, code, and
|
||||
documentation. Existing ADRs retain their original language as historical
|
||||
records — ADRs document what was decided at the time, not what the current
|
||||
terminology is.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **Positive**: Natural mesh formation. A head that is also a worker enables
|
||||
multi-hop routing, redundancy, and distributed topologies without a
|
||||
centralized authority.
|
||||
- **Positive**: Consistency with integration plan and research documents.
|
||||
- **Positive**: The terminology better reflects the architecture — there is no
|
||||
single "hub" that's fundamentally different from "spokes."
|
||||
- **Neutral**: Existing ADRs (024, 025) retain hub/spoke in their text. This is
|
||||
intentional — ADRs are historical records.
|
||||
|
||||
## References
|
||||
|
||||
- [research/integration-plan.md](../../research/integration-plan.md) — Phase 0 ADR 034 entry, inconsistencies section
|
||||
- [ADR-024](024-bidirectional-call-protocol.md) — Uses hub/spoke historically
|
||||
- [ADR-025](025-handler-spec-separation.md) — Uses hub/spoke historically
|
||||
- [research/core.md](../../research/core.md) — Head/worker terminology
|
||||