docs: fix inconsistencies in architecture specs

- Replace hub/spoke with head/worker terminology in call-protocol.md,
  auth.md, open-questions.md, napi-and-pubsub.md
- Update operation paths from /{spoke}/{service}/{op} to
  /{node}/{service}/{op} throughout call-protocol.md
- Unify Identity struct: auth.md already had {id, scopes, resources},
  add note clarifying this is canonical (vs research/services.md which
  used {node_id, fingerprint, scopes})
- Update integration-plan.md inconsistencies section to track what's
  been fixed (hub/spoke, identity model) and expand service naming
  to include external services
- Update call-protocol.md last_updated date

ADRs are intentionally left unchanged as historical records.
2026-06-07 07:50:00 +00:00
parent 69d232fda7
commit 6db1266672
5 changed files with 88 additions and 82 deletions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-04
last_updated: 2026-06-07
---
# Authentication & Identity
@@ -95,11 +95,19 @@ pub struct Identity {
}
```
> **Note on identity models**: Earlier research used `{node_id, fingerprint, scopes}`.
> The unified model uses `{id, scopes, resources}` where `id` serves as both
> fingerprint (for key-based auth from config) and account UUID (for
> database-backed auth). The `resources` field provides resource-level
> authorization beyond what scopes offer. This is the canonical definition
> that all components should use.
```
**Default implementation**: `ConfigIdentityProvider` loads from
`DynamicConfig.auth` (the `authorized_keys` set). Every authorized key gets a
default scope set. No database required.
**Hub implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and
**Head implementation**: Backed by `@alkdev/storage`'s `peer_credentials` and
`accounts` tables plus the ACL graph. Resolves fingerprint → account →
organization membership → effective scopes. Uses `ArcSwap` for hot reload.


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-04
last_updated: 2026-06-07
---
# Call Protocol
@@ -11,15 +11,15 @@ A bidirectional, transport-agnostic call and event protocol that runs over
authenticated pipes. It supports request/response calls, streaming
subscriptions, and unidirectional events — all using the same wire format. The
protocol is defined as a spec + handler + registry; downstream consumers (NAPI,
Python, hub/spoke) register their own operations without modifying core.
Python, head/worker) register their own operations without modifying core.
## Why
The current control channel (ADR-018) is unidirectional (client → server) and
provides fire-and-forget event dispatch without request/response semantics.
The call protocol generalizes it to support bidirectional calls (ADR-024) and
downstream service registration (ADR-025), enabling the hub/spoke model where
spokes expose operations the hub invokes.
downstream service registration (ADR-025), enabling the head/worker model where
workers expose operations the head invokes.
## Architecture
@@ -28,10 +28,10 @@ spokes expose operations the hub invokes.
Operation names use slash-based paths aligned with URL routing conventions:
```
/{spoke}/{service}/{op}
/{node}/{service}/{op}
```
- **spoke** — identity prefix of the node that exposes the operation. The hub
- **node** — identity prefix of the node that exposes the operation. The head
uses this segment to route calls to the correct connected node.
- **service** — the logical service namespace. Groups related operations
under one handler prefix.
@@ -41,11 +41,11 @@ Examples:
| Path | Meaning |
|------|---------|
| `/dev1/fs/readFile` | Spoke `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Spoke `dev1`, service `bash`, operation `exec` |
| `/hub/agent/chat` | Hub's own `agent` service, operation `chat` |
| `/hub/sessions/list` | Hub's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Browser spoke `browser-1`, `notify` service |
| `/dev1/fs/readFile` | Node `dev1`, service `fs`, operation `readFile` |
| `/dev1/bash/exec` | Node `dev1`, service `bash`, operation `exec` |
| `/head/agent/chat` | Head's own `agent` service, operation `chat` |
| `/head/sessions/list` | Head's own `sessions` service, operation `list` |
| `/browser-1/notify/alert` | Worker `browser-1`, `notify` service |
This three-level routing mirrors iroh's ALPN dispatch: the first segment
routes to a connected node (like ALPN routes to a protocol handler), the
@@ -110,11 +110,11 @@ The `id` field carries the `requestId` for correlation.
### Bidirectional Calls and Routing
Both sides of a connection can initiate calls. The hub routes calls to spokes
Both sides of a connection can initiate calls. The head routes calls to workers
using the first path segment:
```
Hub (server) Spoke: "dev1" (client)
Head (server) Worker: "dev1" (client)
│ │
│ call.requested │
│ name: "/dev1/fs/readFile" │
@@ -126,11 +126,11 @@ Hub (server) Spoke: "dev1" (client)
│ payload: { content: "fn main()..." } │
│◀──────────────────────────────────────────│
│ │
Spoke exposes /dev1/fs/*, │
│ /dev1/bash/* to hub
Worker exposes /dev1/fs/*, │
│ /dev1/bash/* to head
│ │
│◀─ call.requested ────────────────────────│
│ name: "/hub/agent/chat" │
│ name: "/head/agent/chat" │
│ payload: { provider: "anthropic", ... } │
│ │
│── call.responded ──────────────────────▶ │
@@ -138,54 +138,54 @@ Hub (server) Spoke: "dev1" (client)
│ payload: { completion: "..." } │
```
The hub's registry includes:
- **Hub-local operations** (`/hub/*`) — handled directly
- **Remote operations** (`/{spoke}/*`) — forwarded to the spoke connection
The head's registry includes:
- **Head-local operations** (`/head/*`) — handled directly
- **Remote operations** (`/{node}/*`) — forwarded to the worker connection
When the hub routes `/dev1/fs/readFile` to spoke `dev1`, it strips the spoke
prefix and delivers the call to the spoke's local registry as `/fs/readFile`.
The spoke doesn't need to know its own alias.
When the head routes `/dev1/fs/readFile` to worker `dev1`, it strips the node
prefix and delivers the call to the worker's local registry as `/fs/readFile`.
The worker doesn't need to know its own alias.
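The prefix-stripping step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Route` enum and the `route` function are hypothetical names, and the sketch assumes every path carries a node prefix (short core paths such as `/services/list` would have to be checked against the local registry before this step).

```rust
/// Hypothetical routing decision made by the head for one operation path.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Handled by the head's own registry; the full path is kept.
    Local(&'a str),
    /// Forwarded to a worker connection with the node prefix stripped.
    Remote { node: &'a str, local_path: &'a str },
}

/// Split `/{node}/{service}/{op}` on its first segment.
/// Returns `None` when the path has no leading slash or nothing
/// meaningful after the node segment.
pub fn route(path: &str) -> Option<Route<'_>> {
    let rest = path.strip_prefix('/')?;
    let slash = rest.find('/')?;
    // `local_path` keeps its leading slash, e.g. "/fs/readFile".
    let (node, local_path) = rest.split_at(slash);
    if node.is_empty() || local_path.len() < 2 {
        return None;
    }
    if node == "head" {
        Some(Route::Local(path))
    } else {
        Some(Route::Remote { node, local_path })
    }
}
```

With this shape, `/dev1/fs/readFile` yields node `dev1` and local path `/fs/readFile`, which is what the worker's registry would see.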
### Hub/Spoke Architecture
### Head/Worker Architecture
```
┌─────────────────────────────────┐
Hub
Head Node
│ │
│ Hub-local services:
│ /hub/agent/chat (LLM coord) │
│ /hub/agent/complete
│ /hub/sessions/list
│ /hub/sessions/history
│ Head-local services: │
│ /head/agent/chat (LLM coord) │
│ /head/agent/complete │
│ /head/sessions/list │
│ /head/sessions/history │
│ │
Spoke registry (discovered):
│ /dev1/fs/* → dev1 connection
│ /dev1/bash/* → dev1 connection
│ /dev2/fs/* → dev2 connection
│ /browser-1/notify/* → WT conn
Worker registry (discovered): │
│ /dev1/fs/* → dev1 connection │
│ /dev1/bash/* → dev1 connection │
│ /dev2/fs/* → dev2 connection │
│ /browser-1/notify/* → WT conn │
└──────┬───────┬───────┬──────────┘
│ │ │
┌─────────▼┐ ┌───▼────┐ ┌▼───────────┐
Dev Spoke│ │Dev Spk │ │Browser Spoke│
│ "dev1" │ │"dev2" │ │"browser-1" │
│ /fs/* │ │/fs/* │ │/notify/* │
│ /bash/* │ │/bash/* │ │ │
│ /search/*│ │ │ │ │
└──────────┘ └────────┘ └─────────────┘
Worker │ │Worker │ │Browser Worker
│ "dev1" │ │"dev2" │ │"browser-1" │
│ /fs/* │ │/fs/* │ │/notify/* │
│ /bash/* │ │/bash/* │ │ │
│ /search/*│ │ │ │ │
└──────────┘ └────────┘ └─────────────┘
```
When a spoke connects, it registers its operations with the hub:
When a worker connects, it registers its operations with the head:
```
spoke → hub: call.requested { name: "/hub/services/register", payload: {
spoke: "dev1",
worker → head: call.requested { name: "/head/services/register", payload: {
node: "dev1",
operations: ["/fs/readFile", "/fs/writeFile", "/bash/exec", "/search/query"]
}}
```
The hub adds these to its routing table with the spoke prefix. Other spokes
The head adds these to its routing table with the node prefix. Other workers
and browser clients can then call `/dev1/fs/readFile` without knowing how
the hub routes it internally.
the head routes it internally.
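One way to picture the routing table that registration builds is sketched below. `RoutingTable` and its methods are hypothetical names chosen for illustration, and `unregister` only reflects the leading approach listed under OQ-20 (cleanup on disconnect), which is still an open question.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: full prefixed path -> owning node alias.
#[derive(Default)]
pub struct RoutingTable {
    routes: HashMap<String, String>,
}

impl RoutingTable {
    /// Record a worker's operations under its node prefix,
    /// e.g. ("dev1", "/fs/readFile") becomes "/dev1/fs/readFile".
    pub fn register(&mut self, node: &str, operations: &[&str]) {
        for op in operations {
            self.routes.insert(format!("/{node}{op}"), node.to_string());
        }
    }

    /// Drop every route owned by a node (assumed cleanup on disconnect).
    pub fn unregister(&mut self, node: &str) {
        self.routes.retain(|_, owner| owner.as_str() != node);
    }

    /// Look up which node serves a full operation path.
    pub fn owner(&self, path: &str) -> Option<&str> {
        self.routes.get(path).map(String::as_str)
    }
}
```

Callers only ever see the prefixed path; the mapping from path to connection stays inside the head.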
### Operation Registry
@@ -223,7 +223,7 @@ pub struct AccessControl {
registry.register(OperationSpec { name: "/services/list", ... }, list_services_handler);
registry.register(OperationSpec { name: "/services/schema", ... }, schema_handler);
// A dev env spoke registers its tools
// A dev env worker registers its tools
registry.register(OperationSpec { name: "/fs/readFile", ... }, fs_read_handler);
registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
@@ -231,10 +231,10 @@ registry.register(OperationSpec { name: "/bash/exec", ... }, bash_exec_handler);
registry.register(OperationSpec { name: "/notify/alert", ... }, notify_handler);
```
Core-provided operations use short paths without a spoke prefix
Core-provided operations use short paths without a node prefix
(`/services/list`, `/services/schema`). They live on whatever node the
caller is connected to. Spoke-prefixed operations (`/dev1/fs/readFile`)
are routed by the hub.
caller is connected to. Worker-prefixed operations (`/dev1/fs/readFile`)
are routed by the head.
### ACL Per Operation Path
@@ -242,12 +242,12 @@ Access control maps to path prefixes using standard URL-like matching:
| Pattern | Matches | Purpose |
|---------|---------|---------|
| `/dev1/*` | All operations on spoke `dev1` | Full access to a spoke |
| `/*/fs/*` | `fs` service on any spoke | Read file access across dev envs |
| `/*/bash/*` | `bash` service on any spoke | Shell access (higher risk) |
| `/hub/agent/*` | Hub LLM agent | LLM calls |
| `/hub/sessions/*` | Hub session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific spoke | One UI notification |
| `/dev1/*` | All operations on node `dev1` | Full access to a worker |
| `/*/fs/*` | `fs` service on any node | Read file access across dev envs |
| `/*/bash/*` | `bash` service on any node | Shell access (higher risk) |
| `/head/agent/*` | Head LLM agent | LLM calls |
| `/head/sessions/*` | Head session management | Session history |
| `/browser-1/notify/alert` | Specific operation on specific node | One UI notification |
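The patterns in the table could be evaluated roughly as below. The spec only says "standard URL-like matching", so the exact wildcard rules here are an assumption: a `*` in the middle of a pattern matches exactly one segment, and a trailing `*` matches one or more remaining segments. `pattern_matches` is a hypothetical helper name.

```rust
/// Check an operation path against an ACL pattern.
/// Assumed semantics: a non-final `*` matches exactly one segment,
/// a final `*` matches one or more remaining segments.
pub fn pattern_matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    let segs: Vec<&str> = path.trim_start_matches('/').split('/').collect();
    for (i, p) in pat.iter().enumerate() {
        let is_last = i + 1 == pat.len();
        match segs.get(i) {
            // The path ran out of segments before the pattern did.
            None => return false,
            // A trailing wildcard swallows the rest of the path.
            Some(_) if *p == "*" && is_last => return true,
            // Single-segment wildcard or literal match: keep going.
            Some(s) if *p == "*" || p == s => {}
            Some(_) => return false,
        }
    }
    // Without a trailing wildcard the lengths must agree exactly.
    pat.len() == segs.len()
}
```

A match here would only select which `AccessControl` entry applies; the scope check itself is a separate step.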
Higher-risk operations (shell, filesystem write) can require tighter scopes
than read-only operations. The ACL evaluates against the caller's
@@ -337,20 +337,20 @@ translation at the wire level.
### Agent Service Pattern
The hub commonly runs an agent service that coordinates between LLM providers
The head commonly runs an agent service that coordinates between LLM providers
and tool calls. This service is just another set of registered operations —
no special treatment:
- `/hub/agent/chat` — send a message, get a completion. Routes to the
appropriate LLM provider based on available spokes and configuration.
- `/hub/agent/complete` — streaming completion. Yields tokens as they arrive.
- `/hub/sessions/list` — list session histories (backed by Honker or other
- `/head/agent/chat` — send a message, get a completion. Routes to the
appropriate LLM provider based on available workers and configuration.
- `/head/agent/complete` — streaming completion. Yields tokens as they arrive.
- `/head/sessions/list` — list session histories (backed by Honker or other
durable storage).
- `/hub/sessions/history` — retrieve a specific session's message history.
- `/head/sessions/history` — retrieve a specific session's message history.
The agent service uses the same call protocol to invoke tools on spokes:
The agent service uses the same call protocol to invoke tools on workers:
`/dev1/fs/readFile` for file access, `/dev1/bash/exec` for shell commands. It
stores session state via whatever mechanism the hub deployment provides — core
stores session state via whatever mechanism the head deployment provides — core
doesn't mandate Honker or any specific storage.
## Constraints
@@ -364,15 +364,15 @@ doesn't mandate Honker or any specific storage.
admin operations are exposed through the call protocol itself.
- Batch is not a protocol primitive. Multiple `call.requested` events with
correlated `requestId`s provide equivalent semantics.
- The spoke prefix in the operation path is a routing mechanism, not a security
- The node prefix in the operation path is a routing mechanism, not a security
boundary. ACL is enforced at the `AccessControl` level, not by path prefix
alone. A spoke that exposes `/dev1/bash/exec` can restrict access via
alone. A worker that exposes `/dev1/bash/exec` can restrict access via
`required_scopes` — not every authenticated identity should have shell access.
## Open Questions
- **OQ-20**: How does the hub track which spokes expose which operations when
spokes connect and disconnect? Registration on connect and cleanup on
- **OQ-20**: How does the head track which workers expose which operations when
workers connect and disconnect? Registration on connect and cleanup on
disconnect, or heartbeat-based discovery? See
[open-questions.md](open-questions.md).


@@ -14,7 +14,7 @@ Two integration layers that enable TypeScript/JavaScript consumers to use alknet
## Why
The alknet Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent spokes) is TypeScript-first. These integration layers let TypeScript code use alknet's transport without reimplementing SSH.
The alknet Rust binary serves CLI users. But the broader ecosystem (pubsub, operations, agent workers) is TypeScript-first. These integration layers let TypeScript code use alknet's transport without reimplementing SSH.
The NAPI surface is intentionally minimal — it exposes transport connections as duplex streams, not the full SSH protocol. The pubsub adapter wraps those streams with `EventEnvelope` serialization.
@@ -127,14 +127,11 @@ The alknet server uses a reserved `direct_tcpip` destination (`alknet-control:0`
2. Instead of opening a TCP connection, it bridges the channel to its local pubsub event bus
3. `EventEnvelope` JSON flows bidirectionally over the SSH channel
Users who prefer not to use the control channel can alternatively run a pubsub hub on a specific port and use standard port forwarding: `alknet connect --forward 9736:hub:9736`. This is a deployment choice, not a separate implementation — alknet's port forwarding works normally for any TCP service.
Users who prefer not to use the control channel can alternatively run a pubsub service on a specific port and use standard port forwarding: `alknet connect --forward 9736:head:9736`. This is a deployment choice, not a separate implementation — alknet's port forwarding works normally for any TCP service.
### Direction Agnostic
- **Worker connects to head**: `alknet connect --forward 9736:head:9736` then create WebSocket event target pointing at `ws://localhost:9736`
Because alknet supports both local and remote port forwarding, the event target works in either direction:
- **Worker connects to hub**: `alknet connect --forward 9736:hub:9736` then create WebSocket event target pointing at `ws://localhost:9736`
- **Hub connects to worker**: `alknet connect --remote-forward 9736:worker:9736` — same result, opposite initiator
- **Head connects to worker**: `alknet connect --remote-forward 9736:worker:9736` — same result, opposite initiator
The pubsub adapter doesn't care which side initiated the SSH session. It just needs a byte stream.


@@ -154,18 +154,18 @@ last_updated: 2026-06-04
## Call Protocol
### OQ-20: Spoke registration and discovery on connect/disconnect
### OQ-20: Worker registration and discovery on connect/disconnect
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — registration on connect / cleanup on disconnect is the leading approach)
- **Cross-references**: ADR-024, ADR-025
### OQ-21: Routing calls to specific spokes with same-service operations
### OQ-21: Routing calls to specific workers with same-service operations
- **Origin**: [call-protocol.md](call-protocol.md)
- **Status**: ~~resolved~~
- **Priority**: ~~medium~~
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{spoke}/{service}/{op}` format. The first path segment identifies the spoke and routes the call to the correct connected node. Multiple spokes exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the spoke prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The hub maintains a routing table mapping spoke identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key.
- **Resolution**: ADR-024, ADR-025 — Operation paths use `/{node}/{service}/{op}` format. The first path segment identifies the node and routes the call to the correct connected node. Multiple workers exposing the same service (e.g., two dev envs both with `/fs/*`) are differentiated by the node prefix (`/dev1/fs/readFile` vs `/dev2/fs/readFile`). The head maintains a routing table mapping node identity to connection. This mirrors iroh's ALPN dispatch: first segment = routing key.
- **Cross-references**: [call-protocol.md](call-protocol.md), ADR-024, ADR-025
### OQ-22: Client streaming (streaming inputs) in the call protocol?


@@ -628,17 +628,18 @@ These must have answers before implementation begins:
The research documents have a few areas that need reconciliation:
1. **Hub/spoke vs head/worker**: core.md and services.md use head/worker. call-protocol.md still uses hub/spoke in several places. All docs need to be updated consistently. ADR-034 formalizes this.
1. **Hub/spoke vs head/worker**~~: core.md and services.md use head/worker. call-protocol.md still uses hub/spoke in several places. All docs need to be updated consistently. ADR-034 formalizes this.~~ **Fixed**: call-protocol.md, auth.md, open-questions.md, and napi-and-pubsub.md updated to head/worker terminology. ADRs are historical records and retain original terminology. ADR-034 still needed to formalize the decision.
2. **DNS as transport vs interface**: core.md conflates "DNS as transport" (encoding bytes as DNS queries) with "DNS as naming/discovery" (TXT records). The three-layer model cleanly separates these: DNS transport is Layer 1, DNS naming is a separate concern (similar to DNS-SD or iroh-dns).
3. **Service naming collision — irpc service vs call protocol operation**: The research uses "service" for both irpc protocol enums (AuthProtocol, SecretProtocol) and call protocol path-based handlers (`/head/auth/verify`, `/head/secrets/derive`). These are different concepts that compose through OperationEnv. The architecture should consistently use:
3. **Service naming collision — irpc service vs call protocol operation vs external service**: The research uses "service" for both irpc protocol enums (AuthProtocol, SecretProtocol) and call protocol path-based handlers (`/head/auth/verify`, `/head/secrets/derive`). These are different concepts that compose through OperationEnv. The architecture should consistently use:
- **irpc service** for in-cluster, Rust-to-Rust protocol enums dispatched by variant (AuthProtocol::VerifyPubkey)
- **operation** for path-based call protocol handlers dispatched by namespace + name (`/head/auth/verify`)
- **external service** for any endpoint reachable via the call protocol from another node or over an interface — an HTTP endpoint, a vast.ai instance, another head node. These are "services" in the broadest sense but sit outside the cluster. They're reachable through OperationEnv's remote dispatch path.
- An irpc service can back an operation — the OperationEnv routes to the right dispatch path automatically
- Both are "services" in the broad sense, but the dispatch mechanism differs. OperationEnv unifies them.
4. **Identity model divergence**: auth.md defines `Identity` with `{id, scopes, resources}`. services.md defines `Identity` with `{node_id, fingerprint, scopes}`. These need to be unified. Proposed: `{id, scopes, resources}` where `id` is a fingerprint (for key-based auth) or account UUID (for database-backed auth).
4. **Identity model divergence**~~: auth.md defines `Identity` with `{id, scopes, resources}`. services.md defines `Identity` with `{node_id, fingerprint, scopes}`. These need to be unified. Proposed: `{id, scopes, resources}` where `id` is a fingerprint (for key-based auth) or account UUID (for database-backed auth).~~ **Fixed**: auth.md already has the correct unified definition `{id, scopes, resources}`. Added a note in auth.md calling out the unification. services.md (research) still uses the old form — will be corrected when the services spec is formally written.
5. **OperationEnv is a universal composition mechanism, not an implementation detail**: services.md defines `OperationEnv` as `HashMap<String, HashMap<String, fn(Value, OperationContext) -> ResponseEnvelope>>`. This is not a TypeScript pattern to be "translated" to Rust as an irpc Client<S>. The OperationEnv composition model is what makes operations universally addressable across HTTP, MCP, DNS, call protocol, and irpc. The Rust implementation can use typed method dispatch or a registry behind the scenes, but the behavioral contract — namespace + operation name → invoke with input, return output — must match. Adapters (MCP, HTTP, DNS) map to this interface. Handlers compose through this interface. irpc is one dispatch backend for OperationEnv, not a replacement for it.
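The behavioral contract in item 5 (namespace + operation name → invoke with input, return output) can be sketched in Rust along these lines. `Value`, `OperationContext`, and `ResponseEnvelope` are stand-ins with assumed fields, since their real definitions are not part of this diff, and `invoke` and `echo` are hypothetical helpers.

```rust
use std::collections::HashMap;

// Stand-in types: the real definitions are not shown in this excerpt.
pub type Value = String;

pub struct OperationContext {
    pub caller_id: String,
}

#[derive(Debug, PartialEq)]
pub struct ResponseEnvelope {
    pub output: Value,
}

/// Handler signature taken from the services.md description.
pub type Handler = fn(Value, OperationContext) -> ResponseEnvelope;

/// namespace -> operation name -> handler.
pub type OperationEnv = HashMap<String, HashMap<String, Handler>>;

/// Resolve namespace + operation name, then invoke the handler.
/// Returns `None` when either lookup fails.
pub fn invoke(
    env: &OperationEnv,
    namespace: &str,
    op: &str,
    input: Value,
    ctx: OperationContext,
) -> Option<ResponseEnvelope> {
    let handler = *env.get(namespace)?.get(op)?;
    Some(handler(input, ctx))
}

/// Example handler that echoes its input tagged with the caller id.
pub fn echo(input: Value, ctx: OperationContext) -> ResponseEnvelope {
    ResponseEnvelope {
        output: format!("{}:{}", ctx.caller_id, input),
    }
}
```

Adapters (MCP, HTTP, DNS) would map their own request shapes onto the same `invoke` call, which is why the dispatch backend behind each handler can vary without changing the contract.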