Land the storage and auth strategy research (findings.md) as four accepted ADRs and amend the core and call specs to match:

- ADR-030: PeerEntry and Identity.id decoupling. Replaces authorized_fingerprints with peers: Vec<PeerEntry>; Identity.id becomes the stable peer_id, decoupled from the rotating fingerprint. Supersedes ADR-029 Assumption 1's UUID source (one-way door preserved, source changes). Resolves OQ-33 and the storage-boundary half of OQ-34. Records the API-key asymmetry as deliberate (OQ-35).
- ADR-031: CredentialStore repo trait + InMemoryCredentialStore default adapter in core. Second repo trait alongside IdentityProvider. Vault encrypts; the store persists the EncryptedData blob; assembly layer loads into Capabilities. EncryptedData core mirror includes salt for wire-format compat.
- ADR-032: Forwarded-for identity. forwarded_for field on call.requested and OperationContext — metadata only, never read by AccessControl::check (enforced structurally via the check signature). The from_call handler populates it. Wire-format one-way door, folded into the ADR-029 migration window.
- ADR-033: Storage boundary and repo/adapter pattern. Core defines repo traits + in-memory defaults; persistence adapters are separate crates; assembly layer wires. Resolves OQ-34. Concrete adapter shapes deferred for exploration (OQ-36).

Amends auth.md, config.md, operation-registry.md, client-and-adapters.md, open-questions.md, README.md, crates/core/README.md. Marks ADR-029 Accepted (Assumption 1 carries the ADR-030 superseded note). Marks the research findings doc reviewed.

---
status: reviewed
last_updated: 2026-06-27
---
# Storage and Auth Strategy

**Status**: Conclusions landed as ADR-030, ADR-031, ADR-032, ADR-033. This
document is the research that informed those ADRs; the ADRs and the amended
specs (`auth.md`, `config.md`, `client-and-adapters.md`,
`operation-registry.md`) are now the authoritative source.

**Date**: 2026-06-27

**Scope**: Cross-cutting — storage decomposition, auth/ACL model, repo/adapter
pattern, SQLite+honker as foundation, metagraph as tool. Synthesizes the
discussion that surfaced during the peer-graph routing research (ADR-029) and
OQ-33/34 resolution.

This document consolidates a multi-thread discussion into an architectural
strategy for storage and auth in the alknet crate graph. It is not an ADR —
it's the research that informed ADRs and spec amendments.

---
## 1. The Problem

Three separate threads converged on the same question: where does persistent
state live in the alknet crate graph, and what's the shared infrastructure
for it?

1. **Peer identity (OQ-33/OQ-34)** — a head node needs to persist the mapping
   from a stable logical peer identity to its current cryptographic material,
   surviving key rotation and restarts. The UUID workaround is ephemeral; a
   real store is needed.

2. **Filesystem (POC-validated)** — SQLite + iroh-blobs + honker as the
   three-layer stack for path-tree metadata, content-addressed blobs, and
   transactional notify-on-commit, respectively. 24 tests across two POC crates.

3. **The old `alknet-storage` spec (alknet-main)** — a single crate doing
   metagraph, identity, ACL, secrets, and honker integration. Designed before
   the vault existed, before ADR-029, before the filesystem POC. Has residual
   issues: multi-tenant complexity, a secrets module whose job the vault now
   does, and metagraph-as-foundation rather than metagraph-as-tool.

The common thread: **SQLite via honker is the right local persistence layer
for all three**, and the metagraph model is the right shape for *some* of the
data. The question is how to decompose this so the core crates stay lean
while the storage-dependent crates get what they need — without forcing
everything through the same abstraction.

The answer is a **repo/adapter pattern**: core defines traits, adapters
implement them against specific backends, and the assembly layer wires in the
adapter. This is not a deferral — the traits and the adapters are concrete
design commitments, documented below.

---
## 2. The Principle: Right Tool for the Right Shape

The metagraph (GraphType → NodeType → EdgeType → Graph → Node → Edge) is a
generalized graph store. It's the right tool for genuinely graph-shaped
problems: ACL delegation chains, workflows, task dependency DAGs, call
composition trees. It is the *wrong* tool for things that aren't graph-shaped:

| Data | Shape | Right tool |
|------|-------|------------|
| Peer identity → crypto material + scopes | Key-value (flat table) | `peers` table with typed columns |
| Filesystem path tree | Tree (degenerate graph) | Specialized path-tree tables (recursive CTE, proven by POC) |
| Provider credentials (encrypted blobs) | Key-value | `credentials` table |
| ACL delegation chains | Graph (traversal, narrowing) | Metagraph |
| Workflows / flowgraph | Graph (DAG, type compatibility) | Metagraph |
| Taskgraph | Graph (dependency DAG) | Metagraph |
| Operation specs | Flat records with typed fields | Table (or in-memory registry, as today) |

Forcing table-shaped data through the metagraph adds overhead (JSON Schema
validation on every node, graph traversal for what should be an indexed
lookup) without benefit. The filesystem POC proved this empirically: the
path tree uses specialized tables with a recursive CTE, and resolution is
sub-millisecond. The same data in a metagraph would need a graph traversal per
resolve — slower, more complex, no upside.

**The principle: SQLite + honker is the foundation. The metagraph is one
tool built on it, for graph-shaped problems. Direct tables are another tool,
for table-shaped problems. Each consumer picks the right tool.**

---
## 3. SQLite + Honker as Foundation (Pattern, Not Crate)

The filesystem POC established the integration pattern:

```rust
honker_core::apply_default_pragmas(conn)?;     // WAL, synchronous=NORMAL
honker_core::attach_notify(conn)?;             // notify() SQL function
honker_core::attach_honker_functions(conn)?;   // enqueue, claim, lock, stream, cron
honker_core::bootstrap_honker_schema(conn)?;   // queue/stream/scheduler tables
```

This is ~20 lines of setup per consumer. Each consumer that wants its own
tables does this on its own rusqlite connection. The critical property: the
honker functions live on *the same connection* as the data tables, so writes
and notifications are atomic in one transaction (the transactional-outbox
pattern, built in). This is `honker-core` (attach to your connection), not
`honker` (manages its own connection) — the POC documented this distinction.

**This is a pattern, not a crate.** Packaging ~20 lines of setup as a shared
crate adds a dependency boundary for no gain. Each consumer opens its own
SQLite file, attaches honker, defines its schema. A `setup_honker(conn)`
helper function (in a shared utility, or just copy-pasted) is enough.
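The transactional-outbox property described above can be modeled without SQLite to make the guarantee concrete. This is a std-only sketch, not honker's API: `Db`, `Tx`, `upsert_peer`, and `notify` are hypothetical names, and a `std::sync::mpsc` channel stands in for honker's `listen`. The transaction buffers the row write and the notification together, and only `commit` makes either visible, so a listener never wakes for a write that was rolled back.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical stand-in for a SQLite file with a `peers` table and honker attached.
#[derive(Default)]
struct Db {
    peers: HashMap<String, String>,           // peer_id -> fingerprint
    listeners: Vec<Sender<(String, String)>>, // (channel, payload) subscribers
}

/// A transaction buffers row writes and notifications together.
struct Tx<'a> {
    db: &'a mut Db,
    writes: Vec<(String, String)>,
    notes: Vec<(String, String)>,
}

impl Db {
    fn listen(&mut self) -> Receiver<(String, String)> {
        let (tx, rx) = mpsc::channel();
        self.listeners.push(tx);
        rx
    }

    fn begin(&mut self) -> Tx<'_> {
        Tx { db: self, writes: Vec::new(), notes: Vec::new() }
    }
}

impl Tx<'_> {
    fn upsert_peer(&mut self, peer_id: &str, fingerprint: &str) {
        self.writes.push((peer_id.to_string(), fingerprint.to_string()));
    }

    fn notify(&mut self, chan: &str, payload: &str) {
        self.notes.push((chan.to_string(), payload.to_string()));
    }

    /// Apply every buffered write, then wake listeners: both happen or neither does.
    fn commit(self) {
        let Tx { db, writes, notes } = self;
        for (peer_id, fingerprint) in writes {
            db.peers.insert(peer_id, fingerprint);
        }
        for note in notes {
            // Deliver to each listener, dropping any whose receiver has gone away.
            db.listeners.retain(|l| l.send(note.clone()).is_ok());
        }
    }

    /// Discard the transaction: no row change, no notification.
    fn rollback(self) {}
}
```

In the real stack, the buffering is SQLite's own transaction and `notify()` is a SQL function on the same connection, which is why the pattern uses `honker-core` attached to the consumer's connection rather than `honker` with a connection of its own.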
### Why SQLite, not a "real database"

SQLite is an [application file format](https://sqlite.org/appfileformat.html),
not just a database. The filesystem POC's insights: BLOBs under 100 KB are
faster stored inline in SQLite than as separate filesystem files; metadata gets
atomic transactions, independent of content; and the schema is the
documentation. Each consumer gets a local, crash-safe, queryable file — not a
database server to operate.

The core crates (alknet-core, alknet-call) stay DB-free. The storage-consuming
crates (filesystem, peer registry, graphs) each own their SQLite
file. The assembly layer wires them together.

### What honker adds

| Feature | Use case |
|---------|----------|
| `notify` / `listen` | Ephemeral pub/sub — "ACL entry changed, invalidate cache" |
| `stream_publish` / `subscribe` | Durable pub/sub — "peer identity updated, propagate" |
| `queue` / `claim` / `ack` | Task queue — "orphaned write session cleanup" |
| `lock_acquire` / `lock_release` | Named locks — "writer coordination on a path" |
| `scheduler` | Periodic tasks — "session cleanup, audit log pruning" |

The key integration: every mutation is atomic with its notification. A
`peers` table update and its `notify("peers:changed", peer_id)` commit together.
A downstream consumer (e.g., the call protocol's `IdentityProvider` cache)
wakes on commit, not on poll.

---
## 4. The Repo/Adapter Pattern

### The principle

Core defines traits (repo interfaces). Adapters implement them against
specific backends. The assembly layer wires the adapter. Downstream crates
consume the trait, not the adapter. This is the same pattern `IdentityProvider`
already establishes — we're making it explicit and extending it to every
storage-shaped concern.

### Reference: keypal

The TypeScript project [keypal](/workspace/keypal) is a clean example. It
abstracts API key management (hashing, validation, scopes, expiration,
caching) behind a `Storage` interface, with adapters for Redis, Drizzle, Prisma,
Kysely, Convex, and in-memory storage. The core logic (`Manager`) is
backend-agnostic; the storage is an interface; the consumer picks the adapter
at wiring time. An `AdapterFactory` provides column-mapping / schema-config so
the same adapter works against different table schemas.

The alknet equivalent: core defines the repo trait, adapters implement it,
the assembly layer wires the adapter. The shapes map cleanly.

### Why this matters beyond the call crate

Downstream crates that don't use the call protocol still need auth. A crate
that exposes operations over HTTP (alknet-http), or a service with no protocol
at all, still needs to resolve identities and check ACL. If the auth layer is
a repo trait in core, those crates use the same trait, the same adapters, and
potentially the same backing store — without depending on alknet-call. The
call crate is one consumer of auth, not the owner of it.

The repo pattern also opens the door to distributed auth adapters (automerge
sync, Redis, a remote identity service) — the trait doesn't care which
backend is wired. That's not designed here, but the pattern doesn't foreclose
it.
### The concrete repo traits and adapters

This is the design commitment, not a deferral:

#### `IdentityProvider` (auth repo trait — already in core)

```rust
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```

Already exists. Already used by the call protocol's `Dispatcher`. The
contract is: given a credential (fingerprint or token), return the resolved
`Identity` (id, scopes, resources). The `Identity.id` is the **stable logical
peer identity**, decoupled from the fingerprint (OQ-33). The adapter maps
fingerprint → stable id + scopes + resources.

**Adapters that need to exist:**

1. **`ConfigIdentityProvider`** (exists, needs updating) — backed by
   `ArcSwap<DynamicConfig>`. Today it sets `Identity.id = fingerprint`, which
   couples the identity to the crypto material and breaks on key rotation.
   It needs to be updated to use `PeerEntry` (see below) so `Identity.id` is the
   stable `peer_id`, not the fingerprint.

2. **`SqliteIdentityProvider`** (needs building) — backed by a `peers` table
   in SQLite + honker. Implements `IdentityProvider` by querying the `peers`
   table. This is the persistent adapter that survives restarts and supports
   runtime peer add/remove/update. The `peers` table is:

   ```sql
   CREATE TABLE peers (
       peer_id      TEXT PRIMARY KEY,            -- stable logical id ("worker-a")
       fingerprint  TEXT NOT NULL,               -- current crypto material
       scopes       TEXT NOT NULL DEFAULT '[]',  -- JSON array
       resources    TEXT NOT NULL DEFAULT '{}',  -- JSON map
       display_name TEXT,
       enabled      INTEGER NOT NULL DEFAULT 1,
       created_at   INTEGER NOT NULL,
       updated_at   INTEGER NOT NULL
   );
   CREATE INDEX idx_peers_fingerprint ON peers(fingerprint);
   ```

   Key rotation: `UPDATE peers SET fingerprint = ?1, updated_at = ?2 WHERE peer_id = ?3`.
   The `peer_id` is stable; ACL entries key on it; the fingerprint changes; the
   ACL still matches.

3. **In-memory `IdentityProvider`** (exists for tests) — the current
   `ConfigIdentityProvider` with `AuthPolicy::default()` or a test config.
#### `CredentialStore` (encrypted credentials repo trait — needs adding to core)

The http crate's `from_openapi`/`from_mcp` handlers need provider credentials
(API keys, OAuth tokens). The vault encrypts them; a store persists the
encrypted blobs. The trait:

```rust
pub trait CredentialStore: Send + Sync {
    fn get(&self, provider: &str) -> Option<EncryptedData>;
    fn put(&self, provider: &str, data: &EncryptedData) -> Result<(), CredentialStoreError>;
    fn delete(&self, provider: &str) -> Result<(), CredentialStoreError>;
}
```

**Adapters:**

1. **`InMemoryCredentialStore`** — `HashMap<String, EncryptedData>`. For
   tests and simple deployments where credentials are loaded from config at
   startup.
2. **`SqliteCredentialStore`** — `credentials` table in SQLite + honker.
   Persists encrypted provider credentials. The vault encrypts; the store
   persists the `EncryptedData` blob; the assembly layer loads them into
   `Capabilities` at registration time (the no-env-vars invariant, ADR-014).

   ```sql
   CREATE TABLE credentials (
       provider       TEXT PRIMARY KEY,  -- "openai", "anthropic", etc.
       encrypted_data TEXT NOT NULL,     -- EncryptedData JSON (key_version, iv, ciphertext)
       created_at     INTEGER NOT NULL,
       updated_at     INTEGER NOT NULL
   );
   ```
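A minimal sketch of the in-memory adapter from item 1, assuming hypothetical shapes for the two types the trait references: `EncryptedData` here carries `key_version`, `salt`, `iv`, and `ciphertext` (the field set is an assumption; ADR-031 records that the core mirror includes a salt), and `CredentialStoreError` is a one-variant placeholder. Because the trait's `put` and `delete` take `&self`, the adapter needs interior mutability; an `RwLock` provides it and keeps the type `Send + Sync`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical core mirror of the vault's output; field names are assumptions.
#[derive(Clone, Debug, PartialEq)]
pub struct EncryptedData {
    pub key_version: u32,
    pub salt: String,
    pub iv: String,
    pub ciphertext: String,
}

/// Placeholder error: the in-memory adapter can only fail on a poisoned lock.
#[derive(Debug)]
pub enum CredentialStoreError {
    LockPoisoned,
}

pub trait CredentialStore: Send + Sync {
    fn get(&self, provider: &str) -> Option<EncryptedData>;
    fn put(&self, provider: &str, data: &EncryptedData) -> Result<(), CredentialStoreError>;
    fn delete(&self, provider: &str) -> Result<(), CredentialStoreError>;
}

/// HashMap-backed adapter. It never decrypts: it only holds opaque blobs.
#[derive(Default)]
pub struct InMemoryCredentialStore {
    inner: RwLock<HashMap<String, EncryptedData>>,
}

impl CredentialStore for InMemoryCredentialStore {
    fn get(&self, provider: &str) -> Option<EncryptedData> {
        self.inner.read().ok()?.get(provider).cloned()
    }

    fn put(&self, provider: &str, data: &EncryptedData) -> Result<(), CredentialStoreError> {
        let mut map = self.inner.write().map_err(|_| CredentialStoreError::LockPoisoned)?;
        map.insert(provider.to_string(), data.clone());
        Ok(())
    }

    fn delete(&self, provider: &str) -> Result<(), CredentialStoreError> {
        let mut map = self.inner.write().map_err(|_| CredentialStoreError::LockPoisoned)?;
        map.remove(provider);
        Ok(())
    }
}
```

Handing it to consumers as `Arc<dyn CredentialStore>` means the assembly layer can later swap in the SQLite adapter without any consumer changing.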
#### `PeerStore` (adapter-internal, not a core trait)

A `PeerStore` trait (save/find/update/delete peer records) is an
*adapter-internal* detail, not a core trait. The core trait is
`IdentityProvider`. The `SqliteIdentityProvider` implements
`IdentityProvider` by delegating to an internal `PeerStore` (which queries
the `peers` table). The `ConfigIdentityProvider` implements
`IdentityProvider` by reading `PeerEntry` from config. The trait boundary
that matters for cross-crate sharing is `IdentityProvider`, not `PeerStore`.

This keeps core lean: the auth repo trait (`IdentityProvider`) and the
credential repo trait (`CredentialStore`) are in core. The store traits
(`PeerStore`, etc.) are adapter-internal.
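The split between the public trait and the adapter-internal store can be sketched as follows. Everything except the `IdentityProvider` signature is hypothetical: `PeerRecord`, `PeerStore` (reduced to `find_by_fingerprint` and `upsert`), `MemPeerStore` (a HashMap standing in for the `peers` table so the sketch runs without rusqlite), and `StoreIdentityProvider`. `Identity` and `AuthToken` are trimmed stand-ins for the core types, and treating a disabled row as unresolvable is an assumption based on the `enabled` column.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
}

struct AuthToken(String); // trimmed stand-in for the core type

// The core trait, as quoted above.
trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

/// One row of the `peers` table (a subset of its columns).
#[derive(Clone)]
struct PeerRecord {
    peer_id: String,
    fingerprint: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
    enabled: bool,
}

/// Adapter-internal: never exported from core, never shared across crates.
trait PeerStore: Send + Sync + 'static {
    fn find_by_fingerprint(&self, fingerprint: &str) -> Option<PeerRecord>;
    fn upsert(&self, record: PeerRecord);
}

/// In-memory stand-in for the SQLite-backed store, keyed by peer_id.
#[derive(Default)]
struct MemPeerStore {
    rows: RwLock<HashMap<String, PeerRecord>>,
}

impl PeerStore for MemPeerStore {
    fn find_by_fingerprint(&self, fingerprint: &str) -> Option<PeerRecord> {
        // A SQLite adapter would use idx_peers_fingerprint instead of scanning.
        self.rows.read().ok()?.values().find(|r| r.fingerprint == fingerprint).cloned()
    }

    fn upsert(&self, record: PeerRecord) {
        if let Ok(mut rows) = self.rows.write() {
            rows.insert(record.peer_id.clone(), record);
        }
    }
}

/// The public face: implements the core trait by delegating to the store.
struct StoreIdentityProvider<S: PeerStore> {
    store: S,
}

impl<S: PeerStore> IdentityProvider for StoreIdentityProvider<S> {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let peer = self.store.find_by_fingerprint(fingerprint)?;
        if !peer.enabled {
            return None;
        }
        Some(Identity { id: peer.peer_id, scopes: peer.scopes, resources: peer.resources })
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // API-key resolution is out of scope for this sketch
    }
}
```

Only `IdentityProvider` is visible to other crates, so replacing `MemPeerStore` with a rusqlite-backed store would not touch any consumer.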
### The `PeerEntry` config model

`AuthPolicy` needs to support the id-fingerprint decoupling. Today it has
`authorized_fingerprints: HashSet<String>` — just fingerprints, no stable id.
The update:

```rust
use std::collections::HashMap;

pub struct PeerEntry {
    pub peer_id: String,          // stable logical id ("worker-a")
    pub fingerprint: String,      // current crypto material
    pub scopes: Vec<String>,
    pub resources: HashMap<String, Vec<String>>,
    pub display_name: Option<String>,
    pub enabled: bool,
}

pub struct AuthPolicy {
    pub peers: Vec<PeerEntry>,    // replaces authorized_fingerprints
    pub api_keys: Vec<ApiKeyEntry>,
}
```

`ConfigIdentityProvider::resolve_from_fingerprint` searches `peers` for the
entry with the matching fingerprint and returns
`Identity { id: peer.peer_id, scopes: peer.scopes, resources: peer.resources }`.
The `Identity.id` is the stable `peer_id`, not the fingerprint. Key rotation:
update the `fingerprint` field in the `PeerEntry`; the `peer_id` and all ACL
entries stay stable.
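A minimal sketch of fingerprint resolution and key rotation against `PeerEntry`, assuming trimmed stand-ins: `AuthPolicy` omits `api_keys`, the provider holds the policy directly rather than an `ArcSwap<DynamicConfig>`, `rotate` is a hypothetical helper standing in for a config update, and skipping disabled entries is an assumption based on the `enabled` field.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct PeerEntry {
    peer_id: String,     // stable logical id
    fingerprint: String, // current crypto material
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
    display_name: Option<String>,
    enabled: bool,
}

#[derive(Debug, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>,
}

struct AuthPolicy {
    peers: Vec<PeerEntry>, // api_keys omitted in this sketch
}

struct ConfigIdentityProvider {
    policy: AuthPolicy,
}

impl ConfigIdentityProvider {
    /// fingerprint -> PeerEntry -> Identity { id: peer_id, .. }
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.policy
            .peers
            .iter()
            .find(|p| p.enabled && p.fingerprint == fingerprint)
            .map(|p| Identity {
                id: p.peer_id.clone(),
                scopes: p.scopes.clone(),
                resources: p.resources.clone(),
            })
    }

    /// Hypothetical rotation helper: only the fingerprint changes, so anything
    /// keyed on peer_id (ACL entries included) keeps matching.
    fn rotate(&mut self, peer_id: &str, new_fingerprint: &str) -> bool {
        match self.policy.peers.iter_mut().find(|p| p.peer_id == peer_id) {
            Some(p) => {
                p.fingerprint = new_fingerprint.to_string();
                true
            }
            None => false,
        }
    }
}
```

Because `Identity` carries no fingerprint, the identity resolved after a rotation is equal to the one resolved before it; only the lookup key has moved.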
This is a config change to `AuthPolicy`, not a storage change. It works
in-memory from config, without SQLite. The SQLite adapter (`SqliteIdentityProvider`)
stores the same `PeerEntry` shape in a table and persists across restarts.

---
## 5. Per-Node ACL, No "Trusted" Flag

### The model

Each node has its own ACL. A node's ACL answers one question: **is this
caller authorized to call this operation?** The caller is whoever
authenticated to the connection — resolved by `IdentityProvider` from the
TLS fingerprint or `auth_token`, checked by `AccessControl::check(identity)`.
No "trusted" flag, no bypass, no special mode.

This is the existing mechanism, restated for the cross-node case. The call
protocol's dispatch path (`registration.rs:128-140`) already runs
`AccessControl::check` against the caller's `Identity`. For a remote peer's
call, the caller's `Identity` is the peer's resolved identity. Same check,
same mechanism, no new concept.

### Why no "trusted=true"

A generic "trusted" flag is a blanket authorization bypass — the exact
anti-pattern that ADR-015 was written to kill (it replaced `trusted: true`
with the authority-switch model). There is no circumstance where a generic
"skip the security check" flag is the right answer in a reasonably secure
system. If a caller is authorized, the ACL says so. If the ACL doesn't say
so, the caller isn't authorized. There's no third state.

### The cross-node case

When a hub forwards to a spoke (via `from_call`), the spoke authenticates
the hub (resolves the hub's identity from the connection), and checks its
ACL: "is this identity authorized to call this operation?" The answer is
yes or no, based on the hub's identity and the op's `AccessControl`. Same
mechanism, same check, no special-casing.

```
End user ──calls──> Hub ──forwards as hub──> Spoke (docker service)
                     │                         │
                 hub's ACL                spoke's ACL
             (user → hub ops)          (hub → spoke ops)
```

The hub's ACL checked the end user. The spoke's ACL checked the hub. Two
independent authorization decisions, same mechanism, no replication. The hub
isn't "trusted" by the spoke — the hub is *authorized* by the spoke's ACL,
the same way any caller is authorized.
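The two independent decisions in the diagram can be sketched with two nodes that share no state. `Node`, `authorize`, and the fingerprint strings are hypothetical; each node's `peers` map stands in for its own `IdentityProvider`, and its `acl` map is a simplified `AccessControl` in which every listed scope is required. There is deliberately no code path that skips the check.

```rust
use std::collections::HashMap;

/// Trimmed identity: only the scopes matter for this check.
struct Identity {
    scopes: Vec<String>,
}

/// One node: its own peer table and its own ACL. Nothing is shared between nodes.
struct Node {
    peers: HashMap<&'static str, Identity>,        // fingerprint -> resolved direct caller
    acl: HashMap<&'static str, Vec<&'static str>>, // operation -> required scopes
}

impl Node {
    /// Authorize the *direct* caller of `op`. Unknown callers and unknown ops are denied.
    fn authorize(&self, op: &str, caller_fingerprint: &str) -> bool {
        let (Some(identity), Some(required)) =
            (self.peers.get(caller_fingerprint), self.acl.get(op))
        else {
            return false;
        };
        required
            .iter()
            .all(|s| identity.scopes.iter().any(|held| held.as_str() == *s))
    }
}
```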
### The service-to-service pattern

This is the same principle as a database server that authorizes the
application server without needing to know about every end user the app
server authenticated. The application server is the authorization boundary.
In alknet, each node is an authorization boundary for its direct callers.

The docker service example: the service exposes `/docker/start`. It's
reachable directly (end users connect and call it) or through a hub (the
hub imports via `from_call`, re-exposes, forwards). The docker service's
ACL lists the principals that call it directly — either end users (direct
topology) or the hub (proxied topology). It doesn't need to know about the
hub's end users. The hub's ACL handles end-user authorization.

### No global ACL, no replication

Each node's ACL is local — in its own SQLite file (when the SQLite adapter
is wired), in its own `peers` table, checked by its own `AccessControl`.
There is no global ACL, no cross-service ACL replication. When a user's key
rotates, the hub's `peers` table updates her fingerprint. The spoke's `peers`
table is unchanged — it only knows about the hub. When the hub's key
rotates, the spoke's `peers` table updates the hub's fingerprint — a single
entry update, not a full ACL replication.

---
## 6. Forwarded-For Identity (Metadata, Not Authority)

### The question

When a hub forwards a call to a spoke, should the spoke know *who initiated
the call* (the end user), or just *who called it* (the hub)?

**Without forwarded-for** (what the implementation does today): the spoke
sees the hub as the caller. It authorizes the hub. It logs "the hub called
`/docker/start`." If the spoke needs to audit "who actually initiated this,"
it can't — that information is at the hub.

**With forwarded-for**: the hub includes the original caller's identity in
the `call.requested` payload. The spoke can log it, use it for per-user
quotas, or pass it to the operation handler for context. But the spoke's ACL
still authorizes the *hub*, not the end user — the forwarded-for identity is
informational, not authoritative.

### The decision: add it, as metadata

The forwarded-for identity is a protocol-level field. It's either in the
model or it isn't — it can't be bolted on without a protocol change. The
recommendation is to include it:

1. **Audit trail.** Without it, a cross-node call chain is untraceable at
   the leaf. The spoke knows "the hub called me" but not "alice asked the
   hub to call me." For debugging, billing, and abuse investigation, the
   originator matters.

2. **It's metadata, not authority.** The forwarded-for identity goes in the
   call's metadata (or a dedicated `forwarded_for` field), not as the
   `auth_token`. The spoke's dispatch path makes it available on
   `OperationContext`, but `AccessControl::check` *never* uses it — it
   always authorizes the direct caller's identity. This keeps it from
   becoming an authorization bypass.

3. **The ACL check signature prevents misuse.** `AccessControl::check` takes
   `Option<&Identity>` (the direct caller's identity). `forwarded_for` is a
   *separate* field on `OperationContext` (`Option<Identity>`). The ACL
   check signature doesn't accept it. If someone wants to ACL on the
   forwarded-for identity, they'd have to change the `AccessControl::check`
   signature — a visible, reviewable change, not a quiet flag flip.

4. **Without it, the leaf service is blind to the originator.** If the spoke
   needs to rate-limit per user (not per hub), or log who triggered a
   container start, it can't. The hub would have to proxy and track
   everything, which defeats the point of direct service composition.

### Protocol shape

The `call.requested` payload gains an optional `forwarded_for` field:

```jsonc
{
  "operationId": "/docker/start",
  "input": { ... },
  "auth_token": "alk_...",    // the direct caller's token (the hub's)
  "forwarded_for": {          // the original caller (the end user)
    "id": "alice",
    "scopes": ["fs:read", "docker:start"]
  }
}
```

The dispatch path populates `OperationContext`:

```rust
pub struct OperationContext {
    // ... existing fields ...
    pub identity: Option<Identity>,      // the direct caller (authorized by ACL)
    pub forwarded_for: Option<Identity>, // the original caller (metadata only)
}
```

`AccessControl::check(identity.as_ref())` — unchanged. The `forwarded_for`
field is available to handlers for logging, auditing, and rate-limiting, but
never to the ACL.
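The structural guarantee can be sketched in a few lines. `AccessControl` is reduced to a required-scopes list, `dispatch` is a hypothetical stand-in for the dispatch path, and letting an anonymous caller through only when nothing is required is an assumption. The point is the shape of `check`: it takes only `Option<&Identity>`, so the handler can read `forwarded_for` but the ACL has no way to.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

struct OperationContext {
    identity: Option<Identity>,      // the direct caller, authorized by the ACL
    forwarded_for: Option<Identity>, // the original caller, metadata only
}

struct AccessControl {
    required_scopes: Vec<String>,
}

impl AccessControl {
    /// Takes the direct caller only; there is no parameter for `forwarded_for`.
    fn check(&self, identity: Option<&Identity>) -> bool {
        match identity {
            Some(id) => self.required_scopes.iter().all(|s| id.scopes.contains(s)),
            None => self.required_scopes.is_empty(), // assumption for anonymous callers
        }
    }
}

/// Hypothetical dispatch step: authorize the direct caller, then run the handler.
fn dispatch(acl: &AccessControl, ctx: &OperationContext) -> Result<String, &'static str> {
    if !acl.check(ctx.identity.as_ref()) {
        return Err("forbidden");
    }
    // Handler side: forwarded_for is readable here, for audit logging only.
    let caller = ctx.identity.as_ref().map_or("anonymous", |i| i.id.as_str());
    let origin = ctx.forwarded_for.as_ref().map_or("unknown", |i| i.id.as_str());
    Ok(format!("{caller} on behalf of {origin}"))
}
```

Swapping the two identities shows why the signature matters: a privileged identity placed in `forwarded_for` grants nothing, because `check` never sees it.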
### The `from_call` handler's responsibility

The hub's `from_call` forwarding handler populates `forwarded_for` with the
end user's identity (from the hub's `OperationContext.identity`) when it
constructs the `call.requested` payload to send to the spoke. The hub
authenticates as itself (its own `auth_token`); the `forwarded_for` field
carries the originator's identity as context.
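On the hub side, building the forwarded request amounts to a field copy. `CallRequested` is a hypothetical Rust view of the `call.requested` fields in the JSON example above, `forward` is an illustrative name, and `"alk_hub"` in the checks is a placeholder token. The sketch copies only the hub's `identity` into `forwarded_for`, as the paragraph above describes.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Trimmed hub-side context: only the field this step reads.
struct OperationContext {
    identity: Option<Identity>,
}

/// Hypothetical Rust view of the `call.requested` payload fields.
#[derive(Debug)]
struct CallRequested {
    operation_id: String,
    input: String,                   // serialized input, passed through untouched
    auth_token: String,              // the hub authenticates as itself
    forwarded_for: Option<Identity>, // the end user, as the hub resolved them
}

fn forward(ctx: &OperationContext, hub_token: &str, operation_id: &str, input: &str) -> CallRequested {
    CallRequested {
        operation_id: operation_id.to_string(),
        input: input.to_string(),
        auth_token: hub_token.to_string(),
        forwarded_for: ctx.identity.clone(),
    }
}
```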
This is a protocol addition — a field on the `call.requested` payload and
on `OperationContext`. It's included in the ADR-029 migration or a
companion task — the `from_call` handler is being rewritten anyway, and the
`OperationContext` struct is being touched.

---
## 7. The Decomposition

### Crate boundaries

```
alknet-core (lean — no SQLite, no honker)
├── IdentityProvider trait            (the auth repo trait — already exists)
├── CredentialStore trait             (the encrypted-credentials repo trait — needs adding)
├── Identity, AuthToken, AuthContext  (the auth types — already exist)
├── AccessControl, AccessResult       (the ACL check — already exists)
├── ConfigIdentityProvider            (in-memory adapter — needs PeerEntry update)
├── InMemoryCredentialStore           (in-memory adapter — needs building)
└── PeerEntry                         (config model for decoupled id — needs adding to AuthPolicy)

Storage-consuming crates (each owns its SQLite + honker):
├── alknet-peer-store-sqlite        — SqliteIdentityProvider (peers table + honker)
├── alknet-credential-store-sqlite  — SqliteCredentialStore (credentials table + honker)
├── alknet-filesystem               — path-tree tables (tree, not graph; POC-proven)
└── alknet-graphs                   — metagraph tables (graph-shaped problems: ACL delegation, workflows, taskgraph)

alknet-call (lean — no SQLite, no honker, no storage traits)
├── Uses IdentityProvider (the trait, not the adapter)
├── PeerCompositeEnv keyed by PeerId (= Identity.id from IdentityProvider)
├── AccessControl::check(identity) for per-node ACL
└── from_call handler authenticates as the hub, forwards-for as metadata
```

### What goes where

| Concern | Where it lives | Shape | Status |
|---------|----------------|-------|--------|
| Auth repo trait (`IdentityProvider`) | alknet-core | Trait | Exists |
| Credential repo trait (`CredentialStore`) | alknet-core | Trait | Needs adding |
| In-memory auth adapter (`ConfigIdentityProvider`) | alknet-core | Config-backed | Needs `PeerEntry` update |
| In-memory credential adapter (`InMemoryCredentialStore`) | alknet-core | HashMap-backed | Needs building |
| SQLite auth adapter (`SqliteIdentityProvider`) | `alknet-peer-store-sqlite` | `peers` table + honker | Needs building |
| SQLite credential adapter (`SqliteCredentialStore`) | `alknet-credential-store-sqlite` | `credentials` table + honker | Needs building |
| Per-node ACL check (`AccessControl::check`) | alknet-core | Table-shaped: scope/resource match | Exists |
| Filesystem path tree + bucket ACL | alknet-filesystem | Specialized tables (POC-proven) | POC done, crate needs building |
| ACL delegation graph | alknet-graphs (metagraph) | Graph (traversal, scope narrowing) | Needs building when delegation is needed |
| Workflows / flowgraph | alknet-graphs (metagraph) | Graph (DAG) | Needs building when workflows are needed |
| Taskgraph | alknet-graphs (metagraph) | Graph (dependency DAG) | Needs building when taskgraph is needed |
| Forwarded-for identity | alknet-call (protocol field) | Metadata on `call.requested` + `OperationContext` | Needs adding |

### What the old spec had that we're dropping

| Old spec | Status | Why |
|----------|--------|-----|
| Multi-tenant (system.db + tenant.db) | Dropped | Each tenant gets its own complete setup (own ACL, ops, DB). Simpler, no cross-tenant complexity. |
| `secrets/` module (HD derivation, secret service) | Replaced by alknet-vault | The vault already handles encryption/decryption (ADR-018/019/020/025/026). Storage just stores the `EncryptedData` blob. |
| Metagraph as the foundation | Demoted to tool | SQLite+honker is the foundation. Metagraph is one tool on it, for graph-shaped problems. Tables are another tool, for table-shaped problems. |
| `alknet-storage` as one crate | Split | The storage-consuming concerns are separate (peer store, credential store, filesystem, graphs). No single "storage" crate. |
| Accounts/organizations/multi-tenant identity | Dropped | The need is a `peers` table (PeerId → fingerprint + scopes). The full account/org model is over-engineering for the current use case. |
| `alknet-flowgraph` as a separate crate | Folded into alknet-graphs | The metagraph + petgraph interop are one crate for graph-shaped problems. |

---
## 8. The ACL Split: Check Stays Table, Delegation Is Graph

### The current ACL is table-shaped

`AccessControl` on `OperationSpec` consists of `required_scopes` (AND-gate),
`required_scopes_any` (OR-gate), and `resource_type`/`resource_action`.
`Identity` has `scopes: Vec<String>` and `resources: HashMap<String, Vec<String>>`.
The check is `AccessControl::check(identity)` — a flat scope match, not a graph
traversal. This is fast, indexable, and correct for the current model (no
delegation).
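A sketch of that flat check, under stated assumptions: `resources` maps a resource type to the actions granted on it, the resource gate applies only when both `resource_type` and `resource_action` are set, an empty `required_scopes_any` imposes no constraint, an anonymous caller passes only when nothing is required, and the `AccessResult` variants shown are hypothetical. Every branch is a lookup or a membership test; nothing traverses a graph.

```rust
use std::collections::HashMap;

struct Identity {
    scopes: Vec<String>,
    resources: HashMap<String, Vec<String>>, // resource type -> granted actions (assumed)
}

#[derive(Default)]
struct AccessControl {
    required_scopes: Vec<String>,     // AND-gate: all must be held
    required_scopes_any: Vec<String>, // OR-gate: at least one, if non-empty
    resource_type: Option<String>,
    resource_action: Option<String>,
}

#[derive(Debug, PartialEq)]
enum AccessResult {
    Allowed,
    Denied(&'static str),
}

impl AccessControl {
    fn check(&self, identity: Option<&Identity>) -> AccessResult {
        let Some(id) = identity else {
            let open = self.required_scopes.is_empty()
                && self.required_scopes_any.is_empty()
                && self.resource_type.is_none();
            return if open { AccessResult::Allowed } else { AccessResult::Denied("unauthenticated") };
        };
        if !self.required_scopes.iter().all(|s| id.scopes.contains(s)) {
            return AccessResult::Denied("missing a required scope");
        }
        if !self.required_scopes_any.is_empty()
            && !self.required_scopes_any.iter().any(|s| id.scopes.contains(s))
        {
            return AccessResult::Denied("none of the alternative scopes held");
        }
        if let (Some(rtype), Some(action)) = (&self.resource_type, &self.resource_action) {
            let granted = id.resources.get(rtype).map_or(false, |acts| acts.contains(action));
            if !granted {
                return AccessResult::Denied("resource action not granted");
            }
        }
        AccessResult::Allowed
    }
}
```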
### Delegation is graph-shaped

When delegation is needed ("A delegates to B with narrowed scopes, B
delegates to C with further narrowing"), the delegation chain is a graph
traversal — you walk the chain computing the effective scope set. This is
where the metagraph pays off (PrincipalNode, DelegatesEdge, scope narrowing).

But the *check* stays table-shaped even with delegation: the delegation
graph produces the effective `Identity.scopes` (the graph's output); the ACL
check is still "does the effective scope set satisfy the op's requirements?"
(a flat join). The graph and the table compose — the graph produces the
scopes, the table checks them.
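The composition can be sketched with a linear chain standing in for the delegation graph. Treating narrowing as set intersection along the chain is an assumption, and `Delegation`, `effective_scopes`, and `check` are hypothetical names. The split is the point: the graph side computes a scope set, and the table side is the same flat membership test as before.

```rust
use std::collections::BTreeSet;

/// One delegation edge: the delegate receives at most `granted`.
struct Delegation {
    granted: BTreeSet<&'static str>,
}

/// Graph side: walk the chain from the root principal, intersecting at each hop,
/// so scopes can only narrow, never widen.
fn effective_scopes(root: &BTreeSet<&'static str>, chain: &[Delegation]) -> BTreeSet<&'static str> {
    chain
        .iter()
        .fold(root.clone(), |acc, hop| acc.intersection(&hop.granted).copied().collect())
}

/// Table side: the unchanged flat check against the effective set.
fn check(effective: &BTreeSet<&'static str>, required: &[&'static str]) -> bool {
    required.iter().all(|s| effective.contains(s))
}
```

Under this design, `effective_scopes` would run inside an `IdentityProvider` adapter, and only the resulting `Identity.scopes` would reach `AccessControl::check`. A later hop that lists a scope the previous hop never held (here, `fs:write` on B → C) still cannot grant it.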
### Don't force the check through the graph

The temptation is to make `AccessControl::check` traverse the delegation
graph. Don't. The check is a flat scope-match — keep it that way. The
delegation graph is a separate concern (producing effective scopes), and it
lives in `alknet-graphs` (metagraph). The check lives in core (table). They
compose at the `IdentityProvider` boundary: the adapter resolves the identity
(possibly by traversing the delegation graph to compute effective scopes),
returns an `Identity` with the effective scopes, and the check is a flat
match against that `Identity`.

This matches the "don't use a screwdriver to hammer a nail" principle: the
check is table-shaped, the delegation is graph-shaped, and forcing either
through the other's shape is worse.

---
## 9. The Hub Proxy Tangle (Resolved)
|
|
|
|
### The tangle
|
|
|
|
A hub can "have a filesystem" two ways:
|
|
1. **In-process** — the hub's binary loads `alknet-filesystem`. The
|
|
filesystem's SQLite is local. The hub's call protocol dispatches
|
|
`/fs/readFile` directly to the filesystem handler. No network.
|
|
2. **Proxied** — the filesystem runs on a spoke. The hub imports the spoke's
|
|
ops via `from_call`. The hub's `from_call` handler forwards over QUIC.
|
|
The spoke's call protocol dispatches to its own filesystem handler.
|
|
|
|
These are different deployment topologies for the same libraries. The
|
|
libraries don't change; the assembly does.
|
|
|
|
### The three concerns that got conflated
|
|
|
|
1. **ACL** — who can call the operation? The hub's ACL authorizes the user.
|
|
The spoke's ACL authorizes the hub. (Per-node ACL, same mechanism.)
|
|
2. **Bucket routing** — which bucket is the operation targeting? The bucket
|
|
is a *parameter* in the operation input (`{ "bucket": "alice-files",
|
|
"path": "hello.txt" }`). It's not an ACL concern — it's operation input.
|
|
3. **Peer routing** — which spoke *hosts* the operation? This is
|
|
`PeerRef::Specific` (ADR-029) — the hub's composition env routes to the
|
|
right peer.
|
|
|
|
These are three separate decisions at three separate layers:
|
|
|
|
```
User calls hub's /fs/readFile with { bucket: "alice-files", path: "hello.txt" }
  → hub's ACL: is this user authorized to call /fs/readFile? (AccessControl::check)
  → hub's composition env: which peer serves /fs/readFile? (PeerRef routing)
  → hub's from_call handler: forward { bucket, path } to that peer
  → spoke's ACL: is the hub authorized to call /fs/readFile? (AccessControl::check)
  → spoke's filesystem handler: read path from bucket (operation logic + bucket ACL)
```

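The flow above can be sketched as separate steps in a node's dispatch path. This is a hypothetical illustration, not the call crate's code: `Node`, `Outcome`, the tuple-set ACL, and the `PeerRef::Local` variant are assumptions (only `PeerRef::Specific` appears in the text). What it shows is structural: `check` takes the caller and the operation name but not the input, so the bucket never reaches the coarse ACL.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
}

/// Coarse per-node ACL keyed on (identity id, operation). The operation
/// input is not a parameter, so this layer cannot look at the bucket.
struct AccessControl {
    allowed: HashSet<(String, String)>,
}

impl AccessControl {
    fn check(&self, identity: &Identity, op: &str) -> bool {
        self.allowed.contains(&(identity.id.clone(), op.to_string()))
    }
}

/// Simplified stand-in for ADR-029's PeerRef; `Local` is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum PeerRef {
    Local,
    Specific(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Denied,
    NotFound,
    /// Run the local handler, which owns bucket-level decisions.
    Local { input: String },
    /// from_call forwards the input unchanged to the routed peer.
    Forward { peer: String, input: String },
}

struct Node {
    acl: AccessControl,
    /// Composition env: which peer serves each operation.
    routes: HashMap<String, PeerRef>,
}

impl Node {
    fn new(allowed: &[(&str, &str)], routes: &[(&str, PeerRef)]) -> Self {
        Node {
            acl: AccessControl {
                allowed: allowed
                    .iter()
                    .map(|(id, op)| (id.to_string(), op.to_string()))
                    .collect(),
            },
            routes: routes.iter().map(|(op, r)| (op.to_string(), r.clone())).collect(),
        }
    }

    fn dispatch(&self, caller: &Identity, op: &str, input: &str) -> Outcome {
        // Layer 1: coarse ACL on (caller, op) only.
        if !self.acl.check(caller, op) {
            return Outcome::Denied;
        }
        // Layer 2: peer routing. The input (bucket, path) is carried opaquely.
        match self.routes.get(op) {
            Some(PeerRef::Specific(peer)) => Outcome::Forward {
                peer: peer.clone(),
                input: input.to_string(),
            },
            Some(PeerRef::Local) => Outcome::Local { input: input.to_string() },
            None => Outcome::NotFound,
        }
    }
}
```

With a hub routing `/fs/readFile` to `spoke-1` and a spoke that only admits the hub, a user's call is forwarded by the hub and then runs locally on the spoke, while the same user calling the spoke directly is denied.
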
### Bucket-level authorization

The call protocol's ACL is coarse: "can this identity call `/fs/readFile`?"
It doesn't know about buckets. The bucket is in the operation input. The
**handler** checks bucket-level authorization — the filesystem handler reads
`ctx.identity`, reads the input's `bucket` field, and checks its own bucket
ACL (a `bucket_acl` table in the filesystem's SQLite: "is this identity
authorized for this bucket?"). This is application logic — the filesystem
owns its bucket authorization. The call protocol's ACL is the coarse gate;
the handler is the fine gate.

This keeps the call protocol's ACL simple and fast (a scope/resource check),
and lets each service define its own fine-grained authorization against its
own storage. The ACL doesn't inspect operation input; the handler does.

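A sketch of the fine gate from the handler's side, assuming a parsed `ReadFileInput` and an in-memory `HashSet` in place of the filesystem's SQLite `bucket_acl` table. `OperationContext` is reduced to the `identity` field the text mentions; `FilesystemHandler`, `FsError`, and `read_file` are illustrative names.

```rust
use std::collections::{HashMap, HashSet};

struct Identity {
    id: String,
}

/// Reduced to the one field the handler needs here.
struct OperationContext {
    identity: Identity,
}

/// Hypothetical parsed input for /fs/readFile.
struct ReadFileInput {
    bucket: String,
    path: String,
}

#[derive(Debug, PartialEq)]
enum FsError {
    BucketForbidden,
    NotFound,
}

struct FilesystemHandler {
    /// In-memory stand-in for the `bucket_acl` table: (identity id, bucket).
    bucket_acl: HashSet<(String, String)>,
    /// bucket -> path -> contents.
    buckets: HashMap<String, HashMap<String, String>>,
}

impl FilesystemHandler {
    fn read_file(&self, ctx: &OperationContext, input: &ReadFileInput) -> Result<String, FsError> {
        // The coarse call-protocol ACL already allowed /fs/readFile.
        // The fine gate: is this identity authorized for this bucket?
        let key = (ctx.identity.id.clone(), input.bucket.clone());
        if !self.bucket_acl.contains(&key) {
            return Err(FsError::BucketForbidden);
        }
        self.buckets
            .get(&input.bucket)
            .and_then(|files| files.get(&input.path))
            .cloned()
            .ok_or(FsError::NotFound)
    }
}
```

Two identities can both pass the coarse `/fs/readFile` check and still get different results, because only the handler consults the bucket ACL.
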
---

## 10. Build Order

This is the concrete sequence, not a deferral. Each item is a design
commitment that needs to be built. The order is dependency-driven, not
priority-driven — earlier items unblock later ones.

### Tier 1: Core repo traits and config model (unblocks everything)

1. **`PeerEntry` in `AuthPolicy`** — replace
   `authorized_fingerprints: HashSet<String>` with `peers: Vec<PeerEntry>`
   (peer_id, fingerprint, scopes, resources). Update
   `ConfigIdentityProvider` to resolve fingerprint → `PeerEntry` →
   `Identity { id: peer_id, ... }`. This is the id-fingerprint decoupling
   (OQ-33). Without this, the ACL keys on the fingerprint and breaks on key
   rotation.

2. **`CredentialStore` trait in core** — the repo trait for encrypted
   provider credentials. `InMemoryCredentialStore` adapter (HashMap-backed)
   for tests and config-loaded deployments.

These are core changes — no SQLite, no honker, no new crates. They fix the
id-fingerprint coupling and establish the credential repo pattern.

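A minimal sketch of item 1's resolution chain. The field names follow the text (`peer_id`, `fingerprint`, `scopes`, `resources`), but their types, the inherent `resolve` method, and the exact `Identity` shape are assumptions, not core's actual `IdentityProvider` signature.

```rust
use std::collections::HashSet;

/// Assumed shape of PeerEntry; field types are illustrative.
#[derive(Debug, Clone)]
struct PeerEntry {
    peer_id: String,
    fingerprint: String,
    scopes: HashSet<String>,
    resources: HashSet<String>,
}

struct AuthPolicy {
    peers: Vec<PeerEntry>,
}

#[derive(Debug, Clone, PartialEq)]
struct Identity {
    /// Stable peer_id, not the rotating fingerprint.
    id: String,
    scopes: HashSet<String>,
    resources: HashSet<String>,
}

struct ConfigIdentityProvider {
    policy: AuthPolicy,
}

impl ConfigIdentityProvider {
    /// fingerprint -> PeerEntry -> Identity { id: peer_id, .. }
    fn resolve(&self, fingerprint: &str) -> Option<Identity> {
        self.policy
            .peers
            .iter()
            .find(|p| p.fingerprint == fingerprint)
            .map(|p| Identity {
                id: p.peer_id.clone(),
                scopes: p.scopes.clone(),
                resources: p.resources.clone(),
            })
    }
}
```

After a key rotation only the entry's `fingerprint` changes. The resolved `Identity.id` stays the same, so ACL rules keyed on the id keep matching, and the retired fingerprint stops resolving.
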
### Tier 2: SQLite adapters (enables persistence)

3. **`alknet-peer-store-sqlite`** — `SqliteIdentityProvider` backed by a
   `peers` table + honker. Implements `IdentityProvider`. The assembly layer
   wires it instead of `ConfigIdentityProvider` when persistence is needed.
   The `peers` table schema is in §4. Honker `notify("peers:changed")` on
   mutations for cache invalidation.

4. **`alknet-credential-store-sqlite`** — `SqliteCredentialStore` backed by
   a `credentials` table + honker. Implements `CredentialStore`. The
   assembly layer wires it when credentials need to persist across restarts.

These are new crates — each owns its SQLite file, attaches honker, defines
its schema. They implement the core traits.

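The cache-invalidation flow in item 3 can be sketched without SQLite: an `mpsc` channel stands in for honker's `notify("peers:changed")`, and a shared `HashMap` stands in for the `peers` table. `PeerStore`, `CachedResolver`, and `wire` are hypothetical names; the real adapter would receive change notices through honker rather than an in-process channel.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Shared stand-in for the `peers` table: fingerprint -> peer_id.
type Rows = Rc<RefCell<HashMap<String, String>>>;

/// Write path: every mutation emits "peers:changed" on the channel,
/// which plays the role of honker's notify in this sketch.
struct PeerStore {
    rows: Rows,
    notify: Sender<&'static str>,
}

impl PeerStore {
    fn upsert(&self, fingerprint: &str, peer_id: &str) {
        self.rows.borrow_mut().insert(fingerprint.to_string(), peer_id.to_string());
        let _ = self.notify.send("peers:changed");
    }

    fn remove(&self, fingerprint: &str) {
        self.rows.borrow_mut().remove(fingerprint);
        let _ = self.notify.send("peers:changed");
    }
}

/// Read path: caches lookups and clears the cache on any change notice.
struct CachedResolver {
    rows: Rows,
    changes: Receiver<&'static str>,
    cache: RefCell<HashMap<String, Option<String>>>,
}

impl CachedResolver {
    fn resolve(&self, fingerprint: &str) -> Option<String> {
        // Drain every pending notice so one clear covers all of them.
        let changed = self
            .changes
            .try_iter()
            .filter(|topic| *topic == "peers:changed")
            .count()
            > 0;
        if changed {
            self.cache.borrow_mut().clear();
        }
        if let Some(hit) = self.cache.borrow().get(fingerprint) {
            return hit.clone();
        }
        let value = self.rows.borrow().get(fingerprint).cloned();
        self.cache.borrow_mut().insert(fingerprint.to_string(), value.clone());
        value
    }
}

/// Assembly-layer wiring: both halves share the table and the channel.
fn wire() -> (PeerStore, CachedResolver) {
    let rows: Rows = Rc::new(RefCell::new(HashMap::new()));
    let (notify, changes) = channel();
    let store = PeerStore { rows: Rc::clone(&rows), notify };
    let resolver = CachedResolver { rows, changes, cache: RefCell::new(HashMap::new()) };
    (store, resolver)
}
```

The cache is only as fresh as the notices. A write that bypassed `PeerStore`, and so never notified, would leave stale entries, which is why the notice is tied to every mutation.
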
### Tier 3: Protocol and call crate (enables cross-node composition)

5. **ADR-029 migration** — peer-keyed overlays (`PeerCompositeEnv`), retire
   `remote_safe`/`trusted_peer`, `PeerRef` routing, `AccessControl`-based
   peer authorization. The `forwarded_for` field is added here (or in a
   companion task) since `OperationContext` and the `from_call` handler are
   being rewritten.

6. **`forwarded_for` field** — add to `call.requested` payload and
   `OperationContext`. The `from_call` handler populates it; the dispatch
   path makes it available; `AccessControl::check` ignores it. This is a
   protocol addition that's included with the migration or done as a
   companion task immediately after.

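A sketch of how `forwarded_for` travels as metadata only. `CallRequested`, `from_call`, and `dispatch` here are simplified, hypothetical shapes. `forwarded_for` is modeled as an optional caller id because the text does not specify its type, and the scope-list check is a placeholder ACL. The key property is in the signature: `AccessControl::check` never receives the context or the request, so it cannot read `forwarded_for`.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

/// Simplified call.requested payload (fields other than forwarded_for assumed).
#[derive(Debug, Clone)]
struct CallRequested {
    op: String,
    input: String,
    forwarded_for: Option<String>,
}

#[derive(Debug)]
struct OperationContext {
    identity: Identity,
    forwarded_for: Option<String>,
}

struct AccessControl;

impl AccessControl {
    /// Takes the authenticated identity and the op only; there is no
    /// context or request parameter, so forwarded_for cannot influence it.
    fn check(&self, identity: &Identity, op: &str) -> bool {
        identity.scopes.iter().any(|s| s == op)
    }
}

/// Hub side (single hop): from_call stamps the original caller onto the
/// outgoing request.
fn from_call(ctx: &OperationContext, op: &str, input: &str) -> CallRequested {
    CallRequested {
        op: op.to_string(),
        input: input.to_string(),
        forwarded_for: Some(ctx.identity.id.clone()),
    }
}

/// Spoke side: authorize the connecting peer, then expose forwarded_for
/// to the handler as metadata.
fn dispatch(
    acl: &AccessControl,
    peer: Identity,
    req: CallRequested,
) -> Result<OperationContext, String> {
    if !acl.check(&peer, &req.op) {
        return Err(format!("{} may not call {}", peer.id, req.op));
    }
    Ok(OperationContext { identity: peer, forwarded_for: req.forwarded_for })
}
```

A peer that claims to forward for an authorized user is still denied, because only the authenticated peer identity reaches `check`.
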
### Tier 4: Graph-shaped problems (enables ACL delegation, workflows, taskgraph)

7. **`alknet-graphs`** — the metagraph crate (GraphType/NodeType/EdgeType,
   CRUD, schema validation, petgraph interop). Built on SQLite + honker.
   This is built when the first graph-shaped consumer needs it — ACL
   delegation, workflows, or taskgraph. Not built speculatively; built when
   there's a graph-shaped problem to solve.

8. **ACL delegation graph** — a metagraph instance (PrincipalNode,
   DelegatesEdge, scope narrowing). The `IdentityProvider` adapter traverses
   it to compute effective scopes. Built when delegation is needed — not
   before, not speculatively.

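One way to compute effective scopes at resolution time, sketched as a fixpoint over delegation edges. The edge layout (`from`, `to`, `scopes`) is an assumption, and this iterative approach is a swapped-in technique for illustration, not the metagraph crate's traversal. Each delegate receives the intersection of its delegator's effective scopes and the edge's scopes, so authority only narrows along a chain. Cycles still terminate, because the sets only grow and are bounded by the scopes already present.

```rust
use std::collections::{HashMap, HashSet};

type Scopes = HashSet<String>;

/// Assumed edge shape: `from` delegates at most `scopes` to `to`.
struct DelegatesEdge {
    from: String,
    to: String,
    scopes: Scopes,
}

/// Helper to build a scope set from string literals.
fn scope_set(xs: &[&str]) -> Scopes {
    xs.iter().map(|x| x.to_string()).collect()
}

/// Effective scopes for every principal: own scopes plus, for each incoming
/// edge, (delegator's effective scopes ∩ edge scopes), repeated until stable.
fn effective_scopes(own: &HashMap<String, Scopes>, edges: &[DelegatesEdge]) -> HashMap<String, Scopes> {
    let mut eff = own.clone();
    loop {
        let mut changed = false;
        for e in edges {
            // Narrowing: a delegate can never receive more than the delegator holds.
            let granted: Scopes = match eff.get(&e.from) {
                Some(from_scopes) => from_scopes.intersection(&e.scopes).cloned().collect(),
                None => continue,
            };
            let target = eff.entry(e.to.clone()).or_default();
            for s in granted {
                changed |= target.insert(s);
            }
        }
        if !changed {
            return eff;
        }
    }
}
```

The result is a flat map, matching the resolution-time option: an `IdentityProvider` adapter could place a principal's entry in `Identity.scopes`, and `check` stays a set lookup.
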
### What does NOT get built (dropped, not deferred)

- Multi-tenant (system.db + tenant.db) — dropped; each tenant gets its own
  setup
- Accounts/organizations/multi-tenant identity — dropped; the `peers` table
  is the model
- `secrets/` module — dropped; the vault handles encryption
- `alknet-storage` as one crate — dropped; split by concern

---

## 11. Open Questions

1. **Does the peer registry SQLite adapter live in its own crate
   (`alknet-peer-store-sqlite`) or in the assembly layer?** The keypal
   pattern suggests a separate crate (the adapter is reusable across
   deployments). `ConfigIdentityProvider` lives in core (a simple impl);
   the SQLite adapter could live in a separate crate or in the assembly
   layer's binary. This is a packaging choice — the trait is in core either
   way.

2. **Does the ACL delegation graph produce `Identity.scopes` at resolution
   time or at check time?** The recommendation in §8 is at resolution time
   (the `IdentityProvider` adapter traverses the delegation graph to compute
   effective scopes, returns an `Identity` with them, and the check is
   flat). The alternative is lazy computation (the check triggers the
   traversal). This is a design question for when the delegation graph is
   built — the current model has no delegation, so it's not blocking.

3. **Does the `CredentialStore` trait need a `list` method?** The current
   design has `get`/`put`/`delete`. A `list` (list all providers) might be
   needed for a management UI or for the assembly layer to enumerate
   credentials at startup. Two-way door — add `list` when a consumer needs
   it.

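A sketch of the current `get`/`put`/`delete` repo trait with a HashMap-backed adapter. The method signatures, the `String` error type, and the `EncryptedData` fields other than `salt` are assumptions. The store only persists the vault's encrypted blob and never decrypts it.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Assumed shape of core's EncryptedData mirror: an opaque blob produced by
/// the vault. Only `salt` is named in the ADR summary; the rest is illustrative.
#[derive(Debug, Clone, PartialEq)]
struct EncryptedData {
    ciphertext: Vec<u8>,
    nonce: Vec<u8>,
    salt: Vec<u8>,
}

/// Hypothetical repo trait signatures, keyed by provider name.
trait CredentialStore: Send + Sync {
    fn get(&self, provider: &str) -> Result<Option<EncryptedData>, String>;
    fn put(&self, provider: &str, data: EncryptedData) -> Result<(), String>;
    /// Returns whether an entry was removed.
    fn delete(&self, provider: &str) -> Result<bool, String>;
}

/// HashMap-backed default adapter; the Mutex allows shared use behind &self.
#[derive(Default)]
struct InMemoryCredentialStore {
    inner: Mutex<HashMap<String, EncryptedData>>,
}

impl CredentialStore for InMemoryCredentialStore {
    fn get(&self, provider: &str) -> Result<Option<EncryptedData>, String> {
        let map = self.inner.lock().map_err(|e| e.to_string())?;
        Ok(map.get(provider).cloned())
    }

    fn put(&self, provider: &str, data: EncryptedData) -> Result<(), String> {
        let mut map = self.inner.lock().map_err(|e| e.to_string())?;
        map.insert(provider.to_string(), data);
        Ok(())
    }

    fn delete(&self, provider: &str) -> Result<bool, String> {
        let mut map = self.inner.lock().map_err(|e| e.to_string())?;
        Ok(map.remove(provider).is_some())
    }
}
```

Adding `list` later would be one more method on the trait; for this adapter it would return the map's keys.
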
---

## References

- ADR-014: Secret Material Flow and Capability Injection (the no-env-vars
  invariant)
- ADR-015: Privilege Model and Authority Context (the authority-switch model
  that replaced `trusted: true`)
- ADR-017: Call Protocol Client and Adapter Contract (the `from_call`
  forwarding handler)
- ADR-018/019/020/025/026: The vault crate (handles encryption/decryption;
  storage stores the `EncryptedData` blob)
- ADR-029: Peer-Graph Routing Model (peer-keyed overlays, `PeerRef` routing,
  `AccessControl`-based peer authorization)
- OQ-33: PeerId — logical id, not crypto identity
- OQ-34: Persistent peer registry (the storage dimension)
- `docs/research/alknet-call-peer-routing/findings.md` — the peer-graph
  routing research that surfaced the storage question
- `docs/research/alknet-filesystem/poc-summary.md` — the filesystem POC that
  validated SQLite + honker + iroh-blobs
- `/workspace/@alkdev/alknet-main/docs/architecture/storage.md` — the old
  storage spec (residual issues documented in §7)
- `/workspace/@alkdev/alknet-main/docs/research/storage.md` — the old storage
  research (metagraph, identity, ACL, honker integration)
- `/workspace/keypal` — TypeScript repo-pattern reference for API key
  management (Storage interface + adapters, the pattern alknet's
  `IdentityProvider` follows)
- `/workspace/honker` — SQLite extension with pub/sub, streams, queues,
  locks, scheduler (`honker-core` for the attach-to-your-connection pattern)
- https://sqlite.org/appfileformat.html — SQLite as an application file format