Working through the WebTransport implementation path surfaced a scope
question distinct from the hedging-as-deferral anti-pattern ADR-038 was
written to correct. Three findings drove the re-evaluation:
1. The browser bidirectional call-protocol path doesn't require
WebTransport — WebSocket is full-duplex, EventEnvelope fits a WS
binary message boundary cleanly, and the Dispatcher is stream-
agnostic (ADR-012). What WebTransport gives over WebSocket (native
multi-stream multiplexing, the ALPN-as-stream substrate) benefits the
proxy use case, not the call protocol.
2. WebTransport is still a draft standard (-07, not an RFC), and the Rust
dependency stack is experimental (wtransport and h3 both self-describe as
not production-ready). Either dependency stack puts a draft protocol on
the security surface of the first release.
3. The ALPN-stream-proxy (ADR-040) is speculative: its WASM parser
consumers (browser SSH/SFTP/git clients) don't exist yet, and the
downstream crates that deferring WebTransport might appear to block (SSH,
git, SFTP) expose their ALPNs natively over QUIC regardless.
This is a scope decision (per ADR-009: a decision that 'genuinely
doesn't need to be made yet because the use case isn't concrete'), not
hedging. The reversal trigger is concrete: a real deployment needing
the ALPN-stream-proxy.
ADR-038 is superseded (its anti-pattern correction stands; its specific
'h3 in scope now' decision is reversed). ADR-040 and ADR-043 are
parked, not superseded — their designs revive unchanged when WebTransport
revives, with §2 (bidirectionality) and §3 (no-PeerId overlay) of ADR-043
transferring to WebSocket for v1.
ADR-044 §5 also states the 'browser is not a peer' rationale that
ADR-034 §4 closed without arguing: peer = addressable node in the
call-protocol peer graph (stable PeerId, PeerRef::Specific-reachable,
identity stable across reconnects), not 'any endpoint that exchanges
calls during a live session.' A browser is the second but not the first
(no stable crypto identity of its own, ephemeral, not addressable from
other nodes). ADR-034 §4 and Assumption 2 are amended by reference.
The wtransport-vs-hyperium dependency question is recorded (not
resolved — WebTransport is deferred) in ADR-044 §'Research note' and
webtransport.md so the revival doesn't re-derive it: wtransport probably
isn't the right choice (axum-bridge friction — it owns its own HTTP
serving path); the hyperium stack (h3 + h3-quinn + h3-webtransport) fits
the axum integration better but its server-side WebTransport API needs
verification before commitment.
Reviewed by architecture-review subagent; all critical cross-reference
issues (ADR-034 §5 stale 'in scope' assertion, ADR-036 Context listing
h3 as implemented, webtransport.md Design Decisions table) resolved.
Reframes the SSH scope around the channel multiplexer as the decomposition
point. Each feature (forwarding, SOCKS5, SFTP) is a channel type or a consumer
of channel types, stacking on the core — each layer functional when built,
none shipped broken. Dissolves the 'massive v1' framing that produced hedging
language proposing non-functional or half-built versions.
Three developments since the initial 2026-06-25 research changed the framing:
(1) WebTransport landed as ADRs 038/040/043, grounding SSH-over-WebTransport
as a constraint (the handler must be source-agnostic about its Connection);
(2) russh's runtime abstraction (russh-util swaps tokio::spawn for
wasm_bindgen_futures on wasm32) means the SSH *client* runs in WASM when fed a
WebTransport BiStream — the browser case is real, not speculative;
(3) the http crate intersection (ALPN-stream-proxy depends on SSH handlers
being source-agnostic) is now visible and specified.
The layered build order (layers 1-4: stream, connection, channels, exec;
then 5, forwarding; 6, SOCKS5; 7, SFTP) doubles as the configuration
surface: each layer beyond the core is an opt-in channel type, gated by
the default-deny ACL baseline inherited from russh.
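As a rough illustration of the build order doubling as configuration, each layer beyond the core can be modeled as an opt-in flag that defaults to off. This is a sketch under assumptions: SshFeatures, Layer, and is_enabled are hypothetical names, not alknet's config schema, and treating SOCKS5 as requiring the forwarding layer (because it consumes forwarding channels) is an assumption about how the layers stack.

```rust
/// Hypothetical layer identifiers; Core covers layers 1-4
/// (stream, connection, channels, exec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Core,
    Forwarding, // layer 5
    Socks5,     // layer 6
    Sftp,       // layer 7
}

/// Hypothetical opt-in flags. Default derives all-false: nothing beyond
/// the core is reachable unless a deployment turns it on.
#[derive(Debug, Default, Clone, Copy)]
pub struct SshFeatures {
    pub forwarding: bool,
    pub socks5: bool,
    pub sftp: bool,
}

impl SshFeatures {
    pub fn is_enabled(&self, layer: Layer) -> bool {
        match layer {
            Layer::Core => true,
            Layer::Forwarding => self.forwarding,
            // Assumption: SOCKS5 consumes forwarding channels, so it needs both.
            Layer::Socks5 => self.forwarding && self.socks5,
            Layer::Sftp => self.sftp,
        }
    }
}
```

With SshFeatures::default() only the core is reachable; forwarding, SOCKS5, and SFTP are switched on explicitly, which layers on top of the default-deny posture rather than replacing it.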
A consistency review of the alknet-http specs found two classes of
issues: internal contradictions from the mid-spec pivot (the to_openapi
gateway pattern landed in prose but not in cross-references), and a
systematic client→server assumption that only holds for the OpenAPI/MCP
case leaking into the WebTransport architecture.
Class 1 (internal contradictions):
- C1: to_openapi was half-refactored — body described the ADR-042
gateway pattern but the decisions table and ADR-036 still said
'paths mirror /{service}/{op}'. ADR-036's to_openapi clause is now
amended as superseded by ADR-042; the stale decisions row and README
Principle 2 are fixed.
- C2: the axum Router route list didn't include the 5 gateway endpoints
(/search, /schema, /call, /batch, /subscribe). Added them; clarified
/openapi.json as the gateway description doc; added gateway paths to
the decoy exclusion list.
- C3: ADR-034 §5 still talked about the 'h3/WebTransport deferral
bucket' that ADR-038 eliminated. Amended §5/Consequences/References
to drop the deferral framing (the auth-model decision stands; only
the 'when' wording was stale).
Class 2 (one-way direction assumption):
- C4/C5/C6: the WebTransport specs framed the session as browser→hub
one-way, when the call protocol is bidirectional and WebTransport is
a general ALPN transport substrate. New ADR-043 reframes WebTransport
as a bidirectional ALPN transport substrate (call protocol is the
first/canonical target; needs no WASM parser), names the call
protocol's bidirectionality over WebTransport sessions, and states
the inbound no-PeerId connection-local overlay as the mirror of
ADR-034 §2. webtransport.md is updated to reflect this framing;
ADR-040 is repositioned (not superseded) as the substrate's non-call-
ALPN mechanism.
- C7: the HTTP/1.1+HTTP/2 surface's one-directionality is now named as
a lossy consequence of HTTP request/response; WebTransport is named
as the surface that restores the bidirectional call model.
- C8: overview.md acknowledges the from/to direction model is
OpenAPI/MCP-specific, not a call-protocol property.
A review subagent pass on ADR-043 + webtransport.md found no critical
issues; warnings W1-W3 (residual browser-as-subject framing, ADR-009
rationale in spec, opening abstract tone) and suggestions S2/S4/S5
were addressed.
The to_openapi spec was describing one OpenAPI path per alknet operation
— the inverse of from_openapi. That inverse is genuinely messy: the call
protocol's input is a flat JSON object, and generating a traditional
OpenAPI path entry (POST /fs/{path} with path param, body, query params)
requires reverse-engineering which fields are path/query/body — metadata
the call protocol doesn't carry. The three options (leaky HTTP metadata
on OperationSpec, fragile heuristics, manual annotation) are all messy.
ADR-042 replaces this with the gateway pattern (same as ADR-041 for
to_mcp): to_openapi generates 5 fixed endpoints (search, schema, call,
batch, subscribe) that gate access to the full operation registry. The
input is always a flat JSON body — no path/query/body split to
reverse-engineer. JSON Schema is already in the OperationSpec.
The per-caller API surface is the key advantage: /search is
AccessControl-filtered, so the client sees only what it can call. The
Gitea failure mode (dumping admin ops to every caller in a static
OpenAPI doc) is structurally impossible — the per-caller surface is the
default, not an afterthought. OpenAPI has no per-caller filtering
concept; the gateway pattern provides it through /search.
Gateway endpoint set:
- /search -> services/list (AccessControl-filtered, names + descriptions)
- /schema -> services/schema (full OperationSpec)
- /call -> call.requested (Query/Mutation, flat JSON body)
- /batch -> multiple call.requested (correlated IDs)
- /subscribe -> call.requested (Subscription, SSE) — the one endpoint
the MCP gateway excludes (MCP is request/response; OpenAPI/SSE
supports streaming)
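The gateway endpoint set is a fixed routing table, so it can be sketched as a plain match from path to call-protocol target. The enum and function names are hypothetical. The point of the sketch is that the HTTP surface has exactly five routes however many operations the registry holds, and every other path falls through; it assumes the target operation is selected by request content (for example a name in the flat JSON body), not by the path.

```rust
/// Call-protocol targets behind the five fixed gateway endpoints.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GatewayTarget {
    /// /search: services/list, AccessControl-filtered.
    ServicesList,
    /// /schema: services/schema, the full OperationSpec.
    ServicesSchema,
    /// /call: one call.requested (Query/Mutation) with a flat JSON body.
    Call,
    /// /batch: several call.requested with correlated IDs.
    Batch,
    /// /subscribe: call.requested for a Subscription, streamed over SSE.
    Subscribe,
}

/// Map a request path to its gateway target. The table never grows with
/// the registry, because operations are not encoded in the path.
pub fn route(path: &str) -> Option<GatewayTarget> {
    match path {
        "/search" => Some(GatewayTarget::ServicesList),
        "/schema" => Some(GatewayTarget::ServicesSchema),
        "/call" => Some(GatewayTarget::Call),
        "/batch" => Some(GatewayTarget::Batch),
        "/subscribe" => Some(GatewayTarget::Subscribe),
        _ => None,
    }
}
```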
A traditional per-operation-paths projection is additive (a deployment
that wants the nice Swagger UI builds it with HTTP-specific metadata),
not a replacement. The gateway is the default.
http-adapters.md to_openapi section rewritten: the gateway endpoint
set, per-caller filtering, error fidelity on the /call endpoint, and
the additive traditional projection. The 'Why' section adds the
flat->structured and per-caller-surface rationale.
README/overview ADR tables and the top-level README current-state note
updated for ADR-042.
The to_mcp spec was describing one MCP tool per alknet operation — the
tool-bloat problem. An LLM connecting to a node with 200 operations gets
200 MCP tools dumped into its context, degrading reasoning and wasting
context budget.
ADR-041 replaces this with the tool-gateway pattern (same pattern as
opencode's memory and worktree tools): to_mcp exposes 4 fixed meta-tools
(search, schema, call, batch) that gate access to the full operation
registry. The LLM has a few tools in context, discovers operations on
demand through search + schema, then calls. Same principle as Linux's
man command — don't preload all documentation; query on demand.
Gateway tool set:
- search -> services/list (names + descriptions, AccessControl-filtered)
- schema -> services/schema (full OperationSpec for a specific op)
- call -> call.requested (Query/Mutation only, request/response)
- batch -> multiple call.requested (correlated IDs, OQ-14)
Subscription operations are excluded — MCP tool calls are
request/response by protocol design (the client blocks until
CallToolResult returns); streaming subscriptions don't fit. Subscriptions
are filtered out of search results and cannot be invoked via call.
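A sketch of how search and call could enforce both the per-caller surface and the Subscription exclusion. Assumptions: OpSummary, Caller, and required_scope are hypothetical stand-ins for the OperationSpec and AccessControl machinery, and answering NotFound for operations the caller cannot access (rather than a distinct forbidden error) is a choice made here, in keeping with "the client sees only what it can call."

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpKind {
    Query,
    Mutation,
    Subscription,
}

/// Hypothetical registry entry: a trimmed stand-in for OperationSpec.
#[derive(Debug, Clone)]
pub struct OpSummary {
    pub name: String,
    pub description: String,
    pub kind: OpKind,
    /// Hypothetical stand-in for the op's AccessControl policy.
    pub required_scope: String,
}

/// Hypothetical caller: the scopes its identity resolved to.
pub struct Caller {
    pub scopes: HashSet<String>,
}

fn can_access(op: &OpSummary, caller: &Caller) -> bool {
    caller.scopes.contains(&op.required_scope)
}

/// search: names and descriptions of ops this caller may call, with
/// Subscriptions removed because MCP tool calls are request/response.
pub fn search<'a>(registry: &'a [OpSummary], caller: &Caller) -> Vec<(&'a str, &'a str)> {
    registry
        .iter()
        .filter(|op| op.kind != OpKind::Subscription && can_access(op, caller))
        .map(|op| (op.name.as_str(), op.description.as_str()))
        .collect()
}

#[derive(Debug, PartialEq, Eq)]
pub enum GatewayError {
    /// Unknown op, or one this caller cannot access (existence not revealed).
    NotFound,
    /// Known and accessible, but a Subscription: not callable over MCP.
    SubscriptionNotSupported,
}

/// call pre-check: resolve the op for this caller, then refuse Subscriptions.
pub fn check_call(registry: &[OpSummary], caller: &Caller, name: &str) -> Result<(), GatewayError> {
    let op = registry
        .iter()
        .find(|op| op.name == name && can_access(op, caller))
        .ok_or(GatewayError::NotFound)?;
    if op.kind == OpKind::Subscription {
        return Err(GatewayError::SubscriptionNotSupported);
    }
    Ok(())
}
```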
http-mcp.md to_mcp section rewritten: the gateway tool set, Subscription
exclusion, and the service behavior (tools/list returns 4 fixed tools,
tools/call dispatches through the gateway). The 'Why' section adds the
tool-bloat rationale and the memory/worktree tool pattern that informed
the design.
README/overview ADR tables and the top-level README current-state note
updated for ADR-041.
The 'WebTransport proxy' concept was conflating two distinct things;
this pass separates them:
1. In-process ALPN-stream-proxy (ADR-040, in alknet-http): the h3 handler
hands a WebTransport stream to another ALPN handler (SshAdapter,
GitAdapter, etc.) as a Connection, so a browser with a WASM parser
can reach any ALPN service via WebTransport. Path-based routing
(the CONNECT path declares the target: /alknet/ssh -> SshAdapter).
HttpAdapter gains Arc<HandlerRegistry> for the lookup. The browser's
WASM parser implements BiStream (ADR-007) over the WebTransport
stream. SSH-over-WebTransport is HTTPS-shaped at the network layer
(anti-censorship: the 'VPN-like without being a VPN' use case on a
clean foundation). russh-sftp demonstrates WASM targeting is
feasible; SSH is the next target.
2. Standalone relay service (OQ-38, future alknet-relay crate): a full
relay (a fork of iroh-relay) with WebTransport proxy fallback for
NAT traversal. This is infrastructure, not a mode of the h3 handler.
OQ-38 reframed to be the standalone-relay scope question (distinct
from the in-process proxy now resolved by ADR-040).
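Path-based routing for the in-process proxy can be sketched as stripping a fixed prefix from the CONNECT path and looking the remainder up in a registry. Assumptions: the /alknet/<name> shape follows the /alknet/ssh example above, but using the final segment as the registry key, the HandlerRegistry stand-in (a map to handler names rather than handler objects), and rejecting nested paths are all illustrative choices.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for HandlerRegistry: registry key -> handler name.
#[derive(Default)]
pub struct HandlerRegistry {
    by_name: HashMap<String, String>,
}

impl HandlerRegistry {
    pub fn register(&mut self, key: &str, handler: &str) {
        self.by_name.insert(key.to_string(), handler.to_string());
    }

    pub fn lookup(&self, key: &str) -> Option<&str> {
        self.by_name.get(key).map(String::as_str)
    }
}

/// Resolve a WebTransport CONNECT path such as /alknet/ssh to the handler
/// that should receive the stream as a Connection.
pub fn resolve_connect_path<'a>(registry: &'a HandlerRegistry, path: &str) -> Option<&'a str> {
    let target = path.strip_prefix("/alknet/")?;
    // Reject empty or nested targets rather than guessing.
    if target.is_empty() || target.contains('/') {
        return None;
    }
    registry.lookup(target)
}
```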
webtransport.md updated: three stream destinations (call protocol,
ALPN-handler proxy, other sub-protocols) with path-based routing; new
'ALPN-stream-proxy' section covering the WASM client side, auth model
(bearer token gates the session; protocol's own auth gates the
protocol session), and the HandlerRegistry reference.
README/overview ADR tables and OQ summaries updated for ADR-040.
Commits the concrete adapter shape deferred by ADR-033: read-sync /
write-async split with honker NOTIFY/LISTEN for no-restart cache
invalidation, against SQLite, in a separate alknet-store-sqlite crate.
Two constraints drive the design: (1) the hot-path read trait
(IdentityProvider::resolve_from_fingerprint, CredentialStore::get) is
sync — called in the accept loop, no .await — so a SQLite-backed
adapter must cache in memory and serve sync reads from the cache; (2)
auth changes must take effect without a restart (an early issue the
project already fixed for ConfigIdentityProvider via ArcSwap config
reload). honker's SQLite NOTIFY/LISTEN (single-digit-ms wake, no
polling) is the cache-invalidation mechanism that makes both hold:
write commits to SQLite + emits NOTIFY, the running process's LISTEN
wakes, the in-memory index reloads and atomically swaps, the next
read sees the new state. Same ArcSwap-reload pattern as config,
generalized from 'config file is source of truth' to 'SQLite is
source of truth, honker signals when it changed.'
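The read-sync / write-async split with notification-driven reload can be sketched with the standard library alone. This is a minimal model under stated substitutions: a Mutex<HashMap> stands in for the SQLite table, a std::sync::mpsc channel stands in for honker NOTIFY/LISTEN, and RwLock<Arc<_>> stands in for ArcSwap (readers take a cheap snapshot, reload swaps the whole index). The function names are hypothetical. The LISTEN side uses try_recv so the sketch is testable; a real listener would block on the notification in its own task.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{Receiver, Sender};
use std::sync::{Arc, Mutex, RwLock};

/// Stand-in for the SQLite table (the source of truth): fingerprint -> peer_id.
pub type Table = Arc<Mutex<HashMap<String, String>>>;

/// In-memory index served to the sync hot path.
pub struct CachedIdentities {
    index: RwLock<Arc<HashMap<String, String>>>,
}

impl CachedIdentities {
    pub fn load(table: &Table) -> Self {
        let snapshot = table.lock().unwrap().clone();
        Self { index: RwLock::new(Arc::new(snapshot)) }
    }

    /// Sync read: no .await, no database round-trip.
    pub fn resolve_from_fingerprint(&self, fp: &str) -> Option<String> {
        let snapshot = Arc::clone(&self.index.read().unwrap());
        snapshot.get(fp).cloned()
    }

    /// Rebuild the whole index from the source of truth and swap it in.
    pub fn reload(&self, table: &Table) {
        let fresh = Arc::new(table.lock().unwrap().clone());
        *self.index.write().unwrap() = fresh;
    }
}

/// Write path: commit to the source of truth, then NOTIFY (a channel send here).
pub fn commit_peer(table: &Table, notify: &Sender<()>, fp: &str, peer_id: &str) {
    table.lock().unwrap().insert(fp.to_string(), peer_id.to_string());
    let _ = notify.send(());
}

/// LISTEN side: drain pending notifications; reload once if any arrived.
pub fn drain_and_reload(cache: &CachedIdentities, table: &Table, listen: &Receiver<()>) -> bool {
    let mut woke = false;
    while listen.try_recv().is_ok() {
        woke = true;
    }
    if woke {
        cache.reload(table);
    }
    woke
}
```

Between the commit and the reload, the hot path keeps serving the previous snapshot; that window is what the single-digit-ms wake keeps small.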
New async IdentityStore write trait (put_peer / update_peer /
remove_peer) extends the sync IdentityProvider read trait for peer
mutations. ConfigIdentityProvider does NOT implement it (config
reload is its write path — a posture enforced by the absence of a
backend, not a type-system constraint); SqliteIdentityProvider
implements both. CredentialStore::put/delete refined to async (within
ADR-031's one-way door — the contract was get/put/delete keyed by
provider persisting EncryptedData never decrypting; sync-vs-async was
unspecified). CredentialStoreError renamed to shared StoreError
covering both traits.
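The trait split can be sketched with native async fn in traits (Rust 1.75 or later). Assumptions: the PeerEntry fields are trimmed to what the sketch needs, the StoreError variants are illustrative, and the semantics chosen here (put_peer rejects an existing peer_id with Duplicate, update_peer requires the entry to exist) are one way to resolve the upsert ambiguity, not necessarily the one the ADR settled. The small block_on exists only so the example runs without an async runtime.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PeerEntry {
    pub peer_id: String,
    pub fingerprints: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    Duplicate,
    NotFound,
}

/// Sync read side: called on the hot path, never awaits.
pub trait IdentityProvider {
    fn resolve_from_fingerprint(&self, fp: &str) -> Option<String>;
}

/// Async write side, layered on the read side.
pub trait IdentityStore: IdentityProvider {
    async fn put_peer(&self, entry: PeerEntry) -> Result<(), StoreError>;
    async fn update_peer(&self, entry: PeerEntry) -> Result<(), StoreError>;
    async fn remove_peer(&self, peer_id: &str) -> Result<(), StoreError>;
}

#[derive(Default)]
pub struct InMemoryIdentities {
    peers: Mutex<HashMap<String, PeerEntry>>,
}

impl IdentityProvider for InMemoryIdentities {
    fn resolve_from_fingerprint(&self, fp: &str) -> Option<String> {
        let peers = self.peers.lock().unwrap();
        peers
            .values()
            .find(|p| p.fingerprints.iter().any(|f| f == fp))
            .map(|p| p.peer_id.clone())
    }
}

impl IdentityStore for InMemoryIdentities {
    async fn put_peer(&self, entry: PeerEntry) -> Result<(), StoreError> {
        let mut peers = self.peers.lock().unwrap();
        if peers.contains_key(&entry.peer_id) {
            return Err(StoreError::Duplicate);
        }
        peers.insert(entry.peer_id.clone(), entry);
        Ok(())
    }

    async fn update_peer(&self, entry: PeerEntry) -> Result<(), StoreError> {
        let mut peers = self.peers.lock().unwrap();
        match peers.get_mut(&entry.peer_id) {
            Some(slot) => {
                *slot = entry;
                Ok(())
            }
            None => Err(StoreError::NotFound),
        }
    }

    async fn remove_peer(&self, peer_id: &str) -> Result<(), StoreError> {
        let mut peers = self.peers.lock().unwrap();
        peers.remove(peer_id).map(|_| ()).ok_or(StoreError::NotFound)
    }
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Test-only executor: spins until ready, fine because these futures never pend.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never touch the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```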
alknet-store-sqlite is one crate implementing both IdentityStore and
CredentialStore with shared SQLite connection + honker LISTEN infra
(splitting later is a two-way door). Schema shape committed (one row
per PeerEntry with JSON columns for fingerprints/scopes/resources;
one row per EncryptedData blob keyed by provider); exact DDL is an
implementation-detail two-way door in the adapter crate. The keypal
adapter-factory pattern is intentionally not ported to Rust (runtime
column-mapping is a TS affordance; in Rust each adapter is a concrete
type, cross-cutting concerns are a shared helper module).
Amends ADR-031 (put/delete async refinement, StoreError rename),
ADR-033 (concrete adapter shape now specified, two-crate framing
collapsed to one), ADR-034 (OQ-36 now resolved), auth.md (IdentityStore
section, cache-invalidation summary, OQ-36 reference), config.md (two
write paths note), and the OQ-36/OQ-34 entries in open-questions.md.
Review fixed 4 criticals (error-type name divergence, duplicate
IdentityProvider sketch, upsert/Duplicate ambiguity, 'shape unchanged'
contradiction), 7 warnings, 5 suggestions.
Untangles the conflation of three distinct remote roles under 'X.509
endpoint': (1) public X.509 endpoint — a remote HTTPS/call-over-TLS
server the local node is a client of (no PeerEntry, no PeerId, not in
the peer graph; CA verification + bearer token); (2) transport relay —
iroh's DERP-equivalent, infrastructure, not an alknet peer; (3) hub /
hosting node — an alknet peer that also exposes a public domain + X.509
for browsers (mixed-fingerprint PeerEntry, already supported by
ADR-030).
The load-bearing one-way door is the client-side verifier selection
rule: known peer (PeerEntry present) → fingerprint pin; unknown X.509
remote → CA verification (WebPkiServerVerifier); unknown Ed25519
remote → fails closed. This closes the AcceptAnyServerCertVerifier
security hole OQ-29 flagged, with the peer-model criterion (PeerEntry
presence) made explicit. The 'make PeerEntry symmetric' instinct is
rejected — pure-client connections to public APIs have no stable
logical identity to pin.
Documents that CallCredentials.remote_identity: None is load-bearing
(None = public X.509 endpoint → CA path, not a missing field; Some =
known peer → fingerprint pin), closing a subtle gap where an
implementer could have defaulted to a placeholder or treated None as
skip-verify.
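The client-side selection rule, including the load-bearing meaning of remote_identity: None, fits in one match. This is a sketch: PresentedKey, VerifierChoice, and select_verifier are hypothetical names, and remote_identity is modeled as the known peer's pinned fingerprints. In a rustls-based client the dispatch would plausibly live inside a custom server-certificate verifier that delegates the CA case to WebPkiServerVerifier; that wiring is not shown.

```rust
/// Hypothetical tag for what the remote presents during the handshake.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PresentedKey {
    Ed25519RawKey,
    X509,
}

/// Which server-certificate check to run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VerifierChoice {
    /// Known peer: the presented key must match a pinned fingerprint.
    FingerprintPin { allowed: Vec<String> },
    /// Unknown X.509 remote: ordinary CA (WebPKI) verification.
    WebPkiCa,
    /// Unknown raw Ed25519 key: nothing to verify against, so fail closed.
    Reject,
}

/// Some(fingerprints) = known peer (PeerEntry present);
/// None = public X.509 endpoint, never "skip verification".
pub fn select_verifier(remote_identity: Option<&[String]>, presented: PresentedKey) -> VerifierChoice {
    match (remote_identity, presented) {
        (Some(fps), _) => VerifierChoice::FingerprintPin { allowed: fps.to_vec() },
        (None, PresentedKey::X509) => VerifierChoice::WebPkiCa,
        (None, PresentedKey::Ed25519RawKey) => VerifierChoice::Reject,
    }
}
```

Note what the enum lacks: there is no accept-anything variant, so None can only ever select CA verification or a hard failure.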
Records WebTransport relay-as-proxy (deferred with h3/WebTransport,
new OQ-HTTP-07) and on-chain/smart-contract peer discovery (fits the
OQ-36 repo/adapter pattern, no auth-model change) so they aren't lost.
Amends auth.md and client-and-adapters.md with the three-role naming,
the verifier selection rule, and the Option semantics; updates OQ-37
to resolved in open-questions.md, README.md, and both crate READMEs.
ADR-009, open-questions.md, and the architect agent spec all had the same
conflation: 'two-way door' was phrased as 'can be decided during
implementation,' which reads as 'defer the decision.' That's not what it
means. A two-way door is a decision you make now and can revert later if
wrong — it's about reversal cost, not urgency.
ADR-009: add §'What this framework is NOT' — explicitly separates door
type (reversal cost) from deferral (scope management). State that
architecture decisions are the architect's regardless of door type.
Reword the two-way-door process from 'can be decided during
implementation' to 'pick the simplest option that works, implement it,
revert if needed.'
open-questions.md: reword the header to clarify door type describes
reversal cost, not urgency. Add 'Door type is separate from whether a
decision is made.'
architect.md: add Key Principle #8 (decisions are made, not deferred),
a new 'Door Types and Decision Urgency' section, and two new anti-patterns
(#8: door type as deferral, #9: hedging language in resolved decisions).
Amend ADR-030 with three changes from the auth-type analysis:
1. PeerEntry is now multi-credential: fingerprints: Vec<String> (Ed25519
and/or X.509) + auth_token_hash: Option<String> (bearer token). All
resolve to the same peer_id. A peer that authenticates via Ed25519
today and via auth_token tomorrow gets the same PeerId. The 'peer
bearer vs auth bearer' distinction was wrong — the correct framing is
the three credential types (Ed25519, X.509, bearer token) and whether
the token needs a stable logical id across rotation (PeerEntry) or not
(ApiKeyEntry).
2. Fingerprint normalization (§6): quinn extracts the raw Ed25519 public
key from the SPKI cert and formats as ed25519:<hex>, matching iroh.
The same key has the same fingerprint regardless of transport. X.509
fingerprints stay as SHA256:<hex of DER>. This also simplifies the
coming WebTransport relay work.
3. The 'API keys' section is replaced with 'Bearer tokens' — correctly
framing the three auth types and the two bearer-token paths
(PeerEntry.auth_token_hash vs ApiKeyEntry).
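A sketch of the multi-credential PeerEntry: several fingerprints and an optional token hash, all resolving to one peer_id. Assumptions: the lowercase-hex encoding in ed25519_fingerprint, the Credential enum, and resolve_peer are illustrative; token hashing is left to the caller (the sketch compares precomputed hashes), and a real implementation would compare token hashes in constant time. The SHA256:<hex of DER> X.509 form is not computed here because the Rust standard library has no SHA-256.

```rust
/// Format a raw 32-byte Ed25519 public key as ed25519:<hex>
/// (lowercase hex is an assumption of this sketch).
pub fn ed25519_fingerprint(raw_key: &[u8; 32]) -> String {
    let hex: String = raw_key.iter().map(|b| format!("{b:02x}")).collect();
    format!("ed25519:{hex}")
}

#[derive(Debug, Clone)]
pub struct PeerEntry {
    pub peer_id: String,
    /// ed25519:<hex> and/or SHA256:<hex of DER> entries.
    pub fingerprints: Vec<String>,
    /// Bearer token, stored hashed.
    pub auth_token_hash: Option<String>,
}

/// Hypothetical: what the connection authenticated with.
#[derive(Debug, Clone, Copy)]
pub enum Credential<'a> {
    Fingerprint(&'a str),
    TokenHash(&'a str),
}

/// Every credential on an entry resolves to the same stable peer_id.
pub fn resolve_peer<'a>(peers: &'a [PeerEntry], cred: Credential<'_>) -> Option<&'a str> {
    peers
        .iter()
        .find(|p| match cred {
            Credential::Fingerprint(fp) => p.fingerprints.iter().any(|f| f == fp),
            Credential::TokenHash(h) => p.auth_token_hash.as_deref() == Some(h),
        })
        .map(|p| p.peer_id.as_str())
}
```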
Resolve OQ-29 (CallClient TLS client-auth): wire quinn client-auth (present
Ed25519 key as raw public key client cert — the server-side extraction
already works); key-type-aware server cert verification (raw key =
fingerprint match, X.509 = CA verification via WebPkiServerVerifier —
AcceptAnyServerCertVerifier is only safe for raw keys); fingerprint
normalization. The iroh path already works (RFC 7250 raw keys, both sides
exchange automatically); the gap was quinn-only.
Dissolve OQ-35: the 'API key asymmetry' framing was wrong. PeerEntry
supports multiple credential paths; ApiKeyEntry is for tokens that ARE the
identity.
Add OQ-37: X.509 outgoing-only case — the three auth types and how X.509
server identity fits the peer model. Not blocking the ADR-029 migration;
downstream (HTTP crate phase).
Update auth.md, config.md, client-and-adapters.md, call/README.md,
core/README.md, open-questions.md, README.md, and call_client.rs source
comment.
Workspace green: 326 tests pass, build clean.
Resolve the call-crate open questions where the decision is made —
OQ-27 (auto-re-import), OQ-28 (same-peer collision = error), OQ-30
(PeerRef::Any insertion-order first-match), OQ-31 (services/list-peers
opt-in). These were previously marked 'open' with 'v1' hedging language
despite having a decided default. What remains (refresh(), richer routing,
services/list-peers the op) is genuine feature addition, not unmade
architecture.
Reframe OQ-32 (multi-hop) as a feature extension rather than a 'v1'
deferral — the one-hop model is the architectural commitment; extending
to multi-hop doesn't break downstream.
Promote OQ-29 (CallClient TLS client-auth) from medium to high priority
and surface its real interaction with ADR-030. Previously framed as
'additive — two-way-door remainder,' but ADR-030's PeerEntry fingerprint
→ peer_id resolution requires the client to present a TLS client cert.
With with_no_client_auth(), no fingerprint is extracted, the PeerEntry
path is dormant, and PeerCompositeEnv keys on None or the API-key prefix
instead of the stable peer_id. This is the activation path for ADR-030's
primary use case, not an additive feature. Three options laid out: (a)
wire client-auth with the ADR-029 migration, (b) ship token-only and
switch later (the 'compounds into a mess' path), (c) extend PeerEntry
to cover auth_token-based identity. Requires a decision before the
migration lands.
Clarify OQ-36 (concrete adapter shapes): the trait shapes and in-memory
adapters ship with core — the deferral is only for the persistence
adapters (SQLite, etc.). The in-memory adapters are real implementations
of a full repo pattern, not stubs.
Update call_client.rs source comment to reference OQ-29 instead of the
'v1' / 'two-way-door remainder' framing.
Workspace green: 326 tests pass, build clean.
Land the storage and auth strategy research (findings.md) as four
accepted ADRs and amend the core and call specs to match:
- ADR-030: PeerEntry and Identity.id decoupling. Replaces
authorized_fingerprints with peers: Vec<PeerEntry>; Identity.id becomes
the stable peer_id, decoupled from the rotating fingerprint. Supersedes
ADR-029 Assumption 1's UUID source (one-way door preserved, source
changes). Resolves OQ-33 and the storage-boundary half of OQ-34. Records
the API-key asymmetry as deliberate (OQ-35).
- ADR-031: CredentialStore repo trait + InMemoryCredentialStore default
adapter in core. Second repo trait alongside IdentityProvider. Vault
encrypts; the store persists the EncryptedData blob; assembly layer
loads into Capabilities. EncryptedData core mirror includes salt for
wire-format compat.
- ADR-032: Forwarded-for identity. forwarded_for field on call.requested
and OperationContext — metadata only, never read by AccessControl::check
(enforced structurally via the check signature). The from_call handler
populates it. Wire-format one-way door, folded into the ADR-029
migration window.
- ADR-033: Storage boundary and repo/adapter pattern. Core defines repo
traits + in-memory defaults; persistence adapters are separate crates;
assembly layer wires. Resolves OQ-34. Concrete adapter shapes deferred
for exploration (OQ-36).
Amends auth.md, config.md, operation-registry.md, client-and-adapters.md,
open-questions.md, README.md, crates/core/README.md. Marks ADR-029
Accepted (Assumption 1 carries the ADR-030 superseded note). Marks the
research findings doc reviewed.
Reworks the storage strategy doc to commit to a concrete design, replacing
the 'when storage arrives' / 'future' / 'later' framing that was putting off
important work.
Key changes from the previous draft:
- §4 (Repo/Adapter Pattern): now an explicit design with the trait contracts
(IdentityProvider, CredentialStore), the adapter contracts
(ConfigIdentityProvider with PeerEntry update, SqliteIdentityProvider,
InMemoryCredentialStore, SqliteCredentialStore), and the concrete table
schemas. Not a pattern description — a design commitment.
- §4: PeerEntry config model — AuthPolicy gains peers: Vec<PeerEntry>
replacing authorized_fingerprints: HashSet<String>. This is the
id-fingerprint decoupling (OQ-33) done as a config change, not a storage
change. ConfigIdentityProvider resolves fingerprint → PeerEntry →
Identity { id: peer_id } (stable, not the fingerprint).
- §7 (Decomposition): the 'what goes where' table now has a Status column
(exists / needs adding / needs building / needs PeerEntry update) instead
of 'future'. The crate graph is a concrete build plan.
- §10 (Build Order): replaces 'What This Means for the Immediate Path' (which
had 'when storage arrives' framing) with a 4-tier dependency-driven build
order. Tier 1 = core repo traits + PeerEntry config model. Tier 2 = SQLite
adapters. Tier 3 = ADR-029 migration + forwarded_for. Tier 4 = alknet-graphs
(built when a graph-shaped problem exists, not speculatively).
- §10: explicit 'What does NOT get built (dropped, not deferred)' section —
multi-tenant, accounts/orgs, secrets module, single storage crate are
dropped, not deferred.
- All 'future' / 'when X arrives' / 'v1' / 'phase n' language removed for
things that are needed. The only 'when X is needed' language remaining is
for genuinely non-existent problems (ACL delegation, workflows, taskgraph)
— those are built when the problem exists, not speculatively.
Synthesizes the multi-thread discussion that surfaced during the peer-graph
routing research (ADR-029) and OQ-33/34 resolution. Three separate threads
(peer identity, filesystem POC, old storage spec) converged on the same
question: where does persistent state live in the alknet crate graph, and
what's the shared infrastructure for it.
Key commitments documented:
- SQLite + honker is the foundation (pattern, not a crate — ~20 lines per
consumer). The metagraph is one tool built on it, for graph-shaped
problems. Direct tables are another tool, for table-shaped problems.
- IdentityProvider is the auth repo trait (already exists in core, make the
pattern explicit). Adapters implement it (Config, SQLite, future
Redis/remote/automerge). PeerStore is adapter-internal, not core.
- Per-node ACL, no 'trusted' flag. Each node authorizes its direct callers
via AccessControl::check(identity). No global ACL, no replication. The
hub authorizes the user; the spoke authorizes the hub. Same mechanism.
- Forwarded-for identity as metadata, not authority. The from_call handler
includes the original caller's identity in the call payload; the spoke's
ACL authorizes the hub (direct caller), never the forwarded_for. The ACL
check signature prevents misuse.
- The ACL check stays table-shaped (flat scope match); the delegation graph
(future) produces effective scopes at resolution time. They compose at the
IdentityProvider boundary.
- The hub proxy tangle: ACL (authorize), bucket routing (operation input),
peer routing (PeerRef) are three separate layers. Bucket-level
authorization is handler logic, not protocol logic.
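The structural guarantee that forwarded_for never reaches the ACL can be sketched by giving AccessControl::check only the direct caller's identity. The field name caller, the RequireScope policy, and authorize are hypothetical; only forwarded_for, OperationContext, and AccessControl::check are names from the text.

```rust
#[derive(Debug, Clone)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

/// Context a handler receives. forwarded_for is informational only.
pub struct OperationContext {
    /// The direct caller (for a spoke, this is the hub).
    pub caller: Identity,
    /// The original user, as claimed by the direct caller.
    pub forwarded_for: Option<Identity>,
}

pub trait AccessControl {
    /// Takes only the direct caller: there is no parameter through which
    /// forwarded_for could influence the decision.
    fn check(&self, identity: &Identity) -> bool;
}

/// Hypothetical policy: the caller must hold one named scope.
pub struct RequireScope(pub &'static str);

impl AccessControl for RequireScope {
    fn check(&self, identity: &Identity) -> bool {
        identity.scopes.iter().any(|s| s == self.0)
    }
}

/// Per-node authorization: each node checks its own direct caller.
pub fn authorize(acl: &dyn AccessControl, ctx: &OperationContext) -> bool {
    acl.check(&ctx.caller)
}
```

Because check has no parameter for the forwarded identity, a hub cannot escalate a spoke call by claiming a more privileged original caller; the spoke authorizes whoever actually connected.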
What the old spec had that's dropped: multi-tenant (each tenant gets own
setup), secrets module (replaced by vault), metagraph-as-foundation (demoted
to tool), single storage crate (split by concern), accounts/orgs (deferred —
v1 is a peers table).
Reference: keypal (/workspace/keypal), a TypeScript repo-pattern example
(Storage interface + adapters) that alknet's IdentityProvider follows.
OQ-26 (resolved): AdapterError variants decided — DiscoveryFailed,
SchemaParse, Transport, Unauthorized, SamePeerCollision (replaces flat
Conflict per ADR-029 §5). #[non_exhaustive] for downstream extension.
Two-way door; the initial set is the code's return type.
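A sketch of the decided variant set with #[non_exhaustive]. The payloads (plain String messages, the peer/operation pair on SamePeerCollision) are assumptions; the text fixes only the variant names. Outside the defining crate, #[non_exhaustive] forces a wildcard arm in every match, which is what lets new variants land without breaking downstream adapters.

```rust
use std::fmt;

/// Adapter failure modes: variant names from OQ-26, payloads assumed.
#[derive(Debug)]
#[non_exhaustive]
pub enum AdapterError {
    DiscoveryFailed(String),
    SchemaParse(String),
    Transport(String),
    Unauthorized,
    SamePeerCollision { peer: String, operation: String },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::DiscoveryFailed(msg) => write!(f, "discovery failed: {msg}"),
            AdapterError::SchemaParse(msg) => write!(f, "schema parse error: {msg}"),
            AdapterError::Transport(msg) => write!(f, "transport error: {msg}"),
            AdapterError::Unauthorized => write!(f, "unauthorized"),
            AdapterError::SamePeerCollision { peer, operation } => {
                write!(f, "operation {operation} imported twice from peer {peer}")
            }
        }
    }
}

impl std::error::Error for AdapterError {}
```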
OQ-33 (resolved): PeerId is a logical identifier, NOT Identity.id. The
research's v1 default (PeerId = fingerprint) is overridden: coupling PeerId
to crypto material breaks every in-flight PeerRef::Specific and every ACL
entry on key rotation. v1 source is a connection-assigned UUID — a
no-storage workaround that works for the immediate use case (head→workers,
reconnect produces fresh PeerRef, in-flight gets NOT_FOUND which is correct).
The one-way door: PeerId is logical, not crypto — this determines
PeerCompositeEnv key type and PeerRef::Specific payload. The id source
(UUID vs configured name vs peer registry) is the two-way-door remainder.
OQ-34 (new): the storage dimension OQ-33 surfaced. The core crates are
deliberately DB-free (smaller, fewer deps, simpler testing) — this served
local-only state (vault, registry) well, but peer identity is the first
cross-node state that wants persistence. The real solution (a persistent
peer registry mapping stable logical name → current crypto material,
surviving key rotation) is not a v1 blocker (UUID works), but tracked so the
no-DB posture's limit is deliberate, not accidental. The storage boundary
(core gets a PeerRegistry trait vs stays storage-free) is the one-way door;
the backend choice is two-way. Key-rotation/ACL note: decoupling PeerId from
crypto keeps the door open for ACL entries that persist across key rotation
— when the peer registry is built, ACLs key on the logical name and key
rotation becomes vault-only with no remote-side ACL update.
ADR-028's remote_safe/trusted_peer was a parallel, weaker authorization system
that duplicated the existing AccessControl/Identity machinery and couldn't
express the head→N-workers pattern (the primary use case). The flat-namespace
single-peer overlay model (one connection layer in CompositeOperationEnv)
structurally breaks the moment a head has two workers both exposing
/container/exec.
ADR-029 replaces it with:
- Peer-keyed overlays: PeerCompositeEnv { connections: HashMap<PeerId, ...> }
replaces CompositeOperationEnv's singular connection layer. A head node
routes invoke_peer() to the right peer via PeerRef::Specific / PeerRef::Any.
- AccessControl-based peer authorization: the existing AccessControl::check
(peer_identity) gates peer calls — the same mechanism that gates every other
call. remote_safe/trusted_peer/RemoteFilter/list_operations_peer_scoped/
services_list_handler_peer_scoped are retired. The op's AccessControl IS the
peer-authorization policy; no parallel system.
- ScopedPeerEnv: peer-qualified reachability (peer-pinned allowlist) replaces
from_call's namespace_prefix as the disambiguation mechanism. Cross-peer
collision dissolves (separate sub-overlays); same-peer collision stays error.
- services/list-peers opt-in for peer-attributed re-export listing.
POC-validated against real types (scratch module written, type-checked,
removed; build clean, 207 tests pass). Petgraph not needed for v1 (one-hop,
shallow); nested HashMap suffices; extends to multi-hop without redesign (OQ-32).
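A sketch of peer-keyed overlays with PeerRef routing and the two collision rules. Assumptions: each peer's sub-overlay is reduced to a list of imported operation names, RouteError and the import/route methods are hypothetical, and an order vector sits beside the map because std HashMap iteration order is unspecified while the later resolution of OQ-30 for PeerRef::Any is insertion-order first-match.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PeerId(pub String);

pub enum PeerRef {
    Specific(PeerId),
    Any,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    NotFound,
    SamePeerCollision,
}

/// Peer-keyed overlays: one sub-overlay per peer, reduced here to op names.
#[derive(Default)]
pub struct PeerCompositeEnv {
    connections: HashMap<PeerId, Vec<String>>,
    /// Insertion order of peers, used for PeerRef::Any.
    order: Vec<PeerId>,
}

impl PeerCompositeEnv {
    /// Import one operation from one peer. The same name from another peer
    /// lands in a separate sub-overlay; from the same peer it is an error.
    pub fn import(&mut self, peer: PeerId, op: &str) -> Result<(), RouteError> {
        if !self.connections.contains_key(&peer) {
            self.order.push(peer.clone());
        }
        let ops = self.connections.entry(peer).or_default();
        if ops.iter().any(|o| o == op) {
            return Err(RouteError::SamePeerCollision);
        }
        ops.push(op.to_string());
        Ok(())
    }

    /// Pick the peer that should serve op for this PeerRef.
    pub fn route(&self, target: &PeerRef, op: &str) -> Result<PeerId, RouteError> {
        let serves = |p: &PeerId| {
            self.connections
                .get(p)
                .is_some_and(|ops| ops.iter().any(|o| o == op))
        };
        match target {
            PeerRef::Specific(p) if serves(p) => Ok(p.clone()),
            PeerRef::Specific(_) => Err(RouteError::NotFound),
            // First peer, in insertion order, that exposes the op.
            PeerRef::Any => self
                .order
                .iter()
                .find(|&p| serves(p))
                .cloned()
                .ok_or(RouteError::NotFound),
        }
    }
}
```

Importing /container/exec from two workers succeeds because each lands in its own entry; importing it twice from the same worker is the same-peer collision that stays an error.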
OQ impact: OQ-25 dissolved (no marking); OQ-28 cross-peer dissolved / same-peer
stays; OQ-26/27/29 stay; new OQ-30 (Any routing policy), OQ-31 (list-peers
semantics), OQ-32 (multi-hop federation).
Research: docs/research/alknet-call-peer-routing/findings.md (POC shapes,
prior art — Ray.io actors, Dapr service invocation, full ADR draft).
ADR-028 marked Superseded; ADR-017 DC-1 amendment updated to point at ADR-029.
Post-implementation spec sync after the call-completion batch landed
(commits e4a2594..a3825f5). The sub-agent review flagged no spec drift, but
comparing the implemented types against the spec sketches surfaced five
details the specs didn't name — filled in here so the spec matches what was
built:
- client-and-adapters.md: name the shared Dispatcher (protocol/dispatch.rs)
+ RemoteFilter mechanism that enforces ADR-028's default-deny at dispatch
time (the load-bearing security gate — checks remote_safe before building
context, before any capability material reaches the handler). Add
ClientError/RemoteIdentity types, the spawn_dispatch lower-level API, and
the services_list_handler_peer_scoped wiring (the assembly layer must
register the peer-scoped services/list handler for a CallClient's registry,
not the plain one). Record the v1 TLS client-auth gap (AcceptAnyServerCertVerifier,
with_no_client_auth) as OQ-29.
- call-protocol.md: point the adapter dispatch-loop description at the shared
Dispatcher (dispatch.rs) so readers find the mechanism ADR-017 §1 commits to.
- open-questions.md: OQ-29 — CallClient TLS client-auth + remote-identity
verification is a two-way-door remainder; the no-env-vars invariant is
unaffected (auth_token flows via call-protocol payload, not TLS).
- READMEs: current-state now reflects completion done + reviewed (207 lib +
2 integration tests); OQ-29 added to both OQ summaries.
Resolves the four gap-analysis decisions (DC-1..4) blocking the alknet-call
client/adapter surface specced in ADR-017:
- ADR-028 (new): locks the one-way door for DC-1 — CallClient registry is
default-deny (remote_safe: bool on HandlerRegistration, default false across
all provenance); share-global is an explicit trusted-peer opt-in; filtering
is a dispatch-time read over the single Layer-0 registry, not a copy.
- client-and-adapters.md (new spec): operationally fills the gap ADR-017 left
to implementation — CallClient, from_call, from_jsonschema, OperationAdapter
trait, adapter location map, no-env-vars invariant, exchange-of-operations
pattern. Keeps call-protocol.md and operation-registry.md under the
700-line split threshold.
- ADR-017 amended: records DC-2/3/4 v1 defaults (auto-on-reconnect,
error-on-collision, Result error type) and points DC-1 at ADR-028.
- OQ-25..28 (new): two-way-door remainders (remote_safe shape, AdapterError
variants, re-import trigger, namespace collision) with v1 defaults recorded.
- Index/cross-ref updates across READMEs and the two existing call specs.
Tasks: 6 task files under tasks/call/ decomposing the completion work along
the gap-analysis priority order — remote-safe-marking (one-way door, first)
→ call-client (phase-risk) → from-call → operation-adapter-trait →
from-jsonschema (parallel with call-client) → review-completion. Graph
validated with taskgraph; parallelism designed in (from-jsonschema runs
concurrent with call-client/from-call once the trait lands).
Phase 0 exploration for alknet-http (greenfield crate, no existing arch):
HTTP server (axum, ProtocolHandler for h2/http1.1, h3 deferred), HTTP client
(reqwest, the from_openapi/from_mcp forwarding handlers), MCP streamable HTTP
(feature-gated, stdio excluded as security position), to_openapi/to_mcp
projections.
Records: 8 design points (DH-3 HTTP→call operation mapping as the load-bearing
one), the settled adapter location map (from alknet-call gap analysis), the
no-env-vars invariant (Capabilities → from_openapi handler → HTTP header as the
credential injection point), and the prerequisite on alknet-call's
OperationAdapter trait being defined first.
Gap analysis for completing alknet-call: the server-side core (~5.7k lines,
159 tests) is implemented, but the client side (CallClient), the bilateral
exchange mechanism (from_call), and the adapter contract (OperationAdapter
trait) are specced in ADR-017 and unimplemented.
Records: implementation state (verified against src/), 5 decisions needed
(peer-scoped registry filtering as the load-bearing one), the settled adapter
location map (trait + from_call + from_jsonschema in alknet-call; from_openapi/
from_mcp in alknet-http), the no-env-vars invariant (Capabilities → from_openapi
handler → HTTP header), and the exchange-of-operations runner pattern with
dispatch as the concrete downstream consumer.
Incorporates user clarifications: SOCKS5 and bidirectional port forwarding are
core non-negotiable v1 features (the VPN-like use case + the 3.5k-clones
demand). Adds DP-10 for the bare-TCP SSH listener as a first-class path needed
for future git-over-SSH, with config shape reserved in v1 (off-by-default,
default-deny). Grounds the client/forwarding recommendations in the dispatch
downstream consumer at /workspace/@alkdev/dispatch, which is a textbook russh
SSH client + direct-tcpip forwarder the user wants to replace with this stack.
alknet-ssh now owns both server and client + SOCKS5-server in v1; the SOCKS5
codec may extract to a separate crate later (two-way door).
Phase 0 exploration for alknet-ssh: confirms SSH-over-QUIC-bistream via
tokio::io::join (no custom adapter needed, per reference impl), russh 0.60.2
generic run_stream/connect_stream, and channel-into-bistream multiplexing.
Surfaces 9 decision points for Phase 1: host key sourcing (vault-derived vs
config), channel policy v1 surface, client + SOCKS5 crate split, crypto
backend, auth method coverage, and a stream-handling POC to close russh's
upstream test gap.
First dedicated coverage pass (cargo-llvm-cov --workspace --all-features).
Workspace at 87.1% line coverage (5759/6615), all 224 tests pass. Vault
and registry layers are essentially fully covered; gaps concentrate in
endpoint.rs (56%), types.rs (57%), and connection.rs (54%), all stemming
from tests using MockConnection whose open_bi/accept_bi return Err.
Eight suggestions (S1-S8) ordered by leverage: pure-function tests for
dispatch_envelope / map_*_connection_error / error Display+Debug (S1-S3),
Tier A directly-callable TLS/rustls helpers in endpoint.rs (S4), one
loopback quinn integration test as the real unlock across four files (S5),
ACME event-loop extraction via synthetic stream (S6, the flagged research
item), and two small remaining gaps (S7-S8). No critical or warning
findings — this is a testing-infrastructure gap, not a logic gap.
Three tasks implementing ADR-027:
1. core/rawkey-decouple-from-iroh: TlsIdentity::RawKey now uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. RawKeyCertResolver and Ed25519SigningKey
un-gated from #[cfg(all(quinn, iroh))] to #[cfg(quinn)] only.
Quinn-only builds (default) now support RFC 7250 raw-key identity.
iroh transport converts via iroh::SecretKey::from_bytes.
2. core/endpoint-request-client-cert: replaced with_no_client_auth()
with AcceptAnyCertVerifier — a custom ClientCertVerifier that
requests client certs but doesn't require them or verify against
a CA. alknet's identity model is fingerprint-based (the
authorized_fingerprints set is the trust anchor), not PKI-based.
Peer certs are extracted at the TLS layer for fingerprinting;
peers without certs connect normally.
3. core/acme-integration: TlsIdentity::Acme variant (domains,
cache_dir, directory, contact) + AcmeDirectory enum. TlsSetup
two-phase construction: synchronous for X509/RawKey/SelfSigned,
async for Acme (spawns AcmeState event loop, builds ServerConfig
with ResolvesServerCertAcme). acme-tls/1 ALPN added when ACME is
active; dispatch_quinn guard closes challenge connections
gracefully (challenge is TLS-layer-handled). acme feature gate
keeps rustls-acme out of non-ACME builds.
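The two-phase split can be sketched as a pure classification over the static config. This sketch assumes several names that do not appear above: the `X509`/`RawKey` field names, the `AcmeDirectory` variants, `SetupPhase`, and `alpn_protocols`. The real TlsSetup also builds the rustls ServerConfig, which is omitted here. The enum stays plain Clone-able data, and only the `Acme` arm needs the async phase. `acme-tls/1` is advertised only when that arm is active.

```rust
use std::path::PathBuf;

/// Hypothetical directory variants; the real AcmeDirectory may differ.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum AcmeDirectory {
    LetsEncrypt,
    LetsEncryptStaging,
    Custom(String),
}

/// Static, Clone-able config only: no live ACME state lives in the enum.
/// (No Debug derive, so key material can't leak through {:?}.)
#[derive(Clone)]
pub enum TlsIdentity {
    X509 { cert_pem: String, key_pem: String },
    RawKey { secret: [u8; 32] },
    SelfSigned,
    Acme { domains: Vec<String>, cache_dir: PathBuf, directory: AcmeDirectory, contact: Vec<String> },
}

/// Which construction phase an identity needs.
#[derive(Debug, PartialEq, Eq)]
pub enum SetupPhase {
    /// ServerConfig can be built immediately.
    Sync,
    /// An AcmeState event loop must be spawned first.
    Async,
}

/// ALPN identifier for TLS-ALPN-01 challenge connections.
pub const ACME_TLS_ALPN: &str = "acme-tls/1";

impl TlsIdentity {
    pub fn setup_phase(&self) -> SetupPhase {
        match self {
            TlsIdentity::Acme { .. } => SetupPhase::Async,
            _ => SetupPhase::Sync,
        }
    }

    /// Application ALPNs, plus acme-tls/1 only when ACME is active.
    pub fn alpn_protocols(&self, app: &[&str]) -> Vec<String> {
        let mut alpns: Vec<String> = app.iter().map(|s| s.to_string()).collect();
        if self.setup_phase() == SetupPhase::Async {
            alpns.push(ACME_TLS_ALPN.to_string());
        }
        alpns
    }
}
```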
Workspace: build/test/clippy green across all 4 feature configs
(quinn-only, quinn+iroh, quinn+acme, all-features). 331 tests, 0
failures, 0 warnings.
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:
1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
directory, contact) with async AcmeState constructed at endpoint
setup via two-phase TlsSetup (not stuffed into the Clone-able enum).
2. TlsIdentity::RawKey decoupled from the iroh feature — uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
default for most alknet nodes) now works in quinn-only builds.
iroh transport converts via SecretKey::from_bytes.
3. ACME feature-gated behind new acme feature (rustls-acme optional
dep). Non-ACME builds don't compile it.
4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
is handled at the rustls cert resolver layer during the handshake;
the guard closes challenge connections gracefully instead of logging
a misleading "no handler" warning.
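The dispatch_quinn guard amounts to one extra arm before the handler lookup. This is a minimal sketch that assumes the negotiated ALPN arrives as raw bytes. The `Route` enum and `route` function are hypothetical, not dispatch_quinn's real signature. The challenge was already answered during the handshake, so the guard only needs to close quietly instead of falling through to the "no handler" warning.

```rust
/// Outcome of routing an accepted connection by its negotiated ALPN.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// A registered protocol handler owns this ALPN.
    Handler(String),
    /// TLS-ALPN-01 challenge, answered by the cert resolver: close gracefully.
    CloseAcmeChallenge,
    /// A genuinely unknown ALPN: the only case that deserves a warning.
    NoHandler,
}

pub fn route(alpn: &[u8], registered: &[&str]) -> Route {
    // Checked before the handler lookup so challenge connections never
    // reach the "no handler" path.
    if alpn == b"acme-tls/1" {
        return Route::CloseAcmeChallenge;
    }
    match registered.iter().find(|h| h.as_bytes() == alpn) {
        Some(h) => Route::Handler(h.to_string()),
        None => Route::NoHandler,
    }
}
```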
Research confirmed that QUIC (quinn) handles ACME challenges differently
from a TCP reverse proxy: quinn gives no hook to peek at the ClientHello,
but the challenge is fully answered at the cert resolution step, before the
connection surfaces to the application. No handler registration is needed.
Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).
Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into
CallAdapter handle_stream for EVENT_ABORTED. Cascades with
AbortPolicy::AbortDependents, aborts root, no descendant frames on
wire (ADR-016 Decision 2). Two integration tests added.
W2 (core/endpoint-client-fingerprint): extract TLS client cert
fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via
peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId).
Fingerprint format documented in auth.md. Server config change
(with_no_client_auth → request-but-don't-require) deferred to new
follow-up task core/endpoint-request-client-cert.
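The two fingerprint strings can be sketched as a prefix plus hex. This is a sketch only. It assumes lowercase hex with no separators, which is not confirmed here because auth.md holds the authoritative format. It also assumes the caller has already computed the SHA-256 digest of the leaf cert DER, because SHA-256 is not in std. The helper names `quinn_fingerprint` and `iroh_fingerprint` are hypothetical.

```rust
use std::fmt::Write;

/// Lowercase hex, two digits per byte (assumed encoding).
fn to_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        write!(s, "{:02x}", b).expect("writing to a String cannot fail");
    }
    s
}

/// Quinn path: caller supplies SHA-256 of the leaf certificate DER.
pub fn quinn_fingerprint(sha256_of_leaf_der: &[u8; 32]) -> String {
    format!("SHA256:{}", to_hex(sha256_of_leaf_der))
}

/// Iroh path: the peer NodeId is itself an Ed25519 public key.
pub fn iroh_fingerprint(node_id: &[u8; 32]) -> String {
    format!("ed25519:{}", to_hex(node_id))
}
```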
W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug)
with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed
no Debug impl. Redaction test added.
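A manual redacting Debug impl looks roughly like this. The single `phrase` field is a hypothetical simplification of the real Mnemonic. The point is that `{:?}` output, for example from tracing or a panic message, can never contain the words.

```rust
use std::fmt;

/// Hypothetical shape; the real Mnemonic's fields are not shown in this log.
pub struct Mnemonic {
    phrase: String,
}

impl Mnemonic {
    pub fn new(phrase: &str) -> Self {
        Mnemonic { phrase: phrase.to_string() }
    }

    pub fn phrase(&self) -> &str {
        &self.phrase
    }
}

// Manual impl in place of #[derive(Debug)]: the phrase never reaches logs.
impl fmt::Debug for Mnemonic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Mnemonic").field("phrase", &"[REDACTED]").finish()
    }
}
```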
W4 (core/auth-apikey-resources): Option B — drop entry.resources from
spec. External identities (token/fingerprint) grant scopes only;
resource-scoped ACLs are composition-internal (ADR-015/022). auth.md
corrected + limitation documented. Two tests confirm empty resources.
review-post-impl-fixes: all 4 verified, workspace green (326 tests,
0 failures, 0 clippy warnings). Review #004 status → resolved.
Graph: 34 tasks, 12 gens.
The vault spec-to-implementation sync is complete. Remove the drift
tracking tools that were only needed during sync:
- Remove the Known Source Drift table from vault/README.md
- Remove 'known drift' / 'current source uses X' prose from Security
Constraints sections in vault/README.md, encryption.md, and service.md.
The permanent constraint statements (OsRng for IVs, zeroized drop,
no unwrap, etc.) are preserved.
- Remove the drift paragraph in encryption.md Key Versioning.
- Remove stale 'to be updated per ADR-025' / 'postcard tests to be
removed' notes in protocol.md References.
- Bump status: draft -> stable in the frontmatter of all vault docs
(README, mnemonic-derivation, encryption, service, protocol).
- Update architecture/README.md: vault doc status entries to stable,
Current State paragraph reflects vault implementation complete (no
'pending ADR-025/026 refactor' language).
Review #003 found 11 critical, 14 warning, and 6 suggestion findings
after reviews #001 (governance/security) and #002 (cross-document
consistency/two-way-door audit) were resolved. The theme: types and
APIs that were *referenced* but never *defined*, and stale ADR sketches
that didn't match the now-updated spec docs.
Critical fixes (11):
- C1: DerivedKey #[derive(Deserialize)] contradicted the custom
Deserialize that rejects "[REDACTED]" — dropped the derive, added
explicit manual Serialize/Deserialize impls (protocol.md).
- C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the
signature takes key_version — updated to encryption_path_for_version
(service.md).
- C3: derive_encryption_key returned DerivedKey, while
derive_encryption_key_for_version returned EncryptionKey (same cache);
unified on DerivedKey, defined CachedKey (service.md).
- C4: tokio vs std::sync::RwLock contradiction — specified
std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025,
service.md).
- C5: Missing drift rows in vault README — added #9 (key_version
ignored) and #10 (rotate not implemented).
- C6: ADR-022 build_root_context and invoke() sketches omitted
abort_policy (9 fields vs 10) — added the field to both sketches.
- C7: Capabilities type referenced 20+ times, never defined — added
struct definition to core-types.md with Clone+Send+Sync, Zeroize,
sealed builder API, immutability guard.
- C8: SessionOverlaySource on CallAdapter but never defined, crate
violation (alknet-call can't depend on alknet-agent) — defined the
trait in alknet-call (call-protocol.md), matching the IdentityProvider
pattern.
- C9: CompositeOperationEnv dispatch fall-through was "a two-way door"
— added contains() to OperationEnv trait, made the composite probe
before dispatching, eliminating the sentinel ambiguity.
- C10: No API for Layer 2 (connection overlay) registration, CallConnection
undefined — defined CallConnection struct + register_imported() API
(call-protocol.md).
- C11: with_local signature diverged between two examples (4 args vs 5)
— added capabilities as the 5th arg, made both examples consistent.
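A builder-only, immutable Capabilities can be sketched as below. The sketch assumes capabilities are named byte blobs and that "sealed" means private fields with construction only through `build()`. The real sealing mechanism may differ. Zeroize is omitted because it is an external crate. With no `&mut` methods and no interior mutability, sharing one `Arc` on clone is indistinguishable from a deep copy, which is what keeps that choice two-way.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Immutable after build(): private field, no &mut methods, no Cell/Mutex.
/// Clone shares the Arc; since nothing can mutate, that equals a deep copy.
/// (No Debug derive, so secrets can't leak through {:?}.)
#[derive(Clone)]
pub struct Capabilities {
    inner: Arc<HashMap<String, Vec<u8>>>,
}

impl Capabilities {
    pub fn builder() -> CapabilitiesBuilder {
        CapabilitiesBuilder { entries: HashMap::new() }
    }

    pub fn get(&self, name: &str) -> Option<&[u8]> {
        self.inner.get(name).map(|v| v.as_slice())
    }
}

/// The only construction path; consumed by build().
pub struct CapabilitiesBuilder {
    entries: HashMap<String, Vec<u8>>,
}

impl CapabilitiesBuilder {
    pub fn with(mut self, name: &str, secret: &[u8]) -> Self {
        self.entries.insert(name.to_string(), secret.to_vec());
        self
    }

    pub fn build(self) -> Capabilities {
        Capabilities { inner: Arc::new(self.entries) }
    }
}
```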
Warning fixes (14):
- W1: invoke_with_policy restructured as required method, invoke gets a
default impl delegating to it — eliminates duplication across impls.
- W2: CachedKey defined (service.md).
- W3: EncryptionKey constructor/glue specified, added to re-export list.
- W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown.
- W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2).
- W6: Wire payload schemas for all event types + ResponseEnvelope →
EventEnvelope conversion table (call-protocol.md).
- W7: Timeout section — deadline on OperationContext, composed calls
inherit parent's deadline, CallAdapter::with_timeout().
- W8: Request ID generation spec — UUID v4 for composed calls, wire ID
vs internal ID relationship for abort cascade.
- W9: unlock_new already-unlocked behavior specified (returns
AlreadyUnlocked).
- W10: KeyType Serialize/Deserialize justification corrected (stale
irpc reference removed).
- W11: OperationProvenance and CompositionAuthority defined inline in
operation-registry.md (were only in ADR-022).
- W12: encrypt/decrypt free functions marked pub(crate), relationship
to VaultServiceHandle methods stated.
- W13: rotate signature removed from encryption.md (it's a
VaultServiceHandle method, not a free function).
- W14: CallAdapter::new() + with_session_source() + with_timeout()
constructors shown.
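The W1 trait shape, combined with the W19 inheritance rule recorded further down, reduces to a required method plus a provided one. This is a synchronous sketch. The real `OperationEnv` is async and returns a Result. `OperationContext` is reduced here to its `abort_policy` field. Each impl writes dispatch once in `invoke_with_policy`. The default `invoke` passes the parent's policy through, so `ContinueRunning` propagates to grandchildren unless a composing handler overrides it.

```rust
/// ADR-016 abort policies; Copy so a child can inherit the value cheaply.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Stand-in for OperationContext: only the field this sketch needs.
pub struct OperationContext {
    pub abort_policy: AbortPolicy,
}

pub trait OperationEnv {
    /// Required: each impl's dispatch logic lives here, exactly once.
    fn invoke_with_policy(&self, parent: &OperationContext, op: &str, policy: AbortPolicy) -> String;

    /// Provided: no explicit policy means inherit the parent's (W19 option a).
    fn invoke(&self, parent: &OperationContext, op: &str) -> String {
        self.invoke_with_policy(parent, op, parent.abort_policy)
    }
}
```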
Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant,
ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018
stale actor text, derivation helpers re-export note.
Add ADR-026 (vault key model — HD derivation) recording the foundational
HD-derivation decision, 74' coin type reservation, SLIP-0010/Ed25519
default, secp256k1 feature-gating, and AES-256-GCM cipher choice. These
were previously inline rationale with no ADR (W9).
Extend ADR-018 with an explicit EncryptedData wire format lock — fields,
encoding, and semantics are frozen; no removal without a format-version
migration (W10).
Resolve the remaining guard clauses and spec decisions:
- W2: Capabilities must be immutable after construction (no interior
mutability). Makes the Arc vs deep-copy clone semantics genuinely
two-way.
- W5: Published to_* specs are compatibility contracts — best-effort
mappings are two-way before first publication, one-way after. Version
generated specs.
- W6: Salt field clarification — v2 salt is permanently unused; a future
KDF is a different derivation family, not a version-indexed path; the
field saves a wire-format change only.
- W7: unlock_new returns Zeroizing<String> — the mnemonic is the root of
trust and must not linger in freed memory.
- W17: OQ-09 WASM — server-side dispatch door is honestly closed
(Connection is concrete, tokio-bound), not implicitly preserved.
- W18: OQ-10 git — composability fork (raw smart protocol vs call-protocol
projection) is a separate decision from ERC721 scope.
- W20: from_openapi must prefix imported error codes (HTTP_404) to avoid
collision with protocol-level codes (NOT_FOUND). Normative rule, not
naming convention.
- W21: ScopedOperationEnv field is private — construction via new()/
empty(), query via allows(). Makes the future subgraph refactor
non-breaking.
- C13: Connection::set_identity — the endpoint does not read identity()
after handle() returns (Connection is moved into the spawned task).
Observability is handler-side logging. Simplest honest answer.
- W1: OperationAdapter trait is async, returns Vec<HandlerRegistration>.
from_call requires async discovery; ADR-022 changed the return type.
- W11: CompositionAuthority::as_identity() defined — constructs a
synthetic Identity (label as id, scopes, resources) not resolvable via
IdentityProvider. Second Identity construction path, acknowledged.
- W14: SecretKey is iroh::SecretKey (Ed25519) — consistent with the
endpoint's iroh dependency.
- W19: Grandchild abort propagation is inherit-by-default (option a) —
invoke() with no explicit policy inherits parent's policy. ContinueRunning
auto-propagates to grandchildren unless explicitly overridden.
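The W21 encapsulation can be sketched with the operation-level `HashSet<String>` that ADR-022 settled on. Because the field is private, callers only ever see `new()`, `empty()` and `allows()`. A later switch to a subgraph representation would therefore change nothing outside the type. The generic bounds on `new()` are an assumption made for convenience.

```rust
use std::collections::HashSet;

/// Reachability at operation granularity (ADR-022). The field is private,
/// so replacing HashSet with a subgraph later is non-breaking.
#[derive(Clone, Debug, Default)]
pub struct ScopedOperationEnv {
    allowed: HashSet<String>,
}

impl ScopedOperationEnv {
    pub fn new<I, S>(ops: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        ScopedOperationEnv { allowed: ops.into_iter().map(Into::into).collect() }
    }

    /// Reaches nothing: the default for leaves.
    pub fn empty() -> Self {
        Self::default()
    }

    pub fn allows(&self, op: &str) -> bool {
        self.allowed.contains(op)
    }
}
```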
The password-manager pattern (deterministic per-site passwords from HD
derivation) is not relevant to an RPC system's vault. Handlers call APIs
(using API keys, OAuth tokens, mTLS), not websites with passwords. The
vault is for cryptographic key derivation and credential encryption.
Removes:
- derive_password, derive_password_string from service.md
- site_password_path from mnemonic-derivation.md
- m/74'/1'/0'/{hash}' path from PATHS module and path semantics table
- derive_password row from the cache table
Resolves review #002 C9 (site_password_path hash mapping underspecified)
by removing the feature rather than specifying the non-standard
string→u32 mapping and Ed25519-as-password-entropy construction.
If deterministic password generation is ever needed (browser-automation
edge case), it can be re-added — the cost is near-zero. Removing it now
eliminates permanent API surface inherited from a prior project's
password-manager pattern.
Drops irpc from alknet-vault entirely. The vault's dispatch is now direct
method calls on VaultServiceHandle — no VaultProtocol enum, no
VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no
RemoteService trait, no postcard serialization. The vault is local-only by
construction.
The core security argument: irpc made the vault remote-capable by default
(RemoteService generated unless no_rpc is passed). The IrohProtocol handler
forwards all messages without auth. The docs framed 'register an ALPN' as a
server-setup change. This is the default-insecure anti-pattern — security
should be opt-in, not opt-out. ADR-025 inverts the default: local-only is
the only mode, and remote access requires building a separate vault-server
crate (a visible architectural act, not a flag flip).
The actor path was already dead code — service.md said 'prefer
VaultServiceHandle directly — no channel, no serialization.' The actor
existed only to make irpc's Service trait work, which existed only to make
RemoteService work, which was the footgun. VaultServiceHandle's
Arc<RwLock> provides concurrent reads and exclusive writes — better
throughput than the actor's sequential processing.
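Direct method calls over `Arc<std::sync::RwLock<_>>` can be sketched as follows. `VaultServiceInner`'s fields and the `unlock`/`cache_key`/`cached_key` methods are hypothetical simplifications. Lock poisoning is surfaced as an error rather than unwrapped, following the vault's no-unwrap constraint. Readers take the shared lock and writers take the exclusive one, and there is no channel, message enum or wire format anywhere.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical inner state; the real inner type holds the seed and key cache.
struct VaultServiceInner {
    unlocked: bool,
    cache: HashMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum VaultError {
    Locked,
    Poisoned,
}

/// Local-only by construction: plain methods, no channel, no serialization.
#[derive(Clone)]
pub struct VaultServiceHandle {
    inner: Arc<RwLock<VaultServiceInner>>,
}

impl VaultServiceHandle {
    pub fn new() -> Self {
        let inner = VaultServiceInner { unlocked: false, cache: HashMap::new() };
        VaultServiceHandle { inner: Arc::new(RwLock::new(inner)) }
    }

    /// Writer: exclusive lock.
    pub fn unlock(&self) -> Result<(), VaultError> {
        let mut inner = self.inner.write().map_err(|_| VaultError::Poisoned)?;
        inner.unlocked = true;
        Ok(())
    }

    /// Writer: exclusive lock; refuses while locked.
    pub fn cache_key(&self, path: &str, key: &[u8]) -> Result<(), VaultError> {
        let mut inner = self.inner.write().map_err(|_| VaultError::Poisoned)?;
        if !inner.unlocked {
            return Err(VaultError::Locked);
        }
        inner.cache.insert(path.to_string(), key.to_vec());
        Ok(())
    }

    /// Reader: shared lock, so concurrent lookups don't serialize.
    pub fn cached_key(&self, path: &str) -> Result<Option<Vec<u8>>, VaultError> {
        let inner = self.inner.read().map_err(|_| VaultError::Poisoned)?;
        if !inner.unlocked {
            return Err(VaultError::Locked);
        }
        Ok(inner.cache.get(path).cloned())
    }
}
```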
DerivedKey serialization simplifies: always redact on serialize (for
logging safety), reject '[REDACTED]' on deserialize with an error. No
'postcard preserves bytes' path. This resolves review #002 W8 (silent
corruption on JSON-deserialized DerivedKey).
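The redact-on-serialize, reject-on-deserialize contract can be sketched without serde. In the spec it lives in manual Serialize/Deserialize impls. The hex encoding accepted by `parse` and the `KeyDecodeError` variants are assumptions made only to give the non-redacted path something to decode. What matters is that a redacted payload becomes a loud error instead of a silently wrong key.

```rust
/// Sentinel written by the redacting serializer.
pub const REDACTED: &str = "[REDACTED]";

pub struct DerivedKey {
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyDecodeError {
    /// The input was a redacted log payload, not key material.
    Redacted,
    /// Not valid (assumed) hex encoding.
    BadHex,
}

impl DerivedKey {
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        DerivedKey { bytes }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// Serialize side: always the sentinel, never key material.
    pub fn serialize_redacted(&self) -> &'static str {
        REDACTED
    }

    /// Deserialize side: a redacted payload is an error, never a bogus key.
    pub fn parse(s: &str) -> Result<Self, KeyDecodeError> {
        if s == REDACTED {
            return Err(KeyDecodeError::Redacted);
        }
        if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(KeyDecodeError::BadHex);
        }
        let bytes = (0..s.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
            .collect::<Result<Vec<u8>, _>>()
            .map_err(|_| KeyDecodeError::BadHex)?;
        Ok(DerivedKey { bytes })
    }
}
```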
Resolves:
- OQ-21: remote vault access — resolved (not deferred). Not a vault crate
feature; if needed, a separate vault-server crate with its own ADR.
- C7: vault-server-crate question decided — not created now, not precluded.
- C8: operation access policy table dissolved — all operations local-only
by default; if a vault-server crate exposes some remotely, that crate
defines the policy.
- W8: DerivedKey JSON deserialization — resolved (reject redacted payloads).
Amends ADR-005 (irpc remains for alknet-call, not for alknet-vault),
ADR-018 (vault is even more standalone — zero RPC framework deps),
ADR-019 (vault is the only layer, not just the only direct-caller layer),
ADR-008 (vault integration point unchanged, but now local-only by
construction).
Diagnoses a conflation in the pre-ADR-024 spec: the OperationRegistry
inherited immutability by analogy from ADR-010's HandlerRegistry (ALPN-level),
but the TLS-config argument that justifies HandlerRegistry immutability does
not apply to the operation registry, which lives behind a single ALPN
(alknet/call). This made from_call (which discovers ops over a live connection
at runtime) structurally incompatible with the blanket immutability claim.
ADR-024 layers the operation registry by trust boundary: curated (Local) ops
are static and immutable — the startup trust boundary is where their
composition authority is granted; session (Session) and imported (FromCall
etc.) ops are dynamic at their respective scopes (per-session, per-connection)
— their trust boundaries are per-scope, not per-startup. The principle:
immutability follows the trust boundary. Immutability is the security control
for composing ops (can escalate privilege); provenance + composition authority
are the controls for non-composing ops (can't escalate).
The OperationEnv trait becomes the integration point (Arc<dyn OperationEnv>),
following the IdentityProvider precedent (ADR-004): the CallAdapter composes
the root OperationContext.env per incoming call from the active layers
(curated base + connection overlay + session overlay). Children inherit the
parent's composite env by Arc::clone — overlay composition happens once at
the root and propagates through the composition tree.
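Root-time composition with child inheritance can be sketched as a probe-then-dispatch composite behind `Arc<dyn OperationEnv>`. This is a synchronous sketch. It borrows the `contains()` method from the C9 fix recorded earlier in this log. `MapEnv` and the layer precedence order (session, then connection, then curated) are assumptions. The composite is built once for the root, and every child holds an `Arc::clone` of the same object.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub trait OperationEnv: Send + Sync {
    fn contains(&self, op: &str) -> bool;
    fn invoke(&self, op: &str, input: &str) -> Option<String>;
}

/// One layer (curated base, connection overlay, or session overlay).
pub struct MapEnv(pub HashMap<String, fn(&str) -> String>);

impl OperationEnv for MapEnv {
    fn contains(&self, op: &str) -> bool {
        self.0.contains_key(op)
    }
    fn invoke(&self, op: &str, input: &str) -> Option<String> {
        self.0.get(op).map(|h| h(input))
    }
}

/// Layers in precedence order (the order itself is an assumption here).
pub struct CompositeOperationEnv {
    layers: Vec<Arc<dyn OperationEnv>>,
}

impl CompositeOperationEnv {
    pub fn new(layers: Vec<Arc<dyn OperationEnv>>) -> Self {
        CompositeOperationEnv { layers }
    }
}

impl OperationEnv for CompositeOperationEnv {
    fn contains(&self, op: &str) -> bool {
        self.layers.iter().any(|l| l.contains(op))
    }
    fn invoke(&self, op: &str, input: &str) -> Option<String> {
        // Probe first, then dispatch to exactly one layer: a handler's own
        // failure can't be mistaken for "try the next layer".
        self.layers.iter().find(|l| l.contains(op)).and_then(|l| l.invoke(op, input))
    }
}

pub struct OperationContext {
    pub env: Arc<dyn OperationEnv>,
}

impl OperationContext {
    /// Children share the root's composite env; nothing is recomposed.
    pub fn child(&self) -> OperationContext {
        OperationContext { env: Arc::clone(&self.env) }
    }
}
```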
Resolves review #002 C6 (OperationContext.env type identity crisis): the
field is split into scoped_env: ScopedOperationEnv (reachability data, from
the registration bundle) and env: Arc<dyn OperationEnv + Send + Sync>
(dispatch trait object). One field was being used as two different types
(reachability set with .allows() and dispatch trait with .invoke()).
Localizes W4 (hot-swap ↔ registry mutability coupling) to the connection
scope: no global mutable registry to hot-swap; overlays replace naturally
with connect/disconnect and session start/end. Schema-drift on reconnect is
a per-connection overlay-rebuild concern, not a global hot-swap protocol.
Partially addresses W3 (CallClient registry security): the registry-shape
sub-question is resolved by the overlay model; the capability-exposure
sub-question (what capabilities a remote peer can trigger) remains for
ADR-017 — ADR-024 does not overclaim resolution there.
Amends OQ-04 to scope its immutability claim to the HandlerRegistry and
cross-reference ADR-024 for the operation registry. Generalizes OQ-19's
session-overlay mechanism to also cover connection-scoped remote imports —
both are per-scope dynamic overlays on the static curated base, using the
same trait-layering mechanism.
Governance (Tier 2):
- Advance ADR-022 and ADR-023 from Proposed to Accepted (specs already
depend on their types as source of truth)
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
update handler_identity type to CompositionAuthority
- Amend ADR-002: note handle() signature revised by ADR-007 (BiStream → Connection)
- Amend ADR-004: note 'enrich/replace' AuthContext language superseded by
ADR-011's immutability model; update to describe set_identity on Connection
- Update main README ADR table to show ADR-022/023 as Accepted
Spec-ADR consistency (Tier 3):
- Add abort_policy: AbortPolicy field to OperationContext struct (ADR-016
Decision 6 mandated this but the spec omitted it)
- Define AbortPolicy enum (AbortDependents | ContinueRunning) with Default impl
- Add abort_policy to build_root_context and LocalOperationEnv::invoke()
- Define the OperationEnv trait explicitly with invoke() and
invoke_with_policy() methods (was referenced as 'must remain a trait'
but never defined)
- Specify From<StreamError> for HandlerError impl with exact variant mapping
- Add Connection::from_quinn() / from_iroh() constructors (was referenced
as Connection::new() but never defined)
- Remove undefined CertAuthorityEntry placeholder from AuthPolicy v1 (will
be added additively when alknet-ssh lands)
- Fix config.md key-differences table: rate limits are in DynamicConfig,
not StaticConfig
Mechanical fixes (Tier 1):
- overview.md: 'closes the QUIC stream' → 'closes the connection' (stale
from pre-ADR-007 model)
- overview.md: OQ-04 entry updated from stale 'defer to implementation'
to 'resolved: static at startup'
- mnemonic-derivation.md: remove duplicate helper functions block (incomplete
first copy, complete second copy)
- ADR-003: add iroh (feature-gated) to alknet-core dependency list, added
by ADR-010
- ADR-021: fix ambiguous 'W1 drift issue from the vault review' cross-reference
- ADR-022: rephrase FromCall 'leaf locally' to 'leaf in the local registry'
- ADR-017: add error_schemas to from_call mirror list and services/schema
step (inconsistency with ADR-023)
- ADR-016: fix self-referential citation ('ADR-016 Assumption 5' → 'Assumption 5')
- Add ScopedOperationEnv::empty(), allows(), new() and
CompositionAuthority::none(), new() impl blocks (referenced but undefined)
- Add call.completed clarification for non-subscription calls
- Add services/schema leading-slash normalization note
- Crate README ADR tables: add missing ADR-013 (call), ADR-015 (core),
ADR-006 + ADR-010 (vault)
- Vault README: add consolidated 'Known Source Drift' table tracking all
four drift items (OsRng, unwrap, CURRENT_KEY_VERSION, spawn bug) in one
place, including the two previously missing from README
Second pre-implementation review. Goes wider than #001 on cross-document
consistency and the two-way-door framing from ADR-009.
Finds 13 critical, 21 warning, 12 suggestion issues:
- Governance: ADR-022/023 are Proposed but specs treat them as binding;
ADR-015/002/004 (Accepted) contradict later refinements without supersession
markers
- Abort policy (ADR-016) missing from OperationContext struct; OperationEnv
trait never defined
- OperationContext.env type identity crisis (reachability set vs dispatch
trait)
- ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait
stale vs ADR-022 bundle
- OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides
a crate-decomposition decision; RemoteService path unvalidated
- Vault operation access policy table incomplete for security-sensitive methods
- site_password_path string-to-index mapping breaks determinism guarantee
- Two-way-door audit: ADR-022 narrowed several doors without updating OQ
classifications; 'published artifact is a contract' blind spot in framework
Includes recommended 5-pass resolution order.
ADR-015 L171 said the scoped env API was 'a two-way door for
implementation.' ADR-022 has now resolved it: ScopedOperationEnv with
operation-level granularity (HashSet<String>), not namespace-level.
Update the stale text to point to the resolution.
ADR-016 Decision 6 specifies that the abort policy (abort-dependents vs
continue-running) is set on OperationContext and propagated through
OperationEnv::invoke() — the composing handler decides the child's
policy, not the wire caller. The call.requested payload does not carry
an abort policy field. This resolves the TBD that was masquerading as a
two-way door: two of the three options ADR-016 floated (wire payload,
per-operation declaration) were inconsistent with the ADR's own
assumptions.
Also marks review #001 as resolved — all 5 critical, 4 warning, and 4
suggestion findings are now addressed.
ADR-023 adds error_schemas to OperationSpec so operations can declare
their domain-level failure modes (FILE_NOT_FOUND, RATE_LIMITED, etc.)
distinct from protocol-level codes (NOT_FOUND, FORBIDDEN, etc.). The
call.error payload gains an optional 'details' field carrying the typed
error payload conforming to the declared schema. from_openapi/to_openapi
map OpenAPI response status codes to/from ErrorDefinitions, making the
adapter contract from ADR-017 faithful on the error axis.
Also fixes W2 (KeyVersionMismatch stale comment in encryption.md —
ADR-021 implements rotation without this variant) and W4
(derive_encryption_key_for_version missing from service.md method list).
Spec updates: operation-registry.md (OperationSpec, ErrorDefinition,
Handler error mapping, services/schema), call-protocol.md (call.error
payload, CallError, ResponseEnvelope), README.md, overview.md,
open-questions.md (OQ-24), call/README.md, encryption.md, service.md.
ADR-022 wires the three controls ADR-015 specified but left without
registration paths (C1-C4 from review #001): composition authority,
scoped env, and capabilities now enter through a HandlerRegistration
bundle. Provenance (Local, FromOpenAPI, FromMCP, FromCall, Session)
determines which ops can compose — leaves don't get composition
authority. CompositionAuthority replaces handler_identity: Identity
(it's a declared authority bundle, not a peer identity). Capabilities
are per-request from the bundle (resolves closure-capture vs context
ambiguity). Kernel/user analogy: the user's authority is checked at the
External gate; the handler's composition authority is used inside; the
scoped env bounds reachability.
Also fixes W1 (stale ADR-020 path example) and W3 (from_mcp missing
from adapter lists in operation-registry.md).
Spec updates: operation-registry.md (OperationRegistry,
HandlerRegistration, OperationContext, OperationEnv, registration
example, capability injection), call-protocol.md (build_root_context),
README.md, overview.md, open-questions.md (OQ-23), call/README.md.
Third POC iteration (alknet-fs-sync-poc, 9/9 tests) proves multi-node
path-tree sync:
- Path tree modeled as automerge CRDT document, synced via automerge's
sync protocol over iroh QUIC connections
- Each node has a local replica; writes are local + immediate (no
network latency); sync is async, gossip-style, eventually consistent
- Concurrent writes to different paths converge cleanly; concurrent
writes to same path resolve via LWW (NFS-equivalent semantics)
- Content (blobs) and metadata (path tree) sync separately — automerge
for path edges, iroh-blobs for file bytes
- Branch inheritance works through automerge sync
Key finding: automerge concurrent put_object on same key creates a
conflict, not a merge. Root structures must be created by one node and
synced before other nodes write. This is a design constraint for the
spec.
24 total tests pass across both POC crates. All remaining unknowns are
implementation-scope, not feasibility blockers.
Validates the three-layer architecture for a content-addressed, branch-aware,
mountable filesystem:
- SQLite path tree over iroh-blobs MemStore (15/15 tests pass)
- Fossil-style branching with free content dedup via BLAKE3 content addressing
- honker-core for notify-on-commit inside the same transaction as path-tree
mutations (transactional outbox pattern)
- Write path: "branch on write, merge on close" reconciles BLAKE3's need to
hash the complete file with chunked filesystem writes; concurrent readers see
the old version until close commits atomically; a crash or abort leaves the
old version intact
- Multi-tenancy via bucket_id column (free isolation, auth is an adapter problem)
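The "branch on write, merge on close" path can be sketched in memory. This does not follow the POC's actual code. std's `DefaultHasher` stands in for BLAKE3, and two HashMaps stand in for the SQLite path tree and iroh-blobs. `Store` and `WriteHandle` are hypothetical names. Writes go to a private buffer, and the path is repointed only on close, after the complete content is hashed. Dropping the handle (abort) leaves the committed version untouched.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE3; NOT collision-resistant, illustration only.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct Store {
    paths: HashMap<String, u64>,    // path tree: path -> content hash
    blobs: HashMap<u64, Vec<u8>>,   // content-addressed bytes
}

/// Branch on write: a private buffer; the path tree is untouched until close.
pub struct WriteHandle {
    path: String,
    buf: Vec<u8>,
}

impl WriteHandle {
    pub fn write(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }
}

impl Store {
    pub fn open_write(&self, path: &str) -> WriteHandle {
        WriteHandle { path: path.to_string(), buf: Vec::new() }
    }

    pub fn read(&self, path: &str) -> Option<&[u8]> {
        let h = self.paths.get(path)?;
        self.blobs.get(h).map(|b| b.as_slice())
    }

    /// Merge on close: hash the complete file, then repoint the path in one step.
    pub fn close(&mut self, handle: WriteHandle) -> u64 {
        let hash = content_hash(&handle.buf);
        self.blobs.entry(hash).or_insert(handle.buf); // identical content dedups
        self.paths.insert(handle.path, hash);
        hash
    }
}
```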
Remaining unknowns (FsStore/redb coexistence, distributed incomplete-blob reads,
SFTP wiring, GC/tag management, branch chain depth) are implementation-scope,
not feasibility blockers.
Documents the metatensor format: a binary data format where a TypeBox/jsonschema
schema describes the layout of binary data at schema-computed offsets. Extends
safetensors (fixed TensorRef schema) to arbitrary schemas, enabling struct tensors
(records), blob tensors (variable-length via indirection), and nested layouts.
Key points:
- TypeBox schemas render to standard JSON Schema; the jsonschema Rust crate
validates them with zero translation. Custom typedef.ts kinds (TFloat32,
TInt32, TStruct) map to jsonschema custom keywords via with_keyword().
- This eliminates typebox-rs as a schema engine — replaced by jsonschema +
a small offset-computation module + ~50 lines of custom keyword impls.
- Three tensor kinds: flat (safetensor today), struct (record of typed fields),
blob (struct tensor as index + flat tensor as data store, for variable-length)
- Memory-mappable: parse header, compute offsets, mmap data, typed views per
schema. No copy, no deserialization.
- QUIC-streamable: header is one small JSON message, each tensor is a separate
stream. Lazy loading, parallel transfer, incremental compute.
- ujsx-authorable: <Tensor>, <Struct>, <Field> as layout components, same
reconciler that diffs UI trees diffs model schemas. Model versioning is
tree diffing.
- Category-theory foundation: ujsx as universal typed-tree IR, HostConfig as
interpreter. <Tensor> is no stranger than <div>.
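Schema-computed offsets for a struct tensor reduce to a prefix sum over field sizes. This sketch makes several assumptions. It uses a packed layout with no alignment padding (the real format's alignment rules are not stated above). `DType` is a tiny stand-in for the typedef.ts kinds (TFloat32, TInt32 and so on), and `Field` and `layout` are hypothetical names.

```rust
/// Stand-in for the custom schema kinds (TFloat32, TInt32, ...).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DType {
    F32,
    I32,
    U8,
}

impl DType {
    pub fn size(self) -> usize {
        match self {
            DType::F32 | DType::I32 => 4,
            DType::U8 => 1,
        }
    }
}

/// One struct-tensor field as the schema describes it.
pub struct Field<'a> {
    pub name: &'a str,
    pub dtype: DType,
    pub len: usize, // element count
}

/// (field name, byte offset) in declaration order, plus the record size.
/// Packed: no alignment padding (an assumption of this sketch).
pub fn layout<'a>(fields: &[Field<'a>]) -> (Vec<(&'a str, usize)>, usize) {
    let mut offset = 0;
    let mut out = Vec::with_capacity(fields.len());
    for f in fields {
        out.push((f.name, offset));
        offset += f.dtype.size() * f.len;
    }
    (out, offset)
}
```

Record `i` of a mapped struct tensor would then start at `i * record_size` from the data base, so a typed view needs no copy or deserialization.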
Adds a major section documenting how @alkdev/flowgraph (already npm-published,
uses ujsx) becomes the compute graph authoring and execution layer for
alknet-tensor, replacing webgpu-torch's imperative nn.Module hierarchy and
autograd recording with declarative ujsx templates and reactive DAG execution.
Key points documented:
- The ujsx tree IS the compute graph (CUDA-graphs-shaped but declarative)
- flowgraph's two HostConfigs: GraphologyHostConfig (compile/validate) and
ReactiveHostConfig (execute with signal-driven status propagation)
- nn modules become ujsx components, autograd becomes reverse tree walk
- Conditional/Map components enable dynamic structure that CUDA graphs can't express
- Network-callable compute graphs (mix local + remote ops in one template)
- TSX authoring via standard JSX→h transform (ujsx jsx-runtime as target)
- graphology → petgraph port: ~15 API methods map 1:1, removes ~5400 lines of JS
- Updated POC priorities: the end-to-end skeleton now includes flowgraph integration,
with the petgraph host port as a separate POC
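The idea that "autograd becomes reverse tree walk" can be illustrated with a small reverse-mode sketch in Rust over an expression tree. This is not the flowgraph or ujsx API: the hypothetical Node enum stands in for a template, forward is the bottom-up evaluation an interpreter would perform, and backward walks the same tree top-down, pushing each node's upstream gradient to its children via the chain rule. Inputs are referenced by index into an input slice, which is also an assumption.

```rust
/// A tiny expression tree standing in for a compute-graph template (hypothetical).
enum Node {
    Input(usize), // index into the input slice
    Const(f64),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
    Tanh(Box<Node>),
}

/// Forward pass: evaluate the tree bottom-up.
fn forward(node: &Node, x: &[f64]) -> f64 {
    match node {
        Node::Input(i) => x[*i],
        Node::Const(c) => *c,
        Node::Add(a, b) => forward(a, x) + forward(b, x),
        Node::Mul(a, b) => forward(a, x) * forward(b, x),
        Node::Tanh(a) => forward(a, x).tanh(),
    }
}

/// Backward pass: walk the tree top-down. `upstream` is d(output)/d(this node);
/// Input leaves accumulate with `+=`, so an input used twice gets both contributions.
/// Children are re-evaluated with `forward` to keep the sketch short; a real
/// implementation would cache forward values from the first pass.
fn backward(node: &Node, x: &[f64], upstream: f64, grads: &mut [f64]) {
    match node {
        Node::Input(i) => grads[*i] += upstream,
        Node::Const(_) => {}
        Node::Add(a, b) => {
            backward(a, x, upstream, grads);
            backward(b, x, upstream, grads);
        }
        Node::Mul(a, b) => {
            let (va, vb) = (forward(a, x), forward(b, x));
            backward(a, x, upstream * vb, grads);
            backward(b, x, upstream * va, grads);
        }
        Node::Tanh(a) => {
            let t = forward(a, x).tanh();
            backward(a, x, upstream * (1.0 - t * t), grads);
        }
    }
}

/// Gradient of the tree's output with respect to every input.
fn grad(node: &Node, x: &[f64]) -> Vec<f64> {
    let mut grads = vec![0.0; x.len()];
    backward(node, x, 1.0, &mut grads);
    grads
}

fn main() {
    // f(x0, x1) = tanh(x0 * x1 + 0.5)
    let f = Node::Tanh(Box::new(Node::Add(
        Box::new(Node::Mul(Box::new(Node::Input(0)), Box::new(Node::Input(1)))),
        Box::new(Node::Const(0.5)),
    )));
    let x = [0.2, -1.0];
    println!("f = {:.4}, grad = {:?}", forward(&f, &x), grad(&f, &x));
}
```

The same tree drives both passes, which is the sense in which no separate autograd tape is recorded: the structure is already declared.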
Documents the architectural direction for a PyTorch-shaped tensor computation
library built on Rust + wgpu, where QuickJS is a thin API/composition layer
and Rust owns memory, dispatch, and WGSL codegen. Derived from webgpu-torch
as the reference design (op_spec → opgen → WGSL shader pipeline) but not a
port of its code — webgpu-torch is the reference, alknet-tensor is the
production architecture.
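A rough sketch of the op_spec → opgen → WGSL step, in Rust: a hypothetical UnaryOpSpec (a name plus a WGSL expression over `x`) is rendered into an elementwise compute shader. It uses format! where the design calls for a handlebars backend, and the spec shape is illustrative rather than webgpu-torch's actual op_spec format. The generated shader binds one read-only input and one read-write output storage buffer and guards the tail, since dispatches are rounded up to whole workgroups.

```rust
/// Hypothetical elementwise op spec: a name and a WGSL expression in terms of `x`.
struct UnaryOpSpec {
    name: &'static str,
    expr: &'static str,
}

const WORKGROUP_SIZE: u32 = 64;

/// Render a WGSL compute shader that applies `spec.expr` to every element of `src`.
fn generate_wgsl(spec: &UnaryOpSpec) -> String {
    format!(
        "// generated kernel: {name}
@group(0) @binding(0) var<storage, read> src: array<f32>;
@group(0) @binding(1) var<storage, read_write> dst: array<f32>;

@compute @workgroup_size({wg})
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {{
    let i = gid.x;
    // dispatch is rounded up to whole workgroups, so skip out-of-range invocations
    if (i >= arrayLength(&dst)) {{
        return;
    }}
    let x = src[i];
    dst[i] = {expr};
}}
",
        name = spec.name,
        wg = WORKGROUP_SIZE,
        expr = spec.expr
    )
}

/// Number of workgroups needed to cover `len` elements.
fn workgroup_count(len: u32) -> u32 {
    (len + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE
}

fn main() {
    let relu = UnaryOpSpec { name: "relu", expr: "max(x, 0.0)" };
    print!("{}", generate_wgsl(&relu));
    println!("workgroups for 1000 elements: {}", workgroup_count(1000));
}
```

Keeping the template in Rust means the same spec could, in principle, also feed the Rust and TypeScript generators, which is the motivation for treating WGSL as one more codegen backend.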
Key decisions:
- JS holds handles (BufferId); Rust owns the wgpu::Buffers.
- ~4-5 high-level Rust ops (create_tensor/dispatch_kernel/register_kernel/read/write),
not ~20 low-level GPU API calls.
- WgslGenerator as a third handlebars backend in typebox-rs codegen, alongside
RustGenerator and TypeScriptGenerator.
- Tensor ops as OperationSpecs on the registry (network-callable over irpc, verified
protocol-compatible on QuickJS by POC 2).
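The handle split in the first two decisions can be sketched with the standard library alone. Here a Vec<f32> stands in for wgpu::Buffer and a Rust closure stands in for a compiled WGSL kernel, so the sketch runs without a GPU. The op names follow the decisions above (create_tensor, write, read, register_kernel, dispatch_kernel), but every signature, the KernelId type, and the error strings are assumptions. The point is the shape of the boundary: the JS side would only ever hold the opaque BufferId and KernelId values.

```rust
use std::collections::HashMap;

/// Opaque handles: the only values the JS side would ever hold.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BufferId(u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct KernelId(u32);

/// Stand-in for a compiled elementwise WGSL kernel.
type Kernel = Box<dyn Fn(f32) -> f32>;

#[derive(Default)]
pub struct Runtime {
    buffers: HashMap<BufferId, Vec<f32>>, // stand-in for wgpu::Buffer storage owned by Rust
    kernels: HashMap<KernelId, Kernel>,
    next_id: u32,
}

impl Runtime {
    fn fresh(&mut self) -> u32 {
        self.next_id += 1;
        self.next_id
    }

    pub fn create_tensor(&mut self, len: usize) -> BufferId {
        let id = BufferId(self.fresh());
        self.buffers.insert(id, vec![0.0; len]);
        id
    }

    pub fn write(&mut self, id: BufferId, data: &[f32]) -> Result<(), String> {
        let buf = self.buffers.get_mut(&id).ok_or("unknown buffer")?;
        if buf.len() != data.len() {
            return Err(format!("length mismatch: {} vs {}", buf.len(), data.len()));
        }
        buf.copy_from_slice(data);
        Ok(())
    }

    pub fn read(&self, id: BufferId) -> Result<Vec<f32>, String> {
        self.buffers.get(&id).cloned().ok_or_else(|| "unknown buffer".to_string())
    }

    pub fn register_kernel(&mut self, f: impl Fn(f32) -> f32 + 'static) -> KernelId {
        let id = KernelId(self.fresh());
        self.kernels.insert(id, Box::new(f));
        id
    }

    /// Apply kernel `k` elementwise from `src` into `dst`: one boundary crossing per op.
    pub fn dispatch_kernel(&mut self, k: KernelId, src: BufferId, dst: BufferId) -> Result<(), String> {
        let kernel = self.kernels.get(&k).ok_or("unknown kernel")?;
        let input = self.buffers.get(&src).ok_or("unknown src buffer")?.clone();
        let out = self.buffers.get_mut(&dst).ok_or("unknown dst buffer")?;
        if out.len() != input.len() {
            return Err("length mismatch".to_string());
        }
        for (o, v) in out.iter_mut().zip(input) {
            *o = kernel(v);
        }
        Ok(())
    }
}

fn main() {
    let mut rt = Runtime::default();
    let a = rt.create_tensor(3);
    let b = rt.create_tensor(3);
    rt.write(a, &[-1.0, 0.5, 2.0]).unwrap();
    let relu = rt.register_kernel(|x| x.max(0.0));
    rt.dispatch_kernel(relu, a, b).unwrap();
    println!("{:?}", rt.read(b).unwrap()); // prints [0.0, 0.5, 2.0]
}
```

With a real wgpu backend, the buffer map would hold wgpu::Buffer values and dispatch_kernel would encode a compute pass, but the handle-table shape seen from JS could stay the same.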
Documents the downstream problems this solves as a side effect: distributed
compute over irpc, LLM-authored model code (toolEnv pattern), edge/embedded
tensor compute, sidestepping the compositing problem (compute has no surface),
and cross-platform support by construction (wgpu's many backends).
The quickjs-reactive-probe was extended to load @alkdev/operations (registry,
call protocol, response envelopes, ACL, buildCallHandler) alongside the
reactive core. All five operations assertions pass on QuickJS-NG via rquickjs
(registry/execute/envelope/acl/callHandler), with 271 modules loaded in total.
This closes the third highest-leverage unknown: the operations protocol is
runtime-agnostic in practice, not just in theory. Adds a new section on the
QuickJS UDF host convergence — a minimal isolate speaking the same bidirectional
operations protocol as the TypeScript reference, the Rust alknet-call port,
and the planned NAPI/Python adapters, without needing Node/Deno/Bun. Connects
to the toolEnv WASM-QuickJS sandbox precedent at /workspace/toolEnv.
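For readers on the Rust side, the five pieces the probe exercised (registry, execute, envelope, ACL, call handler) fit together roughly as below. This is a hypothetical Rust sketch of the protocol's shape, not the @alkdev/operations API or the alknet-call port: the Envelope variants, the caller-name ACL, the error codes, and the string payloads are all assumptions chosen to keep it short and std-only.

```rust
use std::collections::{HashMap, HashSet};

/// Response envelope returned for every call (shape assumed for this sketch).
#[derive(Debug, PartialEq)]
enum Envelope {
    Value { id: u64, value: String },
    Error { id: u64, code: &'static str, message: String },
}

type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

struct Operation {
    allowed: HashSet<String>, // ACL: callers permitted to invoke this operation
    handler: Handler,
}

#[derive(Default)]
struct Registry {
    ops: HashMap<String, Operation>,
}

impl Registry {
    fn register<F>(&mut self, name: &str, allowed: &[&str], handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        let allowed = allowed.iter().map(|s| s.to_string()).collect();
        self.ops.insert(name.to_string(), Operation { allowed, handler: Box::new(handler) });
    }

    /// Call handler: look up the operation, check the ACL, execute, wrap in an envelope.
    fn handle_call(&self, id: u64, caller: &str, op: &str, input: &str) -> Envelope {
        let entry = match self.ops.get(op) {
            Some(entry) => entry,
            None => {
                return Envelope::Error { id, code: "NOT_FOUND", message: format!("no operation {}", op) }
            }
        };
        if !entry.allowed.contains(caller) {
            let message = format!("{} may not call {}", caller, op);
            return Envelope::Error { id, code: "FORBIDDEN", message };
        }
        match (entry.handler)(input) {
            Ok(value) => Envelope::Value { id, value },
            Err(message) => Envelope::Error { id, code: "HANDLER_ERROR", message },
        }
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.register("upper", &["alice"], |s| Ok(s.to_uppercase()));
    println!("{:?}", reg.handle_call(1, "alice", "upper", "hi"));
    println!("{:?}", reg.handle_call(2, "bob", "upper", "hi"));
}
```

Nothing here depends on the host runtime, which is the property the probe demonstrated for the real TypeScript package on QuickJS.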
Captures 5 critical, 4 warning, and 4 suggestion findings from a sanity
check of the core, call, and vault crate specs against ADRs 001-021
and the OQ tracker. Criticals cluster on one tangle: the registration
API surface in operation-registry.md doesn't carry the handler
identity, scoped env, or capabilities that ADR-014/015 lock as 'set at
registration' — plus a missing error-schema concept for adapters.
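One way to read the critical finding is that handler identity, scoped env, capabilities, and an error schema belong in the registration record itself, fixed when the operation is registered. A hypothetical Rust sketch of such a record follows. The field names, the consuming builder, the duplicate-name check, and the example operation name are illustrations of "set at registration", not what ADR-014/015 or operation-registry.md actually specify.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Everything fixed at registration time. Fields are private and the builder
/// methods consume `self`, so once a record is inside the registry (which only
/// hands out `&Registration`), none of it can change.
pub struct Registration {
    name: String,
    handler_id: String,             // stable identity of the handler
    env: BTreeMap<String, String>,  // env scoped to this handler only
    capabilities: BTreeSet<String>, // what the handler may do
    error_schema: String,           // JSON Schema text for error payloads (for adapters)
}

impl Registration {
    pub fn new(name: &str, handler_id: &str, error_schema: &str) -> Self {
        Registration {
            name: name.to_string(),
            handler_id: handler_id.to_string(),
            env: BTreeMap::new(),
            capabilities: BTreeSet::new(),
            error_schema: error_schema.to_string(),
        }
    }

    pub fn env(mut self, key: &str, value: &str) -> Self {
        self.env.insert(key.to_string(), value.to_string());
        self
    }

    pub fn capability(mut self, cap: &str) -> Self {
        self.capabilities.insert(cap.to_string());
        self
    }

    pub fn handler_id(&self) -> &str {
        &self.handler_id
    }

    pub fn error_schema(&self) -> &str {
        &self.error_schema
    }

    pub fn env_var(&self, key: &str) -> Option<&str> {
        self.env.get(key).map(String::as_str)
    }

    pub fn has_capability(&self, cap: &str) -> bool {
        self.capabilities.contains(cap)
    }
}

#[derive(Default)]
pub struct Registry {
    entries: HashMap<String, Registration>,
}

impl Registry {
    /// Reject duplicate names so a registered record is never silently replaced.
    pub fn register(&mut self, reg: Registration) -> Result<(), String> {
        if self.entries.contains_key(&reg.name) {
            return Err(format!("{} is already registered", reg.name));
        }
        self.entries.insert(reg.name.clone(), reg);
        Ok(())
    }

    pub fn get(&self, name: &str) -> Option<&Registration> {
        self.entries.get(name)
    }
}

fn main() {
    let mut registry = Registry::default();
    let reg = Registration::new("example.read", "example-handler-v1", r#"{"type":"object"}"#)
        .env("DATA_DIR", "/tmp/example")
        .capability("fs.read");
    registry.register(reg).expect("first registration succeeds");
    let stored = registry.get("example.read").unwrap();
    println!("{} has fs.read: {}", stored.handler_id(), stored.has_capability("fs.read"));
    println!("DATA_DIR = {:?}", stored.env_var("DATA_DIR"));
    println!("error schema: {}", stored.error_schema());
}
```

Making these fields constructor or builder inputs, rather than properties patched on later, is what lets the type system enforce "set at registration".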