Document the two codebases that inform the ShaderGenerator's op table
and the wgpu+handlebars+remote-GPU patterns:
- wonnx (MIT/Apache-2.0, archived): comprehensive ONNX op set in
Tera-templated WGSL at wonnx/templates/ — arithmetic, activation,
gemm, conv, batchnorm, softmax, etc. Port the shader implementations,
swap Tera for handlebars. compiler.rs's add_raw_template +
include_str! pattern maps 1:1 to handlebars-rs register_template_string.
- Handlebars + wgpu + remote-GPU patterns (private reference, patterns
reusable): validates the handlebars-rs side and the vast.ai deployment
shape. Patterns carried over: {{> partial}} includes for shared
fragments, inline-able constant tables via switch statements (SHA-256
k-values, universal across wgpu versions), default-valued template
parameters, wgpu-on-remote-GPU sync. The sha256 shader serves as a base
shader demonstrating non-ML compute on the same dispatch surface.
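A minimal sketch of the inline constant-table pattern, assuming the table
is emitted as a WGSL function wrapping a switch. The helper name
wgsl_const_table, the generated function name, and the out-of-range
fallback of 0 are illustrative and not taken from either codebase; only
the first four SHA-256 round constants are shown.

```rust
/// Emit a WGSL function that returns `table[i]` through a `switch`, so the
/// constants inline into the shader source instead of living in a buffer.
/// The function name and the fallback value of 0 are illustrative choices.
pub fn wgsl_const_table(fn_name: &str, table: &[u32]) -> String {
    let mut out = format!("fn {fn_name}(i: u32) -> u32 {{\n    switch i {{\n");
    for (i, v) in table.iter().enumerate() {
        out.push_str(&format!("        case {i}u: {{ return 0x{v:08x}u; }}\n"));
    }
    out.push_str("        default: { return 0u; }\n    }\n}\n");
    out
}

/// First four SHA-256 round constants (FIPS 180-4); the full table has 64.
pub const SHA256_K_HEAD: [u32; 4] = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5];
```

The generated string could then be registered as a partial and pulled
into any shader that needs the table.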
Updated the WGSL codegen probe POC to reference wonnx's op set as the
porting source.
Extract the shared JS+wgpu substrate (verified by the alknet-desktop POCs)
as alknet-runtime — the generalized QuickJS-NG + wgpu runtime that both
alknet-desktop (render) and alknet-compute (tensor compute) build on. Key
property driving the split: wgpu on llvmpipe is genuinely useful compute
with no physical GPU (WGSL → optimized SIMD beats JS for non-trivial
workloads), so wgpu is unconditional in the runtime rather than a feature
flag.
Reframes the original alknet-tensor architecture-summary as alknet-compute
(builds on alknet-runtime + alknet-tensor) with ShaderGenerator as a trait
(WGSL first impl, SPIR-V/GLSL/naga-IR later per wgpu multi-input-language
support). alknet-tensor/metatensor-format.md is now clearly the pure binary
format crate (no JS or wgpu dep), usable standalone by a pure-Rust model
server.
Layering: alknet-runtime depends on alknet-call (registry authority stays
per ADR-013); alknet-compute and alknet-desktop depend on alknet-runtime;
alknet-tensor is a pure-format sibling.
Promote the WebSocket browser path from a section in http-server.md to a
first-class spec (websocket.md) and commit the contract-pattern decision
(ADR-048): a WS connection carries the native EventEnvelope call-protocol
session, not the HTTP gateway shape. The gateway endpoints are HTTP-only;
discovery on WS is via services/list and services/schema as ordinary call-protocol
ops; subscriptions project as native call.responded events (no SSE).
ADR-044 already decided WS as the v1 browser bidirectional path; ADR-048
clarifies the shape of what ADR-044 committed (§1 implies native session;
the ADR makes it an explicit implementer-visible rule). The from_wss adapter
(importing a remote node's ops over WS) is recorded as out-of-scope with a
concrete reversal trigger so it is not re-derived later.
Spec cleanup: http-server.md WS section collapsed to a stub pointer;
websocket.md Why section references ADRs rather than re-arguing them;
length-prefix decision made canonical (no prefix on WS — message boundary
is the delimiter); default upgrade path pinned (/alknet/call) with HTTP/2
extended CONNECT noted; indexes (README, http/README, overview) updated.
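A sketch of what the length-prefix decision means in practice, assuming
byte-stream transports frame each envelope with a length prefix. The
4-byte big-endian prefix and all function names here are assumptions for
illustration, not the spec's wire format.

```rust
/// Frame for a byte stream: the reader needs an explicit length to find
/// message boundaries. The 4-byte big-endian prefix is an assumption.
pub fn frame_for_stream(envelope: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + envelope.len());
    out.extend_from_slice(&(envelope.len() as u32).to_be_bytes());
    out.extend_from_slice(envelope);
    out
}

/// Frame for WebSocket: one envelope per binary message, no prefix,
/// because the WS message boundary already delimits it.
pub fn frame_for_ws(envelope: &[u8]) -> Vec<u8> {
    envelope.to_vec()
}

/// Split a stream buffer into complete envelopes plus the unconsumed tail.
pub fn split_stream(buf: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut msgs = Vec::new();
    let mut rest = buf;
    while rest.len() >= 4 {
        let len = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        if rest.len() < 4 + len {
            break; // partial message: wait for more bytes
        }
        msgs.push(&rest[4..4 + len]);
        rest = &rest[4 + len..];
    }
    (msgs, rest)
}
```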
OQ-39 (to_openapi published-spec versioning) resolved by ADR-045:
info.version semver tracks the gateway endpoint contract, not the
operation set — per-caller operations discovered via /search do not
bump the version. The gateway pattern (ADR-042) dissolved most of the
original churn concern.
ADR-046: assembly-layer custom HTTP routes on HttpAdapter. The HTTP
router had no documented extension point for deployment-specific
endpoints (e.g., an OAI-compatible proxy at /v1/chat/completions). Adds
extra_routes: Option<Router> at construction; raw HTTP, not operations;
default surface takes precedence on collision. The mechanism is the
one-way door; specific routes are two-way.
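The collision rule can be sketched with a plain map standing in for the
HTTP router. Routes and build_routes are hypothetical names; the real
extra_routes is an Option<Router>, and this only models the precedence.

```rust
use std::collections::HashMap;

/// Stand-in for a router: path -> handler label.
pub type Routes = HashMap<String, &'static str>;

/// Merge deployment-specific routes under the default surface.
pub fn build_routes(default_surface: Routes, extra_routes: Option<Routes>) -> Routes {
    let mut merged = extra_routes.unwrap_or_default();
    // Defaults are inserted last so they overwrite any colliding extra route.
    merged.extend(default_surface);
    merged
}
```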
ADR-047: remove the direct-call POST /{service}/{op} HTTP surface. The
gateway /call is the sole invoke path — the simplified contract is a
few fixed endpoints, not a per-operation REST tree. The direct-call
surface re-introduced the 'dump the full API regardless of privs'
failure mode at the HTTP level that the gateway /search was built to
escape. ADR-036's routing decision is superseded; its non-routing
clauses (SSE, Bearer auth, /healthz, stealth, error mapping) survive.
A deployment wanting a REST-like per-operation surface builds it as a
custom route projection (ADR-046).
ADR-044 updated with the tradeoff framing (WSS is the right tool for
the call-protocol-from-browser case; WebTransport is the right tool for
the generalized ALPN-stream-proxy case we don't have yet — coexist, not
migrate) and the @alkdev/pubsub concrete prior art (the EventEnvelope
{type,id,payload} the call protocol was derived from already has a
working WebSocket client/server; the sync is a small adjustment, not a
from-scratch build).
call-protocol.md references the pubsub lineage for the
transport-agnosticism claim.
OQ-40 resolved: alknet-http owns a shared reqwest_middleware::ClientWithMiddleware
(not a bare reqwest::Client) with a two-layer middleware stack —
RetryTransientMiddleware (reqwest-retry, exponential backoff on transient
failures) + inlined RetryAfterMiddleware (from melotic/reqwest-retry-after, MIT,
~50 lines, inlined to bound the upstream's unbounded HashMap storage). The two
are complementary: reqwest-retry's default strategy does not honor Retry-After.
Hot-reload is rebuild-and-swap via ArcSwap (same pattern as
ConfigIdentityProvider, ADR-035); a rebuild drops the connection pool, which
is acceptable because the trigger is a config change, which wants a fresh pool anyway. The
three one-way constraints stand unchanged: alknet-http owns its client (no
env-var config, no shared global), credentials inject per-request from
OperationContext.capabilities, outbound TLS uses the system trust store.
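Why the two retry layers are complementary can be sketched as one delay
decision. This is not how either crate is implemented; the function
names, the 100 ms base, and the 30 s cap are assumptions, and only the
delay-seconds form of Retry-After is handled.

```rust
use std::time::Duration;

/// Parse the delay-seconds form of a Retry-After header value.
/// The HTTP-date form is not handled in this sketch.
pub fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Exponential backoff: base * 2^attempt, capped.
pub fn backoff(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// A server-provided hint wins when present; otherwise fall back to backoff.
pub fn next_delay(attempt: u32, retry_after: Option<&str>) -> Duration {
    retry_after
        .and_then(parse_retry_after)
        .unwrap_or_else(|| backoff(attempt, Duration::from_millis(100), Duration::from_secs(30)))
}
```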
Records the downstream layering boundary: the agent crate's provider SSE
normalization (the solid part of aisdk's pattern — Vercel-UI-message
normalization) sits on top of this client, consuming the reqwest::Response
stream; it does not replace the client. The aisdk core/client.rs reference for
client construction is dropped (env-var config + hand-rolled retry are the
anti-patterns discarded); the from_openapi.ts SSE normalization reference in
the forwarding-handler section is kept (separate, solid pattern).
No ADR — the decision is internal to alknet-http: the client type does not
cross crate boundaries (alknet-call never sees reqwest), the library choice is
reversible, and it does not touch the system's structure, constraints, or
cross-crate API surface.
Updates: http-adapters.md (HTTP client section rewritten, references updated,
constraints/OQ bullets updated), http-mcp.md (OQ-40 status flip),
open-questions.md (OQ-40 resolved with full config-shape table), README.md (OQ-40
folded into the existing two-way-doors bucket), and three secondary docs
(crates/http/README.md, overview.md, http-server.md) that carried stale 'open'
OQ-40 references.
Working through the WebTransport implementation path surfaced a scope
question distinct from the hedging-as-deferral anti-pattern ADR-038 was
written to correct. Three findings drove the re-evaluation:
1. The browser bidirectional call-protocol path doesn't require
WebTransport — WebSocket is full-duplex, EventEnvelope fits a WS
binary message boundary cleanly, and the Dispatcher is stream-
agnostic (ADR-012). What WebTransport gives over WebSocket (native
multi-stream multiplexing, the ALPN-as-stream substrate) benefits the
proxy use case, not the call protocol.
2. WebTransport is a draft standard (-07, not RFC) on an experimental
Rust dependency stack (wtransport/h3 both self-describe as not
production-ready). Either choice puts a draft protocol on the
security surface of the first release.
3. The ALPN-stream-proxy (ADR-040) is speculative — its WASM parser
consumers (browser SSH/SFTP/git clients) don't exist yet, and the
downstream crates that the WebTransport deferral would block (SSH, git,
SFTP) expose their ALPNs natively over QUIC regardless.
This is a scope decision (per ADR-009: a decision that 'genuinely
doesn't need to be made yet because the use case isn't concrete'), not
hedging. The reversal trigger is concrete: a real deployment needing
the ALPN-stream-proxy.
ADR-038 is superseded (its anti-pattern correction stands; its specific
'h3 in scope now' decision is reversed). ADR-040 and ADR-043 are
parked, not superseded — their designs revive unchanged when WebTransport
revives, with §2 (bidirectionality) and §3 (no-PeerId overlay) of ADR-043
transferring to WebSocket for v1.
ADR-044 §5 also states the 'browser is not a peer' rationale that
ADR-034 §4 closed without arguing: peer = addressable node in the
call-protocol peer graph (stable PeerId, PeerRef::Specific-reachable,
identity stable across reconnects), not 'any endpoint that exchanges
calls during a live session.' A browser is the second but not the first
(no stable crypto identity of its own, ephemeral, not addressable from
other nodes). ADR-034 §4 and Assumption 2 are amended by reference.
The wtransport-vs-hyperium dependency question is recorded (not
resolved — WebTransport is deferred) in ADR-044 §'Research note' and
webtransport.md so the revival doesn't re-derive it: wtransport probably
isn't the right choice (axum-bridge friction — it owns its own HTTP
serving path); the hyperium stack (h3 + h3-quinn + h3-webtransport) fits
the axum integration better but its server-side WebTransport API needs
verification before commitment.
Reviewed by architecture-review subagent; all critical cross-reference
issues (ADR-034 §5 stale 'in scope' assertion, ADR-036 Context listing
h3 as implemented, webtransport.md Design Decisions table) resolved.
Reframes the SSH scope around the channel multiplexer as the decomposition
point. Each feature (forwarding, SOCKS5, SFTP) is a channel type or a consumer
of channel types, stacking on the core — each layer functional when built,
none shipped broken. Dissolves the 'massive v1' framing that produced hedging
language proposing non-functional or half-built versions.
Three developments since the initial 2026-06-25 research changed the framing:
(1) WebTransport landed as ADRs 038/040/043, grounding SSH-over-WebTransport
as a constraint (the handler must be source-agnostic about its Connection);
(2) russh's runtime abstraction (russh-util swaps tokio::spawn for
wasm_bindgen_futures on wasm32) means the SSH *client* runs in WASM when fed a
WebTransport BiStream — the browser case is real, not speculative;
(3) the http crate intersection (ALPN-stream-proxy depends on SSH handlers
being source-agnostic) is now visible and specified.
The layered build order (1-4 stream+connection+channels+exec, then 5
forwarding, then 6 SOCKS5, then 7 SFTP) doubles as the configuration surface:
each layer beyond the core is an opt-in channel type, gating on the
default-deny ACL baseline inherited from russh.
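The build-order-as-configuration idea can be sketched as an opt-in set
over the layers beyond the core. OptionalLayer and SshLayers are
hypothetical names; the real gating sits on russh's ACL baseline, which
this does not model.

```rust
use std::collections::HashSet;

/// Opt-in layers beyond the core (stream + connection + channels + exec).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum OptionalLayer {
    Forwarding,
    Socks5,
    Sftp,
}

/// Hypothetical config: the core is always present, everything else is
/// denied until explicitly enabled.
#[derive(Default)]
pub struct SshLayers {
    enabled: HashSet<OptionalLayer>,
}

impl SshLayers {
    /// Opt in to one layer; builder-style so calls can be chained.
    pub fn enable(mut self, layer: OptionalLayer) -> Self {
        self.enabled.insert(layer);
        self
    }

    /// Default-deny check used before opening a channel of this type.
    pub fn allows(&self, layer: OptionalLayer) -> bool {
        self.enabled.contains(&layer)
    }
}
```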
A consistency review of the alknet-http specs found two classes of
issues: internal contradictions from the mid-spec pivot (the to_openapi
gateway pattern landed in prose but not in cross-references), and a
systematic client→server assumption that only holds for the OpenAPI/MCP
case leaking into the WebTransport architecture.
Class 1 (internal contradictions):
- C1: to_openapi was half-refactored — body described the ADR-042
gateway pattern but the decisions table and ADR-036 still said
'paths mirror /{service}/{op}'. ADR-036's to_openapi clause is now
amended as superseded by ADR-042; the stale decisions row and README
Principle 2 are fixed.
- C2: the axum Router route list didn't include the 5 gateway endpoints
(/search, /schema, /call, /batch, /subscribe). Added them; clarified
/openapi.json as the gateway description doc; added gateway paths to
the decoy exclusion list.
- C3: ADR-034 §5 still talked about the 'h3/WebTransport deferral
bucket' that ADR-038 eliminated. Amended §5/Consequences/References
to drop the deferral framing (the auth-model decision stands; only
the 'when' wording was stale).
Class 2 (one-way direction assumption):
- C4/C5/C6: the WebTransport specs framed the session as browser→hub
one-way, when the call protocol is bidirectional and WebTransport is
a general ALPN transport substrate. New ADR-043 reframes WebTransport
as a bidirectional ALPN transport substrate (call protocol is the
first/canonical target; needs no WASM parser), names the call
protocol's bidirectionality over WebTransport sessions, and states
the inbound no-PeerId connection-local overlay as the mirror of
ADR-034 §2. webtransport.md is updated to reflect this framing;
ADR-040 is repositioned (not superseded) as the substrate's non-call-
ALPN mechanism.
- C7: the HTTP/1.1+HTTP/2 surface's one-directionality is now named as
a lossy consequence of HTTP request/response; WebTransport is named
as the surface that restores the bidirectional call model.
- C8: overview.md acknowledges the from/to direction model is
OpenAPI/MCP-specific, not a call-protocol property.
A review subagent pass on ADR-043 + webtransport.md found no critical
issues; warnings W1-W3 (residual browser-as-subject framing, ADR-009
rationale in spec, opening abstract tone) and suggestions S2/S4/S5
were addressed.
The to_openapi spec was describing one OpenAPI path per alknet operation
— the inverse of from_openapi. That inverse is genuinely messy: the call
protocol's input is a flat JSON object, and generating a traditional
OpenAPI path entry (POST /fs/{path} with path param, body, query params)
requires reverse-engineering which fields are path/query/body — metadata
the call protocol doesn't carry. The three options (leaky HTTP metadata
on OperationSpec, fragile heuristics, manual annotation) are all messy.
ADR-042 replaces this with the gateway pattern (same as ADR-041 for
to_mcp): to_openapi generates 5 fixed endpoints (search, schema, call,
batch, subscribe) that gate access to the full operation registry. The
input is always a flat JSON body — no path/query/body split to
reverse-engineer. JSON Schema is already in the OperationSpec.
The per-caller API surface is the key advantage: /search is
AccessControl-filtered, so the client sees only what it can call. The
Gitea failure mode (dumping admin ops to every caller in a static
OpenAPI doc) is structurally impossible — the per-caller surface is the
default, not an afterthought. OpenAPI has no per-caller filtering
concept; the gateway pattern provides it through /search.
Gateway endpoint set:
- /search -> services/list (AccessControl-filtered, names + descriptions)
- /schema -> services/schema (full OperationSpec)
- /call -> call.requested (Query/Mutation, flat JSON body)
- /batch -> multiple call.requested (correlated IDs)
- /subscribe -> call.requested (Subscription, SSE) — the one endpoint
the MCP gateway excludes (MCP is request/response; OpenAPI/SSE
supports streaming)
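The endpoint set above can be sketched as a fixed path match. The enum
and function names are illustrative; the point is that any path outside
the five gateway endpoints has no per-operation route.

```rust
/// What each fixed gateway endpoint projects onto in the call protocol.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GatewayTarget {
    ServicesList,   // /search
    ServicesSchema, // /schema
    Call,           // /call: one call.requested
    Batch,          // /batch: several call.requested with correlated IDs
    Subscribe,      // /subscribe: call.requested for a Subscription, over SSE
}

/// Map a request path to its gateway target; anything else is not routed.
pub fn route_gateway(path: &str) -> Option<GatewayTarget> {
    match path {
        "/search" => Some(GatewayTarget::ServicesList),
        "/schema" => Some(GatewayTarget::ServicesSchema),
        "/call" => Some(GatewayTarget::Call),
        "/batch" => Some(GatewayTarget::Batch),
        "/subscribe" => Some(GatewayTarget::Subscribe),
        _ => None,
    }
}

/// Only the subscribe endpoint streams its response.
pub fn streams_sse(target: GatewayTarget) -> bool {
    matches!(target, GatewayTarget::Subscribe)
}
```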
A traditional per-operation-paths projection is additive (a deployment
that wants the nice Swagger UI builds it with HTTP-specific metadata),
not a replacement. The gateway is the default.
http-adapters.md to_openapi section rewritten: the gateway endpoint
set, per-caller filtering, error fidelity on the /call endpoint, and
the additive traditional projection. The 'Why' section adds the
flat->structured and per-caller-surface rationale.
README/overview ADR tables and the top-level README current-state note
updated for ADR-042.
The to_mcp spec was describing one MCP tool per alknet operation — the
tool-bloat problem. An LLM connecting to a node with 200 operations gets
200 MCP tools dumped into its context, degrading reasoning and wasting
context budget.
ADR-041 replaces this with the tool-gateway pattern (same pattern as
opencode's memory and worktree tools): to_mcp exposes 4 fixed meta-tools
(search, schema, call, batch) that gate access to the full operation
registry. The LLM has a few tools in context, discovers operations on
demand through search + schema, then calls. Same principle as Linux's
man command — don't preload all documentation; query on demand.
Gateway tool set:
- search -> services/list (names + descriptions, AccessControl-filtered)
- schema -> services/schema (full OperationSpec for a specific op)
- call -> call.requested (Query/Mutation only, request/response)
- batch -> multiple call.requested (correlated IDs, OQ-14)
Subscription operations are excluded — MCP tool calls are
request/response by protocol design (the client blocks until
CallToolResult returns); streaming subscriptions don't fit. Subscriptions
are filtered out of search results and cannot be invoked via call.
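A sketch of the Subscription exclusion, under simplifying assumptions:
AccessControl filtering is omitted, search is a substring match, and the
Op type, error names, and operation names in the usage are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OpKind {
    Query,
    Mutation,
    Subscription,
}

pub struct Op {
    pub name: &'static str,
    pub kind: OpKind,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CallError {
    NotFound,
    SubscriptionNotCallable,
}

/// The only tools the MCP client ever sees, regardless of registry size.
pub const GATEWAY_TOOLS: [&str; 4] = ["search", "schema", "call", "batch"];

/// search: Subscriptions never appear in results.
pub fn search(registry: &[Op], needle: &str) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|op| op.kind != OpKind::Subscription)
        .filter(|op| op.name.contains(needle))
        .map(|op| op.name)
        .collect()
}

/// call: Subscriptions are rejected even when named directly.
pub fn call(registry: &[Op], name: &str) -> Result<OpKind, CallError> {
    let op = registry
        .iter()
        .find(|op| op.name == name)
        .ok_or(CallError::NotFound)?;
    if op.kind == OpKind::Subscription {
        return Err(CallError::SubscriptionNotCallable);
    }
    Ok(op.kind)
}
```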
http-mcp.md to_mcp section rewritten: the gateway tool set, Subscription
exclusion, and the service behavior (tools/list returns 4 fixed tools,
tools/call dispatches through the gateway). The 'Why' section adds the
tool-bloat rationale and the memory/worktree tool pattern that informed
the design.
README/overview ADR tables and the top-level README current-state note
updated for ADR-041.
The 'WebTransport proxy' concept was conflating two distinct things;
this pass separates them:
1. In-process ALPN-stream-proxy (ADR-040, in alknet-http): the h3 handler
hands a WebTransport stream to another ALPN handler (SshAdapter,
GitAdapter, etc.) as a Connection, so a browser with a WASM parser
can reach any ALPN service via WebTransport. Path-based routing
(the CONNECT path declares the target: /alknet/ssh -> SshAdapter).
HttpAdapter gains Arc<HandlerRegistry> for the lookup. The browser's
WASM parser implements BiStream (ADR-007) over the WebTransport
stream. SSH-over-WebTransport is HTTPS-shaped at the network layer
(anti-censorship: the 'VPN-like without being a VPN' use case on a
clean foundation). russh-sftp demonstrates WASM targeting is
feasible; SSH is the next target.
2. Standalone relay service (OQ-38, future alknet-relay crate): a full
relay (a fork of iroh-relay) with WebTransport proxy fallback for
NAT traversal. This is infrastructure, not a mode of the h3 handler.
OQ-38 reframed to be the standalone-relay scope question (distinct
from the in-process proxy now resolved by ADR-040).
webtransport.md updated: three stream destinations (call protocol,
ALPN-handler proxy, other sub-protocols) with path-based routing; new
'ALPN-stream-proxy' section covering the WASM client side, auth model
(bearer token gates the session; protocol's own auth gates the
protocol session), and the HandlerRegistry reference.
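Path-based routing of the CONNECT path can be sketched as a prefix strip
plus a registry lookup. This HandlerRegistry is a string-keyed stand-in,
not the real type, and treating /alknet/call as the call-protocol path on
WebTransport is an assumption borrowed from the WS default upgrade path.

```rust
use std::collections::HashMap;

/// Where a WebTransport stream ends up.
#[derive(Debug, PartialEq, Eq)]
pub enum StreamDestination<'a> {
    CallProtocol,
    AlpnHandler(&'a str), // label of the registered handler
}

/// Stand-in registry: ALPN name -> handler label.
#[derive(Default)]
pub struct HandlerRegistry {
    handlers: HashMap<String, String>,
}

impl HandlerRegistry {
    pub fn register(&mut self, alpn: &str, handler: &str) {
        self.handlers.insert(alpn.to_string(), handler.to_string());
    }

    /// The CONNECT path declares the target; unknown targets are refused.
    pub fn route(&self, connect_path: &str) -> Option<StreamDestination<'_>> {
        let name = connect_path.strip_prefix("/alknet/")?;
        if name == "call" {
            return Some(StreamDestination::CallProtocol);
        }
        self.handlers
            .get(name)
            .map(|h| StreamDestination::AlpnHandler(h.as_str()))
    }
}
```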
README/overview ADR tables and OQ summaries updated for ADR-040.
Replace AcceptAnyServerCertVerifier (a security hole for X.509) with
verifier selection by PeerEntry presence (ADR-034 §3, OQ-29):
- build_client_auth presents the Ed25519 key as an RFC 7250 raw public
key client cert (replaces with_no_client_auth), activating the
PeerEntry fingerprint -> peer_id resolution path on quinn.
- select_server_verifier: Some(fingerprint) -> FingerprintPinVerifier
(fingerprint match for known peers); None -> WebPkiServerVerifier
(CA verification for public X.509 endpoints). None + Ed25519 raw key
fails closed at handshake (no CA to fall back to).
- FingerprintPinVerifier matches ed25519:<hex> (raw key extraction) and
SHA256:<hex> (DER hash); verifies handshake signatures via
verify_tls13_signature_with_raw_key, verify_tls12_signature, and
verify_tls13_signature.
- Extract shared fingerprint logic into alknet_core::fingerprint (pub
module) reused by endpoint (server-side) and call_client (client-side).
- remote_identity: None is load-bearing (not defaulted to placeholder).
- Integration tests updated to pin the self-signed server cert
fingerprint (the known-peer path).
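The selection rule and its three outcomes can be modeled without any TLS
machinery. The types below are stand-ins, not the rustls verifiers: a
boolean replaces real CA validation and fingerprints are compared as
plain strings.

```rust
/// Which verifier the client installs for a connection.
#[derive(Debug, PartialEq, Eq)]
pub enum ServerVerifier {
    FingerprintPin(String),
    WebPki,
}

/// What the server presents during the handshake (simplified).
pub enum Presented<'a> {
    X509 { ca_valid: bool, fingerprint: &'a str },
    RawEd25519 { fingerprint: &'a str },
}

/// PeerEntry present (Some) -> pin; absent (None) -> CA verification.
pub fn select_server_verifier(remote_identity: Option<&str>) -> ServerVerifier {
    match remote_identity {
        Some(fp) => ServerVerifier::FingerprintPin(fp.to_string()),
        None => ServerVerifier::WebPki,
    }
}

/// Outcome of the handshake check under the selected verifier.
pub fn verify(verifier: &ServerVerifier, presented: &Presented) -> Result<(), &'static str> {
    match verifier {
        ServerVerifier::FingerprintPin(pin) => {
            let fp = match presented {
                Presented::X509 { fingerprint, .. } => *fingerprint,
                Presented::RawEd25519 { fingerprint } => *fingerprint,
            };
            if pin.as_str() == fp { Ok(()) } else { Err("fingerprint mismatch") }
        }
        ServerVerifier::WebPki => match presented {
            Presented::X509 { ca_valid: true, .. } => Ok(()),
            Presented::X509 { ca_valid: false, .. } => Err("CA verification failed"),
            // No CA to fall back to for a raw key: fail closed.
            Presented::RawEd25519 { .. } => Err("raw key without a pinned fingerprint"),
        },
    }
}
```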
Commits the concrete adapter shape deferred by ADR-033: read-sync /
write-async split with honker NOTIFY/LISTEN for no-restart cache
invalidation, against SQLite, in a separate alknet-store-sqlite crate.
Two constraints drive the design: (1) the hot-path read trait
(IdentityProvider::resolve_from_fingerprint, CredentialStore::get) is
sync — called in the accept loop, no .await — so a SQLite-backed
adapter must cache in memory and serve sync reads from the cache; (2)
auth changes must take effect without a restart (an early issue the
project already fixed for ConfigIdentityProvider via ArcSwap config
reload). honker's SQLite NOTIFY/LISTEN (single-digit-ms wake, no
polling) is the cache-invalidation mechanism that makes both hold:
write commits to SQLite + emits NOTIFY, the running process's LISTEN
wakes, the in-memory index reloads and atomically swaps, the next
read sees the new state. Same ArcSwap-reload pattern as config,
generalized from 'config file is source of truth' to 'SQLite is
source of truth, honker signals when it changed.'
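The read-sync / reload-and-swap shape can be sketched with std only.
RwLock<Arc<_>> stands in for ArcSwap, the reloaded index is passed in
directly instead of being read from SQLite, and honker's LISTEN wake is
reduced to a plain method call; CachedIdentityProvider and PeerIndex are
hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// In-memory index: fingerprint -> peer id.
pub type PeerIndex = HashMap<String, String>;

pub struct CachedIdentityProvider {
    index: RwLock<Arc<PeerIndex>>,
}

impl CachedIdentityProvider {
    pub fn new(initial: PeerIndex) -> Self {
        Self { index: RwLock::new(Arc::new(initial)) }
    }

    /// Hot-path read: sync, no .await, served from the current snapshot.
    pub fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<String> {
        let guard = self.index.read().unwrap();
        let snapshot = Arc::clone(&*guard);
        drop(guard); // release the lock before doing the lookup
        snapshot.get(fingerprint).cloned()
    }

    /// Called when the LISTEN side wakes: swap in the freshly loaded index.
    pub fn on_notify(&self, reloaded: PeerIndex) {
        *self.index.write().unwrap() = Arc::new(reloaded);
    }
}
```

Readers holding an older snapshot finish against it; the next read after
the swap sees the new state.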
New async IdentityStore write trait (put_peer / update_peer /
remove_peer) extends the sync IdentityProvider read trait for peer
mutations. ConfigIdentityProvider does NOT implement it (config
reload is its write path — a posture enforced by the absence of a
backend, not a type-system constraint); SqliteIdentityProvider
implements both. CredentialStore::put/delete refined to async (within
ADR-031's one-way door: the contract was get/put/delete keyed by
provider, persisting EncryptedData, never decrypting; sync-vs-async was
unspecified). CredentialStoreError renamed to shared StoreError
covering both traits.
alknet-store-sqlite is one crate implementing both IdentityStore and
CredentialStore with shared SQLite connection + honker LISTEN infra
(splitting later is a two-way door). Schema shape committed (one row
per PeerEntry with JSON columns for fingerprints/scopes/resources;
one row per EncryptedData blob keyed by provider); exact DDL is an
implementation-detail two-way door in the adapter crate. The keypal
adapter-factory pattern is intentionally not ported to Rust (runtime
column-mapping is a TS affordance; in Rust each adapter is a concrete
type, cross-cutting concerns are a shared helper module).
Amends ADR-031 (put/delete async refinement, StoreError rename),
ADR-033 (concrete adapter shape now specified, two-crate framing
collapsed to one), ADR-034 (OQ-36 now resolved), auth.md (IdentityStore
section, cache-invalidation summary, OQ-36 reference), config.md (two
write paths note), and the OQ-36/OQ-34 entries in open-questions.md.
Review fixed 4 criticals (error-type name divergence, duplicate
IdentityProvider sketch, upsert/Duplicate ambiguity, 'shape unchanged'
contradiction), 7 warnings, 5 suggestions.
Untangles the conflation of three distinct remote roles under 'X.509
endpoint': (1) public X.509 endpoint — a remote HTTPS/call-over-TLS
server the local node is a client of (no PeerEntry, no PeerId, not in
the peer graph; CA verification + bearer token); (2) transport relay —
iroh's DERP-equivalent, infrastructure, not an alknet peer; (3) hub /
hosting node — an alknet peer that also exposes a public domain + X.509
for browsers (mixed-fingerprint PeerEntry, already supported by
ADR-030).
The load-bearing one-way door is the client-side verifier selection
rule: known peer (PeerEntry present) → fingerprint pin; unknown X.509
remote → CA verification (WebPkiServerVerifier); unknown Ed25519
remote → fails closed. This closes the AcceptAnyServerCertVerifier
security hole OQ-29 flagged, with the peer-model criterion (PeerEntry
presence) made explicit. The 'make PeerEntry symmetric' instinct is
rejected — pure-client connections to public APIs have no stable
logical identity to pin.
Documents that CallCredentials.remote_identity: None is load-bearing
(None = public X.509 endpoint → CA path, not a missing field; Some =
known peer → fingerprint pin), closing a subtle gap where an
implementer could have defaulted to a placeholder or treated None as
skip-verify.
Records WebTransport relay-as-proxy (deferred with h3/WebTransport,
new OQ-HTTP-07) and on-chain/smart-contract peer discovery (fits the
OQ-36 repo/adapter pattern, no auth-model change) so they aren't lost.
Amends auth.md and client-and-adapters.md with the three-role naming,
the verifier selection rule, and the Option semantics; updates OQ-37
to resolved in open-questions.md, README.md, and both crate READMEs.