alknet

Author	SHA1	Message	Date
glm-5.2	ad167aa470	docs(arch): update core/call specs for ADR-050 — ownership provider + resource_id_path operation-registry.md: - OperationSpec gains resource_id_path: Option<String> (JSON pointer into the input for runtime-spawned resource ID extraction) - AccessControl::check signature updated: consults an OwnershipProvider for dynamic resource ownership; falls back to static Identity.resources when no provider is wired (backward compatible) - Dispatch flow updated: step 3 extracts resource_id via spec.resource_id_path before the ACL check - Added composition + dynamic ownership interaction (ADR-050 §4d): two orthogonal checks, ADR-015/022 unchanged - Design Decisions table + Open Questions + References updated auth.md: - New 'Ownership Provider and Store (ADR-050)' section: OwnershipProvider (sync read trait) + OwnershipStore (async write trait) + InMemoryOwnershipStore default adapter; fourth instance of the repo/adapter pattern (ADR-033) - How it integrates with AccessControl::check - Access pattern: proxy-only (spawner owns, proxy to share, teardown revokes; no grant mechanism in core) - Per-node ownership (no cross-node propagation in the base model) - Resource-scoped ACLs table gains the dynamic ownership path - Design Decisions table + Open Questions updated	2026-07-05 08:50:04 +00:00
glm-5.2	f6ddd37433	docs(arch): add ADR-050 — dynamic resource ownership for runtime-spawned resources Writes OQ-42's five decisions into ADR format: 1. Storage: reuse the repo/adapter pattern (ADR-033, fourth instance alongside IdentityProvider/IdentityStore/CredentialStore). New traits: OwnershipProvider (sync read, hot-path) + OwnershipStore (async write, handler lifecycle). In-memory default; persistence adapter additive. 2. Integration: AccessControl::check consults the ownership provider directly (Option 2). OperationSpec gains resource_id_path (JSON pointer into the input). Backward-compatible — ownership=None falls back to the static Identity.resources path. 3. Access pattern: proxy-only. Spawner owns, proxy to share via from_call + forwarded_for (ADR-032), teardown revokes. No grant mechanism in core. Future grant is additive (new trait method), stated as reversal-cost classification, not deferral. 4. Four edge specifics: list = scope-gate + result-filter; teardown = automatic, handler-driven; fleet = per-node ownership, downstream app tracks 'who is this for'; composition = two orthogonal checks, ADR-015/022 unchanged. Reviewed: zero critical issues. Two warnings fixed (None-handling in the check sketch, missing ADR-004 cross-ref). One suggestion applied ('v1 mechanism' → 'initial mechanism' to avoid hedging misread).	2026-07-04 16:08:04 +00:00
glm-5.2	13dd15ab0b	docs(arch): resolve OQ-42 — proxy-only ownership model for runtime-spawned resources Lock in the dynamic resource ownership model for runtime-spawned resources (containers, TTYs, runner workspaces). Three decisions: 1. Storage: reuse the repo/adapter pattern (ADR-033, fourth instance alongside IdentityProvider/IdentityStore/CredentialStore) with an in-memory default adapter; sync read on the dispatch hot path. 2. Integration: AccessControl::check consults an ownership provider directly (Option 2); OperationSpec gains resource_id_path (JSON pointer into the input) so the spec is fully self-describing for authorization. 3. Access pattern: proxy-only — spawner owns, proxy to share via from_call + forwarded_for (ADR-032), teardown revokes. No grant mechanism in core; 'poking holes' is a downstream-app concern. A future grant is additive (new trait method), stated as reversal-cost classification, not deferral. Four edge specifics pinned: list = scope-gate + result-filter; teardown = automatic, handler-driven; fleet = per-node ownership, downstream app tracks 'who is this for'; composition = two orthogonal checks, ADR-015/022 unchanged. Removes the prior hedging language ('decision direction set', 'open for the ADR') and the contingent qualifiers from specifics 3/4 now that the proxy-vs-grant call is made. The dependent crate specs (docker, tty, runner, fleet) can declare their AccessControl shapes against this model.	2026-07-04 16:02:38 +00:00
glm-5.2	3daecd7ab2	fix(process): add architect Safe Exit for deferred decisions, clean hedging language - Add Safe Exit section to architect spec: when a decision genuinely can't be made, mark OQ as deferred(scope) with concrete blocking condition, create blocker task, move on - Add anti-patterns #10-#11 covering hedging synonyms (feature extension, additive, not a v1 blocker, for now, resolved with escape hatch) - Add hedging audit to architect self-review checklist - Clean hedging language from resolved OQs (OQ-04, OQ-13, OQ-14, OQ-16) - Reclassify OQ-32 and OQ-41 as deferred(scope) with blocking conditions - Add deferred(scope) status to OQ format in sdd_process.md	2026-07-04 15:31:04 +00:00
glm-5.2	f390550a06	docs(arch): set OQ-42 decision direction — repo/adapter storage + Option 2 integration Two structural decisions for dynamic resource ownership (OQ-42), recorded in the OQ so ADR drafting starts from a clear position: 1. Storage side reuses the repo/adapter pattern (ADR-033) — a fourth instance alongside IdentityProvider/IdentityStore/CredentialStore. Trait in alknet-core with an in-memory default adapter; persistence adapter separable. Sync read with ArcSwap + honker-NOTIFY cache invalidation, same shape as ConfigIdentityProvider (ADR-035). No new shape invented; no Phase 0 needed for the storage side. 2. Integration point is Option 2 — AccessControl::check consults the ownership provider directly. Rejected Option 1 (augment identity with a per-request snapshot) because its purity was theatrical — the question 'can X exec into container C' was never purely a function of identity, it just looked that way because the resource set was static. Option 2 makes check's signature honest about what ACL checking is in the presence of dynamic resources. Cost is a check signature change (one-way door, every call site updates) — implementation cost, not semantic cost, per the project's decision principle. Refinement that makes Option 2 clean: OperationSpec gains resource_id_path (JSON pointer into the input, e.g. '$.containerId'). Fits naturally with the existing JSON-Schema-backed input_schema — the pointer is within an existing schema on the same spec. OperationSpec becomes fully self-describing for authorization: resource type, action, and which input field drives the resource lookup, all declared on the spec. Four specifics remain open for the ADR: the no-specific-resource (list) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. These were surfaced by choosing Option 2 rather than by leaving the integration point undecided.	2026-07-04 13:04:14 +00:00
glm-5.2	e29672942c	docs(arch): record OQ-42 — dynamic resource ownership for runtime-spawned resources The alknet-docker POC research surfaced that containers are a natural AccessControl resource, but the resource set is dynamic (containers are created at runtime) and ownership is derived from creation — which the current static Identity.resources model (config-sourced via PeerEntry/CompositionAuthority) doesn't fit. The issue generalizes to every crate that spawns a thing at runtime and exposes it over the call protocol (docker, tty, opencode-runner wrapper, alknet-container fleet layer); solving it per-crate would diverge. Recording as OQ-42 in the centralized tracker with the generalized framing so the architecture workflow sees it: one-way door at the model level (core/call), two-way at the mechanism level, high priority, blocks the dependent crate specs. A Phase 0 research/POC pass is likely warranted before the ADR.	2026-07-04 11:38:23 +00:00
glm-5.2	8c7443c7c6	docs(research): fix alknet-docker POC normalization crate boundary — alknet-compute is a workload, not the fleet layer; add head-worker/machine-node model and dispatch reverse-runner prior art	2026-07-03 10:28:53 +00:00
glm-5.2	4500338384	docs(research): add alknet-tty phase-0 findings — terminal session protocol as separate ALPN, TtyBackend trait, dissolves alknet-ssh PTY hedge	2026-07-03 07:48:21 +00:00
glm-5.2	157f1dfb18	docs(research): add alknet-docker POC summary — validates two-carriage model (JSON + raw) for bollard docker ops over framed bidi streams	2026-07-02 17:08:52 +00:00
glm-5.2	7ecc11610a	docs(arch): ADR-049 — streaming handler for subscription operations The call protocol spec describes streaming (call.responded*N + call.completed, PendingRequestMap::Subscribe, CallConnection::subscribe), but the server-side Handler type returned a single ResponseEnvelope — a Subscription op had no way to produce a stream. The TS predecessor (@alkdev/operations) had separate OperationHandler / SubscriptionHandler types; the Rust port collapsed them, losing the streaming path. This restores it end-to-end: StreamingHandler type, HandlerKind on HandlerRegistration validated against op_type, invoke_streaming() on OperationRegistry, server-side dispatch branches on op_type, new INVALID_OPERATION_TYPE protocol code for wrong-dispatch-path misuse, GatewayDispatch::invoke_streaming() for /subscribe SSE, from_call stream forwarding via CallConnection::subscribe(), from_openapi SSE forwarding. OperationEnv::invoke() stays request/response-only (stream composition is handler-level, not protocol-level). Amends ADR-023's protocol-code list (five → six). Tracks the stream-operators library as OQ-41 (feature extension, not an unmade decision).	2026-07-02 07:43:01 +00:00
glm-5.2	e0c6f61e6a	docs(http): pre-decomposition sanity check fixes — /subscribe POST, direct-call cleanup, from_mcp output handling Three issues found in the http crate spec sanity check that would have caused problems during task decomposition, now fixed: C1 — /subscribe GET→POST: the gateway's /subscribe is an invoke endpoint carrying { operation, input } in the body, but was listed as GET (which has no body). Flipped to POST with Accept: text/event-stream negotiating the SSE response, consistent with /call's flat-JSON-body invariant. Browsers using EventSource can't POST but use WebSocket for the bidirectional path; the HTTP gateway's /subscribe is for non-browser HTTP clients (fetch + ReadableStream). Touches ADR-042, ADR-047, ADR-048, http-adapters.md, http-server.md. C2 — stale direct-call references: three spots contradicted ADR-047 (which removed the POST /{service}/{op} direct-call surface) and ADR-046 §3 (which states /{service}/{op} is no longer reserved). Cleaned up in http-server.md (custom-routes intro + collision list) and ADR-046 §6 (default-surface list). W2 — from_mcp output handling: the spec's fallback for tools without outputSchema was Type.Unknown(), but the correct fallback is the MCP ContentBlock union (text|image|audio|resource|resource_link) — a well-defined MCP type, not Unknown. Fixed http-mcp.md with the full structuredContent-preferred-over-content-blocks logic (matching the TS adapter and rmcp SDK), enriched references with specific rmcp source files. Also added shared-dispatch-spine notes to http-mcp.md and http-adapters.md cross-referencing the new research findings. Research (docs/research/alknet-http-gateway-factoring/findings.md): to_mcp and to_openapi share a dispatch spine (resolve → invoke → map). Recommendation: extract a thin shared struct now, not a GatewayDispatch trait — the server-integration layers (axum routes vs rmcp StreamableHttpService) and wire-framing stay per-gateway. 
A third gateway is not on the horizon; if one appears its server-integration needs its own shape anyway. Minor: WS route precedence note (websocket.md), OpenAPISpec shared-type-not-shape clarification (http-adapters.md), date bumps.	2026-07-01 05:41:07 +00:00
glm-5.2	3edc42e3b4	docs(compute): add wonnx + handlebars/wgpu reference implementations Document the two codebases that inform the ShaderGenerator's op table and the wgpu+handlebars+remote-GPU patterns: - wonnx (MIT/Apache-2.0, archived): comprehensive ONNX op set in Tera-templated WGSL at wonnx/templates/ — arithmetic, activation, gemm, conv, batchnorm, softmax, etc. Port the shader implementations, swap Tera for handlebars. compiler.rs's add_raw_template + include_str! pattern maps 1:1 to handlebars-rs register_template_string. - Handlebars + wgpu + remote-GPU patterns (private reference, patterns reusable): validates the handlebars-rs side and the vast.ai deployment shape. Patterns carried over: {{> partial}} includes for shared fragments, inline-able constant tables via switch statements (SHA-256 k-values, universal across wgpu versions), default-valued template parameters, wgpu-on-remote-GPU sync. sha256 as a base shader demonstrating non-ML compute on the same dispatch surface. Updated the WGSL codegen probe POC to reference wonnx's op set as the porting source.	2026-06-30 13:05:54 +00:00
glm-5.2	303b9a58e2	docs(research): split alknet-tensor into alknet-runtime + alknet-compute + alknet-tensor Extract the shared JS+wgpu substrate (verified by the alknet-desktop POCs) as alknet-runtime — the generalized QuickJS-NG + wgpu runtime that both alknet-desktop (render) and alknet-compute (tensor compute) build on. Key property driving the split: wgpu on llvmpipe is genuinely useful compute with no physical GPU (WGSL → optimized SIMD beats JS for non-trivial workloads), so wgpu is unconditional in the runtime rather than a feature flag. Reframes the original alknet-tensor architecture-summary as alknet-compute (builds on alknet-runtime + alknet-tensor) with ShaderGenerator as a trait (WGSL first impl, SPIR-V/GLSL/naga-IR later per wgpu multi-input-language support). alknet-tensor/metatensor-format.md is now clearly the pure binary format crate (no JS or wgpu dep), usable standalone by a pure-Rust model server. Layering: alknet-runtime depends on alknet-call (registry authority stays per ADR-013); alknet-compute and alknet-desktop depend on alknet-runtime; alknet-tensor is a pure-format sibling.	2026-06-30 12:44:39 +00:00
glm-5.2	b71db99753	docs(http): add ADR-048 and websocket.md — WS carries native session, not gateway Promote the WebSocket browser path from a section in http-server.md to a first-class spec (websocket.md) and commit the contract-pattern decision (ADR-048): a WS connection carries the native EventEnvelope call-protocol session, not the HTTP gateway shape. The gateway endpoints are HTTP-only; discovery on WS is via services/list and services/schema as ordinary call-protocol ops; subscriptions project as native call.responded events (no SSE). ADR-044 already decided WS as the v1 browser bidirectional path; ADR-048 clarifies the shape of what ADR-044 committed (§1 implies native session; the ADR makes it an explicit implementer-visible rule). The from_wss adapter (importing a remote node's ops over WS) is recorded as out-of-scope with a concrete reversal trigger so it is not re-derived later. Spec cleanup: http-server.md WS section collapsed to a stub pointer; websocket.md Why section references ADRs rather than re-arguing them; length-prefix decision made canonical (no prefix on WS — message boundary is the delimiter); default upgrade path pinned (/alknet/call) with HTTP/2 extended CONNECT noted; indexes (README, http/README, overview) updated.	2026-06-30 12:27:00 +00:00
glm-5.2	2a6e4c371a	docs(http): resolve OQ-39; add ADRs 045-047; record pubsub prior art for WS path OQ-39 (to_openapi published-spec versioning) resolved by ADR-045: info.version semver tracks the gateway endpoint contract, not the operation set — per-caller operations discovered via /search do not bump the version. The gateway pattern (ADR-042) dissolved most of the original churn concern. ADR-046: assembly-layer custom HTTP routes on HttpAdapter. The HTTP router had no documented extension point for deployment-specific endpoints (e.g., an OAI-compatible proxy at /v1/chat/completions). Adds extra_routes: Option<Router> at construction; raw HTTP, not operations; default surface takes precedence on collision. The mechanism is the one-way door; specific routes are two-way. ADR-047: remove the direct-call POST /{service}/{op} HTTP surface. The gateway /call is the sole invoke path — the simplified contract is a few fixed endpoints, not a per-operation REST tree. The direct-call surface re-introduced the 'dump the full API regardless of privs' failure mode at the HTTP level that the gateway /search was built to escape. ADR-036's routing decision is superseded; its non-routing clauses (SSE, Bearer auth, /healthz, stealth, error mapping) survive. A deployment wanting a REST-like per-operation surface builds it as a custom route projection (ADR-046). ADR-044 updated with the tradeoff framing (WSS is the right tool for the call-protocol-from-browser case; WebTransport is the right tool for the generalized ALPN-stream-proxy case we don't have yet — coexist, not migrate) and the @alkdev/pubsub concrete prior art (the EventEnvelope {type,id,payload} the call protocol was derived from already has a working WebSocket client/server; the sync is a small adjustment, not a from-scratch build). call-protocol.md references the pubsub lineage for the transport-agnosticism claim.	2026-06-30 09:49:25 +00:00
glm-5.2	3327d585da	docs(http): resolve OQ-40 reqwest client config — ClientWithMiddleware + retry/retry-after middleware stack OQ-40 resolved: alknet-http owns a shared reqwest_middleware::ClientWithMiddleware (not a bare reqwest::Client) with a two-layer middleware stack — RetryTransientMiddleware (reqwest-retry, exponential backoff on transient failures) + inlined RetryAfterMiddleware (from melotic/reqwest-retry-after, MIT, ~50 lines, inlined to bound the upstream's unbounded HashMap storage). The two are complementary: reqwest-retry's default strategy does not honor Retry-After. Hot-reload is rebuild-and-swap via ArcSwap (same pattern as ConfigIdentityProvider, ADR-035); a rebuild drops the connection pool, which is acceptable since a config change wanting a fresh pool is the trigger. The three one-way constraints stand unchanged: alknet-http owns its client (no env-var config, no shared global), credentials inject per-request from OperationContext.capabilities, outbound TLS uses the system trust store. Records the downstream layering boundary: the agent crate's provider SSE normalization (the solid part of aisdk's pattern — Vercel-UI-message normalization) sits on top of this client, consuming the reqwest::Response stream; it does not replace the client. The aisdk core/client.rs reference for client construction is dropped (env-var config + hand-rolled retry are the anti-patterns discarded); the from_openapi.ts SSE normalization reference in the forwarding-handler section is kept (separate, solid pattern). No ADR — the decision is internal to alknet-http: the client type does not cross crate boundaries (alknet-call never sees reqwest), the library choice is reversible, and it does not touch the system's structure, constraints, or cross-crate API surface. 
Updates: http-adapters.md (HTTP client section rewritten, references updated, constraints/OQ bullets updated), http-mcp.md (OQ-40 status flip), open-questions.md (OQ-40 resolved with full config-shape table), README.md (OQ-40 folded into the existing two-way-doors bucket), and three secondary docs (crates/http/README.md, overview.md, http-server.md) that carried stale 'open' OQ-40 references.	2026-06-30 08:02:30 +00:00
glm-5.2	125cb49cc4	docs(http): defer h3/WebTransport (ADR-044); browsers use WebSocket for v1 Working through the WebTransport implementation path surfaced a scope question distinct from the hedging-as-deferral anti-pattern ADR-038 was written to correct. Three findings drove the re-evaluation: 1. The browser bidirectional call-protocol path doesn't require WebTransport — WebSocket is full-duplex, EventEnvelope fits a WS binary message boundary cleanly, and the Dispatcher is stream-agnostic (ADR-012). What WebTransport gives over WebSocket (native multi-stream multiplexing, the ALPN-as-stream substrate) benefits the proxy use case, not the call protocol. 2. WebTransport is a draft standard (-07, not RFC) on an experimental Rust dependency stack (wtransport/h3 both self-describe as not production-ready). Either choice puts a draft protocol on the security surface of the first release. 3. The ALPN-stream-proxy (ADR-040) is speculative — its WASM parser consumers (browser SSH/SFTP/git clients) don't exist yet, and the downstream crates that WebTransport deferral blocks (SSH, git, SFTP) expose their ALPNs natively over QUIC regardless. This is a scope decision (per ADR-009: a decision that 'genuinely doesn't need to be made yet because the use case isn't concrete'), not hedging. The reversal trigger is concrete: a real deployment needing the ALPN-stream-proxy. ADR-038 is superseded (its anti-pattern correction stands; its specific 'h3 in scope now' decision is reversed). ADR-040 and ADR-043 are parked, not superseded — their designs revive unchanged when WebTransport revives, with §2 (bidirectionality) and §3 (no-PeerId overlay) of ADR-043 transferring to WebSocket for v1. ADR-044 §5 also states the 'browser is not a peer' rationale that ADR-034 §4 closed without arguing: peer = addressable node in the call-protocol peer graph (stable PeerId, PeerRef::Specific-reachable, identity stable across reconnects), not 'any endpoint that exchanges calls during a live session.' 
A browser is the second but not the first (no stable crypto identity of its own, ephemeral, not addressable from other nodes). ADR-034 §4 and Assumption 2 are amended by reference. The wtransport-vs-hyperium dependency question is recorded (not resolved — WebTransport is deferred) in ADR-044 §'Research note' and webtransport.md so the revival doesn't re-derive it: wtransport probably isn't the right choice (axum-bridge friction — it owns its own HTTP serving path); the hyperium stack (h3 + h3-quinn + h3-webtransport) fits the axum integration better but its server-side WebTransport API needs verification before commitment. Reviewed by architecture-review subagent; all critical cross-reference issues (ADR-034 §5 stale 'in scope' assertion, ADR-036 Context listing h3 as implemented, webtransport.md Design Decisions table) resolved.	2026-06-30 05:55:55 +00:00
glm-5.2	78b226d31b	docs(research): revise alknet-ssh phase-0 — channel decomposition, WebTransport grounding, WASM client Reframes the SSH scope around the channel multiplexer as the decomposition point. Each feature (forwarding, SOCKS5, SFTP) is a channel type or a consumer of channel types, stacking on the core — each layer functional when built, none shipped broken. Dissolves the 'massive v1' framing that produced hedging language proposing non-functional or half-built versions. Three developments since the initial 2026-06-25 research changed the framing: (1) WebTransport landed as ADRs 038/040/043, grounding SSH-over-WebTransport as a constraint (the handler must be source-agnostic about its Connection); (2) russh's runtime abstraction (russh-util swaps tokio::spawn for wasm_bindgen_futures on wasm32) means the SSH client runs in WASM when fed a WebTransport BiStream — the browser case is real, not speculative; (3) the http crate intersection (ALPN-stream-proxy depends on SSH handlers being source-agnostic) is now visible and specified. The layered build order (1-4 stream+connection+channels+exec, then 5 forwarding, then 6 SOCKS5, then 7 SFTP) doubles as the configuration surface: each layer beyond the core is an opt-in channel type, gating on the default-deny ACL baseline inherited from russh.	2026-06-29 13:03:11 +00:00
glm-5.2	0a78306686	docs(http): add ADR-043 WebTransport bidirectional ALPN substrate; fix spec drift from mid-spec pivot A consistency review of the alknet-http specs found two classes of issues: internal contradictions from the mid-spec pivot (the to_openapi gateway pattern landed in prose but not in cross-references), and a systematic client→server assumption that only holds for the OpenAPI/MCP case leaking into the WebTransport architecture. Class 1 (internal contradictions): - C1: to_openapi was half-refactored — body described the ADR-042 gateway pattern but the decisions table and ADR-036 still said 'paths mirror /{service}/{op}'. ADR-036's to_openapi clause is now amended as superseded by ADR-042; the stale decisions row and README Principle 2 are fixed. - C2: the axum Router route list didn't include the 5 gateway endpoints (/search, /schema, /call, /batch, /subscribe). Added them; clarified /openapi.json as the gateway description doc; added gateway paths to the decoy exclusion list. - C3: ADR-034 §5 still talked about the 'h3/WebTransport deferral bucket' that ADR-038 eliminated. Amended §5/Consequences/References to drop the deferral framing (the auth-model decision stands; only the 'when' wording was stale). Class 2 (one-way direction assumption): - C4/C5/C6: the WebTransport specs framed the session as browser→hub one-way, when the call protocol is bidirectional and WebTransport is a general ALPN transport substrate. New ADR-043 reframes WebTransport as a bidirectional ALPN transport substrate (call protocol is the first/canonical target; needs no WASM parser), names the call protocol's bidirectionality over WebTransport sessions, and states the inbound no-PeerId connection-local overlay as the mirror of ADR-034 §2. webtransport.md is updated to reflect this framing; ADR-040 is repositioned (not superseded) as the substrate's non-call-ALPN mechanism. 
- C7: the HTTP/1.1+HTTP/2 surface's one-directionality is now named as a lossy consequence of HTTP request/response; WebTransport is named as the surface that restores the bidirectional call model. - C8: overview.md acknowledges the from/to direction model is OpenAPI/MCP-specific, not a call-protocol property. A review subagent pass on ADR-043 + webtransport.md found no critical issues; warnings W1-W3 (residual browser-as-subject framing, ADR-009 rationale in spec, opening abstract tone) and suggestions S2/S4/S5 were addressed.	2026-06-29 10:43:18 +00:00
glm-5.2	69ebe58bab	docs(http): add ADR-042 OpenAPI gateway pattern for to_openapi The to_openapi spec was describing one OpenAPI path per alknet operation — the inverse of from_openapi. That inverse is genuinely messy: the call protocol's input is a flat JSON object, and generating a traditional OpenAPI path entry (POST /fs/{path} with path param, body, query params) requires reverse-engineering which fields are path/query/body — metadata the call protocol doesn't carry. The three options (leaky HTTP metadata on OperationSpec, fragile heuristics, manual annotation) are all messy. ADR-042 replaces this with the gateway pattern (same as ADR-041 for to_mcp): to_openapi generates 5 fixed endpoints (search, schema, call, batch, subscribe) that gate access to the full operation registry. The input is always a flat JSON body — no path/query/body split to reverse-engineer. JSON Schema is already in the OperationSpec. The per-caller API surface is the key advantage: /search is AccessControl-filtered, so the client sees only what it can call. The Gitea failure mode (dumping admin ops to every caller in a static OpenAPI doc) is structurally impossible — the per-caller surface is the default, not an afterthought. OpenAPI has no per-caller filtering concept; the gateway pattern provides it through /search. Gateway endpoint set: - /search -> services/list (AccessControl-filtered, names + descriptions) - /schema -> services/schema (full OperationSpec) - /call -> call.requested (Query/Mutation, flat JSON body) - /batch -> multiple call.requested (correlated IDs) - /subscribe -> call.requested (Subscription, SSE) — the one endpoint the MCP gateway excludes (MCP is request/response; OpenAPI/SSE supports streaming) A traditional per-operation-paths projection is additive (a deployment that wants the nice Swagger UI builds it with HTTP-specific metadata), not a replacement. The gateway is the default. 
http-adapters.md to_openapi section rewritten: the gateway endpoint set, per-caller filtering, error fidelity on the /call endpoint, and the additive traditional projection. The 'Why' section adds the flat->structured and per-caller-surface rationale. README/overview ADR tables and the top-level README current-state note updated for ADR-042.	2026-06-29 09:33:39 +00:00
glm-5.2	5fc074713c	docs(http): add ADR-041 MCP tool-gateway pattern for to_mcp The to_mcp spec was describing one MCP tool per alknet operation — the tool-bloat problem. An LLM connecting to a node with 200 operations gets 200 MCP tools dumped into its context, degrading reasoning and wasting context budget. ADR-041 replaces this with the tool-gateway pattern (same pattern as opencode's memory and worktree tools): to_mcp exposes 4 fixed meta-tools (search, schema, call, batch) that gate access to the full operation registry. The LLM has a few tools in context, discovers operations on demand through search + schema, then calls. Same principle as Linux's man command — don't preload all documentation; query on demand. Gateway tool set: - search -> services/list (names + descriptions, AccessControl-filtered) - schema -> services/schema (full OperationSpec for a specific op) - call -> call.requested (Query/Mutation only, request/response) - batch -> multiple call.requested (correlated IDs, OQ-14) Subscription operations are excluded — MCP tool calls are request/response by protocol design (the client blocks until CallToolResult returns); streaming subscriptions don't fit. Subscriptions are filtered out of search results and cannot be invoked via call. http-mcp.md to_mcp section rewritten: the gateway tool set, Subscription exclusion, and the service behavior (tools/list returns 4 fixed tools, tools/call dispatches through the gateway). The 'Why' section adds the tool-bloat rationale and the memory/worktree tool pattern that informed the design. README/overview ADR tables and the top-level README current-state note updated for ADR-041.	2026-06-29 08:34:44 +00:00
glm-5.2	398e3d512d	docs(http): add ADR-040 WebTransport ALPN-stream-proxy and reframe OQ-38 The 'WebTransport proxy' concept was conflating two distinct things; this pass separates them: 1. In-process ALPN-stream-proxy (ADR-040, in alknet-http): the h3 handler hands a WebTransport stream to another ALPN handler (SshAdapter, GitAdapter, etc.) as a Connection, so a browser with a WASM parser can reach any ALPN service via WebTransport. Path-based routing (the CONNECT path declares the target: /alknet/ssh -> SshAdapter). HttpAdapter gains Arc<HandlerRegistry> for the lookup. The browser's WASM parser implements BiStream (ADR-007) over the WebTransport stream. SSH-over-WebTransport is HTTPS-shaped at the network layer (anti-censorship: the 'VPN-like without being a VPN' use case on a clean foundation). russh-sftp demonstrates WASM targeting is feasible; SSH is the next target. 2. Standalone relay service (OQ-38, future alknet-relay crate): a full relay - fork of iroh-relay - with WebTransport proxy fallback for NAT traversal. This is infrastructure, not a mode of the h3 handler. OQ-38 reframed to be the standalone-relay scope question (distinct from the in-process proxy now resolved by ADR-040). webtransport.md updated: three stream destinations (call protocol, ALPN-handler proxy, other sub-protocols) with path-based routing; new 'ALPN-stream-proxy' section covering the WASM client side, auth model (bearer token gates the session; protocol's own auth gates the protocol session), and the HandlerRegistry reference. README/overview ADR tables and OQ summaries updated for ADR-040.	2026-06-29 07:56:35 +00:00
glm-5.2	ab47dac4ad	docs(http): draft alknet-http architecture specs and ADRs 036-039 First speccing pass for alknet-http (HTTP interface crate: h2/http1.1/h3 server + from_openapi/to_openapi/from_mcp/to_mcp adapters). Specs (crates/http/): - README.md, overview.md — crate index, two-roles-in-one-crate framing, adapter location map, feature gates (h3, mcp), no-env-vars invariant - http-server.md — HttpAdapter for h2/http1.1, axum over QUIC stream, Bearer auth, SSE projection for subscriptions, /healthz, stealth decoy - http-adapters.md — from_openapi (reqwest) and to_openapi (projection), error fidelity (HTTP_<status> per ADR-023), type definitions - http-mcp.md — from_mcp/to_mcp (feature-gated), streamable-HTTP-only - webtransport.md — h3/WebTransport handler, browser streaming path, HTTP/3 request vs WebTransport session distinguished at framing layer ADRs: - ADR-036 HTTP-to-Call Operation Mapping (Proposed) — direct path mapping; to_openapi is projection, not router (the load-bearing one-way door from Phase 0 DH-3) - ADR-037 MCP Stdio Transport Exclusion (Proposed) — streamable HTTP only; stdio is not built (RCE-vector security position) - ADR-038 HTTP/3 and WebTransport as First-Class HTTP Transports (Proposed) — corrects the Phase 0 DH-2 deferral framing; h3 is in scope, not deferred, per ADR-009 §'What this framework is NOT' - ADR-039 HTTP Server and Client Host Colocated in alknet-http (Proposed) — one crate for server + client host (shared HTTP deps, shared operation-spec->HTTP mapping) - ADR-003 Amendment 1 — clarifies alknet-call is a protocol-foundation crate (the alknet-http -> alknet-call dependency edge) Open questions (OQ-38, OQ-39, OQ-40 added under 'Theme: alknet-http'): - OQ-38 WebTransport relay-as-proxy scope (genuine scope question, not a deferral — the decision is made when the use case becomes concrete) - OQ-39 to_openapi published-spec versioning (one-way after first publication) - OQ-40 reqwest client config and connection pooling (two-way-door) Architecture README and overview updated with doc table, ADR table (036-039), current-state note, and crate graph (alknet-http -> alknet-call edge). Reviewed by architecture-reviewer subagent: 3 critical, 4 warning, 5 suggestion issues found and fixed (missing ADR-039, WebTransport stream routing conflation, undefined types, stale OQ-37 deferral language, README OQ table completeness, Bearer-only attribution, cross-references, ADR-038 ALPN quote, feature-gate placeholder, MCP temporal language).	2026-06-29 05:53:38 +00:00
glm-5.2	4a52779460	docs(arch): amend call specs for ADR-029/030/032/034 — peer-keyed routing, PeerEntry, forwarded-for, three roles Sync the call crate specs to the accepted ADRs 029-034: - operation-registry: PeerCompositeEnv (peer-keyed overlays), invoke_peer/ PeerRef routing, retire remote_safe/trusted_peer, AccessControl-based peer auth, forwarded_for on OperationContext (ADR-029/030/032) - call-protocol: peer-keyed compose_root_env, forwarded_for in call.requested payload, build_root_context forwarded_for parameter (ADR-029/032) - client-and-adapters: CallClient verifier selection by PeerEntry presence, remote_identity: None load-bearing, three remote roles (ADR-034) - README: ADR-029/030/032/034 in applicable ADRs table	2026-06-28 21:08:26 +00:00
glm-5.2	0de2cebb1d	docs(arch): ADR-035 — concrete persistence adapter shapes, resolve OQ-36 Commits the concrete adapter shape deferred by ADR-033: read-sync / write-async split with honker NOTIFY/LISTEN for no-restart cache invalidation, against SQLite, in a separate alknet-store-sqlite crate. Two constraints drive the design: (1) the hot-path read trait (IdentityProvider::resolve_from_fingerprint, CredentialStore::get) is sync — called in the accept loop, no .await — so a SQLite-backed adapter must cache in memory and serve sync reads from the cache; (2) auth changes must take effect without a restart (an early issue the project already fixed for ConfigIdentityProvider via ArcSwap config reload). honker's SQLite NOTIFY/LISTEN (single-digit-ms wake, no polling) is the cache-invalidation mechanism that makes both hold: write commits to SQLite + emits NOTIFY, the running process's LISTEN wakes, the in-memory index reloads and atomically swaps, the next read sees the new state. Same ArcSwap-reload pattern as config, generalized from 'config file is source of truth' to 'SQLite is source of truth, honker signals when it changed.' New async IdentityStore write trait (put_peer / update_peer / remove_peer) extends the sync IdentityProvider read trait for peer mutations. ConfigIdentityProvider does NOT implement it (config reload is its write path — a posture enforced by the absence of a backend, not a type-system constraint); SqliteIdentityProvider implements both. CredentialStore::put/delete refined to async (within ADR-031's one-way door — the contract was get/put/delete keyed by provider persisting EncryptedData never decrypting; sync-vs-async was unspecified). CredentialStoreError renamed to shared StoreError covering both traits. alknet-store-sqlite is one crate implementing both IdentityStore and CredentialStore with shared SQLite connection + honker LISTEN infra (splitting later is a two-way door). 
Schema shape committed (one row per PeerEntry with JSON columns for fingerprints/scopes/resources; one row per EncryptedData blob keyed by provider); exact DDL is an implementation-detail two-way door in the adapter crate. The keypal adapter-factory pattern is intentionally not ported to Rust (runtime column-mapping is a TS affordance; in Rust each adapter is a concrete type, cross-cutting concerns are a shared helper module). Amends ADR-031 (put/delete async refinement, StoreError rename), ADR-033 (concrete adapter shape now specified, two-crate framing collapsed to one), ADR-034 (OQ-36 now resolved), auth.md (IdentityStore section, cache-invalidation summary, OQ-36 reference), config.md (two write paths note), and the OQ-36/OQ-34 entries in open-questions.md. Review fixed 4 criticals (error-type name divergence, duplicate IdentityProvider sketch, upsert/Duplicate ambiguity, 'shape unchanged' contradiction), 7 warnings, 5 suggestions.	2026-06-28 11:10:31 +00:00
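The read-sync / reload-and-swap half of ADR-035 can be sketched with only the standard library. This is a hypothetical illustration: `CachedIndex` and its methods are invented names, `RwLock<Arc<_>>` stands in for ArcSwap, and the async SQLite write plus honker NOTIFY/LISTEN wiring is omitted (the caller of `reload` plays the role of the woken LISTEN task).

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical cached index (fingerprint -> peer_id) that a SQLite-backed
/// IdentityProvider could serve sync reads from. std's RwLock<Arc<_>> stands
/// in for the ArcSwap crate so the sketch has no external dependencies.
#[derive(Default)]
pub struct CachedIndex {
    current: RwLock<Arc<HashMap<String, String>>>,
}

impl CachedIndex {
    /// Hot-path read: synchronous, no .await. The lock is held only long
    /// enough to clone the Arc; the lookup runs on the snapshot.
    pub fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<String> {
        let snapshot: Arc<HashMap<String, String>> = self.current.read().unwrap().clone();
        snapshot.get(fingerprint).cloned()
    }

    /// Called when the LISTEN side wakes after a committed write: the caller
    /// has re-read the rows from SQLite, and the fresh index replaces the old
    /// one in a single swap. Readers holding the old snapshot are unaffected.
    pub fn reload(&self, fresh: HashMap<String, String>) {
        *self.current.write().unwrap() = Arc::new(fresh);
    }
}
```

The next read after `reload` sees the new state with no restart, which is the property the commit attributes to the NOTIFY/LISTEN-driven swap.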
glm-5.2	6cc8715ccf	docs(arch): ADR-034 — outgoing-only X.509 and three peer roles, resolve OQ-37 Untangles the conflation of three distinct remote roles under 'X.509 endpoint': (1) public X.509 endpoint — a remote HTTPS/call-over-TLS server the local node is a client of (no PeerEntry, no PeerId, not in the peer graph; CA verification + bearer token); (2) transport relay — iroh's DERP-equivalent, infrastructure, not an alknet peer; (3) hub / hosting node — an alknet peer that also exposes a public domain + X.509 for browsers (mixed-fingerprint PeerEntry, already supported by ADR-030). The load-bearing one-way door is the client-side verifier selection rule: known peer (PeerEntry present) → fingerprint pin; unknown X.509 remote → CA verification (WebPkiServerVerifier); unknown Ed25519 remote → fails closed. This closes the AcceptAnyServerCertVerifier security hole OQ-29 flagged, with the peer-model criterion (PeerEntry presence) made explicit. The 'make PeerEntry symmetric' instinct is rejected — pure-client connections to public APIs have no stable logical identity to pin. Documents that CallCredentials.remote_identity: None is load-bearing (None = public X.509 endpoint → CA path, not a missing field; Some = known peer → fingerprint pin), closing a subtle gap where an implementer could have defaulted to a placeholder or treated None as skip-verify. Records WebTransport relay-as-proxy (deferred with h3/WebTransport, new OQ-HTTP-07) and on-chain/smart-contract peer discovery (fits the OQ-36 repo/adapter pattern, no auth-model change) so they aren't lost. Amends auth.md and client-and-adapters.md with the three-role naming, the verifier selection rule, and the Option semantics; updates OQ-37 to resolved in open-questions.md, README.md, and both crate READMEs.	2026-06-28 10:47:49 +00:00
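The client-side verifier selection rule from ADR-034 reduces to a small decision table. The sketch below is hypothetical: `RemoteKeyKind`, `VerifierChoice`, and `select_verifier` are illustrative names, and `Option<&[String]>` stands in for `CallCredentials.remote_identity`. In the real stack the choice is realized as rustls verifiers (e.g. WebPkiServerVerifier for the CA path), not an enum.

```rust
/// Kind of credential the remote presents during the handshake.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RemoteKeyKind {
    Ed25519Raw,
    X509,
}

#[derive(Debug, PartialEq)]
pub enum VerifierChoice {
    /// Known peer: match against the PeerEntry's fingerprints.
    FingerprintPin(Vec<String>),
    /// Unknown X.509 remote (public endpoint): CA verification.
    CaVerification,
    /// Unknown Ed25519 remote: fail closed.
    Reject,
}

/// `remote_identity` is load-bearing: Some = known peer, None = not a peer.
/// None never means "skip verification".
pub fn select_verifier(remote_identity: Option<&[String]>, kind: RemoteKeyKind) -> VerifierChoice {
    match (remote_identity, kind) {
        (Some(fingerprints), _) => VerifierChoice::FingerprintPin(fingerprints.to_vec()),
        (None, RemoteKeyKind::X509) => VerifierChoice::CaVerification,
        (None, RemoteKeyKind::Ed25519Raw) => VerifierChoice::Reject,
    }
}
```

No branch accepts an unverified certificate, which is how the rule closes the AcceptAnyServerCertVerifier hole.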
glm-5.2	3f011cbb82	docs(arch): tighten door-type framing — reversal cost, not deferral ADR-009, open-questions.md, and the architect agent spec all had the same conflation: 'two-way door' was phrased as 'can be decided during implementation,' which reads as 'defer the decision.' That's not what it means. A two-way door is a decision you make now and can revert later if wrong — it's about reversal cost, not urgency. ADR-009: add §'What this framework is NOT' — explicitly separates door type (reversal cost) from deferral (scope management). State that architecture decisions are the architect's regardless of door type. Reword the two-way-door process from 'can be decided during implementation' to 'pick the simplest option that works, implement it, revert if needed.' open-questions.md: reword the header to clarify door type describes reversal cost, not urgency. Add 'Door type is separate from whether a decision is made.' architect.md: add Key Principle #8 (decisions are made, not deferred), a new 'Door Types and Decision Urgency' section, and two new anti-patterns (#8: door type as deferral, #9: hedging language in resolved decisions).	2026-06-28 09:19:10 +00:00
glm-5.2	7d812af8f4	docs(arch): multi-credential PeerEntry, resolve OQ-29, dissolve OQ-35, add OQ-37 Amend ADR-030 with three changes from the auth-type analysis: 1. PeerEntry is now multi-credential: fingerprints: Vec<String> (Ed25519 and/or X.509) + auth_token_hash: Option<String> (bearer token). All resolve to the same peer_id. A peer that authenticates via Ed25519 today and via auth_token tomorrow gets the same PeerId. The 'peer bearer vs auth bearer' distinction was wrong — the correct framing is the three credential types (Ed25519, X.509, bearer token) and whether the token needs a stable logical id across rotation (PeerEntry) or not (ApiKeyEntry). 2. Fingerprint normalization (§6): quinn extracts the raw Ed25519 public key from the SPKI cert and formats as ed25519:<hex>, matching iroh. The same key has the same fingerprint regardless of transport. X.509 fingerprints stay as SHA256:<hex of DER>. This also simplifies the coming WebTransport relay work. 3. The 'API keys' section is replaced with 'Bearer tokens' — correctly framing the three auth types and the two bearer-token paths (PeerEntry.auth_token_hash vs ApiKeyEntry). Resolve OQ-29 (CallClient TLS client-auth): wire quinn client-auth (present Ed25519 key as raw public key client cert — the server-side extraction already works); key-type-aware server cert verification (raw key = fingerprint match, X.509 = CA verification via WebPkiServerVerifier — AcceptAnyServerCertVerifier is only safe for raw keys); fingerprint normalization. The iroh path already works (RFC 7250 raw keys, both sides exchange automatically); the gap was quinn-only. Dissolve OQ-35: the 'API key asymmetry' framing was wrong. PeerEntry supports multiple credential paths; ApiKeyEntry is for tokens that ARE the identity. Add OQ-37: X.509 outgoing-only case — the three auth types and how X.509 server identity fits the peer model. Not blocking the ADR-029 migration; downstream (HTTP crate phase). 
Update auth.md, config.md, client-and-adapters.md, call/README.md, core/README.md, open-questions.md, README.md, and call_client.rs source comment. Workspace green: 326 tests pass, build clean.	2026-06-28 08:49:36 +00:00
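A sketch of the multi-credential PeerEntry and the `ed25519:<hex>` normalization from the ADR-030 amendment. The struct is simplified and hypothetical (the real field set is not shown in this log), lowercase hex is an assumption, and the resolver functions are illustrative names. The point shown is that every credential path lands on the same stable `peer_id`.

```rust
/// Hypothetical, simplified PeerEntry.
pub struct PeerEntry {
    pub peer_id: String,
    /// "ed25519:<hex>" and/or "SHA256:<hex of DER>" entries.
    pub fingerprints: Vec<String>,
    /// Hash of the peer's bearer token, if it has one.
    pub auth_token_hash: Option<String>,
}

/// Format a raw 32-byte Ed25519 public key as `ed25519:<hex>`
/// (lowercase hex assumed), independent of the transport it arrived on.
pub fn ed25519_fingerprint(public_key: &[u8; 32]) -> String {
    let hex: String = public_key.iter().map(|b| format!("{b:02x}")).collect();
    format!("ed25519:{hex}")
}

/// Resolve a TLS fingerprint (either key type) to the stable peer_id.
pub fn resolve_by_fingerprint<'a>(peers: &'a [PeerEntry], fp: &str) -> Option<&'a str> {
    peers
        .iter()
        .find(|p| p.fingerprints.iter().any(|f| f.as_str() == fp))
        .map(|p| p.peer_id.as_str())
}

/// Resolve a bearer-token hash to the same stable peer_id.
pub fn resolve_by_token_hash<'a>(peers: &'a [PeerEntry], hash: &str) -> Option<&'a str> {
    peers
        .iter()
        .find(|p| p.auth_token_hash.as_deref() == Some(hash))
        .map(|p| p.peer_id.as_str())
}
```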
glm-5.2	1d94aaea51	docs(arch): resolve call-crate OQs, promote OQ-29 to load-bearing on ADR-030 Resolve the call-crate open questions where the decision is made — OQ-27 (auto-re-import), OQ-28 (same-peer collision = error), OQ-30 (PeerRef::Any insertion-order first-match), OQ-31 (services/list-peers opt-in). These were previously marked 'open' with 'v1' hedging language despite having a decided default. What remains (refresh(), richer routing, services/list-peers the op) is genuine feature addition, not unmade architecture. Reframe OQ-32 (multi-hop) as a feature extension rather than a 'v1' deferral — the one-hop model is the architectural commitment; extending to multi-hop doesn't break downstream. Promote OQ-29 (CallClient TLS client-auth) from medium to high priority and surface its real interaction with ADR-030. Previously framed as 'additive — two-way-door remainder,' but ADR-030's PeerEntry fingerprint → peer_id resolution requires the client to present a TLS client cert. With with_no_client_auth(), no fingerprint is extracted, the PeerEntry path is dormant, and PeerCompositeEnv keys on None or the API-key prefix instead of the stable peer_id. This is the activation path for ADR-030's primary use case, not an additive feature. Three options laid out: (a) wire client-auth with the ADR-029 migration, (b) ship token-only and switch later (the 'compounds into a mess' path), (c) extend PeerEntry to cover auth_token-based identity. Requires a decision before the migration lands. Clarify OQ-36 (concrete adapter shapes): the trait shapes and in-memory adapters ship with core — the deferral is only for the persistence adapters (SQLite, etc.). The in-memory adapters are real implementations of a full repo pattern, not stubs. Update call_client.rs source comment to reference OQ-29 instead of the 'v1' / 'two-way-door remainder' framing. Workspace green: 326 tests pass, build clean.	2026-06-28 05:35:52 +00:00
glm-5.2	f224ea998c	docs(arch): ADR-030..033 — repo/adapter pattern, PeerEntry, CredentialStore, forwarded-for Land the storage and auth strategy research (findings.md) as four accepted ADRs and amend the core and call specs to match: - ADR-030: PeerEntry and Identity.id decoupling. Replaces authorized_fingerprints with peers: Vec<PeerEntry>; Identity.id becomes the stable peer_id, decoupled from the rotating fingerprint. Supersedes ADR-029 Assumption 1's UUID source (one-way door preserved, source changes). Resolves OQ-33 and the storage-boundary half of OQ-34. Records the API-key asymmetry as deliberate (OQ-35). - ADR-031: CredentialStore repo trait + InMemoryCredentialStore default adapter in core. Second repo trait alongside IdentityProvider. Vault encrypts; the store persists the EncryptedData blob; assembly layer loads into Capabilities. EncryptedData core mirror includes salt for wire-format compat. - ADR-032: Forwarded-for identity. forwarded_for field on call.requested and OperationContext — metadata only, never read by AccessControl::check (enforced structurally via the check signature). The from_call handler populates it. Wire-format one-way door, folded into the ADR-029 migration window. - ADR-033: Storage boundary and repo/adapter pattern. Core defines repo traits + in-memory defaults; persistence adapters are separate crates; assembly layer wires. Resolves OQ-34. Concrete adapter shapes deferred for exploration (OQ-36). Amends auth.md, config.md, operation-registry.md, client-and-adapters.md, open-questions.md, README.md, crates/core/README.md. Marks ADR-029 Accepted (Assumption 1 carries the ADR-030 superseded note). Marks the research findings doc reviewed.	2026-06-27 12:12:25 +00:00
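ADR-032's "enforced structurally via the check signature" can be illustrated as follows. The types are hypothetical and heavily simplified (a scope list instead of the real AccessControl model); what the sketch shows is that `check` receives only the direct caller's `Identity`, never the `OperationContext`, so `forwarded_for` has no path into the decision.

```rust
/// Hypothetical, simplified types; field sets are illustrative only.
#[allow(dead_code)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[allow(dead_code)]
pub struct OperationContext {
    /// The direct caller (e.g. the hub).
    pub identity: Identity,
    /// The original caller, carried as metadata only.
    pub forwarded_for: Option<Identity>,
}

pub struct AccessControl {
    pub required_scopes: Vec<String>,
}

impl AccessControl {
    /// Takes &Identity, not &OperationContext: forwarded_for cannot be read here.
    pub fn check(&self, identity: &Identity) -> bool {
        self.required_scopes.iter().all(|s| identity.scopes.contains(s))
    }
}
```

A spoke therefore authorizes the hub that called it; whatever identity the hub forwards is available to the handler but cannot widen or narrow the ACL result.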
glm-5.2	347bff257c	docs(research): rewrite storage/auth strategy — concrete repo/adapter design, no deferrals Reworks the storage strategy doc to commit to concrete design, replacing the 'when storage arrives' / 'future' / 'later' framing that was putting off important work. Key changes from the previous draft: - §4 (Repo/Adapter Pattern): now an explicit design with the trait contracts (IdentityProvider, CredentialStore), the adapter contracts (ConfigIdentityProvider with PeerEntry update, SqliteIdentityProvider, InMemoryCredentialStore, SqliteCredentialStore), and the concrete table schemas. Not a pattern description — a design commitment. - §4: PeerEntry config model — AuthPolicy gains peers: Vec<PeerEntry> replacing authorized_fingerprints: HashSet<String>. This is the id-fingerprint decoupling (OQ-33) done as a config change, not a storage change. ConfigIdentityProvider resolves fingerprint → PeerEntry → Identity { id: peer_id } (stable, not the fingerprint). - §7 (Decomposition): the 'what goes where' table now has a Status column (exists / needs adding / needs building / needs PeerEntry update) instead of 'future'. The crate graph is a concrete build plan. - §10 (Build Order): replaces 'What This Means for the Immediate Path' (which had 'when storage arrives' framing) with a 4-tier dependency-driven build order. Tier 1 = core repo traits + PeerEntry config model. Tier 2 = SQLite adapters. Tier 3 = ADR-029 migration + forwarded_for. Tier 4 = alknet-graphs (built when a graph-shaped problem exists, not speculatively). - §10: explicit 'What does NOT get built (dropped, not deferred)' section — multi-tenant, accounts/orgs, secrets module, single storage crate are dropped, not deferred. - All 'future' / 'when X arrives' / 'v1' / 'phase n' language removed for things that are needed. 
The only 'when X is needed' language remaining is for genuinely non-existent problems (ACL delegation, workflows, taskgraph) — those are built when the problem exists, not speculatively.	2026-06-27 10:36:07 +00:00
glm-5.2	19d010cf73	docs(research): storage and auth strategy — repo pattern, per-node ACL, SQLite+honker, metagraph-as-tool Synthesizes the multi-thread discussion that surfaced during the peer-graph routing research (ADR-029) and OQ-33/34 resolution. Three separate threads (peer identity, filesystem POC, old storage spec) converged on the same question: where does persistent state live in the alknet crate graph, and what's the shared infrastructure for it. Key commitments documented: - SQLite + honker is the foundation (pattern, not a crate — ~20 lines per consumer). The metagraph is one tool built on it, for graph-shaped problems. Direct tables are another tool, for table-shaped problems. - IdentityProvider is the auth repo trait (already exists in core, make the pattern explicit). Adapters implement it (Config, SQLite, future Redis/remote/automerge). PeerStore is adapter-internal, not core. - Per-node ACL, no 'trusted' flag. Each node authorizes its direct callers via AccessControl::check(identity). No global ACL, no replication. The hub authorizes the user; the spoke authorizes the hub. Same mechanism. - Forwarded-for identity as metadata, not authority. The from_call handler includes the original caller's identity in the call payload; the spoke's ACL authorizes the hub (direct caller), never the forwarded_for. The ACL check signature prevents misuse. - The ACL check stays table-shaped (flat scope match); the delegation graph (future) produces effective scopes at resolution time. They compose at the IdentityProvider boundary. - The hub proxy tangle: ACL (authorize), bucket routing (operation input), peer routing (PeerRef) are three separate layers. Bucket-level authorization is handler logic, not protocol logic. What the old spec had that's dropped: multi-tenant (each tenant gets own setup), secrets module (replaced by vault), metagraph-as-foundation (demoted to tool), single storage crate (split by concern), accounts/orgs (deferred — v1 is a peers table). 
Reference: keypal (/workspace/keypal), a TypeScript repo-pattern example (Storage interface + adapters) that alknet's IdentityProvider follows.	2026-06-27 10:02:26 +00:00
glm-5.2	99c6dd9483	docs(arch): resolve OQ-26 (AdapterError variants) + OQ-33 (PeerId = logical id) + OQ-34 (persistent peer registry) OQ-26 (resolved): AdapterError variants decided — DiscoveryFailed, SchemaParse, Transport, Unauthorized, SamePeerCollision (replaces flat Conflict per ADR-029 §5). #[non_exhaustive] for downstream extension. Two-way door; the initial set is the code's return type. OQ-33 (resolved): PeerId is a logical identifier, NOT Identity.id. The research's v1 default (PeerId = fingerprint) is overridden: coupling PeerId to crypto material breaks every in-flight PeerRef::Specific and every ACL entry on key rotation. v1 source is a connection-assigned UUID — a no-storage workaround that works for the immediate use case (head→workers, reconnect produces fresh PeerRef, in-flight gets NOT_FOUND which is correct). The one-way door: PeerId is logical, not crypto — this determines PeerCompositeEnv key type and PeerRef::Specific payload. The id source (UUID vs configured name vs peer registry) is the two-way-door remainder. OQ-34 (new): the storage dimension OQ-33 surfaced. The core crates are deliberately DB-free (smaller, fewer deps, simpler testing) — this served local-only state (vault, registry) well, but peer identity is the first cross-node state that wants persistence. The real solution (a persistent peer registry mapping stable logical name → current crypto material, surviving key rotation) is not a v1 blocker (UUID works), but tracked so the no-DB posture's limit is deliberate, not accidental. The storage boundary (core gets a PeerRegistry trait vs stays storage-free) is the one-way door; the backend choice is two-way. Key-rotation/ACL note: decoupling PeerId from crypto keeps the door open for ACL entries that persist across key rotation — when the peer registry is built, ACLs key on the logical name and key rotation becomes vault-only with no remote-side ACL update.	2026-06-27 06:34:35 +00:00
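The OQ-26 variant set can be written down as a `#[non_exhaustive]` enum. The variant names come from the commit; the payloads and Display messages are assumptions for illustration. Outside the defining crate, `#[non_exhaustive]` forces a wildcard match arm, which is what lets new variants be added later without breaking downstream code.

```rust
use std::fmt;

/// Variant names per OQ-26; payload fields and messages are assumed.
#[derive(Debug)]
#[non_exhaustive]
pub enum AdapterError {
    DiscoveryFailed(String),
    SchemaParse(String),
    Transport(String),
    Unauthorized,
    /// Replaces a flat Conflict variant (ADR-029 §5).
    SamePeerCollision { operation: String },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::DiscoveryFailed(m) => write!(f, "discovery failed: {m}"),
            AdapterError::SchemaParse(m) => write!(f, "schema parse error: {m}"),
            AdapterError::Transport(m) => write!(f, "transport error: {m}"),
            AdapterError::Unauthorized => write!(f, "unauthorized"),
            AdapterError::SamePeerCollision { operation } => {
                write!(f, "same-peer collision on {operation}")
            }
        }
    }
}

impl std::error::Error for AdapterError {}
```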
glm-5.2	77eb35a8a5	docs(arch): ADR-029 peer-graph routing model — supersedes ADR-028 ADR-028's remote_safe/trusted_peer was a parallel, weaker authorization system that duplicated the existing AccessControl/Identity machinery and couldn't express the head→N-workers pattern (the primary use case). The flat-namespace single-peer overlay model (one connection layer in CompositeOperationEnv) structurally breaks the moment a head has two workers both exposing /container/exec. ADR-029 replaces it with: - Peer-keyed overlays: PeerCompositeEnv { connections: HashMap<PeerId, ...> } replaces CompositeOperationEnv's singular connection layer. A head node routes invoke_peer() to the right peer via PeerRef::Specific / PeerRef::Any. - AccessControl-based peer authorization: the existing AccessControl::check (peer_identity) gates peer calls — the same mechanism that gates every other call. remote_safe/trusted_peer/RemoteFilter/list_operations_peer_scoped/ services_list_handler_peer_scoped are retired. The op's AccessControl IS the peer-authorization policy; no parallel system. - ScopedPeerEnv: peer-qualified reachability (peer-pinned allowlist) replaces from_call's namespace_prefix as the disambiguation mechanism. Cross-peer collision dissolves (separate sub-overlays); same-peer collision stays error. - services/list-peers opt-in for peer-attributed re-export listing. POC-validated against real types (scratch module written, type-checked, removed; build clean, 207 tests pass). Petgraph not needed for v1 (one-hop, shallow); nested HashMap suffices; extends to multi-hop without redesign (OQ-32). OQ impact: OQ-25 dissolved (no marking); OQ-28 cross-peer dissolved / same-peer stays; OQ-26/27/29 stay; new OQ-30 (Any routing policy), OQ-31 (list-peers semantics), OQ-32 (multi-hop federation). Research: docs/research/alknet-call-peer-routing/findings.md (POC shapes, prior art — Ray.io actors, Dapr service invocation, full ADR draft). 
ADR-028 marked Superseded; ADR-017 DC-1 amendment updated to point at ADR-029.	2026-06-27 06:04:19 +00:00
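The peer-keyed overlay idea from ADR-029 can be sketched with a nested map, combined with the insertion-order first-match policy for `PeerRef::Any` that OQ-30 settles on. This is a hypothetical simplification: each peer's overlay is modeled as the set of operation paths it exposes, and `add_peer`/`route` are illustrative names, not the alknet-call API.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = String;

pub enum PeerRef {
    Specific(PeerId),
    Any,
}

/// Hypothetical, simplified PeerCompositeEnv.
#[derive(Default)]
pub struct PeerCompositeEnv {
    connections: HashMap<PeerId, HashSet<String>>,
    /// Insertion order of peers, used by PeerRef::Any.
    order: Vec<PeerId>,
}

impl PeerCompositeEnv {
    pub fn add_peer(&mut self, peer: &str, ops: &[&str]) {
        if !self.connections.contains_key(peer) {
            self.order.push(peer.to_string());
        }
        let set = ops.iter().map(|s| s.to_string()).collect();
        self.connections.insert(peer.to_string(), set);
    }

    /// Choose the peer an invoke_peer() call for `op` goes to.
    pub fn route(&self, op: &str, target: &PeerRef) -> Option<&str> {
        match target {
            PeerRef::Specific(id) => self
                .connections
                .get_key_value(id.as_str())
                .filter(|(_, ops)| ops.contains(op))
                .map(|(k, _)| k.as_str()),
            PeerRef::Any => self
                .order
                .iter()
                .find(|id| self.connections[id.as_str()].contains(op))
                .map(|id| id.as_str()),
        }
    }
}
```

Two workers exposing the same path no longer collide, because each lives in its own sub-overlay keyed by PeerId.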
glm-5.2	f9c0ab092b	docs(arch): sync call-completion specs with implementation — Dispatcher/RemoteFilter, ClientError, OQ-29 Post-implementation spec sync after the call-completion batch landed (commits e4a2594..a3825f5). The sub-agent review flagged no spec drift, but comparing the implemented types against the spec sketches surfaced five details the specs didn't name — filled in here so the spec matches what was built: - client-and-adapters.md: name the shared Dispatcher (protocol/dispatch.rs) + RemoteFilter mechanism that enforces ADR-028's default-deny at dispatch time (the load-bearing security gate — checks remote_safe before building context, before any capability material reaches the handler). Add ClientError/RemoteIdentity types, the spawn_dispatch lower-level API, and the services_list_handler_peer_scoped wiring (the assembly layer must register the peer-scoped services/list handler for a CallClient's registry, not the plain one). Record the v1 TLS client-auth gap (AcceptAnyServerCertVerifier, with_no_client_auth) as OQ-29. - call-protocol.md: point the adapter dispatch-loop description at the shared Dispatcher (dispatch.rs) so readers find the mechanism ADR-017 §1 commits to. - open-questions.md: OQ-29 — CallClient TLS client-auth + remote-identity verification is a two-way-door remainder; the no-env-vars invariant is unaffected (auth_token flows via call-protocol payload, not TLS). - READMEs: current-state now reflects completion done + reviewed (207 lib + 2 integration tests); OQ-29 added to both OQ summaries.	2026-06-26 13:42:42 +00:00
glm-5.2	2649e068e5	docs(arch): call-completion — ADR-028 peer-scoped filtering + client-and-adapters spec + tasks Resolves the four gap-analysis decisions (DC-1..4) blocking the alknet-call client/adapter surface specced in ADR-017: - ADR-028 (new): locks the one-way door for DC-1 — CallClient registry is default-deny (remote_safe: bool on HandlerRegistration, default false across all provenance); share-global is an explicit trusted-peer opt-in; filtering is a dispatch-time read over the single Layer-0 registry, not a copy. - client-and-adapters.md (new spec): operationally fills the gap ADR-017 left to implementation — CallClient, from_call, from_jsonschema, OperationAdapter trait, adapter location map, no-env-vars invariant, exchange-of-operations pattern. Keeps call-protocol.md and operation-registry.md under the 700-line split threshold. - ADR-017 amended: records DC-2/3/4 v1 defaults (auto-on-reconnect, error-on-collision, Result error type) and points DC-1 at ADR-028. - OQ-25..28 (new): two-way-door remainders (remote_safe shape, AdapterError variants, re-import trigger, namespace collision) with v1 defaults recorded. - Index/cross-ref updates across READMEs and the two existing call specs. Tasks: 6 task files under tasks/call/ decomposing the completion work along the gap-analysis priority order — remote-safe-marking (one-way door, first) → call-client (phase-risk) → from-call → operation-adapter-trait → from-jsonschema (parallel with call-client) → review-completion. Graph validated with taskgraph; parallelism designed in (from-jsonschema runs concurrent with call-client/from-call once the trait lands).	2026-06-26 12:25:13 +00:00
glm-5.2	6940d9858d	docs(research): alknet-http phase-0 findings — HTTP server + client + MCP adapters Phase 0 exploration for alknet-http (greenfield crate, no existing arch): HTTP server (axum, ProtocolHandler for h2/http1.1, h3 deferred), HTTP client (reqwest, the from_openapi/from_mcp forwarding handlers), MCP streamable HTTP (feature-gated, stdio excluded as security position), to_openapi/to_mcp projections. Records: 8 design points (DH-3 HTTP→call operation mapping as the load-bearing one), the settled adapter location map (from alknet-call gap analysis), the no-env-vars invariant (Capabilities → from_openapi handler → HTTP header as the credential injection point), and the prerequisite on alknet-call's OperationAdapter trait being defined first.	2026-06-25 12:46:25 +00:00
glm-5.2	79d8561bb4	docs(research): alknet-call completion gap analysis — CallClient + from_call + OperationAdapter Gap analysis for completing alknet-call: the server-side core (~5.7k lines, 159 tests) is implemented, but the client side (CallClient), the bilateral exchange mechanism (from_call), and the adapter contract (OperationAdapter trait) are specced in ADR-017 and unimplemented. Records: implementation state (verified against src/), 5 decisions needed (peer-scoped registry filtering as the load-bearing one), the settled adapter location map (trait + from_call + from_jsonschema in alknet-call; from_openapi/ from_mcp in alknet-http), the no-env-vars invariant (Capabilities → from_openapi handler → HTTP header), and the exchange-of-operations runner pattern with dispatch as the concrete downstream consumer.	2026-06-25 12:44:49 +00:00
glm-5.2	db1dcd362f	docs(research): revise alknet-ssh phase-0 — SOCKS5+forwarding in v1, TCP listener for git-over-ssh Incorporates user clarifications: SOCKS5 and bidirectional port forwarding are core non-negotiable v1 features (the VPN-like use case + the 3.5k-clones demand). Adds DP-10 for the bare-TCP SSH listener as a first-class path needed for future git-over-SSH, with config shape reserved in v1 (off-by-default, default-deny). Grounds the client/forwarding recommendations in the dispatch downstream consumer at /workspace/@alkdev/dispatch, which is a textbook russh SSH client + direct-tcpip forwarder the user wants to replace with this stack. alknet-ssh now owns both server and client + SOCKS5-server in v1; the SOCKS5 codec may extract to a separate crate later (two-way door).	2026-06-25 08:46:35 +00:00
glm-5.2	d758a71490	docs(research): alknet-ssh phase-0 findings — stream wiring, russh 0.60.2, decision points Phase 0 exploration for alknet-ssh: confirms SSH-over-QUIC-bistream via tokio::io::join (no custom adapter needed, per reference impl), russh 0.60.2 generic run_stream/connect_stream, and channel-into-bistream multiplexing. Surfaces 9 decision points for Phase 1: host key sourcing (vault-derived vs config), channel policy v1 surface, client + SOCKS5 crate split, crypto backend, auth method coverage, and a stream-handling POC to close russh's upstream test gap.	2026-06-25 08:06:45 +00:00
glm-5.2	011db05a52	test: implement coverage #005 Tier-A suggestions (S1-S4, S8) Add 165 tests covering the directly-testable surface identified in coverage review #005. Workspace coverage rises 87.1% -> 91.2% (5759/6615 -> 6505/7135); all 389 tests pass, clippy clean. - S1 (connection.rs): dispatch_envelope across all five event-type arms for Call + Subscribe, plus SubscriptionStream poll_next branches and SubscriptionStream::closed. - S2 (types.rs): map_quinn/iroh_connection_error for TimedOut/Reset/ ApplicationClosed/other, plus HandlerError + StreamError Debug/Display/ source for every variant. - S3 (config.rs): Ed25519SecretKey from_bytes/as_bytes round-trip, sign+verify, tampered-message rejection, Debug non-leakage. - S4 (endpoint.rs): build_rustls_server_config RawKey/SelfSigned/Acme arms, build_quinn_server_config_from_rustls, load_private_key/ load_cert_chain error paths, has_iroh_identity branches, AcceptAnyCertVerifier trait methods, Ed25519SigningKey trait impls (choose_scheme both branches, algorithm, public_key, sign, scheme), RawKeyCertResolver + AlknetEndpoint Debug. endpoint.rs 56% -> 73%. - S8 (vault protocol.rs): the existing redacted-deserialize test passed for the wrong reason (JSON string failed Vec<u8> coercion before the guard). Two new tests exercise the guard directly via a [REDACTED] byte array (rejected) and a real payload (accepted). protocol.rs -> 100%. Deferred to follow-up: S5 (loopback quinn integration test, the real unlock for accept/dispatch/stream paths), S6 (ACME event-loop extraction), S7 (adapter abort arm). Review #005 updated with the resolution.	2026-06-25 05:43:59 +00:00
glm-5.2	32dcc05658	docs(reviews): add coverage analysis #005 First dedicated coverage pass (cargo-llvm-cov --workspace --all-features). Workspace at 87.1% line coverage (5759/6615), all 224 tests pass. Vault and registry layers are essentially fully covered; gaps concentrate in endpoint.rs (56%), types.rs (57%), and connection.rs (54%), all stemming from tests using MockConnection whose open_bi/accept_bi return Err. Eight suggestions (S1-S8) ordered by leverage: pure-function tests for dispatch_envelope / map_*_connection_error / error Display+Debug (S1-S3), Tier A directly-callable TLS/rustls helpers in endpoint.rs (S4), one loopback quinn integration test as the real unlock across four files (S5), ACME event-loop extraction via synthetic stream (S6, the flagged research item), and two small remaining gaps (S7-S8). No critical or warning findings — this is a testing-infrastructure gap, not a logic gap.	2026-06-25 04:32:51 +00:00
glm-5.2	00edfc0889	feat(core): ADR-027 — RawKey decoupling, client cert request, ACME integration Three tasks implementing ADR-027: 1. core/rawkey-decouple-from-iroh: TlsIdentity::RawKey now uses Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek) instead of iroh::SecretKey. RawKeyCertResolver and Ed25519SigningKey un-gated from #[cfg(all(quinn, iroh))] to #[cfg(quinn)] only. Quinn-only builds (default) now support RFC 7250 raw-key identity. iroh transport converts via iroh::SecretKey::from_bytes. 2. core/endpoint-request-client-cert: replaced with_no_client_auth() with AcceptAnyCertVerifier — a custom ClientCertVerifier that requests client certs but doesn't require them or verify against a CA. alknet's identity model is fingerprint-based (the authorized_fingerprints set is the trust anchor), not PKI-based. Peer certs are extracted at the TLS layer for fingerprinting; peers without certs connect normally. 3. core/acme-integration: TlsIdentity::Acme variant (domains, cache_dir, directory, contact) + AcmeDirectory enum. TlsSetup two-phase construction: synchronous for X509/RawKey/SelfSigned, async for Acme (spawns AcmeState event loop, builds ServerConfig with ResolvesServerCertAcme). acme-tls/1 ALPN added when ACME is active; dispatch_quinn guard closes challenge connections gracefully (challenge is TLS-layer-handled). acme feature gate keeps rustls-acme out of non-ACME builds. Workspace: build/test/clippy green across all 4 feature configs (quinn-only, quinn+iroh, quinn+acme, all-features). 331 tests, 0 failures, 0 warnings.	2026-06-24 20:29:43 +00:00
glm-5.2	d94d7a132a	docs(adr-027): TLS identity redesign — ACME + RawKey decoupling ADR-027 resolves the architectural gap surfaced when ACME integration became a concrete target: 1. TlsIdentity::Acme variant — static config data (domains, cache_dir, directory, contact) with async AcmeState constructed at endpoint setup via two-phase TlsSetup (not stuffed into the Clone-able enum). 2. TlsIdentity::RawKey decoupled from the iroh feature — uses Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek) instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the default for most alknet nodes) now works in quinn-only builds. iroh transport converts via SecretKey::from_bytes. 3. ACME feature-gated behind new acme feature (rustls-acme optional dep). Non-ACME builds don't compile it. 4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01 is handled at the rustls cert resolver layer during the handshake; the guard closes challenge connections gracefully instead of logging a misleading "no handler" warning. Research confirmed QUIC (quinn) handles ACME challenges differently than TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the challenge is fully answered at the cert resolution step before the connection surfaces to the application. No handler registration needed. Spec updates: config.md, endpoint.md, open-questions.md (OQ-12), overview.md + README.md (ADR index), ADR-010 (cross-ref). Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps), core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.	2026-06-24 12:29:24 +00:00
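A hypothetical sketch of the dispatch_quinn guard: by the time a connection surfaces, an `acme-tls/1` challenge has already been answered at the cert-resolver layer, so dispatch closes it quietly instead of reporting "no handler". The `Dispatch` enum, `dispatch_by_alpn`, and the slice-of-pairs handler table are illustrative stand-ins, not the alknet-core code.

```rust
/// ALPN identifier for TLS-ALPN-01 challenge connections (RFC 8737).
pub const ACME_TLS_ALPN: &[u8] = b"acme-tls/1";

#[derive(Debug, PartialEq)]
pub enum Dispatch<'a> {
    /// Challenge already answered during the handshake: close gracefully.
    CloseAcmeChallenge,
    /// Hand the connection to the registered handler.
    Handler(&'a str),
    /// Genuinely unknown ALPN: the case worth a warning.
    NoHandler,
}

pub fn dispatch_by_alpn<'a>(alpn: &[u8], handlers: &'a [(&'a [u8], &'a str)]) -> Dispatch<'a> {
    // Guard first, so challenge connections never reach the handler lookup.
    if alpn == ACME_TLS_ALPN {
        return Dispatch::CloseAcmeChallenge;
    }
    handlers
        .iter()
        .find(|&&(proto, _)| proto == alpn)
        .map(|&(_, name)| Dispatch::Handler(name))
        .unwrap_or(Dispatch::NoHandler)
}
```

The ALPN names used for regular handlers in the usage below (`alknet/call`, `alknet/ssh`) are placeholders.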
glm-5.2	97216764ea	fix: resolve review #004 findings W1-W4 + close review gate W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into CallAdapter handle_stream for EVENT_ABORTED. Cascades with AbortPolicy::AbortDependents, aborts root, no descendant frames on wire (ADR-016 Decision 2). Two integration tests added. W2 (core/endpoint-client-fingerprint): extract TLS client cert fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId). Fingerprint format documented in auth.md. Server config change (with_no_client_auth → request-but-don't-require) deferred to new follow-up task core/endpoint-request-client-cert. W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug) with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed no Debug impl. Redaction test added. W4 (core/auth-apikey-resources): Option B — drop entry.resources from spec. External identities (token/fingerprint) grant scopes only; resource-scoped ACLs are composition-internal (ADR-015/022). auth.md corrected + limitation documented. Two tests confirm empty resources. review-post-impl-fixes: all 4 verified, workspace green (326 tests, 0 failures, 0 clippy warnings). Review #004 status → resolved. Graph: 34 tasks, 12 gens.	2026-06-24 11:00:54 +00:00
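The W3 fix (a manual redacting Debug impl for Mnemonic) can be sketched as follows. The single phrase field and the new/phrase methods are assumptions for illustration; the point is that {:?} goes through a hand-written impl that never reads the secret.

```rust
use std::fmt;

/// Hypothetical stand-in for the vault's Mnemonic; the real type's fields
/// are not shown in the commit.
pub struct Mnemonic {
    phrase: String,
}

impl Mnemonic {
    pub fn new(phrase: impl Into<String>) -> Self {
        Self { phrase: phrase.into() }
    }

    /// Explicit accessor: the only way to read the phrase.
    pub fn phrase(&self) -> &str {
        &self.phrase
    }
}

// Manual Debug instead of #[derive(Debug)]: the phrase field is always
// rendered as "[REDACTED]", so `{:?}` in logs or panic messages cannot leak it.
impl fmt::Debug for Mnemonic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Mnemonic")
            .field("phrase", &"[REDACTED]")
            .finish()
    }
}
```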
glm-5.2	d904dfc243	docs(reviews): add post-implementation sanity check #004 28-task implementation review across alknet-vault, alknet-core, alknet-call. Zero critical findings; 4 warnings (W1: AbortCascade implemented but never invoked by CallAdapter; W2: endpoint hard-codes tls_client_fingerprint=None; W3: Mnemonic Debug leaks seed phrase; W4: ApiKeyEntry missing resources field); 5 suggestions. Build clean, 332 tests pass, default clippy clean.	2026-06-23 22:13:20 +00:00
glm-5.2	23b76a240a	docs(call): record review-call pass — implementation conforms to registry/protocol/ADR spec call/review-call verified the alknet-call crate against operation-registry.md, call-protocol.md, and ADRs 005/012/014/015/016/017/022/023/024. All registry types, protocol types, security constraints (Capabilities non-serializable/zeroized/immutable, metadata non-propagation, internal ops -> NOT_FOUND, reachability bounds, UUID v4 request IDs), and pattern consistency (OperationEnv trait, CompositeOperationEnv contains-probe, authority switch, deadline inheritance) are conformant. 159 tests green; build/clippy(fmt check) clean. No source changes required.	2026-06-23 15:55:10 +00:00
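The "UUID v4 request IDs" constraint verified above comes down to two bit patches on 16 random bytes. A std-only sketch, assuming the caller supplies bytes from a CSPRNG; uuid_v4_from_bytes is a hypothetical name, and in practice the uuid crate's Uuid::new_v4() does the same work.

```rust
/// Format 16 random bytes as a version-4 UUID string (RFC 4122 layout).
/// Assumes `b` came from a cryptographically secure RNG.
pub fn uuid_v4_from_bytes(mut b: [u8; 16]) -> String {
    b[6] = (b[6] & 0x0f) | 0x40; // high nibble of byte 6 = version 4
    b[8] = (b[8] & 0x3f) | 0x80; // top two bits of byte 8 = variant 10
    let hex: Vec<String> = b.iter().map(|x| format!("{:02x}", x)).collect();
    // 8-4-4-4-12 hex digit groups.
    format!(
        "{}-{}-{}-{}-{}",
        hex[0..4].concat(),
        hex[4..6].concat(),
        hex[6..8].concat(),
        hex[8..10].concat(),
        hex[10..16].concat()
    )
}
```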
glm-5.2	323ee85d40	docs(vault): remove drift tracking artifacts, bump vault docs to stable The vault spec-to-implementation sync is complete. Remove the drift tracking tools that were only needed during sync: - Remove the Known Source Drift table from vault/README.md - Remove 'known drift' / 'current source uses X' prose from Security Constraints sections in vault/README.md, encryption.md, and service.md. The permanent constraint statements (OsRng for IVs, zeroized drop, no unwrap, etc.) are preserved. - Remove the drift paragraph in encryption.md Key Versioning. - Remove stale 'to be updated per ADR-025' / 'postcard tests to be removed' notes in protocol.md References. - Bump status: draft -> stable in the frontmatter of all vault docs (README, mnemonic-derivation, encryption, service, protocol). - Update architecture/README.md: vault doc status entries to stable, Current State paragraph reflects vault implementation complete (no 'pending ADR-025/026 refactor' language).	2026-06-23 14:15:13 +00:00
glm-5.2	2e34590522	docs(architecture): resolve review #003 — type/API surface completeness Review #003 found 11 critical, 14 warning, and 6 suggestion findings after reviews #001 (governance/security) and #002 (cross-document consistency/two-way-door audit) were resolved. The theme: types and APIs that were referenced but never defined, and stale ADR sketches that didn't match the now-updated spec docs. Critical fixes (11): - C1: DerivedKey #[derive(Deserialize)] contradicted the custom Deserialize that rejects "[REDACTED]" — dropped the derive, added explicit manual Serialize/Deserialize impls (protocol.md). - C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the signature takes key_version — updated to encryption_path_for_version (service.md). - C3: derive_encryption_key returned DerivedKey, derive_encryption_key_for_version returned EncryptionKey (same cache) — unified on DerivedKey, defined CachedKey (service.md). - C4: tokio vs std::sync::RwLock contradiction — specified std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025, service.md). - C5: Missing drift rows in vault README — added #9 (key_version ignored) and #10 (rotate not implemented). - C6: ADR-022 build_root_context and invoke() sketches omitted abort_policy (9 fields vs 10) — added the field to both sketches. - C7: Capabilities type referenced 20+ times, never defined — added struct definition to core-types.md with Clone+Send+Sync, Zeroize, sealed builder API, immutability guard. - C8: SessionOverlaySource on CallAdapter but never defined, crate violation (alknet-call can't depend on alknet-agent) — defined the trait in alknet-call (call-protocol.md), matching the IdentityProvider pattern. - C9: CompositeOperationEnv dispatch fall-through was "a two-way door" — added contains() to OperationEnv trait, made the composite probe before dispatching, eliminating the sentinel ambiguity. 
- C10: No API for Layer 2 (connection overlay) registration, CallConnection undefined — defined CallConnection struct + register_imported() API (call-protocol.md). - C11: with_local signature diverged between two examples (4 args vs 5) — added capabilities as the 5th arg, made both examples consistent. Warning fixes (14): - W1: invoke_with_policy restructured as required method, invoke gets a default impl delegating to it — eliminates duplication across impls. - W2: CachedKey defined (service.md). - W3: EncryptionKey constructor/glue specified, added to re-export list. - W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown. - W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2). - W6: Wire payload schemas for all event types + ResponseEnvelope → EventEnvelope conversion table (call-protocol.md). - W7: Timeout section — deadline on OperationContext, composed calls inherit parent's deadline, CallAdapter::with_timeout(). - W8: Request ID generation spec — UUID v4 for composed calls, wire ID vs internal ID relationship for abort cascade. - W9: unlock_new already-unlocked behavior specified (returns AlreadyUnlocked). - W10: KeyType Serialize/Deserialize justification corrected (stale irpc reference removed). - W11: OperationProvenance and CompositionAuthority defined inline in operation-registry.md (were only in ADR-022). - W12: encrypt/decrypt free functions marked pub(crate), relationship to VaultServiceHandle methods stated. - W13: rotate signature removed from encryption.md (it's a VaultServiceHandle method, not a free function). - W14: CallAdapter::new() + with_session_source() + with_timeout() constructors shown. Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant, ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018 stale actor text, derivation helpers re-export note.	2026-06-23 10:56:05 +00:00
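W1 above (invoke_with_policy as the required method, invoke as a provided default delegating to it) is a standard default-method delegation. A synchronous sketch, assuming String inputs and errors and a hypothetical parent_policy accessor; the default passes the parent's policy, following the inherit-by-default rule recorded as W19 in the review #002 commit. The real trait's signatures are not shown here and may differ (for example, async).

```rust
/// The two abort policies named in these commits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

pub trait OperationEnv {
    /// Policy of the calling (parent) context; hypothetical accessor.
    fn parent_policy(&self) -> AbortPolicy;

    /// Required: every implementation provides the policy-aware form.
    fn invoke_with_policy(&self, op: &str, input: &str, policy: AbortPolicy)
        -> Result<String, String>;

    /// Provided: no explicit policy means "inherit the parent's".
    fn invoke(&self, op: &str, input: &str) -> Result<String, String> {
        self.invoke_with_policy(op, input, self.parent_policy())
    }
}

/// Toy environment that reports which policy each call received.
pub struct EchoEnv {
    pub policy: AbortPolicy,
}

impl OperationEnv for EchoEnv {
    fn parent_policy(&self) -> AbortPolicy {
        self.policy
    }

    fn invoke_with_policy(&self, op: &str, input: &str, policy: AbortPolicy)
        -> Result<String, String> {
        Ok(format!("{op}({input}) under {policy:?}"))
    }
}
```

Because invoke has exactly one implementation, every OperationEnv impl gets the same defaulting rule, which is the duplication W1 set out to remove.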
glm-5.2	cb98f42cd4	docs(architecture): resolve review #002 remaining Tier 4 findings Add ADR-026 (vault key model — HD derivation) recording the foundational HD-derivation decision, 74' coin type reservation, SLIP-0010/Ed25519 default, secp256k1 feature-gating, and AES-256-GCM cipher choice. These were previously inline rationale with no ADR (W9). Extend ADR-018 with an explicit EncryptedData wire format lock — fields, encoding, and semantics are frozen; no removal without a format-version migration (W10). Resolve the remaining guard clauses and spec decisions: - W2: Capabilities must be immutable after construction (no interior mutability). Makes the Arc vs deep-copy clone semantics genuinely two-way. - W5: Published to_* specs are compatibility contracts — best-effort mappings are two-way before first publication, one-way after. Version generated specs. - W6: Salt field clarification — v2 salt is permanently unused; a future KDF is a different derivation family, not a version-indexed path; the field saves a wire-format change only. - W7: unlock_new returns Zeroizing<String> — the mnemonic is the root of trust and must not linger in freed memory. - W17: OQ-09 WASM — server-side dispatch door is honestly closed (Connection is concrete, tokio-bound), not implicitly preserved. - W18: OQ-10 git — composability fork (raw smart protocol vs call-protocol projection) is a separate decision from ERC721 scope. - W20: from_openapi must prefix imported error codes (HTTP_404) to avoid collision with protocol-level codes (NOT_FOUND). Normative rule, not naming convention. - W21: ScopedOperationEnv field is private — construction via new()/ empty(), query via allows(). Makes the future subgraph refactor non-breaking. - C13: Connection::set_identity — the endpoint does not read identity() after handle() returns (Connection is moved into the spawned task). Observability is handler-side logging. Simplest honest answer. 
- W1: OperationAdapter trait is async, returns Vec<HandlerRegistration>. from_call requires async discovery; ADR-022 changed the return type. - W11: CompositionAuthority::as_identity() defined — constructs a synthetic Identity (label as id, scopes, resources) not resolvable via IdentityProvider. Second Identity construction path, acknowledged. - W14: SecretKey is iroh::SecretKey (Ed25519) — consistent with the endpoint's iroh dependency. - W19: Grandchild abort propagation is inherit-by-default (option a) — invoke() with no explicit policy inherits parent's policy. ContinueRunning auto-propagates to grandchildren unless explicitly overridden.	2026-06-23 08:20:27 +00:00
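W21 and W20 above are both small API rules, sketched here together. This assumes string operation names and a plain HashSet behind ScopedOperationEnv (the real representation is deliberately hidden, which is the point of W21); imported_error_code is a hypothetical helper name for the HTTP_ prefixing rule.

```rust
use std::collections::HashSet;

/// W21: the allow-set is private, so its representation can later change
/// (e.g. to a subgraph) without breaking callers.
pub struct ScopedOperationEnv {
    allowed: HashSet<String>, // only new()/empty()/allows() touch this
}

impl ScopedOperationEnv {
    pub fn new<I, S>(ops: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { allowed: ops.into_iter().map(Into::<String>::into).collect() }
    }

    pub fn empty() -> Self {
        Self { allowed: HashSet::new() }
    }

    pub fn allows(&self, op: &str) -> bool {
        self.allowed.contains(op)
    }
}

/// W20: HTTP statuses imported via from_openapi get an `HTTP_` prefix so
/// they can never collide with protocol-level codes such as NOT_FOUND.
pub fn imported_error_code(status: u16) -> String {
    format!("HTTP_{status}")
}
```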

126 Commits