76 Commits

Author SHA1 Message Date
7ecc11610a docs(arch): ADR-049 — streaming handler for subscription operations
The call protocol spec describes streaming (call.responded*N +
call.completed, PendingRequestMap::Subscribe, CallConnection::subscribe),
but the server-side Handler type returned a single ResponseEnvelope —
a Subscription op had no way to produce a stream. The TS predecessor
(@alkdev/operations) had separate OperationHandler / SubscriptionHandler
types; the Rust port collapsed them, losing the streaming path. This
restores it end-to-end: StreamingHandler type, HandlerKind on
HandlerRegistration validated against op_type, invoke_streaming() on
OperationRegistry, server-side dispatch branches on op_type, new
INVALID_OPERATION_TYPE protocol code for wrong-dispatch-path misuse,
GatewayDispatch::invoke_streaming() for /subscribe SSE, from_call stream
forwarding via CallConnection::subscribe(), from_openapi SSE forwarding.
OperationEnv::invoke() stays request/response-only (stream composition is
handler-level, not protocol-level). Amends ADR-023's protocol-code list
(five → six). Tracks the stream-operators library as OQ-41 (feature
extension, not an unmade decision).
2026-07-02 07:43:01 +00:00
e0c6f61e6a docs(http): pre-decomposition sanity check fixes — /subscribe POST, direct-call cleanup, from_mcp output handling
Three issues found in the http crate spec sanity check that would have
caused problems during task decomposition, now fixed:

C1 — /subscribe GET→POST: the gateway's /subscribe is an invoke endpoint
carrying { operation, input } in the body, but was listed as GET (which
has no body). Flipped to POST with Accept: text/event-stream negotiating
the SSE response, consistent with /call's flat-JSON-body invariant.
Browsers using EventSource can't POST but use WebSocket for the
bidirectional path; the HTTP gateway's /subscribe is for non-browser
HTTP clients (fetch + ReadableStream). Touches ADR-042, ADR-047,
ADR-048, http-adapters.md, http-server.md.

C2 — stale direct-call references: three spots contradicted ADR-047
(which removed the POST /{service}/{op} direct-call surface) and
ADR-046 §3 (which states /{service}/{op} is no longer reserved).
Cleaned up in http-server.md (custom-routes intro + collision list) and
ADR-046 §6 (default-surface list).

W2 — from_mcp output handling: the spec's fallback for tools without
outputSchema was Type.Unknown(), but the correct fallback is the MCP
ContentBlock union (text|image|audio|resource|resource_link) — a
well-defined MCP type, not Unknown. Fixed http-mcp.md with the full
structuredContent-preferred-over-content-blocks logic (matching the TS
adapter and rmcp SDK), enriched references with specific rmcp source
files. Also added shared-dispatch-spine notes to http-mcp.md and
http-adapters.md cross-referencing the new research findings.

Research (docs/research/alknet-http-gateway-factoring/findings.md):
to_mcp and to_openapi share a dispatch spine (resolve → invoke → map).
Recommendation: extract a thin shared struct now, not a GatewayDispatch
trait — the server-integration layers (axum routes vs rmcp
StreamableHttpService) and wire-framing stay per-gateway. A third
gateway is not on the horizon; if one appears its server-integration
needs its own shape anyway.

Minor: WS route precedence note (websocket.md), OpenAPISpec
shared-type-not-shape clarification (http-adapters.md), date bumps.
2026-07-01 05:41:07 +00:00
b71db99753 docs(http): add ADR-048 and websocket.md — WS carries native session, not gateway
Promote the WebSocket browser path from a section in http-server.md to a
first-class spec (websocket.md) and commit the contract-pattern decision
(ADR-048): a WS connection carries the native EventEnvelope call-protocol
session, not the HTTP gateway shape. The gateway endpoints are HTTP-only;
discovery on WS is via services/list/services/schema as ordinary call-protocol
ops; subscriptions project as native call.responded events (no SSE).

ADR-044 already decided WS as the v1 browser bidirectional path; ADR-048
clarifies the shape of what ADR-044 committed (§1 implies native session;
the ADR makes it an explicit implementer-visible rule). The from_wss adapter
(importing a remote node's ops over WS) is recorded as out-of-scope with a
concrete reversal trigger so it is not re-derived later.

Spec cleanup: http-server.md WS section collapsed to a stub pointer;
websocket.md Why section references ADRs rather than re-arguing them;
length-prefix decision made canonical (no prefix on WS — message boundary
is the delimiter); default upgrade path pinned (/alknet/call) with HTTP/2
extended CONNECT noted; indexes (README, http/README, overview) updated.
2026-06-30 12:27:00 +00:00
2a6e4c371a docs(http): resolve OQ-39; add ADRs 045-047; record pubsub prior art for WS path
OQ-39 (to_openapi published-spec versioning) resolved by ADR-045:
info.version semver tracks the gateway endpoint contract, not the
operation set — per-caller operations discovered via /search do not
bump the version. The gateway pattern (ADR-042) dissolved most of the
original churn concern.

ADR-046: assembly-layer custom HTTP routes on HttpAdapter. The HTTP
router had no documented extension point for deployment-specific
endpoints (e.g., an OAI-compatible proxy at /v1/chat/completions). Adds
extra_routes: Option<Router> at construction; raw HTTP, not operations;
default surface takes precedence on collision. The mechanism is the
one-way door; specific routes are two-way.

ADR-047: remove the direct-call POST /{service}/{op} HTTP surface. The
gateway /call is the sole invoke path — the simplified contract is a
few fixed endpoints, not a per-operation REST tree. The direct-call
surface re-introduced the 'dump the full API regardless of privs'
failure mode at the HTTP level that the gateway /search was built to
escape. ADR-036's routing decision is superseded; its non-routing
clauses (SSE, Bearer auth, /healthz, stealth, error mapping) survive.
A deployment wanting a REST-like per-operation surface builds it as a
custom route projection (ADR-046).

ADR-044 updated with the tradeoff framing (WSS is the right tool for
the call-protocol-from-browser case; WebTransport is the right tool for
the generalized ALPN-stream-proxy case we don't have yet — coexist, not
migrate) and the @alkdev/pubsub concrete prior art (the EventEnvelope
{type,id,payload} the call protocol was derived from already has a
working WebSocket client/server; the sync is a small adjustment, not a
from-scratch build).

call-protocol.md references the pubsub lineage for the
transport-agnosticism claim.
2026-06-30 09:49:25 +00:00
3327d585da docs(http): resolve OQ-40 reqwest client config — ClientWithMiddleware + retry/retry-after middleware stack
OQ-40 resolved: alknet-http owns a shared reqwest_middleware::ClientWithMiddleware
(not a bare reqwest::Client) with a two-layer middleware stack —
RetryTransientMiddleware (reqwest-retry, exponential backoff on transient
failures) + inlined RetryAfterMiddleware (from melotic/reqwest-retry-after, MIT,
~50 lines, inlined to bound the upstream's unbounded HashMap storage). The two
are complementary: reqwest-retry's default strategy does not honor Retry-After.

Hot-reload is rebuild-and-swap via ArcSwap (same pattern as
ConfigIdentityProvider, ADR-035); a rebuild drops the connection pool, which
is acceptable since a config change wanting a fresh pool is the trigger. The
three one-way constraints stand unchanged: alknet-http owns its client (no
env-var config, no shared global), credentials inject per-request from
OperationContext.capabilities, outbound TLS uses the system trust store.

Records the downstream layering boundary: the agent crate's provider SSE
normalization (the solid part of aisdk's pattern — Vercel-UI-message
normalization) sits on top of this client, consuming the reqwest::Response
stream; it does not replace the client. The aisdk core/client.rs reference for
client construction is dropped (env-var config + hand-rolled retry are the
anti-patterns discarded); the from_openapi.ts SSE normalization reference in
the forwarding-handler section is kept (separate, solid pattern).

No ADR — the decision is internal to alknet-http: the client type does not
cross crate boundaries (alknet-call never sees reqwest), the library choice is
reversible, and it does not touch the system's structure, constraints, or
cross-crate API surface.

Updates: http-adapters.md (HTTP client section rewritten, references updated,
constraints/OQ bullets updated), http-mcp.md (OQ-40 status flip), open-
questions.md (OQ-40 resolved with full config-shape table), README.md (OQ-40
folded into the existing two-way-doors bucket), and three secondary docs
(crates/http/README.md, overview.md, http-server.md) that carried stale 'open'
OQ-40 references.
2026-06-30 08:02:30 +00:00
125cb49cc4 docs(http): defer h3/WebTransport (ADR-044); browsers use WebSocket for v1
Working through the WebTransport implementation path surfaced a scope
question distinct from the hedging-as-deferral anti-pattern ADR-038 was
written to correct. Three findings drove the re-evaluation:

1. The browser bidirectional call-protocol path doesn't require
   WebTransport — WebSocket is full-duplex, EventEnvelope fits a WS
   binary message boundary cleanly, and the Dispatcher is stream-
   agnostic (ADR-012). What WebTransport gives over WebSocket (native
   multi-stream multiplexing, the ALPN-as-stream substrate) benefits the
   proxy use case, not the call protocol.
2. WebTransport is a draft standard (-07, not RFC) on an experimental
   Rust dependency stack (wtransport/h3 both self-describe as not
   production-ready). Either choice puts a draft protocol on the
   security surface of the first release.
3. The ALPN-stream-proxy (ADR-040) is speculative — its WASM parser
   consumers (browser SSH/SFTP/git clients) don't exist yet, and the
   downstream crates WebTransport deferral blocks (SSH, git, SFTP)
   expose their ALPNs natively over QUIC regardless.

This is a scope decision (per ADR-009: a decision that 'genuinely
doesn't need to be made yet because the use case isn't concrete'), not
hedging. The reversal trigger is concrete: a real deployment needing
the ALPN-stream-proxy.

ADR-038 is superseded (its anti-pattern correction stands; its specific
'h3 in scope now' decision is reversed). ADR-040 and ADR-043 are
parked, not superseded — their designs revive unchanged when WebTransport
revives, with §2 (bidirectionality) and §3 (no-PeerId overlay) of ADR-043
transferring to WebSocket for v1.

ADR-044 §5 also states the 'browser is not a peer' rationale that
ADR-034 §4 closed without arguing: peer = addressable node in the
call-protocol peer graph (stable PeerId, PeerRef::Specific-reachable,
identity stable across reconnects), not 'any endpoint that exchanges
calls during a live session.' A browser is the second but not the first
(no stable crypto identity of its own, ephemeral, not addressable from
other nodes). ADR-034 §4 and Assumption 2 are amended by reference.

The wtransport-vs-hyperium dependency question is recorded (not
resolved — WebTransport is deferred) in ADR-044 §'Research note' and
webtransport.md so the revival doesn't re-derive it: wtransport probably
isn't the right choice (axum-bridge friction — it owns its own HTTP
serving path); the hyperium stack (h3 + h3-quinn + h3-webtransport) fits
the axum integration better but its server-side WebTransport API needs
verification before commitment.

Reviewed by architecture-review subagent; all critical cross-reference
issues (ADR-034 §5 stale 'in scope' assertion, ADR-036 Context listing
h3 as implemented, webtransport.md Design Decisions table) resolved.
2026-06-30 05:55:55 +00:00
0a78306686 docs(http): add ADR-043 WebTransport bidirectional ALPN substrate; fix spec drift from mid-spec pivot
A consistency review of the alknet-http specs found two classes of
issues: internal contradictions from the mid-spec pivot (the to_openapi
gateway pattern landed in prose but not in cross-references), and a
systematic client→server assumption that only holds for the OpenAPI/MCP
case leaking into the WebTransport architecture.

Class 1 (internal contradictions):
- C1: to_openapi was half-refactored — body described the ADR-042
  gateway pattern but the decisions table and ADR-036 still said
  'paths mirror /{service}/{op}'. ADR-036's to_openapi clause is now
  amended as superseded by ADR-042; the stale decisions row and README
  Principle 2 are fixed.
- C2: the axum Router route list didn't include the 5 gateway endpoints
  (/search, /schema, /call, /batch, /subscribe). Added them; clarified
  /openapi.json as the gateway description doc; added gateway paths to
  the decoy exclusion list.
- C3: ADR-034 §5 still talked about the 'h3/WebTransport deferral
  bucket' that ADR-038 eliminated. Amended §5/Consequences/References
  to drop the deferral framing (the auth-model decision stands; only
  the 'when' wording was stale).

Class 2 (one-way direction assumption):
- C4/C5/C6: the WebTransport specs framed the session as browser→hub
  one-way, when the call protocol is bidirectional and WebTransport is
  a general ALPN transport substrate. New ADR-043 reframes WebTransport
  as a bidirectional ALPN transport substrate (call protocol is the
  first/canonical target; needs no WASM parser), names the call
  protocol's bidirectionality over WebTransport sessions, and states
  the inbound no-PeerId connection-local overlay as the mirror of
  ADR-034 §2. webtransport.md is updated to reflect this framing;
  ADR-040 is repositioned (not superseded) as the substrate's non-call-
  ALPN mechanism.
- C7: the HTTP/1.1+HTTP/2 surface's one-directionality is now named as
  a lossy consequence of HTTP request/response; WebTransport is named
  as the surface that restores the bidirectional call model.
- C8: overview.md acknowledges the from/to direction model is
  OpenAPI/MCP-specific, not a call-protocol property.

A review subagent pass on ADR-043 + webtransport.md found no critical
issues; warnings W1-W3 (residual browser-as-subject framing, ADR-009
rationale in spec, opening abstract tone) and suggestions S2/S4/S5
were addressed.
2026-06-29 10:43:18 +00:00
69ebe58bab docs(http): add ADR-042 OpenAPI gateway pattern for to_openapi
The to_openapi spec was describing one OpenAPI path per alknet operation
— the inverse of from_openapi. That inverse is genuinely messy: the call
protocol's input is a flat JSON object, and generating a traditional
OpenAPI path entry (POST /fs/{path} with path param, body, query params)
requires reverse-engineering which fields are path/query/body — metadata
the call protocol doesn't carry. The three options (leaky HTTP metadata
on OperationSpec, fragile heuristics, manual annotation) are all messy.

ADR-042 replaces this with the gateway pattern (same as ADR-041 for
to_mcp): to_openapi generates 5 fixed endpoints (search, schema, call,
batch, subscribe) that gate access to the full operation registry. The
input is always a flat JSON body — no path/query/body split to
reverse-engineer. JSON Schema is already in the OperationSpec.

The per-caller API surface is the key advantage: /search is
AccessControl-filtered, so the client sees only what it can call. The
Gitea failure mode (dumping admin ops to every caller in a static
OpenAPI doc) is structurally impossible — the per-caller surface is the
default, not an afterthought. OpenAPI has no per-caller filtering
concept; the gateway pattern provides it through /search.

Gateway endpoint set:
- /search -> services/list (AccessControl-filtered, names + descriptions)
- /schema -> services/schema (full OperationSpec)
- /call -> call.requested (Query/Mutation, flat JSON body)
- /batch -> multiple call.requested (correlated IDs)
- /subscribe -> call.requested (Subscription, SSE) — the one endpoint
  the MCP gateway excludes (MCP is request/response; OpenAPI/SSE
  supports streaming)

A traditional per-operation-paths projection is additive (a deployment
that wants the nice Swagger UI builds it with HTTP-specific metadata),
not a replacement. The gateway is the default.

http-adapters.md to_openapi section rewritten: the gateway endpoint
set, per-caller filtering, error fidelity on the /call endpoint, and
the additive traditional projection. The 'Why' section adds the
flat->structured and per-caller-surface rationale.

README/overview ADR tables and the top-level README current-state note
updated for ADR-042.
2026-06-29 09:33:39 +00:00
5fc074713c docs(http): add ADR-041 MCP tool-gateway pattern for to_mcp
The to_mcp spec was describing one MCP tool per alknet operation — the
tool-bloat problem. An LLM connecting to a node with 200 operations gets
200 MCP tools dumped into its context, degrading reasoning and wasting
context budget.

ADR-041 replaces this with the tool-gateway pattern (same pattern as
opencode's memory and worktree tools): to_mcp exposes 4 fixed meta-tools
(search, schema, call, batch) that gate access to the full operation
registry. The LLM has a few tools in context, discovers operations on
demand through search + schema, then calls. Same principle as Linux's
man command — don't preload all documentation; query on demand.

Gateway tool set:
- search -> services/list (names + descriptions, AccessControl-filtered)
- schema -> services/schema (full OperationSpec for a specific op)
- call -> call.requested (Query/Mutation only, request/response)
- batch -> multiple call.requested (correlated IDs, OQ-14)

Subscription operations are excluded — MCP tool calls are
request/response by protocol design (the client blocks until
CallToolResult returns); streaming subscriptions don't fit. Subscriptions
are filtered out of search results and cannot be invoked via call.

http-mcp.md to_mcp section rewritten: the gateway tool set, Subscription
exclusion, and the service behavior (tools/list returns 4 fixed tools,
tools/call dispatches through the gateway). The 'Why' section adds the
tool-bloat rationale and the memory/worktree tool pattern that informed
the design.

README/overview ADR tables and the top-level README current-state note
updated for ADR-041.
2026-06-29 08:34:44 +00:00
398e3d512d docs(http): add ADR-040 WebTransport ALPN-stream-proxy and reframe OQ-38
The 'WebTransport proxy' concept was conflating two distinct things;
this pass separates them:

1. In-process ALPN-stream-proxy (ADR-040, in alknet-http): the h3 handler
   hands a WebTransport stream to another ALPN handler (SshAdapter,
   GitAdapter, etc.) as a Connection, so a browser with a WASM parser
   can reach any ALPN service via WebTransport. Path-based routing
   (the CONNECT path declares the target: /alknet/ssh -> SshAdapter).
   HttpAdapter gains Arc<HandlerRegistry> for the lookup. The browser's
   WASM parser implements BiStream (ADR-007) over the WebTransport
   stream. SSH-over-WebTransport is HTTPS-shaped at the network layer
   (anti-censorship: the 'VPN-like without being a VPN' use case on a
   clean foundation). russh-sftp demonstrates WASM targeting is
   feasible; SSH is the next target.

2. Standalone relay service (OQ-38, future alknet-relay crate): a full
   relay - fork of iroh-relay - with WebTransport proxy fallback for
   NAT traversal. This is infrastructure, not a mode of the h3 handler.
   OQ-38 reframed to be the standalone-relay scope question (distinct
   from the in-process proxy now resolved by ADR-040).

webtransport.md updated: three stream destinations (call protocol,
ALPN-handler proxy, other sub-protocols) with path-based routing; new
'ALPN-stream-proxy' section covering the WASM client side, auth model
(bearer token gates the session; protocol's own auth gates the
protocol session), and the HandlerRegistry reference.

README/overview ADR tables and OQ summaries updated for ADR-040.
2026-06-29 07:56:35 +00:00
ab47dac4ad docs(http): draft alknet-http architecture specs and ADRs 036-039
First speccing pass for alknet-http (HTTP interface crate: h2/http1.1/h3
server + from_openapi/to_openapi/from_mcp/to_mcp adapters).

Specs (crates/http/):
- README.md, overview.md — crate index, two-roles-in-one-crate framing,
  adapter location map, feature gates (h3, mcp), no-env-vars invariant
- http-server.md — HttpAdapter for h2/http1.1, axum over QUIC stream,
  Bearer auth, SSE projection for subscriptions, /healthz, stealth decoy
- http-adapters.md — from_openapi (reqwest) and to_openapi (projection),
  error fidelity (HTTP_<status> per ADR-023), type definitions
- http-mcp.md — from_mcp/to_mcp (feature-gated), streamable-HTTP-only
- webtransport.md — h3/WebTransport handler, browser streaming path,
  HTTP/3 request vs WebTransport session distinguished at framing layer

ADRs:
- ADR-036 HTTP-to-Call Operation Mapping (Proposed) — direct path
  mapping; to_openapi is projection, not router (the load-bearing one-way
  door from Phase 0 DH-3)
- ADR-037 MCP Stdio Transport Exclusion (Proposed) — streamable HTTP
  only; stdio is not built (RCE-vector security position)
- ADR-038 HTTP/3 and WebTransport as First-Class HTTP Transports
  (Proposed) — corrects the Phase 0 DH-2 deferral framing; h3 is in
  scope, not deferred, per ADR-009 §'What this framework is NOT'
- ADR-039 HTTP Server and Client Host Colocated in alknet-http
  (Proposed) — one crate for server + client host (shared HTTP deps,
  shared operation-spec->HTTP mapping)
- ADR-003 Amendment 1 — clarifies alknet-call is a protocol-foundation
  crate (the alknet-http -> alknet-call dependency edge)

Open questions (OQ-38, OQ-39, OQ-40 added under 'Theme: alknet-http'):
- OQ-38 WebTransport relay-as-proxy scope (genuine scope question, not
  a deferral — the decision is made when the use case becomes concrete)
- OQ-39 to_openapi published-spec versioning (one-way after first
  publication)
- OQ-40 reqwest client config and connection pooling (two-way-door)

Architecture README and overview updated with doc table, ADR table
(036-039), current-state note, and crate graph (alknet-http ->
alknet-call edge).

Reviewed by architecture-reviewer subagent: 3 critical, 4 warning, 5
suggestion issues found and fixed (missing ADR-039, WebTransport stream
routing conflation, undefined types, stale OQ-37 deferral language,
README OQ table completeness, Bearer-only attribution, cross-references,
ADR-038 ALPN quote, feature-gate placeholder, MCP temporal language).
2026-06-29 05:53:38 +00:00
4a52779460 docs(arch): amend call specs for ADR-029/030/032/034 — peer-keyed routing, PeerEntry, forwarded-for, three roles
Sync the call crate specs to the accepted ADRs 029-034:
- operation-registry: PeerCompositeEnv (peer-keyed overlays), invoke_peer/
  PeerRef routing, retire remote_safe/trusted_peer, AccessControl-based peer
  auth, forwarded_for on OperationContext (ADR-029/030/032)
- call-protocol: peer-keyed compose_root_env, forwarded_for in call.requested
  payload, build_root_context forwarded_for parameter (ADR-029/032)
- client-and-adapters: CallClient verifier selection by PeerEntry presence,
  remote_identity: None load-bearing, three remote roles (ADR-034)
- README: ADR-029/030/032/034 in applicable ADRs table
2026-06-28 21:08:26 +00:00
0de2cebb1d docs(arch): ADR-035 — concrete persistence adapter shapes, resolve OQ-36
Commits the concrete adapter shape deferred by ADR-033: read-sync /
write-async split with honker NOTIFY/LISTEN for no-restart cache
invalidation, against SQLite, in a separate alknet-store-sqlite crate.

Two constraints drive the design: (1) the hot-path read trait
(IdentityProvider::resolve_from_fingerprint, CredentialStore::get) is
sync — called in the accept loop, no .await — so a SQLite-backed
adapter must cache in memory and serve sync reads from the cache; (2)
auth changes must take effect without a restart (an early issue the
project already fixed for ConfigIdentityProvider via ArcSwap config
reload). honker's SQLite NOTIFY/LISTEN (single-digit-ms wake, no
polling) is the cache-invalidation mechanism that makes both hold:
write commits to SQLite + emits NOTIFY, the running process's LISTEN
wakes, the in-memory index reloads and atomically swaps, the next
read sees the new state. Same ArcSwap-reload pattern as config,
generalized from 'config file is source of truth' to 'SQLite is
source of truth, honker signals when it changed.'

New async IdentityStore write trait (put_peer / update_peer /
remove_peer) extends the sync IdentityProvider read trait for peer
mutations. ConfigIdentityProvider does NOT implement it (config
reload is its write path — a posture enforced by the absence of a
backend, not a type-system constraint); SqliteIdentityProvider
implements both. CredentialStore::put/delete refined to async (within
ADR-031's one-way door — the contract was get/put/delete keyed by
provider persisting EncryptedData never decrypting; sync-vs-async was
unspecified). CredentialStoreError renamed to shared StoreError
covering both traits.

alknet-store-sqlite is one crate implementing both IdentityStore and
CredentialStore with shared SQLite connection + honker LISTEN infra
(splitting later is a two-way door). Schema shape committed (one row
per PeerEntry with JSON columns for fingerprints/scopes/resources;
one row per EncryptedData blob keyed by provider); exact DDL is an
implementation-detail two-way door in the adapter crate. The keypal
adapter-factory pattern is intentionally not ported to Rust (runtime
column-mapping is a TS affordance; in Rust each adapter is a concrete
type, cross-cutting concerns are a shared helper module).

Amends ADR-031 (put/delete async refinement, StoreError rename),
ADR-033 (concrete adapter shape now specified, two-crate framing
collapsed to one), ADR-034 (OQ-36 now resolved), auth.md (IdentityStore
section, cache-invalidation summary, OQ-36 reference), config.md (two
write paths note), and the OQ-36/OQ-34 entries in open-questions.md.
Review fixed 4 criticals (error-type name divergence, duplicate
IdentityProvider sketch, upsert/Duplicate ambiguity, 'shape unchanged'
contradiction), 7 warnings, 5 suggestions.
2026-06-28 11:10:31 +00:00
6cc8715ccf docs(arch): ADR-034 — outgoing-only X.509 and three peer roles, resolve OQ-37
Untangles the conflation of three distinct remote roles under 'X.509
endpoint': (1) public X.509 endpoint — a remote HTTPS/call-over-TLS
server the local node is a client of (no PeerEntry, no PeerId, not in
the peer graph; CA verification + bearer token); (2) transport relay —
iroh's DERP-equivalent, infrastructure, not an alknet peer; (3) hub /
hosting node — an alknet peer that also exposes a public domain + X.509
for browsers (mixed-fingerprint PeerEntry, already supported by
ADR-030).

The load-bearing one-way door is the client-side verifier selection
rule: known peer (PeerEntry present) → fingerprint pin; unknown X.509
remote → CA verification (WebPkiServerVerifier); unknown Ed25519
remote → fails closed. This closes the AcceptAnyServerCertVerifier
security hole OQ-29 flagged, with the peer-model criterion (PeerEntry
presence) made explicit. The 'make PeerEntry symmetric' instinct is
rejected — pure-client connections to public APIs have no stable
logical identity to pin.

Documents that CallCredentials.remote_identity: None is load-bearing
(None = public X.509 endpoint → CA path, not a missing field; Some =
known peer → fingerprint pin), closing a subtle gap where an
implementer could have defaulted to a placeholder or treated None as
skip-verify.

Records WebTransport relay-as-proxy (deferred with h3/WebTransport,
new OQ-HTTP-07) and on-chain/smart-contract peer discovery (fits the
OQ-36 repo/adapter pattern, no auth-model change) so they aren't lost.

Amends auth.md and client-and-adapters.md with the three-role naming,
the verifier selection rule, and the Option semantics; updates OQ-37
to resolved in open-questions.md, README.md, and both crate READMEs.
2026-06-28 10:47:49 +00:00
3f011cbb82 docs(arch): tighten door-type framing — reversal cost, not deferral
ADR-009, open-questions.md, and the architect agent spec all had the same
conflation: 'two-way door' was phrased as 'can be decided during
implementation,' which reads as 'defer the decision.' That's not what it
means. A two-way door is a decision you make now and can revert later if
wrong — it's about reversal cost, not urgency.

ADR-009: add §'What this framework is NOT' — explicitly separates door
type (reversal cost) from deferral (scope management). State that
architecture decisions are the architect's regardless of door type.
Reword the two-way-door process from 'can be decided during
implementation' to 'pick the simplest option that works, implement it,
revert if needed.'

open-questions.md: reword the header to clarify door type describes
reversal cost, not urgency. Add 'Door type is separate from whether a
decision is made.'

architect.md: add Key Principle #8 (decisions are made, not deferred),
a new 'Door Types and Decision Urgency' section, and two new anti-patterns
(#8: door type as deferral, #9: hedging language in resolved decisions).
2026-06-28 09:19:10 +00:00
7d812af8f4 docs(arch): multi-credential PeerEntry, resolve OQ-29, dissolve OQ-35, add OQ-37
Amend ADR-030 with three changes from the auth-type analysis:

1. PeerEntry is now multi-credential: fingerprints: Vec<String> (Ed25519
   and/or X.509) + auth_token_hash: Option<String> (bearer token). All
   resolve to the same peer_id. A peer that authenticates via Ed25519
   today and via auth_token tomorrow gets the same PeerId. The 'peer
   bearer vs auth bearer' distinction was wrong — the correct framing is
   the three credential types (Ed25519, X.509, bearer token) and whether
   the token needs a stable logical id across rotation (PeerEntry) or not
   (ApiKeyEntry).

2. Fingerprint normalization (§6): quinn extracts the raw Ed25519 public
   key from the SPKI cert and formats as ed25519:<hex>, matching iroh.
   The same key has the same fingerprint regardless of transport. X.509
   fingerprints stay as SHA256:<hex of DER>. This also simplifies the
   coming WebTransport relay work.

3. The 'API keys' section is replaced with 'Bearer tokens' — correctly
   framing the three auth types and the two bearer-token paths
   (PeerEntry.auth_token_hash vs ApiKeyEntry).

Resolve OQ-29 (CallClient TLS client-auth): wire quinn client-auth (present
Ed25519 key as raw public key client cert — the server-side extraction
already works); key-type-aware server cert verification (raw key =
fingerprint match, X.509 = CA verification via WebPkiServerVerifier —
AcceptAnyServerCertVerifier is only safe for raw keys); fingerprint
normalization. The iroh path already works (RFC 7250 raw keys, both sides
exchange automatically); the gap was quinn-only.

Dissolve OQ-35: the 'API key asymmetry' framing was wrong. PeerEntry
supports multiple credential paths; ApiKeyEntry is for tokens that ARE the
identity.

Add OQ-37: X.509 outgoing-only case — the three auth types and how X.509
server identity fits the peer model. Not blocking the ADR-029 migration;
downstream (HTTP crate phase).

Update auth.md, config.md, client-and-adapters.md, call/README.md,
core/README.md, open-questions.md, README.md, and call_client.rs source
comment.

Workspace green: 326 tests pass, build clean.
2026-06-28 08:49:36 +00:00
1d94aaea51 docs(arch): resolve call-crate OQs, promote OQ-29 to load-bearing on ADR-030
Resolve the call-crate open questions where the decision is made —
OQ-27 (auto-re-import), OQ-28 (same-peer collision = error), OQ-30
(PeerRef::Any insertion-order first-match), OQ-31 (services/list-peers
opt-in). These were previously marked 'open' with 'v1' hedging language
despite having a decided default. What remains (refresh(), richer routing,
services/list-peers the op) is genuine feature addition, not unmade
architecture.

Reframe OQ-32 (multi-hop) as a feature extension rather than a 'v1'
deferral — the one-hop model is the architectural commitment; extending
to multi-hop doesn't break downstream.

Promote OQ-29 (CallClient TLS client-auth) from medium to high priority
and surface its real interaction with ADR-030. Previously framed as
'additive — two-way-door remainder,' but ADR-030's PeerEntry fingerprint
→ peer_id resolution requires the client to present a TLS client cert.
With with_no_client_auth(), no fingerprint is extracted, the PeerEntry
path is dormant, and PeerCompositeEnv keys on None or the API-key prefix
instead of the stable peer_id. This is the activation path for ADR-030's
primary use case, not an additive feature. Three options laid out: (a)
wire client-auth with the ADR-029 migration, (b) ship token-only and
switch later (the 'compounds into a mess' path), (c) extend PeerEntry
to cover auth_token-based identity. Requires a decision before the
migration lands.

Clarify OQ-36 (concrete adapter shapes): the trait shapes and in-memory
adapters ship with core — the deferral is only for the persistence
adapters (SQLite, etc.). The in-memory adapters are real implementations
of a full repo pattern, not stubs.

Update call_client.rs source comment to reference OQ-29 instead of the
'v1' / 'two-way-door remainder' framing.

Workspace green: 326 tests pass, build clean.
2026-06-28 05:35:52 +00:00
f224ea998c docs(arch): ADR-030..033 — repo/adapter pattern, PeerEntry, CredentialStore, forwarded-for
Land the storage and auth strategy research (findings.md) as four
accepted ADRs and amend the core and call specs to match:

- ADR-030: PeerEntry and Identity.id decoupling. Replaces
  authorized_fingerprints with peers: Vec<PeerEntry>; Identity.id becomes
  the stable peer_id, decoupled from the rotating fingerprint. Supersedes
  ADR-029 Assumption 1's UUID source (one-way door preserved, source
  changes). Resolves OQ-33 and the storage-boundary half of OQ-34. Records
  the API-key asymmetry as deliberate (OQ-35).

- ADR-031: CredentialStore repo trait + InMemoryCredentialStore default
  adapter in core. Second repo trait alongside IdentityProvider. Vault
  encrypts; the store persists the EncryptedData blob; assembly layer
  loads into Capabilities. EncryptedData core mirror includes salt for
  wire-format compat.

- ADR-032: Forwarded-for identity. forwarded_for field on call.requested
  and OperationContext — metadata only, never read by AccessControl::check
  (enforced structurally via the check signature). The from_call handler
  populates it. Wire-format one-way door, folded into the ADR-029
  migration window.

- ADR-033: Storage boundary and repo/adapter pattern. Core defines repo
  traits + in-memory defaults; persistence adapters are separate crates;
  assembly layer wires. Resolves OQ-34. Concrete adapter shapes deferred
  for exploration (OQ-36).

Amends auth.md, config.md, operation-registry.md, client-and-adapters.md,
open-questions.md, README.md, crates/core/README.md. Marks ADR-029
Accepted (Assumption 1 carries the ADR-030 superseded note). Marks the
research findings doc reviewed.
2026-06-27 12:12:25 +00:00
99c6dd9483 docs(arch): resolve OQ-26 (AdapterError variants) + OQ-33 (PeerId = logical id) + OQ-34 (persistent peer registry)
OQ-26 (resolved): AdapterError variants decided — DiscoveryFailed,
SchemaParse, Transport, Unauthorized, SamePeerCollision (replaces flat
Conflict per ADR-029 §5). #[non_exhaustive] for downstream extension.
Two-way door; the initial set is the code's return type.

OQ-33 (resolved): PeerId is a logical identifier, NOT Identity.id. The
research's v1 default (PeerId = fingerprint) is overridden: coupling PeerId
to crypto material breaks every in-flight PeerRef::Specific and every ACL
entry on key rotation. v1 source is a connection-assigned UUID — a
no-storage workaround that works for the immediate use case (head→workers,
reconnect produces fresh PeerRef, in-flight gets NOT_FOUND which is correct).
The one-way door: PeerId is logical, not crypto — this determines
PeerCompositeEnv key type and PeerRef::Specific payload. The id source
(UUID vs configured name vs peer registry) is the two-way-door remainder.

OQ-34 (new): the storage dimension OQ-33 surfaced. The core crates are
deliberately DB-free (smaller, fewer deps, simpler testing) — this served
local-only state (vault, registry) well, but peer identity is the first
cross-node state that wants persistence. The real solution (a persistent
peer registry mapping stable logical name → current crypto material,
surviving key rotation) is not a v1 blocker (UUID works), but tracked so the
no-DB posture's limit is deliberate, not accidental. The storage boundary
(core gets a PeerRegistry trait vs stays storage-free) is the one-way door;
the backend choice is two-way. Key-rotation/ACL note: decoupling PeerId from
crypto keeps the door open for ACL entries that persist across key rotation
— when the peer registry is built, ACLs key on the logical name and key
rotation becomes vault-only with no remote-side ACL update.
2026-06-27 06:34:35 +00:00
77eb35a8a5 docs(arch): ADR-029 peer-graph routing model — supersedes ADR-028
ADR-028's remote_safe/trusted_peer was a parallel, weaker authorization system
that duplicated the existing AccessControl/Identity machinery and couldn't
express the head→N-workers pattern (the primary use case). The flat-namespace
single-peer overlay model (one connection layer in CompositeOperationEnv)
structurally breaks the moment a head has two workers both exposing
/container/exec.

ADR-029 replaces it with:
- Peer-keyed overlays: PeerCompositeEnv { connections: HashMap<PeerId, ...> }
  replaces CompositeOperationEnv's singular connection layer. A head node
  routes invoke_peer() to the right peer via PeerRef::Specific / PeerRef::Any.
- AccessControl-based peer authorization: the existing AccessControl::check
  (peer_identity) gates peer calls — the same mechanism that gates every other
  call. remote_safe/trusted_peer/RemoteFilter/list_operations_peer_scoped/
  services_list_handler_peer_scoped are retired. The op's AccessControl IS the
  peer-authorization policy; no parallel system.
- ScopedPeerEnv: peer-qualified reachability (peer-pinned allowlist) replaces
  from_call's namespace_prefix as the disambiguation mechanism. Cross-peer
  collision dissolves (separate sub-overlays); same-peer collision stays error.
- services/list-peers opt-in for peer-attributed re-export listing.

POC-validated against real types (scratch module written, type-checked,
removed; build clean, 207 tests pass). Petgraph not needed for v1 (one-hop,
shallow); nested HashMap suffices; extends to multi-hop without redesign (OQ-32).

OQ impact: OQ-25 dissolved (no marking); OQ-28 cross-peer dissolved / same-peer
stays; OQ-26/27/29 stay; new OQ-30 (Any routing policy), OQ-31 (list-peers
semantics), OQ-32 (multi-hop federation).

Research: docs/research/alknet-call-peer-routing/findings.md (POC shapes,
prior art — Ray.io actors, Dapr service invocation, full ADR draft).
ADR-028 marked Superseded; ADR-017 DC-1 amendment updated to point at ADR-029.
2026-06-27 06:04:19 +00:00
f9c0ab092b docs(arch): sync call-completion specs with implementation — Dispatcher/RemoteFilter, ClientError, OQ-29
Post-implementation spec sync after the call-completion batch landed
(commits e4a2594..a3825f5). The sub-agent review flagged no spec drift, but
comparing the implemented types against the spec sketches surfaced five
details the specs didn't name — filled in here so the spec matches what was
built:

- client-and-adapters.md: name the shared Dispatcher (protocol/dispatch.rs)
  + RemoteFilter mechanism that enforces ADR-028's default-deny at dispatch
  time (the load-bearing security gate — checks remote_safe before building
  context, before any capability material reaches the handler). Add
  ClientError/RemoteIdentity types, the spawn_dispatch lower-level API, and
  the services_list_handler_peer_scoped wiring (the assembly layer must
  register the peer-scoped services/list handler for a CallClient's registry,
  not the plain one). Record the v1 TLS client-auth gap (AcceptAnyServerCertVerifier,
  with_no_client_auth) as OQ-29.
- call-protocol.md: point the adapter dispatch-loop description at the shared
  Dispatcher (dispatch.rs) so readers find the mechanism ADR-017 §1 commits to.
- open-questions.md: OQ-29 — CallClient TLS client-auth + remote-identity
  verification is a two-way-door remainder; the no-env-vars invariant is
  unaffected (auth_token flows via call-protocol payload, not TLS).
- READMEs: current-state now reflects completion done + reviewed (207 lib +
  2 integration tests); OQ-29 added to both OQ summaries.
2026-06-26 13:42:42 +00:00
2649e068e5 docs(arch): call-completion — ADR-028 peer-scoped filtering + client-and-adapters spec + tasks
Resolves the four gap-analysis decisions (DC-1..4) blocking the alknet-call
client/adapter surface specced in ADR-017:

- ADR-028 (new): locks the one-way door for DC-1 — CallClient registry is
  default-deny (remote_safe: bool on HandlerRegistration, default false across
  all provenance); share-global is an explicit trusted-peer opt-in; filtering
  is a dispatch-time read over the single Layer-0 registry, not a copy.
- client-and-adapters.md (new spec): operationally fills the gap ADR-017 left
  to implementation — CallClient, from_call, from_jsonschema, OperationAdapter
  trait, adapter location map, no-env-vars invariant, exchange-of-operations
  pattern. Keeps call-protocol.md and operation-registry.md under the
  700-line split threshold.
- ADR-017 amended: records DC-2/3/4 v1 defaults (auto-on-reconnect,
  error-on-collision, Result error type) and points DC-1 at ADR-028.
- OQ-25..28 (new): two-way-door remainders (remote_safe shape, AdapterError
  variants, re-import trigger, namespace collision) with v1 defaults recorded.
- Index/cross-ref updates across READMEs and the two existing call specs.

Tasks: 6 task files under tasks/call/ decomposing the completion work along
the gap-analysis priority order — remote-safe-marking (one-way door, first)
→ call-client (phase-risk) → from-call → operation-adapter-trait →
from-jsonschema (parallel with call-client) → review-completion. Graph
validated with taskgraph; parallelism designed in (from-jsonschema runs
concurrent with call-client/from-call once the trait lands).
2026-06-26 12:25:13 +00:00
00edfc0889 feat(core): ADR-027 — RawKey decoupling, client cert request, ACME integration
Three tasks implementing ADR-027:

1. core/rawkey-decouple-from-iroh: TlsIdentity::RawKey now uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. RawKeyCertResolver and Ed25519SigningKey
   un-gated from #[cfg(all(quinn, iroh))] to #[cfg(quinn)] only.
   Quinn-only builds (default) now support RFC 7250 raw-key identity.
   iroh transport converts via iroh::SecretKey::from_bytes.

2. core/endpoint-request-client-cert: replaced with_no_client_auth()
   with AcceptAnyCertVerifier — a custom ClientCertVerifier that
   requests client certs but doesn't require them or verify against
   a CA. alknet's identity model is fingerprint-based (the
   authorized_fingerprints set is the trust anchor), not PKI-based.
   Peer certs are extracted at the TLS layer for fingerprinting;
   peers without certs connect normally.

3. core/acme-integration: TlsIdentity::Acme variant (domains,
   cache_dir, directory, contact) + AcmeDirectory enum. TlsSetup
   two-phase construction: synchronous for X509/RawKey/SelfSigned,
   async for Acme (spawns AcmeState event loop, builds ServerConfig
   with ResolvesServerCertAcme). acme-tls/1 ALPN added when ACME is
   active; dispatch_quinn guard closes challenge connections
   gracefully (challenge is TLS-layer-handled). acme feature gate
   keeps rustls-acme out of non-ACME builds.

Workspace: build/test/clippy green across all 3 feature configs
(quinn-only, quinn+iroh, quinn+acme, all-features). 331 tests, 0
failures, 0 warnings.
2026-06-24 20:29:43 +00:00
d94d7a132a docs(adr-027): TLS identity redesign — ACME + RawKey decoupling
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:

1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
   directory, contact) with async AcmeState constructed at endpoint
   setup via two-phase TlsSetup (not stuffed into the Clone-able enum).

2. TlsIdentity::RawKey decoupled from the iroh feature — uses
   Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
   instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
   default for most alknet nodes) now works in quinn-only builds.
   iroh transport converts via SecretKey::from_bytes.

3. ACME feature-gated behind new acme feature (rustls-acme optional
   dep). Non-ACME builds don't compile it.

4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
   is handled at the rustls cert resolver layer during the handshake;
   the guard closes challenge connections gracefully instead of logging
   a misleading "no handler" warning.

Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.

Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).

Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
2026-06-24 12:29:24 +00:00
97216764ea fix: resolve review #004 findings W1-W4 + close review gate
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into
CallAdapter handle_stream for EVENT_ABORTED. Cascades with
AbortPolicy::AbortDependents, aborts root, no descendant frames on
wire (ADR-016 Decision 2). Two integration tests added.

W2 (core/endpoint-client-fingerprint): extract TLS client cert
fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via
peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId).
Fingerprint format documented in auth.md. Server config change
(with_no_client_auth → request-but-don't-require) deferred to new
follow-up task core/endpoint-request-client-cert.

W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug)
with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed
no Debug impl. Redaction test added.

W4 (core/auth-apikey-resources): Option B — drop entry.resources from
spec. External identities (token/fingerprint) grant scopes only;
resource-scoped ACLs are composition-internal (ADR-015/022). auth.md
corrected + limitation documented. Two tests confirm empty resources.

review-post-impl-fixes: all 4 verified, workspace green (326 tests,
0 failures, 0 clippy warnings). Review #004 status → resolved.

Graph: 34 tasks, 12 gens.
2026-06-24 11:00:54 +00:00
23b76a240a docs(call): record review-call pass — implementation conforms to registry/protocol/ADR spec
call/review-call verified the alknet-call crate against operation-registry.md,
call-protocol.md, and ADRs 005/012/014/015/016/017/022/023/024. All
registry types, protocol types, security constraints (Capabilities
non-serializable/zeroized/immutable, metadata non-propagation, internal
ops -> NOT_FOUND, reachability bounds, UUID v4 request IDs), and pattern
consistency (OperationEnv trait, CompositeOperationEnv contains-probe,
authority switch, deadline inheritance) are conformant. 159 tests green;
build/clippy(fmt check) clean. No source changes required.
2026-06-23 15:55:10 +00:00
323ee85d40 docs(vault): remove drift tracking artifacts, bump vault docs to stable
The vault spec-to-implementation sync is complete. Remove the drift
tracking tools that were only needed during sync:

- Remove the Known Source Drift table from vault/README.md
- Remove 'known drift' / 'current source uses X' prose from Security
  Constraints sections in vault/README.md, encryption.md, and service.md.
  The permanent constraint statements (OsRng for IVs, zeroized drop,
  no unwrap, etc.) are preserved.
- Remove the drift paragraph in encryption.md Key Versioning.
- Remove stale 'to be updated per ADR-025' / 'postcard tests to be
  removed' notes in protocol.md References.
- Bump status: draft -> stable in the frontmatter of all vault docs
  (README, mnemonic-derivation, encryption, service, protocol).
- Update architecture/README.md: vault doc status entries to stable,
  Current State paragraph reflects vault implementation complete (no
  'pending ADR-025/026 refactor' language).
2026-06-23 14:15:13 +00:00
2e34590522 docs(architecture): resolve review #003 — type/API surface completeness
Review #003 found 11 critical, 14 warning, and 6 suggestion findings
after reviews #001 (governance/security) and #002 (cross-document
consistency/two-way-door audit) were resolved. The theme: types and
APIs that were *referenced* but never *defined*, and stale ADR sketches
that didn't match the now-updated spec docs.

Critical fixes (11):

- C1: DerivedKey #[derive(Deserialize)] contradicted the custom
  Deserialize that rejects "[REDACTED]" — dropped the derive, added
  explicit manual Serialize/Deserialize impls (protocol.md).
- C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the
  signature takes key_version — updated to encryption_path_for_version
  (service.md).
- C3: derive_encryption_key returned DerivedKey, derive_encryption_key
  _for_version returned EncryptionKey (same cache) — unified on
  DerivedKey, defined CachedKey (service.md).
- C4: tokio vs std::sync::RwLock contradiction — specified
  std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025,
  service.md).
- C5: Missing drift rows in vault README — added #9 (key_version
  ignored) and #10 (rotate not implemented).
- C6: ADR-022 build_root_context and invoke() sketches omitted
  abort_policy (9 fields vs 10) — added the field to both sketches.
- C7: Capabilities type referenced 20+ times, never defined — added
  struct definition to core-types.md with Clone+Send+Sync, Zeroize,
  sealed builder API, immutability guard.
- C8: SessionOverlaySource on CallAdapter but never defined, crate
  violation (alknet-call can't depend on alknet-agent) — defined the
  trait in alknet-call (call-protocol.md), matching the IdentityProvider
  pattern.
- C9: CompositeOperationEnv dispatch fall-through was "a two-way door"
  — added contains() to OperationEnv trait, made the composite probe
  before dispatching, eliminating the sentinel ambiguity.
- C10: No API for Layer 2 (connection overlay) registration, CallConnection
  undefined — defined CallConnection struct + register_imported() API
  (call-protocol.md).
- C11: with_local signature diverged between two examples (4 args vs 5)
  — added capabilities as the 5th arg, made both examples consistent.

Warning fixes (14):

- W1: invoke_with_policy restructured as required method, invoke gets a
  default impl delegating to it — eliminates duplication across impls.
- W2: CachedKey defined (service.md).
- W3: EncryptionKey constructor/glue specified, added to re-export list.
- W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown.
- W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2).
- W6: Wire payload schemas for all event types + ResponseEnvelope →
  EventEnvelope conversion table (call-protocol.md).
- W7: Timeout section — deadline on OperationContext, composed calls
  inherit parent's deadline, CallAdapter::with_timeout().
- W8: Request ID generation spec — UUID v4 for composed calls, wire ID
  vs internal ID relationship for abort cascade.
- W9: unlock_new already-unlocked behavior specified (returns
  AlreadyUnlocked).
- W10: KeyType Serialize/Deserialize justification corrected (stale
  irpc reference removed).
- W11: OperationProvenance and CompositionAuthority defined inline in
  operation-registry.md (were only in ADR-022).
- W12: encrypt/decrypt free functions marked pub(crate), relationship
  to VaultServiceHandle methods stated.
- W13: rotate signature removed from encryption.md (it's a
  VaultServiceHandle method, not a free function).
- W14: CallAdapter::new() + with_session_source() + with_timeout()
  constructors shown.

Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant,
ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018
stale actor text, derivation helpers re-export note.
2026-06-23 10:56:05 +00:00
cb98f42cd4 docs(architecture): resolve review #002 remaining Tier 4 findings
Add ADR-026 (vault key model — HD derivation) recording the foundational
HD-derivation decision, 74' coin type reservation, SLIP-0010/Ed25519
default, secp256k1 feature-gating, and AES-256-GCM cipher choice. These
were previously inline rationale with no ADR (W9).

Extend ADR-018 with an explicit EncryptedData wire format lock — fields,
encoding, and semantics are frozen; no removal without a format-version
migration (W10).

Resolve the remaining guard clauses and spec decisions:

- W2: Capabilities must be immutable after construction (no interior
  mutability). Makes the Arc vs deep-copy clone semantics genuinely
  two-way.
- W5: Published to_* specs are compatibility contracts — best-effort
  mappings are two-way before first publication, one-way after. Version
  generated specs.
- W6: Salt field clarification — v2 salt is permanently unused; a future
  KDF is a different derivation family, not a version-indexed path; the
  field saves a wire-format change only.
- W7: unlock_new returns Zeroizing<String> — the mnemonic is the root of
  trust and must not linger in freed memory.
- W17: OQ-09 WASM — server-side dispatch door is honestly closed
  (Connection is concrete, tokio-bound), not implicitly preserved.
- W18: OQ-10 git — composability fork (raw smart protocol vs call-protocol
  projection) is a separate decision from ERC721 scope.
- W20: from_openapi must prefix imported error codes (HTTP_404) to avoid
  collision with protocol-level codes (NOT_FOUND). Normative rule, not
  naming convention.
- W21: ScopedOperationEnv field is private — construction via new()/
  empty(), query via allows(). Makes the future subgraph refactor
  non-breaking.
- C13: Connection::set_identity — the endpoint does not read identity()
  after handle() returns (Connection is moved into the spawned task).
  Observability is handler-side logging. Simplest honest answer.
- W1: OperationAdapter trait is async, returns Vec<HandlerRegistration>.
  from_call requires async discovery; ADR-022 changed the return type.
- W11: CompositionAuthority::as_identity() defined — constructs a
  synthetic Identity (label as id, scopes, resources) not resolvable via
  IdentityProvider. Second Identity construction path, acknowledged.
- W14: SecretKey is iroh::SecretKey (Ed25519) — consistent with the
  endpoint's iroh dependency.
- W19: Grandchild abort propagation is inherit-by-default (option a) —
  invoke() with no explicit policy inherits parent's policy. ContinueRunning
  auto-propagates to grandchildren unless explicitly overridden.
2026-06-23 08:20:27 +00:00
91159bf574 docs(architecture): remove derive_password and site_password_path from vault
The password-manager pattern (deterministic per-site passwords from HD
derivation) is not relevant to an RPC system's vault. Handlers call APIs
(using API keys, OAuth tokens, mTLS), not websites with passwords. The
vault is for cryptographic key derivation and credential encryption.

Removes:
- derive_password, derive_password_string from service.md
- site_password_path from mnemonic-derivation.md
- m/74'/1'/0'/{hash}' path from PATHS module and path semantics table
- derive_password row from the cache table

Resolves review #002 C9 (site_password_path hash mapping underspecified)
by removing the feature rather than specifying the non-standard
string→u32 mapping and Ed25519-as-password-entropy construction.

If deterministic password generation is ever needed (browser-automation
edge case), it can be re-added — the cost is near-zero. Removing it now
eliminates permanent API surface inherited from a prior project's
password-manager pattern.
2026-06-23 06:06:11 +00:00
7dda6eec68 docs(architecture): add ADR-025 — vault local-only dispatch, drop irpc
Drops irpc from alknet-vault entirely. The vault's dispatch is now direct
method calls on VaultServiceHandle — no VaultProtocol enum, no
VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no
RemoteService trait, no postcard serialization. The vault is local-only by
construction.

The core security argument: irpc made the vault remote-capable by default
(RemoteService generated unless no_rpc is passed). The IrohProtocol handler
forwards all messages without auth. The docs framed 'register an ALPN' as a
server-setup change. This is the default-insecure anti-pattern — security
should be opt-in, not opt-out. ADR-025 inverts the default: local-only is
the only mode, and remote access requires building a separate vault-server
crate (a visible architectural act, not a flag flip).

The actor path was already dead code — service.md said 'prefer
VaultServiceHandle directly — no channel, no serialization.' The actor
existed only to make irpc's Service trait work, which existed only to make
RemoteService work, which was the footgun. VaultServiceHandle's
Arc<RwLock> provides concurrent reads and exclusive writes — better
throughput than the actor's sequential processing.

DerivedKey serialization simplifies: always redact on serialize (for
logging safety), reject '[REDACTED]' on deserialize with an error. No
'postcard preserves bytes' path. This resolves review #002 W8 (silent
corruption on JSON-deserialized DerivedKey).

Resolves:
- OQ-21: remote vault access — resolved (not deferred). Not a vault crate
  feature; if needed, a separate vault-server crate with its own ADR.
- C7: vault-server-crate question decided — not created now, not precluded.
- C8: operation access policy table dissolved — all operations local-only
  by default; if a vault-server crate exposes some remotely, that crate
  defines the policy.
- W8: DerivedKey JSON deserialization — resolved (reject redacted payloads).

Amends ADR-005 (irpc remains for alknet-call, not for alknet-vault),
ADR-018 (vault is even more standalone — zero RPC framework deps),
ADR-019 (vault is the only layer, not just the only direct-caller layer),
ADR-008 (vault integration point unchanged, but now local-only by
construction).
2026-06-22 14:53:52 +00:00
cdf340bec7 docs(architecture): add ADR-024 — operation registry layering, resolve C6
Diagnoses a conflation in the pre-ADR-024 spec: the OperationRegistry
inherited immutability by analogy from ADR-010's HandlerRegistry (ALPN-level),
but the TLS-config argument that justifies HandlerRegistry immutability does
not apply to the operation registry, which lives behind a single ALPN
(alknet/call). This made from_call (which discovers ops over a live connection
at runtime) structurally incompatible with the blanket immutability claim.

ADR-024 layers the operation registry by trust boundary: curated (Local) ops
are static and immutable — the startup trust boundary is where their
composition authority is granted; session (Session) and imported (FromCall
etc.) ops are dynamic at their respective scopes (per-session, per-connection)
— their trust boundaries are per-scope, not per-startup. The principle:
immutability follows the trust boundary. Immutability is the security control
for composing ops (can escalate privilege); provenance + composition authority
are the controls for non-composing ops (can't escalate).

The OperationEnv trait becomes the integration point (Arc<dyn OperationEnv>),
following the IdentityProvider precedent (ADR-004): the CallAdapter composes
the root OperationContext.env per incoming call from the active layers
(curated base + connection overlay + session overlay). Children inherit the
parent's composite env by Arc::clone — overlay composition happens once at
the root and propagates through the composition tree.

Resolves review #002 C6 (OperationContext.env type identity crisis): the
field is split into scoped_env: ScopedOperationEnv (reachability data, from
the registration bundle) and env: Arc<dyn OperationEnv + Send + Sync>
(dispatch trait object). One field was being used as two different types
(reachability set with .allows() and dispatch trait with .invoke());

Localizes W4 (hot-swap ↔ registry mutability coupling) to the connection
scope: no global mutable registry to hot-swap; overlays replace naturally
with connect/disconnect and session start/end. Schema-drift on reconnect is
a per-connection overlay-rebuild concern, not a global hot-swap protocol.

Partially addresses W3 (CallClient registry security): the registry-shape
sub-question is resolved by the overlay model; the capability-exposure
sub-question (what capabilities a remote peer can trigger) remains for
ADR-017 — ADR-024 does not overclaim resolution there.

Amends OQ-04 to scope its immutability claim to the HandlerRegistry and
cross-reference ADR-024 for the operation registry. Generalizes OQ-19's
session-overlay mechanism to also cover connection-scoped remote imports —
both are per-scope dynamic overlays on the static curated base, using the
same trait-layering mechanism.
2026-06-22 13:44:58 +00:00
c62a6adc7b docs(architecture): resolve review #002 Tiers 1-3 — mechanical and consistency fixes
Governance (Tier 2):
- Advance ADR-022 and ADR-023 from Proposed to Accepted (specs already
  depend on their types as source of truth)
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
  update handler_identity type to CompositionAuthority
- Amend ADR-002: note handle() signature revised by ADR-007 (BiStream → Connection)
- Amend ADR-004: note 'enrich/replace' AuthContext language superseded by
  ADR-011's immutability model; update to describe set_identity on Connection
- Update main README ADR table to show ADR-022/023 as Accepted

Spec-ADR consistency (Tier 3):
- Add abort_policy: AbortPolicy field to OperationContext struct (ADR-016
  Decision 6 mandated this but the spec omitted it)
- Define AbortPolicy enum (AbortDependents | ContinueRunning) with Default impl
- Add abort_policy to build_root_context and LocalOperationEnv::invoke()
- Define the OperationEnv trait explicitly with invoke() and
  invoke_with_policy() methods (was referenced as 'must remain a trait'
  but never defined)
- Specify From<StreamError> for HandlerError impl with exact variant mapping
- Add Connection::from_quinn() / from_iroh() constructors (was referenced
  as Connection::new() but never defined)
- Remove undefined CertAuthorityEntry placeholder from AuthPolicy v1 (will
  be added additively when alknet-ssh lands)
- Fix config.md key-differences table: rate limits are in DynamicConfig,
  not StaticConfig

Mechanical fixes (Tier 1):
- overview.md: 'closes the QUIC stream' → 'closes the connection' (stale
  from pre-ADR-007 model)
- overview.md: OQ-04 entry updated from stale 'defer to implementation'
  to 'resolved: static at startup'
- mnemonic-derivation.md: remove duplicate helper functions block (incomplete
  first copy, complete second copy)
- ADR-003: add iroh (feature-gated) to alknet-core dependency list, added
  by ADR-010
- ADR-021: fix ambiguous 'W1 drift issue from the vault review' cross-reference
- ADR-022: rephrase FromCall 'leaf locally' to 'leaf in the local registry'
- ADR-017: add error_schemas to from_call mirror list and services/schema
  step (inconsistency with ADR-023)
- ADR-016: fix self-referential citation ('ADR-016 Assumption 5' → 'Assumption 5')
- Add ScopedOperationEnv::empty(), allows(), new() and
  CompositionAuthority::none(), new() impl blocks (referenced but undefined)
- Add call.completed clarification for non-subscription calls
- Add services/schema leading-slash normalization note
- Crate README ADR tables: add missing ADR-013 (call), ADR-015 (core),
  ADR-006 + ADR-010 (vault)
- Vault README: add consolidated 'Known Source Drift' table tracking all
  four drift items (OsRng, unwrap, CURRENT_KEY_VERSION, spawn bug) in one
  place, including the two previously missing from README
2026-06-22 05:46:37 +00:00
3f529df367 docs(architecture): update ADR-015 scoped env API — resolved by ADR-022
ADR-015 L171 said the scoped env API was 'a two-way door for
implementation.' ADR-022 has now resolved it: ScopedOperationEnv with
operation-level granularity (HashSet<String>), not namespace-level.
Update the stale text to point to the resolution.
2026-06-21 10:51:42 +00:00
6a7f8f91ad docs(architecture): resolve S1 — abort policy on OperationContext, not wire
ADR-016 Decision 6 specifies that the abort policy (abort-dependents vs
continue-running) is set on OperationContext and propagated through
OperationEnv::invoke() — the composing handler decides the child's
policy, not the wire caller. The call.requested payload does not carry
an abort policy field. This resolves the TBD that was masquerading as a
two-way door: two of the three options ADR-016 floated (wire payload,
per-operation declaration) were inconsistent with the ADR's own
assumptions.

Also marks review #001 as resolved — all 5 critical, 4 warning, and 4
suggestion findings are now addressed.
2026-06-21 10:34:12 +00:00
3e238a471b docs(architecture): add ADR-023, resolve OQ-24 — operation error schemas
ADR-023 adds error_schemas to OperationSpec so operations can declare
their domain-level failure modes (FILE_NOT_FOUND, RATE_LIMITED, etc.)
distinct from protocol-level codes (NOT_FOUND, FORBIDDEN, etc.). The
call.error payload gains an optional 'details' field carrying the typed
error payload conforming to the declared schema. from_openapi/to_openapi
map OpenAPI response status codes to/from ErrorDefinitions, making the
adapter contract from ADR-017 faithful on the error axis.

Also fixes W2 (KeyVersionMismatch stale comment in encryption.md —
ADR-021 implements rotation without this variant) and W4
(derive_encryption_key_for_version missing from service.md method list).

Spec updates: operation-registry.md (OperationSpec, ErrorDefinition,
Handler error mapping, services/schema), call-protocol.md (call.error
payload, CallError, ResponseEnvelope), README.md, overview.md,
open-questions.md (OQ-24), call/README.md, encryption.md, service.md.
2026-06-21 10:26:18 +00:00
1cedc4eeba docs(architecture): add ADR-022, resolve OQ-23 — handler registration, provenance, and composition authority
ADR-022 wires the three controls ADR-015 specified but left without
registration paths (C1-C4 from review #001): composition authority,
scoped env, and capabilities now enter through a HandlerRegistration
bundle. Provenance (Local, FromOpenAPI, FromMCP, FromCall, Session)
determines which ops can compose — leaves don't get composition
authority. CompositionAuthority replaces handler_identity: Identity
(it's a declared authority bundle, not a peer identity). Capabilities
are per-request from the bundle (resolves closure-capture vs context
ambiguity). Kernel/user analogy: user's authority checked at External
gate; handler's composition authority used inside; scoped env bounds
reachability.

Also fixes W1 (stale ADR-020 path example) and W3 (from_mcp missing
from adapter lists in operation-registry.md).

Spec updates: operation-registry.md (OperationRegistry,
HandlerRegistration, OperationContext, OperationEnv, registration
example, capability injection), call-protocol.md (build_root_context),
README.md, overview.md, open-questions.md (OQ-23), call/README.md.
2026-06-21 09:09:47 +00:00
9087f0579f docs(architecture): document vault remote capability, enrich OQ-21
The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.

OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
  auth checks; needs NodeId allowlist + message filtering in assembly layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
  remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
  evolution = wire break, manageable via ALPN versioning)

The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not in
the vault crate itself — the vault is standalone (ADR-018) and can't
import alknet-core's auth model.

OQ-21 remains deferred — no commitment to implement, but the door is open
and the design space is mapped.
2026-06-20 06:48:23 +00:00
dc27753680 docs(architecture): add ADR-021, resolve OQ-22 — key rotation via version-indexed paths
Key rotation uses version-indexed derivation paths: each key version maps
to a distinct SLIP-0010 path (m/74'/2'/0'/{version-2}'). v2 is at index 0
(PATHS::ENCRYPTION), v3 at index 1, etc.

Mechanism:
- encryption_path_for_version(version) constructs the path
- decrypt derives the key at the version-indicated path (not always
  PATHS::ENCRYPTION)
- rotate(blob, to_version) decrypts with old key, re-encrypts with new
- No new mnemonic needed — same seed, different path
- Partial rotation is safe — old keys remain derivable
- The vault does not self-rotate; the assembly layer iterates blobs

Source drift flagged:
- decrypt currently ignores key_version for path selection (always uses
  PATHS::ENCRYPTION) — must use version-indexed paths
- rotate method does not exist in source — must be added
- CURRENT_KEY_VERSION must bump from 1 to 2 (per ADR-020, reinforced here)

OQ-22 resolved. Only OQ-21 (remote vault admin, deferred) remains.
2026-06-19 10:09:20 +00:00
6e9414bc81 docs(architecture): add ADR-020, resolve OQ-20 — HD derivation for encryption keys
The vault uses SLIP-0010 HD derivation from the BIP39 seed for the
AES-256-GCM encryption key, not PBKDF2. This replaces the TypeScript
predecessor's (@alkdev/storage/src/graphs/crypto.ts) PBKDF2-based
approach.

Key decisions:
- HD derivation at m/74'/2'/0'/0' produces the encryption key
- PBKDF2 is not implemented in the vault; no password-based derivation
- salt field is unused in v2 (wire-format compat only)
- key_version=1 reserved for TS PBKDF2 data; key_version=2 for vault HD
- TS-encrypted data requires one-time migration to v2
- CURRENT_KEY_VERSION changes from 1 to 2 (source drift flagged)

OQ-20 resolved: the encryption key derivation method is locked. OQ-22
(key rotation workflow) remains open but does not block implementation.
2026-06-19 09:49:06 +00:00
dd1ca1de70 docs(architecture): add alknet-vault spec, ADR-018, ADR-019, OQ-20/21/22
Spec the vault crate from its existing implementation. The vault is
stable (implementation exists); this spec documents what IS so the
implementation-sync agent can reconcile source drift.

New spec documents (crates/vault/):
- README.md — crate index, security constraints, public API
- mnemonic-derivation.md — BIP39, SLIP-0010, BIP-0032, derivation paths
- encryption.md — AES-256-GCM, EncryptedData, key versioning, salt
- service.md — VaultServiceHandle lifecycle, actor dispatch, cache
- protocol.md — VaultProtocol irpc messages, DerivedKey redaction

New ADRs:
- ADR-018: Vault as standalone crate (zero alknet deps; own types/errors)
- ADR-019: Vault assembly-layer-only access (CLI is sole caller)

New open questions:
- OQ-20: Salt/KDF Phase B (open, low priority — salt field reserved)
- OQ-21: Remote vault administration (deferred — needs ADR if ever needed)
- OQ-22: Key rotation mechanism (open, low priority — workflow not specced)

Spec-vs-source drift explicitly flagged (for the sync agent):
- rand::random() used for IVs instead of OsRng (security-critical)
- unwrap() on every RwLock acquisition (must use unwrap_or_else)
- ADR-038 / OQ-SVC-03 references in source comments are stale (old numbering)
- VaultServiceActor::spawn returns a non-functional second actor (source bug)
- KeyVersionMismatch error variant is defined but unused in v1
2026-06-19 09:23:47 +00:00
40f6468e18 docs(architecture): fix spec/ADR inconsistencies from pre-decomposition review
Critical:
- operation-registry: remove stale duplicate OperationEnv impl that
  propagated parent.metadata through composition (violated ADR-014);
  collapse to one canonical block with metadata: HashMap::new()
- operation-registry: fix request_id collision — format!("env-{name}")
  produced identical IDs across concurrent invocations, corrupting
  PendingRequestMap correlation and the abort-cascade tree (ADR-016)
- operation-registry + ADR-015: fix OperationContext.internal visibility —
  pub field let handlers mark their own call internal (privilege
  escalation per ADR-015); change to pub(crate) with pub fn is_internal

Warnings:
- core-types: add Connection::set_identity/identity (OQ-11) to the
  Connection type spec — was specified in auth.md but missing from the
  type definition
- operation-registry: add Capabilities: Clone design note — invoke()
  clones capabilities through composition; explicit security implication
- call-protocol: add CallAdapter root OperationContext construction
  example showing internal: false wire path, complementing
  OperationEnv::invoke() internal: true composition path
- overview: remove alknet/agent from ALPN registry — agent is a future
  consumer of alknet-call (call-protocol operations), not a separate ALPN
- call-protocol: clarify call.requested payload schema and the
  leading-slash convention (wire operationId has slash, registry name
  does not)

Suggestions:
- operation-registry: cross-reference ResponseEnvelope definition
- core-types: add StreamError to HandlerError mapping table
2026-06-19 09:13:10 +00:00
400c60e7f4 docs(architecture): security constraints from security review
Address security review findings by adding explicit constraints to specs
and implementation specialist role:

Architectural constraints (spec updates):
- metadata does not propagate through OperationEnv::invoke() — fresh
  HashMap for nested calls, closes the back-door leak channel where a
  handler that puts a secret in metadata would leak it to children and
  across from_call to remote nodes (ADR-014)
- Config reload must be authenticated/local-only — malicious reload =
  root-equivalent privilege grant (config.md)
- from_call trust is transitive — scoped env bounds reachability, not
  what the remote op does (operation-registry.md)
- Token entropy ≥128 bits — prefix is lookup aid not secret, offline
  hash verification requires high-entropy tokens (auth.md)

Implementation constraints (auth.md security constraints section + role spec):
- OsRng for cryptographic nonces (AES-GCM IV reuse is catastrophic)
- CachedKey derives Zeroize/ZeroizeOnDrop (no secrets in freed heap)
- No unwrap()/expect() outside tests (poisoned lock recovery, not crash)
- Implementation specialist role spec updated with all four constraints
2026-06-19 06:55:54 +00:00
c0a322ac29 docs(architecture): resolve OQ-11 and OQ-19 — all open questions resolved
OQ-11 (handler-level auth observability): Option B — handlers store
resolved identity on Connection via set_identity. Two identity scopes:
connection-level (observability, write-once-read-many) and per-request
(ACL, on OperationContext). Per-request takes precedence for ACL;
connection-level is for logging/audit only.

OQ-19 (session-scoped registries): Protocol doesn't need changes.
OperationEnv must remain a trait (not concrete) to enable session-overlay
pattern. Three-tier registry: core (static, External+Internal), session
(dynamic, Internal-only), promotion (curated review). Documented as
implementation guard in operation-registry.md.

All 19 open questions are now resolved. No open one-way or two-way doors
remain. The architecture is ready for review and implementation.
2026-06-19 06:05:04 +00:00
8f19eb8861 docs(architecture): add ADR-017 call protocol client and adapter contract, resolve OQ-15
ADR-017 locks the client/adapter architecture:
- CallClient opens QUIC connections, shares dispatch loop with CallAdapter
- Connection direction independent of call direction (both sides can call)
- from_call adapter: discovers remote ops via services/list + services/schema,
  registers with forwarding handlers (same pattern as from_openapi/from_mcp)
- to_openapi/to_mcp: project local ops to external protocols
- OperationAdapter trait: produces (OperationSpec, Handler) pairs
- Cross-node call tree: abort cascade propagates through from_call handlers
- Credentials from capabilities (ADR-014), adapter ops Internal by default (ADR-015)

The dispatch POC at /workspace/@alkdev/dispatch demonstrated head/worker over
SSH+axum; under the call protocol it's cross-node composition via from_call.
Connection topology (who advertises, who opens) is independent of call
direction — runner pattern, dispatch pattern, and P2P all work.
2026-06-18 10:57:29 +00:00
e2730869ca docs(architecture): add ADR-016 abort cascade for nested calls, resolve OQ-17
ADR-016 locks the abort cascade model:
- call.aborted cascades to all non-terminal descendants via parent_request_id
- Default policy: abort-dependents (abort everything downstream)
- Opt-in: continue-running (started descendants continue, pending ones abort)
- Server (CallAdapter) discovers descendants and propagates; client sends one abort
- Handlers clean up via Rust async drop semantics (Drop guards)
- parent_indexed map suffices for tree walking; flowgraph is optional prior art

Spec updates:
- call-protocol.md abort cascade section references ADR-016
- OQ-17 resolved, ADR-016 referenced across all call crate specs
- README.md updated: ADRs 001-016, OQ-17 moved to resolved
2026-06-18 09:37:19 +00:00
6285779c30 docs(architecture): add ADR-015 privilege model and authority context, resolve OQ-18
ADR-015 locks the call protocol's security model:
- internal flag switches authority context to handler identity, not skip ACL
- Operations have External/Internal visibility (Internal returns NOT_FOUND from wire, excluded from services/list)
- OperationContext carries both identity (caller/principal) and handler_identity (handler/agent)
- Scoped composition env bounds reachability (handler can only invoke declared operations)
- Three controls together: visibility (wire boundary) + handler identity (authority) + scoped env (reachability) = least privilege

Spec updates:
- OperationSpec gains Visibility field (External/Internal)
- OperationContext gains handler_identity field
- AccessControl section: ACL runs against caller identity for external, handler identity for internal
- LocalOperationEnv propagates handler_identity
- services/list only returns External operations
- Adapter-registered operations are Internal by default
- OQ-18 resolved, ADR-015 referenced across all call crate specs
2026-06-18 08:55:34 +00:00
b4aadc6b93 docs(architecture): add OQ-19 session-scoped registries and agent-written operations
Document the three-tier registry model (core/session/promotion) and the
self-improving agent workflow where agents write their own operations in
a quickjs sandbox. The POC at /workspace/toolEnv demonstrated the sandbox
mechanism (quickjs in Deno web workers, proxy-based env bridge via
postMessage) but exposed the full registry to the sandbox — the security
gap that OQ-18's scoped composition env addresses.

The call protocol doesn't need changes: the OperationEnv trait is the
composition point, and a session-scoped env wraps the global env (session
registry first, fall through to global). The one-way door this OQ guards
against: making OperationEnv concrete instead of a trait, or hardcoding
the global registry into the dispatch path, would close the session-overlay
pattern. Session-scoped operations are always Internal, run under the
handler's identity, and are ephemeral. Promotion to core requires curation
review (architect role with promote scope).
2026-06-18 08:31:46 +00:00
f27d717ac8 docs(architecture): reframe OQ-17 and OQ-18 as protocol-level concerns, not agent-specific
The abort cascade and privilege model are call protocol semantics that
every consumer inherits — NAPI adapter, Python adapter, agent service, and
any future service speaking the EventEnvelope wire format. Framing them as
'needs agent crate in view' let a single consumer's timeline gate a
protocol-level decision. The agent use case is a useful test case for edge
cases, but the decisions belong to the call protocol.
2026-06-18 07:47:57 +00:00
fab2c88444 docs(architecture): rename trusted to internal, add OQ-17 abort cascade and OQ-18 privilege model
The 'trusted' flag on OperationContext was the wrong word — it implies a
trust decision was made, but what actually happens is the call originated
internally (from composition) not externally (from the wire). Renamed to
'internal' with clarified semantics: internal calls switch authority
context to the handler's identity, not skip ACL. This prevents the
privilege escalation vector where composition with 'trusted: true' bypassed
all access control (buggy handler + parameterized dispatch).

- Rename trusted -> internal across operation-registry.md, ADR-014
- Update OperationContext field description and LocalOperationEnv code
- Add OQ-17: abort cascade for nested calls (call.aborted cascades to
  descendants, default abort-dependents, continue-running opt-in). One-way
  door on the protocol event schema; mechanism is a two-way door.
- Add OQ-18: privilege model and authority context (internal = authority
  switch not ACL skip, External/Internal operation visibility, scoped
  composition env + handler identity). Needs agent crate in view.
- Add abort cascade section and constraint to call-protocol.md
- Update crates/call/README.md with OQ-17, OQ-18, and two new design principles
- Update architecture README.md with OQ-17, OQ-18
2026-06-18 07:38:33 +00:00