Gap analysis for completing alknet-call: the server-side core (~5.7k lines, 159 tests) is implemented, but the client side (CallClient), the bilateral exchange mechanism (from_call), and the adapter contract (OperationAdapter trait) are specced in ADR-017 and unimplemented. It records: implementation state (verified against src/), five decision points (DC-1 to DC-5, with peer-scoped registry filtering as the load-bearing one), the settled adapter location map (trait + from_call + from_jsonschema in alknet-call; from_openapi/from_mcp in alknet-http), the no-env-vars invariant (Capabilities → from_openapi handler → HTTP header), and the exchange-of-operations runner pattern with dispatch as the concrete downstream consumer.
| status | last_updated |
|---|---|
| draft | 2026-06-25 |
alknet-call Completion — Gap Analysis
This document captures the gap between the existing alknet-call architecture
(ADRs 005/007/012/014/015/016/017/022/023/024, specs in
docs/architecture/crates/call/) and the current implementation
(crates/alknet-call/src/), the decisions needed before implementation can
proceed, and the downstream crates this completion unblocks.
Unlike the alknet-ssh phase-0 findings (a true exploration doc for a crate with no existing architecture), this is a gap analysis + decision record for completing existing architecture. The specs are largely settled; the work is implementing what's specced and resolving a small number of decisions the specs left as two-way doors or didn't address.
Implementation State (Verified)
The call protocol's server-side core is implemented and tested (159 tests, passing). What's missing is the client side and the adapter contract.
Implemented (~5,773 lines, 159 tests)
| Component | File | Lines | Status |
|---|---|---|---|
| CallAdapter (ProtocolHandler for alknet/call) | protocol/adapter.rs | 1,051 | Done |
| CallConnection (Layer 2 overlay, call/subscribe/abort) | protocol/connection.rs | 780 | Partial (see below) |
| Wire framing (EventEnvelope, FrameFramedReader/Writer) | protocol/wire.rs | 544 | Done |
| PendingRequestMap (ID-based correlation) | protocol/pending.rs | 584 | Done |
| Abort cascade | protocol/abort.rs | 393 | Done |
| OperationRegistry, HandlerRegistration, builder | registry/registration.rs | 734 | Done |
| OperationSpec, AccessControl, Visibility, ErrorDefinition | registry/spec.rs | 321 | Done |
| OperationContext, ScopedOperationEnv, AbortPolicy | registry/context.rs | 178 | Done |
| OperationEnv trait, CompositeOperationEnv, LocalOperationEnv | registry/env.rs | 598 | Done |
| Service discovery (services/list, services/schema specs + handlers) | registry/discovery.rs | 557 | Done |
Not implemented (specced in ADR-017, absent from src/)
| Component | Spec location | Priority | Unblocks |
|---|---|---|---|
| CallClient (outbound connection opener) | ADR-017 §1 | Critical | Runner pattern, bilateral exchange, every downstream consumer |
| from_call adapter (discover + register remote ops) | ADR-017 §3 | Critical (depends on CallClient) | Bilateral registry exchange, container-service pattern |
| OperationAdapter trait | ADR-017 §5 | Enabling | alknet-http's from_openapi/from_mcp implementations |
| from_jsonschema (schema-only registration, no handler) | ADR-017 §5 | Medium | Type validation, composition graph construction without runtime |
Partially implemented
CallConnection (protocol/connection.rs:34) exists and implements the
Layer 2 overlay (register_imported, register_imported_all, overlay_env),
the call() / subscribe() / abort() outbound-call API, and the
OverlayOperationEnv trait impl. It is constructed via
CallConnection::new(connection: Connection) — meaning it wraps a Connection
that was already established by the CallAdapter's accept path.
What's missing is the path that opens a connection and constructs a
CallConnection from the client side: CallClient::connect(addr, credentials).
The CallConnection type itself is ready; the CallClient that produces it is
not. This confirms ADR-017's design: the dispatch loop is shared, and the
client is the connection-establishment half, not a parallel protocol
implementation.
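The split described above can be sketched as follows. The sketch relies on simplified stand-ins: a Connection, a Credentials type, a hypothetical Dialer trait in place of the QUIC transport, and a CallConnection reduced to its constructor. The real CallClient::connect would be async and would dial the alknet/call ALPN. This version is synchronous so that it runs on std alone. It shows that the client only establishes the connection and then reuses CallConnection::new, the same constructor the accept path uses.

```rust
/// Stand-in for an established transport connection. In alknet-call this would
/// be the QUIC connection negotiated on the alknet/call ALPN.
#[derive(Debug)]
pub struct Connection {
    pub peer: String,
    pub identity: String,
}

/// Reduced CallConnection: like the real one, it is built from an existing Connection.
pub struct CallConnection {
    connection: Connection,
}

impl CallConnection {
    pub fn new(connection: Connection) -> Self {
        CallConnection { connection }
    }

    pub fn peer(&self) -> &str {
        &self.connection.peer
    }
}

/// Hypothetical credentials presented when dialing.
pub struct Credentials {
    pub identity: String,
}

/// Hypothetical transport abstraction (a real one would open a QUIC connection).
pub trait Dialer {
    fn dial(&self, addr: &str, credentials: &Credentials) -> Result<Connection, String>;
}

/// The client half: establish the connection, then hand it to CallConnection.
/// There is no second dispatch loop here.
pub struct CallClient<D: Dialer> {
    dialer: D,
}

impl<D: Dialer> CallClient<D> {
    pub fn new(dialer: D) -> Self {
        CallClient { dialer }
    }

    pub fn connect(&self, addr: &str, credentials: &Credentials) -> Result<CallConnection, String> {
        let connection = self.dialer.dial(addr, credentials)?;
        Ok(CallConnection::new(connection))
    }
}

/// In-memory dialer used only to exercise the sketch; it rejects empty identities.
pub struct FakeDialer;

impl Dialer for FakeDialer {
    fn dial(&self, addr: &str, credentials: &Credentials) -> Result<Connection, String> {
        if credentials.identity.is_empty() {
            return Err("unauthorized".to_string());
        }
        Ok(Connection {
            peer: addr.to_string(),
            identity: credentials.identity.clone(),
        })
    }
}
```

connect returns the same CallConnection type the accept path produces. Under this design, the overlay and the call()/subscribe()/abort() API would behave identically on both sides.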
Decisions Needed
These are the points the specs either left as two-way doors or didn't address. Each is tagged with door type per ADR-009. Resolving these is the prerequisite for implementation.
DC-1: CallClient registry scope — share global vs peer-scoped subset
(One-way door — security dimension; ADR-017 Consequences flags this)
ADR-017 §1 says CallClient "has its own operation registry to dispatch
incoming calls from the remote side." The Consequences section flags the
security dimension explicitly: "Sharing the global registry with a CallClient
exposes local capabilities to the remote peer... A peer-scoped subset must
filter by capability remote-safety, not just operation name."
Three options:
- (a) Share the global registry: the remote peer can call any External operation. Simplest. But per ADR-017's Consequences, this exposes the local node's Capabilities to the remote peer's calls: OperationContext.capabilities is populated from the local HandlerRegistration.capabilities, so the local node's API keys get used for the remote peer's call. This is a capability-exposure decision, not just a dispatch decision.
- (b) Peer-scoped subset: the CallClient holds a filtered view of the global registry, exposing only operations whose Capabilities are marked remote-safe. Requires a "remote-safe" flag on HandlerRegistration or on Capabilities entries (which don't exist today).
- (c) Separate registry per CallClient: the CallClient has its own registry, populated explicitly at construction. Most restrictive, most explicit, most boilerplate.
Recommendation: (b) peer-scoped subset as the v1 default, with (a) as
an explicit opt-in for trusted peers. Rationale: the runner pattern (worker
connects to hub) and the dispatch pattern (hub connects to worker) both
involve semi-trusted peers where exposing all local capabilities is wrong by
default. The "remote-safe" marking is the new concept this introduces — likely
a Visibility::External-adjacent flag or a Capabilities entry annotation.
This needs an ADR (likely an amendment to ADR-017 or a new ADR-028) because it
adds a concept to the registration bundle. The exact shape is a two-way door;
the existence of the filtering is the one-way door.
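Option (b) could look something like the following sketch. It models the remote-safe marking as an annotation on Capabilities entries (a set of remote-safe capability keys), which is one of the two shapes DC-1 leaves open. PeerScopedRegistry, PeerExposure, and remote_safe_capabilities are hypothetical names. HandlerRegistration is reduced to the two fields the filter needs.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Visibility {
    Internal,
    External,
}

/// Reduced HandlerRegistration: only what the peer filter needs.
#[derive(Clone, Debug)]
pub struct HandlerRegistration {
    pub visibility: Visibility,
    /// Keys of the Capabilities entries this handler receives (e.g. "openai").
    pub capability_keys: Vec<String>,
}

/// Hypothetical exposure mode for a CallClient's peer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PeerExposure {
    /// Option (b), the recommended v1 default.
    RemoteSafeOnly,
    /// Option (a), an explicit opt-in for trusted peers.
    TrustedPeer,
}

/// Filtered, read-only view over the global registry.
pub struct PeerScopedRegistry<'a> {
    global: &'a BTreeMap<String, HandlerRegistration>,
    remote_safe_capabilities: BTreeSet<String>,
    exposure: PeerExposure,
}

impl<'a> PeerScopedRegistry<'a> {
    pub fn new(
        global: &'a BTreeMap<String, HandlerRegistration>,
        remote_safe_capabilities: BTreeSet<String>,
        exposure: PeerExposure,
    ) -> Self {
        PeerScopedRegistry { global, remote_safe_capabilities, exposure }
    }

    /// Resolve an incoming call from the peer. Filtered-out operations look missing.
    pub fn resolve(&self, name: &str) -> Option<&'a HandlerRegistration> {
        let global: &'a BTreeMap<String, HandlerRegistration> = self.global;
        global.get(name).filter(|reg| self.exposes(reg))
    }

    /// What services/list would report to this peer.
    pub fn list(&self) -> Vec<&'a str> {
        let global: &'a BTreeMap<String, HandlerRegistration> = self.global;
        global
            .iter()
            .filter(|(_, reg)| self.exposes(reg))
            .map(|(name, _)| name.as_str())
            .collect()
    }

    fn exposes(&self, reg: &HandlerRegistration) -> bool {
        if reg.visibility != Visibility::External {
            return false;
        }
        match self.exposure {
            PeerExposure::TrustedPeer => true,
            // Every capability the handler would receive must be marked remote-safe.
            PeerExposure::RemoteSafeOnly => reg
                .capability_keys
                .iter()
                .all(|key| self.remote_safe_capabilities.contains(key)),
        }
    }
}
```

In this sketch, an External operation that uses no capabilities passes the default filter vacuously, since it has no credentials to leak. Whether that is the right default is part of the shape question DC-1 leaves to its ADR.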
DC-2: from_call re-import on reconnection
(Two-way door — ADR-017 Assumption 4)
ADR-017 Assumption 4: "If the remote operation changes (new schema, renamed), the imported spec is stale until re-import. The assumption is that re-import happens on reconnection or is triggered explicitly. Hot-swapping imported specs is a two-way door."
The question: does from_call run automatically on every (re)connection, or
only on explicit trigger? Auto-re-import on reconnect is simpler for the
runner pattern (worker reconnects → hub re-discovers worker's ops
automatically). Explicit trigger is safer (no surprise registry mutations).
Recommendation: auto-re-import on connection establishment for the
v1 default. The runner pattern is the primary use case, and runners
reconnecting is the common case — making it explicit adds friction without
clear benefit. The overlay is per-connection (Layer 2, ADR-024), so a
stale overlay dies with the connection; re-import on reconnect is naturally
scoped. Explicit re-import can be added later as a CallConnection::refresh()
method if needed. This is a two-way door — record the default, don't spend an
ADR.
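The recommended default can be sketched in a few lines. The sketch assumes a hypothetical RemoteDiscovery trait in place of the services/list call, with the overlay reduced to a set of imported operation names. Each connection establishment builds a fresh overlay, so dropping a connection also drops its imports. The refresh() method corresponds to the optional CallConnection::refresh() mentioned in DC-2 and is equally hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the services/list call made over a fresh connection.
pub trait RemoteDiscovery {
    fn list_external_ops(&self) -> Vec<String>;
}

/// Reduced Layer 2 overlay: the names imported from the remote side.
pub struct ConnectionOverlay {
    pub imported: BTreeSet<String>,
}

/// Reduced per-connection state. The overlay is owned by the connection,
/// so it is dropped together with it.
pub struct LiveConnection {
    pub overlay: ConnectionOverlay,
}

/// DC-2 default: run from_call on every connection establishment, so a
/// reconnect starts from a fresh import instead of a stale overlay.
pub fn on_connection_established<R: RemoteDiscovery>(remote: &R) -> LiveConnection {
    let imported = remote.list_external_ops().into_iter().collect();
    LiveConnection { overlay: ConnectionOverlay { imported } }
}

impl LiveConnection {
    /// Hypothetical explicit re-import (the possible later refresh() method).
    pub fn refresh<R: RemoteDiscovery>(&mut self, remote: &R) {
        self.overlay.imported = remote.list_external_ops().into_iter().collect();
    }
}

/// Fixed remote used only to exercise the sketch.
pub struct StaticRemote(pub Vec<&'static str>);

impl RemoteDiscovery for StaticRemote {
    fn list_external_ops(&self) -> Vec<String> {
        self.0.iter().map(|op| op.to_string()).collect()
    }
}
```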
DC-3: from_call namespace collision handling
(Two-way door — ADR-017 §3 mentions FromCallConfig prefix)
ADR-017 §3: FromCallConfig includes "An optional namespace prefix (to avoid
collisions when importing from multiple remote nodes)." The question is
whether the prefix is mandatory (always applied) or optional (default no
prefix, collision = last-wins or error).
Recommendation: optional prefix, default no prefix, collision = error.
A node importing from two remotes that both expose /container/exec without
prefixes should fail loudly rather than silently overwrite. The operator adds
prefixes when they know they're importing from multiple sources. This matches
the "default-deny, explicit-allow" posture. Two-way door, no ADR needed.
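The recommended collision rule can be sketched as follows. The sketch assumes that FromCallConfig carries only the optional prefix and that the prefix is prepended verbatim to the operation path; the joining rule is an assumption. ImportedOps and CollisionError are hypothetical names. Final names are computed and checked before anything is inserted, so a failed import leaves the registry unchanged.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fmt;

/// Reduced FromCallConfig: only the optional namespace prefix from ADR-017 §3.
#[derive(Default)]
pub struct FromCallConfig {
    pub prefix: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct CollisionError {
    pub name: String,
}

impl fmt::Display for CollisionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "operation {} is already registered", self.name)
    }
}

/// Imported operations: final name -> remote node it came from.
#[derive(Default)]
pub struct ImportedOps {
    by_name: BTreeMap<String, String>,
}

impl ImportedOps {
    /// Import `ops` from `remote`. Any collision (with an existing entry or
    /// within the batch) fails the whole import and changes nothing.
    pub fn import(&mut self, remote: &str, ops: &[&str], config: &FromCallConfig) -> Result<(), CollisionError> {
        let names: Vec<String> = ops
            .iter()
            .map(|op| match &config.prefix {
                Some(prefix) => format!("{prefix}{op}"),
                None => op.to_string(),
            })
            .collect();
        {
            let mut seen = BTreeSet::new();
            for name in &names {
                if self.by_name.contains_key(name) || !seen.insert(name.as_str()) {
                    return Err(CollisionError { name: name.clone() });
                }
            }
        }
        for name in names {
            self.by_name.insert(name, remote.to_string());
        }
        Ok(())
    }

    pub fn origin(&self, name: &str) -> Option<&str> {
        self.by_name.get(name).map(String::as_str)
    }
}
```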
DC-4: OperationAdapter trait error type
(Two-way door — ADR-017 §5 says "specific trait signatures... are two-way doors")
ADR-017 §5 shows the trait as async fn import(&self) -> Vec<HandlerRegistration>,
with no error type. A real implementation needs to handle failures (HTTP fetch
fails for from_openapi, remote unreachable for from_call, schema parse
error for from_jsonschema).
Recommendation: the trait returns Result<Vec<HandlerRegistration>, AdapterError> where AdapterError is a crate-level enum
(DiscoveryFailed, SchemaParse, Transport, Unauthorized). The spec's
omission of the error type was an implementation-detail two-way door; the
implementation fills it in. Record in the spec amendment, not a full ADR.
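The DC-4 signature can be sketched as follows. The String payloads on the AdapterError variants are illustrative, and HandlerRegistration is reduced to a name. ToySchemaAdapter is hypothetical: it treats a newline-separated list of operation paths as its "schema" so the example needs no JSON parser. block_on is a minimal no-op-waker executor. It works here only because these futures never suspend; real callers would use their async runtime. async fn in traits requires Rust 1.75 or later.

```rust
use std::fmt;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Reduced stand-in for alknet-call's HandlerRegistration.
#[derive(Clone, Debug, PartialEq)]
pub struct HandlerRegistration {
    pub name: String,
}

/// The variants named in DC-4; the String payloads are illustrative.
#[derive(Clone, Debug, PartialEq)]
pub enum AdapterError {
    DiscoveryFailed(String),
    SchemaParse(String),
    Transport(String),
    Unauthorized,
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::DiscoveryFailed(m) => write!(f, "discovery failed: {m}"),
            AdapterError::SchemaParse(m) => write!(f, "schema parse error: {m}"),
            AdapterError::Transport(m) => write!(f, "transport error: {m}"),
            AdapterError::Unauthorized => write!(f, "unauthorized"),
        }
    }
}

impl std::error::Error for AdapterError {}

/// ADR-017 §5's trait with the DC-4 error type filled in.
pub trait OperationAdapter {
    async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>;
}

/// Hypothetical schema-only adapter: its "document" is one operation path per line.
pub struct ToySchemaAdapter {
    pub document: String,
}

impl OperationAdapter for ToySchemaAdapter {
    async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError> {
        self.document
            .lines()
            .map(|line| {
                let name = line.trim();
                if name.starts_with('/') {
                    Ok(HandlerRegistration { name: name.to_string() })
                } else {
                    Err(AdapterError::SchemaParse(format!("invalid operation path: {name:?}")))
                }
            })
            .collect()
    }
}

// Minimal executor: a no-op waker polled in a loop. Adequate only because the
// futures above never suspend.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions ignore the data pointer and have no effects.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```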
DC-5: from_jsonschema vs from_call separation
(Confirmed — not a decision, but recorded for clarity)
These are distinct, not collapsible:
| | from_jsonschema | from_call |
|---|---|---|
| Schema source | Provided directly (caller fetches, passes in) | Discovered over wire (services/list + services/schema) |
| Handler at call time | None (schema-only, FromJsonSchema provenance) | Forwards over QUIC (FromCall provenance, leaf) |
| Use case | Type validation, discovery, composition graph construction | Actually invoking remote operations |
from_call = schema import (the from_jsonschema-shaped step) + forwarding
handler attachment. Keeping them separate preserves the "schema-only, no
execution" use case (type checking, safe composition planning without runtime).
This is confirmed architecture, not a decision to make.
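The composition "from_call = schema import + forwarding handler attachment" can be sketched as follows. The sketch makes three assumptions: a Handler reduced to a string-in/string-out closure, an OperationSpec with only a name and a schema string, and a hypothetical RemotePeer trait. In that trait, discover stands in for services/list + services/schema and forward stands in for an outbound call over the connection. EchoPeer exists only to exercise the sketch. The point is structural: from_call calls from_jsonschema and then attaches forwarding handlers, so the schema-only path remains usable on its own.

```rust
use std::sync::Arc;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct OperationSpec {
    pub name: String,
    pub input_schema: String,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Provenance {
    FromJsonSchema,
    FromCall,
}

/// Hypothetical handler shape: serialized input in, serialized output out.
pub type Handler = Arc<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

pub struct HandlerRegistration {
    pub spec: OperationSpec,
    pub provenance: Provenance,
    /// None means schema-only: the operation can be inspected but not invoked.
    pub handler: Option<Handler>,
}

/// The from_jsonschema-shaped step: specs supplied by the caller, no handlers.
pub fn from_jsonschema(specs: Vec<OperationSpec>) -> Vec<HandlerRegistration> {
    specs
        .into_iter()
        .map(|spec| HandlerRegistration { spec, provenance: Provenance::FromJsonSchema, handler: None })
        .collect()
}

/// Hypothetical view of the remote end of a connection.
pub trait RemotePeer: Send + Sync + 'static {
    /// Stands in for services/list + services/schema.
    fn discover(&self) -> Vec<OperationSpec>;
    /// Stands in for an outbound call over the same connection.
    fn forward(&self, op: &str, input: &str) -> Result<String, String>;
}

/// from_call = the same schema import, then a forwarding handler per operation.
pub fn from_call<P: RemotePeer>(peer: Arc<P>) -> Vec<HandlerRegistration> {
    from_jsonschema(peer.discover())
        .into_iter()
        .map(|mut reg| {
            let remote = Arc::clone(&peer);
            let op = reg.spec.name.clone();
            let handler: Handler = Arc::new(move |input: &str| remote.forward(&op, input));
            reg.provenance = Provenance::FromCall;
            reg.handler = Some(handler);
            reg
        })
        .collect()
}

/// Fake peer used only to exercise the sketch.
pub struct EchoPeer;

impl RemotePeer for EchoPeer {
    fn discover(&self) -> Vec<OperationSpec> {
        vec![OperationSpec { name: "/container/exec".to_string(), input_schema: "{}".to_string() }]
    }

    fn forward(&self, op: &str, input: &str) -> Result<String, String> {
        Ok(format!("{op} <- {input}"))
    }
}
```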
Adapter Location Map (Settled)
The decomposition principle: the adapter trait lives where the types live (alknet-call); the adapter implementations live where their transport dependencies live.
alknet-call (lean — no HTTP client, no HTTP server)
├── OperationAdapter trait (the contract — async, per ADR-017 §5)
├── from_call (QUIC — discovers remote ops via call protocol)
├── from_jsonschema (pure parse — caller fetches the doc, passes it in)
└── CallClient (outbound connection opener — the #1 gap)
alknet-http (owns HTTP server + HTTP client — separate crate, separate Phase 0)
├── ProtocolHandler for h2/http1.1/h3 (axum server — inbound HTTP)
├── from_openapi (parse OpenAPI doc + reqwest forwarding handler)
├── to_openapi (generate OpenAPI doc from local registry)
├── from_mcp (feature-gated) (import remote MCP tools over streamable HTTP — reqwest)
└── to_mcp (feature-gated) (expose local ops as MCP tools over streamable HTTP — axum)
Not built: MCP stdio transport
— stdio = spawn arbitrary executable = built-in RCE ("download untrusted MCP servers")
— streamable HTTP is the only supported MCP transport in alknet
— recorded as an explicit security position, not a feature gap
Why this works: alknet-call never sees the HTTP client. The
from_openapi/from_mcp forwarding handlers are opaque Arc<dyn Handler>
from the registry's perspective — constructed by alknet_http::from_openapi()
at registration time, stored in HandlerRegistration, dispatched by the
CallAdapter which doesn't know reqwest is involved. alknet-call stays lean
(no reqwest, no axum); alknet-http owns both HTTP directions.
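The boundary described above can be sketched with two modules standing in for the two crates. The Handler trait, Registry, and OpenApiForwarder are simplified, hypothetical shapes. The forwarder only formats the request it would send, whereas a real from_openapi handler would hold an HTTP client. The call module never names anything from the http module, and dispatch sees only Arc<dyn Handler>.

```rust
use std::sync::Arc;

/// Stands in for alknet-call: it only knows the Handler trait.
mod call {
    use std::collections::HashMap;
    use std::sync::Arc;

    pub trait Handler: Send + Sync {
        fn handle(&self, input: &str) -> Result<String, String>;
    }

    #[derive(Default)]
    pub struct Registry {
        handlers: HashMap<String, Arc<dyn Handler>>,
    }

    impl Registry {
        pub fn register(&mut self, name: &str, handler: Arc<dyn Handler>) {
            self.handlers.insert(name.to_string(), handler);
        }

        /// Dispatch sees only `dyn Handler`; it cannot tell an HTTP-backed
        /// handler from a local one.
        pub fn dispatch(&self, name: &str, input: &str) -> Result<String, String> {
            let handler = self
                .handlers
                .get(name)
                .ok_or_else(|| format!("unknown operation {name}"))?;
            handler.handle(input)
        }
    }
}

/// Stands in for alknet-http: it depends on `call`, never the reverse.
mod http {
    use super::call::Handler;

    /// Hypothetical from_openapi forwarding handler. A real one would hold an
    /// HTTP client; this one only describes the request it would send.
    pub struct OpenApiForwarder {
        pub method: &'static str,
        pub url: String,
    }

    impl Handler for OpenApiForwarder {
        fn handle(&self, input: &str) -> Result<String, String> {
            Ok(format!("{} {} body={}", self.method, self.url, input))
        }
    }
}
```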
ADR-003 dependency note: alknet-http implementing from_openapi/from_mcp
means alknet-http depends on alknet-call (for OperationSpec, Handler,
HandlerRegistration, OperationAdapter). ADR-003's rule is "no handler crate
depends on another handler crate" — but alknet-call is both a handler and the
protocol foundation that alknet-agent and alknet-napi already consume. alknet-http
depending on alknet-call is "HTTP uses the call protocol types," not "HTTP depends
on SSH." This is within the spirit of ADR-003 (alknet-call is protocol-foundation,
not a peer handler), but should be noted explicitly in the alknet-http spec
and possibly as a one-line amendment to ADR-003 clarifying that alknet-call is a
protocol-foundation crate.
The No-Env-Vars Invariant (Architectural Mechanism)
This is the architectural fix for the env-var problem in downstream consumers
like aisdk (the Rust port of Vercel's AI SDK at /workspace/aisdk/), whose 75
providers read credentials from environment variables such as
std::env::var("OPENAI_API_KEY") in their Default impls. The fix is not to
modify aisdk; the env-var path is simply never taken because the assembly
layer never calls Default::default().
The credential injection path:
vault (seed)
→ assembly layer (derive + decrypt at startup, per ADR-014/019/025)
→ Capabilities (non-serializable, zeroized, immutable — ADR-014)
→ HandlerRegistration.capabilities (ADR-022, the registration bundle)
→ OperationContext.capabilities (per-request, populated by dispatch
path from the bundle — ADR-022 §6)
→ from_openapi handler reads context.capabilities.get("openai")
→ injects into HTTP Authorization header
→ reqwest request goes out with vault-derived credential
The from_openapi/from_mcp forwarding handler (living in alknet-http) is the
credential injection point. It reads from context.capabilities, not from
std::env::var. aisdk's Default impls reading env vars are simply never
called — the assembly layer constructs providers with vault-derived
credentials through the builder API, or the provider's HTTP calls are routed
through from_openapi operations that carry the credential in Capabilities.
This must be a spec-level invariant in alknet-call, not a runtime convention.
The dispatch path (build_root_context and OperationEnv::invoke() per
ADR-022 §5) populates OperationContext.capabilities from the registration
bundle. The invariant is: no handler reads outbound credentials from any
source other than OperationContext.capabilities. This is already the
architectural intent of ADR-014; the completion work should make it an explicit,
documented invariant that the from_openapi/from_mcp handler implementations
(in alknet-http) are verified against.
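The invariant can be sketched from the handler's side using simplified types. Capabilities is a map of secret strings with no Serialize impl and a Debug impl that prints only keys. The real type is also zeroized per ADR-014, which this std-only sketch omits. build_forwarded_request, OutboundRequest, and ForwardError are hypothetical, and the Bearer scheme is an assumption about the target API. A missing capability is an error, with no environment fallback.

```rust
use std::collections::HashMap;
use std::fmt;

/// Reduced Capabilities: built once by the assembly layer, no Serialize impl,
/// and a Debug impl that prints only keys. (Zeroize-on-drop is omitted here.)
pub struct Capabilities {
    secrets: HashMap<String, String>,
}

impl Capabilities {
    /// Hypothetical constructor for vault-derived material; nothing here reads std::env.
    pub fn from_vault(entries: &[(&str, &str)]) -> Self {
        let secrets = entries.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect();
        Capabilities { secrets }
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.secrets.get(key).map(String::as_str)
    }
}

impl fmt::Debug for Capabilities {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut keys: Vec<&String> = self.secrets.keys().collect();
        keys.sort();
        f.debug_struct("Capabilities").field("keys", &keys).finish()
    }
}

/// Reduced per-request context, populated by the dispatch path from the bundle.
pub struct OperationContext<'a> {
    pub capabilities: &'a Capabilities,
}

#[derive(Debug, PartialEq)]
pub struct OutboundRequest {
    pub url: String,
    pub headers: Vec<(String, String)>,
}

#[derive(Debug, PartialEq)]
pub enum ForwardError {
    MissingCapability(String),
}

/// The credential comes from context.capabilities or the call fails.
/// There is deliberately no std::env::var fallback.
pub fn build_forwarded_request(
    ctx: &OperationContext<'_>,
    capability_key: &str,
    url: &str,
) -> Result<OutboundRequest, ForwardError> {
    let key = ctx
        .capabilities
        .get(capability_key)
        .ok_or_else(|| ForwardError::MissingCapability(capability_key.to_string()))?;
    Ok(OutboundRequest {
        url: url.to_string(),
        headers: vec![("Authorization".to_string(), format!("Bearer {key}"))],
    })
}
```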
The "Exchange of Operations" Pattern (Runner / Container Service)
This is the canonical downstream pattern alknet-call completion unblocks, made
explicit here so Phase 1 specs can reference it. Concrete example: the
container service at /workspace/@alkdev/dispatch (axum + russh SSH client for
"reverse git runner" over Docker/vast.ai) gets rewritten as a call-protocol
service.
Bilateral exchange
Container service (runs on a vast.ai/docker instance):
Defines Local ops: /container/exec, /container/list, /container/logs...
(real handlers — calls bollard or vast.ai API)
Connects to hub as a CallClient (outbound connection — runner pattern)
Hub (central server):
Runs CallAdapter (server) on alknet/call (already implemented)
When the container service connects:
hub runs from_call → discovers /container/* via services/list + services/schema
registers them as FromCall provenance (leaf, forwarding handlers) in the
connection's Layer 2 overlay (ADR-024)
Now the hub (or anything connected to the hub) can call /container/exec
The from_call handler forwards over the connection back to the container service
Bilateral: the container service ALSO runs from_call against the hub,
discovers the hub's External ops, and can call them.
Connection direction (container → hub) is independent of call direction
(both can call each other) per ADR-017 §2.
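The bilateral exchange above can be illustrated with a toy model. Nodes are reduced to maps from operation paths to Visibility, and each overlay is reduced to a map from an imported path to the node that executes it. Node, Link, establish, and route are hypothetical names. The container dials the hub, yet each side imports the other's External ops, so calls flow in both directions.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Visibility {
    Internal,
    External,
}

/// A node reduced to its local operations and their visibility.
pub struct Node {
    pub name: &'static str,
    local: BTreeMap<String, Visibility>,
}

impl Node {
    pub fn new(name: &'static str, ops: &[(&str, Visibility)]) -> Self {
        let local = ops.iter().map(|(op, vis)| (op.to_string(), *vis)).collect();
        Node { name, local }
    }

    /// What services/list answers to the other side: External ops only.
    pub fn services_list(&self) -> Vec<String> {
        self.local
            .iter()
            .filter(|(_, vis)| **vis == Visibility::External)
            .map(|(op, _)| op.clone())
            .collect()
    }
}

/// One connection. `dialer` records who opened it; it does not limit who can call.
pub struct Link<'a> {
    pub dialer: &'a Node,
    pub acceptor: &'a Node,
    /// Layer 2 overlays, one per side: imported op -> node that executes it.
    dialer_overlay: BTreeMap<String, &'static str>,
    acceptor_overlay: BTreeMap<String, &'static str>,
}

/// Once the connection exists, both sides run from_call against each other.
pub fn establish<'a>(dialer: &'a Node, acceptor: &'a Node) -> Link<'a> {
    let import = |remote: &Node| -> BTreeMap<String, &'static str> {
        remote.services_list().into_iter().map(|op| (op, remote.name)).collect()
    };
    Link {
        dialer,
        acceptor,
        dialer_overlay: import(acceptor),
        acceptor_overlay: import(dialer),
    }
}

impl Link<'_> {
    /// Route a call made by `caller`: only ops in that side's overlay resolve.
    pub fn route(&self, caller: &str, op: &str) -> Option<&'static str> {
        let overlay = if caller == self.dialer.name {
            &self.dialer_overlay
        } else {
            &self.acceptor_overlay
        };
        overlay.get(op).copied()
    }
}
```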
What this requires
- CallClient: the container service uses it to open the outbound connection to the hub. This is the #1 gap.
- from_call: both sides run it to populate their Layer 2 overlays with the other side's External ops. This is the #2 gap.
- OperationAdapter trait: from_call implements it. This is the #3 gap (enabling, not blocking: from_call can be built as a free function before the trait exists, but the trait is needed for alknet-http's adapters).
Why the container service doesn't need alknet-ssh
The current dispatch uses SSH (channel_open_direct_tcpip) as the transport
for the "connect back to hub" pattern. Under the call protocol, the container
service is a CallClient that dials the hub's alknet/call ALPN directly over
QUIC — no SSH in the loop. SSH port forwarding becomes the transitional
mechanism for targets that can't run a call-protocol client (the alknet-ssh
phase-0 findings document this transition). Once the container service runs a
CallClient, SSH is out of the path entirely.
This is the "dev runner" pattern: a call-protocol client that connects back to
a hub and exposes the core dev tools (bash, fs, etc.) as operations. The other
tools (web search, etc.) plug into the call protocol as additional operations.
The agent service (alknet-agent, downstream) is the consumer that orchestrates
these via env.invoke().
Implementation Priority Order
Based on the gap analysis and the downstream unblock chain:
1. CallClient (critical): outbound connection opener. Without it, there is no runner, no container service, and no bilateral exchange. It reuses the existing CallConnection (which is already implemented) for the dispatch loop and adds only the connection-establishment + credential-handling half. This is the single highest-value piece of work in the entire alknet-call completion.
2. from_call (critical, depends on CallClient): discovers remote ops via services/list + services/schema, constructs HandlerRegistration bundles with FromCall provenance, and registers them in the connection's Layer 2 overlay via CallConnection::register_imported_all(). The discovery mechanism (services/list / services/schema specs + handlers) is already implemented in registry/discovery.rs; from_call is the client-side consumer of that discovery API.
3. OperationAdapter trait (enabling): the async trait (async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>) that from_call, from_openapi, from_mcp, and from_jsonschema all implement. Needed before alknet-http's adapter implementations can be built. Small, standalone, unblocks alknet-http Phase 1.
4. from_jsonschema (medium, standalone): schema-only registration, no handler. Useful for validation/discovery without execution. Distinct from from_call (no forwarding behavior). Small.
5. DC-1 resolution (peer-scoped registry filtering): the security dimension of CallClient's registry. Can be addressed in parallel with #1 (it's a filtering layer on the registry the CallClient exposes, not a blocker for the connection-establishment work). Needs an ADR.
What This Completion Unblocks
| Downstream crate | What it needs from alknet-call | Status without completion |
|---|---|---|
| alknet-http | OperationAdapter trait (to implement from_openapi/from_mcp) | Blocked: can't define HTTP-backed adapters without the trait |
| alknet-ssh | Stable alknet-call types (no adapter dependency) | Not blocked: ssh depends on alknet-core, not alknet-call's adapters. Can proceed in parallel. |
| alknet-agent | CallClient (tool dispatch), from_call (remote tool import), OperationAdapter (provider adapters) | Blocked on CallClient + from_call |
| Container service (dispatch rewrite) | CallClient + from_call | Blocked: this is the primary consumer |
| Runner pattern (dev runner, opencode runner) | CallClient + from_call | Blocked: the runner IS a CallClient |
| alknet-napi | CallClient (Node.js calls remote ops) | Blocked: NAPI projects CallClient to JS |
Open Questions to Carry into Phase 1
- OQ-CALL-01 (peer-scoped registry filtering shape): the exact mechanism for marking Capabilities entries or HandlerRegistrations as remote-safe (DC-1). Needs an ADR. The existence of filtering is one-way; the shape is two-way.
- OQ-CALL-02 (OperationAdapter error type): AdapterError enum shape (DC-4). Two-way door; record in spec amendment.
- OQ-CALL-03 (from_call re-import trigger): auto-on-reconnect vs explicit (DC-2). Two-way door; recommend auto-on-reconnect as default.
- OQ-CALL-04 (namespace collision behavior): error on collision (DC-3). Two-way door; recommend error as default.
Next Steps
- Resolve DC-1 (peer-scoped registry filtering): this is the one decision that needs an ADR before CallClient can be implemented correctly. The others (DC-2, DC-3, DC-4) are two-way-door defaults that can be set in the spec amendment and revisited during implementation.
- Amend the call spec (call-protocol.md, operation-registry.md) to capture: the CallClient gap, the adapter location map, the no-env-vars invariant, the exchange-of-operations pattern, and the DC-2/3/4 defaults.
- Implement CallClient, the highest-value piece. It reuses CallConnection for the dispatch loop and adds connection establishment + credentials.
- Implement from_call, which consumes the already-implemented services/list + services/schema discovery API.
- Implement the OperationAdapter trait: small, unblocks alknet-http.
- Implement from_jsonschema: small, standalone.
References
- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md: the client/adapter contract (specced, partially unimplemented)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md: registration bundle, provenance, composition authority
- docs/architecture/decisions/024-operation-registry-layering.md: Layer 0/1/2 overlay model
- docs/architecture/decisions/014-secret-material-flow-and-capability-injection.md: the no-env-vars invariant's foundation
- docs/architecture/crates/call/call-protocol.md: CallConnection, Layer 2 overlay, compose_root_env
- docs/architecture/crates/call/operation-registry.md: adapter provenance, Capabilities injection
- crates/alknet-call/src/: implementation (verified state above)
- /workspace/@alkdev/operations/: TypeScript prior art (from_openapi.ts, from_mcp.ts, from_schema.ts, scanner.ts)
- /workspace/@alkdev/dispatch/: concrete downstream consumer (container service / "reverse git runner") this completion unblocks
- /workspace/aisdk/: downstream consumer (Rust port of Vercel AI SDK); the no-env-vars invariant makes its std::env::var reads unreachable
- /workspace/rust-sdk/: MCP Rust SDK (rmcp); streamable HTTP transport for alknet-http's from_mcp/to_mcp (separate crate, separate Phase 0)
- docs/research/alknet-ssh/phase-0-findings.md: alknet-ssh Phase 0; confirms ssh depends on alknet-core, not alknet-call's adapters, so it proceeds in parallel with this completion