diff --git a/docs/architecture/README.md b/docs/architecture/README.md index 982640f..ef08004 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -1,15 +1,17 @@ --- status: draft -last_updated: 2026-06-23 +last_updated: 2026-06-26 --- # Alknet Architecture ## Current State -**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation complete and verified, local-only by construction per ADR-025, HD-derivation key model per ADR-026) and research/reference material. Foundational ADRs (001–026) are in place. ADR-024 resolves the registry mutability question and the `OperationContext.env` type identity crisis by layering the registry by trust boundary. ADR-025 drops irpc from the vault, making it local-only by construction. ADR-026 records the HD-derivation key model as a foundational decision. Review #003 (type/API surface completeness) resolved: `DerivedKey` derive contradiction, `encrypt` prose, return-type divergence, RwLock contradiction, drift table gaps, ADR-022 stale sketches, `Capabilities`/`SessionOverlaySource`/`CallConnection`/`CachedKey` definitions, `CompositeOperationEnv` dispatch contract, `with_local` signature, payload schemas, timeout propagation, and request ID generation. The alknet-core and alknet-call crate specs are in draft; the alknet-vault crate specs are stable. +**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation complete and verified, local-only by construction per ADR-025, HD-derivation key model per ADR-026) and research/reference material. Foundational ADRs (001–028) are in place. ADR-024 resolves the registry mutability question and the `OperationContext.env` type identity crisis by layering the registry by trust boundary. 
ADR-025 drops irpc from the vault, making it local-only by construction. ADR-026 records the HD-derivation key model as a foundational decision. Review #003 (type/API surface completeness) resolved: `DerivedKey` derive contradiction, `encrypt` prose, return-type divergence, RwLock contradiction, drift table gaps, ADR-022 stale sketches, `Capabilities`/`SessionOverlaySource`/`CallConnection`/`CachedKey` definitions, `CompositeOperationEnv` dispatch contract, `with_local` signature, payload schemas, timeout propagation, and request ID generation. The alknet-core and alknet-call crate specs are in draft; the alknet-vault crate specs are stable. -**Next step**: Implementation. All open questions are resolved. The specs have passed three review passes (#001 governance/security model, #002 cross-document consistency/two-way-door audit, #003 type/API surface completeness). +The alknet-call server-side core (`CallAdapter`, `CallConnection` dispatch loop, wire framing, pending map, abort cascade, operation registry, service discovery) is implemented and tested (159 tests passing). The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`) identified the missing client/adapter surface specced in ADR-017 — `CallClient`, `from_call`, `from_jsonschema`, the `OperationAdapter` trait — plus four decisions (DC-1..4) needed before implementation. DC-1 (the one-way door: peer-scoped registry filtering) is resolved by ADR-028; DC-2/3/4 are two-way-door defaults recorded in `client-and-adapters.md` and tracked as OQ-25..28. The client/adapter surface is specced (`crates/call/client-and-adapters.md`); implementation is pending. + +**Next step**: Implementation of the alknet-call client/adapter surface (priority order in `client-and-adapters.md`): `CallClient` → `from_call` → `OperationAdapter` trait → `from_jsonschema`. All one-way doors are resolved; remaining open questions (OQ-25..28) are two-way-door shape/defaults decided during implementation. 
## Architecture Documents @@ -25,6 +27,7 @@ last_updated: 2026-06-23 | [crates/call/README.md](crates/call/README.md) | draft | alknet-call crate index | | [crates/call/call-protocol.md](crates/call/call-protocol.md) | draft | CallAdapter, EventEnvelope framing, stream model, PendingRequestMap, bidirectional calls, streaming subscribe example | | [crates/call/operation-registry.md](crates/call/operation-registry.md) | draft | OperationSpec, Handler, OperationRegistry, AccessControl, capability injection, service discovery, irpc integration | +| [crates/call/client-and-adapters.md](crates/call/client-and-adapters.md) | draft | CallClient (outbound connection opener), from_call / from_jsonschema, OperationAdapter trait, adapter location map, no-env-vars invariant, exchange-of-operations pattern | | [crates/vault/README.md](crates/vault/README.md) | stable | alknet-vault crate index | | [crates/vault/mnemonic-derivation.md](crates/vault/mnemonic-derivation.md) | stable | BIP39, SLIP-0010, BIP-0032, derivation paths, key types | | [crates/vault/encryption.md](crates/vault/encryption.md) | stable | AES-256-GCM, EncryptedData, key versioning, salt (Phase B reserved) | @@ -62,6 +65,7 @@ last_updated: 2026-06-23 | [025](decisions/025-vault-local-only-dispatch.md) | Vault Local-Only Dispatch | Accepted | | [026](decisions/026-vault-key-model-hd-derivation.md) | Vault Key Model — HD Derivation | Accepted | | [027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | TLS Identity Redesign — ACME + RawKey Decoupling | Accepted | +| [028](decisions/028-callclient-peer-scoped-registry-filtering.md) | Peer-Scoped Registry Filtering for CallClient Inbound Dispatch | Accepted | ## Open Questions @@ -93,6 +97,12 @@ See [open-questions.md](open-questions.md) for the full tracker. 
- **OQ-23**: Handler identity registration path — registration bundle with provenance, composition authority, scoped env, capabilities (ADR-022) - **OQ-24**: Operation error schemas — declared domain errors with typed `details` payload; adapter fidelity for `from_openapi`/`to_openapi` (ADR-023) +**Open (two-way-door remainders from alknet-call completion):** +- **OQ-25**: Remote-safe marking shape — existence of default-deny `CallClient` filtering locked by ADR-028; shape (`remote_safe: bool` v1 vs per-peer allowlist) open +- **OQ-26**: `OperationAdapter` error type — `import()` returns `Result<_, AdapterError>`; variants decided in implementation +- **OQ-27**: `from_call` re-import trigger — v1 default auto-on-reconnect; explicit `refresh()` additive +- **OQ-28**: `from_call` namespace collision — v1 default error-on-collision (no prefix by default) + **Deferred (not active):** - **OQ-09**: WASM target boundaries — design constraint, not deliverable - **OQ-10**: Git adapter scope — start with smart protocol, add ERC721 later diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md index bea871d..be6a020 100644 --- a/docs/architecture/crates/call/README.md +++ b/docs/architecture/crates/call/README.md @@ -1,7 +1,7 @@ --- status: draft -last_updated: 2026-06-23 -review: call/review-call passed 2026-06-23 — registry, protocol, ADR (005/012/014/015/016/017/022/023/024), security, and pattern-consistency checks all conformant; 159 unit/integration tests green; `cargo build`, `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test` clean. +last_updated: 2026-06-26 +review: call/review-call passed 2026-06-23 — registry, protocol, ADR (005/012/014/015/016/017/022/023/024), security, and pattern-consistency checks all conformant; 159 unit/integration tests green; `cargo build`, `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test` clean. 
Call-completion gap (ADR-017 client/adapter surface) addressed 2026-06-26 — ADR-028 + client-and-adapters.md added; implementation pending. --- # alknet-call @@ -14,6 +14,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, |----------|--------|-------------| | [call-protocol.md](call-protocol.md) | draft | CallAdapter, EventEnvelope framing, stream model, PendingRequestMap, bidirectional calls | | [operation-registry.md](operation-registry.md) | draft | OperationSpec, Handler, OperationRegistry, AccessControl, service discovery, irpc integration | +| [client-and-adapters.md](client-and-adapters.md) | draft | CallClient (outbound connection opener), from_call / from_jsonschema, OperationAdapter trait, adapter location map, no-env-vars invariant, exchange-of-operations pattern | ## Applicable ADRs @@ -37,6 +38,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities | | [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity | | [024](../../decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) | +| [028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Peer-Scoped Registry Filtering for CallClient Inbound Dispatch | Default-deny peer-scoped registry view; `remote_safe` marking on `HandlerRegistration`; trusted-peer opt-in; locks the ADR-017 §1 security-dimension one-way door | ## Relevant Open Questions @@ -47,6 +49,10 @@ 
Structured RPC over QUIC: operations, request/response, streaming subscriptions, | OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design | | OQ-16 | Safe vault operations for call protocol exposure | resolved (ADR-014) | None exposed for now | | OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays. | +| OQ-25 | Remote-safe marking shape for CallClient peer-scoped filtering | open (two-way) | Existence of default-deny filtering locked by ADR-028; shape (`remote_safe: bool` v1 vs per-peer allowlist) is the two-way-door remainder | +| OQ-26 | OperationAdapter error type (AdapterError variants) | open (two-way) | `import()` returns `Result<_, AdapterError>`; variants decided in implementation | +| OQ-27 | from_call re-import trigger | open (two-way) | v1 default: auto-on-reconnect; explicit `refresh()` is additive | +| OQ-28 | from_call namespace collision behavior | open (two-way) | v1 default: error on collision (no prefix by default) | ## Key Design Principles @@ -59,4 +65,8 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, 7. **No secret material on the wire**: The call protocol carries no private keys, API keys, mnemonics, or decrypted credentials. Handlers receive outbound credentials through `OperationContext.capabilities`, injected at the assembly layer. See ADR-014. 8. **Abort cascades to descendants**: `call.aborted` for a parent request cascades to all non-terminal descendants. Default `abort-dependents`; `continue-running` opt-in. See ADR-016. 9. **Internal calls switch authority context, not skip ACL**: The `internal` flag marks composition-originated calls. 
ACL runs against the handler's composition authority, not the caller's and not as a blanket skip. Operations have External/Internal visibility. Scoped composition env bounds reachability. See ADR-015, ADR-022. -10. **Provenance determines composition capability**: Only `Local` and `Session` ops can compose. Leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) are forwarding stubs — they don't get composition authority or a scoped env. The assembly layer is the sole grantor of composition authority. See ADR-022. \ No newline at end of file +10. **Provenance determines composition capability**: Only `Local` and `Session` ops can compose. Leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) are forwarding stubs — they don't get composition authority or a scoped env. The assembly layer is the sole grantor of composition authority. See ADR-022. +11. **Connection direction is independent of call direction**: Who opens the QUIC connection is a connection-layer concern, not a protocol-layer concern. Both sides can call each other once connected. The `CallAdapter` accepts connections; the `CallClient` opens them; both produce the same `CallConnection` and dispatch through the same loop. See ADR-017, [client-and-adapters.md](client-and-adapters.md). +12. **CallClient registry is default-deny**: A `CallClient` exposes no operations to the remote peer unless explicitly marked remote-safe. Sharing the global registry is an explicit trusted-peer opt-in, never the default. This prevents a remote peer's call from triggering dispatch that populates `OperationContext.capabilities` from the local node's registration bundle. See ADR-028. +13. **Adapter trait lives with the types; implementations live with their transport**: `OperationAdapter` is in `alknet-call`; `from_call`/`from_jsonschema` are in `alknet-call` (QUIC / pure parse); `from_openapi`/`from_mcp`/`to_openapi`/`to_mcp` are in `alknet-http` (reqwest / axum). `alknet-call` stays lean — no HTTP client, no HTTP server. 
See [client-and-adapters.md](client-and-adapters.md). +14. **No handler reads outbound credentials from any source other than `OperationContext.capabilities`** (no-env-vars invariant): the credential injection path is vault → assembly layer → `Capabilities` → `HandlerRegistration.capabilities` → `OperationContext.capabilities` → handler. Downstream consumers' `std::env::var` reads are unreachable because the assembly layer never calls `Default::default()`. See ADR-014, [client-and-adapters.md](client-and-adapters.md). \ No newline at end of file diff --git a/docs/architecture/crates/call/call-protocol.md b/docs/architecture/crates/call/call-protocol.md index d3cda72..ac0faed 100644 --- a/docs/architecture/crates/call/call-protocol.md +++ b/docs/architecture/crates/call/call-protocol.md @@ -524,8 +524,9 @@ Handlers clean up resources when their call is cancelled (in Rust, the future is | Secret material flow | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Call protocol carries no secret material; capabilities injected at assembly layer | | Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; handler identity + scoped env | | Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in | -| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction | +| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction. 
Client/adapter surface specced in [client-and-adapters.md](client-and-adapters.md) | | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | +| Peer-scoped registry filtering for CallClient | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny `CallClient` registry view; `remote_safe` marking; trusted-peer opt-in | | Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details` | ## Open Questions @@ -536,6 +537,7 @@ See [open-questions.md](../../open-questions.md) for full details. - **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive. - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now. - **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. +- **OQ-25..28** (open, two-way): Call-completion remainders — `CallClient` remote-safe marking shape, `OperationAdapter` error type, `from_call` re-import trigger, `from_call` namespace collision. The `CallClient`/adapter surface itself is specced in [client-and-adapters.md](client-and-adapters.md); the one-way door among these (existence of default-deny filtering) is resolved by ADR-028. 
## References diff --git a/docs/architecture/crates/call/client-and-adapters.md b/docs/architecture/crates/call/client-and-adapters.md new file mode 100644 index 0000000..5590c31 --- /dev/null +++ b/docs/architecture/crates/call/client-and-adapters.md @@ -0,0 +1,537 @@ +--- +status: draft +last_updated: 2026-06-26 +--- + +# alknet-call — Client and Adapters + +The outbound half of the call protocol: opening connections, importing remote +operations, and the adapter contract that ties import-style adapters together. +This document covers what ADR-017 specced but the server-side implementation +(`call-protocol.md`, `operation-registry.md`) did not include — the `CallClient` +that *opens* a connection, the `from_call`/`from_jsonschema` adapters, and the +`OperationAdapter` trait. The server-side `CallAdapter` and `CallConnection` +dispatch loop are covered in `call-protocol.md`; this document covers the +client-side connection-establishment half and the adapter surface. + +## What + +This document specifies four components, all in `alknet-call`: + +1. **`CallClient`** — opens an outbound `alknet/call` QUIC connection and + produces a `CallConnection`. The dispatch loop is shared with the + server-side `CallAdapter` (ADR-017 §1); `CallClient` is the + connection-establishment + credential-handling half, not a parallel + protocol implementation. +2. **`from_call`** — discovers operations on a remote call-protocol endpoint + via `services/list` + `services/schema` (already implemented in + `registry/discovery.rs`) and registers them in the connection's Layer 2 + overlay as `FromCall`-provenance leaves with forwarding handlers. +3. **`from_jsonschema`** — schema-only registration: produces + `HandlerRegistration` bundles with no handler, for validation, discovery, + and composition-graph construction without a runtime. +4. **`OperationAdapter` trait** — the async trait that `from_call`, + `from_openapi`, `from_mcp`, and `from_jsonschema` all implement. 
+ +It also records two cross-cutting architectural mechanisms that the adapter +surface rests on: + +- The **adapter location map** — which adapters live in `alknet-call` vs + `alknet-http`, and why. +- The **no-env-vars invariant** — the architectural mechanism by which + downstream consumers' `std::env::var` credential reads are made unreachable. + +And one downstream pattern this completion unblocks: + +- The **exchange-of-operations pattern** (runner / container service) — the + canonical bilateral composition this client surface enables. + +## Why + +The server-side `CallAdapter` (accept path) and `CallConnection` (dispatch +loop) are implemented and tested. The client side is the #1 gap blocking every +downstream consumer: the runner pattern (a process that connects outward to a +hub and exposes local ops), the container-service rewrite, the bilateral +exchange, the NAPI projection, and the agent's cross-node tool dispatch all +require a `CallClient`. `from_call` is the #2 gap; the `OperationAdapter` +trait is the enabling gap for `alknet-http`'s `from_openapi`/`from_mcp`. + +ADR-017 specced this surface. This document is the spec that operationally +fills the gap ADR-017 left to implementation: the `CallClient` API, the +`from_call`/`from_jsonschema` flows, the trait signature, the adapter +location, the credential invariant, and the bilateral pattern. The gap +analysis (`docs/research/alknet-call-completion/gap-analysis.md`) identified +four decisions (DC-1..4) needed before implementation; DC-1 is resolved by +ADR-028, and DC-2/3/4 are two-way-door defaults recorded here and tracked as +OQs (DC-2→OQ-27, DC-3→OQ-28, DC-4→OQ-26). + +## Architecture + +### CallClient + +`CallClient` opens a QUIC connection to a remote node on ALPN `alknet/call`, +performs credential setup, and produces a `CallConnection`. 
The +`CallConnection` type is already implemented (`call-protocol.md` §"CallConnection") +— it wraps an established `Connection` and holds the Layer 2 imported-ops +overlay. `CallClient` is the producer on the outbound side; `CallAdapter`'s +accept path is the producer on the inbound side. Both produce the same +`CallConnection` and hand it to the same shared dispatch loop. + +```rust +pub struct CallClient { + /// The operation registry. The peer-scoped view is a dispatch-time read + /// over this registry, not a copy (ADR-028 §5). + registry: Arc, + identity_provider: Arc, + /// Trusted-peer mode (ADR-028 §3): when true, the dispatch path exposes + /// all External ops to the remote peer and `services/list` lists all + /// External ops, ignoring the `remote_safe` marking. When false + /// (default), only registrations with `remote_safe: true` dispatch, and + /// `services/list` hides non-remote-safe ops (ADR-028 Assumption 2). + trusted_peer: bool, +} + +impl CallClient { + /// Open a QUIC connection to `addr` on ALPN `alknet/call`, perform + /// credential handshake, and return a CallConnection running the shared + /// dispatch loop. Credentials come from capabilities (ADR-014), not env + /// vars — see "No-Env-Vars Invariant" below. + pub async fn connect( + &self, + addr: SocketAddr, + credentials: CallCredentials, + ) -> Result; + + /// Trusted-peer mode: construct a CallClient that exposes all External + /// ops from `registry` to the remote peer, ignoring the remote-safe + /// marking. Explicit opt-in per ADR-028 §3. + pub fn trusted_peer( + registry: Arc, + identity_provider: Arc, + ) -> Self; +} +``` + +The v1 mechanism is the `trusted_peer: bool` flag plus the `remote_safe: bool` +field on each `HandlerRegistration` (default `false` across all provenance, +ADR-028 §4). 
A richer per-peer filtering mechanism (per-peer allowlist, +capability-class tag) is the two-way-door remainder tracked as OQ-25; v1's +boolean limits exposure control to "remote-safe for any peer" vs "not," which +is acceptable for the runner/dispatch pattern (one remote peer per +`CallClient`). + +The connection is symmetric after establishment (ADR-017 §2): both sides can +send and receive `call.requested`. Connection direction (who opened it) is +independent of call direction (who calls whom). The `CallClient` is therefore +both a caller and a callee — it dispatches incoming calls from the remote +peer against its peer-scoped registry view, and it initiates outgoing calls +through the `CallConnection::call()` / `subscribe()` / `abort()` API. + +### Credential sources for connections + +`CallClient::connect()` takes a `CallCredentials` bundle. Credentials come +from `Capabilities` (ADR-014), never from environment variables. The three +credential dimensions (ADR-017 §7): + +```rust +pub struct CallCredentials { + pub tls_identity: Option<TlsIdentity>, // RFC 7250 raw key or X.509 + pub auth_token: Option<AuthToken>, // call-protocol-level token + pub remote_identity: Option<RemoteIdentity>, // expected fingerprint/cert +} +``` + +- **TLS identity** — the local node's Ed25519 raw key (RFC 7250) or X.509 cert, + derived from the vault at startup (ADR-020, ADR-026, ADR-027). +- **Auth token** — an opaque call-protocol-level token, decrypted from the + vault or derived from a shared secret. +- **Remote identity verification** — the expected fingerprint/cert of the + remote node, stored as a capability. + +These are populated by the assembly layer at `CallClient` construction time +from vault-derived `Capabilities`. The credential path is the no-env-vars +invariant (below). The concrete shapes of `TlsIdentity`, `AuthToken`, and +`RemoteIdentity` are implementation-detail two-way doors; the one-way +constraints are that they come from `Capabilities`, not env vars (ADR-014). 
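The exposure rule from the `CallClient` section above (default-deny, with `trusted_peer` as the explicit opt-in) can be sketched as a dispatch-time predicate. This is a minimal illustration under stated assumptions, not the crate's implementation: `Registration`, `Visibility`, `exposed_to_peer`, and `peer_view` are hypothetical simplified names, and the sketch assumes `Internal` ops are never exposed to a peer in either mode.

```rust
/// Hypothetical, simplified stand-ins for the real registry types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Visibility {
    External,
    Internal,
}

#[derive(Clone, Debug)]
pub struct Registration {
    pub name: String,
    pub visibility: Visibility,
    /// Defaults to `false` for every provenance (ADR-028 §4).
    pub remote_safe: bool,
}

/// Dispatch-time predicate: may the remote peer reach this registration?
pub fn exposed_to_peer(reg: &Registration, trusted_peer: bool) -> bool {
    // Assumption of this sketch: Internal ops are never exposed to a peer.
    if reg.visibility != Visibility::External {
        return false;
    }
    // Trusted-peer mode ignores the marking; otherwise the view is default-deny.
    trusted_peer || reg.remote_safe
}

/// Names a `services/list` response would contain for this peer.
/// The view borrows from the registry slice; it is a read, not a copy.
pub fn peer_view<'a>(regs: &'a [Registration], trusted_peer: bool) -> Vec<&'a str> {
    regs.iter()
        .filter(|r| exposed_to_peer(r, trusted_peer))
        .map(|r| r.name.as_str())
        .collect()
}
```

With one `External` op marked remote-safe, a default client's view contains only that op, while a trusted-peer client's view contains every `External` op. Using the same predicate for dispatch and for listing keeps the two from drifting apart.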
+ +### from_call + +`from_call` discovers the remote peer's `External` operations and registers +them in the connection's Layer 2 overlay as `FromCall`-provenance leaves with +forwarding handlers. The discovery mechanism (`services/list` + +`services/schema`) is already implemented in `registry/discovery.rs`; +`from_call` is the client-side consumer of that API. + +```rust +pub struct FromCallConfig { + /// Namespace prefix applied to imported operation names. Optional — + /// default no prefix. Collision on import is an error (DC-3, OQ-28), + /// not last-wins. + pub namespace_prefix: Option, + /// Optional filter — import only operations whose names match. None + /// imports all External ops discovered via services/list. + pub operation_filter: Option>, +} + +/// Discover the remote peer's External ops and construct HandlerRegistration +/// bundles with FromCall provenance and forwarding handlers. The caller +/// registers the bundles in the connection's overlay via +/// CallConnection::register_imported_all(). +pub async fn from_call( + connection: &CallConnection, + config: FromCallConfig, +) -> Result, AdapterError>; +``` + +The flow (ADR-017 §3): + +1. Call `services/list` on the remote → list of `External` operations. +2. Call `services/schema` for each → input/output JSON Schemas and declared + `error_schemas` (ADR-023). +3. For each discovered op, construct a `HandlerRegistration`: + - `spec` mirrors the remote op's name (with optional prefix), namespace, + type, schemas, access control. + - `handler` is a forwarding handler: sends `call.requested` through the + `CallConnection`, awaits `call.responded` (or streams for subscriptions). + - `provenance: FromCall`, `composition_authority: None`, `scoped_env: None` + (leaf — ADR-022). +4. The caller registers the bundles via + `CallConnection::register_imported_all()`. + +**Re-import on reconnection** (DC-2, OQ-27): `from_call` runs automatically on +connection establishment. 
The overlay is per-connection (Layer 2, ADR-024), so +a stale overlay dies with the connection; re-import on reconnect is naturally +scoped to the new connection. This is the v1 default; explicit re-import via a +future `CallConnection::refresh()` is additive. + +**Namespace collision** (DC-3, OQ-28): optional prefix, default no prefix, +collision = error. A node importing from two remotes that both expose +`/container/exec` without prefixes should fail loudly. The operator adds +prefixes when they know they're importing from multiple sources. + +**Trust is transitive** (recorded in `operation-registry.md`): a +`from_call`-imported operation executes the remote node's code, not yours. +The scoped env (ADR-015) bounds *which* operations are reachable, not *what* +they do. `from_call` means "I trust the remote node as much as my own +handlers." The abort cascade (ADR-016) crosses the node boundary transparently +through the forwarding handler's `parent_request_id`. + +### from_jsonschema + +Schema-only registration: produces `HandlerRegistration` bundles with no +handler (`FromJsonSchema` provenance). Used for validation, discovery, and +composition-graph construction without a runtime — type-checking a composition +plan without executing it, building a UI of available operations without +standing up the transports, etc. 
+ +```rust +pub fn from_jsonschema( + spec: OperationSpec, + schema: serde_json::Value, +) -> HandlerRegistration; +``` + +Distinct from `from_call` (gap analysis DC-5, confirmed not a decision): + +| | `from_jsonschema` | `from_call` | +|---|---|---| +| Schema source | Provided directly (caller fetches, passes in) | Discovered over wire (`services/list` + `services/schema`) | +| Handler at call time | None (schema-only, `FromJsonSchema` provenance) | Forwards over QUIC (`FromCall` provenance, leaf) | +| Use case | Type validation, discovery, composition graph construction | Actually invoking remote operations | + +Keeping them separate preserves the "schema-only, no execution" use case +(type checking, safe composition planning without runtime). + +### OperationAdapter trait + +The shared shape across import-style adapters. The trait lives in +`alknet-call` (where the types live); the implementations live where their +transport dependencies live (see "Adapter Location Map" below). + +```rust +#[async_trait] +pub trait OperationAdapter: Send + Sync { + async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>; +} +``` + +The trait is **async** because `from_call` requires async discovery +(`services/list` + `services/schema` over a QUIC connection). Sync adapters +(`from_openapi`, `from_mcp` reading a static spec) trivially satisfy an async +trait — their `import()` bodies contain no `.await` points. This is locked by +ADR-017 §5. + +The **error type** (DC-4, OQ-26) is `Result<Vec<HandlerRegistration>, +AdapterError>` where `AdapterError` is a crate-level enum covering the +failure modes real implementations hit: discovery transport failure +(`from_call` remote unreachable), schema parse failure (`from_openapi`, +`from_jsonschema`), unauthorized (HTTP 401 for `from_openapi`, +`from_mcp`). The exact `AdapterError` variants are the two-way-door +remainder; the *presence* of an error type is filled in here. 
ADR-017 §5 +showed `async fn import(&self) -> Vec<HandlerRegistration>` with no error +type; the spec omitted the error type as an implementation-detail two-way +door, recorded here. + +Implementations: +- `FromCall` — QUIC-backed (in `alknet-call`). +- `FromJsonSchema` — pure parse, no transport (in `alknet-call`). +- `FromOpenAPI` — HTTP-backed (in `alknet-http`). +- `FromMCP` — MCP streamable-HTTP-backed (in `alknet-http`, feature-gated). + +The `to_*` adapters (`to_openapi`, `to_mcp`) are outbound projections, not +`OperationAdapter` implementations — they consume the registry, they don't +produce entries for it (ADR-017 §5). + +### Adapter Location Map + +The decomposition principle: **the adapter trait lives where the types live +(`alknet-call`); the adapter implementations live where their transport +dependencies live.** + +``` +alknet-call (lean — no HTTP client, no HTTP server) +├── OperationAdapter trait (the contract — async, per ADR-017 §5) +├── from_call (QUIC — discovers remote ops via call protocol) +├── from_jsonschema (pure parse — caller fetches the doc, passes it in) +└── CallClient (outbound connection opener — the #1 gap) + +alknet-http (owns HTTP server + HTTP client — separate crate, separate Phase 0) +├── ProtocolHandler for h2/http1.1/h3 (axum server — inbound HTTP) +├── from_openapi (parse OpenAPI doc + reqwest forwarding handler) +├── to_openapi (generate OpenAPI doc from local registry) +├── from_mcp (feature-gated) (import remote MCP tools over streamable HTTP — reqwest) +└── to_mcp (feature-gated) (expose local ops as MCP tools over streamable HTTP — axum) + +Not built: MCP stdio transport + — stdio = spawn arbitrary executable = built-in RCE ("download untrusted MCP servers") + — streamable HTTP is the only supported MCP transport in alknet + — recorded as an explicit security position, not a feature gap +``` + +`alknet-call` never sees the HTTP client. 
The `from_openapi`/`from_mcp` +forwarding handlers are opaque `Arc` from the registry's +perspective — constructed by `alknet_http::from_openapi()` at registration +time, stored in `HandlerRegistration`, dispatched by the `CallAdapter` which +doesn't know reqwest is involved. `alknet-call` stays lean (no reqwest, no +axum); `alknet-http` owns both HTTP directions. + +**ADR-003 dependency note**: `alknet-http` implementing `from_openapi`/ +`from_mcp` means `alknet-http` depends on `alknet-call` (for `OperationSpec`, +`Handler`, `HandlerRegistration`, `OperationAdapter`). ADR-003's rule is "no +handler crate depends on another handler crate" — but `alknet-call` is both +a handler *and* the protocol foundation that `alknet-agent` and `alknet-napi` +already consume. `alknet-http` depending on `alknet-call` is "HTTP uses the +call protocol types," not "HTTP depends on SSH." This is within the spirit of +ADR-003 (`alknet-call` is protocol-foundation, not a peer handler). The +`alknet-http` spec should note this explicitly; a one-line amendment to +ADR-003 clarifying that `alknet-call` is a protocol-foundation crate is +deferred to the `alknet-http` Phase 0. + +### No-Env-Vars Invariant + +The architectural mechanism for the env-var problem in downstream consumers +(the Rust port of Vercel's AI SDK at `/workspace/aisdk/`, whose providers all +read `std::env::var("OPENAI_API_KEY")` in their `Default` impls). The fix is +**not** to modify those consumers — it's that the env-var path is never taken +because the assembly layer never calls `Default::default()`. 
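A minimal sketch of that mechanism, assuming a hypothetical `Provider` type shaped like the downstream consumers described above. The names `Provider`, `with_api_key`, and `assemble_provider` are illustrative and not taken from any real crate, and a plain `&str` stands in for a credential that would really be read from `Capabilities`.

```rust
use std::env;

/// Hypothetical provider, shaped like the downstream consumers described above.
pub struct Provider {
    api_key: String,
}

impl Default for Provider {
    /// The env-var path. It exists in the consumer's code, but the
    /// assembly layer in this sketch never calls it.
    fn default() -> Self {
        Provider {
            api_key: env::var("OPENAI_API_KEY").unwrap_or_default(),
        }
    }
}

impl Provider {
    /// Builder-style constructor: the credential is passed in explicitly.
    pub fn with_api_key(api_key: impl Into<String>) -> Self {
        Provider {
            api_key: api_key.into(),
        }
    }

    /// The header value an outbound request would carry.
    pub fn authorization_header(&self) -> String {
        format!("Bearer {}", self.api_key)
    }
}

/// Hypothetical assembly-layer step: construct the provider from a
/// vault-derived credential, without going through `Default`.
pub fn assemble_provider(vault_derived_key: &str) -> Provider {
    Provider::with_api_key(vault_derived_key)
}
```

Because `assemble_provider` is the only construction path in this sketch, the value of `OPENAI_API_KEY` in the process environment has no effect on the outgoing header. The consumer's `Default` impl stays in place, unmodified and unused.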
+ +The credential injection path: + +``` +vault (seed) + → assembly layer (derive + decrypt at startup, per ADR-014/019/025) + → Capabilities (non-serializable, zeroized, immutable — ADR-014) + → HandlerRegistration.capabilities (ADR-022, the registration bundle) + → OperationContext.capabilities (per-request, populated by dispatch + path from the bundle — ADR-022 §6) + → from_openapi handler reads context.capabilities.get("openai") + → injects into HTTP Authorization header + → reqwest request goes out with vault-derived credential +``` + +The `from_openapi`/`from_mcp` forwarding handlers (in `alknet-http`) are the +credential injection point. They read from `context.capabilities`, not from +`std::env::var`. The downstream consumers' `Default` impls reading env vars +are simply never called — the assembly layer constructs providers with +vault-derived credentials through the builder API, or the provider's HTTP +calls are routed through `from_openapi` operations that carry the credential +in `Capabilities`. + +**This is a spec-level invariant in `alknet-call`, not a runtime convention.** +The dispatch path (`build_root_context` and `OperationEnv::invoke()` per +ADR-022 §6) populates `OperationContext.capabilities` from the registration +bundle. The invariant is: *no handler reads outbound credentials from any +source other than `OperationContext.capabilities`.* This is already the +architectural intent of ADR-014; this document records it as an explicit +invariant that the `from_openapi`/`from_mcp` handler implementations (in +`alknet-http`) are verified against. + +### Exchange-of-Operations Pattern (Runner / Container Service) + +The canonical downstream pattern this completion unblocks, recorded here so +Phase 1 specs can reference it. Concrete example: the container service at +`/workspace/@alkdev/dispatch` (axum + russh SSH client for "reverse git +runner" over Docker/vast.ai) gets rewritten as a call-protocol service. 
+ +**Bilateral exchange**: + +``` +Container service (runs on a vast.ai/docker instance): + Defines Local ops: /container/exec, /container/list, /container/logs... + (real handlers — calls bollard or vast.ai API) + Connects to hub as a CallClient (outbound connection — runner pattern) + +Hub (central server): + Runs CallAdapter (server) on alknet/call (already implemented) + When the container service connects: + hub runs from_call → discovers /container/* via services/list + services/schema + registers them as FromCall provenance (leaf, forwarding handlers) in the + connection's Layer 2 overlay (ADR-024) + Now the hub (or anything connected to the hub) can call /container/exec + The from_call handler forwards over the connection back to the container service + +Bilateral: the container service ALSO runs from_call against the hub, + discovers the hub's External ops, and can call them. + Connection direction (container → hub) is independent of call direction + (both can call each other) per ADR-017 §2. +``` + +**What this requires**: +1. `CallClient` — the container service uses it to open the outbound + connection to the hub. The #1 gap. +2. `from_call` — both sides run it to populate their Layer 2 overlays with + the other side's `External` ops. The #2 gap. +3. `OperationAdapter` trait — `from_call` implements it. The #3 gap (enabling, + not blocking — `from_call` can be built as a free function before the trait + exists, but the trait is needed for `alknet-http`'s adapters). + +**Why the container service doesn't need alknet-ssh**: under the call +protocol, the container service is a `CallClient` that dials the hub's +`alknet/call` ALPN directly over QUIC — no SSH in the loop. SSH port +forwarding becomes the *transitional* mechanism for targets that can't run a +call-protocol client (the `alknet-ssh` phase-0 findings document this +transition). Once the container service runs a `CallClient`, SSH is out of +the path entirely. 
+ +This is the "dev runner" pattern: a call-protocol client that connects back +to a hub and exposes core dev tools (bash, fs, etc.) as operations. The agent +service (`alknet-agent`, downstream) is the consumer that orchestrates these +via `env.invoke()`. + +## Implementation Priority Order + +Based on the gap analysis and the downstream unblock chain: + +1. **`CallClient`** (critical) — outbound connection opener. Without it, no + runner, no container service, no bilateral exchange. Reuses the existing + `CallConnection` for the dispatch loop; adds only the + connection-establishment + credential-handling half. The single + highest-value piece of work in the entire `alknet-call` completion. + +2. **`from_call`** (critical, depends on `CallClient`) — consumes the + already-implemented `services/list` + `services/schema` discovery API. + +3. **`OperationAdapter` trait** (enabling) — the async trait. Small, + standalone, unblocks `alknet-http` Phase 1. + +4. **`from_jsonschema`** (medium, standalone) — schema-only registration, no + handler. Small. + +5. **DC-1 resolution** (peer-scoped registry filtering, ADR-028) — the + security dimension of `CallClient`'s registry. Addressed in parallel with + #1 — it's a filtering layer on the registry the `CallClient` exposes, not + a blocker for the connection-establishment work. + +## What This Completion Unblocks + +| Downstream crate | What it needs from alknet-call | Status without completion | +|-------------------|-------------------------------|--------------------------| +| alknet-http | `OperationAdapter` trait (to implement `from_openapi`/`from_mcp`) | Blocked — can't define HTTP-backed adapters without the trait | +| alknet-ssh | Stable alknet-call types (no adapter dependency) | Not blocked — ssh depends on alknet-core, not alknet-call's adapters. Proceeds in parallel. 
| +| alknet-agent | `CallClient` (tool dispatch), `from_call` (remote tool import), `OperationAdapter` (provider adapters) | Blocked on `CallClient` + `from_call` | +| Container service (dispatch rewrite) | `CallClient` + `from_call` | Blocked — this is the primary consumer | +| Runner pattern (dev runner, opencode runner) | `CallClient` + `from_call` | Blocked — the runner IS a `CallClient` | +| alknet-napi | `CallClient` (Node.js calls remote ops) | Blocked — NAPI projects `CallClient` to JS | + +## Constraints + +- **No HTTP in alknet-call.** `from_openapi`/`from_mcp`/`to_openapi`/`to_mcp` + live in `alknet-http`. The `OperationAdapter` trait and the QUIC-backed + adapters (`from_call`, `from_jsonschema`) live in `alknet-call`. See + Adapter Location Map. +- **No secret material on the wire.** `CallCredentials` carries vault-derived + material for the *outbound* connection (TLS identity, auth token); the + call protocol's wire format carries no private keys, API keys, or decrypted + credentials (ADR-014). The no-env-vars invariant (above) is the dispatch-side + corollary. +- **Peer-scoped registry is default-deny.** A `CallClient` exposes no + operations to the remote peer unless marked remote-safe. Trusted-peer + opt-in is explicit (ADR-028). +- **`from_call` re-import is auto-on-reconnect.** v1 default; the overlay is + per-connection so re-import is naturally scoped (DC-2, OQ-27). +- **`from_call` namespace collision is an error.** Default no prefix; the + operator adds prefixes when importing from multiple sources (DC-3, OQ-28). +- **`OperationAdapter::import()` returns `Result`.** Failures surface as + `AdapterError` (DC-4, OQ-26). +- **MCP stdio transport is not built.** Streamable HTTP is the only supported + MCP transport in alknet. stdio = spawn arbitrary executable = built-in RCE. + Recorded as an explicit security position, not a feature gap. 
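The `OperationAdapter::import()` constraint above can be sketched as follows. This is an illustration under several assumptions, not the crate's API: the real trait is async (ADR-017 §5) while this stand-in is synchronous so it runs with the standard library alone; `HandlerRegistration` is reduced to two fields; the `AdapterError::InvalidDocument` variant is hypothetical because the real variants are still open (OQ-26); and `LineListAdapter` is a toy adapter invented for the example.

```rust
use std::fmt;

/// Reduced stand-in for the registration bundle (the real one also carries
/// the spec, handler, provenance, and capabilities).
#[derive(Debug, Clone, PartialEq)]
pub struct HandlerRegistration {
    pub name: String,
    pub remote_safe: bool,
}

/// Hypothetical variant set; the real variants are undecided (OQ-26).
#[derive(Debug, PartialEq)]
pub enum AdapterError {
    InvalidDocument(String),
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::InvalidDocument(reason) => write!(f, "invalid document: {reason}"),
        }
    }
}

impl std::error::Error for AdapterError {}

/// Synchronous stand-in for the async trait: import either yields
/// registration bundles or fails with an `AdapterError`.
pub trait OperationAdapter {
    fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>;
}

/// Toy adapter: one operation name per non-empty line of `document`.
pub struct LineListAdapter {
    pub document: String,
}

impl OperationAdapter for LineListAdapter {
    fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError> {
        let mut registrations = Vec::new();
        for line in self.document.lines().map(str::trim).filter(|l| !l.is_empty()) {
            if !line.starts_with('/') {
                return Err(AdapterError::InvalidDocument(format!(
                    "operation name `{line}` must start with `/`"
                )));
            }
            // Imported ops default to not remote-safe (default-deny).
            registrations.push(HandlerRegistration {
                name: line.to_string(),
                remote_safe: false,
            });
        }
        if registrations.is_empty() {
            return Err(AdapterError::InvalidDocument("no operations found".to_string()));
        }
        Ok(registrations)
    }
}
```

Returning `Result` lets a malformed source fail at import time instead of producing a silently empty registration list.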
+ +## Design Decisions + +| Decision | ADR | Summary | +|----------|-----|---------| +| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction; trait is async; adapters produce `HandlerRegistration` bundles | +| Peer-scoped registry filtering (DC-1) | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny; `remote_safe: bool` on `HandlerRegistration`; trusted-peer opt-in; one-way door on the security dimension | +| Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | The no-env-vars invariant's foundation; capabilities injected at assembly layer | +| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | The registration bundle adapters produce; `composition_authority: None` for leaves | +| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Layer 2 per-connection overlay where `from_call` imports land | +| Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | Adapter-registered ops are `Internal` by default; default-deny posture | +| Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | Cross-node abort through `from_call` forwarding handler's `parent_request_id` | +| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | `error_schemas` mirrored by `from_call` from remote op's spec | +| TLS identity redesign | [ADR-027](../../decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | RFC 7250 raw key / X.509 cert dimensions of `CallCredentials` | +| HD derivation for encryption keys | 
[ADR-020](../../decisions/020-hd-derivation-for-encryption-keys.md) | Vault-derived TLS identity material | +| Vault key model | [ADR-026](../../decisions/026-vault-key-model-hd-derivation.md) | Vault-derived TLS identity material | +| Vault local-only dispatch | [ADR-025](../../decisions/025-vault-local-only-dispatch.md) | Vault access at assembly layer only; the credential injection path's first hop | +| Crate decomposition | [ADR-003](../../decisions/003-crate-decomposition.md) | `alknet-http` depends on `alknet-call` (protocol-foundation exception, noted in Adapter Location Map) | +| One-way door decision framework | [ADR-009](../../decisions/009-one-way-door-decision-framework.md) | Door-type classification for DC-1..4 | + +## Open Questions + +See [open-questions.md](../../open-questions.md) for full details. + +- **OQ-25** (open, two-way): Remote-safe marking shape — `remote_safe: bool` + v1 vs per-peer allowlist vs capability-class tag. The *existence* of + filtering is locked by ADR-028; the shape is the two-way-door remainder. +- **OQ-26** (open, two-way): `AdapterError` enum variants (DC-4). The + *presence* of an error type is recorded here; the variants are + implementation-detail. +- **OQ-27** (open, two-way): `from_call` re-import trigger — auto-on-reconnect + (v1 default, recorded here) vs explicit `CallConnection::refresh()`. v1 is + auto-on-reconnect; the explicit path is additive. +- **OQ-28** (open, two-way): `from_call` namespace collision behavior — error + on collision (v1 default, recorded here) vs last-wins. 
+ +## References + +- ADR-017: Call Protocol Client and Adapter Contract (the spec this document + operationally fills) +- ADR-028: Peer-Scoped Registry Filtering for CallClient Inbound Dispatch + (resolves DC-1) +- `call-protocol.md` — `CallAdapter`, `CallConnection`, dispatch loop, stream + model (the server-side complement to this document) +- `operation-registry.md` — `HandlerRegistration`, provenance, capability + injection, service discovery (the discovery API `from_call` consumes) +- `docs/research/alknet-call-completion/gap-analysis.md` — DC-1..4, the + implementation-state audit, the downstream unblock chain +- `/workspace/@alkdev/operations/` — TypeScript prior art (`from_openapi.ts`, + `from_mcp.ts`, `from_schema.ts`, `scanner.ts`) +- `/workspace/@alkdev/dispatch/` — concrete downstream consumer (container + service / "reverse git runner") this completion unblocks +- `/workspace/aisdk/` — downstream consumer (Rust port of Vercel AI SDK); the + no-env-vars invariant makes its `std::env::var` reads unreachable +- `/workspace/rust-sdk/` — MCP Rust SDK (rmcp); streamable HTTP transport for + `alknet-http`'s `from_mcp`/`to_mcp` (separate crate, separate Phase 0) +- `docs/research/alknet-ssh/phase-0-findings.md` — alknet-ssh Phase 0; + confirms ssh depends on alknet-core not alknet-call's adapters, so it + proceeds in parallel with this completion \ No newline at end of file diff --git a/docs/architecture/crates/call/operation-registry.md b/docs/architecture/crates/call/operation-registry.md index 25f550e..8de0449 100644 --- a/docs/architecture/crates/call/operation-registry.md +++ b/docs/architecture/crates/call/operation-registry.md @@ -232,6 +232,8 @@ pub struct HandlerRegistration { pub composition_authority: Option, // None for leaves pub scoped_env: Option, // None for leaves pub capabilities: Capabilities, + pub remote_safe: bool, // default false; ADR-028 — exposes this op to + // CallClient peers (trusted-peer mode bypasses) } ``` @@ -632,6 +634,8 @@ 
The `Capabilities` type holds non-serializable, zeroized secret material. It doe **Scoped composition env.** The `OperationEnv` given to a handler is scoped — it can only invoke a declared set of operations, set at registration on the `HandlerRegistration` bundle by the assembly layer (ADR-022). This bounds the parameterized-dispatch attack surface: a handler (or an LLM picking tools, or a quickjs sandbox) can only reach declared operations, not the entire registry. The scoped env is the reachability control; the composition authority is the authority control. Both are needed for least privilege. See ADR-015 and ADR-022. +**No-env-vars invariant.** No handler reads outbound credentials from any source other than `OperationContext.capabilities`. This is the dispatch-side corollary of the capability-injection flow above: because the dispatch path populates `OperationContext.capabilities` from the registration bundle (ADR-022 §6), and because the assembly layer constructs handlers with vault-derived credentials rather than calling `Default::default()`, downstream consumers' `std::env::var` credential reads are unreachable by construction. The full invariant, the credential injection path, and the downstream-consumer framing are recorded in [client-and-adapters.md](client-and-adapters.md); this section documents the dispatch-path mechanism that makes it enforceable. + ## Constraints - The registry is **layered by trust boundary** (ADR-024). The curated layer (`Local` provenance) is immutable after construction — adding a `Local` op requires restarting the process, which re-enters the startup trust boundary. Session (`Session`) and imported (`FromCall` etc.) ops are dynamic at their respective scopes (per-session, per-connection). 
The pre-ADR-024 blanket immutability claim was inherited by analogy from ADR-010's `HandlerRegistry` (ALPN-level) and did not apply to the operation registry — the TLS-config argument that justifies `HandlerRegistry` immutability does not touch the operation registry, which lives behind the single ALPN `alknet/call`. @@ -659,6 +663,8 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | | Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Curated (static, immutable) + session and connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) | | Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` | +| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `from_call`/`from_jsonschema`/`OperationAdapter` produce `HandlerRegistration` bundles; adapter-registered ops are `Internal` leaves. Surface specced in [client-and-adapters.md](client-and-adapters.md) | +| Peer-scoped registry filtering for CallClient | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny `CallClient` registry view; adds `remote_safe` marking to `HandlerRegistration` (the bundle this doc defines) | ## Open Questions @@ -668,6 +674,8 @@ See [open-questions.md](../../open-questions.md) for full details. 
- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive. - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now. - **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Session ops are `Session` provenance (ADR-022) — always `Internal`, compose under restricted authority scoped down at sandbox creation. Generalized by ADR-024 to cover connection-scoped overlays as well. +- **OQ-25** (open, two-way): Remote-safe marking shape — existence of default-deny `CallClient` filtering locked by ADR-028; the shape (the `remote_safe: bool` field this doc's `HandlerRegistration` gains vs a richer per-peer mechanism) is the two-way-door remainder. See [client-and-adapters.md](client-and-adapters.md). +- **OQ-26..28** (open, two-way): `OperationAdapter` error type, `from_call` re-import trigger, `from_call` namespace collision. v1 defaults recorded in [client-and-adapters.md](client-and-adapters.md). ## References diff --git a/docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md b/docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md index 8484ba4..c132f22 100644 --- a/docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md +++ b/docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md @@ -2,7 +2,7 @@ ## Status -Accepted +Accepted (amended 2026-06-26 — see "Amendments" below) ## Context @@ -336,9 +336,77 @@ same as `from_openapi` receives HTTP credentials. 
- ADR-014: Secret material flow (credential sources, not static tokens) - ADR-015: Privilege model (adapter ops are Internal by default) - ADR-016: Abort cascade (cross-node abort propagation) +- ADR-028: Peer-Scoped Registry Filtering for CallClient Inbound Dispatch + (resolves the §1 Consequences security dimension flagged as a two-way door) - OQ-15: Call protocol client and adapter contract (resolved by this ADR) +- OQ-25..28: Two-way-door remainders from the call-completion gap analysis + (DC-1 shape, DC-4 error type, DC-2 re-import trigger, DC-3 namespace + collision — see [open-questions.md](../open-questions.md)) - [call-protocol.md](../crates/call/call-protocol.md) - [operation-registry.md](../crates/call/operation-registry.md) +- [client-and-adapters.md](../crates/call/client-and-adapters.md) — the spec + that operationally fills the gap this ADR left to implementation +- `docs/research/alknet-call-completion/gap-analysis.md` — DC-1..4, the + decisions that needed resolution before implementation - TypeScript `@alkdev/operations` — `from_openapi`, `from_mcp`, `buildEnv` prior art -- POC at `/workspace/@alkdev/dispatch` — head/worker dispatch over SSH+axum \ No newline at end of file +- POC at `/workspace/@alkdev/dispatch` — head/worker dispatch over SSH+axum + +## Amendments (2026-06-26) + +This ADR left four decisions as two-way doors (§1 Consequences flagged DC-1's +security dimension; §5 noted trait signatures are two-way doors; Assumption 4 +noted re-import hot-swap is a two-way door; §3 mentioned the namespace prefix). +The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md` +DC-1..4) resolved them. The resolutions: + +### DC-1 — CallClient registry scope: resolved by ADR-028 + +The §1 Consequences security dimension is resolved by +[ADR-028](028-callclient-peer-scoped-registry-filtering.md). 
The one-way +door (existence of peer-scoped filtering as the v1 default) is locked: +**default-deny**, with `remote_safe: bool` on `HandlerRegistration` +as the v1 shape and a trusted-peer opt-in. The shape of the marking is the +two-way-door remainder, tracked as OQ-25. This ADR's §1 text ("It has its own +operation registry to dispatch incoming calls from the remote side") and +the Consequences note ("The specific mechanism … is a two-way door") are +superseded by ADR-028's decision that the *default* is filtered, not +shared-global. Share-global remains available as the explicit opt-in +(ADR-028 §3). + +### DC-4 — OperationAdapter trait error type: resolved + +§5 showed `async fn import(&self) -> Vec<HandlerRegistration>` with no error +type. The trait returns `Result<Vec<HandlerRegistration>, AdapterError>` +where `AdapterError` is a crate-level enum. The *presence* of the error type +is recorded in [client-and-adapters.md](../crates/call/client-and-adapters.md); +the exact variants are the two-way-door remainder, tracked as OQ-26. + +### DC-2 — from_call re-import on reconnection: default set + +Assumption 4 noted re-import "happens on reconnection or is triggered +explicitly." The v1 default is **auto-re-import on connection establishment**. +The overlay is per-connection (Layer 2, ADR-024), so re-import is naturally +scoped; a stale overlay dies with the connection. Explicit re-import via a +future `CallConnection::refresh()` is additive. Two-way door; recorded in +[client-and-adapters.md](../crates/call/client-and-adapters.md); tracked as +OQ-27. + +### DC-3 — from_call namespace collision: default set + +§3's `FromCallConfig` namespace prefix is **optional, default no prefix, +collision = error**. A node importing from two remotes that both expose the +same unprefixed op name should fail loudly. The operator adds prefixes when +importing from multiple sources. Two-way door; recorded in +[client-and-adapters.md](../crates/call/client-and-adapters.md); tracked as +OQ-28.
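The DC-3 default can be sketched as below. Everything named here is hypothetical: `ImportedOps` stands for whatever set of imported names a node checks collisions against (the exact scope is an assumption, not something this ADR fixes), `NamespaceCollision` is an illustrative error type, and joining a prefix by plain concatenation is one possible convention rather than the specified one.

```rust
use std::collections::HashMap;

/// Illustrative error: the fully qualified name that was already taken.
#[derive(Debug, PartialEq)]
pub struct NamespaceCollision {
    pub name: String,
}

/// Hypothetical set of imported operation names, mapped to the label of the
/// remote each one came from.
#[derive(Default)]
pub struct ImportedOps {
    ops: HashMap<String, String>,
}

impl ImportedOps {
    /// Import `discovered` names from `source`, applying the optional prefix.
    /// On collision nothing is registered and the colliding name is returned.
    pub fn import(
        &mut self,
        source: &str,
        prefix: Option<&str>,
        discovered: &[&str],
    ) -> Result<usize, NamespaceCollision> {
        // Stage first so a failed import leaves the existing set untouched.
        let mut staged: Vec<String> = Vec::with_capacity(discovered.len());
        for name in discovered {
            let full = match prefix {
                Some(p) => format!("{p}{name}"),
                None => (*name).to_string(),
            };
            if self.ops.contains_key(&full) || staged.contains(&full) {
                return Err(NamespaceCollision { name: full });
            }
            staged.push(full);
        }
        let count = staged.len();
        for full in staged {
            self.ops.insert(full, source.to_string());
        }
        Ok(count)
    }

    /// Which remote an imported name forwards to, if any.
    pub fn source_of(&self, name: &str) -> Option<&str> {
        self.ops.get(name).map(String::as_str)
    }
}
```

In this sketch a second remote exposing `/container/exec` is rejected until the operator supplies a prefix, which matches the "fail loudly" default; a last-wins policy would replace the early return with an overwrite.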
+ +### Operational spec + +The gap this ADR left to implementation — the `CallClient` API, the +`from_call`/`from_jsonschema` flows, the trait signature, the adapter +location map, the no-env-vars invariant, and the exchange-of-operations +pattern — is specified in +[client-and-adapters.md](../crates/call/client-and-adapters.md). That document +is the operational complement to this ADR; this ADR remains the architectural +authority. \ No newline at end of file diff --git a/docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md b/docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md new file mode 100644 index 0000000..7f09136 --- /dev/null +++ b/docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md @@ -0,0 +1,215 @@ +# ADR-028: Peer-Scoped Registry Filtering for CallClient Inbound Dispatch + +## Status + +Accepted + +## Context + +ADR-017 §1 established that a `CallClient` — which opens an outbound +`alknet/call` connection — "has its own operation registry to dispatch incoming +calls from the remote side." The ADR left the *registry scope* as an explicit +two-way door in its Consequences: + +> Sharing the global registry with a `CallClient` exposes local capabilities to +> the remote peer… A peer-scoped subset must filter by capability +> remote-safety, not just operation name. The registry-mechanism choice +> (share global vs subset vs separate) is two-way mechanically but has a +> security dimension post-ADR-022: the "share global" option is a +> capability-exposure decision, not just a dispatch decision. + +This is the one decision identified in +`docs/research/alknet-call-completion/gap-analysis.md` (DC-1) that must be +locked before `CallClient` can be implemented correctly. It is a **one-way door +on the security dimension**: the choice of default determines what a remote peer +can reach, and a wrong default silently exposes outbound credentials. 
+ +### Why this is a one-way door, not a two-way door + +The gap analysis framed the *mechanism* (share-global vs subset vs separate +registry instance) as a two-way door, and that framing holds. But the +**existence of peer-scoped filtering as the v1 default** is one-way, because: + +1. Once a downstream consumer (the runner pattern, the container service, the + NAPI projection) is written against the "remote peer can call any + `External` op and the local node's capabilities will be populated for it" + semantics, switching the default to default-deny is a breaking change for + every consumer. The container-service rewrite at `/workspace/@alkdev/dispatch` + and the dev/runner patterns are the first consumers; the default is set + before they're written, so it's still cheap to set correctly — but only now. + +2. The security dimension is asymmetric in ADR-009 terms. "Share global" leaks + silently: there is no error, no log line, no test that fails — the remote + peer simply receives a populated `OperationContext.capabilities` drawn from + the local `HandlerRegistration.capabilities`, and the local node's API keys + get used for the remote peer's call. The reversal cost is "discover which + consumers quietly depend on the leak and re-audit." Default-deny fails + loudly (the remote peer's call to an unexposed op returns `NOT_FOUND`), + which is the cheaper failure mode. + +3. ADR-014's invariant — "no handler reads outbound credentials from any + source other than `OperationContext.capabilities`" — combined with + ADR-022's dispatch path (which populates `capabilities` from the + registration bundle) means the registration bundle *is* the exposure + boundary. Whatever the `CallClient` dispatches determines which + `Capabilities` objects cross to the remote peer's call context. Filtering + the registry is filtering capability exposure. 
+ +### The runner/dispatch pattern is the primary use case, and it is semi-trusted + +The canonical consumer (gap analysis §"Exchange of Operations"): a container +service / dev runner connects *outward* to a hub and exposes `/container/exec`, +`/container/list`, etc. The hub then calls back into the runner. Both sides +are semi-trusted peers, not extensions of self. Exposing every `External` +operation on the runner — including any operation that carries an outbound +API key the runner happens to hold — is wrong by default. The operator who +*does* want full bilateral sharing is making an explicit trust decision. + +## Decision + +### 1. Default-deny: a CallClient exposes no operations to the remote peer unless explicitly marked remote-safe + +The `CallClient` does not share the global `OperationRegistry` by default. It +holds a **peer-scoped subset**: a filtered view containing only +`HandlerRegistration`s that are explicitly marked as remote-safe for this peer. + +The *existence* of filtering is the one-way door; this ADR locks it. + +### 2. The remote-safe marking lives on the registration bundle, not on capabilities + +The marking is added to `HandlerRegistration` (per ADR-022, the registration +bundle) as a peer-exposure field. It is not placed on `Capabilities` entries, +because: + +- `Capabilities` is a flat credential bag; marking individual entries + remote-safe conflates "this credential is safe to send over the wire" with + "this operation may be dispatched on behalf of a remote peer." Those are + different questions — an operation may be remote-safe while using a + credential that must never leave the node, and the dispatch path already + keeps `Capabilities` off the wire (ADR-014). The exposure question is about + *which ops dispatch*, not *which credentials are serializable*. +- The registration bundle is already the integration point for provenance, + composition authority, scoped env, and visibility (ADR-022). 
Peer-exposure is + a property of the same shape: a dispatch-path concern set at registration. + +The exact shape of the marking (a boolean, a per-peer allowlist, a +capability-class tag) is the two-way-door remainder — tracked as OQ-25, not +decided here. v1 uses the simplest shape that supports default-deny: a boolean +`remote_safe: bool` on `HandlerRegistration`, defaulting to `false`. + +### 3. "Share the global registry" remains available as an explicit opt-in + +A `CallClient` may be constructed in "trusted-peer" mode that exposes all +`External` operations from the global registry regardless of the remote-safe +marking. This is the explicit-allow path for operators who have made the trust +decision (e.g., two nodes under single administrative control, a test harness). +It is opt-in, never the default. + +### 4. Provenance-based defaults + +The remote-safe marking has a provenance-aware default at registration time, +before the operator's explicit choice: + +| Provenance | Default `remote_safe` | +|-----------|----------------------| +| `Local` | `false` — assembly-written ops are not remote-callable unless the operator says so | +| `Session` | `false` — agent-written ops are sandboxed (ADR-015); exposing them to a remote peer would widen the sandbox | +| `FromOpenAPI`, `FromMCP`, `FromCall`, `FromJsonSchema` | `false` — leaves are composition material, not wire-callable (ADR-015) | + +`false` across the board as the default. The operator flips specific +operations to `true` when they want this peer to reach them. This is the same +default-deny posture as ADR-015's visibility (`Internal` by default) and +ADR-022's composition authority (`None` for leaves by default). + +### 5. The filtering is a dispatch-time read, not a copy + +The `CallClient`'s peer-scoped view is not a second copy of the registry. It +is a dispatch-time read against the global registry, gated by the remote-safe +marking (and the trusted-peer flag). 
This keeps the curated layer (Layer 0, +ADR-024) single-source — the global registry is still the one Layer-0 store +built by the assembly layer at startup. Only the *visibility* to the remote +peer is filtered. + +This avoids a third registry instance (the "separate registry per CallClient" +option from DC-1) and avoids the staleness problem a copied subset would +introduce: if the assembly layer reloads a curated op's spec, the peer-scoped +view reflects it on the next dispatch, not on the next copy. + +## Consequences + +**Positive:** +- The default is safe-by-construction for the runner/dispatch pattern. A + container service that connects outward to a hub cannot accidentally expose + its local vault-derived API keys to the hub's calls. +- The one-way security door is locked before any consumer is written against + the leaky default. The container-service rewrite and the dev/runner patterns + implement against default-deny from day one. +- Failure mode is loud: a remote peer calling an unexposed op gets + `NOT_FOUND`, not silent credential exposure. +- The mechanism is additive. Trusted-peer opt-in preserves the "share global" + path for operators who want it, without making it the default. +- Single-source Layer 0: no copied registry, no staleness. + +**Negative:** +- Adds one field (`remote_safe: bool`) to `HandlerRegistration` (ADR-022). + The registration bundle grows. This is the smallest shape that supports + default-deny; OQ-25 may replace it with a richer mechanism (per-peer + allowlists, capability-class tags). +- Operators must explicitly mark operations remote-safe for bilateral + exchange. This is friction, deliberately: the bilateral container-service + pattern requires the operator to declare which of the runner's ops the hub + may call back into. +- The remote-safe marking is a v1 mechanism and may be superseded. OQ-25 + tracks the shape; a future ADR may amend or supersede this one without + revisiting the *existence* of filtering. 
+- The trusted-peer opt-in is a sharp tool. An operator who enables it for the + wrong peer gets the "share global" exposure this ADR exists to prevent. + The opt-in is documented as a trust decision, not as a convenience. + +## Assumptions + +1. **The remote-safe marking is set at registration time, not at connection + time.** The marking is a property of the operation (per-peer in a richer + shape, but at least a boolean in v1), set by the assembly layer when it + builds the registry. Per-connection overrides are not part of v1; if a + deployment needs different exposure per peer, it uses the richer shape + (OQ-25) or multiple `CallClient`s with different filtered views. + +2. **The peer-scoped view filters dispatch, not `services/list` semantics.** + The remote peer discovers operations via `services/list` (ADR-017 §3), + which already filters by `Visibility::External` (ADR-015). The remote-safe + marking is an *additional* filter for the dispatch path: an op may be + `External` yet not remote-safe. In v1, `services/list` served to a + `CallClient` peer **hides** non-remote-safe ops — a peer should not see + ops it cannot call, so discovery and dispatch filters agree. (The + pre-filter mental model — "`External` appears in `services/list`, then + the dispatch path returns `NOT_FOUND` for non-remote-safe" — is *not* the + v1 behavior; v1 hides them from listing too.) Whether a richer shape + (OQ-25) should expose-but-deny instead of hide is a two-way-door detail + tracked in OQ-25. + +3. **Filtering is per-`CallClient`, not global.** A node with multiple + outbound connections may expose different subsets to different peers. The + v1 boolean marking limits this to "remote-safe for any peer" vs "not"; the + richer OQ-25 shape is what enables per-peer differentiation. v1's + limitation is acceptable because the runner/dispatch pattern has one + remote peer per `CallClient`. 
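
Assumptions 2 and 3 can be illustrated with a minimal sketch, assuming simplified stand-in types. `Registration`, `PeerView`, `DispatchError`, and `sample_registry` are hypothetical names for this example, not the crate's real API. The point it shows is that a single predicate gates both `services/list` and dispatch, so the two filters cannot disagree, and that the view borrows the registry rather than copying it.

```rust
// Hypothetical, simplified stand-ins for the real alknet-call types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Visibility {
    Internal,
    External,
}

#[derive(Clone, Debug)]
struct Registration {
    name: &'static str,
    visibility: Visibility,
    remote_safe: bool, // ADR-028 v1 marking; defaults to false in the spec
}

#[derive(Debug, PartialEq, Eq)]
enum DispatchError {
    NotFound, // stands in for the protocol-level NOT_FOUND code
}

/// A borrowed, filtered read over the single registry (no copy is made).
struct PeerView<'a> {
    registry: &'a [Registration],
    trusted_peer: bool,
}

impl<'a> PeerView<'a> {
    /// The one predicate shared by discovery and dispatch.
    fn is_exposed(&self, reg: &Registration) -> bool {
        reg.visibility == Visibility::External && (self.trusted_peer || reg.remote_safe)
    }

    /// What `services/list` would show this peer.
    fn list(&self) -> Vec<&'static str> {
        self.registry
            .iter()
            .filter(|r| self.is_exposed(r))
            .map(|r| r.name)
            .collect()
    }

    /// Resolve an incoming call; a hidden op looks exactly like a missing op.
    fn resolve(&self, name: &str) -> Result<&'a Registration, DispatchError> {
        self.registry
            .iter()
            .find(|r| r.name == name && self.is_exposed(r))
            .ok_or(DispatchError::NotFound)
    }
}

/// Illustrative registry contents; the op names are made up for the sketch.
fn sample_registry() -> Vec<Registration> {
    vec![
        Registration { name: "/runner/exec", visibility: Visibility::External, remote_safe: true },
        Registration { name: "/vault/derive", visibility: Visibility::External, remote_safe: false },
        Registration { name: "/internal/gc", visibility: Visibility::Internal, remote_safe: true },
    ]
}
```

In this sketch a default-deny view lists and resolves only `/runner/exec`, while a trusted-peer view additionally exposes `/vault/derive`. `Internal` ops stay hidden in both modes, matching "exposes all `External` operations".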
+ +## References + +- ADR-009: One-Way Door Decision Framework (the door-type framing this ADR + relies on) +- ADR-014: Secret Material Flow and Capability Injection (the no-env-vars + invariant this ADR's security argument rests on) +- ADR-015: Privilege Model and Authority Context (the default-`Internal`, + default-deny posture this ADR mirrors) +- ADR-017: Call Protocol Client and Adapter Contract (§1 Consequences flagged + this decision; §1 is amended by this ADR) +- ADR-022: Handler Registration, Provenance, and Composition Authority (the + registration bundle this ADR adds a field to) +- ADR-024: Operation Registry Layering (Layer 0 single-source; the peer-scoped + view is a dispatch-time read, not a copy) +- OQ-25: Remote-safe marking shape (the two-way-door remainder) +- `docs/research/alknet-call-completion/gap-analysis.md` — DC-1 +- `docs/architecture/crates/call/client-and-adapters.md` — the spec this ADR + informs \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index ae86c95..b9176db 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-20 +last_updated: 2026-06-26 --- # Open Questions @@ -315,4 +315,97 @@ These questions are acknowledged but not active. They will be promoted to open w - **Door type**: One-way (wire format), two-way (mapping mechanism) - **Priority**: high - **Resolution**: `OperationSpec` gains `error_schemas: Vec<ErrorDefinition>` where each `ErrorDefinition` carries a `code`, `description`, `schema` (JSON Schema for the error detail payload), and optional `http_status` (for adapter projection). The `call.error` payload gains an optional `details` field carrying the typed error payload. Protocol-level codes (`NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) are distinct from operation-level domain codes (`FILE_NOT_FOUND`, `RATE_LIMITED`, etc.)
— protocol codes are emitted by the dispatch machinery, operation codes by handlers. `from_openapi`/`to_openapi` map OpenAPI response status codes to/from `ErrorDefinition`s, making the adapter contract from ADR-017 faithful on the error axis. `services/schema` exposes `error_schemas` for client code generation. See ADR-023. -- **Cross-references**: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md) \ No newline at end of file +- **Cross-references**: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md) + +## Theme: Call Client and Adapters + +These open questions are the two-way-door remainders from the +call-completion gap analysis +(`docs/research/alknet-call-completion/gap-analysis.md`, DC-1..4). The +one-way door among them (DC-1, the *existence* of peer-scoped filtering as +the default) is resolved by ADR-028; what remains open here is the shape. +The v1 defaults for DC-2/3/4 are recorded in +[client-and-adapters.md](crates/call/client-and-adapters.md) and may be +revisited during implementation without a new ADR. + +### OQ-25: Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering + +- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 (§1 Consequences), ADR-028 +- **Status**: open +- **Door type**: Two-way (shape only — existence is one-way, resolved by ADR-028) +- **Priority**: medium +- **Resolution**: ADR-028 locks the one-way door: a `CallClient`'s registry + view is **default-deny** (no operation is exposed to the remote peer unless + explicitly marked remote-safe), with share-global as an explicit trusted-peer + opt-in. 
The v1 shape is a `remote_safe: bool` field on + `HandlerRegistration` (default `false` across all provenance). The shape is + the two-way-door remainder: a boolean is the simplest shape that supports + default-deny; a deployment that needs per-peer differentiation (different + subsets exposed to different peers on the same node) needs a richer + mechanism — per-peer allowlist, capability-class tag, or a peer-id-keyed map + on the registration. v1's boolean limits this to "remote-safe for any peer" + vs "not", which is acceptable for the runner/dispatch pattern (one remote + peer per `CallClient`). A future ADR may amend or supersede ADR-028's shape + without revisiting the *existence* of filtering. Also open under this OQ: + whether a richer shape should *expose-but-deny* non-remote-safe ops in + `services/list` (returning `NOT_FOUND` on call) instead of *hiding* them. + v1 hides them — a peer should not see ops it cannot call, so discovery and + dispatch filters agree (ADR-028 Assumption 2); expose-but-deny is the + richer-shape question, not a v1 question. +- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-022, ADR-024, + ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md), + [operation-registry.md](crates/call/operation-registry.md) + +### OQ-26: OperationAdapter Error Type (AdapterError Variants) + +- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §5 +- **Status**: open +- **Door type**: Two-way +- **Priority**: medium +- **Resolution**: ADR-017 §5 showed `async fn import(&self) -> + Vec<HandlerRegistration>` with no error type. The trait returns + `Result<Vec<HandlerRegistration>, AdapterError>` where `AdapterError` is a + crate-level enum. The *presence* of an error type is recorded in + [client-and-adapters.md](crates/call/client-and-adapters.md); the exact + variants are the two-way-door remainder.
The failure modes real + implementations hit: discovery transport failure (`from_call` remote + unreachable), schema parse failure (`from_openapi`, `from_jsonschema`), + unauthorized (HTTP 401 for `from_openapi`, `from_mcp`). Likely variants: + `DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`. Decided + during implementation; recorded here, not in a full ADR. +- **Cross-references**: ADR-017, [client-and-adapters.md](crates/call/client-and-adapters.md) + +### OQ-27: from_call Re-Import Trigger + +- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 Assumption 4 +- **Status**: open +- **Door type**: Two-way +- **Priority**: low +- **Resolution**: ADR-017 Assumption 4 noted re-import "happens on + reconnection or is triggered explicitly." The v1 default is + **auto-re-import on connection establishment**. The overlay is + per-connection (Layer 2, ADR-024), so a stale overlay dies with the + connection; re-import on reconnect is naturally scoped to the new + connection. This is the right default for the runner pattern (a worker + reconnects → the hub re-discovers the worker's ops automatically). + Explicit re-import via a future `CallConnection::refresh()` method is + additive and can be added if a deployment needs manual control. Reversal + is cheap; no ADR needed. +- **Cross-references**: ADR-017, ADR-024, [client-and-adapters.md](crates/call/client-and-adapters.md) + +### OQ-28: from_call Namespace Collision Behavior + +- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §3 +- **Status**: open +- **Door type**: Two-way +- **Priority**: low +- **Resolution**: ADR-017 §3's `FromCallConfig` namespace prefix is + **optional, default no prefix, collision = error**. A node importing from + two remotes that both expose `/container/exec` without prefixes should fail + loudly rather than silently overwrite. The operator adds prefixes when they + know they're importing from multiple sources. 
This matches the + default-deny, explicit-allow posture (ADR-015, ADR-028). Reversal is cheap; + no ADR needed. The alternative (last-wins) would silently mask one + remote's op behind another's, which is the kind of surprise the + default-deny posture exists to avoid. +- **Cross-references**: ADR-015, ADR-017, ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md) \ No newline at end of file diff --git a/tasks/call/client/call-client.md b/tasks/call/client/call-client.md new file mode 100644 index 0000000..9f6a7b6 --- /dev/null +++ b/tasks/call/client/call-client.md @@ -0,0 +1,183 @@ +--- +id: call/client/call-client +name: Implement CallClient (outbound connection opener) with peer-scoped default-deny dispatch (ADR-017, ADR-028) +status: pending +depends_on: [call/protocol/call-connection, call/registry/remote-safe-marking] +scope: moderate +risk: high +impact: phase +level: implementation +--- + +## Description + +Implement `CallClient` in `src/client/mod.rs` (new `client` module). This is +the #1 gap in alknet-call — the outbound connection opener. Every downstream +consumer (runner pattern, container service, bilateral exchange, NAPI +projection, agent cross-node tool dispatch) is blocked on it. It opens a QUIC +connection to a remote node on ALPN `alknet/call`, performs credential setup, +and produces a `CallConnection` running the **shared** dispatch loop (ADR-017 +§1). `CallClient` is the connection-establishment half; `CallAdapter`'s +accept path is the inbound half. Both produce the same `CallConnection`. + +### CallClient struct + +```rust +pub struct CallClient { + /// The operation registry. The peer-scoped view is a dispatch-time read + /// over this registry, not a copy (ADR-028 §5). + registry: Arc, + identity_provider: Arc, + /// Trusted-peer mode (ADR-028 §3): when true, the dispatch path exposes + /// all External ops to the remote peer and services/list lists all + /// External ops, ignoring the remote_safe marking. 
When false (default), + /// only registrations with remote_safe: true dispatch, and services/list + /// hides non-remote-safe ops (ADR-028 Assumption 2). + trusted_peer: bool, +} + +impl CallClient { + /// Default: peer-scoped (default-deny). Filters dispatch + services/list + /// by remote_safe == true. + pub fn new(registry: Arc, idp: Arc) -> Self; + + /// Trusted-peer mode: expose all External ops, ignore remote_safe. + /// Explicit opt-in per ADR-028 §3. + pub fn trusted_peer(registry: Arc, idp: Arc) -> Self; + + /// Open a QUIC connection to `addr` on ALPN `alknet/call`, perform + /// credential handshake, and return a CallConnection running the shared + /// dispatch loop. Credentials come from capabilities (ADR-014), not env + /// vars — see client-and-adapters.md "No-Env-Vars Invariant". + pub async fn connect( + &self, + addr: SocketAddr, + credentials: CallCredentials, + ) -> Result; +} +``` + +### Shared dispatch loop + +The dispatch loop is **shared** with `CallAdapter`. Once a connection is +established (whether accepted or opened), the same logic applies: read +`EventEnvelope` frames, dispatch to the operation registry, write responses, +send outgoing `call.requested` for calls initiated on this side. Refactor the +existing accept-path dispatch out of `CallAdapter` into a shared function +(likely in `src/protocol/connection.rs` or a new `src/protocol/dispatch.rs`) +that both `CallAdapter::handle` and `CallClient::connect` call. Do not +duplicate the dispatch loop — ADR-017 §1 is explicit that the client is the +connection-establishment half, not a parallel protocol implementation. + +The `CallConnection` type already exists (`protocol/connection.rs`) and +holds the Layer 2 overlay + call/subscribe/abort API. `CallClient::connect` +constructs it from the opened connection (vs `CallAdapter` constructing it +from the accepted connection). 
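
The refactor shape described in this section can be sketched as follows. This is a synchronous, in-memory illustration under stated assumptions, not the real async QUIC loop: `Frame`, `FrameIo`, `MemoryIo`, `run_dispatch_loop`, `handle_accepted`, and `handle_opened` are hypothetical names. It shows one dispatch function with two thin entry points, one per connection direction.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for EventEnvelope frames.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Frame {
    Requested { id: u32, op: String },
    Responded { id: u32, body: String },
    Error { id: u32, code: &'static str },
}

/// Anything that yields inbound frames and accepts outbound ones.
trait FrameIo {
    fn recv(&mut self) -> Option<Frame>;
    fn send(&mut self, frame: Frame);
}

/// The single dispatch loop. Neither entry point reimplements it.
fn run_dispatch_loop(io: &mut dyn FrameIo, known_ops: &[&str]) {
    while let Some(frame) = io.recv() {
        if let Frame::Requested { id, op } = frame {
            let reply = if known_ops.contains(&op.as_str()) {
                Frame::Responded { id, body: format!("ok:{op}") }
            } else {
                Frame::Error { id, code: "NOT_FOUND" }
            };
            io.send(reply);
        }
    }
}

/// In-memory transport standing in for an established connection.
struct MemoryIo {
    inbound: VecDeque<Frame>,
    outbound: Vec<Frame>,
}

impl MemoryIo {
    fn with_inbound(frames: Vec<Frame>) -> Self {
        MemoryIo { inbound: frames.into(), outbound: Vec::new() }
    }
}

impl FrameIo for MemoryIo {
    fn recv(&mut self) -> Option<Frame> {
        self.inbound.pop_front()
    }
    fn send(&mut self, frame: Frame) {
        self.outbound.push(frame);
    }
}

/// Accept path: roughly what CallAdapter::handle would do after accepting.
fn handle_accepted(mut io: MemoryIo, ops: &[&str]) -> Vec<Frame> {
    run_dispatch_loop(&mut io, ops);
    io.outbound
}

/// Connect path: roughly what CallClient::connect would do after dialing.
fn handle_opened(mut io: MemoryIo, ops: &[&str]) -> Vec<Frame> {
    run_dispatch_loop(&mut io, ops);
    io.outbound
}
```

Because both entry points delegate to `run_dispatch_loop`, the same inbound script produces identical outbound frames regardless of which side opened the connection. That property is what a "shared, not duplicated" test could assert.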
+ +### Peer-scoped dispatch (ADR-028 — default-deny) + +The incoming-call dispatch path in the `CallClient` must filter by +`remote_safe`: + +- **Default mode** (`trusted_peer: false`): an incoming `call.requested` for + an op name resolves to the registration; if `registration.remote_safe == + false`, return `NOT_FOUND` (not `FORBIDDEN` — same posture as + `Visibility::Internal` per ADR-015). If `true`, dispatch normally. + `OperationContext.capabilities` is populated from the registration bundle + only for remote-safe ops — this is the security argument for default-deny + (ADR-028 Context): a remote peer's call must not trigger dispatch that + populates capabilities from the local node's registration bundle unless the + op is explicitly exposed. +- **Trusted-peer mode** (`trusted_peer: true`): bypass the `remote_safe` + filter; expose all `External` ops. The operator has made the trust decision + explicitly. + +This is a dispatch-time read over the single Layer-0 registry (ADR-028 §5) — +not a copied subset, not a third registry instance. The `OperationRegistry` +from `remote-safe-marking` is the single source. + +### services/list hide behavior (ADR-028 Assumption 2) + +When the `CallClient` serves `services/list` to the remote peer: + +- **Default mode**: hide ops where `remote_safe == false` (in addition to + the existing `Visibility::External` filter). A peer should not see ops it + cannot call. +- **Trusted-peer mode**: list all `External` ops regardless of + `remote_safe`. + +The existing `services_list_handler` in `registry/discovery.rs` filters by +`Visibility::External` only. Wire the additional `remote_safe` filter for the +`CallClient`'s serving path. (The `CallAdapter`'s serving path — local +accept — is unchanged; it continues to list all `External` ops, since a +direct QUIC client is not a `CallClient` peer in the filtered sense. Clarify +this split in code comments and a test.) + +### Credentials + +`connect()` takes a `CallCredentials` bundle. 
Credentials come from +`Capabilities` (ADR-014), never env vars. The three dimensions (ADR-017 §7): +TLS identity (RFC 7250 raw key or X.509, ADR-027), auth token (opaque, +vault-decrypted), remote identity verification (expected fingerprint/cert). +Populated by the assembly layer at `CallClient` construction time from +vault-derived `Capabilities`. The concrete `TlsIdentity` / `AuthToken` / +`RemoteIdentity` shapes are implementation-detail two-way doors (recorded in +client-and-adapters.md); the one-way constraint is they come from +capabilities, not env vars. + +### Connection symmetry + +After establishment, the connection is symmetric (ADR-017 §2): both sides +can send and receive `call.requested`. Connection direction is independent of +call direction. The `CallClient` is both a caller (initiates outgoing calls +via `CallConnection::call()`/`subscribe()`/`abort()`) and a callee +(dispatches incoming calls against its peer-scoped view). + +## Acceptance Criteria + +- [ ] `src/client/mod.rs` exists with `CallClient` struct (registry, idp, trusted_peer) +- [ ] `CallClient::new` constructs default-deny (trusted_peer: false) +- [ ] `CallClient::trusted_peer` constructs trusted-peer mode +- [ ] `connect()` opens a QUIC connection on ALPN `alknet/call` +- [ ] `connect()` returns a `CallConnection` running the shared dispatch loop +- [ ] Dispatch loop is shared with `CallAdapter` (refactored, not duplicated) +- [ ] Default mode: incoming call to op with `remote_safe == false` returns NOT_FOUND +- [ ] Default mode: incoming call to op with `remote_safe == true` dispatches +- [ ] Default mode: capabilities populated only for remote-safe dispatched ops +- [ ] Trusted-peer mode: all External ops dispatch regardless of remote_safe +- [ ] Default mode: services/list hides non-remote-safe ops from the peer +- [ ] Trusted-peer mode: services/list lists all External ops +- [ ] Outgoing call()/subscribe()/abort() work through the returned CallConnection +- [ ] Connection 
symmetry: remote peer can call back into the CallClient +- [ ] `CallCredentials` carries TLS identity / auth token / remote identity (from capabilities) +- [ ] No env-var reads in the credential path (no-env-vars invariant, ADR-014) +- [ ] Integration test: two-node call (CallClient connects to CallAdapter, both call each other) +- [ ] Integration test: default-deny op returns NOT_FOUND to remote peer +- [ ] Integration test: remote_safe op dispatches to remote peer +- [ ] Integration test: trusted-peer mode exposes all External ops +- [ ] Integration test: services/list hides non-remote-safe in default mode +- [ ] `cargo test -p alknet-call` succeeds +- [ ] `cargo clippy -p alknet-call --all-targets` succeeds with no warnings + +## References + +- docs/architecture/crates/call/client-and-adapters.md — CallClient §, credential sources § +- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017 §1 (shared loop), §2 (symmetry), §7 (credentials) +- docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md — ADR-028 (default-deny, trusted-peer, Assumption 2 services/list hide) +- docs/architecture/crates/call/call-protocol.md — CallConnection (the type connect() produces) +- tasks/call/protocol/call-connection.md — completed CallConnection task +- tasks/call/registry/remote-safe-marking.md — prerequisite (adds remote_safe field) +- docs/research/alknet-call-completion/gap-analysis.md — DC-1, implementation priority #1 + +## Notes + +> This is the single highest-value piece of work in the alknet-call +> completion — every downstream consumer is blocked on it. The dispatch loop +> is shared with CallAdapter (refactor, don't duplicate — ADR-017 §1 is +> explicit). The peer-scoped default-deny (ADR-028) is the one-way-door +> security dimension: a remote peer's call must not populate +> OperationContext.capabilities from the local bundle unless the op is +> explicitly remote-safe. 
The v1 shape is `trusted_peer: bool` + the +> `remote_safe: bool` field from `remote-safe-marking`; per-peer allowlists +> are OQ-25 and explicitly out of scope. Credentials come from capabilities, +> never env vars (no-env-vars invariant). \ No newline at end of file diff --git a/tasks/call/client/from-call.md b/tasks/call/client/from-call.md new file mode 100644 index 0000000..cb4e18e --- /dev/null +++ b/tasks/call/client/from-call.md @@ -0,0 +1,155 @@ +--- +id: call/client/from-call +name: Implement from_call adapter (discover remote ops via services/list + services/schema, register FromCall leaves) +status: pending +depends_on: [call/client/call-client] +scope: moderate +risk: medium +impact: component +level: implementation +--- + +## Description + +Implement `from_call` in `src/client/from_call.rs`. This is the #2 gap — it +discovers the remote peer's `External` operations and registers them in the +connection's Layer 2 overlay as `FromCall`-provenance leaves with forwarding +handlers. The discovery mechanism (`services/list` + `services/schema`) is +already implemented in `registry/discovery.rs`; `from_call` is the +client-side consumer of that API. + +### Flow (ADR-017 §3) + +1. Call `services/list` on the remote → list of `External` operations. +2. Call `services/schema` for each → input/output JSON Schemas and declared + `error_schemas` (ADR-023). +3. For each discovered op, construct a `HandlerRegistration`: + - `spec` mirrors the remote op's name (with optional prefix), namespace, + type, schemas, access control. + - `handler` is a forwarding handler: sends `call.requested` through the + `CallConnection`, awaits `call.responded` (or streams for subscriptions). + - `provenance: FromCall`, `composition_authority: None`, `scoped_env: None` + (leaf — ADR-022). +4. The caller registers the bundles via + `CallConnection::register_imported_all()`. + +### API + +```rust +pub struct FromCallConfig { + /// Namespace prefix applied to imported operation names. 
Optional — + /// default no prefix. Collision on import is an error (DC-3, OQ-28), + /// not last-wins. + pub namespace_prefix: Option<String>, + /// Optional filter — import only operations whose names match. None + /// imports all External ops discovered via services/list. + pub operation_filter: Option<Vec<String>>, +} + +/// Discover the remote peer's External ops and construct HandlerRegistration +/// bundles with FromCall provenance and forwarding handlers. The caller +/// registers the bundles in the connection's overlay via +/// CallConnection::register_imported_all(). +pub async fn from_call( + connection: &CallConnection, + config: FromCallConfig, +) -> Result<Vec<HandlerRegistration>, AdapterError>; +``` + +### Forwarding handler + +The handler captures a handle to the `CallConnection` and, on invocation: + +- For a `Query`/`Mutation` op: calls `connection.call(imported_name, input)`, + returns the `ResponseEnvelope`. +- For a `Subscription` op: calls `connection.subscribe(imported_name, input)`, + yields each `call.responded` until `call.completed`/`call.aborted`. +- The handler's `parent_request_id` participates in the abort cascade + (ADR-016 §6) — if the parent is aborted, the cascade reaches this handler, + which sends `call.aborted` to the remote node; the remote node cascades to + its own descendants. Cross-node abort is transparent. + +### Re-import on reconnection (DC-2, OQ-27) + +v1 default: `from_call` runs **automatically on connection establishment**. +The overlay is per-connection (Layer 2, ADR-024), so a stale overlay dies with +the connection; re-import on reconnect is naturally scoped to the new +connection. This is the right default for the runner pattern (a worker +reconnects → the hub re-discovers the worker's ops automatically). Wire the +auto-re-import into the `CallClient::connect` path (or document that the +assembly layer calls `from_call` immediately after `connect()` — pick the +cleaner integration; the auto-on-reconnect behavior is the v1 contract).
+ +Explicit re-import via a future `CallConnection::refresh()` is additive +(OQ-27); do not implement `refresh()` in this task unless the auto-import +wiring naturally produces it. + +### Namespace collision (DC-3, OQ-28) + +v1 default: **optional prefix, default no prefix, collision = error**. A node +importing from two remotes that both expose `/container/exec` without +prefixes should fail loudly (return `AdapterError`) rather than silently +overwrite. The operator adds prefixes when importing from multiple sources. +Implement collision detection: if applying the (possibly empty) prefix +produces a name that already exists in the target overlay, return an error. +This matches the default-deny, explicit-allow posture (ADR-015, ADR-028). + +### Provenance and visibility + +`from_call`-registered operations are `Internal` by default (ADR-015) — +composition material, not directly callable from the wire. The handler that +composes them is `External`. Set `remote_safe: false` on FromCall leaves +(they're leaves — they don't expose to *their* peers; the composition +authority is `None`). + +### Trust is transitive + +A `from_call`-imported operation executes the remote node's code, not yours. +The scoped env (ADR-015) bounds *which* operations are reachable, not *what* +they do. `from_call` means "I trust the remote node as much as my own +handlers." This is inherent to remote composition; the spec records it, the +implementation doesn't need to enforce it beyond the scoped-env reachability +that already exists. 
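
The namespace-collision rule above (DC-3, OQ-28) can be sketched as a pure planning step that runs before anything is registered. This is a minimal illustration under stated assumptions: `ImportError::NameCollision`, `imported_name`, and `plan_import` are hypothetical names, and joining the prefix by simple concatenation (`/worker-b` + `/container/exec`) is an assumed convention, not one the spec fixes.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
enum ImportError {
    /// Hypothetical variant; the real AdapterError variants are open (OQ-26).
    NameCollision(String),
}

/// Join an optional prefix and a remote op name (assumed convention).
fn imported_name(prefix: Option<&str>, remote_name: &str) -> String {
    match prefix {
        Some(p) => format!("{}{}", p.trim_end_matches('/'), remote_name),
        None => remote_name.to_string(),
    }
}

/// Compute the names an import would register, failing on the first
/// collision with the target overlay or within the batch itself.
/// Nothing is registered here, so a failure leaves the overlay untouched.
fn plan_import(
    existing: &HashSet<String>,
    prefix: Option<&str>,
    remote_ops: &[&str],
) -> Result<Vec<String>, ImportError> {
    let mut seen: HashSet<String> = existing.clone();
    let mut planned = Vec::new();
    for op in remote_ops {
        let name = imported_name(prefix, op);
        // HashSet::insert returns false when the value was already present.
        if !seen.insert(name.clone()) {
            return Err(ImportError::NameCollision(name));
        }
        planned.push(name);
    }
    Ok(planned)
}
```

Planning first and registering second keeps the import all-or-nothing: a collision halfway through a batch cannot leave a partially populated overlay.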
+ +## Acceptance Criteria + +- [ ] `src/client/from_call.rs` exists with `FromCallConfig` and `from_call` +- [ ] `from_call` calls `services/list` then `services/schema` for each op +- [ ] Each discovered op becomes a `HandlerRegistration` with `provenance: FromCall` +- [ ] Forwarding handler sends `call.requested` via `CallConnection::call`/`subscribe` +- [ ] Subscription forwarding yields until `call.completed`/`call.aborted` +- [ ] `composition_authority: None`, `scoped_env: None` for FromCall leaves +- [ ] `remote_safe: false` on FromCall leaves +- [ ] Namespace prefix applied when `config.namespace_prefix` is Some +- [ ] Collision on import (same prefixed name) returns `AdapterError`, not silent overwrite +- [ ] `operation_filter` limits which ops are imported +- [ ] Re-import runs on connection establishment (auto-on-reconnect, v1 default) +- [ ] Cross-node abort: parent abort cascades to from_call handler → sends call.aborted remote +- [ ] `from_call` returns `Result<_, AdapterError>` (the error type from OQ-26) +- [ ] Integration test: from_call populates Layer 2 overlay with remote External ops +- [ ] Integration test: forwarding handler invokes remote op and returns result +- [ ] Integration test: subscription forwarding streams remote events +- [ ] Integration test: namespace collision returns error +- [ ] Integration test: operation_filter limits imports +- [ ] `cargo test -p alknet-call` succeeds +- [ ] `cargo clippy -p alknet-call --all-targets` succeeds with no warnings + +## References + +- docs/architecture/crates/call/client-and-adapters.md — from_call §, re-import §, namespace collision § +- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017 §3 (from_call flow), §6 (cross-node abort) +- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022 (leaf provenance, None authority/env) +- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2 
overlay) +- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 §6 (cross-node cascade) +- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (error_schemas mirrored) +- docs/research/alknet-call-completion/gap-analysis.md — DC-2, DC-3, DC-5, implementation priority #2 + +## Notes + +> from_call is the client-side consumer of the already-implemented +> services/list + services/schema discovery API. The v1 defaults are +> auto-on-reconnect (DC-2/OQ-27) and error-on-collision (DC-3/OQ-28) — both +> two-way doors, recorded in client-and-adapters.md, revisitable without an +> ADR. The AdapterError type (DC-4/OQ-26) is shared with the +> operation-adapter-trait task — coordinate the enum shape. Cross-node abort +> is transparent via the forwarding handler's parent_request_id (ADR-016 §6). \ No newline at end of file diff --git a/tasks/call/client/from-jsonschema.md b/tasks/call/client/from-jsonschema.md new file mode 100644 index 0000000..74c789e --- /dev/null +++ b/tasks/call/client/from-jsonschema.md @@ -0,0 +1,124 @@ +--- +id: call/client/from-jsonschema +name: Implement from_jsonschema adapter (schema-only registration, FromJsonSchema provenance, no handler) +status: pending +depends_on: [call/client/operation-adapter-trait] +scope: narrow +risk: low +impact: isolated +level: implementation +--- + +## Description + +Implement `from_jsonschema` in `src/client/from_jsonschema.rs`. This is the #4 +gap — schema-only registration: produces `HandlerRegistration` bundles with +no handler (`FromJsonSchema` provenance). Used for validation, discovery, and +composition-graph construction without a runtime — type-checking a +composition plan without executing it, building a UI of available operations +without standing up the transports, etc. 
+ +### Distinct from from_call (gap analysis DC-5 — confirmed, not a decision) + +| | `from_jsonschema` | `from_call` | +|---|---|---| +| Schema source | Provided directly (caller fetches, passes in) | Discovered over wire (`services/list` + `services/schema`) | +| Handler at call time | None (schema-only, `FromJsonSchema` provenance) | Forwards over QUIC (`FromCall` provenance, leaf) | +| Use case | Type validation, discovery, composition graph construction | Actually invoking remote operations | + +Keeping them separate preserves the "schema-only, no execution" use case (type +checking, safe composition planning without runtime). + +### API + +```rust +/// Schema-only registration: produce a HandlerRegistration bundle with +/// FromJsonSchema provenance and no handler. The caller fetches the JSON +/// Schema doc and passes it in; this adapter does no network I/O. +pub fn from_jsonschema( + spec: OperationSpec, + schema: serde_json::Value, +) -> HandlerRegistration; +``` + +The bundle: +- `provenance: FromJsonSchema` +- `composition_authority: None` (no composition — it's schema-only) +- `scoped_env: None` (leaf-equivalent — no reachability) +- `capabilities: Capabilities::new()` (empty — no outbound credentials, no handler to use them) +- `remote_safe: false` (default — ADR-028 §4; provenance-aware default) +- `handler`: a placeholder that returns a `NOT_FOUND`-style or + `INVALID_INPUT`-style error if ever invoked. Since `FromJsonSchema` ops are + `Internal`/not-remote-safe by default and have no composition authority, they + should never be dispatched; the placeholder makes the type-level constraint + hold (the `Handler` type requires a closure) and fails loudly if a bug routes + a call to it. + +### OperationAdapter impl + +`from_jsonschema` implements the `OperationAdapter` trait (from +`operation-adapter-trait`). Because it does no I/O, the `import()` body +contains no `.await` points — it trivially satisfies the async trait. 
+ +```rust +pub struct FromJsonSchema { + spec: OperationSpec, + schema: serde_json::Value, +} + +#[async_trait] +impl OperationAdapter for FromJsonSchema { + async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError> { + // No .await — pure parse. Validates schema shape if useful, returns bundle. + Ok(vec![from_jsonschema(self.spec.clone(), self.schema.clone())]) + } +} +``` + +If the schema is malformed, return `AdapterError::SchemaParse`. + +### Why this is standalone (medium priority) + +`from_jsonschema` doesn't depend on `CallClient` or `from_call` — it's pure +parse with no transport. It's sequenced after `operation-adapter-trait` only +because it implements the trait; if the trait lands first, this can proceed +in parallel with `call-client`/`from-call`. It's medium priority because the +primary consumers (runner, container service, agent) need `from_call`, not +`from_jsonschema`; the schema-only use case is validation/discovery tooling. + +## Acceptance Criteria + +- [ ] `src/client/from_jsonschema.rs` exists with `from_jsonschema` fn + `FromJsonSchema` struct +- [ ] `from_jsonschema` produces a `HandlerRegistration` with `provenance: FromJsonSchema` +- [ ] `composition_authority: None`, `scoped_env: None`, empty `capabilities` +- [ ] `remote_safe: false` (provenance-aware default, ADR-028 §4) +- [ ] Handler placeholder returns an error if invoked (no real handler) +- [ ] `FromJsonSchema` implements `OperationAdapter` (async, no .await in import) +- [ ] Malformed schema returns `AdapterError::SchemaParse` +- [ ] No network I/O (pure parse — caller fetches the doc) +- [ ] Unit test: from_jsonschema produces a bundle with correct provenance + None fields +- [ ] Unit test: placeholder handler returns error when invoked +- [ ] Unit test: OperationAdapter impl returns Ok with one bundle +- [ ] Unit test: malformed schema returns SchemaParse error +- [ ] `cargo test -p alknet-call` succeeds +- [ ] `cargo clippy -p alknet-call --all-targets` succeeds with no warnings + +## 
References + +- docs/architecture/crates/call/client-and-adapters.md — from_jsonschema §, from_jsonschema vs from_call table +- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017 §5 (FromJsonSchema impl) +- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022 (FromJsonSchema provenance, leaf-equivalent None fields) +- docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md — ADR-028 §4 (remote_safe default false) +- tasks/call/client/operation-adapter-trait.md — prerequisite (the trait + AdapterError) +- docs/research/alknet-call-completion/gap-analysis.md — DC-5, implementation priority #4 + +## Notes + +> from_jsonschema is distinct from from_call (DC-5): schema source is +> provided directly (caller fetches), there's no handler at call time, and +> the use case is validation/discovery/composition-graph construction without +> a runtime. It's pure parse with no transport, so it can proceed in parallel +> with call-client/from-call once the trait lands. The placeholder handler +> fails loudly if a bug ever routes a call to a schema-only op — they're +> Internal + not-remote-safe + no composition authority, so dispatch should +> never reach them. 
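
A minimal, dependency-free sketch of the schema-only bundle these notes describe. Everything here is a simplified stand-in and an assumption for illustration: `Registration` is not the crate's real `HandlerRegistration`, the schema is plain text rather than `serde_json::Value`, the `Handler` signature is hypothetical, and the "is it a JSON object" check only gestures at where `AdapterError::SchemaParse` would be returned.

```rust
/// Hypothetical stand-in for the crate's provenance enum (one variant only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provenance {
    FromJsonSchema,
}

/// Hypothetical handler signature: payload text in, result text or error text out.
pub type Handler = fn(&str) -> Result<String, String>;

/// Simplified stand-in for `HandlerRegistration` (field types are assumptions).
pub struct Registration {
    pub name: String,
    pub schema: String,
    pub handler: Handler,
    pub provenance: Provenance,
    pub composition_authority: Option<()>,
    pub scoped_env: Option<()>,
    pub capabilities: Vec<String>,
    pub remote_safe: bool,
}

/// Placeholder handler: a schema-only op has nothing to run, so it fails loudly.
fn placeholder_handler(_payload: &str) -> Result<String, String> {
    Err("schema-only operation has no handler".to_string())
}

/// Build a schema-only bundle. Pure parse: no network I/O, no `.await`.
pub fn from_jsonschema(name: &str, schema: &str) -> Result<Registration, String> {
    // Stand-in for real schema validation; a real implementation would parse the JSON.
    let trimmed = schema.trim();
    if !(trimmed.starts_with('{') && trimmed.ends_with('}')) {
        return Err(format!("SchemaParse: schema for {} is not a JSON object", name));
    }
    Ok(Registration {
        name: name.to_string(),
        schema: trimmed.to_string(),
        handler: placeholder_handler,
        provenance: Provenance::FromJsonSchema,
        composition_authority: None, // leaf: no composition authority
        scoped_env: None,            // leaf: no scoped env
        capabilities: Vec::new(),    // empty capabilities
        remote_safe: false,          // default-deny
    })
}
```

The point of the sketch is that every security-relevant field takes its most restrictive value, and the only executable path through the bundle is an error.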
\ No newline at end of file diff --git a/tasks/call/client/operation-adapter-trait.md b/tasks/call/client/operation-adapter-trait.md new file mode 100644 index 0000000..e473fe5 --- /dev/null +++ b/tasks/call/client/operation-adapter-trait.md @@ -0,0 +1,125 @@ +--- +id: call/client/operation-adapter-trait +name: Define OperationAdapter async trait + AdapterError enum (ADR-017 §5, DC-4/OQ-26) +status: pending +depends_on: [call/registry/handler-registration] +scope: narrow +risk: low +impact: project +level: implementation +--- + +## Description + +Define the `OperationAdapter` async trait and the `AdapterError` crate-level +enum in `src/client/adapter.rs` (or `src/registry/adapter.rs` — pick the +module that keeps the trait near the types it produces). This is the #3 gap +(enabling, not blocking) — `from_call` can be built as a free function before +the trait exists, but the trait is needed before `alknet-http`'s +`from_openapi`/`from_mcp` adapters can be built. Small, standalone, unblocks +`alknet-http` Phase 1. + +### The trait (ADR-017 §5) + +```rust +#[async_trait] +pub trait OperationAdapter: Send + Sync { + async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>; +} +``` + +The trait is **async** because `from_call` requires async discovery +(`services/list` + `services/schema` over a QUIC connection). Sync adapters +(`from_openapi`, `from_mcp` reading a static spec) trivially satisfy an async +trait — their `import()` bodies contain no `.await` points. This is locked by +ADR-017 §5; the async/sync question is decided. + +The return type is `Vec<HandlerRegistration>` (not `(OperationSpec, Handler)` +pairs) — ADR-022 changed the registration API to the bundle shape, and +adapters must produce bundles. Adapter convenience methods construct bundles +with `composition_authority: None` and `scoped_env: None` for the leaf ops they +produce. 
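
To make the "sync adapters trivially satisfy an async trait" point concrete, here is a dependency-free sketch. It is an illustration under stated assumptions, not the crate's code: the trait is written by hand in roughly the boxed-future shape that `#[async_trait]` expands to, `Registration` is a cut-down stand-in for `HandlerRegistration`, and `StaticAdapter`, `ImportFuture`, and `block_on` are hypothetical names.

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Cut-down stand-in for `HandlerRegistration` (assumed fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Registration {
    pub name: String,
    pub composition_authority: Option<()>,
    pub scoped_env: Option<()>,
}

/// Stand-in error with a single variant; the payload type is an assumption.
#[derive(Debug, PartialEq)]
pub enum AdapterError {
    SchemaParse(String),
}

/// Boxed future returned by `import` (hand-written instead of using the macro).
pub type ImportFuture<'a> =
    Pin<Box<dyn Future<Output = Result<Vec<Registration>, AdapterError>> + Send + 'a>>;

pub trait OperationAdapter: Send + Sync {
    fn import(&self) -> ImportFuture<'_>;
}

/// Hypothetical synchronous adapter: it already knows its operation names.
pub struct StaticAdapter {
    pub names: Vec<String>,
}

impl OperationAdapter for StaticAdapter {
    fn import(&self) -> ImportFuture<'_> {
        // All work happens synchronously; the future is ready immediately.
        let result: Result<Vec<Registration>, AdapterError> =
            if self.names.iter().any(|n| n.is_empty()) {
                Err(AdapterError::SchemaParse("empty operation name".to_string()))
            } else {
                Ok(self
                    .names
                    .iter()
                    .map(|n| Registration {
                        name: n.clone(),
                        composition_authority: None, // leaf op
                        scoped_env: None,            // leaf op
                    })
                    .collect())
            };
        Box::pin(ready(result))
    }
}

/// Waker that does nothing; enough to poll futures that never wait on I/O.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor for the sketch; a real caller would use its async runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

A QUIC-backed adapter would return a future that actually suspends; the synchronous one above resolves on the first poll, which is why a single async signature covers both cases.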
+ +The `to_*` adapters (`to_openapi`, `to_mcp`) are outbound projections, not +`OperationAdapter` implementations — they consume the registry, they don't +produce entries for it (ADR-017 §5). Do not implement `to_*` here. + +### AdapterError (DC-4, OQ-26) + +ADR-017 §5 showed `async fn import(&self) -> Vec<HandlerRegistration>` with no +error type. A real implementation needs to handle failures. The trait returns +`Result<Vec<HandlerRegistration>, AdapterError>` where `AdapterError` is a +crate-level enum covering the failure modes real implementations hit: + +- `DiscoveryFailed` — `from_call` remote unreachable / `services/list` failed +- `SchemaParse` — `from_openapi` / `from_jsonschema` couldn't parse the spec +- `Transport` — underlying transport error (QUIC for `from_call`, HTTP for + `from_openapi`/`from_mcp`) +- `Unauthorized` — HTTP 401 for `from_openapi`/`from_mcp`, auth rejected for + `from_call` +- `Conflict` — namespace collision in `from_call` (DC-3); reuse for other + adapter collisions + +The exact variant set is the two-way-door remainder (OQ-26); the *presence* of +an error type is recorded in `client-and-adapters.md`. Pick the variants +above as the v1 set; add a `#[non_exhaustive]` so `alknet-http`'s adapters can +extend without breaking match arms. Use `thiserror::Error` for the derive +(consistent with the crate's existing error types). + +### Where the trait lives + +The trait lives in **alknet-call** (where the types — `HandlerRegistration`, +`OperationSpec`, `Handler` — live). 
The *implementations* live where their +transport dependencies live (the adapter location map, client-and-adapters.md): + +- `FromCall` — QUIC-backed (in `alknet-call`, task `call/client/from-call`) +- `FromJsonSchema` — pure parse, no transport (in `alknet-call`, task + `call/client/from-jsonschema`) +- `FromOpenAPI` — HTTP-backed (in `alknet-http`, separate Phase 0) +- `FromMCP` — MCP streamable-HTTP-backed (in `alknet-http`, feature-gated, + separate Phase 0) + +Do not implement `FromOpenAPI`/`FromMCP` here — those are `alknet-http` tasks. +This task defines the trait + error; `from_call` and `from_jsonschema` +implement it (in their tasks). + +### Implementations registered in this task + +Optionally implement a trivial `FromJsonSchema` adapter in this task if it +falls out naturally (it's a pure-parse adapter with no transport — see +`call/client/from-jsonschema`). If it doesn't fall out naturally, leave it for +the `from-jsonschema` task; the trait + error alone satisfy this task's +acceptance criteria. 
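
The v1 `AdapterError` variant set named in this task could look roughly like the sketch below. The variant payloads (`String` or none) and the display messages are assumptions, and `Display`/`Error` are written by hand only to keep the sketch free of dependencies; the task itself calls for the `thiserror::Error` derive.

```rust
use std::error::Error;
use std::fmt;

/// Sketch of the v1 variant set; `#[non_exhaustive]` forces downstream
/// crates to keep a wildcard arm, so adding variants later is not breaking.
#[non_exhaustive]
#[derive(Debug)]
pub enum AdapterError {
    /// Remote unreachable or `services/list` failed.
    DiscoveryFailed(String),
    /// The spec or schema could not be parsed.
    SchemaParse(String),
    /// Underlying transport error (QUIC or HTTP).
    Transport(String),
    /// Authentication was rejected.
    Unauthorized,
    /// Namespace collision; the payload names the colliding operation.
    Conflict(String),
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AdapterError::DiscoveryFailed(why) => write!(f, "discovery failed: {}", why),
            AdapterError::SchemaParse(why) => write!(f, "schema parse error: {}", why),
            AdapterError::Transport(why) => write!(f, "transport error: {}", why),
            AdapterError::Unauthorized => write!(f, "unauthorized"),
            AdapterError::Conflict(name) => write!(f, "namespace conflict: {}", name),
        }
    }
}

impl Error for AdapterError {}
```

Implementing `std::error::Error` lets callers box the error or chain it with `?` into their own error types, which is the same ergonomics the `thiserror` derive would provide.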
+ +## Acceptance Criteria + +- [ ] `OperationAdapter` trait defined: `async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>` +- [ ] Trait is `#[async_trait]` (async — ADR-017 §5, locked) +- [ ] `AdapterError` enum defined with `#[non_exhaustive]` and `thiserror::Error` +- [ ] `AdapterError` variants: `DiscoveryFailed`, `SchemaParse`, `Transport`, `Unauthorized`, `Conflict` +- [ ] Trait + error are `pub` and re-exported from `lib.rs` +- [ ] Trait is located in alknet-call (where the types live), not alknet-http +- [ ] Doc comments link to ADR-017 §5 and client-and-adapters.md +- [ ] Unit test: a trivial test adapter implementing the trait compiles and returns Ok +- [ ] Unit test: a test adapter returning `Err(AdapterError::SchemaParse)` compiles +- [ ] `cargo test -p alknet-call` succeeds +- [ ] `cargo clippy -p alknet-call --all-targets` succeeds with no warnings + +## References + +- docs/architecture/crates/call/client-and-adapters.md — OperationAdapter trait §, Adapter Location Map § +- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017 §5 (the trait contract), Amendments (DC-4 resolution) +- docs/architecture/open-questions.md — OQ-26 (AdapterError variants, two-way-door remainder) +- docs/research/alknet-call-completion/gap-analysis.md — DC-4, implementation priority #3 + +## Notes + +> The trait is async because from_call needs async discovery; sync adapters +> (from_openapi reading a static spec) trivially satisfy it. The trait lives +> in alknet-call (where the types live); implementations live with their +> transport deps (from_call/from_jsonschema here, from_openapi/from_mcp in +> alknet-http). The AdapterError variants are the two-way-door remainder +> (OQ-26) — `#[non_exhaustive]` lets alknet-http extend without breaking. This +> task is small and standalone; it unblocks alknet-http Phase 1's adapter +> implementations. The to_* adapters are projections, not OperationAdapter +> impls — don't implement them here. 
\ No newline at end of file diff --git a/tasks/call/registry/remote-safe-marking.md b/tasks/call/registry/remote-safe-marking.md new file mode 100644 index 0000000..e483c8e --- /dev/null +++ b/tasks/call/registry/remote-safe-marking.md @@ -0,0 +1,118 @@ +--- +id: call/registry/remote-safe-marking +name: Add remote_safe field to HandlerRegistration for CallClient peer-scoped filtering (ADR-028) +status: pending +depends_on: [] +scope: narrow +risk: medium +impact: isolated +level: implementation +--- + +## Description + +Add the `remote_safe: bool` field to `HandlerRegistration` (and its builder) so +that a `CallClient` can default-deny which operations it exposes to a remote +peer. This is the v1 shape of the peer-scoped filtering mechanism locked by +ADR-028. It is the prerequisite for `call/client/call-client` (the +`CallClient`'s dispatch path reads this field) and is the only one-way-door +piece of the call-completion work, so it goes first. + +### Field + +```rust +pub struct HandlerRegistration { + pub spec: OperationSpec, + pub handler: Handler, + pub provenance: OperationProvenance, + pub composition_authority: Option, // None for leaves + pub scoped_env: Option, // None for leaves + pub capabilities: Capabilities, + pub remote_safe: bool, // default false; ADR-028 — exposes this op to + // CallClient peers (trusted-peer mode bypasses) +} +``` + +`remote_safe` defaults to `false` across **all** provenance (Local, Session, +and the leaf provenances — see ADR-028 §4). The operator flips specific +operations to `true` when they want a peer to reach them. This mirrors the +default-deny posture of ADR-015 (visibility `Internal` by default) and +ADR-022 (composition authority `None` for leaves by default). + +### Builder + +`OperationRegistryBuilder` needs a way to set `remote_safe`: + +- `.with_local(...)` / `.with_leaf(...)` / `.with(...)` should default + `remote_safe: false` (current call sites stay valid unchanged). +- Add a chainable setter, e.g. 
`.remote_safe(true)` on the builder or a + `with_local_remote(...)` / explicit-arg variant. The exact builder API shape + is a two-way door — pick the least invasive (an optional trailing arg or a + builder setter method); do not over-engineer per-peer allowlists (that's + OQ-25's two-way-door remainder, explicitly out of scope here). + +### services/list interaction (ADR-028 Assumption 2) + +`services/list` already filters by `Visibility::External` (ADR-015). Per +ADR-028 Assumption 2, when served to a `CallClient` peer, `services/list` must +**additionally hide** non-remote-safe ops — a peer should not see ops it +cannot call, so discovery and dispatch filters agree. The +`services_list_handler` in `registry/discovery.rs` currently filters only on +`visibility`. + +**Scoping note**: the `services/list` handler doesn't know whether the caller +is a `CallClient` peer or a local process. The v1 implementation: the filter +applied by `services/list` is the *registry's* filter, and the peer-scoped +*view* a `CallClient` exposes is built atop this. The cleanest v1 split is: + +- `services/list` keeps filtering by `Visibility::External` (unchanged). +- The `CallClient`'s peer-scoped view (task `call/client/call-client`) is a + dispatch-time read that additionally filters by `remote_safe`, and the + `CallClient`'s *own* `services/list` serving (when it receives + `services/list` from the remote peer) hides non-remote-safe ops. + +So this task adds the **field + builder setter + provenance defaults**, and +the *filtering behavior* that consumes the field is wired in +`call/client/call-client` (the dispatch path) and — for the +`services/list`-hides-non-remote-safe behavior — in the `CallClient`'s +serving path. **This task only adds the data and defaults**, plus a unit +test that the field defaults to `false` and that the setter flips it. 
Keep +this task tightly scoped: adding the field must not change any existing +dispatch behavior (the field is read-only by the CallClient layer added +later). + +## Acceptance Criteria + +- [ ] `HandlerRegistration` has `pub remote_safe: bool` field +- [ ] All existing `with_local` / `with_leaf` / `with` builder call sites + compile unchanged (default `remote_safe: false`) +- [ ] A builder setter exists to set `remote_safe: true` (e.g. + `.remote_safe(true)` or an explicit-arg variant) +- [ ] Provenance-aware defaults: `Local`, `Session`, `FromOpenAPI`, `FromMCP`, + `FromCall`, `FromJsonSchema` all default to `false` (ADR-028 §4) +- [ ] No existing dispatch path behavior changes (field is data-only here; + the CallClient filter that reads it is a later task) +- [ ] `services/list` handler is unchanged in this task (filtering wired later) +- [ ] Unit test: `HandlerRegistration` default has `remote_safe == false` +- [ ] Unit test: builder setter produces `remote_safe == true` +- [ ] Unit test: all six provenance variants default `remote_safe == false` +- [ ] `cargo test -p alknet-call` succeeds +- [ ] `cargo clippy -p alknet-call --all-targets` succeeds with no warnings + +## References + +- docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md — ADR-028 (the one-way door; §2 field location, §4 provenance defaults, Assumption 2 services/list hide) +- docs/architecture/crates/call/operation-registry.md — HandlerRegistration struct sketch (now shows `remote_safe`) +- docs/architecture/crates/call/client-and-adapters.md — CallClient § (consumes the field; trusted-peer bypass) +- docs/research/alknet-call-completion/gap-analysis.md — DC-1 + +## Notes + +> This is the one-way-door piece of the call-completion work: the *existence* +> of default-deny filtering is locked by ADR-028; this task adds only the v1 +> data shape (`remote_safe: bool`) and defaults. 
The richer per-peer shape +> (allowlist, capability-class tag) is explicitly out of scope — it's the +> two-way-door remainder tracked as OQ-25. Do not implement per-peer logic +> here. The filtering *behavior* (dispatch path + services/list hide) is wired +> in `call/client/call-client`, not here — this task is data + defaults only +> so it can land first and unblock the CallClient task. \ No newline at end of file diff --git a/tasks/call/review-completion.md b/tasks/call/review-completion.md new file mode 100644 index 0000000..cf4fed9 --- /dev/null +++ b/tasks/call/review-completion.md @@ -0,0 +1,158 @@ +--- +id: call/review-completion +name: Review alknet-call client/adapter completion for spec conformance (ADR-017, ADR-028) and no-env-vars invariant +status: pending +depends_on: [call/client/from-call, call/client/operation-adapter-trait, call/client/from-jsonschema] +scope: broad +risk: low +impact: phase +level: review +--- + +## Description + +Review the alknet-call client/adapter completion (the gap ADR-017 left to +implementation, now specced in `client-and-adapters.md`) for spec conformance, +security-constraint conformance, and pattern consistency. This is the quality +checkpoint at the end of the call-completion batch — the work that unblocks +every downstream consumer (runner, container service, bilateral exchange, +NAPI, agent cross-node dispatch). + +### Review Checklist + +1. 
**CallClient conformance** (client-and-adapters.md §CallClient): + - `CallClient` struct with registry, identity_provider, trusted_peer + - `new()` constructs default-deny (trusted_peer: false) + - `trusted_peer()` constructs trusted-peer mode (explicit opt-in) + - `connect()` opens QUIC on ALPN `alknet/call`, returns CallConnection + - Dispatch loop is **shared** with CallAdapter (refactored, not duplicated — ADR-017 §1) + - Connection symmetry (ADR-017 §2): both sides call each other after establishment + - Credentials from Capabilities, not env vars (ADR-014, no-env-vars invariant) + +2. **Peer-scoped filtering conformance** (ADR-028): + - Default-deny: op with `remote_safe == false` returns NOT_FOUND to remote peer + - Default-deny: `OperationContext.capabilities` populated only for remote-safe ops + - Trusted-peer mode: all External ops dispatch regardless of remote_safe + - services/list hides non-remote-safe ops in default mode (ADR-028 Assumption 2) + - services/list lists all External ops in trusted-peer mode + - Dispatch-time read over single Layer-0 registry (not a copy — ADR-028 §5) + - `remote_safe` defaults false across all provenance (ADR-028 §4) + +3. **from_call conformance** (client-and-adapters.md §from_call): + - Calls services/list then services/schema for each op + - Constructs HandlerRegistration with `provenance: FromCall` + - Forwarding handler sends call.requested via CallConnection + - Subscription forwarding yields until completed/aborted + - `composition_authority: None`, `scoped_env: None` (leaf — ADR-022) + - `remote_safe: false` on FromCall leaves + - Namespace collision = error (DC-3/OQ-28), not silent overwrite + - Re-import on connection establishment (DC-2/OQ-27, v1 default) + - Cross-node abort via parent_request_id (ADR-016 §6) + +4. 
**OperationAdapter trait conformance** (client-and-adapters.md §OperationAdapter): + - `async fn import(&self) -> Result<Vec<HandlerRegistration>, AdapterError>` + - Trait is `#[async_trait]` (async — ADR-017 §5, locked) + - `AdapterError` is `#[non_exhaustive]` + `thiserror::Error` + - Variants: DiscoveryFailed, SchemaParse, Transport, Unauthorized, Conflict + - Trait lives in alknet-call (where the types live), not alknet-http + +5. **from_jsonschema conformance** (client-and-adapters.md §from_jsonschema): + - `provenance: FromJsonSchema`, no real handler (placeholder errors if invoked) + - `composition_authority: None`, `scoped_env: None`, empty capabilities + - `remote_safe: false` (provenance default, ADR-028 §4) + - Implements OperationAdapter, no .await in import (pure parse) + - Malformed schema → `AdapterError::SchemaParse` + +6. **Adapter location map conformance** (client-and-adapters.md §Adapter Location Map): + - OperationAdapter trait + from_call + from_jsonschema + CallClient in alknet-call + - No HTTP client / HTTP server deps in alknet-call (stays lean) + - from_openapi/from_mcp/to_openapi/to_mcp NOT in alknet-call (deferred to alknet-http) + - MCP stdio not built (security position, not a feature gap) + +7. **No-env-vars invariant** (client-and-adapters.md §No-Env-Vars Invariant): + - Credential path: vault → assembly → Capabilities → HandlerRegistration.capabilities → OperationContext.capabilities → handler + - No handler reads outbound credentials from any source other than OperationContext.capabilities + - No `std::env::var` reads in the credential path + - The invariant is enforced by the dispatch path (build_root_context), not runtime convention + +8. 
**ADR conformance (completion-specific)**: + - ADR-017 §1: shared dispatch loop, CallClient own registry (now peer-scoped per ADR-028) + - ADR-017 §2: connection direction independent of call direction + - ADR-017 §3: from_call flow (services/list + services/schema), FromCallConfig prefix/filter + - ADR-017 §5: async trait, bundles not (spec,handler) pairs, to_* are projections not impls + - ADR-017 §6: cross-node abort cascade through from_call handler + - ADR-017 §7: credentials from capabilities (TLS identity, auth token, remote identity) + - ADR-028: default-deny, remote_safe bool, trusted-peer opt-in, dispatch-time read, services/list hide + +9. **Security constraints (completion-specific)**: + - Default-deny filtering: remote peer can't trigger capability exposure for non-remote-safe ops + - Trusted-peer opt-in is explicit, never default + - Capabilities non-serializable, never cross the wire (ADR-014) + - from_call trust is transitive (remote node's code runs) — recorded in spec, not enforced beyond scoped env + - FromCall/FromJsonSchema leaves have no composition authority (can't escalate) + +10. **Test coverage**: + - Integration test: two-node call (CallClient ↔ CallAdapter, both call each other) + - Integration test: default-deny op → NOT_FOUND to remote peer + - Integration test: remote_safe op dispatches to remote peer + - Integration test: trusted-peer mode exposes all External ops + - Integration test: services/list hides non-remote-safe in default mode + - Integration test: from_call populates Layer 2 overlay, forwarding works + - Integration test: subscription forwarding streams remote events + - Integration test: namespace collision returns error + - Integration test: cross-node abort cascades through from_call handler + - Unit tests: AdapterError variants, OperationAdapter trait compiles + +11. 
**Spec drift check**: verify `client-and-adapters.md` still matches the + implementation after the completion (no spec/impl drift introduced during + implementation). In particular: the CallClient struct sketch, the + CallCredentials sketch, the FromCallConfig fields, the AdapterError + variants, and the remote_safe field on HandlerRegistration. + +## Acceptance Criteria + +- [ ] CallClient matches client-and-adapters.md (struct, new/trusted_peer, connect, shared loop) +- [ ] Peer-scoped filtering matches ADR-028 (default-deny, trusted-peer, services/list hide) +- [ ] from_call matches client-and-adapters.md (flow, FromCallConfig, provenance, None fields) +- [ ] OperationAdapter trait + AdapterError match client-and-adapters.md (async, non_exhaustive, variants) +- [ ] from_jsonschema matches client-and-adapters.md (provenance, placeholder handler, no I/O) +- [ ] Adapter location map respected (no HTTP deps in alknet-call; from_openapi/mcp not built here) +- [ ] No-env-vars invariant holds (credentials from Capabilities, no env-var reads) +- [ ] ADRs 017 + 028 conformed to (plus 014/015/016/022/023/024 where touched) +- [ ] Default-deny security constraint enforced (no capability exposure for non-remote-safe) +- [ ] Integration tests cover two-node call, default-deny, trusted-peer, from_call, abort cascade +- [ ] No spec/impl drift in client-and-adapters.md (or drift documented + spec amended) +- [ ] `cargo fmt --check -p alknet-call` passes +- [ ] `cargo clippy -p alknet-call --all-targets` passes with no warnings +- [ ] All tests pass + +## References + +- docs/architecture/crates/call/client-and-adapters.md — the spec being reviewed against +- docs/architecture/crates/call/README.md — crate index (now lists client-and-adapters.md) +- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017 (amended) +- docs/architecture/decisions/028-callclient-peer-scoped-registry-filtering.md — ADR-028 +- docs/architecture/open-questions.md — 
OQ-25..28 (two-way-door remainders — verify defaults match spec) +- docs/research/alknet-call-completion/gap-analysis.md — DC-1..4, the decisions this batch resolved +- tasks/call/registry/remote-safe-marking.md +- tasks/call/client/call-client.md +- tasks/call/client/from-call.md +- tasks/call/client/operation-adapter-trait.md +- tasks/call/client/from-jsonschema.md + +## Notes + +> This review closes the call-completion batch. The load-bearing security +> invariant is ADR-028's default-deny: a remote peer's call must not trigger +> dispatch that populates OperationContext.capabilities from the local +> registration bundle unless the op is explicitly remote-safe. Verify this +> with a test that asserts a non-remote-safe op's call does NOT populate +> capabilities (not just that it returns NOT_FOUND — the security argument is +> about capability exposure, not just call denial). The no-env-vars invariant +> (ADR-014) is the dispatch-side corollary: no handler reads credentials from +> any source other than OperationContext.capabilities. The shared dispatch +> loop (ADR-017 §1) is the architectural commitment that keeps CallClient from +> becoming a parallel protocol implementation — verify the loop is genuinely +> shared (refactored out of CallAdapter), not copy-pasted. If deviations are +> found, document and fix before considering the call-completion batch done. +> This unblocks every downstream consumer, so spec/impl drift here propagates. \ No newline at end of file
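
The default-deny rule this review verifies can be sketched as a dispatch-time read over a single registry. This is an illustration only: `Registration`, `DispatchOutcome`, `peer_can_reach`, `dispatch_for_peer`, and `services_list_for_peer` are hypothetical names, capabilities are modeled as plain strings, and the real dispatch path (`build_root_context`) is not shown.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Internal,
    External,
}

/// Cut-down stand-in for a registration bundle (assumed fields).
pub struct Registration {
    pub name: &'static str,
    pub visibility: Visibility,
    pub remote_safe: bool,
    pub capabilities: Vec<&'static str>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DispatchOutcome {
    /// The peer sees the same answer as for an op that does not exist.
    NotFound,
    /// Capabilities are copied into the context only on this path.
    Dispatched { capabilities: Vec<&'static str> },
}

/// Single predicate shared by dispatch and discovery so the two always agree.
pub fn peer_can_reach(reg: &Registration, trusted_peer: bool) -> bool {
    reg.visibility == Visibility::External && (trusted_peer || reg.remote_safe)
}

/// Dispatch-time read over the one registry; no filtered copy is built.
pub fn dispatch_for_peer(
    registry: &[Registration],
    name: &str,
    trusted_peer: bool,
) -> DispatchOutcome {
    match registry.iter().find(|r| r.name == name) {
        Some(reg) if peer_can_reach(reg, trusted_peer) => DispatchOutcome::Dispatched {
            capabilities: reg.capabilities.clone(),
        },
        // Denied and unknown ops are indistinguishable, and no capabilities are read.
        _ => DispatchOutcome::NotFound,
    }
}

/// Discovery view served to a remote peer: hides what the peer cannot call.
pub fn services_list_for_peer(registry: &[Registration], trusted_peer: bool) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|r| peer_can_reach(r, trusted_peer))
        .map(|r| r.name)
        .collect()
}
```

Because the denied branch never touches `reg.capabilities`, a test can assert the stronger property the notes ask for: a non-remote-safe call yields no populated capabilities, not merely a NOT_FOUND response.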