tasks: decompose vault, core, call crates into 28 atomic implementation tasks

Break down the three initial crates (alknet-vault, alknet-core, alknet-call)
into dependency-ordered task files for implementation agents.

Structure:
- tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review,
  spec sync. Vault is independent and can run fully in parallel with core/call.
- tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint,
  review. Core is foundational; call depends on it.
- tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs
  reflecting the two subsystems. CallAdapter is the merge point.

Key decisions:
- Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete
  ADR-021 rotation feature that doesn't compile in pieces
- Reviews injected at end of each crate phase (vault, core, call)
- Vault spec-sync task removes the drift table and bumps doc status to stable
- ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now)
- OperationEnv kept as a trait (load-bearing for ADR-024 layering)

Validated: 28 tasks, no cycles, 11 generations of parallel work.
Critical path runs through call (11 tasks). Vault completes by generation 4.
6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context,
operation-env, call-adapter, abort-cascade.
2026-06-23 12:41:47 +00:00
parent 2e34590522
commit 098fd8b9b9
28 changed files with 4271 additions and 0 deletions

tasks/call/crate-init.md

@@ -0,0 +1,103 @@
---
id: call/crate-init
name: Initialize alknet-call crate with Cargo.toml, dependencies, and module skeleton
status: pending
depends_on: [core/core-types]
scope: moderate
risk: low
impact: project
level: implementation
---
## Description
Initialize the `alknet-call` crate from scratch. This crate implements the call
protocol (structured RPC over QUIC) on ALPN `alknet/call`. It depends on
alknet-core (for ProtocolHandler, Connection, AuthContext, Capabilities,
IdentityProvider) and irpc (for framing).
### Crate setup
Create `crates/alknet-call/` with:
- `Cargo.toml`: package metadata, dependencies
- `src/lib.rs`: crate root with module declarations and re-exports
- Module skeleton files for:
  - `src/registry/mod.rs`: registry module root
  - `src/registry/spec.rs`: OperationSpec, OperationType, Visibility, ErrorDefinition, AccessControl
  - `src/registry/context.rs`: OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
  - `src/registry/registration.rs`: Handler, HandlerRegistration, OperationProvenance, OperationRegistry, OperationRegistryBuilder
  - `src/registry/env.rs`: OperationEnv trait, LocalOperationEnv, CompositeOperationEnv
  - `src/registry/discovery.rs`: services/list, services/schema handlers
  - `src/protocol/mod.rs`: protocol module root
  - `src/protocol/wire.rs`: EventEnvelope, ResponseEnvelope, CallError, framing
  - `src/protocol/pending.rs`: PendingRequestMap, PendingEntry
  - `src/protocol/connection.rs`: CallConnection
  - `src/protocol/adapter.rs`: CallAdapter (ProtocolHandler impl)
  - `src/protocol/abort.rs`: abort cascade logic
### Dependencies
| Crate | Purpose |
|-------|---------|
| `alknet-core` | ProtocolHandler, Connection, AuthContext, Capabilities, IdentityProvider, Identity, HandlerError (workspace path) |
| `irpc` | Framing, service dispatch (workspace dep) |
| `tokio` 1 (full) | Async runtime, sync primitives (oneshot, mpsc, watch) |
| `serde` 1 | Serialization for wire types |
| `serde_json` 1 | JSON wire format, JSON Schema values |
| `async-trait` 0.1 | OperationEnv trait (async fn in trait) |
| `tracing` 0.1 | Structured logging |
| `thiserror` 2 | Error enums |
| `uuid` 1 | Request ID generation (UUID v4) |
| `futures` | Stream trait for subscribe |
### Workspace Cargo.toml
Add `crates/alknet-call` to the workspace `members` list in the root
`Cargo.toml`.
### Module skeleton
```rust
// src/lib.rs
//! alknet-call: Structured RPC over QUIC — operations, streaming, service discovery.
//! Implements ProtocolHandler on ALPN `alknet/call`.
pub mod registry;
pub mod protocol;
// Re-exports (filled in by subsequent tasks)
```
Each module file gets a doc comment and `// TODO: implement` marker.
## Acceptance Criteria
- [ ] `crates/alknet-call/Cargo.toml` exists with all dependencies
- [ ] `crates/alknet-call/src/lib.rs` exists with module declarations
- [ ] All module skeleton files exist (registry/*, protocol/*)
- [ ] Root `Cargo.toml` `members` list includes `crates/alknet-call`
- [ ] `cargo check -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
- [ ] Dual licensing: `MIT OR Apache-2.0` (workspace-inherited)
- [ ] alknet-core dependency uses workspace path (`path = "../alknet-core"`)
## References
- docs/architecture/crates/call/README.md — crate index
- docs/architecture/crates/call/call-protocol.md — CallAdapter, wire format
- docs/architecture/crates/call/operation-registry.md — registry, OperationEnv
- docs/architecture/decisions/003-crate-decomposition.md — ADR-003
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
## Notes
> alknet-call depends on alknet-core (for ProtocolHandler, Connection,
> AuthContext, Capabilities, IdentityProvider) and irpc (for framing). The
> crate has two subsystems: registry (operation specs, context, dispatch) and
> protocol (wire format, streams, adapter). The module structure reflects
> this split.
## Summary
> To be filled on completion


@@ -0,0 +1,193 @@
---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: pending
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---
## Description
Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler
composes other operations via `OperationEnv::invoke()`, it creates a call tree:
a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own
children. When `call.aborted` arrives for a parent, the protocol cascades the
abort to all non-terminal descendants.
**Read ADR-016 before starting this task.**
### Call tree
The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The
root request has `parent_request_id: None`. Each composed call has
`parent_request_id: Some(parent.request_id)`.
```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│   ├── r1-a-1 (composed by r1-a's handler)
│   └── r1-a-2
└── r1-b
    └── r1-b-1
```
### Abort cascade
When `call.aborted` arrives for a parent request:
1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (Drop releases resources)
The CallAdapter walks the tree indexed by `parent_request_id` in
`PendingRequestMap` and sends `call.aborted` for each descendant.
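The tree walk described above can be sketched as follows. This is a minimal illustration, not the planned implementation: `CallTree` is a hypothetical stand-in that keeps only the `request_id -> parent_request_id` relation, whereas the real data lives in `PendingRequestMap`. It also assumes the tree is acyclic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the pending map:
/// request ID -> parent request ID (None for a wire-level root).
#[derive(Default)]
pub struct CallTree {
    parents: HashMap<String, Option<String>>,
}

impl CallTree {
    /// Record a pending request and its parent.
    pub fn insert(&mut self, request_id: &str, parent: Option<&str>) {
        self.parents
            .insert(request_id.to_string(), parent.map(str::to_string));
    }

    /// Collect every descendant of `root`, nearest generation first.
    /// Assumes the parent relation has no cycles.
    pub fn descendants(&self, root: &str) -> Vec<String> {
        let mut found = Vec::new();
        let mut frontier = vec![root.to_string()];
        while !frontier.is_empty() {
            let mut next = Vec::new();
            for (id, parent) in &self.parents {
                if let Some(p) = parent {
                    if frontier.iter().any(|f| f == p) {
                        next.push(id.clone());
                    }
                }
            }
            // HashMap iteration order is unspecified; sort for stable output.
            next.sort();
            found.extend(next.iter().cloned());
            frontier = next;
        }
        found
    }
}
```

Each pass scans the whole map, which is fine for small trees; a child index keyed by parent ID would avoid the repeated scans if call trees grow large.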
### AbortPolicy
The abort policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides the child's policy,
not the wire caller.
**`AbortDependents` (default)**: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results — continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring).
**`ContinueRunning` (opt-in)**: descendants that have already started continue
to completion; descendants that haven't started yet are aborted; no new
descendants start. Use for long-running work that should survive a parent's
abort (e.g., a subscription that should keep streaming).
### Wire visibility
Composed child `request_id`s are **internal** — they appear in
`PendingRequestMap` for abort-cascade indexing but are not sent as
`call.requested` to any peer. The client only sees `call.aborted` for the root
ID it sent; the server cascades internally to descendants.
The exception is `from_call` ops, which generate their own wire ID when
forwarding to the remote node (the remote node's `PendingRequestMap` indexes
it).
### Implementation
The abort cascade needs access to the `PendingRequestMap` to walk the tree.
The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The
cascade logic:
```rust
pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking.
    // The map indexes entries by request_id, and each entry knows its parent_request_id
    // (from OperationContext, stored when the entry was registered).
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String>;

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String>;
}
```
### Storing parent_request_id in PendingRequestMap
The `PendingRequestMap` needs to know the `parent_request_id` for each entry to
walk the tree. This means `PendingEntry` needs to store the parent ID (or the
full `OperationContext`):
```rust
enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```
Update the `PendingRequestMap` (from the pending-request-map task) to store
`parent_request_id` when registering entries. The `register_call` and
`register_subscribe` methods take an optional `parent_request_id` parameter.
### AbortPolicy propagation
The abort policy is propagated through `OperationEnv::invoke()`:
- `invoke()` uses the default impl, which delegates to `invoke_with_policy()`
with `parent.abort_policy.clone()`
- `invoke_with_policy()` takes an explicit policy — use
`AbortPolicy::ContinueRunning` for long-running work
When cascading:
- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to
completion; no new descendants start
Determining "started" vs "unstarted" is tricky. A practical approach:
- A descendant is "started" if its handler has begun executing (the future has
been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched
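This may require tracking dispatch state in `PendingEntry`. A simpler
approximation: under `ContinueRunning`, abort every descendant that has not yet
sent a `call.responded` (it is still pending). This over-aborts, because it
also cancels started descendants that the policy would otherwise let finish,
but it never leaves work running that should have been stopped.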
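If dispatch state is tracked, the per-policy selection could look like the sketch below. `DispatchState` and `select_for_abort` are hypothetical names introduced only for illustration; the two `AbortPolicy` variants mirror the ones named in this task.

```rust
/// Mirrors the two policies named in this task.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Hypothetical dispatch state tracked per pending entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DispatchState {
    /// Registered but the handler has not begun executing.
    Queued,
    /// The handler future has been polled at least once.
    Started,
}

/// Pick which descendants to abort for the given policy.
pub fn select_for_abort(
    policy: AbortPolicy,
    descendants: &[(String, DispatchState)],
) -> Vec<String> {
    descendants
        .iter()
        .filter(|(_, state)| match policy {
            // Abort everything downstream, started or not.
            AbortPolicy::AbortDependents => true,
            // Let started work finish; abort only what is still queued.
            AbortPolicy::ContinueRunning => *state == DispatchState::Queued,
        })
        .map(|(id, _)| id.clone())
        .collect()
}
```

Keeping the selection as a pure function over `(id, state)` pairs makes both policies easy to cover in the unit tests listed under the acceptance criteria.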
### Handler cleanup
Handlers clean up resources when their call is cancelled. In Rust, the future
is dropped and `Drop` guards release resources (HTTP streams, file handles,
locks). This is a handler-level concern; the protocol's job is to cascade the
abort. See ADR-016.
## Acceptance Criteria
- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for unknown request_id is silently discarded
- [ ] Composed child request_ids are internal (not sent as call.requested to peer)
- [ ] Client only sees call.aborted for the root ID it sent
- [ ] AbortPolicy propagated through OperationEnv::invoke()
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md — Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md — AbortPolicy, OperationContext.abort_policy
## Notes
> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream — this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal — the client only
> sees call.aborted for the root ID. The PendingRequestMap needs to store
> parent_request_id for tree walking — update the pending-request-map task's
> output if needed.
## Summary
> To be filled on completion


@@ -0,0 +1,260 @@
---
id: call/protocol/call-adapter
name: Implement CallAdapter (ProtocolHandler for alknet/call) with stream handling, identity resolution, and root context construction
status: pending
depends_on: [call/protocol/call-connection, call/registry/operation-env, call/registry/service-discovery, core/endpoint]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement `CallAdapter` in `src/protocol/adapter.rs`. This is the
`ProtocolHandler` implementation for ALPN `alknet/call` — the merge point of the
registry and protocol strands. It ties everything together: stream handling,
identity resolution, root context construction, env composition, dispatch.
### CallAdapter struct
```rust
pub struct CallAdapter {
    registry: Arc<OperationRegistry>, // Layer 0: curated, immutable
    identity_provider: Arc<dyn IdentityProvider>,
    session_source: Option<Arc<dyn SessionOverlaySource + Send + Sync>>, // Layer 1
    default_timeout: Duration, // 30s default
}

impl CallAdapter {
    pub fn new(registry: Arc<OperationRegistry>, identity_provider: Arc<dyn IdentityProvider>) -> Self {
        Self { registry, identity_provider, session_source: None,
               default_timeout: Duration::from_secs(30) }
    }

    pub fn with_session_source(mut self, source: Arc<dyn SessionOverlaySource + Send + Sync>) -> Self {
        self.session_source = Some(source);
        self
    }

    pub fn with_timeout(mut self, timeout: Duration) -> Self {
        self.default_timeout = timeout;
        self
    }
}
```
### SessionOverlaySource trait
```rust
pub trait SessionOverlaySource: Send + Sync {
    fn overlay_for(&self, context: &OperationContext) -> Option<Arc<dyn OperationEnv + Send + Sync>>;
}
```
Defined in alknet-call because CallAdapter must name the type — alknet-call
cannot depend on alknet-agent (agent depends on call, not reverse). The agent
crate implements this trait; alknet-call defines it. Same pattern as
IdentityProvider (ADR-004).
### ProtocolHandler impl
```rust
#[async_trait]
impl ProtocolHandler for CallAdapter {
    fn alpn(&self) -> &'static [u8] { b"alknet/call" }

    async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(), HandlerError> {
        // 1. Create CallConnection from the Connection
        // 2. Spawn a task that continuously calls connection.accept_bi()
        // 3. For each accepted stream, read EventEnvelope frames (FrameFramedReader)
        // 4. Dispatch call.requested events to the operation registry
        // 5. Write response EventEnvelope frames (FrameFramedWriter)
        // 6. Manage PendingRequestMap for outgoing calls
        // 7. On connection close: fail all pending, return Ok or Err(ConnectionClosed)
    }
}
```
### Stream handling
The adapter:
1. Spawns a task that continuously calls `connection.accept_bi()` to receive
incoming streams
2. For each accepted stream, reads `EventEnvelope` frames using
`FrameFramedReader`
3. Dispatches `call.requested` events to the operation registry
4. Writes response `EventEnvelope` frames using `FrameFramedWriter`
5. Manages `PendingRequestMap` for outgoing calls initiated by the server
For outgoing calls (server → client), the adapter:
1. Opens a bidirectional stream with `connection.open_bi()`
2. Sends `call.requested` on that stream
3. Adds the request ID to the `PendingRequestMap`
4. Reads responses from any stream, correlates by ID
### Identity resolution (per-request)
The CallAdapter resolves identity per-request, not per-connection:
1. The endpoint provides `AuthContext` with whatever identity it resolved at
the TLS layer (may be `None`)
2. When a `call.requested` event arrives, the CallAdapter constructs an
`OperationContext` with the connection-level `AuthContext.identity`
3. If the `call.requested` payload includes an `auth_token` field, the
CallAdapter resolves it using `IdentityProvider::resolve_from_token()`. If
resolution succeeds, the resulting `Identity` replaces the connection-level
identity in the `OperationContext`. If resolution fails, the request
proceeds with the connection-level identity (which may be `None`)
4. The `OperationContext.identity` is passed to the `OperationRegistry` for
ACL checking
5. If `identity` is `None` and the operation's `AccessControl` has
restrictions, the registry returns `FORBIDDEN` with message
`"authentication required"`
**Key point**: Identity is resolved per-request. This allows a single
connection to upgrade authentication mid-session and allows different operations
on the same connection to have different identity levels.
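The precedence rule in steps 2 and 3 can be sketched as a small pure function. This is an illustration under stated assumptions: `Identity`, `TokenResolver`, and `StaticTokenResolver` are hypothetical, simplified stand-ins (the real `IdentityProvider` lives in alknet-core, and its `resolve_from_token()` may be async and return a richer error type).

```rust
/// Hypothetical, simplified identity type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Identity(pub String);

/// Hypothetical, simplified stand-in for the token-resolving part of IdentityProvider.
pub trait TokenResolver {
    fn resolve_from_token(&self, token: &str) -> Option<Identity>;
}

/// Illustration-only resolver that accepts exactly one token.
pub struct StaticTokenResolver;

impl TokenResolver for StaticTokenResolver {
    fn resolve_from_token(&self, token: &str) -> Option<Identity> {
        (token == "valid-token").then(|| Identity("token-user".to_string()))
    }
}

/// Per-request identity: a successfully resolved auth_token wins;
/// otherwise fall back to the connection-level identity (which may be None).
pub fn effective_identity(
    resolver: &dyn TokenResolver,
    connection_identity: Option<Identity>,
    auth_token: Option<&str>,
) -> Option<Identity> {
    auth_token
        .and_then(|token| resolver.resolve_from_token(token))
        .or(connection_identity)
}
```

A failed token lookup falls through to the connection-level identity rather than rejecting the request, which matches step 3 above; the ACL check in step 5 is what ultimately denies unauthenticated callers.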
### Root OperationContext construction
When a `call.requested` arrives from the wire, the CallAdapter constructs the
root `OperationContext` — the entry point of the call tree. This sets
`internal: false`, meaning ACL runs against the caller's `identity`, not a
handler's composition authority (ADR-015, ADR-022).
```rust
fn build_root_context(
    &self,
    request_id: String,
    operation_name: &str,
    identity: Option<Identity>,
    /* connection, session */
) -> OperationContext {
    let registration = self.registry.registration(operation_name);
    OperationContext {
        request_id,
        parent_request_id: None, // wire request: top of call tree
        identity: identity.clone(), // caller's identity (inbound)
        handler_identity: registration.composition_authority.clone(),
        capabilities: registration.capabilities.clone(),
        metadata: HashMap::new(),
        deadline: Some(Instant::now() + self.default_timeout),
        scoped_env: registration.scoped_env.clone()
            .unwrap_or_else(ScopedOperationEnv::empty),
        env: self.compose_root_env(/* connection, session */),
        abort_policy: AbortPolicy::default(), // abort-dependents
        internal: false, // external call: ACL against caller identity
    }
}
```
### compose_root_env
The per-call `env` composition (ADR-024) builds a `CompositeOperationEnv` from:
- Layer 0: `LocalOperationEnv` (curated registry)
- Layer 1: session overlay (if active, from `session_source.overlay_for()`)
- Layer 2: connection overlay (from `CallConnection.overlay_env()`)
```rust
fn compose_root_env(&self, connection: &CallConnection, context: &OperationContext) -> Arc<dyn OperationEnv + Send + Sync> {
    let base = Arc::new(LocalOperationEnv { registry: self.registry.clone() });
    let session = self.session_source.as_ref()
        .and_then(|s| s.overlay_for(context));
    let connection_overlay = connection.overlay_env();
    Arc::new(CompositeOperationEnv { session, connection: Some(connection_overlay), base })
}
```
### operationId normalization
The `call.requested` payload's `operationId` has a leading slash (`/fs/readFile`).
The CallAdapter strips it before registry lookup (`fs/readFile`). This is a
single rule applied consistently — the registry stores names without leading
slash, the wire format adds it.
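A minimal sketch of the normalization rule, using hypothetical helper names. Tolerating an ID that arrives without the leading slash is an assumption made here for robustness; the task text only specifies the slash-prefixed wire form.

```rust
/// Strip the wire format's leading slash before registry lookup.
/// An ID without a leading slash is passed through unchanged (assumption).
pub fn normalize_operation_id(wire_id: &str) -> &str {
    wire_id.strip_prefix('/').unwrap_or(wire_id)
}

/// Inverse direction: registry name -> wire operationId.
pub fn to_wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```

`strip_prefix` removes at most one slash, so interior path separators such as the one in `fs/readFile` are untouched.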
### ResponseEnvelope → EventEnvelope
The CallAdapter converts `ResponseEnvelope` (from local dispatch) to
`EventEnvelope` for the wire:
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
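The mapping in the table can be sketched as below. The types are simplified, hypothetical stand-ins: the real `EventEnvelope` and `CallError` come from the wire-types task and carry JSON values, while this sketch uses plain strings so that it stays free of external crates.

```rust
/// Hypothetical, simplified error type.
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Simplified payload; the real envelope carries JSON values.
#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    Output(String),
    Error(CallError),
    Empty,
}

/// Simplified wire event.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub event_type: &'static str,
    pub id: String,
    pub payload: Payload,
}

/// Ok -> call.responded, Err -> call.error, both tagged with the request ID.
pub fn to_event(request_id: &str, response: Result<String, CallError>) -> EventEnvelope {
    match response {
        Ok(output) => EventEnvelope {
            event_type: "call.responded",
            id: request_id.to_string(),
            payload: Payload::Output(output),
        },
        Err(error) => EventEnvelope {
            event_type: "call.error",
            id: request_id.to_string(),
            payload: Payload::Error(error),
        },
    }
}

/// Terminal event for a subscription: empty payload, same ID.
pub fn completed_event(request_id: &str) -> EventEnvelope {
    EventEnvelope {
        event_type: "call.completed",
        id: request_id.to_string(),
        payload: Payload::Empty,
    }
}
```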
### Timeout handling
- Default timeout for wire calls is 30 seconds (`default_timeout`)
- `build_root_context` sets `OperationContext.deadline` to `now + default_timeout`
- Composed calls inherit the parent's deadline (children do NOT get a fresh 30s)
- A composed call that exceeds the deadline is cancelled and returns
`CallError { code: "TIMEOUT", retryable: true }`
- Subscriptions default to no deadline (`deadline: None` — unbounded); the
client can specify a timeout in the `call.requested` payload
- The `PendingRequestMap` sweeper runs every 10 seconds and removes expired
wire entries
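The deadline rules above can be sketched with `std::time` alone. The function names are hypothetical; the point is that a composed call copies the parent's deadline instead of computing a fresh one, and that `None` means unbounded.

```rust
use std::time::{Duration, Instant};

/// Root (wire) calls get `now + default_timeout`.
pub fn root_deadline(now: Instant, default_timeout: Duration) -> Option<Instant> {
    Some(now + default_timeout)
}

/// Composed calls inherit the parent's deadline unchanged (no fresh timeout).
pub fn child_deadline(parent: Option<Instant>) -> Option<Instant> {
    parent
}

/// A `None` deadline (e.g. an unbounded subscription) never expires.
pub fn is_expired(deadline: Option<Instant>, now: Instant) -> bool {
    matches!(deadline, Some(d) if now >= d)
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside each function, keeps the logic deterministic under test.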
### Error handling in handle()
- If a handler panics, the stream is closed and the PendingRequestMap entry is
cleaned up by the next sweeper pass. Other streams and the connection are
unaffected.
- Connection drop: all pending requests are failed with `call.error` code
`INTERNAL` and message `"connection closed"`, and all subscription channels
are closed. `handle()` returns `Ok(())` (clean) or `Err(ConnectionClosed)`.
- Stream reset: `FrameFramedReader` returns an error. For a subscription,
remove the PendingRequestMap entry and close the mpsc channel. For a call,
resolve the oneshot with an error. No `call.aborted` is sent because the
stream is already gone.
## Acceptance Criteria
- [ ] `CallAdapter` struct with registry, identity_provider, session_source, default_timeout
- [ ] `CallAdapter::new()`, `with_session_source()`, `with_timeout()` constructors
- [ ] `SessionOverlaySource` trait defined with `overlay_for()` method
- [ ] `ProtocolHandler::alpn()` returns `b"alknet/call"`
- [ ] `handle()` accepts streams, reads EventEnvelope frames, dispatches
- [ ] `handle()` spawns task for continuous `accept_bi()`
- [ ] Outgoing calls: open_bi, send call.requested, add to PendingRequestMap
- [ ] Identity resolution: AuthContext.identity used, auth_token overrides per-request
- [ ] auth_token resolution failure → proceed with connection-level identity
- [ ] `build_root_context` sets internal: false, deadline, capabilities from registration
- [ ] `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- [ ] operationId leading slash stripped before registry lookup
- [ ] ResponseEnvelope → EventEnvelope conversion (Ok → responded, Err → error)
- [ ] Subscriptions: multiple call.responded with same id, then call.completed
- [ ] Timeout: 30s default, composed calls inherit parent deadline
- [ ] Handler panic: stream closed, PendingRequestMap cleaned up, others unaffected
- [ ] Connection drop: fail all pending with INTERNAL, return Ok or Err
- [ ] Unit test: CallAdapter alpn returns b"alknet/call"
- [ ] Integration test: call.requested → dispatch → call.responded round-trip
- [ ] Integration test: auth_token overrides connection-level identity
- [ ] Integration test: Internal op called from wire → NOT_FOUND
- [ ] Integration test: ACL denied → FORBIDDEN
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallAdapter, stream handling, root context
- docs/architecture/crates/call/operation-registry.md — OperationContext construction
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal: false for wire)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env composition)
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
## Notes
> This is the merge point of the registry and protocol strands — the highest-
> risk task in the call crate. It ties together stream handling, identity
> resolution, root context construction, env composition, and dispatch. The
> per-request identity resolution (auth_token overrides connection-level) is
> important — a single connection can upgrade auth mid-session. The
> compose_root_env builds the CompositeOperationEnv per call from the active
> layers. operationId on the wire has a leading slash; strip it before lookup.
## Summary
> To be filled on completion


@@ -0,0 +1,158 @@
---
id: call/protocol/call-connection
name: Implement CallConnection with imported-ops overlay (Layer 2) and call/subscribe/abort methods
status: pending
depends_on: [call/protocol/pending-request-map, call/registry/operation-env]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `CallConnection` in `src/protocol/connection.rs`. This represents an
established `alknet/call` connection, regardless of which side opened it
(ADR-017). It holds the connection's imported-ops overlay (Layer 2, ADR-024).
### CallConnection
```rust
pub struct CallConnection {
    connection: Connection,
    imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>,
}
```
An established alknet/call connection (either direction — accepted or opened).
Holds the Layer 2 overlay (imported ops from `from_call` discovery).
### Layer 2 registration API
```rust
impl CallConnection {
    /// Register an imported operation into this connection's overlay (Layer 2, ADR-024).
    /// Called by from_call after discovery.
    pub fn register_imported(&self, registration: HandlerRegistration) {
        let name = registration.spec.name.clone();
        // Note: `write()` returning the guard directly assumes a parking_lot-style
        // RwLock. With std::sync::RwLock, handle the poisoned-lock Result here.
        self.imported_operations.write().insert(name, registration);
    }

    /// Register multiple imported operations (bulk variant for from_call).
    pub fn register_imported_all(&self, registrations: Vec<HandlerRegistration>) {
        let mut overlay = self.imported_operations.write();
        for reg in registrations {
            overlay.insert(reg.spec.name.clone(), reg);
        }
    }
}
```
Layer 0 (curated) is built via `OperationRegistryBuilder` at startup. Layer 2
(per-connection) registration uses `CallConnection::register_imported()` at
runtime. When the connection drops, the overlay (and all imported ops) is
dropped — no explicit deregistration needed.
### Overlay env
```rust
impl CallConnection {
    /// Build an OperationEnv impl for this connection's overlay.
    /// Used by the CallAdapter when composing the root OperationContext.env.
    /// Returns an OperationEnv that dispatches to this connection's imported ops
    /// (and reports contains only for ops in the overlay).
    pub fn overlay_env(&self) -> Arc<dyn OperationEnv + Send + Sync>;
}
```
This is an `OperationEnv` impl that dispatches to the connection's imported ops.
The `contains()` method returns true only for ops in the overlay. The
`invoke_with_policy()` method looks up the op in the overlay and dispatches to
its handler.
This env is composed into the `CompositeOperationEnv` by the CallAdapter as the
`connection` layer (Layer 2).
### Call methods (outgoing)
```rust
impl CallConnection {
    /// Call an operation on the remote peer (sends call.requested).
    pub async fn call(&self, operation_id: &str, input: Value) -> ResponseEnvelope;

    /// Subscribe to a streaming operation on the remote peer.
    pub async fn subscribe(&self, operation_id: &str, input: Value) -> impl Stream<Item = ResponseEnvelope>;

    /// Abort an in-flight request (sends call.aborted, cascades per ADR-016).
    pub async fn abort(&self, request_id: &str);
}
```
These methods:
1. Open a bidirectional stream with `connection.open_bi()`
2. Send `call.requested` on that stream (via FrameFramedWriter)
3. Add the request ID to the PendingRequestMap
4. Read responses from any stream, correlate by ID (via PendingRequestMap)
`call()` resolves on the first `call.responded`. `subscribe()` yields each
`call.responded` until `call.completed` or `call.aborted`.
`abort()` sends `call.aborted` for the given request ID. The abort cascade
(ADR-016) is handled by the abort-cascade task.
### Connection direction independence
Per ADR-017, connection direction is independent of call direction. Both
sides can call each other once connected. The `CallConnection` type is the same
whether the connection was accepted (server side) or opened (client side via
`CallClient`). The `call`/`subscribe`/`abort` methods work the same way.
### from_call integration
The `from_call` adapter (ADR-017) discovers operations on a remote call
protocol endpoint via `services/list` and `services/schema`, then registers
them with `register_imported()` / `register_imported_all()`. This makes
cross-node composition transparent — a handler calling
`env.invoke("worker", "exec", ...)` doesn't know whether the operation is
local or remote.
The `from_call` adapter itself is not implemented in this task — it's a future
task. This task implements the `CallConnection` infrastructure that `from_call`
will use.
## Acceptance Criteria
- [ ] `CallConnection` struct with connection and imported_operations fields
- [ ] `register_imported()` adds to the Layer 2 overlay
- [ ] `register_imported_all()` bulk adds to the overlay
- [ ] `overlay_env()` returns an OperationEnv dispatching to imported ops
- [ ] `overlay_env().contains()` returns true only for ops in the overlay
- [ ] `call()` sends call.requested, resolves on first call.responded
- [ ] `subscribe()` sends call.requested, yields call.responded until completed/aborted
- [ ] `abort()` sends call.aborted for the request ID
- [ ] Outgoing calls open a stream, send request, add to PendingRequestMap
- [ ] Connection drop drops the overlay (no explicit deregistration)
- [ ] Unit test: register_imported adds to overlay, contains returns true
- [ ] Unit test: overlay_env dispatches to imported op
- [ ] Unit test: overlay_env contains returns false for non-imported op
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — CallConnection section
- docs/architecture/decisions/017-call-protocol-client-and-adapter-contract.md — ADR-017
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (Layer 2)
## Notes
> Connection direction is independent of call direction (ADR-017) — both sides
> can call each other. The Layer 2 overlay is per-connection: when the
> connection drops, the overlay drops (no deregistration needed). The
> overlay_env() is composed into CompositeOperationEnv by the CallAdapter as
> the connection layer. The from_call adapter itself is a future task — this
> implements the infrastructure it will use.
## Summary
> To be filled on completion


@@ -0,0 +1,164 @@
---
id: call/protocol/pending-request-map
name: Implement PendingRequestMap for correlating call.requested and call.responded events
status: pending
depends_on: [call/protocol/wire-types]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement `PendingRequestMap` in `src/protocol/pending.rs`. This manages
in-flight calls and subscriptions, correlating `call.responded` events back to
the original `call.requested` by request ID.
### PendingRequestMap
```rust
pub struct PendingRequestMap {
    pending: HashMap<String, PendingEntry>,
}

enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
    },
}
```
### Behavior
When a `call.responded` event arrives:
- If `PendingEntry::Call` → resolve the oneshot, delete entry
- If `PendingEntry::Subscribe` → push to the mpsc channel, keep entry alive
When `call.completed` arrives on a subscription → close the mpsc channel, delete entry.
When `call.aborted` arrives → cancel/drop whichever side initiated it. A
`call.aborted` for an unknown `requestId` is silently discarded.
When `call.error` arrives → resolve the oneshot (Call) or push to channel
(Subscribe) with the error, delete entry.
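The event handling above can be sketched without an async runtime. This is an illustration only: it swaps tokio's `oneshot` and `mpsc` for `std::sync::mpsc` so the example is self-contained, uses `String` in place of `Value` and `CallError`, and the names `PendingSketch` and `Entry` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified result type: String stands in for both Value and CallError.
pub type CallResult = Result<String, String>;

enum Entry {
    Call(Sender<CallResult>),
    Subscribe(Sender<CallResult>),
}

#[derive(Default)]
pub struct PendingSketch {
    pending: HashMap<String, Entry>,
}

impl PendingSketch {
    pub fn register_call(&mut self, id: &str) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), Entry::Call(tx));
        rx
    }

    pub fn register_subscribe(&mut self, id: &str) -> Receiver<CallResult> {
        let (tx, rx) = channel();
        self.pending.insert(id.to_string(), Entry::Subscribe(tx));
        rx
    }

    /// call.responded: a Call resolves and is deleted; a Subscribe stays alive.
    pub fn handle_responded(&mut self, id: &str, output: String) -> bool {
        match self.pending.remove(id) {
            Some(Entry::Call(tx)) => {
                let _ = tx.send(Ok(output));
                true
            }
            Some(Entry::Subscribe(tx)) => {
                let _ = tx.send(Ok(output));
                self.pending.insert(id.to_string(), Entry::Subscribe(tx));
                true
            }
            None => false,
        }
    }

    /// call.completed: dropping the sender closes the subscription channel.
    pub fn handle_completed(&mut self, id: &str) -> bool {
        if matches!(self.pending.get(id), Some(Entry::Subscribe(_))) {
            self.pending.remove(id);
            true
        } else {
            false
        }
    }

    /// call.error: deliver the error, then delete the entry.
    pub fn handle_error(&mut self, id: &str, error: String) -> bool {
        match self.pending.remove(id) {
            Some(Entry::Call(tx)) | Some(Entry::Subscribe(tx)) => {
                let _ = tx.send(Err(error));
                true
            }
            None => false,
        }
    }

    /// call.aborted: drop the entry; an unknown ID is a silent no-op.
    pub fn handle_aborted(&mut self, id: &str) -> bool {
        self.pending.remove(id).is_some()
    }

    pub fn contains(&self, id: &str) -> bool {
        self.pending.contains_key(id)
    }
}
```

Send failures are deliberately ignored: a dropped receiver means the caller stopped waiting, which the map treats the same as a delivered result.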
### Timeouts
Timeouts prevent dangling entries. A background task sweeps expired entries
periodically (every 10 seconds per call-protocol.md).
- `Call` entries have a timeout (default 30s from CallAdapter.default_timeout)
- `Subscribe` entries may have `timeout: None` (unbounded — long-running
subscriptions)
When the sweeper finds an expired entry:
- `Call`: resolve oneshot with `CallError { code: "TIMEOUT", retryable: true }`, delete
- `Subscribe`: close mpsc channel with a timeout error, delete
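
The sweep itself can be sketched as a pure function over the map. This is a hedged sketch: `Entry` is a hypothetical, reduced entry that keeps only the deadline, and passing `now` as a parameter is an assumption made for testability (the real sweeper would call `Instant::now()` on each tick and resolve the channels with a `TIMEOUT` error before removing entries).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Sweep interval named in this task (every 10 seconds).
const SWEEP_INTERVAL: Duration = Duration::from_secs(10);

// Hypothetical, reduced entry: only the deadline matters for the sweep.
struct Entry {
    deadline: Option<Instant>, // None = unbounded subscription
}

/// Remove every entry whose deadline has passed and return the evicted IDs.
fn evict_expired(pending: &mut HashMap<String, Entry>, now: Instant) -> Vec<String> {
    let mut expired: Vec<String> = pending
        .iter()
        .filter(|(_, entry)| entry.deadline.is_some_and(|d| d <= now))
        .map(|(id, _)| id.clone())
        .collect();
    expired.sort(); // deterministic order for callers and tests
    for id in &expired {
        // The real map would first resolve the oneshot or mpsc with TIMEOUT.
        pending.remove(id);
    }
    expired
}
```

Entries with `deadline: None` are never evicted by the sweep; they only leave the map through `call.completed`, `call.aborted`, `call.error`, or `fail_all()`.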
### Methods
```rust
impl PendingRequestMap {
pub fn new() -> Self;
/// Register a pending call. Returns a oneshot receiver for the result.
pub fn register_call(&mut self, request_id: String, timeout: Instant) -> oneshot::Receiver<Result<Value, CallError>>;
/// Register a pending subscription. Returns an mpsc receiver for the stream.
pub fn register_subscribe(&mut self, request_id: String, timeout: Option<Instant>) -> mpsc::Receiver<Result<Value, CallError>>;
/// Handle an incoming call.responded event.
/// Returns true if the entry was found and handled.
pub fn handle_responded(&mut self, request_id: &str, output: Value) -> bool;
/// Handle an incoming call.completed event (subscriptions only).
/// Closes the mpsc channel, deletes entry.
pub fn handle_completed(&mut self, request_id: &str) -> bool;
/// Handle an incoming call.aborted event.
/// Cancels the pending request, deletes entry.
pub fn handle_aborted(&mut self, request_id: &str) -> bool;
/// Handle an incoming call.error event.
/// Resolves with the error, deletes entry.
pub fn handle_error(&mut self, request_id: &str, error: CallError) -> bool;
/// Sweep expired entries. Called periodically by a background task.
pub fn evict_expired(&mut self) -> Vec<String>; // returns evicted request IDs
/// Fail all pending requests (connection closed). Returns the request IDs that were failed.
pub fn fail_all(&mut self, error: CallError) -> Vec<String>;
/// Check if a request ID is pending.
pub fn contains(&self, request_id: &str) -> bool;
/// Number of pending entries.
pub fn len(&self) -> usize;
}
```
### Connection drop handling
When the QUIC connection closes, all pending requests are failed with
`call.error` code `INTERNAL` and message `"connection closed"`. All
subscription channels are closed. This is `fail_all()`.
### Stream reset handling
When a QUIC stream is reset mid-operation, the `FrameFramedReader` returns an
error. If the stream was carrying a subscription, the PendingRequestMap entry
is removed and the mpsc channel is closed. If the stream was carrying a call,
the oneshot is resolved with an error. No `call.aborted` is sent — the stream
is gone.
### Correlation is by ID, not by stream
A response arriving on stream N can fulfill a request sent on stream M. The
`PendingRequestMap` is keyed by ID, not by stream. This is the stream-agnostic
correlation property from ADR-012.
## Acceptance Criteria
- [ ] `PendingRequestMap` struct with pending HashMap
- [ ] `PendingEntry::Call` with oneshot::Sender and timeout
- [ ] `PendingEntry::Subscribe` with mpsc::Sender and optional timeout
- [ ] `register_call` returns oneshot::Receiver
- [ ] `register_subscribe` returns mpsc::Receiver
- [ ] `handle_responded` resolves Call oneshot, pushes to Subscribe channel
- [ ] `handle_completed` closes Subscribe mpsc, deletes entry
- [ ] `handle_aborted` cancels pending, deletes entry
- [ ] `handle_error` resolves with error, deletes entry
- [ ] Unknown request_id in handle_* is silently discarded (returns false)
- [ ] `evict_expired` removes timed-out entries, resolves with TIMEOUT error
- [ ] `fail_all` fails all pending with given error (connection close)
- [ ] Correlation is by request ID, not by stream
- [ ] Unit test: register call, handle_responded → oneshot resolves
- [ ] Unit test: register subscribe, handle multiple responded, handle_completed → stream ends
- [ ] Unit test: expired call → evict_expired resolves with TIMEOUT
- [ ] Unit test: fail_all resolves all pending with INTERNAL error
- [ ] Unit test: unknown request_id handle_responded → false (silently discarded)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — PendingRequestMap section
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012 (ID-based correlation)
## Notes
> Correlation is by request ID, not by stream — a response on stream N can
> fulfill a request sent on stream M. This is the stream-agnostic property from
> ADR-012. The sweeper runs every 10 seconds to evict expired entries. Unknown
> request IDs in handle_* are silently discarded (not an error — the entry may
> have already been resolved/cleaned up).
## Summary
> To be filled on completion


@@ -0,0 +1,219 @@
---
id: call/protocol/wire-types
name: Implement EventEnvelope, ResponseEnvelope, CallError, and length-prefixed JSON framing
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the wire protocol types and framing in `src/protocol/wire.rs`. Every
message on the wire is a length-prefixed JSON `EventEnvelope`.
### EventEnvelope
```rust
pub struct EventEnvelope {
pub r#type: String, // Event type
pub id: String, // Correlation key (request ID, subscription ID)
pub payload: Value, // serde_json::Value — schema depends on event type
}
// Frame: 4-byte big-endian length prefix + UTF-8 JSON body
```
The envelope is JSON because it must be consumable from JavaScript, Python, and
any language. The `Value` type is `serde_json::Value`.
Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within
the `payload` field. The envelope itself does not interpret the payload — this
is a handler-level concern, not a protocol-level concern.
### Event Types
Five event types:
| Event | Direction | Purpose |
|-------|-----------|---------|
| `call.requested` | Caller → Handler | Initiate a call or subscription |
| `call.responded` | Handler → Caller | Deliver a result (one for calls, many for subscriptions) |
| `call.completed` | Handler → Caller | Signal end of subscription stream |
| `call.aborted` | Either side | Cancel the call/subscription |
| `call.error` | Handler → Caller | Signal an error |
### Wire Payload Schemas
| Event | `payload` shape |
|-------|----------------|
| `call.requested` | `{ "operationId": "/fs/readFile", "input": {...}, "auth_token": "alk_..." (optional) }` |
| `call.responded` | `{ "output": <Value> }` |
| `call.completed` | `{}` — empty object |
| `call.aborted` | `{}` — empty object |
| `call.error` | `{ "code": "...", "message": "...", "retryable": bool, "details": {...} (optional) }` |
### call.requested payload
```json
{
"operationId": "/fs/readFile",
"input": { ... },
"auth_token": "alk_..." // optional
}
```
- `operationId` — the operation to invoke, **with a leading slash** on the wire.
The registry stores names without the leading slash; the wire format adds it.
The CallAdapter strips the leading slash before registry lookup.
- `input` — the operation input, matching the operation's `input_schema`.
- `auth_token` — optional. If present, CallAdapter resolves via
`IdentityProvider::resolve_from_token()`. Resulting Identity takes precedence
over connection-level identity for this request.
The `call.requested` payload does **not** carry an abort policy field. The abort
policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides, not the wire caller.
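
The leading-slash convention for `operationId` can be sketched as a pair of helpers. The function names are hypothetical, and rejecting an `operationId` without the leading slash is an assumption; the spec only states that the slash is stripped before lookup.

```rust
/// Convert a wire `operationId` (leading slash) to a registry name (no slash).
/// Returns None when the slash is missing or nothing follows it (assumption).
fn registry_name(operation_id: &str) -> Option<&str> {
    operation_id
        .strip_prefix('/')
        .filter(|name| !name.is_empty())
}

/// Inverse direction: registry name to wire `operationId`.
fn wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```

For example, `registry_name("/fs/readFile")` yields `Some("fs/readFile")`, which is the key the registry lookup would use.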
### call.error payload
```json
{
"code": "FILE_NOT_FOUND",
"message": "file not found: /etc/nonexistent",
"retryable": false,
"details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
Protocol-level codes (emitted by dispatch machinery):
- `NOT_FOUND` — operation not in registry (or Internal op called from wire)
- `FORBIDDEN` — access denied
- `INVALID_INPUT` — input doesn't match JSON Schema
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)
Operation-level domain codes (emitted by handlers, ADR-023): e.g.,
`FILE_NOT_FOUND`, `RATE_LIMITED`. These carry a `details` payload conforming to
the declared `ErrorDefinition.schema`.
New error codes may be added in future. Clients should treat unknown codes as
`INTERNAL` with `retryable: false`.
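
The client-side rule for unknown codes can be sketched as a small normalization step. `ErrorView` and `normalize_error` are hypothetical names, and the `domain_codes` parameter (the operation-level codes a client knows about) is an assumption about how a client would track declared errors.

```rust
/// Client-side view of a `call.error` payload (message and details omitted).
#[derive(Debug, PartialEq)]
struct ErrorView {
    code: String,
    retryable: bool,
}

/// Protocol-level codes listed in this task.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Keep known codes as-is; map anything else to INTERNAL with retryable: false.
fn normalize_error(code: &str, retryable: bool, domain_codes: &[&str]) -> ErrorView {
    if PROTOCOL_CODES.contains(&code) || domain_codes.contains(&code) {
        ErrorView { code: code.to_string(), retryable }
    } else {
        ErrorView { code: "INTERNAL".to_string(), retryable: false }
    }
}
```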
### ResponseEnvelope
```rust
pub struct ResponseEnvelope {
pub request_id: String,
pub result: Result<Value, CallError>,
}
pub struct CallError {
pub code: String,
pub message: String,
pub retryable: bool,
pub details: Option<Value>,
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The
CallAdapter converts it to `EventEnvelope` for the wire.
### ResponseEnvelope → EventEnvelope conversion
| `ResponseEnvelope` | `EventEnvelope` |
|--------------------|-----------------|
| `Ok(value)` | `{ type: "call.responded", id: request_id, payload: { output: value } }` |
| `Err(call_error)` | `{ type: "call.error", id: request_id, payload: <serialized CallError> }` |
For subscriptions, each `call.responded` is a separate `EventEnvelope` with the
same `id`; `call.completed` is `{ type: "call.completed", id, payload: {} }`.
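
The conversion table above maps naturally onto a `From` impl. The sketch below is std-only and simplified: the `Payload` enum is a hypothetical stand-in for the JSON payload object, `String` stands in for `Value`, and `CallError` omits `details`.

```rust
#[derive(Debug, PartialEq)]
struct CallError {
    code: String,
    message: String,
    retryable: bool,
}

// Stand-in (assumption) for the serde_json payload object.
#[derive(Debug, PartialEq)]
enum Payload {
    Output(String),   // { "output": <value> }
    Error(CallError), // serialized CallError
}

#[derive(Debug, PartialEq)]
struct EventEnvelope {
    r#type: String,
    id: String,
    payload: Payload,
}

struct ResponseEnvelope {
    request_id: String,
    result: Result<String, CallError>,
}

impl From<ResponseEnvelope> for EventEnvelope {
    fn from(resp: ResponseEnvelope) -> Self {
        // Ok becomes call.responded, Err becomes call.error; the ID carries over.
        let (event_type, payload) = match resp.result {
            Ok(value) => ("call.responded", Payload::Output(value)),
            Err(err) => ("call.error", Payload::Error(err)),
        };
        EventEnvelope {
            r#type: event_type.to_string(),
            id: resp.request_id,
            payload,
        }
    }
}
```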
### Framing
Length-prefixed JSON: 4-byte big-endian length prefix + UTF-8 JSON body.
Implement:
- `FrameFramedReader` — reads length-prefixed frames from an async reader
(RecvStream)
- `FrameFramedWriter` — writes length-prefixed frames to an async writer
(SendStream)
```rust
pub struct FrameFramedReader<R: AsyncRead + Unpin> { /* ... */ }
impl<R: AsyncRead + Unpin> FrameFramedReader<R> {
pub fn new(reader: R) -> Self;
pub async fn read_frame(&mut self) -> Result<EventEnvelope, FrameError>;
}
pub struct FrameFramedWriter<W: AsyncWrite + Unpin> { /* ... */ }
impl<W: AsyncWrite + Unpin> FrameFramedWriter<W> {
pub fn new(writer: W) -> Self;
pub async fn write_frame(&mut self, envelope: &EventEnvelope) -> Result<(), FrameError>;
}
```
This is the same framing used by irpc. The Rust implementation in alknet-call is
canonical (ADR-005, ADR-013).
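
The frame layout can be illustrated with a synchronous, std-only sketch. Several things here are assumptions: it uses blocking `std::io::Read`/`Write` instead of the async reader and writer types, it works on raw body bytes rather than `EventEnvelope`, and the `max_len` cap is a hypothetical safeguard this task does not specify.

```rust
use std::io::{self, Read, Write};

/// Write one frame: 4-byte big-endian length, then the body bytes.
fn write_frame<W: Write>(writer: &mut W, body: &[u8]) -> io::Result<()> {
    let len = u32::try_from(body.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(body)
}

/// Read one frame.
/// Ok(None): the stream ended cleanly before a new frame started.
/// Err(UnexpectedEof): the stream ended inside a frame (truncated).
fn read_frame<R: Read>(reader: &mut R, max_len: u32) -> io::Result<Option<Vec<u8>>> {
    let mut prefix = [0u8; 4];
    let mut filled = 0;
    while filled < prefix.len() {
        let n = reader.read(&mut prefix[filled..])?;
        if n == 0 {
            if filled == 0 {
                return Ok(None);
            }
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        filled += n;
    }
    let len = u32::from_be_bytes(prefix);
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds max_len"));
    }
    let mut body = vec![0u8; len as usize];
    reader.read_exact(&mut body)?; // fails with UnexpectedEof on a short body
    Ok(Some(body))
}
```

Distinguishing a clean end of stream from a truncated frame is what lets the reader report a `ConnectionClosed`-style error only when data was actually cut off mid-frame.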
### ResponseEnvelope helper methods
```rust
impl ResponseEnvelope {
pub fn ok(request_id: String, output: Value) -> Self;
pub fn error(request_id: String, error: CallError) -> Self;
pub fn not_found(request_id: String, op_name: &str) -> Self;
pub fn forbidden(request_id: String, message: &str) -> Self;
}
```
### FrameError
```rust
pub enum FrameError {
Io(io::Error),
Json(serde_json::Error),
ConnectionClosed,
InvalidFrame,
}
```
## Acceptance Criteria
- [ ] `EventEnvelope` struct with type, id, payload fields
- [ ] `ResponseEnvelope` struct with request_id, result fields
- [ ] `CallError` struct with code, message, retryable, details fields
- [ ] `FrameError` enum with Io, Json, ConnectionClosed, InvalidFrame
- [ ] `FrameFramedReader` reads length-prefixed JSON frames
- [ ] `FrameFramedWriter` writes length-prefixed JSON frames
- [ ] 4-byte big-endian length prefix + UTF-8 JSON body
- [ ] `ResponseEnvelope::ok()`, `error()`, `not_found()`, `forbidden()` helpers
- [ ] `ResponseEnvelope` → `EventEnvelope` conversion (Ok → call.responded, Err → call.error)
- [ ] Unit test: write frame, read frame, round-trip EventEnvelope
- [ ] Unit test: ResponseEnvelope::ok produces correct EventEnvelope
- [ ] Unit test: ResponseEnvelope::error produces correct call.error EventEnvelope
- [ ] Unit test: framing handles large payloads
- [ ] Unit test: framing detects truncated frames (ConnectionClosed error)
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/call-protocol.md — EventEnvelope, wire format, event types
- docs/architecture/decisions/005-irpc-as-call-protocol-foundation.md — ADR-005
- docs/architecture/decisions/012-call-protocol-stream-model.md — ADR-012
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (CallError, details)
## Notes
> The envelope is always JSON for cross-language compatibility. Binary
> payloads are base64-encoded within the payload field (handler concern, not
> protocol concern). The 4-byte big-endian length prefix is the same framing
> irpc uses. operationId on the wire has a leading slash; the registry stores
> names without it — the CallAdapter strips it before lookup.
## Summary
> To be filled on completion


@@ -0,0 +1,202 @@
---
id: call/registry/handler-registration
name: Implement Handler, HandlerRegistration, OperationProvenance, OperationRegistry, and OperationRegistryBuilder
status: pending
depends_on: [call/registry/operation-context]
scope: broad
risk: medium
impact: component
level: implementation
---
## Description
Implement the handler registration types and the operation registry in
`src/registry/registration.rs`. The registry maps operation names to
registration bundles and provides the dispatch entry point.
### Handler
```rust
pub type Handler = Arc<
dyn Fn(Value, OperationContext) -> Pin<Box<dyn Future<Output = ResponseEnvelope> + Send>>
+ Send + Sync
>;
```
Handlers are async. They receive:
- `input: Value` — deserialized payload from `call.requested` (always `serde_json::Value`)
- `context: OperationContext` — request ID, identity, metadata, env
Handlers return `ResponseEnvelope`, which is defined in the protocol/wire task:
use a forward reference or define a minimal version here and leave the full
implementation to the wire task.
### HandlerRegistration
```rust
pub struct HandlerRegistration {
pub spec: OperationSpec,
pub handler: Handler,
pub provenance: OperationProvenance,
pub composition_authority: Option<CompositionAuthority>, // None for leaves
pub scoped_env: Option<ScopedOperationEnv>, // None for leaves
pub capabilities: Capabilities,
}
```
The registration bundle carries everything the dispatch path needs to
construct an `OperationContext`. See ADR-022.
### OperationProvenance
```rust
pub enum OperationProvenance {
Local, // Assembly-written, trusted, can compose
FromOpenAPI, // HTTP forwarding stub, leaf
FromMCP, // MCP forwarding stub, leaf
FromCall, // QUIC forwarding stub, leaf locally
FromJsonSchema, // JSON Schema definition, no handler — schema only
Session, // Agent-written, sandboxed, can compose within sandbox
}
```
| Provenance | Can compose? | Has composition authority? | Default visibility |
|-----------|-------------|---------------------------|-------------------|
| `Local` | Yes | Yes | External or Internal (assembly declares) |
| `FromOpenAPI` | No (leaf) | No | Internal |
| `FromMCP` | No (leaf) | No | Internal |
| `FromCall` | No (leaf in local registry) | No | Internal |
| `FromJsonSchema` | N/A (no handler) | No | N/A |
| `Session` | Yes (within sandbox) | Yes | Internal always |
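
The provenance table can be encoded as predicates on the enum. This is a sketch only: the method names `can_compose`, `is_leaf`, and `has_handler` are hypothetical and do not appear in the spec.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OperationProvenance {
    Local,
    FromOpenAPI,
    FromMCP,
    FromCall,
    FromJsonSchema,
    Session,
}

impl OperationProvenance {
    /// Local and Session ops may compose children (Session only within its sandbox).
    fn can_compose(self) -> bool {
        matches!(self, Self::Local | Self::Session)
    }

    /// Forwarding stubs are leaves: they have a handler but no composition authority.
    fn is_leaf(self) -> bool {
        matches!(self, Self::FromOpenAPI | Self::FromMCP | Self::FromCall)
    }

    /// FromJsonSchema is schema-only and has no handler.
    fn has_handler(self) -> bool {
        !matches!(self, Self::FromJsonSchema)
    }
}
```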
### OperationRegistry
```rust
pub struct OperationRegistry {
operations: HashMap<String, HandlerRegistration>,
}
```
The curated layer (Layer 0) is a `HashMap<String, HandlerRegistration>`. Session
and connection overlays (Layers 1 and 2) are separate maps composed into the
per-call `OperationContext.env` by the CallAdapter (ADR-024).
Methods:
- `register(registration)`: add to curated layer at startup
- `registration(name)`: find by operation name (checks active overlays first,
then curated base — ADR-024). Returns spec, handler, provenance, composition
authority, scoped env, capabilities.
- `invoke(name, input, context)`: look up, check ACL, invoke handler, return result
- `list_operations()`: return all registered specs (for `/services/list`:
curated + active overlay ops, External only)
### OperationRegistryBuilder
Fluent API with convenience methods:
```rust
pub struct OperationRegistryBuilder {
operations: HashMap<String, HandlerRegistration>,
}
impl OperationRegistryBuilder {
pub fn new() -> Self;
// with_local: Local provenance, full bundle — all 5 args required
pub fn with_local(
mut self,
spec: OperationSpec,
handler: Handler,
composition_authority: Option<CompositionAuthority>,
scoped_env: Option<ScopedOperationEnv>,
capabilities: Capabilities,
) -> Self;
// with_leaf: leaf provenance (FromOpenAPI/FromMCP/FromCall), no authority, no scoped env
pub fn with_leaf(
mut self,
spec: OperationSpec,
handler: Handler,
capabilities: Capabilities,
) -> Self;
// with: full manual registration (any provenance)
pub fn with(mut self, registration: HandlerRegistration) -> Self;
pub fn build(self) -> OperationRegistry;
}
```
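`with_local` sets `provenance: Local`. `with_leaf` sets a leaf provenance
(`FromOpenAPI` by default; selecting `FromMCP` or `FromCall` requires an extra
provenance parameter, which the signature above does not show) with
`composition_authority: None` and `scoped_env: None`. `with` takes the full
bundle for any provenance.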
### Registry invoke flow
```rust
impl OperationRegistry {
pub async fn invoke(&self, name: &str, input: Value, context: OperationContext) -> ResponseEnvelope {
// 1. Look up registration by name
// 2. Check visibility: if Internal and context is external (internal: false), return NOT_FOUND
// 3. Check ACL: access_control.check(identity or handler_identity depending on internal flag)
// 4. If denied: return FORBIDDEN
// 5. Invoke handler: (handler)(input, context).await
// 6. Return ResponseEnvelope
}
}
```
The ACL authority depends on `context.internal`:
- `internal: false` (wire call): check against `context.identity` (caller)
- `internal: true` (composition): check against `context.handler_identity.as_identity()`
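
The authority switch can be sketched as a single selection function. The types below are reduced, hypothetical stand-ins for `Identity`, `CompositionAuthority`, and `OperationContext`, and `acl_subject` is an illustrative name, not part of the spec.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

struct CompositionAuthority {
    label: String,
    scopes: Vec<String>,
}

impl CompositionAuthority {
    /// Synthetic Identity for ACL checks: the label is used as the id.
    fn as_identity(&self) -> Option<Identity> {
        Some(Identity { id: self.label.clone(), scopes: self.scopes.clone() })
    }
}

struct Ctx {
    identity: Option<Identity>,
    handler_identity: Option<CompositionAuthority>,
    internal: bool,
}

/// Pick the identity the ACL check runs against.
fn acl_subject(ctx: &Ctx) -> Option<Identity> {
    if ctx.internal {
        // Composition: the handler's declared authority is the subject.
        ctx.handler_identity
            .as_ref()
            .and_then(CompositionAuthority::as_identity)
    } else {
        // Wire call: the authenticated caller is the subject.
        ctx.identity.clone()
    }
}
```

Because `handler_identity` is an `Option`, the composition branch goes through `as_ref().and_then(...)`; a leaf with no authority yields `None`, which the ACL check would then have to treat as unauthenticated.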
### Layer 0 immutability
The curated layer (Layer 0 — `Local` provenance ops) is immutable after
construction. Adding a `Local` op requires restarting the process. Session and
imported overlays are dynamic at their respective scopes (ADR-024). The
`OperationRegistryBuilder` is Layer-0-only; runtime overlay registration uses
`CallConnection::register_imported()` (in the protocol/connection task).
## Acceptance Criteria
- [ ] `Handler` type alias (async closure returning ResponseEnvelope)
- [ ] `HandlerRegistration` struct with all 6 fields
- [ ] `OperationProvenance` enum with all 6 variants
- [ ] `OperationRegistry` struct with operations HashMap
- [ ] `OperationRegistry::register()` adds to curated layer
- [ ] `OperationRegistry::registration()` looks up by name
- [ ] `OperationRegistry::invoke()` checks visibility, ACL, invokes handler
- [ ] `OperationRegistry::list_operations()` returns External specs only
- [ ] `OperationRegistryBuilder` with `new()`, `with_local()`, `with_leaf()`, `with()`, `build()`
- [ ] `with_local` sets provenance Local, requires all 5 args
- [ ] `with_leaf` sets provenance leaf, composition_authority None, scoped_env None
- [ ] invoke: Internal op called externally → NOT_FOUND (not FORBIDDEN)
- [ ] invoke: ACL denied → FORBIDDEN
- [ ] invoke: internal: true → ACL against handler_identity, not identity
- [ ] invoke: internal: false → ACL against identity
- [ ] Unit test: register and invoke a simple operation
- [ ] Unit test: Internal op returns NOT_FOUND from external call
- [ ] Unit test: ACL check with sufficient scopes → Allowed
- [ ] Unit test: ACL check with insufficient scopes → Forbidden
- [ ] Unit test: builder with_local and with_leaf produce correct provenance
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — Handler, HandlerRegistration, OperationRegistry, builder
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, immutability)
## Notes
> The registry is the dispatch core. The ACL authority switch (internal: true
> → handler_identity, internal: false → identity) is the ADR-015 privilege
> model — get this right. Internal ops return NOT_FOUND from the wire (don't
> leak existence), not FORBIDDEN. The builder is Layer-0-only; runtime overlay
> registration is via CallConnection (protocol task).
## Summary
> To be filled on completion


@@ -0,0 +1,204 @@
---
id: call/registry/operation-context
name: Implement OperationContext, AbortPolicy, CompositionAuthority, and ScopedOperationEnv
status: pending
depends_on: [call/registry/operation-spec, core/core-types]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement the operation context types in `src/registry/context.rs`. This is
the highest-density task in the call crate: `OperationContext` has 11 fields
(10 public plus the crate-private `internal` flag), each tied to an ADR. The
authority-switch semantics (`internal: true` → ACL against `handler_identity`,
not `identity`) are where ADR-015, ADR-022, and ADR-024 converge.
**Read ADR-015, ADR-022, and ADR-024 before starting this task.**
### OperationContext
```rust
pub struct OperationContext {
pub request_id: String,
pub parent_request_id: Option<String>,
pub identity: Option<Identity>, // Caller's identity (inbound)
pub handler_identity: Option<CompositionAuthority>, // Handler's composition authority (ADR-022)
pub capabilities: Capabilities,
pub metadata: HashMap<String, Value>,
pub scoped_env: ScopedOperationEnv, // Reachability set (data, ADR-022)
pub env: Arc<dyn OperationEnv + Send + Sync>, // Composition dispatch trait (ADR-024)
pub abort_policy: AbortPolicy, // ADR-016 Decision 6
pub deadline: Option<Instant>,
pub(crate) internal: bool, // Crate-private for writes (ADR-015)
}
```
Field-by-field:
- `request_id`: correlates with `call.requested` event's `id` field. For wire
calls, this is the client-generated ID. For composed calls, generated by
`OperationEnv::invoke()` via `generate_request_id()` (UUID v4 or
`parent_id + "-" + counter`). **Deterministic IDs must not be used** — they
collide across concurrent invocations, corrupting PendingRequestMap and the
abort-cascade tree.
- `parent_request_id`: set when this call was initiated by another operation
(via OperationEnv). Records the agency chain — the call tree is the
principal→agent chain (ADR-015).
- `identity`: the authenticated caller (from IdentityProvider) — inbound auth
(who is calling me). For external calls, who sent `call.requested`. For
internal calls, the parent handler's `handler_identity` (propagated through
`OperationEnv::invoke()`).
- `handler_identity`: the composition authority of the handler processing this
call. `None` for leaves (FromOpenAPI/FromMCP/FromCall) — they don't compose.
`Some(...)` for Local/Session ops. For internal calls (`internal: true`), ACL
checks against this authority (ADR-015, ADR-022). This is NOT a peer Identity
— it's a declared authority bundle set at registration.
- `capabilities`: outbound credentials the handler may use (decrypted API keys,
scoped vault access). From the registration bundle (ADR-022).
- `metadata`: request-scoped context (tracing IDs, connection info). **Must not
hold secret material** (ADR-014). **Does not propagate through
`OperationEnv::invoke()`** — nested calls get fresh metadata. The tracing
link is `parent_request_id`, not metadata propagation.
- `scoped_env`: the reachability set, i.e. the operations this handler may
compose. Populated from the registration bundle (ADR-022). This is *data* (a
struct), not a dispatch trait. Empty for leaves (their registration has
`scoped_env: None`).
- `env`: the composition dispatch trait (`Arc<dyn OperationEnv + Send + Sync>`).
A handler calls `context.env.invoke(...)` to compose children. This is a
trait object, not a concrete struct — enables registry layering (ADR-024).
- `abort_policy`: for this call's descendants (ADR-016 Decision 6). Default
`AbortDependents`. `ContinueRunning` is opt-in for long-running work. Set by
the composing handler via `invoke()`, not by the wire caller.
- `deadline`: for this call and all descendants. Set by `build_root_context`
to `now + CallAdapter.default_timeout` (default 30s). Composed calls inherit
the parent's deadline (children do NOT get a fresh 30s). `None` = unbounded
(long-running subscriptions).
- `internal`: when `true`, this call originated from composition (a handler
calling another operation via OperationEnv), not from a wire request. This
switches the authority context: ACL runs against `handler_identity`, not
`identity`. Crate-private (`pub(crate)`) for writes; read via `is_internal()`. Only set by
`OperationEnv::invoke()` (true) or `CallAdapter` dispatch path (false).
### AbortPolicy
```rust
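// Clone is needed because OperationEnv::invoke() passes the parent's policy by value.
// Default is derived; clippy's derivable_impls lint flags a manual impl here.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum AbortPolicy {
    #[default]
    AbortDependents, // default: abort cascades to all non-terminal descendants
    ContinueRunning, // opt-in: started descendants continue, unstarted aborted
}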
```
### CompositionAuthority
```rust
pub struct CompositionAuthority {
pub label: String, // e.g., "agent-chat" — not a peer id
pub scopes: Vec<String>, // e.g., ["llm:call", "fs:read"]
pub resources: HashMap<String, Vec<String>>, // e.g., {"service": ["vastai"]}
}
impl CompositionAuthority {
pub fn none() -> Option<Self> { None } // Convenience for leaves
pub fn new(label: &str, scopes: impl IntoIterator<Item = String>) -> Self { ... }
pub fn as_identity(&self) -> Option<Identity> { ... } // Synthetic Identity for ACL
}
```
The declared authority the handler operates under when composing children.
`None` for leaves. This replaces ADR-015's `handler_identity: Identity` — it's
not a peer identity, it's a declared authority bundle. See ADR-022.
`as_identity()` produces a synthetic `Identity` from the authority (label as
id, scopes, resources) for ACL checking against `AccessControl`.
### ScopedOperationEnv
```rust
pub struct ScopedOperationEnv {
allowed: HashSet<String>, // operation names this handler may reach
}
impl ScopedOperationEnv {
pub fn empty() -> Self;
pub fn new(ops: impl IntoIterator<Item = impl Into<String>>) -> Self;
pub fn allows(&self, name: &str) -> bool; // is this op in the reachability set?
}
```
The reachability set — the operations this handler may reach via `env.invoke()`.
Populated from the registration bundle (ADR-022). This is *data*, not a dispatch
trait. The reachability check in `OperationEnv::invoke()` consults
`scoped_env.allows(&name)`. Empty for leaves (their registration has `scoped_env: None`).
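
A minimal sketch of how the three declared methods could be implemented over a `HashSet`. The bodies are an assumption; only the signatures come from this task.

```rust
use std::collections::HashSet;

pub struct ScopedOperationEnv {
    allowed: HashSet<String>, // operation names this handler may reach
}

impl ScopedOperationEnv {
    /// No reachable operations: the shape used for leaves.
    pub fn empty() -> Self {
        Self { allowed: HashSet::new() }
    }

    /// Build the reachability set from anything convertible to String.
    pub fn new(ops: impl IntoIterator<Item = impl Into<String>>) -> Self {
        Self { allowed: ops.into_iter().map(Into::into).collect() }
    }

    /// Exact-match lookup; no wildcard or prefix matching is assumed.
    pub fn allows(&self, name: &str) -> bool {
        self.allowed.contains(name)
    }
}
```

With this shape, `ScopedOperationEnv::new(["fs/readFile"])` allows `fs/readFile` and nothing else, and `empty()` rejects every name.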
### OperationContext methods
```rust
impl OperationContext {
pub fn is_internal(&self) -> bool { self.internal }
}
```
The `internal` field is `pub(crate)`, so code outside `alknet-call` cannot
write it. Within the crate, only `OperationEnv::invoke()` and the `CallAdapter`
dispatch path should set it. Handlers read it via `is_internal()`.
### generate_request_id
```rust
pub(crate) fn generate_request_id() -> String {
// UUID v4 — must be unique across concurrent invocations
// Deterministic IDs (e.g., format!("env-{name}")) MUST NOT be used
}
```
Use the `uuid` crate (already a dependency). This is module-internal — called
by `OperationEnv::invoke()` for composed calls.
## Acceptance Criteria
- [ ] `OperationContext` struct with all 11 fields (10 public plus `internal`)
- [ ] `internal` field is `pub(crate)` (crate-private for writes)
- [ ] `is_internal()` method exposes read access
- [ ] `AbortPolicy` enum with AbortDependents, ContinueRunning
- [ ] `Default for AbortPolicy` returns `AbortDependents`
- [ ] `CompositionAuthority` struct with label, scopes, resources
- [ ] `CompositionAuthority::none()` returns `None`
- [ ] `CompositionAuthority::new(label, scopes)` constructor
- [ ] `CompositionAuthority::as_identity()` produces synthetic Identity for ACL
- [ ] `ScopedOperationEnv` struct with allowed set
- [ ] `ScopedOperationEnv::empty()`, `new()`, `allows()` methods
- [ ] `generate_request_id()` produces UUID v4 (unique, non-deterministic)
- [ ] Unit test: ScopedOperationEnv::allows (in set → true, not in set → false)
- [ ] Unit test: CompositionAuthority::as_identity produces correct Identity
- [ ] Unit test: AbortPolicy default is AbortDependents
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationContext, AbortPolicy, CompositionAuthority, ScopedOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (internal flag, authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (AbortPolicy)
- docs/architecture/decisions/022-handler-registration-provenance-and-composition-authority.md — ADR-022 (CompositionAuthority, ScopedOperationEnv)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (env as trait object)
## Notes
> **Read ADR-015, ADR-022, and ADR-024 before starting.** This is the
> highest-density task in the call crate. OperationContext has 11 fields (10
> public plus the crate-private `internal` flag), each tied to an ADR. The
> authority-switch semantics (internal: true → ACL against handler_identity,
> not identity) is where three ADRs converge. The `internal` field is
> crate-private for writes: only OperationEnv::invoke() and the
> CallAdapter dispatch path set it. Metadata does NOT propagate through
> composition (security constraint, ADR-014). Request IDs must be unique
> (UUID v4) — deterministic IDs corrupt PendingRequestMap and abort-cascade tree.
## Summary
> To be filled on completion


@@ -0,0 +1,225 @@
---
id: call/registry/operation-env
name: Implement OperationEnv trait, LocalOperationEnv, and CompositeOperationEnv
status: pending
depends_on: [call/registry/handler-registration]
scope: broad
risk: high
impact: component
level: implementation
---
## Description
Implement the `OperationEnv` trait and its implementations in
`src/registry/env.rs`. This is the universal composition mechanism — a handler
calls `context.env.invoke(...)` to compose child operations. The trait-object
design is what enables registry layering (ADR-024).
**Read ADR-024 before starting this task.** The trait-object pattern is
load-bearing — making `OperationEnv` concrete would close the session-overlay
and connection-overlay patterns.
### OperationEnv trait
```rust
#[async_trait]
pub trait OperationEnv: Send + Sync {
/// Compose a child operation. The child's OperationContext is constructed
/// with internal: true, inheriting the parent's composition authority as
/// the child's caller identity. Abort policy defaults to parent's.
async fn invoke(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
) -> ResponseEnvelope {
self.invoke_with_policy(namespace, operation, input, parent, parent.abort_policy.clone()).await
}
/// Compose with explicit abort policy (ADR-016 Decision 6).
/// This is the required method — invoke() delegates to it.
async fn invoke_with_policy(
&self,
namespace: &str,
operation: &str,
input: Value,
parent: &OperationContext,
policy: AbortPolicy,
) -> ResponseEnvelope;
/// Does this env contain the named operation? Used by CompositeOperationEnv
/// to probe overlays before dispatching (ADR-024).
fn contains(&self, _name: &str) -> bool { true } // default ignores the name; `_` avoids an unused-variable warning
}
```
`invoke()` has a default impl that delegates to `invoke_with_policy()` with
the parent's abort policy. Implementations only need to implement
`invoke_with_policy()`.
### LocalOperationEnv (Layer 0)
```rust
pub struct LocalOperationEnv {
    registry: Arc<OperationRegistry>,
}
#[async_trait]
impl OperationEnv for LocalOperationEnv {
    async fn invoke_with_policy(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
        let name = format!("{namespace}/{operation}");
        // 1. Reachability check (ADR-015, ADR-022): is this op in parent's scoped env?
        if !parent.scoped_env.allows(&name) {
            // not_found takes (request_id, op_name). No child ID exists yet, so the
            // parent's request ID is used here (assumption).
            return ResponseEnvelope::not_found(parent.request_id.clone(), &name);
        }
        // 2. Look up registration
        let registration = self.registry.registration(&name);
        // 3. Construct child OperationContext
        let context = OperationContext {
            request_id: generate_request_id(), // UUID v4 — NOT deterministic
            parent_request_id: Some(parent.request_id.clone()),
            // authority switch: handler_identity is an Option, so map through it
            identity: parent.handler_identity.as_ref().and_then(|a| a.as_identity()),
            handler_identity: registration.composition_authority.clone(),
            capabilities: parent.capabilities.clone(), // inherit
            metadata: HashMap::new(), // fresh — does NOT propagate parent metadata (ADR-014)
            abort_policy: policy,
            deadline: parent.deadline, // inherit — children don't get fresh 30s
            scoped_env: registration.scoped_env.clone().unwrap_or_else(ScopedOperationEnv::empty),
            env: parent.env.clone(), // inherit the same composite env
            internal: true, // nested calls use handler authority
        };
        // 4. Dispatch
        self.registry.invoke(&name, input, context).await
    }
    // contains() uses default (returns true — curated registry contains everything it can dispatch)
}
```
Key points:
- **Reachability check first**: if op not in parent's scoped_env, NOT_FOUND.
This bounds the parameterized-dispatch attack surface.
- **Authority propagation**: child's `identity` = parent's `handler_identity`
(the parent's composition authority becomes the caller). This is the
authority switch from ADR-015.
- **Fresh metadata**: `HashMap::new()`, NOT parent's metadata. Security
constraint (ADR-014) — prevents secret leakage through composition.
- **Inherited deadline**: children don't get a fresh 30s — the root call's
deadline bounds the entire call tree.
- **Inherited env**: child gets `parent.env.clone()` (the same composite of
curated base + active overlays).
- **internal: true**: this is the flag that switches ACL authority.
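The propagation rules above can be sketched as a pure function. This is an illustration under simplifying assumptions, not the real `OperationContext`: `Ctx`, `Identity`, and `derive_child` are hypothetical names, and capabilities, abort policy, and the env fields are omitted.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical, reduced identity type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity(pub String);

/// Hypothetical, reduced context carrying only the fields discussed above.
pub struct Ctx {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub identity: Identity,
    pub handler_identity: Identity,
    pub metadata: HashMap<String, String>,
    pub deadline: Instant,
    pub internal: bool,
}

/// Derive the child context for a nested call.
pub fn derive_child(parent: &Ctx, child_request_id: String, child_authority: Identity) -> Ctx {
    Ctx {
        request_id: child_request_id,
        parent_request_id: Some(parent.request_id.clone()),
        // Authority switch: the parent's composition authority becomes the caller.
        identity: parent.handler_identity.clone(),
        handler_identity: child_authority,
        // Fresh metadata: nothing from the parent leaks into the child.
        metadata: HashMap::new(),
        // Inherited deadline: the root call bounds the whole tree.
        deadline: parent.deadline,
        // Nested calls are always internal.
        internal: true,
    }
}
```

A unit test for the real implementation can assert the same four properties: identity equals the parent's handler identity, metadata is empty, the deadline is unchanged, and `internal` is `true`.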
### CompositeOperationEnv (per-call, ADR-024)
```rust
pub struct CompositeOperationEnv {
    session: Option<Arc<dyn OperationEnv + Send + Sync>>,    // Layer 1
    connection: Option<Arc<dyn OperationEnv + Send + Sync>>, // Layer 2
    base: Arc<dyn OperationEnv + Send + Sync>,               // Layer 0 (LocalOperationEnv)
}

#[async_trait]
impl OperationEnv for CompositeOperationEnv {
    async fn invoke_with_policy(
        &self,
        namespace: &str,
        operation: &str,
        input: Value,
        parent: &OperationContext,
        policy: AbortPolicy,
    ) -> ResponseEnvelope {
        let name = format!("{namespace}/{operation}");

        // Reachability check (same as LocalOperationEnv)
        if !parent.scoped_env.allows(&name) {
            return ResponseEnvelope::not_found(name);
        }

        // Dispatch in overlay order: session → connection → curated base
        // First overlay that *contains* the op wins
        if let Some(session) = &self.session {
            if session.contains(&name) {
                return session.invoke_with_policy(namespace, operation, input, parent, policy).await;
            }
        }
        if let Some(connection) = &self.connection {
            if connection.contains(&name) {
                return connection.invoke_with_policy(namespace, operation, input, parent, policy).await;
            }
        }
        self.base.invoke_with_policy(namespace, operation, input, parent, policy).await
    }

    fn contains(&self, name: &str) -> bool {
        // is_some_and avoids clippy's unnecessary_map_or lint on map_or(false, ..)
        self.session.as_ref().is_some_and(|s| s.contains(name))
            || self.connection.as_ref().is_some_and(|c| c.contains(name))
            || self.base.contains(name)
    }
}
```
The `contains()` method (review #003 C9) is the overlay-dispatch contract. It
replaces the previous ambiguous "sentinel or contains check" framing. The
structural decision (composite trait object, overlay order, Arc::clone
inheritance) is locked by ADR-024; the dispatch contract (contains probe before
invoke_with_policy) is locked too.
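The probe-then-dispatch contract can be exercised in isolation. The sketch below is synchronous and uses hypothetical names (`Env`, `Overlay`, `Base`, `Composite`) with a `String` result in place of `ResponseEnvelope`; it shows only the overlay ordering, not reachability or context handling.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Reduced stand-in for `OperationEnv` (assumption: sync, string result).
pub trait Env: Send + Sync {
    fn invoke(&self, name: &str) -> String;
    /// Default mirrors the curated base: it claims everything it can dispatch.
    fn contains(&self, _name: &str) -> bool {
        true
    }
}

/// Hypothetical overlay that only contains an explicit set of operations.
pub struct Overlay {
    pub label: &'static str,
    pub ops: HashSet<String>,
}

impl Env for Overlay {
    fn invoke(&self, name: &str) -> String {
        format!("{}:{}", self.label, name)
    }
    fn contains(&self, name: &str) -> bool {
        self.ops.contains(name)
    }
}

/// Hypothetical curated base using the default `contains()`.
pub struct Base;

impl Env for Base {
    fn invoke(&self, name: &str) -> String {
        format!("base:{name}")
    }
}

pub struct Composite {
    pub session: Option<Arc<dyn Env>>,
    pub connection: Option<Arc<dyn Env>>,
    pub base: Arc<dyn Env>,
}

impl Env for Composite {
    fn invoke(&self, name: &str) -> String {
        // Probe each overlay with contains() before dispatching to it.
        if let Some(session) = &self.session {
            if session.contains(name) {
                return session.invoke(name);
            }
        }
        if let Some(connection) = &self.connection {
            if connection.contains(name) {
                return connection.invoke(name);
            }
        }
        self.base.invoke(name)
    }

    fn contains(&self, name: &str) -> bool {
        self.session.as_ref().is_some_and(|s| s.contains(name))
            || self.connection.as_ref().is_some_and(|c| c.contains(name))
            || self.base.contains(name)
    }
}
```

When both overlays contain an operation, the session overlay wins; an operation that neither overlay contains falls through to the base.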
### Why OperationEnv must remain a trait
The trait-based design enables registry layering (ADR-024):
- The CallAdapter composes the root env per call from curated base + active
connection/session overlays
- Overlays wrap the base via trait layering
- Session-scoped registries (OQ-19) and connection-scoped remote imports
(ADR-017 `from_call`) are both overlays on the same base
Making `OperationEnv` concrete or hardcoding the global registry into the
dispatch path would rule out both patterns. This is the same integration-point
pattern as `IdentityProvider` (ADR-004).
## Acceptance Criteria
- [ ] `OperationEnv` trait with `invoke()`, `invoke_with_policy()`, `contains()`
- [ ] `invoke()` has default impl delegating to `invoke_with_policy()` with parent's policy
- [ ] `contains()` has default impl returning `true`
- [ ] `LocalOperationEnv` struct holding `Arc<OperationRegistry>`
- [ ] `LocalOperationEnv::invoke_with_policy` checks reachability (scoped_env.allows)
- [ ] `LocalOperationEnv` constructs child context with internal: true, authority switch
- [ ] `LocalOperationEnv` fresh metadata (HashMap::new(), not parent's)
- [ ] `LocalOperationEnv` inherited deadline (parent.deadline, not fresh 30s)
- [ ] `LocalOperationEnv` inherited env (parent.env.clone())
- [ ] `CompositeOperationEnv` with session, connection, base fields
- [ ] `CompositeOperationEnv::invoke_with_policy` dispatches in overlay order (session → connection → base)
- [ ] `CompositeOperationEnv` uses `contains()` probe before dispatching to overlay
- [ ] `CompositeOperationEnv::contains` returns true if any layer contains the op
- [ ] Reachability check returns NOT_FOUND if op not in scoped_env
- [ ] Unit test: LocalOperationEnv invoke with allowed op → dispatches
- [ ] Unit test: LocalOperationEnv invoke with disallowed op → NOT_FOUND
- [ ] Unit test: child context has internal: true
- [ ] Unit test: child context identity = parent's handler_identity
- [ ] Unit test: child metadata is fresh (empty), not parent's
- [ ] Unit test: CompositeOperationEnv dispatches to session overlay if contains
- [ ] Unit test: CompositeOperationEnv falls through to base if no overlay contains
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationEnv, LocalOperationEnv, CompositeOperationEnv
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (authority switch)
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (abort policy propagation)
- docs/architecture/decisions/024-operation-registry-layering.md — ADR-024 (layering, contains contract)
## Notes
> **Read ADR-024 before starting.** The trait-object design is load-bearing —
> OperationEnv MUST remain a trait, not a concrete type. The authority switch
> (child identity = parent handler_identity) is the ADR-015 privilege model.
> Metadata does NOT propagate (ADR-014 security constraint). Deadline
> inherits (children don't get fresh 30s). The `contains()` probe is the
> overlay-dispatch contract from review #003 C9 — any OperationEnv impl that
> correctly reports contains works with the composite.
## Summary
> To be filled on completion


@@ -0,0 +1,168 @@
---
id: call/registry/operation-spec
name: Implement OperationSpec, OperationType, Visibility, ErrorDefinition, and AccessControl
status: pending
depends_on: [call/crate-init]
scope: moderate
risk: medium
impact: component
level: implementation
---
## Description
Implement the operation specification types in `src/registry/spec.rs`. These
types declare what an operation is, its schemas, and its access control policy.
### OperationSpec
```rust
pub struct OperationSpec {
    pub name: String,                        // e.g., "fs/readFile", "agent/chat" (no leading slash)
    pub namespace: String,                   // e.g., "fs", "agent"
    pub op_type: OperationType,              // Query, Mutation, Subscription
    pub visibility: Visibility,              // External (wire-callable) or Internal (composition-only)
    pub input_schema: Value,                 // JSON Schema for input
    pub output_schema: Value,                // JSON Schema for output
    pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023)
    pub access_control: AccessControl,
}
```
Operation names use slash-based paths **without a leading slash**, aligned with
URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading
slash is added for display (`spec.path()` returns `/fs/readFile`) and wire
format. The registry stores names without the leading slash.
The `namespace` field is derived from the name: for `fs/readFile` it's `fs`,
for `agent/chat` it's `agent`. It's a convenience accessor for ACL matching and
service grouping.
Implement `OperationSpec::path(&self) -> String` that returns `/{name}` (the
wire/display form with leading slash).
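A minimal sketch of the two derivations, written as free functions so it stands alone. The function names are hypothetical, and taking the namespace as the segment before the first `/` is an assumption; the spec only gives two-segment examples.

```rust
/// Registry form -> wire/display form: prepend the leading slash.
pub fn path(name: &str) -> String {
    format!("/{name}")
}

/// Assumption: the namespace is the segment before the first '/'.
/// A name without any '/' is treated as its own namespace.
pub fn namespace_of(name: &str) -> &str {
    name.split_once('/').map_or(name, |(namespace, _rest)| namespace)
}
```

In the real type these would be `OperationSpec::path(&self)` and whatever constructor populates the `namespace` field.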
### OperationType
```rust
pub enum OperationType {
    Query,        // Read-only, idempotent (e.g., "fs/readFile", "services/list")
    Mutation,     // Side effects (e.g., "bash/exec", "github/authenticate")
    Subscription, // Streaming (e.g., "agent/chat", "events/subscribe")
}
```
### Visibility
```rust
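// PartialEq is needed wherever visibility is compared with `==`
// (for example when filtering for External operations).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    External, // Callable from the wire (call.requested from a client)
    Internal, // Composition-only (env.invoke from a handler)
}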
```
`External` operations appear in `services/list` and accept `call.requested`.
`Internal` operations return `NOT_FOUND` when called from the wire and do not
appear in `services/list`. The assembly layer declares visibility at
registration. All import adapters register operations as `Internal` by default
(they're composition material); the handler that composes them is `External`.
### ErrorDefinition
```rust
pub struct ErrorDefinition {
    pub code: String,             // e.g., "FILE_NOT_FOUND", "RATE_LIMITED"
    pub description: String,      // Human-readable description
    pub schema: Value,            // JSON Schema for the error detail payload
    pub http_status: Option<u16>, // HTTP status for adapter projection (from_openapi/to_openapi)
}
```
A declared operation-level error (ADR-023). When a handler returns a `CallError`
whose `code` matches a declared `ErrorDefinition`, the `call.error` event
carries that code and the error's detail payload. If it doesn't match, the
`call.error` carries `INTERNAL`.
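The matching rule can be sketched as a small lookup. The types here are reduced to the fields involved, and `wire_error_code` is a hypothetical helper name; how protocol-level codes such as `NOT_FOUND` are handled is outside this sketch.

```rust
/// Reduced stand-in: only the field used for matching.
pub struct ErrorDefinition {
    pub code: String,
}

/// Reduced stand-in for the handler's error.
pub struct CallError {
    pub code: String,
    pub message: String,
}

/// Return the code to put on the wire: the declared code when the handler's
/// error matches one of the operation's declarations, otherwise "INTERNAL".
pub fn wire_error_code<'a>(declared: &'a [ErrorDefinition], err: &CallError) -> &'a str {
    declared
        .iter()
        .find(|definition| definition.code == err.code)
        .map_or("INTERNAL", |definition| definition.code.as_str())
}
```

Falling back to `INTERNAL` keeps undeclared handler errors from widening the operation's published error surface.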
### AccessControl
```rust
pub struct AccessControl {
    pub required_scopes: Vec<String>,             // AND-checked: caller must have ALL
    pub required_scopes_any: Option<Vec<String>>, // OR-checked: caller must have at LEAST ONE
    pub resource_type: Option<String>,            // e.g., "service"
    pub resource_action: Option<String>,          // e.g., "read"
}
```
### ACL check flow
When a `call.requested` event arrives:
1. Registry checks **visibility**: if `Internal`, returns `NOT_FOUND` (does
   not leak existence)
2. Registry checks `access_control.check(identity)`:
   - For external calls (`internal: false`): ACL against the **caller's identity**
   - For internal calls (`internal: true`): ACL against the **handler's
     composition authority** (ADR-015)
3. If denied: `FORBIDDEN`
4. If identity is `None` and operation has restrictions: `FORBIDDEN` with
   message `"authentication required"`
Operations with empty `AccessControl` (no required scopes, no resource checks)
are accessible to all callers, including unauthenticated ones.
### Implement AccessControl::check
```rust
impl AccessControl {
    pub fn check(&self, identity: Option<&Identity>) -> AccessResult;
}

pub enum AccessResult {
    Allowed,
    Forbidden(String), // reason
}
```
The check logic:
- `required_scopes`: caller must have ALL (subset check)
- `required_scopes_any`: caller must have at LEAST ONE (if present)
- `resource_type` / `resource_action`: check against `identity.resources`
- If `identity` is `None` and any scope/resource is required: `Forbidden("authentication required")`
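One way the check logic above could look is sketched below. The `Identity` shape is an assumption (the real type comes from alknet-core): scopes as a string set and resources as `(type, action)` pairs. Two further assumptions are labeled in the comments: an empty `required_scopes_any` list counts as no restriction, and the resource check only applies when both `resource_type` and `resource_action` are set. The reason strings other than `"authentication required"` are illustrative.

```rust
use std::collections::HashSet;

/// Hypothetical identity shape; the real `Identity` is defined elsewhere.
pub struct Identity {
    pub scopes: HashSet<String>,
    /// Assumed: granted (resource_type, resource_action) pairs.
    pub resources: HashSet<(String, String)>,
}

#[derive(Default)]
pub struct AccessControl {
    pub required_scopes: Vec<String>,
    pub required_scopes_any: Option<Vec<String>>,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AccessResult {
    Allowed,
    Forbidden(String),
}

impl AccessControl {
    /// The OR-checked scopes, or an empty slice when none are declared.
    fn any_of(&self) -> &[String] {
        self.required_scopes_any.as_deref().unwrap_or_default()
    }

    fn is_unrestricted(&self) -> bool {
        self.required_scopes.is_empty()
            && self.any_of().is_empty() // assumption: Some(vec![]) means no restriction
            && self.resource_type.is_none()
            && self.resource_action.is_none()
    }

    pub fn check(&self, identity: Option<&Identity>) -> AccessResult {
        // Empty AccessControl: open to everyone, including unauthenticated callers.
        if self.is_unrestricted() {
            return AccessResult::Allowed;
        }
        let Some(identity) = identity else {
            return AccessResult::Forbidden("authentication required".to_string());
        };
        // AND check: every required scope must be present.
        if let Some(missing) = self.required_scopes.iter().find(|s| !identity.scopes.contains(*s)) {
            return AccessResult::Forbidden(format!("missing scope: {missing}"));
        }
        // OR check: at least one alternative scope must be present.
        let any_of = self.any_of();
        if !any_of.is_empty() && !any_of.iter().any(|s| identity.scopes.contains(s)) {
            return AccessResult::Forbidden("none of the alternative scopes granted".to_string());
        }
        // Resource check (assumption: applies only when both parts are set).
        if let (Some(ty), Some(action)) = (&self.resource_type, &self.resource_action) {
            if !identity.resources.contains(&(ty.clone(), action.clone())) {
                return AccessResult::Forbidden(format!("no {action} access to {ty}"));
            }
        }
        AccessResult::Allowed
    }
}
```

Checking `is_unrestricted()` before looking at `identity` is what lets unauthenticated callers reach operations with an empty `AccessControl`.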
## Acceptance Criteria
- [ ] `OperationSpec` struct with all 8 fields
- [ ] `OperationSpec::path()` returns `/{name}` (leading slash for wire/display)
- [ ] `OperationSpec::namespace` derived from name (split on `/`)
- [ ] `OperationType` enum with Query, Mutation, Subscription
- [ ] `Visibility` enum with External, Internal
- [ ] `ErrorDefinition` struct with all 4 fields
- [ ] `AccessControl` struct with all 4 fields
- [ ] `AccessControl::check(identity)` returns `AccessResult`
- [ ] `required_scopes` is AND-checked (caller must have all)
- [ ] `required_scopes_any` is OR-checked (caller must have at least one)
- [ ] `None` identity with restrictions → `Forbidden("authentication required")`
- [ ] Empty AccessControl → `Allowed` for all callers
- [ ] Unit tests for AccessControl::check (all combinations)
- [ ] Unit test: OperationSpec::path() produces leading slash
- [ ] Unit test: namespace derived correctly from name
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — OperationSpec, AccessControl, Visibility
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (visibility, ACL)
- docs/architecture/decisions/023-operation-error-schemas.md — ADR-023 (ErrorDefinition)
## Notes
> Operation names have NO leading slash in the registry (`fs/readFile`). The
> leading slash is added for wire format and display (`/fs/readFile`). This is
> a single rule applied consistently — do not mix the two forms. Visibility
> controls wire-callability: Internal ops return NOT_FOUND from the wire (don't
> leak existence). AccessControl.check is the ACL gate — read it carefully
> against ADR-015 for the internal vs external authority distinction.
## Summary
> To be filled on completion


@@ -0,0 +1,181 @@
---
id: call/registry/service-discovery
name: Implement services/list and services/schema built-in operations
status: pending
depends_on: [call/registry/handler-registration]
scope: narrow
risk: low
impact: isolated
level: implementation
---
## Description
Implement the two built-in service discovery operations in
`src/registry/discovery.rs`. These are read-only operations that expose what
the node offers.
### Operations
| Operation name | Display path | Type | Description |
|---------------|-------------|------|-------------|
| `services/list` | `/services/list` | Query | List registered operation names and metadata |
| `services/schema` | `/services/schema` | Query | Get the OperationSpec for a specific operation |
### services/list
Returns `External` operations only. `Internal` operations are not part of the
wire-facing API surface — they're implementation details of composition. A
remote client cannot enumerate the internal call tree (ADR-015).
```json
{
  "operations": [
    { "name": "fs/readFile", "namespace": "fs", "op_type": "query" },
    { "name": "agent/chat", "namespace": "agent", "op_type": "subscription" },
    { "name": "events/subscribe", "namespace": "events", "op_type": "subscription" }
  ]
}
```
The handler queries the registry's `list_operations()` (which returns External
specs only) and serializes to the above format.
### services/schema
Accepts `{ "name": "fs/readFile" }` (no leading slash — registry form, same as
`OperationSpec.name`) and returns the full `OperationSpec` including
input/output JSON Schemas and declared `error_schemas` (ADR-023).
The CallAdapter strips the leading slash from wire `operationId`s before
lookup. The `services/schema` handler applies the same normalization to its
`name` input, so it accepts both `fs/readFile` and `/fs/readFile`.
Returning the declared `error_schemas` enables client code generation: a client
reading the schema can produce typed error enums instead of generic error
handling.
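The leading-slash normalization can be a one-line helper shared by the adapter and the `services/schema` handler. The function name is hypothetical, and stripping exactly one slash is an assumption.

```rust
/// Convert a wire/display name (`/fs/readFile`) to registry form (`fs/readFile`).
/// Names already in registry form are returned unchanged.
/// Assumption: only a single leading slash is stripped.
pub fn normalize_operation_name(raw: &str) -> &str {
    raw.strip_prefix('/').unwrap_or(raw)
}
```

Because the helper borrows from its input, it adds no allocation to the lookup path.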
### Registration
These are registered as `Local` provenance with empty composition authority,
empty scoped env, and empty capabilities (they don't compose, don't need
credentials):
```rust
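// The handler factories (see Handlers) already return a `Handler`, so they are
// called here rather than wrapped in another `Arc::new`. `registry` stands for
// the `Arc<OperationRegistry>` handle each closure captures.
.with_local(services_list_spec(), services_list_handler(registry.clone()),
    CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())
.with_local(services_schema_spec(), schema_handler(registry.clone()),
    CompositionAuthority::none(), ScopedOperationEnv::empty(), Capabilities::new())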
```
### Specs
```rust
fn services_list_spec() -> OperationSpec {
    OperationSpec {
        name: "services/list".into(),
        namespace: "services".into(),
        op_type: OperationType::Query,
        visibility: Visibility::External,
        input_schema: json!({}), // no input
        output_schema: json!({
            "type": "object",
            "properties": {
                "operations": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": { "type": "string" },
                            "namespace": { "type": "string" },
                            "op_type": { "type": "string", "enum": ["query", "mutation", "subscription"] }
                        }
                    }
                }
            }
        }),
        error_schemas: vec![],
        access_control: AccessControl::default(), // no restrictions: callable by all
    }
}

fn services_schema_spec() -> OperationSpec {
    OperationSpec {
        name: "services/schema".into(),
        namespace: "services".into(),
        op_type: OperationType::Query,
        visibility: Visibility::External,
        input_schema: json!({
            "type": "object",
            "properties": { "name": { "type": "string" } },
            "required": ["name"]
        }),
        output_schema: json!({ /* full OperationSpec schema */ }),
        error_schemas: vec![],
        access_control: AccessControl::default(),
    }
}
```
### Handlers
The handlers need access to the registry. Since handlers are `Arc<dyn Fn>`,
the registry reference is captured in the closure. Use `Arc<OperationRegistry>`
cloned into the closure.
```rust
fn services_list_handler(registry: Arc<OperationRegistry>) -> Handler {
    // `_input` is unused: services/list takes no input.
    Arc::new(move |_input: Value, ctx: OperationContext| {
        let registry = registry.clone();
        Box::pin(async move {
            let ops: Vec<_> = registry
                .list_operations()
                .into_iter()
                // Defensive: list_operations() already returns External specs only.
                .filter(|s| s.visibility == Visibility::External)
                .map(|s| json!({
                    "name": s.name,
                    "namespace": s.namespace,
                    "op_type": match s.op_type {
                        OperationType::Query => "query",
                        OperationType::Mutation => "mutation",
                        OperationType::Subscription => "subscription",
                    }
                }))
                .collect();
            ResponseEnvelope::ok(ctx.request_id, json!({ "operations": ops }))
        })
    })
}
```
## Acceptance Criteria
- [ ] `services/list` spec with correct fields (Query, External, no input, output schema)
- [ ] `services/schema` spec with correct fields (Query, External, name input, full spec output)
- [ ] `services/list` handler returns External operations only (Internal excluded)
- [ ] `services/list` output format matches spec (operations array with name, namespace, op_type)
- [ ] `services/schema` handler accepts name with or without leading slash
- [ ] `services/schema` returns full OperationSpec (input_schema, output_schema, error_schemas)
- [ ] `services/schema` returns NOT_FOUND for unknown operation name
- [ ] Both registered as Local provenance, empty authority/env/caps
- [ ] Both have empty AccessControl (callable by all, including unauthenticated)
- [ ] Unit test: services/list returns only External ops
- [ ] Unit test: services/schema returns spec for known op
- [ ] Unit test: services/schema returns NOT_FOUND for unknown op
- [ ] Unit test: services/schema accepts both "fs/readFile" and "/fs/readFile"
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/crates/call/operation-registry.md — Service Discovery section
- docs/architecture/decisions/015-privilege-model-and-authority-context.md — ADR-015 (Internal not in services/list)
## Notes
> services/list returns External ops only: Internal ops are implementation
> details of composition and must not be enumerable from the wire. The
> CallAdapter strips leading slashes from wire operationIds; the
> services/schema handler must do the same for its `name` input so that both
> forms are accepted. These are the only built-in operations; no admin
> operations are exposed through the call protocol itself.
## Summary
> To be filled on completion

137
tasks/call/review-call.md Normal file

@@ -0,0 +1,137 @@
---
id: call/review-call
name: Review alknet-call implementation for spec conformance and pattern consistency
status: pending
depends_on: [call/protocol/abort-cascade]
scope: broad
risk: low
impact: phase
level: review
---
## Description
Review the alknet-call implementation for spec conformance, pattern
consistency, and correctness. This is the quality checkpoint at the end of the
call phase — the most complex crate in this batch.
### Review Checklist
1. **Registry conformance** (operation-registry.md):
- `OperationSpec` has all 8 fields, `path()` adds leading slash
- `OperationType` (Query/Mutation/Subscription), `Visibility` (External/Internal)
- `ErrorDefinition` with code, description, schema, http_status (ADR-023)
- `AccessControl` with required_scopes (AND), required_scopes_any (OR), resource checks
- `AccessControl::check` returns Allowed/Forbidden, None identity with restrictions → Forbidden
- `OperationContext` has all 11 fields (as constructed in `LocalOperationEnv`), `internal` is pub(crate) and read via `is_internal()`
- `AbortPolicy` (AbortDependents default, ContinueRunning opt-in)
- `CompositionAuthority` with label, scopes, resources, `as_identity()`
- `ScopedOperationEnv` with `allows()` reachability check
- `Handler` type (async closure → ResponseEnvelope)
- `HandlerRegistration` with all 6 fields (spec, handler, provenance, authority, scoped_env, caps)
- `OperationProvenance` with all 6 variants
- `OperationRegistry` with register, registration, invoke, list_operations
- `OperationRegistryBuilder` with with_local, with_leaf, with, build
- `OperationEnv` trait with invoke, invoke_with_policy, contains
- `LocalOperationEnv` reachability check, authority switch, fresh metadata, inherited deadline
- `CompositeOperationEnv` overlay dispatch (session → connection → base), contains probe
- `services/list` returns External only, `services/schema` returns full spec
2. **Protocol conformance** (call-protocol.md):
- `EventEnvelope` with type, id, payload (JSON, length-prefixed framing)
- `ResponseEnvelope` with request_id, result
- `CallError` with code, message, retryable, details
- 5 event types: call.requested, call.responded, call.completed, call.aborted, call.error
- Wire payload schemas match spec table
- `call.requested` has operationId (leading slash), input, optional auth_token
- `call.error` has protocol-level codes (NOT_FOUND, FORBIDDEN, INVALID_INPUT, INTERNAL, TIMEOUT)
- `PendingRequestMap` correlates by ID (not stream), handles all event types
- `CallConnection` with Layer 2 overlay, register_imported, overlay_env, call/subscribe/abort
- `CallAdapter` implements ProtocolHandler for alknet/call
- CallAdapter stream handling (accept_bi loop, FrameFramedReader/Writer)
- Per-request identity resolution (auth_token overrides connection-level)
- `build_root_context` sets internal: false, deadline, capabilities from registration
- `compose_root_env` builds CompositeOperationEnv (base + session + connection)
- operationId leading slash stripped before lookup
- ResponseEnvelope → EventEnvelope conversion
- Timeout: 30s default, composed calls inherit parent deadline
- Abort cascade: walks tree by parent_request_id, AbortDependents/ContinueRunning
3. **ADR conformance**:
- ADR-005: irpc framing used
- ADR-012: bidirectional streams, ID-based correlation
- ADR-014: no secret material on wire, Capabilities non-serializable
- ADR-015: internal flag switches authority (handler_identity vs identity), Visibility
- ADR-016: abort cascade, AbortPolicy, default AbortDependents
- ADR-017: connection direction independent of call direction
- ADR-022: registration bundle (provenance, authority, scoped_env, capabilities)
- ADR-023: ErrorDefinition, typed details in call.error
- ADR-024: registry layering (curated + session + connection), OperationEnv as trait
4. **Security constraints**:
- Capabilities non-serializable (no Serialize derive)
- Capabilities zeroized, immutable after construction
- Metadata does not propagate through composition (fresh HashMap::new())
- Call protocol carries no secret material
- Internal ops return NOT_FOUND from wire (don't leak existence)
- Reachability check (scoped_env.allows) bounds composition attack surface
- Request IDs are UUID v4 (non-deterministic, collisions negligibly unlikely)
5. **Pattern consistency**:
- OperationEnv is a trait (not concrete) — enables layering
- CompositeOperationEnv uses contains() probe before dispatch
- Authority switch in invoke_with_policy (child identity = parent handler_identity)
- Deadline inheritance (children don't get fresh 30s)
- ArcSwap not used in call (that's core's pattern)
6. **Test coverage**:
- Unit tests for AccessControl::check (all combinations)
- Unit tests for OperationContext construction
- Unit tests for OperationEnv (LocalOperationEnv, CompositeOperationEnv)
- Unit tests for PendingRequestMap (all event types, timeouts, fail_all)
- Unit tests for framing (round-trip, truncation)
- Unit tests for abort cascade (both policies, tree walking)
- Integration test: call.requested → dispatch → call.responded
- Integration test: auth_token overrides identity
- Integration test: Internal op → NOT_FOUND from wire
- Integration test: ACL denied → FORBIDDEN
- Integration test: subscription streaming (multiple responded, completed)
## Acceptance Criteria
- [ ] All registry types match operation-registry.md
- [ ] All protocol types match call-protocol.md
- [ ] All ADRs conformed to (005, 012, 014, 015, 016, 017, 022, 023, 024)
- [ ] Capabilities non-serializable, zeroized, immutable
- [ ] Metadata does not propagate through composition
- [ ] Internal ops return NOT_FOUND from wire
- [ ] Reachability check bounds composition
- [ ] Request IDs are UUID v4
- [ ] OperationEnv is a trait (not concrete)
- [ ] CompositeOperationEnv uses contains() probe
- [ ] Authority switch correct (internal: true → handler_identity)
- [ ] Deadline inheritance correct (children inherit parent deadline)
- [ ] Test coverage adequate for all functionality
- [ ] `cargo fmt --check -p alknet-call` passes
- [ ] `cargo clippy -p alknet-call` passes with no warnings
- [ ] All tests pass
## References
- docs/architecture/crates/call/README.md
- docs/architecture/crates/call/call-protocol.md
- docs/architecture/crates/call/operation-registry.md
- docs/architecture/decisions/ (relevant ADRs: 005, 012, 014-017, 022-024)
## Notes
> This is the most complex crate in this batch. The review should verify that
> the registry layering (ADR-024), authority switch (ADR-015), abort cascade
> (ADR-016), and composition model (ADR-022) all work correctly together. The
> OperationEnv trait-object design is load-bearing — verify it's a trait, not
> concrete. If deviations are found, document and fix before considering the
> call crate complete.
## Summary
> To be filled on completion