---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: completed
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---

## Description

Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler composes other operations via `OperationEnv::invoke()`, it creates a call tree: a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own children. When `call.aborted` arrives for a parent, the protocol cascades the abort to all non-terminal descendants.

**Read ADR-016 before starting this task.**

### Call tree

The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The root request has `parent_request_id: None`. Each composed call has `parent_request_id: Some(parent.request_id)`.

```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│   ├── r1-a-1 (composed by r1-a's handler)
│   └── r1-a-2
└── r1-b
    └── r1-b-1
```

### Abort cascade

When `call.aborted` arrives for a parent request:

1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (Drop releases resources)

The `CallAdapter` performs this walk over the `parent_request_id` index in `PendingRequestMap` and sends `call.aborted` for each descendant.

### AbortPolicy

The abort policy is set on `OperationContext` and propagated through `OperationEnv::invoke()`: the composing handler decides the child's policy, not the wire caller.

**`AbortDependents` (default)**: aborting a request aborts everything downstream, regardless of branch. This is the correct default because aborted parent work has no consumer waiting for results. Continuing is wasted work at best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running after the caller stopped caring).

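
Under `AbortDependents`, the cascade amounts to collecting every descendant of the aborted request. The following is a minimal sketch of that walk, assuming a simplified parent index. `ParentIndex`, `register`, and `example_tree` are hypothetical names used for illustration only; they are not the crate's `PendingRequestMap` API.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified stand-in for the pending map: it records only
/// each request's parent, which is all the tree walk needs.
#[derive(Default)]
pub struct ParentIndex {
    parent_of: HashMap<String, Option<String>>,
}

impl ParentIndex {
    /// Register a request; `parent` is `None` for the root wire call.
    pub fn register(&mut self, request_id: &str, parent: Option<&str>) {
        self.parent_of
            .insert(request_id.to_string(), parent.map(str::to_string));
    }

    /// Collect every descendant of `root_id` breadth-first, then sort the
    /// result so the output is deterministic. An unknown `root_id` yields
    /// an empty list.
    pub fn find_descendants(&self, root_id: &str) -> Vec<String> {
        let mut found = Vec::new();
        let mut queue = VecDeque::from([root_id.to_string()]);
        while let Some(current) = queue.pop_front() {
            // Linear scan for entries whose parent is the current node.
            for (id, parent) in &self.parent_of {
                if parent.as_deref() == Some(current.as_str()) {
                    found.push(id.clone());
                    queue.push_back(id.clone());
                }
            }
        }
        found.sort();
        found
    }
}

/// Builds the example tree from the "Call tree" section.
pub fn example_tree() -> ParentIndex {
    let mut index = ParentIndex::default();
    index.register("r1", None);
    index.register("r1-a", Some("r1"));
    index.register("r1-a-1", Some("r1-a"));
    index.register("r1-a-2", Some("r1-a"));
    index.register("r1-b", Some("r1"));
    index.register("r1-b-1", Some("r1-b"));
    index
}
```

The per-node linear scan keeps the sketch short. A real map would more plausibly keep a parent-to-children index so the walk costs time proportional to the aborted subtree rather than to the whole map.
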
**`ContinueRunning` (opt-in)**: descendants that have already started continue to completion; descendants that haven't started yet are aborted; no new descendants start. Use for long-running work that should survive a parent's abort (e.g., a subscription that should keep streaming).

### Wire visibility

Composed child `request_id`s are **internal**: they appear in `PendingRequestMap` for abort-cascade indexing but are not sent as `call.requested` to any peer. The client only sees `call.aborted` for the root ID it sent; the server cascades internally to descendants. The exception is `from_call` ops, which generate their own wire ID when forwarding to the remote node (the remote node's `PendingRequestMap` indexes it).

### Implementation

The abort cascade needs access to the `PendingRequestMap` to walk the tree. The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The cascade logic:

```rust
pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking.
    // The map indexes entries by request_id, and each entry knows its
    // parent_request_id (from OperationContext, stored when the entry was
    // registered).
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String>;

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String>;
}
```

### Storing parent_request_id in PendingRequestMap

The `PendingRequestMap` needs to know the `parent_request_id` for each entry to walk the tree.
This means `PendingEntry` needs to store the parent ID (or the full `OperationContext`):

```rust
enum PendingEntry {
    Call {
        tx: oneshot::Sender<...>, // payload type omitted here
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<...>, // payload type omitted here
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```

Update the `PendingRequestMap` (from the pending-request-map task) to store `parent_request_id` when registering entries. The `register_call` and `register_subscribe` methods take an optional `parent_request_id` parameter.

### AbortPolicy propagation

The abort policy is propagated through `OperationEnv::invoke()`:

- `invoke()` uses the default impl, which delegates to `invoke_with_policy()` with `parent.abort_policy.clone()`
- `invoke_with_policy()` takes an explicit policy; use `AbortPolicy::ContinueRunning` for long-running work

When cascading:

- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to completion; no new descendants start

Determining "started" vs "unstarted" is tricky. A practical approach:

- A descendant is "started" if its handler has begun executing (the future has been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched

This may require tracking dispatch state in `PendingEntry`. A simpler approximation: under `ContinueRunning`, abort all descendants that haven't sent a `call.responded` yet (they're still pending). This is conservative but safe.

### Handler cleanup

Handlers clean up resources when their call is cancelled. In Rust, the future is dropped and `Drop` guards release resources (HTTP streams, file handles, locks). This is a handler-level concern; the protocol's job is to cascade the abort. See ADR-016.

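
The started/unstarted rule described under "AbortPolicy propagation" can be sketched as a filter applied to the descendants that the tree walk has already found. This is only an illustration under stated assumptions: `DispatchState` and `select_for_abort` are hypothetical names, the `AbortPolicy` enum is redeclared locally so the block stands alone, and dispatch state is kept in a separate set rather than inside `PendingEntry`.

```rust
use std::collections::HashSet;

/// Local copy of the two policies named in this task, declared here so
/// the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Hypothetical tracker for which requests have begun executing.
#[derive(Default)]
pub struct DispatchState {
    started: HashSet<String>,
}

impl DispatchState {
    /// Record that the handler for `request_id` has begun executing.
    pub fn mark_started(&mut self, request_id: &str) {
        self.started.insert(request_id.to_string());
    }

    /// True if `mark_started` was called for `request_id`.
    pub fn is_started(&self, request_id: &str) -> bool {
        self.started.contains(request_id)
    }
}

/// Given the descendants found by the tree walk, return the ones the
/// policy says to abort, sorted for deterministic output.
pub fn select_for_abort(
    descendants: &[String],
    state: &DispatchState,
    policy: AbortPolicy,
) -> Vec<String> {
    let mut to_abort: Vec<String> = descendants
        .iter()
        .filter(|id| match policy {
            // Everything downstream is aborted, started or not.
            AbortPolicy::AbortDependents => true,
            // Started descendants keep running; only unstarted ones go.
            AbortPolicy::ContinueRunning => !state.is_started(id.as_str()),
        })
        .cloned()
        .collect();
    to_abort.sort();
    to_abort
}
```

Note that under `ContinueRunning` an unstarted child of a started descendant is still selected, which matches the rule that no new descendants start once the parent is aborted.
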
## Acceptance Criteria

- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for unknown request_id is silently discarded
- [ ] Composed child request_ids are internal (not sent as call.requested to peer)
- [ ] Client only sees call.aborted for the root ID it sent
- [ ] AbortPolicy propagated through OperationEnv::invoke()
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings

## References

- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md: ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md: Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md: AbortPolicy, OperationContext.abort_policy

## Notes

> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream; this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal: the client only
> sees call.aborted for the root ID.
> The PendingRequestMap needs to store
> parent_request_id for tree walking; update the pending-request-map task's
> output if needed.

## Summary

Implemented `AbortCascade` in `protocol/abort.rs` per ADR-016: `PendingEntry` now stores `parent_request_id` (Call & Subscribe) for tree indexing, plus a `started` flag. `AbortCascade::cascade_abort` walks the call tree by `parent_request_id` and aborts descendants per `AbortPolicy` (`AbortDependents` aborts all; `ContinueRunning` aborts only unstarted descendants, tracked via `mark_started()`). It returns a sorted list of aborted IDs; an unknown root is silently discarded. 20 unit tests cover the depth-3 cascade, mixed Call/Subscribe trees, determinism, and both policies (159 total in the call crate, 290+ workspace-wide). Clippy clean. Merged to develop.