---
id: call/protocol/abort-cascade
name: Implement abort cascade logic for nested calls (ADR-016)
status: completed
depends_on: [call/protocol/call-adapter]
scope: moderate
risk: high
impact: component
level: implementation
---
## Description
Implement the abort cascade logic in `src/protocol/abort.rs`. When a handler
composes other operations via `OperationEnv::invoke()`, it creates a call tree:
a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own
children. When `call.aborted` arrives for a parent, the protocol cascades the
abort to all non-terminal descendants.
**Read ADR-016 before starting this task.**
### Call tree
The call tree is indexed by `parent_request_id` in the `PendingRequestMap`. The
root request has `parent_request_id: None`. Each composed call has
`parent_request_id: Some(parent.request_id)`.
```
r1 (root, wire call)
├── r1-a (composed by r1's handler)
│ ├── r1-a-1 (composed by r1-a's handler)
│ └── r1-a-2
└── r1-b
└── r1-b-1
```
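Each entry stores only its parent link, so walking down the tree is easiest with a derived parent → children index. A minimal sketch, assuming entries are available as `(request_id, parent_request_id)` pairs. `children_index` is a hypothetical helper, not part of the crate's API:

```rust
use std::collections::HashMap;

/// Hypothetical helper: invert child -> parent links into a parent -> children index.
/// Roots (`parent_request_id == None`) have no incoming edge and are skipped.
pub fn children_index<'a>(
    entries: &[(&'a str, Option<&'a str>)],
) -> HashMap<&'a str, Vec<&'a str>> {
    let mut index: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(id, parent) in entries {
        if let Some(parent) = parent {
            // Children keep their input order under each parent.
            index.entry(parent).or_default().push(id);
        }
    }
    index
}
```

Applied to the tree above, `r1` maps to `[r1-a, r1-b]`, and leaves such as `r1-a-1` have no key at all.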
### Abort cascade
When `call.aborted` arrives for a parent request:
1. Find all non-terminal descendants in the tree (walk by `parent_request_id`)
2. Send `call.aborted` for each descendant
3. Cancel each descendant's future (dropping it runs `Drop`, which releases resources)

The `CallAdapter` performs this walk over the `parent_request_id` index in
`PendingRequestMap` and sends `call.aborted` for each descendant.
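The three steps can be sketched over a simplified in-memory model. This is illustrative only: `CallTree`, `insert`, `is_running`, and `abort_descendants` are hypothetical names, terminal entries are assumed to have already been removed (so everything still present is non-terminal), and sending `call.aborted` is modelled as pushing the ID onto an `outbox` vector instead of writing to a transport.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;

pub type Task = Pin<Box<dyn Future<Output = ()> + Send>>;

/// Hypothetical, simplified call tree: parent links for non-terminal entries
/// and the future currently running for each of them.
#[derive(Default)]
pub struct CallTree {
    parents: HashMap<String, Option<String>>,
    tasks: HashMap<String, Task>,
    /// Stand-in for the transport: IDs that were sent a `call.aborted`.
    pub outbox: Vec<String>,
}

impl CallTree {
    pub fn insert(&mut self, id: &str, parent: Option<&str>, task: Task) {
        self.parents.insert(id.to_string(), parent.map(str::to_string));
        self.tasks.insert(id.to_string(), task);
    }

    pub fn is_running(&self, id: &str) -> bool {
        self.tasks.contains_key(id)
    }

    /// Abort every non-terminal descendant of `root` (the root itself is left alone).
    pub fn abort_descendants(&mut self, root: &str) {
        // Step 1: walk parent links downward with an explicit stack. This scans
        // all entries per visited node (O(n^2) worst case) and assumes no cycles.
        let mut stack = vec![root.to_string()];
        let mut found = Vec::new();
        while let Some(parent) = stack.pop() {
            for (id, p) in &self.parents {
                if p.as_deref() == Some(parent.as_str()) {
                    stack.push(id.clone());
                    found.push(id.clone());
                }
            }
        }
        for id in found {
            // Step 2: notify the descendant.
            self.outbox.push(id.clone());
            // Step 3: removing the boxed future drops it, cancelling it and
            // running the Drop impls of everything it owns.
            self.tasks.remove(&id);
            self.parents.remove(&id);
        }
    }
}
```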
### AbortPolicy
The abort policy is set on `OperationContext` and propagated through
`OperationEnv::invoke()` — the composing handler decides the child's policy,
not the wire caller.

**`AbortDependents` (default)**: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results: continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring).

**`ContinueRunning` (opt-in)**: descendants that have already started continue
to completion; descendants that haven't started yet are aborted; no new
descendants start. Use for long-running work that should survive a parent's
abort (e.g., a subscription that should keep streaming).
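The two policies boil down to a per-descendant decision plus a rule that nothing new starts beneath an aborted request. A sketch with hypothetical names (`should_abort`, `ChildGate`); the enum mirrors the variant names used here, but its exact definition on `OperationContext` is assumed. `started` is taken to mean the handler's future has been polled at least once.

```rust
/// Abort policy with the two variants described above (exact shape assumed).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

/// Should a non-terminal descendant be aborted when an ancestor is aborted?
pub fn should_abort(policy: AbortPolicy, started: bool) -> bool {
    match policy {
        // Everything downstream goes, started or not.
        AbortPolicy::AbortDependents => true,
        // Only work that has not begun is dropped.
        AbortPolicy::ContinueRunning => !started,
    }
}

/// Hypothetical gate: once a request is aborted, no new descendants may start beneath it.
#[derive(Default)]
pub struct ChildGate {
    parent_aborted: bool,
}

impl ChildGate {
    pub fn close(&mut self) {
        self.parent_aborted = true;
    }

    pub fn may_spawn_child(&self) -> bool {
        !self.parent_aborted
    }
}
```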
### Wire visibility
Composed child `request_id`s are **internal** — they appear in
`PendingRequestMap` for abort-cascade indexing but are not sent as
`call.requested` to any peer. The client only sees `call.aborted` for the root
ID it sent; the server cascades internally to descendants.
The exception is `from_call` ops, which generate their own wire ID when
forwarding to the remote node (the remote node's `PendingRequestMap` indexes
it).
### Implementation
The abort cascade needs access to the `PendingRequestMap` to walk the tree.
The `CallAdapter` holds the `PendingRequestMap` (or a reference to it). The
cascade logic:
```rust
pub struct AbortCascade {
    // Access to PendingRequestMap for tree walking.
    // The map indexes entries by request_id, and each entry knows its parent_request_id
    // (from OperationContext, stored when the entry was registered).
}

impl AbortCascade {
    /// Cascade an abort from the given request ID to all non-terminal descendants.
    /// Returns the list of request IDs that were aborted (for logging/auditing).
    pub fn cascade_abort(&self, root_request_id: &str, policy: AbortPolicy) -> Vec<String> {
        todo!("walk descendants of root_request_id and abort them per `policy`")
    }

    /// Find all descendants of a request ID in the call tree.
    fn find_descendants(&self, parent_id: &str) -> Vec<String> {
        todo!("walk the parent_request_id index downward from parent_id")
    }
}
```
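One way these signatures could be filled in, as a self-contained sketch. Assumptions: `PendingRequestMap` is replaced by a plain `HashMap` from `request_id` to a hypothetical `Node` holding `parent_request_id` and a `started` flag, terminal entries have already been removed, and "aborting" an entry just removes it (the real map would also notify and drop the entry's channel). `cascade_abort` takes `&mut self` here for simplicity; the `&self` interface above would imply interior mutability such as a lock around the map.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Hypothetical stand-in for a PendingRequestMap entry.
pub struct Node {
    pub parent_request_id: Option<String>,
    pub started: bool,
}

pub struct AbortCascade {
    /// Non-terminal entries only, keyed by request_id.
    pub pending: HashMap<String, Node>,
}

impl AbortCascade {
    /// Cascade an abort from `root_request_id` to its non-terminal descendants.
    /// Returns the aborted IDs, sorted so logs and tests are deterministic.
    /// An unknown root is a no-op (the `call.aborted` is silently discarded).
    pub fn cascade_abort(&mut self, root_request_id: &str, policy: AbortPolicy) -> Vec<String> {
        if !self.pending.contains_key(root_request_id) {
            return Vec::new();
        }
        let mut aborted: Vec<String> = self
            .find_descendants(root_request_id)
            .into_iter()
            .filter(|id| match policy {
                AbortPolicy::AbortDependents => true,
                AbortPolicy::ContinueRunning => !self.pending[id].started,
            })
            .collect();
        for id in &aborted {
            self.pending.remove(id);
        }
        aborted.sort();
        aborted
    }

    /// Collect all descendants of `parent_id` with an explicit stack (no recursion).
    fn find_descendants(&self, parent_id: &str) -> Vec<String> {
        let mut out = Vec::new();
        let mut stack = vec![parent_id.to_string()];
        while let Some(current) = stack.pop() {
            for (id, node) in &self.pending {
                if node.parent_request_id.as_deref() == Some(current.as_str()) {
                    stack.push(id.clone());
                    out.push(id.clone());
                }
            }
        }
        out
    }
}
```

Enforcing "no new descendants start" under `ContinueRunning` would need a separate check at registration time and is not shown here.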
### Storing parent_request_id in PendingRequestMap
The `PendingRequestMap` needs to know the `parent_request_id` for each entry to
walk the tree. This means `PendingEntry` needs to store the parent ID (or the
full `OperationContext`):
```rust
enum PendingEntry {
    Call {
        tx: oneshot::Sender<Result<Value, CallError>>,
        timeout: Instant,
        parent_request_id: Option<String>, // for abort cascade tree
    },
    Subscribe {
        tx: mpsc::Sender<Result<Value, CallError>>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>, // for abort cascade tree
    },
}
```
Update the `PendingRequestMap` (from the pending-request-map task) to store
`parent_request_id` when registering entries. The `register_call` and
`register_subscribe` methods take an optional `parent_request_id` parameter.
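A sketch of the registration side, assuming std `mpsc` channels stand in for the async `oneshot`/`mpsc` senders and `String` stands in for `Value` and `CallError`. The `ttl` parameters and the `children_of` query are illustrative assumptions, not the crate's confirmed signatures:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Stand-in for `Result<Value, CallError>`.
pub type Reply = Result<String, String>;

// `tx` and `timeout` are stored but not read in this sketch.
#[allow(dead_code)]
enum PendingEntry {
    Call {
        tx: mpsc::SyncSender<Reply>, // stand-in for oneshot::Sender
        timeout: Instant,
        parent_request_id: Option<String>,
    },
    Subscribe {
        tx: mpsc::Sender<Reply>,
        timeout: Option<Instant>,
        parent_request_id: Option<String>,
    },
}

impl PendingEntry {
    fn parent_request_id(&self) -> Option<&str> {
        match self {
            PendingEntry::Call { parent_request_id, .. }
            | PendingEntry::Subscribe { parent_request_id, .. } => parent_request_id.as_deref(),
        }
    }
}

#[derive(Default)]
pub struct PendingRequestMap {
    entries: HashMap<String, PendingEntry>,
}

impl PendingRequestMap {
    /// Register a unary call; `parent_request_id` is `None` for a root (wire) call.
    pub fn register_call(
        &mut self,
        request_id: &str,
        ttl: Duration,
        parent_request_id: Option<&str>,
    ) -> mpsc::Receiver<Reply> {
        let (tx, rx) = mpsc::sync_channel(1);
        let entry = PendingEntry::Call {
            tx,
            timeout: Instant::now() + ttl,
            parent_request_id: parent_request_id.map(str::to_string),
        };
        self.entries.insert(request_id.to_string(), entry);
        rx
    }

    /// Register a subscription; the timeout is optional, as in the enum above.
    pub fn register_subscribe(
        &mut self,
        request_id: &str,
        ttl: Option<Duration>,
        parent_request_id: Option<&str>,
    ) -> mpsc::Receiver<Reply> {
        let (tx, rx) = mpsc::channel();
        let entry = PendingEntry::Subscribe {
            tx,
            timeout: ttl.map(|d| Instant::now() + d),
            parent_request_id: parent_request_id.map(str::to_string),
        };
        self.entries.insert(request_id.to_string(), entry);
        rx
    }

    /// Direct children of `parent_id`, sorted; this is what the cascade walk needs.
    pub fn children_of(&self, parent_id: &str) -> Vec<&str> {
        let mut ids: Vec<&str> = self
            .entries
            .iter()
            .filter(|(_, e)| e.parent_request_id() == Some(parent_id))
            .map(|(id, _)| id.as_str())
            .collect();
        ids.sort();
        ids
    }
}
```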
### AbortPolicy propagation
The abort policy is propagated through `OperationEnv::invoke()`:
- `invoke()` has a default implementation that delegates to
  `invoke_with_policy()` with `parent.abort_policy.clone()`, so a composed child
  inherits its parent's policy
- `invoke_with_policy()` takes an explicit policy; use
  `AbortPolicy::ContinueRunning` for long-running work

When cascading:

- `AbortDependents`: abort ALL descendants (started and unstarted)
- `ContinueRunning`: abort only unstarted descendants; started ones continue to
  completion; no new descendants start
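The delegation maps naturally onto a trait default method. A minimal synchronous sketch: the real `invoke()` is presumably async and returns the child's result, while here it returns the child's `OperationContext` so the inherited policy is visible. `LocalEnv`, the `op` parameter, and the `"<parent>-<n>"` ID scheme are hypothetical:

```rust
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub enum AbortPolicy {
    #[default]
    AbortDependents,
    ContinueRunning,
}

#[derive(Clone, Debug)]
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub abort_policy: AbortPolicy,
}

pub trait OperationEnv {
    /// Compose a child call with an explicit policy; returns the child's context.
    fn invoke_with_policy(
        &mut self,
        parent: &OperationContext,
        op: &str,
        policy: AbortPolicy,
    ) -> OperationContext;

    /// Default: the child inherits the composing handler's policy.
    fn invoke(&mut self, parent: &OperationContext, op: &str) -> OperationContext {
        self.invoke_with_policy(parent, op, parent.abort_policy.clone())
    }
}

/// Hypothetical env that numbers children as "<parent>-<n>".
#[derive(Default)]
pub struct LocalEnv {
    next: u32,
}

impl OperationEnv for LocalEnv {
    fn invoke_with_policy(
        &mut self,
        parent: &OperationContext,
        _op: &str,
        policy: AbortPolicy,
    ) -> OperationContext {
        self.next += 1;
        OperationContext {
            request_id: format!("{}-{}", parent.request_id, self.next),
            parent_request_id: Some(parent.request_id.clone()),
            abort_policy: policy,
        }
    }
}
```

Only `invoke_with_policy()` needs implementing; every env gets the inheriting `invoke()` for free, which keeps the policy decision with the composing handler rather than the wire caller.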
Determining "started" vs "unstarted" is tricky. A practical approach:

- A descendant is "started" if its handler has begun executing (the future has
  been polled at least once)
- A descendant is "unstarted" if it's queued but not yet dispatched

This may require tracking dispatch state in `PendingEntry`. A simpler
approximation: under `ContinueRunning`, abort all descendants that haven't sent
a `call.responded` yet (they're still pending). This is conservative and safe,
but it effectively turns `ContinueRunning` into `AbortDependents`, because
terminal descendants are never aborted under either policy.
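If dispatch state is tracked, "polled at least once" can be recorded by wrapping the handler future. A sketch under assumptions: `MarkStarted` is a hypothetical wrapper, the shared `AtomicBool` would live in (or be referenced from) `PendingEntry`, and `noop_waker` exists only to drive the example without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Wraps a handler future and sets `started` the first time it is polled.
pub struct MarkStarted<F> {
    inner: Pin<Box<F>>,
    started: Arc<AtomicBool>,
}

impl<F: Future> MarkStarted<F> {
    pub fn new(inner: F, started: Arc<AtomicBool>) -> Self {
        Self { inner: Box::pin(inner), started }
    }
}

impl<F: Future> Future for MarkStarted<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Idempotent: later polls just store `true` again.
        self.started.store(true, Ordering::Release);
        self.inner.as_mut().poll(cx)
    }
}

/// Demo-only waker that ignores wake-ups.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}
```

Because the inner future is boxed, `MarkStarted` is `Unpin` and needs no pin projection; the cost is one allocation per handler.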
### Handler cleanup
Handlers clean up resources when their call is cancelled. In Rust, the future
is dropped and `Drop` guards release resources (HTTP streams, file handles,
locks). This is a handler-level concern; the protocol's job is to cascade the
abort. See ADR-016.
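The mechanism is plain Rust ownership: a guard moved into a handler's future is dropped when that future is dropped, whether or not it ever ran to completion. A sketch; `ResourceGuard`, `handler`, and the `released` flag are illustrative stand-ins for an HTTP stream, file handle, or lock:

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Illustrative resource whose Drop impl performs the cleanup.
pub struct ResourceGuard {
    released: Arc<AtomicBool>,
}

impl Drop for ResourceGuard {
    fn drop(&mut self) {
        // Real code would close the stream, release the lock, etc.
        self.released.store(true, Ordering::SeqCst);
    }
}

/// A handler future that owns its guard for its whole lifetime.
pub fn handler(released: Arc<AtomicBool>) -> impl Future<Output = ()> {
    let guard = ResourceGuard { released };
    async move {
        let _guard = guard; // held until the future completes or is dropped
        std::future::pending::<()>().await; // never finishes on its own
    }
}
```

Dropping the future returned by `handler` (what a cascade does to an aborted descendant) releases the guard immediately, even though the handler never completed.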
## Acceptance Criteria
- [ ] `PendingEntry` stores `parent_request_id` (Call and Subscribe variants)
- [ ] `register_call` and `register_subscribe` accept optional `parent_request_id`
- [ ] `AbortCascade` struct with `cascade_abort()` method
- [ ] `cascade_abort` walks the tree by `parent_request_id`
- [ ] `AbortDependents`: aborts ALL descendants (started and unstarted)
- [ ] `ContinueRunning`: aborts unstarted descendants, started ones continue
- [ ] `cascade_abort` returns list of aborted request IDs
- [ ] `call.aborted` for an unknown `request_id` is silently discarded
- [ ] Composed child `request_id`s are internal (not sent as `call.requested` to a peer)
- [ ] Client only sees `call.aborted` for the root ID it sent
- [ ] `AbortPolicy` propagated through `OperationEnv::invoke()`
- [ ] Unit test: cascade aborts all descendants under AbortDependents
- [ ] Unit test: cascade aborts only unstarted under ContinueRunning
- [ ] Unit test: unknown request_id → no-op (silently discarded)
- [ ] Unit test: tree with depth 3, abort root → all descendants aborted
- [ ] `cargo test -p alknet-call` succeeds
- [ ] `cargo clippy -p alknet-call` succeeds with no warnings
## References
- docs/architecture/decisions/016-abort-cascade-for-nested-calls.md — ADR-016 (full rationale)
- docs/architecture/crates/call/call-protocol.md — Abort Cascade and Nested Calls section
- docs/architecture/crates/call/operation-registry.md — AbortPolicy, OperationContext.abort_policy
## Notes
> **Read ADR-016 before starting.** The abort cascade walks the call tree
> indexed by parent_request_id in PendingRequestMap. The default policy
> (AbortDependents) aborts everything downstream — this is correct because
> aborted parent work has no consumer. ContinueRunning is the opt-in for
> long-running work. Composed child request_ids are internal — the client only
> sees call.aborted for the root ID. The PendingRequestMap needs to store
> parent_request_id for tree walking — update the pending-request-map task's
> output if needed.
## Summary
Implemented `AbortCascade` in `protocol/abort.rs` per ADR-016. `PendingEntry`
now stores `parent_request_id` (Call and Subscribe) for tree indexing, plus a
`started` flag for dispatch state. `AbortCascade::cascade_abort` walks the call
tree by `parent_request_id` and aborts descendants per `AbortPolicy`
(`AbortDependents` aborts all; `ContinueRunning` aborts only descendants not yet
marked via `mark_started()`). It returns a sorted list of aborted IDs; an unknown
root is silently discarded. 20 unit tests cover depth-3 cascade, mixed
Call/Subscribe, determinism, and both policies (159 total in the call crate,
290+ workspace-wide). Clippy clean. Merged to develop.