docs(architecture): add ADR-016 abort cascade for nested calls, resolve OQ-17
ADR-016 locks the abort cascade model:
- call.aborted cascades to all non-terminal descendants via parent_request_id
- Default policy: abort-dependents (abort everything downstream)
- Opt-in: continue-running (started descendants continue, pending ones abort)
- Server (CallAdapter) discovers descendants and propagates; client sends one abort
- Handlers clean up via Rust async drop semantics (Drop guards)
- parent_indexed map suffices for tree walking; flowgraph is optional prior art

Spec updates:
- call-protocol.md abort cascade section references ADR-016
- OQ-17 resolved, ADR-016 referenced across all call crate specs
- README.md updated: ADRs 001-016, OQ-17 moved to resolved
@@ -7,9 +7,9 @@ last_updated: 2026-06-20
|
||||
|
||||
## Current State
|
||||
|
||||
**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable) and research/reference material. Foundational ADRs (001–015) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), and privilege model with authority context (ADR-015). The alknet-core and alknet-call crate specs are in draft.
|
||||
**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable) and research/reference material. Foundational ADRs (001–016) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), and abort cascade for nested calls (ADR-016). The alknet-core and alknet-call crate specs are in draft.
|
||||
|
||||
**Next step**: Review alknet-call spec documents, then begin implementation. OQ-11 (handler-level auth resolution observability), OQ-15 (call protocol client and adapter contract), and OQ-17 (abort cascade) will be resolved during or before implementation.
|
||||
**Next step**: Review alknet-call spec documents, then begin implementation. OQ-11 (handler-level auth resolution observability) and OQ-15 (call protocol client and adapter contract) will be resolved during or before implementation.
|
||||
|
||||
## Architecture Documents
|
||||
|
||||
@@ -45,6 +45,7 @@ last_updated: 2026-06-20
|
||||
| [013](decisions/013-rust-canonical-implementation.md) | Rust as Canonical Implementation Language | Accepted |
| [014](decisions/014-secret-material-flow-and-capability-injection.md) | Secret Material Flow and Capability Injection | Accepted |
| [015](decisions/015-privilege-model-and-authority-context.md) | Privilege Model and Authority Context | Accepted |
| [016](decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | Accepted |
|
||||
|
||||
## Open Questions
|
||||
|
||||
@@ -59,6 +60,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
|
||||
- **OQ-08**: Vault integration — CLI-embedded, assembly-layer only (ADR-008, ADR-014)
|
||||
- **OQ-16**: Safe vault operations for call protocol exposure — none for now (ADR-014)
|
||||
- **OQ-18**: Privilege model — `internal` = authority switch, External/Internal visibility, handler identity + scoped env (ADR-015)
|
||||
- **OQ-17**: Abort cascade — `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in (ADR-016)
|
||||
|
||||
**Resolved two-way doors:**
|
||||
- **OQ-04**: Dynamic handler registration — static at startup (ADR-010)
|
||||
@@ -72,7 +74,6 @@ See [open-questions.md](open-questions.md) for the full tracker.
|
||||
|
||||
**Open one-way doors (need ADR before implementation):**
|
||||
- **OQ-15**: Call protocol client and adapter contract — alknet-call needs both the server (CallAdapter) and client (call invocation over QUIC), plus the adapter contract traits (from_*, to_*) that enable composition. ADR-014 constrains the adapter contract: adapters take credential sources from the assembly layer, not static tokens. ADR-015 constrains: adapter-registered operations are `Internal` by default.
|
||||
- **OQ-17**: Abort cascade semantics — `call.aborted` cascades to descendants. Default `abort-dependents`, `continue-running` opt-in. One-way door on the event schema; mechanism is a two-way door.
|
||||
- **OQ-19**: Session-scoped operation registries — agent-written operations in a quickjs sandbox, overlaid on the global registry via `OperationEnv` trait layering. Protocol doesn't need changes; the one-way door is not closing the trait-based composition point. Promotion from session to core requires curation review.
|
||||
|
||||
**Deferred (not active):**
|
||||
|
||||
@@ -30,6 +30,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
|
||||
| [012](../../decisions/012-call-protocol-stream-model.md) | Call Protocol Stream Model | Bidirectional streams, EventEnvelope, ID-based correlation |
| [014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Secret Material Flow and Capability Injection | Call protocol carries no secret material; capabilities injected at assembly layer |
| [015](../../decisions/015-privilege-model-and-authority-context.md) | Privilege Model and Authority Context | `internal` = authority switch not ACL skip; External/Internal visibility; handler identity + scoped env |
| [016](../../decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
|
||||
|
||||
## Relevant Open Questions
|
||||
|
||||
@@ -40,7 +41,6 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
|
||||
| OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design |
| OQ-15 | Call protocol client and adapter contract | open | ADR-014 constrains adapters: credential sources, not static tokens. ADR-015: adapter ops are Internal by default |
| OQ-16 | Safe vault operations for call protocol exposure | resolved (ADR-014) | None exposed for now |
| OQ-17 | Abort cascade semantics | open | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in. One-way door on event schema |
| OQ-19 | Session-scoped operation registries | open | Agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; one-way door is not closing the trait-based composition point |
|
||||
|
||||
## Key Design Principles
|
||||
@@ -52,5 +52,5 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
|
||||
5. **irpc is one dispatch backend**: Local operations dispatch directly. irpc service calls (in-process, type-safe) are internal. The call protocol is the external interface.
|
||||
6. **Local dispatch only**: The operation registry dispatches to local handlers. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer, not a modification to alknet-call's path format.
|
||||
7. **No secret material on the wire**: The call protocol carries no private keys, API keys, mnemonics, or decrypted credentials. Handlers receive outbound credentials through `OperationContext.capabilities`, injected at the assembly layer. See ADR-014.
|
||||
8. **Abort cascades to descendants**: `call.aborted` for a parent request cascades to all non-terminal descendants. Default `abort-dependents`; `continue-running` opt-in. See OQ-17.
|
||||
8. **Abort cascades to descendants**: `call.aborted` for a parent request cascades to all non-terminal descendants. Default `abort-dependents`; `continue-running` opt-in. See ADR-016.
|
||||
9. **Internal calls switch authority context, not skip ACL**: The `internal` flag marks composition-originated calls. ACL runs against the handler's identity, not the caller's and not as a blanket skip. Operations have External/Internal visibility. Scoped composition env bounds reachability. See ADR-015.
|
||||
@@ -273,13 +273,13 @@ Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `
|
||||
|
||||
### Abort Cascade and Nested Calls
|
||||
|
||||
When a handler composes other operations via `OperationEnv::invoke()`, it creates a call tree: a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own children. The `parent_request_id` field on `OperationContext` records this tree.
|
||||
When a handler composes other operations via `OperationEnv::invoke()`, it creates a call tree: a parent request (r1) spawns children (r1-a, r1-b), which may spawn their own children. The `parent_request_id` field on `OperationContext` records this tree — it is the agency chain (ADR-015).
|
||||
|
||||
When `call.aborted` arrives for a parent request, the protocol cascades the abort to all non-terminal descendants in the tree. The default policy is **`abort-dependents`**: aborting a request aborts everything downstream, regardless of branch. This is the correct default because aborted parent work has no consumer waiting for results — continuing is wasted work at best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running after the caller stopped caring).
|
||||
When `call.aborted` arrives for a parent request, the protocol cascades the abort to all non-terminal descendants in the tree. The CallAdapter walks the tree (indexed by `parent_request_id` in `PendingRequestMap`) and sends `call.aborted` for each descendant. The default policy is **`abort-dependents`**: aborting a request aborts everything downstream, regardless of branch. This is the correct default because aborted parent work has no consumer waiting for results — continuing is wasted work at best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running after the caller stopped caring).
|
||||
|
||||
An opt-in **`continue-running`** policy is available for cases where long-running work should survive a parent's abort (e.g., a subscription that should keep streaming). The caller or handler specifies the policy at call time.
|
||||
An opt-in **`continue-running`** policy is available for cases where long-running work should survive a parent's abort (e.g., a subscription that should keep streaming). Under `continue-running`, descendants that have already started continue to completion; descendants that haven't started yet are aborted; no new descendants start.
|
||||
|
||||
The one-way door is the protocol event schema: `call.aborted` must carry cascade semantics before implementation, because retrofitting cascade onto a non-cascading abort is a breaking protocol change. The mechanism — how the runtime discovers descendants and propagates cancellation (cancellation tokens, parent-indexed map, or a separate graph structure) — is a two-way door for implementation. See OQ-17.
|
||||
Handlers clean up resources when their call is cancelled (in Rust, the future is dropped and `Drop` guards release resources — HTTP streams, file handles, locks). This is a handler-level concern; the protocol's job is to cascade the abort. See ADR-016.
|
||||
|
||||
## Constraints
|
||||
|
||||
@@ -289,7 +289,7 @@ The one-way door is the protocol event schema: `call.aborted` must carry cascade
|
||||
- The call protocol is transport-agnostic at the envelope level. The `EventEnvelope` framing can run over QUIC streams, WebSocket frames, or Worker `postMessage`. The `CallAdapter` is the QUIC-specific implementation.
|
||||
- `OperationEnv::invoke()` dispatches through the local registry. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer. See ADR-005 and OQ-13.
|
||||
- **The call protocol carries no secret material.** Secret material (private keys, API keys, mnemonics, decrypted credentials, raw tokens) must not appear in `call.requested` payloads, `call.responded` payloads, or `OperationContext.metadata`. The wire format carries `serde_json::Value` and cannot enforce this at the type level — the constraint is architectural, enforced by the operation registry and by convention. Operations that need to share public key material use a dedicated operation that returns only the public component. See ADR-014.
|
||||
- **Abort cascades to descendants.** `call.aborted` for a parent request cascades to all non-terminal descendants in the call tree. Default policy is `abort-dependents`; `continue-running` is an opt-in. See OQ-17.
|
||||
- **Abort cascades to descendants.** `call.aborted` for a parent request cascades to all non-terminal descendants in the call tree. Default policy is `abort-dependents`; `continue-running` is an opt-in. See ADR-016.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
@@ -302,6 +302,7 @@ The one-way door is the protocol event schema: `call.aborted` must carry cascade
|
||||
| Vault integration point | [ADR-008](../../decisions/008-secret-service-integration.md) | Vault is a capability source, accessed at assembly time |
| Secret material flow | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Call protocol carries no secret material; capabilities injected at assembly layer |
| Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; handler identity + scoped env |
| Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
|
||||
|
||||
## Open Questions
|
||||
|
||||
@@ -311,7 +312,6 @@ See [open-questions.md](../../open-questions.md) for full details.
|
||||
- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
|
||||
- **OQ-15** (open): Call protocol client and adapter contract. ADR-014 constrains the adapter contract: adapters take credential sources from the assembly layer, not static tokens. ADR-015 constrains: adapter-registered operations are `Internal` by default.
|
||||
- **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
|
||||
- **OQ-17** (open): Abort cascade semantics — `call.aborted` cascades to descendants, default `abort-dependents`, `continue-running` opt-in. One-way door on the event schema; mechanism is a two-way door.
|
||||
- **OQ-19** (open): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes.
|
||||
|
||||
## References
|
||||
|
||||
@@ -323,7 +323,6 @@ See [open-questions.md](../../open-questions.md) for full details.
|
||||
- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
|
||||
- **OQ-15** (open): Call protocol client and adapter contract. ADR-014 constrains the adapter contract: adapters take credential sources from the assembly layer, not static tokens. ADR-015 constrains: adapter-registered operations are `Internal` by default.
|
||||
- **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
|
||||
- **OQ-17** (open): Abort cascade semantics — `call.aborted` cascades to descendants, default `abort-dependents`, `continue-running` opt-in. One-way door on the event schema; mechanism is a two-way door.
|
||||
- **OQ-19** (open): Session-scoped operation registries — agent-written operations overlaid on the global registry via `OperationEnv` trait layering. Protocol doesn't need changes; one-way door is not closing the trait-based composition point.
|
||||
|
||||
## References
|
||||
|
||||
@@ -0,0 +1,195 @@
|
||||
# ADR-016: Abort Cascade for Nested Calls
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
The call protocol allows handlers to compose other operations through
`OperationEnv::invoke()`. This creates a call tree: a parent request spawns
children (via `parent_request_id`), which may spawn their own children. The
tree is the agency chain (ADR-015): principal delegates to agent, agent may
delegate to sub-agent.

When `call.aborted` arrives for a parent request, the current `PendingRequestMap`
removes only that single entry. The children are unaware: they continue running,
consuming resources, and potentially producing side effects. This is the nested
abort problem:
|
||||
|
||||
```
Client calls /agent/chat (r1)
  agent handler calls /fs/readFile via env.invoke (r1-a)
    fs handler calls /db/query via env.invoke (r1-a-1)
  agent handler calls /bash/exec via env.invoke (r1-b)

Client aborts r1 (call.aborted { id: "r1" })
  → r1 removed from PendingRequestMap
  → r1-a, r1-a-1, r1-b continue running (ghost work)
  → bash/exec keeps executing (unwanted side effect)
  → db/query keeps running (wasted resources)
  → results produced that nobody consumes
```
|
||||
|
||||
The `@alkdev/flowgraph` TypeScript package solved this with a directed graph
that tracks the call tree and a `FailurePolicy` enum:

- `"abort-dependents"`: aborting a node cascades to all non-terminal descendants.
  This is the "whole tree should abort" behavior.
- `"continue-running"`: only idle/waiting dependents are aborted; started ones
  keep going. New ones don't start because their predecessors failed/aborted.
|
||||
|
||||
The agent use case makes this concrete and urgent: an LLM composes deep, dynamic
call trees (parallel tools, sequential tools, sub-agents calling sub-tools).
Aborting a chat should tear down the entire tree: the LLM HTTP stream, all tool
calls, all sub-calls. But this is a protocol-level concern, not an agent feature:
every consumer (NAPI adapter, Python adapter, any service speaking EventEnvelope)
inherits whatever abort model the protocol defines. The call protocol is a
general-purpose cross-boundary RPC mechanism; nested composition is a core
protocol feature, and abort semantics for that composition are protocol semantics.
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. `call.aborted` cascades to descendants
|
||||
|
||||
When `call.aborted` arrives for a request, the protocol cascades the abort to
all non-terminal descendants in the call tree (identified via `parent_request_id`).
Each descendant receives a `call.aborted` event. The `PendingRequestMap` removes
all affected entries.

The cascade is protocol-level: the event schema carries cascade semantics. A
`call.aborted` for a parent implies abort of all descendants. This is not a
client-side convention; the server (CallAdapter) is responsible for discovering
descendants and propagating the abort.
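
A minimal sketch of the server-side cascade described above, assuming the `PendingRequestMap` can be modeled as a plain `HashMap` keyed by request id. The `Pending` struct, `demo_map`, and `abort_cascade` are hypothetical names used only for illustration, and the scan-based child lookup is just one possible mechanism, not the spec's chosen one.

```rust
use std::collections::HashMap;

/// Hypothetical pending entry; only the parent link matters for this sketch.
struct Pending {
    parent_request_id: Option<String>,
}

/// Build the call tree from the Context example plus one unrelated request (r2).
fn demo_map() -> HashMap<String, Pending> {
    let mut pending = HashMap::new();
    for (id, parent) in [
        ("r1", None),
        ("r1-a", Some("r1")),
        ("r1-a-1", Some("r1-a")),
        ("r1-b", Some("r1")),
        ("r2", None),
    ] {
        let entry = Pending { parent_request_id: parent.map(String::from) };
        pending.insert(id.to_string(), entry);
    }
    pending
}

/// Remove `root` and every descendant from the map. Returns the descendant
/// ids, i.e. the requests that would each be sent their own `call.aborted`.
fn abort_cascade(pending: &mut HashMap<String, Pending>, root: &str) -> Vec<String> {
    let mut aborted = Vec::new();
    pending.remove(root);
    let mut frontier = vec![root.to_string()];
    while let Some(parent) = frontier.pop() {
        // Scan for entries whose parent_request_id points at `parent`.
        let kids: Vec<String> = pending
            .iter()
            .filter(|(_, e)| e.parent_request_id.as_deref() == Some(parent.as_str()))
            .map(|(id, _)| id.clone())
            .collect();
        for kid in kids {
            pending.remove(&kid);
            aborted.push(kid.clone());
            frontier.push(kid);
        }
    }
    aborted
}

fn main() {
    let mut pending = demo_map();
    let aborted = abort_cascade(&mut pending, "r1");
    println!("cascaded to {:?}, {} entry left", aborted, pending.len());
}
```

The client still sends a single abort for `r1`; the returned list is what the server would fan out to, which matches the point that the client never needs to know the full tree.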
|
||||
|
||||
### 2. Default policy: `abort-dependents`
|
||||
|
||||
The default policy is `abort-dependents`: aborting a request aborts everything
downstream, regardless of branch. This is the correct default because aborted
parent work has no consumer waiting for results: continuing is wasted work at
best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running
after the caller stopped caring, a DB mutation that completes after the
transaction was aborted).
|
||||
|
||||
### 3. Opt-in policy: `continue-running`
|
||||
|
||||
An opt-in `continue-running` policy is available for cases where long-running
work should survive a parent's abort. Under `continue-running`:
- Descendants that have already started (status: running) continue to completion.
- Descendants that haven't started yet (status: pending/waiting) are aborted
  (their predecessors failed, so they can't proceed).
- No new descendants start (the parent is gone).

Use cases for `continue-running`: a long-running subscription that should keep
streaming after its parent's sibling failed; a background task that was spawned
by a handler and should survive the handler's abort.

The caller or handler specifies the policy at call time. The specific mechanism
(a field in the `call.requested` payload, a field on `OperationContext`, or a
per-operation declaration) is a two-way door for implementation.
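
The two policies can be read as a decision table over a descendant's status. The sketch below assumes hypothetical `AbortPolicy` and `CallStatus` enums (neither is defined by this ADR) and deliberately leaves open where the policy value comes from, since that is the two-way door.

```rust
/// Hypothetical policy type mirroring the two policies named in this ADR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

impl Default for AbortPolicy {
    /// `abort-dependents` is the default policy.
    fn default() -> Self {
        AbortPolicy::AbortDependents
    }
}

/// Hypothetical lifecycle states for a descendant call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallStatus {
    Pending,
    Running,
    Completed,
}

/// Decide whether a descendant is aborted when one of its ancestors is aborted.
fn should_abort(policy: AbortPolicy, status: CallStatus) -> bool {
    match (policy, status) {
        // Terminal calls are never touched by the cascade.
        (_, CallStatus::Completed) => false,
        // Default: every non-terminal descendant is aborted.
        (AbortPolicy::AbortDependents, _) => true,
        // Opt-in: descendants that have not started are aborted...
        (AbortPolicy::ContinueRunning, CallStatus::Pending) => true,
        // ...while descendants that already started run to completion.
        (AbortPolicy::ContinueRunning, CallStatus::Running) => false,
    }
}

fn main() {
    for status in [CallStatus::Pending, CallStatus::Running, CallStatus::Completed] {
        println!(
            "{:?}: default={} continue-running={}",
            status,
            should_abort(AbortPolicy::default(), status),
            should_abort(AbortPolicy::ContinueRunning, status)
        );
    }
}
```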
|
||||
|
||||
### 4. Cleanup hooks
|
||||
|
||||
When a call is aborted, handlers need a mechanism to clean up resources: cancel
an HTTP stream, cancel a honker queue job, close a file handle, release a lock.
The protocol provides this through the call lifecycle: when a call is aborted,
the handler's task is cancelled (in Rust, the future is dropped). Cleanup is
handled by `Drop` implementations on resource guards, or by explicit
cancellation callbacks if the handler registers them.

This is a handler-level concern, not a protocol-level one. The protocol's job is
to cascade the abort; the handler's job is to clean up when cancelled. The
mechanism (tokio `CancellationToken`, `Drop` guards, explicit callbacks) is a
two-way door for implementation.
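
A minimal, runtime-free sketch of the `Drop`-guard option, assuming a hypothetical `ResourceGuard` that flips a flag in place of real cleanup. To stay within the standard library, the handler future here is dropped before it is ever polled; a real runtime would typically drop it while suspended at an await point, and values owned by the future are dropped in both cases.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical resource guard; a real one would close a stream or release a lock.
struct ResourceGuard {
    released: Arc<AtomicBool>,
}

impl Drop for ResourceGuard {
    fn drop(&mut self) {
        // Runs whenever the guard goes away, including when the future
        // that owns it is dropped.
        self.released.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical handler body: owns the guard and then waits forever.
async fn handler(guard: ResourceGuard) {
    let _guard = guard;
    std::future::pending::<()>().await;
}

/// Create the handler future, "abort" it by dropping it, and report
/// whether the guard's cleanup ran.
fn run_and_abort() -> bool {
    let released = Arc::new(AtomicBool::new(false));
    let fut = handler(ResourceGuard { released: Arc::clone(&released) });
    drop(fut); // stands in for the runtime dropping the cancelled handler
    released.load(Ordering::SeqCst)
}

fn main() {
    println!("guard released after abort: {}", run_and_abort());
}
```

This only covers resources owned by the dropped future. Work that has been handed off elsewhere (for example a detached task or an external process) would need one of the explicit mechanisms mentioned above.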
|
||||
|
||||
### 5. The call tree is tracked via `parent_request_id`
|
||||
|
||||
The call tree is already recorded: `OperationContext.parent_request_id` links
each call to its parent. The cascade mechanism walks this tree to find
descendants. No separate graph structure is required at the protocol level;
the `PendingRequestMap` can index entries by `parent_request_id` to enable
efficient descendant lookup.

The `@alkdev/flowgraph` package (directed graph with `descendants()`,
reactive status propagation, `FailurePolicy`) is prior art and may be adapted
as a separate Rust crate for consumers that need richer call-tree visualization
or reactive status tracking. It is not required for the protocol-level cascade:
a parent-indexed map suffices.
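
One way to picture the parent-indexed lookup is a map from each parent id to its direct children, walked breadth-first. `ParentIndex`, `link`, and `descendants` below are hypothetical names for illustration; how such an index would be embedded in `PendingRequestMap` is not specified here.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a parent index kept alongside the pending map.
#[derive(Default)]
struct ParentIndex {
    /// parent request id -> ids of its direct children
    children: HashMap<String, Vec<String>>,
}

impl ParentIndex {
    /// Record that `child` was invoked on behalf of `parent`.
    fn link(&mut self, parent: &str, child: &str) {
        self.children
            .entry(parent.to_string())
            .or_default()
            .push(child.to_string());
    }

    /// Breadth-first walk collecting every descendant of `root`
    /// (the root itself is not included).
    fn descendants(&self, root: &str) -> Vec<String> {
        let mut found = Vec::new();
        let mut queue: VecDeque<&str> = VecDeque::from([root]);
        while let Some(id) = queue.pop_front() {
            if let Some(kids) = self.children.get(id) {
                for kid in kids {
                    found.push(kid.clone());
                    queue.push_back(kid.as_str());
                }
            }
        }
        found
    }
}

fn main() {
    let mut index = ParentIndex::default();
    index.link("r1", "r1-a");
    index.link("r1-a", "r1-a-1");
    index.link("r1", "r1-b");
    println!("{:?}", index.descendants("r1"));
}
```

Because each call has a single parent, the walk never revisits a node, so no visited set is needed; that would change if composition ever produced a DAG.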
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
- No ghost work. Aborting a parent call tears down the entire tree. Resources
  are released, side effects are halted, no results are produced for absent
  consumers.
- The default (`abort-dependents`) matches the intuitive expectation: if I
  stop caring about the parent, I stop caring about everything it spawned.
- The opt-in (`continue-running`) covers the legitimate exception (long-running
  work that should survive) without making it the default.
- The protocol carries cascade semantics, so every consumer inherits the
  correct behavior; no consumer needs to implement its own abort propagation.
- The `parent_request_id` chain already exists; the cascade mechanism is an
  index on it, not a new data structure.
- Cleanup hooks are handled by Rust's async drop semantics: dropping the
  handler's future cancels it, and `Drop` guards release resources. This is
  idiomatic Rust, not a custom mechanism.
|
||||
|
||||
**Negative:**
- The `PendingRequestMap` needs a parent-indexed lookup (a
  `HashMap<String, Vec<String>>` from parent_request_id to child request_ids,
  or a scan). This is a minor implementation cost, not a protocol change.
- The `call.aborted` event schema carries cascade semantics, so clients that
  don't understand cascade (future versions, other implementations) would
  need to handle it. Mitigated: cascade is server-side (the CallAdapter walks
  the tree and sends `call.aborted` per descendant), so clients see individual
  abort events regardless of whether they understand the cascade concept.
- The `continue-running` policy adds a parameter to the call lifecycle. The
  specific location (payload field, context field, per-operation declaration)
  is a two-way door, but the existence of the policy is a one-way commitment.
|
||||
|
||||
## Assumptions
|
||||
|
||||
1. **Aborting a parent should abort descendants by default.** If the default
   should be `continue-running` (descendants survive), this ADR is wrong. The
   assumption is that ghost work is worse than premature cancellation: a
   cancelled descendant can be retried, but a ghost process consuming
   resources and producing unwanted side effects is harder to recover from.

2. **The server (CallAdapter) is responsible for cascade.** The client sends
   `call.aborted` for one request ID; the server discovers descendants and
   propagates. If the client were responsible for cascading, it would need to
   know the full tree, which it may not (server-side composition creates
   children the client never saw).

3. **`parent_request_id` is sufficient to discover descendants.** The call tree
   is a tree (acyclic, single parent per node). If future composition patterns
   create multi-parent relationships (e.g., a shared subcall invoked by two
   parents), the cascade model needs extension. The assumption is that
   composition creates a tree, not a DAG.

4. **Dropping the handler's future is sufficient for cleanup.** Rust's async
   drop semantics cancel the future and run `Drop` guards. If a use case
   requires explicit cleanup callbacks (e.g., external systems that need a
   signal), the mechanism needs extension. The assumption is that `Drop`
   guards cover the common cases (HTTP stream cancellation, file handle
   release, lock release).

5. **`continue-running` is per-call, not per-operation.** The policy is
   specified at call time, not declared at registration. If the policy should
   be a static property of the operation (declared in `OperationSpec`), the
   model changes. The assumption is that the caller or handler decides at call
   time based on the specific context.
|
||||
|
||||
## References
|
||||
|
||||
- ADR-012: Call protocol stream model (bidirectional streams, EventEnvelope,
  ID-based correlation)
- ADR-015: Privilege model (the call tree is the agency chain;
  `parent_request_id` traces principal → agent)
- OQ-17: Abort cascade semantics (resolved by this ADR)
- OQ-19: Session-scoped registries (session-scoped operations are in the call
  tree and participate in cascade)
- `@alkdev/flowgraph` TypeScript package: prior art for call-graph tracking
  with `descendants()`, `FailurePolicy`, reactive status propagation
- [call-protocol.md](../crates/call/call-protocol.md)
- [operation-registry.md](../crates/call/operation-registry.md)
|
||||
@@ -186,17 +186,11 @@ These questions are acknowledged but not active. They will be promoted to open w
|
||||
### OQ-17: Abort Cascade Semantics for Nested Calls
|
||||
|
||||
- **Origin**: [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
|
||||
- **Status**: open
|
||||
- **Status**: resolved
|
||||
- **Door type**: One-way (protocol schema), two-way (mechanism)
|
||||
- **Priority**: high
|
||||
- **Resolution**: When a handler composes other operations via `OperationEnv::invoke()`, it creates a call tree (parent → children via `parent_request_id`). When `call.aborted` arrives for a parent request, the protocol cascades the abort to all non-terminal descendants in the tree. The default policy is `abort-dependents`: aborting a request aborts everything downstream, regardless of branch. This is the correct default because aborted parent work has no consumer waiting for results — continuing is wasted work at best and unwanted side effects at worst (e.g., a `bash/exec` that keeps running after the caller stopped caring). An opt-in `continue-running` policy is available for cases where long-running work should survive a parent's abort (e.g., a subscription that should keep streaming).
|
||||
|
||||
The one-way door is the protocol event schema: `call.aborted` must carry cascade semantics before implementation, because retrofitting cascade onto a non-cascading abort is a breaking protocol change (existing clients send `call.aborted` for one ID, the server processes one ID). The mechanism — how the runtime discovers descendants and propagates cancellation (cancellation tokens propagated through `OperationContext`, a parent-indexed map in `PendingRequestMap`, or a separate graph structure consuming call events) — is a two-way door for implementation. The `@alkdev/flowgraph` TypeScript package demonstrates a reactive call-graph approach (directed graph with `descendants()`, `FailurePolicy: "abort-dependents" | "continue-running"`, signal-based status propagation); a Rust adaptation could use `petgraph` for the graph structure or tokio `CancellationToken` for a simpler implicit tree. The flowgraph may live as a separate crate consuming call events (as the TS version does), not necessarily inside alknet-call.
|
||||
|
||||
This is a protocol-level concern, not specific to any single consumer. The call protocol is a general-purpose cross-boundary RPC mechanism — every consumer (NAPI adapter, Python adapter, agent service, future services) inherits whatever abort model is locked in. Nested composition is a core protocol feature, not an agent feature. The agent use case makes the deep/dynamic call tree case concrete, but the abort cascade problem exists for any handler that composes other operations.
|
||||
|
||||
This OQ will be resolved with an ADR before alknet-call implementation begins.
|
||||
- **Cross-references**: ADR-012, [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
|
||||
- **Resolution**: `call.aborted` cascades to all non-terminal descendants in the call tree. The CallAdapter walks the tree (indexed by `parent_request_id` in `PendingRequestMap`) and sends `call.aborted` for each descendant. Default policy is `abort-dependents` (abort everything downstream); `continue-running` is an opt-in for long-running work that should survive a parent's abort. Handlers clean up via Rust's async drop semantics (future dropped → `Drop` guards release resources). The cascade is protocol-level (server discovers descendants and propagates); the mechanism (parent-indexed map, cancellation tokens, or a separate graph) is a two-way door. See ADR-016.
|
||||
- **Cross-references**: ADR-012, ADR-015, ADR-016, [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
|
||||
|
||||
### OQ-18: Privilege Model and Authority Context
|
||||
|
||||
|
||||
@@ -205,6 +205,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
| [013](decisions/013-rust-canonical-implementation.md) | Rust as Canonical Implementation Language | Rust canonical, TypeScript reference adaptation |
| [014](decisions/014-secret-material-flow-and-capability-injection.md) | Secret Material Flow and Capability Injection | Capabilities carry outbound credentials; call protocol carries no secret material |
| [015](decisions/015-privilege-model-and-authority-context.md) | Privilege Model and Authority Context | `internal` = authority switch not ACL skip; External/Internal visibility; handler identity + scoped env |
| [016](decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
|
||||
|
||||
## Open Questions
|
||||
|
||||
|
||||