specify failure propagation semantics (C-04)

Key design decisions:
- Failure follows dependency edges, not structural scope
- Parallel branches are independent: failure in one branch
  doesn't cancel sibling branches
- blockedByFailure computed signal detects failed/aborted predecessors
- Conditionals serve as error boundaries (caught failures redirect
  to else branch, uncaught failures cascade)
- aborted nodes don't satisfy preconditions; skipped nodes do
- abortAll() for systemic failures (provider outage, auth failure)

Changes:
- reactive-execution.md: new Failure Propagation section with
  sequential/parallel/join/conditional examples, blockedByFailure
  effect, partial success model
- host-configs.md: add blockedByFailure to WorkflowNode, update
  status propagation effects, replace cascadeAbort with abortAll
- schema.md: document precondition semantics for NodeStatus
- build-distribution.md + README.md: add blockedByFailure to
  node-status.ts comments
- review checklist: mark C-04 resolved
2026-05-19 11:36:55 +00:00
parent c5e649cc9f
commit 1dbaccbde3
6 changed files with 280 additions and 65 deletions


@@ -172,7 +172,8 @@ interface WorkflowNode {
key: string; // Operation name or structural container ID
type: "operation" | "sequential" | "parallel" | "conditional" | "map";
status: Signal<NodeStatus>; // Reactive status signal
-  preconditions: Computed<boolean>; // Computed: true when all preconditions are met
+  preconditions: Computed<boolean>; // Computed: true when all preconditions are met
+  blockedByFailure: Computed<boolean>; // Computed: true when any predecessor failed/aborted (uncaught)
operationId?: string; // For operation nodes: the fully qualified ID
output?: Signal<unknown>; // For operation nodes: the call result (when completed)
children: WorkflowNode[]; // Child nodes (structural containers have children)
@@ -181,7 +182,8 @@ interface WorkflowNode {
Each `WorkflowNode` holds:
- A `signal<NodeStatus>` that tracks the call's lifecycle (`idle` → `waiting` → `ready` → `running` → `completed`/`failed`/`aborted`/`skipped`)
-- A `computed` that derives `preconditions` from parent nodes' statuses
+- A `computed` that derives `preconditions` from parent nodes' statuses (true when all predecessors are `completed` or `skipped`)
+- A `computed` that derives `blockedByFailure` from parent nodes' statuses (true when any predecessor is `failed` or `aborted`)
- An optional `output` signal that holds the call result when completed
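
The two derivations listed above can be sketched as plain predicates over predecessor statuses. This is a minimal illustration, not the repository's implementation: the function names `preconditionsMet` and `isBlockedByFailure` are hypothetical, the real values are `computed` signals, and the caught-versus-uncaught distinction handled by conditionals is not modeled here.

```typescript
// Status values as listed in the lifecycle bullet above.
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical helper: true when every predecessor is `completed` or `skipped`.
function preconditionsMet(predecessors: NodeStatus[]): boolean {
  return predecessors.every((s) => s === "completed" || s === "skipped");
}

// Hypothetical helper: true when any predecessor is `failed` or `aborted`.
function isBlockedByFailure(predecessors: NodeStatus[]): boolean {
  return predecessors.some((s) => s === "failed" || s === "aborted");
}
```

Under these definitions the two predicates are never true together, and a node whose predecessors are still `running` is neither ready nor blocked. A node with no predecessors satisfies its preconditions trivially.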
### ReactiveContext
@@ -243,29 +245,42 @@ function computePreconditions(node: WorkflowNode, ctx: ReactiveContext): boolean
### Status Propagation
-When a node's `status` signal changes, its dependents' `preconditions` computed automatically re-evaluate. If preconditions are met, the node transitions to `ready`:
+When a node's `status` signal changes, its dependents' `preconditions` and `blockedByFailure` computed values automatically re-evaluate. If preconditions are met, the node transitions to `ready`; if blocked by failure, it transitions to `aborted`:
```typescript
+// Start when preconditions are met
 effect(() => {
   if (node.preconditions.value) {
-    node.status.value = "ready";
+    if (node.status.value === "idle" || node.status.value === "waiting") {
+      node.status.value = "ready";
+    }
   }
 });
+
+// Abort when a predecessor fails (uncaught failure propagation)
+effect(() => {
+  if (node.blockedByFailure.value) {
+    if (node.status.value === "idle" || node.status.value === "waiting") {
+      node.status.value = "aborted";
+    }
+  }
+});
```
-The reactive engine then starts the call associated with the node, which sets `status` to `running`, and eventually `completed` or `failed`.
+The reactive engine then starts the call associated with the node (when `ready`), which sets `status` to `running`, and eventually `completed` or `failed`.
+
+**Note**: Failure propagation follows dependency edges, not structural scope. A failed node only causes its downstream dependents (via DAG edges) to abort. Sibling branches in a `Parallel` group are independent and continue running. See [reactive-execution.md](reactive-execution.md) for the full failure propagation model.
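
The edge-following behavior described in the note can be illustrated without a signal library. The sketch below is an assumption-laden stand-in: `SketchNode`, `propagateFailure`, and the node names are hypothetical, and a simple fixed-point loop replaces the reactive effects. It only shows that a failure reaches transitive dependents while an unrelated sibling branch keeps its status.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical node shape: each node lists the IDs it depends on (DAG edges).
interface SketchNode {
  status: NodeStatus;
  dependsOn: string[];
}

// Repeatedly abort pending nodes that have a failed/aborted predecessor,
// until no further status changes occur.
function propagateFailure(nodes: Record<string, SketchNode>): void {
  let changed = true;
  while (changed) {
    changed = false;
    Object.keys(nodes).forEach((id) => {
      const node = nodes[id];
      if (node === undefined) return;
      const pending = node.status === "idle" || node.status === "waiting";
      const blocked = node.dependsOn.some((depId) => {
        const dep = nodes[depId];
        return dep !== undefined && (dep.status === "failed" || dep.status === "aborted");
      });
      if (pending && blocked) {
        node.status = "aborted";
        changed = true;
      }
    });
  }
}

// "audit" stands for a sibling branch with no edge to the failed node.
const nodes: Record<string, SketchNode> = {
  fetch: { status: "failed", dependsOn: [] },
  parse: { status: "waiting", dependsOn: ["fetch"] },
  store: { status: "waiting", dependsOn: ["parse"] },
  audit: { status: "running", dependsOn: [] },
};
propagateFailure(nodes);
```

After the call, `parse` and `store` are `aborted` because they sit downstream of `fetch`, while `audit` is still `running`.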
### Abort Cascading
-When a node is aborted, all its descendants are also aborted:
+System-level abort (e.g., provider outage) aborts the entire workflow:
```typescript
-function cascadeAbort(node: WorkflowNode): void {
-  if (node.status.value === "running" || node.status.value === "ready" || node.status.value === "waiting") {
-    node.status.value = "aborted";
-  }
-  for (const child of node.children) {
-    cascadeAbort(child);
+function abortAll(root: WorkflowReactiveRoot): void {
+  for (const [nodeId, status] of root.statusMap) {
+    if (status.value !== "completed" && status.value !== "failed") {
+      status.value = "aborted";
+    }
   }
 }
```
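
To try `abortAll` in isolation, the following sketch supplies the surrounding types it needs. Everything other than the function body's logic is an assumption: `StatusSignal` is a bare `.value` holder standing in for a real signal, and `WorkflowReactiveRoot` is reduced to the one `statusMap` field the snippet reads.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Assumed stand-in for a reactive signal: only a mutable `.value`.
interface StatusSignal {
  value: NodeStatus;
}

// Assumed minimal shape: only the field used by abortAll is modeled.
interface WorkflowReactiveRoot {
  statusMap: Map<string, StatusSignal>;
}

// Same condition as the snippet above: everything except
// `completed` and `failed` nodes becomes `aborted`.
function abortAll(root: WorkflowReactiveRoot): void {
  root.statusMap.forEach((status) => {
    if (status.value !== "completed" && status.value !== "failed") {
      status.value = "aborted";
    }
  });
}

const root: WorkflowReactiveRoot = {
  statusMap: new Map<string, StatusSignal>([
    ["a", { value: "completed" }],
    ["b", { value: "running" }],
    ["c", { value: "failed" }],
    ["d", { value: "idle" }],
  ]),
};
abortAll(root);
```

After the call, `a` and `c` keep their terminal statuses and `b` and `d` are `aborted`. As written, the condition would also rewrite a `skipped` node to `aborted`; the source does not say whether that is intended.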