flowgraph/docs/architecture/reactive-execution.md
glm-5.1 1dbaccbde3 specify failure propagation semantics (C-04)
Key design decisions:
- Failure follows dependency edges, not structural scope
- Parallel branches are independent: failure in one branch
  doesn't cancel sibling branches
- blockedByFailure computed signal detects failed/aborted predecessors
- Conditionals serve as error boundaries (caught failures redirect
  to else branch, uncaught failures cascade)
- aborted nodes don't satisfy preconditions; skipped nodes do
- abortAll() for systemic failures (provider outage, auth failure)

Changes:
- reactive-execution.md: new Failure Propagation section with
  sequential/parallel/join/conditional examples, blockedByFailure
  effect, partial success model
- host-configs.md: add blockedByFailure to WorkflowNode, update
  status propagation effects, replace cascadeAbort with abortAll
- schema.md: document precondition semantics for NodeStatus
- build-distribution.md + README.md: add blockedByFailure to
  node-status.ts comments
- review checklist: mark C-04 resolved
2026-05-19 11:36:55 +00:00


| status | last_updated |
|---|---|
| draft | 2026-05-19 |

Reactive Execution

Signal-driven status propagation, computed preconditions, and failure propagation for workflow template execution.

Overview

The reactive execution layer bridges workflow template structure (DAG) to runtime behavior (call execution). It uses @preact/signals-core (via ujsx's reactive layer) to create a signal-backed execution model where:

  • Each <Operation> node gets a signal<NodeStatus> tracking its lifecycle state
  • Preconditions are computed<boolean> values that automatically resolve when upstream dependencies complete
  • Failure propagation follows dependency edges — a failed predecessor causes downstream dependents to abort, while independent branches continue running
  • Conditionals can serve as error boundaries, catching failures and redirecting to fallback paths

This layer does NOT execute operations directly. It provides reactive state that the hub coordinator reads and writes. The coordinator calls registry.execute() when a node's preconditions are met, and updates the node's status signal when the call completes or fails.

ReactiveRoot for Workflows

class WorkflowReactiveRoot {
  private statusMap: Map<string, Signal<NodeStatus>>;
  private preconditions: Map<string, Computed<boolean>>;
  private blockedByFailure: Map<string, Computed<boolean>>;
  private graph: DirectedGraph;
  private effectDisposers: (() => void)[];

  constructor(graph: DirectedGraph) {
    this.graph = graph;
    this.statusMap = new Map();
    this.preconditions = new Map();
    this.blockedByFailure = new Map();
    this.effectDisposers = [];
    this.initializeSignals();
  }
}

WorkflowReactiveRoot wraps the reactive state for an entire workflow execution. It takes the structural DAG (from the GraphologyHost) and creates reactive state for each operation node.

initializeSignals()

private initializeSignals(): void {
  for (const node of this.graph.nodes()) {
    const attrs = this.graph.getNodeAttributes(node);
    if (attrs.category !== "operation") continue;  // Skip structural nodes (already flattened)

    const status = signal<NodeStatus>("idle");

    const predecessors = this.graph.inNeighbors(node);

    // Preconditions: all predecessors completed or skipped
    const preconditions = computed(() => {
      return predecessors.every(pred => {
        const predStatus = this.statusMap.get(pred);
        return predStatus && (predStatus.value === "completed" || predStatus.value === "skipped");
      });
    });

    // Blocked by failure: any predecessor failed or aborted (uncaught)
    const blockedByFailure = computed(() => {
      return predecessors.some(pred => {
        const predStatus = this.statusMap.get(pred);
        return predStatus && (predStatus.value === "failed" || predStatus.value === "aborted");
      });
    });

    this.statusMap.set(node, status);
    this.preconditions.set(node, preconditions);
    this.blockedByFailure.set(node, blockedByFailure);
  }
}

For each operation node in the DAG:

  1. Create a signal<NodeStatus> starting at "idle"
  2. Create a computed<boolean> that's true when all predecessor nodes have status "completed" (or "skipped" — a skipped node satisfies its dependents' preconditions)
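  3. Create a computed<boolean> (blockedByFailure) that's true when any predecessor has failed or been aborted; this is what drives the abort cascade
  4. Store the status signal and both computeds in the root's maps, keyed by node ID. The cascade to descendants is not wired here; it is driven by an effect() on blockedByFailure (see Failure Propagation)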

Status lifecycle

The signal-based status lifecycle mirrors CallStatus with workflow-specific additions:

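idle ──→ waiting ──→ ready ──→ running ──→ completed
  │                    ↑          │
  └────────────────────┘          ├──→ failed    (call threw an error)
  root node (no predecessors)     └──→ aborted   (cancelled externally)

idle / waiting ──→ aborted   (cascade from a failed or aborted predecessor, uncaught)
(Conditional branch not taken) ──→ skipped

Terminal states: completed, failed, aborted, skipped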

Full transition rules:

idle        → waiting    (predecessor starts running)
idle        → ready      (no predecessors: root node)
waiting     → ready      (all predecessors completed or skipped)
idle        → aborted    (a predecessor failed or was aborted; failure uncaught)
waiting     → aborted    (a predecessor failed or was aborted; failure uncaught)
ready       → running    (hub starts the call)
running     → completed  (call succeeded)
running     → failed     (call threw an error)
running     → aborted    (call cancelled externally)
(any)       → aborted    (abortAll() on a non-terminal node: systemic failure)
failed      → [terminal] (no further transitions)
aborted     → [terminal] (no further transitions)
skipped     → [terminal] (conditional branch not taken)
completed   → [terminal] (no further transitions)

| Status | Meaning | Signal trigger |
|---|---|---|
| idle | Node just created, no predecessor activity yet | Initial state |
| waiting | At least one predecessor is running; not all have completed or been skipped | Any predecessor status change |
| ready | All predecessors completed or skipped (preconditions met) | preconditions computed resolves to true |
| running | Call executing | Hub sets status.value = "running" |
| completed | Call succeeded | Hub sets status.value = "completed" |
| failed | Call failed (the operation itself threw an error) | Hub sets status.value = "failed" |
| aborted | Call cancelled, or cascaded from a failed or aborted predecessor | Hub, cascade effect, or abortAll() sets status.value = "aborted" |
| skipped | Conditional branch not taken | Conditional evaluation sets this |
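
The transition rules can be encoded as a lookup table that a coordinator checks before writing to a status signal. This is a minimal sketch, not part of the layer. `TRANSITIONS`, `canTransition` and `isTerminal` are hypothetical names. `idle → aborted` is included because the abort effect acts on idle as well as waiting nodes, and `ready → aborted` arises only from abortAll(). The states a Conditional may move to skipped (idle or waiting) are an assumption, since the rules only say a Conditional sets it.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Allowed next states per status (hypothetical encoding of the rules above).
const TRANSITIONS: Record<NodeStatus, readonly NodeStatus[]> = {
  idle: ["waiting", "ready", "aborted", "skipped"], // "skipped" sources are assumed
  waiting: ["ready", "aborted", "skipped"],
  ready: ["running", "aborted"],                   // ready → aborted only via abortAll()
  running: ["completed", "failed", "aborted"],
  completed: [],                                   // terminal
  failed: [],                                      // terminal
  aborted: [],                                     // terminal
  skipped: [],                                     // terminal
};

function canTransition(from: NodeStatus, to: NodeStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// A status is terminal when no transition leaves it.
function isTerminal(status: NodeStatus): boolean {
  return TRANSITIONS[status].length === 0;
}

canTransition("running", "failed"); // true
canTransition("failed", "ready");   // false: failed is terminal (see Open Question 2)
isTerminal("skipped");              // true
```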

The key distinction between failed and aborted:

  • failed means the operation itself threw an error. The node is the source of the failure.
  • aborted means the operation was cancelled or a predecessor failed. The node is a victim of failure propagation.

Computed Preconditions

The core innovation of reactive execution: each node's "can I start?" question is a computed signal that automatically resolves based on upstream states.

const preconditions = computed(() => {
  const predecessors = graph.inNeighbors(node);
  return predecessors.every(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "completed" || status === "skipped";
  });
});

A node's preconditions are met when all predecessors have reached a satisfying terminal state (completed or skipped). A failed or aborted predecessor does NOT satisfy preconditions — it prevents the dependent from ever becoming ready.

This means:

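  • If the DAG changes, a new predecessor is included the next time the computed re-evaluates, as long as graph.inNeighbors(node) is read inside the computed (as above; initializeSignals captures the list once). The graph is not itself a signal, so the edge change alone does not trigger re-evaluation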
  • A predecessor completing automatically re-evaluates all dependent preconditions
  • An aborted predecessor prevents dependents from becoming ready
  • A skipped predecessor satisfies preconditions (the branch was deliberately bypassed, not broken)
  • No manual event wiring or callback chains
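
Stripped of signals, the two checks are plain predicates over predecessor statuses. In this sketch an ordinary Map stands in for the signal map, and `preconditionsMet` and `blockedByFailure` are hypothetical helpers. It shows four cases:

- A skipped predecessor satisfies a dependent.
- An aborted predecessor blocks it.
- A still-running predecessor leaves it waiting.
- A root node is ready at once.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Satisfied when every predecessor is completed or skipped (vacuously true for a root node).
function preconditionsMet(preds: string[], statuses: Map<string, NodeStatus>): boolean {
  return preds.every((p) => {
    const s = statuses.get(p);
    return s === "completed" || s === "skipped";
  });
}

// Blocked when any predecessor failed or was aborted: the preconditions can never be met.
function blockedByFailure(preds: string[], statuses: Map<string, NodeStatus>): boolean {
  return preds.some((p) => {
    const s = statuses.get(p);
    return s === "failed" || s === "aborted";
  });
}

const statuses = new Map<string, NodeStatus>([
  ["A", "completed"],
  ["B", "skipped"],
  ["C", "aborted"],
  ["E", "running"],
]);

preconditionsMet([], statuses);          // true: root node
preconditionsMet(["A", "B"], statuses);  // true: skipped satisfies
preconditionsMet(["A", "C"], statuses);  // false
blockedByFailure(["A", "C"], statuses);  // true: C was aborted
preconditionsMet(["A", "E"], statuses);  // false
blockedByFailure(["A", "E"], statuses);  // false: neither holds, so the node keeps waiting
```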

Sequential preconditions

In a sequential group (A → B → C):

  • A's preconditions: true (no predecessors, or root-level)
  • B's preconditions: A.status === "completed"
  • C's preconditions: B.status === "completed"

When A completes → B's preconditions become true → hub starts B → B completes → C's preconditions become true → hub starts C. All without manual event wiring.

Parallel preconditions

In a parallel group (B and C both depend on A, so both can start as soon as A completes):

  • B's preconditions: A.status === "completed" (same as any sequential dependency)
  • C's preconditions: A.status === "completed" (shared predecessor)

Both B and C become ready at the same time, and the hub starts them in parallel.

Join preconditions

When a node depends on multiple predecessors (e.g., D depends on both B and C completing):

  • D's preconditions: B.status === "completed" && C.status === "completed"

D only becomes ready when all predecessors complete. This is the "join" in fork-join parallelism.
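
Because readiness depends only on predecessor statuses, a fork-join graph runs in batches: every node whose preconditions become true at the same moment starts together. The sketch below assumes every call succeeds. It replaces the reactive machinery with a loop that repeatedly collects the nodes whose predecessors have all completed. `executionWaves` and the edge-list input are hypothetical.

```typescript
type Edge = [from: string, to: string];

// Groups nodes into the batches a coordinator would start together, assuming every call succeeds.
function executionWaves(nodes: string[], edges: Edge[]): string[][] {
  const preds = new Map<string, string[]>(nodes.map((n): [string, string[]] => [n, []]));
  for (const [from, to] of edges) preds.get(to)!.push(from);

  const completed = new Set<string>();
  const waves: string[][] = [];
  while (completed.size < nodes.length) {
    // Ready: not yet run, and every predecessor has completed.
    const wave = nodes.filter(
      (n) => !completed.has(n) && preds.get(n)!.every((p) => completed.has(p)),
    );
    if (wave.length === 0) {
      throw new Error("cycle: the remaining nodes can never become ready");
    }
    for (const n of wave) completed.add(n);
    waves.push(wave);
  }
  return waves;
}

// Sequential A → B → C
executionWaves(["A", "B", "C"], [["A", "B"], ["B", "C"]]);
// [["A"], ["B"], ["C"]]

// Fork-join: B and C depend on A; D joins B and C
executionWaves(["A", "B", "C", "D"], [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"]]);
// [["A"], ["B", "C"], ["D"]]
```

A real coordinator does not loop like this. The effects start each node the moment its computed flips, so a slow B delays D without delaying C.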

Failure Propagation

Failure propagation is the mechanism by which a failed or aborted node causes its downstream dependents to abort. The key design principle: failure follows dependency edges, not structural scope.

This means:

  • In a Sequential group, failure propagates forward through the chain (B depends on A, so if A fails, B aborts)
  • In a Parallel group, sibling branches are independent — a failure in branch A does NOT affect branch B, because there are no dependency edges between them
  • A node that depends on multiple predecessors (a join) aborts only when it's impossible for its preconditions to ever be met

The preconditions-failure duality

Each node has two complementary reactive computations:

  1. preconditions (computed<boolean>): true when all predecessors are completed or skipped. The node can start.
  2. blockedByFailure (computed<boolean>): true when any predecessor is failed or aborted and the failure is uncaught (not handled by a Conditional). The node can never start.

const preconditions = computed(() => {
  const predecessors = graph.inNeighbors(node);
  return predecessors.every(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "completed" || status === "skipped";
  });
});

const blockedByFailure = computed(() => {
  const predecessors = graph.inNeighbors(node);
  return predecessors.some(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "failed" || status === "aborted";
  });
});

When blockedByFailure becomes true and the node hasn't started (idle or waiting), the node transitions to aborted. This happens via an effect():

effect(() => {
  if (blockedByFailure.value && (status.value === "idle" || status.value === "waiting")) {
    status.value = "aborted";
  }
});

This cascade is automatic and reactive — when a predecessor fails, all downstream blockedByFailure computations re-evaluate, and their effects fire, aborting any waiting dependents.
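
The cascade needs only a signal, a derived read and an effect. To stay dependency-free, the sketch below uses a deliberately tiny stand-in for @preact/signals-core. `Sig` and `effect` here are hypothetical, with no caching, batching or disposal, and a plain function plays the role of the blockedByFailure computed. Failing A aborts B, and B's own transition then aborts C.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Toy stand-in for a signal library: reading .value inside an effect subscribes that effect.
let activeEffect: (() => void) | null = null;

class Sig<T> {
  private subscribers = new Set<() => void>();
  constructor(private current: T) {}
  get value(): T {
    if (activeEffect) this.subscribers.add(activeEffect);
    return this.current;
  }
  set value(next: T) {
    if (Object.is(next, this.current)) return;
    this.current = next;
    // Re-run every subscribed effect synchronously (copy first: runs may subscribe more).
    for (const run of [...this.subscribers]) run();
  }
}

function effect(fn: () => void): void {
  const run: () => void = () => {
    const previous = activeEffect;
    activeEffect = run;
    try {
      fn();
    } finally {
      activeEffect = previous;
    }
  };
  run();
}

// Sequential chain A → B → C; B and C are waiting on their predecessors.
type NodeId = "A" | "B" | "C";
const statusSignals: Record<NodeId, Sig<NodeStatus>> = {
  A: new Sig<NodeStatus>("running"),
  B: new Sig<NodeStatus>("waiting"),
  C: new Sig<NodeStatus>("waiting"),
};
const predecessors: Record<NodeId, NodeId[]> = { A: [], B: ["A"], C: ["B"] };

for (const node of Object.keys(statusSignals) as NodeId[]) {
  // Plays the role of the blockedByFailure computed (re-evaluated on every read).
  const blockedByFailure = (): boolean =>
    predecessors[node].some((p) => {
      const s = statusSignals[p].value;
      return s === "failed" || s === "aborted";
    });

  // The abort effect from above: an idle/waiting node with a failed predecessor aborts.
  effect(() => {
    const own = statusSignals[node];
    if (blockedByFailure() && (own.value === "idle" || own.value === "waiting")) {
      own.value = "aborted";
    }
  });
}

statusSignals.A.value = "failed";
// statusSignals.B.value and statusSignals.C.value are now both "aborted"
```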

Sequential failure propagation

A (failed) → B (aborted) → C (aborted)

When A fails, B's blockedByFailure becomes true. B transitions from waiting to aborted. C's blockedByFailure then becomes true (B is now aborted). C transitions to aborted. The entire downstream chain aborts.

Parallel independence

              ┌── B (completed) ──┐
A (completed) ┤                   ├── D (aborted)
              └── C (failed) ─────┘

When C fails:

  • C's downstream dependents see blockedByFailure = true
  • B is unaffected — it's on an independent branch
  • D depends on both B and C. D's preconditions will never be met (C is failed, not completed). D's blockedByFailure is true (C is failed). D transitions to aborted.

But crucially, this is because D depends on C, not because they share a structural scope:

              ┌── B (completed) ───── E (completed)
A (completed) ┤
              └── C (failed)          (no edge from C to E)

E has no dependency on C. E continues running regardless of C's failure. Failure follows dependency edges, not structural boundaries.
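
The same edge-following rule can be checked without signals by applying it until nothing changes. This sketch uses a hypothetical `propagateFailure` over an edge list, not part of the layer's API. B is still running when C fails. D joins B and C and is aborted, while E, which depends only on B, keeps waiting.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";
type Edge = [from: string, to: string];

// Applies the blockedByFailure rule until nothing changes: an idle or waiting node whose
// predecessor failed or was aborted becomes aborted. Returns a new map.
function propagateFailure(
  statuses: ReadonlyMap<string, NodeStatus>,
  edges: readonly Edge[],
): Map<string, NodeStatus> {
  const next = new Map(statuses);
  let changed = true;
  while (changed) {
    changed = false;
    for (const [from, to] of edges) {
      const pred = next.get(from);
      const dep = next.get(to);
      if ((pred === "failed" || pred === "aborted") && (dep === "idle" || dep === "waiting")) {
        next.set(to, "aborted");
        changed = true;
      }
    }
  }
  return next;
}

// Fork-join plus a node E that depends only on B.
const edges: Edge[] = [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"], ["B", "E"]];
const after = propagateFailure(
  new Map<string, NodeStatus>([
    ["A", "completed"],
    ["B", "running"],
    ["C", "failed"],
    ["D", "waiting"],
    ["E", "waiting"],
  ]),
  edges,
);
// after.get("D") === "aborted"  (D depends on C)
// after.get("E") === "waiting"  (no edge from C to E)
// after.get("B") === "running"  (sibling branch is unaffected)
```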

Join semantics

When a node depends on multiple predecessors (fork-join):

              ┌── B (completed) ──┐
A (completed) ┤                   ├── D (aborted)
              └── C (failed) ─────┘

D's preconditions require both B and C to be completed or skipped. Since C failed, D's preconditions can never be met, so D transitions to aborted.

The alternative would be "partial success" — D starts with B's output even though C failed. This is NOT supported by the precondition model. If partial execution is needed, the template author should use a Conditional to handle the failure case explicitly.

Conditional as error boundary

A Conditional can catch a failure and redirect to a fallback path:

h(Sequential, {},
  h(Operation, { name: "fetch-data" }),
  h(Conditional, {
    test: (results) => results["fetch-data"].status !== "failed",
  },
    // then: proceed with data processing
    h(Sequential, {},
      h(Operation, { name: "transform" }),
      h(Operation, { name: "store" }),
    ),
    // else: fallback path
    h(Operation, { name: "notify-error" }),
  ),
)

If fetch-data fails:

  1. The Conditional's test function receives the results map including fetch-data's status
  2. test evaluates to false (the operation failed)
  3. The then branch transitions to skipped
  4. The else branch (notify-error) becomes ready
  5. Downstream nodes after the Conditional see the Conditional as completed (it resolved successfully, just on a different branch)

This makes Conditional a caught error boundary. The failure is handled — downstream nodes don't see a cascade because the Conditional resolved successfully.

Without a Conditional, the failure is uncaught. It cascades through dependency edges to all dependents, which transition to aborted.
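
A Conditional's resolution can be sketched as a pure function. Everything named here is hypothetical: `resolveConditional`, `ConditionalSpec`, and listing each branch's nodes explicitly. The sketch encodes only what the steps above state: the test sees the results map, the untaken branch is skipped, and the Conditional itself resolves as completed.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";
type Results = Record<string, { status: NodeStatus }>;

interface ConditionalSpec {
  test: (results: Results) => boolean;
  thenNodes: string[]; // every node inside the then branch (hypothetical representation)
  elseNodes: string[]; // every node inside the else branch
}

interface ConditionalOutcome {
  taken: "then" | "else";
  skipped: string[];             // nodes of the branch not taken
  conditionalStatus: NodeStatus; // what nodes downstream of the Conditional see
}

function resolveConditional(spec: ConditionalSpec, results: Results): ConditionalOutcome {
  const taken = spec.test(results) ? "then" : "else";
  return {
    taken,
    skipped: taken === "then" ? [...spec.elseNodes] : [...spec.thenNodes],
    // Either branch is a successful resolution, so a failure tested here is caught.
    conditionalStatus: "completed",
  };
}

const spec: ConditionalSpec = {
  test: (results) => results["fetch-data"].status !== "failed",
  thenNodes: ["transform", "store"],
  elseNodes: ["notify-error"],
};

const outcome = resolveConditional(spec, { "fetch-data": { status: "failed" } });
// outcome.taken is "else", outcome.skipped is ["transform", "store"],
// and downstream nodes see the Conditional as "completed"
```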

Systemic failure: aborting the entire workflow

For failures that should cancel everything (e.g., provider outage, authentication failure), the hub coordinator can abort the entire WorkflowReactiveRoot:

workflowRoot.abortAll();  // Sets all non-terminal nodes to "aborted"

This is separate from dependency-edge failure propagation. It's for systemic failures where the workflow cannot meaningfully continue regardless of which branches are independent.

Interaction with call protocol abort

There are three abort mechanisms:

  1. Signal cascade (this layer): blockedByFailure effects transition dependents to aborted. This is automatic and follows dependency edges.
  2. Call protocol abort (operations layer): PendingRequestMap.abort(requestId) propagates call.aborted events through the pub/sub layer. This is network-aware and handles remote calls.
  3. Full workflow abort (this layer): workflowRoot.abortAll() aborts all non-terminal nodes. It is reserved for systemic failures.

The hub coordinator should invoke signal cascade and protocol abort together:

// When aborting a call:
workflowRoot.abortNode(nodeId);           // Signal: transition dependents to aborted
prm.abort(requestId);                    // Protocol: cancel the remote call

// When aborting entire workflow:
workflowRoot.abortAll();                  // Signal: abort everything
prm.abortAll(pendingRequestIds);         // Protocol: cancel all pending calls

Signal cascades are instant. Protocol aborts may take time to propagate. They're complementary — the signal cascade ensures local state is immediately consistent, while the protocol abort ensures remote state eventually catches up.
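
The ordering can be illustrated with a stand-in for the protocol side. `FakePendingRequestMap` and `abortNodeAndCall` are hypothetical; the real PendingRequestMap API is defined by the call protocol. A Map stands in for the status signals, and the fake abort settles on a timer to mimic network latency. The local status is already aborted by the time the function returns.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical protocol stand-in: abort() settles later, like a network round trip.
class FakePendingRequestMap {
  readonly aborted: string[] = [];

  abort(requestId: string): Promise<void> {
    return new Promise<void>((resolve) => {
      setTimeout(() => {
        this.aborted.push(requestId);
        resolve();
      }, 10);
    });
  }
}

// Local state first (synchronous), then the remote call (eventual).
function abortNodeAndCall(
  statuses: Map<string, NodeStatus>,
  nodeId: string,
  requestId: string,
  prm: FakePendingRequestMap,
): Promise<void> {
  statuses.set(nodeId, "aborted"); // immediately visible to local readers
  return prm.abort(requestId);     // remote state catches up when this promise settles
}

const statuses = new Map<string, NodeStatus>([["fetch-data", "running"]]);
const prm = new FakePendingRequestMap();
const pending = abortNodeAndCall(statuses, "fetch-data", "req-1", prm);
// statuses.get("fetch-data") is already "aborted"; prm.aborted stays empty until `pending` settles
```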

NodeStatus vs CallStatus

NodeStatus extends CallStatus with workflow-specific states that have no call protocol equivalent:

| NodeStatus | Meaning | CallStatus equivalent |
|---|---|---|
| idle | Not started, no preconditions evaluated | None (call doesn't exist yet) |
| waiting | Preconditions not met (upstream still running) | None |
| ready | Preconditions met, eligible to start | None |
| running | Call in progress | running |
| completed | Call succeeded | completed |
| failed | Call failed | failed |
| aborted | Call cancelled | aborted |
| skipped | Conditional branch not taken | None |

The hub coordinator maps between these:

// NodeStatus → call action (what the hub should do about the node's call)
function nodeStatusToCallAction(status: NodeStatus): "start" | "skip" | "abort" | "none" {
  switch (status) {
    case "ready": return "start";
    case "skipped": return "skip";
    case "aborted": return "abort";
    default: return "none";
  }
}

// CallStatus → NodeStatus (when call event arrives)
function callStatusToNodeStatus(callStatus: CallStatus): NodeStatus {
  // Direct mapping for shared states
  return callStatus as NodeStatus;
}
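
The `as NodeStatus` cast above is only safe while every CallStatus value also exists in NodeStatus. This sketch assumes CallStatus is exactly the four shared states in the table above; the authoritative union lives in schema.md. Under that assumption, an exhaustive switch makes the compiler reject any CallStatus member added later, instead of letting it pass through the cast unchecked.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Assumption: CallStatus is exactly the four states shared with NodeStatus.
type CallStatus = "running" | "completed" | "failed" | "aborted";

// CallStatus → NodeStatus without a cast: every member is listed explicitly.
function callStatusToNodeStatus(callStatus: CallStatus): NodeStatus {
  switch (callStatus) {
    case "running":
    case "completed":
    case "failed":
    case "aborted":
      return callStatus;
    default: {
      // Unreachable today; adding a member to CallStatus makes this assignment a compile error.
      const unhandled: never = callStatus;
      throw new Error(`unmapped CallStatus: ${String(unhandled)}`);
    }
  }
}

callStatusToNodeStatus("aborted"); // "aborted"
```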

Effect-Driven Execution

The hub coordinator uses two effect()s per node — one for starting when preconditions are met, and one for aborting when failure propagates:

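// Assumes workflowRoot exposes its nodes as [nodeId, preconditions, blockedByFailure]
// tuples and a readable statusMap; graph, prm, getInput and parentCallId are supplied
// by the hub coordinator. Disposers are kept so the effects can be torn down later.
const disposers: (() => void)[] = [];

for (const [nodeId, preconditions, blockedByFailure] of workflowRoot.nodes) {
  const status = workflowRoot.statusMap.get(nodeId)!;

  // Start the call when preconditions are met
  disposers.push(effect(() => {
    if (!preconditions.value) return;
    // peek() reads without subscribing, so this effect re-runs only when preconditions change
    const current = status.peek();
    if (current === "idle" || current === "waiting") {
      // All preconditions satisfied: start the call
      status.value = "running";
      const operationId = graph.getNodeAttributes(nodeId).name;
      prm.call(operationId, getInput(nodeId), { parentRequestId: parentCallId })
        // Settle only if still running: abortAll() may have aborted the node meanwhile
        .then(() => { if (status.peek() === "running") status.value = "completed"; })
        .catch(() => { if (status.peek() === "running") status.value = "failed"; });
    }
  }));

  // Abort when a predecessor fails (uncaught failure propagation)
  disposers.push(effect(() => {
    if (!blockedByFailure.value) return;
    const current = status.peek();
    if (current === "idle" || current === "waiting") {
      // A predecessor failed and no Conditional caught it: abort
      status.value = "aborted";
    }
  }));
}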

Both effects are reactive. When a predecessor completes, the preconditions computed re-evaluates, potentially triggering the start effect. When a predecessor fails, the blockedByFailure computed re-evaluates, potentially triggering the abort effect.

The call's promise resolution updates the node's status signal, which triggers downstream preconditions and failure propagations to re-evaluate, which triggers their effects, and so on.

Effect disposal

Each effect() returns a dispose function. The WorkflowReactiveRoot tracks all effect disposers and provides a dispose() method that tears down the entire reactive graph:

dispose(): void {
  for (const disposer of this.effectDisposers) {
    disposer();
  }
  this.statusMap.clear();
  this.preconditions.clear();
  this.blockedByFailure.clear();
}

This is critical for cleaning up when a workflow completes, fails, or is aborted. Without disposal, signal subscriptions leak.

Full workflow abort

For systemic failures (provider outage, authentication failure), WorkflowReactiveRoot provides abortAll():

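abortAll(): void {
  // completed, failed, aborted and skipped are terminal and must not be overwritten
  const terminal: NodeStatus[] = ["completed", "failed", "aborted", "skipped"];
  for (const status of this.statusMap.values()) {
    if (!terminal.includes(status.value)) {
      status.value = "aborted";
    }
  }
  // No new calls start afterwards (the start effect only acts on idle/waiting nodes);
  // running calls still need a protocol-level abort
}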

This transitions every non-terminal node (idle, waiting, ready or running) to aborted and leaves completed, failed, skipped and already-aborted nodes untouched. It is for cases where the entire workflow should stop, regardless of which branches are independent.

Constraints

  • Signals are in-memory: WorkflowReactiveRoot state is not persisted. If the hub restarts, the reactive state is lost and must be reconstructed from call protocol events and a template re-render.
  • Effect-driven execution is optional — the hub coordinator can choose not to use effect() and instead poll preconditions.value and blockedByFailure.value manually. The reactive layer provides the building blocks; the coordinator decides how to use them.
  • Failure follows dependency edges, not structural scope — a failed node causes only its downstream dependents (via DAG edges) to abort. Sibling branches in a Parallel group are independent and continue running. This enables partial success: one branch can fail while another completes.
  • Conditionals are error boundaries — a Conditional whose test evaluates against a failed predecessor can redirect to an else branch, catching the failure. Without a Conditional, failures cascade uncaught through dependency edges.
  • Abort is immediate in signals, delayed in protocol — setting status.value = "aborted" is instant, but prm.abort(requestId) takes time to propagate through the call protocol. The hub should invoke both.
  • skipped satisfies preconditions — a skipped predecessor is treated as "completed for the purpose of preconditions." It means the branch was deliberately bypassed, not broken.
  • failed and aborted block preconditions — a failed or aborted predecessor means the dependent's preconditions can never be met. The blockedByFailure effect transitions the dependent to aborted.
  • NodeStatus and CallStatus share four states: running, completed, failed and aborted map directly (all except running are terminal). idle, waiting, ready and skipped are workflow-specific additions.

Open Questions

  1. Should preconditions support OR logic? Currently all predecessors must complete (AND logic). An anyOf predicate would allow "start this node as soon as any predecessor completes." This would require an edge attribute or node-level configuration.

  2. How are retries handled at the signal level? If an operation fails and should be retried, the status would go running → failed → ready → running. This requires resetting the status back to ready, which the current state machine doesn't support (failed is terminal). A retried status or a separate retryCount attribute may be needed.

  3. Should the reactive graph support partial re-rendering? If a template changes mid-execution (e.g., a step is added), the ujsx reconciler could diff the old and new trees. But the ReactiveHost only supports mount rendering. Re-rendering would require reconciler support.

  4. How does maxConcurrency interact with preconditions? A Parallel group with maxConcurrency: 3 should only start 3 nodes at a time, even though all preconditions are met. This is a scheduling concern, not a structural one. The reactive layer could implement this as a semaphore signal, or it could be the coordinator's responsibility.

  5. Should blockedByFailure be a separate computed or derived from preconditions? Currently the design has two separate computeds — preconditions (all predecessors completed/skipped) and blockedByFailure (any predecessor failed/aborted). An alternative is a single computed<NodeReadiness> that returns "ready" | "blocked" | "failed" or similar. This reduces the number of effects but makes the readiness check less composable.

  6. What happens to running nodes when a predecessor fails? The current spec transitions idle and waiting nodes to aborted. But what about a node that's already running? Should it be cancelled (set to aborted and call prm.abort()), or should it be allowed to complete? The answer depends on whether the running node's output is still needed — which the template author decides via Conditional error boundaries.
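
Open question 5's single-computed alternative amounts to folding both predicates into one classification. This is a sketch under two assumptions: the result type is `"waiting" | "ready" | "blocked"` (the question leaves the values open), and `readinessOf` is a hypothetical helper whose body would sit inside one computed().

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";
type NodeReadiness = "waiting" | "ready" | "blocked";

// Classifies a node from its predecessors' statuses in a single pass.
function readinessOf(predStatuses: NodeStatus[]): NodeReadiness {
  let allSatisfied = true;
  for (const s of predStatuses) {
    if (s === "failed" || s === "aborted") return "blocked"; // preconditions can never be met
    if (s !== "completed" && s !== "skipped") allSatisfied = false;
  }
  return allSatisfied ? "ready" : "waiting";
}

readinessOf([]);                        // "ready"   (root node)
readinessOf(["completed", "skipped"]);  // "ready"
readinessOf(["completed", "running"]);  // "waiting"
readinessOf(["running", "failed"]);     // "blocked"
```

One effect can then switch on the result, which is the reduction in effects the question mentions. Code that needs only one of the two predicates has to derive it from the classification, which is the composability cost.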

References

  • ujsx reactive layer: @alkdev/ujsx/docs/architecture/reactive-layer.md
  • ujsx reconciler: @alkdev/ujsx/docs/architecture/reconciler.md
  • Schema: schema.md (NodeStatus, CallStatus)
  • Host configs: host-configs.md
  • Workflow templates: workflow-templates.md
  • Call protocol: @alkdev/alkhub_ts/docs/architecture/call-graph.md