ADR-005 accepted: resolve all open consequences, update cascading docs

Resolve the three open consequences from ADR-005 (Event Log as Single
Source of Truth) and transition from Proposed to Accepted:

1. Event log IS the call protocol event stream — not a separate type,
   but an EventLogProjection interface (append/getStatus/getResult/
   getEvents) over CallEventMapValue[] with an append-only contract.

2. Event log persists across template re-renders — projections recompute
   against the new DAG; orphaned events stay in log for audit but don't
   affect active projections.

3. Edges get dataFlow: boolean attribute on TemplateEdgeAttrs — inferred
   (not manual) by GraphologyHostConfig from template expressions.
   typeCompat() only runs on dataFlow: true edges. Inference rules are
   precisely specified for Conditional.test, Map.over, and Operation.input.

Also resolve OQ-05 (structural containers stay transparent; aggregate
status is a projection from children) and OQ-10 (running node failure
is a FailurePolicy configuration, default continue-running).

Cascading updates to:
- reactive-execution.md: add hybrid status model (event-log-driven vs
  projection-driven vs signal-mutation), EventLogProjection interface,
  result projection respecting retries, FailurePolicy type
- host-configs.md: ReactiveContext now includes resultProjection and
  computed results; resolved Q1/Q3/Q4
- schema.md: dataFlow attribute on TemplateEdgeAttrs with inference
  rules and type checking implications
- workflow-templates.md: edge creation rules with dataFlow, result
  projection in Conditional/Map, resolved Q1/Q4
- open-questions.md: all ADR-005 questions marked resolved, updated
  summary table and cross-cutting themes, removed duplicate OQ-07

7 files changed, 464 insertions, 139 deletions
2026-05-21 07:44:28 +00:00
parent 2c1b2d1a15
commit c76be7f689
7 changed files with 463 additions and 138 deletions


@@ -1,11 +1,11 @@
---
status: draft
last_updated: 2026-05-20
last_updated: 2026-05-21
---
# Reactive Execution
Signal-driven status propagation, computed preconditions, and failure propagation for workflow template execution.
Signal-driven status propagation, computed preconditions, and failure propagation for workflow template execution, built on the event log as single source of truth (ADR-005).
## Overview
@@ -16,30 +16,160 @@ The reactive execution layer bridges workflow template structure (DAG) to runtim
- Failure propagation follows dependency edges — a failed predecessor causes downstream dependents to abort, while independent branches continue running
- Conditionals can serve as error boundaries, catching failures and redirecting to fallback paths
This layer does NOT execute operations directly. It provides reactive state that the hub coordinator reads and writes. The coordinator calls `registry.execute()` when a node's preconditions are met, and updates the node's status signal when the call completes or fails.
### Event Log as Source of Truth
Per [ADR-005](decisions/005-event-log-as-source-of-truth.md), the reactive execution layer is a **projection** of the call protocol event log. The hub coordinator appends call protocol events (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`), and the reactive layer derives its state from these events:
```
┌─────────────────────────────────────────────┐
│ Execution Event Log │
│ (append-only CallEventMapValue[] — │
│ the call protocol events) │
└──────────────────┬──────────────────────────┘
┌─────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌──────────┐
│ Status │ │ Result │ │ Call │
│ Proj. │ │ Proj. │ │ Graph │
│ │ │ │ │ Proj. │
│ nodeId: │ │ nodeId: │ │ │
│ status │ │ output │ │ nodes + │
│ │ │ │ │ edges │
└────┬────┘ └────┬─────┘ └──────────┘
│ │
▼ ▼
┌───────────────────────────────────────────┐
│ Reactive Execution Layer │
│ │
│ preconditions → "does the log show │
│ all predecessors │
│ completed?" │
│ │
│ result resolution → "does the log │
│ have A's output?" │
│ │
│ Conditional.test → reads from result proj. │
│ Map.over → reads from result proj. │
└───────────────────────────────────────────┘
```
**The hub coordinator appends events; the reactive layer projects them.** This replaces the previous design where the coordinator directly set signal values. Under ADR-005, the coordinator's responsibility is:
1. Start a call (which emits `call.requested`)
2. Receive the result (which emits `call.responded` or `call.error`)
3. Append these events to the log
The reactive layer's projections derive `NodeStatus` and `CallResult` from the log. The coordinator no longer calls `status.value = "running"` — the status projection derives this from `call.requested` events.
### Hybrid Status Model
While ADR-005 positions the event log as the single source of truth, not all `NodeStatus` values correspond to call protocol events. The model is hybrid:
**Event-log-driven statuses** (derived directly from `CallEventMapValue` events):
| Call protocol event | Derived NodeStatus |
|---------------------|--------------------|
| `call.requested` | `running` |
| `call.responded` | `completed` |
| `call.error` | `failed` |
| `call.aborted` | `aborted` |
**Projection-driven statuses** (derived from the event log combined with template structure and reactive state):
| NodeStatus | Derived from |
|------------|-------------|
| `idle` | No events for this node yet; no predecessors are running |
| `waiting` | At least one predecessor is `running`, none have completed |
| `ready` | All predecessors are `completed` or `skipped`; no `call.requested` event yet |
| `skipped` | Conditional branch not taken (template-level decision, no call event) |
**Signal-mutation statuses** (set by the reactive engine, not derived from events):
| Trigger | NodeStatus | Rationale |
|---------|------------|-----------|
| `blockedByFailure` effect | `aborted` | A predecessor failed; the node is aborted by failure propagation. This is a projection policy decision, not a call protocol event. |
This distinction is important: the event log records **what happened at the call level**, while the reactive engine derives **workflow-level state** from the log combined with template structure. The `WorkflowReactiveRoot` maintains `signal<NodeStatus>` values, but these signals are set by:
1. The status projection when call events arrive (event-log-driven)
2. The reactive engine for workflow-level states (projection-driven or signal-mutation)
The `getStatus(nodeId)` method on `EventLogProjection` checks the event log first (for call-level statuses), then falls back to the signal map (for workflow-level statuses). The `getResult(nodeId)` method is purely event-log-driven.
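The lookup order described above can be sketched as a small pure function. This is a minimal illustration, not the actual implementation: `NodeEvent`, `EVENT_TO_STATUS`, and `resolveStatus` are hypothetical names, and the sketch assumes each event has already been resolved to its node key.

```typescript
// Simplified, hypothetical shapes; the real CallEventMapValue carries more fields.
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

type CallEventType =
  | "call.requested" | "call.responded" | "call.error" | "call.aborted";

// Assumption: each event has already been resolved to its node key.
interface NodeEvent {
  type: CallEventType;
  nodeId: string;
}

// Event-log-driven statuses, mirroring the first table above.
const EVENT_TO_STATUS: Record<CallEventType, NodeStatus> = {
  "call.requested": "running",
  "call.responded": "completed",
  "call.error": "failed",
  "call.aborted": "aborted",
};

function resolveStatus(
  nodeId: string,
  log: readonly NodeEvent[],
  workflowStatus: ReadonlyMap<string, NodeStatus>,
): NodeStatus {
  // Call-level status: the most recent event for this node wins.
  const latest = [...log].reverse().find((e) => e.nodeId === nodeId);
  if (latest) return EVENT_TO_STATUS[latest.type];
  // Workflow-level status: fall back to the signal map; no entry means idle.
  return workflowStatus.get(nodeId) ?? "idle";
}
```

A node with no call events but a `skipped` entry in the signal map resolves to `skipped`; a node with neither resolves to `idle`.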
## ReactiveRoot for Workflows
```typescript
class WorkflowReactiveRoot {
class WorkflowReactiveRoot implements EventLogProjection {
private statusMap: Map<string, Signal<NodeStatus>>;
private preconditions: Map<string, Computed<boolean>>;
private blockedByFailure: Map<string, Computed<boolean>>;
private resultMap: Map<string, Computed<CallResult | undefined>>;
private graph: DirectedGraph;
private effectDisposers: (() => void)[];
private eventLog: CallEventMapValue[];
private nodeKeyToRequestId: Map<string, string>;
private failurePolicy: FailurePolicy;
constructor(graph: DirectedGraph) {
constructor(graph: DirectedGraph, options?: { failurePolicy?: FailurePolicy }) {
this.graph = graph;
this.statusMap = new Map();
this.preconditions = new Map();
this.blockedByFailure = new Map();
this.resultMap = new Map();
this.effectDisposers = [];
this.eventLog = [];
this.nodeKeyToRequestId = new Map();
this.failurePolicy = options?.failurePolicy ?? "continue-running";
this.initializeSignals();
}
}
```
`WorkflowReactiveRoot` wraps the reactive state for an entire workflow execution. It takes the structural DAG (from the GraphologyHost) and creates reactive state for each operation node.
`WorkflowReactiveRoot` wraps the reactive state for an entire workflow execution. It takes the structural DAG (from the GraphologyHost) and creates reactive state for each operation node. It implements the `EventLogProjection` interface from ADR-005, meaning the hub coordinator appends call protocol events and the root derives status and results from them.
### FailurePolicy
The failure policy determines what happens to running nodes when a predecessor fails. Per ADR-005 and OQ-010, this is a **projection policy**, not a hardcoded rule:
```typescript
type FailurePolicy =
| "continue-running" // Running nodes continue. Only idle/waiting dependents abort. (default)
| "abort-dependents"; // Running dependents of the failed node also abort.
```
The default policy (`continue-running`) means a node that has already started execution completes normally, even if a sibling or predecessor fails. Only nodes that haven't started (`idle` or `waiting`) transition to `aborted`.
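One way to read the two policies is as a single decision function applied to each dependent of a failed node. The sketch below assumes that reading; `shouldAbortDependent` is a hypothetical helper, not part of the specified API.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

type FailurePolicy = "continue-running" | "abort-dependents";

// Hypothetical helper: should a dependent of a failed node become `aborted`?
function shouldAbortDependent(
  status: NodeStatus,
  policy: FailurePolicy,
): boolean {
  // Nodes that have not started abort under either policy.
  if (status === "idle" || status === "waiting") return true;
  // Running nodes abort only under the stricter policy.
  if (status === "running") return policy === "abort-dependents";
  // `ready` implies no failed predecessor, and terminal states are left alone.
  return false;
}
```

Under this reading, switching policies changes only the projection logic; the event log itself is unaffected.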
### EventLogProjection Interface
```typescript
interface EventLogProjection {
/** Append an event. Events are processed idempotently. */
append(event: CallEventMapValue): void;
/** Current status of a node, derived from the most recent event. */
getStatus(nodeId: string): NodeStatus;
/** Result of a completed node, derived from call.responded events. */
getResult(nodeId: string): CallResult | undefined;
/** All events for a node, in order. */
getEvents(nodeId: string): CallEventMapValue[];
}
```
The `append()` method is the primary entry point for the hub coordinator. When a call protocol event arrives (`call.requested`, `call.responded`, etc.), the coordinator appends it to the log. The projections automatically update: `getStatus()` scans the log for the most recent event per node, and `getResult()` extracts the output from `call.responded` events.
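A minimal in-memory sketch of the interface follows. It makes two assumptions that the spec does not state: events carry a `nodeId` directly (the real log is keyed by `requestId`), and idempotency is achieved by ignoring a second event with the same `requestId` and `type`. `InMemoryProjection` and the simplified `CallEvent` and `CallResult` shapes are hypothetical.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Simplified stand-in for CallEventMapValue (assumption: nodeId is on the event).
type CallEvent =
  | { type: "call.requested"; requestId: string; nodeId: string }
  | { type: "call.responded"; requestId: string; nodeId: string; output: unknown }
  | { type: "call.error"; requestId: string; nodeId: string; error: { message: string } }
  | { type: "call.aborted"; requestId: string; nodeId: string };

type CallResult = { status: "completed"; output: unknown };

class InMemoryProjection {
  private readonly log: CallEvent[] = [];
  private readonly seen = new Set<string>();

  append(event: CallEvent): void {
    // Assumed dedupe key: one event of each type per request.
    const key = `${event.requestId}:${event.type}`;
    if (this.seen.has(key)) return; // duplicate delivery is ignored
    this.seen.add(key);
    this.log.push(event);
  }

  getEvents(nodeId: string): CallEvent[] {
    return this.log.filter((e) => e.nodeId === nodeId);
  }

  getStatus(nodeId: string): NodeStatus {
    const events = this.getEvents(nodeId);
    const last = events[events.length - 1];
    if (!last) return "idle";
    switch (last.type) {
      case "call.requested": return "running";
      case "call.responded": return "completed";
      case "call.error": return "failed";
      case "call.aborted": return "aborted";
    }
  }

  getResult(nodeId: string): CallResult | undefined {
    const events = this.getEvents(nodeId);
    // Walk backwards so the latest call.responded wins.
    for (let i = events.length - 1; i >= 0; i--) {
      const e = events[i];
      if (e && e.type === "call.responded") {
        return { status: "completed", output: e.output };
      }
    }
    return undefined;
  }
}
```

Appending the same event twice leaves the log and every derived value unchanged, which is the property the interface comment on `append()` asks for.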
### Request ID Mapping
The event log uses `requestId` (from the call protocol), while the reactive engine uses node keys (from the template DAG). The `nodeKeyToRequestId` map bridges these:
```typescript
// When starting a call:
const requestId = crypto.randomUUID();
workflowRoot.nodeKeyToRequestId.set(nodeKey, requestId);
// When appending events:
workflowRoot.append({ type: "call.requested", requestId, operationId, input, timestamp: new Date().toISOString() });
```
This mapping is necessary because a single template node may have multiple requests (retries), and the event log records all of them.
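Because one node can accumulate several requests, a bookkeeping structure that remembers every attempt can make the relationship explicit. The sketch below is an illustration under that assumption; `RequestIndex` and its methods are hypothetical and go beyond the single-entry `nodeKeyToRequestId` map shown above.

```typescript
// Hypothetical index: keeps every attempt per node plus a reverse lookup.
class RequestIndex {
  private readonly attemptsByNode = new Map<string, string[]>();
  private readonly nodeByRequest = new Map<string, string>();

  // Record a new attempt (the first call or a retry) for a node.
  startAttempt(nodeKey: string, requestId: string): void {
    const attempts = this.attemptsByNode.get(nodeKey) ?? [];
    attempts.push(requestId);
    this.attemptsByNode.set(nodeKey, attempts);
    this.nodeByRequest.set(requestId, nodeKey);
  }

  // The request that represents the node's current attempt.
  currentRequestId(nodeKey: string): string | undefined {
    const attempts = this.attemptsByNode.get(nodeKey) ?? [];
    return attempts[attempts.length - 1];
  }

  // Resolve an incoming event's requestId back to its template node.
  nodeKeyFor(requestId: string): string | undefined {
    return this.nodeByRequest.get(requestId);
  }

  // All attempts for a node, oldest first.
  attempts(nodeKey: string): readonly string[] {
    return this.attemptsByNode.get(nodeKey) ?? [];
  }
}
```

The reverse lookup lets events from an earlier attempt still be attributed to the right node for audit, while `currentRequestId` identifies the attempt that drives the projections.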
### initializeSignals()
@@ -71,9 +201,51 @@ private initializeSignals(): void {
});
});
// Result: derived from the event log's result projection
// Uses the most recent terminal event of the node's current request (tracked in nodeKeyToRequestId, so retries are respected)
const result = computed(() => {
const requestId = this.nodeKeyToRequestId.get(node);
if (!requestId) return undefined;
const nodeEvents = this.eventLog
.filter(e => "requestId" in e && e.requestId === requestId);
// Find the latest terminal event: call.responded, call.error, or call.aborted.
// Events are in chronological order, so Array.prototype.findLast (ES2023) would also work.
// Here we iterate in reverse and stop at the first terminal event.
let latestTerminalEvent: CallEventMapValue | undefined;
for (let i = nodeEvents.length - 1; i >= 0; i--) {
const e = nodeEvents[i];
if (e.type === "call.responded" || e.type === "call.error" || e.type === "call.aborted") {
latestTerminalEvent = e;
break;
}
}
if (!latestTerminalEvent) return undefined;
if (latestTerminalEvent.type === "call.error") {
return {
status: "failed",
output: undefined,
error: latestTerminalEvent.error,
} satisfies CallResult;
}
if (latestTerminalEvent.type === "call.responded") {
return {
status: "completed",
output: latestTerminalEvent.output,
} satisfies CallResult;
}
if (latestTerminalEvent.type === "call.aborted") {
return {
status: "aborted",
output: undefined,
} satisfies CallResult;
}
return undefined;
});
this.statusMap.set(node, status);
this.preconditions.set(node, preconditions);
this.blockedByFailure.set(node, blockedByFailure);
this.resultMap.set(node, result);
}
}
```
@@ -82,11 +254,23 @@ For each operation node in the DAG:
1. Create a `signal<NodeStatus>` starting at `"idle"`
2. Create a `computed<boolean>` that's `true` when all predecessor nodes have status `"completed"` (or `"skipped"` — a skipped node satisfies its dependents' preconditions)
3. Create a `computed<NodeStatus | null>` that detects whether any predecessor has failed or been aborted, triggering a cascade
4. Register an abort function that cascades to all descendants
4. Create a `computed<CallResult | undefined>` that derives the node's result from the event log (for use by `Conditional.test` and `Map.over`)
5. Register an abort function that cascades to all descendants
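The two predecessor checks in steps 2 and 3 reduce to simple predicates over predecessor statuses. The sketch below shows them as plain functions without a signal library; `preconditionsMet` and `isBlockedByFailure` are hypothetical names, and in the real design each would be wrapped in a `computed`.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Step 2: every predecessor is completed or skipped.
// An empty predecessor list (a root node) is trivially satisfied.
function preconditionsMet(predecessors: readonly NodeStatus[]): boolean {
  return predecessors.every((s) => s === "completed" || s === "skipped");
}

// Step 3: at least one predecessor failed or was aborted.
function isBlockedByFailure(predecessors: readonly NodeStatus[]): boolean {
  return predecessors.some((s) => s === "failed" || s === "aborted");
}
```

The two predicates are never both true for the same inputs, but both can be false, for example while a predecessor is still `running`.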
### Status lifecycle
The signal-based status lifecycle mirrors `CallStatus` with workflow-specific additions:
The signal-based status lifecycle mirrors `CallStatus` with workflow-specific additions. Under ADR-005, status transitions are **derived from the event log** — the coordinator appends events, and the status projection maps events to states:
| Event log signals | NodeStatus | Meaning |
|-------------------|------------|---------|
| (no events) | `idle` | Node just created, no call activity yet |
| Predecessor events arriving | `waiting` | At least one predecessor is running, none have completed yet |
| All predecessors completed/skipped | `ready` | All preconditions met, eligible to start |
| `call.requested` received | `running` | Call executing |
| `call.responded` received | `completed` | Call succeeded |
| `call.error` received | `failed` | Call failed (uncaught error) |
| `call.aborted` received | `aborted` | Call cancelled |
| Conditional branch not taken | `skipped` | Node bypassed by a template-level decision; no call event is emitted |
```
┌──────┐
@@ -106,13 +290,14 @@ The signal-based status lifecycle mirrors `CallStatus` with workflow-specific ad
│ │ └──────────►│ready │
│ │ └──┬───┘
│ │ │ hub starts call
│ │ │ (appends call.requested)
│ │ ▼
│ │ ┌────────┐
│ │ │running │──── ──── ──── ────►
│ │ └──┬──┬──┘ │
│ │ │ │ │
│ │ call │ │ call │ call
│ │ completed │ │ failed │ aborted
│ │ responded │ │ failed │ aborted
│ │ │ │ │
│ │ ▼ ▼ ▼
│ │ ┌───────────┐ ┌──────┐ ┌────────┐
@@ -134,37 +319,18 @@ The signal-based status lifecycle mirrors `CallStatus` with workflow-specific ad
└─── all are terminal states
```
Full transition rules:
### Retry semantics (ADR-005)
Retries are natural with the event log. A retry is NOT a state mutation — it's a new sequence of events appended to the log:
```
idle → waiting (predecessor starts running)
idle → ready (no predecessors — root node)
waiting → ready (all predecessors completed or skipped)
waiting → aborted (predecessor failed and failure is uncaught)
ready → running (hub starts the call)
running → completed (call succeeded)
running → failed (call threw an error)
running → aborted (call cancelled externally)
failed → [terminal] (no further transitions)
aborted → [terminal] (no further transitions)
skipped → [terminal] (conditional branch not taken)
completed → [terminal] (no further transitions)
call.requested(A, reqId=1) → fact: A was requested
call.error(A, reqId=1) → fact: A failed on first attempt
call.requested(A, reqId=2) → fact: A was retried with a new request
call.responded(A, reqId=2) → fact: A succeeded on retry
```
| Status | Meaning | Signal trigger |
|--------|---------|---------------|
| `idle` | Node just created, no predecessor activity yet | Initial state |
| `waiting` | At least one predecessor is running, none have completed yet | Any predecessor status change |
| `ready` | All predecessors completed or skipped (preconditions met) | `computed` resolves to `true` |
| `running` | Call executing | Hub sets `status.value = "running"` |
| `completed` | Call succeeded | Hub sets `status.value = "completed"` |
| `failed` | Call failed (uncaught error) | Hub sets `status.value = "failed"` |
| `aborted` | Call cancelled, or cascaded from failed predecessor | Hub or cascade sets `status.value = "aborted"` |
| `skipped` | Conditional branch not taken | Conditional evaluation sets this |
The key distinction between `failed` and `aborted`:
- **`failed`** means the operation itself threw an error. The node is the *source* of the failure.
- **`aborted`** means the operation was cancelled or a predecessor failed. The node is a *victim* of failure propagation.
The status projection derives the current state by scanning for the **most recent event per node**. No `retried` status is needed and no state machine is mutated; the log preserves the full history. The `nodeKeyToRequestId` map tracks which `requestId` corresponds to each node's current attempt.
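To make the retry sequence concrete, the following sketch replays the four events above and records the status the projection would report after each append. The `LoggedEvent` shape, `retryLog`, and `statusHistory` are hypothetical, and the sketch assumes events already carry their node key.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

type CallEventType =
  | "call.requested" | "call.responded" | "call.error" | "call.aborted";

// Assumption: events carry the node key alongside the requestId.
interface LoggedEvent {
  type: CallEventType;
  nodeId: string;
  requestId: string;
}

const STATUS_FOR: Record<CallEventType, NodeStatus> = {
  "call.requested": "running",
  "call.responded": "completed",
  "call.error": "failed",
  "call.aborted": "aborted",
};

// The retry sequence from the listing above.
const retryLog: LoggedEvent[] = [
  { type: "call.requested", nodeId: "A", requestId: "1" },
  { type: "call.error", nodeId: "A", requestId: "1" },
  { type: "call.requested", nodeId: "A", requestId: "2" },
  { type: "call.responded", nodeId: "A", requestId: "2" },
];

// Status of a node after each of its events is appended.
// Each new event is, at that moment, the most recent one for the node.
function statusHistory(
  log: readonly LoggedEvent[],
  nodeId: string,
): NodeStatus[] {
  const history: NodeStatus[] = [];
  for (const event of log) {
    if (event.nodeId === nodeId) history.push(STATUS_FOR[event.type]);
  }
  return history;
}

// statusHistory(retryLog, "A") → ["running", "failed", "running", "completed"]
```

The node passes through `failed` and back to `running` without any signal being reset; only the log grows.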
## Computed Preconditions
@@ -330,13 +496,13 @@ h(Sequential, {},
```
If `fetch-data` fails:
1. The `Conditional`'s `test` function receives the results map including `fetch-data`'s status
1. The `Conditional`'s `test` function receives the results map from the **result projection** (derived from the event log)
2. `test` evaluates to `false` (the operation failed)
3. The `then` branch transitions to `skipped`
4. The `else` branch (`notify-error`) becomes `ready`
3. The `then`-branch transitions to `skipped`
4. The `else`-branch (`notify-error`) becomes `ready`
5. Downstream nodes after the `Conditional` see the `Conditional` as `completed` (it resolved successfully, just on a different branch)
This makes `Conditional` a **caught error boundary**. The failure is handled — downstream nodes don't see a cascade because the `Conditional` resolved successfully.
The result projection (from ADR-005) provides `CallResult` values to `Conditional.test` and `Map.over`. These are computed from the event log, not from direct signal reads. This ensures that `Conditional.test` always sees the most recent state — if a node is retried, the test sees the retry's result, not the original failure.
Without a `Conditional`, the failure is **uncaught**. It cascades through dependency edges to all dependents, which transition to `aborted`.
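A `test` function that reads from the result projection might look like the sketch below. The `ResultMap` shape, the simplified `CallResult`, and the `selectBranch` helper are assumptions made for illustration; the actual signature of `Conditional.test` is defined by the template API, not here.

```typescript
// Simplified, hypothetical result shape.
type CallResult = {
  status: "completed" | "failed" | "aborted";
  output?: unknown;
};

// Assumption: the test receives a map from node key to projected result.
type ResultMap = ReadonlyMap<string, CallResult | undefined>;

// True only when the latest attempt of `fetch-data` succeeded.
const fetchSucceeded = (results: ResultMap): boolean =>
  results.get("fetch-data")?.status === "completed";

// Hypothetical helper: which branch becomes `ready`; the other is `skipped`.
function selectBranch(
  test: (results: ResultMap) => boolean,
  results: ResultMap,
): "then" | "else" {
  return test(results) ? "then" : "else";
}
```

Because the map is derived from the event log, a successful retry of `fetch-data` flips the outcome from `else` to `then` without any special handling in the test.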
@@ -406,11 +572,17 @@ function callStatusToNodeStatus(callStatus: CallStatus): NodeStatus {
}
```
## Effect-Driven Execution
## Event-Driven Execution
The hub coordinator uses two `effect()`s per node — one for starting when preconditions are met, and one for aborting when failure propagates:
Under ADR-005, the hub coordinator's responsibility shifts from directly setting signal values to **appending events to the log**. The reactive layer drives execution via `effect()`s that watch projections and invoke calls when preconditions are met.
### Coordinator Flow
```typescript
// 1. Create the reactive root from the DAG
const workflowRoot = new WorkflowReactiveRoot(dag, { failurePolicy: "continue-running" });
// 2. Register effects that start calls when preconditions are met
for (const [nodeId, preconditions, blockedByFailure] of workflowRoot.nodes) {
// Start the call when preconditions are met
effect(() => {
@@ -418,11 +590,18 @@ for (const [nodeId, preconditions, blockedByFailure] of workflowRoot.nodes) {
const status = workflowRoot.statusMap.get(nodeId)!;
if (status.value === "idle" || status.value === "waiting") {
// All preconditions satisfied — start the call
status.value = "running";
const operationId = graph.getNodeAttributes(nodeId).name;
prm.call(operationId, getInput(nodeId), { parentRequestId: parentCallId })
.then(result => { status.value = "completed"; })
.catch(error => { status.value = "failed"; });
const requestId = crypto.randomUUID();
workflowRoot.nodeKeyToRequestId.set(nodeId, requestId);
// Append event to the log (the status projection updates automatically)
workflowRoot.append({
type: "call.requested",
requestId,
operationId,
input: getInput(nodeId),
timestamp: new Date().toISOString(),
});
}
}
});
@@ -438,11 +617,45 @@ for (const [nodeId, preconditions, blockedByFailure] of workflowRoot.nodes) {
}
});
}
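// 3. When a call completes, append the result event.
// (requestId, operationId, input, and parentRequestId are the values captured when the call was started in step 2.)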
prm.call(operationId, input, { parentRequestId })
.then(result => {
workflowRoot.append({
type: "call.responded",
requestId,
output: result,
timestamp: new Date().toISOString(),
});
})
.catch(error => {
workflowRoot.append({
type: "call.error",
requestId,
error: { code: error.code, message: error.message },
timestamp: new Date().toISOString(),
});
});
```
Both effects are reactive. When a predecessor completes, the `preconditions` computed re-evaluates, potentially triggering the start effect. When a predecessor fails, the `blockedByFailure` computed re-evaluates, potentially triggering the abort effect.
The call's promise resolution updates the node's status signal, which triggers downstream preconditions and failure propagations to re-evaluate, which triggers their effects, and so on.
The call's promise resolution appends events to the log. The status projection derives state from events. There is no direct `status.value = "running"` or `status.value = "completed"` — the projection handles these transitions by scanning the event log.
### Event-to-Status Mapping
The status projection maps events to `NodeStatus` values:
| Last event for node | Derived NodeStatus |
|---------------------|--------------------|
| No events | `idle` (or `waiting` if predecessors are running) |
| `call.requested` | `running` |
| `call.responded` | `completed` |
| `call.error` | `failed` |
| `call.aborted` | `aborted` |
| `call.completed` | `completed` |
For retries, the projection scans for the most recent event per node. A node whose log contains a `call.error` followed by a later `call.requested` (with a new `requestId`) is `running`, not `failed`.
### Effect disposal
@@ -558,11 +771,13 @@ The `WorkflowErrorBoundary` catches errors that escape the signal graph (e.g., a
## Constraints
- **Signals are in-memory** — `WorkflowReactiveRoot` state is not persisted. If the hub restarts, the reactive state is lost and must be reconstructed from call protocol events + template re-render.
- **Effect-driven execution is optional** — the hub coordinator can choose not to use `effect()` and instead poll `preconditions.value` and `blockedByFailure.value` manually. The reactive layer provides the building blocks; the coordinator decides how to use them.
- **Events are the source of truth** (ADR-005) — the hub coordinator appends call protocol events. Status, results, and call graph state are derived from the event log. The coordinator does NOT directly set signal values.
- **Event processing is idempotent** — processing the same event twice produces the same projected state. The status projection scans for the most recent event per node.
- **Signals are in-memory** — `WorkflowReactiveRoot` state is not persisted. If the hub restarts, the reactive state is reconstructed from call protocol events + template re-render. The event log itself can be reconstructed from the call protocol event stream.
- **Failure policy is configurable** — the `FailurePolicy` determines what happens to running nodes when a predecessor fails. Default is `continue-running` (only idle/waiting nodes abort). Alternative is `abort-dependents` (running dependents also abort).
- **Failure follows dependency edges, not structural scope** — a failed node causes only its downstream dependents (via DAG edges) to abort. Sibling branches in a `Parallel` group are independent and continue running. This enables partial success: one branch can fail while another completes.
- **Conditionals are error boundaries** — a `Conditional` whose test evaluates against a failed predecessor can redirect to an else branch, catching the failure. Without a `Conditional`, failures cascade uncaught through dependency edges.
- **Abort is immediate in signals, delayed in protocol** — setting `status.value = "aborted"` is instant, but `prm.abort(requestId)` takes time to propagate through the call protocol. The hub should invoke both.
- **Abort is immediate in signals, delayed in protocol** — transitioning a signal to `aborted` is instant, but `prm.abort(requestId)` takes time to propagate through the call protocol. The hub should invoke both.
- **`skipped` satisfies preconditions** — a `skipped` predecessor is treated as "completed for the purpose of preconditions." It means the branch was deliberately bypassed, not broken.
- **`failed` and `aborted` block preconditions** — a `failed` or `aborted` predecessor means the dependent's preconditions can never be met. The `blockedByFailure` effect transitions the dependent to `aborted`.
- **`NodeStatus` and `CallStatus` share call-level states**: `running`, `completed`, `failed`, and `aborted` map directly. `idle`, `waiting`, `ready`, and `skipped` are workflow-specific additions.
@@ -659,7 +874,7 @@ The `ReactiveContext` passed to `ReactiveHostConfig` includes a reference to `wo
1. **Should preconditions support OR logic?** Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes." This would require an edge attribute or node-level configuration.
2. **How are retries handled at the signal level?** If an operation fails and should be retried, the status would go `running → failed → ready → running`. This requires resetting the status back to `ready`, which the current state machine doesn't support (failed is terminal). A `retried` status or a separate `retryCount` attribute may be needed.
2. ~~**How are retries handled at the signal level?**~~ **Resolved by ADR-005**: A retry is an ordinary append to the log that creates a new `call.requested` with a new `requestId`. The status projection derives the current state by scanning for the most recent event per node, so no `retried` status is needed. See the Retry semantics section above.
3. **Should the reactive graph support partial re-rendering?** If a template changes mid-execution (e.g., a step is added), the ujsx reconciler could diff the old and new trees. But the ReactiveHost only supports mount rendering. Re-rendering would require reconciler support.
@@ -667,7 +882,7 @@ The `ReactiveContext` passed to `ReactiveHostConfig` includes a reference to `wo
5. **Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?** Currently the design has two separate computeds — `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` that returns `"ready" | "blocked" | "failed"` or similar. This reduces the number of effects but makes the readiness check less composable.
6. **What happens to running nodes when a predecessor fails?** The current spec transitions `idle` and `waiting` nodes to `aborted`. But what about a node that's already `running`? Should it be cancelled (set to `aborted` and call `prm.abort()`), or should it be allowed to complete? The answer depends on whether the running node's output is still needed — which the template author decides via `Conditional` error boundaries.
6. ~~**What happens to running nodes when a predecessor fails?**~~ **Resolved by ADR-005/OQ-010**: This is a `FailurePolicy` configuration of the projection. The default policy (`continue-running`) means running nodes continue. An alternative policy (`abort-dependents`) would abort running dependents. The event log makes both strategies expressible — only the projection logic changes.
## References