resolve all remaining open questions (OQ-03–OQ-29), add ADR-006

Resolve all 19 remaining open questions across the architecture. Every
question now has a documented resolution with rationale:

- OQ-004/OQ-029: edgeType is a universal required attribute on all edges,
  single graph per FlowGraph instance (ADR-006)
- OQ-011: No OR preconditions for v1; preconditionMode as v2 extension
- OQ-012: maxConcurrency enforced via reactive counting semaphore
- OQ-014: Unknown operationId creates node with pending status
- OQ-017: Expose common graphology traversal methods on FlowGraph (80/20)
- OQ-020: condition as Type.Unknown() with string/function documentation
- OQ-022: Identity imported from @alkdev/operations peer dep
- All other questions resolved with documented rationale

Fix three critical issues found by architecture review:
1. edgeType serialization/validation gap: document two-step validation
2. CallEdgeAttrs runtime discrimination: edgeType as runtime discriminant,
   depends_on edges clarified as observability-only (not execution)
3. ADR-005 signal mutation inconsistency: explicitly distinguish call-level
   statuses (event-log-driven) from workflow-derived statuses (signal-mutation)

Additional clarifications:
- dataFlow inference uses conservative strategy (defaults false)
- Conditional.test string resolution: operationName → status === completed
- Add negated field to TemplateEdgeAttrs for else-branch conditions
- Document edge key priority convention for composite keys
- Add maxConcurrency semaphore design to reactive-execution.md
2026-05-21 09:25:55 +00:00
parent c76be7f689
commit f3e084d02f
9 changed files with 239 additions and 268 deletions

@@ -1,6 +1,6 @@
---
status: draft
-last_updated: 2026-05-21
+last_updated: 2026-05-22
---
# Reactive Execution
@@ -376,67 +376,7 @@ Both B and C become `ready` at the same time, and the hub starts them in paralle
### Join preconditions
When a node depends on multiple predecessors (e.g., D depends on both B and C completing):
- D's preconditions: `B.status === "completed" && C.status === "completed"`
D only becomes `ready` when all predecessors complete. This is the "join" in fork-join parallelism.
## Failure Propagation
Failure propagation is the mechanism by which a failed or aborted node causes its downstream dependents to abort. The key design principle: **failure follows dependency edges, not structural scope**.
This means:
- In a `Sequential` group, failure propagates forward through the chain (B depends on A, so if A fails, B aborts)
- In a `Parallel` group, sibling branches are independent — a failure in branch A does NOT affect branch B, because there are no dependency edges between them
- A node that depends on multiple predecessors (a join) aborts only when it's impossible for its preconditions to ever be met
### The preconditions-failure duality
Each node has two complementary reactive computations:
1. **`preconditions`** (`computed<boolean>`) — true when all predecessors are `completed` or `skipped`. Node can start.
2. **`blockedByFailure`** (`computed<boolean>`) — true when any predecessor is `failed` or `aborted` and the failure is uncaught (not handled by a `Conditional`).
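```typescript
// Created once per node; `graph` is the DAG and `statusMap` maps node id → status signal.
const preconditions = computed(() => {
  const predecessors = graph.inNeighbors(node);
  // Every predecessor must reach a satisfying terminal state.
  return predecessors.every(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "completed" || status === "skipped";
  });
});

const blockedByFailure = computed(() => {
  const predecessors = graph.inNeighbors(node);
  // Any failed or aborted predecessor blocks this node.
  // (The check for failures caught by a `Conditional` is not shown here.)
  return predecessors.some(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "failed" || status === "aborted";
  });
});
```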
When `blockedByFailure` becomes `true` and the node hasn't started (`idle` or `waiting`), the node transitions to `aborted`. This happens via an `effect()`:
```typescript
effect(() => {
  // Only nodes that have not started yet abort (default FailurePolicy).
  if (blockedByFailure.value && (status.value === "idle" || status.value === "waiting")) {
    status.value = "aborted";
  }
});
```
This cascade is automatic and reactive — when a predecessor fails, all downstream `blockedByFailure` computations re-evaluate, and their effects fire, aborting any waiting dependents.
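The cascade can be reproduced without a signals library. The sketch below is a non-reactive stand-in, not the flowgraph implementation. `Preds`, `preconditionsMet`, `isBlockedByFailure`, and `propagateAborts` are hypothetical names, and the `Conditional` catch check is omitted. Instead of relying on effects, it re-applies the abort rule until nothing changes. The result is the state the effects settle into.

```typescript
// NodeStatus values as described in this document.
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical: predecessor lists keyed by node id (stands in for graph.inNeighbors).
type Preds = Record<string, string[]>;

// Same rule as the `preconditions` computed: all predecessors completed or skipped.
function preconditionsMet(node: string, preds: Preds, s: Map<string, NodeStatus>): boolean {
  return (preds[node] ?? []).every((p) => {
    const st = s.get(p);
    return st === "completed" || st === "skipped";
  });
}

// Same rule as `blockedByFailure`, without the Conditional-catch check.
function isBlockedByFailure(node: string, preds: Preds, s: Map<string, NodeStatus>): boolean {
  return (preds[node] ?? []).some((p) => {
    const st = s.get(p);
    return st === "failed" || st === "aborted";
  });
}

// Re-apply the abort effect until no status changes (a fixpoint).
function propagateAborts(preds: Preds, start: Map<string, NodeStatus>): Map<string, NodeStatus> {
  const next = new Map(start);
  let changed = true;
  while (changed) {
    changed = false;
    for (const [node, st] of next) {
      if ((st === "idle" || st === "waiting") && isBlockedByFailure(node, preds, next)) {
        next.set(node, "aborted");
        changed = true;
      }
    }
  }
  return next;
}

// Usage: chain A → B → C where A has just failed.
const chain: Preds = { A: [], B: ["A"], C: ["B"] };
const settled = propagateAborts(chain, new Map<string, NodeStatus>([
  ["A", "failed"], ["B", "waiting"], ["C", "waiting"],
]));
console.log(settled.get("B"), settled.get("C")); // aborted aborted
console.log(preconditionsMet("C", chain, settled)); // false
```

The failed node itself is left untouched. Only `idle` and `waiting` dependents change, which matches the default `continue-running` policy.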
### Sequential failure propagation
```
A (failed) → B (aborted) → C (aborted)
```
When A fails, B's `blockedByFailure` becomes true. B transitions from `waiting` to `aborted`. C's `blockedByFailure` then becomes true (B is now `aborted`). C transitions to `aborted`. The entire downstream chain aborts.
### Parallel independence
Consider two parallel branches, B and C, that fork from A and are joined again by D:
```
┌── B (completed) ──┐
@@ -444,36 +384,33 @@ A (completed) ├── D (ready)
└── C (failed) ─────┘
```
When C fails:
- C's downstream dependents see `blockedByFailure = true`
- B is unaffected — it's on an independent branch
- D depends on both B and C. D's `preconditions` will never be met (C is `failed`, not `completed`). D's `blockedByFailure` is true (C is `failed`). D transitions to `aborted`.
But crucially, this is because D *depends on* C, not because they share a structural scope:
```
┌── B (completed) ──┐
A (completed) │ (no edge from C to E)
└── C (failed) ─────┘
└── E (completed)
```
E has no dependency on C. E continues running regardless of C's failure. **Failure follows dependency edges, not structural boundaries.**
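Because failure follows dependency edges, the set of nodes a failure can reach is graph reachability from the failed node. The sketch below computes that set with a depth-first walk. `Edge` and `failureReach` are hypothetical names. The result is an upper bound: nodes in the set that have already completed, or that are running under the default `continue-running` policy, do not abort.

```typescript
// Hypothetical: a dependency edge [source, target] means target waits on source.
type Edge = [string, string];

// Nodes reachable from `failed` by following dependency edges forward.
// Structural scope (which Sequential/Parallel group a node sits in) is not an input.
function failureReach(edges: Edge[], failed: string): Set<string> {
  const successors = new Map<string, string[]>();
  for (const [src, dst] of edges) {
    const list = successors.get(src) ?? [];
    list.push(dst);
    successors.set(src, list);
  }
  const reached = new Set<string>();
  const stack = [...(successors.get(failed) ?? [])];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (reached.has(node)) continue;
    reached.add(node);
    stack.push(...(successors.get(node) ?? []));
  }
  return reached;
}

// Fork A → {B, C}, join D ← {B, C}; E is assumed here to depend on B only.
const deps: Edge[] = [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"], ["B", "E"]];
const reach = failureReach(deps, "C");
console.log(reach.has("D"), reach.has("E"), reach.has("B")); // true false false
```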
### Join semantics
When a node depends on multiple predecessors (fork-join):
```
┌── B (completed) ──┐
A (completed) ├── D (aborted)
└── C (failed) ─────┘
```
D's `preconditions` requires both B and C to be completed/skipped. Since C is `failed`, D's preconditions can never be met. D transitions to `aborted`.
The alternative would be "partial success" — D starts with B's output even though C failed. This is NOT supported by the precondition model. If partial execution is needed, the template author should use a `Conditional` to handle the failure case explicitly.
### `maxConcurrency` for Parallel groups
A `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time, even though all preconditions are met. This is a scheduling constraint, not a structural one — the DAG doesn't encode it.
The `WorkflowReactiveRoot` enforces `maxConcurrency` via a reactive counting semaphore:
```typescript
// For each node in a Parallel group with maxConcurrency:
const groupKey = getParallelGroup(nodeId);   // from parentMap/siblingMap
const maxConc = getMaxConcurrency(groupKey); // from template props
// `siblings`: the other nodes in the same Parallel group
const canStart = computed(() => {
  const siblingRunningCount = siblings.filter(
    sib => statusMap.get(sib)!.value === "running"
  ).length;
  return preconditions.value && siblingRunningCount < maxConc;
});
```
A node becomes `ready` only when both its `preconditions` are met AND the number of currently running siblings is below `maxConcurrency`. When a sibling completes and a slot opens, the next ready node starts.
For `Parallel` groups without `maxConcurrency` (the default), all siblings start immediately when their preconditions are met — no semaphore is needed.
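The `canStart` computed above counts siblings whose status is `running`. However, `running` is a call-level status projected from the event log. A sibling that has been admitted but whose call event has not yet been projected is therefore not counted. As a result, several siblings that become ready together may all observe the same free slot. One way to close that gap is to claim slots synchronously from a per-group FIFO queue. This is sketched below as an assumption, not the documented design. `GroupSemaphore`, `requestStart`, `release`, and `cancel` are hypothetical names.

```typescript
// Hypothetical admission controller for one Parallel group with maxConcurrency.
// Slots are claimed at admission time, so a node counts against the limit
// before its `running` status has been projected from the event log.
class GroupSemaphore {
  private readonly limit: number;
  private readonly admitted = new Set<string>();
  private queue: string[] = [];

  constructor(maxConcurrency: number) {
    this.limit = maxConcurrency;
  }

  // Preconditions became true: enqueue, then return the nodes that may start now.
  requestStart(nodeId: string): string[] {
    this.queue.push(nodeId);
    return this.drain();
  }

  // An admitted node reached a terminal status: free its slot.
  release(nodeId: string): string[] {
    this.admitted.delete(nodeId);
    return this.drain();
  }

  // A node was aborted or skipped: drop it from the queue and free any slot it held.
  cancel(nodeId: string): string[] {
    this.queue = this.queue.filter((id) => id !== nodeId);
    return this.release(nodeId);
  }

  get inFlight(): number {
    return this.admitted.size;
  }

  // Admit queued nodes in FIFO order while slots are free.
  private drain(): string[] {
    const started: string[] = [];
    while (this.admitted.size < this.limit && this.queue.length > 0) {
      const next = this.queue.shift()!;
      this.admitted.add(next);
      started.push(next);
    }
    return started;
  }
}

// Usage: three siblings become ready in the same tick, maxConcurrency = 2.
const sem = new GroupSemaphore(2);
const firstWave: string[] = [];
for (const id of ["B1", "B2", "B3"]) firstWave.push(...sem.requestStart(id));
console.log(firstWave.join(","));          // B1,B2
console.log(sem.release("B1").join(","));  // B3
```

FIFO order follows the text's "the next ready node starts". A different ordering would only change `drain`.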
### Conditional as error boundary
A `Conditional` can catch a failure and redirect to a fallback path:
@@ -771,7 +708,8 @@ The `WorkflowErrorBoundary` catches errors that escape the signal graph (e.g., a
## Constraints
-- **Events are the source of truth** (ADR-005) — the hub coordinator appends call protocol events. Status, results, and call graph state are derived from the event log. The coordinator does NOT directly set signal values.
+- **Events are the source of truth for call-level statuses** (ADR-005) — the hub coordinator appends call protocol events. Call-level statuses (`running`, `completed`, `failed`, `aborted` from `call.aborted`) are derived from the event log by the status projection. The coordinator does NOT directly set signal values for these statuses.
+- **Workflow-derived statuses use signal mutation** — statuses that have no call protocol equivalent (`idle`, `waiting`, `ready`, `skipped`, and `aborted` from `blockedByFailure`) are set directly on signals by the reactive engine. This is not a violation of ADR-005's event-log principle — these statuses represent workflow-level concerns (scheduling, failure propagation) that exist outside the call protocol's scope. ADR-005's principle applies to *call protocol events*; it does not forbid the reactive layer from managing its own workflow-level state. See the "Hybrid Status Model" section for the full categorization.
- **Event processing is idempotent** — processing the same event twice produces the same projected state. The status projection scans for the most recent event per node.
- **Signals are in-memory** — `WorkflowReactiveRoot` state is not persisted. If the hub restarts, the reactive state is reconstructed from call protocol events + template re-render. The event log itself can be reconstructed from the call protocol event stream.
- **Failure policy is configurable** — the `FailurePolicy` determines what happens to running nodes when a predecessor fails. Default is `continue-running` (only idle/waiting nodes abort). Alternative is `abort-dependents` (running dependents also abort).
@@ -780,7 +718,8 @@ The `WorkflowErrorBoundary` catches errors that escape the signal graph (e.g., a
- **Abort is immediate in signals, delayed in protocol** — transitioning a signal to `aborted` is instant, but `prm.abort(requestId)` takes time to propagate through the call protocol. The hub should invoke both.
- **`skipped` satisfies preconditions** — a `skipped` predecessor is treated as "completed for the purpose of preconditions." It means the branch was deliberately bypassed, not broken.
- **`failed` and `aborted` block preconditions** — a `failed` or `aborted` predecessor means the dependent's preconditions can never be met. The `blockedByFailure` effect transitions the dependent to `aborted`.
-- **`NodeStatus` and `CallStatus` share terminal states** — `running`, `completed`, `failed`, `aborted` map directly. `idle`, `waiting`, `ready`, `skipped` are workflow-specific additions.
+- **`NodeStatus` and `CallStatus` share terminal states** — `running`, `completed`, `failed`, `aborted` map directly. `idle`, `waiting`, `ready`, `skipped` are workflow-specific additions with no call protocol equivalent.
- **Edge key format uses composite keys for call graph** — `triggered` edges use `${source}->${target}`, `depends_on` edges use `${source}->${target}:depends_on`. See [schema.md](schema.md) for the full key convention.
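The idempotency constraint and the retry rule both follow from projecting the latest event per node. Below is a minimal sketch, assuming each event carries a monotonically increasing `seq`. `CallEvent`, `STATUS_FOR`, and `projectStatuses` are hypothetical names. The sketch projects only call-level statuses, because workflow-derived statuses live on signals.

```typescript
// Hypothetical event shape: `seq` is the position in the append-only log.
// Only call.requested and call.aborted are named in this document;
// call.completed and call.failed are assumed names.
interface CallEvent {
  seq: number;
  nodeId: string;
  requestId: string;
  type: "call.requested" | "call.completed" | "call.failed" | "call.aborted";
}

type CallLevelStatus = "running" | "completed" | "failed" | "aborted";

const STATUS_FOR: Record<CallEvent["type"], CallLevelStatus> = {
  "call.requested": "running",
  "call.completed": "completed",
  "call.failed": "failed",
  "call.aborted": "aborted",
};

// The most recent event per node (highest seq) determines its call-level status.
// Re-processing an event that is already in the log cannot change the winner,
// so the projection is idempotent.
function projectStatuses(events: CallEvent[]): Map<string, CallLevelStatus> {
  const latest = new Map<string, CallEvent>();
  for (const ev of events) {
    const prev = latest.get(ev.nodeId);
    if (prev === undefined || ev.seq >= prev.seq) latest.set(ev.nodeId, ev);
  }
  const statuses = new Map<string, CallLevelStatus>();
  for (const [nodeId, ev] of latest) statuses.set(nodeId, STATUS_FOR[ev.type]);
  return statuses;
}

// Usage: a failed call followed by a retry under a new requestId.
const eventLog: CallEvent[] = [
  { seq: 1, nodeId: "fetch", requestId: "r1", type: "call.requested" },
  { seq: 2, nodeId: "fetch", requestId: "r1", type: "call.failed" },
  { seq: 3, nodeId: "fetch", requestId: "r2", type: "call.requested" },
  { seq: 4, nodeId: "fetch", requestId: "r2", type: "call.completed" },
];
const once = projectStatuses(eventLog);
const replayed = projectStatuses([...eventLog, eventLog[1]!]); // seq 2 delivered again
console.log(once.get("fetch"), replayed.get("fetch")); // completed completed
```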
## Lifecycle and Ownership
@@ -872,15 +811,15 @@ The `ReactiveContext` passed to `ReactiveHostConfig` includes a reference to `wo
## Open Questions
-1. **Should preconditions support OR logic?** Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes." This would require an edge attribute or node-level configuration.
+1. ~~**Should preconditions support OR logic?**~~ **Resolved (OQ-011)**: No for v1. All preconditions use AND logic — a node becomes `ready` only when ALL predecessors have reached a satisfying terminal state (`completed` or `skipped`). OR logic (`anyOf`) would introduce significant complexity (what happens when one predecessor completes but another fails? Is the node ready or blocked?) and is already partially addressed by `Conditional` (which provides branch-level either/or semantics). For v2, if OR logic becomes necessary, it should be added as a `preconditionMode: "allOf" | "anyOf"` attribute on `Operation` (node-level, not edge-level), defaulting to `"allOf"`. This is a clean extension point that doesn't change the current precondition model.
2. ~~**How are retries handled at the signal level?**~~ **Resolved by ADR-005**: Retries are natural append events. A retry creates a new `call.requested` with a new `requestId`. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed. See the Retry semantics section above.
-3. **Should the reactive graph support partial re-rendering?** If a template changes mid-execution (e.g., a step is added), the ujsx reconciler could diff the old and new trees. But the ReactiveHost only supports mount rendering. Re-rendering would require reconciler support.
+3. ~~**Should the reactive graph support partial re-rendering?**~~ **Resolved (OQ-025)**: Blocked on ujsx reconciler. Currently mount-only. When the reconciler is implemented, flowgraph gains re-rendering through the standard `prepareUpdate`/`commitUpdate` HostConfig methods. The event log persists across re-renders (ADR-005), so re-rendered nodes pick up where they left off. No special reactive-graph re-rendering logic is needed — the reconciler handles tree diffing, and the HostConfig applies mutations.
-4. **How does `maxConcurrency` interact with preconditions?** A `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time, even though all preconditions are met. This is a scheduling concern, not a structural one. The reactive layer could implement this as a semaphore signal, or it could be the coordinator's responsibility.
+4. ~~**How does `maxConcurrency` interact with preconditions?**~~ **Resolved (OQ-012)**: `maxConcurrency` is a `Parallel` prop enforced by the `WorkflowReactiveRoot` via a counting semaphore in the reactive layer. When the root initializes signals for nodes in a `Parallel` group with `maxConcurrency: N`, it wraps the precondition logic: a node's effective `ready` transition requires both `preconditions.value === true` AND `runningCount < maxConcurrency`. The `runningCount` is a reactive computed derived from counting sibling nodes currently in the `running` state. This is entirely a reactive-engine concern — the DAG doesn't encode `maxConcurrency` (it's not structural), and the call graph doesn't need to know about it. The `Parallel` component's `maxConcurrency` prop is already part of the template definition; the reactive engine just needs to honor it.
-5. **Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?** Currently the design has two separate computeds `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` that returns `"ready" | "blocked" | "failed"` or similar. This reduces the number of effects but makes the readiness check less composable.
+5. ~~**Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?**~~ **Resolved (OQ-013)**: Keep two separate `computed` values (current design). Two separate computeds are more composable — you can check preconditions independently of failure status, and you can compose different effects for each. A single `computed<NodeReadiness>` would require every consumer to destructure the result, losing the clean `if (preconditions.value) { ... }` pattern. The implementation cost of two effects per node is negligible. The current design is the right one.
6. ~~**What happens to running nodes when a predecessor fails?**~~ **Resolved by ADR-005/OQ-010**: This is a `FailurePolicy` configuration of the projection. The default policy (`continue-running`) means running nodes continue. An alternative policy (`abort-dependents`) would abort running dependents. The event log makes both strategies expressible — only the projection logic changes.
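Because `FailurePolicy` only changes the projection logic, the two policies can be compared as a pure function over the dependents' current statuses. `AbortPlan` and `planAborts` in the sketch below are hypothetical names. The sketch follows the hybrid status model: `idle`/`waiting` dependents are aborted by signal mutation, and running dependents under `abort-dependents` have their calls cancelled, with their `aborted` status then arriving via `call.aborted`. If the hub also flips the signal eagerly, as the "Abort is immediate in signals" constraint suggests, running nodes would appear in both lists.

```typescript
// FailurePolicy values named in this document.
type FailurePolicy = "continue-running" | "abort-dependents";
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical plan shape separating the two mechanisms of the hybrid status model.
interface AbortPlan {
  flipSignals: string[]; // workflow-derived: set the status signal to "aborted" directly
  cancelCalls: string[]; // call-level: the hub maps each node to its requestId for prm.abort
}

// Decide what happens to the dependents of a failed node under a given policy.
function planAborts(policy: FailurePolicy, dependents: Map<string, NodeStatus>): AbortPlan {
  const plan: AbortPlan = { flipSignals: [], cancelCalls: [] };
  for (const [nodeId, st] of dependents) {
    if (st === "idle" || st === "waiting") {
      plan.flipSignals.push(nodeId); // never started: aborts under every policy
    } else if (st === "running" && policy === "abort-dependents") {
      plan.cancelCalls.push(nodeId); // status follows from the call.aborted event
    }
  }
  return plan;
}

const dependents = new Map<string, NodeStatus>([
  ["B", "waiting"], ["C", "running"], ["D", "completed"],
]);
console.log(JSON.stringify(planAborts("continue-running", dependents)));
// {"flipSignals":["B"],"cancelCalls":[]}
console.log(JSON.stringify(planAborts("abort-dependents", dependents)));
// {"flipSignals":["B"],"cancelCalls":["C"]}
```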