ADR-005: event log as single source of truth

Proposed architecture decision to use an append-only execution event log
(call protocol events) as ground truth, with status/result/call-graph as
projections. Resolves OQ-06, OQ-07, OQ-08, OQ-09; reframes OQ-01, OQ-02,
OQ-10. Inspired by the event sourcing discipline (notification vs. state
transfer) and the compute_graph ExecutionContext pattern.
2026-05-20 09:33:15 +00:00
parent 27ebbd491e
commit 2c1b2d1a15
3 changed files with 204 additions and 25 deletions


@@ -71,6 +71,7 @@ Flowgraph is in Phase 0/1 (exploration → architecture). No code exists yet. Th
| [002](decisions/002-dag-only-graph.md) | Enforce DAG invariants — no cycles in flowgraph |
| [003](decisions/003-storage-decoupled.md) | Storage is not flowgraph's concern — in-memory graph with export/import boundary |
| [004](decisions/004-no-schema-version.md) | No schema version field in serialized format — consumers wrap in their own versioned envelope |
| [005](decisions/005-event-log-as-source-of-truth.md) | Execution Event Log as single source of truth — call protocol events as ground truth, status/result/call-graph as projections |
### Open Questions


@@ -0,0 +1,156 @@
# ADR-005: Event Log as Single Source of Truth
## Status
Proposed
## Context
Flowgraph's reactive execution layer currently uses signal-based state propagation (`signal<NodeStatus>` and `computed<boolean>` for preconditions). Call graph nodes are populated from call protocol events. The two systems — reactive status tracking and call graph construction — are separate concepts that happen to process the same events.
Several open questions in the architecture reveal a common underlying problem:
1. **OQ-06**: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — `Conditional.test` receives a `results` map from an ad-hoc closure.
2. **OQ-07**: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
3. **OQ-08**: Should `depends_on` edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
4. **OQ-09**: How are retries handled? The current state machine marks `failed` as terminal, requiring awkward workarounds for retry.
5. **OQ-10**: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
6. **OQ-02**: How deep should type compatibility checking go? This conflates edges that carry data (where types matter) with edges that only express ordering (where types are irrelevant).
These questions share a common root: **the architecture conflates notification (something happened) with state transfer (here's the data).** The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.
Meanwhile, the call protocol already defines a sequence of append-only facts:
```
call.requested → { requestId, operationId, input, parentRequestId, timestamp }
call.responded → { requestId, output, timestamp }
call.error → { requestId, error, timestamp }
call.aborted → { requestId, timestamp }
call.completed → { requestId, timestamp }
```
These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.
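As a minimal sketch of what "append-only facts" could look like in code, the events above can be modelled as a discriminated union with a log that exposes only `append` and read-only snapshots. The field types (string ids, numeric timestamps, string errors) and the `ExecutionEventLog` class are assumptions for illustration, not the actual definitions in the call protocol.

```typescript
// Hypothetical TypeScript shapes for the events listed above.
// Field types (string ids, numeric timestamps, string errors) are assumptions.
type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string; input: unknown; parentRequestId?: string; timestamp: number }
  | { type: "call.responded"; requestId: string; output: unknown; timestamp: number }
  | { type: "call.error"; requestId: string; error: string; timestamp: number }
  | { type: "call.aborted"; requestId: string; timestamp: number }
  | { type: "call.completed"; requestId: string; timestamp: number };

// Append-only: the only mutation offered is append(); reads return copies.
class ExecutionEventLog {
  private readonly events: CallEvent[] = [];

  append(event: CallEvent): void {
    this.events.push({ ...event });
  }

  snapshot(): CallEvent[] {
    return this.events.map((e) => ({ ...e }));
  }

  get length(): number {
    return this.events.length;
  }
}

const log = new ExecutionEventLog();
log.append({ type: "call.requested", requestId: "r1", operationId: "fetch-items", input: {}, timestamp: 1 });
log.append({ type: "call.responded", requestId: "r1", output: [1, 2, 3], timestamp: 2 });
```

Returning copies from `snapshot()` is one way to keep consumers from editing history; a frozen or persistent structure would serve the same purpose.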
## Decision
Flowgraph's reactive execution layer will be built on an **Execution Event Log** — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.
### Core Concept: Event Log + Projections
```
┌─────────────────────────────────────────────┐
│ Execution Event Log │
│ (append-only sequence of call protocol │
│ events — the ground truth) │
└──────────────────┬──────────────────────────┘
┌─────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌──────────┐
│ Status │ │ Result │ │ Call │
│ Proj. │ │ Proj. │ │ Graph │
│ │ │ │ │ Proj. │
│ nodeId: │ │ nodeId: │ │ │
│ status │ │ output │ │ nodes + │
│ │ │ │ │ edges │
└────┬────┘ └────┬─────┘ └──────────┘
│ │
▼ ▼
┌───────────────────────────────────────────┐
│ Reactive Execution Layer │
│ │
│ preconditions → "does the log show │
│ all predecessors │
│ completed?" │
│ │
│ result resolution → "does the log │
│ have A's output?" │
│ │
│ Conditional.test → reads from result proj. │
│ Map.over → reads from result proj. │
└───────────────────────────────────────────┘
```
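The diagram can be read as two pure functions over the same event sequence. The sketch below is illustrative only: the reduced `NodeStatus` vocabulary, the choice to treat `call.responded` as success, and the rule that `call.completed` only finalizes a call that is still running are all assumptions, and events are keyed by `requestId` for simplicity.

```typescript
type CallEvent =
  | { type: "call.requested"; requestId: string }
  | { type: "call.responded"; requestId: string; output: unknown }
  | { type: "call.error"; requestId: string; error: string }
  | { type: "call.aborted"; requestId: string }
  | { type: "call.completed"; requestId: string };

// Assumed, reduced status vocabulary; the real NodeStatus has more states.
type NodeStatus = "running" | "completed" | "failed" | "aborted";

// Status projection: one pass over the log, last relevant event wins.
function projectStatus(events: readonly CallEvent[]): Map<string, NodeStatus> {
  const status = new Map<string, NodeStatus>();
  for (const e of events) {
    switch (e.type) {
      case "call.requested":
        status.set(e.requestId, "running");
        break;
      case "call.responded":
        status.set(e.requestId, "completed");
        break;
      case "call.error":
        status.set(e.requestId, "failed");
        break;
      case "call.aborted":
        status.set(e.requestId, "aborted");
        break;
      case "call.completed":
        // Assumption: only finalizes a call that has not already ended.
        if (status.get(e.requestId) === "running") status.set(e.requestId, "completed");
        break;
    }
  }
  return status;
}

// Result projection: keeps only the outputs carried by call.responded.
function projectResults(events: readonly CallEvent[]): Map<string, unknown> {
  const results = new Map<string, unknown>();
  for (const e of events) {
    if (e.type === "call.responded") results.set(e.requestId, e.output);
  }
  return results;
}

const sample: CallEvent[] = [
  { type: "call.requested", requestId: "r1" },
  { type: "call.responded", requestId: "r1", output: "ok" },
  { type: "call.requested", requestId: "r2" },
  { type: "call.error", requestId: "r2", error: "timeout" },
  { type: "call.requested", requestId: "r3" },
];
const statusView = projectStatus(sample);
const resultView = projectResults(sample);
```

Neither function mutates the log, so either projection can be rebuilt at any time by replaying the events.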
### Notification vs. State Transfer
The event log naturally distinguishes two patterns that the current architecture conflates:
| Pattern | What the edge means | What downstream needs | Event type |
|---------|--------------------|-----------------------|------------|
| **Temporal (task ordering)** | "A must complete before B starts" | Just the notification that A completed | Notification |
| **Data flow** | "A's output is B's input" | A's actual output data | State Transfer |
The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.
The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — `process-item` needs `fetch-items`'s output. It needs the state.
Both patterns derive from the same event log. Different projections serve different needs.
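One way to picture the two patterns side by side: every incoming edge gates when a node may start, but only state-transfer edges contribute input data. The `EdgeKind` attribute, `canStart`, and `inputsFor` below are hypothetical names used for this sketch; whether edges should carry such an attribute at all is still listed as open in this ADR.

```typescript
// Hypothetical edge attribute separating ordering from data flow.
type EdgeKind = "temporal" | "state-transfer";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

// Notification: every predecessor, of either kind, must have completed.
function canStart(node: string, edges: readonly Edge[], completed: ReadonlySet<string>): boolean {
  return edges.filter((e) => e.to === node).every((e) => completed.has(e.from));
}

// State transfer: only state-transfer predecessors contribute inputs.
function inputsFor(node: string, edges: readonly Edge[], results: ReadonlyMap<string, unknown>): Map<string, unknown> {
  const inputs = new Map<string, unknown>();
  for (const e of edges) {
    if (e.to === node && e.kind === "state-transfer" && results.has(e.from)) {
      inputs.set(e.from, results.get(e.from));
    }
  }
  return inputs;
}

const edges: Edge[] = [
  { from: "architect", to: "reviewer", kind: "temporal" },
  { from: "fetch-items", to: "process-item", kind: "state-transfer" },
];
const completed = new Set<string>(["architect", "fetch-items"]);
const results = new Map<string, unknown>([
  ["architect", "spec written to disk"],
  ["fetch-items", [1, 2, 3]],
]);

const reviewerInputs = inputsFor("reviewer", edges, results);
const processInputs = inputsFor("process-item", edges, results);
```

Here the reviewer may start once the architect completes but receives no inputs, while `process-item` receives the output of `fetch-items`.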
### Retry Semantics
Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:
```
call.requested(A, attempt=1) → fact: A was requested
call.error(A, "timeout") → fact: A failed
call.requested(A, attempt=2) → fact: A was retried
call.responded(A, output) → fact: A succeeded on retry
```
The status projection derives the current state by scanning for the most recent event per node. No state machine mutation needed. The state machine becomes a fold over the event log.
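The "fold over the event log" can be sketched with `reduce`. This sketch assumes events are keyed by a node name and carry an `attempt` counter, mirroring the notation in the example above; the `AttemptEvent` and `NodeState` types are hypothetical.

```typescript
type AttemptEvent =
  | { type: "call.requested"; node: string; attempt: number }
  | { type: "call.responded"; node: string; output: unknown }
  | { type: "call.error"; node: string; error: string };

type Status = "running" | "completed" | "failed";

interface NodeState {
  status: Status;
  attempts: number;
}

// The state machine as a fold: each event produces the next state for its
// node, and nothing in the log is ever rewritten.
function foldStatus(events: readonly AttemptEvent[]): Map<string, NodeState> {
  return events.reduce((acc, e) => {
    const prev: NodeState = acc.get(e.node) ?? { status: "running", attempts: 0 };
    if (e.type === "call.requested") {
      acc.set(e.node, { status: "running", attempts: e.attempt });
    } else if (e.type === "call.responded") {
      acc.set(e.node, { ...prev, status: "completed" });
    } else {
      acc.set(e.node, { ...prev, status: "failed" });
    }
    return acc;
  }, new Map<string, NodeState>());
}

const retryLog: AttemptEvent[] = [
  { type: "call.requested", node: "A", attempt: 1 },
  { type: "call.error", node: "A", error: "timeout" },
  { type: "call.requested", node: "A", attempt: 2 },
  { type: "call.responded", node: "A", output: "done" },
];

const afterFailure = foldStatus(retryLog.slice(0, 2)).get("A");
const afterRetry = foldStatus(retryLog).get("A");
```

Folding the first two events yields a failed node with one attempt; folding all four yields a completed node with two attempts, and the earlier failure is still present in the log.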
### Type Compatibility
Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.
This also reframes OQ-01: incompatible edges (type mismatches) can only exist on state-transfer edges. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.
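A rough sketch of how a checker could skip temporal-only edges follows. The comparison here is a stand-in that compares declared type names as plain strings; it is not the structural `typeCompat()` described in the analysis doc, and `checkEdges`, `EdgeKind`, and the type-name maps are hypothetical.

```typescript
type EdgeKind = "temporal" | "state-transfer";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

interface CompatResult {
  edge: Edge;
  compatible: boolean;
}

// Stand-in check: compares declared type names only. A real implementation
// would compare schemas structurally and report mismatches.
function checkEdges(
  edges: readonly Edge[],
  outputType: ReadonlyMap<string, string>,
  inputType: ReadonlyMap<string, string>,
): CompatResult[] {
  return edges
    .filter((e) => e.kind === "state-transfer") // temporal edges are never checked
    .map((e) => {
      const out = outputType.get(e.from);
      return { edge: e, compatible: out !== undefined && out === inputType.get(e.to) };
    });
}

const graphEdges: Edge[] = [
  { from: "architect", to: "reviewer", kind: "temporal" },
  { from: "fetch-items", to: "process-item", kind: "state-transfer" },
  { from: "fetch-items", to: "aggregate", kind: "state-transfer" },
];
const outputs = new Map<string, string>([["architect", "Spec"], ["fetch-items", "Item[]"]]);
const inputs = new Map<string, string>([["reviewer", "ReviewRequest"], ["process-item", "Item[]"], ["aggregate", "Summary"]]);

const report = checkEdges(graphEdges, outputs, inputs);
```

The architect-to-reviewer edge never appears in the report even though the two declared types differ, because no data flows along it.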
## Rationale
1. **The call protocol already IS the event log.** Every call event (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.
2. **Projections separate concerns.** The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."
3. **Notification and state transfer are different.** The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.
4. **Retries are natural.** An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a `retried` status or breaking the terminal-state invariant.
5. **Data dependencies don't need separate edges.** If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate `depends_on` edge type (OQ-08) becomes unnecessary because the event log is the data transport.
6. **Category theory alignment.** The event log is a monoid (append with identity). Projections are functors from the log monoid to status/result/graph monoids. Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.
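To illustrate point 2, the call graph can be derived from the same events without involving the reactive engine at all. The sketch below assumes that `parentRequestId` on `call.requested` is what defines a parent/child edge; the `CallGraph` shape and `projectCallGraph` are hypothetical.

```typescript
interface Requested {
  type: "call.requested";
  requestId: string;
  operationId: string;
  parentRequestId?: string;
}

interface OtherEvent {
  type: "call.responded" | "call.error" | "call.aborted" | "call.completed";
  requestId: string;
}

type CallEvent = Requested | OtherEvent;

interface CallGraph {
  nodes: Map<string, { operationId: string }>;
  edges: Array<{ parent: string; child: string }>;
}

// Structural projection: only call.requested events add nodes and edges.
// A requestId that was already seen is skipped, so replaying an event
// does not duplicate nodes or edges.
function projectCallGraph(events: readonly CallEvent[]): CallGraph {
  const graph: CallGraph = { nodes: new Map(), edges: [] };
  for (const e of events) {
    if (e.type === "call.requested" && !graph.nodes.has(e.requestId)) {
      graph.nodes.set(e.requestId, { operationId: e.operationId });
      if (e.parentRequestId !== undefined) {
        graph.edges.push({ parent: e.parentRequestId, child: e.requestId });
      }
    }
  }
  return graph;
}

const callEvents: CallEvent[] = [
  { type: "call.requested", requestId: "r0", operationId: "workflow" },
  { type: "call.requested", requestId: "r1", operationId: "fetch-items", parentRequestId: "r0" },
  { type: "call.requested", requestId: "r2", operationId: "aggregate", parentRequestId: "r0" },
  { type: "call.responded", requestId: "r1" },
  { type: "call.requested", requestId: "r1", operationId: "fetch-items", parentRequestId: "r0" },
];
const callGraph = projectCallGraph(callEvents);
```

Because this function and the status projection read the same input and share no state, either can exist without the other.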
## Consequences
### Positive
- **OQ-06 resolved**: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
- **OQ-07 resolved**: The call graph and reactive engine are both projections of the event log. Neither owns the other.
- **OQ-08 resolved**: `depends_on` edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
- **OQ-09 resolved**: Retries are natural — append new events rather than mutating state.
- **OQ-10 reframed**: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
- **OQ-02 reframed**: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
- **OQ-01 reframed**: Incompatible edges only exist where there's data flow. Temporal-only edges carry no data, so the operation graph does not need to record compatibility for them.
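The OQ-10 point, failure handling as projection configuration, might look roughly like the sketch below. The `FailurePolicy` names and the `applyFailure` helper are hypothetical; the only behaviour taken from the current spec is that `idle` and `waiting` dependents of a failed node become `aborted`.

```typescript
type NodeStatus = "idle" | "waiting" | "running" | "completed" | "failed" | "aborted";

// Hypothetical policy switch; the names are not part of any spec.
type FailurePolicy = "continue-running" | "abort-running";

// Given the dependents of a failed node, compute the next status view.
// Not-yet-started dependents are always aborted; running ones only when
// the policy says so. The input map is left unchanged.
function applyFailure(
  status: ReadonlyMap<string, NodeStatus>,
  dependents: readonly string[],
  policy: FailurePolicy,
): Map<string, NodeStatus> {
  const next = new Map<string, NodeStatus>(status);
  for (const node of dependents) {
    const s = next.get(node);
    const notStarted = s === "idle" || s === "waiting";
    if (notStarted || (s === "running" && policy === "abort-running")) {
      next.set(node, "aborted");
    }
  }
  return next;
}

const before = new Map<string, NodeStatus>([
  ["A", "failed"],
  ["B", "waiting"],
  ["C", "running"],
  ["D", "completed"],
]);
const lenient = applyFailure(before, ["B", "C", "D"], "continue-running");
const strict = applyFailure(before, ["B", "C", "D"], "abort-running");
```

Both policies read the same failure fact; only the projection logic differs, which is the sense in which the question is reframed rather than resolved.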
### Negative
- **The reactive layer needs a redesign.** The current `WorkflowReactiveRoot` directly creates `signal<NodeStatus>` instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
- **The event log must be persisted if workflow state must survive restarts.** Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
- **Event replay must be idempotent.** Processing the same event twice must produce the same projected state. This is already a property of the call protocol events (`updateFromEvent` is documented as idempotent in call-graph.md).
- **The result projection needs a clear interface.** `getResult(nodeId)` must be defined — what it returns, when it's available, and how it interacts with `Conditional.test` and `Map.over` closures that may reference results from nodes that haven't completed yet.
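The last two points, idempotent replay and the shape of `getResult(nodeId)`, can be explored together in a small sketch. Everything here is an assumption: results are keyed by `requestId`, terminal states are treated as sticky (which relies on a retry using a new `requestId`), and `getResult` returns an explicit state tag so that closures can tell "not finished yet" from "finished".

```typescript
type CallEvent =
  | { type: "call.requested"; requestId: string }
  | { type: "call.responded"; requestId: string; output: unknown }
  | { type: "call.error"; requestId: string; error: string };

// Hypothetical return shape for getResult().
type ResultState =
  | { state: "unknown" }
  | { state: "pending" }
  | { state: "available"; output: unknown }
  | { state: "failed"; error: string };

type ResultProjection = Map<string, ResultState>;

// Assumed rule: once a request is available or failed, later events for the
// same requestId are ignored, so applying an event twice changes nothing.
function applyEvent(state: ResultProjection, e: CallEvent): ResultProjection {
  const current = state.get(e.requestId);
  if (current !== undefined && current.state !== "pending") return state;
  const next = new Map(state);
  if (e.type === "call.requested") next.set(e.requestId, { state: "pending" });
  else if (e.type === "call.responded") next.set(e.requestId, { state: "available", output: e.output });
  else next.set(e.requestId, { state: "failed", error: e.error });
  return next;
}

function replay(events: readonly CallEvent[]): ResultProjection {
  return events.reduce(applyEvent, new Map<string, ResultState>());
}

function getResult(projection: ResultProjection, nodeId: string): ResultState {
  return projection.get(nodeId) ?? { state: "unknown" };
}

const history: CallEvent[] = [
  { type: "call.requested", requestId: "r1" },
  { type: "call.responded", requestId: "r1", output: 42 },
  { type: "call.requested", requestId: "r2" },
];
const once = replay(history);
const twice = replay([...history, ...history]);
```

Under these assumptions, a `Conditional.test` or `Map.over` closure would check for `state === "available"` before reading `output`, instead of receiving `undefined` for a node that has not completed.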
### Open
- **Should the event log be its own exported type, or is it the call protocol event stream by another name?** The call protocol already defines the events. The event log might just be `CallEventMapValue[]` with an append-only contract and projection functions.
- **How does the event log interact with the ujsx template lifecycle?** When a template is rendered to a reactive root, the log starts empty and populates as events arrive. But if the template is re-rendered (when the reconciler supports it), what happens to the log? Is it reset, or does it persist across re-renders?
- **Should temporal-only edges be explicitly marked?** Currently `sequential` edges are always temporal ordering. Data flow is implicitly expressed by `Conditional.test` and `Map.over` reading from the result projection. Should edges carry an attribute that explicitly marks them as notification vs. state transfer? This would make type checking more precise (only check types on state-transfer edges).
## References
- Open questions tracker: [open-questions.md](open-questions.md)
- Reactive execution: [reactive-execution.md](reactive-execution.md)
- Call graph: [call-graph.md](call-graph.md)
- Call protocol events: `@alkdev/operations/src/call.ts`
- Event sourcing research: `/workspace/research/event_sourcing/event_source_types.md`
- Category theory graph research: `/workspace/@alkdev/ujsx/docs/research/category-theory-graph.md`
- Compute graph DAG: `/workspace/compute_graph/packages/dag/`


@@ -14,28 +14,43 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
- When a question is resolved, update its status to `resolved` and add a resolution note - When a question is resolved, update its status to `resolved` and add a resolution note
- Once all questions in a theme are resolved, the theme section can be removed - Once all questions in a theme are resolved, the theme section can be removed
## ADR-005 Impact
[ADR-005: Event Log as Single Source of Truth](decisions/005-event-log-as-source-of-truth.md) proposes an Execution Event Log pattern that resolves or reframes several open questions. Questions affected by ADR-005 are marked with `adr-005` in their status. Summary:
| Question | ADR-005 Impact |
|----------|-----------------|
| OQ-01 | Reframed: incompatible edges only exist where there's data flow. Temporal-only edges don't need type checking. |
| OQ-02 | Reframed: type compatibility depth only applies to state-transfer edges, not notification edges. |
| OQ-06 | Resolved: the reactive layer bridges to the call protocol through the event log, not direct signal mutation. |
| OQ-07 | Resolved: call graph and reactive engine are both projections of the event log. Neither owns the other. |
| OQ-08 | Resolved: `depends_on` edges unnecessary; data dependencies expressed through result projection. |
| OQ-09 | Resolved: retries are natural append events, not state mutations. |
| OQ-10 | Reframed: policy question (abort running nodes?) becomes a projection configuration, not a hardcoded state machine rule. |
## Theme 1: Edge Semantics and Type Compatibility ## Theme 1: Edge Semantics and Type Compatibility
### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones? ### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones?
- **Origin**: [operation-graph.md](operation-graph.md) Q1 - **Origin**: [operation-graph.md](operation-graph.md) Q1
- **Status**: open - **Status**: reframed by ADR-005
- **Priority**: high — affects storage size, API surface, and diagnostic value - **Priority**: high — affects storage size, API surface, and diagnostic value
- **Options**: - **Options**:
- (a) Add both compatible and incompatible edges (current design). Pro: diagnostic information visible. Con: graph is larger. - (a) Add both compatible and incompatible edges (current design). Pro: diagnostic information visible. Con: graph is larger.
- (b) Only add compatible edges, with a `potentialEdges()` query computing incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information. - (b) Only add compatible edges, with a `potentialEdges()` query computing incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information.
- **Notes**: This decision affects `buildTypeEdges()` in [analysis.md](analysis.md) and `OperationEdgeAttrs` in [schema.md](schema.md). The `compatible: false` attribute on edges only makes sense if option (a) is chosen. - **Notes**: This decision affects `buildTypeEdges()` in [analysis.md](analysis.md) and `OperationEdgeAttrs` in [schema.md](schema.md). The `compatible: false` attribute on edges only makes sense if option (a) is chosen.
- **ADR-005 reframing**: Incompatible edges only exist on **state-transfer** edges (where data flows from A's output to B's input). **Temporal-only** edges (where B starts after A completes but doesn't use A's output) don't need type checking at all. This means option (b) may be correct for temporal edges, while option (a) is correct for state-transfer edges. The operation graph could distinguish these with an edge attribute.
- **Cross-references**: OQ-04 - **Cross-references**: OQ-04
### OQ-02: How granular should type compatibility results be? ### OQ-02: How granular should type compatibility results be?
- **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1 - **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1
- **Status**: open - **Status**: reframed by ADR-005
- **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs` - **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs`
- **Question (merged)**: How deep should `typeCompat` check? Should it be fully recursive? And should the result be `{ compatible, detail? }` or `{ compatible, mismatches: TypeMismatch[] }`? - **Question (merged)**: How deep should `typeCompat` check? Should it be fully recursive? And should the result be `{ compatible, detail? }` or `{ compatible, mismatches: TypeMismatch[] }`?
- **Current design**: The schema already defines `TypeMismatch` with `{ path, expected, actual }` and `OperationEdgeAttrs` has an optional `mismatches` field. The analysis doc describes deep recursive structural comparison. But there's a tension: full recursive checking is more thorough but may produce false negatives for schemas with dynamic structures. - **Current design**: The schema already defines `TypeMismatch` with `{ path, expected, actual }` and `OperationEdgeAttrs` has an optional `mismatches` field. The analysis doc describes deep recursive structural comparison. But there's a tension: full recursive checking is more thorough but may produce false negatives for schemas with dynamic structures.
- **Notes**: The schema doc already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs`. The analysis doc already defines `TypeCompatResult` with `mismatches`. This suggests the design has already converged toward structured mismatch reporting. What remains is confirming: (a) recursive depth limits, (b) handling of `Type.Unknown()` and complex types (unions, intersections), (c) whether the `detail` string field is still needed alongside `mismatches`. - **Notes**: The schema doc already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs`. The analysis doc already defines `TypeCompatResult` with `mismatches`. This suggests the design has already converged toward structured mismatch reporting. What remains is confirming: (a) recursive depth limits, (b) handling of `Type.Unknown()` and complex types (unions, intersections), (c) whether the `detail` string field is still needed alongside `mismatches`.
- **Cross-references**: OQ-01 (incompatible edges need mismatch detail) - **ADR-005 reframing**: Type compatibility checking only applies to **state-transfer** edges (where A's output flows into B's input). **Temporal-only** edges (where B starts after A but doesn't use A's output) don't need type checking — their "compatibility" is trivially true. This means the operation graph should distinguish between edges that carry data and edges that only express ordering. `typeCompat()` only needs to run on state-transfer edges.
### OQ-03: Should subscription operations be treated differently in type compatibility? ### OQ-03: Should subscription operations be treated differently in type compatibility?
@@ -80,18 +95,20 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-06: How does template instantiation interact with the call protocol? ### OQ-06: How does template instantiation interact with the call protocol?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q4, [host-configs.md](host-configs.md) Q3 - **Origin**: [workflow-templates.md](workflow-templates.md) Q4, [host-configs.md](host-configs.md) Q3
- **Status**: open - **Status**: resolved by ADR-005
- **Priority**: high — this is a fundamental integration point between flowgraph and the call protocol - **Priority**: high — this is a fundamental integration point between flowgraph and the call protocol
- **Question (merged)**: When a template is instantiated as a call graph, each `<Operation>` becomes a call. But the call protocol's `call.requested` events include `parentRequestId` — who is the parent? Is it the template instance? The hub coordinator? And how does the `ReactiveHostConfig` bridge to `registry.execute()` or `PendingRequestMap.call()`? - **Question (merged)**: When a template is instantiated as a call graph, each `<Operation>` becomes a call. But the call protocol's `call.requested` events include `parentRequestId` — who is the parent? Is it the template instance? The hub coordinator? And how does the `ReactiveHostConfig` bridge to `registry.execute()` or `PendingRequestMap.call()`?
- **Notes**: The consumer-integration doc shows the coordinator calling `registry.execute()` inside an `effect()`, but doesn't specify the `parentRequestId` semantics. This is a consumer-side decision, but flowgraph needs to document: (a) whether the template has its own `requestId`, (b) how the reactive engine signals the coordinator to start a call, (c) whether `ReactiveHostConfig` has a callback prop for this. - **ADR-005 resolution**: The reactive layer bridges to the call protocol through the event log. Call protocol events (`call.requested`, `call.responded`, etc.) are appended to the event log. The reactive status projection derives `NodeStatus` from the log. The result projection derives `CallResult` from the log. The hub coordinator appends events; the reactive layer projects them. No callback, no boomerang, no direct signal mutation by the coordinator.
- **Cross-references**: OQ-07, OQ-08 - **Cross-references**: OQ-07, OQ-08
### OQ-07: Should the reactive engine own the call graph? ### OQ-07: Should the reactive engine own the call graph?
- **Origin**: [host-configs.md](host-configs.md) Q4 - **Origin**: [host-configs.md](host-configs.md) Q4
- **Status**: open - **Status**: resolved by ADR-005
- **Priority**: high — affects the separation between flowgraph and the call protocol - **Priority**: high — affects the separation between flowgraph and the call protocol
- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect? - **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
- **ADR-005 resolution**: Neither owns the other. Both the call graph and the reactive status/result projections derive from the same event log. They are independent projections of the same source of truth. The call graph projects the structural view (who triggered whom). The reactive engine projects the behavioral view (what's running, what's blocked). You can have one without the other, or both simultaneously.
- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
- **Options**: - **Options**:
- (a) Separate: Call graph is populated by call protocol events. Reactive engine uses signals only. Coordinator bridges them. - (a) Separate: Call graph is populated by call protocol events. Reactive engine uses signals only. Coordinator bridges them.
- (b) Unified: Reactive engine creates call graph nodes when nodes transition to `running`, updates them on completion. Call graph is derived from reactive state. - (b) Unified: Reactive engine creates call graph nodes when nodes transition to `running`, updates them on completion. Call graph is derived from reactive state.
@@ -100,11 +117,10 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-08: Should `depends_on` edges be auto-populated from workflow templates? ### OQ-08: Should `depends_on` edges be auto-populated from workflow templates?
- **Origin**: [call-graph.md](call-graph.md) Q2 - **Origin**: [call-graph.md](call-graph.md) Q2
- **Status**: open - **Status**: resolved by ADR-005
- **Priority**: medium — affects how the call graph and template system relate - **Priority**: medium — affects how the call graph and template system relate
- **Question**: When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges in the call graph? - **Question**: When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges in the call graph?
- **Notes**: Currently `depends_on` edges must be added explicitly. Auto-population would couple the call graph to the template system. The alternative is for the coordinator to add `depends_on` edges when it instantiates a template. - **ADR-005 resolution**: `depends_on` edges are unnecessary as a separate concept. Data dependencies are expressed through the result projection of the event log. If node B needs node A's output, B reads `getResult("A")` from the result projection. The temporal ordering (A before B) is already expressed by template edges. There's no need for a separate edge type to represent data flow — the event log IS the data transport.
- **Cross-references**: OQ-06, workflow-templates Q3 (explicit `depends_on` in templates)
--- ---
@@ -113,22 +129,28 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-09: How are retries handled at the signal level? ### OQ-09: How are retries handled at the signal level?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q2 - **Origin**: [reactive-execution.md](reactive-execution.md) Q2
- **Status**: open - **Status**: resolved by ADR-005
- **Priority**: high — affects the core status state machine - **Priority**: high — affects the core status state machine
- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work? - **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
- **Options**: - **Options**:
- (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`. - (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
- (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant. - (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
- (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity. - (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
- **Notes**: Option (c) aligns with the call protocol, where each retry is a new call with a new `requestId`. This is likely the right answer but needs confirmation. - **ADR-005 resolution**: Option (c) is correct, and the event log makes it natural. A retry is not a state mutation — it's a new sequence of events appended to the log. When `call.requested` arrives for the same operation with a new `requestId`, it's a new fact. The old `call.error` event remains in the log as history. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed; no state machine mutation; the log preserves full history.
- **Cross-references**: OQ-10 - **Cross-references**: OQ-10
### OQ-10: What happens to running nodes when a predecessor fails? ### OQ-10: What happens to running nodes when a predecessor fails?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q6 - **Origin**: [reactive-execution.md](reactive-execution.md) Q6
- **Status**: open - **Status**: reframed by ADR-005
- **Priority**: high — affects failure propagation correctness - **Priority**: high — affects failure propagation correctness
- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled? - **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
- **Options**:
- (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
- (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
- **ADR-005 reframing**: This becomes a policy configuration of the status projection, not a hardcoded state machine rule. The event log records the failure fact. The projection decides: do we abort running nodes that depend on the failed node? The answer depends on the workflow's failure strategy. Option (a) is the default (running nodes continue), but a policy could specify otherwise. The event log makes both strategies expressible without changing the underlying mechanism — only the projection logic changes.
- **Cross-references**: OQ-09 (retries need to know if a running node can be restarted)
- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
- **Options**: - **Options**:
- (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired. - (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
- (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes. - (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
@@ -330,16 +352,16 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
| ID | Question | Origin | Priority | Status | | ID | Question | Origin | Priority | Status |
|----|----------|--------|----------|--------| |----|----------|--------|----------|--------|
| OQ-01 | All edges or only compatible edges? | operation-graph | high | open | | OQ-01 | All edges or only compatible edges? | operation-graph | high | reframed by ADR-005 |
| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | open | | OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | reframed by ADR-005 |
| OQ-03 | Subscription operations in type compat | operation-graph | medium | open | | OQ-03 | Subscription operations in type compat | operation-graph | medium | open |
| OQ-04 | `edgeType` on all edges? | schema | medium | open | | OQ-04 | `edgeType` on all edges? | schema | medium | open |
| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | open | | OQ-05 | Structural container transparency | workflow-templates, host-configs | high | open |
| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | open | | OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved by ADR-005 |
| OQ-07 | Should reactive engine own call graph? | host-configs | high | open | | OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved by ADR-005 |
| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | open | | OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved by ADR-005 |
| OQ-09 | Retries at signal level | reactive-execution | high | open | | OQ-09 | Retries at signal level | reactive-execution | high | resolved by ADR-005 |
| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | open | | OQ-10 | Running nodes when predecessor fails | reactive-execution | high | reframed by ADR-005 |
| OQ-11 | OR logic for preconditions | reactive-execution | medium | open | | OQ-11 | OR logic for preconditions | reactive-execution | medium | open |
| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open | | OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open |
| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open | | OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open |
@@ -362,13 +384,13 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### Priority Assessment ### Priority Assessment
**High priority** (should resolve before implementation): **High priority** (should resolve before implementation):
- OQ-01: All edges or only compatible — shapes the entire operation graph API - ~~OQ-01: All edges or only compatible~~ reframed by ADR-005: incompatible edges only exist on state-transfer edges
- OQ-02: Type compatibility depth — shapes `typeCompat()` return type - ~~OQ-02: Type compatibility depth~~ reframed by ADR-005: type checking only for state-transfer edges
- OQ-05: Structural container transparency — fundamental to DAG and reactive engine - OQ-05: Structural container transparency — fundamental to DAG and reactive engine
- OQ-06: Template ↔ call protocol — fundamental integration point - ~~OQ-06: Template ↔ call protocol~~ resolved by ADR-005
- OQ-07: Reactive engine owns call graph? — affects architecture boundaries - ~~OQ-07: Reactive engine owns call graph?~~ resolved by ADR-005
- OQ-09: Retries — shapes the state machine - ~~OQ-09: Retries~~ resolved by ADR-005
- OQ-10: Running node failure handling — shapes failure propagation - ~~OQ-10: Running node failure handling~~ reframed by ADR-005: policy configuration, not hardcoded
**Medium priority** (should resolve before v1 release): **Medium priority** (should resolve before v1 release):
- OQ-03, OQ-04, OQ-08, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29 - OQ-03, OQ-04, OQ-08, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29