ADR-005 accepted: resolve all open consequences, update cascading docs

Resolve the three open consequences from ADR-005 (Event Log as Single Source of Truth) and transition it from Proposed to Accepted:

1. Event log IS the call protocol event stream: not a separate type, but an EventLogProjection interface (append/getStatus/getResult/getEvents) over CallEventMapValue[] with an append-only contract.
2. Event log persists across template re-renders: projections recompute against the new DAG; orphaned events stay in the log for audit but don't affect active projections.
3. Edges get a dataFlow: boolean attribute on TemplateEdgeAttrs, inferred (not manual) by GraphologyHostConfig from template expressions. typeCompat() only runs on dataFlow: true edges. Inference rules are precisely specified for Conditional.test, Map.over, and Operation.input.

Also resolve OQ-05 (structural containers stay transparent; aggregate status is a projection from children) and OQ-10 (running node failure is a FailurePolicy configuration, default continues-running).

Cascading updates to:

- reactive-execution.md: add hybrid status model (event-log-driven vs projection-driven vs signal-mutation), EventLogProjection interface, result projection respecting retries, FailurePolicy type
- host-configs.md: ReactiveContext now includes resultProjection and computed results; resolved Q1/Q3/Q4
- schema.md: dataFlow attribute on TemplateEdgeAttrs with inference rules and type checking implications
- workflow-templates.md: edge creation rules with dataFlow, result projection in Conditional/Map, resolved Q1/Q4
- open-questions.md: all ADR-005 questions marked resolved, updated summary table and cross-cutting themes, removed duplicate OQ-07

7 files changed, 464 insertions, 139 deletions
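For orientation, here is a minimal sketch of what an append-only projection with the four methods named in the message (append/getStatus/getResult/getEvents) might look like. The `CallEvent` shape, the `NodeStatus` values, and `createEventLog` are assumptions made for illustration; the real `CallEventMapValue` type and the interface in reactive-execution.md are not shown in this commit and may differ. The sketch treats a retry as a new `requested` event with a fresh `requestId`, and projects status and result from the latest attempt only.

```typescript
// Hypothetical event shape; the real CallEventMapValue is not shown in this commit.
type CallEvent =
  | { kind: "requested"; nodeId: string; requestId: string }
  | { kind: "responded"; nodeId: string; requestId: string; result: unknown }
  | { kind: "failed"; nodeId: string; requestId: string; error: string };

// Assumed status set for this sketch only.
type NodeStatus = "idle" | "running" | "completed" | "failed";

interface EventLogProjection {
  append(event: CallEvent): void;
  getStatus(nodeId: string): NodeStatus;
  getResult(nodeId: string): unknown;
  getEvents(nodeId?: string): readonly CallEvent[];
}

function createEventLog(): EventLogProjection {
  // Append-only: events are pushed, never edited or removed.
  const log: CallEvent[] = [];

  const eventsFor = (nodeId: string): CallEvent[] =>
    log.filter((e) => e.nodeId === nodeId);

  // A retry is a new "requested" event with a new requestId, so the
  // current attempt is the set of events sharing the latest requestId.
  const currentAttempt = (nodeId: string): CallEvent[] => {
    const events = eventsFor(nodeId);
    let requestId: string | undefined;
    for (const e of events) {
      if (e.kind === "requested") requestId = e.requestId;
    }
    if (requestId === undefined) return [];
    return events.filter((e) => e.requestId === requestId);
  };

  return {
    append(event) {
      log.push(event);
    },
    getStatus(nodeId) {
      const attempt = currentAttempt(nodeId);
      if (attempt.length === 0) return "idle";
      if (attempt.some((e) => e.kind === "responded")) return "completed";
      if (attempt.some((e) => e.kind === "failed")) return "failed";
      return "running";
    },
    getResult(nodeId) {
      const done = currentAttempt(nodeId).find((e) => e.kind === "responded");
      return done !== undefined && done.kind === "responded" ? done.result : undefined;
    },
    getEvents(nodeId) {
      // Return copies so callers cannot mutate the log.
      return nodeId === undefined ? [...log] : eventsFor(nodeId);
    },
  };
}
```

In this sketch the failed first attempt stays in the log for audit, while the status and result projections only reflect the most recent attempt.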
2026-05-21 07:44:28 +00:00
parent 2c1b2d1a15
commit c76be7f689
7 changed files with 463 additions and 138 deletions
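The OQ-05 and OQ-10 resolutions described in the commit message can both be read as pure functions over child or node statuses. The sketch below illustrates that reading and is not the implementation in reactive-execution.md: the `Container` shape, the `FailurePolicy` field name `onPredecessorFailure`, and the helper names `containerCompleted` and `projectBlockedStatus` are hypothetical, and the `NodeStatus` set is assumed from the statuses mentioned in the docs.

```typescript
// Assumed status set, based on the statuses mentioned in the docs.
type NodeStatus =
  | "idle" | "waiting" | "running"
  | "completed" | "skipped" | "failed" | "aborted";

// Hypothetical description of a transparent container: it has no node in the
// DAG, only a list of child node ids taken from the template structure.
type Container =
  | { kind: "Sequential" | "Parallel"; children: string[] }
  | { kind: "Conditional"; takenBranch: string[] | null };

const isDone = (s: NodeStatus): boolean => s === "completed" || s === "skipped";

// OQ-05: aggregate status is derived from the children, never stored.
function containerCompleted(
  c: Container,
  statusOf: (nodeId: string) => NodeStatus,
): boolean {
  const members = c.kind === "Conditional" ? c.takenBranch : c.children;
  // Assumption: a Conditional whose branch is not yet chosen is not completed.
  if (members === null) return false;
  return members.every((id) => isDone(statusOf(id)));
}

// OQ-10: what happens to a node blocked by a failed predecessor is policy.
// The field name and its values are hypothetical.
type FailurePolicy = { onPredecessorFailure: "continue-running" | "abort-running" };

const defaultPolicy: FailurePolicy = { onPredecessorFailure: "continue-running" };

function projectBlockedStatus(
  current: NodeStatus,
  blockedByFailure: boolean,
  policy: FailurePolicy = defaultPolicy,
): NodeStatus {
  if (!blockedByFailure) return current;
  // Nodes that have not started are always aborted.
  if (current === "idle" || current === "waiting") return "aborted";
  // Running nodes are aborted only under the explicit, more aggressive policy.
  if (current === "running" && policy.onPredecessorFailure === "abort-running") {
    return "aborted";
  }
  return current;
}
```

Because both functions only read statuses, they can be recomputed after a template re-render without adding synthetic events to the log.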
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -16,41 +16,36 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi

 ## ADR-005 Impact

-[ADR-005: Event Log as Single Source of Truth](decisions/005-event-log-as-source-of-truth.md) proposes an Execution Event Log pattern that resolves or reframes several open questions. Questions affected by ADR-005 are marked with `adr-005` in their status. Summary:
+[ADR-005: Event Log as Single Source of Truth](decisions/005-event-log-as-source-of-truth.md) defines an Execution Event Log pattern that resolves or reframes several open questions. ADR-005 is now **Accepted**, and all of the questions listed below have been resolved:

-| Question | ADR-005 Impact |
-|----------|-----------------|
-| OQ-01 | Reframed: incompatible edges only exist where there's data flow. Temporal-only edges don't need type checking. |
-| OQ-02 | Reframed: type compatibility depth only applies to state-transfer edges, not notification edges. |
-| OQ-06 | Resolved: the reactive layer bridges to the call protocol through the event log, not direct signal mutation. |
-| OQ-07 | Resolved: call graph and reactive engine are both projections of the event log. Neither owns the other. |
-| OQ-08 | Resolved: `depends_on` edges unnecessary; data dependencies expressed through result projection. |
-| OQ-09 | Resolved: retries are natural append events, not state mutations. |
-| OQ-10 | Reframed: policy question (abort running nodes?) becomes a projection configuration, not a hardcoded state machine rule. |
+| Question | ADR-005 Impact | Final Resolution |
+|----------|-----------------|-------------------|
+| OQ-01 | Reframed → Resolved | Type-compat edges are added only for `dataFlow: true` edges. Temporal edges bypass type checking. |
+| OQ-02 | Reframed → Resolved | Type checking scope narrows to state-transfer edges. Structured mismatch reporting confirmed. |
+| OQ-05 | Independent → Resolved | Containers stay transparent. Aggregate status computed as projection from children. |
+| OQ-06 | Resolved | The reactive layer bridges to call protocol through the event log. Hub appends events; reactive layer projects them. |
+| OQ-07 | Resolved | Call graph and reactive engine are both projections of the event log. Neither owns the other. |
+| OQ-08 | Resolved | `depends_on` edges unnecessary. Data dependencies expressed through result projection. |
+| OQ-09 | Resolved | Retries are natural append events. New `requestId` per retry. |
+| OQ-10 | Reframed → Resolved | Running node failure handling is a projection policy, not a state machine rule. Default: running nodes continue. |

 ## Theme 1: Edge Semantics and Type Compatibility

 ### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones?

 - **Origin**: [operation-graph.md](operation-graph.md) Q1
- **Status**: reframed by ADR-005
+- **Status**: resolved
 - **Priority**: high — affects storage size, API surface, and diagnostic value
- **Options**:
-  - (a) Add both compatible and incompatible edges (current design). Pro: diagnostic information visible. Con: graph is larger.
-  - (b) Only add compatible edges, with a `potentialEdges()` query computing incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information.
- **Notes**: This decision affects `buildTypeEdges()` in [analysis.md](analysis.md) and `OperationEdgeAttrs` in [schema.md](schema.md). The `compatible: false` attribute on edges only makes sense if option (a) is chosen.
- **ADR-005 reframing**: Incompatible edges only exist on **state-transfer** edges (where data flows from A's output to B's input). **Temporal-only** edges (where B starts after A completes but doesn't use A's output) don't need type checking at all. This means option (b) may be correct for temporal edges, while option (a) is correct for state-transfer edges. The operation graph could distinguish these with an edge attribute.
+- **Resolution**: Adopt option (a) for state-transfer edges, option (b) for temporal-only edges. Type-compatibility edges (with `compatible: true/false` attributes) are only added where data flows between operations. The `dataFlow` attribute on `TemplateEdgeAttrs` (resolved in ADR-005) determines which edges need type checking. For edges where `dataFlow: true`, both compatible and incompatible edges provide diagnostic value. For edges where `dataFlow: false`, no type-compat edge is needed — temporal ordering doesn't have type compatibility.
 - **Cross-references**: OQ-04

 ### OQ-02: How granular should type compatibility results be?

 - **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1
- **Status**: reframed by ADR-005
+- **Status**: resolved
 - **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs`
- **Question (merged)**: How deep should `typeCompat` check? Should it be fully recursive? And should the result be `{ compatible, detail? }` or `{ compatible, mismatches: TypeMismatch[] }`?
- **Current design**: The schema already defines `TypeMismatch` with `{ path, expected, actual }` and `OperationEdgeAttrs` has an optional `mismatches` field. The analysis doc describes deep recursive structural comparison. But there's a tension: full recursive checking is more thorough but may produce false negatives for schemas with dynamic structures.
- **Notes**: The schema doc already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs`. The analysis doc already defines `TypeCompatResult` with `mismatches`. This suggests the design has already converged toward structured mismatch reporting. What remains is confirming: (a) recursive depth limits, (b) handling of `Type.Unknown()` and complex types (unions, intersections), (c) whether the `detail` string field is still needed alongside `mismatches`.
- **ADR-005 reframing**: Type compatibility checking only applies to **state-transfer** edges (where A's output flows into B's input). **Temporal-only** edges (where B starts after A but doesn't use A's output) don't need type checking — their "compatibility" is trivially true. This means the operation graph should distinguish between edges that carry data and edges that only express ordering. `typeCompat()` only needs to run on state-transfer edges.
+- **Resolution**: Type compatibility checking only applies to **state-transfer edges** (where A's output flows into B's input), as established by ADR-005's `dataFlow` attribute on `TemplateEdgeAttrs`. Temporal-only edges bypass type checking entirely (their "compatibility" is trivially true). The `typeCompat()` function returns `{ compatible, detail?, mismatches? }` for state-transfer edges only. The schema already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs` — this design is confirmed. Remaining detail decisions (recursive depth limits, unknown/union type handling) are implementation concerns, not architecture decisions.
+- **Cross-references**: OQ-01

 ### OQ-03: Should subscription operations be treated differently in type compatibility?

@@ -79,13 +74,19 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
 ### OQ-05: Should `Sequential` and `Parallel` be transparent in the graph?

 - **Origin**: [workflow-templates.md](workflow-templates.md) Q1, [host-configs.md](host-configs.md) Q1
- **Status**: open
+- **Status**: resolved
 - **Priority**: high — fundamental to how the DAG is structured and how the reactive engine computes preconditions
 - **Question (merged)**: Currently, structural containers (`Sequential`, `Parallel`, `Conditional`) produce edges but no nodes. The reactive engine then has to reconstruct structural context to compute preconditions. Should they create "virtual" nodes instead?
- **Options**:
-  - (a) Transparent (current design): No nodes for containers. Edges carry the structure. Pro: smaller DAG, cleaner topology. Con: precondition computation needs structural context (parentStack, siblingMap).
-  - (b) Virtual nodes: Containers create nodes with `signal<NodeStatus>`. Pro: every node has a status and preconditions, simpler reactive engine. Con: more nodes, containers with no call protocol equivalent, slightly more complex graph queries.
- **Notes**: The host-configs doc identifies this as a "known gap": `Sequential`, `Parallel`, `Conditional` are transparent in the DAG but create complexity for the reactive engine's "previous sibling" precondition logic. The reactive-execution doc's `WorkflowReactiveRoot.initializeSignals()` assumes it operates on the flattened DAG (all nodes are operations), which aligns with option (a). The question is whether the reactive engine's context maps (`parentMap`, `siblingMap`) are sufficient or if virtual nodes would simplify things.
+- **Resolution**: Keep containers transparent (current design). Structural containers do NOT create nodes in the DAG or events in the event log. Their aggregate status can be computed as a projection from their children's statuses:
+
+  - A `Sequential` is "completed" when all its children are completed/skipped
+  - A `Parallel` is "completed" when all its children are completed/skipped
+  - A `Conditional` is "completed" when its taken branch is completed/skipped
+
+  This resolution aligns with ADR-005's projection model: the event log records real call events, and projections compute derived state. Virtual nodes in the event log would pollute it with synthetic events that have no call protocol equivalent. Virtual nodes in the DAG would add structural overhead for what is already computable.
+
+  The `parentMap` and `siblingMap` in the `ReactiveContext` remain the mechanism for computing preconditions. These maps are derived from the template structure during rendering, not from the DAG. They provide the structural context that the transparent-DAG approach needs, without requiring container nodes.
+
 - **Cross-references**: OQ-14 (partial re-rendering)

 ---
@@ -108,11 +109,6 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
 - **Priority**: high — affects the separation between flowgraph and the call protocol
 - **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
 - **ADR-005 resolution**: Neither owns the other. Both the call graph and the reactive status/result projections derive from the same event log. They are independent projections of the same source of truth. The call graph projects the structural view (who triggered whom). The reactive engine projects the behavioral view (what's running, what's blocked). You can have one without the other, or both simultaneously.
- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
- **Options**:
-  - (a) Separate: Call graph is populated by call protocol events. Reactive engine uses signals only. Coordinator bridges them.
-  - (b) Unified: Reactive engine creates call graph nodes when nodes transition to `running`, updates them on completion. Call graph is derived from reactive state.
- **Notes**: Option (a) matches ADR-003 (flowgraph doesn't do storage/persistence) and the current design where the call graph is populated by `updateFromEvent()`. Option (b) would couple the reactive engine to the call protocol. The current design's separation is cleaner but requires the coordinator to maintain both reactive state and call graph state.

 ### OQ-08: Should `depends_on` edges be auto-populated from workflow templates?

@@ -142,20 +138,10 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
 ### OQ-10: What happens to running nodes when a predecessor fails?

 - **Origin**: [reactive-execution.md](reactive-execution.md) Q6
- **Status**: reframed by ADR-005
+- **Status**: resolved
 - **Priority**: high — affects failure propagation correctness
- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
- **Options**:
-  - (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
-  - (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
- **ADR-005 reframing**: This becomes a policy configuration of the status projection, not a hardcoded state machine rule. The event log records the failure fact. The projection decides: do we abort running nodes that depend on the failed node? The answer depends on the workflow's failure strategy. Option (a) is the default (running nodes continue), but a policy could specify otherwise. The event log makes both strategies expressible without changing the underlying mechanism — only the projection logic changes.
- **Cross-references**: OQ-09 (retries need to know if a running node can be restarted)
- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
- **Options**:
-  - (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
-  - (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
- **Notes**: Option (a) is consistent with "failure follows dependency edges, not structural scope" — a running node has already passed its preconditions, so it should be allowed to complete. The coordinator can choose to abort it. Option (b) would be more aggressive. The reactive-execution doc's constraint says "abort is immediate in signals, delayed in protocol," suggesting option (a) is intended.
- **Cross-references**: OQ-09 (retries need to know if a running node can be restarted)
+- **Resolution**: This is a **policy configuration** of the status projection, not a hardcoded state machine rule. The event log records failure facts. The projection decides how to handle running nodes that depend on a failed node. The default policy (option a from the original framing): running nodes are NOT affected by a predecessor's failure — only idle/waiting nodes transition to `aborted`. A more aggressive policy could abort running nodes, but this requires explicit configuration. The event log makes both strategies expressible without changing the underlying mechanism — only the projection logic changes. This aligns with ADR-005's principle that projections encode policy while the log records facts.
+- **Cross-references**: OQ-09 (retries are new events, not state mutations)

 ---

@@ -352,16 +338,16 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi

 | ID | Question | Origin | Priority | Status |
 |----|----------|--------|----------|--------|
-| OQ-01 | All edges or only compatible edges? | operation-graph | high | reframed by ADR-005 |
-| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | reframed by ADR-005 |
+| OQ-01 | All edges or only compatible edges? | operation-graph | high | resolved |
+| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | resolved |
 | OQ-03 | Subscription operations in type compat | operation-graph | medium | open |
 | OQ-04 | `edgeType` on all edges? | schema | medium | open |
-| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | open |
-| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved by ADR-005 |
-| OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved by ADR-005 |
-| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved by ADR-005 |
-| OQ-09 | Retries at signal level | reactive-execution | high | resolved by ADR-005 |
-| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | reframed by ADR-005 |
+| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | resolved |
+| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved |
+| OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved |
+| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved |
+| OQ-09 | Retries at signal level | reactive-execution | high | resolved |
+| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | resolved |
 | OQ-11 | OR logic for preconditions | reactive-execution | medium | open |
 | OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open |
 | OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open |
@@ -383,17 +369,21 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi

 ### Priority Assessment

+**Resolved (ADR-005)**:
+- ~~OQ-01: All edges or only compatible~~ — resolved: type-compat edges only on `dataFlow: true` edges
+- ~~OQ-02: Type compatibility depth~~ — resolved: type checking only for state-transfer edges
+- ~~OQ-05: Structural container transparency~~ — resolved: containers stay transparent, aggregate status is a projection
+- ~~OQ-06: Template ↔ call protocol~~ — resolved: bridge through event log
+- ~~OQ-07: Reactive engine owns call graph?~~ — resolved: both are projections of event log
+- ~~OQ-08: Auto-populate `depends_on` from templates?~~ — resolved: unnecessary, data flows through result projection
+- ~~OQ-09: Retries at signal level~~ — resolved: append events, not state mutations
+- ~~OQ-10: Running node failure handling~~ — resolved: projection policy, default is running nodes continue
+
 **High priority** (should resolve before implementation):
- ~~OQ-01: All edges or only compatible~~ — reframed by ADR-005: incompatible edges only exist on state-transfer edges
- ~~OQ-02: Type compatibility depth~~ — reframed by ADR-005: type checking only for state-transfer edges
- OQ-05: Structural container transparency — fundamental to DAG and reactive engine
- ~~OQ-06: Template ↔ call protocol~~ — resolved by ADR-005
- ~~OQ-07: Reactive engine owns call graph?~~ — resolved by ADR-005
- ~~OQ-09: Retries~~ — resolved by ADR-005
- ~~OQ-10: Running node failure handling~~ — reframed by ADR-005: policy configuration, not hardcoded
+- (all high-priority questions have been resolved)

 **Medium priority** (should resolve before v1 release):
- OQ-03, OQ-04, OQ-08, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29
+- OQ-03, OQ-04, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29

 **Low priority** (can defer or decide during implementation):
 - OQ-13, OQ-15, OQ-16, OQ-18, OQ-19, OQ-24, OQ-25, OQ-26, OQ-27, OQ-28
@@ -402,11 +392,13 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi

 These groups of questions interact with each other and should be resolved together:

-1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All affect the operation graph's edge structure and the type compatibility API.
+1. **~~Edge semantics group~~** (OQ-01, OQ-02, OQ-04): ~~All affect the operation graph's edge structure and the type compatibility API.~~ **Partially resolved by ADR-005.** OQ-01 and OQ-02 are resolved (type checking only on `dataFlow: true` edges); OQ-04 (`edgeType` on all edges) remains open.

-2. **Call protocol integration group** (OQ-06, OQ-07, OQ-08): All about how flowgraph connects to the live call protocol.
+2. **~~Call protocol integration group~~** (OQ-06, OQ-07, OQ-08): ~~All about how flowgraph connects to the live call protocol.~~ **Resolved by ADR-005.** All three resolved: bridge through event log, projections instead of ownership, data flow through result projection.

-3. **Failure semantics group** (OQ-09, OQ-10): Both about how failure and retry propagate through the reactive engine. Resolving one may resolve or constrain the other.
+3. **~~Failure semantics group~~** (OQ-09, OQ-10): ~~Both about how failure and retry propagate through the reactive engine.~~ **Resolved by ADR-005.** Retries are append events; running node failure is a projection policy.
+
+4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints.

 4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints.