Files
flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md
glm-5.1 c76be7f689 ADR-005 accepted: resolve all open consequences, update cascading docs
Resolve the three open consequences from ADR-005 (Event Log as Single
Source of Truth) and transition from Proposed to Accepted:

1. Event log IS the call protocol event stream — not a separate type,
   but an EventLogProjection interface (append/getStatus/getResult/
   getEvents) over CallEventMapValue[] with an append-only contract.

2. Event log persists across template re-renders — projections recompute
   against the new DAG; orphaned events stay in log for audit but don't
   affect active projections.

3. Edges get dataFlow: boolean attribute on TemplateEdgeAttrs — inferred
   (not manual) by GraphologyHostConfig from template expressions.
   typeCompat() only runs on dataFlow: true edges. Inference rules are
   precisely specified for Conditional.test, Map.over, and Operation.input.

Also resolve OQ-05 (structural containers stay transparent; aggregate
status is a projection from children) and OQ-10 (running node failure
is a FailurePolicy configuration, default continues-running).

Cascading updates to:
- reactive-execution.md: add hybrid status model (event-log-driven vs
  projection-driven vs signal-mutation), EventLogProjection interface,
  result projection respecting retries, FailurePolicy type
- host-configs.md: ReactiveContext now includes resultProjection and
  computed results; resolved Q1/Q3/Q4
- schema.md: dataFlow attribute on TemplateEdgeAttrs with inference
  rules and type checking implications
- workflow-templates.md: edge creation rules with dataFlow, result
  projection in Conditional/Map, resolved Q1/Q4
- open-questions.md: all ADR-005 questions marked resolved, updated
  summary table and cross-cutting themes, removed duplicate OQ-07

7 files changed, 464 insertions, 139 deletions
2026-05-21 07:44:28 +00:00

256 lines
19 KiB
Markdown

# ADR-005: Event Log as Single Source of Truth
## Status
Accepted
## Context
Flowgraph's reactive execution layer currently uses signal-based state propagation (`signal<NodeStatus>` and `computed<boolean>` for preconditions). Call graph nodes are populated from call protocol events. The two systems — reactive status tracking and call graph construction — are separate concepts that happen to process the same events.
Several open questions in the architecture reveal a common underlying problem:
1. **OQ-06**: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — `Conditional.test` receives a `results` map from an ad-hoc closure.
2. **OQ-07**: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
3. **OQ-08**: Should `depends_on` edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
4. **OQ-09**: How are retries handled? The current state machine marks `failed` as terminal, requiring awkward workarounds for retry.
5. **OQ-10**: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
6. **OQ-02**: How deep should type compatibility checking go? This conflates edges that carry data (where types matter) with edges that only express ordering (where types are irrelevant).
These questions share a common root: **the architecture conflates notification (something happened) with state transfer (here's the data).** The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.
Meanwhile, the call protocol already defines a sequence of append-only facts:
```
call.requested → { requestId, operationId, input, parentRequestId, timestamp }
call.responded → { requestId, output, timestamp }
call.error → { requestId, error, timestamp }
call.aborted → { requestId, timestamp }
call.completed → { requestId, timestamp }
```
These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.
## Decision
Flowgraph's reactive execution layer will be built on an **Execution Event Log** — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.
### Core Concept: Event Log + Projections
```
┌─────────────────────────────────────────────┐
│ Execution Event Log │
│ (append-only sequence of call protocol │
│ events — the ground truth) │
└──────────────────┬──────────────────────────┘
┌─────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌──────────┐
│ Status │ │ Result │ │ Call │
│ Proj. │ │ Proj. │ │ Graph │
│ │ │ │ │ Proj. │
│ nodeId: │ │ nodeId: │ │ │
│ status │ │ output │ │ nodes + │
│ │ │ │ │ edges │
└────┬────┘ └────┬─────┘ └──────────┘
│ │
▼ ▼
┌───────────────────────────────────────────┐
│ Reactive Execution Layer │
│ │
│ preconditions → "does the log show │
│ all predecessors │
│ completed?" │
│ │
│ result resolution → "does the log │
│ have A's output?" │
│ │
│ Conditional.test → reads from result proj. │
│ Map.over → reads from result proj. │
└───────────────────────────────────────────┘
```
### Notification vs. State Transfer
The event log naturally distinguishes two patterns that the current architecture conflates:
| Pattern | What the edge means | What downstream needs | Event type |
|---------|--------------------|-----------------------|------------|
| **Temporal (task ordering)** | "A must complete before B starts" | Just the notification that A completed | Notification |
| **Data flow** | "A's output is B's input" | A's actual output data | State Transfer |
The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.
The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — `process-item` needs `fetch-items`'s output. It needs the state.
Both patterns derive from the same event log. Different projections serve different needs.
### Retry Semantics
Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:
```
call.requested(A, attempt=1) → fact: A was requested
call.error(A, "timeout") → fact: A failed
call.requested(A, attempt=2) → fact: A was retried
call.responded(A, output) → fact: A succeeded on retry
```
The status projection derives the current state by scanning for the most recent event per node. No state machine mutation needed. The state machine becomes a fold over the event log.
### Type Compatibility
Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.
This resolves OQ-01: incompatible edges (type mismatches) only exist on state-transfer edges. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.
## Rationale
1. **The call protocol already IS the event log.** Every call event (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.
2. **Projections separate concerns.** The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."
3. **Notification and state transfer are different.** The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.
4. **Retries are natural.** An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a `retried` status or breaking the terminal-state invariant.
5. **Data dependencies don't need separate edges.** If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate `depends_on` edge type (OQ-08) becomes unnecessary because the event log is the data transport.
6. **Category theory alignment.** The event log is a monoid (append with identity). Projections are functors from the log monoid to status/result/graph monoids. Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.
## Consequences
### Positive
- **OQ-06 resolved**: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
- **OQ-07 resolved**: The call graph and reactive engine are both projections of the event log. Neither owns the other.
- **OQ-08 resolved**: `depends_on` edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
- **OQ-09 resolved**: Retries are natural — append new events rather than mutating state.
- **OQ-10 reframed**: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
- **OQ-02 reframed**: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
- **OQ-01 reframed**: Incompatible edges only exist where there's data flow. Temporal-only edges don't need them in the operation graph.
### Negative
- **The reactive layer needs a redesign.** The current `WorkflowReactiveRoot` directly creates `signal<NodeStatus>` instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
- **The event log must be persisted if workflow state must survive restarts.** Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
- **Event replay must be idempotent.** Processing the same event twice must produce the same projected state. This is already a property of the call protocol events (`updateFromEvent` is documented as idempotent in call-graph.md).
- **The result projection needs a clear interface.** `getResult(nodeId)` must be defined — what it returns, when it's available, and how it interacts with `Conditional.test` and `Map.over` closures that may reference results from nodes that haven't completed yet.
### Resolved: Event log is the call protocol event stream
The event log is NOT a separate type. It IS the call protocol event stream with an **append-only contract** and **projection functions**. The call protocol events (`CallEventMapValue[]`) already carry everything needed:
- `requestId` — identifies which invocation
- `operationId` — identifies which operation
- `input`/`output` — the payload data (for state transfer edges)
- `parentRequestId` — the causation link
- `timestamp` — when it happened
What flowgraph provides is not a new event type, but a **consumption contract**:
```typescript
interface EventLogProjection {
/** Append an event. Events are processed idempotently. */
append(event: CallEventMapValue): void;
/** Current status of a node, derived from the most recent event. */
getStatus(nodeId: string): NodeStatus;
/** Result of a completed node, derived from call.responded events. */
getResult(nodeId: string): CallResult | undefined;
/** All events for a node, in order. */
getEvents(nodeId: string): CallEventMapValue[];
}
```
The `EventLogProjection` interface makes the append-only discipline explicit and provides typed access to projections. Implementations wrap `CallEventMapValue[]` and derive state on demand (or with memoization). This avoids creating a parallel type system — the event types, their structure, and their semantics remain in `@alkdev/operations/src/call.ts`.
### Resolved: Event log persists across re-renders; projections recompute
When a template is re-rendered (when the ujsx reconciler supports it), the event log persists. Events are append-only facts — they record what happened, and what happened doesn't change when the template structure changes.
Projections are recomputed by scanning the log against the new DAG:
1. Events for nodes still in the DAG map naturally to their projections.
2. Events for nodes removed from the DAG become **orphaned events** — they remain in the log (for audit/history) but don't affect active projections.
3. New nodes added to the DAG have no events yet — their status is `idle` and their result is `undefined`.
This means re-rendering doesn't lose history. The event log is the durable record; projections are ephemeral views that can always be reconstructed.
For v1 (before the reconciler exists), the event log starts at template mount and is disposed when the `WorkflowReactiveRoot` is disposed. The re-render scenario is an architectural commitment for when the reconciler arrives, not something to implement now.
#### Orphaned events specification
When a template is re-rendered and nodes are removed from the DAG, their events become orphaned. The projection layer handles this as follows:
1. **The `EventLogProjection` receives the current DAG structure** (the set of active node keys) alongside the event log. Methods like `getStatus(nodeId)` first check whether `nodeId` is in the active DAG. If not, the node is orphaned.
2. **Orphaned nodes return `undefined` from `getResult()`**. A downstream node referencing an orphaned predecessor via `Conditional.test` or `Map.over` will see `undefined`, causing the test to evaluate as if the predecessor didn't complete. This is the correct behavior — a removed node can't provide data.
3. **Orphaned events remain in the log** for audit and history. `getEvents(nodeId)` on an orphaned node returns its events (if any). The overall event log is still queryable for debugging.
4. **The `nodeKeyToRequestId` map is rebuilt on re-render**. New nodes get fresh `requestId` values. Old mappings are discarded, along with their associated signal subscriptions (the `WorkflowReactiveRoot.dispose()` call before re-render handles this).
### Resolved: Edges are marked with `dataFlow` attribute
Template edges get a `dataFlow: boolean` attribute that distinguishes temporal edges from state-transfer edges:
| `dataFlow` value | Meaning | Type checking needed? |
|:---|:---|:---|
| `false` (default) | Temporal ordering only — downstream starts after upstream completes but doesn't read upstream's output | No — no data flows between nodes |
| `true` | State transfer — downstream reads upstream's output via `Conditional.test` or `Map.over` | Yes — `typeCompat()` checks output→input compatibility |
This attribute is **inferred, not manual**. The `GraphologyHostConfig` detects `dataFlow` from template expressions during rendering:
- A `Sequential` edge where the downstream node references `results["upstreamNode"]` in `Conditional.test`, `Map.over`, or `Operation.input` gets `dataFlow: true`
- A `Sequential` edge where no such reference exists gets `dataFlow: false` (the default)
- A `Conditional` edge always gets `dataFlow: true` (the condition always reads a predecessor's result)
- `Parallel` edges don't exist (parallel children have no inter-sibling edges)
#### dataFlow inference specification
The inference algorithm operates at **template AST level** during `GraphologyHostConfig.createInstance` / `appendChild`, not at runtime. It inspects template component props to detect references to predecessor results:
**Detectable references** (set `dataFlow: true` on the edge from the referenced node to the referencing node):
| Expression | Detection method |
|:---|:---|
| `Conditional.test = (results) => results["X"]` | Static analysis of the function body for `results[...]` property accesses |
| `Conditional.test = "X"` (string form) | String comparison — the referenced operation name |
| `Map.over = (results) => results["X"].output.items` | Static analysis of the function body for `results[...]` property accesses |
| `Map.over = itemsSignal` (signal form) | No `dataFlow: true` — the array comes from a signal, not a predecessor result |
| `Operation.input = (results) => results["X"].output` | Static analysis of the function body for `results[...]` property accesses |
| `Operation.input = staticValue` | No `dataFlow: true` — the input doesn't depend on a predecessor result |
**Inference rules**:
1. **Direct predecessor edges only**: `dataFlow: true` is set only on edges that exist in the DAG. In a `Sequential` chain A → B → C, if C references `results["A"]`, the edge B → C gets `dataFlow: true` (since A is a predecessor of C via the chain), but no new edge A → C is created. Data flows transitively through the chain — B must complete before C starts, and C reads A's result from the result projection.
2. **`Map` component edges**: A `Map` component's predecessor-to-first-mapped-child edge gets `dataFlow: true` if `Map.over` references a predecessor result. Each mapped child's edge from the `Map`'s predecessor gets `dataFlow: true` because the array data comes from a predecessor's output.
3. **Ambiguous references**: If `Operation.input` is a function that cannot be statically analyzed (e.g., `(results) => computeInput(results)` where `computeInput` is a closure), the inference defaults to `dataFlow: false`. Template authors can manually annotate with `dataFlow: true` as an override, though this should be rare.
4. **Function body analysis**: JavaScript function introspection is unreliable (minification, closures). Inference operates on the **AST** of the ujsx template during rendering, not on the runtime function body. This means that `Conditional.test` functions passed as closures from external code (not inline in the template) cannot have their references detected. For these cases, the string form (`Conditional.test = "operationName"`) should be used to ensure detectability.
The `dataFlow` attribute propagates to the `TemplateEdgeAttrs` schema:
```typescript
const TemplateEdgeAttrs = Type.Object({
edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
condition: Type.Optional(Type.Unknown()),
dataFlow: Type.Optional(Type.Boolean({ default: false })),
});
```
This resolves OQ-01 and OQ-02 precisely: `typeCompat()` only runs on edges where `dataFlow: true`. Temporal-only edges bypass type checking entirely.
## References
- Open questions tracker: [open-questions.md](open-questions.md)
- Reactive execution: [reactive-execution.md](reactive-execution.md)
- Call graph: [call-graph.md](call-graph.md)
- Call protocol events: `@alkdev/operations/src/call.ts`
- Event sourcing research: `/workspace/research/event_sourcing/event_source_types.md`
- Category theory graph research: `/workspace/@alkdev/ujsx/docs/research/category-theory-graph.md`
- Compute graph DAG: `/workspace/compute_graph/packages/dag/`