
# ADR-005: Event Log as Single Source of Truth

## Status

Accepted

## Context

Flowgraph's reactive execution layer currently propagates state through signals (`signal<NodeStatus>` for node status and `computed<boolean>` for preconditions). Call graph nodes are populated from call protocol events. Reactive status tracking and call graph construction are treated as separate concepts, even though they process the same events.

Several open questions in the architecture reveal a common underlying problem:

1. **OQ-06**: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — `Conditional.test` receives a `results` map from an ad-hoc closure.
2. **OQ-07**: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
3. **OQ-08**: Should `depends_on` edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
4. **OQ-09**: How are retries handled? The current state machine marks `failed` as terminal, requiring awkward workarounds for retry.
5. **OQ-10**: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
6. **OQ-02**: How deep should type compatibility checking go? The current design treats edges that carry data (where types matter) the same as edges that only express ordering (where types are irrelevant).

These questions share a common root: **the architecture conflates notification (something happened) with state transfer (here's the data).** The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.

Meanwhile, the call protocol already defines a sequence of append-only facts:

```
call.requested  → { requestId, operationId, input, parentRequestId, timestamp }
call.responded  → { requestId, output, timestamp }
call.error      → { requestId, error, timestamp }
call.aborted    → { requestId, timestamp }
call.completed  → { requestId, timestamp }
```

These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.

## Decision

Flowgraph's reactive execution layer will be built on an **Execution Event Log** — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.

### Core Concept: Event Log + Projections

```
┌─────────────────────────────────────────────┐
│           Execution Event Log               │
│  (append-only sequence of call protocol     │
│   events — the ground truth)                │
└──────────────────┬──────────────────────────┘
                   │
     ┌─────────────┼──────────────┐
     │             │              │
     ▼             ▼              ▼
┌─────────┐  ┌──────────┐  ┌──────────┐
│ Status  │  │ Result   │  │ Call     │
│ Proj.   │  │ Proj.    │  │ Graph    │
│         │  │          │  │ Proj.    │
│ nodeId: │  │ nodeId:  │  │          │
│ status  │  │ output   │  │ nodes +  │
│         │  │          │  │ edges    │
└────┬────┘  └─────┬────┘  └──────────┘
     │             │
     ▼             ▼
┌─────────────────────────────────────────────┐
│        Reactive Execution Layer             │
│                                             │
│  preconditions → "does the log show         │
│                   all predecessors          │
│                   completed?"               │
│                                             │
│  result resolution → "does the log          │
│                       have A's output?"     │
│                                             │
│  Conditional.test → reads from result proj. │
│  Map.over         → reads from result proj. │
└─────────────────────────────────────────────┘
```

### Notification vs. State Transfer

The event log naturally distinguishes two patterns that the current architecture conflates:

| Pattern | What the edge means | What downstream needs | Event type |
|---------|--------------------|-----------------------|------------|
| **Temporal (task ordering)** | "A must complete before B starts" | Just the notification that A completed | Notification |
| **Data flow** | "A's output is B's input" | A's actual output data | State Transfer |

The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.

The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — `process-item` needs `fetch-items`'s output. It needs the state.

Both patterns derive from the same event log. Different projections serve different needs.
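The two kinds of consumer can be sketched as two different reads over the projections. This sketch assumes minimal `StatusView` and `ResultView` interfaces and illustrative status names (all hypothetical). The point is that a temporal edge only consults status, while a data-flow edge reads the output itself.

```typescript
// Hypothetical status values; flowgraph's real NodeStatus may differ.
type NodeStatus = "idle" | "running" | "completed" | "failed";

// Minimal read-only views over the status and result projections (assumed shapes).
interface StatusView {
  getStatus(nodeId: string): NodeStatus;
}
interface ResultView {
  getResult(nodeId: string): { output: unknown } | undefined;
}

// Temporal edge: the downstream node only needs to know every predecessor completed.
function temporalPreconditionMet(preds: readonly string[], view: StatusView): boolean {
  return preds.every((p) => view.getStatus(p) === "completed");
}

// Data-flow edge: the downstream node needs the predecessor's actual output.
function readUpstreamOutput(pred: string, results: ResultView): unknown {
  return results.getResult(pred)?.output;
}

// Usage with in-memory stand-ins for the two projections.
const statuses = new Map<string, NodeStatus>([["architect", "completed"]]);
const outputs = new Map<string, { output: unknown }>([
  ["fetch-items", { output: { items: [1, 2, 3] } }],
]);
const statusView: StatusView = { getStatus: (id) => statuses.get(id) ?? "idle" };
const resultView: ResultView = { getResult: (id) => outputs.get(id) };

temporalPreconditionMet(["architect"], statusView); // true: the reviewer may start
readUpstreamOutput("fetch-items", resultView); // returns { items: [1, 2, 3] }
```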

### Retry Semantics

Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:

```
call.requested(A, attempt=1)  → fact: A was requested
call.error(A, "timeout")      → fact: A failed
call.requested(A, attempt=2)  → fact: A was retried
call.responded(A, output)     → fact: A succeeded on retry
```

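The status projection derives each node's current state from the most recent event for that node. No state-machine mutation is needed: the state machine becomes a fold over the event log, and the earlier `call.error` is simply superseded by the later `call.requested` and `call.responded` events.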
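A minimal sketch of that fold, under simplifying assumptions: each event carries a hypothetical `nodeId` (real `CallEventMapValue` events are keyed by `requestId` and would first be mapped to nodes), the status names are illustrative rather than flowgraph's actual `NodeStatus`, and `call.completed` is left out because the sketch does not assume how it relates to `call.responded`.

```typescript
// Simplified event shape for illustration only; `nodeId` is a hypothetical field.
type CallEvent =
  | { type: "call.requested"; nodeId: string; input: unknown; timestamp: number }
  | { type: "call.responded"; nodeId: string; output: unknown; timestamp: number }
  | { type: "call.error"; nodeId: string; error: string; timestamp: number }
  | { type: "call.aborted"; nodeId: string; timestamp: number };

// Illustrative status names, not necessarily flowgraph's NodeStatus values.
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";

// The status that a single event implies for its node.
function statusFor(event: CallEvent): NodeStatus {
  switch (event.type) {
    case "call.requested":
      return "running";
    case "call.responded":
      return "completed";
    case "call.error":
      return "failed";
    case "call.aborted":
      return "aborted";
  }
}

// Status projection as a left fold: the most recent event per node wins.
function projectStatus(log: readonly CallEvent[]): Map<string, NodeStatus> {
  return log.reduce(
    (acc, event) => acc.set(event.nodeId, statusFor(event)),
    new Map<string, NodeStatus>(),
  );
}

// The retry sequence from the example above, as appended facts.
const retryLog: CallEvent[] = [
  { type: "call.requested", nodeId: "A", input: {}, timestamp: 1 },
  { type: "call.error", nodeId: "A", error: "timeout", timestamp: 2 },
  { type: "call.requested", nodeId: "A", input: {}, timestamp: 3 },
  { type: "call.responded", nodeId: "A", output: 42, timestamp: 4 },
];

projectStatus(retryLog).get("A"); // "completed"
projectStatus(retryLog.slice(0, 2)).get("A"); // "failed": the state before the retry
```

Because the fold never overwrites history, "failed" is just an intermediate fact rather than a terminal state that blocks the retry.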

### Type Compatibility

Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.

This resolves OQ-01: incompatible edges (type mismatches) only exist on state-transfer edges. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.

## Rationale

1. **The call protocol already IS the event log.** Every call event (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.

2. **Projections separate concerns.** The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."

3. **Notification and state transfer are different.** The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.

4. **Retries are natural.** An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a `retried` status or breaking the terminal-state invariant.

5. **Data dependencies don't need separate edges.** If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate `depends_on` edge type (OQ-08) becomes unnecessary because the event log is the data transport.

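6. **Category theory alignment.** The event log is a monoid under concatenation, with the empty log as identity. Projections are functors from the log monoid to status/result/graph monoids; since a monoid is a one-object category, such functors are exactly monoid homomorphisms, so projecting a concatenated log gives the same result as combining the projections of its parts. Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.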
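The homomorphism property can be checked on a toy model. In this sketch (all names hypothetical) an event simply records the status it implies for a node, the status monoid is right-biased map union (later entries win) with the empty map as identity, and the projection folds events into that monoid. It is not flowgraph's status projection, only an illustration of the law `project(a ++ b) = merge(project(a), project(b))`.

```typescript
// Toy model: an event just records the status it implies for a node.
type LogEvent = { readonly nodeId: string; readonly status: string };
type StatusMap = ReadonlyMap<string, string>;

// Log monoid: concatenation, with the empty log as identity.
const emptyLog: readonly LogEvent[] = [];
const concatLogs = (a: readonly LogEvent[], b: readonly LogEvent[]): readonly LogEvent[] => [...a, ...b];

// Status monoid: right-biased map union (later entries win), empty map as identity.
const emptyStatus: StatusMap = new Map<string, string>();
const mergeStatus = (a: StatusMap, b: StatusMap): StatusMap => new Map([...a, ...b]);

// The projection folds each event into the status monoid.
const project = (log: readonly LogEvent[]): StatusMap =>
  log.reduce(
    (acc: StatusMap, e: LogEvent) => mergeStatus(acc, new Map([[e.nodeId, e.status]])),
    emptyStatus,
  );

// Order-insensitive map equality, used to check the homomorphism laws.
const sameMap = (x: StatusMap, y: StatusMap): boolean =>
  x.size === y.size && [...x].every(([k, v]) => y.get(k) === v);

const logA: readonly LogEvent[] = [{ nodeId: "A", status: "running" }];
const logB: readonly LogEvent[] = [
  { nodeId: "A", status: "completed" },
  { nodeId: "B", status: "running" },
];

// project(logA ++ logB) equals merge(project(logA), project(logB)); project(empty) is empty.
sameMap(project(concatLogs(logA, logB)), mergeStatus(project(logA), project(logB))); // true
sameMap(project(emptyLog), emptyStatus); // true
```

The law holds because `mergeStatus` is associative: folding the concatenated log is the same as folding each half and merging the results.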

## Consequences

### Positive

- **OQ-06 resolved**: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
- **OQ-07 resolved**: The call graph and reactive engine are both projections of the event log. Neither owns the other.
- **OQ-08 resolved**: `depends_on` edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
- **OQ-09 resolved**: Retries are natural — append new events rather than mutating state.
- **OQ-10 reframed**: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
- **OQ-02 reframed**: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
- **OQ-01 reframed**: Incompatible edges can only arise where there is data flow, so temporal-only edges never need to be marked incompatible in the operation graph.

### Negative

- **The reactive layer needs a redesign.** The current `WorkflowReactiveRoot` directly creates `signal<NodeStatus>` instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
- **The event log must be persisted if workflow state must survive restarts.** Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
- **Event replay must be idempotent.** Processing the same event twice must produce the same projected state. This is already a property of the call protocol events (`updateFromEvent` is documented as idempotent in call-graph.md).
- **The result projection needs a clear interface.** `getResult(nodeId)` must be defined — what it returns, when it's available, and how it interacts with `Conditional.test` and `Map.over` closures that may reference results from nodes that haven't completed yet.
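The idempotent-replay requirement can be met even when a projection is not idempotent on its own: a "latest event wins" status fold would regress if a stale event were re-applied after a newer one. A minimal sketch deduplicates at append time instead. It assumes `(type, requestId, timestamp)` uniquely identifies an event, since the call protocol fields listed in the Context section carry no dedicated event ID; `IdempotentLog` and `eventKey` are hypothetical names.

```typescript
// Simplified stand-in for a call protocol event (assumed shape).
type CallEvent = {
  readonly type: "call.requested" | "call.responded" | "call.error" | "call.aborted" | "call.completed";
  readonly requestId: string;
  readonly timestamp: number;
};

// Hypothetical event identity: assumes (type, requestId, timestamp) is unique per event.
const eventKey = (e: CallEvent): string => `${e.type}|${e.requestId}|${e.timestamp}`;

class IdempotentLog {
  private readonly seen = new Set<string>();
  private readonly events: CallEvent[] = [];

  /** Appends the event unless an identical one is already logged; returns whether it was appended. */
  append(event: CallEvent): boolean {
    const key = eventKey(event);
    if (this.seen.has(key)) return false; // duplicate from a replay: projections are unaffected
    this.seen.add(key);
    this.events.push(event);
    return true;
  }

  /** Read-only view of the log, in append order. */
  all(): readonly CallEvent[] {
    return this.events;
  }
}

const replayLog = new IdempotentLog();
const responded: CallEvent = { type: "call.responded", requestId: "r1", timestamp: 2 };
replayLog.append(responded); // true
replayLog.append(responded); // false: replaying the same event changes nothing
```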

### Resolved: Event log is the call protocol event stream

The event log is NOT a separate type. It IS the call protocol event stream with an **append-only contract** and **projection functions**. The call protocol events (`CallEventMapValue[]`) already carry everything needed:

- `requestId` — identifies which invocation
- `operationId` — identifies which operation
- `input`/`output` — the payload data (for state transfer edges)
- `parentRequestId` — the causation link
- `timestamp` — when it happened

What flowgraph provides is not a new event type, but a **consumption contract**:

```typescript
interface EventLogProjection {
  /** Append an event. Events are processed idempotently. */
  append(event: CallEventMapValue): void;
  /** Current status of a node, derived from the most recent event. */
  getStatus(nodeId: string): NodeStatus;
  /** Result of a completed node, derived from call.responded events. */
  getResult(nodeId: string): CallResult | undefined;
  /** All events for a node, in order. */
  getEvents(nodeId: string): CallEventMapValue[];
}
```

The `EventLogProjection` interface makes the append-only discipline explicit and provides typed access to projections. Implementations wrap `CallEventMapValue[]` and derive state on demand (or with memoization). This avoids creating a parallel type system — the event types, their structure, and their semantics remain in `@alkdev/operations/src/call.ts`.
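For illustration, a minimal in-memory implementation of that contract that derives every projection on demand by rescanning the log (no memoization). The types are simplified, hypothetical stand-ins for `CallEventMapValue`, `NodeStatus`, and `CallResult`; `call.completed` is omitted; and the `nodeOf` resolver from `requestId` to node key is an assumption (roughly the inverse of the `nodeKeyToRequestId` map). Returning a result only when a node's latest event is `call.responded` is a choice of this sketch, not something the ADR specifies.

```typescript
// Simplified, hypothetical stand-ins for CallEventMapValue, NodeStatus, and CallResult.
type CallEvent =
  | { type: "call.requested"; requestId: string; input: unknown; timestamp: number }
  | { type: "call.responded"; requestId: string; output: unknown; timestamp: number }
  | { type: "call.error"; requestId: string; error: unknown; timestamp: number }
  | { type: "call.aborted"; requestId: string; timestamp: number };
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";
interface CallResult {
  output: unknown;
}

// Status implied by a node's most recent event.
const STATUS_FOR: Record<CallEvent["type"], NodeStatus> = {
  "call.requested": "running",
  "call.responded": "completed",
  "call.error": "failed",
  "call.aborted": "aborted",
};

class InMemoryEventLogProjection {
  private readonly log: CallEvent[] = [];

  // `nodeOf` is a hypothetical resolver from requestId to template node key.
  // A retry issues a new requestId that resolves to the same node.
  constructor(private readonly nodeOf: (requestId: string) => string | undefined) {}

  /** Append-only: events are pushed, never rewritten or removed. */
  append(event: CallEvent): void {
    this.log.push(event);
  }

  /** All events whose requestId resolves to this node, in append order. */
  getEvents(nodeId: string): CallEvent[] {
    return this.log.filter((e) => this.nodeOf(e.requestId) === nodeId);
  }

  /** Derived on demand from the most recent event; no events means "idle". */
  getStatus(nodeId: string): NodeStatus {
    const events = this.getEvents(nodeId);
    const last = events[events.length - 1];
    return last === undefined ? "idle" : STATUS_FOR[last.type];
  }

  /** Output of the node's latest event, if that event is call.responded. */
  getResult(nodeId: string): CallResult | undefined {
    const events = this.getEvents(nodeId);
    const last = events[events.length - 1];
    return last !== undefined && last.type === "call.responded" ? { output: last.output } : undefined;
  }
}

// Usage: node A fails on request r1, then succeeds on retry r2.
const requestToNode: Record<string, string> = { r1: "A", r2: "A" };
const projection = new InMemoryEventLogProjection((rid) => requestToNode[rid]);
projection.append({ type: "call.requested", requestId: "r1", input: {}, timestamp: 1 });
projection.append({ type: "call.error", requestId: "r1", error: "timeout", timestamp: 2 });
projection.append({ type: "call.requested", requestId: "r2", input: {}, timestamp: 3 });
projection.append({ type: "call.responded", requestId: "r2", output: 42, timestamp: 4 });

projection.getStatus("A"); // "completed"
projection.getResult("A"); // { output: 42 }
projection.getStatus("B"); // "idle": no events yet
```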

### Resolved: Event log persists across re-renders; projections recompute

When a template is re-rendered (when the ujsx reconciler supports it), the event log persists. Events are append-only facts — they record what happened, and what happened doesn't change when the template structure changes.

Projections are recomputed by scanning the log against the new DAG:

1. Events for nodes still in the DAG map naturally to their projections.
2. Events for nodes removed from the DAG become **orphaned events** — they remain in the log (for audit/history) but don't affect active projections.
3. New nodes added to the DAG have no events yet — their status is `idle` and their result is `undefined`.

This means re-rendering doesn't lose history. The event log is the durable record; projections are ephemeral views that can always be reconstructed.

For v1 (before the reconciler exists), the event log starts at template mount and is disposed when the `WorkflowReactiveRoot` is disposed. The re-render scenario is an architectural commitment for when the reconciler arrives, not something to implement now.

#### Orphaned events specification

When a template is re-rendered and nodes are removed from the DAG, their events become orphaned. The projection layer handles this as follows:

1. **The `EventLogProjection` receives the current DAG structure** (the set of active node keys) alongside the event log. Methods like `getStatus(nodeId)` first check whether `nodeId` is in the active DAG. If not, the node is orphaned.

2. **Orphaned nodes return `undefined` from `getResult()`**. A downstream node referencing an orphaned predecessor via `Conditional.test` or `Map.over` will see `undefined`, causing the test to evaluate as if the predecessor didn't complete. This is the correct behavior — a removed node can't provide data.

3. **Orphaned events remain in the log** for audit and history. `getEvents(nodeId)` on an orphaned node returns its events (if any). The overall event log is still queryable for debugging.

4. **The `nodeKeyToRequestId` map is rebuilt on re-render**. New nodes get fresh `requestId` values. Old mappings are discarded, along with their associated signal subscriptions (the `WorkflowReactiveRoot.dispose()` call before re-render handles this).
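A minimal sketch of these rules as a wrapper around an existing projection, with hypothetical names (`DagScopedProjection`, `BaseProjection`) and simplified types. Because the specification does not say what `getStatus()` should return for an orphaned node, the wrapper exposes `isOrphaned()` instead of guessing a status.

```typescript
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";
interface CallResult {
  output: unknown;
}

// Assumed shape of the underlying, DAG-unaware projection.
interface BaseProjection {
  getStatus(nodeId: string): NodeStatus;
  getResult(nodeId: string): CallResult | undefined;
  getEvents(nodeId: string): readonly unknown[];
}

// Hypothetical wrapper applying the orphaned-event rules for one rendered DAG.
class DagScopedProjection {
  constructor(
    private readonly base: BaseProjection,
    private readonly activeNodes: ReadonlySet<string>, // node keys in the current DAG
  ) {}

  // Rule 1: a node absent from the active DAG is orphaned.
  isOrphaned(nodeId: string): boolean {
    return !this.activeNodes.has(nodeId);
  }

  // Rule 2: orphaned nodes never provide data to Conditional.test or Map.over.
  getResult(nodeId: string): CallResult | undefined {
    return this.isOrphaned(nodeId) ? undefined : this.base.getResult(nodeId);
  }

  // Rule 3: orphaned events stay queryable for audit and debugging.
  getEvents(nodeId: string): readonly unknown[] {
    return this.base.getEvents(nodeId);
  }
}

// Usage: a re-render removed "A" and added "C"; the log still holds A's events.
const stored = new Map<string, CallResult>([
  ["A", { output: "old" }],
  ["B", { output: "new" }],
]);
const base: BaseProjection = {
  getStatus: (id) => (stored.has(id) ? "completed" : "idle"),
  getResult: (id) => stored.get(id),
  getEvents: (id) => (stored.has(id) ? [{ type: "call.responded", nodeId: id }] : []),
};
const scoped = new DagScopedProjection(base, new Set(["B", "C"]));

scoped.getResult("A"); // undefined: A is orphaned
scoped.getEvents("A").length; // 1: history is kept
scoped.getResult("C"); // undefined: new node with no events yet
```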

### Resolved: Edges are marked with `dataFlow` attribute

Template edges get a `dataFlow: boolean` attribute that distinguishes temporal edges from state-transfer edges:

| `dataFlow` value | Meaning | Type checking needed? |
|:---|:---|:---|
| `false` (default) | Temporal ordering only — downstream starts after upstream completes but doesn't read upstream's output | No — no data flows between nodes |
| `true` | State transfer — downstream reads upstream's output via `Conditional.test` or `Map.over` | Yes — `typeCompat()` checks output→input compatibility |

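This attribute is **inferred rather than hand-written** (a manual `dataFlow: true` override exists only for expressions that cannot be analyzed). The `GraphologyHostConfig` detects `dataFlow` from template expressions during rendering: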

- A `Sequential` edge where the downstream node references `results["upstreamNode"]` in `Conditional.test`, `Map.over`, or `Operation.input` gets `dataFlow: true`
- A `Sequential` edge where no such reference exists gets `dataFlow: false` (the default)
- A `Conditional` edge always gets `dataFlow: true` (the condition always reads a predecessor's result)
- `Parallel` edges don't exist (parallel children have no inter-sibling edges)

#### dataFlow inference specification

The inference algorithm operates at **template AST level** during `GraphologyHostConfig.createInstance` / `appendChild`, not at runtime. It inspects template component props to detect references to predecessor results:

**Detectable references** (set `dataFlow: true` on the edge from the referenced node to the referencing node):

| Expression | Detection method |
|:---|:---|
| `Conditional.test = (results) => results["X"]` | Static analysis of the function body for `results[...]` property accesses |
| `Conditional.test = "X"` (string form) | String comparison — the referenced operation name |
| `Map.over = (results) => results["X"].output.items` | Static analysis of the function body for `results[...]` property accesses |
| `Map.over = itemsSignal` (signal form) | No `dataFlow: true` — the array comes from a signal, not a predecessor result |
| `Operation.input = (results) => results["X"].output` | Static analysis of the function body for `results[...]` property accesses |
| `Operation.input = staticValue` | No `dataFlow: true` — the input doesn't depend on a predecessor result |

**Inference rules**:

1. **Direct predecessor edges only**: `dataFlow: true` is set only on edges that exist in the DAG. In a `Sequential` chain A → B → C, if C references `results["A"]`, the edge B → C gets `dataFlow: true` (since A is a predecessor of C via the chain), but no new edge A → C is created. Data flows transitively through the chain — B must complete before C starts, and C reads A's result from the result projection.

2. **`Map` component edges**: If `Map.over` references a predecessor result, the edge from the `Map`'s predecessor to each mapped child (including the first) gets `dataFlow: true`, because the array each child draws its item from comes from that predecessor's output.

3. **Ambiguous references**: If `Operation.input` is a function that cannot be statically analyzed (e.g., `(results) => computeInput(results)` where `computeInput` is a closure), the inference defaults to `dataFlow: false`. Template authors can manually annotate with `dataFlow: true` as an override, though this should be rare.

4. **Function body analysis**: JavaScript function introspection is unreliable (minification, closures). Inference operates on the **AST** of the ujsx template during rendering, not on the runtime function body. This means that `Conditional.test` functions passed as closures from external code (not inline in the template) cannot have their references detected. For these cases, the string form (`Conditional.test = "operationName"`) should be used to ensure detectability.
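The inference rules above can be sketched as a pass over already-extracted references. This assumes the AST analysis has already produced, for each node, the set of result keys referenced by its `Conditional.test`, `Map.over`, or `Operation.input`, or `"opaque"` when the expression could not be analyzed. The `TemplateEdge` and `References` shapes and `inferDataFlow` are hypothetical, and the `Map` fan-out rule is not modeled.

```typescript
type EdgeType = "sequential" | "conditional";
interface TemplateEdge {
  readonly from: string;
  readonly to: string;
  readonly edgeType: EdgeType;
  readonly dataFlow?: boolean;
}
// Result keys a node's template expressions reference, or "opaque" if unanalyzable.
type References = ReadonlySet<string> | "opaque";

function inferDataFlow(
  edges: readonly TemplateEdge[],
  refs: ReadonlyMap<string, References>,
): TemplateEdge[] {
  // Direct predecessors of each node, used to walk upstream.
  const preds = new Map<string, string[]>();
  for (const e of edges) preds.set(e.to, [...(preds.get(e.to) ?? []), e.from]);

  // True if `target` is `node` itself or one of its ancestors (the graph is a DAG).
  const reaches = (node: string, target: string, seen: Set<string>): boolean => {
    if (node === target) return true;
    if (seen.has(node)) return false;
    seen.add(node);
    return (preds.get(node) ?? []).some((p) => reaches(p, target, seen));
  };

  return edges.map((e): TemplateEdge => {
    if (e.dataFlow === true) return e; // manual override (rule 3)
    if (e.edgeType === "conditional") return { ...e, dataFlow: true }; // always reads a result
    const r = refs.get(e.to);
    if (r === undefined || r === "opaque") return { ...e, dataFlow: false }; // default
    // Rule 1: mark this existing edge if a referenced node is its source or upstream of it.
    const dataFlow = [...r].some((ref) => reaches(e.from, ref, new Set()));
    return { ...e, dataFlow };
  });
}

// Usage: the A → B → C chain where C references results["A"].
const inferred = inferDataFlow(
  [
    { from: "A", to: "B", edgeType: "sequential" },
    { from: "B", to: "C", edgeType: "sequential" },
  ],
  new Map<string, References>([["C", new Set(["A"])]]),
);
// A → B gets dataFlow: false; B → C gets dataFlow: true, and no A → C edge is created.
```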

The `dataFlow` attribute propagates to the `TemplateEdgeAttrs` schema:

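```typescript
// Assumes TypeBox is imported from the @sinclair/typebox package.
import { Type, type Static } from "@sinclair/typebox";

const TemplateEdgeAttrs = Type.Object({
  edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
  condition: Type.Optional(Type.Unknown()),
  dataFlow: Type.Optional(Type.Boolean({ default: false })),
});
type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>;
```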

This resolves OQ-01 and OQ-02 precisely: `typeCompat()` only runs on edges where `dataFlow: true`. Temporal-only edges bypass type checking entirely.
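As a sketch of that gating, assuming a hypothetical edge shape and treating `typeCompat` as an injected predicate (its real signature is not given in this ADR), the check can short-circuit on `dataFlow` so that temporal-only edges never reach `typeCompat` at all.

```typescript
interface TemplateEdge {
  readonly from: string;
  readonly to: string;
  readonly dataFlow?: boolean; // absent means false, matching the schema default
}

// Hypothetical helper: `typeCompat` is assumed to be a predicate over one edge
// (upstream output type vs. downstream input type).
function findTypeMismatches(
  edges: readonly TemplateEdge[],
  typeCompat: (edge: TemplateEdge) => boolean,
): TemplateEdge[] {
  // `&&` short-circuits, so typeCompat only runs on data-flow edges.
  return edges.filter((e) => e.dataFlow === true && !typeCompat(e));
}

// Usage: count how often the (stub) checker is invoked.
let checks = 0;
const mismatches = findTypeMismatches(
  [
    { from: "architect", to: "reviewer" }, // temporal only
    { from: "fetch-items", to: "process-item", dataFlow: true },
  ],
  () => {
    checks += 1;
    return false; // stub: treat every checked edge as incompatible
  },
);
// mismatches holds only fetch-items → process-item, and checks is 1.
```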

## References

- Open questions tracker: [open-questions.md](open-questions.md)
- Reactive execution: [reactive-execution.md](reactive-execution.md)
- Call graph: [call-graph.md](call-graph.md)
- Call protocol events: `@alkdev/operations/src/call.ts`
- Event sourcing research: `/workspace/research/event_sourcing/event_source_types.md`
- Category theory graph research: `/workspace/@alkdev/ujsx/docs/research/category-theory-graph.md`
- Compute graph DAG: `/workspace/compute_graph/packages/dag/`