# ADR-005: Event Log as Single Source of Truth

## Status

Accepted

## Context

Flowgraph's reactive execution layer currently uses signal-based state propagation (`signal` and `computed` for preconditions). Call graph nodes are populated from call protocol events. The two systems — reactive status tracking and call graph construction — are separate concepts that happen to process the same events.

Several open questions in the architecture reveal a common underlying problem:

1. **OQ-06**: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — `Conditional.test` receives a `results` map from an ad-hoc closure.
2. **OQ-07**: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
3. **OQ-08**: Should `depends_on` edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
4. **OQ-09**: How are retries handled? The current state machine marks `failed` as terminal, requiring awkward workarounds for retry.
5. **OQ-10**: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
6. **OQ-02**: How deep should type compatibility checking go? This conflates edges that carry data (where types matter) with edges that only express ordering (where types are irrelevant).

These questions share a common root: **the architecture conflates notification (something happened) with state transfer (here's the data).** The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.
Meanwhile, the call protocol already defines a sequence of append-only facts:

```
call.requested  → { requestId, operationId, input, parentRequestId, timestamp }
call.responded  → { requestId, output, timestamp }
call.error      → { requestId, error, timestamp }
call.aborted    → { requestId, timestamp }
call.completed  → { requestId, timestamp }
```

These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.

## Decision

Flowgraph's reactive execution layer will be built on an **Execution Event Log** — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.

### Core Concept: Event Log + Projections

```
┌─────────────────────────────────────────────┐
│             Execution Event Log             │
│  (append-only sequence of call protocol     │
│   events — the ground truth)                │
└──────────────────┬──────────────────────────┘
                   │
     ┌─────────────┼──────────────┐
     │             │              │
     ▼             ▼              ▼
┌─────────┐   ┌──────────┐   ┌──────────┐
│ Status  │   │ Result   │   │ Call     │
│ Proj.   │   │ Proj.    │   │ Graph    │
│         │   │          │   │ Proj.    │
│ nodeId: │   │ nodeId:  │   │          │
│ status  │   │ output   │   │ nodes +  │
│         │   │          │   │ edges    │
└────┬────┘   └────┬─────┘   └──────────┘
     │             │
     ▼             ▼
┌─────────────────────────────────────────────┐
│          Reactive Execution Layer           │
│                                             │
│  preconditions → "does the log show         │
│                   all predecessors          │
│                   completed?"               │
│                                             │
│  result resolution → "does the log          │
│                       have A's output?"     │
│                                             │
│  Conditional.test → reads from result proj. │
│  Map.over → reads from result proj.         │
└─────────────────────────────────────────────┘
```

### Notification vs. State Transfer

The event log naturally distinguishes two patterns that the current architecture conflates:

| Pattern | What the edge means | What downstream needs | Event type |
|---------|--------------------|-----------------------|------------|
| **Temporal (task ordering)** | "A must complete before B starts" | Just the notification that A completed | Notification |
| **Data flow** | "A's output is B's input" | A's actual output data | State Transfer |

The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.

The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — `process-item` needs `fetch-items`'s output. It needs the state.

Both patterns derive from the same event log. Different projections serve different needs.

### Retry Semantics

Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:

```
call.requested(A, attempt=1)  → fact: A was requested
call.error(A, "timeout")      → fact: A failed
call.requested(A, attempt=2)  → fact: A was retried
call.responded(A, output)     → fact: A succeeded on retry
```

The status projection derives the current state by scanning for the most recent event per node. No state machine mutation needed. The state machine becomes a fold over the event log.

### Type Compatibility

Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.

This resolves OQ-01: incompatible edges (type mismatches) only exist on state-transfer edges. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.

## Rationale

1. **The call protocol already IS the event log.** Every call event (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.
2. **Projections separate concerns.** The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."
3. **Notification and state transfer are different.** The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.
4. **Retries are natural.** An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a `retried` status or breaking the terminal-state invariant.
5. **Data dependencies don't need separate edges.** If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate `depends_on` edge type (OQ-08) becomes unnecessary because the event log is the data transport.
6. **Category theory alignment.** The event log is a monoid (append with identity). Projections are functors from the log monoid to status/result/graph monoids. Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.
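
The claim that the status and result maps are folds over the log can be illustrated with a minimal sketch. It is not the flowgraph implementation: the `CallEvent` shape is a simplified stand-in that carries a hypothetical `nodeId` field (the real events are keyed by `requestId`), the status names other than `idle` and `failed` are assumed, and treating `call.completed` as leaving the status unchanged is also an assumption.

```typescript
// Simplified stand-in for the call protocol events. The real types live in
// @alkdev/operations/src/call.ts; `nodeId` is a hypothetical field used here
// so the sketch does not need a requestId-to-node mapping.
type CallEvent =
  | { type: "call.requested"; nodeId: string; timestamp: number }
  | { type: "call.responded"; nodeId: string; output: unknown; timestamp: number }
  | { type: "call.error"; nodeId: string; error: string; timestamp: number }
  | { type: "call.aborted"; nodeId: string; timestamp: number }
  | { type: "call.completed"; nodeId: string; timestamp: number };

// Assumed status names; only `idle` and `failed` are named in this ADR.
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";

// One step of the fold: previous status + event -> next status.
function step(status: NodeStatus, event: CallEvent): NodeStatus {
  switch (event.type) {
    case "call.requested":
      return "running"; // also covers a retry after a failure
    case "call.responded":
      return "completed";
    case "call.error":
      return "failed";
    case "call.aborted":
      return "aborted";
    case "call.completed":
      return status; // assumption: end-of-call marker, keeps the last outcome
  }
}

// Status projection: fold every event into a per-node status.
function projectStatus(log: readonly CallEvent[]): Map<string, NodeStatus> {
  const statuses = new Map<string, NodeStatus>();
  for (const event of log) {
    const previous = statuses.get(event.nodeId) ?? "idle";
    statuses.set(event.nodeId, step(previous, event));
  }
  return statuses;
}

// Result projection: the latest call.responded output per node.
function projectResults(log: readonly CallEvent[]): Map<string, unknown> {
  const results = new Map<string, unknown>();
  for (const event of log) {
    if (event.type === "call.responded") {
      results.set(event.nodeId, event.output);
    }
  }
  return results;
}

// The retry sequence from "Retry Semantics", expressed as appended facts.
const log: CallEvent[] = [
  { type: "call.requested", nodeId: "A", timestamp: 1 },
  { type: "call.error", nodeId: "A", error: "timeout", timestamp: 2 },
  { type: "call.requested", nodeId: "A", timestamp: 3 },
  { type: "call.responded", nodeId: "A", output: { ok: true }, timestamp: 4 },
];

const statuses = projectStatus(log);
const results = projectResults(log);
console.log(statuses.get("A")); // "completed"
console.log(projectStatus(log.slice(0, 2)).get("A")); // "failed"
```

Because both projections are pure functions of the log, replaying a prefix of the log reproduces the state at that point, which is why `failed` does not have to be patched into a non-terminal state to support retries.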
## Consequences

### Positive

- **OQ-06 resolved**: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
- **OQ-07 resolved**: The call graph and reactive engine are both projections of the event log. Neither owns the other.
- **OQ-08 resolved**: `depends_on` edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
- **OQ-09 resolved**: Retries are natural — append new events rather than mutating state.
- **OQ-10 reframed**: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
- **OQ-02 reframed**: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
- **OQ-01 reframed**: Incompatible edges only exist where there's data flow. Temporal-only edges don't need them in the operation graph.

### Negative

- **The reactive layer needs a redesign.** The current `WorkflowReactiveRoot` directly creates `signal` instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
- **The event log must be persisted if workflow state must survive restarts.** Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
- **Event replay must be idempotent.** Processing the same event twice must produce the same projected state. This is already a property of the call protocol events (`updateFromEvent` is documented as idempotent in call-graph.md).
- **The result projection needs a clear interface.** `getResult(nodeId)` must be defined — what it returns, when it's available, and how it interacts with `Conditional.test` and `Map.over` closures that may reference results from nodes that haven't completed yet.

### Resolved: Event log is the call protocol event stream

The event log is NOT a separate type. It IS the call protocol event stream with an **append-only contract** and **projection functions**. The call protocol events (`CallEventMapValue[]`) already carry everything needed:

- `requestId` — identifies which invocation
- `operationId` — identifies which operation
- `input`/`output` — the payload data (for state transfer edges)
- `parentRequestId` — the causation link
- `timestamp` — when it happened

What flowgraph provides is not a new event type, but a **consumption contract**:

```typescript
interface EventLogProjection {
  /** Append an event. Events are processed idempotently. */
  append(event: CallEventMapValue): void;

  /** Current status of a node, derived from the most recent event. */
  getStatus(nodeId: string): NodeStatus;

  /** Result of a completed node, derived from call.responded events. */
  getResult(nodeId: string): CallResult | undefined;

  /** All events for a node, in order. */
  getEvents(nodeId: string): CallEventMapValue[];
}
```

The `EventLogProjection` interface makes the append-only discipline explicit and provides typed access to projections. Implementations wrap `CallEventMapValue[]` and derive state on demand (or with memoization). This avoids creating a parallel type system — the event types, their structure, and their semantics remain in `@alkdev/operations/src/call.ts`.

### Resolved: Event log persists across re-renders; projections recompute

When a template is re-rendered (when the ujsx reconciler supports it), the event log persists. Events are append-only facts — they record what happened, and what happened doesn't change when the template structure changes.
Projections are recomputed by scanning the log against the new DAG:

1. Events for nodes still in the DAG map naturally to their projections.
2. Events for nodes removed from the DAG become **orphaned events** — they remain in the log (for audit/history) but don't affect active projections.
3. New nodes added to the DAG have no events yet — their status is `idle` and their result is `undefined`.

This means re-rendering doesn't lose history. The event log is the durable record; projections are ephemeral views that can always be reconstructed.

For v1 (before the reconciler exists), the event log starts at template mount and is disposed when the `WorkflowReactiveRoot` is disposed. The re-render scenario is an architectural commitment for when the reconciler arrives, not something to implement now.

#### Orphaned events specification

When a template is re-rendered and nodes are removed from the DAG, their events become orphaned. The projection layer handles this as follows:

1. **The `EventLogProjection` receives the current DAG structure** (the set of active node keys) alongside the event log. Methods like `getStatus(nodeId)` first check whether `nodeId` is in the active DAG. If not, the node is orphaned.
2. **Orphaned nodes return `undefined` from `getResult()`**. A downstream node referencing an orphaned predecessor via `Conditional.test` or `Map.over` will see `undefined`, causing the test to evaluate as if the predecessor didn't complete. This is the correct behavior — a removed node can't provide data.
3. **Orphaned events remain in the log** for audit and history. `getEvents(nodeId)` on an orphaned node returns its events (if any). The overall event log is still queryable for debugging.
4. **The `nodeKeyToRequestId` map is rebuilt on re-render**. New nodes get fresh `requestId` values. Old mappings are discarded, along with their associated signal subscriptions (the `WorkflowReactiveRoot.dispose()` call before re-render handles this).
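
The orphaned-event rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the planned API: `NodeEvent` is a simplified event shape with a hypothetical `nodeId` field, `OrphanAwareProjection` and `setActiveNodes` are invented names for the re-render hook, and `getStatus` is left out because the rules do not say what it returns for an orphaned node.

```typescript
// Simplified, hypothetical event shape keyed by node. The real events are
// CallEventMapValue values keyed by requestId.
interface NodeEvent {
  type: "call.requested" | "call.responded" | "call.error" | "call.aborted" | "call.completed";
  nodeId: string;
  output?: unknown;
  timestamp: number;
}

class OrphanAwareProjection {
  private readonly log: NodeEvent[] = [];
  private activeNodes: Set<string>;

  constructor(activeNodes: Iterable<string>) {
    this.activeNodes = new Set(activeNodes);
  }

  /** Append-only: events are never removed, even when their node is. */
  append(event: NodeEvent): void {
    this.log.push(event);
  }

  /** Hypothetical re-render hook: replaces the set of active node keys. */
  setActiveNodes(activeNodes: Iterable<string>): void {
    this.activeNodes = new Set(activeNodes);
  }

  /** Rule 1: a node that is not in the active DAG is orphaned. */
  isOrphaned(nodeId: string): boolean {
    return !this.activeNodes.has(nodeId);
  }

  /** Rule 2: orphaned nodes yield undefined, as do nodes without a response. */
  getResult(nodeId: string): unknown {
    if (this.isOrphaned(nodeId)) return undefined;
    let result: unknown = undefined;
    for (const event of this.log) {
      if (event.nodeId === nodeId && event.type === "call.responded") {
        result = event.output; // the latest response wins
      }
    }
    return result;
  }

  /** Rule 3: events stay queryable for audit, orphaned or not. */
  getEvents(nodeId: string): NodeEvent[] {
    return this.log.filter((event) => event.nodeId === nodeId);
  }
}

// Usage: "fetch" completes, then a re-render removes it from the DAG.
const projection = new OrphanAwareProjection(["fetch", "process"]);
projection.append({ type: "call.requested", nodeId: "fetch", timestamp: 1 });
projection.append({ type: "call.responded", nodeId: "fetch", output: { items: [1, 2] }, timestamp: 2 });

const beforeRerender = projection.getResult("fetch"); // { items: [1, 2] }
projection.setActiveNodes(["process", "aggregate"]);
const afterRerender = projection.getResult("fetch"); // undefined
const auditTrail = projection.getEvents("fetch"); // both events are still there
```

Passing the active node keys in separately from the log keeps the log itself untouched by re-renders; only the filter applied to it changes.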
### Resolved: Edges are marked with `dataFlow` attribute

Template edges get a `dataFlow: boolean` attribute that distinguishes temporal edges from state-transfer edges:

| `dataFlow` value | Meaning | Type checking needed? |
|:---|:---|:---|
| `false` (default) | Temporal ordering only — downstream starts after upstream completes but doesn't read upstream's output | No — no data flows between nodes |
| `true` | State transfer — downstream reads upstream's output via `Conditional.test` or `Map.over` | Yes — `typeCompat()` checks output→input compatibility |

This attribute is **inferred, not manual**. The `GraphologyHostConfig` detects `dataFlow` from template expressions during rendering:

- A `Sequential` edge where the downstream node references `results["upstreamNode"]` in `Conditional.test`, `Map.over`, or `Operation.input` gets `dataFlow: true`
- A `Sequential` edge where no such reference exists gets `dataFlow: false` (the default)
- A `Conditional` edge always gets `dataFlow: true` (the condition always reads a predecessor's result)
- `Parallel` edges don't exist (parallel children have no inter-sibling edges)

#### dataFlow inference specification

The inference algorithm operates at **template AST level** during `GraphologyHostConfig.createInstance` / `appendChild`, not at runtime. It inspects template component props to detect references to predecessor results:

**Detectable references** (set `dataFlow: true` on the edge from the referenced node to the referencing node):

| Expression | Detection method |
|:---|:---|
| `Conditional.test = (results) => results["X"]` | Static analysis of the function body for `results[...]` property accesses |
| `Conditional.test = "X"` (string form) | String comparison — the referenced operation name |
| `Map.over = (results) => results["X"].output.items` | Static analysis of the function body for `results[...]` property accesses |
| `Map.over = itemsSignal` (signal form) | No `dataFlow: true` — the array comes from a signal, not a predecessor result |
| `Operation.input = (results) => results["X"].output` | Static analysis of the function body for `results[...]` property accesses |
| `Operation.input = staticValue` | No `dataFlow: true` — the input doesn't depend on a predecessor result |

**Inference rules**:

1. **Direct predecessor edges only**: `dataFlow: true` is set only on edges that exist in the DAG. In a `Sequential` chain A → B → C, if C references `results["A"]`, the edge B → C gets `dataFlow: true` (since A is a predecessor of C via the chain), but no new edge A → C is created. Data flows transitively through the chain — B must complete before C starts, and C reads A's result from the result projection.
2. **`Map` component edges**: A `Map` component's predecessor-to-first-mapped-child edge gets `dataFlow: true` if `Map.over` references a predecessor result. Each mapped child's edge from the `Map`'s predecessor gets `dataFlow: true` because the array data comes from a predecessor's output.
3. **Ambiguous references**: If `Operation.input` is a function that cannot be statically analyzed (e.g., `(results) => computeInput(results)` where `computeInput` is a closure), the inference defaults to `dataFlow: false`. Template authors can manually annotate with `dataFlow: true` as an override, though this should be rare.
4. **Function body analysis**: JavaScript function introspection is unreliable (minification, closures). Inference operates on the **AST** of the ujsx template during rendering, not on the runtime function body. This means that `Conditional.test` functions passed as closures from external code (not inline in the template) cannot have their references detected. For these cases, the string form (`Conditional.test = "operationName"`) should be used to ensure detectability.

The `dataFlow` attribute propagates to the `TemplateEdgeAttrs` schema:

```typescript
const TemplateEdgeAttrs = Type.Object({
  edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
  condition: Type.Optional(Type.Unknown()),
  dataFlow: Type.Optional(Type.Boolean({ default: false })),
});
```

This resolves OQ-01 and OQ-02 precisely: `typeCompat()` only runs on edges where `dataFlow: true`. Temporal-only edges bypass type checking entirely.

## References

- Open questions tracker: [open-questions.md](open-questions.md)
- Reactive execution: [reactive-execution.md](reactive-execution.md)
- Call graph: [call-graph.md](call-graph.md)
- Call protocol events: `@alkdev/operations/src/call.ts`
- Event sourcing research: `/workspace/research/event_sourcing/event_source_types.md`
- Category theory graph research: `/workspace/@alkdev/ujsx/docs/research/category-theory-graph.md`
- Compute graph DAG: `/workspace/compute_graph/packages/dag/`