ADR-005: event log as single source of truth

Proposed architecture decision to use an append-only execution event log (call protocol events) as ground truth, with status/result/call-graph as projections. Resolves OQ-06, OQ-07, OQ-08, OQ-09; reframes OQ-01, OQ-02, OQ-10. Inspired by event sourcing discipline (notification vs state transfer) and compute_graph ExecutionContext pattern.
2026-05-20 09:33:15 +00:00
parent 27ebbd491e
commit 2c1b2d1a15
3 changed files with 204 additions and 25 deletions
--- a/docs/architecture/decisions/005-event-log-as-source-of-truth.md
+++ b/docs/architecture/decisions/005-event-log-as-source-of-truth.md
@@ -0,0 +1,156 @@
+# ADR-005: Event Log as Single Source of Truth
+
+## Status
+
+Proposed
+
+## Context
+
+Flowgraph's reactive execution layer currently uses signal-based state propagation (`signal<NodeStatus>` and `computed<boolean>` for preconditions). Call graph nodes are populated from call protocol events. The two systems — reactive status tracking and call graph construction — are separate concepts that happen to process the same events.
+
+Several open questions in the architecture reveal a common underlying problem:
+
+1. **OQ-06**: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — `Conditional.test` receives a `results` map from an ad-hoc closure.
+2. **OQ-07**: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
+3. **OQ-08**: Should `depends_on` edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
+4. **OQ-09**: How are retries handled? The current state machine marks `failed` as terminal, requiring awkward workarounds for retry.
+5. **OQ-10**: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
+6. **OQ-02**: How deep should type compatibility checking go? This conflates edges that carry data (where types matter) with edges that only express ordering (where types are irrelevant).
+
+These questions share a common root: **the architecture conflates notification (something happened) with state transfer (here's the data).** The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.
+
+Meanwhile, the call protocol already defines a sequence of append-only facts:
+
+```
+call.requested  → { requestId, operationId, input, parentRequestId, timestamp }
+call.responded  → { requestId, output, timestamp }
+call.error      → { requestId, error, timestamp }
+call.aborted    → { requestId, timestamp }
+call.completed  → { requestId, timestamp }
+```
+
+These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.
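+
+To make "projection of the event sequence" concrete, the following is a minimal sketch rather than the actual call protocol types. The `CallEvent` union mirrors the field lists above, but the `type` discriminant, the `number` timestamps, the optional `parentRequestId`, and the names `ExecutionEventLog`, `Projection`, and `requestCount` are all assumptions made for illustration.
+
+```typescript
+// Hypothetical event shapes mirroring the field lists above; the real
+// definitions live in the call protocol and may differ.
+type CallEvent =
+  | { type: "call.requested"; requestId: string; operationId: string; input: unknown; parentRequestId?: string; timestamp: number }
+  | { type: "call.responded"; requestId: string; output: unknown; timestamp: number }
+  | { type: "call.error"; requestId: string; error: unknown; timestamp: number }
+  | { type: "call.aborted"; requestId: string; timestamp: number }
+  | { type: "call.completed"; requestId: string; timestamp: number };
+
+// Append-only log: callers can add events and read them, never edit them.
+class ExecutionEventLog {
+  private readonly events: CallEvent[] = [];
+
+  append(event: CallEvent): void {
+    // Store a frozen shallow copy so later changes by the caller cannot alter history.
+    this.events.push(Object.freeze({ ...event }));
+  }
+
+  snapshot(): readonly CallEvent[] {
+    // Return a copy so the internal array cannot be reordered or truncated.
+    return this.events.slice();
+  }
+}
+
+// A projection is a pure function of the log.
+type Projection<T> = (events: readonly CallEvent[]) => T;
+
+// Trivial example projection: how many calls have been requested so far.
+const requestCount: Projection<number> = (events) =>
+  events.filter((e) => e.type === "call.requested").length;
+```
+
+Because a projection is a pure function of the log in this sketch, any projection can be rebuilt at any time by re-reading the log from the start.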
+
+## Decision
+
+Flowgraph's reactive execution layer will be built on an **Execution Event Log** — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.
+
+### Core Concept: Event Log + Projections
+
+```
+┌─────────────────────────────────────────────┐
+│           Execution Event Log               │
+│  (append-only sequence of call protocol     │
+│   events — the ground truth)                │
+└──────────────────┬──────────────────────────┘
+                   │
+     ┌─────────────┼──────────────┐
+     │             │              │
+     ▼             ▼              ▼
+┌─────────┐  ┌──────────┐  ┌──────────┐
+│ Status  │  │ Result   │  │ Call     │
+│ Proj.   │  │ Proj.    │  │ Graph    │
+│         │  │          │  │ Proj.    │
+│ nodeId: │  │ nodeId:  │  │          │
+│ status  │  │ output   │  │ nodes +  │
+│         │  │          │  │ edges    │
+└────┬────┘  └────┬─────┘  └──────────┘
+     │             │
+     ▼             ▼
+┌───────────────────────────────────────────┐
+│        Reactive Execution Layer            │
+│                                             │
+│  preconditions → "does the log show        │
+│                    all predecessors          │
+│                    completed?"              │
+│                                             │
+│  result resolution → "does the log          │
+│                       have A's output?"     │
+│                                             │
+│  Conditional.test → reads from result proj.  │
+│  Map.over         → reads from result proj.  │
+└───────────────────────────────────────────┘
+```
+
+### Notification vs. State Transfer
+
+The event log naturally distinguishes two patterns that the current architecture conflates:
+
+| Pattern | What the edge means | What downstream needs | Event type |
+|---------|--------------------|-----------------------|------------|
+| **Temporal (task ordering)** | "A must complete before B starts" | Just the notification that A completed | Notification |
+| **Data flow** | "A's output is B's input" | A's actual output data | State Transfer |
+
+The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.
+
+The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — `process-item` needs `fetch-items`'s output. It needs the state.
+
+Both patterns derive from the same event log. Different projections serve different needs.
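+
+The two patterns can be sketched as two read functions over one log. This is an illustrative sketch under stated assumptions: `FlowEvent`, `hasCompleted`, and `outputOf` are hypothetical names, events are keyed by `requestId` as in the call protocol, and only the two event kinds needed for the example are modelled.
+
+```typescript
+// Hypothetical, reduced event shapes; only the fields used below are modelled.
+type FlowEvent =
+  | { type: "call.responded"; requestId: string; output: unknown }
+  | { type: "call.completed"; requestId: string };
+
+// Notification: the consumer only asks whether something happened.
+function hasCompleted(log: readonly FlowEvent[], requestId: string): boolean {
+  return log.some((e) => e.type === "call.completed" && e.requestId === requestId);
+}
+
+// State transfer: the consumer needs the data itself.
+// Returns the most recently recorded output, or undefined if there is none.
+function outputOf(log: readonly FlowEvent[], requestId: string): unknown {
+  let output: unknown = undefined;
+  for (const e of log) {
+    if (e.type === "call.responded" && e.requestId === requestId) {
+      output = e.output;
+    }
+  }
+  return output;
+}
+```
+
+In this sketch a temporal consumer such as the reviewer would call only `hasCompleted`, while a data-flow consumer such as `process-item` would call `outputOf`. Returning `undefined` for "no output yet" cannot be told apart from an output that is itself `undefined`, which is one reason the result projection needs a more explicit interface.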
+
+### Retry Semantics
+
+Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:
+
+```
+call.requested(A, attempt=1)  → fact: A was requested
+call.error(A, "timeout")      → fact: A failed
+call.requested(A, attempt=2)  → fact: A was retried
+call.responded(A, output)      → fact: A succeeded on retry
+```
+
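+The status projection derives each node's current state from the most recent event recorded for that node, so a retry needs no in-place state mutation: the state machine becomes a fold over the event log. (The `attempt` counter above is illustrative; the call protocol payloads listed in the Context section carry no such field.)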
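+
+As a minimal sketch of the status projection as a fold, the example below replays the retry sequence shown above. The status names other than `failed`, the `nodeId` and `attempt` fields, and the names `RetryEvent`, `statusAfter`, and `projectStatus` are assumptions for illustration. Only the three event kinds used in the retry example are modelled; `call.aborted` and `call.completed` are omitted.
+
+```typescript
+// Hypothetical status names; the source only names `failed` explicitly.
+type NodeStatus = "running" | "completed" | "failed";
+
+// Hypothetical event shapes keyed by node, matching the retry example above.
+type RetryEvent =
+  | { type: "call.requested"; nodeId: string; attempt: number }
+  | { type: "call.responded"; nodeId: string; output: unknown }
+  | { type: "call.error"; nodeId: string; error: string };
+
+// The status implied by a single event, taken on its own.
+function statusAfter(event: RetryEvent): NodeStatus {
+  switch (event.type) {
+    case "call.requested":
+      return "running";
+    case "call.responded":
+      return "completed";
+    case "call.error":
+      return "failed";
+  }
+}
+
+// Fold over the log: the latest event for each node determines its status.
+function projectStatus(log: readonly RetryEvent[]): Map<string, NodeStatus> {
+  return log.reduce(
+    (acc, event) => acc.set(event.nodeId, statusAfter(event)),
+    new Map<string, NodeStatus>(),
+  );
+}
+
+const retryLog: RetryEvent[] = [
+  { type: "call.requested", nodeId: "A", attempt: 1 },
+  { type: "call.error", nodeId: "A", error: "timeout" },
+  { type: "call.requested", nodeId: "A", attempt: 2 },
+  { type: "call.responded", nodeId: "A", output: "ok" },
+];
+```
+
+Folding only the first two events of `retryLog` yields `failed` for `A`; folding all four yields `completed`. The earlier failure stays in the log as a fact while the projected status moves on.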
+
+### Type Compatibility
+
+Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.
+
+This also reframes OQ-01: type mismatches can only occur on state-transfer edges, so incompatible edges exist only where data actually flows. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.
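+
+A rough sketch of what "check types only where data flows" could look like is shown below. It is hypothetical throughout: the current design has no `kind` attribute on edges (whether to add one is an open question), and comparing type names for equality is only a stand-in for whatever compatibility check is eventually chosen.
+
+```typescript
+// Hypothetical edge model: `kind` is not part of the current design.
+type Edge = { from: string; to: string; kind: "temporal" | "data" };
+type TypeName = string;
+
+// Returns the data-flow edges whose upstream output type does not match the
+// downstream input type. Temporal-only edges are skipped entirely.
+function findTypeMismatches(
+  edges: readonly Edge[],
+  outputType: Record<string, TypeName>,
+  inputType: Record<string, TypeName>,
+): Edge[] {
+  return edges.filter(
+    (e) => e.kind === "data" && outputType[e.from] !== inputType[e.to],
+  );
+}
+```
+
+With this shape, an architect → reviewer edge marked `temporal` is never reported, even if the two operations have unrelated types.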
+
+## Rationale
+
+1. **The call protocol already IS the event log.** Every call event (`call.requested`, `call.responded`, `call.error`, `call.aborted`, `call.completed`) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.
+
+2. **Projections separate concerns.** The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."
+
+3. **Notification and state transfer are different.** The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.
+
+4. **Retries are natural.** An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a `retried` status or breaking the terminal-state invariant.
+
+5. **Data dependencies don't need separate edges.** If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate `depends_on` edge type (OQ-08) becomes unnecessary because the event log is the data transport.
+
+6. **Category theory alignment.** Event logs form a monoid under concatenation, with the empty log as the identity. Projections are functors from the log monoid to status/result/graph monoids. Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.
+
+## Consequences
+
+### Positive
+
+- **OQ-06 resolved**: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
+- **OQ-07 resolved**: The call graph and reactive engine are both projections of the event log. Neither owns the other.
+- **OQ-08 resolved**: `depends_on` edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
+- **OQ-09 resolved**: Retries are natural — append new events rather than mutating state.
+- **OQ-10 reframed**: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
+- **OQ-02 reframed**: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
+- **OQ-01 reframed**: Incompatible edges can only exist where there is data flow, so the operation graph never needs to represent them for temporal-only edges.
+
+### Negative
+
+- **The reactive layer needs a redesign.** The current `WorkflowReactiveRoot` directly creates `signal<NodeStatus>` instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
+- **The event log must be persisted if workflow state must survive restarts.** Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
+- **Event replay must be idempotent.** Processing the same event twice must produce the same projected state as processing it once. The call graph projection already behaves this way (`updateFromEvent` is documented as idempotent in call-graph.md); the status and result projections must offer the same guarantee.
+- **The result projection needs a clear interface.** `getResult(nodeId)` must be defined — what it returns, when it's available, and how it interacts with `Conditional.test` and `Map.over` closures that may reference results from nodes that haven't completed yet.
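+
+One possible shape for the result projection, covering both the idempotency requirement and the "not completed yet" case, is sketched below. Nothing here is a settled interface: `ResultProjection`, `NodeResult`, the `nodeId` field, and the `pending`/`available` states are assumptions made for illustration, and how a retry should affect an earlier result is left open.
+
+```typescript
+// Hypothetical, reduced event shapes keyed by node.
+type ResultEvent =
+  | { type: "call.requested"; nodeId: string }
+  | { type: "call.responded"; nodeId: string; output: unknown };
+
+// An explicit result state avoids confusing "no output yet" with an
+// output whose value happens to be undefined.
+type NodeResult =
+  | { state: "pending" }
+  | { state: "available"; output: unknown };
+
+class ResultProjection {
+  private readonly outputs = new Map<string, unknown>();
+
+  // Applying the same event twice leaves the map unchanged, so replaying
+  // a duplicated event is harmless.
+  apply(event: ResultEvent): void {
+    if (event.type === "call.responded") {
+      this.outputs.set(event.nodeId, event.output);
+    }
+  }
+
+  getResult(nodeId: string): NodeResult {
+    return this.outputs.has(nodeId)
+      ? { state: "available", output: this.outputs.get(nodeId) }
+      : { state: "pending" };
+  }
+}
+```
+
+Under this sketch, a `Conditional.test` or `Map.over` closure that references a node with no recorded output would receive `{ state: "pending" }` and would have to decide explicitly what to do, rather than silently reading `undefined`.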
+
+### Open
+
+- **Should the event log be its own exported type, or is it the call protocol event stream by another name?** The call protocol already defines the events. The event log might just be `CallEventMapValue[]` with an append-only contract and projection functions.
+- **How does the event log interact with the ujsx template lifecycle?** When a template is rendered to a reactive root, the log starts empty and populates as events arrive. But if the template is re-rendered (when the reconciler supports it), what happens to the log? Is it reset, or does it persist across re-renders?
+- **Should temporal-only edges be explicitly marked?** Currently `sequential` edges always express temporal ordering, while data flow is expressed only implicitly, by `Conditional.test` and `Map.over` reading from the result projection. Should edges carry an attribute that explicitly marks them as notification vs. state transfer? This would make type checking more precise (only check types on state-transfer edges).
+
+## References
+
+- Open questions tracker: [open-questions.md](open-questions.md)
+- Reactive execution: [reactive-execution.md](reactive-execution.md)
+- Call graph: [call-graph.md](call-graph.md)
+- Call protocol events: `@alkdev/operations/src/call.ts`
+- Event sourcing research: `/workspace/research/event_sourcing/event_source_types.md`
+- Category theory graph research: `/workspace/@alkdev/ujsx/docs/research/category-theory-graph.md`
+- Compute graph DAG: `/workspace/compute_graph/packages/dag/`