flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md
glm-5.1 c76be7f689 ADR-005 accepted: resolve all open consequences, update cascading docs
Resolve the three open consequences from ADR-005 (Event Log as Single
Source of Truth) and transition from Proposed to Accepted:

1. Event log IS the call protocol event stream — not a separate type,
   but an EventLogProjection interface (append/getStatus/getResult/
   getEvents) over CallEventMapValue[] with an append-only contract.

2. Event log persists across template re-renders — projections recompute
   against the new DAG; orphaned events stay in log for audit but don't
   affect active projections.

3. Edges get dataFlow: boolean attribute on TemplateEdgeAttrs — inferred
   (not manual) by GraphologyHostConfig from template expressions.
   typeCompat() only runs on dataFlow: true edges. Inference rules are
   precisely specified for Conditional.test, Map.over, and Operation.input.

Also resolve OQ-05 (structural containers stay transparent; aggregate
status is a projection from children) and OQ-10 (running node failure
is a FailurePolicy configuration, default continues-running).

Cascading updates to:
- reactive-execution.md: add hybrid status model (event-log-driven vs
  projection-driven vs signal-mutation), EventLogProjection interface,
  result projection respecting retries, FailurePolicy type
- host-configs.md: ReactiveContext now includes resultProjection and
  computed results; resolved Q1/Q3/Q4
- schema.md: dataFlow attribute on TemplateEdgeAttrs with inference
  rules and type checking implications
- workflow-templates.md: edge creation rules with dataFlow, result
  projection in Conditional/Map, resolved Q1/Q4
- open-questions.md: all ADR-005 questions marked resolved, updated
  summary table and cross-cutting themes, removed duplicate OQ-07

7 files changed, 464 insertions, 139 deletions
2026-05-21 07:44:28 +00:00


ADR-005: Event Log as Single Source of Truth

Status

Accepted

Context

Flowgraph's reactive execution layer currently uses signal-based state propagation (signal<NodeStatus> and computed<boolean> for preconditions). Call graph nodes are populated from call protocol events. The two systems — reactive status tracking and call graph construction — are separate concepts that happen to process the same events.

Several open questions in the architecture reveal a common underlying problem:

  1. OQ-06: How does the template system bridge to the call protocol? The reactive engine needs to know when a call completes and what its output was, but the current design has no formal mechanism for this — Conditional.test receives a results map from an ad-hoc closure.
  2. OQ-07: Should the reactive engine own the call graph? They're both derived from the same call protocol events, but they're described as separate concepts.
  3. OQ-08: Should depends_on edges be auto-populated from templates? This conflates temporal ordering ("B starts after A completes") with data flow ("B needs A's output").
  4. OQ-09: How are retries handled? The current state machine marks failed as terminal, requiring awkward workarounds for retry.
  5. OQ-10: What happens to running nodes when a predecessor fails? The current design uses signal mutations without a clear policy mechanism.
  6. OQ-02: How deep should type compatibility checking go? This conflates edges that carry data (where types matter) with edges that only express ordering (where types are irrelevant).

These questions share a common root: the architecture conflates notification (something happened) with state transfer (here's the data). The event sourcing discipline calls this a "spaghetti concept" — using the same mechanism for semantically different purposes.

Meanwhile, the call protocol already defines a sequence of append-only facts:

call.requested  → { requestId, operationId, input, parentRequestId, timestamp }
call.responded  → { requestId, output, timestamp }
call.error      → { requestId, error, timestamp }
call.aborted    → { requestId, timestamp }
call.completed  → { requestId, timestamp }

These events are the ground truth. The call graph, the reactive status map, and the result map are all projections of this event sequence.
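
As a rough illustration of "one sequence, several projections", the sketch below models the five events as a discriminated union and derives three small views from one append-only array. The field types (for example `timestamp: number`) and the helper names are assumptions made for the example; the authoritative definitions live in @alkdev/operations/src/call.ts.

```typescript
// Hypothetical event shapes mirroring the listing above; field types are guesses.
type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string; input: unknown; parentRequestId?: string; timestamp: number }
  | { type: "call.responded"; requestId: string; output: unknown; timestamp: number }
  | { type: "call.error"; requestId: string; error: string; timestamp: number }
  | { type: "call.aborted"; requestId: string; timestamp: number }
  | { type: "call.completed"; requestId: string; timestamp: number };

// Append-only: the only write operation is push; entries are never edited.
const eventLog: CallEvent[] = [];
function appendEvent(event: CallEvent): void {
  eventLog.push(event);
}

appendEvent({ type: "call.requested", requestId: "r1", operationId: "fetch-items", input: {}, timestamp: 1 });
appendEvent({ type: "call.responded", requestId: "r1", output: { items: [1, 2] }, timestamp: 2 });
appendEvent({ type: "call.completed", requestId: "r1", timestamp: 3 });

// Three read-only views over the same sequence (all deliberately simplistic).
const callGraphNodeIds = Array.from(new Set(eventLog.map((e) => e.requestId)));
const r1Completed = eventLog.some((e) => e.type === "call.completed" && e.requestId === "r1");
const r1Response = eventLog.find((e) => e.type === "call.responded" && e.requestId === "r1");
const r1Output =
  r1Response !== undefined && r1Response.type === "call.responded" ? r1Response.output : undefined;
```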

Decision

Flowgraph's reactive execution layer will be built on an Execution Event Log — an append-only sequence of call protocol events that serves as the single source of truth. The call graph, reactive status signals, and result map are all projections derived from this log.

Core Concept: Event Log + Projections

┌─────────────────────────────────────────────┐
│           Execution Event Log               │
│  (append-only sequence of call protocol     │
│   events — the ground truth)                │
└──────────────────┬──────────────────────────┘
                   │
     ┌─────────────┼──────────────┐
     │             │              │
     ▼             ▼              ▼
┌─────────┐  ┌──────────┐  ┌──────────┐
│ Status  │  │ Result   │  │ Call     │
│ Proj.   │  │ Proj.    │  │ Graph    │
│         │  │          │  │ Proj.    │
│ nodeId: │  │ nodeId:  │  │          │
│ status  │  │ output   │  │ nodes +  │
│         │  │          │  │ edges    │
└────┬────┘  └────┬─────┘  └──────────┘
     │             │
     ▼             ▼
┌─────────────────────────────────────────────┐
│        Reactive Execution Layer             │
│                                             │
│  preconditions → "does the log show         │
│                    all predecessors         │
│                    completed?"              │
│                                             │
│  result resolution → "does the log          │
│                       have A's output?"     │
│                                             │
│  Conditional.test → reads from result proj. │
│  Map.over         → reads from result proj. │
└─────────────────────────────────────────────┘

Notification vs. State Transfer

The event log naturally distinguishes two patterns that the current architecture conflates:

| Pattern | What the edge means | What downstream needs | Event type |
| --- | --- | --- | --- |
| Temporal (task ordering) | "A must complete before B starts" | Just the notification that A completed | Notification |
| Data flow | "A's output is B's input" | A's actual output data | State Transfer |

The SDD pipeline (architect → reviewer → decompressor) is temporal ordering — the reviewer starts because the architect finished, but it reads files from disk, not from the architect's output. It only needs the notification.

The data-flow pipeline (fetch-items → Map(process-item) → aggregate) is state transfer — process-item needs fetch-items's output. It needs the state.

Both patterns derive from the same event log. Different projections serve different needs.
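
A minimal sketch of the two consumption patterns, assuming a simplified event shape keyed by node name rather than by requestId. The functions `predecessorsCompleted` and `outputOf` are hypothetical names: the first only asks whether something happened, the second reads the data.

```typescript
// Hypothetical simplified events, keyed by node name for readability.
type LogEvent =
  | { type: "call.responded"; nodeId: string; output: unknown }
  | { type: "call.completed"; nodeId: string };

// Notification: only asks whether every predecessor has completed.
function predecessorsCompleted(log: readonly LogEvent[], predecessors: readonly string[]): boolean {
  return predecessors.every((p) =>
    log.some((e) => e.type === "call.completed" && e.nodeId === p),
  );
}

// State transfer: reads the most recent output recorded for a node.
function outputOf(log: readonly LogEvent[], nodeId: string): unknown {
  let output: unknown = undefined;
  for (const e of log) {
    if (e.type === "call.responded" && e.nodeId === nodeId) output = e.output;
  }
  return output;
}

const log: LogEvent[] = [
  { type: "call.responded", nodeId: "architect", output: "not read by the reviewer" },
  { type: "call.completed", nodeId: "architect" },
  { type: "call.responded", nodeId: "fetch-items", output: { items: ["a", "b"] } },
  { type: "call.completed", nodeId: "fetch-items" },
];

// SDD pipeline: the reviewer needs only the notification.
const reviewerCanStart = predecessorsCompleted(log, ["architect"]);
// Data-flow pipeline: Map(process-item) needs the actual items.
const itemsForMap = outputOf(log, "fetch-items") as { items: string[] };
```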

Retry Semantics

Retries become natural with an append-only log. A retry is not a state mutation — it's a new sequence of events appended to the log:

call.requested(A, attempt=1)  → fact: A was requested
call.error(A, "timeout")      → fact: A failed
call.requested(A, attempt=2)  → fact: A was retried
call.responded(A, output)      → fact: A succeeded on retry

The status projection derives the current state by scanning for the most recent event per node. No state machine mutation needed. The state machine becomes a fold over the event log.
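
The fold can be sketched as a reducer over one node's events. The status names other than idle, running and failed, the node-keyed event shape, and the choice to treat call.completed as a marker that leaves the status unchanged are all assumptions made for this example.

```typescript
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";

// Hypothetical simplified event, keyed by node name.
interface NodeEvent {
  type: "call.requested" | "call.responded" | "call.error" | "call.aborted" | "call.completed";
  nodeId: string;
}

// One fold step: previous status plus one event gives the next status.
function applyEvent(status: NodeStatus, event: NodeEvent): NodeStatus {
  switch (event.type) {
    case "call.requested":
      return "running"; // also covers a retry after a failure
    case "call.responded":
      return "completed";
    case "call.error":
      return "failed";
    case "call.aborted":
      return "aborted";
    case "call.completed":
      return status; // assumption: end-of-call marker, outcome already recorded
  }
}

// The status projection is a left fold starting from "idle".
function statusOf(log: readonly NodeEvent[], nodeId: string): NodeStatus {
  return log.filter((e) => e.nodeId === nodeId).reduce(applyEvent, "idle" as NodeStatus);
}

const retryLog: NodeEvent[] = [
  { type: "call.requested", nodeId: "A" }, // attempt 1
  { type: "call.error", nodeId: "A" },
  { type: "call.requested", nodeId: "A" }, // attempt 2
  { type: "call.responded", nodeId: "A" },
];

const afterFailure = statusOf(retryLog.slice(0, 2), "A");
const duringRetry = statusOf(retryLog.slice(0, 3), "A");
const afterRetry = statusOf(retryLog, "A");
```

Because failed is only ever the result of folding a prefix of the log, appending the second call.requested moves the node back to running without mutating any earlier fact.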

Type Compatibility

Type compatibility checking (OQ-02) only applies to edges that carry state transfer — where the downstream node actually reads the upstream node's output. Temporal-only edges don't need type checking because there's no data flowing between them.

This resolves OQ-01: incompatible edges (type mismatches) only exist on state-transfer edges. For temporal-only edges, type compatibility is irrelevant. The distinction emerges naturally from the notification/state-transfer separation.

Rationale

  1. The call protocol already IS the event log. Every call event (call.requested, call.responded, call.error, call.aborted, call.completed) is an append-only fact. We've been treating these as separate from the reactive layer instead of recognizing that they're the same sequence of events projected differently.

  2. Projections separate concerns. The status projection, result projection, and call graph projection all derive from the same log but serve different consumers. This eliminates the question of "who owns the call graph" (OQ-07) — it's a projection, not something the reactive engine "owns."

  3. Notification and state transfer are different. The event sourcing discipline makes this explicit. Conflating them leads to the "boomerang callback" anti-pattern (OQ-06) — if you send a thin notification but the consumer needs the data, they call back synchronously. The event log carries both notification and state transfer; different consumers read different projections.

  4. Retries are natural. An append-only log makes retries a sequence of facts, not a state mutation hack. This resolves OQ-09 without adding a retried status or breaking the terminal-state invariant.

  5. Data dependencies don't need separate edges. If B needs A's output, B reads from the result projection. The temporal ordering is already expressed by template edges. A separate depends_on edge type (OQ-08) becomes unnecessary because the event log is the data transport.

  6. Category theory alignment. The event log is a monoid under concatenation, with the empty log as identity. Projections are functors from the log monoid to status/result/graph monoids (viewing each monoid as a one-object category, such a functor is a monoid homomorphism). Composition of morphisms (A → B → C) follows from the composition of events in the log. This is the same structure as the category theory research prototype, but applied to workflow orchestration rather than generic morphism composition.

Consequences

Positive

  • OQ-06 resolved: The reactive layer bridges to the call protocol through the event log. The hub appends call protocol events; the reactive layer projects them. No callback, no boomerang.
  • OQ-07 resolved: The call graph and reactive engine are both projections of the event log. Neither owns the other.
  • OQ-08 resolved: depends_on edges are unnecessary. Data dependencies are expressed through the result projection, not through edge attributes.
  • OQ-09 resolved: Retries are natural — append new events rather than mutating state.
  • OQ-10 reframed: When a predecessor fails, the event log records the failure. Policy (abort running nodes? let them continue?) is a configuration of the projection, not a hard-coded state machine rule.
  • OQ-02 reframed: Type compatibility checking only applies to edges where state transfer occurs, not to temporal-only edges.
  • OQ-01 reframed: Incompatible edges only exist where there's data flow. Temporal-only edges don't need them in the operation graph.
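
To make the OQ-10 point concrete, here is one possible shape for a failure policy applied on top of projected statuses. The type name FailurePolicy and the continue-by-default behavior come from the accepted resolution; the literal values and the `nodesToAbort` helper are hypothetical.

```typescript
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";

// Hypothetical policy values; only the continue-by-default behavior is from the decision.
type FailurePolicy = "continue-running" | "abort-running";

// Given projected statuses, decide which running nodes the policy would abort.
function nodesToAbort(
  statuses: Record<string, NodeStatus>,
  policy: FailurePolicy = "continue-running",
): string[] {
  const ids = Object.keys(statuses);
  const anyFailed = ids.some((id) => statuses[id] === "failed");
  if (!anyFailed || policy === "continue-running") return [];
  return ids.filter((id) => statuses[id] === "running");
}

const projected: Record<string, NodeStatus> = { a: "failed", b: "running", c: "idle" };
const defaultDecision = nodesToAbort(projected);
const strictDecision = nodesToAbort(projected, "abort-running");
```

The log itself is the same under both policies; only the decision derived from it differs.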

Negative

  • The reactive layer needs a redesign. The current WorkflowReactiveRoot directly creates signal<NodeStatus> instances and expects the hub coordinator to set them. The event log approach replaces direct signal mutation with event appends that project into signal updates. This is a non-trivial refactoring of the reactive-execution.md spec.
  • The event log must be persisted if workflow state must survive restarts. Since flowgraph is in-memory only (ADR-003), the event log lives in memory. Persistence is the consumer's concern — the hub can persist the call protocol events in Postgres and replay them to reconstruct the reactive state after a restart.
  • Event replay must be idempotent. Processing the same event twice must produce the same projected state. This is already a property of the call protocol events (updateFromEvent is documented as idempotent in call-graph.md).
  • The result projection needs a clear interface. getResult(nodeId) must be defined — what it returns, when it's available, and how it interacts with Conditional.test and Map.over closures that may reference results from nodes that haven't completed yet.
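
One way to get idempotent replay is to deduplicate on append. The sketch below assumes that requestId, event type and timestamp together identify an event; that key is an illustration, not something the call protocol specifies.

```typescript
interface CallEvent {
  type: string;
  requestId: string;
  timestamp: number;
}

class DedupingLog {
  private readonly seen = new Set<string>();
  private readonly events: CallEvent[] = [];

  // Returns false when the event was already recorded, leaving the log unchanged.
  append(event: CallEvent): boolean {
    // Assumed identity key; the real protocol may offer a better one.
    const key = `${event.requestId}|${event.type}|${event.timestamp}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.events.push(event);
    return true;
  }

  size(): number {
    return this.events.length;
  }
}

// Events as they might be read back from the hub's store after a restart.
const persisted: CallEvent[] = [
  { type: "call.requested", requestId: "r1", timestamp: 1 },
  { type: "call.responded", requestId: "r1", timestamp: 2 },
];

const replayed = new DedupingLog();
persisted.forEach((e) => replayed.append(e));
persisted.forEach((e) => replayed.append(e)); // replaying again changes nothing
```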

Resolved: Event log is the call protocol event stream

The event log is NOT a separate type. It IS the call protocol event stream with an append-only contract and projection functions. The call protocol events (CallEventMapValue[]) already carry everything needed:

  • requestId — identifies which invocation
  • operationId — identifies which operation
  • input/output — the payload data (for state transfer edges)
  • parentRequestId — the causation link
  • timestamp — when it happened

What flowgraph provides is not a new event type, but a consumption contract:

interface EventLogProjection {
  /** Append an event. Events are processed idempotently. */
  append(event: CallEventMapValue): void;
  /** Current status of a node, derived from the most recent event. */
  getStatus(nodeId: string): NodeStatus;
  /** Result of a completed node, derived from call.responded events. */
  getResult(nodeId: string): CallResult | undefined;
  /** All events for a node, in order. */
  getEvents(nodeId: string): CallEventMapValue[];
}

The EventLogProjection interface makes the append-only discipline explicit and provides typed access to projections. Implementations wrap CallEventMapValue[] and derive state on demand (or with memoization). This avoids creating a parallel type system — the event types, their structure, and their semantics remain in @alkdev/operations/src/call.ts.
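
A minimal in-memory implementation might look like the following. The stand-in types for CallEventMapValue, CallResult and NodeStatus, the requestId-to-node map passed to the constructor, and the assumption that each retry attempt carries its own requestId are all illustrative.

```typescript
type NodeStatus = "idle" | "running" | "completed" | "failed" | "aborted";
interface CallResult { output: unknown }

// Hypothetical stand-in for the real CallEventMapValue union.
type CallEventMapValue =
  | { type: "call.requested"; requestId: string; timestamp: number }
  | { type: "call.responded"; requestId: string; output: unknown; timestamp: number }
  | { type: "call.error"; requestId: string; error: string; timestamp: number }
  | { type: "call.aborted"; requestId: string; timestamp: number }
  | { type: "call.completed"; requestId: string; timestamp: number };

interface EventLogProjection {
  append(event: CallEventMapValue): void;
  getStatus(nodeId: string): NodeStatus;
  getResult(nodeId: string): CallResult | undefined;
  getEvents(nodeId: string): CallEventMapValue[];
}

class InMemoryProjection implements EventLogProjection {
  private readonly log: CallEventMapValue[] = [];
  private readonly seen = new Set<string>();
  private readonly requestToNode: Map<string, string>;

  constructor(requestToNode: Map<string, string>) {
    this.requestToNode = requestToNode;
  }

  append(event: CallEventMapValue): void {
    const key = `${event.requestId}|${event.type}|${event.timestamp}`;
    if (this.seen.has(key)) return; // duplicate delivery is a no-op
    this.seen.add(key);
    this.log.push(event);
  }

  getEvents(nodeId: string): CallEventMapValue[] {
    return this.log.filter((e) => this.requestToNode.get(e.requestId) === nodeId);
  }

  getStatus(nodeId: string): NodeStatus {
    let status: NodeStatus = "idle";
    for (const e of this.getEvents(nodeId)) {
      if (e.type === "call.requested") status = "running";
      else if (e.type === "call.responded") status = "completed";
      else if (e.type === "call.error") status = "failed";
      else if (e.type === "call.aborted") status = "aborted";
    }
    return status;
  }

  getResult(nodeId: string): CallResult | undefined {
    let result: CallResult | undefined = undefined;
    for (const e of this.getEvents(nodeId)) {
      if (e.type === "call.requested") result = undefined; // a new attempt invalidates older output
      else if (e.type === "call.responded") result = { output: e.output };
    }
    return result;
  }
}

// Node "A" fails once (r1) and succeeds on retry (r2).
const projection = new InMemoryProjection(new Map([["r1", "A"], ["r2", "A"]]));
projection.append({ type: "call.requested", requestId: "r1", timestamp: 1 });
projection.append({ type: "call.error", requestId: "r1", error: "timeout", timestamp: 2 });
projection.append({ type: "call.requested", requestId: "r2", timestamp: 3 });
projection.append({ type: "call.responded", requestId: "r2", output: 42, timestamp: 4 });
projection.append({ type: "call.responded", requestId: "r2", output: 42, timestamp: 4 }); // duplicate
```

This version derives state on demand by rescanning; a memoized variant would cache per-node results and invalidate them in append.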

Resolved: Event log persists across re-renders; projections recompute

When a template is re-rendered (when the ujsx reconciler supports it), the event log persists. Events are append-only facts — they record what happened, and what happened doesn't change when the template structure changes.

Projections are recomputed by scanning the log against the new DAG:

  1. Events for nodes still in the DAG map naturally to their projections.
  2. Events for nodes removed from the DAG become orphaned events — they remain in the log (for audit/history) but don't affect active projections.
  3. New nodes added to the DAG have no events yet — their status is idle and their result is undefined.

This means re-rendering doesn't lose history. The event log is the durable record; projections are ephemeral views that can always be reconstructed.

For v1 (before the reconciler exists), the event log starts at template mount and is disposed when the WorkflowReactiveRoot is disposed. The re-render scenario is an architectural commitment for when the reconciler arrives, not something to implement now.

Orphaned events specification

When a template is re-rendered and nodes are removed from the DAG, their events become orphaned. The projection layer handles this as follows:

  1. The EventLogProjection receives the current DAG structure (the set of active node keys) alongside the event log. Methods like getStatus(nodeId) first check whether nodeId is in the active DAG. If not, the node is orphaned.

  2. Orphaned nodes return undefined from getResult(). A downstream node referencing an orphaned predecessor via Conditional.test or Map.over will see undefined, causing the test to evaluate as if the predecessor didn't complete. This is the correct behavior — a removed node can't provide data.

  3. Orphaned events remain in the log for audit and history. getEvents(nodeId) on an orphaned node returns its events (if any). The overall event log is still queryable for debugging.

  4. The nodeKeyToRequestId map is rebuilt on re-render. New nodes get fresh requestId values. Old mappings are discarded, along with their associated signal subscriptions (the WorkflowReactiveRoot.dispose() call before re-render handles this).
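
The recompute-against-the-new-DAG behavior can be sketched as a view built from the unchanged log plus the current set of active node keys. The node-keyed event shape and the `projectAgainstDag` helper are hypothetical, and returning "idle" from getStatus for an orphaned node is an assumption, since the specification above only fixes the behavior of getResult and getEvents.

```typescript
type NodeStatus = "idle" | "running" | "completed" | "failed";

// Hypothetical simplified event, keyed by node name.
interface NodeEvent {
  nodeId: string;
  type: "call.requested" | "call.responded" | "call.error";
  output?: unknown;
}

function projectAgainstDag(log: readonly NodeEvent[], activeNodes: ReadonlySet<string>) {
  const eventsFor = (nodeId: string): NodeEvent[] => log.filter((e) => e.nodeId === nodeId);
  return {
    // History stays queryable whether or not the node is still in the DAG.
    getEvents: (nodeId: string): NodeEvent[] => eventsFor(nodeId),
    getResult: (nodeId: string): unknown => {
      if (!activeNodes.has(nodeId)) return undefined; // orphaned: cannot provide data
      let output: unknown = undefined;
      for (const e of eventsFor(nodeId)) {
        if (e.type === "call.responded") output = e.output;
      }
      return output;
    },
    getStatus: (nodeId: string): NodeStatus => {
      if (!activeNodes.has(nodeId)) return "idle"; // assumption, see lead-in
      let status: NodeStatus = "idle";
      for (const e of eventsFor(nodeId)) {
        status = e.type === "call.requested" ? "running"
          : e.type === "call.responded" ? "completed"
          : "failed";
      }
      return status;
    },
  };
}

const log: NodeEvent[] = [
  { nodeId: "A", type: "call.requested" },
  { nodeId: "A", type: "call.responded", output: "a-out" },
  { nodeId: "B", type: "call.requested" },
  { nodeId: "B", type: "call.responded", output: "b-out" },
];

const beforeRerender = projectAgainstDag(log, new Set(["A", "B"]));
const afterRerender = projectAgainstDag(log, new Set(["A", "C"])); // B removed, C added
```

The log array is identical in both calls; only the active set changes, which is what makes the projections disposable.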

Resolved: Edges are marked with dataFlow attribute

Template edges get a dataFlow: boolean attribute that distinguishes temporal edges from state-transfer edges:

| dataFlow value | Meaning | Type checking needed? |
| --- | --- | --- |
| false (default) | Temporal ordering only: downstream starts after upstream completes but doesn't read upstream's output | No: no data flows between nodes |
| true | State transfer: downstream reads upstream's output via Conditional.test, Map.over, or Operation.input | Yes: typeCompat() checks output→input compatibility |

This attribute is inferred, not manual. The GraphologyHostConfig detects dataFlow from template expressions during rendering:

  • A Sequential edge where the downstream node references results["upstreamNode"] in Conditional.test, Map.over, or Operation.input gets dataFlow: true
  • A Sequential edge where no such reference exists gets dataFlow: false (the default)
  • A Conditional edge always gets dataFlow: true (the condition always reads a predecessor's result)
  • Parallel edges don't exist (parallel children have no inter-sibling edges)
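
The per-edge-type rules above reduce to a small decision function. This sketch assumes the set of referenced node keys has already been extracted from the downstream node's props, and it handles only a direct reference to the upstream node; `edgeAttrs` is a hypothetical helper name.

```typescript
type EdgeType = "sequential" | "conditional";

interface EdgeAttrs {
  edgeType: EdgeType;
  dataFlow: boolean;
}

// downstreamRefs: node keys the downstream node reads via results["..."].
function edgeAttrs(
  edgeType: EdgeType,
  upstreamKey: string,
  downstreamRefs: readonly string[],
): EdgeAttrs {
  // A Conditional edge always reads a predecessor's result.
  if (edgeType === "conditional") return { edgeType, dataFlow: true };
  // A Sequential edge carries data only if the downstream node references the upstream one.
  return { edgeType, dataFlow: downstreamRefs.indexOf(upstreamKey) !== -1 };
}

// SDD pipeline: the reviewer reads files from disk, not the architect's output.
const architectToReviewer = edgeAttrs("sequential", "architect", []);
// Data-flow pipeline: process-item reads fetch-items's output.
const fetchToProcess = edgeAttrs("sequential", "fetch-items", ["fetch-items"]);
const conditionalEdge = edgeAttrs("conditional", "check", []);
```

Parallel children need no case here because no edge is ever created between siblings.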

dataFlow inference specification

The inference algorithm operates at the template AST level during GraphologyHostConfig.createInstance / appendChild, not at runtime. It inspects template component props to detect references to predecessor results:

Detectable references (set dataFlow: true on the edge from the referenced node to the referencing node):

| Expression | Detection method |
| --- | --- |
| Conditional.test = (results) => results["X"] | Static analysis of the function body for results[...] property accesses |
| Conditional.test = "X" (string form) | String comparison: the referenced operation name |
| Map.over = (results) => results["X"].output.items | Static analysis of the function body for results[...] property accesses |
| Map.over = itemsSignal (signal form) | No dataFlow: true; the array comes from a signal, not a predecessor result |
| Operation.input = (results) => results["X"].output | Static analysis of the function body for results[...] property accesses |
| Operation.input = staticValue | No dataFlow: true; the input doesn't depend on a predecessor result |

Inference rules:

  1. Direct predecessor edges only: dataFlow: true is set only on edges that exist in the DAG. In a Sequential chain A → B → C, if C references results["A"], the edge B → C gets dataFlow: true (since A is a predecessor of C via the chain), but no new edge A → C is created. Data flows transitively through the chain — B must complete before C starts, and C reads A's result from the result projection.

  2. Map component edges: A Map component's predecessor-to-first-mapped-child edge gets dataFlow: true if Map.over references a predecessor result. Each mapped child's edge from the Map's predecessor gets dataFlow: true because the array data comes from a predecessor's output.

  3. Ambiguous references: If Operation.input is a function that cannot be statically analyzed (e.g., (results) => computeInput(results) where computeInput is a closure), the inference defaults to dataFlow: false. Template authors can manually annotate with dataFlow: true as an override, though this should be rare.

  4. Function body analysis: JavaScript function introspection is unreliable (minification, closures). Inference operates on the AST of the ujsx template during rendering, not on the runtime function body. This means that Conditional.test functions passed as closures from external code (not inline in the template) cannot have their references detected. For these cases, the string form (Conditional.test = "operationName") should be used to ensure detectability.
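
Rules 1 and 3 can be illustrated with a deliberately simplified stand-in for the AST pass: a regular expression over the inline template source of each node. Real inference would walk the ujsx AST; the regex, the `sources` map of node key to inline source text, and all function names below are assumptions made for the example.

```typescript
interface Edge {
  source: string;
  target: string;
  dataFlow: boolean;
}

// Collect node keys referenced as results["X"] or results['X'] in inline source text.
function referencedKeys(source: string): string[] {
  const pattern = /results\[\s*["']([^"']+)["']\s*\]/g;
  const keys: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(source);
  while (match !== null) {
    const key = match[1];
    if (key !== undefined) keys.push(key);
    match = pattern.exec(source);
  }
  return keys;
}

// All nodes from which `node` is reachable by following edges backwards.
function ancestorsOf(edges: readonly Edge[], node: string): Set<string> {
  const found = new Set<string>();
  const stack = [node];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const e of edges) {
      if (e.target === current && !found.has(e.source)) {
        found.add(e.source);
        stack.push(e.source);
      }
    }
  }
  return found;
}

// Rule 1: mark existing edges only; never create a new edge.
// Rule 3: when nothing is detectable, the edge stays dataFlow: false.
function inferDataFlow(edges: readonly Edge[], sources: Record<string, string>): Edge[] {
  return edges.map((e) => {
    const refs = referencedKeys(sources[e.target] ?? "");
    const upstream = ancestorsOf(edges, e.source);
    const carriesData = refs.some((r) => r === e.source || upstream.has(r));
    return { ...e, dataFlow: carriesData };
  });
}

const chain: Edge[] = [
  { source: "A", target: "B", dataFlow: false },
  { source: "B", target: "C", dataFlow: false },
  { source: "C", target: "D", dataFlow: false },
];
const inferred = inferDataFlow(chain, {
  B: "staticValue",
  C: '(results) => results["A"].output',
  D: "(results) => computeInput(results)", // opaque helper: not detectable
});
```

In this chain only B → C ends up with dataFlow: true, and the edge count is unchanged, which matches the "no new edge A → C" rule.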

The dataFlow attribute propagates to the TemplateEdgeAttrs schema:

const TemplateEdgeAttrs = Type.Object({
  edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
  condition: Type.Optional(Type.Unknown()),
  dataFlow: Type.Optional(Type.Boolean({ default: false })),
});

This resolves OQ-01 and OQ-02 precisely: typeCompat() only runs on edges where dataFlow: true. Temporal-only edges bypass type checking entirely.
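
The gating itself is a filter in front of the type check. In the sketch below, `typeCompat` is a hypothetical stand-in that compares declared type names for equality (the real signature is not given here), and the type lookups are plain records invented for the example.

```typescript
interface TemplateEdge {
  source: string;
  target: string;
  dataFlow?: boolean;
}

// Hypothetical stand-in for typeCompat(): equal type names are compatible.
function typeCompat(outputType: string, inputType: string): boolean {
  return outputType === inputType;
}

function findIncompatibleEdges(
  edges: readonly TemplateEdge[],
  outputTypes: Record<string, string>,
  inputTypes: Record<string, string>,
): TemplateEdge[] {
  return edges
    .filter((e) => e.dataFlow === true) // temporal-only edges are skipped entirely
    .filter((e) => !typeCompat(outputTypes[e.source] ?? "unknown", inputTypes[e.target] ?? "unknown"));
}

const edges: TemplateEdge[] = [
  { source: "architect", target: "reviewer" }, // temporal only, types never compared
  { source: "fetch-items", target: "process-item", dataFlow: true },
  { source: "process-item", target: "aggregate", dataFlow: true },
];
const outputTypes = { architect: "Spec", "fetch-items": "Item[]", "process-item": "string" };
const inputTypes = { reviewer: "void", "process-item": "Item[]", aggregate: "number" };

const incompatible = findIncompatibleEdges(edges, outputTypes, inputTypes);
```

The architect → reviewer edge has mismatched types on paper but is never reported, because it carries no data.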

References

  • Open questions tracker: open-questions.md
  • Reactive execution: reactive-execution.md
  • Call graph: call-graph.md
  • Call protocol events: @alkdev/operations/src/call.ts
  • Event sourcing research: /workspace/research/event_sourcing/event_source_types.md
  • Category theory graph research: /workspace/@alkdev/ujsx/docs/research/category-theory-graph.md
  • Compute graph DAG: /workspace/compute_graph/packages/dag/