resolve all remaining open questions (OQ-03–OQ-29), add ADR-006

Resolve all 19 remaining open questions across the architecture. Every
question now has a documented resolution with rationale:

- OQ-004/OQ-029: edgeType is a universal required attribute on all edges,
  single graph per FlowGraph instance (ADR-006)
- OQ-011: No OR preconditions for v1; preconditionMode as v2 extension
- OQ-012: maxConcurrency enforced via reactive counting semaphore
- OQ-014: Unknown operationId creates node with pending status
- OQ-017: Expose common graphology traversal methods on FlowGraph (80/20)
- OQ-020: condition as Type.Unknown() with string/function documentation
- OQ-022: Identity imported from @alkdev/operations peer dep
- All other questions resolved with documented rationale

Fix three critical issues found by architecture review:
1. edgeType serialization/validation gap: document two-step validation
2. CallEdgeAttrs runtime discrimination: edgeType as runtime discriminant,
   depends_on edges clarified as observability-only (not execution)
3. ADR-005 signal mutation inconsistency: explicitly distinguish call-level
   statuses (event-log-driven) from workflow-derived statuses (signal-mutation)

Additional clarifications:
- dataFlow inference uses conservative strategy (defaults false)
- Conditional.test string resolution: operationName → status === completed
- Add negated field to TemplateEdgeAttrs for else-branch conditions
- Document edge key priority convention for composite keys
- Add maxConcurrency semaphore design to reactive-execution.md
2026-05-21 09:25:55 +00:00
parent c76be7f689
commit f3e084d02f
9 changed files with 239 additions and 268 deletions


@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-20
+last_updated: 2026-05-22
 ---
 # Analysis Functions
@@ -330,13 +330,13 @@ These are in-memory operations with no I/O. The dominant cost is `buildTypeEdges
## Open Questions ## Open Questions
1. **How deep should `typeCompat` check?** Currently it checks top-level field existence and type compatibility. Should it recursively check nested objects and arrays? Full recursive checking is more thorough but slower and may produce false negatives for schemas with dynamic structures. 1. ~~**How deep should `typeCompat` check?**~~ **Resolved (OQ-002/ADR-005)**: Type compatibility checking performs deep recursive structural comparison. The `TypeCompatResult` includes `mismatches?: TypeMismatch[]` with field-level diagnostics for incompatible edges. Type checking only applies to state-transfer edges (where `dataFlow: true` on `TemplateEdgeAttrs`). Temporal-only edges bypass type checking entirely. Remaining detail decisions (recursive depth limits, unknown/union type handling) are implementation concerns, not architecture decisions.
2. **Should `validateTemplate` check runtime preconditions?** Currently it only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer. 2. ~~**Should `validateTemplate` check runtime preconditions?**~~ **Resolved (OQ-027)**: Explicitly out of scope. `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") belong to the access control layer, not the static analysis layer. This is a deliberate scope boundary, not a design gap.
3. **Should analysis functions be async?** For very large graphs (thousands of nodes), type compatibility checking could be slow. Making it async would allow incremental progress reporting. Current graphs are small enough (50-200 nodes) that synchronous checking is fine. 3. ~~**Should analysis functions be async?**~~ **Resolved (OQ-024)**: No — synchronous is sufficient for current scale. Expected graph sizes (10-200 nodes) are well within synchronous processing limits. Making functions async would add API complexity (Promise return types, async/await boilerplate) for no current benefit. If large graphs become common, async variants can be added alongside the synchronous ones.
4. **Should `parallelGroups` account for resource constraints?** Currently it returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling. 4. ~~**Should `parallelGroups` account for resource constraints?**~~ **Resolved (OQ-019)**: No for v1 — `parallelGroups()` returns theoretical maximum parallelism. Adding resource constraints would conflate structural analysis with scheduling policy. The `maxConcurrency` prop on `Parallel` is a runtime scheduling concern handled by the reactive engine, not a structural analysis concern. If consumers need resource-aware scheduling, they can post-process `parallelGroups()` output with their own constraints. An optional `maxConcurrency` parameter can be added in v2 as a convenience, but the core analysis function stays pure.
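The OQ-019 resolution suggests that consumers post-process `parallelGroups()` output when they need resource-aware scheduling. A minimal sketch of that post-processing follows. It assumes the output is an array of groups, each an array of node keys, which the diff does not confirm; `limitGroupSizes` is a hypothetical helper name, not part of the flowgraph API.

```typescript
// Hypothetical helper (not part of flowgraph): splits each theoretical
// parallel group into batches of at most `maxConcurrency` node keys.
function limitGroupSizes(groups: string[][], maxConcurrency: number): string[][] {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const batches: string[][] = [];
  for (const group of groups) {
    // Walk the group in steps of maxConcurrency, preserving node order.
    for (let i = 0; i < group.length; i += maxConcurrency) {
      batches.push(group.slice(i, i + maxConcurrency));
    }
  }
  return batches;
}

// Assumed shape of parallelGroups() output: one array of node keys per level.
const theoretical: string[][] = [["a"], ["b", "c", "d", "e", "f"], ["g"]];
const limited = limitGroupSizes(theoretical, 2);
```

Keeping the limit outside the analysis function matches the stated goal: the structural analysis stays pure, and the scheduling policy lives with the consumer.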
## References ## References


@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-19
+last_updated: 2026-05-22
 ---
 # Call Graph (Dynamic Runtime)
@@ -91,9 +91,7 @@ Call graph edges carry an `edgeType` attribute:
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` | | `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency — source needs target's result | Explicit declaration (not auto-populated) | | `depends_on` | Data dependency — source needs target's result | Explicit declaration (not auto-populated) |
`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by: `depends_on` edges represent data dependencies between calls. Per [ADR-005](decisions/005-event-log-as-source-of-truth.md), the reactive engine does NOT use `depends_on` edges for data flow — data flows through the result projection (`getResult()`). However, `depends_on` edges remain in the API for **observability and visualization**: they annotate which calls depended on which other calls' results, providing a data-flow overlay on top of the call hierarchy. Hub coordinators or external tools may add `depends_on` edges to annotate observed data flow for debugging, monitoring, or call-graph visualization. They do NOT affect execution.
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(parent, child)` calls by the hub coordinator
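To illustrate the revised role of `depends_on` edges as an observability overlay, the sketch below separates the call hierarchy from the data-flow annotations by filtering on `edgeType`. It uses a plain array of edges as a stand-in for the graphology-backed graph; `CallEdgeSketch`, `hierarchyEdges`, and `dataFlowOverlay` are hypothetical names introduced only for this example.

```typescript
type CallEdgeType = "triggered" | "depends_on";

// Simplified stand-in for a stored call-graph edge; keys are requestIds.
interface CallEdgeSketch {
  source: string;
  target: string;
  edgeType: CallEdgeType;
}

// Edges the call protocol produces: parent call caused child call.
function hierarchyEdges(edges: readonly CallEdgeSketch[]): CallEdgeSketch[] {
  return edges.filter((edge) => edge.edgeType === "triggered");
}

// Annotation-only edges: source needed target's result. Nothing here feeds scheduling.
function dataFlowOverlay(edges: readonly CallEdgeSketch[]): CallEdgeSketch[] {
  return edges.filter((edge) => edge.edgeType === "depends_on");
}

const edges: CallEdgeSketch[] = [
  { source: "req-0", target: "req-1", edgeType: "triggered" },
  { source: "req-0", target: "req-2", edgeType: "triggered" },
  // Added by a coordinator or external tool for visualization only.
  { source: "req-2", target: "req-1", edgeType: "depends_on" },
];

const hierarchy = hierarchyEdges(edges);
const overlay = dataFlowOverlay(edges);
```

A visualization tool could render `hierarchy` as the tree and draw `overlay` on top of it, while the reactive engine ignores the overlay entirely.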
### Edge Key Convention ### Edge Key Convention
@@ -234,19 +232,19 @@ updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
- **Status transitions are validated** — invalid transitions throw `InvalidTransitionError`. - **Status transitions are validated** — invalid transitions throw `InvalidTransitionError`.
- **Node keys are `requestId`** — not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`. - **Node keys are `requestId`** — not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- **`parentRequestId` is both node attribute and edge** — denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern. - **`parentRequestId` is both node attribute and edge** — denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated** — they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation. - **`depends_on` edges are for observability, not execution** — per ADR-005, data dependencies flow through the result projection. `depends_on` edges annotate observed data flow for visualization and debugging. The reactive engine does NOT use them for scheduling or precondition computation. They may be added by hub coordinators or external tools to document which calls depended on which other calls' results.
- **Payload fields are stored as-is** — flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary. - **Payload fields are stored as-is** — flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- **Small graph sizes** — call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine. - **Small graph sizes** — call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.
## Open Questions ## Open Questions
1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes — the call graph records what happened, not what should have happened. The node gets a `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. 1. ~~**Should the call graph support `call.requested` events with unknown `operationId`?**~~ **Resolved (OQ-014)**: Yes — the call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. This is consistent with the error-handling doc's existing statement about unknown `operationId`. The behavior should be documented explicitly in the `fromCallEvents()` specification: when a `call.requested` event references an `operationId` not in the registry, the node is still created with `status: "pending"` and the given `operationId`. This enables the call graph to serve as a complete audit trail regardless of registry state.
2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable. 2. ~~**Should `depends_on` edges be auto-populated from workflow templates?**~~ **Resolved (OQ-008/ADR-005)**: `depends_on` edges are unnecessary as a separate concept. Data dependencies are expressed through the result projection. If node B needs node A's output, B reads `getResult("A")` from the result projection. The temporal ordering (A before B) is already expressed by template edges. There's no need for a separate edge type to represent data flow — the event log is the data transport.
3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs. 3. ~~**Should the call graph support multiple graphs simultaneously?**~~ **Resolved (OQ-015)**: No — one `FlowGraph` instance per graph. Multiple concurrent workflows use multiple instances. This design is simpler and matches graphology's model. Subgraphs would require a scoping mechanism and cross-scope queries that add complexity without benefit at current scale. The hub coordinator creates one `WorkflowReactiveRoot` per workflow, so one `FlowGraph` per workflow is consistent. This is a deliberate "no," not a deferral — if future scale demands require multi-workflow queries, a specialized query layer can aggregate across instances.
4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make status queries O(1). The index would need to be updated on every `updateStatus()` call. 4. ~~**Should `filterByStatus` use an index?**~~ **Resolved (OQ-016)**: No — O(n) filter is sufficient for expected graph sizes (tens to hundreds of nodes). A status index would add implementation complexity (maintain on every `updateStatus()`) for no measurable benefit at current scale. If performance becomes an issue with very large graphs, a `Map<CallStatus, Set<string>>` index can be added as an optimization later without changing the public API.
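The OQ-014 behavior can be sketched as follows. This is an illustration under stated assumptions, not the `fromCallEvents()` implementation: the event and node shapes are reduced to the fields the example needs, the status union is an assumed subset, and `nodesFromCallEvents` and `failUnknownOperations` are hypothetical names. In particular, which component performs the later `"failed"` transition is not specified in the diff.

```typescript
// Assumed subset of call statuses; the real CallStatus union may have more members.
type CallStatus = "pending" | "running" | "completed" | "failed";

// Assumed event shape, reduced to what this sketch uses.
interface CallRequestedEvent {
  type: "call.requested";
  requestId: string;
  operationId: string;
}

interface CallNodeSketch {
  requestId: string;
  operationId: string;
  status: CallStatus;
  errorCode?: string;
}

// Hypothetical stand-in for fromCallEvents(): nodes are keyed by requestId and
// created without consulting the registry, so unknown operations are recorded too.
function nodesFromCallEvents(events: readonly CallRequestedEvent[]): Map<string, CallNodeSketch> {
  const nodes = new Map<string, CallNodeSketch>();
  for (const event of events) {
    nodes.set(event.requestId, {
      requestId: event.requestId,
      operationId: event.operationId,
      status: "pending",
    });
  }
  return nodes;
}

// Hypothetical later step: pending nodes whose operation is not registered fail.
function failUnknownOperations(nodes: Map<string, CallNodeSketch>, registry: ReadonlySet<string>): void {
  nodes.forEach((node) => {
    if (node.status === "pending" && !registry.has(node.operationId)) {
      node.status = "failed";
      node.errorCode = "OPERATION_NOT_FOUND";
    }
  });
}

const registry = new Set<string>(["users.get"]);
const nodes = nodesFromCallEvents([
  { type: "call.requested", requestId: "req-1", operationId: "users.get" },
  { type: "call.requested", requestId: "req-2", operationId: "does.not.exist" },
]);
const statusBeforeCheck = nodes.get("req-2")?.status;
failUnknownOperations(nodes, registry);
```

Both requests appear in the graph, which is what lets it act as an audit trail regardless of registry state.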
## References ## References


@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-19
+last_updated: 2026-05-22
 ---
 # FlowGraph Public API
@@ -293,11 +293,11 @@ This is an escape hatch. Direct graph mutation bypasses flowgraph's validation (
## Open Questions ## Open Questions
1. **Should `FlowGraph` expose graphology's traversal methods directly or only via convenience methods?** Currently the plan is convenience methods that delegate. Direct graphology access via `.graph` is the escape hatch. But some consumers may find it inconvenient to go through `.graph.forEachNode()` instead of `flowGraph.forEachNode()`. 1. ~~**Should `FlowGraph` expose graphology's traversal methods directly or only via convenience methods?**~~ **Resolved (OQ-017)**: Option (c) — expose the most common traversal methods directly on `FlowGraph`, let `.graph` handle the rest. The directly exposed methods are: `forEachNode()`, `forEachEdge()`, `nodes()`, `edges()`, `order`, `size`, `inNeighbors()`, `outNeighbors()` (already exposed as `predecessors()`/`successors()`). Less common methods (degree, detailed attribute iteration, adjacency queries) remain accessible via `flowGraph.graph`. This is the 80/20 approach: consumers get a clean API for common operations, and power users get the escape hatch. The convenience delegation pattern is maintained — `FlowGraph.forEachNode()` delegates to `this._graph.forEachNode()`.
2. **Should the operation graph's `addTypedEdge` be auto-populated or manual?** Currently `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. `addTypedEdge` is for manual or incremental construction. Should `addOperation` also attempt auto-type-compat edge creation? 2. ~~**Should the operation graph's `addTypedEdge` be auto-populated or manual?**~~ **Resolved (OQ-018)**: Manual — `addOperation()` adds a node only, and `buildTypeEdges()` must be called separately after incremental construction. Auto-population would require O(n) comparisons on every `addOperation()`, which adds complexity for a rare use case (the operation graph is typically built once via `fromSpecs()`). If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations.
3. **Should `FlowGraph` support multiple graph instances sharing analysis functions?** Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances. 3. ~~**Should `FlowGraph` support multiple graph instances sharing analysis functions?**~~ **Resolved (OQ-028)**: No — each `FlowGraph` instance owns its own `DirectedGraph`. Analysis functions are stateless pure functions that take a graph as input; there's nothing to pool or share. The `FlowGraph` convenience methods delegate to these standalone functions. This question conflates "sharing analysis functions" (already done — `typeCompat` is a standalone function) with "sharing graph data" (unnecessary since analysis doesn't cache state).
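The convenience delegation pattern from OQ-017 might look like the sketch below. To stay self-contained it does not import graphology: `GraphLike` is a hypothetical interface covering only three of the listed members (`order`, `nodes()`, `forEachNode()`), and `InMemoryGraph` is a toy stand-in for a `DirectedGraph`. `FlowGraphSketch` is likewise an illustrative name, not the real class.

```typescript
type NodeAttrs = Record<string, unknown>;
type NodeCallback = (key: string, attrs: NodeAttrs) => void;

// Hypothetical subset of the underlying graph interface used by this sketch.
interface GraphLike {
  readonly order: number;
  nodes(): string[];
  forEachNode(callback: NodeCallback): void;
}

// Toy stand-in for the underlying graph implementation.
class InMemoryGraph implements GraphLike {
  private readonly store = new Map<string, NodeAttrs>();

  addNode(key: string, attrs: NodeAttrs = {}): void {
    this.store.set(key, attrs);
  }
  get order(): number {
    return this.store.size;
  }
  nodes(): string[] {
    return Array.from(this.store.keys());
  }
  forEachNode(callback: NodeCallback): void {
    this.store.forEach((attrs, key) => callback(key, attrs));
  }
}

class FlowGraphSketch {
  private readonly _graph: GraphLike;

  constructor(graph: GraphLike) {
    this._graph = graph;
  }

  // Escape hatch: everything not exposed directly stays reachable here.
  get graph(): GraphLike {
    return this._graph;
  }

  // Common traversal members delegate straight to the underlying graph.
  get order(): number {
    return this._graph.order;
  }
  nodes(): string[] {
    return this._graph.nodes();
  }
  forEachNode(callback: NodeCallback): void {
    this._graph.forEachNode(callback);
  }
}

const rawGraph = new InMemoryGraph();
rawGraph.addNode("a");
rawGraph.addNode("b");
const flowGraph = new FlowGraphSketch(rawGraph);
const visited: string[] = [];
flowGraph.forEachNode((key) => visited.push(key));
```

Because each method is a one-line delegation, adding or removing a member from the exposed set later does not change how the underlying graph is stored.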
## References ## References


@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-21
+last_updated: 2026-05-22
 ---
 # Host Configs
@@ -453,7 +453,7 @@ The `Conditional.test` prop can be a function or a string. At the template level
1. ~~**Should structural containers create "virtual" nodes in the reactive engine?**~~ **Resolved (OQ-05)**: Containers stay transparent. Aggregate status for structural containers is computed as a projection from children's statuses, without requiring nodes in the event log or DAG. The `parentMap` and `siblingMap` in `ReactiveContext` provide the structural context for precondition computation. 1. ~~**Should structural containers create "virtual" nodes in the reactive engine?**~~ **Resolved (OQ-05)**: Containers stay transparent. Aggregate status for structural containers is computed as a projection from children's statuses, without requiring nodes in the event log or DAG. The `parentMap` and `siblingMap` in `ReactiveContext` provide the structural context for precondition computation.
2. **Should the GraphologyHostConfig produce a separate graph for edge types?** Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering. 2. ~~**Should the GraphologyHostConfig produce a separate graph for edge types?**~~ **Resolved (OQ-029)**: No — all edge types share a single graph, with `edgeType` as a universal required attribute on every edge. Separate graphs per edge type would add complexity (cross-graph traversal, cache coherence, multi-graph queries) for a marginal performance gain at current scale. Single-graph filtering by `edgeType` is O(n) on edges and negligible for expected graph sizes. If a concrete performance issue arises with very large template graphs, a `Map<EdgeType, DirectedGraph>` index can be added as an internal optimization without changing the API. See ADR-006 for the full decision on `edgeType` consistency.
3. ~~**How does the ReactiveHostConfig interact with the call protocol?**~~ **Resolved (ADR-005)**: The reactive layer bridges to the call protocol through the event log. The hub coordinator appends call protocol events; the reactive layer projects them into status and results. The `ReactiveHostConfig` reads from the `EventLogProjection` interface (via `getStatus()` and `getResult()`), not from direct signal mutations by the coordinator. 3. ~~**How does the ReactiveHostConfig interact with the call protocol?**~~ **Resolved (ADR-005)**: The reactive layer bridges to the call protocol through the event log. The hub coordinator appends call protocol events; the reactive layer projects them into status and results. The `ReactiveHostConfig` reads from the `EventLogProjection` interface (via `getStatus()` and `getResult()`), not from direct signal mutations by the coordinator.


@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-20
+last_updated: 2026-05-22
 ---
 # Open Questions Tracker
@@ -50,22 +50,18 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-03: Should subscription operations be treated differently in type compatibility? ### OQ-03: Should subscription operations be treated differently in type compatibility?
- **Origin**: [operation-graph.md](operation-graph.md) Q3 - **Origin**: [operation-graph.md](operation-graph.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects operation graph edge semantics for streaming operations - **Priority**: medium — affects operation graph edge semantics for streaming operations
- **Question**: A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different. Should type compat check for subscriptions account for this? - **Resolution**: For v1, subscriptions are treated identically to queries/mutations in `typeCompat()`. A subscription's `outputSchema` describes a single stream element, and `typeCompat()` checks whether that single element is compatible with the downstream input. This is correct for `Map` (which processes stream elements individually) and may be misleading for direct subscription→operation connections. The `OperationNodeAttrs.type` field is available for consumers that need subscription-aware behavior. A v2 extension could add a `streaming: boolean` flag on edges to capture stream semantics explicitly, but this adds complexity without a current use case.
- **Notes**: This has downstream implications for call-graph population (subscriptions produce multiple `call.responded` events) and template authoring (a subscription feeding into a mutation has different semantics than a query feeding into a mutation). May want to defer to v2 but should at least document the current behavior (subscriptions are treated the same as queries/mutations). - **Cross-references**: OQ-01
### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges? ### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges?
- **Origin**: [schema.md](schema.md) Q1 - **Origin**: [schema.md](schema.md) Q1
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects serialization format and edge handling across all graph types - **Priority**: medium — affects serialization format and edge handling across all graph types
- **Options**: - **Resolution**: Option (a) — `edgeType` is required on all edges. The mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, `DependencyEdgeAttrs`) do NOT include `edgeType` — it is stored as a universal attribute alongside the mode-specific attributes in graphology. This ensures consistent serialization/deserialization, uniform graphology queries, and straightforward edge-type filtering across all graph modes. The redundancy for operation graphs (where `edgeType` is always `"typed"`) is a minor ergonomic cost for significant consistency gains. See ADR-006 for the full decision record.
- (a) `edgeType` required on all edges. Pro: consistent, self-describing. Con: operation graph edges are always `typed`, making the field redundant there. - **Cross-references**: OQ-29
- (b) Separate edge attribute types per graph mode (current implicit design — `CallEdgeAttrs` is a union, `OperationEdgeAttrs` doesn't include edge type). Con: graphology edges must carry attributes from a single schema.
- (c) Union type on edge attributes, letting the consumer tag the edge. Pro: flexible. Con: runtime discrimination burden.
- **Notes**: The current schema already stores `edgeType` alongside the edge-specific attributes in graphology (see schema.md's "Edge type storage" section), which is effectively option (a) at the storage level. The question is really about the TypeScript type API: should `OperationEdgeAttrs` include `edgeType: "typed"` or should that be a separate concern?
- **Cross-references**: OQ-01 (if incompatible edges exist, they need tagging)
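One way to picture the OQ-04 resolution is a stored attribute bag in which `edgeType` sits next to the mode-specific attributes rather than inside their schema. The sketch below is an assumption-heavy illustration: the `EdgeType` members are collected from the edge types named across these docs, `OperationEdgeAttrsSketch` and its `compatible` field are invented for the example, and `splitStoredAttrs` is a hypothetical helper. ADR-006 holds the actual decision.

```typescript
// Edge type names collected from the docs in this diff; the real union may differ.
type EdgeType = "typed" | "triggered" | "depends_on" | "sequential" | "conditional";
const EDGE_TYPES: readonly EdgeType[] = ["typed", "triggered", "depends_on", "sequential", "conditional"];

// What is stored on an edge: mode-specific attrs plus the universal edgeType.
type StoredEdgeAttrs<T extends object> = T & { edgeType: EdgeType };

// Hypothetical mode-specific schema; note that it does NOT declare edgeType.
type OperationEdgeAttrsSketch = { compatible: boolean };

function isEdgeType(value: unknown): value is EdgeType {
  return typeof value === "string" && (EDGE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Separates the universal discriminant from the mode-specific attributes,
// rejecting edges whose edgeType is missing or unknown.
function splitStoredAttrs(raw: object): { edgeType: EdgeType; attrs: Record<string, unknown> } {
  const { edgeType, ...attrs } = raw as Record<string, unknown>;
  if (!isEdgeType(edgeType)) {
    throw new TypeError("Unknown or missing edgeType: " + String(edgeType));
  }
  return { edgeType, attrs };
}

const stored: StoredEdgeAttrs<OperationEdgeAttrsSketch> = { edgeType: "typed", compatible: true };
const split = splitStoredAttrs(stored);
```

After the split, `split.attrs` can be checked against the mode-specific schema selected by `split.edgeType`, which keeps the discriminant check and the schema check as two distinct steps.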
--- ---
@@ -128,10 +124,6 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
- **Status**: resolved by ADR-005 - **Status**: resolved by ADR-005
- **Priority**: high — affects the core status state machine - **Priority**: high — affects the core status state machine
- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work? - **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
- **Options**:
- (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
- (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
- (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
- **ADR-005 resolution**: Option (c) is correct, and the event log makes it natural. A retry is not a state mutation — it's a new sequence of events appended to the log. When `call.requested` arrives for the same operation with a new `requestId`, it's a new fact. The old `call.error` event remains in the log as history. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed; no state machine mutation; the log preserves full history. - **ADR-005 resolution**: Option (c) is correct, and the event log makes it natural. A retry is not a state mutation — it's a new sequence of events appended to the log. When `call.requested` arrives for the same operation with a new `requestId`, it's a new fact. The old `call.error` event remains in the log as history. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed; no state machine mutation; the log preserves full history.
- **Cross-references**: OQ-10 - **Cross-references**: OQ-10
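The retry-as-new-events idea can be sketched with a small status projection over an append-only log. The event shapes and the event-to-status mapping below are assumptions made for illustration (`call.requested` to `pending`, `call.responded` to `completed`, `call.error` to `failed`); `projectStatuses` is a hypothetical name, not the projection defined by ADR-005.

```typescript
// Assumed event shapes, reduced to what the projection needs.
type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string }
  | { type: "call.responded"; requestId: string }
  | { type: "call.error"; requestId: string };

type ProjectedStatus = "pending" | "completed" | "failed";

// Scans the log in order; the most recent event per requestId determines its status.
function projectStatuses(log: readonly CallEvent[]): Map<string, ProjectedStatus> {
  const statuses = new Map<string, ProjectedStatus>();
  for (const event of log) {
    switch (event.type) {
      case "call.requested":
        statuses.set(event.requestId, "pending");
        break;
      case "call.responded":
        statuses.set(event.requestId, "completed");
        break;
      case "call.error":
        statuses.set(event.requestId, "failed");
        break;
    }
  }
  return statuses;
}

const log: CallEvent[] = [
  { type: "call.requested", requestId: "req-1", operationId: "fetch" },
  { type: "call.error", requestId: "req-1" },
  // The retry is a new fact with a new requestId; req-1's history is untouched.
  { type: "call.requested", requestId: "req-2", operationId: "fetch" },
  { type: "call.responded", requestId: "req-2" },
];

const projected = projectStatuses(log);
```

The failed attempt stays `failed` and the retry gets its own entry, so no exit transition from a terminal state is required.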
@@ -150,27 +142,25 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-11: Should preconditions support OR logic? ### OQ-11: Should preconditions support OR logic?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q1 - **Origin**: [reactive-execution.md](reactive-execution.md) Q1
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects the precondition computation model - **Priority**: medium — affects the precondition computation model
- **Question**: Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes." - **Resolution**: No for v1. All preconditions use AND logic — a node becomes `ready` only when ALL predecessors have reached a satisfying terminal state (`completed` or `skipped`). OR logic (`anyOf`) would introduce significant complexity (what happens when one predecessor completes but another fails?) and is already partially addressed by `Conditional` (which provides branch-level either/or semantics). For v2, if OR logic becomes necessary, it should be added as a `preconditionMode: "allOf" | "anyOf"` attribute on `Operation` (node-level, not edge-level), defaulting to `"allOf"`. This is a clean extension point that doesn't change the current precondition model.
- **Notes**: OR preconditions would require either: (a) an edge attribute indicating `allOf` vs `anyOf`, (b) a node-level configuration, or (c) a separate `anyOfPredecessors` computed per node. This is a semantic change that affects both the DAG structure and the reactive engine. Might be a v2 feature.
- **Cross-references**: OQ-12 - **Cross-references**: OQ-12
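The AND-only precondition rule from OQ-11 reduces to a single predicate over a node's predecessors. The sketch below uses a plain function where the reactive engine would use a `computed`; the `NodeStatus` members are an assumed set inferred from statuses mentioned in these docs, and `preconditionsMet` is a hypothetical name.

```typescript
// Assumed status set; the real NodeStatus union may differ.
type NodeStatus = "pending" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";

// allOf semantics: every predecessor must be completed or skipped.
// A node with no predecessors is trivially satisfied.
function preconditionsMet(
  predecessors: readonly string[],
  statusOf: (id: string) => NodeStatus,
): boolean {
  return predecessors.every((id) => {
    const status = statusOf(id);
    return status === "completed" || status === "skipped";
  });
}

const statuses = new Map<string, NodeStatus>([
  ["a", "completed"],
  ["b", "skipped"],
  ["c", "running"],
]);
const statusOf = (id: string): NodeStatus => statuses.get(id) ?? "pending";

const readyAfterAandB = preconditionsMet(["a", "b"], statusOf);
const readyAfterAandC = preconditionsMet(["a", "c"], statusOf);
const readyWithNoPredecessors = preconditionsMet([], statusOf);
```

A v2 `preconditionMode: "anyOf"` would swap `every` for `some`, which is part of why the node-level attribute is described as a clean extension point.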
### OQ-12: How does `maxConcurrency` interact with preconditions? ### OQ-12: How does `maxConcurrency` interact with preconditions?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q4 - **Origin**: [reactive-execution.md](reactive-execution.md) Q4
- **Status**: open - **Status**: resolved
- **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time - **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time
- **Notes**: `maxConcurrency` is a scheduling concern, not a structural one. The DAG doesn't encode it. Options: (a) a semaphore signal in the reactive layer, (b) coordinator-enforced throttling, (c) a `maxConcurrency` prop on `Parallel` that the reactive engine respects. The `<Parallel>` component already has `maxConcurrency` as an optional prop in its definition (workflow-templates.md). - **Resolution**: `maxConcurrency` is a `Parallel` prop enforced by the `WorkflowReactiveRoot` via a reactive counting semaphore. When the root initializes signals for nodes in a `Parallel` group with `maxConcurrency: N`, it wraps the precondition logic: a node's effective `ready` transition requires both `preconditions.value === true` AND `runningCount < maxConcurrency`, where `runningCount` is a reactive computed derived from counting sibling nodes currently in the `running` state. This is entirely a reactive-engine concern — the DAG doesn't encode `maxConcurrency` (it's not structural), and the call graph doesn't need to know about it. The `Parallel` component's `maxConcurrency` prop is already part of the template definition; the reactive engine just needs to honor it.
- **Cross-references**: OQ-11, workflow-templates `Parallel` component - **Cross-references**: OQ-11, workflow-templates `Parallel` component
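The counting-semaphore rule from OQ-12 can be sketched without a reactive library by recomputing the count on demand. In the real engine `runningCount` would be a reactive `computed`; here it is an ordinary function, and `canStart` is a hypothetical name for the combined gate. The `NodeStatus` members are an assumed set.

```typescript
// Assumed status set; the real NodeStatus union may differ.
type NodeStatus = "pending" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";
type StatusLookup = (id: string) => NodeStatus;

// Number of siblings in the Parallel group that are currently running.
function runningCount(siblings: readonly string[], statusOf: StatusLookup): number {
  return siblings.filter((id) => statusOf(id) === "running").length;
}

// Effective ready gate: preconditions hold AND a concurrency slot is free.
function canStart(
  preconditionsMet: boolean,
  siblings: readonly string[],
  statusOf: StatusLookup,
  maxConcurrency: number,
): boolean {
  return preconditionsMet && runningCount(siblings, statusOf) < maxConcurrency;
}

const siblings = ["a", "b", "c", "d"];
const statuses = new Map<string, NodeStatus>([
  ["a", "running"],
  ["b", "running"],
  ["c", "pending"],
  ["d", "pending"],
]);
const statusOf: StatusLookup = (id) => statuses.get(id) ?? "pending";

// Both slots are taken, so "c" must wait even though its preconditions hold.
const blockedWhileFull = canStart(true, siblings, statusOf, 2);

// When "a" finishes, a slot frees up and "c" may start.
statuses.set("a", "completed");
const admittedAfterSlotFrees = canStart(true, siblings, statusOf, 2);
```

Because the gate only reads sibling statuses, nothing about `maxConcurrency` needs to be encoded in the DAG or the call graph.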
### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`? ### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q5 - **Origin**: [reactive-execution.md](reactive-execution.md) Q5
- **Status**: open - **Status**: resolved
- **Priority**: low — implementation detail, can be decided during implementation - **Priority**: low — implementation detail
- **Question**: Currently there are two separate `computed` values `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` returning `"ready" | "blocked" | "failed"`. - **Resolution**: Keep two separate `computed` values (current design). Two separate computeds are more composable — you can check preconditions independently of failure status, and you can compose different effects for each. A single `computed<NodeReadiness>` would require every consumer to destructure the result, losing the clean `if (preconditions.value) { ... }` pattern. The implementation cost of two effects per node is negligible. The current design is confirmed.
- **Notes**: Two separate `computed` values are more composable (you can check preconditions independently of failure status) but require two effects per node. A single `computed` is simpler (one effect) but less composably queryable. This is largely an implementation choice that doesn't affect the public API. Can be deferred to implementation.
--- ---
@@ -179,69 +169,58 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-14: Should the call graph support unknown `operationId`? ### OQ-14: Should the call graph support unknown `operationId`?
- **Origin**: [call-graph.md](call-graph.md) Q1 - **Origin**: [call-graph.md](call-graph.md) Q1
- **Status**: open (with a proposed answer) - **Status**: resolved
- **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior - **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior
- **Proposed answer**: Yes. The call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. - **Resolution**: Yes — the call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. This is consistent with the error-handling doc's existing statement about unknown `operationId`. The behavior is documented explicitly in the `fromCallEvents()` specification: when a `call.requested` event references an `operationId` not in the registry, the node is still created with `status: "pending"` and the given `operationId`. This enables the call graph to serve as a complete audit trail regardless of registry state.
- **Notes**: The doc already has a proposed answer. This just needs confirmation and the behavior documented in the `fromCallEvents()` spec.
### OQ-15: Should the call graph support multiple graphs simultaneously? ### OQ-15: Should the call graph support multiple graphs simultaneously?
- **Origin**: [call-graph.md](call-graph.md) Q3 - **Origin**: [call-graph.md](call-graph.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: low — can be deferred to v2 - **Priority**: low — confirmed as correct design, not a deferral
- **Question**: Currently one `FlowGraph` instance = one call graph. If the hub needs to track multiple concurrent workflows, it uses multiple instances. An alternative is a single graph with workflow-scoped subgraphs. - **Resolution**: No — one `FlowGraph` instance per graph. Multiple concurrent workflows use multiple instances. This design is simpler and matches graphology's model. Subgraphs would require a scoping mechanism and cross-scope queries that add complexity without benefit at current scale. The hub coordinator creates one `WorkflowReactiveRoot` per workflow, so one `FlowGraph` per workflow is consistent. This is a deliberate "no," not a deferral — if future scale demands require multi-workflow queries, a specialized query layer can aggregate across instances.
- **Notes**: The current design (multiple instances) is simpler and matches graphology's model. Subgraphs would require a scoping mechanism. This can be deferred unless early usage shows it's needed.
### OQ-16: Should `filterByStatus` use an index? ### OQ-16: Should `filterByStatus` use an index?
- **Origin**: [call-graph.md](call-graph.md) Q4 - **Origin**: [call-graph.md](call-graph.md) Q4
- **Status**: open - **Status**: resolved
- **Priority**: low — premature optimization for small graphs - **Priority**: low — premature optimization for small graphs
- **Notes**: Call graphs at hub level are typically tens of nodes. O(n) filter is fast enough. An index can be added later if performance becomes an issue. Can be deferred. - **Resolution**: No — O(n) filter is sufficient for expected graph sizes (tens to hundreds of nodes). A status index would add implementation complexity (maintain on every `updateStatus()`) for no measurable benefit at current scale. If performance becomes an issue with very large graphs, a `Map<CallStatus, Set<string>>` index can be added as an optimization later without changing the public API.
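
The trade-off can be sketched as follows, assuming node statuses are available as a plain `Map`. The `StatusIndex` class and the `"running"` status are hypothetical; only the `Map<CallStatus, Set<string>>` shape comes from the resolution text.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed";

// v1 approach: a linear scan over every node's status.
function filterByStatus(statuses: Map<string, CallStatus>, wanted: CallStatus): string[] {
  const matches: string[] = [];
  for (const [nodeKey, status] of statuses) {
    if (status === wanted) matches.push(nodeKey);
  }
  return matches;
}

// Possible later optimization: every status change must also move the
// node key between buckets, which is the maintenance cost noted above.
class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  private readonly current = new Map<string, CallStatus>();

  updateStatus(nodeKey: string, next: CallStatus): void {
    const previous = this.current.get(nodeKey);
    if (previous !== undefined) this.byStatus.get(previous)?.delete(nodeKey);
    let bucket = this.byStatus.get(next);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(next, bucket);
    }
    bucket.add(nodeKey);
    this.current.set(nodeKey, next);
  }

  filterByStatus(wanted: CallStatus): string[] {
    const bucket = this.byStatus.get(wanted);
    return bucket === undefined ? [] : Array.from(bucket);
  }
}
```

Both variants return the same keys for the same data, which is why the index could be introduced later without changing the public API.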
### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly? ### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1 - **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects the public API surface - **Priority**: medium — affects the public API surface
- **Question**: Currently the plan is convenience methods that delegate. But some consumers may find it inconvenient to go through `.graph.forEachNode()`. - **Resolution**: Option (c) — expose the most common traversal methods directly on `FlowGraph`, let `.graph` handle the rest. The directly exposed methods are: `forEachNode()`, `forEachEdge()`, `nodes()`, `edges()`, `order`, `size`, `inNeighbors()`, `outNeighbors()` (already exposed as `predecessors()`/`successors()`). Less common methods (degree, detailed attribute iteration, adjacency queries) remain accessible via `flowGraph.graph`. This is the 80/20 approach: consumers get a clean API for common operations, and power users get the escape hatch.
- **Options**:
- (a) Convenience methods only (current plan). Direct access via `.graph` for power users.
- (b) Expose graphology's traversal methods directly on `FlowGraph` (e.g., `flowGraph.forEachNode()`).
- (c) Expose only the most common traversal methods and let `.graph` handle the rest.
- **Notes**: This is a UX decision. Option (a) keeps the API surface small. Option (b) is more convenient but increases the delegation surface. Option (c) is a middle ground. The decision can be made during implementation based on actual consumer usage patterns.
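
The delegation pattern behind option (c) might look like the sketch below. `GraphLike` is a hand-written structural subset of the graphology methods named above, and `FlowGraphSketch` and `graphFromEdges` are hypothetical names used only so the example runs without graphology installed.

```typescript
// Structural subset of the methods named in the resolution.
interface GraphLike {
  readonly order: number;
  nodes(): string[];
  inNeighbors(node: string): string[];
  outNeighbors(node: string): string[];
}

class FlowGraphSketch {
  // Escape hatch: less common queries go through the underlying graph.
  readonly graph: GraphLike;

  constructor(graph: GraphLike) {
    this.graph = graph;
  }

  get order(): number {
    return this.graph.order;
  }

  nodes(): string[] {
    return this.graph.nodes();
  }

  predecessors(node: string): string[] {
    return this.graph.inNeighbors(node);
  }

  successors(node: string): string[] {
    return this.graph.outNeighbors(node);
  }
}

// Tiny in-memory stand-in for a directed graph, for illustration only.
function graphFromEdges(edges: Array<[string, string]>): GraphLike {
  const keys = new Set<string>();
  for (const [source, target] of edges) {
    keys.add(source);
    keys.add(target);
  }
  return {
    order: keys.size,
    nodes: () => Array.from(keys),
    inNeighbors: (node) => edges.filter(([, t]) => t === node).map(([s]) => s),
    outNeighbors: (node) => edges.filter(([s]) => s === node).map(([, t]) => t),
  };
}
```

Each exposed method is a one-line delegate, so the wrapper adds surface area but no new behavior.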
### OQ-18: Should `addOperation` auto-populate type-compat edges? ### OQ-18: Should `addOperation` auto-populate type-compat edges?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2 - **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2
- **Status**: open - **Status**: resolved
- **Priority**: low — affects incremental construction behavior - **Priority**: low — affects incremental construction behavior
- **Question**: `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. Should `addOperation()` (incremental) also attempt auto-type-compat edge creation? - **Resolution**: No — `addOperation()` adds a node only. Call `buildTypeEdges()` manually after incremental construction. Auto-population would require O(n) comparisons on every `addOperation()`, which adds complexity for a rare use case (the operation graph is typically built once via `fromSpecs()`). If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations.
- **Notes**: This is only relevant for incremental construction (rare use case). The operation graph is typically built once via `fromSpecs()`. If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations. Can be deferred.
### OQ-28: Should `FlowGraph` share analysis functions across instances? ### OQ-28: Should `FlowGraph` share analysis functions across instances?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3 - **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: low — optimization concern, not blocking - **Priority**: low — optimization concern, not blocking
- **Question**: Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances. - **Resolution**: No — each `FlowGraph` instance owns its own `DirectedGraph`, and analysis functions are stateless pure functions that take a graph as input. There's nothing to pool or share. The `FlowGraph` convenience methods delegate to these standalone functions. Shared analysis "instances" would only make sense if the functions had internal caches, but they don't. This question conflated "sharing analysis functions" (already done — `typeCompat` is a standalone function) with "sharing graph data" (unnecessary since analysis doesn't cache state).
- **Notes**: Distinct from OQ-15 (multiple graphs per instance) — this is about sharing analysis logic, not about graph scoping. Can be deferred.
### OQ-19: Should `parallelGroups` account for resource constraints? ### OQ-19: Should `parallelGroups` account for resource constraints?
- **Origin**: [analysis.md](analysis.md) Q4 - **Origin**: [analysis.md](analysis.md) Q4
- **Status**: open - **Status**: resolved
- **Priority**: low — feature enhancement, not a core concern - **Priority**: low — feature enhancement, not a core concern
- **Question**: Currently `parallelGroups()` returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling. - **Resolution**: No for v1 — `parallelGroups()` returns theoretical maximum parallelism. Adding resource constraints would conflate structural analysis with scheduling policy. The `maxConcurrency` prop on `Parallel` is a runtime scheduling concern handled by the reactive engine (see OQ-12), not a structural analysis concern. If consumers need resource-aware scheduling, they can post-process the `parallelGroups()` output with their own constraints. An optional `maxConcurrency` parameter can be added in v2 as a convenience, but the core analysis function stays pure.
- **Notes**: Can be added later as an optional parameter. Not blocking.
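
Consumer-side post-processing could be as small as the sketch below. It assumes `parallelGroups()` returns an array of groups of node keys (`string[][]`), which this section does not confirm, and `capGroupSize` is a hypothetical helper name.

```typescript
// Splits each theoretical parallel group into chunks of at most
// maxConcurrency nodes, preserving the original group order.
function capGroupSize(groups: string[][], maxConcurrency: number): string[][] {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const capped: string[][] = [];
  for (const group of groups) {
    for (let start = 0; start < group.length; start += maxConcurrency) {
      capped.push(group.slice(start, start + maxConcurrency));
    }
  }
  return capped;
}
```

Keeping this outside the analysis function leaves `parallelGroups()` a pure structural query, with the scheduling policy owned by the caller.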
### OQ-27: Should `validateTemplate` check runtime preconditions? ### OQ-27: Should `validateTemplate` check runtime preconditions?
- **Origin**: [analysis.md](analysis.md) Q2 - **Origin**: [analysis.md](analysis.md) Q2
- **Status**: open (intentionally deferred) - **Status**: resolved
- **Priority**: low — explicitly out of scope for static analysis - **Priority**: low — explicitly out of scope for static analysis
- **Question**: Currently `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer. - **Resolution**: Explicitly out of scope. `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") belong to the access control layer, not the static analysis layer. This is a deliberate scope boundary, not a design gap.
- **Notes**: This is a deliberate scope boundary, not a design gap. Documented here to confirm that this is an intentional deferral, not an oversight.
--- ---
@@ -250,33 +229,31 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type? ### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type?
- **Origin**: [host-configs.md](host-configs.md) Q2 - **Origin**: [host-configs.md](host-configs.md) Q2
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects implementation of the GraphologyHostConfig - **Priority**: medium — affects implementation of the GraphologyHostConfig
- **Question**: Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering. - **Resolution**: No. All edge types share a single graph, with `edgeType` as a universal required attribute on every edge (consistent with the OQ-04 resolution). Separate graphs per edge type would add complexity (cross-graph traversal, cache coherence, multi-graph queries) for a marginal performance gain at current scale. Single-graph filtering by `edgeType` is O(n) on edges and negligible for expected graph sizes. If a concrete performance issue arises, a `Map<EdgeType, DirectedGraph>` internal index can be added as an optimization without changing the API. See ADR-006 for the full decision on `edgeType` consistency.
- **Notes**: Related to OQ-04 (edge type consistency at the schema level) but distinct — this is about the runtime graph structure, not the type design. Multiple graphs would make type-specific queries faster (no filtering) but increase complexity and memory usage.
- **Cross-references**: OQ-04 - **Cross-references**: OQ-04
### OQ-20: How should conditional edge conditions be represented? ### OQ-20: How should conditional edge conditions be represented?
- **Origin**: [schema.md](schema.md) Q3 - **Origin**: [schema.md](schema.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety - **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety
- **Options**: - **Resolution**: `condition: Type.Optional(Type.Unknown())` with documentation describing the two runtime forms. The condition field accepts:
- (a) `Type.Unknown()` with documentation (current). Pro: maximally flexible. Con: no type safety. 1. **String form** (`string`): A serializable reference to an operation name whose result determines the branch. Survives JSON round-trips.
- (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references. Pro: documents both forms. Con: functions don't serialize. 2. **Function form** (`(results: Record<string, CallResult>) => boolean`): A runtime-evaluated predicate. Does NOT survive JSON serialization.
- (c) A dedicated `ConditionSchema` that flowgraph defines. Pro: type safe, consistent. Con: may be overly prescriptive.
- **Notes**: The workflow-templates doc already specifies `Conditional.test` as `((results: Record<string, CallResult>) => boolean) | string`, and the host-configs doc notes that function props need runtime resolution. Option (b) seems like the pragmatic choice that matches the existing design, but the schema representation is what needs deciding. `@alkdev/typebox`'s `Type.Function()` defines serializable function input/output **schemas** (shapes), but `Conditional.test` predicates are runtime closures — they can't be represented as serializable function schemas. Using `Type.Function()` here would conflate the function's shape schema with the runtime closure itself. `Type.Unknown()` with clear documentation is the pragmatic choice for v1, accepting that JSON serialization only preserves the string form. A dedicated `ConditionSchema` can be introduced in v2 if template interchange needs schema-level condition descriptions, but only if there's a concrete use case for representing conditions as typed data (rather than as code).
- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This is a concrete sub-problem of OQ-06 (how the reactive host config bridges to execution). - **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This gap is resolved by ADR-005: `Conditional.test` reads from the result projection.
- **Cross-references**: OQ-05 (conditional branch behavior in reactive engine), OQ-06 (runtime resolution of function props) - **Cross-references**: OQ-05, OQ-06
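
A minimal sketch of how the two runtime forms could be evaluated, assuming the string form resolves to "the named operation has status `completed`" and that `negated` flips the outcome for else-branches. The `CallResult` shape shown here and the `evaluateCondition` name are hypothetical.

```typescript
interface CallResult {
  status: "pending" | "running" | "completed" | "failed";
  output?: unknown;
}

type ConditionFn = (results: Record<string, CallResult>) => boolean;

function evaluateCondition(
  condition: unknown,
  results: Record<string, CallResult>,
  negated: boolean = false,
): boolean {
  let outcome: boolean;
  if (typeof condition === "string") {
    // String form: treat the value as an operation name and test for completion.
    outcome = results[condition]?.status === "completed";
  } else if (typeof condition === "function") {
    // Function form: a runtime predicate over predecessor results.
    outcome = Boolean((condition as ConditionFn)(results));
  } else {
    throw new TypeError("condition must be a string or a function");
  }
  return negated ? !outcome : outcome;
}
```

The serialization caveat follows from standard JSON behavior: `JSON.stringify` omits function-valued properties, so only the string form is still present after a round-trip.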
### OQ-21: Should templates support explicit `depends_on` edges? ### OQ-21: Should templates support explicit `depends_on` edges?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q3 - **Origin**: [workflow-templates.md](workflow-templates.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects template composition expressiveness - **Priority**: medium — affects template composition expressiveness
- **Question**: Currently dependencies are inferred from structure (sequential implies dependency). An explicit `<DependsOn target="operation-name" />` component would make data dependencies visible in the template without relying on sequential ordering. - **Resolution**: No for v1. ADR-005's `dataFlow` inference and the result projection make explicit `depends_on` unnecessary for current use cases. Data dependencies are expressed through the result projection — if B needs A's output, B reads `getResult("A")`. The `dataFlow: true` attribute on edges captures which edges carry data. An explicit `<DependsOn>` component would add template syntax complexity and potentially conflict with structural ordering. If a future use case requires non-adjacent data dependencies that can't be expressed by restructuring the template, `<DependsOn>` can be added as a v2 extension. But v1 intentionally restricts dependencies to follow the structural flow.
- **Notes**: This would add expressiveness but also complexity. Implicit dependency from structure is simpler and covers the most common cases. Explicit `depends_on` would be needed when a node depends on a non-adjacent predecessor in a way that can't be expressed by a `Sequential` group. Can be deferred to v2. - **Cross-references**: OQ-08
- **Cross-references**: OQ-08 (call graph `depends_on` edges)
--- ---
@@ -285,26 +262,21 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`? ### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`?
- **Origin**: [schema.md](schema.md) Q2 - **Origin**: [schema.md](schema.md) Q2
- **Status**: open - **Status**: resolved
- **Priority**: medium — affects the `@alkdev/operations` peer dependency - **Priority**: medium — affects the `@alkdev/operations` peer dependency
- **Options**: - **Resolution**: Option (a) — import the `Identity` type structure from `@alkdev/operations` (peer dependency). Since `@alkdev/operations` is already a peer dependency (for `CallEventMapValue`), adding this type import creates minimal additional coupling. The `CallNodeAttrs.identity` field mirrors the `Identity` interface: `{ id, scopes, resources? }`. Version alignment is handled by semver ranges. The TypeBox schema for `identity` is defined inline in `CallNodeAttrs` to match the shape (not imported as a TypeBox schema from operations, since `Identity` is a TypeScript interface there), but the field semantics match exactly.
- (a) Import `Identity` from `@alkdev/operations` (peer dep). Pro: matches call protocol. Con: creates a direct type dependency.
- (b) Duplicate the type in flowgraph. Pro: no dependency. Con: divergence risk.
- (c) Use `Type.Record(Type.String(), Type.Array(Type.String()))` for the `resources` field. Pro: flexible. Con: weaker typing.
- **Notes**: Since `@alkdev/operations` is already a peer dependency for type imports, option (a) seems reasonable. The concern is version alignment, but semver ranges handle this. This could also be a `Type.Unknown()` with documentation, letting the consumer validate.
### OQ-23: Multiple graphs per `FlowGraph` instance? ### OQ-23: Multiple graphs per `FlowGraph` instance?
- **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15) - **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15)
- **Status**: open (duplicate of OQ-15 — see above) - **Status**: resolved (duplicate of OQ-15 — see above)
### OQ-24: Async analysis functions? ### OQ-24: Async analysis functions?
- **Origin**: [analysis.md](analysis.md) Q3 - **Origin**: [analysis.md](analysis.md) Q3
- **Status**: open - **Status**: resolved
- **Priority**: low — premature for current scale - **Priority**: low — premature for current scale
- **Question**: Should analysis functions be async for large graphs? Current graphs are small (50-200 nodes), synchronous is fine. - **Resolution**: No — synchronous is sufficient for current scale (10-200 nodes). Making functions async would add API complexity (Promise return types, async/await boilerplate) for no current benefit. If large graphs become common, `typeCompat()` and `buildTypeEdges()` can gain async variants alongside the synchronous ones.
- **Notes**: Can be deferred. If large graphs become common, async analysis can be added with an optional `async` variant.
--- ---
@@ -313,12 +285,10 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-25: Should the reactive graph support partial re-rendering? ### OQ-25: Should the reactive graph support partial re-rendering?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q3 - **Origin**: [reactive-execution.md](reactive-execution.md) Q3
- **Status**: open (blocked on ujsx reconciler) - **Status**: resolved
- **Priority**: low — blocked on ujsx reconciler implementation - **Priority**: low — blocked on ujsx reconciler, now resolved with clear path
- **Question**: If a template changes mid-execution, the ujsx reconciler could diff and apply changes. Currently only mount rendering is supported. - **Resolution**: Blocked on ujsx reconciler. When the reconciler is implemented, flowgraph gains re-rendering through the standard `prepareUpdate`/`commitUpdate` HostConfig methods. The event log persists across re-renders (ADR-005), so re-rendered nodes pick up where they left off. No special reactive-graph re-rendering logic is needed — the reconciler handles tree diffing, and the HostConfig applies mutations. For v1 (before the reconciler), the reactive tree is built once and torn down via `WorkflowReactiveRoot.dispose()`.
- **Known Gap** (from [host-configs.md](host-configs.md)): "ujsx Reconciler Not Yet Available" — the current `HostConfig` is mount-only: no incremental template updates, no `prepareUpdate`/`commitUpdate` flow. This gap is broader than just re-rendering. - **Cross-references**: OQ-05, host-configs.md "Known Gaps"
- **Notes**: This is entirely dependent on the ujsx reconciler, which is not yet implemented. The host-configs doc notes "currently mount-only." When the reconciler is available, flowgraph gets re-rendering "for free." This question should be revisited after the reconciler is implemented.
- **Cross-references**: OQ-05 (structural container handling during re-render), host-configs.md "Known Gaps"
--- ---
@@ -327,10 +297,40 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-26: How to handle version conflicts? ### OQ-26: How to handle version conflicts?
- **Origin**: [operation-graph.md](operation-graph.md) Q2 - **Origin**: [operation-graph.md](operation-graph.md) Q2
- **Status**: open - **Status**: resolved
- **Priority**: low — can be deferred to a versioning use case - **Priority**: low — confirmed as correct design, not a deferral
- **Question**: If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win? - **Resolution**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation can exist in the graph. This is intentional simplicity. Version conflicts are a niche concern that would add significant complexity (version-aware node keys like `namespace.name@version`, multi-version edges, version compatibility matrices) without a concrete use case. If versioning becomes needed, the node key format could be extended to `namespace.name@version`, but this is a significant change that requires careful consideration. For v1, the one-version-per-operation constraint is sufficient and keeps the key format simple and consistent.
- **Notes**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises.
---
## ADR-006: Edge Type Consistency and Single-Graph Architecture
**Status**: Accepted
**Context**: Two related questions (OQ-04, OQ-29) affect how edge types are represented in flowgraph:
- Should `edgeType` be a required attribute on all edges, or only on edges where it varies?
- Should `GraphologyHostConfig` produce separate graphs per edge type, or a single shared graph?
**Decision**:
1. `edgeType` is a required universal attribute on every edge, stored alongside (not inside) mode-specific attribute schemas.
2. All edge types share a single graphology `DirectedGraph` instance per `FlowGraph`.
3. Mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, `DependencyEdgeAttrs`) do **not** include `edgeType` — it's stored separately at the graphology level.
4. `TemplateEdgeAttrs` includes `edgeType` as a constrained union (`"sequential" | "conditional"`) because template edges need to distinguish their type for rendering.
**Rationale**:
- Consistent serialization/deserialization (graphology's native JSON export carries each edge's attributes, so `edgeType` round-trips with the edge)
- Uniform graphology queries and edge-type filtering across all graph modes
- The redundancy for operation graphs (`edgeType` is always `"typed"`) is a minor cost for significant consistency gains
- Separate graphs per edge type would add complexity (cross-graph traversal, cache coherence, multi-graph queries) without benefit at current scale
- Single-graph filtering by `edgeType` is O(n) on edges — negligible for expected graph sizes
**Consequences**:
- All `FlowGraph` instances store edges with `{ edgeType, ...modeSpecificAttrs }` at the graphology level
- Edge-type filtering is done via standard graphology attribute queries
- The `CallEdgeAttrs` union type is discriminated by `edgeType` at runtime (not by TypeBox schema validation, since both variants are empty objects)
- Serialization validation is a two-step process: (1) check that `edgeType` is present and matches the expected value for the graph mode, (2) validate remaining attributes against the mode-specific schema
- The `triggered` edge type gets the simple `${source}->${target}` key format; `depends_on` always gets the composite `${source}->${target}:depends_on` format (see schema.md Edge Key Convention)
- Future optimization (if needed) could add an internal `Map<EdgeType, Set<string>>` index without changing the public API
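
The two-step validation and the call-graph edge key convention can be sketched as below. The mode names, `validateEdge`, `edgeKey`, and the pluggable `RestValidator` are hypothetical; in practice step 2 would run the mode-specific TypeBox schema. Template edges are left out because `TemplateEdgeAttrs` carries `edgeType` in its own schema.

```typescript
type GraphMode = "operation" | "call"; // hypothetical mode names

const EDGE_TYPES_BY_MODE: Record<GraphMode, readonly string[]> = {
  operation: ["typed"],
  call: ["triggered", "depends_on"],
};

// Stand-in for a mode-specific schema check; returns a list of error messages.
type RestValidator = (rest: Record<string, unknown>) => string[];

function validateEdge(
  mode: GraphMode,
  attrs: Record<string, unknown>,
  validateRest: RestValidator,
): string[] {
  // Step 1: edgeType must be present and allowed for this graph mode.
  const { edgeType, ...rest } = attrs;
  if (typeof edgeType !== "string" || !EDGE_TYPES_BY_MODE[mode].includes(edgeType)) {
    return [`invalid edgeType for ${mode} graph: ${String(edgeType)}`];
  }
  // Step 2: validate the remaining attributes against the mode-specific schema.
  return validateRest(rest);
}

// Call-graph edge keys: simple form for triggered, composite for depends_on.
function edgeKey(source: string, target: string, edgeType: "triggered" | "depends_on"): string {
  return edgeType === "depends_on"
    ? `${source}->${target}:depends_on`
    : `${source}->${target}`;
}
```

Checking `edgeType` first matters for `CallEdgeAttrs`, where both variants are otherwise empty objects and a schema check alone cannot tell them apart.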
--- ---
@@ -340,70 +340,50 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
|----|----------|--------|----------|--------| |----|----------|--------|----------|--------|
| OQ-01 | All edges or only compatible edges? | operation-graph | high | resolved | | OQ-01 | All edges or only compatible edges? | operation-graph | high | resolved |
| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | resolved | | OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | resolved |
| OQ-03 | Subscription operations in type compat | operation-graph | medium | open | | OQ-03 | Subscription operations in type compat | operation-graph | medium | resolved |
| OQ-04 | `edgeType` on all edges? | schema | medium | open | | OQ-04 | `edgeType` on all edges? | schema | medium | resolved |
| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | resolved | | OQ-05 | Structural container transparency | workflow-templates, host-configs | high | resolved |
| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved | | OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved |
| OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved | | OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved |
| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved | | OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved |
| OQ-09 | Retries at signal level | reactive-execution | high | resolved | | OQ-09 | Retries at signal level | reactive-execution | high | resolved |
| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | resolved | | OQ-10 | Running nodes when predecessor fails | reactive-execution | high | resolved |
| OQ-11 | OR logic for preconditions | reactive-execution | medium | open | | OQ-11 | OR logic for preconditions | reactive-execution | medium | resolved |
| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open | | OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | resolved |
| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open | | OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | resolved |
| OQ-14 | Unknown `operationId` in call graph | call-graph | medium | open (proposed) | | OQ-14 | Unknown `operationId` in call graph | call-graph | medium | resolved |
| OQ-15 | Multiple graphs per instance | call-graph | low | open | | OQ-15 | Multiple graphs per instance | call-graph | low | resolved |
| OQ-16 | `filterByStatus` index | call-graph | low | open | | OQ-16 | `filterByStatus` index | call-graph | low | resolved |
| OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | open | | OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | resolved |
| OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | open | | OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | resolved |
| OQ-19 | `parallelGroups` with resource constraints | analysis | low | open | | OQ-19 | `parallelGroups` with resource constraints | analysis | low | resolved |
| OQ-20 | Conditional edge condition representation | schema | medium | open | | OQ-20 | Conditional edge condition representation | schema | medium | resolved |
| OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | open | | OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | resolved |
| OQ-22 | `CallNodeAttrs.identity` type | schema | medium | open | | OQ-22 | `CallNodeAttrs.identity` type | schema | medium | resolved |
| OQ-24 | Async analysis functions | analysis | low | open | | OQ-23 | Multiple graphs per instance | call-graph | low | resolved (duplicate of OQ-15) |
| OQ-25 | Partial re-rendering | reactive-execution | low | open (blocked) | | OQ-24 | Async analysis functions | analysis | low | resolved |
| OQ-26 | Operation version conflicts | operation-graph | low | open | | OQ-25 | Partial re-rendering | reactive-execution | low | resolved |
| OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | open (deferred) | | OQ-26 | Operation version conflicts | operation-graph | low | resolved |
| OQ-28 | Share analysis functions across instances? | flowgraph-api | low | open | | OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | resolved |
| OQ-29 | Separate graph per edge type? | host-configs | medium | open | | OQ-28 | Share analysis functions across instances? | flowgraph-api | low | resolved |
| OQ-29 | Separate graph per edge type? | host-configs | medium | resolved |
### Priority Assessment ### All Questions Resolved
**Resolved (ADR-005)**: All open questions have been resolved. The architecture is now fully specified and ready for implementation decomposition.
- ~~OQ-01: All edges or only compatible~~ — resolved: type-compat edges only on `dataFlow: true` edges
- ~~OQ-02: Type compatibility depth~~ — resolved: type checking only for state-transfer edges
- ~~OQ-05: Structural container transparency~~ — resolved: containers stay transparent, aggregate status is a projection
- ~~OQ-06: Template ↔ call protocol~~ — resolved: bridge through event log
- ~~OQ-07: Reactive engine owns call graph?~~ — resolved: both are projections of event log
- ~~OQ-08: Auto-populate `depends_on` from templates?~~ — resolved: unnecessary, data flows through result projection
- ~~OQ-09: Retries at signal level~~ — resolved: append events, not state mutations
- ~~OQ-10: Running node failure handling~~ — resolved: projection policy, default is running nodes continue
**High priority** (should resolve before implementation):
- (all high-priority questions have been resolved)
**Medium priority** (should resolve before v1 release):
- OQ-03, OQ-04, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29
**Low priority** (can defer or decide during implementation):
- OQ-13, OQ-15, OQ-16, OQ-18, OQ-19, OQ-24, OQ-25, OQ-26, OQ-27, OQ-28
### Cross-Cutting Themes ### Cross-Cutting Themes
These groups of questions interact with each other and should be resolved together: All cross-cutting theme groups have been resolved:
1. **~~Edge semantics group~~** (OQ-01, OQ-02, OQ-04): ~~All affect the operation graph's edge structure and the type compatibility API.~~ **Resolved by ADR-005.** OQ-01 and OQ-02 resolved (type checking only on `dataFlow: true` edges). OQ-04 remains open (edge type on all edges). 1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All resolved. Type checking only on `dataFlow: true` edges. `edgeType` is universal on all edges (ADR-006).
2. **~~Call protocol integration group~~** (OQ-06, OQ-07, OQ-08): ~~All about how flowgraph connects to the live call protocol.~~ **Resolved by ADR-005.** All three resolved: bridge through event log, projections instead of ownership, data flow through result projection. 2. **Call protocol integration group** (OQ-06, OQ-07, OQ-08): All resolved by ADR-005. Bridge through event log, projections instead of ownership, data flow through result projection.
3. **~~Failure semantics group~~** (OQ-09, OQ-10): ~~Both about how failure and retry propagate through the reactive engine.~~ **Resolved by ADR-005.** Retries are append events; running node failure is a projection policy. 3. **Failure semantics group** (OQ-09, OQ-10): All resolved by ADR-005. Retries are append events; running node failure is a projection policy.
4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints. 4. **Scheduling group** (OQ-11, OQ-12): All resolved. AND-only preconditions for v1, `maxConcurrency` via reactive counting semaphore.
5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All resolved. Containers stay transparent, `condition` as `Type.Unknown()` with documentation, no explicit `depends_on` for v1.
5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All about what the template system can express and how it renders. 6. **Graph structure group** (OQ-04, OQ-29): All resolved by ADR-006. Universal `edgeType` on all edges, single shared graph per `FlowGraph`.
6. **Graph structure group** (OQ-04, OQ-29): Both about how edge types are represented in the graph — OQ-04 at the schema/type level, OQ-29 at the runtime graph structure level. Resolution of one constrains the other.
7. **Known gaps from host-configs.md** — not all "known gaps" are "open questions" (the reconciler gap is a dependency, not a design question), but they should be tracked here for completeness.


@@ -1,6 +1,6 @@
--- ---
status: draft status: draft
last_updated: 2026-05-20 last_updated: 2026-05-22
--- ---
# Operation Graph (Static) # Operation Graph (Static)
@@ -169,13 +169,13 @@ See [analysis.md](analysis.md) for the full type-compatibility algorithm.
## Open Questions ## Open Questions
1. **Should `fromSpecs()` add ALL possible edges or only compatible ones?** The current design adds both compatible and incompatible edges. An alternative is to only add compatible edges, with a separate `potentialEdges()` query that computes incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information. 1. ~~**Should `fromSpecs()` add ALL possible edges or only compatible ones?**~~ **Resolved (OQ-001/ADR-005)**: Type-compatibility edges are only added on edges where `dataFlow: true`. Temporal-only edges bypass type checking. Both compatible and incompatible edges are added where data flows for diagnostic value.
2. **How to handle version conflicts?** If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win? The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation can exist in the graph. 2. ~~**How to handle version conflicts?**~~ **Resolved (OQ-026)**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation can exist in the graph. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises. If versioning becomes needed, the node key format could be extended to `namespace.name@version`, but this is a significant change that requires careful consideration. For v1, the one-version-per-operation constraint is sufficient and keeps the key format simple and consistent.
3. **Should subscription operations be treated differently?** A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different from query/mutation. Should the type compatibility check account for this? 3. ~~**Should subscription operations be treated differently in type compatibility?**~~ **Resolved (OQ-003)**: For v1, subscriptions are treated identically to queries/mutations in `typeCompat()`. A subscription's `outputSchema` describes a single stream element, and `typeCompat()` checks whether that single element is compatible with the downstream input. This is correct for the `Map` component (which processes stream elements individually) and may be misleading for direct subscription→operation connections. The `OperationNodeAttrs.type` field is available for consumers that need subscription-aware behavior. A v2 extension could add a `streaming: boolean` flag on edges to capture stream semantics explicitly, but this adds complexity without a current use case.
4. **How granular should type compatibility be?** The current `detail` field is a string. A more structured approach would be `{ compatible: boolean, mismatchPaths: string[] }` listing the specific JSON paths that don't match. This adds complexity but improves diagnostics. 4. ~~**How granular should type compatibility be?**~~ **Resolved (OQ-002/ADR-005)**: Type compatibility checking only applies to state-transfer edges (where `dataFlow: true`). The `typeCompat()` function returns `{ compatible, detail?, mismatches? }` for state-transfer edges only. Temporal-only edges bypass type checking entirely.
## References ## References


@@ -1,6 +1,6 @@
--- ---
status: draft status: draft
last_updated: 2026-05-21 last_updated: 2026-05-22
--- ---
# Reactive Execution # Reactive Execution
@@ -376,67 +376,7 @@ Both B and C become `ready` at the same time, and the hub starts them in paralle
### Join preconditions ### Join preconditions
When a node depends on multiple predecessors (e.g., D depends on both B and C completing): When a node depends on multiple predecessors (fork-join):
- D's preconditions: `B.status === "completed" && C.status === "completed"`
D only becomes `ready` when all predecessors complete. This is the "join" in fork-join parallelism.
## Failure Propagation
Failure propagation is the mechanism by which a failed or aborted node causes its downstream dependents to abort. The key design principle: **failure follows dependency edges, not structural scope**.
This means:
- In a `Sequential` group, failure propagates forward through the chain (B depends on A, so if A fails, B aborts)
- In a `Parallel` group, sibling branches are independent — a failure in branch A does NOT affect branch B, because there are no dependency edges between them
- A node that depends on multiple predecessors (a join) aborts only when it's impossible for its preconditions to ever be met
### The preconditions-failure duality
Each node has two complementary reactive computations:
1. **`preconditions`** (`computed<boolean>`) — true when all predecessors are `completed` or `skipped`. Node can start.
2. **`blockedByFailure`** (`computed<boolean>`) — true when any predecessor is `failed` or `aborted` and the failure is uncaught (not handled by a `Conditional`).
```typescript
const preconditions = computed(() => {
  const predecessors = graph.inNeighbors(node);
  return predecessors.every(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "completed" || status === "skipped";
  });
});

const blockedByFailure = computed(() => {
  const predecessors = graph.inNeighbors(node);
  return predecessors.some(pred => {
    const status = statusMap.get(pred)!.value;
    return status === "failed" || status === "aborted";
  });
});
```
When `blockedByFailure` becomes `true` and the node hasn't started (`idle` or `waiting`), the node transitions to `aborted`. This happens via an `effect()`:
```typescript
effect(() => {
  if (blockedByFailure.value && (status.value === "idle" || status.value === "waiting")) {
    status.value = "aborted";
  }
});
```
This cascade is automatic and reactive — when a predecessor fails, all downstream `blockedByFailure` computations re-evaluate, and their effects fire, aborting any waiting dependents.
### Sequential failure propagation
```
A (failed) → B (aborted) → C (aborted)
```
When A fails, B's `blockedByFailure` becomes true. B transitions from `waiting` to `aborted`. C's `blockedByFailure` then becomes true (B is now `aborted`). C transitions to `aborted`. The entire downstream chain aborts.
### Parallel independence
``` ```
┌── B (completed) ──┐ ┌── B (completed) ──┐
@@ -444,36 +384,33 @@ A (completed) ├── D (ready)
└── C (failed) ─────┘ └── C (failed) ─────┘
``` ```
When C fails:
- C's downstream dependents see `blockedByFailure = true`
- B is unaffected — it's on an independent branch
- D depends on both B and C. D's `preconditions` will never be met (C is `failed`, not `completed`). D's `blockedByFailure` is true (C is `failed`). D transitions to `aborted`.
But crucially, this is because D *depends on* C, not because they share a structural scope:
```
┌── B (completed) ──┐
A (completed) │ (no edge from C to E)
└── C (failed) ─────┘
└── E (completed)
```
E has no dependency on C. E continues running regardless of C's failure. **Failure follows dependency edges, not structural boundaries.**
### Join semantics
When a node depends on multiple predecessors (fork-join):
```
┌── B (completed) ──┐
A (completed) ├── D (aborted)
└── C (failed) ─────┘
```
D's `preconditions` requires both B and C to be completed/skipped. Since C is `failed`, D's preconditions can never be met. D transitions to `aborted`. D's `preconditions` requires both B and C to be completed/skipped. Since C is `failed`, D's preconditions can never be met. D transitions to `aborted`.
The alternative would be "partial success" — D starts with B's output even though C failed. This is NOT supported by the precondition model. If partial execution is needed, the template author should use a `Conditional` to handle the failure case explicitly. The alternative would be "partial success" — D starts with B's output even though C failed. This is NOT supported by the precondition model. If partial execution is needed, the template author should use a `Conditional` to handle the failure case explicitly.
### `maxConcurrency` for Parallel groups
A `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time, even though all preconditions are met. This is a scheduling constraint, not a structural one — the DAG doesn't encode it.
The `WorkflowReactiveRoot` enforces `maxConcurrency` via a reactive counting semaphore:
```typescript
// For each node in a Parallel group with maxConcurrency:
const groupKey = getParallelGroup(nodeId); // from parentMap/siblingMap
const maxConc = getMaxConcurrency(groupKey); // from template props
const siblings = getSiblings(groupKey); // assumed helper: node ids in the same Parallel group
const canStart = computed(() => {
  const siblingRunningCount = siblings.filter(
    sib => statusMap.get(sib)!.value === "running"
  ).length;
  return preconditions.value && siblingRunningCount < maxConc;
});
```
A node becomes `ready` only when both its `preconditions` are met AND the number of currently running siblings is below `maxConcurrency`. When a sibling completes and a slot opens, the next ready node starts.
For `Parallel` groups without `maxConcurrency` (the default), all siblings start immediately when their preconditions are met — no semaphore is needed.
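The gating rule above can be illustrated without a signals library. The following is a minimal sketch, not the `WorkflowReactiveRoot` implementation: `runningCount` and `canStart` are hypothetical plain functions, and the status map is an ordinary `Map` standing in for the reactive `statusMap`.

```typescript
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Hypothetical helper: counts siblings currently in the `running` state.
function runningCount(siblings: string[], statuses: Map<string, NodeStatus>): number {
  return siblings.filter((sib) => statuses.get(sib) === "running").length;
}

// A node may start only when its preconditions hold AND a slot is free.
function canStart(
  preconditionsMet: boolean,
  siblings: string[],
  statuses: Map<string, NodeStatus>,
  maxConcurrency: number,
): boolean {
  return preconditionsMet && runningCount(siblings, statuses) < maxConcurrency;
}

const siblings = ["a", "b", "c", "d"];
const statuses = new Map<string, NodeStatus>([
  ["a", "running"], ["b", "running"], ["c", "running"], ["d", "waiting"],
]);

// All three slots are taken, so "d" has to wait.
const blocked = canStart(true, siblings, statuses, 3);

// A sibling completes and a slot opens.
statuses.set("a", "completed");
const admitted = canStart(true, siblings, statuses, 3);
```

One point this sketch leaves open: if several waiting nodes observe the same free slot at once, something has to admit them one at a time. This section does not say how that is arbitrated, so treat it as an implementation detail to settle.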
### Conditional as error boundary ### Conditional as error boundary
A `Conditional` can catch a failure and redirect to a fallback path: A `Conditional` can catch a failure and redirect to a fallback path:
@@ -771,7 +708,8 @@ The `WorkflowErrorBoundary` catches errors that escape the signal graph (e.g., a
## Constraints ## Constraints
- **Events are the source of truth** (ADR-005) — the hub coordinator appends call protocol events. Status, results, and call graph state are derived from the event log. The coordinator does NOT directly set signal values. - **Events are the source of truth for call-level statuses** (ADR-005) — the hub coordinator appends call protocol events. Call-level statuses (`running`, `completed`, `failed`, `aborted` from `call.aborted`) are derived from the event log by the status projection. The coordinator does NOT directly set signal values for these statuses.
- **Workflow-derived statuses use signal mutation** — statuses that have no call protocol equivalent (`idle`, `waiting`, `ready`, `skipped`, and `aborted` from `blockedByFailure`) are set directly on signals by the reactive engine. This is not a violation of ADR-005's event-log principle — these statuses represent workflow-level concerns (scheduling, failure propagation) that exist outside the call protocol's scope. ADR-005's principle applies to *call protocol events*; it does not forbid the reactive layer from managing its own workflow-level state. See the "Hybrid Status Model" section for the full categorization.
- **Event processing is idempotent** — processing the same event twice produces the same projected state. The status projection scans for the most recent event per node. - **Event processing is idempotent** — processing the same event twice produces the same projected state. The status projection scans for the most recent event per node.
- **Signals are in-memory** — `WorkflowReactiveRoot` state is not persisted. If the hub restarts, the reactive state is reconstructed from call protocol events + template re-render. The event log itself can be reconstructed from the call protocol event stream. - **Signals are in-memory** — `WorkflowReactiveRoot` state is not persisted. If the hub restarts, the reactive state is reconstructed from call protocol events + template re-render. The event log itself can be reconstructed from the call protocol event stream.
- **Failure policy is configurable** — the `FailurePolicy` determines what happens to running nodes when a predecessor fails. Default is `continue-running` (only idle/waiting nodes abort). Alternative is `abort-dependents` (running dependents also abort). - **Failure policy is configurable** — the `FailurePolicy` determines what happens to running nodes when a predecessor fails. Default is `continue-running` (only idle/waiting nodes abort). Alternative is `abort-dependents` (running dependents also abort).
@@ -780,7 +718,8 @@ The `WorkflowErrorBoundary` catches errors that escape the signal graph (e.g., a
- **Abort is immediate in signals, delayed in protocol** — transitioning a signal to `aborted` is instant, but `prm.abort(requestId)` takes time to propagate through the call protocol. The hub should invoke both. - **Abort is immediate in signals, delayed in protocol** — transitioning a signal to `aborted` is instant, but `prm.abort(requestId)` takes time to propagate through the call protocol. The hub should invoke both.
- **`skipped` satisfies preconditions** — a `skipped` predecessor is treated as "completed for the purpose of preconditions." It means the branch was deliberately bypassed, not broken. - **`skipped` satisfies preconditions** — a `skipped` predecessor is treated as "completed for the purpose of preconditions." It means the branch was deliberately bypassed, not broken.
- **`failed` and `aborted` block preconditions** — a `failed` or `aborted` predecessor means the dependent's preconditions can never be met. The `blockedByFailure` effect transitions the dependent to `aborted`. - **`failed` and `aborted` block preconditions** — a `failed` or `aborted` predecessor means the dependent's preconditions can never be met. The `blockedByFailure` effect transitions the dependent to `aborted`.
- **`NodeStatus` and `CallStatus` share terminal states** — `running`, `completed`, `failed`, `aborted` map directly. `idle`, `waiting`, `ready`, `skipped` are workflow-specific additions. - **`NodeStatus` and `CallStatus` share terminal states** — `running`, `completed`, `failed`, `aborted` map directly. `idle`, `waiting`, `ready`, `skipped` are workflow-specific additions with no call protocol equivalent.
- **Edge key format uses composite keys for call graph** — `triggered` edges use `${source}->${target}`, `depends_on` edges use `${source}->${target}:depends_on`. See [schema.md](schema.md) for the full key convention.
## Lifecycle and Ownership ## Lifecycle and Ownership
@@ -872,15 +811,15 @@ The `ReactiveContext` passed to `ReactiveHostConfig` includes a reference to `wo
## Open Questions ## Open Questions
1. **Should preconditions support OR logic?** Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes." This would require an edge attribute or node-level configuration. 1. ~~**Should preconditions support OR logic?**~~ **Resolved (OQ-011)**: No for v1. All preconditions use AND logic — a node becomes `ready` only when ALL predecessors have reached a satisfying terminal state (`completed` or `skipped`). OR logic (`anyOf`) would introduce significant complexity (what happens when one predecessor completes but another fails? Is the node ready or blocked?) and is already partially addressed by `Conditional` (which provides branch-level either/or semantics). For v2, if OR logic becomes necessary, it should be added as a `preconditionMode: "allOf" | "anyOf"` attribute on `Operation` (node-level, not edge-level), defaulting to `"allOf"`. This is a clean extension point that doesn't change the current precondition model.
2. ~~**How are retries handled at the signal level?**~~ **Resolved by ADR-005**: Retries are natural append events. A retry creates a new `call.requested` with a new `requestId`. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed. See the Retry semantics section above. 2. ~~**How are retries handled at the signal level?**~~ **Resolved by ADR-005**: Retries are natural append events. A retry creates a new `call.requested` with a new `requestId`. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed. See the Retry semantics section above.
3. **Should the reactive graph support partial re-rendering?** If a template changes mid-execution (e.g., a step is added), the ujsx reconciler could diff the old and new trees. But the ReactiveHost only supports mount rendering. Re-rendering would require reconciler support. 3. ~~**Should the reactive graph support partial re-rendering?**~~ **Resolved (OQ-025)**: Blocked on ujsx reconciler. Currently mount-only. When the reconciler is implemented, flowgraph gains re-rendering through the standard `prepareUpdate`/`commitUpdate` HostConfig methods. The event log persists across re-renders (ADR-005), so re-rendered nodes pick up where they left off. No special reactive-graph re-rendering logic is needed — the reconciler handles tree diffing, and the HostConfig applies mutations.
4. **How does `maxConcurrency` interact with preconditions?** A `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time, even though all preconditions are met. This is a scheduling concern, not a structural one. The reactive layer could implement this as a semaphore signal, or it could be the coordinator's responsibility. 4. ~~**How does `maxConcurrency` interact with preconditions?**~~ **Resolved (OQ-012)**: `maxConcurrency` is a `Parallel` prop enforced by the `WorkflowReactiveRoot` via a counting semaphore in the reactive layer. When the root initializes signals for nodes in a `Parallel` group with `maxConcurrency: N`, it wraps the precondition logic: a node's effective `ready` transition requires both `preconditions.value === true` AND `runningCount < maxConcurrency`. The `runningCount` is a reactive computed derived from counting sibling nodes currently in the `running` state. This is entirely a reactive-engine concern — the DAG doesn't encode `maxConcurrency` (it's not structural), and the call graph doesn't need to know about it. The `Parallel` component's `maxConcurrency` prop is already part of the template definition; the reactive engine just needs to honor it.
5. **Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?** Currently the design has two separate computeds `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` that returns `"ready" | "blocked" | "failed"` or similar. This reduces the number of effects but makes the readiness check less composable. 5. ~~**Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?**~~ **Resolved (OQ-013)**: Keep two separate `computed` values (current design). Two separate computeds are more composable — you can check preconditions independently of failure status, and you can compose different effects for each. A single `computed<NodeReadiness>` would require every consumer to destructure the result, losing the clean `if (preconditions.value) { ... }` pattern. The implementation cost of two effects per node is negligible. The current design is the right one.
6. ~~**What happens to running nodes when a predecessor fails?**~~ **Resolved by ADR-005/OQ-010**: This is a `FailurePolicy` configuration of the projection. The default policy (`continue-running`) means running nodes continue. An alternative policy (`abort-dependents`) would abort running dependents. The event log makes both strategies expressible — only the projection logic changes. 6. ~~**What happens to running nodes when a predecessor fails?**~~ **Resolved by ADR-005/OQ-010**: This is a `FailurePolicy` configuration of the projection. The default policy (`continue-running`) means running nodes continue. An alternative policy (`abort-dependents`) would abort running dependents. The event log makes both strategies expressible — only the projection logic changes.

View File

@@ -1,6 +1,6 @@
--- ---
status: draft status: draft
last_updated: 2026-05-21 last_updated: 2026-05-22
--- ---
# Schema # Schema
@@ -216,7 +216,7 @@ const CallNodeAttrs = Type.Object({
message: Type.String(), message: Type.String(),
details: Type.Optional(Type.Unknown()), details: Type.Optional(Type.Unknown()),
})), })),
identity: Type.Optional(Type.Object({ // Caller identity identity: Type.Optional(Type.Object({ // Caller identity (OQ-022: imported from @alkdev/operations peer dep)
id: Type.String(), id: Type.String(),
scopes: Type.Array(Type.String()), scopes: Type.Array(Type.String()),
resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String()))), resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String()))),
@@ -235,6 +235,16 @@ The node key is `requestId`. This matches the call protocol's correlation mechan
## Edge Attribute Schemas ## Edge Attribute Schemas
### Edge Attribute Schemas
**Important**: `edgeType` is a universal required attribute stored on every edge in graphology, alongside (not inside) the mode-specific attribute schemas. This means the stored edge attributes are always `{ edgeType, ...modeSpecificAttrs }`. The TypeBox schemas below define only the mode-specific attributes; `edgeType` is added separately during edge creation and validated separately during deserialization.
When validating serialized graphs, the validation is a two-step process:
1. Check that `edgeType` is present and matches the expected value for the graph mode
2. Validate the remaining attributes against the mode-specific schema (`OperationEdgeAttrs`, `CallEdgeAttrs`, etc.)
This separation keeps the mode-specific schemas clean (they define only what's unique to each mode) while ensuring `edgeType` is always present at the storage level.
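The two-step validation could look roughly like the sketch below. It is deliberately library-free: `validateStoredEdge` and `isOperationEdgeAttrs` are hypothetical names, and the predicate passed as `validateModeAttrs` stands in for whatever TypeBox check the real code runs against `OperationEdgeAttrs`, `CallEdgeAttrs`, and so on.

```typescript
type StoredEdge = { edgeType?: unknown; [key: string]: unknown };
type ValidationResult = { ok: true } | { ok: false; error: string };

function validateStoredEdge(
  edge: StoredEdge,
  allowedEdgeTypes: readonly string[],
  validateModeAttrs: (attrs: Record<string, unknown>) => boolean,
): ValidationResult {
  // Step 1: edgeType must be present and valid for the graph mode.
  const { edgeType, ...modeAttrs } = edge;
  if (typeof edgeType !== "string" || !allowedEdgeTypes.includes(edgeType)) {
    return { ok: false, error: `unexpected edgeType: ${String(edgeType)}` };
  }
  // Step 2: the remaining attributes are checked against the mode-specific schema.
  if (!validateModeAttrs(modeAttrs)) {
    return { ok: false, error: "mode-specific attributes failed validation" };
  }
  return { ok: true };
}

// Stand-in for a schema check against OperationEdgeAttrs (assumption: only
// the required `compatible` flag is inspected here).
const isOperationEdgeAttrs = (attrs: Record<string, unknown>): boolean =>
  typeof attrs["compatible"] === "boolean";

const good = validateStoredEdge(
  { edgeType: "typed", compatible: true }, ["typed"], isOperationEdgeAttrs);
const missing = validateStoredEdge(
  { compatible: true }, ["typed"], isOperationEdgeAttrs);
const wrongMode = validateStoredEdge(
  { edgeType: "triggered" }, ["typed"], isOperationEdgeAttrs);
```

Stripping `edgeType` before step 2 means the mode-specific check never sees the universal attribute, which matches the "alongside, not inside" storage rule.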
### OperationEdgeAttrs (Operation Graph) ### OperationEdgeAttrs (Operation Graph)
```typescript ```typescript
@@ -252,7 +262,7 @@ type OperationEdgeAttrs = Static<typeof OperationEdgeAttrs>;
Type-compatibility edges carry a boolean `compatible` flag, an optional `detail` string, and optional structured `mismatches`. This allows the operation graph to include both compatible edges (green paths) and incompatible edges (red paths) for diagnostics. The `detail` field provides a human-readable summary, while `mismatches` provides machine-readable field-level diagnostics. The `TypeCompatResult` from `typeCompat()` populates both fields: `detail` for compatible edges and `mismatches` for incompatible ones. Type-compatibility edges carry a boolean `compatible` flag, an optional `detail` string, and optional structured `mismatches`. This allows the operation graph to include both compatible edges (green paths) and incompatible edges (red paths) for diagnostics. The `detail` field provides a human-readable summary, while `mismatches` provides machine-readable field-level diagnostics. The `TypeCompatResult` from `typeCompat()` populates both fields: `detail` for compatible edges and `mismatches` for incompatible ones.
**Edge type storage**: Operation graph edges always have `edgeType: "typed"` stored on the edge as a separate attribute alongside `OperationEdgeAttrs`. Graphology edges carry both the `OperationEdgeAttrs` (compatible, detail, mismatches) and the `edgeType` field. The `edgeType` is not inside `OperationEdgeAttrs` because it's a universal edge classification that applies to all edge types across all graph modes (operation, call, template). The `OperationEdgeAttrs` schema only defines the mode-specific attributes. **Edge type storage (OQ-004)**: `edgeType` is a required universal attribute stored on every edge, regardless of graph mode. This applies uniformly: operation graph edges have `edgeType: "typed"`, call graph edges have `edgeType: "triggered"` or `"depends_on"`, and template edges have `edgeType: "sequential"` or `"conditional"`. The `edgeType` field is stored alongside mode-specific attributes in graphology, not inside the mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, etc.). This ensures consistent serialization/deserialization, uniform graphology queries, and straightforward edge-type filtering. See ADR-006 for the full decision record.
```typescript ```typescript
// How operation graph edges are stored in graphology: // How operation graph edges are stored in graphology:
@@ -290,14 +300,23 @@ Data dependency edges also carry no additional attributes. Future extensions may
type CallEdgeAttrs = TriggeredEdgeAttrs | DependencyEdgeAttrs; type CallEdgeAttrs = TriggeredEdgeAttrs | DependencyEdgeAttrs;
``` ```
A union type used as the edge attribute type parameter for call graphs (`FlowGraph<CallNodeAttrs, CallEdgeAttrs>`). Call graph edges can be either `triggered` (parent-child) or `depends_on` (data dependency), distinguished by their edge type. The union type follows the `{GraphType}EdgeAttrs` naming pattern consistent with `OperationEdgeAttrs` and `TemplateEdgeAttrs`. A union type used as the edge attribute type parameter for call graphs (`FlowGraph<CallNodeAttrs, CallEdgeAttrs>`). Call graph edges can be either `triggered` (parent-child) or `depends_on` (data dependency), distinguished by their `edgeType` attribute.
**Runtime discrimination**: Since `TriggeredEdgeAttrs` and `DependencyEdgeAttrs` are both empty objects, the union cannot be discriminated by TypeBox at the schema level. Instead, `edgeType` serves as the runtime discriminant. When validating serialized call graph edges, the two-step validation process applies:
1. Read `edgeType` to determine which variant applies (`"triggered"` → `TriggeredEdgeAttrs`, `"depends_on"` → `DependencyEdgeAttrs`)
2. Validate the remaining attributes against the corresponding schema
At the code level, `edgeType` is used in a switch/if statement to determine which type of call edge is being processed. The `addCall` method automatically sets `edgeType: "triggered"` when creating a triggered edge, and `addDependency` sets `edgeType: "depends_on"`.
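As an illustration of the switch-based discrimination, here is a small sketch. `StoredCallEdge` and `callEdgeRole` are hypothetical names, and the `never` assignment is simply the usual TypeScript exhaustiveness idiom, not something the design mandates.

```typescript
type CallEdgeType = "triggered" | "depends_on";

// Both attribute variants are empty, so edgeType is the only discriminant.
type StoredCallEdge = { edgeType: CallEdgeType };

function callEdgeRole(edge: StoredCallEdge): "parent-child" | "data-dependency" {
  switch (edge.edgeType) {
    case "triggered":
      return "parent-child";
    case "depends_on":
      return "data-dependency";
    default: {
      // Compile-time guard: adding a new edgeType without a case fails here.
      const unreachable: never = edge.edgeType;
      throw new Error(`unknown edgeType: ${String(unreachable)}`);
    }
  }
}

const triggeredRole = callEdgeRole({ edgeType: "triggered" });
const dependencyRole = callEdgeRole({ edgeType: "depends_on" });
```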
**`depends_on` edge status (ADR-005)**: While `depends_on` edges are not auto-populated by the call protocol (ADR-005 resolves OQ-008: data dependencies flow through the result projection), they remain in the API for **observability and visualization**. A hub coordinator or external tool may add `depends_on` edges to annotate observed data flow between calls for debugging or monitoring purposes. They do NOT affect execution — the reactive engine derives data flow from the result projection, not from `depends_on` edges.
### TemplateEdgeAttrs (Workflow Templates) ### TemplateEdgeAttrs (Workflow Templates)
```typescript ```typescript
const TemplateEdgeAttrs = Type.Object({ const TemplateEdgeAttrs = Type.Object({
edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]), edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
condition: Type.Optional(Type.Unknown()), // For conditional edges: the condition function or expression condition: Type.Optional(Type.Unknown({ description: "For conditional edges: a function ((results: Record<string, CallResult>) => boolean) or a string referencing an operation name. Function values are not JSON-serializable; use string form for persistence." })),
negated: Type.Optional(Type.Boolean({ description: "True if this edge represents the negated condition of a Conditional's else branch" })),
dataFlow: Type.Optional(Type.Boolean({ default: false, description: "Whether this edge carries data (state transfer) or only ordering (temporal notification)" })), dataFlow: Type.Optional(Type.Boolean({ default: false, description: "Whether this edge carries data (state transfer) or only ordering (temporal notification)" })),
}); });
type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>; type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>;
@@ -305,11 +324,29 @@ type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>;
Template edges carry an `edgeType` to distinguish sequential flow from conditional branching. Conditional edges optionally store a `condition` that determines whether the target node executes. Template edges carry an `edgeType` to distinguish sequential flow from conditional branching. Conditional edges optionally store a `condition` that determines whether the target node executes.
**`condition` representation (OQ-020)**: The `condition` field uses `Type.Unknown()` at the schema level for maximum flexibility, with two runtime representations:
1. **String form** (`string`): A serializable reference to an operation name whose result determines the branch. Example: `"fetch-data"` means "check the result of the operation named 'fetch-data'". String conditions survive JSON round-trips and are resolved by the HostConfig at render time using the operation registry.
2. **Function form** (`(results: Record<string, CallResult>) => boolean`): A runtime-evaluated predicate that receives predecessor results and returns `true` (then-branch) or `false` (else-branch). Function conditions do NOT survive JSON serialization. They are evaluated by the reactive engine against the result projection (per ADR-005).
The `Type.Unknown()` schema representation is intentional — it matches the reality that conditions can be either strings or functions, and neither TypeBox's `Type.String()` alone nor `Type.Function()` alone captures both forms. `@alkdev/typebox`'s `Type.Function()` defines input/output schemas for serializable function shapes, but the `Conditional.test` predicate is a runtime closure, not a serializable function schema. If a future need arises for schema-level condition descriptions (e.g., for template interchange), a dedicated `ConditionSchema` can be introduced — but for v1, `Type.Unknown()` with documentation is the pragmatic choice.
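To make the two runtime forms concrete, the sketch below evaluates either one. It rests on several assumptions: the `CallResult` shape is a minimal stand-in (the real type is not shown in this section), `evaluateCondition` is a hypothetical helper, and the string form is resolved as "the named operation's status is `completed`", following the resolution rule stated in the commit message.

```typescript
// Assumed minimal shape; the real CallResult is defined elsewhere.
type CallResult = { status: string; output?: unknown };
type Results = Record<string, CallResult>;
type Condition = string | ((results: Results) => boolean);

function evaluateCondition(
  condition: Condition,
  results: Results,
  negated = false, // mirrors the `negated` flag on else-branch edges
): boolean {
  const raw =
    typeof condition === "string"
      ? results[condition]?.status === "completed" // string form: operation name
      : condition(results);                        // function form: runtime predicate
  return negated ? !raw : raw;
}

const results: Results = {
  "fetch-data": { status: "completed", output: { rows: 3 } },
  "send-mail": { status: "failed" },
};

const thenBranch = evaluateCondition("fetch-data", results);
const elseBranch = evaluateCondition("fetch-data", results, true);
const viaFunction = evaluateCondition(
  (r) => r["send-mail"]?.status === "completed",
  results,
);
```

Only the string form survives a JSON round-trip, so a persisted template would carry `"fetch-data"` rather than the closure.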
**`dataFlow` attribute (ADR-005)**: Distinguishes temporal-only edges from state-transfer edges. This attribute is critical for type compatibility checking: **`dataFlow` attribute (ADR-005)**: Distinguishes temporal-only edges from state-transfer edges. This attribute is critical for type compatibility checking:
- **`dataFlow: false`** (default): The edge expresses temporal ordering only — the downstream node starts after the upstream node completes, but doesn't read the upstream node's output. No type compatibility check is needed. - **`dataFlow: false`** (default): The edge expresses temporal ordering only — the downstream node starts after the upstream node completes, but doesn't read the upstream node's output. No type compatibility check is needed.
- **`dataFlow: true`**: The edge carries data — the downstream node reads the upstream node's output via `Conditional.test`, `Map.over`, or `Operation.input`. Type compatibility checking (`typeCompat()`) should verify that the upstream output schema is compatible with the downstream input schema. - **`dataFlow: true`**: The edge carries data — the downstream node reads the upstream node's output via `Conditional.test`, `Map.over`, or `Operation.input`. Type compatibility checking (`typeCompat()`) should verify that the upstream output schema is compatible with the downstream input schema.
The `dataFlow` attribute is **inferred** by the `GraphologyHostConfig` during template rendering. For v1, the inference uses a **conservative strategy**: an edge gets `dataFlow: true` when any of the following conditions are detected, and `dataFlow: false` (the default) otherwise:
1. A `Conditional` edge always gets `dataFlow: true` (conditions always read a predecessor's result).
2. A `Sequential` edge where the downstream node's `input` function references `results[...]` gets `dataFlow: true`.
3. A `Sequential` edge where a `Map.over` function references `results[...]` on the predecessor gets `dataFlow: true`.
Edges where `dataFlow` cannot be determined (e.g., `Operation.input` is an opaque function that can't be statically analyzed) default to `dataFlow: false`. Template authors can override this by explicitly providing `dataFlow: true` as an edge attribute if they know the downstream node reads upstream output.
Over-marking `dataFlow: true` is safe: it only causes an unnecessary type compatibility check. Under-marking is riskier: it skips a check that would usually have passed, but it can let a type-incompatible connection through unchecked. The conservative strategy nevertheless errs on the side of under-marking, so an edge is flagged only when data flow is positively detected.
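The inference rules above can be sketched roughly as follows. This is a minimal illustration, not the `GraphologyHostConfig` implementation: the names `EdgeInferenceInput`, `referencesUpstream`, and `inferDataFlow` are hypothetical, and detecting a `results[...]` reference by inspecting the function's source text is an assumption made only for this example.

```typescript
// Hypothetical shapes for illustration; none of these names come from flowgraph.
type Results = Record<string, unknown>;
type EdgeKind = "sequential" | "conditional";

interface EdgeInferenceInput {
  kind: EdgeKind;
  upstreamKey: string;
  // The downstream `Operation.input` or `Map.over` function, when one exists.
  downstreamExpr?: (results: Results) => unknown;
  // Explicit override supplied by the template author.
  explicitDataFlow?: boolean;
}

// Assumption: a reference to the predecessor is detected by looking for
// results["key"] or results['key'] in the function's source text.
function referencesUpstream(
  fn: (results: Results) => unknown,
  upstreamKey: string,
): boolean {
  const src = fn.toString();
  return (
    src.includes(`results["${upstreamKey}"]`) ||
    src.includes(`results['${upstreamKey}']`)
  );
}

function inferDataFlow(edge: EdgeInferenceInput): boolean {
  // An explicit author override wins.
  if (edge.explicitDataFlow === true) return true;
  // Rule 1: conditional edges always read a predecessor's result.
  if (edge.kind === "conditional") return true;
  // Rules 2 and 3: a detectable reference to the predecessor's result.
  if (edge.downstreamExpr && referencesUpstream(edge.downstreamExpr, edge.upstreamKey)) {
    return true;
  }
  // Conservative default: anything undetected stays temporal-only.
  return false;
}
```

An opaque function that never mentions `results["..."]` literally falls through to `false`, which is the under-marking case described above.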
The `dataFlow` attribute is **inferred** by the `GraphologyHostConfig` during template rendering, not manually specified by template authors:
- A `Sequential` edge where the downstream node references `results["upstreamNode"]` in any expression gets `dataFlow: true`
@@ -408,6 +445,10 @@ For example, a `depends_on` edge in the call graph uses `"req_abc123->req_def456
Since `multi: false`, there can be at most one edge per key. The composite key format ensures deterministic keys even when multiple edge types connect the same pair.
**Key priority convention**: When multiple edge types exist between the same (source, target) pair, the "primary" edge type gets the simple `${source}->${target}` key format. For call graphs, `triggered` edges are primary (a parent always triggers its child before any data dependency is established), so `triggered` edges use the simple format. For operation graphs and template DAGs, there is only one edge type per (source, target) pair, so the simple format always applies.
**`depends_on` edge key format**: `depends_on` edges always use the composite format `${source}->${target}:depends_on`, even if no `triggered` edge exists between the same pair. This ensures key consistency regardless of edge ordering.
This is an exception to the simple `${source}->${target}` pattern, but it's necessary for the call graph's dual-edge-type scenario. If multi-edge support becomes more broadly needed, the constraint can be relaxed and a uniform composite key format adopted.
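As a small sketch of the key convention for call graphs, assuming a hypothetical helper named `callEdgeKey` (not part of the documented API): `triggered` edges take the simple key, and `depends_on` edges always take the composite key.

```typescript
// Hypothetical helper illustrating the call-graph edge key convention.
type CallEdgeType = "triggered" | "depends_on";

function callEdgeKey(source: string, target: string, edgeType: CallEdgeType): string {
  const simple = `${source}->${target}`;
  // depends_on always uses the composite form, even without a triggered edge
  // between the same pair, so the key never depends on insertion order.
  return edgeType === "depends_on" ? `${simple}:depends_on` : simple;
}

// Both edge types can coexist between the same pair without a key collision.
const edgeKeys = new Set([
  callEdgeKey("req_abc123", "req_def456", "triggered"),
  callEdgeKey("req_abc123", "req_def456", "depends_on"),
]);
```

Because the two keys differ, both edges fit in a graph configured with `multi: false`.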
## Constraints
@@ -423,11 +464,11 @@ This is an exception to the simple `${source}->${target}` pattern, but it's nece
## Open Questions
-1. **Should `edgeType` be a required field on ALL edges, or only on call graph and template edges?** Operation graph edges are always `typed`, so requiring an explicit `edgeType` attribute there is redundant. Options: (a) make `edgeType` required on all edges, (b) have separate edge attribute types per graph mode, (c) use a union type on edge attributes and let the consumer tag the edge.
+1. ~~**Should `edgeType` be a required field on ALL edges, or only on call graph and template edges?**~~ **Resolved (OQ-004)**: `edgeType` is required on all edges, stored as a universal attribute alongside mode-specific attributes. The mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, `DependencyEdgeAttrs`) do NOT include `edgeType`; it's stored separately in graphology at the same level as the mode-specific attributes. This ensures consistent serialization/deserialization, uniform graphology queries, and straightforward edge-type filtering across all graph modes. See ADR-006.
-2. **Should `CallNodeAttrs.identity` be a `Type.Record` or the structured `Identity` type from operations?** The structured type matches the call protocol and storage schema but creates a dependency on `@alkdev/operations` types. Options: (a) import `Identity` from operations (peer dep), (b) duplicate the type in flowgraph, (c) use `Type.Record` and accept weaker typing.
+2. ~~**Should `CallNodeAttrs.identity` be a `Type.Record` or the structured `Identity` type from operations?**~~ **Resolved (OQ-022)**: Import the `Identity` type structure from `@alkdev/operations` (peer dependency). Since `@alkdev/operations` is already a peer dependency (for `CallEventMapValue`), adding this type import creates minimal additional coupling. The `CallNodeAttrs.identity` field mirrors the `Identity` interface: `{ id, scopes, resources? }`. Version alignment is handled by semver ranges. The TypeBox schema for `identity` is defined inline in `CallNodeAttrs` to match the shape (not imported as a TypeBox schema, since `@alkdev/operations` defines `Identity` as a TypeScript interface), but the field semantics match exactly.
-3. **How should conditional edge conditions be represented?** `condition: Type.Unknown()` is maximally flexible but provides no type safety. Options: (a) `Type.Unknown()` with documentation, (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references, (c) a dedicated `ConditionSchema` that flowgraph defines.
+3. ~~**How should conditional edge conditions be represented?**~~ **Resolved (OQ-020)**: `condition: Type.Optional(Type.Unknown())` with documentation describing the two runtime forms: string (serializable operation reference) and function (`(results) => boolean`, not serializable). `@alkdev/typebox`'s `Type.Function()` defines serializable function input/output schemas, but `Conditional.test` predicates are runtime closures, so they can't be represented as serializable function schemas. `Type.Unknown()` is the pragmatic choice for v1, accepting that JSON serialization only preserves the string form. A dedicated `ConditionSchema` can be introduced in v2 if template interchange needs schema-level condition descriptions.
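The serialization caveat in the OQ-020 resolution can be seen with plain `JSON.stringify`, which omits function-valued properties. The sketch below uses a hypothetical `ConditionalEdgeAttrs` interface as a plain TypeScript stand-in for the TypeBox schema; it is not the actual `TemplateEdgeAttrs` definition.

```typescript
// Hypothetical stand-in types; the real schema uses Type.Optional(Type.Unknown()).
type Results = Record<string, { status: string } | undefined>;
type Condition = string | ((results: Results) => boolean);

interface ConditionalEdgeAttrs {
  edgeType: "conditional";
  condition?: Condition;
}

const stringEdge: ConditionalEdgeAttrs = {
  edgeType: "conditional",
  condition: "fetch-data",
};

const functionEdge: ConditionalEdgeAttrs = {
  edgeType: "conditional",
  condition: (results) => results["fetch-data"]?.status === "completed",
};

// JSON.stringify drops function-valued properties, so only the string form
// of a condition survives a serialize/deserialize round trip.
function roundTrip(attrs: ConditionalEdgeAttrs): Partial<ConditionalEdgeAttrs> {
  return JSON.parse(JSON.stringify(attrs));
}

const restoredString = roundTrip(stringEdge);
const restoredFunction = roundTrip(functionEdge);
```

After the round trip, `restoredString.condition` is still `"fetch-data"`, while `restoredFunction` has no `condition` property at all.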
## References ## References


@@ -1,6 +1,6 @@
---
status: draft
-last_updated: 2026-05-21
+last_updated: 2026-05-22
---
# Workflow Templates
@@ -130,13 +130,26 @@ When rendered to a graphology DAG, `Conditional` creates an edge with `edgeType:
If the test evaluates to `false` and no `else` branch is provided, the branch nodes transition to `skipped` in `NodeStatus`.
#### String condition resolution
When `Conditional.test` is a string (rather than a function), the HostConfig resolves it at render time using the operation registry. The resolution algorithm is:
- `test: "operationName"` → resolves to `(results) => results["operationName"]?.status === "completed"`, meaning "the then-branch is taken if the referenced operation completed successfully."
- If the referenced operation failed or was aborted, the condition evaluates to `false` and the else-branch is taken (or the then-branch is `skipped` if no else-branch).
- String conditions can only reference predecessor operations by name. For more complex conditions (checking output fields, combining multiple results, etc.), use the function form.
This resolution algorithm is deterministic and produces the same behavior regardless of which HostConfig performs the resolution.
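A minimal sketch of this resolution step, assuming a hypothetical `resolveTest` helper and a simplified result shape with only a `status` field (the status values other than `"completed"` are illustrative):

```typescript
// Hypothetical, simplified types for illustration.
type NodeResult = { status: string };
type Results = Record<string, NodeResult | undefined>;
type Predicate = (results: Results) => boolean;

function resolveTest(test: string | Predicate): Predicate {
  // Function form: used as-is.
  if (typeof test === "function") return test;
  // String form: true only when the named predecessor completed successfully.
  const operationName: string = test;
  return (results) => results[operationName]?.status === "completed";
}

const shouldTakeThen = resolveTest("fetch-data");
```

With this sketch, a failed, aborted, or missing predecessor result all resolve to `false`, which matches the behavior described above.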
#### Else-branch behavior
When the `else` prop is provided, the `Conditional` renders two subgraphs:
**DAG rendering (GraphologyHostConfig)**:
- The `then` branch (child) renders with an edge from the conditional's predecessor to the first child, with `edgeType: "conditional"` and `condition: <test>`.
-- The `else` branch renders as a separate subgraph with `edgeType: "conditional"` and `condition: <negated test>`. The negated condition is derived automatically.
+- The `else` branch renders as a separate subgraph with `edgeType: "conditional"`, `condition: <test>`, and `negated: true`. The `negated` flag on `TemplateEdgeAttrs` indicates that the condition is logically negated for the else-branch. At render time, the HostConfig resolves the negation differently depending on the condition form:
- **String condition**: `condition: "fetch-data"` with `negated: true` resolves to `(results) => results["fetch-data"]?.status !== "completed"`.
- **Function condition**: The HostConfig wraps the original function: `condition: (results) => !originalTest(results)`.
- This ensures the else-branch is taken when the original condition evaluates to `false`, regardless of condition form.
- Both branches share the same predecessor: the `Conditional` node's structural position in the template determines the common starting point.
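The `negated` handling for both condition forms might look like the sketch below. The helper names `completed` and `resolveEdgeCondition` are hypothetical, and the result shape is simplified to a `status` field.

```typescript
// Hypothetical, simplified types for illustration.
type NodeResult = { status: string };
type Results = Record<string, NodeResult | undefined>;
type Predicate = (results: Results) => boolean;

// String form: the named predecessor must have completed successfully.
function completed(operationName: string): Predicate {
  return (results) => results[operationName]?.status === "completed";
}

function resolveEdgeCondition(condition: string | Predicate, negated = false): Predicate {
  const base: Predicate = typeof condition === "function" ? condition : completed(condition);
  // negated: true wraps the resolved predicate, so for the string form this is
  // equivalent to checking status !== "completed".
  return negated ? (results) => !base(results) : base;
}

const thenBranch = resolveEdgeCondition("fetch-data");
const elseBranch = resolveEdgeCondition("fetch-data", true);
```

For any given results, exactly one of `thenBranch` and `elseBranch` returns `true`, so the two subgraphs are mutually exclusive.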
**Reactive rendering (ReactiveHostConfig)**:
@@ -378,9 +391,9 @@ Not all component combinations are valid. The following rules govern which compo
1. ~~**Should `Sequential` and `Parallel` be transparent in the graph?**~~ **Resolved (OQ-05)**: Containers stay transparent. No nodes for `Sequential`, `Parallel`, or `Conditional` in the DAG. Aggregate status for containers is computed as a projection from children's statuses. The `parentMap` and `siblingMap` in `ReactiveContext` provide the structural context for precondition computation.
-2. ~~**Should templates support loops?**~~ **Resolved**: The `<Map>` component provides array iteration, one child per array element. It does NOT support general loops (while, do-while). For repeated execution with conditional exit, use `Conditional` inside a `Sequential` group. General-purpose loops with arbitrary termination conditions are not supported because they would require cycle-supporting templates, which conflicts with the DAG-only invariant.
+2. ~~**Should templates support loops?**~~ **Resolved**: The `<Map>` component provides array iteration, one child per array element. It does NOT support general loops (while, do-while). For repeated execution with conditional exit, use `Conditional` inside a `Sequential` group. General-purpose loops are not supported because they would require cycle-supporting templates, which conflicts with the DAG-only invariant.
-3. **Should templates support `depends_on` edges explicitly?** Currently dependencies are inferred from structure (sequential implies dependency). An explicit `<DependsOn target="operation-name" />` component would make data dependencies visible in the template without relying on sequential ordering. With ADR-005's `dataFlow` attribute, data dependencies are now inferable from template expressions: `Conditional.test` and `Map.over` that reference predecessor results set `dataFlow: true` on the corresponding edge. Explicit `depends_on` edges would add manual annotation capability, but the `dataFlow` inference may be sufficient for v1.
+3. ~~**Should templates support `depends_on` edges explicitly?**~~ **Resolved (OQ-021)**: No for v1. ADR-005's `dataFlow` inference and the result projection make explicit `depends_on` unnecessary for current use cases. Data dependencies are expressed through the result projection: if B needs A's output, B reads `getResult("A")`. The `dataFlow: true` attribute on edges captures which edges carry data. An explicit `<DependsOn>` component would add template syntax complexity and potentially conflict with structural ordering. If a future use case requires non-adjacent data dependencies that can't be expressed by restructuring the template, `<DependsOn>` can be added as a v2 extension. But v1 intentionally restricts dependencies to follow the structural flow.
4. ~~**How does template instantiation interact with the call protocol?**~~ **Resolved (ADR-005)**: The template bridges to the call protocol through the event log. The hub coordinator appends call protocol events; the reactive layer projects them. Each `<Operation>` node's `requestId` maps to call protocol events via the `nodeKeyToRequestId` map. No callback, no boomerang: the event log is the bridge.