resolve all remaining open questions (OQ-03–OQ-29), add ADR-006

Resolve all 19 remaining open questions across the architecture. Every
question now has a documented resolution with rationale:

- OQ-04/OQ-29: edgeType is a universal required attribute on all edges,
  single graph per FlowGraph instance (ADR-006)
- OQ-11: No OR preconditions for v1; preconditionMode as v2 extension
- OQ-12: maxConcurrency enforced via reactive counting semaphore
- OQ-14: Unknown operationId creates node with pending status
- OQ-17: Expose common graphology traversal methods on FlowGraph (80/20)
- OQ-20: condition as Type.Unknown() with string/function documentation
- OQ-22: Identity imported from @alkdev/operations peer dep
- All other questions resolved with documented rationale

Fix three critical issues found by architecture review:
1. edgeType serialization/validation gap: document two-step validation
2. CallEdgeAttrs runtime discrimination: edgeType as runtime discriminant,
   depends_on edges clarified as observability-only (not execution)
3. ADR-005 signal mutation inconsistency: explicitly distinguish call-level
   statuses (event-log-driven) from workflow-derived statuses (signal-mutation)

Additional clarifications:
- dataFlow inference uses conservative strategy (defaults false)
- Conditional.test string resolution: operationName → status === completed
- Add negated field to TemplateEdgeAttrs for else-branch conditions
- Document edge key priority convention for composite keys
- Add maxConcurrency semaphore design to reactive-execution.md
2026-05-21 09:25:55 +00:00
parent c76be7f689
commit f3e084d02f
9 changed files with 239 additions and 268 deletions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-05-20
last_updated: 2026-05-22
---
# Open Questions Tracker
@@ -50,22 +50,18 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-03: Should subscription operations be treated differently in type compatibility?
- **Origin**: [operation-graph.md](operation-graph.md) Q3
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects operation graph edge semantics for streaming operations
- **Question**: A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different. Should type compat check for subscriptions account for this?
- **Notes**: This has downstream implications for call-graph population (subscriptions produce multiple `call.responded` events) and template authoring (a subscription feeding into a mutation has different semantics than a query feeding into a mutation). May want to defer to v2 but should at least document the current behavior (subscriptions are treated the same as queries/mutations).
- **Resolution**: For v1, subscriptions are treated identically to queries/mutations in `typeCompat()`. A subscription's `outputSchema` describes a single stream element, and `typeCompat()` checks whether that single element is compatible with the downstream input. This is correct for `Map` (which processes stream elements individually) and may be misleading for direct subscription→operation connections. The `OperationNodeAttrs.type` field is available for consumers that need subscription-aware behavior. A v2 extension could add a `streaming: boolean` flag on edges to capture stream semantics explicitly, but this adds complexity without a current use case.
- **Cross-references**: OQ-01
### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges?
- **Origin**: [schema.md](schema.md) Q1
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects serialization format and edge handling across all graph types
- **Options**:
- (a) `edgeType` required on all edges. Pro: consistent, self-describing. Con: operation graph edges are always `typed`, making the field redundant there.
- (b) Separate edge attribute types per graph mode (current implicit design — `CallEdgeAttrs` is a union, `OperationEdgeAttrs` doesn't include edge type). Con: graphology edges must carry attributes from a single schema.
- (c) Union type on edge attributes, letting the consumer tag the edge. Pro: flexible. Con: runtime discrimination burden.
- **Notes**: The current schema already stores `edgeType` alongside the edge-specific attributes in graphology (see schema.md's "Edge type storage" section), which is effectively option (a) at the storage level. The question is really about the TypeScript type API: should `OperationEdgeAttrs` include `edgeType: "typed"` or should that be a separate concern?
- **Cross-references**: OQ-01 (if incompatible edges exist, they need tagging)
- **Resolution**: Option (a) — `edgeType` is required on all edges. The mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, `DependencyEdgeAttrs`) do NOT include `edgeType` — it is stored as a universal attribute alongside the mode-specific attributes in graphology. This ensures consistent serialization/deserialization, uniform graphology queries, and straightforward edge-type filtering across all graph modes. The redundancy for operation graphs (where `edgeType` is always `"typed"`) is a minor ergonomic cost for significant consistency gains. See ADR-006 for the full decision record.
- **Cross-references**: OQ-29
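
A minimal sketch of the storage layout this resolution describes: mode-specific attribute types never declare `edgeType`, and the universal attribute is layered on when the edge is stored. `EdgeStore`, `ModeAttrs`, and `StoredEdge` are hypothetical names; a plain `Map` stands in for a graphology `DirectedGraph`'s edge store so the example runs without dependencies.

```typescript
// Mode-specific attributes (OperationEdgeAttrs, DependencyEdgeAttrs, ...) never
// declare edgeType themselves; it is added at storage time.
type ModeAttrs = Record<string, unknown>;

// What every stored edge carries: the universal edgeType plus the mode attrs.
type StoredAttrs = ModeAttrs & { edgeType: string };

interface StoredEdge {
  source: string;
  target: string;
  attributes: StoredAttrs;
}

// Hypothetical stand-in for a graphology DirectedGraph's edge storage.
class EdgeStore {
  private readonly edges = new Map<string, StoredEdge>();

  addEdge(key: string, source: string, target: string, edgeType: string, attrs: ModeAttrs = {}): void {
    // Spread first so the universal edgeType always wins over a stray mode attribute.
    this.edges.set(key, { source, target, attributes: { ...attrs, edgeType } });
  }

  // One O(E) scan over a single graph; no per-type graphs are needed.
  edgesOfType(edgeType: string): string[] {
    const keys: string[] = [];
    this.edges.forEach((edge, key) => {
      if (edge.attributes.edgeType === edgeType) keys.push(key);
    });
    return keys;
  }
}
```

Because `edgeType` is written last, a caller cannot accidentally override it through the mode-specific attributes.
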
---
@@ -128,10 +124,6 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
- **Status**: resolved by ADR-005
- **Priority**: high — affects the core status state machine
- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
- **Options**:
- (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
- (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
- (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
- **ADR-005 resolution**: Option (c) is correct, and the event log makes it natural. A retry is not a state mutation — it's a new sequence of events appended to the log. When `call.requested` arrives for the same operation with a new `requestId`, it's a new fact. The old `call.error` event remains in the log as history. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed; no state machine mutation; the log preserves full history.
- **Cross-references**: OQ-10
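
The retry-as-new-node resolution can be sketched as a pure projection over the event log. This assumes that events are appended in order, that each `requestId` maps to one node, and that `call.requested`/`call.responded`/`call.error` map to `pending`/`completed`/`failed`; `projectStatuses` and `STATUS_FOR` are hypothetical names, not flowgraph's API.

```typescript
type CallEventType = "call.requested" | "call.responded" | "call.error";

interface CallEvent {
  type: CallEventType;
  requestId: string; // one graph node per requestId
  operationId: string;
}

type CallStatus = "pending" | "completed" | "failed";

// Assumed mapping from the latest event to a call-level status.
const STATUS_FOR: Record<CallEventType, CallStatus> = {
  "call.requested": "pending",
  "call.responded": "completed",
  "call.error": "failed",
};

// The log is append-only and ordered, so the last event seen for a requestId
// determines its current status. No stored status is ever mutated.
function projectStatuses(log: readonly CallEvent[]): Map<string, CallStatus> {
  const statuses = new Map<string, CallStatus>();
  for (const ev of log) statuses.set(ev.requestId, STATUS_FOR[ev.type]);
  return statuses;
}
```

A retry is a second `call.requested` with a fresh `requestId`: the first node projects to `failed`, the retry projects independently, and the `call.error` event stays in the log as history.
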
@@ -150,27 +142,25 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-11: Should preconditions support OR logic?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q1
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects the precondition computation model
- **Question**: Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes."
- **Notes**: OR preconditions would require either: (a) an edge attribute indicating `allOf` vs `anyOf`, (b) a node-level configuration, or (c) a separate `anyOfPredecessors` computed per node. This is a semantic change that affects both the DAG structure and the reactive engine. Might be a v2 feature.
- **Resolution**: No for v1. All preconditions use AND logic — a node becomes `ready` only when ALL predecessors have reached a satisfying terminal state (`completed` or `skipped`). OR logic (`anyOf`) would introduce significant complexity (what happens when one predecessor completes but another fails?) and is already partially addressed by `Conditional` (which provides branch-level either/or semantics). For v2, if OR logic becomes necessary, it should be added as a `preconditionMode: "allOf" | "anyOf"` attribute on `Operation` (node-level, not edge-level), defaulting to `"allOf"`. This is a clean extension point that doesn't change the current precondition model.
- **Cross-references**: OQ-12
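
A small sketch of the v1 AND rule, with the possible v2 `preconditionMode` included only to show where it would plug in. `preconditionsMet` is a hypothetical helper, and treating a node with no predecessors as eligible is an assumption about root nodes.

```typescript
type NodeStatus = "pending" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";

// Terminal states that satisfy a downstream precondition.
const SATISFYING: ReadonlySet<NodeStatus> = new Set<NodeStatus>(["completed", "skipped"]);

// "allOf" is the v1 behavior; "anyOf" models the possible v2 extension only.
type PreconditionMode = "allOf" | "anyOf";

function preconditionsMet(predecessors: readonly NodeStatus[], mode: PreconditionMode = "allOf"): boolean {
  if (predecessors.length === 0) return true; // root nodes are immediately eligible
  const ok = (s: NodeStatus) => SATISFYING.has(s);
  return mode === "allOf" ? predecessors.every(ok) : predecessors.some(ok);
}
```

Note that `anyOf` returns `true` for `["failed", "completed"]`, which is exactly the "one completes, another fails" ambiguity the resolution cites as a reason to keep v1 AND-only.
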
### OQ-12: How does `maxConcurrency` interact with preconditions?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q4
- **Status**: open
- **Status**: resolved
- **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time
- **Notes**: `maxConcurrency` is a scheduling concern, not a structural one. The DAG doesn't encode it. Options: (a) a semaphore signal in the reactive layer, (b) coordinator-enforced throttling, (c) a `maxConcurrency` prop on `Parallel` that the reactive engine respects. The `<Parallel>` component already has `maxConcurrency` as an optional prop in its definition (workflow-templates.md).
- **Resolution**: `maxConcurrency` is a `Parallel` prop enforced by the `WorkflowReactiveRoot` via a reactive counting semaphore. When the root initializes signals for nodes in a `Parallel` group with `maxConcurrency: N`, it wraps the precondition logic: a node's effective `ready` transition requires both `preconditions.value === true` AND `runningCount < maxConcurrency`, where `runningCount` is a reactive `computed` that counts the sibling nodes currently in the `running` state. This is entirely a reactive-engine concern — the DAG doesn't encode `maxConcurrency` (it's not structural), and the call graph doesn't need to know about it. The `Parallel` component's `maxConcurrency` prop is already part of the template definition; the reactive engine just needs to honor it.
- **Cross-references**: OQ-11, workflow-templates `Parallel` component
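
A non-reactive sketch of the admission rule, assuming a plain array of sibling nodes in place of signals; `GroupNode`, `runningCount`, and `admit` are hypothetical names. It also shows a design point worth considering: if every node checked `runningCount < maxConcurrency` independently in the same tick, several could pass at once, so this sketch admits candidates in order and decrements the free slots as it goes.

```typescript
type Status = "pending" | "running" | "completed" | "failed";

interface GroupNode {
  id: string;
  status: Status;
  preconditionsMet: boolean; // stands in for preconditions.value
}

// In the reactive engine this would be a computed over sibling signals.
function runningCount(group: readonly GroupNode[]): number {
  return group.filter((n) => n.status === "running").length;
}

// Returns the ids allowed to start now without exceeding maxConcurrency.
function admit(group: readonly GroupNode[], maxConcurrency: number = Infinity): string[] {
  let slots = maxConcurrency - runningCount(group);
  const admitted: string[] = [];
  for (const node of group) {
    if (slots <= 0) break;
    if (node.status === "pending" && node.preconditionsMet) {
      admitted.push(node.id);
      slots--; // claim the slot before considering the next sibling
    }
  }
  return admitted;
}
```

With one node already running and `maxConcurrency: 3`, only two more pending nodes are admitted; as running nodes finish, re-evaluating `admit` frees their slots.
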
### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q5
- **Status**: open
- **Priority**: low — implementation detail, can be decided during implementation
- **Question**: Currently there are two separate `computed` values: `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` returning `"ready" | "blocked" | "failed"`.
- **Notes**: Two separate `computed` values are more composable (you can check preconditions independently of failure status) but require two effects per node. A single `computed` is simpler (one effect), but its parts cannot be queried independently. This is largely an implementation choice that doesn't affect the public API. Can be deferred to implementation.
- **Status**: resolved
- **Priority**: low — implementation detail
- **Resolution**: Keep two separate `computed` values (current design). Two separate computeds are more composable — you can check preconditions independently of failure status, and you can compose different effects for each. A single `computed<NodeReadiness>` would require every consumer to destructure the result, losing the clean `if (preconditions.value) { ... }` pattern. The implementation cost of two effects per node is negligible. The current design is confirmed.
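
The confirmed design can be illustrated with two independent derivations, here as plain functions standing in for the two `computed` values; `nextStep` is a hypothetical consumer, and the `"blocked" | "start" | "wait"` labels are illustrative, not part of flowgraph.

```typescript
type Status = "pending" | "running" | "completed" | "skipped" | "failed" | "aborted";

// Mirrors the `preconditions` computed: every predecessor completed or skipped.
function preconditions(preds: readonly Status[]): boolean {
  return preds.every((s) => s === "completed" || s === "skipped");
}

// Mirrors the `blockedByFailure` computed: any predecessor failed or aborted.
function blockedByFailure(preds: readonly Status[]): boolean {
  return preds.some((s) => s === "failed" || s === "aborted");
}

// A consumer composes both; either can still be read on its own.
function nextStep(preds: readonly Status[]): "blocked" | "start" | "wait" {
  if (blockedByFailure(preds)) return "blocked";
  if (preconditions(preds)) return "start";
  return "wait";
}
```
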
---
@@ -179,69 +169,58 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-14: Should the call graph support unknown `operationId`?
- **Origin**: [call-graph.md](call-graph.md) Q1
- **Status**: open (with a proposed answer)
- **Status**: resolved
- **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior
- **Proposed answer**: Yes. The call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
- **Notes**: The doc already has a proposed answer. This just needs confirmation and the behavior documented in the `fromCallEvents()` spec.
- **Resolution**: Yes — the call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. This is consistent with the error-handling doc's existing statement about unknown `operationId`. The behavior is documented explicitly in the `fromCallEvents()` specification: when a `call.requested` event references an `operationId` not in the registry, the node is still created with `status: "pending"` and the given `operationId`. This enables the call graph to serve as a complete audit trail regardless of registry state.
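
A minimal sketch of the recorded behavior, assuming a simplified `call.requested` shape. `nodeFromRequest` never consults the registry; `failIfUnknown` is a hypothetical later step showing how such a node "may later transition" to `failed`, since the text does not say which component performs that transition.

```typescript
interface CallRequested {
  type: "call.requested";
  requestId: string;
  operationId: string;
}

interface CallNode {
  requestId: string;
  operationId: string;
  status: "pending" | "failed";
  errorCode?: string;
}

// Node creation records the request as-is; the registry is not consulted.
function nodeFromRequest(ev: CallRequested): CallNode {
  return { requestId: ev.requestId, operationId: ev.operationId, status: "pending" };
}

// Hypothetical later step: fail the node if its operation is not registered.
function failIfUnknown(node: CallNode, registry: ReadonlySet<string>): CallNode {
  return registry.has(node.operationId)
    ? node
    : { ...node, status: "failed", errorCode: "OPERATION_NOT_FOUND" };
}
```
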
### OQ-15: Should the call graph support multiple graphs simultaneously?
- **Origin**: [call-graph.md](call-graph.md) Q3
- **Status**: open
- **Priority**: low — can be deferred to v2
- **Question**: Currently one `FlowGraph` instance = one call graph. If the hub needs to track multiple concurrent workflows, it uses multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
- **Notes**: The current design (multiple instances) is simpler and matches graphology's model. Subgraphs would require a scoping mechanism. This can be deferred unless early usage shows it's needed.
- **Status**: resolved
- **Priority**: low — confirmed as correct design, not a deferral
- **Resolution**: No — one `FlowGraph` instance per graph. Multiple concurrent workflows use multiple instances. This design is simpler and matches graphology's model. Subgraphs would require a scoping mechanism and cross-scope queries that add complexity without benefit at current scale. The hub coordinator creates one `WorkflowReactiveRoot` per workflow, so one `FlowGraph` per workflow is consistent. This is a deliberate "no," not a deferral — if future scale demands require multi-workflow queries, a specialized query layer can aggregate across instances.
### OQ-16: Should `filterByStatus` use an index?
- **Origin**: [call-graph.md](call-graph.md) Q4
- **Status**: open
- **Status**: resolved
- **Priority**: low — premature optimization for small graphs
- **Notes**: Call graphs at hub level are typically tens of nodes. O(n) filter is fast enough. An index can be added later if performance becomes an issue. Can be deferred.
- **Resolution**: No — O(n) filter is sufficient for expected graph sizes (tens to hundreds of nodes). A status index would add implementation complexity (maintain on every `updateStatus()`) for no measurable benefit at current scale. If performance becomes an issue with very large graphs, a `Map<CallStatus, Set<string>>` index can be added as an optimization later without changing the public API.
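
For reference, the deferred optimization could look like the sketch below: a `Map<CallStatus, Set<string>>` kept in sync on every status write. `StatusIndex` and its `set` method are hypothetical; the point is that each `updateStatus()` would have to move the node between buckets, which is the maintenance cost the resolution declines for now.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed";

class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  private readonly statusOf = new Map<string, CallStatus>();

  // Would have to be called from every updateStatus() to stay consistent.
  set(nodeId: string, status: CallStatus): void {
    const prev = this.statusOf.get(nodeId);
    if (prev !== undefined) this.byStatus.get(prev)?.delete(nodeId);
    let bucket = this.byStatus.get(status);
    if (!bucket) {
      bucket = new Set<string>();
      this.byStatus.set(status, bucket);
    }
    bucket.add(nodeId);
    this.statusOf.set(nodeId, status);
  }

  // Lookup proportional to the result size instead of an O(n) scan.
  filterByStatus(status: CallStatus): string[] {
    const bucket = this.byStatus.get(status);
    return bucket ? Array.from(bucket) : [];
  }
}
```
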
### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects the public API surface
- **Question**: Currently the plan is convenience methods that delegate. But some consumers may find it inconvenient to go through `.graph.forEachNode()`.
- **Options**:
- (a) Convenience methods only (current plan). Direct access via `.graph` for power users.
- (b) Expose graphology's traversal methods directly on `FlowGraph` (e.g., `flowGraph.forEachNode()`).
- (c) Expose only the most common traversal methods and let `.graph` handle the rest.
- **Notes**: This is a UX decision. Option (a) keeps the API surface small. Option (b) is more convenient but increases the delegation surface. Option (c) is a middle ground. The decision can be made during implementation based on actual consumer usage patterns.
- **Resolution**: Option (c) — expose the most common traversal members directly on `FlowGraph` and let `.graph` handle the rest. The directly exposed members are the methods `forEachNode()`, `forEachEdge()`, `nodes()`, and `edges()`; the properties `order` and `size`; and graphology's `inNeighbors()`/`outNeighbors()`, which are already exposed under the names `predecessors()`/`successors()`. Less common methods (degree, detailed attribute iteration, adjacency queries) remain accessible via `flowGraph.graph`. This is the 80/20 approach: consumers get a clean API for common operations, and power users get the escape hatch.
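
A sketch of the delegation pattern, assuming a structural `GraphLike` interface that covers only the graphology members listed above (in flowgraph the wrapped object would be a graphology `DirectedGraph`). `GraphLike` and `FlowGraphSketch` are hypothetical names, and the callback signatures are simplified.

```typescript
type Attrs = Record<string, unknown>;

// Structural subset of the graphology Graph API that FlowGraph delegates to.
interface GraphLike {
  readonly order: number;
  readonly size: number;
  nodes(): string[];
  edges(): string[];
  forEachNode(cb: (node: string, attrs: Attrs) => void): void;
  forEachEdge(cb: (edge: string, attrs: Attrs, source: string, target: string) => void): void;
  inNeighbors(node: string): string[];
  outNeighbors(node: string): string[];
}

class FlowGraphSketch {
  // Escape hatch: everything not delegated below is reached via .graph.
  readonly graph: GraphLike;

  constructor(graph: GraphLike) {
    this.graph = graph;
  }

  get order(): number { return this.graph.order; }
  get size(): number { return this.graph.size; }
  nodes(): string[] { return this.graph.nodes(); }
  edges(): string[] { return this.graph.edges(); }
  forEachNode(cb: (node: string, attrs: Attrs) => void): void { this.graph.forEachNode(cb); }
  forEachEdge(cb: (edge: string, attrs: Attrs, source: string, target: string) => void): void {
    this.graph.forEachEdge(cb);
  }
  // graphology's inNeighbors/outNeighbors under flowgraph's names.
  predecessors(node: string): string[] { return this.graph.inNeighbors(node); }
  successors(node: string): string[] { return this.graph.outNeighbors(node); }
}
```
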
### OQ-18: Should `addOperation` auto-populate type-compat edges?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2
- **Status**: open
- **Status**: resolved
- **Priority**: low — affects incremental construction behavior
- **Question**: `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. Should `addOperation()` (incremental) also attempt auto-type-compat edge creation?
- **Notes**: This is only relevant for incremental construction (rare use case). The operation graph is typically built once via `fromSpecs()`. If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations. Can be deferred.
- **Resolution**: No — `addOperation()` adds a node only. Auto-population would require O(n) comparisons on every `addOperation()`, which adds complexity for a rare use case (the operation graph is typically built once via `fromSpecs()`). After incremental construction, the consumer calls `buildTypeEdges()` manually.
### OQ-28: Should `FlowGraph` share analysis functions across instances?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3
- **Status**: open
- **Status**: resolved
- **Priority**: low — optimization concern, not blocking
- **Question**: Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances.
- **Notes**: Distinct from OQ-15 (multiple graphs per instance) — this is about sharing analysis logic, not about graph scoping. Can be deferred.
- **Resolution**: No — each `FlowGraph` instance owns its own `DirectedGraph`, and analysis functions are stateless pure functions that take a graph as input. There's nothing to pool or share. The `FlowGraph` convenience methods delegate to these standalone functions. Shared analysis "instances" would only make sense if the functions had internal caches, but they don't. This question conflated "sharing analysis functions" (already done — `typeCompat` is a standalone function) with "sharing graph data" (unnecessary since analysis doesn't cache state).
### OQ-19: Should `parallelGroups` account for resource constraints?
- **Origin**: [analysis.md](analysis.md) Q4
- **Status**: open
- **Status**: resolved
- **Priority**: low — feature enhancement, not a core concern
- **Question**: Currently `parallelGroups()` returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling.
- **Notes**: Can be added later as an optional parameter. Not blocking.
- **Resolution**: No for v1 — `parallelGroups()` returns theoretical maximum parallelism. Adding resource constraints would conflate structural analysis with scheduling policy. The `maxConcurrency` prop on `Parallel` is a runtime scheduling concern handled by the reactive engine (see OQ-12), not a structural analysis concern. If consumers need resource-aware scheduling, they can post-process the `parallelGroups()` output with their own constraints. An optional `maxConcurrency` parameter can be added in v2 as a convenience, but the core analysis function stays pure.
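
As an example of the consumer-side post-processing the resolution suggests, the sketch below splits each group into chunks of at most `maxConcurrency`. It assumes `parallelGroups()` returns `string[][]` (one array per level, in dependency order), which this section does not confirm; `capGroups` is a hypothetical helper.

```typescript
// Assumes groups are ordered by dependency level; chunks keep that order.
function capGroups(groups: readonly (readonly string[])[], maxConcurrency: number): string[][] {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const out: string[][] = [];
  for (const group of groups) {
    // Nodes in one level are independent, so splitting a level never breaks a dependency.
    for (let i = 0; i < group.length; i += maxConcurrency) {
      out.push(group.slice(i, i + maxConcurrency));
    }
  }
  return out;
}
```

Because chunks of one level are emitted before the next level starts, the output remains a valid schedule while `parallelGroups()` itself stays a pure structural analysis.
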
### OQ-27: Should `validateTemplate` check runtime preconditions?
- **Origin**: [analysis.md](analysis.md) Q2
- **Status**: open (intentionally deferred)
- **Status**: resolved
- **Priority**: low — explicitly out of scope for static analysis
- **Question**: Currently `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.
- **Notes**: This is a deliberate scope boundary, not a design gap. Documented here to confirm that this is an intentional deferral, not an oversight.
- **Resolution**: Explicitly out of scope. `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") belong to the access control layer, not the static analysis layer. This is a deliberate scope boundary, not a design gap.
---
@@ -250,33 +229,31 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type?
- **Origin**: [host-configs.md](host-configs.md) Q2
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects implementation of the GraphologyHostConfig
- **Question**: Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering.
- **Notes**: Related to OQ-04 (edge type consistency at the schema level) but distinct — this is about the runtime graph structure, not the type design. Multiple graphs would make type-specific queries faster (no filtering) but increase complexity and memory usage.
- **Resolution**: No — all edge types share a single graph, with `edgeType` as a universal required attribute on every edge (consistent with the OQ-04 resolution). Separate graphs per edge type would add complexity (cross-graph traversal, cache coherence, multi-graph queries) for a marginal performance gain at current scale. Single-graph filtering by `edgeType` is O(n) on edges and negligible for expected graph sizes. If a concrete performance issue arises, a `Map<EdgeType, DirectedGraph>` internal index can be added as an optimization without changing the API. See ADR-006 for the full decision on `edgeType` consistency.
- **Cross-references**: OQ-04
### OQ-20: How should conditional edge conditions be represented?
- **Origin**: [schema.md](schema.md) Q3
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety
- **Options**:
- (a) `Type.Unknown()` with documentation (current). Pro: maximally flexible. Con: no type safety.
- (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references. Pro: documents both forms. Con: functions don't serialize.
- (c) A dedicated `ConditionSchema` that flowgraph defines. Pro: type safe, consistent. Con: may be overly prescriptive.
- **Notes**: The workflow-templates doc already specifies `Conditional.test` as `((results: Record<string, CallResult>) => boolean) | string`, and the host-configs doc notes that function props need runtime resolution. Option (b) seems like the pragmatic choice that matches the existing design, but the schema representation is what needs deciding.
- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This is a concrete sub-problem of OQ-06 (how the reactive host config bridges to execution).
- **Cross-references**: OQ-05 (conditional branch behavior in reactive engine), OQ-06 (runtime resolution of function props)
- **Resolution**: `condition: Type.Optional(Type.Unknown())` with documentation describing the two runtime forms. The condition field accepts:
1. **String form** (`string`): A serializable reference to an operation name whose result determines the branch. Survives JSON round-trips.
2. **Function form** (`(results: Record<string, CallResult>) => boolean`): A runtime-evaluated predicate. Does NOT survive JSON serialization.
`@alkdev/typebox`'s `Type.Function()` defines serializable function input/output **schemas** (shapes), but `Conditional.test` predicates are runtime closures — they can't be represented as serializable function schemas. Using `Type.Function()` here would conflate the function's shape schema with the runtime closure itself. `Type.Unknown()` with clear documentation is the pragmatic choice for v1, accepting that JSON serialization only preserves the string form. A dedicated `ConditionSchema` can be introduced in v2 if template interchange needs schema-level condition descriptions, but only if there's a concrete use case for representing conditions as typed data (rather than as code).
- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This gap is resolved by ADR-005: `Conditional.test` reads from the result projection.
- **Cross-references**: OQ-05, OQ-06
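
A sketch of how a stored `condition` (typed as `Type.Unknown()`) might be narrowed and evaluated at runtime. It assumes the string form resolves to "the named operation's result has `status === "completed"`" (as the commit message states), assumes a simplified `CallResult` shape, and models the `negated` field for else-branches as a boolean parameter; `evaluateCondition` is a hypothetical name.

```typescript
// Hypothetical shape; the real CallResult comes from the result projection.
interface CallResult {
  status: "completed" | "failed" | "skipped" | "aborted";
  output?: unknown;
}
type Results = Record<string, CallResult>;

// condition is stored as unknown, so narrow it at runtime.
function evaluateCondition(condition: unknown, results: Results, negated = false): boolean {
  let value: boolean;
  if (typeof condition === "string") {
    // Serializable form: an operation name whose result decides the branch.
    value = results[condition]?.status === "completed";
  } else if (typeof condition === "function") {
    // Runtime closure form: does not survive JSON serialization.
    value = Boolean((condition as (r: Results) => boolean)(results));
  } else {
    throw new TypeError("condition must be an operation name or a predicate function");
  }
  return negated ? !value : value; // negated selects the else-branch
}
```

`JSON.stringify` silently drops function-valued properties, so after a round-trip only the string form remains, which is why the string form is the interchange format.
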
### OQ-21: Should templates support explicit `depends_on` edges?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q3
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects template composition expressiveness
- **Question**: Currently dependencies are inferred from structure (sequential implies dependency). An explicit `<DependsOn target="operation-name" />` component would make data dependencies visible in the template without relying on sequential ordering.
- **Notes**: This would add expressiveness but also complexity. Implicit dependency from structure is simpler and covers the most common cases. Explicit `depends_on` would be needed when a node depends on a non-adjacent predecessor in a way that can't be expressed by a `Sequential` group. Can be deferred to v2.
- **Cross-references**: OQ-08 (call graph `depends_on` edges)
- **Resolution**: No for v1. ADR-005's `dataFlow` inference and the result projection make explicit `depends_on` unnecessary for current use cases. Data dependencies are expressed through the result projection — if B needs A's output, B reads `getResult("A")`. The `dataFlow: true` attribute on edges captures which edges carry data. An explicit `<DependsOn>` component would add template syntax complexity and potentially conflict with structural ordering. If a future use case requires non-adjacent data dependencies that can't be expressed by restructuring the template, `<DependsOn>` can be added as a v2 extension. But v1 intentionally restricts dependencies to follow the structural flow.
- **Cross-references**: OQ-08
---
@@ -285,26 +262,21 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`?
- **Origin**: [schema.md](schema.md) Q2
- **Status**: open
- **Status**: resolved
- **Priority**: medium — affects the `@alkdev/operations` peer dependency
- **Options**:
- (a) Import `Identity` from `@alkdev/operations` (peer dep). Pro: matches call protocol. Con: creates a direct type dependency.
- (b) Duplicate the type in flowgraph. Pro: no dependency. Con: divergence risk.
- (c) Use `Type.Record(Type.String(), Type.Array(Type.String()))` for the `resources` field. Pro: flexible. Con: weaker typing.
- **Notes**: Since `@alkdev/operations` is already a peer dependency for type imports, option (a) seems reasonable. The concern is version alignment, but semver ranges handle this. This could also be a `Type.Unknown()` with documentation, letting the consumer validate.
- **Resolution**: Option (a) — import the `Identity` type structure from `@alkdev/operations` (peer dependency). Since `@alkdev/operations` is already a peer dependency (for `CallEventMapValue`), adding this type import creates minimal additional coupling. The `CallNodeAttrs.identity` field mirrors the `Identity` interface: `{ id, scopes, resources? }`. Version alignment is handled by semver ranges. The TypeBox schema for `identity` is defined inline in `CallNodeAttrs` to match the shape (not imported as a TypeBox schema from operations, since `Identity` is a TypeScript interface there), but the field semantics match exactly.
### OQ-23: Multiple graphs per `FlowGraph` instance?
- **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15)
- **Status**: open (duplicate of OQ-15 — see above)
- **Status**: resolved (duplicate of OQ-15 — see above)
### OQ-24: Async analysis functions?
- **Origin**: [analysis.md](analysis.md) Q3
- **Status**: open
- **Status**: resolved
- **Priority**: low — premature for current scale
- **Question**: Should analysis functions be async for large graphs? Current graphs are small (50-200 nodes), synchronous is fine.
- **Notes**: Can be deferred. If large graphs become common, async analysis can be added with an optional `async` variant.
- **Resolution**: No — synchronous is sufficient for current scale (10-200 nodes). Making functions async would add API complexity (Promise return types, async/await boilerplate) for no current benefit. If large graphs become common, `typeCompat()` and `buildTypeEdges()` can gain async variants alongside the synchronous ones.
---
@@ -313,12 +285,10 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-25: Should the reactive graph support partial re-rendering?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q3
- **Status**: open (blocked on ujsx reconciler)
- **Priority**: low — blocked on ujsx reconciler implementation
- **Question**: If a template changes mid-execution, the ujsx reconciler could diff and apply changes. Currently only mount rendering is supported.
- **Known Gap** (from [host-configs.md](host-configs.md)): "ujsx Reconciler Not Yet Available" — the current `HostConfig` is mount-only: no incremental template updates, no `prepareUpdate`/`commitUpdate` flow. This gap is broader than just re-rendering.
- **Notes**: This is entirely dependent on the ujsx reconciler, which is not yet implemented. The host-configs doc notes "currently mount-only." When the reconciler is available, flowgraph gets re-rendering "for free." This question should be revisited after the reconciler is implemented.
- **Cross-references**: OQ-05 (structural container handling during re-render), host-configs.md "Known Gaps"
- **Status**: resolved
- **Priority**: low — blocked on ujsx reconciler, now resolved with clear path
- **Resolution**: Blocked on ujsx reconciler. When the reconciler is implemented, flowgraph gains re-rendering through the standard `prepareUpdate`/`commitUpdate` HostConfig methods. The event log persists across re-renders (ADR-005), so re-rendered nodes pick up where they left off. No special reactive-graph re-rendering logic is needed — the reconciler handles tree diffing, and the HostConfig applies mutations. For v1 (before the reconciler), the reactive tree is built once and torn down via `WorkflowReactiveRoot.dispose()`.
- **Cross-references**: OQ-05, host-configs.md "Known Gaps"
---
@@ -327,10 +297,40 @@ Cross-cutting compilation of all unresolved questions across the flowgraph archi
### OQ-26: How to handle version conflicts?
- **Origin**: [operation-graph.md](operation-graph.md) Q2
- **Status**: open
- **Priority**: low — can be deferred to a versioning use case
- **Question**: If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win?
- **Notes**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises.
- **Status**: resolved
- **Priority**: low — confirmed as correct design, not a deferral
- **Resolution**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation can exist in the graph. This is intentional simplicity. Version conflicts are a niche concern that would add significant complexity (version-aware node keys like `namespace.name@version`, multi-version edges, version compatibility matrices) without a concrete use case. If versioning becomes needed, the node key format could be extended to `namespace.name@version`, but this is a significant change that requires careful consideration. For v1, the one-version-per-operation constraint is sufficient and keeps the key format simple and consistent.
---
## ADR-006: Edge Type Consistency and Single-Graph Architecture
**Status**: Accepted
**Context**: Two related questions (OQ-04, OQ-29) affect how edge types are represented in flowgraph:
- Should `edgeType` be a required attribute on all edges, or only on edges where it varies?
- Should `GraphologyHostConfig` produce separate graphs per edge type, or a single shared graph?
**Decision**:
1. `edgeType` is a required universal attribute on every edge, stored alongside (not inside) mode-specific attribute schemas.
2. All edge types share a single graphology `DirectedGraph` instance per `FlowGraph`.
3. Mode-specific attribute schemas (`OperationEdgeAttrs`, `TriggeredEdgeAttrs`, `DependencyEdgeAttrs`) do **not** include `edgeType` — it's stored separately at the graphology level.
4. As an exception, `TemplateEdgeAttrs` includes `edgeType` in its own schema as a constrained union (`"sequential" | "conditional"`), because template edges need to distinguish their type for rendering.
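Decisions 1, 3, and 4 can be sketched with plain TypeScript types. This is an illustration under assumptions, not the TypeBox schemas themselves:

- The `EdgeType` union lists only the edge type names mentioned in this ADR.
- `dataFlow` is assumed to be a field of `OperationEdgeAttrs`.
- `TriggeredEdgeAttrs` is assumed to be one of the empty `CallEdgeAttrs` variants.
- `StoredEdgeAttrs` and `toStoredAttrs` are hypothetical names.

```typescript
// Edge type names mentioned in this ADR (assumed to be the full set for illustration).
type EdgeType = "typed" | "triggered" | "depends_on" | "sequential" | "conditional";

// Decision 3: mode-specific schemas do NOT contain edgeType.
interface OperationEdgeAttrs {
  dataFlow: boolean; // assumed field
}
type TriggeredEdgeAttrs = Record<never, never>; // assumed empty call-edge variant

// Decision 4: template edges carry a constrained edgeType in their own schema.
interface TemplateEdgeAttrs {
  edgeType: "sequential" | "conditional";
  condition?: unknown; // Type.Unknown() in the TypeBox schema
}

// Decision 1: the graphology edge stores edgeType alongside the
// mode-specific attributes, i.e. { edgeType, ...modeSpecificAttrs }.
type StoredEdgeAttrs<A extends object> = { edgeType: EdgeType } & A;

// Hypothetical helper that builds the stored attribute object.
function toStoredAttrs<A extends object>(edgeType: EdgeType, attrs: A): StoredEdgeAttrs<A> {
  return { edgeType, ...attrs };
}

const operationEdge = toStoredAttrs<OperationEdgeAttrs>("typed", { dataFlow: true });
const triggeredEdge = toStoredAttrs<TriggeredEdgeAttrs>("triggered", {});
// Template attrs already include edgeType, so they can be stored as-is.
const templateEdge: TemplateEdgeAttrs = { edgeType: "conditional", condition: "classify" };

console.log(operationEdge, triggeredEdge, templateEdge);
```

Nothing in this sketch ties a given `edgeType` to its attribute schema at the type level. That check is left to runtime validation.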
**Rationale**:
- Consistent serialization/deserialization: graphology's native JSON export carries edge attributes, so `edgeType` round-trips with every edge without a separate side channel
- Uniform graphology queries and edge-type filtering across all graph modes
- The redundancy for operation graphs (`edgeType` is always `"typed"`) is a minor cost for significant consistency gains
- Separate graphs per edge type would add complexity (cross-graph traversal, cache coherence, multi-graph queries) without benefit at current scale
- Single-graph filtering by `edgeType` is a linear scan over the edges (O(E) for E edges), which is negligible for the expected graph sizes
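Filtering one shared graph by `edgeType` takes a single linear pass over its edges. The sketch below works on plain edge records shaped like graphology's serialized edges (`key`, `source`, `target`, `attributes`). On a live graph, graphology's `filterEdges` iteration method would presumably serve the same purpose. The `edgesOfType` helper and the sample edges are hypothetical.

```typescript
type EdgeType = "typed" | "triggered" | "depends_on" | "sequential" | "conditional";

// Shape modeled on graphology's serialized edge entries.
interface SerializedEdge {
  key?: string;
  source: string;
  target: string;
  attributes?: { edgeType?: EdgeType; [k: string]: unknown };
}

// Hypothetical helper: one pass over all edges, O(E) time, no per-type graph or index.
function edgesOfType(edges: readonly SerializedEdge[], type: EdgeType): SerializedEdge[] {
  return edges.filter((e) => e.attributes?.edgeType === type);
}

const edges: SerializedEdge[] = [
  { key: "a->b", source: "a", target: "b", attributes: { edgeType: "triggered" } },
  { key: "a->c:depends_on", source: "a", target: "c", attributes: { edgeType: "depends_on" } },
  { key: "b->c", source: "b", target: "c", attributes: { edgeType: "triggered" } },
];

console.log(edgesOfType(edges, "triggered").map((e) => e.key).join(", ")); // a->b, b->c
console.log(edgesOfType(edges, "depends_on").length); // 1
```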
**Consequences**:
- All `FlowGraph` instances store edges with `{ edgeType, ...modeSpecificAttrs }` at the graphology level
- Edge-type filtering is done via standard graphology attribute queries
- The `CallEdgeAttrs` union type is discriminated by `edgeType` at runtime (not by TypeBox schema validation, since both variants are empty objects)
- Serialization validation is a two-step process: (1) check that `edgeType` is present and matches the expected value for the graph mode, (2) validate remaining attributes against the mode-specific schema
- The `triggered` edge type gets the simple `${source}->${target}` key format; `depends_on` always gets the composite `${source}->${target}:depends_on` format (see schema.md Edge Key Convention)
- Future optimization (if needed) could add an internal `Map<EdgeType, Set<string>>` index without changing the public API
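The edge key convention and the two-step validation for the call graph can be sketched as follows. This sketch makes three assumptions:

- Attributes arrive as a plain object already parsed from graphology JSON.
- `callEdgeKey`, `validateCallEdgeAttrs`, and `ValidationResult` are hypothetical names.
- Step 2 is written by hand because both `CallEdgeAttrs` variants are empty objects. With real TypeBox schemas, it would presumably be a `Value.Check` call from `@sinclair/typebox/value`.

The returned `edgeType` is the runtime discriminant that later code would branch on.

```typescript
type CallEdgeType = "triggered" | "depends_on";

// Key convention: triggered edges use the simple key,
// depends_on edges always use the composite key.
function callEdgeKey(source: string, target: string, edgeType: CallEdgeType): string {
  return edgeType === "depends_on"
    ? `${source}->${target}:depends_on`
    : `${source}->${target}`;
}

// Hypothetical result type for validating one edge's attributes.
type ValidationResult =
  | { ok: true; edgeType: CallEdgeType }
  | { ok: false; error: string };

const CALL_EDGE_TYPES: readonly string[] = ["triggered", "depends_on"];

function validateCallEdgeAttrs(attrs: Record<string, unknown>): ValidationResult {
  const { edgeType, ...rest } = attrs;

  // Step 1: edgeType must be present and valid for the call-graph mode.
  if (typeof edgeType !== "string" || !CALL_EDGE_TYPES.includes(edgeType)) {
    return { ok: false, error: `invalid edgeType for call graph: ${String(edgeType)}` };
  }

  // Step 2: validate the remaining attributes against the mode-specific schema.
  // Both CallEdgeAttrs variants are empty objects, so any extra key is an error.
  const extra = Object.keys(rest);
  if (extra.length > 0) {
    return { ok: false, error: `unexpected attributes: ${extra.join(", ")}` };
  }

  // edgeType, not the schema, tells the two variants apart.
  return { ok: true, edgeType: edgeType as CallEdgeType };
}

const checks: ValidationResult[] = [
  validateCallEdgeAttrs({ edgeType: "depends_on" }),      // valid
  validateCallEdgeAttrs({ edgeType: "typed" }),           // wrong edge type for this mode
  validateCallEdgeAttrs({}),                              // missing edgeType
  validateCallEdgeAttrs({ edgeType: "triggered", x: 1 }), // extra attribute
];
for (const r of checks) console.log(r.ok ? `ok: ${r.edgeType}` : `error: ${r.error}`);
console.log(callEdgeKey("a", "b", "triggered"), callEdgeKey("a", "b", "depends_on"));
```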
---
| ID | Question | Origin | Priority | Status |
|----|----------|--------|----------|--------|
| OQ-01 | All edges or only compatible edges? | operation-graph | high | resolved |
| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | resolved |
| OQ-03 | Subscription operations in type compat | operation-graph | medium | resolved |
| OQ-04 | `edgeType` on all edges? | schema | medium | resolved |
| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | resolved |
| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved |
| OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved |
| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved |
| OQ-09 | Retries at signal level | reactive-execution | high | resolved |
| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | resolved |
| OQ-11 | OR logic for preconditions | reactive-execution | medium | resolved |
| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | resolved |
| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | resolved |
| OQ-14 | Unknown `operationId` in call graph | call-graph | medium | resolved |
| OQ-15 | Multiple graphs per instance | call-graph | low | resolved |
| OQ-16 | `filterByStatus` index | call-graph | low | resolved |
| OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | resolved |
| OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | resolved |
| OQ-19 | `parallelGroups` with resource constraints | analysis | low | resolved |
| OQ-20 | Conditional edge condition representation | schema | medium | resolved |
| OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | resolved |
| OQ-22 | `CallNodeAttrs.identity` type | schema | medium | resolved |
| OQ-23 | Multiple graphs per instance | call-graph | low | resolved (duplicate of OQ-15) |
| OQ-24 | Async analysis functions | analysis | low | resolved |
| OQ-25 | Partial re-rendering | reactive-execution | low | resolved |
| OQ-26 | Operation version conflicts | operation-graph | low | resolved |
| OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | resolved |
| OQ-28 | Share analysis functions across instances? | flowgraph-api | low | resolved |
| OQ-29 | Separate graph per edge type? | host-configs | medium | resolved |
### All Questions Resolved
**Resolved (ADR-005)**:
- ~~OQ-01: All edges or only compatible~~ — resolved: type-compat edges only on `dataFlow: true` edges
- ~~OQ-02: Type compatibility depth~~ — resolved: type checking only for state-transfer edges
- ~~OQ-05: Structural container transparency~~ — resolved: containers stay transparent, aggregate status is a projection
- ~~OQ-06: Template ↔ call protocol~~ — resolved: bridge through event log
- ~~OQ-07: Reactive engine owns call graph?~~ — resolved: both are projections of event log
- ~~OQ-08: Auto-populate `depends_on` from templates?~~ — resolved: unnecessary, data flows through result projection
- ~~OQ-09: Retries at signal level~~ — resolved: append events, not state mutations
- ~~OQ-10: Running node failure handling~~ — resolved: projection policy, default is running nodes continue
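The core of the ADR-005 resolutions to OQ-07 and OQ-09 fits in a few lines: state is a projection of the event log, and retries are appended events. This is a sketch with hypothetical event and status names. The actual event-log schema is defined elsewhere.

```typescript
// Hypothetical event shapes for illustration only.
type CallEvent =
  | { type: "started"; callId: string; attempt: number }
  | { type: "failed"; callId: string; attempt: number; error: string }
  | { type: "completed"; callId: string; attempt: number };

type CallStatus = "pending" | "running" | "failed" | "completed";

// Projection: status is derived by folding over the log, never stored and mutated.
function projectStatus(log: readonly CallEvent[], callId: string): CallStatus {
  let status: CallStatus = "pending";
  for (const ev of log) {
    if (ev.callId !== callId) continue;
    if (ev.type === "started") status = "running";
    else if (ev.type === "failed") status = "failed";
    else status = "completed";
  }
  return status;
}

const log: CallEvent[] = [];
log.push({ type: "started", callId: "c1", attempt: 1 });
log.push({ type: "failed", callId: "c1", attempt: 1, error: "timeout" });
// A retry appends new events; the failed attempt stays in the history.
log.push({ type: "started", callId: "c1", attempt: 2 });
log.push({ type: "completed", callId: "c1", attempt: 2 });

console.log(projectStatus(log, "c1")); // completed
console.log(projectStatus(log, "c2")); // pending
```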
All open questions have been resolved. The architecture is now fully specified and ready for implementation decomposition.
### Cross-Cutting Themes
All cross-cutting theme groups have been resolved:
1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All resolved. Type checking applies only to `dataFlow: true` edges (ADR-005), and `edgeType` is universal on all edges (ADR-006).
2. **Call protocol integration group** (OQ-06, OQ-07, OQ-08): All resolved by ADR-005. Bridge through event log, projections instead of ownership, data flow through result projection.
3. **Failure semantics group** (OQ-09, OQ-10): All resolved by ADR-005. Retries are append events; running node failure is a projection policy.
4. **Scheduling group** (OQ-11, OQ-12): All resolved. AND-only preconditions for v1, `maxConcurrency` via reactive counting semaphore.
5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All resolved. Containers stay transparent, `condition` as `Type.Unknown()` with documentation, no explicit `depends_on` for v1.
6. **Graph structure group** (OQ-04, OQ-29): All resolved by ADR-006. Universal `edgeType` on all edges, single shared graph per `FlowGraph`.
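The scheduling resolution (AND-only preconditions, with `maxConcurrency` enforced by a counting semaphore) can be sketched without a signals library. Everything here is hypothetical: `CountingSemaphore`, `schedulePass`, and the status names are illustrative. In the reactive engine, readiness and the free-slot count would presumably be reactive values rather than a pass the caller re-runs.

```typescript
type Status = "pending" | "running" | "completed" | "failed";

// Hypothetical counting semaphore: at most `max` nodes may hold a slot at once.
class CountingSemaphore {
  private inUse = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  tryAcquire(): boolean {
    if (this.inUse >= this.max) return false;
    this.inUse++;
    return true;
  }

  release(): void {
    if (this.inUse === 0) throw new Error("release() without a matching acquire");
    this.inUse--;
  }
}

// v1 preconditions are AND-only: every predecessor must have completed.
function preconditionsMet(preds: readonly string[], status: Map<string, Status>): boolean {
  return preds.every((p) => status.get(p) === "completed");
}

// One scheduling pass: start pending nodes whose preconditions hold,
// but only while the semaphore has free slots.
function schedulePass(
  deps: Map<string, string[]>,
  status: Map<string, Status>,
  sem: CountingSemaphore,
): string[] {
  const started: string[] = [];
  for (const [node, preds] of deps) {
    if (status.get(node) !== "pending" || !preconditionsMet(preds, status)) continue;
    if (!sem.tryAcquire()) break; // at maxConcurrency: wait for a release
    status.set(node, "running");
    started.push(node);
  }
  return started;
}

// Usage: maxConcurrency = 1; c depends on a AND b.
const deps = new Map<string, string[]>([["a", []], ["b", []], ["c", ["a", "b"]]]);
const nodeStatus = new Map<string, Status>([["a", "pending"], ["b", "pending"], ["c", "pending"]]);
const sem = new CountingSemaphore(1);
const finish = (node: string): void => {
  nodeStatus.set(node, "completed");
  sem.release();
};

const pass1 = schedulePass(deps, nodeStatus, sem); // a (b is ready but no slot is free)
finish("a");
const pass2 = schedulePass(deps, nodeStatus, sem); // b (c still waits for b)
finish("b");
const pass3 = schedulePass(deps, nodeStatus, sem); // c
console.log(pass1, pass2, pass3);
```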