flowgraph/docs/architecture/open-questions.md
glm-5.1 c76be7f689 ADR-005 accepted: resolve all open consequences, update cascading docs
Resolve the three open consequences from ADR-005 (Event Log as Single
Source of Truth) and transition from Proposed to Accepted:

1. Event log IS the call protocol event stream — not a separate type,
   but an EventLogProjection interface (append/getStatus/getResult/
   getEvents) over CallEventMapValue[] with an append-only contract.

2. Event log persists across template re-renders — projections recompute
   against the new DAG; orphaned events stay in log for audit but don't
   affect active projections.

3. Edges get dataFlow: boolean attribute on TemplateEdgeAttrs — inferred
   (not manual) by GraphologyHostConfig from template expressions.
   typeCompat() only runs on dataFlow: true edges. Inference rules are
   precisely specified for Conditional.test, Map.over, and Operation.input.

Also resolve OQ-05 (structural containers stay transparent; aggregate
status is a projection from children) and OQ-10 (running node failure
is a FailurePolicy configuration, default continues-running).

Cascading updates to:
- reactive-execution.md: add hybrid status model (event-log-driven vs
  projection-driven vs signal-mutation), EventLogProjection interface,
  result projection respecting retries, FailurePolicy type
- host-configs.md: ReactiveContext now includes resultProjection and
  computed results; resolved Q1/Q3/Q4
- schema.md: dataFlow attribute on TemplateEdgeAttrs with inference
  rules and type checking implications
- workflow-templates.md: edge creation rules with dataFlow, result
  projection in Conditional/Map, resolved Q1/Q4
- open-questions.md: all ADR-005 questions marked resolved, updated
  summary table and cross-cutting themes, removed duplicate OQ-07

7 files changed, 464 insertions, 139 deletions
2026-05-21 07:44:28 +00:00

---
status: draft
last_updated: 2026-05-20
---
# Open Questions Tracker
Cross-cutting compilation of all unresolved questions across the flowgraph architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.
## How to Use This Document
- Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority** assessment
- **Cross-references** link related questions that may conflict or answer each other
- When a question is resolved, update its status to `resolved` and add a resolution note
- Once all questions in a theme are resolved, the theme section can be removed
## ADR-005 Impact
[ADR-005: Event Log as Single Source of Truth](decisions/005-event-log-as-source-of-truth.md) defines an Execution Event Log pattern that resolves or reframes several open questions. ADR-005 is now **Accepted**, and all questions it affects have been resolved:
| Question | ADR-005 Impact | Final Resolution |
|----------|-----------------|-------------------|
| OQ-01 | Reframed → Resolved | Type-compat edges only on `dataFlow: true` edges. Temporal edges bypass type checking. |
| OQ-02 | Reframed → Resolved | Type checking scope narrows to state-transfer edges. Structured mismatch reporting confirmed. |
| OQ-05 | Independent → Resolved | Containers stay transparent. Aggregate status computed as projection from children. |
| OQ-06 | Resolved | The reactive layer bridges to call protocol through the event log. Hub appends events; reactive layer projects them. |
| OQ-07 | Resolved | Call graph and reactive engine are both projections of the event log. Neither owns the other. |
| OQ-08 | Resolved | `depends_on` edges unnecessary. Data dependencies expressed through result projection. |
| OQ-09 | Resolved | Retries are natural append events. New `requestId` per retry. |
| OQ-10 | Reframed → Resolved | Running node failure handling is a projection policy, not a state machine rule. Default: running nodes continue. |
## Theme 1: Edge Semantics and Type Compatibility
### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones?
- **Origin**: [operation-graph.md](operation-graph.md) Q1
- **Status**: resolved
- **Priority**: high — affects storage size, API surface, and diagnostic value
- **Resolution**: Adopt option (a) for state-transfer edges, option (b) for temporal-only edges. Type-compatibility edges (with `compatible: true/false` attributes) are only added where data flows between operations. The `dataFlow` attribute on `TemplateEdgeAttrs` (resolved in ADR-005) determines which edges need type checking. For edges where `dataFlow: true`, both compatible and incompatible edges provide diagnostic value. For edges where `dataFlow: false`, no type-compat edge is needed — temporal ordering doesn't have type compatibility.
- **Cross-references**: OQ-04
### OQ-02: How granular should type compatibility results be?
- **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1
- **Status**: resolved
- **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs`
- **Resolution**: Type compatibility checking only applies to **state-transfer edges** (where A's output flows into B's input), as established by ADR-005's `dataFlow` attribute on `TemplateEdgeAttrs`. Temporal-only edges bypass type checking entirely (their "compatibility" is trivially true). The `typeCompat()` function returns `{ compatible, detail?, mismatches? }` for state-transfer edges only. The schema already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs` — this design is confirmed. Remaining detail decisions (recursive depth limits, unknown/union type handling) are implementation concerns, not architecture decisions.
- **Cross-references**: OQ-01
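
The scoping rule from OQ-01 and OQ-02 can be sketched as follows. This is a minimal illustration, not the actual implementation: the flat `FlatSchema` representation, the fields of `TypeMismatch`, and the `checkEdge` helper are assumptions made for the example. Only the `{ compatible, detail?, mismatches? }` return shape and the `dataFlow` gate come from the resolutions above.

```typescript
// Hypothetical mismatch shape; the real TypeMismatch is defined in schema.md.
interface TypeMismatch {
  path: string;
  expected: string;
  actual: string;
}

interface TypeCompatResult {
  compatible: boolean;
  detail?: string;
  mismatches?: TypeMismatch[];
}

// Assumption: a schema is a flat map from field name to a type name.
type FlatSchema = Record<string, string>;

// Compare the producer's output against the consumer's input, field by field.
function typeCompat(output: FlatSchema, input: FlatSchema): TypeCompatResult {
  const mismatches: TypeMismatch[] = [];
  Object.keys(input).forEach((field) => {
    const expected = input[field];
    if (expected === undefined) return;
    const actual = output[field];
    if (actual !== expected) {
      mismatches.push({
        path: field,
        expected: expected,
        actual: actual === undefined ? "missing" : actual,
      });
    }
  });
  if (mismatches.length === 0) return { compatible: true };
  return {
    compatible: false,
    detail: mismatches.length + " field(s) mismatched",
    mismatches: mismatches,
  };
}

// Only state-transfer edges are checked; temporal-only edges yield undefined.
function checkEdge(
  edge: { dataFlow: boolean },
  output: FlatSchema,
  input: FlatSchema,
): TypeCompatResult | undefined {
  return edge.dataFlow ? typeCompat(output, input) : undefined;
}
```

Returning `undefined` for temporal edges, rather than `{ compatible: true }`, is one way to keep "not checked" distinguishable from "checked and compatible"; the source only states that such edges bypass type checking.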
### OQ-03: Should subscription operations be treated differently in type compatibility?
- **Origin**: [operation-graph.md](operation-graph.md) Q3
- **Status**: open
- **Priority**: medium — affects operation graph edge semantics for streaming operations
- **Question**: A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different. Should the type-compatibility check for subscriptions account for this?
- **Notes**: This has downstream implications for call-graph population (subscriptions produce multiple `call.responded` events) and template authoring (a subscription feeding into a mutation has different semantics than a query feeding into a mutation). May want to defer to v2 but should at least document the current behavior (subscriptions are treated the same as queries/mutations).
### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges?
- **Origin**: [schema.md](schema.md) Q1
- **Status**: open
- **Priority**: medium — affects serialization format and edge handling across all graph types
- **Options**:
- (a) `edgeType` required on all edges. Pro: consistent, self-describing. Con: operation graph edges are always `typed`, making the field redundant there.
- (b) Separate edge attribute types per graph mode (current implicit design — `CallEdgeAttrs` is a union, `OperationEdgeAttrs` doesn't include edge type). Con: graphology edges must carry attributes from a single schema.
- (c) Union type on edge attributes, letting the consumer tag the edge. Pro: flexible. Con: runtime discrimination burden.
- **Notes**: The current schema already stores `edgeType` alongside the edge-specific attributes in graphology (see schema.md's "Edge type storage" section), which is effectively option (a) at the storage level. The question is really about the TypeScript type API: should `OperationEdgeAttrs` include `edgeType: "typed"` or should that be a separate concern?
- **Cross-references**: OQ-01 (if incompatible edges exist, they need tagging)
---
## Theme 2: Structural Container Transparency
### OQ-05: Should `Sequential` and `Parallel` be transparent in the graph?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q1, [host-configs.md](host-configs.md) Q1
- **Status**: resolved
- **Priority**: high — fundamental to how the DAG is structured and how the reactive engine computes preconditions
- **Question (merged)**: Currently, structural containers (`Sequential`, `Parallel`, `Conditional`) produce edges but no nodes. The reactive engine then has to reconstruct structural context to compute preconditions. Should they create "virtual" nodes instead?
- **Resolution**: Keep containers transparent (current design). Structural containers do NOT create nodes in the DAG or events in the event log. Their aggregate status can be computed as a projection from their children's statuses:
- A `Sequential` is "completed" when all its children are completed/skipped
- A `Parallel` is "completed" when all its children are completed/skipped
- A `Conditional` is "completed" when its taken branch is completed/skipped
This resolution aligns with ADR-005's projection model: the event log records real call events, and projections compute derived state from them. Virtual nodes in the event log would pollute it with synthetic events that have no call protocol equivalent. Virtual nodes in the DAG would add structural overhead for what is already computable.
The `parentMap` and `siblingMap` in the `ReactiveContext` remain the mechanism for computing preconditions. These maps are derived from the template structure during rendering, not from the DAG. They provide the structural context that the transparent-DAG approach needs, without requiring container nodes.
- **Cross-references**: OQ-25 (partial re-rendering)
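
The aggregate-status rules above can be sketched as pure functions over child statuses. The shape of `parentMap` (child ID to container ID) and the `ChildStatus` subset are assumptions for this example; the source only says that `parentMap` and `siblingMap` are derived from the template structure during rendering.

```typescript
// Assumed subset of NodeStatus; the real union lives in reactive-execution.md.
type ChildStatus = "idle" | "running" | "completed" | "skipped" | "failed";

// Assumption: parentMap maps each child node ID to its container ID.
type ParentMap = Record<string, string>;

function isSettled(s: ChildStatus): boolean {
  return s === "completed" || s === "skipped";
}

// Sequential and Parallel: completed when every child is completed or skipped.
function containerCompleted(
  containerId: string,
  parentMap: ParentMap,
  statuses: Record<string, ChildStatus>,
): boolean {
  return Object.keys(parentMap)
    .filter((childId) => parentMap[childId] === containerId)
    .every((childId) => {
      const s = statuses[childId];
      return s !== undefined && isSettled(s);
    });
}

// Conditional: only the children of the taken branch count.
function conditionalCompleted(
  takenBranchIds: string[],
  statuses: Record<string, ChildStatus>,
): boolean {
  return takenBranchIds.every((id) => {
    const s = statuses[id];
    return s !== undefined && isSettled(s);
  });
}
```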
---
## Theme 3: Call Protocol Integration
### OQ-06: How does template instantiation interact with the call protocol?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q4, [host-configs.md](host-configs.md) Q3
- **Status**: resolved by ADR-005
- **Priority**: high — this is a fundamental integration point between flowgraph and the call protocol
- **Question (merged)**: When a template is instantiated as a call graph, each `<Operation>` becomes a call. But the call protocol's `call.requested` events include `parentRequestId` — who is the parent? Is it the template instance? The hub coordinator? And how does the `ReactiveHostConfig` bridge to `registry.execute()` or `PendingRequestMap.call()`?
- **ADR-005 resolution**: The reactive layer bridges to the call protocol through the event log. Call protocol events (`call.requested`, `call.responded`, etc.) are appended to the event log. The reactive status projection derives `NodeStatus` from the log. The result projection derives `CallResult` from the log. The hub coordinator appends events; the reactive layer projects them. No callback, no boomerang, no direct signal mutation by the coordinator.
- **Cross-references**: OQ-07, OQ-08
### OQ-07: Should the reactive engine own the call graph?
- **Origin**: [host-configs.md](host-configs.md) Q4
- **Status**: resolved by ADR-005
- **Priority**: high — affects the separation between flowgraph and the call protocol
- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
- **ADR-005 resolution**: Neither owns the other. Both the call graph and the reactive status/result projections derive from the same event log. They are independent projections of the same source of truth. The call graph projects the structural view (who triggered whom). The reactive engine projects the behavioral view (what's running, what's blocked). You can have one without the other, or both simultaneously.
### OQ-08: Should `depends_on` edges be auto-populated from workflow templates?
- **Origin**: [call-graph.md](call-graph.md) Q2
- **Status**: resolved by ADR-005
- **Priority**: medium — affects how the call graph and template system relate
- **Question**: When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges in the call graph?
- **ADR-005 resolution**: `depends_on` edges are unnecessary as a separate concept. Data dependencies are expressed through the result projection of the event log. If node B needs node A's output, B reads `getResult("A")` from the result projection. The temporal ordering (A before B) is already expressed by template edges. There's no need for a separate edge type to represent data flow — the event log IS the data transport.
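
A minimal sketch of the result projection described above, assuming a simplified event shape. The `nodeId` and `output` fields are hypothetical; the real `CallEventMapValue` is defined by the call protocol, and the real `getResult` takes only the node ID because it is a method on the projection.

```typescript
// Hypothetical event shape; the real CallEventMapValue carries more fields.
interface CallLogEvent {
  type: "call.requested" | "call.responded" | "call.error";
  requestId: string;
  nodeId: string; // assumed: the template node this call belongs to
  output?: unknown; // assumed: present on call.responded
}

// Result projection: the output of the most recent response for a node.
// Scanning from the end means a successful retry supersedes earlier attempts.
function getResult(eventLog: CallLogEvent[], nodeId: string): unknown {
  for (let i = eventLog.length - 1; i >= 0; i--) {
    const ev = eventLog[i];
    if (ev !== undefined && ev.nodeId === nodeId && ev.type === "call.responded") {
      return ev.output;
    }
  }
  return undefined; // no response recorded yet
}

// Node B reads node A's output from the log instead of following a
// depends_on edge.
const exampleLog: CallLogEvent[] = [
  { type: "call.requested", requestId: "r1", nodeId: "A" },
  { type: "call.responded", requestId: "r1", nodeId: "A", output: { label: "spam" } },
];
const inputForB = getResult(exampleLog, "A");
```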
---
## Theme 4: Failure and Retry Semantics
### OQ-09: How are retries handled at the signal level?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q2
- **Status**: resolved by ADR-005
- **Priority**: high — affects the core status state machine
- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
- **Options**:
- (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
- (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
- (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
- **ADR-005 resolution**: Option (c) is correct, and the event log makes it natural. A retry is not a state mutation — it's a new sequence of events appended to the log. When `call.requested` arrives for the same operation with a new `requestId`, it's a new fact. The old `call.error` event remains in the log as history. The status projection derives the current state by scanning for the most recent event per node. No `retried` status needed; no state machine mutation; the log preserves full history.
- **Cross-references**: OQ-10
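
The retry behaviour can be sketched as a status projection over an append-only array. The event fields and the `ProjectedStatus` subset are assumptions for illustration, and the sketch assumes events are appended in the order they occur.

```typescript
// Assumed subset of NodeStatus, enough to show the retry behaviour.
type ProjectedStatus = "idle" | "running" | "completed" | "failed";

// Hypothetical event shape; nodeId is an assumed field.
interface RetryLogEvent {
  type: "call.requested" | "call.responded" | "call.error";
  requestId: string;
  nodeId: string;
}

// Status projection: derive the current status from the most recent event
// recorded for the node. Nothing in the log is ever mutated or removed.
function projectStatus(eventLog: RetryLogEvent[], nodeId: string): ProjectedStatus {
  for (let i = eventLog.length - 1; i >= 0; i--) {
    const ev = eventLog[i];
    if (ev === undefined || ev.nodeId !== nodeId) continue;
    if (ev.type === "call.requested") return "running";
    if (ev.type === "call.responded") return "completed";
    return "failed"; // call.error
  }
  return "idle"; // no events for this node yet
}
```

With a log of `call.requested` (r1) followed by `call.error` (r1), the projection returns `"failed"`. Appending `call.requested` with a new `requestId` (r2) moves it to `"running"` without touching the earlier entries, so the failed attempt stays in the log as history.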
### OQ-10: What happens to running nodes when a predecessor fails?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q6
- **Status**: resolved
- **Priority**: high — affects failure propagation correctness
- **Resolution**: This is a **policy configuration** of the status projection, not a hardcoded state machine rule. The event log records failure facts. The projection decides how to handle running nodes that depend on a failed node. The default policy (option a from the original framing): running nodes are NOT affected by a predecessor's failure — only idle/waiting nodes transition to `aborted`. A more aggressive policy could abort running nodes, but this requires explicit configuration. The event log makes both strategies expressible without changing the underlying mechanism — only the projection logic changes. This aligns with ADR-005's principle that projections encode policy while the log records facts.
- **Cross-references**: OQ-09 (retries are new events, not state mutations)
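
One way to express this policy is as a parameter of the projection. The `abortRunning` field name and the status subset are hypothetical; the actual `FailurePolicy` type is specified in reactive-execution.md and may look different.

```typescript
// Assumed subset of NodeStatus covering the cases named in the resolution.
type PolicyNodeStatus = "idle" | "waiting" | "running" | "completed" | "failed" | "aborted";

// Hypothetical policy shape; the field name is an assumption.
interface FailurePolicy {
  abortRunning: boolean;
}

// Default from the resolution: running nodes continue.
const DEFAULT_FAILURE_POLICY: FailurePolicy = { abortRunning: false };

// Given a node whose predecessor has failed, decide the node's next status.
function applyPredecessorFailure(
  current: PolicyNodeStatus,
  policy: FailurePolicy = DEFAULT_FAILURE_POLICY,
): PolicyNodeStatus {
  if (current === "idle" || current === "waiting") return "aborted";
  if (current === "running" && policy.abortRunning) return "aborted";
  return current; // running (default policy) and terminal states are unchanged
}
```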
---
## Theme 5: Preconditions and Scheduling
### OQ-11: Should preconditions support OR logic?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q1
- **Status**: open
- **Priority**: medium — affects the precondition computation model
- **Question**: Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes."
- **Notes**: OR preconditions would require either: (a) an edge attribute indicating `allOf` vs `anyOf`, (b) a node-level configuration, or (c) a separate `anyOfPredecessors` computed per node. This is a semantic change that affects both the DAG structure and the reactive engine. Might be a v2 feature.
- **Cross-references**: OQ-12
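
To make the difference concrete, here is a hedged sketch of how an `allOf`/`anyOf` switch could look if this feature were adopted. The `PreconditionMode` type and the `preconditionsMet` function are hypothetical; nothing here is part of the current design, which uses AND logic only.

```typescript
// Assumed subset of NodeStatus.
type PredStatus = "idle" | "running" | "completed" | "skipped" | "failed";

// Hypothetical mode switch; could live on an edge or on a node.
type PreconditionMode = "allOf" | "anyOf";

function preconditionsMet(
  predecessors: PredStatus[],
  mode: PreconditionMode = "allOf",
): boolean {
  const settled = (s: PredStatus): boolean => s === "completed" || s === "skipped";
  // A node without predecessors is ready in either mode.
  if (predecessors.length === 0) return true;
  return mode === "allOf" ? predecessors.every(settled) : predecessors.some(settled);
}
```

The empty-predecessor guard matters for `anyOf`: without it, `some()` on an empty array returns `false` and a root node would never start.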
### OQ-12: How does `maxConcurrency` interact with preconditions?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q4
- **Status**: open
- **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time
- **Notes**: `maxConcurrency` is a scheduling concern, not a structural one. The DAG doesn't encode it. Options: (a) a semaphore signal in the reactive layer, (b) coordinator-enforced throttling, (c) a `maxConcurrency` prop on `Parallel` that the reactive engine respects. The `<Parallel>` component already has `maxConcurrency` as an optional prop in its definition (workflow-templates.md).
- **Cross-references**: OQ-11, workflow-templates `Parallel` component
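
Whichever option is chosen, the throttling decision itself is small. The sketch below shows it as a pure function; `selectToStart` is a hypothetical helper, and where it would live (reactive layer or coordinator) is exactly what this question leaves open.

```typescript
// Pick which ready nodes of a Parallel group may start now, given how many
// are already running. An undefined limit means unbounded concurrency.
function selectToStart(
  readyIds: string[],
  runningCount: number,
  maxConcurrency?: number,
): string[] {
  if (maxConcurrency === undefined) return readyIds.slice();
  const freeSlots = Math.max(0, maxConcurrency - runningCount);
  return readyIds.slice(0, freeSlots);
}
```

For a group with `maxConcurrency: 3`, one node running and four nodes ready, the function returns the first two ready IDs. The remaining two stay ready until a slot frees up, so their preconditions are satisfied but scheduling holds them back.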
### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q5
- **Status**: open
- **Priority**: low — implementation detail, can be decided during implementation
- **Question**: Currently there are two separate `computed` values — `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` returning `"ready" | "blocked" | "failed"`.
- **Notes**: Two separate `computed` values are more composable (you can check preconditions independently of failure status) but require two effects per node. A single `computed` is simpler (one effect) but makes it harder to query readiness and failure independently. This is largely an implementation choice that doesn't affect the public API. Can be deferred to implementation.
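
The two alternatives can be compared as plain functions. In the real engine each would be wrapped in a `computed`; the status subset is an assumption, and giving failure precedence over readiness in the single-value form is a choice made for this sketch.

```typescript
// Assumed subset of NodeStatus.
type PredecessorStatus = "idle" | "running" | "completed" | "skipped" | "failed" | "aborted";
type NodeReadiness = "ready" | "blocked" | "failed";

// Two-value form: each function would back its own computed.
function preconditions(preds: PredecessorStatus[]): boolean {
  return preds.every((s) => s === "completed" || s === "skipped");
}

function blockedByFailure(preds: PredecessorStatus[]): boolean {
  return preds.some((s) => s === "failed" || s === "aborted");
}

// Single-value form: one computed derived from the same inputs.
function nodeReadiness(preds: PredecessorStatus[]): NodeReadiness {
  if (blockedByFailure(preds)) return "failed";
  return preconditions(preds) ? "ready" : "blocked";
}
```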
---
## Theme 6: Graph Construction and API Surface
### OQ-14: Should the call graph support unknown `operationId`?
- **Origin**: [call-graph.md](call-graph.md) Q1
- **Status**: open (with a proposed answer)
- **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior
- **Proposed answer**: Yes. The call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
- **Notes**: The doc already has a proposed answer. This just needs confirmation and the behavior documented in the `fromCallEvents()` spec.
### OQ-15: Should the call graph support multiple graphs simultaneously?
- **Origin**: [call-graph.md](call-graph.md) Q3
- **Status**: open
- **Priority**: low — can be deferred to v2
- **Question**: Currently one `FlowGraph` instance = one call graph. If the hub needs to track multiple concurrent workflows, it uses multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
- **Notes**: The current design (multiple instances) is simpler and matches graphology's model. Subgraphs would require a scoping mechanism. This can be deferred unless early usage shows it's needed.
### OQ-16: Should `filterByStatus` use an index?
- **Origin**: [call-graph.md](call-graph.md) Q4
- **Status**: open
- **Priority**: low — premature optimization for small graphs
- **Notes**: Call graphs at hub level are typically tens of nodes. O(n) filter is fast enough. An index can be added later if performance becomes an issue. Can be deferred.
### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1
- **Status**: open
- **Priority**: medium — affects the public API surface
- **Question**: Currently the plan is convenience methods that delegate. But some consumers may find it inconvenient to go through `.graph.forEachNode()`.
- **Options**:
- (a) Convenience methods only (current plan). Direct access via `.graph` for power users.
- (b) Expose graphology's traversal methods directly on `FlowGraph` (e.g., `flowGraph.forEachNode()`).
- (c) Expose only the most common traversal methods and let `.graph` handle the rest.
- **Notes**: This is a UX decision. Option (a) keeps the API surface small. Option (b) is more convenient but increases the delegation surface. Option (c) is a middle ground. The decision can be made during implementation based on actual consumer usage patterns.
### OQ-18: Should `addOperation` auto-populate type-compat edges?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2
- **Status**: open
- **Priority**: low — affects incremental construction behavior
- **Question**: `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. Should `addOperation()` (incremental) also attempt auto-type-compat edge creation?
- **Notes**: This is only relevant for incremental construction (rare use case). The operation graph is typically built once via `fromSpecs()`. If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations. Can be deferred.
### OQ-28: Should `FlowGraph` share analysis functions across instances?
- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3
- **Status**: open
- **Priority**: low — optimization concern, not blocking
- **Question**: Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances.
- **Notes**: Distinct from OQ-15 (multiple graphs per instance) — this is about sharing analysis logic, not about graph scoping. Can be deferred.
### OQ-19: Should `parallelGroups` account for resource constraints?
- **Origin**: [analysis.md](analysis.md) Q4
- **Status**: open
- **Priority**: low — feature enhancement, not a core concern
- **Question**: Currently `parallelGroups()` returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling.
- **Notes**: Can be added later as an optional parameter. Not blocking.
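
A possible shape for the optional parameter, shown as a post-processing step so it stays independent of how `parallelGroups()` is implemented. The `limitGroups` helper and the `string[][]` group representation are assumptions for this sketch.

```typescript
// Split each theoretical parallel group into chunks of at most
// maxConcurrency nodes, preserving group order.
function limitGroups(groups: string[][], maxConcurrency: number): string[][] {
  if (maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be at least 1");
  }
  const limited: string[][] = [];
  groups.forEach((group) => {
    for (let i = 0; i < group.length; i += maxConcurrency) {
      limited.push(group.slice(i, i + maxConcurrency));
    }
  });
  return limited;
}
```

With `maxConcurrency` of 2, the groups `[["a", "b", "c"], ["d"]]` become `[["a", "b"], ["c"], ["d"]]`. Chunking keeps the original ordering guarantees, since every node in a later chunk still runs after all nodes of earlier groups.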
### OQ-27: Should `validateTemplate` check runtime preconditions?
- **Origin**: [analysis.md](analysis.md) Q2
- **Status**: open (intentionally deferred)
- **Priority**: low — explicitly out of scope for static analysis
- **Question**: Currently `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.
- **Notes**: This is a deliberate scope boundary, not a design gap. Documented here to confirm that this is an intentional deferral, not an oversight.
---
## Theme 7: Conditional and Template Semantics
### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type?
- **Origin**: [host-configs.md](host-configs.md) Q2
- **Status**: open
- **Priority**: medium — affects implementation of the GraphologyHostConfig
- **Question**: Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering.
- **Notes**: Related to OQ-04 (edge type consistency at the schema level) but distinct — this is about the runtime graph structure, not the type design. Multiple graphs would make type-specific queries faster (no filtering) but increase complexity and memory usage.
- **Cross-references**: OQ-04
### OQ-20: How should conditional edge conditions be represented?
- **Origin**: [schema.md](schema.md) Q3
- **Status**: open
- **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety
- **Options**:
- (a) `Type.Unknown()` with documentation (current). Pro: maximally flexible. Con: no type safety.
- (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references. Pro: documents both forms. Con: functions don't serialize.
- (c) A dedicated `ConditionSchema` that flowgraph defines. Pro: type safe, consistent. Con: may be overly prescriptive.
- **Notes**: The workflow-templates doc already specifies `Conditional.test` as `((results: Record<string, CallResult>) => boolean) | string`, and the host-configs doc notes that function props need runtime resolution. Option (b) seems like the pragmatic choice that matches the existing design, but the schema representation is what needs deciding.
- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This is a concrete sub-problem of OQ-06 (how the reactive host config bridges to execution).
- **Cross-references**: OQ-05 (conditional branch behavior in reactive engine), OQ-06 (runtime resolution of function props)
### OQ-21: Should templates support explicit `depends_on` edges?
- **Origin**: [workflow-templates.md](workflow-templates.md) Q3
- **Status**: open
- **Priority**: medium — affects template composition expressiveness
- **Question**: Currently dependencies are inferred from structure (sequential implies dependency). An explicit `<DependsOn target="operation-name" />` component would make data dependencies visible in the template without relying on sequential ordering.
- **Notes**: This would add expressiveness but also complexity. Implicit dependency from structure is simpler and covers the most common cases. Explicit `depends_on` would be needed when a node depends on a non-adjacent predecessor in a way that can't be expressed by a `Sequential` group. Can be deferred to v2.
- **Cross-references**: OQ-08 (call graph `depends_on` edges)
---
## Theme 8: Identity and Serialization
### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`?
- **Origin**: [schema.md](schema.md) Q2
- **Status**: open
- **Priority**: medium — affects the `@alkdev/operations` peer dependency
- **Options**:
- (a) Import `Identity` from `@alkdev/operations` (peer dep). Pro: matches call protocol. Con: creates a direct type dependency.
- (b) Duplicate the type in flowgraph. Pro: no dependency. Con: divergence risk.
- (c) Use `Type.Record(Type.String(), Type.Array(Type.String()))` for the `resources` field. Pro: flexible. Con: weaker typing.
- **Notes**: Since `@alkdev/operations` is already a peer dependency for type imports, option (a) seems reasonable. The concern is version alignment, but semver ranges handle this. This could also be a `Type.Unknown()` with documentation, letting the consumer validate.
### OQ-23: Multiple graphs per `FlowGraph` instance?
- **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15)
- **Status**: open (duplicate of OQ-15 — see above)
### OQ-24: Async analysis functions?
- **Origin**: [analysis.md](analysis.md) Q3
- **Status**: open
- **Priority**: low — premature for current scale
- **Question**: Should analysis functions be async for large graphs? Current graphs are small (50-200 nodes), so synchronous analysis is fine.
- **Notes**: Can be deferred. If large graphs become common, async analysis can be added with an optional `async` variant.
---
## Theme 9: Reactive Execution Mechanics
### OQ-25: Should the reactive graph support partial re-rendering?
- **Origin**: [reactive-execution.md](reactive-execution.md) Q3
- **Status**: open (blocked on ujsx reconciler)
- **Priority**: low — blocked on ujsx reconciler implementation
- **Question**: If a template changes mid-execution, the ujsx reconciler could diff and apply changes. Currently only mount rendering is supported.
- **Known Gap** (from [host-configs.md](host-configs.md)): "ujsx Reconciler Not Yet Available" — the current `HostConfig` is mount-only: no incremental template updates, no `prepareUpdate`/`commitUpdate` flow. This gap is broader than just re-rendering.
- **Notes**: This is entirely dependent on the ujsx reconciler, which is not yet implemented. The host-configs doc notes "currently mount-only." When the reconciler is available, flowgraph gets re-rendering "for free." This question should be revisited after the reconciler is implemented.
- **Cross-references**: OQ-05 (structural container handling during re-render), host-configs.md "Known Gaps"
---
## Theme 10: Version and Scale Concerns
### OQ-26: How to handle version conflicts?
- **Origin**: [operation-graph.md](operation-graph.md) Q2
- **Status**: open
- **Priority**: low — can be deferred to a versioning use case
- **Question**: If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win?
- **Notes**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises.
---
## Summary Table
| ID | Question | Origin | Priority | Status |
|----|----------|--------|----------|--------|
| OQ-01 | All edges or only compatible edges? | operation-graph | high | resolved |
| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | resolved |
| OQ-03 | Subscription operations in type compat | operation-graph | medium | open |
| OQ-04 | `edgeType` on all edges? | schema | medium | open |
| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | resolved |
| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | resolved |
| OQ-07 | Should reactive engine own call graph? | host-configs | high | resolved |
| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | resolved |
| OQ-09 | Retries at signal level | reactive-execution | high | resolved |
| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | resolved |
| OQ-11 | OR logic for preconditions | reactive-execution | medium | open |
| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open |
| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open |
| OQ-14 | Unknown `operationId` in call graph | call-graph | medium | open (proposed) |
| OQ-15 | Multiple graphs per instance | call-graph | low | open |
| OQ-16 | `filterByStatus` index | call-graph | low | open |
| OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | open |
| OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | open |
| OQ-19 | `parallelGroups` with resource constraints | analysis | low | open |
| OQ-20 | Conditional edge condition representation | schema | medium | open |
| OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | open |
| OQ-22 | `CallNodeAttrs.identity` type | schema | medium | open |
| OQ-24 | Async analysis functions | analysis | low | open |
| OQ-25 | Partial re-rendering | reactive-execution | low | open (blocked) |
| OQ-26 | Operation version conflicts | operation-graph | low | open |
| OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | open (deferred) |
| OQ-28 | Share analysis functions across instances? | flowgraph-api | low | open |
| OQ-29 | Separate graph per edge type? | host-configs | medium | open |
### Priority Assessment
**Resolved (ADR-005)**:
- ~~OQ-01: All edges or only compatible~~ — resolved: type-compat edges only on `dataFlow: true` edges
- ~~OQ-02: Type compatibility depth~~ — resolved: type checking only for state-transfer edges
- ~~OQ-05: Structural container transparency~~ — resolved: containers stay transparent, aggregate status is a projection
- ~~OQ-06: Template ↔ call protocol~~ — resolved: bridge through event log
- ~~OQ-07: Reactive engine owns call graph?~~ — resolved: both are projections of event log
- ~~OQ-08: Auto-populate `depends_on` from templates?~~ — resolved: unnecessary, data flows through result projection
- ~~OQ-09: Retries at signal level~~ — resolved: append events, not state mutations
- ~~OQ-10: Running node failure handling~~ — resolved: projection policy, default is running nodes continue
**High priority** (should resolve before implementation):
- (all high-priority questions have been resolved)
**Medium priority** (should resolve before v1 release):
- OQ-03, OQ-04, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29
**Low priority** (can defer or decide during implementation):
- OQ-13, OQ-15, OQ-16, OQ-18, OQ-19, OQ-24, OQ-25, OQ-26, OQ-27, OQ-28
### Cross-Cutting Themes
These groups of questions interact with each other and should be resolved together:
1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All affect the operation graph's edge structure and the type compatibility API. **Partially resolved by ADR-005.** OQ-01 and OQ-02 are resolved (type checking only on `dataFlow: true` edges). OQ-04 remains open (`edgeType` on all edges).
2. **~~Call protocol integration group~~** (OQ-06, OQ-07, OQ-08): ~~All about how flowgraph connects to the live call protocol.~~ **Resolved by ADR-005.** All three resolved: bridge through event log, projections instead of ownership, data flow through result projection.
3. **~~Failure semantics group~~** (OQ-09, OQ-10): ~~Both about how failure and retry propagate through the reactive engine.~~ **Resolved by ADR-005.** Retries are append events; running node failure is a projection policy.
4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints.
5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All about what the template system can express and how it renders.
6. **Graph structure group** (OQ-04, OQ-29): Both about how edge types are represented in the graph — OQ-04 at the schema/type level, OQ-29 at the runtime graph structure level. Resolution of one constrains the other.
7. **Known gaps from host-configs.md** — not all "known gaps" are "open questions" (the reconciler gap is a dependency, not a design question), but they should be tracked here for completeness.