From 27ebbd491ee4f70995e9e58a4c922d5074572dcc Mon Sep 17 00:00:00 2001
From: "glm-5.1" <glm-5.1@alk.dev>
Date: Wed, 20 May 2026 05:27:19 +0000
Subject: [PATCH] add open questions tracker: compile all unresolved questions
 across architecture docs into one cross-referenced view organized by theme
 and priority

---
 docs/architecture/README.md         |   4 +
 docs/architecture/open-questions.md | 395 ++++++++++++++++++++++++++++
 2 files changed, 399 insertions(+)
 create mode 100644 docs/architecture/open-questions.md

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index 0ac4528..3fef459 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -72,6 +72,10 @@ Flowgraph is in Phase 0/1 (exploration → architecture). No code exists yet. Th
 | [003](decisions/003-storage-decoupled.md) | Storage is not flowgraph's concern — in-memory graph with export/import boundary |
 | [004](decisions/004-no-schema-version.md) | No schema version field in serialized format — consumers wrap in their own versioned envelope |
 
+### Open Questions
+
+All unresolved design questions across the architecture are tracked in [open-questions.md](open-questions.md), organized by theme with cross-references between related questions.
+
 ## Consumer Context
 
 ### alkhub (hub-spoke coordinator)
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
new file mode 100644
index 0000000..684e179
--- /dev/null
+++ b/docs/architecture/open-questions.md
@@ -0,0 +1,395 @@
+---
+status: draft
+last_updated: 2026-05-20
+---
+
+# Open Questions Tracker
+
+Cross-cutting compilation of all unresolved questions across the flowgraph architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.
+
+## How to Use This Document
+
+- Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority** assessment
+- **Cross-references** link related questions that may conflict or answer each other
+- When a question is resolved, update its status to `resolved` and add a resolution note
+- Once all questions in a theme are resolved, the theme section can be removed
+
+## Theme 1: Edge Semantics and Type Compatibility
+
+### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q1
+- **Status**: open
+- **Priority**: high — affects storage size, API surface, and diagnostic value
+- **Options**:
+  - (a) Add both compatible and incompatible edges (current design). Pro: diagnostic information visible. Con: graph is larger.
+  - (b) Only add compatible edges, with a `potentialEdges()` query computing incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information.
+- **Notes**: This decision affects `buildTypeEdges()` in [analysis.md](analysis.md) and `OperationEdgeAttrs` in [schema.md](schema.md). The `compatible: false` attribute on edges only makes sense if option (a) is chosen.
+- **Cross-references**: OQ-04
+
+### OQ-02: How granular should type compatibility results be?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1
+- **Status**: open
+- **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs`
+- **Question (merged)**: How deep should `typeCompat` check? Should it be fully recursive? And should the result be `{ compatible, detail? }` or `{ compatible, mismatches: TypeMismatch[] }`?
+- **Current design**: The schema already defines `TypeMismatch` with `{ path, expected, actual }`, and `OperationEdgeAttrs` has an optional `mismatches` field. The analysis doc describes deep recursive structural comparison. The tension is that full recursive checking is more thorough but may reject compatible pairs (false negatives) for schemas with dynamic structures.
+- **Notes**: Since `OperationEdgeAttrs` (schema doc) and `TypeCompatResult` (analysis doc) both carry `mismatches`, the design appears to have converged on structured mismatch reporting. What remains is confirming: (a) recursive depth limits, (b) handling of `Type.Unknown()` and complex types (unions, intersections), (c) whether the `detail` string field is still needed alongside `mismatches`.
+- **Cross-references**: OQ-01 (incompatible edges need mismatch detail)
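+
+To make the remaining points (depth limits, `Type.Unknown()`, complex types) concrete, here is a minimal sketch of recursive structural comparison that reports `TypeMismatch` entries. It is not the planned implementation: it uses a hypothetical, simplified `SchemaNode` shape instead of TypeBox schemas, accepts `unknown` on either side, stops checking past a `maxDepth` cutoff, ignores producer-side optionality, and does not handle unions or intersections at all.
+
+```typescript
+// Hypothetical, simplified schema shape (not TypeBox), used only for illustration.
+type SchemaNode =
+  | { kind: "unknown" }
+  | { kind: "string" }
+  | { kind: "number" }
+  | { kind: "boolean" }
+  | { kind: "array"; items: SchemaNode }
+  | { kind: "object"; properties: Record<string, SchemaNode>; required: string[] };
+
+// Mirrors the { path, expected, actual } shape described in schema.md.
+interface TypeMismatch {
+  path: string;
+  expected: string;
+  actual: string;
+}
+
+interface TypeCompatResult {
+  compatible: boolean;
+  mismatches: TypeMismatch[];
+}
+
+// Can a value matching `output` (producer) be passed where `input` (consumer) is expected?
+function typeCompat(output: SchemaNode, input: SchemaNode, maxDepth = 8): TypeCompatResult {
+  const mismatches: TypeMismatch[] = [];
+  walk(output, input, "$", 0, maxDepth, mismatches);
+  return { compatible: mismatches.length === 0, mismatches };
+}
+
+function walk(out: SchemaNode, inp: SchemaNode, path: string, depth: number, maxDepth: number, acc: TypeMismatch[]): void {
+  // Assumption: `unknown` on either side is accepted, and nothing beyond maxDepth is checked.
+  if (out.kind === "unknown" || inp.kind === "unknown" || depth > maxDepth) return;
+  if (out.kind !== inp.kind) {
+    acc.push({ path, expected: inp.kind, actual: out.kind });
+    return;
+  }
+  if (out.kind === "array" && inp.kind === "array") {
+    walk(out.items, inp.items, `${path}[]`, depth + 1, maxDepth, acc);
+  } else if (out.kind === "object" && inp.kind === "object") {
+    // Only properties the consumer requires are checked.
+    for (const key of inp.required) {
+      const produced = out.properties[key];
+      const consumed = inp.properties[key];
+      if (produced === undefined) {
+        acc.push({ path: `${path}.${key}`, expected: consumed?.kind ?? "unknown", actual: "missing" });
+      } else if (consumed !== undefined) {
+        walk(produced, consumed, `${path}.${key}`, depth + 1, maxDepth, acc);
+      }
+    }
+  }
+}
+```
+
+Note that the `maxDepth` cutoff errs toward reporting compatibility; whether a truncated check should instead yield `compatible: false` or a separate "unchecked" marker is part of point (a).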
+
+### OQ-03: Should subscription operations be treated differently in type compatibility?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q3
+- **Status**: open
+- **Priority**: medium — affects operation graph edge semantics for streaming operations
+- **Question**: A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different. Should type compat check for subscriptions account for this?
+- **Notes**: This has downstream implications for call-graph population (subscriptions produce multiple `call.responded` events) and template authoring (a subscription feeding into a mutation has different semantics than a query feeding into a mutation). This may be deferred to v2, but the current behavior (subscriptions are treated the same as queries/mutations) should at least be documented.
+
+### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges?
+
+- **Origin**: [schema.md](schema.md) Q1
+- **Status**: open
+- **Priority**: medium — affects serialization format and edge handling across all graph types
+- **Options**:
+  - (a) `edgeType` required on all edges. Pro: consistent, self-describing. Con: operation graph edges are always `typed`, making the field redundant there.
+  - (b) Separate edge attribute types per graph mode (current implicit design — `CallEdgeAttrs` is a union, `OperationEdgeAttrs` doesn't include edge type). Con: graphology edges must carry attributes from a single schema.
+  - (c) Union type on edge attributes, letting the consumer tag the edge. Pro: flexible. Con: runtime discrimination burden.
+- **Notes**: The current schema already stores `edgeType` alongside the edge-specific attributes in graphology (see schema.md's "Edge type storage" section), which is effectively option (a) at the storage level. The question is really about the TypeScript type API: should `OperationEdgeAttrs` include `edgeType: "typed"` or should that be a separate concern?
+- **Cross-references**: OQ-01 (if incompatible edges exist, they need tagging)
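+
+For the TypeScript side of this question, a minimal sketch of option (a) as a discriminated union. The field sets are hypothetical and trimmed to attributes this document mentions (`compatible`, `mismatches`, `condition`); the three `edgeType` values are the ones listed for the shared graph in OQ-29.
+
+```typescript
+// Hypothetical edge attribute shapes for option (a): every edge carries `edgeType`.
+interface TypedEdgeAttrs {
+  edgeType: "typed";
+  compatible: boolean;
+  mismatches?: { path: string; expected: string; actual: string }[];
+}
+
+interface SequentialEdgeAttrs {
+  edgeType: "sequential";
+}
+
+interface ConditionalEdgeAttrs {
+  edgeType: "conditional";
+  condition: unknown; // the representation is itself open (OQ-20)
+}
+
+type EdgeAttrs = TypedEdgeAttrs | SequentialEdgeAttrs | ConditionalEdgeAttrs;
+
+// With a shared discriminant, consumers can narrow on the attributes alone.
+function describeEdge(attrs: EdgeAttrs): string {
+  switch (attrs.edgeType) {
+    case "typed":
+      return attrs.compatible ? "typed, compatible" : `typed, ${attrs.mismatches?.length ?? 0} mismatch(es)`;
+    case "sequential":
+      return "sequential";
+    case "conditional":
+      return "conditional";
+  }
+}
+```
+
+Under option (b), `TypedEdgeAttrs` would drop `edgeType`, so a function like `describeEdge` could no longer narrow on the attributes alone and would need the graph mode passed in separately.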
+
+---
+
+## Theme 2: Structural Container Transparency
+
+### OQ-05: Should `Sequential` and `Parallel` be transparent in the graph?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q1, [host-configs.md](host-configs.md) Q1
+- **Status**: open
+- **Priority**: high — fundamental to how the DAG is structured and how the reactive engine computes preconditions
+- **Question (merged)**: Currently, structural containers (`Sequential`, `Parallel`, `Conditional`) produce edges but no nodes. The reactive engine then has to reconstruct structural context to compute preconditions. Should they create "virtual" nodes instead?
+- **Options**:
+  - (a) Transparent (current design): No nodes for containers. Edges carry the structure. Pro: smaller DAG, cleaner topology. Con: precondition computation needs structural context (parentStack, siblingMap).
+  - (b) Virtual nodes: Containers create nodes with `signal<NodeStatus>`. Pro: every node has a status and preconditions, simpler reactive engine. Con: more nodes, container nodes that have no call-protocol equivalent, and slightly more complex graph queries.
+- **Notes**: The host-configs doc identifies this as a "known gap": `Sequential`, `Parallel`, `Conditional` are transparent in the DAG but create complexity for the reactive engine's "previous sibling" precondition logic. The reactive-execution doc's `WorkflowReactiveRoot.initializeSignals()` assumes it operates on the flattened DAG (all nodes are operations), which aligns with option (a). The question is whether the reactive engine's context maps (`parentMap`, `siblingMap`) are sufficient or if virtual nodes would simplify things.
+- **Cross-references**: OQ-25 (partial re-rendering)
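+
+Option (a) relies on containers being fully expressible as edges between operations. A minimal sketch, assuming a hypothetical `TemplateNode` tree with only `Sequential` and `Parallel` (no `Conditional`) and non-empty containers, threads each subtree's entry and exit operations through so that a node's predecessors come from edges alone:
+
+```typescript
+// Hypothetical template tree; only the container semantics matter here.
+type TemplateNode =
+  | { type: "operation"; id: string }
+  | { type: "sequential"; children: TemplateNode[] }
+  | { type: "parallel"; children: TemplateNode[] };
+
+interface FlatEdge {
+  from: string;
+  to: string;
+}
+
+// Option (a): containers create no nodes. Returns the subtree's entry and exit
+// operations so the enclosing container can wire it to its siblings.
+function flatten(node: TemplateNode, edges: FlatEdge[]): { entries: string[]; exits: string[] } {
+  switch (node.type) {
+    case "operation":
+      return { entries: [node.id], exits: [node.id] };
+    case "parallel": {
+      // Children are independent: the group's entries and exits are the union of theirs.
+      const parts = node.children.map((child) => flatten(child, edges));
+      return { entries: parts.flatMap((p) => p.entries), exits: parts.flatMap((p) => p.exits) };
+    }
+    case "sequential": {
+      let entries: string[] = [];
+      let prevExits: string[] | null = null;
+      for (const child of node.children) {
+        const part = flatten(child, edges);
+        if (prevExits === null) {
+          entries = part.entries;
+        } else {
+          // Every exit of the previous step precedes every entry of this step.
+          for (const from of prevExits) {
+            for (const to of part.entries) edges.push({ from, to });
+          }
+        }
+        prevExits = part.exits;
+      }
+      return { entries, exits: prevExits ?? [] };
+    }
+  }
+}
+```
+
+For `Sequential([A, Parallel([B, C]), D])` this yields `A→B`, `A→C`, `B→D`, `C→D` and no container nodes. `Conditional` branches and empty containers are left out; those are the cases where the structural context mentioned above may still be needed.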
+
+---
+
+## Theme 3: Call Protocol Integration
+
+### OQ-06: How does template instantiation interact with the call protocol?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q4, [host-configs.md](host-configs.md) Q3
+- **Status**: open
+- **Priority**: high — this is a fundamental integration point between flowgraph and the call protocol
+- **Question (merged)**: When a template is instantiated as a call graph, each `<Operation>` becomes a call. But the call protocol's `call.requested` events include `parentRequestId` — who is the parent? Is it the template instance? The hub coordinator? And how does the `ReactiveHostConfig` bridge to `registry.execute()` or `PendingRequestMap.call()`?
+- **Notes**: The consumer-integration doc shows the coordinator calling `registry.execute()` inside an `effect()`, but doesn't specify the `parentRequestId` semantics. This is a consumer-side decision, but flowgraph needs to document: (a) whether the template has its own `requestId`, (b) how the reactive engine signals the coordinator to start a call, (c) whether `ReactiveHostConfig` has a callback prop for this.
+- **Cross-references**: OQ-07, OQ-08
+
+### OQ-07: Should the reactive engine own the call graph?
+
+- **Origin**: [host-configs.md](host-configs.md) Q4
+- **Status**: open
+- **Priority**: high — affects the separation between flowgraph and the call protocol
+- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `<Operation>` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
+- **Options**:
+  - (a) Separate: Call graph is populated by call protocol events. Reactive engine uses signals only. Coordinator bridges them.
+  - (b) Unified: Reactive engine creates call graph nodes when nodes transition to `running`, updates them on completion. Call graph is derived from reactive state.
+- **Notes**: Option (a) matches ADR-003 (flowgraph doesn't do storage/persistence) and the current design where the call graph is populated by `updateFromEvent()`. Option (b) would couple the reactive engine to the call protocol. The current design's separation is cleaner but requires the coordinator to maintain both reactive state and call graph state.
+- **Cross-references**: OQ-06, OQ-08
+
+### OQ-08: Should `depends_on` edges be auto-populated from workflow templates?
+
+- **Origin**: [call-graph.md](call-graph.md) Q2
+- **Status**: open
+- **Priority**: medium — affects how the call graph and template system relate
+- **Question**: When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges in the call graph?
+- **Notes**: Currently `depends_on` edges must be added explicitly. Auto-population would couple the call graph to the template system. The alternative is for the coordinator to add `depends_on` edges when it instantiates a template.
+- **Cross-references**: OQ-06, OQ-21 (explicit `depends_on` in templates)
+
+---
+
+## Theme 4: Failure and Retry Semantics
+
+### OQ-09: How are retries handled at the signal level?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q2
+- **Status**: open
+- **Priority**: high — affects the core status state machine
+- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
+- **Options**:
+  - (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
+  - (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
+  - (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
+- **Notes**: Option (c) aligns with the call protocol, where each retry is a new call with a new `requestId`. This is likely the right answer but needs confirmation.
+- **Cross-references**: OQ-10
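+
+A minimal sketch of option (c), assuming a status set inferred from this document (the authoritative `NodeStatus` union lives in reactive-execution.md) and a hypothetical `Map`-based node store. The failed node is never mutated; a retry is a new node with a new `requestId` and a back-link:
+
+```typescript
+// Assumed status set; reactive-execution.md holds the real NodeStatus definition.
+type NodeStatus = "idle" | "waiting" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";
+
+const TERMINAL: ReadonlySet<NodeStatus> = new Set<NodeStatus>(["completed", "skipped", "failed", "aborted"]);
+
+function isTerminal(status: NodeStatus): boolean {
+  return TERMINAL.has(status);
+}
+
+interface CallNode {
+  requestId: string;
+  operationId: string;
+  status: NodeStatus;
+  retryOf?: string; // hypothetical link back to the attempt being retried
+}
+
+// Option (c): leave the failed node untouched and add a fresh attempt under a new requestId.
+function retryAsNewNode(nodes: Map<string, CallNode>, failedId: string, newRequestId: string): CallNode {
+  const failed = nodes.get(failedId);
+  if (failed === undefined || failed.status !== "failed") {
+    throw new Error(`node ${failedId} is not in the failed state`);
+  }
+  if (nodes.has(newRequestId)) {
+    throw new Error(`requestId ${newRequestId} is already in use`);
+  }
+  // The original attempt had already passed its preconditions, so the retry starts as ready.
+  const retry: CallNode = { requestId: newRequestId, operationId: failed.operationId, status: "ready", retryOf: failedId };
+  nodes.set(newRequestId, retry);
+  return retry;
+}
+```
+
+The sketch leaves edges out. Under the current spec, dependents in `idle`/`waiting` are aborted as soon as `blockedByFailure` becomes true, so a retry could only help them if that transition waits until no retry is pending, or if dependents are re-pointed at the new attempt.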
+
+### OQ-10: What happens to running nodes when a predecessor fails?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q6
+- **Status**: open
+- **Priority**: high — affects failure propagation correctness
+- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
+- **Options**:
+  - (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
+  - (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
+- **Notes**: Option (a) is consistent with "failure follows dependency edges, not structural scope" — a running node has already passed its preconditions, so it should be allowed to complete. The coordinator can choose to abort it. Option (b) would be more aggressive. The reactive-execution doc's constraint says "abort is immediate in signals, delayed in protocol," suggesting option (a) is intended.
+- **Cross-references**: OQ-09 (retries need to know if a running node can be restarted)
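+
+A minimal sketch of option (a). The design uses `computed` and `effect()`; to avoid tying the example to a particular signals library, these are written as plain functions over predecessor statuses, and the status set is an assumption:
+
+```typescript
+// Assumed status set; reactive-execution.md holds the real NodeStatus definition.
+type NodeStatus = "idle" | "waiting" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";
+
+// AND logic from the current spec: every predecessor completed or skipped.
+function preconditionsMet(preds: NodeStatus[]): boolean {
+  return preds.every((s) => s === "completed" || s === "skipped");
+}
+
+// True if any predecessor failed or was aborted.
+function blockedByFailure(preds: NodeStatus[]): boolean {
+  return preds.some((s) => s === "failed" || s === "aborted");
+}
+
+// Option (a): only nodes that have not started are aborted; running nodes keep running.
+function nextStatusOnFailure(status: NodeStatus, preds: NodeStatus[]): NodeStatus {
+  if (blockedByFailure(preds) && (status === "idle" || status === "waiting")) {
+    return "aborted";
+  }
+  return status;
+}
+```
+
+Under option (b) the guard would also match `"running"`, and the coordinator would still have to cancel the in-flight call (for example via `prm.abort()`), because abort is immediate in signals but delayed in the protocol.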
+
+---
+
+## Theme 5: Preconditions and Scheduling
+
+### OQ-11: Should preconditions support OR logic?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q1
+- **Status**: open
+- **Priority**: medium — affects the precondition computation model
+- **Question**: Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes."
+- **Notes**: OR preconditions would require either: (a) an edge attribute indicating `allOf` vs `anyOf`, (b) a node-level configuration, or (c) a separate `anyOfPredecessors` computed per node. This is a semantic change that affects both the DAG structure and the reactive engine. Might be a v2 feature.
+- **Cross-references**: OQ-12
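+
+A minimal sketch of option (b), a hypothetical node-level `JoinMode`. Predecessor statuses are collapsed into four assumed values, with `pending` standing for anything not yet terminal. Writing it out surfaces two choices OR logic would force: whether `skipped` satisfies `anyOf`, and when an `anyOf` node counts as blocked:
+
+```typescript
+// Assumed, collapsed predecessor states: "pending" covers anything not yet terminal.
+type PredStatus = "pending" | "completed" | "skipped" | "failed";
+type JoinMode = "allOf" | "anyOf"; // hypothetical node-level setting
+
+function preconditionsMet(mode: JoinMode, preds: PredStatus[]): boolean {
+  if (preds.length === 0) return true; // root nodes start immediately in either mode
+  if (mode === "allOf") return preds.every((s) => s === "completed" || s === "skipped");
+  // Choice made here: a skipped predecessor does not satisfy anyOf.
+  return preds.some((s) => s === "completed");
+}
+
+function blockedByFailure(mode: JoinMode, preds: PredStatus[]): boolean {
+  if (mode === "allOf") return preds.some((s) => s === "failed");
+  // anyOf is blocked only once no predecessor can still complete
+  // (which here includes the all-skipped case, arguably not a failure).
+  return preds.length > 0 && preds.every((s) => s === "failed" || s === "skipped");
+}
+```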
+
+### OQ-12: How does `maxConcurrency` interact with preconditions?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q4
+- **Status**: open
+- **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time
+- **Notes**: `maxConcurrency` is a scheduling concern, not a structural one. The DAG doesn't encode it. Options: (a) a semaphore signal in the reactive layer, (b) coordinator-enforced throttling, (c) a `maxConcurrency` prop on `Parallel` that the reactive engine respects. The `<Parallel>` component already has `maxConcurrency` as an optional prop in its definition (workflow-templates.md).
+- **Cross-references**: OQ-11, workflow-templates `Parallel` component
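+
+A minimal sketch of options (a) and (c): a hypothetical per-group gate that the reactive layer would consult after preconditions. Preconditions decide which nodes may start; the gate decides how many of them start now:
+
+```typescript
+// Hypothetical limiter for one <Parallel> group; not part of any existing API.
+class ParallelGroupGate {
+  private readonly running = new Set<string>();
+  private readonly maxConcurrency: number;
+
+  constructor(maxConcurrency: number = Infinity) {
+    if (!(maxConcurrency >= 1)) throw new Error("maxConcurrency must be at least 1");
+    this.maxConcurrency = maxConcurrency;
+  }
+
+  // Returns the nodes from `ready` (in order) that may transition to running now.
+  admit(ready: string[]): string[] {
+    const admitted: string[] = [];
+    for (const id of ready) {
+      if (this.running.size >= this.maxConcurrency) break;
+      if (this.running.has(id)) continue;
+      this.running.add(id);
+      admitted.push(id);
+    }
+    return admitted;
+  }
+
+  // Call when a node reaches a terminal status to free its slot.
+  release(id: string): void {
+    this.running.delete(id);
+  }
+}
+```
+
+With `maxConcurrency: 3` and five ready nodes, `admit` returns the first three; each `release` frees one slot for the next `admit` call. One consequence: a node can meet its preconditions and still not start, so the status machine has to say whether such a node stays `ready` or gets a distinct state.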
+
+### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q5
+- **Status**: open
+- **Priority**: low — implementation detail, can be decided during implementation
+- **Question**: Currently there are two separate `computed` values — `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed<NodeReadiness>` returning `"ready" | "blocked" | "failed"`.
+- **Notes**: Two separate `computed` values are more composable (preconditions can be checked independently of failure status) but require two effects per node. A single `computed` is simpler (one effect), but each condition then has to be derived from the combined value. This is largely an implementation choice that doesn't affect the public API, so it can be deferred to implementation.
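+
+For comparison with the two-`computed` design, a minimal sketch of the single `NodeReadiness` alternative as a plain function (again avoiding a specific signals library, with an assumed status set). Giving `"failed"` precedence over `"blocked"` is an assumption:
+
+```typescript
+// Assumed status set; reactive-execution.md holds the real NodeStatus definition.
+type NodeStatus = "idle" | "waiting" | "ready" | "running" | "completed" | "skipped" | "failed" | "aborted";
+type NodeReadiness = "ready" | "blocked" | "failed";
+
+// One derived value replaces the preconditions/blockedByFailure pair.
+function readiness(preds: NodeStatus[]): NodeReadiness {
+  if (preds.some((s) => s === "failed" || s === "aborted")) return "failed";
+  if (preds.every((s) => s === "completed" || s === "skipped")) return "ready";
+  return "blocked";
+}
+```
+
+Both booleans from the two-`computed` design can be recovered (`readiness(p) === "ready"` and `readiness(p) === "failed"`), so the choice mostly comes down to how many effects each node needs.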
+
+---
+
+## Theme 6: Graph Construction and API Surface
+
+### OQ-14: Should the call graph support unknown `operationId`?
+
+- **Origin**: [call-graph.md](call-graph.md) Q1
+- **Status**: open (with a proposed answer)
+- **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior
+- **Proposed answer**: Yes. The call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
+- **Notes**: The doc already has a proposed answer. It needs confirmation, and the behavior should then be documented in the `fromCallEvents()` spec.
+
+### OQ-15: Should the call graph support multiple graphs simultaneously?
+
+- **Origin**: [call-graph.md](call-graph.md) Q3
+- **Status**: open
+- **Priority**: low — can be deferred to v2
+- **Question**: Currently one `FlowGraph` instance holds one call graph. If the hub needs to track multiple concurrent workflows, it uses multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
+- **Notes**: The current design (multiple instances) is simpler and matches graphology's model. Subgraphs would require a scoping mechanism. This can be deferred unless early usage shows it's needed.
+
+### OQ-16: Should `filterByStatus` use an index?
+
+- **Origin**: [call-graph.md](call-graph.md) Q4
+- **Status**: open
+- **Priority**: low — premature optimization for small graphs
+- **Notes**: Call graphs at hub level are typically tens of nodes. O(n) filter is fast enough. An index can be added later if performance becomes an issue. Can be deferred.
+
+### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1
+- **Status**: open
+- **Priority**: medium — affects the public API surface
+- **Question**: The current plan is to provide convenience methods that delegate to the underlying graphology graph. But some consumers may find it inconvenient to go through `.graph.forEachNode()`.
+- **Options**:
+  - (a) Convenience methods only (current plan). Direct access via `.graph` for power users.
+  - (b) Expose graphology's traversal methods directly on `FlowGraph` (e.g., `flowGraph.forEachNode()`).
+  - (c) Expose only the most common traversal methods and let `.graph` handle the rest.
+- **Notes**: This is a UX decision. Option (a) keeps the API surface small. Option (b) is more convenient but increases the delegation surface. Option (c) is a middle ground. The decision can be made during implementation based on actual consumer usage patterns.
+
+### OQ-18: Should `addOperation` auto-populate type-compat edges?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2
+- **Status**: open
+- **Priority**: low — affects incremental construction behavior
+- **Question**: `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. Should `addOperation()` (incremental) also attempt auto-type-compat edge creation?
+- **Notes**: This is only relevant for incremental construction (rare use case). The operation graph is typically built once via `fromSpecs()`. If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations. Can be deferred.
+
+### OQ-28: Should `FlowGraph` share analysis functions across instances?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3
+- **Status**: open
+- **Priority**: low — optimization concern, not blocking
+- **Question**: Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances.
+- **Notes**: Distinct from OQ-15 (multiple graphs per instance) — this is about sharing analysis logic, not about graph scoping. Can be deferred.
+
+### OQ-19: Should `parallelGroups` account for resource constraints?
+
+- **Origin**: [analysis.md](analysis.md) Q4
+- **Status**: open
+- **Priority**: low — feature enhancement, not a core concern
+- **Question**: Currently `parallelGroups()` returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling.
+- **Notes**: Can be added later as an optional parameter. Not blocking.
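+
+A minimal sketch of the optional parameter, assuming `parallelGroups()` returns topological levels (every node's predecessors sit in earlier groups), computed here with Kahn's algorithm. The `maxConcurrency` handling, which splits each level into chunks, is one hypothetical interpretation:
+
+```typescript
+// Assumption: parallelGroups() returns topological levels, computed with Kahn's algorithm.
+// Edges are [from, to] pairs and must reference ids present in `nodes`.
+function parallelGroups(nodes: string[], edges: [string, string][], maxConcurrency?: number): string[][] {
+  if (maxConcurrency !== undefined && !(maxConcurrency >= 1)) {
+    throw new Error("maxConcurrency must be at least 1");
+  }
+  const indegree = new Map<string, number>(nodes.map((n): [string, number] => [n, 0]));
+  const successors = new Map<string, string[]>(nodes.map((n): [string, string[]] => [n, []]));
+  for (const [from, to] of edges) {
+    successors.get(from)!.push(to);
+    indegree.set(to, indegree.get(to)! + 1);
+  }
+  const groups: string[][] = [];
+  let level = nodes.filter((n) => indegree.get(n) === 0);
+  let visited = 0;
+  while (level.length > 0) {
+    visited += level.length;
+    // Hypothetical maxConcurrency: split the level into chunks of at most that size.
+    const size = maxConcurrency ?? level.length;
+    for (let i = 0; i < level.length; i += size) {
+      groups.push(level.slice(i, i + size));
+    }
+    // Release successors whose last remaining predecessor was in this level.
+    const next: string[] = [];
+    for (const n of level) {
+      for (const s of successors.get(n)!) {
+        const remaining = indegree.get(s)! - 1;
+        indegree.set(s, remaining);
+        if (remaining === 0) next.push(s);
+      }
+    }
+    level = next;
+  }
+  if (visited !== nodes.length) throw new Error("graph contains a cycle");
+  return groups;
+}
+```
+
+Splitting per level can leave slots idle: a node in the next level waits for every chunk of the current level, even if its own predecessors finished in an earlier chunk. A runtime limit like the one discussed in OQ-12 would avoid that by admitting nodes as their own preconditions are met.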
+
+### OQ-27: Should `validateTemplate` check runtime preconditions?
+
+- **Origin**: [analysis.md](analysis.md) Q2
+- **Status**: open (intentionally deferred)
+- **Priority**: low — explicitly out of scope for static analysis
+- **Question**: Currently `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.
+- **Notes**: This is a deliberate scope boundary, not a design gap. Documented here to confirm that this is an intentional deferral, not an oversight.
+
+---
+
+## Theme 7: Conditional and Template Semantics
+
+### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type?
+
+- **Origin**: [host-configs.md](host-configs.md) Q2
+- **Status**: open
+- **Priority**: medium — affects implementation of the GraphologyHostConfig
+- **Question**: Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering.
+- **Notes**: Related to OQ-04 (edge type consistency at the schema level) but distinct — this is about the runtime graph structure, not the type design. Multiple graphs would make type-specific queries faster (no filtering) but increase complexity and memory usage.
+- **Cross-references**: OQ-04
+
+### OQ-20: How should conditional edge conditions be represented?
+
+- **Origin**: [schema.md](schema.md) Q3
+- **Status**: open
+- **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety
+- **Options**:
+  - (a) `Type.Unknown()` with documentation (current). Pro: maximally flexible. Con: no type safety.
+  - (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references. Pro: documents both forms. Con: functions don't serialize.
+  - (c) A dedicated `ConditionSchema` that flowgraph defines. Pro: type safe, consistent. Con: may be overly prescriptive.
+- **Notes**: The workflow-templates doc already specifies `Conditional.test` as `((results: Record<string, CallResult>) => boolean) | string`, and the host-configs doc notes that function props need runtime resolution. Option (b) seems like the pragmatic choice that matches the existing design, but the schema representation is what needs deciding.
+- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results. This is a concrete sub-problem of OQ-06 (how the reactive host config bridges to execution).
+- **Cross-references**: OQ-05 (conditional branch behavior in reactive engine), OQ-06 (runtime resolution of function props)
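+
+To show why the two forms in option (b) behave differently, a minimal sketch of runtime evaluation for the union. `CallResult` is a hypothetical stand-in, and treating the string form as the name of a consumer-registered predicate is an assumption; the string could equally be an expression language, which is part of what this question leaves open. Predecessor results are passed in explicitly, which is the runtime access the known gap above describes:
+
+```typescript
+// Hypothetical stand-in for the call protocol's result type.
+type CallResult = { ok: boolean; value?: unknown };
+type Predicate = (results: Record<string, CallResult>) => boolean;
+type ConditionTest = Predicate | string;
+
+// Assumption: a string test names a predicate in a consumer-supplied registry.
+function evaluateCondition(test: ConditionTest, results: Record<string, CallResult>, registry: Record<string, Predicate>): boolean {
+  if (typeof test === "function") return test(results);
+  const predicate = registry[test];
+  if (predicate === undefined) throw new Error(`unknown condition: ${test}`);
+  return predicate(results);
+}
+
+// The serialization problem with the function form: JSON.stringify drops function-valued properties.
+const classified: Predicate = (results) => results["classify"]?.ok === true;
+const serialized = JSON.stringify({ condition: classified }); // "{}"
+const serializedByName = JSON.stringify({ condition: "classified" }); // '{"condition":"classified"}'
+```
+
+A string-keyed condition survives an export/import round trip; a function has to be re-attached by the consumer after import.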
+
+### OQ-21: Should templates support explicit `depends_on` edges?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q3
+- **Status**: open
+- **Priority**: medium — affects template composition expressiveness
+- **Question**: Currently dependencies are inferred from structure (sequential implies dependency). An explicit `<DependsOn target="operation-name" />` component would make data dependencies visible in the template without relying on sequential ordering.
+- **Notes**: This would add expressiveness but also complexity. Implicit dependency from structure is simpler and covers the most common cases. Explicit `depends_on` would be needed when a node depends on a non-adjacent predecessor in a way that can't be expressed by a `Sequential` group. Can be deferred to v2.
+- **Cross-references**: OQ-08 (call graph `depends_on` edges)
+
+---
+
+## Theme 8: Identity and Serialization
+
+### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`?
+
+- **Origin**: [schema.md](schema.md) Q2
+- **Status**: open
+- **Priority**: medium — affects the `@alkdev/operations` peer dependency
+- **Options**:
+  - (a) Import `Identity` from `@alkdev/operations` (peer dep). Pro: matches call protocol. Con: creates a direct type dependency.
+  - (b) Duplicate the type in flowgraph. Pro: no dependency. Con: divergence risk.
+  - (c) Use `Type.Record(Type.String(), Type.Array(Type.String()))` for the `resources` field. Pro: flexible. Con: weaker typing.
+- **Notes**: Since `@alkdev/operations` is already a peer dependency for type imports, option (a) seems reasonable. The concern is version alignment, but semver ranges handle this. This could also be a `Type.Unknown()` with documentation, letting the consumer validate.
+
+### OQ-23: Multiple graphs per `FlowGraph` instance?
+
+- **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15)
+- **Status**: open (duplicate of OQ-15 — see above)
+
+### OQ-24: Async analysis functions?
+
+- **Origin**: [analysis.md](analysis.md) Q3
+- **Status**: open
+- **Priority**: low — premature for current scale
+- **Question**: Should analysis functions be async for large graphs? Current graphs are small (50-200 nodes), so synchronous execution is fine.
+- **Notes**: Can be deferred. If large graphs become common, async analysis can be added with an optional `async` variant.
+
+---
+
+## Theme 9: Reactive Execution Mechanics
+
+### OQ-25: Should the reactive graph support partial re-rendering?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q3
+- **Status**: open (blocked on ujsx reconciler)
+- **Priority**: low — blocked on ujsx reconciler implementation
+- **Question**: If a template changes mid-execution, the ujsx reconciler could diff and apply changes. Currently only mount rendering is supported.
+- **Known Gap** (from [host-configs.md](host-configs.md)): "ujsx Reconciler Not Yet Available" — the current `HostConfig` is mount-only: no incremental template updates, no `prepareUpdate`/`commitUpdate` flow. This gap is broader than just re-rendering.
+- **Notes**: This is entirely dependent on the ujsx reconciler, which is not yet implemented. The host-configs doc notes "currently mount-only." When the reconciler is available, flowgraph gets re-rendering "for free." This question should be revisited after the reconciler is implemented.
+- **Cross-references**: OQ-05 (structural container handling during re-render), host-configs.md "Known Gaps"
+
+---
+
+## Theme 10: Version and Scale Concerns
+
+### OQ-26: How to handle version conflicts?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q2
+- **Status**: open
+- **Priority**: low — can be deferred to a versioning use case
+- **Question**: If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win?
+- **Notes**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises.
+
+---
+
+## Summary Table
+
+| ID | Question | Origin | Priority | Status |
+|----|----------|--------|----------|--------|
+| OQ-01 | All edges or only compatible edges? | operation-graph | high | open |
+| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | open |
+| OQ-03 | Subscription operations in type compat | operation-graph | medium | open |
+| OQ-04 | `edgeType` on all edges? | schema | medium | open |
+| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | open |
+| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | open |
+| OQ-07 | Should reactive engine own call graph? | host-configs | high | open |
+| OQ-08 | Auto-populate `depends_on` from templates? | call-graph | medium | open |
+| OQ-09 | Retries at signal level | reactive-execution | high | open |
+| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | open |
+| OQ-11 | OR logic for preconditions | reactive-execution | medium | open |
+| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open |
+| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open |
+| OQ-14 | Unknown `operationId` in call graph | call-graph | medium | open (proposed) |
+| OQ-15 | Multiple graphs per instance | call-graph | low | open |
+| OQ-16 | `filterByStatus` index | call-graph | low | open |
+| OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | open |
+| OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | open |
+| OQ-19 | `parallelGroups` with resource constraints | analysis | low | open |
+| OQ-20 | Conditional edge condition representation | schema | medium | open |
+| OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | open |
+| OQ-22 | `CallNodeAttrs.identity` type | schema | medium | open |
+| OQ-24 | Async analysis functions | analysis | low | open |
+| OQ-25 | Partial re-rendering | reactive-execution | low | open (blocked) |
+| OQ-26 | Operation version conflicts | operation-graph | low | open |
+| OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | open (deferred) |
+| OQ-28 | Share analysis functions across instances? | flowgraph-api | low | open |
+| OQ-29 | Separate graph per edge type? | host-configs | medium | open |
+
+### Priority Assessment
+
+**High priority** (should resolve before implementation):
+- OQ-01: All edges or only compatible — shapes the entire operation graph API
+- OQ-02: Type compatibility depth — shapes `typeCompat()` return type
+- OQ-05: Structural container transparency — fundamental to DAG and reactive engine
+- OQ-06: Template ↔ call protocol — fundamental integration point
+- OQ-07: Reactive engine owns call graph? — affects architecture boundaries
+- OQ-09: Retries — shapes the state machine
+- OQ-10: Running node failure handling — shapes failure propagation
+
+**Medium priority** (should resolve before v1 release):
+- OQ-03, OQ-04, OQ-08, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29
+
+**Low priority** (can defer or decide during implementation):
+- OQ-13, OQ-15, OQ-16, OQ-18, OQ-19, OQ-24, OQ-25, OQ-26, OQ-27, OQ-28
+
+### Cross-Cutting Themes
+
+These groups of questions interact with each other and should be resolved together:
+
+1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All affect the operation graph's edge structure and the type compatibility API.
+
+2. **Call protocol integration group** (OQ-06, OQ-07, OQ-08): All about how flowgraph connects to the live call protocol.
+
+3. **Failure semantics group** (OQ-09, OQ-10): Both about how failure and retry propagate through the reactive engine. Resolving one may resolve or constrain the other.
+
+4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints.
+
+5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All about what the template system can express and how it renders.
+
+6. **Graph structure group** (OQ-04, OQ-29): Both about how edge types are represented in the graph — OQ-04 at the schema/type level, OQ-29 at the runtime graph structure level. Resolution of one constrains the other.
+
+7. **Known gaps from host-configs.md** (OQ-05, OQ-20, OQ-25): not all known gaps are open questions (the reconciler gap behind OQ-25 is a dependency, not a design question), but each is tracked here under the question it affects.
\ No newline at end of file