From 27ebbd491ee4f70995e9e58a4c922d5074572dcc Mon Sep 17 00:00:00 2001
From: "glm-5.1"
Date: Wed, 20 May 2026 05:27:19 +0000
Subject: [PATCH] add open questions tracker: compile all unresolved questions across architecture docs into one cross-referenced view organized by theme and priority

---
 docs/architecture/README.md         |   4 +
 docs/architecture/open-questions.md | 395 ++++++++++++++++++++++++++++
 2 files changed, 399 insertions(+)
 create mode 100644 docs/architecture/open-questions.md

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index 0ac4528..3fef459 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -72,6 +72,10 @@ Flowgraph is in Phase 0/1 (exploration → architecture). No code exists yet. Th
 | [003](decisions/003-storage-decoupled.md) | Storage is not flowgraph's concern — in-memory graph with export/import boundary |
 | [004](decisions/004-no-schema-version.md) | No schema version field in serialized format — consumers wrap in their own versioned envelope |
 
+### Open Questions
+
+All unresolved design questions across the architecture are tracked in [open-questions.md](open-questions.md), organized by theme with cross-references between related questions.
+
 ## Consumer Context
 
 ### alkhub (hub-spoke coordinator)
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
new file mode 100644
index 0000000..684e179
--- /dev/null
+++ b/docs/architecture/open-questions.md
@@ -0,0 +1,395 @@
+---
+status: draft
+last_updated: 2026-05-20
+---
+
+# Open Questions Tracker
+
+Cross-cutting compilation of all unresolved questions across the flowgraph architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.
+
+## How to Use This Document
+
+- Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority** assessment
+- **Cross-references** link related questions that may conflict or answer each other
+- When a question is resolved, update its status to `resolved` and add a resolution note
+- Once all questions in a theme are resolved, the theme section can be removed
+
+## Theme 1: Edge Semantics and Type Compatibility
+
+### OQ-01: Should `fromSpecs()` add ALL edges or only compatible ones?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q1
+- **Status**: open
+- **Priority**: high — affects storage size, API surface, and diagnostic value
+- **Options**:
+  - (a) Add both compatible and incompatible edges (current design). Pro: diagnostic information visible. Con: graph is larger.
+  - (b) Only add compatible edges, with a `potentialEdges()` query computing incompatible connections on demand. Pro: smaller graph. Con: loses diagnostic information.
+- **Notes**: This decision affects `buildTypeEdges()` in [analysis.md](analysis.md) and `OperationEdgeAttrs` in [schema.md](schema.md). The `compatible: false` attribute on edges only makes sense if option (a) is chosen.
+- **Cross-references**: OQ-04
+
+### OQ-02: How granular should type compatibility results be?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q4, [analysis.md](analysis.md) Q1
+- **Status**: open
+- **Priority**: high — directly shapes the `typeCompat()` return type and `OperationEdgeAttrs`
+- **Question (merged)**: How deep should `typeCompat` check? Should it be fully recursive? And should the result be `{ compatible, detail? }` or `{ compatible, mismatches: TypeMismatch[] }`?
+- **Current design**: The schema already defines `TypeMismatch` with `{ path, expected, actual }` and `OperationEdgeAttrs` has an optional `mismatches` field. The analysis doc describes deep recursive structural comparison.
But there's a tension: full recursive checking is more thorough but may produce false negatives for schemas with dynamic structures.
+- **Notes**: The schema doc already has `mismatches?: TypeMismatch[]` in `OperationEdgeAttrs`. The analysis doc already defines `TypeCompatResult` with `mismatches`. This suggests the design has already converged toward structured mismatch reporting. What remains is confirming: (a) recursive depth limits, (b) handling of `Type.Unknown()` and complex types (unions, intersections), (c) whether the `detail` string field is still needed alongside `mismatches`.
+- **Cross-references**: OQ-01 (incompatible edges need mismatch detail)
+
+### OQ-03: Should subscription operations be treated differently in type compatibility?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q3
+- **Status**: open
+- **Priority**: medium — affects operation graph edge semantics for streaming operations
+- **Question**: A subscription produces a stream, not a single output. Its `outputSchema` describes a single stream element, but the data flow semantics are different. Should the type compatibility check for subscriptions account for this?
+- **Notes**: This has downstream implications for call-graph population (subscriptions produce multiple `call.responded` events) and template authoring (a subscription feeding into a mutation has different semantics than a query feeding into a mutation). The decision may be deferred to v2, but the current behavior should at least be documented (subscriptions are treated the same as queries/mutations).
+
+### OQ-04: Edge type consistency — should `edgeType` be required on ALL edges?
+
+- **Origin**: [schema.md](schema.md) Q1
+- **Status**: open
+- **Priority**: medium — affects serialization format and edge handling across all graph types
+- **Options**:
+  - (a) `edgeType` required on all edges. Pro: consistent, self-describing. Con: operation graph edges are always `typed`, making the field redundant there.
+  - (b) Separate edge attribute types per graph mode (current implicit design — `CallEdgeAttrs` is a union, `OperationEdgeAttrs` doesn't include edge type). Con: graphology edges must carry attributes from a single schema.
+  - (c) Union type on edge attributes, letting the consumer tag the edge. Pro: flexible. Con: runtime discrimination burden.
+- **Notes**: The current schema already stores `edgeType` alongside the edge-specific attributes in graphology (see schema.md's "Edge type storage" section), which is effectively option (a) at the storage level. The question is really about the TypeScript type API: should `OperationEdgeAttrs` include `edgeType: "typed"` or should that be a separate concern?
+- **Cross-references**: OQ-01 (if incompatible edges exist, they need tagging)
+
+---
+
+## Theme 2: Structural Container Transparency
+
+### OQ-05: Should `Sequential` and `Parallel` be transparent in the graph?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q1, [host-configs.md](host-configs.md) Q1
+- **Status**: open
+- **Priority**: high — fundamental to how the DAG is structured and how the reactive engine computes preconditions
+- **Question (merged)**: Currently, structural containers (`Sequential`, `Parallel`, `Conditional`) produce edges but no nodes. The reactive engine then has to reconstruct structural context to compute preconditions. Should they create "virtual" nodes instead?
+- **Options**:
+  - (a) Transparent (current design): No nodes for containers. Edges carry the structure. Pro: smaller DAG, cleaner topology. Con: precondition computation needs structural context (parentStack, siblingMap).
+  - (b) Virtual nodes: Containers create nodes with `signal`. Pro: every node has a status and preconditions, simpler reactive engine. Con: more nodes, containers with no call protocol equivalent, slightly more complex graph queries.
+- **Notes**: The host-configs doc identifies this as a "known gap": `Sequential`, `Parallel`, `Conditional` are transparent in the DAG but create complexity for the reactive engine's "previous sibling" precondition logic. The reactive-execution doc's `WorkflowReactiveRoot.initializeSignals()` assumes it operates on the flattened DAG (all nodes are operations), which aligns with option (a). The question is whether the reactive engine's context maps (`parentMap`, `siblingMap`) are sufficient or if virtual nodes would simplify things.
+- **Cross-references**: OQ-25 (partial re-rendering)
+
+---
+
+## Theme 3: Call Protocol Integration
+
+### OQ-06: How does template instantiation interact with the call protocol?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q4, [host-configs.md](host-configs.md) Q3
+- **Status**: open
+- **Priority**: high — this is a fundamental integration point between flowgraph and the call protocol
+- **Question (merged)**: When a template is instantiated as a call graph, each `` becomes a call. But the call protocol's `call.requested` events include `parentRequestId` — who is the parent? Is it the template instance? The hub coordinator? And how does the `ReactiveHostConfig` bridge to `registry.execute()` or `PendingRequestMap.call()`?
+- **Notes**: The consumer-integration doc shows the coordinator calling `registry.execute()` inside an `effect()`, but doesn't specify the `parentRequestId` semantics. This is a consumer-side decision, but flowgraph needs to document: (a) whether the template has its own `requestId`, (b) how the reactive engine signals the coordinator to start a call, (c) whether `ReactiveHostConfig` has a callback prop for this.
+- **Cross-references**: OQ-07, OQ-08
+
+### OQ-07: Should the reactive engine own the call graph?
+
+- **Origin**: [host-configs.md](host-configs.md) Q4
+- **Status**: open
+- **Priority**: high — affects the separation between flowgraph and the call protocol
+- **Question**: Currently the call graph (from call-graph.md) and the reactive engine (from reactive-execution.md) are separate concepts. But at runtime, every `` in a template becomes a call graph node. Should the reactive engine populate the call graph as a side effect?
+- **Options**:
+  - (a) Separate: Call graph is populated by call protocol events. Reactive engine uses signals only. Coordinator bridges them.
+  - (b) Unified: Reactive engine creates call graph nodes when nodes transition to `running`, updates them on completion. Call graph is derived from reactive state.
+- **Notes**: Option (a) matches ADR-003 (flowgraph doesn't do storage/persistence) and the current design where the call graph is populated by `updateFromEvent()`. Option (b) would couple the reactive engine to the call protocol. The current design's separation is cleaner but requires the coordinator to maintain both reactive state and call graph state.
+
+### OQ-08: Should `depends_on` edges be auto-populated from workflow templates?
+
+- **Origin**: [call-graph.md](call-graph.md) Q2
+- **Status**: open
+- **Priority**: medium — affects how the call graph and template system relate
+- **Question**: When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges in the call graph?
+- **Notes**: Currently `depends_on` edges must be added explicitly. Auto-population would couple the call graph to the template system. The alternative is for the coordinator to add `depends_on` edges when it instantiates a template.
+- **Cross-references**: OQ-06, workflow-templates Q3 (explicit `depends_on` in templates)
+
+---
+
+## Theme 4: Failure and Retry Semantics
+
+### OQ-09: How are retries handled at the signal level?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q2
+- **Status**: open
+- **Priority**: high — affects the core status state machine
+- **Question**: If an operation fails and should be retried, the status would need to go `running → failed → ready → running`. But the current state machine marks `failed` as terminal with no exit transitions. How should this work?
+- **Options**:
+  - (a) A `retried` status that allows re-entering `ready`. Con: adds another state to `NodeStatus`.
+  - (b) A separate `retryCount` attribute. A node can reset its status from `failed` to `ready` if `retryCount < maxRetries`. Con: breaks the terminal-state invariant.
+  - (c) Retry creates a new node (new `requestId`). The old node stays `failed`. Con: increases graph size but preserves state machine integrity.
+- **Notes**: Option (c) aligns with the call protocol, where each retry is a new call with a new `requestId`. This is likely the right answer but needs confirmation.
+- **Cross-references**: OQ-10
+
+### OQ-10: What happens to running nodes when a predecessor fails?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q6
+- **Status**: open
+- **Priority**: high — affects failure propagation correctness
+- **Question**: The current spec transitions `idle` and `waiting` nodes to `aborted` when `blockedByFailure` becomes true. But what about a node that's already `running`? Should it be cancelled?
+- **Options**:
+  - (a) Running nodes are NOT affected. A predecessor's failure blocks dependents that haven't started, but running nodes continue. The coordinator can cancel them via `prm.abort()` if desired.
+  - (b) Running nodes automatically transition to `aborted`. This requires the `effect()` to check for running nodes.
+- **Notes**: Option (a) is consistent with "failure follows dependency edges, not structural scope" — a running node has already passed its preconditions, so it should be allowed to complete. The coordinator can choose to abort it. Option (b) would be more aggressive. The reactive-execution doc's constraint says "abort is immediate in signals, delayed in protocol," suggesting option (a) is intended.
+- **Cross-references**: OQ-09 (retries need to know if a running node can be restarted)
+
+---
+
+## Theme 5: Preconditions and Scheduling
+
+### OQ-11: Should preconditions support OR logic?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q1
+- **Status**: open
+- **Priority**: medium — affects the precondition computation model
+- **Question**: Currently all predecessors must complete (AND logic). An `anyOf` predicate would allow "start this node as soon as any predecessor completes."
+- **Notes**: OR preconditions would require either: (a) an edge attribute indicating `allOf` vs `anyOf`, (b) a node-level configuration, or (c) a separate `anyOfPredecessors` computed per node. This is a semantic change that affects both the DAG structure and the reactive engine. Might be a v2 feature.
+- **Cross-references**: OQ-12
+
+### OQ-12: How does `maxConcurrency` interact with preconditions?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q4
+- **Status**: open
+- **Priority**: medium — a `Parallel` group with `maxConcurrency: 3` should only start 3 nodes at a time
+- **Notes**: `maxConcurrency` is a scheduling concern, not a structural one. The DAG doesn't encode it. Options: (a) a semaphore signal in the reactive layer, (b) coordinator-enforced throttling, (c) a `maxConcurrency` prop on `Parallel` that the reactive engine respects. The `` component already has `maxConcurrency` as an optional prop in its definition (workflow-templates.md).
+- **Cross-references**: OQ-11, workflow-templates `Parallel` component
+
+### OQ-13: Should `blockedByFailure` be a separate `computed` or derived from `preconditions`?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q5
+- **Status**: open
+- **Priority**: low — implementation detail, can be decided during implementation
+- **Question**: Currently there are two separate `computed` values — `preconditions` (all predecessors completed/skipped) and `blockedByFailure` (any predecessor failed/aborted). An alternative is a single `computed` returning `"ready" | "blocked" | "failed"`.
+- **Notes**: Two separate `computed` values are more composable (you can check preconditions independently of failure status) but require two effects per node. A single `computed` is simpler (one effect), but its parts cannot be queried independently. This is largely an implementation choice that doesn't affect the public API. Can be deferred to implementation.
+
+---
+
+## Theme 6: Graph Construction and API Surface
+
+### OQ-14: Should the call graph support unknown `operationId`?
+
+- **Origin**: [call-graph.md](call-graph.md) Q1
+- **Status**: open (with a proposed answer)
+- **Priority**: medium — affects `fromCallEvents()` and `updateFromEvent()` behavior
+- **Proposed answer**: Yes. The call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
+- **Notes**: The doc already has a proposed answer. This just needs confirmation and the behavior documented in the `fromCallEvents()` spec.
+
+### OQ-15: Should the call graph support multiple graphs simultaneously?
+
+- **Origin**: [call-graph.md](call-graph.md) Q3
+- **Status**: open
+- **Priority**: low — can be deferred to v2
+- **Question**: Currently one `FlowGraph` instance = one call graph. If the hub needs to track multiple concurrent workflows, it uses multiple instances.
An alternative is a single graph with workflow-scoped subgraphs.
+- **Notes**: The current design (multiple instances) is simpler and matches graphology's model. Subgraphs would require a scoping mechanism. This can be deferred unless early usage shows it's needed.
+
+### OQ-16: Should `filterByStatus` use an index?
+
+- **Origin**: [call-graph.md](call-graph.md) Q4
+- **Status**: open
+- **Priority**: low — premature optimization for small graphs
+- **Notes**: Call graphs at hub level are typically tens of nodes. O(n) filter is fast enough. An index can be added later if performance becomes an issue. Can be deferred.
+
+### OQ-17: Should `FlowGraph` expose graphology's traversal methods directly?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q1
+- **Status**: open
+- **Priority**: medium — affects the public API surface
+- **Question**: Currently the plan is convenience methods that delegate. But some consumers may find it inconvenient to go through `.graph.forEachNode()`.
+- **Options**:
+  - (a) Convenience methods only (current plan). Direct access via `.graph` for power users.
+  - (b) Expose graphology's traversal methods directly on `FlowGraph` (e.g., `flowGraph.forEachNode()`).
+  - (c) Expose only the most common traversal methods and let `.graph` handle the rest.
+- **Notes**: This is a UX decision. Option (a) keeps the API surface small. Option (b) is more convenient but increases the delegation surface. Option (c) is a middle ground. The decision can be made during implementation based on actual consumer usage patterns.
+
+### OQ-18: Should `addOperation` auto-populate type-compat edges?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q2
+- **Status**: open
+- **Priority**: low — affects incremental construction behavior
+- **Question**: `fromSpecs()` calls `buildTypeEdges()` which adds all type-compatibility edges. Should `addOperation()` (incremental) also attempt auto-type-compat edge creation?
+- **Notes**: This is only relevant for incremental construction (rare use case). The operation graph is typically built once via `fromSpecs()`. If incremental construction is needed, the consumer can call `buildTypeEdges()` manually after adding operations. Can be deferred.
+
+### OQ-28: Should `FlowGraph` share analysis functions across instances?
+
+- **Origin**: [flowgraph-api.md](flowgraph-api.md) Q3
+- **Status**: open
+- **Priority**: low — optimization concern, not blocking
+- **Question**: Currently each `FlowGraph` instance owns its own `DirectedGraph`. A future optimization could pool analysis functions across instances.
+- **Notes**: Distinct from OQ-15 (multiple graphs per instance) — this is about sharing analysis logic, not about graph scoping. Can be deferred.
+
+### OQ-19: Should `parallelGroups` account for resource constraints?
+
+- **Origin**: [analysis.md](analysis.md) Q4
+- **Status**: open
+- **Priority**: low — feature enhancement, not a core concern
+- **Question**: Currently `parallelGroups()` returns the theoretical maximum parallelism. An optional `maxConcurrency` parameter could limit group sizes for realistic scheduling.
+- **Notes**: Can be added later as an optional parameter. Not blocking.
+
+### OQ-27: Should `validateTemplate` check runtime preconditions?
+
+- **Origin**: [analysis.md](analysis.md) Q2
+- **Status**: open (intentionally deferred)
+- **Priority**: low — explicitly out of scope for static analysis
+- **Question**: Currently `validateTemplate` only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.
+- **Notes**: This is a deliberate scope boundary, not a design gap. Documented here to confirm that this is an intentional deferral, not an oversight.
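
The level-based grouping that OQ-19 describes can be illustrated with a small sketch. This is not flowgraph's implementation: it assumes that "theoretical maximum parallelism" means topological levels computed with Kahn's algorithm, and the `nodes`/`edges` parameters and the chunking behavior for `maxConcurrency` are hypothetical choices made for illustration.

```typescript
// Hypothetical sketch for OQ-19; names and signature are assumptions, not flowgraph's API.
type Edge = readonly [string, string]; // [from, to]

/**
 * Groups the nodes of a DAG into levels (Kahn's algorithm): every node in a
 * level has all of its predecessors in earlier levels. When maxConcurrency is
 * given, each level is split into chunks of at most that many nodes.
 */
function parallelGroups(
  nodes: readonly string[],
  edges: readonly Edge[],
  maxConcurrency?: number,
): string[][] {
  if (maxConcurrency !== undefined && (!Number.isInteger(maxConcurrency) || maxConcurrency < 1)) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const inDegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n, 0);
    successors.set(n, []);
  }
  for (const [from, to] of edges) {
    const succ = successors.get(from);
    const deg = inDegree.get(to);
    if (succ === undefined || deg === undefined) {
      throw new Error(`edge references unknown node: ${from} -> ${to}`);
    }
    succ.push(to);
    inDegree.set(to, deg + 1);
  }

  const groups: string[][] = [];
  let level = nodes.filter((n) => inDegree.get(n) === 0);
  let visited = 0;
  while (level.length > 0) {
    visited += level.length;
    // Without a limit, the whole level forms one group.
    const size = maxConcurrency ?? level.length;
    for (let i = 0; i < level.length; i += size) {
      groups.push(level.slice(i, i + size));
    }
    // Release successors whose predecessors have all been placed.
    const next: string[] = [];
    for (const n of level) {
      for (const s of successors.get(n) ?? []) {
        const remaining = (inDegree.get(s) ?? 0) - 1;
        inDegree.set(s, remaining);
        if (remaining === 0) next.push(s);
      }
    }
    level = next;
  }
  if (visited !== nodes.length) throw new Error("graph contains a cycle");
  return groups;
}

const demoNodes = ["a", "b", "c", "d", "e"];
const demoEdges: Edge[] = [["a", "d"], ["b", "d"], ["c", "e"]];
console.log(parallelGroups(demoNodes, demoEdges));    // [["a","b","c"],["d","e"]]
console.log(parallelGroups(demoNodes, demoEdges, 2)); // [["a","b"],["c"],["d","e"]]
```

Splitting a level into sequential chunks is conservative: a real scheduler could start a node from the next level as soon as a slot frees up, which is one reason OQ-12 treats `maxConcurrency` as a scheduling concern rather than a structural one.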
+
+---
+
+## Theme 7: Conditional and Template Semantics
+
+### OQ-29: Should GraphologyHostConfig produce a separate graph per edge type?
+
+- **Origin**: [host-configs.md](host-configs.md) Q2
+- **Status**: open
+- **Priority**: medium — affects implementation of the GraphologyHostConfig
+- **Question**: Currently all edge types (`sequential`, `conditional`, `typed`) share the same graph. An alternative is a separate graph per edge type, enabling type-specific queries without filtering.
+- **Notes**: Related to OQ-04 (edge type consistency at the schema level) but distinct — this is about the runtime graph structure, not the type design. Multiple graphs would make type-specific queries faster (no filtering) but increase complexity and memory usage.
+- **Cross-references**: OQ-04
+
+### OQ-20: How should conditional edge conditions be represented?
+
+- **Origin**: [schema.md](schema.md) Q3
+- **Status**: open
+- **Priority**: medium — affects `TemplateEdgeAttrs.condition` type safety
+- **Options**:
+  - (a) `Type.Unknown()` with documentation (current). Pro: maximally flexible. Con: no type safety.
+  - (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references. Pro: documents both forms. Con: functions don't serialize.
+  - (c) A dedicated `ConditionSchema` that flowgraph defines. Pro: type safe, consistent. Con: may be overly prescriptive.
+- **Notes**: The workflow-templates doc already specifies `Conditional.test` as `((results: Record) => boolean) | string`, and the host-configs doc notes that function props need runtime resolution. Option (b) seems like the pragmatic choice that matches the existing design, but the schema representation is what needs deciding.
+- **Known Gap** (from [host-configs.md](host-configs.md)): "Conditional Test Evaluation" — the `Conditional.test` function needs access to the `WorkflowContext`/`ReactiveContext` at runtime to evaluate against predecessor results.
This is a concrete sub-problem of OQ-06 (how the reactive host config bridges to execution).
+- **Cross-references**: OQ-05 (conditional branch behavior in reactive engine), OQ-06 (runtime resolution of function props)
+
+### OQ-21: Should templates support explicit `depends_on` edges?
+
+- **Origin**: [workflow-templates.md](workflow-templates.md) Q3
+- **Status**: open
+- **Priority**: medium — affects template composition expressiveness
+- **Question**: Currently dependencies are inferred from structure (sequential implies dependency). An explicit `` component would make data dependencies visible in the template without relying on sequential ordering.
+- **Notes**: This would add expressiveness but also complexity. Implicit dependency from structure is simpler and covers the most common cases. Explicit `depends_on` would be needed when a node depends on a non-adjacent predecessor in a way that can't be expressed by a `Sequential` group. Can be deferred to v2.
+- **Cross-references**: OQ-08 (call graph `depends_on` edges)
+
+---
+
+## Theme 8: Identity and Serialization
+
+### OQ-22: Should `CallNodeAttrs.identity` be a structured type or `Type.Record`?
+
+- **Origin**: [schema.md](schema.md) Q2
+- **Status**: open
+- **Priority**: medium — affects the `@alkdev/operations` peer dependency
+- **Options**:
+  - (a) Import `Identity` from `@alkdev/operations` (peer dep). Pro: matches call protocol. Con: creates a direct type dependency.
+  - (b) Duplicate the type in flowgraph. Pro: no dependency. Con: divergence risk.
+  - (c) Use `Type.Record(Type.String(), Type.Array(Type.String()))` for the `resources` field. Pro: flexible. Con: weaker typing.
+- **Notes**: Since `@alkdev/operations` is already a peer dependency for type imports, option (a) seems reasonable. The concern is version alignment, but semver ranges handle this. This could also be a `Type.Unknown()` with documentation, letting the consumer validate.
+
+### OQ-23: Multiple graphs per `FlowGraph` instance?
+
+- **Origin**: [call-graph.md](call-graph.md) Q3 (same as OQ-15)
+- **Status**: open (duplicate of OQ-15 — see above)
+
+### OQ-24: Async analysis functions?
+
+- **Origin**: [analysis.md](analysis.md) Q3
+- **Status**: open
+- **Priority**: low — premature for current scale
+- **Question**: Should analysis functions be async for large graphs? Current graphs are small (50-200 nodes), so synchronous analysis is fine.
+- **Notes**: Can be deferred. If large graphs become common, async analysis can be added with an optional `async` variant.
+
+---
+
+## Theme 9: Reactive Execution Mechanics
+
+### OQ-25: Should the reactive graph support partial re-rendering?
+
+- **Origin**: [reactive-execution.md](reactive-execution.md) Q3
+- **Status**: open (blocked on ujsx reconciler)
+- **Priority**: low — blocked on ujsx reconciler implementation
+- **Question**: If a template changes mid-execution, the ujsx reconciler could diff and apply changes. Currently only mount rendering is supported.
+- **Known Gap** (from [host-configs.md](host-configs.md)): "ujsx Reconciler Not Yet Available" — the current `HostConfig` is mount-only: no incremental template updates, no `prepareUpdate`/`commitUpdate` flow. This gap is broader than just re-rendering.
+- **Notes**: This is entirely dependent on the ujsx reconciler, which is not yet implemented. The host-configs doc notes "currently mount-only." When the reconciler is available, flowgraph gets re-rendering "for free." This question should be revisited after the reconciler is implemented.
+- **Cross-references**: OQ-05 (structural container handling during re-render), host-configs.md "Known Gaps"
+
+---
+
+## Theme 10: Version and Scale Concerns
+
+### OQ-26: How to handle version conflicts?
+
+- **Origin**: [operation-graph.md](operation-graph.md) Q2
+- **Status**: open
+- **Priority**: low — can be deferred to a versioning use case
+- **Question**: If two versions of the same operation exist in the registry, should they be separate nodes (`task.classify@1.0.0` vs `task.classify@2.0.0`) or should the latest version win?
+- **Notes**: The current design uses `namespace.name` (no version) as the node key, meaning only one version per operation. This is intentional simplicity. Version conflicts are a niche concern that can be addressed when a concrete use case arises.
+
+---
+
+## Summary Table
+
+| ID | Question | Origin | Priority | Status |
+|----|----------|--------|----------|--------|
+| OQ-01 | All edges or only compatible edges? | operation-graph | high | open |
+| OQ-02 | Type compatibility depth and granularity | operation-graph, analysis | high | open |
+| OQ-03 | Subscription operations in type compat | operation-graph | medium | open |
+| OQ-04 | `edgeType` on all edges? | schema | medium | open |
+| OQ-05 | Structural container transparency | workflow-templates, host-configs | high | open |
+| OQ-06 | Template ↔ call protocol interaction | workflow-templates, host-configs | high | open |
+| OQ-07 | Should reactive engine own call graph? | host-configs | high | open |
+| OQ-08 | Auto-populate `depends_on` from templates?
| call-graph | medium | open |
+| OQ-09 | Retries at signal level | reactive-execution | high | open |
+| OQ-10 | Running nodes when predecessor fails | reactive-execution | high | open |
+| OQ-11 | OR logic for preconditions | reactive-execution | medium | open |
+| OQ-12 | `maxConcurrency` interaction with preconditions | reactive-execution | medium | open |
+| OQ-13 | `blockedByFailure` vs single computed | reactive-execution | low | open |
+| OQ-14 | Unknown `operationId` in call graph | call-graph | medium | open (proposed) |
+| OQ-15 | Multiple graphs per instance | call-graph | low | open |
+| OQ-16 | `filterByStatus` index | call-graph | low | open |
+| OQ-17 | Expose graphology traversal directly? | flowgraph-api | medium | open |
+| OQ-18 | Auto-populate type edges on `addOperation`? | flowgraph-api | low | open |
+| OQ-19 | `parallelGroups` with resource constraints | analysis | low | open |
+| OQ-20 | Conditional edge condition representation | schema | medium | open |
+| OQ-21 | Explicit `depends_on` in templates | workflow-templates | medium | open |
+| OQ-22 | `CallNodeAttrs.identity` type | schema | medium | open |
+| OQ-24 | Async analysis functions | analysis | low | open |
+| OQ-25 | Partial re-rendering | reactive-execution | low | open (blocked) |
+| OQ-26 | Operation version conflicts | operation-graph | low | open |
+| OQ-27 | Runtime preconditions in validateTemplate? | analysis | low | open (deferred) |
+| OQ-28 | Share analysis functions across instances? | flowgraph-api | low | open |
+| OQ-29 | Separate graph per edge type?
| host-configs | medium | open |
+
+### Priority Assessment
+
+**High priority** (should resolve before implementation):
+- OQ-01: All edges or only compatible — shapes the entire operation graph API
+- OQ-02: Type compatibility depth — shapes `typeCompat()` return type
+- OQ-05: Structural container transparency — fundamental to DAG and reactive engine
+- OQ-06: Template ↔ call protocol — fundamental integration point
+- OQ-07: Reactive engine owns call graph? — affects architecture boundaries
+- OQ-09: Retries — shapes the state machine
+- OQ-10: Running node failure handling — shapes failure propagation
+
+**Medium priority** (should resolve before v1 release):
+- OQ-03, OQ-04, OQ-08, OQ-11, OQ-12, OQ-14, OQ-17, OQ-20, OQ-21, OQ-22, OQ-29
+
+**Low priority** (can defer or decide during implementation):
+- OQ-13, OQ-15, OQ-16, OQ-18, OQ-19, OQ-24, OQ-25, OQ-26, OQ-27, OQ-28
+
+### Cross-Cutting Themes
+
+These groups of questions interact with each other and should be resolved together:
+
+1. **Edge semantics group** (OQ-01, OQ-02, OQ-04): All affect the operation graph's edge structure and the type compatibility API.
+
+2. **Call protocol integration group** (OQ-06, OQ-07, OQ-08): All about how flowgraph connects to the live call protocol.
+
+3. **Failure semantics group** (OQ-09, OQ-10): Both about how failure and retry propagate through the reactive engine. Resolving one may resolve or constrain the other.
+
+4. **Scheduling group** (OQ-11, OQ-12): Both about how preconditions interact with scheduling constraints.
+
+5. **Template expressiveness group** (OQ-05, OQ-20, OQ-21): All about what the template system can express and how it renders.
+
+6. **Graph structure group** (OQ-04, OQ-29): Both about how edge types are represented in the graph — OQ-04 at the schema/type level, OQ-29 at the runtime graph structure level. Resolution of one constrains the other.
+
+7.
**Known gaps from host-configs.md** — not all "known gaps" are "open questions" (the reconciler gap is a dependency, not a design question), but they should be tracked here for completeness.
\ No newline at end of file
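
For the failure semantics group (OQ-09, OQ-10), option (c) of OQ-09 can be sketched as follows. This is a minimal illustration under stated assumptions, not the design: the exact members of `NodeStatus` and the transition table are inferred from the statuses named in this tracker, and `AttemptNode`, `transition()`, `retryAsNewNode()`, and the `retryOf` field are hypothetical names. The point it shows is that a retry can be modelled as a new node with a new `requestId`, so `failed` stays terminal.

```typescript
// Hypothetical sketch of OQ-09 option (c); all names below are assumptions.
type NodeStatus =
  | "idle" | "waiting" | "ready" | "running"
  | "completed" | "failed" | "aborted" | "skipped";

// Assumed transition table: terminal statuses have no exit transitions.
const TRANSITIONS: Record<NodeStatus, readonly NodeStatus[]> = {
  idle: ["waiting", "ready", "aborted", "skipped"],
  waiting: ["ready", "aborted", "skipped"],
  ready: ["running"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
  skipped: [],
};

interface AttemptNode {
  requestId: string;
  operationId: string;
  status: NodeStatus;
  retryOf?: string; // requestId of the failed attempt this node retries
}

/** Applies a status change, rejecting anything the table does not allow. */
function transition(node: AttemptNode, next: NodeStatus): void {
  if (TRANSITIONS[node.status].indexOf(next) === -1) {
    throw new Error(`illegal transition ${node.status} -> ${next} for ${node.requestId}`);
  }
  node.status = next;
}

/**
 * Retries a failed attempt by adding a NEW node with a new requestId.
 * The failed node is left untouched, so the terminal-state invariant holds.
 * The new node starts as "ready" on the assumption that the preconditions
 * satisfied for the original attempt still hold.
 */
function retryAsNewNode(
  graph: Map<string, AttemptNode>,
  failedRequestId: string,
  newRequestId: string,
): AttemptNode {
  const failed = graph.get(failedRequestId);
  if (failed === undefined) throw new Error(`unknown node ${failedRequestId}`);
  if (failed.status !== "failed") {
    throw new Error(`only failed nodes can be retried, got ${failed.status}`);
  }
  if (graph.has(newRequestId)) throw new Error(`requestId ${newRequestId} already exists`);
  const retry: AttemptNode = {
    requestId: newRequestId,
    operationId: failed.operationId,
    status: "ready",
    retryOf: failedRequestId,
  };
  graph.set(newRequestId, retry);
  return retry;
}

const attempts = new Map<string, AttemptNode>();
const firstAttempt: AttemptNode = { requestId: "req-1", operationId: "task.classify", status: "ready" };
attempts.set(firstAttempt.requestId, firstAttempt);
transition(firstAttempt, "running");
transition(firstAttempt, "failed");

const secondAttempt = retryAsNewNode(attempts, "req-1", "req-2");
transition(secondAttempt, "running");
transition(secondAttempt, "completed");
```

The cost named in OQ-09 is visible here: every retry adds a node, so the graph grows with the number of attempts, while the history of each attempt remains queryable through `retryOf`.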