resolve all remaining open questions (OQ-03–OQ-29), add ADR-006

Resolve all 19 remaining open questions across the architecture. Every question now has a documented resolution with rationale: - OQ-004/OQ-029: edgeType is a universal required attribute on all edges, single graph per FlowGraph instance (ADR-006) - OQ-011: No OR preconditions for v1; preconditionMode as v2 extension - OQ-012: maxConcurrency enforced via reactive counting semaphore - OQ-014: Unknown operationId creates node with pending status - OQ-017: Expose common graphology traversal methods on FlowGraph (80/20) - OQ-020: condition as Type.Unknown() with string/function documentation - OQ-022: Identity imported from @alkdev/operations peer dep - All other questions resolved with documented rationale Fix three critical issues found by architecture review: 1. edgeType serialization/validation gap: document two-step validation 2. CallEdgeAttrs runtime discrimination: edgeType as runtime discriminant, depends_on edges clarified as observability-only (not execution) 3. ADR-005 signal mutation inconsistency: explicitly distinguish call-level statuses (event-log-driven) from workflow-derived statuses (signal-mutation) Additional clarifications: - dataFlow inference uses conservative strategy (defaults false) - Conditional.test string resolution: operationName → status === completed - Add negated field to TemplateEdgeAttrs for else-branch conditions - Document edge key priority convention for composite keys - Add maxConcurrency semaphore design to reactive-execution.md
2026-05-21 09:25:55 +00:00
parent c76be7f689
commit f3e084d02f
9 changed files with 239 additions and 268 deletions
--- a/docs/architecture/call-graph.md
+++ b/docs/architecture/call-graph.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-05-19
+last_updated: 2026-05-22
 ---

 # Call Graph (Dynamic Runtime)
@@ -91,9 +91,7 @@ Call graph edges carry an `edgeType` attribute:
 | `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
 | `depends_on` | Data dependency — source needs target's result | Explicit declaration (not auto-populated) |

-`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(parent, child)` calls by the hub coordinator
+`depends_on` edges represent data dependencies between calls. Per [ADR-005](decisions/005-event-log-as-source-of-truth.md), the reactive engine does NOT use `depends_on` edges for data flow — data flows through the result projection (`getResult()`). However, `depends_on` edges remain in the API for **observability and visualization**: they annotate which calls depended on which other calls' results, providing a data-flow overlay on top of the call hierarchy. Hub coordinators or external tools may add `depends_on` edges to annotate observed data flow for debugging, monitoring, or call-graph visualization. They do NOT affect execution.

 ### Edge Key Convention

@@ -234,19 +232,19 @@ updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
 - **Status transitions are validated** — invalid transitions throw `InvalidTransitionError`.
 - **Node keys are `requestId`** — not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
 - **`parentRequestId` is both node attribute and edge** — denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated** — they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
+- **`depends_on` edges are for observability, not execution** — per ADR-005, data dependencies flow through the result projection. `depends_on` edges annotate observed data flow for visualization and debugging. The reactive engine does NOT use them for scheduling or precondition computation. They may be added by hub coordinators or external tools to document which calls depended on which other calls' results.
 - **Payload fields are stored as-is** — flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
 - **Small graph sizes** — call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.

 ## Open Questions

-1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes — the call graph records what happened, not what should have happened. The node gets a `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
+1. ~~**Should the call graph support `call.requested` events with unknown `operationId`?**~~ **Resolved (OQ-014)**: Yes — the call graph records what happened, not what should have happened. Nodes with unknown `operationId` get `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code. This is consistent with the error-handling doc's existing statement about unknown `operationId`. The behavior should be documented explicitly in the `fromCallEvents()` specification: when a `call.requested` event references an `operationId` not in the registry, the node is still created with `status: "pending"` and the given `operationId`. This enables the call graph to serve as a complete audit trail regardless of registry state.

-2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.
+2. ~~**Should `depends_on` edges be auto-populated from workflow templates?**~~ **Resolved (OQ-008/ADR-005)**: `depends_on` edges are unnecessary as a separate concept. Data dependencies are expressed through the result projection. If node B needs node A's output, B reads `getResult("A")` from the result projection. The temporal ordering (A before B) is already expressed by template edges. There's no need for a separate edge type to represent data flow — the event log is the data transport.

-3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
+3. ~~**Should the call graph support multiple graphs simultaneously?**~~ **Resolved (OQ-015)**: No — one `FlowGraph` instance per graph. Multiple concurrent workflows use multiple instances. This design is simpler and matches graphology's model. Subgraphs would require a scoping mechanism and cross-scope queries that add complexity without benefit at current scale. The hub coordinator creates one `WorkflowReactiveRoot` per workflow, so one `FlowGraph` per workflow is consistent. This is a deliberate "no," not a deferral — if future scale demands require multi-workflow queries, a specialized query layer can aggregate across instances.

-4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make status queries O(1). The index would need to be updated on every `updateStatus()` call.
+4. ~~**Should `filterByStatus` use an index?**~~ **Resolved (OQ-016)**: No — O(n) filter is sufficient for expected graph sizes (tens to hundreds of nodes). A status index would add implementation complexity (maintain on every `updateStatus()`) for no measurable benefit at current scale. If performance becomes an issue with very large graphs, a `Map<CallStatus, Set<string>>` index can be added as an optimization later without changing the public API.

 ## References