| status | last_updated |
|---|---|
| draft | 2026-05-20 |
Analysis Functions
Standalone composable functions for type-compatibility checking, execution ordering, and precondition validation.
Overview
Analysis functions are pure, composable functions that operate on a FlowGraph instance. They follow the same pattern as taskgraph: standalone functions (not methods on the class) that take a graph as input and return structured results.
The analysis layer provides:
- Type compatibility — can operation A's output feed into operation B's input?
- Execution ordering — what's a valid topological order for a set of operations?
- Precondition validation — are all required inputs available before a step starts?
- Reachability — which operations can be reached from a given starting point?
- Template validation — does a workflow template follow a valid path through the operation graph?
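All analysis functions are pure: they don't mutate the graph, they don't depend on external state, and they report validation failures as structured results rather than throwing. The one exception is topologicalOrder(), which throws CycleError on a cyclic graph because no valid ordering exists. This makes them testable, composable, and suitable for both synchronous and async use.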
Type Compatibility
Compatibility Contract
The typeCompat function defines a clear contract for what each result means:
| Result | Meaning | What the consumer should do |
|---|---|---|
| { compatible: true } | Output schema is a subtype of input schema | Allow the edge; data can flow from source to target without transformation |
| { compatible: true, detail } | Compatible with notes | Allow the edge; the detail string describes why (e.g., "output has extra fields beyond input requirements") |
| { compatible: false, mismatches } | Structural incompatibility | Reject the edge or add a transformation step; mismatches lists specific field-level problems |
| No edge at all | Unknown compatibility (one or both schemas are Type.Unknown()) | Neither compatible nor incompatible; no edge is created |
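As a rough illustration of the contract, a consumer could branch on the result shape as sketched below. The EdgeDecision type and the decideEdge function are hypothetical names introduced for this example, and modelling the "no edge at all" case as undefined is an assumption; the text above does not say how unknown compatibility is surfaced to callers.

```typescript
// Result shapes as described in the contract table.
interface TypeMismatch {
  path: string;
  expected: string;
  actual: string;
}

interface TypeCompatResult {
  compatible: boolean;
  detail?: string;
  mismatches?: TypeMismatch[];
}

// Hypothetical consumer-side decision; not part of flowgraph.
type EdgeDecision =
  | { action: "allow"; note?: string }
  | { action: "reject"; reasons: string[] }
  | { action: "skip" };

// Assumption: `undefined` stands in for "unknown compatibility, no edge".
function decideEdge(result: TypeCompatResult | undefined): EdgeDecision {
  if (result === undefined) {
    return { action: "skip" };
  }
  if (result.compatible) {
    // Compatible, optionally with an explanatory note.
    return result.detail === undefined
      ? { action: "allow" }
      : { action: "allow", note: result.detail };
  }
  // Incompatible: turn field-level mismatches into readable reasons.
  const reasons = (result.mismatches ?? []).map(
    (m) => `${m.path}: expected ${m.expected}, got ${m.actual}`,
  );
  return { action: "reject", reasons };
}
```

A caller that prefers to insert a transformation step instead of rejecting would replace the "reject" branch with its own handling.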
Depth of Compatibility Checking
typeCompat performs deep recursive structural comparison:
- Top-level fields: all required fields in inputSchema must be present in outputSchema
- Nested objects: recursively compared. If inputSchema requires { address: { city: string } }, outputSchema providing { address: { city: string, zip: string } } is compatible (output is a superset)
- Arrays: element types are compared. If inputSchema requires string[], outputSchema providing (string | number)[] is not compatible (output could produce non-string elements)
- Optional fields: if inputSchema marks a field as optional (Type.Optional()), it's not required in outputSchema. If outputSchema omits it, compatibility is still true.
- Union types: if inputSchema accepts string | number, outputSchema providing just string is compatible (string is a subtype of string | number). The reverse (input requires string, output provides string | number) is not compatible.
The mismatches array provides field-level diagnostics for incompatible results:
interface TypeMismatch {
path: string; // JSON path to the mismatched field (e.g., "/address/city")
expected: string; // What input requires (e.g., "string")
actual: string; // What output provides (e.g., "number")
}
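The depth rules can be sketched over a deliberately simplified schema model. Everything here is an assumption made for illustration: the Schema type is a stand-in for TypeBox's TSchema, collectMismatches and renderType are hypothetical names, the "/items" path segment for array elements is invented, and optionality on the output side is ignored. It is not the library's implementation.

```typescript
// Simplified stand-in for a TypeBox schema (assumption for this sketch).
type Schema =
  | { kind: "string" | "number" | "boolean" }
  | { kind: "array"; items: Schema }
  | { kind: "union"; anyOf: Schema[] }
  | { kind: "object"; props: Record<string, Schema>; optional?: string[] };

interface TypeMismatch {
  path: string;
  expected: string;
  actual: string;
}

function renderType(s: Schema): string {
  if (s.kind === "union") return s.anyOf.map(renderType).join(" | ");
  if (s.kind === "array") return `${renderType(s.items)}[]`;
  return s.kind;
}

// Lists the places where `output` is not a structural subtype of `input`.
function collectMismatches(output: Schema, input: Schema, path = ""): TypeMismatch[] {
  const here = path === "" ? "/" : path;
  const mismatch = { path: here, expected: renderType(input), actual: renderType(output) };
  // Output union: every member must be accepted by the input.
  if (output.kind === "union") {
    const bad = output.anyOf.some((m) => collectMismatches(m, input, path).length > 0);
    return bad ? [mismatch] : [];
  }
  // Input union: at least one member must accept the output.
  if (input.kind === "union") {
    const ok = input.anyOf.some((m) => collectMismatches(output, m, path).length === 0);
    return ok ? [] : [mismatch];
  }
  if (input.kind === "array" && output.kind === "array") {
    return collectMismatches(output.items, input.items, `${path}/items`);
  }
  if (input.kind === "object" && output.kind === "object") {
    const optional = input.optional ?? [];
    const found: TypeMismatch[] = [];
    for (const [key, required] of Object.entries(input.props)) {
      const provided = output.props[key];
      if (provided === undefined) {
        // A missing field only matters when the input requires it.
        if (!optional.includes(key)) {
          found.push({ path: `${path}/${key}`, expected: renderType(required), actual: "missing" });
        }
        continue;
      }
      // Extra output fields are never visited, so supersets pass.
      found.push(...collectMismatches(provided, required, `${path}/${key}`));
    }
    return found;
  }
  return input.kind === output.kind ? [] : [mismatch];
}
```

An empty array corresponds to compatible: true; a non-empty array would populate mismatches.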
Compatibility Rules Summary
| Output \ Input | Exact match | Superset of input | Subset of input | Unknown |
|---|---|---|---|---|
| Exact match | ✅ compatible | ✅ compatible | ❌ incompatible | No edge |
| Superset | ✅ compatible | ✅ compatible | ❌ incompatible | No edge |
| Subset | ❌ incompatible | ❌ incompatible | Depends on which fields | No edge |
| Unknown | No edge | No edge | No edge | No edge |
typeCompat(outputSchema, inputSchema)
function typeCompat(
outputSchema: TSchema,
inputSchema: TSchema,
): TypeCompatResult
interface TypeCompatResult {
compatible: boolean;
detail?: string;
mismatches?: TypeMismatch[];
}
interface TypeMismatch {
path: string; // JSON path to the mismatched field
expected: string; // What the input schema requires
actual: string; // What the output schema provides
}
Compares two TypeBox schemas and determines if the output schema is compatible with the input schema. Returns a structured result with details about mismatches.
Compatibility rules
The analysis is structural, not semantic. It checks whether the output shape can satisfy the input shape:
- Exact match: outputSchema and inputSchema are structurally identical → compatible: true
- Output is superset: output has all fields that input requires, plus extras → compatible: true (output is a subtype of input, meaning input accepts output)
- Output is subset: output is missing fields that input requires → compatible: false, with mismatches listing the missing fields
- Type mismatch: output field type doesn't match input field type → compatible: false, with mismatches listing the type differences
- Unknown passthrough: if either schema is Type.Unknown(), compatibility is unknown → no edge is created (not incompatible, just unresolvable)
Subtype checking
The key insight: output must be a subtype of input for compatibility. This means:
- If input expects { name: string, age: number }, output must provide at least those fields
- If input expects string, output providing string | number is not compatible (it could produce a number)
- If input expects string | number, output providing string is compatible (string is a subset of string | number)
This follows standard type theory: the output must be at least as specific as what the input requires.
buildTypeEdges(graph)
function buildTypeEdges(graph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>): void
Populates the operation graph with type-compatibility edges. For each ordered pair of nodes (A, B), calls typeCompat(A.outputSchema, B.inputSchema) and adds an edge carrying the result. When compatibility is unknown (either schema is Type.Unknown()), no edge is added.
This is called automatically by FlowGraph.fromSpecs(). It can also be called manually after adding operations incrementally.
Edge attributes from type compatibility
A type-compatibility edge carries:
{
edgeType: "typed",
compatible: boolean, // true if output feeds into input
detail?: string, // "classify.output is compatible with enrich.input"
mismatches?: TypeMismatch[] // specific field-level mismatches (if incompatible)
}
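The pairwise construction can be sketched as a nested loop. This is only an illustration under stated assumptions: sketchTypeEdges, OperationSchemas, and Compat are hypothetical names, the compatibility check is injected as a parameter, unknown compatibility is modelled as undefined, and self-pairs are skipped because the graph invariants listed later exclude self-loops.

```typescript
interface TypedEdge {
  source: string;
  target: string;
  edgeType: "typed";
  compatible: boolean;
  detail?: string;
}

interface OperationSchemas<S> {
  inputSchema: S;
  outputSchema: S;
}

// Assumption: `undefined` means "compatibility unknown".
type Compat<S> = (output: S, input: S) => { compatible: boolean; detail?: string } | undefined;

function sketchTypeEdges<S>(
  operations: Record<string, OperationSchemas<S>>,
  compat: Compat<S>,
): TypedEdge[] {
  const edges: TypedEdge[] = [];
  const entries = Object.entries(operations);
  // Every ordered pair is compared once, hence the quadratic cost.
  for (const [source, from] of entries) {
    for (const [target, to] of entries) {
      if (source === target) continue; // assumption: no self-loops
      const result = compat(from.outputSchema, to.inputSchema);
      if (result === undefined) continue; // unknown: no edge is created
      edges.push({ source, target, edgeType: "typed" as const, ...result });
    }
  }
  return edges;
}
```

Incompatible pairs still produce an edge (with compatible: false), which matches the edge attributes listed above; only unknown pairs are omitted.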
Execution Ordering
topologicalOrder(graph)
function topologicalOrder(graph: FlowGraph): string[]
Returns node keys in topological order (prerequisites before dependents). Uses graphology-dag's topologicalSort algorithm.
Throws CycleError if the graph contains cycles, with cycles populated by findCycles().
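To make the ordering and the throwing behaviour concrete, here is a minimal sketch using Kahn's algorithm over a plain adjacency record. It is not how the library does it (that is delegated to graphology-dag, per the text above); SketchCycleError and sketchTopologicalOrder are hypothetical names, and the error carries the unresolved nodes rather than the cycles themselves.

```typescript
// Hypothetical stand-in for CycleError.
class SketchCycleError extends Error {
  readonly unresolved: string[];
  constructor(unresolved: string[]) {
    super(`cycle detected among: ${unresolved.join(", ")}`);
    this.name = "SketchCycleError";
    this.unresolved = unresolved;
  }
}

// `dependents[n]` lists the nodes that depend directly on n.
function sketchTopologicalOrder(dependents: Record<string, string[]>): string[] {
  const inDegree = new Map<string, number>();
  for (const [node, targets] of Object.entries(dependents)) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const target of targets) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    }
  }
  // Start with nodes that have no prerequisites.
  const ready = Array.from(inDegree.keys()).filter((n) => inDegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift();
    if (node === undefined) break;
    order.push(node);
    for (const target of dependents[node] ?? []) {
      const left = (inDegree.get(target) ?? 0) - 1;
      inDegree.set(target, left);
      if (left === 0) ready.push(target);
    }
  }
  // Nodes never released belong to, or sit behind, a cycle.
  if (order.length < inDegree.size) {
    const placed = new Set(order);
    throw new SketchCycleError(Array.from(inDegree.keys()).filter((n) => !placed.has(n)));
  }
  return order;
}
```

Callers that cannot tolerate an exception would check for cycles first, or wrap the call in try/catch.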
parallelGroups(graph)
function parallelGroups(graph: FlowGraph): string[][]
Returns groups of nodes that can execute in parallel. Each group is an array of node keys. Groups are ordered by dependency depth:
- Group 0: nodes with no prerequisites (roots)
- Group 1: nodes whose only prerequisites are in Group 0
- Group N: nodes whose prerequisites are all in Groups 0 through N-1
This is useful for the hub coordinator to determine max parallelism: all nodes in a group can start simultaneously.
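The grouping rule can be sketched as repeated "release everything whose prerequisites are done" passes. sketchParallelGroups is a hypothetical name, the input is a plain record of prerequisites rather than a FlowGraph, and this simple version rescans the remaining nodes on every pass, so it is not the linear-time form.

```typescript
// `prerequisites[n]` lists the nodes that must finish before n can start.
function sketchParallelGroups(prerequisites: Record<string, string[]>): string[][] {
  const remaining = new Set(Object.keys(prerequisites));
  const done = new Set<string>();
  const groups: string[][] = [];
  while (remaining.size > 0) {
    // A node joins the current group once all its prerequisites
    // were placed in earlier groups.
    const group = Array.from(remaining).filter((node) =>
      (prerequisites[node] ?? []).every((p) => done.has(p)),
    );
    if (group.length === 0) {
      throw new Error("cycle or unknown prerequisite: no node is ready to run");
    }
    for (const node of group) {
      remaining.delete(node);
      done.add(node);
    }
    groups.push(group);
  }
  return groups;
}
```

For a diamond-shaped graph (one root, two middle steps, one join) this yields three groups, with the two middle steps sharing Group 1.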
criticalPath(graph)
function criticalPath(graph: FlowGraph): string[]
Returns the longest path through the DAG, which represents the sequence of operations that determines the minimum total execution time. Useful for identifying bottlenecks.
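A longest-path computation on a DAG can be sketched with memoized depth-first search. Two assumptions are made here that the text does not state: every operation costs one unit (so "longest" means "most nodes"), and the input is acyclic (a cycle would recurse forever). sketchCriticalPath is a hypothetical name.

```typescript
// `dependents[n]` lists the nodes that depend directly on n.
function sketchCriticalPath(dependents: Record<string, string[]>): string[] {
  // Longest path starting at each node, computed once per node.
  const best = new Map<string, string[]>();
  const longestFrom = (node: string): string[] => {
    const cached = best.get(node);
    if (cached !== undefined) return cached;
    let tail: string[] = [];
    for (const next of dependents[node] ?? []) {
      const candidate = longestFrom(next);
      if (candidate.length > tail.length) tail = candidate;
    }
    const path = [node, ...tail];
    best.set(node, path);
    return path;
  };
  let longest: string[] = [];
  for (const node of Object.keys(dependents)) {
    const path = longestFrom(node);
    if (path.length > longest.length) longest = path;
  }
  return longest;
}
```

With real durations per operation, the length comparison would be replaced by a sum of node weights; storing whole paths per node is done here only to keep the sketch short.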
Precondition Validation
validatePreconditions(graph)
function validatePreconditions(
graph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>
): ValidationError[]
For each node in the operation graph, checks that all required input fields are provided by at least one predecessor's output. Returns an array of ValidationError objects (never throws).
A "missing precondition" occurs when a node's input requires a field that no predecessor's output provides. This is a stronger check than type compatibility — it verifies that a valid execution path exists through the graph.
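The check can be sketched with field names standing in for full schemas. The names below (OperationFields, MissingPrecondition, sketchValidatePreconditions) are hypothetical, only direct predecessors are consulted, and nodes without predecessors are treated as entry points and skipped; the text above does not specify either of those last two points, so both are assumptions.

```typescript
interface OperationFields {
  requires: string[]; // required input field names
  provides: string[]; // output field names
}

interface MissingPrecondition {
  node: string;
  field: string;
}

function sketchValidatePreconditions(
  operations: Record<string, OperationFields>,
  predecessors: Record<string, string[]>,
): MissingPrecondition[] {
  const errors: MissingPrecondition[] = [];
  for (const [node, op] of Object.entries(operations)) {
    const preds = predecessors[node] ?? [];
    if (preds.length === 0) continue; // assumption: entry points get input from the caller
    // Union of everything the direct predecessors can provide.
    const available = new Set<string>();
    for (const pred of preds) {
      for (const field of operations[pred]?.provides ?? []) available.add(field);
    }
    for (const field of op.requires) {
      if (!available.has(field)) errors.push({ node, field });
    }
  }
  // Problems are collected and returned, never thrown.
  return errors;
}
```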
validateTemplate(template, operationGraph)
function validateTemplate(
template: UNode,
operationGraph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>,
): ValidationError[]
Validates a workflow template against an operation graph:
- All operations exist: every <Operation name="X"> has a matching node in the operation graph
- No cycles: the rendered DAG has no cycles
- Type compatibility: sequential operations have compatible type edges (or no incompatible edge)
- Reachability: all operations are reachable from the start
- No orphan nodes: every operation has at least one incoming or outgoing edge (unless it's a single-operation template)
Returns an array of ValidationError objects. Template validation is advisory — it can produce warnings (e.g., "operation not in registry") and errors (e.g., "cycle detected").
Reachability
reachableFrom(graph, nodeIds)
function reachableFrom(graph: FlowGraph, nodeIds: string[]): Set<string>
Returns all node keys reachable from the given starting nodes via directed edges. Useful for:
- Determining which operations a coordinator can reach from a starting operation
- Computing the abort cascade scope for a given call
- Finding all operations affected by a change to a particular operation
ancestors(graph, nodeId)
function ancestors(graph: FlowGraph, nodeId: string): string[]
Returns all ancestors of a node (nodes reachable via incoming edges). Useful for:
- Finding which operations must complete before a given operation can start
- Computing depth-from-roots for execution priority
descendants(graph, nodeId)
function descendants(graph: FlowGraph, nodeId: string): string[]
Returns all descendants of a node (nodes reachable via outgoing edges). Useful for:
- Finding all calls that would be affected by aborting a given call
- Computing the scope of a failure cascade
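All three reachability queries reduce to one breadth-first traversal, run either over the edges as given or over their reversal. The sketch below uses a plain adjacency record and hypothetical function names; whether the real functions include the starting nodes in their result is not stated above, so this version excludes them unless they are reachable through another node.

```typescript
type Edges = Record<string, string[]>; // node -> direct successors

function sketchReachableFrom(edges: Edges, starts: string[]): Set<string> {
  const seen = new Set<string>();
  const queue = [...starts];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === undefined) break;
    for (const next of edges[node] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

// Flips every edge so that a forward traversal walks backwards.
function reverseEdges(edges: Edges): Edges {
  const reversed: Edges = {};
  for (const [source, targets] of Object.entries(edges)) {
    reversed[source] = reversed[source] ?? [];
    for (const target of targets) {
      const list = reversed[target] ?? [];
      list.push(source);
      reversed[target] = list;
    }
  }
  return reversed;
}

const sketchDescendants = (edges: Edges, node: string): string[] =>
  Array.from(sketchReachableFrom(edges, [node]));

const sketchAncestors = (edges: Edges, node: string): string[] =>
  Array.from(sketchReachableFrom(reverseEdges(edges), [node]));
```

An abort cascade for a call would then be the descendants of that call's node.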
Graph-Level Validation
validateGraph(graph)
function validateGraph(graph: FlowGraph): AnyValidationError[]
Runs all validation checks:
- Schema validation: node attributes match OperationNodeAttrs or CallNodeAttrs schema
- Graph invariants: no cycles, no dangling edges, no self-loops
- Orphan detection: nodes with no edges (warning, not error)
Returns an array of AnyValidationError objects, which is a union type:
type AnyValidationError = ValidationError | GraphValidationError;
Matching taskgraph's pattern, this function never throws — it collects all issues and returns them.
Standalone Function Pattern
All analysis functions are standalone (not methods on FlowGraph). They take a FlowGraph instance as their first argument and return structured results. This follows taskgraph's pattern:
// Standalone functions
import { topologicalOrder, hasCycles, typeCompat } from "@alkdev/flowgraph/analysis";
const order = topologicalOrder(graph);
const cycles = hasCycles(graph);
const result = typeCompat(outputSchema, inputSchema);
The FlowGraph class exposes convenience methods that delegate to these standalone functions:
class FlowGraph {
topologicalOrder(): string[] { return _topologicalOrder(this._graph); }
hasCycles(): boolean { return _hasCycles(this._graph); }
validate(): AnyValidationError[] { return _validate(this._graph); }
}
This pattern enables:
- Tree-shaking — consumers only import the analysis functions they use
- Testing — standalone functions are easier to test in isolation
- Composition: consumers can chain analysis functions without creating intermediate FlowGraph instances
Performance Characteristics
Analysis functions are pure and operate on the graph in memory. Their complexity is:
| Function | Complexity | Notes |
|---|---|---|
| topologicalOrder() | O(V + E) | Linear in nodes + edges. Single traversal. |
| parallelGroups() | O(V + E) | Same as topological sort. One pass. |
| criticalPath() | O(V + E) | Longest path in DAG. Single traversal with path tracking. |
| reachableFrom() | O(V + E) | BFS/DFS from starting nodes. |
| ancestors() | O(V + E) | Backward traversal from target. |
| descendants() | O(V + E) | Forward traversal from target. |
| hasCycles() | O(V + E) | DFS-based cycle detection. Always false after validated construction. |
| findCycles() | O((V + E)(C + 1)) | Johnson's algorithm for finding all elementary cycles, where C is the number of elementary cycles. Linear only when the graph has few cycles. |
| typeCompat() | O(depth) | Depends on schema depth. Schemas are typically shallow (5-10 fields). Fast for realistic schemas. |
| buildTypeEdges() | O(V²) | Pairwise comparison of all operations. For 50 operations: 2,500 comparisons. For 200: 40,000. Each comparison is O(depth). |
| validateTemplate() | O(V + E) | Template traversal plus DAG validation. |
| validatePreconditions() | O(V × E) | For each node, check all predecessors. |
| validateGraph() | O(V + E) | Cycle detection + edge validation + orphan detection. |
Practical Performance
For expected graph sizes (10-200 nodes):
- buildTypeEdges(): 0.5-5ms for 50 operations, 5-50ms for 200 operations
- topologicalOrder(): <1ms for any realistic graph
- typeCompat(): <0.01ms per comparison
- All query functions: <1ms for any realistic graph
These are in-memory operations with no I/O. The dominant cost is buildTypeEdges() which scales quadratically with the number of operations. For very large registries (>500 operations), consider lazy edge construction or caching.
Optimization Opportunities
- Lazy edge construction: buildTypeEdges() currently compares all pairs. For large registries, edges could be computed on demand: when typeCompat(A, B) is queried, compute and cache the result. This trades startup time for query-time cost.
- Type compatibility caching: typeCompat() results could be cached by schema hash. Identical schemas always produce the same result. This helps when the same operation appears in multiple templates.
- Incremental graph updates: when a single operation is added to the registry, only compute edges for the new node (O(V) instead of O(V²)).
- Parallel group scheduling: parallelGroups() is useful for the hub coordinator to determine max parallelism. An optional maxConcurrency parameter could be added to limit group sizes for realistic scheduling.
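The caching idea can be sketched as a generic memoizing wrapper. memoizeCompat is a hypothetical helper, and the hash function is supplied by the caller; using JSON.stringify as the hash (as in the usage note) is an assumption that only holds when structurally identical schemas serialize with the same key order.

```typescript
// Wraps a pure compatibility check so that repeated schema pairs
// are computed once. Safe only because the check is pure.
function memoizeCompat<S, R>(
  compat: (output: S, input: S) => R,
  hash: (schema: S) => string,
): (output: S, input: S) => R {
  const cache = new Map<string, R>();
  return (output, input) => {
    // The pair is ordered: (A, B) and (B, A) are different questions.
    const key = `${hash(output)}->${hash(input)}`;
    if (cache.has(key)) {
      return cache.get(key) as R;
    }
    const result = compat(output, input);
    cache.set(key, result);
    return result;
  };
}
```

A caller might wrap the real check as memoizeCompat(typeCompat, (schema) => JSON.stringify(schema)) and pass the wrapped function wherever pairwise comparison happens. The same wrapper also gives lazy edge construction for free, since nothing is computed until a pair is first queried.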
Constraints
- Analysis functions are pure — they don't mutate the graph, don't depend on external state, and don't throw on validation failures (they return error arrays)
- Type compatibility is structural, not semantic: typeCompat() checks schema shapes, not whether the data makes sense. "Age as number" is compatible with "count as number" even though they're semantically different.
- Template validation is advisory: warnings are not errors. A template with an unknown operation is a warning, not a validation failure (the operation might be added to the registry later).
- Analysis functions work on the underlying DirectedGraph: they're thin wrappers around graphology and graphology-dag functions, following the same pattern as taskgraph
- topologicalOrder() throws on cycles: unlike validateGraph(), which returns errors, topologicalOrder() throws CycleError because it cannot produce a valid ordering from a cyclic graph
Open Questions
- How deep should typeCompat check? Currently it checks top-level field existence and type compatibility. Should it recursively check nested objects and arrays? Full recursive checking is more thorough but slower and may produce false negatives for schemas with dynamic structures.
- Should validateTemplate check runtime preconditions? Currently it only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.
- Should analysis functions be async? For very large graphs (thousands of nodes), type compatibility checking could be slow. Making it async would allow incremental progress reporting. Current graphs are small enough (50-200 nodes) that synchronous checking is fine.
- Should parallelGroups account for resource constraints? Currently it returns the theoretical maximum parallelism. An optional maxConcurrency parameter could limit group sizes for realistic scheduling.
References
- Schema: schema.md (TypeCompatResult, TypeMismatch, ValidationError)
- Error handling: error-handling.md (CycleError, TypeIncompatError)
- Taskgraph analysis pattern: @alkdev/taskgraph_ts/src/analysis/
- TypeBox Value utilities: @alkdev/typebox/value