flowgraph/docs/architecture/analysis.md
glm-5.1 d2253099ee add flowgraph architecture docs (Phase 1 SDD)
Draft architecture specification for @alkdev/flowgraph — a workflow graph library providing DAG-based orchestration over operations. Covers two graph types (operation graph, call graph), ujsx workflow templates, GraphologyHost and ReactiveHost configs, signal-driven execution, type-compatibility analysis, error hierarchy, and build/distribution. Includes 3 ADRs: ujsx as template IR, DAG-only enforcement, decoupled storage.
2026-05-19 09:36:22 +00:00


status: draft
last_updated: 2026-05-19

Analysis Functions

Standalone composable functions for type-compatibility checking, execution ordering, and precondition validation.

Overview

Analysis functions are pure, composable functions that operate on a FlowGraph instance. They follow the same pattern as taskgraph: standalone functions (not methods on the class) that take a graph as input and return structured results.

The analysis layer provides:

  • Type compatibility — can operation A's output feed into operation B's input?
  • Execution ordering — what's a valid topological order for a set of operations?
  • Precondition validation — are all required inputs available before a step starts?
  • Reachability — which operations can be reached from a given starting point?
  • Template validation — does a workflow template follow a valid path through the operation graph?

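Analysis functions are pure: they don't mutate the graph, they don't depend on external state, and they return structured results rather than throwing on validation failure. This makes them testable, composable, and suitable for both synchronous and async use. Two documented exceptions are topologicalOrder(), which throws CircularDependencyError when the graph contains a cycle, and buildTypeEdges(), which adds edges to the graph it is given.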

Type Compatibility

typeCompat(outputSchema, inputSchema)

function typeCompat(
  outputSchema: TSchema,
  inputSchema: TSchema,
): TypeCompatResult

interface TypeCompatResult {
  compatible: boolean;
  detail?: string;
  mismatches?: TypeMismatch[];
}

interface TypeMismatch {
  path: string;      // JSON path to the mismatched field
  expected: string;  // What the input schema requires
  actual: string;    // What the output schema provides
}

Compares two TypeBox schemas and determines if the output schema is compatible with the input schema. Returns a structured result with details about mismatches.

Compatibility rules

The analysis is structural, not semantic. It checks whether the output shape can satisfy the input shape:

  1. Exact match: outputSchema and inputSchema are structurally identical → compatible: true
  2. Output is superset: output has every field that input requires, plus extras → compatible: true (output is a subtype of input, so input accepts output)
  3. Output is subset: output is missing fields that input requires → compatible: false, with mismatches listing the missing fields
  4. Type mismatch: an output field's type doesn't match the corresponding input field's type → compatible: false, with mismatches listing the type differences
  5. Unknown passthrough: if either schema is Type.Unknown(), compatibility cannot be determined → no edge is created (the pair is unresolvable, not incompatible)

Subtype checking

The key insight: output must be a subtype of input for compatibility. This means:

  • If input expects { name: string, age: number }, output must provide at least those fields
  • If input expects string, output providing string | number is not compatible (it could produce a number)
  • If input expects string | number, output providing string is compatible (string is a subset of string|number)

This follows standard type theory: the output must be at least as specific as what the input requires.
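The subtype rule can be illustrated with a deliberately simplified schema model. This is a minimal sketch, not the library's implementation: it does not use TypeBox, and `MiniSchema`, `Mismatch`, `collect`, and `checkCompat` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical, simplified stand-in for TypeBox schemas (not the real TSchema).
type MiniSchema =
  | { kind: "string" }
  | { kind: "number" }
  | { kind: "boolean" }
  | { kind: "union"; anyOf: MiniSchema[] }
  | { kind: "object"; required: Record<string, MiniSchema> };

interface Mismatch {
  path: string;     // slash-separated path to the field ("" for the root)
  expected: string; // what the input schema requires
  actual: string;   // what the output schema provides
}

// Render a schema as a short label for mismatch reports.
function describe(s: MiniSchema): string {
  if (s.kind === "union") return s.anyOf.map(describe).join(" | ");
  return s.kind;
}

// Collect every reason why `output` is NOT a subtype of `input`.
function collect(output: MiniSchema, input: MiniSchema, path: string): Mismatch[] {
  // An output union is safe only if every member is accepted by the input.
  if (output.kind === "union") {
    const out: Mismatch[] = [];
    for (const member of output.anyOf) out.push(...collect(member, input, path));
    return out;
  }
  // An input union accepts the output if any one of its members does.
  if (input.kind === "union") {
    const ok = input.anyOf.some((m) => collect(output, m, path).length === 0);
    return ok ? [] : [{ path, expected: describe(input), actual: describe(output) }];
  }
  if (output.kind === "object" && input.kind === "object") {
    const out: Mismatch[] = [];
    for (const key of Object.keys(input.required)) {
      const provided = output.required[key];
      const fieldPath = `${path}/${key}`;
      if (provided === undefined) {
        out.push({ path: fieldPath, expected: describe(input.required[key]), actual: "missing" });
      } else {
        out.push(...collect(provided, input.required[key], fieldPath));
      }
    }
    return out; // extra output fields are ignored (superset rule)
  }
  return output.kind === input.kind
    ? []
    : [{ path, expected: describe(input), actual: describe(output) }];
}

function checkCompat(output: MiniSchema, input: MiniSchema) {
  const mismatches = collect(output, input, "");
  return { compatible: mismatches.length === 0, mismatches };
}
```

Under this model, an output of string feeds an input of string | number, while the reverse direction reports a mismatch for the number member, which matches the bullets above.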

buildTypeEdges(graph)

function buildTypeEdges(graph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>): void

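Populates the operation graph with type-compatibility edges. For each ordered pair of nodes (A, B), it calls typeCompat(A.outputSchema, B.inputSchema) and adds an edge carrying the result; under the Unknown passthrough rule, pairs whose compatibility cannot be determined get no edge. Unlike the other functions in this layer, buildTypeEdges mutates the graph it is given and returns nothing.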

This is called automatically by FlowGraph.fromSpecs(). It can also be called manually after adding operations incrementally.

Edge attributes from type compatibility

A type-compatibility edge carries:

{
  edgeType: "typed",
  compatible: boolean,      // true if output feeds into input
  detail?: string,           // "classify.output is compatible with enrich.input"
  mismatches?: TypeMismatch[] // specific field-level mismatches (if incompatible)
}
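The pairwise pass that produces these edge attributes might look roughly like the sketch below. It makes several assumptions: it returns a list of edges instead of mutating a FlowGraph, it skips self-pairs, and the `compat` callback signals "unresolvable" by returning undefined. `OpNodeSketch`, `TypedEdgeSketch`, and `sketchTypeEdges` are hypothetical names, not part of the library.

```typescript
interface CompatSketch {
  compatible: boolean;
}

interface OpNodeSketch<S> {
  inputSchema: S;
  outputSchema: S;
}

interface TypedEdgeSketch {
  source: string;
  target: string;
  edgeType: "typed";
  compatible: boolean;
}

// `compat` returns undefined when compatibility cannot be resolved
// (a hypothetical convention standing in for the Unknown passthrough rule).
function sketchTypeEdges<S>(
  nodes: Record<string, OpNodeSketch<S>>,
  compat: (output: S, input: S) => CompatSketch | undefined,
): TypedEdgeSketch[] {
  const edges: TypedEdgeSketch[] = [];
  const keys = Object.keys(nodes);
  for (const a of keys) {
    for (const b of keys) {
      if (a === b) continue; // assumption: an operation is not paired with itself
      const result = compat(nodes[a].outputSchema, nodes[b].inputSchema);
      if (result === undefined) continue; // unresolvable pair: no edge
      edges.push({ source: a, target: b, edgeType: "typed", compatible: result.compatible });
    }
  }
  return edges;
}
```

Because every ordered pair is compared, the cost grows quadratically with the number of operations.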

Execution Ordering

topologicalOrder(graph)

function topologicalOrder(graph: FlowGraph): string[]

Returns node keys in topological order (prerequisites before dependents). Uses graphology-dag's topologicalSort algorithm.

Throws CircularDependencyError if the graph contains cycles; the error's cycles field is populated by findCycles().
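To make the ordering and the throw-on-cycle behavior concrete, here is a dependency-free sketch using Kahn's algorithm. It is not the graphology-dag implementation: the adjacency-list input, `topoOrderSketch`, and `CycleError` (a stand-in for CircularDependencyError) are assumptions made for illustration.

```typescript
// Adjacency list: each key maps to the nodes that depend on it (outgoing edges).
type TopoAdjacency = Record<string, string[]>;

// Hypothetical stand-in for the CircularDependencyError described in error-handling.md.
class CycleError extends Error {
  remaining: string[];
  constructor(remaining: string[]) {
    super("Cycle detected among: " + remaining.join(", "));
    this.remaining = remaining;
  }
}

function topoOrderSketch(adj: TopoAdjacency): string[] {
  // Count incoming edges for every node, including nodes that only appear as targets.
  const indegree = new Map<string, number>();
  for (const node of Object.keys(adj)) {
    if (!indegree.has(node)) indegree.set(node, 0);
    for (const next of adj[node]) indegree.set(next, (indegree.get(next) ?? 0) + 1);
  }

  // Start from nodes with no prerequisites and peel the graph layer by layer.
  const ready = Array.from(indegree.keys()).filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const next of adj[node] ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Any node never emitted sits on, or downstream of, a cycle.
  const all = Array.from(indegree.keys());
  if (order.length < all.length) {
    throw new CycleError(all.filter((n) => order.indexOf(n) === -1));
  }
  return order;
}
```

A valid topological order is generally not unique; this sketch breaks ties by insertion order, which the real function may not do.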

parallelGroups(graph)

function parallelGroups(graph: FlowGraph): string[][]

Returns groups of nodes that can execute in parallel. Each group is an array of node keys. Groups are ordered by dependency depth:

  • Group 0: nodes with no prerequisites (roots)
  • Group 1: nodes whose only prerequisites are in Group 0
  • Group N: nodes whose prerequisites are all in Groups 0 through N-1, with at least one in Group N-1

This is useful for the hub coordinator to determine max parallelism: all nodes in a group can start simultaneously.
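The grouping can be sketched as a layered variant of Kahn's algorithm. This is an illustration under assumptions, not the library's code: the input is a plain adjacency list, the graph is assumed to be acyclic, and `parallelGroupsSketch` is a hypothetical name. graphology-dag also documents a topologicalGenerations helper that appears to compute a similar layering; whether parallelGroups uses it is not stated here.

```typescript
// Adjacency list: each key maps to the nodes that depend on it.
type GroupAdjacency = Record<string, string[]>;

function parallelGroupsSketch(adj: GroupAdjacency): string[][] {
  // Count incoming edges for every node.
  const indegree = new Map<string, number>();
  for (const node of Object.keys(adj)) {
    if (!indegree.has(node)) indegree.set(node, 0);
    for (const next of adj[node]) indegree.set(next, (indegree.get(next) ?? 0) + 1);
  }

  const groups: string[][] = [];
  // Group 0: nodes with no prerequisites.
  let current = Array.from(indegree.keys()).filter((n) => indegree.get(n) === 0);
  while (current.length > 0) {
    groups.push(current);
    const nextGroup: string[] = [];
    for (const node of current) {
      for (const next of adj[node] ?? []) {
        const remaining = (indegree.get(next) ?? 0) - 1;
        indegree.set(next, remaining);
        // A node joins the next group once its last prerequisite has been placed.
        if (remaining === 0) nextGroup.push(next);
      }
    }
    current = nextGroup;
  }
  // Assumption: input is acyclic. Nodes on a cycle would be silently omitted here.
  return groups;
}
```

For a graph where load feeds clean and embed, both of which feed index, and audit stands alone, the sketch yields three groups: the two roots, then clean and embed, then index.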

criticalPath(graph)

function criticalPath(graph: FlowGraph): string[]

Returns the longest path through the DAG, which represents the sequence of operations that determines the minimum total execution time. Useful for identifying bottlenecks.
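The signature takes no per-operation durations, so the sketch below assumes every operation has unit cost and measures path length in nodes. It is a memoized depth-first search over a plain adjacency list, assumes an acyclic graph, and uses the hypothetical name `criticalPathSketch`.

```typescript
// Adjacency list: each key maps to the nodes that depend on it.
type PathAdjacency = Record<string, string[]>;

function criticalPathSketch(adj: PathAdjacency): string[] {
  // Longest path starting at each node, computed once and cached.
  const best = new Map<string, string[]>();

  function longestFrom(node: string): string[] {
    const cached = best.get(node);
    if (cached !== undefined) return cached;
    let tail: string[] = [];
    for (const next of adj[node] ?? []) {
      const candidate = longestFrom(next);
      if (candidate.length > tail.length) tail = candidate;
    }
    const path = [node].concat(tail);
    best.set(node, path);
    return path;
  }

  // The critical path is the longest of the per-node longest paths.
  let longest: string[] = [];
  for (const node of Object.keys(adj)) {
    const path = longestFrom(node);
    if (path.length > longest.length) longest = path;
  }
  return longest; // assumption: unit cost per node; ties keep the first path found
}
```

With real duration estimates, the length comparison would be replaced by a sum of node weights; that extension is not described in this document.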

Precondition Validation

validatePreconditions(graph)

function validatePreconditions(
  graph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>
): ValidationError[]

For each node in the operation graph, checks that all required input fields are provided by at least one predecessor's output. Returns an array of ValidationError objects (never throws).

A "missing precondition" occurs when a node's input requires a field that no predecessor's output provides. This is a stronger check than type compatibility — it verifies that a valid execution path exists through the graph.
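A field-level version of this check can be sketched with plain field-name lists in place of TypeBox schemas. The shapes `OpFieldsSketch` and `PreconditionIssue` and the function `validatePreconditionsSketch` are hypothetical, and the sketch assumes that nodes without predecessors receive their inputs from the caller.

```typescript
interface OpFieldsSketch {
  requiredInputs: string[]; // field names the operation needs
  outputs: string[];        // field names the operation produces
  predecessors: string[];   // keys of direct predecessor operations
}

interface PreconditionIssue {
  node: string;
  field: string;
  message: string;
}

function validatePreconditionsSketch(ops: Record<string, OpFieldsSketch>): PreconditionIssue[] {
  const issues: PreconditionIssue[] = [];
  for (const key of Object.keys(ops)) {
    const op = ops[key];
    // Assumption: root operations get their inputs from outside the graph.
    if (op.predecessors.length === 0) continue;

    // Union of every field produced by any direct predecessor.
    const available = new Set<string>();
    for (const pred of op.predecessors) {
      const predOp = ops[pred];
      if (predOp === undefined) continue;
      for (const field of predOp.outputs) available.add(field);
    }

    // Report each required field that no predecessor provides; never throw.
    for (const field of op.requiredInputs) {
      if (!available.has(field)) {
        issues.push({ node: key, field, message: `${key} requires "${field}" but no predecessor provides it` });
      }
    }
  }
  return issues;
}
```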

validateTemplate(template, operationGraph)

function validateTemplate(
  template: UNode,
  operationGraph: FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>,
): ValidationError[]

Validates a workflow template against an operation graph:

  1. All operations exist — every <Operation name="X"> has a matching node in the operation graph
  2. No cycles — the rendered DAG has no cycles
  3. Type compatibility — sequential operations have compatible type edges (or no incompatible edge)
  4. Reachability — all operations are reachable from the start
  5. No orphan nodes — every operation has at least one incoming or outgoing edge (unless it's a single-operation template)

Returns an array of ValidationError objects. Template validation is advisory — it can produce warnings (e.g., "operation not in registry") and errors (e.g., "cycle detected").
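The split between warnings and errors can be illustrated for checks 1 and 3 only. This sketch assumes the template has already been flattened into an ordered list of operation names, which sidesteps the real UNode structure. `TemplateIssue`, `validateSequenceSketch`, and the "A->B" key convention for incompatible edges are hypothetical.

```typescript
interface TemplateIssue {
  severity: "warning" | "error";
  message: string;
}

function validateSequenceSketch(
  steps: string[],           // operation names in template order (assumed pre-flattened)
  registry: Set<string>,     // operation names present in the operation graph
  incompatible: Set<string>, // "A->B" keys for type edges with compatible: false
): TemplateIssue[] {
  const issues: TemplateIssue[] = [];
  steps.forEach((name, i) => {
    // Check 1: an unknown operation is advisory, since it may be registered later.
    if (!registry.has(name)) {
      issues.push({ severity: "warning", message: `operation not in registry: ${name}` });
    }
    // Check 3: an explicitly incompatible edge between consecutive steps is an error.
    if (i > 0 && incompatible.has(`${steps[i - 1]}->${name}`)) {
      issues.push({ severity: "error", message: `incompatible types: ${steps[i - 1]} -> ${name}` });
    }
  });
  return issues;
}
```

A caller could then treat the template as failing only when at least one issue has severity "error".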

Reachability

reachableFrom(graph, nodeIds)

function reachableFrom(graph: FlowGraph, nodeIds: string[]): Set<string>

Returns all node keys reachable from the given starting nodes via directed edges. Useful for:

  • Determining which operations a coordinator can reach from a starting operation
  • Computing the abort cascade scope for a given call
  • Finding all operations affected by a change to a particular operation
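A multi-source breadth-first traversal is one way to compute this set. The sketch below works on a plain adjacency list and assumes the starting nodes themselves are excluded from the result unless another start reaches them; the document does not say which convention the real function follows. `reachableFromSketch` is a hypothetical name.

```typescript
// Adjacency list: each key maps to its direct successors.
type ReachAdjacency = Record<string, string[]>;

function reachableFromSketch(adj: ReachAdjacency, starts: string[]): Set<string> {
  const seen = new Set<string>();
  const queue = starts.slice(); // copy so the caller's array is not modified
  while (queue.length > 0) {
    const node = queue.shift() as string;
    for (const next of adj[node] ?? []) {
      // Visit each node at most once, even when several paths lead to it.
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}
```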

ancestors(graph, nodeId)

function ancestors(graph: FlowGraph, nodeId: string): string[]

Returns all ancestors of a node (nodes reachable via incoming edges). Useful for:

  • Finding which operations must complete before a given operation can start
  • Computing depth-from-roots for execution priority

descendants(graph, nodeId)

function descendants(graph: FlowGraph, nodeId: string): string[]

Returns all descendants of a node (nodes reachable via outgoing edges). Useful for:

  • Finding all calls that would be affected by aborting a given call
  • Computing the scope of a failure cascade
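Since ancestors and descendants differ only in edge direction, one traversal can serve both if the edges are flipped for the ancestor case. The following is a sketch under that assumption, using a plain adjacency list; `reverseEdges`, `collectReachable`, `descendantsSketch`, and `ancestorsSketch` are hypothetical names, and the result order is not meaningful.

```typescript
// Adjacency list: each key maps to its direct successors.
type WalkAdjacency = Record<string, string[]>;

// Flip every edge so that successors become predecessors.
function reverseEdges(adj: WalkAdjacency): WalkAdjacency {
  const reversed: WalkAdjacency = {};
  for (const node of Object.keys(adj)) {
    if (!reversed[node]) reversed[node] = [];
    for (const next of adj[node]) {
      if (!reversed[next]) reversed[next] = [];
      reversed[next].push(node);
    }
  }
  return reversed;
}

// Depth-first walk collecting everything reachable from `start`, excluding `start`.
function collectReachable(adj: WalkAdjacency, start: string): string[] {
  const seen = new Set<string>();
  const found: string[] = [];
  const stack = (adj[start] ?? []).slice();
  while (stack.length > 0) {
    const node = stack.pop() as string;
    if (seen.has(node)) continue;
    seen.add(node);
    found.push(node);
    for (const next of adj[node] ?? []) stack.push(next);
  }
  return found;
}

function descendantsSketch(adj: WalkAdjacency, nodeId: string): string[] {
  return collectReachable(adj, nodeId);
}

function ancestorsSketch(adj: WalkAdjacency, nodeId: string): string[] {
  return collectReachable(reverseEdges(adj), nodeId);
}
```

A graph library that indexes inbound edges can walk them directly instead of building a reversed copy.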

Graph-Level Validation

validateGraph(graph)

function validateGraph(graph: FlowGraph): AnyValidationError[]

Runs all validation checks:

  1. Schema validation — node attributes match OperationNodeAttrs or CallNodeAttrs schema
  2. Graph invariants — no cycles, no dangling edges, no self-loops
  3. Orphan detection — nodes with no edges (warning, not error)

Returns an array of AnyValidationError objects, which is a union type:

type AnyValidationError = ValidationError | GraphValidationError;

Matching taskgraph's pattern, this function never throws — it collects all issues and returns them.
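The collect-and-return style can be sketched for a few of the invariants listed above. This example covers self-loops, dangling edges, and orphans only; schema validation and cycle detection are omitted. The `GraphIssue` shape, its `severity` and `code` fields, and `validateGraphSketch` are hypothetical and are not the AnyValidationError types from schema.md.

```typescript
interface EdgeSketch {
  source: string;
  target: string;
}

interface GraphIssue {
  severity: "error" | "warning";
  code: "self-loop" | "dangling-edge" | "orphan";
  message: string;
}

function validateGraphSketch(nodes: string[], edges: EdgeSketch[]): GraphIssue[] {
  const issues: GraphIssue[] = [];
  const known = new Set<string>(nodes);
  const connected = new Set<string>();

  for (const edge of edges) {
    if (edge.source === edge.target) {
      issues.push({ severity: "error", code: "self-loop", message: `self-loop on ${edge.source}` });
    }
    for (const end of [edge.source, edge.target]) {
      if (!known.has(end)) {
        issues.push({ severity: "error", code: "dangling-edge", message: `edge references unknown node ${end}` });
      } else {
        connected.add(end);
      }
    }
  }

  // Orphans are reported as warnings, mirroring the list above.
  for (const node of nodes) {
    if (!connected.has(node)) {
      issues.push({ severity: "warning", code: "orphan", message: `${node} has no edges` });
    }
  }
  return issues; // every problem is collected; nothing is thrown
}
```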

Standalone Function Pattern

All analysis functions are standalone (not methods on FlowGraph). They take a FlowGraph instance as their first argument and return structured results. This follows taskgraph's pattern:

// Standalone functions
import { topologicalOrder, hasCycles, typeCompat } from "@alkdev/flowgraph/analysis";

const order = topologicalOrder(graph);
const cycles = hasCycles(graph);
const result = typeCompat(outputSchema, inputSchema);

The FlowGraph class exposes convenience methods that delegate to these standalone functions:

class FlowGraph {
  topologicalOrder(): string[] { return _topologicalOrder(this._graph); }
  hasCycles(): boolean { return _hasCycles(this._graph); }
  validate(): AnyValidationError[] { return _validate(this._graph); }
}

This pattern enables:

  • Tree-shaking — consumers only import the analysis functions they use
  • Testing — standalone functions are easier to test in isolation
  • Composition — consumers can chain analysis functions without creating intermediate FlowGraph instances

Constraints

  • Analysis functions are pure — they don't mutate the graph, don't depend on external state, and don't throw on validation failures (they return error arrays)
  • Type compatibility is structural, not semantic: typeCompat() checks schema shapes, not whether the data makes sense. "Age as number" is compatible with "count as number" even though they're semantically different.
  • Template validation is advisory — warnings are not errors. A template with an unknown operation is a warning, not a validation failure (the operation might be added to the registry later).
  • Analysis functions work on the underlying DirectedGraph — they're thin wrappers around graphology and graphology-dag functions, following the same pattern as taskgraph
  • topologicalOrder() throws on cycles — unlike validateGraph() which returns errors, topologicalOrder() throws CircularDependencyError because it cannot produce a valid ordering from a cyclic graph

Open Questions

  1. How deep should typeCompat check? Currently it checks top-level field existence and type compatibility. Should it recursively check nested objects and arrays? Full recursive checking is more thorough but slower and may produce false negatives for schemas with dynamic structures.

  2. Should validateTemplate check runtime preconditions? Currently it only checks structural validity and type compatibility. Runtime preconditions (e.g., "operation B requires an API key that operation A doesn't have access to") are beyond the scope of static analysis and belong to the access control layer.

  3. Should analysis functions be async? For very large graphs (thousands of nodes), type compatibility checking could be slow. Making it async would allow incremental progress reporting. Current graphs are small enough (50-200 nodes) that synchronous checking is fine.

  4. Should parallelGroups account for resource constraints? Currently it returns the theoretical maximum parallelism. An optional maxConcurrency parameter could limit group sizes for realistic scheduling.
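For question 4, one possible shape of the maxConcurrency idea is to post-process the groups rather than change parallelGroups itself. The sketch below is only an exploration of that option: `limitGroupSize` is a hypothetical helper, and nothing in this document commits to this design.

```typescript
// Split each parallel group into chunks of at most `maxConcurrency` nodes.
// Order within and across groups is preserved.
function limitGroupSize(groups: string[][], maxConcurrency: number): string[][] {
  if (maxConcurrency < 1 || Math.floor(maxConcurrency) !== maxConcurrency) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const limited: string[][] = [];
  for (const group of groups) {
    for (let i = 0; i < group.length; i += maxConcurrency) {
      limited.push(group.slice(i, i + maxConcurrency));
    }
  }
  return limited;
}
```

Chunking keeps the ordering valid because nodes in the same group do not depend on each other and all of their prerequisites sit in earlier groups. It is a coarse policy, though: a scheduler that starts the next node as soon as any slot frees up would typically achieve better utilization.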

References

  • Schema: schema.md (TypeCompatResult, TypeMismatch, ValidationError)
  • Error handling: error-handling.md (CircularDependencyError, TypeIncompatError)
  • Taskgraph analysis pattern: @alkdev/taskgraph_ts/src/analysis/
  • TypeBox Value utilities: @alkdev/typebox/value