Draft architecture specification for @alkdev/flowgraph — a workflow graph library providing DAG-based orchestration over operations. Covers two graph types (operation graph, call graph), ujsx workflow templates, GraphologyHost and ReactiveHost configs, signal-driven execution, type-compatibility analysis, error hierarchy, and build/distribution. Includes 3 ADRs: ujsx as template IR, DAG-only enforcement, decoupled storage.
| status | last_updated |
|---|---|
| draft | 2026-05-19 |
Call Graph (Dynamic Runtime)
The dynamic call graph populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.
Overview
The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what can happen (type compatibility), the call graph captures what is happening or has happened (running calls, completed calls, failures, aborts).
The call graph is populated automatically by the call protocol — every call.requested adds a node, every call.responded/call.error/call.aborted updates its status. This means the call graph is always in sync with the actual state of in-flight calls.
Key capabilities:
- Abort cascading: aborting a call automatically aborts all of its children via `parentRequestId` chains
- Observability: query what's running, what failed, what's blocked
- DAG operations: topological sort of running calls, cycle detection (shouldn't happen but verified), reachability queries
- Serialization: `export()`/`fromJSON()` for Postgres persistence
Construction
fromCallEvents()
static fromCallEvents(events: CallEventMapValue[]): FlowGraph<CallNodeAttrs, CallEdgeAttrs>
Builds a call graph from an array of call protocol events. Events are processed in order:
- `call.requested` → add a `CallNodeAttrs` node with `status: "pending"`. If `parentRequestId` is set, add a `triggered` edge from parent to child.
- `call.responded` → update node status to `completed`, set `output` and `completedAt`
- `call.error` → update node status to `failed`, set `error` and `completedAt`
- `call.aborted` → update node status to `aborted`, set `completedAt`
- `call.completed` → update node status to `completed`, set `completedAt` (if not already set by `call.responded`)
Processing is idempotent — processing the same event twice has no effect (the node already has the updated status).
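The event-processing rules above can be sketched as a fold over the event array. This is a minimal illustration under stated assumptions, not the library's implementation: `SketchEvent`, `SketchNode`, `SketchGraph`, the `type` and `timestamp` fields, and the plain `Map` standing in for the graphology-backed `FlowGraph` are all hypothetical. The `running` transition is left out because none of the five listed events drives it.

```typescript
// Simplified, hypothetical shapes for illustration; the real CallEventMapValue
// and CallNodeAttrs definitions may differ.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface SketchEvent {
  type:
    | "call.requested"
    | "call.responded"
    | "call.error"
    | "call.aborted"
    | "call.completed";
  requestId: string;
  timestamp: string;
  operationId?: string;
  parentRequestId?: string;
  output?: unknown;
  error?: { code: string; message: string };
}

interface SketchNode {
  requestId: string;
  status: CallStatus;
  operationId?: string;
  parentRequestId?: string;
  output?: unknown;
  error?: { code: string; message: string };
  completedAt?: string;
}

interface SketchGraph {
  nodes: Map<string, SketchNode>;
  // Edge key -> [source, target]; every edge here is a "triggered" edge.
  edges: Map<string, [string, string]>;
}

function isTerminal(status: CallStatus): boolean {
  return status === "completed" || status === "failed" || status === "aborted";
}

function applyEvent(graph: SketchGraph, event: SketchEvent): void {
  if (event.type === "call.requested") {
    if (graph.nodes.has(event.requestId)) return; // replay: node already exists
    graph.nodes.set(event.requestId, {
      requestId: event.requestId,
      status: "pending",
      operationId: event.operationId,
      parentRequestId: event.parentRequestId,
    });
    if (event.parentRequestId !== undefined) {
      const key = `${event.parentRequestId}->${event.requestId}`;
      graph.edges.set(key, [event.parentRequestId, event.requestId]);
    }
    return;
  }

  const node = graph.nodes.get(event.requestId);
  // Assumption: events for unknown calls, and events arriving after a terminal
  // status, are ignored. The second rule is what makes replays idempotent here.
  if (node === undefined || isTerminal(node.status)) return;

  node.completedAt = event.timestamp;
  if (event.type === "call.responded") {
    node.status = "completed";
    node.output = event.output;
  } else if (event.type === "call.error") {
    node.status = "failed";
    node.error = event.error;
  } else if (event.type === "call.aborted") {
    node.status = "aborted";
  } else {
    node.status = "completed"; // call.completed
  }
}

function fromCallEvents(events: SketchEvent[]): SketchGraph {
  const graph: SketchGraph = { nodes: new Map(), edges: new Map() };
  for (const event of events) applyEvent(graph, event);
  return graph;
}
```

In this sketch a replayed `call.responded` leaves `output` and `completedAt` untouched, which is one way to satisfy the idempotency requirement; the actual library may enforce it differently.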
Incremental: updateFromEvent()
updateFromEvent(event: CallEventMapValue): void
Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:
const callGraph = new FlowGraph();
// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));
fromJSON()
static fromJSON(data: CallGraphSerialized): FlowGraph
Deserialize from graphology native JSON format. Used for loading persisted call graphs from Postgres.
Node Attributes
See schema.md for the full schema definition.
| Field | Type | Set by |
|---|---|---|
| `requestId` | `string` | `call.requested` |
| `operationId` | `string` | `call.requested` |
| `status` | `CallStatus` | Updated by each call event |
| `parentRequestId` | `string?` | `call.requested` |
| `input` | `unknown` | `call.requested` |
| `output` | `unknown?` | `call.responded` |
| `error` | `{ code, message, details? }?` | `call.error` |
| `identity` | `Identity?` | `call.requested` |
| `startedAt` | `string?` | `call.requested` (when handler starts) |
| `completedAt` | `string?` | Terminal event (responded, error, aborted) |
The node key is requestId.
Edges
Call graph edges carry an edgeType attribute:
| `edgeType` | Meaning | Added by |
|---|---|---|
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency: source needs target's result | Explicit declaration (not auto-populated) |
depends_on edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(source, target)` calls by the hub coordinator
Edge Key Convention
triggered edges use ${parentRequestId}->${childRequestId} as the edge key. depends_on edges use ${sourceRequestId}->${targetRequestId}:depends_on to distinguish from triggered edges between the same pair.
Since multi: false, there can be at most one triggered and one depends_on edge between the same pair. The edge key convention ensures deterministic keys.
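The key convention can be captured in one small helper. The name `edgeKey` is hypothetical and used only for illustration; the key formats themselves are the ones described above.

```typescript
type EdgeType = "triggered" | "depends_on";

// Builds the deterministic edge key for a pair of request IDs.
// triggered:  `${source}->${target}`
// depends_on: `${source}->${target}:depends_on`
function edgeKey(source: string, target: string, edgeType: EdgeType): string {
  const base = `${source}->${target}`;
  return edgeType === "depends_on" ? `${base}:depends_on` : base;
}
```

Because the key is derived purely from the endpoints and the edge type, re-adding the same relationship always produces the same key, so a duplicate can be detected with a simple key lookup.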
Status Lifecycle
Call node status transitions follow a strict state machine:

                     call.requested
                            │
                            ▼
                       ┌─────────┐
                       │ pending │
                       └────┬────┘
                            │
                     handler starts
                            │
                            ▼
                       ┌─────────┐
                  ┌────│ running │────┐
                  │    └────┬────┘    │
            call.aborted    │   call.aborted
                  │         │         │
                  ▼         │         ▼
             ┌─────────┐    │    ┌─────────┐
             │ aborted │    │    │ aborted │
             └─────────┘    │    └─────────┘
                            │
                  ┌─────────┼─────────┐
                  │         │         │
           call.responded   │    call.error
                  │         │         │
                  ▼         │         ▼
            ┌───────────┐   │    ┌────────┐
            │ completed │   │    │ failed │
            └───────────┘   │    └────────┘
                            │
                     call.completed
                            │
                            ▼
                      ┌───────────┐
                      │ completed │
                      └───────────┘

Invalid transitions (e.g., completed → running) throw InvalidTransitionError. The updateStatus() method validates the transition before applying it.
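A transition table is one straightforward way to implement this validation. The sketch below encodes only the transitions drawn in the diagram (`pending → running`, then `running` to any terminal status); whether a call may jump straight from `pending` to a terminal status is not stated here, so that is left out as an assumption. The `ALLOWED` table and `assertTransition` helper are hypothetical names.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

class InvalidTransitionError extends Error {
  readonly from: CallStatus;
  readonly to: CallStatus;

  constructor(from: CallStatus, to: CallStatus) {
    super(`Invalid call status transition: ${from} -> ${to}`);
    this.name = "InvalidTransitionError";
    this.from = from;
    this.to = to;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, InvalidTransitionError.prototype);
  }
}

// Allowed next statuses for each current status, as drawn in the diagram.
// Terminal statuses have no outgoing transitions.
const ALLOWED: Record<CallStatus, CallStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

// Throws InvalidTransitionError unless `from -> to` is in the table.
function assertTransition(from: CallStatus, to: CallStatus): void {
  if (ALLOWED[from].indexOf(to) === -1) {
    throw new InvalidTransitionError(from, to);
  }
}
```

Keeping the rules in a data table rather than in branching code makes it easy to compare the implementation against the diagram line by line.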
Abort Cascading
When a call is aborted, all of its children should also be aborted. The call protocol handles this via call.aborted events propagating through parentRequestId chains.
The call graph supports this with a traversal query:
// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call
The hub coordinator can:
- Receive `call.aborted` for a parent call
- Query `callGraph.descendants(requestId)` for all descendants
- Abort each affected call via `PendingRequestMap.abort()`
This is a structural operation — the graph provides the "who is affected" information, the protocol provides the "abort them" mechanism.
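The split between "who is affected" and "abort them" can be sketched with a breadth-first traversal over `triggered` edges. The `ChildIndex` map and `cascadeAbort` helper are hypothetical, and the `abort` callback merely stands in for `PendingRequestMap.abort()`, whose real signature is not specified here.

```typescript
// Hypothetical adjacency: parent requestId -> direct children (triggered edges).
type ChildIndex = Map<string, string[]>;

// Returns every descendant of `requestId` in breadth-first order,
// excluding `requestId` itself.
function descendants(children: ChildIndex, requestId: string): string[] {
  const result: string[] = [];
  const queue: string[] = [requestId];
  const seen = new Set<string>();
  seen.add(requestId);

  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const child of children.get(current) ?? []) {
      if (seen.has(child)) continue; // defensive; a DAG of triggered edges has no repeats
      seen.add(child);
      result.push(child);
      queue.push(child);
    }
  }
  return result;
}

// The graph answers "who is affected"; the caller-supplied callback does the aborting.
function cascadeAbort(
  children: ChildIndex,
  requestId: string,
  abort: (id: string) => void,
): string[] {
  const affected = descendants(children, requestId);
  for (const id of affected) abort(id);
  return affected;
}
```

Passing the abort mechanism in as a callback keeps the traversal free of any protocol dependency, which mirrors the separation described above.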
Observability Queries
The call graph supports queries for observability without traversing the entire graph:
| Query | Method | Returns |
|---|---|---|
| Get running calls | `filterByStatus("running")` | Node IDs with `running` status |
| Get failed calls | `filterByStatus("failed")` | Node IDs with `failed` status |
| Get top-level calls | `getRoots()` | Nodes with no `parentRequestId` |
| Get children of call | `children(requestId)` | Direct children via `triggered` edges |
| Get call duration | `duration(requestId)` | `completedAt - startedAt` (throws if not completed) |
| Get call lineage | `lineage(requestId)` | Ancestor chain from root to this call |
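As an illustration of the last two queries, the sketch below computes `duration` and `lineage` over a plain `Map` of node attributes. It assumes timestamps are ISO 8601 strings, that duration is reported in milliseconds, and that the lineage includes the call itself as its last element; none of these is confirmed by this document, and `TimedNode` is a hypothetical type.

```typescript
// Hypothetical subset of the node attributes needed for these two queries.
interface TimedNode {
  parentRequestId?: string;
  startedAt?: string; // assumed ISO 8601
  completedAt?: string; // assumed ISO 8601
}

// completedAt - startedAt in milliseconds; throws if the call has not completed.
function duration(nodes: Map<string, TimedNode>, requestId: string): number {
  const node = nodes.get(requestId);
  if (node === undefined) throw new Error(`Unknown call: ${requestId}`);
  if (node.startedAt === undefined || node.completedAt === undefined) {
    throw new Error(`Call ${requestId} has not completed`);
  }
  return Date.parse(node.completedAt) - Date.parse(node.startedAt);
}

// Walks parentRequestId links upward and returns the chain root-first.
function lineage(nodes: Map<string, TimedNode>, requestId: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = requestId;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current); // guards against a malformed (cyclic) parent chain
    chain.unshift(current);
    const node = nodes.get(current);
    current = node === undefined ? undefined : node.parentRequestId;
  }
  return chain;
}
```

Walking the denormalized `parentRequestId` attribute avoids an edge traversal for lineage, which is the point-lookup benefit that the denormalization is meant to provide.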
filterByStatus
filterByStatus(status: CallStatus): string[]
Returns all node keys with the given status. Implemented as a filter over graph.forEachNode(). For small graphs (tens to hundreds of nodes), this is O(n) and fast. For very large graphs, a status index could be added as an optimization.
getRoots
getRoots(): string[]
Returns all nodes with parentRequestId === undefined (top-level calls). These are the entry points of call chains.
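Both queries reduce to a single linear pass over the nodes. The sketch below uses a plain `Map` in place of the graphology instance, so `Map.forEach` (value first, key second) stands in for `graph.forEachNode()`; the `StatusNode` type is hypothetical.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical subset of the node attributes used by these queries.
interface StatusNode {
  status: CallStatus;
  parentRequestId?: string;
}

// O(n) scan: collect the keys of all nodes with the given status.
function filterByStatus(nodes: Map<string, StatusNode>, status: CallStatus): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.status === status) keys.push(key);
  });
  return keys;
}

// O(n) scan: collect the keys of all nodes without a parentRequestId.
function getRoots(nodes: Map<string, StatusNode>): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.parentRequestId === undefined) keys.push(key);
  });
  return keys;
}
```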
Serialization and Persistence
const data = callGraph.export(); // graphology native JSON
callGraph.toJSON(); // alias for export()
const restored = FlowGraph.fromJSON(data); // round-trip
The call graph's export()/fromJSON() boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations — it provides the serialized format, and the hub handles storage.
Payload fields (input, output, error) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see @alkdev/alkhub_ts/docs/architecture/storage/call-graph.md for the payload handling strategy).
Mutations
// Add a call node (from call.requested event)
addCall(attrs: CallNodeAttrs): void
// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void
// Add a dependency edge (explicit, not auto-populated)
addDependency(source: string, target: string): void
// Remove a call node and its edges
removeCall(requestId: string): void
// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
updateStatus validates the transition. addDependency validates that both endpoints exist. removeCall removes the node and all attached edges (graphology cascade).
Constraints
- DAG-only: call graphs cannot have cycles. A call cannot be its own ancestor. `addCall` with a `parentRequestId` that would create a cycle throws `CycleError`.
- Status transitions are validated: invalid transitions throw `InvalidTransitionError`.
- Node keys are `requestId`, not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- `parentRequestId` is both node attribute and edge: denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- `depends_on` edges are not auto-populated: they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
- Payload fields are stored as-is: flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- Small graph sizes: call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.
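For the DAG-only constraint, a cycle through `triggered` edges can only arise if the call being added already appears in its own ancestor chain, so a walk up the parent links is sufficient. The sketch below is an assumption-laden illustration: `assertNoCycle` and the `parentOf` map are hypothetical, and `depends_on` edges would need a general reachability check that is not shown.

```typescript
class CycleError extends Error {
  constructor(requestId: string, ancestor: string) {
    super(`Adding ${requestId} would create a cycle through ${ancestor}`);
    this.name = "CycleError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, CycleError.prototype);
  }
}

// parentOf maps each known requestId to its parentRequestId (undefined for roots).
// Throws CycleError if `requestId` is found among the ancestors of its proposed parent.
function assertNoCycle(
  parentOf: Map<string, string | undefined>,
  requestId: string,
  parentRequestId: string | undefined,
): void {
  const seen = new Set<string>();
  let current: string | undefined = parentRequestId;
  while (current !== undefined) {
    if (current === requestId || seen.has(current)) {
      throw new CycleError(requestId, current);
    }
    seen.add(current);
    current = parentOf.get(current);
  }
}
```

The walk costs O(depth) per insertion, which is negligible for the graph sizes described above.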
Open Questions
- Should the call graph support `call.requested` events with unknown `operationId`? If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes: the call graph records what happened, not what should have happened. The node gets `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
- Should `depends_on` edges be auto-populated from workflow templates? When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.
- Should the call graph support multiple graphs simultaneously (one per workflow execution)? Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
- Should `filterByStatus` use an index? For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make status queries O(1). The index would need to be updated on every `updateStatus()` call.
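To make the trade-off in the status-index question concrete, here is a minimal sketch of such an index. `StatusIndex` is a hypothetical class, not part of the current design; it would have to be kept in step with every `addCall()`, `updateStatus()`, and `removeCall()`.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  private readonly current = new Map<string, CallStatus>();

  // Records (or changes) the status of a call, moving it between buckets.
  set(requestId: string, status: CallStatus): void {
    this.remove(requestId);
    let bucket = this.byStatus.get(status);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(status, bucket);
    }
    bucket.add(requestId);
    this.current.set(requestId, status);
  }

  // Drops a call from the index; a no-op for unknown calls.
  remove(requestId: string): void {
    const previous = this.current.get(requestId);
    if (previous === undefined) return;
    const bucket = this.byStatus.get(previous);
    if (bucket !== undefined) bucket.delete(requestId);
    this.current.delete(requestId);
  }

  // Bucket lookup is O(1); copying it into an array is O(k) in the result size.
  filterByStatus(status: CallStatus): string[] {
    const bucket = this.byStatus.get(status);
    return bucket === undefined ? [] : Array.from(bucket);
  }
}
```

The cost is a second source of truth for status, so any mutation path that bypasses the index would silently produce stale query results.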
References
- Schema: schema.md (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`, `EdgeType`)
- Call protocol: `@alkdev/alkhub_ts/docs/architecture/call-graph.md`
- Call graph storage: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Taskgraph pattern: `@alkdev/taskgraph_ts/src/graph/construction.ts`