---
status: draft
last_updated: 2026-05-19
---

# Call Graph (Dynamic Runtime)

The dynamic call graph is populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.

## Overview

The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what *can* happen (type compatibility), the call graph captures what *is* happening or *has happened* (running calls, completed calls, failures, aborts).

The call graph is populated automatically by the call protocol: every `call.requested` adds a node, and every `call.responded`, `call.error`, `call.aborted`, or `call.completed` updates that node's status. As long as every event is applied, the graph stays in sync with the actual state of in-flight calls.

Key capabilities:
- **Abort cascading**: aborting a call automatically aborts all of its children via `parentRequestId` chains
- **Observability**: query what's running, what failed, what's blocked
- **DAG operations**: topological sort of running calls, cycle detection (cycles should not occur, but this is verified), reachability queries
- **Serialization**: `export()`/`fromJSON()` for Postgres persistence

## Construction

### fromCallEvents()

```typescript
static fromCallEvents(events: CallEventMapValue[]): FlowGraph<CallNodeAttrs, CallEdgeAttrs>
```

Builds a call graph from an array of call protocol events. Events are processed in order:

1. **`call.requested`** → add a `CallNodeAttrs` node with `status: "pending"`. If `parentRequestId` is set, add a `triggered` edge from parent to child.
2. **`call.responded`** → update node status to `completed`, set `output` and `completedAt`
3. **`call.error`** → update node status to `failed`, set `error` and `completedAt`
4. **`call.aborted`** → update node status to `aborted`, set `completedAt`
5. **`call.completed`** → update node status to `completed`, set `completedAt` (if not already set by `call.responded`)

Processing is idempotent: applying the same event twice has no further effect, because the node already carries the updated status.

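The event handling above can be sketched as a small reducer over plain maps. This is a minimal illustration, not flowgraph's implementation: the `CallEvent` and `CallNode` shapes, the `at` timestamp field, and the `applyEvent` helper are hypothetical stand-ins for `CallEventMapValue` and `CallNodeAttrs`. It assumes the first terminal event wins, and it skips the `running` state because no event in the list above sets it.

```typescript
// Hypothetical, simplified shapes. The real types live in schema.md and
// @alkdev/operations and may differ.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string; parentRequestId?: string; input: unknown; at: string }
  | { type: "call.responded"; requestId: string; output: unknown; at: string }
  | { type: "call.error"; requestId: string; error: { code: string; message: string }; at: string }
  | { type: "call.aborted"; requestId: string; at: string }
  | { type: "call.completed"; requestId: string; at: string };

interface CallNode {
  requestId: string;
  operationId: string;
  status: CallStatus;
  parentRequestId?: string;
  input: unknown;
  output?: unknown;
  error?: { code: string; message: string };
  completedAt?: string;
}

const TERMINAL: ReadonlySet<CallStatus> = new Set<CallStatus>(["completed", "failed", "aborted"]);

// Applies one event to the node map and the set of `triggered` edge keys.
function applyEvent(nodes: Map<string, CallNode>, edges: Set<string>, event: CallEvent): void {
  if (event.type === "call.requested") {
    if (nodes.has(event.requestId)) return; // replayed event: node already recorded
    nodes.set(event.requestId, {
      requestId: event.requestId,
      operationId: event.operationId,
      status: "pending",
      parentRequestId: event.parentRequestId,
      input: event.input,
    });
    if (event.parentRequestId !== undefined) {
      edges.add(`${event.parentRequestId}->${event.requestId}`);
    }
    return;
  }

  const node = nodes.get(event.requestId);
  // Unknown node or already terminal: treat the event as a no-op (assumption).
  if (node === undefined || TERMINAL.has(node.status)) return;

  switch (event.type) {
    case "call.responded":
      node.status = "completed";
      node.output = event.output;
      break;
    case "call.error":
      node.status = "failed";
      node.error = event.error;
      break;
    case "call.aborted":
      node.status = "aborted";
      break;
    case "call.completed":
      node.status = "completed";
      break;
  }
  node.completedAt = event.at;
}
```

Because terminal nodes ignore further events, replaying an event log leaves the maps unchanged, which is one way to obtain the idempotence described above.
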
### Incremental: updateFromEvent()

```typescript
updateFromEvent(event: CallEventMapValue): void
```

Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:

```typescript
const callGraph = new FlowGraph();
// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));
```

### fromJSON()

```typescript
static fromJSON(data: CallGraphSerialized): FlowGraph
```

Deserializes a call graph from graphology's native JSON format. Used for loading persisted call graphs from Postgres.

## Node Attributes

See [schema.md](schema.md#CallNodeAttrs) for the full schema definition.

| Field | Type | Set by |
|-------|------|--------|
| `requestId` | `string` | `call.requested` |
| `operationId` | `string` | `call.requested` |
| `status` | `CallStatus` | Updated by each call event |
| `parentRequestId` | `string?` | `call.requested` |
| `input` | `unknown` | `call.requested` |
| `output` | `unknown?` | `call.responded` |
| `error` | `{ code, message, details? }?` | `call.error` |
| `identity` | `Identity?` | `call.requested` |
| `startedAt` | `string?` | `call.requested` (when handler starts) |
| `completedAt` | `string?` | Terminal event (`responded`, `error`, `aborted`, `completed`) |

The node key is `requestId`.

## Edges

Call graph edges carry an `edgeType` attribute:

| `edgeType` | Meaning | Added by |
|-----------|---------|----------|
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency: source needs target's result | Explicit declaration (not auto-populated) |

`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(source, target)` calls by the hub coordinator

### Edge Key Convention

`triggered` edges use `${parentRequestId}->${childRequestId}` as the edge key. `depends_on` edges use `${sourceRequestId}->${targetRequestId}:depends_on` to distinguish them from `triggered` edges between the same pair.

This composite key format is necessary because `multi: false` allows at most one edge per key between a given (source, target) pair. Since a call graph can have both a `triggered` edge (parent→child) and a `depends_on` edge (data dependency) between the same pair of calls, the edge type suffix in the key disambiguates them. See [schema.md#edge-key-convention](schema.md#edge-key-convention) for the general key convention and the discussion of multi-edge support.

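The key convention can be captured in one small helper. This is a sketch only: `edgeKey` is a hypothetical name, not part of the documented API, and the `EdgeType` union is narrowed here to the two call graph edge types.

```typescript
// Hypothetical helper illustrating the call graph edge key convention.
type EdgeType = "triggered" | "depends_on";

function edgeKey(source: string, target: string, edgeType: EdgeType): string {
  const base = `${source}->${target}`;
  // `triggered` edges use the bare key; other edge types append a suffix.
  return edgeType === "triggered" ? base : `${base}:${edgeType}`;
}

// Both edge types between the same pair of calls get distinct keys.
const triggeredKey = edgeKey("req-1", "req-2", "triggered");   // "req-1->req-2"
const dependsOnKey = edgeKey("req-1", "req-2", "depends_on");  // "req-1->req-2:depends_on"
```
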
## Status Lifecycle

Call node status transitions follow a strict state machine:

```
                  call.requested
                         │
                         ▼
                    ┌─────────┐
                    │ pending │
                    └────┬────┘
                         │
                  handler starts
                         │
                         ▼
                    ┌─────────┐
               ┌────│ running │────┐
               │    └────┬────┘    │
         call.aborted    │   call.aborted
               │         │         │
               ▼         │         ▼
          ┌─────────┐    │    ┌─────────┐
          │ aborted │    │    │ aborted │
          └─────────┘    │    └─────────┘
                         │
               ┌─────────┼─────────┐
               │         │         │
        call.responded   │    call.error
               │         │         │
               ▼         │         ▼
         ┌───────────┐   │    ┌────────┐
         │ completed │   │    │ failed │
         └───────────┘   │    └────────┘
                         │
                  call.completed
                         │
                         ▼
                   ┌───────────┐
                   │ completed │
                   └───────────┘
```

Invalid transitions (e.g., `completed` → `running`) throw `InvalidTransitionError`. The `updateStatus()` method validates the transition before applying it.

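One way to validate transitions is a lookup table keyed by the current status. The sketch below is an assumption-laden reading of the diagram, not the library's code: the `ALLOWED` table and `assertTransition` helper are hypothetical, the error class is a local stand-in for flowgraph's `InvalidTransitionError`, and `pending` → `aborted` is left out because the diagram only draws aborts leaving `running`.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Local stand-in for the InvalidTransitionError named in this document.
class InvalidTransitionError extends Error {
  constructor(from: CallStatus, to: CallStatus) {
    super(`Invalid call status transition: ${from} -> ${to}`);
    this.name = "InvalidTransitionError";
  }
}

// Allowed next states per current state, as read off the diagram (assumption).
const ALLOWED: Record<CallStatus, readonly CallStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

// Throws when `from -> to` is not listed in the table; otherwise returns.
function assertTransition(from: CallStatus, to: CallStatus): void {
  if (!ALLOWED[from].includes(to)) {
    throw new InvalidTransitionError(from, to);
  }
}
```

A table like this keeps the state machine in one place, so a change to the lifecycle only touches `ALLOWED`.
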
## Abort Cascading

When a call is aborted, all of its children (and, transitively, their children) should also be aborted. The call protocol handles this via `call.aborted` events propagating through `parentRequestId` chains.

The call graph supports this with a traversal query:

```typescript
// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call
```

The hub coordinator can:
1. Receive `call.aborted` for a parent call
2. Query `callGraph.descendants(requestId)` for all descendants
3. Abort each descendant call via `PendingRequestMap.abort()`

This is a structural operation: the graph provides the "who is affected" information, and the protocol provides the "abort them" mechanism.

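The descendant query amounts to a breadth-first walk over `triggered` edges. The following sketch uses a plain adjacency map rather than the graph itself; `descendantsOf` and the `childrenOf` map are hypothetical names, and the assumption is that only `triggered` edges (not `depends_on`) take part in the cascade.

```typescript
// childrenOf maps a requestId to the requestIds it triggered (hypothetical shape).
function descendantsOf(
  childrenOf: ReadonlyMap<string, readonly string[]>,
  requestId: string,
): string[] {
  const seen = new Set<string>();
  const queue: string[] = [requestId];

  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const child of childrenOf.get(current) ?? []) {
      if (!seen.has(child)) {
        seen.add(child); // record each descendant once, in breadth-first order
        queue.push(child);
      }
    }
  }
  return [...seen];
}

// Example: a triggered b and c, and b triggered d.
const exampleChildren = new Map<string, string[]>([
  ["a", ["b", "c"]],
  ["b", ["d"]],
]);
const affected = descendantsOf(exampleChildren, "a"); // ["b", "c", "d"]
```
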
## Observability Queries

The call graph exposes targeted queries for observability, so callers do not have to walk the graph themselves:

| Query | Method | Returns |
|-------|--------|---------|
| Get running calls | `filterByStatus("running")` | Node IDs with running status |
| Get failed calls | `filterByStatus("failed")` | Node IDs with failed status |
| Get top-level calls | `getRoots()` | Nodes with no `parentRequestId` |
| Get children of call | `children(requestId)` | Direct children via `triggered` edges |
| Get call duration | `duration(requestId)` | `completedAt - startedAt` (throws if not completed) |
| Get call lineage | `lineage(requestId)` | Ancestor chain from root to this call |

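As an illustration of the last two rows, here is a sketch of `duration` and `lineage` over a plain node map. Several details are assumptions rather than documented behavior: the `CallTiming` shape is hypothetical, timestamps are taken to be ISO 8601 strings, the duration is returned in milliseconds, and the lineage includes the call itself as its last element.

```typescript
// Hypothetical subset of CallNodeAttrs needed for these two queries.
interface CallTiming {
  requestId: string;
  parentRequestId?: string;
  startedAt?: string;   // assumed ISO 8601
  completedAt?: string; // assumed ISO 8601
}

// Elapsed milliseconds between startedAt and completedAt; throws if either is missing.
function duration(node: CallTiming): number {
  if (node.startedAt === undefined || node.completedAt === undefined) {
    throw new Error(`Call ${node.requestId} has not completed`);
  }
  return Date.parse(node.completedAt) - Date.parse(node.startedAt);
}

// Walks parentRequestId links upward and returns the chain ordered root-first.
// Relies on the DAG constraint: a cyclic parent chain would never terminate.
function lineage(nodes: ReadonlyMap<string, CallTiming>, requestId: string): string[] {
  const chain: string[] = [];
  let current = nodes.get(requestId);
  while (current !== undefined) {
    chain.unshift(current.requestId);
    current = current.parentRequestId === undefined
      ? undefined
      : nodes.get(current.parentRequestId);
  }
  return chain;
}
```
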
### filterByStatus

```typescript
filterByStatus(status: CallStatus): string[]
```

Returns all node keys with the given status. Implemented as a filter over `graph.forEachNode()`. The scan is O(n), which is fast for small graphs (tens to hundreds of nodes). For very large graphs, a status index could be added as an optimization.

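If the index optimization were ever needed, it could look roughly like the sketch below. `StatusIndex` is a hypothetical class, not part of flowgraph; it keeps one set of request IDs per status and would have to be updated on every status change.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical status index: one bucket of requestIds per status.
class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  private readonly statusOf = new Map<string, CallStatus>();

  // Records the current status of a call, moving it out of its previous bucket.
  set(requestId: string, status: CallStatus): void {
    const previous = this.statusOf.get(requestId);
    if (previous !== undefined) {
      this.byStatus.get(previous)?.delete(requestId);
    }
    let bucket = this.byStatus.get(status);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(status, bucket);
    }
    bucket.add(requestId);
    this.statusOf.set(requestId, status);
  }

  // Bucket lookup is O(1); copying the matches out is O(k) for k matches.
  filterByStatus(status: CallStatus): string[] {
    return [...(this.byStatus.get(status) ?? [])];
  }
}
```

The trade-off is extra bookkeeping on each update in exchange for skipping the full node scan on each query.
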
### getRoots

```typescript
getRoots(): string[]
```

Returns all nodes with `parentRequestId === undefined` (top-level calls). These are the entry points of call chains.

## Serialization and Persistence

```typescript
const data = callGraph.export(); // graphology native JSON
callGraph.toJSON(); // alias for export()
const restored = FlowGraph.fromJSON(data); // round-trip
```

The call graph's `export()`/`fromJSON()` boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations: it provides the serialized format, and the hub handles storage.

Payload fields (`input`, `output`, `error`) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md` for the payload handling strategy).

## Mutations

```typescript
// Add a call node (from call.requested event)
// If attrs.parentRequestId is set, also creates a triggered edge from parent to child
addCall(attrs: CallNodeAttrs): void

// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void

// Add a dependency edge (explicit, not auto-populated by call protocol)
// Creates an edge with edgeType: "depends_on"
addDependency(source: string, target: string): void

// Remove a call node and its edges
removeCall(requestId: string): void

// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
```

`addCall` is the primary entry point for populating the call graph from call events. When `attrs.parentRequestId` is present, it automatically creates a `triggered` edge from the parent to the new node. `updateStatus` validates the transition before applying it. `addDependency` creates an explicit `depends_on` edge for a data dependency not captured by the parent-child hierarchy; it validates that both endpoints exist and that the edge would not create a cycle. `removeCall` removes the node and all attached edges (graphology cascade).

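The cycle check in `addDependency` can be reduced to a reachability test: adding `source → target` closes a cycle exactly when `source` is already reachable from `target`. The sketch below shows that test over a plain adjacency map; `wouldCreateCycle` and the `successors` map are hypothetical and stand in for whatever traversal flowgraph actually uses.

```typescript
// successors maps a node key to the keys its outgoing edges point at (hypothetical shape).
function wouldCreateCycle(
  successors: ReadonlyMap<string, readonly string[]>,
  source: string,
  target: string,
): boolean {
  const seen = new Set<string>();
  const stack: string[] = [target];

  // Depth-first search from target; reaching source means the new edge closes a loop.
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === undefined) break;
    if (current === source) return true;
    if (seen.has(current)) continue;
    seen.add(current);
    for (const next of successors.get(current) ?? []) {
      stack.push(next);
    }
  }
  return false;
}

// Example: existing edges a -> b -> c.
const exampleSuccessors = new Map<string, string[]>([
  ["a", ["b"]],
  ["b", ["c"]],
]);
const closesLoop = wouldCreateCycle(exampleSuccessors, "c", "a"); // true: a already reaches c
const staysAcyclic = wouldCreateCycle(exampleSuccessors, "a", "c"); // false: c reaches nothing
```
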
## Constraints

- **DAG-only**: call graphs cannot have cycles. A call cannot be its own ancestor. `addCall` with a `parentRequestId` that would create a cycle throws `CycleError`.
- **Status transitions are validated**: invalid transitions throw `InvalidTransitionError`.
- **Node keys are `requestId`**, not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- **`parentRequestId` is both node attribute and edge**: denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated**: they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
- **Payload fields are stored as-is**: flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- **Small graph sizes**: call graphs at hub level are typically tens of nodes, so O(n) traversals are acceptable and performance is not a design driver.

## Open Questions

1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Proposed answer: yes. The call graph records what happened, not what should have happened. The node gets `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.

2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.

3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.

4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would reduce a status query to an O(1) lookup plus the cost of returning the matches. The index would need to be updated on every `updateStatus()` call.

## References

- Schema: [schema.md](schema.md) (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`, `EdgeType`)
- Call protocol: `@alkdev/alkhub_ts/docs/architecture/call-graph.md`
- Call graph storage: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Taskgraph pattern: `@alkdev/taskgraph_ts/src/graph/construction.ts`