---
status: draft
last_updated: 2026-05-19
---

# Call Graph (Dynamic Runtime)

The dynamic call graph populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.

## Overview

The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what *can* happen (type compatibility), the call graph captures what *is* happening or *has happened* (running calls, completed calls, failures, aborts).

The call graph is populated automatically by the call protocol: every `call.requested` adds a node, and every `call.responded`/`call.error`/`call.aborted` updates its status. This keeps the call graph in sync with the actual state of in-flight calls.

Key capabilities:

- **Abort cascading**: aborting a call automatically aborts all of its descendants via `parentRequestId` chains
- **Observability**: query what's running, what failed, what's blocked
- **DAG operations**: topological sort of running calls, cycle detection (cycles should not occur, but this is verified), reachability queries
- **Serialization**: `export()`/`fromJSON()` for Postgres persistence

## Construction

### fromCallEvents()

```typescript
static fromCallEvents(events: CallEventMapValue[]): FlowGraph
```

Builds a call graph from an array of call protocol events. Events are processed in order:

1. **`call.requested`** → add a `CallNodeAttrs` node with `status: "pending"`. If `parentRequestId` is set, add a `triggered` edge from parent to child.
2. **`call.responded`** → update node status to `completed`, set `output` and `completedAt`
3. **`call.error`** → update node status to `failed`, set `error` and `completedAt`
4. **`call.aborted`** → update node status to `aborted`, set `completedAt`
5. **`call.completed`** → update node status to `completed`, set `completedAt` (if not already set by `call.responded`)

Processing is idempotent: processing the same event twice has no effect, because the node already has the updated status.

### Incremental: updateFromEvent()

```typescript
updateFromEvent(event: CallEventMapValue): void
```

Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:

```typescript
const callGraph = new FlowGraph();

// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));
```

### fromJSON()

```typescript
static fromJSON(data: CallGraphSerialized): FlowGraph
```

Deserializes a call graph from graphology's native JSON format. Used for loading persisted call graphs from Postgres.

## Node Attributes

See [schema.md](schema.md#CallNodeAttrs) for the full schema definition.

| Field | Type | Set by |
|-------|------|--------|
| `requestId` | `string` | `call.requested` |
| `operationId` | `string` | `call.requested` |
| `status` | `CallStatus` | Updated by each call event |
| `parentRequestId` | `string?` | `call.requested` |
| `input` | `unknown` | `call.requested` |
| `output` | `unknown?` | `call.responded` |
| `error` | `{ code, message, details? }?` | `call.error` |
| `identity` | `Identity?` | `call.requested` |
| `startedAt` | `string?` | `call.requested` (when handler starts) |
| `completedAt` | `string?` | Terminal event (`responded`, `error`, `aborted`, or `completed` if not already set) |

The node key is `requestId`.
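
As a rough illustration of the construction rules and attribute table above, the following sketch folds call events into a plain `Map`-based structure rather than a real `FlowGraph`. The event and node shapes (`SketchEvent`, `SketchNode`, the `at` timestamp field) and the helpers `applyEvent` and `buildFromEvents` are assumptions made for this example; the actual types live in schema.md and `@alkdev/operations/src/call.ts` and may differ.

```typescript
// Sketch only: hypothetical shapes, not the real CallEventMapValue / CallNodeAttrs.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";
type CallError = { code: string; message: string; details?: unknown };

type SketchEvent =
  | { type: "call.requested"; requestId: string; operationId: string;
      parentRequestId?: string; input: unknown; at: string }
  | { type: "call.responded"; requestId: string; output: unknown; at: string }
  | { type: "call.error"; requestId: string; error: CallError; at: string }
  | { type: "call.aborted"; requestId: string; at: string }
  | { type: "call.completed"; requestId: string; at: string };

interface SketchNode {
  requestId: string;
  operationId: string;
  status: CallStatus;
  parentRequestId: string | undefined;
  input: unknown;
  output?: unknown;
  error?: CallError;
  completedAt?: string;
}

interface SketchGraph {
  nodes: Map<string, SketchNode>; // keyed by requestId
  triggered: Map<string, { source: string; target: string }>; // keyed by edge key
}

function applyEvent(graph: SketchGraph, event: SketchEvent): void {
  if (event.type === "call.requested") {
    if (graph.nodes.has(event.requestId)) return; // replayed event: node exists
    graph.nodes.set(event.requestId, {
      requestId: event.requestId,
      operationId: event.operationId,
      status: "pending",
      parentRequestId: event.parentRequestId,
      input: event.input,
    });
    if (event.parentRequestId !== undefined) {
      // Edge key follows the `${parent}->${child}` convention for triggered edges.
      const key = `${event.parentRequestId}->${event.requestId}`;
      graph.triggered.set(key, { source: event.parentRequestId, target: event.requestId });
    }
    return;
  }

  const node = graph.nodes.get(event.requestId);
  // Unknown calls and calls that already reached a terminal state are left
  // untouched, so replaying a terminal event has no further effect.
  if (node === undefined || node.completedAt !== undefined) return;

  switch (event.type) {
    case "call.responded":
      node.status = "completed";
      node.output = event.output;
      break;
    case "call.error":
      node.status = "failed";
      node.error = event.error;
      break;
    case "call.aborted":
      node.status = "aborted";
      break;
    case "call.completed":
      node.status = "completed";
      break;
  }
  node.completedAt = event.at;
}

function buildFromEvents(events: SketchEvent[]): SketchGraph {
  const graph: SketchGraph = { nodes: new Map(), triggered: new Map() };
  for (const event of events) applyEvent(graph, event);
  return graph;
}
```

Ignoring terminal events once `completedAt` is set is one simple way to get the idempotent replay behaviour described under `fromCallEvents()`; a real implementation might instead compare statuses or validate the transition.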

## Edges

Call graph edges carry an `edgeType` attribute:

| `edgeType` | Meaning | Added by |
|-----------|---------|----------|
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency: source needs target's result | Explicit declaration (not auto-populated) |

`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:

- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(source, target)` calls by the hub coordinator

### Edge Key Convention

`triggered` edges use `${parentRequestId}->${childRequestId}` as the edge key. `depends_on` edges use `${sourceRequestId}->${targetRequestId}:depends_on` to distinguish them from `triggered` edges between the same pair.

Since `multi: false`, graphology permits only one directed edge per ordered (source, target) pair, so there can be at most one `triggered` and one `depends_on` edge between the same two calls, and the two cannot share a direction. The edge key convention ensures deterministic keys.

## Status Lifecycle

Call node status transitions follow a strict state machine:

```
                call.requested
                      │
                      ▼
                 ┌─────────┐
                 │ pending │
                 └────┬────┘
                      │ handler starts
                      │
                      ▼
                 ┌─────────┐
            ┌────│ running │────┐
            │    └────┬────┘    │
     call.aborted     │    call.aborted
            │         │         │
            ▼         │         ▼
       ┌─────────┐    │    ┌─────────┐
       │ aborted │    │    │ aborted │
       └─────────┘    │    └─────────┘
                      │
            ┌─────────┼─────────┐
            │         │         │
     call.responded   │    call.error
            │         │         │
            ▼         │         ▼
      ┌───────────┐   │    ┌────────┐
      │ completed │   │    │ failed │
      └───────────┘   │    └────────┘
                      │
                call.completed
                      │
                      ▼
                ┌───────────┐
                │ completed │
                └───────────┘
```

Invalid transitions (e.g., `completed` → `running`) throw `InvalidTransitionError`. The `updateStatus()` method validates the transition before applying it.

## Abort Cascading

When a call is aborted, all of its descendants should also be aborted. The call protocol handles this via `call.aborted` events propagating through `parentRequestId` chains.
The call graph supports this with a traversal query:

```typescript
// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call
```

The hub coordinator can:

1. Receive `call.aborted` for a parent call
2. Query `callGraph.descendants(requestId)` for all descendants
3. Abort each descendant call via `PendingRequestMap.abort()`

This is a structural operation: the graph provides the "who is affected" information, and the protocol provides the "abort them" mechanism.

## Observability Queries

The call graph exposes the following observability queries, so callers do not have to walk the graph themselves:

| Query | Method | Returns |
|-------|--------|---------|
| Get running calls | `filterByStatus("running")` | Node IDs with running status |
| Get failed calls | `filterByStatus("failed")` | Node IDs with failed status |
| Get top-level calls | `getRoots()` | Nodes with no `parentRequestId` |
| Get children of call | `children(requestId)` | Direct children via `triggered` edges |
| Get call duration | `duration(requestId)` | `completedAt - startedAt` (throws if not completed) |
| Get call lineage | `lineage(requestId)` | Ancestor chain from root to this call |

### filterByStatus

```typescript
filterByStatus(status: CallStatus): string[]
```

Returns all node keys with the given status. Implemented as a filter over `graph.forEachNode()`, so it is O(n) in the number of nodes. For small graphs (tens to hundreds of nodes) this is fast. For very large graphs, a status index could be added as an optimization.

### getRoots

```typescript
getRoots(): string[]
```

Returns all nodes with `parentRequestId === undefined` (top-level calls). These are the entry points of call chains.
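
The queries above, together with the `descendants()` traversal used for abort cascading, can be sketched over a plain `Map` of call records. This is a minimal illustration under stated assumptions: the `SketchCall` record and the standalone helper functions are hypothetical, timestamps are assumed to be ISO 8601 strings, and `lineage` is assumed to include the call itself. It does not reflect how `FlowGraph` implements these methods on top of graphology.

```typescript
// Sketch only: hypothetical record shape, not the real CallNodeAttrs.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface SketchCall {
  requestId: string;
  status: CallStatus;
  parentRequestId?: string;
  startedAt?: string; // assumed ISO 8601
  completedAt?: string; // assumed ISO 8601
}

type Calls = Map<string, SketchCall>;

// Direct children: calls whose parentRequestId points at the given call.
function children(calls: Calls, requestId: string): string[] {
  return Array.from(calls.values())
    .filter((call) => call.parentRequestId === requestId)
    .map((call) => call.requestId);
}

// All transitive children, breadth-first; the call itself is excluded.
function descendants(calls: Calls, requestId: string): string[] {
  const found: string[] = [];
  const seen = new Set<string>([requestId]);
  const queue: string[] = [requestId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const child of children(calls, current)) {
      if (seen.has(child)) continue; // guards against malformed (cyclic) data
      seen.add(child);
      found.push(child);
      queue.push(child);
    }
  }
  return found;
}

// O(n) scan over all calls, as described for filterByStatus above.
function filterByStatus(calls: Calls, status: CallStatus): string[] {
  return Array.from(calls.values())
    .filter((call) => call.status === status)
    .map((call) => call.requestId);
}

// Top-level calls: no parentRequestId.
function getRoots(calls: Calls): string[] {
  return Array.from(calls.values())
    .filter((call) => call.parentRequestId === undefined)
    .map((call) => call.requestId);
}

// Ancestor chain from the root down to (and including) the given call.
function lineage(calls: Calls, requestId: string): string[] {
  const chain: string[] = [];
  let current: SketchCall | undefined = calls.get(requestId);
  while (current !== undefined && chain.length < calls.size) {
    chain.unshift(current.requestId);
    current = current.parentRequestId === undefined
      ? undefined
      : calls.get(current.parentRequestId);
  }
  return chain;
}

// Duration in milliseconds; throws when the call has not completed.
function duration(calls: Calls, requestId: string): number {
  const call = calls.get(requestId);
  if (call === undefined || call.startedAt === undefined || call.completedAt === undefined) {
    throw new Error(`call ${requestId} has not completed`);
  }
  return Date.parse(call.completedAt) - Date.parse(call.startedAt);
}
```

For an abort cascade, a coordinator would iterate over `descendants(calls, requestId)` and hand each ID to whatever abort mechanism the protocol provides.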

## Serialization and Persistence

```typescript
const data = callGraph.export();            // graphology native JSON
callGraph.toJSON();                         // alias for export()
const restored = FlowGraph.fromJSON(data);  // round-trip
```

The call graph's `export()`/`fromJSON()` boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations: it provides the serialized format, and the hub handles storage.

Payload fields (`input`, `output`, `error`) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md` for the payload handling strategy).

## Mutations

```typescript
// Add a call node (from call.requested event)
addCall(attrs: CallNodeAttrs): void

// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void

// Add a dependency edge (explicit, not auto-populated)
addDependency(source: string, target: string): void

// Remove a call node and its edges
removeCall(requestId: string): void

// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
```

`updateStatus` validates the transition. `addDependency` validates that both endpoints exist. `removeCall` removes the node and all attached edges (graphology cascade).

## Constraints

- **DAG-only**: call graphs cannot have cycles. A call cannot be its own ancestor. `addCall` with a `parentRequestId` that would create a cycle throws `CycleError`.
- **Status transitions are validated**: invalid transitions throw `InvalidTransitionError`.
- **Node keys are `requestId`**, not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- **`parentRequestId` is both node attribute and edge**: denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated**: they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
- **Payload fields are stored as-is**: flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- **Small graph sizes**: call graphs at hub level are typically tens of nodes. At this size performance is not a concern; O(n) traversals are fine.

## Open Questions

1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes: the call graph records what happened, not what should have happened. The node gets `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.

2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.

3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.

4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make status lookups O(1). The index would need to be updated on every `updateStatus()` call.
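
To make the trade-off in open question 4 concrete, here is a minimal sketch of such a status index. The `StatusIndex` class and its method names are hypothetical and not part of the documented `FlowGraph` API; the sketch only assumes that something calls `set()` wherever `addCall()` or `updateStatus()` changes a status, and `remove()` wherever `removeCall()` drops a node.

```typescript
// Sketch only: a hypothetical helper, not part of FlowGraph.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

class StatusIndex {
  // status -> requestIds currently in that status
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  // requestId -> current status, needed to find the old bucket on update
  private readonly statusOf = new Map<string, CallStatus>();

  // Record a new or changed status for a call.
  set(requestId: string, status: CallStatus): void {
    const previous = this.statusOf.get(requestId);
    if (previous === status) return; // nothing to move
    if (previous !== undefined) {
      this.byStatus.get(previous)?.delete(requestId);
    }
    let bucket = this.byStatus.get(status);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(status, bucket);
    }
    bucket.add(requestId);
    this.statusOf.set(requestId, status);
  }

  // Forget a call entirely.
  remove(requestId: string): void {
    const previous = this.statusOf.get(requestId);
    if (previous === undefined) return;
    this.byStatus.get(previous)?.delete(requestId);
    this.statusOf.delete(requestId);
  }

  // Finding the bucket is a single map lookup; copying it out is
  // proportional to the number of matching calls, not to the graph size.
  filterByStatus(status: CallStatus): string[] {
    const bucket = this.byStatus.get(status);
    return bucket === undefined ? [] : Array.from(bucket);
  }
}
```

The cost is a second piece of state that every status mutation has to keep consistent, which is the maintenance burden the question weighs against a plain O(n) scan.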

## References

- Schema: [schema.md](schema.md) (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`, `EdgeType`)
- Call protocol: `@alkdev/alkhub_ts/docs/architecture/call-graph.md`
- Call graph storage: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Taskgraph pattern: `@alkdev/taskgraph_ts/src/graph/construction.ts`