flowgraph/docs/architecture/call-graph.md
glm-5.1 c5e649cc9f resolve mechanical architecture review issues (C-01,C-02,C-03,W-01,W-09,W-10,W-12)
- C-01: fix broken README link (call-graph-runtime.md → call-graph.md)
- C-02: add CallEdgeAttrs union type alias in schema.md
- C-03/W-12: rename TypedEdgeAttrs → OperationEdgeAttrs for consistent
  {GraphType}EdgeAttrs naming pattern, update all references
- W-01: standardize terminology — prerequisites=structural/graph,
  preconditions=reactive/computed, rename WorkflowNode.prerequisites
  to preconditions, rename computePrerequisites to computePreconditions
- W-09: update ADR-001/002/003 status from Proposed to Accepted
- W-10: clarify call graph mutation API — addCall creates triggered
  edges automatically, addDependency creates depends_on edges
- update review checklist with resolved items
2026-05-19 11:09:06 +00:00


---
status: draft
last_updated: 2026-05-19
---
# Call Graph (Dynamic Runtime)
The dynamic call graph populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.
## Overview
The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what *can* happen (type compatibility), the call graph captures what *is* happening or *has happened* (running calls, completed calls, failures, aborts).
The call graph is populated automatically by the call protocol — every `call.requested` adds a node, every `call.responded`/`call.error`/`call.aborted` updates its status. This means the call graph is always in sync with the actual state of in-flight calls.
Key capabilities:
- **Abort cascading** — abort a call → all children are automatically aborted via `parentRequestId` chains
- **Observability** — query what's running, what failed, what's blocked
- **DAG operations** — topological sort of running calls, cycle detection (shouldn't happen but verified), reachability queries
- **Serialization** — `export()`/`fromJSON()` for Postgres persistence
## Construction
### fromCallEvents()
```typescript
static fromCallEvents(events: CallEventMapValue[]): FlowGraph<CallNodeAttrs, CallEdgeAttrs>
```
Builds a call graph from an array of call protocol events. Events are processed in order:
1. **`call.requested`** → add a `CallNodeAttrs` node with `status: "pending"`. If `parentRequestId` is set, add a `triggered` edge from parent to child.
2. **`call.responded`** → update node status to `completed`, set `output` and `completedAt`
3. **`call.error`** → update node status to `failed`, set `error` and `completedAt`
4. **`call.aborted`** → update node status to `aborted`, set `completedAt`
5. **`call.completed`** → update node status to `completed`, set `completedAt` (if not already set by `call.responded`)
Processing is idempotent — processing the same event twice has no effect (the node already has the updated status).
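The event-folding steps above can be sketched as a reducer over plain `Map`s, without graphology. The `SketchEvent` and `SketchNode` shapes, the `timestamp` field, and the `foldCallEvents` name are assumptions made for illustration; the real `CallEventMapValue` type may differ.

```typescript
// Hypothetical, simplified shapes; not the real CallEventMapValue or CallNodeAttrs.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface SketchEvent {
  type: "call.requested" | "call.responded" | "call.error" | "call.aborted" | "call.completed";
  requestId: string;
  parentRequestId?: string;
  timestamp: string; // assumed field carrying the event time
  output?: unknown;
  error?: { code: string; message: string };
}

interface SketchNode {
  requestId: string;
  parentRequestId?: string;
  status: CallStatus;
  output?: unknown;
  error?: { code: string; message: string };
  completedAt?: string;
}

interface SketchGraph {
  nodes: Map<string, SketchNode>;
  edges: Map<string, { source: string; target: string; edgeType: "triggered" }>;
}

function foldCallEvents(events: SketchEvent[]): SketchGraph {
  const graph: SketchGraph = { nodes: new Map(), edges: new Map() };
  for (const ev of events) {
    if (ev.type === "call.requested") {
      if (graph.nodes.has(ev.requestId)) continue; // replayed request: no-op
      graph.nodes.set(ev.requestId, {
        requestId: ev.requestId,
        parentRequestId: ev.parentRequestId,
        status: "pending",
      });
      if (ev.parentRequestId !== undefined) {
        // triggered edge from parent to child, keyed `${parent}->${child}`
        graph.edges.set(`${ev.parentRequestId}->${ev.requestId}`, {
          source: ev.parentRequestId,
          target: ev.requestId,
          edgeType: "triggered",
        });
      }
      continue;
    }
    const node = graph.nodes.get(ev.requestId);
    if (node === undefined) continue; // unknown call: skipped in this sketch
    if (node.completedAt !== undefined) continue; // already terminal: replay is a no-op
    switch (ev.type) {
      case "call.responded":
        node.status = "completed";
        node.output = ev.output;
        break;
      case "call.error":
        node.status = "failed";
        node.error = ev.error;
        break;
      case "call.aborted":
        node.status = "aborted";
        break;
      case "call.completed":
        node.status = "completed";
        break;
    }
    node.completedAt = ev.timestamp;
  }
  return graph;
}
```

In this sketch the first terminal event wins: a later `call.completed` or a replayed `call.responded` leaves `status`, `output`, and `completedAt` untouched, which is one way to obtain the idempotence described above.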
### Incremental: updateFromEvent()
```typescript
updateFromEvent(event: CallEventMapValue): void
```
Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:
```typescript
const callGraph = new FlowGraph();
// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));
```
### fromJSON()
```typescript
static fromJSON(data: CallGraphSerialized): FlowGraph
```
Deserializes a call graph from graphology's native JSON format. Used for loading persisted call graphs from Postgres.
## Node Attributes
See [schema.md](schema.md#CallNodeAttrs) for the full schema definition.
| Field | Type | Set by |
|-------|------|--------|
| `requestId` | `string` | `call.requested` |
| `operationId` | `string` | `call.requested` |
| `status` | `CallStatus` | Updated by each call event |
| `parentRequestId` | `string?` | `call.requested` |
| `input` | `unknown` | `call.requested` |
| `output` | `unknown?` | `call.responded` |
| `error` | `{ code, message, details? }?` | `call.error` |
| `identity` | `Identity?` | `call.requested` |
| `startedAt` | `string?` | `call.requested` (when handler starts) |
| `completedAt` | `string?` | Terminal event (`responded`, `error`, `aborted`, or `completed` if not already set) |
The node key is `requestId`.
## Edges
Call graph edges carry an `edgeType` attribute:
| `edgeType` | Meaning | Added by |
|-----------|---------|----------|
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency — source needs target's result | Explicit declaration (not auto-populated) |
`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(source, target)` calls by the hub coordinator
### Edge Key Convention
`triggered` edges use `${parentRequestId}->${childRequestId}` as the edge key. `depends_on` edges use `${sourceRequestId}->${targetRequestId}:depends_on` to distinguish from `triggered` edges between the same pair.
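Because the graph is created with `multi: false`, graphology allows at most one directed edge per ordered (source, target) pair. A `triggered` edge and a `depends_on` edge can therefore coexist between two calls only when they point in opposite directions. The `:depends_on` suffix keeps the two keys distinct and deterministic.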
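The key convention above can be expressed as a small helper. The `edgeKey` name is hypothetical; only the two key formats come from the convention itself.

```typescript
// Hypothetical helper; only the key formats are taken from the convention above.
type EdgeType = "triggered" | "depends_on";

function edgeKey(edgeType: EdgeType, source: string, target: string): string {
  const base = `${source}->${target}`;
  // depends_on keys carry a suffix so they never collide with triggered keys
  return edgeType === "triggered" ? base : `${base}:depends_on`;
}

console.log(edgeKey("triggered", "req-1", "req-2")); // req-1->req-2
console.log(edgeKey("depends_on", "req-2", "req-1")); // req-2->req-1:depends_on
```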
## Status Lifecycle
Call node status transitions follow a strict state machine:
```
              call.requested
                     │
                     ▼
                ┌─────────┐
                │ pending │
                └────┬────┘
                     │ handler starts
                     ▼
                ┌─────────┐
         ┌──────┤ running ├──────┐
         │      └────┬────┘      │
   call.aborted      │     call.aborted
         │           │           │
         ▼           │           ▼
    ┌─────────┐      │      ┌─────────┐
    │ aborted │      │      │ aborted │
    └─────────┘      │      └─────────┘
         ┌───────────┼───────────┐
         │           │           │
  call.responded     │      call.error
         │           │           │
         ▼           │           ▼
   ┌───────────┐     │      ┌────────┐
   │ completed │     │      │ failed │
   └───────────┘     │      └────────┘
                     │ call.completed
                     ▼
               ┌───────────┐
               │ completed │
               └───────────┘
```
Invalid transitions (e.g., `completed` → `running`) throw `InvalidTransitionError`. The `updateStatus()` method validates the transition before applying it.
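A minimal sketch of such a validator follows, with the transition table read directly from the diagram. The `ALLOWED` table, the `assertTransition` name, and the error class body are assumptions; the diagram does not say whether `pending` may move straight to a terminal state, so the sketch does not allow it.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Allowed transitions as read from the diagram; terminal states have none.
const ALLOWED: Record<CallStatus, readonly CallStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

// Stand-in for the InvalidTransitionError named in the text.
class InvalidTransitionError extends Error {
  from: CallStatus;
  to: CallStatus;
  constructor(from: CallStatus, to: CallStatus) {
    super(`Invalid call status transition: ${from} -> ${to}`);
    this.name = "InvalidTransitionError";
    this.from = from;
    this.to = to;
  }
}

function assertTransition(from: CallStatus, to: CallStatus): void {
  // Assumption: re-applying the current status is a no-op, matching idempotent event processing.
  if (from === to) return;
  if (ALLOWED[from].indexOf(to) === -1) {
    throw new InvalidTransitionError(from, to);
  }
}

assertTransition("pending", "running"); // ok
assertTransition("running", "failed"); // ok
```

Keeping the table as data makes it easy to extend if, for example, aborting a `pending` call turns out to be a valid transition.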
## Abort Cascading
When a call is aborted, all of its children should also be aborted. The call protocol handles this via `call.aborted` events propagating through `parentRequestId` chains.
The call graph supports this with a traversal query:
```typescript
// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call
```
The hub coordinator can:
1. Receive `call.aborted` for a parent call
2. Query `callGraph.descendants(requestId)` for all descendants (children, grandchildren, and so on)
3. Abort each child call via `PendingRequestMap.abort()`
This is a structural operation — the graph provides the "who is affected" information, the protocol provides the "abort them" mechanism.
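The "who is affected" half can be sketched as a breadth-first walk over `triggered` edges. The `ChildIndex` adjacency map and the free-standing `descendants` function are assumptions for illustration, not the `FlowGraph` implementation.

```typescript
// Hypothetical adjacency: parent requestId -> child requestIds (the triggered edges).
type ChildIndex = Map<string, string[]>;

// Breadth-first collection of every call reachable from `root`, excluding `root` itself.
function descendants(children: ChildIndex, root: string): string[] {
  const result: string[] = [];
  const seen = new Set<string>([root]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const direct = children.get(current) ?? [];
    for (const child of direct) {
      if (seen.has(child)) continue; // guard against visiting a node twice
      seen.add(child);
      result.push(child);
      queue.push(child);
    }
  }
  return result;
}

const triggered: ChildIndex = new Map<string, string[]>([
  ["root", ["a", "b"]],
  ["a", ["a1"]],
]);

// A hub coordinator would abort each of these, e.g. via PendingRequestMap.abort().
for (const id of descendants(triggered, "root")) {
  console.log(`would abort ${id}`);
}
```

Breadth-first order visits calls closest to the aborted parent first. Whether the real cascade needs a particular order is not stated in this document.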
## Observability Queries
The call graph supports queries for observability without traversing the entire graph:
| Query | Method | Returns |
|-------|--------|---------|
| Get running calls | `filterByStatus("running")` | Node IDs with running status |
| Get failed calls | `filterByStatus("failed")` | Node IDs with failed status |
| Get top-level calls | `getRoots()` | Nodes with no `parentRequestId` |
| Get children of call | `children(requestId)` | Direct children via `triggered` edges |
| Get call duration | `duration(requestId)` | `completedAt - startedAt` (throws if not completed) |
| Get call lineage | `lineage(requestId)` | Ancestor chain from root to this call |
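As a rough illustration of the last two rows, `lineage` and `duration` can be derived from node attributes alone. The `LineageNode` shape, the choice to include the call itself in its lineage, and the millisecond return value are assumptions of this sketch.

```typescript
interface LineageNode {
  requestId: string;
  parentRequestId?: string;
  startedAt?: string; // ISO-8601 timestamps, as in the attribute table
  completedAt?: string;
}

// Walk parentRequestId pointers upward, then reverse to get root-first order.
function lineage(nodes: Map<string, LineageNode>, requestId: string): string[] {
  const chain: string[] = [];
  let current: LineageNode | undefined = nodes.get(requestId);
  while (current !== undefined) {
    chain.push(current.requestId);
    current =
      current.parentRequestId === undefined ? undefined : nodes.get(current.parentRequestId);
  }
  return chain.reverse();
}

// Duration in milliseconds; throws when the call has not completed.
function duration(nodes: Map<string, LineageNode>, requestId: string): number {
  const node = nodes.get(requestId);
  if (node === undefined || node.startedAt === undefined || node.completedAt === undefined) {
    throw new Error(`Call ${requestId} has no completed duration`);
  }
  return Date.parse(node.completedAt) - Date.parse(node.startedAt);
}

const calls = new Map<string, LineageNode>([
  ["root", { requestId: "root" }],
  ["mid", { requestId: "mid", parentRequestId: "root" }],
  [
    "leaf",
    {
      requestId: "leaf",
      parentRequestId: "mid",
      startedAt: "2026-05-19T11:00:00.000Z",
      completedAt: "2026-05-19T11:00:01.500Z",
    },
  ],
]);

console.log(lineage(calls, "leaf")); // [ 'root', 'mid', 'leaf' ]
console.log(duration(calls, "leaf")); // 1500
```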
### filterByStatus
```typescript
filterByStatus(status: CallStatus): string[]
```
Returns all node keys with the given status. Implemented as a filter over `graph.forEachNode()`. For small graphs (tens to hundreds of nodes), this is O(n) and fast. For very large graphs, a status index could be added as an optimization.
### getRoots
```typescript
getRoots(): string[]
```
Returns all nodes with `parentRequestId === undefined` (top-level calls). These are the entry points of call chains.
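Both queries are linear scans over node attributes. A minimal sketch over a plain `Map` (standing in for `graph.forEachNode()`; the `ScanNode` shape is an assumption) looks like this:

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical subset of CallNodeAttrs needed by the two queries.
interface ScanNode {
  status: CallStatus;
  parentRequestId?: string;
}

// O(n) scan: collect keys whose status matches.
function filterByStatus(nodes: Map<string, ScanNode>, status: CallStatus): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.status === status) keys.push(key);
  });
  return keys;
}

// O(n) scan: collect keys of calls without a parent.
function getRoots(nodes: Map<string, ScanNode>): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.parentRequestId === undefined) keys.push(key);
  });
  return keys;
}

const scanNodes = new Map<string, ScanNode>([
  ["r1", { status: "running" }],
  ["r2", { status: "failed", parentRequestId: "r1" }],
  ["r3", { status: "running", parentRequestId: "r1" }],
]);

console.log(filterByStatus(scanNodes, "running")); // [ 'r1', 'r3' ]
console.log(getRoots(scanNodes)); // [ 'r1' ]
```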
## Serialization and Persistence
```typescript
const data = callGraph.export(); // graphology native JSON
callGraph.toJSON(); // alias for export()
const restored = FlowGraph.fromJSON(data); // round-trip
```
The call graph's `export()`/`fromJSON()` boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations — it provides the serialized format, and the hub handles storage.
Payload fields (`input`, `output`, `error`) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md` for the payload handling strategy).
## Mutations
```typescript
// Add a call node (from call.requested event)
// If attrs.parentRequestId is set, also creates a triggered edge from parent to child
addCall(attrs: CallNodeAttrs): void
// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void
// Add a dependency edge (explicit, not auto-populated by call protocol)
// Creates an edge with edgeType: "depends_on"
addDependency(source: string, target: string): void
// Remove a call node and its edges
removeCall(requestId: string): void
// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
```
`addCall` is the primary entry point for populating the call graph from call events. When `attrs.parentRequestId` is present, it also creates a `triggered` edge from the parent to the new node. `updateStatus` validates the status transition before applying it. `addDependency` creates an explicit `depends_on` edge for a data dependency that the parent-child hierarchy does not capture; it validates that both endpoints exist and that the new edge would not create a cycle. `removeCall` removes the node together with all attached edges (graphology drops a node's edges when the node is dropped).
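The cycle check in `addDependency` can be sketched with a reachability test: adding `source -> target` closes a cycle exactly when `source` is already reachable from `target`. The `Adjacency` map and both function names are hypothetical; the actual implementation may rely on graphology utilities instead.

```typescript
// Hypothetical edge store: source requestId -> set of target requestIds (all edge types).
type Adjacency = Map<string, Set<string>>;

// True if `to` can be reached from `from` by following directed edges (iterative DFS).
function isReachable(adj: Adjacency, from: string, to: string): boolean {
  const stack: string[] = [from];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (current === to) return true;
    if (seen.has(current)) continue;
    seen.add(current);
    const targets = adj.get(current);
    if (targets !== undefined) {
      targets.forEach((next) => stack.push(next));
    }
  }
  return false;
}

// Adding source -> target creates a cycle when source is already reachable from target.
// A self-loop (source === target) is reported as a cycle as well.
function wouldCreateCycle(adj: Adjacency, source: string, target: string): boolean {
  return isReachable(adj, target, source);
}

const adjacency: Adjacency = new Map<string, Set<string>>([
  ["a", new Set(["b"])],
  ["b", new Set(["c"])],
]);

console.log(wouldCreateCycle(adjacency, "c", "a")); // true: a -> b -> c -> a
console.log(wouldCreateCycle(adjacency, "a", "c")); // false: only adds a shortcut
```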
## Constraints
- **DAG-only** — call graphs cannot have cycles. A call cannot be its own ancestor. `addCall` with a `parentRequestId` that would create a cycle throws `CycleError`.
- **Status transitions are validated** — invalid transitions throw `InvalidTransitionError`.
- **Node keys are `requestId`** — not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- **`parentRequestId` is both node attribute and edge** — denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated** — they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
- **Payload fields are stored as-is** — flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- **Small graph sizes** — call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.
## Open Questions
1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes — the call graph records what happened, not what should have happened. The node gets a `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.
2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.
3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.
4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make status queries O(1). The index would need to be updated on every `updateStatus()` call.
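To make question 4 concrete, here is one possible shape for the status index. The `StatusIndex` class and its method names are assumptions, not part of the current design. Looking up a status bucket is O(1), although copying the matches into an array is still proportional to the number of matches.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical index maintained alongside the graph.
class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();
  private readonly current = new Map<string, CallStatus>();

  // Call on every status change, including the initial "pending".
  set(requestId: string, status: CallStatus): void {
    const previous = this.current.get(requestId);
    if (previous !== undefined) {
      const oldBucket = this.byStatus.get(previous);
      if (oldBucket !== undefined) oldBucket.delete(requestId);
    }
    let bucket = this.byStatus.get(status);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(status, bucket);
    }
    bucket.add(requestId);
    this.current.set(requestId, status);
  }

  // Call when a node is removed from the graph.
  remove(requestId: string): void {
    const previous = this.current.get(requestId);
    if (previous === undefined) return;
    const bucket = this.byStatus.get(previous);
    if (bucket !== undefined) bucket.delete(requestId);
    this.current.delete(requestId);
  }

  filterByStatus(status: CallStatus): string[] {
    const bucket = this.byStatus.get(status);
    return bucket === undefined ? [] : Array.from(bucket);
  }
}

const index = new StatusIndex();
index.set("r1", "pending");
index.set("r2", "pending");
index.set("r1", "running");
console.log(index.filterByStatus("pending")); // [ 'r2' ]
console.log(index.filterByStatus("running")); // [ 'r1' ]
```

The cost is one extra bookkeeping step in `updateStatus()`, `addCall()`, and `removeCall()`, plus the risk of the index drifting from the graph if any mutation path bypasses it.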
## References
- Schema: [schema.md](schema.md) — `CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`, `EdgeType`
- Call protocol: `@alkdev/alkhub_ts/docs/architecture/call-graph.md`
- Call graph storage: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Taskgraph pattern: `@alkdev/taskgraph_ts/src/graph/construction.ts`