flowgraph/docs/architecture/call-graph.md

---
status: draft
last_updated: 2026-05-19
---

# Call Graph (Dynamic Runtime)

The dynamic call graph is populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.

## Overview

The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what *can* happen (type compatibility), the call graph captures what *is* happening or *has happened* (running calls, completed calls, failures, aborts).

The call graph is populated automatically by the call protocol: every `call.requested` adds a node, and every `call.responded`, `call.error`, `call.aborted`, or `call.completed` updates its status. As a result, the call graph stays in sync with the actual state of in-flight calls.

Key capabilities:
- **Abort cascading** — abort a call → all children are automatically aborted via `parentRequestId` chains
- **Observability** — query what's running, what failed, what's blocked
- **DAG operations** — topological sort of running calls, cycle detection (shouldn't happen but verified), reachability queries
- **Serialization** — `export()`/`fromJSON()` for Postgres persistence

## Construction

### fromCallEvents()

```typescript
static fromCallEvents(events: CallEventMapValue[]): FlowGraph<CallNodeAttrs, CallEdgeAttrs>
```

Builds a call graph from an array of call protocol events. Events are processed in order:

1. **`call.requested`** → add a `CallNodeAttrs` node with `status: "pending"`. If `parentRequestId` is set, add a `triggered` edge from parent to child.
2. **`call.responded`** → update node status to `completed`, set `output` and `completedAt`
3. **`call.error`** → update node status to `failed`, set `error` and `completedAt`
4. **`call.aborted`** → update node status to `aborted`, set `completedAt`
5. **`call.completed`** → update node status to `completed`, set `completedAt` (if not already set by `call.responded`)

Processing is idempotent — processing the same event twice has no effect (the node already has the updated status).
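
The processing order above can be sketched as a plain fold over events. This is a minimal illustration, not the actual `fromCallEvents()` implementation: the event and node shapes (`SketchCallEvent`, `SketchCallNode`, the `at` timestamp field) are simplified assumptions, and the sketch assumes the first terminal event wins and that events for unknown calls are ignored.

```typescript
// Hypothetical, simplified shapes; the real types are defined in schema.md
// and @alkdev/operations/src/call.ts and may differ.
type SketchStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface SketchCallNode {
  requestId: string;
  operationId: string;
  status: SketchStatus;
  parentRequestId?: string;
  completedAt?: string;
}

type SketchCallEvent =
  | { type: "call.requested"; requestId: string; operationId: string; parentRequestId?: string; at: string }
  | { type: "call.responded" | "call.error" | "call.aborted" | "call.completed"; requestId: string; at: string };

// Terminal event type -> resulting node status.
const TERMINAL_STATUS = {
  "call.responded": "completed",
  "call.error": "failed",
  "call.aborted": "aborted",
  "call.completed": "completed",
} as const;

function foldCallEvents(events: SketchCallEvent[]) {
  const nodes = new Map<string, SketchCallNode>();
  const triggered = new Set<string>(); // edge keys of the form "parent->child"

  for (const ev of events) {
    if (ev.type === "call.requested") {
      if (nodes.has(ev.requestId)) continue; // replayed request: no-op
      nodes.set(ev.requestId, {
        requestId: ev.requestId,
        operationId: ev.operationId,
        status: "pending",
        parentRequestId: ev.parentRequestId,
      });
      if (ev.parentRequestId !== undefined) {
        triggered.add(`${ev.parentRequestId}->${ev.requestId}`);
      }
      continue;
    }
    const node = nodes.get(ev.requestId);
    if (node === undefined) continue; // assumption: ignore events for unknown calls
    if (node.completedAt !== undefined) continue; // already terminal: replay is a no-op
    node.status = TERMINAL_STATUS[ev.type];
    node.completedAt = ev.at;
  }
  return { nodes, triggered };
}
```

Because terminal events are skipped once `completedAt` is set, feeding the same event list twice yields the same nodes and edges as feeding it once, which is the idempotency property described above.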

### Incremental: updateFromEvent()

```typescript
updateFromEvent(event: CallEventMapValue): void
```

Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:

```typescript
const callGraph = new FlowGraph();
// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));
```

### fromJSON()

```typescript
static fromJSON(data: CallGraphSerialized): FlowGraph
```

Deserializes a call graph from graphology's native JSON format. Used for loading persisted call graphs from Postgres.

## Node Attributes

See [schema.md](schema.md#CallNodeAttrs) for the full schema definition.

| Field | Type | Set by |
|-------|------|--------|
| `requestId` | `string` | `call.requested` |
| `operationId` | `string` | `call.requested` |
| `status` | `CallStatus` | Updated by each call event |
| `parentRequestId` | `string?` | `call.requested` |
| `input` | `unknown` | `call.requested` |
| `output` | `unknown?` | `call.responded` |
| `error` | `{ code, message, details? }?` | `call.error` |
| `identity` | `Identity?` | `call.requested` |
| `startedAt` | `string?` | `call.requested` (when handler starts) |
| `completedAt` | `string?` | Terminal event (`responded`, `error`, `aborted`, or `completed` if not already set) |

The node key is `requestId`.

## Edges

Call graph edges carry an `edgeType` attribute:

| `edgeType` | Meaning | Added by |
|-----------|---------|----------|
| `triggered` | Parent call caused child call to execute | `call.requested` with `parentRequestId` |
| `depends_on` | Data dependency — source needs target's result | Explicit declaration (not auto-populated) |

`depends_on` edges are not auto-populated by the call protocol. They represent data dependencies that aren't captured by the parent-child hierarchy. They may be added by:
- Workflow template instantiation (the template knows which steps depend on which)
- Explicit `addDependency(source, target)` calls by the hub coordinator

### Edge Key Convention

`triggered` edges use `${parentRequestId}->${childRequestId}` as the edge key. `depends_on` edges use `${sourceRequestId}->${targetRequestId}:depends_on` to distinguish from `triggered` edges between the same pair.

Since `multi: false`, there can be at most one `triggered` and one `depends_on` edge between the same pair. The edge key convention ensures deterministic keys.
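
As a small illustration of the key convention, the helper below derives the edge key from the edge type and the two request IDs. The function name `callEdgeKey` and the `SketchEdgeType` alias are hypothetical and only serve this example.

```typescript
// Hypothetical helper illustrating the edge key convention described above.
type SketchEdgeType = "triggered" | "depends_on";

function callEdgeKey(edgeType: SketchEdgeType, source: string, target: string): string {
  const base = `${source}->${target}`;
  // `triggered` edges use the bare pair; `depends_on` edges get a suffix
  // so the two keys never collide for the same pair.
  return edgeType === "triggered" ? base : `${base}:depends_on`;
}
```

Deriving the key from the endpoints means the same logical edge always maps to the same key, so re-adding it can be detected by key alone.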

## Status Lifecycle

Call node status transitions follow a strict state machine:

```
              call.requested
                   │
                   ▼
              ┌─────────┐
              │ pending │
              └────┬────┘
                   │
              handler starts
                   │
                   ▼
              ┌─────────┐
         ┌────│ running │────┐
         │    └────┬────┘    │
    call.aborted  │    call.aborted
         │        │         │
         ▼        │         ▼
   ┌─────────┐    │   ┌─────────┐
   │ aborted │    │   │ aborted │
   └─────────┘    │   └─────────┘
                  │
        ┌─────────┼─────────┐
        │         │         │
  call.responded   │    call.error
        │         │         │
        ▼         │         ▼
  ┌───────────┐   │   ┌────────┐
  │ completed │   │   │ failed │
  └───────────┘   │   └────────┘
                  │
           call.completed
                  │
                  ▼
            ┌───────────┐
            │ completed │
            └───────────┘
```

Invalid transitions (e.g., `completed` → `running`) throw `InvalidTransitionError`. The `updateStatus()` method validates the transition before applying it.
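
One way to express such a state machine is a transition table keyed by the current status. The sketch below is illustrative only. It assumes that terminal events may arrive while a node is still `pending` (the `fromCallEvents()` steps set terminal statuses without an explicit `running` event) and that re-applying the current status is a no-op, in line with idempotent processing. `SketchInvalidTransitionError` is a local stand-in, not the library's `InvalidTransitionError`.

```typescript
type LifecycleStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Assumed transition table; the authoritative rules live in updateStatus().
const ALLOWED_TRANSITIONS: Record<LifecycleStatus, readonly LifecycleStatus[]> = {
  pending: ["running", "completed", "failed", "aborted"],
  running: ["completed", "failed", "aborted"],
  completed: [], // terminal
  failed: [],    // terminal
  aborted: [],   // terminal
};

// Local stand-in for the library's error type.
class SketchInvalidTransitionError extends Error {
  constructor(from: LifecycleStatus, to: LifecycleStatus) {
    super(`invalid transition: ${from} -> ${to}`);
    this.name = "SketchInvalidTransitionError";
  }
}

function assertTransition(from: LifecycleStatus, to: LifecycleStatus): void {
  if (from === to) return; // assumption: re-applying the same status is a no-op
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new SketchInvalidTransitionError(from, to);
  }
}
```

Keeping the rules in a single table makes it easy to audit which transitions are legal and to adjust them if the real lifecycle differs from these assumptions.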

## Abort Cascading

When a call is aborted, all of its children should also be aborted. The call protocol handles this via `call.aborted` events propagating through `parentRequestId` chains.

The call graph supports this with a traversal query:

```typescript
// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call
```

The hub coordinator can:
1. Receive `call.aborted` for a parent call
2. Query `callGraph.descendants(requestId)` for all descendants (children, grandchildren, and so on)
3. Abort each child call via `PendingRequestMap.abort()`

This is a structural operation: the graph provides the "who is affected" information, and the protocol provides the "abort them" mechanism.
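
The three coordinator steps can be sketched with a breadth-first walk over parent-to-child links. This is an assumption-laden illustration: `collectDescendants` and `cascadeAbort` are hypothetical names, the adjacency map stands in for the graph's `triggered` edges, and the `abortCall` callback stands in for `PendingRequestMap.abort()`, whose real signature is not shown in this document.

```typescript
// childrenOf maps a requestId to the requestIds it triggered
// (a stand-in for following `triggered` edges in the call graph).
function collectDescendants(childrenOf: Map<string, string[]>, requestId: string): string[] {
  const seen = new Set<string>();
  const queue: string[] = [...(childrenOf.get(requestId) ?? [])];
  const result: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (seen.has(id)) continue; // guard against revisiting a node
    seen.add(id);
    result.push(id);
    queue.push(...(childrenOf.get(id) ?? []));
  }
  return result; // breadth-first order, excluding requestId itself
}

// abortCall is a hypothetical callback standing in for the protocol-level abort.
function cascadeAbort(
  childrenOf: Map<string, string[]>,
  requestId: string,
  abortCall: (id: string) => void,
): string[] {
  const affected = collectDescendants(childrenOf, requestId);
  affected.forEach((id) => abortCall(id));
  return affected;
}
```

The split mirrors the division of labor described above: the traversal answers "who is affected", while the callback performs the actual abort.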

## Observability Queries

The call graph exposes dedicated query methods for observability, so callers do not need to write their own traversals:

| Query | Method | Returns |
|-------|--------|---------|
| Get running calls | `filterByStatus("running")` | Node IDs with running status |
| Get failed calls | `filterByStatus("failed")` | Node IDs with failed status |
| Get top-level calls | `getRoots()` | Nodes with no `parentRequestId` |
| Get children of call | `children(requestId)` | Direct children via `triggered` edges |
| Get call duration | `duration(requestId)` | `completedAt - startedAt` (throws if not completed) |
| Get call lineage | `lineage(requestId)` | Ancestor chain from root to this call |
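
The `duration` and `lineage` rows can be sketched as follows. This is illustrative only: it assumes timestamps are ISO 8601 strings (the table types them as `string?`), that the duration is reported in milliseconds, and that the lineage includes the queried call itself as its last element. `TimedCall`, `callDurationMs`, and `callLineage` are hypothetical names.

```typescript
// Hypothetical subset of CallNodeAttrs used by this sketch.
interface TimedCall {
  requestId: string;
  parentRequestId?: string;
  startedAt?: string;   // assumed ISO 8601
  completedAt?: string; // assumed ISO 8601
}

function callDurationMs(call: TimedCall): number {
  if (call.startedAt === undefined || call.completedAt === undefined) {
    throw new Error(`call ${call.requestId} has not completed`);
  }
  // Timestamps are strings, so they must be parsed before subtracting.
  return Date.parse(call.completedAt) - Date.parse(call.startedAt);
}

function callLineage(calls: Map<string, TimedCall>, requestId: string): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current = calls.get(requestId);
  // Walk parentRequestId links upward until a root (or an unknown parent) is reached.
  while (current !== undefined && !seen.has(current.requestId)) {
    seen.add(current.requestId);
    chain.push(current.requestId);
    current = current.parentRequestId === undefined ? undefined : calls.get(current.parentRequestId);
  }
  return chain.reverse(); // root first, queried call last
}
```

Walking the `parentRequestId` attribute rather than edges is one use of the denormalized parent reference: a lineage lookup needs only point lookups by key.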

### filterByStatus

```typescript
filterByStatus(status: CallStatus): string[]
```

Returns all node keys with the given status. Implemented as a filter over `graph.forEachNode()`, so it is O(n) in the number of nodes. That is fast for small graphs (tens to hundreds of nodes); for very large graphs, a status index could be added as an optimization.

### getRoots

```typescript
getRoots(): string[]
```

Returns all nodes with `parentRequestId === undefined` (top-level calls). These are the entry points of call chains.

## Serialization and Persistence

```typescript
const data = callGraph.export();          // graphology native JSON
callGraph.toJSON();                       // alias for export()
const restored = FlowGraph.fromJSON(data); // round-trip
```

The call graph's `export()`/`fromJSON()` boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations — it provides the serialized format, and the hub handles storage.

Payload fields (`input`, `output`, `error`) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md` for the payload handling strategy).

## Mutations

```typescript
// Add a call node (from call.requested event)
// If attrs.parentRequestId is set, also creates a triggered edge from parent to child
addCall(attrs: CallNodeAttrs): void

// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void

// Add a dependency edge (explicit, not auto-populated by call protocol)
// Creates an edge with edgeType: "depends_on"
addDependency(source: string, target: string): void

// Remove a call node and its edges
removeCall(requestId: string): void

// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void
```

`addCall` is the primary entry point for populating the call graph from call events. When `attrs.parentRequestId` is present, it automatically creates a `triggered` edge from the parent to the new node. `updateStatus` validates the transition before applying it. `addDependency` creates an explicit `depends_on` edge for a data dependency not captured by the parent-child hierarchy; it validates that both endpoints exist and that the edge would not create a cycle. `removeCall` removes the node and all attached edges (graphology cascade).
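
The endpoint and cycle checks described for `addDependency` can be sketched with a reachability test: adding `source -> target` closes a cycle exactly when `target` can already reach `source`. The code below is a simplified illustration over a plain adjacency map, not the library's implementation; `wouldCreateCycle` and `addDependencySketch` are hypothetical names, and plain `Error`s stand in for the library's error types.

```typescript
// outEdges maps each known node to the set of nodes it points at
// (a stand-in for the graph's directed edges).
function wouldCreateCycle(outEdges: Map<string, Set<string>>, source: string, target: string): boolean {
  if (source === target) return true; // self-loop
  const stack: string[] = [target];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (id === source) return true; // target already reaches source
    if (seen.has(id)) continue;
    seen.add(id);
    outEdges.get(id)?.forEach((next) => stack.push(next));
  }
  return false;
}

function addDependencySketch(outEdges: Map<string, Set<string>>, source: string, target: string): void {
  const fromSource = outEdges.get(source);
  if (fromSource === undefined || !outEdges.has(target)) {
    throw new Error("both endpoints must exist"); // stand-in error
  }
  if (wouldCreateCycle(outEdges, source, target)) {
    throw new Error(`edge ${source}->${target} would create a cycle`); // stand-in for CycleError
  }
  fromSource.add(target);
}
```

Checking before mutating keeps the graph a DAG at all times, so later topological sorts never have to handle a cycle.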

## Constraints

- **DAG-only** — call graphs cannot have cycles. A call cannot be its own ancestor. `addCall` with a `parentRequestId` that would create a cycle throws `CycleError`.
- **Status transitions are validated** — invalid transitions throw `InvalidTransitionError`.
- **Node keys are `requestId`** — not `operationId`. Multiple calls to the same operation have different `requestId`s but the same `operationId`.
- **`parentRequestId` is both node attribute and edge** — denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- **`depends_on` edges are not auto-populated** — they represent data dependencies that the call protocol doesn't capture. They must be added explicitly by the hub coordinator or workflow template instantiation.
- **Payload fields are stored as-is** — flowgraph doesn't truncate or redact `input`, `output`, or `error`. That's the hub's responsibility at the persistence boundary.
- **Small graph sizes** — call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.

## Open Questions

1. **Should the call graph support `call.requested` events with unknown `operationId`?** If a `call.requested` event references an operation not in the registry, should the node be created with `operationId` set to the unknown value? Yes — the call graph records what happened, not what should have happened. The node gets a `status: "pending"` and may later transition to `"failed"` with an `OPERATION_NOT_FOUND` error code.

2. **Should `depends_on` edges be auto-populated from workflow templates?** When a call graph is instantiated from a workflow template, the template's sequential/parallel structure implies data dependencies. Should the template instantiation automatically create `depends_on` edges? This would couple the call graph to the template system, which may not always be desirable.

3. **Should the call graph support multiple graphs simultaneously (one per workflow execution)?** Currently the design assumes one call graph per `FlowGraph` instance. If the hub needs to track multiple concurrent workflows, it would use multiple instances. An alternative is a single graph with workflow-scoped subgraphs.

4. **Should `filterByStatus` use an index?** For small graphs (tens of nodes), a simple filter is fast. For very large graphs, maintaining a `Map<CallStatus, Set<string>>` index would make the status lookup O(1) instead of an O(n) scan (returning the k matching keys is still O(k)). The index would need to be updated on every `updateStatus()` call, as well as when calls are added or removed.
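
To make the trade-off in question 4 concrete, here is a minimal sketch of such a status index. It is not part of the current design: `StatusIndex` and its method names are hypothetical, and the sketch assumes the owning graph calls `set()` on every add or status change and `remove()` on every node removal.

```typescript
type IndexedStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical secondary index: status -> set of requestIds.
class StatusIndex {
  private readonly byStatus = new Map<IndexedStatus, Set<string>>();
  private readonly statusOf = new Map<string, IndexedStatus>();

  // Record a new call or move an existing call to another status.
  set(requestId: string, next: IndexedStatus): void {
    this.remove(requestId);
    let bucket = this.byStatus.get(next);
    if (bucket === undefined) {
      bucket = new Set<string>();
      this.byStatus.set(next, bucket);
    }
    bucket.add(requestId);
    this.statusOf.set(requestId, next);
  }

  // Drop a call from the index (no-op if it was never indexed).
  remove(requestId: string): void {
    const prev = this.statusOf.get(requestId);
    if (prev === undefined) return;
    this.byStatus.get(prev)?.delete(requestId);
    this.statusOf.delete(requestId);
  }

  // O(1) bucket lookup, then O(k) to copy out the k matching keys.
  filterByStatus(wanted: IndexedStatus): string[] {
    const bucket = this.byStatus.get(wanted);
    return bucket === undefined ? [] : Array.from(bucket);
  }
}
```

The cost is a second structure that must stay consistent with the node attributes, which is the extra bookkeeping the question weighs against the O(n) scan.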

## References

- Schema: [schema.md](schema.md) — `CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`, `EdgeType`
- Call protocol: `@alkdev/alkhub_ts/docs/architecture/call-graph.md`
- Call graph storage: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Taskgraph pattern: `@alkdev/taskgraph_ts/src/graph/construction.ts`