hub/docs/architecture/call-graph.md

---
status: draft
last_updated: 2026-05-22
---

# Call Protocol, Call Graph & Operation Graph

## Overview

The call protocol is the unified transport layer for all operation invocations. It provides a single event-based mechanism that works the same whether the call is local (in-process), remote (hub ↔ spoke over websocket), or streamed (subscription). The call graph and operation graph are built on top of it — and `@alkdev/flowgraph` provides the graph construction, analysis, and reactive execution primitives.

Websockets are the primary transport for hub-spoke communication, not SSE. SSE is half-duplex and requires polling for the reverse path; websockets give us bidirectional channels where hub → spoke dispatch and spoke → hub results flow through the same connection. The call protocol's `call ≡ subscribe` semantics map naturally: a websocket frame comes in, the protocol resolves or streams depending on the consumption pattern.

**Transport distinction**: WebSocket is the primary bidirectional transport for hub↔spoke and hub↔client-spoke communication. SSE support exists for compatibility (e.g., OpenAI proxy streams, legacy clients) but is not the preferred transport. A client (browser, CLI) that connects as a spoke gets full bidirectional communication over a single WebSocket — no SSE needed.

## call ≡ subscribe

At the protocol level, `call` and `subscribe` are the same thing with different consumption patterns:

- **`call`**: Publish `call.requested`, subscribe to `call.responded:{requestId}`, resolve on first response → `Promise<TOutput>`
- **`subscribe`**: Publish `call.requested`, subscribe to `call.responded:{requestId}`, yield each response → `AsyncIterable<TOutput>`

Both use the same event types, the same `requestId` correlation, and the same `PendingRequestMap`. The only difference is that `call` resolves after the first `call.responded` and unsubscribes, while `subscribe` stays open and yields each `call.responded` until `call.aborted` or `call.error`.

This means `call` is semantically `subscribe().next()` — a subscription that completes after one event.

**HTTP endpoint**: An HTTP `POST /api/{namespace}/{operation}` is just `call` over HTTP — publish a `call.requested`, wait for `call.responded`, return the output as JSON.

**WebSocket endpoint**: A websocket connection carries bidirectional call protocol events. The hub pushes `call.requested` to spoke runners; runners push `call.responded`/`call.error` back. Same protocol, different transport. This is the hub-spoke "rpc-mode": persistent connection, no polling, natural streaming support.

## Why We Keep the Call Protocol (Not Just the Graphs)

1. **SDD process requires it** — the coordinator models development workflows between agents using the call graph. When the architect calls the decomposer which calls the coordinator which spawns implementation specialists, that's a call graph. The call protocol is what populates it automatically.

2. **Abort cascading** — when a parent operation fails or is aborted, all child operations should be notified. The call protocol propagates `call.aborted` through `parentRequestId` chains. Without it, each coordination operation handles errors ad-hoc (e.g., `coord.spawn` chains 5 `registry.execute()` calls — if the 3rd fails, there's no structured abort of the first two or the pending 4th/5th).

3. **Observability** — seeing what operations called what, how long they took, what failed, is essential for debugging agent workflows. The call protocol auto-tracks calls via `PendingRequestMap`; the call graph is populated as a side effect.

4. **Unified error handling** — `mapError` + `InfrastructureErrors` + `errorSchemas` declaration gives structured, typed errors across all transports. Without it, each consumer invents its own error format.

5. **Transport flexibility** — the `TypedEventTarget` plug point means the same protocol works over in-process `EventTarget`, Redis channels, or websockets. The hub uses all three: in-process for local operations, Redis for cross-process events, websockets for spoke runner dispatch.

6. **Future Rust rewrite** — the API contract needs to be stable. The call protocol is a small, well-defined event contract. Building it now means the Rust rewrite has a spec to implement against.

## Call Event Types

All communication flows through typed events:

> The call event TypeBox schemas are defined in `@alkdev/operations` as `CallEventSchema`. The shape shown here is the current design; verify against the package source for any minor differences.

```ts
import { Type } from "@alkdev/typebox"

export const CallEventMap = {
  call: {
    requested: Type.Object({
      requestId: Type.String(),
      operationId: Type.String(),
      input: Type.Unknown(),
      parentRequestId: Type.Optional(Type.String()),
      deadline: Type.Optional(Type.Number()),
      identity: Type.Optional(Type.Object({
        id: Type.String(),
        scopes: Type.Array(Type.String()),
        resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String())))
      }))
    }),
    responded: Type.Object({
      requestId: Type.String(),
      output: Type.Unknown()  // ResponseEnvelope from @alkdev/operations
    }),
    completed: Type.Object({
      requestId: Type.String()
    }),
    aborted: Type.Object({
      requestId: Type.String()
    }),
    error: Type.Object({
      requestId: Type.String(),
      code: Type.String(),
      message: Type.String(),
      details: Type.Optional(Type.Unknown())
    })
  }
} as const
```

### Event Semantics

- **`call.requested`** — Initiates a call. Creates a call graph node (status: `pending`) and adds a `triggered` edge if `parentRequestId` is present.
- **`call.responded`** — Carries the call result. For one-shot calls, this is the terminal event that resolves the `Promise<ResponseEnvelope>`. The `output` field contains a `ResponseEnvelope` (with `data` and `meta` fields) from `@alkdev/operations`.
- **`call.completed`** — Terminal completion signal, idempotent if `call.responded` was already received. For subscriptions, fires after the last `call.responded` to signal stream end. For one-shot calls, the `PendingRequestMap` may emit `call.completed` as a separate event or as part of `call.responded` processing. In flowgraph, this event fills `completedAt` if it was not already set.
- **`call.aborted`** — Call was cancelled. Sets status to `aborted` and cascades to children.
- **`call.error`** — Call failed with an error. Sets status to `failed` and stores the error.

**Note on `@alkdev/flowgraph`**: The `CallEventMapValue` type in `@alkdev/flowgraph/schema` defines the union of these event types. Flowgraph's `FlowGraph.fromCallEvents()` and `updateFromEvent()` consume these events directly to populate the call graph. The `CallStatus` enum in flowgraph (`pending`, `running`, `completed`, `failed`, `aborted`) aligns with the statuses in the call protocol events.

**Note on ResponseEnvelope unwrapping**: The `call.responded` event carries `output` as a `ResponseEnvelope` (from `@alkdev/operations`). When feeding events to `@alkdev/flowgraph`, the hub **unwraps the envelope** before calling `updateFromEvent()` — `CallNodeAttrs.output` stores the `ResponseEnvelope.data` value (the actual result), not the full envelope. The `ResponseEnvelope.meta` is discarded at the call graph level (it's available in `PendingRequestMap` for the caller, but not persisted in the graph node). This means `call_graph_nodes.output` contains the unwrapped result data.

**Identity**: The `Identity` type represents the caller's security context. Derived from keypal's `ApiKeyMetadata` — `scopes` maps directly from keypal's global scopes, `resources` maps from keypal's resource-scoped permissions (key format: `"type:id"`, value: scope array). Passed through the call chain and checked by `CallHandler` against the operation's `AccessControl` definition. See operations.md for the `AccessControl` type.

**Request correlation**: Every call has a unique `requestId`. Nested calls include `parentRequestId` to track the call chain. Responses and errors are matched to requests by `requestId`.

## Error Model

The call protocol uses a **unified error model**: both infrastructure (protocol-level) and domain (operation-level) errors flow through the same `CallError` event. `CallError.code` is `string` — the distinction between infrastructure and domain codes is by convention, not by type.

### Infrastructure Error Codes

Reserved codes produced by `CallHandler` itself, before or after operation execution:

| Code | When | Schema |
|------|------|--------|
| `OPERATION_NOT_FOUND` | No operation matches `operationId` | `{ operationId: string }` |
| `ACCESS_DENIED` | Missing scopes | `{ requiredScopes?: string[] }` |
| `VALIDATION_ERROR` | Input fails `inputSchema` check | `{ errors: ValueError[] }` |
| `TIMEOUT` | Deadline exceeded | `{ deadline: number }` |
| `ABORTED` | Call cancelled | `{ reason?: string }` |
| `EXECUTION_ERROR` | Handler threw, no `errorSchemas` match | `{ message: string }` |
| `UNKNOWN_ERROR` | Non-Error thrown | `{ raw: string }` |

### Domain Error Propagation

Operations declare their possible errors via `errorSchemas` on `IOperationDefinition`. When a handler throws, `mapError` matches the thrown error against the declared schemas — falls back to `EXECUTION_ERROR` if no match.

**`errorSchemas` is the contract**: An operation's `errorSchemas` declaration is the contract between the operation and its callers about what errors it might produce. No `errorSchemas` = safe default with `EXECUTION_ERROR` wrapper.

## PendingRequestMap

Manages in-flight requests and provides the `call()` interface:

```ts
// From @alkdev/operations
import { PendingRequestMap } from "@alkdev/operations"

// Construction — takes optional EventTarget for pluggable transport
const prm = new PendingRequestMap({ eventTarget })

// Call protocol — call() returns Promise<ResponseEnvelope>
const envelope = await prm.call(operationId, input, { deadline, identity })
// envelope.data contains the result, envelope.meta contains source + timestamp

// Subscribe protocol — returns AsyncIterable<ResponseEnvelope>
const stream = prm.subscribe(operationId, input, { idleTimeout, identity })
for await (const envelope of stream) {
  // yield each response
}

// Resolving calls
prm.respond(requestId, output)  // output must be ResponseEnvelope
prm.emitError(requestId, code, message, details?)
prm.complete(requestId)
prm.abort(requestId)
```

**Key behaviors**:
- `call()` returns `Promise<ResponseEnvelope>` (not `Promise<unknown>`)
- `subscribe()` returns `AsyncIterable<ResponseEnvelope>`
- `respond()` requires `isResponseEnvelope(output)`
- Built-in deadline and idle timeout support
- Constructor takes optional `EventTarget` for pluggable transport

## CallHandler

Bridges pubsub events to `OperationRegistry.execute()`. Performs access control and error mapping:

```ts
import { buildCallHandler } from "@alkdev/operations"

const handler = buildCallHandler({ registry, eventTarget })
// subscribes to call.requested events
// checks access control (requiredScopes, resource permissions) against Identity
// executes via registry, dispatches call.responded on success
// maps errors via mapError, dispatches call.error
```

## Nested Call Wiring

Routing is an **env construction concern**, not a separate protocol layer. `buildEnv` is the single function that creates the `env`:

- **Direct mode**: `buildEnv({ registry, context })` → env functions call `registry.execute()` directly, return `Promise<ResponseEnvelope>`
- **Call protocol mode**: `PendingRequestMap` handles routing internally, `parentRequestId` is set via context

`buildEnv` no longer takes a `callMap` parameter. It sets `trusted: true` on nested context (bypasses access control for internal calls). Env functions return `Promise<ResponseEnvelope>`, not `Promise<unknown>`. Callers must use `unwrap(envelope)` or access `envelope.data` for the result.

**parentRequestId propagation**: Every nested call includes `parentRequestId` — enables call graph reconstruction and abort cascading.

## Operation Graph (Static)

Built once at startup from the `OperationRegistry`. Represents type-compatibility edges between operations. Implemented using `@alkdev/flowgraph`.

### Structure

```
Node = OperationNodeAttrs (namespace.name, type, inputSchema, outputSchema)
Edge = OperationEdgeAttrs (compatible: boolean, detail?, mismatches?)
Edge type = "typed" (from flowgraph EdgeType enum)
```

The operation graph is constructed via `FlowGraph.fromSpecs(specs)`, which takes an array of `OperationSpec` objects (derived from `OperationRegistry`) and:

1. Creates a node for each operation with `OperationNodeAttrs` attributes
2. Runs `buildTypeEdges(graph)` to create edges between operations whose output/input schemas are type-compatible
3. Throws `CycleError` if the resulting graph has cycles (DAG invariant)

### Type Compatibility

`typeCompat(outputSchema, inputSchema)` performs deep structural comparison of two TypeBox schemas. Returns:

- `{ compatible: true }` — output is a subtype of input
- `{ compatible: true, detail }` — compatible with notes (e.g., "output has extra fields")
- `{ compatible: false, mismatches: TypeMismatch[] }` — structural incompatibility
- `undefined` — one or both schemas are `unknown`/`any` (no meaningful check possible)

Edges where `compatible: false` are still added to the graph (with `compatible: false` and the mismatch details) so the graph is complete for observability, but the `compatible` attribute allows consumers to filter.

### Call Templates for SDD

The SDD process defines a natural workflow:

```
architect → architecture-reviewer → decomposer → coordinator → implementation-specialist → code-reviewer
```

This is a call template — a validated path through the operation graph that the coordinator can instantiate as a call graph at runtime.

**Current approach**: Hardcoded workflow sequences. See "What We Defer" below.

**Future approach**: `@alkdev/flowgraph` provides ujsx workflow composition components (`Operation`, `Sequential`, `Parallel`, `Conditional`, `Map`) that can define templates declaratively. The `GraphologyHostConfig` renders templates to a `DirectedGraph` for validation, and `ReactiveHostConfig` renders them to reactive `WorkflowNode` trees for execution. When we adopt template-based workflows, flowgraph provides the validation (`validateTemplate`), type-compatibility checking, and DAG enforcement out of the box.

### API Summary

```ts
import { FlowGraph } from "@alkdev/flowgraph/graph"
import { typeCompat, buildTypeEdges, topologicalOrder, validateGraph } from "@alkdev/flowgraph/analysis"

// Build operation graph from registered operations.
// Note: @alkdev/flowgraph's OperationSpec is a structural subset of
// @alkdev/operations' OperationSpec (it omits handler, accessControl, errorSchemas).
// The .map() transforms between the two types.
const opGraph = FlowGraph.fromSpecs(registry.list().map(spec => ({
  name: spec.name,
  namespace: spec.namespace,
  version: spec.version,
  type: spec.type,        // "query" | "mutation" | "subscription"
  inputSchema: spec.inputSchema,
  outputSchema: spec.outputSchema,
  description: spec.description,
})))

// Query
const compatResult = typeCompat(opA.outputSchema, opB.inputSchema)
const order = topologicalOrder(opGraph.graph)
const issues = validateGraph(opGraph.graph)

// Serialization
const data = opGraph.export()   // -> OperationGraphSerialized (graphology JSON format)
const restored = FlowGraph.fromJSON(data)  // validates schema + DAG invariant
```

## Call Graph (Dynamic)

Created at runtime for each workflow execution. Populated automatically by the call protocol — every `call.requested` adds a node, every `call.responded`/`call.error`/`call.aborted` updates its state and timestamp. Implemented using `@alkdev/flowgraph`.

### Structure

```
Node = CallNodeAttrs (requestId, operationId, status, input, output?, error?, identity?, parentRequestId?, startedAt?, completedAt?)
Edge type "triggered"  = execution hierarchy (parentRequestId → child call)
Edge type "depends_on" = data dependency (call A waits on call B's output)
```

> **EdgeType scoping**: `@alkdev/flowgraph` defines five edge types in its `EdgeType` enum: `triggered`, `depends_on`, `typed`, `sequential`, `conditional`. Not all apply to every graph type:
> - **Call graph**: `triggered` and `depends_on` (plus the storage-layer `requested_by`)
> - **Operation graph**: `typed` (type compatibility between operations)
> - **Template graph**: `sequential` and `conditional` (workflow composition via ujsx)
>
> This document focuses on call graph edge types. See the [flowgraph architecture docs](https://git.alk.dev/alkdev/flowgraph) for the full type definitions.

The call graph is populated by `FlowGraph.fromCallEvents(events)` or incrementally via `updateFromEvent(event)`. Each call protocol event maps directly to a graph mutation:

| Event | Graph Mutation |
|-------|---------------|
| `call.requested` | `addCall(attrs)` — creates node (status: `pending`) + `triggered` edge if `parentRequestId` present |
| `call.responded` | `updateCall(requestId, { status: "completed", output, completedAt })` |
| `call.completed` | `updateCall(requestId, { completedAt })` — idempotent if already responded, sets `completedAt` if missing |
| `call.error` | `updateCall(requestId, { status: "failed", error: { code, message, details? } })` |
| `call.aborted` | `updateStatus(requestId, "aborted")` + cascade to children |
| `call.running` | `updateStatus(requestId, "running")` — when the call starts executing (hub dispatches to handler) |

### Call Status State Machine

Flowgraph enforces valid status transitions via `updateStatus()`. The state machine is:

```
pending → running → completed
                  → failed
         → aborted
running → aborted
```

Terminal states (`completed`, `failed`, `aborted`) are immutable. `InvalidTransitionError` is thrown on invalid transitions. This matches the storage layer's `call_graph_nodes.status` enum.

### Abort Cascading

When a call is aborted, all of its children should also be aborted. Flowgraph provides two mechanisms:

1. **`triggered` edge traversal**: `children(requestId)` returns direct children via `triggered` edges. Full cascading uses `descendants(requestId)` for all descendants.
2. **`WorkflowReactiveRoot`**: For running workflow executions, the reactive engine provides `abortNode(nodeId)` and `abortAll()` with `FailurePolicy` configuration (`"continue-running"` vs `"abort-dependents"`).

The hub's `CallHandler` wires `call.aborted` events to:

- `updateStatus(requestId, "aborted")` on the call graph
- Pubsub event propagation so downstream `PendingRequestMap` instances also call `abort()` on their in-flight requests
- `WorkflowReactiveRoot.abortNode(nodeId)` if a workflow execution is tracking this call

### `depends_on` Edges

While `triggered` edges represent the parent-child execution hierarchy, `depends_on` edges represent data dependencies — a call that needs another call's output before it can proceed. These are created by the coordinator when orchestrating workflows:

```ts
callGraph.addDependency(sourceRequestId, targetRequestId)
// Adds a "depends_on" edge (source depends on target's output)
// Cycle-checked — throws CycleError if the edge would create a cycle
```

`depends_on` edges are not created by the call protocol itself. They are added by coordination logic that knows the data flow between calls (e.g., the coordinator knows that `coord.spawn` step 3 depends on step 1's output). This gives the observability layer a richer graph for analysis without changing the protocol.

### API Summary

```ts
import { FlowGraph } from "@alkdev/flowgraph/graph"
import { CallStatus } from "@alkdev/flowgraph/schema"

// Build call graph from events (e.g., after hub restart, reconstruct from DB)
const callGraph = FlowGraph.fromCallEvents(storedEvents)

// Or build incrementally as events arrive
const callGraph = new FlowGraph(CallNodeAttrs, CallEdgeAttrs)

// Process events
callGraph.updateFromEvent(event)       // handles all call.* event types

// Status management
callGraph.updateStatus(requestId, "running")  // validates state machine transition
callGraph.updateStatus(requestId, "completed") // throws if not currently "running"

// Edge management
callGraph.addCall({ requestId, operationId, status: "pending", parentRequestId?, input?, identity? })
callGraph.addDependency(sourceRequestId, targetRequestId)  // depends_on edge

// Queries
callGraph.children(requestId)           // direct children via triggered edges
callGraph.descendants(requestId)        // all descendants
callGraph.lineage(requestId)            // ancestor chain from root to this call
callGraph.getRoots()                     // calls with no parentRequestId
callGraph.filterByStatus("running")      // all running calls
callGraph.duration(requestId)           // completedAt - startedAt in ms

// Serialization (for Postgres persistence)
const data = callGraph.export()          // -> CallGraphSerialized
const restored = FlowGraph.fromJSON(data)
```

### Graph Size

At hub level, the call graph is small — just metadata nodes mapping to the actual process/call. Agent workflow call graphs will have tens of nodes at most for simple workflows, potentially hundreds for complex coordination with many parallel tasks. Performance is a non-issue for call-level metadata; flowgraph wraps graphology which handles thousands of nodes efficiently.

## Reactive Workflow Execution

For running workflow executions (not just observability), `@alkdev/flowgraph/reactive` provides `WorkflowReactiveRoot` — a signal-driven execution engine that:

1. Takes a `DirectedGraph` (from a ujsx template rendered via `GraphologyHostConfig`) and creates reactive state for every node
2. Processes call protocol events via `append(event)` — the event log is the source of truth, status/results are derived projections
3. Computes `preconditions` (all predecessors completed), `canStart` (preconditions met + not blocked by failure), and `blockedByFailure` (any predecessor failed/aborted) as reactive signals
4. Supports `FailurePolicy`: `"continue-running"` (only abort idle/waiting dependents) or `"abort-dependents"` (cascade abort to all non-terminal dependents)
5. Maps node keys to `requestId`s via `setRequestId(nodeKey, requestId)` — bridging template nodes to call protocol identifiers
6. Requires `dispose()` to release signal subscriptions

```ts
import { WorkflowReactiveRoot } from "@alkdev/flowgraph/reactive"

// WorkflowReactiveRoot takes the raw DirectedGraph (flowGraph.graph),
// not the FlowGraph wrapper
const workflowRoot = new WorkflowReactiveRoot(templateGraph.graph)
try {
  // Bridge template nodes to call protocol request IDs
  workflowRoot.setRequestId("step-1", requestId1)
  workflowRoot.setRequestId("step-2", requestId2)

  // Process events (appended by the hub's call protocol handler)
  workflowRoot.append(callRequestedEvent)
  workflowRoot.append(callRespondedEvent)

  // Query reactive state
  const status = workflowRoot.getStatus("step-1")    // NodeStatus
  const canStart = workflowRoot.canStart.get("step-2") // ReadonlySignal<boolean>
  const isComplete = workflowRoot.isComplete()         // all nodes terminal?
} finally {
  workflowRoot.dispose()
}
```

This is the execution engine for workflow-based coordination. The hub coordinator instantiates a `WorkflowReactiveRoot` for each running workflow, feeds it call protocol events, and uses its reactive state to determine what to do next (start the next step, handle failures, cascade aborts).

## Storage

Call graph nodes and edges are stored in Postgres. See `storage/call-graph.md` for the full schema definitions.

The storage layer persists individual `call_graph_nodes` and `call_graph_edges` rows. Flowgraph's `export()` produces graphology's native JSON format (`CallGraphSerialized`), which is suitable for snapshot/restore but not for incremental observability queries. The hub uses **both**:

- **Incremental storage**: Each call protocol event writes/updates a row in `call_graph_nodes` and creates `call_graph_edges` as needed. This supports real-time observability queries (what's running, what failed, what's blocked).
- **Reconstruction**: After a hub restart, the call graph can be reconstructed from stored events or from incremental rows using `FlowGraph.fromCallEvents()`.

### Write Path

The hub's `CallHandler` is responsible for writing call graph data to Postgres. When a call protocol event arrives:

1. **`call.requested`**: The `CallHandler` creates a row in `call_graph_nodes` (status: `pending`) and, if `parentRequestId` is present, a `triggered` edge in `call_graph_edges`. This write happens **synchronously before dispatching** to ensure the call is tracked even if the handler fails immediately.
2. **`call.responded`**: Updates the node's status to `completed`, sets `output` (unwrapped from the `ResponseEnvelope` — only `data` is stored, not `meta`), and sets `completedAt`.
3. **`call.error`**: Updates status to `failed`, sets `error`, and sets `completedAt`.
4. **`call.aborted`**: Updates status to `aborted` and sets `completedAt`. The hub then cascades the abort to child calls.
5. **`call.completed`**: Sets `completedAt` if not already set. Idempotent — no-op if the call is already `completed`.
6. **`call.running`**: Updates status from `pending` to `running` and sets `startedAt`.

Error handling: If a DB write fails, the call still proceeds (the handler has already been invoked). The hub logs the write failure and continues. Call graph data is best-effort — the in-memory flowgraph is the authoritative source for running calls; the DB is for persistence and observability.

### Identifier Mapping

The `call_graph_nodes` table uses two identifiers:

- **`id`** (UUID, from `commonCols`): Internal primary key, used as the FK target for `call_graph_edges`.
- **`requestId`** (text, UNIQUE): Protocol-level correlation key, used as the flowgraph node key.

When reconstructing a flowgraph from the database, the hub uses `requestId` as the node key (matching `CallNodeAttrs.requestId`). The `call_graph_edges` table uses `sourceId`/`targetId` referencing `call_graph_nodes.id` (the UUID), so reconstruction requires resolving UUIDs to requestIds. The `call_graph_nodes.requestId` column has a UNIQUE index, making this lookup efficient.

### `call_graph_nodes` — One row per call

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| requestId | text NOT NULL UNIQUE | Protocol-level correlation key. Also the flowgraph node key. |
| operationId | text | FK → operations.id. Nullable — survives operation removal. |
| parentRequestId | text | Denormalized parent — fast point lookup. Redundant with `triggered` edge. |
| identity | jsonb | Caller identity: `{ id, scopes, resources }` |
| callerAccountId | text | FK → accounts.id (ON DELETE SET NULL). System calls are nullable. |
| status | text NOT NULL | Matches `CallStatus` enum: `pending`, `running`, `completed`, `failed`, `aborted` |
| input | jsonb | Call input (redacted, truncated — see storage/call-graph.md) |
| output | jsonb | Call output (on success) |
| error | jsonb | `{ code, message, details? }` (on failure) |
| startedAt | timestamp with tz | When call was dispatched (maps to flowgraph `startedAt`) |
| completedAt | timestamp with tz | When call completed/failed/aborted (maps to flowgraph `completedAt`) |

### `call_graph_edges` — Typed directed edges between calls

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| sourceId | text NOT NULL | FK → call_graph_nodes.id (CASCADE) |
| targetId | text NOT NULL | FK → call_graph_nodes.id (CASCADE) |
| edgeType | text NOT NULL | `triggered`, `depends_on`, or `requested_by` |

**Edge type semantics**: `triggered` = execution hierarchy (parentRequestId), `depends_on` = data dependency, `requested_by` = identity/authorization chain. See storage/call-graph.md for details.

**Note on `depends_on` in flowgraph**: The flowgraph `CallEdgeAttrs` type is a union of `TriggeredEdgeAttrs` and `DependencyEdgeAttrs`, matching the `triggered` and `depends_on` edge types. The `requested_by` edge type is a storage-layer concept for identity tracing that doesn't have a corresponding flowgraph edge type — it's persisted in the database but not modeled in the in-memory graph.

## Transport Mapping

The call protocol is transport-agnostic. The `TypedEventTarget` plug point (same pattern as `RedisEventTarget` in the pubsub design) determines how events move:

| Transport | Use Case | `TypedEventTarget` impl |
|-----------|----------|------------------------|
| In-process | Local hub operations | Browser `EventTarget` (default) |
| Redis | Cross-process events (e.g., hub → all processes) | `RedisEventTarget` |
| WebSocket | Hub ↔ spoke bidirectional | `createWebSocketServerEventTarget` (hub) / `createWebSocketClientEventTarget` (spoke) from `@alkdev/pubsub` |

A `WebSocketEventTarget` implementing `TypedEventTarget` makes each spoke runner's websocket connection a live bidirectional channel. The hub dispatches `call.requested` over the socket; the runner sends `call.responded`/`call.error` back. Same protocol, same event shapes, same `PendingRequestMap` — just a different `eventTarget`.

## What We Defer

1. **Full ujsx call templates** — currently using hardcoded workflow sequences. `@alkdev/flowgraph/component` provides `Operation`, `Sequential`, `Parallel`, `Conditional`, `Map` components for declarative template definition, and `GraphologyHostConfig` + `ReactiveHostConfig` for rendering. We'll adopt these when workflow complexity justifies it.
2. **Graph visualization** — API only, no Sigma.js UI
3. **Stream deduplication** — `Value.Hash({operationId, input})` deduplication for multiple subscribers to the same stream
4. **`requested_by` edge creation in flowgraph** — the `requested_by` edge type is a storage-layer concept for identity tracing. It's persisted in `call_graph_edges` but not modeled in `@alkdev/flowgraph`'s `CallEdgeAttrs` union. We may add it to flowgraph in the future.

The call protocol itself, `PendingRequestMap`, `CallHandler`, `buildEnv` dual-mode, call graph auto-tracking, and reactive workflow execution are **in the initial implementation**. They're not much code and they prevent the need to bolt on ad-hoc error handling and abort logic in every coordination operation.

## Dependencies

```
@alkdev/flowgraph      # DAG construction, reactive execution, call/operation graphs, type-compat analysis
@alkdev/operations     # Call protocol, PendingRequestMap, CallHandler
@alkdev/pubsub         # Event transport (Redis, WebSocket, Worker)
@alkdev/taskgraph      # Task graph construction and analysis (for task management, not call graphs)
```

**Why both `@alkdev/flowgraph` and `@alkdev/taskgraph`?** `@alkdev/taskgraph` is a domain-specific library for task DAG construction with categorical estimates (scope, risk, impact), frontmatter parsing, and task-specific analysis (critical path, bottleneck detection, risk assessment). `@alkdev/flowgraph` is a general-purpose workflow graph library for call/operation DAGs with ujsx template composition and reactive execution. They both wrap graphology, but serve different domains. The hub uses `@alkdev/taskgraph` for task management and `@alkdev/flowgraph` for call graph and operation graph management.

## Prior Art

The call protocol was adapted from `ade_spoke`'s call protocol design (which was pubsub-agnostic). The key difference here is that websockets are the primary transport for hub-spoke communication rather than SSE. The call graph and operation graph are now implemented using `@alkdev/flowgraph` rather than raw graphology, which provides DAG enforcement, type-compatibility analysis, and reactive execution out of the box.