status	last_updated
draft	2026-05-22

Call Graph (Dynamic Runtime)

The dynamic call graph populated at runtime from call events. Nodes are call invocations with status and timestamps; edges are parent-child and dependency relationships.

Overview

The call graph is the runtime counterpart to the operation graph. Where the operation graph captures what can happen (type compatibility), the call graph captures what is happening or has happened (running calls, completed calls, failures, aborts).

The call graph is populated automatically by the call protocol: every call.requested adds a node, and every call.responded, call.error, call.aborted, or call.completed updates its status. This keeps the call graph in sync with the actual state of in-flight calls.

Key capabilities:

- Abort cascading: aborting a call automatically aborts all of its descendants via parentRequestId chains
- Observability: query what's running, what failed, what's blocked
- DAG operations: topological sort of running calls, cycle detection (cycles should not occur, but this is verified), reachability queries
- Serialization: export()/fromJSON() for Postgres persistence

Construction

fromCallEvents()

static fromCallEvents(events: CallEventMapValue[]): FlowGraph<CallNodeAttrs, CallEdgeAttrs>

Builds a call graph from an array of call protocol events. Events are processed in order:

- call.requested → add a CallNodeAttrs node with status: "pending". If parentRequestId is set, add a triggered edge from parent to child.
- call.responded → update node status to completed, set output and completedAt
- call.error → update node status to failed, set error and completedAt
- call.aborted → update node status to aborted, set completedAt
- call.completed → update node status to completed, set completedAt (if not already set by call.responded)

Processing is idempotent — processing the same event twice has no effect (the node already has the updated status).
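The event handling above can be sketched as a fold over the event list. This is a minimal illustration, not the FlowGraph implementation: the `SketchEvent` and `SketchNode` shapes, the `timestamp` field, and the helper names `applyEvent` and `foldEvents` are assumptions, and a plain `Map` stands in for the graph. The sketch also assumes that events arriving after a node has reached a terminal status are ignored, which is one way to get the idempotent replay described above.

```typescript
// Hypothetical shapes for illustration; the real CallEventMapValue and
// CallNodeAttrs types may differ.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface SketchEvent {
  type: "call.requested" | "call.responded" | "call.error" | "call.aborted" | "call.completed";
  requestId: string;
  operationId?: string;
  parentRequestId?: string;
  output?: unknown;
  error?: { code: string; message: string };
  timestamp: string; // assumed field carrying the event time
}

interface SketchNode {
  requestId: string;
  operationId: string;
  status: CallStatus;
  parentRequestId?: string;
  output?: unknown;
  error?: { code: string; message: string };
  completedAt?: string;
}

interface SketchGraph {
  nodes: Map<string, SketchNode>;
  triggered: Set<string>; // edge keys of the form `${parent}->${child}`
}

const TERMINAL: CallStatus[] = ["completed", "failed", "aborted"];

function applyEvent(graph: SketchGraph, event: SketchEvent): void {
  const existing = graph.nodes.get(event.requestId);
  if (event.type === "call.requested") {
    if (existing) return; // replayed event: the node is already present
    graph.nodes.set(event.requestId, {
      requestId: event.requestId,
      operationId: event.operationId ?? "unknown",
      status: "pending",
      parentRequestId: event.parentRequestId,
    });
    if (event.parentRequestId !== undefined) {
      graph.triggered.add(`${event.parentRequestId}->${event.requestId}`);
    }
    return;
  }
  // Assumption: status events for unknown nodes, or for nodes that are
  // already terminal, are ignored rather than rejected.
  if (!existing || TERMINAL.indexOf(existing.status) !== -1) return;
  switch (event.type) {
    case "call.responded":
      existing.status = "completed";
      existing.output = event.output;
      break;
    case "call.error":
      existing.status = "failed";
      existing.error = event.error;
      break;
    case "call.aborted":
      existing.status = "aborted";
      break;
    case "call.completed":
      existing.status = "completed";
      break;
  }
  existing.completedAt = event.timestamp;
}

function foldEvents(events: SketchEvent[]): SketchGraph {
  const graph: SketchGraph = { nodes: new Map(), triggered: new Set() };
  events.forEach((e) => applyEvent(graph, e));
  return graph;
}
```

Replaying a `call.responded` event, or receiving `call.completed` after it, leaves the node's `completedAt` at the value set by the first terminal event.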

Incremental: updateFromEvent()

updateFromEvent(event: CallEventMapValue): void

Updates an existing call graph with a single call event. This is the primary interface for real-time graph population:

const callGraph = new FlowGraph();
// Subscribe to call protocol events
pubsub.subscribe("call.requested", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.responded", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.error", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.aborted", (event) => callGraph.updateFromEvent(event));
pubsub.subscribe("call.completed", (event) => callGraph.updateFromEvent(event));

fromJSON()

static fromJSON(data: CallGraphSerialized): FlowGraph

Deserializes a call graph from graphology's native JSON format. Used for loading persisted call graphs from Postgres.

Node Attributes

See schema.md for the full schema definition.

Field	Type	Set by
`requestId`	`string`	`call.requested`
`operationId`	`string`	`call.requested`
`status`	`CallStatus`	Updated by each call event
`parentRequestId`	`string?`	`call.requested`
`input`	`unknown`	`call.requested`
`output`	`unknown?`	`call.responded`
`error`	`{ code, message, details? }?`	`call.error`
`identity`	`Identity?`	`call.requested`
`startedAt`	`string?`	`call.requested` (when handler starts)
`completedAt`	`string?`	Terminal event (`responded`, `error`, `aborted`, `completed`)

The node key is requestId.

Edges

Call graph edges carry an edgeType attribute:

`edgeType`	Meaning	Added by
`triggered`	Parent call caused child call to execute	`call.requested` with `parentRequestId`
`depends_on`	Data dependency — source needs target's result	Explicit declaration (not auto-populated)

depends_on edges represent data dependencies between calls. Per ADR-005, the reactive engine does NOT use depends_on edges for data flow — data flows through the result projection (getResult()). However, depends_on edges remain in the API for observability and visualization: they annotate which calls depended on which other calls' results, providing a data-flow overlay on top of the call hierarchy. Hub coordinators or external tools may add depends_on edges to annotate observed data flow for debugging, monitoring, or call-graph visualization. They do NOT affect execution.

Edge Key Convention

triggered edges use ${parentRequestId}->${childRequestId} as the edge key. depends_on edges use ${sourceRequestId}->${targetRequestId}:depends_on to distinguish from triggered edges between the same pair.

This composite key format is necessary because multi: false allows at most one edge per key between a given (source, target) pair. Since a call graph can have both a triggered edge (parent→child) and a depends_on edge (data dependency) between the same pair of calls, the edge type suffix in the key disambiguates them. See schema.md#edge-key-convention for the general key convention and the discussion of multi-edge support.
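The key convention can be captured in a small helper. The function name `edgeKey` is hypothetical and is not part of the FlowGraph API; it only illustrates the two key formats described above.

```typescript
type EdgeType = "triggered" | "depends_on";

// Builds an edge key following the convention described above:
// triggered edges use the bare pair, depends_on edges add a type suffix.
function edgeKey(edgeType: EdgeType, source: string, target: string): string {
  const base = `${source}->${target}`;
  return edgeType === "triggered" ? base : `${base}:${edgeType}`;
}

const triggeredKey = edgeKey("triggered", "req-1", "req-2");   // "req-1->req-2"
const dependsOnKey = edgeKey("depends_on", "req-1", "req-2");  // "req-1->req-2:depends_on"
```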

Status Lifecycle

Call node status transitions follow a strict state machine:

              call.requested
                   │
                   ▼
              ┌─────────┐
              │ pending │
              └────┬────┘
                   │
              handler starts
                   │
                   ▼
              ┌─────────┐
         ┌────│ running │────┐
         │    └────┬────┘    │
    call.aborted  │    call.aborted
         │        │         │
         ▼        │         ▼
   ┌─────────┐    │   ┌─────────┐
   │ aborted │    │   │ aborted │
   └─────────┘    │   └─────────┘
                  │
        ┌─────────┼─────────┐
        │         │         │
  call.responded   │    call.error
        │         │         │
        ▼         │         ▼
  ┌───────────┐   │   ┌────────┐
  │ completed │   │   │ failed │
  └───────────┘   │   └────────┘
                  │
           call.completed
                  │
                  ▼
            ┌───────────┐
            │ completed │
            └───────────┘

Invalid transitions (e.g., completed → running) throw InvalidTransitionError. The updateStatus() method validates the transition before applying it.
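A transition check along these lines could back updateStatus(). This is a sketch under assumptions: the `ALLOWED` table is read off the diagram above, the function name `assertTransition` is hypothetical, and same-status updates are treated as a no-op to match the idempotent event processing described earlier. The diagram does not show a direct path from pending to a terminal status, so this sketch rejects one; the actual implementation may be more permissive.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

class InvalidTransitionError extends Error {
  readonly from: CallStatus;
  readonly to: CallStatus;
  constructor(from: CallStatus, to: CallStatus) {
    super(`Invalid status transition: ${from} -> ${to}`);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, InvalidTransitionError.prototype);
    this.name = "InvalidTransitionError";
    this.from = from;
    this.to = to;
  }
}

// Assumed transition table, derived from the diagram above.
const ALLOWED: Record<CallStatus, CallStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

function assertTransition(from: CallStatus, to: CallStatus): void {
  if (from === to) return; // repeated event: treated as a no-op
  if (ALLOWED[from].indexOf(to) === -1) {
    throw new InvalidTransitionError(from, to);
  }
}
```

Keeping the table as data makes it straightforward to adjust if, for example, aborting a pending call turns out to be a valid transition.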

Abort Cascading

When a call is aborted, all of its descendants should also be aborted. The call protocol handles this via call.aborted events propagating through parentRequestId chains.

The call graph supports this with a traversal query:

// Abort cascade: get all descendants of a call
const descendants = callGraph.descendants(requestId);
// → all calls that would be affected by aborting this call

The hub coordinator can:

1. Receive call.aborted for a parent call
2. Query callGraph.descendants(requestId) for all descendants
3. Abort each descendant call via PendingRequestMap.abort()

This is a structural operation — the graph provides the "who is affected" information, the protocol provides the "abort them" mechanism.
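The coordinator steps above can be sketched as a breadth-first walk over triggered edges. The `ChildIndex` map, `collectDescendants`, and `cascadeAbort` are hypothetical names used for illustration; the `abort` callback stands in for PendingRequestMap.abort(), whose exact signature is not shown in this document.

```typescript
// Maps each requestId to the requestIds of its direct children
// (the targets of its outgoing triggered edges).
type ChildIndex = Map<string, string[]>;

// Breadth-first collection of every call below rootId, excluding rootId itself.
function collectDescendants(childIndex: ChildIndex, rootId: string): string[] {
  const result: string[] = [];
  const seen = new Set<string>([rootId]);
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const child of childIndex.get(current) ?? []) {
      if (seen.has(child)) continue;
      seen.add(child);
      result.push(child);
      queue.push(child);
    }
  }
  return result;
}

// Invokes the supplied abort callback once per descendant and returns the
// affected requestIds in the order they were visited.
function cascadeAbort(
  childIndex: ChildIndex,
  rootId: string,
  abort: (requestId: string) => void,
): string[] {
  const affected = collectDescendants(childIndex, rootId);
  affected.forEach((id) => abort(id));
  return affected;
}
```

The graph only answers which calls are affected; the callback decides how each one is aborted, which mirrors the split between structure and protocol described above.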

Observability Queries

The call graph exposes the following observability queries:

Query	Method	Returns
Get running calls	`filterByStatus("running")`	Node IDs with running status
Get failed calls	`filterByStatus("failed")`	Node IDs with failed status
Get top-level calls	`getRoots()`	Nodes with no `parentRequestId`
Get children of call	`children(requestId)`	Direct children via `triggered` edges
Get call duration	`duration(requestId)`	`completedAt - startedAt` (throws if not completed)
Get call lineage	`lineage(requestId)`	Ancestor chain from root to this call
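The lineage and duration queries in the table can be sketched over a plain map of nodes. This is an illustration under assumptions: the `TimedCall` shape and the helper names `lineageOf` and `durationMs` are hypothetical, timestamps are assumed to be ISO 8601 strings, and the duration is assumed to be reported in milliseconds.

```typescript
interface TimedCall {
  requestId: string;
  parentRequestId?: string;
  startedAt?: string;   // assumed ISO 8601
  completedAt?: string; // assumed ISO 8601
}

// Walks parentRequestId links upward, then reverses so the root comes first.
// Relies on the parent chain being acyclic, as the DAG constraint requires.
function lineageOf(calls: Map<string, TimedCall>, requestId: string): string[] {
  const chain: string[] = [];
  let cursor: TimedCall | undefined = calls.get(requestId);
  while (cursor !== undefined) {
    chain.push(cursor.requestId);
    cursor = cursor.parentRequestId !== undefined
      ? calls.get(cursor.parentRequestId)
      : undefined;
  }
  return chain.reverse();
}

// Returns completedAt - startedAt in milliseconds; throws if either is missing.
function durationMs(calls: Map<string, TimedCall>, requestId: string): number {
  const call = calls.get(requestId);
  if (!call || call.startedAt === undefined || call.completedAt === undefined) {
    throw new Error(`Call ${requestId} has not completed`);
  }
  return Date.parse(call.completedAt) - Date.parse(call.startedAt);
}
```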

filterByStatus

filterByStatus(status: CallStatus): string[]

Returns all node keys with the given status. Implemented as a filter over graph.forEachNode(). This is O(n) in the number of nodes, which is fast for small graphs (tens to hundreds of nodes). For very large graphs, a status index could be added as an optimization.

getRoots

getRoots(): string[]

Returns all nodes with parentRequestId === undefined (top-level calls). These are the entry points of call chains.
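Both filterByStatus and getRoots reduce to a single pass over node attributes. The sketch below uses a plain `Map` in place of the graph, and the names `StatusNode`, `filterKeysByStatus`, and `rootKeys` are hypothetical stand-ins rather than the FlowGraph methods themselves.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

interface StatusNode {
  status: CallStatus;
  parentRequestId?: string;
}

// O(n) scan: collects the keys of all nodes whose status matches.
function filterKeysByStatus(nodes: Map<string, StatusNode>, wanted: CallStatus): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.status === wanted) keys.push(key);
  });
  return keys;
}

// O(n) scan: collects the keys of all nodes without a parentRequestId.
function rootKeys(nodes: Map<string, StatusNode>): string[] {
  const keys: string[] = [];
  nodes.forEach((attrs, key) => {
    if (attrs.parentRequestId === undefined) keys.push(key);
  });
  return keys;
}
```

Because a `Map` iterates in insertion order, both helpers return keys in the order the calls were added.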

Serialization and Persistence

const data = callGraph.export();          // graphology native JSON
callGraph.toJSON();                       // alias for export()
const restored = FlowGraph.fromJSON(data); // round-trip

The call graph's export()/fromJSON() boundary is designed for Postgres persistence via the hub's storage layer. Flowgraph does not handle database operations — it provides the serialized format, and the hub handles storage.

Payload fields (input, output, error) are stored as-is in the graph. The hub's storage layer is responsible for truncation and redaction (see @alkdev/alkhub_ts/docs/architecture/storage/call-graph.md for the payload handling strategy).

Mutations

// Add a call node (from call.requested event)
// If attrs.parentRequestId is set, also creates a triggered edge from parent to child
addCall(attrs: CallNodeAttrs): void

// Update call status (from call.responded/error/aborted/completed event)
updateStatus(requestId: string, status: CallStatus, extra?: Partial<CallNodeAttrs>): void

// Add a dependency edge (explicit, not auto-populated by call protocol)
// Creates an edge with edgeType: "depends_on"
addDependency(source: string, target: string): void

// Remove a call node and its edges
removeCall(requestId: string): void

// Update call attributes (partial merge)
updateCall(requestId: string, attrs: Partial<CallNodeAttrs>): void

addCall is the primary entry point for populating the call graph from call events. When attrs.parentRequestId is present, it automatically creates a triggered edge from the parent to the new node. addDependency creates explicit depends_on edges that represent data dependencies not captured by the parent-child hierarchy. updateStatus validates the transition. addDependency validates that both endpoints exist and that the edge would not create a cycle. removeCall removes the node and all attached edges (graphology cascade).
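The cycle check that addDependency performs can be sketched with a reachability test: adding an edge from source to target closes a cycle exactly when source is already reachable from target. The `Adjacency` map, `isReachable`, and `addDependencyChecked` are hypothetical names, and the sketch assumes the adjacency map holds every edge that counts toward the DAG constraint.

```typescript
// Adjacency list: node key -> keys of nodes it has an outgoing edge to.
type Adjacency = Map<string, Set<string>>;

class CycleError extends Error {
  constructor(source: string, target: string) {
    super(`Edge ${source} -> ${target} would create a cycle`);
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, CycleError.prototype);
    this.name = "CycleError";
  }
}

// Iterative depth-first search from `from`, looking for `to`.
function isReachable(adj: Adjacency, from: string, to: string): boolean {
  const seen = new Set<string>();
  const stack: string[] = [from];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (current === to) return true;
    if (seen.has(current)) continue;
    seen.add(current);
    const next = adj.get(current);
    if (next) next.forEach((n) => stack.push(n));
  }
  return false;
}

function addDependencyChecked(adj: Adjacency, source: string, target: string): void {
  const outgoing = adj.get(source);
  if (outgoing === undefined || !adj.has(target)) {
    throw new Error("Both endpoints must exist before adding a dependency");
  }
  if (source === target || isReachable(adj, target, source)) {
    throw new CycleError(source, target);
  }
  outgoing.add(target);
}
```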

Constraints

- DAG-only: call graphs cannot have cycles. A call cannot be its own ancestor. addCall with a parentRequestId that would create a cycle throws CycleError.
- Status transitions are validated: invalid transitions throw InvalidTransitionError.
- Node keys are requestId, not operationId. Multiple calls to the same operation have different requestIds but the same operationId.
- parentRequestId is both node attribute and edge: denormalized for fast point lookups (node attribute) and traversal queries (edge), following the storage schema pattern.
- depends_on edges are for observability, not execution: per ADR-005, data dependencies flow through the result projection. depends_on edges annotate observed data flow for visualization and debugging. The reactive engine does NOT use them for scheduling or precondition computation. They may be added by hub coordinators or external tools to document which calls depended on which other calls' results.
- Payload fields are stored as-is: flowgraph doesn't truncate or redact input, output, or error. That's the hub's responsibility at the persistence boundary.
- Small graph sizes: call graphs at hub level are typically tens of nodes. Performance is a non-issue; O(n) traversals are fine.

Open Questions

~~Should the call graph support call.requested events with unknown operationId?~~ Resolved (OQ-014): Yes — the call graph records what happened, not what should have happened. Nodes with unknown operationId get status: "pending" and may later transition to "failed" with an OPERATION_NOT_FOUND error code. This is consistent with the error-handling doc's existing statement about unknown operationId. The behavior should be documented explicitly in the fromCallEvents() specification: when a call.requested event references an operationId not in the registry, the node is still created with status: "pending" and the given operationId. This enables the call graph to serve as a complete audit trail regardless of registry state.
~~Should depends_on edges be auto-populated from workflow templates?~~ Resolved (OQ-008/ADR-005): No. depends_on edges are not needed for execution, so nothing auto-populates them. Data dependencies are expressed through the result projection: if node B needs node A's output, B reads getResult("A") from the result projection. The temporal ordering (A before B) is already expressed by template edges, and the event log is the data transport. depends_on edges remain available only as explicit observability annotations (see Edges).
~~Should the call graph support multiple graphs simultaneously?~~ Resolved (OQ-015): No — one FlowGraph instance per graph. Multiple concurrent workflows use multiple instances. This design is simpler and matches graphology's model. Subgraphs would require a scoping mechanism and cross-scope queries that add complexity without benefit at current scale. The hub coordinator creates one WorkflowReactiveRoot per workflow, so one FlowGraph per workflow is consistent. This is a deliberate "no," not a deferral — if future scale demands require multi-workflow queries, a specialized query layer can aggregate across instances.
~~Should filterByStatus use an index?~~ Resolved (OQ-016): No — O(n) filter is sufficient for expected graph sizes (tens to hundreds of nodes). A status index would add implementation complexity (maintain on every updateStatus()) for no measurable benefit at current scale. If performance becomes an issue with very large graphs, a Map<CallStatus, Set<string>> index can be added as an optimization later without changing the public API.
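For reference, the deferred `Map<CallStatus, Set<string>>` index mentioned in OQ-016 could look roughly like the sketch below. The `StatusIndex` class and its method names are hypothetical; the point is only that each status update moves a key between two sets, so lookups by status no longer scan every node.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

class StatusIndex {
  private readonly byStatus = new Map<CallStatus, Set<string>>();

  // Records a node's status. When `previous` is given, the key is first
  // removed from that status bucket, so it lives in exactly one bucket.
  set(requestId: string, next: CallStatus, previous?: CallStatus): void {
    if (previous !== undefined) {
      const old = this.byStatus.get(previous);
      if (old) old.delete(requestId);
    }
    let bucket = this.byStatus.get(next);
    if (!bucket) {
      bucket = new Set<string>();
      this.byStatus.set(next, bucket);
    }
    bucket.add(requestId);
  }

  // Returns the keys currently recorded under the given status.
  keys(status: CallStatus): string[] {
    const out: string[] = [];
    const bucket = this.byStatus.get(status);
    if (bucket) bucket.forEach((k) => out.push(k));
    return out;
  }
}
```

The cost is one extra bookkeeping call in every status update, which is the maintenance overhead the resolution above chose to avoid at current graph sizes.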

References

Schema: schema.md — CallNodeAttrs, CallEdgeAttrs, CallStatus, EdgeType
Call protocol: @alkdev/alkhub_ts/docs/architecture/call-graph.md
Call graph storage: @alkdev/alkhub_ts/docs/architecture/storage/call-graph.md
Call event types: @alkdev/operations/src/call.ts
Taskgraph pattern: @alkdev/taskgraph_ts/src/graph/construction.ts
