hub/docs/architecture/storage/call-graph.md
glm-5.1 93e2286343 Align storage & architecture specs with published npm libraries
Systematically compared @alkdev/taskgraph, @alkdev/operations, and
@alkdev/flowgraph against storage/arch specs and fixed all mismatches.

Key changes:

Tasks (storage/tasks.md + ADR-011):
- Rename TaskFrontmatter → TaskInput to match library export
- Fix dependsOn (was depends_on) in field mappings — library uses
  camelCase; parseFrontmatter normalizes YAML snake_case on input
- Document DependencyEdge shape {from, to, qualityRetention?} and
  DB↔library field mapping
- Document graph node vs DB column distinction (TaskGraphNodeAttrs
  is a subset of TaskInput)
- Fix default risk fallback from low → medium (matches resolveDefaults)
- Fix cross-project guard column references (dependentTaskId, not taskId)
- Clarify @alkdev/taskgraph TS is source of truth; frontmatter is for
  LLM output parsing and legacy imports, not Rust CLI
- Add complete library exports reference

Operations (storage/spokes.md + operations.md):
- Add version, title, _meta columns to operations table (required by
  OperationSpec, were missing)
- Fix type casing: query/mutation/subscription (lowercase, matching
  OperationType runtime values)
- Make outputSchema and accessControl NOT NULL (matching library)
- Document ErrorDefinition shape {code, description, schema, httpStatus?}
- Document _meta vs commonCols.metadata distinction
- Add registerAll, get, getHandler, getByName, list, subscribe methods
- Fix buildCallHandler signature ({ registry, callMap })
- Fix OperationType values (lowercase)

Call graph (storage/call-graph.md + call-graph.md):
- Change operationId to NOT NULL with RESTRICT FK (was nullable/SET NULL)
  — matches flowgraph's required CallNodeAttrs.operationId
- Document sentinel __removed__ operation strategy for deletions
- Document ISO 8601 string ↔ timestamptz conversion requirement
- Rewrite CallEventMap to match actual library: flat dot-notation keys,
  timestamp on all events, nested error structure, optional output on
  completed event
- Remove call.running event (doesn't exist in library) — hub calls
  updateStatus(running) directly on dispatch
- Fix buildCallHandler({ registry, callMap }) signature
- Fix PendingRequestMap constructor (positional EventTarget)
- Add updateCall/removeCall/graph methods to API summary
- Document abort cascade as hub logic, not flowgraph logic
- Add open questions for operation deletion and reactive vs call graph
  semantics

Table reference (storage/table-reference.md):
- Update call_graph_nodes.operationId cascade to RESTRICT
- Update operations.type comment to lowercase
- Update status enum reference
2026-05-25 11:46:42 +00:00


status: draft
last_updated: 2026-05-25

Table Schemas: Call Graph

Call graph observability tables. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../../decisions/. For call protocol architecture, see ../../call-graph.md. For the flowgraph library that manages call/operation graphs in memory, see @alkdev/flowgraph.

call_graph_nodes

Call graph entries for observability. Every operation invocation creates a node; parent-child relationships create edges. The status column matches @alkdev/flowgraph/schema's CallStatus enum. See call-graph.md for the full call protocol spec.

| Column | Type | Notes |
|---|---|---|
| commonCols | | id, metadata, createdAt, updatedAt |
| requestId | text NOT NULL UNIQUE | Protocol-level correlation key. Also serves as the flowgraph node key. |
| operationId | text NOT NULL | FK → operations.id (RESTRICT). The operation definition that was called. NOT NULL because @alkdev/flowgraph's CallNodeAttrs.operationId is a required string. The RESTRICT constraint means an operation cannot be deleted while call records reference it. Deletion strategy: The hub should deny operation removal when active call records exist. If removal is required (e.g., cleanup), the hub must first reassign call records to a sentinel operation row (pre-seeded in migrations with id __removed__, name removed, namespace system), then delete the original operation. |
| parentRequestId | text | Parent call's requestId (null = top-level call). Denormalized fast lookup, redundant with the triggered edge in call_graph_edges. |
| identity | jsonb | Caller identity at time of call ({ id, scopes, resources? }), matching @alkdev/flowgraph/schema's CallNodeAttrs.identity. |
| callerAccountId | text | FK → accounts.id. The account that initiated this call. Nullable: system-initiated calls may not have an account. onDelete: SET NULL (calls survive account deletion for audit). This follows the D1 cascade policy: live session/call data uses nullable FK + SET NULL to preserve audit history. |
| status | text NOT NULL | Matches @alkdev/flowgraph/schema's CallStatus enum: pending, running, completed, failed, aborted. State transitions are enforced by the flowgraph state machine: pending → running → completed/failed and pending/running → aborted. |
| input | jsonb | Call input (redacted before storage; see Payload Redaction). |
| output | jsonb | Call output (on success). Contains ResponseEnvelope.data only: the hub unwraps the envelope before storing in the call graph. Maps to CallNodeAttrs.output in flowgraph. |
| error | jsonb | { code, message, details? } (on failure). Maps to CallNodeAttrs.error in flowgraph. |
| startedAt | timestamp with tz | When call was dispatched. Maps to CallNodeAttrs.startedAt in flowgraph. Type conversion: flowgraph stores timestamps as ISO 8601 strings; storage layer must convert between timestamptz and ISO strings during read/write. |
| completedAt | timestamp with tz | When call completed/failed/aborted. Maps to CallNodeAttrs.completedAt in flowgraph. Type conversion: same as startedAt. |
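The deletion strategy described for operationId can be sketched against an in-memory model. This is only an illustration: `removeOperation`, the `force` flag, and the `CallRecord` shape are hypothetical names, and the sketch treats any referencing call record as blocking unless `force` is set, which is an assumption about what "active" means here. In a real database both steps would presumably run inside one transaction.

```typescript
// Sentinel operation id from the spec; assumed to be pre-seeded by migrations.
const REMOVED_OPERATION_ID = "__removed__";

// Hypothetical minimal view of a call_graph_nodes row.
interface CallRecord {
  requestId: string;
  operationId: string;
}

// Returns true when the operation was deleted, false when removal was denied.
function removeOperation(
  operations: Set<string>,
  calls: CallRecord[],
  operationId: string,
  force = false,
): boolean {
  if (operationId === REMOVED_OPERATION_ID) {
    throw new Error("The sentinel operation must never be removed");
  }
  const referencing = calls.filter((c) => c.operationId === operationId);
  // Mirrors the RESTRICT foreign key: deny while call records still point here.
  if (referencing.length > 0 && !force) return false;
  // Step 1: reassign call records to the sentinel row.
  for (const call of referencing) call.operationId = REMOVED_OPERATION_ID;
  // Step 2: the original operation is now unreferenced and can be deleted.
  return operations.delete(operationId);
}
```

Reassigning first keeps operationId non-null at every step, which is what lets the column stay NOT NULL.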

identity boundaries: Caller identity at time of call (account, scopes, resources). This is immutable after creation.

metadata boundaries: Retention metadata and other system fields. User-facing data goes in input/output.

Timestamp serialization: @alkdev/flowgraph's CallNodeAttrs stores startedAt and completedAt as ISO 8601 strings (Type.Optional(Type.String())), not native Date objects. The storage layer stores them as Postgres timestamp with tz. The hub must:

  • On read (DB→flowgraph): Convert timestamptz → ISO string via .toISOString()
  • On write (flowgraph→DB): Convert ISO string → Date or pass as parameterized timestamp
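A minimal sketch of the two conversions, assuming the database driver returns timestamptz values as JavaScript Date objects (common, but driver-dependent). The interface and function names are hypothetical and show only the two timestamp fields.

```typescript
// Hypothetical row shape as returned by the DB driver (timestamptz -> Date).
interface CallNodeRowTimestamps {
  startedAt: Date | null;
  completedAt: Date | null;
}

// Hypothetical subset of CallNodeAttrs: optional ISO 8601 strings.
interface CallNodeAttrsTimestamps {
  startedAt?: string;
  completedAt?: string;
}

// DB -> flowgraph: Date -> ISO 8601 string, omitting null columns.
function rowToAttrs(row: CallNodeRowTimestamps): CallNodeAttrsTimestamps {
  const attrs: CallNodeAttrsTimestamps = {};
  if (row.startedAt !== null) attrs.startedAt = row.startedAt.toISOString();
  if (row.completedAt !== null) attrs.completedAt = row.completedAt.toISOString();
  return attrs;
}

// ISO 8601 string -> Date, rejecting values that do not parse.
function isoToDate(value: string | undefined): Date | null {
  if (value === undefined) return null;
  const parsed = new Date(value);
  if (Number.isNaN(parsed.getTime())) {
    throw new RangeError(`Invalid ISO 8601 timestamp: ${value}`);
  }
  return parsed;
}

// flowgraph -> DB: optional ISO strings -> nullable Date columns.
function attrsToRow(attrs: CallNodeAttrsTimestamps): CallNodeRowTimestamps {
  return {
    startedAt: isoToDate(attrs.startedAt),
    completedAt: isoToDate(attrs.completedAt),
  };
}
```

Validating on the way into the database catches a malformed string before it reaches a parameterized query, where the failure would otherwise surface as a less specific driver error.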

Indexes:

  • idx_call_graph_nodes_request_id UNIQUE on (requestId)
  • idx_call_graph_nodes_operation_id on (operationId)
  • idx_call_graph_nodes_status on (status)
  • idx_call_graph_nodes_caller_account_id on (callerAccountId)
  • idx_call_graph_nodes_created_at on (createdAt): time-range queries
  • idx_call_graph_nodes_operation_created on (operationId, createdAt): operation + time queries
  • idx_call_graph_nodes_started_at on (startedAt): p99 latency analysis

Call graph payload size: The input and output JSONB columns can grow arbitrarily large. For observability, the full payload is valuable but can bloat storage. Strategy: truncate payloads larger than 10KB to { _truncated: true, size: number, preview: string } at the application layer. Full payloads can optionally be stored in object storage (S3/MinIO) with a reference URL in the metadata column. This keeps the call graph table lean while preserving the ability to inspect large payloads when needed.

call.running and startedAt: There is no call.running event in @alkdev/flowgraph's CallEventMapValue. The call.requested event creates the node in pending state. The transition to running is performed by the hub's CallHandler calling flowGraph.updateStatus(requestId, "running", { startedAt: now.toISOString() }) directly when it dispatches the operation handler. This is hub-initiated, not event-driven. See call-graph.md for the write path details.
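The status transitions listed for call_graph_nodes.status, and the hub-initiated move to running, can be sketched as a small transition table. This is a standalone illustration, not flowgraph's implementation: `ALLOWED`, `canTransition`, and `markRunning` are hypothetical names.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// pending -> running -> completed/failed, and pending/running -> aborted.
// Terminal states have no outgoing transitions.
const ALLOWED: Record<CallStatus, readonly CallStatus[]> = {
  pending: ["running", "aborted"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

function canTransition(from: CallStatus, to: CallStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Hypothetical minimal node shape.
interface CallNode {
  requestId: string;
  status: CallStatus;
  startedAt?: string;
}

// What the hub does on dispatch: set running and stamp startedAt as an ISO string.
function markRunning(node: CallNode, now: Date): CallNode {
  if (!canTransition(node.status, "running")) {
    throw new Error(`Cannot move ${node.requestId} from ${node.status} to running`);
  }
  return { ...node, status: "running", startedAt: now.toISOString() };
}
```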

Mapping to @alkdev/flowgraph: The call_graph_nodes columns map directly to CallNodeAttrs in @alkdev/flowgraph/schema. The in-memory flowgraph instance uses requestId as the node key. Storage reads populate a FlowGraph.fromCallEvents() call graph for observability queries, and storage writes persist each call protocol event incrementally.

call_graph_edges

Edges in call graph (typed directed edges between calls). The edgeType column aligns with @alkdev/flowgraph/schema's EdgeType enum for the edge types that flowgraph models (triggered, depends_on). The requested_by type is a storage-layer extension for identity tracing.

| Column | Type | Notes |
|---|---|---|
| commonCols | | id, metadata, createdAt, updatedAt |
| sourceId | text NOT NULL | FK → call_graph_nodes.id (CASCADE). Deleting a source node removes its outgoing edges. |
| targetId | text NOT NULL | FK → call_graph_nodes.id (CASCADE). Deleting a target node removes its incoming edges. |
| edgeType | text NOT NULL | Edge type (see Edge Type Semantics below) |

Indexes:

  • idx_call_graph_edges_source_id on (sourceId): find calls originating from a node
  • idx_call_graph_edges_target_id on (targetId): find calls targeting a node
  • idx_call_graph_edges_source_id_type on (sourceId, edgeType): find outgoing calls of a specific type

Unique constraint: unq_call_graph_edges_source_target_type UNIQUE on (sourceId, targetId, edgeType) — prevents duplicate edges from retries/reconnections.
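The effect of the unique constraint on retried writes can be illustrated with an in-memory stand-in. `EdgeStore` and `insertIfAbsent` are hypothetical; against Postgres the equivalent would typically be an INSERT with ON CONFLICT DO NOTHING on the same three columns.

```typescript
// Hypothetical minimal view of a call_graph_edges row.
interface EdgeRow {
  sourceId: string;
  targetId: string;
  edgeType: string; // extensible text field, so not a closed union
}

class EdgeStore {
  private readonly rows = new Map<string, EdgeRow>();

  // JSON-encoding the tuple avoids collisions that naive concatenation could cause.
  private static key(edge: EdgeRow): string {
    return JSON.stringify([edge.sourceId, edge.targetId, edge.edgeType]);
  }

  // Returns true if the edge was stored, false if an identical edge already existed.
  insertIfAbsent(edge: EdgeRow): boolean {
    const key = EdgeStore.key(edge);
    if (this.rows.has(key)) return false;
    this.rows.set(key, { ...edge });
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

Because edgeType is part of the key, the same pair of nodes can still carry both a triggered and a depends_on edge.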

Edge Type Semantics

The edgeType column is an extensible text field. The initial set of edge types aligns with @alkdev/flowgraph/schema's EdgeType enum for the first two, with a storage-layer extension for the third:

| Edge Type | Flowgraph EdgeType | Meaning |
|---|---|---|
| triggered | EdgeType.triggered | The source node caused the target node to execute. Represents the parent-child call hierarchy: when call A invokes call B (via parentRequestId), a triggered edge connects them. This is the most common edge type and corresponds to the call graph nesting described in the call protocol. Created automatically by FlowGraph.addCall() when parentRequestId is present. |
| depends_on | EdgeType.depends_on | The source node requires the result of the target node before it can complete. Represents a data dependency: call A cannot proceed until call B's output is available. Unlike triggered, the source does not cause the target to execute; it merely waits on it. Created by coordination logic via FlowGraph.addDependency(). |
| requested_by | Storage extension (no flowgraph EdgeType) | The target node was executed on behalf of the source node's identity. Represents the identity/authorization chain: call A's identity was delegated or propagated to call B. Used to trace which account's authority a call was performed under, distinct from the execution hierarchy (triggered). This is persisted in the database for observability but not modeled in the in-memory flowgraph graph. |

New edge types may be added as the call protocol evolves. Convention: use snake_case names, document each new type in this table, and ensure the type has a clear semantic distinction from existing types.

Relationship: parentRequestId vs call_graph_edges

The parentRequestId column on call_graph_nodes and triggered edges in call_graph_edges both represent the parent-child call hierarchy, but serve different purposes:

  • parentRequestId is a convenience shortcut on the node itself, set at call creation time from the call protocol's parentRequestId field. It enables fast point lookups ("who is this call's parent?") without a JOIN. Its value is the parent call's requestId, which is that parent's node key in the flowgraph instance.
  • triggered edges represent the same relationship in the graph structure, enabling traversal queries ("find all children of this node"), path queries, and graph algorithm operations (topological sort, cycle detection).
  • They are intentionally redundant: parentRequestId is denormalized for fast reads; edges are normalized for graph operations. Both should be kept consistent — when a node with a parentRequestId is stored, a triggered edge should also be created.

Mapping to @alkdev/flowgraph In-Memory Model

The storage tables map to @alkdev/flowgraph types as follows:

| Storage Table/Column | Flowgraph Type | Notes |
|---|---|---|
| call_graph_nodes row | CallNodeAttrs (node in FlowGraph) | requestId is the node key in the flowgraph instance |
| call_graph_nodes.status | CallStatus enum | Same values: pending, running, completed, failed, aborted |
| call_graph_nodes.identity | CallNodeAttrs.identity | { id, scopes, resources? } |
| call_graph_nodes.error | CallNodeAttrs.error | { code, message, details? } |
| call_graph_edges with edgeType='triggered' | TriggeredEdgeAttrs | Created by FlowGraph.addCall() when parentRequestId is present |
| call_graph_edges with edgeType='depends_on' | DependencyEdgeAttrs | Created by FlowGraph.addDependency() |
| call_graph_edges with edgeType='requested_by' | No flowgraph equivalent | Storage-layer only, not modeled in the in-memory graph |

Reconstruction: After a hub restart, the call graph is rebuilt from stored events or incremental rows using FlowGraph.fromCallEvents() or by iterating over call_graph_nodes + call_graph_edges rows and populating a FlowGraph instance via addCall() and addDependency().

Identifier mapping: call_graph_nodes uses two identifiers: id (UUID, from commonCols, used as PK and FK target for edges) and requestId (text, UNIQUE, used as the flowgraph node key). When writing edges to call_graph_edges, the hub resolves requestId → call_graph_nodes.id for the FK references. When reconstructing from the database, the hub resolves call_graph_nodes.id → requestId for flowgraph node keys. This mapping is efficient because call_graph_nodes.requestId has a UNIQUE index.
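Both directions of the identifier mapping can be sketched with plain lookup maps. The row shapes and function names below are hypothetical; the sketch derives triggered edge rows from parentRequestId (source = parent, target = child) and translates a stored edge back to flowgraph node keys.

```typescript
// Hypothetical minimal row shapes.
interface NodeRow {
  id: string; // primary key, FK target for edges
  requestId: string; // flowgraph node key
  parentRequestId: string | null;
}

interface EdgeRow {
  sourceId: string;
  targetId: string;
  edgeType: string;
}

// Write direction: resolve requestId -> id to build triggered edge rows.
function triggeredEdgeRows(nodes: NodeRow[]): EdgeRow[] {
  const idByRequestId = new Map<string, string>();
  for (const node of nodes) idByRequestId.set(node.requestId, node.id);

  const edges: EdgeRow[] = [];
  for (const node of nodes) {
    if (node.parentRequestId === null) continue; // top-level call
    const parentId = idByRequestId.get(node.parentRequestId);
    if (parentId === undefined) {
      throw new Error(`Unknown parent requestId: ${node.parentRequestId}`);
    }
    edges.push({ sourceId: parentId, targetId: node.id, edgeType: "triggered" });
  }
  return edges;
}

// Reconstruction direction: resolve id -> requestId for flowgraph node keys.
function edgeToRequestIds(edge: EdgeRow, nodes: NodeRow[]): { from: string; to: string } {
  const requestIdById = new Map<string, string>();
  for (const node of nodes) requestIdById.set(node.id, node.requestId);

  const from = requestIdById.get(edge.sourceId);
  const to = requestIdById.get(edge.targetId);
  if (from === undefined || to === undefined) {
    throw new Error("Edge references a node id that is not loaded");
  }
  return { from, to };
}
```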

Serialization: Flowgraph's export() produces graphology's native JSON format (CallGraphSerialized), which is suitable for snapshot/restore but not for incremental queries. The hub uses incremental storage for real-time observability and can optionally persist snapshots for fast recovery.

Retention Policy

Call graph data is retained for 90 days by default (configurable via hub config). Completed/failed/aborted nodes and their edges older than the retention period are cleaned up by a background job. Pending/running nodes are never auto-deleted.

Aggregation for observability: Before deletion, summary statistics (call counts, average duration, error rates by operation) may be computed and stored in a separate aggregation table (deferred to Phase 2).

The metadata column on call_graph_nodes stores retention metadata: { _retentionExpiresAt: timestamp } for tracking when a node becomes eligible for cleanup.
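A sketch of how the retention metadata might drive the cleanup job. Two assumptions are made that the spec does not state: the expiry is measured from completedAt, and _retentionExpiresAt is serialized as an ISO 8601 string. All names other than _retentionExpiresAt are hypothetical.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical minimal view of a stored node.
interface StoredNode {
  status: CallStatus;
  metadata: { _retentionExpiresAt?: string };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Computed once when the call reaches a terminal state (assumption: from completedAt).
function retentionExpiresAt(completedAt: Date, retentionDays = 90): string {
  return new Date(completedAt.getTime() + retentionDays * DAY_MS).toISOString();
}

// Pending/running nodes are never eligible, regardless of age or metadata.
function isEligibleForCleanup(node: StoredNode, now: Date): boolean {
  if (node.status === "pending" || node.status === "running") return false;
  const expires = node.metadata._retentionExpiresAt;
  if (expires === undefined) return false;
  return new Date(expires).getTime() <= now.getTime();
}
```

Storing the expiry on the row means a later change to the configured retention period applies only to calls completed after the change.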

Payload Redaction

Call graph input and output payloads may contain sensitive data (API keys, tokens, personal information). A redaction strategy is applied before storage.

Redaction rules: (1) Known sensitive field names (apiKey, token, password, secret, authorization, key) are replaced with [REDACTED]. (2) String values matching common secret patterns (Bearer tokens, base64-encoded secrets) are replaced with [REDACTED]. (3) Redaction is applied BEFORE the 10KB truncation — the truncated preview contains only redacted data.

Redaction timing: Applied at the application layer before DB write. Never store raw payloads and redact on read — redaction must be one-way.

Configuration: The list of redacted field names and patterns is configurable via hub config, with sensible defaults.
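Rules (1) and (2) can be sketched as a recursive walk over a JSON value. This is an illustration only: the case-insensitive key match and the single Bearer pattern are assumptions, and a base64 heuristic is deliberately left out because it needs tuning to avoid false positives. In the hub both lists would come from config.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const REDACTED = "[REDACTED]";

// Default sensitive field names from the spec, compared case-insensitively (assumption).
const SENSITIVE_KEYS = new Set(["apikey", "token", "password", "secret", "authorization", "key"]);

// Example secret pattern: a whole-string Bearer token. No "g" flag, so test() is stateless.
const SECRET_PATTERNS: RegExp[] = [/^Bearer\s+\S+$/i];

function looksLikeSecret(text: string): boolean {
  return SECRET_PATTERNS.some((pattern) => pattern.test(text));
}

// Returns a redacted copy; the input is never mutated.
function redact(value: Json): Json {
  if (typeof value === "string") {
    return looksLikeSecret(value) ? REDACTED : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      // Rule 1: a sensitive key hides its entire value, whatever its type.
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? REDACTED : redact(child);
    }
    return out;
  }
  return value; // null, boolean, number
}
```

Returning a copy keeps the in-flight payload intact for the handler while only the redacted version reaches the database.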

Payload Truncation

Truncation timing: Payloads are truncated on DB write, not in-flight. In-flight calls hold full payloads in memory for processing. Only the persisted version is truncated.

Truncation strategy: Payloads larger than 10KB are truncated to { _truncated: true, size: number, preview: string } where preview is the first 1024 bytes (not characters) of the JSON-serialized payload. The threshold is configurable via HubConfig.callGraph.payloadTruncationThreshold (defaults to 10240 bytes).
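A minimal sketch of the truncation step, measuring size in UTF-8 bytes of the JSON-serialized payload. `truncatePayload` and its parameter names are hypothetical; it is meant to run after redaction so the preview contains only redacted data.

```typescript
interface TruncatedPayload {
  _truncated: true;
  size: number; // byte length of the full JSON-serialized payload
  preview: string; // first previewBytes bytes, decoded as UTF-8
}

function truncatePayload<T>(
  payload: T,
  thresholdBytes = 10240,
  previewBytes = 1024,
): T | TruncatedPayload {
  const bytes = new TextEncoder().encode(JSON.stringify(payload));
  if (bytes.length <= thresholdBytes) return payload;
  // Cutting on a byte boundary can split a multi-byte character; the default
  // (non-fatal) TextDecoder replaces the partial sequence with U+FFFD.
  const preview = new TextDecoder("utf-8").decode(bytes.subarray(0, previewBytes));
  return { _truncated: true, size: bytes.length, preview };
}
```

Measuring bytes rather than characters matches the stated threshold and keeps the stored size predictable for non-ASCII payloads.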

Object storage reference: For payloads exceeding the truncation threshold, the full payload MAY be stored in object storage (S3/MinIO) with a reference URL in the metadata column as { _storageRef: 's3://bucket/key' }. This is Phase 2 and not yet implemented.