Files
hub/docs/architecture/call-graph.md
glm-5.1 2b63cda1c7 Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts
Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00

32 KiB

status, last_updated
status last_updated
draft 2026-05-22

Call Protocol, Call Graph & Operation Graph

Overview

The call protocol is the unified transport layer for all operation invocations. It provides a single event-based mechanism that works the same whether the call is local (in-process), remote (hub ↔ spoke over websocket), or streamed (subscription). The call graph and operation graph are built on top of it — and @alkdev/flowgraph provides the graph construction, analysis, and reactive execution primitives.

Websockets are the primary transport for hub-spoke communication, not SSE. SSE is half-duplex and requires polling for the reverse path; websockets give us bidirectional channels where hub → spoke dispatch and spoke → hub results flow through the same connection. The call protocol's call ≡ subscribe semantics map naturally: a websocket frame comes in, the protocol resolves or streams depending on the consumption pattern.

Transport distinction: WebSocket is the primary bidirectional transport for hub↔spoke and hub↔client-spoke communication. SSE support exists for compatibility (e.g., OpenAI proxy streams, legacy clients) but is not the preferred transport. A client (browser, CLI) that connects as a spoke gets full bidirectional communication over a single WebSocket — no SSE needed.

call ≡ subscribe

At the protocol level, call and subscribe are the same thing with different consumption patterns:

  • call: Publish call.requested, subscribe to call.responded:{requestId}, resolve on first response → Promise<TOutput>
  • subscribe: Publish call.requested, subscribe to call.responded:{requestId}, yield each response → AsyncIterable<TOutput>

Both use the same event types, the same requestId correlation, and the same PendingRequestMap. The only difference is that call resolves after the first call.responded and unsubscribes, while subscribe stays open and yields each call.responded until call.aborted or call.error.

This means call is semantically subscribe().next() — a subscription that completes after one event.

HTTP endpoint: An HTTP POST /api/{namespace}/{operation} is just call over HTTP — publish a call.requested, wait for call.responded, return the output as JSON.

WebSocket endpoint: A websocket connection carries bidirectional call protocol events. The hub pushes call.requested to spoke runners; runners push call.responded/call.error back. Same protocol, different transport. This is the hub-spoke "rpc-mode": persistent connection, no polling, natural streaming support.

Why We Keep the Call Protocol (Not Just the Graphs)

  1. SDD process requires it — the coordinator models development workflows between agents using the call graph. When the architect calls the decomposer which calls the coordinator which spawns implementation specialists, that's a call graph. The call protocol is what populates it automatically.

  2. Abort cascading — when a parent operation fails or is aborted, all child operations should be notified. The call protocol propagates call.aborted through parentRequestId chains. Without it, each coordination operation handles errors ad-hoc (e.g., coord.spawn chains 5 registry.execute() calls — if the 3rd fails, there's no structured abort of the first two or the pending 4th/5th).

  3. Observability — seeing what operations called what, how long they took, what failed, is essential for debugging agent workflows. The call protocol auto-tracks calls via PendingRequestMap; the call graph is populated as a side effect.

  4. Unified error handlingmapError + InfrastructureErrors + errorSchemas declaration gives structured, typed errors across all transports. Without it, each consumer invents its own error format.

  5. Transport flexibility — the TypedEventTarget plug point means the same protocol works over in-process EventTarget, Redis channels, or websockets. The hub uses all three: in-process for local operations, Redis for cross-process events, websockets for spoke runner dispatch.

  6. Future Rust rewrite — the API contract needs to be stable. The call protocol is a small, well-defined event contract. Building it now means the Rust rewrite has a spec to implement against.

Call Event Types

All communication flows through typed events:

The call event TypeBox schemas are defined in @alkdev/operations as CallEventSchema. The shape shown here is the current design; verify against the package source for any minor differences.

import { Type } from "@alkdev/typebox"

export const CallEventMap = {
  call: {
    requested: Type.Object({
      requestId: Type.String(),
      operationId: Type.String(),
      input: Type.Unknown(),
      parentRequestId: Type.Optional(Type.String()),
      deadline: Type.Optional(Type.Number()),
      identity: Type.Optional(Type.Object({
        id: Type.String(),
        scopes: Type.Array(Type.String()),
        resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String())))
      }))
    }),
    responded: Type.Object({
      requestId: Type.String(),
      output: Type.Unknown()  // ResponseEnvelope from @alkdev/operations
    }),
    completed: Type.Object({
      requestId: Type.String()
    }),
    aborted: Type.Object({
      requestId: Type.String()
    }),
    error: Type.Object({
      requestId: Type.String(),
      code: Type.String(),
      message: Type.String(),
      details: Type.Optional(Type.Unknown())
    })
  }
} as const

Event Semantics

  • call.requested — Initiates a call. Creates a call graph node (status: pending) and adds a triggered edge if parentRequestId is present.
  • call.responded — Carries the call result. For one-shot calls, this is the terminal event that resolves the Promise<ResponseEnvelope>. The output field contains a ResponseEnvelope (with data and meta fields) from @alkdev/operations.
  • call.completed — Terminal completion signal, idempotent if call.responded was already received. For subscriptions, fires after the last call.responded to signal stream end. For one-shot calls, the PendingRequestMap may emit call.completed as a separate event or as part of call.responded processing. In flowgraph, this event fills completedAt if it was not already set.
  • call.aborted — Call was cancelled. Sets status to aborted and cascades to children.
  • call.error — Call failed with an error. Sets status to failed and stores the error.

Note on @alkdev/flowgraph: The CallEventMapValue type in @alkdev/flowgraph/schema defines the union of these event types. Flowgraph's FlowGraph.fromCallEvents() and updateFromEvent() consume these events directly to populate the call graph. The CallStatus enum in flowgraph (pending, running, completed, failed, aborted) aligns with the statuses in the call protocol events.

Note on ResponseEnvelope unwrapping: The call.responded event carries output as a ResponseEnvelope (from @alkdev/operations). When feeding events to @alkdev/flowgraph, the hub unwraps the envelope before calling updateFromEvent()CallNodeAttrs.output stores the ResponseEnvelope.data value (the actual result), not the full envelope. The ResponseEnvelope.meta is discarded at the call graph level (it's available in PendingRequestMap for the caller, but not persisted in the graph node). This means call_graph_nodes.output contains the unwrapped result data.

Identity: The Identity type represents the caller's security context. Derived from keypal's ApiKeyMetadatascopes maps directly from keypal's global scopes, resources maps from keypal's resource-scoped permissions (key format: "type:id", value: scope array). Passed through the call chain and checked by CallHandler against the operation's AccessControl definition. See operations.md for the AccessControl type.

Request correlation: Every call has a unique requestId. Nested calls include parentRequestId to track the call chain. Responses and errors are matched to requests by requestId.

Error Model

The call protocol uses a unified error model: both infrastructure (protocol-level) and domain (operation-level) errors flow through the same CallError event. CallError.code is string — the distinction between infrastructure and domain codes is by convention, not by type.

Infrastructure Error Codes

Reserved codes produced by CallHandler itself, before or after operation execution:

Code When Schema
OPERATION_NOT_FOUND No operation matches operationId { operationId: string }
ACCESS_DENIED Missing scopes { requiredScopes?: string[] }
VALIDATION_ERROR Input fails inputSchema check { errors: ValueError[] }
TIMEOUT Deadline exceeded { deadline: number }
ABORTED Call cancelled { reason?: string }
EXECUTION_ERROR Handler threw, no errorSchemas match { message: string }
UNKNOWN_ERROR Non-Error thrown { raw: string }

Domain Error Propagation

Operations declare their possible errors via errorSchemas on IOperationDefinition. When a handler throws, mapError matches the thrown error against the declared schemas — falls back to EXECUTION_ERROR if no match.

errorSchemas is the contract: An operation's errorSchemas declaration is the contract between the operation and its callers about what errors it might produce. No errorSchemas = safe default with EXECUTION_ERROR wrapper.

PendingRequestMap

Manages in-flight requests and provides the call() interface:

// From @alkdev/operations
import { PendingRequestMap } from "@alkdev/operations"

// Construction — takes optional EventTarget for pluggable transport
const prm = new PendingRequestMap({ eventTarget })

// Call protocol — call() returns Promise<ResponseEnvelope>
const envelope = await prm.call(operationId, input, { deadline, identity })
// envelope.data contains the result, envelope.meta contains source + timestamp

// Subscribe protocol — returns AsyncIterable<ResponseEnvelope>
const stream = prm.subscribe(operationId, input, { idleTimeout, identity })
for await (const envelope of stream) {
  // yield each response
}

// Resolving calls
prm.respond(requestId, output)  // output must be ResponseEnvelope
prm.emitError(requestId, code, message, details?)
prm.complete(requestId)
prm.abort(requestId)

Key behaviors:

  • call() returns Promise<ResponseEnvelope> (not Promise<unknown>)
  • subscribe() returns AsyncIterable<ResponseEnvelope>
  • respond() requires isResponseEnvelope(output)
  • Built-in deadline and idle timeout support
  • Constructor takes optional EventTarget for pluggable transport

CallHandler

Bridges pubsub events to OperationRegistry.execute(). Performs access control and error mapping:

import { buildCallHandler } from "@alkdev/operations"

const handler = buildCallHandler({ registry, eventTarget })
// subscribes to call.requested events
// checks access control (requiredScopes, resource permissions) against Identity
// executes via registry, dispatches call.responded on success
// maps errors via mapError, dispatches call.error

Nested Call Wiring

Routing is an env construction concern, not a separate protocol layer. buildEnv is the single function that creates the env:

  • Direct mode: buildEnv({ registry, context }) → env functions call registry.execute() directly, return Promise<ResponseEnvelope>
  • Call protocol mode: PendingRequestMap handles routing internally, parentRequestId is set via context

buildEnv no longer takes a callMap parameter. It sets trusted: true on nested context (bypasses access control for internal calls). Env functions return Promise<ResponseEnvelope>, not Promise<unknown>. Callers must use unwrap(envelope) or access envelope.data for the result.

parentRequestId propagation: Every nested call includes parentRequestId — enables call graph reconstruction and abort cascading.

Operation Graph (Static)

Built once at startup from the OperationRegistry. Represents type-compatibility edges between operations. Implemented using @alkdev/flowgraph.

Structure

Node = OperationNodeAttrs (namespace.name, type, inputSchema, outputSchema)
Edge = OperationEdgeAttrs (compatible: boolean, detail?, mismatches?)
Edge type = "typed" (from flowgraph EdgeType enum)

The operation graph is constructed via FlowGraph.fromSpecs(specs), which takes an array of OperationSpec objects (derived from OperationRegistry) and:

  1. Creates a node for each operation with OperationNodeAttrs attributes
  2. Runs buildTypeEdges(graph) to create edges between operations whose output/input schemas are type-compatible
  3. Throws CycleError if the resulting graph has cycles (DAG invariant)

Type Compatibility

typeCompat(outputSchema, inputSchema) performs deep structural comparison of two TypeBox schemas. Returns:

  • { compatible: true } — output is a subtype of input
  • { compatible: true, detail } — compatible with notes (e.g., "output has extra fields")
  • { compatible: false, mismatches: TypeMismatch[] } — structural incompatibility
  • undefined — one or both schemas are unknown/any (no meaningful check possible)

Edges where compatible: false are still added to the graph (with compatible: false and the mismatch details) so the graph is complete for observability, but the compatible attribute allows consumers to filter.

Call Templates for SDD

The SDD process defines a natural workflow:

architect → architecture-reviewer → decomposer → coordinator → implementation-specialist → code-reviewer

This is a call template — a validated path through the operation graph that the coordinator can instantiate as a call graph at runtime.

Current approach: Hardcoded workflow sequences. See "What We Defer" below.

Future approach: @alkdev/flowgraph provides ujsx workflow composition components (Operation, Sequential, Parallel, Conditional, Map) that can define templates declaratively. The GraphologyHostConfig renders templates to a DirectedGraph for validation, and ReactiveHostConfig renders them to reactive WorkflowNode trees for execution. When we adopt template-based workflows, flowgraph provides the validation (validateTemplate), type-compatibility checking, and DAG enforcement out of the box.

API Summary

import { FlowGraph } from "@alkdev/flowgraph/graph"
import { typeCompat, buildTypeEdges, topologicalOrder, validateGraph } from "@alkdev/flowgraph/analysis"

// Build operation graph from registered operations.
// Note: @alkdev/flowgraph's OperationSpec is a structural subset of
// @alkdev/operations' OperationSpec (it omits handler, accessControl, errorSchemas).
// The .map() transforms between the two types.
const opGraph = FlowGraph.fromSpecs(registry.list().map(spec => ({
  name: spec.name,
  namespace: spec.namespace,
  version: spec.version,
  type: spec.type,        // "query" | "mutation" | "subscription"
  inputSchema: spec.inputSchema,
  outputSchema: spec.outputSchema,
  description: spec.description,
})))

// Query
const compatResult = typeCompat(opA.outputSchema, opB.inputSchema)
const order = topologicalOrder(opGraph.graph)
const issues = validateGraph(opGraph.graph)

// Serialization
const data = opGraph.export()   // -> OperationGraphSerialized (graphology JSON format)
const restored = FlowGraph.fromJSON(data)  // validates schema + DAG invariant

Call Graph (Dynamic)

Created at runtime for each workflow execution. Populated automatically by the call protocol — every call.requested adds a node, every call.responded/call.error/call.aborted updates its state and timestamp. Implemented using @alkdev/flowgraph.

Structure

Node = CallNodeAttrs (requestId, operationId, status, input, output?, error?, identity?, parentRequestId?, startedAt?, completedAt?)
Edge type "triggered"  = execution hierarchy (parentRequestId → child call)
Edge type "depends_on" = data dependency (call A waits on call B's output)

EdgeType scoping: @alkdev/flowgraph defines five edge types in its EdgeType enum: triggered, depends_on, typed, sequential, conditional. Not all apply to every graph type:

  • Call graph: triggered and depends_on (plus the storage-layer requested_by)
  • Operation graph: typed (type compatibility between operations)
  • Template graph: sequential and conditional (workflow composition via ujsx)

This document focuses on call graph edge types. See the flowgraph architecture docs for the full type definitions.

The call graph is populated by FlowGraph.fromCallEvents(events) or incrementally via updateFromEvent(event). Each call protocol event maps directly to a graph mutation:

Event Graph Mutation
call.requested addCall(attrs) — creates node (status: pending) + triggered edge if parentRequestId present
call.responded updateCall(requestId, { status: "completed", output, completedAt })
call.completed updateCall(requestId, { completedAt }) — idempotent if already responded, sets completedAt if missing
call.error updateCall(requestId, { status: "failed", error: { code, message, details? } })
call.aborted updateStatus(requestId, "aborted") + cascade to children
call.running updateStatus(requestId, "running") — when the call starts executing (hub dispatches to handler)

Call Status State Machine

Flowgraph enforces valid status transitions via updateStatus(). The state machine is:

pending → running → completed
                  → failed
         → aborted
running → aborted

Terminal states (completed, failed, aborted) are immutable. InvalidTransitionError is thrown on invalid transitions. This matches the storage layer's call_graph_nodes.status enum.

Abort Cascading

When a call is aborted, all of its children should also be aborted. Flowgraph provides two mechanisms:

  1. triggered edge traversal: children(requestId) returns direct children via triggered edges. Full cascading uses descendants(requestId) for all descendants.
  2. WorkflowReactiveRoot: For running workflow executions, the reactive engine provides abortNode(nodeId) and abortAll() with FailurePolicy configuration ("continue-running" vs "abort-dependents").

The hub's CallHandler wires call.aborted events to:

  • updateStatus(requestId, "aborted") on the call graph
  • Pubsub event propagation so downstream PendingRequestMap instances also call abort() on their in-flight requests
  • WorkflowReactiveRoot.abortNode(nodeId) if a workflow execution is tracking this call

depends_on Edges

While triggered edges represent the parent-child execution hierarchy, depends_on edges represent data dependencies — a call that needs another call's output before it can proceed. These are created by the coordinator when orchestrating workflows:

callGraph.addDependency(sourceRequestId, targetRequestId)
// Adds a "depends_on" edge (source depends on target's output)
// Cycle-checked — throws CycleError if the edge would create a cycle

depends_on edges are not created by the call protocol itself. They are added by coordination logic that knows the data flow between calls (e.g., the coordinator knows that coord.spawn step 3 depends on step 1's output). This gives the observability layer a richer graph for analysis without changing the protocol.

API Summary

import { FlowGraph } from "@alkdev/flowgraph/graph"
import { CallStatus } from "@alkdev/flowgraph/schema"

// Build call graph from events (e.g., after hub restart, reconstruct from DB)
const callGraph = FlowGraph.fromCallEvents(storedEvents)

// Or build incrementally as events arrive
const callGraph = new FlowGraph(CallNodeAttrs, CallEdgeAttrs)

// Process events
callGraph.updateFromEvent(event)       // handles all call.* event types

// Status management
callGraph.updateStatus(requestId, "running")  // validates state machine transition
callGraph.updateStatus(requestId, "completed") // throws if not currently "running"

// Edge management
callGraph.addCall({ requestId, operationId, status: "pending", parentRequestId?, input?, identity? })
callGraph.addDependency(sourceRequestId, targetRequestId)  // depends_on edge

// Queries
callGraph.children(requestId)           // direct children via triggered edges
callGraph.descendants(requestId)        // all descendants
callGraph.lineage(requestId)            // ancestor chain from root to this call
callGraph.getRoots()                     // calls with no parentRequestId
callGraph.filterByStatus("running")      // all running calls
callGraph.duration(requestId)           // completedAt - startedAt in ms

// Serialization (for Postgres persistence)
const data = callGraph.export()          // -> CallGraphSerialized
const restored = FlowGraph.fromJSON(data)

Graph Size

At hub level, the call graph is small — just metadata nodes mapping to the actual process/call. Agent workflow call graphs will have tens of nodes at most for simple workflows, potentially hundreds for complex coordination with many parallel tasks. Performance is a non-issue for call-level metadata; flowgraph wraps graphology which handles thousands of nodes efficiently.

Reactive Workflow Execution

For running workflow executions (not just observability), @alkdev/flowgraph/reactive provides WorkflowReactiveRoot — a signal-driven execution engine that:

  1. Takes a DirectedGraph (from a ujsx template rendered via GraphologyHostConfig) and creates reactive state for every node
  2. Processes call protocol events via append(event) — the event log is the source of truth, status/results are derived projections
  3. Computes preconditions (all predecessors completed), canStart (preconditions met + not blocked by failure), and blockedByFailure (any predecessor failed/aborted) as reactive signals
  4. Supports FailurePolicy: "continue-running" (only abort idle/waiting dependents) or "abort-dependents" (cascade abort to all non-terminal dependents)
  5. Maps node keys to requestIds via setRequestId(nodeKey, requestId) — bridging template nodes to call protocol identifiers
  6. Requires dispose() to release signal subscriptions
import { WorkflowReactiveRoot } from "@alkdev/flowgraph/reactive"

// WorkflowReactiveRoot takes the raw DirectedGraph (flowGraph.graph),
// not the FlowGraph wrapper
const workflowRoot = new WorkflowReactiveRoot(templateGraph.graph)
try {
  // Bridge template nodes to call protocol request IDs
  workflowRoot.setRequestId("step-1", requestId1)
  workflowRoot.setRequestId("step-2", requestId2)

  // Process events (appended by the hub's call protocol handler)
  workflowRoot.append(callRequestedEvent)
  workflowRoot.append(callRespondedEvent)

  // Query reactive state
  const status = workflowRoot.getStatus("step-1")    // NodeStatus
  const canStart = workflowRoot.canStart.get("step-2") // ReadonlySignal<boolean>
  const isComplete = workflowRoot.isComplete()         // all nodes terminal?
} finally {
  workflowRoot.dispose()
}

This is the execution engine for workflow-based coordination. The hub coordinator instantiates a WorkflowReactiveRoot for each running workflow, feeds it call protocol events, and uses its reactive state to determine what to do next (start the next step, handle failures, cascade aborts).

Storage

Call graph nodes and edges are stored in Postgres. See storage/call-graph.md for the full schema definitions.

The storage layer persists individual call_graph_nodes and call_graph_edges rows. Flowgraph's export() produces graphology's native JSON format (CallGraphSerialized), which is suitable for snapshot/restore but not for incremental observability queries. The hub uses both:

  • Incremental storage: Each call protocol event writes/updates a row in call_graph_nodes and creates call_graph_edges as needed. This supports real-time observability queries (what's running, what failed, what's blocked).
  • Reconstruction: After a hub restart, the call graph can be reconstructed from stored events or from incremental rows using FlowGraph.fromCallEvents().

Write Path

The hub's CallHandler is responsible for writing call graph data to Postgres. When a call protocol event arrives:

  1. call.requested: The CallHandler creates a row in call_graph_nodes (status: pending) and, if parentRequestId is present, a triggered edge in call_graph_edges. This write happens synchronously before dispatching to ensure the call is tracked even if the handler fails immediately.
  2. call.responded: Updates the node's status to completed, sets output (unwrapped from the ResponseEnvelope — only data is stored, not meta), and sets completedAt.
  3. call.error: Updates status to failed, sets error, and sets completedAt.
  4. call.aborted: Updates status to aborted and sets completedAt. The hub then cascades the abort to child calls.
  5. call.completed: Sets completedAt if not already set. Idempotent — no-op if the call is already completed.
  6. call.running: Updates status from pending to running and sets startedAt.

Error handling: If a DB write fails, the call still proceeds (the handler has already been invoked). The hub logs the write failure and continues. Call graph data is best-effort — the in-memory flowgraph is the authoritative source for running calls; the DB is for persistence and observability.

Identifier Mapping

The call_graph_nodes table uses two identifiers:

  • id (UUID, from commonCols): Internal primary key, used as the FK target for call_graph_edges.
  • requestId (text, UNIQUE): Protocol-level correlation key, used as the flowgraph node key.

When reconstructing a flowgraph from the database, the hub uses requestId as the node key (matching CallNodeAttrs.requestId). The call_graph_edges table uses sourceId/targetId referencing call_graph_nodes.id (the UUID), so reconstruction requires resolving UUIDs to requestIds. The call_graph_nodes.requestId column has a UNIQUE index, making this lookup efficient.

call_graph_nodes — One row per call

Column Type Notes
commonCols id, metadata, createdAt, updatedAt
requestId text NOT NULL UNIQUE Protocol-level correlation key. Also the flowgraph node key.
operationId text FK → operations.id. Nullable — survives operation removal.
parentRequestId text Denormalized parent — fast point lookup. Redundant with triggered edge.
identity jsonb Caller identity: { id, scopes, resources }
callerAccountId text FK → accounts.id (ON DELETE SET NULL). System calls are nullable.
status text NOT NULL Matches CallStatus enum: pending, running, completed, failed, aborted
input jsonb Call input (redacted, truncated — see storage/call-graph.md)
output jsonb Call output (on success)
error jsonb { code, message, details? } (on failure)
startedAt timestamp with tz When call was dispatched (maps to flowgraph startedAt)
completedAt timestamp with tz When call completed/failed/aborted (maps to flowgraph completedAt)

call_graph_edges — Typed directed edges between calls

Column Type Notes
commonCols id, metadata, createdAt, updatedAt
sourceId text NOT NULL FK → call_graph_nodes.id (CASCADE)
targetId text NOT NULL FK → call_graph_nodes.id (CASCADE)
edgeType text NOT NULL triggered, depends_on, or requested_by

Edge type semantics: triggered = execution hierarchy (parentRequestId), depends_on = data dependency, requested_by = identity/authorization chain. See storage/call-graph.md for details.

Note on depends_on in flowgraph: The flowgraph CallEdgeAttrs type is a union of TriggeredEdgeAttrs and DependencyEdgeAttrs, matching the triggered and depends_on edge types. The requested_by edge type is a storage-layer concept for identity tracing that doesn't have a corresponding flowgraph edge type — it's persisted in the database but not modeled in the in-memory graph.

Transport Mapping

The call protocol is transport-agnostic. The TypedEventTarget plug point (same pattern as RedisEventTarget in the pubsub design) determines how events move:

Transport Use Case TypedEventTarget impl
In-process Local hub operations Browser EventTarget (default)
Redis Cross-process events (e.g., hub → all processes) RedisEventTarget
WebSocket Hub ↔ spoke bidirectional createWebSocketServerEventTarget (hub) / createWebSocketClientEventTarget (spoke) from @alkdev/pubsub

A WebSocketEventTarget implementing TypedEventTarget makes each spoke runner's websocket connection a live bidirectional channel. The hub dispatches call.requested over the socket; the runner sends call.responded/call.error back. Same protocol, same event shapes, same PendingRequestMap — just a different eventTarget.

What We Defer

  1. Full ujsx call templates — currently using hardcoded workflow sequences. @alkdev/flowgraph/component provides Operation, Sequential, Parallel, Conditional, Map components for declarative template definition, and GraphologyHostConfig + ReactiveHostConfig for rendering. We'll adopt these when workflow complexity justifies it.
  2. Graph visualization — API only, no Sigma.js UI
  3. Stream deduplicationValue.Hash({operationId, input}) deduplication for multiple subscribers to the same stream
  4. requested_by edge creation in flowgraph — the requested_by edge type is a storage-layer concept for identity tracing. It's persisted in call_graph_edges but not modeled in @alkdev/flowgraph's CallEdgeAttrs union. We may add it to flowgraph in the future.

The call protocol itself, PendingRequestMap, CallHandler, buildEnv dual-mode, call graph auto-tracking, and reactive workflow execution are in the initial implementation. They're not much code and they prevent the need to bolt on ad-hoc error handling and abort logic in every coordination operation.

Dependencies

@alkdev/flowgraph      # DAG construction, reactive execution, call/operation graphs, type-compat analysis
@alkdev/operations     # Call protocol, PendingRequestMap, CallHandler
@alkdev/pubsub         # Event transport (Redis, WebSocket, Worker)
@alkdev/taskgraph      # Task graph construction and analysis (for task management, not call graphs)

Why both @alkdev/flowgraph and @alkdev/taskgraph? @alkdev/taskgraph is a domain-specific library for task DAG construction with categorical estimates (scope, risk, impact), frontmatter parsing, and task-specific analysis (critical path, bottleneck detection, risk assessment). @alkdev/flowgraph is a general-purpose workflow graph library for call/operation DAGs with ujsx template composition and reactive execution. They both wrap graphology, but serve different domains. The hub uses @alkdev/taskgraph for task management and @alkdev/flowgraph for call graph and operation graph management.

Prior Art

The call protocol was adapted from ade_spoke's call protocol design (which was pubsub-agnostic). The key difference here is that websockets are the primary transport for hub-spoke communication rather than SSE. The call graph and operation graph are now implemented using @alkdev/flowgraph rather than raw graphology, which provides DAG enforcement, type-compatibility analysis, and reactive execution out of the box.