Align storage & architecture specs with published npm libraries
Systematically compared @alkdev/taskgraph, @alkdev/operations, and
@alkdev/flowgraph against storage/arch specs and fixed all mismatches.
Key changes:
Tasks (storage/tasks.md + ADR-011):
- Rename TaskFrontmatter → TaskInput to match library export
- Fix dependsOn (was depends_on) in field mappings — library uses
camelCase; parseFrontmatter normalizes YAML snake_case on input
- Document DependencyEdge shape {from, to, qualityRetention?} and
DB↔library field mapping
- Document graph node vs DB column distinction (TaskGraphNodeAttrs
is a subset of TaskInput)
- Fix default risk fallback from low → medium (matches resolveDefaults)
- Fix cross-project guard column references (dependentTaskId, not taskId)
- Clarify @alkdev/taskgraph TS is source of truth; frontmatter is for
LLM output parsing and legacy imports, not Rust CLI
- Add complete library exports reference
Operations (storage/spokes.md + operations.md):
- Add version, title, _meta columns to operations table (required by
OperationSpec, were missing)
- Fix type casing: query/mutation/subscription (lowercase, matching
OperationType runtime values)
- Make outputSchema and accessControl NOT NULL (matching library)
- Document ErrorDefinition shape {code, description, schema, httpStatus?}
- Document _meta vs commonCols.metadata distinction
- Add registerAll, get, getHandler, getByName, list, subscribe methods
- Fix buildCallHandler signature ({ registry, callMap })
- Fix OperationType values (lowercase)
Call graph (storage/call-graph.md + call-graph.md):
- Change operationId to NOT NULL with RESTRICT FK (was nullable/SET NULL)
— matches flowgraph's required CallNodeAttrs.operationId
- Document sentinel __removed__ operation strategy for deletions
- Document ISO 8601 string ↔ timestamptz conversion requirement
- Rewrite CallEventMap to match actual library: flat dot-notation keys,
timestamp on all events, nested error structure, optional output on
completed event
- Remove call.running event (doesn't exist in library) — hub calls
updateStatus(running) directly on dispatch
- Fix buildCallHandler({ registry, callMap }) signature
- Fix PendingRequestMap constructor (positional EventTarget)
- Add updateCall/removeCall/graph methods to API summary
- Document abort cascade as hub logic, not flowgraph logic
- Add open questions for operation deletion and reactive vs call graph
semantics
Table reference (storage/table-reference.md):
- Update call_graph_nodes.operationId cascade to RESTRICT
- Update operations.type comment to lowercase
- Update status enum reference
This commit is contained in:
@@ -1,6 +1,6 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-05-22
|
||||
last_updated: 2026-05-25
|
||||
---
|
||||
|
||||
# Call Protocol, Call Graph & Operation Graph
|
||||
@@ -44,54 +44,78 @@ This means `call` is semantically `subscribe().next()` — a subscription that c
|
||||
|
||||
## Call Event Types
|
||||
|
||||
All communication flows through typed events:
|
||||
|
||||
> The call event TypeBox schemas are defined in `@alkdev/operations` as `CallEventSchema`. The shape shown here is the current design; verify against the package source for any minor differences.
|
||||
All communication flows through typed events. The call event TypeBox schemas are defined in `@alkdev/operations` as `CallEventSchema` (flat dot-notation keys like `"call.requested"`). The shapes below match the library's actual event types.
|
||||
|
||||
```ts
|
||||
import { Type } from "@alkdev/typebox"
|
||||
|
||||
export const CallEventMap = {
|
||||
call: {
|
||||
requested: Type.Object({
|
||||
requestId: Type.String(),
|
||||
operationId: Type.String(),
|
||||
input: Type.Unknown(),
|
||||
parentRequestId: Type.Optional(Type.String()),
|
||||
deadline: Type.Optional(Type.Number()),
|
||||
identity: Type.Optional(Type.Object({
|
||||
id: Type.String(),
|
||||
scopes: Type.Array(Type.String()),
|
||||
resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String())))
|
||||
}))
|
||||
}),
|
||||
responded: Type.Object({
|
||||
requestId: Type.String(),
|
||||
output: Type.Unknown() // ResponseEnvelope from @alkdev/operations
|
||||
}),
|
||||
completed: Type.Object({
|
||||
requestId: Type.String()
|
||||
}),
|
||||
aborted: Type.Object({
|
||||
requestId: Type.String()
|
||||
}),
|
||||
error: Type.Object({
|
||||
requestId: Type.String(),
|
||||
code: Type.String(),
|
||||
message: Type.String(),
|
||||
details: Type.Optional(Type.Unknown())
|
||||
})
|
||||
}
|
||||
} as const
|
||||
// CallEventSchema uses flat dot-notation keys (not nested objects):
|
||||
// "call.requested", "call.responded", "call.completed", "call.aborted", "call.error"
|
||||
// The shapes below show the event payload for each type.
|
||||
|
||||
// call.requested
|
||||
Type.Object({
|
||||
type: Type.Literal("call.requested"),
|
||||
requestId: Type.String(),
|
||||
operationId: Type.String(),
|
||||
input: Type.Unknown(),
|
||||
timestamp: Type.String(), // ISO 8601 — used as source for startedAt/completedAt
|
||||
parentRequestId: Type.Optional(Type.String()),
|
||||
identity: Type.Optional(Type.Object({
|
||||
id: Type.String(),
|
||||
scopes: Type.Array(Type.String()),
|
||||
resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String())))
|
||||
})),
|
||||
startedAt: Type.Optional(Type.String()), // ISO 8601 — if provided, overrides timestamp for startedAt
|
||||
})
|
||||
|
||||
// call.responded
|
||||
Type.Object({
|
||||
type: Type.Literal("call.responded"),
|
||||
requestId: Type.String(),
|
||||
output: Type.Unknown(), // ResponseEnvelope from @alkdev/operations
|
||||
timestamp: Type.String(), // ISO 8601
|
||||
})
|
||||
|
||||
// call.completed
|
||||
Type.Object({
|
||||
type: Type.Literal("call.completed"),
|
||||
requestId: Type.String(),
|
||||
output: Type.Optional(Type.Unknown()), // Optional final output
|
||||
timestamp: Type.String(), // ISO 8601
|
||||
})
|
||||
|
||||
// call.aborted
|
||||
Type.Object({
|
||||
type: Type.Literal("call.aborted"),
|
||||
requestId: Type.String(),
|
||||
timestamp: Type.String(), // ISO 8601
|
||||
})
|
||||
|
||||
// call.error
|
||||
Type.Object({
|
||||
type: Type.Literal("call.error"),
|
||||
requestId: Type.String(),
|
||||
error: Type.Object({ // Error is nested under "error" key
|
||||
code: Type.String(),
|
||||
message: Type.String(),
|
||||
details: Type.Optional(Type.Unknown()),
|
||||
}),
|
||||
timestamp: Type.String(), // ISO 8601
|
||||
})
|
||||
```
|
||||
|
||||
**Note on `deadline`**: The `call.requested` event above does not include a `deadline` field. Deadlines are a `PendingRequestMap` concept (timeouts are applied at the protocol layer, not persisted in the call graph). If a call times out, the `PendingRequestMap` emits a `call.error` with code `TIMEOUT`.
|
||||
|
||||
**Note: no `call.running` event**: The library's `CallEventMapValue` union only has 5 event types. There is no `call.running` event. When the hub dispatches an operation handler, it calls `flowGraph.updateStatus(requestId, "running", { startedAt: now.toISOString() })` directly. This is a hub-initiated state transition, not an event. See [Write Path](#write-path) for details.
|
||||
|
||||
### Event Semantics
|
||||
|
||||
- **`call.requested`** — Initiates a call. Creates a call graph node (status: `pending`) and adds a `triggered` edge if `parentRequestId` is present.
|
||||
- **`call.responded`** — Carries the call result. For one-shot calls, this is the terminal event that resolves the `Promise<ResponseEnvelope>`. The `output` field contains a `ResponseEnvelope` (with `data` and `meta` fields) from `@alkdev/operations`.
|
||||
- **`call.completed`** — Terminal completion signal, idempotent if `call.responded` was already received. For subscriptions, fires after the last `call.responded` to signal stream end. For one-shot calls, the `PendingRequestMap` may emit `call.completed` as a separate event or as part of `call.responded` processing. In flowgraph, this event fills `completedAt` if it was not already set.
|
||||
- **`call.aborted`** — Call was cancelled. Sets status to `aborted` and cascades to children.
|
||||
- **`call.error`** — Call failed with an error. Sets status to `failed` and stores the error.
|
||||
- **`call.requested`** — Initiates a call. Creates a call graph node (status: `pending`) and adds a `triggered` edge if `parentRequestId` is present. If `startedAt` is provided in the event, it overrides `timestamp` for the node's `startedAt`.
|
||||
- **`call.responded`** — Carries the call result. For one-shot calls, this is the terminal event that resolves the `Promise<ResponseEnvelope>`. The `output` field contains a `ResponseEnvelope` (with `data` and `meta` fields) from `@alkdev/operations`. Sets `status: "completed"` and `completedAt: event.timestamp`.
|
||||
- **`call.completed`** — Terminal completion signal, idempotent if `call.responded` was already received. For subscriptions, fires after the last `call.responded` to signal stream end. May include optional `output`. Sets `completedAt: event.timestamp` if not already set.
|
||||
- **`call.aborted`** — Call was cancelled. Sets status to `aborted` and `completedAt: event.timestamp`. **Note**: flowgraph's `updateFromEvent()` does NOT cascade aborts to children — the hub's `CallHandler` is responsible for cascading `call.aborted` to descendant calls via the pubsub layer.
|
||||
- **`call.error`** — Call failed with an error. Sets status to `failed`, stores the error (nested under `error: { code, message, details? }`), and sets `completedAt: event.timestamp`.
|
||||
|
||||
**Note on `@alkdev/flowgraph`**: The `CallEventMapValue` type in `@alkdev/flowgraph/schema` defines the union of these event types. Flowgraph's `FlowGraph.fromCallEvents()` and `updateFromEvent()` consume these events directly to populate the call graph. The `CallStatus` enum in flowgraph (`pending`, `running`, `completed`, `failed`, `aborted`) aligns with the statuses in the call protocol events.
|
||||
|
||||
@@ -133,8 +157,8 @@ Manages in-flight requests and provides the `call()` interface:
|
||||
// From @alkdev/operations
|
||||
import { PendingRequestMap } from "@alkdev/operations"
|
||||
|
||||
// Construction — takes optional EventTarget for pluggable transport
|
||||
const prm = new PendingRequestMap({ eventTarget })
|
||||
// Construction — takes optional EventTarget for pluggable transport (positional, not options object)
|
||||
const prm = new PendingRequestMap(eventTarget?)
|
||||
|
||||
// Call protocol — call() returns Promise<ResponseEnvelope>
|
||||
const envelope = await prm.call(operationId, input, { deadline, identity })
|
||||
@@ -158,7 +182,7 @@ prm.abort(requestId)
|
||||
- `subscribe()` returns `AsyncIterable<ResponseEnvelope>`
|
||||
- `respond()` requires `isResponseEnvelope(output)`
|
||||
- Built-in deadline and idle timeout support
|
||||
- Constructor takes optional `EventTarget` for pluggable transport
|
||||
- Constructor takes optional `EventTarget` for pluggable transport (positional parameter, not options object)
|
||||
|
||||
## CallHandler
|
||||
|
||||
@@ -167,7 +191,8 @@ Bridges pubsub events to `OperationRegistry.execute()`. Performs access control
|
||||
```ts
|
||||
import { buildCallHandler } from "@alkdev/operations"
|
||||
|
||||
const handler = buildCallHandler({ registry, eventTarget })
|
||||
const handler = buildCallHandler({ registry, callMap })
|
||||
// callMap: PendingRequestMap instance (not raw EventTarget)
|
||||
// subscribes to call.requested events
|
||||
// checks access control (requiredScopes, resource permissions) against Identity
|
||||
// executes via registry, dispatches call.responded on success
|
||||
@@ -282,11 +307,11 @@ The call graph is populated by `FlowGraph.fromCallEvents(events)` or incremental
|
||||
| Event | Graph Mutation |
|
||||
|-------|---------------|
|
||||
| `call.requested` | `addCall(attrs)` — creates node (status: `pending`) + `triggered` edge if `parentRequestId` present |
|
||||
| `call.responded` | `updateCall(requestId, { status: "completed", output, completedAt })` |
|
||||
| `call.completed` | `updateCall(requestId, { completedAt })` — idempotent if already responded, sets `completedAt` if missing |
|
||||
| `call.error` | `updateCall(requestId, { status: "failed", error: { code, message, details? } })` |
|
||||
| `call.aborted` | `updateStatus(requestId, "aborted")` + cascade to children |
|
||||
| `call.running` | `updateStatus(requestId, "running")` — when the call starts executing (hub dispatches to handler) |
|
||||
| `call.responded` | `updateFromEvent()` sets `status: "completed"`, `output`, `completedAt: event.timestamp` (idempotent for terminal states) |
|
||||
| `call.completed` | Sets `completedAt: event.timestamp` if not already set. Idempotent — if already `completed`, no-op. May also set `output` if present in event. |
|
||||
| `call.error` | `updateFromEvent()` sets `status: "failed"`, `error: { code, message, details? }`, `completedAt: event.timestamp` |
|
||||
| `call.aborted` | `updateFromEvent()` sets `status: "aborted"`, `completedAt: event.timestamp`. **No cascade**: `updateFromEvent()` does not abort children — the hub's `CallHandler` handles cascading. |
|
||||
| _(hub-initiated)_ | `updateStatus(requestId, "running", { startedAt: now.toISOString() })` — **Not an event.** The hub's `CallHandler` calls `updateStatus()` directly when it dispatches the operation handler, transitioning `pending` → `running`. |
|
||||
|
||||
### Call Status State Machine
|
||||
|
||||
@@ -339,24 +364,29 @@ const callGraph = FlowGraph.fromCallEvents(storedEvents)
|
||||
const callGraph = new FlowGraph(CallNodeAttrs, CallEdgeAttrs)
|
||||
|
||||
// Process events
|
||||
callGraph.updateFromEvent(event) // handles all call.* event types
|
||||
callGraph.updateFromEvent(event) // handles all 5 call.* event types
|
||||
|
||||
// Status management
|
||||
callGraph.updateStatus(requestId, "running") // validates state machine transition
|
||||
callGraph.updateStatus(requestId, "completed") // throws if not currently "running"
|
||||
callGraph.updateStatus(requestId, "running", { startedAt: now.toISOString() }) // hub-initiated, not event-driven
|
||||
callGraph.updateStatus(requestId, "completed") // validates state machine transition
|
||||
|
||||
// Edge management
|
||||
callGraph.addCall({ requestId, operationId, status: "pending", parentRequestId?, input?, identity? })
|
||||
// Call management
|
||||
callGraph.addCall({ requestId, operationId, status: "pending", parentRequestId?, input?, identity?, startedAt? })
|
||||
callGraph.updateCall(requestId, attrs) // partial update of any CallNodeAttrs
|
||||
callGraph.removeCall(requestId)
|
||||
callGraph.addDependency(sourceRequestId, targetRequestId) // depends_on edge
|
||||
|
||||
// Queries
|
||||
callGraph.children(requestId) // direct children via triggered edges
|
||||
callGraph.descendants(requestId) // all descendants
|
||||
callGraph.descendants(nodeId) // all descendants
|
||||
callGraph.lineage(requestId) // ancestor chain from root to this call
|
||||
callGraph.getRoots() // calls with no parentRequestId
|
||||
callGraph.filterByStatus("running") // all running calls
|
||||
callGraph.duration(requestId) // completedAt - startedAt in ms
|
||||
|
||||
// Escape hatch
|
||||
callGraph.graph // raw graphology DirectedGraph
|
||||
|
||||
// Serialization (for Postgres persistence)
|
||||
const data = callGraph.export() // -> CallGraphSerialized
|
||||
const restored = FlowGraph.fromJSON(data)
|
||||
@@ -417,11 +447,11 @@ The storage layer persists individual `call_graph_nodes` and `call_graph_edges`
|
||||
The hub's `CallHandler` is responsible for writing call graph data to Postgres. When a call protocol event arrives:
|
||||
|
||||
1. **`call.requested`**: The `CallHandler` creates a row in `call_graph_nodes` (status: `pending`) and, if `parentRequestId` is present, a `triggered` edge in `call_graph_edges`. This write happens **synchronously before dispatching** to ensure the call is tracked even if the handler fails immediately.
|
||||
2. **`call.responded`**: Updates the node's status to `completed`, sets `output` (unwrapped from the `ResponseEnvelope` — only `data` is stored, not `meta`), and sets `completedAt`.
|
||||
3. **`call.error`**: Updates status to `failed`, sets `error`, and sets `completedAt`.
|
||||
4. **`call.aborted`**: Updates status to `aborted` and sets `completedAt`. The hub then cascades the abort to child calls.
|
||||
5. **`call.completed`**: Sets `completedAt` if not already set. Idempotent — no-op if the call is already `completed`.
|
||||
6. **`call.running`**: Updates status from `pending` to `running` and sets `startedAt`.
|
||||
2. **Handler dispatch**: The `CallHandler` dispatches the operation handler. At this point it updates the node status to `running` and sets `startedAt` — this is **not** an event, but a hub-initiated `updateStatus()` call on both the in-memory flowgraph and the DB row.
|
||||
3. **`call.responded`**: Updates the node's status to `completed`, sets `output` (unwrapped from the `ResponseEnvelope` — only `data` is stored, not `meta`), and sets `completedAt`.
|
||||
4. **`call.error`**: Updates status to `failed`, sets `error`, and sets `completedAt`.
|
||||
5. **`call.aborted`**: Updates status to `aborted` and sets `completedAt`. The hub's `CallHandler` then cascades the abort to child calls (this is hub logic, not flowgraph's `updateFromEvent()`).
|
||||
6. **`call.completed`**: Sets `completedAt` if not already set. Idempotent — no-op if the call is already `completed`.
|
||||
|
||||
Error handling: If a DB write fails, the call still proceeds (the handler has already been invoked). The hub logs the write failure and continues. Call graph data is best-effort — the in-memory flowgraph is the authoritative source for running calls; the DB is for persistence and observability.
|
||||
|
||||
@@ -440,16 +470,16 @@ When reconstructing a flowgraph from the database, the hub uses `requestId` as t
|
||||
|--------|------|-------|
|
||||
| commonCols | — | id, metadata, createdAt, updatedAt |
|
||||
| requestId | text NOT NULL UNIQUE | Protocol-level correlation key. Also the flowgraph node key. |
|
||||
| operationId | text | FK → operations.id. Nullable — survives operation removal. |
|
||||
| operationId | text NOT NULL | FK → operations.id (RESTRICT). NOT NULL — `CallNodeAttrs.operationId` is required in flowgraph. |
|
||||
| parentRequestId | text | Denormalized parent — fast point lookup. Redundant with `triggered` edge. |
|
||||
| identity | jsonb | Caller identity: `{ id, scopes, resources }` |
|
||||
| identity | jsonb | Caller identity: `{ id, scopes, resources? }` |
|
||||
| callerAccountId | text | FK → accounts.id (ON DELETE SET NULL). System calls are nullable. |
|
||||
| status | text NOT NULL | Matches `CallStatus` enum: `pending`, `running`, `completed`, `failed`, `aborted` |
|
||||
| input | jsonb | Call input (redacted, truncated — see storage/call-graph.md) |
|
||||
| output | jsonb | Call output (on success) |
|
||||
| error | jsonb | `{ code, message, details? }` (on failure) |
|
||||
| startedAt | timestamp with tz | When call was dispatched (maps to flowgraph `startedAt`) |
|
||||
| completedAt | timestamp with tz | When call completed/failed/aborted (maps to flowgraph `completedAt`) |
|
||||
| error | jsonb | `{ code, message, details? }` (on failure, nested under `error` key matching flowgraph) |
|
||||
| startedAt | timestamp with tz | When call was dispatched. **Type conversion**: flowgraph stores as ISO 8601 string; DB stores as `timestamptz`. |
|
||||
| completedAt | timestamp with tz | When call completed/failed/aborted. Same type conversion as `startedAt`. |
|
||||
|
||||
### `call_graph_edges` — Typed directed edges between calls
|
||||
|
||||
@@ -485,6 +515,12 @@ A `WebSocketEventTarget` implementing `TypedEventTarget` makes each spoke runner
|
||||
|
||||
The call protocol itself, `PendingRequestMap`, `CallHandler`, `buildEnv` dual-mode, call graph auto-tracking, and reactive workflow execution are **in the initial implementation**. They're not much code and they prevent the need to bolt on ad-hoc error handling and abort logic in every coordination operation.
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Operation deletion and call graph referential integrity**: The `call_graph_nodes.operationId` column has a RESTRICT FK to `operations.id`. An operation cannot be deleted while any call records reference it. For v1, the strategy is to deny removal while call records exist. If operation removal becomes necessary (e.g., cleanup of old operations), the hub would need to either: (a) reassign all referencing call records to a sentinel `__removed__` operation (pre-seeded in migrations with `id='__removed__'`, `namespace='system'`), then delete the original operation, or (b) accept that historical call records reference operations that may have been provided by disconnected spokes — in which case, consider making `operationId` nullable in flowgraph's `CallNodeAttrs` so the hub can NULL the FK instead of requiring a sentinel row. This requires coordination with the `@alkdev/flowgraph` package.
|
||||
|
||||
2. **Reactive vs. call graph `requested` semantics**: In `FlowGraph`, `call.requested` creates a node in `pending` state. In `WorkflowReactiveRoot`, `call.requested` maps to `NodeStatus.running` (the reactive model assumes a template node starts executing when requested). This is a deliberate semantic difference — the reactive model tracks execution progress, while the call graph model tracks protocol state. The spec documents this, but implementers should be aware that feeding the same event to both models produces different initial statuses.
|
||||
|
||||
## Dependencies
|
||||
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user