flowgraph/docs/architecture/schema.md
glm-5.1 da2973e2a6 fix build/distribution spec: npm deps not workspace, align configs with sibling projects, resolve review issues
- Replace workspace:* deps with published npm semver ranges (^0.34.49, ^0.1.0)
- Expand package.json: add description, publishConfig, scripts, engines,
  devDependencies, conditional exports with types/default for import+require
- Fix tsup entry names (path-prefixed like ujsx), add target: es2022,
  remove splitting:true (not used by sibling projects)
- Align tsconfig with sibling projects: add lib, noUncheckedIndexedAccess,
  noUnusedLocals, noUnusedParameters, erasableSyntaxOnly, etc.
- Expand vitest.config.ts with include, coverage, and path alias
- Clarify @preact/signals-core as direct dep (not just transitive via ujsx)
- Clarify @alkdev/pubsub is a consumer dependency, not flowgraph's dep
- Fix edge key convention: document composite key format for call graph's
  multi-edge-type scenario (triggered + depends_on between same pair)
- Align OperationEdgeAttrs field naming: use detail+mismatches consistently
  instead of compatibilityDetail
- Add InvalidInputError to error hierarchy (referenced in flowgraph-api but
  was missing)
- Fix undefined attrs.category reference in reactive-execution.md
- Remove internal drafting note from host-configs.md
- Fix ReactiveHostConfig constructor signature inconsistency across docs
- Constrain TemplateEdgeAttrs.edgeType to sequential|conditional only
2026-05-20 03:09:57 +00:00

---
status: draft
last_updated: 2026-05-20
---
# Schema
TypeBox Module, TypeScript types, categorical enums, node/edge attribute schemas, and the design decisions behind them.
## Overview
Flowgraph's schema layer follows the same pattern as taskgraph: TypeBox schemas are the single source of truth for both runtime validation and TypeScript type derivation. All data shapes are defined as TypeBox schemas, with `Static<typeof Schema>` producing the corresponding TypeScript types.
The schema is organized around two distinct graph types (operation graph and call graph) plus shared enums and the serialized graph factory.
## Design Decision: TypeBox as Single Source of Truth
Identical to taskgraph's approach:
1. **Static TypeScript types** via `Static<typeof Schema>` — every schema constant has a corresponding `type X = Static<typeof X>` alias
2. **Runtime validation** via `Value.Check()` / `Value.Errors()` — structured field-level error reporting
3. **JSON Schema export** for consumers that need schema-based contracts
No separate `interface` or `type` definitions outside of `Static<typeof>`. No Zod.
### Naming Convention
| Category | Convention | Example |
|----------|-----------|---------|
| Enum schema constant | PascalCase + `Enum` suffix | `CallStatusEnum` |
| Enum type alias | PascalCase, no suffix | `type CallStatus = Static<typeof CallStatusEnum>` |
| Object schema constant | PascalCase, no suffix | `OperationNodeAttrs`, `CallNodeAttrs` |
| Object type alias | Same name as schema constant | `type OperationNodeAttrs = Static<typeof OperationNodeAttrs>` |
| Serialized graph schemas | PascalCase + `Serialized` suffix | `FlowGraphSerialized`, `OperationGraphSerialized` |
| Factory function | PascalCase | `SerializedGraph(NodeAttrs, EdgeAttrs, GraphAttrs)` |
### Nullable Helper
Same `Nullable` helper as taskgraph:
```typescript
import { Type, type TSchema } from "@sinclair/typebox";

const Nullable = <T extends TSchema>(schema: T) => Type.Union([schema, Type.Null()]);
```
Used for fields that can be explicitly set to `null` (distinct from absent).
## Enums
### CallStatus
The lifecycle states of a call invocation. Matches the call graph storage schema in `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`.
```typescript
const CallStatusEnum = Type.Union([
  Type.Literal("pending"),   // Call requested, not yet dispatched
  Type.Literal("running"),   // Handler executing
  Type.Literal("completed"), // Successfully finished (call.responded + call.completed)
  Type.Literal("failed"),    // Handler threw or call.error emitted
  Type.Literal("aborted"),   // call.aborted emitted (parent cancelled, deadline exceeded)
]);
type CallStatus = Static<typeof CallStatusEnum>;
```
Transitions:
```
pending → running → completed
                  → failed
                  → aborted
pending → aborted
```
- `pending → running`: Handler starts executing
- `running → completed`: `call.responded` + `call.completed` received
- `running → failed`: `call.error` received
- `pending → aborted`: `call.aborted` received before handler started (e.g., deadline exceeded)
- `running → aborted`: `call.aborted` received during execution (parent cancelled)
`completed`, `failed`, and `aborted` are terminal states — no further transitions.
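The transition rules can be encoded as a lookup table, as one way these rules might be enforced. `CALL_TRANSITIONS`, `canTransition`, and `isTerminal` are hypothetical names, not part of flowgraph's documented API. `CallStatus` is redeclared as a plain string union so the sketch runs without TypeBox.

```typescript
// Plain-TypeScript mirror of CallStatusEnum (illustration only).
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Hypothetical transition table transcribed from the list above.
// Terminal states map to an empty list of successors.
const CALL_TRANSITIONS: Record<CallStatus, readonly CallStatus[]> = {
  pending: ["running", "aborted"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

function canTransition(from: CallStatus, to: CallStatus): boolean {
  return CALL_TRANSITIONS[from].includes(to);
}

function isTerminal(status: CallStatus): boolean {
  return CALL_TRANSITIONS[status].length === 0;
}
```

With a table, the terminal-state rule is explicit: a status is terminal exactly when it has no successors.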
### NodeStatus
A derived status for workflow template nodes. While `CallStatus` tracks individual call invocations, `NodeStatus` reflects the template-level view:
```typescript
const NodeStatusEnum = Type.Union([
  Type.Literal("idle"),      // Not started, no call yet
  Type.Literal("waiting"),   // Preconditions not met, waiting for upstream
  Type.Literal("ready"),     // Preconditions met, eligible to start
  Type.Literal("running"),   // Call in progress
  Type.Literal("completed"), // Call completed successfully
  Type.Literal("failed"),    // Call failed
  Type.Literal("skipped"),   // Conditional branch not taken
  Type.Literal("aborted"),   // Call aborted
]);
type NodeStatus = Static<typeof NodeStatusEnum>;
```
`NodeStatus` is not a superset of `CallStatus`. It shares `running`, `completed`, `failed`, and `aborted`, omits `pending`, and adds workflow-specific states (`idle`, `waiting`, `ready`, `skipped`) that have no call protocol equivalent. A node that is `waiting` has no call yet because its preconditions haven't been met.
**Precondition semantics**: A predecessor in `completed` or `skipped` status satisfies a dependent's preconditions. A predecessor in `failed` or `aborted` status does NOT satisfy preconditions — it blocks the dependent and triggers failure propagation (the dependent transitions to `aborted`). This enables partial success: independent parallel branches continue running even when one branch fails.
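The precondition rule can be sketched as a pure function over a dependent's predecessor statuses. `resolveDependent` is a hypothetical helper. It assumes that a failed or aborted predecessor aborts the dependent immediately, even while other predecessors are still running. That is one reading of "blocks the dependent", not a documented guarantee.

```typescript
// Plain-TypeScript mirror of NodeStatusEnum (illustration only).
type NodeStatus =
  | "idle"
  | "waiting"
  | "ready"
  | "running"
  | "completed"
  | "failed"
  | "skipped"
  | "aborted";

// Hypothetical helper: the status a dependent should move to, given its predecessors.
function resolveDependent(predecessors: readonly NodeStatus[]): "ready" | "waiting" | "aborted" {
  // A failed or aborted predecessor blocks the dependent and propagates failure.
  if (predecessors.some((s) => s === "failed" || s === "aborted")) {
    return "aborted";
  }
  // Completed and skipped predecessors both satisfy preconditions;
  // a node with no predecessors is trivially ready.
  if (predecessors.every((s) => s === "completed" || s === "skipped")) {
    return "ready";
  }
  // At least one predecessor has not reached a terminal state yet.
  return "waiting";
}
```

The function only inspects a node's own predecessors. A failure therefore affects only nodes downstream of it, which is what lets independent parallel branches keep running.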
### CallResult
The result of a completed call, used by `Conditional.test` and `Map.over` to access predecessor outputs:
```typescript
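const CallResult = Type.Object({
  status: NodeStatusEnum,                   // Status of the call (completed, failed, aborted, skipped)
  output: Type.Unknown(),                   // Call output (if completed)
  error: Type.Optional(Type.Object({        // Call error (if failed)
    code: Type.String(),
    message: Type.String(),
    details: Type.Optional(Type.Unknown()),
  })),
});
type CallResult = Static<typeof CallResult>;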
```
`CallResult` is the value in the `results` map passed to `Conditional.test` and `Map.over` functions. It's derived from `CallNodeAttrs` but simplified for template use — it omits `requestId`, `operationId`, `identity`, and timestamps, preserving only what template logic needs.
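A minimal sketch of that projection follows. It uses plain-TypeScript mirrors of `CallNodeAttrs` and `CallResult` so it runs without TypeBox, and the `identity` mirror is abbreviated. `toCallResult` is a hypothetical helper. It assumes only finished calls (`completed`, `failed`, `aborted`) are exposed to template logic. Under that assumption, `skipped` results would come from the template runtime rather than from a call node.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";
type NodeStatus = "idle" | "waiting" | "ready" | "running" | "completed" | "failed" | "skipped" | "aborted";

interface CallError {
  code: string;
  message: string;
  details?: unknown;
}

// Mirrors of the schemas (illustration only; the real types come from Static<typeof ...>).
interface CallNodeAttrs {
  requestId: string;
  operationId: string;
  status: CallStatus;
  parentRequestId?: string;
  input: unknown;
  output?: unknown;
  error?: CallError;
  identity?: { id: string; scopes: string[] };
  startedAt?: string;
  completedAt?: string;
}

interface CallResult {
  status: NodeStatus;
  output: unknown;
  error?: CallError;
}

// Hypothetical projection: drops requestId, operationId, identity, and timestamps.
function toCallResult(node: CallNodeAttrs): CallResult | undefined {
  const { status } = node;
  // Assumption: calls that are still in flight have no result yet.
  if (status === "pending" || status === "running") {
    return undefined;
  }
  const result: CallResult = { status, output: node.output };
  if (node.error !== undefined) {
    result.error = node.error;
  }
  return result;
}
```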
### OperationTypeEnum
The type of an operation, determining its call semantics:
```typescript
const OperationTypeEnum = Type.Union([
  Type.Literal("query"),        // Read-only, idempotent
  Type.Literal("mutation"),     // Side effects, not idempotent
  Type.Literal("subscription"), // Streaming, produces multiple results
]);
type OperationType = Static<typeof OperationTypeEnum>;
```
This enum is used in `OperationNodeAttrs.type` to classify operations by their call behavior.
### CallEventMapValue
`CallEventMapValue` is imported from `@alkdev/operations` (peer dependency). It represents a single call protocol event — the union type of all event types (`CallRequestedEvent | CallRespondedEvent | CallErrorEvent | CallAbortedEvent | CallCompletedEvent`). The full definition lives in `@alkdev/operations/src/call.ts`.
Flowgraph's `fromCallEvents()` and `updateFromEvent()` accept this type directly. The mapping from `CallEventMapValue` to `CallNodeAttrs` is:
| Event type | Action |
|------------|--------|
| `call.requested` | Add node with `status: "pending"`, add `triggered` edge if `parentRequestId` present |
| `call.responded` | Update node status to `completed`, set `output` and `completedAt` |
| `call.error` | Update node status to `failed`, set `error` and `completedAt` |
| `call.aborted` | Update node status to `aborted`, set `completedAt` |
| `call.completed` | Update node status to `completed`, set `completedAt` (if not already set) |
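The mapping table can be read as a reducer over call events. The sketch below is a hypothetical, dependency-free version that stores nodes and edges in plain `Map`s instead of a graphology graph. The `CallEvent` shape (field names such as `type`, `requestId`, and `timestamp`) is an assumption, because the real `CallEventMapValue` union is defined in `@alkdev/operations`. The reducer applies the table rows literally and uses the simple `${source}->${target}` key for `triggered` edges.

```typescript
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";
type CallError = { code: string; message: string; details?: unknown };

// Assumed event shape; the real union lives in @alkdev/operations.
type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string; input: unknown; parentRequestId?: string; timestamp: string }
  | { type: "call.responded"; requestId: string; output: unknown; timestamp: string }
  | { type: "call.error"; requestId: string; error: CallError; timestamp: string }
  | { type: "call.aborted"; requestId: string; timestamp: string }
  | { type: "call.completed"; requestId: string; timestamp: string };

interface CallNode {
  requestId: string;
  operationId: string;
  status: CallStatus;
  parentRequestId?: string;
  input: unknown;
  output?: unknown;
  error?: CallError;
  completedAt?: string;
}

interface CallGraphState {
  nodes: Map<string, CallNode>;
  edges: Map<string, { source: string; target: string; edgeType: "triggered" }>;
}

function createCallGraphState(): CallGraphState {
  return { nodes: new Map(), edges: new Map() };
}

// Hypothetical reducer: one branch per row of the mapping table.
function applyEvent(state: CallGraphState, event: CallEvent): void {
  if (event.type === "call.requested") {
    const node: CallNode = { requestId: event.requestId, operationId: event.operationId, status: "pending", input: event.input };
    if (event.parentRequestId !== undefined) {
      node.parentRequestId = event.parentRequestId;
      // Parent -> child edge, keyed with the simple source->target format.
      state.edges.set(`${event.parentRequestId}->${event.requestId}`, {
        source: event.parentRequestId,
        target: event.requestId,
        edgeType: "triggered",
      });
    }
    state.nodes.set(event.requestId, node);
    return;
  }
  const node = state.nodes.get(event.requestId);
  if (node === undefined) return; // Events for unknown calls are ignored in this sketch.
  switch (event.type) {
    case "call.responded":
      node.status = "completed";
      node.output = event.output;
      node.completedAt = event.timestamp;
      break;
    case "call.error":
      node.status = "failed";
      node.error = event.error;
      node.completedAt = event.timestamp;
      break;
    case "call.aborted":
      node.status = "aborted";
      node.completedAt = event.timestamp;
      break;
    case "call.completed":
      node.status = "completed";
      if (node.completedAt === undefined) node.completedAt = event.timestamp;
      break;
  }
}
```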
### EdgeType
The type of edge in a flowgraph. Matches the call graph storage schema's `edgeType` column. This is a universal enum that covers all graph modes (operation, call, template), but each graph mode uses only a subset:
```typescript
const EdgeTypeEnum = Type.Union([
  Type.Literal("triggered"),   // Source caused target to execute (parent→child in call hierarchy)
  Type.Literal("depends_on"),  // Source requires target's result before it can complete (data dependency)
  Type.Literal("typed"),       // Type compatibility edge (output schema A → input schema B)
  Type.Literal("sequential"),  // Sequential flow edge (template: <Sequential> ordering)
  Type.Literal("conditional"), // Conditional flow edge (template: <Conditional> branch)
]);
type EdgeType = Static<typeof EdgeTypeEnum>;
```
| Edge Type | Graph Mode | Meaning |
|-----------|------------|---------|
| `triggered` | Call graph | Parent call triggered child call. Corresponds to `parentRequestId`. |
| `depends_on` | Call graph | Data dependency — source needs target's result. |
| `typed` | Operation graph | Type compatibility — source's output schema is compatible with target's input schema. |
| `sequential` | Template DAG | Sequential ordering from `<Sequential>` component. |
| `conditional` | Template DAG | Conditional branch from `<Conditional>` component. |
`EdgeTypeEnum` is the universal enumeration. Each graph mode constrains its edge types through its specific edge attribute schemas:
- **Operation graphs** only use `typed` edges (`OperationEdgeAttrs`)
- **Call graphs** use `triggered` and `depends_on` edges (`CallEdgeAttrs`)
- **Template DAGs** use `sequential` and `conditional` edges (`TemplateEdgeAttrs`)
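One way to check these per-mode subsets when an edge is inserted is a lookup from mode to permitted edge types. `GraphMode`, `EDGE_TYPES_BY_MODE`, and `isEdgeTypeAllowed` are hypothetical names used only for this sketch. The document itself expresses the subsets through the per-mode edge attribute schemas.

```typescript
// Plain-TypeScript mirror of EdgeTypeEnum (illustration only).
type EdgeType = "triggered" | "depends_on" | "typed" | "sequential" | "conditional";

// Hypothetical name for the three graph modes described above.
type GraphMode = "operation" | "call" | "template";

// Permitted edge types per mode, transcribed from the list above.
const EDGE_TYPES_BY_MODE: Record<GraphMode, ReadonlySet<EdgeType>> = {
  operation: new Set<EdgeType>(["typed"]),
  call: new Set<EdgeType>(["triggered", "depends_on"]),
  template: new Set<EdgeType>(["sequential", "conditional"]),
};

function isEdgeTypeAllowed(mode: GraphMode, edgeType: EdgeType): boolean {
  return EDGE_TYPES_BY_MODE[mode].has(edgeType);
}
```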
## Node Attribute Schemas
### OperationNodeAttrs
Attributes for nodes in the operation graph. Derived from `OperationSpec` but carrying only graph-relevant data:
```typescript
const OperationNodeAttrs = Type.Object({
  name: Type.String(),          // Operation name (e.g., "classify")
  namespace: Type.String(),     // Namespace (e.g., "task")
  version: Type.String(),       // Semantic version
  type: OperationTypeEnum,      // "query" | "mutation" | "subscription"
  inputSchema: Type.Unknown(),  // JSON Schema for input (TypeBox schema)
  outputSchema: Type.Unknown(), // JSON Schema for output (TypeBox schema)
  description: Type.Optional(Type.String()),
  tags: Type.Optional(Type.Array(Type.String())),
});
type OperationNodeAttrs = Static<typeof OperationNodeAttrs>;
```
The node key is `namespace.name` (e.g., `"task.classify"`), matching the `operationId` format used in the call protocol. The full `OperationSpec` is not stored on the graph — `accessControl`, `errorSchemas`, and `handler` belong to the registry, not the graph.
**Why `inputSchema` and `outputSchema` on the graph**: These are needed for type-compatibility edge construction. An edge from operation A to operation B exists if A's `outputSchema` is compatible with B's `inputSchema`. Storing the schemas on the node avoids a round-trip to the registry during graph queries.
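The sketch below shows how the stored schemas could drive edge construction. It derives node keys as `namespace.name` and emits one `typed` edge per ordered pair of distinct operations, recording compatible and incompatible results alike. This document does not give the signature of `typeCompat()`, so the sketch takes the compatibility check as a parameter instead (`CompatCheck`, a hypothetical type). `buildTypedEdges`, `operationKey`, and `CompatResult` are likewise illustrative names.

```typescript
// Minimal mirror of the OperationNodeAttrs fields this sketch needs.
interface OperationNode {
  namespace: string;
  name: string;
  inputSchema: unknown;
  outputSchema: unknown;
}

// Hypothetical compatibility result and checker (stand-ins for typeCompat()).
interface CompatResult {
  compatible: boolean;
  detail?: string;
}
type CompatCheck = (outputSchema: unknown, inputSchema: unknown) => CompatResult;

interface TypedEdge {
  key: string;
  source: string;
  target: string;
  attributes: CompatResult & { edgeType: "typed" };
}

// Node key: namespace.name, matching the call protocol's operationId.
function operationKey(op: OperationNode): string {
  return `${op.namespace}.${op.name}`;
}

function buildTypedEdges(ops: readonly OperationNode[], check: CompatCheck): TypedEdge[] {
  const edges: TypedEdge[] = [];
  for (const a of ops) {
    for (const b of ops) {
      const source = operationKey(a);
      const target = operationKey(b);
      if (source === target) continue; // allowSelfLoops: false
      // A's output feeds B's input, so the edge runs A -> B.
      const result = check(a.outputSchema, b.inputSchema);
      edges.push({ key: `${source}->${target}`, source, target, attributes: { edgeType: "typed", ...result } });
    }
  }
  return edges;
}
```

Because every ordered pair is checked, this naive construction is quadratic in the number of operations.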
### CallNodeAttrs
Attributes for nodes in the call graph. Populated from call events:
```typescript
const CallNodeAttrs = Type.Object({
  requestId: Type.String(),                      // Unique call identifier
  operationId: Type.String(),                    // namespace.name of the operation
  status: CallStatusEnum,                        // Current call status
  parentRequestId: Type.Optional(Type.String()), // Parent call (absent for top-level calls)
  input: Type.Unknown(),                         // Call input
  output: Type.Optional(Type.Unknown()),         // Call output (on completion)
  error: Type.Optional(Type.Object({             // Call error (on failure)
    code: Type.String(),
    message: Type.String(),
    details: Type.Optional(Type.Unknown()),
  })),
  identity: Type.Optional(Type.Object({          // Caller identity
    id: Type.String(),
    scopes: Type.Array(Type.String()),
    resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String()))),
  })),
  startedAt: Type.Optional(Type.String()),       // ISO timestamp when call was dispatched
  completedAt: Type.Optional(Type.String()),     // ISO timestamp when call completed/failed/aborted
});
type CallNodeAttrs = Static<typeof CallNodeAttrs>;
```
The node key is `requestId`. This matches the call protocol's correlation mechanism and the call graph storage schema.
**Why ISO timestamps as strings**: Following the call protocol, timestamps are ISO 8601 strings rather than numbers. This makes the graph directly serializable to JSON without transformation and aligns with the storage schema's `timestamp with tz` columns.
**Why `parentRequestId` is both a node attribute and an edge**: Following the same denormalization pattern as the storage schema — `parentRequestId` on the node enables fast point lookups ("who is this call's parent?"), while `triggered` edges enable traversal queries. Both are kept consistent by construction.
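Consistency between the two representations can be checked mechanically. The hypothetical `findParentInconsistencies` below compares each node's `parentRequestId` with its incoming `triggered` edges and reports the `requestId`s that disagree. The node and edge shapes are trimmed to the fields the check needs.

```typescript
// Trimmed mirrors of CallNodeAttrs and call graph edges (illustration only).
interface CallNode {
  requestId: string;
  parentRequestId?: string;
}

interface CallEdge {
  source: string;
  target: string;
  edgeType: "triggered" | "depends_on";
}

// Hypothetical check: report requestIds whose parentRequestId and incoming
// triggered edges disagree (missing edge, extra edge, or a different parent).
function findParentInconsistencies(nodes: readonly CallNode[], edges: readonly CallEdge[]): string[] {
  const parentsByChild = new Map<string, string[]>();
  for (const edge of edges) {
    if (edge.edgeType !== "triggered") continue; // depends_on edges say nothing about parentage
    const parents = parentsByChild.get(edge.target) ?? [];
    parents.push(edge.source);
    parentsByChild.set(edge.target, parents);
  }
  const inconsistent: string[] = [];
  for (const node of nodes) {
    const actual = parentsByChild.get(node.requestId) ?? [];
    const expected = node.parentRequestId === undefined ? [] : [node.parentRequestId];
    const matches = actual.length === expected.length && actual.every((p, i) => p === expected[i]);
    if (!matches) inconsistent.push(node.requestId);
  }
  return inconsistent;
}
```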
## Edge Attribute Schemas
### OperationEdgeAttrs (Operation Graph)
```typescript
const OperationEdgeAttrs = Type.Object({
  compatible: Type.Boolean({ description: "Whether the source output schema is compatible with the target input schema" }),
  detail: Type.Optional(Type.String({ description: "Human-readable description of compatibility or mismatch" })),
  mismatches: Type.Optional(Type.Array(Type.Object({ // Structured mismatch details (populated when compatible: false)
    path: Type.String(),
    expected: Type.String(),
    actual: Type.String(),
  }))),
});
type OperationEdgeAttrs = Static<typeof OperationEdgeAttrs>;
```
Type-compatibility edges carry a boolean `compatible` flag, an optional `detail` string, and optional structured `mismatches`. This allows the operation graph to include both compatible edges (green paths) and incompatible edges (red paths) for diagnostics. The `detail` field provides a human-readable summary, while `mismatches` provides machine-readable field-level diagnostics. The `TypeCompatResult` from `typeCompat()` populates both fields: `detail` for compatible edges and `mismatches` for incompatible ones.
**Edge type storage**: Operation graph edges always have `edgeType: "typed"` stored on the edge as a separate attribute alongside `OperationEdgeAttrs`. Graphology edges carry both the `OperationEdgeAttrs` (compatible, detail, mismatches) and the `edgeType` field. The `edgeType` is not inside `OperationEdgeAttrs` because it's a universal edge classification that applies to all edge types across all graph modes (operation, call, template). The `OperationEdgeAttrs` schema only defines the mode-specific attributes.
```typescript
// How operation graph edges are stored in graphology:
{
  edgeType: "typed",                        // Universal classification (stored alongside attrs)
  compatible: true,                         // OperationEdgeAttrs field
  detail: "classify.output → enrich.input", // OperationEdgeAttrs field
  mismatches: []                            // Empty when compatible
}
```
**Naming note**: Previously named `TypedEdgeAttrs`. Renamed to follow the `{GraphType}EdgeAttrs` pattern used by `CallEdgeAttrs` and `TemplateEdgeAttrs`.
### TriggeredEdgeAttrs (Call Graph)
```typescript
const TriggeredEdgeAttrs = Type.Object({});
type TriggeredEdgeAttrs = Static<typeof TriggeredEdgeAttrs>;
```
Parent-child edges in the call graph carry no additional attributes — the relationship is fully captured by the edge direction and type. This may be extended in the future with `latency` or `metadata` attributes.
### DependencyEdgeAttrs (Call Graph)
```typescript
const DependencyEdgeAttrs = Type.Object({});
type DependencyEdgeAttrs = Static<typeof DependencyEdgeAttrs>;
```
Data dependency edges also carry no additional attributes. Future extensions may include `dataPath` (which field of the output feeds which field of the input).
### CallEdgeAttrs (Call Graph Union)
```typescript
const CallEdgeAttrs = Type.Union([TriggeredEdgeAttrs, DependencyEdgeAttrs]);
type CallEdgeAttrs = Static<typeof CallEdgeAttrs>;
```
A union type used as the edge attribute type parameter for call graphs (`FlowGraph<CallNodeAttrs, CallEdgeAttrs>`). Call graph edges can be either `triggered` (parent-child) or `depends_on` (data dependency), distinguished by their edge type. The union type follows the `{GraphType}EdgeAttrs` naming pattern consistent with `OperationEdgeAttrs` and `TemplateEdgeAttrs`.
### TemplateEdgeAttrs (Workflow Templates)
```typescript
const TemplateEdgeAttrs = Type.Object({
  edgeType: Type.Union([Type.Literal("sequential"), Type.Literal("conditional")]),
  condition: Type.Optional(Type.Unknown()), // For conditional edges: the condition function or expression
});
type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>;
```
Template edges carry an `edgeType` to distinguish sequential flow from conditional branching. Conditional edges optionally store a `condition` that determines whether the target node executes.
**Note**: `TemplateEdgeAttrs.edgeType` uses a constrained union of `"sequential" | "conditional"` rather than the full `EdgeTypeEnum`. Template DAGs never have `triggered`, `depends_on`, or `typed` edges — those belong to call graphs and operation graphs respectively.
### TemplateNodeAttrs (Workflow Templates)
Template DAGs use `OperationNodeAttrs` for their operation nodes — the template doesn't need a separate node type because every node in a template DAG corresponds to an operation invocation. The template's structural information (`Sequential`, `Parallel`, `Conditional`, `Map`) is expressed through edges, not through special node types.
```typescript
// Template DAGs use OperationNodeAttrs for operation nodes
type TemplateNodeAttrs = OperationNodeAttrs;
// This alias makes the intent explicit: a template node represents an operation invocation
```
`TemplateNodeAttrs` is an alias of `OperationNodeAttrs`, kept for clarity. In the template context, each node carries the same attributes as an operation node (name, namespace, type, input/output schemas). The difference is that template nodes are connected by template-specific edges (sequential, conditional) rather than type-compatibility edges (typed).
## SerializedGraph Factory
Following the taskgraph pattern, a generic factory for graphology native JSON format:
```typescript
import { Type, type TSchema } from "@sinclair/typebox";

const SerializedGraph = <N extends TSchema, E extends TSchema, G extends TSchema>(
  NodeAttrs: N,
  EdgeAttrs: E,
  GraphAttrs: G,
) =>
  Type.Object({
    attributes: GraphAttrs,
    options: Type.Object({
      type: Type.Literal("directed"),
      multi: Type.Literal(false),
      allowSelfLoops: Type.Literal(false),
    }),
    nodes: Type.Array(Type.Object({
      key: Type.String(),
      attributes: NodeAttrs,
    })),
    edges: Type.Array(Type.Object({
      key: Type.String(),
      source: Type.String(),
      target: Type.String(),
      attributes: EdgeAttrs,
    })),
  });
```
**`multi: false`**: No parallel edges. Graphology enforces this per directed (source, target) pair, not per edge key or `edgeType`. A second edge between the same two nodes is therefore rejected even when its key or type differs.
**`allowSelfLoops: false`**: Operations and calls cannot be their own prerequisite. Self-loops are rejected at construction time.
**`type: "directed"`**: All edges have direction. `A → B` means A is prerequisite/source, B is dependent/target. This matches the graphology convention and the call graph storage schema.
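TypeBox validation of a `SerializedGraph` value checks its shape and the literal option values. It does not check graph-level rules such as self-loops, dangling endpoints, or parallel edges, so a post-validation pass could cover those. The sketch below is one such hypothetical pass. It detects parallel edges per directed (source, target) pair, which is how graphology enforces `multi: false`. `structuralErrors` and the trimmed interfaces are illustrative names.

```typescript
interface SerializedNode {
  key: string;
  attributes?: unknown;
}

interface SerializedEdge {
  key: string;
  source: string;
  target: string;
  attributes?: unknown;
}

interface SerializedGraphLike {
  nodes: SerializedNode[];
  edges: SerializedEdge[];
}

// Hypothetical post-validation pass for rules the schema cannot express.
function structuralErrors(graph: SerializedGraphLike): string[] {
  const errors: string[] = [];
  const nodeKeys = new Set(graph.nodes.map((n) => n.key));
  const edgeKeys = new Set<string>();
  const pairs = new Set<string>();
  for (const e of graph.edges) {
    if (edgeKeys.has(e.key)) errors.push(`${e.key}: duplicate edge key`);
    edgeKeys.add(e.key);
    if (!nodeKeys.has(e.source) || !nodeKeys.has(e.target)) {
      errors.push(`${e.key}: endpoint is not a node`);
    }
    if (e.source === e.target) errors.push(`${e.key}: self-loop`);
    // JSON.stringify keeps the pair unambiguous even if node keys contain "->".
    const pair = JSON.stringify([e.source, e.target]);
    if (pairs.has(pair)) errors.push(`${e.key}: parallel edge`);
    pairs.add(pair);
  }
  return errors;
}
```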
### FlowGraphSerialized variants
Two specialized serialization types, one for each graph type:
```typescript
const OperationGraphSerialized = SerializedGraph(
  OperationNodeAttrs,
  OperationEdgeAttrs,
  Type.Object({}), // No graph-level attributes
);

const CallGraphSerialized = SerializedGraph(
  CallNodeAttrs,
  CallEdgeAttrs,
  Type.Object({}), // No graph-level attributes
);
```
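For call graphs, edges can be either `triggered` or `depends_on`. `TriggeredEdgeAttrs` and `DependencyEdgeAttrs` are both empty objects, so the two kinds are distinguished by the `edgeType` value stored alongside the attributes, not by separate schemas.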
## Edge Key Convention
Following taskgraph's ADR-006, edge keys are deterministic:
```
${source}->${target}
```
For the operation graph, this means keys like `"task.classify->task.enrich"`. For the call graph, keys like `"req_abc123->req_def456"`.
When multiple edge types exist between the same (source, target) pair (e.g., in the call graph where both `triggered` and `depends_on` edges can connect the same calls), a composite key format is used:
```
${source}->${target}:${edgeType}
```
For example, a `depends_on` edge in the call graph uses `"req_abc123->req_def456:depends_on"` while the `triggered` edge between the same pair uses `"req_abc123->req_def456"`.
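Graphology enforces `multi: false` per directed (source, target) pair, not per edge key. Once a `triggered` edge connects two calls, adding a `depends_on` edge between the same pair throws, even though its composite key is distinct. The composite key keeps keys deterministic. Storing both edge types between one pair, however, also requires call graphs to be created with `multi: true`, which the `SerializedGraph` factory's `multi: Type.Literal(false)` does not currently permit.
This is an exception to the simple `${source}->${target}` pattern, needed for the call graph's dual-edge-type scenario. If multi-edge support becomes more broadly needed, the `multi: false` constraint can be relaxed and a uniform composite key format adopted.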
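A key builder for this convention might look like the following sketch. As the `req_abc123` example above implies, it assumes `depends_on` edges always take the composite form and every other edge type uses the plain form. Keys can then be computed without inspecting the graph. `edgeKey` is a hypothetical name.

```typescript
// Plain-TypeScript mirror of EdgeTypeEnum (illustration only).
type EdgeType = "triggered" | "depends_on" | "typed" | "sequential" | "conditional";

// Hypothetical key builder: composite form only for depends_on edges.
function edgeKey(source: string, target: string, edgeType: EdgeType): string {
  const base = `${source}->${target}`;
  return edgeType === "depends_on" ? `${base}:${edgeType}` : base;
}
```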
## Constraints
- **TypeBox schemas are the single source of truth** — no hand-written `interface` or `type` definitions for data shapes. All types are derived via `Static<typeof Schema>`.
- **Edge keys are deterministic** — `${source}->${target}` format, following ADR-006 in taskgraph, with a `:${edgeType}` suffix for `depends_on` edges in the call graph (see Edge Key Convention).
- **No parallel edges** — `multi: false` in graphology. At most one edge per (source, target) pair.
- **No self-loops** — `allowSelfLoops: false`. An operation cannot be its own prerequisite.
- **ISO timestamp strings** — Call graph timestamps are ISO 8601 strings, matching the storage schema.
- **Nullable categorical fields** — Following taskgraph's convention, `Type.Optional(Nullable(Enum))` for optional fields that can be explicitly null.
- **`inputSchema` and `outputSchema` on operation nodes** — These are TypeBox schemas (unknown at the graph level), stored for type-compatibility checking. The graph does not validate these schemas — it stores them and makes them available for the `typeCompat` analysis function.
- **No schema version field** — Following taskgraph, the serialized format does not include a version field. It follows graphology's native JSON format and is not a persistence format with backward-compatibility guarantees. Consumers that need persistence wrap it in their own versioned envelope.
## Open Questions
1. **Should `edgeType` be a required field on ALL edges, or only on call graph and template edges?** Operation graph edges are always `typed`, so requiring an explicit `edgeType` attribute there is redundant. Options: (a) make `edgeType` required on all edges, (b) have separate edge attribute types per graph mode, (c) use a union type on edge attributes and let the consumer tag the edge.
2. **Should `CallNodeAttrs.identity` be a `Type.Record` or the structured `Identity` type from operations?** The structured type matches the call protocol and storage schema but creates a dependency on `@alkdev/operations` types. Options: (a) import `Identity` from operations (peer dep), (b) duplicate the type in flowgraph, (c) use `Type.Record` and accept weaker typing.
3. **How should conditional edge conditions be represented?** `condition: Type.Unknown()` is maximally flexible but provides no type safety. Options: (a) `Type.Unknown()` with documentation, (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references, (c) a dedicated `ConditionSchema` that flowgraph defines.
## References
- Taskgraph schema patterns: `@alkdev/taskgraph_ts/docs/architecture/schemas.md`
- Call graph storage schema: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Operation types: `@alkdev/operations/src/types.ts`
- ujsx schema: `@alkdev/ujsx/docs/architecture/schema.md`