add flowgraph architecture docs (Phase 1 SDD)
Draft architecture specification for @alkdev/flowgraph — a workflow graph library providing DAG-based orchestration over operations. Covers two graph types (operation graph, call graph), ujsx workflow templates, GraphologyHost and ReactiveHost configs, signal-driven execution, type-compatibility analysis, error hierarchy, and build/distribution. Includes 3 ADRs: ujsx as template IR, DAG-only enforcement, decoupled storage.
327
docs/architecture/schema.md
Normal file
@@ -0,0 +1,327 @@
---
status: draft
last_updated: 2026-05-19
---

# Schema

TypeBox Module, TypeScript types, categorical enums, node/edge attribute schemas, and the design decisions behind them.

## Overview

Flowgraph's schema layer follows the same pattern as taskgraph: TypeBox schemas are the single source of truth for both runtime validation and TypeScript type derivation. All data shapes are defined as TypeBox schemas, with `Static<typeof Schema>` producing the corresponding TypeScript types.

The schema is organized around two distinct graph types (operation graph and call graph) plus shared enums and the serialized graph factory.

## Design Decision: TypeBox as Single Source of Truth

Identical to taskgraph's approach:

1. **Static TypeScript types** via `Static<typeof Schema>`: every schema constant has a corresponding `type X = Static<typeof X>` alias
2. **Runtime validation** via `Value.Check()` / `Value.Errors()`: structured field-level error reporting
3. **JSON Schema export** for consumers that need schema-based contracts

No separate `interface` or `type` definitions outside of `Static<typeof>`. No Zod.

### Naming Convention

| Category | Convention | Example |
|----------|-----------|---------|
| Enum schema constant | PascalCase + `Enum` suffix | `CallStatusEnum` |
| Enum type alias | PascalCase, no suffix | `type CallStatus = Static<typeof CallStatusEnum>` |
| Object schema constant | PascalCase, no suffix | `OperationNodeAttrs`, `CallNodeAttrs` |
| Object type alias | Same name as schema constant | `type OperationNodeAttrs = Static<typeof OperationNodeAttrs>` |
| Serialized graph schemas | PascalCase + `Serialized` suffix | `OperationGraphSerialized`, `CallGraphSerialized` |
| Factory function | PascalCase | `SerializedGraph(NodeAttrs, EdgeAttrs, GraphAttrs)` |

### Nullable Helper

Same `Nullable` helper as taskgraph:

```typescript
const Nullable = <T extends TSchema>(schema: T) => Type.Union([schema, Type.Null()]);
```

Used for fields that can be explicitly set to `null` (distinct from absent).

## Enums

### CallStatus

The lifecycle states of a call invocation. Matches the call graph storage schema in `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`.

```typescript
const CallStatusEnum = Type.Union([
  Type.Literal("pending"),   // Call requested, not yet dispatched
  Type.Literal("running"),   // Handler executing
  Type.Literal("completed"), // Successfully finished (call.responded + call.completed)
  Type.Literal("failed"),    // Handler threw or call.error emitted
  Type.Literal("aborted"),   // call.aborted emitted (parent cancelled, deadline exceeded)
]);
type CallStatus = Static<typeof CallStatusEnum>;
```

Transitions:

```
pending → running → completed
                  → failed
                  → aborted
pending → aborted
```

- `pending → running`: Handler starts executing
- `running → completed`: `call.responded` + `call.completed` received
- `running → failed`: `call.error` received
- `pending → aborted`: `call.aborted` received before handler started (e.g., deadline exceeded)
- `running → aborted`: `call.aborted` received during execution (parent cancelled)

`completed`, `failed`, and `aborted` are terminal states: no further transitions.
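
The transition rules above can be sketched as a lookup table. This is an illustrative sketch, not part of the specified API: `CALL_TRANSITIONS`, `canTransition`, and `isTerminal` are hypothetical names, and the local `CallStatus` union stands in for the `Static<typeof CallStatusEnum>` type so the snippet runs without TypeBox.

```typescript
// Local stand-in for the type the document derives via Static<typeof CallStatusEnum>.
type CallStatus = "pending" | "running" | "completed" | "failed" | "aborted";

// Allowed next states, transcribed from the transition list in this section.
const CALL_TRANSITIONS: Record<CallStatus, readonly CallStatus[]> = {
  pending: ["running", "aborted"],
  running: ["completed", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

// Hypothetical helper: true when `from -> to` is one of the listed transitions.
function canTransition(from: CallStatus, to: CallStatus): boolean {
  return CALL_TRANSITIONS[from].includes(to);
}

// Hypothetical helper: terminal states have no outgoing transitions.
function isTerminal(status: CallStatus): boolean {
  return CALL_TRANSITIONS[status].length === 0;
}
```

A call graph builder could consult `canTransition` before applying a call event, so that an out-of-order event (for example `call.completed` arriving for a call that is still `pending`) is rejected rather than silently overwriting the status.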

### NodeStatus

A derived status for workflow template nodes. While `CallStatus` tracks individual call invocations, `NodeStatus` reflects the template-level view:

```typescript
const NodeStatusEnum = Type.Union([
  Type.Literal("idle"),      // Not started, no call yet
  Type.Literal("waiting"),   // Preconditions not met, waiting for upstream
  Type.Literal("ready"),     // Preconditions met, eligible to start
  Type.Literal("running"),   // Call in progress
  Type.Literal("completed"), // Call completed successfully
  Type.Literal("failed"),    // Call failed
  Type.Literal("skipped"),   // Conditional branch not taken
  Type.Literal("aborted"),   // Call aborted
]);
type NodeStatus = Static<typeof NodeStatusEnum>;
```

`NodeStatus` shares `running`, `completed`, `failed`, and `aborted` with `CallStatus` and adds workflow-specific states (`idle`, `waiting`, `ready`, `skipped`) that have no call protocol equivalent. It has no `pending` state. A node that is `waiting` has no call yet because its preconditions haven't been met.

### EdgeType

The type of edge in a flowgraph. Matches the call graph storage schema's `edgeType` column:

```typescript
const EdgeTypeEnum = Type.Union([
  Type.Literal("triggered"),   // Source caused target to execute (parent→child in call hierarchy)
  Type.Literal("depends_on"),  // Source requires target's result before it can complete (data dependency)
  Type.Literal("typed"),       // Type compatibility edge (output schema A → input schema B)
  Type.Literal("sequential"),  // Sequential flow edge (template: <Sequential> ordering)
  Type.Literal("conditional"), // Conditional flow edge (template: <Conditional> branch)
]);
type EdgeType = Static<typeof EdgeTypeEnum>;
```

The first two (`triggered`, `depends_on`) match the call graph storage schema. `typed` is specific to the operation graph. The last two (`sequential`, `conditional`) are template-specific and only exist in workflow template DAGs.

| Edge Type | Graph Type | Meaning |
|-----------|------------|---------|
| `triggered` | Call graph | Parent call triggered child call. Corresponds to `parentRequestId`. |
| `depends_on` | Call graph | Data dependency: source needs target's result. |
| `typed` | Operation graph | Type compatibility: source's output schema is compatible with target's input schema. |
| `sequential` | Template → DAG | Sequential ordering from `<Sequential>` component. |
| `conditional` | Template → DAG | Conditional branch from `<Conditional>` component. |

## Node Attribute Schemas

### OperationNodeAttrs

Attributes for nodes in the operation graph. Derived from `OperationSpec` but carrying only graph-relevant data:

```typescript
const OperationNodeAttrs = Type.Object({
  name: Type.String(),          // Operation name (e.g., "classify")
  namespace: Type.String(),     // Namespace (e.g., "task")
  version: Type.String(),       // Semantic version
  type: OperationTypeEnum,      // "query" | "mutation" | "subscription"
  inputSchema: Type.Unknown(),  // JSON Schema for input (TypeBox schema)
  outputSchema: Type.Unknown(), // JSON Schema for output (TypeBox schema)
  description: Type.Optional(Type.String()),
  tags: Type.Optional(Type.Array(Type.String())),
});
type OperationNodeAttrs = Static<typeof OperationNodeAttrs>;
```

The node key is `namespace.name` (e.g., `"task.classify"`), matching the `operationId` format used in the call protocol. The full `OperationSpec` is not stored on the graph: `accessControl`, `errorSchemas`, and `handler` belong to the registry, not the graph.

**Why `inputSchema` and `outputSchema` on the graph**: These are needed for type-compatibility edge construction. An edge from operation A to operation B exists if A's `outputSchema` is compatible with B's `inputSchema`. Storing the schemas on the node avoids a round-trip to the registry during graph queries.
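
To make the idea of a type-compatibility check concrete, here is a deliberately naive sketch. It is not the `typeCompat` analysis function, whose rules this document does not define. It assumes both schemas are flat JSON Schema objects and treats the output as compatible when every field the input requires is present with the same `type`. The names `ObjectSchemaLike`, `TypedEdgeAttrsLike`, and `naiveTypeCompat` are hypothetical, and the interfaces are local stand-ins used only so the snippet runs without dependencies.

```typescript
// Minimal JSON-Schema-like shape: only the fields this sketch inspects.
interface ObjectSchemaLike {
  type?: string;
  properties?: Record<string, { type?: string }>;
  required?: string[];
}

// Mirrors the TypedEdgeAttrs shape described in this document.
interface TypedEdgeAttrsLike {
  compatible: boolean;
  compatibilityDetail?: string;
}

// Assumption: "compatible" means every required input field is produced
// by the output schema with an identical `type` keyword.
function naiveTypeCompat(
  output: ObjectSchemaLike,
  input: ObjectSchemaLike,
): TypedEdgeAttrsLike {
  const outProps = output.properties ?? {};
  const inProps = input.properties ?? {};
  const problems: string[] = [];

  for (const field of input.required ?? []) {
    const produced = outProps[field];
    const expected = inProps[field];
    if (produced === undefined) {
      problems.push(`missing required field "${field}"`);
    } else if (expected?.type !== undefined && produced.type !== expected.type) {
      problems.push(`field "${field}" is ${produced.type}, expected ${expected.type}`);
    }
  }

  return problems.length === 0
    ? { compatible: true }
    : { compatible: false, compatibilityDetail: problems.join("; ") };
}
```

Because the result carries a detail string on mismatch, the same function could populate edges for diagnostics as well as for valid paths. A real check would also need to handle nested objects, unions, and optional fields.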

### CallNodeAttrs

Attributes for nodes in the call graph. Populated from call events:

```typescript
const CallNodeAttrs = Type.Object({
  requestId: Type.String(),                       // Unique call identifier
  operationId: Type.String(),                     // namespace.name of the operation
  status: CallStatusEnum,                         // Current call status
  parentRequestId: Type.Optional(Type.String()),  // Parent call (absent = top-level)
  input: Type.Unknown(),                          // Call input
  output: Type.Optional(Type.Unknown()),          // Call output (on completion)
  error: Type.Optional(Type.Object({              // Call error (on failure)
    code: Type.String(),
    message: Type.String(),
    details: Type.Optional(Type.Unknown()),
  })),
  identity: Type.Optional(Type.Object({           // Caller identity
    id: Type.String(),
    scopes: Type.Array(Type.String()),
    resources: Type.Optional(Type.Record(Type.String(), Type.Array(Type.String()))),
  })),
  startedAt: Type.Optional(Type.String()),        // ISO timestamp when call was dispatched
  completedAt: Type.Optional(Type.String()),      // ISO timestamp when call completed/failed/aborted
});
type CallNodeAttrs = Static<typeof CallNodeAttrs>;
```

The node key is `requestId`. This matches the call protocol's correlation mechanism and the call graph storage schema.

**Why ISO timestamps as strings**: Following the call protocol, timestamps are ISO 8601 strings rather than numbers. This makes the graph directly serializable to JSON without transformation and aligns with the storage schema's `timestamp with tz` columns.

**Why `parentRequestId` is both a node attribute and an edge**: This follows the same denormalization pattern as the storage schema. `parentRequestId` on the node enables fast point lookups ("who is this call's parent?"), while `triggered` edges enable traversal queries. Both are kept consistent by construction.
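
One way to keep the two representations consistent is to derive the `triggered` edges from the node attribute. The following is a minimal sketch under stated assumptions: `CallNodeLike`, `EdgeLike`, and `triggeredEdgesFromNodes` are hypothetical names, the interfaces are local stand-ins for the schema-derived types, and a parent that is not present in the node list is skipped rather than treated as an error.

```typescript
// Only the CallNodeAttrs fields this sketch needs.
interface CallNodeLike {
  requestId: string;
  parentRequestId?: string;
}

// Edge entry in the same key/source/target shape used by the serialized graph.
interface EdgeLike {
  key: string;
  source: string;
  target: string;
}

// Builds one parent -> child edge per node that names a known parent.
function triggeredEdgesFromNodes(nodes: CallNodeLike[]): EdgeLike[] {
  const known = new Set(nodes.map((node) => node.requestId));
  const edges: EdgeLike[] = [];

  for (const node of nodes) {
    const parent = node.parentRequestId;
    // Top-level call, or parent outside this graph: no edge (assumption).
    if (parent === undefined || !known.has(parent)) continue;
    edges.push({
      key: `${parent}->${node.requestId}`, // deterministic key, parent first
      source: parent,
      target: node.requestId,
    });
  }
  return edges;
}
```

Deriving the edges in one place means a point lookup through `parentRequestId` and a traversal over `triggered` edges cannot disagree about who the parent is.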

## Edge Attribute Schemas

### TypedEdgeAttrs (Operation Graph)

```typescript
const TypedEdgeAttrs = Type.Object({
  compatible: Type.Boolean({ description: "Whether the source output schema is compatible with the target input schema" }),
  compatibilityDetail: Type.Optional(Type.String({ description: "Human-readable description of compatibility or mismatch" })),
});
type TypedEdgeAttrs = Static<typeof TypedEdgeAttrs>;
```

Type-compatibility edges carry a boolean `compatible` flag and optional detail. This allows the operation graph to include both compatible edges (green paths) and incompatible edges (red paths) for diagnostics.

### TriggeredEdgeAttrs (Call Graph)

```typescript
const TriggeredEdgeAttrs = Type.Object({});
type TriggeredEdgeAttrs = Static<typeof TriggeredEdgeAttrs>;
```

Parent-child edges in the call graph carry no additional attributes: the relationship is fully captured by the edge direction and type. This may be extended in the future with `latency` or `metadata` attributes.

### DependencyEdgeAttrs (Call Graph)

```typescript
const DependencyEdgeAttrs = Type.Object({});
type DependencyEdgeAttrs = Static<typeof DependencyEdgeAttrs>;
```

Data dependency edges also carry no additional attributes. Future extensions may include `dataPath` (which field of the output feeds which field of the input).

### TemplateEdgeAttrs (Workflow Templates)

```typescript
const TemplateEdgeAttrs = Type.Object({
  edgeType: EdgeTypeEnum,                   // Expected to be "sequential" or "conditional" (the enum itself admits all five values)
  condition: Type.Optional(Type.Unknown()), // For conditional edges: the condition function or expression
});
type TemplateEdgeAttrs = Static<typeof TemplateEdgeAttrs>;
```

Template edges carry an `edgeType` to distinguish sequential flow from conditional branching. Conditional edges optionally store a `condition` that determines whether the target node executes.

## SerializedGraph Factory

Following the taskgraph pattern, a generic factory for graphology native JSON format:

```typescript
const SerializedGraph = <N extends TSchema, E extends TSchema, G extends TSchema>(
  NodeAttrs: N,
  EdgeAttrs: E,
  GraphAttrs: G,
) =>
  Type.Object({
    attributes: GraphAttrs,
    options: Type.Object({
      type: Type.Literal("directed"),
      multi: Type.Literal(false),
      allowSelfLoops: Type.Literal(false),
    }),
    nodes: Type.Array(Type.Object({
      key: Type.String(),
      attributes: NodeAttrs,
    })),
    edges: Type.Array(Type.Object({
      key: Type.String(),
      source: Type.String(),
      target: Type.String(),
      attributes: EdgeAttrs,
    })),
  });
```

**`multi: false`**: There is at most one edge per (source, target) pair, so flowgraph has no parallel edges between the same node pair. See Edge Key Convention for how multiple edge types between one pair are handled.

**`allowSelfLoops: false`**: Operations and calls cannot be their own prerequisite. Self-loops are rejected at construction time.

**`type: "directed"`**: All edges have direction. `A → B` generally means A is the prerequisite/source and B is the dependent/target. `depends_on` edges are the exception, since their source is the call that needs the target's result (see EdgeType). This matches the graphology convention and the call graph storage schema.
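
The `multi: false` and `allowSelfLoops: false` options describe structural rules that a serialized payload can be checked against before it is imported. The sketch below is illustrative only: `SerializedEdgeLike` and `findStructuralViolations` are hypothetical names, and the check covers just these two rules, not attribute validation.

```typescript
// Edge entry in the serialized format: key, source, target (attributes omitted here).
interface SerializedEdgeLike {
  key: string;
  source: string;
  target: string;
}

// Returns a human-readable message for each self-loop or parallel edge found.
function findStructuralViolations(edges: SerializedEdgeLike[]): string[] {
  const seenPairs = new Set<string>();
  const violations: string[] = [];

  for (const edge of edges) {
    if (edge.source === edge.target) {
      violations.push(`self-loop on ${edge.source}`);
    }
    // Ordered pair: A -> B and B -> A are different edges in a directed graph.
    const pair = JSON.stringify([edge.source, edge.target]);
    if (seenPairs.has(pair)) {
      violations.push(`parallel edge ${edge.source} -> ${edge.target}`);
    }
    seenPairs.add(pair);
  }
  return violations;
}
```

An empty result means the edge list respects both options. A non-empty result could be surfaced as a validation error before the payload reaches the graph.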

### FlowGraphSerialized variants

Two specialized serialization types, one for each graph type:

```typescript
const OperationGraphSerialized = SerializedGraph(
  OperationNodeAttrs,
  TypedEdgeAttrs,
  Type.Object({}), // No graph-level attributes
);

const CallGraphSerialized = SerializedGraph(
  CallNodeAttrs,
  Type.Union([TriggeredEdgeAttrs, DependencyEdgeAttrs]),
  Type.Object({}), // No graph-level attributes
);
```

For call graphs, edges can be either `triggered` or `depends_on`. Both attribute schemas are currently empty objects, so the union alone cannot tell the two apart. Distinguishing them requires an `edgeType` attribute on the edge (see Open Question 1).

## Edge Key Convention

Following taskgraph's ADR-006, edge keys are deterministic:

```
${source}->${target}
```

For the operation graph, this means keys like `"task.classify->task.enrich"`. For the call graph, keys like `"req_abc123->req_def456"`.

Since `multi: false`, there can be at most one edge between any (source, target) pair. When multiple edge types are needed between the same pair (e.g., both `triggered` and `depends_on` between two calls), the graph stores a single edge whose `edgeType` attribute captures the semantic relationship. This is a simplification from the storage schema, which keys edges by (source, target, edgeType) and therefore allows several edges between the same pair. The in-memory graph collapses these into a single edge per (source, target) pair.

This is acceptable because:
- Operation graphs only have `typed` edges, so no multi-edge concern.
- Call graphs rarely have both `triggered` and `depends_on` between the same pair.
- Template DAGs only have `sequential` or `conditional` edges.

If multi-edge support becomes necessary, the `multi: false` constraint can be relaxed and a composite key format (`${source}->${target}:${edgeType}`) adopted.
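
Both key formats are simple string templates. The sketch below writes them out as functions. The names `edgeKey` and `compositeEdgeKey` are hypothetical, and the local `EdgeType` union stands in for the `Static<typeof EdgeTypeEnum>` type so the snippet has no dependencies.

```typescript
// Local stand-in for the type the document derives via Static<typeof EdgeTypeEnum>.
type EdgeType = "triggered" | "depends_on" | "typed" | "sequential" | "conditional";

// Current convention: one deterministic key per (source, target) pair.
function edgeKey(source: string, target: string): string {
  return `${source}->${target}`;
}

// Possible future convention if parallel edges are allowed: the edge type
// is appended so each (source, target, edgeType) triple gets its own key.
function compositeEdgeKey(source: string, target: string, edgeType: EdgeType): string {
  return `${edgeKey(source, target)}:${edgeType}`;
}
```

Because the key is a pure function of its endpoints, adding the same edge twice yields the same key, which makes duplicate detection a plain key lookup.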

## Constraints

- **TypeBox schemas are the single source of truth**: no hand-written `interface` or `type` definitions for data shapes. All types are derived via `Static<typeof Schema>`.
- **Edge keys are deterministic**: `${source}->${target}` format, following ADR-006 in taskgraph.
- **No parallel edges**: `multi: false` in graphology. At most one edge per (source, target) pair.
- **No self-loops**: `allowSelfLoops: false`. An operation cannot be its own prerequisite.
- **ISO timestamp strings**: Call graph timestamps are ISO 8601 strings, matching the storage schema.
- **Nullable categorical fields**: Following taskgraph's convention, `Type.Optional(Nullable(Enum))` for optional fields that can be explicitly null.
- **`inputSchema` and `outputSchema` on operation nodes**: These are TypeBox schemas (unknown at the graph level), stored for type-compatibility checking. The graph does not validate these schemas; it stores them and makes them available for the `typeCompat` analysis function.
- **No schema version field**: Following taskgraph, the serialized format does not include a version field. It follows graphology's native JSON format and is not a persistence format with backward-compatibility guarantees. Consumers that need persistence wrap it in their own versioned envelope.

## Open Questions

1. **Should `edgeType` be a required field on ALL edges, or only on call graph and template edges?** Operation graph edges are always `typed`, so requiring an explicit `edgeType` attribute there is redundant. Options: (a) make `edgeType` required on all edges, (b) have separate edge attribute types per graph mode, (c) use a union type on edge attributes and let the consumer tag the edge.

2. **Should `CallNodeAttrs.identity` be a `Type.Record` or the structured `Identity` type from operations?** The structured type matches the call protocol and storage schema but creates a dependency on `@alkdev/operations` types. Options: (a) import `Identity` from operations (peer dep), (b) duplicate the type in flowgraph, (c) use `Type.Record` and accept weaker typing.

3. **How should conditional edge conditions be represented?** `condition: Type.Unknown()` is maximally flexible but provides no type safety. Options: (a) `Type.Unknown()` with documentation, (b) `Type.Union([Type.String(), Type.Function(...)])` for expression strings and function references, (c) a dedicated `ConditionSchema` that flowgraph defines.

## References

- Taskgraph schema patterns: `@alkdev/taskgraph_ts/docs/architecture/schemas.md`
- Call graph storage schema: `@alkdev/alkhub_ts/docs/architecture/storage/call-graph.md`
- Call event types: `@alkdev/operations/src/call.ts`
- Operation types: `@alkdev/operations/src/types.ts`
- ujsx schema: `@alkdev/ujsx/docs/architecture/schema.md`