diff --git a/docs/architecture/metagraph-module.md b/docs/architecture/metagraph-module.md index 7667b58..5dd224a 100644 --- a/docs/architecture/metagraph-module.md +++ b/docs/architecture/metagraph-module.md @@ -1,16 +1,16 @@ --- status: draft -last_updated: 2026-05-30 +last_updated: 2026-05-29 --- # Metagraph as TypeBox Module -Graph type definitions as `Type.Module` — aligning with the ujsx pattern for -recursive schemas, cross-package references, codegen, and graphology serialization. +Graph type definitions as `Type.Module` — recursive schemas, cross-package +references, and DB persistence. ## The Metagraph Data Model -The metagraph pattern is a three-level type system: +The metagraph is a three-level type system: 1. **GraphType** — A class of graphs (e.g., "call-graph", "acl", "task-dependencies"). Defines structural constraints @@ -24,8 +24,8 @@ The metagraph pattern is a three-level type system: "can_read", "depends_on"). Each edge type has a TypeBox schema for its attributes. Optionally constrains which source/target node types are valid. -Then **Graph instances** belong to a graph type and contain **Nodes** and -**Edges** conforming to those type definitions. +**Graph instances** belong to a graph type and contain **Nodes** and **Edges** +conforming to those type definitions. ``` GraphType "call-graph" (directed, multi, self-loops allowed) @@ -42,7 +42,7 @@ Graph "session-abc-call-graph" (instance) │ └── attributes: { requestId, operationId, status, ... } ├── Node "call-002" → nodeTypeId → NodeType "subcall" │ └── attributes: { requestId, parentRequestId, ... } - └── Edge "edge-001" → edgeTypeId → EdgeType "triggered" + └── Edge "edge-001" → edgeTypeId → EdgeType "triggered" └── attributes: { type: "triggered" } sourceNodeKey: "call-001" targetNodeKey: "call-002" @@ -54,83 +54,45 @@ Nodes and edges use a **composite identity model**: identified by `key` is the identity. Node and edge attributes are stored as JSON text in SQLite (jsonb in PG). 
The -graph type's schema defines what shape these attributes should have, but the -database doesn't enforce the schema — all validation happens in the repository -layer. See [schema-evolution.md](./schema-evolution.md) for how schemas change -over time, and [sqlite-host.md](./sqlite-host.md) for the table definitions. +graph type's schema defines the expected shape, but the database doesn't enforce +it — validation happens in the repository layer. See +[schema-evolution.md](./schema-evolution.md) for how schemas change over time, +and [sqlite-host.md](./sqlite-host.md) for table definitions. -## Overview +## Why TypeBox Modules -A graph type definition is naturally a TypeBox Module. It has named entries -(node types, edge types, config) that reference each other with `Type.Ref()`, -compose with `Type.Composite()`, and can cross-reference other Modules with -`Import()`. This is the same pattern used by `@alkdev/ujsx` (where `UJSX` is -a Module with `UPrimitive`, `UElement`, `URoot`, `UNode` recursively referencing -each other). +A graph type definition has named entries (node types, edge types, config) that +reference each other. `Type.Module` is the natural fit: -The removed `SchemaBuilder` produced a flat `GraphSchema` object — an ad-hoc -`Record` + `Record`. This works but -creates friction: +- **`Type.Ref("CallStatus")`** — recursive and internal references resolve + within the Module's `$defs` +- **`Module.Import("CallStatus")`** — cross-package references embed the + referenced Module's `$defs` +- **`Value.Check(Module.Import("CallNode"), data)`** — runtime validation +- **`Static`** — TypeScript types from the Module -1. **No cross-graph-type references** — a call graph node type can't reference - `CallStatus` from `@alkdev/flowgraph` without manual `Type.Intersect` - composition. Each package defines schemas independently, duplicating types. -2. 
**No graphology compatibility** — the schema output is a flat JSON object, - not a format that maps to graphology's `import()`/`export()`. Consumers - manually map node/edge attributes. +This replaces the removed `SchemaBuilder`, which produced a flat +`Record` + `Record`. That approach had +three limitations that Modules solve natively: + +1. **No cross-graph-type references** — a call graph node type couldn't + reference `CallStatus` from `@alkdev/flowgraph` without manual + `Type.Intersect`. Each package duplicated types independently. +2. **No graphology compatibility** — the flat JSON output didn't map to + graphology's `import()`/`export()`. Consumers manually mapped node/edge + attributes. 3. **No codegen leverage** — `TsToModule` generates TypeBox Modules from - TypeScript interfaces. The SchemaBuilder couldn't consume Module output, so - codegen-produced types must be manually translated. + TypeScript interfaces, but the builder couldn't consume Module output. -The Module approach treats each graph type as a `Type.Module`, aligning storage -with how ujsx already works — recursive types via `Ref`, composition via -`Composite`, cross-references via `Import`. +This aligns with the pattern proven in `@alkdev/ujsx`, where `UJSX` is a Module +with `UPrimitive`, `UElement`, `URoot`, `UNode` recursively referencing each +other. See [forward-look.md](./forward-look.md) for how this connects to the +broader ecosystem (codegen, graphology, dbtype). -For the forward-looking view of how this connects to dbtype, graph pointers, -and the ujsx universal IR pipeline, see [forward-look.md](./forward-look.md). 
+## Base Module: Metagraph -## The Pattern (Proven in ujsx) - -`@alkdev/ujsx` already uses this pattern (ADR-002: "TypeBox Module as type -registry"): - -```ts -// ujsx: schema.ts -export const UJSX = Type.Module({ - UPrimitive: Type.Union([Type.String(), Type.Number(), Type.Boolean(), Type.Null()]), - PropValue: Type.Union([..., Type.Ref("UNode"), ...]), - UniversalProps: Type.Object({}, { additionalProperties: Type.Union([Type.Ref("PropValue"), Type.Undefined()]) }), - UElement: Type.Object({ - type: Type.String(), - props: Type.Ref("UniversalProps"), - children: Type.Array(Type.Ref("UNode")), // recursive! - }), - URoot: Type.Object({ - type: Type.Literal("root"), - props: Type.Ref("UniversalProps"), - children: Type.Array(Type.Ref("UNode")), // recursive! - }), - UNode: Type.Union([Type.Ref("UPrimitive"), Type.Ref("UElement"), Type.Ref("URoot")]), -}); -``` - -Key properties: -- **`Type.Ref("UNode")`** resolves within the Module's `$defs` — recursive - references are natural -- **`UJSX.Import("UElement")`** lets other Modules reference ujsx types — the - referenced Module's `$defs` are embedded in the importing Module's JSON Schema -- **`Value.Check(UJSX.Import("UElement"), node)`** validates at runtime -- **`Static`** gives TypeScript types (or hand-written types for - non-serializable entries like `ComponentFn`) - -Graph type definitions have the same structure — named entries that reference -each other, with possible cross-references to other packages' Modules. - -## Proposed: GraphType as a TypeBox Module - -### Base Module: Metagraph - -The metagraph meta-schema itself is a Module: +The metagraph meta-schema is a Module providing base entries that concrete +graph types compose from: ```ts export const Metagraph = Type.Module({ @@ -157,12 +119,26 @@ export const Metagraph = Type.Module({ }); ``` -### Concrete Graph Type: CallGraph +- `Config` uses `Type.Union` with defaults for construction-time validation + ("any valid config"). 
Specific graph types narrow these to `Type.Literal` + values. +- `BaseNode` and `BaseEdge` provide common attribute schemas. Concrete graph + types compose them via `Type.Composite`. +- `metadata` and similar "arbitrary data" fields use `Type.Unknown()` + (not `Type.Any()`). `Type.Unknown()` is canonical — it communicates "no + validation applied" explicitly. -A specific graph type is also a Module. It composes `BaseNode`/`BaseEdge` via -`Type.Composite()` (same as ujsx's `Mdast.Node: Type.Composite([Unist.Import("UnistNode"), ...])`): +## Concrete Graph Type Modules + +A specific graph type is also a `Type.Module`. It composes `BaseNode`/`BaseEdge` +via `Metagraph.Import()` and `Type.Composite()`, narrows config to literal values, +and defines its own node types, edge types, and shared types. + +### Example: CallGraph ```ts +import { Metagraph } from "./metagraph.ts"; + export const CallGraph = Type.Module({ // Config is specific — literal values, not unions with defaults Config: Type.Object({ @@ -171,7 +147,7 @@ export const CallGraph = Type.Module({ allowSelfLoops: Type.Literal(false), }), - // Node types compose BaseNode (from Metagraph) with call-specific attributes + // Node types compose BaseNode with call-specific attributes CallNode: Type.Composite([ Metagraph.Import("BaseNode"), Type.Object({ @@ -229,9 +205,33 @@ export const CallGraph = Type.Module({ }); ``` +### Type.Composite, not Type.Intersect + +Graph type Modules use `Type.Composite` to extend base schemas, not +`Type.Intersect`. The difference: + +- **`Type.Intersect`** produces a `TIntersect` wrapper with `allOf` — consumers + must traverse `allOf` to access properties. +- **`Type.Composite`** produces a flat `TObject` — overlapping keys are + intersected via `IntersectEvaluated`, non-overlapping keys are merged. + +Both use intersection semantics for overlapping keys. 
When overlapping keys have +a subtype relationship (e.g., `type: Type.String()` → `type: Type.Literal("triggered")`), +the intersection resolves to the narrower type, which is the correct behavior. + +**Constraint**: Do not use `Type.Composite` with overlapping keys of incompatible +types. If `BaseEdge` has `type: Type.String()` and a concrete edge type needs +`type: Type.Number()`, the intersection evaluates to `never`. For graph types, +this is not a concern — base and concrete keys either don't overlap, or the +overlap is a valid subtype narrowing. + ### Cross-Module References -`Module.Import()` allows one Module to reference entries from another: +`Module.Import()` allows one Module to reference entries from another. In the +CallGraph example, `Metagraph.Import("BaseNode")` embeds `Metagraph`'s `$defs` +into `CallGraph`'s JSON Schema output. + +This also works across packages: ```ts import { FlowGraph } from "@alkdev/flowgraph/schema"; @@ -241,146 +241,76 @@ const CallGraph = Type.Module({ CallNode: Type.Composite([ Type.Ref("BaseNode"), Type.Object({ - status: FlowGraph.Import("CallStatus"), // from flowgraph - identity: Type.Optional(FlowGraph.Import("Identity")), // from flowgraph - // ... + status: FlowGraph.Import("CallStatus"), + identity: Type.Optional(FlowGraph.Import("Identity")), }), ]), }); ``` -This is exactly the `Mdast.Import("UnistNode")` pattern from the ujsx research. - -**⚠️ Import embedding**: `Module.Import()` embeds the referenced Module's `$defs` -into the importing Module's JSON Schema output. When `CallGraph` imports from +**Import embedding**: `Module.Import()` embeds the referenced Module's `$defs` +into the importing Module's JSON Schema. When `CallGraph` imports from `FlowGraph`, the resulting JSON Schema includes all of `FlowGraph`'s definitions -in `$defs`. See DD6 for how the repository layer handles this. +in `$defs`. 
The repository layer stores **dereferenced entry schemas** — each +`node_types` row gets its entry's resolved JSON Schema (with inline `$defs` for +just its transitive references), not the entire importing Module. This avoids +storage bloat and version coupling (DD6). -**Decision (DD6)**: The repository layer stores **dereferenced entry schemas** — -each `node_types` row gets its entry's resolved JSON Schema (with inline `$defs` -for just its transitive references), not the entire importing Module. This -avoids storage bloat and version coupling issues. +### BaseNode/BaseEdge: Import vs Local Re-declaration -### BaseNode/BaseEdge: Local Re-declaration vs Metagraph.Import +There are two ways to get `BaseNode`/`BaseEdge` into a concrete graph type Module: -`Type.Ref()` only resolves entries within the *same* Module. In the `CallGraph` -example above, `Type.Ref("BaseNode")` requires `BaseNode` to be an entry in the -`CallGraph` Module. There are two strategies for getting `BaseNode`/`BaseEdge` -into a concrete graph type Module: +- **`Metagraph.Import("BaseNode")`** — references the base Module directly. + No duplication, but embeds `Metagraph`'s `$defs` (3 entries — minimal bloat). +- **Local re-declaration** — copy the base schemas into the concrete Module. + No `$defs` embedding, but duplication if `Metagraph` evolves. -**Option A: Re-declare locally** (shown in the example above). Each concrete -Module includes its own `BaseNode`/`BaseEdge` entries. The schemas are identical -to `Metagraph.BaseNode`/`Metagraph.BaseEdge` — you copy them in. Simple, but -creates duplication. If the base schemas evolve, each concrete Module must be -updated independently. +**Decision**: Use `Metagraph.Import()` for Modules within `@alkdev/storage` +(e.g., `modules/call-graph.ts`). Both Modules live in the same package, so +there's no circular dependency. 
For Modules defined in external packages +(e.g., `@alkdev/flowgraph`), re-declare base schemas locally — external +packages should not depend on storage's `Metagraph` Module. -**Option B: Metagraph.Import**. The concrete Module imports from `Metagraph`: +### Config: Literal Values Freeze the Configuration + +The general `Metagraph.Config` uses `Type.Union` with defaults (for +construction-time: "any valid config"). Specific graph types freeze these to +`Type.Literal` values: ```ts -const CallGraph = Type.Module({ - CallNode: Type.Composite([ - Metagraph.Import("BaseNode"), - Type.Object({ requestId: Type.String(), ... }), - ]), -}); -``` +// General: accepts any valid config +Metagraph.Config // type: union of "directed"|"undirected"|"mixed", multi: boolean, ... -This avoids duplication but embeds `Metagraph`'s `$defs` into `CallGraph`'s -JSON Schema output. For most cases, `Metagraph` is small (3 entries) so the -bloat is minimal. If `Metagraph` grows, this could become a concern. - -**Decision: Option B for same-package Modules (recommended), Option A as -fallback for external-package Modules**. - -For Modules defined within `@alkdev/storage` (like `CallGraph` in -`modules/call-graph.ts`), `Metagraph.Import("BaseNode")` has no circular -dependency issue — both `Metagraph` and `CallGraph` live in the same package. -The `Import` approach avoids duplication and keeps the base schemas in one -place. - -For Modules defined outside `@alkdev/storage` (e.g., in `@alkdev/flowgraph`), -Option A applies because external packages should not depend on storage's -`Metagraph` Module (see Open Question 1). Those packages re-declare their own -base schemas or define them independently. - -The v1 reference Modules in `modules/` should use Option B. If a future -consumer defines a `CallGraph` Module externally, they can choose either -approach — the schemas are structurally identical. 
- -**Verified**: `Type.Composite([Type.Ref("BaseNode"), Type.Object({...})])` -within a Module resolves correctly. Test confirms: `Value.Check(Module.Import("CallNode"), validData)` passes. - -### Type.Composite vs Type.Intersect - -The Module approach uses `Type.Composite` for extending `BaseNode`/`BaseEdge`, -not `Type.Intersect`. This matches the ujsx pattern where `Mdast.Node` is -`Type.Composite([Unist.Import("UnistNode"), Type.Object({...})])`. - -The difference: -- **`Type.Intersect`** creates a JSON Schema `allOf` — the result is a - `TIntersect` wrapper with nested schemas. Consumers must traverse `allOf` - to access properties. -- **`Type.Composite`** produces an **intersection evaluated into a flat - `TObject`** — overlapping keys are intersected via `IntersectEvaluated` - and the result is a single object with no `allOf` wrapper. The output - shape is `{ key1: Intersect([typeA, typeB]), key2: typeC, ... }`. - -**Both use intersection semantics for overlapping keys.** Composite is NOT -an `Object.assign` override — when overlapping keys have varying (incompatible) -types, the result is `never`. When overlapping keys have a subtype -relationship (like `Type.String()` and `Type.Literal("triggered")`), the -intersection resolves to the narrower type (`Type.Literal("triggered")`), -which is the correct behavior. - -**Why Composite over Intersect for graph types**: The output is a flat -`TObject` that maps directly to a node/edge attribute schema. `Intersect` -produces a `TIntersect` wrapper that would need unwrapping. For graph types -where base and concrete attributes have non-overlapping keys (most cases) -or subtype-only overlaps (like `type: Type.String()` → `type: Type.Literal(...)`), -Composite evaluates to the same result but in a more convenient shape. - -**Design constraint**: Do not use `Type.Composite` with overlapping keys of -incompatible types. 
If `BaseEdge` has `type: Type.String()` and a concrete -edge type needs `type: Type.Number()`, the intersection evaluates to `never`. -For graph types, this is not a concern — base and concrete keys either don't -overlap, or the overlap is a valid subtype narrowing (union → literal). - -### Config: Literal Values for Specific Graph Types - -The general `Metagraph.Config` has `Type.Union` with defaults (for -construction-time validation: "any valid config"). Specific graph types use -`Type.Literal` for frozen config values: - -```ts -// General (construction): Type.Union([Type.Literal("directed"), Type.Literal("undirected"), ...]) -// Specific (frozen): Type.Literal("directed") +// Specific: frozen to exact values +CallGraph.Config // type: "directed", multi: false, allowSelfLoops: false ``` The construction flow: consumer provides a general config → validated against -`Metagraph.Config` → the specific graph type Module uses `Type.Literal` to -freeze the value. Narrowing from `Type.Union` to `Type.Literal` is explicit -in the Module — no builder step needed. +`Metagraph.Config` → the specific graph type Module freezes the values with +`Type.Literal`. -### Edge Type Constraints: named constraint entries +## Edge Type Constraints Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are **named Module entries**, not columns bolted onto DB rows. This makes them first-class -parts of the schema — queryable, validatable, and composable: +parts of the schema — queryable, validatable, and serializable. ```ts +import { Metagraph } from "./metagraph.ts"; + export const CallGraph = Type.Module({ // ... 
TriggeredEdge: Type.Composite([ - Type.Ref("BaseEdge"), + Metagraph.Import("BaseEdge"), Type.Object({ type: Type.Literal("triggered") }), ]), TriggeredEdgeConstraints: Type.Object({ edgeType: Type.Literal("triggered"), - allowedSourceTypes: Type.Array(Type.String()), // node type names: ["Call"] - allowedTargetTypes: Type.Array(Type.String()), // node type names: ["Call", "Subcall"] + allowedSourceTypes: Type.Array(Type.String()), // ["Call"] + allowedTargetTypes: Type.Array(Type.String()), // ["Call", "Subcall"] }), DependsOnEdge: Type.Composite([ - Type.Ref("BaseEdge"), + Metagraph.Import("BaseEdge"), Type.Object({ type: Type.Literal("depends_on") }), ]), DependsOnEdgeConstraints: Type.Object({ @@ -391,47 +321,23 @@ export const CallGraph = Type.Module({ }); ``` -**Why Module entries instead of DB columns** (DD7 revised): - -1. **Schema-level validation**: `Value.Check(CallGraph.TriggeredEdgeConstraints, data)` - validates that constraint data is well-formed. With DB columns, there's no - schema validation — just JSON arrays in text columns. -2. **Serialization**: The constraint entries serialize to JSON Schema with - `$defs`, enabling `Value.Diff` for migration detection and `FromSchema` - for round-tripping. -3. **DB mapping**: The `moduleToDbSchema()` function extracts - `*EdgeConstraints` entries and writes their `allowedSourceTypes`/ - `allowedTargetTypes` fields to the existing `edge_types` columns. The DB - schema doesn't change — the Module entries are the source of truth, the - DB columns are the persistence projection. - -**Why Type.String() not Type.Ref()**: The constraint arrays contain node type -*names* (strings like `"Call"`), not node type *schemas*. `Type.Ref("CallNode")` -would mean "an element must validate against the CallNode schema," which is -incorrect — the constraint is about which named node types are valid endpoints, -not about node data shapes. 
The naming convention (`*Node` suffix) provides an -implicit structural contract: string values in `allowedSourceTypes` should -correspond to `*Node` entry names in the same Module. This is enforced by -`moduleToDbSchema()` at Module-to-DB projection time, not by the schema itself. -See Open Question 4 for the `Type.Ref` vs `Type.String` trade-off. - -**DB mapping note**: The current DB schema stores `allowedSourceTypes` and -`allowedTargetTypes` as JSON text columns (arrays of strings, default `[]`). -In the Module, these become `Type.Array(Type.String())` entries — the DB -column values are the same string arrays. `moduleToDbSchema()` extracts them -directly. Read-path reconstruction resolves the names back to Module entries -for validation. +**Why `Type.String()` not `Type.Ref()`**: The constraint arrays contain node +type *names* (strings like `"Call"`), not node type *schemas*. `Type.Ref("CallNode")` +would mean "each element must validate against the CallNode schema," which is +semantically wrong — the constraint is about which named node types are valid +endpoints, not about data shapes. The `*Node` suffix naming convention provides +an implicit structural contract. `moduleToDbSchema()` enforces this convention +at Module-to-DB projection time. **Empty array semantics**: In the DB, `[]` means "no restriction" (any node -type valid). In the Module, omitting the `*EdgeConstraints` entry means the -same thing. An explicit entry with empty arrays is not valid — it would mean -"no node types are valid at this endpoint," which is nonsensical. The -repository layer enforces this convention. +type valid). In the Module, omitting the `*EdgeConstraints` entry means the same +thing. An explicit entry with empty arrays is not valid — it would mean "no node +types are valid at this endpoint," which is nonsensical. 
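The empty-array convention described above can be sketched as a small endpoint check. This is a minimal illustration under stated assumptions, not the repository layer's actual code: `EdgeTypeRow`, `isAllowed`, and `isEndpointPairAllowed` are hypothetical names, and the row shape only assumes that the `allowedSourceTypes`/`allowedTargetTypes` columns hold arrays of node type names.

```typescript
// Hypothetical projection of an edge_types row (assumed shape, not the real table type).
interface EdgeTypeRow {
  name: string;
  allowedSourceTypes: string[]; // [] means "no restriction" in the DB
  allowedTargetTypes: string[]; // [] means "no restriction" in the DB
}

// An empty list permits every node type; otherwise the name must be listed.
function isAllowed(allowed: readonly string[], nodeTypeName: string): boolean {
  return allowed.length === 0 || allowed.includes(nodeTypeName);
}

// Checks both endpoints of a candidate edge against the row's constraints.
function isEndpointPairAllowed(
  row: EdgeTypeRow,
  sourceType: string,
  targetType: string,
): boolean {
  return (
    isAllowed(row.allowedSourceTypes, sourceType) &&
    isAllowed(row.allowedTargetTypes, targetType)
  );
}

// Mirrors the TriggeredEdgeConstraints comments: ["Call"] -> ["Call", "Subcall"].
const triggered: EdgeTypeRow = {
  name: "triggered",
  allowedSourceTypes: ["Call"],
  allowedTargetTypes: ["Call", "Subcall"],
};

// A row written for an edge type with no *EdgeConstraints entry in the Module.
const unrestricted: EdgeTypeRow = {
  name: "related",
  allowedSourceTypes: [],
  allowedTargetTypes: [],
};

console.log(isEndpointPairAllowed(triggered, "Call", "Subcall")); // true
console.log(isEndpointPairAllowed(triggered, "Subcall", "Call")); // false
console.log(isEndpointPairAllowed(unrestricted, "Anything", "Else")); // true
```

Keeping the "empty means unrestricted" rule in one helper keeps the DB reading of `[]` in a single place, while the Module side expresses the same thing by omitting the `*EdgeConstraints` entry.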
-### Entry Naming Convention +## Entry Naming Convention -Within a graph type Module, entries follow a naming convention that distinguishes -their role (DD8): +Within a graph type Module, entries follow a suffix convention that distinguishes +their role and determines their DB mapping: | Suffix | Role | Maps to DB | |--------|------|------------| @@ -442,15 +348,14 @@ their role (DD8): | `*Enum` or bare name | Shared enum/type | Embedded in `node_types.schema`/`edge_types.schema` | | `BaseNode`, `BaseEdge` | Base attribute schemas | Composed into `*Node`/`*Edge` entries | -The `moduleToDbSchema()` function uses this convention to map Module entries to -the `node_types` and `edge_types` tables. Entries ending in `Node` become rows -with `name = entryNameWithoutSuffix ("Node")` and `schema = resolved entry`. -Same for `*Edge`. The `Config` entry maps to `graph_types.config`. +`moduleToDbSchema()` uses this convention to project Module entries to DB rows. +Entries ending in `Node` become rows with `name = entryNameWithoutSuffix("Node")` +and `schema = resolved entry`. Same for `*Edge`. The `Config` entry maps to +`graph_types.config`. 
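As a rough sketch of how the suffix convention could drive the projection, the following classifies entry names by role. It is illustrative only: `EntryRole`, `classifyEntry`, and `rowName` are hypothetical names, and the actual `moduleToDbSchema()` logic may differ.

```typescript
// Hypothetical role labels for Module entries (assumed, not from the codebase).
type EntryRole =
  | "config"
  | "base"
  | "edgeConstraints"
  | "nodeType"
  | "edgeType"
  | "shared";

function classifyEntry(entryName: string): EntryRole {
  if (entryName === "Config") return "config";
  // BaseNode/BaseEdge also end in Node/Edge, so they are matched before the suffix checks.
  if (entryName === "BaseNode" || entryName === "BaseEdge") return "base";
  if (entryName.endsWith("EdgeConstraints")) return "edgeConstraints";
  if (entryName.endsWith("Node")) return "nodeType";
  if (entryName.endsWith("Edge")) return "edgeType";
  // Bare names and *Enum entries are treated as shared types.
  return "shared";
}

// Strips the role suffix to get the row name, e.g. "CallNode" -> "Call".
function rowName(entryName: string, suffix: "Node" | "Edge"): string {
  return entryName.slice(0, -suffix.length);
}

const entries = [
  "Config",
  "BaseNode",
  "CallNode",
  "TriggeredEdge",
  "TriggeredEdgeConstraints",
  "CallStatus",
];
for (const name of entries) {
  console.log(`${name}: ${classifyEntry(name)}`);
}
console.log(rowName("CallNode", "Node")); // "Call"
console.log(rowName("TriggeredEdge", "Edge")); // "Triggered"
```

The ordering of the checks matters: `BaseNode` and `BaseEdge` would otherwise be classified as a node type and an edge type and projected to DB rows.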
## graphology Serialization Bridge -The bridge between Modules and graphology is the `SerializedGraph` pattern that -`@alkdev/flowgraph` already uses: +The bridge between Modules and graphology is the `SerializedGraph` pattern: ```ts // flowgraph's current pattern (standalone schemas) @@ -462,7 +367,7 @@ const CallGraphSerialized = SerializedGraph( // Module pattern (entries from the Module) const CallGraphSerialized = SerializedGraph( - CallGraph.CallNode, // entry from Module — resolves Refs through $defs + CallGraph.CallNode, // entry from Module — resolves Refs through $defs CallGraph.DependsOnEdge, // entry from Module Type.Object({}), ); @@ -472,7 +377,7 @@ Graphology's serialized format: ```ts { - attributes: {}, // Graph-level attributes (empty for most graphs) + attributes: {}, // Graph-level attributes options: { type: "directed", // From CallGraph.Config multi: false, @@ -493,36 +398,27 @@ The mapping: - `CallGraph.CallNode` → validates `nodes[].attributes` - `CallGraph.TriggeredEdge` → validates `edges[].attributes` -This is **complementary** to `@alkdev/flowgraph`'s `SerializedGraph` — storage -produces the data, flowgraph operates on it in memory. The `SerializedGraph` -factory function stays the same — its schema arguments now come from Module -entries instead of standalone schemas. The `moduleToDbSchema()` -function extracts per-entry schemas for DB storage; the `moduleToGraphology()` -function produces the graphology import format for hydration. +Storage produces this format; `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and +`SerializedGraph` consume it. The `SerializedGraph` factory function stays the +same — its schema arguments now come from Module entries instead of standalone +schemas. Storage doesn't need a graphology dependency. ## DB Persistence Bridge -The repository layer maps Module entries to the existing 6-table schema: +The repository layer maps Module entries to the 6-table metagraph schema: -1. 
**`graph_types`** row: `name` = Module name, `config` = `CallGraph.Config` - JSON Schema (with defaults resolved) -2. **`node_types`** rows: one row per `*Node` entry, `name` = entry name - (minus `Node` suffix), `schema` = resolved entry JSON Schema -3. **`edge_types`** rows: one row per `*Edge` entry, `name` = entry name - (minus `Edge` suffix), `schema` = resolved entry JSON Schema, +1. **`graph_types`** row: `name` = Module name, `config` = resolved + `CallGraph.Config` JSON Schema +2. **`node_types`** rows: one per `*Node` entry, `name` = entry name (minus + suffix), `schema` = resolved entry JSON Schema +3. **`edge_types`** rows: one per `*Edge` entry, `name` = entry name (minus + suffix), `schema` = resolved entry JSON Schema, `allowedSourceTypes`/`allowedTargetTypes` from constraint entries On read, the repository layer reconstructs the Module from DB rows: `Value.Check(CallGraph.CallNode, node.attributes)` validates node data against the Module entry. -**`Module.Import()` embedding**: When a Module entry references entries from -another Module (e.g., `FlowGraph.Import("CallStatus")`), the JSON Schema for -that entry includes the referenced entries in `$defs`. The repository layer -stores the **dereferenced entry** — the resolved JSON Schema with inline `$defs` -for transitive references — not the entire importing Module. This avoids -duplicating all of FlowGraph's definitions in every CallGraph node_types row. - ### Bridge Functions #### `moduleToDbSchema(module)` @@ -564,8 +460,7 @@ function moduleToDbSchema(module: TModule): DbSchema - `*EdgeConstraints` entries that reference edge type entries not present in the Module (the `edgeType` field must match an `*Edge` entry name). - `*EdgeConstraints` entries with empty `allowedSourceTypes` and - `allowedTargetTypes` arrays (empty = "no types allowed", which is - nonsensical; omit the entry instead for "no restriction"). + `allowedTargetTypes` arrays (omit the entry for "no restriction"). 
- Module without a `Config` entry (all graph types require configuration). #### `validateNode(module, entryName, data)` / `validateEdge(module, entryName, data)` @@ -581,47 +476,31 @@ Returns `true` if data passes `Value.Check` against the resolved Module entry. Throws if `entryName` doesn't match an `*Node`/`*Edge` entry in the Module. Does NOT throw on invalid data — returns `false`. -### Type.Any vs Type.Unknown - -The pre-Module `types.ts` used `Type.Any()` for `metadata` and `schema` fields. -The Module approach uses `Type.Unknown()`. These have different JSON Schema -outputs: - -- `Type.Any()` → `{}` (accepts anything, no validation) -- `Type.Unknown()` → `{}` with `additionalProperties: true` semantics - -For the Module approach, **`Type.Unknown()` is canonical**. It's the more -explicit choice — it communicates "this field stores arbitrary data, no -validation applied." `Type.Any()` is a legacy from the original TypeBox API. -The `Metagraph` Module uses `Type.Unknown()` throughout. - -### Performance Expectations +### Performance Graph type Modules are small — typically 5–20 entries (one Config, 2–5 node -types, 2–5 edge types, 2–5 shared types, 2–5 constraint entries). The -`Value.Check` cost scales with schema complexity, not Module size; only the -resolved entry schema is checked, not the entire Module. +types, 2–5 edge types, 2–5 shared types, 2–5 constraint entries). `Value.Check` +cost scales with schema complexity, not Module size; only the resolved entry +schema is checked, not the entire Module. The dereferenced entry strategy (DD6) means each DB row stores only its own -JSON Schema with transitive `$defs` — typically 1–3 KB per entry. A full -graph type's schemas total ~10–50 KB in the DB. This is negligible compared -to the node/edge data being stored. +JSON Schema with transitive `$defs` — typically 1–3 KB per entry. A full graph +type's schemas total ~10–50 KB in the DB, negligible compared to node/edge data. 
-"Validate on read" (Open Question 5) has a per-read cost. For -high-throughput paths, the repository layer can cache the resolved Module -entry locally after first read, avoiding repeated `Value.Check` for known-good -data. This is a repository-layer optimization, not a Module design concern. +"Validate on read" has a per-read cost. For high-throughput paths, the repository +layer can cache the resolved Module entry locally after first read. This is a +repository-layer optimization, not a Module design concern. ## Codegen Path -`TsToModule` generates TypeBox Modules from TypeScript interfaces. The path from -TypeScript to graph type: +`TsToModule.Generate()` produces TypeBox Module entries from TypeScript +interfaces, enabling a pipeline from TypeScript to graph type: ``` -TypeScript interface → TsToModule.Generate() → TypeBox Module entry -@alkdev/flowgraph CallNodeAttrs → flowgraph schema.ts → FlowGraph Module -@alkdev/taskgraph TaskNodeAttrs → taskgraph schema.ts → TaskGraph Module -@alkdev/operations Identity → operations types.ts → Operations Module +TypeScript interface → TsToModule.Generate() → Module entry +@alkdev/flowgraph CallNodeAttrs → flowgraph schema.ts → FlowGraph Module +@alkdev/taskgraph TaskNodeAttrs → taskgraph schema.ts → TaskGraph Module +@alkdev/operations Identity → operations types.ts → Operations Module ``` Since flowgraph already defines `CallNodeAttrs` as a standalone TypeBox schema, @@ -629,180 +508,32 @@ the codegen can produce a Module entry from it. Storage's `CallGraph` Module the composes `BaseNode` with `CallNodeAttrs` via `Type.Composite`, or imports from the flowgraph Module if flowgraph exports one (see Open Question 1). -## SchemaBuilder Equivalence +## Transition from SchemaBuilder -The removed `SchemaBuilder.build()` used to return a `GraphSchema` — a flat -object with `config`, `nodeTypes: Record`, and `edgeTypes: -Record`. A `Type.Module` with the same entries is -structurally equivalent. 
This section documents what the builder was doing -internally to show the correspondence. - -### What the builder was doing internally - -``` -SchemaBuilder - .config({ type: "directed", multi: false }) - .nodeType("call", CallNodeSchema) - .edgeType("triggered", EdgeSchema, { allowedSourceTypes: ["call"] }) - .build() - -internally builds: - -defs = { - Config: Type.Object({ type: Literal("directed"), multi: Literal(false), ... }), - CallNode: CallNodeSchema, - TriggeredEdge: EdgeSchema, - TriggeredEdgeConstraints: Type.Object({ edgeType: Literal("triggered"), ... }), -} -return Type.Module(defs) -``` - -The `.build()` return type was `TModule` (TypeBox Module). The `SchemaBuilder` is -removed — consumers use Module construction directly. - -### Why this is equivalent - -The `SchemaBuilder` was building a module under the hood — it just didn't have a -module system to target. Named entries referencing each other via strings is -exactly what `Type.Ref()` does natively. The Module format: - -- Gives `Type.Ref()` instead of loose schema objects -- Gives `Module.Import()` instead of `Type.Intersect` for cross-package refs -- Gives JSON Schema `$defs` that map directly to DB storage -- Gives `Value.Check`, `Value.Diff`, `Value.Errors` on the full type system -- Gives codegen compatibility via `TsToModule.Generate()` - -For the forward-looking connections (typed graph pointers, dbtype table -rendering, ujsx HostConfig for graph schemas), see -[forward-look.md](./forward-look.md). - -## Design Decisions - -### DD1: Module replaces SchemaBuilder - -The SchemaBuilder is replaced by TypeBox Modules. 
The Module format provides -what SchemaBuilder was building toward, but natively: -- Named references → `Type.Ref()` instead of loose schema objects -- Cross-module imports → `Module.Import()` instead of `Type.Intersect` -- JSON Schema `$defs` → maps directly to DB storage -- Codegen compatibility → `TsToModule.Generate()` produces Module entries - -### DD2: SchemaBuilder removed - -The `SchemaBuilder` is removed. Consumers use `Type.Module()` construction -directly, with `Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()` -as the building blocks. The `moduleToDbSchema()` function replaces -`SchemaBuilder.build()` as the bridge from Module to DB rows. - -### DD3: Config as a Module entry with Literal values - -Specific graph type Modules use `Type.Literal` for config values. The general -`Metagraph.Config` with `Type.Union` and defaults is for construction-time -validation. The specific Module freezes the config to exact values. - -### DD4: Node/edge attribute schemas are Module entries, not `Type.Any()` - -At the application layer, node and edge attribute schemas are named Module entries -with full type safety (`CallGraph.CallNode`, not `schema: Type.Any()`). At the -DB storage layer, the meta-schemas (`NodeType`, `EdgeType`) still have -`schema: Type.Unknown()` because the DB stores arbitrary JSON Schema blobs — the -Module entries are the application-level validation, the DB is the persistence -layer. - -**Mapping**: The repository layer maps between Module entries and DB rows using -the naming convention (`*Node` → `node_types`, `*Edge` → `edge_types`, `Config` -→ `graph_types.config`). On read, it looks up the graph type's Module to get -the validation schema for each entry. - -### DD5: Graphology import/export as the bridge to in-memory graphs - -Storage produces data that `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and -`SerializedGraph` consume. The Module entries validate data flowing in both -directions. 
Storage doesn't need its own graphology dependency — it produces -the JSON format, flowgraph consumes it. - -### DD6: Repository stores dereferenced entry schemas - -To avoid `Module.Import()` embedding the full `$defs` of referenced Modules in -every DB row, the repository layer stores **dereferenced entry schemas** — each -`node_types` row gets its entry's resolved JSON Schema with just the transitive -`$defs` it needs, not the entire importing Module's definitions. - -### DD7: Edge type constraints as named Module entries, not DB columns - -Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are named -Module entries (e.g., `TriggeredEdgeConstraints` with `Type.Array(Type.String())` -fields), not just DB columns. This gives them schema validation and -serialization. The repository layer projects these entries to the existing -`edge_types` columns (arrays of node type name strings). The DB schema -doesn't change — the Module entries are the source of truth. - -**Revised from original DD7** which stored constraints only as DB columns. -Named entries are strictly more capable: they validate and serialize; -DB columns are their persistence projection. - -### DD8: Naming convention for Module entries - -Within a graph type Module, entries are named with role-distinguishing suffixes: -`*Node` for node types, `*Edge` for edge types, `Config` for graph configuration, -`*EdgeConstraints` for edge endpoint constraints, and bare names or `*Enum` for -shared types. `moduleToDbSchema()` uses this convention to map entries to DB -tables. - -**Alternative considered**: Explicit metadata/decorators on entries (e.g., -`{ kind: "nodeType", name: "call", schema: ... }`). Rejected because it adds -boilerplate without adding information — the suffix convention is simpler -and sufficient for the expected Module size (5–20 entries). 
- -### DD9: Pointer abstraction is forward-looking, not v1 - -The structural analogy between ujsx's `ValuePointer`/`selectNode`/`setNode` and -graph node/edge addressing is real, but implementing typed graph pointers (via -JPATH Module or reactive signals) is a post-v1 concern. For v1, repository -functions use direct key-based addressing and the Module validates attribute -shapes. The Module's existence makes typed pointers feasible later because -it provides the schema the pointer validates against. - -**Alternative considered**: Implement typed pointers in v1 via a lightweight -`GraphPointer` wrapper. Rejected because it requires either JPATH Module -dependency or reactive signal integration, both of which add complexity -without clear v1 benefit. Direct key-based addressing is sufficient. - -### DD10: dbtype integration is post-v1 - -`@alkdev/dbtype`'s UJSX→Module→Host pipeline can eliminate the manual dual -definition of SQLite/PG table schemas. But dbtype is Phase 0 (architecture -complete, no implementation). For v1, storage uses manual Drizzle table -definitions. The Module-based graph type definitions are compatible with dbtype -because both produce `Type.Module` objects — the integration path is clear. - -**Alternative considered**: Implement dbtype integration alongside the initial Module -construction. Rejected because it adds a dependency on an unimplemented package -and the manual table definitions work well. The cost of deferring is continued -dual SQLite/PG maintenance, which is manageable for 6 metagraph tables. 
- -## What Changes +The existing `schemaBuilder.ts` and `types.ts` use a different approach that is +being replaced: | Before (unreleased) | After | |---------|-----| | `types.ts` — standalone schemas | `modules/metagraph.ts` — `Metagraph` Module | -| `schemaBuilder.ts` — fluent builder | Removed — replaced by Module construction | +| `schemaBuilder.ts` — fluent builder | Removed — replaced by `Type.Module()` construction | | `types.ts` — `BaseNodeAttributes`, `BaseEdgeAttributes` | `Metagraph` Module entries | | `types.ts` — `GraphConfig`, `GraphStatus`, `GraphBaseType` | `Metagraph` Module entries + const objects | | `allowedSourceTypes`/`allowedTargetTypes` as DB columns only | Named `*EdgeConstraints` Module entries (projected to DB columns) | | No concrete graph type Modules | `modules/call-graph.ts`, `modules/acl-graph.ts`, etc. | | No bridge between Module ↔ DB ↔ graphology | `bridge.ts` — validation, DB mapping, graphology format | -## What Doesn't Change +Note: `Type.Any()` used in the old `types.ts` for `metadata` and `schema` fields +is replaced by `Type.Unknown()` in the Module approach. Both produce `{}` in +JSON Schema, but `Type.Unknown()` is the canonical choice — it explicitly +communicates "no validation applied." -- **Database tables** — same 6 metagraph tables, same columns, same relations -- **SQLite host** — table definitions, relations, client factory unchanged -- **PostgreSQL host** (planned) — same shapes, different dialect -- **`@alkdev/typebox` dependency** — same. Modules are a core TypeBox feature -- **Encryption utility** — unchanged, can be a Module entry in `SecretGraph` -- **`allowedSourceTypes`/`allowedTargetTypes`** — same DB columns, same semantics - (Module entries are the source of truth, projected to DB columns by - `moduleToDbSchema()`) +**What doesn't change**: The 6 metagraph database tables, their columns, and +relations remain the same. 
SQLite host table definitions, client factory, and +drizzlebox-generated schemas are unchanged. The `@alkdev/typebox` dependency is +unchanged. The encryption utility (planned) is unchanged. `allowedSourceTypes` +and `allowedTargetTypes` remain DB columns with the same semantics — Module +entries are the source of truth, projected to columns by `moduleToDbSchema()`. ## Implementation Path @@ -817,62 +548,138 @@ dual SQLite/PG maintenance, which is manageable for 6 metagraph tables. 4. **Phase 4**: Add `moduleToGraphology()` and `fromGraphologyExport()` for the graphology bridge. Storage produces the format, flowgraph consumes it. -Acceptance criteria per phase: -- **Phase 2 complete**: `moduleToDbSchema()` produces values compatible with all - 6 metagraph tables -- **Phase 3 complete**: Reference Modules validate against their flowgraph/taskgraph - counterparts +Acceptance criteria: +- **Phase 2 complete**: `moduleToDbSchema()` produces values compatible with + all 6 metagraph tables +- **Phase 3 complete**: Reference Modules validate against their + flowgraph/taskgraph counterparts -## Relationship to Other Packages +### Relationship to Other Packages | Package | What changes | What stays | |---------|-------------|------------| | `@alkdev/storage` | `types.ts` → Module, `schemaBuilder.ts` → removed, new `modules/` and `bridge.ts` | Tables, relations, crypto, client factory | -| `@alkdev/flowgraph` | `CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus` become Module entries (optional, exported from `/schema` subpath) | FlowGraph class, analysis, all runtime logic | +| `@alkdev/flowgraph` | `CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus` become Module entries (optional, exported from `/schema`) | FlowGraph class, analysis, all runtime logic | | `@alkdev/taskgraph` | `TaskGraphNodeAttributes`, `DependencyEdge` become Module entries (optional) | TaskGraph class, analysis, all runtime logic | | `@alkdev/operations` | `Identity`, `AccessControl` become Module entries 
(optional) | Registry, call protocol, adapters | | `@alkdev/pubsub` | No change | Transport layer | | `@alkdev/ujsx` | No change (already a Module) | The pattern we're following | | `@alkdev/dbtype` | No change (Phase 0) | Future: storage table defs could be dbtype element trees | +## Design Decisions + +### DD1: TypeBox Module replaces the SchemaBuilder + +Graph type definitions are `Type.Module` objects. The previous `SchemaBuilder` +class is removed — consumers use `Type.Module()` construction directly, with +`Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()` as the building +blocks. The `moduleToDbSchema()` function replaces `SchemaBuilder.build()` as +the bridge from Module to DB rows. + +This provides `Type.Ref()` for internal references, `Module.Import()` for +cross-package references, JSON Schema `$defs` that map directly to DB storage, +and codegen compatibility via `TsToModule.Generate()`. + +### DD2: Metagraph.Import() for same-package Modules + +Concrete graph types within `@alkdev/storage` use `Metagraph.Import("BaseNode")` +to compose base schemas. This avoids duplication and keeps the base schemas in +one place. External packages that define graph type Modules should re-declare +base schemas locally — storage should not be a dependency of other packages' +schema definitions. + +### DD3: Config as a Module entry with Literal values + +General `Metagraph.Config` uses `Type.Union` with defaults for construction-time +validation. Specific graph types freeze config values to `Type.Literal`, making +the config a precise contract rather than a validation surface. + +### DD4: Node/edge attribute schemas are Module entries, not Type.Any() + +At the application layer, node and edge attribute schemas are named Module +entries with full type safety (`CallGraph.CallNode`, not `schema: Type.Any()`). +At the DB storage layer, the meta-schemas (`NodeType`, `EdgeType`) still have +`schema: Type.Unknown()` because the DB stores arbitrary JSON Schema blobs. 
+ +### DD5: Storage produces graphology format, flowgraph consumes it + +Storage doesn't need a graphology dependency. It produces the JSON serialization +format that `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and `SerializedGraph` +consume. The Module entries validate data flowing in both directions. + +### DD6: Repository stores dereferenced entry schemas + +When a Module entry uses `Module.Import()`, the entry's JSON Schema embeds the +referenced Module's `$defs`. To avoid storing the full referenced Module in +every DB row, the repository layer stores **dereferenced entry schemas** — each +`node_types` row gets its entry's resolved JSON Schema with just the transitive +`$defs` it needs, not the entire importing Module's definitions. + +### DD7: Edge type constraints as named Module entries + +Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are named +Module entries (e.g., `TriggeredEdgeConstraints`), not just DB columns. This +gives them schema validation (`Value.Check`) and serialization (JSON Schema +with `$defs`). The repository layer projects these entries to the existing +`edge_types` columns. The DB schema doesn't change — Module entries are the +source of truth, DB columns are the persistence projection. + +### DD8: Naming convention for Module entries + +Module entries use role-distinguishing suffixes: `*Node` for node types, +`*Edge` for edge types, `Config` for graph configuration, `*EdgeConstraints` +for edge endpoint constraints, and bare names or `*Enum` for shared types. +`moduleToDbSchema()` uses this convention to map entries to DB tables. + +This was chosen over explicit metadata/decorators (e.g., +`{ kind: "nodeType", name: "call", schema: ... }`) because the suffix convention +is simpler and sufficient for the expected Module size (5–20 entries). 
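
The suffix convention in DD8 can be sketched as a small classifier over entry names. This is an illustrative sketch only, not the actual `moduleToDbSchema()` implementation: `EntryRole`, `classifyEntry`, and `groupEntries` are hypothetical names, and the sample entry names (`SubcallNode`, `CallStatus`) are assumed for the example. It does not depend on TypeBox; it only shows how entry names could be sorted into roles before being projected to DB tables.

```typescript
// Hypothetical roles for entries in a graph type Module (illustrative names).
type EntryRole = "config" | "nodeType" | "edgeType" | "edgeConstraints" | "shared";

// Classify a Module entry name by its suffix, following the DD8 convention.
function classifyEntry(name: string): EntryRole {
  if (name === "Config") return "config";
  // "*EdgeConstraints" does not end in "Edge", so the two suffixes never collide.
  if (name.endsWith("EdgeConstraints")) return "edgeConstraints";
  if (name.endsWith("Edge")) return "edgeType";
  if (name.endsWith("Node")) return "nodeType";
  // Bare names and "*Enum" entries are treated as shared types.
  return "shared";
}

// Group a Module's entry names by role, e.g. as a first step before DB projection.
function groupEntries(names: string[]): Record<EntryRole, string[]> {
  const groups: Record<EntryRole, string[]> = {
    config: [],
    nodeType: [],
    edgeType: [],
    edgeConstraints: [],
    shared: [],
  };
  for (const name of names) {
    groups[classifyEntry(name)].push(name);
  }
  return groups;
}

// Example entry names for a call-graph Module (assumed for illustration).
const callGraphGroups = groupEntries([
  "Config",
  "CallStatus",
  "CallNode",
  "SubcallNode",
  "TriggeredEdge",
  "TriggeredEdgeConstraints",
]);
console.log(callGraphGroups);
```

One consequence of relying on suffixes is that a shared type whose name happens to end in `Node` or `Edge` would be classified as a node or edge type, so shared entries need names that avoid those suffixes.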
+ +### DD9: Pointer abstraction is forward-looking, not v1 + +The structural analogy between ujsx's `ValuePointer`/`selectNode`/`setNode` and +graph node/edge addressing is real, but implementing typed graph pointers (via +JPATH Module or reactive signals) is a post-v1 concern. For v1, repository +functions use direct key-based addressing (`findNode(graphId, nodeKey)`), and +the Module validates attribute shapes. See [forward-look.md](./forward-look.md). + +### DD10: dbtype integration is post-v1 + +`@alkdev/dbtype`'s UJSX→Module→Host pipeline can eliminate the manual dual +definition of SQLite/PG table schemas. But dbtype is Phase 0 (architecture +complete, no implementation). For v1, storage uses manual Drizzle table +definitions. The Module-based graph type definitions are compatible with dbtype +because both produce `Type.Module` objects — the integration path is clear. +See [forward-look.md](./forward-look.md). + ## Open Questions 1. **Should `@alkdev/flowgraph` export a `Type.Module`, or should storage define its own entries with documented correspondence?** Flowgraph currently exports `CallNodeAttrs` as a standalone `Type.Object`. To use `Import()`, flowgraph - needs to export a Module. But storage can start with standalone schemas and + needs to export a Module. Storage can start with standalone schemas and `Type.Composite([BaseNode, CallNodeAttrs])` — no dependency on flowgraph. - Adopt `Import()` when flowgraph provides a Module. **This avoids a - circular dependency: `@alkdev/storage` does NOT depend on `@alkdev/flowgraph`.** + Adopt `Import()` when flowgraph provides a Module. **This avoids a circular + dependency: `@alkdev/storage` does NOT depend on `@alkdev/flowgraph`.** 2. **Should concrete graph type Modules live in storage or in their respective packages?** Call-graph attribute schemas are defined by flowgraph's domain, not storage's. Storage provides the metagraph *framework* (the `Metagraph` Module - with `BaseNode`, `BaseEdge`, `Config`). 
Concrete graph types like `CallGraph` - could live either in storage (as reference implementations) or in their - respective packages (flowgraph exports `CallGraph` Module alongside - `CallNodeAttrs`). **Decision: Both.** Storage provides reference Modules in - `modules/` that consumers can use directly or replace. Flowgraph may also - export a Module — the two are compatible via Module `$defs`. + with `BaseNode`, `BaseEdge`, `Config`). Concrete types like `CallGraph` could + live either in storage (as reference implementations) or in their respective + packages. **Decision: Both.** Storage provides reference Modules in `modules/` + that consumers can use directly or replace. Flowgraph may also export a + Module — the two are compatible via Module `$defs`. 3. **Should `*EdgeConstraints` entries use `Type.Ref("CallNode")` or - `Type.String()` for allowed source/target types?** Using `Type.Ref` - would mean "each element in the array must validate against the CallNode - schema," which is semantically wrong — the constraint is about which named - node types are valid endpoints, not about data shapes. Using `Type.String()` - matches the actual semantics (arrays of node type names) but loses the - structural link. **Decision: `Type.String()`** — the constraint arrays - contain names, not schemas. The naming convention provides an implicit - contract that string values should correspond to `*Node` entry names, - enforced by `moduleToDbSchema()` at projection time. + `Type.String()` for allowed source/target types?** See the + [Edge Type Constraints](#edge-type-constraints) section. **Decision: + `Type.String()`** — the constraint arrays contain names, not schemas. 4. **How does the graph pointer abstraction interact with the repository layer?** - For v1, repository functions use direct key-based addressing. Typed pointers - (JPATH Module, reactive ValuePointer) could layer on top of the repository - later. 
The key question: does the repository return raw data (untyped JSON), - or does it validate against the Module before returning? **Decision: validate - on read** — if the data doesn't match the Module entry, throw. This makes - typed pointers safe: any value you get from the repo conforms to the schema. + For v1, repository functions use direct key-based addressing. **Decision: + validate on read** — if data doesn't match the Module entry, throw. This + makes any value retrieved from the repo conform to the schema. ## References @@ -880,8 +687,8 @@ Acceptance criteria per phase: - ujsx ADR-002 (Module as type registry): `/workspace/@alkdev/ujsx/docs/architecture/decisions/002-typebox-module-as-registry.md` - ujsx schema docs: `/workspace/@alkdev/ujsx/docs/architecture/schema.md` - TsToModule codegen: `/workspace/research/typebox_research/codegen/ts-to-module.ts` -- ujsx Module examples: `/workspace/research/typebox_research/ujsx/unist.gen.ts`, `/workspace/research/typebox_research/ujsx/mdast.gen.ts` - Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/` - Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts` -- Forward-looking connections (pointers, dbtype, ujsx IR): [forward-look.md](./forward-look.md) -- Ecosystem integration: [overview.md](./overview.md) \ No newline at end of file +- Schema evolution: [schema-evolution.md](./schema-evolution.md) +- Forward-looking connections: [forward-look.md](./forward-look.md) +- Package overview: [overview.md](./overview.md) \ No newline at end of file