docs: restructure metagraph-module.md for clarity and reduced redundancy

- Eliminate 4x redundancy on SchemaBuilder removal (was in Overview, Equivalence section, DD1, DD2)
- Remove forward references to DD numbers that break reading flow
- Separate specification from rationale (DDs capture decisions, body specifies)
- Fix Type.Ref inconsistency in Edge Constraints example (should use Metagraph.Import per DD2)
- Expand 'Why TypeBox Modules' with the three friction points it solves
- Add Performance subsection, Codegen Path, Transition table, Implementation Path
- Restore Relationship to Other Packages table
- Remove historical artifacts (SchemaBuilder equivalence internals, Type.Any migration notes)
- 887 lines → 694 lines (22% reduction)
2026-05-29 06:59:05 +00:00
parent 3b63d92976
commit 6c3ed598db


@@ -1,16 +1,16 @@
---
status: draft
last_updated: 2026-05-30
last_updated: 2026-05-29
---
# Metagraph as TypeBox Module
Graph type definitions as `Type.Module`, aligning with the ujsx pattern for
recursive schemas, cross-package references, codegen, and graphology serialization.
Graph type definitions as `Type.Module`: recursive schemas, cross-package
references, and DB persistence.
## The Metagraph Data Model
The metagraph pattern is a three-level type system:
The metagraph is a three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints
@@ -24,8 +24,8 @@ The metagraph pattern is a three-level type system:
"can_read", "depends_on"). Each edge type has a TypeBox schema for its
attributes. Optionally constrains which source/target node types are valid.
Then **Graph instances** belong to a graph type and contain **Nodes** and
**Edges** conforming to those type definitions.
**Graph instances** belong to a graph type and contain **Nodes** and **Edges**
conforming to those type definitions.
```
GraphType "call-graph" (directed, multi, self-loops allowed)
@@ -42,7 +42,7 @@ Graph "session-abc-call-graph" (instance)
│ └── attributes: { requestId, operationId, status, ... }
├── Node "call-002" → nodeTypeId → NodeType "subcall"
│ └── attributes: { requestId, parentRequestId, ... }
└── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
└── attributes: { type: "triggered" }
sourceNodeKey: "call-001"
targetNodeKey: "call-002"
@@ -54,83 +54,45 @@ Nodes and edges use a **composite identity model**: identified by
`key` is the identity.
Node and edge attributes are stored as JSON text in SQLite (jsonb in PG). The
graph type's schema defines what shape these attributes should have, but the
database doesn't enforce the schema — all validation happens in the repository
layer. See [schema-evolution.md](./schema-evolution.md) for how schemas change
over time, and [sqlite-host.md](./sqlite-host.md) for the table definitions.
graph type's schema defines the expected shape, but the database doesn't enforce
it — validation happens in the repository layer. See
[schema-evolution.md](./schema-evolution.md) for how schemas change over time,
and [sqlite-host.md](./sqlite-host.md) for table definitions.
## Overview
## Why TypeBox Modules
A graph type definition is naturally a TypeBox Module. It has named entries
(node types, edge types, config) that reference each other with `Type.Ref()`,
compose with `Type.Composite()`, and can cross-reference other Modules with
`Import()`. This is the same pattern used by `@alkdev/ujsx` (where `UJSX` is
a Module with `UPrimitive`, `UElement`, `URoot`, `UNode` recursively referencing
each other).
A graph type definition has named entries (node types, edge types, config) that
reference each other. `Type.Module` is the natural fit:
The removed `SchemaBuilder` produced a flat `GraphSchema` object — an ad-hoc
`Record<string, NodeType>` + `Record<string, EdgeType>`. This works but
creates friction:
- **`Type.Ref("CallStatus")`** — recursive and internal references resolve
within the Module's `$defs`
- **`Module.Import("CallStatus")`** — cross-package references embed the
referenced Module's `$defs`
- **`Value.Check(Module.Import("CallNode"), data)`** — runtime validation
- **`Static<typeof Module>`** — TypeScript types from the Module
1. **No cross-graph-type references** — a call graph node type can't reference
`CallStatus` from `@alkdev/flowgraph` without manual `Type.Intersect`
composition. Each package defines schemas independently, duplicating types.
2. **No graphology compatibility** — the schema output is a flat JSON object,
not a format that maps to graphology's `import()`/`export()`. Consumers
manually map node/edge attributes.
This replaces the removed `SchemaBuilder`, which produced a flat
`Record<string, NodeType>` + `Record<string, EdgeType>`. That approach had
three limitations that Modules solve natively:
1. **No cross-graph-type references** — a call graph node type couldn't
reference `CallStatus` from `@alkdev/flowgraph` without manual
`Type.Intersect`. Each package duplicated types independently.
2. **No graphology compatibility** — the flat JSON output didn't map to
graphology's `import()`/`export()`. Consumers manually mapped node/edge
attributes.
3. **No codegen leverage**: `TsToModule` generates TypeBox Modules from
TypeScript interfaces. The SchemaBuilder couldn't consume Module output, so
codegen-produced types must be manually translated.
TypeScript interfaces, but the builder couldn't consume Module output.
The Module approach treats each graph type as a `Type.Module`, aligning storage
with how ujsx already works — recursive types via `Ref`, composition via
`Composite`, cross-references via `Import`.
This aligns with the pattern proven in `@alkdev/ujsx`, where `UJSX` is a Module
with `UPrimitive`, `UElement`, `URoot`, `UNode` recursively referencing each
other. See [forward-look.md](./forward-look.md) for how this connects to the
broader ecosystem (codegen, graphology, dbtype).
For the forward-looking view of how this connects to dbtype, graph pointers,
and the ujsx universal IR pipeline, see [forward-look.md](./forward-look.md).
## Base Module: Metagraph
## The Pattern (Proven in ujsx)
`@alkdev/ujsx` already uses this pattern (ADR-002: "TypeBox Module as type
registry"):
```ts
// ujsx: schema.ts
export const UJSX = Type.Module({
UPrimitive: Type.Union([Type.String(), Type.Number(), Type.Boolean(), Type.Null()]),
PropValue: Type.Union([..., Type.Ref("UNode"), ...]),
UniversalProps: Type.Object({}, { additionalProperties: Type.Union([Type.Ref("PropValue"), Type.Undefined()]) }),
UElement: Type.Object({
type: Type.String(),
props: Type.Ref("UniversalProps"),
children: Type.Array(Type.Ref("UNode")), // recursive!
}),
URoot: Type.Object({
type: Type.Literal("root"),
props: Type.Ref("UniversalProps"),
children: Type.Array(Type.Ref("UNode")), // recursive!
}),
UNode: Type.Union([Type.Ref("UPrimitive"), Type.Ref("UElement"), Type.Ref("URoot")]),
});
```
Key properties:
- **`Type.Ref("UNode")`** resolves within the Module's `$defs` — recursive
references are natural
- **`UJSX.Import("UElement")`** lets other Modules reference ujsx types — the
referenced Module's `$defs` are embedded in the importing Module's JSON Schema
- **`Value.Check(UJSX.Import("UElement"), node)`** validates at runtime
- **`Static<typeof UJSX>`** gives TypeScript types (or hand-written types for
non-serializable entries like `ComponentFn`)
Graph type definitions have the same structure — named entries that reference
each other, with possible cross-references to other packages' Modules.
## Proposed: GraphType as a TypeBox Module
### Base Module: Metagraph
The metagraph meta-schema itself is a Module:
The metagraph meta-schema is a Module providing base entries that concrete
graph types compose from:
```ts
export const Metagraph = Type.Module({
@@ -157,12 +119,26 @@ export const Metagraph = Type.Module({
});
```
### Concrete Graph Type: CallGraph
- `Config` uses `Type.Union` with defaults for construction-time validation
("any valid config"). Specific graph types narrow these to `Type.Literal`
values.
- `BaseNode` and `BaseEdge` provide common attribute schemas. Concrete graph
types compose them via `Type.Composite`.
- `metadata` and similar "arbitrary data" fields use `Type.Unknown()`
(not `Type.Any()`). `Type.Unknown()` is canonical — it communicates "no
validation applied" explicitly.
A specific graph type is also a Module. It composes `BaseNode`/`BaseEdge` via
`Type.Composite()` (same as ujsx's `Mdast.Node: Type.Composite([Unist.Import("UnistNode"), ...])`):
## Concrete Graph Type Modules
A specific graph type is also a `Type.Module`. It composes `BaseNode`/`BaseEdge`
via `Metagraph.Import()` and `Type.Composite()`, narrows config to literal values,
and defines its own node types, edge types, and shared types.
### Example: CallGraph
```ts
import { Metagraph } from "./metagraph.ts";
export const CallGraph = Type.Module({
// Config is specific — literal values, not unions with defaults
Config: Type.Object({
@@ -171,7 +147,7 @@ export const CallGraph = Type.Module({
allowSelfLoops: Type.Literal(false),
}),
// Node types compose BaseNode (from Metagraph) with call-specific attributes
// Node types compose BaseNode with call-specific attributes
CallNode: Type.Composite([
Metagraph.Import("BaseNode"),
Type.Object({
@@ -229,9 +205,33 @@ export const CallGraph = Type.Module({
});
```
### Type.Composite, not Type.Intersect
Graph type Modules use `Type.Composite` to extend base schemas, not
`Type.Intersect`. The difference:
- **`Type.Intersect`** produces a `TIntersect` wrapper with `allOf` — consumers
must traverse `allOf` to access properties.
- **`Type.Composite`** produces a flat `TObject` — overlapping keys are
intersected via `IntersectEvaluated`, non-overlapping keys are merged.
Both use intersection semantics for overlapping keys. When overlapping keys have
a subtype relationship (e.g., `type: Type.String()` narrowed to `type: Type.Literal("triggered")`),
the intersection resolves to the narrower type, which is the correct behavior.
**Constraint**: Do not use `Type.Composite` with overlapping keys of incompatible
types. If `BaseEdge` has `type: Type.String()` and a concrete edge type needs
`type: Type.Number()`, the intersection evaluates to `never`. For graph types,
this is not a concern — base and concrete keys either don't overlap, or the
overlap is a valid subtype narrowing.
### Cross-Module References
`Module.Import()` allows one Module to reference entries from another:
`Module.Import()` allows one Module to reference entries from another. In the
CallGraph example, `Metagraph.Import("BaseNode")` embeds `Metagraph`'s `$defs`
into `CallGraph`'s JSON Schema output.
This also works across packages:
```ts
import { FlowGraph } from "@alkdev/flowgraph/schema";
@@ -241,146 +241,76 @@ const CallGraph = Type.Module({
CallNode: Type.Composite([
Type.Ref("BaseNode"),
Type.Object({
status: FlowGraph.Import("CallStatus"), // from flowgraph
identity: Type.Optional(FlowGraph.Import("Identity")), // from flowgraph
// ...
status: FlowGraph.Import("CallStatus"),
identity: Type.Optional(FlowGraph.Import("Identity")),
}),
]),
});
```
This is exactly the `Mdast.Import("UnistNode")` pattern from the ujsx research.
**⚠️ Import embedding**: `Module.Import()` embeds the referenced Module's `$defs`
into the importing Module's JSON Schema output. When `CallGraph` imports from
**Import embedding**: `Module.Import()` embeds the referenced Module's `$defs`
into the importing Module's JSON Schema. When `CallGraph` imports from
`FlowGraph`, the resulting JSON Schema includes all of `FlowGraph`'s definitions
in `$defs`. See DD6 for how the repository layer handles this.
in `$defs`. The repository layer stores **dereferenced entry schemas** — each
`node_types` row gets its entry's resolved JSON Schema (with inline `$defs` for
just its transitive references), not the entire importing Module. This avoids
storage bloat and version coupling (DD6).
**Decision (DD6)**: The repository layer stores **dereferenced entry schemas**
each `node_types` row gets its entry's resolved JSON Schema (with inline `$defs`
for just its transitive references), not the entire importing Module. This
avoids storage bloat and version coupling issues.
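Here is a minimal sketch of the dereferencing step, written in plain TypeScript over JSON Schema objects. It makes two assumptions. First, the Module's entries sit under `$defs`. Second, internal references are written as `#/$defs/<Name>`. TypeBox's own `$ref` encoding may differ, so the prefix is kept as a constant. `dereferenceEntry` and the sample definitions are hypothetical. The output keeps only the requested entry and its transitive references.

```typescript
type JsonSchema = { [key: string]: unknown };

// Assumed reference form for entries inside the same $defs table.
const REF_PREFIX = "#/$defs/";

// Walk a schema node and record every $defs entry it references.
function collectRefs(node: unknown, found: Set<string>): void {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "$ref" && typeof value === "string" && value.startsWith(REF_PREFIX)) {
        found.add(value.slice(REF_PREFIX.length));
      } else {
        collectRefs(value, found);
      }
    }
  }
}

// Build a standalone schema for one entry: the entry plus only the $defs it
// transitively references, not every definition of the importing Module.
function dereferenceEntry(defs: Record<string, JsonSchema>, entry: string): JsonSchema {
  const needed = new Set<string>();
  const pending: string[] = [entry];
  while (pending.length > 0) {
    const name = pending.pop() as string;
    if (needed.has(name)) continue;
    if (!Object.prototype.hasOwnProperty.call(defs, name)) {
      throw new Error(`unknown $defs entry: ${name}`);
    }
    needed.add(name);
    const refs = new Set<string>();
    collectRefs(defs[name], refs);
    refs.forEach((ref) => pending.push(ref));
  }
  const kept: Record<string, JsonSchema> = {};
  needed.forEach((name) => {
    kept[name] = defs[name];
  });
  return { $ref: `${REF_PREFIX}${entry}`, $defs: kept };
}

// Hypothetical $defs of an importing Module.
const moduleDefs: Record<string, JsonSchema> = {
  CallStatus: { type: "string", enum: ["pending", "done"] },
  Identity: { type: "object", properties: { id: { type: "string" } } },
  Unrelated: { type: "number" },
  CallNode: {
    type: "object",
    properties: {
      status: { $ref: "#/$defs/CallStatus" },
      identity: { $ref: "#/$defs/Identity" },
    },
  },
};

// Keeps CallNode, CallStatus and Identity; Unrelated is left out.
const callNodeSchema = dereferenceEntry(moduleDefs, "CallNode");
```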
### BaseNode/BaseEdge: Import vs Local Re-declaration
### BaseNode/BaseEdge: Local Re-declaration vs Metagraph.Import
There are two ways to get `BaseNode`/`BaseEdge` into a concrete graph type Module:
`Type.Ref()` only resolves entries within the *same* Module. In the `CallGraph`
example above, `Type.Ref("BaseNode")` requires `BaseNode` to be an entry in the
`CallGraph` Module. There are two strategies for getting `BaseNode`/`BaseEdge`
into a concrete graph type Module:
- **`Metagraph.Import("BaseNode")`** — references the base Module directly.
No duplication, but embeds `Metagraph`'s `$defs` (3 entries — minimal bloat).
- **Local re-declaration** — copy the base schemas into the concrete Module.
No `$defs` embedding, but duplication if `Metagraph` evolves.
**Option A: Re-declare locally** (shown in the example above). Each concrete
Module includes its own `BaseNode`/`BaseEdge` entries. The schemas are identical
to `Metagraph.BaseNode`/`Metagraph.BaseEdge` — you copy them in. Simple, but
creates duplication. If the base schemas evolve, each concrete Module must be
updated independently.
**Decision**: Use `Metagraph.Import()` for Modules within `@alkdev/storage`
(e.g., `modules/call-graph.ts`). Both Modules live in the same package, so
there's no circular dependency. For Modules defined in external packages
(e.g., `@alkdev/flowgraph`), re-declare base schemas locally — external
packages should not depend on storage's `Metagraph` Module.
**Option B: Metagraph.Import**. The concrete Module imports from `Metagraph`:
### Config: Literal Values Freeze the Configuration
The general `Metagraph.Config` uses `Type.Union` with defaults (for
construction-time: "any valid config"). Specific graph types freeze these to
`Type.Literal` values:
```ts
const CallGraph = Type.Module({
CallNode: Type.Composite([
Metagraph.Import("BaseNode"),
Type.Object({ requestId: Type.String(), ... }),
]),
});
```
// General: accepts any valid config
Metagraph.Config // type: union of "directed"|"undirected"|"mixed", multi: boolean, ...
This avoids duplication but embeds `Metagraph`'s `$defs` into `CallGraph`'s
JSON Schema output. For most cases, `Metagraph` is small (3 entries) so the
bloat is minimal. If `Metagraph` grows, this could become a concern.
**Decision: Option B for same-package Modules (recommended), Option A as
fallback for external-package Modules**.
For Modules defined within `@alkdev/storage` (like `CallGraph` in
`modules/call-graph.ts`), `Metagraph.Import("BaseNode")` has no circular
dependency issue — both `Metagraph` and `CallGraph` live in the same package.
The `Import` approach avoids duplication and keeps the base schemas in one
place.
For Modules defined outside `@alkdev/storage` (e.g., in `@alkdev/flowgraph`),
Option A applies because external packages should not depend on storage's
`Metagraph` Module (see Open Question 1). Those packages re-declare their own
base schemas or define them independently.
The v1 reference Modules in `modules/` should use Option B. If a future
consumer defines a `CallGraph` Module externally, they can choose either
approach — the schemas are structurally identical.
**Verified**: `Type.Composite([Type.Ref("BaseNode"), Type.Object({...})])`
within a Module resolves correctly. Test confirms: `Value.Check(Module.Import("CallNode"), validData)` passes.
### Type.Composite vs Type.Intersect
The Module approach uses `Type.Composite` for extending `BaseNode`/`BaseEdge`,
not `Type.Intersect`. This matches the ujsx pattern where `Mdast.Node` is
`Type.Composite([Unist.Import("UnistNode"), Type.Object({...})])`.
The difference:
- **`Type.Intersect`** creates a JSON Schema `allOf` — the result is a
`TIntersect` wrapper with nested schemas. Consumers must traverse `allOf`
to access properties.
- **`Type.Composite`** produces an **intersection evaluated into a flat
`TObject`** — overlapping keys are intersected via `IntersectEvaluated`
and the result is a single object with no `allOf` wrapper. The output
shape is `{ key1: Intersect([typeA, typeB]), key2: typeC, ... }`.
**Both use intersection semantics for overlapping keys.** Composite is NOT
an `Object.assign` override — when overlapping keys have varying (incompatible)
types, the result is `never`. When overlapping keys have a subtype
relationship (like `Type.String()` and `Type.Literal("triggered")`), the
intersection resolves to the narrower type (`Type.Literal("triggered")`),
which is the correct behavior.
**Why Composite over Intersect for graph types**: The output is a flat
`TObject` that maps directly to a node/edge attribute schema. `Intersect`
produces a `TIntersect` wrapper that would need unwrapping. For graph types
where base and concrete attributes have non-overlapping keys (most cases)
or subtype-only overlaps (like `type: Type.String()` narrowed to `type: Type.Literal(...)`),
Composite evaluates to the same result but in a more convenient shape.
**Design constraint**: Do not use `Type.Composite` with overlapping keys of
incompatible types. If `BaseEdge` has `type: Type.String()` and a concrete
edge type needs `type: Type.Number()`, the intersection evaluates to `never`.
For graph types, this is not a concern — base and concrete keys either don't
overlap, or the overlap is a valid subtype narrowing (union → literal).
### Config: Literal Values for Specific Graph Types
The general `Metagraph.Config` has `Type.Union` with defaults (for
construction-time validation: "any valid config"). Specific graph types use
`Type.Literal` for frozen config values:
```ts
// General (construction): Type.Union([Type.Literal("directed"), Type.Literal("undirected"), ...])
// Specific (frozen): Type.Literal("directed")
// Specific: frozen to exact values
CallGraph.Config // type: "directed", multi: false, allowSelfLoops: false
```
The construction flow: consumer provides a general config → validated against
`Metagraph.Config` → the specific graph type Module uses `Type.Literal` to
freeze the value. Narrowing from `Type.Union` to `Type.Literal` is explicit
in the Module — no builder step needed.
`Metagraph.Config` → the specific graph type Module freezes the values with
`Type.Literal`.
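The two-step flow can be sketched without TypeBox. The first step validates a general config and fills in defaults. The second step pins each value with JSON Schema `const`, which is how JSON Schema expresses a single literal value. The default values, `resolveConfig`, and `freezeConfig` are assumptions made for illustration. They are not the actual `Metagraph.Config` defaults or API.

```typescript
type GraphKind = "directed" | "undirected" | "mixed";

interface GeneralConfig {
  type?: GraphKind;
  multi?: boolean;
  allowSelfLoops?: boolean;
}

interface ResolvedConfig {
  type: GraphKind;
  multi: boolean;
  allowSelfLoops: boolean;
}

// Hypothetical defaults, for illustration only.
const DEFAULTS: ResolvedConfig = { type: "mixed", multi: false, allowSelfLoops: true };
const KINDS: readonly GraphKind[] = ["directed", "undirected", "mixed"];

// Step 1: check against the general "any valid config" shape and fill in defaults.
function resolveConfig(input: GeneralConfig): ResolvedConfig {
  const resolved: ResolvedConfig = { ...DEFAULTS, ...input };
  if (!KINDS.includes(resolved.type)) {
    throw new Error(`invalid graph type: ${String(resolved.type)}`);
  }
  if (typeof resolved.multi !== "boolean" || typeof resolved.allowSelfLoops !== "boolean") {
    throw new Error("multi and allowSelfLoops must be booleans");
  }
  return resolved;
}

// Step 2: pin each resolved value with JSON Schema `const`, so the frozen
// schema accepts exactly one configuration.
function freezeConfig(config: ResolvedConfig) {
  return {
    type: "object",
    properties: {
      type: { const: config.type },
      multi: { const: config.multi },
      allowSelfLoops: { const: config.allowSelfLoops },
    },
    required: ["type", "multi", "allowSelfLoops"],
    additionalProperties: false,
  };
}

// CallGraph's frozen values: directed, multi false, allowSelfLoops false.
const callGraphConfig = freezeConfig(
  resolveConfig({ type: "directed", multi: false, allowSelfLoops: false }),
);
```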
### Edge Type Constraints: named constraint entries
## Edge Type Constraints
Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are **named
Module entries**, not columns bolted onto DB rows. This makes them first-class
parts of the schema — queryable, validatable, and composable:
parts of the schema — queryable, validatable, and serializable.
```ts
import { Metagraph } from "./metagraph.ts";
export const CallGraph = Type.Module({
// ...
TriggeredEdge: Type.Composite([
Type.Ref("BaseEdge"),
Metagraph.Import("BaseEdge"),
Type.Object({ type: Type.Literal("triggered") }),
]),
TriggeredEdgeConstraints: Type.Object({
edgeType: Type.Literal("triggered"),
allowedSourceTypes: Type.Array(Type.String()), // node type names: ["Call"]
allowedTargetTypes: Type.Array(Type.String()), // node type names: ["Call", "Subcall"]
allowedSourceTypes: Type.Array(Type.String()), // ["Call"]
allowedTargetTypes: Type.Array(Type.String()), // ["Call", "Subcall"]
}),
DependsOnEdge: Type.Composite([
Type.Ref("BaseEdge"),
Metagraph.Import("BaseEdge"),
Type.Object({ type: Type.Literal("depends_on") }),
]),
DependsOnEdgeConstraints: Type.Object({
@@ -391,47 +321,23 @@ export const CallGraph = Type.Module({
});
```
**Why Module entries instead of DB columns** (DD7 revised):
1. **Schema-level validation**: `Value.Check(CallGraph.TriggeredEdgeConstraints, data)`
validates that constraint data is well-formed. With DB columns, there's no
schema validation — just JSON arrays in text columns.
2. **Serialization**: The constraint entries serialize to JSON Schema with
`$defs`, enabling `Value.Diff` for migration detection and `FromSchema`
for round-tripping.
3. **DB mapping**: The `moduleToDbSchema()` function extracts
`*EdgeConstraints` entries and writes their `allowedSourceTypes`/
`allowedTargetTypes` fields to the existing `edge_types` columns. The DB
schema doesn't change — the Module entries are the source of truth, the
DB columns are the persistence projection.
**Why Type.String() not Type.Ref()**: The constraint arrays contain node type
*names* (strings like `"Call"`), not node type *schemas*. `Type.Ref("CallNode")`
would mean "an element must validate against the CallNode schema," which is
incorrect — the constraint is about which named node types are valid endpoints,
not about node data shapes. The naming convention (`*Node` suffix) provides an
implicit structural contract: string values in `allowedSourceTypes` should
correspond to `*Node` entry names in the same Module. This is enforced by
`moduleToDbSchema()` at Module-to-DB projection time, not by the schema itself.
See Open Question 4 for the `Type.Ref` vs `Type.String` trade-off.
**DB mapping note**: The current DB schema stores `allowedSourceTypes` and
`allowedTargetTypes` as JSON text columns (arrays of strings, default `[]`).
In the Module, these become `Type.Array(Type.String())` entries — the DB
column values are the same string arrays. `moduleToDbSchema()` extracts them
directly. Read-path reconstruction resolves the names back to Module entries
for validation.
**Why `Type.String()` not `Type.Ref()`**: The constraint arrays contain node
type *names* (strings like `"Call"`), not node type *schemas*. `Type.Ref("CallNode")`
would mean "each element must validate against the CallNode schema," which is
semantically wrong — the constraint is about which named node types are valid
endpoints, not about data shapes. The `*Node` suffix naming convention provides
an implicit structural contract. `moduleToDbSchema()` enforces this convention
at Module-to-DB projection time.
**Empty array semantics**: In the DB, `[]` means "no restriction" (any node
type valid). In the Module, omitting the `*EdgeConstraints` entry means the
same thing. An explicit entry with empty arrays is not valid — it would mean
"no node types are valid at this endpoint," which is nonsensical. The
repository layer enforces this convention.
type valid). In the Module, omitting the `*EdgeConstraints` entry means the same
thing. An explicit entry with empty arrays is not valid — it would mean "no node
types are valid at this endpoint," which is nonsensical.
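Below is a hedged sketch of how the repository layer might apply these rules when it writes an edge. Constraints are looked up by edge type. A missing entry means no restriction. An explicit empty array is rejected as malformed, on either side. The `EdgeConstraints` interface mirrors the `*EdgeConstraints` entries above. `checkEndpoints` is a hypothetical helper and is not part of the documented API.

```typescript
// Mirrors the data carried by a *EdgeConstraints entry.
interface EdgeConstraints {
  edgeType: string;
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// Hypothetical helper: may an edge of `edgeType` connect these node types?
function checkEndpoints(
  constraints: Map<string, EdgeConstraints>,
  edgeType: string,
  sourceNodeType: string,
  targetNodeType: string,
): boolean {
  const c = constraints.get(edgeType);
  // No constraint entry for this edge type means "no restriction".
  if (c === undefined) return true;
  if (c.allowedSourceTypes.length === 0 || c.allowedTargetTypes.length === 0) {
    // An explicit empty array is treated as malformed, not as "allow nothing".
    throw new Error(`constraint entry for "${edgeType}" has an empty array; omit the entry instead`);
  }
  return c.allowedSourceTypes.includes(sourceNodeType) && c.allowedTargetTypes.includes(targetNodeType);
}

const constraints = new Map<string, EdgeConstraints>([
  ["triggered", { edgeType: "triggered", allowedSourceTypes: ["Call"], allowedTargetTypes: ["Call", "Subcall"] }],
]);
```

With this map, a `triggered` edge from `Call` to `Subcall` passes and the reverse direction fails. Any `depends_on` edge passes because no entry restricts it.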
### Entry Naming Convention
## Entry Naming Convention
Within a graph type Module, entries follow a naming convention that distinguishes
their role (DD8):
Within a graph type Module, entries follow a suffix convention that distinguishes
their role and determines their DB mapping:
| Suffix | Role | Maps to DB |
|--------|------|------------|
@@ -442,15 +348,14 @@ their role (DD8):
| `*Enum` or bare name | Shared enum/type | Embedded in `node_types.schema`/`edge_types.schema` |
| `BaseNode`, `BaseEdge` | Base attribute schemas | Composed into `*Node`/`*Edge` entries |
The `moduleToDbSchema()` function uses this convention to map Module entries to
the `node_types` and `edge_types` tables. Entries ending in `Node` become rows
with `name = entryNameWithoutSuffix ("Node")` and `schema = resolved entry`.
Same for `*Edge`. The `Config` entry maps to `graph_types.config`.
`moduleToDbSchema()` uses this convention to project Module entries to DB rows.
Entries ending in `Node` become rows with `name = entryNameWithoutSuffix("Node")`
and `schema = resolved entry`. Same for `*Edge`. The `Config` entry maps to
`graph_types.config`.
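The suffix convention can be sketched as a classifier over entry names. `BaseNode` and `BaseEdge` also end in `Node` or `Edge`, so they have to be matched before the suffix rules. `classifyEntry` and the `EntryRole` type are hypothetical. `dbName` applies the `entryNameWithoutSuffix` rule described above.

```typescript
type EntryRole =
  | { role: "config" }
  | { role: "base" }
  | { role: "node"; dbName: string }
  | { role: "edge"; dbName: string }
  | { role: "edgeConstraints"; edgeEntry: string }
  | { role: "shared" };

// Hypothetical classifier implementing the suffix table.
function classifyEntry(entryName: string): EntryRole {
  if (entryName === "Config") return { role: "config" };
  // Base schemas end in Node/Edge too, so check them before the suffix rules.
  if (entryName === "BaseNode" || entryName === "BaseEdge") return { role: "base" };
  if (entryName.endsWith("EdgeConstraints")) {
    return { role: "edgeConstraints", edgeEntry: entryName.slice(0, -"Constraints".length) };
  }
  if (entryName.endsWith("Node")) return { role: "node", dbName: entryName.slice(0, -"Node".length) };
  if (entryName.endsWith("Edge")) return { role: "edge", dbName: entryName.slice(0, -"Edge".length) };
  // *Enum and bare names are shared types, embedded in the schemas that use them.
  return { role: "shared" };
}
```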
## graphology Serialization Bridge
The bridge between Modules and graphology is the `SerializedGraph` pattern that
`@alkdev/flowgraph` already uses:
The bridge between Modules and graphology is the `SerializedGraph` pattern:
```ts
// flowgraph's current pattern (standalone schemas)
@@ -462,7 +367,7 @@ const CallGraphSerialized = SerializedGraph(
// Module pattern (entries from the Module)
const CallGraphSerialized = SerializedGraph(
CallGraph.Import("CallNode"), // entry from Module; resolves Refs through $defs
CallGraph.Import("DependsOnEdge"), // entry from Module
Type.Object({}),
);
@@ -472,7 +377,7 @@ Graphology's serialized format:
```ts
{
attributes: {}, // Graph-level attributes (empty for most graphs)
attributes: {}, // Graph-level attributes
options: {
type: "directed", // From CallGraph.Config
multi: false,
@@ -493,36 +398,27 @@ The mapping:
- `CallGraph.CallNode` → validates `nodes[].attributes`
- `CallGraph.TriggeredEdge` → validates `edges[].attributes`
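Hydration can be sketched as a pure function that turns stored rows into graphology's serialized shape: `attributes`, `options`, `nodes[]` with `key`/`attributes`, and `edges[]` with `key`/`source`/`target`/`attributes`. The `NodeRow`/`EdgeRow` shapes and `toGraphologyExport` are hypothetical. They loosely follow the `key`/`sourceNodeKey`/`targetNodeKey` fields from the data-model example and are not the table definitions in sqlite-host.md. The sketch skips attribute validation. A real path would check each row against its Module entry first.

```typescript
// Hypothetical row shapes; attributes are stored as JSON text.
interface NodeRow { key: string; attributes: string }
interface EdgeRow { key: string; sourceNodeKey: string; targetNodeKey: string; attributes: string }

interface GraphOptions {
  type: "directed" | "undirected" | "mixed";
  multi: boolean;
  allowSelfLoops: boolean;
}

// Shape accepted by graphology's import().
interface GraphologyExport {
  attributes: Record<string, unknown>;
  options: GraphOptions;
  nodes: { key: string; attributes: Record<string, unknown> }[];
  edges: { key: string; source: string; target: string; attributes: Record<string, unknown> }[];
}

function toGraphologyExport(options: GraphOptions, nodes: NodeRow[], edges: EdgeRow[]): GraphologyExport {
  return {
    attributes: {}, // graph-level attributes
    options, // from the graph type's frozen Config
    nodes: nodes.map((n) => ({ key: n.key, attributes: JSON.parse(n.attributes) })),
    edges: edges.map((e) => ({
      key: e.key,
      source: e.sourceNodeKey,
      target: e.targetNodeKey,
      attributes: JSON.parse(e.attributes),
    })),
  };
}

// The "session-abc-call-graph" instance from the data-model example.
const exported = toGraphologyExport(
  { type: "directed", multi: false, allowSelfLoops: false },
  [
    { key: "call-001", attributes: '{"requestId":"r1"}' },
    { key: "call-002", attributes: '{"requestId":"r2","parentRequestId":"r1"}' },
  ],
  [{ key: "edge-001", sourceNodeKey: "call-001", targetNodeKey: "call-002", attributes: '{"type":"triggered"}' }],
);
```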
This is **complementary** to `@alkdev/flowgraph`'s `SerializedGraph` — storage
produces the data, flowgraph operates on it in memory. The `SerializedGraph`
factory function stays the same — its schema arguments now come from Module
entries instead of standalone schemas. The `moduleToDbSchema()`
function extracts per-entry schemas for DB storage; the `moduleToGraphology()`
function produces the graphology import format for hydration.
Storage produces this format; `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and
`SerializedGraph` consume it. The `SerializedGraph` factory function stays the
same — its schema arguments now come from Module entries instead of standalone
schemas. Storage doesn't need a graphology dependency.
## DB Persistence Bridge
The repository layer maps Module entries to the existing 6-table schema:
The repository layer maps Module entries to the 6-table metagraph schema:
1. **`graph_types`** row: `name` = Module name, `config` = `CallGraph.Config`
JSON Schema (with defaults resolved)
2. **`node_types`** rows: one row per `*Node` entry, `name` = entry name
(minus `Node` suffix), `schema` = resolved entry JSON Schema
3. **`edge_types`** rows: one row per `*Edge` entry, `name` = entry name
(minus `Edge` suffix), `schema` = resolved entry JSON Schema,
1. **`graph_types`** row: `name` = Module name, `config` = resolved
`CallGraph.Config` JSON Schema
2. **`node_types`** rows: one per `*Node` entry, `name` = entry name (minus
suffix), `schema` = resolved entry JSON Schema
3. **`edge_types`** rows: one per `*Edge` entry, `name` = entry name (minus
suffix), `schema` = resolved entry JSON Schema,
`allowedSourceTypes`/`allowedTargetTypes` from constraint entries
On read, the repository layer reconstructs the Module from DB rows:
`Value.Check(CallGraph.Import("CallNode"), node.attributes)` validates node data
against the Module entry.
**`Module.Import()` embedding**: When a Module entry references entries from
another Module (e.g., `FlowGraph.Import("CallStatus")`), the JSON Schema for
that entry includes the referenced entries in `$defs`. The repository layer
stores the **dereferenced entry** — the resolved JSON Schema with inline `$defs`
for transitive references — not the entire importing Module. This avoids
duplicating all of FlowGraph's definitions in every CallGraph node_types row.
### Bridge Functions
#### `moduleToDbSchema(module)`
@@ -564,8 +460,7 @@ function moduleToDbSchema(module: TModule): DbSchema
- `*EdgeConstraints` entries that reference edge type entries not present in
the Module (the `edgeType` field must match an `*Edge` entry name).
- `*EdgeConstraints` entries with empty `allowedSourceTypes` and
`allowedTargetTypes` arrays (empty = "no types allowed", which is
nonsensical; omit the entry instead for "no restriction").
`allowedTargetTypes` arrays (omit the entry for "no restriction").
- Module without a `Config` entry (all graph types require configuration).
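Here is a hedged sketch of those projection checks. It assumes the constraint data has already been read out of the `*EdgeConstraints` entries, and that extraction step is not shown. `findProjectionErrors` is a hypothetical helper. It collects every message rather than throwing on the first problem. It also applies the `*Node` naming contract to the constraint arrays. An empty array on either side is treated as an error, which is one reading of the rule above.

```typescript
interface ConstraintData {
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// entryNames: every entry in the Module.
// constraints: data from *EdgeConstraints entries, keyed by entry name (extraction assumed).
function findProjectionErrors(entryNames: string[], constraints: Record<string, ConstraintData>): string[] {
  const entries = new Set(entryNames);
  const errors: string[] = [];
  if (!entries.has("Config")) errors.push("Module has no Config entry");

  for (const [name, c] of Object.entries(constraints)) {
    if (!name.endsWith("EdgeConstraints")) {
      errors.push(`${name}: not an *EdgeConstraints entry`);
      continue;
    }
    // TriggeredEdgeConstraints must sit next to a TriggeredEdge entry.
    const edgeEntry = name.slice(0, -"Constraints".length);
    if (!entries.has(edgeEntry)) errors.push(`${name}: no matching ${edgeEntry} entry`);

    const sides: [string, string[]][] = [
      ["allowedSourceTypes", c.allowedSourceTypes],
      ["allowedTargetTypes", c.allowedTargetTypes],
    ];
    for (const [side, nodeTypes] of sides) {
      if (nodeTypes.length === 0) {
        errors.push(`${name}.${side} is empty; omit the entry for "no restriction"`);
      }
      // Each name must correspond to a *Node entry in the same Module.
      for (const t of nodeTypes) {
        if (!entries.has(`${t}Node`)) errors.push(`${name}.${side}: no ${t}Node entry`);
      }
    }
  }
  return errors;
}
```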
#### `validateNode(module, entryName, data)` / `validateEdge(module, entryName, data)`
@@ -581,47 +476,31 @@ Returns `true` if data passes `Value.Check` against the resolved Module entry.
Throws if `entryName` doesn't match an `*Node`/`*Edge` entry in the Module.
Does NOT throw on invalid data — returns `false`.
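The throw-versus-`false` contract can be sketched with the schema check injected as a plain function. That function stands in for `Value.Check` against the resolved entry. The `Checker` and `ModuleChecks` types are hypothetical simplifications of a Module, and so is the hand-written `CallNode` check.

```typescript
// Stand-in for Value.Check against a resolved Module entry.
type Checker = (data: unknown) => boolean;

// Hypothetical simplification: a Module reduced to entry name -> check function.
type ModuleChecks = Record<string, Checker>;

function validateEntry(mod: ModuleChecks, entryName: string, suffix: "Node" | "Edge", data: unknown): boolean {
  const known = entryName.endsWith(suffix) && Object.prototype.hasOwnProperty.call(mod, entryName);
  if (!known) {
    // A wrong entry name is a caller bug, so it throws.
    throw new Error(`"${entryName}" is not a *${suffix} entry in this Module`);
  }
  // Invalid data is an expected outcome, so it is reported as false.
  return mod[entryName](data);
}

function validateNode(mod: ModuleChecks, entryName: string, data: unknown): boolean {
  return validateEntry(mod, entryName, "Node", data);
}

function validateEdge(mod: ModuleChecks, entryName: string, data: unknown): boolean {
  return validateEntry(mod, entryName, "Edge", data);
}

const callGraphChecks: ModuleChecks = {
  // Hand-written check standing in for Value.Check(CallGraph.Import("CallNode"), data).
  CallNode: (data) =>
    typeof data === "object" && data !== null && typeof (data as { requestId?: unknown }).requestId === "string",
};
```

Keeping the two failure modes separate lets callers treat bad data as a normal branch and a bad entry name as a programming error.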
### Type.Any vs Type.Unknown
The pre-Module `types.ts` used `Type.Any()` for `metadata` and `schema` fields.
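The Module approach uses `Type.Unknown()`. Both produce the same JSON Schema;
they differ only in the static TypeScript type:
- `Type.Any()` → `{}` (static type `any`: accepts anything, no validation)
- `Type.Unknown()` → `{}` (static type `unknown`: accepts anything, but callers
must narrow the value before using it)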
For the Module approach, **`Type.Unknown()` is canonical**. It's the more
explicit choice — it communicates "this field stores arbitrary data, no
validation applied." `Type.Any()` is a legacy from the original TypeBox API.
The `Metagraph` Module uses `Type.Unknown()` throughout.
### Performance Expectations
### Performance
Graph type Modules are small, typically 5-20 entries (one Config, 2-5 node
types, 2-5 edge types, 2-5 shared types, 2-5 constraint entries). The
`Value.Check` cost scales with schema complexity, not Module size; only the
resolved entry schema is checked, not the entire Module.
types, 2-5 edge types, 2-5 shared types, 2-5 constraint entries). `Value.Check`
cost scales with schema complexity, not Module size; only the resolved entry
schema is checked, not the entire Module.
The dereferenced entry strategy (DD6) means each DB row stores only its own
JSON Schema with transitive `$defs`, typically 1-3 KB per entry. A full
graph type's schemas total ~10-50 KB in the DB. This is negligible compared
to the node/edge data being stored.
JSON Schema with transitive `$defs`, typically 1-3 KB per entry. A full graph
type's schemas total ~10-50 KB in the DB, negligible compared to node/edge data.
"Validate on read" (Open Question 5) has a per-read cost. For
high-throughput paths, the repository layer can cache the resolved Module
entry locally after first read, avoiding repeated `Value.Check` for known-good
data. This is a repository-layer optimization, not a Module design concern.
"Validate on read" has a per-read cost. For high-throughput paths, the repository
layer can cache the resolved Module entry locally after first read. This is a
repository-layer optimization, not a Module design concern.
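Here is a minimal sketch of such a cache. It assumes that resolving an entry is the costly step and that graph type name plus entry name uniquely identify a resolved entry. `ResolvedEntryCache` and its `resolve` callback are hypothetical. In practice the cached value would be whatever the repository hands to `Value.Check`. The `invalidate` hook covers schema changes (see schema-evolution.md).

```typescript
// Hypothetical per-process cache of resolved Module entries.
class ResolvedEntryCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly resolve: (graphType: string, entryName: string) => T;
  private misses = 0;

  constructor(resolve: (graphType: string, entryName: string) => T) {
    this.resolve = resolve;
  }

  // NUL separator keeps "a" + "bNode" distinct from "ab" + "Node".
  private static key(graphType: string, entryName: string): string {
    return `${graphType}\u0000${entryName}`;
  }

  // Return the cached entry, resolving it only on first use.
  get(graphType: string, entryName: string): T {
    const key = ResolvedEntryCache.key(graphType, entryName);
    if (!this.entries.has(key)) {
      this.misses++;
      this.entries.set(key, this.resolve(graphType, entryName));
    }
    return this.entries.get(key) as T;
  }

  // Drop every cached entry of one graph type, e.g. after its schemas change.
  invalidate(graphType: string): void {
    const prefix = `${graphType}\u0000`;
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }

  get missCount(): number {
    return this.misses;
  }
}
```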
## Codegen Path
`TsToModule` generates TypeBox Modules from TypeScript interfaces. The path from
TypeScript to graph type:
`TsToModule.Generate()` produces TypeBox Module entries from TypeScript
interfaces, enabling a pipeline from TypeScript to graph type:
```
TypeScript interface → TsToModule.Generate() → TypeBox Module entry
@alkdev/flowgraph CallNodeAttrs → flowgraph schema.ts → FlowGraph Module
@alkdev/taskgraph TaskNodeAttrs → taskgraph schema.ts → TaskGraph Module
@alkdev/operations Identity → operations types.ts → Operations Module
TypeScript interface → TsToModule.Generate() → Module entry
@alkdev/flowgraph CallNodeAttrs → flowgraph schema.ts → FlowGraph Module
@alkdev/taskgraph TaskNodeAttrs → taskgraph schema.ts → TaskGraph Module
@alkdev/operations Identity → operations types.ts → Operations Module
```
Since flowgraph already defines `CallNodeAttrs` as a standalone TypeBox schema,
@@ -629,180 +508,32 @@ the codegen can produce a Module entry from it. Storage's `CallGraph` Module the
composes `BaseNode` with `CallNodeAttrs` via `Type.Composite`, or imports from
the flowgraph Module if flowgraph exports one (see Open Question 1).
## SchemaBuilder Equivalence
## Transition from SchemaBuilder
The existing `schemaBuilder.ts` and `types.ts` use a different approach that is
being replaced:
| Before (unreleased) | After |
|---------|-----|
| `types.ts` — standalone schemas | `Metagraph` Module in `modules/metagraph.ts` |
| `schemaBuilder.ts` — fluent builder | Removed — replaced by `Type.Module()` construction |
| `BaseNodeAttributes`, `BaseEdgeAttributes` in `types.ts` | `Metagraph` Module entries |
| `GraphConfig`, `GraphStatus`, `GraphBaseType` in `types.ts` | `Metagraph` Module entries + const objects |
| `allowedSourceTypes`/`allowedTargetTypes` as DB columns only | Named `*EdgeConstraints` Module entries (projected to DB columns) |
| No concrete graph type Modules | `modules/call-graph.ts`, `modules/acl-graph.ts`, etc. |
| No bridge between Module ↔ DB ↔ graphology | `bridge.ts` — validation, DB mapping, graphology format |
**What doesn't change**: The 6 metagraph database tables, their columns, and
relations remain the same. SQLite host table definitions, client factory, and
drizzlebox-generated schemas are unchanged. The `@alkdev/typebox` dependency is
unchanged. The encryption utility (planned) is unchanged. `allowedSourceTypes`
and `allowedTargetTypes` remain DB columns with the same semantics — Module
entries are the source of truth, projected to columns by `moduleToDbSchema()`.
## Implementation Path
4. **Phase 4**: Add `moduleToGraphology()` and `fromGraphologyExport()` for the
graphology bridge. Storage produces the format, flowgraph consumes it.
Acceptance criteria per phase:
- **Phase 2 complete**: `moduleToDbSchema()` produces values compatible with all
6 metagraph tables
- **Phase 3 complete**: Reference Modules validate against their flowgraph/taskgraph
counterparts
## Relationship to Other Packages
| Package | What changes | What stays |
|---------|-------------|------------|
| `@alkdev/storage` | `types.ts` → Module, `schemaBuilder.ts` → removed, new `modules/` and `bridge.ts` | Tables, relations, crypto, client factory |
| `@alkdev/flowgraph` | `CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus` become Module entries (optional, exported from `/schema` subpath) | FlowGraph class, analysis, all runtime logic |
| `@alkdev/taskgraph` | `TaskGraphNodeAttributes`, `DependencyEdge` become Module entries (optional) | TaskGraph class, analysis, all runtime logic |
| `@alkdev/operations` | `Identity`, `AccessControl` become Module entries (optional) | Registry, call protocol, adapters |
| `@alkdev/pubsub` | No change | Transport layer |
| `@alkdev/ujsx` | No change (already a Module) | The pattern we're following |
| `@alkdev/dbtype` | No change (Phase 0) | Future: storage table defs could be dbtype element trees |
## Design Decisions
### DD1: TypeBox Module replaces the SchemaBuilder
Graph type definitions are `Type.Module` objects. The previous `SchemaBuilder`
class is removed — consumers use `Type.Module()` construction directly, with
`Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()` as the building
blocks. The `moduleToDbSchema()` function replaces `SchemaBuilder.build()` as
the bridge from Module to DB rows.
This provides `Type.Ref()` for internal references, `Module.Import()` for
cross-package references, JSON Schema `$defs` that map directly to DB storage,
and codegen compatibility via `TsToModule.Generate()`.
### DD2: Metagraph.Import() for same-package Modules
Concrete graph types within `@alkdev/storage` use `Metagraph.Import("BaseNode")`
to compose base schemas. This avoids duplication and keeps the base schemas in
one place. External packages that define graph type Modules should re-declare
base schemas locally — storage should not be a dependency of other packages'
schema definitions.
### DD3: Config as a Module entry with Literal values
General `Metagraph.Config` uses `Type.Union` with defaults for construction-time
validation. Specific graph types freeze config values to `Type.Literal`, making
the config a precise contract rather than a validation surface.
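The difference between the general and the frozen config can be sketched with plain TypeScript types. This is illustrative only: the field names (`type`, `multi`, `allowSelfLoops`) are assumptions loosely modeled on graphology's graph options, and `conformsToFrozen` is a hypothetical runtime stand-in for what `Type.Literal` validation enforces.

```typescript
// General options: several values are acceptable (the Type.Union side).
type GraphKind = "directed" | "undirected" | "mixed";
interface GeneralConfig {
  type: GraphKind;
  multi: boolean;
  allowSelfLoops: boolean;
}

// Frozen config for one graph type: each field is a single literal (the Type.Literal side).
type CallGraphConfig = {
  readonly type: "directed";
  readonly multi: true;
  readonly allowSelfLoops: true;
};
const callGraphConfig: CallGraphConfig = { type: "directed", multi: true, allowSelfLoops: true };

// Compile-time check: every frozen config is also a valid general config.
const asGeneral: GeneralConfig = callGraphConfig;

// Runtime check: a stored config conforms only if every field equals the frozen value.
function conformsToFrozen(frozen: Readonly<GeneralConfig>, candidate: GeneralConfig): boolean {
  const keys = Object.keys(frozen) as (keyof GeneralConfig)[];
  return keys.every((key) => frozen[key] === candidate[key]);
}

console.log(conformsToFrozen(callGraphConfig, asGeneral)); // true
console.log(conformsToFrozen(callGraphConfig, { type: "directed", multi: false, allowSelfLoops: true })); // false
```

The general type answers "is this a legal graph config at all?", while the frozen type answers "is this exactly the call-graph config?"; the second is the contract a specific graph type publishes.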
### DD4: Node/edge attribute schemas are Module entries, not Type.Any()
At the application layer, node and edge attribute schemas are named Module
entries with full type safety (`CallGraph.CallNode`, not `schema: Type.Any()`).
At the DB storage layer, the meta-schemas (`NodeType`, `EdgeType`) still have
`schema: Type.Unknown()` because the DB stores arbitrary JSON Schema blobs.
### DD5: Storage produces graphology format, flowgraph consumes it
Storage doesn't need a graphology dependency. It produces the JSON serialization
format that `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and `SerializedGraph`
consume. The Module entries validate data flowing in both directions.
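A minimal sketch of the storage side of this bridge, assuming hypothetical `NodeRow`/`EdgeRow` shapes and a caller-supplied validation callback (which in practice might wrap TypeBox's `Value.Check` against the relevant Module entry). The output follows graphology's serialization format (`options`, `attributes`, `nodes`, `edges`); whether flowgraph's `SerializedGraph` expects anything beyond that is not covered here.

```typescript
// Graphology's JSON serialization format, as accepted by Graph.from() / graph.import().
interface SerializedGraph {
  options: { type: "directed" | "undirected" | "mixed"; multi: boolean; allowSelfLoops: boolean };
  attributes: Record<string, unknown>;
  nodes: { key: string; attributes: Record<string, unknown> }[];
  edges: { key: string; source: string; target: string; attributes: Record<string, unknown> }[];
}

// Hypothetical repository rows; the real metagraph columns may differ.
interface NodeRow { key: string; nodeType: string; attributes: Record<string, unknown> }
interface EdgeRow { key: string; edgeType: string; source: string; target: string; attributes: Record<string, unknown> }

// Validates attributes against the Module entry for a node or edge type.
type CheckAttributes = (kind: "node" | "edge", typeName: string, attributes: Record<string, unknown>) => boolean;

function toGraphologyExport(
  options: SerializedGraph["options"],
  nodeRows: NodeRow[],
  edgeRows: EdgeRow[],
  check: CheckAttributes,
): SerializedGraph {
  const nodes = nodeRows.map((row) => {
    if (!check("node", row.nodeType, row.attributes)) {
      throw new Error(`Node "${row.key}" does not match node type "${row.nodeType}"`);
    }
    return { key: row.key, attributes: row.attributes };
  });
  // Graphology rejects edges whose endpoints are missing, so fail early with a clearer message.
  const nodeKeys = new Set(nodes.map((node) => node.key));
  const edges = edgeRows.map((row) => {
    if (!nodeKeys.has(row.source) || !nodeKeys.has(row.target)) {
      throw new Error(`Edge "${row.key}" points at a node that is not in the export`);
    }
    if (!check("edge", row.edgeType, row.attributes)) {
      throw new Error(`Edge "${row.key}" does not match edge type "${row.edgeType}"`);
    }
    return { key: row.key, source: row.source, target: row.target, attributes: row.attributes };
  });
  return { options, attributes: {}, nodes, edges };
}
```

Because the function only produces plain JSON, storage stays free of a graphology dependency; the consumer hands the result to graphology (or flowgraph) on its side.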
### DD6: Repository stores dereferenced entry schemas
When a Module entry uses `Module.Import()`, the entry's JSON Schema embeds the
referenced Module's `$defs`. To avoid storing the full referenced Module in
every DB row, the repository layer stores **dereferenced entry schemas** — each
`node_types` row gets its entry's resolved JSON Schema with just the transitive
`$defs` it needs, not the entire importing Module's definitions.
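Computing "just the transitive `$defs` it needs" is a reachability walk over `$ref`s. The sketch below is a hypothetical helper, not the repository's actual code. It assumes refs are either bare entry names (`"BaseNode"`) or `"#/$defs/BaseNode"`, and that all resolvable entries live in one flat `defs` map.

```typescript
type Schema = { [keyword: string]: unknown };

// Assumption: refs are either bare entry names or "#/$defs/<Name>".
function refName(ref: string): string {
  const prefix = "#/$defs/";
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : ref;
}

// Collect the entry names referenced anywhere inside one schema.
// Nested $defs blocks are skipped: the closure is computed from `defs` only.
function collectRefs(node: unknown, out: Set<string>): void {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, out);
    return;
  }
  if (node === null || typeof node !== "object") return;
  for (const [keyword, value] of Object.entries(node)) {
    if (keyword === "$ref" && typeof value === "string") out.add(refName(value));
    else if (keyword !== "$defs") collectRefs(value, out);
  }
}

// Return the entry schema plus only the $defs reachable from it (transitively).
function dereferenceEntry(entryName: string, defs: Record<string, Schema>): Schema {
  const lookup = (name: string): Schema => {
    if (!Object.prototype.hasOwnProperty.call(defs, name)) throw new Error(`Unresolved $ref "${name}"`);
    return defs[name];
  };
  const root = lookup(entryName);
  const needed = new Set<string>();
  const pending: Schema[] = [root];
  while (pending.length > 0) {
    const refs = new Set<string>();
    collectRefs(pending.pop(), refs);
    for (const name of Array.from(refs)) {
      if (!needed.has(name)) {
        needed.add(name);
        pending.push(lookup(name));
      }
    }
  }
  const result: Schema = { ...root };
  delete result.$defs; // drop any full $defs embedded by an import
  if (needed.size > 0) {
    const subset: Record<string, Schema> = {};
    for (const name of Array.from(needed).sort()) subset[name] = lookup(name);
    result.$defs = subset;
  }
  return result;
}
```

The `needed` set doubles as the visited set, so recursive entries (an entry that refers to itself, directly or through another entry) terminate and still end up in the stored `$defs`.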
### DD7: Edge type constraints as named Module entries
Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are named
Module entries (e.g., `TriggeredEdgeConstraints`), not just DB columns. This
gives them schema validation (`Value.Check`) and serialization (JSON Schema
with `$defs`). The repository layer projects these entries to the existing
`edge_types` columns. The DB schema doesn't change — Module entries are the
source of truth, DB columns are the persistence projection.
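Projecting a constraint entry onto the `edge_types` columns is also the natural point to check that every listed name is a declared node type, since the arrays hold names rather than schemas. A minimal sketch, assuming a hypothetical `EdgeConstraints` value shape and a precomputed set of node type names; it is not the actual `moduleToDbSchema()` code.

```typescript
// Hypothetical value of a `*EdgeConstraints` entry: arrays of node type *names*.
interface EdgeConstraints {
  edgeType: string;
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// Hypothetical projection onto the edge_types constraint columns.
interface EdgeTypeConstraintColumns {
  name: string;
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// Project one constraint entry, rejecting names that match no declared node type.
function projectEdgeConstraints(
  constraints: EdgeConstraints,
  nodeTypeNames: ReadonlySet<string>,
): EdgeTypeConstraintColumns {
  const missing = [...constraints.allowedSourceTypes, ...constraints.allowedTargetTypes]
    .filter((name) => !nodeTypeNames.has(name));
  if (missing.length > 0) {
    throw new Error(`Edge type "${constraints.edgeType}" references unknown node types: ${missing.join(", ")}`);
  }
  return {
    name: constraints.edgeType,
    allowedSourceTypes: [...constraints.allowedSourceTypes],
    allowedTargetTypes: [...constraints.allowedTargetTypes],
  };
}
```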
### DD8: Naming convention for Module entries
Module entries use role-distinguishing suffixes: `*Node` for node types,
`*Edge` for edge types, `Config` for graph configuration, `*EdgeConstraints`
for edge endpoint constraints, and bare names or `*Enum` for shared types.
`moduleToDbSchema()` uses this convention to map entries to DB tables.
This was chosen over explicit metadata/decorators (e.g.,
`{ kind: "nodeType", name: "call", schema: ... }`) because the suffix convention
is simpler and sufficient for the expected Module size (5–20 entries).
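A minimal sketch of how the suffix convention can be applied. `classifyEntry`, `typeNameOf`, and `groupEntries` are hypothetical helpers, and the rule that the stored type name is the prefix with its first letter lowercased (`CallNode` to `call`) is an assumption.

```typescript
type EntryRole = "config" | "nodeType" | "edgeType" | "edgeConstraints" | "shared";

// Map a Module entry name to its role using the suffix convention.
function classifyEntry(name: string): EntryRole {
  if (name === "Config") return "config";
  if (name.endsWith("EdgeConstraints")) return "edgeConstraints";
  if (name.endsWith("Edge")) return "edgeType";
  if (name.endsWith("Node")) return "nodeType";
  return "shared"; // bare names and *Enum entries
}

// Assumption: the stored type name is the prefix with its first letter lowercased
// ("CallNode" -> "call", "TriggeredEdge" -> "triggered").
function typeNameOf(entryName: string, suffix: "Node" | "Edge"): string {
  const prefix = entryName.slice(0, -suffix.length);
  return prefix.charAt(0).toLowerCase() + prefix.slice(1);
}

// Group entry names by role, i.e. by the table or column each role maps to.
function groupEntries(entryNames: string[]): Record<EntryRole, string[]> {
  const groups: Record<EntryRole, string[]> = {
    config: [], nodeType: [], edgeType: [], edgeConstraints: [], shared: [],
  };
  for (const name of entryNames) groups[classifyEntry(name)].push(name);
  return groups;
}

const groups = groupEntries(["Config", "CallNode", "TriggeredEdge", "TriggeredEdgeConstraints", "CallStatusEnum"]);
console.log(groups.nodeType.map((name) => typeNameOf(name, "Node"))); // [ 'call' ]
```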
### DD9: Pointer abstraction is forward-looking, not v1
The structural analogy between ujsx's `ValuePointer`/`selectNode`/`setNode` and
graph node/edge addressing is real, but implementing typed graph pointers (via
JPATH Module or reactive signals) is a post-v1 concern. For v1, repository
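functions use direct key-based addressing (`findNode(graphId, nodeKey)`), and
the Module validates attribute shapes. The Module's existence is what makes
typed pointers feasible later, since it provides the schema a pointer validates
against. A lightweight `GraphPointer<T>` wrapper was considered for v1 and
rejected: it requires either a JPATH Module dependency or reactive signal
integration, adding complexity without clear v1 benefit. See
[forward-look.md](./forward-look.md).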
### DD10: dbtype integration is post-v1
`@alkdev/dbtype`'s UJSX→Module→Host pipeline can eliminate the manual dual
definition of SQLite/PG table schemas. But dbtype is Phase 0 (architecture
complete, no implementation). For v1, storage uses manual Drizzle table
definitions. The Module-based graph type definitions are compatible with dbtype
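because both produce `Type.Module` objects, so the integration path is clear.
Implementing dbtype integration alongside the initial Module construction was
considered and rejected: it adds a dependency on an unimplemented package, and
the manual table definitions work well. The cost of deferring is continued dual
SQLite/PG maintenance, which is manageable for 6 metagraph tables. See
[forward-look.md](./forward-look.md).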
## Open Questions
1. **Should `@alkdev/flowgraph` export a `Type.Module`, or should storage define
its own entries with documented correspondence?** Flowgraph currently exports
`CallNodeAttrs` as a standalone `Type.Object`. To use `Import()`, flowgraph
needs to export a Module. Storage can start with standalone schemas and
`Type.Composite([BaseNode, CallNodeAttrs])` — no dependency on flowgraph.
Adopt `Import()` when flowgraph provides a Module. **This avoids a circular
dependency: `@alkdev/storage` does NOT depend on `@alkdev/flowgraph`.**
2. **Should concrete graph type Modules live in storage or in their respective
packages?** Call-graph attribute schemas are defined by flowgraph's domain, not
storage's. Storage provides the metagraph *framework* (the `Metagraph` Module
with `BaseNode`, `BaseEdge`, `Config`). Concrete types like `CallGraph` could
live either in storage (as reference implementations) or in their respective
packages. **Decision: Both.** Storage provides reference Modules in `modules/`
that consumers can use directly or replace. Flowgraph may also export a
Module — the two are compatible via Module `$defs`.
3. **Should `*EdgeConstraints` entries use `Type.Ref("CallNode")` or
`Type.String()` for allowed source/target types?** See the
[Edge Type Constraints](#edge-type-constraints) section. **Decision:
`Type.String()`** — the constraint arrays contain names, not schemas.
4. **How does the graph pointer abstraction interact with the repository layer?**
For v1, repository functions use direct key-based addressing. **Decision:
validate on read** — if data doesn't match the Module entry, throw. This
makes any value retrieved from the repo conform to the schema.
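Validate-on-read can be sketched as a thin wrapper around the lookup. Everything here is hypothetical: the in-memory `Map` store keyed by `graphId/nodeKey`, the `StoredNode` shape, and the per-node-type validator map (which in practice might wrap TypeBox's `Value.Check` for the matching Module entry). A missing node returns `undefined`; a present node that fails its schema throws.

```typescript
// Hypothetical stored node row; the real repository schema may differ.
interface StoredNode {
  nodeType: string;
  attributes: unknown;
}

// One validator per node type name.
type AttributeValidator = (attributes: unknown) => boolean;

class SchemaViolationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "SchemaViolationError";
  }
}

// Return the row only if it matches its Module entry, otherwise throw.
function findNodeValidated(
  store: Map<string, StoredNode>,
  validators: Map<string, AttributeValidator>,
  graphId: string,
  nodeKey: string,
): StoredNode | undefined {
  const row = store.get(`${graphId}/${nodeKey}`);
  if (row === undefined) return undefined; // a missing node is not a schema violation
  const validate = validators.get(row.nodeType);
  if (validate === undefined) {
    throw new SchemaViolationError(`No Module entry registered for node type "${row.nodeType}"`);
  }
  if (!validate(row.attributes)) {
    throw new SchemaViolationError(`Node "${nodeKey}" in graph "${graphId}" does not match "${row.nodeType}"`);
  }
  return row;
}
```

Treating an unregistered node type as a violation (rather than returning the raw row) keeps the guarantee unconditional: nothing leaves the repository unvalidated.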
## References
- ujsx ADR-002 (Module as type registry): `/workspace/@alkdev/ujsx/docs/architecture/decisions/002-typebox-module-as-registry.md`
- ujsx schema docs: `/workspace/@alkdev/ujsx/docs/architecture/schema.md`
- TsToModule codegen: `/workspace/research/typebox_research/codegen/ts-to-module.ts`
- ujsx Module examples: `/workspace/research/typebox_research/ujsx/unist.gen.ts`, `/workspace/research/typebox_research/ujsx/mdast.gen.ts`
- Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/`
- Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts`
- Schema evolution: [schema-evolution.md](./schema-evolution.md)
- Forward-looking connections: [forward-look.md](./forward-look.md)
- Package overview: [overview.md](./overview.md)