storage/docs/architecture/forward-look.md
glm-5.1 5ce93b1357 docs: add metagraph-module and forward-look architecture specs, remove SchemaBuilder legacy support
Graph type definitions as TypeBox Modules — the core architecture evolution
for @alkdev/storage. The SchemaBuilder is removed (no existing consumers),
replaced by direct TypeModule construction with Metagraph.Import() for
base attribute composition and Type.Composite() for node/edge type
specialization.

Key additions:
- metagraph-module.md: Module pattern, edge constraints as named entries,
  SchemaBuilder equivalence, DB bridge contracts (moduleToDbSchema return
  type, validateNode/validateEdge signatures), 10 design decisions (DD1-DD10)
- forward-look.md: pointer abstraction (ujsx ValuePointer analogy, JPATH
  Module), dbtype table rendering relationship, ujsx as universal IR pipeline

Critical corrections from architecture review:
- Type.Composite uses IntersectEvaluated (intersection, not Object.assign
  override) — overlapping keys with subtype relationships resolve correctly
- Type.Ref inside Type.Composite within a Module is verified working
- BaseNode/BaseEdge use Metagraph.Import() for same-package Modules (Option B),
  not local re-declaration (no circular dep within same package)
- Edge constraints use Type.String() for node type name arrays (not Type.Ref) —
  constraints contain names, not schemas
2026-05-28 15:32:56 +00:00

---
status: draft
last_updated: 2026-05-30
---
# Forward Look: Pointers, dbtype, and Universal IR
How the Module-based metagraph connects to the broader @alkdev ecosystem —
typed graph pointers, dbtype table rendering, and the ujsx universal IR
pipeline. These are forward-looking designs that justify why certain
structural decisions are made now (DD9, DD10 in
[metagraph-module.md](./metagraph-module.md)).
## Overview
Three packages in the @alkdev ecosystem share the same pipeline shape:
```
Schema (TypeBox Module) → Element Tree (ujsx) → Host (HostConfig)
```
| Package | Schema | Element tree | Host |
|---------|--------|-------------|------|
| `@alkdev/ujsx` | `UJSX` Module | `<element>`, `<root>` | DOM, custom |
| `@alkdev/dbtype` | Table/Column schemas | `<table>`, `<column>` | SQLite, PG, MySQL drizzle dialects |
| `@alkdev/storage` | `Metagraph` Module | ⚠️ Future: `<graphSchema>`, `<nodeType>` | ⚠️ Future: graph DB hosts |
When storage's graph type definitions align with the Module pattern, they
join this same pipeline. The immediate benefit, available today, is support
for recursive and cross-referencing schemas. The forward benefit is that graph
type definitions, table definitions, and pointer expressions can all be
authored as ujsx element trees and rendered to different hosts.
## Pointer Abstraction
Addressing nodes and edges within a graph instance follows the same pattern as
ujsx's `ValuePointer` and `selectNode`/`setNode` — and the same pattern as
jsonpathly's JPATH Module for path expressions.
### ujsx's pointer system (proven)
ujsx already implements a reactive pointer system:
```ts
// Signatures only; Signal, ReadonlySignal, and UNode are defined elsewhere (not shown).
declare class ValuePointer<T> {
  private _signal: Signal<T>;
  private _path: string[];
  get value(): T;
  set value(v: T);
  get reactive(): ReadonlySignal<T>;
  get path(): string[];
}
declare function selectNode(root: UNode, path: string[]): UNode | undefined;
declare function setNode(root: UNode, path: string[], value: UNode): UNode;
```
This addresses elements within a ujsx tree by path segments (child indices,
prop names). A graph instance has analogous structure: nodes identified by
key, edges identified by key, attributes addressed by JSON path.
### Graph pointer analogy
```ts
// ujsx pointer: element tree → path → value (path segments are strings)
selectNode(root, ["children", "0", "props", "name"])
// Graph pointer: graph instance → path → value
selectGraphNode(graph, ["nodes", "call-001", "attributes", "requestId"])
```
The structural analogy:
| ujsx concept | Graph concept |
|-------------|---------------|
| Element tree root | Graph instance |
| `UNode` | Node or Edge |
| `path: string[]` | Key path: `["nodes", key]` or `["edges", key]` |
| `selectNode(root, path)` | `selectGraphNode(graph, path)` |
| `setNode(root, path, value)` | `setGraphNode(graph, path, value)` (via repository) |
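
The analogy in the table can be sketched over a plain in-memory graph. This is a minimal illustration, not storage's implementation: the `GraphInstance` and `GraphEntry` shapes are assumptions, and `selectGraphNode`/`setGraphNode` here work on a plain object rather than going through a repository.

```typescript
// Hypothetical in-memory shapes; the real storage types may differ.
type Attributes = Record<string, unknown>;
interface GraphEntry { key: string; attributes: Attributes }
interface GraphInstance {
  nodes: Record<string, GraphEntry>;
  edges: Record<string, GraphEntry & { source: string; target: string }>;
}

// Walk a string path through the graph instance, like ujsx's selectNode.
function selectGraphNode(graph: GraphInstance, path: string[]): unknown {
  let current: unknown = graph;
  for (const segment of path) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Return a copy of the graph with the value at `path` replaced.
// Only objects along the path are copied; the input is not mutated.
function setGraphNode(graph: GraphInstance, path: string[], value: unknown): GraphInstance {
  const write = (target: unknown, depth: number): unknown => {
    if (depth === path.length) return value;
    const segment = path[depth] as string;
    const source = target !== null && typeof target === "object"
      ? (target as Record<string, unknown>)
      : {};
    return { ...source, [segment]: write(source[segment], depth + 1) };
  };
  return write(graph, 0) as GraphInstance;
}

const graph: GraphInstance = {
  nodes: { "call-001": { key: "call-001", attributes: { requestId: "req-42" } } },
  edges: {},
};
const pointer = ["nodes", "call-001", "attributes", "requestId"];
const before = selectGraphNode(graph, pointer);        // value in the original graph
const updated = setGraphNode(graph, pointer, "req-43"); // new graph, original untouched
const after = selectGraphNode(updated, pointer);
```

Returning a new graph from `setGraphNode` mirrors the `setNode(root, path, value): UNode` signature, which also returns the updated tree instead of mutating in place.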
### JPATH Module (jsonpathly)
The JPATH research listed under References shows that JSONPath expressions can
themselves be a TypeBox Module (`JPATH = Type.Module({...})` with recursive
`Type.Ref("Subscript")`). This means pointer paths are not just runtime
strings: they are typed schemas that can be validated and composed.
For graph storage, this opens the possibility of **typed graph queries**: a
pointer expression like `nodes.call-001.attributes.requestId` has a schema that
can be validated against the graph type's Module. If `CallNode` doesn't have a
`requestId` field, the pointer expression is rejected when it is checked
against the Module, and potentially at compile time if the path type is derived
statically from the Module.
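
A rough sketch of what checking a pointer against a node schema could look like at runtime. It uses plain JSON-Schema-like objects as a stand-in for Module entries (TypeBox schemas are JSON Schema objects at runtime); the `ObjectSchema` type, the `CallNode` shape, and `validatePointer` are all hypothetical names for illustration, not part of the JPATH Module.

```typescript
// Minimal JSON-Schema-like shape; stands in for a TypeBox Module entry.
interface LeafSchema { type: string }
interface ObjectSchema {
  type: "object";
  properties: Record<string, ObjectSchema | LeafSchema>;
}

// Hypothetical CallNode schema with a single attribute.
const CallNode: ObjectSchema = {
  type: "object",
  properties: {
    key: { type: "string" },
    attributes: {
      type: "object",
      properties: { requestId: { type: "string" } },
    },
  },
};

// Check a pointer of the form ["nodes", <key>, ...fieldPath] against a node schema.
function validatePointer(path: string[], nodeSchema: ObjectSchema): boolean {
  if (path.length < 2 || path[0] !== "nodes") return false;
  let schema: ObjectSchema | LeafSchema = nodeSchema;
  for (const field of path.slice(2)) {
    if (!("properties" in schema)) return false; // cannot descend into a leaf
    const next: ObjectSchema | LeafSchema | undefined = schema.properties[field];
    if (next === undefined) return false;         // field not declared in the schema
    schema = next;
  }
  return true;
}

const validPath = validatePointer(["nodes", "call-001", "attributes", "requestId"], CallNode);
const invalidPath = validatePointer(["nodes", "call-001", "attributes", "missing"], CallNode);
```

This check runs at runtime. Rejecting the same path at compile time would additionally need a path type derived from the Module's static types, which this sketch does not attempt.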
### Scope for v1
The pointer abstraction is a forward-looking design. For v1:
- **Repository functions** use direct key-based addressing:
`findNode(graphId, nodeKey)`, `findEdge(graphId, edgeKey)`
- **Attribute access** is untyped JSON retrieval:
`node.attributes.requestId`
- **The Module** validates attribute shapes, but query paths are strings
The jump to typed pointers requires either the JPATH Module (for path
validation) or ujsx-style `ValuePointer` with signals (for reactive graph
observation). Both are post-v1 concerns, but the graph type Module makes them
feasible because it provides the schema the pointer validates against.
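
For contrast, the v1 addressing style described above can be sketched as follows. The in-memory `Map`, the `NodeRecord` shape, and `insertNode` are assumptions made for illustration; the real repository functions sit on top of Drizzle tables and may well be asynchronous.

```typescript
// Hypothetical record shape; attributes stay untyped JSON in v1.
interface NodeRecord {
  graphId: string;
  key: string;
  attributes: Record<string, unknown>;
}

// In-memory stand-in for the repository's backing table.
const nodeTable = new Map<string, NodeRecord>();
const rowId = (graphId: string, key: string): string => `${graphId}/${key}`;

function insertNode(record: NodeRecord): void {
  nodeTable.set(rowId(record.graphId, record.key), record);
}

// Direct key-based addressing, matching the findNode(graphId, nodeKey) shape.
function findNode(graphId: string, nodeKey: string): NodeRecord | undefined {
  return nodeTable.get(rowId(graphId, nodeKey));
}

insertNode({ graphId: "g1", key: "call-001", attributes: { requestId: "req-42" } });
const node = findNode("g1", "call-001");

// Untyped access: the caller has to narrow the value itself.
const requestId = node?.attributes["requestId"];
const requestIdText = typeof requestId === "string" ? requestId : undefined;
```

The manual `typeof` narrowing at the end is the cost that typed pointers would remove.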
## Relationship to @alkdev/dbtype
`@alkdev/dbtype` defines database schemas as ujsx element trees and renders them
to Drizzle dialects via HostConfig. Storage's SQLite/PG table definitions are a
natural consumer of this pipeline.
### Current vs. Future Table Definition
**Current** (manual Drizzle table defs):
```ts
export const graphTypes = sqliteTable("graph_types", {
id: text("id").primaryKey(),
name: text("name").notNull(),
config: text("config", { mode: "json" }).notNull(),
// ...
});
```
**Future** (dbtype element tree → HostConfig rendering):
```tsx
const GraphTypesEl = h("table", { name: "graph_types" },
h(IdColumn, {}),
h("column", { name: "name", type: "string", notNull: true }),
h("column", { name: "config", type: "json", mode: "json", notNull: true }),
h(AuditColumns, {}),
);
const root = createRoot(sqliteHost, {});
root.render(GraphTypesEl);
const drizzleTable = root.ctx.tables.graph_types;
```
### Why this matters for storage
1. **Single source of truth**: Today's `sqlite/tables/` and future `pg/tables/`
define the same shapes in two different Drizzle dialects. dbtype is designed to
render the same element tree to both, removing the manual duplication.
2. **Schema extraction**: `extractTable()` is specified to produce both TypeBox
schemas (for validation) and column metadata (for Drizzle rendering) from the
same tree. Storage would get `SelectGraphType` and `InsertGraphType` schemas
without writing them by hand.
3. **Module alignment**: dbtype is designed to assemble extracted schemas into a
`Type.Module` for cross-table references. Storage's metagraph Module and
dbtype's table Module could share a namespace: the `graph_types.config`
column stores the JSON Schema from `Metagraph.Config`.
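
The "single source of truth" point can be illustrated with a toy renderer. This is not dbtype's API: `TableEl`, `ColumnEl`, `DialectHost`, and `renderTable` are hypothetical, and the sketch emits DDL strings where dbtype would produce Drizzle table objects. It only shows one table description feeding two dialects.

```typescript
// Hypothetical element shapes; not dbtype's actual API.
interface ColumnEl {
  name: string;
  type: "string" | "json";
  notNull?: boolean;
  primaryKey?: boolean;
}
interface TableEl { name: string; columns: ColumnEl[] }

// A "host" here is only a mapping from logical column types to dialect types.
interface DialectHost {
  dialect: string;
  columnTypes: Record<ColumnEl["type"], string>;
}

const sqliteDialect: DialectHost = { dialect: "sqlite", columnTypes: { string: "TEXT", json: "TEXT" } };
const pgDialect: DialectHost = { dialect: "pg", columnTypes: { string: "TEXT", json: "JSONB" } };

// Render one table description for a given dialect.
function renderTable(table: TableEl, host: DialectHost): string {
  const columns = table.columns.map((col) => {
    const parts = [col.name, host.columnTypes[col.type]];
    if (col.primaryKey) parts.push("PRIMARY KEY");
    if (col.notNull) parts.push("NOT NULL");
    return parts.join(" ");
  });
  return `CREATE TABLE ${table.name} (${columns.join(", ")})`;
}

// One description of graph_types, rendered twice.
const graphTypesEl: TableEl = {
  name: "graph_types",
  columns: [
    { name: "id", type: "string", primaryKey: true },
    { name: "name", type: "string", notNull: true },
    { name: "config", type: "json", notNull: true },
  ],
};
const sqliteDdl = renderTable(graphTypesEl, sqliteDialect);
const pgDdl = renderTable(graphTypesEl, pgDialect);
```

The two outputs differ only in the `config` column type, which is the kind of dialect difference a host is meant to absorb.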
### v1 approach
For v1, storage continues with manual Drizzle table definitions. The dbtype
integration is a post-v1 migration path because:
- dbtype is Phase 0 (architecture complete, no implementation)
- The manual defs work and are well-understood
- The Module pattern for graph types can be adopted independently (no dbtype
dependency)
When dbtype reaches Phase 1 (implementation), storage can migrate table defs
to dbtype elements one table at a time. The Module-based graph type definitions
are already compatible, because both they and dbtype's extracted schemas are
TypeBox `Type.Module` objects.
## ujsx as Universal IR
As the Overview notes, ujsx, dbtype, and storage share the pipeline shape
**Schema → Element Tree → Host**. This is not coincidental: ujsx is a
universal declarative IR, and different "render targets" are just different
HostConfigs.
### What this could look like
```tsx
// Graph type definitions as ujsx elements (future)
const CallGraphSchema = h("graphSchema", { name: "call-graph" },
h("config", { type: "directed", multi: false, allowSelfLoops: false }),
h("nodeType", { name: "call" },
h(BaseNode, {}),
h("attr", { name: "requestId", type: "string", required: true }),
h("attr", { name: "status", ref: "CallStatus" }),
),
h("edgeType", { name: "triggered" },
h(BaseEdge, {}),
h("attr", { name: "type", literal: "triggered" }),
),
h("edgeConstraints", { edgeType: "triggered",
allowedSourceTypes: ["Call"],
allowedTargetTypes: ["Call", "Subcall"] }),
);
```
Rendered to different hosts:
| Host | Output |
|------|--------|
| TypeBox Host | `Type.Module({ CallNode: ..., TriggeredEdge: ... })` |
| SQLite Host | `sqliteTable("node_types", { ... })` + `sqliteTable("edge_types", { ... })` |
| PG Host | `pgTable("node_types", { ... })` + `pgTable("edge_types", { ... })` |
| graphology Host | `SerializedGraph` format |
| Documentation Host | Mermaid diagram, typed API docs |
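
To make the "one tree, several hosts" idea concrete, here is a small sketch with a toy element model and two toy hosts. Everything in it is an assumption for illustration: the `El` type and this `h()` are not ujsx's, the hosts are plain functions and not HostConfigs, and the `<Name>Node` entry naming is inferred from the `CallNode` example in the table.

```typescript
// Hypothetical minimal element model; ujsx's real h() and HostConfig differ.
interface El { tag: string; props: Record<string, unknown>; children: El[] }
const h = (tag: string, props: Record<string, unknown>, ...children: El[]): El =>
  ({ tag, props, children });

const schemaEl = h("graphSchema", { name: "call-graph" },
  h("nodeType", { name: "Call" }),
  h("nodeType", { name: "Subcall" }),
  h("edgeConstraints", {
    edgeType: "triggered",
    allowedSourceTypes: ["Call"],
    allowedTargetTypes: ["Call", "Subcall"],
  }),
);

// Toy host 1: list the Module entry names a TypeBox host might define.
function renderEntryNames(root: El): string[] {
  return root.children
    .filter((el) => el.tag === "nodeType")
    .map((el) => `${String(el.props["name"])}Node`);
}

// Toy host 2: emit a Mermaid flowchart from the edge constraints.
function renderMermaid(root: El): string {
  const lines = ["graph LR"];
  for (const el of root.children) {
    if (el.tag !== "edgeConstraints") continue;
    const sources = el.props["allowedSourceTypes"] as string[];
    const targets = el.props["allowedTargetTypes"] as string[];
    for (const source of sources) {
      for (const target of targets) {
        lines.push(`  ${source} -->|${String(el.props["edgeType"])}| ${target}`);
      }
    }
  }
  return lines.join("\n");
}

const entryNames = renderEntryNames(schemaEl);
const diagram = renderMermaid(schemaEl);
```

Both functions read the same tree and ignore the tags they don't care about, which is the property that lets new hosts be added without changing the schema description.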
### What's real today vs. aspirational
| Capability | Status |
|-----------|--------|
| `Type.Module` for graph type definitions | ✅ Ready to implement now |
| Codegen from TypeScript interfaces → Module entries | ✅ TsToModule exists |
| dbtype element trees → Drizzle tables | ⚠️ dbtype Phase 0, no implementation |
| `<graphSchema>` ujsx elements | ⚠️ Conceptual — needs HostConfig design |
| Typed graph pointers via JPATH | ⚠️ Conceptual — needs JPATH Module design |
| Reactive graph observation via ValuePointer | ⚠️ Conceptual — needs signal integration |
The Module-based graph type definitions (this spec) are the **first concrete
step** in this pipeline. Everything else builds on having a `Type.Module` as
the schema source of truth.
## Constraints on Current Design
The forward-looking patterns documented here constrain the Module evolution
design in [metagraph-module.md](./metagraph-module.md):
1. **The Module format must be self-contained**: `Type.Module({...})` entries
with `Type.Ref` and `Type.Composite` are the same structures that a ujsx
TypeBox Host would produce. If the Module format were an ad-hoc builder
output, it couldn't be rendered by a different host later.
2. **Edge constraints must be schema entries, not just DB columns** — the
constraint data needs to survive serialization/deserialization and be
validatable independently. DB-only columns can't do this.
3. **The base attribute schemas (`BaseNode`, `BaseEdge`) must be TypeBox
schemas** — not Drizzle column definitions, not builder-internal objects.
Only TypeBox schemas can be composed via `Type.Composite`, referenced via
`Type.Ref`, and serialized to JSON Schema.
4. **No ujsx dependency**: storage's Module-based graph types join the
pipeline conceptually, not through a runtime dependency. The `Type.Module`
output has the same shape that a ujsx HostConfig would produce, but storage
doesn't need ujsx to create it. The alignment is structural; it is not a
package dependency.
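
Constraint 2 can be illustrated with a short sketch: edge constraints held as plain data survive a JSON round trip and can be checked without a database. The `EdgeConstraint` shape follows the `edgeConstraints` props used earlier in this document; `isEdgeAllowed` and its "unconstrained edge types are allowed" default are assumptions, not the `validateEdge` contract from metagraph-module.md.

```typescript
// Hypothetical constraint entry: node type names as plain strings.
interface EdgeConstraint {
  edgeType: string;
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

const constraints: EdgeConstraint[] = [
  { edgeType: "triggered", allowedSourceTypes: ["Call"], allowedTargetTypes: ["Call", "Subcall"] },
];

// Check an edge against the constraints without touching a database.
function isEdgeAllowed(
  all: EdgeConstraint[],
  edgeType: string,
  sourceType: string,
  targetType: string,
): boolean {
  const rule = all.find((c) => c.edgeType === edgeType);
  if (rule === undefined) return true; // assumption: unconstrained edge types are allowed
  return rule.allowedSourceTypes.includes(sourceType)
    && rule.allowedTargetTypes.includes(targetType);
}

// The constraints are plain data, so they survive a JSON round trip.
const restored = JSON.parse(JSON.stringify(constraints)) as EdgeConstraint[];
const allowed = isEdgeAllowed(restored, "triggered", "Call", "Subcall");
const rejected = isEdgeAllowed(restored, "triggered", "Subcall", "Call");
```

Because the entries hold node type names and not schema references, nothing in the round trip needs the Module to be present.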
## References
- ujsx pointer system: `/workspace/@alkdev/ujsx/src/core/pointer.ts`
- ujsx HostConfig adapter: `/workspace/@alkdev/ujsx/src/host/config.ts`
- dbtype architecture: `/workspace/@alkdev/dbtype/docs/architecture/README.md`
- dbtype elements: `/workspace/@alkdev/dbtype/docs/architecture/elements.md`
- dbtype module: `/workspace/@alkdev/dbtype/docs/architecture/module.md`
- JPATH Module (JSONPath as TypeBox Module): `/workspace/research/typebox_research/ujsx/jpath.gen.ts`
- jsonpathly source: `/workspace/jsonpathly/`
- Module evolution spec: [metagraph-module.md](./metagraph-module.md)