docs: delete metagraph.md, migrate data model into metagraph-module.md

The historical reference doc was exactly the confusing artifact we were
cleaning up. Its unique content (the three-level type system overview
and ASCII diagram) now lives in metagraph-module.md as an introductory
section. Everything else was redundant:

- Schema types → metagraph-module.md (Module entries)
- SchemaBuilder → metagraph-module.md (SchemaBuilder Equivalence section)
- Usage patterns → metagraph-module.md + encrypted-data.md (Module examples)
- Composite identity / attributes storage → sqlite-host.md (table definitions)
- Versioning → schema-evolution.md (thorough treatment)
- Ecosystem context → overview.md (Ecosystem Integration section)

All cross-references updated: AGENTS.md, sqlite-host.md, schema-evolution.md.
2026-05-29 05:27:08 +00:00
parent 95e02f939d
commit 3b63d92976
5 changed files with 57 additions and 429 deletions


@@ -8,6 +8,57 @@ last_updated: 2026-05-30
Graph type definitions as `Type.Module` — aligning with the ujsx pattern for
recursive schemas, cross-package references, codegen, and graphology serialization.
## The Metagraph Data Model
The metagraph pattern is a three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints
(directed/undirected/mixed, allows self-loops, multi-edges) via a
`Config` entry.
2. **NodeType** — A category of node within a graph type (e.g., "call",
"account", "task"). Each node type has a TypeBox schema that validates
the `attributes` of nodes belonging to that type. Optionally constrains
which edge types can connect from/to this node type.
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
"can_read", "depends_on"). Each edge type has a TypeBox schema for its
attributes. Optionally constrains which source/target node types are valid.
Then **Graph instances** belong to a graph type and contain **Nodes** and
**Edges** conforming to those type definitions.
```
GraphType "call-graph" (directed, multi, self-loops allowed)
├── NodeType "call" → schema validates call attributes
├── NodeType "subcall" → schema validates subcall attributes
├── EdgeType "triggered" → allowedSourceTypes: ["Call"], allowedTargetTypes: ["Call", "Subcall"]
└── EdgeType "depends_on" → allowedSourceTypes: ["Call", "Subcall"], allowedTargetTypes: ["Call", "Subcall"]
Graph "session-abc-call-graph" (instance)
│ graphTypeId → GraphType "call-graph"
│ status: "active"
├── Node "call-001" → nodeTypeId → NodeType "call"
│ └── attributes: { requestId, operationId, status, ... }
├── Node "call-002" → nodeTypeId → NodeType "subcall"
│ └── attributes: { requestId, parentRequestId, ... }
└── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
└── attributes: { type: "triggered" }
sourceNodeKey: "call-001"
targetNodeKey: "call-002"
```
Nodes and edges use a **composite identity model**: identified by
`(graphId, key)` where `key` is consumer-defined. The database generates UUID
`id` values for cross-graph references, but within a graph, the consumer's
`key` is the identity.
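The composite identity rule can be sketched as a small in-memory index. This is only an illustration under stated assumptions: `NodeRecord`, `compositeKey`, and `NodeIndex` are hypothetical names, not part of the storage package, and the real uniqueness guarantee is expected to come from the database's composite key constraint.

```ts
// Hypothetical record shape for illustration; not the actual table definition.
interface NodeRecord {
  id: string;      // database-generated UUID, used for cross-graph references
  graphId: string; // owning graph instance
  key: string;     // consumer-defined identity within the graph
}

// Encode (graphId, key) as a JSON array so that no separator character
// inside either part can cause two different pairs to collide.
function compositeKey(graphId: string, key: string): string {
  return JSON.stringify([graphId, key]);
}

// Minimal in-memory stand-in for a unique (graphId, key) constraint.
class NodeIndex {
  private readonly byComposite = new Map<string, NodeRecord>();

  insert(node: NodeRecord): void {
    const k = compositeKey(node.graphId, node.key);
    if (this.byComposite.has(k)) {
      throw new Error(
        `duplicate node key "${node.key}" in graph "${node.graphId}"`,
      );
    }
    this.byComposite.set(k, node);
  }

  find(graphId: string, key: string): NodeRecord | undefined {
    return this.byComposite.get(compositeKey(graphId, key));
  }
}
```

The same `key` may appear in two different graphs, while a second insert of the same `(graphId, key)` pair is rejected.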
Node and edge attributes are stored as JSON text in SQLite (will be `jsonb` in PG). The
graph type's schema defines what shape these attributes should have, but the
database doesn't enforce the schema — all validation happens in the repository
layer. See [schema-evolution.md](./schema-evolution.md) for how schemas change
over time, and [sqlite-host.md](./sqlite-host.md) for the table definitions.
## Overview
A graph type definition is naturally a TypeBox Module. It has named entries
@@ -833,5 +884,4 @@ Acceptance criteria per phase:
- Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/`
- Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts`
- Forward-looking connections (pointers, dbtype, ujsx IR): [forward-look.md](./forward-look.md)
- Current metagraph model: [metagraph.md](./metagraph.md)
- Ecosystem integration: [overview.md](./overview.md)


@@ -1,422 +0,0 @@
---
status: draft
last_updated: 2026-05-28
---
# Metagraph Model
> **Historical reference only.** Graph type definitions are now TypeBox Modules,
> not standalone schemas + SchemaBuilder. The current data model and construction
> patterns are specified in [metagraph-module.md](./metagraph-module.md). This
> document is retained for understanding the data model concepts (graph types,
> node types, edge types, graph instances) and the versioning/attributes storage
> design, which carry forward unchanged into the Module approach.
The core data model: graph types define schemas, node types define shapes, edge
types define relationships, and typed graph instances hold actual data.
## Overview
The metagraph pattern is a three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints
(directed/undirected/mixed, allows self-loops, multi-edges) via a
`GraphConfig`.
2. **NodeType** — A category of node within a graph type (e.g.,
"operation-call", "account", "task"). Each node type has a TypeBox schema
that validates the `attributes` of nodes belonging to that type. Optionally
constrains which edge types can connect from/to this node type.
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
"can_read", "depends_on"). Each edge type has a TypeBox schema for its
attributes. Optionally constrains which source/target node types are valid.
Then **Graph instances** belong to a graph type and contain **Nodes** and
**Edges** conforming to those type definitions.
```
GraphType "call-graph" (directed, multi, self-loops allowed)
├── NodeType "call" → schema validates call attributes
├── NodeType "subcall" → schema validates subcall attributes
├── EdgeType "triggered" → allowedSourceTypes: ["call"], allowedTargetTypes: ["call", "subcall"]
└── EdgeType "depends_on" → allowedSourceTypes: ["call", "subcall"], allowedTargetTypes: ["call", "subcall"]
Graph "session-abc-call-graph" (instance)
│ graphTypeId → GraphType "call-graph"
│ status: "active"
├── Node "call-001" → nodeTypeId → NodeType "call"
│ └── attributes: { requestId, operationId, status, ... }
├── Node "call-002" → nodeTypeId → NodeType "subcall"
│ └── attributes: { requestId, parentRequestId, ... }
└── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
└── attributes: { type: "triggered" }
sourceNodeKey: "call-001"
targetNodeKey: "call-002"
```
## Schema Types
> These are the pre-Module representations. The `Metagraph` Module
> ([metagraph-module.md](./metagraph-module.md)) replaces these standalone
> schemas with Module entries (`Metagraph.BaseNode`, `Metagraph.BaseEdge`,
> `Metagraph.Config`). The data shapes are the same; the Module format adds
> `Type.Ref()`, `Type.Composite()`, and `$defs` support.
These schemas were defined in `src/graphs/types.ts`. They have zero database
dependencies: they are pure TypeBox schemas used for validation and type inference.
### BaseNodeAttributes
```ts
{
created?: string, // ISO 8601 date-time
modified?: string, // ISO 8601 date-time
metadata?: Record<string, unknown>
}
```
Optional audit and extension fields. Node `attributes` should extend this.
### BaseEdgeAttributes
```ts
{
type: string, // edge type discriminator
metadata?: Record<string, unknown>
}
```
Every edge carries its type and optional metadata. Edge `attributes` should
extend this.
### GraphConfig
```ts
{
type: "directed" | "undirected" | "mixed", // default: "mixed"
multi: boolean, // default: true
allowSelfLoops: boolean // default: true
}
```
Structural constraints for a graph type. Defaults encourage permissive graphs
(mixed, multi-edges, self-loops) because most real-world graphs need these
features.
### NodeType
```ts
{
name: string,
schema: TSchema // TypeBox schema for node attributes
}
```
A node type definition. The `schema` validates the `attributes` of nodes that
belong to this type. Consumer must extend `BaseNodeAttributes` in their schema —
the metagraph model does not enforce this at the database level (SQLite can't
enforce JSON schema), but the SchemaBuilder validated it at definition time
(now handled by `Type.Module()` construction).
### EdgeType
```ts
{
name: string,
schema: TSchema,
allowedSourceTypes?: string[],
allowedTargetTypes?: string[]
}
```
An edge type definition. Optionally constrains which node types can appear at
source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is
undefined, any node type is valid. When defined, only listed node types are
valid endpoints.
### GraphSchema
```ts
{
config: GraphConfig,
nodeTypes: Record<string, NodeType>,
edgeTypes: Record<string, EdgeType>
}
```
The complete definition of a graph type. This is what `SchemaBuilder.build()`
produced (now `Type.Module()` produces the same structure).
### GraphStatus & GraphBaseType
Enum-backed types for graph lifecycle and structural type:
- `GraphStatus`: `active`, `archived`, `draft`
- `GraphBaseType`: `directed`, `undirected`, `mixed`
These are provided as `as const` object constants and TypeBox `Type.Union` of
`Type.Literal` schemas, following the project convention (see overview.md D6).
The TypeBox schemas derive their literal values from the same const object.
## SchemaBuilder (Historical — Replaced by Module Construction)
Was defined in `src/graphs/schemaBuilder.ts`. Fluent builder API. **This builder
is removed in the Module approach** — see [metagraph-module.md](./metagraph-module.md)
DD1/DD2. The builder's internal structure documents the structural equivalence
with `Type.Module()` (see "SchemaBuilder Equivalence" in that document).
```ts
const schema = new SchemaBuilder()
.config({ type: "directed", multi: true, allowSelfLoops: false })
.nodeType("call", CallAttributesSchema)
.nodeType("subcall", SubcallAttributesSchema)
.edgeType("triggered", BaseEdgeAttributes, {
allowedSourceTypes: ["call"],
allowedTargetTypes: ["call", "subcall"],
})
.edgeType("depends_on", BaseEdgeAttributes)
.build();
```
### Validation
The builder validated at each step:
1. **`config()`** — Validated against `GraphConfig` schema. Applied defaults for
missing fields.
2. **`nodeType()`** — Validated the schema was a valid TypeBox schema
(`KindGuard.IsSchema`). Validated the resulting object against `NodeType`
schema.
3. **`edgeType()`** — Same as nodeType, plus validated
allowedSourceTypes/allowedTargetTypes were strings.
4. **`build()`** — Validated the complete schema against `GraphSchema`. Threw
on any invalid structure.
**Error behavior**: The builder threw `Error` with a JSON-stringified list of
validation errors (path + message). Validation failures did not roll back partial
state — a builder that failed on the second `nodeType()` call still had the first
node type in its schema. Callers were advised not to reuse a builder after a
failure.
**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is
`undefined` in the schema layer, any node type is a valid endpoint. When a
non-empty array is provided, only the listed node types are valid endpoints.
In the database layer, `[]` (the column default) represents "no restriction" —
any node type is valid — matching the behavior of `undefined` in the schema.
There is no "no types allowed" state; if edge types need to be disabled, use a
status or soft-delete pattern on the edge type definition. The repository layer
must enforce this convention consistently.
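A minimal sketch of this convention, assuming a repository-layer helper that receives the edge type's restriction lists. The names `EndpointRestrictions`, `isEndpointAllowed`, and `isEdgeAllowed` are hypothetical and only illustrate the rule that both `undefined` and `[]` mean "no restriction".

```ts
// Hypothetical shape: mirrors the optional restriction fields of an edge type.
interface EndpointRestrictions {
  allowedSourceTypes?: readonly string[];
  allowedTargetTypes?: readonly string[];
}

// undefined (schema layer) and [] (database column default) both mean
// "any node type is valid". A non-empty list is an allow-list.
function isEndpointAllowed(
  allowed: readonly string[] | undefined,
  nodeType: string,
): boolean {
  if (allowed === undefined || allowed.length === 0) {
    return true;
  }
  return allowed.includes(nodeType);
}

// An edge is acceptable only when both endpoints pass their allow-list.
function isEdgeAllowed(
  restrictions: EndpointRestrictions,
  sourceNodeType: string,
  targetNodeType: string,
): boolean {
  return (
    isEndpointAllowed(restrictions.allowedSourceTypes, sourceNodeType) &&
    isEndpointAllowed(restrictions.allowedTargetTypes, targetNodeType)
  );
}
```

Because an empty list never means "nothing allowed", disabling an edge type has to be modelled separately, for example with the status or soft-delete pattern mentioned above.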
The SchemaBuilder enforced structural integrity at definition time. In the
Module approach, `Type.Module()` construction and `Value.Check()` provide the
same guarantee. The database
stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will
be `jsonb` in PG). Database-level constraints (unique composite keys, cascade
deletes) protect referential integrity, but the database does NOT validate JSON
schema conformance. This is a deliberate trade-off:
- **Pro**: Schema changes don't require migrations. A graph type's schema
evolves by updating the JSON blob.
- **Pro**: SQLite's JSON support is limited (no JSON schema constraints).
- **Con**: Invalid data can be inserted if application-level validation is
bypassed.
- **Mitigation**: All repository-layer mutations validate against the current
graph type's schema before writing.
## Node and Edge Identity
Nodes and edges use a **composite identity model**:
- **Node**: identified by `(graphId, key)` — unique within a graph. The `key` is
a consumer-defined string (e.g., `"call-001"`, `"account:alice"`).
- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is
optional for directed graphs but required for multi-edges.
This means consumers control their own identifiers within a graph. The database
generates UUID `id` values for cross-graph references, but within a graph, the
consumer's `key` is the identity.
## Attributes Storage
Node attributes and edge attributes are stored as JSON text in SQLite (will be
`jsonb` in PG). The graph type's schema defines what shape these attributes
should have, but the database doesn't enforce the schema — it stores whatever
JSON is provided.
This design means:
- **Schema evolution**: Add optional fields to a node type schema without
data migration. Old nodes are still valid.
- **Schema versioning**: The `version` field on graph types tracks breaking
schema changes. Consumer code can check the version before processing.
- **Validation boundary**: All validation happens in the repository layer
(application code), not in the database.
## Versioning
Graph types have a `version` integer (default 1). This tracks **breaking**
schema changes — field removals, type changes that invalidate existing data.
Non-breaking changes (adding optional fields) do not require a version bump.
The `version` field is stored as a column on the `graph_types` table (see
[sqlite-host.md](./sqlite-host.md)). It is not part of the `GraphSchema` — it
lives at the database level because it affects how the repository layer
processes data for that graph type.
The repository layer should check `version` before processing to ensure
compatibility. A version mismatch indicates the data format has changed
incompatibly and the consumer should handle it explicitly.
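One way to make the mismatch explicit is to return a tagged result instead of silently processing the data. This is a sketch under assumptions: `VersionCheck` and `checkGraphTypeVersion` are hypothetical names, and strict equality is only one possible compatibility policy.

```ts
// Hypothetical result type: forces the caller to branch on the outcome.
type VersionCheck =
  | { ok: true }
  | { ok: false; stored: number; supported: number };

// Compare the version stored on the graph type row with the version the
// consumer code was written against. Assumes exact-match compatibility.
function checkGraphTypeVersion(
  stored: number,
  supported: number,
): VersionCheck {
  if (stored === supported) {
    return { ok: true };
  }
  return { ok: false, stored, supported };
}

// Example caller: refuse to process data written under another version.
function describeVersionCheck(stored: number, supported: number): string {
  const result = checkGraphTypeVersion(stored, supported);
  if (result.ok) {
    return "compatible";
  }
  return `version mismatch: stored ${result.stored}, supported ${result.supported}`;
}
```

A consumer could branch on `ok` to skip, migrate, or reject the affected graphs. Which of those is appropriate is left to the consumer.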
## Usage Patterns (Historical — SchemaBuilder API)
> **⚠️ These examples use the removed SchemaBuilder API.** They are retained here
> as structural reference for the data model concepts. For the current Module
> construction API, see [metagraph-module.md](./metagraph-module.md). For current
> encrypted data examples, see [encrypted-data.md](./encrypted-data.md).
### Defining a Call Graph Type
> **Note**: `@alkdev/flowgraph` exports a canonical `CallNodeAttrs` schema that
> defines these same fields (plus `parentRequestId`, `identity`, timestamps).
> The example below illustrates the structure; production code should import
> `CallNodeAttrs` from `@alkdev/flowgraph/schema` when defining the graph type
> for persisted call graphs.
```ts
import {
BaseEdgeAttributes,
BaseNodeAttributes,
SchemaBuilder,
} from "@alkdev/storage";
import { Type } from "@alkdev/typebox";
const CallNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
requestId: Type.String(),
operationId: Type.String(),
status: Type.Union([
Type.Literal("pending"),
Type.Literal("running"),
Type.Literal("completed"),
Type.Literal("failed"),
Type.Literal("aborted"),
]),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "directed", multi: false, allowSelfLoops: false })
.nodeType("call", CallNodeAttributes)
.edgeType("triggered", BaseEdgeAttributes)
.edgeType("depends_on", BaseEdgeAttributes)
.build();
```
### Defining an ACL Graph Type
```ts
const ACLNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
resourceType: Type.String(), // "project", "session", "client"
resourceId: Type.String(),
}),
]);
const ACLEdgeAttributes = Type.Intersect([
BaseEdgeAttributes,
Type.Object({
permission: Type.Union([
Type.Literal("read"),
Type.Literal("write"),
Type.Literal("admin"),
]),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "directed", multi: true, allowSelfLoops: false })
.nodeType("principal", ACLNodeAttributes) // accounts, groups
.nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
.edgeType("can_access", ACLEdgeAttributes, {
allowedSourceTypes: ["principal"],
allowedTargetTypes: ["resource"],
})
.build();
```
### Defining Encrypted Secret Storage as a Node Type
> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()`
> are planned additions. See [encrypted-data.md](./encrypted-data.md) for the
> design.
```ts
// PLANNED — not yet available
import { EncryptedDataSchema } from "@alkdev/storage";
const SecretNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
key: Type.String(), // secret key name
encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
expiresAt: Type.Optional(Type.String({ format: "date-time" })),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "undirected", multi: false, allowSelfLoops: false })
.nodeType("secret", SecretNodeAttributes)
.build();
```
See [encrypted-data.md](./encrypted-data.md) for the full encrypted data design.
## Ecosystem Context
The metagraph model is a **storage layer** consumed by other packages in the
@alkdev ecosystem. It does not depend on the hub — the dependency flows the other
way. The call protocol, identity model, and graph type definitions that inform
the usage patterns below originate from sibling packages:
- **Call protocol** — Defined in `@alkdev/operations`. The `CallEventMap`,
`PendingRequestMap`, and `CallHandler` define call/subscribe semantics, event
types, and the request correlation model. Storage persists what the call
protocol records.
- **Identity & access control** — Defined in `@alkdev/operations`. The
`Identity` interface and `AccessControl` schema provide the security context
carried through calls. The "ACL graph type" usage pattern maps to these
constructs.
- **Call graph schema** — Defined in `@alkdev/flowgraph`. `CallNodeAttrs`,
`CallEdgeAttrs`, `CallStatus`, and `EdgeType` specify the attribute shapes for
call graph instances. Storage persists these shapes; flowgraph operates on
them in memory.
- **Task graph schema** — Defined in `@alkdev/taskgraph`. `TaskGraphNodeAttributes`
and `DependencyEdge` specify task dependency shapes. Another graph type
that storage persists.
- **Event transport** — Provided by `@alkdev/pubsub`. The `TypedEventTarget`
contract and `EventEnvelope` wrapping carry call protocol events between
processes. Storage is not involved in event routing but stores the events'
outcomes.
The repository layer (not yet implemented) will provide typed CRUD for metagraph
data — insert, find, update, delete with schema validation. A consumer can then
wire these CRUD functions into `@alkdev/operations`'s registry pattern
(analogous to how `drizzle-graphql` auto-generates a GraphQL schema from Drizzle
tables, but using operations instead of GraphQL resolvers). This avoids circular
dependencies: storage defines the schema types and tables, operations provides
the call protocol and registry, and the consumer bridges them.
## References
- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
- Identity & access control: `/workspace/@alkdev/operations/docs/architecture/api-surface.md`
- Call graph schema: `/workspace/@alkdev/flowgraph/docs/architecture/schema.md`
- Call graph (dynamic): `/workspace/@alkdev/flowgraph/docs/architecture/call-graph.md`
- Task graph schema: `/workspace/@alkdev/taskgraph_ts/docs/architecture/schemas.md`
- Pubsub architecture: `/workspace/@alkdev/pubsub/docs/architecture/README.md`
- TypeBox: https://github.com/sinclairzx/typebox
- SchemaBuilder source (pre-Module, removed): `src/graphs/schemaBuilder.ts`
- Schema types source (pre-Module, being replaced): `src/graphs/types.ts`


@@ -556,4 +556,4 @@ be needed. That's a post-v1 concern.
- Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md`
- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
- Metagraph Module: [metagraph-module.md](./metagraph-module.md)
- Current schema versioning: [metagraph.md](./metagraph.md)
- Schema versioning in the data model: [metagraph-module.md](./metagraph-module.md) (Versioning section and DD3)


@@ -136,7 +136,7 @@ node type is a valid endpoint — matching the behavior of `undefined` in the
listed node types. There is no "no types allowed" state; if edge types need to
be disabled, use a status or soft-delete pattern on the edge type definition.
The repository layer must enforce this convention consistently. See
[metagraph.md](./metagraph.md) for the schema-layer definition.
[metagraph-module.md](./metagraph-module.md) for edge endpoint semantics.
### `graphs`