docs: delete metagraph.md, migrate data model into metagraph-module.md

The historical reference doc was exactly the confusing artifact we were
cleaning up. Its unique content (the three-level type system overview and
ASCII diagram) now lives in metagraph-module.md as an introductory section.

Everything else was redundant:

- Schema types → metagraph-module.md (Module entries)
- SchemaBuilder → metagraph-module.md (SchemaBuilder Equivalence section)
- Usage patterns → metagraph-module.md + encrypted-data.md (Module examples)
- Composite identity / attributes storage → sqlite-host.md (table definitions)
- Versioning → schema-evolution.md (thorough treatment)
- Ecosystem context → overview.md (Ecosystem Integration section)

All cross-references updated: AGENTS.md, sqlite-host.md, schema-evolution.md.
2026-05-29 05:27:08 +00:00
parent 95e02f939d
commit 3b63d92976
5 changed files with 57 additions and 429 deletions
--- a/docs/architecture/metagraph-module.md
+++ b/docs/architecture/metagraph-module.md
@@ -8,6 +8,57 @@ last_updated: 2026-05-30
 Graph type definitions as `Type.Module` — aligning with the ujsx pattern for
 recursive schemas, cross-package references, codegen, and graphology serialization.

+## The Metagraph Data Model
+
+The metagraph pattern is a three-level type system:
+
+1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
+   "task-dependencies"). Defines structural constraints
+   (directed/undirected/mixed, allows self-loops, multi-edges) via a
+   `Config` entry.
+2. **NodeType** — A category of node within a graph type (e.g., "call",
+   "account", "task"). Each node type has a TypeBox schema that validates
+   the `attributes` of nodes belonging to that type. Optionally constrains
+   which edge types can connect from/to this node type.
+3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
+   "can_read", "depends_on"). Each edge type has a TypeBox schema for its
+   attributes. Optionally constrains which source/target node types are valid.
+
+Then **Graph instances** belong to a graph type and contain **Nodes** and
+**Edges** conforming to those type definitions.
+
+```
+GraphType "call-graph" (directed, multi, self-loops allowed)
+  ├── NodeType "call"         → schema validates call attributes
+  ├── NodeType "subcall"      → schema validates subcall attributes
+  ├── EdgeType "triggered"    → allowedSourceTypes: ["Call"], allowedTargetTypes: ["Call", "Subcall"]
+  └── EdgeType "depends_on"   → allowedSourceTypes: ["Call", "Subcall"], allowedTargetTypes: ["Call", "Subcall"]
+
+Graph "session-abc-call-graph" (instance)
+  │  graphTypeId → GraphType "call-graph"
+  │  status: "active"
+  │
+  ├── Node "call-001"        → nodeTypeId → NodeType "call"
+  │   └── attributes: { requestId, operationId, status, ... }
+  ├── Node "call-002"        → nodeTypeId → NodeType "subcall"
+  │   └── attributes: { requestId, parentRequestId, ... }
+  └── Edge "edge-001"        → edgeTypeId → EdgeType "triggered"
+      └── attributes: { type: "triggered" }
+          sourceNodeKey: "call-001"
+          targetNodeKey: "call-002"
+```
+
+Nodes and edges use a **composite identity model**: identified by
+`(graphId, key)` where `key` is consumer-defined. The database generates UUID
+`id` values for cross-graph references, but within a graph, the consumer's
+`key` is the identity.
+
+Node and edge attributes are stored as JSON text in SQLite (jsonb in PG). The
+graph type's schema defines what shape these attributes should have, but the
+database doesn't enforce the schema — all validation happens in the repository
+layer. See [schema-evolution.md](./schema-evolution.md) for how schemas change
+over time, and [sqlite-host.md](./sqlite-host.md) for the table definitions.
+
 ## Overview

 A graph type definition is naturally a TypeBox Module. It has named entries
@@ -833,5 +884,4 @@ Acceptance criteria per phase:
 - Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/`
 - Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts`
 - Forward-looking connections (pointers, dbtype, ujsx IR): [forward-look.md](./forward-look.md)
-- Current metagraph model: [metagraph.md](./metagraph.md)
 - Ecosystem integration: [overview.md](./overview.md)
--- a/docs/architecture/metagraph.md
+++ b/docs/architecture/metagraph.md
@@ -1,422 +0,0 @@
----
-status: draft
-last_updated: 2026-05-28
----
-
-# Metagraph Model
-
-> **Historical reference only.** Graph type definitions are now TypeBox Modules,
-> not standalone schemas + SchemaBuilder. The current data model and construction
-> patterns are specified in [metagraph-module.md](./metagraph-module.md). This
-> document is retained for understanding the data model concepts (graph types,
-> node types, edge types, graph instances) and the versioning/attributes storage
-> design, which carry forward unchanged into the Module approach.
-
-The core data model: graph types define schemas, node types define shapes, edge
-types define relationships, and typed graph instances hold actual data.
-
-## Overview
-
-The metagraph pattern is a three-level type system:
-
-1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
-   "task-dependencies"). Defines structural constraints
-   (directed/undirected/mixed, allows self-loops, multi-edges) via a
-   `GraphConfig`.
-2. **NodeType** — A category of node within a graph type (e.g.,
-   "operation-call", "account", "task"). Each node type has a TypeBox schema
-   that validates the `attributes` of nodes belonging to that type. Optionally
-   constrains which edge types can connect from/to this node type.
-3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
-   "can_read", "depends_on"). Each edge type has a TypeBox schema for its
-   attributes. Optionally constrains which source/target node types are valid.
-
-Then **Graph instances** belong to a graph type and contain **Nodes** and
-**Edges** conforming to those type definitions.
-
-```
-GraphType "call-graph" (directed, multi, self-loops allowed)
-  ├── NodeType "call"         → schema validates call attributes
-  ├── NodeType "subcall"     → schema validates subcall attributes
-  ├── EdgeType "triggered"   → allowedSourceTypes: ["call"], allowedTargetTypes: ["call", "subcall"]
-  └── EdgeType "depends_on"  → allowedSourceTypes: ["call", "subcall"], allowedTargetTypes: ["call", "subcall"]
-
-Graph "session-abc-call-graph" (instance)
-  │  graphTypeId → GraphType "call-graph"
-  │  status: "active"
-  │
-  ├── Node "call-001"        → nodeTypeId → NodeType "call"
-  │   └── attributes: { requestId, operationId, status, ... }
-  ├── Node "call-002"        → nodeTypeId → NodeType "subcall"
-  │   └── attributes: { requestId, parentRequestId, ... }
-  └── Edge "edge-001"        → edgeTypeId → EdgeType "triggered"
-      └── attributes: { type: "triggered" }
-          sourceNodeKey: "call-001"
-          targetNodeKey: "call-002"
-```
-
-## Schema Types
-
-> These are the pre-Module representations. The `Metagraph` Module
-> ([metagraph-module.md](./metagraph-module.md)) replaces these standalone
-> schemas with Module entries (`Metagraph.BaseNode`, `Metagraph.BaseEdge`,
-> `Metagraph.Config`). The data shapes are the same; the Module format adds
-> `Type.Ref()`, `Type.Composite()`, and `$defs` support.
-
-Were defined in `src/graphs/types.ts`. Zero database dependencies — pure
-TypeBox schemas used for validation and type inference.
-
-### BaseNodeAttributes
-
-```ts
-{
-  created?: string,       // ISO 8601 date-time
-  modified?: string,      // ISO 8601 date-time
-  metadata?: Record<string, unknown>
-}
-```
-
-Optional audit and extension fields. Node `attributes` should extend this.
-
-### BaseEdgeAttributes
-
-```ts
-{
-  type: string,           // edge type discriminator
-  metadata?: Record<string, unknown>
-}
-```
-
-Every edge carries its type and optional metadata. Edge `attributes` should
-extend this.
-
-### GraphConfig
-
-```ts
-{
-  type: "directed" | "undirected" | "mixed",  // default: "mixed"
-  multi: boolean,                              // default: true
-  allowSelfLoops: boolean                       // default: true
-}
-```
-
-Structural constraints for a graph type. Defaults encourage permissive graphs
-(mixed, multi-edges, self-loops) because most real-world graphs need these
-features.
-
-### NodeType
-
-```ts
-{
-  name: string,
-  schema: TSchema  // TypeBox schema for node attributes
-}
-```
-
-A node type definition. The `schema` validates the `attributes` of nodes that
-belong to this type. Consumer must extend `BaseNodeAttributes` in their schema —
-the metagraph model does not enforce this at the database level (SQLite can't
-enforce JSON schema), but the SchemaBuilder validated it at definition time
-(now handled by `Type.Module()` construction).
-
-### EdgeType
-
-```ts
-{
-  name: string,
-  schema: TSchema,
-  allowedSourceTypes?: string[],
-  allowedTargetTypes?: string[]
-}
-```
-
-An edge type definition. Optionally constrains which node types can appear at
-source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is
-undefined, any node type is valid. When defined, only listed node types are
-valid endpoints.
-
-### GraphSchema
-
-```ts
-{
-  config: GraphConfig,
-  nodeTypes: Record<string, NodeType>,
-  edgeTypes: Record<string, EdgeType>
-}
-```
-
-The complete definition of a graph type. This is what `SchemaBuilder.build()`
-produced (now `Type.Module()` produces the same structure).
-
-### GraphStatus & GraphBaseType
-
-Enum-backed types for graph lifecycle and structural type:
-
-- `GraphStatus`: `active`, `archived`, `draft`
-- `GraphBaseType`: `directed`, `undirected`, `mixed`
-
-These are provided as `as const` object constants and TypeBox `Type.Union` of
-`Type.Literal` schemas, following the project convention (see overview.md D6).
-The TypeBox schemas derive their literal values from the same const object.
-
-## SchemaBuilder (Historical — Replaced by Module Construction)
-
-Was defined in `src/graphs/schemaBuilder.ts`. Fluent builder API. **This builder
-is removed in the Module approach** — see [metagraph-module.md](./metagraph-module.md)
-DD1/DD2. The builder's internal structure documents the structural equivalence
-with `Type.Module()` (see "SchemaBuilder Equivalence" in that document).
-
-```ts
-const schema = new SchemaBuilder()
-  .config({ type: "directed", multi: true, allowSelfLoops: false })
-  .nodeType("call", CallAttributesSchema)
-  .nodeType("subcall", SubcallAttributesSchema)
-  .edgeType("triggered", BaseEdgeAttributes, {
-    allowedSourceTypes: ["call"],
-    allowedTargetTypes: ["call", "subcall"],
-  })
-  .edgeType("depends_on", BaseEdgeAttributes)
-  .build();
-```
-
-### Validation
-
-The builder validated at each step:
-
-1. **`config()`** — Validated against `GraphConfig` schema. Applied defaults for
-   missing fields.
-2. **`nodeType()`** — Validated the schema was a valid TypeBox schema
-   (`KindGuard.IsSchema`). Validated the resulting object against `NodeType`
-   schema.
-3. **`edgeType()`** — Same as nodeType, plus validated
-   allowedSourceTypes/allowedTargetTypes were strings.
-4. **`build()`** — Validated the complete schema against `GraphSchema`. Threw
-   on any invalid structure.
-
-**Error behavior**: The builder threw `Error` with a JSON-stringified list of
-validation errors (path + message). Validation failures did not roll back partial
-state — a builder that failed on the second `nodeType()` call still had the first
-node type in its schema. Callers were advised not to reuse a builder after a
-failure.
-
-**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is
-`undefined` in the schema layer, any node type is a valid endpoint. When a
-non-empty array is provided, only the listed node types are valid endpoints.
-In the database layer, `[]` (the column default) represents "no restriction" —
-any node type is valid — matching the behavior of `undefined` in the schema.
-There is no "no types allowed" state; if edge types need to be disabled, use a
-status or soft-delete pattern on the edge type definition. The repository layer
-must enforce this convention consistently.
-
-The SchemaBuilder enforced structural integrity at definition time. In the
-Module approach, `Type.Module()` construction and `Value.Check()` provide the
-same guarantee. The database
-stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will
-be `jsonb` in PG). Database-level constraints (unique composite keys, cascade
-deletes) protect referential integrity, but the database does NOT validate JSON
-schema conformance. This is a deliberate trade-off:
-
-- **Pro**: Schema changes don't require migrations. A graph type's schema
-  evolves by updating the JSON blob.
-- **Pro**: SQLite's JSON support is limited (no JSON schema constraints).
-- **Con**: Invalid data can be inserted if application-level validation is
-  bypassed.
-- **Mitigation**: All repository-layer mutations validate against the current
-  graph type's schema before writing.
-
-## Node and Edge Identity
-
-Nodes and edges use a **composite identity model**:
-
-- **Node**: identified by `(graphId, key)` — unique within a graph. The `key` is
-  a consumer-defined string (e.g., `"call-001"`, `"account:alice"`).
-- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is
-  optional for directed graphs but required for multi-edges.
-
-This means consumers control their own identifiers within a graph. The database
-generates UUID `id` values for cross-graph references, but within a graph, the
-consumer's `key` is the identity.
-
-## Attributes Storage
-
-Node attributes and edge attributes are stored as JSON text in SQLite (will be
-`jsonb` in PG). The graph type's schema defines what shape these attributes
-should have, but the database doesn't enforce the schema — it stores whatever
-JSON is provided.
-
-This design means:
-
-- **Schema evolution**: Add optional fields to a node type schema without
-  data migration. Old nodes are still valid.
-- **Schema versioning**: The `version` field on graph types tracks breaking
-  schema changes. Consumer code can check the version before processing.
-- **Validation boundary**: All validation happens in the repository layer
-  (application code), not in the database.
-
-## Versioning
-
-Graph types have a `version` integer (default 1). This tracks **breaking**
-schema changes — field removals, type changes that invalidate existing data.
-Non-breaking changes (adding optional fields) do not require a version bump.
-
-The `version` field is stored as a column on the `graph_types` table (see
-[sqlite-host.md](./sqlite-host.md)). It is not part of the `GraphSchema` — it
-lives at the database level because it affects how the repository layer
-processes data for that graph type.
-
-The repository layer should check `version` before processing to ensure
-compatibility. A version mismatch indicates the data format has changed
-incompatibly and the consumer should handle it explicitly.
-
-## Usage Patterns (Historical — SchemaBuilder API)
-
-> **⚠️ These examples use the removed SchemaBuilder API.** They are retained here
-> as structural reference for the data model concepts. For the current Module
-> construction API, see [metagraph-module.md](./metagraph-module.md). For current
-> encrypted data examples, see [encrypted-data.md](./encrypted-data.md).
-
-### Defining a Call Graph Type
-
-> **Note**: `@alkdev/flowgraph` exports a canonical `CallNodeAttrs` schema that
-> defines these same fields (plus `parentRequestId`, `identity`, timestamps).
-> The example below illustrates the structure; production code should import
-> `CallNodeAttrs` from `@alkdev/flowgraph/schema` when defining the graph type
-> for persisted call graphs.
-
-```ts
-import {
-  BaseEdgeAttributes,
-  BaseNodeAttributes,
-  SchemaBuilder,
-} from "@alkdev/storage";
-import { Type } from "@alkdev/typebox";
-
-const CallNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    requestId: Type.String(),
-    operationId: Type.String(),
-    status: Type.Union([
-      Type.Literal("pending"),
-      Type.Literal("running"),
-      Type.Literal("completed"),
-      Type.Literal("failed"),
-      Type.Literal("aborted"),
-    ]),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "directed", multi: false, allowSelfLoops: false })
-  .nodeType("call", CallNodeAttributes)
-  .edgeType("triggered", BaseEdgeAttributes)
-  .edgeType("depends_on", BaseEdgeAttributes)
-  .build();
-```
-
-### Defining an ACL Graph Type
-
-```ts
-const ACLNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    resourceType: Type.String(), // "project", "session", "client"
-    resourceId: Type.String(),
-  }),
-]);
-
-const ACLEdgeAttributes = Type.Intersect([
-  BaseEdgeAttributes,
-  Type.Object({
-    permission: Type.Union([
-      Type.Literal("read"),
-      Type.Literal("write"),
-      Type.Literal("admin"),
-    ]),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "directed", multi: true, allowSelfLoops: false })
-  .nodeType("principal", ACLNodeAttributes) // accounts, groups
-  .nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
-  .edgeType("can_access", ACLEdgeAttributes, {
-    allowedSourceTypes: ["principal"],
-    allowedTargetTypes: ["resource"],
-  })
-  .build();
-```
-
-### Defining Encrypted Secret Storage as a Node Type
-
-> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()`
-> are planned additions. See [encrypted-data.md](./encrypted-data.md) for the
-> design.
-
-```ts
-// PLANNED — not yet available
-import { EncryptedDataSchema } from "@alkdev/storage";
-
-const SecretNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    key: Type.String(), // secret key name
-    encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
-    expiresAt: Type.Optional(Type.String({ format: "date-time" })),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "undirected", multi: false, allowSelfLoops: false })
-  .nodeType("secret", SecretNodeAttributes)
-  .build();
-```
-
-See [encrypted-data.md](./encrypted-data.md) for the full encrypted data design.
-
-## Ecosystem Context
-
-The metagraph model is a **storage layer** consumed by other packages in the
-@alkdev ecosystem. It does not depend on the hub — the dependency flows the other
-way. The call protocol, identity model, and graph type definitions that inform
-the usage patterns below originate from sibling packages:
-
-- **Call protocol** — Defined in `@alkdev/operations`. The `CallEventMap`,
-  `PendingRequestMap`, and `CallHandler` define call/subscribe semantics, event
-  types, and the request correlation model. Storage persists what the call
-  protocol records.
-- **Identity & access control** — Defined in `@alkdev/operations`. The
-  `Identity` interface and `AccessControl` schema provide the security context
-  carried through calls. The "ACL graph type" usage pattern maps to these
-  constructs.
-- **Call graph schema** — Defined in `@alkdev/flowgraph`. `CallNodeAttrs`,
-  `CallEdgeAttrs`, `CallStatus`, and `EdgeType` specify the attribute shapes for
-  call graph instances. Storage persists these shapes; flowgraph operates on
-  them in memory.
-- **Task graph schema** — Defined in `@alkdev/taskgraph`. `TaskGraphNodeAttributes`
-  and `DependencyEdge` specify task dependency shapes. Another graph type
-  that storage persists.
-- **Event transport** — Provided by `@alkdev/pubsub`. The `TypedEventTarget`
-  contract and `EventEnvelope` wrapping carry call protocol events between
-  processes. Storage is not involved in event routing but stores the events'
-  outcomes.
-
-The repository layer (not yet implemented) will provide typed CRUD for metagraph
-data — insert, find, update, delete with schema validation. A consumer can then
-wire these CRUD functions into `@alkdev/operations`'s registry pattern
-(analogous to how `drizzle-graphql` auto-generates a GraphQL schema from Drizzle
-tables, but using operations instead of GraphQL resolvers). This avoids circular
-dependencies: storage defines the schema types and tables, operations provides
-the call protocol and registry, and the consumer bridges them.
-
-## References
-
-- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
-- Identity & access control: `/workspace/@alkdev/operations/docs/architecture/api-surface.md`
-- Call graph schema: `/workspace/@alkdev/flowgraph/docs/architecture/schema.md`
-- Call graph (dynamic): `/workspace/@alkdev/flowgraph/docs/architecture/call-graph.md`
-- Task graph schema: `/workspace/@alkdev/taskgraph_ts/docs/architecture/schemas.md`
-- Pubsub architecture: `/workspace/@alkdev/pubsub/docs/architecture/README.md`
-- TypeBox: https://github.com/sinclairzx/typebox
-- SchemaBuilder source (pre-Module, removed): `src/graphs/schemaBuilder.ts`
-- Schema types source (pre-Module, being replaced): `src/graphs/types.ts`
--- a/docs/architecture/schema-evolution.md
+++ b/docs/architecture/schema-evolution.md
@@ -556,4 +556,4 @@ be needed. That's a post-v1 concern.
 - Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md`
 - Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
 - Metagraph Module: [metagraph-module.md](./metagraph-module.md)
-- Current schema versioning: [metagraph.md](./metagraph.md)
+- Schema versioning in the data model: [metagraph-module.md](./metagraph-module.md) (Versioning section and DD3)
--- a/docs/architecture/sqlite-host.md
+++ b/docs/architecture/sqlite-host.md
@@ -136,7 +136,7 @@ node type is a valid endpoint — matching the behavior of `undefined` in the
 listed node types. There is no "no types allowed" state; if edge types need to
 be disabled, use a status or soft-delete pattern on the edge type definition.
 The repository layer must enforce this convention consistently. See
-[metagraph.md](./metagraph.md) for the schema-layer definition.
+[metagraph-module.md](./metagraph-module.md) for edge endpoint semantics.

 ### `graphs`