From 3b63d929761706583fddc5225127c3e9e0e49226 Mon Sep 17 00:00:00 2001
From: "glm-5.1"
Date: Fri, 29 May 2026 05:27:08 +0000
Subject: [PATCH] docs: delete metagraph.md, migrate data model into metagraph-module.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The historical reference doc was exactly the confusing artifact we were
cleaning up. Its unique content (the three-level type system overview and
ASCII diagram) now lives in metagraph-module.md as an introductory section.

Everything else was redundant:
- Schema types → metagraph-module.md (Module entries)
- SchemaBuilder → metagraph-module.md (SchemaBuilder Equivalence section)
- Usage patterns → metagraph-module.md + encrypted-data.md (Module examples)
- Composite identity / attributes storage → sqlite-host.md (table definitions)
- Versioning → schema-evolution.md (thorough treatment)
- Ecosystem context → overview.md (Ecosystem Integration section)

All cross-references updated: AGENTS.md, sqlite-host.md, schema-evolution.md.
---
 AGENTS.md                             |   8 +-
 docs/architecture/metagraph-module.md |  52 +++-
 docs/architecture/metagraph.md        | 422 --------------------------
 docs/architecture/schema-evolution.md |   2 +-
 docs/architecture/sqlite-host.md      |   2 +-
 5 files changed, 57 insertions(+), 429 deletions(-)
 delete mode 100644 docs/architecture/metagraph.md

diff --git a/AGENTS.md b/AGENTS.md
index f87cd09..614a1f7 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -98,12 +98,12 @@ Key changes from the originals:

 See `docs/architecture/` for detailed specifications:
 - `overview.md` — Package purpose, exports, design decisions, open questions
-- `metagraph.md` — Core graph model, schema types, SchemaBuilder, attribute
-  storage
-- `metagraph-module.md` — Graph type definitions as TypeBox Modules (evolution
-  of metagraph.md), naming conventions, migration path
+- `metagraph-module.md` — Graph type definitions as TypeBox Modules, data model,
+  naming conventions, implementation path
 - `forward-look.md` — Connections to dbtype, graph pointers, ujsx universal IR
   pipeline
+- `schema-evolution.md` — How graph type schemas evolve, TypeBox Value.Diff/Patch/Cast
+  for schema change detection and data migration
 - `sqlite-host.md` — SQLite tables, relations, client factory, porting notes
 - `encrypted-data.md` — Encrypted data design (planned), crypto utility, node
   type modeling
diff --git a/docs/architecture/metagraph-module.md b/docs/architecture/metagraph-module.md
index 76a73a7..7667b58 100644
--- a/docs/architecture/metagraph-module.md
+++ b/docs/architecture/metagraph-module.md
@@ -8,6 +8,57 @@ last_updated: 2026-05-30
 Graph type definitions as `Type.Module` — aligning with the ujsx pattern for
 recursive schemas, cross-package references, codegen, and graphology serialization.

+## The Metagraph Data Model
+
+The metagraph pattern is a three-level type system:
+
+1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
+   "task-dependencies"). Defines structural constraints
+   (directed/undirected/mixed, allows self-loops, multi-edges) via a
+   `Config` entry.
+2. **NodeType** — A category of node within a graph type (e.g., "call",
+   "account", "task"). Each node type has a TypeBox schema that validates
+   the `attributes` of nodes belonging to that type. Optionally constrains
+   which edge types can connect from/to this node type.
+3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
+   "can_read", "depends_on"). Each edge type has a TypeBox schema for its
+   attributes. Optionally constrains which source/target node types are valid.
+
+Then **Graph instances** belong to a graph type and contain **Nodes** and
+**Edges** conforming to those type definitions.
+
+```
+GraphType "call-graph" (directed, multi, self-loops allowed)
+  ├── NodeType "call"       → schema validates call attributes
+  ├── NodeType "subcall"    → schema validates subcall attributes
+  ├── EdgeType "triggered"  → allowedSourceTypes: ["Call"], allowedTargetTypes: ["Call", "Subcall"]
+  └── EdgeType "depends_on" → allowedSourceTypes: ["Call", "Subcall"], allowedTargetTypes: ["Call", "Subcall"]
+
+Graph "session-abc-call-graph" (instance)
+  │ graphTypeId → GraphType "call-graph"
+  │ status: "active"
+  │
+  ├── Node "call-001" → nodeTypeId → NodeType "call"
+  │     └── attributes: { requestId, operationId, status, ... }
+  ├── Node "call-002" → nodeTypeId → NodeType "subcall"
+  │     └── attributes: { requestId, parentRequestId, ... }
+  └── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
+        └── attributes: { type: "triggered" }
+            sourceNodeKey: "call-001"
+            targetNodeKey: "call-002"
+```
+
+Nodes and edges use a **composite identity model**: identified by
+`(graphId, key)` where `key` is consumer-defined. The database generates UUID
+`id` values for cross-graph references, but within a graph, the consumer's
+`key` is the identity.
+
+Node and edge attributes are stored as JSON text in SQLite (jsonb in PG). The
+graph type's schema defines what shape these attributes should have, but the
+database doesn't enforce the schema — all validation happens in the repository
+layer. See [schema-evolution.md](./schema-evolution.md) for how schemas change
+over time, and [sqlite-host.md](./sqlite-host.md) for the table definitions.
+
 ## Overview

 A graph type definition is naturally a TypeBox Module. It has named entries
@@ -833,5 +884,4 @@ Acceptance criteria per phase:
 - Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/`
 - Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts`
 - Forward-looking connections (pointers, dbtype, ujsx IR): [forward-look.md](./forward-look.md)
-- Current metagraph model: [metagraph.md](./metagraph.md)
 - Ecosystem integration: [overview.md](./overview.md)
\ No newline at end of file
diff --git a/docs/architecture/metagraph.md b/docs/architecture/metagraph.md
deleted file mode 100644
index 6b89418..0000000
--- a/docs/architecture/metagraph.md
+++ /dev/null
@@ -1,422 +0,0 @@
----
-status: draft
-last_updated: 2026-05-28
----
-
-# Metagraph Model
-
-> **Historical reference only.** Graph type definitions are now TypeBox Modules,
-> not standalone schemas + SchemaBuilder. The current data model and construction
-> patterns are specified in [metagraph-module.md](./metagraph-module.md). This
-> document is retained for understanding the data model concepts (graph types,
-> node types, edge types, graph instances) and the versioning/attributes storage
-> design, which carry forward unchanged into the Module approach.
-
-The core data model: graph types define schemas, node types define shapes, edge
-types define relationships, and typed graph instances hold actual data.
-
-## Overview
-
-The metagraph pattern is a three-level type system:
-
-1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
-   "task-dependencies"). Defines structural constraints
-   (directed/undirected/mixed, allows self-loops, multi-edges) via a
-   `GraphConfig`.
-2. **NodeType** — A category of node within a graph type (e.g.,
-   "operation-call", "account", "task"). Each node type has a TypeBox schema
-   that validates the `attributes` of nodes belonging to that type. Optionally
-   constrains which edge types can connect from/to this node type.
-3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
-   "can_read", "depends_on"). Each edge type has a TypeBox schema for its
-   attributes. Optionally constrains which source/target node types are valid.
-
-Then **Graph instances** belong to a graph type and contain **Nodes** and
-**Edges** conforming to those type definitions.
-
-```
-GraphType "call-graph" (directed, multi, self-loops allowed)
-  ├── NodeType "call"       → schema validates call attributes
-  ├── NodeType "subcall"    → schema validates subcall attributes
-  ├── EdgeType "triggered"  → allowedSourceTypes: ["call"], allowedTargetTypes: ["call", "subcall"]
-  └── EdgeType "depends_on" → allowedSourceTypes: ["call", "subcall"], allowedTargetTypes: ["call", "subcall"]
-
-Graph "session-abc-call-graph" (instance)
-  │ graphTypeId → GraphType "call-graph"
-  │ status: "active"
-  │
-  ├── Node "call-001" → nodeTypeId → NodeType "call"
-  │     └── attributes: { requestId, operationId, status, ... }
-  ├── Node "call-002" → nodeTypeId → NodeType "subcall"
-  │     └── attributes: { requestId, parentRequestId, ... }
-  └── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
-        └── attributes: { type: "triggered" }
-            sourceNodeKey: "call-001"
-            targetNodeKey: "call-002"
-```
-
-## Schema Types
-
-> These are the pre-Module representations. The `Metagraph` Module
-> ([metagraph-module.md](./metagraph-module.md)) replaces these standalone
-> schemas with Module entries (`Metagraph.BaseNode`, `Metagraph.BaseEdge`,
-> `Metagraph.Config`). The data shapes are the same; the Module format adds
-> `Type.Ref()`, `Type.Composite()`, and `$defs` support.
-
-Were defined in `src/graphs/types.ts`. Zero database dependencies — pure
-TypeBox schemas used for validation and type inference.
-
-### BaseNodeAttributes
-
-```ts
-{
-  created?: string,   // ISO 8601 date-time
-  modified?: string,  // ISO 8601 date-time
-  metadata?: Record
-}
-```
-
-Optional audit and extension fields. Node `attributes` should extend this.
-
-### BaseEdgeAttributes
-
-```ts
-{
-  type: string,       // edge type discriminator
-  metadata?: Record
-}
-```
-
-Every edge carries its type and optional metadata. Edge `attributes` should
-extend this.
-
-### GraphConfig
-
-```ts
-{
-  type: "directed" | "undirected" | "mixed", // default: "mixed"
-  multi: boolean,                            // default: true
-  allowSelfLoops: boolean                    // default: true
-}
-```
-
-Structural constraints for a graph type. Defaults encourage permissive graphs
-(mixed, multi-edges, self-loops) because most real-world graphs need these
-features.
-
-### NodeType
-
-```ts
-{
-  name: string,
-  schema: TSchema  // TypeBox schema for node attributes
-}
-```
-
-A node type definition. The `schema` validates the `attributes` of nodes that
-belong to this type. Consumer must extend `BaseNodeAttributes` in their schema —
-the metagraph model does not enforce this at the database level (SQLite can't
-enforce JSON schema), but the SchemaBuilder validated it at definition time
-(now handled by `Type.Module()` construction).
-
-### EdgeType
-
-```ts
-{
-  name: string,
-  schema: TSchema,
-  allowedSourceTypes?: string[],
-  allowedTargetTypes?: string[]
-}
-```
-
-An edge type definition. Optionally constrains which node types can appear at
-source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is
-undefined, any node type is valid. When defined, only listed node types are
-valid endpoints.
- -### GraphSchema - -```ts -{ - config: GraphConfig, - nodeTypes: Record, - edgeTypes: Record -} -``` - -The complete definition of a graph type. This is what `SchemaBuilder.build()` -produced (now `Type.Module()` produces the same structure). - -### GraphStatus & GraphBaseType - -Enum-backed types for graph lifecycle and structural type: - -- `GraphStatus`: `active`, `archived`, `draft` -- `GraphBaseType`: `directed`, `undirected`, `mixed` - -These are provided as `as const` object constants and TypeBox `Type.Union` of -`Type.Literal` schemas, following the project convention (see overview.md D6). -The TypeBox schemas derive their literal values from the same const object. - -## SchemaBuilder (Historical — Replaced by Module Construction) - -Was defined in `src/graphs/schemaBuilder.ts`. Fluent builder API. **This builder -is removed in the Module approach** — see [metagraph-module.md](./metagraph-module.md) -DD1/DD2. The builder's internal structure documents the structural equivalence -with `Type.Module()` (see "SchemaBuilder Equivalence" in that document). - -```ts -const schema = new SchemaBuilder() - .config({ type: "directed", multi: true, allowSelfLoops: false }) - .nodeType("call", CallAttributesSchema) - .nodeType("subcall", SubcallAttributesSchema) - .edgeType("triggered", BaseEdgeAttributes, { - allowedSourceTypes: ["call"], - allowedTargetTypes: ["call", "subcall"], - }) - .edgeType("depends_on", BaseEdgeAttributes) - .build(); -``` - -### Validation - -The builder validated at each step: - -1. **`config()`** — Validated against `GraphConfig` schema. Applied defaults for - missing fields. -2. **`nodeType()`** — Validated the schema was a valid TypeBox schema - (`KindGuard.IsSchema`). Validated the resulting object against `NodeType` - schema. -3. **`edgeType()`** — Same as nodeType, plus validated - allowedSourceTypes/allowedTargetTypes were strings. -4. **`build()`** — Validated the complete schema against `GraphSchema`. 
Threw - on any invalid structure. - -**Error behavior**: The builder threw `Error` with a JSON-stringified list of -validation errors (path + message). Validation failures did not roll back partial -state — a builder that failed on the second `nodeType()` call still had the first -node type in its schema. Callers were advised not to reuse a builder after a -failure. - -**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is -`undefined` in the schema layer, any node type is a valid endpoint. When a -non-empty array is provided, only the listed node types are valid endpoints. -In the database layer, `[]` (the column default) represents "no restriction" — -any node type is valid — matching the behavior of `undefined` in the schema. -There is no "no types allowed" state; if edge types need to be disabled, use a -status or soft-delete pattern on the edge type definition. The repository layer -must enforce this convention consistently. - -The SchemaBuilder enforced structural integrity at definition time. In the -Module approach, `Type.Module()` construction and `Value.Check()` provide the -same guarantee. The database -stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will -be `jsonb` in PG). Database-level constraints (unique composite keys, cascade -deletes) protect referential integrity, but the database does NOT validate JSON -schema conformance. This is a deliberate trade-off: - -- **Pro**: Schema changes don't require migrations. A graph type's schema - evolves by updating the JSON blob. -- **Pro**: SQLite's JSON support is limited (no JSON schema constraints). -- **Con**: Invalid data can be inserted if application-level validation is - bypassed. -- **Mitigation**: All repository-layer mutations validate against the current - graph type's schema before writing. - -## Node and Edge Identity - -Nodes and edges use a **composite identity model**: - -- **Node**: identified by `(graphId, key)` — unique within a graph. 
The `key` is - a consumer-defined string (e.g., `"call-001"`, `"account:alice"`). -- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is - optional for directed graphs but required for multi-edges. - -This means consumers control their own identifiers within a graph. The database -generates UUID `id` values for cross-graph references, but within a graph, the -consumer's `key` is the identity. - -## Attributes Storage - -Node attributes and edge attributes are stored as JSON text in SQLite (will be -`jsonb` in PG). The graph type's schema defines what shape these attributes -should have, but the database doesn't enforce the schema — it stores whatever -JSON is provided. - -This design means: - -- **Schema evolution**: Add optional fields to a node type schema without - data migration. Old nodes are still valid. -- **Schema versioning**: The `version` field on graph types tracks breaking - schema changes. Consumer code can check the version before processing. -- **Validation boundary**: All validation happens in the repository layer - (application code), not in the database. - -## Versioning - -Graph types have a `version` integer (default 1). This tracks **breaking** -schema changes — field removals, type changes that invalidate existing data. -Non-breaking changes (adding optional fields) do not require a version bump. - -The `version` field is stored as a column on the `graph_types` table (see -[sqlite-host.md](./sqlite-host.md)). It is not part of the `GraphSchema` — it -lives at the database level because it affects how the repository layer -processes data for that graph type. - -The repository layer should check `version` before processing to ensure -compatibility. A version mismatch indicates the data format has changed -incompatibly and the consumer should handle it explicitly. 
-
-## Usage Patterns (Historical — SchemaBuilder API)
-
-> **⚠️ These examples use the removed SchemaBuilder API.** They are retained here
-> as structural reference for the data model concepts. For the current Module
-> construction API, see [metagraph-module.md](./metagraph-module.md). For current
-> encrypted data examples, see [encrypted-data.md](./encrypted-data.md).
-
-### Defining a Call Graph Type
-
-> **Note**: `@alkdev/flowgraph` exports a canonical `CallNodeAttrs` schema that
-> defines these same fields (plus `parentRequestId`, `identity`, timestamps).
-> The example below illustrates the structure; production code should import
-> `CallNodeAttrs` from `@alkdev/flowgraph/schema` when defining the graph type
-> for persisted call graphs.
-
-```ts
-import {
-  BaseEdgeAttributes,
-  BaseNodeAttributes,
-  SchemaBuilder,
-} from "@alkdev/storage";
-import { Type } from "@alkdev/typebox";
-
-const CallNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    requestId: Type.String(),
-    operationId: Type.String(),
-    status: Type.Union([
-      Type.Literal("pending"),
-      Type.Literal("running"),
-      Type.Literal("completed"),
-      Type.Literal("failed"),
-      Type.Literal("aborted"),
-    ]),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "directed", multi: false, allowSelfLoops: false })
-  .nodeType("call", CallNodeAttributes)
-  .edgeType("triggered", BaseEdgeAttributes)
-  .edgeType("depends_on", BaseEdgeAttributes)
-  .build();
-```
-
-### Defining an ACL Graph Type
-
-```ts
-const ACLNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    resourceType: Type.String(), // "project", "session", "client"
-    resourceId: Type.String(),
-  }),
-]);
-
-const ACLEdgeAttributes = Type.Intersect([
-  BaseEdgeAttributes,
-  Type.Object({
-    permission: Type.Union([
-      Type.Literal("read"),
-      Type.Literal("write"),
-      Type.Literal("admin"),
-    ]),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "directed", multi: true, allowSelfLoops: false })
-  .nodeType("principal", ACLNodeAttributes) // accounts, groups
-  .nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
-  .edgeType("can_access", ACLEdgeAttributes, {
-    allowedSourceTypes: ["principal"],
-    allowedTargetTypes: ["resource"],
-  })
-  .build();
-```
-
-### Defining Encrypted Secret Storage as a Node Type
-
-> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()`
-> are planned additions. See [encrypted-data.md](./encrypted-data.md) for the
-> design.
-
-```ts
-// PLANNED — not yet available
-import { EncryptedDataSchema } from "@alkdev/storage";
-
-const SecretNodeAttributes = Type.Intersect([
-  BaseNodeAttributes,
-  Type.Object({
-    key: Type.String(), // secret key name
-    encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
-    expiresAt: Type.Optional(Type.String({ format: "date-time" })),
-  }),
-]);
-
-const schema = new SchemaBuilder()
-  .config({ type: "undirected", multi: false, allowSelfLoops: false })
-  .nodeType("secret", SecretNodeAttributes)
-  .build();
-```
-
-See [encrypted-data.md](./encrypted-data.md) for the full encrypted data design.
-
-## Ecosystem Context
-
-The metagraph model is a **storage layer** consumed by other packages in the
-@alkdev ecosystem. It does not depend on the hub — the dependency flows the other
-way. The call protocol, identity model, and graph type definitions that inform
-the usage patterns below originate from sibling packages:
-
-- **Call protocol** — Defined in `@alkdev/operations`. The `CallEventMap`,
-  `PendingRequestMap`, and `CallHandler` define call/subscribe semantics, event
-  types, and the request correlation model. Storage persists what the call
-  protocol records.
-- **Identity & access control** — Defined in `@alkdev/operations`. The
-  `Identity` interface and `AccessControl` schema provide the security context
-  carried through calls. The "ACL graph type" usage pattern maps to these
-  constructs.
-- **Call graph schema** — Defined in `@alkdev/flowgraph`. `CallNodeAttrs`,
-  `CallEdgeAttrs`, `CallStatus`, and `EdgeType` specify the attribute shapes for
-  call graph instances. Storage persists these shapes; flowgraph operates on
-  them in memory.
-- **Task graph schema** — Defined in `@alkdev/taskgraph`. `TaskGraphNodeAttributes`
-  and `DependencyEdge` specify task dependency shapes. Another graph type
-  that storage persists.
-- **Event transport** — Provided by `@alkdev/pubsub`. The `TypedEventTarget`
-  contract and `EventEnvelope` wrapping carry call protocol events between
-  processes. Storage is not involved in event routing but stores the events'
-  outcomes.
-
-The repository layer (not yet implemented) will provide typed CRUD for metagraph
-data — insert, find, update, delete with schema validation. A consumer can then
-wire these CRUD functions into `@alkdev/operations`'s registry pattern
-(analogous to how `drizzle-graphql` auto-generates a GraphQL schema from Drizzle
-tables, but using operations instead of GraphQL resolvers). This avoids circular
-dependencies: storage defines the schema types and tables, operations provides
-the call protocol and registry, and the consumer bridges them.
- -## References - -- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md` -- Identity & access control: `/workspace/@alkdev/operations/docs/architecture/api-surface.md` -- Call graph schema: `/workspace/@alkdev/flowgraph/docs/architecture/schema.md` -- Call graph (dynamic): `/workspace/@alkdev/flowgraph/docs/architecture/call-graph.md` -- Task graph schema: `/workspace/@alkdev/taskgraph_ts/docs/architecture/schemas.md` -- Pubsub architecture: `/workspace/@alkdev/pubsub/docs/architecture/README.md` -- TypeBox: https://github.com/sinclairzx/typebox -- SchemaBuilder source (pre-Module, removed): `src/graphs/schemaBuilder.ts` -- Schema types source (pre-Module, being replaced): `src/graphs/types.ts` diff --git a/docs/architecture/schema-evolution.md b/docs/architecture/schema-evolution.md index fd8ab13..d9e4240 100644 --- a/docs/architecture/schema-evolution.md +++ b/docs/architecture/schema-evolution.md @@ -556,4 +556,4 @@ be needed. That's a post-v1 concern. - Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md` - Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md` - Metagraph Module: [metagraph-module.md](./metagraph-module.md) -- Current schema versioning: [metagraph.md](./metagraph.md) \ No newline at end of file +- Schema versioning in the data model: [metagraph-module.md](./metagraph-module.md) (Versioning section and DD3) \ No newline at end of file diff --git a/docs/architecture/sqlite-host.md b/docs/architecture/sqlite-host.md index 11f007b..784c89e 100644 --- a/docs/architecture/sqlite-host.md +++ b/docs/architecture/sqlite-host.md @@ -136,7 +136,7 @@ node type is a valid endpoint — matching the behavior of `undefined` in the listed node types. There is no "no types allowed" state; if edge types need to be disabled, use a status or soft-delete pattern on the edge type definition. 
 The repository layer must enforce this convention consistently. See
-[metagraph.md](./metagraph.md) for the schema-layer definition.
+[metagraph-module.md](./metagraph-module.md) for edge endpoint semantics.

 ### `graphs`
