---
status: reviewed
last_updated: 2026-05-30
---

# Schema Evolution

How graph type schemas evolve over time: detecting changes, classifying their impact, and migrating stored data. Uses TypeBox's `Value.Diff`/`Value.Patch`/`Value.Cast` to operate on schemas-as-JSON and data-as-JSON, aligned with the ecosystem's event-sourced design.

## Overview

The @alkdev ecosystem is event-driven. Call protocol events are append-only. Graph instances in storage are projections: materialized views produced by replaying events through projector functions. When a graph type's schema changes, stored data may need migration, and the repository layer needs to detect and handle the change.

The key insight: **TypeBox schemas are JSON**. They're JSON Schema objects stored as JSON in `node_types.schema` and `edge_types.schema` columns (text in SQLite, jsonb in PG). Because both the schemas and the data they describe are JSON, TypeBox's own `Value.Diff`, `Value.Patch`, and `Value.Cast` can work with them directly: diffing schemas to detect changes, patching stored schemas to update them, and casting stored data to fit new schema shapes.

Two distinct domains of JSON values are involved:

- **Schemas-as-JSON**: The TypeBox schema objects stored in `node_types.schema` and `edge_types.schema` columns. `Value.Diff`/`Value.Patch` operate on these (detecting schema changes, updating stored schemas).
- **Data-as-JSON**: The node/edge attribute values stored in `nodes.attributes` and `edges.attributes` columns. `Value.Cast`/`Value.Check` operate on these (migrating data to fit new schemas, verifying compatibility).

This is not a migration framework. It's the observation that the existing TypeBox value system, combined with schemas-as-JSON storage, gives us schema evolution primitives for free; we just need to wire them together.
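To make the two domains concrete, the sketch below shows roughly what each kind of JSON value might look like once it is in storage. The property names are illustrative assumptions (a hand-written JSON Schema object of the kind `Type.Object` produces, plus a matching attributes value); the real `Call` node type may differ.

```typescript
// Illustrative only: these shapes are assumptions, not the real Call node type.

// Schemas-as-JSON: what a node_types.schema value might hold.
const storedSchema = {
  type: "object",
  properties: {
    requestId: { type: "string" },
    status: { type: "string" },
  },
  required: ["requestId", "status"],
};

// Data-as-JSON: what a nodes.attributes value might hold.
const storedAttributes = { requestId: "req-001", status: "completed" };

// Simulate a write to and read from a JSON/text column.
const roundTrip = <T>(value: T): T => JSON.parse(JSON.stringify(value)) as T;

// Both values survive the text round-trip unchanged, which is the property
// that structural diffing of stored schemas relies on.
const schemaFromDb = roundTrip(storedSchema);
const attributesFromDb = roundTrip(storedAttributes);
```

Nothing here depends on TypeBox; the point is only that a schema row and a data row are both ordinary JSON values, so the same value-level tooling can be pointed at either.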
### The Edit Type

`Value.Diff` returns an array of `Edit` objects, which are structural delta operations that transform one JSON value into another:

```ts
type Insert = { type: "insert", path: string, value: unknown }
type Update = { type: "update", path: string, value: unknown }
type Delete = { type: "delete", path: string }
type Edit = Insert | Update | Delete
```

Paths use RFC 6901 JSON Pointer format (`/properties/requestId`, `/properties/status/anyOf/0/const`). The `value` field contains the inserted or updated value. `Value.Patch(current, edits)` applies the edits to `current` and returns the result. `Value.Diff` and `Value.Patch` are complementary: `Patch(current, Diff(current, next))` ≈ `next`.

**Critical: `Edit` paths are structural, not semantic.** An `Update` at path `/properties/status` could mean type narrowing (`String` → `Literal`) or type change (`String` → `Number`). Diff doesn't know; it only sees the raw JSON structure. Classification logic (breaking vs non-breaking) must interpret the edits with schema awareness.

## The Event-Sourced Context

The @alkdev ecosystem uses an append-only event model:

- **Call protocol** (`@alkdev/operations`): 5 typed events (`call.requested`, `call.responded`, `call.completed`, `call.aborted`, `call.error`) form an append-only event stream per workflow execution
- **Event log as source of truth** (`@alkdev/flowgraph` ADR-005): The in-memory `CallEventMapValue[]` is the authoritative state; projected views (status, results, call graph) are derived from it
- **Hub persistence**: The hub persists call protocol events to Postgres and replays them to reconstruct reactive state after restart
- **Storage as projection**: The metagraph tables store the projected state (graph instances, nodes, edges), not the raw event stream

This means storage's graph data is analogous to a read model in event-sourced systems. Schema evolution in storage is projection migration, not event migration.
The event stream (managed by the hub) retains full history; storage's tables hold the current materialized view.

### Single-Author, Not CRDT

Unlike Yjs or Automerge, the @alkdev event model is single-author per session with central coordination. There are no concurrent multi-author edit conflicts. This means:

- **Idempotent replay** (same events → same state) is sufficient; CRDT merge semantics are not needed
- **Schema evolution** is forward-only (new code processes old data), not bidirectional (concurrent versions merging)
- **Migration** is apply-on-read or apply-on-write, not conflict resolution

If future use cases require multi-author concurrent graph editing (e.g., collaborative task boards), a CRDT layer would need to sit between the event stream and storage. That's a post-v1 concern.

## Schema-as-JSON: Value.Diff on Schemas

TypeBox schemas are JSON Schema objects, that is, plain JSON values. The current `node_types.schema` and `edge_types.schema` columns store them as JSON text (SQLite) or jsonb (PG). This means `Value.Diff` can diff schemas themselves.

### Detecting Schema Changes

```ts
import { Value } from "@alkdev/typebox";

// Stored schema (from DB). `db`, `nodeTypes`, and `eq` come from the repository's query layer.
// Await the row first, then read `.schema` from it (not from the pending promise).
const storedRow = await db.query.nodeTypes.findFirst({
  where: eq(nodeTypes.name, "Call"),
});
if (!storedRow) throw new Error('Node type "Call" is not registered');
const storedSchema = storedRow.schema;

// Current schema (from Module)
const currentSchema = CallGraph.CallNode;

// Diff the schemas
const edits = Value.Diff(storedSchema, currentSchema);

if (edits.length === 0) {
  // No change: schema matches
} else {
  // Schema has changed: classify the edits
}
```

### Classifying Schema Edits

`Value.Diff` is schema-agnostic: it diffs raw JSON structure without understanding JSON Schema semantics. A classification layer is needed to determine whether an edit is breaking or non-breaking.

| Edit pattern | Schema-level meaning | Breaking? |
|---|---|---|
| Insert new property with `default` | New field with default | No |
| Insert new property without `default` | New required field; old data won't have it | Yes (unless `Type.Optional`) |
| Insert new property with `Optional` wrapper | New optional field; old data valid | No |
| Update property type: `String` → `Literal("x")` | Type narrowing (subtype) | No for stored values equal to `"x"`; any other stored value becomes invalid |
| Update property type: `String` → `Number` | Type change (incompatible) | Yes |
| Update property type: add `Optional` wrapper | Making field optional | No (backward compatible) |
| Delete property | Field removed from schema | Yes (old data with this field is non-conforming if `additionalProperties: false`) |
| Update `$defs` references | Cross-reference changes | Depends (see "Relationship to the Module") |

**⚠️ Needs POC**: The classification above is theoretical. Whether this can be done reliably from `Edit[]` objects (which are raw JSON pointer + value pairs) needs validation. POC scope:

- **Test corpus**: ~20 schema change patterns covering each row in the table above (add optional field, add required field, narrow type, change type, remove field, add enum value, `$ref` change)
- **Success criteria**: Classification accuracy >95% against expected breaking/non-breaking labels for each pattern
- **Fallback**: If classification accuracy is too low, use Strategy C (hybrid with `Value.Check` verification) instead of Strategy A.

### Three Detection Strategies

**Strategy A: Diff schemas, classify edits**

```
Value.Diff(storedSchema, currentSchema) → Edit[]
classify(Edit[]) → { breaking: boolean, edits: ClassifiedEdit[] }
```

Pros: Identifies *what* changed and *how*. Enables targeted migration (only migrate data affected by breaking changes).

Cons: Classification is hard to get right. Schema semantics are rich (type narrowing, union widening, `$ref` changes, `additionalProperties` interactions).

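As a starting point for the POC, a `classify` step might look like the minimal sketch below. It is dependency-free and covers only a few rows of the table; the rule set, the `Impact` and `ClassifiedEdit` shapes, and the extra `next` parameter (used to look up `required`) are all assumptions made for illustration, not a validated design. Anything the rules do not recognize is reported as `unknown` and treated as breaking.

```typescript
// Edit shapes as described in "The Edit Type", redeclared so the sketch is self-contained.
type Insert = { type: "insert"; path: string; value: unknown };
type Update = { type: "update"; path: string; value: unknown };
type Delete = { type: "delete"; path: string };
type Edit = Insert | Update | Delete;

// Hypothetical result shapes for this sketch.
type Impact = "non-breaking" | "breaking" | "unknown";
interface ClassifiedEdit { edit: Edit; impact: Impact; reason: string }

// Minimal view of the new object schema: only the part the heuristics need.
interface ObjectSchemaLike { required?: string[] }

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Assumes property names contain no "/" or "~" (no JSON Pointer unescaping is done).
function classifyEdit(edit: Edit, next: ObjectSchemaLike, all: Edit[]): ClassifiedEdit {
  const property = /^\/properties\/([^/]+)$/.exec(edit.path);
  if (property) {
    const name = property[1];
    if (edit.type === "delete") {
      return { edit, impact: "breaking", reason: `property "${name}" removed` };
    }
    if (edit.type === "insert") {
      if (isRecord(edit.value) && "default" in edit.value) {
        return { edit, impact: "non-breaking", reason: `new property "${name}" has a default` };
      }
      const isRequired = (next.required ?? []).some((r) => r === name);
      return isRequired
        ? { edit, impact: "breaking", reason: `new required property "${name}" has no default` }
        : { edit, impact: "non-breaking", reason: `new property "${name}" is optional` };
    }
  }
  if (edit.type === "update" && /^\/properties\/[^/]+\/type$/.test(edit.path)) {
    return { edit, impact: "breaking", reason: "property type changed" };
  }
  if (edit.type === "insert" && /^\/required\/\d+$/.test(edit.path)) {
    // A new `required` entry is decided by the matching property insert, if there is one.
    const target = `/properties/${String(edit.value)}`;
    const covered = all.some((e) => e.type === "insert" && e.path === target);
    return covered
      ? { edit, impact: "non-breaking", reason: "required entry belongs to a newly inserted property" }
      : { edit, impact: "breaking", reason: "existing property became required" };
  }
  return { edit, impact: "unknown", reason: "no rule matched" };
}

function classify(edits: Edit[], next: ObjectSchemaLike): { breaking: boolean; edits: ClassifiedEdit[] } {
  const classified = edits.map((edit) => classifyEdit(edit, next, edits));
  // Conservative: anything not positively identified as non-breaking counts as breaking.
  const breaking = classified.some((c) => c.impact !== "non-breaking");
  return { breaking, edits: classified };
}
```

Treating `unknown` as breaking keeps the sketch safe by default: a pattern the rules miss (for example a `$defs` change or a narrowing to `const`) triggers the slower verification path rather than being silently accepted.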

**Strategy B: Diff schemas, test data compatibility**

```
const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length > 0) {
  // Schema changed: test if existing data is still valid
  const sampleData = fetchSampleNodeData(graphTypeId, nodeTypeName);
  const compatible = sampleData.every(d => Value.Check(currentSchema, d));
}
```

Pros: Simple. No classification logic needed. Works with any schema change.

Cons: Requires fetching data to test. Binary answer (compatible or not), with no information about *what* changed. Sample may not be representative.

**Strategy C: Hybrid (diff for detection, Check for verification)**

```
const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length === 0) return "unchanged";

// Schema changed. Fast-path: check if version bump covers it.
// storedVersion = graph_types.version from DB
// currentVersion = consumer-defined constant (e.g., CURRENT_CALL_GRAPH_VERSION = 2)
if (storedVersion < currentVersion) return "version-mismatch";

// No version bump but schema changed: non-breaking change expected.
// Verify by checking stored data against new schema.
// findIncompatibleNodes = repository query that returns nodes
// where Value.Check(currentSchema, node.attributes) is false
const incompatible = await findIncompatibleNodes(graphTypeId, nodeTypeName, currentSchema);
if (incompatible.length === 0) return "non-breaking";
return "breaking";
```

`findIncompatibleNodes` is a repository query function: fetch nodes of the given type in the given graph, filter to those where `!Value.Check(currentSchema, node.attributes)`. `currentVersion` is a consumer-defined integer constant that the consumer increments when making a breaking change to their graph type Module.

This combines detection (diff), metadata (version), and verification (check) and is likely the most robust approach.
**Recommended for POC exploration.**

## Data Migration via Value.Cast

Once a schema change is detected and classified, `Value.Cast` can migrate existing data to fit the new schema shape.

### How Value.Cast Works

`Value.Cast(schema, value)` attempts to fit a value into the shape defined by the schema:

- **Matching properties**: Retained from the original value
- **Missing required properties with defaults**: Filled from `schema.default`
- **Missing required properties without defaults**: Created as zeros (`0`, `""`, `false`, `{}`, `[]`)
- **Unknown properties**: Dropped if `additionalProperties: false`, retained otherwise
- **Union types**: Each variant is scored and the best match is selected

This is exactly what's needed for data migration: reshape stored node attributes to fit a new schema while preserving matching fields.

### Example: Adding a Required Field with Default

```ts
import { Type, Value } from "@alkdev/typebox";

// Schema v1 (stored in DB)
const CallNodeV1 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
});

// Schema v2 (new Module entry, with an added priority field)
const CallNodeV2 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
  priority: Type.Number({ default: 0 }), // new field with default
});

// Existing data
const oldNode = {
  requestId: "req-001",
  operationId: "op-call",
  status: "completed",
};

// Cast migrates the data
const migratedNode = Value.Cast(CallNodeV2, oldNode);
// → { requestId: "req-001", operationId: "op-call", status: "completed", priority: 0 }
```

### Example: Type Narrowing (Non-Breaking)

```ts
// Schema v1: type: Type.String()
// Schema v2: type: Type.Literal("triggered")
const oldEdge = { type: "triggered", metadata: {} };
const migrated = Value.Cast(CallGraphV2.TriggeredEdge, oldEdge);
// → { type: "triggered", metadata: {} } (unchanged: "triggered" satisfies Literal)
```

### Example: Type Change (Breaking)

```ts
// Schema v1: priority: Type.Number()
// Schema v2: priority: Type.Union([Type.Literal("low"), Type.Literal("high")])
const oldNode = { requestId: "req-001", priority: 3 };
const migrated = Value.Cast(CallNodeV2, oldNode);
// Cast scores each union variant against the value.
// Neither "low" nor "high" matches 3. Cast picks the best match,
// but the result is likely incorrect (data loss).
```

**⚠️ Cast limitation**: `Value.Cast` does not accept custom migration functions. For breaking changes that require transformation logic (e.g., `priority: 3` → `priority: "high"` based on a threshold), the repository layer needs a custom migration handler. Cast handles the common case (add fields, narrow types, drop removed fields); custom logic handles the rest.

## Schema-as-JSON: Patching Stored Schemas

When a graph type Module changes, the stored schema in `node_types.schema` must be updated. `Value.Patch` can apply the diff to the stored schema:

```ts
// 1. Diff current stored schema against new Module entry
const edits = Value.Diff(storedSchema, CallGraph.CallNode);

// 2. Patch the stored schema
const newStoredSchema = Value.Patch(storedSchema, edits);

// 3. Update the DB row
await db.update(nodeTypes)
  .set({ schema: newStoredSchema, updatedAt: new Date() })
  .where(eq(nodeTypes.name, "Call"));
```

This is simpler than reconstructing the full Module and running `moduleToDbSchema()` again. But it carries the same caveats as `Value.Diff`: the edits are structural, not semantic. If the Module entry's `$defs` structure changed (e.g., a `Type.Ref` target was updated), the diff may not capture the semantic change correctly.

**Alternative**: Re-run `moduleToDbSchema()` on the updated Module and write the full output. This is simpler and more reliable but requires the full Module, not just the schema entry, to be available at migration time.

**Decision for v1**: Re-run `moduleToDbSchema()` on the updated Module. The Module is always available when the consumer defines graph types. Patch-based schema update is an optimization for later.
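To illustrate what "structural, not semantic" means for patching, here is a dependency-free sketch of applying an `Edit[]` by JSON Pointer. It is a simplified stand-in written for this document, not TypeBox's `Value.Patch` implementation; `applyEdits` and `segments` are hypothetical names, and root-level edits are deliberately left out.

```typescript
type Edit =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

type Json = Record<string, unknown> | unknown[];

// Split an RFC 6901 pointer into unescaped segments ("~1" -> "/", "~0" -> "~").
function segments(pointer: string): string[] {
  return pointer
    .split("/")
    .slice(1)
    .map((s) => s.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Apply edits to a deep copy of `current`. The function is purely structural:
// it cannot tell a schema keyword from a data field.
function applyEdits<T extends Json>(current: T, edits: Edit[]): T {
  const result = JSON.parse(JSON.stringify(current)) as T; // JSON-safe deep copy
  for (const edit of edits) {
    const keys = segments(edit.path);
    const last = keys.pop();
    if (last === undefined) {
      throw new Error("root-level edits are not handled in this sketch");
    }
    // Walk down to the container that holds the target location.
    let parent: any = result;
    for (const key of keys) {
      parent = parent[key];
      if (typeof parent !== "object" || parent === null) {
        throw new Error(`path not found: ${edit.path}`);
      }
    }
    if (edit.type === "delete") {
      // Naive array handling: assumes edits are ordered so that an earlier
      // delete does not shift the index of a later one.
      if (Array.isArray(parent)) parent.splice(Number(last), 1);
      else delete parent[last];
    } else {
      parent[last] = edit.value; // insert and update both assign at the pointer
    }
  }
  return result;
}
```

The same function would patch a schema or a data record equally well, which is the reason any breaking/non-breaking judgement has to come from a layer above the edits.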
## Evolution Strategies

### Additive-Only (Recommended for v1)

The simplest strategy: only add optional fields and new node/edge types. Never remove or rename existing fields.

- **New optional fields**: `Type.Optional(Type.String())`; old data is still valid under the new schema
- **New node/edge types**: New rows in `node_types`/`edge_types`; existing rows unchanged
- **New enum values**: Add to `Type.Union` of `Type.Literal`; old data with existing values is still valid

This avoids the need for `Value.Cast` migrations entirely. The `version` field on `graph_types` does not change.

**When additive-only breaks down**: If a field was incorrectly designed (wrong type, wrong name, wrong semantics), additive-only forces you to deprecate the old field and add a new one. The deprecated field stays in the schema forever. This is acceptable for early development but creates technical debt over time.

### Version-Bumped with Cast Migration

When a breaking change is needed, bump the `graph_types.version` integer. The repository layer checks the version before processing and applies `Value.Cast` to migrate data when the version doesn't match.

```ts
const graphType = await findGraphType("call-graph");
const currentModule = CallGraph;

if (graphType.version < CURRENT_CALL_GRAPH_VERSION) {
  // Mark migration in progress (odd version).
  // Assumes the stored version was even (stable) when migration started.
  await updateGraphTypeVersion(graphType.id, graphType.version + 1);

  // Schema has changed: migrate all nodes
  const nodes = await findNodesByGraphType(graphType.id);
  for (const node of nodes) {
    const schema = currentModule[`${node.nodeType}Node`];
    const migrated = Value.Cast(schema, node.attributes);
    // Guard: verify the cast result before writing
    if (!Value.Check(schema, migrated)) {
      throw new Error(
        `Cast produced invalid data for node ${node.id}. ` +
        `Custom migration required for this schema change.`
      );
    }
    await updateNode(node.id, { attributes: migrated });
  }

  // Update the schema and finalize the version at the next even number.
  // (Writing `version + 1` again would leave the version odd, i.e. "in progress".)
  await updateNodeTypesSchemas(graphType.id, moduleToDbSchema(currentModule));
  await updateGraphTypeVersion(graphType.id, graphType.version + 2);
}
```

**Version bump contract** (decision: even/odd scheme):

- **Even version**: Stable schema, no pending migrations
- **Odd version**: Migration in progress; reads return stale data or error (consumer-configurable)
- **After migration**: Version is bumped to the next even number

### Migration Safety

Partial migration is the primary risk: if the process crashes or errors on node N of M, some nodes are migrated and some are not. The even/odd version scheme provides the recovery mechanism:

1. **Before migration**: Set `version` to odd (in-progress state)
2. **During migration**: Each node is cast, verified with `Value.Check`, then written. If any cast produces invalid data, the migration aborts with an error, and the consumer must provide a custom migration function for that schema change.
3. **After migration**: Set `version` to even (stable state), update stored schemas via `moduleToDbSchema()`
4. **Recovery**: If the process crashes during migration, the odd version signals that migration is incomplete. On restart, the repository layer detects the odd version and can either resume the migration (migrate the remaining nodes) or roll back (restore from backup / re-project from events).

**Read behavior during migration**: Consumer-configurable. Options:

- Reject reads with an error ("migration in progress")
- Return stale data (from unmigrated nodes)
- Return mixed data (some migrated, some not; not recommended)

**Cast safety guard**: Every `Value.Cast` result must be verified with `Value.Check(newSchema, migratedData)` before writing. If the check fails, the migration is classified as breaking after all, and the consumer must provide a custom migration function.
Treat this as a hard requirement.

**Performance**: For large datasets (10K+ nodes), the migration loop should batch reads/writes to avoid holding all nodes in memory. The graph type is effectively read-only during migration: writes to that graph type's nodes should be rejected while `version` is odd.

This is the simplest versioning that handles breaking changes. It's application-level (the consumer decides when to bump and migrate), not framework-level (storage doesn't auto-migrate). `graph_types.version` is consistent with `@alkdev/flowgraph` ADR-004: flowgraph itself doesn't version schemas (it's in-memory), but the persisting consumer (storage) provides the versioned envelope.

### Event-Sourced Replay (Forward-Looking)

In a fully event-sourced model, schema evolution is handled by replaying events through updated projector functions. Storage's tables are rebuilt from the event log, so no data migration is needed; you just re-project. This requires the event log to be the authoritative source and storage to be a disposable projection.

The hub's call graph persistence (Postgres) is approaching this model: events are persisted, and state is reconstructed by replay. But the current metagraph tables are not rebuilt from events; they're written directly by the repository layer.

If the hub migrates to a model where call graph nodes/edges are projections of the call protocol event stream, then schema evolution becomes projector evolution: update the projector, replay the events, rebuild the projection. No `Value.Cast` is needed for stored data (because there is no stored data, just re-projected views).

**This is a post-v1 design**: it requires the event log to be the primary persistence, with storage tables as read-optimized projections. The current model writes directly to storage tables; the event stream is separate (managed by the hub's call graph module, not by `@alkdev/storage`).
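The "update the projector, replay the events" idea can be sketched without any storage at all. In the example below only the event type names come from the call protocol; the payload fields, the status strings, and the `projectV1`/`projectV2`/`replay` helpers are assumptions made for illustration.

```typescript
// Hypothetical event payloads: only the `type` names are taken from the call protocol.
type CallEvent =
  | { type: "call.requested"; requestId: string; operationId: string }
  | { type: "call.completed"; requestId: string }
  | { type: "call.error"; requestId: string };

interface CallNodeV1 { requestId: string; operationId: string; status: string }
interface CallNodeV2 extends CallNodeV1 { priority: number }

// A projection maps requestId to node; a projector folds one event into it.
type Projection<N> = Record<string, N>;
type Projector<N> = (nodes: Projection<N>, event: CallEvent) => void;

// Status handling shared by both projector versions (status strings are assumed).
function applyStatus(nodes: Projection<{ status: string }>, event: CallEvent): void {
  const node = nodes[event.requestId];
  if (!node) return; // events for unknown calls are ignored in this sketch
  if (event.type === "call.completed") node.status = "completed";
  if (event.type === "call.error") node.status = "error";
}

const projectV1: Projector<CallNodeV1> = (nodes, event) => {
  if (event.type === "call.requested") {
    nodes[event.requestId] = {
      requestId: event.requestId,
      operationId: event.operationId,
      status: "pending",
    };
  } else {
    applyStatus(nodes, event);
  }
};

// The updated projector emits the new shape directly, so nothing needs to be cast.
const projectV2: Projector<CallNodeV2> = (nodes, event) => {
  if (event.type === "call.requested") {
    nodes[event.requestId] = {
      requestId: event.requestId,
      operationId: event.operationId,
      status: "pending",
      priority: 0,
    };
  } else {
    applyStatus(nodes, event);
  }
};

// Rebuild a projection from scratch: same events in, same state out.
function replay<N>(events: CallEvent[], project: Projector<N>): Projection<N> {
  const nodes: Projection<N> = {};
  for (const event of events) project(nodes, event);
  return nodes;
}

const log: CallEvent[] = [
  { type: "call.requested", requestId: "req-001", operationId: "op-call" },
  { type: "call.completed", requestId: "req-001" },
];
const before = replay(log, projectV1); // old shape
const after = replay(log, projectV2);  // new shape, rebuilt from the same log
```

The event log is untouched between the two replays; only the projector changed. That is the property that makes the projection disposable in a fully event-sourced model.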
## Relationship to the Module

The `Metagraph` Module ([metagraph-module.md](./metagraph-module.md)) is the source of truth for graph type schemas. When the Module changes:

1. **`moduleToDbSchema()`** produces the updated DB row values
2. **`Value.Diff(storedSchema, moduleEntry)`** detects what changed
3. **`Value.Cast(moduleEntry, storedData)`** migrates affected node/edge data
4. **`Value.Patch(storedSchema, edits)`** updates the stored schema (or re-run `moduleToDbSchema()`)

The Module's `$defs` structure adds complexity to diffing: a `Type.Ref` resolution changes the effective schema in ways that a raw JSON diff might not capture. Using `moduleToDbSchema()` (which resolves refs before writing to the DB) avoids this problem, because the stored schema is already dereferenced.

## Design Decisions

All design decisions are documented as ADRs in [decisions/](decisions/).

| ADR | Decision | Summary |
|-----|----------|---------|
| [028](decisions/028-additive-only-with-cast-migration.md) | Additive-only for v1, Cast migration when needed | Additive changes avoid migration; `Value.Cast` for common cases |
| [029](decisions/029-version-as-breaking-change-signal.md) | Version as a coarse-grained breaking-change signal | Only breaking changes bump the version; even/odd for migration state |
| [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | No manual changelog; diff stored vs current schemas |
| [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Re-run full Module projection, not Value.Patch |
| [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | No concurrent multi-author graph types |

## Open Questions

Open questions are tracked in [open-questions.md](open-questions.md). Key questions affecting schema evolution:

- **OQ-10**: Can `Edit[]` from `Value.Diff` be reliably classified as breaking vs non-breaking?
- **OQ-11**: Should the repository layer auto-migrate data on schema change, or require explicit consumer action?
- **OQ-12**: How does schema evolution interact with the hub's event-sourced call graph persistence?
- **OQ-13**: Should schema evolution events be part of the event stream?

## References

- TypeBox `Value` namespace: `/workspace/@alkdev/typebox/src/value/`
- TypeBox `Value.Diff`/`Value.Patch`: `/workspace/@alkdev/typebox/src/value/delta/delta.ts`
- TypeBox `Value.Cast`: `/workspace/@alkdev/typebox/src/value/cast/cast.ts`
- TypeBox `Value.Check`: `/workspace/@alkdev/typebox/src/value/check/check.ts`
- Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md`
- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
- Metagraph Module: [metagraph-module.md](./metagraph-module.md)
- Schema versioning in the data model: ADR-029, [metagraph-module.md](./metagraph-module.md)