storage/docs/architecture/schema-evolution.md

status: draft
last_updated: 2026-05-28

Schema Evolution

How graph type schemas evolve over time: detecting changes, classifying their impact, and migrating stored data. Uses TypeBox's Value.Diff/Value.Patch/Value.Cast to operate on schemas-as-JSON and data-as-JSON, aligned with the ecosystem's event-sourced design.

Overview

The @alkdev ecosystem is event-driven. Call protocol events are append-only. Graph instances in storage are projections — materialized views produced by replaying events through projector functions. When a graph type's schema changes, stored data may need migration, and the repository layer needs to detect and handle the change.

The key insight: TypeBox schemas are JSON. They're JSON Schema objects stored as JSON in node_types.schema and edge_types.schema columns (text in SQLite, jsonb in PG). Because they're JSON, TypeBox's own Value.Diff, Value.Patch, and Value.Cast can operate on them directly — diffing schemas to detect changes, patching stored schemas to update them, and casting stored data to fit new schema shapes.

Two distinct domains of JSON values are involved:

  • Schemas-as-JSON: The TypeBox schema objects stored in node_types.schema and edge_types.schema columns. Value.Diff/Value.Patch operate on these (detecting schema changes, updating stored schemas).
  • Data-as-JSON: The node/edge attribute values stored in nodes.attributes and edges.attributes columns. Value.Cast/Value.Check operate on these (migrating data to fit new schemas, verifying compatibility).

This is not a migration framework. It's the observation that the existing TypeBox value system, combined with schemas-as-JSON storage, gives us schema evolution primitives for free — we just need to wire them together.

The Edit Type

Value.Diff returns an array of Edit objects — structural delta operations that transform one JSON value into another:

type Insert  = { type: "insert",  path: string, value: unknown }
type Update  = { type: "update",  path: string, value: unknown }
type Delete  = { type: "delete",   path: string }
type Edit    = Insert | Update | Delete

Paths use RFC 6901 JSON Pointer format (/properties/requestId, /properties/status/anyOf/0/const). The value field contains the inserted or updated value.

Value.Patch(current, edits) applies the edits to current and returns the result. Value.Diff and Value.Patch are inverses: Patch(current, Diff(current, next)) ≡ next.

Critical: Edit paths are structural, not semantic. An Update at path /properties/status could mean type narrowing (String → Literal) or type change (String → Number). Diff doesn't know; it only sees the raw JSON structure. Classification logic (breaking vs non-breaking) must interpret the edits with schema awareness.
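Because edits carry only a path and a value, any classification has to start by decoding the path. The following is a minimal sketch, not part of TypeBox: the Edit type is a local copy of the shape shown above, and pointerSegments and touchedProperty are hypothetical helpers introduced for illustration.

```typescript
// Local copy of the Edit shape described above; not imported from the library.
type Edit =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

// Decode one RFC 6901 reference token: "~1" becomes "/", then "~0" becomes "~".
function decodeToken(token: string): string {
  return token.replace(/~1/g, "/").replace(/~0/g, "~");
}

// Split a JSON Pointer into decoded segments:
// "/properties/status" yields ["properties", "status"].
function pointerSegments(path: string): string[] {
  if (path === "") return [];
  return path.split("/").slice(1).map(decodeToken);
}

// Hypothetical helper: the top-level schema property an edit touches, if any.
function touchedProperty(edit: Edit): string | undefined {
  const segments = pointerSegments(edit.path);
  return segments[0] === "properties" ? segments[1] : undefined;
}

const sampleEdits: Edit[] = [
  { type: "update", path: "/properties/status/anyOf/0/const", value: "done" },
  { type: "insert", path: "/properties/priority", value: { type: "number" } },
  { type: "delete", path: "/required/2" },
];
const touched = sampleEdits.map(touchedProperty);
```

Note that the third edit touches the required array rather than a property, which is one reason a purely path-based reading of edits is incomplete.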

The Event-Sourced Context

The @alkdev ecosystem uses an append-only event model:

  • Call protocol (@alkdev/operations): 5 typed events (call.requested, call.responded, call.completed, call.aborted, call.error) form an append-only event stream per workflow execution
  • Event log as source of truth (@alkdev/flowgraph ADR-005): The in-memory CallEventMapValue[] is the authoritative state; projected views (status, results, call graph) are derived from it
  • Hub persistence: The hub persists call protocol events to Postgres and replays them to reconstruct reactive state after restart
  • Storage as projection: The metagraph tables store the projected state — graph instances, nodes, edges — not the raw event stream

This means storage's graph data is analogous to a read model in event-sourced systems. Schema evolution in storage is projection migration, not event migration. The event stream (managed by the hub) retains full history; storage's tables hold the current materialized view.

Single-Author, Not CRDT

Unlike Yjs or Automerge, the @alkdev event model is single-author per session with central coordination. There are no concurrent multi-author edit conflicts. This means:

  • Idempotent replay (same events → same state) is sufficient; CRDT merge semantics are not needed
  • Schema evolution is forward-only (new code processes old data), not bidirectional (concurrent versions merging)
  • Migration is apply-on-read or apply-on-write, not conflict resolution

If future use cases require multi-author concurrent graph editing (e.g., collaborative task boards), a CRDT layer would need to sit between the event stream and storage. That's a post-v1 concern.

Schema-as-JSON: Value.Diff on Schemas

TypeBox schemas are JSON Schema objects — plain JSON values. The current node_types.schema and edge_types.schema columns store them as JSON text (SQLite) or jsonb (PG). This means Value.Diff can diff schemas themselves.

Detecting Schema Changes

import { Value } from "@alkdev/typebox";

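// Stored schema (from DB). Await the query first, then read the column;
// reading .schema directly off the pending promise would yield undefined.
const storedRow = await db.query.nodeTypes.findFirst({
  where: eq(nodeTypes.name, "Call"),
});
const storedSchema = storedRow?.schema;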

// Current schema (from Module)
const currentSchema = CallGraph.CallNode;

// Diff the schemas
const edits = Value.Diff(storedSchema, currentSchema);

if (edits.length === 0) {
  // No change — schema matches
} else {
  // Schema has changed — classify the edits
}

Classifying Schema Edits

Value.Diff is schema-agnostic — it diffs raw JSON structure without understanding JSON Schema semantics. A classification layer is needed to determine whether an edit is breaking or non-breaking.

| Edit pattern | Schema-level meaning | Breaking? |
| --- | --- | --- |
| Insert new property with default | New field with default | No |
| Insert new property without default | New required field; old data won't have it | Yes (unless Type.Optional) |
| Insert new property with Optional wrapper | New optional field; old data valid | No |
| Update property type: String → Literal("x") | Type narrowing (subtype) | No (existing data with "x" is valid) |
| Update property type: String → Number | Type change (incompatible) | Yes |
| Update property type: add Optional wrapper | Making field optional | No (backward compatible) |
| Delete property | Field removed from schema | Yes (old data with this field is non-conforming if additionalProperties: false) |
| Update $defs references | Cross-reference changes | Depends (see note) |

⚠️ Needs POC: The classification above is theoretical. Whether this can be done reliably from Edit[] objects (which are raw JSON pointer + value pairs) needs validation. POC scope:

  • Test corpus: ~20 schema change patterns covering each row in the table above (add optional field, add required field, narrow type, change type, remove field, add enum value, $ref change)
  • Success criteria: Classification accuracy >95% against expected breaking/non-breaking labels for each pattern
  • Fallback: If classification accuracy is too low, use Strategy C (hybrid with Value.Check verification) instead of Strategy A.
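As a starting point for the POC, a classifier could handle only the rows of the table it can decide from the path alone and report everything else as undecided. The sketch below is an assumption-laden illustration: Verdict, ObjectSchema, classifyEdit and classify are hypothetical names, the Edit type is a local copy, and the rules cover just whole-property inserts and deletes plus changes to a property's type keyword.

```typescript
type Edit =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

type Verdict = "non-breaking" | "breaking" | "unknown";

// Only the parts of an object schema this sketch looks at.
interface ObjectSchema {
  required?: string[];
}

// Classify one edit against the NEW schema. Anything not covered is "unknown".
function classifyEdit(edit: Edit, next: ObjectSchema): Verdict {
  const [root, name, ...rest] = edit.path.split("/").slice(1);
  if (root !== "properties" || name === undefined) return "unknown";

  if (rest.length === 0) {
    // The edit adds or removes a whole property.
    if (edit.type === "delete") return "breaking";
    if (edit.type === "insert") {
      const value = edit.value;
      const required = (next.required ?? []).includes(name);
      const hasDefault = typeof value === "object" && value !== null && "default" in value;
      return !required || hasDefault ? "non-breaking" : "breaking";
    }
    return "unknown";
  }
  // A changed "type" keyword (e.g. "string" to "number") is treated as incompatible.
  if (rest.length === 1 && rest[0] === "type" && edit.type === "update") return "breaking";
  return "unknown";
}

function classify(edits: Edit[], next: ObjectSchema) {
  const classified = edits.map((edit) => ({ edit, verdict: classifyEdit(edit, next) }));
  return {
    breaking: classified.some((c) => c.verdict === "breaking"),
    // "unknown" edits are candidates for the Value.Check fallback.
    needsVerification: classified.some((c) => c.verdict === "unknown"),
    edits: classified,
  };
}
```

Narrowing, such as an inserted const keyword, deliberately lands in "unknown": whether it breaks depends on the stored data, not on the edit.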

Three Detection Strategies

Strategy A: Diff schemas, classify edits

Value.Diff(storedSchema, currentSchema) → Edit[]
classify(Edit[]) → { breaking: boolean, edits: ClassifiedEdit[] }

Pros: Identifies what changed and how. Enables targeted migration (only migrate data affected by breaking changes).

Cons: Classification is hard to get right. Schema semantics are rich (type narrowing, union widening, $ref changes, additionalProperties interactions).

Strategy B: Diff schemas, test data compatibility

const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length > 0) {
  // Schema changed — test if existing data is still valid
  const sampleData = fetchSampleNodeData(graphTypeId, nodeTypeName);
  const compatible = sampleData.every(d => Value.Check(currentSchema, d));
}

Pros: Simple. No classification logic needed. Works with any schema change.

Cons: Requires fetching data to test. Binary answer (compatible or not) — no information about what changed. Sample may not be representative.

Strategy C: Hybrid — diff for detection, Check for verification

const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length === 0) return "unchanged";

// Schema changed. Fast-path: check if version bump covers it.
// storedVersion = graph_types.version from DB
// currentVersion = consumer-defined constant (e.g., CURRENT_CALL_GRAPH_VERSION = 2)
if (storedVersion < currentVersion) return "version-mismatch";

// No version bump but schema changed — non-breaking change expected.
// Verify by checking stored data against new schema.
// findIncompatibleNodes = repository query that returns nodes
// where Value.Check(currentSchema, node.attributes) is false
const incompatible = await findIncompatibleNodes(graphTypeId, nodeTypeName, currentSchema);
if (incompatible.length === 0) return "non-breaking";
return "breaking";

findIncompatibleNodes is a repository query function: fetch nodes of the given type in the given graph, filter to those where !Value.Check(currentSchema, node.attributes). currentVersion is a consumer-defined integer constant that the consumer increments when making a breaking change to their graph type Module.

This combines detection (diff), metadata (version), and verification (check) and is likely the most robust approach. Recommended for POC exploration.
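The Strategy C steps can be wrapped in a single function so that the decision order is testable without a database. This is a sketch under stated assumptions: detectSchemaChange and DetectionDeps are hypothetical names, and the diff and findIncompatibleNodes dependencies are injected rather than imported, standing in for Value.Diff and the repository query described above.

```typescript
type Edit = { type: "insert" | "update" | "delete"; path: string; value?: unknown };
type Detection = "unchanged" | "version-mismatch" | "non-breaking" | "breaking";

// Injected dependencies, so the decision logic has no storage or library imports.
interface DetectionDeps {
  diff: (stored: unknown, current: unknown) => Edit[]; // e.g. Value.Diff
  findIncompatibleNodes: (currentSchema: unknown) => Promise<unknown[]>;
}

async function detectSchemaChange(
  storedSchema: unknown,
  currentSchema: unknown,
  storedVersion: number,
  currentVersion: number,
  deps: DetectionDeps,
): Promise<Detection> {
  // 1. Detection: an empty diff means the stored schema is current.
  const edits = deps.diff(storedSchema, currentSchema);
  if (edits.length === 0) return "unchanged";

  // 2. Metadata: a version bump signals a declared breaking change.
  if (storedVersion < currentVersion) return "version-mismatch";

  // 3. Verification: no bump, so stored data is expected to still conform.
  const incompatible = await deps.findIncompatibleNodes(currentSchema);
  return incompatible.length === 0 ? "non-breaking" : "breaking";
}
```

Injecting the dependencies keeps the data query out of the unchanged and version-mismatch paths, so the expensive check only runs when the schema changed without a version bump.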

Data Migration via Value.Cast

Once a schema change is detected and classified, Value.Cast can migrate existing data to fit the new schema shape.

How Value.Cast Works

Value.Cast(schema, value) attempts to fit a value into the shape defined by the schema:

  • Matching properties: Retained from the original value
  • Missing required properties with defaults: Filled from schema.default
  • Missing required properties without defaults: Created as zeros (0, "", false, {}, [])
  • Unknown properties: Dropped if additionalProperties: false, retained otherwise
  • Union types: Each variant is scored and the best match is selected

This is exactly what's needed for data migration — reshape stored node attributes to fit a new schema while preserving matching fields.

Example: Adding a Required Field with Default

// Schema v1 (stored in DB)
const CallNodeV1 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
});

// Schema v2 (new Module entry — added priority field)
const CallNodeV2 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
  priority: Type.Number({ default: 0 }),  // new field with default
});

// Existing data
const oldNode = {
  requestId: "req-001",
  operationId: "op-call",
  status: "completed",
};

// Cast migrates the data
const migratedNode = Value.Cast(CallNodeV2, oldNode);
// → { requestId: "req-001", operationId: "op-call", status: "completed", priority: 0 }

Example: Type Narrowing (Non-Breaking)

// Schema v1: type: Type.String()
// Schema v2: type: Type.Literal("triggered")

const oldEdge = { type: "triggered", metadata: {} };
const migrated = Value.Cast(CallGraphV2.TriggeredEdge, oldEdge);
// → { type: "triggered", metadata: {} }  (unchanged — "triggered" satisfies Literal)

Example: Type Change (Breaking)

// Schema v1: priority: Type.Number()
// Schema v2: priority: Type.Union([Type.Literal("low"), Type.Literal("high")])

const oldNode = { requestId: "req-001", priority: 3 };
const migrated = Value.Cast(CallNodeV2, oldNode);
// Cast scores each union variant against the value.
// Neither "low" nor "high" matches 3 — Cast picks the best match,
// but the result is likely incorrect (data loss).

⚠️ Cast limitation: Value.Cast does not provide custom migration functions. For breaking changes that require transformation logic (e.g., priority: 3 → priority: "high" based on a threshold), the repository layer needs a custom migration handler. Cast handles the common case (add fields, narrow types, drop removed fields); custom logic handles the rest.
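One possible shape for such a handler is a per-node-type registry that takes precedence over the generic cast. Everything named here is hypothetical: customMigrations, migrateAttributes and the threshold value are illustrations only, and the cast parameter stands in for a call to Value.Cast with the new schema.

```typescript
type Attributes = Record<string, unknown>;
type MigrationHandler = (attributes: Attributes) => Attributes;

// Assumed threshold, for illustration only: 3 and above maps to "high".
const HIGH_PRIORITY_THRESHOLD = 3;

// Hypothetical registry of consumer-supplied handlers, keyed by node type name.
const customMigrations: Record<string, MigrationHandler> = {
  Call: (attributes) => ({
    ...attributes,
    priority:
      typeof attributes.priority === "number" &&
      attributes.priority >= HIGH_PRIORITY_THRESHOLD
        ? "high"
        : "low",
  }),
};

// Use the custom handler when one is registered, otherwise fall back to the cast.
function migrateAttributes(
  nodeType: string,
  attributes: Attributes,
  cast: MigrationHandler,
): Attributes {
  const handler = customMigrations[nodeType] ?? cast;
  return handler(attributes);
}
```

The result of either path would still need to pass Value.Check against the new schema before being written.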

Schema-as-JSON: Patching Stored Schemas

When a graph type Module changes, the stored schema in node_types.schema must be updated. Value.Patch can apply the diff to the stored schema:

// 1. Diff current stored schema against new Module entry
const edits = Value.Diff(storedSchema, CallGraph.CallNode);

// 2. Patch the stored schema
const newStoredSchema = Value.Patch(storedSchema, edits);

// 3. Update the DB row
await db.update(nodeTypes)
  .set({ schema: newStoredSchema, updatedAt: new Date() })
  .where(eq(nodeTypes.name, "Call"));

This is simpler than reconstructing the full Module and running moduleToDbSchema() again. But it carries the same caveats as Value.Diff: the edits are structural, not semantic. If the Module entry's $defs structure changed (e.g., a Type.Ref target was updated), the diff may not capture the semantic change correctly.

Alternative: Re-run moduleToDbSchema() on the updated Module and write the full output. This is simpler and more reliable but requires the full Module — not just the schema entry — to be available at migration time.

Decision for v1: Re-run moduleToDbSchema() on the updated Module. The Module is always available when the consumer defines graph types. Patch-based schema update is an optimization for later.

Evolution Strategies

Additive-Only

The simplest strategy: only add optional fields and new node/edge types. Never remove or rename existing fields.

  • New optional fields: Type.Optional(Type.String()) — old data is still valid under the new schema
  • New node/edge types: New rows in node_types/edge_types — existing rows unchanged
  • New enum values: Add to Type.Union of Type.Literal — old data with existing values is still valid

This avoids the need for Value.Cast migrations entirely. The version field on graph_types stays at its initial value.

When additive-only breaks down: If a field was incorrectly designed (wrong type, wrong name, wrong semantics), additive-only forces you to deprecate the old field and add a new one. The deprecated field stays in the schema forever. This is acceptable for early development but creates technical debt over time.

Version-Bumped with Cast Migration

When a breaking change is needed, bump the graph_types.version integer. The repository layer checks the version before processing and applies Value.Cast to migrate data when the version doesn't match.

const graphType = await findGraphType("call-graph");
const currentModule = CallGraph;

if (graphType.version < CURRENT_CALL_GRAPH_VERSION) {
  // Mark migration in progress (odd version)
  const inProgressVersion = graphType.version + 1;
  await updateGraphTypeVersion(graphType.id, inProgressVersion);

  // Schema has changed: migrate all nodes
  const nodes = await findNodesByGraphType(graphType.id);
  for (const node of nodes) {
    const nodeSchema = currentModule[`${node.nodeType}Node`];
    const migrated = Value.Cast(nodeSchema, node.attributes);
    // Guard: verify the cast result before writing
    if (!Value.Check(nodeSchema, migrated)) {
      throw new Error(
        `Cast produced invalid data for node ${node.id}. ` +
        `Custom migration required for this schema change.`
      );
    }
    await updateNode(node.id, { attributes: migrated });
  }
  // Update the schema and finalize at the next even (stable) version.
  // graphType.version still holds the pre-migration value, so the final
  // version is derived from inProgressVersion rather than version + 1.
  await updateNodeTypesSchemas(graphType.id, moduleToDbSchema(currentModule));
  await updateGraphTypeVersion(graphType.id, inProgressVersion + 1);
}

Version bump contract (decision: even/odd scheme):

  • Even version: Stable schema, no pending migrations
  • Odd version: Migration in progress — reads return stale data or error (consumer-configurable)
  • After migration: Version is bumped to the next even number
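The even/odd contract is small enough to express as a few pure functions. This is a minimal sketch; isStable, beginMigration and finishMigration are hypothetical helper names, not existing repository functions.

```typescript
// Even versions are stable; odd versions mean a migration is in progress.
const isStable = (version: number): boolean => version % 2 === 0;

// Stable (even) version -> in-progress (odd) version.
function beginMigration(version: number): number {
  if (!isStable(version)) {
    throw new Error(`Version ${version} is odd: a migration is already in progress.`);
  }
  return version + 1;
}

// In-progress (odd) version -> next stable (even) version.
function finishMigration(version: number): number {
  if (isStable(version)) {
    throw new Error(`Version ${version} is even: no migration is in progress.`);
  }
  return version + 1;
}

const inProgress = beginMigration(2); // 3
const stable = finishMigration(inProgress); // 4
```

Routing every version write through helpers like these makes it harder to finalize a migration at an odd number by accident.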

Migration Safety

Partial migration is the primary risk — if the process crashes or errors on node N of M, some nodes are migrated and some are not. The even/odd version scheme provides the recovery mechanism:

  1. Before migration: Set version to odd (in-progress state)
  2. During migration: Each node is cast, verified with Value.Check, then written. If any cast produces invalid data, the migration aborts with an error — the consumer must provide a custom migration function for that schema change.
  3. After migration: Set version to even (stable state), update stored schemas via moduleToDbSchema()
  4. Recovery: If the process crashes during migration, the odd version signals that migration is incomplete. On restart, the repository layer detects the odd version and can either resume the migration (migrate the remaining nodes) or roll back (restore from backup / re-project from events).

Read behavior during migration: Consumer-configurable. Options:

  • Reject reads with an error ("migration in progress")
  • Return stale data (from unmigrated nodes)
  • Return mixed data (some migrated, some not — not recommended)

Cast safety guard: Every Value.Cast result must be verified with Value.Check(newSchema, migratedData) before writing. If the check fails, the migration is classified as breaking after all, and the consumer must provide a custom migration function. Document this as a hard requirement.

Performance: For large datasets (10K+ nodes), the migration loop should batch reads/writes to avoid holding all nodes in memory. The graph type is effectively read-only during migration — writes to that graph type's nodes should be rejected while version is odd.
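A batched loop with keyset paging is one way to meet both the memory and the resume requirements. The sketch below assumes a hypothetical MigrationStore interface with id-ordered paging; cast and check are injected and stand in for Value.Cast and Value.Check bound to the new schema.

```typescript
type Attributes = Record<string, unknown>;
interface NodeRow { id: string; attributes: Attributes }

// Hypothetical storage interface; not an existing repository API.
interface MigrationStore {
  // Nodes ordered by id, strictly after the cursor (null means from the start).
  fetchBatch(afterId: string | null, limit: number): Promise<NodeRow[]>;
  updateNode(id: string, attributes: Attributes): Promise<void>;
}

// Migrates nodes batch by batch and returns how many rows were rewritten.
async function migrateInBatches(
  store: MigrationStore,
  cast: (attributes: Attributes) => Attributes,
  check: (attributes: Attributes) => boolean,
  batchSize = 500,
): Promise<number> {
  let cursor: string | null = null;
  let migrated = 0;
  for (;;) {
    const batch: NodeRow[] = await store.fetchBatch(cursor, batchSize);
    if (batch.length === 0) return migrated;
    for (const node of batch) {
      cursor = node.id;
      // Already conforming, e.g. written by an earlier run that crashed.
      if (check(node.attributes)) continue;
      const next = cast(node.attributes);
      // Cast safety guard: never write data that fails the new schema.
      if (!check(next)) {
        throw new Error(`Cast produced invalid data for node ${node.id}.`);
      }
      await store.updateNode(node.id, next);
      migrated += 1;
    }
  }
}
```

Because rows that already pass the check are skipped, re-running the function after a crash resumes the migration rather than repeating it.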

This is the simplest versioning that handles breaking changes. It's application-level (the consumer decides when to bump and migrate), not framework-level (storage doesn't auto-migrate). graph_types.version is consistent with @alkdev/flowgraph ADR-004: flowgraph itself doesn't version schemas (it's in-memory), but the persisting consumer (storage) provides the versioned envelope.

Event-Sourced Replay (Forward-Looking)

In a fully event-sourced model, schema evolution is handled by replaying events through updated projector functions. Storage's tables are rebuilt from the event log, so no data migration is needed — you just re-project.

This requires the event log to be the authoritative source and storage to be a disposable projection. The hub's call graph persistence (Postgres) is approaching this model: events are persisted, and state is reconstructed by replay. But the current metagraph tables are not rebuilt from events — they're written directly by the repository layer.

If the hub migrates to a model where call graph nodes/edges are projections of the call protocol event stream, then schema evolution becomes projector evolution — update the projector, replay the events, rebuild the projection. No Value.Cast needed for stored data (because there is no stored data — just re-projected views).

This is a post-v1 design — it requires the event log to be the primary persistence, with storage tables as read-optimized projections. The current model writes directly to storage tables; the event stream is separate (managed by the hub's call graph module, not by @alkdev/storage).
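To make "projector evolution" concrete, the sketch below replays the same event list through two projector versions. Only the five event names come from the call protocol; the event payload fields, the node shapes, and the replay, projectorV1 and projectorV2 names are assumptions made for illustration.

```typescript
// Assumed payloads: only the event type names are taken from the call protocol.
type RequestedEvent = { type: "call.requested"; requestId: string; operationId: string };
type LifecycleEvent = {
  type: "call.responded" | "call.completed" | "call.aborted" | "call.error";
  requestId: string;
};
type CallEvent = RequestedEvent | LifecycleEvent;

interface Projector<N> {
  create: (event: RequestedEvent) => N;
  advance: (node: N, event: LifecycleEvent) => N;
}

// Rebuild the projection from scratch; same events and projector give the same state.
function replay<N>(events: CallEvent[], projector: Projector<N>): Map<string, N> {
  const nodes = new Map<string, N>();
  for (const event of events) {
    if (event.type === "call.requested") {
      nodes.set(event.requestId, projector.create(event));
    } else {
      const node = nodes.get(event.requestId);
      if (node !== undefined) nodes.set(event.requestId, projector.advance(node, event));
    }
  }
  return nodes;
}

type NodeV1 = { requestId: string; operationId: string; status: string };
type NodeV2 = NodeV1 & { priority: number };

// "call.completed" becomes the status "completed", and so on.
const statusOf = (event: CallEvent): string => event.type.slice("call.".length);

const projectorV1: Projector<NodeV1> = {
  create: (e) => ({ requestId: e.requestId, operationId: e.operationId, status: statusOf(e) }),
  advance: (node, e) => ({ ...node, status: statusOf(e) }),
};

// Schema change expressed as a projector change: v2 nodes gain a priority field.
const projectorV2: Projector<NodeV2> = {
  create: (e) => ({ ...projectorV1.create(e), priority: 0 }),
  advance: (node, e) => ({ ...node, status: statusOf(e) }),
};

const events: CallEvent[] = [
  { type: "call.requested", requestId: "req-001", operationId: "op-call" },
  { type: "call.completed", requestId: "req-001" },
];
const v1View = replay(events, projectorV1);
const v2View = replay(events, projectorV2);
```

Here the old projection is discarded instead of migrated, which only works for graph types that have an event stream to replay from.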

Relationship to the Module

The Metagraph Module (metagraph-module.md) is the source of truth for graph type schemas. When the Module changes:

  1. moduleToDbSchema() produces the updated DB row values
  2. Value.Diff(storedSchema, moduleEntry) detects what changed
  3. Value.Cast(moduleEntry, storedData) migrates affected node/edge data
  4. Value.Patch(storedSchema, edits) updates the stored schema (or re-run moduleToDbSchema())

The Module's $defs structure adds complexity to diffing — a Type.Ref resolution changes the effective schema in ways that a raw JSON diff might not capture. Using moduleToDbSchema() (which resolves refs before writing to the DB) avoids this problem — the stored schema is already dereferenced.
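The effect of dereferencing can be illustrated without the Module itself. The sketch below is not the moduleToDbSchema() implementation; dereference is a hypothetical function, and it assumes a $ref value is a plain key into a definitions map. It shows why a change to a referenced definition becomes visible to a structural diff once refs are inlined.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Inline every { $ref: name } using the definitions map.
// Assumption: $ref holds a plain key into `defs`. Cyclic refs are left in place.
function dereference(schema: Json, defs: Record<string, Json>, resolving: string[] = []): Json {
  if (Array.isArray(schema)) return schema.map((item) => dereference(item, defs, resolving));
  if (schema === null || typeof schema !== "object") return schema;

  const ref = schema["$ref"];
  if (typeof ref === "string") {
    const target = defs[ref];
    if (target === undefined || resolving.includes(ref)) return schema;
    return dereference(target, defs, [...resolving, ref]);
  }

  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(schema)) {
    out[key] = dereference(value, defs, resolving);
  }
  return out;
}

const nodeSchema: Json = { type: "object", properties: { status: { $ref: "Status" } } };

// The node schema itself is identical in both versions; only the ref target changed.
const before = dereference(nodeSchema, { Status: { type: "string" } });
const after = dereference(nodeSchema, { Status: { type: "number" } });
```

A diff of the raw node schemas would report no change here, while a diff of before and after would show the updated type under /properties/status.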

Design Decisions

SE1: Additive-only for v1, Cast migration when needed

For v1, schema changes should be additive (new optional fields, new types, new enum values). This avoids data migration entirely. When additive-only is insufficient, Value.Cast handles the common migration cases. Custom migration functions are the consumer's responsibility.

SE2: Version as a coarse-grained breaking-change signal

The version integer on graph_types tracks breaking schema changes. Non-breaking changes (additive) do not require a version bump. This is a coarse signal — the repository layer checks version before processing and knows to run migration logic when it doesn't match.

SE3: Schema change detection via Value.Diff, not manual tracking

Rather than maintaining a separate "schema version log" or changelog, the repository layer uses Value.Diff(storedSchema, moduleEntry) to detect when a stored schema has diverged from the current Module entry. This is schema-agnostic and works for any change.

SE4: moduleToDbSchema() for schema updates, not Value.Patch

When updating stored schemas, re-run moduleToDbSchema() on the full Module rather than using Value.Patch to apply edits. This is more reliable because it doesn't depend on Diff correctly capturing Type.Ref/$defs changes. Patch-based schema update is an optimization for later.

SE5: Single-author model, not CRDT

Schema evolution assumes single-author per graph type. There is no concurrent multi-author editing of graph types. If this changes (multiple consumers defining the same graph type with different schemas), a merge/CRDT layer would be needed. That's a post-v1 concern.

Open Questions

  1. Can Edit[] from Value.Diff be reliably classified as breaking vs non-breaking? The classification table above is theoretical. A POC should validate whether the Edit[] output contains enough information to distinguish, for example, String → Literal("x") (narrowing, non-breaking) from String → Number (incompatible, breaking). Alternative: skip classification and just use Value.Check(newSchema, storedData) for verification.

  2. Should the repository layer auto-migrate data on schema change, or require explicit consumer action? Auto-migration is simpler for consumers but risky (data transformation without consumer awareness). Explicit migration is safer but more boilerplate. Decision (conditional on OQ1 POC outcome): if classification is feasible (OQ1 POC succeeds), the repository layer auto-applies Value.Cast for changes it classifies as non-breaking, and requires explicit consumer action for breaking changes. If classification is not feasible, the fallback is: the repository layer auto-applies Value.Cast only when Value.Check(newSchema, storedData) passes for all stored data (verification, not classification), and requires explicit consumer action otherwise. This ensures auto-migration never corrupts data — if in doubt, the consumer decides.

  3. How does this interact with the hub's event-sourced call graph persistence? If the hub migrates to event-sourced replay (projector evolution), storage's call graph tables become disposable projections and Value.Cast migration is unnecessary. But other graph types (ACL, tasks, secrets) may not have an event stream to replay from. The schema evolution design should work for both projections and direct-persisted data.

  4. Should schema evolution events be part of the event stream? If the system is event-sourced, schema changes themselves could be events (schema.updated, schema.version_bumped). This would give a full audit trail of schema evolution, but adds complexity. Decision: post-v1. For v1, schema changes are applied directly via the repository layer with version tracking.

References

  • TypeBox Value namespace: /workspace/@alkdev/typebox/src/value/
  • TypeBox Value.Diff/Value.Patch: /workspace/@alkdev/typebox/src/value/delta/delta.ts
  • TypeBox Value.Cast: /workspace/@alkdev/typebox/src/value/cast/cast.ts
  • TypeBox Value.Check: /workspace/@alkdev/typebox/src/value/check/check.ts
  • Event Log as Source of Truth (ADR-005): /workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md
  • Call protocol: /workspace/@alkdev/operations/docs/architecture/call-protocol.md
  • Metagraph Module: metagraph-module.md
  • Current schema versioning: metagraph.md