---
status: draft
last_updated: 2026-05-31
---

# Schema Evolution

How graph type schemas evolve over time — detecting changes, classifying their
impact, and migrating stored data. Uses TypeBox's `Value.Diff`/`Value.Patch`/
`Value.Cast` to operate on schemas-as-JSON and data-as-JSON, aligned with the
ecosystem's event-sourced design.

## Overview

The @alkdev ecosystem is event-driven. Call protocol events are append-only.
Graph instances in storage are projections — materialized views produced by
replaying events through projector functions. When a graph type's schema
changes, stored data may need migration, and the repository layer needs to
detect and handle the change.

The key insight: **TypeBox schemas are JSON**. They're JSON Schema objects
stored as JSON in `node_types.schema` and `edge_types.schema` columns (text in
SQLite, jsonb in PG). Because they're JSON, TypeBox's own `Value.Diff`,
`Value.Patch`, and `Value.Cast` can operate on them directly — diffing schemas
to detect changes, patching stored schemas to update them, and casting stored
data to fit new schema shapes.

Two distinct domains of JSON values are involved:

- **Schemas-as-JSON**: The TypeBox schema objects stored in `node_types.schema`
  and `edge_types.schema` columns (text with JSON mode in SQLite).
  `Value.Diff`/`Value.Patch` operate on these (detecting schema changes,
  updating stored schemas).
- **Data-as-JSON**: The node/edge attribute values stored in `nodes.attributes`
  and `edges.attributes` columns (text with JSON mode in SQLite).
  `Value.Cast`/`Value.Check` operate on these (migrating data to fit new
  schemas, verifying compatibility).

This is not a migration framework. It's the observation that the existing
TypeBox value system, combined with schemas-as-JSON storage, gives us schema
evolution primitives for free — we just need to wire them together.

### The Edit Type

`Value.Diff` returns an array of `Edit` objects — structural delta operations
that transform one JSON value into another:

```ts
type Insert  = { type: "insert",  path: string, value: unknown }
type Update  = { type: "update",  path: string, value: unknown }
type Delete  = { type: "delete",   path: string }
type Edit    = Insert | Update | Delete
```

Paths use RFC 6901 JSON Pointer format (`/properties/requestId`,
`/properties/status/anyOf/0/const`). The `value` field contains the inserted
or updated value.
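
Because edit paths are JSON Pointers, any code that groups edits by schema property has to undo RFC 6901 escaping: `~1` decodes to `/` and `~0` decodes to `~`, applied in that order. The helpers below are a hypothetical sketch, not TypeBox API; `topLevelProperty` assumes the diffed value is an object schema whose fields sit under `/properties`.

```typescript
// Split an RFC 6901 JSON Pointer into unescaped reference tokens.
// "~1" is decoded before "~0", so "~01" correctly becomes "~1".
function parsePointer(pointer: string): string[] {
  if (pointer === "") return []; // the pointer to the whole document
  if (!pointer.startsWith("/")) throw new Error(`invalid JSON Pointer: ${pointer}`);
  return pointer
    .slice(1)
    .split("/")
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Hypothetical helper: the top-level property an edit path touches, if any.
function topLevelProperty(path: string): string | undefined {
  const [root, key] = parsePointer(path);
  return root === "properties" ? key : undefined;
}
```

For example, `topLevelProperty("/properties/status/anyOf/0/const")` returns `"status"`, so every edit inside the `status` union is attributed to that one attribute.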

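`Value.Patch(current, edits)` applies the edits to `current` and returns the
result. `Value.Diff` and `Value.Patch` are complementary rather than inverses:
patching a value with its diff against `next` reproduces `next`, i.e.
`Patch(current, Diff(current, next))` is structurally equal to `next`.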

**Critical: `Edit` paths are structural, not semantic.** Edits at or below
`/properties/status` could mean type narrowing (`String` → `Literal`) or a type
change (`String` → `Number`). Diff doesn't know: it only sees the raw JSON
structure. Classification logic (breaking vs non-breaking) must interpret the
edits with schema awareness.
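
One way to supply that schema awareness is to compare each touched property's schema before and after, rather than interpreting edits one at a time. A minimal sketch with hypothetical helper names, covering only the two cases named above; it assumes TypeBox's JSON output, where `Type.String()` serializes as `{ type: "string" }` and `Type.Literal("x")` as `{ const: "x", type: "string" }`. Everything else falls through to `"unknown"`.

```typescript
type JsonSchema = { [key: string]: unknown };
type PropertyChange = "unchanged" | "narrowing" | "type-change" | "unknown";

// Structural equality for JSON values; object key order is ignored.
function jsonEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && jsonEqual((a as JsonSchema)[k], (b as JsonSchema)[k]),
  );
}

// Hypothetical helper: interpret one property's schema change as a whole.
function compareProperty(before: JsonSchema, after: JsonSchema): PropertyChange {
  if (jsonEqual(before, after)) return "unchanged";
  // Different primitive "type" keywords, e.g. "string" -> "number".
  if (typeof before.type === "string" && typeof after.type === "string" && before.type !== after.type) {
    return "type-change";
  }
  // Same schema except that a "const" was added: String -> Literal.
  if (!("const" in before) && "const" in after) {
    const rest: JsonSchema = { ...after };
    delete rest.const;
    if (jsonEqual(before, rest)) return "narrowing";
  }
  return "unknown";
}
```

A `"narrowing"` result is still only conditionally safe: stored values other than the literal fail `Value.Check` under the new schema.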

## The Event-Sourced Context

The @alkdev ecosystem uses an append-only event model:

- **Call protocol** (`@alkdev/operations`): 5 typed events (`call.requested`,
  `call.responded`, `call.completed`, `call.aborted`, `call.error`) form an
  append-only event stream per workflow execution
- **Event log as source of truth** (`@alkdev/flowgraph` ADR-005): The in-memory
  `CallEventMapValue[]` is the authoritative state; projected views (status,
  results, call graph) are derived from it
- **Hub persistence**: The hub persists call protocol events to Postgres and
  replays them to reconstruct reactive state after restart
- **Storage as projection**: The metagraph tables store the projected state —
  graph instances, nodes, edges — not the raw event stream

This means storage's graph data is analogous to a read model in event-sourced
systems. Schema evolution in storage is projection migration, not event
migration. The event stream (managed by the hub) retains full history;
storage's tables hold the current materialized view.

### Single-Author, Not CRDT

Unlike Yjs or Automerge, the @alkdev event model is single-author per session
with central coordination. There are no concurrent multi-author edit conflicts.
This means:

- **Idempotent replay** (same events → same state) is sufficient; CRDT merge
  semantics are not needed
- **Schema evolution** is forward-only (new code processes old data), not
  bidirectional (concurrent versions merging)
- **Migration** is apply-on-read or apply-on-write, not conflict resolution

If future use cases require multi-author concurrent graph editing (e.g.,
collaborative task boards), a CRDT layer would need to sit between the event
stream and storage. That's a post-v1 concern.

## Schema-as-JSON: Value.Diff on Schemas

TypeBox schemas are JSON Schema objects — plain JSON values. The current
`node_types.schema` and `edge_types.schema` columns store them as JSON text
in SQLite. This means `Value.Diff` can diff schemas themselves.

### Detecting Schema Changes

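```ts
import { eq } from "drizzle-orm";
import { Value } from "@alkdev/typebox";
// db, nodeTypes (Drizzle table) and CallGraph (the consumer's Module) are
// assumed to be in scope.

// Stored schema (from DB): await the query first, then read the column
const row = await db.query.nodeTypes.findFirst({
  where: eq(nodeTypes.name, "Call"),
});
if (!row) throw new Error('Node type "Call" not found');
const storedSchema = row.schema;

// Current schema (from Module). The JSON round-trip drops TypeBox's
// symbol-keyed metadata so both sides are plain JSON, like the stored copy.
const currentSchema = JSON.parse(JSON.stringify(CallGraph.CallNode));

// Diff the schemas
const edits = Value.Diff(storedSchema, currentSchema);

if (edits.length === 0) {
  // No change: schema matches
} else {
  // Schema has changed: classify the edits
}
```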

### Classifying Schema Edits

`Value.Diff` is schema-agnostic — it diffs raw JSON structure without
understanding JSON Schema semantics. A classification layer is needed to
determine whether an edit is breaking or non-breaking.

| Edit pattern | Schema-level meaning | Breaking? |
|---|---|---|
| Insert new required property with `default` | New field with default | No, once data is migrated with `Value.Cast` (until then, old data fails `Value.Check`) |
| Insert new required property without `default` | New required field; old data won't have it | Yes |
| Insert new property wrapped in `Type.Optional` | New optional field; old data valid | No |
| Update property type: `String` → `Literal("x")` | Type narrowing (subtype) | Only if every stored value is `"x"`; other strings fail `Value.Check` |
| Update property type: `String` → `Number` | Type change (incompatible) | Yes |
| Update property type: add `Optional` wrapper (key leaves `required`) | Making field optional | No (backward compatible) |
| Delete property | Field removed from schema | Yes with `additionalProperties: false` (old data carrying the field no longer conforms); otherwise no |
| Update `$defs` references | Cross-reference changes | Depends; see [Relationship to the Module](#relationship-to-the-module) |

**⚠️ Needs POC**: The classification above is theoretical. Whether this can be
done reliably from `Edit[]` objects (which are raw JSON pointer + value pairs)
needs validation. POC scope:
- **Test corpus**: ~20 schema change patterns covering each row in the table
  above (add optional field, add required field, narrow type, change type,
  remove field, add enum value, `$ref` change)
- **Success criteria**: Classification accuracy >95% against expected
  breaking/non-breaking labels for each pattern
- **Fallback**: If classification accuracy is too low, use Strategy C (hybrid
  with `Value.Check` verification) instead of Strategy A.
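
As a possible starting point for the POC, here is a minimal sketch of the `classify` step that Strategy A describes, restricted to a top-level object schema. Assumptions: the `Edit` union is re-declared locally to mirror *The Edit Type*; requiredness is read from the `required` array that `Type.Object` emits for non-optional keys; and the four-level verdict (including `"cast-migratable"` for new required fields that carry a `default`) is an illustrative choice, not the boolean in Strategy A's signature.

```typescript
// Local mirror of the Edit union returned by Value.Diff.
type Edit =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

type JsonSchema = { [key: string]: unknown };

// Verdicts ordered from least to most severe.
const VERDICTS = ["non-breaking", "cast-migratable", "unknown", "breaking"] as const;
type Verdict = (typeof VERDICTS)[number];

interface ClassifiedEdit {
  edit: Edit;
  verdict: Verdict;
}

function worse(a: Verdict, b: Verdict): Verdict {
  return VERDICTS.indexOf(a) >= VERDICTS.indexOf(b) ? a : b;
}

// RFC 6901 reference tokens of an edit path ("~1" -> "/", then "~0" -> "~").
function tokens(path: string): string[] {
  return path.split("/").slice(1).map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

function requiredKeys(schema: JsonSchema): Set<string> {
  return new Set(Array.isArray(schema.required) ? (schema.required as string[]) : []);
}

// Hypothetical triage of an object schema diff. Anything it cannot interpret
// is "unknown" and should be verified with Value.Check (Strategy C).
function classify(stored: JsonSchema, next: JsonSchema, edits: Edit[]) {
  const before = requiredKeys(stored);
  const after = requiredKeys(next);
  const props = (next.properties ?? {}) as Record<string, JsonSchema | undefined>;

  // A newly required key can be filled by Value.Cast only if it has a default.
  const newlyRequired = (key: string): Verdict => {
    const property = props[key];
    return property !== undefined && "default" in property ? "cast-migratable" : "breaking";
  };

  const classified = edits.map((edit): ClassifiedEdit => {
    const [root, key, ...rest] = tokens(edit.path);
    if (root === "required") {
      // Compare required sets instead of interpreting array-element edits.
      const added = [...after].filter((k) => !before.has(k));
      return { edit, verdict: added.map(newlyRequired).reduce(worse, "non-breaking") };
    }
    if (root === "properties" && key !== undefined && rest.length === 0) {
      if (edit.type === "insert") {
        return { edit, verdict: after.has(key) ? newlyRequired(key) : "non-breaking" };
      }
      if (edit.type === "delete") {
        return { edit, verdict: next.additionalProperties === false ? "breaking" : "non-breaking" };
      }
    }
    return { edit, verdict: "unknown" };
  });

  const verdict = classified.map((c) => c.verdict).reduce(worse, "non-breaking");
  return { verdict, edits: classified };
}
```

In practice the `edits` argument would come from `Value.Diff(storedSchema, currentSchema)`; the accuracy target above decides whether the `"unknown"` bucket stays small enough to be useful.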

### Three Detection Strategies

**Strategy A: Diff schemas, classify edits**

```
Value.Diff(storedSchema, currentSchema) → Edit[]
classify(Edit[]) → { breaking: boolean, edits: ClassifiedEdit[] }
```

Pros: Identifies *what* changed and *how*. Enables targeted migration (only
migrate data affected by breaking changes).

Cons: Classification is hard to get right. Schema semantics are rich (type
narrowing, union widening, `$ref` changes, `additionalProperties` interactions).

**Strategy B: Diff schemas, test data compatibility**

```ts
const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length > 0) {
  // Schema changed: test whether existing data is still valid.
  // fetchSampleNodeData is a repository helper returning stored attributes.
  const sampleData = await fetchSampleNodeData(graphTypeId, nodeTypeName);
  const compatible = sampleData.every((d) => Value.Check(currentSchema, d));
}
```

Pros: Simple. No classification logic needed. Works with any schema change.

Cons: Requires fetching data to test. Binary answer (compatible or not) — no
information about *what* changed. Sample may not be representative.

**Strategy C: Hybrid — diff for detection, Check for verification**

```ts
import { Value, type TSchema } from "@alkdev/typebox";

type Detection = "unchanged" | "version-mismatch" | "non-breaking" | "breaking";

async function detectSchemaChange(
  storedSchema: unknown,
  currentSchema: TSchema,
  storedVersion: number,  // graph_types.version from DB
  currentVersion: number, // consumer-defined constant (e.g., CURRENT_CALL_GRAPH_VERSION = 2)
  graphTypeId: string,
  nodeTypeName: string,
): Promise<Detection> {
  const edits = Value.Diff(storedSchema, currentSchema);
  if (edits.length === 0) return "unchanged";

  // Schema changed. Fast path: check whether a version bump covers it.
  if (storedVersion < currentVersion) return "version-mismatch";

  // No version bump but schema changed: a non-breaking change is expected.
  // Verify by checking stored data against the new schema.
  // findIncompatibleNodes = repository query that returns nodes
  // where Value.Check(currentSchema, node.attributes) is false
  const incompatible = await findIncompatibleNodes(graphTypeId, nodeTypeName, currentSchema);
  return incompatible.length === 0 ? "non-breaking" : "breaking";
}
```

`findIncompatibleNodes` is a repository query function: fetch nodes of the
given type in the given graph, filter to those where
`!Value.Check(currentSchema, node.attributes)`. `currentVersion` is a
consumer-defined integer constant that the consumer increments when making
a breaking change to their graph type Module.

This combines detection (diff), metadata (version), and verification (check)
and is likely the most robust approach. **Recommended for POC exploration.**
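
`findIncompatibleNodes` can be a paged scan so that verification never loads a whole graph type into memory. The sketch below is hypothetical: `collectIncompatibleNodes`, `FetchPage`, and the `StoredNode` shape are illustrative names, the page fetcher is assumed to be already scoped to one graph type and node type and ordered by `id`, and `check` stands in for `(v) => Value.Check(currentSchema, v)`.

```typescript
interface StoredNode {
  id: string;
  nodeType: string;
  attributes: unknown;
}

// Hypothetical page fetcher, assumed scoped to one graph type and node type:
// up to `limit` nodes with id > afterId (all nodes when null), ordered by id.
type FetchPage = (afterId: string | null, limit: number) => Promise<StoredNode[]>;

// Stands in for (attributes) => Value.Check(currentSchema, attributes).
type Check = (attributes: unknown) => boolean;

async function collectIncompatibleNodes(fetchPage: FetchPage, check: Check, pageSize = 500): Promise<StoredNode[]> {
  if (pageSize < 1) throw new RangeError("pageSize must be >= 1");
  const incompatible: StoredNode[] = [];
  let cursor: string | null = null;
  for (;;) {
    const page: StoredNode[] = await fetchPage(cursor, pageSize);
    for (const node of page) {
      if (!check(node.attributes)) incompatible.push(node);
    }
    // A short page means the scan reached the end.
    if (page.length < pageSize) return incompatible;
    cursor = page[page.length - 1]!.id;
  }
}
```

Keyset pagination on `id` keeps each query cheap and stable even if other graph types are written concurrently.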

## Data Migration via Value.Cast

Once a schema change is detected and classified, `Value.Cast` can migrate
existing data to fit the new schema shape.

### How Value.Cast Works

`Value.Cast(schema, value)` attempts to fit a value into the shape defined by
the schema:

- **Matching properties**: Retained from the original value
- **Missing required properties with defaults**: Filled from the property schema's `default`
- **Missing required properties without defaults**: Created as zeros (`0`,
  `""`, `false`, `{}`, `[]`)
- **Unknown properties**: Dropped if `additionalProperties: false`, retained
  otherwise
- **Union types**: Each variant is scored and the best match is selected

This is exactly what's needed for data migration — reshape stored node
attributes to fit a new schema while preserving matching fields.

### Example: Adding a Required Field with Default

```ts
import { Type, Value } from "@alkdev/typebox";

// Schema v1 (stored in DB)
const CallNodeV1 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
});

// Schema v2 (new Module entry — added priority field)
const CallNodeV2 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
  priority: Type.Number({ default: 0 }),  // new field with default
});

// Existing data
const oldNode = {
  requestId: "req-001",
  operationId: "op-call",
  status: "completed",
};

// Cast migrates the data
const migratedNode = Value.Cast(CallNodeV2, oldNode);
// → { requestId: "req-001", operationId: "op-call", status: "completed", priority: 0 }
```

### Example: Type Narrowing (Non-Breaking)

```ts
// Schema v1: type: Type.String()
// Schema v2: type: Type.Literal("triggered")

const oldEdge = { type: "triggered", metadata: {} };
const migrated = Value.Cast(CallGraphV2.TriggeredEdge, oldEdge);
// → { type: "triggered", metadata: {} }  (unchanged — "triggered" satisfies Literal)
```

### Example: Type Change (Breaking)

```ts
// Schema v1: priority: Type.Number()
// Schema v2: priority: Type.Union([Type.Literal("low"), Type.Literal("high")])

const oldNode = { requestId: "req-001", priority: 3 };
const migrated = Value.Cast(CallNodeV2, oldNode);
// Cast scores each union variant against the value.
// Neither "low" nor "high" matches 3 — Cast picks the best match,
// but the result is likely incorrect (data loss).
```

**⚠️ Cast limitation**: `Value.Cast` does not provide custom migration
functions. For breaking changes that require transformation logic (e.g.,
`priority: 3` → `priority: "high"` based on a threshold), the repository layer
needs a custom migration handler. Cast handles the common case (add fields,
narrow types, drop removed fields); custom logic handles the rest.
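
A custom migration handler can sit in front of `Value.Cast` and still pass through the same `Value.Check` guard. A minimal sketch; the registry key format (`"<nodeType>@<fromVersion>"`), the `priority >= 3` threshold, and the `cast`/`check` hooks (standing in for `Value.Cast`/`Value.Check` bound to the new schema) are all assumptions.

```typescript
type Attributes = Record<string, unknown>;

// Hypothetical hooks: cast/check stand in for Value.Cast / Value.Check
// bound to the new schema.
interface MigrationHooks {
  cast: (value: unknown) => unknown;
  check: (value: unknown) => boolean;
}

type CustomMigration = (old: Attributes) => Attributes;

// Hypothetical registry keyed by "<nodeType>@<fromVersion>".
const customMigrations = new Map<string, CustomMigration>([
  [
    "Call@2",
    // Assumed threshold: numeric priority >= 3 becomes "high", else "low".
    (old) => ({ ...old, priority: typeof old.priority === "number" && old.priority >= 3 ? "high" : "low" }),
  ],
]);

function migrateAttributes(nodeType: string, fromVersion: number, old: Attributes, hooks: MigrationHooks): unknown {
  const custom = customMigrations.get(`${nodeType}@${fromVersion}`);
  // Custom logic runs first; Cast then fills defaults and drops removed fields.
  const candidate = hooks.cast(custom ? custom(old) : old);
  if (!hooks.check(candidate)) {
    throw new Error(`No valid migration for ${nodeType} from version ${fromVersion}`);
  }
  return candidate;
}
```

With pass-through hooks, `migrateAttributes("Call", 2, { requestId: "req-001", priority: 3 }, hooks)` yields `{ requestId: "req-001", priority: "high" }`, while a node type without a registered handler falls back to plain Cast and fails loudly if that is not enough.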

## Schema-as-JSON: Patching Stored Schemas

When a graph type Module changes, the stored schema in `node_types.schema` must
be updated. `Value.Patch` can apply the diff to the stored schema:

```ts
import { eq } from "drizzle-orm";
import { Value } from "@alkdev/typebox";

// 1. Diff current stored schema against new Module entry
const edits = Value.Diff(storedSchema, CallGraph.CallNode);

// 2. Patch the stored schema
const newStoredSchema = Value.Patch(storedSchema, edits);

// 3. Update the DB row
await db.update(nodeTypes)
  .set({ schema: newStoredSchema, updatedAt: new Date() })
  .where(eq(nodeTypes.name, "Call"));
```

This is simpler than reconstructing the full Module and running
`moduleToDbSchema()` again, but it carries the same caveat as `Value.Diff`:
the edits are structural, not semantic. If the Module entry's `$defs` structure
changed (e.g., a `Type.Ref` target was updated), the diff may not capture the
semantic change correctly.

**Alternative**: Re-run `moduleToDbSchema()` on the updated Module and write the
full output. This is simpler and more reliable but requires the full Module —
not just the schema entry — to be available at migration time.

**Decision for v1**: Re-run `moduleToDbSchema()` on the updated Module. The
Module is always available when the consumer defines graph types. Patch-based
schema update is an optimization for later.

## Evolution Strategies

### Additive-Only (Recommended for v1)

The simplest strategy: only add optional fields and new node/edge types. Never
remove or rename existing fields.

- **New optional fields**: `Type.Optional(Type.String())` — old data is still
  valid under the new schema
- **New node/edge types**: New rows in `node_types`/`edge_types` — existing
  rows unchanged
- **New enum values**: Add to `Type.Union` of `Type.Literal` — old data with
  existing values is still valid

This avoids the need for `Value.Cast` migrations entirely. The `version` field
on `graph_types` is never bumped.
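
The additive-only rule can be enforced mechanically, for example in CI, by comparing the stored and new JSON of a node type. A minimal sketch with hypothetical names: it requires every existing property to survive with identical JSON (or as a union that only gained `anyOf` variants, which is how a `Type.Union` of `Type.Literal` serializes) and no key to become newly required. Equality is structural because PG `jsonb` does not preserve key order. Other top-level keywords such as `additionalProperties` are not checked.

```typescript
type JsonSchema = { [key: string]: unknown };

// Structural equality for JSON values; object key order is ignored.
function jsonEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && jsonEqual((a as JsonSchema)[k], (b as JsonSchema)[k]),
  );
}

function requiredKeys(schema: JsonSchema): Set<string> {
  return new Set(Array.isArray(schema.required) ? (schema.required as string[]) : []);
}

// True when a union keeps every old variant (new variants may be added).
function isWidenedUnion(before: JsonSchema, after: JsonSchema): boolean {
  const oldVariants = before.anyOf;
  const newVariants = after.anyOf;
  if (!Array.isArray(oldVariants) || !Array.isArray(newVariants)) return false;
  const withoutAnyOf = (schema: JsonSchema): JsonSchema => {
    const copy = { ...schema };
    delete copy.anyOf;
    return copy;
  };
  if (!jsonEqual(withoutAnyOf(before), withoutAnyOf(after))) return false;
  return oldVariants.every((v) => newVariants.some((w) => jsonEqual(v, w)));
}

// Hypothetical CI check for the additive-only rule on one object schema.
function isAdditiveOnly(stored: JsonSchema, next: JsonSchema): boolean {
  const oldProps = (stored.properties ?? {}) as Record<string, JsonSchema>;
  const newProps = (next.properties ?? {}) as Record<string, JsonSchema | undefined>;
  const oldRequired = requiredKeys(stored);
  // No key may become newly required.
  for (const key of requiredKeys(next)) if (!oldRequired.has(key)) return false;
  for (const [key, before] of Object.entries(oldProps)) {
    const after = newProps[key];
    if (after === undefined) return false; // removed or renamed
    if (!jsonEqual(before, after) && !isWidenedUnion(before, after)) return false;
  }
  return true;
}
```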

**When additive-only breaks down**: If a field was incorrectly designed (wrong
type, wrong name, wrong semantics), additive-only forces you to deprecate the
old field and add a new one. The deprecated field stays in the schema forever.
This is acceptable for early development but creates technical debt over time.

### Version-Bumped with Cast Migration

When a breaking change is needed, bump the `graph_types.version` integer. The
repository layer checks the version before processing and applies `Value.Cast`
to migrate data when the version doesn't match.

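```ts
import { Value } from "@alkdev/typebox";
// findGraphType, updateGraphTypeVersion, findNodesByGraphType, updateNode,
// updateNodeTypesSchemas, moduleToDbSchema, CallGraph and
// CURRENT_CALL_GRAPH_VERSION (an even constant) are assumed to be in scope.

const graphType = await findGraphType("call-graph");
const currentModule = CallGraph;

// Assumes graphType.version is even (stable). An odd version means an
// earlier migration crashed; see Migration Safety below.
if (graphType.version < CURRENT_CALL_GRAPH_VERSION) {
  // Mark migration in progress: even + 1 = odd
  const inProgressVersion = graphType.version + 1;
  await updateGraphTypeVersion(graphType.id, inProgressVersion);

  // Schema has changed: migrate all nodes (batch this for large graphs)
  const nodes = await findNodesByGraphType(graphType.id);
  for (const node of nodes) {
    const schema = currentModule[`${node.nodeType}Node`];
    const migrated = Value.Cast(schema, node.attributes);
    // Guard: verify the cast result before writing
    if (!Value.Check(schema, migrated)) {
      throw new Error(
        `Cast produced invalid data for node ${node.id}. ` +
        `Custom migration required for this schema change.`
      );
    }
    await updateNode(node.id, { attributes: migrated });
  }
  // Update the stored schemas, then finalize at the next even version
  await updateNodeTypesSchemas(graphType.id, moduleToDbSchema(currentModule));
  await updateGraphTypeVersion(graphType.id, inProgressVersion + 1);
}
```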

**Version bump contract** (decision: even/odd scheme):
- **Even version**: Stable schema, no pending migrations
- **Odd version**: Migration in progress — reads return stale data or error
  (consumer-configurable)
- **After migration**: Version is bumped to the next even number
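
The contract reduces to two transitions. A minimal sketch with hypothetical helper names; it refuses to start a migration on an odd version, since that signals an earlier run crashed and should be resumed or rolled back first.

```typescript
type VersionState = "stable" | "migrating";

// Even = stable schema, odd = migration in progress.
function versionState(version: number): VersionState {
  if (!Number.isInteger(version) || version < 0) throw new RangeError(`invalid version: ${version}`);
  return version % 2 === 0 ? "stable" : "migrating";
}

// Stable -> in progress. An odd version means an earlier run crashed; it must
// be resumed or rolled back before a new migration starts.
function beginMigration(version: number): number {
  if (versionState(version) !== "stable") {
    throw new Error(`migration already in progress at version ${version}`);
  }
  return version + 1;
}

// In progress -> next stable version.
function finishMigration(version: number): number {
  if (versionState(version) !== "migrating") {
    throw new Error(`no migration in progress at version ${version}`);
  }
  return version + 1;
}
```

A full bump from version 2 is `beginMigration(2)` = 3, then `finishMigration(3)` = 4.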

### Migration Safety

Partial migration is the primary risk — if the process crashes or errors on
node N of M, some nodes are migrated and some are not. The even/odd version
scheme provides the recovery mechanism:

1. **Before migration**: Set `version` to odd (in-progress state)
2. **During migration**: Each node is cast, verified with `Value.Check`, then
   written. If any cast produces invalid data, the migration aborts with an
   error — the consumer must provide a custom migration function for that
   schema change.
3. **After migration**: Set `version` to even (stable state), update stored
   schemas via `moduleToDbSchema()`
4. **Recovery**: If the process crashes during migration, the odd version
   signals that migration is incomplete. On restart, the repository layer
   detects the odd version and can either resume the migration (migrate the
   remaining nodes) or roll back (restore from backup / re-project from events).

**Read behavior during migration**: Consumer-configurable. Options:
- Reject reads with an error ("migration in progress")
- Return stale data (from unmigrated nodes)
- Return mixed data (some migrated, some not — not recommended)

**Cast safety guard**: Every `Value.Cast` result must be verified with
`Value.Check(newSchema, migratedData)` before writing. If the check fails,
the migration is classified as breaking after all, and the consumer must
provide a custom migration function. Document this as a hard requirement.

**Performance**: For large datasets (10K+ nodes), the migration loop should
batch reads/writes to avoid holding all nodes in memory. The graph type is
effectively read-only during migration — writes to that graph type's nodes
should be rejected while `version` is odd.
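
A batched variant of the migration loop keeps one batch in memory at a time and uses `Value.Check` as its resume signal. A sketch with hypothetical hooks: `fetchBatch` is assumed to return nodes of the graph type ordered by `id` after the given cursor, `writeBatch` to persist updates, and `cast`/`check` to stand in for `Value.Cast`/`Value.Check` bound to the new schema.

```typescript
interface StoredNode {
  id: string;
  attributes: unknown;
}

// Hypothetical repository hooks. `cast` and `check` stand in for
// Value.Cast / Value.Check bound to the new schema.
interface MigrationDeps {
  // Up to `limit` nodes with id > afterId (all when null), ordered by id.
  fetchBatch: (afterId: string | null, limit: number) => Promise<StoredNode[]>;
  writeBatch: (updates: StoredNode[]) => Promise<void>;
  cast: (attributes: unknown) => unknown;
  check: (attributes: unknown) => boolean;
}

// Migrates one batch at a time and returns the number of nodes rewritten.
// Nodes that already pass `check` are skipped, so re-running after a crash
// (version still odd) only touches nodes the previous run did not reach.
async function migrateInBatches(deps: MigrationDeps, batchSize = 500): Promise<number> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  let cursor: string | null = null;
  let written = 0;
  for (;;) {
    const batch: StoredNode[] = await deps.fetchBatch(cursor, batchSize);
    const updates: StoredNode[] = [];
    for (const node of batch) {
      if (deps.check(node.attributes)) continue; // already valid under the new schema
      const migrated = deps.cast(node.attributes);
      // Same guard as the single-node loop: never write unverified Cast output.
      if (!deps.check(migrated)) {
        throw new Error(`Cast produced invalid data for node ${node.id}; custom migration required`);
      }
      updates.push({ id: node.id, attributes: migrated });
    }
    if (updates.length > 0) await deps.writeBatch(updates);
    written += updates.length;
    if (batch.length < batchSize) return written;
    cursor = batch[batch.length - 1]!.id;
  }
}
```

Using `check` as the "already migrated" signal is what makes a re-run resume cleanly. It also means a change where old data happens to satisfy the new schema but still needs a custom transform would be skipped; that case would need something like an explicit per-node migration marker.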

This is the simplest versioning that handles breaking changes. It's
application-level (the consumer decides when to bump and migrate), not
framework-level (storage doesn't auto-migrate). `graph_types.version` is
consistent with `@alkdev/flowgraph` ADR-004: flowgraph itself doesn't version
schemas (it's in-memory), but the persisting consumer (storage) provides the
versioned envelope.

### Event-Sourced Replay (Forward-Looking)

In a fully event-sourced model, schema evolution is handled by replaying events
through updated projector functions. Storage's tables are rebuilt from the
event log, so no data migration is needed — you just re-project.

This requires the event log to be the authoritative source and storage to be
a disposable projection. The hub's call graph persistence (Postgres) is
approaching this model: events are persisted, and state is reconstructed by
replay. But the current metagraph tables are not rebuilt from events — they're
written directly by the repository layer.

If the hub migrates to a model where call graph nodes/edges are projections
of the call protocol event stream, then schema evolution becomes projector
evolution — update the projector, replay the events, rebuild the projection.
No `Value.Cast` needed for stored data (because there is no stored data — just
re-projected views).
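
Projector evolution can be sketched as a pure fold over events. Everything here is hypothetical: the event payload (`requestId`, `operationId`) is simplified, status strings are derived from the event name for illustration, and the real projector functions live outside `@alkdev/storage`. Replaying the same events through the same projector yields the same state, and moving to the v2 shape means swapping the projector and replaying.

```typescript
// Hypothetical, simplified call protocol events; real payloads differ.
type RequestedEvent = { type: "call.requested"; requestId: string; operationId: string };
type CallEvent =
  | RequestedEvent
  | { type: "call.responded" | "call.completed" | "call.aborted" | "call.error"; requestId: string };

type Projector<S> = (state: Map<string, S>, event: CallEvent) => void;

// Rebuild a projection from scratch: same events + same projector -> same state.
function replay<S>(events: readonly CallEvent[], project: Projector<S>): Map<string, S> {
  const state = new Map<string, S>();
  for (const event of events) project(state, event);
  return state;
}

// Shared transitions; only node creation differs between schema versions.
function makeProjector<S extends { status: string }>(create: (e: RequestedEvent) => S): Projector<S> {
  return (state, event) => {
    if (event.type === "call.requested") {
      state.set(event.requestId, create(event));
      return;
    }
    const node = state.get(event.requestId);
    if (node) node.status = event.type.slice("call.".length);
  };
}

const projectV1 = makeProjector((e) => ({ requestId: e.requestId, operationId: e.operationId, status: "requested" }));

// Evolving the schema means evolving the projector: v2 adds `priority`.
const projectV2 = makeProjector((e) => ({
  requestId: e.requestId,
  operationId: e.operationId,
  status: "requested",
  priority: 0,
}));
```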

**This is a post-v1 design** — it requires the event log to be the primary
persistence, with storage tables as read-optimized projections. The current
model writes directly to storage tables; the event stream is separate (managed
by the hub's call graph module, not by `@alkdev/storage`).

## Relationship to the Module

The `Metagraph` Module ([metagraph-module.md](./metagraph-module.md)) is the
source of truth for graph type schemas. When the Module changes:

1. **`moduleToDbSchema()`** produces the updated DB row values
2. **`Value.Diff(storedSchema, moduleEntry)`** detects what changed
3. **`Value.Cast(moduleEntry, storedData)`** migrates affected node/edge data
4. **`Value.Patch(storedSchema, edits)`** updates the stored schema (or
   re-run `moduleToDbSchema()`)

The Module's `$defs` structure adds complexity to diffing — a `Type.Ref`
resolution changes the effective schema in ways that a raw JSON diff might not
capture. Using `moduleToDbSchema()` (which resolves refs before writing to the
DB) avoids this problem — the stored schema is already dereferenced.
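
A sketch of why dereferencing helps: once local references are inlined, a change to a `$defs` entry shows up as ordinary edits under every property that uses it. This is not `moduleToDbSchema()` (see [metagraph-module.md](./metagraph-module.md)); it assumes references of the form `#/$defs/<Name>` with unescaped names, ignores keywords placed next to `$ref`, and rejects recursive references, which cannot be inlined into finite JSON.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Inline local "#/$defs/<Name>" references so a plain JSON diff sees the
// effective schema. `seen` tracks the refs on the current path to detect cycles.
function inlineRefs(node: Json, defs: Record<string, Json | undefined>, seen: readonly string[] = []): Json {
  if (Array.isArray(node)) return node.map((item) => inlineRefs(item, defs, seen));
  if (node === null || typeof node !== "object") return node;
  const ref = node.$ref;
  if (typeof ref === "string" && ref.startsWith("#/$defs/")) {
    const name = ref.slice("#/$defs/".length);
    if (seen.includes(name)) throw new Error(`recursive $ref cannot be inlined: ${name}`);
    const target = defs[name];
    if (target === undefined) throw new Error(`unresolved $ref: ${ref}`);
    return inlineRefs(target, defs, [...seen, name]);
  }
  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(node)) {
    // The definitions themselves are dropped once inlined.
    if (key !== "$defs") out[key] = inlineRefs(value, defs, seen);
  }
  return out;
}
```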

## Design Decisions

All design decisions are documented as ADRs in [decisions/](decisions/).

| ADR | Decision | Summary |
|-----|----------|---------|
| [028](decisions/028-additive-only-with-cast-migration.md) | Additive-only for v1, Cast migration when needed | Additive changes avoid migration; `Value.Cast` for common cases |
| [029](decisions/029-version-as-breaking-change-signal.md) | Version as a coarse-grained breaking-change signal | Only breaking changes bump the version; even/odd for migration state |
| [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | No manual changelog; diff stored vs current schemas |
| [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Re-run full Module projection, not Value.Patch |
| [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | No concurrent multi-author graph types |

## Open Questions

Open questions are tracked in [open-questions.md](open-questions.md). Key
questions affecting schema evolution:

- **OQ-10**: Can `Edit[]` from `Value.Diff` be reliably classified as breaking vs non-breaking?
- **OQ-11**: Should the repository layer auto-migrate data on schema change, or require explicit consumer action?
- **OQ-12**: How does schema evolution interact with the hub's event-sourced call graph persistence?
- **OQ-13**: Should schema evolution events be part of the event stream?

## References

- TypeBox `Value` namespace: `/workspace/@alkdev/typebox/src/value/`
- TypeBox `Value.Diff`/`Value.Patch`: `/workspace/@alkdev/typebox/src/value/delta/delta.ts`
- TypeBox `Value.Cast`: `/workspace/@alkdev/typebox/src/value/cast/cast.ts`
- TypeBox `Value.Check`: `/workspace/@alkdev/typebox/src/value/check/check.ts`
- Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md`
- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
- Metagraph Module: [metagraph-module.md](./metagraph-module.md)
- Schema versioning in the data model: ADR-029, [metagraph-module.md](./metagraph-module.md)