docs: add schema-evolution.md — TypeBox Value.Diff/Patch/Cast for schema evolution

2026-05-28 16:31:25 +00:00
parent 1e804b9174
commit 95e02f939d
3 changed files with 575 additions and 6 deletions


@@ -244,6 +244,13 @@ design in [metagraph-module.md](./metagraph-module.md):
output is the same shape that a ujsx HostConfig would produce, but storage
doesn't need ujsx to create it. The alignment is structural, not dependent.
5. **Schemas-as-JSON enables `Value.Diff`/`Value.Patch`/`Value.Cast`**:
because TypeBox Modules serialize to JSON Schema, the TypeBox value system
can operate on schemas themselves (diff to detect changes, patch to update
stored schemas, cast to migrate data). This is not possible if schemas are
opaque builder objects or Drizzle column definitions. See
[schema-evolution.md](./schema-evolution.md).
## References
- ujsx pointer system: `/workspace/@alkdev/ujsx/src/core/pointer.ts`
@@ -254,3 +261,4 @@ design in [metagraph-module.md](./metagraph-module.md):
- JPATH Module (JSONPath as TypeBox Module): `/workspace/research/typebox_research/ujsx/jpath.gen.ts`
- jsonpathly source: `/workspace/jsonpathly/`
- Module evolution spec: [metagraph-module.md](./metagraph-module.md)
- Schema evolution spec: [schema-evolution.md](./schema-evolution.md)


@@ -282,10 +282,11 @@ storage node attributes and operations call events), they should either:
deployment-specific secret management.
5. **Schema evolution strategy**: When graph type schemas evolve (new node types,
changed attribute schemas), who handles migration? The repository layer
should support schema version checking, but actual data migration scripts are
application-level. See [metagraph.md](./metagraph.md) for the versioning
approach.
changed attribute schemas), how are changes detected and data migrated?
TypeBox's `Value.Diff` can diff schemas-as-JSON to detect changes,
`Value.Cast` can migrate data shapes, and `Value.Check` can verify
compatibility. The `version` field on `graph_types` tracks breaking changes.
See [schema-evolution.md](./schema-evolution.md) for the full design.
6. **~~Should the repository layer live in `@alkdev/storage` or in a consumer
package?~~** Decision: the repository CRUD layer (host-specific typed
@@ -301,6 +302,7 @@ storage node attributes and operations call events), they should either:
## References
- Metagraph Module evolution: [metagraph-module.md](./metagraph-module.md)
- Schema evolution via TypeBox value system: [schema-evolution.md](./schema-evolution.md)
- Forward-looking connections: [forward-look.md](./forward-look.md)
- Operations architecture: `/workspace/@alkdev/operations/docs/architecture/README.md`
- Pubsub architecture: `/workspace/@alkdev/pubsub/docs/architecture/README.md`


@@ -0,0 +1,559 @@
---
status: draft
last_updated: 2026-05-28
---
# Schema Evolution
How graph type schemas evolve over time — detecting changes, classifying their
impact, and migrating stored data. Uses TypeBox's `Value.Diff`/`Value.Patch`/
`Value.Cast` to operate on schemas-as-JSON and data-as-JSON, aligned with the
ecosystem's event-sourced design.
## Overview
The @alkdev ecosystem is event-driven. Call protocol events are append-only.
Graph instances in storage are projections — materialized views produced by
replaying events through projector functions. When a graph type's schema
changes, stored data may need migration, and the repository layer needs to
detect and handle the change.
The key insight: **TypeBox schemas are JSON**. They're JSON Schema objects
stored as JSON in `node_types.schema` and `edge_types.schema` columns (text in
SQLite, jsonb in PG). Because they're JSON, TypeBox's own `Value.Diff`,
`Value.Patch`, and `Value.Cast` can operate on them directly — diffing schemas
to detect changes, patching stored schemas to update them, and casting stored
data to fit new schema shapes.
Two distinct domains of JSON values are involved:
- **Schemas-as-JSON**: The TypeBox schema objects stored in `node_types.schema`
and `edge_types.schema` columns. `Value.Diff`/`Value.Patch` operate on these
(detecting schema changes, updating stored schemas).
- **Data-as-JSON**: The node/edge attribute values stored in `nodes.attributes`
and `edges.attributes` columns. `Value.Cast`/`Value.Check` operate on these
(migrating data to fit new schemas, verifying compatibility).
This is not a migration framework. It's the observation that the existing
TypeBox value system, combined with schemas-as-JSON storage, gives us schema
evolution primitives for free — we just need to wire them together.
### The Edit Type
`Value.Diff` returns an array of `Edit` objects — structural delta operations
that transform one JSON value into another:
```ts
type Insert = { type: "insert", path: string, value: unknown }
type Update = { type: "update", path: string, value: unknown }
type Delete = { type: "delete", path: string }
type Edit = Insert | Update | Delete
```
Paths use RFC 6901 JSON Pointer format (`/properties/requestId`,
`/properties/status/anyOf/0/const`). The `value` field contains the inserted
or updated value.
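Classification logic has to interpret these paths segment by segment. The following is a minimal sketch of splitting a pointer, assuming standard RFC 6901 escaping (`~1` for `/`, `~0` for `~`). `parsePointer` is a hypothetical helper for illustration, not part of TypeBox.

```ts
// Hypothetical helper (not a TypeBox API): split an RFC 6901 JSON Pointer
// into unescaped reference tokens.
function parsePointer(path: string): string[] {
  if (path === "") return []; // the empty pointer refers to the whole document
  if (path.charAt(0) !== "/") {
    throw new Error(`Invalid JSON Pointer: ${path}`);
  }
  return path
    .slice(1)
    .split("/")
    // Order matters: decode "~1" before "~0" so that "~01" becomes "~1", not "/".
    .map((segment) => segment.replace(/~1/g, "/").replace(/~0/g, "~"));
}

const segments = parsePointer("/properties/status/anyOf/0/const");
// segments: ["properties", "status", "anyOf", "0", "const"]
```

The second segment (`status` here) is the property name a classifier would look up in the schema.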
`Value.Patch(current, edits)` applies the edits to `current` and returns the
result. `Value.Diff` and `Value.Patch` are inverses:
`Patch(current, Diff(current, next))` ≡ `next`.
**Critical: `Edit` paths are structural, not semantic.** An `Update` at path
`/properties/status` could mean type narrowing (`String` → `Literal`) or type
change (`String` → `Number`). Diff doesn't know; it only sees the raw JSON
structure. Classification logic (breaking vs non-breaking) must interpret the
edits with schema awareness.
## The Event-Sourced Context
The @alkdev ecosystem uses an append-only event model:
- **Call protocol** (`@alkdev/operations`): 5 typed events (`call.requested`,
`call.responded`, `call.completed`, `call.aborted`, `call.error`) form an
append-only event stream per workflow execution
- **Event log as source of truth** (`@alkdev/flowgraph` ADR-005): The in-memory
`CallEventMapValue[]` is the authoritative state; projected views (status,
results, call graph) are derived from it
- **Hub persistence**: The hub persists call protocol events to Postgres and
replays them to reconstruct reactive state after restart
- **Storage as projection**: The metagraph tables store the projected state —
graph instances, nodes, edges — not the raw event stream
This means storage's graph data is analogous to a read model in event-sourced
systems. Schema evolution in storage is projection migration, not event
migration. The event stream (managed by the hub) retains full history;
storage's tables hold the current materialized view.
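The projection relationship can be sketched as a pure fold over the event log. The event shape below is an assumption for illustration: only the five event names come from the call protocol, and the real payloads in `@alkdev/operations` carry more fields. `project` and `Projection` are hypothetical names.

```ts
// Assumed, simplified event shape; only the event names are from the call protocol.
type CallEventType =
  | "call.requested"
  | "call.responded"
  | "call.completed"
  | "call.aborted"
  | "call.error";

interface CallEvent {
  type: CallEventType;
  requestId: string;
}

// Projected read model: the latest event type seen per call.
type Projection = Map<string, CallEventType>;

// Pure projector: fold the append-only log into a fresh projection.
function project(events: readonly CallEvent[]): Projection {
  const projection: Projection = new Map();
  for (const event of events) {
    projection.set(event.requestId, event.type);
  }
  return projection;
}

const log: CallEvent[] = [
  { type: "call.requested", requestId: "req-001" },
  { type: "call.requested", requestId: "req-002" },
  { type: "call.completed", requestId: "req-001" },
];

// Replaying the same log rebuilds the same projection every time.
const first = project(log);
const second = project(log);
```

Because the projector never mutates the log, the projection can be discarded and rebuilt whenever the projector changes.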
### Single-Author, Not CRDT
Unlike Yjs or Automerge, the @alkdev event model is single-author per session
with central coordination. There are no concurrent multi-author edit conflicts.
This means:
- **Idempotent replay** (same events → same state) is sufficient; CRDT merge
semantics are not needed
- **Schema evolution** is forward-only (new code processes old data), not
bidirectional (concurrent versions merging)
- **Migration** is apply-on-read or apply-on-write, not conflict resolution
If future use cases require multi-author concurrent graph editing (e.g.,
collaborative task boards), a CRDT layer would need to sit between the event
stream and storage. That's a post-v1 concern.
## Schema-as-JSON: Value.Diff on Schemas
TypeBox schemas are JSON Schema objects — plain JSON values. The current
`node_types.schema` and `edge_types.schema` columns store them as JSON text
(SQLite) or jsonb (PG). This means `Value.Diff` can diff schemas themselves.
### Detecting Schema Changes
```ts
import { eq } from "drizzle-orm";
import { Value } from "@alkdev/typebox";

// Stored schema (from DB). Await the query before reading `.schema`.
const storedRow = await db.query.nodeTypes.findFirst({
  where: eq(nodeTypes.name, "Call"),
});
const storedSchema = storedRow?.schema;

// Current schema (from Module)
const currentSchema = CallGraph.CallNode;

// Diff the schemas
const edits = Value.Diff(storedSchema, currentSchema);
if (edits.length === 0) {
  // No change: schema matches
} else {
  // Schema has changed: classify the edits
}
```
### Classifying Schema Edits
`Value.Diff` is schema-agnostic — it diffs raw JSON structure without
understanding JSON Schema semantics. A classification layer is needed to
determine whether an edit is breaking or non-breaking.
| Edit pattern | Schema-level meaning | Breaking? |
|---|---|---|
| Insert new property with `default` | New field with default | No |
| Insert new property without `default` | New required field — old data won't have it | Yes (unless `Type.Optional`) |
| Insert new property with `Optional` wrapper | New optional field — old data valid | No |
| Update property type: `String` → `Literal("x")` | Type narrowing (subtype) | No, provided every stored value is already `"x"`; any other stored value fails validation |
| Update property type: `String` → `Number` | Type change (incompatible) | Yes |
| Update property type: add `Optional` wrapper | Making field optional | No (backward compatible) |
| Delete property | Field removed from schema | Yes (old data with this field is non-conforming if `additionalProperties: false`) |
| Update `$defs` references | Cross-reference changes | Depends — see note |
**⚠️ Needs POC**: The classification above is theoretical. Whether this can be
done reliably from `Edit[]` objects (which are raw JSON pointer + value pairs)
needs validation. POC scope:
- **Test corpus**: ~20 schema change patterns covering each row in the table
above (add optional field, add required field, narrow type, change type,
remove field, add enum value, `$ref` change)
- **Success criteria**: Classification accuracy >95% against expected
breaking/non-breaking labels for each pattern
- **Fallback**: If classification accuracy is too low, use Strategy C (hybrid
with `Value.Check` verification) instead of Strategy A.
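As a starting point for the POC, the property-level rows of the table could be encoded as follows. This is a sketch under stated assumptions, not a validated classifier: it assumes a newly added property shows up as an insert at `/properties/<name>` carrying the property's schema as `value`, and it deliberately returns `"unknown"` for anything it cannot judge. `classifyEdit`, `Verdict`, and `ObjectSchemaJson` are hypothetical names. The `Edit` type is redeclared locally to keep the example self-contained.

```ts
type Edit =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

// Minimal view of an object schema as stored JSON.
interface ObjectSchemaJson {
  properties: Record<string, unknown>;
  required?: string[];
}

type Verdict = "non-breaking" | "breaking" | "unknown";

function classifyEdit(edit: Edit, next: ObjectSchemaJson): Verdict {
  // Only top-level property edits are judged; nested edits need Value.Check.
  const match = /^\/properties\/([^/]+)$/.exec(edit.path);
  if (match === null) return "unknown";
  const name = match[1] ?? "";

  if (edit.type === "delete") return "breaking"; // field removed from schema
  if (edit.type === "update") return "unknown"; // narrowing vs. change: cannot tell

  // Insert: breaking only if the new field is required and has no default.
  const isRequired = (next.required ?? []).indexOf(name) !== -1;
  if (!isRequired) return "non-breaking";
  const value = edit.value;
  const hasDefault = typeof value === "object" && value !== null && "default" in value;
  return hasDefault ? "non-breaking" : "breaking";
}
```

Edits classified as `"unknown"` are the cases where the fallback to `Value.Check` verification applies.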
### Three Detection Strategies
**Strategy A: Diff schemas, classify edits**
```
Value.Diff(storedSchema, currentSchema) → Edit[]
classify(Edit[]) → { breaking: boolean, edits: ClassifiedEdit[] }
```
Pros: Identifies *what* changed and *how*. Enables targeted migration (only
migrate data affected by breaking changes).
Cons: Classification is hard to get right. Schema semantics are rich (type
narrowing, union widening, `$ref` changes, `additionalProperties` interactions).
**Strategy B: Diff schemas, test data compatibility**
```
Value.Diff(storedSchema, currentSchema) → Edit[]
if (edits.length > 0) {
  // Schema changed: test if existing data is still valid
  const sampleData = fetchSampleNodeData(graphTypeId, nodeTypeName);
  const compatible = sampleData.every(d => Value.Check(currentSchema, d));
}
```
Pros: Simple. No classification logic needed. Works with any schema change.
Cons: Requires fetching data to test. Binary answer (compatible or not) — no
information about *what* changed. Sample may not be representative.
**Strategy C: Hybrid — diff for detection, Check for verification**
```
edits = Value.Diff(storedSchema, currentSchema)
if (edits.length === 0) return "unchanged";
// Schema changed. Fast-path: check if version bump covers it.
// storedVersion = graph_types.version from DB
// currentVersion = consumer-defined constant (e.g., CURRENT_CALL_GRAPH_VERSION = 2)
if (storedVersion < currentVersion) return "version-mismatch";
// No version bump but schema changed — non-breaking change expected.
// Verify by checking stored data against new schema.
// findIncompatibleNodes = repository query that returns nodes
// where Value.Check(currentSchema, node.attributes) is false
const incompatible = await findIncompatibleNodes(graphTypeId, nodeTypeName, currentSchema);
if (incompatible.length === 0) return "non-breaking";
return "breaking";
```
`findIncompatibleNodes` is a repository query function: fetch nodes of the
given type in the given graph, filter to those where
`!Value.Check(currentSchema, node.attributes)`. `currentVersion` is a
consumer-defined integer constant that the consumer increments when making
a breaking change to their graph type Module.
This combines detection (diff), metadata (version), and verification (check)
and is likely the most robust approach. **Recommended for POC exploration.**
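The decision flow of Strategy C can be written as a typed function with its dependencies injected, which keeps the logic testable without TypeBox or a database. This is a sketch only: `detectSchemaChange`, `DetectionInput`, and the toy `diff`/`check` stand-ins are hypothetical. In real use, `diff` and `check` would wrap `Value.Diff` and `Value.Check`, and `storedData` would come from the repository query.

```ts
type Detection = "unchanged" | "version-mismatch" | "non-breaking" | "breaking";

// All names here are illustrative; dependencies are passed in explicitly.
interface DetectionInput<S, D> {
  storedSchema: S;
  currentSchema: S;
  storedVersion: number;
  currentVersion: number;
  storedData: readonly D[];
  diff: (stored: S, current: S) => readonly unknown[]; // e.g. Value.Diff
  check: (schema: S, value: D) => boolean; // e.g. Value.Check
}

function detectSchemaChange<S, D>(input: DetectionInput<S, D>): Detection {
  const edits = input.diff(input.storedSchema, input.currentSchema);
  if (edits.length === 0) return "unchanged";
  if (input.storedVersion < input.currentVersion) return "version-mismatch";
  // No version bump: verify every stored value against the new schema.
  const allValid = input.storedData.every((value) =>
    input.check(input.currentSchema, value),
  );
  return allValid ? "non-breaking" : "breaking";
}

// Toy stand-ins for Value.Diff and Value.Check, just to exercise the flow.
type ToySchema = { required: string[] };
type ToyData = Record<string, unknown>;
const toyDiff = (a: ToySchema, b: ToySchema): unknown[] =>
  JSON.stringify(a) === JSON.stringify(b) ? [] : [{ type: "update", path: "/required" }];
const toyCheck = (schema: ToySchema, value: ToyData): boolean =>
  schema.required.every((key) => key in value);

const result = detectSchemaChange<ToySchema, ToyData>({
  storedSchema: { required: ["requestId"] },
  currentSchema: { required: ["requestId", "priority"] },
  storedVersion: 2,
  currentVersion: 2,
  storedData: [{ requestId: "req-001" }],
  diff: toyDiff,
  check: toyCheck,
});
// result is "breaking": the stored row lacks the newly required "priority".
```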
## Data Migration via Value.Cast
Once a schema change is detected and classified, `Value.Cast` can migrate
existing data to fit the new schema shape.
### How Value.Cast Works
`Value.Cast(schema, value)` attempts to fit a value into the shape defined by
the schema:
- **Matching properties**: Retained from the original value
- **Missing required properties with defaults**: Filled from `schema.default`
- **Missing required properties without defaults**: Created as zeros (`0`,
`""`, `false`, `{}`, `[]`)
- **Unknown properties**: Dropped if `additionalProperties: false`, retained
otherwise
- **Union types**: Each variant is scored and the best match is selected
This is exactly what's needed for data migration — reshape stored node
attributes to fit a new schema while preserving matching fields.
### Example: Adding a Required Field with Default
```ts
// Schema v1 (stored in DB)
const CallNodeV1 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
});

// Schema v2 (new Module entry, with an added priority field)
const CallNodeV2 = Type.Object({
  requestId: Type.String(),
  operationId: Type.String(),
  status: Type.Union([...]),
  priority: Type.Number({ default: 0 }), // new field with default
});

// Existing data
const oldNode = {
  requestId: "req-001",
  operationId: "op-call",
  status: "completed",
};

// Cast migrates the data
const migratedNode = Value.Cast(CallNodeV2, oldNode);
// → { requestId: "req-001", operationId: "op-call", status: "completed", priority: 0 }
```
### Example: Type Narrowing (Non-Breaking)
```ts
// Schema v1: type: Type.String()
// Schema v2: type: Type.Literal("triggered")
const oldEdge = { type: "triggered", metadata: {} };
const migrated = Value.Cast(CallGraphV2.TriggeredEdge, oldEdge);
// → { type: "triggered", metadata: {} } (unchanged — "triggered" satisfies Literal)
```
### Example: Type Change (Breaking)
```ts
// Schema v1: priority: Type.Number()
// Schema v2: priority: Type.Union([Type.Literal("low"), Type.Literal("high")])
const oldNode = { requestId: "req-001", priority: 3 };
const migrated = Value.Cast(CallNodeV2, oldNode);
// Cast scores each union variant against the value.
// Neither "low" nor "high" matches 3 — Cast picks the best match,
// but the result is likely incorrect (data loss).
```
**⚠️ Cast limitation**: `Value.Cast` does not provide custom migration
functions. For breaking changes that require transformation logic (e.g.,
`priority: 3` → `priority: "high"` based on a threshold), the repository layer
needs a custom migration handler. Cast handles the common case (add fields,
narrow types, drop removed fields); custom logic handles the rest.
## Schema-as-JSON: Patching Stored Schemas
When a graph type Module changes, the stored schema in `node_types.schema` must
be updated. `Value.Patch` can apply the diff to the stored schema:
```ts
// 1. Diff current stored schema against new Module entry
const edits = Value.Diff(storedSchema, CallGraph.CallNode);

// 2. Patch the stored schema
const newStoredSchema = Value.Patch(storedSchema, edits);

// 3. Update the DB row
await db.update(nodeTypes)
  .set({ schema: newStoredSchema, updatedAt: new Date() })
  .where(eq(nodeTypes.name, "Call"));
```
This is simpler than reconstructing the full Module and running
`moduleToDbSchema()` again, but it carries the same caveat as `Value.Diff`:
the edits are structural, not semantic. If the Module entry's `$defs` structure
changed (e.g., a `Type.Ref` target was updated), the diff may not capture the
semantic change correctly.
**Alternative**: Re-run `moduleToDbSchema()` on the updated Module and write the
full output. This is simpler and more reliable but requires the full Module —
not just the schema entry — to be available at migration time.
**Decision for v1**: Re-run `moduleToDbSchema()` on the updated Module. The
Module is always available when the consumer defines graph types. Patch-based
schema update is an optimization for later.
## Evolution Strategies
### Additive-Only (Recommended for v1)
The simplest strategy: only add optional fields and new node/edge types. Never
remove or rename existing fields.
- **New optional fields**: `Type.Optional(Type.String())` — old data is still
valid under the new schema
- **New node/edge types**: New rows in `node_types`/`edge_types` — existing
rows unchanged
- **New enum values**: Add to `Type.Union` of `Type.Literal` — old data with
existing values is still valid
This avoids the need for `Value.Cast` migrations entirely, and the `version`
field on `graph_types` does not need to change.
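The field-level part of the additive-only rule can be guarded with a coarse check on the stored and current object schemas. The sketch below is an assumption-laden illustration, not a complete validator: it only verifies that no property disappears and that no property becomes newly required. It does not inspect type changes on existing properties or enum values. `isAdditiveChange` and `ObjectSchemaJson` are hypothetical names.

```ts
// Minimal view of an object schema as stored JSON.
interface ObjectSchemaJson {
  properties: Record<string, unknown>;
  required?: string[];
}

function isAdditiveChange(previous: ObjectSchemaJson, next: ObjectSchemaJson): boolean {
  const previousRequired = previous.required ?? [];
  const nextRequired = next.required ?? [];
  // No existing field may be removed.
  const keepsAllFields = Object.keys(previous.properties).every(
    (name) => name in next.properties,
  );
  // No field may become newly required, so new fields must be optional.
  const noNewRequired = nextRequired.every(
    (name) => previousRequired.indexOf(name) !== -1,
  );
  return keepsAllFields && noNewRequired;
}

const v1: ObjectSchemaJson = {
  properties: { requestId: { type: "string" } },
  required: ["requestId"],
};
const v2: ObjectSchemaJson = {
  properties: { requestId: { type: "string" }, note: { type: "string" } },
  required: ["requestId"],
};
const additive = isAdditiveChange(v1, v2); // true: "note" is new and optional
```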
**When additive-only breaks down**: If a field was incorrectly designed (wrong
type, wrong name, wrong semantics), additive-only forces you to deprecate the
old field and add a new one. The deprecated field stays in the schema forever.
This is acceptable for early development but creates technical debt over time.
### Version-Bumped with Cast Migration
When a breaking change is needed, bump the `graph_types.version` integer. The
repository layer checks the version before processing and applies `Value.Cast`
to migrate data when the version doesn't match.
```ts
const graphType = await findGraphType("call-graph");
const currentModule = CallGraph;
if (graphType.version < CURRENT_CALL_GRAPH_VERSION) {
  // Mark migration in progress (odd version)
  await updateGraphTypeVersion(graphType.id, graphType.version + 1);

  // Schema has changed: migrate all nodes
  const nodes = await findNodesByGraphType(graphType.id);
  for (const node of nodes) {
    const nodeSchema = currentModule[`${node.nodeType}Node`];
    const migrated = Value.Cast(nodeSchema, node.attributes);
    // Guard: verify the cast result before writing
    if (!Value.Check(nodeSchema, migrated)) {
      throw new Error(
        `Cast produced invalid data for node ${node.id}. ` +
        `Custom migration required for this schema change.`
      );
    }
    await updateNode(node.id, { attributes: migrated });
  }

  // Update the schema and finalize the version at the next even number
  await updateNodeTypesSchemas(graphType.id, moduleToDbSchema(currentModule));
  await updateGraphTypeVersion(graphType.id, graphType.version + 2);
}
```
**Version bump contract** (decision: even/odd scheme):
- **Even version**: Stable schema, no pending migrations
- **Odd version**: Migration in progress — reads return stale data or error
(consumer-configurable)
- **After migration**: Version is bumped to the next even number
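The even/odd contract amounts to a small state machine over the version integer. A minimal sketch, with hypothetical helper names (`isStable`, `beginMigration`, `completeMigration`):

```ts
// Even versions are stable; odd versions mean a migration is in progress.
function isStable(version: number): boolean {
  return version % 2 === 0;
}

// Stable (even) -> in progress (odd).
function beginMigration(version: number): number {
  if (!isStable(version)) {
    throw new Error(`Migration already in progress at version ${version}`);
  }
  return version + 1;
}

// In progress (odd) -> next stable (even).
function completeMigration(version: number): number {
  if (isStable(version)) {
    throw new Error(`No migration in progress at version ${version}`);
  }
  return version + 1;
}

const inProgress = beginMigration(2); // 3
const finished = completeMigration(inProgress); // 4
```

Rejecting `beginMigration` on an odd version is what lets a restart distinguish an interrupted migration from a fresh one.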
### Migration Safety
Partial migration is the primary risk — if the process crashes or errors on
node N of M, some nodes are migrated and some are not. The even/odd version
scheme provides the recovery mechanism:
1. **Before migration**: Set `version` to odd (in-progress state)
2. **During migration**: Each node is cast, verified with `Value.Check`, then
written. If any cast produces invalid data, the migration aborts with an
error — the consumer must provide a custom migration function for that
schema change.
3. **After migration**: Set `version` to even (stable state), update stored
schemas via `moduleToDbSchema()`
4. **Recovery**: If the process crashes during migration, the odd version
signals that migration is incomplete. On restart, the repository layer
detects the odd version and can either resume the migration (migrate the
remaining nodes) or roll back (restore from backup / re-project from events).
**Read behavior during migration**: Consumer-configurable. Options:
- Reject reads with an error ("migration in progress")
- Return stale data (from unmigrated nodes)
- Return mixed data (some migrated, some not — not recommended)
**Cast safety guard**: Every `Value.Cast` result must be verified with
`Value.Check(newSchema, migratedData)` before writing. If the check fails,
the migration is classified as breaking after all, and the consumer must
provide a custom migration function. This is a hard requirement.
**Performance**: For large datasets (10K+ nodes), the migration loop should
batch reads/writes to avoid holding all nodes in memory. The graph type is
effectively read-only during migration — writes to that graph type's nodes
should be rejected while `version` is odd.
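The batching and the cast guard can be kept separate from I/O, which makes both easy to test. The sketch below is illustrative only: `migrateBatch` and `batchRanges` are hypothetical helpers, and `cast`/`check` stand in for `Value.Cast` and `Value.Check` bound to the new schema. The caller would page through the table with the ranges, migrate each batch, and write it back before fetching the next.

```ts
interface NodeRow {
  id: string;
  attributes: unknown;
}

// Cast and verify one batch; throws on the first row that fails the guard.
function migrateBatch(
  batch: readonly NodeRow[],
  cast: (attributes: unknown) => unknown,
  check: (attributes: unknown) => boolean,
): NodeRow[] {
  return batch.map((node) => {
    const attributes = cast(node.attributes);
    if (!check(attributes)) {
      throw new Error(`Cast produced invalid data for node ${node.id}`);
    }
    return { id: node.id, attributes };
  });
}

// Compute [offset, limit] pairs so only one batch is held in memory at a time.
function batchRanges(total: number, batchSize: number): Array<[number, number]> {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let offset = 0; offset < total; offset += batchSize) {
    ranges.push([offset, Math.min(batchSize, total - offset)]);
  }
  return ranges;
}

const ranges = batchRanges(25000, 10000);
// ranges: [[0, 10000], [10000, 10000], [20000, 5000]]
```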
This is the simplest versioning that handles breaking changes. It's
application-level (the consumer decides when to bump and migrate), not
framework-level (storage doesn't auto-migrate). `graph_types.version` is
consistent with `@alkdev/flowgraph` ADR-004: flowgraph itself doesn't version
schemas (it's in-memory), but the persisting consumer (storage) provides the
versioned envelope.
### Event-Sourced Replay (Forward-Looking)
In a fully event-sourced model, schema evolution is handled by replaying events
through updated projector functions. Storage's tables are rebuilt from the
event log, so no data migration is needed — you just re-project.
This requires the event log to be the authoritative source and storage to be
a disposable projection. The hub's call graph persistence (Postgres) is
approaching this model: events are persisted, and state is reconstructed by
replay. But the current metagraph tables are not rebuilt from events — they're
written directly by the repository layer.
If the hub migrates to a model where call graph nodes/edges are projections
of the call protocol event stream, then schema evolution becomes projector
evolution — update the projector, replay the events, rebuild the projection.
No `Value.Cast` needed for stored data (because there is no stored data — just
re-projected views).
**This is a post-v1 design** — it requires the event log to be the primary
persistence, with storage tables as read-optimized projections. The current
model writes directly to storage tables; the event stream is separate (managed
by the hub's call graph module, not by `@alkdev/storage`).
## Relationship to the Module
The `Metagraph` Module ([metagraph-module.md](./metagraph-module.md)) is the
source of truth for graph type schemas. When the Module changes:
1. **`moduleToDbSchema()`** produces the updated DB row values
2. **`Value.Diff(storedSchema, moduleEntry)`** detects what changed
3. **`Value.Cast(moduleEntry, storedData)`** migrates affected node/edge data
4. **`Value.Patch(storedSchema, edits)`** updates the stored schema (or
re-run `moduleToDbSchema()`)
The Module's `$defs` structure adds complexity to diffing — a `Type.Ref`
resolution changes the effective schema in ways that a raw JSON diff might not
capture. Using `moduleToDbSchema()` (which resolves refs before writing to the
DB) avoids this problem — the stored schema is already dereferenced.
## Design Decisions
### SE1: Additive-only for v1, Cast migration when needed
For v1, schema changes should be additive (new optional fields, new types,
new enum values). This avoids data migration entirely. When additive-only is
insufficient, `Value.Cast` handles the common migration cases. Custom
migration functions are the consumer's responsibility.
### SE2: Version as a coarse-grained breaking-change signal
The `version` integer on `graph_types` tracks **breaking** schema changes.
Non-breaking changes (additive) do not require a version bump. This is a
coarse signal — the repository layer checks version before processing and
knows to run migration logic when it doesn't match.
### SE3: Schema change detection via Value.Diff, not manual tracking
Rather than maintaining a separate "schema version log" or changelog, the
repository layer uses `Value.Diff(storedSchema, moduleEntry)` to detect when
a stored schema has diverged from the current Module entry. This is
schema-agnostic and works for any change.
### SE4: moduleToDbSchema() for schema updates, not Value.Patch
When updating stored schemas, re-run `moduleToDbSchema()` on the full Module
rather than using `Value.Patch` to apply edits. This is more reliable because
it doesn't depend on Diff correctly capturing `Type.Ref`/`$defs` changes.
Patch-based schema update is an optimization for later.
### SE5: Single-author model, not CRDT
Schema evolution assumes single-author per graph type. There is no concurrent
multi-author editing of graph types. If this changes (multiple consumers
defining the same graph type with different schemas), a merge/CRDT layer would
be needed. That's a post-v1 concern.
## Open Questions
1. **Can `Edit[]` from `Value.Diff` be reliably classified as breaking vs
non-breaking?** The classification table above is theoretical. A POC should
validate whether the `Edit[]` output contains enough information to
distinguish, for example, `String → Literal("x")` (narrowing, non-breaking)
from `String → Number` (incompatible, breaking). Alternative: skip
classification and just use `Value.Check(newSchema, storedData)` for
verification.
2. **Should the repository layer auto-migrate data on schema change, or
require explicit consumer action?** Auto-migration is simpler for consumers
but risky (data transformation without consumer awareness). Explicit
migration is safer but more boilerplate. **Decision (conditional on OQ1
POC outcome):** if classification is feasible (OQ1 POC succeeds), the
repository layer auto-applies `Value.Cast` for changes it classifies as
non-breaking, and requires explicit consumer action for breaking changes.
If classification is not feasible, the fallback is: the repository layer
auto-applies `Value.Cast` only when `Value.Check(newSchema, storedData)`
passes for all stored data (verification, not classification), and requires
explicit consumer action otherwise. This ensures auto-migration never
corrupts data — if in doubt, the consumer decides.
3. **How does this interact with the hub's event-sourced call graph
persistence?** If the hub migrates to event-sourced replay (projector
evolution), storage's call graph tables become disposable projections and
`Value.Cast` migration is unnecessary. But other graph types (ACL, tasks,
secrets) may not have an event stream to replay from. The schema evolution
design should work for both projections and direct-persisted data.
4. **Should schema evolution events be part of the event stream?** If the
system is event-sourced, schema changes themselves could be events
(`schema.updated`, `schema.version_bumped`). This would give a full audit
trail of schema evolution, but adds complexity. **Decision: post-v1.** For
v1, schema changes are applied directly via the repository layer with version
tracking.
## References
- TypeBox `Value` namespace: `/workspace/@alkdev/typebox/src/value/`
- TypeBox `Value.Diff`/`Value.Patch`: `/workspace/@alkdev/typebox/src/value/delta/delta.ts`
- TypeBox `Value.Cast`: `/workspace/@alkdev/typebox/src/value/cast/cast.ts`
- TypeBox `Value.Check`: `/workspace/@alkdev/typebox/src/value/check/check.ts`
- Event Log as Source of Truth (ADR-005): `/workspace/@alkdev/flowgraph/docs/architecture/decisions/005-event-log-as-source-of-truth.md`
- Call protocol: `/workspace/@alkdev/operations/docs/architecture/call-protocol.md`
- Metagraph Module: [metagraph-module.md](./metagraph-module.md)
- Current schema versioning: [metagraph.md](./metagraph.md)