add category theory research doc connecting UJSX to functors, adjunctions, and round-trip fidelity
This commit is contained in:
348
docs/research/category-theory-ujsx.md
Normal file
348
docs/research/category-theory-ujsx.md
Normal file
@@ -0,0 +1,348 @@
|
|||||||
|
# Category Theory and UJSX: Functors, Adjunctions, and Round-Trip Fidelity
|
||||||
|
|
||||||
|
**Date**: 2026-05-09
|
||||||
|
**Topic**: Formal foundations for the UJSX transform system, connecting categorical concepts to practical implementation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. The Big Picture
|
||||||
|
|
||||||
|
UJSX is a system for transforming trees between representations. It has:
|
||||||
|
- A universal tree model (`UElement` / `URoot` / `UPrimitive`)
|
||||||
|
- A `TransformRegistry` with bidirectional rules and a `direction` field
|
||||||
|
- A `HostConfig` reconciler pattern that renders trees into target formats
|
||||||
|
- TypeBox schemas that define the shape of valid trees at runtime
|
||||||
|
|
||||||
|
This document connects these practical pieces to their categorical foundations, and explains why those connections matter for correctness.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Categories We Work With
|
||||||
|
|
||||||
|
A **category** consists of objects (types) and morphisms (structure-preserving maps between types) with an identity morphism and composable morphisms.
|
||||||
|
|
||||||
|
### 2.1 The TypeBox Schema Category
|
||||||
|
|
||||||
|
Objects: TypeBox schema types (`TObject`, `TString`, `TUnion`, `TModule`, etc.)
|
||||||
|
Morphisms: Schema transformations — `Type.Optional(T)`, `Type.Ref("X")`, `Type.Composite([A, B])`, `Module.Import("X")`
|
||||||
|
Composition: Nesting and combining schemas
|
||||||
|
Identity: `Type.Ref("Self")` within a `Type.Module`
|
||||||
|
|
||||||
|
This is approximately a **cartesian closed category**:
|
||||||
|
- **Products** — `Type.Composite([A, B])` (intersection types)
|
||||||
|
- **Coproducts** — `Type.Union([A, B])` (union types)
|
||||||
|
- **Exponentials** — `Type.Function([...], R)` (function types as first-class objects)
|
||||||
|
- **Terminal object** — `Type.Void()` (every type has a unique map to it)
|
||||||
|
|
||||||
|
### 2.2 The Tree Category
|
||||||
|
|
||||||
|
Trees (ASTs, IRs, DOMs) form categories where:
|
||||||
|
- **Objects** are tree species: UJSX elements, mdast nodes, hast nodes, TypeBox schemas-as-trees, safetensor headers
|
||||||
|
- **Morphisms** are transforms: `ujsx → mdast`, `mdast → ujsx`, `ujsx → typebox`, `typebox → ujsx`, etc.
|
||||||
|
- **Composition** is pipeline composition: first transform A→B, then B→C
|
||||||
|
|
||||||
|
A UJSX tree (or mdast tree, or hast tree, or TypeBox Module-as-tree) is itself an object in a tree category. The `TransformRegistry.transform()` function with `next` continuation is a **recursive functor** — it maps each subtree while preserving parent-child structure.
|
||||||
|
|
||||||
|
### 2.3 The Host Category
|
||||||
|
|
||||||
|
Each `HostConfig` defines a rendering functor from the UJSX tree category to a target category:
|
||||||
|
|
||||||
|
```
|
||||||
|
HostConfig<string, string, Record<string, unknown>>
|
||||||
|
```
|
||||||
|
|
||||||
|
- Source category: UJSX tree category
|
||||||
|
- Target category: string instances (DOM nodes, markdown strings, TypeBox schema objects, binary buffers)
|
||||||
|
- Functor: `createHostRoot(config).render(tree)` maps UJSX elements to target instances
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Functors and the TransformRegistry
|
||||||
|
|
||||||
|
### 3.1 Transforms Are Functors
|
||||||
|
|
||||||
|
A **functor** `F: C → D` maps:
|
||||||
|
- Every object `A` in `C` to an object `F(A)` in `D`
|
||||||
|
- Every morphism `f: A → B` in `C` to a morphism `F(f): F(A) → F(B)` in `D`
|
||||||
|
- Such that `F(id_A) = id_{F(A)}` and `F(g ∘ f) = F(g) ∘ F(f)`
|
||||||
|
|
||||||
|
In UJSX terms:
|
||||||
|
|
||||||
|
| Categorical Concept | UJSX Implementation |
|
||||||
|
|---------------------|---------------------|
|
||||||
|
| Object in C | A UJSX tree conforming to some schema |
|
||||||
|
| Object in D | An mdast tree / hast tree / TypeBox schema |
|
||||||
|
| Functor F | A set of `TransformRule`s with `direction: "ujsx→mdast"` |
|
||||||
|
| `F(g ∘ f)` | The `next` continuation in `transform(node, ctx, next)` |
|
||||||
|
| `F(id_A)` | Identity transform (pass-through rule) |
|
||||||
|
|
||||||
|
The `next` parameter in `TransformRule.transform` is the key: it ensures that when you transform a subtree, you first transform the current node, then recursively transform children via `next`. This is the functoriality condition — the structure is preserved.
|
||||||
|
|
||||||
|
### 3.2 Direction Selects the Functor
|
||||||
|
|
||||||
|
The `Direction` type:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
type Direction = "ujsx→mdast" | "mdast→ujsx" | "ujsx→jpath" | "jpath→ujsx" | "ujsx→hast" | "hast→ujsx"
|
||||||
|
```
|
||||||
|
|
||||||
|
Each direction is a different functor. The pair `"ujsx→mdast"` and `"mdast→ujsx"` are potentially **adjoint functors**.
|
||||||
|
|
||||||
|
### 3.3 Adjunctions and Round-Trip Fidelity
|
||||||
|
|
||||||
|
Two functors `F: C → D` and `G: D → C` form an **adjunction** if there exists a natural bijection:
|
||||||
|
|
||||||
|
```
|
||||||
|
Hom_D(F(A), B) ≅ Hom_C(A, G(B))
|
||||||
|
```
|
||||||
|
|
||||||
|
In practice, this means: for every UJSX tree `a` and mdast tree `b`, there's a one-to-one correspondence between maps `F(a) → b` (mdast transforms) and maps `a → G(b)` (UJSX transforms).
|
||||||
|
|
||||||
|
The **round-trip property** is a stronger, more practical notion: for all `a` in C,
|
||||||
|
|
||||||
|
```
|
||||||
|
G(F(a)) ≅ a (unit of the adjunction)
|
||||||
|
```
|
||||||
|
|
||||||
|
and for all `b` in D,
|
||||||
|
|
||||||
|
```
|
||||||
|
F(G(b)) ≅ b (counit of the adjunction)
|
||||||
|
```
|
||||||
|
|
||||||
|
This is what we actually want to verify: if I transform a UJSX tree to mdast and back, I get something equivalent to what I started with.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. The Impedance Mismatch: ASG vs AST
|
||||||
|
|
||||||
|
### 4.1 The Problem
|
||||||
|
|
||||||
|
Not all transforms satisfy round-trip fidelity. The key failure mode is when we have an **Abstract Semantic Graph** (ASG) on one side and an **Abstract Syntax Tree** (AST) on the other.
|
||||||
|
|
||||||
|
An **AST** represents syntax — the textual structure of a format. A mdast tree is an AST. It has a `type` discriminant and positional information.
|
||||||
|
|
||||||
|
An **ASG** represents semantics — meaning that may not round-trip through syntax. A UJSX `<Table>` component is an ASG node. It represents the *idea* of a table, which can be rendered as:
|
||||||
|
- A markdown table (`| col1 | col2 |`)
|
||||||
|
- An HTML `<table>` element
|
||||||
|
- A grid of key-value pairs in plain text
|
||||||
|
- A data structure (headers + rows) in JSON
|
||||||
|
|
||||||
|
But the markdown table syntax cannot express all table semantics (no colspans in GFM, no per-cell formatting), so the round trip `Table → mdast → Table` may lose information.
|
||||||
|
|
||||||
|
### 4.2 Categorical Formulation
|
||||||
|
|
||||||
|
In category theory terms, this is a **reflection** rather than an equivalence:
|
||||||
|
|
||||||
|
- The functor `F: ASG → AST` (render abstract semantics to concrete syntax) is **surjective on objects** (every ASG node maps *somewhere* in AST-land)
|
||||||
|
- The functor `G: AST → ASG` (parse concrete syntax back to abstract semantics) is **not injective on objects** (some ASG constructs have no AST representation, and some map to "lossy" approximations)
|
||||||
|
|
||||||
|
The adjunction unit `η: id → G ∘ F` is not an isomorphism — `G(F(a))` may be a *simplification* of `a`. This is expected and correct.
|
||||||
|
|
||||||
|
### 4.3 Practical Examples
|
||||||
|
|
||||||
|
| ASG Node | AST Representation | Round-Trip? | Loss? |
|
||||||
|
|----------|-------------------|-------------|-------|
|
||||||
|
| `<Table cols={["Name","Age"]}>` | mdast `table` node | Yes | Column alignment only |
|
||||||
|
| `<Table cols={["Name","Age"]} colspan={{1:3}}>` | mdast `table` node | Partial | colspan lost (GFM limit) |
|
||||||
|
| `<Callout type="warning">` | mdast `blockquote` with marker | Partial | Structural equivalence, semantic loss |
|
||||||
|
| `<Formula>ax² + bx + c</Formula>` | Plain text `ax² + bx + c` | No | Layout/formatting lost |
|
||||||
|
| `<Code language="ts">` | mdast `code` with `lang="ts"` | Yes | None |
|
||||||
|
| `<Heading level={2}>` | mdast `heading` with `depth: 2` | Yes | None |
|
||||||
|
|
||||||
|
### 4.4 Implications for Transform Design
|
||||||
|
|
||||||
|
For ASG→AST transforms, we should:
|
||||||
|
1. **Acknowledge loss explicitly** — transform rules can declare their fidelity level (exact, structural, lossy)
|
||||||
|
2. **Preserve semantics through metadata** — `UElement.metadata` can carry information that the target format can't express, enabling partial recovery on the return trip
|
||||||
|
3. **Not attempt impossible round trips** — instead, design the return transform `G: AST → ASG` to produce the *best possible* ASG node, not a faithful reproduction
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Codegen and Functor Construction
|
||||||
|
|
||||||
|
### 5.1 From TypeBox Modules to Transform Rules
|
||||||
|
|
||||||
|
The `ts-to-module.ts` codegen produces `Type.Module({...})` definitions from TypeScript sources. We have modules for Unist, MDAST, and JPATH. Each module defines an object in the TypeBox Schema Category.
|
||||||
|
|
||||||
|
Given two modules (say `UJSX` and `Mdast`), a functor between them can be partially constructed by analyzing the structural correspondence:
|
||||||
|
|
||||||
|
- `UJSX.UniversalElement.type` is a `String` or `Function` → matches `Mdast` nodes with `type` discriminants
|
||||||
|
- `UJSX.UniversalProps.children` is `Array(UniversalNode)` → matches `Mdast.Parent.children`
|
||||||
|
- `UJSX.UniversalElement.props` → matches `Mdast` node-specific properties
|
||||||
|
|
||||||
|
The codegen can produce **skeleton transform rules** — rules where `match` predicates are derived from schema intersection, and `transform` functions are stubs waiting for implementation:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Auto-generated from schema correspondence analysis
|
||||||
|
registry.register({
|
||||||
|
name: "heading-ujsx-to-mdast",
|
||||||
|
direction: "ujsx→mdast",
|
||||||
|
match: (node) => isUElement(node) && node.type === "heading", // derived from UJSX schema
|
||||||
|
transform: (node, ctx, next) => {
|
||||||
|
// TODO: implement heading-specific transform
|
||||||
|
// node.props.level maps to mdast heading depth
|
||||||
|
// node.children maps to mdast heading children
|
||||||
|
throw new Error("Not implemented: heading transform");
|
||||||
|
},
|
||||||
|
priority: 1,
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5.2 From Type Definitions to Host Config Element Types
|
||||||
|
|
||||||
|
Each TypeBox `Type.XXX()` call in a module definition becomes a potential UJSX element type in a host config:
|
||||||
|
|
||||||
|
- `Type.Object({...})` → `<Object>` element
|
||||||
|
- `Type.String()` → `<String>` element
|
||||||
|
- `Type.Ref("X")` → `<Ref name="X" />` element
|
||||||
|
- `Type.Literal("heading")` → `<Literal value="heading" />` element
|
||||||
|
- `Type.Array(Type.Ref("Node"))` → `<Array><Ref name="Node" /></Array>`
|
||||||
|
|
||||||
|
This is a functor from the TypeBox Schema Category to the UJSX Element Category — every TypeBox constructor maps to a UJSX element, and TypeBox composition maps to UJSX tree composition.
|
||||||
|
|
||||||
|
### 5.3 The Proxy Insight: Call Trees and Definition Trees
|
||||||
|
|
||||||
|
The proxy experiment in `/research/typebox_research/ujsx/proxy.ts` traced TypeBox's internal call tree. The relationship:
|
||||||
|
|
||||||
|
```
|
||||||
|
Call tree (imperative): Definition tree (declarative):
|
||||||
|
Type.Object({ <Object name="MyType">
|
||||||
|
name: Type.String(), <property name="name"><String /></property>
|
||||||
|
... ...
|
||||||
|
}) </Object>
|
||||||
|
```
|
||||||
|
|
||||||
|
These are the same information, just in different traversal orders. The **call tree** is a depth-first execution trace; the **definition tree** is the resulting data structure. UJSX's declarative form unifies them — the JSX element IS the definition tree, and rendering it through a host produces the call tree.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Proof as Program: Verification via Generation
|
||||||
|
|
||||||
|
### 6.1 The Curry-Howard-Lambek Correspondence
|
||||||
|
|
||||||
|
The Curry-Howard-Lambek correspondence establishes:
|
||||||
|
|
||||||
|
| Logic | Programming | Category Theory |
|
||||||
|
|-------|------------|-----------------|
|
||||||
|
| Propositions | Types | Objects |
|
||||||
|
| Proofs | Programs | Morphisms |
|
||||||
|
| Implication A → B | Function type A → B | Exponential object B^A |
|
||||||
|
| Conjunction A ∧ B | Product type A × B | Product object |
|
||||||
|
| Disjunction A ∨ B | Union type A \| B | Coproduct object |
|
||||||
|
|
||||||
|
Under this correspondence: a TypeBox schema is a **proposition** (a specification of what valid data looks like). A value that passes `Value.Check(schema, data)` is a **proof** (evidence that the proposition holds for this data). A transform `f: SchemaA → SchemaB` is a **proof that if A-structured data exists, then B-structured data can be derived from it**.
|
||||||
|
|
||||||
|
### 6.2 Faker as Proof Generator
|
||||||
|
|
||||||
|
`typebox-schema-faker` generates random inhabitants of TypeBox schemas. In logical terms, it generates **witnesses** — concrete evidence that a type (proposition) is inhabited. This serves as:
|
||||||
|
|
||||||
|
1. **Model checking** — generate many instances of UJSX trees, transform them, validate the output against the target schema. If all generated instances pass, you have statistical confidence in the transform's correctness.
|
||||||
|
|
||||||
|
2. **Round-trip verification** — for adjoint functor pairs:
|
||||||
|
```
|
||||||
|
for (const x of generate(UJSXSchema)) {
|
||||||
|
const y = F(x) // ujsx → mdast
|
||||||
|
const x2 = G(y) // mdast → ujsx
|
||||||
|
assert(equivalent(x, x2, fidelityLevel))
|
||||||
|
}
|
||||||
|
```
|
||||||
|
This tests the adjunction laws for generated inhabitants.
|
||||||
|
|
||||||
|
3. **Schema coverage** — faker can be guided to hit edge cases (null children, empty arrays, deeply nested structures) that hand-written tests miss.
|
||||||
|
|
||||||
|
### 6.3 Compositional Verification
|
||||||
|
|
||||||
|
If transform `f: A → B` is verified and transform `g: B → C` is verified, then `g ∘ f: A → C` should also be correct by composition. The `composedOf` field from the graphology pattern ([`category-theory-graph.md`](./category-theory-graph.md)) can track this:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface TransformRule<T, U, A> {
|
||||||
|
name: string;
|
||||||
|
match: (node: T) => boolean;
|
||||||
|
transform: (node: T, ctx: TransformContext<A>, next: TransformFn<T, U, A>) => U;
|
||||||
|
priority?: number;
|
||||||
|
composedOf?: [string, string]; // provenance: which rules composed to produce this one
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This enables provenance tracking — every composed transform knows its derivation, enabling targeted re-verification when a base rule changes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. The TypeDef Bridge: Binary Layout as a Category
|
||||||
|
|
||||||
|
### 7.1 Standard TypeBox vs TypeDef
|
||||||
|
|
||||||
|
Standard TypeBox types represent **JSON values** — data serialized as JSON, validated as JSON Schema. The TypeDef system adds **binary layout semantics**:
|
||||||
|
|
||||||
|
| Standard TypeBox | TypeDef | Difference |
|
||||||
|
|-----------------|---------|-----------|
|
||||||
|
| `Type.Number()` | `TF.Float32()`, `TF.Int32()`, etc. | Fixed-width, specific encoding |
|
||||||
|
| `Type.String()` | `TF.String()` | Same, but in struct context has known offset |
|
||||||
|
| `Type.Object({...})` | `TF.Struct({...})` | Ordered fields, computable offsets |
|
||||||
|
| `Type.Union([...], 'type')` | `TF.Union([...], 'type')` | Tagged union with discriminant |
|
||||||
|
|
||||||
|
The `TypeDef:Kind` markers (`TypeDef:Float32`, `TypeDef:Struct`, etc.) act as **discriminants in a functor** — they map from the abstract number type to the concrete binary representation.
|
||||||
|
|
||||||
|
### 7.2 Offset Computation as a Functor
|
||||||
|
|
||||||
|
Given a `TF.Struct({ name: TF.String(), embedding: TF.Array(TF.Float32()) })`:
|
||||||
|
|
||||||
|
1. The **TypeBox functor** maps this to a JSON Schema that validates JSON objects with `name` (string) and `embedding` (number array).
|
||||||
|
2. The **Binary layout functor** maps this to a byte offset table: `name` at offset 0 (variable, preceded by length), `embedding` at next aligned offset, with shape `[768]` → `768 × 4 = 3072` bytes.
|
||||||
|
|
||||||
|
These are **two different functors from the same source category** (TypeDef schemas) to two different target categories (JSON validation and binary layout). They're naturally isomorphic in the sense that every TypeDef schema has both a JSON interpretation and a binary interpretation.
|
||||||
|
|
||||||
|
### 7.3 SafeTensor as a Container Category
|
||||||
|
|
||||||
|
The SafeTensor format is: `header_length (8 bytes) + header (JSON) + data (raw bytes)`
|
||||||
|
|
||||||
|
This is a **coproduct** (sum type): `SafeTensor = Header × Data`. The header contains `TensorRef` entries with `data_offsets: [start, end]` — these are morphisms from the header category to positions in the data category.
|
||||||
|
|
||||||
|
The metatensor concept adds `TF.Struct` entries alongside raw tensors, enabling a third functor: from TypeDef schemas to byte ranges within the data section.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. Summary: The Categorical Structure of UJSX
|
||||||
|
|
||||||
|
```
|
||||||
|
TypeBox Schema Category
|
||||||
|
│
|
||||||
|
┌──────┼──────┐
|
||||||
|
│ │ │
|
||||||
|
F₁ │ F₂ │ F₃ │
|
||||||
|
┌───┘ │ └───┐
|
||||||
|
▼ ▼ ▼
|
||||||
|
UJSX Tree MDAST Tree Binary Layout
|
||||||
|
Category Category Category
|
||||||
|
│ ▲
|
||||||
|
│ G₂ │
|
||||||
|
└──────────┘
|
||||||
|
│ F₂⁻¹ │
|
||||||
|
│ │
|
||||||
|
(adjoint pair: F₂ and G₂)
|
||||||
|
```
|
||||||
|
|
||||||
|
- **F₁** — TypeBox → UJSX host (renders type definitions as JSX elements)
|
||||||
|
- **F₂** — UJSX → MDAST transform (renders abstract components as markdown AST)
|
||||||
|
- **G₂** — MDAST → UJSX transform (parses markdown AST back to abstract components)
|
||||||
|
- **F₃** — TypeDef → Binary Layout (computes offsets and validates binary data)
|
||||||
|
|
||||||
|
**F₂ and G₂ should form an adjunction** — verified by round-trip property testing with `typebox-schema-faker`.
|
||||||
|
|
||||||
|
**F₁ is a faithful embedding** — TypeBox schemas embed into UJSX without information loss.
|
||||||
|
|
||||||
|
**F₃ is a projection** — TypeDef schemas project onto binary layouts, but not all binary layouts correspond to valid TypeDef schemas (lossy in the other direction).
|
||||||
|
|
||||||
|
**The ASG/AST impedance mismatch is a reflection, not an equivalence** — some ASG nodes have no faithful AST representation, and this is expected and correct.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. Practical Next Steps
|
||||||
|
|
||||||
|
1. **Fork `typebox-schema-faker`** into the alkdev ecosystem for model checking
|
||||||
|
2. **Add `composedOf` provenance tracking** to `TransformRule`
|
||||||
|
3. **Add `fidelity` levels** to transform rules (`"exact"` | `"structural"` | `"lossy"`)
|
||||||
|
4. **Implement the TypeBox host** — prove F₁ works before building F₂/F₃
|
||||||
|
5. **Use `ts-to-module.ts` codegen** to produce module definitions, then derive skeleton transform rules from schema correspondence analysis
|
||||||
|
6. **Design the TypeDef UJSX elements** (`<Struct>`, `<Float32>`, etc.) as a second host that computes binary offsets
|
||||||
Reference in New Issue
Block a user