ujsx/docs/research/category-theory-ujsx.md

# Category Theory and UJSX: Functors, Adjunctions, and Round-Trip Fidelity

**Date**: 2026-05-09
**Topic**: Formal foundations for the UJSX transform system, connecting categorical concepts to practical implementation

---

## 1. The Big Picture

UJSX is a system for transforming trees between representations. It has:
- A universal tree model (`UElement` / `URoot` / `UPrimitive`)
- A `TransformRegistry` with bidirectional rules and a `direction` field
- A `HostConfig` reconciler pattern that renders trees into target formats
- TypeBox schemas that define the shape of valid trees at runtime

This document connects these practical pieces to their categorical foundations, and explains why those connections matter for correctness.

---

## 2. Categories We Work With

A **category** consists of objects (types) and morphisms (structure-preserving maps between types) with an identity morphism and composable morphisms.

### 2.1 The TypeBox Schema Category

Objects: TypeBox schema types (`TObject`, `TString`, `TUnion`, `TModule`, etc.)
Morphisms: Schema transformations — `Type.Optional(T)`, `Type.Ref("X")`, `Type.Composite([A, B])`, `Module.Import("X")`
Composition: Nesting and combining schemas
Identity: `Type.Ref("Self")` within a `Type.Module`

This is approximately a **cartesian closed category**:
- **Products** — `Type.Composite([A, B])` (intersection types)
- **Coproducts** — `Type.Union([A, B])` (union types)
- **Exponentials** — `Type.Function([...], R)` (function types as first-class objects)
- **Terminal object** — `Type.Void()` (every type has a unique map to it)

### 2.2 The Tree Category

Trees (ASTs, IRs, DOMs) form categories where:
- **Objects** are tree species: UJSX elements, mdast nodes, hast nodes, TypeBox schemas-as-trees, safetensor headers
- **Morphisms** are transforms: `ujsx → mdast`, `mdast → ujsx`, `ujsx → typebox`, `typebox → ujsx`, etc.
- **Composition** is pipeline composition: first transform A→B, then B→C

A UJSX tree (or mdast tree, or hast tree, or TypeBox Module-as-tree) is itself an object in a tree category. The `TransformRegistry.transform()` function with `next` continuation is a **recursive functor** — it maps each subtree while preserving parent-child structure.

### 2.3 The Host Category

Each `HostConfig` defines a rendering functor from the UJSX tree category to a target category:

```
HostConfig<string, string, Record<string, unknown>>
```

- Source category: UJSX tree category
- Target category: string instances (DOM nodes, markdown strings, TypeBox schema objects, binary buffers)
- Functor: `createHostRoot(config).render(tree)` maps UJSX elements to target instances

---

## 3. Functors and the TransformRegistry

### 3.1 Transforms Are Functors

A **functor** `F: C → D` maps:
- Every object `A` in `C` to an object `F(A)` in `D`
- Every morphism `f: A → B` in `C` to a morphism `F(f): F(A) → F(B)` in `D`
- Such that `F(id_A) = id_{F(A)}` and `F(g ∘ f) = F(g) ∘ F(f)`

In UJSX terms:

| Categorical Concept | UJSX Implementation |
|---------------------|---------------------|
| Object in C | A UJSX tree conforming to some schema |
| Object in D | An mdast tree / hast tree / TypeBox schema |
| Functor F | A set of `TransformRule`s with `direction: "ujsx→mdast"` |
| `F(g ∘ f)` | The `next` continuation in `transform(node, ctx, next)` |
| `F(id_A)` | Identity transform (pass-through rule) |

The `next` parameter in `TransformRule.transform` is the key: it ensures that when you transform a subtree, you first transform the current node, then recursively transform children via `next`. This is the functoriality condition — the structure is preserved.

### 3.2 Direction Selects the Functor

The `Direction` type:

```typescript
type Direction = "ujsx→mdast" | "mdast→ujsx" | "ujsx→jpath" | "jpath→ujsx" | "ujsx→hast" | "hast→ujsx"
```

Each direction is a different functor. The pair `"ujsx→mdast"` and `"mdast→ujsx"` are potentially **adjoint functors**.

### 3.3 Adjunctions and Round-Trip Fidelity

Two functors `F: C → D` and `G: D → C` form an **adjunction** if there exists a natural bijection:

```
Hom_D(F(A), B) ≅ Hom_C(A, G(B))
```

In practice, this means: for every UJSX tree `a` and mdast tree `b`, there's a one-to-one correspondence between maps `F(a) → b` (mdast transforms) and maps `a → G(b)` (UJSX transforms).

The **round-trip property** is a stronger, more practical notion: for all `a` in C,

```
G(F(a)) ≅ a    (unit of the adjunction)
```

and for all `b` in D,

```
F(G(b)) ≅ b    (counit of the adjunction)
```

This is what we actually want to verify: if I transform a UJSX tree to mdast and back, I get something equivalent to what I started with.

---

## 4. The Impedance Mismatch: ASG vs AST

### 4.1 The Problem

Not all transforms satisfy round-trip fidelity. The key failure mode is when we have an **Abstract Semantic Graph** (ASG) on one side and an **Abstract Syntax Tree** (AST) on the other.

An **AST** represents syntax — the textual structure of a format. A mdast tree is an AST. It has a `type` discriminant and positional information.

An **ASG** represents semantics — meaning that may not round-trip through syntax. A UJSX `<Table>` component is an ASG node. It represents the *idea* of a table, which can be rendered as:
- A markdown table (`| col1 | col2 |`)
- An HTML `<table>` element
- A grid of key-value pairs in plain text
- A data structure (headers + rows) in JSON

But the markdown table syntax cannot express all table semantics (no colspans in GFM, no per-cell formatting), so the round trip `Table → mdast → Table` may lose information.

### 4.2 Categorical Formulation

In category theory terms, this is a **reflection** rather than an equivalence:

- The functor `F: ASG → AST` (render abstract semantics to concrete syntax) is **surjective on objects** (every ASG node maps *somewhere* in AST-land)
- The functor `G: AST → ASG` (parse concrete syntax back to abstract semantics) is **not injective on objects** (some ASG constructs have no AST representation, and some map to "lossy" approximations)

The adjunction unit `η: id → G ∘ F` is not an isomorphism — `G(F(a))` may be a *simplification* of `a`. This is expected and correct.

### 4.3 Practical Examples

| ASG Node | AST Representation | Round-Trip? | Loss? |
|----------|-------------------|-------------|-------|
| `<Table cols={["Name","Age"]}>` | mdast `table` node | Yes | Column alignment only |
| `<Table cols={["Name","Age"]} colspan={{1:3}}>` | mdast `table` node | Partial | colspan lost (GFM limit) |
| `<Callout type="warning">` | mdast `blockquote` with marker | Partial | Structural equivalence, semantic loss |
| `<Formula>ax² + bx + c</Formula>` | Plain text `ax² + bx + c` | No | Layout/formatting lost |
| `<Code language="ts">` | mdast `code` with `lang="ts"` | Yes | None |
| `<Heading level={2}>` | mdast `heading` with `depth: 2` | Yes | None |

### 4.4 Implications for Transform Design

For ASG→AST transforms, we should:
1. **Acknowledge loss explicitly** — transform rules can declare their fidelity level (exact, structural, lossy)
2. **Preserve semantics through metadata** — `UElement.metadata` can carry information that the target format can't express, enabling partial recovery on the return trip
3. **Not attempt impossible round trips** — instead, design the return transform `G: AST → ASG` to produce the *best possible* ASG node, not a faithful reproduction

---

## 5. Codegen and Functor Construction

### 5.1 From TypeBox Modules to Transform Rules

The `ts-to-module.ts` codegen produces `Type.Module({...})` definitions from TypeScript sources. We have modules for Unist, MDAST, and JPATH. Each module defines an object in the TypeBox Schema Category.

Given two modules (say `UJSX` and `Mdast`), a functor between them can be partially constructed by analyzing the structural correspondence:

- `UJSX.UniversalElement.type` is a `String` or `Function` → matches `Mdast` nodes with `type` discriminants
- `UJSX.UniversalProps.children` is `Array(UniversalNode)` → matches `Mdast.Parent.children`
- `UJSX.UniversalElement.props` → matches `Mdast` node-specific properties

The codegen can produce **skeleton transform rules** — rules where `match` predicates are derived from schema intersection, and `transform` functions are stubs waiting for implementation:

```typescript
// Auto-generated from schema correspondence analysis
registry.register({
  name: "heading-ujsx-to-mdast",
  direction: "ujsx→mdast",
  match: (node) => isUElement(node) && node.type === "heading",  // derived from UJSX schema
  transform: (node, ctx, next) => {
    // TODO: implement heading-specific transform
    // node.props.level maps to mdast heading depth
    // node.children maps to mdast heading children
    throw new Error("Not implemented: heading transform");
  },
  priority: 1,
});
```

### 5.2 From Type Definitions to Host Config Element Types

Each TypeBox `Type.XXX()` call in a module definition becomes a potential UJSX element type in a host config:

- `Type.Object({...})` → `<Object>` element
- `Type.String()` → `<String>` element
- `Type.Ref("X")` → `<Ref name="X" />` element
- `Type.Literal("heading")` → `<Literal value="heading" />` element
- `Type.Array(Type.Ref("Node"))` → `<Array><Ref name="Node" /></Array>`

This is a functor from the TypeBox Schema Category to the UJSX Element Category — every TypeBox constructor maps to a UJSX element, and TypeBox composition maps to UJSX tree composition.

### 5.3 The Proxy Insight: Call Trees and Definition Trees

The proxy experiment in `/research/typebox_research/ujsx/proxy.ts` traced TypeBox's internal call tree. The relationship:

```
Call tree (imperative):    Definition tree (declarative):
Type.Object({              <Object name="MyType">
  name: Type.String(),       <property name="name"><String /></property>
  ...                         ...
})                          </Object>
```

These are the same information, just in different traversal orders. The **call tree** is a depth-first execution trace; the **definition tree** is the resulting data structure. UJSX's declarative form unifies them — the JSX element IS the definition tree, and rendering it through a host produces the call tree.

---

## 6. Proof as Program: Verification via Generation

### 6.1 The Curry-Howard-Lambek Correspondence

The Curry-Howard-Lambek correspondence establishes:

| Logic | Programming | Category Theory |
|-------|------------|-----------------|
| Propositions | Types | Objects |
| Proofs | Programs | Morphisms |
| Implication A → B | Function type A → B | Exponential object B^A |
| Conjunction A ∧ B | Product type A × B | Product object |
| Disjunction A ∨ B | Union type A \| B | Coproduct object |

Under this correspondence: a TypeBox schema is a **proposition** (a specification of what valid data looks like). A value that passes `Value.Check(schema, data)` is a **proof** (evidence that the proposition holds for this data). A transform `f: SchemaA → SchemaB` is a **proof that if A-structured data exists, then B-structured data can be derived from it**.

### 6.2 Faker as Proof Generator

`typebox-schema-faker` generates random inhabitants of TypeBox schemas. In logical terms, it generates **witnesses** — concrete evidence that a type (proposition) is inhabited. This serves as:

1. **Model checking** — generate many instances of UJSX trees, transform them, validate the output against the target schema. If all generated instances pass, you have statistical confidence in the transform's correctness.

2. **Round-trip verification** — for adjoint functor pairs:
   ```
   for (const x of generate(UJSXSchema)) {
     const y = F(x)   // ujsx → mdast
     const x2 = G(y)  // mdast → ujsx
     assert(equivalent(x, x2, fidelityLevel))
   }
   ```
   This tests the adjunction laws for generated inhabitants.

3. **Schema coverage** — faker can be guided to hit edge cases (null children, empty arrays, deeply nested structures) that hand-written tests miss.

### 6.3 Compositional Verification

If transform `f: A → B` is verified and transform `g: B → C` is verified, then `g ∘ f: A → C` should also be correct by composition. The `composedOf` field from the graphology pattern ([`category-theory-graph.md`](./category-theory-graph.md)) can track this:

```typescript
interface TransformRule<T, U, A> {
  name: string;
  match: (node: T) => boolean;
  transform: (node: T, ctx: TransformContext<A>, next: TransformFn<T, U, A>) => U;
  priority?: number;
  composedOf?: [string, string];  // provenance: which rules composed to produce this one
}
```

This enables provenance tracking — every composed transform knows its derivation, enabling targeted re-verification when a base rule changes.

---

## 7. The TypeDef Bridge: Binary Layout as a Category

### 7.1 Standard TypeBox vs TypeDef

Standard TypeBox types represent **JSON values** — data serialized as JSON, validated as JSON Schema. The TypeDef system adds **binary layout semantics**:

| Standard TypeBox | TypeDef | Difference |
|-----------------|---------|-----------|
| `Type.Number()` | `TF.Float32()`, `TF.Int32()`, etc. | Fixed-width, specific encoding |
| `Type.String()` | `TF.String()` | Same, but in struct context has known offset |
| `Type.Object({...})` | `TF.Struct({...})` | Ordered fields, computable offsets |
| `Type.Union([...], 'type')` | `TF.Union([...], 'type')` | Tagged union with discriminant |

The `TypeDef:Kind` markers (`TypeDef:Float32`, `TypeDef:Struct`, etc.) act as **discriminants in a functor** — they map from the abstract number type to the concrete binary representation.

### 7.2 Offset Computation as a Functor

Given a `TF.Struct({ name: TF.String(), embedding: TF.Array(TF.Float32()) })`:

1. The **TypeBox functor** maps this to a JSON Schema that validates JSON objects with `name` (string) and `embedding` (number array).
2. The **Binary layout functor** maps this to a byte offset table: `name` at offset 0 (variable, preceded by length), `embedding` at next aligned offset, with shape `[768]` → `768 × 4 = 3072` bytes.

These are **two different functors from the same source category** (TypeDef schemas) to two different target categories (JSON validation and binary layout). They're naturally isomorphic in the sense that every TypeDef schema has both a JSON interpretation and a binary interpretation.

### 7.3 SafeTensor as a Container Category

The SafeTensor format is: `header_length (8 bytes) + header (JSON) + data (raw bytes)`

This is a **coproduct** (sum type): `SafeTensor = Header × Data`. The header contains `TensorRef` entries with `data_offsets: [start, end]` — these are morphisms from the header category to positions in the data category.

The metatensor concept adds `TF.Struct` entries alongside raw tensors, enabling a third functor: from TypeDef schemas to byte ranges within the data section.

---

## 8. Summary: The Categorical Structure of UJSX

```
                    TypeBox Schema Category
                           │
                    ┌──────┼──────┐
                    │      │      │
              F₁    │  F₂  │  F₃  │
                ┌───┘      │      └───┐
                ▼          ▼          ▼
          UJSX Tree    MDAST Tree   Binary Layout
          Category     Category     Category
                │          ▲
                │    G₂    │
                └──────────┘
                │    F₂⁻¹  │
                │          │
           (adjoint pair: F₂ and G₂)
```

- **F₁** — TypeBox → UJSX host (renders type definitions as JSX elements)
- **F₂** — UJSX → MDAST transform (renders abstract components as markdown AST)
- **G₂** — MDAST → UJSX transform (parses markdown AST back to abstract components)
- **F₃** — TypeDef → Binary Layout (computes offsets and validates binary data)

**F₂ and G₂ should form an adjunction** — verified by round-trip property testing with `typebox-schema-faker`.

**F₁ is a faithful embedding** — TypeBox schemas embed into UJSX without information loss.

**F₃ is a projection** — TypeDef schemas project onto binary layouts, but not all binary layouts correspond to valid TypeDef schemas (lossy in the other direction).

**The ASG/AST impedance mismatch is a reflection, not an equivalence** — some ASG nodes have no faithful AST representation, and this is expected and correct.

---

## 9. Practical Next Steps

1. **Fork `typebox-schema-faker`** into the alkdev ecosystem for model checking
2. **Add `composedOf` provenance tracking** to `TransformRule`
3. **Add `fidelity` levels** to transform rules (`"exact"` | `"structural"` | `"lossy"`)
4. **Implement the TypeBox host** — prove F₁ works before building F₂/F₃
5. **Use `ts-to-module.ts` codegen** to produce module definitions, then derive skeleton transform rules from schema correspondence analysis
6. **Design the TypeDef UJSX elements** (`<Struct>`, `<Float32>`, etc.) as a second host that computes binary offsets