Category Theory and UJSX: Functors, Adjunctions, and Round-Trip Fidelity
Date: 2026-05-09
Topic: Formal foundations for the UJSX transform system, connecting categorical concepts to practical implementation
1. The Big Picture
UJSX is a system for transforming trees between representations. It has:
- A universal tree model (UElement/URoot/UPrimitive)
- A TransformRegistry with bidirectional rules and a direction field
- A HostConfig reconciler pattern that renders trees into target formats
- TypeBox schemas that define the shape of valid trees at runtime
This document connects these practical pieces to their categorical foundations, and explains why those connections matter for correctness.
2. Categories We Work With
A category consists of objects (types) and morphisms (structure-preserving maps between types), with an identity morphism for every object and an associative composition of morphisms.
2.1 The TypeBox Schema Category
Objects: TypeBox schema types (TObject, TString, TUnion, TModule, etc.)
Morphisms: Schema transformations — Type.Optional(T), Type.Ref("X"), Type.Composite([A, B]), Module.Import("X")
Composition: Nesting and combining schemas
Identity: Type.Ref("Self") within a Type.Module
This is approximately a cartesian closed category:
- Products: Type.Composite([A, B]) (intersection types)
- Coproducts: Type.Union([A, B]) (union types)
- Exponentials: Type.Function([...], R) (function types as first-class objects)
- Terminal object: Type.Void() (every type has a unique map to it)
2.2 The Tree Category
Trees (ASTs, IRs, DOMs) form categories where:
- Objects are tree species: UJSX elements, mdast nodes, hast nodes, TypeBox schemas-as-trees, safetensor headers
- Morphisms are transforms: ujsx → mdast, mdast → ujsx, ujsx → typebox, typebox → ujsx, etc.
- Composition is pipeline composition: first transform A→B, then B→C
A UJSX tree (or mdast tree, or hast tree, or TypeBox Module-as-tree) is itself an object in a tree category. The TransformRegistry.transform() function with next continuation is a recursive functor — it maps each subtree while preserving parent-child structure.
2.3 The Host Category
Each HostConfig defines a rendering functor from the UJSX tree category to a target category:
HostConfig<string, string, Record<string, unknown>>
- Source category: UJSX tree category
- Target category: string instances (DOM nodes, markdown strings, TypeBox schema objects, binary buffers)
- Functor: createHostRoot(config).render(tree) maps UJSX elements to target instances
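To make the rendering functor concrete, here is a minimal sketch of a host that renders a tree to a markdown string. The MiniHost interface, the render function, and the node shape are hypothetical simplifications chosen for illustration; the real HostConfig and createHostRoot are assumed to expose more hooks than this.

```typescript
// Hypothetical, pared-down host interface; not the real HostConfig.
type UNode = string | { type: string; props: Record<string, unknown>; children: UNode[] };

interface MiniHost<Instance> {
  createText(text: string): Instance;
  createInstance(type: string, props: Record<string, unknown>, children: Instance[]): Instance;
}

// Children are rendered first, then handed to the host to build the parent instance.
function render<Instance>(host: MiniHost<Instance>, node: UNode): Instance {
  if (typeof node === "string") return host.createText(node);
  const children = node.children.map((child) => render(host, child));
  return host.createInstance(node.type, node.props, children);
}

// A host whose instances are plain markdown strings.
const markdownHost: MiniHost<string> = {
  createText: (text) => text,
  createInstance: (type, props, children) => {
    const body = children.join("");
    if (type === "heading") return "#".repeat(Number(props.level)) + " " + body + "\n";
    if (type === "paragraph") return body + "\n";
    return body; // unknown element types just concatenate their children
  },
};

const doc: UNode = {
  type: "root",
  props: {},
  children: [
    { type: "heading", props: { level: 2 }, children: ["Title"] },
    { type: "paragraph", props: {}, children: ["Hello"] },
  ],
};
const markdown = render(markdownHost, doc);
```

Swapping markdownHost for a host with a different Instance type changes the target category while the traversal stays the same.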
3. Functors and the TransformRegistry
3.1 Transforms Are Functors
A functor F: C → D maps:
- Every object A in C to an object F(A) in D
- Every morphism f: A → B in C to a morphism F(f): F(A) → F(B) in D
- Such that F(id_A) = id_{F(A)} and F(g ∘ f) = F(g) ∘ F(f)
In UJSX terms:
| Categorical Concept | UJSX Implementation |
|---|---|
| Object in C | A UJSX tree conforming to some schema |
| Object in D | An mdast tree / hast tree / TypeBox schema |
| Functor F | A set of TransformRules with direction: "ujsx→mdast" |
| F(g ∘ f) | The next continuation in transform(node, ctx, next) |
| F(id_A) | Identity transform (pass-through rule) |
The next parameter in TransformRule.transform is the key: it ensures that when you transform a subtree, you first transform the current node, then recursively transform children via next. This is the functoriality condition — the structure is preserved.
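The role of next can be sketched with a deliberately small dispatcher. The Rule shape, the transformTree helper, and the node types below are hypothetical stand-ins (no ctx, no priority), not the actual TransformRegistry API.

```typescript
// Hypothetical, simplified tree and rule shapes.
type UNode = { type: string; props: Record<string, unknown>; children: UNode[] };
type MdNode = { type: string; depth?: number; children: MdNode[] };

type Next = (child: UNode) => MdNode;
interface Rule {
  name: string;
  match: (node: UNode) => boolean;
  transform: (node: UNode, next: Next) => MdNode;
}

// Pick the first matching rule and give it a `next` that recurses with the same rule set.
function transformTree(rules: Rule[], node: UNode): MdNode {
  for (const rule of rules) {
    if (rule.match(node)) {
      return rule.transform(node, (child) => transformTree(rules, child));
    }
  }
  throw new Error("No rule matches node type: " + node.type);
}

const rules: Rule[] = [
  {
    name: "heading",
    match: (node) => node.type === "heading",
    // The rule handles the current node; children go through `next`.
    transform: (node, next) => ({
      type: "heading",
      depth: Number(node.props.level),
      children: node.children.map(next),
    }),
  },
  {
    name: "pass-through",
    match: () => true,
    // Identity-like rule: keep the node type and recurse into children.
    transform: (node, next) => ({ type: node.type, children: node.children.map(next) }),
  },
];

const input: UNode = {
  type: "root",
  props: {},
  children: [
    { type: "heading", props: { level: 2 }, children: [{ type: "text", props: {}, children: [] }] },
  ],
};
const output = transformTree(rules, input);
```

Because each rule only rewrites its own node and delegates children to next, the output has the same nesting as the input.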
3.2 Direction Selects the Functor
The Direction type:
type Direction = "ujsx→mdast" | "mdast→ujsx" | "ujsx→jpath" | "jpath→ujsx" | "ujsx→hast" | "hast→ujsx"
Each direction is a different functor. The pair "ujsx→mdast" and "mdast→ujsx" are potentially adjoint functors.
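As a small illustration of direction acting as a selector, the sketch below filters a flat rule list by direction and derives the opposite direction. The DirectedRule shape, the inverse and rulesFor helpers, and the highest-priority-first ordering are assumptions for this example, not part of the documented registry.

```typescript
type Direction =
  | "ujsx→mdast" | "mdast→ujsx"
  | "ujsx→jpath" | "jpath→ujsx"
  | "ujsx→hast" | "hast→ujsx";

interface DirectedRule {
  name: string;
  direction: Direction;
  priority?: number;
}

// Hypothetical helper: flip "a→b" into "b→a" to find the candidate adjoint direction.
function inverse(direction: Direction): Direction {
  const parts = direction.split("→");
  return (parts[1] + "→" + parts[0]) as Direction;
}

// Select the rules that make up one functor, highest priority first (ordering is an assumption).
function rulesFor(rules: DirectedRule[], direction: Direction): DirectedRule[] {
  return rules
    .filter((rule) => rule.direction === direction)
    .sort((a, b) => (b.priority || 0) - (a.priority || 0));
}

const allRules: DirectedRule[] = [
  { name: "fallback-ujsx-to-mdast", direction: "ujsx→mdast" },
  { name: "heading-ujsx-to-mdast", direction: "ujsx→mdast", priority: 1 },
  { name: "heading-mdast-to-ujsx", direction: "mdast→ujsx", priority: 1 },
];
const forward = rulesFor(allRules, "ujsx→mdast");
const backward = rulesFor(allRules, inverse("ujsx→mdast"));
```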
3.3 Adjunctions and Round-Trip Fidelity
Two functors F: C → D and G: D → C form an adjunction if there exists a natural bijection:
Hom_D(F(A), B) ≅ Hom_C(A, G(B))
In practice, this means: for every UJSX tree a and mdast tree b, there's a one-to-one correspondence between maps F(a) → b (mdast transforms) and maps a → G(b) (UJSX transforms).
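The round-trip property is a stronger, more practical notion. An adjunction only supplies a unit η_a: a → G(F(a)) and a counit ε_b: F(G(b)) → b; round-trip fidelity asks that both be isomorphisms. For all a in C,
G(F(a)) ≅ a (the unit is an isomorphism)
and for all b in D,
F(G(b)) ≅ b (the counit is an isomorphism)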
This is what we actually want to verify: if I transform a UJSX tree to mdast and back, I get something equivalent to what I started with.
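For a single node type, the two round-trip conditions can be checked directly. The sketch below uses hypothetical, simplified heading shapes and hand-written F and G, and it approximates "equivalent" by comparing JSON serializations, which is only adequate because both sides build their objects with the same key order.

```typescript
// Hypothetical, simplified node shapes for one element type.
type UHeading = { type: "heading"; props: { level: number }; children: string[] };
type MdText = { type: "text"; value: string };
type MdHeading = { type: "heading"; depth: number; children: MdText[] };

// F: ujsx → mdast
const F = (a: UHeading): MdHeading => ({
  type: "heading",
  depth: a.props.level,
  children: a.children.map((value): MdText => ({ type: "text", value })),
});

// G: mdast → ujsx
const G = (b: MdHeading): UHeading => ({
  type: "heading",
  props: { level: b.depth },
  children: b.children.map((child) => child.value),
});

// Crude structural equivalence; relies on matching key order on both sides.
const same = (x: unknown, y: unknown): boolean => JSON.stringify(x) === JSON.stringify(y);

const a: UHeading = { type: "heading", props: { level: 2 }, children: ["Intro"] };
const b: MdHeading = F(a);

const unitHolds = same(G(F(a)), a);   // G(F(a)) ≅ a
const counitHolds = same(F(G(b)), b); // F(G(b)) ≅ b
```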
4. The Impedance Mismatch: ASG vs AST
4.1 The Problem
Not all transforms satisfy round-trip fidelity. The key failure mode is when we have an Abstract Semantic Graph (ASG) on one side and an Abstract Syntax Tree (AST) on the other.
An AST represents syntax: the textual structure of a format. An mdast tree is an AST. It has a type discriminant and positional information.
An ASG represents semantics — meaning that may not round-trip through syntax. A UJSX <Table> component is an ASG node. It represents the idea of a table, which can be rendered as:
- A markdown table (| col1 | col2 |)
- An HTML <table> element
- A grid of key-value pairs in plain text
- A data structure (headers + rows) in JSON
But the markdown table syntax cannot express all table semantics (no colspans in GFM, no per-cell formatting), so the round trip Table → mdast → Table may lose information.
4.2 Categorical Formulation
In category theory terms, this is a reflection rather than an equivalence:
- The functor F: ASG → AST (render abstract semantics to concrete syntax) is defined on every object (every ASG node maps somewhere in AST-land) but is not injective on objects: distinct ASG nodes can collapse to the same AST node
- The functor G: AST → ASG (parse concrete syntax back to abstract semantics) is not surjective on objects: some ASG constructs have no AST representation and can only be recovered as "lossy" approximations
The adjunction unit η: id → G ∘ F is not an isomorphism — G(F(a)) may be a simplification of a. This is expected and correct.
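The non-invertible unit can be observed on a toy table. The UTable and MdTable shapes below are hypothetical and far simpler than real UJSX or mdast tables; the point is only that G(F(a)) drops colspan once and is stable afterwards.

```typescript
// Hypothetical shapes: the ASG side can carry colspan, the AST side cannot.
type UTable = { type: "table"; cols: string[]; colspan?: Record<number, number> };
type MdTable = { type: "table"; header: string[] };

// F: ASG → AST. colspan has nowhere to go, so it is dropped.
const F = (a: UTable): MdTable => ({ type: "table", header: a.cols.slice() });

// G: AST → ASG. Produces the best available ASG node, without colspan.
const G = (b: MdTable): UTable => ({ type: "table", cols: b.header.slice() });

const rich: UTable = { type: "table", cols: ["Name", "Age"], colspan: { 1: 3 } };

const once = G(F(rich));  // simplified: colspan is gone
const twice = G(F(once)); // no further loss on a second trip

const lostColspan = once.colspan === undefined;
const stableAfterFirstTrip = JSON.stringify(twice) === JSON.stringify(once);
const sameSyntax = JSON.stringify(F(once)) === JSON.stringify(F(rich));
```

The simplification is idempotent: a and G(F(a)) render to the same AST, so the loss happens exactly once.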
4.3 Practical Examples
| ASG Node | AST Representation | Round-Trip? | Loss? |
|---|---|---|---|
| <Table cols={["Name","Age"]}> | mdast table node | Yes | Column alignment only |
| <Table cols={["Name","Age"]} colspan={{1:3}}> | mdast table node | Partial | colspan lost (GFM limit) |
| <Callout type="warning"> | mdast blockquote with marker | Partial | Structural equivalence, semantic loss |
| <Formula>ax² + bx + c</Formula> | Plain text ax² + bx + c | No | Layout/formatting lost |
| <Code language="ts"> | mdast code with lang="ts" | Yes | None |
| <Heading level={2}> | mdast heading with depth: 2 | Yes | None |
4.4 Implications for Transform Design
For ASG→AST transforms, we should:
- Acknowledge loss explicitly: transform rules can declare their fidelity level (exact, structural, lossy)
- Preserve semantics through metadata: UElement.metadata can carry information that the target format can't express, enabling partial recovery on the return trip
- Not attempt impossible round trips: instead, design the return transform G: AST → ASG to produce the best possible ASG node, not a faithful reproduction
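A sketch of the first two points follows, under assumptions: the DeclaredRule shape and its fidelity field are hypothetical, and the choice to stash the original element under the mdast node's data field (with a ujsx key) is one possible convention for carrying metadata through the target tree, not an established one.

```typescript
type Fidelity = "exact" | "structural" | "lossy";

// Hypothetical, simplified shapes.
type UElement = { type: string; props: Record<string, unknown> };
type MdNode = { type: string; data?: Record<string, unknown> };

interface DeclaredRule<T, U> {
  name: string;
  fidelity: Fidelity; // declared up front so callers know what to expect
  transform: (node: T) => U;
}

// Forward rule: a callout becomes a blockquote, with the original stashed in `data`.
const calloutToMdast: DeclaredRule<UElement, MdNode> = {
  name: "callout-ujsx-to-mdast",
  fidelity: "structural",
  transform: (el) => ({
    type: "blockquote",
    data: { ujsx: { type: el.type, props: el.props } },
  }),
};

// Return rule: recover the callout if the stash survived, else fall back to a plain blockquote.
const blockquoteToUjsx: DeclaredRule<MdNode, UElement> = {
  name: "blockquote-mdast-to-ujsx",
  fidelity: "lossy",
  transform: (node) => {
    const stash = node.data ? (node.data.ujsx as UElement | undefined) : undefined;
    if (stash) return { type: stash.type, props: stash.props };
    return { type: "blockquote", props: {} };
  },
};

const callout: UElement = { type: "callout", props: { type: "warning" } };
const recovered = blockquoteToUjsx.transform(calloutToMdast.transform(callout));
const plain = blockquoteToUjsx.transform({ type: "blockquote" });
```

Recovery is only partial in practice: if the markdown is serialized to text and reparsed, the data field is gone and the fallback branch applies.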
5. Codegen and Functor Construction
5.1 From TypeBox Modules to Transform Rules
The ts-to-module.ts codegen produces Type.Module({...}) definitions from TypeScript sources. We have modules for Unist, MDAST, and JPATH. Each module defines an object in the TypeBox Schema Category.
Given two modules (say UJSX and Mdast), a functor between them can be partially constructed by analyzing the structural correspondence:
- UJSX.UniversalElement.type is a String or Function → matches Mdast nodes with type discriminants
- UJSX.UniversalProps.children is Array(UniversalNode) → matches Mdast.Parent.children
- UJSX.UniversalElement.props → matches Mdast node-specific properties
The codegen can produce skeleton transform rules — rules where match predicates are derived from schema intersection, and transform functions are stubs waiting for implementation:
// Auto-generated from schema correspondence analysis
registry.register({
  name: "heading-ujsx-to-mdast",
  direction: "ujsx→mdast",
  match: (node) => isUElement(node) && node.type === "heading", // derived from UJSX schema
  transform: (node, ctx, next) => {
    // TODO: implement heading-specific transform
    // node.props.level maps to mdast heading depth
    // node.children maps to mdast heading children
    throw new Error("Not implemented: heading transform");
  },
  priority: 1,
});
5.2 From Type Definitions to Host Config Element Types
Each TypeBox Type.XXX() call in a module definition becomes a potential UJSX element type in a host config:
- Type.Object({...}) → <Object> element
- Type.String() → <String> element
- Type.Ref("X") → <Ref name="X" /> element
- Type.Literal("heading") → <Literal value="heading" /> element
- Type.Array(Type.Ref("Node")) → <Array><Ref name="Node" /></Array>
This is a functor from the TypeBox Schema Category to the UJSX Element Category — every TypeBox constructor maps to a UJSX element, and TypeBox composition maps to UJSX tree composition.
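The constructor-to-element mapping can be sketched as a recursive function. The Schema union below is a hand-written stand-in for TypeBox schema objects (it does not use the TypeBox API), and the property wrapper element follows the definition-tree notation used later in this document.

```typescript
// Stand-in for TypeBox schemas: a small tagged union, not the real TypeBox types.
type Schema =
  | { kind: "String" }
  | { kind: "Literal"; value: string }
  | { kind: "Ref"; name: string }
  | { kind: "Array"; items: Schema }
  | { kind: "Object"; properties: Record<string, Schema> };

type UElement = { type: string; props: Record<string, unknown>; children: UElement[] };

// Each constructor maps to one element; nesting in the schema becomes nesting in the tree.
function toElement(schema: Schema): UElement {
  switch (schema.kind) {
    case "String":
      return { type: "String", props: {}, children: [] };
    case "Literal":
      return { type: "Literal", props: { value: schema.value }, children: [] };
    case "Ref":
      return { type: "Ref", props: { name: schema.name }, children: [] };
    case "Array":
      return { type: "Array", props: {}, children: [toElement(schema.items)] };
    case "Object":
      return {
        type: "Object",
        props: {},
        children: Object.keys(schema.properties).map((name) => ({
          type: "property",
          props: { name },
          children: [toElement(schema.properties[name])],
        })),
      };
  }
}

const nodeList: Schema = { kind: "Array", items: { kind: "Ref", name: "Node" } };
const heading: Schema = {
  kind: "Object",
  properties: { type: { kind: "Literal", value: "heading" }, children: nodeList },
};
const arrayElement = toElement(nodeList);
const headingElement = toElement(heading);
```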
5.3 The Proxy Insight: Call Trees and Definition Trees
The proxy experiment in /research/typebox_research/ujsx/proxy.ts traced TypeBox's internal call tree. The relationship:
Call tree (imperative):        Definition tree (declarative):
Type.Object({                  <Object name="MyType">
  name: Type.String(),           <property name="name"><String /></property>
  ...                            ...
})                             </Object>
These carry the same information in different traversal orders. The call tree is a depth-first execution trace; the definition tree is the resulting data structure. UJSX's declarative form unifies them: the JSX element is the definition tree, and rendering it through a host produces the call tree.
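The tracing idea can be illustrated with a JavaScript Proxy around a toy builder. This is not the code in proxy.ts; the builder object is a hypothetical stand-in for TypeBox's Type namespace, and only the Proxy and Reflect calls are standard.

```typescript
// Toy builder standing in for TypeBox's Type namespace (not the real API).
const builder = {
  String: () => ({ kind: "String" }),
  Object: (properties: Record<string, unknown>) => ({ kind: "Object", properties }),
};

const callTrace: string[] = [];

// Wrap every method so each invocation is recorded at the moment it runs.
const traced = new Proxy(builder, {
  get(target, property, receiver) {
    const original = Reflect.get(target, property, receiver);
    if (typeof original !== "function") return original;
    return (...args: unknown[]) => {
      callTrace.push(String(property));
      return original.apply(target, args);
    };
  },
});

// Building a definition through the proxy yields both the data structure and the trace.
const definition = traced.Object({ name: traced.String() });
```

The trace lists String before Object because arguments are evaluated before the enclosing call runs, so the execution order is the post-order walk of the definition tree.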
6. Proof as Program: Verification via Generation
6.1 The Curry-Howard-Lambek Correspondence
The Curry-Howard-Lambek correspondence establishes:
| Logic | Programming | Category Theory |
|---|---|---|
| Propositions | Types | Objects |
| Proofs | Programs | Morphisms |
| Implication A → B | Function type A → B | Exponential object B^A |
| Conjunction A ∧ B | Product type A × B | Product object |
| Disjunction A ∨ B | Union type A \| B | Coproduct object |
Under this correspondence: a TypeBox schema is a proposition (a specification of what valid data looks like). A value that passes Value.Check(schema, data) is a proof (evidence that the proposition holds for this data). A transform f: SchemaA → SchemaB is a proof that if A-structured data exists, then B-structured data can be derived from it.
6.2 Faker as Proof Generator
typebox-schema-faker generates random inhabitants of TypeBox schemas. In logical terms, it generates witnesses — concrete evidence that a type (proposition) is inhabited. This serves as:
- Model checking: generate many instances of UJSX trees, transform them, validate the output against the target schema. If all generated instances pass, you have statistical confidence in the transform's correctness.
- Round-trip verification (for adjoint functor pairs):
    for (const x of generate(UJSXSchema)) {
      const y = F(x) // ujsx → mdast
      const x2 = G(y) // mdast → ujsx
      assert(equivalent(x, x2, fidelityLevel))
    }
  This tests the adjunction laws for generated inhabitants.
- Schema coverage: faker can be guided to hit edge cases (null children, empty arrays, deeply nested structures) that hand-written tests miss.
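A self-contained version of this loop is sketched below. It does not use typebox-schema-faker: the generate function is a hand-rolled stand-in driven by a seeded linear congruential generator, the node shapes are hypothetical simplifications, and isValidMd plays the role of a target-schema check.

```typescript
type UNode =
  | { type: "text"; value: string }
  | { type: "heading"; level: number; children: UNode[] }
  | { type: "paragraph"; children: UNode[] };
type MdNode =
  | { type: "text"; value: string }
  | { type: "heading"; depth: number; children: MdNode[] }
  | { type: "paragraph"; children: MdNode[] };

// Deterministic linear congruential generator, so any failure is reproducible from the seed.
function makeRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Hand-rolled stand-in for a schema-driven generator.
function generate(random: () => number, depth: number): UNode {
  const roll = random();
  if (depth === 0 || roll < 0.3) return { type: "text", value: "t" + Math.floor(random() * 100) };
  const children: UNode[] = [];
  const count = Math.floor(random() * 3); // zero children is a deliberate edge case
  for (let i = 0; i < count; i++) children.push(generate(random, depth - 1));
  if (roll < 0.65) return { type: "heading", level: 1 + Math.floor(random() * 6), children };
  return { type: "paragraph", children };
}

function F(node: UNode): MdNode {
  if (node.type === "text") return { type: "text", value: node.value };
  if (node.type === "heading") return { type: "heading", depth: node.level, children: node.children.map(F) };
  return { type: "paragraph", children: node.children.map(F) };
}

function G(node: MdNode): UNode {
  if (node.type === "text") return { type: "text", value: node.value };
  if (node.type === "heading") return { type: "heading", level: node.depth, children: node.children.map(G) };
  return { type: "paragraph", children: node.children.map(G) };
}

// Stand-in for validating against the target schema.
function isValidMd(node: MdNode): boolean {
  if (node.type === "text") return typeof node.value === "string";
  if (node.type === "heading" && (node.depth < 1 || node.depth > 6)) return false;
  return node.children.every(isValidMd);
}

const random = makeRandom(42);
const failures: UNode[] = [];
let checked = 0;
for (let i = 0; i < 200; i++) {
  const x = generate(random, 4);
  const y = F(x);  // ujsx → mdast
  const x2 = G(y); // mdast → ujsx
  if (!isValidMd(y) || JSON.stringify(x2) !== JSON.stringify(x)) failures.push(x);
  checked++;
}
```

Collecting failing inputs rather than throwing on the first one makes it easier to see whether a bug is tied to one node type or to a structural edge case.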
6.3 Compositional Verification
If transform f: A → B is verified and transform g: B → C is verified, then g ∘ f: A → C should also be correct by composition. The composedOf field from the graphology pattern (category-theory-graph.md) can track this:
interface TransformRule<T, U, A> {
  name: string;
  match: (node: T) => boolean;
  transform: (node: T, ctx: TransformContext<A>, next: TransformFn<T, U, A>) => U;
  priority?: number;
  composedOf?: [string, string]; // provenance: which rules composed to produce this one
}
This enables provenance tracking — every composed transform knows its derivation, enabling targeted re-verification when a base rule changes.
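A minimal sketch of composition with provenance follows. The Rule interface is a cut-down version of TransformRule (no match, ctx, or next), and the compose and needsReverification helpers are hypothetical names introduced for this example.

```typescript
// Cut-down rule shape: just enough to show composition and provenance.
interface Rule<T, U> {
  name: string;
  transform: (node: T) => U;
  composedOf?: [string, string];
}

// g ∘ f, recording which two rules it was built from.
function compose<A, B, C>(f: Rule<A, B>, g: Rule<B, C>): Rule<A, C> {
  return {
    name: f.name + ">>" + g.name,
    transform: (node) => g.transform(f.transform(node)),
    composedOf: [f.name, g.name],
  };
}

// Names of composed rules that directly depend on a changed base rule.
function needsReverification(
  rules: { name: string; composedOf?: [string, string] }[],
  changed: string
): string[] {
  return rules
    .filter((rule) => rule.composedOf !== undefined && rule.composedOf.indexOf(changed) !== -1)
    .map((rule) => rule.name);
}

const parseLevel: Rule<string, number> = {
  name: "parse-level",
  transform: (text) => parseInt(text, 10),
};
const toHashes: Rule<number, string> = {
  name: "level-to-hashes",
  transform: (level) => "#".repeat(level),
};

const composed = compose(parseLevel, toHashes);
const affected = needsReverification([parseLevel, toHashes, composed], "parse-level");
```

This lookup only finds direct dependents; rules composed from other composed rules would need a transitive walk over composedOf.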
7. The TypeDef Bridge: Binary Layout as a Category
7.1 Standard TypeBox vs TypeDef
Standard TypeBox types represent JSON values — data serialized as JSON, validated as JSON Schema. The TypeDef system adds binary layout semantics:
| Standard TypeBox | TypeDef | Difference |
|---|---|---|
| Type.Number() | TF.Float32(), TF.Int32(), etc. | Fixed-width, specific encoding |
| Type.String() | TF.String() | Same, but in struct context has known offset |
| Type.Object({...}) | TF.Struct({...}) | Ordered fields, computable offsets |
| Type.Union([...], 'type') | TF.Union([...], 'type') | Tagged union with discriminant |
The TypeDef:Kind markers (TypeDef:Float32, TypeDef:Struct, etc.) act as discriminants in a functor — they map from the abstract number type to the concrete binary representation.
7.2 Offset Computation as a Functor
Given a TF.Struct({ name: TF.String(), embedding: TF.Array(TF.Float32()) }):
- The TypeBox functor maps this to a JSON Schema that validates JSON objects with name (string) and embedding (number array).
- The Binary layout functor maps this to a byte offset table: name at offset 0 (variable, preceded by length), embedding at next aligned offset, with shape [768] → 768 × 4 = 3072 bytes.
These are two different functors from the same source category (TypeDef schemas) to two different target categories (JSON validation and binary layout). Because their targets differ, they are not naturally isomorphic in the strict sense; they are parallel interpretations of the same schema, in that every TypeDef schema has both a JSON interpretation and a binary interpretation.
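The binary layout side can be sketched for fixed-width fields only. The Field shape and computeLayout function are hypothetical, variable-length strings are left out, and aligning each field to its own element width is an assumption about what "next aligned offset" means.

```typescript
type Kind = "Uint8" | "Int32" | "Float32" | "Float64";
type Field = { name: string; kind: Kind; count?: number };

const BYTE_SIZE: Record<Kind, number> = { Uint8: 1, Int32: 4, Float32: 4, Float64: 8 };

// Walk the fields in declaration order, padding each to its natural alignment (an assumption).
function computeLayout(fields: Field[]): { offsets: Record<string, number>; size: number } {
  const offsets: Record<string, number> = {};
  let cursor = 0;
  for (const field of fields) {
    const width = BYTE_SIZE[field.kind];
    cursor = Math.ceil(cursor / width) * width; // round up to a multiple of the element width
    offsets[field.name] = cursor;
    cursor += width * (field.count === undefined ? 1 : field.count);
  }
  return { offsets, size: cursor };
}

const layout = computeLayout([
  { name: "flag", kind: "Uint8" },
  { name: "embedding", kind: "Float32", count: 768 }, // 768 × 4 = 3072 bytes
  { name: "score", kind: "Float64" },
]);
```

Here flag sits at offset 0, embedding is padded to offset 4 and ends at 3076, and score is padded to offset 3080, for a total of 3088 bytes.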
7.3 SafeTensor as a Container Category
The SafeTensor format is: header_length (8 bytes) + header (JSON) + data (raw bytes)
This is a product (record type), not a sum: SafeTensor = Header × Data. The header contains TensorRef entries with data_offsets: [start, end]; these act as morphisms from the header category to positions in the data category.
The metatensor concept adds TF.Struct entries alongside raw tensors, enabling a third functor: from TypeDef schemas to byte ranges within the data section.
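The container layout can be exercised with a small encoder and decoder. This is a sketch, not a full SafeTensor implementation: it assumes an ASCII-only JSON header shorter than 2^32 bytes, the header length is read as a little-endian 64-bit integer split into two 32-bit halves, and the function names are hypothetical.

```typescript
type TensorRef = { dtype: string; shape: number[]; data_offsets: [number, number] };
type Parsed = { header: Record<string, TensorRef>; data: Uint8Array };

// header_length (8 bytes, little-endian) + header (JSON) + data (raw bytes)
function encode(header: Record<string, TensorRef>, data: Uint8Array): Uint8Array {
  const json = JSON.stringify(header);
  const out = new Uint8Array(8 + json.length + data.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, json.length, true); // low 32 bits; high 32 bits stay zero
  for (let i = 0; i < json.length; i++) out[8 + i] = json.charCodeAt(i); // ASCII-only assumption
  out.set(data, 8 + json.length);
  return out;
}

function decode(file: Uint8Array): Parsed {
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  const headerLength = view.getUint32(4, true) * 4294967296 + view.getUint32(0, true);
  let json = "";
  for (let i = 0; i < headerLength; i++) json += String.fromCharCode(file[8 + i]);
  return {
    header: JSON.parse(json) as Record<string, TensorRef>,
    data: file.subarray(8 + headerLength),
  };
}

// data_offsets are relative to the start of the data section, not the file.
function tensorBytes(parsed: Parsed, name: string): Uint8Array {
  const [start, end] = parsed.header[name].data_offsets;
  return parsed.data.subarray(start, end);
}

// Two little-endian float32 values as the payload.
const payload = new Uint8Array(8);
const payloadView = new DataView(payload.buffer);
payloadView.setFloat32(0, 1.5, true);
payloadView.setFloat32(4, -2, true);

const file = encode({ embedding: { dtype: "F32", shape: [2], data_offsets: [0, 8] } }, payload);
const parsed = decode(file);
const bytes = tensorBytes(parsed, "embedding");
const values = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
const first = values.getFloat32(0, true);
const second = values.getFloat32(4, true);
```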
8. Summary: The Categorical Structure of UJSX
          TypeBox Schema Category
                    │
             ┌──────┼──────┐
             │      │      │
         F₁  │  F₂  │  F₃  │
         ┌───┘      │      └───┐
         ▼          ▼          ▼
    UJSX Tree   MDAST Tree   Binary Layout
    Category    Category     Category
         │          ▲
         │    G₂    │
         └──────────┘
         │   F₂⁻¹   │
         │          │
    (adjoint pair: F₂ and G₂)
- F₁ — TypeBox → UJSX host (renders type definitions as JSX elements)
- F₂ — UJSX → MDAST transform (renders abstract components as markdown AST)
- G₂ — MDAST → UJSX transform (parses markdown AST back to abstract components)
- F₃ — TypeDef → Binary Layout (computes offsets and validates binary data)
F₂ and G₂ should form an adjunction — verified by round-trip property testing with typebox-schema-faker.
F₁ is a faithful embedding — TypeBox schemas embed into UJSX without information loss.
F₃ is a projection — TypeDef schemas project onto binary layouts, but not all binary layouts correspond to valid TypeDef schemas (lossy in the other direction).
The ASG/AST impedance mismatch is a reflection, not an equivalence — some ASG nodes have no faithful AST representation, and this is expected and correct.
9. Practical Next Steps
- Fork typebox-schema-faker into the alkdev ecosystem for model checking
- Add composedOf provenance tracking to TransformRule
- Add fidelity levels to transform rules ("exact"|"structural"|"lossy")
- Implement the TypeBox host: prove F₁ works before building F₂/F₃
- Use ts-to-module.ts codegen to produce module definitions, then derive skeleton transform rules from schema correspondence analysis
- Design the TypeDef UJSX elements (<Struct>, <Float32>, etc.) as a second host that computes binary offsets