Files

glm-5.1 ba74afd0b3 add category theory research doc connecting UJSX to functors, adjunctions, and round-trip fidelity

2026-05-09 11:40:51 +00:00

17 KiB

Raw Permalink Blame History

Category Theory and UJSX: Functors, Adjunctions, and Round-Trip Fidelity

Date: 2026-05-09 Topic: Formal foundations for the UJSX transform system, connecting categorical concepts to practical implementation

1. The Big Picture

UJSX is a system for transforming trees between representations. It has:

A universal tree model (UElement / URoot / UPrimitive)
A TransformRegistry with bidirectional rules and a direction field
A HostConfig reconciler pattern that renders trees into target formats
TypeBox schemas that define the shape of valid trees at runtime

This document connects these practical pieces to their categorical foundations, and explains why those connections matter for correctness.

2. Categories We Work With

A category consists of objects (types) and morphisms (structure-preserving maps between types) with an identity morphism and composable morphisms.

2.1 The TypeBox Schema Category

Objects: TypeBox schema types (TObject, TString, TUnion, TModule, etc.) Morphisms: Schema transformations — Type.Optional(T), Type.Ref("X"), Type.Composite([A, B]), Module.Import("X") Composition: Nesting and combining schemas Identity: Type.Ref("Self") within a Type.Module

This is approximately a cartesian closed category:

Products — Type.Composite([A, B]) (intersection types)
Coproducts — Type.Union([A, B]) (union types)
Exponentials — Type.Function([...], R) (function types as first-class objects)
Terminal object — Type.Void() (every type has a unique map to it)

2.2 The Tree Category

Trees (ASTs, IRs, DOMs) form categories where:

Objects are tree species: UJSX elements, mdast nodes, hast nodes, TypeBox schemas-as-trees, safetensor headers
Morphisms are transforms: ujsx → mdast, mdast → ujsx, ujsx → typebox, typebox → ujsx, etc.
Composition is pipeline composition: first transform A→B, then B→C

A UJSX tree (or mdast tree, or hast tree, or TypeBox Module-as-tree) is itself an object in a tree category. The TransformRegistry.transform() function with next continuation is a recursive functor — it maps each subtree while preserving parent-child structure.

2.3 The Host Category

Each HostConfig defines a rendering functor from the UJSX tree category to a target category:

HostConfig<string, string, Record<string, unknown>>

Source category: UJSX tree category
Target category: string instances (DOM nodes, markdown strings, TypeBox schema objects, binary buffers)
Functor: createHostRoot(config).render(tree) maps UJSX elements to target instances

3. Functors and the TransformRegistry

3.1 Transforms Are Functors

A functor F: C → D maps:

Every object A in C to an object F(A) in D
Every morphism f: A → B in C to a morphism F(f): F(A) → F(B) in D
Such that F(id_A) = id_{F(A)} and F(g ∘ f) = F(g) ∘ F(f)

In UJSX terms:

Categorical Concept	UJSX Implementation
Object in C	A UJSX tree conforming to some schema
Object in D	An mdast tree / hast tree / TypeBox schema
Functor F	A set of `TransformRule`s with `direction: "ujsx→mdast"`
`F(g ∘ f)`	The `next` continuation in `transform(node, ctx, next)`
`F(id_A)`	Identity transform (pass-through rule)

The next parameter in TransformRule.transform is the key: it ensures that when you transform a subtree, you first transform the current node, then recursively transform children via next. This is the functoriality condition — the structure is preserved.

3.2 Direction Selects the Functor

The Direction type:

type Direction = "ujsx→mdast" | "mdast→ujsx" | "ujsx→jpath" | "jpath→ujsx" | "ujsx→hast" | "hast→ujsx"

Each direction is a different functor. The pair "ujsx→mdast" and "mdast→ujsx" are potentially adjoint functors.

3.3 Adjunctions and Round-Trip Fidelity

Two functors F: C → D and G: D → C form an adjunction if there exists a natural bijection:

Hom_D(F(A), B) ≅ Hom_C(A, G(B))

In practice, this means: for every UJSX tree a and mdast tree b, there's a one-to-one correspondence between maps F(a) → b (mdast transforms) and maps a → G(b) (UJSX transforms).

The round-trip property is a stronger, more practical notion: for all a in C,

G(F(a)) ≅ a    (unit of the adjunction)

and for all b in D,

F(G(b)) ≅ b    (counit of the adjunction)

This is what we actually want to verify: if I transform a UJSX tree to mdast and back, I get something equivalent to what I started with.

4. The Impedance Mismatch: ASG vs AST

4.1 The Problem

Not all transforms satisfy round-trip fidelity. The key failure mode is when we have an Abstract Semantic Graph (ASG) on one side and an Abstract Syntax Tree (AST) on the other.

An AST represents syntax — the textual structure of a format. A mdast tree is an AST. It has a type discriminant and positional information.

An ASG represents semantics — meaning that may not round-trip through syntax. A UJSX <Table> component is an ASG node. It represents the idea of a table, which can be rendered as:

A markdown table (| col1 | col2 |)
An HTML <table> element
A grid of key-value pairs in plain text
A data structure (headers + rows) in JSON

But the markdown table syntax cannot express all table semantics (no colspans in GFM, no per-cell formatting), so the round trip Table → mdast → Table may lose information.

4.2 Categorical Formulation

In category theory terms, this is a reflection rather than an equivalence:

The functor F: ASG → AST (render abstract semantics to concrete syntax) is surjective on objects (every ASG node maps somewhere in AST-land)
The functor G: AST → ASG (parse concrete syntax back to abstract semantics) is not injective on objects (some ASG constructs have no AST representation, and some map to "lossy" approximations)

The adjunction unit η: id → G ∘ F is not an isomorphism — G(F(a)) may be a simplification of a. This is expected and correct.

4.3 Practical Examples

ASG Node	AST Representation	Round-Trip?	Loss?
`<Table cols={["Name","Age"]}>`	mdast `table` node	Yes	Column alignment only
`<Table cols={["Name","Age"]} colspan={{1:3}}>`	mdast `table` node	Partial	colspan lost (GFM limit)
`<Callout type="warning">`	mdast `blockquote` with marker	Partial	Structural equivalence, semantic loss
`<Formula>ax² + bx + c</Formula>`	Plain text `ax² + bx + c`	No	Layout/formatting lost
`<Code language="ts">`	mdast `code` with `lang="ts"`	Yes	None
`<Heading level={2}>`	mdast `heading` with `depth: 2`	Yes	None

4.4 Implications for Transform Design

For ASG→AST transforms, we should:

Acknowledge loss explicitly — transform rules can declare their fidelity level (exact, structural, lossy)
Preserve semantics through metadata — UElement.metadata can carry information that the target format can't express, enabling partial recovery on the return trip
Not attempt impossible round trips — instead, design the return transform G: AST → ASG to produce the best possible ASG node, not a faithful reproduction

5. Codegen and Functor Construction

5.1 From TypeBox Modules to Transform Rules

The ts-to-module.ts codegen produces Type.Module({...}) definitions from TypeScript sources. We have modules for Unist, MDAST, and JPATH. Each module defines an object in the TypeBox Schema Category.

Given two modules (say UJSX and Mdast), a functor between them can be partially constructed by analyzing the structural correspondence:

UJSX.UniversalElement.type is a String or Function → matches Mdast nodes with type discriminants
UJSX.UniversalProps.children is Array(UniversalNode) → matches Mdast.Parent.children
UJSX.UniversalElement.props → matches Mdast node-specific properties

The codegen can produce skeleton transform rules — rules where match predicates are derived from schema intersection, and transform functions are stubs waiting for implementation:

// Auto-generated from schema correspondence analysis
registry.register({
  name: "heading-ujsx-to-mdast",
  direction: "ujsx→mdast",
  match: (node) => isUElement(node) && node.type === "heading",  // derived from UJSX schema
  transform: (node, ctx, next) => {
    // TODO: implement heading-specific transform
    // node.props.level maps to mdast heading depth
    // node.children maps to mdast heading children
    throw new Error("Not implemented: heading transform");
  },
  priority: 1,
});

5.2 From Type Definitions to Host Config Element Types

Each TypeBox Type.XXX() call in a module definition becomes a potential UJSX element type in a host config:

Type.Object({...}) → <Object> element
Type.String() → <String> element
Type.Ref("X") → <Ref name="X" /> element
Type.Literal("heading") → <Literal value="heading" /> element
Type.Array(Type.Ref("Node")) → <Array><Ref name="Node" /></Array>

This is a functor from the TypeBox Schema Category to the UJSX Element Category — every TypeBox constructor maps to a UJSX element, and TypeBox composition maps to UJSX tree composition.

5.3 The Proxy Insight: Call Trees and Definition Trees

The proxy experiment in /research/typebox_research/ujsx/proxy.ts traced TypeBox's internal call tree. The relationship:

Call tree (imperative):    Definition tree (declarative):
Type.Object({              <Object name="MyType">
  name: Type.String(),       <property name="name"><String /></property>
  ...                         ...
})                          </Object>

These are the same information, just in different traversal orders. The call tree is a depth-first execution trace; the definition tree is the resulting data structure. UJSX's declarative form unifies them — the JSX element IS the definition tree, and rendering it through a host produces the call tree.

6. Proof as Program: Verification via Generation

6.1 The Curry-Howard-Lambek Correspondence

The Curry-Howard-Lambek correspondence establishes:

Logic	Programming	Category Theory
Propositions	Types	Objects
Proofs	Programs	Morphisms
Implication A → B	Function type A → B	Exponential object B^A
Conjunction A ∧ B	Product type A × B	Product object
Disjunction A ∨ B	Union type A \| B	Coproduct object

Under this correspondence: a TypeBox schema is a proposition (a specification of what valid data looks like). A value that passes Value.Check(schema, data) is a proof (evidence that the proposition holds for this data). A transform f: SchemaA → SchemaB is a proof that if A-structured data exists, then B-structured data can be derived from it.

6.2 Faker as Proof Generator

typebox-schema-faker generates random inhabitants of TypeBox schemas. In logical terms, it generates witnesses — concrete evidence that a type (proposition) is inhabited. This serves as:

Model checking — generate many instances of UJSX trees, transform them, validate the output against the target schema. If all generated instances pass, you have statistical confidence in the transform's correctness.

Round-trip verification — for adjoint functor pairs:

for (const x of generate(UJSXSchema)) {
  const y = F(x)   // ujsx → mdast
  const x2 = G(y)  // mdast → ujsx
  assert(equivalent(x, x2, fidelityLevel))
}

This tests the adjunction laws for generated inhabitants.

Schema coverage — faker can be guided to hit edge cases (null children, empty arrays, deeply nested structures) that hand-written tests miss.

6.3 Compositional Verification

If transform f: A → B is verified and transform g: B → C is verified, then g ∘ f: A → C should also be correct by composition. The composedOf field from the graphology pattern (category-theory-graph.md) can track this:

interface TransformRule<T, U, A> {
  name: string;
  match: (node: T) => boolean;
  transform: (node: T, ctx: TransformContext<A>, next: TransformFn<T, U, A>) => U;
  priority?: number;
  composedOf?: [string, string];  // provenance: which rules composed to produce this one
}

This enables provenance tracking — every composed transform knows its derivation, enabling targeted re-verification when a base rule changes.

7. The TypeDef Bridge: Binary Layout as a Category

7.1 Standard TypeBox vs TypeDef

Standard TypeBox types represent JSON values — data serialized as JSON, validated as JSON Schema. The TypeDef system adds binary layout semantics:

Standard TypeBox	TypeDef	Difference
`Type.Number()`	`TF.Float32()`, `TF.Int32()`, etc.	Fixed-width, specific encoding
`Type.String()`	`TF.String()`	Same, but in struct context has known offset
`Type.Object({...})`	`TF.Struct({...})`	Ordered fields, computable offsets
`Type.Union([...], 'type')`	`TF.Union([...], 'type')`	Tagged union with discriminant

The TypeDef:Kind markers (TypeDef:Float32, TypeDef:Struct, etc.) act as discriminants in a functor — they map from the abstract number type to the concrete binary representation.

7.2 Offset Computation as a Functor

Given a TF.Struct({ name: TF.String(), embedding: TF.Array(TF.Float32()) }):

The TypeBox functor maps this to a JSON Schema that validates JSON objects with name (string) and embedding (number array).
The Binary layout functor maps this to a byte offset table: name at offset 0 (variable, preceded by length), embedding at next aligned offset, with shape [768] → 768 × 4 = 3072 bytes.

These are two different functors from the same source category (TypeDef schemas) to two different target categories (JSON validation and binary layout). They're naturally isomorphic in the sense that every TypeDef schema has both a JSON interpretation and a binary interpretation.

7.3 SafeTensor as a Container Category

The SafeTensor format is: header_length (8 bytes) + header (JSON) + data (raw bytes)

This is a coproduct (sum type): SafeTensor = Header × Data. The header contains TensorRef entries with data_offsets: [start, end] — these are morphisms from the header category to positions in the data category.

The metatensor concept adds TF.Struct entries alongside raw tensors, enabling a third functor: from TypeDef schemas to byte ranges within the data section.

8. Summary: The Categorical Structure of UJSX

                    TypeBox Schema Category
                           │
                    ┌──────┼──────┐
                    │      │      │
              F₁    │  F₂  │  F₃  │
                ┌───┘      │      └───┐
                ▼          ▼          ▼
          UJSX Tree    MDAST Tree   Binary Layout
          Category     Category     Category
                │          ▲
                │    G₂    │
                └──────────┘
                │    F₂⁻¹  │
                │          │
           (adjoint pair: F₂ and G₂)

F₁ — TypeBox → UJSX host (renders type definitions as JSX elements)
F₂ — UJSX → MDAST transform (renders abstract components as markdown AST)
G₂ — MDAST → UJSX transform (parses markdown AST back to abstract components)
F₃ — TypeDef → Binary Layout (computes offsets and validates binary data)

F₂ and G₂ should form an adjunction — verified by round-trip property testing with typebox-schema-faker.

F₁ is a faithful embedding — TypeBox schemas embed into UJSX without information loss.

F₃ is a projection — TypeDef schemas project onto binary layouts, but not all binary layouts correspond to valid TypeDef schemas (lossy in the other direction).

The ASG/AST impedance mismatch is a reflection, not an equivalence — some ASG nodes have no faithful AST representation, and this is expected and correct.

9. Practical Next Steps

Fork typebox-schema-faker into the alkdev ecosystem for model checking
Add composedOf provenance tracking to TransformRule
Add fidelity levels to transform rules ("exact" | "structural" | "lossy")
Implement the TypeBox host — prove F₁ works before building F₂/F₃
Use ts-to-module.ts codegen to produce module definitions, then derive skeleton transform rules from schema correspondence analysis
Design the TypeDef UJSX elements (<Struct>, <Float32>, etc.) as a second host that computes binary offsets

17 KiB Raw Permalink Blame History Unescape Escape