# Unist Ecosystem Research: JSX → Markdown Pipeline for LLM Consumption

**Date**: 2026-04-28
**Topic**: Feasibility of JSX components → hast → mdast → markdown pipeline using the Unist/syntax-tree ecosystem

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [unist: The Universal Foundation](#2-unist-the-universal-foundation)
3. [hast: Hypertext Abstract Syntax Tree](#3-hast-hypertext-abstract-syntax-tree)
4. [mdast: Markdown Abstract Syntax Tree](#4-mdast-markdown-abstract-syntax-tree)
5. [hast-util-to-mdast: The Key Transform](#5-hast-util-to-mdast-the-key-transform)
6. [mdast-util-to-markdown: Serialization to Markdown](#6-mdast-util-to-markdown-serialization-to-markdown)
7. [remark/rehype Ecosystem](#7-remarkrehype-ecosystem)
8. [unist-util-visit and Related Utilities](#8-unist-util-visit-and-related-utilities)
9. [TypeScript Type Definitions](#9-typescript-type-definitions)
10. [Pipeline Feasibility Assessment](#10-pipeline-feasibility-assessment)
11. [Alternative Approaches](#11-alternative-approaches)
12. [Recommended Architecture](#12-recommended-architecture)
13. [Appendix: Element-to-Markdown Mapping Table](#13-appendix-element-to-markdown-mapping-table)

---

## 1. Executive Summary

The JSX → hast → mdast → markdown pipeline is **feasible and well-supported** by mature, well-typed libraries in the unist/syntax-tree ecosystem. The core transformation chain is:

```
JSX Component Tree  →  hast (HTML AST)      →  mdast (Markdown AST)  →  markdown string
        │                    │                       │                        │
React rendering        hast-util-from-html    hast-util-to-mdast      mdast-util-to-markdown
or react-dom/          or manual hast         (v10.1.2)               (v2.1.2)
server rendering       construction
```

**Key finding**: The hardest step is not hast→mdast→markdown (which is solved by existing, mature libraries), but rather **JSX → hast** and handling **custom components** that have no direct HTML/markdown equivalent.
The ecosystem provides excellent tooling for standard HTML elements but requires a custom strategy for framework-specific components.

**Verdict**: Use the existing unist ecosystem libraries for the hast→mdast→markdown steps. Build a custom JSX→hast adapter layer that handles React component rendering and custom element mapping.

---

## 2. unist: The Universal Foundation

**Repository**: https://github.com/syntax-tree/unist
**Current version**: 3.0.0
**License**: CC-BY-4.0

unist is the abstract base specification that hast, mdast, xast, and nlcst all implement. It defines the minimal node interface that all syntax tree nodes share.

### Core Node Interface

```typescript
// Ecosystem-specific metadata; empty by default and extended by packages
interface Data {}

interface Node {
  type: string          // Non-empty string identifying the node variant
  data?: Data           // Ecosystem-specific metadata
  position?: Position   // Source location info
}

interface Parent extends Node {
  children: Node[]      // Child nodes
}

interface Literal extends Node {
  value: any            // Node's value
}

interface Position {
  start: Point
  end: Point
}

interface Point {
  line: number          // 1-indexed
  column: number        // 1-indexed
  offset?: number       // 0-indexed
}
```

### Design Principles

- All values must be JSON-serializable (no functions, undefined, symbols)
- Trees can survive `JSON.parse(JSON.stringify(tree))` roundtrips
- `data` field is reserved for ecosystem use; specifications never define fields on it
- `position` must be absent on generated nodes

### Why This Matters for Our Pipeline

The JSON-serializability constraint means the AST is inherently portable and can be passed between contexts (server/client, different frameworks). The `data` field provides an escape hatch for custom metadata that custom component handlers can use.

---

## 3. hast: Hypertext Abstract Syntax Tree

**Repository**: https://github.com/syntax-tree/hast
**Spec version**: 2.4.0
**Type definitions**: `@types/hast`
**Stars**: 892

hast represents HTML (and embedded SVG/MathML) as an abstract syntax tree. It extends unist.
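To make the relationship between unist and hast concrete, here is a minimal sketch of a hand-built hast tree for `<p class="note">Hello</p>`, followed by the JSON round trip that the design principles rely on. The `Unist*` and `Hast*` interfaces are simplified local stand-ins written for this example (a real project would import the types from `@types/unist` and `@types/hast`), and the `sourceComponent` key under `data` is a hypothetical piece of pipeline metadata, not something either specification defines.

```typescript
// Simplified local stand-ins for the unist/hast shapes (assumption: a real
// project would import these from `@types/unist` and `@types/hast`).
interface UnistPoint { line: number; column: number; offset?: number }
interface UnistPosition { start: UnistPoint; end: UnistPoint }
interface UnistNode {
  type: string
  data?: Record<string, unknown>
  position?: UnistPosition
}

interface HastText extends UnistNode { type: 'text'; value: string }
interface HastElement extends UnistNode {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: Array<HastElement | HastText>
}
interface HastRoot extends UnistNode {
  type: 'root'
  children: Array<HastElement | HastText>
}

// A generated tree: no `position` fields, because nothing was parsed from source.
const tree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      // hast stores the `class` attribute as the `className` property (a list).
      properties: { className: ['note'] },
      // Hypothetical metadata key, used here only to illustrate the `data` field.
      data: { sourceComponent: 'Callout' },
      children: [{ type: 'text', value: 'Hello' }],
    },
  ],
}

// The tree is plain JSON, so serializing and parsing it preserves every field.
const roundTripped: HastRoot = JSON.parse(JSON.stringify(tree))
const sameAfterRoundTrip = JSON.stringify(roundTripped) === JSON.stringify(tree)
console.log(sameAfterRoundTrip) // true
```

Because every node is a plain object, the same tree could be produced on a server and consumed elsewhere without any custom serialization step, which is the portability property noted above.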

### Node Types

| Node Type | Extends | Description | Key Fields |
|-----------|---------|-------------|------------|
| **`Root`** | Parent | Document root | `children` |
| **`Element`** | Parent | HTML element | `tagName`, `properties`, `children`, `content?` |
| **`Text`** | Literal | Text content | `value` |
| **`Comment`** | Literal | HTML comment | `value` |
| **`Doctype`** | Node | Document type declaration | (none beyond unist Node) |

### Element Interface (the workhorse)

```typescript
interface Element extends Parent {
  type: 'element'
  tagName: string          // e.g., 'div', 'span', 'custom-card'
  properties: Properties   // HTML attributes mapped to DOM properties
  content?: Root           // Only for