docs(research): add polyglot SQL transpiler deep dive for multi-DB storage evaluation

2026-06-10 10:04:30 +00:00
parent bdb0b604e9
commit 6e71d1f306
3 changed files with 1151 additions and 0 deletions
--- a/docs/research/references/polyglot/01_overview.md
+++ b/docs/research/references/polyglot/01_overview.md
@@ -0,0 +1,137 @@
+# Polyglot: Research Overview
+
+**Library**: `polyglot-sql` (Rust crate) / `@polyglot-sql/sdk` (TypeScript/WASM) / `polyglot-sql` (Python)  
+**Repository**: <https://github.com/tobilg/polyglot>  
+**Current Version**: 0.4.4 (as of 2026-06-03)  
+**License**: MIT (+ sqlglot MIT for test fixtures)  
+**Author**: Tobias G. (tobilg)  
+**Inspiration**: Python [sqlglot](https://github.com/tobymao/sqlglot) by Toby Mao  
+
+---
+
+## 1. What Is Polyglot?
+
+Polyglot is a **SQL transpiler**: it parses SQL from one database dialect into an AST and generates SQL for a different dialect. It is **not** a database driver, ORM, query executor, or connection pool. Its core purpose is **dialect-agnostic SQL manipulation**: parse, transform, validate, format, and transpile SQL across 32 database dialects, plus a `Generic` dialect.
+
+### Key Capabilities
+
+| Capability | Description |
+|---|---|
+| **Parse** | Convert SQL string → typed AST with 200+ expression node types |
+| **Generate** | Convert AST → SQL string for any supported dialect |
+| **Transpile** | Convert SQL from dialect A → dialect B in one call |
+| **Format** | Pretty-print SQL with configurable guard rails |
+| **Build** | Construct SQL programmatically via a fluent builder API |
+| **Validate** | Syntax + semantic validation with error positions |
+| **Lineage** | Trace column lineage through queries; generate OpenLineage payloads |
+| **Diff** | AST-aware diff between two SQL expressions |
+| **Traverse** | DFS/BFS iterators, predicate queries, and transforms on the AST |
+
+### Supported Dialects (32)
+
+Athena, BigQuery, ClickHouse, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL
+
+Plus a `Generic` dialect for standard SQL.
+
+### Language Bindings
+
+| Binding | Package | Delivery |
+|---|---|---|
+| **Rust** | `polyglot-sql` on crates.io | Native Rust crate |
+| **TypeScript/WASM** | `@polyglot-sql/sdk` on npm | WASM module + JS wrapper |
+| **Python** | `polyglot-sql` on PyPI | PyO3 native extension |
+| **Go** | `github.com/tobilg/polyglot/packages/go` | PureGo wrapper over C FFI |
+| **C FFI** | Built from `polyglot-sql-ffi` | `.so` / `.dylib` / `.dll` + `.a` / `.lib` + header |
+
+---
+
+## 2. Core Philosophy & Design Principles
+
+1. **Pipeline architecture**: SQL → Tokenize → Parse → AST → Transform → Generate → SQL string. Each stage is independently configurable per dialect.
+
+2. **Ported from Python sqlglot**: The Rust implementation is a faithful port of the Python `sqlglot` library, maintaining compatibility with its test fixtures (10,220+ fixture cases at 100% pass rate). The architecture, expression types, transformation rules, and dialect behaviors mirror the Python original.
+
+3. **No runtime database connection**: Polyglot never connects to a database. It operates purely on SQL strings and ASTs. This makes it safe for sandboxed environments (WASM, serverless) and suitable for build-time / CI-time SQL analysis.
+
+4. **Feature-gated compilation**: Each dialect is behind a Cargo feature flag (`dialect-postgresql`, `dialect-mysql`, etc.), so users compiling for constrained targets (WASM) can include only what they need. The `default` feature set enables all dialects and every core module; only the `bindings` and function-catalog features are off by default.
+
+5. **Stack safety**: The `stacker` feature (default-on for native builds) grows the stack on deeply nested inputs, preventing stack overflow from pathological SQL. WASM builds opt out since `stacker` doesn't work there.
+
+6. **Guard rails**: Format/guard options limit input size (16 MiB default), token count (1M), AST node count (1M), and set-operation chain depth (256) to prevent resource exhaustion.
+
+7. **Performance-first**: Built in Rust for speed. The project's benchmarks (polyglot-sql v0.1.2 vs `sqlglot` v28.10.1) report a geometric-mean speedup of 8.70× over Python `sqlglot`; transpilation ranges from 1.6× to 19× and generation is 77–101× faster. The WASM build enables near-native performance in browsers.
+
+---
+
+## 3. How It Differs from Database Abstraction Layers
+
+**Critical distinction**: Polyglot is a **SQL dialect transpiler**, not a database abstraction layer. It does not:
+
+- Connect to databases
+- Execute queries
+- Manage connection pools
+- Handle migrations (no `CREATE TABLE` schema evolution management)
+- Map Rust types to database types
+- Provide an ORM-like interface
+- Handle async I/O
+
+Instead, it focuses purely on **SQL text manipulation**: parsing, analyzing, transforming, and generating SQL strings. This makes it complementary to (not competing with) libraries like Diesel, SQLx, or SeaORM.
+
+---
+
+## 4. Performance Characteristics
+
+From the project's benchmark suite (polyglot-sql v0.1.2 vs sqlglot v28.10.1):
+
+| Operation | Speedup Range |
+|---|---|
+| Parse (SQL → AST) | 10–13× faster |
+| Generate (AST → SQL) | 77–101× faster |
+| Roundtrip (parse → generate → re-parse) | 13–15× faster |
+| Transpile (full cross-dialect) | 1.6× (simple) to 19× (complex BigQuery→Snowflake) |
+| Geometric mean | **8.70×** |
+
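The geometric mean in the table above is the n-th root of the product of the individual speedups. A minimal sketch of that calculation follows; the function name is illustrative and the sample inputs in the usage note are made-up values, not the project's measurements.

```rust
/// Geometric mean of positive ratios, computed in log space so that a long
/// list of large speedups cannot overflow the intermediate product.
/// Returns `None` for an empty slice or any non-positive value.
fn geometric_mean(speedups: &[f64]) -> Option<f64> {
    if speedups.is_empty() || speedups.iter().any(|&s| s <= 0.0) {
        return None;
    }
    let log_sum: f64 = speedups.iter().map(|s| s.ln()).sum();
    Some((log_sum / speedups.len() as f64).exp())
}
```

For example, `geometric_mean(&[2.0, 8.0])` is 4.0 (up to floating-point rounding), whereas the arithmetic mean of the same inputs is 5.0. The geometric mean is the usual choice for averaging ratios because a single very large speedup does not dominate it.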
+Parse benchmarks (v0.4.x, native Rust):
+
+| Query | Mean |
+|---|---|
+| short (SELECT a, b, c) | 51.28 μs |
+| medium (5 cols, JOIN, GROUP BY) | 259.61 μs |
+| complex (3 CTEs, subquery) | 268.59 μs – 1.03 ms |
+
+---
+
+## 5. Project Maturity Indicators
+
+| Indicator | Status |
+|---|---|
+| **Version** | 0.4.4 (pre-1.0, active development) |
+| **Test coverage** | 18,745 test cases at 100% pass rate |
+| **crates.io downloads** | ~4,738 total (as of mid-2026) |
+| **Dependent crates** | 2 (via entdb) |
+| **Release cadence** | Frequent patch releases (0.4.2, 0.4.3, 0.4.4 in quick succession) |
+| **Source code size** | ~241K lines of Rust in core crate |
+| **Fuzzing** | Supported via `cargo +nightly fuzz` |
+| **CI** | Full test suite + FFI + Python + WASM |
+| **Documentation** | Rust API docs (docs.rs), TypeScript docs, Python docs, playground |
+| **Breaking changes** | Possible before 1.0; under semver, 0.x releases carry no API-stability guarantee |
+
+---
+
+## 6. License
+
+- **MIT License** for the Polyglot code itself
+- **sqlglot MIT License** for the test fixtures derived from the Python project
+- Both are permissive, suitable for commercial use
+
+---
+
+## References
+
+- <https://github.com/tobilg/polyglot> — Main repository
+- <https://crates.io/crates/polyglot-sql> — Rust crate on crates.io
+- <https://www.npmjs.com/package/@polyglot-sql/sdk> — TypeScript SDK on npm
+- <https://pypi.org/project/polyglot-sql/> — Python bindings on PyPI
+- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API documentation
+- <https://polyglot-playground.gh.tobilg.com/> — Interactive playground
+- <https://github.com/tobymao/sqlglot> — Original Python inspiration
--- a/docs/research/references/polyglot/02_architecture.md
+++ b/docs/research/references/polyglot/02_architecture.md
@@ -0,0 +1,720 @@
+# Polyglot: Architecture Deep Dive
+
+---
+
+## 1. Workspace Structure
+
+The repository is organized as a Cargo workspace with 5 crates and supporting packages:
+
+```
+polyglot/
+├── crates/
+│   ├── polyglot-sql/                    # Core Rust library (~241K LOC)
+│   │   └── src/
+│   │       ├── lib.rs                   # Public API, top-level functions
+│   │       ├── tokens.rs               # Tokenizer (lexer)
+│   │       ├── parser.rs                # Recursive-descent parser (~62K LOC)
+│   │       ├── expressions.rs           # AST node types (~15K LOC)
+│   │       ├── generator.rs             # SQL code generator (~39K LOC)
+│   │       ├── dialects/                # 33 dialect implementations
+│   │       │   ├── mod.rs              # Dialect trait, Dialect struct, CustomDialectBuilder
+│   │       │   ├── generic.rs          # Base/standard SQL dialect
+│   │       │   ├── postgres.rs         # PostgreSQL (~1.9K LOC)
+│   │       │   ├── mysql.rs            # MySQL
+│   │       │   ├── sqlite.rs           # SQLite
+│   │       │   ├── bigquery.rs         # BigQuery
+│   │       │   ├── ... (32 total)
+│   │       ├── builder.rs              # Fluent query builder API
+│   │       ├── transforms.rs           # Cross-dialect transform functions
+│   │       ├── validation.rs           # Syntax + semantic validation
+│   │       ├── schema.rs              # Schema representation
+│   │       ├── scope.rs               # Scope analysis
+│   │       ├── resolver.rs            # Column resolution
+│   │       ├── lineage.rs             # Column lineage tracking
+│   │       ├── openlineage.rs          # OpenLineage payload generation
+│   │       ├── diff.rs                # AST diff (ChangeDistiller algorithm)
+│   │       ├── planner.rs             # Logical query plan
+│   │       ├── optimizer/              # Query optimizer modules
+│   │       │   ├── annotate_types.rs  # Type annotation
+│   │       │   ├── qualify_columns.rs # Column qualification
+│   │       │   ├── qualify_tables.rs   # Table qualification
+│   │       │   ├── pushdown_predicates.rs
+│   │       │   ├── pushdown_projections.rs
+│   │       │   ├── eliminate_joins.rs
+│   │       │   ├── eliminate_ctes.rs
+│   │       │   ├── simplify.rs
+│   │       │   └── ...
+│   │       ├── traversal.rs            # DFS/BFS visitors, AST predicates
+│   │       ├── ast_transforms.rs       # AST manipulation utilities
+│   │       ├── error.rs               # Error types
+│   │       └── time.rs                # Time format conversion
+│   ├── polyglot-sql-function-catalogs/  # Optional dialect function catalogs
+│   ├── polyglot-sql-wasm/              # WASM bindings (wasm-pack)
+│   ├── polyglot-sql-ffi/               # C FFI bindings (cbindgen)
+│   └── polyglot-sql-python/            # Python bindings (PyO3 + maturin)
+├── packages/
+│   ├── sdk/                            # TypeScript SDK (@polyglot-sql/sdk)
+│   ├── go/                             # Go SDK (PureGo wrapper over FFI)
+│   ├── documentation/                  # TypeScript API docs site
+│   ├── playground/                      # Browser playground (React 19, Vite)
+│   └── python-docs/                     # Python API docs
+├── examples/
+│   ├── rust/                           # Rust usage example
+│   ├── typescript/                     # TypeScript SDK example
+│   └── c/                              # C FFI usage example
+└── tools/
+    ├── sqlglot-compare/                # Fixture extraction & comparison
+    └── bench-compare/                  # Performance benchmarks
+```
+
+---
+
+## 2. Data Flow Pipeline
+
+```
+┌──────────────────────────────────────────────────────────────────────┐
+│                        SQL String (source dialect)                   │
+└──────────────────────────┬──────────────────────────────────────────┘
+                           │
+                           ▼
+┌──────────────────────────────────────────────────────────────────────┐
+│                        Tokenizer (tokens.rs)                         │
+│  • Dialect-specific lexing rules (quotes, comments, keywords)        │
+│  • Configurable via TokenizerConfig per dialect                      │
+│  • Produces Vec<Token> with type, text, and Span (line/col/offset) │
+└──────────────────────────┬──────────────────────────────────────────┘
+                           │
+                           ▼
+┌──────────────────────────────────────────────────────────────────────┐
+│                    Parser (parser.rs, ~62K LOC)                      │
+│  • Recursive-descent with precedence climbing                       │
+│  • Dialect-aware parsing (custom keywords, syntax rules)            │
+│  • Produces Expression AST tree                                     │
+│  • Stack safety via `stacker` feature (default-on)                  │
+└──────────────────────────┬──────────────────────────────────────────┘
+                           │
+                           ▼
+┌──────────────────────────────────────────────────────────────────────┐
+│                    Expression AST (expressions.rs)                    │
+│  • Single tagged enum with 150+ variants                            │
+│  • Each variant has its own struct (Select, Insert, Function, etc.) │
+│  • Box<Variant> keeps enum size to 2 words (tag + pointer)          │
+│  • Serializable via serde (derive Serialize/Deserialize)             │
+│  • Optional TypeScript type generation via `ts-rs` feature flag      │
+└──────────────────────────┬──────────────────────────────────────────┘
+                           │
+                      ┌────┴────┐
+                      │         │
+            ┌─────────┘         └──────────┐
+            │                               │
+            ▼                               ▼
+┌────────────────────────┐    ┌────────────────────────────────────┐
+│  Transform Pipeline     │    │  Semantic / Analysis Modules       │
+│  (transpile path)      │    │  • validation.rs → syntax checks   │
+│                        │    │  • schema.rs → column/type lookup   │
+│  1. preprocess()       │    │  • scope.rs → scope analysis        │
+│     (whole-tree rewrites│    │  • resolver.rs → column resolution  │
+│      like eliminate_   │    │  • lineage.rs → column lineage      │
+│      qualify)           │    │  • openlineage.rs → OL payloads    │
+│                        │    │  • optimizer/ → query optimization   │
+│  2. transform_expr()   │    │  • diff.rs → AST diff               │
+│     (per-node rewrites  │    │  • planner.rs → logical plan DAG    │
+│      per dialect)       │    │  • traversal.rs → DFS/BFS visitors   │
+│                        │    │                                    │
+│  3. Generator          │    │                                    │
+│     (AST → SQL string) │    │                                    │
+└───────────┬────────────┘    └────────────────────────────────────┘
+            │
+            ▼
+┌──────────────────────────────────────────────────────────────────────┐
+│                     SQL String (target dialect)                      │
+└──────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 3. Core Abstractions
+
+### 3.1 Expression AST
+
+The central type is `Expression`, a large tagged enum with one variant per SQL construct:
+
+```rust
+pub enum Expression {
+    // Literals
+    Literal(Box<Literal>),
+    Boolean(BooleanLiteral),
+    Null(Null),
+
+    // Identifiers
+    Identifier(Identifier),
+    Column(Box<Column>),
+    Table(Box<TableRef>),
+    Star(Star),
+
+    // Queries
+    Select(Box<Select>),
+    Union(Box<Union>),
+    Intersect(Box<Intersect>),
+    Except(Box<Except>),
+    Subquery(Box<Subquery>),
+
+    // DML
+    Insert(Box<Insert>),
+    Update(Box<Update>),
+    Delete(Box<Delete>),
+    Copy(Box<CopyStmt>),
+
+    // Binary/Unary operators
+    And(Box<BinaryOp>),
+    Or(Box<BinaryOp>),
+    Add(Box<BinaryOp>),
+    Eq(Box<BinaryOp>),
+    // ... 30+ operator variants
+
+    // Functions
+    Function(Box<Function>),
+    AggregateFunction(Box<AggregateFunction>),
+    WindowFunction(Box<WindowFunction>),
+
+    // Clauses
+    From(Box<From>),
+    Join(Box<Join>),
+    Where(Box<Where>),
+    OrderBy(Box<OrderBy>),
+    // ...
+
+    // ~150 total variants
+}
+```
+
+Key design choices:
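+- **Boxed variants**: Most variants wrap their payload in `Box`, which keeps `size_of::<Expression>()` small: 2 words (16 bytes on 64-bit), provided every unboxed payload (`Identifier`, `Null`, `Star`, `BooleanLiteral`) fits in a single word.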
+- **Serde support**: `#[derive(Serialize, Deserialize)]` for JSON serialization across FFI/WASM boundaries.
+- **TypeScript types**: Optional `ts-rs` feature generates TypeScript interfaces.
+- **Convenience methods**: `Expression::column()`, `Expression::number()`, `Expression::sql()`, `Expression::sql_for()`.
+
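The effect of boxing on enum size can be checked with `std::mem::size_of`. The sketch below uses two hypothetical toy enums (`InlineExpr`, `BoxedExpr`) and a made-up 64-byte payload; none of these types come from polyglot-sql.

```rust
use std::mem::size_of;

// Hypothetical payload, deliberately large (64 bytes).
#[allow(dead_code)]
struct BigPayload {
    _fields: [u64; 8],
}

// Payloads stored inline: the enum is at least as large as its largest variant.
#[allow(dead_code)]
enum InlineExpr {
    Select(BigPayload),
    Insert(BigPayload),
}

// Payloads stored behind a pointer: every variant is one pointer wide,
// so the enum needs only a tag plus a pointer.
#[allow(dead_code)]
enum BoxedExpr {
    Select(Box<BigPayload>),
    Insert(Box<BigPayload>),
}

/// Returns (inline size, boxed size) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<InlineExpr>(), size_of::<BoxedExpr>())
}
```

The boxed form trades one heap allocation per node for a much smaller enum, which matters when the enum is moved around and stored in `Vec`s throughout a parser.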
+### 3.2 DialectType Enum
+
+```rust
+pub enum DialectType {
+    Generic, PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, SQLite,
+    Hive, Spark, Trino, Presto, Redshift, TSQL, Oracle, ClickHouse,
+    Databricks, Athena, Teradata, Doris, StarRocks, Materialize,
+    RisingWave, SingleStore, CockroachDB, TiDB, Druid, Solr, Tableau,
+    Dune, Fabric, Drill, Dremio, Exasol, DataFusion,
+}
+```
+
+- Implements `FromStr` with aliases (e.g., `"mssql"` → `TSQL`, `"cockroach"` → `CockroachDB`)
+- Each variant maps to a feature-gated dialect module
+- Custom dialects can be registered at runtime via `CustomDialectBuilder`
+
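Alias handling through `FromStr` can be sketched as follows. `MiniDialect` is a hypothetical four-variant stand-in for `DialectType`; the `mssql` and `cockroach` aliases are the ones named above, while the `postgres` alias and the case-insensitive matching are assumptions of this sketch.

```rust
use std::str::FromStr;

// Hypothetical, pared-down stand-in for the real dialect enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MiniDialect {
    Generic,
    PostgreSQL,
    TSQL,
    CockroachDB,
}

impl FromStr for MiniDialect {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Normalize once, then match canonical names and aliases together.
        match s.trim().to_ascii_lowercase().as_str() {
            "generic" => Ok(Self::Generic),
            "postgresql" | "postgres" => Ok(Self::PostgreSQL),
            "tsql" | "mssql" => Ok(Self::TSQL),
            "cockroachdb" | "cockroach" => Ok(Self::CockroachDB),
            other => Err(format!("unknown dialect: {other}")),
        }
    }
}
```

With this in place, `"mssql".parse::<MiniDialect>()` yields `Ok(MiniDialect::TSQL)`, and an unrecognized name yields an `Err` instead of silently falling back to a default.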
+### 3.3 DialectImpl Trait
+
+```rust
+pub trait DialectImpl {
+    fn dialect_type(&self) -> DialectType;
+    fn tokenizer_config(&self) -> TokenizerConfig { /* default */ }
+    fn generator_config(&self) -> GeneratorConfig { /* default */ }
+    fn generator_config_for_expr(&self, _expr: &Expression) -> GeneratorConfig { /* default */ }
+    fn transform_expr(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
+    fn preprocess(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
+}
+```
+
+Each dialect implements this trait to provide:
+1. **Tokenizer config**: Identifier quoting characters, string delimiters, keyword overrides, comment styles, hex number support
+2. **Generator config**: 30+ flags controlling SQL output (identifier quote style, function casing, `LIMIT` vs `TOP` vs `FETCH FIRST`, etc.)
+3. **Per-node transform**: Dialect-specific expression rewrites (e.g., PostgreSQL transforms `IFNULL` → `COALESCE`, SQLite transforms `TRY_CAST` → `CAST`)
+4. **Whole-tree preprocess**: Structural rewrites that need full-tree context (e.g., eliminating `QUALIFY` for dialects that don't support it)
+
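The "defaults plus per-dialect overrides" pattern of the trait can be illustrated with a much smaller, hypothetical trait. `QuoteStyle`, `Postgres`, and `MySql` below are invented for this sketch and are not polyglot-sql types; only the quote characters (double quote for PostgreSQL, backtick for MySQL) come from the dialect notes in this document.

```rust
/// Hypothetical stand-in for a dialect trait with default behavior.
trait QuoteStyle {
    /// Default: ANSI-style double quotes.
    fn identifier_quote(&self) -> char {
        '"'
    }

    /// Quote an identifier, doubling any embedded quote character.
    fn quote_identifier(&self, name: &str) -> String {
        let q = self.identifier_quote();
        let mut out = String::with_capacity(name.len() + 2);
        out.push(q);
        for ch in name.chars() {
            if ch == q {
                out.push(q); // escape by doubling
            }
            out.push(ch);
        }
        out.push(q);
        out
    }
}

struct Postgres;
struct MySql;

// Inherits every default.
impl QuoteStyle for Postgres {}

// Overrides only the quote character; quoting logic is reused.
impl QuoteStyle for MySql {
    fn identifier_quote(&self) -> char {
        '`'
    }
}
```

A dialect that matches the defaults needs an empty `impl` block, and a dialect that differs overrides only the methods that change. That is the same shape as the `{ /* default */ }` bodies in the trait above.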
+### 3.4 Dialect Struct (High-Level API)
+
+```rust
+pub struct Dialect {
+    dialect_type: DialectType,
+    tokenizer: Tokenizer,
+    generator_config: Arc<GeneratorConfig>,
+    transformer: Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>,
+    generator_config_for_expr: Option<Box<dyn Fn(&Expression) -> GeneratorConfig + Send + Sync>>,
+    custom_preprocess: Option<Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>>,
+}
+```
+
+The `Dialect` struct bundles all dialect-specific state and provides the primary API:
+
+```rust
+// Parse SQL
+let ast = dialect.parse("SELECT 1")?;
+
+// Generate SQL from AST
+let sql = dialect.generate(&ast[0])?;
+
+// Transpile between dialects
+let results = dialect.transpile("SELECT IFNULL(a,b) FROM t", DialectType::PostgreSQL)?;
+
+// Tokenize
+let tokens = dialect.tokenize("SELECT 1")?;
+```
+
+### 3.5 CustomDialectBuilder
+
+For runtime-extensible dialect support:
+
+```rust
+use polyglot_sql::dialects::{CustomDialectBuilder, Dialect, DialectType};
+use polyglot_sql::generator::NormalizeFunctions;
+
+// Register a custom dialect inheriting from PostgreSQL
+CustomDialectBuilder::new("my_postgres")
+    .based_on(DialectType::PostgreSQL)
+    .generator_config_modifier(|gc| {
+        gc.normalize_functions = NormalizeFunctions::Lower;
+    })
+    .register()?;
+
+let d = Dialect::get_by_name("my_postgres").unwrap();
+// Use like any built-in dialect
+```
+
+---
+
+## 4. Dialect Implementation Details
+
+### 4.1 PostgreSQL (`postgres.rs`, ~1,879 LOC)
+
+**Tokenizer:**
+- `$$` string literals (dollar-quoting)
+- Double-quote identifier quoting
+- Nested block comments
+- `EXEC` treated as generic command
+
+**Generator config highlights:**
+- `identifier_quote: '"'` (double quotes)
+- `single_string_interval: true` (`INTERVAL '1 day'`)
+- `parameter_token: "$"` (`$1`, `$2` placeholders)
+- `supports_select_into: true`
+- `supports_window_exclude: true`
+- `can_implement_array_any: true`
+
+**Transform examples:**
+- `IFNULL(a, b)` → `COALESCE(a, b)`
+- `RAND()` → `RANDOM()`
+- `DATEDIFF(day, a, b)` → `CAST(b - a AS INT)` (date subtraction)
+- `JSON_EXTRACT(a, '$.x')` → `a #> '{x}'` (path-extraction operator)
+- `JSON_EXTRACT_SCALAR(a, '$.x')` → `a #>> '{x}'`
+- `DATE_ADD` / `DATE_SUB` → `+` / `-` interval arithmetic
+- Type mappings: `TINYINT` → `SMALLINT`, `FLOAT` → `REAL`, `DOUBLE` → `DOUBLE PRECISION`
+- `ILIKE` preserved (native PostgreSQL)
+- `RegexpLike` → `~` operator, `RegexpILike` → `~*` operator
+
+### 4.2 SQLite (`sqlite.rs`, ~750 LOC)
+
+**Tokenizer:**
+- Supports `"`, `[`, `` ` `` as identifier quote characters
+- No nested comments
+- Hex number literals (`0xCC`)
+
+**Generator config:**
+- `identifier_quote: '"'` (double quotes)
+- `supports_table_alias_columns: false`
+- `json_key_value_pair_sep: ","` (comma-style `JSON_OBJECT`)
+
+**Transform examples:**
+- `NVL(a, b)` → `IFNULL(a, b)`
+- `TRY_CAST(x AS t)` → `CAST(x AS t)` (no try-cast)
+- `RANDOM()` → function
+- `ILIKE` → `LOWER(left) LIKE LOWER(right)` (no native ILIKE)
+- `CountIf(cond)` → `SUM(IIF(cond, 1, 0))`
+- `CEIL(x)` → function form
+- `DATE_TRUNC(unit, col)` → various strftime patterns
+- `DATE_DIFF` → `JULIANDAY()`-difference patterns
+
+### 4.3 MySQL (`mysql.rs`)
+
+**Tokenizer:** Backtick identifiers, `#` comments  
+**Generator:** Backtick quoting, `LIMIT` syntax, `CONCAT()` instead of `||`  
+**Transforms:** `IFNULL(a,b)` → `COALESCE(a,b)`, `||` → `CONCAT()` (string concat), etc.
+
+### 4.4 BigQuery (`bigquery.rs`)
+
+**Tokenizer:** Backtick identifiers, `QUALIFY` keyword  
+**Generator:** Backtick quoting, `STRUCT` types, `QUALIFY` clause, `DATE_DIFF` syntax  
+**Transforms:** Complex date/timestamp function mappings, `UNNEST` handling, `APPROX_COUNT_DISTINCT` preserved (native BigQuery)
+
+### 4.5 How Transpilation Works
+
+The full transpilation pipeline:
+
+```
+Input SQL (source dialect)
+      │
+      ▼
+Source Dialect Tokenizer
+      │
+      ▼
+Parser (dialect-aware)
+      │
+      ▼
+Expression AST
+      │
+      ▼
+Target Dialect::preprocess()      ← whole-tree rewrites
+      │
+      ▼
+Target Dialect::transform_expr()   ← per-node rewrites (recursive, bottom-up)
+      │
+      ▼
+Normalized AST
+      │
+      ▼
+Target Dialect Generator
+      │
+      ▼
+Output SQL (target dialect)
+```
+
+The transform pipeline uses an explicit task stack (not recursive calls) for the hot paths to avoid stack overflow. The `stacker` crate provides additional stack-growth protection.
+
+Key cross-dialect transforms include:
+- Function renaming: `IFNULL` ↔ `COALESCE` ↔ `NVL`, `DATEDIFF` ↔ date arithmetic, `STRING_AGG` ↔ `GROUP_CONCAT`
+- Type mapping: `TINYINT` ↔ `SMALLINT`, `FLOAT` ↔ `REAL`, `JSON` ↔ `JSONB`
+- Syntax conversion: `LIMIT` ↔ `TOP` ↔ `FETCH FIRST`, `||` (concat) ↔ `CONCAT()`, `SELECT INTO` ↔ `CREATE TABLE AS`
+- Boolean handling: `BOOL_AND`/`BOOL_OR` ↔ `MIN`/`MAX`-over-`CASE`
+- JSON operators: `JSON_EXTRACT` ↔ `#>`/`#>>` ↔ `->`/`->>` (PostgreSQL arrow syntax)
+
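The explicit task stack mentioned above can be sketched on a toy AST. Everything here (`Expr`, `Task`, `col`, `func`, `rewrite_ifnull`) is hypothetical and far simpler than the real `Expression` type; the sketch only shows how a bottom-up rewrite such as `IFNULL` → `COALESCE` can run without recursive calls.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Func { name: String, args: Vec<Expr> },
}

// Work items: either visit a node, or rebuild a function call
// once all of its arguments have been rewritten.
enum Task {
    Visit(Expr),
    Build { name: String, argc: usize },
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn func(name: &str, args: Vec<Expr>) -> Expr {
    Expr::Func { name: name.to_string(), args }
}

/// Bottom-up rewrite of IFNULL(..) to COALESCE(..) using a heap-allocated
/// task stack instead of the call stack.
fn rewrite_ifnull(root: Expr) -> Expr {
    let mut tasks = vec![Task::Visit(root)];
    let mut done: Vec<Expr> = Vec::new();
    while let Some(task) = tasks.pop() {
        match task {
            Task::Visit(Expr::Column(name)) => done.push(Expr::Column(name)),
            Task::Visit(Expr::Func { name, args }) => {
                // Schedule the rebuild first so it runs after the children.
                tasks.push(Task::Build { name, argc: args.len() });
                // Reverse push: the leftmost argument is popped first.
                for arg in args.into_iter().rev() {
                    tasks.push(Task::Visit(arg));
                }
            }
            Task::Build { name, argc } => {
                // The last `argc` finished nodes are this call's arguments.
                let args = done.split_off(done.len() - argc);
                let name = if name.eq_ignore_ascii_case("IFNULL") {
                    "COALESCE".to_string()
                } else {
                    name
                };
                done.push(Expr::Func { name, args });
            }
        }
    }
    done.pop().expect("exactly one rewritten root remains")
}
```

Because pending work lives in a `Vec`, nesting depth is bounded by heap memory and not by the thread's stack size. Dropping a very deep tree is a separate concern that this sketch does not address.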
+---
+
+## 5. Fluent Builder API
+
+The builder module (`builder.rs`, ~3.3K LOC) provides a type-safe, ergonomic way to construct SQL expressions without string interpolation:
+
+```rust
+use polyglot_sql::builder::*;
+
+// SELECT id, name FROM users WHERE age > 18 ORDER BY name LIMIT 10
+let expr = select(["id", "name"])
+    .from("users")
+    .where_(col("age").gt(lit(18)))
+    .order_by(["name"])
+    .limit(10)
+    .build();
+
+// INSERT
+let ins = insert_into("users")
+    .columns(["id", "name"])
+    .values([lit(1), lit("Alice")])
+    .build();
+
+// CASE expression
+let expr = case()
+    .when(col("x").gt(lit(0)), lit("positive"))
+    .else_(lit("non-positive"))
+    .build();
+
+// Set operations
+let expr = union_all(
+    select(["id"]).from("a"),
+    select(["id"]).from("b"),
+).order_by(["id"]).limit(5).build();
+```
+
+Expression helpers:
+- `col("users.id")` — column reference (splits on last `.`)
+- `lit(42)`, `lit("hello")`, `lit(3.14)`, `lit(true)` — literals
+- `func("COALESCE", [col("a"), col("b")])` — function calls
+- Operator chain: `col("age").gte(lit(18)).and(col("status").eq(lit("active")))`
+
+The builder generates an `Expression` AST that can then be serialized to any dialect via `generate()`.
+
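The "splits on last `.`" behavior described for `col` can be illustrated with the standard library alone. `split_column_ref` is a hypothetical helper written for this note, not the builder's actual implementation, and it ignores quoted identifiers that contain dots.

```rust
/// Split a dotted column reference on its last `.`.
/// Returns (optional qualifier, column name).
fn split_column_ref(reference: &str) -> (Option<&str>, &str) {
    match reference.rsplit_once('.') {
        Some((qualifier, name)) => (Some(qualifier), name),
        None => (None, reference),
    }
}
```

Splitting on the last dot means `"db.users.id"` yields the qualifier `"db.users"` and the column `"id"`, so multi-part table names stay intact.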
+---
+
+## 6. Validation and Schema-Aware Analysis
+
+### 6.1 Syntax Validation
+
+```rust
+use polyglot_sql::{validate, DialectType};
+
+let result = validate("SELECT * FORM users", DialectType::Generic);
+// result.valid == false
+// result.errors contain line/column/message/error codes
+```
+
+Error codes:
+- `E001` — Syntax error
+- `E002` — Tokenization error
+- `E003` — Parse error
+- `E004` — Invalid expression (not a valid statement)
+- `E005` — Trailing comma in strict mode
+
+### 6.2 Schema-Aware Validation
+
+```rust
+use polyglot_sql::{
+    validate_with_schema, DialectType, SchemaColumn, SchemaTable,
+    SchemaValidationOptions, ValidationSchema,
+};
+
+let schema = ValidationSchema {
+    strict: Some(true),
+    tables: vec![
+        SchemaTable {
+            name: "users".into(),
+            columns: vec![
+                SchemaColumn { name: "id".into(), data_type: "integer".into(), nullable: Some(false), primary_key: true, unique: false, references: None },
+                SchemaColumn { name: "email".into(), data_type: "varchar".into(), nullable: Some(false), primary_key: false, unique: true, references: None },
+            ],
+            // ...
+        },
+    ],
+};
+
+let opts = SchemaValidationOptions { check_types: true, check_references: true, strict: None, semantic: true };
+let result = validate_with_schema("SELECT id FROM users WHERE email = 1", DialectType::Generic, &schema, &opts);
+// result.valid == false (type mismatch: email is varchar, compared to integer)
+```
+
+Schema-aware error codes:
+- `E200`/`E201` — Unknown table/column
+- `E210`–`E217`, `W210`–`W216` — Type checks
+- `E220`, `E221`, `W220`, `W221`, `W222` — Reference/FK checks
+
+### 6.3 Function Catalogs
+
+Optional feature-gated function catalogs (currently ClickHouse and DuckDB) provide known function signatures for semantic type checking:
+
+```toml
+polyglot-sql = { version = "0.4", features = ["function-catalog-clickhouse"] }
+```
+
+---
+
+## 7. Column Lineage & OpenLineage
+
+### 7.1 Column Lineage
+
+Trace how columns flow through a query:
+
+```rust
+use polyglot_sql::{parse, DialectType};
+use polyglot_sql::lineage::get_column_lineage;
+
+let ast = parse("SELECT a + b AS total FROM t", DialectType::Generic).unwrap();
+let lineage = get_column_lineage(&ast[0], /* schema */ None, DialectType::Generic);
+// lineage tells you that "total" depends on columns "a" and "b" from table "t"
+```
+
+### 7.2 OpenLineage Payload Generation
+
+```rust
+use polyglot_sql::DialectType;
+use polyglot_sql::openlineage::{generate_run_event, OpenLineageOptions, OpenLineageDatasetId};
+
+let opts = OpenLineageOptions {
+    dialect: DialectType::PostgreSQL,
+    producer: "my-app".into(),
+    dataset_namespace: Some("mydb".into()),
+    // ...
+};
+let event = generate_run_event("SELECT * FROM users", &opts)?;
+// event is a JSON-serializable OpenLineage RunEvent with columnLineage facets
+```
+
+---
+
+## 8. Error Handling
+
+### 8.1 Error Types
+
+```rust
+pub enum Error {
+    Tokenize { message: String, line: usize, column: usize, start: usize, end: usize },
+    Parse { message: String, line: usize, column: usize, start: usize, end: usize },
+    Generate(String),
+    Unsupported { feature: String, dialect: String },
+    Syntax { message: String, line: usize, column: usize, start: usize, end: usize },
+    Internal(String),
+}
+```
+
+All position-bearing errors include:
+- `line` — 1-based line number
+- `column` — 1-based column number
+- `start` / `end` — byte offsets (0-based, end exclusive)
+
+```rust
+let err = Error::parse("Unexpected token", 3, 15, 42, 44);
+assert_eq!(err.line(), Some(3));
+assert_eq!(err.column(), Some(15));
+assert_eq!(err.start(), Some(42));
+```
+
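The relationship between the 0-based byte offsets and the 1-based line/column pair can be made concrete with a small helper. `line_col` is a hypothetical function for illustration. It counts columns in characters, which is an assumption; the library may count bytes or UTF-16 units instead.

```rust
/// Convert a 0-based byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters (an assumption of this sketch).
/// Returns `None` if the offset is out of range or splits a character.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts just after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

For the input `"SELECT *\nFORM users"`, byte offset 9 points at the `F` of `FORM`, and `line_col` maps it to line 2, column 1. With an exclusive end offset of 13, the slice `&sql[9..13]` is exactly the offending token.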
+### 8.2 Validation Errors
+
+```rust
+pub struct ValidationError {
+    pub message: String,
+    pub line: Option<usize>,
+    pub column: Option<usize>,
+    pub severity: ValidationSeverity,  // Error or Warning
+    pub code: String,                    // e.g., "E001", "E200"
+    pub start: Option<usize>,
+    pub end: Option<usize>,
+}
+
+pub struct ValidationResult {
+    pub valid: bool,
+    pub errors: Vec<ValidationError>,
+}
+```
+
+### 8.3 Guard Rail Errors
+
+Format operations have configurable guard limits that return structured errors:
+
+- `E_GUARD_INPUT_TOO_LARGE` — input exceeds `max_input_bytes`
+- `E_GUARD_TOKEN_BUDGET_EXCEEDED` — token count exceeds `max_tokens`
+- `E_GUARD_AST_BUDGET_EXCEEDED` — AST node count exceeds `max_ast_nodes`
+- `E_GUARD_SET_OP_CHAIN_EXCEEDED` — UNION/INTERSECT/EXCEPT chain exceeds `max_set_op_chain`
+
+---
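A guard check of this kind can be sketched as a comparison of measured usage against a limits struct. `GuardLimits`, `Usage`, and `check_guards` are hypothetical names for this sketch. The limit field names and error codes are the ones listed above, and the default values are the ones quoted in the overview (16 MiB, 1M tokens, 1M AST nodes, chain depth 256). Treating a value equal to the limit as allowed is an assumption.

```rust
#[derive(Debug, Clone, Copy)]
struct GuardLimits {
    max_input_bytes: usize,
    max_tokens: usize,
    max_ast_nodes: usize,
    max_set_op_chain: usize,
}

impl Default for GuardLimits {
    fn default() -> Self {
        Self {
            max_input_bytes: 16 * 1024 * 1024, // 16 MiB
            max_tokens: 1_000_000,
            max_ast_nodes: 1_000_000,
            max_set_op_chain: 256,
        }
    }
}

/// Measured sizes for one input; how they are counted is up to the caller.
#[derive(Debug, Clone, Copy)]
struct Usage {
    input_bytes: usize,
    tokens: usize,
    ast_nodes: usize,
    set_op_chain: usize,
}

/// Returns the first violated guard as a structured error code.
fn check_guards(limits: &GuardLimits, usage: &Usage) -> Result<(), &'static str> {
    if usage.input_bytes > limits.max_input_bytes {
        return Err("E_GUARD_INPUT_TOO_LARGE");
    }
    if usage.tokens > limits.max_tokens {
        return Err("E_GUARD_TOKEN_BUDGET_EXCEEDED");
    }
    if usage.ast_nodes > limits.max_ast_nodes {
        return Err("E_GUARD_AST_BUDGET_EXCEEDED");
    }
    if usage.set_op_chain > limits.max_set_op_chain {
        return Err("E_GUARD_SET_OP_CHAIN_EXCEEDED");
    }
    Ok(())
}
```

Checking the cheapest measurement first (input size, known before tokenizing) lets an oversized input be rejected before any parsing work is done.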
+
+## 9. AST Traversal & Analysis
+
+### 9.1 Traversal
+
+```rust
+use polyglot_sql::{parse, DialectType};
+use polyglot_sql::traversal::*;
+
+let ast = parse("SELECT a, b FROM t WHERE x > 1", DialectType::Generic).unwrap();
+let columns = get_columns(&ast[0]);  // ["a", "b", "x"]
+let tables = get_tables(&ast[0]);    // ["t"]
+```
+
+Available predicates (70+):
+- `is_select`, `is_insert`, `is_update`, `is_delete`, `is_ddl`
+- `is_join`, `is_where`, `is_group_by`, `is_order_by`, `is_limit`
+- `is_function`, `is_aggregate`, `is_subquery`, `is_cte`
+- `is_comparison`, `is_logical`, `is_arithmetic`
+- `contains_subquery`, `contains_aggregate`, `contains_window_function`
+
+Iterators: `DfsIter`, `BfsIter` for depth-first and breadth-first traversal.
+
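The difference between the two traversal orders can be shown on a toy tree. `Node`, `node`, `dfs_labels`, and `bfs_labels` are hypothetical and unrelated to the crate's `DfsIter`/`BfsIter` types. Pre-order is assumed for the depth-first walk.

```rust
use std::collections::VecDeque;

// Hypothetical toy tree; labels stand in for AST node kinds.
struct Node {
    label: &'static str,
    children: Vec<Node>,
}

fn node(label: &'static str, children: Vec<Node>) -> Node {
    Node { label, children }
}

/// Depth-first, pre-order: a node, then each subtree from left to right.
fn dfs_labels(root: &Node) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(current) = stack.pop() {
        order.push(current.label);
        // Push children in reverse so the leftmost child is popped first.
        stack.extend(current.children.iter().rev());
    }
    order
}

/// Breadth-first: all nodes at one depth before any node at the next.
fn bfs_labels(root: &Node) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(current) = queue.pop_front() {
        order.push(current.label);
        queue.extend(current.children.iter());
    }
    order
}
```

For a tree `Select → [From → [Table], Where → [Gt]]`, the depth-first order is `Select, From, Table, Where, Gt` and the breadth-first order is `Select, From, Where, Table, Gt`. The only structural difference between the two functions is a stack versus a queue.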
+### 9.2 AST Transforms
+
+```rust
+use polyglot_sql::ast_transforms::*;
+use polyglot_sql::builder::{col, lit}; // needed for the `add_where` example below
+
+// Rename tables
+let renamed = rename_tables(expr, &[("old_name", "new_name")]);
+
+// Add WHERE condition
+let filtered = add_where(expr, col("active").eq(lit(true)));
+
+// Remove LIMIT/OFFSET
+let unlimited = remove_limit_offset(expr);
+```
+
+### 9.3 AST Diff
+
+```rust
+use polyglot_sql::diff::{diff, diff_with_config, DiffConfig};
+
+let edits = diff(&source_expr, &target_expr, true);
+for edit in &edits {
+    if edit.is_change() {
+        println!("{:?}", edit);
+    }
+}
+```
+
+Uses the ChangeDistiller algorithm with Dice coefficient matching for structural comparison.
+
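The Dice coefficient is 2·|A ∩ B| / (|A| + |B|) for two sets A and B. A minimal sketch over character bigrams follows. Using character bigrams of the raw strings as the sets is an assumption for illustration; the crate may build its sets differently.

```rust
use std::collections::HashSet;

/// The set of adjacent character pairs in `s`.
fn bigrams(s: &str) -> HashSet<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

/// Dice coefficient over bigram sets: 1.0 means identical sets, 0.0 disjoint.
fn dice(a: &str, b: &str) -> f64 {
    let (x, y) = (bigrams(a), bigrams(b));
    if x.is_empty() && y.is_empty() {
        // Too short to form bigrams: fall back to exact equality.
        return if a == b { 1.0 } else { 0.0 };
    }
    let shared = x.intersection(&y).count();
    2.0 * shared as f64 / (x.len() + y.len()) as f64
}
```

For `"night"` and `"nacht"`, each string has four bigrams and they share only `ht`, giving 2·1 / (4 + 4) = 0.25. A matcher would typically treat two nodes as candidates for the same node only when this score exceeds a chosen threshold.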
+### 9.4 Logical Planner
+
+```rust
+use polyglot_sql::planner::Plan;
+
+let plan = Plan::from_expression(&expr);
+// plan.root is a Step DAG
+// plan.leaves() returns leaf steps
+// plan.dag() returns the dependency graph
+```
+
+Step kinds: Scan, Filter, Project, Aggregate, Join, Sort, Limit, etc.
+
+---
+
+## 10. Optimizer Modules
+
+The optimizer is available behind the `semantic` feature flag:
+
+| Module | Purpose |
+|---|---|
+| `qualify_columns.rs` | Resolve unqualified column references to table.column |
+| `qualify_tables.rs` | Expand table names with schema/catalog |
+| `annotate_types.rs` | Infer and annotate expression types |
+| `pushdown_predicates.rs` | Push WHERE conditions into JOINs |
+| `pushdown_projections.rs` | Reduce columns to only what's needed |
+| `eliminate_joins.rs` | Remove unnecessary JOINs |
+| `eliminate_ctes.rs` | Inline single-use CTEs |
+| `simplify.rs` | Simplify boolean expressions, constant folding |
+| `normalize.rs` | Expression normalization |
+| `canonicalize.rs` | Query canonicalization |
+| `subquery.rs` | Subquery analysis |
+
+---
+
+## 11. Async Support
+
+**Polyglot does not use async I/O** — it is a pure computational library. All operations are synchronous and CPU-bound:
+
+- `parse()` — synchronous
+- `generate()` — synchronous
+- `transpile()` — synchronous
+- `validate()` — synchronous
+- `format()` — synchronous
+
+This is by design: Polyglot operates on SQL strings in memory, with no network or filesystem I/O. For use in async contexts (Tokio, async-std), callers should use `tokio::task::spawn_blocking()` or similar to offload CPU-heavy parsing/transpilation to a blocking thread pool.
+
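The offloading pattern can be sketched with the standard library only. Here `std::thread::spawn` plays the role that `tokio::task::spawn_blocking()` would play inside a Tokio runtime, and `cpu_bound_work` is a made-up stand-in for a synchronous parse or transpile call.

```rust
use std::thread;

// Stand-in for a synchronous, CPU-bound call; it only counts
// whitespace-separated words.
fn cpu_bound_work(sql: String) -> usize {
    sql.split_whitespace().count()
}

/// Run the work on a separate OS thread and wait for its result.
/// An `Err` means the worker thread panicked.
fn run_off_thread(sql: String) -> thread::Result<usize> {
    thread::spawn(move || cpu_bound_work(sql)).join()
}
```

The input is moved into the closure as an owned `String`, which is the same requirement `spawn_blocking` imposes: the closure must be `'static` and `Send`, so borrowed SQL text has to be cloned or otherwise owned before it is handed off.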
+---
+
+## 12. Feature Flags
+
+| Flag | Description | Default |
+|---|---|---|
+| `all-dialects` | Enable all 32 dialect parsers | ✅ |
+| `generate` | SQL generation from AST | ✅ |
+| `transpile` | Cross-dialect transpilation (implies `generate`) | ✅ |
+| `builder` | Fluent query builder API (implies `generate`) | ✅ |
+| `ast-tools` | AST inspection & transform utilities | ✅ |
+| `semantic` | Schema, resolver, lineage, optimizer, validation | ✅ |
+| `openlineage` | OpenLineage payload generation (implies `semantic`) | ✅ |
+| `diff` | AST diff support (implies `generate`) | ✅ |
+| `planner` | Logical planning helpers | ✅ |
+| `time` | Time-format conversion helpers | ✅ |
+| `stacker` | Stack-growth protection for native builds | ✅ |
+| `bindings` | TypeScript type generation via `ts-rs` | ❌ |
+| `dialect-postgresql` | PostgreSQL dialect only | — |
+| `dialect-mysql` | MySQL dialect only | — |
+| ... (one per dialect) | Individual dialect selector | — |
+| `function-catalog-clickhouse` | ClickHouse function catalog | ❌ |
+| `function-catalog-duckdb` | DuckDB function catalog | ❌ |
+| `function-catalog-all-dialects` | All function catalogs | ❌ |
+
+Minimal WASM build (for constrained targets):
+```toml
+polyglot-sql = { version = "0.4", default-features = false, features = ["generate", "transpile", "dialect-postgresql", "dialect-mysql"] }
+```
+
+---
+
+## References
+
+- Source code examined: `/workspace/polyglot/crates/polyglot-sql/src/` (~241K LOC)
+- Architecture documentation: `/workspace/polyglot/docs/sqlglot-architecture.md`
+- Benchmark results: `/workspace/polyglot/docs/benchmark.md`
+- README: `/workspace/polyglot/README.md`, `/workspace/polyglot/crates/polyglot-sql/README.md`
+- CHANGELOG: `/workspace/polyglot/CHANGELOG.md`
--- a/docs/research/references/polyglot/03_analysis.md
+++ b/docs/research/references/polyglot/03_analysis.md
@@ -0,0 +1,294 @@
+# Polyglot: Suitability Analysis & Comparisons
+
+---
+
+## 1. What Polyglot Is NOT
+
+Before evaluating suitability, it's essential to understand what Polyglot **does not** do:
+
+| NOT a... | Because |
+|---|---|
+| **Database driver** | No connection management, no query execution, no result set handling |
+| **ORM** | No object-relational mapping, no model definitions, no active record pattern |
+| **Migration tool** | No `CREATE TABLE` evolution management, no up/down migrations framework |
+| **Type mapper** | No Rust type → SQL type mapping, no `FromRow` derives |
+| **Connection pool** | No async I/O, no TCP connections, no TLS |
+| **Query executor** | Never connects to a database; operates purely on SQL text |
+
+**Polyglot is a SQL dialect transpiler.** It converts SQL strings between database dialects. Period.
+
+---
+
+## 2. Suitability Assessment for Multi-Database Storage Layer
+
+### 2.1 What Polyglot CAN Do for a Multi-DB Project
+
+| Use Case | Polyglot Support | Maturity |
+|---|---|---|
+| **SQL dialect translation** | ✅ Core purpose; 32 dialects, 100% pass rate on 10,220+ sqlglot fixture cases | Mature |
+| **SQL pretty-printing** | ✅ Built-in format with guard rails | Mature |
+| **SQL syntax validation** | ✅ Line/column error positions, error codes | Mature |
+| **Schema-aware validation** | ✅ Table/column/type checking with `ValidationSchema` | Moderate |
+| **Column lineage tracing** | ✅ `get_column_lineage()` for data lineage | Moderate |
+| **OpenLineage payloads** | ✅ `RunEvent` and `DatasetFacet` generation | Early but functional |
+| **Query builder** | ✅ Fluent API for SELECT/INSERT/UPDATE/DELETE | Usable but not as rich as query-builder-first libraries |
+| **AST diff** | ✅ ChangeDistiller-based structural diff | Functional |
+| **Logical planning** | ✅ Basic DAG plan extraction | Early stage |
+| **Query optimization** | ✅ Column qualification, predicate pushdown, join elimination | Moderate |
+| **Custom dialect registration** | ✅ `CustomDialectBuilder` for runtime extension | Functional |
+
+### 2.2 What Polyglot CANNOT Do for a Multi-DB Project
+
+| Need | Polyglot Support | Alternative |
+|---|---|---|
+| **Execute queries** | ❌ No | Use sqlx, diesel, or sea-orm |
+| **Connection pooling** | ❌ No | Use deadpool, bb8, or sqlx built-in |
+| **Async I/O** | ❌ Synchronous only | Wrap in `spawn_blocking()` |
+| **Type-safe query building** | ⚠️ Partial (builder yields an untyped `Expression` AST, checked only at runtime) | Use diesel or sea-orm for compile-time checks |
+| **Schema migration management** | ❌ No | Use diesel migrations, sqlx migrations, or refinery |
+| **Row mapping / deserialization** | ❌ No | Use sqlx `FromRow`, diesel `Queryable` |
+| **Runtime type mapping** | ⚠️ Limited (DataType enum, no Rust type bridge) | Build your own layer |
+| **Database-specific DDL generation** | ⚠️ Parses/generates DDL but no migration framework | Use as a building block |
+| **Transaction management** | ❌ No | Use sqlx or diesel |
+
+### 2.3 Integration Pattern: Polyglot as a SQL Dialect Layer
+
+The most natural integration pattern for a multi-database storage layer:
+
+```
+┌──────────────────────────────────────────────┐
+│              Application Logic               │
+├──────────────────────────────────────────────┤
+│          Query Builder / ORM Layer           │
+│         (diesel / sea-orm / custom)          │
+├──────────────────────┬───────────────────────┤
+│                      │                       │
+│   Polyglot Layer     │   Direct SQL          │
+│   (transpile,        │   (no translation     │
+│    validate,         │    needed)            │
+│    format)           │                       │
+├──────────────────────┴───────────────────────┤
+│            Database Driver Layer             │
+│               (sqlx / diesel)                │
+├──────────────────────────────────────────────┤
+│   PostgreSQL   │     MySQL     │    SQLite   │
+└──────────────────────────────────────────────┘
+```
+
+In this pattern, Polyglot sits **above** the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does **not** replace the drivers.
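+
+The layering above can be sketched with two small traits. Every name here (`Target`, `Transpiler`, `Executor`, `StorageLayer`, `QuoteSwap`, `Recorder`) is hypothetical and chosen for illustration; none of it is Polyglot's or sqlx's API. A real `Transpiler` implementation would presumably delegate to Polyglot, and a real `Executor` to a database driver.
+
+```rust
+// Hypothetical target selector; not a Polyglot type.
+#[allow(dead_code)] // Only MySql is exercised in this sketch.
+#[derive(Clone, Copy)]
+enum Target {
+    Postgres,
+    MySql,
+    Sqlite,
+}
+
+// Dialect layer: canonical SQL in, target-dialect SQL out.
+trait Transpiler {
+    fn transpile(&self, canonical_sql: &str, target: Target) -> Result<String, String>;
+}
+
+// Driver layer: runs already-translated SQL, returns rows affected.
+trait Executor {
+    fn execute(&mut self, sql: &str) -> Result<u64, String>;
+}
+
+struct StorageLayer<T: Transpiler, E: Executor> {
+    transpiler: T,
+    executor: E,
+    target: Target,
+}
+
+impl<T: Transpiler, E: Executor> StorageLayer<T, E> {
+    // Translate first, then execute; the driver never sees canonical SQL.
+    fn run(&mut self, canonical_sql: &str) -> Result<u64, String> {
+        let sql = self.transpiler.transpile(canonical_sql, self.target)?;
+        self.executor.execute(&sql)
+    }
+}
+
+// Toy transpiler: only swaps identifier quoting for MySQL.
+struct QuoteSwap;
+impl Transpiler for QuoteSwap {
+    fn transpile(&self, sql: &str, target: Target) -> Result<String, String> {
+        Ok(match target {
+            Target::MySql => sql.replace('"', "`"),
+            _ => sql.to_string(),
+        })
+    }
+}
+
+// Toy executor: records the SQL it was asked to run.
+struct Recorder {
+    seen: Vec<String>,
+}
+impl Executor for Recorder {
+    fn execute(&mut self, sql: &str) -> Result<u64, String> {
+        self.seen.push(sql.to_string());
+        Ok(0)
+    }
+}
+
+fn main() {
+    let mut layer = StorageLayer {
+        transpiler: QuoteSwap,
+        executor: Recorder { seen: Vec::new() },
+        target: Target::MySql,
+    };
+    layer.run(r#"SELECT "id" FROM "users""#).unwrap();
+    println!("{}", layer.executor.seen[0]);
+}
+```
+
+Keeping the two traits separate means the translation step can be tested without a database, and the driver can be swapped without touching the dialect logic.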
+
+---
+
+## 3. Comparison with Other Rust SQL Libraries
+
+### 3.1 Feature Comparison Matrix
+
+| Feature | **Polyglot** | **Diesel** | **SQLx** | **SeaORM** | **sqlparser-rs** |
+|---|---|---|---|---|---|
+| **Primary Purpose** | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
+| **SQL Parsing** | ✅ Full AST (200+ node types) | ❌ No (DSL only) | ❌ No | ❌ No | ✅ Full AST |
+| **SQL Generation** | ✅ Multi-dialect | ✅ Via DSL | ❌ No | ✅ Via SeaQuery | ⚠️ Limited (AST `Display`) |
+| **Cross-dialect Transpilation** | ✅ 32 dialects | ❌ No | ❌ No | ❌ No | ❌ No |
+| **Query Builder** | ⚠️ Fluent, runtime-checked | ✅ Type-safe DSL | ⚠️ `QueryBuilder` (string-based) | ✅ Type-safe | ❌ No |
+| **Async I/O** | ❌ No (sync only) | ❌ Sync (async via `diesel-async`) | ✅ Native async | ✅ Native async | ❌ No |
+| **Type-safe Queries** | ❌ No (runtime) | ✅ Compile-time | ✅ Compile-time checked `query!` macros | ✅ Compile-time | ❌ No |
+| **Connection Pool** | ❌ No | ⚠️ Via `r2d2` feature | ✅ Built-in | ✅ Built-in | ❌ No |
+| **Migration Support** | ❌ No | ✅ Built-in | ✅ Built-in (`sqlx migrate`) | ✅ Built-in | ❌ No |
+| **Database Execution** | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
+| **Schema Validation** | ✅ Via ValidationSchema | ✅ Compile-time | ❌ No | ⚠️ Limited | ❌ No |
+| **Column Lineage** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
+| **AST Diff** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
+| **Dialects Supported** | 32 | 3 (PostgreSQL, MySQL, SQLite) | N/A | N/A | ANSI SQL plus several parse-only vendor dialects |
+| **License** | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT/Apache-2.0 | Apache-2.0 |
+| **Maturity** | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |
+
+### 3.2 Polyglot vs Diesel
+
+| Aspect | Polyglot | Diesel |
+|---|---|---|
+| **Philosophy** | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
+| **Type Safety** | Runtime (untyped `Expression` AST) | Compile-time (macro-based) |
+| **Query Building** | `select(["col"]).from("t").where_(...)` → `Expression` AST | `schema::table::dsl::col.filter(...)` → SQL |
+| **Dialect Breadth** | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
+| **Database Execution** | None (SQL text only) | Full CRUD with connection management |
+| **Migrations** | None | Built-in migration framework |
+| **When to use** | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |
+
+**Verdict**: Polyglot and Diesel are **complementary**, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.
+
+### 3.3 Polyglot vs SQLx
+
+| Aspect | Polyglot | SQLx |
+|---|---|---|
+| **Philosophy** | SQL manipulation without execution | Async database driver with compile-time query checking |
+| **Async** | Synchronous only | Fully async |
+| **Query Checking** | Runtime validation against schema | Compile-time `query!()` macro |
+| **Database Support** | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
+| **When to use** | SQL transformation/analysis | Database interaction with async Rust |
+
+**Verdict**: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.
+
+### 3.4 Polyglot vs SeaORM
+
+| Aspect | Polyglot | SeaORM |
+|---|---|---|
+| **Philosophy** | SQL transpilation | Async ORM built on SQLx |
+| **Async** | No | Yes |
+| **Model Definition** | None | Entity models via macros |
+| **Relationships** | None | Has-one, has-many, many-to-many |
+| **When to use** | SQL dialect conversion | Database CRUD with relationships |
+
+**Verdict**: Same as SQLx — complementary, not competing.
+
+### 3.5 Polyglot vs sqlparser-rs
+
+| Aspect | Polyglot | sqlparser-rs |
+|---|---|---|
+| **Parsing** | ✅ Full (200+ node types) | ✅ Full (ANSI SQL plus several vendor dialects) |
+| **Generation** | ✅ Multi-dialect generation | ⚠️ Round-trip via AST `Display`; no dialect conversion |
+| **Transpilation** | ✅ Cross-dialect transforms | ❌ No |
+| **Dialects** | 32 | ANSI SQL plus parse-only vendor dialects |
+| **Validation** | ✅ With error positions | ❌ Parse errors only |
+| **Builder** | ✅ Fluent API | ❌ No |
+| **Lineage** | ✅ Built-in | ❌ No |
+| **Diff** | ✅ Built-in | ❌ No |
+| **Maturity** | v0.4.4 | v0.49 (more established) |
+
+**Verdict**: sqlparser-rs is a mature parser for ANSI SQL and a number of vendor dialects. Polyglot adds transpilation across 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, Polyglot is the better fit. If you only need parsing and not generation or transpilation, sqlparser-rs may suffice with less overhead.
+
+### 3.6 Polyglot vs Python sqlglot
+
+| Aspect | Polyglot (Rust) | sqlglot (Python) |
+|---|---|---|
+| **Performance** | 8–19× faster (transpile), ~86× faster (generate) | Baseline |
+| **Language** | Rust | Python |
+| **Feature Parity** | ~95% of sqlglot's transpilation | Full feature set |
+| **Optimizer** | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
+| **Execution** | ❌ No | ⚠️ Limited (built-in Python executor over in-memory tables) |
+| **Test Compatibility** | 10,220+ sqlglot fixture cases at 100% | Original test suite |
+| **Deployment** | Native binary / WASM / Python / Go | Python package |
+
+**Verdict**: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at roughly 95% feature parity. The Python sqlglot has a more mature optimizer and a built-in in-memory executor, while Polyglot's 0.4.x releases add lineage, OpenLineage, and schema validation.
+
+---
+
+## 4. Limitations and Gotchas
+
+### 4.1 Current Limitations
+
+| Limitation | Impact | Mitigation |
+|---|---|---|
+| **Pre-1.0 API** | Breaking changes possible between minor versions | Pin an exact version in `Cargo.toml` (e.g., `=0.4.4`) |
+| **No query execution** | Cannot run SQL against databases | Use alongside sqlx/diesel |
+| **No async** | Blocking in async contexts | Wrap in `spawn_blocking()` |
+| **No migration framework** | Cannot manage schema evolution | Use diesel migrations or refinery |
+| **No Rust type mapping** | `DataType` enum doesn't map to Rust types | Build your own type bridge |
+| **Builder returns Expression** | Builder doesn't produce type-safe queries | Accept runtime nature; pair with runtime validation |
+| **Optimizer is early** | Limited optimization passes vs Python sqlglot | Most useful passes exist (qualify_columns, pushdown_predicates) |
+| **WASM lacks `stacker`** | Deeply nested SQL may overflow stack in browser | Set format guard limits; consider web workers |
+| **Custom dialects are global** | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
+| **No prepared statement support** | Cannot generate `?` placeholders for parameterized queries | Bind parameters through the driver (e.g., sqlx) rather than interpolating values into SQL text |
+
+### 4.2 Gotchas
+
+1. **`Dialect::get()` creates a new instance each call**: The `Dialect` struct bundles tokenizer + generator config + transformer. For hot loops, cache the `Dialect` instance rather than calling `Dialect::get()` repeatedly. (The overhead is minimal but non-zero.)
+
+2. **Transpilation is not always invertible**: Some dialects have features that don't exist in others (e.g., BigQuery's `QUALIFY`, PostgreSQL's `ILIKE`, TSQL's `TOP`). Transpiling `A → B` and then `B → A` may lose information.
+
+3. **Function transformation depth**: The transform pipeline processes per-node bottom-up. Some transformations require multi-pass processing (handled by `preprocess()`), but edge cases may require manual intervention.
+
+4. **AST is not a stable serialization format**: The `Expression` enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.
+
+5. **Feature flags imply their dependencies**: `transpile` implies `generate`, `openlineage` implies `semantic`, etc. For minimal builds, use `default-features = false` and select only what you need.
+
+6. **Global custom dialect registry**: Custom dialects registered via `CustomDialectBuilder::register()` are stored in a global `RwLock<HashMap>`. This means they persist for the lifetime of the process and are visible across threads. Call `unregister_custom_dialect()` to remove them.
+
+7. **Parser is permissive**: The parser accepts many SQL constructs that some databases reject. Validation (via `validate()` or `validate_with_schema()`) can catch some issues, but it's not a substitute for database-level error checking.
+
+8. **No `?` placeholder generation**: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.
+
+9. **Schema validation requires manual schema definition**: The `ValidationSchema` struct must be populated manually — there's no automatic schema introspection from a live database.
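+
+Gotcha 1 suggests caching dialect instances in hot loops. A minimal sketch of that idea follows, assuming a per-name cache is acceptable. `DialectStub` and `DialectCache` are hypothetical stand-ins, not Polyglot types, and a real cache would hold whatever `Dialect::get()` returns.
+
+```rust
+use std::collections::HashMap;
+
+// Hypothetical stand-in for a dialect object that is costly to build.
+struct DialectStub {
+    name: String,
+}
+
+impl DialectStub {
+    fn build(name: &str) -> Self {
+        DialectStub { name: name.to_string() }
+    }
+}
+
+// Builds each dialect at most once and counts how many builds happened.
+struct DialectCache {
+    by_name: HashMap<String, DialectStub>,
+    builds: usize,
+}
+
+impl DialectCache {
+    fn new() -> Self {
+        DialectCache { by_name: HashMap::new(), builds: 0 }
+    }
+
+    // Return the cached instance, constructing it on first use only.
+    fn get(&mut self, name: &str) -> &DialectStub {
+        if !self.by_name.contains_key(name) {
+            self.builds += 1;
+            self.by_name.insert(name.to_string(), DialectStub::build(name));
+        }
+        &self.by_name[name]
+    }
+}
+
+fn main() {
+    let mut cache = DialectCache::new();
+    for _ in 0..1000 {
+        let dialect = cache.get("postgres");
+        assert_eq!(dialect.name, "postgres");
+    }
+    println!("built {} time(s)", cache.builds);
+}
+```
+
+Whether this is worth doing depends on how expensive construction really is; the source describes the overhead as minimal, so measuring first is reasonable.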
+
+---
+
+## 5. Production-Readiness Assessment
+
+### 5.1 Strengths
+
+| Area | Rating | Notes |
+|---|---|---|
+| **Transpilation accuracy** | ⭐⭐⭐⭐⭐ | 10,220+ fixture cases at 100% pass rate |
+| **Performance** | ⭐⭐⭐⭐⭐ | 8–19× faster than Python sqlglot |
+| **Dialect coverage** | ⭐⭐⭐⭐⭐ | 32 dialects covering all major databases |
+| **API ergonomics** | ⭐⭐⭐⭐ | Clean public API; builder is pleasant |
+| **Error reporting** | ⭐⭐⭐⭐ | Line/column/byte-offset positions |
+| **WASM support** | ⭐⭐⭐⭐ | Full API in browser, without `stacker` stack-growth protection |
+| **Multi-language bindings** | ⭐⭐⭐⭐⭐ | Rust, TypeScript, Python, Go, C FFI |
+| **Documentation** | ⭐⭐⭐ | Rust API docs exist; could use more guides |
+| **Test coverage** | ⭐⭐⭐⭐⭐ | 18,745 test cases |
+| **Fuzzing** | ⭐⭐⭐⭐ | Supported via `cargo fuzz` |
+
+### 5.2 Risks
+
+| Risk | Severity | Mitigation |
+|---|---|---|
+| **Pre-1.0 breaking changes** | Medium | Pin version; monitor CHANGELOG |
+| **Single maintainer** | Medium | Code is well-structured; community could fork |
+| **Limited optimizer** | Low | Core passes exist; Python sqlglot is reference |
+| **No query execution** | Low (by design) | Combine with sqlx/diesel |
+| **WASM stack limits** | Low | Set guard rails; use web workers |
+
+### 5.3 Overall Assessment
+
+**Polyglot is production-viable for SQL transpilation and analysis tasks**, with caveats:
+
+- ✅ **Use for**: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
+- ⚠️ **Use with caution for**: Query building (no type safety), optimization (partial coverage)
+- ❌ **Don't use for**: Database execution, connection management, migrations, type-safe queries
+
+For a multi-database storage layer, the recommended pattern is:
+
+```
+Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)
+```
+
+---
+
+## 6. Recommendation
+
+### When to Adopt Polyglot
+
+1. **You need to support multiple database backends with different SQL dialects** and want to write queries once in a canonical dialect, then transpile to the target at runtime.
+2. **You need SQL validation or analysis** (lineage, schema checking) without executing queries.
+3. **You need SQL pretty-printing or formatting** with configurable guard rails.
+4. **You need column lineage tracking** for data governance or OpenLineage integration.
+5. **You need to parse and analyze SQL** in a Rust/WASM/Python/Go context without connecting to a database.
+
+### When NOT to Adopt Polyglot
+
+1. **You need type-safe query building** — use Diesel or SeaORM instead.
+2. **You need async database execution** — use SQLx or SeaORM instead.
+3. **You need schema migrations** — use Diesel migrations, sqlx migrations, or Refinery instead.
+4. **You only need PostgreSQL** (or a single dialect) — a simpler parser may suffice.
+5. **You need Rust type → SQL type mapping** — Polyglot doesn't provide this.
+
+### Suggested Adoption Strategy
+
+For a multi-database storage layer:
+
+1. **Use Polyglot for SQL transpilation**: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
+2. **Use SQLx for database execution**: Handle connections, pooling, and async I/O.
+3. **Use Polyglot for validation**: Validate user-provided SQL before execution.
+4. **Use Polyglot for lineage**: Trace column flow for data governance.
+5. **Build a thin integration layer** that combines Polyglot's transpilation with SQLx's execution.
+
+---
+
+## References
+
+- <https://github.com/tobilg/polyglot> — Main repository
+- <https://crates.io/crates/polyglot-sql> — Rust crate (v0.4.4)
+- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API docs
+- <https://github.com/tobymao/sqlglot> — Python inspiration
+- <https://lib.rs/crates/polyglot-sql> — Package metadata
+- Local source: `/workspace/polyglot/`