docs(research): add polyglot SQL transpiler deep dive for multi-DB storage evaluation

2026-06-10 10:04:30 +00:00
parent bdb0b604e9
commit 6e71d1f306
3 changed files with 1151 additions and 0 deletions


@@ -0,0 +1,137 @@
# Polyglot: Research Overview
**Library**: `polyglot-sql` (Rust crate) / `@polyglot-sql/sdk` (TypeScript/WASM) / `polyglot-sql` (Python)
**Repository**: <https://github.com/tobilg/polyglot>
**Current Version**: 0.4.4 (as of 2026-06-03)
**License**: MIT (+ sqlglot MIT for test fixtures)
**Author**: Tobias G. (tobilg)
**Inspiration**: Python [sqlglot](https://github.com/tobymao/sqlglot) by Toby Mao
---
## 1. What Is Polyglot?
Polyglot is a **SQL transpiler** — it parses SQL from one database dialect into an AST, and generates SQL for a different dialect. It is **not** a database driver, ORM, query executor, or connection pool. Its core purpose is **dialect-agnostic SQL manipulation**: parse, transform, validate, format, and transpile SQL across 32+ database dialects.
### Key Capabilities
| Capability | Description |
|---|---|
| **Parse** | Convert SQL string → typed AST with 200+ expression node types |
| **Generate** | Convert AST → SQL string for any supported dialect |
| **Transpile** | Convert SQL from dialect A → dialect B in one call |
| **Format** | Pretty-print SQL with configurable guard rails |
| **Build** | Construct SQL programmatically via a fluent builder API |
| **Validate** | Syntax + semantic validation with error positions |
| **Lineage** | Trace column lineage through queries; generate OpenLineage payloads |
| **Diff** | AST-aware diff between two SQL expressions |
| **Traverse** | DFS/BFS iterators, predicate queries, and transforms on the AST |
### Supported Dialects (32)
Athena, BigQuery, ClickHouse, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL
Plus a `Generic` dialect for standard SQL.
### Language Bindings
| Binding | Package | Delivery |
|---|---|---|
| **Rust** | `polyglot-sql` on crates.io | Native Rust crate |
| **TypeScript/WASM** | `@polyglot-sql/sdk` on npm | WASM module + JS wrapper |
| **Python** | `polyglot-sql` on PyPI | PyO3 native extension |
| **Go** | `github.com/tobilg/polyglot/packages/go` | PureGo wrapper over C FFI |
| **C FFI** | Built from `polyglot-sql-ffi` | `.so` / `.dylib` / `.dll` + `.a` / `.lib` + header |
---
## 2. Core Philosophy & Design Principles
1. **Pipeline architecture**: SQL → Tokenize → Parse → AST → Transform → Generate → SQL string. Each stage is independently configurable per dialect.
2. **Ported from Python sqlglot**: The Rust implementation is a faithful port of the Python `sqlglot` library, maintaining compatibility with its test fixtures (10,220+ fixture cases at 100% pass rate). The architecture, expression types, transformation rules, and dialect behaviors mirror the Python original.
3. **No runtime database connection**: Polyglot never connects to a database. It operates purely on SQL strings and ASTs. This makes it safe for sandboxed environments (WASM, serverless) and suitable for build-time / CI-time SQL analysis.
4. **Feature-gated compilation**: Each dialect is behind a Cargo feature flag (`dialect-postgresql`, `dialect-mysql`, etc.), so users compiling for constrained targets (WASM) can include only what they need. The `default` feature set includes everything.
5. **Stack safety**: The `stacker` feature (default-on for native builds) grows the stack on deeply nested inputs, preventing stack overflow from pathological SQL. WASM builds opt out since `stacker` doesn't work there.
6. **Guard rails**: Format/guard options limit input size (16 MiB default), token count (1M), AST node count (1M), and set-operation chain depth (256) to prevent resource exhaustion.
7. **Performance-first**: Built in Rust for speed. The project's benchmarks show an 8–19× speedup over the Python `sqlglot` for transpilation, with generation roughly 86× faster. The WASM build brings near-native performance to browsers.
---
## 3. How It Differs from Database Abstraction Layers
**Critical distinction**: Polyglot is a **SQL dialect transpiler**, not a database abstraction layer. It does not:
- Connect to databases
- Execute queries
- Manage connection pools
- Handle migrations (no `CREATE TABLE` schema evolution management)
- Map Rust types to database types
- Provide an ORM-like interface
- Handle async I/O
Instead, it focuses purely on **SQL text manipulation**: parsing, analyzing, transforming, and generating SQL strings. This makes it complementary to (not competing with) libraries like Diesel, SQLx, or SeaORM.
---
## 4. Performance Characteristics
From the project's benchmark suite (polyglot-sql v0.1.2 vs sqlglot v28.10.1):
| Operation | Speedup Range |
|---|---|
| Parse (SQL → AST) | 10–13× faster |
| Generate (AST → SQL) | 77–101× faster |
| Roundtrip (parse → generate → re-parse) | 13–15× faster |
| Transpile (full cross-dialect) | 1.6× (simple) to 19× (complex BigQuery→Snowflake) |
| Geometric mean | **8.70×** |
Parse benchmarks (v0.4.x, native Rust):
| Query | Mean |
|---|---|
| short (SELECT a, b, c) | 51.28 μs |
| medium (5 cols, JOIN, GROUP BY) | 259.61 μs |
| complex (3 CTEs, subquery) | 268.59 μs – 1.03 ms |
---
## 5. Project Maturity Indicators
| Indicator | Status |
|---|---|
| **Version** | 0.4.4 (pre-1.0, active development) |
| **Test coverage** | 18,745 test cases at 100% pass rate |
| **crates.io downloads** | ~4,738 total (as of mid-2026) |
| **Dependent crates** | 2 (via entdb) |
| **Release cadence** | Frequent patch releases (0.4.2, 0.4.3, 0.4.4 in quick succession) |
| **Source code size** | ~241K lines of Rust in core crate |
| **Fuzzing** | Supported via `cargo +nightly fuzz` |
| **CI** | Full test suite + FFI + Python + WASM |
| **Documentation** | Rust API docs (docs.rs), TypeScript docs, Python docs, playground |
| **Breaking changes** | Possible before 1.0; under semver, 0.x releases carry no API stability guarantee |
---
## 6. License
- **MIT License** for the Polyglot code itself
- **sqlglot MIT License** for the test fixtures derived from the Python project
- Both are permissive, suitable for commercial use
---
## References
- <https://github.com/tobilg/polyglot> — Main repository
- <https://crates.io/crates/polyglot-sql> — Rust crate on crates.io
- <https://www.npmjs.com/package/@polyglot-sql/sdk> — TypeScript SDK on npm
- <https://pypi.org/project/polyglot-sql/> — Python bindings on PyPI
- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API documentation
- <https://polyglot-playground.gh.tobilg.com/> — Interactive playground
- <https://github.com/tobymao/sqlglot> — Original Python inspiration


@@ -0,0 +1,720 @@
# Polyglot: Architecture Deep Dive
---
## 1. Workspace Structure
The repository is organized as a Cargo workspace with 5 crates and supporting packages:
```
polyglot/
├── crates/
│   ├── polyglot-sql/  # Core Rust library (~241K LOC)
│   │   └── src/
│   │       ├── lib.rs  # Public API, top-level functions
│   │       ├── tokens.rs  # Tokenizer (lexer)
│   │       ├── parser.rs  # Recursive-descent parser (~62K LOC)
│   │       ├── expressions.rs  # AST node types (~15K LOC)
│   │       ├── generator.rs  # SQL code generator (~39K LOC)
│   │       ├── dialects/  # 33 dialect implementations
│   │       │   ├── mod.rs  # Dialect trait, Dialect struct, CustomDialectBuilder
│   │       │   ├── generic.rs  # Base/standard SQL dialect
│   │       │   ├── postgres.rs  # PostgreSQL (~1.9K LOC)
│   │       │   ├── mysql.rs  # MySQL
│   │       │   ├── sqlite.rs  # SQLite
│   │       │   ├── bigquery.rs  # BigQuery
│   │       │   └── ... (32 total)
│   │       ├── builder.rs  # Fluent query builder API
│   │       ├── transforms.rs  # Cross-dialect transform functions
│   │       ├── validation.rs  # Syntax + semantic validation
│   │       ├── schema.rs  # Schema representation
│   │       ├── scope.rs  # Scope analysis
│   │       ├── resolver.rs  # Column resolution
│   │       ├── lineage.rs  # Column lineage tracking
│   │       ├── openlineage.rs  # OpenLineage payload generation
│   │       ├── diff.rs  # AST diff (ChangeDistiller algorithm)
│   │       ├── planner.rs  # Logical query plan
│   │       ├── optimizer/  # Query optimizer modules
│   │       │   ├── annotate_types.rs  # Type annotation
│   │       │   ├── qualify_columns.rs  # Column qualification
│   │       │   ├── qualify_tables.rs  # Table qualification
│   │       │   ├── pushdown_predicates.rs
│   │       │   ├── pushdown_projections.rs
│   │       │   ├── eliminate_joins.rs
│   │       │   ├── eliminate_ctes.rs
│   │       │   ├── simplify.rs
│   │       │   └── ...
│   │       ├── traversal.rs  # DFS/BFS visitors, AST predicates
│   │       ├── ast_transforms.rs  # AST manipulation utilities
│   │       ├── error.rs  # Error types
│   │       └── time.rs  # Time format conversion
│   ├── polyglot-sql-function-catalogs/  # Optional dialect function catalogs
│   ├── polyglot-sql-wasm/  # WASM bindings (wasm-pack)
│   ├── polyglot-sql-ffi/  # C FFI bindings (cbindgen)
│   └── polyglot-sql-python/  # Python bindings (PyO3 + maturin)
├── packages/
│   ├── sdk/  # TypeScript SDK (@polyglot-sql/sdk)
│   ├── go/  # Go SDK (PureGo wrapper over FFI)
│   ├── documentation/  # TypeScript API docs site
│   ├── playground/  # Browser playground (React 19, Vite)
│   └── python-docs/  # Python API docs
├── examples/
│   ├── rust/  # Rust usage example
│   ├── typescript/  # TypeScript SDK example
│   └── c/  # C FFI usage example
└── tools/
    ├── sqlglot-compare/  # Fixture extraction & comparison
    └── bench-compare/  # Performance benchmarks
```
---
## 2. Data Flow Pipeline
```
┌──────────────────────────────────────────────────────────────────────┐
│ SQL String (source dialect)                                          │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Tokenizer (tokens.rs)                                                │
│ • Dialect-specific lexing rules (quotes, comments, keywords)         │
│ • Configurable via TokenizerConfig per dialect                       │
│ • Produces Vec<Token> with type, text, and Span (line/col/offset)    │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Parser (parser.rs, ~62K LOC)                                         │
│ • Recursive-descent with precedence climbing                         │
│ • Dialect-aware parsing (custom keywords, syntax rules)              │
│ • Produces Expression AST tree                                       │
│ • Stack safety via `stacker` feature (default-on)                    │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Expression AST (expressions.rs)                                      │
│ • Single tagged enum with 150+ variants                              │
│ • Each variant has its own struct (Select, Insert, Function, etc.)   │
│ • Box<Variant> keeps enum size to 2 words (tag + pointer)            │
│ • Serializable via serde (derive Serialize/Deserialize)              │
│ • Optional TypeScript type generation via `ts-rs` feature flag       │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
            ┌──────────────────────┴─────────────────┐
            │                                        │
            ▼                                        ▼
┌────────────────────────┐        ┌────────────────────────────────────┐
│ Transform Pipeline     │        │ Semantic / Analysis Modules        │
│ (transpile path)       │        │ • validation.rs → syntax checks    │
│                        │        │ • schema.rs → column/type lookup   │
│ 1. preprocess()        │        │ • scope.rs → scope analysis        │
│    (whole-tree rewrites│        │ • resolver.rs → column resolution  │
│    like eliminate_     │        │ • lineage.rs → column lineage      │
│    qualify)            │        │ • openlineage.rs → OL payloads     │
│                        │        │ • optimizer/ → query optimization  │
│ 2. transform_expr()    │        │ • diff.rs → AST diff               │
│    (per-node rewrites  │        │ • planner.rs → logical plan DAG    │
│    per dialect)        │        │ • traversal.rs → DFS/BFS visitors  │
│                        │        │                                    │
│ 3. Generator           │        │                                    │
│    (AST → SQL string)  │        │                                    │
└───────────┬────────────┘        └────────────────────────────────────┘
            │
            ▼
┌──────────────────────────────────────────────────────────────────────┐
│ SQL String (target dialect)                                          │
└──────────────────────────────────────────────────────────────────────┘
```
---
## 3. Core Abstractions
### 3.1 Expression AST
The central type is `Expression`, a large tagged enum with one variant per SQL construct:
```rust
pub enum Expression {
// Literals
Literal(Box<Literal>),
Boolean(BooleanLiteral),
Null(Null),
// Identifiers
Identifier(Identifier),
Column(Box<Column>),
Table(Box<TableRef>),
Star(Star),
// Queries
Select(Box<Select>),
Union(Box<Union>),
Intersect(Box<Intersect>),
Except(Box<Except>),
Subquery(Box<Subquery>),
// DML
Insert(Box<Insert>),
Update(Box<Update>),
Delete(Box<Delete>),
Copy(Box<CopyStmt>),
// Binary/Unary operators
And(Box<BinaryOp>),
Or(Box<BinaryOp>),
Add(Box<BinaryOp>),
Eq(Box<BinaryOp>),
// ... 30+ operator variants
// Functions
Function(Box<Function>),
AggregateFunction(Box<AggregateFunction>),
WindowFunction(Box<WindowFunction>),
// Clauses
From(Box<From>),
Join(Box<Join>),
Where(Box<Where>),
OrderBy(Box<OrderBy>),
// ...
// ~150 total variants
}
```
Key design choices:
- **Boxed variants**: Most variants wrap their payload in `Box` to keep `size_of::<Expression>()` at 2 words (16 bytes on 64-bit).
- **Serde support**: `#[derive(Serialize, Deserialize)]` for JSON serialization across FFI/WASM boundaries.
- **TypeScript types**: Optional `ts-rs` feature generates TypeScript interfaces.
- **Convenience methods**: `Expression::column()`, `Expression::number()`, `Expression::sql()`, `Expression::sql_for()`.
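
To make the boxing trade-off concrete, here is a minimal sketch with toy types. `ToySelect`, `ToyBinaryOp`, `BoxedExpr`, and `InlineExpr` are illustrative names and not Polyglot's real definitions. It compares an enum whose payloads sit behind `Box` with one that stores a payload inline.

```rust
use std::mem::size_of;

// Toy payload; the real structs presumably carry many more fields.
#[allow(dead_code)]
struct ToySelect {
    columns: Vec<String>,
    from: Option<String>,
    limit: Option<u64>,
}

#[allow(dead_code)]
struct ToyBinaryOp {
    left: BoxedExpr,
    right: BoxedExpr,
}

// Every payload is boxed, so each variant is a single pointer wide.
#[allow(dead_code)]
enum BoxedExpr {
    Select(Box<ToySelect>),
    Add(Box<ToyBinaryOp>),
    Null,
}

// Payload stored inline: the enum is at least as large as its largest variant.
#[allow(dead_code)]
enum InlineExpr {
    Select(ToySelect),
    Null,
}

fn boxed_size() -> usize {
    size_of::<BoxedExpr>()
}

fn inline_size() -> usize {
    size_of::<InlineExpr>()
}

fn main() {
    println!("boxed:  {} bytes", boxed_size());
    println!("inline: {} bytes", inline_size());
}
```

The two-word figure only holds while every variant's payload is at most one pointer wide, so any variant that stores a larger struct inline would grow the whole enum.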
### 3.2 DialectType Enum
```rust
pub enum DialectType {
Generic, PostgreSQL, MySQL, BigQuery, Snowflake, DuckDB, SQLite,
Hive, Spark, Trino, Presto, Redshift, TSQL, Oracle, ClickHouse,
Databricks, Athena, Teradata, Doris, StarRocks, Materialize,
RisingWave, SingleStore, CockroachDB, TiDB, Druid, Solr, Tableau,
Dune, Fabric, Drill, Dremio, Exasol, DataFusion,
}
```
- Implements `FromStr` with aliases (e.g., `"mssql"` → `TSQL`, `"cockroach"` → `CockroachDB`)
- Each variant maps to a feature-gated dialect module
- Custom dialects can be registered at runtime via `CustomDialectBuilder`
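
Alias resolution of this kind can be sketched with a plain `FromStr` implementation. `ToyDialect` is a hypothetical three-variant stand-in; the `"postgres"` alias and the case-insensitive matching are assumptions made for the example, not confirmed behavior of `DialectType`.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyDialect {
    PostgreSql,
    Tsql,
    CockroachDb,
}

impl FromStr for ToyDialect {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        // Normalize case so "MSSQL" and "mssql" resolve identically (assumption).
        match name.trim().to_ascii_lowercase().as_str() {
            "postgresql" | "postgres" => Ok(ToyDialect::PostgreSql),
            "tsql" | "mssql" => Ok(ToyDialect::Tsql),
            "cockroachdb" | "cockroach" => Ok(ToyDialect::CockroachDb),
            other => Err(format!("unknown dialect: {other}")),
        }
    }
}

fn main() {
    let dialect: ToyDialect = "mssql".parse().expect("known alias");
    println!("{dialect:?}");
}
```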
### 3.3 DialectImpl Trait
```rust
pub trait DialectImpl {
fn dialect_type(&self) -> DialectType;
fn tokenizer_config(&self) -> TokenizerConfig { /* default */ }
fn generator_config(&self) -> GeneratorConfig { /* default */ }
fn generator_config_for_expr(&self, _expr: &Expression) -> GeneratorConfig { /* default */ }
fn transform_expr(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
fn preprocess(&self, expr: Expression) -> Result<Expression> { Ok(expr) }
}
```
Each dialect implements this trait to provide:
1. **Tokenizer config**: Identifier quoting characters, string delimiters, keyword overrides, comment styles, hex number support
2. **Generator config**: 30+ flags controlling SQL output (identifier quote style, function casing, `LIMIT` vs `TOP` vs `FETCH FIRST`, etc.)
3. **Per-node transform**: Dialect-specific expression rewrites (e.g., PostgreSQL transforms `IFNULL` → `COALESCE`, SQLite transforms `TRY_CAST` → `CAST`)
4. **Whole-tree preprocess**: Structural rewrites that need full-tree context (e.g., eliminating `QUALIFY` for dialects that don't support it)
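
The per-node transform hook can be illustrated with a deliberately tiny model. Everything here (`ToyExpr`, `ToyDialectImpl`, `ToyPostgres`, `transform_tree`, `to_sql`) is hypothetical and far simpler than the real trait. It only shows the shape of the idea: a default no-op method, one dialect overriding it, and a bottom-up walk that rewrites children before their parent. The walk uses plain recursion for brevity.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Function { name: String, args: Vec<ToyExpr> },
}

trait ToyDialectImpl {
    // Default: leave the node untouched.
    fn transform_expr(&self, expr: ToyExpr) -> ToyExpr {
        expr
    }
}

struct ToyPostgres;

impl ToyDialectImpl for ToyPostgres {
    fn transform_expr(&self, expr: ToyExpr) -> ToyExpr {
        match expr {
            // Rename IFNULL to COALESCE, keeping the arguments as they are.
            ToyExpr::Function { name, args } if name.eq_ignore_ascii_case("IFNULL") => {
                ToyExpr::Function { name: "COALESCE".to_string(), args }
            }
            other => other,
        }
    }
}

// Bottom-up walk: rewrite the children first, then the node itself.
fn transform_tree(dialect: &dyn ToyDialectImpl, expr: ToyExpr) -> ToyExpr {
    let rebuilt = match expr {
        ToyExpr::Function { name, args } => ToyExpr::Function {
            name,
            args: args.into_iter().map(|a| transform_tree(dialect, a)).collect(),
        },
        leaf => leaf,
    };
    dialect.transform_expr(rebuilt)
}

fn to_sql(expr: &ToyExpr) -> String {
    match expr {
        ToyExpr::Column(name) => name.clone(),
        ToyExpr::Function { name, args } => {
            let rendered: Vec<String> = args.iter().map(to_sql).collect();
            format!("{}({})", name, rendered.join(", "))
        }
    }
}

fn main() {
    let input = ToyExpr::Function {
        name: "IFNULL".into(),
        args: vec![ToyExpr::Column("a".into()), ToyExpr::Column("b".into())],
    };
    println!("{}", to_sql(&transform_tree(&ToyPostgres, input)));
}
```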
### 3.4 Dialect Struct (High-Level API)
```rust
pub struct Dialect {
dialect_type: DialectType,
tokenizer: Tokenizer,
generator_config: Arc<GeneratorConfig>,
transformer: Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>,
generator_config_for_expr: Option<Box<dyn Fn(&Expression) -> GeneratorConfig + Send + Sync>>,
custom_preprocess: Option<Box<dyn Fn(Expression) -> Result<Expression> + Send + Sync>>,
}
```
The `Dialect` struct bundles all dialect-specific state and provides the primary API:
```rust
// Parse SQL
let ast = dialect.parse("SELECT 1")?;
// Generate SQL from AST
let sql = dialect.generate(&ast[0])?;
// Transpile between dialects
let results = dialect.transpile("SELECT IFNULL(a,b) FROM t", DialectType::PostgreSQL)?;
// Tokenize
let tokens = dialect.tokenize("SELECT 1")?;
```
### 3.5 CustomDialectBuilder
For runtime-extensible dialect support:
```rust
use polyglot_sql::dialects::{CustomDialectBuilder, Dialect, DialectType};
use polyglot_sql::generator::NormalizeFunctions;
// Register a custom dialect inheriting from PostgreSQL
CustomDialectBuilder::new("my_postgres")
.based_on(DialectType::PostgreSQL)
.generator_config_modifier(|gc| {
gc.normalize_functions = NormalizeFunctions::Lower;
})
.register()?;
let d = Dialect::get_by_name("my_postgres").unwrap();
// Use like any built-in dialect
```
---
## 4. Dialect Implementation Details
### 4.1 PostgreSQL (`postgres.rs`, ~1,879 LOC)
**Tokenizer:**
- `$$` string literals (dollar-quoting)
- Double-quote identifier quoting
- Nested block comments
- `EXEC` treated as generic command
**Generator config highlights:**
- `identifier_quote: '"'` (double quotes)
- `single_string_interval: true` (`INTERVAL '1 day'`)
- `parameter_token: "$"` (`$1`, `$2` placeholders)
- `supports_select_into: true`
- `supports_window_exclude: true`
- `can_implement_array_any: true`
**Transform examples:**
- `IFNULL(a, b)` → `COALESCE(a, b)`
- `RAND()` → `RANDOM()`
- `DATEDIFF(day, a, b)` → `CAST(b - a AS INT)` (date subtraction)
- `JSON_EXTRACT(a, '$.x')` → `a #> '{x}'` (arrow syntax)
- `JSON_EXTRACT_SCALAR(a, '$.x')` → `a #>> '{x}'`
- `DATE_ADD` / `DATE_SUB` → `+` / `-` interval arithmetic
- Type mappings: `TINYINT` → `SMALLINT`, `FLOAT` → `REAL`, `DOUBLE` → `DOUBLE PRECISION`
- `ILIKE` preserved (native PostgreSQL)
- `RegexpLike` → `~` operator, `RegexpILike` → `~*` operator
### 4.2 SQLite (`sqlite.rs`, ~750 LOC)
**Tokenizer:**
- Supports `"`, `[`, `` ` `` as identifier quote characters
- No nested comments
- Hex number literals (`0xCC`)
**Generator config:**
- `identifier_quote: '"'` (double quotes)
- `supports_table_alias_columns: false`
- `json_key_value_pair_sep: ","` (comma-style `JSON_OBJECT`)
**Transform examples:**
- `NVL(a, b)` → `IFNULL(a, b)`
- `TRY_CAST(x AS t)` → `CAST(x AS t)` (no try-cast)
- `RANDOM()` → function
- `ILIKE` → `LOWER(left) LIKE LOWER(right)` (no native ILIKE)
- `CountIf(cond)` → `SUM(IIF(cond, 1, 0))`
- `CEIL(x)` → function form
- `DATE_TRUNC(unit, col)` → various strftime patterns
- `DATE_DIFF` → `JULIANDAY()` difference patterns
### 4.3 MySQL (`mysql.rs`)
**Tokenizer:** Backtick identifiers, `#` comments
**Generator:** Backtick quoting, `LIMIT` syntax, `CONCAT()` instead of `||`
**Transforms:** `COALESCE(a,b)` → `IFNULL(a,b)`, `||` → `CONCAT()` (string concat), etc.
### 4.4 BigQuery (`bigquery.rs`)
**Tokenizer:** Backtick identifiers, `QUALIFY` keyword
**Generator:** Backtick quoting, `STRUCT` types, `QUALIFY` clause, `DATE_DIFF` syntax
**Transforms:** Complex date/timestamp function mappings, `UNNEST` handling, `APPROX_COUNT_DISTINCT` → `APPROX_COUNT_DISTINCT` (kept as-is)
### 4.5 How Transpilation Works
The full transpilation pipeline:
```
Input SQL (source dialect)
    ↓
Source Dialect Tokenizer
    ↓
Parser (dialect-aware)
    ↓
Expression AST
    ↓
Source Dialect::preprocess() ← whole-tree rewrites
    ↓
Source Dialect::transform_expr() ← per-node rewrites (recursive, bottom-up)
    ↓
Normalized AST
    ↓
Target Dialect Generator
    ↓
Output SQL (target dialect)
```
The transform pipeline uses an explicit task stack (not recursive calls) for the hot paths to avoid stack overflow. The `stacker` crate provides additional stack-growth protection.
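
As a rough illustration of the explicit task stack idea, the sketch below evaluates a toy expression tree bottom-up without recursive calls. `ToyExpr`, `Task`, and `eval_iterative` are hypothetical names; this is the generic technique, not the library's actual traversal code.

```rust
#[derive(Debug)]
enum ToyExpr {
    Leaf(i64),
    Neg(Box<ToyExpr>),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

// Pending work lives on a heap-allocated Vec instead of the call stack.
enum Task<'a> {
    Visit(&'a ToyExpr),
    ApplyNeg,
    ApplyAdd,
}

fn eval_iterative(root: &ToyExpr) -> i64 {
    let mut tasks = vec![Task::Visit(root)];
    let mut values: Vec<i64> = Vec::new();
    while let Some(task) = tasks.pop() {
        match task {
            Task::Visit(ToyExpr::Leaf(v)) => values.push(*v),
            Task::Visit(ToyExpr::Neg(inner)) => {
                // Schedule the combine step first so it runs after the child.
                tasks.push(Task::ApplyNeg);
                tasks.push(Task::Visit(inner));
            }
            Task::Visit(ToyExpr::Add(left, right)) => {
                tasks.push(Task::ApplyAdd);
                tasks.push(Task::Visit(right));
                tasks.push(Task::Visit(left));
            }
            Task::ApplyNeg => {
                let v = values.pop().expect("operand for negation");
                values.push(-v);
            }
            Task::ApplyAdd => {
                let r = values.pop().expect("right operand");
                let l = values.pop().expect("left operand");
                values.push(l + r);
            }
        }
    }
    values.pop().expect("final result")
}

fn main() {
    let expr = ToyExpr::Add(
        Box::new(ToyExpr::Leaf(2)),
        Box::new(ToyExpr::Neg(Box::new(ToyExpr::Leaf(5)))),
    );
    println!("{}", eval_iterative(&expr));
}
```

Because the pending work is stored in a `Vec`, nesting depth is bounded by available heap memory rather than by the thread's stack size.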
Key cross-dialect transforms include:
- Function renaming: `IFNULL` ↔ `COALESCE` ↔ `NVL`, `DATEDIFF` ↔ date arithmetic, `STRING_AGG` ↔ `GROUP_CONCAT`
- Type mapping: `TINYINT` ↔ `SMALLINT`, `FLOAT` ↔ `REAL`, `JSON` ↔ `JSONB`
- Syntax conversion: `LIMIT` ↔ `TOP` ↔ `FETCH FIRST`, `||` (concat) ↔ `CONCAT()`, `SELECT INTO` ↔ `CREATE TABLE AS`
- Boolean handling: `BOOL_AND`/`BOOL_OR` ↔ `MIN`/`MAX`-over-`CASE`
- JSON operators: `JSON_EXTRACT` ↔ `#>`/`#>>` ↔ `->`/`->>` (PostgreSQL arrow syntax)
---
## 5. Fluent Builder API
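The builder module (`builder.rs`, ~3.3K LOC) provides a fluent, ergonomic way to construct SQL expressions without string interpolation. Table and column names are still passed as plain strings, so nothing is checked against a schema at compile time: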
```rust
use polyglot_sql::builder::*;
// SELECT id, name FROM users WHERE age > 18 ORDER BY name LIMIT 10
let expr = select(["id", "name"])
.from("users")
.where_(col("age").gt(lit(18)))
.order_by(["name"])
.limit(10)
.build();
// INSERT
let ins = insert_into("users")
.columns(["id", "name"])
.values([lit(1), lit("Alice")])
.build();
// CASE expression
let expr = case()
.when(col("x").gt(lit(0)), lit("positive"))
.else_(lit("non-positive"))
.build();
// Set operations
let expr = union_all(
select(["id"]).from("a"),
select(["id"]).from("b"),
).order_by(["id"]).limit(5).build();
```
Expression helpers:
- `col("users.id")` — column reference (splits on last `.`)
- `lit(42)`, `lit("hello")`, `lit(3.14)`, `lit(true)` — literals
- `func("COALESCE", [col("a"), col("b")])` — function calls
- Operator chain: `col("age").gte(lit(18)).and(col("status").eq(lit("active")))`
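
The "splits on last `.`" behavior noted for `col()` can be sketched with the standard library's `str::rsplit_once`. `split_column_ref` is a hypothetical helper, and returning the qualifier as a single string is an assumption of this sketch, not a description of how the builder stores it.

```rust
// Split "qualifier.name" on the last dot; unqualified names have no qualifier.
fn split_column_ref(reference: &str) -> (Option<&str>, &str) {
    match reference.rsplit_once('.') {
        Some((qualifier, name)) => (Some(qualifier), name),
        None => (None, reference),
    }
}

fn main() {
    println!("{:?}", split_column_ref("users.id"));
    println!("{:?}", split_column_ref("id"));
}
```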
The builder generates an `Expression` AST that can then be serialized to any dialect via `generate()`.
---
## 6. Validation and Schema-Aware Analysis
### 6.1 Syntax Validation
```rust
use polyglot_sql::{validate, DialectType};
let result = validate("SELECT * FORM users", DialectType::Generic);
// result.valid == false
// result.errors contain line/column/message/error codes
```
Error codes:
- `E001` — Syntax error
- `E002` — Tokenization error
- `E003` — Parse error
- `E004` — Invalid expression (not a valid statement)
- `E005` — Trailing comma in strict mode
### 6.2 Schema-Aware Validation
```rust
use polyglot_sql::{
validate_with_schema, DialectType, SchemaColumn, SchemaTable,
SchemaValidationOptions, ValidationSchema,
};
let schema = ValidationSchema {
strict: Some(true),
tables: vec![
SchemaTable {
name: "users".into(),
columns: vec![
SchemaColumn { name: "id".into(), data_type: "integer".into(), nullable: Some(false), primary_key: true, unique: false, references: None },
SchemaColumn { name: "email".into(), data_type: "varchar".into(), nullable: Some(false), primary_key: false, unique: true, references: None },
],
// ...
},
],
};
let opts = SchemaValidationOptions { check_types: true, check_references: true, strict: None, semantic: true };
let result = validate_with_schema("SELECT id FROM users WHERE email = 1", DialectType::Generic, &schema, &opts);
// result.valid == false (type mismatch: email is varchar, compared to integer)
```
Schema-aware error codes:
- `E200`/`E201` — Unknown table/column
- `E210`–`E217`, `W210`–`W216` — Type checks
- `E220`, `E221`, `W220`, `W221`, `W222` — Reference/FK checks
### 6.3 Function Catalogs
Optional feature-gated function catalogs (currently ClickHouse and DuckDB) provide known function signatures for semantic type checking:
```toml
polyglot-sql = { version = "0.4", features = ["function-catalog-clickhouse"] }
```
---
## 7. Column Lineage & OpenLineage
### 7.1 Column Lineage
Trace how columns flow through a query:
```rust
use polyglot_sql::{parse, DialectType};
use polyglot_sql::lineage::get_column_lineage;
let ast = parse("SELECT a + b AS total FROM t", DialectType::Generic).unwrap();
let lineage = get_column_lineage(&ast[0], /* schema */ None, DialectType::Generic);
// lineage tells you that "total" depends on columns "a" and "b" from table "t"
```
### 7.2 OpenLineage Payload Generation
```rust
use polyglot_sql::DialectType;
use polyglot_sql::openlineage::{generate_run_event, OpenLineageOptions, OpenLineageDatasetId};
let opts = OpenLineageOptions {
    dialect: DialectType::PostgreSQL,
    producer: "my-app".into(),
    dataset_namespace: Some("mydb".into()),
    // ...
};
let event = generate_run_event("SELECT * FROM users", &opts)?;
// event is a JSON-serializable OpenLineage RunEvent with columnLineage facets
```
---
## 8. Error Handling
### 8.1 Error Types
```rust
pub enum Error {
Tokenize { message: String, line: usize, column: usize, start: usize, end: usize },
Parse { message: String, line: usize, column: usize, start: usize, end: usize },
Generate(String),
Unsupported { feature: String, dialect: String },
Syntax { message: String, line: usize, column: usize, start: usize, end: usize },
Internal(String),
}
```
All position-bearing errors include:
- `line` — 1-based line number
- `column` — 1-based column number
- `start` / `end` — byte offsets (0-based, end exclusive)
```rust
let err = Error::parse("Unexpected token", 3, 15, 42, 44);
assert_eq!(err.line(), Some(3));
assert_eq!(err.column(), Some(15));
assert_eq!(err.start(), Some(42));
```
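
For readers mapping between the two position formats, the following sketch derives a 1-based line and column from a 0-based byte offset. `line_col` is a hypothetical helper. It counts columns in characters, which is an assumption, since the source does not say whether the library counts characters or bytes. The offset must fall on a character boundary.

```rust
// Convert a 0-based byte offset into a 1-based (line, column) pair.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let prefix = &source[..offset];
    // Each newline before the offset starts a new line.
    let line = prefix.bytes().filter(|b| *b == b'\n').count() + 1;
    // The current line begins right after the last newline, or at byte 0.
    let line_start = prefix.rfind('\n').map_or(0, |i| i + 1);
    let column = prefix[line_start..].chars().count() + 1;
    (line, column)
}

fn main() {
    let sql = "SELECT *\nFORM users";
    println!("{:?}", line_col(sql, 9));
}
```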
### 8.2 Validation Errors
```rust
pub struct ValidationError {
pub message: String,
pub line: Option<usize>,
pub column: Option<usize>,
pub severity: ValidationSeverity, // Error or Warning
pub code: String, // e.g., "E001", "E200"
pub start: Option<usize>,
pub end: Option<usize>,
}
pub struct ValidationResult {
pub valid: bool,
pub errors: Vec<ValidationError>,
}
```
### 8.3 Guard Rail Errors
Format operations have configurable guard limits that return structured errors:
- `E_GUARD_INPUT_TOO_LARGE` — input exceeds `max_input_bytes`
- `E_GUARD_TOKEN_BUDGET_EXCEEDED` — token count exceeds `max_tokens`
- `E_GUARD_AST_BUDGET_EXCEEDED` — AST node count exceeds `max_ast_nodes`
- `E_GUARD_SET_OP_CHAIN_EXCEEDED` — UNION/INTERSECT/EXCEPT chain exceeds `max_set_op_chain`
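
A budget check of this kind might look like the sketch below. The field names and error codes follow the list above and the defaults quoted in the overview (16 MiB, 1M tokens, 1M AST nodes, chain depth 256, reading "1M" as 1,000,000). `ToyGuardLimits`, `ToyUsage`, and `check_guards` are hypothetical, and a real pipeline would presumably check each budget at its own stage rather than all at once.

```rust
#[derive(Debug, Clone, Copy)]
struct ToyGuardLimits {
    max_input_bytes: usize,
    max_tokens: usize,
    max_ast_nodes: usize,
    max_set_op_chain: usize,
}

impl Default for ToyGuardLimits {
    fn default() -> Self {
        Self {
            max_input_bytes: 16 * 1024 * 1024,
            max_tokens: 1_000_000,
            max_ast_nodes: 1_000_000,
            max_set_op_chain: 256,
        }
    }
}

// Measurements collected while processing one input (illustrative only).
#[derive(Debug, Clone, Copy)]
struct ToyUsage {
    input_bytes: usize,
    tokens: usize,
    ast_nodes: usize,
    set_op_chain: usize,
}

// Return the first exceeded budget as a structured error code.
fn check_guards(limits: &ToyGuardLimits, usage: &ToyUsage) -> Result<(), &'static str> {
    if usage.input_bytes > limits.max_input_bytes {
        return Err("E_GUARD_INPUT_TOO_LARGE");
    }
    if usage.tokens > limits.max_tokens {
        return Err("E_GUARD_TOKEN_BUDGET_EXCEEDED");
    }
    if usage.ast_nodes > limits.max_ast_nodes {
        return Err("E_GUARD_AST_BUDGET_EXCEEDED");
    }
    if usage.set_op_chain > limits.max_set_op_chain {
        return Err("E_GUARD_SET_OP_CHAIN_EXCEEDED");
    }
    Ok(())
}

fn main() {
    let usage = ToyUsage { input_bytes: 1024, tokens: 200, ast_nodes: 150, set_op_chain: 2 };
    println!("{:?}", check_guards(&ToyGuardLimits::default(), &usage));
}
```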
---
## 9. AST Traversal & Analysis
### 9.1 Traversal
```rust
use polyglot_sql::{parse, DialectType};
use polyglot_sql::traversal::*;
let ast = parse("SELECT a, b FROM t WHERE x > 1", DialectType::Generic).unwrap();
let columns = get_columns(&ast[0]); // ["a", "b", "x"]
let tables = get_tables(&ast[0]); // ["t"]
```
Available predicates (70+):
- `is_select`, `is_insert`, `is_update`, `is_delete`, `is_ddl`
- `is_join`, `is_where`, `is_group_by`, `is_order_by`, `is_limit`
- `is_function`, `is_aggregate`, `is_subquery`, `is_cte`
- `is_comparison`, `is_logical`, `is_arithmetic`
- `contains_subquery`, `contains_aggregate`, `contains_window_function`
Iterators: `DfsIter`, `BfsIter` for depth-first and breadth-first traversal.
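
The difference between the two visiting orders is easiest to see on a small tree. The sketch below uses a hypothetical `ToyNode` type rather than the library's `DfsIter`/`BfsIter`, and assumes pre-order for the depth-first walk.

```rust
use std::collections::VecDeque;

struct ToyNode {
    label: &'static str,
    children: Vec<ToyNode>,
}

// Depth-first, pre-order: a node first, then each subtree left to right.
fn dfs_order(root: &ToyNode) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        out.push(node.label);
        // Push children in reverse so the leftmost child is popped first.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    out
}

// Breadth-first: all nodes of one depth before any node of the next.
fn bfs_order(root: &ToyNode) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);
    while let Some(node) = queue.pop_front() {
        out.push(node.label);
        for child in &node.children {
            queue.push_back(child);
        }
    }
    out
}

fn leaf(label: &'static str) -> ToyNode {
    ToyNode { label, children: Vec::new() }
}

fn sample_tree() -> ToyNode {
    ToyNode {
        label: "select",
        children: vec![
            ToyNode { label: "from", children: vec![leaf("t")] },
            ToyNode { label: "where", children: vec![leaf("gt")] },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    println!("dfs: {:?}", dfs_order(&tree));
    println!("bfs: {:?}", bfs_order(&tree));
}
```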
### 9.2 AST Transforms
```rust
use polyglot_sql::ast_transforms::*;
use polyglot_sql::builder::{col, lit};
// `expr` is an `Expression` obtained from `parse()` or the builder
// Rename tables
let renamed = rename_tables(expr, &[("old_name", "new_name")]);
// Add WHERE condition
let filtered = add_where(expr, col("active").eq(lit(true)));
// Remove LIMIT/OFFSET
let unlimited = remove_limit_offset(expr);
```
### 9.3 AST Diff
```rust
use polyglot_sql::diff::{diff, diff_with_config, DiffConfig};
let edits = diff(&source_expr, &target_expr, true);
for edit in &edits {
if edit.is_change() {
println!("{:?}", edit);
}
}
```
Uses the ChangeDistiller algorithm with Dice coefficient matching for structural comparison.
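
For reference, the textbook Dice coefficient over two sets is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to character bigrams of two strings. How the diff module actually tokenizes nodes and where it sets its thresholds is not stated in the source, so `bigrams` and `dice_coefficient` are purely illustrative.

```rust
use std::collections::HashSet;

// Collect the set of adjacent character pairs in a string.
fn bigrams(text: &str) -> HashSet<(char, char)> {
    let chars: Vec<char> = text.chars().collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

// Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), in the range 0.0..=1.0.
fn dice_coefficient(a: &str, b: &str) -> f64 {
    let (set_a, set_b) = (bigrams(a), bigrams(b));
    if set_a.is_empty() && set_b.is_empty() {
        // Strings too short to have bigrams: fall back to exact equality.
        return if a == b { 1.0 } else { 0.0 };
    }
    let shared = set_a.intersection(&set_b).count();
    2.0 * shared as f64 / (set_a.len() + set_b.len()) as f64
}

fn main() {
    println!("{}", dice_coefficient("night", "nacht"));
}
```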
### 9.4 Logical Planner
```rust
use polyglot_sql::planner::Plan;
let plan = Plan::from_expression(&expr);
// plan.root is a Step DAG
// plan.leaves() returns leaf steps
// plan.dag() returns the dependency graph
```
Step kinds: Scan, Filter, Project, Aggregate, Join, Sort, Limit, etc.
---
## 10. Optimizer Modules
The optimizer is available behind the `semantic` feature flag:
| Module | Purpose |
|---|---|
| `qualify_columns.rs` | Resolve unqualified column references to table.column |
| `qualify_tables.rs` | Expand table names with schema/catalog |
| `annotate_types.rs` | Infer and annotate expression types |
| `pushdown_predicates.rs` | Push WHERE conditions into JOINs |
| `pushdown_projections.rs` | Reduce columns to only what's needed |
| `eliminate_joins.rs` | Remove unnecessary JOINs |
| `eliminate_ctes.rs` | Inline single-use CTEs |
| `simplify.rs` | Simplify boolean expressions, constant folding |
| `normalize.rs` | Expression normalization |
| `canonicalize.rs` | Query canonicalization |
| `subquery.rs` | Subquery analysis |
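
To give a flavor of what a simplification pass does, here is a toy constant folder. `ToyExpr` and `simplify` are hypothetical and cover only integer addition and `AND TRUE`; the real `simplify.rs` is presumably far broader.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Int(i64),
    Bool(bool),
    Column(String),
    Add(Box<ToyExpr>, Box<ToyExpr>),
    And(Box<ToyExpr>, Box<ToyExpr>),
}

// Simplify children first, then fold the node if both sides allow it.
fn simplify(expr: ToyExpr) -> ToyExpr {
    match expr {
        ToyExpr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (ToyExpr::Int(a), ToyExpr::Int(b)) => match a.checked_add(b) {
                Some(sum) => ToyExpr::Int(sum),
                // Leave overflowing sums unfolded instead of panicking.
                None => ToyExpr::Add(Box::new(ToyExpr::Int(a)), Box::new(ToyExpr::Int(b))),
            },
            (l, r) => ToyExpr::Add(Box::new(l), Box::new(r)),
        },
        ToyExpr::And(l, r) => match (simplify(*l), simplify(*r)) {
            // `x AND TRUE` is equivalent to `x`, also under SQL's three-valued logic.
            (ToyExpr::Bool(true), other) | (other, ToyExpr::Bool(true)) => other,
            (l, r) => ToyExpr::And(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

fn main() {
    let expr = ToyExpr::Add(Box::new(ToyExpr::Int(1)), Box::new(ToyExpr::Int(2)));
    println!("{:?}", simplify(expr));
}
```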
---
## 11. Async Support
**Polyglot does not use async I/O** — it is a pure computational library. All operations are synchronous and CPU-bound:
- `parse()` — synchronous
- `generate()` — synchronous
- `transpile()` — synchronous
- `validate()` — synchronous
- `format()` — synchronous
This is by design: Polyglot operates on SQL strings in memory, with no network or filesystem I/O. For use in async contexts (Tokio, async-std), callers should use `tokio::task::spawn_blocking()` or similar to offload CPU-heavy parsing/transpilation to a blocking thread pool.
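
The offloading idea can be shown without pulling in an async runtime. In the sketch below, `std::thread::spawn` stands in for `tokio::task::spawn_blocking`, and `cpu_bound_rewrite` is a hypothetical placeholder for a parse or transpile call; neither function is part of Polyglot.

```rust
use std::thread;

// Placeholder for CPU-bound work such as parsing or transpiling.
// Here it only collapses runs of whitespace.
fn cpu_bound_rewrite(sql: String) -> String {
    sql.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Run the work on a separate thread so the caller is free to do other things.
fn offload(sql: String) -> thread::JoinHandle<String> {
    thread::spawn(move || cpu_bound_rewrite(sql))
}

fn main() {
    let handle = offload("SELECT   1\n FROM   t".to_string());
    let out = handle.join().expect("worker thread panicked");
    println!("{out}");
}
```

In a Tokio application the same closure would be handed to `spawn_blocking` and the returned handle awaited, which keeps the executor's worker threads free for I/O tasks.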
---
## 12. Feature Flags
| Flag | Description | Default |
|---|---|---|
| `all-dialects` | Enable all 32 dialect parsers | ✅ |
| `generate` | SQL generation from AST | ✅ |
| `transpile` | Cross-dialect transpilation (implies `generate`) | ✅ |
| `builder` | Fluent query builder API (implies `generate`) | ✅ |
| `ast-tools` | AST inspection & transform utilities | ✅ |
| `semantic` | Schema, resolver, lineage, optimizer, validation | ✅ |
| `openlineage` | OpenLineage payload generation (implies `semantic`) | ✅ |
| `diff` | AST diff support (implies `generate`) | ✅ |
| `planner` | Logical planning helpers | ✅ |
| `time` | Time-format conversion helpers | ✅ |
| `stacker` | Stack-growth protection for native builds | ✅ |
| `bindings` | TypeScript type generation via `ts-rs` | ❌ |
| `dialect-postgresql` | PostgreSQL dialect only | — |
| `dialect-mysql` | MySQL dialect only | — |
| ... (one per dialect) | Individual dialect selector | — |
| `function-catalog-clickhouse` | ClickHouse function catalog | ❌ |
| `function-catalog-duckdb` | DuckDB function catalog | ❌ |
| `function-catalog-all-dialects` | All function catalogs | ❌ |
Minimal WASM build (for constrained targets):
```toml
polyglot-sql = { version = "0.4", default-features = false, features = ["generate", "transpile", "dialect-postgresql", "dialect-mysql"] }
```
---
## References
- Source code examined: `/workspace/polyglot/crates/polyglot-sql/src/` (~241K LOC)
- Architecture documentation: `/workspace/polyglot/docs/sqlglot-architecture.md`
- Benchmark results: `/workspace/polyglot/docs/benchmark.md`
- README: `/workspace/polyglot/README.md`, `/workspace/polyglot/crates/polyglot-sql/README.md`
- CHANGELOG: `/workspace/polyglot/CHANGELOG.md`


@@ -0,0 +1,294 @@
# Polyglot: Suitability Analysis & Comparisons
---
## 1. What Polyglot Is NOT
Before evaluating suitability, it's essential to understand what Polyglot **does not** do:
| NOT a... | Because |
|---|---|
| **Database driver** | No connection management, no query execution, no result set handling |
| **ORM** | No object-relational mapping, no model definitions, no active record pattern |
| **Migration tool** | No `CREATE TABLE` evolution management, no up/down migrations framework |
| **Type mapper** | No Rust type → SQL type mapping, no `FromRow` derives |
| **Connection pool** | No async I/O, no TCP connections, no TLS |
| **Query executor** | Never connects to a database; operates purely on SQL text |
**Polyglot is a SQL dialect transpiler.** It converts SQL strings between database dialects. Period.
---
## 2. Suitability Assessment for Multi-Database Storage Layer
### 2.1 What Polyglot CAN Do for a Multi-DB Project
| Use Case | Polyglot Support | Maturity |
|---|---|---|
| **SQL dialect translation** | ✅ Core purpose; 32 dialects with 100% test pass rate | Mature |
| **SQL pretty-printing** | ✅ Built-in format with guard rails | Mature |
| **SQL syntax validation** | ✅ Line/column error positions, error codes | Mature |
| **Schema-aware validation** | ✅ Table/column/type checking with `ValidationSchema` | Moderate |
| **Column lineage tracing** | ✅ `get_column_lineage()` for data lineage | Moderate |
| **OpenLineage payloads** | ✅ `RunEvent` and `DatasetFacet` generation | Early but functional |
| **Query builder** | ✅ Fluent API for SELECT/INSERT/UPDATE/DELETE | Usable but not as rich as query-builder-first libraries |
| **AST diff** | ✅ ChangeDistiller-based structural diff | Functional |
| **Logical planning** | ✅ Basic DAG plan extraction | Early stage |
| **Query optimization** | ✅ Column qualification, predicate pushdown, join elimination | Moderate |
| **Custom dialect registration** | ✅ `CustomDialectBuilder` for runtime extension | Functional |
### 2.2 What Polyglot CANNOT Do for a Multi-DB Project
| Need | Polyglot Support | Alternative |
|---|---|---|
| **Execute queries** | ❌ No | Use sqlx, diesel, or sea-orm |
| **Connection pooling** | ❌ No | Use deadpool, bb8, or sqlx built-in |
| **Async I/O** | ❌ Synchronous only | Wrap in `spawn_blocking()` |
| **Type-safe query building** | ⚠️ Partial (the builder API produces an `Expression` AST that is only checked at runtime) | Use diesel or sea-orm for compile-time checks |
| **Schema migration management** | ❌ No | Use diesel migrations, sqlx migrations, or refinery |
| **Row mapping / deserialization** | ❌ No | Use sqlx `FromRow`, diesel `Queryable` |
| **Runtime type mapping** | ⚠️ Limited (DataType enum, no Rust type bridge) | Build your own layer |
| **Database-specific DDL generation** | ⚠️ Parses/generates DDL but no migration framework | Use as a building block |
| **Transaction management** | ❌ No | Use sqlx or diesel |
### 2.3 Integration Pattern: Polyglot as a SQL Dialect Layer
The most natural integration pattern for a multi-database storage layer:
```
┌──────────────────────────────────────────────┐
│              Application Logic               │
├──────────────────────────────────────────────┤
│          Query Builder / ORM Layer           │
│         (diesel / sea-orm / custom)          │
├──────────────────────┬───────────────────────┤
│                      │                       │
│  Polyglot Layer      │  Direct SQL           │
│  (transpile,         │  (no translation      │
│   validate,          │   needed)             │
│   format)            │                       │
├──────────────────────┴───────────────────────┤
│            Database Driver Layer             │
│               (sqlx / diesel)                │
├──────────────────────────────────────────────┤
│  PostgreSQL   │     MySQL     │    SQLite    │
└──────────────────────────────────────────────┘
```
In this pattern, Polyglot sits **above** the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does **not** replace the drivers.
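
The layering above can be sketched with two small traits. This is a minimal illustration under stated assumptions, not Polyglot's or SQLx's API: `Backend`, `Translator`, `Driver`, `Storage`, `TaggingTranslator`, and `RecordingDriver` are all hypothetical names, and the translator here only tags the SQL text where a real one would delegate to Polyglot's transpile call.

```rust
/// Target backends for this sketch (hypothetical; not Polyglot's dialect type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Translation step. A real implementation would delegate to Polyglot.
pub trait Translator {
    fn translate(&self, sql: &str, from: Backend, to: Backend) -> Result<String, String>;
}

/// Execution step. A real implementation would wrap sqlx or diesel.
pub trait Driver {
    fn backend(&self) -> Backend;
    fn execute(&mut self, sql: &str) -> Result<u64, String>;
}

/// Combines both layers; queries are written once in `canonical`.
pub struct Storage<T: Translator, D: Driver> {
    canonical: Backend,
    translator: T,
    driver: D,
}

impl<T: Translator, D: Driver> Storage<T, D> {
    pub fn new(canonical: Backend, translator: T, driver: D) -> Self {
        Storage { canonical, translator, driver }
    }

    pub fn driver(&self) -> &D {
        &self.driver
    }

    /// Runs canonical SQL, translating only when the target dialect differs.
    pub fn run(&mut self, canonical_sql: &str) -> Result<u64, String> {
        let target = self.driver.backend();
        if target == self.canonical {
            // "Direct SQL" path from the diagram: no translation needed.
            return self.driver.execute(canonical_sql);
        }
        let sql = self.translator.translate(canonical_sql, self.canonical, target)?;
        self.driver.execute(&sql)
    }
}

/// Stand-in translator: prefixes a comment instead of really transpiling.
pub struct TaggingTranslator;

impl Translator for TaggingTranslator {
    fn translate(&self, sql: &str, _from: Backend, to: Backend) -> Result<String, String> {
        Ok(format!("/* {:?} */ {}", to, sql))
    }
}

/// Stand-in driver: records the statements it is asked to execute.
pub struct RecordingDriver {
    pub target: Backend,
    pub log: Vec<String>,
}

impl Driver for RecordingDriver {
    fn backend(&self) -> Backend {
        self.target
    }

    fn execute(&mut self, sql: &str) -> Result<u64, String> {
        self.log.push(sql.to_string());
        Ok(0) // pretend zero rows were affected
    }
}
```

Keeping translation and execution behind separate traits means either side can be swapped or mocked in tests, and the direct path avoids a needless parse and regenerate cycle when the target already speaks the canonical dialect.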
---
## 3. Comparison with Other Rust SQL Libraries
### 3.1 Feature Comparison Matrix
| Feature | **Polyglot** | **Diesel** | **SQLx** | **SeaORM** | **sqlparser-rs** |
|---|---|---|---|---|---|
| **Primary Purpose** | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
| **SQL Parsing** | ✅ Full AST (200+ node types) | ❌ No (SQL is built from the DSL, not parsed) | ❌ No | ❌ No | ✅ Full AST |
| **SQL Generation** | ✅ Multi-dialect | ✅ Via DSL | ❌ No | ✅ Via SeaQuery | ⚠️ Limited |
| **Cross-dialect Transpilation** | ✅ 32 dialects | ❌ No | ❌ No | ❌ No | ❌ No |
| **Query Builder** | ⚠️ Fluent, string-based | ✅ Type-safe DSL | ❌ No | ✅ Type-safe | ❌ No |
| **Async I/O** | ❌ No (sync only) | ❌ Sync (async via the separate `diesel-async` crate) | ✅ Native async | ✅ Native async | ❌ No |
| **Type-safe Queries** | ❌ No (runtime) | ✅ Compile-time | ✅ Compile-time (`query!` macros) | ✅ Compile-time | ❌ No |
| **Connection Pool** | ❌ No | ⚠️ Via the optional `r2d2` feature | ✅ Built-in | ✅ Built-in | ❌ No |
| **Migration Support** | ❌ No | ✅ Built-in | ✅ Built-in (`sqlx::migrate!`, `sqlx-cli`) | ✅ Built-in | ❌ No |
| **Database Execution** | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| **Schema Validation** | ✅ Via ValidationSchema | ✅ Compile-time | ⚠️ Compile-time, against a live database | ⚠️ Limited | ❌ No |
| **Column Lineage** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| **AST Diff** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
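| **Dialects Supported** | 32 | 3 (PostgreSQL, MySQL, SQLite) | 3 (PostgreSQL, MySQL, SQLite) | 3 (PostgreSQL, MySQL, SQLite) | Several, parse only (ANSI, PostgreSQL, MySQL, MSSQL, and others) |
| **License** | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT/Apache-2.0 | Apache-2.0 |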
| **Maturity** | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |
### 3.2 Polyglot vs Diesel
| Aspect | Polyglot | Diesel |
|---|---|---|
| **Philosophy** | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
| **Type Safety** | Runtime (string-based) | Compile-time (macro-based) |
| **Query Building** | `select(["col"]).from("t").where_(...)` → `Expression` AST | `table.filter(column.eq(...))` → SQL |
| **Dialect Breadth** | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
| **Database Execution** | None (SQL text only) | Full CRUD with connection management |
| **Migrations** | None | Built-in migration framework |
| **When to use** | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |
**Verdict**: Polyglot and Diesel are **complementary**, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.
### 3.3 Polyglot vs SQLx
| Aspect | Polyglot | SQLx |
|---|---|---|
| **Philosophy** | SQL manipulation without execution | Async database driver with compile-time query checking |
| **Async** | Synchronous only | Fully async |
| **Query Checking** | Runtime validation against schema | Compile-time `query!()` macro |
| **Database Support** | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
| **When to use** | SQL transformation/analysis | Database interaction with async Rust |
**Verdict**: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.
### 3.4 Polyglot vs SeaORM
| Aspect | Polyglot | SeaORM |
|---|---|---|
| **Philosophy** | SQL transpilation | Async ORM built on SQLx |
| **Async** | No | Yes |
| **Model Definition** | None | Entity models via macros |
| **Relationships** | None | Has-one, has-many, many-to-many |
| **When to use** | SQL dialect conversion | Database CRUD with relationships |
**Verdict**: Same as SQLx — complementary, not competing.
### 3.5 Polyglot vs sqlparser-rs
| Aspect | Polyglot | sqlparser-rs |
|---|---|---|
| **Parsing** | ✅ Full (200+ node types) | ✅ Full (ANSI SQL + some dialects) |
| **Generation** | ✅ Multi-dialect generation | ⚠️ Limited round-trip |
| **Transpilation** | ✅ Cross-dialect transforms | ❌ No |
| **Dialects** | 32 | Several, parse only (ANSI, PostgreSQL, MySQL, MSSQL, and others) |
| **Validation** | ✅ With error positions | ❌ Parse errors only |
| **Builder** | ✅ Fluent API | ❌ No |
| **Lineage** | ✅ Built-in | ❌ No |
| **Diff** | ✅ Built-in | ❌ No |
| **Maturity** | v0.4.4 | v0.49 (more established) |
**Verdict**: sqlparser-rs is a mature, widely used parser that covers ANSI SQL and several vendor dialects, but it does not translate between them. Polyglot adds transpilation across 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, Polyglot is the better fit. If you only need parsing and don't need generation or transpilation, sqlparser-rs may suffice with less overhead.
### 3.6 Polyglot vs Python sqlglot
| Aspect | Polyglot (Rust) | sqlglot (Python) |
|---|---|---|
| **Performance** | 819× faster (transpile), ~86× faster (generate) | Baseline |
| **Language** | Rust | Python |
| **Feature Parity** | ~95% of sqlglot's transpilation | Full feature set |
| **Optimizer** | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
| **Execution** | ❌ No | ⚠️ Limited (built-in Python executor over in-memory tables) |
| **Test Compatibility** | 10,220+ sqlglot fixture cases at 100% | Original test suite |
| **Deployment** | Native binary / WASM / Python / Go | Python package |
**Verdict**: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at near-full feature parity. The Python sqlglot has a more mature optimizer and some execution capabilities, but Polyglot is catching up rapidly (0.4.x adds lineage, OpenLineage, schema validation, and more).
---
## 4. Limitations and Gotchas
### 4.1 Current Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| **Pre-1.0 API** | Breaking changes possible between minor versions | Pin the exact version in `Cargo.toml` (`polyglot-sql = "=0.4.4"`) |
| **No query execution** | Cannot run SQL against databases | Use alongside sqlx/diesel |
| **No async** | Long parses or transpiles can stall an async executor thread | Wrap calls in `tokio::task::spawn_blocking()` or your runtime's equivalent |
| **No migration framework** | Cannot manage schema evolution | Use diesel migrations or refinery |
| **No Rust type mapping** | `DataType` enum doesn't map to Rust types | Build your own type bridge |
| **Builder returns Expression** | Builder doesn't produce type-safe queries | Accept runtime nature; pair with runtime validation |
| **Optimizer is early** | Limited optimization passes vs Python sqlglot | Most useful passes exist (qualify_columns, pushdown_predicates) |
| **WASM lacks `stacker`** | Deeply nested SQL may overflow stack in browser | Set format guard limits; consider web workers |
| **Custom dialects are global** | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
| **No prepared statement support** | Cannot generate `?` placeholders for parameterized queries | Build queries as strings; use sqlx for parameterization |
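
Placeholder handling therefore falls to the integration layer. As one illustration, PostgreSQL drivers expect numbered `$1`-style markers while MySQL and SQLite drivers accept positional `?` markers, so a canonical PostgreSQL-style query may need its markers rewritten after transpilation. The function below is a hedged sketch with a hypothetical name (`dollar_to_qmark`); it skips single-quoted string literals but does not handle comments, dollar-quoted strings, or quoted identifiers, and whether Polyglot passes `$n` markers through unchanged is an assumption to verify.

```rust
/// Rewrites PostgreSQL-style `$n` placeholders to positional `?` markers.
///
/// Returns the rewritten SQL plus the 1-based `$n` indices in order of
/// appearance, so the caller can bind values in that order (repeating or
/// reordering them where the original query did).
pub fn dollar_to_qmark(sql: &str) -> (String, Vec<usize>) {
    let mut out = String::with_capacity(sql.len());
    let mut order = Vec::new();
    let mut chars = sql.chars().peekable();
    let mut in_string = false;

    while let Some(c) = chars.next() {
        if in_string {
            out.push(c);
            if c == '\'' {
                if chars.peek() == Some(&'\'') {
                    // A doubled quote ('') is an escaped quote inside the literal.
                    out.push(chars.next().unwrap());
                } else {
                    in_string = false;
                }
            }
            continue;
        }
        match c {
            '\'' => {
                in_string = true;
                out.push(c);
            }
            '$' => {
                // Collect the digits that follow the dollar sign, if any.
                let mut digits = String::new();
                while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
                    digits.push(d);
                    chars.next();
                }
                match digits.parse::<usize>() {
                    Ok(n) if n > 0 => {
                        out.push('?');
                        order.push(n);
                    }
                    // Not a valid placeholder: emit the text unchanged.
                    _ => {
                        out.push('$');
                        out.push_str(&digits);
                    }
                }
            }
            _ => out.push(c),
        }
    }
    (out, order)
}
```

Returning the index order alongside the SQL matters because `?` markers are purely positional: a query that uses `$2` before `$1`, or uses `$1` twice, needs its arguments rebound in the order the markers appear.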
### 4.2 Gotchas
1. **`Dialect::get()` creates a new instance each call**: The `Dialect` struct bundles tokenizer + generator config + transformer. For hot loops, cache the `Dialect` instance rather than calling `Dialect::get()` repeatedly. (The overhead is minimal but non-zero.)
2. **Transpilation is not always invertible**: Some dialects have features that don't exist in others (e.g., BigQuery's `QUALIFY`, PostgreSQL's `ILIKE`, TSQL's `TOP`). Transpiling `A → B` and then `B → A` may lose information.
3. **Function transformation depth**: The transform pipeline processes nodes bottom-up, one node at a time. Some transformations need multiple passes (handled by `preprocess()`), and edge cases may still require manual intervention.
4. **AST is not a stable serialization format**: The `Expression` enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.
5. **Feature flags are cumulative**: `transpile` implies `generate`, `openlineage` implies `semantic`, etc. For minimal builds, use `default-features = false` and select only what you need.
6. **Global custom dialect registry**: Custom dialects registered via `CustomDialectBuilder::register()` are stored in a global `RwLock<HashMap>`. This means they persist for the lifetime of the process and are visible across threads. Call `unregister_custom_dialect()` to remove them.
7. **Parser is permissive**: The parser accepts many SQL constructs that some databases reject. Validation (via `validate()` or `validate_with_schema()`) can catch some issues, but it's not a substitute for database-level error checking.
8. **No `?` placeholder generation**: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.
9. **Schema validation requires manual schema definition**: The `ValidationSchema` struct must be populated manually — there's no automatic schema introspection from a live database.
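
For the global registry gotcha (item 6), one way to limit how long a custom dialect stays registered is a scope guard that unregisters on drop. The sketch below uses a local stand-in registry with the same `RwLock<HashMap>` shape described above; `registry`, `register_dialect`, `unregister_dialect`, `is_registered`, and `DialectGuard` are hypothetical names, and a real guard would call `CustomDialectBuilder::register()` and `unregister_custom_dialect()`, whose exact signatures are not shown here.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Stand-in for a process-wide dialect registry (hypothetical; it mirrors the
/// shape described in the text, not Polyglot's actual internals).
fn registry() -> &'static RwLock<HashMap<String, String>> {
    static REGISTRY: OnceLock<RwLock<HashMap<String, String>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Hypothetical stand-in for the library's register call.
fn register_dialect(name: &str, config: &str) {
    registry()
        .write()
        .unwrap()
        .insert(name.to_string(), config.to_string());
}

/// Hypothetical stand-in for the library's unregister call.
fn unregister_dialect(name: &str) {
    registry().write().unwrap().remove(name);
}

/// Reports whether a dialect name is currently registered.
pub fn is_registered(name: &str) -> bool {
    registry().read().unwrap().contains_key(name)
}

/// Registers a dialect on construction and unregisters it when dropped.
pub struct DialectGuard {
    name: String,
}

impl DialectGuard {
    pub fn register(name: &str, config: &str) -> Self {
        register_dialect(name, config);
        DialectGuard { name: name.to_string() }
    }
}

impl Drop for DialectGuard {
    fn drop(&mut self) {
        // Runs on normal scope exit and during unwinding.
        unregister_dialect(&self.name);
    }
}
```

The guard bounds the lifetime of a registration but does not isolate it: while the guard is alive, the dialect is still visible to every thread, and two guards sharing a name would interfere with each other, so names should be unique per scope.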
---
## 5. Production-Readiness Assessment
### 5.1 Strengths
| Area | Rating | Notes |
|---|---|---|
| **Transpilation accuracy** | ⭐⭐⭐⭐⭐ | 10,220+ fixture cases at 100% pass rate |
| **Performance** | ⭐⭐⭐⭐⭐ | 819× faster than Python sqlglot |
| **Dialect coverage** | ⭐⭐⭐⭐⭐ | 32 dialects covering all major databases |
| **API ergonomics** | ⭐⭐⭐⭐ | Clean public API; builder is pleasant |
| **Error reporting** | ⭐⭐⭐⭐ | Line/column/byte-offset positions |
| **WASM support** | ⭐⭐⭐⭐ | Full feature set in browser |
| **Multi-language bindings** | ⭐⭐⭐⭐⭐ | Rust, TypeScript, Python, Go, C FFI |
| **Documentation** | ⭐⭐⭐ | Rust API docs exist; could use more guides |
| **Test coverage** | ⭐⭐⭐⭐⭐ | 18,745 test cases |
| **Fuzzing** | ⭐⭐⭐⭐ | Supported via `cargo fuzz` |
### 5.2 Risks
| Risk | Severity | Mitigation |
|---|---|---|
| **Pre-1.0 breaking changes** | Medium | Pin version; monitor CHANGELOG |
| **Single maintainer** | Medium | Code is well-structured; community could fork |
| **Limited optimizer** | Low | Core passes exist; Python sqlglot is reference |
| **No query execution** | Low (by design) | Combine with sqlx/diesel |
| **WASM stack limits** | Low | Set guard rails; use web workers |
### 5.3 Overall Assessment
**Polyglot is production-viable for SQL transpilation and analysis tasks**, with caveats:
- ✅ **Use for**: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
- ⚠️ **Use with caution for**: Query building (no type safety), optimization (partial coverage)
- ❌ **Don't use for**: Database execution, connection management, migrations, type-safe queries
For a multi-database storage layer, the recommended pattern is:
```
Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)
```
---
## 6. Recommendation
### When to Adopt Polyglot
1. **You need to support multiple database backends with different SQL dialects** and want to write queries once in a canonical dialect, then transpile to the target at runtime.
2. **You need SQL validation or analysis** (lineage, schema checking) without executing queries.
3. **You need SQL pretty-printing or formatting** with configurable guard rails.
4. **You need column lineage tracking** for data governance or OpenLineage integration.
5. **You need to parse and analyze SQL** in a Rust/WASM/Python/Go context without connecting to a database.
### When NOT to Adopt Polyglot
1. **You need type-safe query building** — use Diesel or SeaORM instead.
2. **You need async database execution** — use SQLx or SeaORM instead.
3. **You need schema migrations** — use Diesel migrations, sqlx migrations, or Refinery instead.
4. **You only need PostgreSQL** (or a single dialect) — a simpler parser may suffice.
5. **You need Rust type → SQL type mapping** — Polyglot doesn't provide this.
### Suggested Adoption Strategy
For a multi-database storage layer:
1. **Use Polyglot for SQL transpilation**: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
2. **Use SQLx for database execution**: Handle connections, pooling, and async I/O.
3. **Use Polyglot for validation**: Validate user-provided SQL before execution as an early check. The parser is permissive, so the target database remains the final authority.
4. **Use Polyglot for lineage**: Trace column flow for data governance.
5. **Build a thin integration layer** that combines Polyglot's transpilation with SQLx's execution.
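
Because a storage layer usually issues a fixed set of canonical queries, the runtime transpilation in step 1 can be memoized so each query is translated once per target dialect. The sketch below is one possible shape for that part of the integration layer; `Target` and `TranslationCache` are hypothetical names, and the translate closure stands in for a call into Polyglot.

```rust
use std::collections::HashMap;

/// Target dialects for this sketch (hypothetical; not Polyglot's dialect type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Target {
    Postgres,
    MySql,
    Sqlite,
}

/// Memoizes translated SQL per (canonical query, target) pair.
pub struct TranslationCache<F>
where
    F: Fn(&str, Target) -> Result<String, String>,
{
    translate: F,
    cache: HashMap<(String, Target), String>,
    /// Number of times the translate closure actually ran.
    pub misses: usize,
}

impl<F> TranslationCache<F>
where
    F: Fn(&str, Target) -> Result<String, String>,
{
    pub fn new(translate: F) -> Self {
        TranslationCache { translate, cache: HashMap::new(), misses: 0 }
    }

    /// Returns the translated SQL, calling the translator only on a cache miss.
    /// Failed translations are not cached, so they are retried on the next call.
    pub fn get(&mut self, canonical_sql: &str, target: Target) -> Result<&str, String> {
        let key = (canonical_sql.to_string(), target);
        if !self.cache.contains_key(&key) {
            let translated = (self.translate)(canonical_sql, target)?;
            self.misses += 1;
            self.cache.insert(key.clone(), translated);
        }
        Ok(self.cache[&key].as_str())
    }
}
```

Keying on the SQL text keeps the sketch simple; a production layer might key on a stable query identifier instead, and would need a bounded cache if it ever translates user-provided SQL.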
---
## References
- <https://github.com/tobilg/polyglot> — Main repository
- <https://crates.io/crates/polyglot-sql> — Rust crate (v0.4.4)
- <https://docs.rs/polyglot-sql/latest/polyglot_sql/> — Rust API docs
- <https://github.com/tobymao/sqlglot> — Python inspiration
- <https://lib.rs/crates/polyglot-sql> — Package metadata
- Local source: `/workspace/polyglot/`