Polyglot: Research Overview

Library: polyglot-sql (Rust crate) / @polyglot-sql/sdk (TypeScript/WASM) / polyglot-sql (Python)
Repository: https://github.com/tobilg/polyglot
Current Version: 0.4.4 (as of 2026-06-03)
License: MIT (+ sqlglot MIT for test fixtures)
Author: Tobias G. (tobilg)
Inspiration: Python sqlglot by Toby Mao


1. What Is Polyglot?

Polyglot is a SQL transpiler: it parses SQL written in one database dialect into an AST and generates SQL for a different dialect. It is not a database driver, ORM, query executor, or connection pool. Its core purpose is dialect-agnostic SQL manipulation: parsing, transforming, validating, formatting, and transpiling SQL across 32 database dialects, plus a generic standard-SQL dialect.

Key Capabilities

| Capability | Description |
|---|---|
| Parse | Convert SQL string → typed AST with 200+ expression node types |
| Generate | Convert AST → SQL string for any supported dialect |
| Transpile | Convert SQL from dialect A → dialect B in one call |
| Format | Pretty-print SQL with configurable guard rails |
| Build | Construct SQL programmatically via a fluent builder API |
| Validate | Syntax + semantic validation with error positions |
| Lineage | Trace column lineage through queries; generate OpenLineage payloads |
| Diff | AST-aware diff between two SQL expressions |
| Traverse | DFS/BFS iterators, predicate queries, and transforms on the AST |
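
To make the Traverse row concrete, here is a minimal sketch of depth-first and breadth-first iteration, a predicate query, and a bottom-up transform over a toy AST. Everything in it (`Node`, `dfs`, `bfs`, `find_all`, `transform`) is a hypothetical name chosen for illustration; it does not reproduce Polyglot's node types or API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """Hypothetical AST node: a kind tag, an optional name, and children."""
    kind: str
    name: str = ""
    children: list = field(default_factory=list)


def dfs(root: Node) -> Iterator[Node]:
    """Yield nodes depth-first, parent before children (pre-order)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def bfs(root: Node) -> Iterator[Node]:
    """Yield nodes breadth-first, level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


def find_all(root: Node, pred: Callable[[Node], bool]) -> list:
    """Predicate query: collect every node for which pred is true."""
    return [n for n in dfs(root) if pred(n)]


def transform(root: Node, fn: Callable[[Node], Node]) -> Node:
    """Rebuild the tree bottom-up, applying fn to every rebuilt node."""
    rebuilt = Node(root.kind, root.name, [transform(c, fn) for c in root.children])
    return fn(rebuilt)


# Toy tree for: SELECT a, IFNULL(b, c) FROM t
tree = Node("Select", children=[
    Node("Column", "a"),
    Node("Function", "IFNULL", [Node("Column", "b"), Node("Column", "c")]),
    Node("Table", "t"),
])

columns = [n.name for n in find_all(tree, lambda n: n.kind == "Column")]
bfs_kinds = [n.kind for n in bfs(tree)]

# Example transform: rename IFNULL to COALESCE wherever it appears.
rewritten = transform(
    tree,
    lambda n: Node(n.kind, "COALESCE", n.children)
    if n.kind == "Function" and n.name == "IFNULL" else n,
)
```

The transform builds new nodes rather than mutating in place, so the original tree stays usable for a later diff or lineage pass.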

Supported Dialects (32)

Athena, BigQuery, ClickHouse, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL

Plus a Generic dialect for standard SQL.

Language Bindings

| Binding | Package | Delivery |
|---|---|---|
| Rust | polyglot-sql on crates.io | Native Rust crate |
| TypeScript/WASM | @polyglot-sql/sdk on npm | WASM module + JS wrapper |
| Python | polyglot-sql on PyPI | PyO3 native extension |
| Go | github.com/tobilg/polyglot/packages/go | PureGo wrapper over C FFI |
| C FFI | Built from polyglot-sql-ffi | .so / .dylib / .dll + .a / .lib + header |

2. Core Philosophy & Design Principles

  1. Pipeline architecture: SQL → Tokenize → Parse → AST → Transform → Generate → SQL string. Each stage is independently configurable per dialect.

  2. Ported from Python sqlglot: The Rust implementation is a faithful port of the Python sqlglot library, maintaining compatibility with its test fixtures (10,220+ fixture cases at 100% pass rate). The architecture, expression types, transformation rules, and dialect behaviors mirror the Python original.

  3. No runtime database connection: Polyglot never connects to a database. It operates purely on SQL strings and ASTs. This makes it safe for sandboxed environments (WASM, serverless) and suitable for build-time / CI-time SQL analysis.

  4. Feature-gated compilation: Each dialect is behind a Cargo feature flag (dialect-postgresql, dialect-mysql, etc.), so users compiling for constrained targets (WASM) can include only what they need. The default feature set includes everything.

  5. Stack safety: The stacker feature (default-on for native builds) grows the stack on deeply nested inputs, preventing stack overflow from pathological SQL. WASM builds opt out since stacker doesn't work there.

  6. Guard rails: Format/guard options limit input size (16 MiB default), token count (1M), AST node count (1M), and set-operation chain depth (256) to prevent resource exhaustion.

  7. Performance-first: Built in Rust for speed. The project's benchmarks show an 8–19× speedup over Python sqlglot for transpilation, with generation about 86× faster. The WASM build brings near-native performance to browsers.
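
The pipeline in principle 1 and the guard rails in principle 6 can be sketched with a toy transpiler. This is an illustration under stated assumptions, not Polyglot's implementation: the grammar covers only `SELECT <exprs> FROM <table>`, the `RENAMES` rule table and all function names are hypothetical, and only the two limit values (16 MiB, 1M tokens) are taken from the text above.

```python
import re
from dataclasses import dataclass

# Limits modeled on the guard-rail defaults quoted above.
MAX_INPUT_BYTES = 16 * 1024 * 1024
MAX_TOKENS = 1_000_000

# Hypothetical rule table: function renames keyed by (read, write) dialect.
RENAMES = {("mysql", "postgres"): {"IFNULL": "COALESCE"}}

TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),*])")


@dataclass
class Call:
    name: str
    args: list


@dataclass
class Select:
    exprs: list
    table: str


def tokenize(sql: str) -> list:
    """Stage 1: split text into tokens, enforcing size and count limits."""
    if len(sql.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input exceeds size limit")
    sql, tokens, pos = sql.strip(), [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
        if len(tokens) > MAX_TOKENS:
            raise ValueError("input exceeds token limit")
    return tokens


def parse(tokens: list) -> Select:
    """Stage 2: recursive-descent parse of SELECT <exprs> FROM <table>."""
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expect(keyword: str) -> None:
        tok = take()
        if tok.upper() != keyword:
            raise ValueError(f"expected {keyword}, found {tok!r}")

    def expr():
        name = take()
        if name in ("(", ")", ","):
            raise ValueError(f"unexpected token {name!r}")
        if peek() != "(":
            return name  # plain column reference
        take()  # consume "("
        args = [expr()]
        while peek() == ",":
            take()
            args.append(expr())
        expect(")")
        return Call(name.upper(), args)

    expect("SELECT")
    exprs = [expr()]
    while peek() == ",":
        take()
        exprs.append(expr())
    expect("FROM")
    return Select(exprs, take())


def transform(node, renames: dict):
    """Stage 3: apply dialect-specific function renames throughout the AST."""
    if isinstance(node, Call):
        return Call(renames.get(node.name, node.name),
                    [transform(a, renames) for a in node.args])
    if isinstance(node, Select):
        return Select([transform(e, renames) for e in node.exprs], node.table)
    return node


def generate(node) -> str:
    """Stage 4: render the AST back to a SQL string."""
    if isinstance(node, Call):
        return f"{node.name}({', '.join(generate(a) for a in node.args)})"
    if isinstance(node, Select):
        return f"SELECT {', '.join(generate(e) for e in node.exprs)} FROM {node.table}"
    return node


def transpile(sql: str, read: str, write: str) -> str:
    ast = parse(tokenize(sql))
    return generate(transform(ast, RENAMES.get((read, write), {})))


result = transpile("select ifnull(a, b), c from t", "mysql", "postgres")
```

Keeping the stages separate is what lets each one be configured per dialect: here only the rename table varies, but the same split would let a tokenizer or generator be swapped independently.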


3. How It Differs from Database Abstraction Layers

Critical distinction: Polyglot is a SQL dialect transpiler, not a database abstraction layer. It does not:

  • Connect to databases
  • Execute queries
  • Manage connection pools
  • Handle migrations (no CREATE TABLE schema evolution management)
  • Map Rust types to database types
  • Provide an ORM-like interface
  • Handle async I/O

Instead, it focuses purely on SQL text manipulation: parsing, analyzing, transforming, and generating SQL strings. This makes it complementary to (not competing with) libraries like Diesel, SQLx, or SeaORM.
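
The division of labor can be sketched as two separate steps: a transpile step that maps text to text, and a driver step that owns the connection. The example uses Python's standard-library `sqlite3` as the driver so that it runs without third-party packages. `transpile_tsql_to_sqlite` is a hypothetical stand-in that rewrites one function name; it is not Polyglot's API.

```python
import sqlite3


def transpile_tsql_to_sqlite(sql: str) -> str:
    """Hypothetical stand-in for the transpile step (not Polyglot's API).

    It rewrites a single T-SQL function name so the example stays runnable
    with the standard library alone.
    """
    return sql.replace("LEN(", "LENGTH(")


# Step 1 (transpiler's job): text in, text out. No connection involved.
source_sql = "SELECT LEN(name) FROM users ORDER BY name"
sqlite_sql = transpile_tsql_to_sqlite(source_sql)

# Step 2 (driver's job): connect, execute, fetch rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("ada",), ("grace",)])
rows = conn.execute(sqlite_sql).fetchall()
conn.close()
```

In a Rust application the second step would belong to a library such as SQLx or Diesel, with the transpiler only supplying the SQL string.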


4. Performance Characteristics

From the project's benchmark suite (polyglot-sql v0.1.2 vs sqlglot v28.10.1):

| Operation | Speedup range |
|---|---|
| Parse (SQL → AST) | 10–13× faster |
| Generate (AST → SQL) | 77–101× faster |
| Roundtrip (parse → generate → re-parse) | 13–15× faster |
| Transpile (full cross-dialect) | 1.6× (simple) to 19× (complex BigQuery→Snowflake) |
| Geometric mean | 8.70× |
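
For readers unfamiliar with the summary statistic, the sketch below shows how a geometric mean of speedup ratios is computed and why it sits below the arithmetic mean. The input values are illustrative placeholders, not the project's per-benchmark measurements, so the result is not expected to reproduce the 8.70× figure.

```python
import math
from statistics import geometric_mean

# Illustrative per-benchmark speedups; NOT the project's actual measurements.
speedups = [1.6, 19.0, 10.0, 13.0]

# Geometric mean: the n-th root of the product of n ratios.
gm = geometric_mean(speedups)

# Equivalent definition: exp of the mean of the logs.
gm_from_logs = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# The arithmetic mean is pulled upward by the largest ratio.
am = sum(speedups) / len(speedups)
```

The geometric mean is the usual choice for averaging ratios because a 2× speedup and a 0.5× slowdown cancel to 1×, which an arithmetic mean would not show.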

Parse benchmarks (v0.4.x, native Rust):

| Query | Mean |
|---|---|
| short (SELECT a, b, c) | 51.28 μs |
| medium (5 cols, JOIN, GROUP BY) | 259.61 μs |
| complex (3 CTEs, subquery) | 268.59 μs 1.03 ms |

5. Project Maturity Indicators

| Indicator | Status |
|---|---|
| Version | 0.4.4 (pre-1.0, active development) |
| Test coverage | 18,745 test cases at 100% pass rate |
| crates.io downloads | ~4,738 total (as of mid-2026) |
| Dependent crates | 2 (via entdb) |
| Release cadence | Frequent patch releases (0.4.2, 0.4.3, 0.4.4 in quick succession) |
| Source code size | ~241K lines of Rust in core crate |
| Fuzzing | Supported via cargo +nightly fuzz |
| CI | Full test suite + FFI + Python + WASM |
| Documentation | Rust API docs (docs.rs), TypeScript docs, Python docs, playground |
| Breaking changes | Possible before 1.0; semver suggests API instability |

6. License

  • MIT License for the Polyglot code itself
  • sqlglot MIT License for the test fixtures derived from the Python project
  • Both are permissive, suitable for commercial use
