Polyglot: Suitability Analysis & Comparisons
1. What Polyglot Is NOT
Before evaluating suitability, it's essential to understand what Polyglot does not do:
| NOT a... | Because |
|---|---|
| Database driver | No connection management, no query execution, no result set handling |
| ORM | No object-relational mapping, no model definitions, no active record pattern |
| Migration tool | No CREATE TABLE evolution management, no up/down migrations framework |
| Type mapper | No Rust type → SQL type mapping, no FromRow derives |
| Connection pool | No async I/O, no TCP connections, no TLS |
| Query executor | Never connects to a database; operates purely on SQL text |
Polyglot is a SQL dialect transpiler. It converts SQL strings between database dialects. Period.
2. Suitability Assessment for Multi-Database Storage Layer
2.1 What Polyglot CAN Do for a Multi-DB Project
| Use Case | Polyglot Support | Maturity |
|---|---|---|
| SQL dialect translation | ✅ Core purpose; 32 dialects with 100% test pass rate | Mature |
| SQL pretty-printing | ✅ Built-in format with guard rails | Mature |
| SQL syntax validation | ✅ Line/column error positions, error codes | Mature |
| Schema-aware validation | ✅ Table/column/type checking with `ValidationSchema` | Moderate |
| Column lineage tracing | ✅ `get_column_lineage()` for data lineage | Moderate |
| OpenLineage payloads | ✅ `RunEvent` and `DatasetFacet` generation | Early but functional |
| Query builder | ✅ Fluent API for SELECT/INSERT/UPDATE/DELETE | Usable but not as rich as query-builder-first libraries |
| AST diff | ✅ ChangeDistiller-based structural diff | Functional |
| Logical planning | ✅ Basic DAG plan extraction | Early stage |
| Query optimization | ✅ Column qualification, predicate pushdown, join elimination | Moderate |
| Custom dialect registration | ✅ `CustomDialectBuilder` for runtime extension | Functional |
2.2 What Polyglot CANNOT Do for a Multi-DB Project
| Need | Polyglot Support | Alternative |
|---|---|---|
| Execute queries | ❌ No | Use sqlx, diesel, or sea-orm |
| Connection pooling | ❌ No | Use deadpool, bb8, or sqlx built-in |
| Async I/O | ❌ Synchronous only | Wrap in spawn_blocking() |
| Type-safe query building | ⚠️ Partial (builder API produces an Expression AST that is only checked at runtime) | Use diesel or sea-orm for compile-time checks |
| Schema migration management | ❌ No | Use diesel migrations, sqlx migrations, or refinery |
| Row mapping / deserialization | ❌ No | Use sqlx FromRow, diesel Queryable |
| Runtime type mapping | ⚠️ Limited (DataType enum, no Rust type bridge) | Build your own layer |
| Database-specific DDL generation | ⚠️ Parses/generates DDL but no migration framework | Use as a building block |
| Transaction management | ❌ No | Use sqlx or diesel |
2.3 Integration Pattern: Polyglot as a SQL Dialect Layer
The most natural integration pattern for a multi-database storage layer:
┌──────────────────────────────────────────────┐
│ Application Logic │
├──────────────────────────────────────────────┤
│ Query Builder / ORM Layer │
│ (diesel / sea-orm / custom) │
├──────────────────────┬───────────────────────┤
│ │ │
│ Polyglot Layer │ Direct SQL │
│ (transpile, │ (no translation │
│ validate, │ needed) │
│ format) │ │
├──────────────────────┴───────────────────────┤
│ Database Driver Layer │
│ (sqlx / diesel / rusqlite) │
├──────────────────────────────────────────────┤
│ PostgreSQL │ MySQL │ SQLite │
└──────────────────────────────────────────────┘
In this pattern, Polyglot sits above the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does not replace the drivers.
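The layering in the diagram can be sketched as two seams and a router. This is a minimal sketch under stated assumptions: `Transpiler`, `Executor`, `Backend`, and `StorageLayer` are hypothetical names introduced here for illustration, not Polyglot or driver APIs. A real `Transpiler` implementation would delegate to Polyglot, and a real `Executor` would wrap a driver such as sqlx.

```rust
use std::collections::HashMap;

/// Target backends from the diagram (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Hypothetical seam for the Polyglot layer: SQL text in, SQL text out.
pub trait Transpiler {
    fn transpile(&self, sql: &str, from: Backend, to: Backend) -> Result<String, String>;
}

/// Hypothetical seam for the driver layer; returns the affected row count.
pub trait Executor {
    fn execute(&mut self, sql: &str) -> Result<u64, String>;
}

/// Sends SQL through the transpiler only when the target is not the canonical dialect.
pub struct StorageLayer<T: Transpiler, E: Executor> {
    canonical: Backend,
    transpiler: T,
    executors: HashMap<Backend, E>,
}

impl<T: Transpiler, E: Executor> StorageLayer<T, E> {
    pub fn new(canonical: Backend, transpiler: T) -> Self {
        Self { canonical, transpiler, executors: HashMap::new() }
    }

    pub fn add_backend(&mut self, backend: Backend, executor: E) {
        self.executors.insert(backend, executor);
    }

    /// Returns the SQL text that would be sent to `target`.
    pub fn prepare(&self, sql: &str, target: Backend) -> Result<String, String> {
        if target == self.canonical {
            // "Direct SQL" path: no translation needed.
            Ok(sql.to_string())
        } else {
            self.transpiler.transpile(sql, self.canonical, target)
        }
    }

    /// Prepares the SQL for `target`, then hands it to that backend's executor.
    pub fn run(&mut self, sql: &str, target: Backend) -> Result<u64, String> {
        let final_sql = self.prepare(sql, target)?;
        let executor = self
            .executors
            .get_mut(&target)
            .ok_or_else(|| format!("no executor registered for {:?}", target))?;
        executor.execute(&final_sql)
    }
}
```

Keeping `prepare` separate from `run` makes the translation step testable without a database, which matches Polyglot's text-only scope.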
3. Comparison with Other Rust SQL Libraries
3.1 Feature Comparison Matrix
| Feature | Polyglot | Diesel | SQLx | SeaORM | sqlparser-rs |
|---|---|---|---|---|---|
| Primary Purpose | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
| SQL Parsing | ✅ Full AST (200+ node types) | ❌ No (builds SQL from a DSL) | ❌ No | ❌ No | ✅ Full AST |
| SQL Generation | ✅ Multi-dialect | ✅ Via DSL | ❌ No | ❌ No | ⚠️ Limited |
| Cross-dialect Transpilation | ✅ 32 dialects | ❌ No | ❌ No | ❌ No | ❌ No |
| Query Builder | ⚠️ Fluent, string-based | ✅ Type-safe DSL | ❌ No | ✅ Type-safe | ❌ No |
| Async I/O | ❌ No (sync only) | ❌ No in core (async via the separate diesel-async crate) | ✅ Native async | ✅ Native async | ❌ No |
| Type-safe Queries | ❌ No (runtime) | ✅ Compile-time | ⚠️ Compile-time checked via query!() macro | ✅ Compile-time | ❌ No |
| Connection Pool | ❌ No | ⚠️ Optional (r2d2 integration) | ✅ Built-in | ✅ Built-in | ❌ No |
| Migration Support | ❌ No | ✅ Built-in | ✅ Built-in (sqlx migrations) | ✅ Built-in | ❌ No |
| Database Execution | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| Schema Validation | ✅ Via ValidationSchema | ✅ Compile-time | ❌ No | ⚠️ Limited | ❌ No |
| Column Lineage | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| AST Diff | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| Dialects Supported | 32 | 3 (PG, MySQL, SQLite) | N/A | N/A | Several parser dialects (ANSI, PostgreSQL, MySQL, and others) |
| License | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT/Apache-2.0 | Apache-2.0 |
| Maturity | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |
3.2 Polyglot vs Diesel
| Aspect | Polyglot | Diesel |
|---|---|---|
| Philosophy | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
| Type Safety | Runtime (string-based) | Compile-time (macro-based) |
| Query Building | `select(["col"]).from("t").where_(...)` → Expression AST | `schema::table::dsl::col.filter(...)` → SQL |
| Dialect Breadth | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
| Database Execution | None (SQL text only) | Full CRUD with connection management |
| Migrations | None | Built-in migration framework |
| When to use | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |
Verdict: Polyglot and Diesel are complementary, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.
3.3 Polyglot vs SQLx
| Aspect | Polyglot | SQLx |
|---|---|---|
| Philosophy | SQL manipulation without execution | Async database driver with compile-time query checking |
| Async | Synchronous only | Fully async |
| Query Checking | Runtime validation against schema | Compile-time query!() macro |
| Database Support | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
| When to use | SQL transformation/analysis | Database interaction with async Rust |
Verdict: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.
3.4 Polyglot vs SeaORM
| Aspect | Polyglot | SeaORM |
|---|---|---|
| Philosophy | SQL transpilation | Async ORM built on SQLx |
| Async | No | Yes |
| Model Definition | None | Entity models via macros |
| Relationships | None | Has-one, has-many, many-to-many |
| When to use | SQL dialect conversion | Database CRUD with relationships |
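Verdict: As with SQLx, the two are complementary, not competing. SeaORM handles models, relationships, and execution; Polyglot handles dialect conversion.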
3.5 Polyglot vs sqlparser-rs
| Aspect | Polyglot | sqlparser-rs |
|---|---|---|
| Parsing | ✅ Full (200+ node types) | ✅ Full (ANSI SQL + some dialects) |
| Generation | ✅ Multi-dialect generation | ⚠️ Limited round-trip |
| Transpilation | ✅ Cross-dialect transforms | ❌ No |
| Dialects | 32 | Several parser dialects, with no translation between them |
| Validation | ✅ With error positions | ❌ Parse errors only |
| Builder | ✅ Fluent API | ❌ No |
| Lineage | ✅ Built-in | ❌ No |
| Diff | ✅ Built-in | ❌ No |
| Maturity | v0.4.4 | v0.49 (more established) |
Verdict: sqlparser-rs is a mature parser for ANSI SQL and several vendor dialects. Polyglot adds transpilation across 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, choose Polyglot. If you only need parsing and don't need generation/transpilation, sqlparser-rs may suffice with less overhead.
3.6 Polyglot vs Python sqlglot
| Aspect | Polyglot (Rust) | sqlglot (Python) |
|---|---|---|
| Performance | 8–19× faster (transpile), ~86× faster (generate) | Baseline |
| Language | Rust | Python |
| Feature Parity | ~95% of sqlglot's transpilation | Full feature set |
| Optimizer | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
| Execution | ❌ No | ⚠️ Limited (can execute against some engines) |
| Test Compatibility | 10,220+ sqlglot fixture cases at 100% | Original test suite |
| Deployment | Native binary / WASM / Python / Go | Python package |
Verdict: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at near-full feature parity. The Python sqlglot has a more mature optimizer and some execution capabilities, but Polyglot is catching up rapidly (0.4.x adds lineage, OpenLineage, schema validation, and more).
4. Limitations and Gotchas
4.1 Current Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| Pre-1.0 API | Breaking changes possible between minor versions | Pin exact version in Cargo.toml |
| No query execution | Cannot run SQL against databases | Use alongside sqlx/diesel |
| No async | Blocking in async contexts | Wrap in spawn_blocking() |
| No migration framework | Cannot manage schema evolution | Use diesel migrations or refinery |
| No Rust type mapping | `DataType` enum doesn't map to Rust types | Build your own type bridge |
| Builder returns Expression | Builder doesn't produce type-safe queries | Accept runtime nature; pair with runtime validation |
| Optimizer is early | Limited optimization passes vs Python sqlglot | Most useful passes exist (qualify_columns, pushdown_predicates) |
| WASM lacks stacker | Deeply nested SQL may overflow stack in browser | Set format guard limits; consider web workers |
| Custom dialects are global | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
| No prepared statement support | Cannot generate `?` placeholders for parameterized queries | Build queries as strings; use sqlx for parameterization |
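The "build your own type bridge" mitigation can be sketched as a plain mapping function. This is a sketch under stated assumptions: `SqlType` is a hypothetical stand-in whose variants are invented for illustration and do not mirror Polyglot's `DataType` enum, and the timestamp mapping is an arbitrary choice.

```rust
/// Hypothetical stand-in for a parsed SQL column type.
/// Polyglot's real `DataType` enum has its own variants.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlType {
    Boolean,
    Integer { bits: u8 },
    Float { bits: u8 },
    Text,
    Bytes,
    Timestamp,
    Nullable(Box<SqlType>),
}

/// Returns the name of the Rust type a row struct might use for `ty`,
/// or an error when no mapping has been defined.
pub fn rust_type_for(ty: &SqlType) -> Result<String, String> {
    match ty {
        SqlType::Boolean => Ok("bool".into()),
        SqlType::Integer { bits: 16 } => Ok("i16".into()),
        SqlType::Integer { bits: 32 } => Ok("i32".into()),
        SqlType::Integer { bits: 64 } => Ok("i64".into()),
        SqlType::Float { bits: 32 } => Ok("f32".into()),
        SqlType::Float { bits: 64 } => Ok("f64".into()),
        SqlType::Text => Ok("String".into()),
        SqlType::Bytes => Ok("Vec<u8>".into()),
        // Assumption: timestamps travel as epoch microseconds.
        SqlType::Timestamp => Ok("i64".into()),
        // Nullable columns wrap the inner mapping in Option.
        SqlType::Nullable(inner) => Ok(format!("Option<{}>", rust_type_for(inner)?)),
        // Unsupported widths fall through to an explicit error.
        other => Err(format!("no Rust mapping for {:?}", other)),
    }
}
```

Returning an error for unmapped types, rather than defaulting to `String`, keeps gaps in the bridge visible when a new dialect introduces a type the layer has not seen.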
4.2 Gotchas
- `Dialect::get()` creates a new instance each call: The `Dialect` struct bundles tokenizer + generator config + transformer. For hot loops, cache the `Dialect` instance rather than calling `Dialect::get()` repeatedly. (The overhead is minimal but non-zero.)
- Transpilation is not always invertible: Some dialects have features that don't exist in others (e.g., BigQuery's `QUALIFY`, PostgreSQL's `ILIKE`, TSQL's `TOP`). Transpiling `A → B` and then `B → A` may lose information.
- Function transformation depth: The transform pipeline processes per-node bottom-up. Some transformations require multi-pass processing (handled by `preprocess()`), but edge cases may require manual intervention.
- AST is not a stable serialization format: The `Expression` enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.
- Feature flags are cumulative: `transpile` implies `generate`, `openlineage` implies `semantic`, etc. For minimal builds, use `default-features = false` and select only what you need.
- Global custom dialect registry: Custom dialects registered via `CustomDialectBuilder::register()` are stored in a global `RwLock<HashMap>`. This means they persist for the lifetime of the process and are visible across threads. Call `unregister_custom_dialect()` to remove them.
- Parser is permissive: The parser accepts many SQL constructs that some databases reject. Validation (via `validate()` or `validate_with_schema()`) can catch some issues, but it's not a substitute for database-level error checking.
- No `?` placeholder generation: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.
- Schema validation requires manual schema definition: The `ValidationSchema` struct must be populated manually; there's no automatic schema introspection from a live database.
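For the global-registry gotcha, one way to avoid leaking registrations is a guard that unregisters on drop. The sketch below does not call Polyglot: `registry()`, `is_registered()`, and `DialectGuard` are hypothetical, and the `RwLock<HashMap>` here is a local stand-in for the crate's own registry. In real code the `Drop` body is where a call to `unregister_custom_dialect()` would presumably go; its exact signature is not shown in this document.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Stand-in for a process-wide registry: dialect name -> opaque config.
fn registry() -> &'static RwLock<HashMap<String, String>> {
    static REGISTRY: OnceLock<RwLock<HashMap<String, String>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Reports whether a dialect name is currently registered.
pub fn is_registered(name: &str) -> bool {
    registry().read().unwrap().contains_key(name)
}

/// Owns one registration and removes it when dropped.
pub struct DialectGuard {
    name: String,
}

impl DialectGuard {
    /// Registers `name`, refusing to overwrite an existing entry.
    pub fn register(name: &str, config: &str) -> Result<Self, String> {
        let mut map = registry().write().unwrap();
        if map.contains_key(name) {
            return Err(format!("dialect '{}' is already registered", name));
        }
        map.insert(name.to_string(), config.to_string());
        Ok(Self { name: name.to_string() })
    }
}

impl Drop for DialectGuard {
    fn drop(&mut self) {
        // Hypothetical: a real guard would call the crate's unregister function here.
        if let Ok(mut map) = registry().write() {
            map.remove(&self.name);
        }
    }
}
```

A guard does not give per-request isolation, since the entry is still visible to every thread while it lives, but it does bound the registration's lifetime to a scope.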
5. Production-Readiness Assessment
5.1 Strengths
| Area | Rating | Notes |
|---|---|---|
| Transpilation accuracy | ⭐⭐⭐⭐⭐ | 10,220+ fixture cases at 100% pass rate |
| Performance | ⭐⭐⭐⭐⭐ | 8–19× faster than Python sqlglot |
| Dialect coverage | ⭐⭐⭐⭐⭐ | 32 dialects covering all major databases |
| API ergonomics | ⭐⭐⭐⭐ | Clean public API; builder is pleasant |
| Error reporting | ⭐⭐⭐⭐ | Line/column/byte-offset positions |
| WASM support | ⭐⭐⭐⭐ | Full feature set in browser |
| Multi-language bindings | ⭐⭐⭐⭐⭐ | Rust, TypeScript, Python, Go, C FFI |
| Documentation | ⭐⭐⭐ | Rust API docs exist; could use more guides |
| Test coverage | ⭐⭐⭐⭐⭐ | 18,745 test cases |
| Fuzzing | ⭐⭐⭐⭐ | Supported via cargo fuzz |
5.2 Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Pre-1.0 breaking changes | Medium | Pin version; monitor CHANGELOG |
| Single maintainer | Medium | Code is well-structured; community could fork |
| Limited optimizer | Low | Core passes exist; Python sqlglot is reference |
| No query execution | Low (by design) | Combine with sqlx/diesel |
| WASM stack limits | Low | Set guard rails; use web workers |
5.3 Overall Assessment
Polyglot is production-viable for SQL transpilation and analysis tasks, with caveats:
- ✅ Use for: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
- ⚠️ Use with caution for: Query building (no type safety), optimization (partial coverage)
- ❌ Don't use for: Database execution, connection management, migrations, type-safe queries
For a multi-database storage layer, the recommended pattern is:
Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)
6. Recommendation
When to Adopt Polyglot
- You need to support multiple database backends with different SQL dialects and want to write queries once in a canonical dialect, then transpile to the target at runtime.
- You need SQL validation or analysis (lineage, schema checking) without executing queries.
- You need SQL pretty-printing or formatting with configurable guard rails.
- You need column lineage tracking for data governance or OpenLineage integration.
- You need to parse and analyze SQL in a Rust/WASM/Python/Go context without connecting to a database.
When NOT to Adopt Polyglot
- You need type-safe query building — use Diesel or SeaORM instead.
- You need async database execution — use SQLx or SeaORM instead.
- You need schema migrations — use Diesel migrations, sqlx migrations, or Refinery instead.
- You only need PostgreSQL (or a single dialect) — a simpler parser may suffice.
- You need Rust type → SQL type mapping — Polyglot doesn't provide this.
Suggested Adoption Strategy
For a multi-database storage layer:
- Use Polyglot for SQL transpilation: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
- Use SQLx for database execution: Handle connections, pooling, and async I/O.
- Use Polyglot for validation: Validate user-provided SQL before execution.
- Use Polyglot for lineage: Trace column flow for data governance.
- Build a thin integration layer that combines Polyglot's transpilation with SQLx's execution.
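Because this strategy transpiles at runtime, a thin integration layer might memoize results for queries that are fixed at build time. The following is a sketch of that idea only: `TranspileCache` is a hypothetical helper, and the closure passed to it stands in for whatever Polyglot call performs the translation.

```rust
use std::collections::HashMap;

/// Memoizes transpiled SQL per (target dialect, canonical SQL) pair.
/// `F` is any function taking (sql, target) and returning the translated SQL.
pub struct TranspileCache<F>
where
    F: Fn(&str, &str) -> Result<String, String>,
{
    transpile: F,
    cache: HashMap<(String, String), String>,
}

impl<F> TranspileCache<F>
where
    F: Fn(&str, &str) -> Result<String, String>,
{
    pub fn new(transpile: F) -> Self {
        Self { transpile, cache: HashMap::new() }
    }

    /// Returns the cached translation, calling the transpiler on a miss.
    /// Errors are not cached, so a failing query is retried on the next call.
    pub fn get(&mut self, target: &str, sql: &str) -> Result<String, String> {
        let key = (target.to_string(), sql.to_string());
        if let Some(hit) = self.cache.get(&key) {
            return Ok(hit.clone());
        }
        let translated = (self.transpile)(sql, target)?;
        self.cache.insert(key, translated.clone());
        Ok(translated)
    }
}
```

This only pays off for a bounded set of query strings. User-provided SQL would grow the map without limit, so that path should bypass the cache or use an eviction policy.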
References
- https://github.com/tobilg/polyglot — Main repository
- https://crates.io/crates/polyglot-sql — Rust crate (v0.4.4)
- https://docs.rs/polyglot-sql/latest/polyglot_sql/ — Rust API docs
- https://github.com/tobymao/sqlglot — Python inspiration
- https://lib.rs/crates/polyglot-sql — Package metadata
- Local source: /workspace/polyglot/