
# Polyglot: Suitability Analysis & Comparisons


## 1. What Polyglot Is NOT

Before evaluating suitability, it's essential to understand what Polyglot does not do:

| NOT a... | Because |
|---|---|
| Database driver | No connection management, no query execution, no result-set handling |
| ORM | No object-relational mapping, no model definitions, no active-record pattern |
| Migration tool | No schema-evolution management (`CREATE TABLE`/`ALTER TABLE` history), no up/down migration framework |
| Type mapper | No Rust type → SQL type mapping, no `FromRow` derives |
| Connection pool | No async I/O, no TCP connections, no TLS |
| Query executor | Never connects to a database; operates purely on SQL text |

Polyglot is a SQL dialect transpiler. It converts SQL strings between database dialects. Period.
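As a rough illustration of what converting between dialects means mechanically, the sketch below models the parse → AST → generate pipeline with a toy expression tree. `Expr`, `Target`, `transform`, and `generate` are hypothetical names invented for this example, not Polyglot's API, and the `ILIKE` lowering shown is one plausible rewrite rather than Polyglot's actual output.

```rust
/// A toy expression AST (NOT Polyglot's `Expression` type). It only shows why
/// transpilation works on a tree rather than on raw strings.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Lower(Box<Expr>),
    Like(Box<Expr>, Box<Expr>),
    /// Case-insensitive LIKE: native in PostgreSQL, absent in MySQL and SQLite.
    ILike(Box<Expr>, Box<Expr>),
}

/// Hypothetical target dialects for this toy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Postgres,
    Sqlite,
}

/// Bottom-up rewrite: replace nodes the target lacks with equivalents it has.
fn transform(expr: Expr, target: Target) -> Expr {
    match expr {
        Expr::ILike(lhs, rhs) => {
            let lhs = transform(*lhs, target);
            let rhs = transform(*rhs, target);
            match target {
                Target::Postgres => Expr::ILike(Box::new(lhs), Box::new(rhs)),
                // No ILIKE available: lower both sides and use plain LIKE.
                Target::Sqlite => Expr::Like(
                    Box::new(Expr::Lower(Box::new(lhs))),
                    Box::new(Expr::Lower(Box::new(rhs))),
                ),
            }
        }
        Expr::Like(lhs, rhs) => Expr::Like(
            Box::new(transform(*lhs, target)),
            Box::new(transform(*rhs, target)),
        ),
        Expr::Lower(inner) => Expr::Lower(Box::new(transform(*inner, target))),
        leaf => leaf, // columns and literals need no rewriting
    }
}

/// Render an already-transformed tree as SQL text.
fn generate(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        // Escape embedded single quotes by doubling them.
        Expr::Literal(text) => format!("'{}'", text.replace('\'', "''")),
        Expr::Lower(inner) => format!("LOWER({})", generate(inner)),
        Expr::Like(l, r) => format!("{} LIKE {}", generate(l), generate(r)),
        Expr::ILike(l, r) => format!("{} ILIKE {}", generate(l), generate(r)),
    }
}
```

Generating the `ILike` tree for `Target::Sqlite` yields `LOWER(name) LIKE LOWER('a%')`; feeding that lowered tree back through `transform` for `Target::Postgres` keeps the `LIKE`/`LOWER` form instead of restoring `ILIKE`, which is the kind of one-way rewrite behind gotcha 2 in section 4.2.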


## 2. Suitability Assessment for a Multi-Database Storage Layer

### 2.1 What Polyglot CAN Do for a Multi-DB Project

| Use Case | Polyglot Support | Maturity |
|---|---|---|
| SQL dialect translation | Core purpose; 32 dialects, 100% pass rate on the ported sqlglot fixtures | Mature |
| SQL pretty-printing | Built-in formatter with configurable guard rails | Mature |
| SQL syntax validation | Errors with line/column positions and error codes | Mature |
| Schema-aware validation | Table/column/type checking against a `ValidationSchema` | Moderate |
| Column lineage tracing | `get_column_lineage()` for data lineage | Moderate |
| OpenLineage payloads | `RunEvent` and `DatasetFacet` generation | Early but functional |
| Query builder | Fluent API for `SELECT`/`INSERT`/`UPDATE`/`DELETE` | Usable, but less rich than query-builder-first libraries |
| AST diff | ChangeDistiller-based structural diff | Functional |
| Logical planning | Basic DAG plan extraction | Early stage |
| Query optimization | Column qualification, predicate pushdown, join elimination | Moderate |
| Custom dialect registration | `CustomDialectBuilder` for runtime extension | Functional |

### 2.2 What Polyglot CANNOT Do for a Multi-DB Project

| Need | Polyglot Support | Alternative |
|---|---|---|
| Execute queries | No | Use sqlx, diesel, or sea-orm |
| Connection pooling | No | Use deadpool, bb8, or sqlx's built-in pool |
| Async I/O | No (synchronous API only) | Wrap calls in `spawn_blocking()` (e.g. `tokio::task::spawn_blocking`) |
| Type-safe query building | ⚠️ Partial (builder takes string identifiers and returns an untyped `Expression`) | Use diesel or sea-orm for compile-time checks |
| Schema migration management | No | Use diesel migrations, sqlx migrations, or refinery |
| Row mapping / deserialization | No | Use sqlx `FromRow` or diesel `Queryable` |
| Runtime type mapping | ⚠️ Limited (`DataType` enum, no Rust type bridge) | Build your own layer |
| Database-specific DDL generation | ⚠️ Parses/generates DDL, but no migration framework | Use as a building block |
| Transaction management | No | Use sqlx or diesel |
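Since the Rust-side type bridge is left to you, a minimal sketch of such a layer might pair each canonical column type with a Rust type and a per-backend DDL name. `ColumnType`, `Backend`, and the specific mappings are assumptions for illustration, not Polyglot's `DataType` enum; the timestamp's Rust type name assumes the `chrono` crate on the application side.

```rust
/// Hypothetical canonical column types for your own storage layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Bool,
    BigInt,
    Text,
    Timestamp,
}

/// Hypothetical set of supported backends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

impl ColumnType {
    /// The Rust type a row mapper would decode this column into.
    fn rust_type(self) -> &'static str {
        match self {
            Self::Bool => "bool",
            Self::BigInt => "i64",
            Self::Text => "String",
            // Assumes the chrono crate on the Rust side.
            Self::Timestamp => "chrono::DateTime<chrono::Utc>",
        }
    }

    /// One reasonable DDL type name per backend (not the only valid choice).
    fn ddl(self, backend: Backend) -> &'static str {
        match (self, backend) {
            (Self::Bool, Backend::Postgres) => "BOOLEAN",
            // MySQL's BOOLEAN is a synonym for TINYINT(1).
            (Self::Bool, Backend::MySql) => "TINYINT(1)",
            // SQLite has no boolean storage class; 0/1 integers are conventional.
            (Self::Bool, Backend::Sqlite) => "INTEGER",
            (Self::BigInt, _) => "BIGINT",
            (Self::Text, _) => "TEXT",
            (Self::Timestamp, Backend::Postgres) => "TIMESTAMPTZ",
            // DATETIME stores no time zone, so store UTC by convention.
            (Self::Timestamp, Backend::MySql) => "DATETIME(6)",
            // SQLite has no date type; ISO-8601 text is one common choice.
            (Self::Timestamp, Backend::Sqlite) => "TEXT",
        }
    }
}
```

Keeping the bridge as an exhaustive `match` means the compiler rejects a new `ColumnType` variant until it has both a Rust type and a DDL mapping.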

### 2.3 Integration Pattern: Polyglot as a SQL Dialect Layer

The most natural integration pattern for a multi-database storage layer:

    ┌──────────────────────────────────────────────┐
    │              Application Logic               │
    ├──────────────────────────────────────────────┤
    │          Query Builder / ORM Layer           │
    │         (diesel / sea-orm / custom)          │
    ├──────────────────────┬───────────────────────┤
    │                      │                       │
    │   Polyglot Layer     │   Direct SQL          │
    │   (transpile,        │   (no translation     │
    │    validate,         │    needed)            │
    │    format)           │                       │
    ├──────────────────────┴───────────────────────┤
    │            Database Driver Layer             │
    │               (sqlx / diesel)                │
    ├────────────────┬───────────────┬─────────────┤
    │   PostgreSQL   │     MySQL     │    SQLite   │
    └────────────────┴───────────────┴─────────────┘

In this pattern, Polyglot sits above the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does not replace the drivers.
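The layering can be expressed as two small traits, one for the "Polyglot Layer" box and one for the "Database Driver Layer" box, so that application code never needs to know which dialect it is talking to. `Transpile`, `Execute`, `Storage`, and the two test doubles are hypothetical: in a real system `Transpile` would wrap a Polyglot call and `Execute` would wrap sqlx or diesel, whose actual signatures are not reproduced here.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Seam for the Polyglot layer (hypothetical signature).
trait Transpile {
    fn transpile(&self, sql: &str, from: Backend, to: Backend) -> Result<String, String>;
}

/// Seam for the driver layer, e.g. sqlx or diesel (hypothetical signature).
trait Execute {
    fn execute(&self, sql: &str) -> Result<u64, String>;
}

/// Application-facing storage layer; queries are written once in `canonical`.
struct Storage<T: Transpile, E: Execute> {
    canonical: Backend,
    target: Backend,
    transpiler: T,
    driver: E,
}

impl<T: Transpile, E: Execute> Storage<T, E> {
    fn run(&self, canonical_sql: &str) -> Result<u64, String> {
        let sql = if self.canonical == self.target {
            // "Direct SQL" path: the target already speaks the canonical dialect.
            canonical_sql.to_owned()
        } else {
            self.transpiler.transpile(canonical_sql, self.canonical, self.target)?
        };
        self.driver.execute(&sql)
    }
}

/// Test double: tags the SQL with the target instead of really transpiling.
struct TaggingTranspiler;

impl Transpile for TaggingTranspiler {
    fn transpile(&self, sql: &str, _from: Backend, to: Backend) -> Result<String, String> {
        Ok(format!("/* {:?} */ {}", to, sql))
    }
}

/// Test double: records every statement instead of talking to a database.
#[derive(Default)]
struct RecordingDriver {
    seen: RefCell<Vec<String>>,
}

impl Execute for RecordingDriver {
    fn execute(&self, sql: &str) -> Result<u64, String> {
        self.seen.borrow_mut().push(sql.to_owned());
        Ok(0) // pretend zero rows were affected
    }
}
```

Injecting both seams as generics keeps the storage layer testable without a database and makes the "Direct SQL" path an explicit branch rather than an accident.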


## 3. Comparison with Other Rust SQL Libraries

### 3.1 Feature Comparison Matrix

| Feature | Polyglot | Diesel | SQLx | SeaORM | sqlparser-rs |
|---|---|---|---|---|---|
| Primary purpose | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
| SQL parsing | Full AST (200+ node types) | No (DSL-based) | No | No | Full AST |
| SQL generation | Multi-dialect | Via DSL | No | Via SeaQuery | ⚠️ Limited |
| Cross-dialect transpilation | 32 dialects | No | No | No | No |
| Query builder | ⚠️ Fluent, string-based | Type-safe DSL | ⚠️ Basic (`QueryBuilder`) | Type-safe | No |
| Async I/O | No (sync only) | Sync (async via `diesel-async`) | Native async | Native async | No |
| Type-safe queries | No (runtime) | Compile-time | Compile-time (`query!` macros) | Compile-time | No |
| Connection pool | No | Via r2d2 (optional) | Built-in | Built-in | No |
| Migration support | No | Built-in | Built-in (`sqlx migrate`) | Built-in | No |
| Database execution | No | Yes | Yes | Yes | No |
| Schema validation | Via `ValidationSchema` | Compile-time | Compile-time (via `query!`) | ⚠️ Limited | No |
| Column lineage | Built-in | No | No | No | No |
| AST diff | Built-in | No | No | No | No |
| Dialects supported | 32 | 3 (PostgreSQL, MySQL, SQLite) | N/A | N/A | ANSI plus many dialect-specific parsers |
| License | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT | MIT/Apache-2.0 |
| Maturity | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |

### 3.2 Polyglot vs Diesel

| Aspect | Polyglot | Diesel |
|---|---|---|
| Philosophy | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
| Type safety | Runtime (string-based) | Compile-time (macro-based) |
| Query building | `select(["col"]).from("t").where_(...)` → `Expression` AST | `table.filter(col.eq(...))` (from `schema::table::dsl::*`) → SQL |
| Dialect breadth | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
| Database execution | None (SQL text only) | Full CRUD with connection management |
| Migrations | None | Built-in migration framework |
| When to use | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |

Verdict: Polyglot and Diesel are complementary, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.

### 3.3 Polyglot vs SQLx

| Aspect | Polyglot | SQLx |
|---|---|---|
| Philosophy | SQL manipulation without execution | Async database driver with compile-time query checking |
| Async | Synchronous only | Fully async |
| Query checking | Runtime validation against a schema | Compile-time `query!()` macro |
| Database support | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
| When to use | SQL transformation/analysis | Database interaction with async Rust |

Verdict: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.

### 3.4 Polyglot vs SeaORM

| Aspect | Polyglot | SeaORM |
|---|---|---|
| Philosophy | SQL transpilation | Async ORM built on SQLx |
| Async | No | Yes |
| Model definition | None | Entity models via macros |
| Relationships | None | Has-one, has-many, many-to-many |
| When to use | SQL dialect conversion | Database CRUD with relationships |

Verdict: Same as SQLx — complementary, not competing.

### 3.5 Polyglot vs sqlparser-rs

| Aspect | Polyglot | sqlparser-rs |
|---|---|---|
| Parsing | Full (200+ node types) | Full (ANSI SQL plus many dialect-specific parsers) |
| Generation | Multi-dialect generation | ⚠️ Limited round-trip |
| Transpilation | Cross-dialect transforms | No |
| Dialects | 32 (parse and generate) | Many for parsing; no cross-dialect output |
| Validation | With error positions | Parse errors only |
| Builder | Fluent API | No |
| Lineage | Built-in | No |
| Diff | Built-in | No |
| Maturity | v0.4.4 | v0.49 (more established) |

Verdict: sqlparser-rs is a mature, multi-dialect SQL parser. Polyglot offers significantly more on top of parsing: cross-dialect transpilation across 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, Polyglot is the clear choice. If you only need to parse SQL and don't need generation or transpilation, sqlparser-rs may suffice with less overhead.

### 3.6 Polyglot vs Python sqlglot

| Aspect | Polyglot (Rust) | sqlglot (Python) |
|---|---|---|
| Performance | 819× faster (transpile), ~86× faster (generate) | Baseline |
| Language | Rust | Python |
| Feature parity | ~95% of sqlglot's transpilation | Full feature set |
| Optimizer | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
| Execution | No | ⚠️ Limited (built-in Python executor over in-memory tables) |
| Test compatibility | 10,220+ sqlglot fixture cases at 100% | Original test suite |
| Deployment | Native binary / WASM / Python / Go | Python package |

Verdict: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at near-full feature parity. The Python sqlglot has a more mature optimizer and some execution capabilities, but Polyglot is catching up rapidly (0.4.x adds lineage, OpenLineage, schema validation, and more).


## 4. Limitations and Gotchas

### 4.1 Current Limitations

| Limitation | Impact | Mitigation |
|---|---|---|
| Pre-1.0 API | Breaking changes possible between minor versions | Pin an exact version (`=x.y.z`) in `Cargo.toml` |
| No query execution | Cannot run SQL against databases | Use alongside sqlx/diesel |
| No async | Calls block the executor thread in async contexts | Wrap in `spawn_blocking()` |
| No migration framework | Cannot manage schema evolution | Use diesel migrations or refinery |
| No Rust type mapping | `DataType` enum doesn't map to Rust types | Build your own type bridge |
| Builder returns `Expression` | Builder doesn't produce type-safe queries | Accept its runtime nature; pair with runtime validation |
| Optimizer is early | Fewer optimization passes than Python sqlglot | The most useful passes exist (`qualify_columns`, `pushdown_predicates`) |
| WASM build lacks `stacker` | Without on-demand stack growth, deeply nested SQL may overflow the stack in the browser | Set format guard limits; consider web workers |
| Custom dialects are global | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
| No prepared-statement support | Cannot generate `?` placeholders for parameterized queries | Bind parameters through the driver (e.g. sqlx); never interpolate values into SQL strings |
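Because placeholders and bound values remain the driver's job, one small piece of glue that often comes up is converting canonical `?` placeholders into PostgreSQL's `$1, $2, ...` form (sqlx's MySQL and SQLite drivers accept `?`, while its PostgreSQL driver expects `$N`). `question_to_dollar` is a hypothetical helper, not part of Polyglot or sqlx, and it scans characters rather than tokenizing SQL.

```rust
/// Hypothetical helper: rewrite `?` placeholders to PostgreSQL-style `$1, $2, ...`.
/// `?` characters inside single-quoted strings, double-quoted identifiers, and
/// `--` line comments are copied unchanged.
fn question_to_dollar(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len() + 8);
    let mut n: usize = 0;
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\'' | '"' => {
                // Copy the quoted run verbatim up to the closing quote. A doubled
                // quote ('' or "") closes and immediately reopens, so the next loop
                // iteration re-enters this arm and keeps copying.
                out.push(c);
                for q in chars.by_ref() {
                    out.push(q);
                    if q == c {
                        break;
                    }
                }
            }
            '-' if chars.peek() == Some(&'-') => {
                // Line comment: copy through the end of the line.
                out.push(c);
                for q in chars.by_ref() {
                    out.push(q);
                    if q == '\n' {
                        break;
                    }
                }
            }
            '?' => {
                n += 1;
                out.push('$');
                out.push_str(&n.to_string());
            }
            _ => out.push(c),
        }
    }
    out
}
```

Values are still bound by the driver (for example `sqlx::query(&sql).bind(value)`), so nothing is interpolated into the SQL text. A character scan like this cannot distinguish a placeholder from PostgreSQL's jsonb `?` operators, so queries using those would need a real tokenizer.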

### 4.2 Gotchas

  1. Dialect::get() creates a new instance each call: The Dialect struct bundles tokenizer + generator config + transformer. For hot loops, cache the Dialect instance rather than calling Dialect::get() repeatedly. (The overhead is minimal but non-zero.)

  2. Transpilation is not always invertible: Some dialects have features that don't exist in others (e.g., BigQuery's QUALIFY, PostgreSQL's ILIKE, TSQL's TOP). Transpiling A → B and then B → A may lose information.

  3. Function transformation depth: The transform pipeline processes the tree node by node, bottom-up. Some transformations require multi-pass processing (handled by preprocess()), but edge cases may require manual intervention.

  4. AST is not a stable serialization format: The Expression enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.

  5. Feature flags are cumulative: transpile implies generate, openlineage implies semantic, etc. For minimal builds, use default-features = false and select only what you need.

  6. Global custom dialect registry: Custom dialects registered via CustomDialectBuilder::register() are stored in a global RwLock<HashMap>. This means they persist for the lifetime of the process and are visible across threads. Call unregister_custom_dialect() to remove them.

  7. Parser is permissive: The parser accepts many SQL constructs that some databases reject. Validation (via validate() or validate_with_schema()) can catch some issues, but it's not a substitute for database-level error checking.

  8. No ? placeholder generation: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.

  9. Schema validation requires manual schema definition: The ValidationSchema struct must be populated manually — there's no automatic schema introspection from a live database.
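Gotchas 1 and 6 both come down to process-wide state. The sketch below models a global `RwLock<HashMap>` registry like the one described in gotcha 6 and applies the caching advice from gotcha 1: look a dialect up once and keep the handle. `DialectBundle`, `register`, `unregister`, and `lookup` are hypothetical stand-ins, not Polyglot's `Dialect`, `CustomDialectBuilder::register()`, or `unregister_custom_dialect()`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

/// Stand-in for an expensive-to-build dialect bundle
/// (tokenizer + generator config + transformer). Hypothetical.
#[derive(Debug)]
struct DialectBundle {
    name: String,
    identifier_quote: char,
}

/// Process-wide registry: entries outlive any request and are visible from every thread.
fn registry() -> &'static RwLock<HashMap<String, Arc<DialectBundle>>> {
    static REGISTRY: OnceLock<RwLock<HashMap<String, Arc<DialectBundle>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

fn register(bundle: DialectBundle) {
    registry().write().unwrap().insert(bundle.name.clone(), Arc::new(bundle));
}

/// Returns true if an entry was removed.
fn unregister(name: &str) -> bool {
    registry().write().unwrap().remove(name).is_some()
}

/// Cheap handle clone; hoist this out of hot loops instead of looking up per call.
fn lookup(name: &str) -> Option<Arc<DialectBundle>> {
    registry().read().unwrap().get(name).cloned()
}
```

In this sketch a registration made on one thread is immediately visible on every other thread, and an `Arc` obtained before `unregister` stays usable afterwards, so removal only affects future lookups.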


## 5. Production-Readiness Assessment

### 5.1 Strengths

| Area | Notes |
|---|---|
| Transpilation accuracy | 10,220+ fixture cases at 100% pass rate |
| Performance | 819× faster than Python sqlglot (transpilation) |
| Dialect coverage | 32 dialects covering all major databases |
| API ergonomics | Clean public API; the builder is pleasant to use |
| Error reporting | Line/column/byte-offset positions |
| WASM support | Full feature set in the browser |
| Multi-language bindings | Rust, TypeScript, Python, Go, C FFI |
| Documentation | Rust API docs exist; could use more guides |
| Test coverage | 18,745 test cases |
| Fuzzing | Supported via `cargo fuzz` |

### 5.2 Risks

| Risk | Severity | Mitigation |
|---|---|---|
| Pre-1.0 breaking changes | Medium | Pin version; monitor CHANGELOG |
| Single maintainer | Medium | Code is well-structured; the community could fork |
| Limited optimizer | Low | Core passes exist; Python sqlglot is the reference |
| No query execution | Low (by design) | Combine with sqlx/diesel |
| WASM stack limits | Low | Set guard rails; use web workers |
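One cheap guard rail for the WASM stack risk is to reject pathologically nested input before it reaches a recursive parser. `check_nesting` is a hypothetical pre-check written for illustration; it counts parentheses outside single-quoted literals and is independent of Polyglot's own format guard settings, whose names are not assumed here.

```rust
/// Hypothetical pre-parse guard: reject SQL whose parenthesis nesting exceeds
/// `max_depth`. Parentheses inside single-quoted literals are ignored.
/// A cheap heuristic, not a parser.
fn check_nesting(sql: &str, max_depth: usize) -> Result<(), String> {
    let mut depth = 0usize;
    let mut in_string = false;
    for (offset, c) in sql.char_indices() {
        match c {
            // A doubled '' toggles twice, so the scan stays inside the literal.
            '\'' => in_string = !in_string,
            '(' if !in_string => {
                depth += 1;
                if depth > max_depth {
                    return Err(format!("nesting depth exceeds {max_depth} at byte offset {offset}"));
                }
            }
            // saturating_sub tolerates stray closing parens; the parser reports those.
            ')' if !in_string => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

The limit is a policy choice: set it well below the depth that overflowed in your own browser testing, and keep the real parser as the final authority on validity.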

### 5.3 Overall Assessment

Polyglot is production-viable for SQL transpilation and analysis tasks, with caveats:

- Use for: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
- ⚠️ Use with caution for: Query building (no type safety), optimization (partial coverage)
- Don't use for: Database execution, connection management, migrations, type-safe queries

For a multi-database storage layer, the recommended pattern is:

Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)

## 6. Recommendation

### When to Adopt Polyglot

  1. You need to support multiple database backends with different SQL dialects and want to write queries once in a canonical dialect, then transpile to the target at runtime.
  2. You need SQL validation or analysis (lineage, schema checking) without executing queries.
  3. You need SQL pretty-printing or formatting with configurable guard rails.
  4. You need column lineage tracking for data governance or OpenLineage integration.
  5. You need to parse and analyze SQL in a Rust/WASM/Python/Go context without connecting to a database.

### When NOT to Adopt Polyglot

  1. You need type-safe query building — use Diesel or SeaORM instead.
  2. You need async database execution — use SQLx or SeaORM instead.
  3. You need schema migrations — use Diesel migrations, sqlx migrations, or Refinery instead.
  4. You only need PostgreSQL (or a single dialect) — a simpler parser may suffice.
  5. You need Rust type → SQL type mapping — Polyglot doesn't provide this.

### Suggested Adoption Strategy

For a multi-database storage layer:

  1. Use Polyglot for SQL transpilation: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
  2. Use SQLx for database execution: Handle connections, pooling, and async I/O.
  3. Use Polyglot for validation: Validate user-provided SQL before execution.
  4. Use Polyglot for lineage: Trace column flow for data governance.
  5. Build a thin integration layer that combines Polyglot's transpilation with SQLx's execution.
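The thin integration layer in step 5 can amortize transpilation by translating each canonical statement at most once per backend. `StatementCache` and its `transpile` closure are hypothetical; the closure stands in for a Polyglot call whose real signature is not assumed here.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Canonical SQL keyed by a stable statement name, transpiled lazily per backend.
struct StatementCache<F>
where
    F: Fn(&str, Backend) -> Result<String, String>,
{
    canonical: HashMap<&'static str, &'static str>,
    translated: Mutex<HashMap<(&'static str, Backend), String>>,
    transpile: F,
}

impl<F> StatementCache<F>
where
    F: Fn(&str, Backend) -> Result<String, String>,
{
    fn new(canonical: HashMap<&'static str, &'static str>, transpile: F) -> Self {
        Self { canonical, translated: Mutex::new(HashMap::new()), transpile }
    }

    /// SQL for `name` in the backend's dialect, transpiling only on first use.
    fn get(&self, name: &'static str, backend: Backend) -> Result<String, String> {
        let mut cache = self.translated.lock().unwrap();
        if let Some(sql) = cache.get(&(name, backend)) {
            return Ok(sql.clone());
        }
        let source = self
            .canonical
            .get(name)
            .ok_or_else(|| format!("unknown statement {name}"))?;
        let sql = (self.transpile)(*source, backend)?;
        cache.insert((name, backend), sql.clone());
        Ok(sql)
    }
}
```

This sketch holds the mutex while transpiling, which serializes first-time translations; a busier service might transpile outside the lock, or precompute every statement for the configured backend at startup.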
