# Polyglot: Suitability Analysis & Comparisons

---

## 1. What Polyglot Is NOT

Before evaluating suitability, it's essential to understand what Polyglot **does not** do:

| NOT a... | Because |
|---|---|
| **Database driver** | No connection management, no query execution, no result set handling |
| **ORM** | No object-relational mapping, no model definitions, no active record pattern |
| **Migration tool** | No `CREATE TABLE` evolution management, no up/down migrations framework |
| **Type mapper** | No Rust type → SQL type mapping, no `FromRow` derives |
| **Connection pool** | No async I/O, no TCP connections, no TLS |
| **Query executor** | Never connects to a database; operates purely on SQL text |

**Polyglot is a SQL dialect transpiler.** It converts SQL strings between database dialects. Period.

---

## 2. Suitability Assessment for Multi-Database Storage Layer

### 2.1 What Polyglot CAN Do for a Multi-DB Project

| Use Case | Polyglot Support | Maturity |
|---|---|---|
| **SQL dialect translation** | ✅ Core purpose; 32 dialects with 100% test pass rate | Mature |
| **SQL pretty-printing** | ✅ Built-in format with guard rails | Mature |
| **SQL syntax validation** | ✅ Line/column error positions, error codes | Mature |
| **Schema-aware validation** | ✅ Table/column/type checking with `ValidationSchema` | Moderate |
| **Column lineage tracing** | ✅ `get_column_lineage()` for data lineage | Moderate |
| **OpenLineage payloads** | ✅ `RunEvent` and `DatasetFacet` generation | Early but functional |
| **Query builder** | ✅ Fluent API for SELECT/INSERT/UPDATE/DELETE | Usable but not as rich as query-builder-first libraries |
| **AST diff** | ✅ ChangeDistiller-based structural diff | Functional |
| **Logical planning** | ✅ Basic DAG plan extraction | Early stage |
| **Query optimization** | ✅ Column qualification, predicate pushdown, join elimination | Moderate |
| **Custom dialect registration** | ✅ `CustomDialectBuilder` for runtime extension | Functional |

### 2.2 What Polyglot CANNOT Do for a Multi-DB Project

| Need | Polyglot Support | Alternative |
|---|---|---|
| **Execute queries** | ❌ No | Use sqlx, diesel, or sea-orm |
| **Connection pooling** | ❌ No | Use deadpool, bb8, or sqlx built-in |
| **Async I/O** | ❌ Synchronous only | Wrap in `spawn_blocking()` |
| **Type-safe query building** | ⚠️ Partial (builder API produces a runtime `Expression` AST with no compile-time checks) | Use diesel or sea-orm for compile-time checks |
| **Schema migration management** | ❌ No | Use diesel migrations, sqlx migrations, or refinery |
| **Row mapping / deserialization** | ❌ No | Use sqlx `FromRow`, diesel `Queryable` |
| **Runtime type mapping** | ⚠️ Limited (`DataType` enum, no Rust type bridge) | Build your own layer |
| **Database-specific DDL generation** | ⚠️ Parses/generates DDL but no migration framework | Use as a building block |
| **Transaction management** | ❌ No | Use sqlx or diesel |

### 2.3 Integration Pattern: Polyglot as a SQL Dialect Layer

The most natural integration pattern for a multi-database storage layer:

```
┌──────────────────────────────────────────────┐
│              Application Logic               │
├──────────────────────────────────────────────┤
│          Query Builder / ORM Layer           │
│         (diesel / sea-orm / custom)          │
├──────────────────────┬───────────────────────┤
│                      │                       │
│   Polyglot Layer     │   Direct SQL          │
│   (transpile,        │   (no translation     │
│    validate,         │    needed)            │
│    format)           │                       │
├──────────────────────┴───────────────────────┤
│            Database Driver Layer             │
│               (sqlx / diesel)                │
├──────────────────────────────────────────────┤
│   PostgreSQL   │    MySQL    │    SQLite     │
└──────────────────────────────────────────────┘
```

In this pattern, Polyglot sits **above** the database drivers, translating SQL from a canonical dialect to the target database's dialect before execution. It does **not** replace the drivers.

---

## 3. Comparison with Other Rust SQL Libraries

### 3.1 Feature Comparison Matrix

| Feature | **Polyglot** | **Diesel** | **SQLx** | **SeaORM** | **sqlparser-rs** |
|---|---|---|---|---|---|
| **Primary Purpose** | SQL transpilation | ORM / query builder | Async DB driver | Async ORM | SQL parsing |
| **SQL Parsing** | ✅ Full AST (200+ node types) | ❌ No (the DSL generates SQL; it does not parse it) | ❌ No | ❌ No | ✅ Full AST |
| **SQL Generation** | ✅ Multi-dialect | ✅ Via DSL | ❌ No | ❌ No | ⚠️ Limited |
| **Cross-dialect Transpilation** | ✅ 32 dialects | ❌ No | ❌ No | ❌ No | ❌ No |
| **Query Builder** | ⚠️ Fluent, string-based | ✅ Type-safe DSL | ❌ No | ✅ Type-safe | ❌ No |
| **Async I/O** | ❌ No (sync only) | ❌ Sync (async via the separate `diesel-async` crate) | ✅ Native async | ✅ Native async | ❌ No |
| **Type-safe Queries** | ❌ No (runtime) | ✅ Compile-time | ⚠️ Compile-time checked SQL via `query!()` | ✅ Compile-time | ❌ No |
| **Connection Pool** | ❌ No | ⚠️ Via the `r2d2` feature | ✅ Built-in | ✅ Built-in | ❌ No |
| **Migration Support** | ❌ No | ✅ Built-in | ✅ Built-in (`sqlx::migrate!`) | ✅ Built-in | ❌ No |
| **Database Execution** | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| **Schema Validation** | ✅ Via `ValidationSchema` | ✅ Compile-time | ❌ No | ⚠️ Limited | ❌ No |
| **Column Lineage** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| **AST Diff** | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| **Dialects Supported** | 32 | 3 (PostgreSQL, MySQL, SQLite) | N/A | N/A | ANSI SQL plus several vendor dialects (parsing only) |
| **License** | MIT | MIT/Apache-2.0 | MIT/Apache-2.0 | MIT/Apache-2.0 | Apache-2.0 |
| **Maturity** | v0.4.4 (pre-1.0) | v2.2 (stable) | v0.8 (stable) | v1.1 (stable) | v0.49 (mature) |

### 3.2 Polyglot vs Diesel

| Aspect | Polyglot | Diesel |
|---|---|---|
| **Philosophy** | Parse any SQL → AST → generate any dialect | Type-safe DSL → SQL for specific databases |
| **Type Safety** | Runtime (string-based) | Compile-time (macro-based) |
| **Query Building** | `select(["col"]).from("t").where_(...)` → `Expression` AST | `t::table.filter(t::col.eq(...))` → SQL |
| **Dialect Breadth** | 32 dialects | 3 (PostgreSQL, MySQL, SQLite) |
| **Database Execution** | None (SQL text only) | Full CRUD with connection management |
| **Migrations** | None | Built-in migration framework |
| **When to use** | You need cross-dialect SQL translation, validation, lineage | You need type-safe queries with database execution |

**Verdict**: Polyglot and Diesel are **complementary**, not competing. Use Diesel for type-safe database interaction; use Polyglot when you need to translate SQL between dialects or analyze SQL without executing it.

### 3.3 Polyglot vs SQLx

| Aspect | Polyglot | SQLx |
|---|---|---|
| **Philosophy** | SQL manipulation without execution | Async database driver with compile-time query checking |
| **Async** | Synchronous only | Fully async |
| **Query Checking** | Runtime validation against schema | Compile-time `query!()` macro |
| **Database Support** | 32 dialects (parsing) | PostgreSQL, MySQL, SQLite (execution) |
| **When to use** | SQL transformation/analysis | Database interaction with async Rust |

**Verdict**: SQLx is for executing queries against databases. Polyglot is for transforming SQL text. They solve entirely different problems.

### 3.4 Polyglot vs SeaORM

| Aspect | Polyglot | SeaORM |
|---|---|---|
| **Philosophy** | SQL transpilation | Async ORM built on SQLx |
| **Async** | No | Yes |
| **Model Definition** | None | Entity models via macros |
| **Relationships** | None | Has-one, has-many, many-to-many |
| **When to use** | SQL dialect conversion | Database CRUD with relationships |

**Verdict**: Same as SQLx: complementary, not competing.
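
Because Polyglot is synchronous while SQLx and SeaORM are async (Sections 3.3 and 3.4), combining them means moving the transpile step off the async executor's threads. The sketch below illustrates that shape with only the standard library. `transpile_blocking` is a hypothetical stand-in, not Polyglot's API, and its `ILIKE` rewrite is a toy; `std::thread::spawn` is used here only to keep the example dependency-free.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a synchronous, CPU-bound transpile call.
/// This is NOT Polyglot's API; the ILIKE -> LIKE rewrite is a toy.
fn transpile_blocking(sql: &str) -> Result<String, String> {
    if sql.trim().is_empty() {
        return Err("empty SQL".to_string());
    }
    Ok(sql.replace(" ILIKE ", " LIKE "))
}

/// Moves the synchronous work onto a worker thread so the caller
/// (for example, an async request handler) is not blocked while it runs.
fn transpile_on_worker(sql: String) -> JoinHandle<Result<String, String>> {
    thread::spawn(move || transpile_blocking(&sql))
}
```

Calling `transpile_on_worker(sql).join()` yields the transpiled string or the error. In a Tokio application the same shape would likely be `tokio::task::spawn_blocking(move || transpile_blocking(&sql)).await`, which hands the closure to Tokio's blocking thread pool instead of spawning a fresh thread per call.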
### 3.5 Polyglot vs sqlparser-rs

| Aspect | Polyglot | sqlparser-rs |
|---|---|---|
| **Parsing** | ✅ Full (200+ node types) | ✅ Full (ANSI SQL + some dialects) |
| **Generation** | ✅ Multi-dialect generation | ⚠️ Limited round-trip |
| **Transpilation** | ✅ Cross-dialect transforms | ❌ No |
| **Dialects** | 32 | ANSI SQL plus several vendor dialects (parsing only) |
| **Validation** | ✅ With error positions | ❌ Parse errors only |
| **Builder** | ✅ Fluent API | ❌ No |
| **Lineage** | ✅ Built-in | ❌ No |
| **Diff** | ✅ Built-in | ❌ No |
| **Maturity** | v0.4.4 | v0.49 (more established) |

**Verdict**: sqlparser-rs is a mature parser for ANSI SQL and several vendor dialects. Polyglot offers significantly more: transpilation, 32 dialects, validation, lineage, diff, and a builder API. If you need dialect translation, Polyglot is the clear choice. If you only need parsing and don't need generation/transpilation, sqlparser-rs may suffice with less overhead.

### 3.6 Polyglot vs Python sqlglot

| Aspect | Polyglot (Rust) | sqlglot (Python) |
|---|---|---|
| **Performance** | 8–19× faster (transpile), ~86× faster (generate) | Baseline |
| **Language** | Rust | Python |
| **Feature Parity** | ~95% of sqlglot's transpilation | Full feature set |
| **Optimizer** | Column qualification, predicate pushdown (moderate) | Full optimizer (column pruning, join elimination, etc.) |
| **Execution** | ❌ No | ⚠️ Limited (can execute against some engines) |
| **Test Compatibility** | 10,220+ sqlglot fixture cases at 100% | Original test suite |
| **Deployment** | Native binary / WASM / Python / Go | Python package |

**Verdict**: Polyglot is the performance-oriented port of sqlglot. It covers the core transpilation use case at near-full feature parity. The Python sqlglot has a more mature optimizer and some execution capabilities, but Polyglot is catching up rapidly (0.4.x adds lineage, OpenLineage, schema validation, and more).

---

## 4. Limitations and Gotchas

### 4.1 Current Limitations

| Limitation | Impact | Mitigation |
|---|---|---|
| **Pre-1.0 API** | Breaking changes possible between minor versions | Pin exact version in Cargo.toml |
| **No query execution** | Cannot run SQL against databases | Use alongside sqlx/diesel |
| **No async** | Blocking in async contexts | Wrap in `spawn_blocking()` |
| **No migration framework** | Cannot manage schema evolution | Use diesel migrations or refinery |
| **No Rust type mapping** | `DataType` enum doesn't map to Rust types | Build your own type bridge |
| **Builder returns `Expression`** | Builder doesn't produce type-safe queries | Accept runtime nature; pair with runtime validation |
| **Optimizer is early** | Limited optimization passes vs Python sqlglot | Most useful passes exist (`qualify_columns`, `pushdown_predicates`) |
| **WASM lacks `stacker`** | Deeply nested SQL may overflow stack in browser | Set format guard limits; consider web workers |
| **Custom dialects are global** | `CustomDialectBuilder` uses a global `RwLock` registry | Fine for most apps; not ideal for per-request isolation |
| **No prepared statement support** | Cannot generate `?` placeholders for parameterized queries | Build queries as strings; use sqlx for parameterization |

### 4.2 Gotchas

1. **`Dialect::get()` creates a new instance each call**: The `Dialect` struct bundles tokenizer + generator config + transformer. For hot loops, cache the `Dialect` instance rather than calling `Dialect::get()` repeatedly. (The overhead is minimal but non-zero.)

2. **Transpilation is not always invertible**: Some dialects have features that don't exist in others (e.g., BigQuery's `QUALIFY`, PostgreSQL's `ILIKE`, TSQL's `TOP`). Transpiling `A → B` and then `B → A` may lose information.

3. **Function transformation depth**: The transform pipeline processes nodes bottom-up, one node at a time. Some transformations require multi-pass processing (handled by `preprocess()`), but edge cases may require manual intervention.

4. **AST is not a stable serialization format**: The `Expression` enum and its inner structs may change between versions. If you serialize ASTs to JSON, expect breaking changes across minor versions.

5. **Feature flags are cumulative**: `transpile` implies `generate`, `openlineage` implies `semantic`, etc. For minimal builds, use `default-features = false` and select only what you need.

6. **Global custom dialect registry**: Custom dialects registered via `CustomDialectBuilder::register()` are stored in a global `RwLock`. This means they persist for the lifetime of the process and are visible across threads. Call `unregister_custom_dialect()` to remove them.

7. **Parser is permissive**: The parser accepts many SQL constructs that some databases reject. Validation (via `validate()` or `validate_with_schema()`) can catch some issues, but it's not a substitute for database-level error checking.

8. **No `?` placeholder generation**: Polyglot doesn't generate parameterized query placeholders. For prepared statements, you'll need to handle parameter binding yourself with your database driver.

9. **Schema validation requires manual schema definition**: The `ValidationSchema` struct must be populated manually; there is no automatic schema introspection from a live database.

---

## 5. Production-Readiness Assessment

### 5.1 Strengths

| Area | Rating | Notes |
|---|---|---|
| **Transpilation accuracy** | ⭐⭐⭐⭐⭐ | 10,220+ fixture cases at 100% pass rate |
| **Performance** | ⭐⭐⭐⭐⭐ | 8–19× faster than Python sqlglot |
| **Dialect coverage** | ⭐⭐⭐⭐⭐ | 32 dialects covering all major databases |
| **API ergonomics** | ⭐⭐⭐⭐ | Clean public API; builder is pleasant |
| **Error reporting** | ⭐⭐⭐⭐ | Line/column/byte-offset positions |
| **WASM support** | ⭐⭐⭐⭐ | Full feature set in browser |
| **Multi-language bindings** | ⭐⭐⭐⭐⭐ | Rust, TypeScript, Python, Go, C FFI |
| **Documentation** | ⭐⭐⭐ | Rust API docs exist; could use more guides |
| **Test coverage** | ⭐⭐⭐⭐⭐ | 18,745 test cases |
| **Fuzzing** | ⭐⭐⭐⭐ | Supported via `cargo fuzz` |

### 5.2 Risks

| Risk | Severity | Mitigation |
|---|---|---|
| **Pre-1.0 breaking changes** | Medium | Pin version; monitor CHANGELOG |
| **Single maintainer** | Medium | Code is well-structured; community could fork |
| **Limited optimizer** | Low | Core passes exist; Python sqlglot is reference |
| **No query execution** | Low (by design) | Combine with sqlx/diesel |
| **WASM stack limits** | Low | Set guard rails; use web workers |

### 5.3 Overall Assessment

**Polyglot is production-viable for SQL transpilation and analysis tasks**, with caveats:

- ✅ **Use for**: SQL dialect translation, SQL linting/validation, column lineage, pretty-printing, AST analysis, cross-database query migration
- ⚠️ **Use with caution for**: Query building (no type safety), optimization (partial coverage)
- ❌ **Don't use for**: Database execution, connection management, migrations, type-safe queries

For a multi-database storage layer, the recommended pattern is:

```
Application → Polyglot (transpile SQL to target dialect) → sqlx/diesel (execute)
```

---

## 6. Recommendation

### When to Adopt Polyglot

1. **You need to support multiple database backends with different SQL dialects** and want to write queries once in a canonical dialect, then transpile to the target at runtime.
2. **You need SQL validation or analysis** (lineage, schema checking) without executing queries.
3. **You need SQL pretty-printing or formatting** with configurable guard rails.
4. **You need column lineage tracking** for data governance or OpenLineage integration.
5. **You need to parse and analyze SQL** in a Rust/WASM/Python/Go context without connecting to a database.

### When NOT to Adopt Polyglot

1. **You need type-safe query building**: use Diesel or SeaORM instead.
2. **You need async database execution**: use SQLx or SeaORM instead.
3. **You need schema migrations**: use Diesel migrations, sqlx migrations, or Refinery instead.
4. **You only need PostgreSQL** (or a single dialect): a simpler parser may suffice.
5. **You need Rust type → SQL type mapping**: Polyglot doesn't provide this.

### Suggested Adoption Strategy

For a multi-database storage layer:

1. **Use Polyglot for SQL transpilation**: Write queries in a canonical dialect (e.g., PostgreSQL-compatible), transpile to the target dialect at runtime.
2. **Use SQLx for database execution**: Handle connections, pooling, and async I/O.
3. **Use Polyglot for validation**: Validate user-provided SQL before execution.
4. **Use Polyglot for lineage**: Trace column flow for data governance.
5. **Build a thin integration layer** that combines Polyglot's transpilation with SQLx's execution.
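
The thin integration layer from the adoption strategy (transpile canonical SQL, then hand the result to a driver) might look roughly like the sketch below. Everything in it is hypothetical: the `Transpile` and `Execute` traits mark the seams where Polyglot and SQLx would plug in, and `ToyTranspiler` and `RecordingExecutor` are stand-ins so the example runs without either crate. None of these names come from Polyglot's or SQLx's actual APIs.

```rust
/// Target backends (illustrative subset, not Polyglot's dialect list).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Hypothetical seam where a transpiler such as Polyglot would plug in.
pub trait Transpile {
    fn transpile(&self, canonical_sql: &str, target: Backend) -> Result<String, String>;
}

/// Hypothetical seam where a driver such as SQLx would plug in.
pub trait Execute {
    fn execute(&mut self, sql: &str) -> Result<u64, String>;
}

/// Records which stage failed so callers can react differently.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    Transpile(String),
    Execute(String),
}

/// Thin integration layer: transpile first, then pass the result to the driver.
pub struct StorageLayer<T: Transpile, E: Execute> {
    pub transpiler: T,
    pub executor: E,
    pub target: Backend,
}

impl<T: Transpile, E: Execute> StorageLayer<T, E> {
    pub fn new(transpiler: T, executor: E, target: Backend) -> Self {
        Self { transpiler, executor, target }
    }

    /// Returns the driver's affected-row count, or the stage that failed.
    pub fn run(&mut self, canonical_sql: &str) -> Result<u64, StorageError> {
        let sql = self
            .transpiler
            .transpile(canonical_sql, self.target)
            .map_err(StorageError::Transpile)?;
        self.executor.execute(&sql).map_err(StorageError::Execute)
    }
}

/// Toy transpiler for the sketch: rewrites ILIKE for MySQL only.
pub struct ToyTranspiler;

impl Transpile for ToyTranspiler {
    fn transpile(&self, canonical_sql: &str, target: Backend) -> Result<String, String> {
        if canonical_sql.trim().is_empty() {
            return Err("empty SQL".to_string());
        }
        Ok(match target {
            Backend::MySql => canonical_sql.replace(" ILIKE ", " LIKE "),
            Backend::Postgres | Backend::Sqlite => canonical_sql.to_string(),
        })
    }
}

/// Fake driver that records the SQL it receives instead of running it.
#[derive(Default)]
pub struct RecordingExecutor {
    pub seen: Vec<String>,
}

impl Execute for RecordingExecutor {
    fn execute(&mut self, sql: &str) -> Result<u64, String> {
        self.seen.push(sql.to_string());
        Ok(1)
    }
}
```

Keeping the two seams as traits means the application depends only on `StorageLayer::run`, so a pre-1.0 API change in the transpiler is absorbed by one `Transpile` implementation. A transpile failure also short-circuits before anything reaches the driver, which is the ordering the strategy above relies on.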