# Polyglot: Architecture Deep Dive

---

## 1. Workspace Structure

The repository is organized as a Cargo workspace with 5 crates and supporting packages:

```
polyglot/
├── crates/
│   ├── polyglot-sql/                     # Core Rust library (~241K LOC)
│   │   └── src/
│   │       ├── lib.rs                    # Public API, top-level functions
│   │       ├── tokens.rs                 # Tokenizer (lexer)
│   │       ├── parser.rs                 # Recursive-descent parser (~62K LOC)
│   │       ├── expressions.rs            # AST node types (~15K LOC)
│   │       ├── generator.rs              # SQL code generator (~39K LOC)
│   │       ├── dialects/                 # 33 dialect implementations
│   │       │   ├── mod.rs                # Dialect trait, Dialect struct, CustomDialectBuilder
│   │       │   ├── generic.rs            # Base/standard SQL dialect
│   │       │   ├── postgres.rs           # PostgreSQL (~1.9K LOC)
│   │       │   ├── mysql.rs              # MySQL
│   │       │   ├── sqlite.rs             # SQLite
│   │       │   ├── bigquery.rs           # BigQuery
│   │       │   └── ... (32 total)
│   │       ├── builder.rs                # Fluent query builder API
│   │       ├── transforms.rs             # Cross-dialect transform functions
│   │       ├── validation.rs             # Syntax + semantic validation
│   │       ├── schema.rs                 # Schema representation
│   │       ├── scope.rs                  # Scope analysis
│   │       ├── resolver.rs               # Column resolution
│   │       ├── lineage.rs                # Column lineage tracking
│   │       ├── openlineage.rs            # OpenLineage payload generation
│   │       ├── diff.rs                   # AST diff (ChangeDistiller algorithm)
│   │       ├── planner.rs                # Logical query plan
│   │       ├── optimizer/                # Query optimizer modules
│   │       │   ├── annotate_types.rs     # Type annotation
│   │       │   ├── qualify_columns.rs    # Column qualification
│   │       │   ├── qualify_tables.rs     # Table qualification
│   │       │   ├── pushdown_predicates.rs
│   │       │   ├── pushdown_projections.rs
│   │       │   ├── eliminate_joins.rs
│   │       │   ├── eliminate_ctes.rs
│   │       │   ├── simplify.rs
│   │       │   └── ...
│   │       ├── traversal.rs              # DFS/BFS visitors, AST predicates
│   │       ├── ast_transforms.rs         # AST manipulation utilities
│   │       ├── error.rs                  # Error types
│   │       └── time.rs                   # Time format conversion
│   ├── polyglot-sql-function-catalogs/   # Optional dialect function catalogs
│   ├── polyglot-sql-wasm/                # WASM bindings (wasm-pack)
│   ├── polyglot-sql-ffi/                 # C FFI bindings (cbindgen)
│   └── polyglot-sql-python/              # Python bindings (PyO3 + maturin)
├── packages/
│   ├── sdk/                              # TypeScript SDK (@polyglot-sql/sdk)
│   ├── go/                               # Go SDK (PureGo wrapper over FFI)
│   ├── documentation/                    # TypeScript API docs site
│   ├── playground/                       # Browser playground (React 19, Vite)
│   └── python-docs/                      # Python API docs
├── examples/
│   ├── rust/                             # Rust usage example
│   ├── typescript/                       # TypeScript SDK example
│   └── c/                                # C FFI usage example
└── tools/
    ├── sqlglot-compare/                  # Fixture extraction & comparison
    └── bench-compare/                    # Performance benchmarks
```

---

## 2. Data Flow Pipeline

```
┌──────────────────────────────────────────────────────────────────────┐
│                     SQL String (source dialect)                      │
└──────────────────────────┬───────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Tokenizer (tokens.rs)                                               │
│  • Dialect-specific lexing rules (quotes, comments, keywords)        │
│  • Configurable via TokenizerConfig per dialect                      │
│  • Produces a token Vec with type, text, and Span (line/col/offset)  │
└──────────────────────────┬───────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Parser (parser.rs, ~62K LOC)                                        │
│  • Recursive-descent with precedence climbing                        │
│  • Dialect-aware parsing (custom keywords, syntax rules)             │
│  • Produces Expression AST tree                                      │
│  • Stack safety via `stacker` feature (default-on)                   │
└──────────────────────────┬───────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Expression AST (expressions.rs)                                     │
│  • Single tagged enum with 150+ variants                             │
│  • Each variant has its own struct (Select, Insert, Function, etc.)  │
│  • Box keeps enum size to 2 words (tag + pointer)                    │
│  • Serializable via serde (derive Serialize/Deserialize)             │
│  • Optional TypeScript type generation via `ts-rs` feature flag      │
└──────────────────────────┬───────────────────────────────────────────┘
                           │
                      ┌────┴────┐
                      │         │
            ┌─────────┘         └──────────┐
            │                              │
            ▼                              ▼
┌────────────────────────┐  ┌────────────────────────────────────┐
│  Transform Pipeline    │  │  Semantic / Analysis Modules       │
│  (transpile path)      │  │  • validation.rs → syntax checks   │
│                        │  │  • schema.rs → column/type lookup  │
│  1. preprocess()       │  │  • scope.rs → scope analysis       │
│    (whole-tree rewrites│  │  • resolver.rs → column resolution │
│     like eliminate_    │  │  • lineage.rs → column lineage     │
│     qualify)           │  │  • openlineage.rs → OL payloads    │
│                        │  │  • optimizer/ → query optimization │
│  2. transform_expr()   │  │  • diff.rs → AST diff              │
│    (per-node rewrites  │  │  • planner.rs → logical plan DAG   │
│     per dialect)       │  │  • traversal.rs → DFS/BFS visitors │
│                        │  │                                    │
│  3. Generator          │  │                                    │
│    (AST → SQL string)  │  │                                    │
└───────────┬────────────┘  └────────────────────────────────────┘
            │
            ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     SQL String (target dialect)                      │
└──────────────────────────────────────────────────────────────────────┘
```

---

## 3. Core Abstractions

### 3.1 Expression AST

The central type is `Expression`, a large tagged enum with one variant per SQL construct:

```rust
pub enum Expression {
    // Literals
    Literal(Box),
    Boolean(BooleanLiteral),
    Null(Null),

    // Identifiers
    Identifier(Identifier),
    Column(Box),
    Table(Box),
    Star(Star),

    // Queries
    Select(Box