docs: resolve architecture open questions, add type definitions, consolidate docs

Architecture review session resolving all high-priority open questions and filling documentation gaps identified during review: Decisions resolved: - OQ-04: Flat props with inner escape hatch for column validation (ADR-007) - OQ-05: PG enum pre-declaration returns enums and tables (ADR-008) - OQ-06: Render results accumulate in root.ctx (resolved in hosts.md) - Column references vs fk: references is shorthand, explicit fk takes precedence (ADR-006) - ADR-001, 002, 003 promoted from Proposed to Accepted (probe-validated) Documentation improvements: - Complete DbColumnType mapping tables for all 14 types across 3 dialects - Define ColumnMeta, TableMeta, IndexMeta, FkMeta types in elements.md - Document inner prop, mode prop, and default prop semantics - Add PgRootCtx, SqliteRootCtx, MySqlRootCtx context types - Consolidate schema.md and module.md (remove duplication) - Add end-to-end pipeline walkthrough to README - Add glossary with 13 terms - Add error handling strategy - Remove duplicate content from hosts.md (cross-ref elements.md)
2026-05-23 12:06:51 +00:00
parent 4644e1b362
commit d4fd67f4d2
12 changed files with 476 additions and 221 deletions
--- a/docs/architecture/decisions/001-ujsx-as-ir.md
+++ b/docs/architecture/decisions/001-ujsx-as-ir.md
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

--- a/docs/architecture/decisions/002-type-module-as-bundle.md
+++ b/docs/architecture/decisions/002-type-module-as-bundle.md
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

--- a/docs/architecture/decisions/003-hostconfig-for-dialects.md
+++ b/docs/architecture/decisions/003-hostconfig-for-dialects.md
@@ -2,7 +2,7 @@

 ## Status

-Proposed
+Accepted

 ## Context

--- a/docs/architecture/decisions/006-references-vs-fk.md
+++ b/docs/architecture/decisions/006-references-vs-fk.md
@@ -0,0 +1,53 @@
+# ADR-006: Column `references` as FK Shorthand
+
+## Status
+
+Accepted
+
+## Context
+
+The `<column>` element has a `references` prop (`string`) that specifies a foreign key target table name. There is also a separate `<fk>` element with full FK specification (`columns`, `references`, `foreignColumns`, `onDelete`, `onUpdate`).
+
+Two mechanisms exist for expressing foreign keys with different capabilities:
+- `<column references="users">` — only specifies target table, no composite FK support, no ON DELETE/UPDATE
+- `<fk>` — full FK specification with composite support and actions
+
+This creates ambiguity: which takes precedence? Can both be used simultaneously? What if they conflict?
+
+## Decision
+
+`references` on `<column>` is a **shorthand for simple single-column foreign keys** that targets the referenced table's primary key by convention. `<fk>` is the **explicit form** for composite foreign keys and FKs with ON DELETE/ON UPDATE actions.
+
+Internally, `extractTable()` normalizes all `references` props into `FkMeta` entries alongside `<fk>` elements. When both `references` on a column and an explicit `<fk>` referencing the same column exist, the explicit `<fk>` takes precedence.
+
+## Rationale
+
+1. **Simple FKs are the common case**: Most foreign keys are single-column references to another table's primary key. `references="users"` covers this case ergonomically without requiring a separate `<fk>` element.
+
+2. **Composition is additive**: `references` props and `<fk>` elements both produce `FkMeta` entries. There's no separate representation for "simple" vs "complex" FKs in the output — they're unified.
+
+3. **Explicit overrides implicit**: When both forms specify the same relationship, the explicit `<fk>` form is more complete and should take precedence. This allows shorthand to be overridden without conflict.
+
+4. **Convention over configuration**: `references` assumes the target is the primary key. This convention covers 95% of FK relationships. For the remaining cases (non-PK targets, composite FKs), `<fk>` provides full control.
+
+5. **`extractTable()` normalizes**: Both forms produce the same `FkMeta` output, so downstream consumers (hosts, module, repo adapter) don't need to handle two different representations.
+
+## Consequences
+
+### Positive
+
+- Ergonomic shorthand for the most common FK pattern
+- No ambiguity about precedence — explicit `<fk>` always wins
+- Unified internal representation (`FkMeta`) regardless of authoring style
+- `<column references="users">` maps to `{ columns: ['userId'], references: 'users', foreignColumns: ['id'] }` by convention
+
+### Negative
+
+- `references` on `<column>` cannot specify target columns or ON DELETE/UPDATE — must use `<fk>` for those
+- The convention that `references` targets the primary key must be documented and understood
+- Two ways to express the same thing (simple FKs) — could confuse new users until they learn the convention
+
+## References
+
+- [elements.md](../elements.md) — column and fk element definitions
+- [repo-adapter.md](../repo-adapter.md) — FK metadata consumption
--- a/docs/architecture/decisions/007-inner-escape-hatch.md
+++ b/docs/architecture/decisions/007-inner-escape-hatch.md
@@ -0,0 +1,65 @@
+# ADR-007: Flat Props with `inner` Escape Hatch for Column Validation
+
+## Status
+
+Accepted
+
+## Context
+
+Column elements carry both database metadata (notNull, primaryKey, default) and validation semantics (type, format, maxLength). The question is how to handle TypeBox validation constraints that don't have dedicated column props.
+
+Option A (flat props only) limits validation to what has explicit props: `format`, `maxLength`, `minLength`, `pattern`, etc. Each validation constraint requires a new column prop. Custom validation beyond these requires manipulating the module entry directly.
+
+Option B (flat props + `inner` escape hatch) uses flat props for the common 90% of cases and provides an `inner` prop that accepts a full TypeBox schema to override the auto-generated one. The host ignores `inner` — it's purely for the TypeBox schema.
+
+Option C (inner-first) makes the TypeBox schema primary and DB metadata secondary. This is close to the `DbTypeBuilder` pattern that ADR-001 specifically rejected.
+
+The research docs (`docs/research/architecture.md`) proposed `DbType.String({ notNull: true, inner: Type.String({ format: 'email', maxLength: 255 }) })`. With UJSX elements, this becomes `<column name="email" type="string" notNull inner={Type.String({ format: 'email', maxLength: 255 })} />`.
+
+## Decision
+
+Use flat props for common cases with an `inner` escape hatch for custom TypeBox schemas (Option B).
+
+- Common validation constraints (`format`, `maxLength`, `minLength`, `pattern`, `values`) are top-level column props
+- The `inner` prop accepts a TypeBox schema that overrides the auto-generated `colToTypeBox()` result
+- When `inner` is provided, `extractTable()` uses it directly instead of calling `colToTypeBox(type, props)`
+- The host ignores `inner` — database rendering uses only `type` and the other DB metadata props
+- When `inner` is absent, `colToTypeBox(type, props)` generates the TypeBox schema from the column type and props
+
+## Rationale
+
+1. **Ergonomics for the common case**: Most columns only need `type`, `notNull`, `primaryKey`, `default`, and maybe `format`. Flat props (`<column name="email" type="string" notNull format="email" />`) are concise and readable.
+
+2. **Escape hatch for complex validation**: Some columns need validation constraints that don't map to column props: `pattern`, `minimum`/`maximum` for numeric ranges, `contentMediaType` for embeddings, custom format validators. The `inner` prop lets you express any TypeBox validation without polluting the column props namespace.
+
+3. **Host independence**: The host renders columns to Drizzle based on `type` and DB metadata. `inner` doesn't affect rendering at all. This means adding `inner` doesn't require host changes — it's a pure TypeConcern.
+
+4. **Future-proofing**: Embedding vectors, custom types, and complex validation patterns will need `inner`. For instance, a vector column might be `<column name="embedding" type="string" mode="json" inner={Type.Array(Type.Number(), { minItems: 1536, maxItems: 1536 })} />` — the host still stores it as JSON, but the TypeBox schema validates the array structure.
+
+5. **Probe validated**: The probe scripts use flat props exclusively. `inner` adds a single conditional in `extractTable()` — if `inner` is provided, use it; otherwise, call `colToTypeBox()`. No structural change to the pipeline.
+
+6. **No alternative duplication**: Without `inner`, consumers who need `Type.String({ format: 'email', maxLength: 255, pattern: '^[a-z]' })` would have to manipulate the module entry after extraction, which defeats the purpose of a schema-first element tree.
+
+## Consequences
+
+### Positive
+
+- Flat props stay ergonomic for the 90% case
+- Any TypeBox validation is expressible via `inner` without new column props
+- Hosts don't need changes — `inner` is a TypeConcern only
+- Future column types (vectors, custom point types) can embed complex validation
+- Implementation cost is minimal: one conditional in `extractTable()`
+
+### Negative
+
+- Two ways to express validation (flat props vs `inner`) — could confuse new users
+- `inner` TypeBox schemas are not reflected in column props — introspecting a column's validation requires looking at both `props` and `props.inner`
+- The `colToTypeBox()` function must still exist for the no-`inner` case, and its output must be compatible with what `inner` would provide
+- When `inner` is provided alongside flat validation props (`format`, `maxLength`), `inner` takes precedence. This must be clearly documented.
+
+## References
+
+- [elements.md](../elements.md) — Column element definition and `inner` prop
+- [ADR-001](001-ujsx-as-ir.md) — UJSX as the IR (rejected separate builder API)
+- [ADR-004](004-format-annotation-only.md) — Format as annotation (relevant: `format` is a flat prop, `inner` is for when it's not enough)
+- Research: `docs/research/architecture.md` (DbTypeBuilder inner pattern)
--- a/docs/architecture/decisions/008-pg-enum-predeclaration.md
+++ b/docs/architecture/decisions/008-pg-enum-predeclaration.md
@@ -0,0 +1,73 @@
+# ADR-008: PG Enum Pre-declaration — Return Both Enums and Tables
+
+## Status
+
+Accepted
+
+## Context
+
+PostgreSQL requires `pgEnum()` to be called at module scope before any table that references it. SQLite uses `text({ enum: [...] })` inline. MySQL uses `mysqlEnum()` inline. This creates a structural difference in the render output: PG enums must be declared separately.
+
+Three options were considered:
+
+**Option A**: Return both enums and tables from render. The PG host context accumulates enum declarations during the render walk and exposes them alongside tables.
+
+**Option B**: Use `text()` for all enums initially, add native PG enum support later as an opt-in.
+
+**Option C**: Per-column opt-in with `postgres: { nativeEnum: true }` — only generates `pgEnum` when explicitly requested.
+
+## Decision
+
+Use Option A: the PG host returns both enums and tables from the render context.
+
+The PG host context shape after rendering:
+
+```typescript
+interface PgRootCtx {
+  dialect: 'pg'
+  tables: Record<string, PgTable>
+  enums: Record<string, PgEnum>  // Accumulated during render
+}
+```
+
+When a `<column type="enum" values={['happy', 'sad', 'neutral']}>` is encountered, the PG host:
+1. Registers the enum in `ctx.enums` (using the table and column name to derive a unique enum name)
+2. Uses the registered enum in the column builder
+3. The consumer includes both `ctx.enums` and `ctx.tables` in their Drizzle schema
+
+SQLite and MySQL hosts use `ctx.tables` only — no `enums` key.
+
+## Rationale
+
+1. **Correct PG behavior**: PG enums are a separate type declaration (`CREATE TYPE ... AS ENUM`). The render output must include them for the schema to be valid.
+
+2. **No information loss**: Option B (use `text()` for all) loses PG's native enum validation at the database level. The whole point of defining schema types is to get correct database behavior per dialect.
+
+3. **Clean API**: The host context already accumulates `tables`. Adding `enums` to the same context is natural and doesn't change the render pattern — it just adds a second accumulation target.
+
+4. **SQLite/MySQL unaffected**: These dialects don't have pre-declared enums. Their host contexts don't include `enums`, so the API is dialect-appropriate.
+
+5. **Consistent with resolved OQ-006**: The rendering pipeline accumulates results in `root.ctx`. PG enums are auxiliary state, just like tables. The context shape varies by dialect — this is expected.
+
+6. **Deriving enum names**: For a column `status` on table `users`, the PG enum name is `users_status` (table_column convention). This is the same convention Drizzle uses for inline enums.
+
+## Consequences
+
+### Positive
+
+- Correct PG enum support from day one
+- No opt-in flag needed — enums just work when `type="enum"` is used with the PG host
+- Dialect-appropriate context: PG has `enums + tables`, SQLite/MySQL have `tables` only
+- Consistent with the context accumulation pattern
+
+### Negative
+
+- PG consumers must include `ctx.enums` in their schema file (alongside `ctx.tables`)
+- The derived enum naming convention (`table_column`) could conflict with explicit enum names in the future — a naming override prop may be needed later
+- Slightly more complex PG host implementation (tracking enum registrations during the walk)
+
+## References
+
+- [hosts.md](../hosts.md) — Host rendering pipeline and PG column type mapping
+- [elements.md](../elements.md) — Column element `type="enum"` and `values` props
+- [open-questions.md](../open-questions.md) — Original OQ-05