Add repository layer strategy: JSON path queries, CRUD decisions, ecosystem integration

Add three open questions (OQ-17, OQ-18, OQ-19) covering attribute query strategy, CRUD generation approach, and storage-operations bridge placement. Create ADR-033 recording the v1 decision: JSON path queries for attributes with hand-written CRUD for static tables. Expand forward-look.md with Repository Layer Strategy section analyzing three approaches (JSON path, native columns via dbtype, hybrid) and their implications for the metagraph pattern. Add drizzle-graphql and dbtype from-dbtype comparison showing neither handles dynamic schema-as-data. Update overview.md with dbtype/ujsx in the dependency diagram, expanded ecosystem context in the bridging pattern section, and new open questions. Align open-questions.md: resolve OQ-17 and OQ-18 for v1 (ADR-033), add OQ-19 as open, update summary counts and ADR impact table.
2026-05-30 11:02:49 +00:00
parent ed8710a7f5
commit a2ee452a63
5 changed files with 258 additions and 6 deletions
--- a/docs/architecture/forward-look.md
+++ b/docs/architecture/forward-look.md
@@ -1,5 +1,5 @@
 ---
-status: draft
+status: reviewed
 last_updated: 2026-05-30
 ---

@@ -225,6 +225,129 @@ The Module-based graph type definitions (this spec) are the **first concrete
 step** in this pipeline. Everything else builds on having a `Type.Module` as
 the schema source of truth.

+## Repository Layer Strategy
+
+The repository layer (typed CRUD for the 6 metagraph tables + queries for graph data)
+is the next major feature to implement. The question of *how* it queries attributes
+connects to broader ecosystem decisions about dbtype and operations.
+
+### Three Approaches
+
+#### A. JSON Path Queries (Near-Term)
+
+The repository layer maps filter criteria to JSON path extraction:
+
+```ts
+findNodes({ graphId, attributes: { status: "active" } })
+// SQLite: json_extract(attributes, '$.status') = 'active'
+// PG:     attributes ->> 'status' = 'active'
+```
+
+- Works with current table definitions (no schema changes)
+- SQLite `json_extract()` and PG `->>` / `#>>` operators handle JSON path
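+- No index on individual JSON attributes by default; each frequently filtered
+  attribute would need its own expression index
+- PG GIN indexes on `jsonb` columns accelerate containment (`@>`) and key-existence
+  queries, but not the `->>` equality comparison shown above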
+- Simple, immediate, no new infrastructure
+
+This is the pragmatic v1 approach. The metagraph pattern *requires* JSON attributes
+because node types are dynamic schemas (defined at runtime, stored in
+`node_types.schema`), not static columns known at database definition time.
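
A minimal sketch of the filter-to-SQL mapping described under approach A, assuming equality filters on top-level string attributes only. `attributeConditions`, `AttributeFilter`, and `SqlFragment` are hypothetical names for illustration, not part of `@alkdev/storage`; the `?` and `$n` placeholder styles are assumptions about the underlying drivers.

```typescript
// Hypothetical helper: none of these names come from @alkdev/storage.
type Dialect = "sqlite" | "pg";
type AttributeFilter = Record<string, string>;

interface SqlFragment {
  sql: string;
  params: string[];
}

// Only plain identifier keys are accepted, so a key can be placed in a JSON path
// without any quoting or escaping logic.
const SAFE_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

function attributeConditions(dialect: Dialect, filter: AttributeFilter): SqlFragment {
  const clauses: string[] = [];
  const params: string[] = [];

  for (const [key, value] of Object.entries(filter)) {
    if (!SAFE_KEY.test(key)) {
      throw new Error(`Unsupported attribute key: ${key}`);
    }
    if (dialect === "sqlite") {
      // SQLite: json_extract(attributes, '$.key') = value, with both sides bound.
      clauses.push("json_extract(attributes, ?) = ?");
      params.push(`$.${key}`, value);
    } else {
      // PG: attributes ->> 'key' = value. ->> yields text, so the comparison is textual.
      const n = params.length;
      clauses.push(`attributes ->> $${n + 1}::text = $${n + 2}::text`);
      params.push(key, value);
    }
  }

  // An empty filter yields an empty fragment; the caller decides how to combine it.
  return { sql: clauses.join(" AND "), params };
}
```

Binding the path and the value as parameters keeps attribute filters out of the SQL text. Non-string values are left out of the sketch because the two dialects compare extracted numbers and booleans differently.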
+
+#### B. Native Columns via dbtype (Long-Term, Speculative)
+
+If storage migrates to dbtype element trees for table definitions, the 6 static
+metagraph tables (graph_types, node_types, edge_types, graphs, nodes, edges) could
+be rendered via the dbtype pipeline: element tree → HostConfig → Drizzle tables.
+This would eliminate the manual duplication between `sqlite/` and future `pg/`.
+
+However, dbtype does NOT solve the attribute indexing problem:
+
+- The metagraph's `attributes` column MUST remain JSON because the shape is defined
+  by runtime schemas (node type definitions), not by static column definitions
+- dbtype generates static table schemas; it does not handle dynamic schema-as-data
+  patterns like the metagraph
+- A "call" node's attributes (`requestId`, `status`, `duration`) are not columns
+  on the `nodes` table — they're values in the `attributes` JSON column, validated
+  by the corresponding node type's TypeBox schema
+
+#### C. Hybrid: Static Tables via dbtype, Dynamic Attributes Remain JSON
+
+The hybrid approach preserves the metagraph's dynamic schema model while leveraging
+dbtype for the static table scaffolding:
+
+1. **Static tables**: dbtype renders the 6 metagraph tables to Drizzle dialects.
+   This eliminates the SQLite/PG manual duplication for table *structure*.
+   The `attributes` column remains JSON in both dialects (`text` on SQLite, `jsonb` on PG).
+
+2. **Dynamic attributes**: Remain JSON. The Module-based node type schemas validate
+   data at the application layer, not the database layer. This is by design
+   (ADR-003, ADR-014).
+
+3. **Virtual columns / computed columns**: A post-v1 optimization, not a v1 concern.
+   Frequently queried attributes could be extracted to indexed columns as a
+   performance optimization. For example, if `nodes.attributes.status` is a common
+   filter, a computed column or trigger could copy it to `nodes.status_column` with
+   an index. This would be a denormalization trade-off (triggers, migration
+   complexity, dual-write responsibility) and is not designed or planned for v1.
+
+4. **Repository CRUD**: The static table CRUD operations (insert graph type, find
+   node by key) could be auto-generated, in the way drizzle-graphql or the dbtype
+   `from-dbtype` adapter generate theirs. Graph-specific attribute queries remain JSON path.
+
+### Implications for Each Approach
+
+| Concern | Path A (JSON) | Path B (Native) | Path C (Hybrid) |
+|---------|---------------|-----------------|------------------|
+| Works today | ✅ | ❌ (requires dbtype) | ❌ (requires dbtype) |
+| Preserves metagraph pattern | ✅ | ❌ (conflicts with dynamic schemas) | ✅ |
+| Eliminates SQLite/PG duplication | ❌ | ✅ | ✅ |
+| Indexes on attributes | GIN on PG only | ✅ full native | GIN + virtual columns |
+| Repository generation | Hand-write CRUD | Auto-gen from dbtype | Auto-gen for static, JSON path for dynamic |
+| Dependency on dbtype | None | Full | Partial (static tables only) |
+
+### Connection to drizzle-graphql
+
+The overview references drizzle-graphql as a pattern for auto-generating a CRUD/query
+surface. The dbtype `from-dbtype` adapter is the @alkdev equivalent: it consumes
+element trees + Type.Module bundles and produces `OperationSpec[]` for the
+operations registry.
+
+The parallel:
+
+| Concern | drizzle-graphql | dbtype from-dbtype |
+|---------|----------------|-------------------|
+| Input | Drizzle schema (tables + relations) | UJSX element tree + Type.Module |
+| Output | GraphQL schema (queries + mutations) | `OperationSpec[]` (CRUD operations) |
+| Dialects | SQLite, PG, MySQL | SQLite, PG, MySQL (via HostConfig) |
+| Table model | Static columns only | Static columns only |
+| Dynamic data (JSON attrs) | Not handled | Not handled |
+
+Neither drizzle-graphql nor dbtype's `from-dbtype` handles dynamic schema-as-data
+patterns. The metagraph's JSON attributes require their own query layer, regardless
+of whether the static tables are auto-generated. This means the repository layer
+for `@alkdev/storage` will always have two parts:
+
+1. **Static table CRUD**: either auto-generated by dbtype or hand-written
+2. **Graph data queries** — JSON path queries against the `attributes` column,
+   validated by the Module schema at the application layer
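
The two-part split can be sketched with an in-memory stand-in, assuming nothing about the eventual API. `InMemoryNodeRepository`, `NodeRow`, and `Validator` are hypothetical names; the `Validator` function stands in for a check against the node type's Module schema, and the `Map` stands in for the `nodes` table.

```typescript
// Hypothetical sketch: none of these names come from @alkdev/storage.
type Attributes = Record<string, unknown>;

interface NodeRow {
  id: string;
  graphId: string;
  typeKey: string;
  attributes: Attributes;
}

// Stand-in for application-layer validation against a node type schema.
type Validator = (attributes: Attributes) => boolean;

class InMemoryNodeRepository {
  private readonly rows = new Map<string, NodeRow>();
  private readonly validators: Record<string, Validator>;

  constructor(validators: Record<string, Validator>) {
    this.validators = validators;
  }

  // Part 1: static table CRUD. The row shape is fixed and known up front.
  insert(row: NodeRow): void {
    const validate = this.validators[row.typeKey];
    if (validate === undefined || !validate(row.attributes)) {
      throw new Error(`Invalid attributes for node type: ${row.typeKey}`);
    }
    this.rows.set(row.id, row);
  }

  findById(id: string): NodeRow | undefined {
    return this.rows.get(id);
  }

  // Part 2: graph data query. Equality on top-level attribute keys, which is
  // what a JSON path filter would express in SQL.
  findNodes(graphId: string, attributes: Attributes): NodeRow[] {
    return Array.from(this.rows.values()).filter(
      (row) =>
        row.graphId === graphId &&
        Object.entries(attributes).every(([key, value]) => row.attributes[key] === value),
    );
  }
}
```

Only the first part depends on the static table shape, so it is the part that could later be swapped for generated code without touching the attribute query method.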
+
+### v1 Decision
+
+For v1, the practical path is **A (JSON path queries) with hand-written CRUD**. This
+decision is recorded as [ADR-033](./decisions/033-json-path-queries-for-v1.md). The
+hybrid approach (C) remains viable for a future iteration when dbtype reaches
+implementation, and it doesn't require any changes to the metagraph data model —
+only to how the static table definitions are generated. See OQ-17, OQ-18, OQ-19
+in [open-questions.md](./open-questions.md) for the specific long-term questions
+that remain open beyond v1.
+
+### Decisions Required
+
+- **OQ-17**: JSON path vs native columns vs hybrid for attribute queries (resolved for v1 — see ADR-033)
+- **OQ-18**: Auto-generated vs hand-written CRUD for static tables (resolved for v1 — see ADR-033)
+- **OQ-19**: Where the storage-operations bridge package should live (open)
+
 ## Constraints on Current Design

 The forward-looking patterns documented here constrain the Module evolution
@@ -263,7 +386,11 @@ design in [metagraph-module.md](./metagraph-module.md):
 - dbtype architecture: `/workspace/@alkdev/dbtype/docs/architecture/README.md`
 - dbtype elements: `/workspace/@alkdev/dbtype/docs/architecture/elements.md`
 - dbtype module: `/workspace/@alkdev/dbtype/docs/architecture/module.md`
+- dbtype repo adapter: `/workspace/@alkdev/dbtype/docs/architecture/repo-adapter.md`
+- drizzle-graphql (reference for CRUD generation pattern): `/workspace/drizzle-graphql/`
+- Operations registry: `/workspace/@alkdev/operations/docs/architecture/README.md`
 - JPATH Module (JSONPath as TypeBox Module): `/workspace/research/typebox_research/ujsx/jpath.gen.ts`
 - jsonpathly source: `/workspace/jsonpathly/`
 - Module evolution spec: [metagraph-module.md](./metagraph-module.md)
- Schema evolution spec: [schema-evolution.md](./schema-evolution.md)
+- Schema evolution spec: [schema-evolution.md](./schema-evolution.md)
+- ADR-033: JSON path queries and hand-written CRUD for v1