From a2ee452a639f8f63bff6c85ee8e812059dd17269 Mon Sep 17 00:00:00 2001
From: "glm-5.1" <glm-5.1@alk.dev>
Date: Sat, 30 May 2026 11:02:49 +0000
Subject: [PATCH] Add repository layer strategy: JSON path queries, CRUD
 decisions, ecosystem integration

Add three open questions (OQ-17, OQ-18, OQ-19) covering attribute query
strategy, CRUD generation approach, and storage-operations bridge placement.
Create ADR-033 recording the v1 decision: JSON path queries for attributes
with hand-written CRUD for static tables.

Expand forward-look.md with Repository Layer Strategy section analyzing
three approaches (JSON path, native columns via dbtype, hybrid) and their
implications for the metagraph pattern. Add drizzle-graphql and dbtype
from-dbtype comparison showing neither handles dynamic schema-as-data.

Update overview.md with dbtype/ujsx in the dependency diagram, expanded
ecosystem context in the bridging pattern section, and new open questions.

Align open-questions.md: resolve OQ-17 and OQ-18 for v1 (ADR-033), add
OQ-19 as open, update summary counts and ADR impact table.
---
 docs/architecture/README.md                   |   1 +
 .../decisions/033-json-path-queries-for-v1.md |  53 +++++++
 docs/architecture/forward-look.md             | 131 +++++++++++++++++-
 docs/architecture/open-questions.md           |  39 +++++-
 docs/architecture/overview.md                 |  40 +++++-
 5 files changed, 258 insertions(+), 6 deletions(-)
 create mode 100644 docs/architecture/decisions/033-json-path-queries-for-v1.md

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index f2f821c..a444527 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -58,6 +58,7 @@ Storage has Phase 1-3 of the metagraph implementation complete: Metagraph Module
 | [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | Accepted |
 | [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Accepted |
 | [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | Accepted |
+| [033](decisions/033-json-path-queries-for-v1.md) | JSON path queries and hand-written CRUD for v1 | Accepted |
 
 ### Open Questions
 
diff --git a/docs/architecture/decisions/033-json-path-queries-for-v1.md b/docs/architecture/decisions/033-json-path-queries-for-v1.md
new file mode 100644
index 0000000..11fce66
--- /dev/null
+++ b/docs/architecture/decisions/033-json-path-queries-for-v1.md
@@ -0,0 +1,53 @@
+# ADR-033: JSON path queries and hand-written CRUD for v1 repository layer
+
+## Status
+
+Accepted
+
+## Context
+
+The repository layer is the next major feature for `@alkdev/storage`. It needs to provide typed CRUD operations for the 6 metagraph tables and query capability for node/edge attributes stored as JSON columns.
+
+The metagraph pattern stores node and edge attributes as JSON (`attributes text not null default '{}'` with JSON mode in SQLite, `jsonb` in PG). This is fundamental to the design — node types are dynamic schemas defined at runtime and stored in `node_types.schema`, not static columns known at database definition time.
+
+Three approaches exist for querying attributes:
+
+1. **JSON path queries**: Map filter criteria to `json_extract()` (SQLite) or `->>` / `#>>` (PG). Works with current table definitions. No native index support on individual attributes.
+
+2. **Native columns via dbtype**: Render the metagraph tables via `@alkdev/dbtype` element trees and make attributes native columns. Conflicts with the metagraph's dynamic schema model — attributes are runtime data, not static columns.
+
+3. **Hybrid**: dbtype renders the 6 static tables. Attributes remain JSON (dynamic schema requirement). CRUD for static tables could be auto-generated. Graph-specific queries use JSON path. Virtual columns for frequently queried attributes as a later optimization.
+
+Separately, the CRUD operations for the 6 metagraph tables (insert graph type, find node by key, etc.) could be hand-written, auto-generated from Drizzle schemas (drizzle-graphql pattern), or auto-generated from dbtype element trees (the `from-dbtype` adapter pattern).
+
+## Decision
+
+For v1:
+
+1. **Attribute queries use JSON path extraction** (`json_extract` on SQLite, `->>`/`#>>` on PG). This preserves the metagraph's dynamic schema model. Native column indexes on individual attributes are not available in v1.
+
+2. **Static table CRUD operations are hand-written** with explicit function signatures (`findNode(graphId, key)`, `insertNodeType(...)`, etc.). No auto-generation from Drizzle or dbtype.
+
+3. **dbtype integration is deferred** (per ADR-018). The hybrid approach remains viable for a future iteration but is not the v1 path.
+
+4. **Virtual/computed columns for frequently queried attributes** are a post-v1 optimization, not a v1 design concern.
+
+The repository layer will have two parts:
+- **Static table CRUD**: Insert, find, update, delete for the 6 metagraph tables (graph_types, node_types, edge_types, graphs, nodes, edges) and the actors table.
+- **Graph data queries**: JSON path queries against node/edge attributes, validated by the Module schema at the application layer.
+
+## Consequences
+
+- v1 repository API uses JSON path for attribute queries — no native SQL indexes on attributes
+- CRUD function signatures are known and explicit — no generated code surface to learn
+- PG can add GIN indexes on `jsonb` columns for containment (`@>`) queries, but these do not accelerate the `->>` / `#>>` comparisons used for key-value lookups
+- The hand-written CRUD path doesn't block any future auto-generation approach (dbtype `from-dbtype`, drizzle-graphql pattern, or a `from-storage` adapter in `@alkdev/operations`)
+- The metagraph's dynamic schema model is preserved — attributes are always JSON, not static columns
+
+## References
+
+- [forward-look.md](../forward-look.md) — Repository Layer Strategy section (full analysis)
+- [overview.md](../overview.md) — Repository Layer Bridging Pattern
+- [sqlite-host.md](../sqlite-host.md) — JSON text for schema columns (ADR-019)
+- ADR-018: dbtype integration is post-v1
+- ADR-005: Drizzle + TypeBox via drizzlebox
\ No newline at end of file
diff --git a/docs/architecture/forward-look.md b/docs/architecture/forward-look.md
index 40ce209..514bc5f 100644
--- a/docs/architecture/forward-look.md
+++ b/docs/architecture/forward-look.md
@@ -1,5 +1,5 @@
 ---
-status: draft
+status: reviewed
 last_updated: 2026-05-30
 ---
 
@@ -225,6 +225,129 @@ The Module-based graph type definitions (this spec) are the **first concrete
 step** in this pipeline. Everything else builds on having a `Type.Module` as
 the schema source of truth.
 
+## Repository Layer Strategy
+
+The repository layer (typed CRUD for the 6 metagraph tables + queries for graph data)
+is the next major feature to implement. The question of *how* it queries attributes
+connects to broader ecosystem decisions about dbtype and operations.
+
+### Three Approaches
+
+#### A. JSON Path Queries (Near-Term)
+
+The repository layer maps filter criteria to JSON path extraction:
+
+```ts
+findNodes({ graphId, attributes: { status: "active" } })
+// SQLite: json_extract(attributes, '$.status') = 'active'
+// PG:     attributes ->> 'status' = 'active'
+```
+
+- Works with current table definitions (no schema changes)
+- SQLite `json_extract()` and PG `->>` / `#>>` operators handle JSON path
+- No native index support on individual JSON attributes
+- PG can add GIN indexes on `jsonb` columns for containment (`@>`) queries, but
+  these do not accelerate the `->>` / `#>>` comparisons used for key-value lookups
+- Simple, immediate, no new infrastructure
+
+This is the pragmatic v1 approach. The metagraph pattern *requires* JSON attributes
+because node types are dynamic schemas (defined at runtime, stored in
+`node_types.schema`), not static columns known at database definition time.
+
+#### B. Native Columns via dbtype (Long-Term, Speculative)
+
+If storage migrates to dbtype element trees for table definitions, the 6 static
+metagraph tables (graph_types, node_types, edge_types, graphs, nodes, edges) could
+be rendered via the dbtype pipeline: element tree → HostConfig → Drizzle tables.
+This would eliminate the manual duplication between `sqlite/` and future `pg/`.
+
+However, dbtype does NOT solve the attribute indexing problem:
+
+- The metagraph's `attributes` column MUST remain JSON because the shape is defined
+  by runtime schemas (node type definitions), not by static column definitions
+- dbtype generates static table schemas; it does not handle dynamic schema-as-data
+  patterns like the metagraph
+- A "call" node's attributes (`requestId`, `status`, `duration`) are not columns
+  on the `nodes` table — they're values in the `attributes` JSON column, validated
+  by the corresponding node type's TypeBox schema
+
+#### C. Hybrid: Static Tables via dbtype, Dynamic Attributes Remain JSON
+
+The hybrid approach preserves the metagraph's dynamic schema model while leveraging
+dbtype for the static table scaffolding:
+
+1. **Static tables**: dbtype renders the 6 metagraph tables to Drizzle dialects.
+   This eliminates the SQLite/PG manual duplication for table *structure*.
+   The `attributes` column remains JSON in both dialects (`text` in SQLite, `jsonb` in PG).
+
+2. **Dynamic attributes**: Remain JSON. The Module-based node type schemas validate
+   data at the application layer, not the database layer. This is by design
+   (ADR-003, ADR-014).
+
+3. **Virtual columns / computed columns**: A post-v1 optimization, not a v1 concern.
+   Frequently queried attributes could be extracted to indexed columns as a
+   performance optimization. For example, if `nodes.attributes.status` is a common
+   filter, a computed column or trigger could copy it to `nodes.status_column` with
+   an index. This would be a denormalization trade-off (triggers, migration
+   complexity, dual-write responsibility) and is not designed or planned for v1.
+
+4. **Repository CRUD**: The static table CRUD operations (insert graph type, find
+   node by key) could be auto-generated, following the drizzle-graphql pattern or
+   the dbtype `from-dbtype` adapter. Graph-specific attribute queries remain JSON path.
+
+### Implications for Each Approach
+
+| Concern | Path A (JSON) | Path B (Native) | Path C (Hybrid) |
+|---------|---------------|-----------------|------------------|
+| Works today | ✅ | ❌ (requires dbtype) | ❌ (requires dbtype) |
+| Preserves metagraph pattern | ✅ | ❌ (conflicts with dynamic schemas) | ✅ |
+| Eliminates SQLite/PG duplication | ❌ | ✅ | ✅ |
+| Indexes on attributes | GIN on PG only | ✅ full native | GIN + virtual columns |
+| Repository generation | Hand-write CRUD | Auto-gen from dbtype | Auto-gen for static, JSON path for dynamic |
+| Dependency on dbtype | None | Full | Partial (static tables only) |
+
+### Connection to drizzle-graphql
+
+The overview references drizzle-graphql as a pattern for auto-generating a CRUD/query
+surface. The dbtype `from-dbtype` adapter is the @alkdev equivalent: it consumes
+element trees + Type.Module bundles and produces `OperationSpec[]` for the
+operations registry.
+
+The parallel:
+
+| Concern | drizzle-graphql | dbtype from-dbtype |
+|---------|----------------|-------------------|
+| Input | Drizzle schema (tables + relations) | UJSX element tree + Type.Module |
+| Output | GraphQL schema (queries + mutations) | `OperationSpec[]` (CRUD operations) |
+| Dialects | SQLite, PG, MySQL | SQLite, PG, MySQL (via HostConfig) |
+| Table model | Static columns only | Static columns only |
+| Dynamic data (JSON attrs) | Not handled | Not handled |
+
+Neither drizzle-graphql nor dbtype's `from-dbtype` handles dynamic schema-as-data
+patterns. The metagraph's JSON attributes require their own query layer, regardless
+of whether the static tables are auto-generated. This means the repository layer
+for `@alkdev/storage` will always have two parts:
+
+1. **Static table CRUD** — hand-written, or auto-generated (for example by dbtype)
+2. **Graph data queries** — JSON path queries against the `attributes` column,
+   validated by the Module schema at the application layer
+
+### v1 Decision
+
+For v1, the practical path is **A (JSON path queries) with hand-written CRUD**. This
+decision is recorded as [ADR-033](./decisions/033-json-path-queries-for-v1.md). The
+hybrid approach (C) remains viable for a future iteration when dbtype reaches
+implementation, and it doesn't require any changes to the metagraph data model —
+only to how the static table definitions are generated. See OQ-17, OQ-18, OQ-19
+in [open-questions.md](./open-questions.md) for the specific long-term questions
+that remain open beyond v1.
+
+### Decisions Required
+
+- **OQ-17**: JSON path vs native columns vs hybrid for attribute queries (resolved for v1 — see ADR-033)
+- **OQ-18**: Auto-generated vs hand-written CRUD for static tables (resolved for v1 — see ADR-033)
+- **OQ-19**: Where the storage-operations bridge package should live (open)
+
 ## Constraints on Current Design
 
 The forward-looking patterns documented here constrain the Module evolution
@@ -263,7 +386,11 @@ design in [metagraph-module.md](./metagraph-module.md):
 - dbtype architecture: `/workspace/@alkdev/dbtype/docs/architecture/README.md`
 - dbtype elements: `/workspace/@alkdev/dbtype/docs/architecture/elements.md`
 - dbtype module: `/workspace/@alkdev/dbtype/docs/architecture/module.md`
+- dbtype repo adapter: `/workspace/@alkdev/dbtype/docs/architecture/repo-adapter.md`
+- drizzle-graphql (reference for CRUD generation pattern): `/workspace/drizzle-graphql/`
+- Operations registry: `/workspace/@alkdev/operations/docs/architecture/README.md`
 - JPATH Module (JSONPath as TypeBox Module): `/workspace/research/typebox_research/ujsx/jpath.gen.ts`
 - jsonpathly source: `/workspace/jsonpathly/`
 - Module evolution spec: [metagraph-module.md](./metagraph-module.md)
-- Schema evolution spec: [schema-evolution.md](./schema-evolution.md)
\ No newline at end of file
+- Schema evolution spec: [schema-evolution.md](./schema-evolution.md)
+- ADR-033: JSON path queries and hand-written CRUD for v1
\ No newline at end of file
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
index ea17fed..5e83d93 100644
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -13,9 +13,9 @@ When a question is resolved, update its status to `resolved` and add a resolutio
 
 | Status | Count |
 |--------|-------|
-| Open | 7 |
+| Open | 8 |
 | Partially resolved | 1 |
-| Resolved | 8 |
+| Resolved | 10 |
 
 **Open questions requiring decisions:**
 - **OQ-03** (actors table design) — deferred to ACL design
@@ -25,10 +25,15 @@ When a question is resolved, update its status to `resolved` and add a resolutio
 - **OQ-11** (auto-migrate vs explicit consumer action) — conditional on OQ-10
 - **OQ-12** (schema evolution vs event-sourced replay) — post-v1 concern
 - **OQ-13** (schema evolution events in event stream) — post-v1
+- **OQ-19** (storage-operations bridge package location) — depends on long-term CRUD strategy
 
 **Partially resolved:**
 - **OQ-01** (flowgraph Module export) — storage can start without it
 
+**Resolved (v1 direction decided, long-term question remains open):**
+- **OQ-17** (attribute query strategy) — JSON path for v1 (ADR-033), hybrid viable with dbtype later
+- **OQ-18** (auto-generated vs hand-written CRUD) — hand-write for v1 (ADR-033), auto-gen remains an option
+
 ## How to Use This Document
 
 - Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority**
@@ -41,7 +46,9 @@ When a question is resolved, update its status to `resolved` and add a resolutio
 |-----|----------|
 | ADR-003 | OQ-01 (partial — storage can start without flowgraph Module) |
 | ADR-015 | OQ-05 (constraint semantics) |
+| ADR-018 | OQ-17 (v1 decision: dbtype integration deferred, JSON path for v1) |
 | ADR-020 | OQ-02 (no nodeTypeId for now, can add later) |
+| ADR-033 | OQ-17 (JSON path queries for v1), OQ-18 (hand-written CRUD for v1) |
 
 ## Theme 1: Package Boundaries and Dependencies
 
@@ -171,4 +178,30 @@ When a question is resolved, update its status to `resolved` and add a resolutio
 - **Origin**: [overview.md](overview.md)
 - **Status**: resolved
 - **Priority**: high
-- **Resolution**: The repository CRUD layer (host-specific typed queries, schema validation before writes) belongs in `@alkdev/storage`. The operations bridging layer (generating `OperationSpec`s from metagraph schemas) belongs in a consumer or adapter package. These are separate concerns — CRUD is a storage concern; call protocol integration is an application concern.
\ No newline at end of file
+- **Resolution**: The repository CRUD layer (host-specific typed queries, schema validation before writes) belongs in `@alkdev/storage`. The operations bridging layer (generating `OperationSpec`s from metagraph schemas) belongs in a consumer or adapter package. These are separate concerns — CRUD is a storage concern; call protocol integration is an application concern.
+
+## Theme 7: Repository Layer Strategy
+
+### OQ-17: How should the repository layer handle attribute queries — JSON path, native columns, or dbtype-generated?
+
+- **Origin**: [forward-look.md](forward-look.md)
+- **Status**: resolved (v1)
+- **Priority**: high
+- **Resolution**: For v1, attribute queries use JSON path extraction (`json_extract` on SQLite, `->>`/`#>>` on PG). Hand-written CRUD for static tables. dbtype integration and hybrid approach are post-v1. See ADR-033. The long-term question of whether to adopt the hybrid approach (static tables via dbtype, dynamic attributes remain JSON) remains open for future iterations.
+- **Cross-references**: ADR-033, ADR-018, [forward-look.md](forward-look.md)
+
+### OQ-18: Should the repository layer's CRUD operations be auto-generated (drizzle-graphql pattern) or hand-written?
+
+- **Origin**: [forward-look.md](forward-look.md)
+- **Status**: resolved (v1)
+- **Priority**: medium
+- **Resolution**: For v1, hand-write CRUD functions with explicit signatures. The three long-term options (hand-written, auto-generated from Drizzle, auto-generated from dbtype) remain open for future iterations. See ADR-033.
+- **Cross-references**: ADR-033, OQ-17
+
+### OQ-19: Where does the storage-operations bridge package live in the @alkdev workspace?
+
+- **Origin**: [forward-look.md](forward-look.md)
+- **Status**: open
+- **Priority**: medium
+- **Notes**: Four options: (1) hub-internal code, (2) dedicated `@alkdev/storage-operations` adapter, (3) `from-storage` adapter inside `@alkdev/operations`, (4) part of `@alkdev/dbtype`'s `from-dbtype` adapter. Option 1 is the most immediate (no new package). Option 2 is the cleanest separation. Option 3 creates an undesirable dependency direction (operations → storage). Option 4 is the long-term goal if dbtype is adopted. The choice depends on OQ-17/OQ-18 resolution: if hand-written CRUD, the bridge is trivial and can live in the hub; if auto-generated from dbtype, the bridge naturally lives with dbtype.
+- **Cross-references**: OQ-16, OQ-17, ADR-033
\ No newline at end of file
diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index bc0c528..c2b6c5d 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -90,6 +90,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [006](decisions/006-enum-pattern-as-const-objects.md) | `as const` objects, not TypeScript enums | Avoids JSR slow-types; consistent pattern across codebase |
 | [007](decisions/007-no-comments-in-code.md) | No comments in code | Documentation lives in architecture docs and TypeBox descriptions |
 | [008](decisions/008-common-columns-pattern.md) | Common columns pattern | `id`, `metadata`, `createdAt`, `updatedAt` on every table |
+| [033](decisions/033-json-path-queries-for-v1.md) | JSON path queries and hand-written CRUD for v1 | Attribute queries use JSON path; CRUD is hand-written; dbtype and auto-generation are post-v1 |
 
 ## Dependencies
 
@@ -152,6 +153,11 @@ and taskgraph) as part of its architecture.
 @alkdev/taskgraph       ← task dependency graph schema, cost-benefit analysis
                         (depends on: @alkdev/typebox)
 
+@alkdev/dbtype          ← schema-first multi-dialect DB type system (Phase 0, not yet implemented)
+                        (depends on: @alkdev/typebox, @alkdev/ujsx)
+                        Renders UJSX element trees to Drizzle dialects; future: from-dbtype
+                        adapter generates CRUD OperationSpecs for @alkdev/operations
+
 @alkdev/storage         ← YOU ARE HERE — typed graph persistence
                         (depends on: @alkdev/typebox, @alkdev/drizzlebox)
 
@@ -177,6 +183,8 @@ define the semantics itself.
 | Call graph schema (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`) | `@alkdev/flowgraph` | Storage persists these in-memory shapes to the database |
 | Task graph schema (`TaskGraphNodeAttributes`, `DependencyEdge`) | `@alkdev/taskgraph` | Storage persists task dependency shapes |
 | Event transport (`TypedEventTarget`, `EventEnvelope`) | `@alkdev/pubsub` | Storage is not involved in event routing; it stores the events' outcomes |
+| Database schema rendering (`<table>`, `<column>`, HostConfig) | `@alkdev/dbtype` | Storage's static metagraph tables could be dbtype-rendered in the future (OQ-17, OQ-18) |
+| Universal IR (`h()`, `createComponent`, `createRoot`) | `@alkdev/ujsx` | Storage's `Type.Module` format is structurally compatible with ujsx rendering; no runtime dependency |
 
 ### Repository Layer Bridging Pattern (Consumer-Side Concern)
 
@@ -190,7 +198,8 @@ because:
 1. `@alkdev/operations` already maps closely to GraphQL's
    queries/mutations/subscriptions (it was modeled after that pattern)
 2. `@alkdev/pubsub` provides the subscription transport (forked from
-   graphql-yoga's pubsub with additions)
+   graphql-yoga's pubsub with additions such as in-memory, Redis, WebSocket,
+   and WebWorker event targets)
 3. `@alkdev/storage`'s metagraph tables are the data source, analogous to
    Drizzle tables for drizzle-graphql
 
@@ -204,6 +213,32 @@ to avoid circular dependencies:
 Consumer (hub / adapter) → imports both, generates operations from schemas
 ```
 
+#### Ecosystem Context
+
+The question of *where* this bridge lives and *how* it's generated connects to
+the broader ecosystem:
+
+- **drizzle-graphql** (`/workspace/drizzle-graphql`): Auto-generates GraphQL
+  CRUD from Drizzle tables. The reference pattern for "database schema → API
+  surface." Produces `{ schema, entities }` from `buildSchema(db)`. No TypeBox,
+  no metagraph.
+
+- **@alkdev/dbtype**: Schema-first multi-dialect system using ujsx element trees.
+  Defines `<table>`, `<column>` elements rendered to Drizzle via HostConfig. Has
+  a designed `from-dbtype` adapter that generates `OperationSpec[]` from element
+  trees + Type.Module bundles. Phase 0 (architecture only, no implementation).
+
+- **@alkdev/operations**: Runtime-agnostic typed operations registry with
+  adapters (`FromOpenAPI`, `from_mcp`, `from_typemap`) that generate
+  `OperationSpec[]` from external specifications. The `from-dbtype` adapter would
+  be another adapter in the same pattern.
+
+The strategic question (OQ-17, OQ-18) is whether storage's repository CRUD
+operations should be hand-written, auto-generated from Drizzle schemas, or
+auto-generated from dbtype element trees once dbtype is implemented. For v1,
+hand-written CRUD is the simplest path and doesn't block any long-term option.
+See [forward-look.md](forward-look.md) for the full analysis.
+
 ### Avoiding Circular Dependencies
 
 Neither `@alkdev/storage` nor `@alkdev/operations` should depend on each
@@ -229,6 +264,9 @@ questions affecting this package:
 - **OQ-14**: Should encryption be per-attribute, per-node, or per-graph? (resolved: per-attribute)
 - **OQ-15**: Should key management be in this package? (resolved: no, application provides key ring)
 - **OQ-16**: Should the repository layer live in storage or a consumer package? (resolved: CRUD in storage, operations bridging in consumer)
+- **OQ-17**: How should the repository layer handle attribute queries — JSON path, native columns, or dbtype-generated? (resolved for v1: JSON path, see ADR-033)
+- **OQ-18**: Should CRUD operations be auto-generated or hand-written? (resolved for v1: hand-written, see ADR-033)
+- **OQ-19**: Where does the storage-operations bridge package live? (open, depends on OQ-17/OQ-18)
 
 ## References