From a2ee452a639f8f63bff6c85ee8e812059dd17269 Mon Sep 17 00:00:00 2001 From: "glm-5.1" Date: Sat, 30 May 2026 11:02:49 +0000 Subject: [PATCH] Add repository layer strategy: JSON path queries, CRUD decisions, ecosystem integration Add three open questions (OQ-17, OQ-18, OQ-19) covering attribute query strategy, CRUD generation approach, and storage-operations bridge placement. Create ADR-033 recording the v1 decision: JSON path queries for attributes with hand-written CRUD for static tables. Expand forward-look.md with Repository Layer Strategy section analyzing three approaches (JSON path, native columns via dbtype, hybrid) and their implications for the metagraph pattern. Add drizzle-graphql and dbtype from-dbtype comparison showing neither handles dynamic schema-as-data. Update overview.md with dbtype/ujsx in the dependency diagram, expanded ecosystem context in the bridging pattern section, and new open questions. Align open-questions.md: resolve OQ-17 and OQ-18 for v1 (ADR-033), add OQ-19 as open, update summary counts and ADR impact table. 
--- docs/architecture/README.md | 1 + .../decisions/033-json-path-queries-for-v1.md | 53 +++++++ docs/architecture/forward-look.md | 131 +++++++++++++++++- docs/architecture/open-questions.md | 39 +++++- docs/architecture/overview.md | 40 +++++- 5 files changed, 258 insertions(+), 6 deletions(-) create mode 100644 docs/architecture/decisions/033-json-path-queries-for-v1.md diff --git a/docs/architecture/README.md b/docs/architecture/README.md index f2f821c..a444527 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -58,6 +58,7 @@ Storage has Phase 1-3 of the metagraph implementation complete: Metagraph Module | [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | Accepted | | [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Accepted | | [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | Accepted | +| [033](decisions/033-json-path-queries-for-v1.md) | JSON path queries and hand-written CRUD for v1 | Accepted | ### Open Questions diff --git a/docs/architecture/decisions/033-json-path-queries-for-v1.md b/docs/architecture/decisions/033-json-path-queries-for-v1.md new file mode 100644 index 0000000..11fce66 --- /dev/null +++ b/docs/architecture/decisions/033-json-path-queries-for-v1.md @@ -0,0 +1,53 @@ +# ADR-033: JSON path queries and hand-written CRUD for v1 repository layer + +## Status + +Accepted + +## Context + +The repository layer is the next major feature for `@alkdev/storage`. It needs to provide typed CRUD operations for the 6 metagraph tables and query capability for node/edge attributes stored as JSON columns. + +The metagraph pattern stores node and edge attributes as JSON (`attributes text not null default '{}'` with JSON mode in SQLite, `jsonb` in PG). 
This is fundamental to the design — node types are dynamic schemas defined at runtime and stored in `node_types.schema`, not static columns known at database definition time. + +Three approaches exist for querying attributes: + +1. **JSON path queries**: Map filter criteria to `json_extract()` (SQLite) or `->>` / `#>>` (PG). Works with current table definitions. No native index support on individual attributes. + +2. **Native columns via dbtype**: Render the metagraph tables via `@alkdev/dbtype` element trees and make attributes native columns. Conflicts with the metagraph's dynamic schema model — attributes are runtime data, not static columns. + +3. **Hybrid**: dbtype renders the 6 static tables. Attributes remain JSON (dynamic schema requirement). CRUD for static tables could be auto-generated. Graph-specific queries use JSON path. Virtual columns for frequently queried attributes as a later optimization. + +Separately, the CRUD operations for the 6 metagraph tables (insert graph type, find node by key, etc.) could be hand-written, auto-generated from Drizzle schemas (drizzle-graphql pattern), or auto-generated from dbtype element trees (the `from-dbtype` adapter pattern). + +## Decision + +For v1: + +1. **Attribute queries use JSON path extraction** (`json_extract` on SQLite, `->>`/`#>>` on PG). This preserves the metagraph's dynamic schema model. Native column indexes on individual attributes are not available in v1. + +2. **Static table CRUD operations are hand-written** with explicit function signatures (`findNode(graphId, key)`, `insertNodeType(...)`, etc.). No auto-generation from Drizzle or dbtype. + +3. **dbtype integration is deferred** (per ADR-018). The hybrid approach remains viable for a future iteration but is not the v1 path. + +4. **Virtual/computed columns for frequently queried attributes** are a post-v1 optimization, not a v1 design concern. 
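The attribute-query half of this decision can be sketched as a small filter compiler. This is a minimal sketch, not the package's implementation: `buildAttributeFilter`, the `Dialect` type, and the flat equality-only filter shape are assumptions made for illustration. Only `json_extract`, `->>`, and the `attributes` column come from the ADR text.

```typescript
// Hypothetical sketch: every name below is an illustrative assumption,
// not the @alkdev/storage API.
type Dialect = "sqlite" | "pg";
type AttributeFilter = Record<string, string | number>;

interface SqlFragment {
  sql: string;
  params: unknown[];
}

// Keys are embedded in the SQL text (JSON path / quoted key), so only plain
// identifiers are accepted. Values always travel as bound parameters.
const SAFE_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildAttributeFilter(dialect: Dialect, filter: AttributeFilter): SqlFragment {
  const clauses: string[] = [];
  const params: unknown[] = [];
  for (const [key, value] of Object.entries(filter)) {
    if (!SAFE_KEY.test(key)) {
      throw new Error(`unsupported attribute key: ${key}`);
    }
    if (dialect === "sqlite") {
      // json_extract returns a SQL value typed after the JSON value.
      clauses.push(`json_extract(attributes, '$.${key}') = ?`);
      params.push(value);
    } else {
      // `->>` returns text, so the parameter is compared as a string.
      params.push(String(value));
      clauses.push(`attributes ->> '${key}' = $${params.length}`);
    }
  }
  // An empty filter matches every row.
  return { sql: clauses.length > 0 ? clauses.join(" AND ") : "1 = 1", params };
}
```

A call such as `buildAttributeFilter("sqlite", { status: "active" })` yields the same predicate shape as the `findNodes` example in forward-look.md. Validating the filter against the node type's schema would still happen at the application layer before this step.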
+ +The repository layer will have two parts: +- **Static table CRUD**: Insert, find, update, delete for graph_types, node_types, edge_types, graphs, nodes, edges, actors. +- **Graph data queries**: JSON path queries against node/edge attributes, validated by the Module schema at the application layer. + +## Consequences + +- v1 repository API uses JSON path for attribute queries — no native SQL indexes on attributes +- CRUD function signatures are known and explicit — no generated code surface to learn +- PG can add GIN indexes on `jsonb` columns for containment queries, but not for arbitrary key-value lookups +- The hand-written CRUD path doesn't block any future auto-generation approach (dbtype `from-dbtype`, drizzle-graphql pattern, or a `from-storage` adapter in `@alkdev/operations`) +- The metagraph's dynamic schema model is preserved — attributes are always JSON, not static columns + +## References + +- [forward-look.md](../forward-look.md) — Repository Layer Strategy section (full analysis) +- [overview.md](../overview.md) — Repository Layer Bridging Pattern +- [sqlite-host.md](../sqlite-host.md) — JSON text for schema columns (ADR-019) +- ADR-018: dbtype integration is post-v1 +- ADR-005: Drizzle + TypeBox via drizzlebox \ No newline at end of file diff --git a/docs/architecture/forward-look.md b/docs/architecture/forward-look.md index 40ce209..514bc5f 100644 --- a/docs/architecture/forward-look.md +++ b/docs/architecture/forward-look.md @@ -1,5 +1,5 @@ --- -status: draft +status: reviewed last_updated: 2026-05-30 --- @@ -225,6 +225,129 @@ The Module-based graph type definitions (this spec) are the **first concrete step** in this pipeline. Everything else builds on having a `Type.Module` as the schema source of truth. +## Repository Layer Strategy + +The repository layer (typed CRUD for the 6 metagraph tables + queries for graph data) +is the next major feature to implement. 
The question of *how* it queries attributes +connects to broader ecosystem decisions about dbtype and operations. + +### Three Approaches + +#### A. JSON Path Queries (Near-Term) + +The repository layer maps filter criteria to JSON path extraction: + +```ts +findNodes({ graphId, attributes: { status: "active" } }) +// SQLite: json_extract(attributes, '$.status') = 'active' +// PG: attributes ->> 'status' = 'active' +``` + +- Works with current table definitions (no schema changes) +- SQLite `json_extract()` and PG `->>` / `#>>` operators handle JSON path +- No native index support on individual JSON attributes +- PG can add GIN indexes on `jsonb` columns for containment queries, but not for + arbitrary key-value lookups +- Simple, immediate, no new infrastructure + +This is the pragmatic v1 approach. The metagraph pattern *requires* JSON attributes +because node types are dynamic schemas (defined at runtime, stored in +`node_types.schema`), not static columns known at database definition time. + +#### B. Native Columns via dbtype (Long-Term, Speculative) + +If storage migrates to dbtype element trees for table definitions, the 6 static +metagraph tables (graph_types, node_types, edge_types, graphs, nodes, edges) could +be rendered via the dbtype pipeline: element tree → HostConfig → Drizzle tables. +This would eliminate the manual duplication between `sqlite/` and future `pg/`. + +However, dbtype does NOT solve the attribute indexing problem: + +- The metagraph's `attributes` column MUST remain JSON because the shape is defined + by runtime schemas (node type definitions), not by static column definitions +- dbtype generates static table schemas; it does not handle dynamic schema-as-data + patterns like the metagraph +- A "call" node's attributes (`requestId`, `status`, `duration`) are not columns + on the `nodes` table — they're values in the `attributes` JSON column, validated + by the corresponding node type's TypeBox schema + +#### C. 
Hybrid: Static Tables via dbtype, Dynamic Attributes Remain JSON + +The hybrid approach preserves the metagraph's dynamic schema model while leveraging +dbtype for the static table scaffolding: + +1. **Static tables**: dbtype renders the 6 metagraph tables to Drizzle dialects. + This eliminates the SQLite/PG manual duplication for table *structure*. + The `attributes` column is still `text/jsonb` across both dialects. + +2. **Dynamic attributes**: Remain JSON. The Module-based node type schemas validate + data at the application layer, not the database layer. This is by design + (ADR-003, ADR-014). + +3. **Virtual columns / computed columns**: A post-v1 optimization, not a v1 concern. + Frequently queried attributes could be extracted to indexed columns as a + performance optimization. For example, if `nodes.attributes.status` is a common + filter, a computed column or trigger could copy it to `nodes.status_column` with + an index. This would be a denormalization trade-off (triggers, migration + complexity, dual-write responsibility) and is not designed or planned for v1. + +4. **Repository CRUD**: The static table CRUD operations (insert graph type, find + node by key) could be auto-generated like drizzle-graphql or the dbtype + `from-dbtype` adapter. Graph-specific attribute queries remain JSON path. 
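The virtual-column optimization in item 3 could look roughly like the following. This is a sketch under assumptions: `promoteAttributeDdl` and the index naming are hypothetical, the column name follows the `nodes.status_column` example above, and it uses generated columns rather than the trigger variant the text also mentions. SQLite (3.31+) accepts a `VIRTUAL` generated column in `ALTER TABLE ... ADD COLUMN`; PostgreSQL (12+) supports `STORED` generated columns.

```typescript
// Hypothetical sketch: emits DDL strings only, nothing is executed.
type Dialect = "sqlite" | "pg";

// Identifiers are interpolated into DDL, so restrict them to plain names.
const SAFE_IDENT = /^[a-z_][a-z0-9_]*$/;

function promoteAttributeDdl(dialect: Dialect, table: string, attribute: string): string[] {
  for (const ident of [table, attribute]) {
    if (!SAFE_IDENT.test(ident)) {
      throw new Error(`unsupported identifier: ${ident}`);
    }
  }
  const column = `${attribute}_column`;
  // Same extraction expressions as the JSON path queries in Approach A.
  const expr = dialect === "sqlite"
    ? `json_extract(attributes, '$.${attribute}')`
    : `attributes ->> '${attribute}'`;
  const kind = dialect === "sqlite" ? "VIRTUAL" : "STORED";
  return [
    `ALTER TABLE ${table} ADD COLUMN ${column} text GENERATED ALWAYS AS (${expr}) ${kind}`,
    `CREATE INDEX idx_${table}_${column} ON ${table} (${column})`,
  ];
}
```

Because the database derives the column from `attributes`, this variant avoids application-level dual writes. The migration cost per promoted attribute remains, which is consistent with treating it as a post-v1 concern.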
+ +### Implications for Each Approach + +| Concern | Path A (JSON) | Path B (Native) | Path C (Hybrid) | +|---------|---------------|-----------------|------------------| +| Works today | ✅ | ❌ (requires dbtype) | ❌ (requires dbtype) | +| Preserves metagraph pattern | ✅ | ❌ (conflicts with dynamic schemas) | ✅ | +| Eliminates SQLite/PG duplication | ❌ | ✅ | ✅ | +| Indexes on attributes | GIN on PG only | ✅ full native | GIN + virtual columns | +| Repository generation | Hand-write CRUD | Auto-gen from dbtype | Auto-gen for static, JSON path for dynamic | +| Dependency on dbtype | None | Full | Partial (static tables only) | + +### Connection to drizzle-graphql + +The overview references drizzle-graphql as a pattern for auto-generating a CRUD/query +surface. The dbtype `from-dbtype` adapter is the @alkdev equivalent: it consumes +element trees + Type.Module bundles and produces `OperationSpec[]` for the +operations registry. + +The parallel: + +| Concern | drizzle-graphql | dbtype from-dbtype | +|---------|----------------|-------------------| +| Input | Drizzle schema (tables + relations) | UJSX element tree + Type.Module | +| Output | GraphQL schema (queries + mutations) | `OperationSpec[]` (CRUD operations) | +| Dialects | SQLite, PG, MySQL | SQLite, PG, MySQL (via HostConfig) | +| Table model | Static columns only | Static columns only | +| Dynamic data (JSON attrs) | Not handled | Not handled | + +Neither drizzle-graphql nor dbtype's `from-dbtype` handles dynamic schema-as-data +patterns. The metagraph's JSON attributes require their own query layer, regardless +of whether the static tables are auto-generated. This means the repository layer +for `@alkdev/storage` will always have two parts: + +1. **Static table CRUD** — could be auto-generated (by dbtype) or hand-written +2. 
**Graph data queries** — JSON path queries against the `attributes` column, + validated by the Module schema at the application layer + +### v1 Decision + +For v1, the practical path is **A (JSON path queries) with hand-written CRUD**. This +decision is recorded as [ADR-033](./decisions/033-json-path-queries-for-v1.md). The +hybrid approach (C) remains viable for a future iteration when dbtype reaches +implementation, and it doesn't require any changes to the metagraph data model — +only to how the static table definitions are generated. See OQ-17, OQ-18, OQ-19 +in [open-questions.md](./open-questions.md) for the specific long-term questions +that remain open beyond v1. + +### Decisions Required + +- **OQ-17**: JSON path vs native columns vs hybrid for attribute queries (resolved for v1 — see ADR-033) +- **OQ-18**: Auto-generated vs hand-written CRUD for static tables (resolved for v1 — see ADR-033) +- **OQ-19**: Where the storage-operations bridge package should live (open) + ## Constraints on Current Design The forward-looking patterns documented here constrain the Module evolution @@ -263,7 +386,11 @@ design in [metagraph-module.md](./metagraph-module.md): - dbtype architecture: `/workspace/@alkdev/dbtype/docs/architecture/README.md` - dbtype elements: `/workspace/@alkdev/dbtype/docs/architecture/elements.md` - dbtype module: `/workspace/@alkdev/dbtype/docs/architecture/module.md` +- dbtype repo adapter: `/workspace/@alkdev/dbtype/docs/architecture/repo-adapter.md` +- drizzle-graphql (reference for CRUD generation pattern): `/workspace/drizzle-graphql/` +- Operations registry: `/workspace/@alkdev/operations/docs/architecture/README.md` - JPATH Module (JSONPath as TypeBox Module): `/workspace/research/typebox_research/ujsx/jpath.gen.ts` - jsonpathly source: `/workspace/jsonpathly/` - Module evolution spec: [metagraph-module.md](./metagraph-module.md) -- Schema evolution spec: [schema-evolution.md](./schema-evolution.md) \ No newline at end of file +- Schema 
evolution spec: [schema-evolution.md](./schema-evolution.md) +- ADR-033: JSON path queries and hand-written CRUD for v1 \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index ea17fed..5e83d93 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -13,9 +13,9 @@ When a question is resolved, update its status to `resolved` and add a resolutio | Status | Count | |--------|-------| -| Open | 7 | +| Open | 8 | | Partially resolved | 1 | -| Resolved | 8 | +| Resolved | 10 | **Open questions requiring decisions:** - **OQ-03** (actors table design) — deferred to ACL design @@ -25,10 +25,15 @@ When a question is resolved, update its status to `resolved` and add a resolutio - **OQ-11** (auto-migrate vs explicit consumer action) — conditional on OQ-10 - **OQ-12** (schema evolution vs event-sourced replay) — post-v1 concern - **OQ-13** (schema evolution events in event stream) — post-v1 +- **OQ-19** (storage-operations bridge package location) — depends on long-term CRUD strategy **Partially resolved:** - **OQ-01** (flowgraph Module export) — storage can start without it +**Resolved (v1 direction decided, long-term question remains open):** +- **OQ-17** (attribute query strategy) — JSON path for v1 (ADR-033), hybrid viable with dbtype later +- **OQ-18** (auto-generated vs hand-written CRUD) — hand-write for v1 (ADR-033), auto-gen remains an option + ## How to Use This Document - Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority** @@ -41,7 +46,9 @@ When a question is resolved, update its status to `resolved` and add a resolutio |-----|----------| | ADR-003 | OQ-01 (partial — storage can start without flowgraph Module) | | ADR-015 | OQ-05 (constraint semantics) | +| ADR-018 | OQ-17 (v1 decision: dbtype integration deferred, JSON path for v1) | | ADR-020 | OQ-02 (no nodeTypeId for now, can add later) | +| ADR-033 | OQ-17 (JSON 
path queries for v1), OQ-18 (hand-written CRUD for v1) | ## Theme 1: Package Boundaries and Dependencies @@ -171,4 +178,30 @@ When a question is resolved, update its status to `resolved` and add a resolutio - **Origin**: [overview.md](overview.md) - **Status**: resolved - **Priority**: high -- **Resolution**: The repository CRUD layer (host-specific typed queries, schema validation before writes) belongs in `@alkdev/storage`. The operations bridging layer (generating `OperationSpec`s from metagraph schemas) belongs in a consumer or adapter package. These are separate concerns — CRUD is a storage concern; call protocol integration is an application concern. \ No newline at end of file +- **Resolution**: The repository CRUD layer (host-specific typed queries, schema validation before writes) belongs in `@alkdev/storage`. The operations bridging layer (generating `OperationSpec`s from metagraph schemas) belongs in a consumer or adapter package. These are separate concerns — CRUD is a storage concern; call protocol integration is an application concern. + +## Theme 7: Repository Layer Strategy + +### OQ-17: How should the repository layer handle attribute queries — JSON path, native columns, or dbtype-generated? + +- **Origin**: [forward-look.md](forward-look.md) +- **Status**: resolved (v1) +- **Priority**: high +- **Resolution**: For v1, attribute queries use JSON path extraction (`json_extract` on SQLite, `->>`/`#>>` on PG). Hand-written CRUD for static tables. dbtype integration and hybrid approach are post-v1. See ADR-033. The long-term question of whether to adopt the hybrid approach (static tables via dbtype, dynamic attributes remain JSON) remains open for future iterations. +- **Cross-references**: ADR-033, ADR-018, [forward-look.md](forward-look.md) + +### OQ-18: Should the repository layer's CRUD operations be auto-generated (drizzle-graphql pattern) or hand-written? 
+ +- **Origin**: [forward-look.md](forward-look.md) +- **Status**: resolved (v1) +- **Priority**: medium +- **Resolution**: For v1, hand-write CRUD functions with explicit signatures. The three long-term options (hand-written, auto-generated from Drizzle, auto-generated from dbtype) remain open for future iterations. See ADR-033. +- **Cross-references**: ADR-033, OQ-17 + +### OQ-19: Where does the storage-operations bridge package live in the @alkdev workspace? + +- **Origin**: [forward-look.md](forward-look.md) +- **Status**: open +- **Priority**: medium +- **Notes**: Four options: (1) hub-internal code, (2) dedicated `@alkdev/storage-operations` adapter, (3) `from-storage` adapter inside `@alkdev/operations`, (4) part of `@alkdev/dbtype`'s `from-dbtype` adapter. Option 1 is the most immediate (no new package). Option 2 is the cleanest separation. Option 3 creates an undesirable dependency direction (operations → storage). Option 4 is the long-term goal if dbtype is adopted. The choice depends on OQ-17/OQ-18 resolution: if hand-written CRUD, the bridge is trivial and can live in the hub; if auto-generated from dbtype, the bridge naturally lives with dbtype. +- **Cross-references**: OQ-16, OQ-17, ADR-033 \ No newline at end of file diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index bc0c528..c2b6c5d 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -90,6 +90,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/). 
| [006](decisions/006-enum-pattern-as-const-objects.md) | `as const` objects, not TypeScript enums | Avoids JSR slow-types; consistent pattern across codebase | | [007](decisions/007-no-comments-in-code.md) | No comments in code | Documentation lives in architecture docs and TypeBox descriptions | | [008](decisions/008-common-columns-pattern.md) | Common columns pattern | `id`, `metadata`, `createdAt`, `updatedAt` on every table | +| [033](decisions/033-json-path-queries-for-v1.md) | JSON path queries and hand-written CRUD for v1 | Attribute queries use JSON path; CRUD is hand-written; dbtype and auto-generation are post-v1 | ## Dependencies @@ -152,6 +153,11 @@ and taskgraph) as part of its architecture. @alkdev/taskgraph ← task dependency graph schema, cost-benefit analysis (depends on: @alkdev/typebox) +@alkdev/dbtype ← schema-first multi-dialect DB type system (Phase 0, not yet implemented) + (depends on: @alkdev/typebox, @alkdev/ujsx) + Renders UJSX element trees to Drizzle dialects; future: from-dbtype + adapter generates CRUD OperationSpecs for @alkdev/operations + @alkdev/storage ← YOU ARE HERE — typed graph persistence (depends on: @alkdev/typebox, @alkdev/drizzlebox) @@ -177,6 +183,8 @@ define the semantics itself. 
| Call graph schema (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`) | `@alkdev/flowgraph` | Storage persists these in-memory shapes to the database | | Task graph schema (`TaskGraphNodeAttributes`, `DependencyEdge`) | `@alkdev/taskgraph` | Storage persists task dependency shapes | | Event transport (`TypedEventTarget`, `EventEnvelope`) | `@alkdev/pubsub` | Storage is not involved in event routing; it stores the events' outcomes | +| Database schema rendering (``, ``, HostConfig) | `@alkdev/dbtype` | Storage's static metagraph tables could be dbtype-rendered in the future (OQ-17, OQ-18) | +| Universal IR (`h()`, `createComponent`, `createRoot`) | `@alkdev/ujsx` | Storage's `Type.Module` format is structurally compatible with ujsx rendering; no runtime dependency | ### Repository Layer Bridging Pattern (Consumer-Side Concern) @@ -190,7 +198,8 @@ because: 1. `@alkdev/operations` already maps closely to GraphQL's queries/mutations/subscriptions (it was modeled after that pattern) 2. `@alkdev/pubsub` provides the subscription transport (forked from - graphql-yoga's pubsub with additions) + graphql-yoga's pubsub with additions like in-memory, Redis, WebSocket, + WebWorker event targets) 3. `@alkdev/storage`'s metagraph tables are the data source, analogous to Drizzle tables for drizzle-graphql @@ -204,6 +213,32 @@ to avoid circular dependencies: Consumer (hub / adapter) → imports both, generates operations from schemas ``` +#### Ecosystem Context + +The question of *where* this bridge lives and *how* it's generated connects to +the broader ecosystem: + +- **drizzle-graphql** (`/workspace/drizzle-graphql`): Auto-generates GraphQL + CRUD from Drizzle tables. The reference pattern for "database schema → API + surface." Produces `{ schema, entities }` from `buildSchema(db)`. No TypeBox, + no metagraph. + +- **@alkdev/dbtype**: Schema-first multi-dialect system using ujsx element trees. + Defines `
`, `` elements rendered to Drizzle via HostConfig. Has + a designed `from-dbtype` adapter that generates `OperationSpec[]` from element + trees + Type.Module bundles. Phase 0 (architecture only, no implementation). + +- **@alkdev/operations**: Runtime-agnostic typed operations registry with + adapters (`FromOpenAPI`, `from_mcp`, `from_typemap`) that generate + `OperationSpec[]` from external specifications. The `from-dbtype` adapter would + be another adapter in the same pattern. + +The strategic question (OQ-17, OQ-18) is whether storage's repository CRUD +operations should be hand-written, auto-generated from Drizzle schemas, or +auto-generated from dbtype element trees once dbtype is implemented. For v1, +hand-written CRUD is the simplest path and doesn't block any long-term option. +See [forward-look.md](forward-look.md) for the full analysis. + ### Avoiding Circular Dependencies Neither `@alkdev/storage` nor `@alkdev/operations` should depend on each @@ -229,6 +264,9 @@ questions affecting this package: - **OQ-14**: Should encryption be per-attribute, per-node, or per-graph? (resolved: per-attribute) - **OQ-15**: Should key management be in this package? (resolved: no, application provides key ring) - **OQ-16**: Should the repository layer live in storage or a consumer package? (resolved: CRUD in storage, operations bridging in consumer) +- **OQ-17**: How should the repository layer handle attribute queries — JSON path, native columns, or dbtype-generated? (resolved for v1: JSON path queries, see ADR-033; long-term strategy remains open) +- **OQ-18**: Should CRUD operations be auto-generated or hand-written? (resolved for v1: hand-written, see ADR-033; auto-generation remains an option) +- **OQ-19**: Where does the storage-operations bridge package live? (open, depends on OQ-17/OQ-18) ## References