storage/docs/architecture/overview.md
glm-5.1 5ce93b1357 docs: add metagraph-module and forward-look architecture specs, remove SchemaBuilder legacy support
Graph type definitions as TypeBox Modules — the core architecture evolution
for @alkdev/storage. The SchemaBuilder is removed (no existing consumers),
replaced by direct TypeModule construction with Metagraph.Import() for
base attribute composition and Type.Composite() for node/edge type
specialization.

Key additions:
- metagraph-module.md: Module pattern, edge constraints as named entries,
  SchemaBuilder equivalence, DB bridge contracts (moduleToDbSchema return
  type, validateNode/validateEdge signatures), 10 design decisions (DD1-DD10)
- forward-look.md: pointer abstraction (ujsx ValuePointer analogy, JPATH
  Module), dbtype table rendering relationship, ujsx as universal IR pipeline

Critical corrections from architecture review:
- Type.Composite uses IntersectEvaluated (intersection, not Object.assign
  override) — overlapping keys with subtype relationships resolve correctly
- Type.Ref inside Type.Composite within a Module is verified working
- BaseNode/BaseEdge use Metagraph.Import() for same-package Modules (Option B),
  not local re-declaration (no circular dep within same package)
- Edge constraints use Type.String() for node type name arrays (not Type.Ref) —
  constraints contain names, not schemas
2026-05-28 15:32:56 +00:00


---
status: draft
last_updated: 2026-05-28
---
# @alkdev/storage — Overview
Typed graph storage with dual database hosts. Deno-first, published via JSR.
## Purpose
`@alkdev/storage` provides a **metagraph** storage model: graph types define
schemas, node types define data shapes within those graphs, and edge types
define typed relationships. Instances of these type definitions become actual
graphs populated with nodes and edges.
This pattern replaces domain-specific table proliferation with a small number of
general-purpose tables that can model anything — call graphs, ACL rules, task
dependencies, encrypted secrets — while enforcing schema integrity through
TypeBox validation.
The package evolved from `@ade/ade-v0/packages/core/graphs` and
`@ade/ade-v0/packages/storage_sqlite`, simplified and refactored for the @alkdev
ecosystem.
## Architecture
```
@alkdev/storage/
├── mod.ts                → re-exports graphs/ (zero db deps)
├── src/
│   ├── graphs/           → Metagraph Module, bridge functions (no db deps)
│   ├── sqlite/           → SQLite host (drizzle-orm/libsql)
│   │   ├── tables/       → drizzle table definitions
│   │   ├── relations.ts  → drizzle relational mappings
│   │   ├── schema.ts     → barrel re-export
│   │   └── client.ts     → injectable createSqliteDatabase()
│   └── pg/               → PostgreSQL host (NOT YET IMPLEMENTED)
└── test/                 → empty; tests not yet written
```
### Subpath Exports (JSR/npm)
| Export | Contents | Dependencies |
| ------------------------ | --------------------------------------- | --------------------------------------- |
| `@alkdev/storage` | Graph schema types, Metagraph Module | `@alkdev/typebox`, `@alkdev/drizzlebox` |
| `@alkdev/storage/graphs` | Same as `.` — alias for the main export | Same as `.` |
| `@alkdev/storage/sqlite` | SQLite tables, relations, client | + `drizzle-orm`, `@libsql/client` |
| `@alkdev/storage/pg` | PostgreSQL tables, relations, client | ⚠️ NOT YET IMPLEMENTED |
The `./graphs` subpath exists because the source code lives in `src/graphs/` and
the main `mod.ts` re-exports it. Importing from either `@alkdev/storage` or
`@alkdev/storage/graphs` yields the same types and the same Metagraph Module.
## Terminology
| Term | Definition |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Metagraph** | A type system where graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Graph instances are concrete data conforming to these type definitions. |
| **Hub** | The central service in the hub-spoke architecture. A consumer of `@alkdev/storage` — uses the PostgreSQL host for persistent graph storage. The hub also depends on `@alkdev/operations`, `@alkdev/pubsub`, `@alkdev/flowgraph`. |
| **Spoke** | A local/embedded instance that runs per-project or per-session. A consumer of `@alkdev/storage` — uses the SQLite host for local graph storage. |
| **Graph type** | A class of graphs (e.g., "call-graph", "acl"). Defines structural constraints (directed/undirected/mixed, multi-edges, self-loops) and the valid node/edge type vocabularies. Stored in the `graph_types` table. |
| **Node type** | A category of node within a graph type. Defines the attribute schema for nodes of that type. Stored in the `node_types` table. |
| **Edge type** | A category of edge within a graph type. Defines the attribute schema and optionally restricts which node types can be source/target. Stored in the `edge_types` table. |
| **Graph instance** | A concrete graph belonging to a graph type. Contains nodes and edges conforming to its type definitions. Stored in the `graphs` table. |
| **Consumer** | Code that imports `@alkdev/storage` (or a subpath) to define graph types and persist graph data. The hub, spokes, and other @alkdev packages are consumers. |
| **Repository layer** | ⚠️ Not yet implemented. The typed CRUD functions (insert, find, update, delete) that sit between consumer code and raw Drizzle queries. Performs schema validation before writes. No dependency on `@alkdev/operations` — the consumer wires CRUD into the registry. |
| **Validation boundary** | The line where schema validation is enforced. In this package, validation happens in the Metagraph Module (at type definition time) and the repository layer (at mutation time), NOT in the database. |
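
The mutation-time half of the validation boundary can be pictured as a write function that refuses to touch the database until the attributes pass a schema check. The sketch below is an assumption-laden illustration, not the package's API: `Validator`, `NodeInput`, `createNodeWriter`, and `accountValidator` are hypothetical names, and the actual `validateNode`/`validateEdge` signatures are specified in metagraph-module.md.

```typescript
// Hypothetical shapes; the package's validateNode/validateEdge contracts are
// specified in metagraph-module.md and may differ from this sketch.
type Validator = (attributes: unknown) => string[];

interface NodeInput {
  id: string;
  type: string;
  attributes: unknown;
}

// Wraps a raw write function so that validation always runs first.
function createNodeWriter(
  validators: ReadonlyMap<string, Validator>,
  write: (node: NodeInput) => void,
): (node: NodeInput) => void {
  return (node) => {
    const validate = validators.get(node.type);
    if (validate === undefined) {
      throw new Error(`Unknown node type: ${node.type}`);
    }
    const errors = validate(node.attributes);
    if (errors.length > 0) {
      throw new Error(`Invalid ${node.type} node: ${errors.join("; ")}`);
    }
    // Only reached when the attributes passed validation.
    write(node);
  };
}

// Stand-in for a schema check: an "account" node needs a string `name`.
const accountValidator: Validator = (attributes) => {
  const name = (attributes as { name?: unknown } | null)?.name;
  return typeof name === "string" ? [] : ["name must be a string"];
};

// An array stands in for the database table.
const stored: NodeInput[] = [];
const insertNode = createNodeWriter(
  new Map([["account", accountValidator]]),
  (node) => stored.push(node),
);
```

Because the check lives in the writer rather than in a database constraint, the same rule applies unchanged to any host that supplies a `write` function.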
## Design Decisions
### D1: Deno-first, JSR publishes, npm comes free
The package is published to JSR (`deno publish`). npm compatibility is automatic
via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
### D2: Metagraph over domain-specific tables
Instead of a table per domain concept (call graphs, ACL rules, task trees), we
define graph types with typed node and edge schemas. A "call graph" is a graph
type with specific node types (operation call, subcall) and edge types
(triggered, depends_on). An "ACL graph" is a graph type with node types
(account, resource) and edge types (can_read, can_write).
This trades some query convenience for generality. Domain-specific queries are
built on top of the graph query layer, not baked into table schemas.
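
To make the trade concrete, the two graph types named above can be pictured as plain data sharing one general shape. This is a rough sketch only: `GraphTypeDef`, `EdgeTypeDef`, `sourceTypes`, `targetTypes`, and `isEdgeAllowed` are hypothetical names, the endpoint restrictions shown are assumed for illustration, and the real definitions are TypeBox Modules rather than object literals.

```typescript
// Hypothetical shapes for illustration only; not the @alkdev/storage API.
interface EdgeTypeDef {
  // Node type names allowed at each end; omitted means unrestricted.
  sourceTypes?: readonly string[];
  targetTypes?: readonly string[];
}

interface GraphTypeDef {
  name: string;
  nodeTypes: readonly string[];
  edgeTypes: Readonly<Record<string, EdgeTypeDef>>;
}

// Endpoint restrictions below are assumptions, not taken from the package.
const callGraph: GraphTypeDef = {
  name: "call-graph",
  nodeTypes: ["operation_call", "subcall"],
  edgeTypes: {
    triggered: { sourceTypes: ["operation_call"], targetTypes: ["subcall"] },
    depends_on: {},
  },
};

const aclGraph: GraphTypeDef = {
  name: "acl",
  nodeTypes: ["account", "resource"],
  edgeTypes: {
    can_read: { sourceTypes: ["account"], targetTypes: ["resource"] },
    can_write: { sourceTypes: ["account"], targetTypes: ["resource"] },
  },
};

// Checks an edge against the graph type: the edge type and both node types
// must be declared, and any endpoint restriction must be satisfied.
function isEdgeAllowed(
  graph: GraphTypeDef,
  edgeType: string,
  sourceType: string,
  targetType: string,
): boolean {
  if (!Object.prototype.hasOwnProperty.call(graph.edgeTypes, edgeType)) {
    return false;
  }
  if (
    !graph.nodeTypes.includes(sourceType) ||
    !graph.nodeTypes.includes(targetType)
  ) {
    return false;
  }
  const def = graph.edgeTypes[edgeType];
  const sourceOk = def.sourceTypes === undefined ||
    def.sourceTypes.includes(sourceType);
  const targetOk = def.targetTypes === undefined ||
    def.targetTypes.includes(targetType);
  return sourceOk && targetOk;
}
```

Both domains fit the same structure, so one generic check covers them; a domain-specific query (for example, "which resources can this account read") would be written on top of that structure rather than against a dedicated table.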
### D3: Type.Module as the primary API surface
`Type.Module()` is the intended API for defining graph types. The `Metagraph`
Module provides base entries (`BaseNode`, `BaseEdge`, `Config`); concrete graph
types compose them via `Metagraph.Import()` and `Type.Composite()`. The
`SchemaBuilder` is removed.
This replaces the earlier fluent builder pattern. The Module format provides
native `Type.Ref()` for internal references, `Module.Import()` for cross-package
references, and JSON Schema `$defs` that map directly to DB storage.
### D4: Injectable clients, no module-level side effects
`createSqliteDatabase(client)` receives a pre-created client. Module-level side
effects (auto-connections, env-based configuration) are forbidden. This enables
testing with in-memory databases and containerized deployment patterns.
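
A simplified sketch of the injection pattern follows. It does not reproduce `createSqliteDatabase()`: the `SqlClient` and `Database` interfaces, the `createDatabase` name, and the SQL string are hypothetical, and the client is synchronous here only to keep the example short (a real libsql client is asynchronous).

```typescript
// Hypothetical minimal client surface, assumed for illustration.
interface SqlClient {
  execute(sql: string): string[];
}

interface Database {
  listGraphTypes(): string[];
}

// The factory only wraps the client it is given. Nothing is connected or
// read from the environment when this module is imported.
function createDatabase(client: SqlClient): Database {
  return {
    // The column name in this query is an assumption.
    listGraphTypes: () => client.execute("SELECT name FROM graph_types"),
  };
}

// A test can inject an in-memory stand-in instead of a real connection.
const executed: string[] = [];
const fakeClient: SqlClient = {
  execute(sql) {
    executed.push(sql);
    return ["call-graph", "acl"];
  },
};

const db = createDatabase(fakeClient);
const graphTypes = db.listGraphTypes();
```

The caller owns the client's lifecycle, which is what allows the same factory to serve a test with a stub and a deployment with a configured connection.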
### D5: Drizzle + TypeBox (via drizzlebox) as the table definition pattern
Drizzle table definitions are the single source of truth for database schema.
`@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from
Drizzle tables, enabling runtime validation without manual schema duplication.
### D6: Enumeration pattern — `as const` objects, not TypeScript enums
All enumerations use the `as const` object pattern (e.g.,
`GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript
`enum`. This avoids JSR slow-type issues and provides a consistent pattern
across the codebase. The TypeBox schemas use `Type.Union` of `Type.Literal`
values derived from the const object.
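
A minimal sketch of the pattern is shown below. Only the `Active` member appears in this document; `Archived`, the `GraphStatus` type name, and the `isGraphStatus` guard are assumptions added for illustration.

```typescript
// `Archived` is an assumed member; only `Active` is given in this document.
const GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
} as const;

// Union of the literal values: "active" | "archived".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// Runtime list of the same values, usable for validation.
const GRAPH_STATUS_VALUES: readonly GraphStatus[] = Object.values(GRAPH_STATUS);

// Type guard narrowing an unknown value to the union.
function isGraphStatus(value: unknown): value is GraphStatus {
  return typeof value === "string" &&
    (GRAPH_STATUS_VALUES as readonly string[]).includes(value);
}
```

The const object stays the single source for both the static union and the runtime value list, which is the same list a `Type.Union` of `Type.Literal` schema would be built from.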
### D7: No comments in code
Per project convention across @alkdev packages, source files contain no inline
comments. Documentation lives in architecture docs and TypeBox schema
descriptions.
### D8: Common columns pattern
All tables share `id` (text PK), `metadata` (JSON text defaulting to `{}`),
`createdAt`, and `updatedAt` (integer timestamps in SQLite, will be timestamptz
in PG). This ensures every row has auditability and extensibility.
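
The shared columns can be pictured as a row shape plus two small helpers. This is a sketch under assumptions: `CommonColumns`, `withCommonColumns`, and `touch` are hypothetical names, the timestamp unit is left to the caller, and in the package itself the defaults would be expected to live in the Drizzle table definitions rather than in helpers like these.

```typescript
// Hypothetical row shape mirroring the common columns of the SQLite host.
interface CommonColumns {
  id: string; // text primary key
  metadata: string; // JSON text, "{}" when nothing is supplied
  createdAt: number; // integer timestamp
  updatedAt: number; // integer timestamp
}

// Adds the common columns to table-specific fields at insert time.
function withCommonColumns<T extends object>(
  id: string,
  fields: T,
  now: number,
  metadata: Record<string, unknown> = {},
): T & CommonColumns {
  return {
    ...fields,
    id,
    metadata: JSON.stringify(metadata),
    createdAt: now,
    updatedAt: now,
  };
}

// Returns a copy with a new updatedAt; createdAt is left untouched.
function touch<T extends CommonColumns>(row: T, now: number): T {
  return { ...row, updatedAt: now };
}
```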
## Dependencies
| Package | Purpose | Layer |
| -------------------- | ------------------------------------ | ------------------------ |
| `@alkdev/typebox` | Runtime schema validation | graphs/ |
| `@alkdev/drizzlebox` | Generate TypeBox from Drizzle tables | sqlite/ |
| `drizzle-orm` | ORM, table definitions, queries | sqlite/ (and future pg/) |
| `@libsql/client` | SQLite client (libsql/turso) | sqlite/ |
| `postgres` | PostgreSQL client | pg/ (not yet used) |
`@alkdev/typebox` and `@alkdev/drizzlebox` are npm packages (not yet on JSR).
JSR handles npm dependencies natively.
**Ecosystem packages are not runtime dependencies of `@alkdev/storage`.** All
ecosystem references in this document describe consumer-side data shapes and
integration patterns, not import dependencies. The `@alkdev/operations`,
`@alkdev/pubsub`, `@alkdev/flowgraph`, and `@alkdev/taskgraph` packages are
consumed by the hub and spokes, not by storage itself.
## What Exists vs. What's Needed
### Implemented
- Graph schema types and Metagraph Module (replaces SchemaBuilder)
- SQLite host: 6 metagraph tables + actors table + Drizzle relations + client
factory
- TypeBox select/insert schemas generated from Drizzle tables (drizzlebox)
### Not Yet Implemented
| Gap | Priority | Notes |
| ----------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------- |
| Encrypted data node type + crypto utility | **Critical** | API keys and secrets at rest. See [encrypted-data.md](./encrypted-data.md). |
| Repository/CRUD layer | High | Typed insert, find, update, delete functions for graphs, nodes, edges. No dependency on `@alkdev/operations`; the consumer wires CRUD into the registry. |
| Tests | High | Zero tests exist. Needed before any real use. |
| PostgreSQL host | Medium | Same table shapes, `pgTable` + `jsonb` + `timestamp` + `pgEnum`. Stub only. |
| Call graph type | Medium | Informed by `@alkdev/flowgraph`'s `CallNodeAttrs`/`CallEdgeAttrs` schemas and `@alkdev/operations`' call protocol events. Not hub-specific — any consumer that tracks call invocations needs this. |
| ACL graph type | Medium | Access control as a graph. Informed by `@alkdev/operations`' `Identity` and `AccessControl`. Depends on encrypted data and CRUD layer. |
| Task graph type | Low | Informed by `@alkdev/taskgraph`'s `TaskGraphNodeAttributes` and `DependencyEdge` schemas. |
## Ecosystem Integration
`@alkdev/storage` is a **data layer package** consumed by other packages in the
@alkdev ecosystem. It does not depend on the hub — the dependency flows the
other way. The hub consumes storage (along with operations, pubsub, flowgraph,
and taskgraph) as part of its architecture.
### Dependency Direction
```
@alkdev/pubsub       ← transport only (no storage dependency)
@alkdev/operations   ← call protocol, registry, identity, access control
    ↑ (depends on: @alkdev/pubsub, @alkdev/typebox)
@alkdev/flowgraph    ← call graph schema, operation graph, workflow templates
    ↑ (depends on: @alkdev/operations [peer], @alkdev/typebox)
@alkdev/taskgraph    ← task dependency graph schema, cost-benefit analysis
      (depends on: @alkdev/typebox)
@alkdev/storage      ← YOU ARE HERE: typed graph persistence
      (depends on: @alkdev/typebox, @alkdev/drizzlebox)
    ↑                    ↑
    |                    |
Hub / Spoke          Any consumer that needs
(consumes all)       persistent graph storage
```
The key insight: `@alkdev/storage` provides the **persistence primitives**
(schemas, tables, repository layer). The **domain semantics** (what a call graph
means, what identity looks like, how access control works) are defined by the
packages above. Storage stores the shapes those packages define; it does not
define the semantics itself.
### What Comes from Where
| Concept | Source package | Storage's role |
|---------|---------------|----------------|
| Call protocol events (`call.requested`, `call.responded`, etc.) | `@alkdev/operations` | Storage persists the outcomes — graphs with `CallNodeAttrs` nodes |
| Identity (`id`, `scopes`, `resources`) | `@alkdev/operations` | Storage stores identity as node attributes; `Identity` is a data shape, not a storage concept |
| Access control (`AccessControl`, `requiredScopes`) | `@alkdev/operations` | Storage's ACL graph type mirrors the operations `AccessControl` schema as graph structure |
| Call graph schema (`CallNodeAttrs`, `CallEdgeAttrs`, `CallStatus`) | `@alkdev/flowgraph` | Storage persists these in-memory shapes to the database |
| Task graph schema (`TaskGraphNodeAttributes`, `DependencyEdge`) | `@alkdev/taskgraph` | Storage persists task dependency shapes |
| Event transport (`TypedEventTarget`, `EventEnvelope`) | `@alkdev/pubsub` | Storage is not involved in event routing; it stores the events' outcomes |
### Repository Layer Bridging Pattern (Consumer-Side Concern)
The repository layer in `@alkdev/storage` provides typed CRUD — no `@alkdev/operations`
dependency. A **consumer-side** bridging module can then wire these CRUD functions
into the `@alkdev/operations` registry, analogous to how `drizzle-graphql`
auto-generates a GraphQL schema from Drizzle tables — but using operations
(queries, mutations, subscriptions) instead of GraphQL resolvers. This works
because:
1. `@alkdev/operations` already maps closely to GraphQL's
queries/mutations/subscriptions (it was modeled after that pattern)
2. `@alkdev/pubsub` provides the subscription transport (forked from
graphql-yoga's pubsub with additions)
3. `@alkdev/storage`'s metagraph tables are the data source, analogous to
Drizzle tables for drizzle-graphql
The bridging module would live in a consumer package (e.g., the hub or a
dedicated `@alkdev/storage-operations` adapter), not in `@alkdev/storage` itself,
to avoid circular dependencies:
```
@alkdev/storage → defines types + tables (no operations dependency)
@alkdev/operations → defines call protocol + registry (no storage dependency)
Consumer (hub / adapter) → imports both, generates operations from schemas
```
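
The three-way split above can be sketched as a small consumer-side function. Everything here is a hypothetical stand-in: `NodeRepository`, `OperationRegistry`, `bridgeRepository`, and the operation names are assumptions, and neither interface reflects the real API of `@alkdev/storage` or `@alkdev/operations`. The calls are synchronous only for brevity.

```typescript
// Hypothetical stand-ins; neither interface is a real package API.
interface NodeRow {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Storage side: typed CRUD, unaware of any registry.
interface NodeRepository {
  insertNode(row: NodeRow): NodeRow;
  findNodesByType(type: string): NodeRow[];
}

type Handler = (input: unknown) => unknown;

// Operations side: a registry, unaware of any storage.
interface OperationRegistry {
  register(name: string, kind: "query" | "mutation", handler: Handler): void;
}

// Consumer side: the only place that sees both interfaces.
function bridgeRepository(
  repo: NodeRepository,
  registry: OperationRegistry,
): void {
  registry.register(
    "nodes.insert",
    "mutation",
    (input) => repo.insertNode(input as NodeRow),
  );
  registry.register(
    "nodes.findByType",
    "query",
    (input) => repo.findNodesByType(input as string),
  );
}

// In-memory stand-ins to show the wiring end to end.
const rows: NodeRow[] = [];
const repo: NodeRepository = {
  insertNode(row) {
    rows.push(row);
    return row;
  },
  findNodesByType: (type) => rows.filter((r) => r.type === type),
};

const handlers = new Map<string, Handler>();
const registry: OperationRegistry = {
  register(name, _kind, handler) {
    handlers.set(name, handler);
  },
};

bridgeRepository(repo, registry);
```

Since `bridgeRepository` lives with the consumer, neither of the two packages needs to import the other, which is the property the layout above is meant to preserve.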
### Avoiding Circular Dependencies
Neither `@alkdev/storage` nor `@alkdev/operations` should depend on each
other directly. Storage defines the schema types and database tables; operations
defines the call protocol and execution model. The consumer (hub, spoke, or
adapter package) imports both and bridges them. This preserves the
single-responsibility principle and allows each package to evolve independently.
If shared type definitions are needed (e.g., `Identity` referenced in both
storage node attributes and operations call events), they should either:
1. Be duplicated in each package with a documented correspondence (acceptable
for small, stable types)
2. Be extracted to a minimal shared types package if the duplication becomes
burdensome
## Open Questions
1. **Should `actors` be a node type or a standalone table?** Currently `actors`
is a standalone table in the SQLite host that isn't referenced by any
relation. If identity/authentication is a graph (ACL nodes based on
`@alkdev/operations`'s `Identity` interface), actors become node types. If
identity is a domain concept that needs special query patterns (auth lookups,
session joins), standalone tables may be better. Decision: defer until ACL
design, informed by `@alkdev/operations`'s `AccessControl` model.
2. **Should the repository layer be host-specific or host-agnostic?** A
host-agnostic repository (insert graph, find nodes by type) requires an
abstraction over Drizzle's query builder. A host-specific repository is
simpler but means duplicating query logic for PG. Decision: start
host-specific in SQLite, extract common patterns later.
3. **Encrypted data scope**: Should encryption be per-attribute, per-node, or
per-graph? Per-attribute (like hub's `client_secrets.value`) allows selective
encryption. Per-node encrypts the entire `attributes` blob. Per-graph is
overkill. Decision: per-attribute, modeled as an encrypted node type with a
dedicated attribute for the ciphertext.
4. **Key management scope**: `@alkdev/storage` should provide the
encryption/decryption primitives but NOT key management. The consuming
application provides the key ring. This keeps the storage package agnostic to
deployment-specific secret management.
5. **Migration strategy**: When graph type schemas evolve (new node types,
changed attribute schemas), who handles migration? The repository layer
should support schema version checking, but actual migration scripts are
application-level. See [metagraph.md](./metagraph.md) for the versioning
approach.
6. **~~Should the repository layer live in `@alkdev/storage` or in a consumer
package?~~** Decision: the repository CRUD layer (host-specific typed
queries, schema validation before writes) belongs in `@alkdev/storage`. The
operations bridging layer (generating `OperationSpec`s from metagraph schemas)
belongs in a consumer or adapter package. These are separate concerns — CRUD
is a storage concern; call protocol integration is an application concern.
The repository layer in `@alkdev/storage` has **no dependency on
`@alkdev/operations`**. It performs typed inserts, finds, updates, and
deletes with schema validation. The consumer then wires these CRUD functions
into the operations registry if desired.
## References
- Metagraph Module evolution: [metagraph-module.md](./metagraph-module.md)
- Forward-looking connections: [forward-look.md](./forward-look.md)
- Operations architecture: `/workspace/@alkdev/operations/docs/architecture/README.md`
- Pubsub architecture: `/workspace/@alkdev/pubsub/docs/architecture/README.md`
- Flowgraph architecture: `/workspace/@alkdev/flowgraph/docs/architecture/README.md`
- Taskgraph architecture: `/workspace/@alkdev/taskgraph_ts/docs/architecture/README.md`
- drizzle-graphql (reference for repo bridging pattern): `/workspace/drizzle-graphql/`
- Source heritage: `@ade/ade-v0/packages/core/graphs` and
`@ade/ade-v0/packages/storage_sqlite`
- Drizzle ORM: https://orm.drizzle.team/
- TypeBox: https://github.com/sinclairzx/typebox
- JSR: https://jsr.io/