docs: restructure architecture docs to flowgraph pattern
- Create decisions/ directory with 32 numbered ADRs (ADR-001 through ADR-032) extracted from inline DD/SD/ED/SE decision sections
- Create open-questions.md with 16 OQs organized by theme, cross-referenced to ADRs, with status tracking (resolved/open)
- Create README.md as architecture index with doc table, ADR table, and lifecycle status definitions (draft/reviewed/stable/deprecated)
- Replace inline decision sections in all spec docs with ADR reference tables
- Replace inline open questions with OQ references to centralized tracker
- Update frontmatter: metagraph-module.md, overview.md, sqlite-host.md → reviewed; schema-evolution.md and encrypted-data.md remain draft
- DD1-DD10 → ADR-009 through ADR-018
- D1-D8 → ADR-001 through ADR-008
- SD1-SD4 → ADR-019 through ADR-022 (SD5 folded into ADR-006/008)
- ED1-ED5 → ADR-023 through ADR-027
- SE1-SE5 → ADR-028 through ADR-032
92
docs/architecture/README.md
Normal file
@@ -0,0 +1,92 @@
---
status: reviewed
last_updated: 2026-05-29
---

# @alkdev/storage Architecture

Typed graph storage with dual database hosts. Deno-first, published via JSR.

## Current State

Storage is in Phase 1/2 (SQLite tables and type system implemented; repository layer, tests, PostgreSQL host, and encrypted data not yet implemented).

## Architecture Documents

| Document | Content | Status |
|----------|---------|--------|
| [overview.md](overview.md) | Package purpose, exports, dependencies, ecosystem integration | reviewed |
| [metagraph-module.md](metagraph-module.md) | TypeBox Module type system, bridge functions, implementation path | reviewed |
| [sqlite-host.md](sqlite-host.md) | SQLite tables, relations, client factory, PG porting notes | reviewed |
| [schema-evolution.md](schema-evolution.md) | Value.Diff/Cast/Patch for schema migration, version strategy | draft |
| [encrypted-data.md](encrypted-data.md) | Crypto utility, encrypted node type, key management | draft |
| [forward-look.md](forward-look.md) | Pointers, dbtype, ujsx IR (conceptual, post-v1) | draft |

### Design Decisions

| ADR | Decision | Status |
|-----|----------|--------|
| [001](decisions/001-deno-first-jsr-publishes.md) | Deno-first, JSR publishes, npm comes free | Accepted |
| [002](decisions/002-metagraph-over-domain-tables.md) | Metagraph pattern over domain-specific tables | Accepted |
| [003](decisions/003-typebox-module-as-api-surface.md) | TypeBox Module as the graph type definition API surface | Accepted |
| [004](decisions/004-injectable-clients-no-side-effects.md) | Injectable clients, no module-level side effects | Accepted |
| [005](decisions/005-drizzle-plus-typebox-via-drizzlebox.md) | Drizzle + TypeBox via drizzlebox | Accepted |
| [006](decisions/006-enum-pattern-as-const-objects.md) | `as const` objects, not TypeScript enums | Accepted |
| [007](decisions/007-no-comments-in-code.md) | No comments in code | Accepted |
| [008](decisions/008-common-columns-pattern.md) | Common columns pattern | Accepted |
| [009](decisions/009-typebox-module-replaces-schemabuilder.md) | TypeBox Module replaces the SchemaBuilder | Accepted |
| [010](decisions/010-metagraph-import-for-same-package.md) | Metagraph.Import() for same-package Modules | Accepted |
| [011](decisions/011-config-as-module-entry-with-literal-values.md) | Config as Module entry with Literal values | Accepted |
| [012](decisions/012-node-edge-attributes-as-module-entries.md) | Node/edge attribute schemas are Module entries | Accepted |
| [013](decisions/013-storage-produces-graphology-format.md) | Storage produces graphology format, flowgraph consumes it | Accepted |
| [014](decisions/014-dereferenced-entry-schemas.md) | Repository stores dereferenced entry schemas | Accepted |
| [015](decisions/015-edge-constraints-as-named-entries.md) | Edge type constraints as named Module entries | Accepted |
| [016](decisions/016-naming-convention-for-module-entries.md) | Naming convention for Module entries | Accepted |
| [017](decisions/017-pointer-abstraction-is-forward-looking.md) | Pointer abstraction is forward-looking, not v1 | Accepted |
| [018](decisions/018-dbtype-integration-is-post-v1.md) | dbtype integration is post-v1 | Accepted |
| [019](decisions/019-json-text-for-schema-columns.md) | JSON text for schema columns in SQLite | Accepted |
| [020](decisions/020-no-nodetypeid-on-nodes.md) | No nodeTypeId on nodes | Accepted |
| [021](decisions/021-edge-identity-uses-consumer-keys.md) | Edge identity uses consumer-defined keys | Accepted |
| [022](decisions/022-composite-fks-for-node-references.md) | Composite foreign keys for node references | Accepted |
| [023](decisions/023-per-attribute-encryption.md) | Per-attribute encryption, not per-node | Accepted |
| [024](decisions/024-encrypted-data-as-node-type.md) | Encrypted data as node type, not standalone table | Accepted |
| [025](decisions/025-password-based-encryption-pbkdf2.md) | Password-based encryption via PBKDF2 | Accepted |
| [026](decisions/026-application-managed-key-ring.md) | Application-managed key ring | Accepted |
| [027](decisions/027-no-key-rotation-utility.md) | No key rotation utility in this package | Accepted |
| [028](decisions/028-additive-only-with-cast-migration.md) | Additive-only for v1, Cast migration when needed | Accepted |
| [029](decisions/029-version-as-breaking-change-signal.md) | Version as a coarse-grained breaking-change signal | Accepted |
| [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | Accepted |
| [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Accepted |
| [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | Accepted |

### Open Questions

All unresolved design questions are tracked in [open-questions.md](open-questions.md), organized by theme with cross-references between related questions.

## Document Lifecycle

Architecture documents use YAML frontmatter with `status` and `last_updated` fields:

```yaml
---
status: draft | reviewed | stable | deprecated
last_updated: YYYY-MM-DD
---
```

| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. Content may change significantly. Implementation should not start until the document reaches `reviewed`. | → `reviewed` when all open questions are resolved and cross-cutting issues are addressed. |
| `reviewed` | Architecture is final and reviewed. Implementation may begin. API contracts are specified but not yet verified by tests. Changes require a review cycle. | → `stable` when implementation is complete and API contracts are verified by tests. → `draft` if a fundamental redesign is needed. |
| `stable` | API contracts are locked and verified by tests. Changes require a review cycle and may warrant an ADR. | → `deprecated` when superseded. |
| `deprecated` | Superseded by another document. Kept for reference. | Removed when no longer referenced. |
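
The Transitions column can be read as a small state machine. The sketch below encodes it in TypeScript, for example for a docs lint step; `DOC_STATUS`, `ALLOWED_TRANSITIONS`, and `canTransition` are illustrative names and are not part of the package.

```typescript
// Hypothetical helper, not part of @alkdev/storage: encodes the lifecycle table.
const DOC_STATUS = {
  Draft: "draft",
  Reviewed: "reviewed",
  Stable: "stable",
  Deprecated: "deprecated",
} as const;

type DocStatus = (typeof DOC_STATUS)[keyof typeof DOC_STATUS];

// Each status maps to the statuses it may move to, per the Transitions column.
const ALLOWED_TRANSITIONS: Record<DocStatus, readonly DocStatus[]> = {
  draft: ["reviewed"],
  reviewed: ["stable", "draft"],
  stable: ["deprecated"],
  deprecated: [],
};

// Returns true when the table permits moving a document from `from` to `to`.
function canTransition(from: DocStatus, to: DocStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

Under this reading, `reviewed` is the only status that can fall back to `draft`, and `deprecated` is terminal.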

ADR documents use a separate `Status` field in their body: `Proposed`, `Accepted`, `Deprecated`, or `Superseded`. ADRs never revert from `Accepted`.

## References

- Source: `src/`
- AGENTS.md: `/workspace/@alkdev/storage/AGENTS.md`
- Flowgraph architecture (pattern reference): `/workspace/@alkdev/flowgraph/docs/architecture/`
- ujsx architecture: `/workspace/@alkdev/ujsx/docs/architecture/`
- Operations architecture: `/workspace/@alkdev/operations/docs/architecture/`
24
docs/architecture/decisions/001-deno-first-jsr-publishes.md
Normal file
@@ -0,0 +1,24 @@
# ADR-001: Deno-first, JSR publishes, npm comes free

## Status

Accepted

## Context

The package needs to be usable from both Deno and Node.js. Maintaining separate build pipelines (dnt, esbuild) is overhead. JSR automatically provides npm compatibility via `@jsr/alkdev__storage`.

## Decision

Publish to JSR (`deno publish`). npm compatibility is automatic via JSR's npm layer. No separate dnt build step.

## Consequences

- Source files use `.ts` extension with explicit import paths (Deno convention)
- `--allow-slow-types` is used on publish; `"exclude": ["no-slow-types"]` in lint config
- npm consumers install `@jsr/alkdev__storage`
- No dual-publishing pipeline to maintain

## References

- [overview.md](../overview.md)
@@ -0,0 +1,24 @@
# ADR-002: Metagraph pattern over domain-specific tables

## Status

Accepted

## Context

Without a generic graph storage model, each domain concept (call graphs, ACL rules, task trees) requires its own table definitions (`call_table`, `acl_table`, etc.). This leads to table proliferation and duplicated query patterns.

## Decision

Use the metagraph pattern: graph types define schemas, node types define data shapes, edge types define typed relationships. A single set of 6 metagraph tables stores any graph type. Domain-specific queries are built on top, not baked into table schemas.

## Consequences

- 6 tables (graph_types, node_types, edge_types, graphs, nodes, edges) serve all domains
- Some query convenience is traded for generality: domain-specific indexes need application-layer solutions
- Schema validation is the application's responsibility, not the database's

## References

- [overview.md](../overview.md)
- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,28 @@
# ADR-003: TypeBox Module as the graph type definition API surface

## Status

Accepted

## Context

Graph type definitions need named entries (node types, edge types, config) that reference each other. The previous `SchemaBuilder` produced a flat `Record<string, NodeType>` + `Record<string, EdgeType>` which couldn't handle cross-graph-type references, didn't map to graphology's serialization format, and couldn't consume codegen output.

## Decision

Use `Type.Module()` as the primary API for defining graph types. The `SchemaBuilder` is removed. Consumers construct Modules directly with `Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()`. The `moduleToDbSchema()` function bridges Modules to DB rows.

This provides: `Type.Ref()` for internal references, `Module.Import()` for cross-package references, JSON Schema `$defs` that map directly to DB storage, and codegen compatibility via `TsToModule.Generate()`.

## Consequences

- `SchemaBuilder` class is removed; consumers use `Type.Module()` directly
- `types.ts` standalone schemas replaced by `Metagraph` Module and reference Modules
- Schema validation uses `Value.Check` on Module entries
- The Module format aligns with `@alkdev/ujsx`'s proven pattern
- Cross-package imports work via `Module.Import()` when other packages export Modules

## References

- [metagraph-module.md](../metagraph-module.md)
- [forward-look.md](../forward-look.md)
@@ -0,0 +1,24 @@
# ADR-004: Injectable clients, no module-level side effects

## Status

Accepted

## Context

Module-level side effects (auto-connections, env-based configuration) make testing difficult and conflict with containerized deployment patterns. The hub needs to create database instances with different configurations.

## Decision

`createSqliteDatabase(client)` receives a pre-created client. Module-level side effects are forbidden. The factory enables in-memory testing (`createClient({ url: ":memory:" })`) and custom client configuration.

## Consequences

- No global database state: each consumer explicitly creates and injects a client
- Testing with in-memory databases is straightforward
- Containerized deployment patterns are supported
- Consumer controls client lifecycle (connections, cleanup)
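
The injection pattern can be sketched without any database driver. Everything below is hypothetical: `SqlClient`, `Database`, `createDatabase`, and `createRecordingClient` are stand-ins chosen for illustration and do not reflect the real signature or return type of `createSqliteDatabase(client)`.

```typescript
// Hypothetical minimal client shape; a real driver client has a richer API.
interface SqlClient {
  execute(sql: string): Promise<unknown>;
  close(): void;
}

// Hypothetical wrapper standing in for whatever the real factory returns.
interface Database {
  run(sql: string): Promise<unknown>;
}

// The factory only wraps the injected client. Nothing runs at import time and
// nothing is read from the environment.
function createDatabase(client: SqlClient): Database {
  return { run: (sql: string) => client.execute(sql) };
}

// A test double that records statements instead of touching a real database.
function createRecordingClient(): SqlClient & { statements: string[] } {
  const statements: string[] = [];
  return {
    statements,
    execute: async (sql: string): Promise<unknown> => {
      statements.push(sql);
      return undefined;
    },
    close: (): void => {},
  };
}
```

Because the client is passed in, a test can swap in the recording double (or an in-memory client) and the caller stays responsible for calling `close()`.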

## References

- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,24 @@
# ADR-005: Drizzle + TypeBox (via drizzlebox) as table definition pattern

## Status

Accepted

## Context

Table definitions need both ORM capabilities (type-safe queries, migrations) and runtime validation (TypeBox schemas). Maintaining two separate definitions (Drizzle tables + hand-written TypeBox schemas) creates sync burden.

## Decision

Drizzle table definitions are the single source of truth. `@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from Drizzle tables, eliminating manual schema duplication.

## Consequences

- Table schema changes only require updating the Drizzle definition
- TypeBox schemas are always in sync with table definitions
- `@alkdev/drizzlebox` is a dependency for the SQLite/PG subpath exports only
- The main `@alkdev/storage` export has zero database driver dependencies

## References

- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,24 @@
# ADR-006: Enumeration pattern: `as const` objects, not TypeScript enums

## Status

Accepted

## Context

TypeScript `enum` causes JSR slow-type issues and doesn't integrate well with TypeBox schemas. Enumeration values need to be available both as runtime constants and as TypeBox union literals.

## Decision

All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`). TypeBox schemas use `Type.Union` of `Type.Literal` values derived from the const object.

## Consequences

- No JSR slow-type warnings
- Enum values available as both runtime constants and type system literals
- Consistent pattern across the codebase
- Matches the pattern used in `common.ts` for `ACTOR_TYPE`
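
A minimal, dependency-free sketch of the pattern follows. Only the `Active: "active"` member comes from this ADR; the `Archived` member, `GRAPH_STATUS_VALUES`, and `isGraphStatus` are assumptions added for illustration, and the TypeBox `Type.Union` derivation is deliberately left out.

```typescript
// Only `Active` appears in the ADR; `Archived` is an assumed member for illustration.
const GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
} as const;

// Union of the object's values: "active" | "archived".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// The same object supplies the runtime list that a union of literal schemas
// could be derived from (the TypeBox call is omitted here).
const GRAPH_STATUS_VALUES = Object.values(GRAPH_STATUS) as GraphStatus[];

// Runtime guard built from the const object, so the type and the check cannot drift.
function isGraphStatus(value: unknown): value is GraphStatus {
  return GRAPH_STATUS_VALUES.some((status) => status === value);
}
```

The const object is the single definition: the type, the runtime values, and any schema derived from them all follow from it.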

## References

- [sqlite-host.md](../sqlite-host.md)
23
docs/architecture/decisions/007-no-comments-in-code.md
Normal file
@@ -0,0 +1,23 @@
# ADR-007: No comments in code

## Status

Accepted

## Context

Per project convention across @alkdev packages, inline comments in source files create maintenance burden and often go stale. Documentation belongs in architecture docs and TypeBox schema descriptions.

## Decision

Source files contain no inline comments. Documentation lives in architecture docs (`docs/architecture/`) and TypeBox schema `description` properties.

## Consequences

- Code is self-documenting through naming and structure
- All rationale and context lives in versioned architecture docs
- TypeBox schemas can carry `description` fields for runtime documentation

## References

- [overview.md](../overview.md)
24
docs/architecture/decisions/008-common-columns-pattern.md
Normal file
@@ -0,0 +1,24 @@
# ADR-008: Common columns pattern

## Status

Accepted

## Context

All tables need auditability (creation/modification timestamps) and extensibility (metadata). Without a common pattern, each table defines these differently.

## Decision

All tables share `id` (text PK, consumer-generated), `metadata` (JSON text defaulting to `{}`), `createdAt`, and `updatedAt` (integer timestamps in SQLite, timestamptz in PG). The `metadata` column is an extension namespace following `_subsystem.key` namespacing; it is for opaque key-value pairs that subsystems add without schema changes, not for queryable data.

## Consequences

- Every row has auditability and extensibility
- Subsystems add metadata without schema changes
- Columns that appear in WHERE/JOIN conditions should be proper columns, not metadata
- SQLite uses `integer` timestamps; PG will use `timestamptz`
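
As a rough illustration of the `_subsystem.key` namespacing, the sketch below models the common columns as a plain TypeScript interface. `CommonColumns`, `metadataKey`, and `withMetadata` are hypothetical names, the `sync` subsystem in the usage note is invented, and the timestamp unit is left to the caller because this ADR does not state it.

```typescript
// Hypothetical row shape mirroring the common columns on the SQLite host.
interface CommonColumns {
  id: string; // consumer-generated primary key
  metadata: Record<string, unknown>; // stored as JSON text, defaults to {}
  createdAt: number; // integer timestamp
  updatedAt: number; // integer timestamp
}

// Builds a namespaced metadata key of the form `_subsystem.key`.
function metadataKey(subsystem: string, key: string): string {
  return `_${subsystem}.${key}`;
}

// Returns a copy of the row with one namespaced metadata entry set and
// `updatedAt` refreshed. The input row is not mutated.
function withMetadata<T extends CommonColumns>(
  row: T,
  subsystem: string,
  key: string,
  value: unknown,
  now: number,
): T {
  return {
    ...row,
    metadata: { ...row.metadata, [metadataKey(subsystem, key)]: value },
    updatedAt: now,
  };
}
```

For example, `withMetadata(row, "sync", "cursor", "abc", now)` would store the value under `_sync.cursor`. Anything that needs to appear in a WHERE or JOIN clause still belongs in a real column.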

## References

- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,25 @@
# ADR-009: TypeBox Module replaces the SchemaBuilder

## Status

Accepted

## Context

The `SchemaBuilder` produced a flat `Record<string, NodeType>` + `Record<string, EdgeType>`. It couldn't handle cross-graph-type references, didn't map to graphology's serialization format, and couldn't consume codegen output from `TsToModule`.

## Decision

The `SchemaBuilder` class is removed. Graph type definitions are `Type.Module` objects constructed directly with `Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()`. The `moduleToDbSchema()` function replaces `SchemaBuilder.build()` as the bridge from Module to DB rows.

## Consequences

- `schemaBuilder.ts` is removed
- `types.ts` standalone schemas replaced by `Metagraph` Module entries
- Named entries (`*Node`, `*Edge`, `Config`) replace flat schema records
- `Type.Ref()` provides internal references, `Module.Import()` provides cross-package references
- `Type.Unknown()` replaces `Type.Any()` for arbitrary-data fields (both produce `{}` but `Unknown` is explicit)

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,25 @@
# ADR-010: Metagraph.Import() for same-package Modules

## Status

Accepted

## Context

Concrete graph type Modules need to compose `BaseNode` and `BaseEdge` from the `Metagraph` Module. There are two approaches: `Metagraph.Import()` (embeds the referenced Module's `$defs`) or local re-declaration (copy schemas into the concrete Module).

## Decision

Use `Metagraph.Import("BaseNode")` for Modules within `@alkdev/storage`. Both Modules live in the same package, so there's no circular dependency and no duplication. External packages that define graph type Modules should re-declare base schemas locally: storage should not be a dependency of other packages' schema definitions.

`Metagraph` is small (3 entries), so the `$defs` embedding is minimal. If it grows significantly or circular imports become an issue, local re-declaration is the fallback.

## Consequences

- Reference Modules in `modules/` use `Metagraph.Import()` throughout
- No duplication of `BaseNode`/`BaseEdge` within `@alkdev/storage`
- External packages (flowgraph, taskgraph) define their own base schemas independently

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,25 @@
# ADR-011: Config as Module entry with Literal values

## Status

Accepted

## Context

Graph type configurations can be general (any valid config at construction time) or specific (frozen values for a particular graph type). The Module format needs to express both.

## Decision

The general `Metagraph.Config` uses `Type.Union` with defaults for construction-time validation ("any valid config"). Specific graph types freeze config values to `Type.Literal`, making the config a precise contract rather than a validation surface.

Construction flow: consumer provides a general config → validated against `Metagraph.Config` → the specific graph type Module freezes the values with `Type.Literal`.

## Consequences

- `Metagraph.Config` accepts `{ type: "directed" | "undirected" | "mixed", multi: boolean, allowSelfLoops: boolean }` with defaults
- `CallGraph.Config` is `{ type: "directed", multi: false, allowSelfLoops: false }`: frozen, no ambiguity
- Type narrowing from Union to Literal is explicit in the Module, no builder step needed
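
The narrowing from a general config to a frozen one can be illustrated with plain TypeScript types, without TypeBox. The field names and the `CallGraph` values come from the bullets above; `GeneralConfig`, `CALL_GRAPH_CONFIG`, and `matchesCallGraph` are hypothetical names, and the defaults mentioned for `Metagraph.Config` are not modelled because this ADR does not list them.

```typescript
// General config: any valid combination, mirroring what Metagraph.Config accepts.
type GraphKind = "directed" | "undirected" | "mixed";

interface GeneralConfig {
  type: GraphKind;
  multi: boolean;
  allowSelfLoops: boolean;
}

// Frozen config for one graph type, using the CallGraph.Config values.
// `as const` narrows each field to its literal type.
const CALL_GRAPH_CONFIG = {
  type: "directed",
  multi: false,
  allowSelfLoops: false,
} as const;

// Checks a general config against the frozen literal values, field by field.
function matchesCallGraph(config: GeneralConfig): boolean {
  return (
    config.type === CALL_GRAPH_CONFIG.type &&
    config.multi === CALL_GRAPH_CONFIG.multi &&
    config.allowSelfLoops === CALL_GRAPH_CONFIG.allowSelfLoops
  );
}
```

The general type answers "is this a valid config at all?", while the frozen object answers "is this exactly the config this graph type requires?".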

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,24 @@
# ADR-012: Node/edge attribute schemas are Module entries, not Type.Any()

## Status

Accepted

## Context

At the DB storage layer, node/edge attribute schemas are arbitrary JSON Schema blobs (`Type.Unknown()`). But at the application layer, graph type definitions should have full type safety for their attribute schemas.

## Decision

At the application layer, node and edge attribute schemas are named Module entries with full type safety (`CallGraph.CallNode`, not `schema: Type.Any()`). At the DB storage layer, the meta-schemas (`NodeType`, `EdgeType`) still have `schema: Type.Unknown()` because the DB stores arbitrary JSON Schema blobs. The repository layer maps between the two: Module entries are the application-level validation, the DB is the persistence layer.

## Consequences

- Compile-time type safety for graph type definitions
- Runtime validation via `Value.Check(Module.EntryName, data)`
- DB layer remains generic: any JSON Schema blob is accepted for storage
- Validation boundary is the repository layer, not the database

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,24 @@
# ADR-013: Storage produces graphology format, flowgraph consumes it

## Status

Accepted

## Context

Graph data needs to flow between storage (persistence) and flowgraph (in-memory operations). The question is whether storage should depend on graphology directly.

## Decision

Storage doesn't need a graphology dependency. It produces the JSON serialization format (`SerializedGraph`) that `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and `SerializedGraph` consume. The Module entries validate data flowing in both directions. Storage defines the data shapes; flowgraph operates on them in memory.

## Consequences

- `@alkdev/storage` has zero dependency on graphology
- The `SerializedGraph` factory signature stays the same; its schema arguments come from Module entries instead of standalone schemas
- Storage produces format, flowgraph consumes format
- `moduleToGraphology()` and `fromGraphologyExport()` bridge functions will be added in Phase 4

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,24 @@
# ADR-014: Repository stores dereferenced entry schemas

## Status

Accepted

## Context

When a Module entry uses `Module.Import()`, the entry's JSON Schema embeds the referenced Module's `$defs`. For example, `CallGraph` importing `FlowGraph.Import("CallStatus")` includes all of `FlowGraph`'s definitions in the JSON Schema output. Storing this in every `node_types` row would duplicate the entire referenced Module.

## Decision

The repository layer stores **dereferenced entry schemas**: each `node_types` row gets its entry's resolved JSON Schema with just the transitive `$defs` it needs, not the entire importing Module's definitions. This avoids storage bloat and version coupling between packages.

## Consequences

- Each DB row stores 1–3 KB of JSON Schema (just its own entry + transitive refs)
- A full graph type's schemas total ~10–50 KB in the DB, negligible compared to node/edge data
- No version coupling: a `FlowGraph` version change doesn't require updating all CallGraph rows
- `moduleToDbSchema()` performs the dereferencing at write time
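
One way to picture "just the transitive `$defs` it needs" is a reachability walk over `$ref` values. The sketch below operates on plain JSON Schema objects and assumes that every `$ref` is a bare `$defs` key. It is not the implementation of `moduleToDbSchema()`, and `collectRefs` and `dereferenceEntry` are hypothetical names.

```typescript
// Plain JSON Schema objects. Assumption: `$ref` values are bare `$defs` keys.
type JsonSchema = { [key: string]: unknown };

// Collects every `$ref` target found anywhere inside one schema value.
function collectRefs(value: unknown, found: Set<string>): void {
  if (Array.isArray(value)) {
    for (const item of value) collectRefs(item, found);
  } else if (typeof value === "object" && value !== null) {
    for (const [key, child] of Object.entries(value)) {
      if (key === "$ref" && typeof child === "string") found.add(child);
      else collectRefs(child, found);
    }
  }
}

// Returns the entry's schema plus only the `$defs` it reaches, transitively.
function dereferenceEntry(
  defs: Record<string, JsonSchema>,
  entryName: string,
): JsonSchema {
  const entry = defs[entryName];
  if (entry === undefined) throw new Error(`Unknown entry: ${entryName}`);

  const needed = new Set<string>();
  const queue: string[] = [entryName];
  while (queue.length > 0) {
    const current = queue.pop() as string;
    const refs = new Set<string>();
    collectRefs(defs[current], refs);
    refs.forEach((ref) => {
      // Visit each known definition once; unknown refs are ignored in this sketch.
      if (!needed.has(ref) && defs[ref] !== undefined) {
        needed.add(ref);
        queue.push(ref);
      }
    });
  }

  const neededDefs: Record<string, JsonSchema> = {};
  needed.forEach((name) => {
    neededDefs[name] = defs[name];
  });
  return { ...entry, $defs: neededDefs };
}
```

Definitions that the entry never reaches are left out of the stored row, which is what keeps each row small and decoupled from unrelated entries.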

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,26 @@
# ADR-015: Edge type constraints as named Module entries

## Status

Accepted

## Context

Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) could be stored only as DB columns (JSON text arrays) or as first-class parts of the schema with validation and serialization.

## Decision

Edge type constraints are named Module entries (e.g., `TriggeredEdgeConstraints`), not just DB columns. This gives them schema validation (`Value.Check`) and serialization (JSON Schema with `$defs`). The repository layer projects these entries to the existing `edge_types.allowedSourceTypes`/`allowedTargetTypes` columns. The DB schema doesn't change: Module entries are the source of truth, DB columns are the persistence projection.

Empty constraint arrays `[]` mean "no restriction" (any node type valid). Omitting the `*EdgeConstraints` entry means the same. An explicit entry with empty arrays is invalid.

## Consequences

- Constraints are validatable at the schema level, not just at the DB layer
- `moduleToDbSchema()` extracts constraint entries to DB columns
- Constraint data survives serialization/deserialization cycles via JSON Schema `$defs`
- DB columns remain `allowedSourceTypes text` and `allowedTargetTypes text` with JSON arrays
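
The "empty array means no restriction" rule for the projected columns can be sketched as a small check. This assumes the two columns have already been parsed from JSON text into string arrays; `EdgeConstraintColumns`, `isAllowed`, and `canConnect` are hypothetical names, not repository functions.

```typescript
// Hypothetical shape of one edge type's constraint columns after JSON parsing.
interface EdgeConstraintColumns {
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// An empty list means "no restriction", so any node type passes.
function isAllowed(allowed: readonly string[], nodeType: string): boolean {
  return allowed.length === 0 || allowed.includes(nodeType);
}

// Both endpoints must satisfy their respective column.
function canConnect(
  constraints: EdgeConstraintColumns,
  sourceType: string,
  targetType: string,
): boolean {
  return (
    isAllowed(constraints.allowedSourceTypes, sourceType) &&
    isAllowed(constraints.allowedTargetTypes, targetType)
  );
}
```

A check like this would sit in the repository layer, since the database itself does not enforce the constraint.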

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,25 @@
# ADR-016: Naming convention for Module entries

## Status

Accepted

## Context

Module entries need a way to distinguish their role (config, node type, edge type, constraint) so that `moduleToDbSchema()` can map them to the correct DB tables. Two approaches: explicit metadata/decorators on entries, or naming convention with role-distinguishing suffixes.

## Decision

Module entries use role-distinguishing suffixes: `*Node` for node types, `*Edge` for edge types, `Config` for graph configuration, `*EdgeConstraints` for edge endpoint constraints, and bare names or `*Enum` for shared types. `moduleToDbSchema()` uses this convention to map entries to DB tables.

Explicit metadata/decorators (e.g., `{ kind: "nodeType", name: "call", schema: ... }`) were considered but rejected because they add boilerplate without adding information. The suffix convention is simpler and sufficient for the expected Module size (5–20 entries).

## Consequences

- `moduleToDbSchema()` throws on entries that don't match any recognized suffix
- Bare names without a suffix are treated as shared types (embedded in other entries' schemas), not as independent DB rows
- `*EdgeConstraints` entries must have an `edgeType` field matching an `*Edge` entry name
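
The suffix convention amounts to a name classifier. The sketch below follows the rule that bare names and `*Enum` names are shared types, so it never throws; how `moduleToDbSchema()` decides which entries to reject is not modelled. `EntryRole` and `classifyEntry` are hypothetical names.

```typescript
type EntryRole = "config" | "nodeType" | "edgeType" | "edgeConstraints" | "shared";

// Classifies a Module entry name by its suffix. Names ending in
// `EdgeConstraints` do not end in `Edge`, but checking them first keeps the
// precedence explicit. Bare names and `*Enum` names fall through to "shared".
function classifyEntry(name: string): EntryRole {
  if (name === "Config") return "config";
  if (name.endsWith("EdgeConstraints")) return "edgeConstraints";
  if (name.endsWith("Edge")) return "edgeType";
  if (name.endsWith("Node")) return "nodeType";
  return "shared";
}
```

With 5 to 20 entries per Module, a lookup like this is cheap and keeps the role information in the name rather than in a separate metadata object.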

## References

- [metagraph-module.md](../metagraph-module.md)
@@ -0,0 +1,25 @@
# ADR-017: Pointer abstraction is forward-looking, not v1

## Status

Accepted

## Context

There's a structural analogy between ujsx's `ValuePointer`/`selectNode`/`setNode` and graph node/edge addressing. Typed graph pointers (via JPATH Module or reactive signals) could provide compile-time-safe graph queries.

## Decision

For v1, repository functions use direct key-based addressing (`findNode(graphId, nodeKey)`). Typed pointers are a post-v1 concern. The Module's existence makes typed pointers feasible later because it provides the schema the pointer validates against, but implementing them now adds complexity without clear benefit.

## Consequences

- v1 repository API uses string-based keys for node/edge lookups
- Attribute access is untyped JSON retrieval (`node.attributes.requestId`)
- The Module validates attribute shapes, but query paths are strings
- The pointer abstraction can layer on top of the repository later

## References

- [metagraph-module.md](../metagraph-module.md)
- [forward-look.md](../forward-look.md)
@@ -0,0 +1,26 @@
# ADR-018: dbtype integration is post-v1

## Status

Accepted

## Context

`@alkdev/dbtype`'s UJSX→Module→Host pipeline can eliminate the manual dual definition of SQLite/PG table schemas. Storage currently defines tables twice (once for SQLite, once for PG; the latter is not yet implemented).

## Decision

For v1, storage uses manual Drizzle table definitions. dbtype is Phase 0 (architecture complete, no implementation). The Module-based graph type definitions are compatible with dbtype because both produce `Type.Module` objects; the integration path is clear when dbtype reaches implementation.

## Consequences

- SQLite tables are defined manually via Drizzle
- PG tables will also be defined manually via Drizzle when implemented
- Dual SQLite/PG maintenance for 6 metagraph tables is manageable
- No dbtype dependency in v1
- The Module format ensures future dbtype integration is straightforward

## References

- [metagraph-module.md](../metagraph-module.md)
- [forward-look.md](../forward-look.md)
@@ -0,0 +1,24 @@
# ADR-019: JSON text for schema columns in SQLite

## Status

Accepted

## Context

SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native `jsonb`. The choice affects queryability and validation behavior.

## Decision

SQLite uses `text` with JSON mode for `schema`, `config`, `attributes`, `metadata`, and `allowedSourceTypes`/`allowedTargetTypes` columns. JSON validation relies on application-level TypeBox schemas, not database constraints. SQLite is for spokes (local, infrequent queries); PostgreSQL is for the hub (frequent, complex queries where `jsonb` queryability matters).

## Consequences

- SQLite cannot efficiently query inside JSON columns (no GIN indexes)
- All JSON validation is application-level (`Value.Check`)
- PostgreSQL gets queryability benefits from `jsonb` when implemented
- The dual-host strategy is appropriate: SQLite for local infrequent access, PG for hub-level querying

## References

- [sqlite-host.md](../sqlite-host.md)
26
docs/architecture/decisions/020-no-nodetypeid-on-nodes.md
Normal file
@@ -0,0 +1,26 @@
# ADR-020: No nodeTypeId on nodes

## Status

Accepted

## Context

Nodes could carry a direct FK to `node_types`, making "find all nodes of type X" queries trivial. Without it, the node type must be determined at the application layer.

## Decision

Nodes don't carry a `nodeTypeId` FK. The node type is enforced at the application layer. Adding a FK would duplicate the constraint already expressed in the graph type schema. Node types can evolve (schemas change) without requiring node row updates. The repository layer validates node attributes against the appropriate node type schema before insertion.

This may change if query performance requires filtering nodes by type. A `nodeTypeId` column can be added as a denormalized index later.

## Consequences

- Nodes table has no FK to `node_types`
- Filtering by node type requires application-layer logic or a repository query
- Schema evolution is simpler: no FK to update when node type schemas change
- A `nodeTypeId` column can be added as a denormalized index if needed

## References

- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,23 @@
# ADR-021: Edge identity uses consumer-defined keys

## Status

Accepted

## Context

Edges need unique identity within a graph. Auto-generated UUIDs vs consumer-defined keys affect how edges are referenced and updated.

## Decision

Edges use `(graphId, key)` as their unique identity, where `key` is consumer-defined. This matches the metagraph model where consumers control identifiers. For anonymous edges (common in simple graphs), `key` can be auto-generated.

## Consequences

- Consumers can reference edges by meaningful keys (e.g., `"call-001→call-002"`)
- Updates use the consumer's key, not a generated UUID
- Anonymous edges can have null or auto-generated keys

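The identity rule can be sketched with an in-memory map keyed by the `(graphId, key)` pair. This is only a stand-in for the edges table, not the repository API: `EdgeStore`, its methods, the `targetNodeKey` field, and the `anon-N` key format for anonymous edges are all assumptions made for illustration.

```typescript
interface Edge {
  graphId: string;
  key: string;
  sourceNodeKey: string;
  targetNodeKey: string;
}

// In-memory stand-in for the edges table, indexed by the composite identity.
class EdgeStore {
  private readonly edges = new Map<string, Edge>();
  private anonymousCounter = 0;

  // JSON-encode the pair so keys that contain separators cannot collide.
  private static identity(graphId: string, key: string): string {
    return JSON.stringify([graphId, key]);
  }

  // Upsert by (graphId, key). When the consumer supplies no key, one is
  // generated; the `anon-N` format is an assumption of this sketch.
  upsert(edge: Omit<Edge, "key"> & { key?: string }): Edge {
    const key = edge.key ?? `anon-${++this.anonymousCounter}`;
    const stored: Edge = { ...edge, key };
    this.edges.set(EdgeStore.identity(edge.graphId, key), stored);
    return stored;
  }

  find(graphId: string, key: string): Edge | undefined {
    return this.edges.get(EdgeStore.identity(graphId, key));
  }

  get size(): number {
    return this.edges.size;
  }
}
```

Writing the same `(graphId, key)` twice replaces the earlier edge rather than creating a second one, which is the behavior a unique constraint plus an upsert would give at the database level.
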
## References

- [sqlite-host.md](../sqlite-host.md)
@@ -0,0 +1,23 @@
# ADR-022: Composite foreign keys for node references

## Status

Accepted

## Context

Edges reference source and target nodes. A simple `sourceNodeId` FK would work but doesn't capture the composite key model.

## Decision

Edges reference nodes via composite FKs: `(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures referential integrity within a graph and cascades node deletions to connected edges.

## Consequences

- Deleting a node cascades to all its edges
- Edges cannot reference nodes in a different graph
- Composite FKs match the metagraph identity model `(graphId, key)`

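The two guarantees of the composite FK (same-graph endpoints and cascading deletes) can be mimicked in memory to make the semantics concrete. The sketch below is not the Drizzle table definition; `GraphTables` and its methods are hypothetical, and `targetNodeKey` is assumed by symmetry with the `sourceNodeKey` column named in the decision.

```typescript
interface NodeRow {
  graphId: string;
  key: string;
}

interface EdgeRow {
  graphId: string;
  key: string;
  sourceNodeKey: string;
  targetNodeKey: string;
}

// In-memory model of the nodes and edges tables.
class GraphTables {
  nodes: NodeRow[] = [];
  edges: EdgeRow[] = [];

  private hasNode(graphId: string, key: string): boolean {
    return this.nodes.some((n) => n.graphId === graphId && n.key === key);
  }

  insertNode(node: NodeRow): void {
    if (this.hasNode(node.graphId, node.key)) {
      throw new Error(`duplicate node (${node.graphId}, ${node.key})`);
    }
    this.nodes.push(node);
  }

  // Mimics the composite FK check: both endpoints must exist in the edge's own graph.
  insertEdge(edge: EdgeRow): void {
    const endpointsExist =
      this.hasNode(edge.graphId, edge.sourceNodeKey) &&
      this.hasNode(edge.graphId, edge.targetNodeKey);
    if (!endpointsExist) {
      throw new Error("edge endpoints must exist in the edge's own graph");
    }
    this.edges.push(edge);
  }

  // Mimics a cascading delete: removing a node removes every edge attached to it.
  deleteNode(graphId: string, key: string): void {
    this.nodes = this.nodes.filter((n) => !(n.graphId === graphId && n.key === key));
    this.edges = this.edges.filter(
      (e) => !(e.graphId === graphId && (e.sourceNodeKey === key || e.targetNodeKey === key)),
    );
  }
}
```

Because the edge's own `graphId` is part of both lookups, an edge in one graph has no way to point at a node in another, which is the second consequence listed above.
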
## References

- [sqlite-host.md](../sqlite-host.md)
25
docs/architecture/decisions/023-per-attribute-encryption.md
Normal file
@@ -0,0 +1,25 @@
# ADR-023: Per-attribute encryption, not per-node

## Status

Accepted

## Context

Encrypted data could be stored as an entire encrypted `attributes` blob or as individual encrypted attributes within the node. The choice affects queryability and schema design.

## Decision

The `EncryptedData` schema is a single attribute within a node type's attributes, not the entire node. Per-attribute encryption preserves queryability on non-sensitive fields — a secret node can have unencrypted metadata alongside the encrypted value. The node key (identity) is always readable for queries.

Encrypting the entire `attributes` column would make queries impossible (can't find "all secrets for client X" if the client reference is encrypted).

## Consequences

- Only sensitive payload is encrypted; identity and metadata remain queryable
- `EncryptedDataSchema` validates the encryption envelope at write time
- Different graph types can have encrypted attributes without special table definitions

## References

- [encrypted-data.md](../encrypted-data.md)
@@ -0,0 +1,26 @@
# ADR-024: Encrypted data as node type, not standalone table

## Status

Accepted

## Context

Encrypted data could be modeled as a standalone `secrets` table (like the hub's `client_secrets`) or as a node type within the metagraph.

## Decision

Encrypted data is modeled as a node type (`SecretNode`) in a graph, not a standalone table. Graphs already provide the structure — edges represent "client X has secret Y" without a join table. New secret types (OAuth, SSH, API keys) are new node types, not new columns or tables.

When a consumer needs high-throughput key lookups (e.g., the hub authenticating every request), a dedicated `api_keys` table with proper indexes is still appropriate. The metagraph pattern is for everything else.

## Consequences

- No dedicated encryption table — existing metagraph tables store everything
- "Find all secrets for client X" becomes "find all edges of type `has_secret` from node X"
- Domain flexibility — any graph can have encrypted nodes
- High-throughput auth paths should use a standalone table, not graph queries

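The "find all secrets for client X" lookup described above amounts to one edge filter followed by one node filter. A minimal sketch over plain arrays, assuming edges carry a `type` field and that `findSecretsForClient`, `GraphNode`, and `GraphEdge` are illustrative names rather than repository exports:

```typescript
interface GraphNode {
  graphId: string;
  key: string;
  attributes: Record<string, unknown>;
}

interface GraphEdge {
  graphId: string;
  type: string; // assumed field carrying the edge type name, e.g. "has_secret"
  sourceNodeKey: string;
  targetNodeKey: string;
}

// Follow `has_secret` edges out of the client node and collect the target nodes.
function findSecretsForClient(
  nodes: GraphNode[],
  edges: GraphEdge[],
  graphId: string,
  clientKey: string,
): GraphNode[] {
  const secretKeys = new Set(
    edges
      .filter((e) => e.graphId === graphId && e.type === "has_secret" && e.sourceNodeKey === clientKey)
      .map((e) => e.targetNodeKey),
  );
  return nodes.filter((n) => n.graphId === graphId && secretKeys.has(n.key));
}
```

The extra hop through the edge set is the cost the decision accepts in exchange for flexibility, and it is why request-path authentication is pointed at a dedicated indexed table instead.
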
## References

- [encrypted-data.md](../encrypted-data.md)
@@ -0,0 +1,26 @@
# ADR-025: Password-based encryption via PBKDF2

## Status

Accepted

## Context

Encryption can use PBKDF2 key derivation from a password string or direct AES-GCM with raw key bytes. The hub uses password-based encryption.

## Decision

Use the PBKDF2 pattern for consistency with the hub. The "password" in practice is a base64-encoded 32-byte random key from `generateEncryptionKey()`. PBKDF2 adds security per-encryption (each encryption gets a unique salt), but adds ~100ms latency per operation.

If performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2 for raw key inputs.

## Consequences

- Consistent with hub's crypto implementation
- ~100ms per encrypt/decrypt due to PBKDF2 iterations
- v1 key version uses 100,000 PBKDF2 iterations
- Web Crypto API only — no external crypto dependencies

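The PBKDF2-then-AES-GCM pattern can be sketched with the Web Crypto API alone. This is not the package's `encrypt()`/`decrypt()` implementation: the envelope field names, the base64 encoding, the SHA-256 hash, and the 16-byte salt and 12-byte IV sizes are all assumptions; only the 100,000 iteration count and the per-encryption salt come from this ADR.

```typescript
// Assumed envelope layout; the real `EncryptedData` schema may differ.
interface EnvelopeSketch {
  keyVersion: number;
  salt: string; // base64
  iv: string; // base64
  ciphertext: string; // base64
}

const PBKDF2_ITERATIONS = 100_000; // v1 iteration count from this ADR

// Small-buffer helpers; the spread is fine for the short values used here.
const toBase64 = (bytes: Uint8Array): string => btoa(String.fromCharCode(...bytes));
const fromBase64 = (text: string) => Uint8Array.from(atob(text), (c) => c.charCodeAt(0));

async function deriveKeySketch(password: string, salt: BufferSource): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    // The hash algorithm is an assumption; the ADR does not name one.
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encryptSketch(plaintext: string, password: string): Promise<EnvelopeSketch> {
  // Fresh salt and IV per call, so one key yields different ciphertexts each time.
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKeySketch(password, salt);
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return {
    keyVersion: 1,
    salt: toBase64(salt),
    iv: toBase64(iv),
    ciphertext: toBase64(new Uint8Array(ciphertext)),
  };
}

async function decryptSketch(envelope: EnvelopeSketch, password: string): Promise<string> {
  const key = await deriveKeySketch(password, fromBase64(envelope.salt));
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromBase64(envelope.iv) },
    key,
    fromBase64(envelope.ciphertext),
  );
  return new TextDecoder().decode(plaintext);
}
```

The derivation runs on every call because the salt changes every time, which is where the per-operation latency noted above comes from; a raw-key variant would import the key once and skip `deriveKeySketch` entirely.
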
## References

- [encrypted-data.md](../encrypted-data.md)
@@ -0,0 +1,24 @@
# ADR-026: Application-managed key ring

## Status

Accepted

## Context

Encryption keys need management — storage, rotation, version tracking. The storage package could manage key rings internally or leave this to the consuming application.

## Decision

The storage package provides `encrypt()`, `decrypt()`, and `generateEncryptionKey()` but does NOT manage the key ring. The consuming application stores keys in a secure location, loads them at startup, and passes the appropriate key based on `keyVersion`. Key rotation (decrypt with old key, re-encrypt with current key) is an application-level workflow.

## Consequences

- Storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- Encryption primitives are testable without a key ring implementation
- Key rotation is an application concern, not a storage concern

## References

- [encrypted-data.md](../encrypted-data.md)
23
docs/architecture/decisions/027-no-key-rotation-utility.md
Normal file
@@ -0,0 +1,23 @@
# ADR-027: No key rotation utility in this package

## Status

Accepted

## Context

Key rotation (decrypt with old key, re-encrypt with current key) is a necessary operation for long-lived encrypted data. The question is whether the storage package provides a rotation utility.

## Decision

Key rotation is an application-level workflow: find all nodes with `keyVersion < currentVersion`, decrypt with old key, encrypt with current key, update node. The storage package provides the building blocks (`encrypt()`, `decrypt()`, `EncryptedDataSchema`), not the rotation workflow. The hub's background sweep pattern is a good reference implementation.

## Consequences

- No rotation utility in this package — application orchestrates the workflow
- `keyVersion` field in `EncryptedData` enables rotation detection
- The building blocks (encrypt, decrypt, schema validation) are provided

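An application-side rotation pass might look roughly like the sketch below. It is not part of the storage package: `rotateKeys`, `CipherSketch`, and the `Map`-based key ring are hypothetical, the cipher functions are injected (and synchronous for brevity) because the real `encrypt()`/`decrypt()` signatures are not shown here, and the envelope is reduced to `keyVersion` plus `ciphertext`.

```typescript
interface EncryptedDataSketch {
  keyVersion: number;
  ciphertext: string;
}

interface SecretNodeSketch {
  key: string;
  attributes: { encryptedData: EncryptedDataSketch };
}

// Injected primitives; the real package's signatures are an assumption here.
interface CipherSketch {
  encrypt(plaintext: string, key: string): string;
  decrypt(ciphertext: string, key: string): string;
}

// Re-encrypt every node whose envelope is older than `currentVersion`.
// Returns how many nodes were rotated.
function rotateKeys(
  nodes: SecretNodeSketch[],
  keyRing: Map<number, string>, // application-managed: keyVersion -> key
  currentVersion: number,
  cipher: CipherSketch,
): number {
  const currentKey = keyRing.get(currentVersion);
  if (currentKey === undefined) {
    throw new Error(`key ring has no key for current version ${currentVersion}`);
  }
  let rotated = 0;
  for (const node of nodes) {
    const data = node.attributes.encryptedData;
    if (data.keyVersion >= currentVersion) continue; // already current
    const oldKey = keyRing.get(data.keyVersion);
    if (oldKey === undefined) {
      throw new Error(`key ring has no key for version ${data.keyVersion}`);
    }
    const plaintext = cipher.decrypt(data.ciphertext, oldKey);
    node.attributes.encryptedData = {
      keyVersion: currentVersion,
      ciphertext: cipher.encrypt(plaintext, currentKey),
    };
    rotated++;
  }
  return rotated;
}
```

A real sweep would load and update nodes through the repository inside a transaction; the loop itself stays the same, and running it twice is harmless because already-current nodes are skipped.
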
## References

- [encrypted-data.md](../encrypted-data.md)
@@ -0,0 +1,24 @@
# ADR-028: Additive-only for v1, Cast migration when needed

## Status

Accepted

## Context

Graph type schemas will evolve over time. Changes range from adding optional fields (non-breaking) to removing fields or changing types (breaking). The repository layer needs a strategy for handling schema changes.

## Decision

For v1, schema changes should be additive (new optional fields, new types, new enum values). This avoids data migration entirely. When additive-only is insufficient, `Value.Cast` handles common migration cases (adding fields with defaults, type narrowing). Custom migration functions are the consumer's responsibility for breaking changes.

## Consequences

- v1 schemas should only add optional fields and new types
- The `version` field on `graph_types` stays at 1 for additive changes
- Breaking changes require a version bump and migration strategy
- `Value.Cast` is available for common migration cases but is not a migration framework

## References

- [schema-evolution.md](../schema-evolution.md)
@@ -0,0 +1,24 @@
# ADR-029: Version as a coarse-grained breaking-change signal

## Status

Accepted

## Context

The `version` integer on `graph_types` could track every schema change or only breaking changes. Tracking every change is noisy; ignoring versioning creates risk.

## Decision

The `version` integer tracks **breaking** schema changes. Non-breaking changes (additive) do not require a version bump. The repository layer checks the version before processing and knows to run migration logic when it doesn't match. An even/odd scheme signals migration state: even = stable, odd = migration in progress.

## Consequences

- Most schema changes (additive) don't bump the version
- Breaking changes bump the version and trigger migration
- Odd versions signal incomplete migration for crash recovery
- The consumer defines when a version bump occurs via a constant

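The even/odd scheme can be expressed as three small functions. This is a sketch of one possible reading, not the repository's code: the function names are hypothetical, and the assumption that a breaking change moves an even version to the next odd value and then to the next even value (for example 2 to 3 to 4) is an interpretation of "even = stable, odd = migration in progress".

```typescript
type MigrationState = "stable" | "migrating";

// Even versions are stable; odd versions mean a migration started but did not finish.
function migrationState(version: number): MigrationState {
  return version % 2 === 0 ? "stable" : "migrating";
}

// Entering a migration moves a stable (even) version to the next odd value.
function beginMigration(version: number): number {
  if (migrationState(version) !== "stable") {
    throw new Error(`version ${version} is already mid-migration`);
  }
  return version + 1;
}

// Finishing a migration moves the odd value to the next even, stable value.
function completeMigration(version: number): number {
  if (migrationState(version) !== "migrating") {
    throw new Error(`version ${version} has no migration in progress`);
  }
  return version + 1;
}

// On startup, an odd stored version means the previous run crashed mid-migration.
function needsRecovery(storedVersion: number): boolean {
  return migrationState(storedVersion) === "migrating";
}
```

Persisting the odd value before touching any rows is what makes crash recovery possible: a process that restarts and reads an odd version knows the data may be partially migrated.
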
## References

- [schema-evolution.md](../schema-evolution.md)
@@ -0,0 +1,24 @@
# ADR-030: Schema change detection via Value.Diff, not manual tracking

## Status

Accepted

## Context

Schema changes need to be detected when a stored graph type's Module diverges from the current Module definition. This could be tracked via a separate changelog or detected from the schemas themselves.

## Decision

The repository layer uses `Value.Diff(storedSchema, moduleEntry)` to detect when a stored schema has diverged from the current Module entry. This is schema-agnostic and works for any change — no separate changelog or version log needed. The `Edit[]` output from `Value.Diff` can optionally be classified as breaking or non-breaking.

## Consequences

- No manual changelog to maintain
- Works for any TypeBox schema change
- Classification of edits (breaking vs non-breaking) can be added later
- `moduleToDbSchema()` is used to update stored schemas after migration

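The optional classification step could be sketched as follows. The `EditSketch` type is declared locally and only modeled on the insert/update/delete edits that `Value.Diff` produces; the classification rule itself (inserts are additive unless they touch a `required` array, everything else is breaking) is an assumption of this sketch, not something the ADR specifies.

```typescript
// Local stand-in modeled on the edit records produced by a structural diff.
type EditSketch =
  | { type: "insert"; path: string; value: unknown }
  | { type: "update"; path: string; value: unknown }
  | { type: "delete"; path: string };

type ChangeKind = "none" | "non-breaking" | "breaking";

// Matches paths such as "/required/2" or "/properties/user/required/0".
// A property literally named "required" also matches, so the rule errs
// on the side of reporting a breaking change.
const touchesRequired = (path: string): boolean => /\/required(\/|$)/.test(path);

// Assumed rule: only pure insertions outside `required` arrays count as additive.
function classifyEdits(edits: EditSketch[]): ChangeKind {
  if (edits.length === 0) return "none";
  const breaking = edits.some((e) => e.type !== "insert" || touchesRequired(e.path));
  return breaking ? "breaking" : "non-breaking";
}
```

Under this rule, adding an optional property reports "non-breaking", while removing a property, changing a type, or making a field required reports "breaking", which lines up with when ADR-029 calls for a version bump.
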
## References

- [schema-evolution.md](../schema-evolution.md)
@@ -0,0 +1,26 @@
# ADR-031: moduleToDbSchema() for schema updates, not Value.Patch

## Status

Accepted

## Context

When a graph type Module changes, the stored schema in `node_types.schema` must be updated. This could be done by patching the stored schema with `Value.Patch` or by re-running `moduleToDbSchema()` on the full Module.

## Decision

Re-run `moduleToDbSchema()` on the updated Module. Patch-based schema update (`Value.Patch`) is unreliable for `Type.Ref`/`$defs` changes — the diff is structural, not semantic, and may not capture `$defs` reorganization correctly. `moduleToDbSchema()` is more reliable because it always produces a fresh, complete projection.

Patch-based schema update is an optimization for later.

## Consequences

- Schema updates always go through `moduleToDbSchema()`
- The full Module must be available at migration time
- No risk of partial/partially-correct patches from `Value.Diff`
- Patch-based update can be added as an optimization if needed

## References

- [schema-evolution.md](../schema-evolution.md)
23
docs/architecture/decisions/032-single-author-not-crdt.md
Normal file
@@ -0,0 +1,23 @@
# ADR-032: Single-author model, not CRDT

## Status

Accepted

## Context

Schema evolution could follow a single-author model (one consumer defines the graph type) or a multi-author CRDT model (multiple consumers define the same type concurrently).

## Decision

Schema evolution assumes a single author per graph type. There is no concurrent multi-author editing. If this changes (multiple consumers defining the same graph type with different schemas), a CRDT layer would need to sit between the event stream and storage. That's a post-v1 concern.

## Consequences

- Idempotent replay is sufficient; CRDT merge semantics are not needed
- Schema evolution is forward-only (new code processes old data), not bidirectional
- Migration is apply-on-read or apply-on-write, not conflict resolution

## References

- [schema-evolution.md](../schema-evolution.md)
@@ -212,93 +212,15 @@ No external crypto dependencies.

## Design Decisions

### ED1: Per-attribute encryption, not per-node
All design decisions are documented as ADRs in [decisions/](decisions/).

The `EncryptedData` schema is a single attribute within a node type's
attributes, not the entire node. This means:

- A secret node can have unencrypted metadata alongside the encrypted value
- The node key (identity) is always readable for queries
- Only the sensitive payload is encrypted

**Alternative considered**: Encrypt the entire `attributes` column. This makes
queries impossible (you can't find "all secrets for client X" if the client
reference is encrypted). Per-attribute encryption preserves queryability on
non-sensitive fields.

### ED2: Node type, not standalone table

Encrypted data is modeled as a node type rather than a dedicated `secrets` table
because:

- **Graphs already provide the structure** — edges represent "client X has
secret Y" without a join table
- **No foreign key proliferation** — new secret types (OAuth, SSH, API keys) are
new node types, not new columns or tables
- **Uniform query patterns** — All graph queries work on secret nodes without
special code

**When a standalone table might be better**: If a consumer (like the hub) needs
to query "all active API keys" across all clients with a single indexed `WHERE`
clause, a dedicated `api_keys` table with proper indexes is faster. The graph
model requires traversing edges to find related secrets. For a hub's specific use
case (key lookup on every authenticated request), this matters. The metagraph
pattern is optimized for flexibility, not raw key-lookup performance. Consumers
should use standalone tables for authentication hot paths and the metagraph for
everything else.

### ED3: Password-based encryption, not raw-key encryption

The current implementation uses PBKDF2 to derive a key from a password string.
The "password" in practice is a base64-encoded 32-byte random key from
`generateEncryptionKey()`. This means:

- The key derivation step adds security even when the input is already
high-entropy (each encryption gets a unique salt, so the same key produces
different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2
iterations

**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be
much faster for high-throughput scenarios but removes the per-encryption salt
benefit (the IV still provides uniqueness for GCM). The hub uses password-based
because the config format is human-manageable key strings. For
`@alkdev/storage`, either approach works — the API accepts a "password" string
which could be a raw key encoded as base64.

**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If
performance becomes an issue, add a `encryptRaw()` function that skips PBKDF2
for raw key inputs.

### ED4: Application-managed key ring

The storage package provides `encrypt()` and `decrypt()` but does NOT manage the
key ring. The consuming application:

1. Stores encryption keys in a secure location (Docker secrets, vault, config
file with restricted permissions)
2. Loads keys at startup
3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
4. Handles key rotation (decrypt with old key, re-encrypt with current key)

This separation ensures:

- The storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- The encryption primitives are testable without a key ring implementation

### ED5: No key rotation utility in this package

Key rotation (decrypt with old key, re-encrypt with current key) is an
application-level workflow:

1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
2. For each: decrypt with old key → encrypt with current key → update node
3. Commit transaction

The storage package provides the building blocks (`encrypt()`, `decrypt()`,
`EncryptedDataSchema`), not the rotation workflow. The hub's background sweep
pattern is a good reference implementation.
| ADR | Decision | Summary |
|-----|----------|---------|
| [023](decisions/023-per-attribute-encryption.md) | Per-attribute encryption, not per-node | Only sensitive payload encrypted; key/metadata remain queryable |
| [024](decisions/024-encrypted-data-as-node-type.md) | Encrypted data as node type, not standalone table | No special tables; metagraph pattern with `SecretNode` and `HasSecretEdge` |
| [025](decisions/025-password-based-encryption-pbkdf2.md) | Password-based encryption via PBKDF2 | Consistent with hub; ~100ms per operation; `encryptRaw()` added later if needed |
| [026](decisions/026-application-managed-key-ring.md) | Application-managed key ring | Storage provides encrypt/decrypt primitives, not key management |
| [027](decisions/027-no-key-rotation-utility.md) | No key rotation utility in this package | Application orchestrates rotation; storage provides building blocks |

## Integration with SQLite Host

@@ -348,26 +270,12 @@ Crypto API and `@alkdev/typebox` for the schema).

## Open Questions

1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds
~100ms per operation. For batch secret operations (e.g., rotating 1000 keys),
this adds up. A `encryptRaw()` that skips PBKDF2 and uses the key directly
would be much faster. Decision: add in a future iteration if performance
demands it.
Open questions are tracked in [open-questions.md](open-questions.md). Key
questions affecting encrypted data:

2. **Should the `key` attribute on secret nodes be encrypted?** Currently only
the `encryptedData` attribute is encrypted. The `key` (secret name like
"api_key") is stored in plaintext for queryability. If secret names are
themselves sensitive, they could be hashed instead. Decision: plaintext key
names are acceptable for now. If needed, add a `keyHash` attribute for blind
lookups (similar to the hub's `api_keys.keyHash`).

3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class
columns?** The hub's `client_secrets` has these as columns for indexed
queries. In the metagraph model, they're attributes inside the node JSON.
SQLite can't efficiently index JSON properties. Decision: for spoke use
(occasional lookups), JSON attributes are fine. For hub use (high-throughput
key validation), a standalone `api_keys` table with proper indexes is still
needed.
- **OQ-07**: Should we add `encryptRaw()` for performance? (open, low priority)
- **OQ-08**: Should the `key` attribute on secret nodes be encrypted? (resolved: plaintext for now)
- **OQ-09**: Should secret nodes have `lastUsedAt` and `expiresAt` as first-class columns? (resolved: JSON attributes for spoke, standalone table for hub)

## References

@@ -1,5 +1,5 @@
---
status: draft
status: reviewed
last_updated: 2026-05-29
---

@@ -483,7 +483,7 @@ types, 2–5 edge types, 2–5 shared types, 2–5 constraint entries). `Value.C
cost scales with schema complexity, not Module size; only the resolved entry
schema is checked, not the entire Module.

The dereferenced entry strategy (DD6) means each DB row stores only its own
The dereferenced entry strategy ([ADR-014](decisions/014-dereferenced-entry-schemas.md)) means each DB row stores only its own
JSON Schema with transitive `$defs` — typically 1–3 KB per entry. A full graph
type's schemas total ~10–50 KB in the DB, negligible compared to node/edge data.

@@ -568,127 +568,39 @@ Acceptance criteria:

## Design Decisions

### DD1: TypeBox Module replaces the SchemaBuilder
All design decisions are documented as ADRs in [decisions/](decisions/).

Graph type definitions are `Type.Module` objects. The previous `SchemaBuilder`
class is removed — consumers use `Type.Module()` construction directly, with
`Type.Ref()`, `Type.Composite()`, and `Metagraph.Import()` as the building
blocks. The `moduleToDbSchema()` function replaces `SchemaBuilder.build()` as
the bridge from Module to DB rows.

This provides `Type.Ref()` for internal references, `Module.Import()` for
cross-package references, JSON Schema `$defs` that map directly to DB storage,
and codegen compatibility via `TsToModule.Generate()`.

### DD2: Metagraph.Import() for same-package Modules

Concrete graph types within `@alkdev/storage` use `Metagraph.Import("BaseNode")`
to compose base schemas. This avoids duplication and keeps the base schemas in
one place. External packages that define graph type Modules should re-declare
base schemas locally — storage should not be a dependency of other packages'
schema definitions.

### DD3: Config as a Module entry with Literal values

General `Metagraph.Config` uses `Type.Union` with defaults for construction-time
validation. Specific graph types freeze config values to `Type.Literal`, making
the config a precise contract rather than a validation surface.

### DD4: Node/edge attribute schemas are Module entries, not Type.Any()

At the application layer, node and edge attribute schemas are named Module
entries with full type safety (`CallGraph.CallNode`, not `schema: Type.Any()`).
At the DB storage layer, the meta-schemas (`NodeType`, `EdgeType`) still have
`schema: Type.Unknown()` because the DB stores arbitrary JSON Schema blobs.

### DD5: Storage produces graphology format, flowgraph consumes it

Storage doesn't need a graphology dependency. It produces the JSON serialization
format that `@alkdev/flowgraph`'s `FlowGraph.fromJSON()` and `SerializedGraph`
consume. The Module entries validate data flowing in both directions.

### DD6: Repository stores dereferenced entry schemas

When a Module entry uses `Module.Import()`, the entry's JSON Schema embeds the
referenced Module's `$defs`. To avoid storing the full referenced Module in
every DB row, the repository layer stores **dereferenced entry schemas** — each
`node_types` row gets its entry's resolved JSON Schema with just the transitive
`$defs` it needs, not the entire importing Module's definitions.

### DD7: Edge type constraints as named Module entries

Edge type constraints (`allowedSourceTypes`/`allowedTargetTypes`) are named
Module entries (e.g., `TriggeredEdgeConstraints`), not just DB columns. This
gives them schema validation (`Value.Check`) and serialization (JSON Schema
with `$defs`). The repository layer projects these entries to the existing
`edge_types` columns. The DB schema doesn't change — Module entries are the
source of truth, DB columns are the persistence projection.

### DD8: Naming convention for Module entries

Module entries use role-distinguishing suffixes: `*Node` for node types,
`*Edge` for edge types, `Config` for graph configuration, `*EdgeConstraints`
for edge endpoint constraints, and bare names or `*Enum` for shared types.
`moduleToDbSchema()` uses this convention to map entries to DB tables.

This was chosen over explicit metadata/decorators (e.g.,
`{ kind: "nodeType", name: "call", schema: ... }`) because the suffix convention
is simpler and sufficient for the expected Module size (5–20 entries).

### DD9: Pointer abstraction is forward-looking, not v1

The structural analogy between ujsx's `ValuePointer`/`selectNode`/`setNode` and
graph node/edge addressing is real, but implementing typed graph pointers (via
JPATH Module or reactive signals) is a post-v1 concern. For v1, repository
functions use direct key-based addressing (`findNode(graphId, nodeKey)`), and
the Module validates attribute shapes. See [forward-look.md](./forward-look.md).

### DD10: dbtype integration is post-v1

`@alkdev/dbtype`'s UJSX→Module→Host pipeline can eliminate the manual dual
definition of SQLite/PG table schemas. But dbtype is Phase 0 (architecture
complete, no implementation). For v1, storage uses manual Drizzle table
definitions. The Module-based graph type definitions are compatible with dbtype
because both produce `Type.Module` objects — the integration path is clear.
See [forward-look.md](./forward-look.md).
| ADR | Decision | Summary |
|-----|----------|---------|
| [009](decisions/009-typebox-module-replaces-schemabuilder.md) | TypeBox Module replaces SchemaBuilder | `SchemaBuilder` removed; `Type.Module()` construction is the API |
| [010](decisions/010-metagraph-import-for-same-package.md) | Metagraph.Import() for same-package | Use `Import()` within `@alkdev/storage`; local re-declaration for external packages |
| [011](decisions/011-config-as-module-entry-with-literal-values.md) | Config as Module entry with Literal values | General `Config` uses Unions; specific types freeze to Literals |
| [012](decisions/012-node-edge-attributes-as-module-entries.md) | Node/edge attributes as Module entries | Application layer: named entries. DB layer: `Type.Unknown()` for arbitrary schema blobs |
| [013](decisions/013-storage-produces-graphology-format.md) | Storage produces graphology format | No graphology dependency; produces JSON that flowgraph consumes |
| [014](decisions/014-dereferenced-entry-schemas.md) | Repository stores dereferenced entries | Each row stores its own schema + transitive `$defs`, not the full Module |
| [015](decisions/015-edge-constraints-as-named-entries.md) | Edge constraints as named Module entries | `*EdgeConstraints` entries are the source of truth; DB columns are projections |
| [016](decisions/016-naming-convention-for-module-entries.md) | Naming convention for entries | `*Node`, `*Edge`, `Config`, `*EdgeConstraints`, `*Enum` suffixes |
| [017](decisions/017-pointer-abstraction-is-forward-looking.md) | Pointer abstraction is forward-looking | v1 uses direct key-based addressing; typed pointers post-v1 |
| [018](decisions/018-dbtype-integration-is-post-v1.md) | dbtype integration is post-v1 | Manual Drizzle table defs for v1; Module format is compatible with future dbtype |

## Open Questions

1. **Should `@alkdev/flowgraph` export a `Type.Module`, or should storage define
its own entries with documented correspondence?** Flowgraph currently exports
`CallNodeAttrs` as a standalone `Type.Object`. To use `Import()`, flowgraph
needs to export a Module. Storage can start with standalone schemas and
`Type.Composite([BaseNode, CallNodeAttrs])` — no dependency on flowgraph.
Adopt `Import()` when flowgraph provides a Module. **This avoids a circular
dependency: `@alkdev/storage` does NOT depend on `@alkdev/flowgraph`.**
Open questions are tracked in [open-questions.md](open-questions.md). Key questions
affecting this document:

2. **Should concrete graph type Modules live in storage or in their respective
packages?** Call-graph attribute schemas are defined by flowgraph's domain, not
storage's. Storage provides the metagraph *framework* (the `Metagraph` Module
with `BaseNode`, `BaseEdge`, `Config`). Concrete types like `CallGraph` could
live either in storage (as reference implementations) or in their respective
packages. **Decision: Both.** Storage provides reference Modules in `modules/`
that consumers can use directly or replace. Flowgraph may also export a
Module — the two are compatible via Module `$defs`.

3. **Should `*EdgeConstraints` entries use `Type.Ref("CallNode")` or
`Type.String()` for allowed source/target types?** See the
[Edge Type Constraints](#edge-type-constraints) section. **Decision:
`Type.String()`** — the constraint arrays contain names, not schemas.

4. **How does the graph pointer abstraction interact with the repository layer?**
For v1, repository functions use direct key-based addressing. **Decision:
validate on read** — if data doesn't match the Module entry, throw. This
makes any value retrieved from the repo conform to the schema.
- **OQ-01**: Should `@alkdev/flowgraph` export a `Type.Module`? (partially resolved — storage can start with standalone schemas)
- **OQ-02**: Should concrete graph type Modules live in storage or their packages? (resolved — both)
- **OQ-05**: Should `*EdgeConstraints` use `Type.Ref` or `Type.String`? (resolved — `Type.String()`)
- **OQ-06**: Pointer abstraction vs repository layer? (resolved — direct key-based addressing for v1)

## References

- ADRs: [decisions/](decisions/)
- Open questions: [open-questions.md](open-questions.md)
- Package overview: [overview.md](overview.md)
- Schema evolution: [schema-evolution.md](schema-evolution.md)
- Forward-looking connections: [forward-look.md](forward-look.md)
- ujsx schema (proven Module pattern): `/workspace/@alkdev/ujsx/src/core/schema.ts`
- ujsx ADR-002 (Module as type registry): `/workspace/@alkdev/ujsx/docs/architecture/decisions/002-typebox-module-as-registry.md`
- ujsx schema docs: `/workspace/@alkdev/ujsx/docs/architecture/schema.md`
- ujsx ADR-002: `/workspace/@alkdev/ujsx/docs/architecture/decisions/002-typebox-module-as-registry.md`
- TsToModule codegen: `/workspace/research/typebox_research/codegen/ts-to-module.ts`
- Flowgraph schema (standalone TypeBox, not yet Module): `/workspace/@alkdev/flowgraph/src/schema/`
- Flowgraph SerializedGraph factory: `/workspace/@alkdev/flowgraph/src/schema/graph.ts`
- Schema evolution: [schema-evolution.md](./schema-evolution.md)
- Forward-looking connections: [forward-look.md](./forward-look.md)
|
||||
- Package overview: [overview.md](./overview.md)
|
||||
- Flowgraph schema: `/workspace/@alkdev/flowgraph/src/schema/`
|
||||
154
docs/architecture/open-questions.md
Normal file
@@ -0,0 +1,154 @@
---
status: draft
last_updated: 2026-05-29
---

# Open Questions Tracker

Cross-cutting compilation of all unresolved questions across the storage architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.

When a question is resolved, update its status to `resolved` and add a resolution note. Once all questions in a theme are resolved, the theme section can be removed and the resolution noted in the relevant ADR.

## How to Use This Document

- Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority**
- **Cross-references** link related questions and ADRs
- Resolved questions have a **resolution** note

## ADR Impact

| ADR | Resolves |
|-----|----------|
| ADR-003 | OQ-01 (partial — storage can start without flowgraph Module) |
| ADR-015 | OQ-05 (constraint semantics) |
| ADR-020 | OQ-02 (no nodeTypeId for now, can add later) |

## Theme 1: Package Boundaries and Dependencies

### OQ-01: Should @alkdev/flowgraph export a Type.Module, or should storage define its own entries with documented correspondence?

- **Origin**: [metagraph-module.md](metagraph-module.md)
- **Status**: partially resolved
- **Priority**: high
- **Notes**: Storage can start with standalone schemas and `Type.Composite([BaseNode, CallNodeAttrs])` — no dependency on flowgraph. Adopt `Import()` when flowgraph provides a Module. This avoids a circular dependency: `@alkdev/storage` does NOT depend on `@alkdev/flowgraph`.
- **Cross-references**: ADR-003, ADR-010

### OQ-02: Should concrete graph type Modules live in storage or in their respective packages?

- **Origin**: [metagraph-module.md](metagraph-module.md)
- **Status**: resolved
- **Priority**: medium
- **Resolution**: Both. Storage provides reference Modules in `modules/` that consumers can use directly or replace. Flowgraph may also export a Module — the two are compatible via Module `$defs`.
- **Cross-references**: ADR-003

## Theme 2: Data Model

### OQ-03: Should actors be a node type or a standalone table?

- **Origin**: [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Notes**: Currently `actors` is a standalone table with no relations. If identity/authentication is a graph (ACL nodes based on `@alkdev/operations`' `Identity` interface), actors become node types. If identity needs special query patterns (auth lookups, session joins), standalone tables may be better. Decision deferred until ACL design.
- **Cross-references**: ADR-024, [encrypted-data.md](encrypted-data.md)

### OQ-04: Should the repository layer be host-specific or host-agnostic?

- **Origin**: [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Notes**: A host-agnostic repository requires an abstraction over Drizzle's query builder. A host-specific repository is simpler but means duplicating query logic for PG. Decision: start host-specific in SQLite, extract common patterns later.
- **Cross-references**: [sqlite-host.md](sqlite-host.md)

### OQ-05: Should *EdgeConstraints entries use Type.Ref or Type.String for allowed source/target types?

- **Origin**: [metagraph-module.md](metagraph-module.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: `Type.String()` — the constraint arrays contain node type names, not node type schemas.
- **Cross-references**: ADR-015

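The OQ-05 resolution can be illustrated with a dependency-free sketch. It assumes a constraint entry is just two arrays of node type names; the field names `allowedSourceTypes` and `allowedTargetTypes`, the `edgeIsAllowed` helper, and the `SecretNode` name used in the usage note are hypothetical and are not taken from the package.

```typescript
// Hypothetical shape: constraint entries hold node type *names*, not schemas,
// so plain strings are enough (the role Type.String() plays in the Module).
interface EdgeConstraints {
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

// Illustrative constraint for a "triggered" edge between call nodes.
const triggeredEdgeConstraints: EdgeConstraints = {
  allowedSourceTypes: ["CallNode"],
  allowedTargetTypes: ["CallNode"],
};

// Returns true when both endpoint type names are permitted by the constraint.
function edgeIsAllowed(
  constraints: EdgeConstraints,
  sourceType: string,
  targetType: string,
): boolean {
  return (
    constraints.allowedSourceTypes.includes(sourceType) &&
    constraints.allowedTargetTypes.includes(targetType)
  );
}
```

Because the arrays carry names only, the check is a string lookup: `edgeIsAllowed(triggeredEdgeConstraints, "CallNode", "SecretNode")` is rejected without resolving any schema.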
### OQ-06: How does the graph pointer abstraction interact with the repository layer?

- **Origin**: [metagraph-module.md](metagraph-module.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: For v1, repository functions use direct key-based addressing. Validate on read — if data doesn't match the Module entry, throw. Typed pointers are post-v1 (ADR-017).
- **Cross-references**: ADR-017, [forward-look.md](forward-look.md)

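A minimal sketch of the validate-on-read behaviour described in OQ-06, under stated assumptions: the store is an in-memory `Map`, and the validator is an injected type guard standing in for a check against the Module entry. `readNode`, `nodeAddress`, `StoredNode`, and `CallAttrs` are hypothetical names, not the repository API.

```typescript
// Hypothetical validator signature; in the real package this role would be
// played by a schema check against the Module entry.
type Validator<T> = (value: unknown) => value is T;

interface StoredNode {
  graphId: string;
  key: string;
  attributes: unknown;
}

// Direct key-based addressing: a node is identified by (graphId, key).
function nodeAddress(graphId: string, key: string): string {
  return JSON.stringify([graphId, key]);
}

// Reads a node's attributes and throws if they do not match the schema,
// so any value returned to the caller is known to conform.
function readNode<T>(
  store: Map<string, StoredNode>,
  graphId: string,
  key: string,
  isValid: Validator<T>,
): T | undefined {
  const row = store.get(nodeAddress(graphId, key));
  if (row === undefined) return undefined;
  const attrs = row.attributes;
  if (!isValid(attrs)) {
    throw new Error(`node ${graphId}/${key} does not match its schema`);
  }
  return attrs;
}

// Hypothetical attribute shape used only for this example.
interface CallAttrs {
  operation: string;
}

const isCallAttrs = (v: unknown): v is CallAttrs =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { operation?: unknown }).operation === "string";
```

Throwing at the read boundary means callers never have to re-check shapes, at the cost of surfacing stale or malformed rows as errors rather than silently returning them.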
## Theme 3: Encryption and Security

### OQ-07: Should we add encryptRaw() for performance?

- **Origin**: [encrypted-data.md](encrypted-data.md)
- **Status**: open
- **Priority**: low
- **Notes**: PBKDF2 derivation adds ~100ms per operation. For batch operations (e.g., rotating 1000 keys), this adds up to roughly 100 s of derivation time. An `encryptRaw()` that skips PBKDF2 would be much faster. Decision: add in a future iteration if performance demands it.

### OQ-08: Should the key attribute on secret nodes be encrypted?

- **Origin**: [encrypted-data.md](encrypted-data.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: Plaintext key names are acceptable for now. If secret names are sensitive, add a `keyHash` attribute for blind lookups.

### OQ-09: Should secret nodes have lastUsedAt and expiresAt as first-class columns?

- **Origin**: [encrypted-data.md](encrypted-data.md)
- **Status**: resolved
- **Priority**: low
- **Resolution**: For spoke use (occasional lookups), JSON attributes are fine. For hub use (high-throughput key validation), a standalone `api_keys` table with proper indexes is still needed.

## Theme 4: Schema Evolution

### OQ-10: Can Value.Diff Edit[] be reliably classified as breaking vs non-breaking?

- **Origin**: [schema-evolution.md](schema-evolution.md)
- **Status**: open
- **Priority**: high
- **Notes**: The classification table in schema-evolution.md is theoretical. A POC should validate whether `Edit[]` output contains enough information to distinguish `String → Literal("x")` (narrowing, non-breaking) from `String → Number` (incompatible, breaking). Alternative: skip classification and just use `Value.Check(newSchema, storedData)` for verification.

### OQ-11: Should the repository layer auto-migrate data on schema change, or require explicit consumer action?

- **Origin**: [schema-evolution.md](schema-evolution.md)
- **Status**: open
- **Priority**: high
- **Notes**: Conditional on OQ-10 POC outcome. If classification is feasible, the repository layer auto-applies `Value.Cast` for non-breaking changes and requires explicit consumer action for breaking changes. If classification is not feasible, the repository layer auto-applies `Value.Cast` only when `Value.Check(newSchema, storedData)` passes for all stored data.

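The fallback branch of OQ-11 (verification instead of classification) could be sketched as follows. The `check` predicate stands in for `Value.Check(newSchema, value)` and is injected so the sketch stays dependency-free; `planMigration` and `MigrationPlan` are hypothetical names, and this is one possible reading of the rule rather than the decided implementation.

```typescript
type MigrationPlan = "auto-cast" | "explicit-consumer-action";

// Auto-migration is only chosen when *every* stored value already passes the
// new schema's check; a single failure hands the decision to the consumer.
function planMigration(
  storedData: unknown[],
  check: (value: unknown) => boolean,
): MigrationPlan {
  return storedData.every(check) ? "auto-cast" : "explicit-consumer-action";
}

// Hypothetical "new schema": an object with a string `name`.
const hasStringName = (v: unknown): boolean =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { name?: unknown }).name === "string";
```

With this shape, `planMigration([{ name: "a" }, { name: 1 }], hasStringName)` yields `"explicit-consumer-action"`, because one row fails verification.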
### OQ-12: How does schema evolution interact with the hub's event-sourced call graph?

- **Origin**: [schema-evolution.md](schema-evolution.md)
- **Status**: open
- **Priority**: medium
- **Notes**: If the hub migrates to event-sourced replay (projector evolution), storage's call graph tables become disposable projections. But other graph types (ACL, tasks, secrets) may not have an event stream to replay from. The schema evolution design should work for both projections and direct-persisted data.

### OQ-13: Should schema evolution events be part of the event stream?

- **Origin**: [schema-evolution.md](schema-evolution.md)
- **Status**: open
- **Priority**: low
- **Notes**: Post-v1. For v1, schema changes are applied directly via the repository layer with version tracking.

## Theme 5: Encrypted Data Scope

### OQ-14: Should encryption be per-attribute, per-node, or per-graph?

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Priority**: high
- **Resolution**: Per-attribute. The `EncryptedData` schema is a single attribute within a node type, not the entire node. This preserves queryability on non-sensitive fields (ADR-023).

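To show why per-attribute encryption keeps non-sensitive fields queryable, here is a small sketch. The `EncryptedBlob` type is a hypothetical stand-in for the `EncryptedData` attribute (its real fields are not shown in this document), and `SecretNode`, `label`, and `findByLabel` are illustrative names only. No actual encryption is performed.

```typescript
// Hypothetical stand-in for the EncryptedData attribute; opaque to queries.
interface EncryptedBlob {
  ciphertext: string;
}

// Only `value` is encrypted; `key` and `label` stay plaintext.
interface SecretNode {
  key: string;
  attributes: {
    label: string;
    value: EncryptedBlob;
  };
}

const secretNodes: SecretNode[] = [
  { key: "db-password", attributes: { label: "database", value: { ciphertext: "c1" } } },
  { key: "api-token", attributes: { label: "api", value: { ciphertext: "c2" } } },
];

// Filtering works on plaintext attributes without touching any ciphertext.
function findByLabel(nodes: SecretNode[], label: string): SecretNode[] {
  return nodes.filter((n) => n.attributes.label === label);
}
```

Had the whole `attributes` blob been encrypted (the per-node option), the same lookup would require decrypting every node first.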
### OQ-15: Should key management be in this package?

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Priority**: high
- **Resolution**: No. `@alkdev/storage` provides encryption/decryption primitives but NOT key management. The consuming application provides the key ring (ADR-026).

## Theme 6: Repository Layer

### OQ-16: Should the repository layer live in @alkdev/storage or in a consumer package?

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Priority**: high
- **Resolution**: The repository CRUD layer (host-specific typed queries, schema validation before writes) belongs in `@alkdev/storage`. The operations bridging layer (generating `OperationSpec`s from metagraph schemas) belongs in a consumer or adapter package. These are separate concerns — CRUD is a storage concern; call protocol integration is an application concern.

@@ -1,6 +1,6 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-05-28
|
||||
status: reviewed
|
||||
last_updated: 2026-05-29
|
||||
---
|
||||
|
||||
# @alkdev/storage — Overview
|
||||
@@ -69,64 +69,18 @@ the main `mod.ts` re-exports it. Importing from either `@alkdev/storage` or
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### D1: Deno-first, JSR publishes, npm comes free
|
||||
All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
|
||||
The package is published to JSR (`deno publish`). npm compatibility is automatic
|
||||
via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
|
||||
|
||||
### D2: Metagraph over domain-specific tables
|
||||
|
||||
Instead of a table per domain concept (call graphs, ACL rules, task trees), we
|
||||
define graph types with typed node and edge schemas. A "call graph" is a graph
|
||||
type with specific node types (operation call, subcall) and edge types
|
||||
(triggered, depends_on). An "ACL graph" is a graph type with node types
|
||||
(account, resource) and edge types (can_read, can_write).
|
||||
|
||||
This trades some query convenience for generality. Domain-specific queries are
|
||||
built on top of the graph query layer, not baked into table schemas.
|
||||
|
||||
### D3: Type.Module as the primary API surface
|
||||
|
||||
The `Type.Module()` construction API is the intended way to define graph type
|
||||
definitions. The `Metagraph` Module provides base entries (`BaseNode`,
|
||||
`BaseEdge`, `Config`); concrete graph types compose them via `Metagraph.Import()`
|
||||
and `Type.Composite()`. The `SchemaBuilder` is removed.
|
||||
|
||||
This replaces the earlier fluent builder pattern. The Module format provides
|
||||
native `Type.Ref()` for internal references, `Module.Import()` for cross-package
|
||||
references, and JSON Schema `$defs` that map directly to DB storage.
|
||||
|
||||
### D4: Injectable clients, no module-level side effects
|
||||
|
||||
`createSqliteDatabase(client)` receives a pre-created client. Module-level side
|
||||
effects (auto-connections, env-based configuration) are forbidden. This enables
|
||||
testing with in-memory databases and containerized deployment patterns.
|
||||
|
||||
### D5: Drizzle + TypeBox (via drizzlebox) as the table definition pattern
|
||||
|
||||
Drizzle table definitions are the single source of truth for database schema.
|
||||
`@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from
|
||||
Drizzle tables, enabling runtime validation without manual schema duplication.
|
||||
|
||||
### D6: Enumeration pattern — `as const` objects, not TypeScript enums
|
||||
|
||||
All enumerations use the `as const` object pattern (e.g.,
|
||||
`GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript
|
||||
`enum`. This avoids JSR slow-type issues and provides a consistent pattern
|
||||
across the codebase. The TypeBox schemas use `Type.Union` of `Type.Literal`
|
||||
values derived from the const object.
|
||||
|
||||
### D7: No comments in code
|
||||
|
||||
Per project convention across @alkdev packages, source files contain no inline
|
||||
comments. Documentation lives in architecture docs and TypeBox schema
|
||||
descriptions.
|
||||
|
||||
### D8: Common columns pattern
|
||||
|
||||
All tables share `id` (text PK), `metadata` (JSON text defaulting to `{}`),
|
||||
`createdAt`, and `updatedAt` (integer timestamps in SQLite, will be timestamptz
|
||||
in PG). This ensures every row has auditability and extensibility.
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [001](decisions/001-deno-first-jsr-publishes.md) | Deno-first, JSR publishes | Published to JSR; npm comes free via `@jsr/alkdev__storage` |
| [002](decisions/002-metagraph-over-domain-tables.md) | Metagraph over domain-specific tables | 6 general-purpose tables serve all domains |
| [003](decisions/003-typebox-module-as-api-surface.md) | TypeBox Module as API surface | `Type.Module()` replaces `SchemaBuilder`; `Metagraph.Import()` + `Type.Composite()` |
| [004](decisions/004-injectable-clients-no-side-effects.md) | Injectable clients, no side effects | `createSqliteDatabase(client)` takes a pre-created client |
| [005](decisions/005-drizzle-plus-typebox-via-drizzlebox.md) | Drizzle + TypeBox via drizzlebox | Drizzle tables are single source of truth; drizzlebox generates TypeBox schemas |
| [006](decisions/006-enum-pattern-as-const-objects.md) | `as const` objects, not TypeScript enums | Avoids JSR slow-types; consistent pattern across codebase |
| [007](decisions/007-no-comments-in-code.md) | No comments in code | Documentation lives in architecture docs and TypeBox descriptions |
| [008](decisions/008-common-columns-pattern.md) | Common columns pattern | `id`, `metadata`, `createdAt`, `updatedAt` on every table |
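The injectable-client decision (ADR-004) can be sketched without any database driver. Everything here is an assumption for illustration: `SqlClient` is a hypothetical minimal client surface, and `createDatabase` only mirrors the shape of `createSqliteDatabase(client)`; it is not the package's factory.

```typescript
// Hypothetical minimal client surface; the real SQLite client type differs.
interface SqlClient {
  execute(sql: string): string[];
}

// The factory receives a pre-created client. It opens no connection and
// reads no environment variables, so importing it has no side effects.
function createDatabase(client: SqlClient) {
  return {
    listTables: (): string[] =>
      client.execute("SELECT name FROM sqlite_master"),
  };
}

// In a test, an in-memory fake is injected instead of a real connection.
const executedSql: string[] = [];
const fakeClient: SqlClient = {
  execute(sql: string): string[] {
    executedSql.push(sql);
    return ["graph_types", "nodes"];
  },
};

const db = createDatabase(fakeClient);
```

Because the client is passed in, the same factory can be pointed at an in-memory database in tests and at a file-backed or containerized one in deployment.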
|
||||
|
||||
## Dependencies
|
||||
|
||||
@@ -256,48 +210,14 @@ storage node attributes and operations call events), they should either:
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Should `actors` be a node type or a standalone table?** Currently `actors`
|
||||
is a standalone table in the SQLite host that isn't referenced by any
|
||||
relation. If identity/authentication is a graph (ACL nodes based on
|
||||
`@alkdev/operations`'s `Identity` interface), actors become node types. If
|
||||
identity is a domain concept that needs special query patterns (auth lookups,
|
||||
session joins), standalone tables may be better. Decision: defer until ACL
|
||||
design, informed by `@alkdev/operations`'s `AccessControl` model.
|
||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||
questions affecting this package:
|
||||
|
||||
2. **Should the repository layer be host-specific or host-agnostic?** A
|
||||
host-agnostic repository (insert graph, find nodes by type) requires an
|
||||
abstraction over Drizzle's query builder. A host-specific repository is
|
||||
simpler but means duplicating query logic for PG. Decision: start
|
||||
host-specific in SQLite, extract common patterns later.
|
||||
|
||||
3. **Encrypted data scope**: Should encryption be per-attribute, per-node, or
|
||||
per-graph? Per-attribute (like hub's `client_secrets.value`) allows selective
|
||||
encryption. Per-node encrypts the entire `attributes` blob. Per-graph is
|
||||
overkill. Decision: per-attribute, modeled as an encrypted node type with a
|
||||
dedicated attribute for the ciphertext.
|
||||
|
||||
4. **Key management scope**: `@alkdev/storage` should provide the
|
||||
encryption/decryption primitives but NOT key management. The consuming
|
||||
application provides the key ring. This keeps the storage package agnostic to
|
||||
deployment-specific secret management.
|
||||
|
||||
5. **Schema evolution strategy**: When graph type schemas evolve (new node types,
|
||||
changed attribute schemas), how are changes detected and data migrated?
|
||||
TypeBox's `Value.Diff` can diff schemas-as-JSON to detect changes,
|
||||
`Value.Cast` can migrate data shapes, and `Value.Check` can verify
|
||||
compatibility. The `version` field on `graph_types` tracks breaking changes.
|
||||
See [schema-evolution.md](./schema-evolution.md) for the full design.
|
||||
|
||||
6. **~~Should the repository layer live in `@alkdev/storage` or in a consumer
|
||||
package?~~** Decision: the repository CRUD layer (host-specific typed
|
||||
queries, schema validation before writes) belongs in `@alkdev/storage`. The
|
||||
operations bridging layer (generating `OperationSpec`s from metagraph schemas)
|
||||
belongs in a consumer or adapter package. These are separate concerns — CRUD
|
||||
is a storage concern; call protocol integration is an application concern.
|
||||
The repository layer in `@alkdev/storage` has **no dependency on
|
||||
`@alkdev/operations`**. It performs typed inserts, finds, updates, and
|
||||
deletes with schema validation. The consumer then wires these CRUD functions
|
||||
into the operations registry if desired.
|
||||
- **OQ-03**: Should actors be a node type or a standalone table? (open, deferred to ACL design)
|
||||
- **OQ-04**: Should the repository layer be host-specific or host-agnostic? (open, start host-specific)
|
||||
- **OQ-14**: Should encryption be per-attribute, per-node, or per-graph? (resolved: per-attribute)
|
||||
- **OQ-15**: Should key management be in this package? (resolved: no, application provides key ring)
|
||||
- **OQ-16**: Should the repository layer live in storage or a consumer package? (resolved: CRUD in storage, operations bridging in consumer)
|
||||
|
||||
## References
|
||||
|
||||
|
||||
@@ -475,77 +475,25 @@ DB) avoids this problem — the stored schema is already dereferenced.
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### SE1: Additive-only for v1, Cast migration when needed
|
||||
All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
|
||||
For v1, schema changes should be additive (new optional fields, new types,
|
||||
new enum values). This avoids data migration entirely. When additive-only is
|
||||
insufficient, `Value.Cast` handles the common migration cases. Custom
|
||||
migration functions are the consumer's responsibility.
|
||||
|
||||
### SE2: Version as a coarse-grained breaking-change signal
|
||||
|
||||
The `version` integer on `graph_types` tracks **breaking** schema changes.
|
||||
Non-breaking changes (additive) do not require a version bump. This is a
|
||||
coarse signal — the repository layer checks version before processing and
|
||||
knows to run migration logic when it doesn't match.
|
||||
|
||||
### SE3: Schema change detection via Value.Diff, not manual tracking
|
||||
|
||||
Rather than maintaining a separate "schema version log" or changelog, the
|
||||
repository layer uses `Value.Diff(storedSchema, moduleEntry)` to detect when
|
||||
a stored schema has diverged from the current Module entry. This is
|
||||
schema-agnostic and works for any change.
|
||||
|
||||
### SE4: moduleToDbSchema() for schema updates, not Value.Patch
|
||||
|
||||
When updating stored schemas, re-run `moduleToDbSchema()` on the full Module
|
||||
rather than using `Value.Patch` to apply edits. This is more reliable because
|
||||
it doesn't depend on Diff correctly capturing `Type.Ref`/`$defs` changes.
|
||||
Patch-based schema update is an optimization for later.
|
||||
|
||||
### SE5: Single-author model, not CRDT
|
||||
|
||||
Schema evolution assumes single-author per graph type. There is no concurrent
|
||||
multi-author editing of graph types. If this changes (multiple consumers
|
||||
defining the same graph type with different schemas), a merge/CRDT layer would
|
||||
be needed. That's a post-v1 concern.
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [028](decisions/028-additive-only-with-cast-migration.md) | Additive-only for v1, Cast migration when needed | Additive changes avoid migration; `Value.Cast` for common cases |
| [029](decisions/029-version-as-breaking-change-signal.md) | Version as a coarse-grained breaking-change signal | Only breaking changes bump the version; even/odd for migration state |
| [030](decisions/030-schema-change-detection-via-diff.md) | Schema change detection via Value.Diff | No manual changelog; diff stored vs current schemas |
| [031](decisions/031-moduletodbschema-for-updates.md) | moduleToDbSchema() for schema updates | Re-run full Module projection, not Value.Patch |
| [032](decisions/032-single-author-not-crdt.md) | Single-author model, not CRDT | No concurrent multi-author graph types |
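The idea behind ADR-030 (detect divergence by comparing the stored schema with the current one, rather than keeping a changelog) can be approximated with a dependency-free sketch. This is a stand-in, not the documented mechanism: the docs use `Value.Diff(storedSchema, moduleEntry)`, whereas the hypothetical `schemaHasDiverged` below only compares canonical JSON and reports whether anything changed, not what changed.

```typescript
// Canonical JSON with sorted object keys, so key order is not seen as a change.
// Assumes JSON-serializable input, which holds for stored JSON Schema.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// True when the stored schema no longer matches the current Module entry.
function schemaHasDiverged(stored: unknown, current: unknown): boolean {
  return canonical(stored) !== canonical(current);
}

// Hypothetical stored schema and an additive change (a new optional field).
const storedSchema = {
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
};
const currentSchema = {
  type: "object",
  properties: { name: { type: "string" }, note: { type: "string" } },
  required: ["name"],
};
```

A boolean is enough to trigger a re-run of the full Module projection; deciding whether the change is breaking is a separate question (see OQ-10).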
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Can `Edit[]` from `Value.Diff` be reliably classified as breaking vs
|
||||
non-breaking?** The classification table above is theoretical. A POC should
|
||||
validate whether the `Edit[]` output contains enough information to
|
||||
distinguish, for example, `String → Literal("x")` (narrowing, non-breaking)
|
||||
from `String → Number` (incompatible, breaking). Alternative: skip
|
||||
classification and just use `Value.Check(newSchema, storedData)` for
|
||||
verification.
|
||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||
questions affecting schema evolution:
|
||||
|
||||
2. **Should the repository layer auto-migrate data on schema change, or
|
||||
require explicit consumer action?** Auto-migration is simpler for consumers
|
||||
but risky (data transformation without consumer awareness). Explicit
|
||||
migration is safer but more boilerplate. **Decision (conditional on OQ1
|
||||
POC outcome):** if classification is feasible (OQ1 POC succeeds), the
|
||||
repository layer auto-applies `Value.Cast` for changes it classifies as
|
||||
non-breaking, and requires explicit consumer action for breaking changes.
|
||||
If classification is not feasible, the fallback is: the repository layer
|
||||
auto-applies `Value.Cast` only when `Value.Check(newSchema, storedData)`
|
||||
passes for all stored data (verification, not classification), and requires
|
||||
explicit consumer action otherwise. This ensures auto-migration never
|
||||
corrupts data — if in doubt, the consumer decides.
|
||||
|
||||
3. **How does this interact with the hub's event-sourced call graph
|
||||
persistence?** If the hub migrates to event-sourced replay (projector
|
||||
evolution), storage's call graph tables become disposable projections and
|
||||
`Value.Cast` migration is unnecessary. But other graph types (ACL, tasks,
|
||||
secrets) may not have an event stream to replay from. The schema evolution
|
||||
design should work for both projections and direct-persisted data.
|
||||
|
||||
4. **Should schema evolution events be part of the event stream?** If the
|
||||
system is event-sourced, schema changes themselves could be events
|
||||
(`schema.updated`, `schema.version_bumped`). This would give a full audit
|
||||
trail of schema evolution, but adds complexity. **Decision: post-v1.** For
|
||||
v1, schema changes are applied directly via the repository layer with version
|
||||
tracking.
|
||||
- **OQ-10**: Can `Edit[]` from `Value.Diff` be reliably classified as breaking vs non-breaking?
|
||||
- **OQ-11**: Should the repository layer auto-migrate data on schema change, or require explicit consumer action?
|
||||
- **OQ-12**: How does schema evolution interact with the hub's event-sourced call graph persistence?
|
||||
- **OQ-13**: Should schema evolution events be part of the event stream?
|
||||
|
||||
## References
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
status: draft
|
||||
last_updated: 2026-05-28
|
||||
status: reviewed
|
||||
last_updated: 2026-05-29
|
||||
---
|
||||
|
||||
# SQLite Host
|
||||
@@ -260,53 +260,16 @@ Drizzle database instance with the full schema attached. This enables:
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### SD1: JSON text vs. JSONB in SQLite
|
||||
All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
|
||||
SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native
|
||||
`jsonb`. This means:
|
||||
|
||||
- SQLite cannot query inside JSON columns efficiently (no GIN indexes)
|
||||
- SQLite JSON validation relies on application-level checks (TypeBox schemas)
|
||||
- PostgreSQL will get queryability benefits for JSON columns
|
||||
|
||||
The trade-off: SQLite is for spokes (local, infrequent queries), PostgreSQL is
|
||||
for the hub (frequent, complex queries).
|
||||
|
||||
### SD2: No `nodeTypeId` on nodes
|
||||
|
||||
Nodes don't carry a direct FK to `node_types`. The node type is enforced at the
|
||||
application layer. Reasons:
|
||||
|
||||
- Graph type schemas define which node types are valid. Adding a FK would
|
||||
duplicate this constraint.
|
||||
- Node types can evolve (schemas can change) without requiring node row updates.
|
||||
- The repository layer validates node attributes against the appropriate node
|
||||
type schema before insertion.
|
||||
|
||||
This may change if query performance requires filtering nodes by type. A
|
||||
`nodeTypeId` column can be added as a denormalized index.
|
||||
|
||||
### SD3: Edge identity uses consumer-defined keys
|
||||
|
||||
Edges use `(graphId, key)` as their unique identity. The `key` is
|
||||
consumer-defined, matching the metagraph model where consumers control
|
||||
identifiers. For anonymous edges (common in simple graphs), `key` can be
|
||||
auto-generated.
|
||||
|
||||
### SD4: Composite foreign keys for node references
|
||||
|
||||
Edges reference nodes via composite FKs:
|
||||
`(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures
|
||||
referential integrity within a graph and cascades node deletions to connected
|
||||
edges.
|
||||
|
||||
### SD5: Enum pattern — `as const` objects, not TypeScript enums
|
||||
|
||||
All enumerations use the `as const` object pattern (e.g.,
|
||||
`GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript
|
||||
`enum`. This matches the `ACTOR_TYPE` pattern in `common.ts` and avoids JSR
|
||||
slow-type issues. The TypeBox schema is a `Type.Union` of `Type.Literal` values
|
||||
derived from the object.
|
||||
| ADR | Decision | Summary |
|-----|----------|---------|
| [019](decisions/019-json-text-for-schema-columns.md) | JSON text for schema columns in SQLite | SQLite uses `text` with JSON mode; application-level validation |
| [020](decisions/020-no-nodetypeid-on-nodes.md) | No nodeTypeId on nodes | Node type enforced at application layer, not via FK |
| [021](decisions/021-edge-identity-uses-consumer-keys.md) | Edge identity uses consumer-defined keys | `(graphId, key)` as unique identity within a graph |
| [022](decisions/022-composite-fks-for-node-references.md) | Composite foreign keys for node references | Edges reference `(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)` |
| [006](decisions/006-enum-pattern-as-const-objects.md) | `as const` objects, not TypeScript enums | `GRAPH_STATUS`, `ACTOR_TYPE` use const objects; TypeBox uses Literal unions |
| [008](decisions/008-common-columns-pattern.md) | Common columns pattern | `id`, `metadata`, `createdAt`, `updatedAt` on every table |
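ADR-021 and ADR-022 together describe edge identity as `(graphId, key)` and node references as composite keys scoped to one graph. The in-memory sketch below models those rules, including the cascade from node deletion to connected edges. It is an illustration only: `GraphStore`, `compositeId`, and the `targetNodeKey` field are assumed names, and the real enforcement happens through foreign keys in the database, not application code.

```typescript
interface NodeRow {
  graphId: string;
  key: string;
}

interface EdgeRow {
  graphId: string;
  key: string;
  sourceNodeKey: string;
  targetNodeKey: string; // assumed counterpart to sourceNodeKey
}

// Composite identity: (graphId, key), encoded as one map key.
const compositeId = (graphId: string, key: string): string =>
  JSON.stringify([graphId, key]);

class GraphStore {
  private nodes = new Map<string, NodeRow>();
  private edges = new Map<string, EdgeRow>();

  addNode(node: NodeRow): void {
    this.nodes.set(compositeId(node.graphId, node.key), node);
  }

  addEdge(edge: EdgeRow): void {
    // Composite reference: both endpoints must exist in the *same* graph.
    for (const nodeKey of [edge.sourceNodeKey, edge.targetNodeKey]) {
      if (!this.nodes.has(compositeId(edge.graphId, nodeKey))) {
        throw new Error(`missing node ${edge.graphId}/${nodeKey}`);
      }
    }
    const edgeId = compositeId(edge.graphId, edge.key);
    if (this.edges.has(edgeId)) {
      throw new Error(`duplicate edge ${edge.graphId}/${edge.key}`);
    }
    this.edges.set(edgeId, edge);
  }

  deleteNode(graphId: string, key: string): void {
    this.nodes.delete(compositeId(graphId, key));
    // Cascade: drop edges in this graph that touch the deleted node.
    for (const [edgeId, e] of this.edges) {
      const touches = e.sourceNodeKey === key || e.targetNodeKey === key;
      if (e.graphId === graphId && touches) {
        this.edges.delete(edgeId);
      }
    }
  }

  edgeCount(): number {
    return this.edges.size;
  }
}
```

Scoping the reference by `graphId` means a node key such as `"a"` can be reused in another graph without an edge accidentally pointing across graphs.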
|
||||
|
||||
## Metadata Convention
|
||||
|
||||
|
||||