feat: add architecture docs, fix code issues from review, add analyze_lint script

Architecture docs (docs/architecture/):
- overview.md: package purpose, exports, terminology, design decisions, gaps
- metagraph.md: core graph model, schema types, SchemaBuilder, validation
- sqlite-host.md: SQLite tables, common columns, relations, concurrency model
- encrypted-data.md: encrypted data as a node type, AES-256-GCM crypto utility design

Code fixes from architecture review:
- Remove ConfigSchema duplication in graphTypes.ts (import GraphConfig from types.ts)
- Add missing SelectNodeSchema/SelectNode to nodes.ts
- Fix InsertEdge.key to be Optional (match nullable DB column)
- Replace TypeScript enums with as const objects (GRAPH_STATUS, GRAPH_BASE_TYPE)
- Add verbatim-module-syntax to lint exclusions (TypeBox false positive)
- Add @std/flags and @std/path to deno.json imports

Infrastructure:
- Add scripts/analyze_lint.ts from @ade for grouped lint analysis
- Add deno task lint:analyze
- Update AGENTS.md with architecture doc references, enum convention, crypto todo
2026-05-28 13:18:56 +00:00
parent 351fc98ec1
commit b0298663dc
13 changed files with 1311 additions and 37 deletions
--- a/docs/architecture/sqlite-host.md
+++ b/docs/architecture/sqlite-host.md
@@ -0,0 +1,297 @@
+---
+status: draft
+last_updated: 2026-05-28
+---
+
+# SQLite Host
+
+The SQLite database host for `@alkdev/storage`. Uses Drizzle ORM with libsql/Turso for the SQLite dialect and `@alkdev/drizzlebox` for TypeBox schema generation from Drizzle table definitions.
+
+## Overview
+
+The SQLite host provides:
+
+1. **Drizzle table definitions** for the metagraph pattern (graph types, node types, edge types, graphs, nodes, edges) plus a standalone `actors` table
+2. **Drizzle relations** for the relational query API
+3. **TypeBox schemas** auto-generated from Drizzle tables (select/insert validation)
+4. **Injectable database factory** — `createSqliteDatabase(client)` accepts a pre-created client
+
+The SQLite host is the first-class target. PostgreSQL will follow the same table shapes with appropriate dialect changes.
+
+## Package Structure
+
+```
+src/sqlite/
+├── tables/
+│   ├── common.ts          # commonCols, ACTOR_TYPE enum
+│   ├── graphTypes.ts      # graph_types table + select/insert schemas
+│   ├── nodeTypes.ts       # node_types table + select/insert schemas
+│   ├── edgeTypes.ts       # edge_types table + select/insert schemas
+│   ├── graphs.ts          # graphs table + select/insert schemas
+│   ├── nodes.ts           # nodes table + select/insert schemas
+│   ├── edges.ts           # edges table + select/insert schemas
+│   ├── actors.ts          # actors table + select/insert schemas
+│   └── index.ts           # barrel re-export
+├── relations.ts           # Drizzle relational mappings
+├── schema.ts              # re-exports tables + relations
+└── client.ts              # createSqliteDatabase()
+```
+
+## Tables
+
+### Common Columns
+
+All tables share these columns:
+
+```ts
+{
+  id: text("id").primaryKey(),
+  metadata: text("metadata", { mode: "json" }).$type<Record<string, unknown>>().default({}),
+  createdAt: integer("created_at", { mode: "timestamp" })
+    .default(sql`(strftime('%s', 'now'))`)
+    .notNull(),
+  updatedAt: integer("updated_at", { mode: "timestamp" })
+    .default(sql`(strftime('%s', 'now'))`)
+    .notNull(),
+}
+```
+
+**Notable differences from hub's PostgreSQL common columns**:
+
+| Column | SQLite | PostgreSQL (hub) |
+|--------|--------|-------------------|
+| `id` | text PK (consumer-generated) | text PK with `$defaultFn(() => crypto.randomUUID())` |
+| `metadata` | `text` with JSON mode | `jsonb` with `$type<Record<string, unknown>>()` |
+| `createdAt` | `integer` timestamp mode (Unix epoch) | `timestamp with timezone` defaulting `now()` |
+| `updatedAt` | `integer` timestamp mode (Unix epoch) | `timestamp with timezone` defaulting `now()` with `$onUpdate` |
+
+The SQLite columns do NOT have `$defaultFn` for ID generation (the consumer provides IDs) and do NOT have `$onUpdate` for `updatedAt` (Drizzle's `$onUpdate` is application-level; consumers must set it explicitly).
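Because `updatedAt` has no `$onUpdate` hook on the SQLite side, every update path has to stamp it explicitly. The following is a minimal sketch of one way to do that. The helper names `withUpdatedAt` and `toEpochSeconds` are hypothetical and not part of `@alkdev/storage`. The second helper illustrates the whole-second precision of an `integer` column in `timestamp` mode.

```typescript
// Hypothetical helpers for illustration; not part of @alkdev/storage.
type Timestamped = { updatedAt: Date };

// Returns a copy of the patch with `updatedAt` set, leaving the input untouched.
function withUpdatedAt<T extends object>(
  patch: T,
  now: Date = new Date(),
): T & Timestamped {
  return { ...patch, updatedAt: now };
}

// `integer(..., { mode: "timestamp" })` persists whole Unix seconds,
// so anything below one second is dropped on a round trip.
function toEpochSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}
```

The stamped object could then be handed to an update's `.set(...)` call in the repository layer, so no write path forgets the column.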
+
+### `graph_types`
+
+Stores graph type definitions (schemas for classes of graphs).
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | Consumer-generated UUID |
+| metadata | text (JSON) | default `{}` | Extension namespace |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| name | text | not null, **unique** | Graph type name (e.g., "call-graph", "acl") |
+| description | text | default `""` | Human-readable description |
+| config | text (JSON) | not null | `GraphConfig` — directed/undirected/mixed, multi, self-loops |
+| version | integer | not null, default 1 | Breaking schema version |
+
+### `node_types`
+
+Stores node type definitions within a graph type. Each node type has a TypeBox schema that validates node attributes.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
+| name | text | not null | Node type name (e.g., "call", "account") |
+| description | text | default `""` | |
+| schema | text (JSON) | not null | TypeBox schema for node attributes |
+
+**Unique constraint**: `(graphTypeId, name)` — node type names are unique within a graph type.
+
+### `edge_types`
+
+Stores edge type definitions within a graph type.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
+| name | text | not null | Edge type name (e.g., "triggered", "can_read") |
+| description | text | default `""` | |
+| schema | text (JSON) | not null | TypeBox schema for edge attributes |
+| allowedSourceTypes | text (JSON) | default `[]` | Node type names valid at source endpoint |
+| allowedTargetTypes | text (JSON) | default `[]` | Node type names valid at target endpoint |
+
+**Unique constraint**: `(graphTypeId, name)` — edge type names are unique within a graph type.
+
+**Empty array semantics**: `allowedSourceTypes` and `allowedTargetTypes` default to `[]` in the database. The repository layer must treat an empty array as "no restriction" (any node type is a valid endpoint), matching the behavior of `undefined` in the `EdgeType` schema. A non-empty array restricts endpoints to the listed node types. There is no "no types allowed" state; to disable an edge type, use a status or soft-delete pattern on the edge type definition.
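The empty-array rule can be sketched as a small predicate. This is an illustration only: `isEndpointAllowed` is a hypothetical name, and how the repository layer resolves a node's type name is outside the scope of this sketch.

```typescript
// Hypothetical repository-layer check; the function name is illustrative.
// `allowedTypes` is the value of allowedSourceTypes or allowedTargetTypes.
function isEndpointAllowed(
  allowedTypes: readonly string[] | undefined,
  nodeTypeName: string,
): boolean {
  // `undefined` (schema) and `[]` (database default) both mean "no restriction".
  if (allowedTypes === undefined || allowedTypes.length === 0) {
    return true;
  }
  // A non-empty list restricts the endpoint to the listed node type names.
  return allowedTypes.some((allowed) => allowed === nodeTypeName);
}
```

Treating both cases in one place keeps the database default and the schema's optional field from drifting apart.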
+
+### `graphs`
+
+Graph instances. Each graph belongs to a graph type.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| graphTypeId | text | FK → graphTypes.id (set null) | Set null on graph type deletion (orphan graph) |
+| name | text | not null | Graph instance name |
+| description | text | default `""` | |
+| status | text | not null, enum: `active`, `archived`, `draft` | Default: `draft` |
+
+**On `graphTypeId` set null**: When a graph type is deleted, its graphs become orphans with `graphTypeId = null`. The application should prevent graph type deletion if active graphs reference it, or set affected graphs' `status` to `archived` as part of a soft-delete workflow. Orphan graphs cannot validate their node/edge types against a missing type definition — queries against orphan graphs should check for `graphTypeId !== null` before performing type-aware operations.
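A guard for the orphan case might look like the sketch below. The `GraphRow` shape is a simplified assumption based on the column table above, and `hasGraphType` / `requireGraphTypeId` are hypothetical helper names.

```typescript
// Simplified row shape assumed from the `graphs` table; not the generated type.
type GraphRow = {
  id: string;
  graphTypeId: string | null;
  status: "active" | "archived" | "draft";
};

// Type guard: true when the graph still points at a graph type.
function hasGraphType(
  graph: GraphRow,
): graph is GraphRow & { graphTypeId: string } {
  return graph.graphTypeId !== null;
}

// Call this before any type-aware operation (schema lookup, validation).
function requireGraphTypeId(graph: GraphRow): string {
  if (!hasGraphType(graph)) {
    throw new Error(`graph ${graph.id} is orphaned: its graph type was deleted`);
  }
  return graph.graphTypeId;
}
```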
+
+### `nodes`
+
+Nodes within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
+| key | text | not null | Consumer-defined identity within the graph |
+| attributes | text (JSON) | not null, default `{}` | Node attributes validated by node type schema |
+
+**Unique constraint**: `(graphId, key)` — node keys are unique within a graph.
+
+**No `nodeTypeId` column**: Nodes do not have a direct FK to `node_types`. The node type is determined at the application layer. This is a deliberate design decision — adding a `nodeTypeId` FK would couple the graph instance layer to the type definition layer. The repository layer can enforce node type constraints via validation against the graph type's schema.
+
+### `edges`
+
+Edges within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
+| key | text | | Consumer-defined identity (null for anonymous edges) |
+| sourceNodeKey | text | not null | Source node key within the graph |
+| targetNodeKey | text | not null | Target node key within the graph |
+| attributes | text (JSON) | not null, default `{}` | Edge attributes validated by edge type schema |
+| undirected | integer (boolean) | default false | Treat as undirected regardless of graph type |
+
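+**Unique constraint**: `(graphId, key)` — edge keys are unique within a graph. Because `key` is nullable and SQLite treats NULL values as distinct in unique indexes, any number of anonymous (null-key) edges can coexist in the same graph.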
+
+**Foreign keys**: `(graphId, sourceNodeKey)` and `(graphId, targetNodeKey)` each reference `(nodes.graphId, nodes.key)` as a composite foreign key with cascade delete. Deleting a node removes all edges connected to it.
+
+### `actors`
+
+Standalone identity table. Currently not referenced by any relation. This is a placeholder for the hub's account/identity model and may become a node type in an ACL graph or remain a standalone table. See [overview.md](./overview.md) Open Question 1.
+
+| Column | Type | Constraints | Notes |
+|--------|------|-------------|-------|
+| id | text | PK | |
+| metadata | text (JSON) | default `{}` | |
+| createdAt | integer (timestamp) | not null, default `now` | |
+| updatedAt | integer (timestamp) | not null, default `now` | |
+| name | text | not null | Actor display name |
+| type | text | not null, enum: `human`, `llm`, `agent` | Actor type |
+
+## Relations
+
+Drizzle relational mappings define the following relationships:
+
+- **graphTypes → nodeTypes**: one-to-many
+- **graphTypes → edgeTypes**: one-to-many
+- **graphTypes → graphs**: one-to-many
+- **graphs → nodes**: one-to-many
+- **graphs → edges**: one-to-many
+- **nodes → outgoing edges** (sourceNode): one-to-many
+- **nodes → incoming edges** (targetNode): one-to-many
+- **edges → source node**: many-to-one (Drizzle `one()`, via composite key)
+- **edges → target node**: many-to-one (Drizzle `one()`, via composite key)
+
+## Client Factory
+
+```ts
+import { createSqliteDatabase } from "@alkdev/storage/sqlite";
+import type { SqliteDatabase } from "@alkdev/storage/sqlite";
+import { createClient } from "@libsql/client";
+
+const client = createClient({ url: "file:local.db" });
+const db: SqliteDatabase = createSqliteDatabase(client);
+```
+
+The factory takes a pre-created `@libsql/client` client and returns a typed Drizzle database instance with the full schema attached. This enables:
+
+- In-memory testing with `createClient({ url: ":memory:" })`
+- Turso remote connections
+- Custom client configuration (auth tokens, etc.)
+
+## Design Decisions
+
+### SD1: JSON text vs. JSONB in SQLite
+
+SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native `jsonb`. This means:
+
+- SQLite cannot query inside JSON columns efficiently: there is no GIN-style index, so filtering on a JSON field means scanning rows
+- SQLite JSON validation relies on application-level checks (TypeBox schemas)
+- PostgreSQL will get queryability benefits for JSON columns
+
+The trade-off: SQLite is for spokes (local, infrequent queries), PostgreSQL is for the hub (frequent, complex queries).
+
+### SD2: No `nodeTypeId` on nodes
+
+Nodes don't carry a direct FK to `node_types`. The node type is enforced at the application layer. Reasons:
+
+- Graph type schemas define which node types are valid. Adding a FK would duplicate this constraint.
+- Node types can evolve (schemas can change) without requiring node row updates.
+- The repository layer validates node attributes against the appropriate node type schema before insertion.
+
+This may change if query performance requires filtering nodes by type. A `nodeTypeId` column can be added as a denormalized index.
+
+### SD3: Edge identity uses consumer-defined keys
+
+Edges use `(graphId, key)` as their unique identity. The `key` is consumer-defined, matching the metagraph model where consumers control identifiers. For anonymous edges (common in simple graphs), `key` can be left null, as the nullable column allows, or auto-generated.
+
+### SD4: Composite foreign keys for node references
+
+Edges reference nodes via composite FKs: `(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures referential integrity within a graph and cascades node deletions to connected edges.
+
+### SD5: Enum pattern — `as const` objects, not TypeScript enums
+
+All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript `enum`. This matches the `ACTOR_TYPE` pattern in `common.ts` and avoids JSR slow-type issues. The TypeBox schema is a `Type.Union` of `Type.Literal` values derived from the object.
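The `as const` pattern can be sketched without TypeBox as follows. `EXAMPLE_GRAPH_STATUS` is a stand-in rather than the package's actual `GRAPH_STATUS` object: the values come from the `graphs.status` column above, while the key names other than `Active` are assumptions.

```typescript
// Stand-in for illustration; key names other than `Active` are assumed.
const EXAMPLE_GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
  Draft: "draft",
} as const;

// Union of the literal values: "active" | "archived" | "draft".
type ExampleGraphStatus =
  (typeof EXAMPLE_GRAPH_STATUS)[keyof typeof EXAMPLE_GRAPH_STATUS];

// Runtime check derived from the same object, so the type and the
// accepted values cannot drift apart.
function isExampleGraphStatus(value: unknown): value is ExampleGraphStatus {
  const keys = Object.keys(EXAMPLE_GRAPH_STATUS) as Array<
    keyof typeof EXAMPLE_GRAPH_STATUS
  >;
  return keys.some((key) => EXAMPLE_GRAPH_STATUS[key] === value);
}
```

Unlike a TypeScript `enum`, the object is plain data with an explicit literal type, which is what lets a validation schema be derived from it.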
+
+## Metadata Convention
+
+Every table has a `metadata` JSON column defaulting to `{}`. This is an extension namespace for subsystem use, following a namespacing convention: `_subsystem.key` (e.g., `_keypal.scopes`, `_retention.expiresAt`).
+
+**What metadata is for**: Opaque key-value pairs that subsystems add without schema changes. It's never queried in WHERE clauses or JOINs.
+
+**What metadata is NOT for**: A replacement for typed columns. If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in metadata. When in doubt, add a column.
+
+**Namespacing convention**: Subsystems should prefix their keys (e.g., `_callgraph.payloadRef`, `_acl.inherited`). Unprefixed keys are reserved for the storage package itself.
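A sketch of the namespacing convention, under one loud assumption: keys are stored flat as dotted strings such as `"_retention.expiresAt"`. The document does not say whether keys are flat or nested, and the helper names `metadataKey` and `setMetadata` are hypothetical.

```typescript
// Assumption: metadata keys are flat dotted strings, e.g. "_acl.inherited".
type MetadataBag = Record<string, unknown>;

// Builds a namespaced key, adding the leading underscore if it is missing.
function metadataKey(subsystem: string, key: string): string {
  const prefix = subsystem.charAt(0) === "_" ? subsystem : `_${subsystem}`;
  return `${prefix}.${key}`;
}

// Returns a new metadata object with the namespaced key set;
// the input object is not mutated.
function setMetadata(
  metadata: MetadataBag,
  subsystem: string,
  key: string,
  value: unknown,
): MetadataBag {
  return { ...metadata, [metadataKey(subsystem, key)]: value };
}
```

Routing every subsystem write through one helper keeps the prefix rule in a single place and leaves unprefixed keys free for the storage package.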
+
+## Concurrency Model
+
+The SQLite host targets spoke deployments where a single process accesses the database. For this model, SQLite's default journal mode is sufficient. However, for spoke deployments that may run concurrent writes (e.g., multiple worker threads), consumers should:
+
+1. **Enable WAL mode**: `PRAGMA journal_mode=WAL;` — allows concurrent reads during writes
+2. **Set busy timeout**: `PRAGMA busy_timeout=5000;` — wait up to 5 seconds for lock acquisition
+3. **Use a single writer**: SQLite supports one writer at a time. If multiple threads write, route writes through a single queue or connection
+
+The `createSqliteDatabase()` factory does not set these pragmas — it's the consumer's responsibility to configure the SQLite connection appropriately. The libsql client used to create the connection can be pre-configured before passing it to the factory.
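The single-writer recommendation can be approximated in process with a promise chain that runs one write at a time. This is a sketch under the assumption that all writers share one JavaScript context. `WriteQueue` is a hypothetical class, not something the package exports, and it does not coordinate across threads or processes.

```typescript
// Hypothetical in-process write serializer; not part of @alkdev/storage.
class WriteQueue {
  // Tail of the chain. It never rejects, so a failed write cannot
  // block the writes queued behind it.
  private tail: Promise<unknown> = Promise.resolve();

  // Runs `write` after every previously enqueued write has settled.
  enqueue<T>(write: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => write());
    // Swallow the error on the chain only; the caller still sees it on `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Each database write would be wrapped as `queue.enqueue(() => ...)`, while reads can bypass the queue when WAL mode is enabled.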
+
+## PostgreSQL Porting Notes
+
+When implementing `src/pg/`, the table shapes remain the same but with these changes:
+
+| SQLite | PostgreSQL |
+|--------|------------|
+| `sqliteTable` | `pgTable` |
+| `text` (JSON mode) | `jsonb` with `.$type<T>()` |
+| `integer` (timestamp mode) | `timestamp` with timezone |
+| `` sql`(strftime('%s', 'now'))` `` | `` sql`now()` `` |
+| `integer` (boolean mode) | `boolean` |
+| `text` (enum) | `pgEnum` or `text` with check constraint |
+
+See hub's `commonCols` reference in [table-reference.md](../../hub/docs/architecture/storage/table-reference.md) for the PostgreSQL patterns.
+
+## References
+
+- Drizzle ORM SQLite core: https://orm.drizzle.team/docs/sqlite-core
+- libsql client: https://github.com/tursodatabase/libsql
+- Hub common columns pattern: `/workspace/@alkdev/hub/docs/architecture/storage/table-reference.md`
+- Source: `src/sqlite/`