storage/docs/architecture/sqlite-host.md
glm-5.1 ed8710a7f5 Clean up architecture specs: remove stale references, align docs with code, improve readability
- Replace stale DD references (DD3, DD6, DD9, DD10) with proper ADR links
- Fix 'Open Question 1' → OQ-01/OQ-03 cross-references
- Rewrite metagraph-module.md 'Why TypeBox Modules' to describe capabilities
  directly instead of framing as SchemaBuilder replacement
- Remove 'Transition from SchemaBuilder' section, replace with Source Structure
- Clean up implementation path: strikethrough phases → status table
- Fix data model diagram: remove non-existent nodeTypeId, fix EdgeType label
- Align EdgeConstraints examples with actual code (add default values)
- Clarify validateNode/validateEdge error behavior in docs
- Align EncryptedDataSchema code example with actual implementation
- Fix overview.md: correct dependency table, update current state, fix TypeBox URL
- Fix forward-look.md garbled text about dbtype element migration
- Fix open-questions.md: correct OQ count (4→7 open), add summary table
- Update doc statuses: schema-evolution, encrypted-data, open-questions → reviewed
- Update AGENTS.md to reflect current implementation state
2026-05-30 09:12:24 +00:00


| status | last_updated |
|---|---|
| reviewed | 2026-05-30 |

SQLite Host

The SQLite database host for @alkdev/storage. Uses Drizzle ORM with libsql/Turso for the SQLite dialect and @alkdev/drizzlebox for TypeBox schema generation from Drizzle table definitions.

Overview

The SQLite host provides:

  1. Drizzle table definitions for the metagraph pattern (graph types, node types, edge types, graphs, nodes, edges) plus a standalone actors table
  2. Drizzle relations for the relational query API
  3. TypeBox schemas auto-generated from Drizzle tables (select/insert validation)
  4. Injectable database factory: createSqliteDatabase(client) accepts a pre-created client

The SQLite host is the first-class target. PostgreSQL will follow the same table shapes with appropriate dialect changes.

Package Structure

```
src/sqlite/
├── tables/
│   ├── common.ts         # commonCols, ACTOR_TYPE const object
│   ├── graphTypes.ts     # graph_types table + select/insert schemas
│   ├── nodeTypes.ts      # node_types table + select/insert schemas
│   ├── edgeTypes.ts      # edge_types table + select/insert schemas
│   ├── graphs.ts         # graphs table + select/insert schemas
│   ├── nodes.ts          # nodes table + select/insert schemas
│   ├── edges.ts          # edges table + select/insert schemas
│   ├── actors.ts         # actors table + select/insert schemas
│   └── index.ts          # barrel re-export
├── relations.ts          # Drizzle relational mappings
├── schema.ts             # re-exports tables + relations
└── client.ts             # createSqliteDatabase()
```

Tables

Common Columns

All tables share these columns:

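```typescript
import { sql } from "drizzle-orm";
import { integer, text } from "drizzle-orm/sqlite-core";

// Spread into every table definition, e.g. sqliteTable("graphs", { ...commonCols, ... })
export const commonCols = {
  id: text("id").primaryKey(),
  metadata: text("metadata", { mode: "json" }).$type<Record<string, unknown>>().default({}),
  createdAt: integer("created_at", { mode: "timestamp" })
    .default(sql`(strftime('%s', 'now'))`)
    .notNull(),
  updatedAt: integer("updated_at", { mode: "timestamp" })
    .default(sql`(strftime('%s', 'now'))`)
    .notNull(),
};
```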

Notable differences from a typical PostgreSQL common columns pattern:

| Column | SQLite | PostgreSQL (typical) |
|---|---|---|
| id | text PK (consumer-generated) | text PK with $defaultFn(() => crypto.randomUUID()) |
| metadata | text with JSON mode | jsonb with $type<Record<string, unknown>>() |
| createdAt | integer, timestamp mode (Unix epoch) | timestamp with timezone, defaulting to now() |
| updatedAt | integer, timestamp mode (Unix epoch) | timestamp with timezone, defaulting to now(), with $onUpdate |

The SQLite columns do NOT have $defaultFn for ID generation (the consumer provides IDs) and do NOT have $onUpdate for updatedAt. Drizzle's $onUpdate is an application-level hook rather than a database trigger. Because it is not configured here, consumers must set updatedAt explicitly on every update.
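Neither value is filled in automatically, so every insert needs an id and every update needs a fresh updatedAt. The helpers below are a minimal sketch of that discipline. newRowId, touch and toEpochSeconds are hypothetical names, not exports of @alkdev/storage. Using UUIDs for ids is also an assumption, since the column accepts any unique text.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: the SQLite tables have no $defaultFn,
// so the caller generates the primary key before inserting.
export function newRowId(): string {
  return randomUUID();
}

// Hypothetical helper: the tables have no $onUpdate,
// so every update patch should carry a fresh updatedAt.
export function touch<T extends object>(patch: T, now: Date = new Date()): T & { updatedAt: Date } {
  return { ...patch, updatedAt: now };
}

// Illustrates what an integer column in timestamp mode holds:
// whole Unix seconds, so sub-second precision is lost.
export function toEpochSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}
```

A patch built with touch(...) could then be passed to Drizzle's db.update(table).set(...).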

graph_types

Stores graph type definitions (schemas for classes of graphs).

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | Consumer-generated UUID |
| metadata | text (JSON) | default {} | Extension namespace |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| name | text | not null, unique | Graph type name (e.g., "call-graph", "acl") |
| description | text | default "" | Human-readable description |
| config | text (JSON) | not null | GraphConfig: directed/undirected/mixed, multi, self-loops |
| version | integer | not null, default 1 | Breaking schema version |
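Because config is stored as JSON text, SQLite accepts any JSON value there. Drizzle's JSON mode parses the value back but does not check its shape, and ADR 019 leaves that validation to the application. The following is a minimal sketch of such a check. It assumes a GraphConfig with fields named type, multi and allowSelfLoops. Those names are guesses based on the Notes column above, so align them with the real GraphConfig type before reuse.

```typescript
// Hypothetical shape: the field names are assumptions, not the package's GraphConfig.
export interface GraphConfig {
  type: "directed" | "undirected" | "mixed";
  multi: boolean; // assumed: allows parallel edges between the same endpoints
  allowSelfLoops: boolean; // assumed: allows edges whose source and target coincide
}

const GRAPH_KINDS: readonly string[] = ["directed", "undirected", "mixed"];

// Narrows the already-parsed JSON value read from the config column.
export function parseGraphConfig(value: unknown): GraphConfig {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("GraphConfig must be a JSON object");
  }
  const { type, multi, allowSelfLoops } = value as Record<string, unknown>;
  if (typeof type !== "string" || !GRAPH_KINDS.includes(type)) {
    throw new Error(`invalid graph kind: ${String(type)}`);
  }
  if (typeof multi !== "boolean" || typeof allowSelfLoops !== "boolean") {
    throw new Error("multi and allowSelfLoops must be booleans");
  }
  return { type: type as GraphConfig["type"], multi, allowSelfLoops };
}
```

In the package itself this check would more naturally be a TypeBox schema. Hand-written narrowing is used here only to keep the sketch dependency-free.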

node_types

Stores node type definitions within a graph type. Each node type has a TypeBox schema that validates node attributes.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Node type name (e.g., "call", "account") |
| description | text | default "" | |
| schema | text (JSON) | not null | TypeBox schema for node attributes |

Unique constraint: (graphTypeId, name) — node type names are unique within a graph type.

edge_types

Stores edge type definitions within a graph type.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Edge type name (e.g., "triggered", "can_read") |
| description | text | default "" | |
| schema | text (JSON) | not null | TypeBox schema for edge attributes |
| allowedSourceTypes | text (JSON) | default [] | Node type names valid at source endpoint |
| allowedTargetTypes | text (JSON) | default [] | Node type names valid at target endpoint |

Unique constraint: (graphTypeId, name) — edge type names are unique within a graph type.

Empty array semantics: allowedSourceTypes and allowedTargetTypes default to [] (empty JSON array) in the database. [] means "no restriction" — any node type is a valid endpoint — matching the behavior of undefined in the EdgeType schema layer. A non-empty array restricts endpoints to only the listed node types. There is no "no types allowed" state; if edge types need to be disabled, use a status or soft-delete pattern on the edge type definition. The repository layer must enforce this convention consistently. See metagraph-module.md for edge endpoint semantics.
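The same rule has to hold in every code path, so it helps to keep it in a single function. Below is a minimal sketch. isEndpointAllowed and isEdgeAllowed are hypothetical helpers, and the edge type is reduced to its two allow-list fields.

```typescript
// Hypothetical helper: undefined (schema layer) and [] (database default)
// both mean "any node type"; a non-empty list is an allow-list.
export function isEndpointAllowed(
  allowedTypes: readonly string[] | undefined,
  nodeTypeName: string,
): boolean {
  if (allowedTypes === undefined || allowedTypes.length === 0) return true;
  return allowedTypes.includes(nodeTypeName);
}

// Checks both endpoints of a prospective edge against its edge type.
export function isEdgeAllowed(
  edgeType: { allowedSourceTypes?: readonly string[]; allowedTargetTypes?: readonly string[] },
  sourceType: string,
  targetType: string,
): boolean {
  return (
    isEndpointAllowed(edgeType.allowedSourceTypes, sourceType) &&
    isEndpointAllowed(edgeType.allowedTargetTypes, targetType)
  );
}
```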

graphs

Graph instances. Each graph belongs to a graph type.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| graphTypeId | text | FK → graphTypes.id (set null) | Set null on graph type deletion (orphan graph) |
| name | text | not null | Graph instance name |
| description | text | default "" | |
| status | text | not null, enum: active, archived, draft | Default: draft |

On graphTypeId set null: When a graph type is deleted, its graphs become orphans with graphTypeId = null. The application should prevent graph type deletion if active graphs reference it, or set affected graphs' status to archived as part of a soft-delete workflow. Orphan graphs cannot validate their node/edge types against a missing type definition — queries against orphan graphs should check for graphTypeId !== null before performing type-aware operations.
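That check can be expressed as a TypeScript type guard, so that type-aware code only ever receives graphs with a non-null graphTypeId. The sketch below is minimal. GraphRow is a hypothetical subset of the row type, and hasGraphType / requireGraphType are illustrative names.

```typescript
// Hypothetical subset of the graphs row shape.
export interface GraphRow {
  id: string;
  graphTypeId: string | null;
  name: string;
  status: "active" | "archived" | "draft";
}

// True when the graph still points at a graph type definition.
export function hasGraphType(graph: GraphRow): graph is GraphRow & { graphTypeId: string } {
  return graph.graphTypeId !== null;
}

// Fails fast before any type-aware operation on an orphan graph.
export function requireGraphType(graph: GraphRow): string {
  if (!hasGraphType(graph)) {
    throw new Error(`graph ${graph.id} is orphaned: its graph type was deleted`);
  }
  return graph.graphTypeId;
}
```

At the SQL level, drizzle-orm's isNotNull(column) filter can exclude orphan graphs from a query in the same spirit.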

nodes

Nodes within a graph instance. Keyed by (graphId, key) — unique within a graph.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | not null | Consumer-defined identity within the graph |
| attributes | text (JSON) | not null, default {} | Node attributes validated by node type schema |

Unique constraint: (graphId, key) — node keys are unique within a graph.

No nodeTypeId column: Nodes do not have a direct FK to node_types. The node type is determined at the application layer. This is a deliberate design decision — adding a nodeTypeId FK would couple the graph instance layer to the type definition layer. The repository layer can enforce node type constraints via validation against the graph type's schema.
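The nodes table does not record a type, so the code that writes a node must know which node type it is validating against. The sketch below assumes the caller passes the node type name explicitly. NodeTypeRegistry is a hypothetical in-memory registry keyed like the (graphTypeId, name) unique constraint, and the plain predicate stands in for a compiled TypeBox schema.

```typescript
// Stand-in for a compiled TypeBox schema check (assumption).
export type AttributeCheck = (attributes: Record<string, unknown>) => boolean;

// Hypothetical registry mirroring the node_types (graphTypeId, name) uniqueness.
export class NodeTypeRegistry {
  private readonly checks = new Map<string, AttributeCheck>();

  // JSON-encoding the pair avoids collisions that a plain "a:b" join could cause.
  private static keyOf(graphTypeId: string, name: string): string {
    return JSON.stringify([graphTypeId, name]);
  }

  register(graphTypeId: string, name: string, check: AttributeCheck): void {
    this.checks.set(NodeTypeRegistry.keyOf(graphTypeId, name), check);
  }

  // Throws if the type is unknown for this graph type or the attributes fail.
  validate(graphTypeId: string, name: string, attributes: Record<string, unknown>): void {
    const check = this.checks.get(NodeTypeRegistry.keyOf(graphTypeId, name));
    if (check === undefined) {
      throw new Error(`unknown node type "${name}" for graph type ${graphTypeId}`);
    }
    if (!check(attributes)) {
      throw new Error(`attributes do not match node type "${name}"`);
    }
  }
}
```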

edges

Edges within a graph instance. Keyed by (graphId, key) — unique within a graph.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | | Consumer-defined identity (null for anonymous edges) |
| sourceNodeKey | text | not null | Source node key within the graph |
| targetNodeKey | text | not null | Target node key within the graph |
| attributes | text (JSON) | not null, default {} | Edge attributes validated by edge type schema |
| undirected | integer (boolean) | default false | Treat as undirected regardless of graph type |

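Unique constraint: (graphId, key). Edge keys are unique within a graph. SQLite treats NULLs as distinct in UNIQUE constraints, so any number of anonymous edges (key = null) can coexist in the same graph.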

Foreign keys: (graphId, sourceNodeKey) and (graphId, targetNodeKey) each reference (nodes.graphId, nodes.key) with cascade delete. Deleting a node therefore removes all of its edges, both outgoing and incoming.
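The cascade can be modelled in memory, for example to keep a cache consistent with the database after a delete. The sketch below is minimal and uses a hypothetical EdgeRow subset. The match is scoped by graphId because node keys are only unique within a graph.

```typescript
// Hypothetical subset of the edges row shape.
export interface EdgeRow {
  graphId: string;
  key: string | null;
  sourceNodeKey: string;
  targetNodeKey: string;
}

// Returns the edges the composite-FK cascade removes when node (graphId, nodeKey)
// is deleted: edges in the same graph touching the node at either endpoint.
// Edges in other graphs that reuse the same node key are untouched.
export function edgesRemovedWithNode(
  edges: readonly EdgeRow[],
  graphId: string,
  nodeKey: string,
): EdgeRow[] {
  return edges.filter(
    (e) => e.graphId === graphId && (e.sourceNodeKey === nodeKey || e.targetNodeKey === nodeKey),
  );
}
```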

actors

Standalone identity table. Currently not referenced by any relation — the actors table has no FK references to or from any metagraph table and is not included in relations.ts. This is a placeholder for identity data and may become a node type in an ACL graph (based on @alkdev/operations's Identity interface) or remain a standalone table. See OQ-03 in open-questions.md.

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | text | PK | |
| metadata | text (JSON) | default {} | |
| createdAt | integer (timestamp) | not null, default now | |
| updatedAt | integer (timestamp) | not null, default now | |
| name | text | not null | Actor display name |
| type | text | not null, enum: human, llm, agent | Actor type |

Relations

Drizzle relational mappings define the following relationships:

  • graphTypes → nodeTypes: one-to-many
  • graphTypes → edgeTypes: one-to-many
  • graphTypes → graphs: one-to-many
  • graphs → nodes: one-to-many
  • graphs → edges: one-to-many
  • nodes → outgoing edges (sourceNode): one-to-many
  • nodes → incoming edges (targetNode): one-to-many
  • edges → source node: many-to-one (via composite key)
  • edges → target node: many-to-one (via composite key)

Client Factory

```typescript
import { createSqliteDatabase } from "@alkdev/storage/sqlite";
import type { SqliteDatabase } from "@alkdev/storage/sqlite";
import { createClient } from "@libsql/client";

const client = createClient({ url: "file:local.db" });
const db: SqliteDatabase = createSqliteDatabase(client);
```

The factory takes a client already created with @libsql/client and returns a typed Drizzle database instance with the full schema attached. This enables:

  • In-memory testing with createClient({ url: ":memory:" })
  • Turso remote connections
  • Custom client configuration (auth tokens, etc.)

Design Decisions

All design decisions are documented as ADRs in decisions/.

| ADR | Decision | Summary |
|---|---|---|
| 019 | JSON text for schema columns in SQLite | SQLite uses text with JSON mode; application-level validation |
| 020 | No nodeTypeId on nodes | Node type enforced at application layer, not via FK |
| 021 | Edge identity uses consumer-defined keys | (graphId, key) as unique identity within a graph |
| 022 | Composite foreign keys for node references | Edges reference (graphId, sourceNodeKey) → (nodes.graphId, nodes.key) |
| 006 | as const objects, not TypeScript enums | GRAPH_STATUS, ACTOR_TYPE use const objects; TypeBox uses Literal unions |
| 008 | Common columns pattern | id, metadata, createdAt, updatedAt on every table |
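The following is a minimal sketch of the ADR 006 pattern for the two enum-like columns described above. The values match the graphs.status and actors.type constraints. The object key names (ACTIVE, HUMAN, and so on) and the isActorType guard are assumptions and may not match what common.ts contains.

```typescript
// Key names are assumed; values match the enum constraints in the tables above.
export const GRAPH_STATUS = {
  ACTIVE: "active",
  ARCHIVED: "archived",
  DRAFT: "draft",
} as const;

export const ACTOR_TYPE = {
  HUMAN: "human",
  LLM: "llm",
  AGENT: "agent",
} as const;

// Derive string-literal unions from the objects instead of declaring enums.
export type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];
export type ActorType = (typeof ACTOR_TYPE)[keyof typeof ACTOR_TYPE];

const ACTOR_TYPE_VALUES: readonly string[] = Object.values(ACTOR_TYPE);

// Hypothetical runtime guard for values read from untyped input.
export function isActorType(value: string): value is ActorType {
  return ACTOR_TYPE_VALUES.includes(value);
}
```

Unlike a TypeScript enum, this emits a plain object at runtime and yields ordinary string unions, which line up with TypeBox Literal unions on the schema side.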

Metadata Convention

Every table has a metadata JSON column defaulting to {}. This is an extension namespace for subsystem use, following a namespacing convention: _subsystem.key (e.g., _keypal.scopes, _retention.expiresAt).

What metadata is for: Opaque key-value pairs that subsystems add without schema changes. It is not meant to appear in WHERE clauses or JOINs.

What metadata is NOT for: A replacement for typed columns. If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in metadata. When in doubt, add a column.

Namespacing convention: Subsystems should prefix their keys (e.g., _callgraph.payloadRef, _acl.inherited). Unprefixed keys are reserved for the storage package itself.
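The following helpers are a minimal sketch that keeps writes inside a subsystem's namespace. They read _keypal.scopes as a path, that is, a nested _keypal object holding a scopes key. That interpretation, the subsystem-name pattern, and the setNamespaced / getNamespaced names are all assumptions.

```typescript
export type Metadata = Record<string, unknown>;

// Assumed naming rule for subsystems (matches keypal, retention, callgraph, acl).
const SUBSYSTEM_NAME = /^[a-z][A-Za-z0-9]*$/;

function isPlainObject(value: unknown): value is Metadata {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns new metadata with metadata[`_${subsystem}`][key] = value,
// preserving other keys in the same namespace.
export function setNamespaced(metadata: Metadata, subsystem: string, key: string, value: unknown): Metadata {
  if (!SUBSYSTEM_NAME.test(subsystem)) {
    throw new Error(`invalid subsystem name: ${subsystem}`);
  }
  const ns = `_${subsystem}`;
  const existing = metadata[ns];
  const bucket: Metadata = isPlainObject(existing) ? existing : {};
  // Copy rather than mutate, so the object read from the database is left untouched.
  return { ...metadata, [ns]: { ...bucket, [key]: value } };
}

export function getNamespaced(metadata: Metadata, subsystem: string, key: string): unknown {
  const bucket = metadata[`_${subsystem}`];
  return isPlainObject(bucket) ? bucket[key] : undefined;
}
```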

Concurrency Model

The SQLite host targets spoke deployments where a single process accesses the database. For this model, SQLite's default journal mode is sufficient. However, for spoke deployments that may run concurrent writes (e.g., multiple worker threads), consumers should:

  1. Enable WAL mode: PRAGMA journal_mode=WAL; — allows concurrent reads during writes
  2. Set busy timeout: PRAGMA busy_timeout=5000; — wait up to 5 seconds for lock acquisition
  3. Use a single writer: SQLite supports one writer at a time. If multiple threads write, route writes through a single queue or connection

The createSqliteDatabase() factory does not set these pragmas — it's the consumer's responsibility to configure the SQLite connection appropriately. The libsql client used to create the connection can be pre-configured before passing it to the factory.
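Within a single process, a small promise-chain queue is enough to enforce the single-writer rule. Below is a minimal sketch. SerialWriteQueue is a hypothetical helper, and it only serializes writes issued from the same JavaScript event loop. With worker threads, one thread would have to own the connection and receive write requests from the others.

```typescript
// Hypothetical helper: funnels async writes through one promise chain
// so that at most one runs at a time, in submission order.
export class SerialWriteQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive if this task fails, so later writes still run.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Each write would then be wrapped, e.g. queue.run(async () => { await db.insert(table).values(row); }), so that inserts and updates never overlap on the connection.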

PostgreSQL Porting Notes

When implementing src/pg/, the table shapes remain the same but with these changes:

| SQLite | PostgreSQL |
|---|---|
| sqliteTable | pgTable |
| text (JSON mode) | jsonb with .$type<T>() |
| integer (timestamp mode) | timestamp with timezone |
| `` sql`(strftime('%s', 'now'))` `` | `` sql`now()` `` |
| integer (boolean mode) | boolean |
| text (enum) | pgEnum or text with check constraint |

See a consumer's commonCols pattern (e.g., the hub's /workspace/@alkdev/hub/docs/architecture/storage/table-reference.md) for PostgreSQL reference patterns.

References