feat: add architecture docs, fix code issues from review, add analyze_lint script

Architecture docs (docs/architecture/):
- overview.md: package purpose, exports, terminology, design decisions, gaps
- metagraph.md: core graph model, schema types, SchemaBuilder, validation
- sqlite-host.md: SQLite tables, common columns, relations, concurrency model
- encrypted-data.md: encrypted data as a node type, AES-256-GCM crypto utility design

Code fixes from architecture review:
- Remove ConfigSchema duplication in graphTypes.ts (import GraphConfig from types.ts)
- Add missing SelectNodeSchema/SelectNode to nodes.ts
- Fix InsertEdge.key to be Optional (match nullable DB column)
- Replace TypeScript enums with as const objects (GRAPH_STATUS, GRAPH_BASE_TYPE)
- Add verbatim-module-syntax to lint exclusions (TypeBox false positive)
- Add @std/flags and @std/path to deno.json imports

Infrastructure:
- Add scripts/analyze_lint.ts from @ade for grouped lint analysis
- Add deno task lint:analyze
- Update AGENTS.md with architecture doc references, enum convention, crypto todo
2026-05-28 13:18:56 +00:00
parent 351fc98ec1
commit b0298663dc
13 changed files with 1311 additions and 37 deletions

@@ -35,7 +35,7 @@ This design ensures consumers don't bundle database drivers they don't use.
1. **Deno-first, npm-second via JSR**: Package is published to JSR (`deno publish`). npm compatibility is automatic via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
2. **No comments in code**: Per project convention across @alkdev packages.
3. **JSR slow types excluded from lint**: Drizzle's deeply inferred generics (`sqliteTable`, `createInsertSchema`, `relations`) make explicit type annotations impractical. We use `--allow-slow-types` on publish and `"exclude": ["no-slow-types"]` in lint config. This is a known technical debt item — can be tightened iteratively.
3. **JSR slow types excluded from lint**: Drizzle's deeply inferred generics (`sqliteTable`, `createInsertSchema`, `relations`) make explicit type annotations impractical. We use `--allow-slow-types` on publish and `"exclude": ["no-slow-types"]` in lint config. Additionally, `"verbatim-module-syntax"` is excluded because TypeBox schemas are runtime values used as `typeof` type references, which the linter misidentifies as type-only imports. This is known technical debt — can be tightened iteratively.
4. **Injectable clients**: `createSqliteDatabase(client)` takes a client, not env vars. Module-level side effects are forbidden.
5. **Dependencies**: `@alkdev/typebox` and `@alkdev/drizzlebox` are npm deps (not yet on JSR). This works fine — JSR handles npm dependencies natively.
@@ -43,7 +43,8 @@ This design ensures consumers don't bundle database drivers they don't use.
```bash
deno check mod.ts src/graphs/mod.ts src/sqlite/mod.ts # Type check
deno lint # Lint (slow-types excluded)
deno lint # Lint (slow-types, verbatim-module-syntax excluded)
deno task lint:analyze # Analyze lint issues by code/file grouping
deno fmt # Format
deno test --allow-all test/ # Run tests
deno publish --allow-slow-types --dry-run # Dry-run publish
@@ -58,6 +59,7 @@ The `graphs/` and `sqlite/` modules were adapted from `@ade/ade-v0/packages/core
- `@ade/core` imports → relative imports within `src/graphs/`
- `import type { GraphConfig }` → `import { GraphConfig }` (TypeBox schemas are both values and types)
- `Relation` type alias removed (JSR slow type)
- TypeScript enums replaced with `as const` objects (`EnumGraphStatus` → `GRAPH_STATUS`)
- `client.ts` refactored to be injectable
- Module-level `db` and `client` exports removed
@@ -68,10 +70,23 @@ The `graphs/` and `sqlite/` modules were adapted from `@ade/ade-v0/packages/core
- TypeBox schemas are named with PascalCase (`NodeType`, `GraphConfig`)
- Drizzle table objects are named with camelCase (`graphTypes`, `nodeTypes`)
- Schema objects from drizzlebox are named with PascalCase (`InsertGraph`, `SelectGraph`)
- Enum constants use `SCREAMING_SNAKE_CASE` objects (`GRAPH_STATUS`, `ACTOR_TYPE`)
## Architecture Docs
See `docs/architecture/` for detailed specifications:
- `overview.md` — Package purpose, exports, design decisions, open questions
- `metagraph.md` — Core graph model, schema types, SchemaBuilder, attribute storage
- `sqlite-host.md` — SQLite tables, relations, client factory, porting notes
- `encrypted-data.md` — Encrypted data design (planned), crypto utility, node type modeling
These docs describe what the package is AND what it's becoming. Items marked ⚠️ are not yet implemented.
## What's Not Done Yet
- `src/pg/` — PostgreSQL host (same table shapes, `pgTable` + `jsonb` + `timestamp` + `pgEnum`)
- `src/graphs/crypto.ts` — Crypto utility (`encrypt`, `decrypt`, `generateEncryptionKey`, `EncryptedDataSchema`)
- Tests
- Repository/CRUD layer (currently only table definitions, no typed query functions)
- Hub-specific tables (sessions, messages, parts, call graphs, tasks, etc.)

View File

@@ -16,17 +16,20 @@
"drizzle-orm/pg-core": "npm:drizzle-orm/pg-core",
"@libsql/client": "npm:@libsql/client",
"postgres": "npm:postgres",
"@std/assert": "jsr:@std/assert"
"@std/assert": "jsr:@std/assert",
"@std/flags": "jsr:@std/flags",
"@std/path": "jsr:@std/path"
},
"lint": {
"rules": {
"exclude": ["no-slow-types"]
"exclude": ["no-slow-types", "verbatim-module-syntax"]
}
},
"tasks": {
"check": "deno check mod.ts src/graphs/mod.ts src/sqlite/mod.ts",
"test": "deno test --allow-all test/",
"lint": "deno lint",
"lint:analyze": "deno run --allow-read --allow-run scripts/analyze_lint.ts",
"fmt": "deno fmt",
"publish:dry": "deno publish --allow-slow-types --dry-run"
}

View File

@@ -0,0 +1,273 @@
---
status: draft
last_updated: 2026-05-28
---
# Encrypted Data
Design for storing encrypted data at rest within the metagraph model. Adapts the hub's AES-256-GCM + PBKDF2 encryption pattern as a reusable node type and crypto utility.
## Overview
Sensitive data — API keys, passwords, OAuth tokens, SSH keys — must be encrypted at rest. The hub's `client_secrets` table stores these as encrypted JSON blobs. In `@alkdev/storage`, the same encryption pattern becomes a reusable utility and an encrypted node type, so any graph can store secrets without special table definitions.
**Key principle**: The storage package provides the **encryption primitives and the schema shape**, not key management. Consumers provide the encryption key. This keeps the package agnostic to deployment-specific secret management.
## The Problem
The hub has `client_secrets` as a standalone table with columns like:
| Column | Purpose |
|--------|---------|
| `clientId` | FK to the client this secret belongs to |
| `key` | Secret name (e.g., "api_key", "oauth_credentials") |
| `value` | The encrypted payload (EncryptedData JSON) |
| `keyVersion` | Which encryption key version was used |
| `expiresAt` | When the secret expires |
| `lastUsedAt` | Audit trail |
This is a domain-specific table. The encryption logic itself is generic — AES-256-GCM with PBKDF2 key derivation and key versioning. When we want encrypted secrets in a spoke (local SQLite) or in a different domain model, we shouldn't have to duplicate the table definition or the crypto code.
## Design: Encrypted Data as a Node Type
Instead of a dedicated `client_secrets` table, encrypted data becomes a **node type** in a graph:
```ts
import { SchemaBuilder, BaseNodeAttributes, BaseEdgeAttributes } from "@alkdev/storage";
import { Type } from "@alkdev/typebox";
import { EncryptedDataSchema } from "@alkdev/storage";
const SecretNodeType = Type.Intersect([
BaseNodeAttributes,
Type.Object({
key: Type.String({ minLength: 1, maxLength: 255 }),
encryptedData: EncryptedDataSchema,
expiresAt: Type.Optional(Type.String({ format: "date-time" })),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "undirected", multi: false, allowSelfLoops: false })
.nodeType("secret", SecretNodeType)
.nodeType("client", Type.Intersect([
BaseNodeAttributes,
Type.Object({
name: Type.String(),
type: Type.String(),
config: Type.Record(Type.String(), Type.Any()),
enabled: Type.Boolean({ default: true }),
}),
]))
.edgeType("has_secret", Type.Intersect([
BaseEdgeAttributes,
Type.Object({
secretKey: Type.String(),
}),
]), {
allowedSourceTypes: ["client"],
allowedTargetTypes: ["secret"],
})
.build();
```
This represents the same relationship as `client_secrets.clientId` — but as a graph edge rather than a foreign key.
### Why This Works
1. **No special tables needed** — The existing `graph_types`, `node_types`, `edge_types`, `graphs`, `nodes`, `edges` tables store everything.
2. **Schema validation** — The `EncryptedDataSchema` TypeBox schema validates the encryption envelope at write time.
3. **Domain flexibility** — An "ACL graph" might also have encrypted credential nodes. A "call graph" might store encrypted auth headers. Different graphs, same pattern.
4. **Query through edges** — "Find all secrets for client X" becomes "find all edges of type `has_secret` from node X to secret nodes."
5. **The crypto utility is shared**: `@alkdev/storage` exports `encrypt()` and `decrypt()` that any consumer uses.
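As a rough illustration of point 4, the edge-based lookup can be sketched over in-memory arrays. The `GraphNode` and `GraphEdge` shapes and the `findSecretsForClient` helper are hypothetical stand-ins, not the package's API (the repository layer is not implemented yet); only the `has_secret` edge type and the source/target key naming come from these docs.

```typescript
// Hypothetical in-memory shapes; real rows live in the nodes/edges tables.
interface GraphNode {
  key: string;
  nodeType: string;
  attributes: Record<string, unknown>;
}

interface GraphEdge {
  edgeType: string;
  sourceNodeKey: string;
  targetNodeKey: string;
}

// "Find all secrets for client X": follow has_secret edges that start at the
// client node, then keep only targets that are secret nodes.
function findSecretsForClient(
  nodes: GraphNode[],
  edges: GraphEdge[],
  clientKey: string,
): GraphNode[] {
  const secretKeys = new Set(
    edges
      .filter((e) => e.edgeType === "has_secret" && e.sourceNodeKey === clientKey)
      .map((e) => e.targetNodeKey),
  );
  return nodes.filter((n) => n.nodeType === "secret" && secretKeys.has(n.key));
}
```

A SQL-backed version would presumably express the same thing as a join from `edges` to `nodes`, but that query is outside what this document specifies.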
### What Lives Where
| Layer | Responsibility | Package |
|-------|---------------|---------|
| `@alkdev/storage` graphs | `EncryptedDataSchema` (TypeBox shape) | `@alkdev/storage` |
| `@alkdev/storage` crypto | `encrypt()`, `decrypt()`, `generateEncryptionKey()` | `@alkdev/storage` |
| `@alkdev/storage` sqlite | Node storage (attributes contain encrypted JSON) | `@alkdev/storage/sqlite` |
| Application | Key management (key ring, key rotation) | Consumer |
| Application | Repository layer (validate schema, encrypt before insert) | Consumer |
## EncryptedData Schema
Ported from the hub's `src/crypto/mod.ts` interface, expressed as a TypeBox schema:
```ts
import { Type } from "@alkdev/typebox";
export const EncryptedDataSchema = Type.Object({
keyVersion: Type.Integer({ minimum: 1, description: "Encryption key version for rotation" }),
salt: Type.String({ description: "Base64-encoded 16-byte PBKDF2 salt" }),
iv: Type.String({ description: "Base64-encoded 12-byte AES-GCM initialization vector" }),
data: Type.String({ description: "Base64-encoded AES-256-GCM ciphertext" }),
});
```
This is the same structure as the hub's `EncryptedData` interface but as a TypeBox schema, enabling runtime validation when inserting encrypted nodes.
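For readers without TypeBox at hand, the envelope check can be approximated with a plain type guard. This is only a sketch: `isEncryptedData` and `decodedLength` are hypothetical helpers, and the 16-byte and 12-byte length checks are an extra assumption taken from the field descriptions above (the schema as written only requires strings).

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Returns the decoded byte length, or undefined when the input is not valid base64.
function decodedLength(base64: string): number | undefined {
  try {
    return atob(base64).length;
  } catch {
    return undefined;
  }
}

// Hypothetical guard mirroring the schema: integer keyVersion >= 1,
// 16-byte salt, 12-byte IV, non-empty base64 ciphertext.
function isEncryptedData(value: unknown): value is EncryptedData {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.keyVersion === "number" &&
    Number.isInteger(v.keyVersion) &&
    v.keyVersion >= 1 &&
    typeof v.salt === "string" &&
    decodedLength(v.salt) === 16 &&
    typeof v.iv === "string" &&
    decodedLength(v.iv) === 12 &&
    typeof v.data === "string" &&
    (decodedLength(v.data) ?? 0) > 0
  );
}
```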
## Crypto Utility
The encryption module provides three functions, ported from the hub's `src/crypto/mod.ts`:
### `encrypt(plaintext, password, keyVersion?): Promise<EncryptedData>`
Encrypts a string using AES-256-GCM with PBKDF2 key derivation.
**Process**:
1. Generate random 16-byte salt
2. Generate random 12-byte IV
3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations for v1)
4. Encrypt plaintext with AES-256-GCM using the derived key and IV
5. Return `{ keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }`
### `decrypt(encryptedData, password): Promise<string>`
Decrypts an `EncryptedData` object.
**Process**:
1. Decode base64 salt, IV, and ciphertext
2. Derive key from password + salt via PBKDF2, using the iteration count for the stored `keyVersion`
3. Decrypt with AES-256-GCM
4. Return plaintext string
5. Throw `"Decryption failed: Invalid data or key"` on failure (no information leakage about which part failed)
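The two processes above can be sketched with the Web Crypto API as follows. This is a minimal sketch under stated assumptions, not the hub's actual `src/crypto/mod.ts`: the `ITERATIONS_BY_VERSION` table, the `deriveKey` helper, and the base64 helpers are illustrative names, and only the v1 iteration count is taken from this document.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Assumption: only v1 (100k iterations) is defined by this document.
const ITERATIONS_BY_VERSION: Record<number, number> = { 1: 100_000 };

const toBase64 = (bytes: Uint8Array): string =>
  btoa(Array.from(bytes, (b) => String.fromCharCode(b)).join(""));
const fromBase64 = (text: string) =>
  Uint8Array.from(atob(text), (c) => c.charCodeAt(0));

// Derive a 256-bit AES-GCM key from password + salt via PBKDF2 (SHA-256).
async function deriveKey(password: string, saltBase64: string, keyVersion: number): Promise<CryptoKey> {
  const iterations = ITERATIONS_BY_VERSION[keyVersion];
  if (iterations === undefined) throw new Error(`Unknown key version: ${keyVersion}`);
  const material = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(password), "PBKDF2", false, ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt: fromBase64(saltBase64), iterations, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encrypt(plaintext: string, password: string, keyVersion = 1): Promise<EncryptedData> {
  const salt = toBase64(crypto.getRandomValues(new Uint8Array(16))); // random 16-byte salt
  const iv = crypto.getRandomValues(new Uint8Array(12)); // random 12-byte IV
  const key = await deriveKey(password, salt, keyVersion);
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv }, key, new TextEncoder().encode(plaintext),
  );
  return { keyVersion, salt, iv: toBase64(iv), data: toBase64(new Uint8Array(ciphertext)) };
}

async function decrypt(encrypted: EncryptedData, password: string): Promise<string> {
  try {
    const key = await deriveKey(password, encrypted.salt, encrypted.keyVersion);
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromBase64(encrypted.iv) }, key, fromBase64(encrypted.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    // Single generic message so callers cannot tell which step failed.
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```

Because the salt and IV are freshly generated per call, encrypting the same plaintext twice with the same password yields different envelopes, while either envelope decrypts back to the original string.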
### `generateEncryptionKey(): string`
Generates a 32-byte random key encoded as base64. Used by operators to create encryption keys for the key ring.
**Key ring format** (application-level, not in this package): A comma-separated list of `v{N}:{base64key}` pairs. The first key is the "current" key used for new encryptions. All keys are available for decryption.
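A possible shape for key generation and for the application-level key ring parser is sketched below. `parseKeyRing` and the `KeyRing` interface are hypothetical (the document places key ring handling outside this package); the `v{N}:{base64key}` format and the "first key is current" rule are taken from the description above.

```typescript
// 32 random bytes, base64-encoded, as described for generateEncryptionKey().
function generateEncryptionKey(): string {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  return btoa(Array.from(bytes, (b) => String.fromCharCode(b)).join(""));
}

// Hypothetical application-side structure for a parsed key ring.
interface KeyRing {
  currentVersion: number;
  keys: Map<number, string>;
}

// Parses "v2:<base64>,v1:<base64>"; the first entry becomes the current key.
function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;
  for (const entry of raw.split(",")) {
    const match = /^v(\d+):(.+)$/.exec(entry.trim());
    if (match === null) throw new Error("Malformed key ring entry");
    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`Duplicate key version: v${version}`);
    keys.set(version, match[2]);
    currentVersion ??= version;
  }
  if (currentVersion === undefined) throw new Error("Key ring is empty");
  return { currentVersion, keys };
}
```

Keeping every listed version in the map is what lets old envelopes stay decryptable after a new key is placed at the front of the list.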
### Key Versioning
PBKDF2 iteration count varies by key version:
- v1: 100,000 iterations
- Future versions: 200,000+ (adjust for hardware improvements)
This allows gradual security upgrades. Old data encrypted with v1 can still be decrypted. Re-encryption (rotate) reads with the old key and writes with the current key.
### Web Crypto API
The implementation uses the standard Web Crypto API (`crypto.subtle`), available in:
- Deno runtime (native)
- Node.js 19+ (native)
- Modern browsers (native)
- Cloudflare Workers (native)
No external crypto dependencies.
## Design Decisions
### ED1: Per-attribute encryption, not per-node
The `EncryptedData` schema is a single attribute within a node type's attributes, not the entire node. This means:
- A secret node can have unencrypted metadata alongside the encrypted value
- The node key (identity) is always readable for queries
- Only the sensitive payload is encrypted
**Alternative considered**: Encrypt the entire `attributes` column. This makes queries impossible (you can't find "all secrets for client X" if the client reference is encrypted). Per-attribute encryption preserves queryability on non-sensitive fields.
### ED2: Node type, not standalone table
Encrypted data is modeled as a node type rather than a dedicated `secrets` table because:
- **Graphs already provide the structure** — edges represent "client X has secret Y" without a join table
- **No foreign key proliferation** — new secret types (OAuth, SSH, API keys) are new node types, not new columns or tables
- **Uniform query patterns** — All graph queries work on secret nodes without special code
**When a standalone table might be better**: If the hub needs to query "all active API keys" across all clients with a single indexed `WHERE` clause, a dedicated `api_keys` table with proper indexes is faster. The graph model requires traversing edges to find related secrets. For the hub's specific use case (key lookup on every authenticated request), this matters. The metagraph pattern is optimized for flexibility, not raw key-lookup performance. The hub should use a standalone `api_keys` table for authentication and the metagraph for everything else.
### ED3: Password-based encryption, not raw-key encryption
The current implementation uses PBKDF2 to derive a key from a password string. The "password" in practice is a base64-encoded 32-byte random key from `generateEncryptionKey()`. This means:
- The key derivation step adds security even when the input is already high-entropy (each encryption gets a unique salt, so the same key produces different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2 iterations
**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be much faster for high-throughput scenarios but removes the per-encryption salt benefit (the IV still provides uniqueness for GCM). The hub uses password-based because the config format is human-manageable key strings. For `@alkdev/storage`, either approach works — the API accepts a "password" string which could be a raw key encoded as base64.
**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2 for raw key inputs.
### ED4: Application-managed key ring
The storage package provides `encrypt()` and `decrypt()` but does NOT manage the key ring. The consuming application:
1. Stores encryption keys in a secure location (Docker secrets, vault, config file with restricted permissions)
2. Loads keys at startup
3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
4. Handles key rotation (decrypt with old key, re-encrypt with current key)
This separation ensures:
- The storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- The encryption primitives are testable without a key ring implementation
### ED5: No key rotation utility in this package
Key rotation (decrypt with old key, re-encrypt with current key) is an application-level workflow:
1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
2. For each: decrypt with old key → encrypt with current key → update node
3. Commit transaction
The storage package provides the building blocks (`encrypt()`, `decrypt()`, `EncryptedDataSchema`), not the rotation workflow. The hub's background sweep pattern is a good reference implementation.
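One way an application might structure that workflow is sketched here. Everything in it is an assumption for illustration: `rotateSecrets`, `SecretNode`, and the injected `CryptoFns` are hypothetical, the node list is in memory, and persisting the result inside a transaction is left to the caller.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

interface SecretNode {
  key: string;
  attributes: { encryptedData: EncryptedData };
}

// Injected so the sketch does not depend on a concrete crypto module.
interface CryptoFns {
  encrypt(plaintext: string, password: string, keyVersion: number): Promise<EncryptedData>;
  decrypt(encrypted: EncryptedData, password: string): Promise<string>;
}

// Returns re-encrypted copies of every node still on an older key version.
async function rotateSecrets(
  nodes: SecretNode[],
  keys: Map<number, string>,
  currentVersion: number,
  fns: CryptoFns,
): Promise<SecretNode[]> {
  const currentKey = keys.get(currentVersion);
  if (currentKey === undefined) throw new Error(`No key for current version v${currentVersion}`);
  const rotated: SecretNode[] = [];
  for (const node of nodes) {
    const { encryptedData } = node.attributes;
    if (encryptedData.keyVersion >= currentVersion) continue; // already current
    const oldKey = keys.get(encryptedData.keyVersion);
    if (oldKey === undefined) throw new Error(`No key for version v${encryptedData.keyVersion}`);
    const plaintext = await fns.decrypt(encryptedData, oldKey);
    const reencrypted = await fns.encrypt(plaintext, currentKey, currentVersion);
    rotated.push({ ...node, attributes: { ...node.attributes, encryptedData: reencrypted } });
  }
  return rotated;
}
```

Returning new objects instead of mutating the inputs keeps the decision about how and when to write the updates with the application.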
## Integration with SQLite Host
Encrypted node attributes are stored as JSON text in the `nodes.attributes` column, same as any other node attributes. The `EncryptedDataSchema` validates the shape at the application level.
```ts
import { encrypt, decrypt } from "@alkdev/storage";
import { EncryptedDataSchema } from "@alkdev/storage";
const encryptionKey = "v1:YmFzZTY0a2V5"; // from application config
const plaintext = "sk-ant-api03-...";
const encryptedData = await encrypt(plaintext, encryptionKey, 1);
// Build the node attributes; validate them against the node type schema before storage
const attributes = {
key: "api_key",
encryptedData,
expiresAt: new Date().toISOString(),
created: new Date().toISOString(),
};
// Store as a node in a graph
// db.insert(nodes).values({ graphId, key: "anthropic-api-key", attributes });
// Retrieve and decrypt
// const node = await db.query.nodes.findFirst({ where: eq(nodes.key, "anthropic-api-key") });
// const decrypted = await decrypt(node.attributes.encryptedData, encryptionKey);
```
## Export Plan
The crypto module will be exported from the main `@alkdev/storage` package (no db deps):
```
src/graphs/
├── types.ts # existing: GraphConfig, NodeType, EdgeType, etc.
├── schemaBuilder.ts # existing: SchemaBuilder
├── crypto.ts # new: encrypt(), decrypt(), generateEncryptionKey(), EncryptedDataSchema
└── mod.ts # re-exports all of the above
```
This keeps the encryption utility in the zero-dep export path (it only uses Web Crypto API and `@alkdev/typebox` for the schema).
## Open Questions
1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys), this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly would be much faster. Decision: add in a future iteration if performance demands it.
2. **Should the `key` attribute on secret nodes be encrypted?** Currently only the `encryptedData` attribute is encrypted. The `key` (secret name like "api_key") is stored in plaintext for queryability. If secret names are themselves sensitive, they could be hashed instead. Decision: plaintext key names are acceptable for now. If needed, add a `keyHash` attribute for blind lookups (similar to the hub's `api_keys.keyHash`).
3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class columns?** The hub's `client_secrets` has these as columns for indexed queries. In the metagraph model, they're attributes inside the node JSON. SQLite can only index JSON properties through expression indexes or generated columns, not through ordinary column indexes. Decision: for spoke use (occasional lookups), JSON attributes are fine. For hub use (high-throughput key validation), a standalone `api_keys` table with proper indexes is still needed.
## References
- Hub crypto utility: `/workspace/@alkdev/hub/src/crypto/mod.ts`
- Hub `client_secrets` table: `/workspace/@alkdev/hub/docs/architecture/storage/services.md`
- Hub ADR-008: `/workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md`
- Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto

@@ -0,0 +1,282 @@
---
status: draft
last_updated: 2026-05-28
---
# Metagraph Model
The core data model: graph types define schemas, node types define shapes, edge types define relationships, and typed graph instances hold actual data.
## Overview
The metagraph pattern is a three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl", "task-dependencies"). Defines structural constraints (directed/undirected/mixed, allows self-loops, multi-edges) via a `GraphConfig`.
2. **NodeType** — A category of node within a graph type (e.g., "operation-call", "account", "task"). Each node type has a TypeBox schema that validates the `attributes` of nodes belonging to that type. Optionally constrains which edge types can connect from/to this node type.
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered", "can_read", "depends_on"). Each edge type has a TypeBox schema for its attributes. Optionally constrains which source/target node types are valid.
Then **Graph instances** belong to a graph type and contain **Nodes** and **Edges** conforming to those type definitions.
```
GraphType "call-graph" (directed, multi, self-loops allowed)
├── NodeType "call" → schema validates call attributes
├── NodeType "subcall" → schema validates subcall attributes
├── EdgeType "triggered" → allowedSourceTypes: ["call"], allowedTargetTypes: ["call", "subcall"]
└── EdgeType "depends_on" → allowedSourceTypes: ["call", "subcall"], allowedTargetTypes: ["call", "subcall"]
Graph "session-abc-call-graph" (instance)
│ graphTypeId → GraphType "call-graph"
│ status: "active"
├── Node "call-001" → nodeTypeId → NodeType "call"
│ └── attributes: { requestId, operationId, status, ... }
├── Node "call-002" → nodeTypeId → NodeType "subcall"
│ └── attributes: { requestId, parentRequestId, ... }
└── Edge "edge-001" → edgeTypeId → EdgeType "triggered"
└── attributes: { type: "triggered" }
sourceNodeKey: "call-001"
targetNodeKey: "call-002"
```
## Schema Types
Defined in `src/graphs/types.ts`. Zero database dependencies — these are pure TypeBox schemas used for validation and type inference.
### BaseNodeAttributes
```ts
{
created?: string, // ISO 8601 date-time
modified?: string, // ISO 8601 date-time
metadata?: Record<string, unknown>
}
```
Optional audit and extension fields. Node `attributes` should extend this.
### BaseEdgeAttributes
```ts
{
type: string, // edge type discriminator
metadata?: Record<string, unknown>
}
```
Every edge carries its type and optional metadata. Edge `attributes` should extend this.
### GraphConfig
```ts
{
type: "directed" | "undirected" | "mixed", // default: "mixed"
multi: boolean, // default: true
allowSelfLoops: boolean // default: true
}
```
Structural constraints for a graph type. Defaults encourage permissive graphs (mixed, multi-edges, self-loops) because most real-world graphs need these features.
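The defaulting behaviour described here can be pictured with a small standalone helper. This is a sketch only: `resolveGraphConfig` and `DEFAULT_GRAPH_CONFIG` are hypothetical names, and the real `SchemaBuilder.config()` applies defaults through its TypeBox schema rather than through code like this.

```typescript
type GraphBaseType = "directed" | "undirected" | "mixed";

interface GraphConfig {
  type: GraphBaseType;
  multi: boolean;
  allowSelfLoops: boolean;
}

// Defaults as documented above: permissive unless the caller narrows them.
const DEFAULT_GRAPH_CONFIG: GraphConfig = {
  type: "mixed",
  multi: true,
  allowSelfLoops: true,
};

// Fills in any field the caller left out (or passed as undefined).
function resolveGraphConfig(partial: Partial<GraphConfig> = {}): GraphConfig {
  return {
    type: partial.type ?? DEFAULT_GRAPH_CONFIG.type,
    multi: partial.multi ?? DEFAULT_GRAPH_CONFIG.multi,
    allowSelfLoops: partial.allowSelfLoops ?? DEFAULT_GRAPH_CONFIG.allowSelfLoops,
  };
}
```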
### NodeType
```ts
{
name: string,
schema: TSchema // TypeBox schema for node attributes
}
```
A node type definition. The `schema` validates the `attributes` of nodes that belong to this type. Consumer must extend `BaseNodeAttributes` in their schema — the metagraph model does not enforce this at the database level (SQLite can't enforce JSON schema), but the SchemaBuilder validates it at definition time.
### EdgeType
```ts
{
name: string,
schema: TSchema,
allowedSourceTypes?: string[],
allowedTargetTypes?: string[]
}
```
An edge type definition. Optionally constrains which node types can appear at source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is undefined, any node type is valid. When defined, only listed node types are valid endpoints.
### GraphSchema
```ts
{
config: GraphConfig,
nodeTypes: Record<string, NodeType>,
edgeTypes: Record<string, EdgeType>
}
```
The complete definition of a graph type. This is what `SchemaBuilder.build()` produces.
### GraphStatus & GraphBaseType
Constant-backed types for graph lifecycle and structural type:
- `GraphStatus`: `active`, `archived`, `draft`
- `GraphBaseType`: `directed`, `undirected`, `mixed`
These are provided both as `as const` objects (`GRAPH_STATUS`, `GRAPH_BASE_TYPE`) and as TypeBox schemas derived from the same definitions.
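For context, the `as const` object convention named in the commit message and AGENTS.md typically looks like the sketch below. The member keys (`ACTIVE`, `ARCHIVED`, `DRAFT`) and the `isGraphStatus` guard are assumptions for illustration; only the `GRAPH_STATUS` name and the three values come from this document.

```typescript
// Assumed member names; the values match the GraphStatus list above.
const GRAPH_STATUS = {
  ACTIVE: "active",
  ARCHIVED: "archived",
  DRAFT: "draft",
} as const;

// Union of the object's values: "active" | "archived" | "draft".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// Hypothetical runtime guard built from the same object.
function isGraphStatus(value: unknown): value is GraphStatus {
  return (
    typeof value === "string" &&
    (Object.values(GRAPH_STATUS) as readonly string[]).includes(value)
  );
}
```

Compared with a TypeScript `enum`, the object is an ordinary runtime value, so the union type and any runtime checks stay derived from a single definition.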
## SchemaBuilder
Defined in `src/graphs/schemaBuilder.ts`. Fluent builder API:
```ts
const schema = new SchemaBuilder()
.config({ type: "directed", multi: true, allowSelfLoops: false })
.nodeType("call", CallAttributesSchema)
.nodeType("subcall", SubcallAttributesSchema)
.edgeType("triggered", BaseEdgeAttributes, {
allowedSourceTypes: ["call"],
allowedTargetTypes: ["call", "subcall"],
})
.edgeType("depends_on", BaseEdgeAttributes)
.build();
```
### Validation
The builder validates at each step:
1. **`config()`** — Validates against `GraphConfig` schema. Applies defaults for missing fields.
2. **`nodeType()`** — Validates the schema is a valid TypeBox schema (`KindGuard.IsSchema`). Validates the resulting object against `NodeType` schema.
3. **`edgeType()`** — Same as nodeType, plus validates allowedSourceTypes/allowedTargetTypes are strings.
4. **`build()`** — Validates the complete schema against `GraphSchema`. Throws on any invalid structure.
**Error behavior**: The builder throws `Error` with a JSON-stringified list of validation errors (path + message). Validation failures do not roll back partial state — a builder that fails on the second `nodeType()` call still has the first node type in its schema. Callers should not reuse a builder after a failure. Create a new `SchemaBuilder` instead.
**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is undefined (or an empty array at the application layer), any node type is a valid endpoint. When a non-empty array is provided, only the listed node types are valid endpoints. The repository layer should enforce this at write time.
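The write-time check described above might look roughly like this in a repository layer. It is a sketch under assumptions: `assertEdgeEndpoints` and `isEndpointAllowed` are hypothetical names (no repository layer exists yet), and treating an empty array as "unconstrained" follows the application-layer rule stated in this paragraph.

```typescript
interface EdgeTypeConstraints {
  allowedSourceTypes?: string[];
  allowedTargetTypes?: string[];
}

// Undefined or empty means "any node type"; otherwise the type must be listed.
function isEndpointAllowed(allowed: string[] | undefined, nodeType: string): boolean {
  return allowed === undefined || allowed.length === 0 || allowed.includes(nodeType);
}

// Throws when either endpoint violates the edge type's constraints.
function assertEdgeEndpoints(
  edgeTypeName: string,
  constraints: EdgeTypeConstraints,
  sourceType: string,
  targetType: string,
): void {
  if (!isEndpointAllowed(constraints.allowedSourceTypes, sourceType)) {
    throw new Error(`Edge type "${edgeTypeName}" does not allow source node type "${sourceType}"`);
  }
  if (!isEndpointAllowed(constraints.allowedTargetTypes, targetType)) {
    throw new Error(`Edge type "${edgeTypeName}" does not allow target node type "${targetType}"`);
  }
}
```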
The SchemaBuilder enforces structural integrity at definition time. The database stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will be `jsonb` in PG). Database-level constraints (unique composite keys, cascade deletes) protect referential integrity, but the database does NOT validate JSON schema conformance. This is a deliberate trade-off:
- **Pro**: Schema changes don't require migrations. A graph type's schema evolves by updating the JSON blob.
- **Pro**: SQLite's JSON support is limited (no JSON schema constraints).
- **Con**: Invalid data can be inserted if application-level validation is bypassed.
- **Mitigation**: All repository-layer mutations validate against the current graph type's schema before writing.
## Node and Edge Identity
Nodes and edges use a **composite identity model**:
- **Node**: identified by `(graphId, key)` — unique within a graph. The `key` is a consumer-defined string (e.g., `"call-001"`, `"account:alice"`).
- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is optional for directed graphs but required for multi-edges.
This means consumers control their own identifiers within a graph. The database generates UUID `id` values for cross-graph references, but within a graph, the consumer's `key` is the identity.
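The composite identity rule can be illustrated with a tiny in-memory store. `InMemoryNodeStore` and `NodeRecord` are hypothetical; in the SQLite host the same guarantee is expected to come from a unique composite key rather than from application code like this.

```typescript
interface NodeRecord {
  graphId: string;
  key: string;
  attributes: Record<string, unknown>;
}

class InMemoryNodeStore {
  private readonly byIdentity = new Map<string, NodeRecord>();

  // Encode (graphId, key) as one unambiguous map key.
  private static identity(graphId: string, key: string): string {
    return JSON.stringify([graphId, key]);
  }

  // Rejects a second node with the same key in the same graph.
  insert(node: NodeRecord): void {
    const id = InMemoryNodeStore.identity(node.graphId, node.key);
    if (this.byIdentity.has(id)) {
      throw new Error(`Node "${node.key}" already exists in graph "${node.graphId}"`);
    }
    this.byIdentity.set(id, node);
  }

  find(graphId: string, key: string): NodeRecord | undefined {
    return this.byIdentity.get(InMemoryNodeStore.identity(graphId, key));
  }
}
```

The same `key` may appear in two different graphs without conflict, which is the point of scoping identity to the graph.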
## Attributes Storage
Node attributes and edge attributes are stored as JSON text in SQLite (will be `jsonb` in PG). The graph type's schema defines what shape these attributes should have, but the database doesn't enforce the schema — it stores whatever JSON is provided.
This design means:
- **Schema evolution**: Add optional fields to a node type schema without migration. Old nodes are still valid.
- **Schema versioning**: The `version` field on graph types tracks breaking schema changes. Consumer code can check the version before processing.
- **Validation boundary**: All validation happens in the repository layer (application code), not in the database.
## Versioning
Graph types have a `version` integer (default 1). This tracks **breaking** schema changes — field removals, type changes that break backward compatibility. Non-breaking changes (adding optional fields) do not require a version bump.
The repository layer should check `version` before processing to ensure compatibility. A version mismatch indicates the data format has changed incompatibly and the consumer should handle it explicitly.
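A version gate of that kind could be as small as the following sketch. `assertSupportedVersion` and `GraphTypeRecord` are hypothetical names, and listing supported versions explicitly is one possible policy, not something the package prescribes.

```typescript
interface GraphTypeRecord {
  name: string;
  version: number;
}

// Throws when the stored graph type version is not one the consumer handles.
function assertSupportedVersion(
  graphType: GraphTypeRecord,
  supportedVersions: readonly number[],
): void {
  if (!supportedVersions.includes(graphType.version)) {
    throw new Error(
      `Graph type "${graphType.name}" is at version ${graphType.version}; ` +
        `supported versions: ${supportedVersions.join(", ")}`,
    );
  }
}
```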
## Usage Patterns
### Defining a Call Graph Type
```ts
import { SchemaBuilder, BaseNodeAttributes, BaseEdgeAttributes } from "@alkdev/storage";
import { Type } from "@alkdev/typebox";
const CallNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
requestId: Type.String(),
operationId: Type.String(),
status: Type.Union([
Type.Literal("pending"),
Type.Literal("running"),
Type.Literal("completed"),
Type.Literal("failed"),
Type.Literal("aborted"),
]),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "directed", multi: false, allowSelfLoops: false })
.nodeType("call", CallNodeAttributes)
.edgeType("triggered", BaseEdgeAttributes)
.edgeType("depends_on", BaseEdgeAttributes)
.build();
```
### Defining an ACL Graph Type
```ts
const ACLNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
resourceType: Type.String(), // "project", "session", "client"
resourceId: Type.String(),
}),
]);
const ACLEdgeAttributes = Type.Intersect([
BaseEdgeAttributes,
Type.Object({
permission: Type.Union([
Type.Literal("read"),
Type.Literal("write"),
Type.Literal("admin"),
]),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "directed", multi: true, allowSelfLoops: false })
.nodeType("principal", ACLNodeAttributes) // accounts, groups
.nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
.edgeType("can_access", ACLEdgeAttributes, {
allowedSourceTypes: ["principal"],
allowedTargetTypes: ["resource"],
})
.build();
```
### Defining Encrypted Secret Storage as a Node Type
> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()` are planned additions. See [encrypted-data.md](./encrypted-data.md) for the design.
```ts
// PLANNED — not yet available
import { EncryptedDataSchema } from "@alkdev/storage";
const SecretNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
key: Type.String(), // secret key name
encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
expiresAt: Type.Optional(Type.String({ format: "date-time" })),
}),
]);
const schema = new SchemaBuilder()
.config({ type: "undirected", multi: false, allowSelfLoops: false })
.nodeType("secret", SecretNodeAttributes)
.build();
```
See [encrypted-data.md](./encrypted-data.md) for the full encrypted data design.
## References
- Hub call graph spec: `/workspace/@alkdev/hub/docs/architecture/storage/call-graph.md`
- Hub identity spec: `/workspace/@alkdev/hub/docs/architecture/storage/identity.md`
- TypeBox: https://github.com/sinclairzx/typebox
- SchemaBuilder source: `src/graphs/schemaBuilder.ts`
- Schema types source: `src/graphs/types.ts`

@@ -0,0 +1,146 @@
---
status: draft
last_updated: 2026-05-28
---
# @alkdev/storage — Overview
Typed graph storage with dual database hosts. Deno-first, published via JSR.
## Purpose
`@alkdev/storage` provides a **metagraph** storage model: graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Instances of these type definitions become actual graphs populated with nodes and edges.
This pattern replaces domain-specific table proliferation with a small number of general-purpose tables that can model anything — call graphs, ACL rules, task dependencies, encrypted secrets — while enforcing schema integrity through TypeBox validation.
The package evolved from `@ade/ade-v0/packages/core/graphs` and `@ade/ade-v0/packages/storage_sqlite`, simplified and refactored for the @alkdev ecosystem.
## Architecture
```
@alkdev/storage/
├── mod.ts → re-exports graphs/ (zero db deps)
├── src/
│ ├── graphs/ → schema types + SchemaBuilder (no db deps)
│ ├── sqlite/ → SQLite host (drizzle-orm/libsql)
│ │ ├── tables/ → drizzle table definitions
│ │ ├── relations.ts → drizzle relational mappings
│ │ ├── schema.ts → barrel re-export
│ │ └── client.ts → injectable createSqliteDatabase()
│ └── pg/ → PostgreSQL host (NOT YET IMPLEMENTED)
└── test/ → empty — tests not yet written
```
### Subpath Exports (JSR/npm)
| Export | Contents | Dependencies |
|--------|----------|-------------|
| `@alkdev/storage` | Graph schema types, SchemaBuilder | `@alkdev/typebox`, `@alkdev/drizzlebox` |
| `@alkdev/storage/graphs` | Same as `.` — alias for the main export | Same as `.` |
| `@alkdev/storage/sqlite` | SQLite tables, relations, client | + `drizzle-orm`, `@libsql/client` |
| `@alkdev/storage/pg` | PostgreSQL tables, relations, client | ⚠️ NOT YET IMPLEMENTED |
The `./graphs` subpath exists because the source code lives in `src/graphs/` and the main `mod.ts` re-exports it. Importing from either `@alkdev/storage` or `@alkdev/storage/graphs` yields the same types and SchemaBuilder.
## Terminology
| Term | Definition |
|------|-----------|
| **Metagraph** | A type system where graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Graph instances are concrete data conforming to these type definitions. |
| **Hub** | The central service in the hub-spoke architecture. Runs PostgreSQL, hosts API endpoints, coordinates spokes, and is the authoritative data store. `@alkdev/storage`'s PostgreSQL host (not yet implemented) targets the hub. |
| **Spoke** | A local/embedded instance that runs per-project or per-session. Uses SQLite for local storage. `@alkdev/storage`'s SQLite host targets spokes. |
| **Graph type** | A class of graphs (e.g., "call-graph", "acl"). Defines structural constraints (directed/undirected/mixed, multi-edges, self-loops) and the valid node/edge type vocabularies. Stored in the `graph_types` table. |
| **Node type** | A category of node within a graph type. Defines the attribute schema for nodes of that type. Stored in the `node_types` table. |
| **Edge type** | A category of edge within a graph type. Defines the attribute schema and optionally restricts which node types can be source/target. Stored in the `edge_types` table. |
| **Graph instance** | A concrete graph belonging to a graph type. Contains nodes and edges conforming to its type definitions. Stored in the `graphs` table. |
| **Consumer** | Code that imports `@alkdev/storage` (or a subpath) to define graph types and persist graph data. The hub and spokes are consumers. |
| **Repository layer** | ⚠️ Not yet implemented. The typed CRUD functions (insert, find, update, delete) that sit between consumer code and raw Drizzle queries. Performs schema validation before writes. |
| **Validation boundary** | The line where schema validation is enforced. In this package, validation happens in the SchemaBuilder (at type definition time) and the repository layer (at mutation time), NOT in the database. |
## Design Decisions
### D1: Deno-first, JSR publishes, npm comes free
The package is published to JSR (`deno publish`). npm compatibility is automatic via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
### D2: Metagraph over domain-specific tables
Instead of a table per domain concept (call graphs, ACL rules, task trees), we define graph types with typed node and edge schemas. A "call graph" is a graph type with specific node types (operation call, subcall) and edge types (triggered, depends_on). An "ACL graph" is a graph type with node types (account, resource) and edge types (can_read, can_write).
This trades some query convenience for generality. Domain-specific queries are built on top of the graph query layer, not baked into table schemas.
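As a rough sketch of what this looks like in practice, the ACL example above can be written down as plain data. `GraphTypeSketch`, `EdgeTypeSketch`, and `edgeTypesFrom` are hypothetical names used only for illustration, and the `config` values are assumed. The field names follow the `graph_types` and `edge_types` columns described in sqlite-host.md. This is not the `SchemaBuilder` API.

```typescript
// Hypothetical shapes for illustration; not the package's SchemaBuilder API.
interface EdgeTypeSketch {
  name: string;
  allowedSourceTypes: string[];
  allowedTargetTypes: string[];
}

interface GraphTypeSketch {
  name: string;
  config: {
    type: "directed" | "undirected" | "mixed";
    multi: boolean;
    allowSelfLoops: boolean;
  };
  nodeTypes: string[];
  edgeTypes: EdgeTypeSketch[];
}

// The "ACL graph" from the text. The config values are assumptions.
const aclGraphType: GraphTypeSketch = {
  name: "acl",
  config: { type: "directed", multi: false, allowSelfLoops: false },
  nodeTypes: ["account", "resource"],
  edgeTypes: [
    { name: "can_read", allowedSourceTypes: ["account"], allowedTargetTypes: ["resource"] },
    { name: "can_write", allowedSourceTypes: ["account"], allowedTargetTypes: ["resource"] },
  ],
};

// A domain-specific query layered on the generic model: which edge types
// may start at a given node type? An empty list means "no restriction".
function edgeTypesFrom(graphType: GraphTypeSketch, sourceNodeType: string): string[] {
  return graphType.edgeTypes
    .filter(
      (edge) =>
        edge.allowedSourceTypes.length === 0 ||
        edge.allowedSourceTypes.includes(sourceNodeType),
    )
    .map((edge) => edge.name);
}
```

The query helper lives outside the data definition, which is the trade-off described above: the tables stay generic and domain logic is layered on top.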
### D3: SchemaBuilder as the primary API surface
The `SchemaBuilder` fluent API is the intended way to construct graph type definitions. It validates against TypeBox schemas at build time, ensuring that graph/node/edge type definitions are structurally sound before they're persisted to the database.
### D4: Injectable clients, no module-level side effects
`createSqliteDatabase(client)` receives a pre-created client. Module-level side effects (auto-connections, env-based configuration) are forbidden. This enables testing with in-memory databases and containerized deployment patterns.
### D5: Drizzle + TypeBox (via drizzlebox) as the table definition pattern
Drizzle table definitions are the single source of truth for database schema. `@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from Drizzle tables, enabling runtime validation without manual schema duplication.
### D6: Enumeration pattern — `as const` objects, not TypeScript enums
All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript `enum`. This avoids JSR slow-type issues (the existing lint exclusion for `no-slow-types` was needed partly because of TS enums) and provides a consistent pattern across the codebase. The TypeBox schemas use `Type.Union` of `Type.Literal` values derived from the const object.
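A minimal sketch of the pattern follows. `GRAPH_STATUS` and its three values come from the package; the `isGraphStatus` guard is a hypothetical helper added here only to show how the derived union type can be used at runtime.

```typescript
// The const object is the single source of truth for the allowed values.
const GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
  Draft: "draft",
} as const;

// Union of the object's values: "active" | "archived" | "draft".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// Hypothetical runtime guard (not part of the package) that narrows
// arbitrary input to GraphStatus.
function isGraphStatus(value: unknown): value is GraphStatus {
  return (Object.values(GRAPH_STATUS) as unknown[]).includes(value);
}
```

Because the type is derived from the object, adding a new status in one place updates both the runtime values and the compile-time union.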
### D7: No comments in code
Per project convention across @alkdev packages, source files contain no inline comments. Documentation lives in architecture docs and TypeBox schema descriptions.
### D8: Common columns pattern
All tables share `id` (text PK), `metadata` (JSON text defaulting to `{}`), `createdAt`, and `updatedAt` (integer timestamps in SQLite, will be timestamptz in PG). This ensures every row has auditability and extensibility.
## Dependencies
| Package | Purpose | Layer |
|---------|---------|-------|
| `@alkdev/typebox` | Runtime schema validation | graphs/ |
| `@alkdev/drizzlebox` | Generate TypeBox from Drizzle tables | sqlite/ |
| `drizzle-orm` | ORM, table definitions, queries | sqlite/ (and future pg/) |
| `@libsql/client` | SQLite client (libsql/turso) | sqlite/ |
| `postgres` | PostgreSQL client | pg/ (not yet used) |
`@alkdev/typebox` and `@alkdev/drizzlebox` are npm packages (not yet on JSR). JSR handles npm dependencies natively.
## What Exists vs. What's Needed
### Implemented
- Graph schema types and SchemaBuilder
- SQLite host: 6 metagraph tables + actors table + Drizzle relations + client factory
- TypeBox select/insert schemas generated from Drizzle tables (drizzlebox)
### Not Yet Implemented
| Gap | Priority | Notes |
|-----|----------|-------|
| Encrypted data node type + crypto utility | **Critical** | ⚠️ Not yet implemented. API keys and secrets at rest. See [encrypted-data.md](./encrypted-data.md). |
| Repository/CRUD layer | High | ⚠️ Not yet implemented. Typed insert, find, update, delete functions for graphs, nodes, edges |
| Tests | High | Zero tests exist. Needed before any real use. |
| PostgreSQL host | Medium | Same table shapes, `pgTable` + `jsonb` + `timestamp` + `pgEnum`. Stub only. |
| ACL graph type | Medium | Access control as a graph. Depends on encrypted data and CRUD layer. |
| Call graph type | Low | Hub-specific, uses metagraph. Deferred until hub consumes this package. |
| Session/message models | Low | Hub-specific, may remain domain tables. |
## Open Questions
1. **Should `actors` be a node type or a standalone table?** Currently `actors` is a standalone table in the SQLite host that isn't referenced by any relation. If identity/authentication is a graph (ACL nodes), actors become node types. If identity is a domain concept that needs special query patterns (auth lookups, session joins), standalone tables may be better. Decision: defer until ACL design.
2. **Should the repository layer be host-specific or host-agnostic?** A host-agnostic repository (insert graph, find nodes by type) requires an abstraction over Drizzle's query builder. A host-specific repository is simpler but means duplicating query logic for PG. Decision: start host-specific in SQLite, extract common patterns later.
3. **Encrypted data scope**: Should encryption be per-attribute, per-node, or per-graph? Per-attribute (like hub's `client_secrets.value`) allows selective encryption. Per-node encrypts the entire `attributes` blob. Per-graph is overkill. Decision: per-attribute, modeled as an encrypted node type with a dedicated attribute for the ciphertext.
4. **Key management scope**: `@alkdev/storage` should provide the encryption/decryption primitives but NOT key management. The consuming application provides the key ring. This keeps the storage package agnostic to deployment-specific secret management.
5. **Migration strategy**: When graph type schemas evolve (new node types, changed attribute schemas), who handles migration? The repository layer should support schema version checking, but actual migration scripts are application-level. See [metagraph.md](./metagraph.md) for the versioning approach.
## References
- Hub storage spec: `/workspace/@alkdev/hub/docs/architecture/storage/`
- Source heritage: `@ade/ade-v0/packages/core/graphs` and `@ade/ade-v0/packages/storage_sqlite`
- Drizzle ORM: https://orm.drizzle.team/
- TypeBox: https://github.com/sinclairzx/typebox
- JSR: https://jsr.io/


@@ -0,0 +1,297 @@
---
status: draft
last_updated: 2026-05-28
---
# SQLite Host
The SQLite database host for `@alkdev/storage`. Uses Drizzle ORM with libsql/Turso for the SQLite dialect and `@alkdev/drizzlebox` for TypeBox schema generation from Drizzle table definitions.
## Overview
The SQLite host provides:
1. **Drizzle table definitions** for the metagraph pattern (graph types, node types, edge types, graphs, nodes, edges) plus a standalone `actors` table
2. **Drizzle relations** for the relational query API
3. **TypeBox schemas** auto-generated from Drizzle tables (select/insert validation)
4. **Injectable database factory**: `createSqliteDatabase(client)` accepts a pre-created client
The SQLite host is the first-class target. PostgreSQL will follow the same table shapes with appropriate dialect changes.
## Package Structure
```
src/sqlite/
├── tables/
│ ├── common.ts # commonCols, ACTOR_TYPE (`as const` object)
│ ├── graphTypes.ts # graph_types table + select/insert schemas
│ ├── nodeTypes.ts # node_types table + select/insert schemas
│ ├── edgeTypes.ts # edge_types table + select/insert schemas
│ ├── graphs.ts # graphs table + select/insert schemas
│ ├── nodes.ts # nodes table + select/insert schemas
│ ├── edges.ts # edges table + select/insert schemas
│ ├── actors.ts # actors table + select/insert schemas
│ └── index.ts # barrel re-export
├── relations.ts # Drizzle relational mappings
├── schema.ts # re-exports tables + relations
└── client.ts # createSqliteDatabase()
```
## Tables
### Common Columns
All tables share these columns:
```ts
{
  id: text("id").primaryKey(),
  metadata: text("metadata", { mode: "json" }).$type<Record<string, unknown>>().default({}),
  createdAt: integer("created_at", { mode: "timestamp" })
    .default(sql`(strftime('%s', 'now'))`)
    .notNull(),
  updatedAt: integer("updated_at", { mode: "timestamp" })
    .default(sql`(strftime('%s', 'now'))`)
    .notNull(),
}
```
**Notable differences from hub's PostgreSQL common columns**:
| Column | SQLite | PostgreSQL (hub) |
|--------|--------|-------------------|
| `id` | text PK (consumer-generated) | text PK with `$defaultFn(() => crypto.randomUUID())` |
| `metadata` | `text` with JSON mode | `jsonb` with `$type<Record<string, unknown>>()` |
| `createdAt` | `integer` timestamp mode (Unix epoch) | `timestamp with time zone` defaulting `now()` |
| `updatedAt` | `integer` timestamp mode (Unix epoch) | `timestamp with time zone` defaulting `now()` with `$onUpdate` |
The SQLite columns do NOT have `$defaultFn` for ID generation (the consumer provides IDs) and do NOT have `$onUpdate` for `updatedAt` (Drizzle's `$onUpdate` is application-level; consumers must set it explicitly).
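Because `updatedAt` is not refreshed automatically, a consumer might wrap update payloads in a small helper like the one below. This is a minimal sketch: `withUpdatedAt` is a hypothetical name, and it assumes only that Drizzle's `{ mode: "timestamp" }` integer columns take `Date` values.

```typescript
// Hypothetical helper, not part of @alkdev/storage: stamps an update payload
// with the current time so updatedAt is never forgotten.
function withUpdatedAt<T extends object>(
  changes: T,
  now: Date = new Date(),
): T & { updatedAt: Date } {
  return { ...changes, updatedAt: now };
}

// The second argument is injectable so tests can use a fixed clock.
const patch = withUpdatedAt({ name: "renamed-graph" }, new Date(0));
```

The resulting object would then be handed to the update call (for example `db.update(graphs).set(patch)`), keeping the timestamp logic in one place.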
### `graph_types`
Stores graph type definitions (schemas for classes of graphs).
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | Consumer-generated UUID |
| metadata | text (JSON) | default `{}` | Extension namespace |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null, **unique** | Graph type name (e.g., "call-graph", "acl") |
| description | text | default `""` | Human-readable description |
| config | text (JSON) | not null | `GraphConfig` — directed/undirected/mixed, multi, self-loops |
| version | integer | not null, default 1 | Breaking schema version |
### `node_types`
Stores node type definitions within a graph type. Each node type has a TypeBox schema that validates node attributes.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Node type name (e.g., "call", "account") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for node attributes |
**Unique constraint**: `(graphTypeId, name)` — node type names are unique within a graph type.
### `edge_types`
Stores edge type definitions within a graph type.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Edge type name (e.g., "triggered", "can_read") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for edge attributes |
| allowedSourceTypes | text (JSON) | default `[]` | Node type names valid at source endpoint |
| allowedTargetTypes | text (JSON) | default `[]` | Node type names valid at target endpoint |
**Unique constraint**: `(graphTypeId, name)` — edge type names are unique within a graph type.
**Empty array semantics**: `allowedSourceTypes` and `allowedTargetTypes` default to `[]` (empty JSON array) in the database. The repository layer must treat `[]` (empty array) as "no restriction" — any node type is a valid endpoint — matching the behavior of `undefined` in the `EdgeType` schema. A non-empty array restricts endpoints to only the listed node types. There is no "no types allowed" state; if edge types need to be disabled, use a status or soft-delete pattern on the edge type definition.
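The rule above can be sketched as a single predicate. `isEndpointAllowed` is a hypothetical name; the repository layer does not exist yet, so this only illustrates the intended semantics.

```typescript
// Hypothetical helper illustrating the endpoint rule: an undefined or empty
// list means "no restriction"; a non-empty list is an allow-list.
function isEndpointAllowed(
  allowedTypes: readonly string[] | undefined,
  nodeTypeName: string,
): boolean {
  if (allowedTypes === undefined || allowedTypes.length === 0) {
    return true;
  }
  return allowedTypes.includes(nodeTypeName);
}
```

The same function would be applied once with `allowedSourceTypes` and once with `allowedTargetTypes` before an edge is written.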
### `graphs`
Graph instances. Each graph belongs to a graph type.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | FK → graphTypes.id (set null) | Set null on graph type deletion (orphan graph) |
| name | text | not null | Graph instance name |
| description | text | default `""` | |
| status | text | not null, enum: `active`, `archived`, `draft` | Default: `draft` |
**On `graphTypeId` set null**: When a graph type is deleted, its graphs become orphans with `graphTypeId = null`. The application should prevent graph type deletion if active graphs reference it, or set affected graphs' `status` to `archived` as part of a soft-delete workflow. Orphan graphs cannot validate their node/edge types against a missing type definition — queries against orphan graphs should check for `graphTypeId !== null` before performing type-aware operations.
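One way to make the orphan check hard to forget is a guard that type-aware code must call first. The sketch below is illustrative only: `GraphRowSketch` lists just the columns needed here, and `requireGraphTypeId` is a hypothetical name.

```typescript
// Only the columns relevant to the check; an illustrative assumption.
interface GraphRowSketch {
  id: string;
  graphTypeId: string | null;
}

// Hypothetical guard: throws for orphan graphs instead of letting
// type-aware code run against a missing type definition.
function requireGraphTypeId(graph: GraphRowSketch): string {
  if (graph.graphTypeId === null) {
    throw new Error(`Graph ${graph.id} is orphaned: its graph type was deleted`);
  }
  return graph.graphTypeId;
}
```

Returning the narrowed `string` means callers cannot accidentally pass `null` on to a type lookup.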
### `nodes`
Nodes within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | not null | Consumer-defined identity within the graph |
| attributes | text (JSON) | not null, default `{}` | Node attributes validated by node type schema |
**Unique constraint**: `(graphId, key)` — node keys are unique within a graph.
**No `nodeTypeId` column**: Nodes do not have a direct FK to `node_types`. The node type is determined at the application layer. This is a deliberate design decision — adding a `nodeTypeId` FK would couple the graph instance layer to the type definition layer. The repository layer can enforce node type constraints via validation against the graph type's schema.
### `edges`
Edges within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | | Consumer-defined identity (null for anonymous edges) |
| sourceNodeKey | text | not null | Source node key within the graph |
| targetNodeKey | text | not null | Target node key within the graph |
| attributes | text (JSON) | not null, default `{}` | Edge attributes validated by edge type schema |
| undirected | integer (boolean) | default false | Treat as undirected regardless of graph type |
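**Unique constraint**: `(graphId, key)`. Edge keys are unique within a graph. SQLite treats `NULL` values as distinct in unique constraints, so any number of anonymous edges (null `key`) can coexist in the same graph.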
**Foreign keys**: `sourceNodeKey` and `targetNodeKey` reference `(nodes.graphId, nodes.key)` with cascade delete. Deleting a node removes all its edges.
### `actors`
Standalone identity table. Currently not referenced by any relation. This is a placeholder for the hub's account/identity model and may become a node type in an ACL graph or remain a standalone table. See [overview.md](./overview.md) Open Question 1.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null | Actor display name |
| type | text | not null, enum: `human`, `llm`, `agent` | Actor type |
## Relations
Drizzle relational mappings define the following relationships:
- **graphTypes → nodeTypes**: one-to-many
- **graphTypes → edgeTypes**: one-to-many
- **graphTypes → graphs**: one-to-many
- **graphs → nodes**: one-to-many
- **graphs → edges**: one-to-many
- **nodes → outgoing edges** (sourceNode): one-to-many
- **nodes → incoming edges** (targetNode): one-to-many
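- **edges → source node**: many-to-one (each edge has exactly one source node, resolved via the composite key)
- **edges → target node**: many-to-one (each edge has exactly one target node, resolved via the composite key)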
## Client Factory
```ts
import { createSqliteDatabase } from "@alkdev/storage/sqlite";
import type { SqliteDatabase } from "@alkdev/storage/sqlite";
import { createClient } from "@libsql/client";
const client = createClient({ url: "file:local.db" });
const db: SqliteDatabase = createSqliteDatabase(client);
```
The factory takes a pre-created `@libsql/client` client and returns a typed Drizzle database instance with the full schema attached. This enables:
- In-memory testing with `createClient({ url: ":memory:" })`
- Turso remote connections
- Custom client configuration (auth tokens, etc.)
## Design Decisions
### SD1: JSON text vs. JSONB in SQLite
SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native `jsonb`. This means:
- SQLite has no equivalent of PostgreSQL's GIN indexes, so querying inside JSON columns is comparatively inefficient
- SQLite JSON validation relies on application-level checks (TypeBox schemas)
- PostgreSQL will get queryability benefits for JSON columns
The trade-off: SQLite is for spokes (local, infrequent queries), PostgreSQL is for the hub (frequent, complex queries).
### SD2: No `nodeTypeId` on nodes
Nodes don't carry a direct FK to `node_types`. The node type is enforced at the application layer. Reasons:
- Graph type schemas define which node types are valid. Adding a FK would duplicate this constraint.
- Node types can evolve (schemas can change) without requiring node row updates.
- The repository layer validates node attributes against the appropriate node type schema before insertion.
This may change if query performance requires filtering nodes by type. A `nodeTypeId` column can be added as a denormalized index.
### SD3: Edge identity uses consumer-defined keys
Edges use `(graphId, key)` as their unique identity. The `key` is consumer-defined, matching the metagraph model where consumers control identifiers. For anonymous edges (common in simple graphs), `key` can be left null or auto-generated by the consumer.
### SD4: Composite foreign keys for node references
Edges reference nodes via composite FKs: `(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures referential integrity within a graph and cascades node deletions to connected edges.
### SD5: Enum pattern — `as const` objects, not TypeScript enums
All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript `enum`. This matches the `ACTOR_TYPE` pattern in `common.ts` and avoids JSR slow-type issues. The TypeBox schema is a `Type.Union` of `Type.Literal` values derived from the object.
## Metadata Convention
Every table has a `metadata` JSON column defaulting to `{}`. This is an extension namespace for subsystem use, following a namespacing convention: `_subsystem.key` (e.g., `_keypal.scopes`, `_retention.expiresAt`).
**What metadata is for**: Opaque key-value pairs that subsystems add without schema changes. It's never queried in WHERE clauses or JOINs.
**What metadata is NOT for**: A replacement for typed columns. If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in metadata. When in doubt, add a column.
**Namespacing convention**: Subsystems should prefix their keys (e.g., `_callgraph.payloadRef`, `_acl.inherited`). Unprefixed keys are reserved for the storage package itself.
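A small helper can keep subsystem keys consistently prefixed. The sketch below assumes the namespaced key is stored as a single flat string such as `"_retention.expiresAt"` rather than as a nested object; the document does not say which, so treat that as an assumption. `setNamespaced` is a hypothetical name.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical helper: returns a copy of the metadata with one namespaced key
// set. Assumes flat keys of the form "_subsystem.key".
function setNamespaced(
  metadata: Metadata,
  subsystem: string,
  key: string,
  value: unknown,
): Metadata {
  if (subsystem.length === 0 || subsystem.startsWith("_") || subsystem.includes(".")) {
    throw new Error(`Invalid subsystem name: "${subsystem}"`);
  }
  return { ...metadata, [`_${subsystem}.${key}`]: value };
}
```

Returning a copy rather than mutating the input keeps the helper safe to use on rows that were just read from the database.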
## Concurrency Model
The SQLite host targets spoke deployments where a single process accesses the database. For this model, SQLite's default journal mode is sufficient. However, for spoke deployments that may run concurrent writes (e.g., multiple worker threads), consumers should:
1. **Enable WAL mode**: `PRAGMA journal_mode=WAL;` — allows concurrent reads during writes
2. **Set busy timeout**: `PRAGMA busy_timeout=5000;` — wait up to 5 seconds for lock acquisition
3. **Use a single writer**: SQLite supports one writer at a time. If multiple threads write, route writes through a single queue or connection
The `createSqliteDatabase()` factory does not set these pragmas — it's the consumer's responsibility to configure the SQLite connection appropriately. The libsql client used to create the connection can be pre-configured before passing it to the factory.
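The "single writer" recommendation can be sketched as a promise-chained queue that runs write tasks one at a time. `WriteQueue` is a hypothetical class, not something the package provides, and it only serializes writes issued from within one process.

```typescript
// Hypothetical single-writer queue: each task starts only after the previous
// one has settled, so at most one write is in flight at a time.
class WriteQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => task());
    // Keep the chain alive even if a task fails; the caller still sees
    // the rejection through the returned promise.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

A consumer would wrap each database write in `queue.enqueue(() => ...)`, while reads can bypass the queue when WAL mode is enabled.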
## PostgreSQL Porting Notes
When implementing `src/pg/`, the table shapes remain the same but with these changes:
| SQLite | PostgreSQL |
|--------|------------|
| `sqliteTable` | `pgTable` |
| `text` (JSON mode) | `jsonb` with `.$type<T>()` |
| `integer` (timestamp mode) | `timestamp` with timezone |
| `` sql`(strftime('%s', 'now'))` `` | `` sql`now()` `` |
| `integer` (boolean mode) | `boolean` |
| `text` (enum) | `pgEnum` or `text` with check constraint |
See hub's `commonCols` reference in [table-reference.md](../../hub/docs/architecture/storage/table-reference.md) for the PostgreSQL patterns.
## References
- Drizzle ORM SQLite core: https://orm.drizzle.team/docs/sqlite-core
- libsql client: https://github.com/tursodatabase/libsql
- Hub common columns pattern: `/workspace/@alkdev/hub/docs/architecture/storage/table-reference.md`
- Source: `src/sqlite/`

scripts/analyze_lint.ts Normal file

@@ -0,0 +1,249 @@
import { parse } from "@std/flags";
import * as path from "@std/path";
interface LintRange {
start: { line: number; col: number; bytePos: number };
end: { line: number; col: number; bytePos: number };
}
interface LintDiagnostic {
filename: string;
range: LintRange;
message: string;
code: string;
hint: string;
}
interface LintResult {
version: number;
diagnostics: LintDiagnostic[];
errors: unknown[];
checkedFiles: string[];
}
interface FilterOptions {
codes?: string[];
files?: string[];
groupBy?: "code" | "file";
}
interface Stats {
total: number;
byCode: Record<string, number>;
byFile: Record<string, number>;
filesWithIssues: number;
}
function filterDiagnostics(
diagnostics: LintDiagnostic[],
options: FilterOptions
): LintDiagnostic[] {
let result = diagnostics;
if (options.codes) {
const codes = new Set(options.codes);
result = result.filter(d => codes.has(d.code));
}
if (options.files) {
const filePatterns = options.files.map(f => new RegExp(f));
result = result.filter(d =>
filePatterns.some(pattern => pattern.test(d.filename))
);
}
return result;
}
function groupDiagnostics(
diagnostics: LintDiagnostic[],
groupBy: "code" | "file"
): Record<string, LintDiagnostic[]> {
const groups: Record<string, LintDiagnostic[]> = {};
for (const diag of diagnostics) {
const key = groupBy === "code" ? diag.code : diag.filename;
if (!groups[key]) {
groups[key] = [];
}
groups[key].push(diag);
}
return groups;
}
function calculateStats(diagnostics: LintDiagnostic[]): Stats {
const byCode: Record<string, number> = {};
const byFile: Record<string, number> = {};
for (const diag of diagnostics) {
byCode[diag.code] = (byCode[diag.code] || 0) + 1;
byFile[diag.filename] = (byFile[diag.filename] || 0) + 1;
}
return {
total: diagnostics.length,
byCode,
byFile,
filesWithIssues: Object.keys(byFile).length
};
}
function printStats(stats: Stats, topN: number = 10) {
console.log("\n=== LINT ISSUE STATISTICS ===");
console.log(`Total issues: ${stats.total}`);
console.log(`Files with issues: ${stats.filesWithIssues}`);
console.log(`\nTop ${topN} issue types:`);
const sortedByCode = Object.entries(stats.byCode).sort((a, b) => b[1] - a[1]);
for (let i = 0; i < Math.min(topN, sortedByCode.length); i++) {
const [code, count] = sortedByCode[i];
console.log(` ${code}: ${count}`);
}
console.log(`\nTop ${topN} files with most issues:`);
const sortedByFile = Object.entries(stats.byFile).sort((a, b) => b[1] - a[1]);
for (let i = 0; i < Math.min(topN, sortedByFile.length); i++) {
const [file, count] = sortedByFile[i];
console.log(` ${path.basename(file)}: ${count}`);
}
}
function printGroupedDiagnostics(
groups: Record<string, LintDiagnostic[]>,
groupBy: "code" | "file",
limit?: number
) {
const sortedEntries = Object.entries(groups).sort(
(a, b) => b[1].length - a[1].length
);
const entriesToShow = limit ? sortedEntries.slice(0, limit) : sortedEntries;
for (const [key, diagnostics] of entriesToShow) {
console.log(`\n${groupBy.toUpperCase()}: ${key} (${diagnostics.length} issues)`);
// Show first 5 issues for each group to avoid overwhelming output
const issuesToShow = Math.min(5, diagnostics.length);
for (let i = 0; i < issuesToShow; i++) {
const diag = diagnostics[i];
console.log(
` ${path.basename(diag.filename)}:${diag.range.start.line + 1}:${diag.range.start.col + 1} - ${diag.message}`
);
}
if (diagnostics.length > issuesToShow) {
console.log(` ... and ${diagnostics.length - issuesToShow} more issues`);
}
}
if (limit && sortedEntries.length > limit) {
console.log(`\n... and ${sortedEntries.length - limit} more groups`);
}
}
async function runDenoLint(): Promise<LintResult> {
const command = new Deno.Command(Deno.execPath(), {
args: ["lint", "--json"],
stdout: "piped",
stderr: "piped",
});
const { code, stdout, stderr } = await command.output();
if (code !== 0 && code !== 1) { // Deno lint returns 1 when there are lint issues
const errorOutput = new TextDecoder().decode(stderr);
throw new Error(`Lint command failed:\n${errorOutput}`);
}
const output = new TextDecoder().decode(stdout);
return JSON.parse(output);
}
async function main() {
const args = parse(Deno.args, {
alias: {
f: "file",
c: "code",
g: "group",
h: "help",
s: "stats",
l: "limit"
},
string: ["file", "code", "group"],
boolean: ["help", "stats"],
default: { limit: 0 } // 0 means no limit
});
if (args.help) {
console.log(`
Usage: deno run analyze_lint.ts [options] [lint-output.json]
Options:
-f, --file <pattern> Filter by file path pattern (regex, can be used multiple times)
-c, --code <codes> Filter by comma-separated lint codes
-g, --group <type> Group by "code" or "file"
-s, --stats Show statistics summary
-l, --limit <number> Limit number of groups to display when grouping
-h, --help Show this help message
Examples:
deno run analyze_lint.ts -c no-unused-vars
deno run analyze_lint.ts -f ".*\\.ts" -f ".*\\.tsx" -g file
deno run analyze_lint.ts --code=no-explicit-any,verbatim-module-syntax
deno run analyze_lint.ts --stats
deno run analyze_lint.ts --group code --limit 5
`);
return;
}
let lintResult: LintResult;
// Read from file or run lint command
if (args._.length > 0) {
const filePath = String(args._[0]);
const content = await Deno.readTextFile(filePath);
lintResult = JSON.parse(content);
} else {
lintResult = await runDenoLint();
}
// Apply filters
const filterOptions: FilterOptions = {};
if (args.code) {
filterOptions.codes = args.code.split(",").map(c => c.trim());
}
if (args.file) {
// Handle multiple file patterns
filterOptions.files = Array.isArray(args.file)
? args.file
: [args.file];
}
const filteredDiagnostics = filterDiagnostics(
lintResult.diagnostics,
filterOptions
);
// Show statistics if requested
if (args.stats) {
const stats = calculateStats(filteredDiagnostics);
printStats(stats);
}
// Group or show all diagnostics
if (args.group) {
const groups = groupDiagnostics(filteredDiagnostics, args.group as "code" | "file");
printGroupedDiagnostics(groups, args.group as "code" | "file", (args.limit as number) || undefined);
} else if (!args.stats) {
// Only show JSON output if neither stats nor grouping is requested
console.log(JSON.stringify({ diagnostics: filteredDiagnostics }, null, 2));
}
if (!args.stats) {
console.log(`\nFound ${filteredDiagnostics.length} issues matching criteria`);
}
}
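if (import.meta.main) {
  main().catch((error) => {
    // Report the failure and exit non-zero so callers can detect it
    console.error(error);
    Deno.exit(1);
  });
}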


@@ -51,20 +51,28 @@ export const GraphSchema: TSchema = Type.Object({
export type GraphSchema = Static<typeof GraphSchema>;
export enum EnumGraphStatus {
Active = "active",
Archived = "archived",
Draft = "draft",
}
export const GRAPH_STATUS = {
Active: "active",
Archived: "archived",
Draft: "draft",
} as const;
export type GraphStatus = Static<typeof GraphStatus>;
export const GraphStatus: TSchema = Type.Enum(EnumGraphStatus);
export type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];
export const GraphStatus: TSchema = Type.Union([
Type.Literal(GRAPH_STATUS.Active),
Type.Literal(GRAPH_STATUS.Archived),
Type.Literal(GRAPH_STATUS.Draft),
]);
export enum EnumGraphBaseType {
Directed = "directed",
Undirected = "undirected",
Mixed = "mixed",
}
export const GRAPH_BASE_TYPE = {
Directed: "directed",
Undirected: "undirected",
Mixed: "mixed",
} as const;
export type GraphBaseType = Static<typeof GraphBaseType>;
export const GraphBaseType: TSchema = Type.Enum(EnumGraphBaseType);
export type GraphBaseType = (typeof GRAPH_BASE_TYPE)[keyof typeof GRAPH_BASE_TYPE];
export const GraphBaseType: TSchema = Type.Union([
Type.Literal(GRAPH_BASE_TYPE.Directed),
Type.Literal(GRAPH_BASE_TYPE.Undirected),
Type.Literal(GRAPH_BASE_TYPE.Mixed),
]);


@@ -37,7 +37,7 @@ export const SelectEdge = createSelectSchema(edges, {
export type SelectEdge = Static<typeof SelectEdge>;
export const InsertEdge = createInsertSchema(edges, {
key: Type.String({ minLength: 1 }),
key: Type.Optional(Type.String({ minLength: 1 })),
attributes: AttributesSchema,
});


@@ -2,23 +2,15 @@ import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";
import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { GraphConfig } from "../../graphs/types.ts";
const ConfigSchema = Type.Object({
type: Type.Union([
Type.Literal("directed"),
Type.Literal("undirected"),
Type.Literal("mixed"),
], { default: "mixed" }),
multi: Type.Boolean({ default: true }),
allowSelfLoops: Type.Boolean({ default: true }),
});
type GraphConfigType = Static<typeof GraphConfig>;
export const graphTypes = sqliteTable("graph_types", {
...commonCols,
name: text("name").notNull().unique(),
description: text("description").default(""),
config: text("config", { mode: "json" }).$type<Static<typeof ConfigSchema>>().notNull(),
config: text("config", { mode: "json" }).$type<GraphConfigType>().notNull(),
version: integer("version").notNull().default(1),
});


@@ -3,7 +3,7 @@ import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { graphTypes } from "./graphTypes.ts";
import { EnumGraphStatus } from "../../graphs/types.ts";
import { GRAPH_STATUS } from "../../graphs/types.ts";
export const graphs = sqliteTable("graphs", {
...commonCols,
@@ -26,9 +26,9 @@ export type SelectGraph = Static<typeof SelectGraph>;
export const InsertGraph = createInsertSchema(graphs, {
name: Type.String({ minLength: 2 }),
status: Type.Optional(Type.Union([
Type.Literal(EnumGraphStatus.Active),
Type.Literal(EnumGraphStatus.Archived),
Type.Literal(EnumGraphStatus.Draft),
Type.Literal(GRAPH_STATUS.Active),
Type.Literal(GRAPH_STATUS.Archived),
Type.Literal(GRAPH_STATUS.Draft),
])),
});


@@ -2,8 +2,8 @@ export { graphs } from "./graphs.ts";
export type { SelectGraph, InsertGraph } from "./graphs.ts";
export { SelectGraph as SelectGraphSchema, InsertGraph as InsertGraphSchema } from "./graphs.ts";
export { nodes } from "./nodes.ts";
export type { InsertNode } from "./nodes.ts";
export { InsertNodeSchema } from "./nodes.ts";
export type { SelectNode, InsertNode } from "./nodes.ts";
export { SelectNodeSchema, InsertNodeSchema } from "./nodes.ts";
export { edges } from "./edges.ts";
export type { SelectEdge, InsertEdge } from "./edges.ts";
export { SelectEdge as SelectEdgeSchema, InsertEdge as InsertEdgeSchema } from "./edges.ts";


@@ -1,5 +1,5 @@
import { sqliteTable, text, unique } from "drizzle-orm/sqlite-core";
import { createInsertSchema } from "@alkdev/drizzlebox";
import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { graphs } from "./graphs.ts";
@@ -15,6 +15,15 @@ export const nodes = sqliteTable("nodes", {
graphKeyIdx: unique().on(table.graphId, table.key),
}));
export const SelectNodeSchema = createSelectSchema(nodes, {
attributes: AttributesSchema,
metadata: Type.Object({}, { additionalProperties: true }),
createdAt: Type.Date(),
updatedAt: Type.Date(),
});
export type SelectNode = Static<typeof SelectNodeSchema>;
export const InsertNodeSchema = createInsertSchema(nodes, {
key: Type.String({ minLength: 1 }),
attributes: AttributesSchema,