fix: use import type for GraphConfig, remove verbatim-module-syntax exclusion

The verbatim-module-syntax lint rule was correctly flagging that
GraphConfig is only used in a type position (typeof GraphConfig). Since
typeof resolves purely at the type level, import type works fine here
and is the correct form. No lint exclusion needed.

Also: deno fmt across all files (markdown line wrapping).
2026-05-28 13:38:42 +00:00
parent b0298663dc
commit bb544469fd
34 changed files with 1279 additions and 617 deletions


@@ -5,35 +5,47 @@ last_updated: 2026-05-28
# Encrypted Data
Design for storing encrypted data at rest within the metagraph model. Adapts the hub's AES-256-GCM + PBKDF2 encryption pattern as a reusable node type and crypto utility.
Design for storing encrypted data at rest within the metagraph model. Adapts the
hub's AES-256-GCM + PBKDF2 encryption pattern as a reusable node type and crypto
utility.
## Overview
Sensitive data — API keys, passwords, OAuth tokens, SSH keys — must be encrypted at rest. The hub's `client_secrets` table stores these as encrypted JSON blobs. In `@alkdev/storage`, the same encryption pattern becomes a reusable utility and an encrypted node type, so any graph can store secrets without special table definitions.
Sensitive data — API keys, passwords, OAuth tokens, SSH keys — must be encrypted
at rest. The hub's `client_secrets` table stores these as encrypted JSON blobs.
In `@alkdev/storage`, the same encryption pattern becomes a reusable utility and
an encrypted node type, so any graph can store secrets without special table
definitions.
**Key principle**: The storage package provides the **encryption primitives and the schema shape**, not key management. Consumers provide the encryption key. This keeps the package agnostic to deployment-specific secret management.
**Key principle**: The storage package provides the **encryption primitives and
the schema shape**, not key management. Consumers provide the encryption key.
This keeps the package agnostic to deployment-specific secret management.
## The Problem
The hub has `client_secrets` as a standalone table with columns like:
| Column | Purpose |
|--------|---------|
| `clientId` | FK to the client this secret belongs to |
| `key` | Secret name (e.g., "api_key", "oauth_credentials") |
| `value` | The encrypted payload (EncryptedData JSON) |
| `keyVersion` | Which encryption key version was used |
| `expiresAt` | When the secret expires |
| `lastUsedAt` | Audit trail |
| Column | Purpose |
| ------------ | -------------------------------------------------- |
| `clientId` | FK to the client this secret belongs to |
| `key` | Secret name (e.g., "api_key", "oauth_credentials") |
| `value` | The encrypted payload (EncryptedData JSON) |
| `keyVersion` | Which encryption key version was used |
| `expiresAt` | When the secret expires |
| `lastUsedAt` | Audit trail |
This is a domain-specific table. The encryption logic itself is generic — AES-256-GCM with PBKDF2 key derivation and key versioning. When we want encrypted secrets in a spoke (local SQLite) or in a different domain model, we shouldn't have to duplicate the table definition or the crypto code.
This is a domain-specific table. The encryption logic itself is generic —
AES-256-GCM with PBKDF2 key derivation and key versioning. When we want
encrypted secrets in a spoke (local SQLite) or in a different domain model, we
shouldn't have to duplicate the table definition or the crypto code.
## Design: Encrypted Data as a Node Type
Instead of a dedicated `client_secrets` table, encrypted data becomes a **node type** in a graph:
Instead of a dedicated `client_secrets` table, encrypted data becomes a **node
type** in a graph:
```ts
import { SchemaBuilder, BaseNodeAttributes, BaseEdgeAttributes } from "@alkdev/storage";
import { BaseEdgeAttributes, BaseNodeAttributes, SchemaBuilder } from "@alkdev/storage";
import { Type } from "@alkdev/typebox";
import { EncryptedDataSchema } from "@alkdev/storage";
@@ -49,107 +61,142 @@ const SecretNodeType = Type.Intersect([
const schema = new SchemaBuilder()
.config({ type: "undirected", multi: false, allowSelfLoops: false })
.nodeType("secret", SecretNodeType)
.nodeType("client", Type.Intersect([
BaseNodeAttributes,
Type.Object({
name: Type.String(),
type: Type.String(),
config: Type.Record(Type.String(), Type.Any()),
enabled: Type.Boolean({ default: true }),
}),
]))
.edgeType("has_secret", Type.Intersect([
BaseEdgeAttributes,
Type.Object({
secretKey: Type.String(),
}),
]), {
allowedSourceTypes: ["client"],
allowedTargetTypes: ["secret"],
})
.nodeType(
"client",
Type.Intersect([
BaseNodeAttributes,
Type.Object({
name: Type.String(),
type: Type.String(),
config: Type.Record(Type.String(), Type.Any()),
enabled: Type.Boolean({ default: true }),
}),
]),
)
.edgeType(
"has_secret",
Type.Intersect([
BaseEdgeAttributes,
Type.Object({
secretKey: Type.String(),
}),
]),
{
allowedSourceTypes: ["client"],
allowedTargetTypes: ["secret"],
},
)
.build();
```
This represents the same relationship as `client_secrets.clientId` — but as a graph edge rather than a foreign key.
This represents the same relationship as `client_secrets.clientId` — but as a
graph edge rather than a foreign key.
### Why This Works
1. **No special tables needed** — The existing `graph_types`, `node_types`, `edge_types`, `graphs`, `nodes`, `edges` tables store everything.
2. **Schema validation** — The `EncryptedDataSchema` TypeBox schema validates the encryption envelope at write time.
3. **Domain flexibility** — An "ACL graph" might also have encrypted credential nodes. A "call graph" might store encrypted auth headers. Different graphs, same pattern.
4. **Query through edges** — "Find all secrets for client X" becomes "find all edges of type `has_secret` from node X to secret nodes."
5. **The crypto utility is shared**: `@alkdev/storage` exports `encrypt()` and `decrypt()` that any consumer uses.
1. **No special tables needed** — The existing `graph_types`, `node_types`,
`edge_types`, `graphs`, `nodes`, `edges` tables store everything.
2. **Schema validation** — The `EncryptedDataSchema` TypeBox schema validates
the encryption envelope at write time.
3. **Domain flexibility** — An "ACL graph" might also have encrypted credential
nodes. A "call graph" might store encrypted auth headers. Different graphs,
same pattern.
4. **Query through edges** — "Find all secrets for client X" becomes "find all
edges of type `has_secret` from node X to secret nodes."
5. **The crypto utility is shared**: `@alkdev/storage` exports `encrypt()` and
`decrypt()` that any consumer uses.
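
To illustrate the "query through edges" point, here is a minimal in-memory sketch. The `Edge` shape and the `findSecretKeys` helper are assumptions made for illustration; the actual repository layer and row shapes in `@alkdev/storage` may differ.

```ts
// Assumed in-memory edge shape; real storage rows likely carry more fields.
interface Edge {
  type: string;
  source: string; // key of the source node
  target: string; // key of the target node
}

// Hypothetical helper: "find all secrets for client X" expressed as an edge filter.
function findSecretKeys(edges: Edge[], clientKey: string): string[] {
  return edges
    .filter((edge) => edge.type === "has_secret" && edge.source === clientKey)
    .map((edge) => edge.target);
}
```

The same filter would typically be pushed down into a SQL query on the `edges` table rather than run in memory.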
### What Lives Where
| Layer | Responsibility | Package |
|-------|---------------|---------|
| `@alkdev/storage` graphs | `EncryptedDataSchema` (TypeBox shape) | `@alkdev/storage` |
| `@alkdev/storage` crypto | `encrypt()`, `decrypt()`, `generateEncryptionKey()` | `@alkdev/storage` |
| `@alkdev/storage` sqlite | Node storage (attributes contain encrypted JSON) | `@alkdev/storage/sqlite` |
| Application | Key management (key ring, key rotation) | Consumer |
| Application | Repository layer (validate schema, encrypt before insert) | Consumer |
| Layer | Responsibility | Package |
| ------------------------ | --------------------------------------------------------- | ------------------------ |
| `@alkdev/storage` graphs | `EncryptedDataSchema` (TypeBox shape) | `@alkdev/storage` |
| `@alkdev/storage` crypto | `encrypt()`, `decrypt()`, `generateEncryptionKey()` | `@alkdev/storage` |
| `@alkdev/storage` sqlite | Node storage (attributes contain encrypted JSON) | `@alkdev/storage/sqlite` |
| Application | Key management (key ring, key rotation) | Consumer |
| Application | Repository layer (validate schema, encrypt before insert) | Consumer |
## EncryptedData Schema
Ported from the hub's `src/crypto/mod.ts` interface, expressed as a TypeBox schema:
Ported from the hub's `src/crypto/mod.ts` interface, expressed as a TypeBox
schema:
```ts
import { Type } from "@alkdev/typebox";
export const EncryptedDataSchema = Type.Object({
keyVersion: Type.Integer({ minimum: 1, description: "Encryption key version for rotation" }),
keyVersion: Type.Integer({
minimum: 1,
description: "Encryption key version for rotation",
}),
salt: Type.String({ description: "Base64-encoded 16-byte PBKDF2 salt" }),
iv: Type.String({ description: "Base64-encoded 12-byte AES-GCM initialization vector" }),
iv: Type.String({
description: "Base64-encoded 12-byte AES-GCM initialization vector",
}),
data: Type.String({ description: "Base64-encoded AES-256-GCM ciphertext" }),
});
```
This is the same structure as the hub's `EncryptedData` interface but as a TypeBox schema, enabling runtime validation when inserting encrypted nodes.
This is the same structure as the hub's `EncryptedData` interface but as a
TypeBox schema, enabling runtime validation when inserting encrypted nodes.
## Crypto Utility
The encryption module provides three functions, ported from the hub's `src/crypto/mod.ts`:
The encryption module provides three functions, ported from the hub's
`src/crypto/mod.ts`:
### `encrypt(plaintext, password, keyVersion?): Promise<EncryptedData>`
Encrypts a string using AES-256-GCM with PBKDF2 key derivation.
**Process**:
1. Generate random 16-byte salt
2. Generate random 12-byte IV
3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations for v1)
3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations
for v1)
4. Encrypt plaintext with AES-256-GCM using the derived key and IV
5. Return `{ keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }`
5. Return
`{ keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }`
### `decrypt(encryptedData, password): Promise<string>`
Decrypts an `EncryptedData` object.
**Process**:
1. Decode base64 salt, IV, and ciphertext
2. Derive key from password + salt via PBKDF2, using the iteration count for `keyVersion`
3. Decrypt with AES-256-GCM
4. Return plaintext string
5. Throw `"Decryption failed: Invalid data or key"` on failure (no information leakage about which part failed)
5. Throw `"Decryption failed: Invalid data or key"` on failure (no information
leakage about which part failed)
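
The encrypt and decrypt steps above can be sketched with the Web Crypto API roughly as follows. This is an illustration of the documented process, not the hub's actual `src/crypto/mod.ts`; the names `encryptSketch`, `decryptSketch`, `deriveKey`, and the `ITERATIONS` table are assumptions.

```ts
// Sketch only: follows the documented steps, not the real implementation.
interface EncryptedData {
  keyVersion: number;
  salt: string; // base64, 16 bytes
  iv: string; // base64, 12 bytes
  data: string; // base64 ciphertext (includes the GCM auth tag)
}

// Assumed version-to-iterations table; only v1 is documented.
const ITERATIONS: Record<number, number> = { 1: 100_000 };

// Spread-based encoding is fine for small secrets, not for large payloads.
const toBase64 = (bytes: Uint8Array) => btoa(String.fromCharCode(...bytes));
const fromBase64 = (text: string) =>
  Uint8Array.from(atob(text), (c) => c.charCodeAt(0));

async function deriveKey(password: string, saltBase64: string, keyVersion: number) {
  const iterations = ITERATIONS[keyVersion];
  if (!iterations) throw new Error(`Unknown key version: ${keyVersion}`);
  const material = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", hash: "SHA-256", salt: fromBase64(saltBase64), iterations },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encryptSketch(
  plaintext: string,
  password: string,
  keyVersion = 1,
): Promise<EncryptedData> {
  const salt = toBase64(crypto.getRandomValues(new Uint8Array(16)));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(password, salt, keyVersion);
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { keyVersion, salt, iv: toBase64(iv), data: toBase64(new Uint8Array(ciphertext)) };
}

async function decryptSketch(encrypted: EncryptedData, password: string): Promise<string> {
  try {
    const key = await deriveKey(password, encrypted.salt, encrypted.keyVersion);
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromBase64(encrypted.iv) },
      key,
      fromBase64(encrypted.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    // One generic message for every failure mode, as described above.
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```

Because the salt and IV are random per call, encrypting the same plaintext twice with the same password yields different envelopes.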
### `generateEncryptionKey(): string`
Generates a 32-byte random key encoded as base64. Used by operators to create encryption keys for the key ring.
Generates a 32-byte random key encoded as base64. Used by operators to create
encryption keys for the key ring.
**Key ring format** (application-level, not in this package): A comma-separated list of `v{N}:{base64key}` pairs. The first key is the "current" key used for new encryptions. All keys are available for decryption.
**Key ring format** (application-level, not in this package): A comma-separated
list of `v{N}:{base64key}` pairs. The first key is the "current" key used for
new encryptions. All keys are available for decryption.
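
As an illustration of the key ring format, an application might generate and parse keys along these lines. `generateKeySketch`, `parseKeyRing`, and the `KeyRing` shape are hypothetical names for this sketch; they are not part of the package.

```ts
// Hypothetical shape for a parsed key ring.
interface KeyRing {
  currentVersion: number; // version of the first entry, used for new encryptions
  keys: Map<number, string>; // version -> base64 key, all usable for decryption
}

// Sketch of a 32-byte random key encoded as base64.
function generateKeySketch(): string {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  return btoa(String.fromCharCode(...bytes));
}

// Parses "v2:<base64>,v1:<base64>" into a KeyRing.
function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;
  for (const entry of raw.split(",")) {
    const match = /^v(\d+):(.+)$/.exec(entry.trim());
    // Error messages deliberately omit the entry so key material is never logged.
    if (!match) throw new Error("Malformed key ring entry: expected v{N}:{base64key}");
    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`Duplicate key version: v${version}`);
    keys.set(version, match[2]);
    currentVersion ??= version; // the first entry is the current key
  }
  if (currentVersion === undefined) throw new Error("Key ring is empty");
  return { currentVersion, keys };
}
```

A consumer would then pass `keys.get(encryptedData.keyVersion)` to `decrypt()` and `keys.get(currentVersion)` to `encrypt()`.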
### Key Versioning
PBKDF2 iteration count varies by key version:
- v1: 100,000 iterations
- Future versions: 200,000+ (adjust for hardware improvements)
This allows gradual security upgrades. Old data encrypted with v1 can still be decrypted. Re-encryption (rotate) reads with the old key and writes with the current key.
This allows gradual security upgrades. Old data encrypted with v1 can still be
decrypted. Re-encryption (rotate) reads with the old key and writes with the
current key.
### Web Crypto API
The implementation uses the standard Web Crypto API (`crypto.subtle`), available in:
The implementation uses the standard Web Crypto API (`crypto.subtle`), available
in:
- Deno runtime (native)
- Node.js 19+ (native)
- Modern browsers (native)
@@ -161,65 +208,100 @@ No external crypto dependencies.
### ED1: Per-attribute encryption, not per-node
The `EncryptedData` schema is a single attribute within a node type's attributes, not the entire node. This means:
The `EncryptedData` schema is a single attribute within a node type's
attributes, not the entire node. This means:
- A secret node can have unencrypted metadata alongside the encrypted value
- The node key (identity) is always readable for queries
- Only the sensitive payload is encrypted
**Alternative considered**: Encrypt the entire `attributes` column. This makes queries impossible (you can't find "all secrets for client X" if the client reference is encrypted). Per-attribute encryption preserves queryability on non-sensitive fields.
**Alternative considered**: Encrypt the entire `attributes` column. This makes
queries impossible (you can't find "all secrets for client X" if the client
reference is encrypted). Per-attribute encryption preserves queryability on
non-sensitive fields.
### ED2: Node type, not standalone table
Encrypted data is modeled as a node type rather than a dedicated `secrets` table because:
Encrypted data is modeled as a node type rather than a dedicated `secrets` table
because:
- **Graphs already provide the structure** — edges represent "client X has secret Y" without a join table
- **No foreign key proliferation** — new secret types (OAuth, SSH, API keys) are new node types, not new columns or tables
- **Uniform query patterns** — All graph queries work on secret nodes without special code
- **Graphs already provide the structure** — edges represent "client X has
secret Y" without a join table
- **No foreign key proliferation** — new secret types (OAuth, SSH, API keys) are
new node types, not new columns or tables
- **Uniform query patterns** — All graph queries work on secret nodes without
special code
**When a standalone table might be better**: If the hub needs to query "all active API keys" across all clients with a single indexed `WHERE` clause, a dedicated `api_keys` table with proper indexes is faster. The graph model requires traversing edges to find related secrets. For the hub's specific use case (key lookup on every authenticated request), this matters. The metagraph pattern is optimized for flexibility, not raw key-lookup performance. The hub should use a standalone `api_keys` table for authentication and the metagraph for everything else.
**When a standalone table might be better**: If the hub needs to query "all
active API keys" across all clients with a single indexed `WHERE` clause, a
dedicated `api_keys` table with proper indexes is faster. The graph model
requires traversing edges to find related secrets. For the hub's specific use
case (key lookup on every authenticated request), this matters. The metagraph
pattern is optimized for flexibility, not raw key-lookup performance. The hub
should use a standalone `api_keys` table for authentication and the metagraph
for everything else.
### ED3: Password-based encryption, not raw-key encryption
The current implementation uses PBKDF2 to derive a key from a password string. The "password" in practice is a base64-encoded 32-byte random key from `generateEncryptionKey()`. This means:
The current implementation uses PBKDF2 to derive a key from a password string.
The "password" in practice is a base64-encoded 32-byte random key from
`generateEncryptionKey()`. This means:
- The key derivation step adds security even when the input is already high-entropy (each encryption gets a unique salt, so the same key produces different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2 iterations
- The key derivation step adds security even when the input is already
high-entropy (each encryption gets a unique salt, so the same key produces
different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2
iterations
**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be much faster for high-throughput scenarios but removes the per-encryption salt benefit (the IV still provides uniqueness for GCM). The hub uses password-based because the config format is human-manageable key strings. For `@alkdev/storage`, either approach works — the API accepts a "password" string which could be a raw key encoded as base64.
**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be
much faster for high-throughput scenarios but removes the per-encryption salt
benefit (the IV still provides uniqueness for GCM). The hub uses password-based
because the config format is human-manageable key strings. For
`@alkdev/storage`, either approach works — the API accepts a "password" string
which could be a raw key encoded as base64.
**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2 for raw key inputs.
**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If
performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2
for raw key inputs.
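
For comparison, the raw-key alternative could look something like the sketch below. `encryptRaw()` does not exist yet, so the signature, the `decryptRaw` counterpart, and the salt-free `RawEncryptedData` envelope are all assumptions; a real version would need to reconcile the envelope with `EncryptedDataSchema`, which requires a `salt`.

```ts
// Hypothetical envelope: no salt, because no key derivation takes place.
interface RawEncryptedData {
  keyVersion: number;
  iv: string; // base64, 12 bytes
  data: string; // base64 ciphertext
}

const b64encode = (bytes: Uint8Array) => btoa(String.fromCharCode(...bytes));
const b64decode = (text: string) =>
  Uint8Array.from(atob(text), (c) => c.charCodeAt(0));

// Imports the base64 key bytes directly as an AES-GCM key (no PBKDF2).
async function importRawKey(base64Key: string) {
  const bytes = b64decode(base64Key);
  if (bytes.length !== 32) throw new Error("Expected a 32-byte key");
  return crypto.subtle.importKey("raw", bytes, "AES-GCM", false, ["encrypt", "decrypt"]);
}

async function encryptRaw(
  plaintext: string,
  base64Key: string,
  keyVersion = 1,
): Promise<RawEncryptedData> {
  const key = await importRawKey(base64Key);
  // A fresh random IV per call is what keeps ciphertexts unique here.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { keyVersion, iv: b64encode(iv), data: b64encode(new Uint8Array(ciphertext)) };
}

async function decryptRaw(encrypted: RawEncryptedData, base64Key: string): Promise<string> {
  try {
    const key = await importRawKey(base64Key);
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: b64decode(encrypted.iv) },
      key,
      b64decode(encrypted.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```

Importing the key once and reusing the `CryptoKey` across a batch would avoid even the import cost, at the price of holding the key object in memory.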
### ED4: Application-managed key ring
The storage package provides `encrypt()` and `decrypt()` but does NOT manage the key ring. The consuming application:
The storage package provides `encrypt()` and `decrypt()` but does NOT manage the
key ring. The consuming application:
1. Stores encryption keys in a secure location (Docker secrets, vault, config file with restricted permissions)
1. Stores encryption keys in a secure location (Docker secrets, vault, config
file with restricted permissions)
2. Loads keys at startup
3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
4. Handles key rotation (decrypt with old key, re-encrypt with current key)
This separation ensures:
- The storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- The encryption primitives are testable without a key ring implementation
### ED5: No key rotation utility in this package
Key rotation (decrypt with old key, re-encrypt with current key) is an application-level workflow:
Key rotation (decrypt with old key, re-encrypt with current key) is an
application-level workflow:
1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
2. For each: decrypt with old key → encrypt with current key → update node
3. Commit transaction
The storage package provides the building blocks (`encrypt()`, `decrypt()`, `EncryptedDataSchema`), not the rotation workflow. The hub's background sweep pattern is a good reference implementation.
The storage package provides the building blocks (`encrypt()`, `decrypt()`,
`EncryptedDataSchema`), not the rotation workflow. The hub's background sweep
pattern is a good reference implementation.
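
The rotation steps above might be sketched at the application level as follows. The `SecretNode` shape, the `CryptoFns` interface, and `rotateStaleSecrets` are illustrative assumptions; the crypto functions are injected so the sketch stays independent of the package's real exports, and persisting the result inside a transaction is left to the caller.

```ts
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Assumed node shape: the encrypted envelope sits under attributes.encryptedData.
interface SecretNode {
  key: string;
  attributes: { encryptedData: EncryptedData };
}

// Injected so any encrypt/decrypt pair with the documented signatures can be used.
interface CryptoFns {
  encrypt(plaintext: string, password: string, keyVersion?: number): Promise<EncryptedData>;
  decrypt(data: EncryptedData, password: string): Promise<string>;
}

// Returns re-encrypted copies of every node whose keyVersion is older than current.
async function rotateStaleSecrets(
  nodes: SecretNode[],
  keyRing: Map<number, string>,
  currentVersion: number,
  fns: CryptoFns,
): Promise<SecretNode[]> {
  const currentKey = keyRing.get(currentVersion);
  if (currentKey === undefined) {
    throw new Error(`No key for current version v${currentVersion}`);
  }
  const rotated: SecretNode[] = [];
  for (const node of nodes) {
    const { encryptedData } = node.attributes;
    if (encryptedData.keyVersion >= currentVersion) continue; // already current
    const oldKey = keyRing.get(encryptedData.keyVersion);
    if (oldKey === undefined) {
      throw new Error(`No key for v${encryptedData.keyVersion} (node ${node.key})`);
    }
    const plaintext = await fns.decrypt(encryptedData, oldKey);
    const reEncrypted = await fns.encrypt(plaintext, currentKey, currentVersion);
    rotated.push({ ...node, attributes: { ...node.attributes, encryptedData: reEncrypted } });
  }
  return rotated;
}
```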
## Integration with SQLite Host
Encrypted node attributes are stored as JSON text in the `nodes.attributes` column, same as any other node attributes. The `EncryptedDataSchema` validates the shape at the application level.
Encrypted node attributes are stored as JSON text in the `nodes.attributes`
column, same as any other node attributes. The `EncryptedDataSchema` validates
the shape at the application level.
```ts
import { encrypt, decrypt } from "@alkdev/storage";
import { decrypt, encrypt } from "@alkdev/storage";
import { EncryptedDataSchema } from "@alkdev/storage";
const encryptionKey = "v1:YmFzZTY0a2V5"; // from application config
@@ -245,7 +327,8 @@ const attributes = {
## Export Plan
The crypto module will be exported from the main `@alkdev/storage` package (no db deps):
The crypto module will be exported from the main `@alkdev/storage` package (no
db deps):
```
src/graphs/
@@ -255,19 +338,37 @@ src/graphs/
└── mod.ts # re-exports all of the above
```
This keeps the encryption utility in the zero-dep export path (it only uses Web Crypto API and `@alkdev/typebox` for the schema).
This keeps the encryption utility in the zero-dep export path (it only uses Web
Crypto API and `@alkdev/typebox` for the schema).
## Open Questions
1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys), this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly would be much faster. Decision: add in a future iteration if performance demands it.
1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds
~100ms per operation. For batch secret operations (e.g., rotating 1000 keys),
   this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly
would be much faster. Decision: add in a future iteration if performance
demands it.
2. **Should the `key` attribute on secret nodes be encrypted?** Currently only the `encryptedData` attribute is encrypted. The `key` (secret name like "api_key") is stored in plaintext for queryability. If secret names are themselves sensitive, they could be hashed instead. Decision: plaintext key names are acceptable for now. If needed, add a `keyHash` attribute for blind lookups (similar to the hub's `api_keys.keyHash`).
2. **Should the `key` attribute on secret nodes be encrypted?** Currently only
the `encryptedData` attribute is encrypted. The `key` (secret name like
"api_key") is stored in plaintext for queryability. If secret names are
themselves sensitive, they could be hashed instead. Decision: plaintext key
names are acceptable for now. If needed, add a `keyHash` attribute for blind
lookups (similar to the hub's `api_keys.keyHash`).
3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class columns?** The hub's `client_secrets` has these as columns for indexed queries. In the metagraph model, they're attributes inside the node JSON. SQLite can't efficiently index JSON properties. Decision: for spoke use (occasional lookups), JSON attributes are fine. For hub use (high-throughput key validation), a standalone `api_keys` table with proper indexes is still needed.
3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class
columns?** The hub's `client_secrets` has these as columns for indexed
queries. In the metagraph model, they're attributes inside the node JSON.
SQLite can't efficiently index JSON properties. Decision: for spoke use
(occasional lookups), JSON attributes are fine. For hub use (high-throughput
key validation), a standalone `api_keys` table with proper indexes is still
needed.
## References
- Hub crypto utility: `/workspace/@alkdev/hub/src/crypto/mod.ts`
- Hub `client_secrets` table: `/workspace/@alkdev/hub/docs/architecture/storage/services.md`
- Hub ADR-008: `/workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md`
- Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto
- Hub `client_secrets` table:
`/workspace/@alkdev/hub/docs/architecture/storage/services.md`
- Hub ADR-008:
`/workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md`
- Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto


@@ -5,17 +5,27 @@ last_updated: 2026-05-28
# Metagraph Model
The core data model: graph types define schemas, node types define shapes, edge types define relationships, and typed graph instances hold actual data.
The core data model: graph types define schemas, node types define shapes, edge
types define relationships, and typed graph instances hold actual data.
## Overview
The metagraph pattern is a three-level type system:
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl", "task-dependencies"). Defines structural constraints (directed/undirected/mixed, allows self-loops, multi-edges) via a `GraphConfig`.
2. **NodeType** — A category of node within a graph type (e.g., "operation-call", "account", "task"). Each node type has a TypeBox schema that validates the `attributes` of nodes belonging to that type. Optionally constrains which edge types can connect from/to this node type.
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered", "can_read", "depends_on"). Each edge type has a TypeBox schema for its attributes. Optionally constrains which source/target node types are valid.
1. **GraphType** — A class of graphs (e.g., "call-graph", "acl",
"task-dependencies"). Defines structural constraints
(directed/undirected/mixed, allows self-loops, multi-edges) via a
`GraphConfig`.
2. **NodeType** — A category of node within a graph type (e.g.,
"operation-call", "account", "task"). Each node type has a TypeBox schema
that validates the `attributes` of nodes belonging to that type. Optionally
constrains which edge types can connect from/to this node type.
3. **EdgeType** — A category of edge within a graph type (e.g., "triggered",
"can_read", "depends_on"). Each edge type has a TypeBox schema for its
attributes. Optionally constrains which source/target node types are valid.
Then **Graph instances** belong to a graph type and contain **Nodes** and **Edges** conforming to those type definitions.
Then **Graph instances** belong to a graph type and contain **Nodes** and
**Edges** conforming to those type definitions.
```
GraphType "call-graph" (directed, multi, self-loops allowed)
@@ -40,7 +50,8 @@ Graph "session-abc-call-graph" (instance)
## Schema Types
Defined in `src/graphs/types.ts`. Zero database dependencies — these are pure TypeBox schemas used for validation and type inference.
Defined in `src/graphs/types.ts`. Zero database dependencies — these are pure
TypeBox schemas used for validation and type inference.
### BaseNodeAttributes
@@ -63,7 +74,8 @@ Optional audit and extension fields. Node `attributes` should extend this.
}
```
Every edge carries its type and optional metadata. Edge `attributes` should extend this.
Every edge carries its type and optional metadata. Edge `attributes` should
extend this.
### GraphConfig
@@ -75,7 +87,9 @@ Every edge carries its type and optional metadata. Edge `attributes` should exte
}
```
Structural constraints for a graph type. Defaults encourage permissive graphs (mixed, multi-edges, self-loops) because most real-world graphs need these features.
Structural constraints for a graph type. Defaults encourage permissive graphs
(mixed, multi-edges, self-loops) because most real-world graphs need these
features.
### NodeType
@@ -86,7 +100,10 @@ Structural constraints for a graph type. Defaults encourage permissive graphs (m
}
```
A node type definition. The `schema` validates the `attributes` of nodes that belong to this type. Consumer must extend `BaseNodeAttributes` in their schema — the metagraph model does not enforce this at the database level (SQLite can't enforce JSON schema), but the SchemaBuilder validates it at definition time.
A node type definition. The `schema` validates the `attributes` of nodes that
belong to this type. Consumer must extend `BaseNodeAttributes` in their schema —
the metagraph model does not enforce this at the database level (SQLite can't
enforce JSON schema), but the SchemaBuilder validates it at definition time.
### EdgeType
@@ -99,7 +116,10 @@ A node type definition. The `schema` validates the `attributes` of nodes that be
}
```
An edge type definition. Optionally constrains which node types can appear at source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is undefined, any node type is valid. When defined, only listed node types are valid endpoints.
An edge type definition. Optionally constrains which node types can appear at
source/target endpoints. When `allowedSourceTypes` or `allowedTargetTypes` is
undefined, any node type is valid. When defined, only listed node types are
valid endpoints.
### GraphSchema
@@ -111,7 +131,8 @@ An edge type definition. Optionally constrains which node types can appear at so
}
```
The complete definition of a graph type. This is what `SchemaBuilder.build()` produces.
The complete definition of a graph type. This is what `SchemaBuilder.build()`
produces.
### GraphStatus & GraphBaseType
@@ -120,7 +141,8 @@ Enum-backed types for graph lifecycle and structural type:
- `GraphStatus`: `active`, `archived`, `draft`
- `GraphBaseType`: `directed`, `undirected`, `mixed`
These are provided both as TypeScript enums and TypeBox schemas, derived from the same enum definition.
These are provided both as TypeScript enums and TypeBox schemas, derived from
the same enum definition.
## SchemaBuilder
@@ -143,53 +165,90 @@ const schema = new SchemaBuilder()
The builder validates at each step:
1. **`config()`** — Validates against `GraphConfig` schema. Applies defaults for missing fields.
2. **`nodeType()`** — Validates the schema is a valid TypeBox schema (`KindGuard.IsSchema`). Validates the resulting object against `NodeType` schema.
3. **`edgeType()`** — Same as nodeType, plus validates allowedSourceTypes/allowedTargetTypes are strings.
4. **`build()`** — Validates the complete schema against `GraphSchema`. Throws on any invalid structure.
1. **`config()`** — Validates against `GraphConfig` schema. Applies defaults for
missing fields.
2. **`nodeType()`** — Validates the schema is a valid TypeBox schema
(`KindGuard.IsSchema`). Validates the resulting object against `NodeType`
schema.
3. **`edgeType()`** — Same as nodeType, plus validates
allowedSourceTypes/allowedTargetTypes are strings.
4. **`build()`** — Validates the complete schema against `GraphSchema`. Throws
on any invalid structure.
**Error behavior**: The builder throws `Error` with a JSON-stringified list of validation errors (path + message). Validation failures do not roll back partial state — a builder that fails on the second `nodeType()` call still has the first node type in its schema. Callers should not reuse a builder after a failure. Create a new `SchemaBuilder` instead.
**Error behavior**: The builder throws `Error` with a JSON-stringified list of
validation errors (path + message). Validation failures do not roll back partial
state — a builder that fails on the second `nodeType()` call still has the first
node type in its schema. Callers should not reuse a builder after a failure.
Create a new `SchemaBuilder` instead.
**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is undefined (or an empty array at the application layer), any node type is a valid endpoint. When a non-empty array is provided, only the listed node types are valid endpoints. The repository layer should enforce this at write time.
**Edge type enforcement**: When `allowedSourceTypes` or `allowedTargetTypes` is
undefined (or an empty array at the application layer), any node type is a valid
endpoint. When a non-empty array is provided, only the listed node types are
valid endpoints. The repository layer should enforce this at write time.
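
The endpoint rule described above could be enforced in a repository layer along these lines. `EdgeTypeConstraints`, `isAllowedEndpoint`, and `assertEdgeEndpoints` are hypothetical names for this sketch, not part of the package.

```ts
// Assumed subset of an edge type definition relevant to endpoint checks.
interface EdgeTypeConstraints {
  allowedSourceTypes?: string[];
  allowedTargetTypes?: string[];
}

// Undefined or empty means "any node type is a valid endpoint".
function isAllowedEndpoint(allowed: string[] | undefined, nodeType: string): boolean {
  return allowed === undefined || allowed.length === 0 || allowed.includes(nodeType);
}

// Hypothetical write-time guard: throws if either endpoint violates the constraints.
function assertEdgeEndpoints(
  edgeTypeName: string,
  constraints: EdgeTypeConstraints,
  sourceNodeType: string,
  targetNodeType: string,
): void {
  if (!isAllowedEndpoint(constraints.allowedSourceTypes, sourceNodeType)) {
    throw new Error(`Edge type "${edgeTypeName}" does not allow source type "${sourceNodeType}"`);
  }
  if (!isAllowedEndpoint(constraints.allowedTargetTypes, targetNodeType)) {
    throw new Error(`Edge type "${edgeTypeName}" does not allow target type "${targetNodeType}"`);
  }
}
```

With the `has_secret` edge type from the encrypted data design, a `client` to `secret` edge passes and the reverse direction is rejected.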
The SchemaBuilder enforces structural integrity at definition time. The database stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will be `jsonb` in PG). Database-level constraints (unique composite keys, cascade deletes) protect referential integrity, but the database does NOT validate JSON schema conformance. This is a deliberate trade-off:
The SchemaBuilder enforces structural integrity at definition time. The database
stores graph/node/edge type schemas as JSON blobs (`text` mode in SQLite, will
be `jsonb` in PG). Database-level constraints (unique composite keys, cascade
deletes) protect referential integrity, but the database does NOT validate JSON
schema conformance. This is a deliberate trade-off:
- **Pro**: Schema changes don't require migrations. A graph type's schema evolves by updating the JSON blob.
- **Pro**: Schema changes don't require migrations. A graph type's schema
evolves by updating the JSON blob.
- **Pro**: SQLite's JSON support is limited (no JSON schema constraints).
- **Con**: Invalid data can be inserted if application-level validation is bypassed.
- **Mitigation**: All repository-layer mutations validate against the current graph type's schema before writing.
- **Con**: Invalid data can be inserted if application-level validation is
bypassed.
- **Mitigation**: All repository-layer mutations validate against the current
graph type's schema before writing.
## Node and Edge Identity
Nodes and edges use a **composite identity model**:
- **Node**: identified by `(graphId, key)` — unique within a graph. The `key` is a consumer-defined string (e.g., `"call-001"`, `"account:alice"`).
- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is optional for directed graphs but required for multi-edges.
- **Node**: identified by `(graphId, key)` — unique within a graph. The `key` is
a consumer-defined string (e.g., `"call-001"`, `"account:alice"`).
- **Edge**: identified by `(graphId, key)` — unique within a graph. The `key` is
optional for directed graphs but required for multi-edges.
This means consumers control their own identifiers within a graph. The database generates UUID `id` values for cross-graph references, but within a graph, the consumer's `key` is the identity.
This means consumers control their own identifiers within a graph. The database
generates UUID `id` values for cross-graph references, but within a graph, the
consumer's `key` is the identity.
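
The composite identity model can be illustrated with a small in-memory index. This is only a sketch under assumptions: `NodeIndex` and `compositeKey` are hypothetical names, the real uniqueness guarantee comes from the database's composite unique constraint, and the ID generator is injected here so the example does not depend on any particular UUID source.

```typescript
interface NodeRecord {
  id: string; // generated identifier, used for cross-graph references
  graphId: string;
  key: string; // consumer-defined, unique within the graph
}

// JSON-encode the pair so keys containing ":" (e.g. "account:alice") cannot collide.
function compositeKey(graphId: string, key: string): string {
  return JSON.stringify([graphId, key]);
}

class NodeIndex {
  private readonly nodes = new Map<string, NodeRecord>();
  private readonly generateId: () => string;

  constructor(generateId: () => string) {
    this.generateId = generateId;
  }

  // Rejects a second node with the same (graphId, key) pair.
  insert(graphId: string, key: string): NodeRecord {
    const composite = compositeKey(graphId, key);
    if (this.nodes.has(composite)) {
      throw new Error(`Node "${key}" already exists in graph "${graphId}"`);
    }
    const record: NodeRecord = { id: this.generateId(), graphId, key };
    this.nodes.set(composite, record);
    return record;
  }

  find(graphId: string, key: string): NodeRecord | undefined {
    return this.nodes.get(compositeKey(graphId, key));
  }
}
```

The same `key` can appear in two different graphs without conflict, while the generated `id` stays unique across all of them.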
## Attributes Storage
Node attributes and edge attributes are stored as JSON text in SQLite (will be `jsonb` in PG). The graph type's schema defines what shape these attributes should have, but the database doesn't enforce the schema — it stores whatever JSON is provided.
Node attributes and edge attributes are stored as JSON text in SQLite (will be
`jsonb` in PG). The graph type's schema defines what shape these attributes
should have, but the database doesn't enforce the schema — it stores whatever
JSON is provided.
This design means:
- **Schema evolution**: Add optional fields to a node type schema without migration. Old nodes are still valid.
- **Schema versioning**: The `version` field on graph types tracks breaking schema changes. Consumer code can check the version before processing.
- **Validation boundary**: All validation happens in the repository layer (application code), not in the database.
- **Schema evolution**: Add optional fields to a node type schema without
migration. Old nodes are still valid.
- **Schema versioning**: The `version` field on graph types tracks breaking
schema changes. Consumer code can check the version before processing.
- **Validation boundary**: All validation happens in the repository layer
(application code), not in the database.
## Versioning
Graph types have a `version` integer (default 1). This tracks **breaking** schema changes — field removals, type changes that break backward compatibility. Non-breaking changes (adding optional fields) do not require a version bump.
Graph types have a `version` integer (default 1). This tracks **breaking**
schema changes — field removals, type changes that break backward compatibility.
Non-breaking changes (adding optional fields) do not require a version bump.
The repository layer should check `version` before processing to ensure compatibility. A version mismatch indicates the data format has changed incompatibly and the consumer should handle it explicitly.
The repository layer should check `version` before processing to ensure
compatibility. A version mismatch indicates the data format has changed
incompatibly and the consumer should handle it explicitly.
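
A version gate along these lines might look like the following. It is a hedged sketch: `GraphTypeInfo` and `assertSupportedVersion` are hypothetical names, and the choice to throw (rather than return a result) is just one way for the consumer to "handle it explicitly".

```typescript
// Hypothetical minimal view of a graph type row; only `version` is from the docs.
interface GraphTypeInfo {
  name: string;
  version: number;
}

// Throws when the stored version is not one the calling code was written for.
function assertSupportedVersion(
  graphType: GraphTypeInfo,
  supportedVersions: readonly number[],
): void {
  if (!supportedVersions.includes(graphType.version)) {
    throw new Error(
      `Graph type "${graphType.name}" is at version ${graphType.version}; ` +
        `supported versions: ${supportedVersions.join(", ")}`,
    );
  }
}
```

Accepting a list of supported versions lets a consumer keep reading older data during a transition while still failing loudly on versions it has never seen.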
## Usage Patterns
### Defining a Call Graph Type
```ts
import { SchemaBuilder, BaseNodeAttributes, BaseEdgeAttributes } from "@alkdev/storage";
import {
BaseEdgeAttributes,
BaseNodeAttributes,
SchemaBuilder,
} from "@alkdev/storage";
import { Type } from "@alkdev/typebox";
const CallNodeAttributes = Type.Intersect([
@@ -221,7 +280,7 @@ const schema = new SchemaBuilder()
const ACLNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
resourceType: Type.String(), // "project", "session", "client"
resourceType: Type.String(), // "project", "session", "client"
resourceId: Type.String(),
}),
]);
@@ -239,8 +298,8 @@ const ACLEdgeAttributes = Type.Intersect([
const schema = new SchemaBuilder()
.config({ type: "directed", multi: true, allowSelfLoops: false })
.nodeType("principal", ACLNodeAttributes) // accounts, groups
.nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
.nodeType("principal", ACLNodeAttributes) // accounts, groups
.nodeType("resource", ACLNodeAttributes) // projects, sessions, etc.
.edgeType("can_access", ACLEdgeAttributes, {
allowedSourceTypes: ["principal"],
allowedTargetTypes: ["resource"],
@@ -250,7 +309,9 @@ const schema = new SchemaBuilder()
### Defining Encrypted Secret Storage as a Node Type
> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()` are planned additions. See [encrypted-data.md](./encrypted-data.md) for the design.
> **⚠️ Not yet implemented.** `EncryptedDataSchema` and `encrypt()`/`decrypt()`
> are planned additions. See [encrypted-data.md](./encrypted-data.md) for the
> design.
```ts
// PLANNED — not yet available
@@ -259,8 +320,8 @@ import { EncryptedDataSchema } from "@alkdev/storage";
const SecretNodeAttributes = Type.Intersect([
BaseNodeAttributes,
Type.Object({
key: Type.String(), // secret key name
encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
key: Type.String(), // secret key name
encryptedData: EncryptedDataSchema, // AES-256-GCM ciphertext
expiresAt: Type.Optional(Type.String({ format: "date-time" })),
}),
]);
@@ -275,8 +336,10 @@ See [encrypted-data.md](./encrypted-data.md) for the full encrypted data design.
## References
- Hub call graph spec: `/workspace/@alkdev/hub/docs/architecture/storage/call-graph.md`
- Hub identity spec: `/workspace/@alkdev/hub/docs/architecture/storage/identity.md`
- Hub call graph spec:
`/workspace/@alkdev/hub/docs/architecture/storage/call-graph.md`
- Hub identity spec:
`/workspace/@alkdev/hub/docs/architecture/storage/identity.md`
- TypeBox: https://github.com/sinclairzx/typebox
- SchemaBuilder source: `src/graphs/schemaBuilder.ts`
- Schema types source: `src/graphs/types.ts`
- Schema types source: `src/graphs/types.ts`


@@ -9,11 +9,19 @@ Typed graph storage with dual database hosts. Deno-first, published via JSR.
## Purpose
`@alkdev/storage` provides a **metagraph** storage model: graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Instances of these type definitions become actual graphs populated with nodes and edges.
`@alkdev/storage` provides a **metagraph** storage model: graph types define
schemas, node types define data shapes within those graphs, and edge types
define typed relationships. Instances of these type definitions become actual
graphs populated with nodes and edges.
This pattern replaces domain-specific table proliferation with a small number of general-purpose tables that can model anything — call graphs, ACL rules, task dependencies, encrypted secrets — while enforcing schema integrity through TypeBox validation.
This pattern replaces domain-specific table proliferation with a small number of
general-purpose tables that can model anything — call graphs, ACL rules, task
dependencies, encrypted secrets — while enforcing schema integrity through
TypeBox validation.
The package evolved from `@ade/ade-v0/packages/core/graphs` and `@ade/ade-v0/packages/storage_sqlite`, simplified and refactored for the @alkdev ecosystem.
The package evolved from `@ade/ade-v0/packages/core/graphs` and
`@ade/ade-v0/packages/storage_sqlite`, simplified and refactored for the @alkdev
ecosystem.
## Architecture
@@ -33,114 +41,160 @@ The package evolved from `@ade/ade-v0/packages/core/graphs` and `@ade/ade-v0/pac
### Subpath Exports (JSR/npm)
| Export | Contents | Dependencies |
|--------|----------|-------------|
| `@alkdev/storage` | Graph schema types, SchemaBuilder | `@alkdev/typebox`, `@alkdev/drizzlebox` |
| `@alkdev/storage/graphs` | Same as `.` — alias for the main export | Same as `.` |
| `@alkdev/storage/sqlite` | SQLite tables, relations, client | + `drizzle-orm`, `@libsql/client` |
| `@alkdev/storage/pg` | PostgreSQL tables, relations, client | ⚠️ NOT YET IMPLEMENTED |
| Export | Contents | Dependencies |
| ------------------------ | --------------------------------------- | --------------------------------------- |
| `@alkdev/storage` | Graph schema types, SchemaBuilder | `@alkdev/typebox`, `@alkdev/drizzlebox` |
| `@alkdev/storage/graphs` | Same as `.` — alias for the main export | Same as `.` |
| `@alkdev/storage/sqlite` | SQLite tables, relations, client | + `drizzle-orm`, `@libsql/client` |
| `@alkdev/storage/pg` | PostgreSQL tables, relations, client | ⚠️ NOT YET IMPLEMENTED |
The `./graphs` subpath exists because the source code lives in `src/graphs/` and the main `mod.ts` re-exports it. Importing from either `@alkdev/storage` or `@alkdev/storage/graphs` yields the same types and SchemaBuilder.
The `./graphs` subpath exists because the source code lives in `src/graphs/` and
the main `mod.ts` re-exports it. Importing from either `@alkdev/storage` or
`@alkdev/storage/graphs` yields the same types and SchemaBuilder.
## Terminology
| Term | Definition |
|------|-----------|
| **Metagraph** | A type system where graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Graph instances are concrete data conforming to these type definitions. |
| **Hub** | The central service in the hub-spoke architecture. Runs PostgreSQL, hosts API endpoints, coordinates spokes, and is the authoritative data store. `@alkdev/storage`'s PostgreSQL host (not yet implemented) targets the hub. |
| **Spoke** | A local/embedded instance that runs per-project or per-session. Uses SQLite for local storage. `@alkdev/storage`'s SQLite host targets spokes. |
| **Graph type** | A class of graphs (e.g., "call-graph", "acl"). Defines structural constraints (directed/undirected/mixed, multi-edges, self-loops) and the valid node/edge type vocabularies. Stored in the `graph_types` table. |
| **Node type** | A category of node within a graph type. Defines the attribute schema for nodes of that type. Stored in the `node_types` table. |
| **Edge type** | A category of edge within a graph type. Defines the attribute schema and optionally restricts which node types can be source/target. Stored in the `edge_types` table. |
| **Graph instance** | A concrete graph belonging to a graph type. Contains nodes and edges conforming to its type definitions. Stored in the `graphs` table. |
| **Consumer** | Code that imports `@alkdev/storage` (or a subpath) to define graph types and persist graph data. The hub and spokes are consumers. |
| **Repository layer** | ⚠️ Not yet implemented. The typed CRUD functions (insert, find, update, delete) that sit between consumer code and raw Drizzle queries. Performs schema validation before writes. |
| **Validation boundary** | The line where schema validation is enforced. In this package, validation happens in the SchemaBuilder (at type definition time) and the repository layer (at mutation time), NOT in the database. |
| Term | Definition |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Metagraph** | A type system where graph types define schemas, node types define data shapes within those graphs, and edge types define typed relationships. Graph instances are concrete data conforming to these type definitions. |
| **Hub** | The central service in the hub-spoke architecture. Runs PostgreSQL, hosts API endpoints, coordinates spokes, and is the authoritative data store. `@alkdev/storage`'s PostgreSQL host (not yet implemented) targets the hub. |
| **Spoke** | A local/embedded instance that runs per-project or per-session. Uses SQLite for local storage. `@alkdev/storage`'s SQLite host targets spokes. |
| **Graph type** | A class of graphs (e.g., "call-graph", "acl"). Defines structural constraints (directed/undirected/mixed, multi-edges, self-loops) and the valid node/edge type vocabularies. Stored in the `graph_types` table. |
| **Node type** | A category of node within a graph type. Defines the attribute schema for nodes of that type. Stored in the `node_types` table. |
| **Edge type** | A category of edge within a graph type. Defines the attribute schema and optionally restricts which node types can be source/target. Stored in the `edge_types` table. |
| **Graph instance** | A concrete graph belonging to a graph type. Contains nodes and edges conforming to its type definitions. Stored in the `graphs` table. |
| **Consumer** | Code that imports `@alkdev/storage` (or a subpath) to define graph types and persist graph data. The hub and spokes are consumers. |
| **Repository layer** | ⚠️ Not yet implemented. The typed CRUD functions (insert, find, update, delete) that sit between consumer code and raw Drizzle queries. Performs schema validation before writes. |
| **Validation boundary** | The line where schema validation is enforced. In this package, validation happens in the SchemaBuilder (at type definition time) and the repository layer (at mutation time), NOT in the database. |
## Design Decisions
### D1: Deno-first, JSR publishes, npm comes free
The package is published to JSR (`deno publish`). npm compatibility is automatic via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
The package is published to JSR (`deno publish`). npm compatibility is automatic
via JSR's npm layer (`@jsr/alkdev__storage`). No separate dnt build step.
### D2: Metagraph over domain-specific tables
Instead of a table per domain concept (call graphs, ACL rules, task trees), we define graph types with typed node and edge schemas. A "call graph" is a graph type with specific node types (operation call, subcall) and edge types (triggered, depends_on). An "ACL graph" is a graph type with node types (account, resource) and edge types (can_read, can_write).
Instead of a table per domain concept (call graphs, ACL rules, task trees), we
define graph types with typed node and edge schemas. A "call graph" is a graph
type with specific node types (operation call, subcall) and edge types
(triggered, depends_on). An "ACL graph" is a graph type with node types
(account, resource) and edge types (can_read, can_write).
This trades some query convenience for generality. Domain-specific queries are built on top of the graph query layer, not baked into table schemas.
This trades some query convenience for generality. Domain-specific queries are
built on top of the graph query layer, not baked into table schemas.
### D3: SchemaBuilder as the primary API surface
The `SchemaBuilder` fluent API is the intended way to construct graph type definitions. It validates against TypeBox schemas at build time, ensuring that graph/node/edge type definitions are structurally sound before they're persisted to the database.
The `SchemaBuilder` fluent API is the intended way to construct graph type
definitions. It validates against TypeBox schemas at build time, ensuring that
graph/node/edge type definitions are structurally sound before they're persisted
to the database.
### D4: Injectable clients, no module-level side effects
`createSqliteDatabase(client)` receives a pre-created client. Module-level side effects (auto-connections, env-based configuration) are forbidden. This enables testing with in-memory databases and containerized deployment patterns.
`createSqliteDatabase(client)` receives a pre-created client. Module-level side
effects (auto-connections, env-based configuration) are forbidden. This enables
testing with in-memory databases and containerized deployment patterns.
### D5: Drizzle + TypeBox (via drizzlebox) as the table definition pattern
Drizzle table definitions are the single source of truth for database schema. `@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from Drizzle tables, enabling runtime validation without manual schema duplication.
Drizzle table definitions are the single source of truth for database schema.
`@alkdev/drizzlebox` generates TypeBox `Select*` and `Insert*` schemas from
Drizzle tables, enabling runtime validation without manual schema duplication.
### D6: Enumeration pattern — `as const` objects, not TypeScript enums
All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript `enum`. This avoids JSR slow-type issues (the existing lint exclusion for `no-slow-types` was needed partly because of TS enums) and provides a consistent pattern across the codebase. The TypeBox schemas use `Type.Union` of `Type.Literal` values derived from the const object.
All enumerations use the `as const` object pattern (e.g.,
`GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript
`enum`. This avoids JSR slow-type issues and provides a consistent pattern
across the codebase. The TypeBox schemas use `Type.Union` of `Type.Literal`
values derived from the const object.
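
For readers unfamiliar with the pattern, a plain TypeScript sketch is shown here. The key names `Archived` and `Draft` are assumptions inferred from the status values listed for the `graphs` table, and the TypeBox `Type.Union` derivation is deliberately left out so the example stays self-contained.

```typescript
// Const object instead of a TypeScript enum; values are string literals.
const GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
  Draft: "draft",
} as const;

// Union of the literal values: "active" | "archived" | "draft".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// Runtime guard built from the same object, so type and values cannot drift apart.
function isGraphStatus(value: unknown): value is GraphStatus {
  return (Object.values(GRAPH_STATUS) as unknown[]).includes(value);
}
```

The type and the runtime values both come from one declaration, which is the consistency benefit the decision refers to.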
### D7: No comments in code
Per project convention across @alkdev packages, source files contain no inline comments. Documentation lives in architecture docs and TypeBox schema descriptions.
Per project convention across @alkdev packages, source files contain no inline
comments. Documentation lives in architecture docs and TypeBox schema
descriptions.
### D8: Common columns pattern
All tables share `id` (text PK), `metadata` (JSON text defaulting to `{}`), `createdAt`, and `updatedAt` (integer timestamps in SQLite, will be timestamptz in PG). This ensures every row has auditability and extensibility.
All tables share `id` (text PK), `metadata` (JSON text defaulting to `{}`),
`createdAt`, and `updatedAt` (integer timestamps in SQLite, will be timestamptz
in PG). This ensures every row has auditability and extensibility.
## Dependencies
| Package | Purpose | Layer |
|---------|---------|-------|
| `@alkdev/typebox` | Runtime schema validation | graphs/ |
| `@alkdev/drizzlebox` | Generate TypeBox from Drizzle tables | sqlite/ |
| `drizzle-orm` | ORM, table definitions, queries | sqlite/ (and future pg/) |
| `@libsql/client` | SQLite client (libsql/turso) | sqlite/ |
| `postgres` | PostgreSQL client | pg/ (not yet used) |
| Package | Purpose | Layer |
| -------------------- | ------------------------------------ | ------------------------ |
| `@alkdev/typebox` | Runtime schema validation | graphs/ |
| `@alkdev/drizzlebox` | Generate TypeBox from Drizzle tables | sqlite/ |
| `drizzle-orm` | ORM, table definitions, queries | sqlite/ (and future pg/) |
| `@libsql/client` | SQLite client (libsql/turso) | sqlite/ |
| `postgres` | PostgreSQL client | pg/ (not yet used) |
`@alkdev/typebox` and `@alkdev/drizzlebox` are npm packages (not yet on JSR). JSR handles npm dependencies natively.
`@alkdev/typebox` and `@alkdev/drizzlebox` are npm packages (not yet on JSR).
JSR handles npm dependencies natively.
## What Exists vs. What's Needed
### Implemented
- Graph schema types and SchemaBuilder
- SQLite host: 6 metagraph tables + actors table + Drizzle relations + client factory
- SQLite host: 6 metagraph tables + actors table + Drizzle relations + client
factory
- TypeBox select/insert schemas generated from Drizzle tables (drizzlebox)
### Not Yet Implemented
| Gap | Priority | Notes |
|-----|----------|-------|
| Gap | Priority | Notes |
| ----------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------- |
| Encrypted data node type + crypto utility | **Critical** | ⚠️ Not yet implemented. API keys and secrets at rest. See [encrypted-data.md](./encrypted-data.md). |
| Repository/CRUD layer | High | ⚠️ Not yet implemented. Typed insert, find, update, delete functions for graphs, nodes, edges |
| Tests | High | Zero tests exist. Needed before any real use. |
| PostgreSQL host | Medium | Same table shapes, `pgTable` + `jsonb` + `timestamp` + `pgEnum`. Stub only. |
| ACL graph type | Medium | Access control as a graph. Depends on encrypted data and CRUD layer. |
| Call graph type | Low | Hub-specific, uses metagraph. Deferred until hub consumes this package. |
| Session/message models | Low | Hub-specific, may remain domain tables. |
| Repository/CRUD layer | High | ⚠️ Not yet implemented. Typed insert, find, update, delete functions for graphs, nodes, edges |
| Tests | High | Zero tests exist. Needed before any real use. |
| PostgreSQL host | Medium | Same table shapes, `pgTable` + `jsonb` + `timestamp` + `pgEnum`. Stub only. |
| ACL graph type | Medium | Access control as a graph. Depends on encrypted data and CRUD layer. |
| Call graph type | Low | Hub-specific, uses metagraph. Deferred until hub consumes this package. |
| Session/message models | Low | Hub-specific, may remain domain tables. |
## Open Questions
1. **Should `actors` be a node type or a standalone table?** Currently `actors` is a standalone table in the SQLite host that isn't referenced by any relation. If identity/authentication is a graph (ACL nodes), actors become node types. If identity is a domain concept that needs special query patterns (auth lookups, session joins), standalone tables may be better. Decision: defer until ACL design.
1. **Should `actors` be a node type or a standalone table?** Currently `actors`
is a standalone table in the SQLite host that isn't referenced by any
relation. If identity/authentication is a graph (ACL nodes), actors become
node types. If identity is a domain concept that needs special query patterns
(auth lookups, session joins), standalone tables may be better. Decision:
defer until ACL design.
2. **Should the repository layer be host-specific or host-agnostic?** A host-agnostic repository (insert graph, find nodes by type) requires an abstraction over Drizzle's query builder. A host-specific repository is simpler but means duplicating query logic for PG. Decision: start host-specific in SQLite, extract common patterns later.
2. **Should the repository layer be host-specific or host-agnostic?** A
host-agnostic repository (insert graph, find nodes by type) requires an
abstraction over Drizzle's query builder. A host-specific repository is
simpler but means duplicating query logic for PG. Decision: start
host-specific in SQLite, extract common patterns later.
3. **Encrypted data scope**: Should encryption be per-attribute, per-node, or per-graph? Per-attribute (like hub's `client_secrets.value`) allows selective encryption. Per-node encrypts the entire `attributes` blob. Per-graph is overkill. Decision: per-attribute, modeled as an encrypted node type with a dedicated attribute for the ciphertext.
3. **Encrypted data scope**: Should encryption be per-attribute, per-node, or
per-graph? Per-attribute (like hub's `client_secrets.value`) allows selective
encryption. Per-node encrypts the entire `attributes` blob. Per-graph is
overkill. Decision: per-attribute, modeled as an encrypted node type with a
dedicated attribute for the ciphertext.
4. **Key management scope**: `@alkdev/storage` should provide the encryption/decryption primitives but NOT key management. The consuming application provides the key ring. This keeps the storage package agnostic to deployment-specific secret management.
4. **Key management scope**: `@alkdev/storage` should provide the
encryption/decryption primitives but NOT key management. The consuming
application provides the key ring. This keeps the storage package agnostic to
deployment-specific secret management.
5. **Migration strategy**: When graph type schemas evolve (new node types, changed attribute schemas), who handles migration? The repository layer should support schema version checking, but actual migration scripts are application-level. See [metagraph.md](./metagraph.md) for the versioning approach.
5. **Migration strategy**: When graph type schemas evolve (new node types,
changed attribute schemas), who handles migration? The repository layer
should support schema version checking, but actual migration scripts are
application-level. See [metagraph.md](./metagraph.md) for the versioning
approach.
## References
- Hub storage spec: `/workspace/@alkdev/hub/docs/architecture/storage/`
- Source heritage: `@ade/ade-v0/packages/core/graphs` and `@ade/ade-v0/packages/storage_sqlite`
- Source heritage: `@ade/ade-v0/packages/core/graphs` and
`@ade/ade-v0/packages/storage_sqlite`
- Drizzle ORM: https://orm.drizzle.team/
- TypeBox: https://github.com/sinclairzx/typebox
- JSR: https://jsr.io/
- JSR: https://jsr.io/


@@ -5,18 +5,24 @@ last_updated: 2026-05-28
# SQLite Host
The SQLite database host for `@alkdev/storage`. Uses Drizzle ORM with libsql/Turso for the SQLite dialect and `@alkdev/drizzlebox` for TypeBox schema generation from Drizzle table definitions.
The SQLite database host for `@alkdev/storage`. Uses Drizzle ORM with
libsql/Turso for the SQLite dialect and `@alkdev/drizzlebox` for TypeBox schema
generation from Drizzle table definitions.
## Overview
The SQLite host provides:
1. **Drizzle table definitions** for the metagraph pattern (graph types, node types, edge types, graphs, nodes, edges) plus a standalone `actors` table
1. **Drizzle table definitions** for the metagraph pattern (graph types, node
types, edge types, graphs, nodes, edges) plus a standalone `actors` table
2. **Drizzle relations** for the relational query API
3. **TypeBox schemas** auto-generated from Drizzle tables (select/insert validation)
4. **Injectable database factory**: `createSqliteDatabase(client)` accepts a pre-created client
3. **TypeBox schemas** auto-generated from Drizzle tables (select/insert
validation)
4. **Injectable database factory**: `createSqliteDatabase(client)` accepts a
pre-created client
The SQLite host is the first-class target. PostgreSQL will follow the same table shapes with appropriate dialect changes.
The SQLite host is the first-class target. PostgreSQL will follow the same table
shapes with appropriate dialect changes.
## Package Structure
@@ -58,136 +64,164 @@ All tables share these columns:
**Notable differences from hub's PostgreSQL common columns**:
| Column | SQLite | PostgreSQL (hub) |
|--------|--------|-------------------|
| `id` | text PK (consumer-generated) | text PK with `$defaultFn(() => crypto.randomUUID())` |
| `metadata` | `text` with JSON mode | `jsonb` with `$type<Record<string, unknown>>()` |
| `createdAt` | `integer` timestamp mode (Unix epoch) | `timestamp with timezone` defaulting `now()` |
| Column | SQLite | PostgreSQL (hub) |
| ----------- | ------------------------------------- | ------------------------------------------------------------- |
| `id` | text PK (consumer-generated) | text PK with `$defaultFn(() => crypto.randomUUID())` |
| `metadata` | `text` with JSON mode | `jsonb` with `$type<Record<string, unknown>>()` |
| `createdAt` | `integer` timestamp mode (Unix epoch) | `timestamp with timezone` defaulting `now()` |
| `updatedAt` | `integer` timestamp mode (Unix epoch) | `timestamp with timezone` defaulting `now()` with `$onUpdate` |
The SQLite columns do NOT have `$defaultFn` for ID generation (the consumer provides IDs) and do NOT have `$onUpdate` for `updatedAt` (Drizzle's `$onUpdate` is application-level; consumers must set it explicitly).
The SQLite columns do NOT have `$defaultFn` for ID generation (the consumer
provides IDs) and do NOT have `$onUpdate` for `updatedAt` (Drizzle's `$onUpdate`
is application-level; consumers must set it explicitly).
### `graph_types`
Stores graph type definitions (schemas for classes of graphs).
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | Consumer-generated UUID |
| metadata | text (JSON) | default `{}` | Extension namespace |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null, **unique** | Graph type name (e.g., "call-graph", "acl") |
| description | text | default `""` | Human-readable description |
| config | text (JSON) | not null | `GraphConfig` — directed/undirected/mixed, multi, self-loops |
| version | integer | not null, default 1 | Breaking schema version |
| Column | Type | Constraints | Notes |
| ----------- | ------------------- | ----------------------- | ------------------------------------------------------------ |
| id | text | PK | Consumer-generated UUID |
| metadata | text (JSON) | default `{}` | Extension namespace |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null, **unique** | Graph type name (e.g., "call-graph", "acl") |
| description | text | default `""` | Human-readable description |
| config | text (JSON) | not null | `GraphConfig` — directed/undirected/mixed, multi, self-loops |
| version | integer | not null, default 1 | Breaking schema version |
### `node_types`
Stores node type definitions within a graph type. Each node type has a TypeBox schema that validates node attributes.
Stores node type definitions within a graph type. Each node type has a TypeBox
schema that validates node attributes.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Node type name (e.g., "call", "account") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for node attributes |
| Column | Type | Constraints | Notes |
| ----------- | ------------------- | -------------------------------------- | ---------------------------------------- |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Node type name (e.g., "call", "account") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for node attributes |
**Unique constraint**: `(graphTypeId, name)` — node type names are unique within a graph type.
**Unique constraint**: `(graphTypeId, name)` — node type names are unique within
a graph type.
### `edge_types`
Stores edge type definitions within a graph type.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Edge type name (e.g., "triggered", "can_read") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for edge attributes |
| allowedSourceTypes | text (JSON) | default `[]` | Node type names valid at source endpoint |
| allowedTargetTypes | text (JSON) | default `[]` | Node type names valid at target endpoint |
| Column | Type | Constraints | Notes |
| ------------------ | ------------------- | -------------------------------------- | ---------------------------------------------- |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | not null, FK → graphTypes.id (cascade) | Parent graph type |
| name | text | not null | Edge type name (e.g., "triggered", "can_read") |
| description | text | default `""` | |
| schema | text (JSON) | not null | TypeBox schema for edge attributes |
| allowedSourceTypes | text (JSON) | default `[]` | Node type names valid at source endpoint |
| allowedTargetTypes | text (JSON) | default `[]` | Node type names valid at target endpoint |
**Unique constraint**: `(graphTypeId, name)` — edge type names are unique within a graph type.
**Unique constraint**: `(graphTypeId, name)` — edge type names are unique within
a graph type.
**Empty array semantics**: `allowedSourceTypes` and `allowedTargetTypes` default to `[]` (empty JSON array) in the database. The repository layer must treat `[]` (empty array) as "no restriction" — any node type is a valid endpoint — matching the behavior of `undefined` in the `EdgeType` schema. A non-empty array restricts endpoints to only the listed node types. There is no "no types allowed" state; if edge types need to be disabled, use a status or soft-delete pattern on the edge type definition.
**Empty array semantics**: `allowedSourceTypes` and `allowedTargetTypes` default
to `[]` (empty JSON array) in the database. The repository layer must treat `[]`
(empty array) as "no restriction" — any node type is a valid endpoint — matching
the behavior of `undefined` in the `EdgeType` schema. A non-empty array
restricts endpoints to only the listed node types. There is no "no types
allowed" state; if edge types need to be disabled, use a status or soft-delete
pattern on the edge type definition.
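
The empty-array rule above can be sketched as a small check. This is only an illustration: `isEndpointAllowed` is a hypothetical helper name, not something the package is known to export, and it assumes the JSON column has already been parsed into a string array.

```typescript
// Hypothetical helper (an assumption for illustration, not the package's API).
// `allowed` mirrors the parsed allowedSourceTypes / allowedTargetTypes value.
function isEndpointAllowed(
  allowed: readonly string[] | undefined,
  nodeTypeName: string,
): boolean {
  // `undefined` (EdgeType schema) and `[]` (database default) both mean
  // "no restriction": any node type is a valid endpoint.
  if (allowed === undefined || allowed.length === 0) return true;
  // A non-empty array restricts endpoints to the listed node type names.
  return allowed.indexOf(nodeTypeName) !== -1;
}
```

Treating both cases in one branch keeps the database default and the schema default from drifting apart in the repository layer.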
### `graphs`
Graph instances. Each graph belongs to a graph type.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | FK → graphTypes.id (set null) | Set null on graph type deletion (orphan graph) |
| name | text | not null | Graph instance name |
| description | text | default `""` | |
| status | text | not null, enum: `active`, `archived`, `draft` | Default: `draft` |
| Column | Type | Constraints | Notes |
| ----------- | ------------------- | --------------------------------------------- | ---------------------------------------------- |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphTypeId | text | FK → graphTypes.id (set null) | Set null on graph type deletion (orphan graph) |
| name | text | not null | Graph instance name |
| description | text | default `""` | |
| status | text | not null, enum: `active`, `archived`, `draft` | Default: `draft` |
**On `graphTypeId` set null**: When a graph type is deleted, its graphs become orphans with `graphTypeId = null`. The application should prevent graph type deletion if active graphs reference it, or set affected graphs' `status` to `archived` as part of a soft-delete workflow. Orphan graphs cannot validate their node/edge types against a missing type definition — queries against orphan graphs should check for `graphTypeId !== null` before performing type-aware operations.
**On `graphTypeId` set null**: When a graph type is deleted, its graphs become
orphans with `graphTypeId = null`. The application should prevent graph type
deletion if active graphs reference it, or set affected graphs' `status` to
`archived` as part of a soft-delete workflow. Orphan graphs cannot validate
their node/edge types against a missing type definition — queries against orphan
graphs should check for `graphTypeId !== null` before performing type-aware
operations.
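
A minimal sketch of the orphan check described above, written as a TypeScript type guard. The `GraphRow` shape and the `hasGraphType` name are assumptions made for this example; only the `graphTypeId` and `status` columns come from the table.

```typescript
// Assumed row shape for illustration; only the columns needed here are listed.
interface GraphRow {
  id: string;
  graphTypeId: string | null;
  status: "active" | "archived" | "draft";
}

// Narrows the row so type-aware code can rely on a non-null graphTypeId.
function hasGraphType(
  graph: GraphRow,
): graph is GraphRow & { graphTypeId: string } {
  return graph.graphTypeId !== null;
}
```

Callers would run type-aware validation only inside an `if (hasGraphType(graph))` branch and skip or report orphan graphs otherwise.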
### `nodes`
Nodes within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
Nodes within a graph instance. Keyed by `(graphId, key)` — unique within a
graph.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | not null | Consumer-defined identity within the graph |
| attributes | text (JSON) | not null, default `{}` | Node attributes validated by node type schema |
| Column | Type | Constraints | Notes |
| ---------- | ------------------- | ---------------------------------- | --------------------------------------------- |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | not null | Consumer-defined identity within the graph |
| attributes | text (JSON) | not null, default `{}` | Node attributes validated by node type schema |
**Unique constraint**: `(graphId, key)` — node keys are unique within a graph.
**No `nodeTypeId` column**: Nodes do not have a direct FK to `node_types`. The node type is determined at the application layer. This is a deliberate design decision — adding a `nodeTypeId` FK would couple the graph instance layer to the type definition layer. The repository layer can enforce node type constraints via validation against the graph type's schema.
**No `nodeTypeId` column**: Nodes do not have a direct FK to `node_types`. The
node type is determined at the application layer. This is a deliberate design
decision — adding a `nodeTypeId` FK would couple the graph instance layer to the
type definition layer. The repository layer can enforce node type constraints
via validation against the graph type's schema.
### `edges`
Edges within a graph instance. Keyed by `(graphId, key)` — unique within a graph.
Edges within a graph instance. Keyed by `(graphId, key)` — unique within a
graph.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | | Consumer-defined identity (null for anonymous edges) |
| sourceNodeKey | text | not null | Source node key within the graph |
| targetNodeKey | text | not null | Target node key within the graph |
| attributes | text (JSON) | not null, default `{}` | Edge attributes validated by edge type schema |
| undirected | integer (boolean) | default false | Treat as undirected regardless of graph type |
| Column | Type | Constraints | Notes |
| ------------- | ------------------- | ---------------------------------- | ---------------------------------------------------- |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| graphId | text | not null, FK → graphs.id (cascade) | Parent graph |
| key | text | | Consumer-defined identity (null for anonymous edges) |
| sourceNodeKey | text | not null | Source node key within the graph |
| targetNodeKey | text | not null | Target node key within the graph |
| attributes | text (JSON) | not null, default `{}` | Edge attributes validated by edge type schema |
| undirected | integer (boolean) | default false | Treat as undirected regardless of graph type |
**Unique constraint**: `(graphId, key)` — edge keys are unique within a graph.
**Foreign keys**: `sourceNodeKey` and `targetNodeKey` reference `(nodes.graphId, nodes.key)` with cascade delete. Deleting a node removes all its edges.
**Foreign keys**: `sourceNodeKey` and `targetNodeKey` reference
`(nodes.graphId, nodes.key)` with cascade delete. Deleting a node removes all
its edges.
### `actors`
Standalone identity table. Currently not referenced by any relation. This is a placeholder for the hub's account/identity model and may become a node type in an ACL graph or remain a standalone table. See [overview.md](./overview.md) Open Question 1.
Standalone identity table. Currently not referenced by any relation. This is a
placeholder for the hub's account/identity model and may become a node type in
an ACL graph or remain a standalone table. See [overview.md](./overview.md) Open
Question 1.
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null | Actor display name |
| type | text | not null, enum: `human`, `llm`, `agent` | Actor type |
| Column | Type | Constraints | Notes |
| --------- | ------------------- | --------------------------------------- | ------------------ |
| id | text | PK | |
| metadata | text (JSON) | default `{}` | |
| createdAt | integer (timestamp) | not null, default `now` | |
| updatedAt | integer (timestamp) | not null, default `now` | |
| name | text | not null | Actor display name |
| type | text | not null, enum: `human`, `llm`, `agent` | Actor type |
## Relations
@@ -214,7 +248,8 @@ const client = createClient({ url: "file:local.db" });
const db: SqliteDatabase = createSqliteDatabase(client);
```
The factory takes a pre-created `@libsql/client` client and returns a typed Drizzle database instance with the full schema attached. This enables:
The factory takes a pre-created `@libsql/client` client and returns a typed
Drizzle database instance with the full schema attached. This enables:
- In-memory testing with `createClient({ url: ":memory:" })`
- Turso remote connections
@@ -224,74 +259,110 @@ The factory takes a pre-created `@libsql/client` client and returns a typed Driz
### SD1: JSON text vs. JSONB in SQLite
SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native `jsonb`. This means:
SQLite stores JSON as `text` with `{ mode: "json" }`. PostgreSQL uses native
`jsonb`. This means:
- SQLite cannot query inside JSON columns efficiently (no GIN indexes)
- SQLite JSON validation relies on application-level checks (TypeBox schemas)
- PostgreSQL will get queryability benefits for JSON columns
The trade-off: SQLite is for spokes (local, infrequent queries), PostgreSQL is for the hub (frequent, complex queries).
The trade-off: SQLite is for spokes (local, infrequent queries), PostgreSQL is
for the hub (frequent, complex queries).
### SD2: No `nodeTypeId` on nodes
Nodes don't carry a direct FK to `node_types`. The node type is enforced at the application layer. Reasons:
Nodes don't carry a direct FK to `node_types`. The node type is enforced at the
application layer. Reasons:
- Graph type schemas define which node types are valid. Adding a FK would duplicate this constraint.
- Graph type schemas define which node types are valid. Adding a FK would
duplicate this constraint.
- Node types can evolve (schemas can change) without requiring node row updates.
- The repository layer validates node attributes against the appropriate node type schema before insertion.
- The repository layer validates node attributes against the appropriate node
type schema before insertion.
This may change if query performance requires filtering nodes by type. A `nodeTypeId` column can be added as a denormalized index.
This may change if query performance requires filtering nodes by type. A
`nodeTypeId` column can be added as a denormalized index.
### SD3: Edge identity uses consumer-defined keys
Edges use `(graphId, key)` as their unique identity. The `key` is consumer-defined, matching the metagraph model where consumers control identifiers. For anonymous edges (common in simple graphs), `key` can be auto-generated.
Edges use `(graphId, key)` as their unique identity. The `key` is
consumer-defined, matching the metagraph model where consumers control
identifiers. For anonymous edges (common in simple graphs), `key` can be
auto-generated.
### SD4: Composite foreign keys for node references
Edges reference nodes via composite FKs: `(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures referential integrity within a graph and cascades node deletions to connected edges.
Edges reference nodes via composite FKs:
`(graphId, sourceNodeKey) → (nodes.graphId, nodes.key)`. This ensures
referential integrity within a graph and cascades node deletions to connected
edges.
### SD5: Enum pattern — `as const` objects, not TypeScript enums
All enumerations use the `as const` object pattern (e.g., `GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript `enum`. This matches the `ACTOR_TYPE` pattern in `common.ts` and avoids JSR slow-type issues. The TypeBox schema is a `Type.Union` of `Type.Literal` values derived from the object.
All enumerations use the `as const` object pattern (e.g.,
`GRAPH_STATUS = { Active: "active", ... } as const`) rather than TypeScript
`enum`. This matches the `ACTOR_TYPE` pattern in `common.ts` and avoids JSR
slow-type issues. The TypeBox schema is a `Type.Union` of `Type.Literal` values
derived from the object.
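
As a rough sketch of the `as const` pattern, the following derives a union type and a runtime guard from the object. The key names `Archived` and `Draft` are assumptions (the source elides them), and the guard uses plain TypeScript rather than the TypeBox `Type.Union` the package describes, so the block stays dependency-free.

```typescript
// `as const` object instead of a TypeScript enum. Only `Active` appears in
// the source; the other two key names are assumed from the status values.
const GRAPH_STATUS = {
  Active: "active",
  Archived: "archived",
  Draft: "draft",
} as const;

// Union of the literal values: "active" | "archived" | "draft".
type GraphStatus = (typeof GRAPH_STATUS)[keyof typeof GRAPH_STATUS];

// Runtime guard derived from the same object, so the type and the check
// cannot drift apart.
function isGraphStatus(value: unknown): value is GraphStatus {
  for (const key of Object.keys(GRAPH_STATUS)) {
    if (GRAPH_STATUS[key as keyof typeof GRAPH_STATUS] === value) return true;
  }
  return false;
}
```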
## Metadata Convention
Every table has a `metadata` JSON column defaulting to `{}`. This is an extension namespace for subsystem use, following a namespacing convention: `_subsystem.key` (e.g., `_keypal.scopes`, `_retention.expiresAt`).
Every table has a `metadata` JSON column defaulting to `{}`. This is an
extension namespace for subsystem use, following a namespacing convention:
`_subsystem.key` (e.g., `_keypal.scopes`, `_retention.expiresAt`).
**What metadata is for**: Opaque key-value pairs that subsystems add without schema changes. It's never queried in WHERE clauses or JOINs.
**What metadata is for**: Opaque key-value pairs that subsystems add without
schema changes. It's never queried in WHERE clauses or JOINs.
**What metadata is NOT for**: A replacement for typed columns. If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in metadata. When in doubt, add a column.
**What metadata is NOT for**: A replacement for typed columns. If a field
appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a
proper column — not buried in metadata. When in doubt, add a column.
**Namespacing convention**: Subsystems should prefix their keys (e.g., `_callgraph.payloadRef`, `_acl.inherited`). Unprefixed keys are reserved for the storage package itself.
**Namespacing convention**: Subsystems should prefix their keys (e.g.,
`_callgraph.payloadRef`, `_acl.inherited`). Unprefixed keys are reserved for the
storage package itself.
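
One way the namespacing convention might look in code is sketched below. It assumes that `_subsystem.key` is stored as a single flat property name (for example `"_retention.expiresAt"`) rather than as a nested object; the source does not say which, and both helper names are hypothetical.

```typescript
type Metadata = Record<string, unknown>;

// Builds the flat, prefixed property name, e.g. "_retention.expiresAt".
// Assumption: keys are flat strings, not nested objects.
function namespacedKey(subsystem: string, key: string): string {
  return `_${subsystem}.${key}`;
}

// Returns a new metadata object with the namespaced key set; the input
// object is left untouched.
function setNamespaced(
  metadata: Metadata,
  subsystem: string,
  key: string,
  value: unknown,
): Metadata {
  return { ...metadata, [namespacedKey(subsystem, key)]: value };
}

// Reads a namespaced key; returns undefined when it is absent.
function getNamespaced(
  metadata: Metadata,
  subsystem: string,
  key: string,
): unknown {
  return metadata[namespacedKey(subsystem, key)];
}
```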
## Concurrency Model
The SQLite host targets spoke deployments where a single process accesses the database. For this model, SQLite's default journal mode is sufficient. However, for spoke deployments that may run concurrent writes (e.g., multiple worker threads), consumers should:
The SQLite host targets spoke deployments where a single process accesses the
database. For this model, SQLite's default journal mode is sufficient. However,
for spoke deployments that may run concurrent writes (e.g., multiple worker
threads), consumers should:
1. **Enable WAL mode**: `PRAGMA journal_mode=WAL;` — allows concurrent reads during writes
2. **Set busy timeout**: `PRAGMA busy_timeout=5000;` waits up to 5 seconds for lock acquisition
3. **Use a single writer**: SQLite supports one writer at a time. If multiple threads write, route writes through a single queue or connection
1. **Enable WAL mode**: `PRAGMA journal_mode=WAL;` — allows concurrent reads
during writes
2. **Set busy timeout**: `PRAGMA busy_timeout=5000;` waits up to 5 seconds for
lock acquisition
3. **Use a single writer**: SQLite supports one writer at a time. If multiple
threads write, route writes through a single queue or connection
The `createSqliteDatabase()` factory does not set these pragmas — it's the consumer's responsibility to configure the SQLite connection appropriately. The libsql client used to create the connection can be pre-configured before passing it to the factory.
The `createSqliteDatabase()` factory does not set these pragmas — it's the
consumer's responsibility to configure the SQLite connection appropriately. The
libsql client used to create the connection can be pre-configured before passing
it to the factory.
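
The single-writer recommendation could be approached with a small promise-based queue, sketched here. `WriteQueue` is a hypothetical class, not part of the package or of libsql, and it only serializes writes within one process; it does not replace the WAL and busy-timeout pragmas.

```typescript
// Hypothetical in-process write serializer (an assumption for illustration).
class WriteQueue {
  // Tail of the chain; it never rejects, so one failed write cannot
  // block the writes queued after it.
  private tail: Promise<unknown> = Promise.resolve();

  // Runs `write` only after every previously enqueued write has settled.
  enqueue<T>(write: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => write());
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

A consumer would wrap each database write, for example `queue.enqueue(() => db.insert(...))`, so concurrent callers never issue overlapping writes.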
## PostgreSQL Porting Notes
When implementing `src/pg/`, the table shapes remain the same but with these changes:
When implementing `src/pg/`, the table shapes remain the same but with these
changes:
| SQLite | PostgreSQL |
|--------|------------|
| `sqliteTable` | `pgTable` |
| `text` (JSON mode) | `jsonb` with `.$type<T>()` |
| `integer` (timestamp mode) | `timestamp` with timezone |
| `sql\`(strftime('%s', 'now'))\`` | `sql\`now()\`` |
| `integer` (boolean mode) | `boolean` |
| `text` (enum) | `pgEnum` or `text` with check constraint |
| SQLite | PostgreSQL |
| -------------------------------- | ---------------------------------------- |
| `sqliteTable` | `pgTable` |
| `text` (JSON mode) | `jsonb` with `.$type<T>()` |
| `integer` (timestamp mode) | `timestamp` with timezone |
| `sql\`(strftime('%s', 'now'))\`` | `sql\`now()\`` |
| `integer` (boolean mode) | `boolean` |
| `text` (enum) | `pgEnum` or `text` with check constraint |
See hub's `commonCols` reference in [../../hub/docs/architecture/storage/table-reference.md] for the PostgreSQL patterns.
See hub's `commonCols` reference in
[../../hub/docs/architecture/storage/table-reference.md] for the PostgreSQL
patterns.
## References
- Drizzle ORM SQLite core: https://orm.drizzle.team/docs/sqlite-core
- libsql client: https://github.com/tursodatabase/libsql
- Hub common columns pattern: `/workspace/@alkdev/hub/docs/architecture/storage/table-reference.md`
- Source: `src/sqlite/`
- Hub common columns pattern:
`/workspace/@alkdev/hub/docs/architecture/storage/table-reference.md`
- Source: `src/sqlite/`

View File

@@ -2,9 +2,12 @@
## Overview
This document defines the SDD process for the @alkdev/storage package. It leverages:
This document defines the SDD process for the @alkdev/storage package. It
leverages:
- **OpenCode CLI** as the agent execution environment
- **Open-coordinator plugin** for worktree management and parallel session orchestration
- **Open-coordinator plugin** for worktree management and parallel session
orchestration
- **Structured task graphs** with dependency analysis and safe exit protocols
## Core Principles
@@ -14,29 +17,37 @@ This document defines the SDD process for the @alkdev/storage package. It levera
3. **Flexible Self**: Agents can implement, self-review, and fix objectively
4. **Task-Driven**: Structured task graphs with dependency analysis
5. **Safe Exit**: Always have a way to unblock progress when stuck
6. **Categorical Estimates**: Use risk/scope/impact categories, not time estimates. These are structurally important — upstream failures multiply downstream damage regardless of developer type (human or LLM). See the cost-benefit framework in taskgraph's framework docs.
6. **Categorical Estimates**: Use risk/scope/impact categories, not time
estimates. These are structurally important — upstream failures multiply
downstream damage regardless of developer type (human or LLM). See the
cost-benefit framework in taskgraph's framework docs.
## Workflow Phases
### Phase 0: Exploration (Conditional)
**When**: Requirements unclear, multiple approaches to evaluate, or hard problems need investigation.
**When**: Requirements unclear, multiple approaches to evaluate, or hard
problems need investigation.
**Process**:
1. Capture vision and guiding principles
2. Research Specialist investigates options (`docs/research/` or external)
3. POC Specialist validates promising approaches (`.worktrees/research/`)
4. Document learnings
5. Converge on recommended approach
**Output**: Clear understanding of WHAT to build and WHY, with validated approaches
**Output**: Clear understanding of WHAT to build and WHY, with validated
approaches
### Phase 1: Architecture
**Objective**: Produce comprehensive, committed architecture specification.
**Process**:
1. Architect creates modular architecture docs in `docs/architecture/` (Draft status)
1. Architect creates modular architecture docs in `docs/architecture/` (Draft
status)
2. Architecture Review validates for ambiguities, risks
3. Iterate until zero critical issues
4. Transition to Stable status
@@ -48,6 +59,7 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Objective**: Break architecture into atomic, dependency-ordered tasks.
**Process**:
1. Decomposer analyzes architecture
2. Creates tasks (markdown files in `tasks/`)
3. Establishes dependencies between tasks
@@ -61,14 +73,18 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Objective**: Execute tasks in dependency order with verification.
**Process**:
1. Coordinator identifies parallelizable work
2. Coordinator spawns worktrees + sessions (via `worktree({action: "spawn", ...})` or hub `coord.spawn` when available)
2. Coordinator spawns worktrees + sessions (via
`worktree({action: "spawn", ...})` or hub `coord.spawn` when available)
- Feature work: `.worktrees/feat/<task-id>/` → Implementation Specialist
- Research POCs: `.worktrees/research/<task-id>/` → POC Specialist
3. Coordinator injects task context into each session
4. Agents execute tasks with self-verification
5. On completion: agent notifies coordinator, updates task status, commits to worktree branch
6. On blocker: Safe Exit protocol, agent notifies coordinator, create blocker task
5. On completion: agent notifies coordinator, updates task status, commits to
worktree branch
6. On blocker: Safe Exit protocol, agent notifies coordinator, create blocker
task
7. Merge worktrees back to main when complete
**Output**: Completed, verified implementation
@@ -78,6 +94,7 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Objective**: Validate quality and readiness.
**Process**:
1. Code review at injected checkpoints
2. Final integration testing
3. Architecture sync check
@@ -96,16 +113,19 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Primary (interactive with user)
**Tools**:
- Read, Write, Edit, Glob, Grep
- webSearch (research patterns, best practices)
**Key Behaviors**:
- Focus on WHAT and WHY, never HOW
- Document decisions with ADR format
- Redirect exploration work to Research Specialist
- Iterate based on review feedback
**Deliverables**:
- Modular architecture docs in `docs/architecture/`
- Component-specific documents
@@ -118,15 +138,18 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Primary (interactive with user for approval)
**Tools**:
- Read, Glob, Grep
**Key Behaviors**:
- Decompose to atomic tasks (single objective, clear acceptance criteria)
- Establish logical dependencies
- Validate structure (no cycles, logical ordering)
- Inject review tasks at critical points
**Deliverables**:
- Task files in `tasks/` directory
- Dependency graph validated
@@ -134,19 +157,27 @@ This document defines the SDD process for the @alkdev/storage package. It levera
#### 3. Coordinator
**Responsibility**: Orchestrate parallel task execution across worktrees and sessions.
**Responsibility**: Orchestrate parallel task execution across worktrees and
sessions.
**Mode**: Primary (manages worktrees and agent sessions)
**Uses**: The `worktree` tool from the **open-coordinator** opencode plugin. Single tool with `{action, args}` dispatch. Role is auto-detected — coordinator sessions get the full operation set, spawned implementation sessions get a limited set (current, notify, status). No mode toggle required.
**Uses**: The `worktree` tool from the **open-coordinator** opencode plugin.
Single tool with `{action, args}` dispatch. Role is auto-detected — coordinator
sessions get the full operation set, spawned implementation sessions get a
limited set (current, notify, status). No mode toggle required.
**Tools**:
- `worktree({action, args})` — spawn, sessions, dashboard, message, abort, cleanup
- `worktree({action, args})` — spawn, sessions, dashboard, message, abort,
cleanup
- Bash (opencode CLI for session interaction)
- Read (monitor task files)
- `memory` / `memory_compact` — context management and session history (via @alkdev/open-memory, when available)
- `memory` / `memory_compact` — context management and session history (via
@alkdev/open-memory, when available)
**Key Behaviors**:
- Identify parallelizable task groups
- Spawn worktrees + sessions via `worktree({action: "spawn", ...})`
- Inject task context into sessions
@@ -155,6 +186,7 @@ This document defines the SDD process for the @alkdev/storage package. It levera
- Merge completed worktrees
**Deliverables**:
- Coordinated parallel execution
- Blocked task escalation
- Merged branches
@@ -168,13 +200,16 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Primary (works on assigned task in worktree)
**Tools**:
- Read, Write, Edit, Glob, Grep, Bash
- `worktree({action: "notify", ...})` — report progress/blockers to coordinator
- `worktree({action: "current"})` — verify worktree assignment
- webSearch (documentation lookup)
- `memory` / `memory_compact` — context management (via @alkdev/open-memory, when available)
- `memory` / `memory_compact` — context management (via @alkdev/open-memory,
when available)
**Key Behaviors**:
- Load task context (architecture, dependencies)
- Propose plan before implementing
- Implement following architecture constraints
@@ -184,6 +219,7 @@ This document defines the SDD process for the @alkdev/storage package. It levera
- Commit to worktree branch
**Deliverables**:
- Completed task implementation
- Tests passing
- Committed changes in worktree
@@ -199,9 +235,11 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Subagent (invoked by Architect)
**Tools**:
- Read, Grep
**Key Behaviors**:
- Check for undefined terms
- Identify missing trade-off documentation
- Validate quality attribute coverage
@@ -216,9 +254,11 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Subagent (invoked by Coordinator or as task)
**Tools**:
- Read, Grep, Bash (lint, test)
**Key Behaviors**:
- Check adherence to architecture
- Validate patterns and conventions
- Run linters and tests
@@ -233,10 +273,12 @@ This document defines the SDD process for the @alkdev/storage package. It levera
**Mode**: Subagent (invoked by any role)
**Tools**:
- Read, Write, Glob
- webSearch (primary research tool)
**Key Behaviors**:
- Find and summarize documentation
- Evaluate library alternatives
- Document findings
@@ -245,17 +287,20 @@ This document defines the SDD process for the @alkdev/storage package. It levera
#### 8. POC Specialist
**Responsibility**: Create proof-of-concepts to validate technical approaches before production implementation.
**Responsibility**: Create proof-of-concepts to validate technical approaches
before production implementation.
**Mode**: Primary (works in isolated research worktree)
**Worktree Location**: `.worktrees/research/<task-id>/`
**Tools**:
- Read, Write, Edit, Glob, Grep, Bash
- webSearch (implementation references)
**Key Behaviors**:
- Create minimal POCs to validate hypotheses
- Work in isolated research worktrees
- Document findings and recommendations
@@ -263,11 +308,13 @@ This document defines the SDD process for the @alkdev/storage package. It levera
- Be honest about limitations and blockers
**When Invoked**:
- After Research Specialist completes initial research
- When a technical approach needs validation before commitment
- When integration complexity or performance is uncertain
**Deliverables**:
- Working POC code
- Findings document with recommendation (proceed/pivot/block)
- Updated research task with results
@@ -276,7 +323,8 @@ This document defines the SDD process for the @alkdev/storage package. It levera
## Task File Format
Tasks are markdown files stored in `tasks/`. Since they're in the repo, they're automatically available in worktrees.
Tasks are markdown files stored in `tasks/`. Since they're in the repo, they're
automatically available in worktrees.
```markdown
---
@@ -306,40 +354,46 @@ Implement OAuth2 authentication with provider abstraction.
## Notes
> Agent fills this during implementation. Document any decisions,
> deviations from architecture, or relevant context discovered.
> Agent fills this during implementation. Document any decisions, deviations
> from architecture, or relevant context discovered.
## Summary
> Agent fills this on completion. Brief description of what was
> implemented, files changed, and any follow-up needed.
> Agent fills this on completion. Brief description of what was implemented,
> files changed, and any follow-up needed.
```
### Categorical Estimates
These fields are structurally important, not optional metadata. They power `taskgraph decompose`, `risk-path`, `critical`, and `bottleneck` — commands that reveal structural problems in the task graph. A task missing `scope`, `risk`, `impact`, or `level` is a red flag indicating incomplete decomposition. See the cost-benefit framework in taskgraph's framework docs for the reasoning.
These fields are structurally important, not optional metadata. They power
`taskgraph decompose`, `risk-path`, `critical`, and `bottleneck` — commands that
reveal structural problems in the task graph. A task missing `scope`, `risk`,
`impact`, or `level` is a red flag indicating incomplete decomposition. See the
cost-benefit framework in taskgraph's framework docs for the reasoning.
| Scope | Description | Example |
|-------|-------------|---------|
| single | One function, one file | Add validation helper |
| narrow | One component, few files | Implement auth middleware |
| moderate | Feature, multiple components | Build user API endpoints |
| broad | Multi-component feature | Implement OAuth flow |
| system | Cross-cutting changes | Database migration |
| Scope | Description | Example |
| -------- | ---------------------------- | ------------------------- |
| single | One function, one file | Add validation helper |
| narrow | One component, few files | Implement auth middleware |
| moderate | Feature, multiple components | Build user API endpoints |
| broad | Multi-component feature | Implement OAuth flow |
| system | Cross-cutting changes | Database migration |
| Risk | Failure Likelihood |
|------|-------------------|
| trivial | Nearly impossible to fail |
| low | Standard implementation |
| medium | Some uncertainty |
| high | Significant unknowns |
| critical | High chance of failure |
| Risk | Failure Likelihood |
| -------- | ------------------------- |
| trivial | Nearly impossible to fail |
| low | Standard implementation |
| medium | Some uncertainty |
| high | Significant unknowns |
| critical | High chance of failure |
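
To illustrate the "missing field is a red flag" rule, here is a sketch of a frontmatter check. It assumes the task frontmatter has already been parsed into a plain object; `checkEstimates` is a hypothetical helper and is not a `taskgraph` command. Only the `scope` and `risk` values listed in the tables are validated, since the allowed values for `impact` and `level` are not given here.

```typescript
const REQUIRED_FIELDS = ["scope", "risk", "impact", "level"] as const;
const SCOPES = ["single", "narrow", "moderate", "broad", "system"];
const RISKS = ["trivial", "low", "medium", "high", "critical"];

// Returns a list of problems; an empty list means the estimates look complete.
function checkEstimates(frontmatter: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = frontmatter[field];
    if (value === undefined || value === null || value === "") {
      problems.push(`missing ${field}`);
    }
  }
  const scope = frontmatter["scope"];
  if (typeof scope === "string" && SCOPES.indexOf(scope) === -1) {
    problems.push(`unknown scope: ${scope}`);
  }
  const risk = frontmatter["risk"];
  if (typeof risk === "string" && RISKS.indexOf(risk) === -1) {
    problems.push(`unknown risk: ${risk}`);
  }
  return problems;
}
```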
### Task Lifecycle
**Status values**: `pending` → `in-progress` → `completed` | `blocked` | `failed`
**Status values**: `pending` → `in-progress` → `completed` | `blocked` |
`failed`
**On completion**, the agent:
1. Updates `status: completed`
2. Fills in `## Summary` section
3. Commits changes to worktree branch
@@ -351,10 +405,12 @@ When a task becomes untendable:
### Criteria
**Hard Criteria** (automatic):
- Same task fails verification 3+ times
- Total task attempts reach 5 or more
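
The hard criteria can be expressed as a simple predicate, sketched below. The `TaskAttempts` shape and the function name are assumptions, and the total-attempts threshold is read as "5 or more", which is one interpretation of the wording above.

```typescript
// Assumed counters tracked per task; not a documented structure.
interface TaskAttempts {
  verificationFailures: number;
  totalAttempts: number;
}

// True when either automatic Safe Exit criterion is met.
function meetsHardExitCriteria(attempts: TaskAttempts): boolean {
  const failedVerificationTooOften = attempts.verificationFailures >= 3;
  const tooManyAttempts = attempts.totalAttempts >= 5; // interpretation of "5+"
  return failedVerificationTooOften || tooManyAttempts;
}
```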
**Soft Criteria** (agent judgment):
- Ambiguous architecture
- Missing dependencies
- External library incompatibility
@@ -371,18 +427,20 @@ When a task becomes untendable:
Use graph analysis to determine where reviews should happen:
| Analysis | Injection Point |
|----------|-----------------|
| Parallel groups | Review before groups merge |
| Bottleneck tasks | Review before critical path |
| High-risk tasks | Review before proceeding |
| Critical path | Review before critical tasks |
| Analysis | Injection Point |
| ---------------- | ---------------------------- |
| Parallel groups | Review before groups merge |
| Bottleneck tasks | Review before critical path |
| High-risk tasks | Review before proceeding |
| Critical path | Review before critical tasks |
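
As a rough illustration of how the table could be derived from a task graph, the sketch below flags candidate review points. Everything in it is an assumption made for illustration: the `Task` shape, the `reviewInjectionPoints` name, and the heuristics (a task with two or more dependencies marks a merge of parallel groups, a task with two or more dependents is a bottleneck, and `high` or `critical` risk counts as high-risk). Critical-path detection is left out.

```typescript
// Risk levels as listed in the risk table; the Task shape is hypothetical.
type Risk = "trivial" | "low" | "medium" | "high" | "critical";

interface Task {
  id: string;
  risk: Risk;
  dependsOn: string[]; // ids of tasks that must finish first
}

// Maps task id -> reasons a review should be scheduled before that task.
function reviewInjectionPoints(tasks: Task[]): Map<string, string[]> {
  const reasons = new Map<string, string[]>();
  const add = (id: string, reason: string): void => {
    reasons.set(id, [...(reasons.get(id) ?? []), reason]);
  };

  // Count how many tasks depend on each task.
  const dependents = new Map<string, number>();
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      dependents.set(dep, (dependents.get(dep) ?? 0) + 1);
    }
  }

  for (const task of tasks) {
    if (task.dependsOn.length >= 2) add(task.id, "merge of parallel groups");
    if ((dependents.get(task.id) ?? 0) >= 2) add(task.id, "bottleneck");
    if (task.risk === "high" || task.risk === "critical") {
      add(task.id, "high-risk");
    }
  }
  return reasons;
}
```

A task can collect several reasons, which may be a useful signal for prioritising which reviews to schedule first.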
## Coordinator Implementation
### Current (open-coordinator plugin)
The Coordinator uses the `worktree` tool from the open-coordinator opencode plugin. It's a single tool with `{action, args}` dispatch — no separate enable/toggle steps. Role is auto-detected from session state.
The Coordinator uses the `worktree` tool from the open-coordinator opencode
plugin. It's a single tool with `{action, args}` dispatch — no separate
enable/toggle steps. Role is auto-detected from session state.
```
1. Identify parallel work
@@ -413,11 +471,13 @@ The Coordinator uses the `worktree` tool from the open-coordinator opencode plug
worktree({action: "cleanup", args: {action: "remove", pathOrBranch: "feat/auth-setup"}})
```
The plugin also provides SSE-based anomaly detection (model degradation, high error count, session stall) with automatic notifications to the coordinator.
The plugin also provides SSE-based anomaly detection (model degradation, high
error count, session stall) with automatic notifications to the coordinator.
### Implementation Agent Operations
Spawned sessions (implementation specialists, code reviewers, POC specialists) get a limited worktree interface:
Spawned sessions (implementation specialists, code reviewers, POC specialists)
get a limited worktree interface:
```text
worktree({action: "current"}) → Show worktree mapping
@@ -425,17 +485,24 @@ worktree({action: "notify", args: {message: "...", level: "info|blocking"}})
worktree({action: "status"}) → Show worktree git status
```
The plugin auto-injects `workdir` for bash commands when a session is mapped to a worktree.
The plugin auto-injects `workdir` for bash commands when a session is mapped to
a worktree.
### Context & Memory (with @alkdev/open-memory)
When the open-memory plugin is available alongside open-coordinator, the coordinator gains:
- `memory({tool: "children", args: {sessionId: "..."}})` — view sub-agent sessions spawned from the coordinator
- `memory({tool: "messages", args: {sessionId: "..."}})` — read a spawned session's conversation for debugging
- `memory({tool: "context"})` — check context window usage before long monitoring sessions
When the open-memory plugin is available alongside open-coordinator, the
coordinator gains:
- `memory({tool: "children", args: {sessionId: "..."}})` — view sub-agent
sessions spawned from the coordinator
- `memory({tool: "messages", args: {sessionId: "..."}})` — read a spawned
session's conversation for debugging
- `memory({tool: "context"})` — check context window usage before long
monitoring sessions
- `memory_compact()` — proactively compact at natural breakpoints
Implementation agents can also use `memory({tool: "context"})` and `memory_compact()` to manage their context during long tasks.
Implementation agents can also use `memory({tool: "context"})` and
`memory_compact()` to manage their context during long tasks.
### Future (Hub Operations)
@@ -455,7 +522,9 @@ Once the hub is operational, coordination uses native operations:
hub.call("coord.abort", { sessionId })
```
State moves from in-process tracking to Postgres `mappings` table. The open-coordinator plugin becomes unnecessary — the hub provides the same capabilities as server-side operations accessible from any environment.
State moves from in-process tracking to Postgres `mappings` table. The
open-coordinator plugin becomes unnecessary — the hub provides the same
capabilities as server-side operations accessible from any environment.
## Document Structure
@@ -523,4 +592,4 @@ This document should evolve with the project:
1. Refine roles based on actual usage
2. Adjust task templates based on what works
3. Document coordinator patterns as they emerge
4. Capture learnings in after-action reviews