---
status: draft
last_updated: 2026-04-19
---
# Storage: Drizzle + TypeBox + Postgres
## Overview
The storage layer uses Drizzle ORM for database operations, PostgreSQL as the persistence layer, and `@alkdev/drizzlebox` for automatic TypeBox schema generation from Drizzle table definitions. Drizzle table definitions are the single source of truth — `createSelectSchema` / `createInsertSchema` generate TypeBox schemas automatically.
**Location**: `src/storage/`
For table schemas, see [table-reference.md](./table-reference.md) (index, common columns, cascade behavior) and the per-domain schema files (identity.md, projects.md, sessions.md, etc.). For design decisions, see [../../decisions/](../../decisions/).
## Pattern: Drizzle-Typebox
Each table file follows this pattern:
```ts
import { pgTable, text, timestamp, jsonb, boolean, integer, index, unique } from "drizzle-orm/pg-core";
import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { projects } from "./projects.ts";
// SessionData (type) and SessionDataSchema (TypeBox schema) describe the `data`
// column; they are defined alongside the table and are not shown here.

// 1. Table definition with Drizzle (source of truth)
export const sessions = pgTable("sessions", {
  ...commonCols,
  projectId: text("project_id")
    .notNull()
    .references(() => projects.id, { onDelete: "cascade" }),
  title: text("title"),
  status: text("status", { enum: ["idle", "busy", "retry", "archived"] })
    .default("idle")
    .notNull(),
  data: jsonb("data").$type<SessionData>().default({}),
});

// 2. Select TypeBox schema (for API responses)
export const SelectSession = createSelectSchema(sessions, {
  metadata: Type.Object({}, { additionalProperties: true }),
  data: SessionDataSchema, // override JSON columns
});
export type SelectSession = Static<typeof SelectSession>;

// 3. Insert TypeBox schema (for API validation)
export const InsertSession = createInsertSchema(sessions, {
  title: Type.Optional(Type.String({ minLength: 1, maxLength: 500 })),
  status: Type.Optional(
    Type.Union([
      Type.Literal("idle"),
      Type.Literal("busy"),
      Type.Literal("retry"),
      Type.Literal("archived"),
    ]),
  ),
});
export type InsertSession = Static<typeof InsertSession>;
```
## Common Columns
All tables share these columns:
```ts
import { text, timestamp, jsonb } from "drizzle-orm/pg-core";
import { sql } from "drizzle-orm";

export const commonCols = {
  id: text("id")
    .primaryKey()
    .$defaultFn(() => crypto.randomUUID()),
  metadata: jsonb("metadata").$type<Record<string, unknown>>().default({}),
  createdAt: timestamp("created_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull(),
  updatedAt: timestamp("updated_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull()
    .$onUpdate(() => new Date()),
};
// Note: commonCols.id uses crypto.randomUUID(), which generates UUIDv4 (random, non-sortable).
// For tables requiring chronological ordering by ID (e.g. parts, messages), use sortable IDs:
// - A time-sortable ID such as UUIDv7 (e.g. the uuidv7 package) or ULID (e.g. @std/ulid)
// - Or add an explicit sequence/position column
// The parts table uses an explicit position-based ID scheme inherited from opencode's sortable
// timestamp-based IDs. See the parts table section in sessions.md for details.
//
// Note: updatedAt uses Drizzle's $onUpdate (application-level). Direct SQL updates bypass this
// and must manually SET updated_at = now(). For critical tables, consider adding a Postgres
// trigger as a safety net.
```
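To make the sortable-ID option concrete, here is a minimal sketch of a UUIDv7 generator following the RFC 9562 layout (48-bit millisecond timestamp, version nibble 7, variant bits `10`, random remainder). The helper name `uuidv7` and its `nowMs` parameter are assumptions for illustration and are not part of the hub codebase; a maintained library would be the safer choice in practice.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: returns a UUIDv7 string whose lexicographic order
// follows the millisecond timestamp passed in (defaults to the current time).
export function uuidv7(nowMs: number = Date.now()): string {
  // Start from 16 random bytes, then overwrite the fixed fields.
  const bytes = new Uint8Array(randomBytes(16));

  // Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds.
  let ts = Math.floor(nowMs);
  for (let i = 5; i >= 0; i--) {
    bytes[i] = ts % 256;
    ts = Math.floor(ts / 256);
  }

  // Byte 6, high nibble: version 7. Byte 8, top two bits: variant 10.
  bytes[6] = (bytes[6] & 0x0f) | 0x70;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

A table that needs chronological IDs could pass such a function to `$defaultFn` in place of `crypto.randomUUID()`. IDs created within the same millisecond are ordered only by their random bits, so an explicit position column remains the stricter option.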
## JSONB Column Boundaries
All tables have `commonCols.metadata` (JSONB, default `{}`), and some tables have an additional domain-specific `data` or `config` column. The boundary between these columns matters for implementers:
- **`metadata`** (commonCols): Opaque key-value pairs for subsystem use, with a namespacing convention (`_subsystem.key`). Examples: `_keypal.scopes`, `_retention.expiresAt`, `_version`. If a subsystem needs to store data on a row, it uses `metadata` with its prefixed namespace. The `metadata` column is never queried in WHERE clauses or JOINs.
- **`data`** (domain-specific): Structured domain-specific data with known TypeScript types. Examples: session execution metadata (`model`, `tokens`, `cost`), message role-specific metadata, account preferences. Fields in `data` have defined shapes and may be validated against TypeBox schemas.
- **`config`** (clients): Validated connection configuration. Validated against the TypeBox schema for the client `type` on write. Secrets are NEVER in `config` — they go in `client_secrets`.
- **`identity`** / **`details`** (call graph, audit): Immutable context set at creation time. These record who/what/why and are never updated after creation.
**Rule of thumb**: If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in JSONB.
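The `metadata` namespacing convention can be sketched as a pair of small helpers. This assumes the namespaced keys are stored flat (a literal key such as `"_retention.expiresAt"`) rather than as nested objects, which the spec does not state explicitly; `metaKey`, `setMeta`, and `getMeta` are hypothetical names.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical helper: builds the flat key "_subsystem.key".
// Assumption: subsystem names are lowercase alphanumerics.
export function metaKey(subsystem: string, key: string): string {
  if (!/^[a-z][a-z0-9]*$/.test(subsystem)) {
    throw new Error(`invalid subsystem name: ${subsystem}`);
  }
  return `_${subsystem}.${key}`;
}

// Returns a new metadata object with the namespaced key set,
// leaving other subsystems' keys untouched.
export function setMeta(
  metadata: Metadata,
  subsystem: string,
  key: string,
  value: unknown,
): Metadata {
  return { ...metadata, [metaKey(subsystem, key)]: value };
}

// Reads a namespaced key; returns undefined when it is absent.
export function getMeta(metadata: Metadata, subsystem: string, key: string): unknown {
  return metadata[metaKey(subsystem, key)];
}
```

For example, `setMeta(row.metadata, "retention", "expiresAt", "2026-01-01T00:00:00Z")` would add `_retention.expiresAt` without disturbing `_keypal.scopes` or `_version`.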
## Package Structure
```
src/storage/
├── mod.ts                             # exports schema namespace + db client
├── client.ts                          # drizzle + postgres connection
├── schema.ts                          # barrel re-export of tables + relations
├── drizzle.config.ts                  # drizzle-kit migration config
├── tables/
│   ├── common.ts                      # shared columns (id, metadata, timestamps)
│   ├── accounts.ts                    # hub-local identity records
│   ├── roles.ts                       # behavioral role definitions (planned; see agent-roles.md)
│   ├── organizations.ts               # top-level groupings
│   ├── organization_members.ts        # account ↔ org membership
│   ├── projects.ts                    # projects (git repositories / work contexts)
│   ├── workspaces.ts                  # project workspaces (branches, directories)
│   ├── sessions.ts                    # agent conversation sessions
│   ├── messages.ts                    # session messages (metadata in data column)
│   ├── parts.ts                       # message parts (discriminated by type, content in data)
│   ├── spokes.ts                      # spoke registrations
│   ├── operations.ts                  # operation definitions (what an operation IS)
│   ├── operation_registrations.ts     # provider registrations (who provides it now)
│   ├── api_keys.ts                    # API keys (keypal-managed, inbound auth)
│   ├── audit_logs.ts                  # keypal + hub audit trail
│   ├── clients.ts                     # external service registrations (outbound connections)
│   ├── client_secrets.ts              # encrypted credentials for clients
│   ├── mappings.ts                    # worktree/spoke/coordinator mappings
│   ├── detections.ts                  # anomaly detection records
│   ├── call_graph_nodes.ts            # call graph nodes
│   ├── call_graph_edges.ts            # call graph edges
│   ├── tasks.ts                       # SDD task definitions
│   ├── task_dependencies.ts           # task dependency edges
│   └── index.ts                       # barrel re-export
├── relations.ts                       # drizzle relational mappings
└── test/
    └── helpers/
        ├── db.ts                      # test db setup
        └── migrations.ts              # migration runner for tests
```
## Database Connection
The hub reads database configuration from the encrypted config file (see [hub-config.md](../hub-config.md)). Connection parameters are NOT read from environment variables (see ADR-008, revised).
```ts
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
import * as schema from "./schema.ts";

// PostgresConfig is the type of HubConfig.postgres (see hub-config.md).
// HubConfig.postgres is decrypted at startup by loadConfig()
function createPool(pgConfig: PostgresConfig) {
  return new Pool({
    host: pgConfig.host, // default: 127.0.0.1 (localhost)
    port: pgConfig.port, // default: 5432
    database: pgConfig.database, // default: alkdev
    user: pgConfig.user,
    password: pgConfig.password,
    ssl: pgConfig.ssl,
    max: pgConfig.maxConnections,
  });
}

// Build the pool from the decrypted config, then wrap it with Drizzle.
// `hubConfig` stands for the HubConfig value produced by loadConfig().
const pool = createPool(hubConfig.postgres);
export const db = drizzle(pool, { schema });
```
See [infrastructure.md](../infrastructure.md) for network topology and connection details.
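The defaults noted in the `createPool` comments could be applied in one place before the pool is built. The sketch below is illustrative only: the `PostgresConfigInput` shape and the `resolvePostgresConfig` name are assumptions, and the `ssl` and `maxConnections` fallbacks are placeholders rather than values taken from hub-config.md (10 happens to match the `pg` pool's own default size).

```typescript
// Assumed shape of the decrypted postgres section; optional fields fall back to defaults.
interface PostgresConfigInput {
  host?: string;
  port?: number;
  database?: string;
  user: string;
  password: string;
  ssl?: boolean;
  maxConnections?: number;
}

type ResolvedPostgresConfig = Required<PostgresConfigInput>;

// Hypothetical helper: fills in the documented defaults for host, port, and database.
export function resolvePostgresConfig(input: PostgresConfigInput): ResolvedPostgresConfig {
  return {
    host: input.host ?? "127.0.0.1",
    port: input.port ?? 5432,
    database: input.database ?? "alkdev",
    user: input.user,
    password: input.password,
    ssl: input.ssl ?? false, // placeholder default, not from the spec
    maxConnections: input.maxConnections ?? 10, // placeholder default, not from the spec
  };
}
```

Resolving defaults up front keeps `createPool` a plain mapping from config fields to pool options.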
## Migration Strategy
```ts
// drizzle.config.ts
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  out: "./migrations",
  schema: "./schema.ts",
  dialect: "postgresql",
  dbCredentials: {
    // Read from a local dev config file (gitignored).
    // Generate via: alkhub-config decrypt --field postgres --config config.json
    // Then assemble the URL from the decrypted fields.
    // Do NOT use Deno.env.get() for database credentials.
    // See hub-config.md §D7 for rationale.
    url: loadDevDbUrl(),
  },
});
```
Where `loadDevDbUrl()` reads from a developer-local config file (e.g., `.alkhub/dev-db.json`, gitignored):
```ts
import { readFileSync } from "node:fs";

function loadDevDbUrl(): string {
  try {
    const devConfig = JSON.parse(readFileSync(".alkhub/dev-db.json", "utf-8"));
    // Percent-encode credentials so characters like "@" or ":" cannot break the URL.
    const user = encodeURIComponent(devConfig.user);
    const password = encodeURIComponent(devConfig.password);
    return `postgresql://${user}:${password}@${devConfig.host}:${devConfig.port}/${devConfig.database}`;
  } catch {
    // Fallback for fresh dev setup (file missing or unreadable); no secrets in env vars
    return "postgresql://hub:***@localhost:5432/alkdev_dev";
  }
}
```
Run `drizzle-kit generate` to create migrations and `drizzle-kit migrate` to apply them. At hub startup, migrations are applied programmatically (see [hub-startup.md](../hub-startup.md) Step 5).
**Important**: The hub's `drizzle.config.ts` does NOT use `Deno.env.get()` for database credentials. Instead, it reads from a local development config file (gitignored) or from a decrypted field produced by `alkhub-config decrypt`. See [hub-config.md](../hub-config.md) §D7 for the decision and the approved env vars list.
## Test Setup
```ts
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
import * as schema from "../../schema.ts";

// TestDbConfig carries the connection fields read from the test config file.
export async function setupTestDb(testConfig: TestDbConfig) {
  const pool = new Pool({
    host: testConfig.host,
    database: testConfig.database,
    port: testConfig.port,
    user: testConfig.user,
    password: testConfig.password,
  });
  const db = drizzle(pool, { schema });
  // Run migrations here using the test migration runner (./migrations.ts, not shown)
  return { pool, db };
}
```
Test database configuration is read from a test config file or test-specific Docker secrets, following the same pattern as production config (no env vars for credentials). The `ALKHUB_TEST_CONFIG_PATH` env var (non-sensitive) may point to the test config file location.
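As a rough sketch of how the test config might be located and checked before it reaches `setupTestDb`, the helper below reads the file named by `ALKHUB_TEST_CONFIG_PATH` and validates the five connection fields. The function names, the fallback path `.alkhub/test-db.json`, and the JSON layout are assumptions for illustration. The environment is passed in as a plain object so the sketch does not depend on `Deno.env` or `process.env`.

```typescript
import { readFileSync } from "node:fs";

export interface TestDbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// Parses and validates the raw JSON text of a test config file.
export function parseTestDbConfig(text: string): TestDbConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("test config must be a JSON object");
  }
  const fields = raw as Record<string, unknown>;
  for (const key of ["host", "database", "user", "password"]) {
    if (typeof fields[key] !== "string") {
      throw new Error(`test config field "${key}" must be a string`);
    }
  }
  const port = fields["port"];
  if (typeof port !== "number" || !Number.isInteger(port)) {
    throw new Error('test config field "port" must be an integer');
  }
  return {
    host: fields["host"] as string,
    port,
    database: fields["database"] as string,
    user: fields["user"] as string,
    password: fields["password"] as string,
  };
}

// Resolves the config path from the (non-sensitive) env var, then loads the file.
// The default path is a hypothetical placeholder.
export function loadTestDbConfig(
  env: Record<string, string | undefined>,
  defaultPath = ".alkhub/test-db.json",
): TestDbConfig {
  const path = env["ALKHUB_TEST_CONFIG_PATH"] ?? defaultPath;
  return parseTestDbConfig(readFileSync(path, "utf-8"));
}
```

Only the file location travels through the environment; the credentials themselves stay in the file.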
## Resolved Decisions
1. **~~Operation spec cleanup~~**: **Resolved** (D3). Operation definitions (`operations` table) persist independently of spoke connections. Operation registrations (`operation_registrations` table) are set to `status: 'inactive'` on disconnect and may be cascade-deleted if a spoke row is administratively removed. See D3 in storage-spec-phase1-resolutions.md.
2. **~~Workspaces vs. directories~~**: **Resolved**. `projects.directory` is the convenience shortcut for the default workspace; `workspaces.directory` is per-workspace. Both are needed.
3. **~~`accounts.role` → `accounts.accessLevel`~~**: **Resolved** by [ADR-012](../../decisions/ADR-012-agent-vs-role-vs-account.md). `accounts.role` renamed to `accounts.accessLevel` (values: admin/user/service). `organization_members.role` renamed to `organization_members.membershipLevel` (values: owner/admin/member). This disambiguates access levels from behavioral roles.
## Open Questions
1. **Message versioning**: Opencode has a `version` column on sessions for schema migration. Should we version the `data` column format on messages and parts for forward compatibility? The `commonCols.metadata` column could hold a `_version` field.
2. **Session message compaction**: Opencode has a `compaction` part type for context window management. The hub's storage should support this, but the compaction logic itself belongs in the session management layer, not in storage. Need to define what compaction means for hub-direct AI SDK sessions.
3. **Call graph retention policy**: Call graph data can grow fast. Need a retention policy — probably TTL-based cleanup of completed/failed calls older than N days, with aggregation for observability dashboards. See the payload truncation note in call-graph.md.
4. **Keypal adapter testing**: The `HubKeyStorage` adapter should have comprehensive tests. keypal's own test suite covers the core logic; our adapter tests cover the Drizzle integration.
5. **Cross-doc terminology migration**: The "spoke" naming ADR establishes the canonical terminology. Other architecture docs still contain "runner" / "runnerId" references. These should be updated in a separate pass.
6. **Anthropic conversation import**: Anthropic's web interface exports use a flat message model. A future import script should map these to our `messages` + `parts` tables. The Anthropic project model maps to our `projects` + `sessions` structure. Deferred — the export format is documented and available when needed.
7. **Gitea operations at startup**: The Gitea swagger spec is at `https://git.alk.dev/swagger.v1.json` (Swagger 2.0, 299 endpoints). Our `from_openapi.ts` supports this format. At hub startup, load the Gitea client config + secret from the DB, import the spec, and register ~300 Gitea operations.
8. **Client config schema evolution**: When a client type's TypeBox schema changes (e.g., adding a new field), existing DB rows with the old config shape may fail validation. Strategy: schemas should use `Type.Optional()` for new fields, and the resolution code should handle missing fields gracefully. If a breaking change is needed, bump a schema version in the `metadata` column. See [ADR-007](../../decisions/ADR-007-client-config-as-schema-validated-jsonb.md) for the validation pattern. Full contract pending `specify-client-config-validation` task.
9. **Task storage and sync**: The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation (files → DB). When offline analysis is needed, tasks can be exported from DB back to files. See [tasks.md](./tasks.md) and [ADR-011](../../decisions/ADR-011-dual-task-representation.md).
10. **Task embeddings (deferred)**: Task descriptions could benefit from vector embeddings for similarity search ("find tasks like this one"). Deferred from initial implementation. The `metadata` JSONB column can hold an embedding reference later, or a separate `task_embeddings` table can be added when needed.
11. **Role definitions in database**: Role definitions (currently in `.opencode/agents/*.md`) should eventually become database records. A `roles` table would store role name, description, mode, permissions, tools, temperature, and model parameters. The transition follows the same pattern as taskgraph (file-based authoring, database as source of truth). See [agent-roles.md](../../agent-roles.md) for the full role model.
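For the client config schema evolution question (item 8), the "optional new fields, graceful resolution, version in `metadata`" strategy might look like the sketch below. Everything specific here is hypothetical: the `HttpClientConfig` shape, the `timeoutMs` field added in a notional version 2, its 30000 ms fallback, and the use of a `_version` key on the row's `metadata`. The real contract is still pending the `specify-client-config-validation` task.

```typescript
// Minimal stand-in for a row from the clients table (only the fields used here).
interface StoredClientRow {
  type: string;
  config: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

// Hypothetical resolved config: `timeoutMs` was added in schema version 2.
interface HttpClientConfig {
  baseUrl: string;
  timeoutMs: number;
}

const CURRENT_CONFIG_VERSION = 2;
const DEFAULT_TIMEOUT_MS = 30000; // placeholder default for rows written before version 2

export function resolveHttpClientConfig(row: StoredClientRow): HttpClientConfig {
  // Rows without a version marker are treated as version 1.
  const stored = row.metadata["_version"];
  const version = typeof stored === "number" ? stored : 1;
  if (version > CURRENT_CONFIG_VERSION) {
    throw new Error(`config version ${version} is newer than this hub supports`);
  }

  const baseUrl = row.config["baseUrl"];
  if (typeof baseUrl !== "string") {
    throw new Error("client config is missing required field baseUrl");
  }

  // New optional field: tolerate its absence in older rows.
  const timeout = row.config["timeoutMs"];
  const timeoutMs = typeof timeout === "number" ? timeout : DEFAULT_TIMEOUT_MS;

  return { baseUrl, timeoutMs };
}
```

Older rows keep resolving without a data migration, while a row written by a newer schema version fails loudly instead of being silently misread.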
## References
- Crypto utility (AES-256-GCM + PBKDF2): `src/crypto.ts`
- Opencode message/part schema: opencode's session schema and message-v2 schema (npm package)
- Opencode SQLite schema: Verified against a local opencode database
- Keypal source and Drizzle adapter: keypal (npm package)
- AI SDK UIMessage format: AI SDK (npm package)
- MCP client config: `src/config/types.ts` (MCPServerConfig TypeBox schema)
- MCP client loader: `@alkdev/operations/from-mcp` (MCPClientLoader, createMCPClient, closeMCPClient)
- OpenAPI import: `@alkdev/operations/from-openapi` (HTTPServiceConfig, FromOpenAPI, supports Swagger 2.0 + OpenAPI 3.x)
- Gitea API spec: `https://git.alk.dev/swagger.v1.json` (Swagger 2.0, 299 endpoints)
- Anthropic exports: Anthropic export data (conversation format, docs.json)
- Agent sessions architecture: `docs/architecture/agent-sessions.md`
- Call protocol: `docs/architecture/call-graph.md`
- Coordination: `docs/architecture/coordination.md`
- Spoke design: `docs/architecture/spoke-runner.md`
- Task storage: [tasks.md](./tasks.md) — task tables, taskgraph integration, dual representation
- taskgraph CLI: @alkdev/taskgraph npm package — Rust CLI for task dependency management