hub/docs/architecture/storage/README.md

status: draft
last_updated: 2026-04-19

Storage: Drizzle + TypeBox + Postgres

Overview

The storage layer uses Drizzle ORM for database operations, PostgreSQL as the persistence layer, and @alkdev/drizzlebox for automatic TypeBox schema generation from Drizzle table definitions. Drizzle table definitions are the single source of truth — createSelectSchema / createInsertSchema generate TypeBox schemas automatically.

Location: src/storage/

For table schemas, see table-reference.md (index, common columns, cascade behavior) and the per-domain schema files (identity.md, projects.md, sessions.md, etc.). For design decisions, see ../../decisions/.

Pattern: Drizzle-Typebox

Each table file follows this pattern:

import { pgTable, text, timestamp, jsonb, boolean, integer, index, unique } from "drizzle-orm/pg-core";
import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { projects } from "./projects.ts";
// SessionData (TypeScript type) and SessionDataSchema (TypeBox schema) are
// assumed to be defined alongside this table; see sessions.md.

// 1. Table definition with Drizzle (source of truth)
export const sessions = pgTable("sessions", {
  ...commonCols,
  projectId: text("project_id")
    .notNull()
    .references(() => projects.id, { onDelete: "cascade" }),
  title: text("title"),
  status: text("status", { enum: ["idle", "busy", "retry", "archived"] })
    .default("idle")
    .notNull(),
  data: jsonb("data").$type<SessionData>().default({}),
});

// 2. Select TypeBox schema (for API responses)
export const SelectSession = createSelectSchema(sessions, {
  metadata: Type.Object({}, { additionalProperties: true }),
  data: SessionDataSchema, // override JSON columns
});
export type SelectSession = Static<typeof SelectSession>;

// 3. Insert TypeBox schema (for API validation)
export const InsertSession = createInsertSchema(sessions, {
  title: Type.Optional(Type.String({ minLength: 1, maxLength: 500 })),
  status: Type.Optional(
    Type.Union([
      Type.Literal("idle"),
      Type.Literal("busy"),
      Type.Literal("retry"),
      Type.Literal("archived"),
    ]),
  ),
});
export type InsertSession = Static<typeof InsertSession>;

Common Columns

All tables share these columns:

import { text, timestamp, jsonb } from "drizzle-orm/pg-core";
import { sql } from "drizzle-orm";

export const commonCols = {
  id: text("id")
    .primaryKey()
    .$defaultFn(() => crypto.randomUUID()),
  metadata: jsonb("metadata").$type<Record<string, unknown>>().default({}),
  createdAt: timestamp("created_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull(),
  updatedAt: timestamp("updated_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull()
    .$onUpdate(() => new Date()),
};

// Note: commonCols.id uses crypto.randomUUID() which generates UUIDv4 (random, non-sortable).
// For tables requiring chronological ordering by ID (e.g. parts, messages), use sortable IDs:
//   - UUIDv7 (via a library such as uuidv7) or ULID (via @std/ulid); both are time-sortable
//   - Or add an explicit sequence/position column
// The parts table uses an explicit position-based ID scheme inherited from opencode's sortable
// timestamp-based IDs. See the parts table section in sessions.md for details.
//
// Note: updatedAt uses Drizzle's $onUpdate (application-level). Direct SQL updates bypass this
// and must manually SET updated_at = now(). For critical tables, consider adding a Postgres
// trigger as a safety net.
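
The sortable-ID note above can be illustrated with a minimal UUIDv7 generator. This is a sketch under stated assumptions, not the hub's implementation: the function name uuidv7 and its nowMs parameter are hypothetical, and IDs created within the same millisecond are ordered only by their random bits. A maintained library is likely the better choice in practice.

```typescript
// Minimal UUIDv7 sketch: 48-bit millisecond timestamp, then version,
// variant, and random bits. `nowMs` is injectable to make ordering testable.
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = new Uint8Array(16);
  crypto.getRandomValues(bytes);

  // Bytes 0..5 hold the timestamp in big-endian order, so that the
  // hex string sorts lexicographically by creation time.
  let ts = nowMs;
  for (let i = 5; i >= 0; i--) {
    bytes[i] = ts % 256;
    ts = Math.floor(ts / 256);
  }

  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version nibble = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

A table that needs chronological ordering by ID could pass such a function to $defaultFn in place of crypto.randomUUID(), while commonCols keeps the UUIDv4 default for everything else.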

JSONB Column Boundaries

All tables have commonCols.metadata (JSONB, default {}), and some tables have an additional domain-specific data or config column. The boundary between these columns matters for implementers:

  • metadata (commonCols): Opaque key-value pairs for subsystem use, with a namespacing convention (_subsystem.key). Examples: _keypal.scopes, _retention.expiresAt, _version. If a subsystem needs to store data on a row, it uses metadata with its prefixed namespace. The metadata column is never queried in WHERE clauses or JOINs.
  • data (domain-specific): Structured domain-specific data with known TypeScript types. Examples: session execution metadata (model, tokens, cost), message role-specific metadata, account preferences. Fields in data have defined shapes and may be validated against TypeBox schemas.
  • config (clients): Validated connection configuration. Validated against the TypeBox schema for the client type on write. Secrets are NEVER in config — they go in client_secrets.
  • identity / details (call graph, audit): Immutable context set at creation time. These record who/what/why and are never updated after creation.

Rule of thumb: If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in JSONB.
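
The metadata namespacing convention can be sketched with two small helpers. This assumes namespaced entries are stored as flat keys of the form "_subsystem.key" (the document does not say whether they are flat or nested), and the helper names are hypothetical.

```typescript
type Metadata = Record<string, unknown>;

// Assumption: namespaced entries are flat keys such as "_keypal.scopes".
function namespacedKey(subsystem: string, key: string): string {
  return `_${subsystem}.${key}`;
}

// Returns a copy of the metadata with one namespaced entry set.
// The input object is not mutated.
function setSubsystemValue(
  metadata: Metadata,
  subsystem: string,
  key: string,
  value: unknown,
): Metadata {
  return { ...metadata, [namespacedKey(subsystem, key)]: value };
}

// Collects the entries owned by one subsystem, with the prefix stripped.
function getSubsystemValues(metadata: Metadata, subsystem: string): Metadata {
  const prefix = `_${subsystem}.`;
  const result: Metadata = {};
  for (const [k, v] of Object.entries(metadata)) {
    if (k.startsWith(prefix)) result[k.slice(prefix.length)] = v;
  }
  return result;
}
```

Keeping each subsystem behind its own prefix lets several subsystems annotate the same row without coordinating key names, which fits the rule that metadata is never used in WHERE clauses or JOINs.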

Package Structure

src/storage/
├── mod.ts              # exports schema namespace + db client
├── client.ts           # drizzle + postgres connection
├── schema.ts           # barrel re-export of tables + relations
├── drizzle.config.ts   # drizzle-kit migration config
├── tables/
│   ├── common.ts       # shared columns (id, metadata, timestamps)
│   ├── accounts.ts     # hub-local identity records
│   ├── roles.ts        # behavioral role definitions (planned; see agent-roles.md)
│   ├── organizations.ts # top-level groupings
│   ├── organization_members.ts # account ↔ org membership
│   ├── projects.ts     # projects (git repositories / work contexts)
│   ├── workspaces.ts   # project workspaces (branches, directories)
│   ├── sessions.ts     # agent conversation sessions
│   ├── messages.ts     # session messages (metadata in data column)
│   ├── parts.ts        # message parts (discriminated by type, content in data)
│   ├── spokes.ts       # spoke registrations
│   ├── operations.ts         # operation definitions (what an operation IS)
│   ├── operation_registrations.ts # provider registrations (who provides it now)
│   ├── api_keys.ts     # API keys (keypal-managed, inbound auth)
│   ├── audit_logs.ts   # keypal + hub audit trail
│   ├── clients.ts      # external service registrations (outbound connections)
│   ├── client_secrets.ts # encrypted credentials for clients
│   ├── mappings.ts     # worktree/spoke/coordinator mappings
│   ├── detections.ts   # anomaly detection records
│   ├── call_graph_nodes.ts # call graph nodes
│   ├── call_graph_edges.ts # call graph edges
│   ├── tasks.ts        # SDD task definitions
│   ├── task_dependencies.ts # task dependency edges
│   └── index.ts        # barrel re-export
├── relations.ts        # drizzle relational mappings
└── test/
    └── helpers/
        ├── db.ts           # test db setup
        └── migrations.ts   # migration runner for tests

Database Connection

The hub reads database configuration from the encrypted config file (see hub-config.md). Connection parameters are NOT read from environment variables (see ADR-008, revised).

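import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
import * as schema from "./schema.ts";
// Assumption: the PostgresConfig type lives with the other config types.
import type { PostgresConfig } from "../config/types.ts";

// HubConfig.postgres is decrypted at startup by loadConfig()
export function createPool(pgConfig: PostgresConfig): Pool {
  return new Pool({
    host: pgConfig.host,       // default: 127.0.0.1 (localhost)
    port: pgConfig.port,       // default: 5432
    database: pgConfig.database, // default: alkdev
    user: pgConfig.user,
    password: pgConfig.password,
    ssl: pgConfig.ssl,
    max: pgConfig.maxConnections,
  });
}

// The client is built from an explicit pool, because the pool cannot exist
// until the config has been decrypted.
export function createDb(pool: Pool) {
  return drizzle(pool, { schema });
}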

See infrastructure.md for network topology and connection details.
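
The defaults noted in the pool comments (127.0.0.1, 5432, alkdev) could be applied in one place before the pool is created. The sketch below is illustrative only: PostgresConfigInput and withPostgresDefaults are hypothetical names, and the real PostgresConfig type is not shown in this document.

```typescript
// Hypothetical input shape. Optional fields fall back to the documented
// defaults; `ssl` as a boolean is an assumption.
interface PostgresConfigInput {
  host?: string;
  port?: number;
  database?: string;
  user: string;
  password: string;
  ssl?: boolean;
  maxConnections?: number;
}

// Fills in host, port, and database when the decrypted config omits them.
// Credentials have no defaults and must always be supplied.
function withPostgresDefaults(input: PostgresConfigInput) {
  return {
    ...input,
    host: input.host ?? "127.0.0.1",
    port: input.port ?? 5432,
    database: input.database ?? "alkdev",
  };
}
```

Resolving defaults before constructing the pool keeps the pool factory free of fallback logic, so a missing user or password fails loudly instead of being masked.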

Migration Strategy

// drizzle.config.ts
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  out: "./migrations",
  schema: "./schema.ts",
  dialect: "postgresql",
  dbCredentials: {
    // Read from a local dev config file (gitignored).
    // Generate via: alkhub-config decrypt --field postgres --config config.json
    // Then assemble the URL from the decrypted fields.
    // Do NOT use Deno.env.get() for database credentials.
    // See hub-config.md §D7 for rationale.
    url: loadDevDbUrl(),
  },
});

Where loadDevDbUrl() reads from a developer-local config file (e.g., .alkhub/dev-db.json, gitignored):

import { readFileSync } from "node:fs";

function loadDevDbUrl(): string {
  try {
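    const devConfig = JSON.parse(readFileSync(".alkhub/dev-db.json", "utf-8"));
    // Percent-encode credentials so characters such as "@" or ":" do not break the URL
    const user = encodeURIComponent(devConfig.user);
    const password = encodeURIComponent(devConfig.password);
    return `postgresql://${user}:${password}@${devConfig.host}:${devConfig.port}/${devConfig.database}`;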
  } catch {
    // Fallback for fresh dev setup — no secrets in env vars
    return "postgresql://hub:***@localhost:5432/alkdev_dev";
  }
}

Run drizzle-kit generate to create migrations and drizzle-kit migrate to apply them. At hub startup, migrations are applied programmatically (see hub-startup.md Step 5).

Important: The hub's drizzle.config.ts does NOT use Deno.env.get() for database credentials. Instead, it reads from a local development config file (gitignored) or from a decrypted field produced by alkhub-config decrypt. See hub-config.md §D7 for the decision and the approved env vars list.

Test Setup

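import { drizzle } from "drizzle-orm/node-postgres";
import { migrate } from "drizzle-orm/node-postgres/migrator";
import { Pool } from "pg";
import * as schema from "../../schema.ts";

// Assumed shape, mirroring the fields used below
export interface TestDbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

export async function setupTestDb(testConfig: TestDbConfig) {
  const pool = new Pool({
    host: testConfig.host,
    database: testConfig.database,
    port: testConfig.port,
    user: testConfig.user,
    password: testConfig.password,
  });
  const db = drizzle(pool, { schema });
  // Apply migrations. The folder matches `out` in drizzle.config.ts and is
  // resolved relative to the working directory (an assumption for this sketch).
  await migrate(db, { migrationsFolder: "./migrations" });
  return { pool, db };
}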

Test database configuration is read from a test config file or test-specific Docker secrets, following the same pattern as production config (no env vars for credentials). The ALKHUB_TEST_CONFIG_PATH env var (non-sensitive) may point to the test config file location.
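
Loading the test config from a file, with ALKHUB_TEST_CONFIG_PATH carrying only the path, might look like the sketch below. It assumes a JSON file with the same fields that setupTestDb consumes; loadTestDbConfig, the fallbackPath parameter, and the injectable readText reader are hypothetical.

```typescript
import { readFileSync } from "node:fs";

// Assumed to match the fields consumed by setupTestDb.
interface TestDbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

type Env = Record<string, string | undefined>;

// Resolves the config path from the (non-sensitive) env var, falling back to
// a caller-supplied path, then parses and checks the required fields.
// Credentials come from the file, never from the environment.
function loadTestDbConfig(
  env: Env,
  fallbackPath: string,
  readText: (path: string) => string = (path) => readFileSync(path, "utf-8"),
): TestDbConfig {
  const path = env["ALKHUB_TEST_CONFIG_PATH"] ?? fallbackPath;
  const parsed = JSON.parse(readText(path)) as Partial<TestDbConfig>;
  for (const field of ["host", "port", "database", "user", "password"] as const) {
    if (parsed[field] === undefined) {
      throw new Error(`test config ${path} is missing "${field}"`);
    }
  }
  return parsed as TestDbConfig;
}
```

A test helper could then call setupTestDb(loadTestDbConfig(process.env, somePath)), where somePath is whatever default location the project settles on.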

Resolved Decisions

  1. Operation spec cleanup: Resolved (D3). Operation definitions (operations table) persist independently of spoke connections. Operation registrations (operation_registrations table) are set to status: 'inactive' on disconnect and may be cascade-deleted if a spoke row is administratively removed. See D3 in storage-spec-phase1-resolutions.md.

  2. Workspaces vs. directories: Resolved. projects.directory is the convenience shortcut for the default workspace; workspaces.directory is per-workspace. Both are needed.

  3. accounts.role → accounts.accessLevel: Resolved by ADR-012. accounts.role was renamed to accounts.accessLevel (values: admin/user/service), and organization_members.role was renamed to organization_members.membershipLevel (values: owner/admin/member). This disambiguates access levels from behavioral roles.
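
The lifecycle from decision 1 can be modelled in memory to make the two paths concrete. This is a sketch only: the field names other than status, the "active" status value, and both function names are assumptions, and the real behavior is implemented with database updates and cascade rules rather than array operations.

```typescript
// Hypothetical in-memory shape of an operation_registrations row.
type RegistrationStatus = "active" | "inactive";

interface OperationRegistration {
  id: string;
  operationId: string;
  spokeId: string;
  status: RegistrationStatus;
}

// Disconnect: registrations for the spoke are kept and flipped to inactive.
// Operation definitions are untouched because they live in a separate table.
function markSpokeDisconnected(
  regs: readonly OperationRegistration[],
  spokeId: string,
): OperationRegistration[] {
  return regs.map((r): OperationRegistration =>
    r.spokeId === spokeId ? { ...r, status: "inactive" } : r
  );
}

// Administrative removal of the spoke row: its registrations are deleted,
// mirroring an ON DELETE CASCADE foreign key.
function removeSpoke(
  regs: readonly OperationRegistration[],
  spokeId: string,
): OperationRegistration[] {
  return regs.filter((r) => r.spokeId !== spokeId);
}
```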

Open Questions

  1. Message versioning: Opencode has a version column on sessions for schema migration. Should we version the data column format on messages and parts for forward compatibility? The commonCols.metadata column could hold a _version field.

  2. Session message compaction: Opencode has a compaction part type for context window management. The hub's storage should support this, but the compaction logic itself belongs in the session management layer, not in storage. Need to define what compaction means for hub-direct AI SDK sessions.

  3. Call graph retention policy: Call graph data can grow fast. Need a retention policy — probably TTL-based cleanup of completed/failed calls older than N days, with aggregation for observability dashboards. See the payload truncation note in call-graph.md.

  4. Keypal adapter testing: The HubKeyStorage adapter should have comprehensive tests. keypal's own test suite covers the core logic; our adapter tests cover the Drizzle integration.

  5. Cross-doc terminology migration: The "spoke" naming ADR establishes the canonical terminology. Other architecture docs still contain "runner" / "runnerId" references. These should be updated in a separate pass.

  6. Anthropic conversation import: Anthropic's web interface exports use a flat message model. A future import script should map these to our messages + parts tables. The Anthropic project model maps to our projects + sessions structure. Deferred — the export format is documented and available when needed.

  7. Gitea operations at startup: The Gitea swagger spec is at https://git.alk.dev/swagger.v1.json (Swagger 2.0, 299 endpoints). Our from_openapi.ts supports this format. At hub startup, load the Gitea client config + secret from the DB, import the spec, and register ~300 Gitea operations.

  8. Client config schema evolution: When a client type's TypeBox schema changes (e.g., adding a new field), existing DB rows with the old config shape may fail validation. Strategy: schemas should use Type.Optional() for new fields, and the resolution code should handle missing fields gracefully. If a breaking change is needed, bump a schema version in the metadata column. See ADR-007 for the validation pattern. Full contract pending specify-client-config-validation task.

  9. Task storage and sync: The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation (files → DB). When offline analysis is needed, tasks can be exported from DB back to files. See tasks.md and ADR-011.

  10. Task embeddings (deferred): Task descriptions could benefit from vector embeddings for similarity search ("find tasks like this one"). Deferred from initial implementation. The metadata JSONB column can hold an embedding reference later, or a separate task_embeddings table can be added when needed.

  11. Role definitions in database: Role definitions (currently in .opencode/agents/*.md) should eventually become database records. A roles table would store role name, description, mode, permissions, tools, temperature, and model parameters. The transition follows the same pattern as taskgraph (file-based authoring, database as source of truth). See agent-roles.md for the full role model.
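
The strategy suggested in question 8 (optional new fields, graceful handling of missing values, and a version marker in metadata for breaking changes) could look roughly like this. Everything here is hypothetical: the HttpClientConfig shape, the timeoutMs field and its default, the "_version" key, and treating a missing version as 1.

```typescript
// Hypothetical client config where `timeoutMs` was added after some rows
// had already been written, so it must stay optional in the stored shape.
interface HttpClientConfig {
  baseUrl: string;
  timeoutMs?: number;
}

interface ClientRow {
  config: HttpClientConfig;
  metadata: Record<string, unknown>;
}

// Highest config schema version this code understands (assumed value).
const CURRENT_CONFIG_VERSION = 1;
const DEFAULT_TIMEOUT_MS = 30_000;

// Resolves a stored row into a fully populated config. Old rows without the
// new field get a default; rows written by a newer schema are rejected.
function resolveClientConfig(row: ClientRow): Required<HttpClientConfig> {
  const raw = row.metadata["_version"];
  const version = typeof raw === "number" ? raw : 1;
  if (version > CURRENT_CONFIG_VERSION) {
    throw new Error(`unsupported client config version ${version}`);
  }
  return {
    baseUrl: row.config.baseUrl,
    timeoutMs: row.config.timeoutMs ?? DEFAULT_TIMEOUT_MS,
  };
}
```

Applying defaults at resolution time means old rows never need rewriting for additive changes; only a breaking change requires bumping the version and migrating data.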

References

  • Crypto utility (AES-256-GCM + PBKDF2): src/crypto.ts
  • Opencode message/part schema: opencode's session schema and message-v2 schema (npm package)
  • Opencode SQLite schema: Verified against a local opencode database
  • Keypal source and Drizzle adapter: keypal (npm package)
  • AI SDK UIMessage format: AI SDK (npm package)
  • MCP client config: src/config/types.ts (MCPServerConfig TypeBox schema)
  • MCP client loader: @alkdev/operations/from-mcp (MCPClientLoader, createMCPClient, closeMCPClient)
  • OpenAPI import: @alkdev/operations/from-openapi (HTTPServiceConfig, FromOpenAPI, supports Swagger 2.0 + OpenAPI 3.x)
  • Gitea API spec: https://git.alk.dev/swagger.v1.json (Swagger 2.0, 299 endpoints)
  • Anthropic exports: Anthropic export data (conversation format, docs.json)
  • Agent sessions architecture: docs/architecture/agent-sessions.md
  • Call protocol: docs/architecture/call-graph.md
  • Coordination: docs/architecture/coordination.md
  • Spoke design: docs/architecture/spoke-runner.md
  • Task storage: tasks.md — task tables, taskgraph integration, dual representation
  • taskgraph CLI: @alkdev/taskgraph npm package — Rust CLI for task dependency management