glm-5.1 93e2286343 Align storage & architecture specs with published npm libraries
Systematically compared @alkdev/taskgraph, @alkdev/operations, and
@alkdev/flowgraph against storage/arch specs and fixed all mismatches.

Key changes:

Tasks (storage/tasks.md + ADR-011):
- Rename TaskFrontmatter → TaskInput to match library export
- Fix dependsOn (was depends_on) in field mappings — library uses
  camelCase; parseFrontmatter normalizes YAML snake_case on input
- Document DependencyEdge shape {from, to, qualityRetention?} and
  DB↔library field mapping
- Document graph node vs DB column distinction (TaskGraphNodeAttrs
  is a subset of TaskInput)
- Fix default risk fallback from low → medium (matches resolveDefaults)
- Fix cross-project guard column references (dependentTaskId, not taskId)
- Clarify @alkdev/taskgraph TS is source of truth; frontmatter is for
  LLM output parsing and legacy imports, not Rust CLI
- Add complete library exports reference

Operations (storage/spokes.md + operations.md):
- Add version, title, _meta columns to operations table (required by
  OperationSpec, were missing)
- Fix type casing: query/mutation/subscription (lowercase, matching
  OperationType runtime values)
- Make outputSchema and accessControl NOT NULL (matching library)
- Document ErrorDefinition shape {code, description, schema, httpStatus?}
- Document _meta vs commonCols.metadata distinction
- Add registerAll, get, getHandler, getByName, list, subscribe methods
- Fix buildCallHandler signature ({ registry, callMap })
- Fix OperationType values (lowercase)

Call graph (storage/call-graph.md + call-graph.md):
- Change operationId to NOT NULL with RESTRICT FK (was nullable/SET NULL)
  — matches flowgraph's required CallNodeAttrs.operationId
- Document sentinel __removed__ operation strategy for deletions
- Document ISO 8601 string ↔ timestamptz conversion requirement
- Rewrite CallEventMap to match actual library: flat dot-notation keys,
  timestamp on all events, nested error structure, optional output on
  completed event
- Remove call.running event (doesn't exist in library) — hub calls
  updateStatus(running) directly on dispatch
- Fix buildCallHandler({ registry, callMap }) signature
- Fix PendingRequestMap constructor (positional EventTarget)
- Add updateCall/removeCall/graph methods to API summary
- Document abort cascade as hub logic, not flowgraph logic
- Add open questions for operation deletion and reactive vs call graph
  semantics

Table reference (storage/table-reference.md):
- Update call_graph_nodes.operationId cascade to RESTRICT
- Update operations.type comment to lowercase
- Update status enum reference
2026-05-25 11:46:42 +00:00

status: draft
last_updated: 2026-04-19

Storage: Drizzle + TypeBox + Postgres

Overview

The storage layer uses Drizzle ORM for database operations, PostgreSQL as the persistence layer, and @alkdev/drizzlebox for automatic TypeBox schema generation from Drizzle table definitions. Drizzle table definitions are the single source of truth — createSelectSchema / createInsertSchema generate TypeBox schemas automatically.

Location: src/storage/

For table schemas, see table-reference.md (index, common columns, cascade behavior) and the per-domain schema files (identity.md, projects.md, sessions.md, etc.). For design decisions, see ../../decisions/.

Pattern: Drizzle-TypeBox

Each table file follows this pattern:

import { pgTable, text, jsonb } from "drizzle-orm/pg-core";
import { createInsertSchema, createSelectSchema } from "@alkdev/drizzlebox";
import { Type, type Static } from "@alkdev/typebox";
import { commonCols } from "./common.ts";
import { projects } from "./projects.ts";

// Placeholder schema for the data column; the real SessionDataSchema
// (model, tokens, cost, ...) is specified in sessions.md.
export const SessionDataSchema = Type.Record(Type.String(), Type.Unknown());
export type SessionData = Static<typeof SessionDataSchema>;

// 1. Table definition with Drizzle (source of truth)
export const sessions = pgTable("sessions", {
  ...commonCols,
  projectId: text("project_id")
    .notNull()
    .references(() => projects.id, { onDelete: "cascade" }),
  title: text("title"),
  status: text("status", { enum: ["idle", "busy", "retry", "archived"] })
    .default("idle")
    .notNull(),
  data: jsonb("data").$type<SessionData>().default({}),
});

// 2. Select TypeBox schema (for API responses)
export const SelectSession = createSelectSchema(sessions, {
  metadata: Type.Object({}, { additionalProperties: true }),
  data: SessionDataSchema, // override JSON columns
});
export type SelectSession = Static<typeof SelectSession>;

// 3. Insert TypeBox schema (for API validation)
export const InsertSession = createInsertSchema(sessions, {
  title: Type.Optional(Type.String({ minLength: 1, maxLength: 500 })),
  status: Type.Optional(
    Type.Union([
      Type.Literal("idle"),
      Type.Literal("busy"),
      Type.Literal("retry"),
      Type.Literal("archived"),
    ]),
  ),
});
export type InsertSession = Static<typeof InsertSession>;
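The pattern above spells the session status values twice, once in the Drizzle enum and once in the TypeBox union, so the two can drift apart. One option, offered as a sketch rather than an established project convention, is to declare the values once as a readonly tuple and derive both the TypeScript type and a runtime guard from it. The same tuple can then be passed to Drizzle's `text("status", { enum: ... })` option and used to build the TypeBox union. `SESSION_STATUSES`, `SessionStatus`, and `isSessionStatus` are hypothetical names.

```typescript
// Hypothetical single source for the session status values.
export const SESSION_STATUSES = ["idle", "busy", "retry", "archived"] as const;

// Union type derived from the tuple: "idle" | "busy" | "retry" | "archived".
export type SessionStatus = (typeof SESSION_STATUSES)[number];

// Runtime guard, useful for values read back through raw SQL or JSON payloads
// that bypass the TypeBox insert schema.
export function isSessionStatus(value: unknown): value is SessionStatus {
  return typeof value === "string" &&
    (SESSION_STATUSES as readonly string[]).includes(value);
}
```

Adding a status then means editing one tuple instead of two lists.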

Common Columns

All tables share these columns:

import { text, timestamp, jsonb } from "drizzle-orm/pg-core";
import { sql } from "drizzle-orm";

export const commonCols = {
  id: text("id")
    .primaryKey()
    .$defaultFn(() => crypto.randomUUID()),
  metadata: jsonb("metadata").$type<Record<string, unknown>>().default({}),
  createdAt: timestamp("created_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull(),
  updatedAt: timestamp("updated_at", { withTimezone: true })
    .default(sql`now()`)
    .notNull()
    .$onUpdate(() => new Date()),
};

// Note: commonCols.id uses crypto.randomUUID() which generates UUIDv4 (random, non-sortable).
// For tables requiring chronological ordering by ID (e.g. parts, messages), use sortable IDs:
//   - UUIDv7 (time-sortable) via a library such as uuidv7, or ULIDs via @std/ulid
//   - Or add an explicit sequence/position column
// The parts table uses an explicit position-based ID scheme inherited from opencode's sortable
// timestamp-based IDs. See the parts table section in sessions.md for details.
//
// Note: updatedAt uses Drizzle's $onUpdate (application-level). Direct SQL updates bypass this
// and must manually SET updated_at = now(). For critical tables, consider adding a Postgres
// trigger as a safety net.
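For tables that need IDs ordered by creation time, a UUIDv7 generator can stand in for `crypto.randomUUID()` in `$defaultFn`. The sketch below follows the RFC 9562 bit layout (48-bit millisecond timestamp, version nibble 7, variant bits `10`, random remainder). It is an illustration, not a substitute for a maintained library: it does not order IDs minted within the same millisecond, which RFC 9562 leaves to optional counter schemes. `uuidv7` is a hypothetical helper name, and like `commonCols` it assumes a runtime with a global `crypto` (Deno, or Node 19+).

```typescript
// Minimal UUIDv7 sketch (RFC 9562 layout). Not monotonic within one millisecond.
export function uuidv7(now: number = Date.now()): string {
  const bytes = new Uint8Array(16);
  crypto.getRandomValues(bytes);

  // Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds.
  // Division instead of bit shifts, because JS bitwise ops truncate to 32 bits.
  let ts = now;
  for (let i = 5; i >= 0; i--) {
    bytes[i] = ts % 256;
    ts = Math.floor(ts / 256);
  }

  // High nibble of byte 6: version 7.
  bytes[6] = (bytes[6] & 0x0f) | 0x70;
  // Top two bits of byte 8: variant 0b10.
  bytes[8] = (bytes[8] & 0x3f) | 0x80;

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

A table would opt in with `id: text("id").primaryKey().$defaultFn(() => uuidv7())`. Because the timestamp occupies the leading hex digits, IDs from different milliseconds compare lexicographically in creation order.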

JSONB Column Boundaries

All tables have commonCols.metadata (JSONB, default {}), and some tables have an additional domain-specific data or config column. The boundary between these columns matters for implementers:

  • metadata (commonCols): Opaque key-value pairs for subsystem use, with a namespacing convention (_subsystem.key). Examples: _keypal.scopes, _retention.expiresAt, _version. If a subsystem needs to store data on a row, it uses metadata with its prefixed namespace. The metadata column is never queried in WHERE clauses or JOINs.
  • data (domain-specific): Structured domain-specific data with known TypeScript types. Examples: session execution metadata (model, tokens, cost), message role-specific metadata, account preferences. Fields in data have defined shapes and may be validated against TypeBox schemas.
  • config (clients): Validated connection configuration. Validated against the TypeBox schema for the client type on write. Secrets are NEVER in config — they go in client_secrets.
  • identity / details (call graph, audit): Immutable context set at creation time. These record who/what/why and are never updated after creation.

Rule of thumb: If a field appears in WHERE clauses, JOIN conditions, or needs a constraint, it should be a proper column — not buried in JSONB.
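The metadata namespacing convention can be enforced with a few small helpers. The sketch below assumes namespaced entries are stored as flat top-level keys such as `_keypal.scopes` (the examples above read that way, but a nested `{ _keypal: { scopes } }` layout would also fit the convention). `metaKey`, `setMeta`, and `metaFor` are hypothetical names.

```typescript
type Metadata = Record<string, unknown>;

// Builds a namespaced key such as "_keypal.scopes"; rejects malformed subsystem names.
export function metaKey(subsystem: string, key: string): string {
  if (!/^[a-z][a-z0-9_]*$/i.test(subsystem)) {
    throw new Error(`invalid subsystem name: ${subsystem}`);
  }
  if (key.length === 0) throw new Error("empty metadata key");
  return `_${subsystem}.${key}`;
}

// Returns a new metadata object with the value set; the input is not mutated.
export function setMeta(meta: Metadata, subsystem: string, key: string, value: unknown): Metadata {
  return { ...meta, [metaKey(subsystem, key)]: value };
}

// Collects every entry owned by one subsystem, with the prefix stripped.
export function metaFor(meta: Metadata, subsystem: string): Metadata {
  const prefix = `_${subsystem}.`;
  const out: Metadata = {};
  for (const [k, v] of Object.entries(meta)) {
    if (k.startsWith(prefix)) out[k.slice(prefix.length)] = v;
  }
  return out;
}
```

This is a read-modify-write pattern. If two subsystems can update the same row concurrently, doing the merge in SQL with Postgres's jsonb `||` operator avoids losing one writer's keys.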

Package Structure

src/storage/
├── mod.ts              # exports schema namespace + db client
├── client.ts           # drizzle + postgres connection
├── schema.ts           # barrel re-export of tables + relations
├── drizzle.config.ts   # drizzle-kit migration config
├── tables/
│   ├── common.ts       # shared columns (id, metadata, timestamps)
│   ├── accounts.ts     # hub-local identity records
│   ├── roles.ts       # behavioral role definitions (planned — see agent-roles.md)
│   ├── organizations.ts # top-level groupings
│   ├── organization_members.ts # account ↔ org membership
│   ├── projects.ts     # projects (git repositories / work contexts)
│   ├── workspaces.ts   # project workspaces (branches, directories)
│   ├── sessions.ts     # agent conversation sessions
│   ├── messages.ts     # session messages (metadata in data column)
│   ├── parts.ts        # message parts (discriminated by type, content in data)
│   ├── spokes.ts       # spoke registrations
│   ├── operations.ts         # operation definitions (what an operation IS)
│   ├── operation_registrations.ts # provider registrations (who provides it now)
│   ├── api_keys.ts     # API keys (keypal-managed, inbound auth)
│   ├── audit_logs.ts   # keypal + hub audit trail
│   ├── clients.ts      # external service registrations (outbound connections)
│   ├── client_secrets.ts # encrypted credentials for clients
│   ├── mappings.ts     # worktree/spoke/coordinator mappings
│   ├── detections.ts   # anomaly detection records
│   ├── call_graph_nodes.ts # call graph nodes
│   ├── call_graph_edges.ts # call graph edges
│   ├── tasks.ts         # SDD task definitions
│   ├── task_dependencies.ts # task dependency edges
│   └── index.ts        # barrel re-export
├── relations.ts        # drizzle relational mappings
└── test/
    └── helpers/
        ├── db.ts           # test db setup
        └── migrations.ts   # migration runner for tests

Database Connection

The hub reads database configuration from the encrypted config file (see hub-config.md). Connection parameters are NOT read from environment variables (see ADR-008, revised).

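import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
import * as schema from "./schema.ts";
// PostgresConfig is the decrypted HubConfig.postgres section (see hub-config.md);
// this import path is assumed.
import type { PostgresConfig } from "../config/types.ts";

// HubConfig.postgres is decrypted at startup by loadConfig() and passed in here;
// nothing in this module reads environment variables.
function createPool(pgConfig: PostgresConfig): Pool {
  return new Pool({
    host: pgConfig.host,       // default: 127.0.0.1 (localhost)
    port: pgConfig.port,       // default: 5432
    database: pgConfig.database, // default: alkdev
    user: pgConfig.user,
    password: pgConfig.password,
    ssl: pgConfig.ssl,
    max: pgConfig.maxConnections,
  });
}

// Create the pool and the Drizzle client together, so the client is never
// built before its pool exists.
export function createDb(pgConfig: PostgresConfig) {
  const pool = createPool(pgConfig);
  const db = drizzle(pool, { schema });
  return { pool, db };
}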

See infrastructure.md for network topology and connection details.
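The defaults noted in the `createPool` comments (host `127.0.0.1`, port `5432`, database `alkdev`) have to be applied somewhere before the pool is built. Assuming they are filled in by code rather than written into the encrypted config, a small resolver could look like the sketch below. `PostgresConfigInput`, `withPgDefaults`, and the port range check are hypothetical; `loadConfig()` may already handle this.

```typescript
// Hypothetical shape of HubConfig.postgres after decryption.
interface PostgresConfigInput {
  host?: string;
  port?: number;
  database?: string;
  user: string;
  password: string;
  ssl?: boolean;
  maxConnections?: number;
}

type ResolvedPostgresConfig = PostgresConfigInput & {
  host: string;
  port: number;
  database: string;
};

// Defaults taken from the comments in createPool.
const PG_DEFAULTS = { host: "127.0.0.1", port: 5432, database: "alkdev" } as const;

export function withPgDefaults(input: PostgresConfigInput): ResolvedPostgresConfig {
  const port = input.port ?? PG_DEFAULTS.port;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid postgres port: ${port}`);
  }
  return {
    ...input,
    host: input.host ?? PG_DEFAULTS.host,
    port,
    database: input.database ?? PG_DEFAULTS.database,
  };
}
```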

Migration Strategy

// drizzle.config.ts
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  out: "./migrations",
  schema: "./schema.ts",
  dialect: "postgresql",
  dbCredentials: {
    // Read from a local dev config file (gitignored).
    // Generate via: alkhub-config decrypt --field postgres --config config.json
    // Then assemble the URL from the decrypted fields.
    // Do NOT use Deno.env.get() for database credentials.
    // See hub-config.md §D7 for rationale.
    url: loadDevDbUrl(),
  },
});

Where loadDevDbUrl() reads from a developer-local config file (e.g., .alkhub/dev-db.json, gitignored):

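import { readFileSync } from "node:fs";

function loadDevDbUrl(): string {
  try {
    const devConfig = JSON.parse(readFileSync(".alkhub/dev-db.json", "utf-8"));
    // Percent-encode credentials so characters such as "@", ":" or "/" in a
    // password do not corrupt the connection URL.
    const user = encodeURIComponent(devConfig.user);
    const password = encodeURIComponent(devConfig.password);
    return `postgresql://${user}:${password}@${devConfig.host}:${devConfig.port}/${devConfig.database}`;
  } catch {
    // Fallback for a fresh dev setup (no secrets in env vars)
    return "postgresql://hub:***@localhost:5432/alkdev_dev";
  }
}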

Run: drizzle-kit generate to create migrations, drizzle-kit migrate to apply. At hub startup, migrations are applied programmatically (see hub-startup.md Step 5).

Important: The hub's drizzle.config.ts does NOT use Deno.env.get() for database credentials. Instead, it reads from a local development config file (gitignored) or from a decrypted field produced by alkhub-config decrypt. See hub-config.md §D7 for the decision and the approved env vars list.
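The Common Columns section notes that `updated_at` relies on Drizzle's application-level `$onUpdate` and suggests a Postgres trigger as a safety net for direct SQL updates. Since drizzle-kit migrations are plain SQL files, such a trigger could ship in a hand-written migration. The sketch below renders the DDL for one table; the helper name, the shared `set_updated_at()` function, and the trigger naming scheme are assumptions. `EXECUTE FUNCTION` requires PostgreSQL 11 or later.

```typescript
// Only plain lower-case identifiers, so the table name can be interpolated unquoted.
const SAFE_IDENT = /^[a-z_][a-z0-9_]*$/;

// Hypothetical helper: renders an idempotent updated_at trigger for one table.
export function updatedAtTriggerSql(table: string): string {
  if (!SAFE_IDENT.test(table)) throw new Error(`unsafe table name: ${table}`);
  return [
    // Shared trigger function; CREATE OR REPLACE keeps it safe to re-run.
    `CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$`,
    `BEGIN`,
    `  NEW.updated_at = now();`,
    `  RETURN NEW;`,
    `END;`,
    `$$ LANGUAGE plpgsql;`,
    ``,
    // Drop first so the migration can be re-applied without an "already exists" error.
    `DROP TRIGGER IF EXISTS ${table}_set_updated_at ON ${table};`,
    `CREATE TRIGGER ${table}_set_updated_at`,
    `  BEFORE UPDATE ON ${table}`,
    `  FOR EACH ROW EXECUTE FUNCTION set_updated_at();`,
  ].join("\n");
}
```

With the trigger in place, Drizzle's `$onUpdate` and the trigger both set the column, and updates that bypass Drizzle are still covered.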

Test Setup

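import { drizzle } from "drizzle-orm/node-postgres";
import { migrate } from "drizzle-orm/node-postgres/migrator";
import { Pool } from "pg";
import * as schema from "../../schema.ts";

export interface TestDbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

export async function setupTestDb(
  testConfig: TestDbConfig,
  // Resolved relative to the process working directory; should point at the
  // drizzle.config.ts `out` folder (./migrations under src/storage/).
  migrationsFolder = "src/storage/migrations",
) {
  const pool = new Pool({
    host: testConfig.host,
    database: testConfig.database,
    port: testConfig.port,
    user: testConfig.user,
    password: testConfig.password,
  });
  const db = drizzle(pool, { schema });
  // Apply the same migrations the hub applies at startup.
  await migrate(db, { migrationsFolder });
  return { pool, db };
}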

Test database configuration is read from a test config file or test-specific Docker secrets, following the same pattern as production config (no env vars for credentials). The ALKHUB_TEST_CONFIG_PATH env var (non-sensitive) may point to the test config file location.
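A sketch of the lookup described above: honour `ALKHUB_TEST_CONFIG_PATH` when set, fall back to a default location, and validate the JSON before it reaches `setupTestDb`. The default path `.alkhub/test-db.json` and the helper names are hypothetical. The environment is passed in explicitly so callers can supply `process.env` under Node or `Deno.env.toObject()` under Deno.

```typescript
import { readFileSync } from "node:fs";

export interface TestDbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// Hypothetical default used when ALKHUB_TEST_CONFIG_PATH is unset.
const DEFAULT_TEST_CONFIG_PATH = ".alkhub/test-db.json";

export function resolveTestConfigPath(env: Record<string, string | undefined>): string {
  return env.ALKHUB_TEST_CONFIG_PATH ?? DEFAULT_TEST_CONFIG_PATH;
}

// Parses and checks the test config; fails fast with a field-specific message.
export function parseTestDbConfig(raw: string): TestDbConfig {
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  for (const key of ["host", "database", "user", "password"] as const) {
    if (typeof parsed[key] !== "string") {
      throw new Error(`test config: "${key}" must be a string`);
    }
  }
  if (typeof parsed.port !== "number") throw new Error(`test config: "port" must be a number`);
  return parsed as unknown as TestDbConfig;
}

export function loadTestDbConfig(env: Record<string, string | undefined>): TestDbConfig {
  return parseTestDbConfig(readFileSync(resolveTestConfigPath(env), "utf-8"));
}
```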

Resolved Decisions

  1. Operation spec cleanup: Resolved (D3). Operation definitions (operations table) persist independently of spoke connections. Operation registrations (operation_registrations table) are set to status: 'inactive' on disconnect and may be cascade-deleted if a spoke row is administratively removed. See D3 in storage-spec-phase1-resolutions.md.

  2. Workspaces vs. directories: Resolved. projects.directory is the convenience shortcut for the default workspace; workspaces.directory is per-workspace. Both are needed.

  3. accounts.role → accounts.accessLevel: Resolved by ADR-012. accounts.role renamed to accounts.accessLevel (values: admin/user/service). organization_members.role renamed to organization_members.membershipLevel (values: owner/admin/member). This disambiguates access levels from behavioral roles.
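Decision 1 can be read as a state transition on registration rows only. The sketch below models it as a pure function over hypothetical row shapes (the `active` status value and the field names are assumptions; see spokes.md and table-reference.md for the actual columns). In the database the same step would be a single UPDATE filtered on the disconnecting spoke, leaving the operations table untouched.

```typescript
// Hypothetical subset of an operation_registrations row.
interface OperationRegistration {
  id: string;
  operationId: string;
  spokeId: string;
  status: "active" | "inactive";
}

// On spoke disconnect, only that spoke's active registrations become "inactive".
// Operation definitions are not touched, so they persist across reconnects.
export function deactivateSpokeRegistrations(
  regs: readonly OperationRegistration[],
  spokeId: string,
): OperationRegistration[] {
  return regs.map((r): OperationRegistration =>
    r.spokeId === spokeId && r.status === "active" ? { ...r, status: "inactive" } : r
  );
}
```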

Open Questions

  1. Message versioning: Opencode has a version column on sessions for schema migration. Should we version the data column format on messages and parts for forward compatibility? The commonCols.metadata column could hold a _version field.

  2. Session message compaction: Opencode has a compaction part type for context window management. The hub's storage should support this, but the compaction logic itself belongs in the session management layer, not in storage. Need to define what compaction means for hub-direct AI SDK sessions.

  3. Call graph retention policy: Call graph data can grow fast. Need a retention policy — probably TTL-based cleanup of completed/failed calls older than N days, with aggregation for observability dashboards. See the payload truncation note in call-graph.md.

  4. Keypal adapter testing: The HubKeyStorage adapter should have comprehensive tests. keypal's own test suite covers the core logic; our adapter tests cover the Drizzle integration.

  5. Cross-doc terminology migration: The "spoke" naming ADR establishes the canonical terminology. Other architecture docs still contain "runner" / "runnerId" references. These should be updated in a separate pass.

  6. Anthropic conversation import: Anthropic's web interface exports use a flat message model. A future import script should map these to our messages + parts tables. The Anthropic project model maps to our projects + sessions structure. Deferred — the export format is documented and available when needed.

  7. Gitea operations at startup: The Gitea swagger spec is at https://git.alk.dev/swagger.v1.json (Swagger 2.0, 299 endpoints). Our from_openapi.ts supports this format. At hub startup, load the Gitea client config + secret from the DB, import the spec, and register ~300 Gitea operations.

  8. Client config schema evolution: When a client type's TypeBox schema changes (e.g., adding a new field), existing DB rows with the old config shape may fail validation. Strategy: schemas should use Type.Optional() for new fields, and the resolution code should handle missing fields gracefully. If a breaking change is needed, bump a schema version in the metadata column. See ADR-007 for the validation pattern. Full contract pending specify-client-config-validation task.

  9. Task storage and sync: The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation (files → DB). When offline analysis is needed, tasks can be exported from DB back to files. See tasks.md and ADR-011.

  10. Task embeddings (deferred): Task descriptions could benefit from vector embeddings for similarity search ("find tasks like this one"). Deferred from initial implementation. The metadata JSONB column can hold an embedding reference later, or a separate task_embeddings table can be added when needed.

  11. Role definitions in database: Role definitions (currently in .opencode/agents/*.md) should eventually become database records. A roles table would store role name, description, mode, permissions, tools, temperature, and model parameters. The transition follows the same pattern as taskgraph (file-based authoring, database as source of truth). See agent-roles.md for the full role model.
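For open question 3, a TTL-based policy reduces to a predicate over call graph nodes. The sketch assumes `updatedAt` approximates the completion time and treats only `completed` and `failed` as terminal; the row shape, helper names, and TTL value are hypothetical. In SQL this maps to a DELETE filtered on `status IN ('completed', 'failed')` and `updated_at` below the cutoff, with any dashboard aggregation running before the delete.

```typescript
// Hypothetical subset of a call graph node relevant to retention.
interface CallNodeRow {
  id: string;
  status: string; // e.g. "running", "completed", "failed"
  updatedAt: Date;
}

const TERMINAL_STATUSES = new Set(["completed", "failed"]);
const DAY_MS = 24 * 60 * 60 * 1000;

export function retentionCutoff(now: Date, ttlDays: number): Date {
  if (!Number.isFinite(ttlDays) || ttlDays < 0) {
    throw new Error("ttlDays must be a non-negative number");
  }
  return new Date(now.getTime() - ttlDays * DAY_MS);
}

// Eligible for cleanup: finished, and unchanged since before the cutoff.
export function isExpired(node: CallNodeRow, now: Date, ttlDays: number): boolean {
  return TERMINAL_STATUSES.has(node.status) &&
    node.updatedAt.getTime() < retentionCutoff(now, ttlDays).getTime();
}
```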

References

  • Crypto utility (AES-256-GCM + PBKDF2): src/crypto.ts
  • Opencode message/part schema: opencode's session schema and message-v2 schema (npm package)
  • Opencode SQLite schema: Verified against a local opencode database
  • Keypal source and Drizzle adapter: keypal (npm package)
  • AI SDK UIMessage format: AI SDK (npm package)
  • MCP client config: src/config/types.ts (MCPServerConfig TypeBox schema)
  • MCP client loader: @alkdev/operations/from-mcp (MCPClientLoader, createMCPClient, closeMCPClient)
  • OpenAPI import: @alkdev/operations/from-openapi (HTTPServiceConfig, FromOpenAPI, supports Swagger 2.0 + OpenAPI 3.x)
  • Gitea API spec: https://git.alk.dev/swagger.v1.json (Swagger 2.0, 299 endpoints)
  • Anthropic exports: Anthropic export data (conversation format, docs.json)
  • Agent sessions architecture: docs/architecture/agent-sessions.md
  • Call protocol: docs/architecture/call-graph.md
  • Coordination: docs/architecture/coordination.md
  • Spoke design: docs/architecture/spoke-runner.md
  • Task storage: tasks.md — task tables, taskgraph integration, dual representation
  • taskgraph: @alkdev/taskgraph npm package, the TypeScript library that is the source of truth for task dependency management (see tasks.md and ADR-011)