hub/docs/architecture/storage/sessions.md
| status | last_updated |
| --- | --- |
| draft | 2026-04-20 |

Table Schemas: Sessions, Messages & Parts

Agent conversation session tables. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../../decisions/. For the session architecture, see ../../agent-sessions.md.

sessions

Agent conversation sessions. Every session — whether the LLM runs directly in the hub or in a remote opencode container — stores its data here. The hub is the source of truth; spokes are execution environments.

| Column | Type | Notes |
| --- | --- | --- |
| commonCols | | id, metadata, createdAt, updatedAt |
| accountId | text | FK → accounts.id. Nullable: orphaned sessions preserve conversation history for audit and debugging. See D1 in storage-spec-phase1-resolutions.md. |
| projectId | text NOT NULL | FK → projects.id (cascade) |
| workspaceId | text | FK → workspaces.id |
| parentId | text | FK → sessions.id. Parent session (coordinator relationship). onDelete: SET NULL, so deleting a parent session detaches its children but preserves them. |
| slug | text NOT NULL UNIQUE | URL-friendly session identifier, unique across all sessions. Generated from the session title by slugification (lowercase, hyphens for spaces, alphanumeric only). Uniqueness is enforced by the UNIQUE constraint; if a collision occurs, append a short random suffix. |
| title | text NOT NULL | Session title |
| status | text NOT NULL | Enum: idle, busy, retry, archived. Default: idle |
| version | text NOT NULL | Schema version of the session's data column. Default: '1'. Incremented when the data format changes. New fields should be added as optional wherever possible, so that the version only has to advance for breaking changes. The hub uses this for migration-aware reads: for example, a version 1 session gets default values for fields introduced in a later version. The field exists for forward compatibility, letting the hub interpret session data correctly as the schema evolves. It is NOT a concurrency version (for optimistic locking, use commonCols.updatedAt). |
| provider | text | Execution path: direct (hub AI SDK) or opencode (spoke) |
| roleName | text | Which role this session fills (e.g., "architect", "implementation-specialist"). Formerly agentName in OpenCode. See ADR-012 and agent-roles.md. roleName is a free-form string, not a FK. Known role names are defined in the roles table, but sessions may use ad-hoc role names. Application code should validate against known roles when available but tolerate unknown values. |
| data | jsonb | Execution metadata (model, tokens, cost, finish reason, etc.) |
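
The slug rule in the table (lowercase, hyphens for spaces, alphanumeric only, random suffix on collision) could be sketched roughly as follows. This is a minimal illustration, not the hub's implementation: the function names, the 4-character suffix length, the "session" fallback for empty titles, and the `isTaken` callback are all assumptions. In a real system the UNIQUE constraint remains the authority; the callback only stands in for a lookup.

```typescript
// Hypothetical helper: turn a session title into a URL-friendly slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/\s+/g, "-") // hyphens for runs of whitespace
    .replace(/[^a-z0-9-]/g, "") // keep only alphanumerics and hyphens
    .replace(/-+/g, "-") // collapse repeated hyphens
    .replace(/^-|-$/g, ""); // drop a leading or trailing hyphen
}

// Short random suffix; the length of 4 is an assumption.
function randomSuffix(length: number = 4): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return out;
}

// `isTaken` is a stand-in for "does a session with this slug already exist?".
function uniqueSlug(title: string, isTaken: (slug: string) => boolean): string {
  const base = slugify(title) || "session"; // fallback value is an assumption
  let candidate = base;
  while (isTaken(candidate)) {
    candidate = `${base}-${randomSuffix()}`;
  }
  return candidate;
}
```

Because two writers can pick the same candidate concurrently, an insert that violates the UNIQUE constraint would still need to be retried with a fresh suffix.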

data boundaries: Execution metadata goes in data (model, tokens, cost, finish reason, resolved permissions). Structured fields like status, provider, roleName are separate columns because they're queried, filtered, and constrained. If a field appears in WHERE clauses or JOINs, it should be a proper column, not buried in JSONB.

Session data shapes: The data JSONB column holds execution-path-specific metadata. For direct sessions: { model, tokens, cost, finish }. For opencode sessions: additional fields from opencode's session model (summary stats, etc.). The data column also holds the resolved permissions for the session (data.scope), which is computed from the intersection of role permissions, account scopes, and spoke type trust level. See agent-sessions.md and agent-roles.md for the full models.
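
As an illustration of how the version column can drive a migration-aware read of the data column, here is a minimal sketch for direct sessions. The field types are assumptions (the text names the fields only), and version "2" with its `retryCount` field is entirely hypothetical, invented to show where a default would be filled in.

```typescript
// Fields named above for direct sessions; the concrete types are assumptions.
interface DirectSessionDataV1 {
  model: string;
  tokens: { input: number; output: number };
  cost: number;
  finish: string;
}

// Hypothetical version "2": adds a required field that version "1" rows lack.
interface DirectSessionData extends DirectSessionDataV1 {
  retryCount: number; // hypothetical field, not part of the spec
}

// Read stored data and normalize it to the latest shape.
function readDirectSessionData(
  version: string,
  data: DirectSessionDataV1 & Partial<DirectSessionData>,
): DirectSessionData {
  if (version === "1") {
    // Version 1 rows predate the field, so they get its default value.
    return { ...data, retryCount: 0 };
  }
  if (version === "2") {
    if (data.retryCount === undefined) {
      throw new Error("version 2 data is missing retryCount");
    }
    return { ...data, retryCount: data.retryCount };
  }
  throw new Error(`Unsupported session data version: ${version}`);
}
```

Rejecting unknown versions, rather than guessing, keeps a newer writer from being silently misread by an older reader.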

Status lifecycle:

  • idle: Session exists, not currently executing
  • busy: Session is actively processing (LLM call in progress)
  • retry: Last execution failed, session pending retry
  • archived: Session is read-only, no further interaction

Indexes:

  • unq_sessions_slug: UNIQUE on (slug)
  • idx_sessions_project_id on (projectId)
  • idx_sessions_workspace_id on (workspaceId)
  • idx_sessions_status on (status)
  • idx_sessions_active: partial index on (id) WHERE status IN ('idle', 'busy', 'retry'), for efficiently finding active (non-archived) sessions
  • idx_sessions_account_id on (accountId)
  • idx_sessions_role_name on (roleName)
  • idx_sessions_parent_id on (parentId), for finding child sessions of a coordinator
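
Application code that validates the status enum and mirrors the predicate of the idx_sessions_active partial index might look like the sketch below. The helper names are illustrative assumptions; only the four status values and the "active means not archived" rule come from this document.

```typescript
// The four status values listed in the status lifecycle.
const SESSION_STATUSES = ["idle", "busy", "retry", "archived"] as const;
type SessionStatus = (typeof SESSION_STATUSES)[number];

// Narrow an arbitrary string (e.g. from a request body) to a known status.
function isSessionStatus(value: string): value is SessionStatus {
  return (SESSION_STATUSES as readonly string[]).indexOf(value) !== -1;
}

// Same predicate as the partial index: status IN ('idle', 'busy', 'retry').
function isActiveStatus(value: SessionStatus): boolean {
  return value !== "archived";
}

// In-memory equivalent of "find active sessions".
function activeSessions<T extends { status: SessionStatus }>(rows: T[]): T[] {
  return rows.filter((row) => isActiveStatus(row.status));
}
```

Keeping the application predicate identical to the index predicate matters because a query only benefits from a partial index when its WHERE clause implies the index condition.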

messages

Messages within sessions. Content is stored separately in the parts table. This follows the opencode pattern: message metadata in one row, parts in separate rows. This enables streaming individual part updates, querying parts independently, and SSE events for message.part.updated.

| Column | Type | Notes |
| --- | --- | --- |
| commonCols | | id, metadata, createdAt, updatedAt |
| sessionId | text NOT NULL | IMMUTABLE. FK → sessions.id (cascade). Never updated after creation. |
| role | text NOT NULL | user, assistant, system |
| data | jsonb NOT NULL | Role-specific metadata |

Message IDs use UUIDv4 (via commonCols.id). Ordering is handled by the composite index idx_messages_session_id_created_at_id on (session_id, created_at, id). See ADR-003 for the rationale.
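
The ordering that the composite index supports, (session_id, created_at, id), can be expressed in application code as a comparator. This is a minimal sketch with an assumed row shape. Since UUIDv4 values are random, the id component only makes the order deterministic for messages that share a timestamp; it does not say which was created first.

```typescript
// Assumed minimal row shape for illustration.
interface MessageRow {
  id: string;
  sessionId: string;
  createdAt: Date;
}

// Order by createdAt, then by id as a deterministic tie-breaker.
function compareMessages(a: MessageRow, b: MessageRow): number {
  const byTime = a.createdAt.getTime() - b.createdAt.getTime();
  if (byTime !== 0) return byTime;
  if (a.id < b.id) return -1;
  if (a.id > b.id) return 1;
  return 0;
}

// In-memory equivalent of: WHERE session_id = ? ORDER BY created_at, id.
function orderedMessages(rows: MessageRow[], sessionId: string): MessageRow[] {
  return rows.filter((row) => row.sessionId === sessionId).sort(compareMessages);
}
```

The string comparison here is by UTF-16 code unit, which may differ from the database collation for arbitrary text; for lowercase hexadecimal UUIDs the two normally agree.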

Message data shapes (discriminated by role):

user messages:

{
  time: { created: number },           // epoch ms
  format?: "text" | "json_schema",    // input format hint
  summary?: { title?: string, body?: string, diffs?: FileDiff[] },
  agent?: string,                      // target agent name
  model?: { providerID: string, modelID: string },
  tools?: Record<string, boolean>,     // enabled tools for this turn
}

assistant messages:

{
  time: { created: number, completed?: number },
  parentID?: string,                   // ID of the user message that triggered this turn (a reference inside JSONB, not a DB-enforced FK)
  modelID: string,
  providerID: string,
  agent?: string,
  path?: { cwd: string, root: string },
  cost?: number,
  tokens?: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } },
  finish?: string,                     // "stop", "tool-calls", "length", etc.
  error?: { code: string, message: string }, // typed error if the turn failed
}

system messages:

{
  time: { created: number },
  content: string,                     // system prompt text
}
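
The three shapes above can be modeled as a union discriminated by role, so that the compiler narrows data once the role is checked. The sketch below is abridged to a few fields per shape, and the type names (`MessageRecord` and so on) are assumptions. `FileDiff` is referenced by the shapes above but not defined in this document, so it is left as a placeholder.

```typescript
type FileDiff = unknown; // placeholder: not defined in this document

interface UserMessageData {
  time: { created: number };
  format?: "text" | "json_schema";
  summary?: { title?: string; body?: string; diffs?: FileDiff[] };
  agent?: string;
}

interface AssistantMessageData {
  time: { created: number; completed?: number };
  parentID?: string;
  modelID: string;
  providerID: string;
  cost?: number;
  finish?: string;
}

interface SystemMessageData {
  time: { created: number };
  content: string;
}

// The role column selects which data shape applies.
type MessageRecord =
  | { role: "user"; data: UserMessageData }
  | { role: "assistant"; data: AssistantMessageData }
  | { role: "system"; data: SystemMessageData };

// Example of narrowing: only assistant messages carry time.completed.
function turnDurationMs(message: MessageRecord): number | undefined {
  if (message.role !== "assistant") return undefined;
  const { created, completed } = message.data.time;
  return completed === undefined ? undefined : completed - created;
}
```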

Compatibility with opencode: The data blob is a superset of opencode's InfoData. When importing an opencode session, the opencode-specific fields (parentID, path, modelID, providerID, cost, tokens, finish) map directly. When importing from a hub-direct AI SDK session, the AI SDK UIMessage fields are projected into the same shape.

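Compatibility with AI SDK: The AI SDK's UIMessage format (role + parts array) is assembled from these tables via a JOIN query. Storage is normalized; the API presents the denormalized view. No second copy in UIMessage format is stored; the part-type mapping described in the parts section is applied when the view is assembled.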
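
To make the JOIN-based assembly concrete, the following sketch groups flat joined rows into one object per message with a nested parts array. The row shape, the use of a LEFT JOIN (so messages without parts still appear), and the assumption that rows arrive already ordered by message and then by part id are illustrative choices, not confirmed details of the hub's query.

```typescript
type MessageRole = "user" | "assistant" | "system";

// One row per (message, part) pair, as a LEFT JOIN would return it.
interface JoinedRow {
  messageId: string;
  role: MessageRole;
  partId: string | null; // null when the message has no parts
  partType: string | null;
  partData: unknown;
}

interface AssembledMessage {
  id: string;
  role: MessageRole;
  parts: { id: string; type: string; data: unknown }[];
}

// Group rows by message, preserving the order in which rows arrive.
function assembleMessages(rows: JoinedRow[]): AssembledMessage[] {
  const byId = new Map<string, AssembledMessage>();
  const ordered: AssembledMessage[] = [];
  for (const row of rows) {
    let message = byId.get(row.messageId);
    if (!message) {
      message = { id: row.messageId, role: row.role, parts: [] };
      byId.set(row.messageId, message);
      ordered.push(message);
    }
    if (row.partId !== null && row.partType !== null) {
      message.parts.push({ id: row.partId, type: row.partType, data: row.partData });
    }
  }
  return ordered;
}
```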

parts

Message parts — the actual content of the conversation. Each part has a type discriminator and type-specific content in the data column. Parts are ordered by their id within a message, using sortable timestamp-based IDs (not commonCols.id).

Important: The id column for parts uses a sortable ID scheme (not UUIDv4 from commonCols). Opencode uses prefix-based sortable IDs like prt_{timestamp_hex}{random} that give chronological ordering. This enables ORDER BY id ASC within a message without needing a separate position column. The implementation should use a monotonic ID generator that produces lexicographically sortable IDs.
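
A monotonic generator for lexicographically sortable part IDs could look like the sketch below. The layout used here (12 hex digits of epoch milliseconds, a 4-digit per-millisecond counter, 8 random hex digits) is an assumption chosen for illustration; it follows the prt_{timestamp_hex}{random} idea but is not claimed to match opencode's exact format. `Math.random` is used for brevity and is not cryptographically secure.

```typescript
// Returns a function that yields strictly increasing, sortable part IDs.
// `now` is injectable so the clock can be controlled in tests.
function createPartIdGenerator(now: () => number = Date.now): () => string {
  let lastMs = 0;
  let counter = 0;

  return function nextPartId(): string {
    const current = now();
    if (current > lastMs) {
      lastMs = current;
      counter = 0;
    } else {
      // Same millisecond, or the clock moved backwards: keep the order
      // by bumping the counter instead of reusing an older timestamp.
      counter += 1;
      if (counter > 0xffff) {
        lastMs += 1;
        counter = 0;
      }
    }
    // Fixed-width fields keep string order equal to numeric order.
    const ts = lastMs.toString(16).padStart(12, "0");
    const seq = counter.toString(16).padStart(4, "0");
    const rand = Math.floor(Math.random() * 0x100000000)
      .toString(16)
      .padStart(8, "0");
    return `prt_${ts}${seq}${rand}`;
  };
}
```

Monotonicity holds per generator instance only. If several processes write parts to the same message, ordering between them depends on their clocks.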

The sessionId column on parts is a deliberate denormalization of message.sessionId — it allows direct queries like "all parts for a session" without joining through messages. sessionId on both messages and parts is IMMUTABLE after creation. It must never be updated. This is enforced by application logic, not a DB trigger. When inserting a part, read the message's sessionId and set it on the part within the same transaction. Direct SQL must not update sessionId on existing rows.
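
Since immutability of sessionId is enforced in application code, one way to approach it is to derive the part's sessionId from its message at construction time and to reject any later patch that tries to change it. The sketch below shows only that logic, with assumed type and function names; the surrounding database transaction is out of scope here.

```typescript
interface MessageRef {
  id: string;
  sessionId: string;
}

interface PartRow {
  readonly id: string;
  readonly messageId: string;
  readonly sessionId: string;
  readonly type: string;
  readonly data: Record<string, unknown>;
}

// Build a part row whose sessionId is always copied from its message,
// so callers cannot supply a mismatching value.
function buildPartRow(
  message: MessageRef,
  id: string,
  type: string,
  data: Record<string, unknown>,
): PartRow {
  return Object.freeze({
    id,
    messageId: message.id,
    sessionId: message.sessionId,
    type,
    data,
  });
}

// Guard for any generic update path: a patch may not alter sessionId.
function assertSessionIdUnchanged(
  existing: { sessionId: string },
  patch: { sessionId?: string },
): void {
  if (patch.sessionId !== undefined && patch.sessionId !== existing.sessionId) {
    throw new Error("sessionId is immutable after creation");
  }
}
```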

| Column | Type | Notes |
| --- | --- | --- |
| id | text PK NOT NULL | Sortable timestamp-based ID (not commonCols.id) |
| metadata | jsonb | defaults to {} |
| createdAt | timestamp with tz NOT NULL | defaults to now() |
| updatedAt | timestamp with tz NOT NULL | defaults to now(), $onUpdate(() => new Date()) |
| messageId | text NOT NULL | FK → messages.id (cascade) |
| sessionId | text NOT NULL | IMMUTABLE. FK → sessions.id (cascade, denormalized for direct queries). Never updated after creation. |
| type | text NOT NULL | Part type discriminator (see below) |
| data | jsonb NOT NULL | Type-specific content |

Parts are immutable after creation. updatedAt is set on creation, and because parts are insert-only the $onUpdate hook on updatedAt never fires. If a part needs correction, insert a new part (e.g., a correction or amendment) rather than updating an existing one. The id column uses a sortable ID scheme rather than the UUIDv4 from commonCols because chronological ordering within a message is required; see the sortable ID note above.

Part types and their data shapes:

The type field determines the shape of data. Our part types are a subset of opencode's MessageV2.Part discriminated union, expanded with AI SDK compatibility types. The types we include are:

| type | Description | data shape |
| --- | --- | --- |
| text | Main text content (user or assistant) | `{ text: string, synthetic?: boolean, ignored?: boolean, time?: { start: number, end: number }, metadata?: Record<string, unknown> }` |
| reasoning | Chain-of-thought / extended thinking | `{ text: string, metadata?: Record<string, unknown>, time: { start: number, end: number } }` |
| tool | Tool invocation with lifecycle state | `{ callID: string, tool: string, state: ToolState }` (see below) |
| step-start | Beginning of an agentic step | `{ snapshot?: string }` (git tree hash) |
| step-finish | End of an agentic step with cost accounting | `{ reason: string, snapshot?: string, cost?: number, tokens: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } } }` |
| file | File attachment | `{ mime: string, filename?: string, url: string, source?: FileSource }` |
| patch | Git patch applied during tool execution | `{ hash: string, files: string[] }` |
| snapshot | Git tree hash reference | `{ snapshot: string }` |
| agent | Sub-agent delegation (e.g., @reviewer) | `{ name: string, source?: { value: string, start: number, end: number } }` |
| compaction | Context window compaction marker | `{ auto: boolean, overflow?: boolean }` |

Tool state discriminated union (ToolState):

type ToolState =
  | { status: "pending", input: Record<string, unknown>, raw: string }
  | { status: "running", input: Record<string, unknown>, title?: string, metadata?: Record<string, unknown>, time: { start: number } }
  | { status: "completed", input: Record<string, unknown>, output: string, title: string, metadata: Record<string, unknown>, time: { start: number, end: number, compacted?: boolean }, attachments?: FilePartData[] }
  | { status: "error", input: Record<string, unknown>, error: string, metadata?: Record<string, unknown>, time: { start: number, end: number } }
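
The union above reads naturally as a small state machine. The sketch below spells out one plausible set of transitions (pending → running → completed or error) as pure functions that return a new state object. The transition order and the helper names are assumptions inferred from the status names; attachments are omitted for brevity, and how each state is persisted is outside this sketch.

```typescript
type ToolInput = Record<string, unknown>;
type ToolMeta = Record<string, unknown>;

type PendingTool = { status: "pending"; input: ToolInput; raw: string };
type RunningTool = {
  status: "running"; input: ToolInput; title?: string;
  metadata?: ToolMeta; time: { start: number };
};
type CompletedTool = {
  status: "completed"; input: ToolInput; output: string; title: string;
  metadata: ToolMeta; time: { start: number; end: number; compacted?: boolean };
};
type ErroredTool = {
  status: "error"; input: ToolInput; error: string;
  metadata?: ToolMeta; time: { start: number; end: number };
};
type ToolState = PendingTool | RunningTool | CompletedTool | ErroredTool;

// pending -> running: record the start time.
function startTool(state: PendingTool, nowMs: number): RunningTool {
  return { status: "running", input: state.input, time: { start: nowMs } };
}

// running -> completed: output and title become required.
function completeTool(
  state: RunningTool, output: string, title: string, nowMs: number,
): CompletedTool {
  return {
    status: "completed", input: state.input, output, title,
    metadata: state.metadata ?? {},
    time: { start: state.time.start, end: nowMs },
  };
}

// running -> error: carry the error text and any metadata collected so far.
function failTool(state: RunningTool, error: string, nowMs: number): ErroredTool {
  return {
    status: "error", input: state.input, error,
    ...(state.metadata ? { metadata: state.metadata } : {}),
    time: { start: state.time.start, end: nowMs },
  };
}

// completed and error are the two terminal states.
function isTerminal(state: ToolState): state is CompletedTool | ErroredTool {
  return state.status === "completed" || state.status === "error";
}
```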

File source types:

type FileSource =
  | { type: "file", path: string, text: { value: string, start: number, end: number } }
  | { type: "symbol", path: string, name: string, kind: number, range: LSPLikeRange, text: { value: string, start: number, end: number } }
  | { type: "resource", clientName: string, uri: string, text: { value: string, start: number, end: number } }

type FilePartData = {
  mime: string;
  filename?: string;
  url: string;
  source?: FileSource;
};

AI SDK UIMessage compatibility: The API assembles UIMessage from messages + parts via JOIN. The mapping is:

  • text (not ignored) → { type: "text", text }
  • file (non-text, non-directory) → { type: "file", url, mediaType, filename }
  • reasoning → { type: "reasoning", text }
  • step-start → { type: "step-start" }
  • tool (completed) → { type: "tool-{name}", state: "output-available", toolCallId, input, output }
  • tool (error) → { type: "tool-{name}", state: "output-error", toolCallId, input, errorText }

Stored part types not mapped to the UIMessage view: step-finish, patch, snapshot, compaction, agent. These are either internal SDK events (step-finish, compaction), tool-execution metadata handled within the tool part's state lifecycle (patch, snapshot), or session-level delegation (agent, handled via sessions.parentId). They are stored in the parts table but excluded from the UIMessage assembly.
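
The mapping listed above can be sketched as a single function over stored parts. The types are trimmed to the fields the mapping needs, and two details are assumptions: the MIME values used to detect "text" and "directory" files, and the decision to skip pending and running tool states, which the list above does not cover.

```typescript
type ToolStateLite =
  | { status: "pending" | "running"; input: Record<string, unknown> }
  | { status: "completed"; input: Record<string, unknown>; output: string }
  | { status: "error"; input: Record<string, unknown>; error: string };

type StoredPart =
  | { type: "text"; data: { text: string; ignored?: boolean } }
  | { type: "reasoning"; data: { text: string } }
  | { type: "step-start"; data: { snapshot?: string } }
  | { type: "file"; data: { mime: string; filename?: string; url: string } }
  | { type: "tool"; data: { callID: string; tool: string; state: ToolStateLite } }
  | { type: "step-finish" | "patch" | "snapshot" | "agent" | "compaction"; data: Record<string, unknown> };

type UIPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "step-start" }
  | { type: "file"; url: string; mediaType: string; filename?: string }
  | { type: string; state: "output-available"; toolCallId: string; input: Record<string, unknown>; output: string }
  | { type: string; state: "output-error"; toolCallId: string; input: Record<string, unknown>; errorText: string };

// Assumption: these MIME values identify text and directory attachments.
function isAttachableFile(mime: string): boolean {
  return mime !== "text/plain" && mime !== "application/x-directory";
}

function toUIParts(parts: StoredPart[]): UIPart[] {
  const out: UIPart[] = [];
  for (const part of parts) {
    switch (part.type) {
      case "text":
        if (!part.data.ignored) out.push({ type: "text", text: part.data.text });
        break;
      case "reasoning":
        out.push({ type: "reasoning", text: part.data.text });
        break;
      case "step-start":
        out.push({ type: "step-start" });
        break;
      case "file":
        if (isAttachableFile(part.data.mime)) {
          out.push({ type: "file", url: part.data.url, mediaType: part.data.mime, filename: part.data.filename });
        }
        break;
      case "tool": {
        const { callID, tool, state } = part.data;
        if (state.status === "completed") {
          out.push({ type: `tool-${tool}`, state: "output-available", toolCallId: callID, input: state.input, output: state.output });
        } else if (state.status === "error") {
          out.push({ type: `tool-${tool}`, state: "output-error", toolCallId: callID, input: state.input, errorText: state.error });
        }
        break; // pending and running are skipped in this sketch
      }
      default:
        break; // step-finish, patch, snapshot, agent, compaction are excluded
    }
  }
  return out;
}
```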

Why separate parts table: Streaming individual part updates, publishing message.part.updated SSE events, and querying parts independently (e.g., "find all tool calls in this session") all require parts to be their own rows, not embedded in a message JSON blob. This is the same pattern opencode uses and it works well at scale (100k+ parts across 24k+ messages in production).

Parts are flat — there is no parentId column on parts. Sub-agent delegation is handled at the session level (via sessions.parentId), not by nesting parts. If nesting becomes necessary in the future, it would require a schema change (adding parentId to parts).

Indexes: part_session_idx on (session_id), part_message_id_id_idx on (message_id, id) for efficient message loading, and idx_parts_session_id_type on (session_id, type) for queries like "all tool-call parts in session X".