glm-5.1 2b63cda1c7 Setup repo: migrate architecture specs, code stubs, and tasks from alkhub_ts
Copy architecture docs, ADRs, storage domain specs, research, reviews,
and 56 storage architecture tasks from the alkhub_ts monorepo. Adapt for
standalone @alkdev/hub repo structure (src/ not packages/hub/).

Sanitize all sensitive information:
- Replace private IPs (10.0.0.1) with localhost defaults
- Remove internal server hostnames (dev1, ns528096)
- Replace /workspace/ private paths with npm package references
- Remove hardcoded credentials from examples
- Rewrite infrastructure.md without private network details

Add Deno project scaffolding: deno.json (pinned deps), .gitignore,
AGENTS.md, entry point. Migrate existing code stubs (crypto, config
types, logger) with updated import paths.
2026-05-25 10:56:32 +00:00


---
status: draft
last_updated: 2026-04-20
---
# Table Schemas: Sessions, Messages & Parts
Agent conversation session tables. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see [table-reference.md](./table-reference.md). For design decisions, see [../../../decisions/](../../../decisions/). For the session architecture, see [../../agent-sessions.md](../../agent-sessions.md).
### `sessions`
Agent conversation sessions. Every session — whether the LLM runs directly in the hub or in a remote opencode container — stores its data here. The hub is the source of truth; spokes are execution environments.
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| accountId | text | FK → accounts.id — Nullable — orphaned sessions preserve conversation history for audit and debugging. See D1 in storage-spec-phase1-resolutions.md. |
| projectId | text NOT NULL | FK → projects.id (cascade) |
| workspaceId | text | FK → workspaces.id |
| parentId | text | FK → sessions.id — Parent session (coordinator relationship). onDelete: SET NULL — deleting a parent session detaches children but preserves them. |
| slug | text NOT NULL UNIQUE | URL-friendly session identifier, unique across all sessions (enforced by the UNIQUE constraint). Generated from the session title by slugification: lowercased, spaces replaced with hyphens, and all other non-alphanumeric characters removed. If the generated slug collides with an existing one, a short random suffix is appended. |
| title | text NOT NULL | Session title |
| status | text NOT NULL | Enum: `idle`, `busy`, `retry`, `archived`. Default: `idle` |
| version | text NOT NULL | Schema version of the session's `data` column. Default: `'1'`. Incremented only for breaking changes to the `data` format; new fields should be added as optional, which does not require a bump. The hub uses this for migration-aware reads: sessions written under an older version are read with default values for fields added since. This field exists for forward compatibility, letting the hub interpret session data correctly as the schema evolves. It is NOT a concurrency version (for optimistic locking, use `commonCols.updatedAt`). |
| provider | text | Execution path: `direct` (hub AI SDK) or `opencode` (spoke) |
| roleName | text | Which role this session fills (e.g., "architect", "implementation-specialist"). Formerly `agentName` in OpenCode. See [ADR-012](../../../decisions/ADR-012-agent-vs-role-vs-account.md) and [agent-roles.md](../../agent-roles.md). `roleName` is a free-form string (not a FK constraint). Known role names are defined in the `roles` table, but sessions may use ad-hoc role names. Application code should validate against known roles when available but tolerate unknown values. |
| data | jsonb | Role-specific metadata (model, tokens, cost, finish reason, etc.) |
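A minimal sketch of the slug rule from the `slug` row above. `slugify` and `uniqueSlug` are hypothetical helper names, the `isTaken` callback stands in for a lookup against `sessions.slug`, and the `"session"` fallback for titles with no alphanumeric characters is an assumption. A pre-check like this can still race with a concurrent insert, so the UNIQUE constraint remains the real guarantee; a caller would retry on a unique-violation error.

```ts
// Hypothetical helper: lowercase, spaces to hyphens, drop all other non-alphanumerics.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/\s+/g, "-") // runs of whitespace become a single hyphen
    .replace(/[^a-z0-9-]/g, "") // remove every other non-alphanumeric character
    .replace(/-+/g, "-") // collapse hyphen runs left behind by removals
    .replace(/^-+|-+$/g, ""); // no leading or trailing hyphens
}

// Hypothetical helper: returns the base slug if free, otherwise base + "-" + random suffix.
function uniqueSlug(
  title: string,
  isTaken: (slug: string) => boolean, // stand-in for a lookup against sessions.slug
  randomSuffix: () => string = () => Math.random().toString(36).slice(2, 8),
  maxAttempts = 5,
): string {
  const base = slugify(title) || "session"; // assumed fallback when nothing alphanumeric survives
  if (!isTaken(base)) return base;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = `${base}-${randomSuffix()}`;
    if (!isTaken(candidate)) return candidate;
  }
  throw new Error(`no free slug found for title "${title}"`);
}
```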
**`data` boundaries**: Execution metadata goes in `data` (model, tokens, cost, finish reason, resolved permissions). Structured fields like `status`, `provider`, and `roleName` are separate columns because they're queried, filtered, and constrained. If a field appears in WHERE clauses or JOINs, it should be a proper column, not buried in JSONB.
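Because `data` is JSONB rather than typed columns, reads of it depend on the `version` column described in the table. A minimal sketch of a migration-aware read, assuming a `direct` session's `data` holds `model`, `tokens`, `cost`, and `finish`. The names `readSessionData`, `CURRENT_DATA_VERSION`, and `upgraders`, and the zero defaults for `tokens` and `cost`, are illustrative assumptions rather than the hub's code. Each breaking change would get one upgrader step; additive optional fields are handled by defaults alone; rows from a newer hub are rejected rather than misread.

```ts
type DirectSessionData = {
  model?: string;
  tokens: { input: number; output: number };
  cost: number;
  finish?: string;
};

type RawData = Record<string, unknown>;

// Highest data version this reader understands (sessions.version defaults to '1').
const CURRENT_DATA_VERSION = 1;

// upgraders[n] converts version-n data into version n+1. Empty until the first
// breaking change; at that point add upgraders[1] and bump CURRENT_DATA_VERSION.
const upgraders: Record<number, (data: RawData) => RawData> = {};

function readSessionData(version: string, raw: RawData): DirectSessionData {
  let v = Number(version);
  if (!Number.isInteger(v) || v < 1) throw new Error(`invalid session data version: ${version}`);
  if (v > CURRENT_DATA_VERSION) throw new Error(`session data version ${v} is newer than this hub supports`);

  let data = raw;
  for (; v < CURRENT_DATA_VERSION; v++) {
    const step = upgraders[v];
    if (!step) throw new Error(`missing upgrader from version ${v}`);
    data = step(data);
  }

  // Defaults for fields an older row may lack (assumed values).
  const tokens = data.tokens as DirectSessionData["tokens"] | undefined;
  return {
    model: typeof data.model === "string" ? data.model : undefined,
    tokens: tokens ?? { input: 0, output: 0 },
    cost: typeof data.cost === "number" ? data.cost : 0,
    finish: typeof data.finish === "string" ? data.finish : undefined,
  };
}
```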
**Session `data` shapes**: The `data` JSONB column holds execution-path-specific metadata. For `direct` sessions: `{ model, tokens, cost, finish }`. For `opencode` sessions: additional fields from opencode's session model (summary stats, etc.). The `data` column also holds the session's resolved permissions (`data.scope`), computed as the intersection of the role's permissions, the account's scopes, and the permissions allowed by the spoke type's trust level. See [agent-sessions.md](../../agent-sessions.md) and [agent-roles.md](../../agent-roles.md) for the full models.
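A minimal sketch of computing `data.scope` as the intersection described above. It assumes permissions are plain strings and that each spoke trust level maps to a fixed set of allowed permissions; `resolveScope`, `TRUST_ALLOWS`, the two trust levels, and the permission names are all hypothetical.

```ts
type Permission = string; // hypothetical naming, e.g. "fs.read"
type SpokeTrust = "trusted" | "sandboxed"; // hypothetical trust levels

// Hypothetical: permissions a spoke of each trust level may exercise at all.
const TRUST_ALLOWS: Record<SpokeTrust, ReadonlySet<Permission>> = {
  trusted: new Set(["fs.read", "fs.write", "shell.exec", "net.fetch"]),
  sandboxed: new Set(["fs.read", "net.fetch"]),
};

// A permission survives only if the role grants it, the account holds it,
// and the spoke's trust level allows it.
function resolveScope(
  rolePermissions: Iterable<Permission>,
  accountScopes: Iterable<Permission>,
  trust: SpokeTrust,
): Permission[] {
  const account = new Set(accountScopes);
  const allowed = TRUST_ALLOWS[trust];
  const result = new Set<Permission>();
  for (const p of rolePermissions) {
    if (account.has(p) && allowed.has(p)) result.add(p);
  }
  return Array.from(result).sort(); // sorted for a stable value in data.scope
}
```

Storing the resolved list, rather than recomputing it on every call, means later changes to a role or account do not silently widen an existing session's permissions.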
**Status lifecycle**:
- `idle`: Session exists, not currently executing
- `busy`: Session is actively processing (LLM call in progress)
- `retry`: Last execution failed, session pending retry
- `archived`: Session is read-only, no further interaction
**Indexes**: `unq_sessions_slug` UNIQUE on `(slug)`, `idx_sessions_project_id` on `(projectId)`, `idx_sessions_workspace_id` on `(workspaceId)`, `idx_sessions_status` on `(status)`, `idx_sessions_active` partial on `(id)` WHERE `status IN ('idle', 'busy', 'retry')` — efficiently find active (non-archived) sessions, `idx_sessions_account_id` on `(accountId)`, `idx_sessions_role_name` on `(roleName)`, `idx_sessions_parent_id` on `(parentId)` — find child sessions of coordinator.
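The status lifecycle and the `idx_sessions_active` predicate describe the same active set, so application code can mirror both to keep its filters consistent with the partial index. A minimal sketch; `parseSessionStatus`, `isActive`, and `isReadOnly` are hypothetical names.

```ts
const SESSION_STATUSES = ["idle", "busy", "retry", "archived"] as const;
type SessionStatus = (typeof SESSION_STATUSES)[number];

// Same set as the idx_sessions_active predicate: status IN ('idle', 'busy', 'retry').
const ACTIVE_STATUSES: ReadonlySet<SessionStatus> = new Set<SessionStatus>(["idle", "busy", "retry"]);

// Validate the raw text column value read from the database.
function parseSessionStatus(value: string): SessionStatus {
  if ((SESSION_STATUSES as readonly string[]).includes(value)) return value as SessionStatus;
  throw new Error(`unknown session status: ${value}`);
}

function isActive(status: SessionStatus): boolean {
  return ACTIVE_STATUSES.has(status);
}

// Archived sessions are read-only: no further interaction.
function isReadOnly(status: SessionStatus): boolean {
  return status === "archived";
}
```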
### `messages`
Messages within sessions. Content is stored separately in the `parts` table. This follows the opencode pattern: message metadata in one row, parts in separate rows. This enables streaming individual part updates, querying parts independently, and SSE events for `message.part.updated`.
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| sessionId | text NOT NULL **IMMUTABLE** | FK → sessions.id (cascade) — Never updated after creation. |
| role | text NOT NULL | `user`, `assistant`, `system` |
| data | jsonb NOT NULL | Role-specific metadata |
**Message IDs use UUIDv4** (via `commonCols.id`). Ordering is handled by the composite index `idx_messages_session_id_created_at_id` on `(session_id, created_at, id)`. See ADR-003 for the rationale.
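Because UUIDv4 values carry no time information, `created_at` provides the order and `id` only breaks ties between messages with identical timestamps. A minimal in-memory equivalent of the index order, using a hypothetical `compareMessages`; it is only useful for re-sorting rows already loaded, and the database's `ORDER BY created_at, id` remains the source of truth (JavaScript `Date` has millisecond precision, while Postgres `timestamptz` stores microseconds).

```ts
type MessageRow = {
  id: string; // UUIDv4 from commonCols.id
  sessionId: string;
  role: "user" | "assistant" | "system";
  createdAt: Date;
};

// Same order as idx_messages_session_id_created_at_id within one session:
// created_at first, then id as a deterministic tie-breaker.
function compareMessages(a: MessageRow, b: MessageRow): number {
  const byTime = a.createdAt.getTime() - b.createdAt.getTime();
  if (byTime !== 0) return byTime;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}
```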
**Message `data` shapes** (discriminated by `role`):
`user` messages:
```ts
type UserMessageData = {
  time: { created: number }, // epoch ms
  format?: "text" | "json_schema", // input format hint
  summary?: { title?: string, body?: string, diffs?: FileDiff[] }, // FileDiff: defined outside this document
  agent?: string, // target agent name
  model?: { providerID: string, modelID: string },
  tools?: Record<string, boolean>, // enabled tools for this turn
};
```
`assistant` messages:
```ts
type AssistantMessageData = {
  time: { created: number, completed?: number }, // epoch ms
  parentID?: string, // id of the user message that triggered this turn (JSONB reference, not a DB-enforced FK)
  modelID: string,
  providerID: string,
  agent?: string,
  path?: { cwd: string, root: string },
  cost?: number,
  tokens?: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } },
  finish?: string, // "stop", "tool-calls", "length", etc.
  error?: { code: string, message: string }, // typed error if the turn failed
};
```
`system` messages:
```ts
type SystemMessageData = {
  time: { created: number }, // epoch ms
  content: string, // system prompt text
};
```
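Because `role` is a column and `data` is untyped JSONB, application code has to pair the two before it can rely on a shape. A minimal sketch with trimmed-down versions of the three shapes above; `StoredMessage`, `toStoredMessage`, and `turnDurationMs` are hypothetical names, and the checks are deliberately shallow (a schema validator would check every field).

```ts
type UserMessageData = { time: { created: number }, agent?: string };
type AssistantMessageData = { time: { created: number, completed?: number }, modelID: string, providerID: string, finish?: string };
type SystemMessageData = { time: { created: number }, content: string };

// Pairing the role column with data turns the shapes into a discriminated union.
type StoredMessage =
  | { role: "user", data: UserMessageData }
  | { role: "assistant", data: AssistantMessageData }
  | { role: "system", data: SystemMessageData };

function toStoredMessage(role: string, data: unknown): StoredMessage {
  if (typeof data !== "object" || data === null) throw new Error("message data must be an object");
  const d = data as Record<string, unknown>;
  switch (role) {
    case "user":
      return { role: "user", data: d as UserMessageData };
    case "assistant":
      if (typeof d.modelID !== "string" || typeof d.providerID !== "string") {
        throw new Error("assistant message data needs modelID and providerID");
      }
      return { role: "assistant", data: d as AssistantMessageData };
    case "system":
      if (typeof d.content !== "string") throw new Error("system message data needs content");
      return { role: "system", data: d as SystemMessageData };
    default:
      throw new Error(`unknown message role: ${role}`);
  }
}

// Narrowing on role: only assistant messages carry a completion time.
function turnDurationMs(message: StoredMessage): number | undefined {
  if (message.role !== "assistant" || message.data.time.completed === undefined) return undefined;
  return message.data.time.completed - message.data.time.created;
}
```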
**Compatibility with opencode**: The `data` blob is a superset of opencode's `InfoData`. When importing an opencode session, the opencode-specific fields (`parentID`, `path`, `modelID`, `providerID`, `cost`, `tokens`, `finish`) map directly. When importing from a hub-direct AI SDK session, the AI SDK `UIMessage` fields are projected into the same shape.
**Compatibility with AI SDK**: The AI SDK's `UIMessage` format (role + parts array) is assembled from these tables via a JOIN query. Storage is normalized; the API presents the denormalized view, mapping each stored part to its `UIMessage` counterpart at read time (see the mapping under `parts`). No separate storage format is needed.
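A minimal sketch of folding the JOIN result into per-message groups, assuming a query shaped like the one in the comment (ordered by message `created_at` and `id`, then by part `id`). `JoinedRow`, `assembleMessages`, and the column aliases are hypothetical; each group's parts would then go through the part-type mapping listed under `parts`.

```ts
// One row of a hypothetical query such as:
//   SELECT m.id AS message_id, m.role, p.id AS part_id, p.type, p.data
//   FROM messages m LEFT JOIN parts p ON p.message_id = m.id
//   WHERE m.session_id = $1
//   ORDER BY m.created_at, m.id, p.id;
type JoinedRow = {
  messageId: string;
  role: "user" | "assistant" | "system";
  partId: string | null;
  partType: string | null;
  partData: unknown;
};

type AssembledMessage = {
  id: string;
  role: JoinedRow["role"];
  parts: { id: string; type: string; data: unknown }[];
};

function assembleMessages(rows: readonly JoinedRow[]): AssembledMessage[] {
  const out: AssembledMessage[] = [];
  for (const row of rows) {
    let current = out[out.length - 1];
    // Rows arrive grouped by message, so a new messageId starts a new message.
    if (!current || current.id !== row.messageId) {
      current = { id: row.messageId, role: row.role, parts: [] };
      out.push(current);
    }
    // The LEFT JOIN yields one row with null part columns for a message without parts.
    if (row.partId !== null && row.partType !== null) {
      current.parts.push({ id: row.partId, type: row.partType, data: row.partData });
    }
  }
  return out;
}
```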
### `parts`
Message parts — the actual content of the conversation. Each part has a `type` discriminator and type-specific content in the `data` column. Parts are ordered by their `id` within a message, using sortable timestamp-based IDs (not `commonCols.id`).
**Important**: The `id` column for parts uses a sortable ID scheme (not UUIDv4 from `commonCols`). Opencode uses prefix-based sortable IDs like `prt_{timestamp_hex}{random}` that give chronological ordering. This enables `ORDER BY id ASC` within a message without needing a separate `position` column. The implementation should use a monotonic ID generator that produces lexicographically sortable IDs.
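A minimal sketch of a monotonic, lexicographically sortable ID generator in the spirit of the `prt_{timestamp_hex}{random}` format. The layout here (12 hex digits of epoch milliseconds, a 4-hex-digit per-millisecond counter, then 8 random base-36 characters) and the name `createSortableIdGenerator` are assumptions, not opencode's exact scheme. Fixed-width lowercase hex makes string order match numeric order, and the counter keeps IDs strictly increasing within one generator even when several are created in the same millisecond or the clock steps backwards. IDs from different processes are only ordered to millisecond precision.

```ts
const BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz";

// Random tail for cross-process uniqueness. Math.random keeps the sketch
// dependency-free; a real generator should use crypto.getRandomValues.
function randomTail(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) out += BASE36[Math.floor(Math.random() * BASE36.length)];
  return out;
}

function createSortableIdGenerator(prefix: string, now: () => number = Date.now): () => string {
  let lastMs = 0;
  let counter = 0;
  return () => {
    let ms = now();
    if (ms <= lastMs) {
      // Same millisecond, or the clock went backwards: reuse lastMs and bump the counter.
      ms = lastMs;
      counter++;
      if (counter > 0xffff) {
        // Counter exhausted: move to the next millisecond to stay strictly increasing.
        ms = lastMs + 1;
        counter = 0;
      }
    } else {
      counter = 0;
    }
    lastMs = ms;
    const time = ms.toString(16).padStart(12, "0");
    const seq = counter.toString(16).padStart(4, "0");
    return `${prefix}_${time}${seq}${randomTail(8)}`;
  };
}
```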
The `sessionId` column on parts is a deliberate denormalization of `message.sessionId` — it allows direct queries like "all parts for a session" without joining through messages. **`sessionId` on both `messages` and `parts` is IMMUTABLE after creation.** It must never be updated. This is enforced by application logic, not a DB trigger. When inserting a part, read the message's `sessionId` and set it on the part within the same transaction. Direct SQL must not update `sessionId` on existing rows.
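A minimal sketch of the insert rule, assuming a transaction object that can read a message and insert a part (for example, inside a callback passed to Drizzle's `db.transaction`). `PartTx`, `insertPart`, and the in-memory `memoryTx` are hypothetical; the point is that the part's `sessionId` is always copied from the parent message and never taken from the caller.

```ts
type MessageRow = { id: string; sessionId: string };
type PartRow = { id: string; messageId: string; sessionId: string; type: string; data: unknown };

// Hypothetical transaction handle; both calls must run inside one DB transaction.
interface PartTx {
  findMessage(id: string): MessageRow | undefined;
  insertPart(row: PartRow): void;
}

// Callers cannot supply sessionId: it is always derived from the parent message.
type NewPart = Omit<PartRow, "sessionId">;

function insertPart(tx: PartTx, part: NewPart): PartRow {
  const message = tx.findMessage(part.messageId);
  if (!message) throw new Error(`message ${part.messageId} not found`);
  // sessionId is set after the spread, so a stray caller-provided value is overwritten.
  const row: PartRow = { ...part, sessionId: message.sessionId };
  tx.insertPart(row);
  return row;
}

// In-memory stand-in for a transaction, for illustration only.
function memoryTx(messages: MessageRow[], parts: PartRow[]): PartTx {
  return {
    findMessage: (id) => messages.find((m) => m.id === id),
    insertPart: (row) => {
      parts.push(row);
    },
  };
}
```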
| Column | Type | Notes |
|--------|------|-------|
| id | text PK NOT NULL | Sortable timestamp-based ID (not commonCols.id) |
| metadata | jsonb | defaults to `{}` |
| createdAt | timestamp with tz NOT NULL | defaults to `now()` |
| updatedAt | timestamp with tz NOT NULL | defaults to `now()`, `$onUpdate(() => new Date())` |
| messageId | text NOT NULL | FK → messages.id (cascade) |
| sessionId | text NOT NULL **IMMUTABLE** | FK → sessions.id (cascade, denormalized for direct queries) — Never updated after creation. |
| type | text NOT NULL | Part type discriminator (see below) |
| data | jsonb NOT NULL | Type-specific content |
**Parts are immutable after creation.** `updatedAt` is set on insert and never changes afterwards: the `$onUpdate` hook only fires on UPDATE statements, and parts are insert-only. If a part needs correction, insert a new part (e.g., a correction or amendment) rather than updating the existing one.
**Part types and their `data` shapes**:
The `type` field determines the shape of `data`. Our part types are a subset of opencode's `MessageV2.Part` discriminated union, expanded with AI SDK compatibility types. The types we include are:
| type | Description | data shape |
|------|-------------|------------|
| `text` | Main text content (user or assistant) | `{ text: string, synthetic?: boolean, ignored?: boolean, time?: { start: number, end: number }, metadata?: Record<string, unknown> }` |
| `reasoning` | Chain-of-thought / extended thinking | `{ text: string, metadata?: Record<string, unknown>, time: { start: number, end: number } }` |
| `tool` | Tool invocation with lifecycle state | `{ callID: string, tool: string, state: ToolState }` — see below |
| `step-start` | Beginning of an agentic step | `{ snapshot?: string }` — git tree hash |
| `step-finish` | End of an agentic step with cost accounting | `{ reason: string, snapshot?: string, cost?: number, tokens: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } } }` |
| `file` | File attachment | `{ mime: string, filename?: string, url: string, source?: FileSource }` |
| `patch` | Git patch applied during tool execution | `{ hash: string, files: string[] }` |
| `snapshot` | Git tree hash reference | `{ snapshot: string }` |
| `agent` | Sub-agent delegation (e.g., @reviewer) | `{ name: string, source?: { value: string, start: number, end: number } }` |
| `compaction` | Context window compaction marker | `{ auto: boolean, overflow?: boolean }` |
**Tool state discriminated union** (`ToolState`):
```ts
type ToolState =
  | { status: "pending", input: Record<string, unknown>, raw: string }
  | { status: "running", input: Record<string, unknown>, title?: string, metadata?: Record<string, unknown>, time: { start: number } }
  | {
      status: "completed",
      input: Record<string, unknown>,
      output: string,
      title: string,
      metadata: Record<string, unknown>,
      time: { start: number, end: number, compacted?: boolean },
      attachments?: FilePartData[],
    }
  | { status: "error", input: Record<string, unknown>, error: string, metadata?: Record<string, unknown>, time: { start: number, end: number } };
```
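The union implies a small lifecycle: `pending` to `running`, then `completed` or `error`. A minimal sketch of the transitions as pure functions, with the union restated (minus `attachments`) so the block is self-contained. The function names and the two fallback choices marked in the comments are assumptions. Typing each function on its source state means the compiler rejects, for example, completing a tool that never started.

```ts
type Input = Record<string, unknown>;
type Meta = Record<string, unknown>;

type PendingState = { status: "pending"; input: Input; raw: string };
type RunningState = { status: "running"; input: Input; title?: string; metadata?: Meta; time: { start: number } };
type CompletedState = {
  status: "completed"; input: Input; output: string; title: string; metadata: Meta;
  time: { start: number; end: number; compacted?: boolean };
};
type ErrorState = { status: "error"; input: Input; error: string; metadata?: Meta; time: { start: number; end: number } };
type ToolState = PendingState | RunningState | CompletedState | ErrorState;

// pending -> running
function startTool(state: PendingState, now: number, title?: string): RunningState {
  return { status: "running", input: state.input, title, time: { start: now } };
}

// running -> completed. Assumption: an untitled run completes with an empty title.
function completeTool(state: RunningState, output: string, now: number): CompletedState {
  return {
    status: "completed",
    input: state.input,
    output,
    title: state.title ?? "",
    metadata: state.metadata ?? {},
    time: { start: state.time.start, end: now },
  };
}

// pending or running -> error. Assumption: a tool that fails before starting gets start = end = now.
function failTool(state: PendingState | RunningState, error: string, now: number): ErrorState {
  const start = state.status === "running" ? state.time.start : now;
  const metadata = state.status === "running" ? state.metadata : undefined;
  return { status: "error", input: state.input, error, metadata, time: { start, end: now } };
}

function isTerminal(state: ToolState): boolean {
  return state.status === "completed" || state.status === "error";
}
```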
**File source types**:
```ts
type FileSource =
  | { type: "file", path: string, text: { value: string, start: number, end: number } }
  | { type: "symbol", path: string, name: string, kind: number, range: LSPLikeRange, text: { value: string, start: number, end: number } }
  | { type: "resource", clientName: string, uri: string, text: { value: string, start: number, end: number } };

type FilePartData = {
  mime: string;
  filename?: string;
  url: string;
  source?: FileSource;
};
```
**AI SDK `UIMessage` compatibility**: The API assembles `UIMessage` from `messages` + `parts` via JOIN. The mapping is:
- `text` (not ignored) → `{ type: "text", text }`
- `file` (non-text, non-directory) → `{ type: "file", url, mediaType, filename }`
- `reasoning` → `{ type: "reasoning", text }`
- `step-start` → `{ type: "step-start" }`
- `tool` (completed) → `{ type: "tool-{name}", state: "output-available", toolCallId, input, output }`
- `tool` (error) → `{ type: "tool-{name}", state: "output-error", toolCallId, input, errorText }`
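The list above can be read as a single mapping function. A minimal sketch under these assumptions: `UIPart` restates only the shapes listed above rather than importing the AI SDK's own types, `ToolStateLoose` flattens the `ToolState` union, and "non-text, non-directory" files are taken to mean any MIME type other than `text/plain` and `application/x-directory`. `toUIParts` is a hypothetical name.

```ts
type ToolStateLoose = {
  status: "pending" | "running" | "completed" | "error";
  input: Record<string, unknown>;
  output?: string; // set when completed
  error?: string; // set when errored
};

type StoredPart =
  | { type: "text"; data: { text: string; ignored?: boolean } }
  | { type: "reasoning"; data: { text: string } }
  | { type: "file"; data: { mime: string; filename?: string; url: string } }
  | { type: "step-start"; data: { snapshot?: string } }
  | { type: "tool"; data: { callID: string; tool: string; state: ToolStateLoose } }
  | { type: "step-finish" | "patch" | "snapshot" | "agent" | "compaction"; data: unknown };

type ToolPartType = `tool-${string}`;

type UIPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "file"; url: string; mediaType: string; filename?: string }
  | { type: "step-start" }
  | { type: ToolPartType; state: "output-available"; toolCallId: string; input: unknown; output: unknown }
  | { type: ToolPartType; state: "output-error"; toolCallId: string; input: unknown; errorText: string };

// Assumption: "non-text, non-directory" means these two MIME types are excluded.
const UNMAPPED_FILE_MIMES = new Set(["text/plain", "application/x-directory"]);

function toUIParts(parts: readonly StoredPart[]): UIPart[] {
  const out: UIPart[] = [];
  for (const part of parts) {
    switch (part.type) {
      case "text":
        if (!part.data.ignored) out.push({ type: "text", text: part.data.text });
        break;
      case "file":
        if (!UNMAPPED_FILE_MIMES.has(part.data.mime)) {
          out.push({ type: "file", url: part.data.url, mediaType: part.data.mime, filename: part.data.filename });
        }
        break;
      case "reasoning":
        out.push({ type: "reasoning", text: part.data.text });
        break;
      case "step-start":
        out.push({ type: "step-start" });
        break;
      case "tool": {
        const { callID, tool, state } = part.data;
        const type = `tool-${tool}` as ToolPartType;
        if (state.status === "completed") {
          out.push({ type, state: "output-available", toolCallId: callID, input: state.input, output: state.output });
        } else if (state.status === "error") {
          out.push({ type, state: "output-error", toolCallId: callID, input: state.input, errorText: state.error ?? "" });
        }
        // pending and running tool parts are not in the mapping above, so they are skipped.
        break;
      }
      default:
        // step-finish, patch, snapshot, agent, compaction: stored but excluded from UIMessage.
        break;
    }
  }
  return out;
}
```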
Stored part types not mapped to the `UIMessage` view: `step-finish`, `patch`, `snapshot`, `compaction`, and `agent`. These are internal execution events (`step-finish`, `compaction`), tool-execution metadata handled within the `tool` part's state lifecycle (`patch`, `snapshot`), or session-level delegation (`agent`, handled via `sessions.parentId`). They are stored in the `parts` table but excluded from `UIMessage` assembly.
**Why separate `parts` table**: Streaming individual part updates, publishing `message.part.updated` SSE events, and querying parts independently (e.g., "find all tool calls in this session") all require parts to be their own rows, not embedded in a message JSON blob. This is the same pattern opencode uses and it works well at scale (100k+ parts across 24k+ messages in production).
**Parts are flat** — there is no `parentId` column on parts. Sub-agent delegation is handled at the session level (via `sessions.parentId`), not by nesting parts. If nesting becomes necessary in the future, it would require a schema change (adding `parentId` to parts).
**Indexes**: `part_session_idx` on `(session_id)`, `part_message_id_id_idx` on `(message_id, id)` for efficient message loading, and `idx_parts_session_id_type` on `(session_id, type)` for queries like "all tool-call parts in session X".