



status	last_updated
draft	2026-04-20

Table Schemas: Sessions, Messages & Parts

Agent conversation session tables. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../../decisions/. For the session architecture, see ../../agent-sessions.md.

`sessions`

Agent conversation sessions. Every session — whether the LLM runs directly in the hub or in a remote opencode container — stores its data here. The hub is the source of truth; spokes are execution environments.

Column	Type	Notes
commonCols	—	id, metadata, createdAt, updatedAt
accountId	text	FK → accounts.id — Nullable — orphaned sessions preserve conversation history for audit and debugging. See D1 in storage-spec-phase1-resolutions.md.
projectId	text NOT NULL	FK → projects.id (cascade)
workspaceId	text	FK → workspaces.id
parentId	text	FK → sessions.id — Parent session (coordinator relationship). onDelete: SET NULL — deleting a parent session detaches children but preserves them.
slug	text NOT NULL UNIQUE	URL-friendly session identifier, unique across all sessions. `slug` is generated from the session title by slugification: lowercase, whitespace replaced with hyphens, and every other non-alphanumeric character removed. Uniqueness is enforced by the UNIQUE constraint. If a collision occurs, append a short random suffix.
title	text NOT NULL	Session title
status	text NOT NULL	Enum: `idle`, `busy`, `retry`, `archived`. Default: `idle`
version	text NOT NULL	Schema version of the session's `data` column. Default: `'1'`. New fields should be added as optional so that older rows stay readable; `version` advances only when the data format changes in a way readers must account for (a breaking change). The hub uses this for migration-aware reads: for example, version 1 sessions get default values for fields introduced later. This field exists for forward compatibility, so the hub can interpret session data correctly as the schema evolves. It is NOT a concurrency version (for optimistic locking, use `commonCols.updatedAt`).
provider	text	Execution path: `direct` (hub AI SDK) or `opencode` (spoke)
roleName	text	Which role this session fills (e.g., "architect", "implementation-specialist"). Formerly `agentName` in OpenCode. See ADR-012 and agent-roles.md. `roleName` is a free-form string (not a FK constraint). Known role names are defined in the `roles` table, but sessions may use ad-hoc role names. Application code should validate against known roles when available but tolerate unknown values.
data	jsonb	Role-specific metadata (model, tokens, cost, finish reason, etc.)
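The slug rules in the table above can be sketched as follows. This is a minimal illustration, not the hub's implementation: `slugify`, `randomSuffix`, and `uniqueSlug` are hypothetical names, the `"session"` fallback for empty titles is an assumption, and `isTaken` stands in for whatever lookup the storage layer provides.

```typescript
// Hypothetical helpers: the spec defines the rules, not this API.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/\s+/g, "-") // hyphens for runs of whitespace
    .replace(/[^a-z0-9-]/g, "") // drop everything except a-z, 0-9, and hyphens
    .replace(/-+/g, "-") // collapse repeated hyphens
    .replace(/^-|-$/g, ""); // trim a leading or trailing hyphen
}

// Math.random keeps the sketch dependency-free; it is not cryptographically strong.
function randomSuffix(length: number, random: () => number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet.charAt(Math.floor(random() * alphabet.length));
  }
  return out;
}

// `isTaken` is an assumed lookup against existing slugs.
function uniqueSlug(
  title: string,
  isTaken: (slug: string) => boolean,
  random: () => number = Math.random,
): string {
  const base = slugify(title) || "session"; // assumed fallback for empty titles
  let candidate = base;
  while (isTaken(candidate)) {
    candidate = `${base}-${randomSuffix(6, random)}`;
  }
  return candidate;
}
```

A pre-check like `isTaken` can still race with a concurrent insert, so the UNIQUE constraint remains the actual guarantee; the caller would retry with a new suffix if the insert is rejected.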

data boundaries: Execution metadata goes in data (model, tokens, cost, finish reason, resolved permissions). Structured fields like status, provider, roleName are separate columns because they're queried, filtered, and constrained. If a field appears in WHERE clauses or JOINs, it should be a proper column, not buried in JSONB.

Session data shapes: The data JSONB column holds execution-path-specific metadata. For direct sessions: { model, tokens, cost, finish }. For opencode sessions: additional fields from opencode's session model (summary stats, etc.). The data column also holds the resolved permissions for the session (data.scope), which is computed from the intersection of role permissions, account scopes, and spoke type trust level. See agent-sessions.md and agent-roles.md for the full models.
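One way to picture the migration-aware read that the `version` column enables is sketched below. Everything here is illustrative: `readSessionData`, `DEFAULTS_ADDED_IN`, `CURRENT_VERSION`, and `exampleNewField` are hypothetical, and the spec does not define which fields later versions add.

```typescript
type SessionData = Record<string, unknown>;

// Hypothetical registry: defaults for fields introduced at each data version.
const DEFAULTS_ADDED_IN: Record<number, SessionData> = {
  2: { exampleNewField: null }, // made-up field, for illustration only
};
const CURRENT_VERSION = 2; // assumed current version of the data format

// Returns `data` with defaults filled in for every version newer than the stored one.
function readSessionData(version: string, data: SessionData | null): SessionData {
  const stored = Number.parseInt(version, 10);
  if (!Number.isInteger(stored) || stored < 1) {
    throw new Error(`unknown session data version: ${version}`);
  }
  let result: SessionData = { ...(data ?? {}) };
  for (let v = stored + 1; v <= CURRENT_VERSION; v++) {
    // Stored values win over defaults; defaults only fill gaps.
    result = { ...(DEFAULTS_ADDED_IN[v] ?? {}), ...result };
  }
  return result;
}
```

The stored row is left untouched in this sketch; defaults are applied on read only, which keeps old sessions readable without a bulk rewrite.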

Status lifecycle:

idle: Session exists, not currently executing
busy: Session is actively processing (LLM call in progress)
retry: Last execution failed, session pending retry
archived: Session is read-only, no further interaction
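The four statuses can be modelled as a string-literal union in application code. The sketch below is an assumption about how that might look (`SESSION_STATUSES`, `isSessionStatus`, and `isActive` are hypothetical names); `isActive` mirrors the predicate of the `idx_sessions_active` partial index.

```typescript
const SESSION_STATUSES = ["idle", "busy", "retry", "archived"] as const;
type SessionStatus = (typeof SESSION_STATUSES)[number];

// Runtime check for values read from the text column.
function isSessionStatus(value: unknown): value is SessionStatus {
  return (
    typeof value === "string" &&
    (SESSION_STATUSES as readonly string[]).includes(value)
  );
}

// Same set as WHERE status IN ('idle', 'busy', 'retry'): everything except archived.
function isActive(status: SessionStatus): boolean {
  return status !== "archived";
}
```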

Indexes:

unq_sessions_slug: UNIQUE on (slug)
idx_sessions_project_id: on (projectId)
idx_sessions_workspace_id: on (workspaceId)
idx_sessions_status: on (status)
idx_sessions_active: partial on (id) WHERE status IN ('idle', 'busy', 'retry'), to efficiently find active (non-archived) sessions
idx_sessions_account_id: on (accountId)
idx_sessions_role_name: on (roleName)
idx_sessions_parent_id: on (parentId), to find child sessions of a coordinator

`messages`

Messages within sessions. Content is stored separately in the parts table. This follows the opencode pattern: message metadata in one row, parts in separate rows. This enables streaming individual part updates, querying parts independently, and SSE events for message.part.updated.

Column	Type	Notes
commonCols	—	id, metadata, createdAt, updatedAt
sessionId	text NOT NULL IMMUTABLE	FK → sessions.id (cascade) — Never updated after creation.
role	text NOT NULL	`user`, `assistant`, `system`
data	jsonb NOT NULL	Role-specific metadata

Message IDs use UUIDv4 (via commonCols.id). Ordering is handled by the composite index idx_messages_session_id_created_at_id on (session_id, created_at, id). See ADR-003 for the rationale.
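Because UUIDv4 values carry no time information, message order comes from `createdAt`, with `id` as a tiebreaker. The in-memory comparator below sketches the ordering that `ORDER BY created_at, id` produces; `MessageRow`, `compareMessages`, and `orderedMessages` are hypothetical names used only for illustration.

```typescript
interface MessageRow {
  id: string; // UUIDv4, not sortable by time
  sessionId: string;
  createdAt: Date;
}

// Order by creation time first, then by id so that ties are deterministic.
function compareMessages(a: MessageRow, b: MessageRow): number {
  const byTime = a.createdAt.getTime() - b.createdAt.getTime();
  if (byTime !== 0) return byTime;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

function orderedMessages(rows: MessageRow[], sessionId: string): MessageRow[] {
  return rows.filter((r) => r.sessionId === sessionId).sort(compareMessages);
}
```

The `id` tiebreaker does not reflect real chronology when two messages share a timestamp; it only makes the result stable across queries.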

Message data shapes (discriminated by role):

user messages:

{
  time: { created: number },           // epoch ms
  format?: "text" | "json_schema",    // input format hint
  summary?: { title?: string, body?: string, diffs?: FileDiff[] },
  agent?: string,                      // target agent name
  model?: { providerID: string, modelID: string },
  tools?: Record<string, boolean>,     // enabled tools for this turn
}

assistant messages:

{
  time: { created: number, completed?: number },
  parentID?: string,                   // ID of the user message that triggered this turn (a reference inside JSONB, not a DB-enforced FK)
  modelID: string,
  providerID: string,
  agent?: string,
  path?: { cwd: string, root: string },
  cost?: number,
  tokens?: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } },
  finish?: string,                     // "stop", "tool-calls", "length", etc.
  error?: { code: string, message: string }, // typed error if the turn failed
}

system messages:

{
  time: { created: number },
  content: string,                     // system prompt text
}
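Since `role` discriminates the shape of `data`, application code can pair the two in a discriminated union and let the compiler narrow each branch. The sketch below uses trimmed-down versions of the shapes above; `Message` and `describeMessage` are hypothetical names, not part of the spec.

```typescript
// Reduced shapes: only the fields this example reads.
interface UserData { time: { created: number }; agent?: string }
interface AssistantData {
  time: { created: number; completed?: number };
  modelID: string;
  providerID: string;
}
interface SystemData { time: { created: number }; content: string }

type Message =
  | { role: "user"; data: UserData }
  | { role: "assistant"; data: AssistantData }
  | { role: "system"; data: SystemData };

function describeMessage(m: Message): string {
  switch (m.role) {
    case "user":
      return `user message at ${m.data.time.created}`;
    case "assistant": {
      const { created, completed } = m.data.time;
      // `completed` is absent while the turn is still running.
      const took = completed === undefined ? "in progress" : `${completed - created} ms`;
      return `assistant turn (${m.data.providerID}/${m.data.modelID}): ${took}`;
    }
    case "system":
      return `system prompt, ${m.data.content.length} chars`;
  }
}
```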

Compatibility with opencode: The data blob is a superset of opencode's InfoData. When importing an opencode session, the opencode-specific fields (parentID, path, modelID, providerID, cost, tokens, finish) map directly. When importing from a hub-direct AI SDK session, the AI SDK UIMessage fields are projected into the same shape.

Compatibility with AI SDK: The AI SDK's UIMessage format (role + parts array) is assembled from these tables via a JOIN query. Storage is normalized; the API presents the denormalized view. No format conversion needed.

`parts`

Message parts — the actual content of the conversation. Each part has a type discriminator and type-specific content in the data column. Parts are ordered by their id within a message, using sortable timestamp-based IDs (not commonCols.id).

Important: The id column for parts uses a sortable ID scheme (not UUIDv4 from commonCols). Opencode uses prefix-based sortable IDs like prt_{timestamp_hex}{random} that give chronological ordering. This enables ORDER BY id ASC within a message without needing a separate position column. The implementation should use a monotonic ID generator that produces lexicographically sortable IDs.
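A monotonic generator along these lines could look like the sketch below. The layout (12 hex digits of epoch milliseconds, a 4-digit per-millisecond counter, 8 random hex digits) is an assumption modelled on the `prt_{timestamp_hex}{random}` shape, not opencode's exact format, and `createPartIdGenerator` is a hypothetical name.

```typescript
// The clock and random source are injectable so the generator can be tested.
function createPartIdGenerator(
  now: () => number = Date.now,
  random: () => number = Math.random,
): () => string {
  let lastMs = -1;
  let counter = 0;
  return function nextPartId(): string {
    let ms = Math.max(now(), lastMs); // never move backwards if the clock does
    if (ms === lastMs) {
      counter++;
      if (counter > 0xffff) {
        ms += 1; // counter exhausted: borrow the next millisecond
        counter = 0;
      }
    } else {
      counter = 0;
    }
    lastMs = ms;
    // Fixed-width lowercase hex keeps lexicographic order equal to numeric order.
    const time = ms.toString(16).padStart(12, "0");
    const seq = counter.toString(16).padStart(4, "0");
    const rand = Math.floor(random() * 0x100000000).toString(16).padStart(8, "0");
    return `prt_${time}${seq}${rand}`;
  };
}
```

The ordering guarantee holds only within one generator instance. IDs minted in the same millisecond by different processes are ordered arbitrarily relative to each other.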

The sessionId column on parts is a deliberate denormalization of message.sessionId — it allows direct queries like "all parts for a session" without joining through messages. sessionId on both messages and parts is IMMUTABLE after creation. It must never be updated. This is enforced by application logic, not a DB trigger. When inserting a part, read the message's sessionId and set it on the part within the same transaction. Direct SQL must not update sessionId on existing rows.
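The application-level rules for `sessionId` can be sketched as two small helpers. Both are hypothetical (`buildPartRow`, `assertSessionIdUntouched`), and the surrounding transaction is omitted: in real code the message read and the part insert would run inside the same transaction.

```typescript
interface MessageRef { id: string; sessionId: string }

interface PartRow {
  id: string;
  messageId: string;
  sessionId: string;
  type: string;
  data: Record<string, unknown>;
}

// sessionId is always copied from the message; callers cannot supply their own.
function buildPartRow(
  message: MessageRef,
  id: string,
  type: string,
  data: Record<string, unknown>,
): PartRow {
  return { id, messageId: message.id, sessionId: message.sessionId, type, data };
}

// Guard for update paths (e.g. on messages): reject any patch that touches sessionId.
function assertSessionIdUntouched(patch: Record<string, unknown>): void {
  if ("sessionId" in patch) {
    throw new Error("sessionId is immutable after creation");
  }
}
```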

Column	Type	Notes
id	text PK NOT NULL	Sortable timestamp-based ID (not commonCols.id)
metadata	jsonb	defaults to `{}`
createdAt	timestamp with tz NOT NULL	defaults to `now()`
updatedAt	timestamp with tz NOT NULL	defaults to `now()`, `$onUpdate(() => new Date())`
messageId	text NOT NULL	FK → messages.id (cascade)
sessionId	text NOT NULL IMMUTABLE	FK → sessions.id (cascade, denormalized for direct queries) — Never updated after creation.
type	text NOT NULL	Part type discriminator (see below)
data	jsonb NOT NULL	Type-specific content

Parts are immutable after creation. updatedAt is set on creation and should never change afterwards: the $onUpdate hook on updatedAt is a no-op for parts because insert-only operations don't trigger it. If a part needs correction, insert a new part (e.g., a correction or amendment) rather than updating the existing one.

Part types and their data shapes:

The type field determines the shape of data. Our part types are a subset of opencode's MessageV2.Part discriminated union, expanded with AI SDK compatibility types. The types we include are:

type	Description	data shape
`text`	Main text content (user or assistant)	`{ text: string, synthetic?: boolean, ignored?: boolean, time?: { start: number, end: number }, metadata?: Record<string, unknown> }`
`reasoning`	Chain-of-thought / extended thinking	`{ text: string, metadata?: Record<string, unknown>, time: { start: number, end: number } }`
`tool`	Tool invocation with lifecycle state	`{ callID: string, tool: string, state: ToolState }` — see below
`step-start`	Beginning of an agentic step	`{ snapshot?: string }` — git tree hash
`step-finish`	End of an agentic step with cost accounting	`{ reason: string, snapshot?: string, cost?: number, tokens: { input: number, output: number, reasoning?: number, cache?: { read: number, write: number } } }`
`file`	File attachment	`{ mime: string, filename?: string, url: string, source?: FileSource }`
`patch`	Git patch applied during tool execution	`{ hash: string, files: string[] }`
`snapshot`	Git tree hash reference	`{ snapshot: string }`
`agent`	Sub-agent delegation (e.g., @reviewer)	`{ name: string, source?: { value: string, start: number, end: number } }`
`compaction`	Context window compaction marker	`{ auto: boolean, overflow?: boolean }`

Tool state discriminated union (ToolState):

type ToolState =
  | { status: "pending", input: Record<string, unknown>, raw: string }
  | { status: "running", input: Record<string, unknown>, title?: string, metadata?: Record<string, unknown>, time: { start: number } }
  | { status: "completed", input: Record<string, unknown>, output: string, title: string, metadata: Record<string, unknown>, time: { start: number, end: number, compacted?: boolean }, attachments?: FilePartData[] }
  | { status: "error", input: Record<string, unknown>, error: string, metadata?: Record<string, unknown>, time: { start: number, end: number } }
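Because `status` discriminates the union, a `switch` can narrow each state and the compiler can verify that every case is handled. The sketch below uses a reduced copy of `ToolState` with only the fields it reads; `toolDurationMs` and `assertNever` are hypothetical helpers.

```typescript
// Reduced copy of ToolState: optional and unused fields are omitted.
type ToolState =
  | { status: "pending"; input: Record<string, unknown>; raw: string }
  | { status: "running"; input: Record<string, unknown>; time: { start: number } }
  | { status: "completed"; input: Record<string, unknown>; output: string; time: { start: number; end: number } }
  | { status: "error"; input: Record<string, unknown>; error: string; time: { start: number; end: number } };

// Compile-time exhaustiveness check: adding a new status makes the switch below fail to type-check.
function assertNever(x: never): never {
  throw new Error(`unhandled tool status: ${JSON.stringify(x)}`);
}

// Duration is only defined for terminal states, which carry both start and end.
function toolDurationMs(state: ToolState): number | undefined {
  switch (state.status) {
    case "pending":
    case "running":
      return undefined;
    case "completed":
    case "error":
      return state.time.end - state.time.start;
    default:
      return assertNever(state);
  }
}
```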

File source types:

type FileSource =
  | { type: "file", path: string, text: { value: string, start: number, end: number } }
  | { type: "symbol", path: string, name: string, kind: number, range: LSPLikeRange, text: { value: string, start: number, end: number } }
  | { type: "resource", clientName: string, uri: string, text: { value: string, start: number, end: number } }

type FilePartData = {
  mime: string;
  filename?: string;
  url: string;
  source?: FileSource;
};

AI SDK UIMessage compatibility: The API assembles UIMessage from messages + parts via JOIN. The mapping is:

text (not ignored) → { type: "text", text }
file (non-text, non-directory) → { type: "file", url, mediaType, filename }
reasoning → { type: "reasoning", text }
step-start → { type: "step-start" }
tool (completed) → { type: "tool-{name}", state: "output-available", toolCallId, input, output }
tool (error) → { type: "tool-{name}", state: "output-error", toolCallId, input, errorText }

Stored part types not mapped to the AI SDK UIMessage view: step-finish, patch, snapshot, compaction, agent. These are either internal SDK events (step-finish, compaction), tool-execution metadata handled within the tool part's state lifecycle (patch, snapshot), or session-level delegation (agent, handled via sessions.parentId). They are stored in the parts table but excluded from the UIMessage assembly.
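The mapping above can be sketched as a pure function over stored parts. The types are reduced local stand-ins, not the AI SDK's own definitions. Two details are assumptions: "non-text, non-directory" is decided here by MIME type (`text/plain` and `application/x-directory` are skipped), and pending or running tool parts are omitted because the mapping lists only completed and error states.

```typescript
type Input = Record<string, unknown>;
type ToolStateLite =
  | { status: "pending" | "running"; input: Input }
  | { status: "completed"; input: Input; output: string }
  | { status: "error"; input: Input; error: string };

type PartBody =
  | { type: "text"; data: { text: string; ignored?: boolean } }
  | { type: "reasoning"; data: { text: string } }
  | { type: "file"; data: { mime: string; filename?: string; url: string } }
  | { type: "step-start"; data: Input }
  | { type: "tool"; data: { callID: string; tool: string; state: ToolStateLite } }
  | { type: "step-finish" | "patch" | "snapshot" | "agent" | "compaction"; data: Input };
type StoredPart = PartBody & { id: string };

type UIPart =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "file"; url: string; mediaType: string; filename?: string }
  | { type: "step-start" }
  | { type: `tool-${string}`; state: "output-available"; toolCallId: string; input: Input; output: string }
  | { type: `tool-${string}`; state: "output-error"; toolCallId: string; input: Input; errorText: string };

function toUIPart(part: PartBody): UIPart | null {
  switch (part.type) {
    case "text":
      return part.data.ignored ? null : { type: "text", text: part.data.text };
    case "reasoning":
      return { type: "reasoning", text: part.data.text };
    case "file": {
      const { mime, url, filename } = part.data;
      // Assumed MIME-based test for "non-text, non-directory".
      if (mime === "text/plain" || mime === "application/x-directory") return null;
      return { type: "file", url, mediaType: mime, ...(filename !== undefined ? { filename } : {}) };
    }
    case "step-start":
      return { type: "step-start" };
    case "tool": {
      const { callID, tool, state } = part.data;
      if (state.status === "completed") {
        return { type: `tool-${tool}`, state: "output-available", toolCallId: callID, input: state.input, output: state.output };
      }
      if (state.status === "error") {
        return { type: `tool-${tool}`, state: "output-error", toolCallId: callID, input: state.input, errorText: state.error };
      }
      return null; // pending/running: not in the mapping list (assumption)
    }
    default:
      return null; // step-finish, patch, snapshot, agent, compaction
  }
}

// Parts are sorted by their sortable id before mapping, matching ORDER BY id ASC.
function assembleUIMessage(
  message: { id: string; role: "user" | "assistant" | "system" },
  parts: StoredPart[],
) {
  const uiParts: UIPart[] = [];
  const sorted = [...parts].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  for (const part of sorted) {
    const ui = toUIPart(part);
    if (ui !== null) uiParts.push(ui);
  }
  return { id: message.id, role: message.role, parts: uiParts };
}
```

In the hub this would run over rows returned by the messages-to-parts JOIN; here the rows are passed in directly to keep the sketch self-contained.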

Why separate parts table: Streaming individual part updates, publishing message.part.updated SSE events, and querying parts independently (e.g., "find all tool calls in this session") all require parts to be their own rows, not embedded in a message JSON blob. This is the same pattern opencode uses and it works well at scale (100k+ parts across 24k+ messages in production).

Parts are flat — there is no parentId column on parts. Sub-agent delegation is handled at the session level (via sessions.parentId), not by nesting parts. If nesting becomes necessary in the future, it would require a schema change (adding parentId to parts).

Indexes: part_session_idx on (session_id), part_message_id_id_idx on (message_id, id) for efficient message loading, and idx_parts_session_id_type on (session_id, type) for queries like "all tool-call parts in session X".
