
status: draft
last_updated: 2026-04-19

Table Schemas: External Services

Client and credential tables for outbound service connections. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../../decisions/.

clients

External service registrations — "who we connect to." A client is any service the hub calls: LLM providers (Anthropic, OpenAI, OpenRouter), VCS (Gitea), compute (Vast.ai), MCP servers, JMAP, custom REST APIs. The config column holds the validated connection shape (URLs, headers, auth mechanism) without credentials. Credentials live in client_secrets.

| Column | Type | Notes |
| --- | --- | --- |
| commonCols | | id, metadata, createdAt, updatedAt |
| name | text NOT NULL UNIQUE | Identifier (anthropic, gitea, openrouter, vast-ai) |
| type | text NOT NULL | Client type: llm-provider, vcs, compute, mcp-server, custom |
| config | jsonb NOT NULL | Validated config instance, checked against the TypeBox schema for this type on write |
| enabled | boolean NOT NULL DEFAULT true | Disable without deleting |
| ownerId | text NOT NULL | FK → accounts.id; who configured this client |
| orgId | text | FK → organizations.id (nullable; some clients are personal, not org-scoped) |

Validation timing: Config is validated on write (API handler layer) using the TypeBox schema for the client type. Individual reads are not validated. Instead, a startup validation pass logs warnings for rows that no longer match the current schema; it does not block reads.

config boundaries: Connection configuration goes in config (URLs, headers, auth mechanism). This is validated against the TypeBox schema for the client type. Secrets are NEVER in config — they go in client_secrets.

Indexes: unq_clients_name UNIQUE on (name), idx_clients_type on (type), idx_clients_owner_id on (ownerId), idx_clients_org_id on (orgId).

Config schema registry (in code, not DB): Each client type maps to a TypeBox schema that validates config on write:

const clientConfigSchemas: Record<string, TSchema> = {
  "llm-provider": LLMProviderConfig,    // baseUrl, defaultModel, models[], auth mechanism
  "vcs": VCSClientConfig,               // baseUrl, specUrl, namespace, auth mechanism
  "compute": ComputeConfig,              // endpoint, region, auth mechanism
  "mcp-server": MCPServerConfig,         // command/url + args/headers (from hub config types)
  "custom": HTTPServiceConfig,           // baseUrl, headers, auth (from @alkdev/operations/from-openapi)
};

Schema evolution contract: New fields in client config schemas MUST be Type.Optional(). Breaking changes MUST use a new client type (e.g., llm-provider-v2). This ensures existing DB rows remain valid across deployments. Consider adding configSchemaVersion to metadata in a future phase if breaking changes become common. For now, optional fields handle forward compatibility.

Validation chain: API handler validates → Drizzle insert → DB stores. Direct SQL bypasses application validation — this is a known risk documented in README.md.
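
The write-time check and the non-blocking startup pass can be sketched as follows. This is a minimal illustration, not the hub's implementation: the hand-written `ConfigCheck` functions stand in for the TypeBox schemas in clientConfigSchemas, and `assertValidConfig`, `startupValidationPass`, and `ClientRow` are hypothetical names.

```typescript
// Hypothetical stand-in for a TypeBox schema check: returns a list of problems.
type ConfigCheck = (config: unknown) => string[];

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Hand-written checks standing in for the schemas in clientConfigSchemas.
const configChecks: Record<string, ConfigCheck> = {
  vcs: (config) => {
    if (!isRecord(config)) return ["config must be an object"];
    const problems: string[] = [];
    if (typeof config.baseUrl !== "string") problems.push("baseUrl must be a string");
    if (typeof config.namespace !== "string") problems.push("namespace must be a string");
    return problems;
  },
};

// Write path (API handler layer): reject invalid config before the insert.
function assertValidConfig(type: string, config: unknown): void {
  const check = configChecks[type];
  if (!check) throw new Error(`unknown client type: ${type}`);
  const problems = check(config);
  if (problems.length > 0) {
    throw new Error(`invalid ${type} config: ${problems.join("; ")}`);
  }
}

interface ClientRow {
  name: string;
  type: string;
  config: unknown;
}

// Startup pass: collect warnings for rows that no longer match; never throws,
// so stale rows stay readable.
function startupValidationPass(rows: ClientRow[]): string[] {
  const warnings: string[] = [];
  for (const row of rows) {
    const check = configChecks[row.type];
    const problems = check ? check(row.config) : [`unknown client type: ${row.type}`];
    if (problems.length > 0) warnings.push(`${row.name}: ${problems.join("; ")}`);
  }
  return warnings;
}
```

The asymmetry is deliberate in this sketch: writes fail fast, while the startup pass only reports, which matches the rule that schema drift must not block reads.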

Wiring config to secrets: The config contains secretKey (or envSecretKeys) fields that point to named secrets in client_secrets. The config knows HOW to auth, the secrets table holds WHAT to auth with.

Example config for a Gitea client:

{
  "baseUrl": "https://git.alk.dev/api/v1",
  "specUrl": "https://git.alk.dev/swagger.v1.json",
  "namespace": "gitea",
  "auth": { "type": "apiKey", "headerName": "Authorization", "prefix": "token ", "secretKey": "api_password" }
}

Example config for an MCP server:

{
  "command": "/usr/local/bin/mcp-server",
  "args": ["--port", "3000"],
  "envSecretKeys": { "OPENAI_API_KEY": "openai_key" }
}

Runtime resolution: On startup, load client → validate config → resolve secrets from client_secrets by secretKey wiring → merge config + decrypted secrets → create connection (MCP client, OpenAPI operations, etc.).
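
The "resolve secrets by secretKey wiring" step can be sketched like this, using the two example configs above. It assumes the secrets for one client have already been decrypted into a map keyed by client_secrets.key. The names `ApiKeyAuth`, `DecryptedSecrets`, `resolveAuthHeaders`, and `resolveEnv` are hypothetical, and treating `prefix` as optional is an assumption.

```typescript
// Shape of the "auth" object from the Gitea example.
interface ApiKeyAuth {
  type: "apiKey";
  headerName: string;
  prefix?: string; // assumed optional
  secretKey: string;
}

// Decrypted secrets for one client, keyed by client_secrets.key.
type DecryptedSecrets = Map<string, string>;

function requireSecret(secrets: DecryptedSecrets, key: string): string {
  const value = secrets.get(key);
  if (value === undefined) {
    throw new Error(`missing secret "${key}" in client_secrets`);
  }
  return value;
}

// config.auth says HOW to authenticate; the secret supplies WHAT to send.
function resolveAuthHeaders(
  auth: ApiKeyAuth,
  secrets: DecryptedSecrets,
): Record<string, string> {
  const secret = requireSecret(secrets, auth.secretKey);
  return { [auth.headerName]: `${auth.prefix ?? ""}${secret}` };
}

// envSecretKeys maps an environment variable name to a secret key name.
function resolveEnv(
  envSecretKeys: Record<string, string>,
  secrets: DecryptedSecrets,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [envName, secretKey] of Object.entries(envSecretKeys)) {
    env[envName] = requireSecret(secrets, secretKey);
  }
  return env;
}

// Usage with placeholder secret values.
const demoSecrets: DecryptedSecrets = new Map([
  ["api_password", "s3cr3t"],
  ["openai_key", "sk-example"],
]);
const giteaHeaders = resolveAuthHeaders(
  { type: "apiKey", headerName: "Authorization", prefix: "token ", secretKey: "api_password" },
  demoSecrets,
);
const mcpEnv = resolveEnv({ OPENAI_API_KEY: "openai_key" }, demoSecrets);
```

A missing secret fails loudly here instead of producing an unauthenticated connection.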

client_secrets

Encrypted credential store — "how we authenticate to them." Each secret is an encrypted value (API key, password, OAuth token, SSH key) associated with a client. Stored as AES-256-GCM encrypted data via src/crypto.ts.

| Column | Type | Notes |
| --- | --- | --- |
| commonCols | | id, metadata, createdAt, updatedAt |
| clientId | text NOT NULL | FK → clients.id (cascade) |
| key | text NOT NULL | Secret key name: api_key, api_password, oauth_credentials, ssh_key, etc. |
| value | jsonb NOT NULL | Encrypted payload: EncryptedData { keyVersion, salt, iv, data } from crypto.ts |
| keyVersion | integer NOT NULL DEFAULT 1 | Encryption key version for rotation |
| expiresAt | timestamp with tz | When the secret expires (e.g., OAuth token TTL). Null = no expiry. |
| lastUsedAt | timestamp with tz | When the secret was last used to authenticate |

Unique constraint: (clientId, key), so each client has at most one secret per key name.

Indexes: unq_client_secrets_client_key UNIQUE on (clientId, key), idx_client_secrets_expires_at on (expiresAt).

Encrypted data structure (EncryptedData from crypto.ts):

interface EncryptedData {
  keyVersion: number;   // matches client_secrets.keyVersion
  salt: string;         // base64, 16 bytes (PBKDF2)
  iv: string;           // base64, 12 bytes (AES-GCM)
  data: string;         // base64, AES-256-GCM ciphertext
}
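
To show how the fields of EncryptedData fit together, here is a minimal sketch built on Node's `node:crypto` module (also importable from Deno). It is not the contents of src/crypto.ts. The PBKDF2 iteration count, the SHA-256 digest, appending the 16-byte GCM tag to the ciphertext, and the extra `keyVersion` parameter are all assumptions, as are the names `encryptSketch` and `decryptSketch`.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

interface EncryptedData {
  keyVersion: number;
  salt: string; // base64, 16 bytes (PBKDF2)
  iv: string;   // base64, 12 bytes (AES-GCM)
  data: string; // base64, ciphertext followed by the GCM tag (assumed layout)
}

const PBKDF2_ITERATIONS = 100_000; // assumed; not stated in this document
const TAG_BYTES = 16;              // GCM authentication tag length

// Derive a per-secret 256-bit AES key from the data encryption key and salt.
function deriveKey(dataEncryptionKey: string, salt: Buffer): Buffer {
  const keyMaterial = Buffer.from(dataEncryptionKey, "base64");
  return pbkdf2Sync(keyMaterial, salt, PBKDF2_ITERATIONS, 32, "sha256");
}

function encryptSketch(
  secret: string,
  dataEncryptionKey: string,
  keyVersion: number,
): EncryptedData {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(dataEncryptionKey, salt), iv);
  const body = Buffer.concat([cipher.update(secret, "utf8"), cipher.final()]);
  const data = Buffer.concat([body, cipher.getAuthTag()]);
  return {
    keyVersion,
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptSketch(value: EncryptedData, dataEncryptionKey: string): string {
  const salt = Buffer.from(value.salt, "base64");
  const iv = Buffer.from(value.iv, "base64");
  const raw = Buffer.from(value.data, "base64");
  const body = raw.subarray(0, raw.length - TAG_BYTES);
  const tag = raw.subarray(raw.length - TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(dataEncryptionKey, salt), iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match (tampered data or wrong key).
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

A fresh salt and IV per secret means two rows holding the same plaintext still produce different ciphertexts.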

Encryption flow:

  1. Raw secret (API key, password) → crypto.encrypt(secret, dataEncryptionKey) → EncryptedData
  2. Store as JSONB in value
  3. On use: crypto.decrypt(value, dataEncryptionKey) → raw secret
  4. Data encryption keys come from hub config (see hub-config.md for the two-layer key model) as a comma-separated list of version:base64key pairs (e.g., v1:YmFzZTY0a2V5, v2:Zm9yYmFyYmF6). The list is stored in the config file's encryptionKeys field, which is encrypted with the Docker-secret-provisioned master key. Each key is generated once per version via crypto.generateEncryptionKey(). The first key in the list is the "current" key used for new encryptions; every key in the list remains available for decryption, which is what allows key rotation. No env vars are used for secrets; see ADR-008 (revised).
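
Parsing the encryptionKeys list into a key ring might look like the sketch below. It assumes that a label of the form vN maps to the integer keyVersion N, which this document implies but does not state. `KeyRing` and `parseEncryptionKeys` are hypothetical names.

```typescript
interface KeyRing {
  currentVersion: number;    // version of the first key in the list
  keys: Map<number, string>; // version → base64 key, all usable for decryption
}

function parseEncryptionKeys(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;

  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue;

    const sep = trimmed.indexOf(":");
    const label = sep < 0 ? trimmed : trimmed.slice(0, sep);
    const key = sep < 0 ? "" : trimmed.slice(sep + 1);
    const match = /^v(\d+)$/.exec(label);
    if (!match || key === "") {
      // Report only the label so key material never reaches the logs.
      throw new Error(`malformed key entry "${label}": expected vN:base64key`);
    }

    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`duplicate key version: ${label}`);
    keys.set(version, key);

    // The first key in the list is the current key for new encryptions.
    if (currentVersion === undefined) currentVersion = version;
  }

  if (currentVersion === undefined) throw new Error("encryptionKeys is empty");
  return { currentVersion, keys };
}

// Usage: v2 is listed first, so it is used for new encryptions; v1 stays
// available for decrypting older rows.
const demoRing = parseEncryptionKeys("v2:Zm9yYmFyYmF6, v1:YmFzZTY0a2V5");
```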

Secret format convention: Most secrets are plain strings (API keys, passwords). Complex secrets (OAuth tokens) are JSON objects JSON.stringify()'d before encryption. The key name indicates the format: api_key = string, oauth_credentials = JSON.
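
A small sketch of this convention follows. The set of JSON-format key names is an assumption (this document names only oauth_credentials), and `serializeSecret` and `deserializeSecret` are hypothetical helpers applied before encryption and after decryption respectively.

```typescript
// Assumption: key names whose plaintext is a JSON document.
const JSON_SECRET_KEYS = new Set<string>(["oauth_credentials"]);

// Turn a secret into the string that gets encrypted.
function serializeSecret(key: string, secret: unknown): string {
  if (JSON_SECRET_KEYS.has(key)) return JSON.stringify(secret);
  if (typeof secret !== "string") {
    throw new Error(`secret "${key}" must be a plain string`);
  }
  return secret;
}

// Turn a decrypted string back into the shape callers expect.
function deserializeSecret(key: string, plaintext: string): unknown {
  return JSON_SECRET_KEYS.has(key) ? JSON.parse(plaintext) : plaintext;
}
```

Keying the format off the secret name keeps the stored payload a single encrypted string in both cases.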

Key rotation protocol:

  • On read: Decrypt with the key version indicated by client_secrets.keyVersion. All key versions in the data encryption key ring (from hub config, see hub-config.md) are available for decryption.
  • On write (new secret): Encrypt with the current key version (the first key in the encryption keys list from hub config).
  • Re-encryption: Decrypt with the old key version → encrypt with the current key → UPDATE value and keyVersion together in a single DB transaction. If the process crashes before the UPDATE commits, the secret stays readable: the row still references the old keyVersion, and the old key remains in the key ring until rotation is complete.
  • Background sweep: A background job SHOULD periodically re-encrypt secrets that are still encrypted with old key versions. Until re-encryption completes, those secrets remain vulnerable if the old key is compromised. Key rotation for data encryption keys is independent of master key rotation; see hub-config.md for the two-layer key model.
  • Error handling: If a key version referenced by client_secrets.keyVersion is not found in the data encryption key ring, log an error and skip re-encryption. Alert the operator — this indicates a missing key that could cause data loss.
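
The rotation rules above can be sketched as a pure function over rows. The `Cipher` interface stands in for crypto.encrypt and crypto.decrypt and is injected, so the sketch makes no claim about their real signatures. `SecretRow`, `KeyRing`, `reencryptRow`, and `sweep` are hypothetical names, and persisting each returned row in its own transaction is left to the caller.

```typescript
interface EncryptedData { keyVersion: number; salt: string; iv: string; data: string }
interface SecretRow { id: string; keyVersion: number; value: EncryptedData }
interface KeyRing { currentVersion: number; keys: Map<number, string> }

// Stand-in for crypto.encrypt / crypto.decrypt (signatures assumed).
interface Cipher {
  encrypt(secret: string, key: string, keyVersion: number): EncryptedData;
  decrypt(value: EncryptedData, key: string): string;
}

type ReencryptResult =
  | { status: "current" }
  | { status: "missing-key"; keyVersion: number }
  | { status: "reencrypted"; row: SecretRow };

function reencryptRow(row: SecretRow, ring: KeyRing, cipher: Cipher): ReencryptResult {
  if (row.keyVersion === ring.currentVersion) return { status: "current" };

  const oldKey = ring.keys.get(row.keyVersion);
  if (oldKey === undefined) return { status: "missing-key", keyVersion: row.keyVersion };

  const currentKey = ring.keys.get(ring.currentVersion);
  if (currentKey === undefined) throw new Error("current key version is not in the key ring");

  // Decrypt with the row's key version, encrypt with the current key.
  const secret = cipher.decrypt(row.value, oldKey);
  const value = cipher.encrypt(secret, currentKey, ring.currentVersion);

  // value and keyVersion change together; the caller writes both in one UPDATE.
  return { status: "reencrypted", row: { ...row, keyVersion: ring.currentVersion, value } };
}

// Background sweep: returns the rows to UPDATE; rows with a missing key are
// logged and skipped, leaving the stored data untouched.
function sweep(
  rows: SecretRow[],
  ring: KeyRing,
  cipher: Cipher,
  logError: (message: string) => void,
): SecretRow[] {
  const updates: SecretRow[] = [];
  for (const row of rows) {
    const result = reencryptRow(row, ring, cipher);
    if (result.status === "missing-key") {
      logError(`secret ${row.id}: key version ${result.keyVersion} not in key ring; skipping`);
    } else if (result.status === "reencrypted") {
      updates.push(result.row);
    }
  }
  return updates;
}
```

Because the original row is never mutated, a crash at any point before the UPDATE leaves the old ciphertext and old keyVersion in place.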