---
status: draft
last_updated: 2026-04-19
---
# Table Schemas: External Services
Client and credential tables for outbound service connections. For cross-cutting reference (cascade behavior, index reference, status enums, relations), see [table-reference.md](./table-reference.md). For design decisions, see [../../../decisions/](../../../decisions/).
### `clients`
External service registrations — "who we connect to." A client is any service the hub calls: LLM providers (Anthropic, OpenAI, OpenRouter), VCS (Gitea), compute (Vast.ai), MCP servers, JMAP, custom REST APIs. The `config` column holds the validated connection shape (URLs, headers, auth mechanism) **without credentials**. Credentials live in `client_secrets`.
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| name | text NOT NULL UNIQUE | Identifier (`anthropic`, `gitea`, `openrouter`, `vast-ai`) |
| type | text NOT NULL | Client type: `llm-provider`, `vcs`, `compute`, `mcp-server`, `custom` |
| config | jsonb NOT NULL | Validated config instance. **Validation timing**: the API handler layer validates `config` on write against the TypeBox schema for the client `type`. A validation pass at startup logs warnings for rows that no longer match the current schema; it does not block reads. |
| enabled | boolean NOT NULL DEFAULT true | Disable without deleting |
| ownerId | text NOT NULL | FK → accounts.id — who configured this client |
| orgId | text | FK → organizations.id (nullable — some clients are personal, not org-scoped) |
**config boundaries**: Connection configuration goes in `config` (URLs, headers, auth mechanism). This is validated against the TypeBox schema for the client `type`. Secrets are NEVER in `config` — they go in `client_secrets`.
**Indexes**: `unq_clients_name` UNIQUE on `(name)`, `idx_clients_type` on `(type)`, `idx_clients_owner_id` on `(ownerId)`, `idx_clients_org_id` on `(orgId)`.
**Config schema registry** (in code, not DB): Each client `type` maps to a TypeBox schema that validates `config` on write:
```ts
const clientConfigSchemas: Record<string, TSchema> = {
  "llm-provider": LLMProviderConfig, // baseUrl, defaultModel, models[], auth mechanism
  "vcs": VCSClientConfig,            // baseUrl, specUrl, namespace, auth mechanism
  "compute": ComputeConfig,          // endpoint, region, auth mechanism
  "mcp-server": MCPServerConfig,     // command/url + args/headers (from hub config types)
  "custom": HTTPServiceConfig,       // baseUrl, headers, auth (from @alkdev/operations/from-openapi)
};
```
**Schema evolution contract**: New fields in client config schemas MUST be `Type.Optional()`. Breaking changes MUST use a new client `type` (e.g., `llm-provider-v2`). This ensures existing DB rows remain valid across deployments. Consider adding `configSchemaVersion` to `metadata` in a future phase if breaking changes become common. For now, optional fields handle forward compatibility.
**Validation chain**: API handler validates → Drizzle insert → DB stores. Direct SQL bypasses application validation — this is a known risk documented in README.md.
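The write-path check described above can be sketched as follows. This is a minimal illustration, not the hub's handler code: to stay dependency-free it uses plain predicate functions as stand-ins for the TypeBox schemas, and `ConfigValidator`, `validators`, and `validateClientConfig` are hypothetical names.

```typescript
// Stand-in for a TypeBox schema check: returns true when the config is valid.
type ConfigValidator = (config: unknown) => boolean;

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Hypothetical registry keyed by client type. A Map avoids accidental hits on
// inherited object properties such as "constructor".
const validators = new Map<string, ConfigValidator>([
  ["vcs", (c) => isRecord(c) && typeof c.baseUrl === "string" && typeof c.namespace === "string"],
  ["mcp-server", (c) => isRecord(c) && (typeof c.command === "string" || typeof c.url === "string")],
]);

type ValidationResult = { ok: true } | { ok: false; error: string };

// Intended to run in the API handler before the Drizzle insert.
// Unknown client types are rejected rather than stored unvalidated.
function validateClientConfig(type: string, config: unknown): ValidationResult {
  const validate = validators.get(type);
  if (validate === undefined) {
    return { ok: false, error: `unknown client type: ${type}` };
  }
  if (!validate(config)) {
    return { ok: false, error: `config does not match schema for type: ${type}` };
  }
  return { ok: true };
}
```

Returning a result object instead of throwing lets the handler map a failed check to a 4xx response without a try/catch; that is a design choice of this sketch, not something the spec requires.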
**Wiring config to secrets**: The config contains `secretKey` (or `envSecretKeys`) fields that point to named secrets in `client_secrets`. The config knows HOW to auth, the secrets table holds WHAT to auth with.
Example config for a Gitea client:
```json
{
  "baseUrl": "https://git.alk.dev/api/v1",
  "specUrl": "https://git.alk.dev/swagger.v1.json",
  "namespace": "gitea",
  "auth": { "type": "apiKey", "headerName": "Authorization", "prefix": "token ", "secretKey": "api_password" }
}
```
Example config for an MCP server:
```json
{
  "command": "/usr/local/bin/mcp-server",
  "args": ["--port", "3000"],
  "envSecretKeys": { "OPENAI_API_KEY": "openai_key" }
}
```
**Runtime resolution**: On startup, load client → validate config → resolve secrets from `client_secrets` by `secretKey` wiring → merge config + decrypted secrets → create connection (MCP client, OpenAPI operations, etc.).
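A minimal sketch of the "resolve secrets, then merge" step, assuming the secrets for one client have already been loaded and decrypted into a map keyed by `client_secrets.key`. The `ApiKeyAuth` shape follows the Gitea example above; `SecretMap`, `requireSecret`, `resolveAuthHeaders`, and `resolveEnv` are hypothetical helpers introduced for illustration.

```typescript
// Shape taken from the "auth" object in the Gitea example config.
interface ApiKeyAuth {
  type: "apiKey";
  headerName: string;
  prefix?: string;
  secretKey: string; // name of a client_secrets row for this client
}

// Decrypted secrets for a single client, keyed by client_secrets.key.
type SecretMap = Map<string, string>;

// Fails loudly when the config points at a secret that does not exist.
function requireSecret(secrets: SecretMap, key: string): string {
  const value = secrets.get(key);
  if (value === undefined) throw new Error(`missing secret: ${key}`);
  return value;
}

// Builds the auth header for an HTTP-style client (VCS, custom REST, ...).
function resolveAuthHeaders(auth: ApiKeyAuth, secrets: SecretMap): Record<string, string> {
  const secret = requireSecret(secrets, auth.secretKey);
  return { [auth.headerName]: `${auth.prefix ?? ""}${secret}` };
}

// Builds the environment for a spawned MCP server from `envSecretKeys`.
function resolveEnv(
  envSecretKeys: Record<string, string>,
  secrets: SecretMap,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [envName, secretKey] of Object.entries(envSecretKeys)) {
    env[envName] = requireSecret(secrets, secretKey);
  }
  return env;
}
```

With the Gitea example and a secret named `api_password`, `resolveAuthHeaders` yields an `Authorization` header of the form `token <secret>`; the stored config itself never contains the secret value.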
### `client_secrets`
Encrypted credential store — "how we authenticate to them." Each secret is an encrypted value (API key, password, OAuth token, SSH key) associated with a client. Stored as AES-256-GCM encrypted data via `src/crypto.ts`.
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| clientId | text NOT NULL | FK → clients.id (cascade) |
| key | text NOT NULL | Secret key name: `api_key`, `api_password`, `oauth_credentials`, `ssh_key`, etc. |
| value | jsonb NOT NULL | Encrypted payload — `EncryptedData { keyVersion, salt, iv, data }` from crypto.ts |
| keyVersion | integer NOT NULL DEFAULT 1 | Encryption key version for rotation |
| expiresAt | timestamp with tz | When the secret expires (e.g., OAuth token TTL). Null = no expiry. |
| lastUsedAt | timestamp with tz | When the secret was last used to authenticate |
**Unique constraint**: `(clientId, key)`, so each client has at most one secret per key name.
**Indexes**: `unq_client_secrets_client_key` UNIQUE on `(clientId, key)`, `idx_client_secrets_expires_at` on `(expiresAt)`.
**Encrypted data structure** (`EncryptedData` from crypto.ts):
```ts
interface EncryptedData {
  keyVersion: number; // matches client_secrets.keyVersion
  salt: string;       // base64, 16 bytes (PBKDF2)
  iv: string;         // base64, 12 bytes (AES-GCM)
  data: string;       // base64, AES-256-GCM ciphertext
}
```
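To make the field layout concrete, here is a hedged sketch of how such a payload could be produced and consumed. It is not the contents of `src/crypto.ts`: it uses Node's `node:crypto` module (also available in Deno), and the PBKDF2 hash and iteration count, the choice to append the 16-byte GCM auth tag to the ciphertext, and the names `encryptSketch` and `decryptSketch` are all assumptions.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

interface EncryptedData {
  keyVersion: number;
  salt: string; // base64, 16 bytes
  iv: string;   // base64, 12 bytes
  data: string; // base64, ciphertext followed by the auth tag (assumption)
}

// Assumed parameters; the real module may use different values.
const PBKDF2_ITERATIONS = 100_000;
const TAG_BYTES = 16;

// Derives a 32-byte AES key from the data encryption key and a per-secret salt.
function deriveKey(dataEncryptionKey: string, salt: Buffer): Buffer {
  return pbkdf2Sync(dataEncryptionKey, salt, PBKDF2_ITERATIONS, 32, "sha256");
}

function encryptSketch(secret: string, dataEncryptionKey: string, keyVersion: number): EncryptedData {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // fresh IV per encryption; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", deriveKey(dataEncryptionKey, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(secret, "utf8"), cipher.final()]);
  const data = Buffer.concat([ciphertext, cipher.getAuthTag()]);
  return {
    keyVersion,
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptSketch(enc: EncryptedData, dataEncryptionKey: string): string {
  const salt = Buffer.from(enc.salt, "base64");
  const iv = Buffer.from(enc.iv, "base64");
  const raw = Buffer.from(enc.data, "base64");
  const ciphertext = raw.subarray(0, raw.length - TAG_BYTES);
  const tag = raw.subarray(raw.length - TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(dataEncryptionKey, salt), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the payload was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decrypting with the wrong key fails with an exception instead of returning garbage, which is what makes a missing or mismatched key version detectable at read time.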
**Encryption flow**:
1. Raw secret (API key, password) → `crypto.encrypt(secret, dataEncryptionKey)` → `EncryptedData`
2. Store as JSONB in `value`
3. On use: `crypto.decrypt(value, dataEncryptionKey)` → raw secret
4. Data encryption keys come from hub config (see [hub-config.md](../../hub-config.md) for the two-layer key model) as a comma-separated list of `version:base64key` pairs (e.g., `v1:YmFzZTY0a2V5, v2:Zm9yYmFyYmF6`). The list is stored in the config file's `encryptionKeys` field, which is encrypted with the Docker-secret-provisioned master key. Each key is generated once per version via `crypto.generateEncryptionKey()`. The first key in the list is the "current" key used for new encryptions; every key in the list is available for decryption, which is what allows key rotation. **No env vars for secrets**: see ADR-008 (revised).
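The key list format in step 4 could be parsed along these lines. This is a sketch under stated assumptions: `KeyRing` and `parseKeyRing` are hypothetical names, and it assumes the optional `v` prefix is cosmetic and the digits map to the integer `client_secrets.keyVersion`.

```typescript
interface KeyRing {
  currentVersion: number;    // first entry: used for new encryptions
  keys: Map<number, string>; // every entry: available for decryption
}

// Parses a list such as "v1:<base64>, v2:<base64>".
// Error messages deliberately omit the entry text so key material is never logged.
function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;

  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue; // tolerate trailing commas

    const sep = trimmed.indexOf(":");
    if (sep < 0) throw new Error("malformed key ring entry");

    const match = /^v?(\d+)$/.exec(trimmed.slice(0, sep));
    const key = trimmed.slice(sep + 1);
    if (match === null || key === "") throw new Error("malformed key ring entry");

    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`duplicate key version: ${version}`);

    keys.set(version, key);
    if (currentVersion === undefined) currentVersion = version; // first key wins
  }

  if (currentVersion === undefined) throw new Error("key ring is empty");
  return { currentVersion, keys };
}
```

For the example list above, this yields two keys with version 1 as the current one, since it appears first.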
**Secret format convention**: Most secrets are plain strings (API keys, passwords). Complex secrets (OAuth tokens) are JSON objects `JSON.stringify()`'d before encryption. The `key` name indicates the format: `api_key` = string, `oauth_credentials` = JSON.
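One way to apply this convention is a small pair of helpers that run just before encryption and just after decryption. The set of JSON-formatted key names and the function names `serializeSecret` and `deserializeSecret` are hypothetical; only `oauth_credentials` is named as a JSON secret in this document.

```typescript
// Hypothetical list of key names whose values are JSON; all others are plain strings.
const JSON_SECRET_KEYS = new Set<string>(["oauth_credentials"]);

// Turns a secret into the plaintext string that gets encrypted.
function serializeSecret(key: string, value: unknown): string {
  if (JSON_SECRET_KEYS.has(key)) return JSON.stringify(value);
  if (typeof value !== "string") {
    throw new Error(`secret "${key}" must be a plain string`);
  }
  return value;
}

// Turns decrypted plaintext back into the caller-facing value.
function deserializeSecret(key: string, plaintext: string): unknown {
  return JSON_SECRET_KEYS.has(key) ? JSON.parse(plaintext) : plaintext;
}
```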
**Key rotation protocol**:
- **On read**: Decrypt with the key version indicated by `client_secrets.keyVersion`. All key versions in the data encryption key ring (from hub config, see [hub-config.md](../../hub-config.md)) are available for decryption.
- **On write (new secret)**: Encrypt with the current key version (the first key in the encryption keys list from hub config).
- **Re-encryption**: Decrypt with the old key version → encrypt with the current key → UPDATE `value` and `keyVersion` together in a single DB transaction. If the process crashes between decrypt and UPDATE, the secret remains readable: the row still references the old `keyVersion`, and the old key stays in the key ring until rotation is complete.
- **Background sweep**: A background job SHOULD periodically re-encrypt secrets using old key versions. Until re-encryption completes, secrets encrypted with old keys remain vulnerable if the old key is compromised. Key rotation for data encryption keys is independent of master key rotation — see [hub-config.md](../../hub-config.md) for the two-layer key model.
- **Error handling**: If a key version referenced by `client_secrets.keyVersion` is not found in the data encryption key ring, log an error and skip re-encryption. Alert the operator — this indicates a missing key that could cause data loss.
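The protocol above can be sketched as a sweep over secret rows. The crypto and database operations are injected so the logic stays testable; `SecretRow`, `RotationDeps`, and `reencryptStaleSecrets` are hypothetical names, and the sketch is synchronous for brevity where real database calls would be async and transactional.

```typescript
interface SecretRow<E> {
  id: string;
  keyVersion: number;
  value: E; // encrypted payload (EncryptedData in the schema above)
}

interface RotationDeps<E> {
  currentVersion: number;
  keys: Map<number, string>; // key ring: version -> data encryption key
  decrypt: (value: E, key: string) => string;
  encrypt: (secret: string, key: string, keyVersion: number) => E;
  // Must write value and keyVersion together (one UPDATE, one transaction).
  update: (id: string, value: E, keyVersion: number) => void;
  logError: (message: string) => void;
}

// Re-encrypts rows that use an old key version and returns their ids.
// Rows whose key version is missing from the ring are logged and skipped,
// never overwritten.
function reencryptStaleSecrets<E>(rows: SecretRow<E>[], deps: RotationDeps<E>): string[] {
  const currentKey = deps.keys.get(deps.currentVersion);
  if (currentKey === undefined) {
    throw new Error("current key version missing from key ring");
  }

  const rotated: string[] = [];
  for (const row of rows) {
    if (row.keyVersion === deps.currentVersion) continue; // already current

    const oldKey = deps.keys.get(row.keyVersion);
    if (oldKey === undefined) {
      deps.logError(`secret ${row.id}: key version ${row.keyVersion} not in key ring`);
      continue;
    }

    const secret = deps.decrypt(row.value, oldKey);
    const reencrypted = deps.encrypt(secret, currentKey, deps.currentVersion);
    deps.update(row.id, reencrypted, deps.currentVersion);
    rotated.push(row.id);
  }
  return rotated;
}
```

Because a row is only touched after a successful decrypt, a crash or a missing key leaves it in its previous, still-decryptable state.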