---
status: draft
last_updated: 2026-05-28
---

# Encrypted Data

Design for storing encrypted data at rest within the metagraph model. Uses AES-256-GCM + PBKDF2 key derivation, providing a reusable node type, TypeBox schema, and crypto utility for any consumer that needs to store secrets.

## Overview

Sensitive data (API keys, passwords, OAuth tokens, SSH keys) must be encrypted at rest. In `@alkdev/storage`, the encryption pattern becomes a reusable utility and an encrypted node type, so any graph can store secrets without special table definitions.

**Key principle**: The storage package provides the **encryption primitives and the schema shape**, not key management. Consumers provide the encryption key. This keeps the package agnostic to deployment-specific secret management.

**Provenance**: The encryption pattern (AES-256-GCM + PBKDF2) was originally implemented in the hub's `client_secrets` table and `src/crypto/mod.ts`. `@alkdev/storage` extracts this pattern as a general-purpose utility, independent of the hub's domain model.

## The Problem

The hub has `client_secrets` as a standalone table with columns like:

| Column       | Purpose                                            |
| ------------ | -------------------------------------------------- |
| `clientId`   | FK to the client this secret belongs to            |
| `key`        | Secret name (e.g., "api_key", "oauth_credentials") |
| `value`      | The encrypted payload (EncryptedData JSON)         |
| `keyVersion` | Which encryption key version was used              |
| `expiresAt`  | When the secret expires                            |
| `lastUsedAt` | Audit trail                                        |

This is a domain-specific table. The encryption logic itself is generic: AES-256-GCM with PBKDF2 key derivation and key versioning. When we want encrypted secrets in a spoke (local SQLite) or in a different domain model, we shouldn't have to duplicate the table definition or the crypto code.
## Design: Encrypted Data as a Node Type

Instead of a dedicated `client_secrets` table, encrypted data becomes a **node type** in a graph:

```ts
import {
  BaseEdgeAttributes,
  BaseNodeAttributes,
  EncryptedDataSchema,
  SchemaBuilder,
} from "@alkdev/storage";
import { Type } from "@alkdev/typebox";

// Secret node: plaintext metadata plus the encrypted envelope.
const SecretNodeType = Type.Intersect([
  BaseNodeAttributes,
  Type.Object({
    key: Type.String({ minLength: 1, maxLength: 255 }),
    encryptedData: EncryptedDataSchema,
    expiresAt: Type.Optional(Type.String({ format: "date-time" })),
  }),
]);

const schema = new SchemaBuilder()
  .config({ type: "undirected", multi: false, allowSelfLoops: false })
  .nodeType("secret", SecretNodeType)
  .nodeType(
    "client",
    Type.Intersect([
      BaseNodeAttributes,
      Type.Object({
        name: Type.String(),
        type: Type.String(),
        config: Type.Record(Type.String(), Type.Any()),
        enabled: Type.Boolean({ default: true }),
      }),
    ]),
  )
  // Edge from a client node to one of its secret nodes.
  .edgeType(
    "has_secret",
    Type.Intersect([
      BaseEdgeAttributes,
      Type.Object({
        secretKey: Type.String(),
      }),
    ]),
    {
      allowedSourceTypes: ["client"],
      allowedTargetTypes: ["secret"],
    },
  )
  .build();
```

The `has_secret` edge represents the same relationship as `client_secrets.clientId`, but as a graph edge rather than a foreign key.

### Why This Works

1. **No special tables needed**: The existing `graph_types`, `node_types`, `edge_types`, `graphs`, `nodes`, `edges` tables store everything.
2. **Schema validation**: The `EncryptedDataSchema` TypeBox schema validates the encryption envelope at write time.
3. **Domain flexibility**: An "ACL graph" might also have encrypted credential nodes. A "call graph" might store encrypted auth headers. Different graphs, same pattern.
4. **Query through edges**: "Find all secrets for client X" becomes "find all edges of type `has_secret` from node X to secret nodes."
5. **The crypto utility is shared**: `@alkdev/storage` exports `encrypt()` and `decrypt()` that any consumer uses.
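
As a rough illustration of item 4, the sketch below runs the "all secrets for client X" lookup over plain in-memory arrays. The `GraphNode` and `GraphEdge` shapes and the `secretsForClient` helper are hypothetical stand-ins for illustration, not the `@alkdev/storage` query API, and the sketch assumes each `has_secret` edge is stored with the client as its source.

```typescript
// Hypothetical in-memory shapes; the real rows live in the `nodes` and
// `edges` tables and may look different.
interface GraphNode {
  key: string;
  type: string;
  attributes: Record<string, unknown>;
}

interface GraphEdge {
  type: string;
  source: string; // key of the source node
  target: string; // key of the target node
}

// Follow `has_secret` edges from one client node to its secret nodes.
function secretsForClient(
  clientKey: string,
  nodes: GraphNode[],
  edges: GraphEdge[],
): GraphNode[] {
  const targets = new Set(
    edges
      .filter((e) => e.type === "has_secret" && e.source === clientKey)
      .map((e) => e.target),
  );
  return nodes.filter((n) => n.type === "secret" && targets.has(n.key));
}

// Sample data: two clients, three secrets.
const nodes: GraphNode[] = [
  { key: "client-a", type: "client", attributes: { name: "A" } },
  { key: "client-b", type: "client", attributes: { name: "B" } },
  { key: "secret-1", type: "secret", attributes: { key: "api_key" } },
  { key: "secret-2", type: "secret", attributes: { key: "oauth_credentials" } },
  { key: "secret-3", type: "secret", attributes: { key: "api_key" } },
];

const edges: GraphEdge[] = [
  { type: "has_secret", source: "client-a", target: "secret-1" },
  { type: "has_secret", source: "client-a", target: "secret-2" },
  { type: "has_secret", source: "client-b", target: "secret-3" },
];

const found = secretsForClient("client-a", nodes, edges);
console.log(found.map((n) => n.key)); // ["secret-1", "secret-2"]
```

Nothing in the lookup touches `encryptedData`: the traversal only reads node keys, node types, and edge endpoints, so it works without any encryption key.
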
### What Lives Where

| Layer                    | Responsibility                                            | Package                  |
| ------------------------ | --------------------------------------------------------- | ------------------------ |
| `@alkdev/storage` graphs | `EncryptedDataSchema` (TypeBox shape)                     | `@alkdev/storage`        |
| `@alkdev/storage` crypto | `encrypt()`, `decrypt()`, `generateEncryptionKey()`       | `@alkdev/storage`        |
| `@alkdev/storage` sqlite | Node storage (attributes contain encrypted JSON)          | `@alkdev/storage/sqlite` |
| `@alkdev/storage` repo   | Validate schema, encrypt before insert (⚠️ not yet impl)  | `@alkdev/storage`        |
| Application              | Key management (key ring, key rotation)                   | Consumer                 |

## EncryptedData Schema

Ported from the hub's `src/crypto/mod.ts` interface, now expressed as a TypeBox schema in `@alkdev/storage`:

```ts
import { Type } from "@alkdev/typebox";

export const EncryptedDataSchema = Type.Object({
  keyVersion: Type.Integer({
    minimum: 1,
    description: "Encryption key version for rotation",
  }),
  salt: Type.String({ description: "Base64-encoded 16-byte PBKDF2 salt" }),
  iv: Type.String({
    description: "Base64-encoded 12-byte AES-GCM initialization vector",
  }),
  data: Type.String({ description: "Base64-encoded AES-256-GCM ciphertext" }),
});
```

This is the same structure as the hub's `EncryptedData` interface but as a TypeBox schema, enabling runtime validation when inserting encrypted nodes.

## Crypto Utility

The encryption module provides three functions, ported from the hub's `src/crypto/mod.ts`:

### `encrypt(plaintext, password, keyVersion?): Promise<EncryptedData>`

Encrypts a string using AES-256-GCM with PBKDF2 key derivation.

**Process**:

1. Generate random 16-byte salt
2. Generate random 12-byte IV
3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations for v1)
4. Encrypt plaintext with AES-256-GCM using the derived key and IV
5. Return `{ keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }`

### `decrypt(encryptedData, password): Promise<string>`

Decrypts an `EncryptedData` object.

**Process**:

1. Decode base64 salt, IV, and ciphertext
2. Derive key from password + salt via PBKDF2, using the iteration count that belongs to `keyVersion`
3. Decrypt with AES-256-GCM
4. Return plaintext string
5. Throw `"Decryption failed: Invalid data or key"` on failure (no information leakage about which part failed)

### `generateEncryptionKey(): string`

Generates a 32-byte random key encoded as base64. Used by operators to create encryption keys for the key ring.

**Key ring format** (application-level, not in this package): A comma-separated list of `v{N}:{base64key}` pairs. The first key is the "current" key used for new encryptions. All keys are available for decryption.

### Key Versioning

PBKDF2 iteration count varies by key version:

- v1: 100,000 iterations
- Future versions: 200,000+ (adjust for hardware improvements)

This allows gradual security upgrades. Old data encrypted with v1 can still be decrypted. Re-encryption (rotate) reads with the old key and writes with the current key.

### Web Crypto API

The implementation uses the standard Web Crypto API (`crypto.subtle`), available in:

- Deno runtime (native)
- Node.js 19+ (native)
- Modern browsers (native)
- Cloudflare Workers (native)

No external crypto dependencies.

## Design Decisions

### ED1: Per-attribute encryption, not per-node

The `EncryptedData` schema is a single attribute within a node type's attributes, not the entire node. This means:

- A secret node can have unencrypted metadata alongside the encrypted value
- The node key (identity) is always readable for queries
- Only the sensitive payload is encrypted

**Alternative considered**: Encrypt the entire `attributes` column. This makes queries impossible (you can't find "all secrets for client X" if the client reference is encrypted). Per-attribute encryption preserves queryability on non-sensitive fields.
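
To make ED1 concrete, here is a minimal sketch of per-attribute encryption built on the `encrypt` and `decrypt` steps listed under Crypto Utility, using only the Web Crypto API. It is an approximation written from those steps, not the package source: the default `keyVersion` of 1, the `PBKDF2_ITERATIONS` table, and the `buildSecretAttributes` helper are assumptions made for illustration.

```typescript
// Sketch only: follows the documented steps, not the actual package source.
interface EncryptedData {
  keyVersion: number;
  salt: string; // base64, 16 bytes
  iv: string; // base64, 12 bytes
  data: string; // base64 ciphertext
}

// Only v1 (100,000 iterations) is specified in this document.
const PBKDF2_ITERATIONS: Record<number, number> = { 1: 100_000 };

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function fromBase64(text: string) {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// PBKDF2 (SHA-256) -> 256-bit AES-GCM key; iterations depend on key version.
async function deriveKey(password: string, salt: BufferSource, keyVersion: number) {
  const iterations = PBKDF2_ITERATIONS[keyVersion];
  if (!iterations) throw new Error("Unknown key version");
  const material = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", hash: "SHA-256", salt, iterations },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encrypt(plaintext: string, password: string, keyVersion = 1): Promise<EncryptedData> {
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(password, salt, keyVersion);
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return {
    keyVersion,
    salt: toBase64(salt),
    iv: toBase64(iv),
    data: toBase64(new Uint8Array(ciphertext)),
  };
}

async function decrypt(encrypted: EncryptedData, password: string): Promise<string> {
  try {
    const key = await deriveKey(password, fromBase64(encrypted.salt), encrypted.keyVersion);
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromBase64(encrypted.iv) },
      key,
      fromBase64(encrypted.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    // One generic message, whatever the underlying cause.
    throw new Error("Decryption failed: Invalid data or key");
  }
}

// ED1 in practice: `key` stays readable, only the payload is encrypted.
async function buildSecretAttributes(name: string, secret: string, password: string) {
  return { key: name, encryptedData: await encrypt(secret, password, 1) };
}
```

A caller can filter or index on `attributes.key` without holding any encryption key, and only needs the password when it actually reads `attributes.encryptedData`.
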
### ED2: Node type, not standalone table

Encrypted data is modeled as a node type rather than a dedicated `secrets` table because:

- **Graphs already provide the structure**: edges represent "client X has secret Y" without a join table
- **No foreign key proliferation**: new secret types (OAuth, SSH, API keys) are new node types, not new columns or tables
- **Uniform query patterns**: All graph queries work on secret nodes without special code

**When a standalone table might be better**: If a consumer (like the hub) needs to query "all active API keys" across all clients with a single indexed `WHERE` clause, a dedicated `api_keys` table with proper indexes is faster. The graph model requires traversing edges to find related secrets. For a hub's specific use case (key lookup on every authenticated request), this matters. The metagraph pattern is optimized for flexibility, not raw key-lookup performance. Consumers should use standalone tables for authentication hot paths and the metagraph for everything else.

### ED3: Password-based encryption, not raw-key encryption

The current implementation uses PBKDF2 to derive a key from a password string. The "password" in practice is a base64-encoded 32-byte random key from `generateEncryptionKey()`. This means:

- The key derivation step adds security even when the input is already high-entropy (each encryption gets a unique salt, so the same key produces different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2 iterations

**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be much faster for high-throughput scenarios but removes the per-encryption salt benefit (the IV still provides uniqueness for GCM). The hub uses password-based because the config format is human-manageable key strings. For `@alkdev/storage`, either approach works: the API accepts a "password" string which could be a raw key encoded as base64.
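
For comparison, the raw-key alternative could look roughly like the sketch below, which imports the base64 key bytes directly as an AES-256-GCM key and skips PBKDF2. The names `encryptRaw`, `decryptRaw`, `importRawKey`, and `RawEncryptedData` are hypothetical, and the envelope (no `salt` field) is an assumption; none of this exists in the package today.

```typescript
// Hypothetical sketch: `encryptRaw` / `decryptRaw` are not part of the current API.
interface RawEncryptedData {
  keyVersion: number;
  iv: string; // base64, 12 bytes
  data: string; // base64 ciphertext
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function fromBase64(text: string) {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Import a base64-encoded 32-byte key directly as an AES-GCM key (no PBKDF2).
async function importRawKey(base64Key: string) {
  const bytes = fromBase64(base64Key);
  if (bytes.length !== 32) throw new Error("Expected a 32-byte key");
  return crypto.subtle.importKey("raw", bytes, "AES-GCM", false, [
    "encrypt",
    "decrypt",
  ]);
}

async function encryptRaw(
  plaintext: string,
  base64Key: string,
  keyVersion = 1,
): Promise<RawEncryptedData> {
  const key = await importRawKey(base64Key);
  // A fresh random IV per message is the only source of uniqueness here.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return {
    keyVersion,
    iv: toBase64(iv),
    data: toBase64(new Uint8Array(ciphertext)),
  };
}

async function decryptRaw(
  encrypted: RawEncryptedData,
  base64Key: string,
): Promise<string> {
  try {
    const key = await importRawKey(base64Key);
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromBase64(encrypted.iv) },
      key,
      fromBase64(encrypted.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```

The trade-off is visible in the envelope: with no salt, every message is encrypted under the same AES key, so IV uniqueness carries all of the weight that the salt and IV share in the PBKDF2 variant.
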

**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2 for raw key inputs.

### ED4: Application-managed key ring

The storage package provides `encrypt()` and `decrypt()` but does NOT manage the key ring. The consuming application:

1. Stores encryption keys in a secure location (Docker secrets, vault, config file with restricted permissions)
2. Loads keys at startup
3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
4. Handles key rotation (decrypt with old key, re-encrypt with current key)

This separation ensures:

- The storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- The encryption primitives are testable without a key ring implementation

### ED5: No key rotation utility in this package

Key rotation (decrypt with old key, re-encrypt with current key) is an application-level workflow:

1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
2. For each: decrypt with old key → encrypt with current key → update node
3. Commit transaction

The storage package provides the building blocks (`encrypt()`, `decrypt()`, `EncryptedDataSchema`), not the rotation workflow. The hub's background sweep pattern is a good reference implementation.

## Integration with SQLite Host

Encrypted node attributes are stored as JSON text in the `nodes.attributes` column, same as any other node attributes. The `EncryptedDataSchema` validates the shape at the application level.
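
Because the envelope is ordinary JSON inside `nodes.attributes`, the application-side pieces from ED4 and ED5 need nothing special from the host. The sketch below, under stated assumptions, parses a key ring in the `v{N}:{base64key}` format and scans rows for envelopes older than the current version. `NodeRow`, `KeyRing`, `parseKeyRing`, and `findRotationCandidates` are hypothetical names for illustration, not package exports, and a real scan would go through the consumer's own database query layer rather than an array.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Hypothetical row shape: `attributes` holds JSON text, as in `nodes.attributes`.
interface NodeRow {
  key: string;
  attributes: string;
}

interface KeyRing {
  currentVersion: number; // version of the first entry
  keys: Map<number, string>; // version -> base64 key
}

// Parse "v2:<base64>,v1:<base64>"; the first entry is the current key.
function parseKeyRing(ring: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;
  const entries = ring.split(",");
  for (let i = 0; i < entries.length; i++) {
    const match = /^v(\d+):(.+)$/.exec(entries[i].trim());
    // Report the position only, so key material never ends up in logs.
    if (!match) throw new Error(`Malformed key ring entry at position ${i}`);
    const version = Number(match[1]);
    keys.set(version, match[2]);
    if (currentVersion === undefined) currentVersion = version;
  }
  if (currentVersion === undefined) throw new Error("Key ring is empty");
  return { currentVersion, keys };
}

// ED5 step 1: collect node keys whose envelope predates the current version.
function findRotationCandidates(rows: NodeRow[], currentVersion: number): string[] {
  const stale: string[] = [];
  for (const row of rows) {
    const attributes = JSON.parse(row.attributes) as {
      encryptedData?: EncryptedData;
    };
    const envelope = attributes.encryptedData;
    if (envelope && envelope.keyVersion < currentVersion) stale.push(row.key);
  }
  return stale;
}

// Placeholder key strings for illustration only, not real 32-byte keys.
const ring = parseKeyRing("v2:bmV3LWtleQ==,v1:b2xkLWtleQ==");

const rows: NodeRow[] = [
  {
    key: "old-secret",
    attributes: JSON.stringify({
      key: "api_key",
      encryptedData: { keyVersion: 1, salt: "", iv: "", data: "" },
    }),
  },
  {
    key: "new-secret",
    attributes: JSON.stringify({
      key: "api_key",
      encryptedData: { keyVersion: 2, salt: "", iv: "", data: "" },
    }),
  },
  { key: "plain-node", attributes: JSON.stringify({ name: "no secret here" }) },
];

console.log(findRotationCandidates(rows, ring.currentVersion)); // ["old-secret"]
```

For each candidate, the application would then look up `ring.keys.get(envelope.keyVersion)` to decrypt and `ring.keys.get(ring.currentVersion)` to re-encrypt, as described in ED5.
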

```ts
import { decrypt, encrypt, EncryptedDataSchema } from "@alkdev/storage";

const encryptionKey = "v1:YmFzZTY0a2V5"; // from application config
const plaintext = "sk-ant-api03-...";

// Produces the { keyVersion, salt, iv, data } envelope.
const encryptedData = await encrypt(plaintext, encryptionKey, 1);

// Node attributes: plaintext metadata plus the encrypted envelope.
// Validate these against the node type schema (which embeds
// EncryptedDataSchema) before storage.
const attributes = {
  key: "api_key",
  encryptedData,
  expiresAt: new Date().toISOString(),
  created: new Date().toISOString(),
};

// Store as a node in a graph
// db.insert(nodes).values({ graphId, key: "anthropic-api-key", attributes });

// Retrieve and decrypt
// const node = await db.query.nodes.findFirst({ where: eq(nodes.key, "anthropic-api-key") });
// const decrypted = await decrypt(node.attributes.encryptedData, encryptionKey);
```

## Export Plan

The crypto module will be exported from the main `@alkdev/storage` package (no db deps):

```
src/graphs/
├── types.ts          # existing: GraphConfig, NodeType, EdgeType, etc.
├── schemaBuilder.ts  # existing: SchemaBuilder
├── crypto.ts         # new: encrypt(), decrypt(), generateEncryptionKey(), EncryptedDataSchema
└── mod.ts            # re-exports all of the above
```

This keeps the encryption utility in the zero-dep export path (it only uses Web Crypto API and `@alkdev/typebox` for the schema).

## Open Questions

1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys), this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly would be much faster. Decision: add in a future iteration if performance demands it.

2. **Should the `key` attribute on secret nodes be encrypted?** Currently only the `encryptedData` attribute is encrypted. The `key` (secret name like "api_key") is stored in plaintext for queryability. If secret names are themselves sensitive, they could be hashed instead. Decision: plaintext key names are acceptable for now.
   If needed, add a `keyHash` attribute for blind lookups (similar to the hub's `api_keys.keyHash`).

3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class columns?** The hub's `client_secrets` has these as columns for indexed queries. In the metagraph model, they're attributes inside the node JSON, which SQLite does not index by default (indexing a JSON property requires a per-property expression index or generated column, e.g. on `json_extract(attributes, '$.expiresAt')`). Decision: for spoke use (occasional lookups), JSON attributes are fine. For hub use (high-throughput key validation), a standalone `api_keys` table with proper indexes is still needed.

## References

- Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto
- Hub crypto utility (provenance): `/workspace/@alkdev/hub/src/crypto/mod.ts`
- Hub `client_secrets` table (provenance): `/workspace/@alkdev/hub/docs/architecture/storage/services.md`
- Hub ADR-008 (provenance): `/workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md`
- `@alkdev/operations` AccessControl: `/workspace/@alkdev/operations/docs/architecture/api-surface.md`