---
status: draft
last_updated: 2026-05-28
---

# Encrypted Data

Design for storing encrypted data at rest within the metagraph model. It uses
AES-256-GCM with PBKDF2 key derivation and provides a reusable node type, a
TypeBox schema, and a crypto utility for any consumer that needs to store
secrets.

## Overview

Sensitive data (API keys, passwords, OAuth tokens, SSH keys) must be encrypted
at rest. In `@alkdev/storage`, the encryption pattern becomes a reusable utility
and an encrypted node type, so any graph can store secrets without special table
definitions.

**Key principle**: The storage package provides the **encryption primitives and
the schema shape**, not key management. Consumers provide the encryption key.
This keeps the package agnostic to deployment-specific secret management.

**Provenance**: The encryption pattern (AES-256-GCM + PBKDF2) was originally
implemented in the hub's `client_secrets` table and `src/crypto/mod.ts`.
`@alkdev/storage` extracts this pattern as a general-purpose utility, independent
of the hub's domain model.

## The Problem

The hub has `client_secrets` as a standalone table with columns like:

| Column       | Purpose                                                |
| ------------ | ------------------------------------------------------ |
| `clientId`   | FK to the client this secret belongs to                |
| `key`        | Secret name (e.g., `"api_key"`, `"oauth_credentials"`) |
| `value`      | The encrypted payload (EncryptedData JSON)             |
| `keyVersion` | Which encryption key version was used                  |
| `expiresAt`  | When the secret expires                                |
| `lastUsedAt` | Audit trail                                            |

This is a domain-specific table. The encryption logic itself is generic:
AES-256-GCM with PBKDF2 key derivation and key versioning. When we want
encrypted secrets in a spoke (local SQLite) or in a different domain model, we
shouldn't have to duplicate the table definition or the crypto code.

## Design: Encrypted Data as a Node Type

Instead of a dedicated `client_secrets` table, encrypted data becomes a **node
type** in a graph:

```ts
import {
  BaseEdgeAttributes,
  BaseNodeAttributes,
  EncryptedDataSchema,
  SchemaBuilder,
} from "@alkdev/storage";
import { Type } from "@alkdev/typebox";

// A secret node: plaintext name and expiry, encrypted payload.
const SecretNodeType = Type.Intersect([
  BaseNodeAttributes,
  Type.Object({
    key: Type.String({ minLength: 1, maxLength: 255 }),
    encryptedData: EncryptedDataSchema,
    expiresAt: Type.Optional(Type.String({ format: "date-time" })),
  }),
]);

const schema = new SchemaBuilder()
  .config({ type: "undirected", multi: false, allowSelfLoops: false })
  .nodeType("secret", SecretNodeType)
  .nodeType(
    "client",
    Type.Intersect([
      BaseNodeAttributes,
      Type.Object({
        name: Type.String(),
        type: Type.String(),
        config: Type.Record(Type.String(), Type.Any()),
        enabled: Type.Boolean({ default: true }),
      }),
    ]),
  )
  // "client X has secret Y" is an edge, not a foreign key column.
  .edgeType(
    "has_secret",
    Type.Intersect([
      BaseEdgeAttributes,
      Type.Object({
        secretKey: Type.String(),
      }),
    ]),
    {
      allowedSourceTypes: ["client"],
      allowedTargetTypes: ["secret"],
    },
  )
  .build();
```

The `has_secret` edge represents the same relationship as
`client_secrets.clientId`, but as a graph edge rather than a foreign key.

### Why This Works

1. **No special tables needed**: The existing `graph_types`, `node_types`,
   `edge_types`, `graphs`, `nodes`, `edges` tables store everything.
2. **Schema validation**: The `EncryptedDataSchema` TypeBox schema validates
   the encryption envelope at write time.
3. **Domain flexibility**: An "ACL graph" might also have encrypted credential
   nodes. A "call graph" might store encrypted auth headers. Different graphs,
   same pattern.
4. **Query through edges**: "Find all secrets for client X" becomes "find all
   edges of type `has_secret` from node X to secret nodes."
5. **The crypto utility is shared**: `@alkdev/storage` exports `encrypt()` and
   `decrypt()` that any consumer uses.

### What Lives Where

| Layer                    | Responsibility                                            | Package                  |
| ------------------------ | --------------------------------------------------------- | ------------------------ |
| `@alkdev/storage` graphs | `EncryptedDataSchema` (TypeBox shape)                     | `@alkdev/storage`        |
| `@alkdev/storage` crypto | `encrypt()`, `decrypt()`, `generateEncryptionKey()`       | `@alkdev/storage`        |
| `@alkdev/storage` sqlite | Node storage (attributes contain encrypted JSON)          | `@alkdev/storage/sqlite` |
| `@alkdev/storage` repo   | Validate schema, encrypt before insert (⚠️ not yet impl)  | `@alkdev/storage`        |
| Application              | Key management (key ring, key rotation)                   | Consumer                 |

## EncryptedData Schema

Ported from the hub's `src/crypto/mod.ts` interface, now expressed as a TypeBox
schema in `@alkdev/storage`:

```ts
import { Type } from "@alkdev/typebox";

export const EncryptedDataSchema = Type.Object({
  keyVersion: Type.Integer({
    minimum: 1,
    description: "Encryption key version for rotation",
  }),
  salt: Type.String({ description: "Base64-encoded 16-byte PBKDF2 salt" }),
  iv: Type.String({
    description: "Base64-encoded 12-byte AES-GCM initialization vector",
  }),
  data: Type.String({ description: "Base64-encoded AES-256-GCM ciphertext" }),
});
```

This is the same structure as the hub's `EncryptedData` interface but as a
TypeBox schema, enabling runtime validation when inserting encrypted nodes.

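The schema above constrains types but not the decoded byte lengths that its
descriptions mention. As an illustration only, a dependency-free type guard
could check both. The names `isEncryptedData` and `base64ByteLength` are
hypothetical and not part of `@alkdev/storage`; the 16-byte minimum for `data`
assumes the default 128-bit AES-GCM authentication tag that Web Crypto appends
to the ciphertext.

```ts
// Hypothetical mirror of the EncryptedDataSchema shape (not the package's type).
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Decoded byte length of a base64 string, or -1 if the value is not valid base64.
function base64ByteLength(value: unknown): number {
  if (typeof value !== "string") return -1;
  try {
    return atob(value).length;
  } catch {
    return -1;
  }
}

// Checks the envelope shape plus the byte lengths named in the schema descriptions.
function isEncryptedData(value: unknown): value is EncryptedData {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.keyVersion === "number" &&
    Number.isInteger(v.keyVersion) &&
    v.keyVersion >= 1 &&
    base64ByteLength(v.salt) === 16 && // PBKDF2 salt
    base64ByteLength(v.iv) === 12 && // AES-GCM IV
    base64ByteLength(v.data) >= 16 // ciphertext includes the 16-byte GCM tag
  );
}
```

A guard like this could run alongside TypeBox validation when reading nodes
written by another process, where a truncated `iv` or `salt` would otherwise
only surface as a generic decryption failure.
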
## Crypto Utility

The encryption module provides three functions, ported from the hub's
`src/crypto/mod.ts`:

### `encrypt(plaintext, password, keyVersion?): Promise<EncryptedData>`

Encrypts a string using AES-256-GCM with PBKDF2 key derivation.

**Process**:

1. Generate random 16-byte salt
2. Generate random 12-byte IV
3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations
   for v1)
4. Encrypt plaintext with AES-256-GCM using the derived key and IV
5. Return
   `{ keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }`

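The five steps above can be sketched directly against the Web Crypto API. This
is a minimal sketch, not the package's actual `encrypt()`: the name
`encryptSketch`, the `PBKDF2_ITERATIONS` table, and the `toBase64` helper are
assumptions made for illustration, and only the v1 iteration count comes from
this document.

```ts
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Assumed lookup table; only the v1 value (100,000) is stated in this document.
const PBKDF2_ITERATIONS: Record<number, number> = { 1: 100_000 };

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

async function encryptSketch(
  plaintext: string,
  password: string,
  keyVersion = 1,
): Promise<EncryptedData> {
  const iterations: number | undefined = PBKDF2_ITERATIONS[keyVersion];
  if (iterations === undefined) throw new Error(`Unknown key version: ${keyVersion}`);

  // Steps 1 and 2: fresh random salt and IV for every call.
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));

  // Step 3: derive a 256-bit AES-GCM key from password + salt via PBKDF2 (SHA-256).
  const baseKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  const key = await crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt"],
  );

  // Step 4: encrypt; the result is the ciphertext followed by the GCM tag.
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );

  // Step 5: base64-encode everything into the envelope.
  return {
    keyVersion,
    salt: toBase64(salt),
    iv: toBase64(iv),
    data: toBase64(new Uint8Array(ciphertext)),
  };
}
```

Because the salt and IV are regenerated on every call, encrypting the same
plaintext twice with the same password yields different `salt`, `iv`, and
`data` values.
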
### `decrypt(encryptedData, password): Promise<string>`

Decrypts an `EncryptedData` object.

**Process**:

1. Decode base64 salt, IV, and ciphertext
2. Derive key from password + salt via PBKDF2, using the iteration count that
   corresponds to `keyVersion`
3. Decrypt with AES-256-GCM
4. Return plaintext string
5. Throw `"Decryption failed: Invalid data or key"` on failure (no information
   leakage about which part failed)

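The decryption steps can be sketched the same way. This is an illustrative
sketch rather than the package's `decrypt()`: `decryptSketch`,
`PBKDF2_ITERATIONS`, and `fromBase64` are hypothetical names, and the iteration
table contains only the v1 value given in this document. Every failure path is
collapsed into the single error message named in step 5.

```ts
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Assumed lookup table; only the v1 value (100,000) is stated in this document.
const PBKDF2_ITERATIONS: Record<number, number> = { 1: 100_000 };

function fromBase64(b64: string) {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

async function decryptSketch(encrypted: EncryptedData, password: string): Promise<string> {
  try {
    const iterations: number | undefined = PBKDF2_ITERATIONS[encrypted.keyVersion];
    if (iterations === undefined) throw new Error("unknown key version");

    // Step 1: decode the envelope.
    const salt = fromBase64(encrypted.salt);
    const iv = fromBase64(encrypted.iv);
    const ciphertext = fromBase64(encrypted.data);

    // Step 2: re-derive the key with the iteration count for this key version.
    const baseKey = await crypto.subtle.importKey(
      "raw",
      new TextEncoder().encode(password),
      "PBKDF2",
      false,
      ["deriveKey"],
    );
    const key = await crypto.subtle.deriveKey(
      { name: "PBKDF2", salt, iterations, hash: "SHA-256" },
      baseKey,
      { name: "AES-GCM", length: 256 },
      false,
      ["decrypt"],
    );

    // Steps 3 and 4: decrypt (this also verifies the GCM tag) and decode as UTF-8.
    const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
    return new TextDecoder().decode(plaintext);
  } catch {
    // Step 5: one uniform error, regardless of which step failed.
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```

Wrapping the whole body in one `try` block means malformed base64, an unknown
key version, and a wrong password are indistinguishable to the caller, which
matches the "no information leakage" requirement.
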
### `generateEncryptionKey(): string`

Generates a 32-byte random key encoded as base64. Used by operators to create
encryption keys for the key ring.

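A key generator matching that description fits in a few lines of Web Crypto.
The function name `generateEncryptionKeySketch` is hypothetical; this is a
sketch of the described behavior, not the package source.

```ts
// Returns 32 cryptographically random bytes, base64-encoded (44 characters).
function generateEncryptionKeySketch(): string {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}
```

An operator could prefix the result with a version tag, for example
`` `v1:${generateEncryptionKeySketch()}` ``, to produce one key ring entry.
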
**Key ring format** (application-level, not in this package): A comma-separated
list of `v{N}:{base64key}` pairs. The first key is the "current" key used for
new encryptions. All keys are available for decryption.

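Since the key ring lives in the application, each consumer needs a small
parser for this format. The sketch below is one possible shape; `parseKeyRing`
and the `KeyRing` interface are hypothetical names, and rejecting duplicate
versions is an assumption rather than documented behavior.

```ts
interface KeyRing {
  currentVersion: number; // version of the first entry, used for new encryptions
  keys: Map<number, string>; // version -> base64 key, all usable for decryption
}

function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;

  for (const entry of raw.split(",")) {
    const match = /^v(\d+):(.+)$/.exec(entry.trim());
    // Deliberately vague message: never echo key material into logs.
    if (!match) throw new Error("Malformed key ring entry");

    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`Duplicate key version: v${version}`);
    keys.set(version, match[2]);

    // The first entry is the "current" key.
    if (currentVersion === undefined) currentVersion = version;
  }

  if (currentVersion === undefined) throw new Error("Key ring is empty");
  return { currentVersion, keys };
}
```

With a ring such as `"v2:<newkey>,v1:<oldkey>"`, new data would be encrypted
with `keys.get(currentVersion)`, while stored data would be decrypted with
`keys.get(encryptedData.keyVersion)`.
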
### Key Versioning

PBKDF2 iteration count varies by key version:

- v1: 100,000 iterations
- Future versions: 200,000+ (adjust for hardware improvements)

This allows gradual security upgrades. Old data encrypted with v1 can still be
decrypted. Rotation re-encrypts each secret: it decrypts with the old key and
encrypts with the current key.

### Web Crypto API

The implementation uses the standard Web Crypto API (`crypto.subtle`), available
in:

- Deno runtime (native)
- Node.js 19+ (native)
- Modern browsers (native)
- Cloudflare Workers (native)

No external crypto dependencies.

## Design Decisions

### ED1: Per-attribute encryption, not per-node

The `EncryptedData` schema is a single attribute within a node type's
attributes, not the entire node. This means:

- A secret node can have unencrypted metadata alongside the encrypted value
- The node key (identity) is always readable for queries
- Only the sensitive payload is encrypted

**Alternative considered**: Encrypt the entire `attributes` column. This makes
queries impossible (you can't find "all secrets for client X" if the client
reference is encrypted). Per-attribute encryption preserves queryability on
non-sensitive fields.

### ED2: Node type, not standalone table

Encrypted data is modeled as a node type rather than a dedicated `secrets` table
because:

- **Graphs already provide the structure**: edges represent "client X has
  secret Y" without a join table
- **No foreign key proliferation**: new secret types (OAuth, SSH, API keys) are
  new node types, not new columns or tables
- **Uniform query patterns**: all graph queries work on secret nodes without
  special code

**When a standalone table might be better**: If a consumer (like the hub) needs
to query "all active API keys" across all clients with a single indexed `WHERE`
clause, a dedicated `api_keys` table with proper indexes is faster. The graph
model requires traversing edges to find related secrets. For a hub's specific use
case (key lookup on every authenticated request), this matters. The metagraph
pattern is optimized for flexibility, not raw key-lookup performance. Consumers
should use standalone tables for authentication hot paths and the metagraph for
everything else.

### ED3: Password-based encryption, not raw-key encryption

The current implementation uses PBKDF2 to derive a key from a password string.
The "password" in practice is a base64-encoded 32-byte random key from
`generateEncryptionKey()`. This means:

- The key derivation step adds security even when the input is already
  high-entropy (each encryption gets a unique salt, so the same key produces
  different ciphertexts)
- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2
  iterations

**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be
much faster for high-throughput scenarios but removes the per-encryption salt
benefit (the IV still provides uniqueness for GCM). The hub uses password-based
encryption because the config format is human-manageable key strings. For
`@alkdev/storage`, either approach works: the API accepts a "password" string,
which could be a raw key encoded as base64.

**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If
performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2
for raw key inputs.

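To make the raw-key alternative concrete, here is a rough sketch of what an
`encryptRaw()` could look like. Nothing here exists in the package yet: the
names `encryptRawSketch` and `decryptRawSketch`, the `RawEncryptedData` shape
(which has no `salt`, so it would not validate against the current
`EncryptedDataSchema`), and the 32-byte key check are all assumptions.

```ts
// Hypothetical envelope for raw-key encryption: no PBKDF2, therefore no salt.
interface RawEncryptedData {
  keyVersion: number;
  iv: string;
  data: string;
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function fromBase64(b64: string) {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

// Imports the base64 key bytes directly as an AES-256-GCM key (no derivation).
async function importRawKey(rawKeyBase64: string, usage: "encrypt" | "decrypt") {
  const rawKey = fromBase64(rawKeyBase64);
  if (rawKey.length !== 32) throw new Error("Expected a 32-byte key");
  return crypto.subtle.importKey("raw", rawKey, "AES-GCM", false, [usage]);
}

async function encryptRawSketch(
  plaintext: string,
  rawKeyBase64: string,
  keyVersion = 1,
): Promise<RawEncryptedData> {
  const key = await importRawKey(rawKeyBase64, "encrypt");
  const iv = crypto.getRandomValues(new Uint8Array(12)); // uniqueness now rests on the IV alone
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { keyVersion, iv: toBase64(iv), data: toBase64(new Uint8Array(ciphertext)) };
}

async function decryptRawSketch(
  encrypted: RawEncryptedData,
  rawKeyBase64: string,
): Promise<string> {
  const key = await importRawKey(rawKeyBase64, "decrypt");
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromBase64(encrypted.iv) },
    key,
    fromBase64(encrypted.data),
  );
  return new TextDecoder().decode(plaintext);
}
```

One trade-off to weigh before adopting this: every encryption reuses the same
AES key, so the random 96-bit IV is the only source of uniqueness, and the
number of encryptions per key should stay well within the usual limits for
random-IV GCM. The PBKDF2 path avoids this by deriving a fresh key per salt.
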
### ED4: Application-managed key ring

The storage package provides `encrypt()` and `decrypt()` but does NOT manage the
key ring. The consuming application:

1. Stores encryption keys in a secure location (Docker secrets, vault, config
   file with restricted permissions)
2. Loads keys at startup
3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
4. Handles key rotation (decrypt with old key, re-encrypt with current key)

This separation ensures:

- The storage package doesn't need to know about deployment infrastructure
- Key management policies are application-specific
- The encryption primitives are testable without a key ring implementation

### ED5: No key rotation utility in this package

Key rotation (decrypt with old key, re-encrypt with current key) is an
application-level workflow:

1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
2. For each: decrypt with old key → encrypt with current key → update node
3. Commit transaction

The storage package provides the building blocks (`encrypt()`, `decrypt()`,
`EncryptedDataSchema`), not the rotation workflow. The hub's background sweep
pattern is a good reference implementation.

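As a rough illustration of that workflow, the sketch below rotates an in-memory
list of secret nodes. It is not the hub's sweep implementation: `rotateSecrets`,
`SecretNode`, and the injected `Encrypt` / `Decrypt` function types are
hypothetical, and persistence (steps 1 and 3, the query and the transaction) is
left to the caller.

```ts
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Hypothetical minimal node shape; real nodes carry more attributes.
interface SecretNode {
  key: string;
  attributes: { encryptedData: EncryptedData };
}

// Signatures follow the encrypt()/decrypt() headings in this document.
type Encrypt = (plaintext: string, password: string, keyVersion: number) => Promise<EncryptedData>;
type Decrypt = (data: EncryptedData, password: string) => Promise<string>;

// Re-encrypts every node whose keyVersion is older than currentVersion.
// Returns the number of nodes rotated; the caller persists them in one transaction.
async function rotateSecrets(
  nodes: SecretNode[],
  keys: Map<number, string>,
  currentVersion: number,
  encrypt: Encrypt,
  decrypt: Decrypt,
): Promise<number> {
  const currentKey = keys.get(currentVersion);
  if (currentKey === undefined) throw new Error(`No key for current version v${currentVersion}`);

  let rotated = 0;
  for (const node of nodes) {
    const old = node.attributes.encryptedData;
    if (old.keyVersion >= currentVersion) continue; // already current

    const oldKey = keys.get(old.keyVersion);
    if (oldKey === undefined) throw new Error(`No key for version v${old.keyVersion}`);

    const plaintext = await decrypt(old, oldKey);
    node.attributes.encryptedData = await encrypt(plaintext, currentKey, currentVersion);
    rotated++;
  }
  return rotated;
}
```

Passing `encrypt` and `decrypt` in as parameters keeps the workflow testable
with fakes, which mirrors the ED4 goal of testing primitives and key handling
separately.
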
## Integration with SQLite Host

Encrypted node attributes are stored as JSON text in the `nodes.attributes`
column, same as any other node attributes. The `EncryptedDataSchema` validates
the shape at the application level.

```ts
import { decrypt, encrypt } from "@alkdev/storage";

// Base64 key for version 1, taken from the application's key ring config
// (the `v1:` prefix of the key ring entry is stripped by the application).
const encryptionKey = "YmFzZTY0a2V5";

const plaintext = "sk-ant-api03-...";
const encryptedData = await encrypt(plaintext, encryptionKey, 1);

// Node attributes; validate them against the node type schema (which embeds
// EncryptedDataSchema) before storage.
const attributes = {
  key: "api_key",
  encryptedData,
  expiresAt: new Date().toISOString(),
  created: new Date().toISOString(),
};

// Store as a node in a graph
// db.insert(nodes).values({ graphId, key: "anthropic-api-key", attributes });

// Retrieve and decrypt
// const node = await db.query.nodes.findFirst({ where: eq(nodes.key, "anthropic-api-key") });
// const decrypted = await decrypt(node.attributes.encryptedData, encryptionKey);
```

## Export Plan

The crypto module will be exported from the main `@alkdev/storage` package (no
db deps):

```
src/graphs/
├── types.ts          # existing: GraphConfig, NodeType, EdgeType, etc.
├── schemaBuilder.ts  # existing: SchemaBuilder
├── crypto.ts         # new: encrypt(), decrypt(), generateEncryptionKey(), EncryptedDataSchema
└── mod.ts            # re-exports all of the above
```

This keeps the encryption utility in the zero-dep export path (it only uses Web
Crypto API and `@alkdev/typebox` for the schema).

## Open Questions

1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds
   ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys),
   this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly
   would be much faster. Decision: add in a future iteration if performance
   demands it.

2. **Should the `key` attribute on secret nodes be encrypted?** Currently only
   the `encryptedData` attribute is encrypted. The `key` (secret name like
   "api_key") is stored in plaintext for queryability. If secret names are
   themselves sensitive, they could be hashed instead. Decision: plaintext key
   names are acceptable for now. If needed, add a `keyHash` attribute for blind
   lookups (similar to the hub's `api_keys.keyHash`).

3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class
   columns?** The hub's `client_secrets` has these as columns for indexed
   queries. In the metagraph model, they're attributes inside the node JSON.
   The generic `nodes` table has no index on these JSON properties; SQLite
   would need an expression index or generated column per property to query
   them efficiently. Decision: for spoke use (occasional lookups), JSON
   attributes are fine. For hub use (high-throughput key validation), a
   standalone `api_keys` table with proper indexes is still needed.

## References

- Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto
- Hub crypto utility (provenance): `/workspace/@alkdev/hub/src/crypto/mod.ts`
- Hub `client_secrets` table (provenance):
  `/workspace/@alkdev/hub/docs/architecture/storage/services.md`
- Hub ADR-008 (provenance):
  `/workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md`
- `@alkdev/operations` AccessControl:
  `/workspace/@alkdev/operations/docs/architecture/api-surface.md`