
| status | last_updated |
| --- | --- |
| draft | 2026-05-28 |

Encrypted Data

Design for storing encrypted data at rest within the metagraph model. The design uses AES-256-GCM with PBKDF2 key derivation and provides a reusable node type, a TypeBox schema, and a crypto utility for any consumer that needs to store secrets.

Overview

Sensitive data — API keys, passwords, OAuth tokens, SSH keys — must be encrypted at rest. In @alkdev/storage, the encryption pattern becomes a reusable utility and an encrypted node type, so any graph can store secrets without special table definitions.

Key principle: The storage package provides the encryption primitives and the schema shape, not key management. Consumers provide the encryption key. This keeps the package agnostic to deployment-specific secret management.

Provenance: The encryption pattern (AES-256-GCM + PBKDF2) was originally implemented in the hub's client_secrets table and src/crypto/mod.ts. @alkdev/storage extracts this pattern as a general-purpose utility, independent of the hub's domain model.

The Problem

The hub has client_secrets as a standalone table with columns like:

| Column | Purpose |
| --- | --- |
| clientId | FK to the client this secret belongs to |
| key | Secret name (e.g., "api_key", "oauth_credentials") |
| value | The encrypted payload (EncryptedData JSON) |
| keyVersion | Which encryption key version was used |
| expiresAt | When the secret expires |
| lastUsedAt | Audit trail |

This is a domain-specific table. The encryption logic itself is generic — AES-256-GCM with PBKDF2 key derivation and key versioning. When we want encrypted secrets in a spoke (local SQLite) or in a different domain model, we shouldn't have to duplicate the table definition or the crypto code.

Design: Encrypted Data as a Node Type

Instead of a dedicated client_secrets table, encrypted data becomes a node type in a graph:

import {
  BaseEdgeAttributes,
  BaseNodeAttributes,
  EncryptedDataSchema,
  SchemaBuilder,
} from "@alkdev/storage";
import { Type } from "@alkdev/typebox";

const SecretNodeType = Type.Intersect([
  BaseNodeAttributes,
  Type.Object({
    key: Type.String({ minLength: 1, maxLength: 255 }),
    encryptedData: EncryptedDataSchema,
    expiresAt: Type.Optional(Type.String({ format: "date-time" })),
  }),
]);

const schema = new SchemaBuilder()
  .config({ type: "undirected", multi: false, allowSelfLoops: false })
  .nodeType("secret", SecretNodeType)
  .nodeType(
    "client",
    Type.Intersect([
      BaseNodeAttributes,
      Type.Object({
        name: Type.String(),
        type: Type.String(),
        config: Type.Record(Type.String(), Type.Any()),
        enabled: Type.Boolean({ default: true }),
      }),
    ]),
  )
  .edgeType(
    "has_secret",
    Type.Intersect([
      BaseEdgeAttributes,
      Type.Object({
        secretKey: Type.String(),
      }),
    ]),
    {
      allowedSourceTypes: ["client"],
      allowedTargetTypes: ["secret"],
    },
  )
  .build();

The has_secret edge represents the same relationship as client_secrets.clientId, but as a graph edge rather than a foreign key.

Why This Works

  1. No special tables needed — The existing graph_types, node_types, edge_types, graphs, nodes, edges tables store everything.
  2. Schema validation — The EncryptedDataSchema TypeBox schema validates the encryption envelope at write time.
  3. Domain flexibility — An "ACL graph" might also have encrypted credential nodes. A "call graph" might store encrypted auth headers. Different graphs, same pattern.
  4. Query through edges — "Find all secrets for client X" becomes "find all edges of type has_secret from node X to secret nodes."
  5. The crypto utility is shared: @alkdev/storage exports encrypt() and decrypt() for any consumer to use.
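
As a rough illustration of point 4, the sketch below resolves "all secrets for client X" by following has_secret edges. It is a minimal in-memory sketch, not the package's query API: the GraphNode and GraphEdge shapes and the secretsForClient helper are hypothetical, and plain arrays stand in for the nodes and edges tables.

```typescript
// Hypothetical in-memory shapes standing in for rows of the nodes and edges tables.
interface GraphNode {
  key: string;
  type: string;
  attributes: Record<string, unknown>;
}

interface GraphEdge {
  type: string;
  source: string; // key of the source node
  target: string; // key of the target node
}

// Follow has_secret edges out of one client node and return the secret nodes they reach.
function secretsForClient(
  clientKey: string,
  nodes: GraphNode[],
  edges: GraphEdge[],
): GraphNode[] {
  const targets = new Set(
    edges
      .filter((e) => e.type === "has_secret" && e.source === clientKey)
      .map((e) => e.target),
  );
  return nodes.filter((n) => n.type === "secret" && targets.has(n.key));
}
```

Note that nothing is decrypted here: the lookup only touches edge endpoints and node types, so it works without access to any encryption key.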

What Lives Where

| Layer | Responsibility | Package |
| --- | --- | --- |
| @alkdev/storage graphs | EncryptedDataSchema (TypeBox shape) | @alkdev/storage |
| @alkdev/storage crypto | encrypt(), decrypt(), generateEncryptionKey() | @alkdev/storage |
| @alkdev/storage sqlite | Node storage (attributes contain encrypted JSON) | @alkdev/storage/sqlite |
| @alkdev/storage repo | Validate schema, encrypt before insert (⚠️ not yet impl) | @alkdev/storage |
| Application | Key management (key ring, key rotation) | Consumer |

EncryptedData Schema

Ported from the hub's src/crypto/mod.ts interface, now expressed as a TypeBox schema in @alkdev/storage:

import { Type } from "@alkdev/typebox";

export const EncryptedDataSchema = Type.Object({
  keyVersion: Type.Integer({
    minimum: 1,
    description: "Encryption key version for rotation",
  }),
  salt: Type.String({ description: "Base64-encoded 16-byte PBKDF2 salt" }),
  iv: Type.String({
    description: "Base64-encoded 12-byte AES-GCM initialization vector",
  }),
  data: Type.String({ description: "Base64-encoded AES-256-GCM ciphertext" }),
});

This is the same structure as the hub's EncryptedData interface but as a TypeBox schema, enabling runtime validation when inserting encrypted nodes.
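
The schema as written constrains field types, while the salt and IV byte lengths appear only in the descriptions. The following dependency-free sketch shows what a stricter runtime check could look like. It is an illustration only: isEncryptedData and decodedLength are hypothetical helpers, not exports of @alkdev/storage, and the 16-byte minimum for data assumes the default 128-bit GCM authentication tag that Web Crypto appends to the ciphertext.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Returns the decoded byte length of a base64 string, or null if it is not valid base64.
function decodedLength(value: unknown): number | null {
  if (typeof value !== "string") return null;
  try {
    return atob(value).length;
  } catch {
    return null;
  }
}

// Hypothetical type guard: checks the envelope shape plus the documented byte lengths.
function isEncryptedData(value: unknown): value is EncryptedData {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const version = v.keyVersion;
  return (
    typeof version === "number" &&
    Number.isInteger(version) &&
    version >= 1 &&
    decodedLength(v.salt) === 16 && // 16-byte PBKDF2 salt
    decodedLength(v.iv) === 12 && // 12-byte AES-GCM IV
    (decodedLength(v.data) ?? 0) >= 16 // ciphertext includes a 16-byte tag (assumed default)
  );
}
```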

Crypto Utility

The encryption module provides three functions, ported from the hub's src/crypto/mod.ts:

encrypt(plaintext, password, keyVersion?): Promise<EncryptedData>

Encrypts a string using AES-256-GCM with PBKDF2 key derivation.

Process:

  1. Generate random 16-byte salt
  2. Generate random 12-byte IV
  3. Derive 256-bit key from password + salt via PBKDF2 (SHA-256, 100k iterations for v1)
  4. Encrypt plaintext with AES-256-GCM using the derived key and IV
  5. Return { keyVersion, salt: base64(salt), iv: base64(iv), data: base64(ciphertext) }
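
The five steps above can be sketched with the Web Crypto API as follows. This is a minimal sketch under stated assumptions rather than the package's actual source: encryptSketch, toBase64, and ITERATIONS_V1 are illustrative names, and only the v1 iteration count is modeled.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Only v1 is defined in the text; this sketch ignores other versions.
const ITERATIONS_V1 = 100_000;

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

async function encryptSketch(
  plaintext: string,
  password: string,
  keyVersion = 1,
): Promise<EncryptedData> {
  // Steps 1 and 2: fresh random salt and IV for every call.
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));

  // Step 3: derive a 256-bit AES-GCM key from password + salt via PBKDF2 (SHA-256).
  const baseKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  const key = await crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: ITERATIONS_V1, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt"],
  );

  // Step 4: encrypt; Web Crypto appends the GCM authentication tag to the ciphertext.
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );

  // Step 5: return the base64-encoded envelope.
  return {
    keyVersion,
    salt: toBase64(salt),
    iv: toBase64(iv),
    data: toBase64(new Uint8Array(ciphertext)),
  };
}
```

Because the salt and IV are regenerated on each call, encrypting the same plaintext twice with the same password yields different envelopes.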

decrypt(encryptedData, password): Promise<string>

Decrypts an EncryptedData object.

Process:

  1. Decode base64 salt, IV, and ciphertext
  2. Derive key from password + salt via PBKDF2, using the iteration count that corresponds to keyVersion
  3. Decrypt with AES-256-GCM
  4. Return plaintext string
  5. Throw "Decryption failed: Invalid data or key" on failure (no information leakage about which part failed)
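
The decryption steps could look roughly like the sketch below. It is illustrative only: decryptSketch, deriveKey, fromBase64, and the ITERATIONS table are assumed names, and only the v1 iteration count comes from this document. Every failure, including an unknown key version, collapses into the same generic error so that callers cannot tell which part failed.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Assumed version-to-iterations table; only v1 is specified in the text.
const ITERATIONS: Record<number, number> = { 1: 100_000 };

function fromBase64(text: string) {
  return Uint8Array.from(atob(text), (c) => c.charCodeAt(0));
}

async function deriveKey(
  password: string,
  saltB64: string,
  keyVersion: number,
  usage: "encrypt" | "decrypt",
) {
  const iterations = ITERATIONS[keyVersion];
  if (iterations === undefined) throw new Error("unknown key version");
  const baseKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt: fromBase64(saltB64), iterations, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    [usage],
  );
}

async function decryptSketch(envelope: EncryptedData, password: string): Promise<string> {
  try {
    const key = await deriveKey(password, envelope.salt, envelope.keyVersion, "decrypt");
    const plaintext = await crypto.subtle.decrypt(
      { name: "AES-GCM", iv: fromBase64(envelope.iv) },
      key,
      fromBase64(envelope.data),
    );
    return new TextDecoder().decode(plaintext);
  } catch {
    // One message for bad base64, wrong key, tampered data, or unknown version.
    throw new Error("Decryption failed: Invalid data or key");
  }
}
```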

generateEncryptionKey(): string

Generates a 32-byte random key encoded as base64. Used by operators to create encryption keys for the key ring.
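
A key generator matching this description can be very small. The sketch below assumes the Web Crypto random source and standard base64; generateEncryptionKeySketch is an illustrative name, not the package export.

```typescript
// Sketch: 32 random bytes from the platform CSPRNG, encoded as standard base64.
function generateEncryptionKeySketch(): string {
  const bytes = crypto.getRandomValues(new Uint8Array(32));
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}
```

A 32-byte value always encodes to 44 base64 characters, which gives operators a quick sanity check on copied keys.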

Key ring format (application-level, not in this package): A comma-separated list of v{N}:{base64key} pairs. The first key is the "current" key used for new encryptions. All keys are available for decryption.
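
Since the key ring lives in the consuming application, parsing it is also the consumer's job. One possible parser for the format described above is sketched here; parseKeyRing and the KeyRing shape are hypothetical, and rejecting duplicate versions is an assumption of this sketch rather than documented behavior.

```typescript
interface KeyRing {
  currentVersion: number; // version of the first entry, used for new encryptions
  keys: Map<number, string>; // version -> base64 key, all usable for decryption
}

function parseKeyRing(raw: string): KeyRing {
  const keys = new Map<number, string>();
  let currentVersion: number | undefined;

  for (const entry of raw.split(",")) {
    const match = /^v(\d+):(.+)$/.exec(entry.trim());
    // Error messages deliberately omit the entry so key material never reaches logs.
    if (!match) throw new Error("Malformed key ring entry: expected v{N}:{base64key}");
    const version = Number(match[1]);
    if (keys.has(version)) throw new Error(`Duplicate key version v${version}`);
    keys.set(version, match[2]);
    if (currentVersion === undefined) currentVersion = version;
  }

  if (currentVersion === undefined) throw new Error("Key ring is empty");
  return { currentVersion, keys };
}
```

With a ring such as "v2:QUJD,v1:REVG", new data would be encrypted under v2 while v1 envelopes remain decryptable.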

Key Versioning

PBKDF2 iteration count varies by key version:

  • v1: 100,000 iterations
  • Future versions: 200,000+ (adjust for hardware improvements)

This allows gradual security upgrades. Old data encrypted with v1 can still be decrypted. Re-encryption (rotate) reads with the old key and writes with the current key.

Web Crypto API

The implementation uses the standard Web Crypto API (crypto.subtle), available in:

  • Deno runtime (native)
  • Node.js 19+ (native)
  • Modern browsers (native)
  • Cloudflare Workers (native)

No external crypto dependencies.

Design Decisions

ED1: Per-attribute encryption, not per-node

The EncryptedData schema is a single attribute within a node type's attributes, not the entire node. This means:

  • A secret node can have unencrypted metadata alongside the encrypted value
  • The node key (identity) is always readable for queries
  • Only the sensitive payload is encrypted

Alternative considered: Encrypt the entire attributes column. This makes queries impossible (you can't find "all secrets for client X" if the client reference is encrypted). Per-attribute encryption preserves queryability on non-sensitive fields.

ED2: Node type, not standalone table

Encrypted data is modeled as a node type rather than a dedicated secrets table because:

  • Graphs already provide the structure — edges represent "client X has secret Y" without a join table
  • No foreign key proliferation — new secret types (OAuth, SSH, API keys) are new node types, not new columns or tables
  • Uniform query patterns — All graph queries work on secret nodes without special code

When a standalone table might be better: If a consumer (like the hub) needs to query "all active API keys" across all clients with a single indexed WHERE clause, a dedicated api_keys table with proper indexes is faster. The graph model requires traversing edges to find related secrets. For a hub's specific use case (key lookup on every authenticated request), this matters. The metagraph pattern is optimized for flexibility, not raw key-lookup performance. Consumers should use standalone tables for authentication hot paths and the metagraph for everything else.

ED3: Password-based encryption, not raw-key encryption

The current implementation uses PBKDF2 to derive a key from a password string. The "password" in practice is a base64-encoded 32-byte random key from generateEncryptionKey(). This means:

  • The key derivation step adds security even when the input is already high-entropy (each encryption gets a unique salt, so the same key produces different ciphertexts)
  • However, this adds ~100ms of latency per encryption/decryption due to PBKDF2 iterations

Alternative: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be much faster for high-throughput scenarios but removes the per-encryption salt benefit (the IV still provides uniqueness for GCM). The hub uses password-based because the config format is human-manageable key strings. For @alkdev/storage, either approach works — the API accepts a "password" string which could be a raw key encoded as base64.

Decision: Use the same PBKDF2 pattern for consistency with the hub. If performance becomes an issue, add an encryptRaw() function that skips PBKDF2 for raw key inputs.
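
For comparison, a raw-key variant might look like the sketch below. It is speculative: encryptRawSketch does not exist in the package, and because it produces no salt, its output does not match EncryptedDataSchema as defined above. How such envelopes would be represented is left open here.

```typescript
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Hypothetical raw-key variant: imports the 32-byte key directly and skips PBKDF2.
async function encryptRawSketch(plaintext: string, keyB64: string, keyVersion = 1) {
  const rawKey = Uint8Array.from(atob(keyB64), (c) => c.charCodeAt(0));
  if (rawKey.length !== 32) throw new Error("Expected a base64-encoded 32-byte key");

  const key = await crypto.subtle.importKey("raw", rawKey, { name: "AES-GCM" }, false, [
    "encrypt",
  ]);

  // With no per-encryption salt, the random IV alone makes each ciphertext unique.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );

  return {
    keyVersion,
    iv: bytesToBase64(iv),
    data: bytesToBase64(new Uint8Array(ciphertext)),
  };
}
```

Reusing one AES-GCM key across many encryptions makes IV uniqueness critical, so a raw-key mode would need its own guidance on how many values a single key may protect.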

ED4: Application-managed key ring

The storage package provides encrypt() and decrypt() but does NOT manage the key ring. The consuming application:

  1. Stores encryption keys in a secure location (Docker secrets, vault, config file with restricted permissions)
  2. Loads keys at startup
  3. Passes the appropriate key to encrypt() / decrypt() based on keyVersion
  4. Handles key rotation (decrypt with old key, re-encrypt with current key)

This separation ensures:

  • The storage package doesn't need to know about deployment infrastructure
  • Key management policies are application-specific
  • The encryption primitives are testable without a key ring implementation

ED5: No key rotation utility in this package

Key rotation (decrypt with old key, re-encrypt with current key) is an application-level workflow:

  1. Find all nodes with attributes.encryptedData.keyVersion < currentVersion
  2. For each: decrypt with old key → encrypt with current key → update node
  3. Commit transaction

The storage package provides the building blocks (encrypt(), decrypt(), EncryptedDataSchema), not the rotation workflow. The hub's background sweep pattern is a good reference implementation.
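
To make the workflow concrete, here is a sketch of the rotation loop as an application might write it. It is not the hub's sweep implementation: rotateSecrets and the SecretNode shape are hypothetical, the encrypt and decrypt functions are injected so the sketch stays independent of any particular crypto module, and persistence (step 3) is left to the caller.

```typescript
interface EncryptedData {
  keyVersion: number;
  salt: string;
  iv: string;
  data: string;
}

// Hypothetical node shape; real nodes carry more attributes.
interface SecretNode {
  key: string;
  attributes: { encryptedData: EncryptedData };
}

type Encrypt = (plaintext: string, password: string, keyVersion: number) => Promise<EncryptedData>;
type Decrypt = (data: EncryptedData, password: string) => Promise<string>;

// Returns re-encrypted copies of every node whose envelope predates currentVersion.
async function rotateSecrets(
  nodes: SecretNode[],
  keys: Map<number, string>,
  currentVersion: number,
  encrypt: Encrypt,
  decrypt: Decrypt,
): Promise<SecretNode[]> {
  const currentKey = keys.get(currentVersion);
  if (currentKey === undefined) throw new Error(`No key for current version v${currentVersion}`);

  const updated: SecretNode[] = [];
  for (const node of nodes) {
    const envelope = node.attributes.encryptedData;
    if (envelope.keyVersion >= currentVersion) continue; // step 1: only stale envelopes

    const oldKey = keys.get(envelope.keyVersion);
    if (oldKey === undefined) throw new Error(`No key for version v${envelope.keyVersion}`);

    // Step 2: decrypt with the old key, encrypt with the current key.
    const plaintext = await decrypt(envelope, oldKey);
    const encryptedData = await encrypt(plaintext, currentKey, currentVersion);
    updated.push({ ...node, attributes: { ...node.attributes, encryptedData } });
  }
  return updated; // the caller writes these back inside one transaction (step 3)
}
```

Returning new node objects instead of mutating in place means a failure partway through leaves the stored data untouched until the caller commits.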

Integration with SQLite Host

Encrypted node attributes are stored as JSON text in the nodes.attributes column, same as any other node attributes. The EncryptedDataSchema validates the shape at the application level.

import { decrypt, encrypt, EncryptedDataSchema } from "@alkdev/storage";

// Key material for v1, taken from the application's key ring entry "v1:YmFzZTY0a2V5"
const encryptionKey = "YmFzZTY0a2V5";

const plaintext = "sk-ant-api03-...";
const encryptedData = await encrypt(plaintext, encryptionKey, 1);

// Build the node attributes; validate them against the node type schema before storage
const attributes = {
  key: "api_key",
  encryptedData,
  expiresAt: new Date().toISOString(),
  created: new Date().toISOString(),
};

// Store as a node in a graph
// db.insert(nodes).values({ graphId, key: "anthropic-api-key", attributes });

// Retrieve and decrypt
// const node = await db.query.nodes.findFirst({ where: eq(nodes.key, "anthropic-api-key") });
// const decrypted = await decrypt(node.attributes.encryptedData, encryptionKey);

Export Plan

The crypto module will be exported from the main @alkdev/storage package (no db deps):

src/graphs/
├── types.ts          # existing: GraphConfig, NodeType, EdgeType, etc.
├── schemaBuilder.ts  # existing: SchemaBuilder
├── crypto.ts         # new: encrypt(), decrypt(), generateEncryptionKey(), EncryptedDataSchema
└── mod.ts            # re-exports all of the above

This keeps the encryption utility in the zero-dep export path (it only uses Web Crypto API and @alkdev/typebox for the schema).

Open Questions

  1. Should we add encryptRaw() for performance? The PBKDF2 derivation adds ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys), this adds up. An encryptRaw() that skips PBKDF2 and uses the key directly would be much faster. Decision: add in a future iteration if performance demands it.

  2. Should the key attribute on secret nodes be encrypted? Currently only the encryptedData attribute is encrypted. The key (secret name like "api_key") is stored in plaintext for queryability. If secret names are themselves sensitive, they could be hashed instead. Decision: plaintext key names are acceptable for now. If needed, add a keyHash attribute for blind lookups (similar to the hub's api_keys.keyHash).

  3. Should secret nodes have lastUsedAt and expiresAt as first-class columns? The hub's client_secrets has these as columns for indexed queries. In the metagraph model, they're attributes inside the node JSON. SQLite can index JSON properties only through expression indexes or generated columns declared per attribute path, which fits poorly with a generic nodes table. Decision: for spoke use (occasional lookups), JSON attributes are fine. For hub use (high-throughput key validation), a standalone api_keys table with proper indexes is still needed.

References

  • Web Crypto API: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto
  • Hub crypto utility (provenance): /workspace/@alkdev/hub/src/crypto/mod.ts
  • Hub client_secrets table (provenance): /workspace/@alkdev/hub/docs/architecture/storage/services.md
  • Hub ADR-008 (provenance): /workspace/@alkdev/hub/docs/decisions/ADR-008-secrets-encrypted-at-rest-with-key-versioning.md
  • @alkdev/operations AccessControl: /workspace/@alkdev/operations/docs/architecture/api-surface.md