docs: restructure architecture docs to flowgraph pattern

- Create decisions/ directory with 32 numbered ADRs (ADR-001 through ADR-032)
  extracted from inline DD/SD/ED/SE decision sections
- Create open-questions.md with 16 OQs organized by theme, cross-referenced to
  ADRs, with status tracking (resolved/open)
- Create README.md as architecture index with doc table, ADR table, and
  lifecycle status definitions (draft/reviewed/stable/deprecated)
- Replace inline decision sections in all spec docs with ADR reference tables
- Replace inline open questions with OQ references to centralized tracker
- Update frontmatter: metagraph-module.md, overview.md, sqlite-host.md →
  reviewed; schema-evolution.md and encrypted-data.md remain draft
- DD1-DD10 → ADR-009 through ADR-018
- D1-D8 → ADR-001 through ADR-008
- SD1-SD5 → ADR-019 through ADR-023 (SD5 folded into ADR-006/008)
- ED1-ED5 → ADR-023 through ADR-027
- SE1-SE5 → ADR-028 through ADR-032

@@ -212,93 +212,15 @@ No external crypto dependencies.

 ## Design Decisions

-### ED1: Per-attribute encryption, not per-node
+All design decisions are documented as ADRs in [decisions/](decisions/).

-The `EncryptedData` schema is a single attribute within a node type's
-attributes, not the entire node. This means:
-
-- A secret node can have unencrypted metadata alongside the encrypted value
-- The node key (identity) is always readable for queries
-- Only the sensitive payload is encrypted
-
-**Alternative considered**: Encrypt the entire `attributes` column. This makes
-queries impossible (you can't find "all secrets for client X" if the client
-reference is encrypted). Per-attribute encryption preserves queryability on
-non-sensitive fields.
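
To make the removed ED1 rationale concrete, here is a minimal sketch of a secret node whose metadata stays queryable while only one attribute is encrypted. The source names only `EncryptedData`, `keyVersion`, and the `encryptedData` attribute; the `ciphertext` and `clientRef` fields and the `secretsForClient` helper are assumptions made for illustration, not the package's schema.

```typescript
// Hypothetical shapes: the source names only `EncryptedData`, `keyVersion`,
// and the `encryptedData` attribute. All other fields are assumptions.
interface EncryptedData {
  keyVersion: number;
  ciphertext: string; // assumed: base64 payload
}

interface SecretNode {
  key: string; // node identity, stored in plaintext
  attributes: {
    clientRef: string; // assumed plaintext metadata
    encryptedData: EncryptedData; // the only encrypted attribute
  };
}

const secretNodes: SecretNode[] = [
  { key: "api_key", attributes: { clientRef: "client-x", encryptedData: { keyVersion: 1, ciphertext: "b3BhcXVl" } } },
  { key: "ssh_key", attributes: { clientRef: "client-x", encryptedData: { keyVersion: 1, ciphertext: "YmxvYg==" } } },
  { key: "api_key", attributes: { clientRef: "client-y", encryptedData: { keyVersion: 1, ciphertext: "ZGF0YQ==" } } },
];

// "All secrets for client X" filters on plaintext metadata only;
// no ciphertext is decrypted to answer the query.
function secretsForClient(nodes: SecretNode[], clientRef: string): SecretNode[] {
  return nodes.filter((node) => node.attributes.clientRef === clientRef);
}

const clientXKeys = secretsForClient(secretNodes, "client-x").map((node) => node.key);
```

If the whole `attributes` object were encrypted instead, `secretsForClient` would have to decrypt every node before it could filter.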
-
-### ED2: Node type, not standalone table
-
-Encrypted data is modeled as a node type rather than a dedicated `secrets` table
-because:
-
-- **Graphs already provide the structure**: edges represent "client X has
-  secret Y" without a join table
-- **No foreign key proliferation**: new secret types (OAuth, SSH, API keys) are
-  new node types, not new columns or tables
-- **Uniform query patterns**: all graph queries work on secret nodes without
-  special code
-
-**When a standalone table might be better**: If a consumer (like the hub) needs
-to query "all active API keys" across all clients with a single indexed `WHERE`
-clause, a dedicated `api_keys` table with proper indexes is faster. The graph
-model requires traversing edges to find related secrets. For a hub's specific use
-case (key lookup on every authenticated request), this matters. The metagraph
-pattern is optimized for flexibility, not raw key-lookup performance. Consumers
-should use standalone tables for authentication hot paths and the metagraph for
-everything else.
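
The trade-off described in ED2 can be illustrated with a small in-memory sketch. Everything here is hypothetical: the node and edge shapes, the `secretsOf` traversal, and the `apiKeyIndex` map (which stands in for an indexed `api_keys` table) are assumptions, not the metagraph module's API.

```typescript
// Hypothetical in-memory graph; a "HasSecret" edge is a plain (from, to) pair.
interface GraphNode {
  id: string;
  type: string;
}
interface Edge {
  type: "HasSecret";
  from: string;
  to: string;
}

const graphNodes: GraphNode[] = [
  { id: "client-x", type: "Client" },
  { id: "secret-1", type: "ApiKeySecret" },
  { id: "secret-2", type: "SshKeySecret" }, // a new secret type is just a new node type
];
const edges: Edge[] = [
  { type: "HasSecret", from: "client-x", to: "secret-1" },
  { type: "HasSecret", from: "client-x", to: "secret-2" },
];

// Graph model: follow edges from the client to reach its secrets.
function secretsOf(clientId: string): GraphNode[] {
  const targets = new Set(
    edges.filter((edge) => edge.from === clientId).map((edge) => edge.to),
  );
  return graphNodes.filter((node) => targets.has(node.id));
}

// Hot-path model: a dedicated index answers a key lookup in one step.
const apiKeyIndex = new Map<string, { clientId: string; active: boolean }>([
  ["hash-of-key-1", { clientId: "client-x", active: true }],
]);
function lookupApiKey(keyHash: string): { clientId: string; active: boolean } | undefined {
  return apiKeyIndex.get(keyHash);
}
```

The traversal handles any secret type without schema changes, while the dedicated index serves only the one lookup it was built for, which is the split ED2 recommends.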
-
-### ED3: Password-based encryption, not raw-key encryption
-
-The current implementation uses PBKDF2 to derive a key from a password string.
-The "password" in practice is a base64-encoded 32-byte random key from
-`generateEncryptionKey()`. This means:
-
-- The key derivation step adds security even when the input is already
-  high-entropy (each encryption gets a unique salt, so the same key produces
-  different ciphertexts)
-- However, this adds ~100ms of latency per encryption/decryption due to PBKDF2
-  iterations
-
-**Alternative**: Direct AES-GCM with raw key bytes (skip PBKDF2). This would be
-much faster for high-throughput scenarios but removes the per-encryption salt
-benefit (the IV still provides uniqueness for GCM). The hub uses password-based
-encryption because the config format is human-manageable key strings. For
-`@alkdev/storage`, either approach works: the API accepts a "password" string
-which could be a raw key encoded as base64.
-
-**Decision**: Use the same PBKDF2 pattern for consistency with the hub. If
-performance becomes an issue, add an `encryptRaw()` function that skips PBKDF2
-for raw key inputs.
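
The two paths compared in ED3 can be sketched side by side. This is not the package's implementation: it uses Node's built-in `node:crypto` synchronous functions rather than the Web Crypto API so the example stays short, and the iteration count, salt length, SHA-256 digest, and all function names (`encryptWithPassword`, `decryptWithPassword`, `encryptWithRawKey`) are assumptions.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

const ITERATIONS = 100_000; // assumed; the source only reports ~100ms per call
const SALT_BYTES = 16; // assumed
const IV_BYTES = 12; // conventional AES-GCM nonce length

interface Sealed {
  salt: Buffer;
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function deriveKey(password: string, salt: Buffer): Buffer {
  return pbkdf2Sync(password, salt, ITERATIONS, 32, "sha256");
}

// Password path: fresh salt and IV per call, key derived via PBKDF2.
function encryptWithPassword(plaintext: string, password: string): Sealed {
  const salt = randomBytes(SALT_BYTES);
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(password, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { salt, iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptWithPassword(sealed: Sealed, password: string): string {
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(password, sealed.salt), sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

// Raw-key path: no PBKDF2 and no salt; the random IV alone varies the ciphertext.
function encryptWithRawKey(plaintext: string, rawKey: Buffer): Sealed {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", rawKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { salt: Buffer.alloc(0), iv, ciphertext, tag: cipher.getAuthTag() };
}

const rawKey = randomBytes(32);
const password = rawKey.toString("base64"); // stands in for generateEncryptionKey()
const first = encryptWithPassword("s3cret", password);
const second = encryptWithPassword("s3cret", password);
const roundTrip = decryptWithPassword(first, password);
const rawSealed = encryptWithRawKey("s3cret", rawKey);
```

Both paths encrypt with AES-256-GCM; the only difference is whether a PBKDF2 derivation runs before each operation, which is where the latency described above comes from.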
-
-### ED4: Application-managed key ring
-
-The storage package provides `encrypt()` and `decrypt()` but does NOT manage the
-key ring. The consuming application:
-
-1. Stores encryption keys in a secure location (Docker secrets, vault, config
-   file with restricted permissions)
-2. Loads keys at startup
-3. Passes the appropriate key to `encrypt()` / `decrypt()` based on `keyVersion`
-4. Handles key rotation (decrypt with old key, re-encrypt with current key)
-
-This separation ensures:
-
-- The storage package doesn't need to know about deployment infrastructure
-- Key management policies are application-specific
-- The encryption primitives are testable without a key ring implementation
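
A consuming application's side of ED4 might look roughly like the sketch below. The source does not give the signatures of `encrypt()` and `decrypt()`, so the `EncryptFn` and `DecryptFn` types are assumptions, as are `KeyRing`, `StoredSecret`, `keyFor`, `seal`, and `unseal`. A toy reversible function stands in for real encryption so the example runs on its own.

```typescript
// The real encrypt()/decrypt() signatures are not given in the source;
// these function types are assumptions.
type EncryptFn = (plaintext: string, password: string) => string;
type DecryptFn = (ciphertext: string, password: string) => string;

interface KeyRing {
  currentVersion: number;
  keys: Map<number, string>; // keyVersion -> key string, loaded at startup
}

interface StoredSecret {
  keyVersion: number;
  ciphertext: string;
}

function keyFor(ring: KeyRing, version: number): string {
  const key = ring.keys.get(version);
  if (key === undefined) throw new Error(`no key loaded for version ${version}`);
  return key;
}

// New writes always use the current key and record its version.
function seal(ring: KeyRing, encrypt: EncryptFn, plaintext: string): StoredSecret {
  const version = ring.currentVersion;
  return { keyVersion: version, ciphertext: encrypt(plaintext, keyFor(ring, version)) };
}

// Reads pick the key by the version stored alongside the ciphertext.
function unseal(ring: KeyRing, decrypt: DecryptFn, secret: StoredSecret): string {
  return decrypt(secret.ciphertext, keyFor(ring, secret.keyVersion));
}

// Toy reversible stand-in so the sketch runs without real crypto. NOT secure.
const toyEncrypt: EncryptFn = (plaintext, password) => `${password}|${plaintext}`;
const toyDecrypt: DecryptFn = (ciphertext, password) => {
  if (!ciphertext.startsWith(`${password}|`)) throw new Error("wrong key");
  return ciphertext.slice(password.length + 1);
};

const ring: KeyRing = { currentVersion: 2, keys: new Map([[1, "old-key"], [2, "new-key"]]) };
const legacy: StoredSecret = { keyVersion: 1, ciphertext: toyEncrypt("hunter2", "old-key") };
const fresh = seal(ring, toyEncrypt, "hunter2");
```

Because the crypto functions are passed in, the key ring logic can be tested with stand-ins like these, which matches the testability point in the list above.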
-
-### ED5: No key rotation utility in this package
-
-Key rotation (decrypt with old key, re-encrypt with current key) is an
-application-level workflow:
-
-1. Find all nodes with `attributes.encryptedData.keyVersion < currentVersion`
-2. For each: decrypt with old key → encrypt with current key → update node
-3. Commit transaction
-
-The storage package provides the building blocks (`encrypt()`, `decrypt()`,
-`EncryptedDataSchema`), not the rotation workflow. The hub's background sweep
-pattern is a good reference implementation.
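
The three rotation steps above could be sketched as a pure function over an in-memory node list. This is an illustration under assumptions, not the hub's sweep: `rotateSecrets`, the node shape, and the toy `toyEncrypt`/`toyDecrypt` stand-ins are hypothetical, and returning a new array stands in for committing a transaction.

```typescript
interface EncryptedData {
  keyVersion: number;
  ciphertext: string;
}
interface SecretNode {
  key: string;
  attributes: { encryptedData: EncryptedData };
}
type CryptFn = (input: string, password: string) => string;

// Returns a new node list; the caller would persist it inside one transaction.
function rotateSecrets(
  nodes: SecretNode[],
  keys: Map<number, string>,
  currentVersion: number,
  encrypt: CryptFn,
  decrypt: CryptFn,
): SecretNode[] {
  const currentKey = keys.get(currentVersion);
  if (currentKey === undefined) throw new Error("current key missing");
  return nodes.map((node) => {
    const data = node.attributes.encryptedData;
    if (data.keyVersion >= currentVersion) return node; // step 1: skip up-to-date nodes
    const oldKey = keys.get(data.keyVersion);
    if (oldKey === undefined) throw new Error(`no key for version ${data.keyVersion}`);
    const plaintext = decrypt(data.ciphertext, oldKey); // step 2: decrypt with old key
    const rotated: EncryptedData = {
      keyVersion: currentVersion,
      ciphertext: encrypt(plaintext, currentKey), // step 2: re-encrypt with current key
    };
    return { ...node, attributes: { ...node.attributes, encryptedData: rotated } };
  });
}

// Toy reversible stand-ins so the sketch runs without real crypto. NOT secure.
const toyEncrypt: CryptFn = (plaintext, password) => `${password}|${plaintext}`;
const toyDecrypt: CryptFn = (ciphertext, password) => {
  if (!ciphertext.startsWith(`${password}|`)) throw new Error("wrong key");
  return ciphertext.slice(password.length + 1);
};

const keys = new Map<number, string>([[1, "k1"], [2, "k2"]]);
const before: SecretNode[] = [
  { key: "api_key", attributes: { encryptedData: { keyVersion: 1, ciphertext: toyEncrypt("alpha", "k1") } } },
  { key: "ssh_key", attributes: { encryptedData: { keyVersion: 2, ciphertext: toyEncrypt("beta", "k2") } } },
];
const after = rotateSecrets(before, keys, 2, toyEncrypt, toyDecrypt);
```

If any node fails to decrypt, the function throws before returning, so the caller never sees a half-rotated list. The input list is left unmodified either way.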
+| ADR | Decision | Summary |
+|-----|----------|---------|
+| [023](decisions/023-per-attribute-encryption.md) | Per-attribute encryption, not per-node | Only sensitive payload encrypted; key/metadata remain queryable |
+| [024](decisions/024-encrypted-data-as-node-type.md) | Encrypted data as node type, not standalone table | No special tables; metagraph pattern with `SecretNode` and `HasSecretEdge` |
+| [025](decisions/025-password-based-encryption-pbkdf2.md) | Password-based encryption via PBKDF2 | Consistent with hub; ~100ms per operation; `encryptRaw()` added later if needed |
+| [026](decisions/026-application-managed-key-ring.md) | Application-managed key ring | Storage provides encrypt/decrypt primitives, not key management |
+| [027](decisions/027-no-key-rotation-utility.md) | No key rotation utility in this package | Application orchestrates rotation; storage provides building blocks |

 ## Integration with SQLite Host

@@ -348,26 +270,12 @@ Crypto API and `@alkdev/typebox` for the schema).

 ## Open Questions

-1. **Should we add `encryptRaw()` for performance?** The PBKDF2 derivation adds
-   ~100ms per operation. For batch secret operations (e.g., rotating 1000 keys),
-   this adds up. An `encryptRaw()` that skips PBKDF2 and uses the key directly
-   would be much faster. Decision: add in a future iteration if performance
-   demands it.
+Open questions are tracked in [open-questions.md](open-questions.md). Key
+questions affecting encrypted data:

-2. **Should the `key` attribute on secret nodes be encrypted?** Currently only
-   the `encryptedData` attribute is encrypted. The `key` (secret name like
-   "api_key") is stored in plaintext for queryability. If secret names are
-   themselves sensitive, they could be hashed instead. Decision: plaintext key
-   names are acceptable for now. If needed, add a `keyHash` attribute for blind
-   lookups (similar to the hub's `api_keys.keyHash`).
-
-3. **Should secret nodes have `lastUsedAt` and `expiresAt` as first-class
-   columns?** The hub's `client_secrets` has these as columns for indexed
-   queries. In the metagraph model, they're attributes inside the node JSON.
-   SQLite can't efficiently index JSON properties. Decision: for spoke use
-   (occasional lookups), JSON attributes are fine. For hub use (high-throughput
-   key validation), a standalone `api_keys` table with proper indexes is still
-   needed.
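
For the blind-lookup idea raised in question 2, a `keyHash` attribute could work along the lines of the sketch below. The source does not say how the hub computes `api_keys.keyHash`, so unkeyed SHA-256 via Node's `node:crypto` is an assumption, as are the `BlindSecretNode` shape and the `keyHashOf` and `findBySecretName` helpers.

```typescript
import { createHash } from "node:crypto";

// Plain SHA-256 is an assumption; the source does not say how keyHash is computed.
function keyHashOf(secretName: string): string {
  return createHash("sha256").update(secretName, "utf8").digest("hex");
}

interface BlindSecretNode {
  attributes: { keyHash: string; encryptedData: string };
}

const blindNodes: BlindSecretNode[] = [
  { attributes: { keyHash: keyHashOf("api_key"), encryptedData: "opaque-1" } },
  { attributes: { keyHash: keyHashOf("ssh_key"), encryptedData: "opaque-2" } },
];

// The caller must already know the name; the store never holds it in plaintext.
function findBySecretName(name: string): BlindSecretNode | undefined {
  const wanted = keyHashOf(name);
  return blindNodes.find((node) => node.attributes.keyHash === wanted);
}
```

An unkeyed hash of a short, guessable name can be reversed by hashing candidate names, so a keyed construction such as HMAC would be the usual hardening if secret names are truly sensitive.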
+- **OQ-07**: Should we add `encryptRaw()` for performance? (open, low priority)
+- **OQ-08**: Should the `key` attribute on secret nodes be encrypted? (resolved: plaintext for now)
+- **OQ-09**: Should secret nodes have `lastUsedAt` and `expiresAt` as first-class columns? (resolved: JSON attributes for spoke, standalone table for hub)

 ## References
