Files
storage/docs/architecture/decisions/049-identity-schema-restructuring.md
glm-5.1 ae242f33b9 Restructure identity tables: separate credential types, add peer_credentials, specify FK cascades and indexes
Identity tables were derived from hub's PostgreSQL schema but simplified
without documenting what was removed or why. This restructures them for the
current auth landscape (API key + wraith SSH/cert-authority):

- ADR-049: Separate api_keys and peer_credentials tables (different lookup
  patterns, columns, lifecycles), remove Gitea columns, map hub data→metadata
- ADR-050: Extract SHA-256 vs KDF decision from inline spec text
- Add peer_credentials table for SSH key and cert-authority auth
- Specify all FK cascade behaviors within system DB (RESTRICT, CASCADE, SET NULL)
- Complete index specifications for all identity tables
- Add scope boundary section (storage owns schemas, not auth/authorization)
- Update audit_logs with credentialId+credentialType polymorphic reference
- Add 3 new open questions (OQ-33/34/35) for credential type expansion
2026-06-02 12:33:20 +00:00

194 lines
9.1 KiB
Markdown

# ADR-049: Identity Schema Restructuring
## Status
Accepted
## Context
The identity tables in `sqlite-host.md` were derived from the hub's PostgreSQL
schema (`@alkdev/hub/docs/architecture/storage/identity.md`) but simplified
without documenting what was removed and why. This creates ambiguity for
implementation:
1. **Gitea columns** (`accounts.giteaUsername`, `organizations.giteaOrgName`)
were dropped without documented rationale. These are hub-specific
integration columns — a storage package should not couple to a particular
git hosting provider. Git association, when needed, belongs in a metagraph
instance (e.g., a project graph with git repo metadata) or a downstream
consumer's schema.
2. **`data` JSONB columns** on `accounts` and `organizations` in the hub schema
were silently dropped. The hub used these for account preferences/profile
and org billing/settings. Storage's `commonCols.metadata` serves the same
purpose — an extension namespace following `_subsystem.key` convention. The
mapping from hub `data` to storage `metadata` is unambiguous but was never
stated.
3. **Single `api_keys` table** assumes keypal-style bearer token auth only. The
@alkdev platform now has two authentication mechanisms:
- **API key** (keypal-style): client sends a bearer token, hub hashes it,
looks up by `keyHash`. Transport: HTTP/WebSocket.
- **Peer credential** (wraith-style): client presents an Ed25519 public key
or OpenSSH certificate over an SSH channel (TCP/TLS/Iroh). Server
validates against known fingerprints. Transport: wraith SSH tunnel.
These credential types have fundamentally different query patterns (hash
lookup vs fingerprint lookup), different columns (keyHash makes no sense
for SSH keys; publicKeyFingerprint makes no sense for API keys), and
different lifecycles (rotation vs addition/removal). A single table would be
mostly nulls on either side.
4. **Missing FK cascade behavior** — Identity tables live in the same system
DB, so real FK constraints apply. The current spec uses "logical reference"
language (appropriate for cross-DB scoping columns on `graphs`) for
relationships that are intra-database and should have proper cascades.
5. **Missing columns** — The hub spec includes `api_keys.description`,
`api_keys.rotatedToId`, `api_keys.lastUsedAt`, and `audit_logs.sessionId`
that were dropped from storage without rationale.
## Decision
### 1. Separate credential tables by type
Two tables with distinct columns, not a unified `credentials` table with a
type discriminator:
| Table | Auth mechanism | Lookup pattern | Transport |
|-------|----------------|----------------|-----------|
| `api_keys` | Bearer token | Hash token → look up `keyHash` | HTTP/WebSocket |
| `peer_credentials` | SSH key / cert-authority | Present fingerprint → look up `fingerprint` | wraith (SSH over TCP/TLS/Iroh) |
Rationale: Query patterns, columns, and lifecycles differ fundamentally.
Credential type proliferation is expected to be low (3-4 types ever). A new
table per type is acceptable. This extends the design principle established by
ADR-002: the metagraph pattern serves graph-shaped data (dynamic schemas,
traversal queries); dedicated tables serve fixed-schema data with known
columns and relational query patterns.
**Decision criterion for future credential types**: Credential types sharing
the same lookup column get a table with a `credentialType` discriminator.
Credential types requiring different lookup columns get their own table. For
example, a future `tls_certificate` credential that uses fingerprint-based
lookup would join `peer_credentials` with a new `credentialType` value. A
credential type using a different lookup column (e.g., a client identifier)
would warrant a new table.
### 2. Remove Gitea columns
`accounts.giteaUsername` and `organizations.giteaOrgName` are removed. Git
hosting integration is a consumer concern, not a storage infrastructure
concern. When a downstream system needs to associate accounts or organizations
with git hosting, it stores that association in:
- A metagraph instance (e.g., a project graph with a `GitRepositoryNode`)
- Consumer-side schema extensions via `commonCols.metadata`
### 3. Hub `data` column maps to storage `commonCols.metadata`
The hub's `accounts.data` and `organizations.data` JSONB columns are not
present in storage's identity tables. Their purpose (extensible account/org
metadata) is served by `commonCols.metadata`, which follows the
`_subsystem.key` convention. No data is lost — the extensible namespace
already exists. The hub maps `data` fields into `metadata` keys when
migrating.
### 4. Add `peer_credentials` table
A new table for SSH key and certificate-authority authentication over wraith
transport:
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| ownerId | text NOT NULL | FK → accounts.id (CASCADE) |
| credentialType | text NOT NULL | `ssh_key`, `cert_authority` |
| fingerprint | text NOT NULL UNIQUE | Ed25519 key fingerprint (SHA-256) |
| publicKeyData | text NOT NULL | Full public key in OpenSSH format |
| name | text | Human-readable label |
| enabled | integer NOT NULL DEFAULT 1 | Immediate disable switch |
| expiresAt | integer (timestamp) | Null = never |
| revokedAt | integer (timestamp) | Null = active |
The `credentialType` discriminator separates key entries from CA entries within
a single table because their query pattern (look up by fingerprint) is
identical. Cert-specific data (principals, restrictions, caFingerprint) goes
in `metadata`.
### 5. Add back useful API key columns
| Column | Reason |
|--------|--------|
| `rotatedToId` | API key rotation tracking — sets which key replaced this one |
| `lastUsedAt` | Stale key cleanup and access pattern analysis |
`description` is not added. `name` + `metadata` covers labeling needs.
### 6. Specify FK cascade behavior within system DB
All identity table FKs are intra-database and use real constraints:
| Relationship | onDelete | Rationale |
|-------------|----------|-----------|
| organizations.ownerId → accounts.id | RESTRICT | Cannot delete owner account while org exists |
| organization_members.orgId → organizations.id | CASCADE | Org deletion removes memberships |
| organization_members.accountId → accounts.id | CASCADE | Account deletion removes memberships |
| api_keys.ownerId → accounts.id | CASCADE | Account deletion removes API keys |
| peer_credentials.ownerId → accounts.id | CASCADE | Account deletion removes peer credentials |
| audit_logs.ownerId → accounts.id | RESTRICT | Audit integrity — deactivate account instead of delete |
`audit_logs.keyId` and `audit_logs.credentialId` are logical references (not
FK) because they may reference rows in either `api_keys` or `peer_credentials`,
and the referenced table depends on the audit event type.
### 7. Update audit_logs for multi-credential world
Replace `keyId` (API key only) with polymorphic credential references:
| Column | Type | Notes |
|--------|------|-------|
| credentialId | text | Logical reference to api_keys.id or peer_credentials.id |
| credentialType | text | `api_key`, `peer_credential`, or null |
This replaces the previous `keyId` column. The `credentialType` discriminator
tells the consumer which table to look up (same pattern as
`graphs.ownerId` — logical reference, not FK).
`sessionId` is not added. Session correlation is a hub concern, not a storage
infrastructure concern. When needed, it goes in `metadata`.
## Consequences
**Positive:**
- Two credential types covered from the start — API key auth and wraith SSH
auth can both be stored and looked up efficiently
- Each credential table has native columns for its specific fields — no
null-heavy rows, no JSON lookups for high-query fields
- Gitea coupling removed — storage doesn't depend on a specific git hosting
provider
- FK cascades specified — implementers know exactly what happens on deletion
- Clear provenance mapping — hub's `data` → storage's `metadata` is explicit
**Negative:**
- Two credential tables instead of one — but the columns don't overlap and
query patterns differ, so this is the correct trade-off
- `audit_logs.credentialId`/`credentialType` polymorphic reference — no FK
constraint, consumer resolves the table (same pattern as existing
cross-DB references)
- Hub must migrate its existing identity schema when consuming storage's
definitions — `keyId``credentialId` + `credentialType`, `data``metadata`,
Gitea columns to metagraph or consumer metadata
- `peer_credentials` credential types may grow (e.g., `tls_certificate`) —
handled by adding enum values, not new tables, since query patterns within
peer auth are similar
## References
- ADR-002: Metagraph over domain-specific tables
- ADR-041: Identity tables in storage package
- ADR-040: System DB + tenant DB separation
- Hub identity tables: `/workspace/@alkdev/hub/docs/architecture/storage/identity.md`
- Wraith NAPI + pubsub: `/workspace/@alkdev/wraith/docs/architecture/napi-and-pubsub.md`
- Wraith auth: `/workspace/@alkdev/wraith/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md`