Files

glm-5.1 ae242f33b9 Restructure identity tables: separate credential types, add peer_credentials, specify FK cascades and indexes

Identity tables were derived from hub's PostgreSQL schema but simplified
without documenting what was removed or why. This restructures them for the
current auth landscape (API key + wraith SSH/cert-authority):

- ADR-049: Separate api_keys and peer_credentials tables (different lookup
  patterns, columns, lifecycles), remove Gitea columns, map hub data→metadata
- ADR-050: Extract SHA-256 vs KDF decision from inline spec text
- Add peer_credentials table for SSH key and cert-authority auth
- Specify all FK cascade behaviors within system DB (RESTRICT, CASCADE, SET NULL)
- Complete index specifications for all identity tables
- Add scope boundary section (storage owns schemas, not auth/authorization)
- Update audit_logs with credentialId+credentialType polymorphic reference
- Add 3 new open questions (OQ-33/34/35) for credential type expansion

2026-06-02 12:33:20 +00:00

9.1 KiB

Raw Permalink Blame History

ADR-049: Identity Schema Restructuring

Status

Accepted

Context

The identity tables in sqlite-host.md were derived from the hub's PostgreSQL schema (@alkdev/hub/docs/architecture/storage/identity.md) but simplified without documenting what was removed and why. This creates ambiguity for implementation:

Gitea columns (accounts.giteaUsername, organizations.giteaOrgName) were dropped without documented rationale. These are hub-specific integration columns — a storage package should not couple to a particular git hosting provider. Git association, when needed, belongs in a metagraph instance (e.g., a project graph with git repo metadata) or a downstream consumer's schema.
data JSONB columns on accounts and organizations in the hub schema were silently dropped. The hub used these for account preferences/profile and org billing/settings. Storage's commonCols.metadata serves the same purpose — an extension namespace following _subsystem.key convention. The mapping from hub data to storage metadata is unambiguous but was never stated.
Single api_keys table assumes keypal-style bearer token auth only. The @alkdev platform now has two authentication mechanisms:
- API key (keypal-style): client sends a bearer token, hub hashes it, looks up by keyHash. Transport: HTTP/WebSocket.
- Peer credential (wraith-style): client presents an Ed25519 public key or OpenSSH certificate over an SSH channel (TCP/TLS/Iroh). Server validates against known fingerprints. Transport: wraith SSH tunnel.
These credential types have fundamentally different query patterns (hash lookup vs fingerprint lookup), different columns (keyHash makes no sense for SSH keys; publicKeyFingerprint makes no sense for API keys), and different lifecycles (rotation vs addition/removal). A single table would be mostly nulls on either side.
Missing FK cascade behavior — Identity tables live in the same system DB, so real FK constraints apply. The current spec uses "logical reference" language (appropriate for cross-DB scoping columns on graphs) for relationships that are intra-database and should have proper cascades.
Missing columns — The hub spec includes api_keys.description, api_keys.rotatedToId, api_keys.lastUsedAt, and audit_logs.sessionId that were dropped from storage without rationale.

Decision

1. Separate credential tables by type

Two tables with distinct columns, not a unified credentials table with a type discriminator:

Table	Auth mechanism	Lookup pattern	Transport
`api_keys`	Bearer token	Hash token → look up `keyHash`	HTTP/WebSocket
`peer_credentials`	SSH key / cert-authority	Present fingerprint → look up `fingerprint`	wraith (SSH over TCP/TLS/Iroh)

Rationale: Query patterns, columns, and lifecycles differ fundamentally. Credential type proliferation is expected to be low (3-4 types ever). A new table per type is acceptable. This extends the design principle established by ADR-002: the metagraph pattern serves graph-shaped data (dynamic schemas, traversal queries); dedicated tables serve fixed-schema data with known columns and relational query patterns.

Decision criterion for future credential types: Credential types sharing the same lookup column get a table with a credentialType discriminator. Credential types requiring different lookup columns get their own table. For example, a future tls_certificate credential that uses fingerprint-based lookup would join peer_credentials with a new credentialType value. A credential type using a different lookup column (e.g., a client identifier) would warrant a new table.

2. Remove Gitea columns

accounts.giteaUsername and organizations.giteaOrgName are removed. Git hosting integration is a consumer concern, not a storage infrastructure concern. When a downstream system needs to associate accounts or organizations with git hosting, it stores that association in:

A metagraph instance (e.g., a project graph with a GitRepositoryNode)
Consumer-side schema extensions via commonCols.metadata

3. Hub `data` column maps to storage `commonCols.metadata`

The hub's accounts.data and organizations.data JSONB columns are not present in storage's identity tables. Their purpose (extensible account/org metadata) is served by commonCols.metadata, which follows the _subsystem.key convention. No data is lost — the extensible namespace already exists. The hub maps data fields into metadata keys when migrating.

4. Add `peer_credentials` table

A new table for SSH key and certificate-authority authentication over wraith transport:

Column	Type	Notes
commonCols	—	id, metadata, createdAt, updatedAt
ownerId	text NOT NULL	FK → accounts.id (CASCADE)
credentialType	text NOT NULL	`ssh_key`, `cert_authority`
fingerprint	text NOT NULL UNIQUE	Ed25519 key fingerprint (SHA-256)
publicKeyData	text NOT NULL	Full public key in OpenSSH format
name	text	Human-readable label
enabled	integer NOT NULL DEFAULT 1	Immediate disable switch
expiresAt	integer (timestamp)	Null = never
revokedAt	integer (timestamp)	Null = active

The credentialType discriminator separates key entries from CA entries within a single table because their query pattern (look up by fingerprint) is identical. Cert-specific data (principals, restrictions, caFingerprint) goes in metadata.

5. Add back useful API key columns

Column	Reason
`rotatedToId`	API key rotation tracking — sets which key replaced this one
`lastUsedAt`	Stale key cleanup and access pattern analysis

description is not added. name + metadata covers labeling needs.

6. Specify FK cascade behavior within system DB

All identity table FKs are intra-database and use real constraints:

Relationship	onDelete	Rationale
organizations.ownerId → accounts.id	RESTRICT	Cannot delete owner account while org exists
organization_members.orgId → organizations.id	CASCADE	Org deletion removes memberships
organization_members.accountId → accounts.id	CASCADE	Account deletion removes memberships
api_keys.ownerId → accounts.id	CASCADE	Account deletion removes API keys
peer_credentials.ownerId → accounts.id	CASCADE	Account deletion removes peer credentials
audit_logs.ownerId → accounts.id	RESTRICT	Audit integrity — deactivate account instead of delete

audit_logs.keyId and audit_logs.credentialId are logical references (not FK) because they may reference rows in either api_keys or peer_credentials, and the referenced table depends on the audit event type.

7. Update audit_logs for multi-credential world

Replace keyId (API key only) with polymorphic credential references:

Column	Type	Notes
credentialId	text	Logical reference to api_keys.id or peer_credentials.id
credentialType	text	`api_key`, `peer_credential`, or null

This replaces the previous keyId column. The credentialType discriminator tells the consumer which table to look up (same pattern as graphs.ownerId — logical reference, not FK).

sessionId is not added. Session correlation is a hub concern, not a storage infrastructure concern. When needed, it goes in metadata.

Consequences

Positive:

Two credential types covered from the start — API key auth and wraith SSH auth can both be stored and looked up efficiently
Each credential table has native columns for its specific fields — no null-heavy rows, no JSON lookups for high-query fields
Gitea coupling removed — storage doesn't depend on a specific git hosting provider
FK cascades specified — implementers know exactly what happens on deletion
Clear provenance mapping — hub's data → storage's metadata is explicit

Negative:

Two credential tables instead of one — but the columns don't overlap and query patterns differ, so this is the correct trade-off
audit_logs.credentialId/credentialType polymorphic reference — no FK constraint, consumer resolves the table (same pattern as existing cross-DB references)
Hub must migrate its existing identity schema when consuming storage's definitions — keyId → credentialId + credentialType, data → metadata, Gitea columns to metagraph or consumer metadata
peer_credentials credential types may grow (e.g., tls_certificate) — handled by adding enum values, not new tables, since query patterns within peer auth are similar

References

ADR-002: Metagraph over domain-specific tables
ADR-041: Identity tables in storage package
ADR-040: System DB + tenant DB separation
Hub identity tables: /workspace/@alkdev/hub/docs/architecture/storage/identity.md
Wraith NAPI + pubsub: /workspace/@alkdev/wraith/docs/architecture/napi-and-pubsub.md
Wraith auth: /workspace/@alkdev/wraith/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md

9.1 KiB Raw Permalink Blame History