Files
storage/docs/architecture/decisions/049-identity-schema-restructuring.md
glm-5.1 ae242f33b9 Restructure identity tables: separate credential types, add peer_credentials, specify FK cascades and indexes
Identity tables were derived from hub's PostgreSQL schema but simplified
without documenting what was removed or why. This restructures them for the
current auth landscape (API key + wraith SSH/cert-authority):

- ADR-049: Separate api_keys and peer_credentials tables (different lookup
  patterns, columns, lifecycles), remove Gitea columns, map hub data→metadata
- ADR-050: Extract SHA-256 vs KDF decision from inline spec text
- Add peer_credentials table for SSH key and cert-authority auth
- Specify all FK cascade behaviors within system DB (RESTRICT, CASCADE, SET NULL)
- Complete index specifications for all identity tables
- Add scope boundary section (storage owns schemas, not auth/authorization)
- Update audit_logs with credentialId+credentialType polymorphic reference
- Add 3 new open questions (OQ-33/34/35) for credential type expansion
2026-06-02 12:33:20 +00:00

9.1 KiB

ADR-049: Identity Schema Restructuring

Status

Accepted

Context

The identity tables in sqlite-host.md were derived from the hub's PostgreSQL schema (@alkdev/hub/docs/architecture/storage/identity.md) but simplified without documenting what was removed and why. This creates ambiguity for implementation:

  1. Gitea columns (accounts.giteaUsername, organizations.giteaOrgName) were dropped without documented rationale. These are hub-specific integration columns — a storage package should not couple to a particular git hosting provider. Git association, when needed, belongs in a metagraph instance (e.g., a project graph with git repo metadata) or a downstream consumer's schema.

  2. data JSONB columns on accounts and organizations in the hub schema were silently dropped. The hub used these for account preferences/profile and org billing/settings. Storage's commonCols.metadata serves the same purpose — an extension namespace following _subsystem.key convention. The mapping from hub data to storage metadata is unambiguous but was never stated.

  3. Single api_keys table assumes keypal-style bearer token auth only. The @alkdev platform now has two authentication mechanisms:

    • API key (keypal-style): client sends a bearer token, hub hashes it, looks up by keyHash. Transport: HTTP/WebSocket.
    • Peer credential (wraith-style): client presents an Ed25519 public key or OpenSSH certificate over an SSH channel (TCP/TLS/Iroh). Server validates against known fingerprints. Transport: wraith SSH tunnel.

    These credential types have fundamentally different query patterns (hash lookup vs fingerprint lookup), different columns (keyHash makes no sense for SSH keys; publicKeyFingerprint makes no sense for API keys), and different lifecycles (rotation vs addition/removal). A single table would be mostly nulls on either side.

  4. Missing FK cascade behavior — Identity tables live in the same system DB, so real FK constraints apply. The current spec uses "logical reference" language (appropriate for cross-DB scoping columns on graphs) for relationships that are intra-database and should have proper cascades.

  5. Missing columns — The hub spec includes api_keys.description, api_keys.rotatedToId, api_keys.lastUsedAt, and audit_logs.sessionId that were dropped from storage without rationale.

Decision

1. Separate credential tables by type

Two tables with distinct columns, not a unified credentials table with a type discriminator:

Table Auth mechanism Lookup pattern Transport
api_keys Bearer token Hash token → look up keyHash HTTP/WebSocket
peer_credentials SSH key / cert-authority Present fingerprint → look up fingerprint wraith (SSH over TCP/TLS/Iroh)

Rationale: Query patterns, columns, and lifecycles differ fundamentally. Credential type proliferation is expected to be low (3-4 types ever). A new table per type is acceptable. This extends the design principle established by ADR-002: the metagraph pattern serves graph-shaped data (dynamic schemas, traversal queries); dedicated tables serve fixed-schema data with known columns and relational query patterns.

Decision criterion for future credential types: Credential types sharing the same lookup column get a table with a credentialType discriminator. Credential types requiring different lookup columns get their own table. For example, a future tls_certificate credential that uses fingerprint-based lookup would join peer_credentials with a new credentialType value. A credential type using a different lookup column (e.g., a client identifier) would warrant a new table.

2. Remove Gitea columns

accounts.giteaUsername and organizations.giteaOrgName are removed. Git hosting integration is a consumer concern, not a storage infrastructure concern. When a downstream system needs to associate accounts or organizations with git hosting, it stores that association in:

  • A metagraph instance (e.g., a project graph with a GitRepositoryNode)
  • Consumer-side schema extensions via commonCols.metadata

3. Hub data column maps to storage commonCols.metadata

The hub's accounts.data and organizations.data JSONB columns are not present in storage's identity tables. Their purpose (extensible account/org metadata) is served by commonCols.metadata, which follows the _subsystem.key convention. No data is lost — the extensible namespace already exists. The hub maps data fields into metadata keys when migrating.

4. Add peer_credentials table

A new table for SSH key and certificate-authority authentication over wraith transport:

Column Type Notes
commonCols id, metadata, createdAt, updatedAt
ownerId text NOT NULL FK → accounts.id (CASCADE)
credentialType text NOT NULL ssh_key, cert_authority
fingerprint text NOT NULL UNIQUE Ed25519 key fingerprint (SHA-256)
publicKeyData text NOT NULL Full public key in OpenSSH format
name text Human-readable label
enabled integer NOT NULL DEFAULT 1 Immediate disable switch
expiresAt integer (timestamp) Null = never
revokedAt integer (timestamp) Null = active

The credentialType discriminator separates key entries from CA entries within a single table because their query pattern (look up by fingerprint) is identical. Cert-specific data (principals, restrictions, caFingerprint) goes in metadata.

5. Add back useful API key columns

Column Reason
rotatedToId API key rotation tracking — sets which key replaced this one
lastUsedAt Stale key cleanup and access pattern analysis

description is not added. name + metadata covers labeling needs.

6. Specify FK cascade behavior within system DB

All identity table FKs are intra-database and use real constraints:

Relationship onDelete Rationale
organizations.ownerId → accounts.id RESTRICT Cannot delete owner account while org exists
organization_members.orgId → organizations.id CASCADE Org deletion removes memberships
organization_members.accountId → accounts.id CASCADE Account deletion removes memberships
api_keys.ownerId → accounts.id CASCADE Account deletion removes API keys
peer_credentials.ownerId → accounts.id CASCADE Account deletion removes peer credentials
audit_logs.ownerId → accounts.id RESTRICT Audit integrity — deactivate account instead of delete

audit_logs.keyId and audit_logs.credentialId are logical references (not FK) because they may reference rows in either api_keys or peer_credentials, and the referenced table depends on the audit event type.

7. Update audit_logs for multi-credential world

Replace keyId (API key only) with polymorphic credential references:

Column Type Notes
credentialId text Logical reference to api_keys.id or peer_credentials.id
credentialType text api_key, peer_credential, or null

This replaces the previous keyId column. The credentialType discriminator tells the consumer which table to look up (same pattern as graphs.ownerId — logical reference, not FK).

sessionId is not added. Session correlation is a hub concern, not a storage infrastructure concern. When needed, it goes in metadata.

Consequences

Positive:

  • Two credential types covered from the start — API key auth and wraith SSH auth can both be stored and looked up efficiently
  • Each credential table has native columns for its specific fields — no null-heavy rows, no JSON lookups for high-query fields
  • Gitea coupling removed — storage doesn't depend on a specific git hosting provider
  • FK cascades specified — implementers know exactly what happens on deletion
  • Clear provenance mapping — hub's data → storage's metadata is explicit

Negative:

  • Two credential tables instead of one — but the columns don't overlap and query patterns differ, so this is the correct trade-off
  • audit_logs.credentialId/credentialType polymorphic reference — no FK constraint, consumer resolves the table (same pattern as existing cross-DB references)
  • Hub must migrate its existing identity schema when consuming storage's definitions — keyIdcredentialId + credentialType, datametadata, Gitea columns to metagraph or consumer metadata
  • peer_credentials credential types may grow (e.g., tls_certificate) — handled by adding enum values, not new tables, since query patterns within peer auth are similar

References

  • ADR-002: Metagraph over domain-specific tables
  • ADR-041: Identity tables in storage package
  • ADR-040: System DB + tenant DB separation
  • Hub identity tables: /workspace/@alkdev/hub/docs/architecture/storage/identity.md
  • Wraith NAPI + pubsub: /workspace/@alkdev/wraith/docs/architecture/napi-and-pubsub.md
  • Wraith auth: /workspace/@alkdev/wraith/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md