# ADR-049: Identity Schema Restructuring ## Status Accepted ## Context The identity tables in `sqlite-host.md` were derived from the hub's PostgreSQL schema (`@alkdev/hub/docs/architecture/storage/identity.md`) but simplified without documenting what was removed and why. This creates ambiguity for implementation: 1. **Gitea columns** (`accounts.giteaUsername`, `organizations.giteaOrgName`) were dropped without documented rationale. These are hub-specific integration columns — a storage package should not couple to a particular git hosting provider. Git association, when needed, belongs in a metagraph instance (e.g., a project graph with git repo metadata) or a downstream consumer's schema. 2. **`data` JSONB columns** on `accounts` and `organizations` in the hub schema were silently dropped. The hub used these for account preferences/profile and org billing/settings. Storage's `commonCols.metadata` serves the same purpose — an extension namespace following `_subsystem.key` convention. The mapping from hub `data` to storage `metadata` is unambiguous but was never stated. 3. **Single `api_keys` table** assumes keypal-style bearer token auth only. The @alkdev platform now has two authentication mechanisms: - **API key** (keypal-style): client sends a bearer token, hub hashes it, looks up by `keyHash`. Transport: HTTP/WebSocket. - **Peer credential** (wraith-style): client presents an Ed25519 public key or OpenSSH certificate over an SSH channel (TCP/TLS/Iroh). Server validates against known fingerprints. Transport: wraith SSH tunnel. These credential types have fundamentally different query patterns (hash lookup vs fingerprint lookup), different columns (keyHash makes no sense for SSH keys; publicKeyFingerprint makes no sense for API keys), and different lifecycles (rotation vs addition/removal). A single table would be mostly nulls on either side. 4. **Missing FK cascade behavior** — Identity tables live in the same system DB, so real FK constraints apply. The current spec uses "logical reference" language (appropriate for cross-DB scoping columns on `graphs`) for relationships that are intra-database and should have proper cascades. 5. **Missing columns** — The hub spec includes `api_keys.description`, `api_keys.rotatedToId`, `api_keys.lastUsedAt`, and `audit_logs.sessionId` that were dropped from storage without rationale. ## Decision ### 1. Separate credential tables by type Two tables with distinct columns, not a unified `credentials` table with a type discriminator: | Table | Auth mechanism | Lookup pattern | Transport | |-------|----------------|----------------|-----------| | `api_keys` | Bearer token | Hash token → look up `keyHash` | HTTP/WebSocket | | `peer_credentials` | SSH key / cert-authority | Present fingerprint → look up `fingerprint` | wraith (SSH over TCP/TLS/Iroh) | Rationale: Query patterns, columns, and lifecycles differ fundamentally. Credential type proliferation is expected to be low (3-4 types ever). A new table per type is acceptable. This extends the design principle established by ADR-002: the metagraph pattern serves graph-shaped data (dynamic schemas, traversal queries); dedicated tables serve fixed-schema data with known columns and relational query patterns. **Decision criterion for future credential types**: Credential types sharing the same lookup column get a table with a `credentialType` discriminator. Credential types requiring different lookup columns get their own table. For example, a future `tls_certificate` credential that uses fingerprint-based lookup would join `peer_credentials` with a new `credentialType` value. A credential type using a different lookup column (e.g., a client identifier) would warrant a new table. ### 2. Remove Gitea columns `accounts.giteaUsername` and `organizations.giteaOrgName` are removed. Git hosting integration is a consumer concern, not a storage infrastructure concern. When a downstream system needs to associate accounts or organizations with git hosting, it stores that association in: - A metagraph instance (e.g., a project graph with a `GitRepositoryNode`) - Consumer-side schema extensions via `commonCols.metadata` ### 3. Hub `data` column maps to storage `commonCols.metadata` The hub's `accounts.data` and `organizations.data` JSONB columns are not present in storage's identity tables. Their purpose (extensible account/org metadata) is served by `commonCols.metadata`, which follows the `_subsystem.key` convention. No data is lost — the extensible namespace already exists. The hub maps `data` fields into `metadata` keys when migrating. ### 4. Add `peer_credentials` table A new table for SSH key and certificate-authority authentication over wraith transport: | Column | Type | Notes | |--------|------|-------| | commonCols | — | id, metadata, createdAt, updatedAt | | ownerId | text NOT NULL | FK → accounts.id (CASCADE) | | credentialType | text NOT NULL | `ssh_key`, `cert_authority` | | fingerprint | text NOT NULL UNIQUE | Ed25519 key fingerprint (SHA-256) | | publicKeyData | text NOT NULL | Full public key in OpenSSH format | | name | text | Human-readable label | | enabled | integer NOT NULL DEFAULT 1 | Immediate disable switch | | expiresAt | integer (timestamp) | Null = never | | revokedAt | integer (timestamp) | Null = active | The `credentialType` discriminator separates key entries from CA entries within a single table because their query pattern (look up by fingerprint) is identical. Cert-specific data (principals, restrictions, caFingerprint) goes in `metadata`. ### 5. Add back useful API key columns | Column | Reason | |--------|--------| | `rotatedToId` | API key rotation tracking — sets which key replaced this one | | `lastUsedAt` | Stale key cleanup and access pattern analysis | `description` is not added. `name` + `metadata` covers labeling needs. ### 6. Specify FK cascade behavior within system DB All identity table FKs are intra-database and use real constraints: | Relationship | onDelete | Rationale | |-------------|----------|-----------| | organizations.ownerId → accounts.id | RESTRICT | Cannot delete owner account while org exists | | organization_members.orgId → organizations.id | CASCADE | Org deletion removes memberships | | organization_members.accountId → accounts.id | CASCADE | Account deletion removes memberships | | api_keys.ownerId → accounts.id | CASCADE | Account deletion removes API keys | | peer_credentials.ownerId → accounts.id | CASCADE | Account deletion removes peer credentials | | audit_logs.ownerId → accounts.id | RESTRICT | Audit integrity — deactivate account instead of delete | `audit_logs.keyId` and `audit_logs.credentialId` are logical references (not FK) because they may reference rows in either `api_keys` or `peer_credentials`, and the referenced table depends on the audit event type. ### 7. Update audit_logs for multi-credential world Replace `keyId` (API key only) with polymorphic credential references: | Column | Type | Notes | |--------|------|-------| | credentialId | text | Logical reference to api_keys.id or peer_credentials.id | | credentialType | text | `api_key`, `peer_credential`, or null | This replaces the previous `keyId` column. The `credentialType` discriminator tells the consumer which table to look up (same pattern as `graphs.ownerId` — logical reference, not FK). `sessionId` is not added. Session correlation is a hub concern, not a storage infrastructure concern. When needed, it goes in `metadata`. ## Consequences **Positive:** - Two credential types covered from the start — API key auth and wraith SSH auth can both be stored and looked up efficiently - Each credential table has native columns for its specific fields — no null-heavy rows, no JSON lookups for high-query fields - Gitea coupling removed — storage doesn't depend on a specific git hosting provider - FK cascades specified — implementers know exactly what happens on deletion - Clear provenance mapping — hub's `data` → storage's `metadata` is explicit **Negative:** - Two credential tables instead of one — but the columns don't overlap and query patterns differ, so this is the correct trade-off - `audit_logs.credentialId`/`credentialType` polymorphic reference — no FK constraint, consumer resolves the table (same pattern as existing cross-DB references) - Hub must migrate its existing identity schema when consuming storage's definitions — `keyId` → `credentialId` + `credentialType`, `data` → `metadata`, Gitea columns to metagraph or consumer metadata - `peer_credentials` credential types may grow (e.g., `tls_certificate`) — handled by adding enum values, not new tables, since query patterns within peer auth are similar ## References - ADR-002: Metagraph over domain-specific tables - ADR-041: Identity tables in storage package - ADR-040: System DB + tenant DB separation - Hub identity tables: `/workspace/@alkdev/hub/docs/architecture/storage/identity.md` - Wraith NAPI + pubsub: `/workspace/@alkdev/wraith/docs/architecture/napi-and-pubsub.md` - Wraith auth: `/workspace/@alkdev/wraith/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md`