storage/docs/architecture/decisions/049-identity-schema-restructuring.md

# ADR-049: Identity Schema Restructuring

## Status

Accepted

## Context

The identity tables in `sqlite-host.md` were derived from the hub's PostgreSQL
schema (`@alkdev/hub/docs/architecture/storage/identity.md`) but simplified
without documenting what was removed and why. This creates ambiguity for
implementation:

1. **Gitea columns** (`accounts.giteaUsername`, `organizations.giteaOrgName`)
   were dropped without documented rationale. These are hub-specific
   integration columns — a storage package should not couple to a particular
   git hosting provider. Git association, when needed, belongs in a metagraph
   instance (e.g., a project graph with git repo metadata) or a downstream
   consumer's schema.

2. **`data` JSONB columns** on `accounts` and `organizations` in the hub schema
   were silently dropped. The hub used these for account preferences/profile
   and org billing/settings. Storage's `commonCols.metadata` serves the same
   purpose — an extension namespace following `_subsystem.key` convention. The
   mapping from hub `data` to storage `metadata` is unambiguous but was never
   stated.

3. **Single `api_keys` table** assumes keypal-style bearer token auth only. The
   @alkdev platform now has two authentication mechanisms:
   - **API key** (keypal-style): client sends a bearer token, hub hashes it,
     looks up by `keyHash`. Transport: HTTP/WebSocket.
   - **Peer credential** (wraith-style): client presents an Ed25519 public key
     or OpenSSH certificate over an SSH channel (TCP/TLS/Iroh). Server
     validates against known fingerprints. Transport: wraith SSH tunnel.

   These credential types have fundamentally different query patterns (hash
   lookup vs fingerprint lookup), different columns (keyHash makes no sense
   for SSH keys; publicKeyFingerprint makes no sense for API keys), and
   different lifecycles (rotation vs addition/removal). A single table would be
   mostly nulls on either side.

4. **Missing FK cascade behavior** — Identity tables live in the same system
   DB, so real FK constraints apply. The current spec uses "logical reference"
   language (appropriate for cross-DB scoping columns on `graphs`) for
   relationships that are intra-database and should have proper cascades.

5. **Missing columns** — The hub spec includes `api_keys.description`,
   `api_keys.rotatedToId`, `api_keys.lastUsedAt`, and `audit_logs.sessionId`
   that were dropped from storage without rationale.

## Decision

### 1. Separate credential tables by type

Two tables with distinct columns, not a unified `credentials` table with a
type discriminator:

| Table | Auth mechanism | Lookup pattern | Transport |
|-------|----------------|----------------|-----------|
| `api_keys` | Bearer token | Hash token → look up `keyHash` | HTTP/WebSocket |
| `peer_credentials` | SSH key / cert-authority | Present fingerprint → look up `fingerprint` | wraith (SSH over TCP/TLS/Iroh) |

Rationale: Query patterns, columns, and lifecycles differ fundamentally.
Credential type proliferation is expected to be low (3-4 types ever). A new
table per type is acceptable. This extends the design principle established by
ADR-002: the metagraph pattern serves graph-shaped data (dynamic schemas,
traversal queries); dedicated tables serve fixed-schema data with known
columns and relational query patterns.

**Decision criterion for future credential types**: Credential types sharing
the same lookup column get a table with a `credentialType` discriminator.
Credential types requiring different lookup columns get their own table. For
example, a future `tls_certificate` credential that uses fingerprint-based
lookup would join `peer_credentials` with a new `credentialType` value. A
credential type using a different lookup column (e.g., a client identifier)
would warrant a new table.

### 2. Remove Gitea columns

`accounts.giteaUsername` and `organizations.giteaOrgName` are removed. Git
hosting integration is a consumer concern, not a storage infrastructure
concern. When a downstream system needs to associate accounts or organizations
with git hosting, it stores that association in:
- A metagraph instance (e.g., a project graph with a `GitRepositoryNode`)
- Consumer-side schema extensions via `commonCols.metadata`

### 3. Hub `data` column maps to storage `commonCols.metadata`

The hub's `accounts.data` and `organizations.data` JSONB columns are not
present in storage's identity tables. Their purpose (extensible account/org
metadata) is served by `commonCols.metadata`, which follows the
`_subsystem.key` convention. No data is lost — the extensible namespace
already exists. The hub maps `data` fields into `metadata` keys when
migrating.

### 4. Add `peer_credentials` table

A new table for SSH key and certificate-authority authentication over wraith
transport:

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| ownerId | text NOT NULL | FK → accounts.id (CASCADE) |
| credentialType | text NOT NULL | `ssh_key`, `cert_authority` |
| fingerprint | text NOT NULL UNIQUE | Ed25519 key fingerprint (SHA-256) |
| publicKeyData | text NOT NULL | Full public key in OpenSSH format |
| name | text | Human-readable label |
| enabled | integer NOT NULL DEFAULT 1 | Immediate disable switch |
| expiresAt | integer (timestamp) | Null = never |
| revokedAt | integer (timestamp) | Null = active |

The `credentialType` discriminator separates key entries from CA entries within
a single table because their query pattern (look up by fingerprint) is
identical. Cert-specific data (principals, restrictions, caFingerprint) goes
in `metadata`.

### 5. Add back useful API key columns

| Column | Reason |
|--------|--------|
| `rotatedToId` | API key rotation tracking — sets which key replaced this one |
| `lastUsedAt` | Stale key cleanup and access pattern analysis |

`description` is not added. `name` + `metadata` covers labeling needs.

### 6. Specify FK cascade behavior within system DB

All identity table FKs are intra-database and use real constraints:

| Relationship | onDelete | Rationale |
|-------------|----------|-----------|
| organizations.ownerId → accounts.id | RESTRICT | Cannot delete owner account while org exists |
| organization_members.orgId → organizations.id | CASCADE | Org deletion removes memberships |
| organization_members.accountId → accounts.id | CASCADE | Account deletion removes memberships |
| api_keys.ownerId → accounts.id | CASCADE | Account deletion removes API keys |
| peer_credentials.ownerId → accounts.id | CASCADE | Account deletion removes peer credentials |
| audit_logs.ownerId → accounts.id | RESTRICT | Audit integrity — deactivate account instead of delete |

`audit_logs.keyId` and `audit_logs.credentialId` are logical references (not
FK) because they may reference rows in either `api_keys` or `peer_credentials`,
and the referenced table depends on the audit event type.

### 7. Update audit_logs for multi-credential world

Replace `keyId` (API key only) with polymorphic credential references:

| Column | Type | Notes |
|--------|------|-------|
| credentialId | text | Logical reference to api_keys.id or peer_credentials.id |
| credentialType | text | `api_key`, `peer_credential`, or null |

This replaces the previous `keyId` column. The `credentialType` discriminator
tells the consumer which table to look up (same pattern as
`graphs.ownerId` — logical reference, not FK).

`sessionId` is not added. Session correlation is a hub concern, not a storage
infrastructure concern. When needed, it goes in `metadata`.

## Consequences

**Positive:**

- Two credential types covered from the start — API key auth and wraith SSH
  auth can both be stored and looked up efficiently
- Each credential table has native columns for its specific fields — no
  null-heavy rows, no JSON lookups for high-query fields
- Gitea coupling removed — storage doesn't depend on a specific git hosting
  provider
- FK cascades specified — implementers know exactly what happens on deletion
- Clear provenance mapping — hub's `data` → storage's `metadata` is explicit

**Negative:**

- Two credential tables instead of one — but the columns don't overlap and
  query patterns differ, so this is the correct trade-off
- `audit_logs.credentialId`/`credentialType` polymorphic reference — no FK
  constraint, consumer resolves the table (same pattern as existing
  cross-DB references)
- Hub must migrate its existing identity schema when consuming storage's
  definitions — `keyId` → `credentialId` + `credentialType`, `data` → `metadata`,
  Gitea columns to metagraph or consumer metadata
- `peer_credentials` credential types may grow (e.g., `tls_certificate`) —
  handled by adding enum values, not new tables, since query patterns within
  peer auth are similar

## References

- ADR-002: Metagraph over domain-specific tables
- ADR-041: Identity tables in storage package
- ADR-040: System DB + tenant DB separation
- Hub identity tables: `/workspace/@alkdev/hub/docs/architecture/storage/identity.md`
- Wraith NAPI + pubsub: `/workspace/@alkdev/wraith/docs/architecture/napi-and-pubsub.md`
- Wraith auth: `/workspace/@alkdev/wraith/docs/architecture/decisions/012-auth-ed25519-and-cert-authority.md`