hub/docs/decisions/ADR-011-dual-task-representation.md

# ADR-011: Database as source of truth for tasks

- **Status**: Accepted
- **Date**: 2026-04-19
- **Deciders**: alkdev
- **Supersedes**: The previous "dual representation" design, in which files were the source of truth for content and the DB for state

## Context

The SDD process represents tasks as markdown files (compatible with the `taskgraph` CLI). The hub coordinator needs to query and mutate task state at runtime across multiple parallel worktrees, so we need a storage model that serves both authoring and runtime coordination.

Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:

- **Parallel worktrees**: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
- **Merge conflicts**: Two agents editing the same task file in different worktrees creates git conflicts on merge.
- **Reliable coordination**: The coordinator needs to query "which tasks are pending?" without scanning filesystems across worktrees.
- **Atomic mutations**: Status changes must be immediately visible to all agents, not delayed until file merges.

Three options were considered:

1. **Files only**: The coordinator runs `taskgraph` CLI commands via bash to query status. Agents edit files directly.
2. **Database only**: Tasks are stored exclusively in Postgres. There are no markdown files.
3. **Database as source of truth, files as authoring surface**: The DB is the authoritative runtime representation. Markdown files serve as the Decomposer's authoring format and are ingested into the DB via sync. The taskgraph CLI is used for offline analysis via DB export.

## Decision

We choose **Option 3: Database as source of truth, files as authoring surface**.

### Authority Model

| Aspect | Authority | Why |
|--------|-----------|-----|
| All task fields (structure, categorical estimates, metadata) | **DB** | Every taskgraph frontmatter field maps to a dedicated DB column, which makes it queryable, concurrency-safe, and consistent. |
| Task specification (body) | **DB** (`body` column) | Stored as markdown text. Agents append notes during execution. |
| Task creation/authoring | **Files** → sync → DB | Decomposer edits markdown files; sync ingests them into DB. |
| Runtime status mutations | **DB** (hub operations) | `hub.task.*` operations ensure all agents see consistent state. |
| Offline graph analysis | **Files** (taskgraph CLI) | Export from DB when needed for `taskgraph risk-path` etc. |
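
To illustrate why runtime status mutations are routed through the DB, here is a minimal sketch of an atomic claim. It uses Python's built-in `sqlite3` as a stand-in for Postgres. The table shape and the `claim_task` helper are hypothetical and are not the actual `hub.task.*` implementation.

```python
import sqlite3


def claim_task(conn: sqlite3.Connection, task_id: str, agent: str) -> bool:
    """Move a task from 'pending' to 'in-progress' only if it is still pending.

    Hypothetical helper: the WHERE clause makes the check and the update
    a single statement, so two agents cannot both claim the same task.
    """
    cur = conn.execute(
        "UPDATE tasks SET status = 'in-progress', assignee = ? "
        "WHERE id = ? AND status = 'pending'",
        (agent, task_id),
    )
    conn.commit()
    return cur.rowcount == 1  # exactly one row changed means the claim succeeded


# Assumed, simplified table shape for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL, assignee TEXT)"
)
conn.execute("INSERT INTO tasks VALUES ('setup-db', 'pending', NULL)")
conn.commit()

first = claim_task(conn, "setup-db", "agent-a")   # claim succeeds
second = claim_task(conn, "setup-db", "agent-b")  # task is no longer pending
```

With files, the equivalent check would require every agent to see every other worktree's edits; with a shared DB row, the second claim simply matches no rows.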

### Key Design Principles

1. **Every taskgraph frontmatter field is a proper DB column** — no fields relegated to JSONB `metadata`. `priority`, `assignee`, `dueAt`, `tags` get dedicated columns because they're queryable and filterable in coordinator workflows.

2. **Categorical fields are nullable, not NOT NULL with defaults** — `scope`, `risk`, `impact`, `level` are nullable (NULL = not yet assessed). This preserves the distinction between "deliberately assessed as low" and "nobody filled this in." Taskgraph itself uses `Option<TaskScope>` etc.

3. **No `parentId`** — Grouping is handled by `path` (a nullable text column for scoped queries like `WHERE path LIKE 'implementation/%'`). Dependencies are in `task_dependencies`. These are separate concepts.

4. **No `removedAt` soft delete** — When a task file is removed, the sync DELETEs the DB row. Git history preserves file-level history. No DB duplication needed.

5. **`fileCreatedAt`/`fileModifiedAt`** — Dedicated columns for frontmatter timestamps, separate from DB `createdAt`/`updatedAt` (row lifecycle times).
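
The principles above can be sketched as a schema. This is an illustrative approximation only: it uses Python's built-in `sqlite3` instead of Postgres, the column types and the `task_dependencies` column names are assumptions, and `tags` is omitted because its column type is not specified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id             TEXT PRIMARY KEY,
    path           TEXT,           -- nullable grouping key (there is no parentId)
    status         TEXT NOT NULL,
    scope          TEXT,           -- categorical fields: NULL = not yet assessed
    risk           TEXT,
    impact         TEXT,
    level          TEXT,
    priority       TEXT,
    assignee       TEXT,
    dueAt          TEXT,
    body           TEXT,           -- markdown task specification
    fileCreatedAt  TEXT,           -- timestamps taken from frontmatter
    fileModifiedAt TEXT,
    createdAt      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- row lifecycle
    updatedAt      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Dependencies live in their own table, separate from grouping by path.
-- Column names here are hypothetical.
CREATE TABLE task_dependencies (
    taskId      TEXT NOT NULL REFERENCES tasks(id),
    dependsOnId TEXT NOT NULL REFERENCES tasks(id),
    PRIMARY KEY (taskId, dependsOnId)
);
""")

conn.executemany(
    "INSERT INTO tasks (id, path, status, risk) VALUES (?, ?, 'pending', ?)",
    [
        ("api-auth", "implementation/api", "low"),   # deliberately assessed as low
        ("api-cache", "implementation/api", None),   # nobody has assessed risk yet
        ("write-adr", "docs", "high"),
    ],
)
conn.commit()


def ids(sql: str) -> list[str]:
    """Run a query that selects task ids and return them sorted."""
    return [row[0] for row in conn.execute(sql + " ORDER BY id")]


scoped = ids("SELECT id FROM tasks WHERE path LIKE 'implementation/%'")
unassessed = ids("SELECT id FROM tasks WHERE risk IS NULL")
assessed_low = ids("SELECT id FROM tasks WHERE risk = 'low'")
```

Here `unassessed` and `assessed_low` return different tasks, which is the distinction a `NOT NULL DEFAULT 'low'` column would erase.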

## Consequences

**Positive**:
- Coordinator gets a reliable, consistent view of all task state across parallel worktrees.
- No merge conflicts from agents editing the same file in different worktrees.
- Status changes are atomic and immediately visible to all agents via hub operations.
- All taskgraph fields are queryable with proper SQL types and indexes.
- Taskgraph CLI still works for offline analysis via DB → file export.
- Nullable categorical fields provide the "not yet assessed" signal that defaults hide.

**Negative**:
- Two representations exist (files and DB), requiring a sync operation.
- Files are no longer the source of truth — they're the authoring surface. This is a conceptual shift from taskgraph's default model.
- DB → file export is needed for offline analysis (not automatic).

**Mitigation for negatives**:
- Sync is idempotent and can be run at any time after authoring.
- The DB is the authority; files are just one input method. Tasks can also be created via the hub API.
- Export for offline analysis is a manual step (run when needed), not a continuous sync.
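
As a rough sketch of what an idempotent sync could look like, the example below upserts one row per parsed task file and deletes rows whose file no longer exists. It uses Python's built-in `sqlite3` as a stand-in for Postgres. The `sync_tasks` helper, the simplified table, and the choice of which fields the sync overwrites are all assumptions; in particular, leaving `status` untouched on re-sync is one possible policy, not a documented one.

```python
import sqlite3


def sync_tasks(conn: sqlite3.Connection, files: dict[str, dict]) -> None:
    """Ingest parsed task files (hypothetical shape: task id -> frontmatter dict)."""
    for task_id, frontmatter in files.items():
        # Upsert: new tasks start as 'pending'; for existing rows only the
        # authored fields are refreshed, so runtime status is preserved.
        conn.execute(
            "INSERT INTO tasks (id, path, risk, status) VALUES (?, ?, ?, 'pending') "
            "ON CONFLICT(id) DO UPDATE SET path = excluded.path, risk = excluded.risk",
            (task_id, frontmatter.get("path"), frontmatter.get("risk")),
        )
    # Hard delete rows whose task file was removed (no soft delete).
    existing = {row[0] for row in conn.execute("SELECT id FROM tasks")}
    removed = existing - set(files)
    conn.executemany("DELETE FROM tasks WHERE id = ?", [(i,) for i in removed])
    conn.commit()


def snapshot(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, status) pairs sorted by id."""
    return conn.execute("SELECT id, status FROM tasks ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, path TEXT, risk TEXT, status TEXT NOT NULL)"
)

files = {
    "api-auth": {"path": "implementation/api", "risk": None},
    "write-adr": {"path": "docs", "risk": "low"},
}
sync_tasks(conn, files)

# A runtime mutation happens in the DB between syncs.
conn.execute("UPDATE tasks SET status = 'in-progress' WHERE id = 'api-auth'")
conn.commit()

sync_tasks(conn, files)          # re-running with the same input changes nothing
after_rerun = snapshot(conn)

del files["write-adr"]           # the task file is removed from the repo
sync_tasks(conn, files)
after_removal = snapshot(conn)
```

The same `INSERT ... ON CONFLICT ... DO UPDATE` form exists in Postgres, so the upsert part of the sketch carries over largely unchanged.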

## Related

- ADR-001: JSONB data columns vs individual columns (same principle — proper columns for queryable fields)
- Cost-benefit framework: taskgraph framework docs
- Task storage: `docs/architecture/storage/tasks.md`
- taskgraph TaskFrontmatter: taskgraph source