hub/docs/architecture/storage/tasks.md

---
status: draft
last_updated: 2026-05-18
---

# Storage: Tasks & Task Dependencies

Tasks are the unit of work in the Spec-Driven Development (SDD) process. The **database is the source of truth** for task data at runtime. Markdown files serve as the **authoring surface** for the Decomposer role and the `taskgraph` CLI — they are ingested into the DB via a sync operation and can be exported back for offline analysis.

For the overall storage pattern, see [README.md](./README.md). For cross-cutting table reference (common columns, cascade behavior, index reference, status enums, relations), see [table-reference.md](./table-reference.md). For design decisions, see [../../decisions/](../../decisions/).

## Overview

### Why Database as Source of Truth

Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:

- **Parallel worktrees**: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
- **Reliable coordination**: The coordinator needs to query "which tasks are pending?" and "what's blocking task X?" at runtime without scanning filesystems across worktrees.
- **Atomic status updates**: An agent calling `hub.task.updateStatus` gets an immediate, transactional state change visible to all other agents and the coordinator.

The database is the authoritative, queryable, concurrent-safe representation. Files are the authoring format.

### Relationship to taskgraph CLI

The `taskgraph` CLI operates on markdown files. Its value is in **offline analysis** — `topo`, `cycles`, `parallel`, `critical`, `bottleneck`, `risk-path`, `decompose`. These commands depend on categorical fields (`scope`, `risk`, `impact`, `level`) being assessed.

The workflow is:

1. **Author** — Decomposer creates/edits markdown files using `taskgraph init` and direct editing
2. **Sync** — Files are ingested into the DB (files → DB)
3. **Execute** — Coordinator and agents query and mutate the DB via hub operations
4. **Analyze** — When needed, export from DB to files, run `taskgraph risk-path` etc.

The taskgraph CLI is not required at runtime. The hub uses **@alkdev/taskgraph** for runtime graph operations (topological sort, cycle detection, parallel groups, critical path, risk analysis) — see [Graphology Integration](#graphology-integration-runtime-graph-ops).

## Task Authority Model

| Aspect | Authority | Why |
|--------|-----------|-----|
| Task structure (all fields) | **DB** | Queryable, concurrent-safe, consistent |
| Task specification (body) | **DB** (`body` column) | Stored as markdown text; agents append notes during execution |
| Task authoring/creation | **Files** → sync → DB | Decomposer edits files; sync ingests them |
| Runtime status mutations | **DB** (hub operations) | `hub.task.*` operations — coordinator and agents call these |
| Offline graph analysis | **Files** (taskgraph CLI) | Export from DB when needed for `taskgraph risk-path` etc. |

See [Field Authority Split](#field-authority-split) for the explicit list of authored vs runtime-managed fields.

## Field Authority Split

Fields are split into two categories based on who writes them:

### Authored Fields (upserted by file sync)

These fields are written by the Decomposer/file sync. The `ON CONFLICT DO UPDATE SET` clause in the sync upsert includes **only** these columns:

| Field | DB Column |
|-------|-----------|
| id | `slug` |
| name | `name` |
| (project) | `projectId` |
| (directory path) | `path` |
| scope | `scope` |
| risk | `risk` |
| impact | `impact` |
| level | `level` |
| priority | `priority` |
| tags | `tags` |
| assignee | `assignee` |
| due | `dueAt` |
| (body) | `body` |
| created | `fileCreatedAt` |
| modified | `fileModifiedAt` |
| depends_on | `task_dependencies` table |

**Note**: `projectId` is set from the project context during sync (the task file's location within a project's `tasks/` directory determines the project), not from taskgraph frontmatter. `commonCols` fields (`id`, `metadata`, `createdAt`, `updatedAt`) are DB-generated and not part of the sync conflict domain.

### Runtime-Managed Fields (mutated via `hub.task.*` operations only)

These fields are never overwritten by sync. They are only mutated by hub operations (`hub.task.updateStatus`, `hub.task.addNote`, etc.):

| Field | DB Column | Set By |
|-------|-----------|--------|
| status | `status` | `hub.task.updateStatus` |
| (started timestamp) | `startedAt` | `hub.task.updateStatus` (on `in-progress`) |
| (completed timestamp) | `completedAt` | `hub.task.updateStatus` (on `completed`) |

> **Warning**: Sync must never write `status`, `startedAt`, or `completedAt` — these are owned by hub operations. The sync upsert uses `ON CONFLICT DO UPDATE SET` only for authored fields; runtime fields are excluded from the SET clause.
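
One way to make this rule hard to violate is to generate the upsert's SET clause from the authored-column list, so runtime columns can never appear in it. The sketch below is illustrative only: the helper name `buildSyncUpsert`, the positional-parameter style, and the quoted camelCase column names are assumptions, and the hub's real sync may go through Drizzle's query builder instead.

```ts
// Authored columns, copied from the Authored Fields table. Runtime-managed
// columns (status, startedAt, completedAt) are deliberately absent.
const AUTHORED_COLUMNS = [
  "slug", "name", "projectId", "path", "scope", "risk", "impact", "level",
  "priority", "tags", "assignee", "dueAt", "body", "fileCreatedAt", "fileModifiedAt",
] as const;

// Conflict target, matching the unq_tasks_project_slug constraint.
const CONFLICT_TARGET = ["projectId", "slug"] as const;

// Hypothetical helper: builds parameterized SQL for upserting one task row.
function buildSyncUpsert(): string {
  const quote = (column: string): string => `"${column}"`;
  const columns = AUTHORED_COLUMNS.map(quote).join(", ");
  const params = AUTHORED_COLUMNS.map((_, i) => `$${i + 1}`).join(", ");
  // On conflict, update every authored column except the conflict keys.
  const updates = AUTHORED_COLUMNS
    .filter((c) => !(CONFLICT_TARGET as readonly string[]).includes(c))
    .map((c) => `${quote(c)} = EXCLUDED.${quote(c)}`)
    .join(", ");
  const target = CONFLICT_TARGET.map(quote).join(", ");
  return (
    `INSERT INTO tasks (${columns}) VALUES (${params}) ` +
    `ON CONFLICT (${target}) DO UPDATE SET ${updates}`
  );
}
```

Because the statement never names `status`, `startedAt`, or `completedAt`, a re-sync of an in-progress task leaves its runtime state untouched.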

## Field Mapping: taskgraph Frontmatter → DB Columns

Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column. No frontmatter fields are relegated to JSONB `metadata`.

| taskgraph Field | DB Column | Type | Notes |
|---|---|---|---|
| `id` | `slug` | text NOT NULL | Direct mapping. No transformation. `slug` is taskgraph-compatible, used in `depends_on` references. |
| `name` | `name` | text NOT NULL | Direct mapping |
| `status` | `status` | text NOT NULL, enum | Direct mapping: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. |
| `depends_on` | `task_dependencies` table | — | Each element creates a row: `depends_on[i]` → `dependsOnTaskId`, task → `dependentTaskId` |
| `scope` | `scope` | text, enum | `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. |
| `risk` | `risk` | text, enum | `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| `impact` | `impact` | text, enum | `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| `level` | `level` | text, enum | `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| `priority` | `priority` | text, enum | `low`, `medium`, `high`, `critical`. Nullable. |
| `tags` | `tags` | text[] | String array. Default `{}`. |
| `assignee` | `assignee` | text | Assigned agent or person. Nullable. |
| `due` | `dueAt` | timestamp with tz | Renamed from `due` for DB convention. Nullable. |
| `created` | `fileCreatedAt` | timestamp with tz | Frontmatter `created` field. Separate from DB `createdAt` (row creation time). Nullable — frontmatter may not include it. |
| `modified` | `fileModifiedAt` | timestamp with tz | Frontmatter `modified` field. Separate from DB `updatedAt` (row update time). Nullable. |
| (body) | `body` | text | Markdown content after frontmatter. Nullable — empty body is valid. |
| (directory path) | `path` | text | Logical grouping prefix: `architecture`, `implementation/storage`. Nullable — tasks created via API with no file origin have no path. See [Path Semantics](#path-semantics). |
| (project) | `projectId` | text NOT NULL | FK → projects.id |

## Table Schemas

### `tasks`

SDD task definitions. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation. Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column (no frontmatter fields in `metadata` JSONB).

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| projectId | text NOT NULL | FK → projects.id (cascade) — tasks belong to a project |
| slug | text NOT NULL | taskgraph `id` — kebab-case identifier used in `depends_on` references. Unique within a project. |
| name | text NOT NULL | Human-readable task name (from frontmatter `name`) |
| path | text | Logical grouping prefix derived from filesystem location (e.g., `architecture`, `implementation/storage`). Nullable — tasks created via API with no file origin have no path. Enables `WHERE path LIKE 'implementation/%'` for scoped queries. |
| status | text NOT NULL | Enum: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. Status transitions go through hub operations, not file edits. |
| scope | text | Categorical scope: `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. See [Why Categorical Fields Are Nullable](#why-categorical-fields-are-nullable-not-not-null-with-defaults). |
| risk | text | Categorical risk: `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| impact | text | Categorical impact: `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| level | text | Task level: `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| priority | text | Priority: `low`, `medium`, `high`, `critical`. Nullable. |
| assignee | text | Assigned agent or person. Nullable. |
| dueAt | timestamp with tz | Due date (from frontmatter `due`). Nullable. |
| tags | text[] | Filtering tags. Default `{}`. GIN index for array-contains queries. |
| body | text | Markdown task specification (from file body after frontmatter). Nullable — empty body is valid. Agents may append notes during execution. |
| fileCreatedAt | timestamp with tz | Frontmatter `created` field — file creation time from the markdown. Separate from DB `createdAt` (row creation time). Nullable. |
| fileModifiedAt | timestamp with tz | Frontmatter `modified` field — file modification time from the markdown. Separate from DB `updatedAt` (row update time). Nullable. |
| startedAt | timestamp with tz | When status became `in-progress`. Set by hub operation, not by agent. |
| completedAt | timestamp with tz | When status became `completed`. Set by hub operation. |

**Unique constraint**: `unq_tasks_project_slug` UNIQUE on `(projectId, slug)` — task slugs are unique within a project.

**pgEnum Definitions**: The following enum columns use Drizzle's `pgEnum` for type safety. `pgEnum` generates named PostgreSQL enum types and provides TypeScript type inference. The enum values are aligned with taskgraph's categorical fields.

```ts
export const taskStatus = pgEnum("task_status", ["pending", "in-progress", "completed", "failed", "blocked"]);
export const taskScope = pgEnum("task_scope", ["single", "narrow", "moderate", "broad", "system"]);
export const taskRisk = pgEnum("task_risk", ["trivial", "low", "medium", "high", "critical"]);
export const taskImpact = pgEnum("task_impact", ["isolated", "component", "phase", "project"]);
export const taskLevel = pgEnum("task_level", ["planning", "decomposition", "implementation", "review", "research"]);
export const taskPriority = pgEnum("task_priority", ["low", "medium", "high", "critical"]);
```

The decomposer template should consume these same enum definitions to ensure DB-level constraints match the application-level typing.

**Indexes**:
- `idx_tasks_project_id` on `(projectId)`
- `idx_tasks_project_status` on `(projectId, status)`: composite for "find all pending tasks in project X"
- `idx_tasks_status` on `(status)`
- `idx_tasks_active`, partial on `(projectId)` WHERE `status IN ('pending', 'in-progress', 'blocked')`: efficiently finds active tasks
- `idx_tasks_path` on `(path)` **with `text_pattern_ops`**: locale-independent LIKE pattern matching for path prefix queries (e.g., `WHERE path LIKE 'implementation/%'`)
- `idx_tasks_priority` on `(priority)`
- `idx_tasks_assignee` on `(assignee)`
- `idx_tasks_due_at` on `(dueAt)`
- `idx_tasks_tags`, GIN on `(tags)`: for array-contains queries (`tags @> '{security}'`)

**`slug` semantics**: From taskgraph frontmatter `id` field. Kebab-case identifiers like `auth-setup`, `storage-tasks-table`. Appears in `depends_on` arrays.

**`path` semantics**: Nullable — tasks created via API with no filesystem origin have no path. When set, captures the logical grouping derived from the `tasks/` directory structure. E.g., a file at `tasks/implementation/storage/tasks-table.md` gets `path: "implementation/storage"`. Enables `WHERE path LIKE 'implementation/%'` (scoped queries) without requiring a `parentId` FK. This replaces the previous `parentId` column — grouping is a path concern, not a tree relationship.

**No `parentId` column**: Grouping is handled by `path`, dependencies by `task_dependencies`. A "meta task" is just a regular task that depends on its sub-tasks — no special entity type needed.

**No `removedAt` column**: When a task file is removed, the sync operation DELETEs the DB row. Git history preserves the file-level history; the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup.

**`metadata` JSONB**: Reserved for truly ad-hoc data not in the taskgraph schema. No taskgraph frontmatter fields are stored here — all have proper columns.

### `task_dependencies`

Dependency edges between tasks. Directed: a row means the dependent task depends on the prerequisite task (prerequisite must complete before dependent can start). Mirrors the taskgraph `depends_on` relationship.

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| dependsOnTaskId | text NOT NULL | FK → tasks.id (cascade) — The prerequisite task (must complete first) |
| dependentTaskId | text NOT NULL | FK → tasks.id (cascade) — The dependent task (waits for prerequisite) |

**Unique constraint**: `unq_task_dependencies_depends_on_task` UNIQUE on `(dependsOnTaskId, dependentTaskId)` — no duplicate dependency edges.

**Indexes**: `idx_task_dependencies_depends_on_task_id` on `(dependsOnTaskId)` — "what depends on this task?", `idx_task_dependencies_dependent_task_id` on `(dependentTaskId)` — "what does this task depend on?".

**Direction**: `dependentTaskId` is the task that has the dependency; `dependsOnTaskId` is its prerequisite. A row reads "task `dependentTaskId` depends on task `dependsOnTaskId`". When the graph is built, the edge runs the other way, from `dependsOnTaskId` to `dependentTaskId` (prerequisite → dependent), so a topological sort yields prerequisites before dependents.
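
To make the edge direction concrete, here is a dependency-free sketch of Kahn's algorithm over `task_dependencies` rows. It is not how `@alkdev/taskgraph` implements `topologicalOrder`; the function and interface names are illustrative, and rows that reference unknown task ids are simply ignored, which is an assumption made for this example.

```ts
interface DependencyRow {
  dependsOnTaskId: string; // prerequisite (must complete first)
  dependentTaskId: string; // waits for the prerequisite
}

// Kahn's algorithm: repeatedly emit tasks whose prerequisites are all emitted.
function topologicalOrder(taskIds: string[], rows: DependencyRow[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of taskIds) {
    indegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const { dependsOnTaskId, dependentTaskId } of rows) {
    // Skip edges that point at tasks outside the loaded set (assumption).
    if (!indegree.has(dependsOnTaskId) || !indegree.has(dependentTaskId)) continue;
    // Graph edge: prerequisite -> dependent.
    dependents.get(dependsOnTaskId)!.push(dependentTaskId);
    indegree.set(dependentTaskId, indegree.get(dependentTaskId)! + 1);
  }
  const ready = taskIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any task left unemitted sits on a cycle.
  if (order.length !== taskIds.length) throw new Error("dependency cycle detected");
  return order;
}
```

With rows stating that `b` depends on `a` and `c` depends on `b`, the function returns `a`, `b`, `c` regardless of the order in which the task ids were loaded.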

**Cross-project dependency guard**: `dependentTaskId` and `dependsOnTaskId` MUST reference tasks within the same project. The application layer enforces this constraint: creating a dependency between tasks in different projects is rejected with a validation error. This is not enforced at the DB level (FK constraints allow cross-project references), so the application must check project consistency before insert.

A future DB-level guard could use a trigger: `BEFORE INSERT ON task_dependencies` that checks `NEW.dependentTaskId` and `NEW.dependsOnTaskId` reference tasks in the same project. This is deferred to Phase 2; the application-layer check is sufficient for now.
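
A minimal sketch of the application-layer check, assuming both task rows have already been loaded. The names `TaskRef` and `buildDependencyRow` are hypothetical and the error message format is illustrative.

```ts
interface TaskRef {
  id: string;
  projectId: string;
}

interface DependencyRow {
  dependsOnTaskId: string;
  dependentTaskId: string;
}

// Validates project consistency, then returns the row to insert.
function buildDependencyRow(dependent: TaskRef, prerequisite: TaskRef): DependencyRow {
  if (dependent.projectId !== prerequisite.projectId) {
    throw new Error(
      `cross-project dependency rejected: task ${dependent.id} (project ${dependent.projectId}) ` +
        `cannot depend on task ${prerequisite.id} (project ${prerequisite.projectId})`,
    );
  }
  return { dependsOnTaskId: prerequisite.id, dependentTaskId: dependent.id };
}
```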

**Sync source**: Dependency edges are authored in task file frontmatter (`depends_on: [other-task]`) and synced to this table during the file → DB sync operation. The sync clears and re-inserts all edges for a task on each run — dependencies are fully replaced by the sync, not merged or modified at runtime.

## Why ALL Frontmatter Fields Get Proper Columns

ADR-001 establishes the pattern: "separate structured columns for high-query, high-filter fields." For tasks, **every** taskgraph frontmatter field is queryable and filterable in the coordinator's workflow:

- `priority` — "show me high-priority pending tasks" (coordinator prioritization)
- `assignee` — "which tasks are assigned to agent X?" (work assignment)
- `dueAt` — "which tasks are due this week?" (deadline tracking)
- `tags` — "filter by tag" (cross-cutting concerns)

Moving these into `metadata` JSONB would lose type safety, indexability, and SQL queryability, which are exactly the problems the database is meant to solve. The `metadata` JSONB column (from `commonCols`) is reserved for truly ad-hoc data that isn't in the taskgraph schema.

### Why Categorical Fields Are Nullable (Not NOT NULL with Defaults)

The previous design made `scope`, `risk`, `impact`, and `level` NOT NULL with defaults (`narrow`, `low`, `isolated`, `implementation`). This conflated two states:

- **Assessed as `low`** — the Decomposer evaluated this and determined the risk is low
- **Not assessed** — nobody filled this in

Hiding the distinction with defaults means the coordinator can't distinguish a deliberate assessment from a gap. NULL is the correct signal for "not yet assessed."

Taskgraph itself makes these fields `Option<TaskScope>`, `Option<TaskRisk>`, etc. — nullable. The DB should match the source model.

**Application-layer handling**: When `scope`, `risk`, `impact`, or `level` is NULL, the coordinator should:
- Warn that the task hasn't been assessed
- Exclude it from cost-benefit analysis (you can't compute risk-path without risk values)
- Suggest the Decomposer assess it

For @alkdev/taskgraph operations that need numeric weights, provide fallbacks at the application layer (e.g., treat NULL risk as `low` when computing weights, but warn).
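
The fallback-with-warning idea might look like the sketch below. The numeric weights are placeholders invented for this example (the real weighting lives in `@alkdev/taskgraph`), and `riskWeightOrFallback` is a hypothetical helper name.

```ts
type TaskRisk = "trivial" | "low" | "medium" | "high" | "critical";

// Placeholder weights for illustration only; not taken from @alkdev/taskgraph.
const RISK_WEIGHT: Record<TaskRisk, number> = {
  trivial: 0,
  low: 1,
  medium: 2,
  high: 3,
  critical: 4,
};

// Returns a weight for the task's risk. NULL means "not yet assessed":
// fall back to "low" and report the gap instead of silently hiding it.
function riskWeightOrFallback(
  slug: string,
  risk: TaskRisk | null,
  warn: (message: string) => void,
): number {
  if (risk === null) {
    warn(`task ${slug}: risk not assessed, falling back to "low"`);
    return RISK_WEIGHT.low;
  }
  return RISK_WEIGHT[risk];
}
```

Passing the warning sink in as a parameter keeps the helper pure, so the coordinator can collect the unassessed tasks and suggest that the Decomposer assess them.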

## Path Semantics

The `path` column captures the logical grouping of tasks, derived from their location in the `tasks/` directory hierarchy:

```
tasks/
├── architecture/
│   ├── auth-design.md          → path: "architecture"
│   └── storage-overview.md     → path: "architecture"
├── research/
│   └── embedding-approach.md   → path: "research"
└── implementation/
    ├── storage/
    │   ├── tasks-table.md      → path: "implementation/storage"
    │   └── relations.md        → path: "implementation/storage"
    └── auth/
        └── oauth-flow.md       → path: "implementation/auth"
```

**`path` is nullable** because tasks created at runtime via hub operations (not synced from files) have no filesystem origin.
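
The mapping shown in the tree can be sketched as a small pure function. Two assumptions that this document does not state: paths use `/` separators, and a file sitting directly under `tasks/` gets a `null` path. The name `derivePath` is hypothetical.

```ts
// Derives the logical grouping prefix from a task file's location.
function derivePath(filePath: string, tasksRoot: string = "tasks"): string | null {
  const segments = filePath.split("/").filter((s) => s.length > 0);
  const rootIndex = segments.indexOf(tasksRoot);
  if (rootIndex === -1) {
    throw new Error(`${filePath} is not under ${tasksRoot}/`);
  }
  // Drop everything up to and including the root, and drop the file name.
  const directories = segments.slice(rootIndex + 1, -1);
  // Assumption: no subdirectory means no grouping, so the column stays NULL.
  return directories.length > 0 ? directories.join("/") : null;
}
```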

**`path` enables scoped queries**:
- `WHERE path = 'architecture'` — all architecture tasks
- `WHERE path LIKE 'implementation/%'` — all implementation tasks
- `WHERE path = 'implementation/storage'` — storage implementation tasks

This is a prefix-based grouping mechanism. It replaces `parentId` (which was not in the taskgraph model and conflated organizational grouping with dependency ordering).

**Locale sensitivity**: The `path` column uses the `text` type with the database's default collation. What is collation-sensitive is index use: a plain B-tree index can serve prefix patterns such as `WHERE path LIKE 'implementation/%'` only when the database uses the `C`/`POSIX` locale. `idx_tasks_path` is therefore created with the `text_pattern_ops` operator class (`CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)`), which lets the index serve `LIKE` and anchored `~` patterns regardless of collation. Declaring the column with `COLLATE "C"` would be an alternative.

## Grouping vs Dependencies

**There is no `parentId` column.** Task grouping and dependency ordering are separate concepts:

- **Grouping** — `path` column. "This task belongs to the `implementation/storage` group." Enables scoped queries. Derived from filesystem layout during sync.
- **Dependencies** — `task_dependencies` table. "This task cannot start until that task completes." Enables topological sort, cycle detection, critical path. Derived from `depends_on` frontmatter.

A "meta task" (e.g., "implement storage") is simply a task that `depends_on` all its sub-tasks. There is no special entity type — it's regular task + dependency edges. The coordinator picks up the meta task as an assignment, and the implementation specialist works through sub-tasks in dependency order.

**Why not `parentId`**: `parentId` was invented in a previous doc revision but has no basis in the taskgraph data model. It created confusion:
- Redundant with `task_dependencies` (a meta task's dependencies ARE its sub-tasks)
- Required a fragile "inference from directory structure" during sync
- Violated the invariant that the DB schema mirrors the taskgraph frontmatter model

## Relationship to Existing Tables

### `mappings` Table

The `mappings` table links sessions to coordinators, spokes, and worktrees. A `taskId` column references the task a mapping is assigned to:

```ts
taskId: text REFERENCES tasks(id)   // FK to tasks
task: text                           // denormalized display name (e.g., task.slug or task.name)
```

This preserves the quick-reference pattern (coordinators can list mappings with task names without a JOIN) while maintaining referential integrity.

### `projects` Table

Tasks belong to a project via `tasks.projectId`. A project's tasks live in the project's `tasks/` directory. Cross-project task dependencies are not supported — tasks can only depend on other tasks within the same project. This is enforced at the application level (see task_dependencies cross-project guard).

### `sessions` Table

Sessions are linked to tasks indirectly through `mappings`. When the coordinator spawns a session for a meta task:
1. The task row already exists in `tasks` (synced from file or created via API)
2. The coordinator creates a `sessions` row for the implementation specialist
3. The coordinator creates a `mappings` row with `taskId` pointing to the meta task

## Task Status Lifecycle

```
pending → in-progress → completed
                      ↘ failed → in-progress (retry)
                      ↘ blocked → in-progress (unblocked)
```

| Status | Meaning |
|--------|---------|
| `pending` | Task exists, not yet started |
| `in-progress` | A session is actively working on this task |
| `completed` | Task finished successfully |
| `failed` | Task failed, may retry (Safe Exit protocol) |
| `blocked` | Task is blocked by an unmet dependency or external issue |

Status transitions go through **hub operations** (`hub.task.updateStatus`), not file edits. This ensures:
- All agents see consistent state immediately
- The coordinator can query "which tasks are pending?" reliably
- No merge conflicts from parallel file edits

Timestamp columns `startedAt` and `completedAt` track when a task entered `in-progress` and `completed` states respectively. These are set by the hub operation, not by the agent.
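
The lifecycle can be expressed as a transition table plus a pure function that a `hub.task.updateStatus` handler could call. This is a sketch of one reading of the diagram, not the hub's implementation: it assumes `failed` and `blocked` are reached only from `in-progress`, that `completed` is terminal, and that a retry overwrites `startedAt`.

```ts
type TaskStatus = "pending" | "in-progress" | "completed" | "failed" | "blocked";

// Allowed transitions, read off the lifecycle diagram (assumption: nothing else is legal).
const ALLOWED_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ["in-progress"],
  "in-progress": ["completed", "failed", "blocked"],
  failed: ["in-progress"], // retry
  blocked: ["in-progress"], // unblocked
  completed: [],
};

interface TaskState {
  status: TaskStatus;
  startedAt: Date | null;
  completedAt: Date | null;
}

// Returns the next state; timestamps are set here, never by the calling agent.
function applyStatus(task: TaskState, next: TaskStatus, now: Date): TaskState {
  if (!ALLOWED_TRANSITIONS[task.status].includes(next)) {
    throw new Error(`invalid status transition: ${task.status} -> ${next}`);
  }
  return {
    status: next,
    startedAt: next === "in-progress" ? now : task.startedAt,
    completedAt: next === "completed" ? now : task.completedAt,
  };
}
```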

## Task Notes (Append-Only)

Agents may need to add notes to a task during execution (observations, partial progress, blockers encountered). For v1, this is handled by **appending markdown to the `body` column**:

```markdown
## Task Description (original)

Implement the tasks table with Drizzle-TypeBox pattern...

## Implementation Notes

- 2026-04-19: Started with table definition, commonCols pattern works
- 2026-04-19: Hit issue with text[] type for tags — need to check Drizzle support
```

The `hub.task.addNote` operation appends a timestamped note section to the end of `body`. This is simple, preserves the full context in one place, and requires no additional tables.

**Concurrency model for `hub.task.addNote`**: Notes are appended to the task `body` field using **DB-level concatenation**: `UPDATE tasks SET body = COALESCE(body, '') || $note WHERE id = $taskId`. This avoids read-modify-write cycles entirely — the append is atomic at the SQL level, eliminating race conditions between concurrent agents.
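
A sketch of how the statement and its parameters could be assembled. The SQL is the concatenation shown above rewritten with positional parameters; the bullet format of the note (UTC date prefix, leading blank line) and the name `buildAddNote` are assumptions for illustration.

```ts
interface SqlStatement {
  text: string;
  values: string[];
}

// Builds the atomic append for hub.task.addNote. No read of the current body is needed.
function buildAddNote(taskId: string, note: string, now: Date): SqlStatement {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  // Assumed note format: a dated bullet separated from earlier content by a blank line.
  const section = `\n\n- ${day}: ${note.trim()}`;
  return {
    text: "UPDATE tasks SET body = COALESCE(body, '') || $1 WHERE id = $2",
    values: [section, taskId],
  };
}
```

Because the database evaluates `body || $1` against the current row value, two agents appending at the same time both land their notes; only the relative order is decided by the database.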

As a fallback for scenarios where DB-level concatenation isn't feasible, **optimistic locking via `updatedAt`** can be used: read the current `updatedAt`, append the note, and `UPDATE WHERE updatedAt = readValue`. If the row was updated between read and write, the UPDATE affects 0 rows and the operation must be retried. This is sufficient for the expected low-contention scenario (one agent at a time writing notes to a task).

For high-contention scenarios (multiple agents writing simultaneously), or if structured multi-agent notes become necessary later, a dedicated `task_notes` table with `INSERT` operations can replace UPDATE appends. The `body` append pattern doesn't preclude this: the table would be purely additive.

## Why Categorical Estimates Matter

The `scope`, `risk`, `impact`, and `level` fields are not cosmetic metadata — they are what make taskgraph's analysis commands produce useful results. The cost-benefit framework (see taskgraph framework docs) demonstrates a structural property: **upstream failures multiply downstream damage**.

These fields power:
- **`taskgraph decompose`** — flags tasks where `risk > medium` or `scope > moderate`
- **`taskgraph risk-path`** — finds the highest cumulative risk path
- **`taskgraph critical`** — finds completion blockers
- **`taskgraph bottleneck`** — finds high-betweenness tasks

Without them, you just get topological sort — useful, but not structurally insightful. The DB columns for these fields are **nullable** (NULL = not assessed) rather than NOT NULL with defaults, because the distinction between "deliberately assessed as `low`" and "nobody filled this in" is itself valuable information for the coordinator.

## Graphology Integration (Runtime Graph Ops)

For runtime graph operations, the hub uses **`@alkdev/taskgraph`** — a TypeScript package that wraps graphology and provides a high-level `TaskGraph` class plus analysis functions. The CLI (`taskgraph`) is for offline authoring and analysis; the TS package is for runtime use.

The approach:
1. Load all `tasks` + `task_dependencies` rows for a project from the DB
2. Build a `TaskGraph` via `TaskGraph.fromRecords(tasks, edges)`
3. Run analysis functions as needed: `criticalPath()`, `parallelGroups()`, `bottlenecks()`, `riskPath()`, `shouldDecomposeTask()`, `workflowCost()`

This works because realistic task graphs are small — typically 10–50 tasks, rarely exceeding 200 even on large projects. Building a graph from DB rows is instant at this scale (`TaskGraph.fromRecords` with 100 nodes reconstructs in <5ms).

`@alkdev/taskgraph` exports:
- **`TaskGraph`** — construction (fromTasks, fromRecords, fromJSON), mutation (addTask, removeTask, addDependency, updateTask), queries (hasCycles, findCycles, topologicalOrder, dependencies, dependents, getTask), validation (validateSchema, validateGraph), export
- **Analysis functions** — criticalPath, weightedCriticalPath, parallelGroups, bottlenecks, riskPath, riskDistribution, calculateTaskEv, workflowCost, shouldDecomposeTask
- **Schema types** — TaskScope, TaskRisk, TaskImpact, TaskLevel, TaskPriority, TaskStatus enums with TypeBox schemas
- **Frontmatter** — parseFrontmatter, serializeFrontmatter (YAML + markdown)
- **Error classes** — TaskgraphError, CircularDependencyError, TaskNotFoundError, etc.

**Why not taskgraph NAPI for v1**: The Rust CLI (`taskgraph`) is for offline authoring and analysis. The TypeScript package (`@alkdev/taskgraph`) handles all runtime graph operations. Graphology is a transitive dependency through `@alkdev/taskgraph` and handles < 200 nodes trivially. NAPI is unnecessary at realistic scales.

## Sync Flow

```
┌──────────────┐       ┌───────────────┐       ┌───────────────────┐
│ Decomposer   │       │ taskgraph CLI │       │ Hub DB            │
│ creates .md  │──────►│ validates     │──────►│ tasks table       │
│ files        │       │ analyzes      │       │ task_dependencies │
└──────────────┘       └───────────────┘       └───────────────────┘
                                                         ▲
                                                         │
                                               ┌─────────┴─────────┐
                                               │ Hub operations    │
                                               │ hub.task.*        │
                                               │ (status, notes)   │
                                               └───────────────────┘
```

### Sync: Files → DB

The sync operation runs as a **single database transaction**:

1. **Begin transaction**
2. Scan `tasks/` directory for markdown files
3. Parse frontmatter (YAML) + body (markdown) from each file. `@alkdev/taskgraph` provides `parseFrontmatter()` and `serializeFrontmatter()` for YAML+markdown parsing. `parseTaskFile()` and `parseTaskDirectory()` are Node.js only (use `node:fs/promises`); for Deno, use `parseFrontmatter()` with Deno file I/O.
4. Upsert into `tasks` table (matches by `(projectId, slug)`)
5. For each task, `DELETE FROM task_dependencies WHERE dependentTaskId = ?` then `INSERT` the current edges — dependency edges are fully replaced, not merged, because the files own the dependency declarations
6. **Commit transaction**

If any step fails, the entire sync rolls back — no partial updates.

**Concurrency**: Only one sync should run at a time. The Decomposer triggers sync after creating/updating task files. No concurrent sync mechanism is needed for v1.

**Deleted files**: When a task file is removed from `tasks/`, the sync operation **deletes** the corresponding DB row. Git history preserves the full file-level history — the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup (`task_dependencies` rows, `mappings.taskId` SET NULL).
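
The upsert and delete decisions for one sync run reduce to a set difference, sketched below. `planSync` is a hypothetical helper, and `dbSlugs` is assumed to contain only rows that originated from files; how tasks created via the API are kept out of the delete set is not specified in this document.

```ts
interface SyncPlan {
  upsert: string[]; // slugs present on disk: insert or update
  remove: string[]; // slugs in the DB whose file is gone: delete
}

// Compares slugs parsed from tasks/ with slugs already stored for the project.
function planSync(fileSlugs: string[], dbSlugs: string[]): SyncPlan {
  const onDisk = new Set(fileSlugs);
  return {
    upsert: Array.from(onDisk),
    remove: dbSlugs.filter((slug) => !onDisk.has(slug)),
  };
}
```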

### DB → Files (Export)

When graph analysis is needed, export DB rows back to markdown files:

1. Query `tasks` + `task_dependencies` for a project
2. For each task, generate markdown with YAML frontmatter + body
3. Write to `tasks/` directory structure (using `path` to determine subdirectory)
4. Run `taskgraph validate`, `taskgraph risk-path`, etc.

This is a manual step — "I want to run analysis now" — not an automatic sync.

### Sync Error Handling

| Error | Behavior |
|-------|----------|
| Invalid YAML frontmatter | Skip file, log warning with file path and parse error. Continue with remaining files. |
| Missing required `id` or `name` field | Skip file, log warning. Task cannot be synced without these fields. |
| `depends_on` references non-existent slug within project | Insert the dependency edge anyway (dangling reference). The coordinator detects and warns about unresolvable dependencies. `taskgraph validate` should be run before sync to catch these. |
| Duplicate `id` (slug) in same project | Fail the sync with a clear error. Slug uniqueness is enforced by the DB constraint `unq_tasks_project_slug`. |
| File removed from filesystem | DELETE the DB row. FK cascade handles dependent rows. Git preserves history. |

**Validation ordering**: Run `taskgraph validate` before sync to catch structural errors (cycles, missing dependencies, duplicate IDs) at the CLI level. The DB sync then handles data-level integrity (unique constraints, FK checks).
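
The skip-versus-fail split from the table above can be sketched as a pre-pass over the parsed files. The `ParsedFile` shape and the name `selectSyncableFiles` are assumptions for illustration; only the two behaviors shown (skip on a missing `id` or `name`, fail on a duplicate slug) come from the table.

```ts
interface ParsedFile {
  filePath: string;
  id?: string; // frontmatter `id`, stored as `slug`
  name?: string;
}

// Returns the files that may be synced. Skips incomplete files with a warning
// and fails the whole sync on a duplicate slug.
function selectSyncableFiles(
  files: ParsedFile[],
  warn: (message: string) => void,
): ParsedFile[] {
  const accepted: ParsedFile[] = [];
  const firstSeen = new Map<string, string>(); // slug -> path of first file using it
  for (const file of files) {
    if (!file.id || !file.name) {
      warn(`skipping ${file.filePath}: missing required id or name`);
      continue;
    }
    const earlier = firstSeen.get(file.id);
    if (earlier !== undefined) {
      throw new Error(`duplicate slug "${file.id}" in ${earlier} and ${file.filePath}`);
    }
    firstSeen.set(file.id, file.filePath);
    accepted.push(file);
  }
  return accepted;
}
```

Raising on duplicates before any SQL runs gives a clearer error than waiting for `unq_tasks_project_slug` to reject the insert, while still leaving the constraint as the final safeguard.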

## Open Questions

1. **Embeddings**: Task descriptions may benefit from vector embeddings for similarity search. Deferred — the `metadata` JSONB column can hold an embedding reference later, or a separate `task_embeddings` table can be added.

2. **Bulk status updates**: When the coordinator completes a meta task (all sub-tasks done), should it automatically mark the meta task `completed`? Likely yes — this is an application-level operation, not a DB concern.

3. **Cross-project dependencies**: Not supported. Tasks can only depend on other tasks within the same project. Application-layer validation rejects cross-project dependencies; a future DB-level trigger guard is deferred to Phase 2 (see task_dependencies cross-project guard).

4. **Task versioning**: When a task's body is modified (e.g., notes appended), should we keep previous versions? For v1, no — the current body is sufficient. If audit trail is needed, `updatedAt` timestamp + `metadata` revision count could suffice.

## References

- Cost-benefit framework: taskgraph framework docs — why categorical estimates are structurally required
- Workflow guide: taskgraph workflow docs — practical usage patterns
- Task file format: @alkdev/taskgraph README — field definitions
- TaskFrontmatter struct: @alkdev/taskgraph package source — canonical field types and defaults
- taskgraph architecture: taskgraph architecture docs
- Storage pattern: [README.md](./README.md)
- Table reference (cross-cutting): [table-reference.md](./table-reference.md)
- ADR-011: [../../decisions/ADR-011-dual-task-representation.md](../../decisions/ADR-011-dual-task-representation.md)
- @alkdev/taskgraph (runtime graph engine): `@alkdev/taskgraph` npm package