---
status: draft
last_updated: 2026-05-18
---

# Storage: Tasks & Task Dependencies

Tasks are the unit of work in the Spec-Driven Development (SDD) process. The **database is the source of truth** for task data at runtime. Markdown files serve as the **authoring surface** for the Decomposer role and the `taskgraph` CLI — they are ingested into the DB via a sync operation and can be exported back for offline analysis.

For the overall storage pattern, see [README.md](./README.md). For cross-cutting table reference (common columns, cascade behavior, index reference, status enums, relations), see [table-reference.md](./table-reference.md). For design decisions, see [../../decisions/](../../decisions/).

## Overview

### Why Database as Source of Truth

Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:

- **Parallel worktrees**: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
- **Reliable coordination**: The coordinator needs to query "which tasks are pending?" and "what's blocking task X?" at runtime without scanning filesystems across worktrees.
- **Atomic status updates**: An agent calling `hub.task.updateStatus` gets an immediate, transactional state change visible to all other agents and the coordinator.

The database is the authoritative, queryable, concurrent-safe representation. Files are the authoring format.

### Relationship to taskgraph CLI

The `taskgraph` CLI operates on markdown files. Its value is in **offline analysis** — `topo`, `cycles`, `parallel`, `critical`, `bottleneck`, `risk-path`, `decompose`. These commands depend on categorical fields (`scope`, `risk`, `impact`, `level`) being assessed.

The workflow is:

1. **Author** — Decomposer creates/edits markdown files using `taskgraph init` and direct editing
2. **Sync** — Files are ingested into the DB (files → DB)
3. **Execute** — Coordinator and agents query and mutate the DB via hub operations
4. **Analyze** — When needed, export from DB to files, run `taskgraph risk-path` etc.

The taskgraph CLI is not required at runtime. The hub uses **@alkdev/taskgraph** for runtime graph operations (topological sort, cycle detection, parallel groups, critical path, risk analysis) — see [Graphology Integration](#graphology-integration-runtime-graph-ops).

## Task Authority Model

| Aspect | Authority | Why |
|--------|-----------|-----|
| Task structure (all fields) | **DB** | Queryable, concurrent-safe, consistent |
| Task specification (body) | **DB** (`body` column) | Stored as markdown text; agents append notes during execution |
| Task authoring/creation | **Files** → sync → DB | Decomposer edits files; sync ingests them |
| Runtime status mutations | **DB** (hub operations) | `hub.task.*` operations — coordinator and agents call these |
| Offline graph analysis | **Files** (taskgraph CLI) | Export from DB when needed for `taskgraph risk-path` etc. |

See [Field Authority Split](#field-authority-split) for the explicit list of authored vs runtime-managed fields.

## Field Authority Split

Fields are split into two categories based on who writes them:

### Authored Fields (upserted by file sync)

These fields are written by the Decomposer/file sync. The `ON CONFLICT DO UPDATE SET` clause in the sync upsert includes **only** these columns:

| Field | DB Column |
|-------|-----------|
| id | `slug` |
| name | `name` |
| (project) | `projectId` |
| (directory path) | `path` |
| scope | `scope` |
| risk | `risk` |
| impact | `impact` |
| level | `level` |
| priority | `priority` |
| tags | `tags` |
| assignee | `assignee` |
| due | `dueAt` |
| (body) | `body` |
| created | `fileCreatedAt` |
| modified | `fileModifiedAt` |
| depends_on | `task_dependencies` table |

**Note**: `projectId` is set from the project context during sync (the task file's location within a project's `tasks/` directory determines the project), not from taskgraph frontmatter. `commonCols` fields (`id`, `metadata`, `createdAt`, `updatedAt`) are DB-generated and not part of the sync conflict domain.

### Runtime-Managed Fields (mutated via `hub.task.*` operations only)

These fields are never overwritten by sync. They are only mutated by hub operations (`hub.task.updateStatus`, `hub.task.addNote`, etc.):

| Field | DB Column | Set By |
|-------|-----------|--------|
| status | `status` | `hub.task.updateStatus` |
| (started timestamp) | `startedAt` | `hub.task.updateStatus` (on `in-progress`) |
| (completed timestamp) | `completedAt` | `hub.task.updateStatus` (on `completed`) |

> **Warning**: Sync must never write `status`, `startedAt`, or `completedAt` — these are owned by hub operations. The sync upsert uses `ON CONFLICT DO UPDATE SET` only for authored fields; runtime fields are excluded from the SET clause.

## Field Mapping: taskgraph Frontmatter → DB Columns

Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column. No frontmatter fields are relegated to JSONB `metadata`.

| taskgraph Field | DB Column | Type | Notes |
|---|---|---|---|
| `id` | `slug` | text NOT NULL | Direct mapping. No transformation. `slug` is taskgraph-compatible, used in `depends_on` references. |
| `name` | `name` | text NOT NULL | Direct mapping |
| `status` | `status` | text NOT NULL, enum | Direct mapping: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. |
| `depends_on` | `task_dependencies` table | — | Each element creates a row: `depends_on[i]` → `dependsOnTaskId`, task → `dependentTaskId` |
| `scope` | `scope` | text, enum | `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. |
| `risk` | `risk` | text, enum | `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| `impact` | `impact` | text, enum | `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| `level` | `level` | text, enum | `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| `priority` | `priority` | text, enum | `low`, `medium`, `high`, `critical`. Nullable. |
| `tags` | `tags` | text[] | String array. Default `{}`. |
| `assignee` | `assignee` | text | Assigned agent or person. Nullable. |
| `due` | `dueAt` | timestamp with tz | Renamed from `due` for DB convention. Nullable. |
| `created` | `fileCreatedAt` | timestamp with tz | Frontmatter `created` field. Separate from DB `createdAt` (row creation time). Nullable — frontmatter may not include it. |
| `modified` | `fileModifiedAt` | timestamp with tz | Frontmatter `modified` field. Separate from DB `updatedAt` (row update time). Nullable. |
| (body) | `body` | text | Markdown content after frontmatter. Nullable — empty body is valid. |
| (directory path) | `path` | text | Logical grouping prefix: `architecture`, `implementation/storage`. Nullable — tasks created via API with no file origin have no path. See [Path Semantics](#path-semantics). |
| (project) | `projectId` | text NOT NULL | FK → projects.id |

## Table Schemas

### `tasks`

SDD task definitions. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation. Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column (no frontmatter fields in `metadata` JSONB).

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| projectId | text NOT NULL | FK → projects.id (cascade) — tasks belong to a project |
| slug | text NOT NULL | taskgraph `id` — kebab-case identifier used in `depends_on` references. Unique within a project. |
| name | text NOT NULL | Human-readable task name (from frontmatter `name`) |
| path | text | Logical grouping prefix derived from filesystem location (e.g., `architecture`, `implementation/storage`). Nullable — tasks created via API with no file origin have no path. Enables `WHERE path LIKE 'implementation/%'` for scoped queries. |
| status | text NOT NULL | Enum: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. Status transitions go through hub operations, not file edits. |
| scope | text | Categorical scope: `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. See [Why Categorical Fields Are Nullable](#why-categorical-fields-are-nullable-not-not-null-with-defaults). |
| risk | text | Categorical risk: `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| impact | text | Categorical impact: `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| level | text | Task level: `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| priority | text | Priority: `low`, `medium`, `high`, `critical`. Nullable. |
| assignee | text | Assigned agent or person. Nullable. |
| dueAt | timestamp with tz | Due date (from frontmatter `due`). Nullable. |
| tags | text[] | Filtering tags. Default `{}`. GIN index for array-contains queries. |
| body | text | Markdown task specification (from file body after frontmatter). Nullable — empty body is valid. Agents may append notes during execution. |
| fileCreatedAt | timestamp with tz | Frontmatter `created` field — file creation time from the markdown. Separate from DB `createdAt` (row creation time). Nullable. |
| fileModifiedAt | timestamp with tz | Frontmatter `modified` field — file modification time from the markdown. Separate from DB `updatedAt` (row update time). Nullable. |
| startedAt | timestamp with tz | When status became `in-progress`. Set by hub operation, not by agent. |
| completedAt | timestamp with tz | When status became `completed`. Set by hub operation. |

**Unique constraint**: `unq_tasks_project_slug` UNIQUE on `(projectId, slug)` — task slugs are unique within a project.

**pgEnum Definitions**: The following enum columns use PostgreSQL `pgEnum` for type safety. Drizzle's `pgEnum` generates named PostgreSQL enums and provides TypeScript type inference. The enum values are aligned with taskgraph's categorical fields.

```ts
export const taskStatus = pgEnum("task_status", ["pending", "in-progress", "completed", "failed", "blocked"]);
export const taskScope = pgEnum("task_scope", ["single", "narrow", "moderate", "broad", "system"]);
export const taskRisk = pgEnum("task_risk", ["trivial", "low", "medium", "high", "critical"]);
export const taskImpact = pgEnum("task_impact", ["isolated", "component", "phase", "project"]);
export const taskLevel = pgEnum("task_level", ["planning", "decomposition", "implementation", "review", "research"]);
export const taskPriority = pgEnum("task_priority", ["low", "medium", "high", "critical"]);
```

The decomposer template should consume these same enum definitions to ensure DB-level constraints match the application-level typing.

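
One way to keep application-level typing aligned with the enum values is to derive both from a single value list. The block below is a minimal sketch of that idea and does not import Drizzle; `TASK_RISK_VALUES`, `isOneOf`, and `parseRisk` are hypothetical names introduced for illustration, not part of the hub or of `@alkdev/taskgraph`.

```typescript
// Hypothetical shared value list: the single place where risk values are spelled out.
export const TASK_RISK_VALUES = ["trivial", "low", "medium", "high", "critical"] as const;

// Union type derived from the list: "trivial" | "low" | "medium" | "high" | "critical".
export type TaskRisk = (typeof TASK_RISK_VALUES)[number];

// Generic runtime guard: true only when `value` is one of the allowed strings.
function isOneOf<T extends string>(values: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (values as readonly string[]).includes(value);
}

// Parse an optional categorical field taken from frontmatter.
// Missing values map to null ("not yet assessed"); unknown values throw.
export function parseRisk(raw: unknown): TaskRisk | null {
  if (raw === undefined || raw === null) return null;
  if (isOneOf(TASK_RISK_VALUES, raw)) return raw;
  throw new Error(`Invalid risk value: ${String(raw)}`);
}
```

The same pattern would apply to `scope`, `impact`, `level`, `priority`, and `status`. Whether such a list can be passed directly to `pgEnum` depends on the Drizzle version in use, so treat that step as an assumption to verify.
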

**Indexes**:

- `idx_tasks_project_id` on `(projectId)`
- `idx_tasks_project_status` on `(projectId, status)` — composite for "find all pending tasks in project X"
- `idx_tasks_status` on `(status)`
- `idx_tasks_active` partial on `(projectId)` WHERE `status IN ('pending', 'in-progress', 'blocked')` — efficiently find active tasks
- `idx_tasks_path` on `(path)` **with `text_pattern_ops`** — locale-independent LIKE pattern matching for path prefix queries (e.g., `WHERE path LIKE 'implementation/%'`)
- `idx_tasks_priority` on `(priority)`
- `idx_tasks_assignee` on `(assignee)`
- `idx_tasks_due_at` on `(dueAt)`
- `idx_tasks_tags` GIN on `(tags)` — for array-contains queries (`tags @> '{security}'`)

**`slug` semantics**: From taskgraph frontmatter `id` field. Kebab-case identifiers like `auth-setup`, `storage-tasks-table`. Appears in `depends_on` arrays.

**`path` semantics**: Nullable — tasks created via API with no filesystem origin have no path. When set, captures the logical grouping derived from the `tasks/` directory structure. E.g., a file at `tasks/implementation/storage/tasks-table.md` gets `path: "implementation/storage"`. Enables `WHERE path LIKE 'implementation/%'` (scoped queries) without requiring a `parentId` FK. This replaces the previous `parentId` column — grouping is a path concern, not a tree relationship.

**No `parentId` column**: Grouping is handled by `path`, dependencies by `task_dependencies`. A "meta task" is just a regular task that depends on its sub-tasks — no special entity type needed.

**No `removedAt` column**: When a task file is removed, the sync operation DELETEs the DB row. Git history preserves the file-level history; the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup.

**`metadata` JSONB**: Reserved for truly ad-hoc data not in the taskgraph schema. No taskgraph frontmatter fields are stored here — all have proper columns.

### `task_dependencies`

Dependency edges between tasks. Directed: a row means the dependent task depends on the prerequisite task (prerequisite must complete before dependent can start). Mirrors the taskgraph `depends_on` relationship.

| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| dependsOnTaskId | text NOT NULL | FK → tasks.id (cascade) — The prerequisite task (must complete first) |
| dependentTaskId | text NOT NULL | FK → tasks.id (cascade) — The dependent task (waits for prerequisite) |

**Unique constraint**: `unq_task_dependencies_depends_on_task` UNIQUE on `(dependsOnTaskId, dependentTaskId)` — no duplicate dependency edges.

**Indexes**:

- `idx_task_dependencies_depends_on_task_id` on `(dependsOnTaskId)` — "what depends on this task?"
- `idx_task_dependencies_dependent_task_id` on `(dependentTaskId)` — "what does this task depend on?"

**Direction**: `dependentTaskId` is the task that has the dependency; `dependsOnTaskId` is the prerequisite task. A row reads as "task `dependentTaskId` depends on task `dependsOnTaskId`". In the runtime graph the edge points the other way, from `dependsOnTaskId` to `dependentTaskId` (prerequisite → dependent), which gives the correct topological order: prerequisites before dependents.

**Cross-project dependency guard**: `dependentTaskId` and `dependsOnTaskId` MUST reference tasks within the same project. The application layer enforces this constraint — creating a dependency between tasks in different projects is rejected with a validation error. This is not enforced at the DB level (FK constraints allow cross-project references), so the application must check project consistency before insert. A future DB-level guard could use a trigger: `BEFORE INSERT ON task_dependencies` that checks `NEW.dependentTaskId` and `NEW.dependsOnTaskId` reference tasks in the same project. This is deferred to Phase 2 — the application-layer check is sufficient for now.

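
The application-layer guard described above could look roughly like the following. This is a sketch under stated assumptions: `TaskRef`, `DependencyValidationError`, and `buildDependencyEdge` are hypothetical names, and the real hub operation may structure the check differently.

```typescript
// Minimal shape of a task row needed for the check (hypothetical type).
interface TaskRef {
  id: string;
  projectId: string;
}

// Shape of the row to insert into task_dependencies (column names from the table above).
interface DependencyEdge {
  dependentTaskId: string;
  dependsOnTaskId: string;
}

// Hypothetical error type for rejected dependency edges.
class DependencyValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DependencyValidationError";
  }
}

// Returns the edge to insert, or throws when the two tasks belong to different projects.
function buildDependencyEdge(dependent: TaskRef, prerequisite: TaskRef): DependencyEdge {
  if (dependent.projectId !== prerequisite.projectId) {
    throw new DependencyValidationError(
      `Cross-project dependency rejected: ${dependent.id} (${dependent.projectId}) ` +
        `cannot depend on ${prerequisite.id} (${prerequisite.projectId})`,
    );
  }
  return { dependentTaskId: dependent.id, dependsOnTaskId: prerequisite.id };
}
```

Running the check before the insert keeps the validation error readable for the caller, instead of relying on a later database failure.
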

**Sync source**: Dependency edges are authored in task file frontmatter (`depends_on: [other-task]`) and synced to this table during the file → DB sync operation. The sync clears and re-inserts all edges for a task on each run — dependencies are fully replaced by the sync, not merged or modified at runtime.

## Why ALL Frontmatter Fields Get Proper Columns

ADR-001 establishes the pattern: "separate structured columns for high-query, high-filter fields." For tasks, **every** taskgraph frontmatter field is queryable and filterable in the coordinator's workflow:

- `priority` — "show me high-priority pending tasks" (coordinator prioritization)
- `assignee` — "which tasks are assigned to agent X?" (work assignment)
- `dueAt` — "which tasks are due this week?" (deadline tracking)
- `tags` — "filter by tag" (cross-cutting concerns)

Shoving these into `metadata` JSONB loses type safety, indexability, and SQL queryability — exactly the problems the database is meant to solve. The `metadata` JSONB column (from `commonCols`) is reserved for truly ad-hoc data that isn't in the taskgraph schema.

### Why Categorical Fields Are Nullable (Not NOT NULL with Defaults)

The previous design made `scope`, `risk`, `impact`, and `level` NOT NULL with defaults (`narrow`, `low`, `isolated`, `implementation`). This conflated two states:

- **Assessed as `low`** — the Decomposer evaluated this and determined the risk is low
- **Not assessed** — nobody filled this in

Hiding the distinction with defaults means the coordinator can't distinguish a deliberate assessment from a gap. NULL is the correct signal for "not yet assessed." Taskgraph itself declares these fields as `Option` types, so they are nullable in the source model. The DB should match the source model.

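
To make the two states concrete, the sketch below separates assessed from unassessed tasks using a nullable `risk` field. `TaskRow` and `partitionByRiskAssessment` are hypothetical names for illustration only; with a NOT NULL default of `low`, the two example rows would be indistinguishable.

```typescript
type TaskRisk = "trivial" | "low" | "medium" | "high" | "critical";

// Reduced task row for this example (hypothetical type).
interface TaskRow {
  slug: string;
  risk: TaskRisk | null; // null means "not yet assessed"
}

// Split tasks into those with a deliberate risk assessment and those without one.
function partitionByRiskAssessment(tasks: TaskRow[]): { assessed: TaskRow[]; unassessed: TaskRow[] } {
  const assessed: TaskRow[] = [];
  const unassessed: TaskRow[] = [];
  for (const task of tasks) {
    (task.risk === null ? unassessed : assessed).push(task);
  }
  return { assessed, unassessed };
}

const result = partitionByRiskAssessment([
  { slug: "auth-setup", risk: "low" }, // deliberately assessed as low
  { slug: "storage-tasks-table", risk: null }, // nobody filled this in
]);
```

Here `result.assessed` holds only `auth-setup` and `result.unassessed` holds only `storage-tasks-table`, which is the distinction a default value would erase.
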

**Application-layer handling**: When `scope`, `risk`, `impact`, or `level` is NULL, the coordinator should:

- Warn that the task hasn't been assessed
- Exclude it from cost-benefit analysis (you can't compute risk-path without risk values)
- Suggest the Decomposer assess it

For @alkdev/taskgraph operations that need numeric weights, provide fallbacks at the application layer (e.g., treat NULL risk as `low` for topo sort, but warn).

## Path Semantics

The `path` column captures the logical grouping of tasks, derived from their location in the `tasks/` directory hierarchy:

```
tasks/
├── architecture/
│   ├── auth-design.md          → path: "architecture"
│   └── storage-overview.md     → path: "architecture"
├── research/
│   └── embedding-approach.md   → path: "research"
└── implementation/
    ├── storage/
    │   ├── tasks-table.md      → path: "implementation/storage"
    │   └── relations.md        → path: "implementation/storage"
    └── auth/
        └── oauth-flow.md       → path: "implementation/auth"
```

**`path` is nullable** because tasks created at runtime via hub operations (not synced from files) have no filesystem origin.

**`path` enables scoped queries**:

- `WHERE path = 'architecture'` — all architecture tasks
- `WHERE path LIKE 'implementation/%'` — all implementation tasks
- `WHERE path = 'implementation/storage'` — storage implementation tasks

This is a prefix-based grouping mechanism. It replaces `parentId` (which was not in the taskgraph model and conflated organizational grouping with dependency ordering).

**Locale sensitivity**: The `path` column uses the `text` type with the database's default collation. Whether an index can serve a prefix pattern such as `WHERE path LIKE 'implementation/%'` depends on that collation: a plain B-tree index supports `LIKE` prefix matching only when the collation is `C`/`POSIX`. Either use `COLLATE "C"` (or ensure the default collation is `C`/`POSIX`), or use the `text_pattern_ops` operator class for the index: `CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)`, which supports `LIKE` and anchored `~` pattern matching regardless of collation. Task paths are lowercase by convention, so byte-wise comparison is sufficient.

## Grouping vs Dependencies

**There is no `parentId` column.** Task grouping and dependency ordering are separate concepts:

- **Grouping** — `path` column. "This task belongs to the `implementation/storage` group." Enables scoped queries. Derived from filesystem layout during sync.
- **Dependencies** — `task_dependencies` table. "This task cannot start until that task completes." Enables topological sort, cycle detection, critical path. Derived from `depends_on` frontmatter.

A "meta task" (e.g., "implement storage") is simply a task that `depends_on` all its sub-tasks. There is no special entity type: it is a regular task plus dependency edges. The coordinator picks up the meta task as an assignment, and the implementation specialist works through sub-tasks in dependency order.

**Why not `parentId`**: `parentId` was invented in a previous doc revision but has no basis in the taskgraph data model. It created confusion:

- Redundant with `task_dependencies` (a meta task's dependencies ARE its sub-tasks)
- Required a fragile "inference from directory structure" during sync
- Violated the invariant that the DB schema mirrors the taskgraph frontmatter model

## Relationship to Existing Tables

### `mappings` Table

The `mappings` table links sessions to coordinators, spokes, and worktrees. A `taskId` column references the task a mapping is assigned to:

```ts
taskId: text REFERENCES tasks(id)  // FK to tasks
task: text                         // denormalized display name (e.g., task.slug or task.name)
```

This preserves the quick-reference pattern (coordinators can list mappings with task names without a JOIN) while maintaining referential integrity.

### `projects` Table

Tasks belong to a project via `tasks.projectId`. A project's tasks live in the project's `tasks/` directory. Cross-project task dependencies are not supported — tasks can only depend on other tasks within the same project. This is enforced at the application level (see task_dependencies cross-project guard).

### `sessions` Table

Sessions are linked to tasks indirectly through `mappings`. When the coordinator spawns a session for a meta task:

1. The task row already exists in `tasks` (synced from file or created via API)
2. The coordinator creates a `sessions` row for the implementation specialist
3. The coordinator creates a `mappings` row with `taskId` pointing to the meta task

## Task Status Lifecycle

```
pending → in-progress → completed
                      ↘ failed → in-progress (retry)
                      ↘ blocked → in-progress (unblocked)
```

| Status | Meaning |
|--------|---------|
| `pending` | Task exists, not yet started |
| `in-progress` | A session is actively working on this task |
| `completed` | Task finished successfully |
| `failed` | Task failed, may retry (Safe Exit protocol) |
| `blocked` | Task is blocked by an unmet dependency or external issue |

Status transitions go through **hub operations** (`hub.task.updateStatus`), not file edits. This ensures:

- All agents see consistent state immediately
- The coordinator can query "which tasks are pending?" reliably
- No merge conflicts from parallel file edits

Timestamp columns `startedAt` and `completedAt` track when a task entered `in-progress` and `completed` states respectively. These are set by the hub operation, not by the agent.

## Task Notes (Append-Only)

Agents may need to add notes to a task during execution (observations, partial progress, blockers encountered). For v1, this is handled by **appending markdown to the `body` column**:

```markdown
## Task Description (original)
Implement the tasks table with Drizzle-TypeBox pattern...

## Implementation Notes
- 2026-04-19: Started with table definition, commonCols pattern works
- 2026-04-19: Hit issue with text[] type for tags — need to check Drizzle support
```

The `hub.task.addNote` operation appends a timestamped note section to the end of `body`. This is simple, preserves the full context in one place, and requires no additional tables.

**Concurrency model for `hub.task.addNote`**: Notes are appended to the task `body` field using **DB-level concatenation**: `UPDATE tasks SET body = COALESCE(body, '') || $note WHERE id = $taskId`. This avoids read-modify-write cycles entirely — the append is atomic at the SQL level, eliminating race conditions between concurrent agents.

As a fallback for scenarios where DB-level concatenation isn't feasible, **optimistic locking via `updatedAt`** can be used: read the current `updatedAt`, append the note, and `UPDATE WHERE updatedAt = readValue`. If the row was updated between read and write, the UPDATE affects 0 rows and the operation must be retried. This is sufficient for the expected low-contention scenario (one agent at a time writing notes to a task). For high-contention scenarios (multiple agents writing simultaneously), consider a separate `task_notes` table with `INSERT` operations instead of UPDATE appends.

If structured, multi-agent notes become necessary later, a dedicated `task_notes` table can be added. The `body` append pattern doesn't preclude this — it's additive.

## Why Categorical Estimates Matter

The `scope`, `risk`, `impact`, and `level` fields are not cosmetic metadata — they are what make taskgraph's analysis commands produce useful results. The cost-benefit framework (see taskgraph framework docs) demonstrates a structural property: **upstream failures multiply downstream damage**. These fields power:

- **`taskgraph decompose`** — flags tasks where `risk > medium` or `scope > moderate`
- **`taskgraph risk-path`** — finds the highest cumulative risk path
- **`taskgraph critical`** — finds completion blockers
- **`taskgraph bottleneck`** — finds high-betweenness tasks

Without them, you just get topological sort — useful, but not structurally insightful.

The DB columns for these fields are **nullable** (NULL = not assessed) rather than NOT NULL with defaults, because the distinction between "deliberately assessed as `low`" and "nobody filled this in" is itself valuable information for the coordinator.

## Graphology Integration (Runtime Graph Ops)

For runtime graph operations, the hub uses **`@alkdev/taskgraph`** — a TypeScript package that wraps graphology and provides a high-level `TaskGraph` class plus analysis functions. The CLI (`taskgraph`) is for offline authoring and analysis; the TS package is for runtime use.

The approach:

1. Load all `tasks` + `task_dependencies` rows for a project from the DB
2. Build a `TaskGraph` via `TaskGraph.fromRecords(tasks, edges)`
3. Run analysis functions as needed: `criticalPath()`, `parallelGroups()`, `bottlenecks()`, `riskPath()`, `shouldDecomposeTask()`, `workflowCost()`

This works because realistic task graphs are small — typically 10–50 tasks, rarely exceeding 200 even on large projects. Building a graph from DB rows is instant at this scale (`TaskGraph.fromRecords` with 100 nodes reconstructs in <5ms).

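
The load, build, analyze approach can be illustrated without the library. The block below is a plain Kahn's-algorithm stand-in written for this document, not the `@alkdev/taskgraph` API; the `TaskRow` and `DependencyRow` types and the local `topologicalOrder` function are assumptions for illustration, using only the column names defined for `task_dependencies`.

```typescript
// Reduced row shapes, as they might come back from the DB (hypothetical types).
interface TaskRow {
  id: string;
  slug: string;
}
interface DependencyRow {
  dependsOnTaskId: string; // prerequisite
  dependentTaskId: string; // waits for the prerequisite
}

// Kahn's algorithm over DB rows: returns slugs with prerequisites before dependents.
function topologicalOrder(tasks: TaskRow[], edges: DependencyRow[]): string[] {
  const slugById = new Map<string, string>();
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    slugById.set(t.id, t.slug);
    inDegree.set(t.id, 0);
    dependents.set(t.id, []);
  }
  for (const e of edges) {
    // Skip edges whose endpoints are not in the loaded task set.
    if (!inDegree.has(e.dependsOnTaskId) || !inDegree.has(e.dependentTaskId)) continue;
    dependents.get(e.dependsOnTaskId)!.push(e.dependentTaskId);
    inDegree.set(e.dependentTaskId, inDegree.get(e.dependentTaskId)! + 1);
  }
  // Start with tasks that have no prerequisites.
  const queue = tasks.filter((t) => inDegree.get(t.id) === 0).map((t) => t.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(slugById.get(id)!);
    for (const next of dependents.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any task left unprocessed is part of a dependency cycle.
  if (order.length !== tasks.length) throw new Error("Cycle detected in task dependencies");
  return order;
}

const order = topologicalOrder(
  [
    { id: "1", slug: "schema" },
    { id: "2", slug: "sync" },
    { id: "3", slug: "api" },
  ],
  [
    { dependsOnTaskId: "1", dependentTaskId: "2" },
    { dependsOnTaskId: "1", dependentTaskId: "3" },
    { dependsOnTaskId: "2", dependentTaskId: "3" },
  ],
);
// order is ["schema", "sync", "api"]
```

At the graph sizes quoted above, rebuilding this structure on every query is cheap, which is why no persistent in-memory graph is needed.
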

`@alkdev/taskgraph` exports:

- **`TaskGraph`** — construction (fromTasks, fromRecords, fromJSON), mutation (addTask, removeTask, addDependency, updateTask), queries (hasCycles, findCycles, topologicalOrder, dependencies, dependents, getTask), validation (validateSchema, validateGraph), export
- **Analysis functions** — criticalPath, weightedCriticalPath, parallelGroups, bottlenecks, riskPath, riskDistribution, calculateTaskEv, workflowCost, shouldDecomposeTask
- **Schema types** — TaskScope, TaskRisk, TaskImpact, TaskLevel, TaskPriority, TaskStatus enums with TypeBox schemas
- **Frontmatter** — parseFrontmatter, serializeFrontmatter (YAML + markdown)
- **Error classes** — TaskgraphError, CircularDependencyError, TaskNotFoundError, etc.

**Why not taskgraph NAPI for v1**: The Rust CLI (`taskgraph`) is for offline authoring and analysis. The TypeScript package (`@alkdev/taskgraph`) handles all runtime graph operations. Graphology is a transitive dependency through `@alkdev/taskgraph` and handles < 200 nodes trivially. NAPI is unnecessary at realistic scales.

## Sync Flow

```
┌──────────────┐       ┌───────────────┐       ┌──────────────────┐
│ Decomposer   │       │ taskgraph CLI │       │ Hub DB           │
│ creates .md  │──────►│ validates     │──────►│ tasks table      │
│ files        │       │ analyzes      │       │ task_dependencies│
└──────────────┘       └───────────────┘       └──────────────────┘
                                                        ▲
                                                        │
                                               ┌────────┴─────────┐
                                               │ Hub operations   │
                                               │ hub.task.*       │
                                               │ (status, notes)  │
                                               └──────────────────┘
```

### Sync: Files → DB

The sync operation runs as a **single database transaction**:

1. **Begin transaction**
2. Scan `tasks/` directory for markdown files
3. Parse frontmatter (YAML) + body (markdown) from each file. `@alkdev/taskgraph` provides `parseFrontmatter()` and `serializeFrontmatter()` for YAML+markdown parsing. `parseTaskFile()` and `parseTaskDirectory()` are Node.js only (use `node:fs/promises`); for Deno, use `parseFrontmatter()` with Deno file I/O.
4. Upsert into `tasks` table (matches by `(projectId, slug)`)
5. For each task, `DELETE FROM task_dependencies WHERE dependentTaskId = ?` then `INSERT` the current edges — dependency edges are fully replaced, not merged, because the files own the dependency declarations
6. **Commit transaction**

If any step fails, the entire sync rolls back — no partial updates.

**Concurrency**: Only one sync should run at a time. The Decomposer triggers sync after creating/updating task files. No concurrent sync mechanism is needed for v1.

**Deleted files**: When a task file is removed from `tasks/`, the sync operation **deletes** the corresponding DB row. Git history preserves the full file-level history — the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup (`task_dependencies` rows, `mappings.taskId` SET NULL).

### DB → Files (Export)

When graph analysis is needed, export DB rows back to markdown files:

1. Query `tasks` + `task_dependencies` for a project
2. For each task, generate markdown with YAML frontmatter + body
3. Write to `tasks/` directory structure (using `path` to determine subdirectory)
4. Run `taskgraph validate`, `taskgraph risk-path`, etc.

This is a manual step — "I want to run analysis now" — not an automatic sync.

### Sync Error Handling

| Error | Behavior |
|-------|----------|
| Invalid YAML frontmatter | Skip file, log warning with file path and parse error. Continue with remaining files. |
| Missing required `id` or `name` field | Skip file, log warning. Task cannot be synced without these fields. |
| `depends_on` references non-existent slug within project | Insert the dependency edge anyway (dangling reference). The coordinator detects and warns about unresolvable dependencies. `taskgraph validate` should be run before sync to catch these. |
| Duplicate `id` (slug) in same project | Fail the sync with a clear error. Slug uniqueness is enforced by the DB constraint `unq_tasks_project_slug`. |
| File removed from filesystem | DELETE the DB row. FK cascade handles dependent rows. Git preserves history. |

**Validation ordering**: Run `taskgraph validate` before sync to catch structural errors (cycles, missing dependencies, duplicate IDs) at the CLI level. The DB sync then handles data-level integrity (unique constraints, FK checks).

## Open Questions

1. **Embeddings**: Task descriptions may benefit from vector embeddings for similarity search. Deferred — the `metadata` JSONB column can hold an embedding reference later, or a separate `task_embeddings` table can be added.
2. **Bulk status updates**: When the coordinator completes a meta task (all sub-tasks done), should it automatically mark the meta task `completed`? Likely yes — this is an application-level operation, not a DB concern.
3. **Cross-project dependencies**: Not supported. Tasks can only depend on other tasks within the same project. Application-layer validation rejects cross-project dependencies; a future DB-level trigger guard is deferred to Phase 2 (see task_dependencies cross-project guard).
4. **Task versioning**: When a task's body is modified (e.g., notes appended), should we keep previous versions? For v1, no — the current body is sufficient. If audit trail is needed, `updatedAt` timestamp + `metadata` revision count could suffice.

## References

- Cost-benefit framework: taskgraph framework docs — why categorical estimates are structurally required
- Workflow guide: taskgraph workflow docs — practical usage patterns
- Task file format: @alkdev/taskgraph README — field definitions
- TaskFrontmatter struct: @alkdev/taskgraph package source — canonical field types and defaults
- taskgraph architecture: taskgraph architecture docs
- Storage pattern: [README.md](./README.md)
- Table reference (cross-cutting): [table-reference.md](./table-reference.md)
- ADR-011: [../../decisions/ADR-011-dual-task-representation.md](../../decisions/ADR-011-dual-task-representation.md)
- @alkdev/taskgraph (runtime graph engine): `@alkdev/taskgraph` npm package