---
status: draft
last_updated: 2026-05-18
---
# Storage: Tasks & Task Dependencies
Tasks are the unit of work in the Spec-Driven Development (SDD) process. The **database is the source of truth** for task data at runtime. Markdown files serve as the **authoring surface** for the Decomposer role and the `taskgraph` CLI — they are ingested into the DB via a sync operation and can be exported back for offline analysis.
For the overall storage pattern, see [README.md](./README.md). For cross-cutting table reference (common columns, cascade behavior, index reference, status enums, relations), see [table-reference.md](./table-reference.md). For design decisions, see [../../decisions/](../../decisions/).
## Overview
### Why Database as Source of Truth
Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:
- **Parallel worktrees**: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
- **Reliable coordination**: The coordinator needs to query "which tasks are pending?" and "what's blocking task X?" at runtime without scanning filesystems across worktrees.
- **Atomic status updates**: An agent calling `hub.task.updateStatus` gets an immediate, transactional state change visible to all other agents and the coordinator.
The database is the authoritative, queryable, concurrent-safe representation. Files are the authoring format.
### Relationship to taskgraph CLI
The `taskgraph` CLI operates on markdown files. Its value is in **offline analysis**: `topo`, `cycles`, `parallel`, `critical`, `bottleneck`, `risk-path`, `decompose`. These commands depend on categorical fields (`scope`, `risk`, `impact`, `level`) being assessed.
The workflow is:
1. **Author** — Decomposer creates/edits markdown files using `taskgraph init` and direct editing
2. **Sync** — Files are ingested into the DB (files → DB)
3. **Execute** — Coordinator and agents query and mutate the DB via hub operations
4. **Analyze** — When needed, export from DB to files, run `taskgraph risk-path` etc.
The taskgraph CLI is not required at runtime. The hub uses **@alkdev/taskgraph** for runtime graph operations (topological sort, cycle detection, parallel groups, critical path, risk analysis) — see [Graphology Integration](#graphology-integration-runtime-graph-ops).
## Task Authority Model
| Aspect | Authority | Why |
|--------|-----------|-----|
| Task structure (all fields) | **DB** | Queryable, concurrent-safe, consistent |
| Task specification (body) | **DB** (`body` column) | Stored as markdown text; agents append notes during execution |
| Task authoring/creation | **Files** → sync → DB | Decomposer edits files; sync ingests them |
| Runtime status mutations | **DB** (hub operations) | `hub.task.*` operations — coordinator and agents call these |
| Offline graph analysis | **Files** (taskgraph CLI) | Export from DB when needed for `taskgraph risk-path` etc. |
See [Field Authority Split](#field-authority-split) for the explicit list of authored vs runtime-managed fields.
## Field Authority Split
Fields are split into two categories based on who writes them:
### Authored Fields (upserted by file sync)
These fields are written by the Decomposer/file sync. The `ON CONFLICT DO UPDATE SET` clause in the sync upsert includes **only** these columns:
| Field | DB Column |
|-------|-----------|
| id | `slug` |
| name | `name` |
| (project) | `projectId` |
| (directory path) | `path` |
| scope | `scope` |
| risk | `risk` |
| impact | `impact` |
| level | `level` |
| priority | `priority` |
| tags | `tags` |
| assignee | `assignee` |
| due | `dueAt` |
| (body) | `body` |
| created | `fileCreatedAt` |
| modified | `fileModifiedAt` |
| depends_on | `task_dependencies` table |
**Note**: `projectId` is set from the project context during sync (the task file's location within a project's `tasks/` directory determines the project), not from taskgraph frontmatter. `commonCols` fields (`id`, `metadata`, `createdAt`, `updatedAt`) are DB-generated and not part of the sync conflict domain.
### Runtime-Managed Fields (mutated via `hub.task.*` operations only)
These fields are never overwritten by sync. They are only mutated by hub operations (`hub.task.updateStatus`, `hub.task.addNote`, etc.):
| Field | DB Column | Set By |
|-------|-----------|--------|
| status | `status` | `hub.task.updateStatus` |
| (started timestamp) | `startedAt` | `hub.task.updateStatus` (on `in-progress`) |
| (completed timestamp) | `completedAt` | `hub.task.updateStatus` (on `completed`) |
> **Warning**: Sync must never write `status`, `startedAt`, or `completedAt` — these are owned by hub operations. The sync upsert uses `ON CONFLICT DO UPDATE SET` only for authored fields; runtime fields are excluded from the SET clause.
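The split above can be made mechanical by generating the upsert from an explicit list of authored columns. The following is a minimal sketch, not the hub's actual sync code: `buildTaskUpsertSql` and the two constant names are hypothetical, and the quoted camelCase column identifiers are assumed to match the column names used in this document.

```ts
// Columns the file sync is allowed to overwrite when a row already exists.
// Assumption: physical column names match the names used in this document.
const AUTHORED_COLUMNS: string[] = [
  "name", "path", "scope", "risk", "impact", "level", "priority",
  "tags", "assignee", "dueAt", "body", "fileCreatedAt", "fileModifiedAt",
];

// Columns owned by hub.task.* operations; they must never appear in SET.
const RUNTIME_COLUMNS: string[] = ["status", "startedAt", "completedAt"];

// Builds a parameterized upsert keyed on (projectId, slug).
function buildTaskUpsertSql(): string {
  // Guard: fail fast if a runtime-managed column ever leaks into the list.
  for (const column of AUTHORED_COLUMNS) {
    if (RUNTIME_COLUMNS.indexOf(column) !== -1) {
      throw new Error(`${column} is runtime-managed and cannot be synced`);
    }
  }
  const insertColumns = ["projectId", "slug"].concat(AUTHORED_COLUMNS);
  const quoted = insertColumns.map((c) => `"${c}"`).join(", ");
  const placeholders = insertColumns.map((_, i) => `$${i + 1}`).join(", ");
  const setClause = AUTHORED_COLUMNS
    .map((c) => `"${c}" = EXCLUDED."${c}"`)
    .join(", ");
  return (
    `INSERT INTO tasks (${quoted}) VALUES (${placeholders}) ` +
    `ON CONFLICT ("projectId", "slug") DO UPDATE SET ${setClause}`
  );
}
```

Because `status` is absent from the insert list as well, a newly synced row picks up the column default (`pending`), and an existing row keeps whatever status hub operations last set.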
## Field Mapping: taskgraph Frontmatter → DB Columns
Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column. No frontmatter fields are relegated to JSONB `metadata`.
| taskgraph Field | DB Column | Type | Notes |
|---|---|---|---|
| `id` | `slug` | text NOT NULL | Direct mapping. No transformation. `slug` is taskgraph-compatible, used in `depends_on` references. |
| `name` | `name` | text NOT NULL | Direct mapping |
| `status` | `status` | text NOT NULL, enum | Direct mapping: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. |
| `depends_on` | `task_dependencies` table | n/a | Each element creates a row: `depends_on[i]` → `dependsOnTaskId`, task → `dependentTaskId` |
| `scope` | `scope` | text, enum | `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. |
| `risk` | `risk` | text, enum | `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| `impact` | `impact` | text, enum | `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| `level` | `level` | text, enum | `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| `priority` | `priority` | text, enum | `low`, `medium`, `high`, `critical`. Nullable. |
| `tags` | `tags` | text[] | String array. Default `{}`. |
| `assignee` | `assignee` | text | Assigned agent or person. Nullable. |
| `due` | `dueAt` | timestamp with tz | Renamed from `due` for DB convention. Nullable. |
| `created` | `fileCreatedAt` | timestamp with tz | Frontmatter `created` field. Separate from DB `createdAt` (row creation time). Nullable — frontmatter may not include it. |
| `modified` | `fileModifiedAt` | timestamp with tz | Frontmatter `modified` field. Separate from DB `updatedAt` (row update time). Nullable. |
| (body) | `body` | text | Markdown content after frontmatter. Nullable — empty body is valid. |
| (directory path) | `path` | text | Logical grouping prefix: `architecture`, `implementation/storage`. Nullable — tasks created via API with no file origin have no path. See [Path Semantics](#path-semantics). |
| (project) | `projectId` | text NOT NULL | FK → projects.id |
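The mapping table translates fairly directly into a pure function. This is an illustrative sketch only: `TaskFrontmatterLike`, `AuthoredTaskRow`, and `toAuthoredRow` are hypothetical names (the canonical frontmatter type lives in `@alkdev/taskgraph`), and storing an empty body as NULL is an assumption rather than something this spec mandates.

```ts
// Hypothetical shapes for illustration; the canonical TaskFrontmatter type
// lives in @alkdev/taskgraph.
interface TaskFrontmatterLike {
  id: string;
  name: string;
  scope?: string;
  risk?: string;
  impact?: string;
  level?: string;
  priority?: string;
  tags?: string[];
  assignee?: string;
  due?: string;
  created?: string;
  modified?: string;
}

interface AuthoredTaskRow {
  projectId: string;
  slug: string;
  name: string;
  path: string | null;
  scope: string | null;
  risk: string | null;
  impact: string | null;
  level: string | null;
  priority: string | null;
  tags: string[];
  assignee: string | null;
  dueAt: Date | null;
  body: string | null;
  fileCreatedAt: Date | null;
  fileModifiedAt: Date | null;
}

// Absent frontmatter timestamps become NULL rather than "now".
const toNullableDate = (value?: string): Date | null =>
  value ? new Date(value) : null;

// Maps frontmatter plus sync context to the authored columns.
// `status` is deliberately absent: it is runtime-managed.
function toAuthoredRow(
  fm: TaskFrontmatterLike,
  body: string,
  projectId: string,
  path: string | null,
): AuthoredTaskRow {
  return {
    projectId,
    slug: fm.id,
    name: fm.name,
    path,
    scope: fm.scope ?? null,
    risk: fm.risk ?? null,
    impact: fm.impact ?? null,
    level: fm.level ?? null,
    priority: fm.priority ?? null,
    tags: fm.tags ?? [],
    assignee: fm.assignee ?? null,
    dueAt: toNullableDate(fm.due),
    body: body.length > 0 ? body : null, // assumption: empty body stored as NULL
    fileCreatedAt: toNullableDate(fm.created),
    fileModifiedAt: toNullableDate(fm.modified),
  };
}
```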
## Table Schemas
### `tasks`
SDD task definitions. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation. Every field in taskgraph's `TaskFrontmatter` struct maps to a dedicated DB column (no frontmatter fields in `metadata` JSONB).
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| projectId | text NOT NULL | FK → projects.id (cascade) — tasks belong to a project |
| slug | text NOT NULL | taskgraph `id` — kebab-case identifier used in `depends_on` references. Unique within a project. |
| name | text NOT NULL | Human-readable task name (from frontmatter `name`) |
| path | text | Logical grouping prefix derived from filesystem location (e.g., `architecture`, `implementation/storage`). Nullable — tasks created via API with no file origin have no path. Enables `WHERE path LIKE 'implementation/%'` for scoped queries. |
| status | text NOT NULL | Enum: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. Status transitions go through hub operations, not file edits. |
| scope | text | Categorical scope: `single`, `narrow`, `moderate`, `broad`, `system`. **Nullable** — NULL = not yet assessed. See [Why Categorical Fields Are Nullable](#why-categorical-fields-are-nullable-not-not-null-with-defaults). |
| risk | text | Categorical risk: `trivial`, `low`, `medium`, `high`, `critical`. **Nullable** — NULL = not yet assessed. |
| impact | text | Categorical impact: `isolated`, `component`, `phase`, `project`. **Nullable** — NULL = not yet assessed. |
| level | text | Task level: `planning`, `decomposition`, `implementation`, `review`, `research`. **Nullable** — NULL = not yet assessed. |
| priority | text | Priority: `low`, `medium`, `high`, `critical`. Nullable. |
| assignee | text | Assigned agent or person. Nullable. |
| dueAt | timestamp with tz | Due date (from frontmatter `due`). Nullable. |
| tags | text[] | Filtering tags. Default `{}`. GIN index for array-contains queries. |
| body | text | Markdown task specification (from file body after frontmatter). Nullable — empty body is valid. Agents may append notes during execution. |
| fileCreatedAt | timestamp with tz | Frontmatter `created` field — file creation time from the markdown. Separate from DB `createdAt` (row creation time). Nullable. |
| fileModifiedAt | timestamp with tz | Frontmatter `modified` field — file modification time from the markdown. Separate from DB `updatedAt` (row update time). Nullable. |
| startedAt | timestamp with tz | When status became `in-progress`. Set by hub operation, not by agent. |
| completedAt | timestamp with tz | When status became `completed`. Set by hub operation. |
**Unique constraint**: `unq_tasks_project_slug` UNIQUE on `(projectId, slug)` — task slugs are unique within a project.
**pgEnum Definitions**: The enum columns above are declared with Drizzle's `pgEnum`, which generates named PostgreSQL enum types and provides TypeScript type inference. The enum values are aligned with taskgraph's categorical fields.
```ts
import { pgEnum } from "drizzle-orm/pg-core";

// Named PostgreSQL enum types; values mirror taskgraph's categorical fields.
export const taskStatus = pgEnum("task_status", ["pending", "in-progress", "completed", "failed", "blocked"]);
export const taskScope = pgEnum("task_scope", ["single", "narrow", "moderate", "broad", "system"]);
export const taskRisk = pgEnum("task_risk", ["trivial", "low", "medium", "high", "critical"]);
export const taskImpact = pgEnum("task_impact", ["isolated", "component", "phase", "project"]);
export const taskLevel = pgEnum("task_level", ["planning", "decomposition", "implementation", "review", "research"]);
export const taskPriority = pgEnum("task_priority", ["low", "medium", "high", "critical"]);
```
The decomposer template should consume these same enum definitions to ensure DB-level constraints match the application-level typing.
**Indexes**:
- `idx_tasks_project_id` on `(projectId)`
- `idx_tasks_project_status` on `(projectId, status)`: composite for "find all pending tasks in project X"
- `idx_tasks_status` on `(status)`
- `idx_tasks_active` partial on `(projectId)` WHERE `status IN ('pending', 'in-progress', 'blocked')`: efficiently finds active tasks
- `idx_tasks_path` on `(path)` **with `text_pattern_ops`**: locale-independent LIKE pattern matching for path prefix queries (e.g., `WHERE path LIKE 'implementation/%'`)
- `idx_tasks_priority` on `(priority)`
- `idx_tasks_assignee` on `(assignee)`
- `idx_tasks_due_at` on `(dueAt)`
- `idx_tasks_tags` GIN on `(tags)`: for array-contains queries (`tags @> '{security}'`)

**`slug` semantics**: From taskgraph frontmatter `id` field. Kebab-case identifiers like `auth-setup`, `storage-tasks-table`. Appears in `depends_on` arrays.
**`path` semantics**: Nullable — tasks created via API with no filesystem origin have no path. When set, captures the logical grouping derived from the `tasks/` directory structure. E.g., a file at `tasks/implementation/storage/tasks-table.md` gets `path: "implementation/storage"`. Enables `WHERE path LIKE 'implementation/%'` (scoped queries) without requiring a `parentId` FK. This replaces the previous `parentId` column — grouping is a path concern, not a tree relationship.
**No `parentId` column**: Grouping is handled by `path`, dependencies by `task_dependencies`. A "meta task" is just a regular task that depends on its sub-tasks — no special entity type needed.
**No `removedAt` column**: When a task file is removed, the sync operation DELETEs the DB row. Git history preserves the file-level history; the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup.
**`metadata` JSONB**: Reserved for truly ad-hoc data not in the taskgraph schema. No taskgraph frontmatter fields are stored here — all have proper columns.
### `task_dependencies`
Dependency edges between tasks. Directed: a row means the dependent task depends on the prerequisite task (prerequisite must complete before dependent can start). Mirrors the taskgraph `depends_on` relationship.
| Column | Type | Notes |
|--------|------|-------|
| commonCols | — | id, metadata, createdAt, updatedAt |
| dependsOnTaskId | text NOT NULL | FK → tasks.id (cascade) — The prerequisite task (must complete first) |
| dependentTaskId | text NOT NULL | FK → tasks.id (cascade) — The dependent task (waits for prerequisite) |
**Unique constraint**: `unq_task_dependencies_depends_on_task` UNIQUE on `(dependsOnTaskId, dependentTaskId)` — no duplicate dependency edges.
**Indexes**: `idx_task_dependencies_depends_on_task_id` on `(dependsOnTaskId)` — "what depends on this task?", `idx_task_dependencies_dependent_task_id` on `(dependentTaskId)` — "what does this task depend on?".
**Direction**: `dependentTaskId` is the task that has the dependency; `dependsOnTaskId` is the prerequisite task. Each row reads `dependentTaskId` → `dependsOnTaskId`, meaning "task `dependentTaskId` depends on task `dependsOnTaskId`". In the graph, the edge runs the other way, `dependsOnTaskId` → `dependentTaskId` (prerequisite → dependent), which gives the correct topological order: prerequisites before dependents.
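To make the direction concrete, here is a small sketch of turning one task's `depends_on` slugs into edge rows. The names `DependencyEdgeRow`, `EdgeBuildResult`, and `buildDependencyEdges` are hypothetical, and the slug-to-id lookup is assumed to be scoped to a single project. Slugs that do not resolve are reported separately so the caller can apply whatever policy the sync uses for them.

```ts
interface DependencyEdgeRow {
  dependsOnTaskId: string; // prerequisite (must complete first)
  dependentTaskId: string; // task that waits for the prerequisite
}

interface EdgeBuildResult {
  edges: DependencyEdgeRow[];
  unresolved: string[]; // slugs with no matching task in the project
}

// Converts the depends_on slugs of one task into task_dependencies rows.
function buildDependencyEdges(
  dependentTaskId: string,
  dependsOn: string[],
  idBySlug: Map<string, string>,
): EdgeBuildResult {
  const edges: DependencyEdgeRow[] = [];
  const unresolved: string[] = [];
  for (const slug of dependsOn) {
    const prerequisiteId = idBySlug.get(slug);
    if (prerequisiteId === undefined) {
      unresolved.push(slug);
      continue;
    }
    edges.push({ dependsOnTaskId: prerequisiteId, dependentTaskId });
  }
  return { edges, unresolved };
}
```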
**Cross-project dependency guard**: `dependentTaskId` and `dependsOnTaskId` MUST reference tasks within the same project. The application layer enforces this constraint: creating a dependency between tasks in different projects is rejected with a validation error. This is not enforced at the DB level (FK constraints allow cross-project references), so the application must check project consistency before insert.
A future DB-level guard could use a `BEFORE INSERT ON task_dependencies` trigger that checks that `NEW.dependentTaskId` and `NEW.dependsOnTaskId` reference tasks in the same project. This is deferred to Phase 2; the application-layer check is sufficient for now.
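The application-layer check can be as small as a comparison of the two tasks' `projectId` values before the insert. The sketch below is one possible shape, assuming both task rows have already been loaded; `TaskProjectRef` and `assertSameProject` are hypothetical names, not existing hub operations.

```ts
interface TaskProjectRef {
  id: string;
  projectId: string;
}

// Throws a validation error if the edge would cross a project boundary.
// Intended to run before inserting a task_dependencies row.
function assertSameProject(
  dependent: TaskProjectRef,
  prerequisite: TaskProjectRef,
): void {
  if (dependent.projectId !== prerequisite.projectId) {
    throw new Error(
      `Cross-project dependency rejected: task ${dependent.id} ` +
        `(project ${dependent.projectId}) cannot depend on task ` +
        `${prerequisite.id} (project ${prerequisite.projectId})`,
    );
  }
}
```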
**Sync source**: Dependency edges are authored in task file frontmatter (`depends_on: [other-task]`) and synced to this table during the file → DB sync operation. The sync clears and re-inserts all edges for a task on each run — dependencies are fully replaced by the sync, not merged or modified at runtime.
## Why ALL Frontmatter Fields Get Proper Columns
ADR-001 establishes the pattern: "separate structured columns for high-query, high-filter fields." For tasks, **every** taskgraph frontmatter field is queryable and filterable in the coordinator's workflow:
- `priority` — "show me high-priority pending tasks" (coordinator prioritization)
- `assignee` — "which tasks are assigned to agent X?" (work assignment)
- `dueAt` — "which tasks are due this week?" (deadline tracking)
- `tags` — "filter by tag" (cross-cutting concerns)
Shoving these into `metadata` JSONB loses type safety, indexability, and SQL queryability — exactly the problems the database is meant to solve. The `metadata` JSONB column (from `commonCols`) is reserved for truly ad-hoc data that isn't in the taskgraph schema.
### Why Categorical Fields Are Nullable (Not NOT NULL with Defaults)
The previous design made `scope`, `risk`, `impact`, and `level` NOT NULL with defaults (`narrow`, `low`, `isolated`, `implementation`). This conflated two states:
- **Assessed as `low`** — the Decomposer evaluated this and determined the risk is low
- **Not assessed** — nobody filled this in
Hiding the distinction with defaults means the coordinator can't distinguish a deliberate assessment from a gap. NULL is the correct signal for "not yet assessed."
Taskgraph itself makes these fields `Option<TaskScope>`, `Option<TaskRisk>`, etc. — nullable. The DB should match the source model.
**Application-layer handling**: When `scope`, `risk`, `impact`, or `level` is NULL, the coordinator should:
- Warn that the task hasn't been assessed
- Exclude it from cost-benefit analysis (you can't compute risk-path without risk values)
- Suggest the Decomposer assess it
For @alkdev/taskgraph operations that need numeric weights, provide fallbacks at the application layer (e.g., treat NULL risk as `low` for topo sort, but warn).
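The handling described above might look like the following at the application layer. This is a sketch under assumptions: the function names are hypothetical, and the `low` fallback simply follows the example in the previous paragraph rather than a defined policy.

```ts
type TaskRiskValue = "trivial" | "low" | "medium" | "high" | "critical";

interface CategoricalFields {
  scope: string | null;
  risk: TaskRiskValue | null;
  impact: string | null;
  level: string | null;
}

// Names of the categorical fields that are still NULL (not yet assessed).
function unassessedFields(task: CategoricalFields): string[] {
  const names: (keyof CategoricalFields)[] = ["scope", "risk", "impact", "level"];
  return names.filter((name) => task[name] === null);
}

// Only fully assessed tasks are eligible for cost-benefit analysis.
function isFullyAssessed(task: CategoricalFields): boolean {
  return unassessedFields(task).length === 0;
}

// Fallback for operations that need a risk value; reports the gap via `warn`
// so the coordinator can suggest that the Decomposer assess the task.
function riskOrFallback(
  slug: string,
  task: CategoricalFields,
  warn: (message: string) => void,
): TaskRiskValue {
  if (task.risk !== null) return task.risk;
  warn(`task ${slug}: risk not assessed, falling back to "low"`);
  return "low";
}
```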
## Path Semantics
The `path` column captures the logical grouping of tasks, derived from their location in the `tasks/` directory hierarchy:
```
tasks/
├── architecture/
│   ├── auth-design.md          → path: "architecture"
│   └── storage-overview.md     → path: "architecture"
├── research/
│   └── embedding-approach.md   → path: "research"
└── implementation/
    ├── storage/
    │   ├── tasks-table.md      → path: "implementation/storage"
    │   └── relations.md        → path: "implementation/storage"
    └── auth/
        └── oauth-flow.md       → path: "implementation/auth"
```
**`path` is nullable** because tasks created at runtime via hub operations (not synced from files) have no filesystem origin.
**`path` enables scoped queries**:
- `WHERE path = 'architecture'` — all architecture tasks
- `WHERE path LIKE 'implementation/%'` — all implementation tasks
- `WHERE path = 'implementation/storage'` — storage implementation tasks
This is a prefix-based grouping mechanism. It replaces `parentId` (which was not in the taskgraph model and conflated organizational grouping with dependency ordering).
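Deriving `path` during sync amounts to stripping the `tasks/` root and the file name. A minimal sketch follows; `derivePath` is a hypothetical helper, and returning `null` for a file placed directly under `tasks/` is an assumption, since the spec does not cover that case.

```ts
// Derives the logical grouping prefix from a task file's location.
// Assumption: files directly under tasks/ have no grouping (null).
function derivePath(filePath: string, tasksRoot: string = "tasks"): string | null {
  const normalized = filePath.replace(/\\/g, "/");
  const prefix = tasksRoot + "/";
  if (normalized.indexOf(prefix) !== 0) {
    throw new Error(`${filePath} is not inside ${tasksRoot}/`);
  }
  const relative = normalized.slice(prefix.length);
  const lastSlash = relative.lastIndexOf("/");
  return lastSlash === -1 ? null : relative.slice(0, lastSlash);
}
```

For example, `derivePath("tasks/implementation/storage/tasks-table.md")` yields `"implementation/storage"`, matching the tree above.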
**Locale sensitivity**: The `path` column uses the `text` type with the database's default collation. With a non-`C` default collation, a plain B-tree index cannot be used for prefix patterns such as `WHERE path LIKE 'implementation/%'`. Task paths are lowercase and should be compared character by character, so either use `COLLATE "C"` (or a database whose default collation is `C`/`POSIX`), or declare the index with the `text_pattern_ops` operator class: `CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops)`. That index supports `LIKE` and `~` pattern matching regardless of collation.
## Grouping vs Dependencies
**There is no `parentId` column.** Task grouping and dependency ordering are separate concepts:
- **Grouping** — `path` column. "This task belongs to the `implementation/storage` group." Enables scoped queries. Derived from filesystem layout during sync.
- **Dependencies** — `task_dependencies` table. "This task cannot start until that task completes." Enables topological sort, cycle detection, critical path. Derived from `depends_on` frontmatter.
A "meta task" (e.g., "implement storage") is simply a task that `depends_on` all its sub-tasks. There is no special entity type — it's regular task + dependency edges. The coordinator picks up the meta task as an assignment, and the implementation specialist works through sub-tasks in dependency order.
**Why not `parentId`**: `parentId` was invented in a previous doc revision but has no basis in the taskgraph data model. It created confusion:
- Redundant with `task_dependencies` (a meta task's dependencies ARE its sub-tasks)
- Required a fragile "inference from directory structure" during sync
- Violated the invariant that the DB schema mirrors the taskgraph frontmatter model
## Relationship to Existing Tables
### `mappings` Table
The `mappings` table links sessions to coordinators, spokes, and worktrees. A `taskId` column references the task a mapping is assigned to:
```ts
taskId: text REFERENCES tasks(id) // FK to tasks
task: text // denormalized display name (e.g., task.slug or task.name)
```
This preserves the quick-reference pattern (coordinators can list mappings with task names without a JOIN) while maintaining referential integrity.
### `projects` Table
Tasks belong to a project via `tasks.projectId`. A project's tasks live in the project's `tasks/` directory. Cross-project task dependencies are not supported — tasks can only depend on other tasks within the same project. This is enforced at the application level (see task_dependencies cross-project guard).
### `sessions` Table
Sessions are linked to tasks indirectly through `mappings`. When the coordinator spawns a session for a meta task:
1. The task row already exists in `tasks` (synced from file or created via API)
2. Creates a `sessions` row for the implementation specialist
3. Creates a `mappings` row with `taskId` pointing to the meta task
## Task Status Lifecycle
```
pending → in-progress → completed
                      ↘ failed  → in-progress (retry)
                      ↘ blocked → in-progress (unblocked)
```
| Status | Meaning |
|--------|---------|
| `pending` | Task exists, not yet started |
| `in-progress` | A session is actively working on this task |
| `completed` | Task finished successfully |
| `failed` | Task failed, may retry (Safe Exit protocol) |
| `blocked` | Task is blocked by an unmet dependency or external issue |
Status transitions go through **hub operations** (`hub.task.updateStatus`), not file edits. This ensures:
- All agents see consistent state immediately
- The coordinator can query "which tasks are pending?" reliably
- No merge conflicts from parallel file edits
Timestamp columns `startedAt` and `completedAt` track when a task entered `in-progress` and `completed` states respectively. These are set by the hub operation, not by the agent.
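A transition check inside `hub.task.updateStatus` could be table-driven. The sketch below is one reading of the lifecycle diagram, not a confirmed rule set: it assumes `failed` and `blocked` are entered from `in-progress`, that `completed` is terminal, and that a retry overwrites `startedAt`. `applyStatusTransition` and `ALLOWED_TRANSITIONS` are hypothetical names.

```ts
type TaskStatusValue = "pending" | "in-progress" | "completed" | "failed" | "blocked";

// Assumed transition table; adjust if the lifecycle allows other edges.
const ALLOWED_TRANSITIONS: Record<TaskStatusValue, TaskStatusValue[]> = {
  "pending": ["in-progress"],
  "in-progress": ["completed", "failed", "blocked"],
  "completed": [],
  "failed": ["in-progress"],
  "blocked": ["in-progress"],
};

interface RuntimeFields {
  status: TaskStatusValue;
  startedAt: Date | null;
  completedAt: Date | null;
}

// Returns the new runtime-managed fields, or throws on an illegal transition.
// Timestamps are set here, by the operation, never supplied by the agent.
function applyStatusTransition(
  current: RuntimeFields,
  next: TaskStatusValue,
  now: Date,
): RuntimeFields {
  if (ALLOWED_TRANSITIONS[current.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${current.status} -> ${next}`);
  }
  return {
    status: next,
    startedAt: next === "in-progress" ? now : current.startedAt,
    completedAt: next === "completed" ? now : current.completedAt,
  };
}
```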
## Task Notes (Append-Only)
Agents may need to add notes to a task during execution (observations, partial progress, blockers encountered). For v1, this is handled by **appending markdown to the `body` column**:
```markdown
## Task Description (original)
Implement the tasks table with Drizzle-TypeBox pattern...
## Implementation Notes
- 2026-04-19: Started with table definition, commonCols pattern works
- 2026-04-19: Hit issue with text[] type for tags — need to check Drizzle support
```
The `hub.task.addNote` operation appends a timestamped note section to the end of `body`. This is simple, preserves the full context in one place, and requires no additional tables.
**Concurrency model for `hub.task.addNote`**: Notes are appended to the task `body` field using **DB-level concatenation**: `UPDATE tasks SET body = COALESCE(body, '') || $note WHERE id = $taskId`. This avoids read-modify-write cycles entirely — the append is atomic at the SQL level, eliminating race conditions between concurrent agents.
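As an illustration of the DB-level concatenation, the operation can build a single parameterized statement and hand it to whatever database client the hub uses. The note layout (a dated bullet) is an assumption modelled on the example above, and `buildAddNoteQuery` is a hypothetical helper, not the actual `hub.task.addNote` implementation.

```ts
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Builds the atomic append: no prior read of `body` is needed, so two agents
// appending concurrently cannot overwrite each other's notes.
function buildAddNoteQuery(taskId: string, note: string, at: Date): ParameterizedQuery {
  const day = at.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const section = `\n- ${day}: ${note.trim()}`;
  return {
    text: "UPDATE tasks SET body = COALESCE(body, '') || $1 WHERE id = $2",
    values: [section, taskId],
  };
}
```

The `COALESCE` matters because `body` is nullable: in PostgreSQL, `NULL || 'text'` is NULL, so without it the first note on an empty task would be lost.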
As a fallback for scenarios where DB-level concatenation isn't feasible, **optimistic locking via `updatedAt`** can be used: read the current `updatedAt`, append the note, and `UPDATE WHERE updatedAt = readValue`. If the row was updated between read and write, the UPDATE affects 0 rows and the operation must be retried. This is sufficient for the expected low-contention scenario (one agent at a time writing notes to a task).
For high-contention scenarios (multiple agents writing simultaneously), consider a separate `task_notes` table with `INSERT` operations instead of UPDATE appends.
If structured, multi-agent notes become necessary later, a dedicated `task_notes` table can be added. The `body` append pattern doesn't preclude this — it's additive.
## Why Categorical Estimates Matter
The `scope`, `risk`, `impact`, and `level` fields are not cosmetic metadata — they are what make taskgraph's analysis commands produce useful results. The cost-benefit framework (see taskgraph framework docs) demonstrates a structural property: **upstream failures multiply downstream damage**.
These fields power:
- **`taskgraph decompose`** — flags tasks where `risk > medium` or `scope > moderate`
- **`taskgraph risk-path`** — finds the highest cumulative risk path
- **`taskgraph critical`** — finds completion blockers
- **`taskgraph bottleneck`** — finds high-betweenness tasks
Without them, you just get topological sort — useful, but not structurally insightful. The DB columns for these fields are **nullable** (NULL = not assessed) rather than NOT NULL with defaults, because the distinction between "deliberately assessed as `low`" and "nobody filled this in" is itself valuable information for the coordinator.
## Graphology Integration (Runtime Graph Ops)
For runtime graph operations, the hub uses **`@alkdev/taskgraph`** — a TypeScript package that wraps graphology and provides a high-level `TaskGraph` class plus analysis functions. The CLI (`taskgraph`) is for offline authoring and analysis; the TS package is for runtime use.
The approach:
1. Load all `tasks` + `task_dependencies` rows for a project from the DB
2. Build a `TaskGraph` via `TaskGraph.fromRecords(tasks, edges)`
3. Run analysis functions as needed: `criticalPath()`, `parallelGroups()`, `bottlenecks()`, `riskPath()`, `shouldDecomposeTask()`, `workflowCost()`
This works because realistic task graphs are small: typically 10 to 50 tasks, rarely exceeding 200 even on large projects. Building a graph from DB rows is effectively instant at this scale (`TaskGraph.fromRecords` with 100 nodes reconstructs in <5ms).
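To show why loading the whole project graph per request is cheap, here is a dependency-free sketch of a topological sort over the two row sets using Kahn's algorithm. It is not the `@alkdev/taskgraph` implementation or API; `GraphTaskRow`, `GraphEdgeRow`, and `topologicalOrderFromRows` are hypothetical names used only for illustration.

```ts
interface GraphTaskRow {
  id: string;
}

interface GraphEdgeRow {
  dependsOnTaskId: string; // prerequisite
  dependentTaskId: string; // dependent
}

// Kahn's algorithm: repeatedly emit tasks whose prerequisites are all emitted.
// Runs in O(tasks + edges); throws if the edges contain a cycle.
function topologicalOrderFromRows(tasks: GraphTaskRow[], edges: GraphEdgeRow[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    inDegree.set(task.id, 0);
    dependents.set(task.id, []);
  }
  for (const edge of edges) {
    const outgoing = dependents.get(edge.dependsOnTaskId);
    const count = inDegree.get(edge.dependentTaskId);
    // Edges that reference tasks outside the loaded set are ignored here.
    if (outgoing === undefined || count === undefined) continue;
    outgoing.push(edge.dependentTaskId);
    inDegree.set(edge.dependentTaskId, count + 1);
  }
  const ready = tasks.filter((t) => inDegree.get(t.id) === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) || []) {
      const remaining = (inDegree.get(next) || 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== tasks.length) {
    throw new Error("Cycle detected in task dependencies");
  }
  return order;
}
```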
`@alkdev/taskgraph` exports:
- **`TaskGraph`** — construction (fromTasks, fromRecords, fromJSON), mutation (addTask, removeTask, addDependency, updateTask), queries (hasCycles, findCycles, topologicalOrder, dependencies, dependents, getTask), validation (validateSchema, validateGraph), export
- **Analysis functions** — criticalPath, weightedCriticalPath, parallelGroups, bottlenecks, riskPath, riskDistribution, calculateTaskEv, workflowCost, shouldDecomposeTask
- **Schema types** — TaskScope, TaskRisk, TaskImpact, TaskLevel, TaskPriority, TaskStatus enums with TypeBox schemas
- **Frontmatter** — parseFrontmatter, serializeFrontmatter (YAML + markdown)
- **Error classes** — TaskgraphError, CircularDependencyError, TaskNotFoundError, etc.
**Why not taskgraph NAPI for v1**: The Rust CLI (`taskgraph`) is for offline authoring and analysis. The TypeScript package (`@alkdev/taskgraph`) handles all runtime graph operations. Graphology is a transitive dependency through `@alkdev/taskgraph` and handles < 200 nodes trivially. NAPI is unnecessary at realistic scales.
## Sync Flow
```
┌──────────────┐       ┌───────────────┐       ┌───────────────────┐
│ Decomposer   │       │ taskgraph CLI │       │ Hub DB            │
│ creates .md  │──────►│ validates     │──────►│ tasks table       │
│ files        │       │ analyzes      │       │ task_dependencies │
└──────────────┘       └───────────────┘       └─────────┬─────────┘
                                               ┌─────────┴─────────┐
                                               │ Hub operations    │
                                               │ hub.task.*        │
                                               │ (status, notes)   │
                                               └───────────────────┘
```
### Sync: Files → DB
The sync operation runs as a **single database transaction**:
1. **Begin transaction**
2. Scan `tasks/` directory for markdown files
3. Parse frontmatter (YAML) + body (markdown) from each file. `@alkdev/taskgraph` provides `parseFrontmatter()` and `serializeFrontmatter()` for YAML+markdown parsing. `parseTaskFile()` and `parseTaskDirectory()` are Node.js only (use `node:fs/promises`); for Deno, use `parseFrontmatter()` with Deno file I/O.
4. Upsert into `tasks` table (matches by `(projectId, slug)`)
5. For each task, `DELETE FROM task_dependencies WHERE dependentTaskId = ?` then `INSERT` the current edges — dependency edges are fully replaced, not merged, because the files own the dependency declarations
6. **Commit transaction**
If any step fails, the entire sync rolls back — no partial updates.
**Concurrency**: Only one sync should run at a time. The Decomposer triggers sync after creating/updating task files. No concurrent sync mechanism is needed for v1.
**Deleted files**: When a task file is removed from `tasks/`, the sync operation **deletes** the corresponding DB row. Git history preserves the full file-level history — the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup (`task_dependencies` rows, `mappings.taskId` SET NULL).
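Before the transaction starts, the sync can compute which slugs to upsert and which rows to delete by comparing the scanned files with the rows already stored for the project. This is a sketch with hypothetical names (`SyncPlan`, `planSync`). It assumes the caller passes only file-originated rows as `dbSlugs`; how tasks created via the API are kept out of the deletion set is not specified here.

```ts
interface SyncPlan {
  upsertSlugs: string[]; // every slug found on disk is upserted
  deleteSlugs: string[]; // rows whose file no longer exists
}

// Compares slugs parsed from tasks/ with slugs stored for the project.
// Assumption: dbSlugs contains only rows that originated from files.
function planSync(fileSlugs: string[], dbSlugs: string[]): SyncPlan {
  const onDisk = new Set<string>(fileSlugs);
  return {
    upsertSlugs: fileSlugs.slice(),
    deleteSlugs: dbSlugs.filter((slug) => !onDisk.has(slug)),
  };
}
```

Inside the transaction, the plan is then applied in the order listed above: upsert tasks, replace each task's dependency edges, delete removed rows, commit.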
### DB → Files (Export)
When graph analysis is needed, export DB rows back to markdown files:
1. Query `tasks` + `task_dependencies` for a project
2. For each task, generate markdown with YAML frontmatter + body
3. Write to `tasks/` directory structure (using `path` to determine subdirectory)
4. Run `taskgraph validate`, `taskgraph risk-path`, etc.
This is a manual step — "I want to run analysis now" — not an automatic sync.
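The export steps can be sketched as two small functions: one that picks the target file and one that renders the markdown. This is illustrative only. A real export would presumably use `serializeFrontmatter()` from `@alkdev/taskgraph`, whose signature is not shown in this document, so the hand-rolled YAML below covers just a few fields. Naming the file after the slug is also an assumption, and `ExportableTask`, `exportFilePath`, and `renderTaskMarkdown` are hypothetical names.

```ts
interface ExportableTask {
  slug: string;
  name: string;
  status: string;
  path: string | null;
  dependsOn: string[]; // prerequisite slugs, resolved from task_dependencies
  risk: string | null;
  body: string | null;
}

// tasks/<path>/<slug>.md, or tasks/<slug>.md when the task has no path.
function exportFilePath(task: ExportableTask): string {
  const parts = task.path ? ["tasks", task.path] : ["tasks"];
  return parts.concat(`${task.slug}.md`).join("/");
}

// Renders a subset of the frontmatter followed by the body.
// NULL categorical fields are omitted so "not assessed" survives the round trip.
function renderTaskMarkdown(task: ExportableTask): string {
  const lines: string[] = [
    "---",
    `id: ${task.slug}`,
    `name: ${JSON.stringify(task.name)}`, // JSON string is a valid YAML scalar
    `status: ${task.status}`,
  ];
  if (task.dependsOn.length > 0) {
    lines.push(`depends_on: [${task.dependsOn.join(", ")}]`);
  }
  if (task.risk !== null) {
    lines.push(`risk: ${task.risk}`);
  }
  lines.push("---", "");
  return lines.join("\n") + (task.body === null ? "" : task.body);
}
```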
### Sync Error Handling
| Error | Behavior |
|-------|----------|
| Invalid YAML frontmatter | Skip file, log warning with file path and parse error. Continue with remaining files. |
| Missing required `id` or `name` field | Skip file, log warning. Task cannot be synced without these fields. |
| `depends_on` references non-existent slug within project | Insert the dependency edge anyway (dangling reference). The coordinator detects and warns about unresolvable dependencies. `taskgraph validate` should be run before sync to catch these. |
| Duplicate `id` (slug) in same project | Fail the sync with a clear error. Slug uniqueness is enforced by the DB constraint `unq_tasks_project_slug`. |
| File removed from filesystem | DELETE the DB row. FK cascade handles dependent rows. Git preserves history. |
**Validation ordering**: Run `taskgraph validate` before sync to catch structural errors (cycles, missing dependencies, duplicate IDs) at the CLI level. The DB sync then handles data-level integrity (unique constraints, FK checks).
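Two rows of the error table (skip on missing required fields, fail on duplicate slugs) can be applied in one pass before any SQL runs. The sketch below uses hypothetical names (`ParsedTaskFile`, `validateParsedFiles`) and assumes the files have already been parsed into plain objects. Invalid YAML would be caught earlier, at parse time.

```ts
interface ParsedTaskFile {
  file: string; // path of the source markdown file
  id?: string;
  name?: string;
}

interface SkippedFile {
  file: string;
  reason: string;
}

interface ValidationOutcome {
  accepted: ParsedTaskFile[];
  skipped: SkippedFile[];
}

// Skips files missing `id` or `name`; throws on a duplicate slug so the
// whole sync fails before touching the database.
function validateParsedFiles(files: ParsedTaskFile[]): ValidationOutcome {
  const accepted: ParsedTaskFile[] = [];
  const skipped: SkippedFile[] = [];
  const firstFileBySlug = new Map<string, string>();
  for (const parsed of files) {
    if (!parsed.id || !parsed.name) {
      skipped.push({ file: parsed.file, reason: "missing required id or name" });
      continue;
    }
    const earlier = firstFileBySlug.get(parsed.id);
    if (earlier !== undefined) {
      throw new Error(`Duplicate slug "${parsed.id}" in ${earlier} and ${parsed.file}`);
    }
    firstFileBySlug.set(parsed.id, parsed.file);
    accepted.push(parsed);
  }
  return { accepted, skipped };
}
```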
## Open Questions
1. **Embeddings**: Task descriptions may benefit from vector embeddings for similarity search. Deferred — the `metadata` JSONB column can hold an embedding reference later, or a separate `task_embeddings` table can be added.
2. **Bulk status updates**: When the coordinator completes a meta task (all sub-tasks done), should it automatically mark the meta task `completed`? Likely yes — this is an application-level operation, not a DB concern.
3. **Cross-project dependencies**: Not supported. Tasks can only depend on other tasks within the same project. Application-layer validation rejects cross-project dependencies; a future DB-level trigger guard is deferred to Phase 2 (see task_dependencies cross-project guard).
4. **Task versioning**: When a task's body is modified (e.g., notes appended), should we keep previous versions? For v1, no — the current body is sufficient. If audit trail is needed, `updatedAt` timestamp + `metadata` revision count could suffice.
## References
- Cost-benefit framework: taskgraph framework docs — why categorical estimates are structurally required
- Workflow guide: taskgraph workflow docs — practical usage patterns
- Task file format: @alkdev/taskgraph README — field definitions
- TaskFrontmatter struct: @alkdev/taskgraph package source — canonical field types and defaults
- taskgraph architecture: taskgraph architecture docs
- Storage pattern: [README.md](./README.md)
- Table reference (cross-cutting): [table-reference.md](./table-reference.md)
- ADR-011: [../../decisions/ADR-011-dual-task-representation.md](../../decisions/ADR-011-dual-task-representation.md)
- @alkdev/taskgraph (runtime graph engine): `@alkdev/taskgraph` npm package