| status | last_updated |
|---|---|
| draft | 2026-05-18 |
Storage: Tasks & Task Dependencies
Tasks are the unit of work in the Spec-Driven Development (SDD) process. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer role and the taskgraph CLI — they are ingested into the DB via a sync operation and can be exported back for offline analysis.
For the overall storage pattern, see README.md. For cross-cutting table reference (common columns, cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../decisions/.
Overview
Why Database as Source of Truth
Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:
- Parallel worktrees: Agent A marks a task `in-progress` in their worktree's file. Agent B can't see this because the file lives in A's working directory, so the coordinator can't get a consistent view.
- Reliable coordination: The coordinator needs to query "which tasks are pending?" and "what's blocking task X?" at runtime without scanning filesystems across worktrees.
- Atomic status updates: An agent calling `hub.task.updateStatus` gets an immediate, transactional state change visible to all other agents and the coordinator.
The database is the authoritative, queryable, concurrent-safe representation. Files are the authoring format.
Relationship to taskgraph CLI
The taskgraph CLI operates on markdown files. Its value is in offline analysis — topo, cycles, parallel, critical, bottleneck, risk-path, decompose. These commands depend on categorical fields (scope, risk, impact, level) being assessed.
The workflow is:
- Author: Decomposer creates/edits markdown files using `taskgraph init` and direct editing
- Sync: Files are ingested into the DB (files → DB)
- Execute: Coordinator and agents query and mutate the DB via hub operations
- Analyze: When needed, export from DB to files, run `taskgraph risk-path` etc.
The taskgraph CLI is not required at runtime. The hub uses @alkdev/taskgraph for runtime graph operations (topological sort, cycle detection, parallel groups, critical path, risk analysis) — see Graphology Integration.
Task Authority Model
| Aspect | Authority | Why |
|---|---|---|
| Task structure (all fields) | DB | Queryable, concurrent-safe, consistent |
| Task specification (body) | DB (body column) | Stored as markdown text; agents append notes during execution |
| Task authoring/creation | Files → sync → DB | Decomposer edits files; sync ingests them |
| Runtime status mutations | DB (hub operations) | hub.task.* operations — coordinator and agents call these |
| Offline graph analysis | Files (taskgraph CLI) | Export from DB when needed for taskgraph risk-path etc. |
See Field Authority Split for the explicit list of authored vs runtime-managed fields.
Field Authority Split
Fields are split into two categories based on who writes them:
Authored Fields (upserted by file sync)
These fields are written by the Decomposer/file sync. The ON CONFLICT DO UPDATE SET clause in the sync upsert includes only these columns:
| Field | DB Column |
|---|---|
| id | slug |
| name | name |
| (project) | projectId |
| (directory path) | path |
| scope | scope |
| risk | risk |
| impact | impact |
| level | level |
| priority | priority |
| tags | tags |
| assignee | assignee |
| due | dueAt |
| (body) | body |
| created | fileCreatedAt |
| modified | fileModifiedAt |
| depends_on | task_dependencies table |
Note: projectId is set from the project context during sync (the task file's location within a project's tasks/ directory determines the project), not from taskgraph frontmatter. commonCols fields (id, metadata, createdAt, updatedAt) are DB-generated and not part of the sync conflict domain.
Runtime-Managed Fields (mutated via hub.task.* operations only)
These fields are never overwritten by sync. They are only mutated by hub operations (hub.task.updateStatus, hub.task.addNote, etc.):
| Field | DB Column | Set By |
|---|---|---|
| status | status | `hub.task.updateStatus` |
| (started timestamp) | startedAt | `hub.task.updateStatus` (on in-progress) |
| (completed timestamp) | completedAt | `hub.task.updateStatus` (on completed) |
Warning: Sync must never write `status`, `startedAt`, or `completedAt`; these are owned by hub operations. The sync upsert uses `ON CONFLICT DO UPDATE SET` only for authored fields; runtime fields are excluded from the SET clause.
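The split between authored and runtime-managed columns can be illustrated with a small SQL builder. This is a minimal sketch, not the hub's actual sync code: `buildTaskUpsertSql` and `assertNoRuntimeColumns` are hypothetical helpers, the quoted camelCase column names are assumed to match the physical column names, and `id` is assumed to be filled by a DB default. The real sync may well use Drizzle's query builder instead of raw SQL.

```typescript
// Authored columns from the table above. depends_on is not listed because it is
// synced into task_dependencies, not into a tasks column.
const AUTHORED_COLUMNS = [
  "name", "path", "scope", "risk", "impact", "level", "priority",
  "tags", "assignee", "dueAt", "body", "fileCreatedAt", "fileModifiedAt",
] as const;

// Runtime-managed columns that sync must never overwrite.
const RUNTIME_COLUMNS = ["status", "startedAt", "completedAt"] as const;

// Hypothetical guard: fails loudly if a runtime column sneaks into the SET list.
function assertNoRuntimeColumns(cols: readonly string[]): void {
  const runtime: readonly string[] = RUNTIME_COLUMNS;
  for (const col of cols) {
    if (runtime.indexOf(col) !== -1) {
      throw new Error(`Runtime-managed column "${col}" must not be written by sync`);
    }
  }
}

// Hypothetical helper: builds a parameterized upsert keyed on (projectId, slug).
function buildTaskUpsertSql(): string {
  assertNoRuntimeColumns(AUTHORED_COLUMNS);
  const insertCols: string[] = ["projectId", "slug", ...AUTHORED_COLUMNS];
  const columnList = insertCols.map((c) => `"${c}"`).join(", ");
  const placeholders = insertCols.map((_, i) => `$${i + 1}`).join(", ");
  // Only authored columns appear in SET; status and timestamps keep their DB values.
  const setClause = AUTHORED_COLUMNS
    .map((c) => `"${c}" = EXCLUDED."${c}"`)
    .join(", ");
  return [
    `INSERT INTO tasks (${columnList})`,
    `VALUES (${placeholders})`,
    `ON CONFLICT ("projectId", "slug") DO UPDATE SET ${setClause}`,
  ].join("\n");
}
```

On first insert, `status` falls back to its column default (`pending`); on conflict, the existing `status`, `startedAt`, and `completedAt` values are left untouched because they never appear in the SET clause.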
Field Mapping: taskgraph Frontmatter → DB Columns
Every field in taskgraph's TaskFrontmatter struct maps to a dedicated DB column. No frontmatter fields are relegated to JSONB metadata.
| taskgraph Field | DB Column | Type | Notes |
|---|---|---|---|
| `id` | `slug` | `text NOT NULL` | Direct mapping. No transformation. slug is taskgraph-compatible, used in depends_on references. |
| `name` | `name` | `text NOT NULL` | Direct mapping |
| `status` | `status` | `text NOT NULL`, enum | Direct mapping: pending, in-progress, completed, failed, blocked. Default: pending. |
| `depends_on` | `task_dependencies` table | n/a | Each element creates a row: depends_on[i] → dependsOnTaskId, task → dependentTaskId |
| `scope` | `scope` | `text`, enum | single, narrow, moderate, broad, system. Nullable; NULL = not yet assessed. |
| `risk` | `risk` | `text`, enum | trivial, low, medium, high, critical. Nullable; NULL = not yet assessed. |
| `impact` | `impact` | `text`, enum | isolated, component, phase, project. Nullable; NULL = not yet assessed. |
| `level` | `level` | `text`, enum | planning, decomposition, implementation, review, research. Nullable; NULL = not yet assessed. |
| `priority` | `priority` | `text`, enum | low, medium, high, critical. Nullable. |
| `tags` | `tags` | `text[]` | String array. Default {}. |
| `assignee` | `assignee` | `text` | Assigned agent or person. Nullable. |
| `due` | `dueAt` | `timestamp with tz` | Renamed from due for DB convention. Nullable. |
| `created` | `fileCreatedAt` | `timestamp with tz` | Frontmatter created field. Separate from DB createdAt (row creation time). Nullable; frontmatter may not include it. |
| `modified` | `fileModifiedAt` | `timestamp with tz` | Frontmatter modified field. Separate from DB updatedAt (row update time). Nullable. |
| (body) | `body` | `text` | Markdown content after frontmatter. Nullable; empty body is valid. |
| (directory path) | `path` | `text` | Logical grouping prefix: architecture, implementation/storage. Nullable; tasks created via API with no file origin have no path. See Path Semantics. |
| (project) | `projectId` | `text NOT NULL` | FK → projects.id |
Table Schemas
tasks
SDD task definitions. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI — they are ingested into the DB via a sync operation. Every field in taskgraph's TaskFrontmatter struct maps to a dedicated DB column (no frontmatter fields in metadata JSONB).
| Column | Type | Notes |
|---|---|---|
| commonCols | — | id, metadata, createdAt, updatedAt |
| projectId | text NOT NULL | FK → projects.id (cascade) — tasks belong to a project |
| slug | text NOT NULL | taskgraph id — kebab-case identifier used in depends_on references. Unique within a project. |
| name | text NOT NULL | Human-readable task name (from frontmatter name) |
| path | text | Logical grouping prefix derived from filesystem location (e.g., architecture, implementation/storage). Nullable — tasks created via API with no file origin have no path. Enables WHERE path LIKE 'implementation/%' for scoped queries. |
| status | text NOT NULL | Enum: pending, in-progress, completed, failed, blocked. Default: pending. Status transitions go through hub operations, not file edits. |
| scope | text | Categorical scope: single, narrow, moderate, broad, system. Nullable — NULL = not yet assessed. See Why Categorical Fields Are Nullable. |
| risk | text | Categorical risk: trivial, low, medium, high, critical. Nullable — NULL = not yet assessed. |
| impact | text | Categorical impact: isolated, component, phase, project. Nullable — NULL = not yet assessed. |
| level | text | Task level: planning, decomposition, implementation, review, research. Nullable — NULL = not yet assessed. |
| priority | text | Priority: low, medium, high, critical. Nullable. |
| assignee | text | Assigned agent or person. Nullable. |
| dueAt | timestamp with tz | Due date (from frontmatter due). Nullable. |
| tags | text[] | Filtering tags. Default {}. GIN index for array-contains queries. |
| body | text | Markdown task specification (from file body after frontmatter). Nullable — empty body is valid. Agents may append notes during execution. |
| fileCreatedAt | timestamp with tz | Frontmatter created field — file creation time from the markdown. Separate from DB createdAt (row creation time). Nullable. |
| fileModifiedAt | timestamp with tz | Frontmatter modified field — file modification time from the markdown. Separate from DB updatedAt (row update time). Nullable. |
| startedAt | timestamp with tz | When status became in-progress. Set by hub operation, not by agent. |
| completedAt | timestamp with tz | When status became completed. Set by hub operation. |
Unique constraint: unq_tasks_project_slug UNIQUE on (projectId, slug) — task slugs are unique within a project.
pgEnum Definitions: The following enum columns use PostgreSQL pgEnum for type safety. Drizzle's pgEnum generates named PostgreSQL enums and provides TypeScript type inference. The enum values are aligned with taskgraph's categorical fields.
```typescript
import { pgEnum } from "drizzle-orm/pg-core";

// Runtime lifecycle status; mutated only through hub.task.* operations.
export const taskStatus = pgEnum("task_status", ["pending", "in-progress", "completed", "failed", "blocked"]);
// Categorical estimates, aligned with taskgraph's frontmatter values.
export const taskScope = pgEnum("task_scope", ["single", "narrow", "moderate", "broad", "system"]);
export const taskRisk = pgEnum("task_risk", ["trivial", "low", "medium", "high", "critical"]);
export const taskImpact = pgEnum("task_impact", ["isolated", "component", "phase", "project"]);
export const taskLevel = pgEnum("task_level", ["planning", "decomposition", "implementation", "review", "research"]);
export const taskPriority = pgEnum("task_priority", ["low", "medium", "high", "critical"]);
```
The decomposer template should consume these same enum definitions to ensure DB-level constraints match the application-level typing.
Indexes:
- idx_tasks_project_id on (projectId)
- idx_tasks_project_status on (projectId, status): composite for "find all pending tasks in project X"
- idx_tasks_status on (status)
- idx_tasks_active partial on (projectId) WHERE status IN ('pending', 'in-progress', 'blocked'): efficiently finds active tasks
- idx_tasks_path on (path) with text_pattern_ops: locale-independent LIKE pattern matching for path prefix queries (e.g., WHERE path LIKE 'implementation/%')
- idx_tasks_priority on (priority)
- idx_tasks_assignee on (assignee)
- idx_tasks_due_at on (dueAt)
- idx_tasks_tags GIN on (tags): for array-contains queries (tags @> '{security}')
slug semantics: From taskgraph frontmatter id field. Kebab-case identifiers like auth-setup, storage-tasks-table. Appears in depends_on arrays.
path semantics: Nullable — tasks created via API with no filesystem origin have no path. When set, captures the logical grouping derived from the tasks/ directory structure. E.g., a file at tasks/implementation/storage/tasks-table.md gets path: "implementation/storage". Enables WHERE path LIKE 'implementation/%' (scoped queries) without requiring a parentId FK. This replaces the previous parentId column — grouping is a path concern, not a tree relationship.
No parentId column: Grouping is handled by path, dependencies by task_dependencies. A "meta task" is just a regular task that depends on its sub-tasks — no special entity type needed.
No removedAt column: When a task file is removed, the sync operation DELETEs the DB row. Git history preserves the file-level history; the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup.
metadata JSONB: Reserved for truly ad-hoc data not in the taskgraph schema. No taskgraph frontmatter fields are stored here — all have proper columns.
task_dependencies
Dependency edges between tasks. Directed: a row means the dependent task depends on the prerequisite task (prerequisite must complete before dependent can start). Mirrors the taskgraph depends_on relationship.
| Column | Type | Notes |
|---|---|---|
| commonCols | — | id, metadata, createdAt, updatedAt |
| dependsOnTaskId | text NOT NULL | FK → tasks.id (cascade) — The prerequisite task (must complete first) |
| dependentTaskId | text NOT NULL | FK → tasks.id (cascade) — The dependent task (waits for prerequisite) |
Unique constraint: unq_task_dependencies_depends_on_task UNIQUE on (dependsOnTaskId, dependentTaskId) — no duplicate dependency edges.
Indexes: idx_task_dependencies_depends_on_task_id on (dependsOnTaskId) — "what depends on this task?", idx_task_dependencies_dependent_task_id on (dependentTaskId) — "what does this task depend on?".
Direction: dependentTaskId is the task that has the dependency; dependsOnTaskId is the prerequisite task. A row reads as "task dependentTaskId depends on task dependsOnTaskId". When the rows are loaded into a graph, each row becomes an edge from dependsOnTaskId to dependentTaskId (prerequisite → dependent), so a topological sort yields prerequisites before dependents.
Cross-project dependency guard: dependentTaskId and dependsOnTaskId MUST reference tasks within the same project. The application layer enforces this constraint: creating a dependency between tasks in different projects is rejected with a validation error. This is not enforced at the DB level (FK constraints allow cross-project references), so the application must check project consistency before insert.
A future DB-level guard could use a trigger: BEFORE INSERT ON task_dependencies that checks NEW.dependentTaskId and NEW.dependsOnTaskId reference tasks in the same project. This is deferred to Phase 2; the application-layer check is sufficient for now.
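The application-layer check could look roughly like the following. It is a sketch under assumptions: `TaskRef`, `CrossProjectDependencyError`, and `assertSameProject` are illustrative names, and the caller is assumed to have already loaded both task rows.

```typescript
// Minimal view of a task row needed for the project-consistency check.
interface TaskRef {
  id: string;
  projectId: string;
}

// Hypothetical validation error raised before any INSERT into task_dependencies.
class CrossProjectDependencyError extends Error {
  constructor(dependent: TaskRef, prerequisite: TaskRef) {
    super(
      `Task ${dependent.id} (project ${dependent.projectId}) cannot depend on ` +
        `task ${prerequisite.id} (project ${prerequisite.projectId})`,
    );
    this.name = "CrossProjectDependencyError";
  }
}

// Rejects a dependency edge whose endpoints live in different projects.
function assertSameProject(dependent: TaskRef, prerequisite: TaskRef): void {
  if (dependent.projectId !== prerequisite.projectId) {
    throw new CrossProjectDependencyError(dependent, prerequisite);
  }
}
```

Calling this before the insert keeps the rule in one place, so a later DB trigger would only be a second line of defense.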
Sync source: Dependency edges are authored in task file frontmatter (depends_on: [other-task]) and synced to this table during the file → DB sync operation. The sync clears and re-inserts all edges for a task on each run — dependencies are fully replaced by the sync, not merged or modified at runtime.
Why ALL Frontmatter Fields Get Proper Columns
ADR-001 establishes the pattern: "separate structured columns for high-query, high-filter fields." For tasks, every taskgraph frontmatter field is queryable and filterable in the coordinator's workflow:
- `priority`: "show me high-priority pending tasks" (coordinator prioritization)
- `assignee`: "which tasks are assigned to agent X?" (work assignment)
- `dueAt`: "which tasks are due this week?" (deadline tracking)
- `tags`: "filter by tag" (cross-cutting concerns)
Shoving these into metadata JSONB loses type safety, indexability, and SQL queryability — exactly the problems the database is meant to solve. The metadata JSONB column (from commonCols) is reserved for truly ad-hoc data that isn't in the taskgraph schema.
Why Categorical Fields Are Nullable (Not NOT NULL with Defaults)
The previous design made scope, risk, impact, and level NOT NULL with defaults (narrow, low, isolated, implementation). This conflated two states:
- Assessed as `low`: the Decomposer evaluated this and determined the risk is low
- Not assessed: nobody filled this in
Hiding the distinction with defaults means the coordinator can't distinguish a deliberate assessment from a gap. NULL is the correct signal for "not yet assessed."
Taskgraph itself makes these fields Option<TaskScope>, Option<TaskRisk>, etc. — nullable. The DB should match the source model.
Application-layer handling: When scope, risk, impact, or level is NULL, the coordinator should:
- Warn that the task hasn't been assessed
- Exclude it from cost-benefit analysis (you can't compute risk-path without risk values)
- Suggest the Decomposer assess it
For @alkdev/taskgraph operations that need numeric weights, provide fallbacks at the application layer (e.g., treat NULL risk as low for topo sort, but warn).
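The handling described above might be sketched as follows. The type and function names (`CategorizedTask`, `unassessedFields`, `partitionByAssessment`, `riskOrFallback`) are hypothetical and only the `low` fallback for risk comes from the text; this is not the `@alkdev/taskgraph` API.

```typescript
type TaskRisk = "trivial" | "low" | "medium" | "high" | "critical";

// Subset of a task row: the four categorical fields are nullable.
interface CategorizedTask {
  slug: string;
  scope: string | null;
  risk: TaskRisk | null;
  impact: string | null;
  level: string | null;
}

// Names of the categorical fields that are still NULL for a task.
function unassessedFields(task: CategorizedTask): string[] {
  const missing: string[] = [];
  if (task.scope === null) missing.push("scope");
  if (task.risk === null) missing.push("risk");
  if (task.impact === null) missing.push("impact");
  if (task.level === null) missing.push("level");
  return missing;
}

// Splits tasks into those usable for cost-benefit analysis and warnings for the rest.
function partitionByAssessment(tasks: CategorizedTask[]): {
  assessed: CategorizedTask[];
  warnings: string[];
} {
  const assessed: CategorizedTask[] = [];
  const warnings: string[] = [];
  for (const task of tasks) {
    const missing = unassessedFields(task);
    if (missing.length === 0) assessed.push(task);
    else warnings.push(`Task "${task.slug}" is not assessed for: ${missing.join(", ")}`);
  }
  return { assessed, warnings };
}

// Fallback named in the text: treat NULL risk as "low" where a value is required.
function riskOrFallback(task: CategorizedTask): TaskRisk {
  return task.risk === null ? "low" : task.risk;
}
```

Keeping the fallback in a separate function makes it explicit at each call site that a default was substituted, while the stored value stays NULL.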
Path Semantics
The path column captures the logical grouping of tasks, derived from their location in the tasks/ directory hierarchy:
```
tasks/
├── architecture/
│   ├── auth-design.md          → path: "architecture"
│   └── storage-overview.md     → path: "architecture"
├── research/
│   └── embedding-approach.md   → path: "research"
└── implementation/
    ├── storage/
    │   ├── tasks-table.md      → path: "implementation/storage"
    │   └── relations.md        → path: "implementation/storage"
    └── auth/
        └── oauth-flow.md       → path: "implementation/auth"
```
path is nullable because tasks created at runtime via hub operations (not synced from files) have no filesystem origin.
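The mapping from file location to `path` shown in the tree can be sketched as a small pure function. `derivePath` is a hypothetical helper; returning `null` for a file placed directly under `tasks/` is an assumption, since the document does not say how such files are handled.

```typescript
// Hypothetical helper: maps a project-relative file path to the `path` column value.
// Example: "tasks/implementation/storage/tasks-table.md" -> "implementation/storage".
function derivePath(filePath: string): string | null {
  const segments = filePath.split("/").filter((s) => s.length > 0);
  const tasksIndex = segments.indexOf("tasks");
  if (tasksIndex === -1) return null; // not inside a tasks/ directory
  // Directories between tasks/ and the file name form the grouping prefix.
  const dirs = segments.slice(tasksIndex + 1, -1);
  // Assumption: a file directly under tasks/ has no grouping, so path is null.
  return dirs.length > 0 ? dirs.join("/") : null;
}
```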
path enables scoped queries:
- `WHERE path = 'architecture'`: all architecture tasks
- `WHERE path LIKE 'implementation/%'`: all implementation tasks
- `WHERE path = 'implementation/storage'`: storage implementation tasks
This is a prefix-based grouping mechanism. It replaces parentId (which was not in the taskgraph model and conflated organizational grouping with dependency ordering).
Locale sensitivity: The path column uses text type with the database's default collation. LIKE itself is case-sensitive in PostgreSQL, which suits task paths since they are lowercase, but under a non-C collation a plain B-tree index cannot serve prefix patterns such as WHERE path LIKE 'implementation/%'. Either use COLLATE "C" (or ensure the default collation is C/POSIX), or create the index with the text_pattern_ops operator class: CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops), which lets the index serve LIKE and anchored ~ prefix matches regardless of collation.
Grouping vs Dependencies
There is no parentId column. Task grouping and dependency ordering are separate concepts:
- Grouping: `path` column. "This task belongs to the `implementation/storage` group." Enables scoped queries. Derived from filesystem layout during sync.
- Dependencies: `task_dependencies` table. "This task cannot start until that task completes." Enables topological sort, cycle detection, critical path. Derived from `depends_on` frontmatter.
A "meta task" (e.g., "implement storage") is simply a task that depends_on all its sub-tasks. There is no special entity type — it's regular task + dependency edges. The coordinator picks up the meta task as an assignment, and the implementation specialist works through sub-tasks in dependency order.
Why not parentId: parentId was invented in a previous doc revision but has no basis in the taskgraph data model. It created confusion:
- Redundant with `task_dependencies` (a meta task's dependencies ARE its sub-tasks)
- Required a fragile "inference from directory structure" during sync
- Violated the invariant that the DB schema mirrors the taskgraph frontmatter model
Relationship to Existing Tables
mappings Table
The mappings table links sessions to coordinators, spokes, and worktrees. A taskId column references the task a mapping is assigned to:
taskId: text REFERENCES tasks(id) // FK to tasks
task: text // denormalized display name (e.g., task.slug or task.name)
This preserves the quick-reference pattern (coordinators can list mappings with task names without a JOIN) while maintaining referential integrity.
projects Table
Tasks belong to a project via tasks.projectId. A project's tasks live in the project's tasks/ directory. Cross-project task dependencies are not supported — tasks can only depend on other tasks within the same project. This is enforced at the application level (see task_dependencies cross-project guard).
sessions Table
Sessions are linked to tasks indirectly through mappings. When the coordinator spawns a session for a meta task:
- The task row already exists in `tasks` (synced from file or created via API)
- Creates a `sessions` row for the implementation specialist
- Creates a `mappings` row with `taskId` pointing to the meta task
Task Status Lifecycle
pending → in-progress → completed
↘ failed → in-progress (retry)
↘ blocked → in-progress (unblocked)
| Status | Meaning |
|---|---|
| `pending` | Task exists, not yet started |
| `in-progress` | A session is actively working on this task |
| `completed` | Task finished successfully |
| `failed` | Task failed, may retry (Safe Exit protocol) |
| `blocked` | Task is blocked by an unmet dependency or external issue |
Status transitions go through hub operations (hub.task.updateStatus), not file edits. This ensures:
- All agents see consistent state immediately
- The coordinator can query "which tasks are pending?" reliably
- No merge conflicts from parallel file edits
Timestamp columns startedAt and completedAt track when a task entered in-progress and completed states respectively. These are set by the hub operation, not by the agent.
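A status update with timestamp handling could be sketched as below. The transition table is one reading of the lifecycle diagram (failed and blocked branching off in-progress) and is an assumption, as are the names `ALLOWED_TRANSITIONS` and `applyStatus`; overwriting `startedAt` on retry is likewise an illustrative choice, not something the document specifies.

```typescript
type TaskStatus = "pending" | "in-progress" | "completed" | "failed" | "blocked";

// Assumed transition table, derived from one reading of the lifecycle diagram.
const ALLOWED_TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  "pending": ["in-progress"],
  "in-progress": ["completed", "failed", "blocked"],
  "failed": ["in-progress"], // retry
  "blocked": ["in-progress"], // unblocked
  "completed": [],
};

interface TaskState {
  status: TaskStatus;
  startedAt: Date | null;
  completedAt: Date | null;
}

// Returns the next state; timestamps are set here, never supplied by the agent.
function applyStatus(task: TaskState, next: TaskStatus, now: Date): TaskState {
  if (ALLOWED_TRANSITIONS[task.status].indexOf(next) === -1) {
    throw new Error(`Invalid status transition: ${task.status} -> ${next}`);
  }
  return {
    status: next,
    startedAt: next === "in-progress" ? now : task.startedAt,
    completedAt: next === "completed" ? now : task.completedAt,
  };
}
```

In the hub this logic would run inside the `hub.task.updateStatus` operation, within the same transaction as the row update.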
Task Notes (Append-Only)
Agents may need to add notes to a task during execution (observations, partial progress, blockers encountered). For v1, this is handled by appending markdown to the body column:
```markdown
## Task Description (original)
Implement the tasks table with Drizzle-TypeBox pattern...

## Implementation Notes
- 2026-04-19: Started with table definition, commonCols pattern works
- 2026-04-19: Hit issue with text[] type for tags, need to check Drizzle support
```
The hub.task.addNote operation appends a timestamped note section to the end of body. This is simple, preserves the full context in one place, and requires no additional tables.
Concurrency model for hub.task.addNote: Notes are appended to the task body field using DB-level concatenation: UPDATE tasks SET body = COALESCE(body, '') || $note WHERE id = $taskId. This avoids read-modify-write cycles entirely — the append is atomic at the SQL level, eliminating race conditions between concurrent agents.
As a fallback for scenarios where DB-level concatenation isn't feasible, optimistic locking via updatedAt can be used: read the current updatedAt, append the note, and UPDATE WHERE updatedAt = readValue. If the row was updated between read and write, the UPDATE affects 0 rows and the operation must be retried. This is sufficient for the expected low-contention scenario (one agent at a time writing notes to a task).
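The optimistic-locking fallback can be illustrated with an in-memory stand-in for the conditional UPDATE. Everything here is a sketch: `Row`, `conditionalUpdate`, and `appendNoteOptimistic` are hypothetical, a numeric version stands in for the `updatedAt` timestamp, and the `beforeWrite` hook exists only to simulate a concurrent writer.

```typescript
// In-memory stand-in for a tasks row; updatedAt is a simple counter here.
interface Row {
  body: string | null;
  updatedAt: number;
}

// Mimics: UPDATE tasks SET body = $1 WHERE id = $2 AND updatedAt = $3.
// Returns the number of affected rows (0 or 1).
function conditionalUpdate(row: Row, newBody: string, expectedUpdatedAt: number): number {
  if (row.updatedAt !== expectedUpdatedAt) return 0; // someone else wrote first
  row.body = newBody;
  row.updatedAt = expectedUpdatedAt + 1;
  return 1;
}

// Read, append, conditionally write; retry when the write affects 0 rows.
// Returns the attempt number that succeeded.
function appendNoteOptimistic(
  row: Row,
  note: string,
  maxRetries: number = 3,
  beforeWrite?: () => void, // test hook simulating a concurrent writer
): number {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const snapshot = { body: row.body, updatedAt: row.updatedAt };
    if (beforeWrite) beforeWrite();
    const affected = conditionalUpdate(row, (snapshot.body ?? "") + note, snapshot.updatedAt);
    if (affected === 1) return attempt;
  }
  throw new Error(`addNote failed after ${maxRetries} attempts`);
}
```

When another writer slips in between the read and the write, the first attempt affects 0 rows and the loop re-reads the row, so the concurrent note is preserved rather than overwritten.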
For high-contention scenarios (multiple agents writing simultaneously), consider a separate task_notes table with INSERT operations instead of UPDATE appends.
If structured, multi-agent notes become necessary later, a dedicated task_notes table can be added. The body append pattern doesn't preclude this — it's additive.
Why Categorical Estimates Matter
The scope, risk, impact, and level fields are not cosmetic metadata — they are what make taskgraph's analysis commands produce useful results. The cost-benefit framework (see taskgraph framework docs) demonstrates a structural property: upstream failures multiply downstream damage.
These fields power:
- `taskgraph decompose`: flags tasks where `risk > medium` or `scope > moderate`
- `taskgraph risk-path`: finds the highest cumulative risk path
- `taskgraph critical`: finds completion blockers
- `taskgraph bottleneck`: finds high-betweenness tasks
Without them, you just get topological sort — useful, but not structurally insightful. The DB columns for these fields are nullable (NULL = not assessed) rather than NOT NULL with defaults, because the distinction between "deliberately assessed as low" and "nobody filled this in" is itself valuable information for the coordinator.
Graphology Integration (Runtime Graph Ops)
For runtime graph operations, the hub uses @alkdev/taskgraph — a TypeScript package that wraps graphology and provides a high-level TaskGraph class plus analysis functions. The CLI (taskgraph) is for offline authoring and analysis; the TS package is for runtime use.
The approach:
- Load all `tasks` + `task_dependencies` rows for a project from the DB
- Build a `TaskGraph` via `TaskGraph.fromRecords(tasks, edges)`
- Run analysis functions as needed: `criticalPath()`, `parallelGroups()`, `bottlenecks()`, `riskPath()`, `shouldDecomposeTask()`, `workflowCost()`
This works because realistic task graphs are small — typically 10–50 tasks, rarely exceeding 200 even on large projects. Building a graph from DB rows is instant at this scale (TaskGraph.fromRecords with 100 nodes reconstructs in <5ms).
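To make the edge direction concrete, here is a self-contained topological sort over `task_dependencies`-shaped rows using Kahn's algorithm. It is a plain stand-in written for illustration, not the `@alkdev/taskgraph` API; `DependencyRow` and `topologicalOrder` are hypothetical names, and skipping dangling edges is an assumption.

```typescript
// Shape of a task_dependencies row, reduced to the two FK columns.
interface DependencyRow {
  dependsOnTaskId: string; // prerequisite
  dependentTaskId: string; // waits for the prerequisite
}

// Kahn's algorithm: returns task IDs with prerequisites before dependents.
function topologicalOrder(taskIds: string[], rows: DependencyRow[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of taskIds) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const row of rows) {
    // Graph edge runs prerequisite -> dependent.
    const outgoing = dependents.get(row.dependsOnTaskId);
    if (outgoing === undefined || !inDegree.has(row.dependentTaskId)) continue; // skip dangling edges
    outgoing.push(row.dependentTaskId);
    inDegree.set(row.dependentTaskId, (inDegree.get(row.dependentTaskId) ?? 0) + 1);
  }
  const ready = taskIds.filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any task left unprocessed sits on a cycle.
  if (order.length !== taskIds.length) throw new Error("Dependency cycle detected");
  return order;
}
```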
@alkdev/taskgraph exports:
- `TaskGraph`: construction (fromTasks, fromRecords, fromJSON), mutation (addTask, removeTask, addDependency, updateTask), queries (hasCycles, findCycles, topologicalOrder, dependencies, dependents, getTask), validation (validateSchema, validateGraph), export
- Schema types — TaskScope, TaskRisk, TaskImpact, TaskLevel, TaskPriority, TaskStatus enums with TypeBox schemas
- Frontmatter — parseFrontmatter, serializeFrontmatter (YAML + markdown)
- Error classes — TaskgraphError, CircularDependencyError, TaskNotFoundError, etc.
Why not taskgraph NAPI for v1: The Rust CLI (taskgraph) is for offline authoring and analysis. The TypeScript package (@alkdev/taskgraph) handles all runtime graph operations. Graphology is a transitive dependency through @alkdev/taskgraph and handles < 200 nodes trivially. NAPI is unnecessary at realistic scales.
Sync Flow
```
┌──────────────┐       ┌───────────────┐       ┌───────────────────┐
│  Decomposer  │       │ taskgraph CLI │       │      Hub DB       │
│ creates .md  │──────►│   validates   │──────►│    tasks table    │
│    files     │       │   analyzes    │       │ task_dependencies │
└──────────────┘       └───────────────┘       └───────────────────┘
                                                         ▲
                                                         │
                                               ┌─────────┴─────────┐
                                               │  Hub operations   │
                                               │    hub.task.*     │
                                               │  (status, notes)  │
                                               └───────────────────┘
```
Sync: Files → DB
The sync operation runs as a single database transaction:
- Begin transaction
- Scan `tasks/` directory for markdown files
- Parse frontmatter (YAML) + body (markdown) from each file. `@alkdev/taskgraph` provides `parseFrontmatter()` and `serializeFrontmatter()` for YAML+markdown parsing. `parseTaskFile()` and `parseTaskDirectory()` are Node.js only (use `node:fs/promises`); for Deno, use `parseFrontmatter()` with Deno file I/O.
- Upsert into `tasks` table (matches by `(projectId, slug)`)
- For each task, `DELETE FROM task_dependencies WHERE dependentTaskId = ?` then `INSERT` the current edges. Dependency edges are fully replaced, not merged, because the files own the dependency declarations.
- Commit transaction
If any step fails, the entire sync rolls back — no partial updates.
Concurrency: Only one sync should run at a time. The Decomposer triggers sync after creating/updating task files. No concurrent sync mechanism is needed for v1.
Deleted files: When a task file is removed from tasks/, the sync operation deletes the corresponding DB row. Git history preserves the full file-level history — the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup (task_dependencies rows, mappings.taskId SET NULL).
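The upsert, edge replacement, and delete decisions can be computed up front as a pure plan before the transaction runs. The sketch below is illustrative: `ParsedTaskFile`, `SyncPlan`, and `planSync` are hypothetical names, and `fileBackedSlugsInDb` is assumed to contain only rows that originated from files (how those are told apart from API-created tasks is not covered here).

```typescript
// Minimal parsed view of one task file.
interface ParsedTaskFile {
  slug: string;
  dependsOn: string[];
}

interface DependencyEdge {
  dependentSlug: string;
  dependsOnSlug: string;
}

interface SyncPlan {
  upsertSlugs: string[]; // rows to insert or update by (projectId, slug)
  deleteSlugs: string[]; // rows whose file no longer exists
  edges: DependencyEdge[]; // full replacement set for the synced tasks
}

// Hypothetical planner: derives the work for one sync transaction.
function planSync(files: ParsedTaskFile[], fileBackedSlugsInDb: string[]): SyncPlan {
  const fileSlugs = new Set<string>();
  const edges: DependencyEdge[] = [];
  for (const file of files) {
    fileSlugs.add(file.slug);
    for (const dep of file.dependsOn) {
      edges.push({ dependentSlug: file.slug, dependsOnSlug: dep });
    }
  }
  return {
    upsertSlugs: files.map((f) => f.slug),
    // A file-backed row with no matching file means the file was removed.
    deleteSlugs: fileBackedSlugsInDb.filter((slug) => !fileSlugs.has(slug)),
    edges,
  };
}
```

Separating planning from execution keeps the transaction short and makes the plan easy to log or dry-run.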
DB → Files (Export)
When graph analysis is needed, export DB rows back to markdown files:
- Query `tasks` + `task_dependencies` for a project
- For each task, generate markdown with YAML frontmatter + body
- Write to `tasks/` directory structure (using `path` to determine subdirectory)
- Run `taskgraph validate`, `taskgraph risk-path`, etc.
This is a manual step — "I want to run analysis now" — not an automatic sync.
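Generating one exported file might look like the sketch below. It is a hand-rolled stand-in, not `serializeFrontmatter()` from `@alkdev/taskgraph`, whose signature is not shown in this document. Values are emitted with `JSON.stringify`, relying on JSON scalars and arrays being valid YAML 1.2; only a few fields are included, and naming the file `<slug>.md` is an assumption.

```typescript
// Subset of an exported task; real exports would cover every authored field.
interface ExportTask {
  slug: string;
  name: string;
  status: string;
  path: string | null;
  risk: string | null;
  tags: string[];
  dependsOn: string[];
  body: string | null;
}

// Hypothetical exporter: returns the target file path and its markdown content.
function toMarkdownFile(task: ExportTask): { filePath: string; content: string } {
  const fields: Array<[string, unknown]> = [
    ["id", task.slug],
    ["name", task.name],
    ["status", task.status],
    ["depends_on", task.dependsOn],
    ["risk", task.risk],
    ["tags", task.tags],
  ];
  // NULL fields are omitted so "not assessed" stays distinguishable from a value.
  const yaml = fields
    .filter(([, value]) => value !== null)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join("\n");
  const dir = task.path ? `tasks/${task.path}` : "tasks";
  return {
    filePath: `${dir}/${task.slug}.md`, // assumption: file name equals slug
    content: `---\n${yaml}\n---\n\n${task.body ?? ""}\n`,
  };
}
```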
Sync Error Handling
| Error | Behavior |
|---|---|
| Invalid YAML frontmatter | Skip file, log warning with file path and parse error. Continue with remaining files. |
| Missing required `id` or `name` field | Skip file, log warning. Task cannot be synced without these fields. |
| `depends_on` references non-existent slug within project | Insert the dependency edge anyway (dangling reference). The coordinator detects and warns about unresolvable dependencies. taskgraph validate should be run before sync to catch these. |
| Duplicate `id` (slug) in same project | Fail the sync with a clear error. Slug uniqueness is enforced by the DB constraint unq_tasks_project_slug. |
| File removed from filesystem | DELETE the DB row. FK cascade handles dependent rows. Git preserves history. |
Validation ordering: Run taskgraph validate before sync to catch structural errors (cycles, missing dependencies, duplicate IDs) at the CLI level. The DB sync then handles data-level integrity (unique constraints, FK checks).
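The per-file behaviors in the error table could be applied in a pre-flight pass like the one below. This is a sketch: `RawTask`, `ValidationResult`, and `validateForSync` are hypothetical names, YAML parse failures are assumed to be handled earlier by the parser, and unknown `depends_on` targets are only reported as warnings here.

```typescript
// Parsed frontmatter before validation; required fields may be missing.
interface RawTask {
  filePath: string;
  id?: string;
  name?: string;
  dependsOn?: string[];
}

interface ValidationResult {
  valid: RawTask[];
  warnings: string[];
}

// Skips incomplete files, fails on duplicate slugs, warns on unknown dependencies.
function validateForSync(files: RawTask[]): ValidationResult {
  const warnings: string[] = [];
  const valid: RawTask[] = [];
  const seen = new Map<string, string>(); // slug -> first file that declared it
  for (const file of files) {
    if (!file.id || !file.name) {
      warnings.push(`Skipping ${file.filePath}: missing required id or name`);
      continue;
    }
    const firstFile = seen.get(file.id);
    if (firstFile !== undefined) {
      // Duplicate slugs abort the whole sync, matching the DB unique constraint.
      throw new Error(`Duplicate slug "${file.id}" in ${firstFile} and ${file.filePath}`);
    }
    seen.set(file.id, file.filePath);
    valid.push(file);
  }
  for (const file of valid) {
    for (const dep of file.dependsOn ?? []) {
      if (!seen.has(dep)) warnings.push(`${file.id} depends on unknown slug "${dep}"`);
    }
  }
  return { valid, warnings };
}
```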
Open Questions
- Embeddings: Task descriptions may benefit from vector embeddings for similarity search. Deferred: the `metadata` JSONB column can hold an embedding reference later, or a separate `task_embeddings` table can be added.
- Bulk status updates: When the coordinator completes a meta task (all sub-tasks done), should it automatically mark the meta task `completed`? Likely yes; this is an application-level operation, not a DB concern.
- Cross-project dependencies: Not supported. Tasks can only depend on other tasks within the same project. Application-layer validation rejects cross-project dependencies; a future DB-level trigger guard is deferred to Phase 2 (see task_dependencies cross-project guard).
- Task versioning: When a task's body is modified (e.g., notes appended), should we keep previous versions? For v1, no: the current body is sufficient. If an audit trail is needed, the `updatedAt` timestamp plus a `metadata` revision count could suffice.
References
- Cost-benefit framework: taskgraph framework docs — why categorical estimates are structurally required
- Workflow guide: taskgraph workflow docs — practical usage patterns
- Task file format: @alkdev/taskgraph README — field definitions
- TaskFrontmatter struct: @alkdev/taskgraph package source — canonical field types and defaults
- taskgraph architecture: taskgraph architecture docs
- Storage pattern: README.md
- Table reference (cross-cutting): table-reference.md
- ADR-011: ../../decisions/ADR-011-dual-task-representation.md
- @alkdev/taskgraph (runtime graph engine): `@alkdev/taskgraph` npm package