glm-5.1 93e2286343 Align storage & architecture specs with published npm libraries

Systematically compared @alkdev/taskgraph, @alkdev/operations, and
@alkdev/flowgraph against storage/arch specs and fixed all mismatches.

Key changes:

Tasks (storage/tasks.md + ADR-011):
- Rename TaskFrontmatter → TaskInput to match library export
- Fix dependsOn (was depends_on) in field mappings — library uses
  camelCase; parseFrontmatter normalizes YAML snake_case on input
- Document DependencyEdge shape {from, to, qualityRetention?} and
  DB↔library field mapping
- Document graph node vs DB column distinction (TaskGraphNodeAttrs
  is a subset of TaskInput)
- Fix default risk fallback from low → medium (matches resolveDefaults)
- Fix cross-project guard column references (dependentTaskId, not taskId)
- Clarify @alkdev/taskgraph TS is source of truth; frontmatter is for
  LLM output parsing and legacy imports, not Rust CLI
- Add complete library exports reference

Operations (storage/spokes.md + operations.md):
- Add version, title, _meta columns to operations table (required by
  OperationSpec, were missing)
- Fix type casing: query/mutation/subscription (lowercase, matching
  OperationType runtime values)
- Make outputSchema and accessControl NOT NULL (matching library)
- Document ErrorDefinition shape {code, description, schema, httpStatus?}
- Document _meta vs commonCols.metadata distinction
- Add registerAll, get, getHandler, getByName, list, subscribe methods
- Fix buildCallHandler signature ({ registry, callMap })
- Fix OperationType values (lowercase)

Call graph (storage/call-graph.md + call-graph.md):
- Change operationId to NOT NULL with RESTRICT FK (was nullable/SET NULL)
  — matches flowgraph's required CallNodeAttrs.operationId
- Document sentinel __removed__ operation strategy for deletions
- Document ISO 8601 string ↔ timestamptz conversion requirement
- Rewrite CallEventMap to match actual library: flat dot-notation keys,
  timestamp on all events, nested error structure, optional output on
  completed event
- Remove call.running event (doesn't exist in library) — hub calls
  updateStatus(running) directly on dispatch
- Fix buildCallHandler({ registry, callMap }) signature
- Fix PendingRequestMap constructor (positional EventTarget)
- Add updateCall/removeCall/graph methods to API summary
- Document abort cascade as hub logic, not flowgraph logic
- Add open questions for operation deletion and reactive vs call graph
  semantics

Table reference (storage/table-reference.md):
- Update call_graph_nodes.operationId cascade to RESTRICT
- Update operations.type comment to lowercase
- Update status enum reference

2026-05-25 11:46:42 +00:00


status	last_updated
draft	2026-05-25

Storage: Tasks & Task Dependencies

Tasks are the unit of work in the Spec-Driven Development (SDD) process. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer role and the taskgraph CLI — they are ingested into the DB via a sync operation and can be exported back for offline analysis.

For the overall storage pattern, see README.md. For cross-cutting table reference (common columns, cascade behavior, index reference, status enums, relations), see table-reference.md. For design decisions, see ../../decisions/.

Overview

Why Database as Source of Truth

Taskgraph's file-based model works well for single-agent, single-worktree workflows. In the hub's multi-agent, multi-worktree environment, files create problems:

Parallel worktrees: Agent A marks a task in-progress in their worktree's file. Agent B can't see this — the file lives in A's working directory. The coordinator can't get a consistent view.
Reliable coordination: The coordinator needs to query "which tasks are pending?" and "what's blocking task X?" at runtime without scanning filesystems across worktrees.
Atomic status updates: An agent calling hub.task.updateStatus gets an immediate, transactional state change visible to all other agents and the coordinator.

The database is the authoritative, queryable, concurrent-safe representation. Files are the authoring format.

Relationship to taskgraph CLI

The taskgraph CLI operates on markdown files. Its value is in offline analysis — topo, cycles, parallel, critical, bottleneck, risk-path, decompose. These commands depend on categorical fields (scope, risk, impact, level) being assessed.

The workflow is:

Author — Decomposer creates/edits markdown files using taskgraph init and direct editing
Sync — Files are ingested into the DB (files → DB)
Execute — Coordinator and agents query and mutate the DB via hub operations
Analyze — When needed, export from DB to files, run taskgraph risk-path etc.

The taskgraph CLI is not required at runtime. The hub uses @alkdev/taskgraph for runtime graph operations (topological sort, cycle detection, parallel groups, critical path, risk analysis) — see Graphology Integration.

Task Authority Model

Aspect	Authority	Why
Task structure (all fields)	DB	Queryable, concurrent-safe, consistent
Task specification (body)	DB (`body` column)	Stored as markdown text; agents append notes during execution
Task authoring/creation	Files → sync → DB	Decomposer edits files; sync ingests them
Runtime status mutations	DB (hub operations)	`hub.task.*` operations — coordinator and agents call these
Offline graph analysis	Files (taskgraph CLI)	Export from DB when needed for `taskgraph risk-path` etc.

See Field Authority Split for the explicit list of authored vs runtime-managed fields.

Field Authority Split

Fields are split into two categories based on who writes them:

Authored Fields (upserted by file sync)

These fields are written by the Decomposer/file sync. The ON CONFLICT DO UPDATE SET clause in the sync upsert includes only these columns:

Field	DB Column
id	`slug`
name	`name`
(project)	`projectId`
(directory path)	`path`
scope	`scope`
risk	`risk`
impact	`impact`
level	`level`
priority	`priority`
tags	`tags`
assignee	`assignee`
due	`dueAt`
(body)	`body`
created	`fileCreatedAt`
modified	`fileModifiedAt`
dependsOn	`task_dependencies` table

Note: projectId is set from the project context during sync (the task file's location within a project's tasks/ directory determines the project), not from taskgraph frontmatter. commonCols fields (id, metadata, createdAt, updatedAt) are DB-generated and not part of the sync conflict domain.

Runtime-Managed Fields (mutated via `hub.task.*` operations only)

These fields are never overwritten by sync. They are only mutated by hub operations (hub.task.updateStatus, hub.task.addNote, etc.):

Field	DB Column	Set By
status	`status`	`hub.task.updateStatus`
(started timestamp)	`startedAt`	`hub.task.updateStatus` (on `in-progress`)
(completed timestamp)	`completedAt`	`hub.task.updateStatus` (on `completed`)

Warning: Sync must never write status, startedAt, or completedAt. These columns are owned by hub operations. The sync upsert uses ON CONFLICT DO UPDATE SET only for authored fields; runtime fields are excluded from the SET clause.
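The authored-only SET clause can be sketched as follows. This is a minimal illustration, not the hub's actual sync code: `buildTaskUpsertSql` is a hypothetical helper, the conflict target (projectId, slug) is assumed from the unq_tasks_project_slug constraint, and `id` is assumed to be filled by a DB default.

```typescript
// Authored columns from the table above; runtime-managed columns are deliberately absent.
const AUTHORED_COLUMNS = [
  "slug", "name", "projectId", "path", "scope", "risk", "impact", "level",
  "priority", "tags", "assignee", "dueAt", "body", "fileCreatedAt", "fileModifiedAt",
] as const;

// Columns owned by hub.task.* operations; sync must never write these.
const RUNTIME_COLUMNS = ["status", "startedAt", "completedAt"] as const;

// Hypothetical helper: builds a parameterized upsert whose SET clause covers
// only authored columns that are not part of the conflict target.
function buildTaskUpsertSql(): string {
  const conflictTarget: string[] = ["projectId", "slug"]; // assumed from unq_tasks_project_slug
  const quoted = (c: string): string => `"${c}"`;
  const insertCols = AUTHORED_COLUMNS.map(quoted).join(", ");
  const placeholders = AUTHORED_COLUMNS.map((_, i) => `$${i + 1}`).join(", ");
  const setClause = AUTHORED_COLUMNS
    .filter((c) => conflictTarget.indexOf(c) === -1)
    .map((c) => `${quoted(c)} = EXCLUDED.${quoted(c)}`)
    .join(", ");
  return (
    `INSERT INTO tasks (${insertCols}) VALUES (${placeholders}) ` +
    `ON CONFLICT (${conflictTarget.map(quoted).join(", ")}) DO UPDATE SET ${setClause}`
  );
}
```

Because status never appears in the statement, a newly inserted row picks up the column default (pending) and an existing row keeps whatever hub operations last wrote.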

Field Mapping: taskgraph `TaskInput` → DB Columns

Every field in taskgraph's TaskInput type (the TypeScript equivalent of the Rust TaskFrontmatter struct) maps to a dedicated DB column. No TaskInput fields are relegated to JSONB metadata.

Naming note: The library exports TaskInput, not TaskFrontmatter. The JSDoc confirms it "matches the Rust TaskFrontmatter field set." The YAML key for dependencies is dependsOn in the library (camelCase); parseFrontmatter() normalizes depends_on → dependsOn on input, and serializeFrontmatter() outputs dependsOn. @alkdev/taskgraph (TypeScript) is the source of truth for the frontmatter format. The Rust CLI is not used going forward — frontmatter is used for LLM output parsing and importing legacy task files, with the DB as the authoritative runtime representation.

taskgraph Field (`TaskInput`)	DB Column	Type	Notes
`id`	`slug`	text NOT NULL	Direct mapping. No transformation. `slug` is taskgraph-compatible, used in `dependsOn` references.
`name`	`name`	text NOT NULL	Direct mapping
`status`	`status`	text NOT NULL, enum	Direct mapping: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`.
`dependsOn`	`task_dependencies` table	—	Each element creates a row: `dependsOn[i]` → `dependsOnTaskId`, task → `dependentTaskId`. Library key is `dependsOn` (camelCase); YAML frontmatter may use `depends_on` which is normalized to `dependsOn` on parse.
`scope`	`scope`	text, enum	`single`, `narrow`, `moderate`, `broad`, `system`. Nullable — NULL = not yet assessed.
`risk`	`risk`	text, enum	`trivial`, `low`, `medium`, `high`, `critical`. Nullable — NULL = not yet assessed.
`impact`	`impact`	text, enum	`isolated`, `component`, `phase`, `project`. Nullable — NULL = not yet assessed.
`level`	`level`	text, enum	`planning`, `decomposition`, `implementation`, `review`, `research`. Nullable — NULL = not yet assessed.
`priority`	`priority`	text, enum	`low`, `medium`, `high`, `critical`. Nullable.
`tags`	`tags`	text[]	String array. Default `{}`.
`assignee`	`assignee`	text	Assigned agent or person. Nullable.
`due`	`dueAt`	timestamp with tz	Renamed from `due` for DB convention. Nullable.
`created`	`fileCreatedAt`	timestamp with tz	Frontmatter `created` field. Separate from DB `createdAt` (row creation time). Nullable — frontmatter may not include it.
`modified`	`fileModifiedAt`	timestamp with tz	Frontmatter `modified` field. Separate from DB `updatedAt` (row update time). Nullable — frontmatter may not include it.
(body)	`body`	text	Markdown content after frontmatter. Nullable — empty body is valid.
(directory path)	`path`	text	Logical grouping prefix: `architecture`, `implementation/storage`. Nullable — tasks created via API with no file origin have no path. See Path Semantics.
(project)	`projectId`	text NOT NULL	FK → projects.id
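A rough sketch of the mapping in the table above. `TaskInputLike` is an illustrative stand-in for the library's TaskInput rather than its real definition, `toAuthoredRow` and `toDependencySlugPairs` are hypothetical helper names, and date fields are assumed to arrive as ISO 8601 strings.

```typescript
// Illustrative stand-in for the library's TaskInput; field names follow the table above.
interface TaskInputLike {
  id: string;
  name: string;
  dependsOn?: string[];
  scope?: string;
  risk?: string;
  impact?: string;
  level?: string;
  priority?: string;
  tags?: string[];
  assignee?: string;
  due?: string; // assumed to be an ISO 8601 string
  created?: string;
  modified?: string;
}

// Authored columns only; status, startedAt, and completedAt are left to hub operations.
interface AuthoredTaskRow {
  projectId: string;
  slug: string;
  name: string;
  path: string | null;
  scope: string | null;
  risk: string | null;
  impact: string | null;
  level: string | null;
  priority: string | null;
  tags: string[];
  assignee: string | null;
  dueAt: string | null;
  fileCreatedAt: string | null;
  fileModifiedAt: string | null;
  body: string | null;
}

function toAuthoredRow(
  input: TaskInputLike,
  body: string | null,
  ctx: { projectId: string; path: string | null },
): AuthoredTaskRow {
  return {
    projectId: ctx.projectId, // from the sync context, not from frontmatter
    slug: input.id,
    name: input.name,
    path: ctx.path,
    scope: input.scope ?? null, // NULL means "not yet assessed"
    risk: input.risk ?? null,
    impact: input.impact ?? null,
    level: input.level ?? null,
    priority: input.priority ?? null,
    tags: input.tags ?? [],
    assignee: input.assignee ?? null,
    dueAt: input.due ?? null,
    fileCreatedAt: input.created ?? null,
    fileModifiedAt: input.modified ?? null,
    body,
  };
}

// dependsOn becomes rows in task_dependencies rather than a column on tasks.
function toDependencySlugPairs(input: TaskInputLike): { dependentSlug: string; dependsOnSlug: string }[] {
  return (input.dependsOn ?? []).map((dep) => ({ dependentSlug: input.id, dependsOnSlug: dep }));
}
```

The slug pairs still need to be resolved to tasks.id values before insertion, since the foreign keys in task_dependencies reference the row id rather than the slug.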

Table Schemas

`tasks`

SDD task definitions. The database is the source of truth for task data at runtime. Markdown files serve as the authoring surface for the Decomposer and taskgraph CLI; they are ingested into the DB via a sync operation. Every field in taskgraph's TaskInput type maps to a dedicated DB column (no frontmatter fields in metadata JSONB).

Column	Type	Notes
commonCols	—	id, metadata, createdAt, updatedAt
projectId	text NOT NULL	FK → projects.id (cascade) — tasks belong to a project
slug	text NOT NULL	taskgraph `id`. Kebab-case identifier used in `dependsOn` references (`depends_on` in YAML). Unique within a project.
name	text NOT NULL	Human-readable task name (from frontmatter `name`)
path	text	Logical grouping prefix derived from filesystem location (e.g., `architecture`, `implementation/storage`). Nullable — tasks created via API with no file origin have no path. Enables `WHERE path LIKE 'implementation/%'` for scoped queries.
status	text NOT NULL	Enum: `pending`, `in-progress`, `completed`, `failed`, `blocked`. Default: `pending`. Status transitions go through hub operations, not file edits.
scope	text	Categorical scope: `single`, `narrow`, `moderate`, `broad`, `system`. Nullable — NULL = not yet assessed. See Why Categorical Fields Are Nullable.
risk	text	Categorical risk: `trivial`, `low`, `medium`, `high`, `critical`. Nullable — NULL = not yet assessed.
impact	text	Categorical impact: `isolated`, `component`, `phase`, `project`. Nullable — NULL = not yet assessed.
level	text	Task level: `planning`, `decomposition`, `implementation`, `review`, `research`. Nullable — NULL = not yet assessed.
priority	text	Priority: `low`, `medium`, `high`, `critical`. Nullable.
assignee	text	Assigned agent or person. Nullable.
dueAt	timestamp with tz	Due date (from frontmatter `due`). Nullable.
tags	text[]	Filtering tags. Default `{}`. GIN index for array-contains queries.
body	text	Markdown task specification (from file body after frontmatter). Nullable — empty body is valid. Agents may append notes during execution.
fileCreatedAt	timestamp with tz	Frontmatter `created` field — file creation time from the markdown. Separate from DB `createdAt` (row creation time). Nullable.
fileModifiedAt	timestamp with tz	Frontmatter `modified` field — file modification time from the markdown. Separate from DB `updatedAt` (row update time). Nullable.
startedAt	timestamp with tz	When status became `in-progress`. Set by hub operation, not by agent.
completedAt	timestamp with tz	When status became `completed`. Set by hub operation.

Unique constraint: unq_tasks_project_slug UNIQUE on (projectId, slug) — task slugs are unique within a project.

pgEnum Definitions: The following enum columns use PostgreSQL pgEnum for type safety. Drizzle's pgEnum generates named PostgreSQL enums and provides TypeScript type inference. The enum values are aligned with taskgraph's categorical fields.

export const taskStatus = pgEnum("task_status", ["pending", "in-progress", "completed", "failed", "blocked"]);
export const taskScope = pgEnum("task_scope", ["single", "narrow", "moderate", "broad", "system"]);
export const taskRisk = pgEnum("task_risk", ["trivial", "low", "medium", "high", "critical"]);
export const taskImpact = pgEnum("task_impact", ["isolated", "component", "phase", "project"]);
export const taskLevel = pgEnum("task_level", ["planning", "decomposition", "implementation", "review", "research"]);
export const taskPriority = pgEnum("task_priority", ["low", "medium", "high", "critical"]);

The decomposer template should consume these same enum definitions to ensure DB-level constraints match the application-level typing.

Indexes: idx_tasks_project_id on (projectId), idx_tasks_project_status on (projectId, status) — composite for "find all pending tasks in project X", idx_tasks_status on (status), idx_tasks_active partial on (projectId) WHERE status IN ('pending', 'in-progress', 'blocked') — efficiently find active tasks, idx_tasks_path on (path) with text_pattern_ops — locale-independent LIKE pattern matching for path prefix queries (e.g., WHERE path LIKE 'implementation/%'), idx_tasks_priority on (priority), idx_tasks_assignee on (assignee), idx_tasks_due_at on (dueAt), idx_tasks_tags GIN on (tags) — for array-contains queries (tags @> '{security}').

slug semantics: From taskgraph frontmatter id field. Kebab-case identifiers like auth-setup, storage-tasks-table. Appears in dependsOn arrays (library key; YAML: depends_on).

path semantics: Nullable — tasks created via API with no filesystem origin have no path. When set, captures the logical grouping derived from the tasks/ directory structure. E.g., a file at tasks/implementation/storage/tasks-table.md gets path: "implementation/storage". Enables WHERE path LIKE 'implementation/%' (scoped queries) without requiring a parentId FK. This replaces the previous parentId column — grouping is a path concern, not a tree relationship.

No parentId column: Grouping is handled by path, dependencies by task_dependencies. A "meta task" is just a regular task that depends on its sub-tasks — no special entity type needed.

No removedAt column: When a task file is removed, the sync operation DELETEs the DB row. Git history preserves the file-level history; the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup.

metadata JSONB: Reserved for truly ad-hoc data not in the taskgraph schema. No taskgraph frontmatter fields are stored here — all have proper columns.

`task_dependencies`

Dependency edges between tasks. Directed: a row means the dependent task depends on the prerequisite task (prerequisite must complete before dependent can start). Mirrors the taskgraph dependsOn relationship (depends_on in YAML frontmatter).

Column	Type	Notes
commonCols	—	id, metadata, createdAt, updatedAt
dependsOnTaskId	text NOT NULL	FK → tasks.id (cascade) — The prerequisite task (must complete first)
dependentTaskId	text NOT NULL	FK → tasks.id (cascade) — The dependent task (waits for prerequisite)

Unique constraint: unq_task_dependencies_depends_on_task UNIQUE on (dependsOnTaskId, dependentTaskId) — no duplicate dependency edges.

Indexes: idx_task_dependencies_depends_on_task_id on (dependsOnTaskId) — "what depends on this task?", idx_task_dependencies_dependent_task_id on (dependentTaskId) — "what does this task depend on?".

Direction: dependentTaskId is the task that has the dependency; dependsOnTaskId is its prerequisite. A row therefore reads "task dependentTaskId depends on task dependsOnTaskId". When the rows are loaded into a graph, the edge points the other way, from dependsOnTaskId to dependentTaskId (prerequisite → dependent), so that a topological sort yields prerequisites before dependents.
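To make the direction concrete, here is a small self-contained sketch using Kahn's algorithm over prerequisite → dependent edges. It is a plain illustration and not how @alkdev/taskgraph implements topologicalOrder; `DependencyRow` and the local `topologicalOrder` function are hypothetical, and every id in a row is assumed to appear in `taskIds`.

```typescript
interface DependencyRow {
  dependsOnTaskId: string; // prerequisite
  dependentTaskId: string; // waits for the prerequisite
}

// Kahn's algorithm: repeatedly emit tasks whose prerequisites have all been emitted.
function topologicalOrder(taskIds: string[], rows: DependencyRow[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of taskIds) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const row of rows) {
    // Graph edge runs prerequisite → dependent.
    dependents.get(row.dependsOnTaskId)!.push(row.dependentTaskId);
    inDegree.set(row.dependentTaskId, inDegree.get(row.dependentTaskId)! + 1);
  }
  const queue = taskIds.filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (order.length !== taskIds.length) {
    throw new Error("cycle detected in task dependencies");
  }
  return order;
}
```

With rows stating that b depends on a and c depends on b, the function places a before b and b before c regardless of the order in which the ids are passed in.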

Cross-project dependency guard: dependentTaskId and dependsOnTaskId MUST reference tasks within the same project. The application layer enforces this constraint — creating a dependency between tasks in different projects is rejected with a validation error. This is not enforced at the DB level (FK constraints allow cross-project references), so the application must check project consistency before insert.

A future DB-level guard could use a trigger: BEFORE INSERT ON task_dependencies that checks NEW.dependentTaskId and NEW.dependsOnTaskId reference tasks in the same project. This is deferred to Phase 2 — the application-layer check is sufficient for now.
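The application-layer check might look roughly like this. `TaskRef`, `GuardResult`, and `checkSameProject` are hypothetical names for illustration; the real hub would look tasks up in the DB rather than in an in-memory map.

```typescript
interface TaskRef {
  id: string;
  projectId: string;
}

type GuardResult = { ok: true } | { ok: false; reason: string };

// Rejects a dependency edge unless both tasks exist and share a project.
function checkSameProject(
  tasksById: Map<string, TaskRef>,
  dependentTaskId: string,
  dependsOnTaskId: string,
): GuardResult {
  const dependent = tasksById.get(dependentTaskId);
  const prerequisite = tasksById.get(dependsOnTaskId);
  if (!dependent || !prerequisite) {
    return { ok: false, reason: "unknown task id" };
  }
  if (dependent.projectId !== prerequisite.projectId) {
    return {
      ok: false,
      reason: `cross-project dependency: ${dependent.projectId} -> ${prerequisite.projectId}`,
    };
  }
  return { ok: true };
}
```

Running the check inside the same transaction as the insert keeps a concurrent task move from slipping between the check and the write.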

Sync source: Dependency edges are authored in task file frontmatter (dependsOn: [other-task] in the library, depends_on: in YAML) and synced to this table during the file → DB sync operation. The sync clears and re-inserts all edges for a task on each run — dependencies are fully replaced by the sync, not merged or modified at runtime.

Why ALL Frontmatter Fields Get Proper Columns

ADR-001 establishes the pattern: "separate structured columns for high-query, high-filter fields." For tasks, every taskgraph frontmatter field is queryable and filterable in the coordinator's workflow:

priority — "show me high-priority pending tasks" (coordinator prioritization)
assignee — "which tasks are assigned to agent X?" (work assignment)
dueAt — "which tasks are due this week?" (deadline tracking)
tags — "filter by tag" (cross-cutting concerns)

Shoving these into metadata JSONB loses type safety, indexability, and SQL queryability — exactly the problems the database is meant to solve. The metadata JSONB column (from commonCols) is reserved for truly ad-hoc data that isn't in the taskgraph schema.

Why Categorical Fields Are Nullable (Not NOT NULL with Defaults)

The previous design made scope, risk, impact, and level NOT NULL with defaults (narrow, low, isolated, implementation). This conflated two states:

Assessed as low — the Decomposer evaluated this and determined the risk is low
Not assessed — nobody filled this in

Hiding the distinction with defaults means the coordinator can't distinguish a deliberate assessment from a gap. NULL is the correct signal for "not yet assessed."

Taskgraph itself makes these fields Option<TaskScope>, Option<TaskRisk>, etc. — nullable. The DB should match the source model.

Application-layer handling: When scope, risk, impact, or level is NULL, the coordinator should:

Warn that the task hasn't been assessed
Exclude it from cost-benefit analysis (you can't compute risk-path without risk values)
Suggest the Decomposer assess it

For @alkdev/taskgraph operations that need numeric weights, provide fallbacks at the application layer. The library's resolveDefaults() uses medium as the default risk, narrow as the default scope, and isolated as the default impact. These defaults are used when computing analysis metrics — they do NOT change the DB value (NULL remains NULL in the database).
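A minimal sketch of that application-layer handling. `forAnalysis` is a hypothetical helper, not the library's resolveDefaults(); only the three fallback values (narrow, medium, isolated) are taken from the text above.

```typescript
type Scope = "single" | "narrow" | "moderate" | "broad" | "system";
type Risk = "trivial" | "low" | "medium" | "high" | "critical";
type Impact = "isolated" | "component" | "phase" | "project";

interface CategoricalColumns {
  scope: Scope | null;
  risk: Risk | null;
  impact: Impact | null;
}

interface AnalysisView {
  scope: Scope;
  risk: Risk;
  impact: Impact;
  unassessed: string[]; // names of fields that were NULL in the DB
}

// Returns a copy with fallbacks applied; the input row is never mutated,
// so NULL stays NULL in whatever is written back to the database.
function forAnalysis(row: CategoricalColumns): AnalysisView {
  const unassessed: string[] = [];
  if (row.scope === null) unassessed.push("scope");
  if (row.risk === null) unassessed.push("risk");
  if (row.impact === null) unassessed.push("impact");
  return {
    scope: row.scope ?? "narrow",
    risk: row.risk ?? "medium",
    impact: row.impact ?? "isolated",
    unassessed,
  };
}
```

The `unassessed` list gives the coordinator something to warn about while still letting analysis proceed on the fallback values.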

Path Semantics

The path column captures the logical grouping of tasks, derived from their location in the tasks/ directory hierarchy:

tasks/
├── architecture/
│   ├── auth-design.md          → path: "architecture"
│   └── storage-overview.md     → path: "architecture"
├── research/
│   └── embedding-approach.md   → path: "research"
└── implementation/
    ├── storage/
    │   ├── tasks-table.md      → path: "implementation/storage"
    │   └── relations.md        → path: "implementation/storage"
    └── auth/
        └── oauth-flow.md       → path: "implementation/auth"

path is nullable because tasks created at runtime via hub operations (not synced from files) have no filesystem origin.
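The derivation shown in the tree can be sketched as below. `derivePath` is a hypothetical helper; returning null for a file placed directly under tasks/ is an assumption, since the text does not say how that case is handled.

```typescript
// Derives the logical grouping prefix from a task file's location.
// "tasks/implementation/storage/tasks-table.md" -> "implementation/storage"
function derivePath(filePath: string, tasksRoot: string = "tasks"): string | null {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  const prefix = tasksRoot + "/";
  if (normalized.indexOf(prefix) !== 0) {
    throw new Error(`${filePath} is not inside ${tasksRoot}/`);
  }
  const relative = normalized.slice(prefix.length);
  const lastSlash = relative.lastIndexOf("/");
  // Assumption: a file directly under tasks/ has no grouping prefix.
  return lastSlash === -1 ? null : relative.slice(0, lastSlash);
}
```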

path enables scoped queries:

WHERE path = 'architecture' — all architecture tasks
WHERE path LIKE 'implementation/%' — all implementation tasks
WHERE path = 'implementation/storage' — storage implementation tasks

This is a prefix-based grouping mechanism. It replaces parentId (which was not in the taskgraph model and conflated organizational grouping with dependency ordering).

Locale sensitivity: The path column uses the text type with the database's default collation. LIKE itself is always case-sensitive in PostgreSQL, but a plain B-tree index can serve a prefix pattern such as WHERE path LIKE 'implementation/%' only when the C (POSIX) collation applies. Either declare the column or index with COLLATE "C", or create the index with the text_pattern_ops operator class: CREATE INDEX idx_tasks_path ON tasks (path text_pattern_ops). The latter lets the index serve LIKE prefix patterns and anchored ~ regular expressions regardless of the database collation.

Grouping vs Dependencies

There is no parentId column. Task grouping and dependency ordering are separate concepts:

Grouping — path column. "This task belongs to the implementation/storage group." Enables scoped queries. Derived from filesystem layout during sync.
Dependencies — task_dependencies table. "This task cannot start until that task completes." Enables topological sort, cycle detection, critical path. Derived from dependsOn frontmatter (depends_on in YAML).

A "meta task" (e.g., "implement storage") is simply a task whose dependsOn lists all its sub-tasks. There is no special entity type; it is a regular task plus dependency edges. The coordinator picks up the meta task as an assignment, and the implementation specialist works through sub-tasks in dependency order.

Why not parentId: parentId was invented in a previous doc revision but has no basis in the taskgraph data model. It created confusion:

Redundant with task_dependencies (a meta task's dependencies ARE its sub-tasks)
Required a fragile "inference from directory structure" during sync
Violated the invariant that the DB schema mirrors the taskgraph frontmatter model

Relationship to Existing Tables

`mappings` Table

The mappings table links sessions to coordinators, spokes, and worktrees. A taskId column references the task a mapping is assigned to:

taskId: text REFERENCES tasks(id)   // FK to tasks
task: text                           // denormalized display name (e.g., task.slug or task.name)

This preserves the quick-reference pattern (coordinators can list mappings with task names without a JOIN) while maintaining referential integrity.

`projects` Table

Tasks belong to a project via tasks.projectId. A project's tasks live in the project's tasks/ directory. Cross-project task dependencies are not supported — tasks can only depend on other tasks within the same project. This is enforced at the application level (see task_dependencies cross-project guard).

`sessions` Table

Sessions are linked to tasks indirectly through mappings. When the coordinator spawns a session for a meta task:

The task row already exists in tasks (synced from file or created via API)
The coordinator creates a sessions row for the implementation specialist
The coordinator creates a mappings row with taskId pointing to the meta task

Task Status Lifecycle

pending → in-progress → completed
                      ↘ failed → in-progress (retry)
                      ↘ blocked → in-progress (unblocked)

Status	Meaning
`pending`	Task exists, not yet started
`in-progress`	A session is actively working on this task
`completed`	Task finished successfully
`failed`	Task failed, may retry (Safe Exit protocol)
`blocked`	Task is blocked by an unmet dependency or external issue

Status transitions go through hub operations (hub.task.updateStatus), not file edits. This ensures:

All agents see consistent state immediately
The coordinator can query "which tasks are pending?" reliably
No merge conflicts from parallel file edits

Timestamp columns startedAt and completedAt track when a task entered in-progress and completed states respectively. These are set by the hub operation, not by the agent.
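A sketch of the transition rules as read from the lifecycle diagram above. `applyStatus` and the `ALLOWED` table are hypothetical, and the exact transition set (for example, whether pending may move directly to blocked) is an interpretation of the diagram rather than a confirmed rule. Overwriting startedAt on a retry is likewise an assumption.

```typescript
type TaskStatus = "pending" | "in-progress" | "completed" | "failed" | "blocked";

// Allowed transitions, interpreted from the lifecycle diagram.
const ALLOWED: Record<TaskStatus, TaskStatus[]> = {
  pending: ["in-progress"],
  "in-progress": ["completed", "failed", "blocked"],
  completed: [],
  failed: ["in-progress"], // retry
  blocked: ["in-progress"], // unblocked
};

interface TaskState {
  status: TaskStatus;
  startedAt: Date | null;
  completedAt: Date | null;
}

// Validates the transition and stamps timestamps on the hub side,
// so agents never supply startedAt or completedAt themselves.
function applyStatus(task: TaskState, next: TaskStatus, now: Date): TaskState {
  if (ALLOWED[task.status].indexOf(next) === -1) {
    throw new Error(`invalid transition: ${task.status} -> ${next}`);
  }
  return {
    status: next,
    startedAt: next === "in-progress" ? now : task.startedAt, // assumption: retries restamp
    completedAt: next === "completed" ? now : task.completedAt,
  };
}
```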

Task Notes (Append-Only)

Agents may need to add notes to a task during execution (observations, partial progress, blockers encountered). For v1, this is handled by appending markdown to the body column:

## Task Description (original)

Implement the tasks table with Drizzle-TypeBox pattern...

## Implementation Notes

- 2026-04-19: Started with table definition, commonCols pattern works
- 2026-04-19: Hit issue with text[] type for tags — need to check Drizzle support

The hub.task.addNote operation appends a timestamped note section to the end of body. This is simple, preserves the full context in one place, and requires no additional tables.

Concurrency model for hub.task.addNote: Notes are appended to the task body field using DB-level concatenation: UPDATE tasks SET body = COALESCE(body, '') || $note WHERE id = $taskId. This avoids read-modify-write cycles entirely — the append is atomic at the SQL level, eliminating race conditions between concurrent agents.

As a fallback for scenarios where DB-level concatenation isn't feasible, optimistic locking via updatedAt can be used: read the current updatedAt, append the note, and UPDATE WHERE updatedAt = readValue. If the row was updated between read and write, the UPDATE affects 0 rows and the operation must be retried. This is sufficient for the expected low-contention scenario (one agent at a time writing notes to a task).
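The optimistic-locking fallback can be illustrated with an in-memory stand-in for the tasks table. `FakeTaskStore` and `appendNote` are hypothetical; `updatedAt` is modelled as a counter rather than a timestamp, and the `beforeWrite` hook exists only so a concurrent writer can be simulated.

```typescript
interface TaskRow {
  body: string | null;
  updatedAt: number; // stand-in for the updatedAt timestamp
}

// In-memory stand-in for the tasks table.
class FakeTaskStore {
  private rows = new Map<string, TaskRow>();
  private clock = 0;

  insert(id: string, body: string | null): void {
    this.rows.set(id, { body, updatedAt: ++this.clock });
  }

  read(id: string): TaskRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`no task ${id}`);
    return { ...row };
  }

  // Mirrors UPDATE ... WHERE id = $1 AND updatedAt = $2; returns affected row count.
  updateIfUnchanged(id: string, expectedUpdatedAt: number, body: string): number {
    const row = this.rows.get(id);
    if (!row || row.updatedAt !== expectedUpdatedAt) return 0;
    this.rows.set(id, { body, updatedAt: ++this.clock });
    return 1;
  }
}

// Read, append, and write back; retry when another writer got there first.
function appendNote(
  store: FakeTaskStore,
  id: string,
  note: string,
  maxAttempts: number = 3,
  beforeWrite?: () => void,
): void {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = store.read(id);
    if (beforeWrite) beforeWrite(); // simulation hook only
    const affected = store.updateIfUnchanged(id, current.updatedAt, (current.body ?? "") + note);
    if (affected === 1) return;
  }
  throw new Error(`could not append note to ${id} after ${maxAttempts} attempts`);
}
```

A retry re-reads the body, so a note written by the competing agent is preserved instead of being overwritten.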

For high-contention scenarios (multiple agents writing simultaneously), consider a separate task_notes table with INSERT operations instead of UPDATE appends.

If structured, multi-agent notes become necessary later, a dedicated task_notes table can be added. The body append pattern doesn't preclude this — it's additive.

Why Categorical Estimates Matter

The scope, risk, impact, and level fields are not cosmetic metadata — they are what make taskgraph's analysis commands produce useful results. The cost-benefit framework (see taskgraph framework docs) demonstrates a structural property: upstream failures multiply downstream damage.

These fields power:

taskgraph decompose — flags tasks where risk > medium or scope > moderate
taskgraph risk-path — finds the highest cumulative risk path
taskgraph critical — finds completion blockers
taskgraph bottleneck — finds high-betweenness tasks

Without them, you just get topological sort — useful, but not structurally insightful. The DB columns for these fields are nullable (NULL = not assessed) rather than NOT NULL with defaults, because the distinction between "deliberately assessed as low" and "nobody filled this in" is itself valuable information for the coordinator.

Graphology Integration (Runtime Graph Ops)

For runtime graph operations, the hub uses @alkdev/taskgraph — a TypeScript package that wraps graphology and provides a high-level TaskGraph class plus analysis functions. The CLI (taskgraph) is for offline authoring and analysis; the TS package is for runtime use.

Construction

The approach:

Load all tasks + task_dependencies rows for a project from the DB
Transform DB rows into TaskInput[] and DependencyEdge[] shapes (see Library ↔ DB Field Mapping below)
Build a TaskGraph via TaskGraph.fromRecords(taskInputs, edges)
Run analysis functions as needed

This works because realistic task graphs are small — typically 10–50 tasks, rarely exceeding 200 even on large projects. Building a graph from DB rows is instant at this scale (TaskGraph.fromRecords with 100 nodes reconstructs in <5ms).

Library ↔ DB Field Mapping

Task inputs: TaskGraph.fromRecords(tasks, edges) takes TaskInput[] (frontmatter-shaped), not DB row shapes. The hub transforms DB rows → TaskInput:

DB Column	TaskInput Field	Notes
`slug`	`id`	Direct mapping
`name`	`name`	Direct mapping
`status`	`status`	Direct mapping
`scope`	`scope`	Direct mapping
`risk`	`risk`	Direct mapping
`impact`	`impact`	Direct mapping
`level`	`level`	Direct mapping
`priority`	`priority`	Direct mapping
`tags`	`tags`	Direct mapping

Dependency edges: DependencyEdge uses { from, to, qualityRetention? }, not DB column names:

DB Column	DependencyEdge Field	Notes
`dependsOnTaskId` (prerequisite)	`from`	The prerequisite task that must complete first
`dependentTaskId` (dependent)	`to`	The dependent task that waits for the prerequisite
(no column)	`qualityRetention?`	Per-edge failure propagation weight (0–1, default 0.9). Used by `workflowCost` analysis. Not stored in DB — set at graph construction time.
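The two mappings above can be sketched together as one transform. The `...Like` interfaces are illustrative stand-ins for the library's TaskInput and DependencyEdge, and `toGraphRecords` is a hypothetical helper. Two assumptions are worth flagging: NULL columns are omitted rather than passed as null, and edge endpoints are translated from tasks.id to slug because the graph's node key (TaskInput.id) comes from slug.

```typescript
interface TaskDbRow {
  id: string; // DB primary key
  slug: string;
  name: string;
  status: string;
  scope: string | null;
  risk: string | null;
  impact: string | null;
  level: string | null;
  priority: string | null;
  tags: string[];
}

interface DependencyDbRow {
  dependsOnTaskId: string; // FK → tasks.id (prerequisite)
  dependentTaskId: string; // FK → tasks.id (dependent)
}

interface TaskInputLike {
  id: string;
  name: string;
  status: string;
  scope?: string;
  risk?: string;
  impact?: string;
  level?: string;
  priority?: string;
  tags: string[];
}

interface DependencyEdgeLike {
  from: string;
  to: string;
  qualityRetention?: number;
}

function toGraphRecords(
  tasks: TaskDbRow[],
  deps: DependencyDbRow[],
): { tasks: TaskInputLike[]; edges: DependencyEdgeLike[] } {
  const slugById = new Map<string, string>();
  const inputs = tasks.map((row) => {
    slugById.set(row.id, row.slug);
    const input: TaskInputLike = { id: row.slug, name: row.name, status: row.status, tags: row.tags };
    if (row.scope !== null) input.scope = row.scope;
    if (row.risk !== null) input.risk = row.risk;
    if (row.impact !== null) input.impact = row.impact;
    if (row.level !== null) input.level = row.level;
    if (row.priority !== null) input.priority = row.priority;
    return input;
  });
  const edges = deps.map((dep) => {
    const from = slugById.get(dep.dependsOnTaskId);
    const to = slugById.get(dep.dependentTaskId);
    if (from === undefined || to === undefined) {
      throw new Error("dependency references a task outside the loaded project");
    }
    return { from, to }; // qualityRetention left unset so the library default applies
  });
  return { tasks: inputs, edges };
}
```

The result has the shape expected by TaskGraph.fromRecords(tasks, edges), subject to the stand-in types matching the real library types.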

Graph Node vs DB Column Distinction

TaskGraphNodeAttributes (what the graph stores per node) is a subset of TaskInput. The graph intentionally drops fields that aren't relevant to graph algorithms:

In `TaskInput`	In `TaskGraphNodeAttributes`	Reason
`id`	✅ `id`	Node key
`name`	✅ `name`	Display
`status`	✅ `status`	State tracking
`scope`	✅ `scope`	Analysis
`risk`	✅ `risk`	Analysis
`impact`	✅ `impact`	Analysis
`level`	✅ `level`	Analysis
`priority`	✅ `priority`	Analysis
`tags`	❌	Not used by graph algorithms — available in DB
`assignee`	❌	Not used by graph algorithms — available in DB
`due`	❌	Not used by graph algorithms — available in DB
`created`	❌	Not used by graph algorithms — available in DB
`modified`	❌	Not used by graph algorithms — available in DB

Fields like tags, assignee, and due are fully queryable in the DB and don't need to be in the graph for analysis. If the coordinator needs to filter a graph by assignee, it should query the DB first and then construct a filtered subgraph using taskGraph.subgraph(filter).

@alkdev/taskgraph Exports

Construction — TaskGraph class:

TaskGraph.fromTasks(tasks: TaskInput[]) — builds graph from tasks, inferring edges from dependsOn arrays
TaskGraph.fromRecords(tasks: TaskInput[], edges: DependencyEdge[]) — builds from tasks + explicit edge list
TaskGraph.fromJSON(data: TaskGraphSerialized) — deserializes from graphology JSON
Mutation: addTask(task), removeTask(taskId), addDependency(prerequisite, dependent, qualityRetention?), updateTask(taskId, attrs)
Queries: hasCycles, findCycles, topologicalOrder, dependencies(taskId), dependents(taskId), getTask(taskId), subgraph(filter)
Validation: validateSchema(), validateGraph()
Export: export() → TaskGraphSerialized, toJSON() (alias)
Escape hatch: get graph → raw graphology DirectedGraph

Analysis functions:

criticalPath(graph) — longest path by edge count
weightedCriticalPath(graph, weightFn) — longest path with custom weight function
parallelGroups(graph) — groups of tasks that can run concurrently
bottlenecks(graph) — high-betweenness tasks. Returns BottleneckResult[]
riskPath(graph) — highest cumulative risk path. Returns RiskPathResult { path, totalRisk }
riskDistribution(graph) — risk distribution across graph. Returns RiskDistributionResult
shouldDecomposeTask(task) — decomposition recommendation. Returns DecomposeResult { shouldDecompose, reasons }
calculateTaskEv(p, scopeCost, impactWeight, config?) — expected value math. Returns EvResult
workflowCost(graph, options?) — total workflow cost with failure propagation. Returns WorkflowCostResult. Options: WorkflowCostOptions

Categorical numeric methods (map enum values → numbers for analysis):

scopeCostEstimate(scope) — numeric scope cost
scopeTokenEstimate(scope) — token-based scope estimate
riskSuccessProbability(risk) — probability of success (0–1)
riskWeight(risk) — weight for risk calculations
impactWeight(impact) — weight for impact calculations
resolveDefaults(attrs) — fills default values for unassessed fields and computes derived numeric values. Returns ResolvedTaskAttributes

Schema types — TypeBox schemas with Enum suffix:

TaskStatusEnum, TaskScopeEnum, TaskRiskEnum, TaskImpactEnum, TaskLevelEnum, TaskPriorityEnum
TypeScript types: TaskStatus, TaskScope, TaskRisk, TaskImpact, TaskLevel, TaskPriority

Frontmatter:

parseFrontmatter(content) — parses YAML + markdown, normalizes depends_on → dependsOn
serializeFrontmatter(data) — serializes to YAML + markdown, outputs dependsOn
splitFrontmatter(content) — lower-level helper that splits the YAML block (delimited by --- lines) from the markdown body without validating it
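
The two behaviors described above (splitting the delimited block and normalizing depends_on to dependsOn) can be sketched without the library. `splitDoc` and `normalizeKeys` are hypothetical helpers, and their return shapes are assumptions rather than the actual signatures of splitFrontmatter or parseFrontmatter; YAML parsing itself is left out.

```typescript
// Hypothetical helper: splits a leading `---` ... `---` block from the body.
// Returns null when the content does not start with a delimited block.
function splitDoc(content: string): { yaml: string; body: string } | null {
  const lines = content.split(/\r?\n/);
  if (lines[0] !== "---") return null;
  const end = lines.indexOf("---", 1);
  if (end === -1) return null;
  return {
    yaml: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  };
}

// Hypothetical helper: renames the snake_case key to the camelCase key
// after the YAML has been parsed into a plain object.
function normalizeKeys(data: Record<string, unknown>): Record<string, unknown> {
  const { depends_on, ...rest } = data;
  return depends_on === undefined ? rest : { ...rest, dependsOn: depends_on };
}

const demoSplit = splitDoc(
  "---\nid: build-api\ndepends_on: [design]\n---\n# Build the API\n",
);
const demoNormalized = normalizeKeys({ id: "build-api", depends_on: ["design"] });
```

In this example `demoSplit.yaml` holds the two frontmatter lines, `demoSplit.body` is `"# Build the API\n"`, and `demoNormalized` carries `dependsOn` in place of `depends_on`.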

Error classes:

TaskgraphError (base), CircularDependencyError, TaskNotFoundError, DuplicateNodeError, DuplicateEdgeError, ValidationError, GraphValidationError

Why not taskgraph NAPI for v1: The Rust CLI (taskgraph) is for offline authoring and analysis, while the TypeScript package (@alkdev/taskgraph) handles all runtime graph operations. Graphology, a transitive dependency of @alkdev/taskgraph, handles graphs of fewer than 200 nodes trivially, so NAPI bindings to the Rust code are unnecessary at realistic scales.

Sync Flow

┌──────────────┐       ┌───────────────┐       ┌───────────────────┐
│ Decomposer   │       │ taskgraph CLI │       │ Hub DB            │
│ creates .md  │──────►│ validates     │──────►│ tasks table       │
│ files        │       │ analyzes      │       │ task_dependencies │
└──────────────┘       └───────────────┘       └───────────────────┘
                                                       ▲
                                                       │
                                              ┌────────┴───────────┐
                                              │ Hub operations     │
                                              │ hub.task.*         │
                                              │ (status, notes)    │
                                              └────────────────────┘

Sync: Files → DB

The sync operation runs as a single database transaction:

Begin transaction
Scan tasks/ directory for markdown files
Parse frontmatter (YAML) + body (markdown) from each file. @alkdev/taskgraph provides parseFrontmatter() and serializeFrontmatter() for YAML+markdown parsing. parseTaskFile() and parseTaskDirectory() are Node.js only (use node:fs/promises); for Deno, use parseFrontmatter() with Deno file I/O.
Upsert into tasks table (matches by (projectId, slug))
For each task, DELETE FROM task_dependencies WHERE dependentTaskId = ? then INSERT the current edges — dependency edges are fully replaced, not merged, because the files own the dependency declarations
Commit transaction

If any step fails, the entire sync rolls back — no partial updates.
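
The upsert, edge replacement, and all-or-nothing behavior can be sketched against an in-memory store. This is an illustration under stated assumptions, not the hub's sync code: the `Store`, `ParsedTask`, and `syncProject` names are hypothetical, `prerequisiteSlug` is an invented column name, and a `projectId/slug` string stands in for the real task ID. Working on a copy and returning it only on success mimics a transaction that either commits or rolls back.

```typescript
type TaskRow = { projectId: string; slug: string; name: string };
// `dependentTaskId` follows the spec; `prerequisiteSlug` is a made-up column.
type DepRow = { dependentTaskId: string; prerequisiteSlug: string };
type Store = { tasks: TaskRow[]; deps: DepRow[] };
type ParsedTask = { slug: string; name: string; dependsOn: string[] };

// Applies the sync to a draft copy. If anything throws, the caller's
// store is untouched, which plays the role of ROLLBACK.
function syncProject(store: Store, projectId: string, parsed: ParsedTask[]): Store {
  const draft: Store = {
    tasks: store.tasks.map((t) => ({ ...t })),
    deps: store.deps.map((d) => ({ ...d })),
  };
  const seen = new Set<string>();
  for (const task of parsed) {
    if (seen.has(task.slug)) throw new Error(`duplicate slug: ${task.slug}`);
    seen.add(task.slug);

    // Upsert keyed by (projectId, slug).
    const existing = draft.tasks.find(
      (t) => t.projectId === projectId && t.slug === task.slug,
    );
    if (existing) existing.name = task.name;
    else draft.tasks.push({ projectId, slug: task.slug, name: task.name });

    // Replace (not merge) this task's edges: delete by dependent, then insert.
    const taskKey = `${projectId}/${task.slug}`; // stand-in for the real task ID
    draft.deps = draft.deps.filter((d) => d.dependentTaskId !== taskKey);
    for (const dep of task.dependsOn) {
      draft.deps.push({ dependentTaskId: taskKey, prerequisiteSlug: dep });
    }
  }
  return draft; // plays the role of COMMIT
}

let demoDb: Store = { tasks: [], deps: [] };
demoDb = syncProject(demoDb, "p1", [
  { slug: "design", name: "Design", dependsOn: [] },
  { slug: "api", name: "API", dependsOn: ["design"] },
]);
// The file for "api" no longer declares a dependency, so its edge disappears.
demoDb = syncProject(demoDb, "p1", [{ slug: "api", name: "API v2", dependsOn: [] }]);
```

After the second call the store still has two tasks, the `api` row is renamed, and no dependency rows remain. Deleting rows for removed files is left out of this sketch.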

Concurrency: Only one sync should run at a time. The Decomposer triggers sync after creating or updating task files, so v1 needs no locking or queueing mechanism for concurrent syncs.

Deleted files: When a task file is removed from tasks/, the sync operation deletes the corresponding DB row. Git history preserves the full file-level history — the DB doesn't need to duplicate it with soft deletes. FK cascade handles cleanup (task_dependencies rows, mappings.taskId SET NULL).
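
Detecting removed files amounts to a set difference between the slugs stored for the project and the slugs found on disk. A minimal sketch, with `slugsToDelete` as a hypothetical helper name:

```typescript
// Returns the slugs that exist in the DB but no longer have a task file.
function slugsToDelete(dbSlugs: string[], fileSlugs: string[]): string[] {
  const present = new Set(fileSlugs);
  return dbSlugs.filter((slug) => !present.has(slug));
}

const demoStale = slugsToDelete(["design", "api", "ui"], ["design", "api"]);
```

Here `demoStale` is `["ui"]`; the sync would delete that row inside the same transaction and rely on the FK actions for cleanup.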

DB → Files (Export)

When graph analysis is needed, export DB rows back to markdown files:

Query tasks + task_dependencies for a project
For each task, generate markdown with YAML frontmatter + body
Write to tasks/ directory structure (using path to determine subdirectory)
Run taskgraph validate, taskgraph risk-path, etc.

This is a manual step — "I want to run analysis now" — not an automatic sync.
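
The per-task generation step can be pictured as follows. This sketch hand-builds a small frontmatter block to show the output shape; a real export would more plausibly go through serializeFrontmatter(). The `ExportTask` type and `toMarkdown` helper are hypothetical, only three fields are shown, and slugs are assumed to be plain YAML scalars that need no quoting.

```typescript
// Hypothetical, reduced view of a task row joined with its dependency slugs.
type ExportTask = { slug: string; name: string; dependsOn: string[]; body: string };

// Renders YAML frontmatter followed by the markdown body.
// JSON.stringify is used for the name because a JSON string is also
// a valid double-quoted YAML scalar.
function toMarkdown(task: ExportTask): string {
  const yaml = [
    `id: ${task.slug}`,
    `name: ${JSON.stringify(task.name)}`,
    `dependsOn: [${task.dependsOn.join(", ")}]`,
  ].join("\n");
  return `---\n${yaml}\n---\n\n${task.body}\n`;
}

const demoMarkdown = toMarkdown({
  slug: "api",
  name: "Build API",
  dependsOn: ["design"],
  body: "Notes.",
});
```

The resulting file content starts with the `---` delimited block (`id`, `name`, `dependsOn`) and ends with the body, which is the shape the CLI commands in the last step expect to read.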

Sync Error Handling

Error	Behavior
Invalid YAML frontmatter	Skip file, log warning with file path and parse error. Continue with remaining files.
Missing required `id` or `name` field	Skip file, log warning. Task cannot be synced without these fields.
`dependsOn` references non-existent slug within project	Insert the dependency edge anyway (dangling reference). The coordinator detects and warns about unresolvable dependencies. `taskgraph validate` should be run before sync to catch these.
Duplicate `id` (slug) in same project	Fail the sync with a clear error. Slug uniqueness is enforced by the DB constraint `unq_tasks_project_slug`.
File removed from filesystem	DELETE the DB row. FK cascade handles dependent rows. Git preserves history.

Validation ordering: Run taskgraph validate before sync to catch structural errors (cycles, missing dependencies, duplicate IDs) at the CLI level. The DB sync then handles data-level integrity (unique constraints, FK checks).
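
The skip-versus-fail policy in the table can be sketched as a triage pass over parse results before anything touches the DB. The `Parsed` and `Accepted` types and the `triageFiles` helper are hypothetical; the sketch covers only the first, second, and fourth rows of the table (invalid frontmatter, missing fields, duplicate IDs).

```typescript
type Parsed = {
  file: string;
  data?: { id?: string; name?: string };
  parseError?: string;
};
type Accepted = { file: string; id: string; name: string };

// Skips unparseable or incomplete files with a warning, and fails the
// whole sync when two files declare the same id.
function triageFiles(results: Parsed[]): { accepted: Accepted[]; warnings: string[] } {
  const accepted: Accepted[] = [];
  const warnings: string[] = [];
  const firstFileForId = new Map<string, string>();

  for (const r of results) {
    if (r.parseError !== undefined || r.data === undefined) {
      warnings.push(`${r.file}: invalid frontmatter (${r.parseError ?? "no data"})`);
      continue;
    }
    const { id, name } = r.data;
    if (!id || !name) {
      warnings.push(`${r.file}: missing required id or name`);
      continue;
    }
    const earlier = firstFileForId.get(id);
    if (earlier !== undefined) {
      throw new Error(`duplicate id "${id}" in ${earlier} and ${r.file}`);
    }
    firstFileForId.set(id, r.file);
    accepted.push({ file: r.file, id, name });
  }
  return { accepted, warnings };
}

const demoTriage = triageFiles([
  { file: "tasks/design.md", data: { id: "design", name: "Design" } },
  { file: "tasks/broken.md", parseError: "bad indentation" },
  { file: "tasks/unnamed.md", data: { id: "unnamed" } },
]);
```

In this run one file is accepted and two warnings are recorded. Passing two files with the same `id` would throw instead, which corresponds to failing the sync before commit.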

Open Questions

Embeddings: Task descriptions may benefit from vector embeddings for similarity search. Deferred — the metadata JSONB column can hold an embedding reference later, or a separate task_embeddings table can be added.
Bulk status updates: When the coordinator completes a meta task (all sub-tasks done), should it automatically mark the meta task completed? Likely yes — this is an application-level operation, not a DB concern.
Cross-project dependencies: Not supported. Tasks can only depend on other tasks within the same project. Application-layer validation rejects cross-project dependencies; a future DB-level trigger guard is deferred to Phase 2 (see task_dependencies cross-project guard).
Task versioning: When a task's body is modified (e.g., notes appended), should we keep previous versions? For v1, no — the current body is sufficient. If audit trail is needed, updatedAt timestamp + metadata revision count could suffice.

References

Cost-benefit framework: taskgraph framework docs — why categorical estimates are structurally required
Workflow guide: taskgraph workflow docs — practical usage patterns
Task file format: @alkdev/taskgraph README — field definitions
TaskInput type: @alkdev/taskgraph package source (TypeScript equivalent of the Rust TaskFrontmatter struct)
taskgraph architecture: taskgraph architecture docs
Storage pattern: README.md
Table reference (cross-cutting): table-reference.md
ADR-011: ../../decisions/ADR-011-dual-task-representation.md
@alkdev/taskgraph (runtime graph engine): @alkdev/taskgraph npm package
