# TaskGraph Rust Source - Comprehensive Research Report
> Source: `/workspace/@alkimiadev/taskgraph` (Rust CLI project)
> Report date: 2026-04-23
> Version: 0.1.3
---
## Table of Contents
1. [Project Structure](#1-project-structure)
2. [Cargo.toml Details](#2-cargotoml-details)
3. [Core Data Types and Public APIs](#3-core-data-types-and-public-apis)
4. [Functions/Methods to Expose via NAPI](#4-functionsmethods-to-expose-via-napi)
5. [Serialization (Serde) Support](#5-serialization-serde-support)
6. [Error Types and Error Handling](#6-error-types-and-error-handling)
7. [Input/Output Patterns](#7-inputoutput-patterns)
8. [Existing Tests and Benchmarks](#8-existing-tests-and-benchmarks)
---
## 1. Project Structure
### Directory Layout
```
taskgraph/
├── Cargo.toml  # Package manifest (single crate, not a workspace)
├── Cargo.lock  # Locked dependencies
├── LICENSE-APACHE  # Apache-2.0 license
├── LICENSE-MIT  # MIT license
├── README.md  # User-facing documentation
├── AGENTS.md  # AI agent context file
├── opencode.json  # OpenCode configuration
├── .github/
│   └── workflows/
│       └── ci.yml  # CI: fmt, clippy, test, coverage
├── docs/
│   ├── ARCHITECTURE.md  # Full architecture spec
│   ├── framework.md  # Cost-benefit framework rationale
│   ├── workflow.md  # Practical workflow guide
│   ├── implementation.md  # Tools/models/guidelines
│   ├── phase-1.md through phase-4.md  # Phase plans
│   ├── issues/  # Blocking issues tracking
│   ├── reviews/  # Code review docs
│   └── research/
│       └── cost_benefit_analysis_framework.py
├── scripts/
│   └── benchmark.sh  # Manual benchmark script
├── benches/
│   └── graph_benchmarks.rs  # Criterion benchmarks
├── src/
│   ├── main.rs  # Binary entry point (thin: parse CLI, execute)
│   ├── lib.rs  # Library root - re-exports public API
│   ├── cli.rs  # CLI argument definitions (clap derive)
│   ├── task.rs  # Task, TaskFrontmatter, enums (serde types)
│   ├── graph.rs  # DependencyGraph (petgraph wrapper)
│   ├── error.rs  # Error enum (thiserror)
│   ├── config.rs  # Config loading (.taskgraph.toml)
│   ├── discovery.rs  # TaskCollection (directory scanning)
│   └── commands/
│       ├── mod.rs  # Command module re-exports
│       ├── init.rs  # `init` command
│       ├── validate.rs  # `validate` command
│       ├── list.rs  # `list` command
│       ├── show.rs  # `show` command
│       ├── deps.rs  # `deps` command
│       ├── topo.rs  # `topo` command
│       ├── cycles.rs  # `cycles` command
│       ├── parallel.rs  # `parallel` command
│       ├── critical.rs  # `critical` command
│       ├── bottleneck.rs  # `bottleneck` command
│       ├── risk.rs  # `risk` command
│       ├── decompose.rs  # `decompose` command
│       ├── workflow_cost.rs  # `workflow-cost` command
│       ├── risk_path.rs  # `risk-path` command
│       └── graph_cmd.rs  # `graph` command (DOT output)
└── tests/
    ├── integration/
    │   └── commands.rs  # 25 integration tests (assert_cmd)
    └── fixtures/
        ├── tasks/  # 3 valid tasks (one depends on another)
        ├── cycles/  # 3 tasks forming a cycle
        ├── invalid/  # 1 task with missing dependency
        ├── risk/  # 5 tasks with various risk levels
        └── decompose/  # 4 tasks for decomposition testing
```
### Module Dependency Graph
```
lib.rs
├── cli → commands::*, config, discovery, graph
├── commands/* → cli, discovery, graph, task
├── config → error
├── discovery → task, error
├── error → (thiserror, std, serde_yaml, serde_json)
├── graph → discovery, task, petgraph
└── task → (serde, chrono, gray_matter, error)
```
### Crates
This is a **single crate** project (not a Cargo workspace). It produces:
- **Library**: `libtaskgraph` (from `src/lib.rs`)
- **Binary**: `taskgraph` (from `src/main.rs`)
---
## 2. Cargo.toml Details
### Package Metadata
| Field | Value |
|-------|-------|
| name | `taskgraph` |
| version | `0.1.3` |
| edition | `2021` |
| license | `MIT OR Apache-2.0` |
| description | CLI tool for managing task dependencies using markdown files |
| repository | `https://github.com/alkimiadev/taskgraph` |
| keywords | `task`, `dependency`, `graph`, `cli`, `markdown` |
| categories | `command-line-utilities`, `development-tools` |
### Dependencies (Production)
| Crate | Version | Features | Purpose |
|-------|---------|----------|---------|
| `petgraph` | `0.7` | - | Directed graph data structure & algorithms (toposort, cycle detection, etc.) |
| `gray_matter` | `0.2` | - | Markdown frontmatter extraction (YAML engine) |
| `serde` | `1.0` | `derive` | Serialization/deserialization framework |
| `serde_json` | `1.0` | - | JSON serialization (for `--format json` output) |
| `serde_yaml` | `0.9` | - | YAML serialization (for frontmatter parsing & roundtrip) |
| `clap` | `4.5` | `derive` | CLI argument parsing |
| `clap_complete` | `4.5` | - | Shell completion generation |
| `chrono` | `0.4` | `serde` | Date/time with serde support |
| `anyhow` | `1.0` | - | Ergonomic error handling (used in CLI/binary) |
| `thiserror` | `2.0` | - | Derived error types (used in library) |
| `dirs` | `6.0` | - | Platform directories (future: global config) |
| `walkdir` | `2.5` | - | Recursive directory walking |
| `tracing` | `0.1` | - | Structured logging |
| `tracing-subscriber` | `0.3` | `env-filter` | Log output formatting |
| `toml` | `0.8` | - | Config file parsing |
### Dev Dependencies
| Crate | Version | Purpose |
|-------|---------|---------|
| `tempfile` | `3.0` | Temporary directories for tests |
| `assert_cmd` | `2.0` | CLI integration testing |
| `predicates` | `3.0` | Assertion predicates for integration tests |
| `criterion` | `0.5` | Benchmarking framework |
### Features
```toml
[features]
default = []
```
No feature flags exist yet, which makes this a natural place to add an optional `napi` feature.
### Release Profile
```toml
[profile.release]
opt-level = 3
lto = true
strip = true
```
---
## 3. Core Data Types and Public APIs
### 3.1 Task (`src/task.rs`)
The central data type. Represents a single task file.
```rust
/// A task with its content.
#[derive(Debug, Clone)]
pub struct Task {
    pub frontmatter: TaskFrontmatter,
    pub body: String,           // Markdown body content
    pub source: Option<String>, // Source file path (if loaded from file)
}
```
**Methods:**
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `id()` | `&self -> &str` | Task ID | Accessor for frontmatter.id |
| `name()` | `&self -> &str` | Task name | Accessor for frontmatter.name |
| `status()` | `&self -> TaskStatus` | Status enum | Accessor for frontmatter.status |
| `depends_on()` | `&self -> &[String]` | Dependency list | Accessor for frontmatter.depends_on |
| `from_file()` | `&Path -> Result<Self>` | Parsed Task | Parse from a .md file on disk |
| `from_markdown()` | `&str, Option<String> -> Result<Self>` | Parsed Task | Parse from markdown string + optional source name |
| `to_markdown()` | `&self -> Result<String, serde_yaml::Error>` | Markdown string | Serialize back to markdown with YAML frontmatter |

**Key observation:** `Task` itself does **NOT** derive `Serialize` or `Deserialize`. Only `TaskFrontmatter` does. The `body` and `source` fields are not serialized through serde - they're managed separately during parse/render.
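
Because `body` lives outside serde, parsing has to separate the YAML frontmatter from the Markdown body before any deserialization happens. The crate delegates that step to `gray_matter`; the sketch below is only a std-only stand-in to show the shape of the split. `split_frontmatter` is a hypothetical helper, not part of the crate, and it assumes LF line endings.

```rust
/// Splits a markdown document into `(frontmatter_yaml, body)`.
/// Hypothetical helper: the real crate relies on `gray_matter` instead.
/// Returns `None` when the document does not open with a `---` line
/// or when the closing `---` line is missing.
fn split_frontmatter(content: &str) -> Option<(&str, &str)> {
    // The opening delimiter must be the very first line.
    let rest = content.strip_prefix("---\n")?;
    // The closing delimiter is the next line that contains only `---`.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let yaml = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((yaml, body));
        }
        offset += line.len();
    }
    None
}
```

In this sketch the YAML half would then go to a serde deserializer for `TaskFrontmatter`, while the body half is stored as-is.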
### 3.2 TaskFrontmatter (`src/task.rs`)
The structured metadata extracted from YAML frontmatter:
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TaskFrontmatter {
    pub id: String,
    pub name: String,
    #[serde(default)]
    pub status: TaskStatus,
    #[serde(default, rename = "depends_on")]
    pub depends_on: Vec<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub priority: Option<String>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub tags: Vec<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub created: Option<DateTime<Utc>>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub modified: Option<DateTime<Utc>>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub assignee: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub due: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub scope: Option<TaskScope>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub risk: Option<TaskRisk>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub impact: Option<TaskImpact>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub level: Option<TaskLevel>,
}
```
**Serde details:**
- Every enum uses `#[serde(rename_all = "kebab-case")]`, so variants serialize as kebab-case strings (e.g. `in-progress`)
- Optional fields use `skip_serializing_if = "Option::is_none"` to keep output clean
- `tags` uses `skip_serializing_if = "Vec::is_empty"`
- `depends_on` carries an explicit `rename = "depends_on"`; this matches the Rust field name, so it does not change the key
- `status` defaults to `TaskStatus::Pending` when omitted
### 3.3 Enum Types (`src/task.rs`)
All enums derive `Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default`.
#### TaskStatus
```rust
#[serde(rename_all = "kebab-case")]
pub enum TaskStatus {
    Pending,    // default
    InProgress, // "in-progress" in YAML/JSON
    Completed,
    Failed,
    Blocked,
}
```
Also implements `Display` (kebab-case strings).
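
As an illustration of how the `Display` output can line up with the serde names, here is a minimal std-only sketch. It mirrors the five variants above but omits the serde derives, and the crate's actual `Display` implementation may be written differently.

```rust
use std::fmt;

/// Stand-in for `TaskStatus` without the serde derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum TaskStatus {
    #[default]
    Pending,
    InProgress,
    Completed,
    Failed,
    Blocked,
}

impl fmt::Display for TaskStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same strings that `rename_all = "kebab-case"` yields for serde.
        let text = match self {
            TaskStatus::Pending => "pending",
            TaskStatus::InProgress => "in-progress",
            TaskStatus::Completed => "completed",
            TaskStatus::Failed => "failed",
            TaskStatus::Blocked => "blocked",
        };
        f.write_str(text)
    }
}
```

Keeping both representations identical means a JS string union such as `'pending' | 'in-progress' | ...` can be compared directly against either the serialized value or the displayed one.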
#### TaskScope
```rust
#[serde(rename_all = "kebab-case")]
pub enum TaskScope {
    Single,   // ~500 tokens, cost 1.0
    Narrow,   // default, ~1500 tokens, cost 2.0
    Moderate, // ~3000 tokens, cost 3.0
    Broad,    // ~6000 tokens, cost 4.0
    System,   // ~10000 tokens, cost 5.0
}
```
Methods: `token_estimate() -> u32`, `cost_estimate() -> f64`, `Display`
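
The numeric mappings can be read straight off the variant comments. The following std-only sketch encodes them; taking `self` by value is an assumption (the report only lists return types), and `total_tokens` is a hypothetical helper added for illustration.

```rust
/// Stand-in for `TaskScope`; the figures come from the variant comments above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum TaskScope {
    Single,
    #[default]
    Narrow,
    Moderate,
    Broad,
    System,
}

impl TaskScope {
    /// Approximate token budget for a task of this scope.
    fn token_estimate(self) -> u32 {
        match self {
            TaskScope::Single => 500,
            TaskScope::Narrow => 1500,
            TaskScope::Moderate => 3000,
            TaskScope::Broad => 6000,
            TaskScope::System => 10000,
        }
    }

    /// Relative cost factor for a task of this scope.
    fn cost_estimate(self) -> f64 {
        match self {
            TaskScope::Single => 1.0,
            TaskScope::Narrow => 2.0,
            TaskScope::Moderate => 3.0,
            TaskScope::Broad => 4.0,
            TaskScope::System => 5.0,
        }
    }
}

/// Hypothetical aggregate: total token budget for a batch of tasks.
fn total_tokens(scopes: &[TaskScope]) -> u32 {
    scopes.iter().map(|s| s.token_estimate()).sum()
}
```

For example, one `Single`, one `Narrow`, and one `System` task add up to 500 + 1500 + 10000 tokens.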
#### TaskRisk
```rust
#[serde(rename_all = "kebab-case")]
pub enum TaskRisk {
    Trivial,  // p=0.98
    Low,      // default, p=0.90
    Medium,   // p=0.80
    High,     // p=0.65
    Critical, // p=0.50
}
```
Methods: `success_probability() -> f64`, `Display`
#### TaskImpact
```rust
#[serde(rename_all = "kebab-case")]
pub enum TaskImpact {
    Isolated,  // default, weight 1.0
    Component, // weight 1.5
    Phase,     // weight 2.0
    Project,   // weight 3.0
}
```
Methods: `weight() -> f64`, `Display`
#### TaskLevel
```rust
#[serde(rename_all = "kebab-case")]
pub enum TaskLevel {
    Planning,
    Decomposition,
    Implementation, // default
    Review,
    Research,
}
```
Methods: `Display` only
### 3.4 DependencyGraph (`src/graph.rs`)
A directed graph of task dependencies built from a `TaskCollection`.
```rust
pub struct DependencyGraph {
    graph: DiGraph<TaskId, ()>,            // petgraph directed graph
    index_map: HashMap<TaskId, NodeIndex>, // task ID -> node index
}
```
**Edge direction:** `from -> to` means "from must complete before to" (dependency must complete first).
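
The edge convention decides which neighbors count as dependencies and which as dependents. A minimal std-only sketch, assuming the same `from -> to` direction: `MiniGraph` is a hypothetical stand-in that stores edges in a map rather than a petgraph `DiGraph`, and unlike the real `add_dependency()` it does not check that both IDs exist.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for `DependencyGraph` that stores only edges.
#[derive(Default)]
struct MiniGraph {
    /// `from` -> set of `to` nodes ("from must complete before to").
    outgoing: BTreeMap<String, BTreeSet<String>>,
}

impl MiniGraph {
    /// Records that `from` must complete before `to`.
    fn add_dependency(&mut self, from: &str, to: &str) {
        self.outgoing
            .entry(from.to_string())
            .or_default()
            .insert(to.to_string());
    }

    /// Direct prerequisites of `id`: sources of edges pointing at `id`.
    fn dependencies(&self, id: &str) -> Vec<String> {
        self.outgoing
            .iter()
            .filter(|(_, tos)| tos.contains(id))
            .map(|(from, _)| from.clone())
            .collect()
    }

    /// Tasks waiting on `id`: targets of edges leaving `id`.
    fn dependents(&self, id: &str) -> Vec<String> {
        self.outgoing
            .get(id)
            .map(|tos| tos.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```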
**Public API:**
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `new()` | `-> Self` | Empty graph | Create empty graph |
| `from_collection()` | `&TaskCollection -> Self` | Built graph | Build from discovered tasks |
| `from_tasks()` | `Vec<&Task> -> Self` | Built graph | Build from explicit task list |
| `add_task()` | `&mut self, TaskId` | () | Add node |
| `add_dependency()` | `&mut self, &str, &str` | () | Add edge (from->to); silently ignores unknown IDs |
| `has_cycles()` | `&self -> bool` | Boolean | Uses `petgraph::algo::is_cyclic_directed` |
| `find_cycles()` | `&self -> Vec<Vec<TaskId>>` | Cycles | Custom DFS cycle finder |
| `topological_order()` | `&self -> Option<Vec<TaskId>>` | Order or None | Uses `petgraph::algo::toposort` |
| `dependencies()` | `&self, &str -> Vec<TaskId>` | Incoming neighbors | What this task depends on (direct) |
| `dependents()` | `&self, &str -> Vec<TaskId>` | Outgoing neighbors | What depends on this (direct) |
| `parallel_groups()` | `&self -> Vec<Vec<TaskId>>` | Generations | Tasks grouped by level (can run concurrently) |
| `critical_path()` | `&self -> Vec<TaskId>` | Path | Longest path through the graph |
| `weighted_critical_path()` | `&self, F: Fn(&str)->f64 -> Vec<TaskId>` | Weighted path | Path with highest cumulative weight |
| `bottlenecks()` | `&self -> Vec<(TaskId, usize)>` | Ranked list | Betweenness centrality via path counting |
| `to_dot()` | `&self -> String` | DOT string | GraphViz DOT format export |

Also implements `Default` (returns `new()`).

**Important:** `DependencyGraph` does **NOT** implement `Serialize`/`Deserialize`. It's a compute-only structure built fresh each time from tasks.
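
To make the "generations" returned by `parallel_groups()` concrete, here is a std-only sketch that layers a graph with Kahn's algorithm. This is an assumed technique chosen for illustration; the report does not say how the crate computes its groups. The free function and its `Option` return (with `None` for a cycle) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups tasks into generations that can run concurrently.
/// `edges` holds `(from, to)` pairs meaning "`from` must complete before `to`".
/// Returns `None` when the edges contain a cycle.
fn parallel_groups(ids: &[&str], edges: &[(&str, &str)]) -> Option<Vec<Vec<String>>> {
    let known: BTreeSet<&str> = ids.iter().copied().collect();
    // Edges that mention unknown IDs are dropped, mirroring `add_dependency()`.
    let edges: Vec<(&str, &str)> = edges
        .iter()
        .copied()
        .filter(|(from, to)| known.contains(from) && known.contains(to))
        .collect();
    // Number of unfinished prerequisites per task.
    let mut pending: BTreeMap<&str, usize> = known.iter().map(|&id| (id, 0)).collect();
    for &(_, to) in &edges {
        *pending.entry(to).or_insert(0) += 1;
    }
    let mut groups = Vec::new();
    while !pending.is_empty() {
        // Every task with no unfinished prerequisite joins this generation.
        let ready: Vec<&str> = pending
            .iter()
            .filter(|(_, count)| **count == 0)
            .map(|(id, _)| *id)
            .collect();
        if ready.is_empty() {
            return None; // every remaining task waits on another: a cycle
        }
        for &id in &ready {
            pending.remove(id);
            // Completing `id` releases one prerequisite of each successor.
            for &(from, to) in &edges {
                if from == id {
                    if let Some(count) = pending.get_mut(to) {
                        *count -= 1;
                    }
                }
            }
        }
        groups.push(ready.into_iter().map(String::from).collect());
    }
    Some(groups)
}
```

For a diamond (`a` before `b` and `c`, both before `d`), this yields three generations: `a`, then `b` and `c` together, then `d`.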
### 3.5 TaskCollection (`src/discovery.rs`)
Collection of tasks discovered from a directory:
```rust
#[derive(Debug, Default)]
pub struct TaskCollection {
    tasks: HashMap<String, Task>,    // Tasks indexed by ID
    paths: HashMap<String, PathBuf>, // File paths indexed by ID
    errors: Vec<DiscoveryError>,     // Parse errors encountered
}
```
**Public API:**
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `new()` | `-> Self` | Empty collection | Constructor |
| `from_directory()` | `&Path -> Self` | Populated collection | Scan directory recursively for .md files |
| `get()` | `&self, &str -> Option<&Task>` | Task or None | Lookup by ID |
| `path()` | `&self, &str -> Option<&PathBuf>` | Path or None | File path for task ID |
| `tasks()` | `&self -> impl Iterator<Item = &Task>` | Iterator | All tasks |
| `ids()` | `&self -> impl Iterator<Item = &str>` | Iterator | All task IDs |
| `len()` | `&self -> usize` | Count | Number of tasks |
| `is_empty()` | `&self -> bool` | Boolean | Empty check |
| `errors()` | `&self -> &[DiscoveryError]` | Errors | Parse errors from discovery |
| `missing_dependencies()` | `&self -> HashMap<String, Vec<String>>` | Map | Task ID -> missing dep IDs |
| `validate()` | `&self -> ValidationResult` | Result | Full validation |

**Important:** `TaskCollection` does **NOT** implement `Serialize`/`Deserialize` either. It's built procedurally.
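
The `missing_dependencies()` result maps a task ID to the dependency IDs that no task defines. A std-only sketch of that computation follows, with two assumptions: tasks without broken references get no entry, and a `BTreeMap` replaces the crate's `HashMap` to keep iteration order deterministic. The slice-of-tuples input is a hypothetical simplification of `TaskCollection`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Maps each task ID to the dependency IDs that no task defines.
/// `tasks` pairs a task ID with its `depends_on` list.
fn missing_dependencies(tasks: &[(&str, Vec<&str>)]) -> BTreeMap<String, Vec<String>> {
    // Every ID that some task actually declares.
    let known: BTreeSet<&str> = tasks.iter().map(|(id, _)| *id).collect();
    let mut missing = BTreeMap::new();
    for (id, deps) in tasks {
        let absent: Vec<String> = deps
            .iter()
            .filter(|dep| !known.contains(*dep))
            .map(|dep| dep.to_string())
            .collect();
        // Only tasks with at least one broken reference get an entry.
        if !absent.is_empty() {
            missing.insert(id.to_string(), absent);
        }
    }
    missing
}
```

An empty map then corresponds to the "no missing dependencies" half of a valid `ValidationResult`.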
### 3.6 DiscoveryError (`src/discovery.rs`)
```rust
#[derive(Debug, Clone)]
pub struct DiscoveryError {
    pub path: PathBuf,
    pub message: String,
}
```
No serde derives. Simple struct for error reporting.
### 3.7 ValidationResult (`src/discovery.rs`)
```rust
#[derive(Debug)]
pub struct ValidationResult {
    pub task_count: usize,
    pub errors: Vec<DiscoveryError>,
    pub missing_dependencies: HashMap<String, Vec<String>>,
}
```
Methods: `is_valid() -> bool`, `issue_count() -> usize`
No serde derives on the Rust type itself, but it's converted to `ValidationOutput` (which does derive `Serialize`) in the validate command.
### 3.8 Config (`src/config.rs`)
```rust
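#[derive(Debug, Default, Serialize, Deserialize)]
pub struct Config {
    #[serde(default)]
    pub project: ProjectConfig,
}

// `ProjectConfig` does not derive `Default`, yet `Config` requires it,
// so a manual `impl Default` presumably exists in the source.
#[derive(Debug, Serialize, Deserialize)]
pub struct ProjectConfig {
    #[serde(default = "default_tasks_dir")]
    pub tasks_dir: String, // default: "tasks"
}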
```
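The API table below describes `find_and_load()` as searching up the directory tree for `.taskgraph.toml`. A std-only sketch of such a search is shown here; both function names are hypothetical, and starting from a caller-supplied directory (rather than the current working directory) is an assumption made to keep the example testable.

```rust
use std::path::{Path, PathBuf};

/// Lists the config paths checked when walking from `start` up to the root.
fn config_candidates(start: &Path) -> Vec<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(".taskgraph.toml"))
        .collect()
}

/// Returns the nearest candidate that exists on disk, if any.
fn find_config(start: &Path) -> Option<PathBuf> {
    config_candidates(start).into_iter().find(|path| path.is_file())
}
```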
**API:**
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `from_file()` | `&Path -> Result<Self>` | Config | Load from .taskgraph.toml |
| `find_and_load()` | `-> Option<Self>` | Config or None | Search up directory tree |
| `tasks_path()` | `&self -> PathBuf` | Path | Get tasks directory |
### 3.9 CLI Types (`src/cli.rs`)
```rust
#[derive(Clone, Copy, Debug, Default, ValueEnum)]
pub enum OutputFormat {
    Plain, // default
    Json,
}

// Abridged: clap attributes and subcommand field types are omitted below.
#[derive(Parser, Debug)]
pub struct Cli {
    pub path: Option<String>,
    pub format: OutputFormat,
    pub command: Commands,
}

#[derive(Subcommand, Debug)]
pub enum Commands {
    Init { id, name, scope, risk },
    Validate { strict },
    List { status, tag },
    Show { id },
    Deps { id },
    Dependents { id },
    Topo { status },
    Cycles,
    Parallel,
    Critical,
    Bottleneck,
    Risk,
    Decompose,
    WorkflowCost { include_completed, limit },
    RiskPath,
    Graph { output },
    Completions { shell },
}
```
The `Cli::execute()` method dispatches every command, building a fresh `TaskCollection` from the tasks directory on each invocation.
### 3.10 Lib.rs Public Re-exports
```rust
pub mod cli;
pub mod commands;
pub mod config;
pub mod discovery;
pub mod error;
pub mod graph;
pub mod task;
pub use config::Config;
pub use discovery::{DiscoveryError, TaskCollection, ValidationResult};
pub use error::{Error, Result};
pub use graph::DependencyGraph;
pub use task::{Task, TaskFrontmatter, TaskImpact, TaskLevel, TaskRisk, TaskScope, TaskStatus};
```
---
## 4. Functions/Methods to Expose via NAPI
### Priority 1: Core Data Types (Must Have)
These are the foundational types that everything else depends on:
| Rust Type | NAPI Class | Why |
|-----------|------------|-----|
| `Task` | `Task` | Central unit of work; must be creatable, readable, serializable from JS |
| `TaskFrontmatter` | Embedded in `Task` or separate class | All metadata is here; JS needs to read/write fields |
| `TaskStatus` | String enum mapping | Simple 5-variant enum; map to JS string union |
| `TaskScope` | String enum mapping | 5 variants with numeric mappings; map to JS string union |
| `TaskRisk` | String enum mapping | 5 variants with probability; map to JS string union |
| `TaskImpact` | String enum mapping | 4 variants with weight; map to JS string union |
| `TaskLevel` | String enum mapping | 5 variants; map to JS string union |
### Priority 2: Core Functions (Must Have)
| Rust Function | NAPI Method | Input | Output | Why |
|---------------|-------------|-------|--------|-----|
| `Task::from_markdown()` | `Task.fromMarkdown(content, source?)` | `string, string?` | `Task` | Parse task from markdown string |
| `Task::from_file()` | `Task.fromFile(path)` | `string` | `Task` | Parse task from file path |
| `Task::to_markdown()` | `task.toMarkdown()` | - | `string` | Serialize task back to markdown |
| `Task::id()` | `task.id` (getter) | - | `string` | Accessor |
| `Task::name()` | `task.name` (getter) | - | `string` | Accessor |
| `Task::status()` | `task.status` (getter) | - | `string` | Accessor |
| `Task::depends_on()` | `task.dependsOn` (getter) | - | `string[]` | Accessor |
| `TaskScope::token_estimate()` | `scope.tokenEstimate()` | - | `number` | Numeric mapping |
| `TaskScope::cost_estimate()` | `scope.costEstimate()` | - | `number` | Numeric mapping |
| `TaskRisk::success_probability()` | `risk.successProbability()` | - | `number` | Numeric mapping |
| `TaskImpact::weight()` | `impact.weight()` | - | `number` | Numeric mapping |
### Priority 3: Collection & Discovery (Must Have)
| Rust Function | NAPI Method | Input | Output | Why |
|---------------|-------------|-------|--------|-----|
| `TaskCollection::from_directory()` | `TaskCollection.fromDirectory(path)` | `string` | `TaskCollection` | Primary entry point: discover all tasks |
| `TaskCollection::new()` | `new TaskCollection()` | - | `TaskCollection` | Empty constructor for building manually |
| `TaskCollection::get()` | `collection.get(id)` | `string` | `Task\|null` | Lookup by ID |
| `TaskCollection::len()` | `collection.length` (getter) | - | `number` | Task count |
| `TaskCollection::ids()` | `collection.ids()` | - | `string[]` | All task IDs |
| `TaskCollection::tasks()` | `collection.tasks()` | - | `Task[]` | All tasks |
| `TaskCollection::errors()` | `collection.errors` (getter) | - | `DiscoveryError[]` | Parse errors |
| `TaskCollection::missing_dependencies()` | `collection.missingDependencies()` | - | `Record<string, string[]>` | Find broken deps |
| `TaskCollection::validate()` | `collection.validate()` | - | `ValidationResult` | Full validation |
### Priority 4: Graph Operations (Must Have)
| Rust Function | NAPI Method | Input | Output | Why |
|---------------|-------------|-------|--------|-----|
| `DependencyGraph::from_collection()` | `DependencyGraph.fromCollection(collection)` | `TaskCollection` | `DependencyGraph` | Build graph |
| `DependencyGraph::new()` | `new DependencyGraph()` | - | `DependencyGraph` | Empty graph constructor |
| `DependencyGraph::from_tasks()` | `DependencyGraph.fromTasks(tasks[])` | `Task[]` | `DependencyGraph` | Build from JS array |
| `add_task()` | `graph.addTask(id)` | `string` | `void` | Add node |
| `add_dependency()` | `graph.addDependency(from, to)` | `string, string` | `void` | Add edge |
| `has_cycles()` | `graph.hasCycles()` | - | `boolean` | Cycle detection |
| `find_cycles()` | `graph.findCycles()` | - | `string[][]` | Get actual cycles |
| `topological_order()` | `graph.topologicalOrder()` | - | `string[]\|null` | Execution order |
| `dependencies()` | `graph.dependencies(id)` | `string` | `string[]` | Direct deps |
| `dependents()` | `graph.dependents(id)` | `string` | `string[]` | What depends on this |
| `parallel_groups()` | `graph.parallelGroups()` | - | `string[][]` | Parallel work groups |
| `critical_path()` | `graph.criticalPath()` | - | `string[]` | Longest path |
| `weighted_critical_path()` | `graph.weightedCriticalPath(weightFn)` | `(id: string) => number` | `string[]` | Weighted longest path |
| `bottlenecks()` | `graph.bottlenecks()` | - | `[string, number][]` | Betweenness centrality |
| `to_dot()` | `graph.toDot()` | - | `string` | GraphViz DOT format |
### Priority 5: Config (Nice to Have)
| Rust Function | NAPI Method | Input | Output | Why |
|---------------|-------------|-------|--------|-----|
| `Config::from_file()` | `Config.fromFile(path)` | `string` | `Config` | Load config |
| `Config::find_and_load()` | `Config.findAndLoad()` | - | `Config\|null` | Auto-discover config |
| `Config::tasks_path()` | `config.tasksPath` (getter) | - | `string` | Get tasks dir |
### Priority 6: Workflow Cost Calculation (Nice to Have)
The `workflow_cost` command uses `calculate_task_ev()` which is a private function. Consider exposing:
| Function | NAPI Method | Input | Output | Why |
|----------|-------------|-------|--------|-----|
| `calculate_task_ev()` (currently private) | `calculateTaskEv(p, scopeCost, impactWeight)` | `number, number, number` | `number` | Expected value calculation |

This would need to be made `pub` or reimplemented in the NAPI layer.
### Notes on `weighted_critical_path` for NAPI
The `weighted_critical_path` takes a Rust closure `F: Fn(&str) -> f64`. For NAPI, this would need to:
1. Accept a JavaScript function callback, OR
2. Accept a `Record<string, number>` map of task ID -> weight
Option 2 is simpler and avoids cross-language callback overhead. For example:
```typescript
interface DependencyGraph {
  // Option 1: callback approach (complex)
  weightedCriticalPath(weightFn: (taskId: string) => number): string[];
  // Option 2: map approach (simpler)
  weightedCriticalPath(weights: Record<string, number>): string[];
}
```
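On the Rust side, the map approach amounts to a longest-path search over the DAG with weights looked up in a `HashMap`. The sketch below is a std-only illustration under stated assumptions: the free function, its `order` parameter (a precomputed topological order), and the rule that IDs missing from the map weigh `0.0` are all hypothetical, and the crate's own algorithm may differ. In a binding layer, the same map could instead be adapted to the existing closure parameter, for example with a closure that performs the lookup.

```rust
use std::collections::HashMap;

/// Highest-weight path through a DAG.
/// `order` must be a topological order of the task IDs; `edges` holds
/// `(from, to)` pairs; IDs missing from `weights` count as 0.0.
fn weighted_critical_path(
    order: &[&str],
    edges: &[(&str, &str)],
    weights: &HashMap<String, f64>,
) -> Vec<String> {
    let weight = |id: &str| weights.get(id).copied().unwrap_or(0.0);
    // best[id] = (total weight of the best path ending at id, predecessor)
    let mut best: HashMap<&str, (f64, Option<&str>)> = HashMap::new();
    for &id in order {
        let mut entry = (weight(id), None);
        for &(from, to) in edges {
            if to == id {
                // Predecessors were already processed thanks to `order`.
                if let Some(&(total, _)) = best.get(from) {
                    if total + weight(id) > entry.0 {
                        entry = (total + weight(id), Some(from));
                    }
                }
            }
        }
        best.insert(id, entry);
    }
    // Pick the endpoint with the largest total, then walk predecessors back.
    let mut current = order
        .iter()
        .copied()
        .max_by(|a, b| best[*a].0.total_cmp(&best[*b].0));
    let mut path = Vec::new();
    while let Some(id) = current {
        path.push(id.to_string());
        current = best[id].1;
    }
    path.reverse();
    path
}
```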
---
## 5. Serialization (Serde) Support
### Full Serde Support (Serialize + Deserialize)
| Type | Serialize | Deserialize | Notes |
|------|-----------|-------------|-------|
| `TaskStatus` | Yes | Yes | `rename_all = "kebab-case"` |
| `TaskScope` | Yes | Yes | `rename_all = "kebab-case"` |
| `TaskRisk` | Yes | Yes | `rename_all = "kebab-case"` |
| `TaskImpact` | Yes | Yes | `rename_all = "kebab-case"` |
| `TaskLevel` | Yes | Yes | `rename_all = "kebab-case"` |
| `TaskFrontmatter` | Yes | Yes | Rich serde attributes (skip_serializing_if, rename, default) |
| `Config` | Yes | Yes | Via TOML |
| `ProjectConfig` | Yes | Yes | Via TOML |
### No Serde Support
| Type | Serialize | Deserialize | Reason |
|------|-----------|-------------|--------|
| `Task` | No | No | `body` and `source` are separate from frontmatter; `to_markdown()` handles serialization manually |
| `DependencyGraph` | No | No | Computed structure; rebuilt from tasks each time |
| `TaskCollection` | No | No | Procedurally built from directory scanning |
| `DiscoveryError` | No | No | Error reporting struct |
| `ValidationResult` | No | No | Internal result type |
| `Error` | No | No | Error enum |
| `OutputFormat` | No | No | CLI-only (ValueEnum, not serde) |
| `Cli` | No | No | CLI-only (clap derive) |
| `Commands` | No | No | CLI-only enum |
### JSON Serialization in Commands (Ad-hoc)
Several command modules define private structs that derive `Serialize` for JSON output:
| File | Struct | Fields |
|------|--------|--------|
| `validate.rs` | `ValidationOutput` | valid, task_count, error_count, errors[], missing_deps |
| `validate.rs` | `ValidationError` | path, message |
| `list.rs` | `TaskSummary` | id, name, status, scope |
| `show.rs` | `TaskDetails` | id, name, status, depends_on, scope, risk, impact, level, tags, body |
| `deps.rs` | `DependencyInfo` | id, status, exists |
| `deps.rs` | `DependenciesOutput` | task_id, dependencies[] |
| `topo.rs` | `TopoTask` | position, id, name, status |
| `topo.rs` | `TopoOutput` | order[], has_cycles |
| `cycles.rs` | `CyclesOutput` | has_cycles, cycle_count, cycles[] |
| `workflow_cost.rs` | `TaskCost` | id, name, cost |

These are **private** to each command module and not part of the public API. For NAPI, we would define equivalent TypeScript interfaces or create new public serializable structs.
### Serialization Format Details
**YAML (frontmatter):** `TaskFrontmatter` uses `serde_yaml` with:
- `rename_all = "kebab-case"` on enums → `in-progress`, `narrow`, `high`, etc.
- `rename = "depends_on"` on the `depends_on` field (explicit)
- `default` on every field except `id` and `name`, so omitted keys fall back to their default values
- `skip_serializing_if = "Option::is_none"` for optional fields
- `skip_serializing_if = "Vec::is_empty"` for tags
**JSON (output):** Uses `serde_json::to_string_pretty()` in commands.
**TOML (config):** `Config` uses `toml::from_str()`.
**Roundtrip:** `Task::from_markdown()` followed by `Task::to_markdown()` should produce equivalent output (tested implicitly).

---
## 6. Error Types and Error Handling
### Library Error Type (`src/error.rs`)
```rust
#[derive(Error, Debug)]
pub enum Error {
    #[error("Task not found: {0}")]
    TaskNotFound(String),
    #[error("Task already exists: {0}")]
    TaskAlreadyExists(String),
    #[error("Circular dependency detected: {0}")]
    CircularDependency(String),
    #[error("Invalid frontmatter in {file}: {message}")]
    InvalidFrontmatter { file: String, message: String },
    #[error("Missing required field '{field}' in {file}")]
    MissingField { file: String, field: String },
    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
    #[error("YAML parsing error: {0}")]
    Yaml(#[from] serde_yaml::Error),
    #[error("JSON serialization error: {0}")]
    Json(#[from] serde_json::Error),
    #[error("Graph error: {0}")]
    Graph(String),
}

pub type Result<T> = std::result::Result<T, Error>;
```
**Error conversion:** `From` impls via `#[from]` for `std::io::Error`, `serde_yaml::Error`, `serde_json::Error`.
**Usage patterns:**
- Library code returns `crate::Result<T>` (= `Result<T, Error>`)
- `anyhow::Result` is used only at the binary boundary (`main.rs` and `Cli::execute()`)
- `thiserror` provides `Display` impls automatically
### CLI Error Handling
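The `Cli::execute()` method returns `anyhow::Result<()>`, while each command function returns `crate::Result<()>`. The `?` operator bridges the two: `crate::Error` implements `std::error::Error` (derived by `thiserror`), so it converts into `anyhow::Error` automatically.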
**Error handling at boundaries:**
- `Task::from_file()`: IO errors → `Error::Io`, parse errors → `Error::InvalidFrontmatter`
- `TaskCollection::from_directory()`: Silently skips files without frontmatter, stores errors in `DiscoveryError` list (non-fatal)
- `Config::from_file()`: TOML parse errors → `Error::Graph(format!(...))` (note: reuses Graph variant)
- Command functions: `Error::TaskNotFound` when task ID missing, `Error::TaskAlreadyExists` on duplicate init
### NAPI Error Mapping Strategy
For the Node.js wrapper, we should map:
| Rust Error | Node.js Error | Notes |
|------------|---------------|-------|
| `TaskNotFound(id)` | Generic `Error` with message | JS: `throw new Error("Task not found: <id>")` |
| `TaskAlreadyExists(id)` | Generic `Error` with message | JS: `throw new Error("Task already exists: <id>")` |
| `CircularDependency(msg)` | Generic `Error` with message | JS: `throw new Error("Circular dependency detected: <msg>")` |
| `InvalidFrontmatter { file, message }` | Generic `Error` with message | JS: `throw new Error("Invalid frontmatter in <file>: <message>")` |
| `MissingField { file, field }` | Generic `Error` with message | JS: `throw new Error("Missing required field '<field>' in <file>")` |
| `Io(err)` | Generic `Error` with message | JS: `throw new Error("IO error: <message>")` |
| `Yaml(err)` | Generic `Error` with message | JS: `throw new Error("YAML parsing error: <message>")` |
| `Json(err)` | Generic `Error` with message | JS: `throw new Error("JSON serialization error: <message>")` |
| `Graph(msg)` | Generic `Error` with message | JS: `throw new Error("Graph error: <msg>")` |

Alternatively, we could create custom JS error classes for better programmatic handling:
```typescript
class TaskNotFoundError extends Error { taskId: string }
class CircularDependencyError extends Error { }
class InvalidFrontmatterError extends Error { file: string; message: string }
```
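Whichever JS shape is chosen, the Rust side needs a stable way to tell variants apart without parsing messages. One possible approach, sketched with std only: pair each variant with a machine-readable code next to its `Display` message. The enum below is a trimmed stand-in with three of the nine variants and a hand-written `Display` (the real type derives it via `thiserror`); `error_code`, `to_js_payload`, and the code strings are hypothetical.

```rust
use std::fmt;

/// Trimmed stand-in for the library `Error` enum (three variants only).
#[derive(Debug)]
enum Error {
    TaskNotFound(String),
    CircularDependency(String),
    InvalidFrontmatter { file: String, message: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Messages copied from the `#[error(...)]` attributes above.
        match self {
            Error::TaskNotFound(id) => write!(f, "Task not found: {}", id),
            Error::CircularDependency(msg) => {
                write!(f, "Circular dependency detected: {}", msg)
            }
            Error::InvalidFrontmatter { file, message } => {
                write!(f, "Invalid frontmatter in {}: {}", file, message)
            }
        }
    }
}

/// Hypothetical stable code that a JS wrapper could branch on.
fn error_code(err: &Error) -> &'static str {
    match err {
        Error::TaskNotFound(_) => "TASK_NOT_FOUND",
        Error::CircularDependency(_) => "CIRCULAR_DEPENDENCY",
        Error::InvalidFrontmatter { .. } => "INVALID_FRONTMATTER",
    }
}

/// Pairs the code with the `Display` message for the binding layer.
fn to_js_payload(err: &Error) -> (String, String) {
    (error_code(err).to_string(), err.to_string())
}
```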
---
## 7. Input/Output Patterns
### Data Flow Overview
```
                   DISCOVERY
tasks/*.md files ──────────────> TaskCollection
(disk)                           (HashMap<String, Task>)
                                        │
                                        │ from_collection() / from_tasks()
                                        ▼
                                 DependencyGraph
                              (DiGraph<String, ()>)
                                        │
                ┌───────────────────────┼───────────────────────┐
                ▼                       ▼                       ▼
       topological_order()      parallel_groups()        critical_path()
                │                       │                       │
                └───────────────────────┬───────────────────────┘
                                        │
                                        ▼
                               Output (plain/JSON)
```
### Input Patterns
1. **File-based input (primary):** `TaskCollection::from_directory(path)` scans a directory recursively for `.md` files, parses each, and builds the collection. This is the main entry point.
2. **String-based input:** `Task::from_markdown(content, source)` parses a single markdown string. Useful for programmatic construction.
3. **Path-based input:** `Task::from_file(path)` reads a single file and parses it.
4. **Programmatic construction:** `DependencyGraph::new()` + `add_task()` + `add_dependency()` for building graphs manually.
### Output Patterns
1. **Plain text (default):** Human-readable terminal output with tables, arrows, and formatting.
2. **JSON output (`--format json`):** Structured JSON using ad-hoc `Serialize` structs in each command. This is the primary programmatic output format.
3. **DOT format:** `DependencyGraph::to_dot()` returns GraphViz DOT format string.
4. **Markdown roundtrip:** `Task::to_markdown()` produces valid markdown with YAML frontmatter.
### Typical Usage Flow
```rust
use std::path::Path;
use taskgraph::{DependencyGraph, TaskCollection};

fn main() {
    // 1. Discover tasks
    let collection = TaskCollection::from_directory(Path::new("./tasks"));
    // 2. Validate
    let result = collection.validate();
    if !result.is_valid() { /* handle errors */ }
    // 3. Build graph
    let graph = DependencyGraph::from_collection(&collection);
    // 4. Analyze
    let has_cycles = graph.has_cycles();
    let order = graph.topological_order();
    let parallel = graph.parallel_groups();
    let critical = graph.critical_path();
    let bottlenecks = graph.bottlenecks();
}
### NAPI Data Flow Design
For the Node.js wrapper, the recommended data flow is:
```typescript
// Option A: File-based (mirrors Rust CLI)
const collection = TaskCollection.fromDirectory('./tasks');
const graphA = DependencyGraph.fromCollection(collection);
// Option B: Programmatic (unique to NAPI)
const tasks = [
  Task.fromMarkdown('---\nid: t1\nname: Task 1\n---\nBody'),
  Task.fromMarkdown('---\nid: t2\nname: Task 2\ndepends_on: [t1]\n---\nBody'),
];
const graphB = DependencyGraph.fromTasks(tasks);
// Option C: Manual graph construction
const graphC = new DependencyGraph();
graphC.addTask('t1');
graphC.addTask('t2');
graphC.addDependency('t1', 't2');
```
### Memory/Ownership Considerations for NAPI
- `Task` is `Clone`; each clone deep-copies the `String` body, the `TaskFrontmatter`, and the `Option<String>` source, so the cost scales with task size (small for typical task files)
- `TaskCollection` owns all `Task` objects (HashMap<String, Task>)
- `DependencyGraph` owns the graph structure (not the tasks themselves; only stores task IDs as node weights)
- `DependencyGraph::from_collection()` borrows `&TaskCollection` (doesn't take ownership)
- `Task::from_file()` and `from_markdown()` return owned `Task` values
For NAPI, we need to decide:
1. **Should `TaskCollection` hold JS-managed task objects or Rust-owned?** Probably Rust-owned (tasks are parsed from files/strings, not constructed in JS).
2. **Should graph operations return strings or Task references?** Currently returns `Vec<TaskId>` (strings). The JS side can look up tasks from the collection. This is efficient.
3. **Should `DependencyGraph` keep a reference to `TaskCollection`?** Currently no. This means JS must pass the collection alongside the graph for enriched output. We could create a combined `TaskGraph` class in the NAPI layer.
---
## 8. Existing Tests and Benchmarks
### Unit Tests (in-source)
| File | Test Count | Key Tests |
|------|-----------|-----------|
| `src/graph.rs` | 12 | Empty graph, add task/dep, missing deps, cycle detection, topo sort, parallel groups, critical path, bottleneck, DOT output, unknown task queries |
| `src/discovery.rs` | 5 | Single task discovery, skip files without frontmatter, duplicate ID detection, missing dependencies, validation result |
| `src/config.rs` | 2 | Default config, load from file |
### Integration Tests (`tests/integration/commands.rs`)
The table below lists 31 tests, all using `assert_cmd`:
| Test | Command | What It Verifies |
|------|---------|-----------------|
| `test_list_command` | `list` | Lists all 3 fixture tasks |
| `test_list_with_status_filter` | `list --status completed` | Filters correctly |
| `test_show_command` | `show task-one` | Shows task details |
| `test_show_missing_task` | `show missing-task` | Fails on missing |
| `test_validate_command` | `validate` | Succeeds on valid fixtures |
| `test_validate_with_missing_dependency` | `validate` (invalid) | Reports missing deps |
| `test_topo_command` | `topo` | Outputs topological order |
| `test_deps_command` | `deps task-two` | Shows task-one as dependency |
| `test_dependents_command` | `dependents task-one` | Shows tasks two and three |
| `test_cycles_command_no_cycles` | `cycles` | No cycles in valid fixtures |
| `test_cycles_command_with_cycles` | `cycles` (cycles fixtures) | Detects cycle |
| `test_parallel_command` | `parallel` | Shows generation groups |
| `test_critical_command` | `critical` | Shows critical path |
| `test_graph_command` | `graph` | Outputs DOT format |
| `test_bottleneck_command` | `bottleneck` | Shows bottleneck tasks |
| `test_init_command` | `init new-task` | Creates file |
| `test_init_duplicate_task` | `init task-one` | Fails on duplicate |
| `test_init_with_options` | `init --scope narrow --risk low` | Writes scope/risk to file |
| `test_risk_command` | `risk` | Distribution with counts |
| `test_risk_command_empty` | `risk` (empty dir) | "No tasks found" |
| `test_decompose_command` | `decompose` | Flags high-risk/broad-scope tasks |
| `test_decompose_command_none_needed` | `decompose` (low-risk tasks) | "No tasks need decomposition" |
| `test_workflow_cost_command` | `workflow-cost` | Shows cost analysis |
| `test_workflow_cost_command_empty` | `workflow-cost` (empty) | "No tasks found" |
| `test_risk_path_command` | `risk-path` | Shows risk path |
| `test_risk_path_command_empty` | `risk-path` (empty) | "No tasks found" |
| `test_help_flag` | `--help` | Shows help text |
| `test_version_flag` | `--version` | Succeeds |
| `test_completions_bash` | `completions bash` | Bash completion output |
| `test_completions_zsh` | `completions zsh` | Zsh completion output |
| `test_completions_fish` | `completions fish` | Fish completion output |
### Benchmark Suite (`benches/graph_benchmarks.rs`)
Uses Criterion. Two benchmark groups:
1. **`load_tasks`**: Measures `TaskCollection::from_directory()` + `DependencyGraph::from_collection()` for 50, 100, 500, 1000 tasks.
2. **`graph_ops`**: On 1000-task graph, measures:
- `topological_sort_1000`
- `cycle_detection_1000`
- `critical_path_1000`
- `bottlenecks_1000`
Test data: linear chain of tasks (task-i depends on task-(i-1)).
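A benchmark for the Node.js wrapper could reuse the same topology. The following is a minimal sketch, assuming the frontmatter fields shown in the data flow examples (`id`, `name`, `depends_on`); `linearChainFixtures` is a hypothetical helper, and how the Rust benchmark generates its own files is not shown here.

```typescript
// Build markdown sources for a linear chain: task-i depends on task-(i-1).
function linearChainFixtures(count: number): string[] {
  const sources: string[] = [];
  for (let i = 0; i < count; i++) {
    const lines = ["---", `id: task-${i}`, `name: Task ${i}`];
    // The first task has no predecessor, so it gets no depends_on entry.
    if (i > 0) lines.push(`depends_on: [task-${i - 1}]`);
    lines.push("---", `Body of task ${i}`);
    sources.push(lines.join("\n"));
  }
  return sources;
}
```

Each returned string has the shape that `Task.fromMarkdown` is expected to accept, so the output could feed `DependencyGraph.fromTasks` directly.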
### Performance Numbers (from README)
| Tasks | Load Time | Topo Sort | Cycles | Critical Path |
|-------|-----------|-----------|--------|---------------|
| 50 | 3ms | 3ms | 2ms | 8ms |
| 500 | 19ms | 21ms | 14ms | 52ms |
| 1,000 | 34ms | 42ms | 26ms | 82ms |
(Benchmarked on AMD EPYC 9004 series)
### CI Pipeline (`.github/workflows/ci.yml`)
Two jobs:
1. **Test**: checkout -> install Rust (with rustfmt, clippy) -> cache -> fmt check -> clippy -> test -> build release
2. **Coverage**: checkout -> install Rust -> cache -> install cargo-llvm-cov -> generate lcov -> upload to Codecov
### Test Coverage
Reported at 89%, above the 80% target set in AGENTS.md.
---
## Appendix A: Complete Type Reference for NAPI Mapping
### Enums to JS String Unions
```typescript
// task.ts
type TaskStatus = "pending" | "in-progress" | "completed" | "failed" | "blocked";
type TaskScope = "single" | "narrow" | "moderate" | "broad" | "system";
type TaskRisk = "trivial" | "low" | "medium" | "high" | "critical";
type TaskImpact = "isolated" | "component" | "phase" | "project";
type TaskLevel = "planning" | "decomposition" | "implementation" | "review" | "research";
```
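String unions exist only at compile time, so values arriving from untyped input (for example, user-supplied JSON) may need a runtime check. A minimal sketch for `TaskStatus` follows; `TASK_STATUSES`, `isTaskStatus`, and `parseTaskStatus` are hypothetical helper names, and the same pattern would apply to the other enums.

```typescript
// Single source of truth: the union type is derived from the runtime array.
const TASK_STATUSES = ["pending", "in-progress", "completed", "failed", "blocked"] as const;
type TaskStatus = (typeof TASK_STATUSES)[number];

// Type guard: narrows an unknown value to TaskStatus.
function isTaskStatus(value: unknown): value is TaskStatus {
  return (
    typeof value === "string" &&
    (TASK_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Throwing variant for call sites that cannot continue with bad input.
function parseTaskStatus(value: unknown): TaskStatus {
  if (!isTaskStatus(value)) {
    throw new TypeError(`invalid task status: ${String(value)}`);
  }
  return value;
}
```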
### Proposed NAPI Class Structure
```typescript
// task.ts
class Task {
// Static constructors
static fromMarkdown(content: string, source?: string): Task;
static fromFile(path: string): Task;
// Getters
get id(): string;
get name(): string;
get status(): TaskStatus;
get dependsOn(): string[];
get body(): string;
get source(): string | null;
// Frontmatter access (via JS object)
get frontmatter(): TaskFrontmatter;
// Serialization
toMarkdown(): string;
}
interface TaskFrontmatter {
id: string;
name: string;
status: TaskStatus;
dependsOn: string[];
priority?: string;
tags: string[];
created?: string; // ISO 8601
modified?: string; // ISO 8601
assignee?: string;
due?: string;
scope?: TaskScope;
risk?: TaskRisk;
impact?: TaskImpact;
level?: TaskLevel;
}
// collection.ts
class TaskCollection {
static fromDirectory(path: string): TaskCollection;
get(id: string): Task | null;
get length(): number;
ids(): string[];
tasks(): Task[];
get errors(): DiscoveryError[];
missingDependencies(): Record<string, string[]>;
validate(): ValidationResult;
}
interface DiscoveryError {
path: string;
message: string;
}
interface ValidationResult {
taskCount: number;
errors: DiscoveryError[];
missingDependencies: Record<string, string[]>;
isValid(): boolean;
issueCount(): number;
}
// graph.ts
class DependencyGraph {
static fromCollection(collection: TaskCollection): DependencyGraph;
static fromTasks(tasks: Task[]): DependencyGraph;
addTask(id: string): void;
addDependency(from: string, to: string): void;
hasCycles(): boolean;
findCycles(): string[][];
topologicalOrder(): string[] | null;
dependencies(taskId: string): string[];
dependents(taskId: string): string[];
parallelGroups(): string[][];
criticalPath(): string[];
weightedCriticalPath(weights: Record<string, number>): string[];
bottlenecks(): [string, number][];
toDot(): string;
}
// config.ts
class Config {
static fromFile(path: string): Config;
static findAndLoad(): Config | null;
get tasksPath(): string;
}
// workflow.ts
function calculateTaskEv(p: number, scopeCost: number, impactWeight: number): number;
```
### Key Decisions for NAPI Implementation
1. **Task mutability:** The Rust `Task` struct is `Clone` but has no setters. For NAPI, we should either:
- Make the JS `Task` immutable (read-only after creation) - simpler, matches Rust
- Add a `TaskBuilder` pattern for constructing tasks programmatically
2. **Enum representation:** Use JS string literals (not numeric enums) to match the `kebab-case` serde serialization.
3. **Error handling:** Throw JS `Error` objects from NAPI. Consider custom error classes for `TaskNotFound` and `InvalidFrontmatter`.
4. **DateTime handling:** `chrono::DateTime<Utc>` maps to ISO 8601 strings in JS. No need for JS `Date` objects in the NAPI layer.
5. **Graph lifetime:** The Rust `DependencyGraph` borrows nothing (stores owned `String` node weights). It can be freely moved/owned in NAPI.
6. **Collection lifetime:** `TaskCollection` owns its tasks. The NAPI class should hold the Rust struct. Returning `Task` references from `collection.get()` requires careful lifetime management - consider returning clones.
7. **`weighted_critical_path` callback:** Replace the Rust closure with a JS `Record<string, number>` dict lookup to avoid FFI callback overhead and complexity.
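For decision 3, the custom error classes could look like the sketch below. This is an illustration only: the class names are derived from `TaskNotFound` and `InvalidFrontmatter`, the `taskId` and `sourcePath` fields and the `requireTask` helper are hypothetical, and how the native layer would construct these errors is left open.

```typescript
// Hypothetical JS-side error classes for the NAPI wrapper.
class TaskNotFoundError extends Error {
  readonly taskId: string;

  constructor(taskId: string) {
    super(`task not found: ${taskId}`);
    this.name = "TaskNotFoundError";
    this.taskId = taskId;
    // Keeps instanceof working when compiled to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class InvalidFrontmatterError extends Error {
  readonly sourcePath: string | null;

  constructor(message: string, sourcePath: string | null = null) {
    super(message);
    this.name = "InvalidFrontmatterError";
    this.sourcePath = sourcePath;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Example call site: a lookup that throws instead of returning undefined.
function requireTask<T>(tasks: Map<string, T>, id: string): T {
  const task = tasks.get(id);
  if (task === undefined) throw new TaskNotFoundError(id);
  return task;
}
```

Callers can then branch with `instanceof` rather than matching on message strings.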
---
## Appendix B: Notable Implementation Details
### Bottleneck Algorithm
The current `bottlenecks()` implementation runs in O(n^2 * P) time, where P is the number of paths between a pair of nodes. It enumerates all paths between all pairs, then counts how many paths each task appears on. This is **not** true betweenness centrality (which Brandes' algorithm computes in O(VE) on unweighted graphs) but a simpler path-counting approach. Because P can grow exponentially with graph size in a densely connected DAG, this could be slow on large graphs. The benchmark only tests up to 1000 nodes with linear topology, where any two nodes are joined by at most one path, so it does not exercise this case.
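To illustrate why path enumeration may scale poorly beyond the linear topology used in the benchmark, the sketch below counts source-to-sink paths in two synthetic DAGs. It is not the Rust implementation; `countPaths`, `linearChain`, and `diamondChain` are hypothetical helpers, and the count itself is computed with memoization so the demonstration stays fast.

```typescript
type Adjacency = Map<string, string[]>;

// Count distinct paths from `from` to `to` in a DAG using memoized DFS.
function countPaths(adj: Adjacency, from: string, to: string): number {
  const memo = new Map<string, number>();
  const visit = (node: string): number => {
    if (node === to) return 1;
    const cached = memo.get(node);
    if (cached !== undefined) return cached;
    let total = 0;
    for (const next of adj.get(node) ?? []) total += visit(next);
    memo.set(node, total);
    return total;
  };
  return visit(from);
}

// Linear chain n0 -> n1 -> ... -> n(length-1): one path end to end.
function linearChain(length: number): Adjacency {
  const adj: Adjacency = new Map();
  for (let i = 0; i + 1 < length; i++) adj.set(`n${i}`, [`n${i + 1}`]);
  return adj;
}

// k stacked diamonds: j(i) -> a(i) and b(i), both -> j(i+1).
// Each diamond doubles the number of paths from j0 to jk.
function diamondChain(k: number): Adjacency {
  const adj: Adjacency = new Map();
  for (let i = 0; i < k; i++) {
    adj.set(`j${i}`, [`a${i}`, `b${i}`]);
    adj.set(`a${i}`, [`j${i + 1}`]);
    adj.set(`b${i}`, [`j${i + 1}`]);
  }
  return adj;
}

const linearPathCount = countPaths(linearChain(1000), "n0", "n999"); // 1
const diamondPathCount = countPaths(diamondChain(20), "j0", "j20"); // 2^20 = 1048576
```

A 61-node graph of 20 diamonds already has 1,048,576 end-to-end paths, whereas the 1000-node linear chain has exactly one. An algorithm that enumerates paths does work proportional to those counts.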
### Critical Path Algorithm
Uses a recursive, memoized longest-path computation. It works well for DAGs but returns empty or incorrect results if the graph contains a cycle; `parallel_groups` has the same silent failure mode. Callers should check `has_cycles()` before relying on either result.
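The general technique can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not a port of the Rust code: the `dependsOn` and `weights` input shapes are hypothetical, weights are assumed positive with a default of 1, and the sketch throws on a cycle instead of returning an empty result.

```typescript
// dependsOn: task ID -> IDs it depends on. weights: task ID -> cost.
function weightedCriticalPath(
  dependsOn: Record<string, string[]>,
  weights: Record<string, number> = {},
): string[] {
  // Memo table: cost of the most expensive chain ending at each task.
  const best = new Map<string, { cost: number; prev: string | null }>();
  const inProgress = new Set<string>();

  const visit = (id: string): number => {
    const done = best.get(id);
    if (done !== undefined) return done.cost;
    if (inProgress.has(id)) throw new Error(`cycle detected at ${id}`);
    inProgress.add(id);

    // Pick the most expensive dependency chain as the predecessor.
    let prev: string | null = null;
    let prevCost = 0;
    for (const dep of dependsOn[id] ?? []) {
      const cost = visit(dep);
      if (cost > prevCost) {
        prevCost = cost;
        prev = dep;
      }
    }
    inProgress.delete(id);

    const total = prevCost + (weights[id] ?? 1);
    best.set(id, { cost: total, prev });
    return total;
  };

  // Find the task where the most expensive chain ends.
  let end: string | null = null;
  let endCost = 0;
  for (const id of Object.keys(dependsOn)) {
    const cost = visit(id);
    if (cost > endCost) {
      endCost = cost;
      end = id;
    }
  }

  // Walk predecessors back to the start, then reverse into execution order.
  const path: string[] = [];
  let cur: string | null = end;
  while (cur !== null) {
    path.push(cur);
    cur = best.get(cur)?.prev ?? null;
  }
  return path.reverse();
}

const exampleDeps: Record<string, string[]> = { a: [], b: ["a"], c: ["a"], d: ["b", "c"] };
const unweightedPath = weightedCriticalPath(exampleDeps); // ["a", "b", "d"]
const weightedPath = weightedCriticalPath(exampleDeps, { c: 5 }); // ["a", "c", "d"]
```

Passing weights as a plain object matches the `Record<string, number>` signature proposed for `weightedCriticalPath` in Appendix A, and ties between equally expensive chains resolve to the dependency listed first.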
### Missing: Task Serialization
`Task` does not implement `Serialize`/`Deserialize`. The `to_markdown()` method manually concatenates YAML frontmatter + markdown body. If we need JSON serialization of the full `Task` (including body), we should add a new serializable struct like:
```rust
use serde::Serialize;

// Assumes `TaskFrontmatter` already derives `Serialize`.
#[derive(Serialize)]
pub struct SerializableTask {
    pub frontmatter: TaskFrontmatter,
    pub body: String,
    pub source: Option<String>,
}
```
Or implement `Serialize` for `Task` directly.
### Missing: Task Mutability
There are no methods to update a task's status, dependencies, etc. in place. The current design assumes files are the source of truth and are edited directly. For an NAPI wrapper, we may want to add:
- `task.set_status(status: TaskStatus)`
- `task.set_depends_on(deps: Vec<String>)`
- etc.
Or use a builder pattern for creating new tasks.
### Missing: Partial Graph Building
`DependencyGraph::from_collection()` adds edges only for dependencies that exist as nodes in the graph. Missing dependencies are silently ignored (no error, no warning). This matches the `add_dependency()` behavior which checks `index_map` before adding edges.
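If the NAPI layer should surface these skipped dependencies rather than drop them, the edge-building step could report them alongside the edges it keeps. The sketch below assumes a simplified task shape; `TaskDeps`, `BuiltEdges`, and `buildEdges` are hypothetical names, and the `[dependency, dependent]` pair order is chosen for illustration only.

```typescript
interface TaskDeps {
  id: string;
  dependsOn: string[];
}

interface BuiltEdges {
  // [dependency, dependent] pairs whose endpoints both exist.
  edges: [string, string][];
  // Task ID -> dependency IDs that were skipped because no such task exists.
  missing: Record<string, string[]>;
}

function buildEdges(tasks: TaskDeps[]): BuiltEdges {
  const known = new Set(tasks.map((t) => t.id));
  const edges: [string, string][] = [];
  const missing: Record<string, string[]> = {};

  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      if (known.has(dep)) {
        edges.push([dep, task.id]);
      } else {
        // Record the skipped dependency instead of ignoring it silently.
        const skipped = missing[task.id] ?? [];
        skipped.push(dep);
        missing[task.id] = skipped;
      }
    }
  }
  return { edges, missing };
}
```

The `missing` record has the same shape as the `missingDependencies()` return type proposed in Appendix A, so a caller could warn on it without a second pass over the collection.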
### walkdir: follow_links(false)
`TaskCollection::from_directory()` does not follow symlinks. This is intentional for safety.
---
## Appendix C: Dependency Version Compatibility Notes
| Crate | Version | Notes for NAPI |
|-------|---------|---------------|
| `petgraph` | `0.7` | Stable API; `DiGraph` and algorithms are well-defined |
| `gray_matter` | `0.2` | Minor version; API may change in `0.3` |
| `serde` | `1.0` | Very stable; `derive` feature needed |
| `serde_json` | `1.0` | Very stable |
| `serde_yaml` | `0.9` | Note: `0.9` is the final release line; the crate is deprecated and archived upstream, so a maintained replacement may be needed eventually |
| `chrono` | `0.4` | Stable; `serde` feature for serialization |
| `clap` | `4.5` | CLI-only; not needed in NAPI lib |
| `thiserror` | `2.0` | Error derive; v2 is newer than commonly seen |
| `toml` | `0.8` | For config loading |
| `walkdir` | `2.5` | For directory scanning |
For NAPI, we can exclude from the build:
- `clap` / `clap_complete` (CLI-only, not needed for library)
- `tracing` / `tracing-subscriber` (logging, optional)
- `dirs` (platform directories, only for CLI default paths)
This could be done with feature flags:
```toml
[features]
default = ["cli"]
# Each crate listed under `cli` must be declared with `optional = true` in [dependencies].
cli = ["clap", "clap_complete", "tracing", "tracing-subscriber", "dirs"]
napi = [] # Minimal dependencies for Node.js binding
```