---
status: draft
last_updated: 2026-04-26
---

# Cost-Benefit Analysis

Expected value math, risk analysis, DAG-propagation cost model, and cycle detection.

## Overview

The cost-benefit functions are the key analytical value of the library. They go beyond simple graph topology to answer structural questions about task workflows: which path has the highest cumulative risk? What's the expected cost of a workflow? Which tasks should be decomposed?

These functions implement the cost-benefit framework from `/workspace/@alkimiadev/taskgraph/docs/framework.md` and extend it with DAG propagation (from the Python research model), which the Rust CLI's independent model ignores.

## Core Concepts

### Expected Value of a Task

```
EV_task = P_success × C_success + (1 - P_success) × C_fail
```

Where categorical fields provide the inputs:

- `P_success` = `riskSuccessProbability(risk)` — probability the task completes successfully
- `C_success` = `scopeCostEstimate(scope)` — cost when it works
- `C_fail` is modeled via `EvConfig` parameters as `scopeCost + fallbackCost + timeLost × expectedRetries`. `calculateTaskEv` uses `scopeCost` as `C_success`, and derives `C_fail` from that same `scopeCost` plus `fallbackCost` and `timeLost` scaled by the expected retry count. Both `fallbackCost` and `timeLost` default to 0, so in the simplest case `C_fail = C_success`. The `valueRate` parameter converts the result to dollar terms if needed.
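As a worked sketch, the `C_fail` derivation above can be written out directly. This is a minimal illustration; the field names mirror the prose and are assumptions, not the library's exact `EvConfig` shape.

```typescript
// Minimal sketch of the EV formula above. Field names mirror the prose;
// they are assumptions, not the library's exact EvConfig shape.
interface EvConfigSketch {
  pSuccess: number;         // P_success = riskSuccessProbability(risk)
  scopeCost: number;        // C_success = scopeCostEstimate(scope)
  fallbackCost?: number;    // extra cost incurred on failure (default 0)
  timeLost?: number;        // time cost per retry (default 0)
  expectedRetries?: number; // expected retry count (default 0)
}

function expectedTaskCost(cfg: EvConfigSketch): number {
  const {
    pSuccess,
    scopeCost,
    fallbackCost = 0,
    timeLost = 0,
    expectedRetries = 0,
  } = cfg;
  const cSuccess = scopeCost;
  const cFail = scopeCost + fallbackCost + timeLost * expectedRetries;
  return pSuccess * cSuccess + (1 - pSuccess) * cFail;
}
```

With the defaults, `cFail` collapses to `cSuccess` and the expected cost is simply `scopeCost`, matching the simplest case described above.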

### Structural Insight: Upstream Failures Multiply

```
planning failure → wrong decomposition → wasted implementation
decomposition failure → unclear tasks → rework
review failure → bugs shipped → rework
```

This means `risk: critical` at the planning level matters more than `risk: critical` at the implementation level. The cost-benefit framework demonstrates this: poor planning (p=0.65) increases total cost by 150% compared to good planning (p=0.92), even with identical implementation tasks.

The failure propagates: poor planning reduces decomposition quality, which reduces implementation effectiveness, which increases integration issues. This structural property is independent of the developer type — human, LLM, or otherwise.

### Decomposition Threshold

`shouldDecomposeTask` flags tasks where:

- `risk >= high`, OR
- `scope >= broad`

This is a structural insight: large or risky tasks have higher failure rates and should be broken down. The threshold is consistent with the Rust CLI's decompose command.
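The threshold rule can be sketched as an ordinal comparison over the categorical scales. The ordered category lists here are illustrative assumptions; `moderate` in particular does not appear in this document.

```typescript
// Sketch of the decomposition threshold. The ordered category lists are
// assumptions for illustration; 'moderate' in particular is hypothetical.
const RISK_ORDER = ['trivial', 'low', 'medium', 'high', 'critical'] as const;
const SCOPE_ORDER = ['narrow', 'moderate', 'broad'] as const;

type RiskCat = (typeof RISK_ORDER)[number];
type ScopeCat = (typeof SCOPE_ORDER)[number];

function shouldDecomposeSketch(risk: RiskCat, scope: ScopeCat): boolean {
  // Flag when risk >= high OR scope >= broad.
  return (
    RISK_ORDER.indexOf(risk) >= RISK_ORDER.indexOf('high') ||
    SCOPE_ORDER.indexOf(scope) >= SCOPE_ORDER.indexOf('broad')
  );
}
```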

## DAG-Propagation Cost Model

### Why

The Rust CLI computes EV per-task independently — no upstream quality degradation. As the Python research model demonstrates, this is dangerously optimistic for non-trivial workflows. In a dependency chain where planning has p=0.65 (poor), the Python model shows a 213% cost increase vs good planning (p=0.92). The independent model barely shows a difference because it ignores cascading failure.

### Implementation Approach

DAG propagation is the default mode. The independent model is a degenerate case (set `defaultQualityDegradation: 0` or `propagationMode: 'independent'`).

The algorithm processes tasks in topological order, maintaining an `upstreamSuccessProbs` map:

1. For each task in topological order:
   - If propagation mode is `dag-propagate`: compute `pEffective` from intrinsic probability plus upstream propagation
   - If propagation mode is `independent`: use intrinsic probability directly
   - Calculate EV using `calculateTaskEv`
   - Store the task's actual success probability for downstream propagation
2. When computing effective probability for a task with prerequisites:
   - Start with intrinsic probability
   - For each prerequisite, compute inherited quality: `parentP + (1 - parentP) × (1 - qualityDegradation)`
   - Multiply all inherited quality factors together with intrinsic probability
3. The `qualityDegradation` per edge determines how much a parent's failure bleeds through:
   - `0.0` = no propagation (independent model)
   - `1.0` = full propagation (parent failure guarantees child failure)
   - default `0.9` = high but not total propagation
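The recurrence in the steps above can be sketched as follows. This is illustrative only: the task shape and function name are assumptions, a real implementation would run over a graphology graph inside `workflowCost`, and a single global degradation factor stands in for the per-edge values.

```typescript
// Sketch of the propagation recurrence above. Assumes tasks arrive in
// topological order and a single global qualityDegradation; the real
// model supports per-edge values.
interface TaskSketch {
  id: string;
  prereqs: string[];  // prerequisite task ids (already processed)
  pIntrinsic: number; // intrinsic success probability
}

function effectiveProbs(
  tasksInTopoOrder: TaskSketch[],
  qualityDegradation = 0.9,
): Map<string, number> {
  const pEff = new Map<string, number>();
  for (const task of tasksInTopoOrder) {
    let p = task.pIntrinsic;
    for (const parent of task.prereqs) {
      const parentP = pEff.get(parent)!; // safe: parents come earlier in topo order
      // Inherited quality: parent success passes through cleanly; parent
      // failure bleeds through scaled by (1 - qualityDegradation).
      p *= parentP + (1 - parentP) * (1 - qualityDegradation);
    }
    pEff.set(task.id, p);
  }
  return pEff;
}
```

With `qualityDegradation = 0` every inherited factor is 1, which recovers the independent model; with `1.0` the child multiplies by `parentP` directly.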

### Per-task output

Each task in the `WorkflowCostResult.tasks` array includes both `pIntrinsic` and `pEffective`, so consumers can see the degradation effect. The per-task entries also include `taskId` and `name`, enriched from the graph's node attributes. `calculateTaskEv` is the pure math function (it takes only numeric inputs); `workflowCost` is the aggregate that orchestrates the per-task calls and enriches results with identity metadata from the graph.

### Skip-completed semantics

When `includeCompleted: false`, completed tasks are excluded from the result's task list, but they remain in the propagation chain with `p = 1.0`. Removing completed tasks from propagation would worsen downstream probability estimates — exactly the opposite of what "what's left" queries need.
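A small worked example with hypothetical numbers shows why completed tasks must stay in the chain at `p = 1.0`:

```typescript
// Illustration of the skip-completed rule with hypothetical numbers.
// A completed prerequisite participates in propagation with p = 1.0,
// even though it is filtered out of the reported task list.
const qualityDegradation = 0.9;
const inherited = (parentP: number): number =>
  parentP + (1 - parentP) * (1 - qualityDegradation);

const implIntrinsic = 0.9; // downstream implementation task

// Wrong: feeding the completed planning task in at its intrinsic p = 0.7
// unfairly penalizes the implementation task that depends on it.
const penalized = implIntrinsic * inherited(0.7); // ~0.657

// Right: completed means p = 1.0, so the child's probability is untouched.
const unaffected = implIntrinsic * inherited(1.0); // 0.9
```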

See ADR-004 and ADR-005.

## Comparison with Rust CLI

| Dimension | Rust CLI (Simple Sum) | This Library (DAG Propagation) |
| --- | --- | --- |
| Topology awareness | None | Full — topological order + upstream propagation |
| Upstream failure modeling | Ignored | Each parent's failure degrades child's effective p |
| Edge semantics | Not used | `qualityDegradation` per edge, default 0.9 |
| Result interpretation | Sum of independent per-task costs | Total workflow cost accounting for cascading failure |
| Degenerate case | n/a | Set `propagationMode: 'independent'` or `defaultQualityDegradation: 0` |

## Risk Analysis Functions

### `riskPath`

`riskPath(graph) → RiskPathResult`

Calls `weightedCriticalPath` with weight function `riskWeight * impactWeight`. Returns the path with the highest cumulative risk and its total risk score.

### `riskDistribution`

`riskDistribution(graph) → RiskDistributionResult`

Groups tasks by risk category. Returns counts per bucket: `trivial`, `low`, `medium`, `high`, `critical`, `unspecified`.
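A sketch of the bucketing (the input and result shapes here are assumptions; the real function takes a graph):

```typescript
// Illustrative bucketing for riskDistribution. The real function walks a
// graph's node attributes; this sketch takes the risk values directly.
type RiskBucket = 'trivial' | 'low' | 'medium' | 'high' | 'critical';

function countRiskBuckets(risks: (RiskBucket | null)[]): Record<string, number> {
  const counts: Record<string, number> = {
    trivial: 0, low: 0, medium: 0, high: 0, critical: 0, unspecified: 0,
  };
  for (const risk of risks) {
    counts[risk ?? 'unspecified'] += 1; // null risk lands in 'unspecified'
  }
  return counts;
}
```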

### `shouldDecomposeTask`

`shouldDecomposeTask(attrs: TaskGraphNodeAttributes) → DecomposeResult`

Pure function — takes node attributes (not a graph). Internally calls `resolveDefaults` to handle nullable risk/scope fields. A task with `risk: null` uses the default (`medium`, below the threshold); a task with `scope: null` uses the default (`narrow`, below the threshold). This means unassessed tasks are never flagged for decomposition — an explicit `risk: "high"` or `scope: "broad"` is required.
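The null-default behavior can be shown in a few lines. The defaults follow this document; the helper's shape is an assumption:

```typescript
// Sketch of resolveDefaults-style null handling. Defaults follow the doc:
// null risk -> 'medium', null scope -> 'narrow', both below the thresholds.
interface AttrsSketch { risk: string | null; scope: string | null; }

function resolveDefaultsSketch(attrs: AttrsSketch): { risk: string; scope: string } {
  return { risk: attrs.risk ?? 'medium', scope: attrs.scope ?? 'narrow' };
}

// An unassessed task resolves below both thresholds, so it is never
// flagged for decomposition without an explicit 'high' or 'broad'.
const unassessed = resolveDefaultsSketch({ risk: null, scope: null });
```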

### `findCycles`

graphology provides `hasCycle` (boolean) and `stronglyConnectedComponents` (node groups, not paths). The library implements a custom cycle path extractor for error reporting:

- Algorithm: extended 3-color DFS (WHITE/GREY/BLACK). When a back edge is found (GREY → GREY), trace back through the recursion stack to extract the cycle path as an ordered node sequence. Each inner array in the returned `string[][]` is a single cycle — an ordered sequence of node IDs where the last node has an edge back to the first. The algorithm returns one representative cycle per back edge, not an exhaustive enumeration of all simple cycles (which could be exponential). For error reporting, one cycle per problematic region is sufficient.
- Optimization: use `stronglyConnectedComponents()` as a fast pre-check. If there are zero multi-node SCCs (and no self-loops), skip the DFS entirely.
- Relationship to `topologicalOrder`: `topologicalOrder()` throws `CircularDependencyError` (with `cycles` populated from `findCycles`) when the graph is cyclic. This gives consumers the cycle information needed for error reporting.
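The extraction step can be sketched with a 3-color DFS over a plain adjacency map (illustrative; the library's version operates on a graphology graph and would run the SCC pre-check first):

```typescript
// Sketch of the 3-color DFS cycle extractor described above. Returns one
// representative cycle per back edge as an ordered node-id sequence.
type Adjacency = Map<string, string[]>;

enum Color { White, Grey, Black }

function findCyclesSketch(graph: Adjacency): string[][] {
  const color = new Map<string, Color>();
  const stack: string[] = [];   // current DFS path (recursion stack)
  const cycles: string[][] = [];

  const visit = (node: string): void => {
    color.set(node, Color.Grey);
    stack.push(node);
    for (const next of graph.get(node) ?? []) {
      const c = color.get(next) ?? Color.White;
      if (c === Color.Grey) {
        // Back edge (GREY -> GREY): slice the path from `next` to `node`.
        cycles.push(stack.slice(stack.indexOf(next)));
      } else if (c === Color.White) {
        visit(next);
      }
      // BLACK neighbors are fully explored and cannot open a new cycle.
    }
    stack.pop();
    color.set(node, Color.Black);
  };

  for (const node of graph.keys()) {
    if ((color.get(node) ?? Color.White) === Color.White) visit(node);
  }
  return cycles;
}
```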

See errors-validation.md for error handling.

## Constraints

- DAG propagation is the default — the independent model is the opt-in degenerate case, not the norm.
- No depth escalation in v1 — the multiplicative propagation model already captures depth effects implicitly (each hop compounds another <1.0 factor). An explicit depth penalty would double-count until empirical calibration data exists. See ADR-005.
- Categorical estimates, not numeric — the framework uses categorical fields because LLMs reliably distinguish "high vs medium risk" but struggle with "$3.42 vs $3.50". Categoricals remain valid across environments (different models, providers, token costs).