Decompose monolithic architecture.md into modular docs/architecture/ documents

The 751-line architecture.md violated the SDD process modular documentation target (~500 lines). It also had duplicate TaskGraph class definitions (one monolith, one decomposed) that directly contradicted each other, and embedded consumer-specific tool dispatch mappings that belong in downstream projects. Changes: - Split into 8 focused documents + 7 ADR records + redirect page - Removed the monolithic TaskGraph class (kept only decomposed version) - Moved CLI→plugin dispatch mapping out (belongs in plugin architecture) - Extracted implementation code (frontmatter splitter, findCycles, DAG propagation) into WHAT/WHY descriptions per architect role spec - Added proper ADR format for all resolved design decisions - Fixed review issues: C_fail mapping, DuplicateNodeError/DuplicateEdgeError types, ValidationError/GraphValidationError definitions, mutation error handling contract, enum naming convention, validation timing clarification
2026-04-26 06:38:52 +00:00
parent bac335274d
commit bde1cc4e70
16 changed files with 1264 additions and 748 deletions
--- a/docs/architecture/cost-benefit.md
+++ b/docs/architecture/cost-benefit.md
@@ -0,0 +1,131 @@
+---
+status: draft
+last_updated: 2026-04-26
+---
+
+# Cost-Benefit Analysis
+
+Expected value math, risk analysis, DAG-propagation cost model, and cycle detection.
+
+## Overview
+
+The cost-benefit functions are the key analytical value of the library. They go beyond simple graph topology to answer structural questions about task workflows: which path has the highest cumulative risk? What's the expected cost of a workflow? Which tasks should be decomposed?
+
+These functions implement the cost-benefit framework from `/workspace/@alkimiadev/taskgraph/docs/framework.md` and extend it with DAG-propagation (from the Python research model) that the Rust CLI's independent model ignores.
+
+## Core Concepts
+
+### Expected Value of a Task
+
+```
+EV_task = P_success × C_success + (1 - P_success) × C_fail
+```
+
+Where categorical fields provide the inputs:
+- **P_success** = `riskSuccessProbability(risk)` — probability the task completes successfully
+- **C_success** = `scopeCostEstimate(scope)` — cost when it works
+- **C_fail** = modeled via `EvConfig` parameters: `scopeCost + fallbackCost + timeLost × expectedRetries`. The `calculateTaskEv` function uses `scopeCost` as `C_success` and derives `C_fail` from the same `scopeCost` plus `fallbackCost` and `timeLost` scaled by expected retry count. `fallbackCost` and `timeLost` default to 0 if not provided, yielding `C_fail = C_success` in the simplest case. The `valueRate` parameter converts the result to dollar terms if needed.
+
+### Structural Insight: Upstream Failures Multiply
+
+```
+planning failure → wrong decomposition → wasted implementation
+decomposition failure → unclear tasks → rework
+review failure → bugs shipped → rework
+```
+
+This means `risk: critical` at planning level > `risk: critical` at implementation level. The cost-benefit framework demonstrates this: poor planning (p=0.65) increases total cost by 150% compared to good planning (p=0.92), even with identical implementation tasks.
+
+The failure propagates: poor planning reduces decomposition quality, which reduces implementation effectiveness, which increases integration issues. This structural property is independent of the developer type — human, LLM, or otherwise.
+
+### Decomposition Threshold
+
+`shouldDecomposeTask` flags tasks where:
+- risk >= high, OR
+- scope >= broad
+
+This is a structural insight: large or risky tasks have higher failure rates and should be broken down. The threshold is consistent with the Rust CLI's `decompose` command.
+
+## DAG-Propagation Cost Model
+
+### Why
+
+The Rust CLI computes EV per-task independently — no upstream quality degradation. As the Python research model demonstrates, this is dangerously optimistic for non-trivial workflows. In a dependency chain where planning has p=0.65 (poor), the Python model shows a **213% cost increase** vs good planning (p=0.92). The independent model barely shows a difference because it ignores cascading failure.
+
+### Implementation Approach
+
+DAG propagation is the **default mode**. The independent model is a degenerate case (set `defaultQualityDegradation: 0` or `propagationMode: 'independent'`).
+
+The algorithm processes tasks in topological order, maintaining an `upstreamSuccessProbs` map:
+
+1. For each task in topological order:
+   - If propagation mode is `dag-propagate`: compute `pEffective` from intrinsic probability + upstream propagation
+   - If propagation mode is `independent`: use intrinsic probability directly
+   - Calculate EV using `calculateTaskEv`
+   - Store the task's actual success probability for downstream propagation
+
+2. When computing effective probability for a task with prerequisites:
+   - Start with intrinsic probability
+   - For each prerequisite, compute inherited quality: `parentP + (1 - parentP) × (1 - qualityDegradation)`
+   - Multiply all inherited quality factors together with intrinsic probability
+
+3. The `qualityDegradation` per edge determines how much a parent's failure bleeds through:
+   - 0.0 = no propagation (independent model)
+   - 1.0 = full propagation (parent failure guarantees child failure)
+   - default 0.9 = high but not total propagation
+
+### Per-task output
+
+Each task in the `WorkflowCostResult.tasks` array includes both `pIntrinsic` and `pEffective` so consumers can see the degradation effect. The per-task entries also include `taskId` and `name` (enriched from the graph's node attributes) — `calculateTaskEv` is the pure math function (takes only numeric inputs), while `workflowCost` is the aggregate that orchestrates the per-task calls and enriches results with identity metadata from the graph.
+
+### Skip-completed semantics
+
+When `includeCompleted: false`, completed tasks are excluded from the result's task list, but they **remain in the propagation chain** with p=1.0. Removing completed tasks from propagation would *worsen* downstream probability estimates — exactly the opposite of what "what's left" queries need.
+
+> See [ADR-004](decisions/004-workflow-cost-dag-propagation.md) and [ADR-005](decisions/005-no-depth-escalation-v1.md).
+
+### Comparison with Rust CLI
+
+| Dimension | Rust CLI (Simple Sum) | This Library (DAG Propagation) |
+|-----------|----------------------|-------------------------------|
+| Topology awareness | None | Full — topological order + upstream propagation |
+| Upstream failure modeling | Ignored | Each parent's failure degrades child's effective p |
+| Edge semantics | Not used | `qualityDegradation` per edge, default 0.9 |
+| Result interpretation | Sum of independent per-task costs | Total workflow cost accounting for cascading failure |
+| Degenerate case | — | Set `propagationMode: 'independent'` or `defaultQualityDegradation: 0` |
+
+## Risk Analysis Functions
+
+### riskPath
+
+`riskPath(graph)` → `RiskPathResult`
+
+Calls `weightedCriticalPath` with weight function `riskWeight * impactWeight`. Returns the path with highest cumulative risk and its total risk score.
+
+### riskDistribution
+
+`riskDistribution(graph)` → `RiskDistributionResult`
+
+Groups tasks by risk category. Returns counts per bucket: trivial, low, medium, high, critical, unspecified.
+
+### shouldDecomposeTask
+
+`shouldDecomposeTask(attrs: TaskGraphNodeAttributes)` → `DecomposeResult`
+
+Pure function — takes node attributes (not a graph). Internally calls `resolveDefaults` to handle nullable `risk`/`scope` fields. A task with `risk: null` uses the default (medium, which is below the threshold); a task with `scope: null` uses the default (narrow, which is below the threshold). This means unassessed tasks are never flagged for decomposition — an explicit `risk: "high"` or `scope: "broad"` is required.
+
+## findCycles
+
+graphology provides `hasCycle` (boolean) and `stronglyConnectedComponents` (node groups, not paths). The library implements a custom cycle path extractor for error reporting:
+
+- **Algorithm**: Extended 3-color DFS (WHITE/GREY/BLACK). When a back edge is found (GREY → GREY), trace back through the recursion stack to extract the cycle path as an ordered node sequence. Each inner array in the returned `string[][]` is a single cycle — an ordered sequence of node IDs where the last node has an edge back to the first. The algorithm returns **one representative cycle per back edge**, not an exhaustive enumeration of all simple cycles (which could be exponential). For error reporting, one cycle per problematic region is sufficient.
+- **Optimization**: Use `stronglyConnectedComponents()` as a fast pre-check. If there are zero multi-node SCCs (and no self-loops), skip the DFS entirely.
+- **Relationship to topologicalOrder**: `topologicalOrder()` throws `CircularDependencyError` (with `cycles` populated from `findCycles`) when the graph is cyclic. This gives consumers the cycle information needed for error reporting.
+
+> See [errors-validation.md](errors-validation.md) for error handling.
+
+## Constraints
+
+- **DAG-propagation is default** — the independent model is opt-in, not the other way around. The independent model is the degenerate case, not the norm.
+- **No depth-escalation in v1** — the multiplicative propagation model already captures depth effects implicitly (each hop compounds another `<1.0` factor). Adding an explicit depth penalty would double-count until we have empirical calibration data. See [ADR-005](decisions/005-no-depth-escalation-v1.md).
+- **Categorical estimates, not numeric** — The framework uses categorical fields because LLMs reliably distinguish "high vs medium risk" but struggle with "$3.42 vs $3.50". Categoricals remain valid across environments (different models, providers, token costs).