Decompose monolithic architecture.md into modular docs/architecture/ documents
The 751-line architecture.md violated the SDD process modular documentation target (~500 lines). It also had duplicate TaskGraph class definitions (one monolithic, one decomposed) that directly contradicted each other, and embedded consumer-specific tool dispatch mappings that belong in downstream projects.

Changes:

- Split into 8 focused documents + 7 ADR records + redirect page
- Removed the monolithic TaskGraph class (kept only the decomposed version)
- Moved CLI→plugin dispatch mapping out (belongs in plugin architecture)
- Extracted implementation code (frontmatter splitter, findCycles, DAG propagation) into WHAT/WHY descriptions per the architect role spec
- Added proper ADR format for all resolved design decisions
- Fixed review issues: C_fail mapping, DuplicateNodeError/DuplicateEdgeError types, ValidationError/GraphValidationError definitions, mutation error handling contract, enum naming convention, validation timing clarification
@@ -0,0 +1,31 @@
# ADR-001: Pivot from NAPI/Rust to TypeScript + graphology

**Status**: Accepted

## Context

The original design specified a Rust core with napi-rs bindings, extracting the graph logic from the existing taskgraph CLI. This would provide high performance but introduced significant complexity.

## Decision

Pivot to pure TypeScript with graphology as the graph engine. No Rust compilation, no native addons, no platform-specific binaries.

## Consequences

### Positive

- Cross-platform builds eliminated — pure JS works in Node, Deno, and Bun
- graphology already provides all needed DAG algorithms, and is already in our dependency tree
- Publishing is simple (`npm publish` with no CI matrix for platform binaries)
- Future UI path is straightforward — graphology powers sigma.js/react-sigma
- Near 1:1 petgraph ↔ graphology mapping means porting back to Rust is tractable

### Negative

- Raw algorithm performance is slower than Rust's for very large graphs
- graphology's API differences require adaptation (not a drop-in petgraph replacement)

### Neutral

- The Rust CLI continues to exist for human/offline use — this is not a replacement, it's a parallel implementation for different consumers

## Trade-off

The performance difference between Rust and JS at realistic graph sizes (10–200 nodes) is negligible. The build/publish complexity savings of pure JS massively outweigh the theoretical performance gain.
docs/architecture/decisions/002-rebuild-vs-incremental.md (Normal file, 26 lines)
@@ -0,0 +1,26 @@
# ADR-002: Rebuild graph on change, not incremental updates

**Status**: Accepted

## Context

When task data changes (file edits, DB updates), the in-memory graph needs to reflect the new state. Two approaches: incremental updates (add/remove individual nodes/edges) or full rebuild from source data.

## Decision

**Rebuild.** For our graph sizes (10–200 nodes), `graph.import()` from a serialized blob is sub-millisecond. Both consumers (alkhub builds from DB query results; the OpenCode plugin rebuilds from the directory on file change) are well-served by rebuild.

## Consequences

### Positive

- No change-detection layer needed — no tracking of ID renames, dependency removals, or edge reconciliation
- Simpler codebase — no diff algorithm, no incremental update logic
- Always consistent — rebuild guarantees the graph matches the source data exactly

### Negative

- Technically wasteful for small changes (rebuilding the entire graph when one task changed)
- Not suitable for very large graphs or extremely frequent updates

### Mitigation

If a future use case requires incremental updates, add it as an optimization then. The API surface (construction methods) supports both patterns — incremental construction exists via `addTask`/`addDependency`.
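The rebuild pattern can be sketched in plain TypeScript. This is a minimal illustration over adjacency maps, not the actual graphology-backed implementation; the `Task` shape and `buildGraph` name are hypothetical:

```typescript
// Hypothetical task shape; field names are illustrative, not the library's API.
interface Task {
  id: string;
  dependsOn: string[];
}

// Full rebuild: derive the whole adjacency structure from source data.
// No diffing, no reconciliation -- the result always matches the input exactly.
function buildGraph(tasks: Task[]): Map<string, string[]> {
  const edges = new Map<string, string[]>();
  for (const task of tasks) {
    edges.set(task.id, [...task.dependsOn]);
  }
  return edges;
}

// On any change, throw the old graph away and rebuild from scratch.
let graph = buildGraph([
  { id: "a", dependsOn: [] },
  { id: "b", dependsOn: ["a"] },
]);
graph = buildGraph([{ id: "a", dependsOn: [] }]); // task "b" deleted: just rebuild
```

The point of the sketch is what is absent: there is no code path that compares old and new state, which is exactly the complexity the rebuild decision avoids.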
@@ -0,0 +1,27 @@
# ADR-003: topologicalOrder throws CircularDependencyError on cyclic graphs

**Status**: Accepted

## Context

When a graph has cycles, topological sort cannot produce a complete ordering. Options: return `null`, return a partial ordering, or throw an error with cycle information.

## Decision

**Throw `CircularDependencyError`** with `cycles` populated from `findCycles()`. Do not return a partial ordering or `null`.

## Consequences

### Positive

- Prevents silent ignoring of cycles — consumers get explicit error information
- `CircularDependencyError.cycles` provides the actual cycle paths for error reporting
- Simpler return type — `string[]` instead of `string[] | null` or `string[][]`
- Both consumers treat cycles as bugs: alkhub data comes from validated DB schema; OpenCode plugin data comes from frontmatter that should be validated before graph construction

### Negative

- Callers who want "best effort" ordering on cyclic graphs must catch the error and call `findCycles()` separately
- Cannot get partial results — if you want "topo sort of the acyclic portions," that requires filtering first

### Mitigation

`findCycles()` and `hasCycles()` are available for consumers that want to handle cycles gracefully before calling `topologicalOrder()`.
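The contract can be illustrated with a self-contained sketch: a minimal Kahn's sort in plain TypeScript standing in for the graphology-backed implementation. The error shape mirrors the `cycles` field described above, but the cycle extraction here is simplified (a real `findCycles()` would report exact paths):

```typescript
class CircularDependencyError extends Error {
  constructor(public cycles: string[][]) {
    super(`Graph contains ${cycles.length} cycle group(s)`);
  }
}

// dependsOn: node -> list of nodes it depends on.
// Returns an order in which every dependency precedes its dependents.
function topologicalOrder(dependsOn: Map<string, string[]>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [node, deps] of dependsOn) {
    indegree.set(node, deps.length);
    if (!dependents.has(node)) dependents.set(node, []);
    for (const dep of deps) {
      if (!dependents.has(dep)) dependents.set(dep, []);
      (dependents.get(dep) as string[]).push(node);
      if (!indegree.has(dep)) indegree.set(dep, 0);
    }
  }
  const queue = [...indegree.keys()].filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    order.push(node);
    for (const dependent of dependents.get(node) ?? []) {
      const d = (indegree.get(dependent) as number) - 1;
      indegree.set(dependent, d);
      if (d === 0) queue.push(dependent);
    }
  }
  if (order.length < indegree.size) {
    // Leftover nodes are on (or downstream of) a cycle. We throw rather than
    // return a partial ordering, per the ADR.
    const leftover = [...indegree.keys()].filter((n) => !order.includes(n));
    throw new CircularDependencyError([leftover]);
  }
  return order;
}
```

Callers who want best-effort behavior catch the error and inspect `cycles`, rather than receiving a silently truncated ordering.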
@@ -0,0 +1,28 @@
# ADR-004: DAG-propagation as default workflow cost model

**Status**: Accepted

## Context

The Rust CLI computes expected value per-task independently — no upstream quality degradation. The Python research model implements DAG-propagation, where each parent's failure degrades the child's effective probability. The independent model is dangerously optimistic for non-trivial workflows: poor planning (p=0.65) shows a 213% cost increase vs good planning (p=0.92) with the propagation model, but barely any difference with the independent model.

## Decision

**DAG-propagation is the default mode.** The independent model is a degenerate case accessible via `propagationMode: 'independent'` or `defaultQualityDegradation: 0`.

## Consequences

### Positive

- More accurate cost estimates — captures the structural reality that upstream failures multiply downstream damage
- Per-task output includes both `pIntrinsic` and `pEffective` so consumers can see the degradation effect
- The independent model is still available as an opt-in degenerate case
- Per-edge `qualityDegradation` allows fine-grained modeling of how much each dependency bleeds failure

### Negative

- More complex implementation than a simple sum
- Results differ from the Rust CLI — consumers migrating from CLI to library will see different numbers
- Requires `qualityDegradation` per edge (default 0.9), which adds a concept the Rust CLI didn't have

### Mitigation

The `propagationMode` option allows consumers to start with the independent model and migrate to DAG-propagation when ready. The per-task `pIntrinsic`/`pEffective` split makes the propagation effect transparent.
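One plausible shape for such a propagation step, as a sketch: the field names `pIntrinsic`, `pEffective`, and `qualityDegradation` come from the ADR, but the exact formula below is an assumption, not the library's implementation. It is chosen so that `q = 0` recovers the independent model, matching the `defaultQualityDegradation: 0` escape hatch:

```typescript
// Assumed propagation rule: each parent's effective failure probability
// (1 - pEffective) bleeds into the child, scaled by the edge's
// qualityDegradation q. With q = 0 every parent factor is 1, so the child
// keeps its intrinsic probability (the independent model).
function effectiveProbability(
  pIntrinsic: number,
  parents: { pEffective: number; qualityDegradation: number }[],
): number {
  let p = pIntrinsic;
  for (const { pEffective, qualityDegradation: q } of parents) {
    p *= 1 - q * (1 - pEffective);
  }
  return p;
}

// Two-hop chain with the ADR's default q = 0.9 and a weak root (p = 0.65):
const root = effectiveProbability(0.65, []); // no parents: 0.65
const mid = effectiveProbability(0.9, [{ pEffective: root, qualityDegradation: 0.9 }]);
const leaf = effectiveProbability(0.9, [{ pEffective: mid, qualityDegradation: 0.9 }]);
```

Under this assumed rule the degradation compounds per hop: `mid` lands below its intrinsic 0.9, and `leaf` lands below `mid`, which is the multiplicative depth effect the ADRs describe.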
docs/architecture/decisions/005-no-depth-escalation-v1.md (Normal file, 26 lines)
@@ -0,0 +1,26 @@
# ADR-005: No depth-escalation heuristic in v1

**Status**: Accepted

## Context

In the DAG-propagation model, each hop compounds another `<1.0` factor. This implicitly captures depth effects — deeper chains have more compounding. An explicit depth-escalation heuristic (increasing risk at deeper chain levels) would add another multiplicative penalty on top.

## Decision

**Defer depth-escalation to v2.** The multiplicative propagation model already captures depth effects implicitly. Adding an explicit depth heuristic would double-count the depth effect until we have empirical calibration data from actual task outcomes.

## Consequences

### Positive

- No double-counting of depth effects
- Simpler model to explain, implement, and debug
- Architecture supports future depth-escalation via per-edge `qualityDegradation` adjustments or `risk` categorical escalation without API changes

### Negative

- May underestimate cost for very deep dependency chains where risk genuinely escalates with depth
- The model treats all "hops" as equivalent — a 5-hop chain where each step is moderate risk may actually be worse than the model predicts

### Future

If empirical data from actual task outcomes shows that depth-escalation is needed, it can be added without API changes — either by adjusting `qualityDegradation` per depth, or by escalating the `risk` categorical. This is a calibration question, not an architecture question.
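The implicit depth penalty can be seen numerically. This tiny sketch assumes a uniform illustrative per-hop factor `f` (a simplification; the real model derives each hop's factor from `qualityDegradation` and the parent's effective probability):

```typescript
// Each additional hop multiplies one more sub-1.0 factor into the chain,
// so depth is penalized without any explicit depth term. f is illustrative.
const f = 0.9;
const pAtDepth = (p0: number, depth: number): number => p0 * f ** depth;

const shallow = pAtDepth(0.9, 1); // one hop
const deep = pAtDepth(0.9, 5);    // five hops: noticeably lower, for free
```

An explicit depth heuristic stacked on top of this would penalize depth twice, which is the double-counting the decision avoids.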
docs/architecture/decisions/006-deterministic-edge-keys.md (Normal file, 26 lines)
@@ -0,0 +1,26 @@
# ADR-006: Deterministic edge keys via addEdgeWithKey

**Status**: Accepted

## Context

graphology's default `addEdge(source, target)` generates random edge keys (e.g., `ge8kq2`). This makes debugging harder and adds overhead from key generation. For our use case, each source→target pair has at most one edge (no parallel edges in a DAG dependency graph).

## Decision

Use `addEdgeWithKey` with deterministic keys in the format `${source}->${target}` (e.g., `task-a->task-b`). This produces readable, debuggable edge identifiers and skips graphology's key generation overhead.

## Consequences

### Positive

- Debuggable edge identifiers — `task-a->task-b` is immediately understandable
- No random key generation overhead
- Deterministic — exporting and re-importing produces the same graph

### Negative

- Constraint enforced: no parallel edges between the same node pair
- Key format collision if task IDs contain `->` (extremely unlikely with kebab-case slugs)

### Mitigation

Duplicate dependency declarations (same source→target pair declared twice) are a validation error, not a valid use case. The constraint is correct for DAG dependency graphs.
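A minimal sketch of the key scheme: the `${source}->${target}` format is from the ADR, but the `addEdges` helper and its error message are hypothetical, showing only how deterministic keys make duplicate declarations trivially detectable:

```typescript
// Deterministic key per the ADR: same pair always yields the same key.
const edgeKey = (source: string, target: string): string => `${source}->${target}`;

// Hypothetical helper: with deterministic keys, a duplicate declaration is
// just a Set membership check -- no edge scan needed.
function addEdges(pairs: [string, string][]): Set<string> {
  const keys = new Set<string>();
  for (const [source, target] of pairs) {
    const key = edgeKey(source, target);
    if (keys.has(key)) {
      throw new Error(`duplicate dependency: ${key}`); // validation error, per ADR-006
    }
    keys.add(key);
  }
  return keys;
}
```

The readable key doubles as the error message, which is most of the debugging benefit the decision cites.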
25
docs/architecture/decisions/007-subgraph-internal-only.md
Normal file
25
docs/architecture/decisions/007-subgraph-internal-only.md
Normal file
@@ -0,0 +1,25 @@
# ADR-007: Subgraph returns internal-only edges

**Status**: Accepted

## Context

When filtering a graph to a subset of nodes, what happens to edges where only one endpoint is in the filtered set? Options: include cross-boundary edges (external dependencies visible), or strict internal-only (only edges where both endpoints are in the filtered set).

## Decision

**Strict internal-only.** `subgraph(filter)` returns a new `TaskGraph` with matching nodes and only edges where both endpoints are in the filtered set. This matches `graphology-operators` `subgraph` behavior and produces valid subgraphs for all algorithms (topo sort, betweenness, etc.).

## Consequences

### Positive

- Result is always a valid (potentially disconnected) subgraph — all algorithms work correctly
- Matches graphology's built-in subgraph behavior
- No surprise external references in analysis results

### Negative

- External dependency information is lost — you can't see "what does this subgraph depend on outside itself" from the subgraph alone

### Mitigation

External dependency information is available on the original graph via `dependencies()`/`dependents()`. A separate `externalDependencies(filter)` utility can be added later if consumers need "show me what this subgraph depends on outside itself."
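The internal-only rule reduces to a two-pass filter. This plain-TS sketch over node/edge arrays stands in for the `graphology-operators` implementation; the `GraphData` shape is illustrative:

```typescript
// Illustrative plain-data graph shape, not the TaskGraph API.
interface GraphData {
  nodes: string[];
  edges: [string, string][]; // [source, target]
}

// Pass 1: keep matching nodes. Pass 2: keep only edges whose BOTH endpoints
// survived, so cross-boundary edges are dropped and the result is always a
// valid (possibly disconnected) graph.
function subgraph(g: GraphData, keep: (node: string) => boolean): GraphData {
  const nodes = g.nodes.filter(keep);
  const inSet = new Set(nodes);
  const edges = g.edges.filter(([s, t]) => inSet.has(s) && inSet.has(t));
  return { nodes, edges };
}
```

Because no edge in the result can reference a missing node, any algorithm that is valid on the original graph is also valid on the filtered one, which is the property the decision is built around.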