Architecture documents use YAML frontmatter with `status` and `last_updated` fields:
```yaml
---
status:draft | stable | deprecated
last_updated:YYYY-MM-DD
---
```
| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. Content may change significantly. Implementation should not start until the document reaches `stable`. | → `stable` when implementation is complete and API contract is verified by tests. |
| `stable` | API contracts are locked. Changes require a review cycle and may warrant an ADR if they affect documented decisions. | → `deprecated` when superseded. → `draft` if a fundamental redesign is needed (rare). |
| `deprecated` | Superseded by another document. Kept for reference. Links should point to the replacement. | Removed when no longer referenced. |
ADR documents use a separate `Status` field in their body: `Proposed`, `Accepted`, `Deprecated`, or `Superseded`. ADRs never revert from `Accepted`.
@@ -89,7 +89,7 @@ Each task in the `WorkflowCostResult.tasks` array includes both `pIntrinsic` and
### Skip-completed semantics
When `includeCompleted: false`, completed tasks are excluded from the result's task list, but they **remain in the propagation chain** with p=1.0. Removing completed tasks from propagation would *worsen* downstream probability estimates — exactly the opposite of what "what's left" queries need.
When `includeCompleted: false`, tasks with `status: "completed"` are excluded from the result's task list, but they **remain in the propagation chain** with p=1.0. Removing completed tasks from propagation would *worsen* downstream probability estimates — exactly the opposite of what "what's left" queries need. Only the `"completed"` status triggers this exclusion; tasks with `"failed"` or `"blocked"` status are included regardless of the `includeCompleted` setting.
> See [ADR-004](decisions/004-workflow-cost-dag-propagation.md) and [ADR-005](decisions/005-no-depth-escalation-v1.md).
The original design specified a Rust core with napi-rs bindings, extracting the graph logic from the existing taskgraph CLI. This would provide high performance but introduced significant complexity.
## Alternatives Considered
- **NAPI/Rust (original plan)**: Build a Rust core with napi-rs bindings, extracting logic from the existing taskgraph CLI. Rejected because of cross-platform build complexity (macOS x64/ARM64, Linux x64/ARM64, Windows x64 each need separate binaries) and minimal performance benefit at our graph sizes (10–200 nodes).
- **WASM-compiled Rust**: Compile the Rust graph logic to WebAssembly. Rejected because it reintroduces the Rust toolchain in CI and WASM cold-start latency, without addressing the core complexity problem that the pivot was designed to remove.
- **Manual adjacency map**: Implement graph algorithms on a plain `Map<string, Set<string>>`. Rejected because graphology already provides all needed DAG algorithms (topo sort, cycle detection, betweenness centrality, subgraph extraction) and is already in the dependency tree.
- **D3 or other JS graph libs**: Use d3-graph or similar. Rejected because graphology is already in the alkhub dependency tree, provides the needed algorithms, and has a clear path to UI (sigma.js/react-sigma).
## Decision
Pivot to pure TypeScript with graphology as the graph engine. No Rust compilation, no native addons, no platform-specific binaries.
@@ -25,4 +25,4 @@ When task data changes (file edits, DB updates), the in-memory graph needs to re
If a future use case requires incremental updates, add it as an optimization then. The API surface (construction methods) supports both patterns — incremental construction exists via `addTask`/`addDependency`.
An incremental update architecture has been explored in [incremental-update-exploration.md](../incremental-update-exploration.md). The key finding is that **the win is reactivity (fine-grained event notifications), not performance**. For <200 node graphs, rebuild is always sub-millisecond. If a consumer needs reactive updates, they can use graphology's event system directly via `graph.raw` and implement change detection at the consumer layer, without the library taking on the complexity of diff-based updates.
An incremental update architecture has been explored (draft, not yet a decision) in [incremental-update-exploration.md](../incremental-update-exploration.md). The key finding is that **the win is reactivity (fine-grained event notifications), not performance**. For <200 node graphs, rebuild is always sub-millisecond. If a consumer needs reactive updates, they can use graphology's event system directly via `graph.raw` and implement change detection at the consumer layer, without the library taking on the complexity of diff-based updates.
@@ -71,8 +71,9 @@ The actual YAML parsing is delegated to `yaml.parse()`. The serializer uses `yam
- **No gray-matter, no js-yaml** — these are hard exclusions for supply chain security.
- **YAML 1.2 only** — the `yaml` package implements YAML 1.2, which is a superset of JSON and avoids the ambiguous type coercion issues of YAML 1.1.
- **Frontmatter is a parsing concern, not a graph concern** — parsed `TaskInput` objects are fed to `TaskGraph.fromTasks()`. The parser doesn't know about graphs; the graph doesn't know about files.
- **File I/O functions use Node.js `fs` APIs** — `parseTaskFile` and `parseTaskDirectory` depend on `node:fs/promises` and are only available in Node.js-compatible runtimes. `parseFrontmatter` (the pure parsing function) is runtime-agnostic. Consumers targeting Deno or Bun should use `parseFrontmatter` directly with their own file-reading mechanism, or import the file I/O functions from a separate entry point if a browser-compatible bundle is needed.
## References
-`yaml` package: https://github.com/eemeli/yaml
- CVE-2025-64718 (js-yaml prototype pollution): tracked in npm audit database
- CVE-2025-64718 (js-yaml prototype pollution via `<<` merge key): confirmed, patched in js-yaml 4.1.1 and 3.14.2, but gray-matter still depends on the vulnerable 3.x line
**Draft exploration** — not yet a decision. This document explores whether incremental updates (via TypeBox Diff/Patch → graphology mutation mapping) could replace or complement the current "rebuild on change" approach.
**Draft exploration** — not yet a decision. This document explores whether incremental updates (via TypeBox Diff/Patch → graphology mutation mapping) could replace or complement the current "rebuild on change" approach. If this exploration leads to a decision, it will become ADR-008.
## Current Approach: Rebuild on Change
@@ -168,7 +168,7 @@ The key insight is that **the win is reactivity, not performance**. For <200 nod
## References
- ADR-002: Rebuild vs Incremental → [decisions/002-rebuild-vs-incremental.md](decisions/002-rebuild-vs-incremental.md)
### C-1. Inconsistency in Construction Error Policy — `DuplicateNodeError`/`DuplicateEdgeError` vs. "Construction Never Throws"
**Location**: `errors-validation.md`, "Construction vs. Validation Error Handling" section (lines 117–123); contradicts `api-surface.md`, line 60.
**Issue**: `errors-validation.md` line 119 states:
> "Construction never throws — `fromTasks`, `fromRecords`, `fromJSON`, `addTask`, `addDependency` can be called freely. `DuplicateNodeError` and `DuplicateEdgeError` are the exceptions — they represent programming errors"
This is a direct contradiction. If "construction never throws" then `DuplicateNodeError` and `DuplicateEdgeError` should not exist as thrown errors during construction. The section header and first sentence establish a principle, then immediately carve out exceptions that swallow the principle.
`api-surface.md` line 60 says `addTask` "throws `DuplicateNodeError` if the ID already exists" which is consistent with the exception but contradicts the stated principle.
**Recommendation**: Reformulate. Either:
- State clearly that construction throws only for programming errors (duplicate IDs), not for data quality issues — and remove the absolute "never throws" claim. Something like: "Construction throws only for precondition violations (duplicate nodes/edges that already exist), not for data validation issues."
- Or make `addTask`/`addDependency` idempotent (no-op on duplicate) for truly "construction never throws" semantics, with `validate()` as the path to detect duplicates.
---
### C-2. `qualityDegradation` Semantic Inversion — Higher Value Means Less Degradation
**Issue**: `DependencyEdge.qualityDegradation` is defined in `schemas.md` as "0.0–1.0, default 0.9" and described as "how much upstream failure bleeds through," with: "0.0 = no propagation (independent model), 1.0 = full propagation."
But in `cost-benefit.md` line 69, the propagation formula is:
With default 0.9, this gives: `parentP + (1 - parentP) × 0.1`. A `qualityDegradation` of 0.9 results in **low** actual propagation (only 10% of the failure bleeds through), while 0.0 results in full propagation.
The name `qualityDegradation` says "high value = high degradation" but the formula says "high value = high quality retention, low degradation." This inverted semantics will cause real bugs.
**Recommendation**: Either:
- Rename to `qualityRetention` (0.9 = 90% quality retained, low bleeding), OR
- Invert the formula so high values = high degradation, OR
- Add an explicit "Note on naming" section: "Despite the name, `qualityDegradation` represents quality *retention*. A value of 0.9 means 90% quality is retained; only 10% of the upstream failure propagates. The `(1 - qualityDegradation)` term in the formula extracts the degradation fraction."
---
### C-3. Undefined `Nullable` Helper — Definition After First Use
**Location**: `schemas.md`, lines 46–66 (first use), line 219 (definition).
**Issue**: `TaskInput` uses `Type.Optional(Nullable(TaskStatusEnum))` on lines 54–63, but `Nullable` is only defined at line 219 in the `ResolvedTaskAttributes` section. An implementer reading top-down encounters `Nullable` without understanding what it does.
Additionally, it's unclear whether `Nullable` is from `@alkdev/typebox` or defined locally.
**Recommendation**: Define `Nullable` in a "Shared Schema Utilities" section at the top of `schemas.md` (before `TaskInput`), or at minimum add a forward reference at first use: "see Shared Schema Utilities below for the `Nullable` helper."
---
## Warnings
### W-1. ADR-002 Mitigation References Exploration as If Settled
**Issue**: The mitigation section cites `incremental-update-exploration.md` as if it's a completed exploration, but that document is explicitly "Draft exploration — not yet a decision" (its line 8). If someone reads only ADR-002, they'd assume the incremental exploration is settled.
**Recommendation**: Add a qualifier: "An incremental update architecture has been explored (draft, not yet a decision) in …"
**Issue**: ADR-001 is the foundational decision but lacks an explicit "Alternatives Considered" section. Alternatives (pure Rust, WASM compile, napi-rs) are mentioned in context but not structured. ADRs 002–007 have clearer structure.
**Recommendation**: Add an "Alternatives Considered" section: NAPI/Rust (original plan — build complexity), WASM-compiled Rust (reintroduces Rust toolchain), manual adjacency map (no DAG algorithms), D3/other JS graph libs (graphology already in tree).
---
### W-3. No Document Lifecycle Protocol
**Location**: All peripheral documents have `status: draft`; all ADRs say `Status: Accepted`.
**Issue**: No protocol for when/how documents transition from draft to stable, and no definition of what "draft" means (unstable? unreviewed? subject to change?).
**Recommendation**: Define document lifecycle states (draft → stable → deprecated) in `frontmatter.md` or a governance section in `README.md`. Add transition criteria (e.g., "stable after implementation is complete and tests pass").
---
### W-4. `TaskStatus` Enum Values Never Listed; "Completed" Semantics Undefined
**Location**: `schemas.md` line 143 ("same pattern for TaskImpact, TaskLevel, TaskPriority, TaskStatus"), `cost-benefit.md` line 82–83 (`includeCompleted` semantics).
**Issue**: The actual string values of `TaskStatusEnum` are never defined. `cost-benefit.md` mentions `includeCompleted: false` but doesn't define which enum value(s) constitute "completed."
**Recommendation**: Define `TaskStatusEnum` values explicitly. Specify which status value(s) the `includeCompleted` option treats as "completed."
---
### W-5. File I/O Functions May Not Work in Non-Node Runtimes
**Issue**: `parseTaskFile` and `parseTaskDirectory` are async and presumably use Node.js `fs` APIs. ADR-001 and `build-distribution.md` state the library works in Node, Deno, and Bun. Are these functions available in all runtimes? If not, this needs documenting.
**Recommendation**: Document environment constraints. Consider a separate export path (e.g., `@alkdev/taskgraph/fs`) for file I/O functions to avoid bundling Node APIs into Deno/Bun consumers.
**Issue**: The redirect file lists modular documents but doesn't mention ADRs under `docs/architecture/decisions/`. A reader following this redirect would miss the decision records.
**Recommendation**: Add a line pointing to `docs/architecture/decisions/` or to the ADR table in `README.md`.
---
### W-7. CVE Number for js-yaml Appears Incorrect
**Location**: `frontmatter.md`, line 40.
**Issue**: References "CVE-2025-64718" for js-yaml prototype pollution. The CVE number format and sequence number appear inconsistent with typical CVE assignment patterns. An incorrect CVE undermines the supply-chain security argument.
**Recommendation**: Verify the actual CVE number for the js-yaml prototype pollution vulnerability. Update or mark as "referenced in npm audit" if the specific CVE can't be confirmed.
---
### W-8. `WorkflowCostOptions.limit` Is Undocumented
**Location**: `api-surface.md`, lines 136–137.
**Issue**: `WorkflowCostOptions` includes `limit?: number` with no documentation of what it constrains or its default behavior.
**Recommendation**: Document what `limit` does (number of tasks returned? depth limit?) and its behavior when omitted.
**Issue**: References like `/workspace/@alkdev/typebox/docs/values/diff-patch.md` and `/workspace/@alkimiadev/taskgraph/docs/framework.md` are monorepo-internal absolute paths that won't resolve outside this workspace (published docs, open-source context).
**Recommendation**: Convert to relative paths from repository root or link to published URLs. Keep workspace-absolute paths only in a clearly marked "Developer Notes" section.
---
### W-10. `fromTasks`/`fromRecords` Edge Construction Semantics Underspecified
- Whether `fromRecords` requires edges to reference tasks in the same `tasks` array
- What happens with dangling edge references (validation error? silently dropped?)
- Whether edge order matters in the `edges` array
- Whether `fromTasks`/`fromRecords` throw `DuplicateEdgeError` or silently deduplicate
**Recommendation**: Document edge construction semantics: dangling reference handling, duplicate edge handling, and edge ordering.
---
## Suggestions
### S-1. Enhance ADR Index with One-Line Summaries
The ADR table in `README.md` lists ADR number and title but not consequences. Adding one-line consequence summaries would let readers quickly understand each decision's impact without opening every ADR.
### S-2. Add a Glossary
Terms like `qualityDegradation`, `pIntrinsic`, `pEffective`, `EV` (expected value), `DAG-propagation`, `degenerate case` are used throughout but never formally defined in one place. A glossary in `README.md` or a separate `glossary.md` would help new readers.
### S-3. Document Test Runner in `build-distribution.md`
The build/distribution doc lists the `test/` directory structure but doesn't specify the test runner, CI expectations, or coverage requirements. This is architecturally relevant for a pure TS library.
### S-4. Fix Incomplete Error Class Listing in Project Structure
**Location**: `build-distribution.md`, line 54.
The error directory listing shows `TaskgraphError, TaskNotFoundError, CircularDependencyError, InvalidInputError` but omits `DuplicateNodeError` and `DuplicateEdgeError` which are defined in `errors-validation.md`.
**Location**: `api-surface.md` line 52, `graph-model.md` line 78.
Consumers can access the underlying graphology instance via `graph.raw`, but mutations made directly bypass `TaskGraph` invariants (deterministic edge keys, no-parallel-edges). Add a warning about unsafe direct mutation.
### S-6. Make `architecture.md` Redirect Minimal
The redirect file at `docs/architecture.md` duplicates the document listing from `README.md`. Consider making it a simple pointer to avoid staleness.
### S-7. Note That `incremental-update-exploration.md` May Become a Future ADR
This document is a companion to ADR-002 but has no ADR number. When it reaches a decision, it should become ADR-008. Add a note: "Future: May become ADR-008 if incremental updates are adopted."
---
## Strengths
- **ADR structure is consistent**: ADRs 002–007 follow clear Context/Decision/Consequences with Positive/Negative/Mitigation subsections.
- **Cross-references are strong**: Main docs consistently link to relevant ADRs, and ADRs reference each other and peripheral docs.
- **Construction vs. validation distinction is well-articulated**: The three-way split (construction, validation, operation) is a clear principled decision — once the DuplicateNodeError contradiction is resolved.
- **Supply-chain security reasoning in `frontmatter.md` is exemplary**: CVE references, package counts, recent attack examples — this is how supply-chain decisions should be documented.
- **Threat model is clearly stated**: The README threat model directly connects the library's existence to a concrete security posture.
- **Propagation formula is clearly specified**: Despite the naming concern (C-2), the actual math in `cost-benefit.md` is detailed enough for unambiguous implementation.
---
## Resolution Status (2026-04-26)
All issues resolved. See `tasks/architecture/` for task details.
| Issue | Resolution |
|-------|-----------|
| C-1 (Construction error policy) | ✅ Reworded to "Construction methods enforce uniqueness, not data quality" — no longer contradicts DuplicateNodeError/DuplicateEdgeError. Construction error handling table added to graph-model.md. |
| C-2 (qualityDegradation naming) | ✅ Renamed to `qualityRetention` across all documents. Added explicit "Note on naming" in schemas.md and graph-model.md documenting the rename. Updated ADR-004 and ADR-005. |
| C-3 (Nullable helper placement) | ✅ Moved Nullable definition to "Schema Utility: Nullable" section before TaskInput in schemas.md. |
| W-1 (ADR-002 draft qualifier) | ✅ Added "(draft, not yet a decision)" qualifier to ADR-002's reference to incremental-update-exploration. |
| W-7 (CVE-2025-64718) | ✅ Verified — CVE is legitimate. Updated frontmatter.md to confirm the CVE number and note patching status (js-yaml 4.1.1/3.14.2). |
| W-8 (limit parameter) | ✅ Already documented with inline comments in api-surface.md (resolved by parallel agent). |
| W-9 (Workspace paths) | ✅ Converted workspace-absolute paths to relative/monorepo references in README.md and incremental-update-exploration.md. External links use published URLs where available. |
| W-10 (Edge construction semantics) | ✅ Construction Error Handling table added to graph-model.md with per-method behaviors for dangling references, duplicates, and cycles (resolved by parallel agent). |
| S-4 (Error class listing) | ✅ DuplicateNodeError and DuplicateEdgeError added to build-distribution.md project structure listing (resolved by parallel agent). |
| S-5 (graph.raw mutation safety) | ✅ Warning about direct mutation bypassing TaskGraph invariants added to api-surface.md (resolved by parallel agent). |
| S-7 (Incremental ADR note) | ✅ Added "If this exploration leads to a decision, it will become ADR-008" to incremental-update-exploration.md. |
- **All workspace-internal reference paths resolve correctly**: Verified every path reference and all point to existing files.
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.