Files
taskgraph_ts/docs/architecture/frontmatter.md
glm-5.1 e592caed57 Add architecture review findings and address documentation issues
Review of all ADR documents (001-007) and peripheral architecture docs
identified 3 critical, 10 warning, and 7 suggestion issues.

Addressed in this commit:
- W-1: Add draft qualifier to ADR-002 reference to incremental exploration
- W-2: Add Alternatives Considered section to ADR-001
- W-3: Add Document Lifecycle section to README.md (draft/stable/deprecated)
- W-4: Clarify includeCompleted semantics (only 'completed' status triggers exclusion)
- W-5: Document file I/O runtime constraints in frontmatter.md
- W-6: Add ADR reference to architecture.md redirect
- W-7: Verify CVE-2025-64718 (confirmed real, improved description)
- W-9: Convert workspace-absolute paths to relative/monorepo references
- S-7: Add future ADR-008 note to incremental-update-exploration.md

Critical issues (C-1, C-2, C-3) and remaining warnings (W-8, W-10, S-4, S-5)
were addressed by a parallel agent in a prior commit.

All 16 review tasks created and resolved.
2026-04-26 09:41:05 +00:00

79 lines
4.8 KiB
Markdown

---
status: draft
last_updated: 2026-04-26
---
# Frontmatter Parsing
Parsing and serialization of task markdown files with YAML frontmatter. Included in this package, not a separate module.
## Overview
The library provides frontmatter parsing so that file-based consumers (e.g., the future OpenCode plugin) can read task markdown files directly without depending on an external parser. This supports the same YAML frontmatter format as the Rust CLI.
## Public Functions
```typescript
function parseFrontmatter(markdown: string): TaskInput
function parseTaskFile(filePath: string): Promise<TaskInput>
function parseTaskDirectory(dirPath: string): Promise<TaskInput[]>
function serializeFrontmatter(task: TaskInput, body?: string): string
```
`parseFrontmatter` and `parseTaskFile` also run TypeBox validation on the parsed data before returning — invalid frontmatter throws `InvalidInputError` with field-level details.
### parseTaskDirectory Semantics
- **Recursive** — scans subdirectories recursively
- **File extension** — `.md` only
- **No frontmatter** — files without valid `---`-delimited frontmatter are silently skipped
- **I/O errors** — throws the underlying Node.js error (ENOENT, EACCES, etc.)
This is a convenience wrapper for the common case. Consumers that need different discovery semantics (non-recursive, different extensions, custom filtering) should implement their own file discovery and call `parseTaskFile` per file.
## No gray-matter — Self-contained Splitter + yaml
The library writes its own `---` delimited frontmatter splitter and uses `yaml` (by eemeli) as the sole YAML parser. **`gray-matter` is not a dependency.**
This is a deliberate supply-chain security decision:
- **`gray-matter` depends on `js-yaml@3.x`** — an old version with known code injection vulnerabilities (CVE-2025-64718 — prototype pollution via YAML merge key `<<`). Even with gray-matter's custom engine API, `js-yaml` is still *installed* in `node_modules` as a transitive dependency. The attack surface is the install, not the import.
- **gray-matter's full tree is 11 packages** (js-yaml, argparse, kind-of, section-matter, extend-shallow, is-extendable, strip-bom-string, etc.) — none of which we need.
- **Recent npm supply chain attacks** (April 2026: 18-package phishing compromise targeting chalk/debug/etc., the Shai-Hulud self-replicating worm hitting 500+ packages, the axios RAT incident) demonstrate that every dependency in the tree is potential attack surface.
### What we don't replicate from gray-matter
TOML/Coffee engines, JavaScript eval engine, `section-matter` (nested sections), in-memory cache. We don't use any of these.
### `yaml` package profile
- Zero dependencies, full YAML 1.2 spec compliance, no known CVEs
- Actively maintained, excellent TypeScript types
- Single-package blast radius — if compromised, tractable to fork (pure JS)
### WASM YAML parser — considered and rejected
A Rust YAML crate compiled to WASM was considered as an alternative, but it reintroduces complexity the napi→graphology pivot was designed to remove (Rust toolchain in CI, WASM compile target, cold-start latency, FFI boundary). The marginal security gain over `yaml` (already zero-dep) doesn't justify the added build complexity.
## Splitter Design
The frontmatter splitter is a simple `---` delimiter parser (~40 lines). It:
1. Checks for opening `---` delimiter (not `----`)
2. Finds closing `\n---` delimiter
3. Extracts the YAML data string and the markdown content body
4. Returns `{ data: string, content: string }` or `null` if no valid frontmatter
The actual YAML parsing is delegated to `yaml.parse()`. The serializer uses `yaml.stringify()` for the data portion.
## Constraints
- **No gray-matter, no js-yaml** — these are hard exclusions for supply chain security.
- **YAML 1.2 only** — the `yaml` package implements YAML 1.2, which is a superset of JSON and avoids the ambiguous type coercion issues of YAML 1.1.
- **Frontmatter is a parsing concern, not a graph concern** — parsed `TaskInput` objects are fed to `TaskGraph.fromTasks()`. The parser doesn't know about graphs; the graph doesn't know about files.
- **File I/O functions use Node.js `fs` APIs** — `parseTaskFile` and `parseTaskDirectory` depend on `node:fs/promises` and are only available in Node.js-compatible runtimes. `parseFrontmatter` (the pure parsing function) is runtime-agnostic. Consumers targeting Deno or Bun should use `parseFrontmatter` directly with their own file-reading mechanism, or import the file I/O functions from a separate entry point if a browser-compatible bundle is needed.
## References
- `yaml` package: https://github.com/eemeli/yaml
- CVE-2025-64718 (js-yaml prototype pollution via `<<` merge key): confirmed, patched in js-yaml 4.1.1 and 3.14.2, but gray-matter still depends on the vulnerable 3.x line