add architecture gap analysis review (9 critical, 12 warnings)

This commit is contained in:
2026-05-19 10:56:25 +00:00
parent d2253099ee
commit 037dfec44b

View File

@@ -0,0 +1,317 @@
---
status: open
review_date: 2026-05-19
reviewer: architect (automated gap analysis)
scope: Phase 0→1 architecture documents
baseline: docs/architecture/ (10 docs + 3 ADRs + README)
---
# Review 001: Architecture Gap Analysis
First comprehensive review of the flowgraph architecture specification, conducted at Phase 0 (pre-implementation). Three specialized reviewer sub-agents analyzed the full document set across schema/data-model, reactive/host-config, and cross-cutting/integration concerns.
## Summary
The architecture is structurally sound — clear separation between the three graph types (operation, call, template), sensible dependency choices (graphology, ujsx, @preact/signals-core), and good documentation breadth. However, 9 critical issues and 12 warnings were identified that would block or complicate implementation. Most critical issues fall into two themes: **类型定义 缺失** (referenced types not formally defined) and **行为规范 不足** (key behaviors underspecified, especially around failure propagation and lifecycle).
---
## Critical Issues
### C-01: Broken cross-reference in README
**File**: `docs/architecture/README.md`, line 56
**Problem**: Links to `call-graph-runtime.md` but the actual file is `call-graph.md`.
**Fix**: Update the link to `call-graph.md`.
**Severity**: Trivial to fix, but broken links erode trust in the documentation.
### C-02: Missing `CallEdgeAttrs` type definition
**Referenced in**: `call-graph.md:27`, `call-graph.md:251`, `build-distribution.md:28`
**Defined in schema.md**: Only `TriggeredEdgeAttrs` and `DependencyEdgeAttrs` exist individually
**Problem**: `FlowGraph<CallNodeAttrs, CallEdgeAttrs>` is used as a type parameter, but `CallEdgeAttrs` is never defined as a union or alias. Schema.md defines the individual edge types but not the combined call-graph edge type.
**Fix**: Add a `CallEdgeAttrs` type alias in schema.md:
```typescript
type CallEdgeAttrs = TriggeredEdgeAttrs | DependencyEdgeAttrs;
```
Align `CallGraphSerialized` to use this alias instead of the inline `Type.Union([TriggeredEdgeAttrs, DependencyEdgeAttrs])`.
### C-03: Missing `OperationEdgeAttrs` type reference
**Referenced in**: `analysis.md:72`, `analysis.md:132`, `analysis.md:145`, `workflow-templates.md:249`
**Defined in schema.md**: Only `TypedEdgeAttrs` exists
**Problem**: Multiple docs reference `OperationEdgeAttrs` as a type parameter (`FlowGraph<OperationNodeAttrs, OperationEdgeAttrs>`), but schema.md only defines `TypedEdgeAttrs`. This naming inconsistency will cause divergent implementations.
**Fix**: Either rename `TypedEdgeAttrs` to `OperationEdgeAttrs` (clearer, consistent with the `CallEdgeAttrs` pattern), or add an alias:
```typescript
type OperationEdgeAttrs = TypedEdgeAttrs;
```
Update all references consistently.
### C-04: Failure propagation gap in reactive execution
**File**: `reactive-execution.md`
**Problem**: When a predecessor node fails, downstream nodes that depend on it remain stuck in `idle`/`waiting` forever. Their `prerequisites` signal never resolves because it depends on the failed node transitioning to `completed`. This is THE classic DAG engine failure mode — without abort cascade, a single failure poisons all downstream nodes.
**Current spec**: Node status transitions are:
```
idle → waiting → ready → running → completed
→ failed
→ aborted
```
The `skipped` state exists for conditional branches, but no path leads downstream nodes to `skipped` or `aborted` when an upstream dependency fails.
**Fix**: Add failure propagation semantics:
- When a node transitions to `failed`, all transitive dependents that haven't started yet should transition to `aborted` (not `skipped`, which is reserved for conditional branches).
- Define whether propagation is immediate (reactive cascade) or lazy (downstream nodes check on evaluation).
- Decide whether failure propagation is automatic or requires explicit opt-in.
- Document the interaction with `Conditional` nodes: does a failed predecessor's branch skip evaluation entirely?
### C-05: No FlowGraph public API document
**Problem**: The central `FlowGraph` class is documented across at least 3 files:
- `operation-graph.md` — construction (`fromSpecs`), queries (`typeCompat`)
- `call-graph.md` — construction (`fromCallEvents`), mutations, lifecycle
- `analysis.md` — convenience methods (`topologicalOrder`, `parallelGroups`, `reachableFrom`)
No single document specifies:
- Constructor signature and type parameters (`FlowGraph<NodeAttrs, EdgeAttrs>`)
- Whether `FlowGraph` wraps or extends `graphology.DirectedGraph`
- Complete method list with signatures
- Immutability guarantees (which methods mutate vs. return new instances?)
- The delegation model — which methods are on `FlowGraph` directly vs. inherited from graphology?
**Fix**: Add a `flowgraph-api.md` document (or a dedicated section in README) that specifies the complete public API surface with method signatures.
### C-06: `<Map>` component undefined
**Referenced in**: `host-configs.md:27` (WorkflowTag union type includes `"map"`), `host-configs.md:89` (switch case for `"map"`), `host-configs.md:173`, `host-configs.md:200`
**Problem**: `WorkflowTag` is defined as `"operation" | "sequential" | "parallel" | "conditional" | "map"`, and both HostConfigs have switch cases for `"map"`, but the `<Map>` component has no documentation. The host config pseudocode for `"map"` cases is minimal:
```typescript
case "map":
// Map over array, create one child per item
```
There is no documentation of what `<Map>` does, what props it takes, how it creates children, or how it behaves in the DAG and reactive contexts.
**Fix**: Add `<Map>` component documentation to `workflow-templates.md` covering:
- Props (array source, item variable, child template)
- DAG rendering behavior (how edges are created for each mapped item)
- Reactive rendering behavior (how preconditions work for mapped nodes)
- Edge type for mapped children
### C-07: `Conditional` else-branch handling ambiguous
**File**: `workflow-templates.md:125`
**Current spec**:
```typescript
else?: UNode; // Alternative branch if test fails
```
**Problem**: Three questions are unanswered:
1. Does the `else` branch render a DAG edge? If so, what `edgeType`?
2. If the `Conditional` test evaluates `true`, does the `else` branch still exist in the graph (as a `skipped` node)? Or is it absent entirely?
3. How does the reactive engine handle the `else` branch's `prerequisites`? If the test is `false`, does the else branch inherit the conditional's prerequisites plus a "`test` was `false`" condition?
**Fix**: Specify explicitly:
- The `else` branch renders as a separate subgraph with `edgeType: "conditional"` and a negated condition.
- When test is `true`: else-branch nodes transition to `skipped`.
- When test is `false`: else-branch nodes become `ready` (same prerequisites, condition negated).
- If no `else` prop: nothing is rendered for the `false` path.
### C-08: `WorkflowReactiveRoot` vs `ReactiveHostConfig` ownership unclear
**Files**: `reactive-execution.md`, `host-configs.md`
**Problem**: Both `WorkflowReactiveRoot` and `ReactiveHostConfig` are documented, but their ownership and lifecycle relationship is unclear:
- Which creates which?
- Can you have a `ReactiveHostConfig` without a `WorkflowReactiveRoot`?
- Who owns the signal graph?
- When is the reactive graph created vs. destroyed?
- `WorkflowReactiveRoot.dispose()` tears down signals, but when is it called and by whom?
The docs describe these as separate concerns but don't specify how they interact.
**Fix**: Add a "Lifecycle and Ownership" section to either `reactive-execution.md` or `host-configs.md` that specifies:
1. Creation order: template → GraphologyHostConfig (produces DAG) → DAG → WorkflowReactiveRoot (creates signals) → ReactiveHostConfig (renders template against reactive graph)
2. Ownership: WorkflowReactiveRoot owns the signal graph; ReactiveHostConfig is stateless after rendering
3. Disposal: WorkflowReactiveRoot.dispose() tears down all signals and effects; called by the consumer when execution completes or is cancelled
### C-09: No consumer integration guide
**Problem**: There is no document showing how a consumer (alkhub, OpenCode, etc.) actually uses the library end-to-end. The individual pieces (operation graph, call graph, templates, reactive execution) are well-documented in isolation, but the integration path — from operation specs to a running workflow — is not described.
**Fix**: Add a "Getting Started" or "Consumer Integration" document that walks through the end-to-end flow:
1. Register operations → build operation graph
2. Define workflow template → validate against operation graph
3. Render template to DAG → inspect/validate DAG
4. Create reactive execution → run workflow
5. Subscribe to status changes → respond to completion/failure
6. Export/import for persistence
This doesn't need to be long — even a sequence diagram with code sketches would help.
---
## Warnings
### W-01: Inconsistent naming — `computePrerequisites` vs `computePreconditions`
**Files**: `host-configs.md:232` uses `computePrerequisites`, while `reactive-execution.md` refers to `preconditions` (and the node status is `waiting` for "preconditions not met").
**Fix**: Pick one term consistently. Suggest: `prerequisites` for the function name (since it's in the host config which is closer to the graph), `preconditions` for the conceptual state (matching `NodeStatus.waiting` description). Or just standardize on one term everywhere.
### W-02: Error boundary scope missing
**Files**: `error-handling.md`, `reactive-execution.md`, `call-graph.md`
**Problem**: `reactive-execution.md:245` references `WorkflowErrorBoundary` wrapping the reactive execution. `error-handling.md` defines error types and call-graph error boundaries. But there is no document specifying how errors propagate in the reactive context — specifically what `WorkflowErrorBoundary` wraps, how it catches signal errors, and how it interacts with the node status lifecycle.
**Fix**: Either add reactive error semantics to `error-handling.md`, or add a dedicated section to `reactive-execution.md`.
### W-03: Host context types partially undefined
**Files**: `host-configs.md`
**Problem**: `GraphContext` is defined (lines 58-65) but `ReactiveContext` is only partially defined (lines 187-197). The `ReactiveContext` is shown as having `operationRegistry` and `statusMap`, but its full shape and how it's constructed aren't specified.
**Fix**: Complete the `ReactiveContext` interface definition with all fields, construction, and lifecycle.
### W-04: Template composition rules not formalized
**Files**: `workflow-templates.md`
**Problem**: Which components are valid children of which? Can `<Operation>` appear inside `<Conditional>` test? Can `<Sequential>` have one child? Can `<Map>` appear inside `<Parallel>`? These composition rules affect both the template validator and the HostConfig implementations, but they're only implied by examples.
**Fix**: Add a "Composition Rules" section to `workflow-templates.md` with a validity matrix or explicit rules.
### W-05: Missing `removeChild` in HostConfig
**Files**: `host-configs.md`
**Problem**: The ujsx `HostConfig` interface likely requires `removeChild` for reconciliation, but only `createInstance` and `appendChild` are documented. No teardown/disposal path is shown for either host.
**Fix**: Document `removeChild` for both GraphologyHostConfig and ReactiveHostConfig, specifying what happens when a node is removed (edge cleanup, signal disposal, etc.).
### W-06: Signal/effect disposal not documented
**Files**: `reactive-execution.md`
**Problem**: `WorkflowReactiveRoot` has a `dispose()` method that tears down signals and effects (line 239), but the disposal sequence is not specified. When an individual node is removed (vs. entire graph disposal), who handles its signal cleanup?
**Fix**: Document the effect lifecycle: creation (during `createInstance`), update (via reactive recomputation), and disposal (when `removeChild` is called or `WorkflowReactiveRoot.dispose()` tears everything down).
### W-07: Serialization versioning absent
**Files**: `schema.md`
**Problem**: `FlowGraphSerialized` follows graphology's native JSON format with no `schemaVersion` field. While schema.md explicitly states this is intentional ("following taskgraph"), it also says "Consumers that need persistence wrap it in their own versioned envelope." This should be documented more prominently — possibly as a constraint or ADR.
**Fix**: Consider adding ADR-004 documenting the "no version field" decision and its trade-offs. At minimum, add a prominent note in the constraints section.
### W-08: Type compatibility depth not specified
**Files**: `analysis.md`
**Problem**: The `typeCompat()` function compares `inputSchema` and `outputSchema`, but the depth of compatibility checking is unspecified. Is it shallow (field names only)? Deep (recursive type structure)? Does it handle optional fields? Nested objects?
**Fix**: Specify the compatibility contract: what `compatible: true` means, what `compatible: false` means, and what `compatibilityDetail` contains.
### W-09: ADR status should be "Accepted"
**Files**: `decisions/001-003`
**Problem**: All three ADRs are marked `Status: Proposed`, but they represent decisions that have already been made and are reflected in the architecture. They aren't proposals anymore — they're the basis of the design.
**Fix**: Update all ADR statuses to `Accepted`.
### W-10: Missing `addDependencyEdge` in call graph mutations
**Files**: `call-graph.md`
**Problem**: The call graph mutation section describes `addTriggeredEdge` but doesn't explicitly define `addDependencyEdge` as a separate method. If both edge types use the same `addEdge` method with different attributes, that should be specified. If they're separate methods, `addDependencyEdge` needs documentation.
**Fix**: Clarify the mutation API — either document `addDependencyEdge` or explain that `addEdge` handles both types via the `EdgeType` attribute.
### W-11: Performance characteristics undocumented
**Files**: `analysis.md`
**Problem**: The `buildTypeEdges()` function is noted as potentially O(n²) in the number of operations (checking every pair for type compatibility). No other performance characteristics are documented (e.g., `topologicalOrder` is O(V+E), `parallelGroups` is O(V+E), etc.).
**Fix**: Add a "Performance" section to `analysis.md` with complexity notes for each function.
### W-12: `TemplateEdgeAttrs` vs `TypedEdgeAttrs` naming inconsistency
**Files**: `schema.md`, `build-distribution.md`
**Problem**: `schema.md` defines `TypedEdgeAttrs` (for operation graphs) and `TemplateEdgeAttrs` (for templates). `build-distribution.md:28` lists both `TypedEdgeAttrs` and `TemplateEdgeAttrs` in edge.ts, which is consistent. But `workflow-templates.md` references the edge type on template DAGs without specifying whether it uses `TemplateEdgeAttrs` or something else. Meanwhile, operation-graph.md references `TypedEdgeAttrs` correctly. The naming pattern is inconsistent: operation edges are "typed", template edges are "template", call edges are "triggered"/"dependency".
**Fix**: Standardize naming — suggest `{GraphType}EdgeAttrs` pattern: `OperationEdgeAttrs`, `CallEdgeAttrs`, `TemplateEdgeAttrs`.
---
## Suggestions (Nice to Have)
### S-01: Getting Started / Happy Path document
A consumer-facing walkthrough showing end-to-end usage, even as a sequence diagram with code sketches. This overlaps with C-09 but is worth calling out separately as a quality-of-life improvement.
### S-02: Flow diagrams for template rendering pipeline
An ASCII or Mermaid diagram showing: template → ujsx HostConfig → DAG (Graphology) → reactive execution. The current prose is accurate but a visual would help.
### S-03: Node status state machine diagram
The status transitions are documented textually but a state machine diagram (in `reactive-execution.md`) would make the lifecycle clearer, especially once failure propagation is specified (C-04).
### S-04: Testing strategy section or document
No mention of testing patterns. How do you test a reactive graph? How do you test signal-driven status propagation? How do you test template → DAG rendering? A testing section would help implementers.
### S-05: Additional ADRs for inline decisions
Several decisions are embedded in architecture docs but not formalized as ADRs:
- "Signals over generators" for reactive execution (in `reactive-execution.md`)
- "In-memory only with export boundary" (partially in ADR-003 but the "in-memory only" aspect could be separate)
- "DAG-only enforcement" (covered in ADR-002, so this is fine)
### S-06: Source structure validation
`README.md` mentions `src/graph/mutation.ts` and `src/graph/validation.ts` but these aren't covered by dedicated architecture docs. If they're part of `call-graph.md` or `operation-graph.md`, the cross-reference should be explicit. If they deserve their own docs, they should have them.
---
## ADR Review
### ADR-001: ujsx as Template IR — ✅ Good
Well-structured, clear context/decision/consequences. The trade-offs (function props don't serialize, ujsx dependency) are honestly documented.
### ADR-002: DAG-only Graph — ✅ Good
Clear rationale, honest about the difference from taskgraph. The consequences section correctly notes that `topologicalOrder()` never throws and `hasCycles()` always returns `false`.
### ADR-003: Decoupled Storage — ✅ Good
Strong separation of concerns argument. The consequence "The hub must keep its storage schema in sync with FlowGraphSerialized" is important and well-called-out.
**All three**: Status should be `Accepted`, not `Proposed`. These decisions are reflected in the architecture — they're past the proposal stage.
---
## Resolution Checklist
When addressing these issues, use this checklist to track progress:
- [ ] C-01: Fix README cross-reference link
- [ ] C-02: Add `CallEdgeAttrs` type alias to schema.md
- [ ] C-03: Resolve `OperationEdgeAttrs` vs `TypedEdgeAttrs` naming
- [ ] C-04: Specify failure propagation semantics in reactive-execution.md
- [ ] C-05: Create FlowGraph public API document
- [ ] C-06: Document `<Map>` component in workflow-templates.md
- [ ] C-07: Specify `Conditional` else-branch behavior
- [ ] C-08: Specify `WorkflowReactiveRoot``ReactiveHostConfig` ownership
- [ ] C-09: Create consumer integration guide
- [ ] W-01: Standardize `prerequisites` vs `preconditions` terminology
- [ ] W-02: Add reactive error boundary semantics
- [ ] W-03: Complete `ReactiveContext` interface definition
- [ ] W-04: Add template composition rules
- [ ] W-05: Document `removeChild` for both HostConfigs
- [ ] W-06: Document signal/effect disposal lifecycle
- [ ] W-07: Consider ADR-004 for "no schema version"
- [ ] W-08: Specify type compatibility depth
- [ ] W-09: Update ADR statuses to Accepted
- [ ] W-10: Clarify call graph mutation API (`addDependencyEdge`)
- [ ] W-11: Add performance characteristics section
- [ ] W-12: Standardize edge attribute naming pattern
- [ ] S-01: Getting Started walkthrough document
- [ ] S-02: Flow diagrams for template rendering pipeline
- [ ] S-03: Node status state machine diagram
- [ ] S-04: Testing strategy documentation
- [ ] S-05: Additional ADRs for inline decisions
- [ ] S-06: Validate source structure cross-references
---
## References
- Architecture overview: `docs/architecture/README.md`
- Schema: `docs/architecture/schema.md`
- Call graph: `docs/architecture/call-graph.md`
- Operation graph: `docs/architecture/operation-graph.md`
- Analysis: `docs/architecture/analysis.md`
- Reactive execution: `docs/architecture/reactive-execution.md`
- Host configs: `docs/architecture/host-configs.md`
- Workflow templates: `docs/architecture/workflow-templates.md`
- Error handling: `docs/architecture/error-handling.md`
- Build & distribution: `docs/architecture/build-distribution.md`
- ADR-001: `docs/architecture/decisions/001-ujsx-as-template-ir.md`
- ADR-002: `docs/architecture/decisions/002-dag-only-graph.md`
- ADR-003: `docs/architecture/decisions/003-storage-decoupled.md`