fix(process): add architect Safe Exit for deferred decisions, clean hedging language

- Add Safe Exit section to architect spec: when a decision genuinely can't be made, mark OQ as deferred(scope) with concrete blocking condition, create blocker task, move on
- Add anti-patterns #10-#11 covering hedging synonyms (feature extension, additive, not a v1 blocker, for now, resolved with escape hatch)
- Add hedging audit to architect self-review checklist
- Clean hedging language from resolved OQs (OQ-04, OQ-13, OQ-14, OQ-16)
- Reclassify OQ-32 and OQ-41 as deferred(scope) with blocking conditions
- Add deferred(scope) status to OQ format in sdd_process.md
2026-07-04 15:31:04 +00:00
parent f390550a06
commit 3daecd7ab2
3 changed files with 122 additions and 16 deletions
--- a/.opencode/agents/architect.md
+++ b/.opencode/agents/architect.md
@@ -176,8 +176,21 @@ Before requesting external review:
 - Check that README has a complete ADR table and doc table
 - Ensure documents are focused (split if a spec exceeds ~700 lines)
 - Verify frontmatter statuses are correct
 - **Hedging audit**: Scan resolved OQs for hedging synonyms (anti-patterns
  #9–#11). If a "resolved" OQ's resolution is primarily about how the
  decision can be changed later, either drop the undo instructions (the
  decision is made) or re-mark it `deferred(scope)` (the decision is not
  made).
-### 5. Request Architecture Review
+### 5. Safe Exit: Deferred Decisions
 When you encounter a decision that genuinely can't be made:
 1. Mark the OQ as `deferred(scope)` with a concrete blocking condition
 2. Create a blocker task in `tasks/architecture/` naming the dependency
 3. Continue to decisions that *can* be made — do not stall on one question
 ### 6. Request Architecture Review
 Spawn a review subagent:
@@ -191,25 +204,27 @@ task(
    4. Undefined terms or concepts
    5. Ambiguities that could cause implementation issues
    6. Document size (recommend split if >700 lines)
    7. Hedging language in resolved OQs (anti-patterns #9-#11)
    Return a structured review with issues categorized as: critical, warning, suggestion",
    subagent_type="general"
 )
 ```
-### 6. Iterate Based on Review
+### 7. Iterate Based on Review
 Address feedback:
 - **Critical**: Must fix before stabilization — inline decisions not extracted,
-  ADR references that point to nonexistent files, undefined terms
+  ADR references that point to nonexistent files, undefined terms, hedging
  language in resolved OQs
 - **Warning**: Should fix — missing cross-references, documents approaching
  split threshold
 - **Suggestion**: Consider — minor clarity improvements
 Iterate until zero critical issues.
-### 7. Mark Review Status
+### 8. Mark Review Status
 When all open questions for a document are resolved and review is complete:
@@ -248,9 +263,9 @@ last_updated: 2026-05-29
   answer is resolved, not left "open" with hedging language like "v1 default"
   or "can be revisited later." If the decision is made, mark it resolved. If
   the decision genuinely can't be made yet (the use case isn't concrete,
-   the options aren't clear), leave it open — but say *why* it can't be made,
+   the options aren't clear), mark it `deferred(scope)` — see Safe Exit below.
-   not "we'll decide later." The architect's job is to make architecture
+   The architect's job is to make architecture decisions that *can* be made
-   decisions, not to defer them to the implementation agent.
+   and to clearly identify which decisions *can't* be made yet and why.
 ## Door Types and Decision Urgency
@@ -309,6 +324,86 @@ door, decide later."
   actually made. If the decision is made, state it cleanly. Reserve temporal
   language for decisions that are genuinely deferred by scope — and even
   then, say "not needed for the current scope" rather than "v1."
 10. **Hedging synonyms in "resolved" OQs**: The following patterns are
    structurally identical to the hedging in #9 — they reframe deferral as
    decisiveness. Do not use them on resolved decisions:
    - "feature extension, not an unmade decision" — if it's not decided, it's
      not resolved. Mark it `deferred(scope)`.
    - "additive, not blocking" — if it's not decided, don't claim it is.
    - "two-way door — can be changed later if needed" — door type classifies
      reversal cost, not whether a decision is made. A two-way door is a
      decision you make now. If you're using it to justify not deciding, see
      anti-pattern #8.
    - "not a v1 blocker" — if it's not decided, it's deferred. Say what
      unblocks it.
    - "for now" / "not yet" on a resolved OQ — if the resolution has an
      expiration date, it's not resolved. Mark it `deferred(scope)` with the
      condition that would trigger re-evaluation.
 11. **Resolved with escape hatch**: An OQ marked `resolved` whose resolution
    text is primarily about how the decision can be changed later. If the
    resolution is "X, but here's how we'd undo X," the decision is made —
    drop the undo instructions (they're implementation details, not
    architecture). If the resolution is "X for now, Y later," the decision
    is not made — mark it `deferred(scope)`.
 ## Safe Exit: Deferred Decisions
 When a decision genuinely can't be made because the information doesn't exist
 yet, the architect has a Safe Exit path. This is not a failure — it's scope
 management. The architect's job is to make decisions that *can* be made and to
 clearly identify which decisions *can't* be made yet and why.
 ### When to Defer
 A decision should be deferred when:
 - The use case isn't concrete (e.g., "we don't know what the agent crate will
  need from the call protocol")
 - The options depend on something that doesn't exist yet (e.g., "depends on
  the alknet-http crate spec")
 - The trade-off requires data that can only come from implementation (e.g.,
  "need performance benchmarks to choose between X and Y")
 - The decision is genuinely not needed for the current scope (e.g., "the
  current scope is core + call crates; this question is about the agent crate")
 ### How to Defer
 1. **Mark the OQ as `deferred(scope)`** — not `open` (implies it should be
   resolved now) and not `resolved` (implies it's decided).
 2. **State the blocking condition** — what specific thing would unblock this
   decision? Be concrete: "blocked on: alknet-agent crate spec exists" not
   "blocked on: future work."
 3. **Create a blocker task** in `tasks/architecture/` that names the
   dependency. This makes the deferral visible and actionable rather than
   buried in hedging language.
 4. **Move on** — the architect continues to decisions that *can* be made.
   Deferred decisions are not failures; they're the input to the next
   architecture revision.
 ### Deferred OQ Format
 ```markdown
 ### OQ-NN: <Question>
 - **Origin**: [spec-doc.md]
 - **Status**: deferred(scope)
 - **Door type**: <one-way | two-way>
 - **Priority**: <high | medium | low>
 - **Blocked on**: <concrete dependency — crate spec, POC result, use case>
 - **Resolution**: Not yet decidable. <Why the information doesn't exist yet.>
 - **Cross-references**: OQ-NN, ADR-NNN
 ```
 ### What NOT to Do
 - Do not mark a deferred decision as `resolved` with caveats. "Resolved with
  an escape hatch" is hedging.
 - Do not use "feature extension" / "additive" / "not blocking" as a
  substitute for `deferred(scope)`. Those phrases describe implementation
  sequencing, not architectural decisions.
 - Do not leave a deferred decision as `open` without a blocking condition.
  "Open" means "needs to be resolved now" — if it can't be resolved now, it's
  `deferred(scope)`.
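The hedging audit described in the checklist and in anti-patterns #9-#11 could be partly mechanized. The following is a minimal sketch, not part of this commit: the function names (`split_oqs`, `audit_hedging`) and the phrase list are assumptions chosen for illustration, and the parser assumes OQ entries follow the `### OQ-NN:` heading and `- **Field**:` bullet layout shown in the diff.

```python
import re

# Illustrative phrase list drawn from anti-patterns #9-#11; not exhaustive.
HEDGING_PHRASES = [
    "for now",
    "not yet",
    "can be changed later",
    "can be revisited later",
    "later if needed",
    "not a v1 blocker",
    "feature extension",
    "additive, not blocking",
    "v1 default",
]

OQ_HEADING = re.compile(r"^### (OQ-\d+):", re.MULTILINE)
STATUS = re.compile(r"^\s*- \*\*Status\*\*:\s*(.+)$", re.MULTILINE)
# Resolution text runs until the next "- **Field**" bullet or the end of the block.
RESOLUTION = re.compile(
    r"- \*\*Resolution\*\*:(.*?)(?=^\s*- \*\*|\Z)", re.MULTILINE | re.DOTALL
)


def split_oqs(markdown):
    """Yield (oq_id, block_text) for each '### OQ-NN:' section."""
    matches = list(OQ_HEADING.finditer(markdown))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        yield m.group(1), markdown[m.start():end]


def audit_hedging(markdown):
    """Return {oq_id: [phrases]} for resolved OQs whose resolution hedges."""
    findings = {}
    for oq_id, block in split_oqs(markdown):
        status = STATUS.search(block)
        if not status or status.group(1).strip() != "resolved":
            continue  # only resolved OQs are audited
        resolution = RESOLUTION.search(block)
        text = resolution.group(1).lower() if resolution else ""
        hits = [p for p in HEDGING_PHRASES if p in text]
        if hits:
            findings[oq_id] = hits
    return findings
```

A substring match like this only flags candidates. Whether a flagged OQ should drop its undo instructions or be re-marked `deferred(scope)` remains a judgment call for the architect.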
 ## When to Redirect
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -7,6 +7,12 @@ last_updated: 2026-07-04
 Questions are organized by theme. Each question has a stable OQ-ID for cross-referencing from spec documents.
 **Status values**:
 - `open` — Needs to be resolved now. Has a clear path to resolution.
 - `resolved` — Decided. The resolution is stated cleanly, without caveats about how it could be changed later.
 - `deferred(scope)` — Cannot be resolved yet. The information doesn't exist. Has a concrete blocking condition (e.g., "blocked on: alknet-agent crate spec"). Not a failure — scope management.
 - `partially resolved` — Some aspects decided, others deferred or open.
 Door type classifications follow ADR-009 — they describe **reversal cost** (how expensive it is to undo), not urgency:
 - **One-way door**: Reversal requires rewriting significant code or permanently closes a capability. Getting it wrong is expensive — requires ADR before implementation.
 - **Two-way door**: Reversal is cheap or additive. Getting it wrong is recoverable — decide, implement, revert if needed.
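The status values above imply two checks that a script could enforce: the status must be one of the four listed values with no parenthetical caveat, and a `deferred(scope)` entry must carry a concrete blocking condition. The sketch below is an assumption about how such a check might look, not tooling that exists in the repository. `validate_oq`, `parse_fields`, and the `VAGUE_BLOCKERS` set are hypothetical names, and only the first line of each field value is inspected.

```python
import re

ALLOWED_STATUSES = {"open", "resolved", "deferred(scope)", "partially resolved"}
# Hypothetical examples of blocking conditions too vague to act on.
VAGUE_BLOCKERS = {"", "future work", "tbd", "later"}

FIELD = re.compile(r"^\s*- \*\*(?P<name>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_fields(block):
    """Map lowercased field names to the first line of their values."""
    fields = {}
    for line in block.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group("name").strip().lower()] = m.group("value").strip()
    return fields


def validate_oq(block):
    """Return a list of problems found in one OQ entry (empty if none)."""
    fields = parse_fields(block)
    problems = []
    status = fields.get("status", "")
    if status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status!r}")
    blocked = fields.get("blocked on", "").lower()
    if status == "deferred(scope)" and blocked in VAGUE_BLOCKERS:
        problems.append("deferred(scope) requires a concrete 'Blocked on' condition")
    return problems
```

Under this check, a status such as `open (feature extension, not an unmade architecture decision)` is rejected because the caveat is part of the value, which matches the reclassification this commit applies to OQ-32.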
@@ -50,7 +56,7 @@ Door type is separate from whether a decision is made. A two-way door is a decis
 - **Status**: resolved
 - **Door type**: Two-way
 - **Priority**: low
- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The `ArcSwap<HandlerRegistry>` pattern can be applied later if needed (two-way door). See ADR-010.
+- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup. See ADR-010.
  **Scope clarification (ADR-024)**: This resolution applies to the
  **`HandlerRegistry`** (ALPN string → `ProtocolHandler`), which is what
@@ -191,7 +197,7 @@ These questions are acknowledged but not active. They will be promoted to open w
 - **Status**: resolved
 - **Door type**: Two-way
 - **Priority**: medium
- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). This is the correct format for the alknet-call crate — it is not a "Phase 1 simplification" but the right design for this architecture. The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. If remote dispatch is ever needed, it would be addressed by a separate crate or a routing layer above the operation registry, not by changing alknet-call's path format. Two-way door — the path format can be extended later if needed, but `/{service}/{op}` is the correct design now.
+- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. See ADR-005, ADR-012.
 - **Cross-references**: ADR-005, ADR-012
 ### OQ-14: Batch Operation Semantics
@@ -200,7 +206,7 @@ These questions are acknowledged but not active. They will be promoted to open w
 - **Status**: resolved
 - **Door type**: Two-way
 - **Priority**: low
- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. This is the correct protocol design, not a simplification to be "upgraded" later. QUIC's stream multiplexing already provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. If a compelling use case for atomic batch semantics emerges, it can be added as a new event type without breaking existing clients. Two-way door.
+- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. QUIC's stream multiplexing provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. See ADR-012.
 - **Cross-references**: ADR-012
 ## Theme: alknet-call
@@ -220,7 +226,7 @@ These questions are acknowledged but not active. They will be promoted to open w
 - **Status**: resolved
 - **Door type**: One-way
 - **Priority**: high
- **Resolution**: No vault operations are exposed over the call protocol for now. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. If a future use case requires exposing a vault operation over the call protocol (e.g., a restricted `vault/public-key` operation that returns only public key material for identity verification), it would require its own ADR with an explicit threat model justification. See ADR-014.
+- **Resolution**: No vault operations are exposed over the call protocol. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. See ADR-014.
 - **Cross-references**: ADR-008, ADR-014, [operation-registry.md](crates/call/operation-registry.md)
 ### OQ-17: Abort Cascade Semantics for Nested Calls
@@ -510,9 +516,10 @@ is a feature extension, not an unmade architecture decision.
 ### OQ-32: Multi-Hop Federation
 - **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §3.7, `docs/research/alknet-call-peer-routing/findings.md` §3.7
- **Status**: open (feature extension, not an unmade architecture decision)
+- **Status**: deferred(scope)
 - **Door type**: One-way (federation model), two-way (mechanism)
 - **Priority**: low
 - **Blocked on**: A concrete use case for multi-hop federation. The one-hop model covers all current use cases (head→worker, runner→hub).
 - **Resolution**: The model is **one-hop** — worker A does not transitively
  see worker B's ops through the head unless the head explicitly re-exports
  them. The peer-keyed overlay model extends to multi-hop without redesign
@@ -920,11 +927,14 @@ is a feature extension, not an unmade architecture decision.
 - **Origin**: [ADR-049](decisions/049-streaming-handler-for-subscriptions.md),
  [operation-registry.md](crates/call/operation-registry.md) §"OperationEnv"
- **Status**: open (feature extension — a library to build, not a decision
+- **Status**: deferred(scope)
-  to make before implementation)
 - **Door type**: Two-way (additive utility library; no protocol or API-surface
  change)
 - **Priority**: low
 - **Blocked on**: A handler that needs stream operators and finds the existing
  combinators (`Box::pin(stream::iter(...))`, `async_stream::stream!`,
  `futures::stream`) insufficient. The operators library is a convenience, not
  a prerequisite for any handler.
 - **Resolution**: ADR-049 establishes that stream composition (filter, map,
  combine, window, dedupe) is a **handler-level concern**, not a protocol
  composition concern. `OperationEnv::invoke()` is request/response-only;
@@ -937,7 +947,7 @@ is a feature extension, not an unmade architecture decision.
  The Rust analogue — a stream-operators utility crate or module providing
  the same set of operators on `BoxStream<T>` / `impl Stream<Item = T>` — is
-  a **feature extension**, not an unmade architectural decision. Handlers can
+  a feature extension. Handlers can
  produce streams today without it (`Box::pin(stream::iter(...))`,
  `async_stream::stream!`, `futures::stream` combinators all work); the
  operators library is a convenience that reduces boilerplate for handlers
--- a/docs/sdd_process.md
+++ b/docs/sdd_process.md
@@ -625,9 +625,10 @@ they don't revert. If superseded, mark the old one and create a new one.
 ### OQ-NN: <Question>
 - **Origin**: [spec-doc.md]
- **Status**: open | resolved
+- **Status**: open | resolved | deferred(scope) | partially resolved
 - **Priority**: high | medium | low
 - **Resolution**: (when resolved)
 - **Blocked on**: (when deferred — concrete dependency that would unblock)
 - **Cross-references**: OQ-NN, ADR-NNN
 ```