fix(process): add architect Safe Exit for deferred decisions, clean hedging language

- Add Safe Exit section to architect spec: when a decision genuinely can't
  be made, mark OQ as deferred(scope) with concrete blocking condition,
  create blocker task, move on
- Add anti-patterns #10-#11 covering hedging synonyms (feature extension,
  additive, not a v1 blocker, for now, resolved with escape hatch)
- Add hedging audit to architect self-review checklist
- Clean hedging language from resolved OQs (OQ-04, OQ-13, OQ-14, OQ-16)
- Reclassify OQ-32 and OQ-41 as deferred(scope) with blocking conditions
- Add deferred(scope) status to OQ format in sdd_process.md
2026-07-04 15:31:04 +00:00
parent f390550a06
commit 3daecd7ab2
3 changed files with 122 additions and 16 deletions


@@ -176,8 +176,21 @@ Before requesting external review:
- Check that README has a complete ADR table and doc table
- Ensure documents are focused (split if a spec exceeds ~700 lines)
- Verify frontmatter statuses are correct
- **Hedging audit**: Scan resolved OQs for hedging synonyms (anti-patterns
#9-#11). If a "resolved" OQ's resolution is primarily about how the
decision can be changed later, either drop the undo instructions (the
decision is made) or re-mark it `deferred(scope)` (the decision is not
made).
-### 5. Request Architecture Review
+### 5. Safe Exit: Deferred Decisions
When you encounter a decision that genuinely can't be made:
1. Mark the OQ as `deferred(scope)` with a concrete blocking condition
2. Create a blocker task in `tasks/architecture/` naming the dependency
3. Continue to decisions that *can* be made — do not stall on one question
### 6. Request Architecture Review
Spawn a review subagent:
@@ -191,25 +204,27 @@ task(
4. Undefined terms or concepts
5. Ambiguities that could cause implementation issues
6. Document size (recommend split if >700 lines)
7. Hedging language in resolved OQs (anti-patterns #9-#11)
Return a structured review with issues categorized as: critical, warning, suggestion",
subagent_type="general"
)
``` ```
-### 6. Iterate Based on Review
+### 7. Iterate Based on Review
Address feedback:
- **Critical**: Must fix before stabilization — inline decisions not extracted,
-ADR references that point to nonexistent files, undefined terms
+ADR references that point to nonexistent files, undefined terms, hedging
+language in resolved OQs
- **Warning**: Should fix — missing cross-references, documents approaching
split threshold
- **Suggestion**: Consider — minor clarity improvements
Iterate until zero critical issues.
-### 7. Mark Review Status
+### 8. Mark Review Status
When all open questions for a document are resolved and review is complete:
@@ -248,9 +263,9 @@ last_updated: 2026-05-29
answer is resolved, not left "open" with hedging language like "v1 default"
or "can be revisited later." If the decision is made, mark it resolved. If
the decision genuinely can't be made yet (the use case isn't concrete,
-the options aren't clear), leave it open — but say *why* it can't be made,
-not "we'll decide later." The architect's job is to make architecture
-decisions, not to defer them to the implementation agent.
+the options aren't clear), mark it `deferred(scope)` — see Safe Exit below.
+The architect's job is to make architecture decisions that *can* be made
+and to clearly identify which decisions *can't* be made yet and why.
## Door Types and Decision Urgency
@@ -309,6 +324,86 @@ door, decide later."
actually made. If the decision is made, state it cleanly. Reserve temporal
language for decisions that are genuinely deferred by scope — and even
then, say "not needed for the current scope" rather than "v1."
10. **Hedging synonyms in "resolved" OQs**: The following patterns are
structurally identical to the hedging in #9 — they reframe deferral as
decisiveness. Do not use them on resolved decisions:
- "feature extension, not an unmade decision" — if it's not decided, it's
not resolved. Mark it `deferred(scope)`.
- "additive, not blocking" — if it's not decided, don't claim it is.
- "two-way door — can be changed later if needed" — door type classifies
reversal cost, not whether a decision is made. A two-way door is a
decision you make now. If you're using it to justify not deciding, see
anti-pattern #8.
- "not a v1 blocker" — if it's not decided, it's deferred. Say what
unblocks it.
- "for now" / "not yet" on a resolved OQ — if the resolution has an
expiration date, it's not resolved. Mark it `deferred(scope)` with the
condition that would trigger re-evaluation.
11. **Resolved with escape hatch**: An OQ marked `resolved` whose resolution
text is primarily about how the decision can be changed later. If the
resolution is "X, but here's how we'd undo X," the decision is made —
drop the undo instructions (they're implementation details, not
architecture). If the resolution is "X for now, Y later," the decision
is not made — mark it `deferred(scope)`.
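The phrase lists in anti-patterns #10 and #11 lend themselves to a mechanical first pass. The following is a minimal sketch, assuming each OQ is available as a status string plus its resolution text; the function names and the exact phrase list are hypothetical and only illustrate the idea.

```rust
/// Hedging phrases drawn from anti-patterns #9-#11.
/// The list is illustrative, not exhaustive (assumption).
const HEDGING_PHRASES: &[&str] = &[
    "feature extension",
    "additive, not blocking",
    "can be changed later",
    "not a v1 blocker",
    "for now",
    "not yet",
    "v1 default",
    "can be revisited later",
];

/// Returns every hedging phrase found in `resolution`, case-insensitively,
/// in the order the phrases appear in `HEDGING_PHRASES`.
pub fn hedging_hits(resolution: &str) -> Vec<&'static str> {
    let lower = resolution.to_lowercase();
    HEDGING_PHRASES
        .iter()
        .copied()
        .filter(|phrase| lower.contains(*phrase))
        .collect()
}

/// Applies the audit only to OQs whose status is exactly `resolved`.
/// Deferred and open OQs are allowed to talk about the future.
pub fn audit(status: &str, resolution: &str) -> Vec<&'static str> {
    if status.trim() == "resolved" {
        hedging_hits(resolution)
    } else {
        Vec::new()
    }
}
```

A hit is a candidate for human review rather than a verdict: substring matching will flag legitimate uses of phrases such as "not yet", so the reviewer still decides whether to drop the undo instructions or re-mark the OQ as `deferred(scope)`.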
## Safe Exit: Deferred Decisions
When a decision genuinely can't be made because the information doesn't exist
yet, the architect has a Safe Exit path. This is not a failure — it's scope
management. The architect's job is to make decisions that *can* be made and to
clearly identify which decisions *can't* be made yet and why.
### When to Defer
A decision should be deferred when:
- The use case isn't concrete (e.g., "we don't know what the agent crate will
need from the call protocol")
- The options depend on something that doesn't exist yet (e.g., "depends on
the alknet-http crate spec")
- The trade-off requires data that can only come from implementation (e.g.,
"need performance benchmarks to choose between X and Y")
- The decision is genuinely not needed for the current scope (e.g., "the
current scope is core + call crates; this question is about the agent crate")
### How to Defer
1. **Mark the OQ as `deferred(scope)`** — not `open` (implies it should be
resolved now) and not `resolved` (implies it's decided).
2. **State the blocking condition** — what specific thing would unblock this
decision? Be concrete: "blocked on: alknet-agent crate spec exists" not
"blocked on: future work."
3. **Create a blocker task** in `tasks/architecture/` that names the
dependency. This makes the deferral visible and actionable rather than
buried in hedging language.
4. **Move on** — the architect continues to decisions that *can* be made.
Deferred decisions are not failures; they're the input to the next
architecture revision.
### Deferred OQ Format
```markdown
### OQ-NN: <Question>
- **Origin**: [spec-doc.md]
- **Status**: deferred(scope)
- **Door type**: <one-way | two-way>
- **Priority**: <high | medium | low>
- **Blocked on**: <concrete dependency — crate spec, POC result, use case>
- **Resolution**: Not yet decidable. <Why the information doesn't exist yet.>
- **Cross-references**: OQ-NN, ADR-NNN
```
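One way to make the "deferred requires a blocking condition" rule hard to violate is to encode it in the type. The sketch below is an assumption about how tooling might model this, not part of the process itself: `Status`, `defer`, `status_lines`, and the `VAGUE` list are hypothetical names.

```rust
use std::fmt;

/// OQ status values. `Deferred` carries its blocking condition, so a
/// deferral without one cannot be constructed.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Open,
    Resolved,
    Deferred { blocked_on: String },
    PartiallyResolved,
}

impl fmt::Display for Status {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self {
            Status::Open => "open",
            Status::Resolved => "resolved",
            Status::Deferred { .. } => "deferred(scope)",
            Status::PartiallyResolved => "partially resolved",
        };
        f.write_str(label)
    }
}

/// Placeholder conditions treated as too vague (illustrative list).
const VAGUE: &[&str] = &["future work", "later", "tbd"];

/// Builds a deferred status, rejecting empty or vague blocking conditions.
pub fn defer(blocked_on: &str) -> Result<Status, String> {
    let condition = blocked_on.trim();
    if condition.is_empty() {
        return Err("a deferred OQ needs a blocking condition".to_string());
    }
    let lowered = condition.to_lowercase();
    if VAGUE.iter().any(|v| lowered == *v) {
        return Err(format!("blocking condition is not concrete: {condition}"));
    }
    Ok(Status::Deferred { blocked_on: condition.to_string() })
}

/// Renders the status-dependent lines of an OQ entry in the format above.
pub fn status_lines(status: &Status) -> Vec<String> {
    let mut lines = vec![format!("- **Status**: {status}")];
    if let Status::Deferred { blocked_on } = status {
        lines.push(format!("- **Blocked on**: {blocked_on}"));
    }
    lines
}
```

For example, `defer("alknet-agent crate spec exists")` succeeds, while `defer("future work")` returns an error, mirroring the concrete versus vague contrast in step 2 of How to Defer.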
### What NOT to Do
- Do not mark a deferred decision as `resolved` with caveats. "Resolved with
an escape hatch" is hedging.
- Do not use "feature extension" / "additive" / "not blocking" as a
substitute for `deferred(scope)`. Those phrases describe implementation
sequencing, not architectural decisions.
- Do not leave a deferred decision as `open` without a blocking condition.
"Open" means "needs to be resolved now" — if it can't be resolved now, it's
`deferred(scope)`.
## When to Redirect


@@ -7,6 +7,12 @@ last_updated: 2026-07-04
Questions are organized by theme. Each question has a stable OQ-ID for cross-referencing from spec documents.
**Status values**:
- `open` — Needs to be resolved now. Has a clear path to resolution.
- `resolved` — Decided. The resolution is stated cleanly, without caveats about how it could be changed later.
- `deferred(scope)` — Cannot be resolved yet. The information doesn't exist. Has a concrete blocking condition (e.g., "blocked on: alknet-agent crate spec"). Not a failure — scope management.
- `partially resolved` — Some aspects decided, others deferred or open.
Door type classifications follow ADR-009 — they describe **reversal cost** (how expensive it is to undo), not urgency:
- **One-way door**: Reversal requires rewriting significant code or permanently closes a capability. Getting it wrong is expensive — requires ADR before implementation.
- **Two-way door**: Reversal is cheap or additive. Getting it wrong is recoverable — decide, implement, revert if needed.
@@ -50,7 +56,7 @@ Door type is separate from whether a decision is made. A two-way door is a decis
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: low
-- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The `ArcSwap<HandlerRegistry>` pattern can be applied later if needed (two-way door). See ADR-010.
+- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup. See ADR-010.
**Scope clarification (ADR-024)**: This resolution applies to the
**`HandlerRegistry`** (ALPN string → `ProtocolHandler`), which is what
@@ -191,7 +197,7 @@ These questions are acknowledged but not active. They will be promoted to open w
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
-- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). This is the correct format for the alknet-call crate — it is not a "Phase 1 simplification" but the right design for this architecture. The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. If remote dispatch is ever needed, it would be addressed by a separate crate or a routing layer above the operation registry, not by changing alknet-call's path format. Two-way door — the path format can be extended later if needed, but `/{service}/{op}` is the correct design now.
+- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. See ADR-005, ADR-012.
- **Cross-references**: ADR-005, ADR-012
### OQ-14: Batch Operation Semantics
@@ -200,7 +206,7 @@ These questions are acknowledged but not active. They will be promoted to open w
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: low
-- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. This is the correct protocol design, not a simplification to be "upgraded" later. QUIC's stream multiplexing already provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. If a compelling use case for atomic batch semantics emerges, it can be added as a new event type without breaking existing clients. Two-way door.
+- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. QUIC's stream multiplexing provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. See ADR-012.
- **Cross-references**: ADR-012
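The client-side batch pattern in OQ-14 can be sketched without any batch-specific event type. This is a minimal illustration under stated assumptions: the `CallRequested` and `CallResponded` structs, their fields, and the `<batch>-<index>` ID scheme are hypothetical stand-ins, since the actual event shapes are defined by ADR-012 and are not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `call.requested` event.
#[derive(Debug, Clone, PartialEq)]
pub struct CallRequested {
    pub id: String,
    pub path: String,
}

/// Hypothetical stand-in for the matching response event.
#[derive(Debug, Clone, PartialEq)]
pub struct CallResponded {
    pub id: String,
    pub payload: String,
}

/// Builds one ordinary request per path. The IDs share a batch prefix so the
/// client can correlate them; the server sees only independent calls.
pub fn batch_requests(batch_id: &str, paths: &[&str]) -> Vec<CallRequested> {
    paths
        .iter()
        .enumerate()
        .map(|(i, path)| CallRequested {
            id: format!("{batch_id}-{i}"),
            path: path.to_string(),
        })
        .collect()
}

/// Matches responses to requests by ID, regardless of arrival order.
/// Returns payloads in request order, or `None` while any response is missing.
pub fn collect(requests: &[CallRequested], responses: &[CallResponded]) -> Option<Vec<String>> {
    let by_id: HashMap<&str, &str> = responses
        .iter()
        .map(|r| (r.id.as_str(), r.payload.as_str()))
        .collect();
    requests
        .iter()
        .map(|req| by_id.get(req.id.as_str()).map(|p| (*p).to_string()))
        .collect()
}
```

Because correlation lives entirely in the client, responses can arrive in any order and the protocol needs no notion of a batch.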
## Theme: alknet-call
@@ -220,7 +226,7 @@ These questions are acknowledged but not active. They will be promoted to open w
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
-- **Resolution**: No vault operations are exposed over the call protocol for now. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. If a future use case requires exposing a vault operation over the call protocol (e.g., a restricted `vault/public-key` operation that returns only public key material for identity verification), it would require its own ADR with an explicit threat model justification. See ADR-014.
+- **Resolution**: No vault operations are exposed over the call protocol. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. See ADR-014.
- **Cross-references**: ADR-008, ADR-014, [operation-registry.md](crates/call/operation-registry.md)
### OQ-17: Abort Cascade Semantics for Nested Calls
@@ -510,9 +516,10 @@ is a feature extension, not an unmade architecture decision.
### OQ-32: Multi-Hop Federation
- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §3.7, `docs/research/alknet-call-peer-routing/findings.md` §3.7
-- **Status**: open (feature extension, not an unmade architecture decision)
+- **Status**: deferred(scope)
- **Door type**: One-way (federation model), two-way (mechanism)
- **Priority**: low
- **Blocked on**: A concrete use case for multi-hop federation. The one-hop model covers all current use cases (head→worker, runner→hub).
- **Resolution**: The model is **one-hop** — worker A does not transitively
see worker B's ops through the head unless the head explicitly re-exports
them. The peer-keyed overlay model extends to multi-hop without redesign
@@ -920,11 +927,14 @@ is a feature extension, not an unmade architecture decision.
- **Origin**: [ADR-049](decisions/049-streaming-handler-for-subscriptions.md),
[operation-registry.md](crates/call/operation-registry.md) §"OperationEnv"
-- **Status**: open (feature extension — a library to build, not a decision
-to make before implementation)
+- **Status**: deferred(scope)
- **Door type**: Two-way (additive utility library; no protocol or API-surface
change)
- **Priority**: low
- **Blocked on**: A handler that needs stream operators and finds the existing
combinators (`Box::pin(stream::iter(...))`, `async_stream::stream!`,
`futures::stream`) insufficient. The operators library is a convenience, not
a prerequisite for any handler.
- **Resolution**: ADR-049 establishes that stream composition (filter, map,
combine, window, dedupe) is a **handler-level concern**, not a protocol
composition concern. `OperationEnv::invoke()` is request/response-only;
@@ -937,7 +947,7 @@ is a feature extension, not an unmade architecture decision.
The Rust analogue — a stream-operators utility crate or module providing
the same set of operators on `BoxStream<T>` / `impl Stream<Item = T>` — is
-a **feature extension**, not an unmade architectural decision. Handlers can
+a feature extension. Handlers can
produce streams today without it (`Box::pin(stream::iter(...))`,
`async_stream::stream!`, `futures::stream` combinators all work); the
operators library is a convenience that reduces boilerplate for handlers


@@ -625,9 +625,10 @@ they don't revert. If superseded, mark the old one and create a new one.
### OQ-NN: <Question>
- **Origin**: [spec-doc.md]
-- **Status**: open | resolved
+- **Status**: open | resolved | deferred(scope) | partially resolved
- **Priority**: high | medium | low
- **Resolution**: (when resolved)
- **Blocked on**: (when deferred — concrete dependency that would unblock)
- **Cross-references**: OQ-NN, ADR-NNN
``` ```