From 3daecd7ab21b219e0c84286907eca25ab84bf638 Mon Sep 17 00:00:00 2001 From: "glm-5.2" Date: Sat, 4 Jul 2026 15:31:04 +0000 Subject: [PATCH] fix(process): add architect Safe Exit for deferred decisions, clean hedging language - Add Safe Exit section to architect spec: when a decision genuinely can't be made, mark OQ as deferred(scope) with concrete blocking condition, create blocker task, move on - Add anti-patterns #10-#11 covering hedging synonyms (feature extension, additive, not a v1 blocker, for now, resolved with escape hatch) - Add hedging audit to architect self-review checklist - Clean hedging language from resolved OQs (OQ-04, OQ-13, OQ-14, OQ-16) - Reclassify OQ-32 and OQ-41 as deferred(scope) with blocking conditions - Add deferred(scope) status to OQ format in sdd_process.md --- .opencode/agents/architect.md | 109 ++++++++++++++++++++++++++-- docs/architecture/open-questions.md | 26 +++++-- docs/sdd_process.md | 3 +- 3 files changed, 122 insertions(+), 16 deletions(-) diff --git a/.opencode/agents/architect.md b/.opencode/agents/architect.md index c5745d3..979ad64 100644 --- a/.opencode/agents/architect.md +++ b/.opencode/agents/architect.md @@ -176,8 +176,21 @@ Before requesting external review: - Check that README has a complete ADR table and doc table - Ensure documents are focused (split if a spec exceeds ~700 lines) - Verify frontmatter statuses are correct +- **Hedging audit**: Scan resolved OQs for hedging synonyms (anti-patterns + #9–#11). If a "resolved" OQ's resolution is primarily about how the + decision can be changed later, either drop the undo instructions (the + decision is made) or re-mark it `deferred(scope)` (the decision is not + made). -### 5. Request Architecture Review +### 5. Safe Exit: Deferred Decisions + +When you encounter a decision that genuinely can't be made: + +1. Mark the OQ as `deferred(scope)` with a concrete blocking condition +2. Create a blocker task in `tasks/architecture/` naming the dependency +3. 
Continue to decisions that *can* be made — do not stall on one question + +### 6. Request Architecture Review Spawn a review subagent: @@ -191,25 +204,27 @@ task( 4. Undefined terms or concepts 5. Ambiguities that could cause implementation issues 6. Document size (recommend split if >700 lines) + 7. Hedging language in resolved OQs (anti-patterns #9-#11) Return a structured review with issues categorized as: critical, warning, suggestion", subagent_type="general" ) ``` -### 6. Iterate Based on Review +### 7. Iterate Based on Review Address feedback: - **Critical**: Must fix before stabilization — inline decisions not extracted, - ADR references that point to nonexistent files, undefined terms + ADR references that point to nonexistent files, undefined terms, hedging + language in resolved OQs - **Warning**: Should fix — missing cross-references, documents approaching split threshold - **Suggestion**: Consider — minor clarity improvements Iterate until zero critical issues. -### 7. Mark Review Status +### 8. Mark Review Status When all open questions for a document are resolved and review is complete: @@ -248,9 +263,9 @@ last_updated: 2026-05-29 answer is resolved, not left "open" with hedging language like "v1 default" or "can be revisited later." If the decision is made, mark it resolved. If the decision genuinely can't be made yet (the use case isn't concrete, - the options aren't clear), leave it open — but say *why* it can't be made, - not "we'll decide later." The architect's job is to make architecture - decisions, not to defer them to the implementation agent. + the options aren't clear), mark it `deferred(scope)` — see Safe Exit below. + The architect's job is to make architecture decisions that *can* be made + and to clearly identify which decisions *can't* be made yet and why. ## Door Types and Decision Urgency @@ -309,6 +324,86 @@ door, decide later." actually made. If the decision is made, state it cleanly. 
Reserve temporal language for decisions that are genuinely deferred by scope — and even then, say "not needed for the current scope" rather than "v1." +10. **Hedging synonyms in "resolved" OQs**: The following patterns are + structurally identical to the hedging in #9 — they reframe deferral as + decisiveness. Do not use them on resolved decisions: + - "feature extension, not an unmade decision" — if it's not decided, it's + not resolved. Mark it `deferred(scope)`. + - "additive, not blocking" — if it's not decided, don't claim it is. + - "two-way door — can be changed later if needed" — door type classifies + reversal cost, not whether a decision is made. A two-way door is a + decision you make now. If you're using it to justify not deciding, see + anti-pattern #8. + - "not a v1 blocker" — if it's not decided, it's deferred. Say what + unblocks it. + - "for now" / "not yet" on a resolved OQ — if the resolution has an + expiration date, it's not resolved. Mark it `deferred(scope)` with the + condition that would trigger re-evaluation. +11. **Resolved with escape hatch**: An OQ marked `resolved` whose resolution + text is primarily about how the decision can be changed later. If the + resolution is "X, but here's how we'd undo X," the decision is made — + drop the undo instructions (they're implementation details, not + architecture). If the resolution is "X for now, Y later," the decision + is not made — mark it `deferred(scope)`. + +## Safe Exit: Deferred Decisions + +When a decision genuinely can't be made because the information doesn't exist +yet, the architect has a Safe Exit path. This is not a failure — it's scope +management. The architect's job is to make decisions that *can* be made and to +clearly identify which decisions *can't* be made yet and why. 
+ +### When to Defer + +A decision should be deferred when: + +- The use case isn't concrete (e.g., "we don't know what the agent crate will + need from the call protocol") +- The options depend on something that doesn't exist yet (e.g., "depends on + the alknet-http crate spec") +- The trade-off requires data that can only come from implementation (e.g., + "need performance benchmarks to choose between X and Y") +- The decision is genuinely not needed for the current scope (e.g., "the + current scope is core + call crates; this question is about the agent crate") + +### How to Defer + +1. **Mark the OQ as `deferred(scope)`** — not `open` (implies it should be + resolved now) and not `resolved` (implies it's decided). +2. **State the blocking condition** — what specific thing would unblock this + decision? Be concrete: "blocked on: alknet-agent crate spec exists" not + "blocked on: future work." +3. **Create a blocker task** in `tasks/architecture/` that names the + dependency. This makes the deferral visible and actionable rather than + buried in hedging language. +4. **Move on** — the architect continues to decisions that *can* be made. + Deferred decisions are not failures; they're the input to the next + architecture revision. + +### Deferred OQ Format + +```markdown +### OQ-NN: + +- **Origin**: [spec-doc.md] +- **Status**: deferred(scope) +- **Door type**: +- **Priority**: +- **Blocked on**: +- **Resolution**: Not yet decidable. +- **Cross-references**: OQ-NN, ADR-NNN +``` + +### What NOT to Do + +- Do not mark a deferred decision as `resolved` with caveats. "Resolved with + an escape hatch" is hedging. +- Do not use "feature extension" / "additive" / "not blocking" as a + substitute for `deferred(scope)`. Those phrases describe implementation + sequencing, not architectural decisions. +- Do not leave a deferred decision as `open` without a blocking condition. + "Open" means "needs to be resolved now" — if it can't be resolved now, it's + `deferred(scope)`. 
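The hedging audit described above (scan `resolved` OQs for the phrases listed in anti-pattern #10) could be sketched as a small script. This is only an illustration, not part of the patch: the function names, the phrase list, and the assumption that each OQ is a `### OQ-NN` section with `- **Field**: value` bullets are hypothetical choices made for the example.

```python
import re

# Hypothetical phrase list, copied from the examples in anti-pattern #10.
HEDGING_PHRASES = [
    "feature extension",
    "additive, not blocking",
    "can be changed later",
    "not a v1 blocker",
    "for now",
    "not yet",
]

OQ_HEADING = re.compile(r"^### (OQ-\d+)", re.MULTILINE)
STATUS = re.compile(r"^- \*\*Status\*\*:[ \t]*(.+)$", re.MULTILINE)
# Resolution text runs until the next "- **Field**" bullet or the end of the section.
RESOLUTION = re.compile(
    r"^- \*\*Resolution\*\*:(.*?)(?=^- \*\*|\Z)", re.MULTILINE | re.DOTALL
)


def split_sections(text):
    """Yield (oq_id, body) for each '### OQ-NN' section in the document."""
    matches = list(OQ_HEADING.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield match.group(1), text[match.end():end]


def audit_hedging(text):
    """Return (oq_id, phrase) pairs for hedging phrases found in resolved OQs."""
    findings = []
    for oq_id, body in split_sections(text):
        status = STATUS.search(body)
        if not status or status.group(1).strip() != "resolved":
            continue  # only resolved OQs are audited
        resolution = RESOLUTION.search(body)
        resolution_text = resolution.group(1).lower() if resolution else ""
        for phrase in HEDGING_PHRASES:
            if phrase in resolution_text:
                findings.append((oq_id, phrase))
    return findings


# Made-up sample input: one hedged resolved OQ, one properly deferred OQ.
SAMPLE = """\
### OQ-16: Vault Operations

- **Status**: resolved
- **Resolution**: No vault operations are exposed over the call protocol for now.
- **Cross-references**: ADR-014

### OQ-32: Multi-Hop Federation

- **Status**: deferred(scope)
- **Blocked on**: A concrete use case for multi-hop federation.
- **Resolution**: Not yet decidable.
"""

if __name__ == "__main__":
    for oq_id, phrase in audit_hedging(SAMPLE):
        print(f"{oq_id}: hedging phrase {phrase!r} in a resolved OQ")
```

A plain substring match like this will produce false positives (for example, "not yet" inside a quotation), so its output is probably best treated as a list of candidates for the reviewer rather than a pass/fail gate.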
## When to Redirect diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index 2136601..1034fd5 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -7,6 +7,12 @@ last_updated: 2026-07-04 Questions are organized by theme. Each question has a stable OQ-ID for cross-referencing from spec documents. +**Status values**: +- `open` — Needs to be resolved now. Has a clear path to resolution. +- `resolved` — Decided. The resolution is stated cleanly, without caveats about how it could be changed later. +- `deferred(scope)` — Cannot be resolved yet. The information doesn't exist. Has a concrete blocking condition (e.g., "blocked on: alknet-agent crate spec"). Not a failure — scope management. +- `partially resolved` — Some aspects decided, others deferred or open. + Door type classifications follow ADR-009 — they describe **reversal cost** (how expensive it is to undo), not urgency: - **One-way door**: Reversal requires rewriting significant code or permanently closes a capability. Getting it wrong is expensive — requires ADR before implementation. - **Two-way door**: Reversal is cheap or additive. Getting it wrong is recoverable — decide, implement, revert if needed. @@ -50,7 +56,7 @@ Door type is separate from whether a decision is made. A two-way door is a decis - **Status**: resolved - **Door type**: Two-way - **Priority**: low -- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The `ArcSwap` pattern can be applied later if needed (two-way door). See ADR-010. +- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup. See ADR-010. 
**Scope clarification (ADR-024)**: This resolution applies to the **`HandlerRegistry`** (ALPN string → `ProtocolHandler`), which is what @@ -191,7 +197,7 @@ These questions are acknowledged but not active. They will be promoted to open w - **Status**: resolved - **Door type**: Two-way - **Priority**: medium -- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). This is the correct format for the alknet-call crate — it is not a "Phase 1 simplification" but the right design for this architecture. The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. If remote dispatch is ever needed, it would be addressed by a separate crate or a routing layer above the operation registry, not by changing alknet-call's path format. Two-way door — the path format can be extended later if needed, but `/{service}/{op}` is the correct design now. +- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. See ADR-005, ADR-012. - **Cross-references**: ADR-005, ADR-012 ### OQ-14: Batch Operation Semantics @@ -200,7 +206,7 @@ These questions are acknowledged but not active. They will be promoted to open w - **Status**: resolved - **Door type**: Two-way - **Priority**: low -- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. 
This is the correct protocol design, not a simplification to be "upgraded" later. QUIC's stream multiplexing already provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. If a compelling use case for atomic batch semantics emerges, it can be added as a new event type without breaking existing clients. Two-way door. +- **Resolution**: Batch is a client-side pattern — multiple `call.requested` events with correlated IDs, responses arrive independently. QUIC's stream multiplexing provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. See ADR-012. - **Cross-references**: ADR-012 ## Theme: alknet-call @@ -220,7 +226,7 @@ These questions are acknowledged but not active. They will be promoted to open w - **Status**: resolved - **Door type**: One-way - **Priority**: high -- **Resolution**: No vault operations are exposed over the call protocol for now. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. If a future use case requires exposing a vault operation over the call protocol (e.g., a restricted `vault/public-key` operation that returns only public key material for identity verification), it would require its own ADR with an explicit threat model justification. See ADR-014. 
+- **Resolution**: No vault operations are exposed over the call protocol. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. See ADR-014. - **Cross-references**: ADR-008, ADR-014, [operation-registry.md](crates/call/operation-registry.md) ### OQ-17: Abort Cascade Semantics for Nested Calls @@ -510,9 +516,10 @@ is a feature extension, not an unmade architecture decision. ### OQ-32: Multi-Hop Federation - **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §3.7, `docs/research/alknet-call-peer-routing/findings.md` §3.7 -- **Status**: open (feature extension, not an unmade architecture decision) +- **Status**: deferred(scope) - **Door type**: One-way (federation model), two-way (mechanism) - **Priority**: low +- **Blocked on**: A concrete use case for multi-hop federation. The one-hop model covers all current use cases (head→worker, runner→hub). - **Resolution**: The model is **one-hop** — worker A does not transitively see worker B's ops through the head unless the head explicitly re-exports them. The peer-keyed overlay model extends to multi-hop without redesign @@ -920,11 +927,14 @@ is a feature extension, not an unmade architecture decision. 
- **Origin**: [ADR-049](decisions/049-streaming-handler-for-subscriptions.md), [operation-registry.md](crates/call/operation-registry.md) §"OperationEnv" -- **Status**: open (feature extension — a library to build, not a decision - to make before implementation) +- **Status**: deferred(scope) - **Door type**: Two-way (additive utility library; no protocol or API-surface change) - **Priority**: low +- **Blocked on**: A handler that needs stream operators and finds the existing + combinators (`Box::pin(stream::iter(...))`, `async_stream::stream!`, + `futures::stream`) insufficient. The operators library is a convenience, not + a prerequisite for any handler. - **Resolution**: ADR-049 establishes that stream composition (filter, map, combine, window, dedupe) is a **handler-level concern**, not a protocol composition concern. `OperationEnv::invoke()` is request/response-only; @@ -937,7 +947,7 @@ is a feature extension, not an unmade architecture decision. The Rust analogue — a stream-operators utility crate or module providing the same set of operators on `BoxStream` / `impl Stream` — is - a **feature extension**, not an unmade architectural decision. Handlers can + a feature extension. Handlers can produce streams today without it (`Box::pin(stream::iter(...))`, `async_stream::stream!`, `futures::stream` combinators all work); the operators library is a convenience that reduces boilerplate for handlers diff --git a/docs/sdd_process.md b/docs/sdd_process.md index 00696f1..97651fa 100644 --- a/docs/sdd_process.md +++ b/docs/sdd_process.md @@ -625,9 +625,10 @@ they don't revert. If superseded, mark the old one and create a new one. ### OQ-NN: - **Origin**: [spec-doc.md] -- **Status**: open | resolved +- **Status**: open | resolved | deferred(scope) | partially resolved - **Priority**: high | medium | low - **Resolution**: (when resolved) +- **Blocked on**: (when deferred — concrete dependency that would unblock) - **Cross-references**: OQ-NN, ADR-NNN ```
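The extended OQ format above implies two mechanical checks: the status must be one of the four listed values, and a `deferred(scope)` OQ needs a concrete `Blocked on` condition. A minimal sketch of such a check follows; it is not part of the patch. The function names and the `VAGUE_BLOCKERS` set are hypothetical, with "future work" taken from the architect spec's own example of a non-concrete blocker.

```python
import re

# Status values as listed in the OQ format.
ALLOWED_STATUSES = {"open", "resolved", "deferred(scope)", "partially resolved"}

# Hypothetical examples of blockers that are too vague to be actionable.
VAGUE_BLOCKERS = {"future work", "later", "tbd"}

# Matches bullets of the form "- **Name**: value" on a single line.
FIELD = re.compile(r"^- \*\*(?P<name>[^*]+)\*\*:[ \t]*(?P<value>.*)$", re.MULTILINE)


def parse_fields(section):
    """Map field names (e.g. 'Status') to their single-line values."""
    return {m.group("name"): m.group("value").strip() for m in FIELD.finditer(section)}


def check_oq(section):
    """Return a list of format problems for one OQ section (empty if none)."""
    fields = parse_fields(section)
    status = fields.get("Status", "")
    problems = []
    if status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status!r}")
    if status == "deferred(scope)":
        blocker = fields.get("Blocked on", "")
        if not blocker:
            problems.append("deferred(scope) without a 'Blocked on' condition")
        elif blocker.lower().rstrip(".") in VAGUE_BLOCKERS:
            problems.append(f"blocking condition is not concrete: {blocker!r}")
    return problems


if __name__ == "__main__":
    section = "- **Status**: deferred(scope)\n- **Blocked on**: future work\n"
    for problem in check_oq(section):
        print(problem)
```

Under these assumptions, the pre-patch status of OQ-32, `open (feature extension, not an unmade architecture decision)`, would be reported as an unknown status, which is the kind of drift the stricter format is meant to surface.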