docs: resolve OQ-03 — adopt rolling token window screening (ADR-012)

Research confirmed rolling token windows as the right approach for long document screening. ADR-012 formalizes the decision: Phase 2 implements screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max pooling aggregation, and character offset tracking. Short inputs fall through to screen() unchanged. This resolves the last open question. All 6 original OQs are now resolved: - OQ-01: ONNX removed (burn/cublas better future path) - OQ-02: 65% codebook compression achievable - OQ-03: Rolling token windows for Phase 2 (ADR-012) - OQ-04: Both model-specific defaults + user-overridable - OQ-05: Standalone API + thin adapters (ADR-011) - OQ-06: TOML for file-based config
2026-06-13 08:25:12 +00:00
parent 45a0e0798c
commit c225cf420c
5 changed files with 96 additions and 33 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -47,6 +47,7 @@ raises "behavioral alarms" without needing to know specific attack types.
 | [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
+| [012](decisions/012-rolling-window-screening.md) | Rolling Token Window Screening | Accepted |

 ## Open Questions

@@ -56,7 +57,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 |----|----------|----------|--------|
 | ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cublas is better future path) |
 | ~~OQ-02~~ | ~~What is the minimum viable codebook — can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (~65% compression to 500–600 lines) |
-| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete, Phase 2) |
+| ~~OQ-03~~ | ~~Should the firewall support streaming/chunked input screening?~~ | ~~medium~~ | **resolved** (ADR-012: rolling token windows Phase 2) |
 | ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
 | ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
 | ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |