From c225cf420cee17bd6b23eb244c6461df91201cec Mon Sep 17 00:00:00 2001
From: "glm-5.1"
Date: Sat, 13 Jun 2026 08:25:12 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20resolve=20OQ-03=20=E2=80=94=20adopt=20r?=
 =?UTF-8?q?olling=20token=20window=20screening=20(ADR-012)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max
pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.

This resolves the last open question. All 6 original OQs are now resolved:
- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Both model-specific defaults + user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
---
 docs/architecture/README.md                   |  3 +-
 .../decisions/012-rolling-window-screening.md | 79 +++++++++++++++++++
 docs/architecture/firewall.md                 |  2 +-
 docs/architecture/open-questions.md           | 44 +++--------
 docs/architecture/overview.md                 |  1 +
 5 files changed, 96 insertions(+), 33 deletions(-)
 create mode 100644 docs/architecture/decisions/012-rolling-window-screening.md

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index c2de911..ba4fbcc 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -47,6 +47,7 @@ raises "behavioral alarms" without needing to know specific attack types.
 | [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
+| [012](decisions/012-rolling-window-screening.md) | Rolling Token Window Screening | Accepted |

 ## Open Questions

@@ -56,7 +57,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 |----|----------|----------|--------|
 | ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cublas is better future path) |
 | ~~OQ-02~~ | ~~What is the minimum viable codebook — can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (~65% compression to 500–600 lines) |
-| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete, Phase 2) |
+| ~~OQ-03~~ | ~~Should the firewall support streaming/chunked input screening?~~ | ~~medium~~ | **resolved** (ADR-012: rolling token windows Phase 2) |
 | ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
 | ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
 | ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |
diff --git a/docs/architecture/decisions/012-rolling-window-screening.md b/docs/architecture/decisions/012-rolling-window-screening.md
new file mode 100644
index 0000000..4ac2049
--- /dev/null
+++ b/docs/architecture/decisions/012-rolling-window-screening.md
@@ -0,0 +1,79 @@
+# ADR-012: Rolling Token Window Screening for Long Documents
+
+## Status
+
+Accepted
+
+## Context
+
+The Phase 1 `screen()` API processes
the full input as a single forward pass
+through the detector model. This works for inputs within the model's context
+window (2048 tokens for SmolLM2-135M) but fails for longer documents. Two
+distinct windowing concepts exist in the detection pipeline:
+
+1. **Token-level smoothing** (already in the codebook): Within a single
+   forward pass, per-token z-coordinates are smoothed with a rolling average
+   (window=8) before classification. This operates on the `(seq_len, 3)` z
+   coordinate sequence.
+
+2. **Input-level rolling windows** (this ADR): For long documents that exceed
+   the model's context window, chunk the text into overlapping token windows
+   and screen each window independently. Each window produces its own z-vector
+   and alarm. Windows are aggregated into a document-level verdict.
+
+Research ([rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md))
+confirmed that:
+- Meta's PromptGuard 2 uses a similar approach (512-token segments)
+- Max pooling is the correct aggregation strategy (consistent with existing
+  weighted-max score composition)
+- 25% overlap (512 tokens for SmolLM2-135M) balances detection quality vs
+  throughput — enough to catch boundary-spanning injections
+- Character offset mapping (from HuggingFace tokenizer `offset_mapping`)
+  enables granular "section X is suspicious" reporting
+- The Rust reference implementation in taskgraph-semantic validates the
+  window creation algorithm
+
+## Decision
+
+Implement rolling token window screening as the Phase 2 `screen_document()`
+API, with the following parameters:
+
+- **Window size**: Model's max sequence length (2048 for SmolLM2-135M)
+- **Overlap**: 25% (512 tokens) — same as PromptGuard's entire context window
+- **Aggregation**: Max pooling across per-window, per-direction P(active)
+  scores
+- **Short input handling**: Inputs shorter than one window fall through to
+  `screen()` with no overhead
+- **Character offset tracking**:
Token-to-character mapping for granular
+  reporting of flagged sections
+
+The two windowing concepts (token-level smoothing, input-level rolling windows)
+are composable and solve different problems at different levels.
+
+## Consequences
+
+**Positive**:
+- Long documents (academic papers, reports) can be screened without truncation
+- Granular reporting identifies which sections are suspicious, not just the
+  whole document
+- Windows can be processed in parallel for throughput scaling
+- Natural fallback: short inputs get the fast single-window path
+- Character offsets enable UI integration (highlighting flagged sections)
+- Pattern translates directly to Rust for future embedding system integration
+
+**Negative**:
+- Throughput cost: N windows = N forward passes. A 10K-token document needs
+  ~7 windows at 25% overlap.
+- Overlap regions are processed multiple times, increasing compute
+- API surface expands — users must choose between `screen()` and
+  `screen_document()`
+- Edge cases around window boundaries (partial word tokens, very short
+  windows) need careful handling
+
+## References
+
+- [rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md) — Full research with API design and implementation sketch
+- [OQ-03](../open-questions.md) — Original open question
+- [firewall.md](../firewall.md) — Current screening API
+- [codebook.md](../codebook.md) — Token-level smoothing (separate from this)
+- taskgraph-semantic: `/workspace/@alkimiadev/taskgraph-semantic/src/embedding.rs` — Rust reference for `create_rolling_windows()`
\ No newline at end of file
diff --git a/docs/architecture/firewall.md b/docs/architecture/firewall.md
index f8967aa..d3cb588 100644
--- a/docs/architecture/firewall.md
+++ b/docs/architecture/firewall.md
@@ -221,5 +221,5 @@ All exception types subclass `AlknetFirewallError` (base library exception).
 Open questions are tracked in [open-questions.md](open-questions.md).
 Key questions affecting this document:

-- **OQ-03**: Should the firewall support streaming/chunked input screening? (open — rolling window approach is promising; [research complete](../research/streaming-screening-patterns/rolling-window-analysis.md))
+- ~~**OQ-03**~~: ~~Should the firewall support streaming/chunked input screening?~~ (resolved — ADR-012: rolling token windows with `screen_document()` in Phase 2)
 - ~~**OQ-05**~~: ~~How should the firewall integrate with existing guardrail systems?~~ (resolved — ADR-011: standalone API + thin adapters Phase 2)
\ No newline at end of file
diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md
index 98536d6..07298aa 100644
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -42,40 +42,22 @@ Centralized tracker for unresolved questions across all architecture documents.

 ## Theme: API Design

-### OQ-03: Should the firewall support streaming/chunked input screening?
+### ~~OQ-03: Should the firewall support streaming/chunked input screening?~~

 - **Origin**: [firewall.md](firewall.md)
-- **Status**: open
+- **Status**: **resolved**
 - **Priority**: medium
-- **Cross-references**: ADR-003, OQ-05
-
-Some inputs arrive in chunks (streaming API responses, large documents). Should
-the firewall support incremental screening as chunks arrive, or require the
-full input before screening? Incremental screening could detect attacks earlier
-but requires buffering and state management.
-
-**Rolling window approach**: One promising direction is rolling windows of
-tokens — chunking large text into overlapping windows and screening each
-window independently. This enables:
-
-1. **Granular detection**: For the instruction firewall use case (screening
-   academic papers converted from PDF to markdown), rolling windows can
-   red-flag specific *sections* of a document rather than the whole thing.
-   This is directly useful for catching hidden prompt injections in academic
-   research papers (~20 real examples found of researchers slipping injections
-   past peer review).
-2. **Parallel processing**: Windows can be screened in parallel, enabling
-   throughput scaling.
-3. **Large input handling**: No need to truncate long documents; each window
-   is independently screened within the model's context length.
-
-The PoC has directional (but buggy) Rust code for creating rolling windows
-that can be referenced when designing this feature. This connects to OQ-05
-because streaming/chunking affects how the firewall composes with other
-guardrail systems in a pipeline.
-
-Leave open for Phase 1 design, but the rolling window approach is the leading
-candidate for Phase 2.
+- **Resolution**: Rolling token window approach (ADR-012). Phase 2 implements
+  `screen_document()` with overlapping token windows (25% overlap, model's
+  full context length per window), max pooling for score aggregation, and
+  character offset tracking for granular "which sections are suspicious"
+  reporting. Short inputs fall through to the single-window `screen()` path.
+  The research doc includes a directionally correct implementation sketch.
+  Two distinct windowing concepts are now clearly separated: token-level
+  smoothing (within a single forward pass, already in codebook) vs
+  input-level rolling windows (multiple forward passes for long documents,
+  Phase 2).
+- **Cross-references**: ADR-003, ADR-012

 ---

diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 9507b2e..851557f 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -185,6 +185,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [009](decisions/009-last-token-extraction.md) | Last-token activation extraction | Standard for autoregressive models; full sequence context |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic spline distributions | Compact, smooth, tail-sensitive behavioral region modeling |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + thin adapters | Phase 1 standalone, Phase 2 thin adapter packages |
+| [012](decisions/012-rolling-window-screening.md) | Rolling token window screening | Phase 2 `screen_document()` with 25% overlap, max pooling |

 ## Dependencies on Other Projects
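
The window arithmetic that ADR-012 describes in this patch (2048-token windows, 512-token overlap, max pooling across windows) can be sketched in Python as below. This is only an illustration under stated assumptions: `rolling_windows` and `max_pool` are hypothetical names, not the planned `screen_document()` API or the Rust `create_rolling_windows()` reference. Letting the final window be shorter than the others is also an assumption; the ADR lists very short windows as an edge case still to be handled.

```python
from typing import List, Tuple


def rolling_windows(
    n_tokens: int, window: int = 2048, overlap: int = 512
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) token spans that cover n_tokens.

    Hypothetical helper: consecutive spans share `overlap` tokens, and the
    last span is clipped to the end of the document (an assumption).
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if n_tokens <= window:
        # Short input: one span, i.e. the plain single-pass path.
        return [(0, n_tokens)]
    stride = window - overlap  # 1536 tokens for the defaults
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def max_pool(per_window_scores: List[List[float]]) -> List[float]:
    """Document-level score per direction: the maximum over all windows."""
    return [max(column) for column in zip(*per_window_scores)]


if __name__ == "__main__":
    spans = rolling_windows(10_000)
    print(len(spans), spans[-1])
    print(max_pool([[0.1, 0.9, 0.2], [0.4, 0.3, 0.8]]))
```

With the defaults, a 10,000-token input yields 7 spans, which matches the "~7 windows" estimate in the ADR's consequences section. The last span covers tokens 9216 to 10000.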