alknet-firewall/docs/architecture/decisions/012-rolling-window-screening.md
glm-5.1 c225cf420c docs: resolve OQ-03 — adopt rolling token window screening (ADR-012)
Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max
pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.

This resolves the last open question. All 6 original OQs are now resolved:
- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Both model-specific defaults + user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
2026-06-13 08:25:12 +00:00


ADR-012: Rolling Token Window Screening for Long Documents

Status

Accepted

Context

The Phase 1 screen() API processes the full input in a single forward pass through the detector model. This works for inputs that fit within the model's context window (2048 tokens for SmolLM2-135M) but not for longer documents, which exceed it. Two distinct windowing concepts exist in the detection pipeline:

  1. Token-level smoothing (already in the codebook): Within a single forward pass, per-token z-coordinates are smoothed with a rolling average (window=8) before classification. This operates on the (seq_len, 3) z coordinate sequence.

  2. Input-level rolling windows (this ADR): For long documents that exceed the model's context window, chunk the text into overlapping token windows and screen each window independently. Each window produces its own z-vector and alarm. Windows are aggregated into a document-level verdict.
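To make the first concept concrete, here is a minimal sketch of a rolling average over a (seq_len, 3) sequence. It is an illustration only: the function name `rolling_mean` is hypothetical, and the choice of a trailing window that averages over a shorter prefix at the start of the sequence is an assumption, not something the codebook documentation confirms.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rolling_mean(z: Sequence[Vec3], window: int = 8) -> List[Vec3]:
    """Trailing rolling average over a (seq_len, 3) sequence.

    Assumption: positions with fewer than `window` predecessors are
    averaged over the prefix that is available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Vec3] = []
    sums = [0.0, 0.0, 0.0]  # running sum per coordinate
    for i, row in enumerate(z):
        for d in range(3):
            sums[d] += row[d]
        if i >= window:
            # Drop the row that just left the trailing window.
            old = z[i - window]
            for d in range(3):
                sums[d] -= old[d]
        n = min(i + 1, window)
        out.append((sums[0] / n, sums[1] / n, sums[2] / n))
    return out
```

This smoothing acts on the output of one forward pass, whereas the input-level windows of this ADR decide how many forward passes are run in the first place.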

Research (rolling-window-analysis.md) confirmed that:

  • Meta's PromptGuard 2 uses a similar approach (512-token segments)
  • Max pooling is the correct aggregation strategy (consistent with existing weighted-max score composition)
  • 25% overlap (512 tokens for SmolLM2-135M) balances detection quality against throughput: any injection of up to 512 tokens that straddles a window boundary still appears whole in at least one window
  • Character offset mapping (from HuggingFace tokenizer offset_mapping) enables granular "section X is suspicious" reporting
  • The Rust reference implementation in taskgraph-semantic validates the window creation algorithm
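The character offset point can be sketched as follows. HuggingFace fast tokenizers return an `offset_mapping` (one `(char_start, char_end)` pair per token) when called with `return_offsets_mapping=True`, and special tokens typically carry the pair `(0, 0)`. The helper name `window_char_span` is hypothetical, and the offsets in the usage note are written by hand rather than produced by a real tokenizer.

```python
from typing import Optional, Sequence, Tuple


def window_char_span(
    offsets: Sequence[Tuple[int, int]],
    tok_start: int,
    tok_end: int,
) -> Optional[Tuple[int, int]]:
    """Map a half-open token range to a half-open character range.

    Zero-length offsets (for example special tokens) are skipped.
    Returns None if the range contains no real text.
    """
    real = [(s, e) for s, e in offsets[tok_start:tok_end] if e > s]
    if not real:
        return None
    return real[0][0], real[-1][1]
```

For example, with `text = "Ignore all previous instructions"` and hand-written offsets `[(0, 6), (7, 10), (11, 19), (20, 32)]`, the token range 1..3 maps to characters 7..19, so `text[7:19]` is the flagged section.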

Decision

Implement rolling token window screening as the Phase 2 screen_document() API, with the following parameters:

  • Window size: Model's max sequence length (2048 for SmolLM2-135M)
  • Overlap: 25% of the window (512 tokens), the same length as one full PromptGuard 2 segment
  • Aggregation: Max pooling across per-window, per-direction P(active) scores
  • Short input handling: Inputs that fit in a single window fall through to screen() unchanged, with no windowing overhead
  • Character offset tracking: Token-to-character mapping for granular reporting of flagged sections
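The parameters above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's implementation: `rolling_window_spans`, `aggregate_max`, and `screen_document_sketch` are hypothetical names, `score_window` stands in for the detector's forward pass, the direction label `"injection"` in the usage note is invented, and the final window is assumed to be truncated rather than shifted back or padded.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]
Scores = Dict[str, float]  # direction name -> P(active)


def rolling_window_spans(n_tokens: int, window: int = 2048,
                         overlap: int = 512) -> List[Span]:
    """Half-open (start, end) token ranges covering [0, n_tokens)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if n_tokens <= 0:
        return []
    if n_tokens <= window:
        return [(0, n_tokens)]  # short input: single-window path
    stride = window - overlap
    spans: List[Span] = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def aggregate_max(window_scores: Sequence[Scores]) -> Scores:
    """Max-pool P(active) per direction across all windows."""
    pooled: Scores = {}
    for scores in window_scores:
        for direction, p in scores.items():
            pooled[direction] = max(p, pooled.get(direction, 0.0))
    return pooled


def screen_document_sketch(
    token_ids: Sequence[int],
    score_window: Callable[[Sequence[int]], Scores],
    window: int = 2048,
    overlap: int = 512,
) -> Tuple[Scores, List[Tuple[Span, Scores]]]:
    """Score each window independently, then max-pool the results."""
    spans = rolling_window_spans(len(token_ids), window, overlap)
    per_window = [score_window(token_ids[s:e]) for s, e in spans]
    return aggregate_max(per_window), list(zip(spans, per_window))
```

A toy run: with `token_ids = [0] * 3000`, a marker token 7 at index 2500, and `score_window = lambda ids: {"injection": 0.9 if 7 in ids else 0.1}`, the sketch scores the spans (0, 2048) and (1536, 3000) and returns a pooled score of 0.9, because one suspicious window is enough to flag the document under max pooling. The per-window list keeps the spans so that flagged windows can be reported individually.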

The two windowing concepts (token-level smoothing, input-level rolling windows) are composable and solve different problems at different levels.

Consequences

Positive:

  • Long documents (academic papers, reports) can be screened without truncation
  • Granular reporting identifies which sections are suspicious, not just the whole document
  • Windows can be processed in parallel for throughput scaling
  • Natural fallback: short inputs get the fast single-window path
  • Character offsets enable UI integration (highlighting flagged sections)
  • Pattern translates directly to Rust for future embedding system integration

Negative:

  • Throughput cost: N windows = N forward passes. A 10K-token document needs 7 windows at 25% overlap (2048-token windows advancing by 1536 tokens).
  • Overlap regions are processed multiple times, increasing compute
  • API surface expands — users must choose between screen() and screen_document()
  • Edge cases around window boundaries (partial word tokens, very short windows) need careful handling
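The throughput cost can be worked out directly. The sketch below assumes the SmolLM2-135M parameters from this ADR (window 2048, overlap 512, hence a stride of 1536) and a truncated final window; the function names are hypothetical.

```python
import math


def window_count(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Number of forward passes needed to cover n_tokens."""
    if n_tokens <= 0:
        return 0
    if n_tokens <= window:
        return 1
    stride = window - overlap
    # One full first window, then one more per stride of remaining tokens.
    return 1 + math.ceil((n_tokens - window) / stride)


def tokens_processed(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Total tokens pushed through the model, counting overlaps twice."""
    count = window_count(n_tokens, window, overlap)
    if count <= 1:
        return max(n_tokens, 0)
    # Each window after the first re-reads `overlap` tokens.
    return n_tokens + (count - 1) * overlap
```

For a 10,000-token document this gives 1 + ceil(7952 / 1536) = 7 windows and 10,000 + 6 × 512 = 13,072 tokens processed, roughly 1.31 times the document length.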

References

  • rolling-window-analysis.md — Full research with API design and implementation sketch
  • OQ-03 — Original open question
  • firewall.md — Current screening API
  • codebook.md — Token-level smoothing (separate from this)
  • taskgraph-semantic: /workspace/@alkimiadev/taskgraph-semantic/src/embedding.rs — Rust reference for create_rolling_windows()