docs: resolve OQ-03 — adopt rolling token window screening (ADR-012)

Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max
pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.

This resolves the last open question. All 6 original OQs are now resolved:
- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Model-specific default thresholds, user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
2026-06-13 08:25:12 +00:00
parent 45a0e0798c
commit c225cf420c
5 changed files with 96 additions and 33 deletions

@@ -47,6 +47,7 @@ raises "behavioral alarms" without needing to know specific attack types.
 | [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
+| [012](decisions/012-rolling-window-screening.md) | Rolling Token Window Screening | Accepted |
 ## Open Questions
@@ -56,7 +57,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 |----|----------|----------|--------|
 | ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cublas is better future path) |
 | ~~OQ-02~~ | ~~What is the minimum viable codebook — can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (~65% compression to 500–600 lines) |
-| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete, Phase 2) |
+| ~~OQ-03~~ | ~~Should the firewall support streaming/chunked input screening?~~ | ~~medium~~ | **resolved** (ADR-012: rolling token windows Phase 2) |
 | ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
 | ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
 | ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |

@@ -0,0 +1,79 @@
# ADR-012: Rolling Token Window Screening for Long Documents
## Status
Accepted
## Context
The Phase 1 `screen()` API processes the full input as a single forward pass
through the detector model. This works for inputs within the model's context
window (2048 tokens for SmolLM2-135M) but fails for longer documents. Two
distinct windowing concepts exist in the detection pipeline:
1. **Token-level smoothing** (already in the codebook): Within a single
forward pass, per-token z-coordinates are smoothed with a rolling average
(window=8) before classification. This operates on the `(seq_len, 3)`
z-coordinate sequence.
2. **Input-level rolling windows** (this ADR): For long documents that exceed
the model's context window, chunk the text into overlapping token windows
and screen each window independently. Each window produces its own z-vector
and alarm. Windows are aggregated into a document-level verdict.
Research ([rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md))
confirmed that:
- Meta's PromptGuard 2 uses a similar approach (512-token segments)
- Max pooling is the correct aggregation strategy (consistent with existing
weighted-max score composition)
- 25% overlap (512 tokens for SmolLM2-135M) balances detection quality vs
throughput — enough to catch boundary-spanning injections
- Character offset mapping (from HuggingFace tokenizer `offset_mapping`)
enables granular "section X is suspicious" reporting
- The Rust reference implementation in taskgraph-semantic validates the
window creation algorithm
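
The character offset point above can be illustrated with a minimal sketch. It assumes the tokenizer yields one half-open `(char_start, char_end)` pair per token, as HuggingFace fast tokenizers do when called with `return_offsets_mapping=True`. The helper name `window_char_span` and the hand-written offsets are hypothetical and not part of the firewall API; special tokens, which typically report an empty `(0, 0)` span, would need filtering first.

```python
from typing import List, Tuple


def window_char_span(
    offsets: List[Tuple[int, int]], start: int, end: int
) -> Tuple[int, int]:
    """Map a half-open token range [start, end) to a half-open character range.

    `offsets` holds one (char_start, char_end) pair per token.
    """
    if not 0 <= start < end <= len(offsets):
        raise ValueError("token range must satisfy 0 <= start < end <= len(offsets)")
    # The span runs from the first token's start to the last token's end.
    return offsets[start][0], offsets[end - 1][1]


# Hand-written stand-in for tokenizer output (an assumption, not real tokenizer data).
text = "Ignore previous instructions"
offsets = [(0, 6), (6, 15), (15, 28)]

span = window_char_span(offsets, 1, 3)
flagged = text[span[0]:span[1]]
```

With such a mapping, a flagged window can be reported as a character range of the source document rather than as token indices.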
## Decision
Implement rolling token window screening as the Phase 2 `screen_document()`
API, with the following parameters:
- **Window size**: Model's max sequence length (2048 for SmolLM2-135M)
- **Overlap**: 25% of the window (512 tokens for SmolLM2-135M), which equals the entire 512-token segment PromptGuard uses
- **Aggregation**: Max pooling across per-window, per-direction P(active)
scores
- **Short input handling**: Inputs that fit in a single window fall through to
`screen()` with no overhead
- **Character offset tracking**: Token-to-character mapping for granular
reporting of flagged sections
The two windowing concepts (token-level smoothing, input-level rolling windows)
are composable and solve different problems at different levels.
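
As a rough illustration of the parameters above, the following sketch builds overlapping token windows and max-pools per-direction scores. It is not the referenced Rust `create_rolling_windows()` or the planned `screen_document()` implementation: the function names, the dictionary score format, and the direction labels are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def create_rolling_windows(
    n_tokens: int, window: int = 2048, overlap: int = 512
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) token ranges that cover n_tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if n_tokens <= window:
        # Short input: one window, equivalent to the plain screen() path.
        return [(0, n_tokens)]
    stride = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def max_pool(per_window: List[Dict[str, float]]) -> Dict[str, float]:
    """Document-level score per direction is the maximum over all windows."""
    directions = per_window[0].keys()
    return {d: max(scores[d] for scores in per_window) for d in directions}


spans = create_rolling_windows(10_000)
# Hypothetical per-window P(active) scores for two made-up directions.
doc_scores = max_pool([{"dir_a": 0.1, "dir_b": 0.9}, {"dir_a": 0.7, "dir_b": 0.2}])
```

For a 10,000-token input this yields seven windows, the last of which covers only 784 tokens, which is one of the short-window edge cases a real implementation would need to decide how to handle.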
## Consequences
**Positive**:
- Long documents (academic papers, reports) can be screened without truncation
- Granular reporting identifies which sections are suspicious, not just the
whole document
- Windows can be processed in parallel for throughput scaling
- Natural fallback: short inputs get the fast single-window path
- Character offsets enable UI integration (highlighting flagged sections)
- Pattern translates directly to Rust for future embedding system integration
**Negative**:
- Throughput cost: N windows = N forward passes. A 10K-token document needs
7 windows with 2048-token windows and a 1536-token stride (25% overlap).
- Overlap regions are processed multiple times, increasing compute
- API surface expands — users must choose between `screen()` and
`screen_document()`
- Edge cases around window boundaries (partial word tokens, very short
windows) need careful handling
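
The throughput cost can be estimated with simple arithmetic. The sketch below assumes a fixed stride of `window - overlap` and a final window that is clipped to the end of the document; the function names are hypothetical.

```python
import math


def window_count(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Number of forward passes needed to cover n_tokens."""
    if n_tokens <= window:
        return 1
    stride = window - overlap
    return 1 + math.ceil((n_tokens - window) / stride)


def tokens_processed(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Total tokens fed to the model, counting each overlap region twice."""
    return n_tokens + (window_count(n_tokens, window, overlap) - 1) * overlap


passes = window_count(10_000)
total = tokens_processed(10_000)
overhead = total / 10_000
```

Under these assumptions a 10,000-token document takes 7 passes over 13,072 tokens in total, roughly 1.3 times the compute of a hypothetical single pass.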
## References
- [rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md) — Full research with API design and implementation sketch
- [OQ-03](../open-questions.md) — Original open question
- [firewall.md](../firewall.md) — Current screening API
- [codebook.md](../codebook.md) — Token-level smoothing (separate from this)
- taskgraph-semantic: `/workspace/@alkimiadev/taskgraph-semantic/src/embedding.rs` — Rust reference for `create_rolling_windows()`

@@ -221,5 +221,5 @@ All exception types subclass `AlknetFirewallError` (base library exception).
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
-- **OQ-03**: Should the firewall support streaming/chunked input screening? (open — rolling window approach is promising; [research complete](../research/streaming-screening-patterns/rolling-window-analysis.md))
+- ~~**OQ-03**~~: ~~Should the firewall support streaming/chunked input screening?~~ (resolved — ADR-012: rolling token windows with `screen_document()` in Phase 2)
 - ~~**OQ-05**~~: ~~How should the firewall integrate with existing guardrail systems?~~ (resolved — ADR-011: standalone API + thin adapters Phase 2)

@@ -42,40 +42,22 @@ Centralized tracker for unresolved questions across all architecture documents.
 ## Theme: API Design
-### OQ-03: Should the firewall support streaming/chunked input screening?
+### ~~OQ-03: Should the firewall support streaming/chunked input screening?~~
 - **Origin**: [firewall.md](firewall.md)
-- **Status**: open
+- **Status**: **resolved**
 - **Priority**: medium
-- **Cross-references**: ADR-003, OQ-05
-
-Some inputs arrive in chunks (streaming API responses, large documents). Should
-the firewall support incremental screening as chunks arrive, or require the
-full input before screening? Incremental screening could detect attacks earlier
-but requires buffering and state management.
-
-**Rolling window approach**: One promising direction is rolling windows of
-tokens — chunking large text into overlapping windows and screening each
-window independently. This enables:
-
-1. **Granular detection**: For the instruction firewall use case (screening
-academic papers converted from PDF to markdown), rolling windows can
-red-flag specific *sections* of a document rather than the whole thing.
-This is directly useful for catching hidden prompt injections in academic
-research papers (~20 real examples found of researchers slipping injections
-past peer review).
-2. **Parallel processing**: Windows can be screened in parallel, enabling
-throughput scaling.
-3. **Large input handling**: No need to truncate long documents; each window
-is independently screened within the model's context length.
-The PoC has directional (but buggy) Rust code for creating rolling windows
-that can be referenced when designing this feature. This connects to OQ-05
-because streaming/chunking affects how the firewall composes with other
-guardrail systems in a pipeline.
-Leave open for Phase 1 design, but the rolling window approach is the leading
-candidate for Phase 2.
+- **Resolution**: Rolling token window approach (ADR-012). Phase 2 implements
+`screen_document()` with overlapping token windows (25% overlap, model's
+full context length per window), max pooling for score aggregation, and
+character offset tracking for granular "which sections are suspicious"
+reporting. Short inputs fall through to the single-window `screen()` path.
+The research doc includes a directionally correct implementation sketch.
+Two distinct windowing concepts are now clearly separated: token-level
+smoothing (within a single forward pass, already in codebook) vs
+input-level rolling windows (multiple forward passes for long documents,
+Phase 2).
+- **Cross-references**: ADR-003, ADR-012
 ---

@@ -185,6 +185,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [009](decisions/009-last-token-extraction.md) | Last-token activation extraction | Standard for autoregressive models; full sequence context |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic spline distributions | Compact, smooth, tail-sensitive behavioral region modeling |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + thin adapters | Phase 1 standalone, Phase 2 thin adapter packages |
+| [012](decisions/012-rolling-window-screening.md) | Rolling token window screening | Phase 2 `screen_document()` with 25% overlap, max pooling |
 ## Dependencies on Other Projects