docs: resolve OQ-03 — adopt rolling token window screening (ADR-012)

Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max pooling
aggregation, and character offset tracking. Short inputs fall through to
screen() unchanged.

This resolves the last open question. All 6 original OQs are now resolved:

- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Both model-specific defaults + user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
@@ -47,6 +47,7 @@ raises "behavioral alarms" without needing to know specific attack types.
 | [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
+| [012](decisions/012-rolling-window-screening.md) | Rolling Token Window Screening | Accepted |
 
 ## Open Questions
 
@@ -56,7 +57,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 |----|----------|----------|--------|
 | ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cublas is better future path) |
 | ~~OQ-02~~ | ~~What is the minimum viable codebook — can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (~65% compression to 500–600 lines) |
-| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete, Phase 2) |
+| ~~OQ-03~~ | ~~Should the firewall support streaming/chunked input screening?~~ | ~~medium~~ | **resolved** (ADR-012: rolling token windows Phase 2) |
 | ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
 | ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
 | ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |
79
docs/architecture/decisions/012-rolling-window-screening.md
Normal file
@@ -0,0 +1,79 @@
# ADR-012: Rolling Token Window Screening for Long Documents

## Status

Accepted

## Context

The Phase 1 `screen()` API processes the full input as a single forward pass
through the detector model. This works for inputs within the model's context
window (2048 tokens for SmolLM2-135M) but fails for longer documents. Two
distinct windowing concepts exist in the detection pipeline:

1. **Token-level smoothing** (already in the codebook): Within a single
   forward pass, per-token z-coordinates are smoothed with a rolling average
   (window=8) before classification. This operates on the `(seq_len, 3)` z
   coordinate sequence.

2. **Input-level rolling windows** (this ADR): For long documents that exceed
   the model's context window, chunk the text into overlapping token windows
   and screen each window independently. Each window produces its own z-vector
   and alarm. Windows are aggregated into a document-level verdict.
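
To make the first concept concrete, here is a minimal sketch of a rolling average over a `(seq_len, 3)` z-coordinate sequence with `window=8`. The function name `rolling_mean` and the trailing-window behavior (early positions average over a shorter prefix) are assumptions for illustration; the codebook may define the smoothing differently, for example with a centered window.

```python
from typing import List, Sequence


def rolling_mean(z: Sequence[Sequence[float]], window: int = 8) -> List[List[float]]:
    """Trailing rolling average over a (seq_len, dims) sequence.

    Position t averages rows max(0, t - window + 1) .. t, so the first
    window - 1 positions use a shorter, partial window (an assumption).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if not z:
        return []
    dims = len(z[0])
    sums = [0.0] * dims  # running per-dimension sum of the current window
    out: List[List[float]] = []
    for t, row in enumerate(z):
        for d in range(dims):
            sums[d] += row[d]
            if t >= window:
                # Drop the row that just slid out of the window.
                sums[d] -= z[t - window][d]
        n = min(t + 1, window)  # number of rows currently in the window
        out.append([s / n for s in sums])
    return out


if __name__ == "__main__":
    z = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
    print(rolling_mean(z, window=2))
```

This smoothing stays inside one forward pass; it does not help with inputs longer than the context window, which is what the second concept addresses.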

Research ([rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md))
confirmed that:
- Meta's PromptGuard 2 uses a similar approach (512-token segments)
- Max pooling is the correct aggregation strategy (consistent with existing
  weighted-max score composition)
- 25% overlap (512 tokens for SmolLM2-135M) balances detection quality vs
  throughput — enough to catch boundary-spanning injections
- Character offset mapping (from HuggingFace tokenizer `offset_mapping`)
  enables granular "section X is suspicious" reporting
- The Rust reference implementation in taskgraph-semantic validates the
  window creation algorithm
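
The character offset point can be illustrated with a small sketch. HuggingFace fast tokenizers return an `offset_mapping` of `(start, end)` character pairs per token when called with `return_offsets_mapping=True`. To keep the example runnable without downloading a model, a whitespace splitter stands in for the tokenizer here; `whitespace_offsets` and `window_char_span` are hypothetical helper names, not part of the firewall API.

```python
import re
from typing import List, Tuple

Offsets = List[Tuple[int, int]]


def whitespace_offsets(text: str) -> Offsets:
    """Stand-in for a tokenizer's offset_mapping: one (start, end) per word."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def window_char_span(offsets: Offsets, tok_start: int, tok_end: int) -> Tuple[int, int]:
    """Map a token window [tok_start, tok_end) to a character span [start, end)."""
    if not 0 <= tok_start < tok_end <= len(offsets):
        raise ValueError("token window out of range")
    # First token's start character through the last token's end character.
    return offsets[tok_start][0], offsets[tok_end - 1][1]


if __name__ == "__main__":
    text = "ignore all previous instructions now"
    offsets = whitespace_offsets(text)
    start, end = window_char_span(offsets, 1, 3)
    print(repr(text[start:end]))
```

With a real tokenizer, special tokens typically carry empty offsets, so a production version would likely need to skip them before computing the span.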

## Decision

Implement rolling token window screening as the Phase 2 `screen_document()`
API, with the following parameters:

- **Window size**: Model's max sequence length (2048 for SmolLM2-135M)
- **Overlap**: 25% (512 tokens) — same as PromptGuard's entire context window
- **Aggregation**: Max pooling across per-window, per-direction P(active)
  scores
- **Short input handling**: Inputs shorter than one window fall through to
  `screen()` with no overhead
- **Character offset tracking**: Token-to-character mapping for granular
  reporting of flagged sections

The two windowing concepts (token-level smoothing, input-level rolling windows)
are composable and solve different problems at different levels.
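
The parameters above can be sketched as follows. This is an illustration under stated assumptions, not the Phase 2 implementation: `create_windows` is a hypothetical helper, and `screen` is modeled as any callable that takes a token sequence and returns a per-direction `P(active)` mapping, which may not match the real `screen()` signature.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, float]  # direction name -> P(active); assumed shape


def create_windows(n_tokens: int, window: int = 2048, overlap: int = 512) -> List[Tuple[int, int]]:
    """Return [start, end) token ranges that cover n_tokens with the given overlap."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    if n_tokens <= window:
        return [(0, n_tokens)] if n_tokens > 0 else []
    stride = window - overlap  # 1536 tokens for the defaults
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # the last window may be shorter than `window`
            break
        start += stride
    return spans


def screen_document(
    tokens: Sequence[int],
    screen: Callable[[Sequence[int]], Scores],
    window: int = 2048,
    overlap: int = 512,
) -> Scores:
    """Screen each window and max-pool the per-direction scores."""
    spans = create_windows(len(tokens), window, overlap)
    if len(spans) <= 1:
        return screen(tokens)  # short input: single-window path
    doc: Scores = {}
    for start, end in spans:
        for direction, p in screen(tokens[start:end]).items():
            doc[direction] = max(doc.get(direction, 0.0), p)
    return doc


if __name__ == "__main__":
    print(len(create_windows(10_000)))
```

Max pooling means a single high-scoring window is enough to raise the document-level score, which matches the goal of catching an injection hidden in one section of an otherwise benign document.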

## Consequences

**Positive**:
- Long documents (academic papers, reports) can be screened without truncation
- Granular reporting identifies which sections are suspicious, not just the
  whole document
- Windows can be processed in parallel for throughput scaling
- Natural fallback: short inputs get the fast single-window path
- Character offsets enable UI integration (highlighting flagged sections)
- Pattern translates directly to Rust for future embedding system integration
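
As a rough illustration of the parallel-processing and granular-reporting points, the sketch below scores windows concurrently and returns only the flagged token ranges. The names `flag_windows` and `score_window`, the `threshold` of 0.5, and the use of a thread pool are all assumptions; whether threads actually improve throughput depends on the inference backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]


def flag_windows(
    tokens: Sequence[int],
    spans: Sequence[Span],
    score_window: Callable[[Sequence[int]], float],
    threshold: float = 0.5,  # hypothetical alarm threshold
    max_workers: int = 4,
) -> List[Tuple[Span, float]]:
    """Score every window concurrently and return the flagged (span, score) pairs."""
    chunks = [tokens[start:end] for start, end in spans]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so scores line up with spans.
        scores = list(pool.map(score_window, chunks))
    return [(span, p) for span, p in zip(spans, scores) if p >= threshold]


if __name__ == "__main__":
    toks = list(range(10))
    spans = [(0, 4), (3, 7), (6, 10)]
    print(flag_windows(toks, spans, lambda w: 0.9 if 5 in w else 0.1))
```

The flagged token spans can then be converted to character ranges through the tokenizer's offset mapping for section-level highlighting.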

**Negative**:
- Throughput cost: N windows = N forward passes. A 10K-token document needs
  ~7 windows at 25% overlap.
- Overlap regions are processed multiple times, increasing compute
- API surface expands — users must choose between `screen()` and
  `screen_document()`
- Edge cases around window boundaries (partial word tokens, very short
  windows) need careful handling
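
The throughput cost can be checked with a little arithmetic. With a 2048-token window and 512-token overlap the stride is 1536 tokens, so a document of `n` tokens longer than one window needs `1 + ceil((n - 2048) / 1536)` windows. The helpers below are a hypothetical sketch that assumes the final window is truncated at the end of the document rather than padded or shifted back.

```python
import math


def window_count(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Number of forward passes needed to cover n_tokens."""
    if n_tokens <= window:
        return 1 if n_tokens > 0 else 0
    stride = window - overlap
    return 1 + math.ceil((n_tokens - window) / stride)


def tokens_processed(n_tokens: int, window: int = 2048, overlap: int = 512) -> int:
    """Total tokens fed to the model, counting each overlap region twice."""
    count = window_count(n_tokens, window, overlap)
    # Each boundary between consecutive windows re-processes `overlap` tokens.
    return n_tokens + max(count - 1, 0) * overlap


if __name__ == "__main__":
    n = 10_000
    print(window_count(n), tokens_processed(n) / n)
```

For a 10,000-token document this gives 7 windows and 13,072 tokens processed, roughly 31% more compute than a single pass over the same tokens would need.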

## References

- [rolling-window-analysis.md](../../research/streaming-screening-patterns/rolling-window-analysis.md) — Full research with API design and implementation sketch
- [OQ-03](../open-questions.md) — Original open question
- [firewall.md](../firewall.md) — Current screening API
- [codebook.md](../codebook.md) — Token-level smoothing (separate from this)
- taskgraph-semantic: `/workspace/@alkimiadev/taskgraph-semantic/src/embedding.rs` — Rust reference for `create_rolling_windows()`
@@ -221,5 +221,5 @@ All exception types subclass `AlknetFirewallError` (base library exception).
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:
 
-- **OQ-03**: Should the firewall support streaming/chunked input screening? (open — rolling window approach is promising; [research complete](../research/streaming-screening-patterns/rolling-window-analysis.md))
+- ~~**OQ-03**~~: ~~Should the firewall support streaming/chunked input screening?~~ (resolved — ADR-012: rolling token windows with `screen_document()` in Phase 2)
 - ~~**OQ-05**~~: ~~How should the firewall integrate with existing guardrail systems?~~ (resolved — ADR-011: standalone API + thin adapters Phase 2)
@@ -42,40 +42,22 @@ Centralized tracker for unresolved questions across all architecture documents.
 
 ## Theme: API Design
 
-### OQ-03: Should the firewall support streaming/chunked input screening?
+### ~~OQ-03: Should the firewall support streaming/chunked input screening?~~
 
 - **Origin**: [firewall.md](firewall.md)
-- **Status**: open
+- **Status**: **resolved**
 - **Priority**: medium
-- **Cross-references**: ADR-003, OQ-05
-
-Some inputs arrive in chunks (streaming API responses, large documents). Should
-the firewall support incremental screening as chunks arrive, or require the
-full input before screening? Incremental screening could detect attacks earlier
-but requires buffering and state management.
-
-**Rolling window approach**: One promising direction is rolling windows of
-tokens — chunking large text into overlapping windows and screening each
-window independently. This enables:
-
-1. **Granular detection**: For the instruction firewall use case (screening
-   academic papers converted from PDF to markdown), rolling windows can
-   red-flag specific *sections* of a document rather than the whole thing.
-   This is directly useful for catching hidden prompt injections in academic
-   research papers (~20 real examples found of researchers slipping injections
-   past peer review).
-2. **Parallel processing**: Windows can be screened in parallel, enabling
-   throughput scaling.
-3. **Large input handling**: No need to truncate long documents; each window
-   is independently screened within the model's context length.
-
-The PoC has directional (but buggy) Rust code for creating rolling windows
-that can be referenced when designing this feature. This connects to OQ-05
-because streaming/chunking affects how the firewall composes with other
-guardrail systems in a pipeline.
-
-Leave open for Phase 1 design, but the rolling window approach is the leading
-candidate for Phase 2.
+- **Resolution**: Rolling token window approach (ADR-012). Phase 2 implements
+  `screen_document()` with overlapping token windows (25% overlap, model's
+  full context length per window), max pooling for score aggregation, and
+  character offset tracking for granular "which sections are suspicious"
+  reporting. Short inputs fall through to the single-window `screen()` path.
+  The research doc includes a directionally correct implementation sketch.
+  Two distinct windowing concepts are now clearly separated: token-level
+  smoothing (within a single forward pass, already in codebook) vs
+  input-level rolling windows (multiple forward passes for long documents,
+  Phase 2).
+- **Cross-references**: ADR-003, ADR-012
 
 ---
 
@@ -185,6 +185,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
 | [009](decisions/009-last-token-extraction.md) | Last-token activation extraction | Standard for autoregressive models; full sequence context |
 | [010](decisions/010-monotonic-spline-distributions.md) | Monotonic spline distributions | Compact, smooth, tail-sensitive behavioral region modeling |
 | [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + thin adapters | Phase 1 standalone, Phase 2 thin adapter packages |
+| [012](decisions/012-rolling-window-screening.md) | Rolling token window screening | Phase 2 `screen_document()` with 25% overlap, max pooling |
 
 ## Dependencies on Other Projects
 