Research confirmed rolling token windows as the right approach for
long-document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% window overlap (512 tokens for SmolLM2-135M),
max-pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.
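
The windowing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the ADR-012 implementation: the whitespace tokenizer stands in for the model tokenizer, the `screen` callable stands in for the real screen(), the 2048-token default is an assumption (inferred from 512 tokens being 25% of the window), and the helper names and the returned dict shape are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple


def tokenize_with_offsets(text: str) -> List[Tuple[int, int]]:
    """Stand-in tokenizer: whitespace-delimited tokens as (char_start, char_end)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def rolling_windows(n_tokens: int, window: int, overlap: float = 0.25) -> List[Tuple[int, int]]:
    """Half-open token ranges [start, end) covering n_tokens with the given overlap."""
    if n_tokens <= window:
        return [(0, n_tokens)]
    # Each new window starts `stride` tokens after the previous one.
    stride = max(1, window - int(window * overlap))
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            return spans
        start += stride


def screen_document(
    text: str,
    screen: Callable[[str], float],
    window: int = 2048,  # assumed window size, not stated in the source
    overlap: float = 0.25,
) -> Dict[str, object]:
    offsets = tokenize_with_offsets(text)
    if len(offsets) <= window:
        # Short input: a single call on the unmodified text.
        return {"score": screen(text), "span": (0, len(text)), "windows": 1}
    spans = rolling_windows(len(offsets), window, overlap)
    best_score, best_span = float("-inf"), (0, 0)
    for tok_start, tok_end in spans:
        # Map the token range back to character offsets in the original text.
        char_start = offsets[tok_start][0]
        char_end = offsets[tok_end - 1][1]
        score = screen(text[char_start:char_end])
        if score > best_score:  # max pooling: keep the highest window score
            best_score, best_span = score, (char_start, char_end)
    return {"score": best_score, "span": best_span, "windows": len(spans)}


def toy_screen(chunk: str) -> float:
    """Toy scorer used only for the demo below."""
    return 1.0 if "BAD" in chunk else 0.0


result = screen_document("a b c d e f BAD g h i", toy_screen, window=4)
print(result)
```

Max pooling means a single high-scoring window is enough to flag the whole document, and the tracked character span points back to where that window sits in the original text.
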

This resolves the last open question. All six original open questions (OQs) are now resolved:
- OQ-01: ONNX removed (burn/cublas is the better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Model-specific defaults plus user overrides
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config

Phase 0→1 (Exploration → Architecture): The project has a working PoC
demonstrating that behavioral signals from small language models can detect
adversarial inputs. The core detection logic (~1,745 lines) works reasonably
well but lacks tests, carries an oversized codebook, and still needs to be
extracted from the research codebase into a properly structured Python package.

This project extracts and productionizes the behavioral signal detection
approach from the metaspline research project. A ~135M parameter model
(SmolLM2-135M) processes untrusted inputs and produces hidden state
activations. SVD-based dimensionality reduction on these activations reveals
behavioral patterns — normal inputs cluster in expected regions while
adversarial inputs produce anomalous activation signatures. The system
raises "behavioral alarms" without needing to know specific attack types.
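
One way to picture the SVD step is the NumPy sketch below. It is an assumption-laden illustration rather than the project's detection logic: it fits a low-rank subspace to activations from normal inputs and scores a new activation by its distance from that subspace. The function names, the residual-based score, and the synthetic data are all hypothetical, and real activations would come from the model's hidden states.

```python
import numpy as np


def fit_subspace(activations: np.ndarray, k: int):
    """Fit the mean and top-k right singular vectors of normal activations."""
    mean = activations.mean(axis=0)
    _, _, vt = np.linalg.svd(activations - mean, full_matrices=False)
    return mean, vt[:k]  # basis rows are orthonormal directions


def anomaly_score(x: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Distance from x to the fitted subspace (reconstruction residual)."""
    centered = x - mean
    projected = centered @ basis.T @ basis
    return float(np.linalg.norm(centered - projected))


# Synthetic stand-in data: "normal" activations vary mostly in two directions.
rng = np.random.default_rng(0)
normal = 0.01 * rng.normal(size=(200, 8))
normal[:, :2] += rng.normal(size=(200, 2))

mean, basis = fit_subspace(normal, k=2)

inlier = normal[0]
outlier = normal[0].copy()
outlier[7] += 5.0  # push the vector far off the normal subspace

print(anomaly_score(inlier, mean, basis), anomaly_score(outlier, mean, basis))
```

Because the score measures only departure from the normal subspace, it needs no labels for specific attack types, which matches the "behavioral alarm" framing above.
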