Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max
pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.
This resolves the last open question. All 6 original OQs are now resolved:
- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Both model-specific defaults + user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
The architecture specs previously described detection as a single-vector
path (one activation → one z-coordinate → one alarm), but the PoC operates
on per-token z-coordinate sequences with a two-stage copula decomposition.
Key updates:
- codebook.md: Add Copula Decomposition section (z → CDF → simplex →
barycentric → (S, u, v)), Direction Profiles and Contrast Pairs section,
Token-Level Smoothing section, classifier weights and direction profiles
to data format, updated Internal API with decompose/classify/detect methods
- codebook.md: Clarify z-coordinate shapes — training is (N, 3) flattened
per-token positions, inference is (seq_len, 3) per-token sequence
- firewall.md: Update data flow to 10-step pipeline including copula
decomposition, smoothing, and direction classification; update score
composition to use direction-level P(active); update DimensionSignal
dataclass; update latency budget with copula/smoothing/classification steps
- model.md: Add Phase 1 (last-token) vs Phase 2 (per-token) extraction modes
- ADR-009: Note last-token is Phase 1 simplification, per-token is full
pipeline
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:
- OQ-01: Remove ONNX Runtime from scope entirely — doesn't support
activation extraction natively (optimum #972 closed as not planned),
bloated model exports; burn/cublas via safetensors is a better future path
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
Structure and Extraction from PoC sections to codebook.md based on PoC
analysis of metaspline firewall_codebook.py
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
Firewall.screen() only, Phase 2 adds <100-line adapter packages for
LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails
- OQ-06: TOML for file-based config — standard modern Python, two-way door
Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
- OQ-04 resolved: thresholds are both model-specific (shipped with
codebook) and user-overridable. Inspired by platonic representation
hypothesis — calibrated models converge on similar behavioral patterns.
- OQ-07 removed: Rust port is an alknet project concern, not relevant
to the Python package architecture. Removed from overview.md Phase 3.
- OQ-03 enriched: rolling window token screening for granular detection
in documents (PDF→markdown use case, academic paper injection detection).
Upgraded from low to medium priority.
- OQ-01 updated: likely path is PyTorch first, ONNX export by default.
- OQ-05 updated: needs deep dive into guardrail landscape.
- Updated threshold description in configuration.md with platonic
representation context.