glm-5.1 c225cf420c docs: resolve OQ-03 — adopt rolling token window screening (ADR-012)
Research confirmed rolling token windows as the right approach for long
document screening. ADR-012 formalizes the decision: Phase 2 implements
screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max
pooling aggregation, and character offset tracking. Short inputs fall
through to screen() unchanged.

This resolves the last open question. All 6 original OQs are now resolved:
- OQ-01: ONNX removed (burn/cublas better future path)
- OQ-02: 65% codebook compression achievable
- OQ-03: Rolling token windows for Phase 2 (ADR-012)
- OQ-04: Both model-specific defaults + user-overridable
- OQ-05: Standalone API + thin adapters (ADR-011)
- OQ-06: TOML for file-based config
2026-06-13 08:25:12 +00:00

---
status: draft
last_updated: 2026-06-13
---
# alknet-firewall — Architecture
## Current State
**Phase 0→1 (Exploration → Architecture)** — The project has a working PoC
demonstrating that behavioral signals from small language models can detect
adversarial inputs. The core detection logic (~1,745 lines) works reasonably
well but has no tests, carries an oversized codebook (1,245 lines; see OQ-02),
and still needs to be extracted from the research codebase into a properly
structured Python package.

This project extracts and productionizes the behavioral signal detection
approach from the metaspline research project. A ~125M parameter model
(SmolLM2-135M) processes untrusted inputs and produces hidden state
activations. SVD-based dimensionality reduction on these activations reveals
behavioral patterns — normal inputs cluster in expected regions while
adversarial inputs produce anomalous activation signatures. The system
raises "behavioral alarms" without needing to know specific attack types.
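
The idea in the paragraph above can be illustrated with a small, self-contained sketch. It is not the project's detection code: the synthetic activations, the function names `fit_basis` and `anomaly_score`, and the use of a plain residual norm as the score are all assumptions made for illustration. The actual codebook and scoring are described in [codebook.md](codebook.md).

```python
import numpy as np


def fit_basis(benign: np.ndarray, rank: int):
    """Fit a mean and a rank-`rank` SVD basis to benign activations of shape (n, d)."""
    mean = benign.mean(axis=0)
    # Rows of vt are right singular vectors: the directions of largest variance.
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    return mean, vt[:rank]


def anomaly_score(activation: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the part of an activation that the benign subspace cannot explain."""
    centered = activation - mean
    reconstructed = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - reconstructed))


rng = np.random.default_rng(0)
# Synthetic stand-in for hidden states (assumption): benign inputs vary along
# only 2 orthonormal directions of a 16-dimensional space.
directions = np.linalg.qr(rng.normal(size=(16, 16)))[0]
benign = rng.normal(size=(200, 2)) @ directions[:2]

mean, basis = fit_basis(benign, rank=2)
typical = benign[0]
# Push one sample 5 units along a direction the benign data never uses.
unusual = benign[0] + 5.0 * directions[5]

print(anomaly_score(typical, mean, basis))  # approximately 0
print(anomaly_score(unusual, mean, basis))  # approximately 5
```

The score needs no knowledge of what kind of attack produced the unusual activation, only that it leaves the region benign inputs occupy.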
## Architecture Documents
| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | Draft | Vision, scope, package structure, dependencies |
| [firewall.md](firewall.md) | Draft | Core firewall API, input screening, alarm protocol |
| [codebook.md](codebook.md) | Draft | SVD basis, detection parameters, codebook compilation |
| [model.md](model.md) | Draft | Model loading, activation extraction, model-agnostic design |
| [configuration.md](configuration.md) | Draft | Thresholds, model selection, detection tuning |
| [open-questions.md](open-questions.md) | Active | Unresolved questions tracker with OQ-IDs |
## ADR Table
| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-python-uv.md) | Python with uv | Accepted |
| [002](decisions/002-behavioral-signals.md) | Behavioral Signal Detection (Not Text Classification) | Accepted |
| [003](decisions/003-small-model-detector.md) | Small Model (~125M) as Detector | Accepted |
| [004](decisions/004-svd-based-detection.md) | SVD-Based Anomaly Detection | Accepted |
| [005](decisions/005-safetensors-only.md) | Safetensors-Only Model Loading | Accepted |
| [006](decisions/006-optional-pytorch.md) | PyTorch as Optional Dependency | Accepted |
| [007](decisions/007-runtime-model-download.md) | Runtime Model Download via HuggingFace Hub | Accepted |
| [008](decisions/008-three-level-alarm.md) | Three-Level Alarm System | Accepted |
| [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
| [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
| [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
| [012](decisions/012-rolling-window-screening.md) | Rolling Token Window Screening | Accepted |
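
As a rough illustration of ADR-012, the sketch below splits a long input into overlapping token windows (25% overlap), screens each window, keeps the maximum score (max pooling), and reports the character offsets of the winning window. Inputs that fit in one window go straight to the single-input screen. The whitespace tokenizer, the float-returning `screen` callback, the `window_spans` helper, and the window size used in the demo are assumptions for illustration; the real tokenizer and return types are defined by the firewall API.

```python
import re
from typing import Callable


def window_spans(n_tokens: int, window: int, overlap: float = 0.25) -> list[tuple[int, int]]:
    """Return [start, end) token spans covering n_tokens with fractional overlap."""
    stride = max(1, int(window * (1 - overlap)))
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            return spans
        start += stride


def screen_document(
    text: str,
    screen: Callable[[str], float],
    window: int,
    overlap: float = 0.25,
) -> tuple[float, tuple[int, int]]:
    """Max-pool per-window scores; return (score, (char_start, char_end))."""
    # Stand-in tokenizer (assumption): whitespace-separated words with offsets.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if len(tokens) <= window:
        # Short inputs fall through to the single-input screen unchanged.
        return screen(text), (0, len(text))
    best_score, best_span = float("-inf"), (0, 0)
    for start, end in window_spans(len(tokens), window, overlap):
        char_start, char_end = tokens[start][0], tokens[end - 1][1]
        score = screen(text[char_start:char_end])
        if score > best_score:
            best_score, best_span = score, (char_start, char_end)
    return best_score, best_span


def toy_screen(chunk: str) -> float:
    """Toy scorer (assumption): flags any chunk containing the word IGNORE."""
    return 1.0 if "IGNORE" in chunk else 0.0


text = " ".join("IGNORE" if i == 13 else f"w{i}" for i in range(20))
score, span = screen_document(text, toy_screen, window=8)
print(score, text[span[0]:span[1]])
```

The overlap means a suspicious phrase near a window boundary is still seen whole by at least one window, and the character offsets let a caller point at the region that triggered the alarm.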
## Open Questions
See [open-questions.md](open-questions.md) for the full tracker.
| OQ | Question | Priority | Status |
|----|----------|----------|--------|
| ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cublas is better future path) |
| ~~OQ-02~~ | ~~What is the minimum viable codebook — can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (~65% compression to 500-600 lines) |
| ~~OQ-03~~ | ~~Should the firewall support streaming/chunked input screening?~~ | ~~medium~~ | **resolved** (ADR-012: rolling token windows Phase 2) |
| ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
| ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
| ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |
## Document Lifecycle
| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
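
The lifecycle above is a linear state machine, which a documentation lint could enforce against each file's `status` front matter field. The following is a minimal sketch of such a check; the `Status` enum and `can_transition` helper are hypothetical and not part of the project.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    STABLE = "stable"
    DEPRECATED = "deprecated"


# Each status maps to the single status it may move to, per the table above.
NEXT_STATUS = {
    Status.DRAFT: Status.REVIEWED,
    Status.REVIEWED: Status.STABLE,
    Status.STABLE: Status.DEPRECATED,
}


def can_transition(current: Status, target: Status) -> bool:
    """True only for the one forward step the lifecycle table allows."""
    return NEXT_STATUS.get(current) is target


print(can_transition(Status("draft"), Status.REVIEWED))  # True
print(can_transition(Status.DRAFT, Status.STABLE))       # False
```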