---
status: draft
last_updated: 2026-06-13
---

# alknet-firewall: Architecture

## Current State

**Phase 0→1 (Exploration → Architecture).** The project has a working PoC demonstrating that behavioral signals from small language models can detect adversarial inputs. The core detection logic (~1,745 lines) works reasonably well, but it lacks tests, carries an oversized codebook, and still needs to be extracted from the research codebase into a properly structured Python package.

This project extracts and productionizes the behavioral signal detection approach from the metaspline research project. A ~125M parameter model (SmolLM2-135M) processes untrusted inputs and produces hidden state activations. SVD-based dimensionality reduction on these activations reveals behavioral patterns: normal inputs cluster in expected regions, while adversarial inputs produce anomalous activation signatures. The system raises "behavioral alarms" without needing to know specific attack types.

## Architecture Documents

| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | Draft | Vision, scope, package structure, dependencies |
| [firewall.md](firewall.md) | Draft | Core firewall API, input screening, alarm protocol |
| [codebook.md](codebook.md) | Draft | SVD basis, detection parameters, codebook compilation |
| [model.md](model.md) | Draft | Model loading, activation extraction, model-agnostic design |
| [configuration.md](configuration.md) | Draft | Thresholds, model selection, detection tuning |
| [open-questions.md](open-questions.md) | Active | Unresolved questions tracker with OQ-IDs |

## ADR Table

| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-python-uv.md) | Python with uv | Accepted |
| [002](decisions/002-behavioral-signals.md) | Behavioral Signal Detection (Not Text Classification) | Accepted |
| [003](decisions/003-small-model-detector.md) | Small Model (~125M) as Detector | Accepted |
| [004](decisions/004-svd-based-detection.md) | SVD-Based Anomaly Detection | Accepted |
| [005](decisions/005-safetensors-only.md) | Safetensors-Only Model Loading | Accepted |
| [006](decisions/006-optional-pytorch.md) | PyTorch as Optional Dependency | Accepted |
| [007](decisions/007-runtime-model-download.md) | Runtime Model Download via HuggingFace Hub | Accepted |
| [008](decisions/008-three-level-alarm.md) | Three-Level Alarm System | Accepted |
| [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
| [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |

## Open Questions

See [open-questions.md](open-questions.md) for the full tracker.

| OQ | Question | Priority | Status |
|----|----------|----------|--------|
| OQ-01 | Should ONNX Runtime be a supported inference backend in Phase 1? | medium | open |
| OQ-02 | What is the minimum viable codebook? Can the 1,245-line codebook be compressed? | high | open |
| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open |
| ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
| OQ-05 | How should the firewall integrate with existing guardrail systems? | medium | open |
| OQ-06 | Should file-based configuration use TOML or YAML? | low | open |

## Document Lifecycle

| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
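
## Illustrative Sketch: SVD-Based Detection

The detection idea summarized under Current State (fit an SVD basis on activations from normal inputs, then flag inputs whose activations fall outside that subspace) can be illustrated with a small NumPy example. This is a minimal sketch under stated assumptions, not the project's implementation: synthetic arrays stand in for last-token hidden states, the names `fit_basis` and `anomaly_score` are hypothetical, and both the residual-norm score and the 99th-percentile threshold are illustrative choices rather than the actual codebook parameters.

```python
import numpy as np


def fit_basis(benign: np.ndarray, rank: int):
    """Fit a rank-`rank` SVD basis to benign activations (one row per input)."""
    mean = benign.mean(axis=0)
    # Rows of vt are the principal directions of the centered activations.
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    return mean, vt[:rank]


def anomaly_score(x: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Norm of the component of x that the benign subspace cannot explain."""
    centered = x - mean
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=-1)


rng = np.random.default_rng(0)
hidden_dim, rank = 64, 8  # assumed sizes, chosen only for the demo

# Synthetic stand-in: benign activations lie near a low-dimensional subspace.
directions = rng.normal(size=(rank, hidden_dim))
benign = rng.normal(size=(500, rank)) @ directions
benign += 0.05 * rng.normal(size=benign.shape)

mean, basis = fit_basis(benign, rank)

# Assumed calibration rule: alarm above the 99th percentile of benign scores.
threshold = np.quantile(anomaly_score(benign, mean, basis), 0.99)

# A fresh in-distribution input and an input far from the benign subspace.
normal_input = rng.normal(size=rank) @ directions + 0.05 * rng.normal(size=hidden_dim)
odd_input = 3.0 * rng.normal(size=hidden_dim)

normal_score = anomaly_score(normal_input, mean, basis)
odd_score = anomaly_score(odd_input, mean, basis)
odd_is_alarm = bool(odd_score > threshold)
```

The score needs no labels for specific attack types: it only measures distance from the subspace spanned by normal behavior, which matches the "behavioral alarm" framing above. In a real pipeline the rows of `benign` would come from the detector model's hidden states rather than from a random generator.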