---
status: draft
last_updated: 2026-06-13
---
# alknet-firewall — Architecture
## Current State

**Phase 0→1 (Exploration → Architecture):** The project has a working PoC demonstrating that behavioral signals from small language models can detect adversarial inputs. The core detection logic (~1,745 lines) works reasonably well, but it lacks tests, carries an oversized codebook (1,245 lines), and still needs to be extracted from the research codebase into a properly structured Python package.

This project extracts and productionizes the behavioral signal detection approach from the metaspline research project. A small language model (SmolLM2-135M, roughly 135M parameters) processes untrusted inputs and produces hidden-state activations. SVD-based dimensionality reduction on these activations reveals behavioral patterns: normal inputs cluster in expected regions, while adversarial inputs produce anomalous activation signatures. The system can therefore raise "behavioral alarms" without needing to know specific attack types in advance.
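The scoring idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the metaspline codebook: it assumes activations arrive as NumPy vectors, fits a rank-`k` basis to activations from known-normal inputs, and uses the residual norm (the part of an activation the basis cannot reconstruct) as the anomaly score. `SvdBasis`, `fit_basis`, `anomaly_score`, `alarm_level`, the level names, and the threshold parameters are all hypothetical; the project's actual scoring (for example, the monotonic spline distributions of ADR-010) may differ.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SvdBasis:
    """Hypothetical fitted basis; not the project's codebook format."""
    mean: np.ndarray        # (d,) mean activation over normal inputs
    components: np.ndarray  # (k, d) top-k right singular vectors (orthonormal rows)


def fit_basis(normal_acts: np.ndarray, k: int) -> SvdBasis:
    """Fit a rank-k subspace to activations of known-normal inputs, shape (n, d)."""
    mean = normal_acts.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(normal_acts - mean, full_matrices=False)
    return SvdBasis(mean=mean, components=vt[:k])


def anomaly_score(basis: SvdBasis, act: np.ndarray) -> float:
    """Residual norm: how far an activation lies outside the normal subspace."""
    centered = act - basis.mean
    reconstructed = basis.components.T @ (basis.components @ centered)
    return float(np.linalg.norm(centered - reconstructed))


def alarm_level(score: float, warn: float, alarm: float) -> str:
    """Map a score onto three illustrative levels (names and cut-offs are hypothetical)."""
    if score >= alarm:
        return "alarm"
    if score >= warn:
        return "warn"
    return "ok"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "normal" activations lying on a 2-D plane inside an 8-D space.
    normal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
    basis = fit_basis(normal, k=2)
    in_plane = anomaly_score(basis, normal[0])         # close to 0
    off_plane = anomaly_score(basis, normal[0] + 5.0)  # shifted by 5 in every coordinate
    print(alarm_level(in_plane, warn=1.0, alarm=3.0), alarm_level(off_plane, warn=1.0, alarm=3.0))
```

The residual score needs no examples of attacks: anything the normal-input basis cannot explain counts against the input, which is the property the paragraph above relies on.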
## Architecture Documents

| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | Draft | Vision, scope, package structure, dependencies |
| [firewall.md](firewall.md) | Draft | Core firewall API, input screening, alarm protocol |
| [codebook.md](codebook.md) | Draft | SVD basis, detection parameters, codebook compilation |
| [model.md](model.md) | Draft | Model loading, activation extraction, model-agnostic design |
| [configuration.md](configuration.md) | Draft | Thresholds, model selection, detection tuning |
| [open-questions.md](open-questions.md) | Active | Unresolved questions tracker with OQ-IDs |
## ADR Table

| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-python-uv.md) | Python with uv | Accepted |
| [002](decisions/002-behavioral-signals.md) | Behavioral Signal Detection (Not Text Classification) | Accepted |
| [003](decisions/003-small-model-detector.md) | Small Model (~125M) as Detector | Accepted |
| [004](decisions/004-svd-based-detection.md) | SVD-Based Anomaly Detection | Accepted |
| [005](decisions/005-safetensors-only.md) | Safetensors-Only Model Loading | Accepted |
| [006](decisions/006-optional-pytorch.md) | PyTorch as Optional Dependency | Accepted |
| [007](decisions/007-runtime-model-download.md) | Runtime Model Download via HuggingFace Hub | Accepted |
| [008](decisions/008-three-level-alarm.md) | Three-Level Alarm System | Accepted |
| [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
| [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
| [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + Thin Adapter Integration | Accepted |
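ADR-009 selects the activation at the last token of each input. A minimal NumPy sketch of that selection, assuming a batch of one layer's activations shaped `(batch, seq_len, d)` plus the usual 0/1 attention mask (for example, one entry of the `hidden_states` tuple that Hugging Face `transformers` models return when called with `output_hidden_states=True`, converted to NumPy). The function name `last_token_activations` is hypothetical, and this is not necessarily how the package will implement ADR-009.

```python
import numpy as np


def last_token_activations(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick each sequence's activation at its last real (non-padding) token.

    hidden:         (batch, seq_len, d) activations from one layer
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    returns:        (batch, d)
    """
    mask = np.asarray(attention_mask).astype(bool)
    if not mask.any(axis=1).all():
        raise ValueError("every sequence needs at least one non-padding token")
    seq_len = mask.shape[1]
    # argmax over the reversed mask finds the last True, so this works for
    # both left- and right-padded batches.
    last_idx = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden[np.arange(hidden.shape[0]), last_idx]


if __name__ == "__main__":
    hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
    mask = np.array([[1, 1, 0, 0],   # right-padded: last real token at index 1
                     [0, 1, 1, 1]])  # left-padded: last real token at index 3
    print(last_token_activations(hidden, mask))  # rows hidden[0, 1] and hidden[1, 3]
```

Searching the mask from the end, rather than always taking index `-1`, keeps the selection correct when a batch mixes input lengths.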
## Open Questions

See [open-questions.md](open-questions.md) for the full tracker.

| OQ | Question | Priority | Status |
|----|----------|----------|--------|
| ~~OQ-01~~ | ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ | ~~medium~~ | **resolved** (removed from scope; burn/cuBLAS is a better future path) |
| ~~OQ-02~~ | ~~What is the minimum viable codebook, and can the 1,245-line codebook be compressed?~~ | ~~high~~ | **resolved** (compress to ~500–600 lines, roughly a 50–60% reduction) |
| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete; deferred to Phase 2) |
| ~~OQ-04~~ | ~~Should detection thresholds be per-model or globally configurable?~~ | ~~medium~~ | **resolved** (both: model-specific defaults, user-overridable) |
| ~~OQ-05~~ | ~~How should the firewall integrate with existing guardrail systems?~~ | ~~medium~~ | **resolved** (ADR-011: standalone API + thin adapters) |
| ~~OQ-06~~ | ~~Should file-based configuration use TOML or YAML?~~ | ~~low~~ | **resolved** (TOML) |
## Document Lifecycle

| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
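The lifecycle is a forward-only state machine over the front-matter `status` field. A minimal sketch, assuming a hypothetical `check_transition` helper (for example, run in a docs lint step); it encodes only the allowed edges from the table, not the conditions attached to them, such as open questions being resolved.

```python
from enum import Enum


class DocStatus(str, Enum):
    """Front-matter `status` values from the Document Lifecycle table."""
    DRAFT = "draft"
    REVIEWED = "reviewed"
    STABLE = "stable"
    DEPRECATED = "deprecated"


# Forward-only transitions as listed in the table; deprecated documents are
# eventually removed rather than moved to another status.
NEXT_STATUS: dict[DocStatus, DocStatus] = {
    DocStatus.DRAFT: DocStatus.REVIEWED,
    DocStatus.REVIEWED: DocStatus.STABLE,
    DocStatus.STABLE: DocStatus.DEPRECATED,
}


def check_transition(current: str, target: str) -> DocStatus:
    """Validate a status change in a document's front matter (hypothetical helper)."""
    src, dst = DocStatus(current), DocStatus(target)  # unknown values raise ValueError
    if NEXT_STATUS.get(src) is not dst:
        raise ValueError(f"cannot move a document from {src.value!r} to {dst.value!r}")
    return dst


if __name__ == "__main__":
    print(check_transition("draft", "reviewed").value)  # reviewed
```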