---
status: draft
last_updated: 2026-06-13
---
# alknet-firewall — Architecture
## Current State
**Phase 0→1 (Exploration → Architecture).** The project has a working proof of
concept (PoC) demonstrating that behavioral signals from small language models
can detect adversarial inputs. The core detection logic (~1,745 lines) works
reasonably well, but it has no tests, its codebook is oversized, and it still
needs to be extracted from the research codebase into a properly structured
Python package.

This project extracts and productionizes the behavioral signal detection
approach from the metaspline research project. A small language model in the
~125M-parameter class (SmolLM2-135M) processes each untrusted input and
produces hidden-state activations. SVD-based dimensionality reduction on
these activations reveals behavioral patterns: normal inputs cluster in
expected regions, while adversarial inputs produce anomalous activation
signatures. Because detection keys on these anomalies rather than on known
attack patterns, the system can raise "behavioral alarms" without needing to
know specific attack types.
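
The screening pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's API: it assumes NumPy is available, assumes last-token hidden states (ADR 009) have already been extracted into an `(n_samples, d)` array, and scores an input by its residual norm outside a rank-`k` SVD subspace fitted on benign activations. The class name `SvdAnomalyDetector` and the residual-norm score are hypothetical; the real basis, score composition, and codebook format are specified in codebook.md and firewall.md.

```python
from __future__ import annotations

import numpy as np


class SvdAnomalyDetector:
    """Hypothetical sketch: score activations by distance from a benign SVD subspace."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.mean: np.ndarray | None = None
        self.basis: np.ndarray | None = None  # (rank, d), orthonormal rows

    def fit(self, benign: np.ndarray) -> SvdAnomalyDetector:
        """Fit on an (n_samples, d) array of last-token activations from benign inputs."""
        self.mean = benign.mean(axis=0)
        centered = benign - self.mean
        # Rows of vt are right singular vectors, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.basis = vt[: self.rank]
        return self

    def score(self, x: np.ndarray) -> np.ndarray:
        """Return one residual norm per row of x (shape (d,) or (m, d))."""
        if self.mean is None or self.basis is None:
            raise RuntimeError("call fit() before score()")
        centered = np.atleast_2d(x) - self.mean
        # Project onto the benign subspace; whatever remains is the anomalous part.
        projected = centered @ self.basis.T @ self.basis
        return np.linalg.norm(centered - projected, axis=1)
```

With benign activations `benign` and a new activation `x`, `SvdAnomalyDetector(rank=k).fit(benign).score(x)` returns one residual norm per row, and inputs whose activations leave the benign subspace get large scores. The raw score still needs calibration before it can drive an alarm.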
## Architecture Documents
| Document | Status | Description |
|----------|--------|-------------|
| [overview.md](overview.md) | Draft | Vision, scope, package structure, dependencies |
| [firewall.md](firewall.md) | Draft | Core firewall API, input screening, alarm protocol |
| [codebook.md](codebook.md) | Draft | SVD basis, detection parameters, codebook compilation |
| [model.md](model.md) | Draft | Model loading, activation extraction, model-agnostic design |
| [configuration.md](configuration.md) | Draft | Thresholds, model selection, detection tuning |
| [open-questions.md](open-questions.md) | Active | Unresolved questions tracker with OQ-IDs |
## ADR Table
| ADR | Title | Status |
|-----|-------|--------|
| [001](decisions/001-python-uv.md) | Python with uv | Accepted |
| [002](decisions/002-behavioral-signals.md) | Behavioral Signal Detection (Not Text Classification) | Accepted |
| [003](decisions/003-small-model-detector.md) | Small Model (~125M) as Detector | Accepted |
| [004](decisions/004-svd-based-detection.md) | SVD-Based Anomaly Detection | Accepted |
| [005](decisions/005-safetensors-only.md) | Safetensors-Only Model Loading | Accepted |
| [006](decisions/006-optional-pytorch.md) | PyTorch as Optional Dependency | Accepted |
| [007](decisions/007-runtime-model-download.md) | Runtime Model Download via HuggingFace Hub | Accepted |
| [008](decisions/008-three-level-alarm.md) | Three-Level Alarm System | Accepted |
| [009](decisions/009-last-token-extraction.md) | Last-Token Activation Extraction | Accepted |
| [010](decisions/010-monotonic-spline-distributions.md) | Monotonic Spline Distributions | Accepted |
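
ADRs 008 and 010 together suggest how a raw anomaly score could become an alarm: calibrate the score against benign data with a monotonic distribution, then threshold the calibrated value into three levels. The sketch below is one reading of that, under loud assumptions: it uses a piecewise-linear, non-decreasing interpolation of benign-score quantiles as a simple stand-in for whatever spline family ADR 010 specifies, and the level names (`CLEAR`, `SUSPICIOUS`, `ALARM`), thresholds (0.95, 0.99), and the names `MonotonicCalibrator` and `alarm_level` are all hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from enum import Enum


class AlarmLevel(Enum):
    # Hypothetical labels: ADR 008 fixes three levels, not these names.
    CLEAR = 0
    SUSPICIOUS = 1
    ALARM = 2


class MonotonicCalibrator:
    """Non-decreasing piecewise-linear map from raw score to a [0, 1] percentile.

    A simple stand-in for a monotonic spline distribution (ADR 010).
    """

    def __init__(self, benign_scores: list[float], knots: int = 11) -> None:
        if len(benign_scores) < 2 or knots < 2:
            raise ValueError("need >= 2 benign scores and >= 2 knots")
        ordered = sorted(benign_scores)
        last = len(ordered) - 1
        # Knot y-values are evenly spaced percentiles; x-values are the matching
        # empirical quantiles of the benign scores (non-decreasing by construction).
        self.ys = [i / (knots - 1) for i in range(knots)]
        self.xs = [ordered[round(y * last)] for y in self.ys]

    def __call__(self, score: float) -> float:
        if score <= self.xs[0]:
            return 0.0
        if score >= self.xs[-1]:
            return 1.0
        i = bisect_right(self.xs, score)  # guarantees xs[i - 1] <= score < xs[i]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (score - x0) / (x1 - x0)


def alarm_level(percentile: float, warn: float = 0.95, alarm: float = 0.99) -> AlarmLevel:
    """Threshold a calibrated percentile into one of three levels (hypothetical cut-offs)."""
    if percentile >= alarm:
        return AlarmLevel.ALARM
    if percentile >= warn:
        return AlarmLevel.SUSPICIOUS
    return AlarmLevel.CLEAR
```

Monotonicity is the property that matters here: a higher raw score can never map to a lower percentile, so the alarm level never decreases as an activation moves further from the benign region.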
## Open Questions
See [open-questions.md](open-questions.md) for the full tracker.

| OQ | Question | Priority | Status |
|----|----------|----------|--------|
| OQ-01 | Should ONNX Runtime be a supported inference backend in Phase 1? | medium | open |
| OQ-02 | What is the minimum viable codebook — can the 1,245-line codebook be compressed? | high | open |
| OQ-03 | Should the firewall support streaming/chunked input screening? | low | open |
| OQ-04 | Should detection thresholds be per-model or globally configurable? | medium | open |
| OQ-05 | How should the firewall integrate with existing guardrail systems (LlamaFirewall, NeMo)? | medium | open |
| OQ-06 | Should file-based configuration use TOML or YAML? | low | open |
| OQ-07 | Is a Rust port feasible given current ML framework maturity? | low | open |
## Document Lifecycle
| Status | Meaning | Transitions |
|--------|---------|-------------|
| `draft` | Under active development. May change significantly. | → `reviewed` when open questions are resolved |
| `reviewed` | Architecture is final. Implementation may begin. Changes require review. | → `stable` when implementation is complete |
| `stable` | Locked. Changes require review and may warrant an ADR. | → `deprecated` when superseded |
| `deprecated` | Superseded. Kept for reference. | Removed when no longer referenced |
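
The lifecycle table is effectively a small state machine over the `status` front-matter field. A hypothetical docs-lint check could encode it as follows; `DocStatus`, `ALLOWED`, and `check_transition` are illustrative names, and the sketch permits only the forward transitions the table lists (removing a deprecated document is a deletion, not a status change).

```python
from __future__ import annotations

from enum import Enum


class DocStatus(str, Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    STABLE = "stable"
    DEPRECATED = "deprecated"


# Only the transitions listed in the Document Lifecycle table (hypothetical lint rule).
ALLOWED: dict[DocStatus, set[DocStatus]] = {
    DocStatus.DRAFT: {DocStatus.REVIEWED},
    DocStatus.REVIEWED: {DocStatus.STABLE},
    DocStatus.STABLE: {DocStatus.DEPRECATED},
    DocStatus.DEPRECATED: set(),
}


def check_transition(old: str, new: str) -> None:
    """Raise ValueError if front-matter status `old` may not move to `new`.

    Unknown status strings also raise ValueError (from the Enum lookup).
    """
    src, dst = DocStatus(old), DocStatus(new)
    if dst not in ALLOWED[src]:
        raise ValueError(f"illegal status change: {src.value} -> {dst.value}")
```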