feat: initial architecture specification and research

Phase 0→1 setup for alknet-firewall, a behavioral signal detection library that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).
2026-06-13 05:17:40 +00:00
parent 141628bae4
commit cf464c2296
23 changed files with 3900 additions and 44 deletions
--- a/docs/architecture/decisions/005-safetensors-only.md
+++ b/docs/architecture/decisions/005-safetensors-only.md
@@ -0,0 +1,47 @@
+# ADR-005: Safetensors-Only Model Loading
+
+## Status
+
+Accepted
+
+## Context
+
+Model weight files come in two formats:
+
+- **Pickle-based** (`.pt`, `.bin`, `.pth`): Can execute arbitrary Python code
+  during loading. Known supply chain attack vector.
+- **safetensors**: Simple binary format with JSON header. No code execution.
+  76x faster CPU loading. Zero-copy/lazy loading support.
+
+This is a security product. Loading untrusted pickle files in a security
+product is a contradiction. The LiteLLM supply chain attack (CVE-2026-33634,
+CVSS 9.4) demonstrated that compromised model files can lead to credential
+theft and backdoors.
+
+## Decision
+
+Only load model weights from safetensors format. Never load `.pt`, `.bin`,
+or `.pth` files. Apply this policy to both the detector model and the codebook
+tensors.
+
+## Consequences
+
+**Positive**:
+- Eliminates an entire class of supply chain attacks via model files
+- 76x faster model loading on CPU
+- Zero-copy/lazy loading reduces memory usage
+- Cross-framework compatible (PyTorch, TensorFlow, JAX, NumPy)
+- Consistent with HuggingFace's own migration to safetensors-default
+
+**Negative**:
+- Some older models ship only `.bin` weights, which must be converted to
+  safetensors before use
+- Safetensors doesn't support saving optimizer state (irrelevant here, since
+  we only do inference)
+- Older versions of transformers need an explicit `use_safetensors=True`
+  argument
+
+## References
+
+- [python-ml-packaging.md](../research/python-ml-packaging.md) — Section 6:
+  safetensors format comparison
+- CVE-2026-33634 — LiteLLM supply chain attack
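
Note on the Decision section of ADR-005: the policy could be enforced with a small guard in front of every weight and codebook load. The sketch below is only an illustration using the standard library, not the project's loader. The names `require_safetensors`, `read_safetensors_header`, `write_demo_file`, and `UnsafeWeightFormatError` are hypothetical, as is the choice to check by file extension. It relies on the published safetensors layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes.

```python
import json
import struct
import tempfile
from pathlib import Path

# Extensions the ADR names as pickle-based and therefore rejected.
PICKLE_SUFFIXES = {".pt", ".bin", ".pth"}


class UnsafeWeightFormatError(ValueError):
    """Hypothetical error type: the file is not in safetensors format."""


def require_safetensors(path):
    """Return the path if it ends in .safetensors, otherwise raise."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in PICKLE_SUFFIXES:
        raise UnsafeWeightFormatError(f"refusing pickle-based weights: {p.name}")
    if suffix != ".safetensors":
        raise UnsafeWeightFormatError(f"unsupported weight format: {p.name}")
    return p


def read_safetensors_header(path):
    """Parse only the JSON header; nothing in the file is ever executed."""
    p = require_safetensors(path)
    with p.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise UnsafeWeightFormatError("file too short for a safetensors header")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise UnsafeWeightFormatError("truncated safetensors header")
        return json.loads(raw)


def write_demo_file(path):
    """Write a tiny valid safetensors file holding one float32 tensor of shape [2]."""
    data = struct.pack("<2f", 1.0, 2.0)
    entry = {"dtype": "F32", "shape": [2], "data_offsets": [0, len(data)]}
    header = json.dumps({"w": entry}).encode("utf-8")
    Path(path).write_bytes(struct.pack("<Q", len(header)) + header + data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "codebook.safetensors"
        write_demo_file(demo)
        print(read_safetensors_header(demo))
        try:
            require_safetensors(Path(tmp) / "pytorch_model.bin")
        except UnsafeWeightFormatError as exc:
            print("rejected:", exc)
```

An extension check alone does not prove a file is safe, so a real loader would still hand the accepted path to the safetensors library for parsing. The guard only ensures that a pickle deserializer is never reached.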