feat: initial architecture specification and research

Phase 0→1 setup for alknet-firewall: a behavioral signal detection library
that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).
2026-06-13 05:17:40 +00:00
parent 141628bae4
commit cf464c2296
23 changed files with 3900 additions and 44 deletions
--- a/docs/architecture/decisions/007-runtime-model-download.md
+++ b/docs/architecture/decisions/007-runtime-model-download.md
@@ -0,0 +1,53 @@
+# ADR-007: Runtime Model Download via HuggingFace Hub
+
+## Status
+
+Accepted
+
+## Context
+
+The detector model (SmolLM2-135M) is ~269MB. This is too large to bundle in a
+Python package — PyPI has a 60MB per-file limit and 1GB total project size
+limit. Even if it were allowed, a 269MB wheel download is terrible UX.
+
+Options:
+- **Bundle in package**: Not feasible due to size constraints
+- **Separate package for model**: Possible but awkward, requires users to
+  install two packages
+- **Runtime download via HuggingFace Hub**: Standard approach used by
+  transformers. Provides caching, authentication, offline mode, and
+  checksum verification
+- **Custom download (S3, etc.)**: Works but reinvents the wheel
+
+## Decision
+
+Download the detector model at runtime via HuggingFace Hub (`snapshot_download`
+or `from_pretrained` with automatic caching). Support offline mode via
+`HF_HUB_OFFLINE=1` or `local_files_only=True`. Provide a CLI command for
+pre-downloading models in air-gapped environments.
+
+Pin model revisions to specific commit hashes for reproducibility.
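The decision above could be sketched as a small helper around `huggingface_hub.snapshot_download`, whose `repo_id`, `revision` and `local_files_only` parameters are real. Everything else here is an assumption for illustration: the helper names (`offline_requested`, `download_kwargs`, `fetch_detector_model`) are hypothetical, the repo id assumes the public `HuggingFaceTB/SmolLM2-135M` checkpoint, and `DETECTOR_REVISION` is a placeholder rather than a real commit hash. The helper rejects anything that is not a full 40-character commit hash, so a moving ref such as `main` cannot slip in, and it maps `HF_HUB_OFFLINE` onto `local_files_only`. `huggingface_hub` already honours `HF_HUB_OFFLINE` on its own; reading it here only makes the offline path explicit and testable.

```python
import os
import re

# Assumed detector checkpoint; the revision below is a hypothetical
# placeholder, NOT a real commit on the Hub.
DETECTOR_REPO_ID = "HuggingFaceTB/SmolLM2-135M"
DETECTOR_REVISION = "0123456789abcdef0123456789abcdef01234567"

_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")
_TRUTHY = {"1", "true", "yes", "on"}


def offline_requested(env=None):
    """Return True if HF_HUB_OFFLINE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").strip().lower() in _TRUTHY


def download_kwargs(revision=DETECTOR_REVISION, local_files_only=None, env=None):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    # Branch and tag names can move, which would silently change detection
    # results, so only full commit hashes are accepted.
    if not _COMMIT_HASH.fullmatch(revision):
        raise ValueError(f"revision must be a 40-char commit hash, got {revision!r}")
    if local_files_only is None:
        local_files_only = offline_requested(env)
    return {
        "repo_id": DETECTOR_REPO_ID,
        "revision": revision,
        "local_files_only": local_files_only,
    }


def fetch_detector_model(**overrides):
    """Return the local snapshot directory, downloading it if not cached."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(**overrides))
```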
+
+## Consequences
+
+**Positive**:
+- Package stays small (~30MB base install)
+- HuggingFace Hub provides automatic caching, deduplication, and checksum
+  verification
+- Offline mode supported via environment variable
+- Authentication for gated models via `HF_TOKEN`
+- Standard approach — users familiar with transformers will recognize the
+  pattern
+
+**Negative**:
+- First run requires network access and ~269MB download (with progress bar)
+- Model availability depends on HuggingFace Hub uptime
+- Users in restricted networks need to pre-download models
+- Different model versions may produce different detection results — must
+  pin revisions
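For users on restricted networks, one possible shape for the pre-download command is a thin `argparse` wrapper that fills the Hugging Face cache on a connected machine. The command name, `build_parser`, `main` and its injectable `downloader` parameter (there only so the sketch can be exercised without network access) are hypothetical; `snapshot_download` and its `repo_id`, `revision` and `cache_dir` parameters are real. The existing `huggingface-cli download <repo_id> --revision <hash>` command covers the same need if a dedicated command turns out to be unnecessary.

```python
import argparse

DEFAULT_REPO_ID = "HuggingFaceTB/SmolLM2-135M"  # assumed detector checkpoint


def build_parser():
    parser = argparse.ArgumentParser(
        prog="alknet-firewall-download",  # hypothetical command name
        description="Pre-download the detector model into the Hugging Face cache.",
    )
    parser.add_argument("--repo-id", default=DEFAULT_REPO_ID)
    parser.add_argument(
        "--revision",
        required=True,
        help="commit hash to pin (branch names are not reproducible)",
    )
    parser.add_argument(
        "--cache-dir",
        default=None,
        help="override the default Hugging Face cache location",
    )
    return parser


def main(argv=None, downloader=None):
    """Download one pinned snapshot and print its local path."""
    args = build_parser().parse_args(argv)
    if downloader is None:
        # Only needed when actually downloading.
        from huggingface_hub import snapshot_download as downloader
    path = downloader(
        repo_id=args.repo_id,
        revision=args.revision,
        cache_dir=args.cache_dir,
    )
    print(path)
    return 0
```

Exposed through an entry point in `pyproject.toml` (assumed, not specified by this ADR), the command would be run once on a networked host; copying the Hugging Face cache directory (by default under `HF_HOME`) to the restricted host and setting `HF_HUB_OFFLINE=1` then lets the runtime load the pinned snapshot from cache.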
+
+## References
+
+- [python-ml-packaging.md](../research/python-ml-packaging.md) — Section 2:
+  Model file distribution
+- [model.md](../model.md)