Phase 0→1 setup for alknet-firewall, a behavioral signal detection library that screens untrusted LLM inputs using small model activations.
Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning
Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence
Agent role adaptations for Python project (replaced Rust conventions).
ADR-007: Runtime Model Download via HuggingFace Hub
Status
Accepted
Context
The detector model (SmolLM2-135M) is ~269MB. This is too large to bundle in a Python package: PyPI's default limits are 100MB per file and 10GB total project size. Even if a limit increase were granted, forcing a 269MB wheel download on every install is poor UX.
Options:
- Bundle in package: Not feasible due to size constraints
- Separate package for model: Possible but awkward, requires users to install two packages
- Runtime download via HuggingFace Hub: Standard approach used by transformers. Provides caching, authentication, offline mode, and checksum verification
- Custom download (S3, etc.): Works but reinvents the wheel
Decision
Download the detector model at runtime via HuggingFace Hub (snapshot_download
or from_pretrained with automatic caching). Support offline mode via
HF_HUB_OFFLINE=1 or local_files_only=True. Provide a CLI command for
pre-downloading models in air-gapped environments.
Pin model revisions to specific commit hashes for reproducibility.
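The decision above can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the repo id, the placeholder revision, and the helper names (is_pinned, offline_requested, fetch_model) are assumptions introduced here. Only snapshot_download and its repo_id, revision, and local_files_only parameters come from huggingface_hub.

```python
import os
import re
from typing import Mapping, Optional

# Assumed values for illustration; the ADR does not name the repo id
# or the commit that will actually be pinned.
MODEL_REPO = "HuggingFaceTB/SmolLM2-135M"
PINNED_REVISION = "0" * 40  # placeholder, not a real commit hash

_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def is_pinned(revision: str) -> bool:
    """Return True only for a full 40-character commit hash."""
    return _COMMIT_RE.fullmatch(revision) is not None


def offline_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Check HF_HUB_OFFLINE=1. huggingface_hub also reads this variable
    itself; the helper only makes the choice explicit at the call site."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "") == "1"


def fetch_model(
    repo_id: str = MODEL_REPO,
    revision: str = PINNED_REVISION,
    local_only: Optional[bool] = None,
) -> str:
    """Return the local snapshot directory, downloading on first use."""
    if not is_pinned(revision):
        # Branch names such as "main" can move, which would change
        # detection results between runs.
        raise ValueError(f"revision must be a commit hash, got {revision!r}")
    # Imported lazily so the base package can be imported without the hub client.
    from huggingface_hub import snapshot_download

    if local_only is None:
        local_only = offline_requested()
    return snapshot_download(
        repo_id=repo_id,
        revision=revision,
        local_files_only=local_only,
    )
```

Rejecting branch names before any network call is one way to enforce the pinning rule; a real implementation might instead keep the pinned hash in package configuration.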
Consequences
Positive:
- Package stays small (~30MB base install)
- HuggingFace Hub provides automatic caching, deduplication, and checksum verification
- Offline mode supported via environment variable
- Authentication for gated models via HF_TOKEN
- Standard approach: users familiar with transformers will recognize the pattern
Negative:
- First run requires network access and ~269MB download (with progress bar)
- Model availability depends on HuggingFace Hub uptime
- Users in restricted networks need to pre-download models
- Different model versions may produce different detection results — must pin revisions
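For restricted networks, the pre-download CLI mentioned in the Decision might look something like the sketch below. The program name, flag names, and default repo id are hypothetical, and the injectable downloader argument exists only so the command can be exercised without network access. The only real library call assumed is huggingface_hub's snapshot_download with its repo_id, revision, and cache_dir parameters.

```python
import argparse
from typing import Callable, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Program and flag names are illustrative, not the project's real CLI.
    parser = argparse.ArgumentParser(
        prog="alknet-firewall-download",
        description="Pre-download the detector model into the local cache.",
    )
    parser.add_argument("--repo-id", default="HuggingFaceTB/SmolLM2-135M")
    parser.add_argument(
        "--revision",
        required=True,
        help="commit hash to pin, so every machine gets identical weights",
    )
    parser.add_argument(
        "--cache-dir",
        default=None,
        help="optional cache location; defaults to the hub's standard cache",
    )
    return parser


def main(
    argv: Optional[Sequence[str]] = None,
    downloader: Optional[Callable[..., str]] = None,
) -> int:
    args = build_parser().parse_args(argv)
    if downloader is None:
        # Lazy import: only needed when a real download is requested.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    path = downloader(
        repo_id=args.repo_id,
        revision=args.revision,
        cache_dir=args.cache_dir,
    )
    # Print the snapshot directory so it can be copied to the offline host.
    print(path)
    return 0
```

One plausible workflow is to run the command on a connected machine, copy the printed directory (or the whole cache) to the air-gapped host, and start the library there with HF_HUB_OFFLINE=1.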
References
- python-ml-packaging.md — Section 2: Model file distribution
- model.md