# ADR-007: Runtime Model Download via HuggingFace Hub

## Status

Accepted

## Context

The detector model (SmolLM2-135M) is ~269MB. This is too large to bundle in a Python package: PyPI's default limits are 100MB per file and 10GB per project, so a 269MB wheel would exceed the per-file limit. Even if it were allowed, forcing a 269MB download on every install is terrible UX.

Options:

- **Bundle in package**: Not feasible due to size constraints
- **Separate package for model**: Possible but awkward; requires users to install two packages
- **Runtime download via HuggingFace Hub**: Standard approach used by transformers. Provides caching, authentication, offline mode, and checksum verification
- **Custom download (S3, etc.)**: Works but reinvents the wheel

## Decision

Download the detector model at runtime via HuggingFace Hub (`snapshot_download` or `from_pretrained` with automatic caching).

- Support offline mode via `HF_HUB_OFFLINE=1` or `local_files_only=True`.
- Provide a CLI command for pre-downloading models, for use in air-gapped environments.
- Pin model revisions to specific commit hashes for reproducibility.

## Consequences

**Positive**:

- Package stays small (~30MB base install)
- HuggingFace Hub provides automatic caching, deduplication, and checksum verification
- Offline mode supported via environment variable
- Authentication for gated models via `HF_TOKEN`
- Standard approach: users familiar with transformers will recognize the pattern

**Negative**:

- First run requires network access and a ~269MB download (shown with a progress bar)
- Model availability depends on HuggingFace Hub uptime
- Users in restricted networks need to pre-download models
- Different model versions may produce different detection results, so revisions must be pinned

## References

- [python-ml-packaging.md](../research/python-ml-packaging.md): Section 2, Model file distribution
- [model.md](../model.md)
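
The decision above (pinned revision, offline mode, download through the Hub cache) could be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the repo id, the commit hash, and the helper names (`build_download_kwargs`, `fetch_model`) are assumptions made for the example. Only `huggingface_hub.snapshot_download` and its `repo_id`, `revision`, and `local_files_only` parameters come from the library itself.

```python
import os
import re
from typing import Mapping, Optional

# Assumptions for illustration only: the real detector repo id and the
# pinned commit hash are not given in this ADR, so both are placeholders.
MODEL_REPO_ID = "HuggingFaceTB/SmolLM2-135M"
MODEL_REVISION = "0123456789abcdef0123456789abcdef01234567"  # placeholder hash

# Values that huggingface_hub treats as "true" for boolean env variables.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """Return True only for a full 40-character commit hash, not a branch or tag."""
    return _COMMIT_HASH.fullmatch(revision) is not None


def offline_requested(env: Mapping[str, str]) -> bool:
    """Return True when HF_HUB_OFFLINE is set to a truthy value."""
    return env.get("HF_HUB_OFFLINE", "").upper() in _TRUE_VALUES


def build_download_kwargs(
    env: Optional[Mapping[str, str]] = None,
    revision: str = MODEL_REVISION,
) -> dict:
    """Build the keyword arguments for snapshot_download (hypothetical helper)."""
    env = os.environ if env is None else env
    if not is_pinned_revision(revision):
        raise ValueError(f"revision must be a full commit hash, got {revision!r}")
    return {
        "repo_id": MODEL_REPO_ID,
        "revision": revision,
        # In offline mode, only the local cache is consulted; no network calls.
        "local_files_only": offline_requested(env),
    }


def fetch_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Download (or reuse from cache) the pinned snapshot and return its local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(env))
```

A pre-download CLI command for air-gapped setups could plausibly wrap `fetch_model()` on a connected machine, after which the populated cache is copied to the target host and the package is run with `HF_HUB_OFFLINE=1`.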