alknet-firewall/docs/architecture/decisions/007-runtime-model-download.md

# ADR-007: Runtime Model Download via HuggingFace Hub

## Status

Accepted

## Context

The detector model (SmolLM2-135M) is ~269MB. This is too large to bundle in a
Python package: PyPI's default limits are 100MB per file and 10GB per project,
so a wheel containing the model would exceed the per-file limit. Even if it
were allowed, a 269MB wheel would make every install and upgrade slow.

Options:
- **Bundle in package**: Not feasible due to size constraints
- **Separate package for model**: Possible but awkward, requires users to
  install two packages
- **Runtime download via HuggingFace Hub**: Standard approach used by
  transformers. Provides caching, authentication, offline mode, and
  checksum verification
- **Custom download (S3, etc.)**: Works, but we would have to reimplement the
  caching, authentication, and integrity checks that the Hub already provides

## Decision

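Download the detector model at runtime via HuggingFace Hub, using either
`huggingface_hub.snapshot_download` or the transformers `from_pretrained`
methods, both of which cache downloaded files locally. Support offline mode via
`HF_HUB_OFFLINE=1` or `local_files_only=True`. Provide a CLI command that
pre-downloads models on a connected machine so the populated cache can be
copied into air-gapped environments.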
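
The download path and the pre-download command described above could look roughly like the following. This is a minimal sketch, not the project's implementation: `DETECTOR_REPO_ID`, `offline_requested`, `fetch_detector`, `build_parser`, and the `prefetch-model` program name are hypothetical, and the set of values treated as "offline" is an assumption. Only `snapshot_download` and its `repo_id`, `revision`, and `local_files_only` parameters come from `huggingface_hub`.

```python
import argparse
import os

# Hypothetical placeholders: this ADR does not name the repo id or the pinned commit.
DETECTOR_REPO_ID = "example-org/detector-model"
DETECTOR_REVISION = None  # replace with a full commit hash once one is chosen


def offline_requested(env=None):
    """Return True when HF_HUB_OFFLINE is set to a truthy value (assumed set below)."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").strip().upper() in {"1", "ON", "YES", "TRUE"}


def fetch_detector(repo_id=DETECTOR_REPO_ID, revision=DETECTOR_REVISION, offline=None):
    """Return the local snapshot directory, downloading it on first use."""
    # Imported lazily so the helpers in this module work without huggingface_hub.
    from huggingface_hub import snapshot_download

    if offline is None:
        offline = offline_requested()
    # With local_files_only=True nothing is fetched; only the local cache is used.
    return snapshot_download(repo_id=repo_id, revision=revision, local_files_only=offline)


def build_parser():
    """Argument parser for a hypothetical pre-download command."""
    parser = argparse.ArgumentParser(
        prog="prefetch-model",
        description="Download the detector model into the local Hub cache.",
    )
    parser.add_argument("--repo-id", default=DETECTOR_REPO_ID)
    parser.add_argument("--revision", default=DETECTOR_REVISION)
    return parser


def main(argv=None):
    """Entry point: always downloads, since pre-fetching is the whole purpose."""
    args = build_parser().parse_args(argv)
    path = fetch_detector(args.repo_id, args.revision, offline=False)
    print(f"Model cached at {path}")
```

Calling `main(["--revision", "<commit hash>"])` on a connected machine would populate the cache. After the cache is copied to the restricted host, `fetch_detector(offline=True)` would resolve the model without touching the network.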

Pin model revisions to specific commit hashes (passed as the `revision`
argument) for reproducibility.
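
One way to enforce the pinning rule is to reject anything that is not a full commit hash before it reaches the loader. The sketch below assumes Hub commits are identified by 40-character hexadecimal hashes. The helper names `require_pinned_revision` and `pinned_load_kwargs` are hypothetical. The returned keyword arguments (`revision`, `local_files_only`) are accepted by both `snapshot_download` and `from_pretrained`.

```python
import re

# Full Git commit hash: 40 lowercase hexadecimal characters.
_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def require_pinned_revision(revision):
    """Return the revision unchanged if it is a full commit hash, else raise."""
    if not isinstance(revision, str) or not _COMMIT_HASH.fullmatch(revision):
        raise ValueError(
            f"model revision must be a full 40-character commit hash, got {revision!r}"
        )
    return revision


def pinned_load_kwargs(revision, offline=False):
    """Build keyword arguments for a pinned, optionally offline, model load."""
    return {
        "revision": require_pinned_revision(revision),
        "local_files_only": offline,
    }
```

Branch names such as `main`, tags, and abbreviated hashes are rejected on purpose. A branch can move to a different commit, and different model versions may produce different detection results.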

## Consequences

**Positive**:
- Package stays small (~30MB base install)
- HuggingFace Hub provides automatic caching, deduplication, and checksum
  verification
- Offline mode supported via `HF_HUB_OFFLINE=1` or `local_files_only=True`
- Authentication for gated models via `HF_TOKEN`
- Standard approach — users familiar with transformers will recognize the
  pattern

**Negative**:
- First run requires network access and ~269MB download (with progress bar)
- Model availability depends on HuggingFace Hub uptime
- Users in restricted networks must pre-download models on a connected machine
  and copy the cache across
- Different model versions may produce different detection results — must
  pin revisions

## References

- [python-ml-packaging.md](../research/python-ml-packaging.md) — Section 2:
  Model file distribution
- [model.md](../model.md)