alknet-firewall/docs/architecture/decisions/007-runtime-model-download.md
glm-5.1 cf464c2296 feat: initial architecture specification and research
Phase 0→1 setup for alknet-firewall — a behavioral signal detection
library that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).
2026-06-13 05:17:40 +00:00


ADR-007: Runtime Model Download via HuggingFace Hub

Status

Accepted

Context

The detector model (SmolLM2-135M) is ~269MB, which is too large to bundle in a Python package: PyPI's default limits are 100MB per file and 10GB per project. Even if a larger upload were approved, forcing every install to pull a 269MB wheel would be poor UX.

Options:

  • Bundle in package: Not feasible due to size constraints
  • Separate package for model: Possible but awkward, requires users to install two packages
  • Runtime download via HuggingFace Hub: Standard approach used by transformers. Provides caching, authentication, offline mode, and checksum verification
  • Custom download (S3, etc.): Works, but reimplements the caching, authentication, and integrity checks that the Hub already provides

Decision

Download the detector model at runtime via HuggingFace Hub (snapshot_download or from_pretrained, both of which cache automatically). Support offline mode via HF_HUB_OFFLINE=1 or local_files_only=True. Provide a CLI command that pre-downloads models on a connected machine for use in air-gapped environments.

Pin model revisions to specific commit hashes for reproducibility.
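
The decision above can be sketched as follows, assuming the huggingface_hub package is installed. ModelSpec, offline_requested, download_kwargs, and fetch_model are hypothetical names used for illustration, not part of the alknet-firewall API. HuggingFaceTB/SmolLM2-135M is assumed to be the Hub repository for the detector model, and the revision shown is a placeholder that must be replaced with a real commit hash.

```python
import os
from dataclasses import dataclass

# Assumption: mirrors the truthy values huggingface_hub accepts for HF_HUB_OFFLINE.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for a pinned detector model."""

    repo_id: str
    revision: str  # full commit hash, not a moving ref such as "main"


# Placeholder hash for illustration only; substitute a real commit.
DETECTOR = ModelSpec(
    repo_id="HuggingFaceTB/SmolLM2-135M",  # assumed repository id
    revision="0" * 40,
)


def offline_requested(env=None):
    """Return True when HF_HUB_OFFLINE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").upper() in _TRUE_VALUES


def download_kwargs(spec, offline=None):
    """Build the keyword arguments passed to snapshot_download."""
    if offline is None:
        offline = offline_requested()
    return {
        "repo_id": spec.repo_id,
        "revision": spec.revision,
        "local_files_only": offline,
    }


def fetch_model(spec=DETECTOR, offline=None):
    """Return the local snapshot directory, downloading on first use."""
    # Imported lazily so importing this module never requires the Hub client.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(spec, offline))
```

With local_files_only=True the call resolves only from the local cache and raises if the pinned revision is not already present, which is the behavior an air-gapped deployment would want.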

Consequences

Positive:

  • Package stays small (~30MB base install)
  • HuggingFace Hub provides automatic caching, deduplication, and checksum verification
  • Offline mode supported via environment variable
  • Authentication for gated models via HF_TOKEN
  • Standard approach — users familiar with transformers will recognize the pattern

Negative:

  • First run requires network access and a ~269MB download (shown with a progress bar)
  • Model availability depends on HuggingFace Hub uptime
  • Users in restricted networks need to pre-download models
  • Different model revisions may produce different detection results, so revisions must be pinned
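
For the restricted-network and pinning points above, a pre-download command might look like the sketch below. The command name alknet-firewall-fetch, its flags, and the default repository id are assumptions made for illustration; the ADR does not specify the CLI's interface. The revision argument is validated as a full 40-character commit hash so that a branch name cannot be pinned by accident.

```python
import argparse
import re


def commit_hash(value):
    """argparse type: accept only a full 40-character hex commit hash."""
    if not re.fullmatch(r"[0-9a-f]{40}", value):
        raise argparse.ArgumentTypeError(
            f"expected a 40-character commit hash, got {value!r}"
        )
    return value


def build_parser():
    parser = argparse.ArgumentParser(
        prog="alknet-firewall-fetch",  # hypothetical command name
        description="Pre-download the detector model into a local cache.",
    )
    parser.add_argument(
        "--repo-id",
        default="HuggingFaceTB/SmolLM2-135M",  # assumed repository id
    )
    parser.add_argument("--revision", required=True, type=commit_hash)
    parser.add_argument(
        "--cache-dir",
        default=None,
        help="cache location; defaults to the Hub client's own cache",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so argument parsing works without the Hub client.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=args.repo_id,
        revision=args.revision,
        cache_dir=args.cache_dir,
    )
    print(path)
    return 0
```

One plausible workflow is to run main on a connected machine, copy the resulting cache directory to the restricted host, and start the application there with HF_HUB_OFFLINE=1.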

References