
glm-5.1 cf464c2296 feat: initial architecture specification and research

Phase 0→1 setup for alknet-firewall — a behavioral signal detection
library that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).

2026-06-13 05:17:40 +00:00

1.8 KiB


ADR-007: Runtime Model Download via HuggingFace Hub

Status

Accepted

Context

The detector model (SmolLM2-135M) is ~269MB, which is too large to bundle in a Python package: PyPI's default limits are 100MB per file and 10GB per project. Even with a limit increase, a 269MB wheel download would be poor UX.

Options:

- Bundle in package: Not feasible due to size constraints
- Separate package for model: Possible but awkward, requires users to install two packages
- Runtime download via HuggingFace Hub: Standard approach used by transformers. Provides caching, authentication, offline mode, and checksum verification
- Custom download (S3, etc.): Works but reinvents the wheel

Decision

Download the detector model at runtime via HuggingFace Hub (snapshot_download or from_pretrained, both of which cache automatically). Support offline mode via HF_HUB_OFFLINE=1 or local_files_only=True. Provide a CLI command that pre-downloads models on a connected machine, so the populated cache can be transferred to air-gapped environments.

Pin model revisions to specific commit hashes for reproducibility.
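A minimal sketch of the pinned, offline-aware download described above. The repository id, the placeholder revision, and the helper names (`is_pinned`, `offline_requested`, `fetch_detector`) are assumptions for illustration and are not taken from the project; only `snapshot_download` and its `repo_id`, `revision`, and `local_files_only` parameters come from `huggingface_hub`.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional

# Assumption: the ADR names the model but not its Hub repository id.
MODEL_REPO = "HuggingFaceTB/SmolLM2-135M"
# Placeholder only: replace with the real 40-character commit hash to pin.
MODEL_REVISION = "0" * 40

_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_pinned(revision: str) -> bool:
    """Return True only for a full commit hash, not a branch or tag name."""
    return _COMMIT_HASH.fullmatch(revision) is not None


def offline_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Check whether HF_HUB_OFFLINE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").upper() in {"1", "ON", "YES", "TRUE"}


def fetch_detector(
    repo_id: str = MODEL_REPO,
    revision: str = MODEL_REVISION,
    local_files_only: Optional[bool] = None,
) -> Path:
    """Resolve the model snapshot directory, downloading it if allowed."""
    if not is_pinned(revision):
        raise ValueError(f"revision must be a commit hash, got {revision!r}")
    if local_files_only is None:
        local_files_only = offline_requested()
    # Imported lazily so that importing this module needs no network stack.
    from huggingface_hub import snapshot_download

    return Path(
        snapshot_download(
            repo_id=repo_id,
            revision=revision,
            local_files_only=local_files_only,
        )
    )
```

Rejecting branch names such as `main` up front is one way to enforce the pinning rule: a branch can move to a new commit, while a commit hash always resolves to the same files.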

Consequences

Positive:

- Package stays small (~30MB base install)
- HuggingFace Hub provides automatic caching, deduplication, and checksum verification
- Offline mode supported via HF_HUB_OFFLINE=1 or local_files_only=True
- Authentication for gated models via HF_TOKEN
- Standard approach: users familiar with transformers will recognize the pattern

Negative:

- First run requires network access and a ~269MB download (shown with a progress bar)
- Model availability depends on HuggingFace Hub uptime
- Users in restricted networks need to pre-download models
- Different model revisions may produce different detection results, so revisions must be pinned
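The pre-download step for restricted networks could be exposed as a small command-line entry point. The sketch below is hypothetical: the program name, the `download` subcommand, its flags, and the default repository id are all assumptions, not the project's actual CLI. Only `snapshot_download` and its `repo_id`, `revision`, and `cache_dir` parameters come from `huggingface_hub`.

```python
import argparse
from typing import Optional, Sequence

# Assumption: the ADR names the model but not its Hub repository id.
DEFAULT_REPO = "HuggingFaceTB/SmolLM2-135M"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single hypothetical `download` subcommand."""
    parser = argparse.ArgumentParser(prog="alknet-firewall")
    sub = parser.add_subparsers(dest="command", required=True)
    dl = sub.add_parser("download", help="fetch the detector model into the cache")
    dl.add_argument("--repo-id", default=DEFAULT_REPO)
    dl.add_argument("--revision", required=True, help="commit hash to pin")
    dl.add_argument("--cache-dir", default=None, help="override the Hub cache location")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Download the requested snapshot and print where it was stored."""
    args = build_parser().parse_args(argv)
    # Imported lazily so that argument parsing works without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=args.repo_id,
        revision=args.revision,
        cache_dir=args.cache_dir,
    )
    print(path)
    return 0
```

Under these assumptions, the command would be run once on a machine with network access, the resulting cache directory copied to the restricted host, and HF_HUB_OFFLINE=1 set there so that later loads never attempt a network call.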

References

python-ml-packaging.md — Section 2: Model file distribution
model.md
