docs: resolve 4 open questions, add research, spec codebook package structure

Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely: it has no native support
  for activation extraction (optimum #972 closed as not planned) and produces
  bloated model exports; burn/cublas via safetensors is a better future path

- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
  Structure and Extraction from PoC sections to codebook.md based on PoC
  analysis of metaspline firewall_codebook.py

- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
  Firewall.screen() only, Phase 2 adds <100-line adapter packages for
  LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails

- OQ-06: TOML for file-based config — standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00
parent 11620e8398
commit 7d8a39a88a
13 changed files with 2576 additions and 83 deletions

@@ -6,17 +6,16 @@ Accepted
 ## Context
-PyTorch is the primary inference backend for the detector model. However,
-PyTorch is large:
+PyTorch is the inference backend for the detector model. However, PyTorch is
+large:
 - `torch` (CPU): ~200MB download, ~700MB installed
 - `torch` (CUDA): ~2.5GB download, ~5GB+ installed
-- `onnxruntime`: ~30-50MB download, ~300MB installed
 Making PyTorch a required dependency would force a 200MB-2.5GB download on
-every user, even those who already have PyTorch installed or prefer ONNX
-Runtime. This is the standard problem for ML libraries, and the HuggingFace
-ecosystem has converged on a solution.
+every user, even those who already have PyTorch installed. This is the
+standard problem for ML libraries, and the HuggingFace ecosystem has
+converged on a solution.
 ## Decision
@@ -43,7 +42,6 @@ except ImportError:
 **Positive**:
 - Base install is ~30MB download, ~100MB installed — very lightweight
 - Users with existing PyTorch installations don't re-download
-- ONNX Runtime alternative available for minimal footprint (~100MB total)
 - Follows HuggingFace ecosystem conventions (transformers, safetensors, HF
 hub all use this pattern)
 - uv supports CPU/GPU torch variant selection via `[tool.uv.sources]` and
@@ -55,6 +53,8 @@ except ImportError:
 - Runtime import errors if users forget to install a backend
 - CPU-only torch requires two-step install or uv configuration (can't be
 expressed in pip extras alone)
+- PyTorch is the only supported inference backend; future alternatives
+(burn/cublas via safetensors) would require separate integration work
 ## References
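
The `except ImportError:` context in the hunk headers above suggests the ADR guards its backend import at runtime. A minimal sketch of such a guard follows, to show how the "runtime import errors" consequence might be turned into an actionable message. The helper name `require_backend` and the error wording are assumptions for illustration, not code from the ADR.

```python
import importlib
from types import ModuleType


def require_backend(module_name: str = "torch") -> ModuleType:
    """Import an optional inference backend, failing with a clear hint.

    Hypothetical helper: the ADR's actual guard is not shown in this diff.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # backend lands here and is re-raised with install guidance.
        raise ImportError(
            f"Inference backend '{module_name}' is not installed. "
            "Install it separately or via the package's optional extra."
        ) from exc
```

Deferring the import to the first call keeps the base install usable without PyTorch and reports the missing backend at the point where it is actually needed.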