docs: resolve 4 open questions, add research, spec codebook package structure

Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely: doesn't support
  activation extraction natively (optimum #972 closed as not planned),
  bloated model exports; burn/cublas via safetensors is a better future path
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
  Structure and Extraction from PoC sections to codebook.md based on PoC
  analysis of metaspline firewall_codebook.py
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
  Firewall.screen() only, Phase 2 adds <100-line adapter packages for
  LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails
- OQ-06: TOML for file-based config: standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00
parent 11620e8398
commit 7d8a39a88a
13 changed files with 2576 additions and 83 deletions
--- a/docs/architecture/decisions/006-optional-pytorch.md
+++ b/docs/architecture/decisions/006-optional-pytorch.md
@@ -6,17 +6,16 @@ Accepted

 ## Context

-PyTorch is the primary inference backend for the detector model. However,
-PyTorch is large:
+PyTorch is the inference backend for the detector model. However, PyTorch is
+large:

 - `torch` (CPU): ~200MB download, ~700MB installed
 - `torch` (CUDA): ~2.5GB download, ~5GB+ installed
-- `onnxruntime`: ~30-50MB download, ~300MB installed

 Making PyTorch a required dependency would force a 200MB-2.5GB download on
-every user, even those who already have PyTorch installed or prefer ONNX
-Runtime. This is the standard problem for ML libraries, and the HuggingFace
-ecosystem has converged on a solution.
+every user, even those who already have PyTorch installed. This is the
+standard problem for ML libraries, and the HuggingFace ecosystem has
+converged on a solution.

 ## Decision

@@ -43,7 +42,6 @@ except ImportError:
 **Positive**:
 - Base install is ~30MB download, ~100MB installed — very lightweight
 - Users with existing PyTorch installations don't re-download
-- ONNX Runtime alternative available for minimal footprint (~100MB total)
 - Follows HuggingFace ecosystem conventions (transformers, safetensors, HF
  hub all use this pattern)
 - uv supports CPU/GPU torch variant selection via `[tool.uv.sources]` and
@@ -55,6 +53,8 @@ except ImportError:
 - Runtime import errors if users forget to install a backend
 - CPU-only torch requires two-step install or uv configuration (can't be
  expressed in pip extras alone)
+- PyTorch is the only supported inference backend; future alternatives
+  (burn/cublas via safetensors) would require separate integration work

 ## References