docs: resolve 4 open questions, add research, spec codebook package structure

Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely; it doesn't support activation extraction natively (optimum #972 closed as not planned) and has bloated model exports; burn/cublas via safetensors is a better future path
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails
- OQ-06: TOML for file-based config (standard modern Python, two-way door)

Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00
parent 11620e8398
commit 7d8a39a88a
13 changed files with 2576 additions and 83 deletions
--- a/docs/architecture/model.md
+++ b/docs/architecture/model.md
@@ -72,8 +72,7 @@ class DetectorModel(Protocol):
 ```

 The `infer` method returns hidden states at key layers, abstracting away
-whether the backend is PyTorch, ONNX Runtime, or a future Rust inference
-engine.
+whether the backend is PyTorch or a future alternative inference engine.

 ### Lazy Loading

@@ -158,4 +157,4 @@ class HFDetectorModel:
 Open questions are tracked in [open-questions.md](open-questions.md). Key
 questions affecting this document:

- **OQ-01**: Should ONNX Runtime be a supported inference backend in Phase 1? (open)
+- **OQ-01**: ~~Should ONNX Runtime be a supported inference backend in Phase 1?~~ (resolved — removed from scope; burn/cublas is a better future path)
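
Reviewer note: the reworded paragraph in `model.md` says `infer` hides which backend produces the hidden states. A minimal sketch of that idea follows, under stated assumptions: the real `DetectorModel` signature is not visible in this diff, so the `infer(text)` signature, the `HiddenStates` alias, `FakeBackend`, and `mean_activation` are all hypothetical names used only for illustration.

```python
from typing import Dict, List, Protocol, Sequence, runtime_checkable

# Assumed shape: layer index -> activation vector. The real type may differ.
HiddenStates = Dict[int, List[float]]


@runtime_checkable
class DetectorModel(Protocol):
    """Hypothetical stand-in for the protocol referenced in model.md."""

    def infer(self, text: str) -> HiddenStates: ...


class FakeBackend:
    """Toy backend; a real one would wrap PyTorch or another engine."""

    def __init__(self, key_layers: Sequence[int], width: int = 4) -> None:
        self.key_layers = list(key_layers)
        self.width = width

    def infer(self, text: str) -> HiddenStates:
        # Deterministic placeholder values derived from input length and layer.
        base = float(len(text))
        return {layer: [base + layer] * self.width for layer in self.key_layers}


def mean_activation(model: DetectorModel, text: str) -> Dict[int, float]:
    """Caller code depends only on the protocol, never on the backend class."""
    states = model.infer(text)
    return {layer: sum(vec) / len(vec) for layer, vec in states.items()}


backend = FakeBackend(key_layers=[8, 16])
result = mean_activation(backend, "hello")
```

Because `mean_activation` is typed against the protocol alone, swapping `FakeBackend` for a different engine later would not require changes to calling code, which seems to be the property the doc change is preserving after dropping ONNX Runtime.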