docs: resolve 4 open questions, add research, spec codebook package structure
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06: - OQ-01: Remove ONNX Runtime from scope entirely — doesn't support activation extraction natively (optimum #972 closed as not planned), bloated model exports; burn/cublas via safetensors is a better future path - OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py - OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails - OQ-06: TOML for file-based config — standard modern Python, two-way door Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.
This commit is contained in:
@@ -56,17 +56,16 @@ for the full threat analysis and academic evidence.
|
||||
- Interpretable detection signals (SVD direction analysis)
|
||||
|
||||
- **Phase 2**: Integration and operational hardening
|
||||
- ONNX Runtime inference backend
|
||||
- Async/batch screening API
|
||||
- Integration adapters for LlamaFirewall, NeMo Guardrails
|
||||
- Integration adapters for LlamaFirewall, NeMo Guardrails, OpenAI Agents SDK
|
||||
- Metrics and observability
|
||||
- Codebook training pipeline (`run_manifold_projection.py` extraction)
|
||||
- Streaming/rolling-window input screening (granular detection for documents)
|
||||
|
||||
- **Phase 3**: Advanced capabilities
|
||||
- Multi-turn attack detection (payload splitting)
|
||||
- Streaming/rolling-window input screening (granular detection for documents)
|
||||
- Custom model fine-tuning for domain-specific detection
|
||||
- ONNX Runtime inference backend (export from PyTorch)
|
||||
- Alternative inference backends (burn/cublas via safetensors)
|
||||
|
||||
### Out of Scope
|
||||
|
||||
@@ -138,8 +137,6 @@ for the full threat analysis and academic evidence.
|
||||
|---------|-------|---------|---------|-------|
|
||||
| `torch` | `[torch]` | >=2.2 | Model inference | 200MB-2.5GB; optional dependency |
|
||||
| `transformers` | `[torch]` | >=4.40 | Model loading pipeline | Required with torch extra |
|
||||
| `onnxruntime` | `[onnx]` | >=1.17 | Alternative inference | ~30-50MB; Phase 2 |
|
||||
| `optimum` | `[onnx]` | latest | ONNX Runtime integration | Phase 2 |
|
||||
|
||||
### Development (Not Published)
|
||||
|
||||
@@ -187,6 +184,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
| [008](decisions/008-three-level-alarm.md) | Three-level alarm system | CLEAR/SUSPICIOUS/DANGEROUS balances simplicity with nuance |
|
||||
| [009](decisions/009-last-token-extraction.md) | Last-token activation extraction | Standard for autoregressive models; full sequence context |
|
||||
| [010](decisions/010-monotonic-spline-distributions.md) | Monotonic spline distributions | Compact, smooth, tail-sensitive behavioral region modeling |
|
||||
| [011](decisions/011-guardrail-integration-strategy.md) | Standalone API + thin adapters | Phase 1 standalone, Phase 2 thin adapter packages |
|
||||
|
||||
## Dependencies on Other Projects
|
||||
|
||||
@@ -204,5 +202,5 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
|
||||
Open questions are tracked in [open-questions.md](open-questions.md). Key
|
||||
questions affecting this document:
|
||||
|
||||
- **OQ-01**: Should ONNX Runtime be a supported inference backend in Phase 1? (open)
|
||||
- **OQ-05**: How should the firewall integrate with existing guardrail systems? (open)
|
||||
- **OQ-01**: Should ONNX Runtime be a supported inference backend in Phase 1? (resolved — removed from scope; ONNX doesn't support activation extraction natively, and burn/cublas is a better future path)
|
||||
- **OQ-05**: How should the firewall integrate with existing guardrail systems? (resolved — ADR-011: standalone API + thin adapters in Phase 2)
|
||||
Reference in New Issue
Block a user