docs: resolve 4 open questions, add research, spec codebook package structure

Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely - doesn't support activation extraction natively (optimum #972 closed as not planned), bloated model exports; burn/cublas via safetensors is a better future path
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails
- OQ-06: TOML for file-based config - standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00
parent 11620e8398
commit 7d8a39a88a
13 changed files with 2576 additions and 83 deletions
--- a/docs/architecture/decisions/011-guardrail-integration-strategy.md
+++ b/docs/architecture/decisions/011-guardrail-integration-strategy.md
@@ -0,0 +1,75 @@
+# ADR-011: Standalone API with Thin Adapter Integration Strategy
+
+## Status
+
+Accepted
+
+## Context
+
+alknet-firewall provides behavioral signal detection — fundamentally different
+from text-surface defenses like Llama Guard, NeMo Guardrails, or Guardrails AI.
+It requires running a small detector model and extracting hidden-state
+activations, not classifying input text. Users may want to run both text-surface
+defenses and behavioral detection in series.
+
+Research into existing guardrail systems ([patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md))
+identified three viable integration targets with high compatibility:
+
+- **LlamaFirewall**: `BaseScanner.scan()` → `ScanResult` maps directly to
+  `Firewall.screen()` → `Alarm`
+- **OpenAI Agents SDK**: `@input_guardrail` decorator pattern with blocking
+  execution
+- **NeMo Guardrails**: Custom Python action in input rails (Colang DSL can't
+  express behavioral detection natively)
+
+Two systems have low compatibility: Guardrails AI (expects text-surface
+validators with content fixes, not alarms) and Amazon Bedrock Guardrails
+(closed service, no extension mechanism).
+
+## Decision
+
+**Phase 1**: Ship a standalone API only. No adapters, no common interface.
+
+```python
+# The core API — simple, composable, no framework dependencies
+firewall = Firewall()
+alarm = firewall.screen("untrusted input text")
+```
+
+**Phase 2**: Build thin adapter packages as optional dependencies. Each adapter
+is <100 lines and has no impact on the core library:
+
+- `alknet-firewall-llamafirewall`: Custom `BaseScanner` subclass
+- `alknet-firewall-agents`: `@input_guardrail` wrapper
+- `alknet-firewall-nemo`: Custom NeMo input rail action
+
+Do NOT build a common `ScreeningProvider` interface. The integration patterns
+differ enough between systems that a shared abstraction would be premature and
+constraining. If a common pattern emerges organically from the adapters,
+extract it then.
+
+## Consequences
+
+**Positive**:
+- Phase 1 ships faster — no adapter development or testing overhead
+- Core API stays clean and framework-independent
+- Users can compose manually: call `firewall.screen()` then pass results to
+  any guardrail system
+- Adapters are optional packages, not core dependencies — no coupling
+- Thin adapters are easy to maintain when guardrail frameworks change their
+  APIs
+
+**Negative**:
+- Phase 1 users must write their own glue code (typically 5–10 lines)
+- No "pip install and configure" experience until Phase 2
+- Multiple small adapter packages to maintain
+- Risk of API drift between core and adapters if adapters are updated less
+  often than the core
+
+## References
+
+- [OQ-05](../open-questions.md) — How should the firewall integrate with
+  existing guardrail systems?
+- [patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md) — Full research analysis
+- [ADR-002](002-behavioral-signals.md) — Behavioral signal detection (not text
+  classification)
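
The manual composition that ADR-011 describes for Phase 1 (call `firewall.screen()`, then hand the result to whatever runs next) could be sketched roughly as below. This is an illustration under stated assumptions, not the library's implementation: the `Alarm` fields (`triggered`, `score`, `reason`), the stub `Firewall` body, and the `guarded_call` helper are all hypothetical. Only the names `Firewall`, `screen()`, and `Alarm` come from the ADR. The stub uses a keyword check purely so the example runs; the real detector inspects hidden-state activations, not text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Alarm:
    """Hypothetical stand-in for the library's Alarm; all fields are assumptions."""
    triggered: bool
    score: float
    reason: str = ""


class Firewall:
    """Hypothetical stub so the sketch is runnable.

    The real Firewall runs a small detector model and extracts hidden-state
    activations. The keyword check here is a placeholder only.
    """

    def screen(self, text: str) -> Alarm:
        suspicious = "ignore previous instructions" in text.lower()
        return Alarm(
            triggered=suspicious,
            score=1.0 if suspicious else 0.0,
            reason="stub keyword match" if suspicious else "",
        )


def guarded_call(
    firewall: Firewall,
    text: str,
    downstream: Callable[[str], str],
) -> Tuple[Optional[str], Alarm]:
    """Screen `text` first; call `downstream` only if no alarm is raised."""
    alarm = firewall.screen(text)
    if alarm.triggered:
        # Blocked: skip the downstream call and return the alarm to the caller.
        return None, alarm
    return downstream(text), alarm


if __name__ == "__main__":
    fw = Firewall()
    # `str.upper` stands in for a real downstream step (another guardrail, an agent).
    result, alarm = guarded_call(fw, "What is the capital of France?", str.upper)
    print(result, alarm)
```

A Phase 2 adapter would presumably wrap the same call in a framework-specific class or decorator. The glue stays this small because `screen()` takes plain text and returns a single result object.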