Files
alknet-firewall/docs/architecture/decisions/011-guardrail-integration-strategy.md
glm-5.1 7d8a39a88a docs: resolve 4 open questions, add research, spec codebook package structure
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely — doesn't support
  activation extraction natively (optimum #972 closed as not planned),
  bloated model exports; burn/cublas via safetensors is a better future path

- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
  Structure and Extraction from PoC sections to codebook.md based on PoC
  analysis of metaspline firewall_codebook.py

- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
  Firewall.screen() only, Phase 2 adds <100-line adapter packages for
  LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails

- OQ-06: TOML for file-based config — standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00

75 lines
2.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ADR-011: Standalone API with Thin Adapter Integration Strategy
## Status
Accepted
## Context
alknet-firewall provides behavioral signal detection — fundamentally different
from text-surface defenses like Llama Guard, NeMo Guardrails, or Guardrails AI.
It requires running a small detector model and extracting hidden state
activations, not classifying input text. Users may want to run both text-surface
defenses and behavioral detection in series.
Research into existing guardrail systems ([patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md))
identified three viable integration targets with high compatibility:
- **LlamaFirewall**: `BaseScanner.scan()``ScanResult` maps directly to
`Firewall.screen()``Alarm`
- **OpenAI Agents SDK**: `@input_guardrail` decorator pattern with blocking
execution
- **NeMo Guardrails**: Custom Python action in input rails (Colang DSL can't
express behavioral detection natively)
Two systems have low compatibility: Guardrails AI (expects text-surface
validators with content fixes, not alarms) and Amazon Bedrock Guardrails
(closed service, no extension mechanism).
## Decision
**Phase 1**: Ship a standalone API only. No adapters, no common interface.
```python
# The core API — simple, composable, no framework dependencies
firewall = Firewall()
alarm = firewall.screen("untrusted input text")
```
**Phase 2**: Build thin adapter packages as optional dependencies. Each adapter
is <100 lines and has no impact on the core library:
- `alknet-firewall-llamafirewall`: Custom `BaseScanner` subclass
- `alknet-firewall-agents`: `@input_guardrail` wrapper
- `alknet-firewall-nemo`: Custom NeMo input rail action
Do NOT build a common `ScreeningProvider` interface. The integration patterns
differ enough between systems that a shared abstraction would be premature and
constraining. If a common pattern emerges organically from the adapters,
extract it then.
## Consequences
**Positive**:
- Phase 1 ships faster — no adapter development or testing overhead
- Core API stays clean and framework-independent
- Users can compose manually: call `firewall.screen()` then pass results to
any guardrail system
- Adapters are optional packages, not core dependencies — no coupling
- Thin adapters are easy to maintain when guardrail frameworks change their
APIs
**Negative**:
- Phase 1 users must write their own glue code (typically 510 lines)
- No "pip install and configure" experience until Phase 2
- Multiple small adapter packages to maintain
- Risk of API drift between core and adapters if adapters are maintained
infrequently
## References
- [OQ-05](../open-questions.md) — How should the firewall integrate with
existing guardrail systems?
- [patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md) — Full research analysis
- [ADR-002](002-behavioral-signals.md) — Behavioral signal detection (not text
classification)