Files
alknet-firewall/docs/architecture/decisions/011-guardrail-integration-strategy.md
glm-5.1 7d8a39a88a docs: resolve 4 open questions, add research, spec codebook package structure
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely — doesn't support
  activation extraction natively (optimum #972 closed as not planned),
  bloated model exports; burn/cublas via safetensors is a better future path

- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
  Structure and Extraction from PoC sections to codebook.md based on PoC
  analysis of metaspline firewall_codebook.py

- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
  Firewall.screen() only, Phase 2 adds <100-line adapter packages for
  LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails

- OQ-06: TOML for file-based config — standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00

2.9 KiB
Raw Blame History

ADR-011: Standalone API with Thin Adapter Integration Strategy

Status

Accepted

Context

alknet-firewall provides behavioral signal detection — fundamentally different from text-surface defenses like Llama Guard, NeMo Guardrails, or Guardrails AI. It requires running a small detector model and extracting hidden state activations, not classifying input text. Users may want to run both text-surface defenses and behavioral detection in series.

Research into existing guardrail systems (patterns-analysis.md) identified three viable integration targets with high compatibility:

  • LlamaFirewall: BaseScanner.scan()ScanResult maps directly to Firewall.screen()Alarm
  • OpenAI Agents SDK: @input_guardrail decorator pattern with blocking execution
  • NeMo Guardrails: Custom Python action in input rails (Colang DSL can't express behavioral detection natively)

Two systems have low compatibility: Guardrails AI (expects text-surface validators with content fixes, not alarms) and Amazon Bedrock Guardrails (closed service, no extension mechanism).

Decision

Phase 1: Ship a standalone API only. No adapters, no common interface.

# The core API — simple, composable, no framework dependencies
firewall = Firewall()
alarm = firewall.screen("untrusted input text")

Phase 2: Build thin adapter packages as optional dependencies. Each adapter is <100 lines and has no impact on the core library:

  • alknet-firewall-llamafirewall: Custom BaseScanner subclass
  • alknet-firewall-agents: @input_guardrail wrapper
  • alknet-firewall-nemo: Custom NeMo input rail action

Do NOT build a common ScreeningProvider interface. The integration patterns differ enough between systems that a shared abstraction would be premature and constraining. If a common pattern emerges organically from the adapters, extract it then.

Consequences

Positive:

  • Phase 1 ships faster — no adapter development or testing overhead
  • Core API stays clean and framework-independent
  • Users can compose manually: call firewall.screen() then pass results to any guardrail system
  • Adapters are optional packages, not core dependencies — no coupling
  • Thin adapters are easy to maintain when guardrail frameworks change their APIs

Negative:

  • Phase 1 users must write their own glue code (typically 510 lines)
  • No "pip install and configure" experience until Phase 2
  • Multiple small adapter packages to maintain
  • Risk of API drift between core and adapters if adapters are maintained infrequently

References

  • OQ-05 — How should the firewall integrate with existing guardrail systems?
  • patterns-analysis.md — Full research analysis
  • ADR-002 — Behavioral signal detection (not text classification)