alknet-firewall/docs/architecture/decisions/011-guardrail-integration-strategy.md

# ADR-011: Standalone API with Thin Adapter Integration Strategy

## Status

Accepted

## Context

alknet-firewall provides behavioral signal detection, which is fundamentally
different from text-surface defenses like Llama Guard, NeMo Guardrails, or
Guardrails AI. Rather than classifying input text, it runs a small detector
model and extracts hidden-state activations. Users may want to run both
text-surface defenses and behavioral detection in series.

Research into existing guardrail systems ([patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md))
identified three viable integration targets with high compatibility:

- **LlamaFirewall**: `BaseScanner.scan()` → `ScanResult` maps directly to
  `Firewall.screen()` → `Alarm`
- **OpenAI Agents SDK**: `@input_guardrail` decorator pattern with blocking
  execution
- **NeMo Guardrails**: Custom Python action in input rails (Colang DSL can't
  express behavioral detection natively)

Two systems have low compatibility: Guardrails AI (expects text-surface
validators with content fixes, not alarms) and Amazon Bedrock Guardrails
(closed service, no extension mechanism).

## Decision

**Phase 1**: Ship a standalone API only. No adapters, no common interface.

```python
# The core API: simple, composable, no framework dependencies
from alknet_firewall import Firewall  # import path assumed from the package name

firewall = Firewall()
alarm = firewall.screen("untrusted input text")
```

**Phase 2**: Build thin adapter packages as optional dependencies. Each adapter
should stay under 100 lines and require no changes to the core library:

- `alknet-firewall-llamafirewall`: Custom `BaseScanner` subclass
- `alknet-firewall-agents`: `@input_guardrail` wrapper
- `alknet-firewall-nemo`: Custom NeMo input rail action
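
To give a feel for how thin such an adapter could be, here is a minimal sketch of the scanner-style case. Everything in it is a stand-in: the `Alarm` fields (`triggered`, `score`), the keyword check inside `Firewall.screen()`, and the simplified `BaseScanner`, `ScanResult`, and `Decision` types are assumptions made for illustration and do not reproduce the real signatures of alknet-firewall or of any host framework.

```python
from dataclasses import dataclass
from enum import Enum


# --- Stand-ins for the core library (field names are assumptions) ---
@dataclass
class Alarm:
    triggered: bool
    score: float


class Firewall:
    """Placeholder: the real screen() runs a detector model, not a keyword check."""

    def screen(self, text: str) -> Alarm:
        hit = "ignore previous instructions" in text.lower()
        return Alarm(triggered=hit, score=0.9 if hit else 0.1)


# --- Stand-ins for a host framework's scanner contract (hypothetical) ---
class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass
class ScanResult:
    decision: Decision
    score: float
    reason: str


class BaseScanner:
    def scan(self, text: str) -> ScanResult:
        raise NotImplementedError


# --- The adapter itself: translate an Alarm into the host's result type ---
class BehavioralScanner(BaseScanner):
    def __init__(self, firewall: Firewall) -> None:
        self._firewall = firewall

    def scan(self, text: str) -> ScanResult:
        alarm = self._firewall.screen(text)
        decision = Decision.BLOCK if alarm.triggered else Decision.ALLOW
        return ScanResult(decision, alarm.score, "behavioral signal detector")
```

The adapter holds no state beyond the `Firewall` instance and contains no detection logic of its own, which is what should keep it small and cheap to update when the host framework changes.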

Do NOT build a common `ScreeningProvider` interface. The integration patterns
differ enough between systems that a shared abstraction would be premature and
constraining. If a common pattern emerges organically from the adapters,
extract it then.

## Consequences

**Positive**:
- Phase 1 ships faster — no adapter development or testing overhead
- Core API stays clean and framework-independent
- Users can compose manually: call `firewall.screen()` then pass results to
  any guardrail system
- Adapters are optional packages, not core dependencies — no coupling
- Thin adapters are easy to maintain when guardrail frameworks change their
  APIs

**Negative**:
- Phase 1 users must write their own glue code (typically 5–10 lines)
- No "pip install and configure" experience until Phase 2
- Multiple small adapter packages to maintain
- Risk of API drift between core and adapters if adapters are maintained
  infrequently
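
As a rough illustration of the Phase 1 glue code mentioned above, the sketch below runs a text-surface check and the behavioral screen in series. The `Alarm.triggered` field, the keyword-based `Firewall` stand-in, and the `no_banned_words` helper are hypothetical placeholders, not the library's actual API; in practice the first stage would be whatever text-surface guardrail the user already runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alarm:  # stand-in; the field name is an assumption
    triggered: bool


class Firewall:  # stand-in; the real class runs a detector model
    def screen(self, text: str) -> Alarm:
        return Alarm(triggered="ignore previous instructions" in text.lower())


def screen_in_series(
    text: str,
    text_surface_check: Callable[[str], bool],
    firewall: Firewall,
) -> bool:
    """Return True only if the input passes both layers."""
    if not text_surface_check(text):  # cheap text-surface layer runs first
        return False
    return not firewall.screen(text).triggered  # then the behavioral layer


def no_banned_words(text: str) -> bool:  # hypothetical text-surface check
    return "password" not in text.lower()
```

Running the text-surface check first is only one possible ordering; it is chosen here on the assumption that it is the cheaper of the two checks.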

## References

- [OQ-05](../open-questions.md) — How should the firewall integrate with
  existing guardrail systems?
- [patterns-analysis.md](../../research/guardrail-integration-patterns/patterns-analysis.md) — Full research analysis
- [ADR-002](002-behavioral-signals.md) — Behavioral signal detection (not text
  classification)